Really enjoyed @tonybaloney's talk at #pyconau on how to make your #LLM models faster in production.
Key takeaways: smaller models are faster, and you can shrink them through quantisation or distillation, or skip model calls entirely with semantic caching.
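For the semantic caching idea, here's a minimal sketch of my own (not from the talk): reuse a previous response when a new prompt is semantically close enough. It assumes sentence-transformers for embeddings, `call_llm` is a hypothetical stand-in for the expensive model call, and the 0.9 similarity threshold is an arbitrary choice.

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small, fast embedding model
cache = []  # list of (prompt_embedding, cached_response) pairs


def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for the slow/expensive LLM call you want to avoid.
    raise NotImplementedError


def cached_answer(prompt: str, threshold: float = 0.9) -> str:
    emb = encoder.encode(prompt, convert_to_tensor=True)
    # Cache hit: an earlier prompt is semantically close enough, reuse its answer.
    for cached_emb, response in cache:
        if util.cos_sim(emb, cached_emb).item() >= threshold:
            return response
    # Cache miss: pay for one LLM call, then remember the result.
    response = call_llm(prompt)
    cache.append((emb, response))
    return response
```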
Really tractable, immediately implementable 👏👏
More of this, pls