Well, damn.
I told my kid tonight: I read "Something Big Is Happening" and I didn't drink the Kool-Aid, but I also didn't ignore it. Fair to say it shook me enough to pay attention.
So tonight I tried some prompts with #Qwen3.5, and it exceeded my expectations and outperformed nearly all of my previous interactions with LLMs. Enough that I'm going to have to keep paying attention.
Damn.
I updated the slides for my talk "Run LLMs Locally":
Now including requirements, costs, setup, llama.cpp, stable-diffusion.cpp, embeddings, function calling, opencode, image recognition, speech recognition, image generation, prompt injection and popular models like GPT-OSS, Qwen3, Qwen3-vl, Z-Image and Whisper.
https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2025_ThomasBley.pdf
#llm #llamacpp #stablediffusion #gptoss #qwen3 #opencode #php
Looking at the latest #qwen3 coder #AI models… my rough calculation is that you'd need two of these to run the maximum-parameter model at full precision.
So if you’ve got more than 120k GBP to burn, and think it’s OK to use about the same amount of power as a toaster to generate unreliable answers, this might be something you want.
For the rest of us, I think it's a hard pass on the idea of running that model locally in a way that could minimise hallucinations.
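For anyone wanting to redo the rough calculation, here's a minimal back-of-envelope sketch in Python. It assumes a ~480B-parameter model and BF16 weights (2 bytes per parameter); both figures are my assumptions, not stated above, so swap in the real parameter count for whichever model you're eyeing.

```python
# Back-of-envelope memory estimate for serving a large model at full precision.
# ASSUMPTIONS (not from the post): ~480B parameters, BF16 (2 bytes/param).
params = 480e9          # hypothetical parameter count
bytes_per_param = 2     # BF16/FP16 "full precision" weights
weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: {weights_gb:.0f} GB")  # 960 GB, before KV cache and activations
```

Under those assumptions the weights alone need roughly 960 GB, which is why a single high-memory workstation doesn't cut it and you'd be shopping for two.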