Well, damn.
I told my kid tonight: I read "Something Big Is Happening" and I didn't drink the Kool-Aid, but I also didn't ignore it. Fair to say it shook me enough to pay attention.
So tonight I tried some prompts with #Qwen3.5, and it exceeded my expectations and outperformed nearly all of my previous interactions with LLMs. Enough that I'm going to have to keep paying attention.
Damn.
I updated the slides for my talk "Run LLMs Locally":
Now including requirements, costs, setup, llama.cpp, stable-diffusion.cpp, embeddings, function calling, opencode, image recognition, speech recognition, image generation, prompt injection and popular models like GPT-OSS, Qwen3, Qwen3-vl, Z-Image and Whisper.
https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2025_ThomasBley.pdf
#llm #llamacpp #stablediffusion #gptoss #qwen3 #opencode #php
Looking at the latest #qwen3 coder #AI models… my rough calculation is that you'd need two of these to run the maximum-parameter model at full precision.
So if you’ve got more than 120k GBP to burn, and think it’s OK to use about the same amount of power as a toaster to generate unreliable answers, this might be something you want.
For the rest of us, I think it's a hard pass on the idea of running that model locally in a way that could minimise hallucinations.
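For anyone wanting to redo the rough calculation, here's a minimal back-of-envelope sketch in Python. It assumes a ~480B-parameter model and BF16 weights (2 bytes per parameter); both figures are my assumptions, not stated above, so swap in the real parameter count for whichever model you're eyeing.

```python
# Back-of-envelope memory estimate for serving a large model at full precision.
# ASSUMPTIONS (not from the post): ~480B parameters, BF16 (2 bytes/param).
params = 480e9          # hypothetical parameter count
bytes_per_param = 2     # BF16/FP16 "full precision" weights
weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: {weights_gb:.0f} GB")  # 960 GB, before KV cache and activations
```

Under those assumptions the weights alone need roughly 960 GB, which is why a single high-memory workstation doesn't cut it and you'd be shopping for two.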