Discussion

I looked at just one of those multi-agent systems, called MAS-ZERO. In its minimal setting, it dispatches 5 concurrent queries to e.g. gpt-5 and loops that 10 times, so 50 queries in total.
The system prompt turns every one of these queries into tens of thousands of tokens, and each query routinely dispatches secondary queries to 5 other models. So if your initial query was 100 tokens, this turns it into at least 10,000 x 5 x 50 tokens, so 2.5 million or more.
And that is without even going into "reasoning".
(2/3)

#FrugalComputing
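
A back-of-the-envelope sketch of that multiplication, using only the numbers from the post above. The function and parameter names are made up for illustration, and the 10,000 tokens per query is the low end of "tens of thousands", so treat the result as a lower bound rather than a measurement.

```python
def amplified_tokens(tokens_per_query=10_000, secondary_models=5,
                     concurrent=5, loops=10):
    """Rough lower bound on tokens generated from one user query.

    All defaults come from the post: 5 concurrent queries looped 10 times,
    each inflated to ~10,000 tokens and fanned out to 5 other models.
    """
    primary_queries = concurrent * loops          # 5 x 10 = 50
    return tokens_per_query * secondary_models * primary_queries


total = amplified_tokens()
print(f"{total:,} tokens")  # 2,500,000 tokens
```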

Wim🧮

@wim_v12e@scholar.social replied · 2 months ago

With reasoning, there is yet another explosion of at least 10x, often 100x, which rapidly takes us into tens to hundreds of millions of tokens for a single query that started out as a few hundred tokens.
This is why Google now processes more than a quadrillion tokens per month.
(3/3)
#FrugalComputing #AgenticAI #GenAI

Wim🧮

@wim_v12e@scholar.social replied · 2 months ago

Let's for a moment assume Google's figure for the energy consumption of their median LLM query is accurate (*), and very leniently assume that this unspecified median is 100 tokens (likely it is much shorter). Then with 1.3e15 tokens/month, i.e. 1.3e13 queries/month, we end up at 31 TWh/year. If the median were 10 tokens, this would be 310 TWh/year. So let's take the geometric mean:

about 100 TWh/year

(4/3)
#FrugalComputing
(*) 🤣
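
For anyone who wants to redo the estimate, here is a minimal sketch of the arithmetic. The thread does not state the per-query energy figure, so `WH_PER_QUERY = 0.2` is an assumption, chosen only because it reproduces the 31 TWh/year above; swap in whatever figure you trust. The names are illustrative.

```python
import math

TOKENS_PER_MONTH = 1.3e15   # from the thread
WH_PER_QUERY = 0.2          # ASSUMPTION: back-derived to match 31 TWh/year


def annual_twh(tokens_per_query,
               tokens_per_month=TOKENS_PER_MONTH,
               wh_per_query=WH_PER_QUERY):
    """Yearly energy in TWh if every query has the given token count."""
    queries_per_month = tokens_per_month / tokens_per_query
    wh_per_year = queries_per_month * wh_per_query * 12
    return wh_per_year / 1e12   # 1 TWh = 1e12 Wh


lenient = annual_twh(100)   # median query assumed to be 100 tokens
strict = annual_twh(10)     # median query assumed to be 10 tokens
geo_mean = math.sqrt(lenient * strict)

print(f"{lenient:.0f} TWh/year, {strict:.0f} TWh/year, "
      f"geometric mean {geo_mean:.0f} TWh/year")
```

The geometric mean is used because the two scenarios differ by a factor of 10, so it lands halfway between them on a log scale.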