Post · bonfire.cafe

@tante The "Open Source AI Definition" was the final straw for me to recognize OSI for what they are (at least have become): a capitalist apologist shill group.

SebasFC

@SebasFC@mastodont.cat · 2 months ago

@tante hey @pallenberg, this post by Tante reminded me a lot of your episode with Don on the Techlounge podcast.

Is there or isn't there Open Source AI?

Sascha Pallenberg 🇹🇼 ♻️ ⚡

@pallenberg@mastodon.social · 2 months ago

@SebasFC by that definition: no!

Quite apart from the fact that masses of the models' training data carried anything but free licenses.

Whether you'd rather call it Public Domain, which was quite popular before the freeware movement, especially in the 80s... no idea!

But it's also a fact that there's a difference between the Chinese strategy and that of OpenAI, Anthropic, Google & Co.

The former push out their current foundation models for download. The US providers don't!

SebasFC

@SebasFC@mastodont.cat · 2 months ago

@pallenberg "...too big to collect with care..."

https://dair-community.social/@emilymbender/116109627131276897

Hippo 🍉

@badrihippo@fosstodon.org · 2 months ago

@tante thanks for writing this. Actually, just reading your caption ("'Open Source AI' does not meaningfully exist. It's just openwashing proprietary shit") was a clarifying moment because I was wondering how practical a truly Open Source LLM (all the "sources", including the entire training data, bundled together into one big repo) would be

And then I realised: it's not about being "practical". The definition's job is to set the standard, and reaching that or not is the implementer's problem

Hippo 🍉

@badrihippo@fosstodon.org · 2 months ago

@tante also, if people did want to make LLMs (or other models) up to those standards, they would—by creating or relicencing datasets, etc. It would be a humongous effort, of course, but nobody claimed earning your own things was *less* effort than stealing somebody else's

Also, this means we don't really *need* a separate definition specifically for LLMs. We can just use the same standards we've always used: full sources, including code and training data and everything 📦

tante

@tante@tldr.nettime.org · 2 months ago

@badrihippo exactly. Having a specific other definition for "AI" only serves the goal of watering down standards

Thomas Sandmann

@thomas_sandmann@genomic.social · 2 months ago

@tante Curious, what do you think of apertus: https://www.swiss-ai.org/apertus ? The Swiss seem to be making a meaningful attempt? "Particular attention has been paid to data integrity and ethical standards: the training corpus builds only on data which is publicly available. It is filtered to respect machine-readable opt-out requests from websites, even retroactively, and to remove personal data, and other undesired content before training begins." (I haven't used it myself.)

Swiss AI

Apertus | Swiss AI

heckj

@heckj@mastodon.social · 2 months ago

@tante @colincornaby Have you looked at https://allenai.org/olmo? For most of the "open weight" models, I'd completely agree - but the Olmo3 work in particular exposes all of the training data as well, which I read as one of the core arguments in that piece. They not only share and show their data, they discuss - in quite some detail - their training processes, including experiments on the pros and cons for techniques on relatively weaker models.

If you haven't seen it, it's very worth looking.

Olmo from Ai2

Our fully open language model and complete model flow.

indyradio

@indyradio@kafeneio.social · 2 months ago

@tante correct. burn the planet down from every desktop, now get to it

Toni Aittoniemi

@gimulnautti@mastodon.green · 2 months ago

@tante There’s Mistral. They have models that have open training data. 🤔

Fritz Adalis

@FritzAdalis@infosec.exchange · 2 months ago

@tante @aburka
Now always closed open - but minded

Edward

@yugthebug@mastodon.social · 2 months ago

@tante when AI is "open source" they just mean it's proprietary and not on the cloud

GhostOnTheHalfShell

@GhostOnTheHalfShell@masto.ai · 2 months ago

@tante

The thing that gave me the greatest joy when I woke up this morning was watching Chris Noland and his wife discuss how people are openly rejecting data centers. They showed a short clip of people in the streets breaking out in cries of joy when one person announced that the data center project had been rejected by their city.

grrl_aex

@kitkat_blue@mastodon.social · 2 months ago

@tante

cont'd

"Even with open-source AI, you still need huge amounts of data, labor, and infrastructure. They don’t challenge the concentration that includes distribution networks, economies of scale, entrenched reach, the ability to define the tooling and the standards, and so on. Claiming they do these things confuses and distracts us from the type of solutions we need."

/2 /end

grrl_aex

@kitkat_blue@mastodon.social · 2 months ago

@tante

Meredith Whittaker explains *why* open source ai is simply a masquerade:

https://ainowinstitute.org/publications/open-source

The key novelty of the current AI moment is the presence of concentrated amounts of data that had not been available before, and powerful distributed computational systems to process that data to train and perform inference on AI models.

1/

AI Now Institute

Open Source

The 2026 AI Impact Summit in India is the latest iteration of an event that has become a bellwether for global discourse around the AI industry, especially the question of whether, and how, it can be governed. But it also demonstrates how important ideas can be invoked in ways that dilute their meaning or co-opt their force. In this series—produced by AI Now Institute, Aapti Institute, and The Maybe—we bring together leading advocates, builders, and thinkers from around the world who live and breathe substance, analysis, and meaningful action into these ideas.

Openhuman

@Openhuman@mastodon.online · 2 months ago

@tante olmoe by Allen.ai and some Firefox things

ɩɐɥɔɐɿɐɯ

@malachai@furry.engineer · 2 months ago

@tante OMG thank you. "I run my model locally" has become the everlasting thoughtstopper for any #AntiAI comment. I'm becoming extremely irritated with that response.

tante

@tante@tldr.nettime.org · 2 months ago

(I know there are niche attempts that work even worse than all the other models)

Athanasius

@AthanSpod@social.linux.pizza · 2 months ago

@tante Ah, so you're counting https://www.swiss-ai.org/apertus in "nice attempt, but ..." ?

tante

@tante@tldr.nettime.org · 2 months ago

@AthanSpod yes.

Athanasius

@AthanSpod@social.linux.pizza · 2 months ago

@tante Fair enough.

I can see from their white paper that whilst they're being really very transparent... anyone trying to duplicate it would still need to do all the scraping and cleaning of data themselves.

But, *given* that proviso, it does seem like one of, if not the, most open attempt. It's a hard problem, but they've certainly tried to be very ethical about it.

DJGummikuh

@DJGummikuh@mastodon.social · 2 months ago

@tante Open Source LLMs do not exist. I refuse to limit the definition of "AI" to only GenAI LLM Nonsense. And on the ML side you have a lot of OSS