The Analog I – Inducing Recursive Self-Modeling in LLMs [pdf]
https://github.com/philMarcus/Birth-of-a-Mind
#HackerNews #Analog #I #Inducing #Recursive #Self-Modeling #LLMs #AIResearch #MachineLearning
LLMs are a 400-year-long confidence trick
https://tomrenner.com/posts/400-year-confidence-trick/
#HackerNews #LLMs #Confidence #Trick #History #Technology #AI
incredible that an AI can make life-or-death decisions in the ER but still write like it’s trying to hit a conference word count. 🛑📊 #LLMs #JustSayIt
Why Ontario Digital Service couldn't procure '98% safe' LLMs (15M Canadians)
https://rosetta-labs-erb.github.io/authority-boundary-ledger/
#HackerNews #OntarioDigitalService #LLMs #Procurement #AI #Ethics #Canada #Technology #News
Language models cannot reliably distinguish belief from knowledge and fact
Abstract
-----------
«As language models (LMs) increasingly infiltrate into high-stakes domains such as law, medicine, journalism and science, their ability to distinguish belief from knowledge, and fact from fiction, becomes imperative. Failure to make such distinctions can mislead diagnoses, distort judicial judgments and amplify misinformation. Here we evaluate 24 cutting-edge LMs using a new KaBLE benchmark of 13,000 questions across 13 epistemic tasks. Our findings reveal crucial limitations. In particular, all models tested systematically fail to acknowledge first-person false beliefs, with GPT-4o dropping from 98.2% to 64.4% accuracy and DeepSeek R1 plummeting from over 90% to 14.4%. Further, models process third-person false beliefs with substantially higher accuracy (95% for newer models; 79% for older ones) than first-person false beliefs (62.6% for newer; 52.5% for older), revealing a troubling attribution bias. We also find that, while recent models show competence in recursive knowledge tasks, they still rely on inconsistent reasoning strategies, suggesting superficial pattern matching rather than robust epistemic understanding. Most models lack a robust understanding of the factive nature of knowledge, that knowledge inherently requires truth. These limitations necessitate urgent improvements before deploying LMs in high-stakes domains where epistemic distinctions are crucial.»
Malaysia suspended access to Elon Musk's chatbot Grok over AI-generated pornographic content, the country's tech regulator said on Sunday. https://www.japantimes.co.jp/commentary/2026/01/11/malaysia-suspended-grok-chatbot/?utm_medium=Social&utm_source=mastodon #commentary #malaysia #elonmusk #chatbots #grok #ai #llms
Play poker with LLMs, or watch them play against each other
#HackerNews #PlayPoker #LLMs #AIgaming #MachineLearning #PokerStrategy
Important blog post by @leitmedium, using Tailwind CSS as an example: AI is breaking open-source projects' business model, and with it their funding: https://www.leitmedium.de/2026/01/09/das-drohende-ende-von-opensource-projekten-durch-llms-am-beispiel-von-tailwind/
#LLMs killing #OpenSource projects sucks big time, no doubt about it. But am I really going to miss #Tailwind?
Short answer:
No.
Long answer:
<button class="inline-flex h-10 min-w-12 items-center gap-x-1.5 rounded-md bg-black px-2.5 py-1.5 text-sm font-semibold text-white shadow-sm focus-visible:outline focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-indigo-600 hover:bg-indigo-500">No.</button>
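For contrast, here is a sketch (my own, not from the linked post) of the same button styled with one semantic class in plain CSS; the rem values and hex colors are the defaults those Tailwind utilities map to, so treat them as assumptions about the intended design:

```html
<!-- Hypothetical plain-CSS equivalent of the Tailwind button above -->
<style>
  .cta-button {
    display: inline-flex;        /* inline-flex */
    align-items: center;         /* items-center */
    gap: 0.375rem;               /* gap-x-1.5 */
    height: 2.5rem;              /* h-10 */
    min-width: 3rem;             /* min-w-12 */
    padding: 0.375rem 0.625rem;  /* py-1.5 px-2.5 */
    border-radius: 0.375rem;     /* rounded-md */
    background: black;           /* bg-black */
    color: white;                /* text-white */
    font-size: 0.875rem;         /* text-sm */
    font-weight: 600;            /* font-semibold */
    box-shadow: 0 1px 2px rgb(0 0 0 / 0.05); /* shadow-sm */
  }
  .cta-button:hover {
    background: #6366f1;         /* hover:bg-indigo-500 */
  }
  .cta-button:focus-visible {
    outline: 2px solid #4f46e5;  /* focus-visible:outline-indigo-600 */
    outline-offset: 2px;         /* focus-visible:outline-offset-2 */
  }
</style>
<button class="cta-button">No.</button>
```

Either way the styling lives somewhere; the question is whether it sits inline in every markup occurrence or once in a stylesheet.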
Task-free intelligence testing of LLMs
https://www.marble.onl/posts/tapping/index.html
#HackerNews #TaskFreeIntelligence #Testing #LLMs #AIResearch #MachineLearning #Innovation
Digital Red Queen: Adversarial Program Evolution in Core War with LLMs
#HackerNews #DigitalRedQueen #AdversarialAI #CoreWar #LLMs #TechInnovation #Cybersecurity
LLM Problems Observed in Humans
https://embd.cc/llm-problems-observed-in-humans
#HackerNews #LLM #Problems #LLMs #Humans #AI #Ethics #Technology #Insights
In the comments yesterday I read, in essence: who's surprised? Everyone knows #LLMs make mistakes constantly. @marcuwekling said something similar at Monday's reading.
On that:
1️⃣ In legal training you learn that 50-70% of administrative acts (Verwaltungsakte) are flawed. The default: wrong!
2️⃣ Also: in my school days, atlases and maps were always wrong (the GDR still on them, the Saarland sometimes missing, Yugoslavia shown whole). I never heard schools talked about the way LLMs are. #ki
LLMs Are Currently Not Helpful at All for Math Research: Hamkins
#HackerNews #LLMs #MathResearch #Hamkins #AIChallenges #MachineLearning
I'm close to muting everyone who posts/boosts a sweeping "GenAI doesn't work at all ever, and can't" statement ...
... alongside everyone who claims they work *great* and doesn't mention their ethics (or lack thereof).
I'm guessing my feed would be very empty afterwards.
2025: The Year in LLMs
https://simonwillison.net/2025/Dec/31/the-year-in-llms/
#HackerNews #2025 #LLMs #Future #Tech #AI #Trends #Innovation
Cover letter generator with Ollama/local LLMs (Open source)
https://www.coverlettermaker.co
#HackerNews #CoverLetterGenerator #Ollama #LLMs #OpenSource #AItools