Discussion
#AIResearch

Hacker News
@h4ckernews@mastodon.social · 2 days ago

GEN-0 / Embodied Foundation Models That Scale with Physical Interaction

https://generalistai.com/blog/nov-04-2025-GEN-0

#HackerNews #GEN0 #EmbodiedAI #FoundationModels #PhysicalInteraction #AIresearch

Hacker News
@h4ckernews@mastodon.social · 3 days ago

Why Fei-Fei Li and Yann LeCun Are Both Betting on "World Models"

https://entropytown.com/articles/2025-11-13-world-model-lecun-feifei-li/

#HackerNews #FeiFeiLi #YannLeCun #WorldModels #AIResearch #MachineLearning

entropytown

Why Fei-Fei Li and Yann LeCun Are Both Betting on “World Models” — and How Their Bets Differ | entropytown

Gaussian splats, JEPA and Genie 3 — and why “world model” now means three different things at once.
Hacker News
@h4ckernews@mastodon.social · last week

Reverse engineering a neural network's clever solution to binary addition (2023)

https://cprimozic.net/blog/reverse-engineering-a-small-neural-network/

#HackerNews #ReverseEngineering #NeuralNetworks #BinaryAddition #AIResearch #2023 #Insights

Hacker News
@h4ckernews@mastodon.social · last week

From Memorization to Reasoning in the Spectrum of Loss Curvature

https://arxiv.org/abs/2510.24256

#HackerNews #Memorization #Reasoning #LossCurvature #MachineLearning #AIResearch

Hacker News
@h4ckernews@mastodon.social · last week

TabPFN-2.5 – SOTA foundation model for tabular data

https://priorlabs.ai/technical-reports/tabpfn-2-5-model-report

#HackerNews #TabPFN2.5 #SOTA #TabularData #FoundationModel #AIResearch

Prior Labs

Hacker News
@h4ckernews@mastodon.social · 2 weeks ago

The Smol Training Playbook: The Secrets to Building World-Class LLMs

https://huggingface.co/spaces/HuggingFaceTB/smol-training-playbook

#HackerNews #SmolTrainingPlaybook #LLMs #AIResearch #MachineLearning #HuggingFace

The Smol Training Playbook: The Secrets to Building World-Class LLMs - a Hugging Face Space by HuggingFaceTB

Hacker News
@h4ckernews@mastodon.social · 2 weeks ago

Reasoning Models Reason Well, Until They Don't

https://arxiv.org/abs/2510.22371

#HackerNews #ReasoningModels #ReasonWell #AIResearch #MachineLearning

arXiv.org

Reasoning Models Reason Well, Until They Don't

Large language models (LLMs) have shown significant progress in reasoning tasks. However, recent studies show that transformers and LLMs fail catastrophically once reasoning problems exceed modest complexity. We revisit these findings through the lens of large reasoning models (LRMs) -- LLMs fine-tuned with incentives for step-by-step argumentation and self-verification. LRM performance on graph and reasoning benchmarks such as NLGraph seems extraordinary, with some even claiming they are capable of generalized reasoning and innovation in reasoning-intensive fields such as mathematics, physics, medicine, and law. However, by more carefully scaling the complexity of reasoning problems, we show that existing benchmarks actually have limited complexity. We develop a new dataset, the Deep Reasoning Dataset (DeepRD), along with a generative process for producing unlimited examples of scalable complexity. We use this dataset to evaluate model performance on graph connectivity and natural language proof planning. We find that the performance of LRMs drops abruptly at sufficient complexity and does not generalize. We also relate our LRM results to the distributions of the complexities of large, real-world knowledge graphs, interaction graphs, and proof datasets. We find that the majority of real-world examples fall inside the LRMs' success regime, yet the long tails expose substantial failure potential. Our analysis highlights the near-term utility of LRMs while underscoring the need for new methods that generalize beyond the complexity of examples in the training distribution.
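
The abstract's central device, a generative process that dials problem complexity up without bound, is easy to picture for the graph-connectivity task. Below is a minimal illustrative sketch of that idea in Python; the function name and prompt format are assumptions for illustration, not the authors' DeepRD code.

```python
# Illustrative sketch (not the authors' DeepRD code): build a
# graph-connectivity question whose source->target path has a
# controlled length, so difficulty can be scaled arbitrarily.
import random

def make_connectivity_example(path_len, n_distractors, seed=0):
    rng = random.Random(seed)
    chain = [f"n{i}" for i in range(path_len + 1)]   # guaranteed path
    edges = list(zip(chain, chain[1:]))
    for i in range(n_distractors):                   # dead-end edges
        src = rng.choice(chain[:-1] + [f"d{j}" for j in range(i)])
        edges.append((src, f"d{i}"))                 # d-nodes never lead back
    rng.shuffle(edges)                               # hide the chain order
    facts = ". ".join(f"{u} points to {v}" for u, v in edges)
    return f"{facts}. Is there a path from {chain[0]} to {chain[-1]}?", "yes"

prompt, answer = make_connectivity_example(path_len=8, n_distractors=20)
print(prompt, answer)
```

Raising path_len stretches the required reasoning chain, which is the knob the paper uses to push problems past the limited complexity of existing benchmarks.
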
Hacker News
@h4ckernews@mastodon.social · 2 weeks ago

Language Models Are Injective and Hence Invertible

https://arxiv.org/abs/2510.15511

#HackerNews #LanguageModels #Invertibility #AIResearch #NaturalLanguageProcessing #MachineLearning

arXiv.org

Language Models are Injective and Hence Invertible

Transformer components such as non-linear activations and normalization are inherently non-injective, suggesting that different inputs could map to the same output and prevent exact recovery of the input from a model's representations. In this paper, we challenge this view. First, we prove mathematically that transformer language models mapping discrete input sequences to their corresponding sequence of continuous representations are injective and therefore lossless, a property established at initialization and preserved during training. Second, we confirm this result empirically through billions of collision tests on six state-of-the-art language models, and observe no collisions. Third, we operationalize injectivity: we introduce SipIt, the first algorithm that provably and efficiently reconstructs the exact input text from hidden activations, establishing linear-time guarantees and demonstrating exact invertibility in practice. Overall, our work establishes injectivity as a fundamental and exploitable property of language models, with direct implications for transparency, interpretability, and safe deployment.
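
The paper's empirical claim can be probed at small scale: distinct inputs should never map to identical hidden states. Here is a minimal sketch of such a collision check, assuming the Hugging Face transformers API with gpt2 as a stand-in; the paper's six test models and its SipIt inversion algorithm are not reproduced here.

```python
# Minimal collision-test sketch: if the model is injective, two
# distinct inputs must yield distinct hidden states. gpt2 is a
# stand-in here, not one of the paper's test models.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()

def last_hidden(text):
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # Last layer, final position: a representation of the whole prefix.
    return out.hidden_states[-1][0, -1]

a, b = last_hidden("the cat sat"), last_hidden("the cat ran")
# A collision would mean exact equality; in practice the distance
# stays far from zero, consistent with the paper's findings.
print(torch.norm(a - b).item(), torch.equal(a, b))
```

SipIt goes further, exploiting injectivity to reconstruct the exact input text from activations with linear-time guarantees, but that algorithm is beyond this sketch.
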
Hacker News
@h4ckernews@mastodon.social · 3 weeks ago

Cursor Composer: Building a fast frontier model with RL

https://cursor.com/blog/composer

#HackerNews #CursorComposer #FastFrontier #RLModel #AIResearch

Hacker News
@h4ckernews@mastodon.social · 3 weeks ago

The Continual Learning Problem

https://jessylin.com/2025/10/20/continual-learning/

#HackerNews #ContinualLearning #MachineLearning #AIResearch #EducationTech

Hacker News
@h4ckernews@mastodon.social · 3 weeks ago

Artificial Writing and Automated Detection [pdf]

https://www.nber.org/system/files/working_papers/w34223/w34223.pdf

#HackerNews #ArtificialWriting #AutomatedDetection #AIResearch #NBER #PDF

Greg Lloyd boosted
Hostvix
@stacksize@mastodon.social · 2 months ago

🚨 Ex-OpenAI CTO Mira Murati just launched Tinker, a new service from Thinking Machines Lab.

Tinker strips AI training down to 4 simple functions — you focus on data + algorithms, it handles the GPU chaos.

Is this the Kubernetes moment for AI training?

https://dropletdrift.com/ex-openai-cto-mira-murati-launches-tinker-to-simplify-ai-model-training/

#AI #ArtificialIntelligence #MachineLearning #DeepLearning #AIresearch #LLM #OpenSource #Tech #Innovation #DataScience #NeuralNetworks #FutureOfAI #AIcommunity #AIethics #Startups #OpenAI #Developers #Research #Computing

DropletDrift

Ex-OpenAI CTO Mira Murati launches Tinker to simplify AI model training - DropletDrift

A new player entered the crowded AI landscape today. Thinking Machines Lab, a startup founded earlier this year by former OpenAI CTO Mira Murati, announced Tinker, an API that lets researchers fine-tune large language models without building their own training infrastructure. The promise is straightforward: instead of wrangling GPU clusters and distributed training code, users […]
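
For readers wondering what "strips AI training down to 4 simple functions" could look like in practice, here is a hypothetical sketch of a four-primitive fine-tuning loop. The client object and method names are assumptions for illustration, not Tinker's documented API.

```python
# Hypothetical shape of a four-primitive training API, as the post
# describes it. All names are illustrative assumptions, not
# Tinker's actual interface.

def train(client, dataset, steps):
    for step, batch in zip(range(steps), dataset):
        client.forward_backward(batch)   # 1. compute loss and gradients remotely
        client.optim_step()              # 2. apply the optimizer update
        if step % 100 == 0:
            # 3. sample from the current weights to spot-check progress
            print(step, client.sample("Translate to French: hello"))
    client.save_state("checkpoint-final")  # 4. persist the fine-tuned weights
```

The appeal of such a design is that GPU allocation, sharding, and distributed training all hide behind those four calls, which is the "Kubernetes moment" framing the post gestures at.
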
Debby ⁂📎🐧
@debby@hear-me.social · 2 months ago

Hey everyone 👋

I’m diving deeper into running AI models locally—because, let’s be real, the cloud is just someone else’s computer, and I’d rather have full control over my setup. Renting server space is cheap and easy, but it doesn’t give me the hands-on freedom I’m craving.

So, I’m thinking about building my own AI server/workstation! I’ve been eyeing some used ThinkStations (like the P620) or even a server rack, depending on cost and value. But I’d love your advice!

My Goal:
Run larger LLMs locally on a budget-friendly but powerful setup. Since I don’t need gaming features (ray tracing, DLSS, etc.), I’m leaning toward used server GPUs that offer great performance for AI workloads.

Questions for the Community:
1. Does anyone have experience with these GPUs? Which one would you recommend for running larger LLMs locally?
2. Are there other budget-friendly server GPUs I might have missed that are great for AI workloads?
3. Any tips for building a cost-effective AI workstation? (Cooling, power supply, compatibility, etc.)
4. What’s your go-to setup for local AI inference? I’d love to hear about your experiences!

I’m all about balancing cost and performance, so any insights or recommendations are hugely appreciated.

Thanks in advance! 🙌

@selfhosted@a.gup.pe #AIServer #LocalAI #BudgetBuild #LLM #GPUAdvice #Homelab #AIHardware #DIYAI #ServerGPU #ThinkStation #UsedTech #AICommunity #OpenSourceAI #SelfHostedAI #TechAdvice #AIWorkstation #MachineLearning #AIResearch #FediverseAI #LinuxAI #AIBuild #DeepLearning #ServerBuild #BudgetAI #AIEdgeComputing #Questions #CommunityQuestions #HomeServer #Ailab #llmlab


What is the best used GPU pick for AI researchers?

GPUs I'm considering:

| GPU Model | VRAM | Pros | Cons/Notes |
|---|---|---|---|
| Nvidia Tesla M40 | 24 GB GDDR5 | Reliable, less costly than V100 | Older architecture, but solid for budget builds |
| Nvidia Tesla M10 | 32 GB (4× 8 GB) | High total VRAM, budget-friendly on the used market | Split VRAM might limit some workloads |
| AMD Radeon Instinct MI50 | 32 GB HBM2 | High bandwidth, strong FP16/FP32, ROCm support | ROCm ecosystem is improving but not as mature as CUDA |
| Nvidia Tesla V100 | 32 GB HBM2 | Mature AI hardware, strong Linux/CUDA support | Pricier than M40/M10 but excellent performance |
| Nvidia A40 | 48 GB GDDR6 | Huge VRAM, server-grade GPU | Expensive, but future-proof for larger models |
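
For question 1 above, a quick way to narrow this table down is a back-of-the-envelope VRAM check: quantized weight size plus headroom for KV cache and activations. A rough sketch follows, where the 1.2× overhead factor is a loose assumption rather than a measurement.

```python
# Rough VRAM estimate for local LLM inference: weights at a given
# quantization, plus a flat overhead for KV cache and activations.
# The 1.2x factor is a loose assumption, not a measurement.

def fits(params_b, bits_per_weight, vram_gb):
    weights_gb = params_b * bits_per_weight / 8  # e.g. 70B at 4-bit ~ 35 GB
    return weights_gb * 1.2 <= vram_gb

for gpu, vram in [("Tesla M40", 24), ("MI50/V100", 32), ("A40", 48)]:
    print(gpu, "| 70B @ 4-bit:", fits(70, 4, vram),
          "| 13B @ 8-bit:", fits(13, 8, vram))
```

By this estimate, a 4-bit 70B model (~35 GB of weights plus overhead) only fits the A40 among the cards listed, while 13B-class models fit comfortably on any of them.
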