R.A. Fisher wrote that the purpose of statisticians was "constructing a hypothetical infinite population of which the actual data are regarded as constituting a random sample" (p. 311). In "The Zeroth Problem", Colin Mallows wrote: "As Fisher pointed out, statisticians earn their living by using two basic tricks - they regard data as being realizations of random variables, and they assume that they know an appropriate specification for these random variables."
Some of the pathological beliefs we attribute to techbros were already present in this view of statistics that started forming over a century ago. Our writing is just data; the real, important object is the “hypothetical infinite population” reflected in a large language model, which at base is a random variable. Stable Diffusion, the image generator, is called that because it is based on latent diffusion models, which are a way of representing complicated distribution functions--the hypothetical infinite populations--of things like digital images. Your art is just data; it’s the latent diffusion model that’s the real deal. The entities that are able to identify the distribution functions (in this case tech companies) are the ones who should be rewarded, not the data generators (you and me).
So much of the dysfunction in today’s machine learning and AI points to how problematic it is to give statistical methods a privileged place that they don’t merit. We really ought to be calling out Fisher for his trickery and seeing it as such.
#AI #GenAI #GenerativeAI #LLM #StableDiffusion #statistics #StatisticalMethods #DiffusionModels #MachineLearning #ML
Honda: 2 years of ML vs 1 month of prompting - here's what we learned
https://www.levs.fyi/blog/2-years-of-ml-vs-1-month-of-prompting/
#HackerNews #Honda #ml #prompting #machinelearning #AI #insights
💡3-Month FULLY FUNDED Summer Internship in Germany!
🇩🇪 Apply for the Internship (Jul-Sep 2026) in Tübingen. Work on cutting-edge research in #ML, #Neuroscience & #DataAnalysis at MPI. Open to BSc./MSc. students.
🗓️Deadline: Nov 20, 2025
🔗 Apply: https://cactus-internship.tuebingen.mpg.de
Soon to open (mid-November)
https://statml.peercommunityin.org/
To keep in mind.
#ML #statistics #OpenScience
🎤 Upcoming at SeaGL 2025:
📍 02:00 PM on November 08
🗣️ "Hidden in Plain Sight: Addressing Data Bias in AI-Driven Systems"
👥 Speaker(s): Autumn Nash
📍 Room: Room 145
🏷️ Track: Open source AI and Data Science
📝 As AI increasingly powers critical systems across industries, the quality and neutrality of training...
#SeaGL2025 #ai #ml #performance #automation #data
🔗 https://pretalx.seagl.org/2025/talk/ETZQ8V/
Super excited that Dr. Keir Winesmith is one of the keynotes for @everythingopen #EverythingOpen #EO2026 in Canberra in January.
The #NFSA are doing incredible things with #transcription of speech archives with their Bowerbird project, and their commitment to #AI and #ML practices is sector-leading. Interested in what he has to say.
#Barcelona people, here's a rare opening for a software engineer + machine learning, LLM work, and all that jazz
This is an excellent intro to linear algebra for anyone getting into #numpy or #ML and wanting a primer.
https://little-book-of.github.io/linear-algebra/books/en-US/lab.html
Asking the Fedi.
Is the expected hyperscaling of AI data centers a generative AI/LLM thing? I cannot see "old school" ML-type things (factory and port optimisation, medical image analysis) generating that leap in data center use.
Or am I missing something?
And if it is, does that mean that, if the bubble bursts, it will take out the big Western base-load electrical demand growth story as well?