Designing machine minds requires a level of epistemological understanding. What makes good training data? What is data quality?
It's not enough to study old philosophical texts on epistemology, because those texts aren't in themselves the core truth of the topic. Building machine minds shows us what knowledge actually is, and what data quality actually is, in ways that weren't accessible to past philosophers. We are doing applied epistemology in AI.
Saying, for example, that AI slop content on the internet will lead to model collapse is an inherently epistemological claim, and it is certainly true for naive data pipelines that aren't designed with this understanding.
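As a toy illustration of that claim (my own sketch, not a description of any specific pipeline), the classic fit-and-resample loop shows the failure mode: a model trained purely on the previous model's outputs, with no fresh or curated data mixed in, loses diversity generation by generation.

```python
import numpy as np

# Toy sketch of naive self-training: each "generation" fits a Gaussian
# to samples drawn from the previous generation's fit. With only
# synthetic data and no fresh or curated data mixed in, the fitted
# variance tends to collapse over the generations, a minimal analogue
# of model collapse from training on unfiltered model output.

rng = np.random.default_rng(0)
n_samples = 20          # assumed small dataset per generation
mu, sigma = 0.0, 1.0    # the "real world" distribution we start from

data = rng.normal(mu, sigma, n_samples)
for generation in range(1, 101):
    # "Train": estimate the distribution from the current dataset.
    mu_hat, sigma_hat = data.mean(), data.std()
    # "Generate": the next dataset is purely synthetic output.
    data = rng.normal(mu_hat, sigma_hat, n_samples)
    if generation % 20 == 0:
        print(f"gen {generation:3d}: mu={mu_hat:+.3f} sigma={sigma_hat:.3f}")
```

Running it, the fitted sigma drifts toward zero; mixing fresh or curated real data back in at each generation counteracts the drift, which is exactly where data design informed by these questions comes in.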
So I have my own epistemological ideas, which support my views on how to push the envelope in AI further. For example:
- Intelligence is competence in an open set of ridiculously transferable cognitive skills. Nothing more, nothing less.
- Data quality relates to how the data can be utilized in training intelligent models. Hence good-quality data, regardless of domain or modality, has good coverage of high-quality knowledge and traces of competent skill use.
- Natural data is not the gold standard, and neither is human-imitative data. We can refine data by different means, human or machine. Wikipedia is itself an example of a refined data repository: humans refined it from source data of lower quality.
- High-quality data can be refined from lower-quality data, as long as the truth and the skills are in the data, the raw digital ore. That ore can be sourced from trial-and-error evolutionary processes or from real-world measurements and experiments, or produced by applying compute to derive new knowledge and skills from the knowledge and skills already there (see the sketch after this list).
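To make the refinement idea concrete, here is a minimal sketch of one such step: score raw documents and keep only those that clear a quality bar. The heuristics, word lists, and threshold below are illustrative stand-ins of my own, not a real production pipeline; an actual pipeline would layer deduplication, learned quality classifiers, and model-based rewriting on top.

```python
import re

def quality_score(doc: str) -> float:
    """Crude heuristic quality score: rewards lexical diversity,
    penalizes obvious boilerplate. Purely illustrative."""
    words = re.findall(r"[a-zA-Z']+", doc.lower())
    if len(words) < 5:
        return 0.0
    lexical_diversity = len(set(words)) / len(words)      # repetition penalty
    boilerplate = sum(w in {"click", "subscribe", "cookie"} for w in words)
    return lexical_diversity - 0.2 * boilerplate

def refine(raw_docs: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only documents whose heuristic score clears the threshold."""
    return [d for d in raw_docs if quality_score(d) >= threshold]

raw_ore = [
    "Click here to subscribe! Click click click.",
    "The melting point of gallium is 29.76 degrees Celsius, "
    "so it melts in your hand.",
]
print(refine(raw_ore))   # only the informative document survives
```

The point is not these particular heuristics but the shape of the operation: compute applied to raw ore to concentrate the knowledge and skill traces we actually want to train on.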
It is not possible to design the next generation of machine minds without at least implicitly basing it on some epistemological insights. Doing otherwise is just struggling blindly in the dark, succeeding only by chance.