
just heard a really interesting talk by Ruoxi Qi from the University of Hong Kong about bias in LLMs.

They investigated LLMs' bias toward WEIRD (Western, Educated, Industrialized, Rich, Democratic) values by prompting LLMs and comparing their answers to World Values Survey (WVS) data (Haerpfer et al., 2022). The WVS contains questions about human values, with data from large representative samples in different parts of the world.

As expected, they found bias toward WEIRD values, but also bias toward East Asia and Russia, presumably reflecting the balance of the training data. In fact, whether a country was rich or not was the best predictor of bias.

A really nice summary plot of their results is Fig. 4 in the paper: a heatmap, overlaid with clustering results, that plots the distance between the model's answer distribution and the WVS distributions as a measure of value alignment! #cogsci25
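
To make the "distance between distributions" idea concrete, here is a minimal sketch of how a model's answers to one survey question could be compared with a country's answers. I don't know which distance the paper actually uses, so the Jensen-Shannon distance below is my assumption, and the function names and answer counts are made up for illustration.

```python
import math


def normalize(counts):
    """Turn raw answer counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the JS divergence), in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


# Hypothetical counts over a 4-point answer scale (not real WVS data).
model_answers = normalize([70, 20, 8, 2])      # sampled LLM responses
country_answers = normalize([30, 35, 25, 10])  # survey respondents

distance = js_distance(model_answers, country_answers)
print(f"model vs. country distance: {distance:.3f}")
```

A smaller distance would mean the model's answers look more like that country's answers. Computing it for every model and country pair would give the kind of grid a heatmap like Fig. 4 could show.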

https://escholarship.org/content/qt87d9k3tg/qt87d9k3tg.pdf