#cogsci25 a great talk by Ellie Pavlick on ‘emergent compositionality in neural networks’:

Compositionality in language and thought has been one of the long-running debates in cognitive science. It refers to the way complex meanings are built from component parts. Specifically, it's the idea that the meaning of a complex unit can be derived solely from the meanings of its parts: e.g., the meaning of "black cat" can be built up directly from the meanings of "black" and "cat".
🧵

2/ Compositionality is a hallmark of symbolic computation. The difficulty that connectionist networks had with compositionality was marshalled as a key reason for rejecting them as candidate cognitive models (see e.g., Fodor & Pylyshyn, 1988 https://uh.edu/~garson/F&P1.PDF )

So the question of whether 3rd-generation neural networks (e.g., large language models) fare better has the potential to be hugely informative for how we think about human cognition.

3/ Ellie distinguished two kinds of compositionality, structural and functional, and provided evidence of both in 3rd-generation models.
‘Structural compositionality’ refers to the way, on symbolic accounts, the constituent parts are part of the overall representation (our representations of ‘pink’ and ‘elephant’ are part of our representation of ‘pink elephant’). For neural networks, this would translate into the activations and weights of the network being organised into identifiable and re-combinable parts.

Evidence for exactly this can be found in studies like Lepori, Serre & Pavlick (2023), which shows evidence of structured representations in the weights as networks break tasks down into subroutines.

https://proceedings.neurips.cc/paper_files/paper/2023/hash/85069585133c4c168c865e65d72e9775-Abstract-Conference.html
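The idea of parts being identifiable inside the whole can be sketched in a toy way (this is not the paper's method, which probes trained subnetworks; the vectors and the additive composition here are purely illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for learned activation vectors.
pink = rng.normal(size=64)
elephant = rng.normal(size=64)

# A structurally compositional representation keeps the parts identifiable:
# here, the phrase vector is literally built from its constituents.
pink_elephant = pink + elephant

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The constituent 'pink' is recoverable from the whole by similarity,
# unlike from an unrelated random vector.
unrelated = rng.normal(size=64)
print(cosine(pink_elephant, pink))   # high
print(cosine(unrelated, pink))       # near zero
```

In a real network nothing guarantees this additive structure; the empirical question Lepori et al. ask is whether trained weights nonetheless decompose into reusable subroutines.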

4/ The second type of compositionality she distinguished is ‘functional compositionality’. Imagine a function like y = g(f(x)). As a universal function approximator, a network could learn to approximate this function's output directly, or it could derive it via intermediate computation of f(x). In support of the latter, she described new work that shows evidence of such intermediate computation.
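The distinction can be made concrete with a toy example (my illustration, not the setup from the talk; f and g here are arbitrary placeholder functions):

```python
def f(x):
    # some intermediate quantity
    return x * x

def g(z):
    return z + 1

# Compositional route: the intermediate value f(x) exists as an explicit
# step of the computation, which is what probing studies try to detect.
def compositional(x):
    intermediate = f(x)
    return g(intermediate), intermediate

# Monolithic route: identical input-output behaviour, but no point in the
# computation where f(x) is represented on its own.
def monolithic(x):
    return x * x + 1

y1, mid = compositional(3)
y2 = monolithic(3)
print(y1, y2, mid)  # same outputs; only one route exposes f(3)
```

The behavioural equivalence of the two routes is exactly why input-output tests alone can't settle the question, and why the evidence has to come from looking inside the model.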

5/ In short, LLMs can learn representations that are structured and modular in both activation and weight space. But at the same time, they remain context-sensitive, so they capture ways in which human cognition deviates from purely symbolic architectures. In this way, they can move this long-standing debate forward by providing an example of a computational system that combines these properties.
