Using #AR, we carefully isolate speech and gestures, removing other cues (e.g., gaze, facial expressions). This allows us to analyze how partners coordinate on abstractions and how information shifts across these modalities over time.
We develop a computational model that extends the Rational Speech Act (RSA) framework to multimodal settings and simulates the behaviors we observe (toy sketch after this tweet).
2/4
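A minimal sketch of the kind of multimodal RSA model this could involve. This is not the paper's actual model: the toy domain, utterance set, semantics, and per-modality costs are all hypothetical, and only the standard RSA recursion (literal listener, pragmatic speaker, pragmatic listener) is assumed.

```python
import numpy as np

# Hypothetical toy domain: two possible block structures the speaker may mean.
states = ["tower", "arch"]

# Multimodal utterances as (speech, gesture) pairs; None = modality unused.
utterances = [
    ("the tower", None),
    ("the arch", None),
    (None, "point-up"),          # gesture alone, ambiguous
    ("tower", "point-up"),       # redundant speech + gesture
]

# Literal semantics (hypothetical): 1 if the utterance is compatible with the state.
meanings = np.array([
    # tower arch
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [1.0, 0.0],
])

def cost(u):
    """Per-modality production cost: longer speech costs more, gesture has a flat cost."""
    speech, gesture = u
    c = 0.0
    if speech is not None:
        c += 0.1 * len(speech.split())
    if gesture is not None:
        c += 0.2
    return c

prior = np.ones(len(states)) / len(states)
alpha = 4.0  # speaker rationality

# Literal listener: L0(s | u) proportional to [[u]](s) * P(s)
L0 = meanings * prior
L0 = L0 / L0.sum(axis=1, keepdims=True)

# Pragmatic speaker: S1(u | s) proportional to exp(alpha * (log L0(s | u) - cost(u)))
costs = np.array([cost(u) for u in utterances])
with np.errstate(divide="ignore"):
    utility = alpha * (np.log(L0) - costs[:, None])
S1 = np.exp(utility)
S1 = S1 / S1.sum(axis=0, keepdims=True)

# Pragmatic listener: L1(s | u) proportional to S1(u | s) * P(s)
L1 = S1 * prior
L1 = L1 / L1.sum(axis=1, keepdims=True)

for u, row in zip(utterances, L1):
    print(u, np.round(row, 2))
```

Under these made-up costs, a gesture alone shifts interpretation toward the referent a cooperative speaker would not have described as cheaply in speech, illustrating how per-modality costs let the recursion trade off speech and gesture.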
🤖 Our findings suggest strategies for future convention-aware multimodal agents that: (1) learn users’ chunked conventions as they emerge, (2) shift to abstract-first instructions over time, (3) adapt modality to evolving user preferences, and (4) use redundancy to highlight changes from prior interactions.
3/4
If you caught @jefan presenting our poster at #CogSci2025, here is the full paper, to appear at #CHI2026:
“Gesturing Toward Abstraction: Multimodal Convention Formation in Collaborative Physical Tasks”
🔗 https://multimodal-conventions.github.io
📄 https://arxiv.org/pdf/2602.08914
@hci 4/4