Parastoo Abtahi
@parastoo@hci.social · last week

People form ad hoc conventions by establishing linguistic and gestural abstractions, and they shift information across speech and gesture to communicate more efficiently over time.

In our upcoming #CHI2026 paper, we study how these multimodal communications evolve in repeated physical collaboration.

Led by Kiyosu Maeda in close collaboration with @jefan, @rdhawkins, and team: William McCarthy, Ching-Yi Tsai, Jeffrey Mu, and Haoliang Wang.

🧵👇 1/4

Top: Modality shift in block instruction: R1: Take the green block and put it on the left side of the grid. A hand is holding an imaginary piece toward the left column of a 2×2 grid; label reads Redundant position and orientation. R4: the green block pointing this way. A hand is pointing near the bottom left cell with an arrow showing movement toward the top left cell; label reads Complementary position and orientation. Target tower: a 3 block green and red C-shape tower on a 2x2 grid. Bottom: Modality shift in tower instruction: R1: they are going to form a C-shape. A c-shape hand pose with the index and thumb is shown far from the grid; the label reads No information about position or orientation. R4: Put the C on the left side, facing away from you. Right hand shows the C shape facing away, and left hand with the palm open indicates placement on the left side; labels read Redundant position and orientation.
Parastoo Abtahi
@parastoo@hci.social · last week

Using #AR, we carefully isolate speech and gestures, removing other cues (e.g., gaze, facial expressions). This allows us to analyze how partners coordinate on abstractions and how information shifts across these modalities over time.

We develop a computational model, extending the Rational Speech Act (RSA) framework to multimodal settings, and simulate the behaviors we observe.
2/4
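
For readers new to RSA, here is a minimal sketch of the standard single-modality recursion on a toy reference game. It is not the multimodal model from the paper; the objects, utterances, uniform priors, zero utterance cost, and all function names are assumptions made up for illustration.

```python
# Toy reference game (hypothetical, not from the paper):
# three objects and four one-word utterances.
OBJECTS = ["blue_square", "blue_circle", "green_square"]
UTTERANCES = ["blue", "green", "square", "circle"]


def is_true(utterance, obj):
    # An utterance is literally true of an object if it names one of its features.
    return utterance in obj.split("_")


def normalize(weights):
    # Turn non-negative weights into a probability distribution.
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def literal_listener(utterance):
    # L0(o | u): uniform over the objects the utterance is literally true of.
    return normalize({o: 1.0 if is_true(utterance, o) else 0.0 for o in OBJECTS})


def pragmatic_speaker(obj, alpha=1.0):
    # S1(u | o) proportional to L0(o | u) ** alpha (utterance cost assumed zero).
    return normalize({u: literal_listener(u)[obj] ** alpha for u in UTTERANCES})


def pragmatic_listener(utterance, alpha=1.0):
    # L1(o | u) proportional to S1(u | o), assuming a uniform prior over objects.
    return normalize({o: pragmatic_speaker(o, alpha)[utterance] for o in OBJECTS})


if __name__ == "__main__":
    # "blue" is ambiguous literally, but the pragmatic listener favors the
    # blue square, since a speaker meaning the blue circle would more likely
    # have said "circle".
    print(pragmatic_listener("blue"))
```

A multimodal extension would presumably let the speaker choose over combinations of speech and gesture rather than single words, but how the paper does this is not specified in the thread.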

Parastoo Abtahi
@parastoo@hci.social · last week

🤖 Our findings suggest strategies for future convention-aware multimodal agents that: (1) learn users’ chunked conventions as they emerge, (2) shift to abstract-first instructions over time, (3) adapt modality to evolving user preferences, and (4) use redundancy to highlight changes from prior interactions.
3/4

Parastoo Abtahi
@parastoo@hci.social · last week

If you saw @jefan present our poster at #CogSci2025, the full paper will appear at #CHI2026:

“Gesturing Toward Abstraction: Multimodal Convention Formation in Collaborative Physical Tasks”
🔗 https://multimodal-conventions.github.io
📄 https://arxiv.org/pdf/2602.08914

@hci 4/4

Screenshot of the paper. Teaser figure: five-panel teaser showing a shift from block-by-block to abstract tower descriptions. Panel 1 shows the first L-shaped tower made from three LEGO blocks (blue base, two red blocks stacked). A speech bubble says Put a blue block on the front side of the grid, with a hand precisely placing an imaginary block on a two-by-two grid. Panel 2 shows a speech bubble saying a red block on top of the blue, on the left side, with a hand holding an imaginary block vertically above the previous position. Panel 3 shows a speech bubble saying then another red block on top of that, with the right hand stacking another imaginary block. Panel 4 shows a speech bubble saying like an L shape. Two hands depict an L-shape gesture representing tower shape without position or orientation. The final panel shows the same tower in a different position and orientation, with a speech bubble reading Put a backward L-shape tower on the back of the grid and a hand indicating the back row of the grid.