Enabling Communication in Multi-Agent VLAs

Presenter: Dorian Benhamou Goldfajn

Faculty Sponsor: Shlomo Zilberstein

School: UMass Amherst

Research Area: Computer Science

Session: Poster Session 3, 1:15 PM - 2:00 PM, 163, C24

ABSTRACT

Vision–language–action models (VLAs) have emerged as a promising approach for enabling general-purpose robots by jointly learning visual perception, natural-language understanding, and motor control within a single framework. However, training-data requirements increase exponentially as the action space grows, exposing a fundamental bottleneck in their ability to execute high-dimensional actions. Additionally, many real-world domains, including search-and-rescue and disaster response, inherently require multiple robots to coordinate actions and share information in real time. Despite recent progress, existing VLA architectures lack explicit mechanisms for inter-agent communication and coordination.


This limitation motivates VLAs that can communicate in multi-agent scenarios, distributing action complexity across agents and extending overall system capability. This proposal aims to develop novel VLA architectures that support explicit communication among multiple agents, enabling real-time coordination and information sharing.


Natural language provides an intuitive starting point for inter-agent communication, leveraging the language-model backbone within VLA architectures. However, long-form language is inefficient for high-throughput, low-latency robotic coordination. More effective alternatives include constrained vocabularies, structured symbolic messages, and compact latent representations that encode agent intent or planned actions, preserving semantic structure while enabling scalable real-time coordination. The proposed communication pipeline takes one of two forms: annotating offline trajectories with messages drawn from a compact, predefined vocabulary, or learning communication protocols from environment interaction through multi-agent reinforcement learning. The resulting models will be evaluated on a suite of scalable multi-agent robotics tasks developed in IsaacLab, designed to test coordination efficiency, generalization, and robustness across increasing task complexity.
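
As a concrete illustration of the constrained-vocabulary option, the minimal PyTorch sketch below shows one way a discrete message channel could sit alongside an agent's action selection. It is a toy, not the proposed architecture: the CommAgent module, the eight-token vocabulary, the mean-pooled message "inbox", and all layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class CommAgent(nn.Module):
    # Toy agent: encodes an observation, emits one token from a small
    # message vocabulary, then picks an action conditioned on teammates'
    # messages. All dimensions are illustrative placeholders.
    def __init__(self, obs_dim=16, hidden=64, vocab_size=8, n_actions=6):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.msg_head = nn.Linear(hidden, vocab_size)      # what to "say"
        self.msg_embed = nn.Embedding(vocab_size, hidden)  # how to "listen"
        self.action_head = nn.Linear(2 * hidden, n_actions)

    def speak(self, obs):
        h = self.encoder(obs)
        token = torch.distributions.Categorical(
            logits=self.msg_head(h)).sample()              # discrete message
        return h, token

    def act(self, h, teammate_tokens):
        # Pool incoming message embeddings, then choose an action.
        inbox = self.msg_embed(teammate_tokens).mean(dim=0, keepdim=True)
        logits = self.action_head(torch.cat([h, inbox], dim=-1))
        return torch.distributions.Categorical(logits=logits).sample()

# One communication round between two agents on dummy observations.
a, b = CommAgent(), CommAgent()
h_a, tok_a = a.speak(torch.randn(1, 16))
h_b, tok_b = b.speak(torch.randn(1, 16))
act_a = a.act(h_a, tok_b)  # agent A conditions on B's message
act_b = b.act(h_b, tok_a)

Under the reinforcement-learning variant of the pipeline, both heads would be optimized jointly from task reward, so tokens acquire meaning only insofar as they improve coordination; under the annotation variant, the message head would instead be supervised by the predefined vocabulary labels on offline trajectories.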

