feat(spec): Introduce native Pub/Sub primitives for scalable multi-agent collaboration #1196

Open
aglicacha wants to merge 1 commit into a2aproject:main from aglicacha:feat-1029-pub/sub

@aglicacha aglicacha commented Nov 8, 2025

A Note on the "Runtime"
Throughout this proposal, the term "runtime" is used to describe the logical entity responsible for managing topics, handling subscriptions, and routing EventMessages from publishers to subscribers. It's crucial to understand that this "runtime" is an abstract role within the Pub/Sub pattern, not a prescribed component of the A2A protocol itself.

We refer to the concrete implementation of this role as the Runtime.

The A2A protocol deliberately does not dictate how this Runtime should be implemented. The choice of implementation is left to the system architect and depends entirely on the specific requirements of their environment. For instance, the Runtime could be:

A dedicated, standalone Agent that programmatically manages topics and subscriber lists.

A facade layer built on top of mature, battle-tested message queuing infrastructure such as RocketMQ, Kafka, or cloud-native services like AWS SNS/SQS or Google Cloud Pub/Sub.

Our strong recommendation is to leverage existing message queue infrastructure to fulfill the Runtime's responsibilities. This approach allows developers to benefit from the scalability, reliability, and rich feature sets of these specialized systems, while the A2A protocol remains focused on defining the interoperable contract for agent-to-agent communication.

In essence, the A2A protocol defines the language agents use to talk about Pub/Sub; the Runtime is the engine that makes the conversation happen.
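
To make the Runtime's role more concrete, here is a minimal in-memory sketch in Python. It is not part of the proposal: the class `InMemoryRuntime`, its method names, and the callable-based subscriber are all hypothetical, and a real deployment would more likely delegate this work to a message queue, as recommended above.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical handler type: a subscriber is a callable taking (topic, payload).
Handler = Callable[[str, Dict[str, Any]], None]


class InMemoryRuntime:
    """Illustrative stand-in for the Runtime role: tracks topics and routes events."""

    def __init__(self) -> None:
        # topic name -> handlers currently subscribed to it
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topics: List[str], handler: Handler) -> None:
        """Register one handler for every topic in the list (idempotent per topic)."""
        for topic in topics:
            if handler not in self._subscribers[topic]:
                self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: Dict[str, Any]) -> int:
        """Route the payload to every subscriber of the topic; return the fan-out count."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


received = []
runtime = InMemoryRuntime()
runtime.subscribe(["alerts"], lambda t, p: received.append(("agent-a", t, p)))
runtime.subscribe(["alerts", "metrics"], lambda t, p: received.append(("agent-b", t, p)))

# The publisher names only the topic; it never learns who the subscribers are.
delivered = runtime.publish("alerts", {"severity": "high"})
```

The publisher and the two subscribers share nothing except the topic name, which is the decoupling the proposal is after.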

Context & Motivation
The current A2A specification is built on a powerful point-to-point (P2P) model. This is excellent for direct request/response interactions. However, building scalable and resilient multi-agent systems requires a decoupled, event-driven communication pattern, for which Publish/Subscribe (Pub/Sub) is the standard.

Analysis of community examples and our production practice demonstrates that developers must currently implement a fragile, inefficient, and centralized "router" actor in the application layer to simulate Pub/Sub. This approach introduces a single point of failure and a performance bottleneck, while pushing infrastructure concerns (message routing) onto the developer.

To enable true multi-agent autonomy and system evolvability, Pub/Sub should be a first-class citizen in the A2A protocol.

Having analyzed the current A2A specification, we found that it provides three well-defined communication patterns:

  • RPC (Remote Procedure Call): Command-oriented requests for managing Task lifecycles (e.g., GetTaskRequest, CancelTaskRequest).

  • Stateful Object Observation (Webhooks): A mechanism to subscribe to state changes of a single resource instance (e.g., SetTaskPushNotificationConfigRequest).

  • Conversational Messaging: A direct, 1-to-1, request/response paradigm for interactive dialogue (SendMessageRequest).

These patterns serve their purpose well but lack a native mechanism for broadcasting information to a dynamic set of interested parties in a decoupled manner. This PR introduces the Pub/Sub pattern to fill this architectural gap, enabling use cases like system-wide alerts, real-time data feeds, and multi-service event notifications.

Fixes #1029

@aglicacha aglicacha force-pushed the feat-1029-pub/sub branch 3 times, most recently from 9e0fe73 to 9147c61, November 8, 2025 11:25
@aglicacha aglicacha marked this pull request as ready for review November 10, 2025 13:03
@aglicacha aglicacha requested a review from a team as a code owner November 10, 2025 13:03
@darrelmiller
Contributor

Can I suggest that you wait a few days while we merge PR #1160 as both of the files that you changed will no longer be part of the specification as they do not contain normative content? Then it will be easier to discuss the changes you are proposing.

@aglicacha
Author

@darrelmiller Thank you for the reminder. I will temporarily stop modifying these files and submit the changes after the refactoring is merged.

@aglicacha aglicacha force-pushed the feat-1029-pub/sub branch 3 times, most recently from 8635a06 to d6f334f, December 7, 2025 10:03
string topic = 1 [(google.api.field_behavior) = REQUIRED];
// The content of the message. The structure of the payload is contractually defined by the topic itself and is opaque to the runtime.
// To promote interoperability, it is highly recommended that this payload conforms to a standard event envelope format, such as the CloudEvents specification.
google.protobuf.Struct payload = 2 [(google.api.field_behavior) = REQUIRED];

@vongosling vongosling Dec 8, 2025

The Struct type is a JSON object. If we send binary data or a simple string, do we need to convert it?

Author

That's a good question. The information being published may exist in different forms, so it may be better to impose fewer constraints on its content format. Perhaps using bytes would be better? @darrelmiller What do you think?
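
The conversion cost raised in this thread can be illustrated with a short Python sketch that uses only the standard library. With a JSON-object (Struct-like) payload, a bare string or a binary blob has to be wrapped in an object, and binary data additionally needs a text encoding such as base64; a bytes-typed field could carry the same content unchanged. The wrapper keys `value` and `data_base64` are hypothetical names chosen for this sketch, not proposed fields.

```python
import base64
import json
from typing import Any, Dict, Union

Content = Union[str, bytes, Dict[str, Any]]


def to_struct_payload(content: Content) -> Dict[str, Any]:
    """Wrap arbitrary content so it fits a JSON-object (Struct-like) payload.

    The keys "value" and "data_base64" are hypothetical, chosen for this sketch.
    """
    if isinstance(content, dict):
        return content  # already a JSON object, no conversion needed
    if isinstance(content, str):
        return {"value": content}  # a bare string must be wrapped in an object
    # Raw bytes are not valid JSON, so they need a text encoding first.
    return {"data_base64": base64.b64encode(content).decode("ascii")}


def from_struct_payload(payload: Dict[str, Any]) -> Content:
    """Undo the two wrapper shapes above; any other object passes through as-is."""
    if set(payload) == {"value"} and isinstance(payload["value"], str):
        return payload["value"]
    if set(payload) == {"data_base64"}:
        return base64.b64decode(payload["data_base64"])
    return payload


binary = b"\x00\x01\xff"
wrapped = to_struct_payload(binary)
wire = json.dumps(wrapped)  # what would travel as the JSON form of the payload
restored = from_struct_payload(json.loads(wire))
```

One weakness of this wrapping approach is visible in the sketch itself: a genuine object payload that happens to look like `{"value": "..."}` cannot be told apart from a wrapped string, so publishers and subscribers would need an agreed convention per topic.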

};
}
// Subscribe to a set of topics.
rpc Subscribe(SubscribeRequest) returns (google.protobuf.Empty) {

We don't care whether the event is pulled by the agent or pushed to it, right? Returning Empty suggests exactly that.

Author

Indeed, the pub/sub model does not restrict how messages are specifically sent and received, which provides implementers with ample freedom and extensibility. @darrelmiller Glad to hear your opinions and suggestions.
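
As a rough illustration of that freedom, the Python sketch below shows one hypothetical runtime offering two delivery styles for the same topic: pushing each event to a callback, or buffering events in a mailbox that the subscriber pulls from later. The names `DeliveryRuntime`, `subscribe_push`, `subscribe_pull`, and `pull` are assumptions made for this sketch and are not part of the proposed API. In both styles the subscribe call returns nothing, mirroring the `google.protobuf.Empty` response.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, List, Optional, Tuple

Event = Tuple[str, Dict[str, Any]]  # (topic, payload)


class DeliveryRuntime:
    """Hypothetical runtime that supports both push and pull delivery."""

    def __init__(self) -> None:
        self._push: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)
        self._pull: Dict[str, List[str]] = defaultdict(list)  # topic -> subscriber ids
        self._mailboxes: Dict[str, Deque[Event]] = defaultdict(deque)

    def subscribe_push(self, topics: List[str], callback: Callable[[Event], None]) -> None:
        for topic in topics:
            self._push[topic].append(callback)

    def subscribe_pull(self, topics: List[str], subscriber_id: str) -> None:
        for topic in topics:
            self._pull[topic].append(subscriber_id)

    def publish(self, topic: str, payload: Dict[str, Any]) -> None:
        event = (topic, payload)
        for callback in self._push.get(topic, []):
            callback(event)  # push: delivered immediately
        for subscriber_id in self._pull.get(topic, []):
            self._mailboxes[subscriber_id].append(event)  # pull: buffered until fetched

    def pull(self, subscriber_id: str) -> Optional[Event]:
        """Return the oldest buffered event for the subscriber, or None if empty."""
        mailbox = self._mailboxes.get(subscriber_id)
        return mailbox.popleft() if mailbox else None


pushed: List[Event] = []
rt = DeliveryRuntime()
subscribe_result = rt.subscribe_push(["alerts"], pushed.append)
rt.subscribe_pull(["alerts"], "agent-b")
rt.publish("alerts", {"severity": "high"})
first = rt.pull("agent-b")
second = rt.pull("agent-b")
```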

@aglicacha aglicacha changed the title from "feat(spec): Introduce native Pub/Sub primitives for scalable multi-agent collaboration" to "feat(spec): Introduce native Pub/Sub primitives for scalable multi-agent collaborations" Dec 8, 2025
@aglicacha aglicacha changed the title from "feat(spec): Introduce native Pub/Sub primitives for scalable multi-agent collaborations" to "feat(spec): Introduce native Pub/Sub primitives for scalable multi-agent collaboration" Dec 8, 2025
@aglicacha aglicacha force-pushed the feat-1029-pub/sub branch 3 times, most recently from fab7028 to 5eca2a5, December 16, 2025 02:52
post: "/v1/publish"
body: "*"
};
}
Contributor

I'm trying to work out what this Publish operation means for an Agent. Assuming that the A2A protocol is served by an Agent, if I send it a "Publish(...)", am I asking the Agent itself to publish a message on the topic? Or is this method really used by "the runtime" to trigger the agent when a new message arrives on a topic? In that case I think it needs a different name, and it becomes a question of whether this should be on the API surface at all (i.e., the agent could internally subscribe to the topic and handle events).

Development

Successfully merging this pull request may close these issues.

[Feat]: Support publish/subscribe methods for async communications

4 participants