While much has been written about agent architectures, real-world production-grade implementations remain limited. Building on my earlier post about A2A fundamentals, this piece highlights the agentic AI mesh, a concept introduced in a recent McKinsey. 1
We will examine the challenges that emerge in production environments and demonstrate how our proposed architecture enables controlled scaling of AI capabilities.
Challenges in agentic systems
As teams transition from testing and experimenting with AI agents to deploying scalable, real-world use cases, several challenges emerge:
- Integration gaps: During experimentation, teams use pre-built solutions that accelerate development, but these solutions often lack a consistent approach when scaling. As a result, integration and coordination issues emerge, leading to gaps in coverage. For example, we have seen that when trying to scale AI-powered chatbots, different systems for customer data and interactions fail to sync.
- Isolation of agents: Most agents today work independently with local information. For example, a Planner, Retriever, and Executor agent connected via APIs may lack a unified context. As organizations scale to multi-agent ecosystems, the absence of shared memory and coordination becomes a key challenge.
- Operational limitations: AI agent applications can lead to unpredictable outputs and non-deterministic behavior, generating inconsistent responses or failing to deliver accurate solutions.
Introducing the agentic mesh architecture
The AI mesh envisions an “Internet for Agents,” where multiple agents can reason, collaborate, and act autonomously across a distributed network of systems and tools.
Unlike RAG pipelines or microservice APIs, it creates a system of record for agent behavior: every tool invocation, error, and outcome is distributed through the event mesh and preserved by the coordination layer.
Over time, this shared history compounds into a richer knowledge base, enabling agents to align around common context and collaborate more effectively.
How does an agentic mesh work:

1. Composability:
Any agent, tool, or model (for example, a new LLM) can be connected to the mesh without requiring changes to other components.
This modular design supports scaling by allowing organizations to add or replace capabilities incrementally, without disrupting existing workflows.
2. Parallel agent reasoning:
The mesh enables reasoning to be spread across multiple agents. This increases complexity but allows specialized agents to handle parts of a larger task instead of relying on a single LLM.
This division of labor makes it easier to scale AI systems, since workloads can be distributed across agents running in parallel.
3. Layered secoupling:
The mesh separates key functions (e.g.logic, memory, orchestration, and interfaces) into distinct layers. This means an agent’s reasoning can operate independently from its data storage or user interface.
4. Vendor neutrality:
The mesh is not tied to any single vendor or platform. Components can be replaced or updated independently, with a preference for open standards such as the Model Context Protocol (MCP) and Agent2Agent (A2A) over proprietary APIs.
For instance, Google’s A2A defines a common message format and discovery mechanism for cross-framework collaboration, while Anthropic’s MCP provides a universal way for agents to fetch data. Similar to USB, these standards enable interoperability so teams can mix tools and models from different providers without extra integration work.
5. Governed autonomy:
Agents in the mesh act on their own, but within guardrails, embedded policies, and other constraints. In other words, every autonomous action is pre-governed by rules.
Operational capabilities: How the mesh works in practice?

Each of these capabilities spans the entire mesh (not tied to a single platform) and often parallels concepts from cloud or microservices environments, such as service registries or audit logs.
Below, we outline each capability and how it operates in practice:
Agent & workflow discovery:
The mesh maintains a central directory of all available agents and workflows. This ensures teams can easily find, reuse, and integrate existing capabilities rather than rebuilding them from scratch. Agents may also publish standardized “capability cards” describing what they can do, which can be queried by other agents or human operators.
By enforcing a common taxonomy and metadata standards, organizations can also apply governance policies like restricting certain sensitive tasks to certified agents only.
AI asset registry:
The asset registry provides a repository for all critical AI assets that shape agent behavior. This includes prompts, tool definitions, model configurations, datasets, and policies. Everything in the repository is version-controlled, auditable, and subject to governance.
Essential assets often include:
- Prompts and instructions tested against jailbreaks or bias.
- Agent configurations specifying which tools, APIs, and models are permitted.
- LLM settings defining available models and parameters.
- Tool definitions and MCP servers with access controls embedded.
- Golden input/output examples that form trusted references for learning and evaluation.
Feedback management:
Feedback loops are embedded into the mesh so that every workflow execution becomes a source of learning. Metrics such as latency, accuracy, error rates, or even human ratings are collected and fed back into the system.
Compliance & risk management:
Every agentic workflow must operate within defined rules and constraints. Compliance and risk management tools are built directly into the mesh to ensure this.
For example,
- Compliance agents may audit actions against organizational or regulatory standards before outputs are finalized.
- Policies can require that sensitive tasks include checks from privacy or security agents, while audit trails log every action for later review.
Evaluation systems:
Evaluation pipelines function like integration tests for agentic workflows. It aims to ensure that workflows remain robust even when underlying LLMs change or external conditions shift.
Whenever a deployment or model update occurs, they run structured test suites to validate correctness.
They typically include:
- Step-level tests (e.g., was the right API/tool invoked?).
- Workflow-level tests (e.g., did the overall process produce the expected outcome?).
- Adversarial tests (e.g., prompt injection, misuse, denial-of-service).
Observability:
In an agentic mesh, observability ensures every agent interaction and workflow can be traced, logged, and analyzed. This capability provides end-to-end visibility into how agents collaborate, which tools are invoked, and what resources are consumed.
By centralizing metrics and event logs, organizations can detect anomalies, control costs, and verify that outputs stay within governance policies.
Emerging standards like OpenTelemetry for agents are helping to make observability interoperable across different runtimes.
Authentication and authorization:
In an agentic mesh, every agent-to-agent or agent-to-service call must be authenticated and authorized. Think of it like issuing temporary security badges: agents only get the exact permissions they need, and those expire quickly.
Using standards like OAuth 2.0, JWTs, and least-privilege access keeps interactions secure and limits the impact if one component is compromised.
Why this matters
Taken together, these capabilities turn loosely connected agents into a coherent, well-governed mesh. Workflows become supervised, auditable, and adaptive, while still maintaining the flexibility to integrate new agents, tools, or models as needed.
For instance, an Atlassian-built agent could seamlessly discover and invoke a specialist Salesforce agent through the mesh, with identity and data flows managed by shared protocols.
This is what sets an agentic mesh apart from traditional workflow management systems. Conventional orchestrators can connect APIs and tasks, but they typically lack the built-in governance, continuous feedback, and compliance mechanisms that the mesh provides.
Use cases of agentic mesh
Agentic mesh concepts are gaining traction, but real-world, production-grade implementations are still limited. Most current examples are early deployments or proofs-of-concept. That said, several vendors are beginning to showcase practical use cases:
Kubernetes and ingress control
Instead of relying solely on static ingress controllers, in an agentic mesh system, AI agents can extend Kubernetes-native environments by enabling agents to manage traffic, enforce security, and optimize workloads across APIs and event streams.
Application areas:
- Ingress control: Agents enforce authentication, TLS termination, and policy rules to protect APIs from unauthorized access.
- Cluster-aware orchestration: Agents scale workloads up or down and adjust routing strategies based on resource availability.
Real-world example:
Optimizing backend systems
An agentic mesh can help optimize backend systems by enabling agents to manage traffic, enforce policies, and balance workloads in real time.
Application areas:
- Traffic management: Apply fine-grained rate limits, quotas, and spike controls to avoid overload.
- Load balancing: Distribute incoming API calls and event-stream traffic across servers to maintain responsive services.
- Bottleneck prevention: Detect and throttle excessive API or data-stream requests to ensure consistent performance.
- Resilience and uptime optimization: Improve fault tolerance by rerouting failed API/event requests.
Real-world example:
Railway company Eurostar utilizes an agentic mesh to optimize backend systems. They manage client access to APIs in a granular way for more secure traffic control and load distribution.5
Centralized API management
An agentic mesh helps organizations centralize APIs, event streams, and AI agents into one unified platform.
Application areas:
- Multiple gateway support: Integrate APIs from various platforms like AWS, Azure, and Apigee
- Enterprise-grade authentication: Ensure proper access control to manage who can interact with APIs and agents.
Real-world example:
SKF, a manufacturing company, uses an agentic mesh platform to centralize and manage its APIs. 6
Managing and exposing real-time data and event streams
An agentic mesh helps organizations manage and secure access to real-time data and event streams, providing seamless integration and control. Think of it as a centralized hub where different systems, like APIs and event brokers, can communicate and share data efficiently.
Application areas:
- Centralized security: Ensure all data and APIs are secure and meet organizational standards.
- Protocol mediation: Convert different types of data streams (e.g., Kafka, MQTT) into common, easy-to-use formats like REST or WebSocket.
- API and event discovery: Provide a single portal for developers to find and use data and APIs.
- Unified management: Handle all types of APIs and data streams, including REST and WebSocket, in one place.
The future of agentic mesh: Just another hype?
Agentic mesh promises a transformative way for autonomous AI agents to collaborate within a structured ecosystem. However, there’s a risk it could become just another technical framework, dominated by infrastructure solutions like service meshes and integration fabrics:
- A similar pattern emerged with the data mesh concept. When Zhamak Dehghani introduced it, the idea revolutionized data management by focusing on ownership, governance, and treating data as a product. Yet, vendors quickly rebranded existing solutions as Data Mesh.7
- The same trend is now visible with agentic mesh. While the conversation is focused on technical aspects like secure communication/orchestration, these are primarily infrastructure components.
To avoid reducing it to just another Service Mesh 2.0 or Data Fabric 2.0 with AI, the real opportunity lies in focusing on value creation, not just the underlying infrastructure.
It is essential to ensure that business domains take responsibility for their agents, not just relying on middleware vendors. If organizations embrace domain ownership, stewardship, and federated governance, agentic mesh can become a powerful tool for transformation.
External Links
- 1. Seizing the agentic AI advantage | McKinsey. McKinsey & Company
- 2. Solace Agent Mesh Episode 1 - Introduction - YouTube.
- 3. How we enabled Agents at Scale in the Enterprise with the Agentic AI Mesh | by QuantumBlack, AI by McKinsey | QuantumBlack, AI by McKinsey | Medium. QuantumBlack, AI by McKinsey
- 4. Microservices and ingress controller.
- 5. Make backend systems more reliable.
- 6. Make backend systems more reliable.
- 7. https://arxiv.org/pdf/2304.01062
Comments
Your email address will not be published. All fields are required.