We spent the last quarter testing AI agents across coding, customer service, sales, research, and business workflows. Not reading vendor marketing; actually using these tools daily to see what delivers and what’s hype.
Despite talk about “autonomous AI,” most tools today are co-pilots, not autopilots. They handle research and automate repetitive tasks, but still need humans to make the actual decisions.
Examples of popular agentic-style platforms and tools
- n8n: Business workflow orchestration
- Tidio’s Lyro: SMB-centric agentic live chat
- Sully.ai: Healthcare research and workflow automation
- AiSDR: AI sales development
- Creatio: Enterprise workflow automation
- Cursor: AI code editing
- Otter.ai: AI note-taking
- Averi: AI marketing content creation
- Make (Celonis): Scalable low-code automation
- Kompas AI: Deep research and report generation
- LangGraph: Production-grade framework for complex agentic workflows
- Beam AI: Document-heavy workflows
- Relevance AI: Embedded analytics + decision flows
- IBM watsonx Orchestrate: Enterprise-grade orchestration
What Is an AI Agent?
An AI agent loops: it plans, calls tools, observes the results, and repeats until the task is done. That’s the core difference from a chatbot, which answers once and stops.
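Here is a minimal sketch of that loop in Python. The `call_llm` and `run_tool` helpers are hypothetical stand-ins for your model provider and tool executor, not any real SDK:

```python
# Minimal agent loop: the model decides, a tool runs, the result feeds back in.
# call_llm and run_tool are hypothetical stand-ins, not a real SDK.

def call_llm(messages: list) -> dict:
    """Stand-in for a chat-completion API call; returns the model's next move."""
    raise NotImplementedError  # wire up your provider here

def run_tool(name: str, args: dict) -> str:
    """Stand-in for executing a tool (API call, shell command, database query)."""
    raise NotImplementedError

def agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        move = call_llm(messages)            # model picks: final answer or tool call
        if move.get("tool") is None:
            return move["content"]           # done: model produced a final answer
        observation = run_tool(move["tool"], move.get("args", {}))
        messages.append({"role": "assistant", "content": str(move)})
        messages.append({"role": "tool", "content": observation})
    return "stopped: step limit reached"     # hard cap prevents runaway loops
```

A chatbot is a single `call_llm` call. An agent wraps that call in the loop above and feeds tool results back in until the task is done.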
However, there is no strict definition of what an “Agent” can be; it can be defined in several ways:
- Traditional AI defines agents as systems that interact with their environment.
- Some analytics firms define agents as fully autonomous systems that operate independently over extended periods, using tools such as functions or APIs to engage with their surroundings and make decisions based on context and goals.2
- Others use the term for more prescriptive implementations that follow predefined workflows.3
The sections below break down the factors that make an AI system count as more agentic. For a real-world example, Humanlayer runs an open-source software agent that manages deployments;4 the conversations are published on GitHub.5
The “Agentic” Spectrum
Not all “AI agents” are equally autonomous. Here’s what actually differentiates them:
Level 1: Rule-based automation. Responds to triggers, no contextual understanding. Example: “When a support ticket arrives, categorize it by keyword.”
Level 2: Strategic task automation. Follows a workflow with some contextual awareness. Example: a code assistant that analyzes your codebase before suggesting improvements.
Level 3: Context-aware and reflective (most current “agents”). Writes code, runs it in a test environment, evaluates results, iterates. Examples: Cursor with agentic mode, Claude Code, most tools marketed as “agents” today.
Level 4: Highly autonomous (still mostly aspirational). Deploys tested applications to production via automated pipelines, triggered by natural language, finalized with human approval.
Most tools claiming to be “AI agents” in 2026 operate at Level 2-3. Level 4 exists in limited, controlled environments but isn’t production-ready for most companies.
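To make the bottom of that spectrum concrete: the Level 1 example above is just a keyword table, sketched below with hypothetical routes. A Level 3 agent replaces the table with a model-driven loop like the one shown earlier.

```python
# Level 1: rule-based trigger. No model, no context, no iteration.
KEYWORD_ROUTES = {
    "refund": "billing",
    "password": "account-access",
    "crash": "engineering",
}

def categorize_ticket(text: str) -> str:
    lowered = text.lower()
    for keyword, queue in KEYWORD_ROUTES.items():
        if keyword in lowered:
            return queue
    return "general"  # no contextual understanding: unseen phrasing falls through
```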
Capabilities of agentic AI systems
Adapted from Cobus Greyling.6
Read more: Enterprise AI agents, AI agent builders, large action models (LAMs), and agentic AI in cybersecurity.
Tools That Actually Work (Based on Real Testing)
Coding Agents
Cursor remains the most widely adopted among individual developers. It’s the baseline everyone compares against. In 2025-2026 Reddit threads, even people who prefer other tools mention Cursor as their reference point.
Where it shines:
- Smooth IDE integration (feels like native VSCode)
- Fast context switching between files
- “Flow” prioritized over raw intelligence
- Composer feature for multi-file edits
Where it struggles:
- Gets expensive at scale ($20/user/month adds up)
- Less capable than Claude for architectural reasoning
- Can hallucinate on complex codebases
Claude Code is what developers escalate to when Cursor fails. Across late-2025 developer discussions, Claude gets described as the most capable model for deep reasoning, debugging, and architectural changes.
Developers at companies like Faros AI use Claude Code almost exclusively, impressed by speed, intelligence, and ease of use. The trust level is higher when people give it harder problems.
Where it shines:
- Unraveling subtle bugs
- Reasoning about unfamiliar codebases
- Design-level changes requiring architectural understanding
The catch:
- Cost comes up frequently in discussions
- Some feel Claude performs better through other tools (Cline, Aider) that give more control over context
- Not as “smooth” for rapid iteration as Cursor
GitHub Copilot introduced repository intelligence in 2025, understanding code relationships and development history beyond individual files. It analyzes patterns across entire codebases to surface contextually relevant suggestions.
Activity reached unprecedented levels in 2025: developers merged 43 million pull requests per month (a 23% increase over 2024).
Emerging tools generating real discussion:
- Kiro: Spec-driven development and DevOps automation. Early impressions point to performance issues.
- Kilo Code: Gaining traction for structured modes and tighter context handling. Resonates with developers burned by hallucinating agents.
- Zencoder: Shows promise in spec-driven workflows but lacks long-term user reports.
Business Workflow Agents
n8n handles workflow orchestration for teams that need custom automation without an enterprise budget.
What it actually does well:
- Connects 400+ services via pre-built nodes
- Visual workflow builder (no code required)
- Can handle complex conditional logic
- Self-hostable for data sovereignty
Reality check:
- Initial setup takes time
- Debugging workflows can be painful
- Documentation quality varies by integration
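A common integration pattern is to put a workflow behind n8n’s Webhook trigger node and call it from your own code. A minimal sketch, assuming a self-hosted instance at a hypothetical URL (the path is whatever you configure on the Webhook node):

```python
import requests

# Hypothetical URL: workflows behind a Webhook trigger node listen at /webhook/<path>.
N8N_WEBHOOK_URL = "https://n8n.example.com/webhook/new-support-ticket"

def trigger_workflow(ticket_id: str, subject: str) -> dict:
    """Kick off the workflow; returns whatever its Respond to Webhook node sends back."""
    resp = requests.post(
        N8N_WEBHOOK_URL,
        json={"ticket_id": ticket_id, "subject": subject},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(trigger_workflow("T-1042", "Order arrived damaged"))
```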
IBM watsonx Orchestrate targets enterprise-grade orchestration with governance and security built in. Designed for regulated industries where audit trails and compliance matter.
Comes with enterprise overhead:
- Longer implementation timelines
- Higher cost
- Requires IBM ecosystem buy-in
Relevance AI combines embedded analytics with decision flows. Succeeds by deeply integrating with common enterprise platforms (Salesforce, Slack, Notion, Google Analytics).
Customer Service Agents
Tidio’s Lyro focuses on SMB live chat with agentic capabilities.
Real performance from users:
- Handles 70-80% of common questions without human intervention
- Gets better with feedback over the first few months
- Falls apart on nuanced questions requiring empathy
Not good for: Complex customer situations requiring judgment calls.
Research and Analysis
Kompas AI specializes in deep research and report generation.
What makes it different:
- Actually reads and synthesizes academic papers
- Maintains citations properly
- Continuous monitoring for new publications
- Integrates with arXiv, PubMed, SSRN
Trade-offs:
- Slower than general-purpose AI
- Optimizes for accuracy over speed
- More expensive per query
Beam AI handles document-heavy workflows.
Otter.ai remains solid for meeting notes but hasn’t evolved much beyond transcription + basic summarization.
Healthcare and Specialized Domains
Sully.ai is built specifically for healthcare research and workflow automation.
Key features:
- HIPAA-compliant data handling
- EHR/EMR integration
- Medical dictation and transcription
- Understands medical terminology and context
Why specialization matters: Healthcare agents need to understand domain-specific workflows, regulatory requirements, and terminology. General-purpose agents fail here because they lack the embedded knowledge.
AiSDR (AI sales development) takes the same approach for sales: it embeds knowledge of sales workflows, CRM integration, and objection-handling patterns that general-purpose agents don’t grasp.
Use cases of AI agents
AI agents are used across many roles and industries. Below, we’ve listed some of the most common ways AI agents are being put to work:
- Developers
- SecOps assistants
- Human-like gaming characters
- Content creators
- Insurance assistants
- Human resources (HR) assistants
- Customer service assistants
- Research assistants
- Computer-use agents
- AI agent builders
Note that some of these are agentic use cases: agentic AI encompasses and extends traditional AI agents by adding autonomy, memory, reasoning, and goal-directed behavior.
What Differentiates Actually Useful Agents
1. Autonomy vs. Control Trade-off
The biggest decision: How much independence do you actually want?
Co-pilot agents (Cursor, Otter, most business tools) maintain human oversight at key decisions. They handle research and execution but require approval before critical actions.
Strategic automation (n8n, Make) follows predefined workflows with minimal real-time decision-making. Predictable and reliable but can’t adapt when encountering unexpected scenarios.
Rule-based systems respond to triggers without contextual understanding. Not really “agentic” but valuable for straightforward automation.
Most companies in 2026 use Level 2-3 agents. Full autonomy (Level 4) creates more problems than it solves unless you’ve built extensive guardrails.
2. Specialized vs. General-Purpose
Specialized agents embed deep domain knowledge. They understand industry workflows, terminology, and compliance requirements.
Higher success rates within their domain. Completely unsuitable for adjacent use cases.
Horizontal platforms (LangGraph, watsonx Orchestrate, Relevance AI) provide flexible frameworks for building custom agents. They sacrifice domain optimization for versatility.
LangGraph focuses on production-grade orchestration of complex multi-agent workflows. Powerful for developers building complex systems, but requires technical expertise.
Relevance AI targets business users with pre-built templates and easier configuration.
Research agents (Kompas AI) optimize for accuracy and thoroughness over speed. Slower but more reliable for knowledge work.
3. Integration Depth
How agents connect with existing systems determines their practical utility.
Native platform integrations distinguish business-focused agents. Beam AI (documents) and Relevance AI (analytics) succeed by integrating deeply with Salesforce, Slack, Notion, and Google Analytics.
Value comes less from AI capabilities, more from seamless data flow.
API-first architectures (n8n, Make) enable custom integrations but require technical expertise. Support hundreds of pre-built connectors while allowing custom nodes.
Standalone tools (coding agents, cybersecurity agents) optimize for specific technical ecosystems rather than broad compatibility.
4. Security and Compliance
Production deployment requirements create major architectural differences.
Enterprise-grade agents (IBM watsonx, healthcare agents) prioritize:
- Security certifications (SOC 2, ISO 27001)
- Audit trails
- Compliance frameworks (GDPR, HIPAA)
- Role-based access control
- Data encryption
- Governance workflows
Infrastructure overhead increases costs but enables deployment in regulated industries.
Developer-centric tools (LangGraph, coding agents) focus on debugging, logging, and integration with version control systems. Serve technical users who implement their own security.
Consumer-focused tools often lack enterprise compliance features entirely.
The Governance Problem Nobody Has Solved Yet
November 2025: Anthropic disclosed how the Claude Code agent was misused to automate parts of a cyberattack. This isn’t a theoretical risk; it happened.
The challenge: Agents make runtime decisions, access sensitive data, and take actions with business consequences. Unlike traditional software that executes predefined logic, you can’t just audit the code to know what will happen.
What leading organizations implement:
“Bounded autonomy” architectures:
- Clear operational limits (can’t spend > $X, can’t delete data, can’t access certain systems)
- Escalation paths to humans for high-stakes decisions
- Comprehensive audit trails of every action
- Ability to pause or rollback agent actions
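Here is a sketch of what those limits look like in code. Action names, limits, and the `run_tool` executor are illustrative, not any vendor’s API:

```python
# Illustrative "bounded autonomy" wrapper: hard limits, human escalation, audit trail.
import time

SPEND_LIMIT_USD = 50.0
FORBIDDEN_ACTIONS = {"delete_data", "modify_permissions"}

audit_log = []  # in production, an append-only store you can replay or roll back

def audit(action: str, args: dict, outcome: str) -> None:
    audit_log.append({"ts": time.time(), "action": action, "args": args, "outcome": outcome})

def execute_with_guardrails(action: str, args: dict, spent_so_far: float) -> str:
    if action in FORBIDDEN_ACTIONS:
        audit(action, args, "blocked")
        return "blocked: action is outside the agent's operational limits"
    cost = args.get("estimated_cost_usd", 0.0)
    if spent_so_far + cost > SPEND_LIMIT_USD:
        audit(action, args, "escalated")
        return "escalated: spend limit hit, human approval required"
    result = run_tool(action, args)  # hypothetical executor, as in the loop sketch
    audit(action, args, "executed")
    return result
```

Pause and rollback then reduce to stopping the loop and replaying or reversing entries from the audit log.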
What Works, What Doesn’t (Real Examples)
What Actually Works Today
Coding assistance at Level 3: Cursor + Claude Code combination used by thousands of developers. Cursor for flow and rapid iteration, Claude for hard problems.
Typical workflow:
- Use Cursor for 80% of coding (feature implementation, refactoring)
- When stuck, escalate to Claude Code for architectural reasoning
- Let agent run tests, iterate on failures
- Human reviews final output before merge
Sales outreach automation: AI agents qualify leads, book meetings, and send follow-ups. Companies report 2-3x increase in sales team productivity.
Klarna deployed sales agents handling initial outreach and qualification. Human reps focus on complex deals and relationship building.
Customer service for common questions: Agents handling 70-80% of routine inquiries during off-hours. Customer satisfaction scores improved because responses are instant instead of “we’ll get back to you tomorrow.”
Research synthesis: Academic researchers using agents to scan new papers, extract relevant sections, maintain citation databases. Saves hours of manual literature review.
What Doesn’t Work Yet
Fully autonomous deployment: Level 4 agents deploying code to production without human approval. Too risky for most companies. Even with extensive testing, edge cases cause problems.
Exception: Simple, well-bounded systems where failures are recoverable.
Complex customer situations: Agents fall apart when empathy, judgment, or nuanced understanding is required. “I understand you’re frustrated” from an agent feels hollow.
Multi-stakeholder decision-making: Agents can’t navigate office politics, understand unspoken context, or read between lines in business negotiations.
Creative strategy: Agents can execute tactics but don’t develop novel strategic approaches. They optimize within given parameters but don’t question the parameters themselves.
The Cost Reality
Everyone talks about agent capabilities. Few discuss economics.
Direct costs:
- Model API calls: $0.003-0.10 per 1K tokens (varies by model)
- Tool execution: APIs, data sources, integrations
- Infrastructure: Hosting, compute for self-hosted systems
Hidden costs:
- Context window usage accumulates fast with multi-turn conversations
- Failed execution attempts (agent tries, fails, retries—you pay for each attempt)
- Debugging and refinement time
- Governance and security infrastructure
- Training team to work effectively with agents
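A back-of-envelope estimator shows how those direct costs compound once you account for re-sent context and retries (the price is illustrative, not a current rate card):

```python
# Rough per-task cost model. Re-sending the growing context each turn is what
# makes multi-turn agent work expensive; failed attempts multiply everything.

PRICE_PER_1K_TOKENS = 0.01  # illustrative blended input/output price

def task_cost(context_tokens: int, output_tokens: int,
              turns: int, retry_rate: float = 0.2) -> float:
    total = 0
    for turn in range(1, turns + 1):
        total += turn * context_tokens + output_tokens  # whole context billed again
    total *= 1 + retry_rate                             # retries are billed too
    return total / 1000 * PRICE_PER_1K_TOKENS

# 10-turn task, 2K tokens of context added per turn, 500 output tokens per turn:
print(f"${task_cost(2_000, 500, 10):.2f}")  # ≈ $1.38 for a single task
```

Multiply that by thousands of tasks a month and the hidden costs stop being hidden.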
Leading organizations treat agent cost optimization as a first-class architectural concern. They build economic models into agent design rather than retrofitting cost controls after deployment.
Example optimization strategies:
- Route simple queries to smaller, cheaper models
- Use prompt caching aggressively (up to 90% cost reduction for repeated context)
- Implement circuit breakers to stop runaway agents
- Monitor token usage per task, optimize prompts
- Batch requests when latency isn’t critical
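The routing and circuit-breaker strategies are simple to sketch. Model names, the length heuristic, and `call_llm_with_cost` are placeholders, not recommendations:

```python
# Route by rough difficulty; trip a circuit breaker before spend runs away.

CHEAP_MODEL, STRONG_MODEL = "small-model", "strong-model"  # placeholder names
DAILY_BUDGET_USD = 100.0
spent_today = 0.0

def pick_model(query: str) -> str:
    # crude heuristic: short queries go to the cheap model; refine with a classifier
    return CHEAP_MODEL if len(query) < 200 else STRONG_MODEL

def guarded_call(query: str) -> str:
    global spent_today
    if spent_today >= DAILY_BUDGET_USD:
        raise RuntimeError("circuit breaker: daily budget exhausted")
    answer, cost = call_llm_with_cost(pick_model(query), query)  # hypothetical helper
    spent_today += cost
    return answer
```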
If you are looking into the infrastructure that powers web-capable agentic AI, here are our latest benchmarks:
- Remote browsers: How browser infrastructure enables agents to interact with the web securely.
- Browser MCP benchmark: Top MCP servers for tool use and web access.