Executive summary

AI agents are best understood as goal-directed systems that do more than generate text: they decide when to use tools, retrieve knowledge, coordinate sub-tasks, and sometimes take actions in external software or infrastructure. OpenAI’s own developer documentation frames agents as systems that “accomplish tasks” across simple goals and complex workflows, supported by models, toolkits, and monitoring primitives.

From 2022 to 2026, three shifts made “agentic” systems viable for mainstream teams. First, prompting and training methods explicitly blend reasoning with action and tool invocation (for example ReAct and Toolformer), improving reliability when an agent must look things up or operate in interactive environments. Second, platforms productized the “agent loop” (tools, memory, orchestration, observability) so teams can ship agents with governance and metrics rather than ad hoc scripts. Third, the security conversation moved from generic “LLM safety” toward agent-specific threat models, including supply-chain risks from plugin and skill ecosystems and new “agentic” categories in security guidance.

Pricing in the agent market is not one-dimensional. Most buyers face a hybrid cost stack: (1) per-seat SaaS plans (common for consumer and workplace assistants), (2) usage-based inference (tokens, tool calls, messages, traces), and (3) enterprise governance and support (SSO, audit logs, data residency, indemnities) that is often bundled or custom-quoted. Representative list prices illustrate the spread: ChatGPT Go/Plus/Pro at $8/$20/$200 USD per month, Claude Pro at $20 per month (and Max from $100), Microsoft Copilot Studio at $200 per 25,000 credits per month or $0.01 per message under pay-as-you-go meters, LangSmith at $39 per seat per month plus trace overages, and Perplexity Enterprise Max at $271 per seat per month when billed annually.

OpenClaw deserves special attention because it sits at the intersection of personal automation and open-source extensibility: it markets itself as “the AI that actually does things” through chat apps, offers persistent memory and tool execution, and supports a public skill marketplace (ClawHub). Its openness and deep access are also its biggest risk surface: multiple 2026 security reports and news investigations found malicious skills and supply-chain abuse in the ecosystem, prompting responses such as VirusTotal scanning integration and optional sandboxing and policy controls in the docs.

What is an AI agent and how the concept evolved

An AI agent, in contemporary product and engineering usage, is an LLM-centered system that can (a) interpret a goal, (b) decide on intermediate steps, (c) selectively use tools or integrations, and (d) produce outputs or actions with a traceable execution record. OpenAI’s agent guidance explicitly emphasizes toolkits (AgentKit), models with “agentic strengths,” and “dashboard features” for monitoring and optimization. Anthropic’s Claude Agent SDK similarly describes production agents that can autonomously read files, run commands, search the web, edit code, and manage context.

A practical point for decision-makers is that “agent” is not a binary label. Modern systems span a spectrum from chat-centric assistants to highly autonomous workflows with delegated sub-agents and computer-use capabilities, and the level of autonomy directly changes risk, observability needs, and cost predictability.
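The four-part definition above (interpret a goal, decide steps, use tools, keep a traceable record) can be sketched as a minimal agent loop. This is an illustrative Python sketch under stated assumptions, not any vendor's SDK: `run_agent`, the decision-dict shape, and the tool registry are all invented for the example.

```python
# Minimal agent loop: the model proposes either a tool call or a final
# answer; the runtime executes tools and records every step.
# All function names and the decision format are illustrative.

def run_agent(goal, llm, tools, max_steps=8):
    transcript = [{"role": "user", "content": goal}]
    trace = []  # the traceable execution record
    for _ in range(max_steps):
        # llm returns either {"tool": ..., "args": ...} or {"answer": ...}
        decision = llm(transcript)
        trace.append(decision)
        if "answer" in decision:
            return decision["answer"], trace
        result = tools[decision["tool"]](**decision["args"])
        transcript.append({"role": "tool", "content": str(result)})
    return None, trace  # step budget exhausted: escalate to a human
```

The bounded `max_steps` budget is the simplest guardrail against the runaway loops discussed later in the observability section.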

Agent evolution milestones (selected)

  1. 1966: ELIZA popularizes rule-based conversational interaction.
  2. 1995: A.L.I.C.E. scales pattern-matching chat via AIML communities.
  3. 2011: Siri mainstreams voice assistants on smartphones.
  4. 2022: ChatGPT accelerates LLM chat adoption.
  5. 2022: ReAct formalizes interleaving reasoning and actions for tool use.
  6. 2023: Toolformer shows self-supervised learning of API/tool usage.
  7. 2023: AutoGPT-style autonomous loops inspire agent frameworks and benchmarks.
  8. 2023: AutoGen advances multi-agent conversation orchestration.
  9. 2025: OSWorld benchmarks open-ended computer-use tasks for multimodal agents.
  10. 2025: OpenAI launches the Responses API and an Agents SDK for developer-built agents.
  11. 2025–2026: Agent security guidance matures (OWASP agentic risks).

The timeline is anchored by primary and widely cited references: Weizenbaum’s ELIZA paper (1966), documented histories of ALICE (1995) and Siri’s mainstream introduction (2011), OpenAI’s ChatGPT launch (Nov 30, 2022), and research milestones ReAct (2022) and Toolformer (2023). The “AutoGPT-style” period is reflected in subsequent benchmarking and academic analysis of Auto-GPT agents (2023), and in multi-agent frameworks such as AutoGen (2023) that formalize conversation-based orchestration among specialized agents. OSWorld (public benchmark site and conference references in 2024–2025) captures the shift toward evaluating agents in real web and desktop environments rather than purely textual tasks. OpenAI’s public rollout of agent-building APIs and SDKs (Responses API + Agents SDK coverage and docs) marks the commercialization of these ideas into developer platforms.

Definition and taxonomy of AI agents

A useful taxonomy for procurement and architecture reviews splits agents by autonomy level, task scope, and coordination pattern. This aligns with how platforms position themselves (consumer chat, workplace copilots, developer SDKs, or multi-agent orchestration).

Chatbots

Chatbots are conversation-first systems focused on interactive Q&A, drafting, and lightweight assistance. They may include tool features (web search, file analysis) but generally keep the human in the driver’s seat, with clear turn-by-turn control. ChatGPT’s plan descriptions, for example, emphasize messages/uploads, deep research, “agent mode,” and memory as tiered features in a chat product. Claude’s consumer plans similarly promote chat across devices, web search, code execution, files, “memory across conversations,” and optional extensions/connectors.

Task-specific agents

Task-specific agents are narrower systems optimized for a bounded domain: customer support triage, invoice processing, knowledge-base query, internal IT helpdesk, or code refactoring. They often use retrieval and tool calls but constrain actions to reduce risk and control cost variance. OpenAI’s agent guidance presents AgentKit as a toolkit to build workflows with models, tools, knowledge, and logic in a single UI, which is a common pattern for task-scoped agents. Microsoft positions Copilot Studio agents as a way to build business agents with connectors, flows, and governance policies, including the ability to block knowledge sources, connectors-as-tools, and publishing channels via data policies.

Autonomous agents

Autonomous agents attempt to decompose goals into sub-tasks and execute multi-step plans with limited or asynchronous human input. They may run background jobs, trigger on events, and operate over long horizons, which means they require stronger observability, safeguards, and cost controls. OpenClaw’s “sub-agents” feature explicitly supports background agent runs for parallel research or long tasks and notes the cost impact because each sub-agent has its own context and token usage. The broader research direction (ReAct, Toolformer) underpins why autonomy became more feasible: interleaving reasoning with tool actions reduces hallucination and allows grounded steps.

Multi-agent systems

Multi-agent systems coordinate multiple specialized agents that collaborate, delegate, critique, or verify outputs. This often improves robustness (self-checking) and throughput (parallelism), at the cost of higher orchestration complexity and non-linear token consumption. OpenAI’s Agents SDK positions itself as a framework for building multi-agent workflows and includes built-in tracing and a traces dashboard for debugging and monitoring. AutoGen’s research and documentation describe a multi-agent conversation framework where agents integrate LLMs, tools, and human inputs via automated agent chat. OpenClaw’s docs separately describe “multi-agent routing” and per-agent security profiles (sandbox configuration and tool restrictions), reflecting the operational need to isolate roles in a multi-agent setup.

Core features and the reference architecture decision-makers should evaluate

Across vendors, “agent platforms” converge on a common reference architecture. The differentiators are less about whether a feature exists and more about defaults, governance, ergonomics, and what is observable or auditable in production.

Capabilities and orchestration

Capabilities include reasoning, planning, and action selection. In practice, modern agents embed an “agent loop” that alternates between deciding what to do and executing tools, which matches both research framing (ReAct’s interleaving of reasoning and acting) and vendor SDKs that package this loop for developers. Orchestration primitives typically include (1) tool selection and execution, (2) handoffs or delegation to specialized agents, and (3) fallback behaviors like retries, model fallback chains, or human-in-the-loop approvals. Perplexity’s Agent API models documentation explicitly supports model fallback chains for high availability, and OpenClaw provides mechanisms for spawning sub-agents and configuring untrusted tool surfaces by depth and policy.
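One of the fallback behaviors named above, a model fallback chain, can be sketched generically. The `providers` list, retry count, and error handling below are illustrative assumptions, not Perplexity's or any other vendor's API.

```python
# Generic fallback chain: try each model provider in order, with a
# bounded number of retries per provider, before giving up.
# Provider callables and their signature are invented for illustration.

def call_with_fallback(prompt, providers, retries_per_model=2):
    errors = []
    for call in providers:  # ordered list of callables, preferred first
        for attempt in range(retries_per_model):
            try:
                return call(prompt)
            except Exception as e:  # production code would catch provider-specific errors
                errors.append((getattr(call, "__name__", "?"), attempt, str(e)))
    raise RuntimeError(f"all providers failed: {errors}")
```

The same shape generalizes to retries with backoff or to routing cheap models first and escalating to expensive ones only on failure.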

Integrations and tool use

Tool use is the defining ingredient that separates a chatty assistant from an operational agent. Anthropic’s docs describe tool use where Claude decides whether tools can help, emits structured tool-use requests, and expects the client to execute tools and return results. OpenAI’s platform pricing makes clear that tool usage often has separate meters (for example web search tool calls, file search tool calls, and containerized code execution sessions), which matters for budgeting and unit economics. Microsoft’s Copilot Studio ecosystem leans heavily on connectors: official documentation describes Power Platform connectors as “wrappers” around APIs enabling Copilot Studio (and Power Automate/Apps/Logic Apps) to communicate with other services, while also describing data policies for blocking connectors or knowledge sources to prevent exfiltration.
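The request/execute/return round trip described for tool use can be sketched from the client side. The request dict shape (`name`, `input`) and the error envelope are simplified stand-ins, not Anthropic's or OpenAI's actual wire format.

```python
# Client-side tool dispatch, following the pattern described above:
# the model emits a structured tool-use request, and the client
# executes the tool and returns a result (or an error) to the model.

TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},  # stub tool
}

def execute_tool_request(request):
    name, args = request["name"], request.get("input", {})
    if name not in TOOLS:
        return {"is_error": True, "content": f"unknown tool: {name}"}
    try:
        return {"is_error": False, "content": TOOLS[name](**args)}
    except TypeError as e:  # model supplied bad or missing arguments
        return {"is_error": True, "content": str(e)}
```

Returning errors as data rather than raising lets the model see the failure and self-correct on the next turn.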

A notable 2025–2026 trend is “computer use” (agents that click, type, and navigate UI when no API exists). Microsoft’s Copilot Studio introduced “computer use” for interacting with websites and desktop apps, positioned as enabling automation even without an API. This category also explains why OSWorld-style benchmarks became important: they measure success rates across real computer tasks rather than curated text-only tests.

Memory

“Memory” spans at least two distinct layers: short-term conversational state (what’s in the current session) and long-term persistence across sessions (user preferences, project context, knowledge bases). LangGraph’s documentation explicitly distinguishes short-term memory as thread-scoped state persisted via checkpoints and long-term memory as cross-session stores that can be recalled across threads. OpenClaw’s memory tools show a concrete implementation: semantic search over Markdown memory files, controlled paths, local vs remote embedding providers, and optional batch embedding for large indexing jobs.
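The two-layer split can be made concrete with a small sketch: thread-scoped conversational history versus a cross-session store of durable facts. Class and method names here are illustrative, not LangGraph's or OpenClaw's APIs.

```python
# Two memory layers in one sketch: short-term state keyed by thread id
# (scoped to one conversation) and a long-term store shared across
# all threads for the same user.

class Memory:
    def __init__(self):
        self.threads = {}    # short-term: thread_id -> list of turns
        self.long_term = {}  # long-term: key -> value, survives threads

    def append_turn(self, thread_id, turn):
        self.threads.setdefault(thread_id, []).append(turn)

    def remember(self, key, value):
        # promote a fact (preference, project context) across sessions
        self.long_term[key] = value

    def context_for(self, thread_id):
        return {"history": self.threads.get(thread_id, []),
                "facts": dict(self.long_term)}
```

A new thread starts with empty history but still sees the long-term facts, which is exactly the distinction the LangGraph docs draw between checkpoints and stores.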

Observability and evaluation

Agents fail in ways standard apps do not: invisible loops, tool misuse, escalating cost, partial task completion, and subtle regressions. As a result, many platforms now treat tracing and evaluation as first-class. The OpenAI Agents SDK includes built-in tracing and a traces dashboard that records events like LLM generations, tool calls, handoffs, and guardrails, with tracing enabled by default. LangSmith directly productizes this need with trace-based billing and features like monitoring, alerting, evaluation workflows, and an “Agent Builder.” OpenClaw exposes operational “cost and usage” surfaces in chat via slash commands, including session cost snapshots and per-response usage footers, which is unusual in consumer-like agent experiences and valuable for cost governance.
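A minimal version of the tracing idea: wrap each tool in a recorder so loops, failures, and latency become visible after the fact. The span fields below are invented for illustration; real platforms record far richer events (generations, handoffs, guardrails).

```python
# Minimal tracing wrapper: every tool call appends a span with its
# name, outcome, and duration to a global trace buffer.

import time

TRACE = []

def traced(tool_name, fn):
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            result = fn(*args, **kwargs)
            TRACE.append({"tool": tool_name, "ok": True,
                          "ms": (time.monotonic() - start) * 1000})
            return result
        except Exception:
            TRACE.append({"tool": tool_name, "ok": False,
                          "ms": (time.monotonic() - start) * 1000})
            raise
    return wrapper
```

Even this crude buffer is enough to spot an agent that called the same tool fifty times in one turn, which is the failure mode per-response usage footers are designed to surface.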

Safety, security, and governance

Agent safety is not only about model outputs, but also about tool privileges, identity, and supply-chain risk. OpenClaw’s own documentation treats third-party skills as untrusted code, recommends sandboxing, and highlights that skills can inject secrets into the host process for an agent turn, raising operational security concerns. This is not theoretical: early 2026 reporting found malware in OpenClaw skills distributed via the ClawHub marketplace and documented how agent extensibility can become an attack surface.
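A minimal tool-policy gate in the spirit of the controls described above: third-party skills run only if allowlisted, and privileged tools require explicit human approval. The policy fields and names are illustrative, not OpenClaw's actual configuration schema.

```python
# Sketch of a policy check run before any skill-initiated tool call.
# Both the allowlist and the approval set are hypothetical examples.

POLICY = {
    "allowed_skills": {"calendar", "email"},
    "require_approval": {"exec", "payments"},
}

def authorize(skill, tool, approved=False):
    if skill not in POLICY["allowed_skills"]:
        return (False, "skill not allowlisted")
    if tool in POLICY["require_approval"] and not approved:
        return (False, "tool requires human approval")
    return (True, "ok")
```

Deny-by-default gating like this limits the blast radius of a malicious skill even before sandboxing is considered.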

Microsoft’s governance story is more enterprise-native: official documentation describes Copilot Studio security and governance controls such as geographic data residency, DLP policies, and certifications; and it provides concrete admin mechanisms to block connectors, knowledge sources, or channels. Industry security guidance has also started to formalize “agentic” risk categories: OWASP’s GenAI Security Project introduced a Top 10 for Agentic Applications focused on the unique security hazards of autonomous tool-using systems.

Target audiences and buying criteria

The same platform can look “expensive” or “cheap” depending on which audience is buying and which cost line item is dominant (seats, tokens, tool meters, admin overhead, or incidents). Sources from vendors’ own plan pages illustrate that segmentation: consumer tiers emphasize usage and convenience, while team and enterprise tiers highlight admin controls, connectors, and compliance.

Developers typically prioritize: API ergonomics, model flexibility, observability primitives, reproducible evals, and the ability to run locally or in their own infrastructure. OpenAI’s Agents SDK and AgentKit materials stress developer workflows, tracing, and deployment patterns; LangGraph/LangSmith emphasize controllable workflows, memory, durability, and debugging.

Enterprises prioritize: identity, access control, auditability, data residency, DLP and compliance posture, and vendor support. Microsoft’s Copilot Studio and Microsoft 365 Copilot emphasize enterprise controls and metered billing tied to Azure subscriptions, and official guidance describes policy enforcement and compliance foundations. Perplexity Enterprise advertises security and compliance claims (SOC 2 Type II, HIPAA, GDPR, PCI DSS) directly on its pricing pages, reflecting a similar enterprise buyer focus.

SMBs often want “good enough governance” but cannot absorb high integration costs. They gravitate to tools with fast setup, predictable per-seat pricing, and strong integrations. Examples include ChatGPT Business (per-seat with admin controls and internal tool connections), Claude Team (per-seat with SSO and connectors), and turnkey platforms like Replit that bundle building and deployment with agent automation credits.

Consumers generally choose based on convenience, capabilities, and limits, not architecture. Official pricing and plan pages for ChatGPT and Claude show a clear ladder from free to higher-usage tiers, with price points that have effectively become market anchors ($8 to $20 to $100 to $200).

Researchers and advanced analysts trend toward tools that (a) support deep research with citations, (b) can incorporate files and internal knowledge sources, and (c) allow some customization. Perplexity explicitly positions itself as better for complex questions and “building reports,” with deeper sourcing including proprietary data partners, while offering an Agent API with token-based, provider-direct pricing.

Pricing models and cost drivers

Agent pricing is best analyzed by billing units (seat, token, message, trace, action) and by what expands during scale (users, integrations, tool calls, long context, retention, and governance). The sections below use representative list prices from official pages; enterprise discounts and minimum commitments vary materially by vendor and customer size and should be treated as non-stationary inputs.

Typical monthly list prices for agent-style tiers (USD, representative examples)

Free: $0
Entry: $8
Pro: $20
Team/SMB: $30
Max: $100
Ultra: $200
Enterprise-high: $271

These tiers are representative anchors from widely used plans: Entry ($8), Pro ($20), and Ultra ($200) match ChatGPT Go/Plus/Pro list prices; Max ($100) matches the Claude Max starting price; Team/SMB ($30) matches common team-seat pricing such as ChatGPT Business ($30 per seat per month); and Enterprise-high ($271) corresponds to Perplexity Enterprise Max when billed annually.

Subscription pricing

Typical structure: fixed monthly or annual fees per user (sometimes with “reasonable use” policies and throttles), often with feature gating (deep research, higher reasoning, memory) and optional add-on credits.

Representative price ranges (list pricing):

Billing units: user-month (seat), often with annual commitments for lower effective monthly price (Claude Pro annual discount; Perplexity annual billing shown).

Primary cost drivers:

Usage-based pricing

Usage-based pricing dominates developer platforms and metered enterprise agent features. It is usually token-based (inputs and outputs), with separate meters for tool calls, storage, and sometimes containerized execution.

Typical price ranges (illustrative, per 1M tokens):

Billing units: input tokens, output tokens, and sometimes cached tokens at discounted rates (OpenAI lists cached input pricing; Perplexity model tables include cache read price; OpenClaw’s token cost estimation supports cache read/write categories in config).

Primary cost drivers:

Tiered plus usage (hybrid)

Hybrid pricing combines a base subscription with included credits or allowances and explicit overage meters. This model is common because agents have bursty usage and “unknown unknowns,” and vendors want both predictable ARR and variable revenue aligned to actual consumption.
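Using the two Microsoft meters quoted earlier ($200 per 25,000 credits, or $0.01 per billable message), a quick break-even sketch shows why buyers compare prepaid packs against pure pay-as-you-go. It assumes, purely for illustration, that one message consumes one credit; real credit consumption varies by action type.

```python
# Worked comparison of the two metering styles: prepaid credit packs
# versus a pure per-message meter. Prices are the list figures cited
# in the text; the one-credit-per-message mapping is an assumption.

import math

def prepaid_cost(messages, pack_price=200.0, pack_credits=25_000):
    packs = math.ceil(messages / pack_credits)  # packs are bought whole
    return packs * pack_price

def metered_cost(messages, per_message=0.01):
    return messages * per_message
```

At full pack utilization the pack works out to $0.008 per message versus $0.01 metered, so under this simplifying assumption the break-even sits at 20,000 messages per month; below that, the meter is cheaper.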

Representative structures:

Billing units: credits, traces, deployment runs, usage credits, messages, and other platform-specific meters.

Primary cost drivers:

Enterprise licensing and negotiated contracts

Enterprise licensing typically bundles governance features (SSO, SCIM, admin roles, audit logs, DLP, data residency, indemnities) and support SLAs. Many vendors do not publish list prices for their highest tiers, but some publish high-end per-seat tiers (for example Perplexity Enterprise Max).

Representative range signals from official pricing pages:

Billing units: frequently per seat with annual commitments, sometimes plus usage meters for agent actions, tool calls, or add-on credits.

Primary cost drivers:

Open-source plus support costs

In open-source agent stacks, the software license is typically free, but real cost comes from infrastructure, APIs, security work, and internal engineering time.

Representative operational cost signals from OpenClaw’s own docs and adjacent deployment guidance:

Billing units: your cloud bill (VM-hours, storage, bandwidth), API tokens and tool calls (from whichever providers you attach), plus labor (engineering, security review, incident response).

Primary cost drivers:

Platform comparison for technical decision-makers

The comparison below summarizes the platforms requested, focusing on commercial shape and operational characteristics rather than model quality alone. Each entry covers pricing model, free tier, primary use case, integrations, customization, security/compliance posture, and best-for audience.

OpenClaw
  Pricing model: Open-source software plus bring-your-own hosting and model/tool APIs; cost visibility via /status and /usage surfaces.
  Free tier: No license fee; operational costs depend on hosting and attached APIs.
  Primary use case: Personal automation via chat apps and local tools; skill-based extensibility.
  Integrations: Skills via ClawHub; tools for web search, memory, exec, and more.
  Customization: High (self-host, config-driven, multi-agent, per-agent sandbox and tool policy).
  Security/compliance: Strong controls exist (sandboxing, tool policy, approvals), but risk is high if misconfigured; the ecosystem has seen malware incidents.
  Best for: Power users and builders who can self-host and want maximum automation capability with transparent internals.

OpenAI agent platform (ChatGPT, Codex)
  Pricing model: Usage-based API (tokens) plus tool meters; Agent Builder design time is free until "Run"; ChatKit storage has GB-day pricing beyond the free tier.
  Free tier: Free design/iteration in Agent Builder; limited free storage tier for ChatKit uploads.
  Primary use case: Building and deploying agent workflows (tools, knowledge, evals).
  Integrations: Built-in tools (web search, file search, containers) with explicit metering; connectors via the AgentKit ecosystem.
  Customization: High for developers (SDK plus workflow tooling); multi-agent orchestration supported.
  Security/compliance: Tracing dashboard records tool calls, handoffs, and guardrails; enterprise controls listed as included for AgentKit features.
  Best for: Teams building production agents with strong observability and a large tool ecosystem.

Claude agent platform (Claude Agent SDK + Claude plans)
  Pricing model: Subscription tiers (individual, team) plus API token pricing for custom builds; the Agent SDK is a developer library.
  Free tier: Free plan exists; higher usage in Pro/Max/Team tiers.
  Primary use case: Coding and productivity agents (files, commands, web) with a packaged agent loop.
  Integrations: Tool use API; connectors via MCP; team plans mention Slack and Microsoft 365 connectors.
  Customization: High for developers via the SDK; moderate for non-developers via app features.
  Security/compliance: Team/enterprise features include SSO, SCIM, audit logs, and "no training on content by default" claims for team plans.
  Best for: Dev teams building agentic coding and knowledge workflows, especially if they want SDK-packaged autonomy.

Copilot/Agents (Microsoft Copilot + Copilot Studio)
  Pricing model: Per-user licensing for Copilot plans plus metered agent usage via Copilot Studio credits or Azure meters; $200 per 25,000-credit pack, or $0.01 per billable message.
  Free tier: Copilot Chat is included for eligible Microsoft 365 users, but agents require Azure and are metered.
  Primary use case: Enterprise workplace agents inside Microsoft 365, Teams, and external channels via Copilot Studio.
  Integrations: Power Platform connectors and agent flows; official docs describe connectors and data policy controls.
  Customization: Moderate to high (low-code agent building plus custom connectors and flows).
  Security/compliance: DLP, data residency, certifications, and admin controls; compliance and policy guidance is extensive in Microsoft Learn.
  Best for: Large orgs standardized on Microsoft 365 seeking governed agent rollout and integration coverage.

LangChain ecosystem (LangGraph + LangSmith)
  Pricing model: Open-source LangGraph library (free) plus LangSmith per-seat and usage-based tracing; $39 per seat per month plus trace overages.
  Free tier: LangSmith Developer plan is $0 per seat with an included trace quota.
  Primary use case: Building controllable, stateful agent workflows with strong observability and eval tooling.
  Integrations: Integrates with many models and tools through code; strong memory primitives.
  Customization: Very high (code-first graphs, state, memory, human-in-the-loop).
  Security/compliance: Compliance claims include SOC 2 Type II, HIPAA, and GDPR; enterprise options include self-hosting.
  Best for: Dev teams who want maximum control over orchestration with best-in-class tracing and evals.

AutoGPT
  Pricing model: Open-source/self-hosted (the common pattern) with third-party model API costs; "platform" positioning varies by distribution.
  Free tier: Yes (self-hosted is free to install).
  Primary use case: Experimentation with autonomous goal-decomposition loops.
  Integrations: Tooling varies by fork; common patterns include web access, file memory, and plugins.
  Customization: High for hackers; rough edges for production deployments.
  Security/compliance: Security posture depends on deployment; not inherently enterprise-governed.
  Best for: Researchers and builders prototyping autonomous loops without committing to a SaaS platform.

Perplexity
  Pricing model: Per-seat subscriptions (Pro and Enterprise tiers) plus API token pricing (Agent API, Sonar, etc.).
  Free tier: Yes (a free tier exists; Pro and Enterprise are paid).
  Primary use case: Research, sourcing, and report-building; enterprise knowledge search across web and work apps.
  Integrations: Enterprise plan describes search across team files and work apps; the API supports model fallback and multiple providers.
  Customization: Moderate (strong UI, some model selection; deep orchestration usually via the API).
  Security/compliance: Enterprise pricing page advertises SOC 2 Type II, HIPAA, GDPR, and PCI DSS; data privacy claims are explicit.
  Best for: Knowledge workers who need fast, cited research and enterprise data protections.

Ghost (Ghostwriter) and Replit Agent
  Pricing model: Subscription tiers with included usage credits and pay-as-you-go; Core at $20/month billed annually; Teams at $35 per seat per month billed annually.
  Free tier: Starter plan is free with daily agent credits.
  Primary use case: App and website building via an autonomous coding agent in a hosted IDE.
  Integrations: Largely inside Replit (build, run, deploy); AI integration credits included.
  Customization: High (you own the code; the agent executes in the dev environment).
  Security/compliance: Enterprise plan offers SSO/SAML and SCIM; RBAC for Teams.
  Best for: Startups and teams shipping prototypes to production quickly, especially for full-stack builds.

OpenClaw deep dive: official features, pricing, target users, and competitive positioning

What OpenClaw is, based on official sources

OpenClaw positions itself as a personal AI assistant that “actually does things” such as email and calendar actions, operating through chat channels like WhatsApp and Telegram and supported by a configurable agent runtime. It couples three product ideas that are strategically important:

  1. A local or self-hosted runtime with tool execution, including optional Docker-based sandboxing to reduce blast radius.
  2. Persistent memory primitives, including semantic memory search over local Markdown memory files with controlled read paths, and the option to use remote embeddings or local embeddings.
  3. A public skills marketplace (ClawHub) for distributing extensible capabilities as “skill bundles.”

OpenClaw’s docs also emphasize observability and cost awareness within the chat UX itself. The “API Usage and Costs” and “Token Use and Costs” references enumerate where costs appear (/status, /usage footers), what features can spend API keys (core responses, memory embeddings, web search, media understanding, skills), and how cost estimation requires configured per-model USD-per-1M-token rates.
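The kind of estimation those references describe (configured USD-per-1M-token rates per model, with separate categories for input, output, and cache reads) reduces to simple arithmetic. The rate table below is a hypothetical placeholder, not OpenClaw's actual config schema or any real model's pricing:

```python
# Token cost estimation from configured per-model rates.
# Rates are USD per 1M tokens; model name and figures are invented.

RATES_PER_1M = {
    "example-model": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
}

def estimate_usd(model, usage):
    """usage maps each token category to a token count."""
    rates = RATES_PER_1M[model]
    return sum(usage.get(category, 0) / 1_000_000 * rate
               for category, rate in rates.items())
```

For example, 200k input tokens, 50k output tokens, and 1M cache-read tokens at these placeholder rates come to $0.60 + $0.75 + $0.30 = $1.65, which illustrates why cache discounts matter for agents that repeatedly replay long context.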

Pricing: what is “official” vs what is inherently variable

OpenClaw does not present itself as a traditional per-seat SaaS with a published subscription for the core runtime. In practice, its "official pricing" is best described as the sum of self-hosting infrastructure costs and whatever model and tool API charges you attach.

This makes OpenClaw’s cost profile closer to a developer platform than a consumer assistant: your bill scales with automation frequency, tool calls, and long context usage rather than only “number of users.”

Target users

OpenClaw’s official docs and feature set imply three primary user segments:

Pros and cons: an evidence-based view

Pros (strong differentiators):

Cons (material risks and tradeoffs):

The OpenClaw team and ecosystem have responded to the security pressure with concrete measures. OpenClaw announced a partnership integrating VirusTotal scanning for skills, and VirusTotal itself added native support for analyzing OpenClaw skill packages via Code Insight to detect emerging abuse patterns. These are meaningful steps, but they do not eliminate fundamental agentic risks such as prompt injection and privilege misuse, which is consistent with broader industry security guidance for agentic systems.

How OpenClaw compares to the listed competitors

Compared to OpenAI Agents:

Compared to Anthropic Claude Agents:

Compared to Microsoft Copilot/Agents:

Compared to LangChain-based platforms:

Compared to AutoGPT:

Compared to Perplexity:

Compared to Replit Ghost:

Recommendations and alternatives by user type

Recommendations below assume a buyer that cares about reliability, security posture, and predictable scaling, and treats “agent” as a production system rather than a demo.

For developers building productized agents:

For enterprises standardizing internal agents:

For SMBs and startups (time-to-value matters more than perfect governance):

For consumers and individual professionals:

For researchers and evaluators:
