AI coding agents need to see what they build. Whether it’s verifying a UI change, debugging an HTMX swap, or inspecting network traffic, browser automation is becoming a core part of the agent toolchain — not just an afterthought.
I’ve been building Squad of Agents almost entirely through Claude Code, and visual debugging is a daily workflow. Three browser automation tools are currently available in this ecosystem:
- agent-browser — a headless CLI designed specifically for AI agents
- Playwright MCP — Playwright exposed as an MCP server
- Claude in Chrome — a Chrome extension giving Claude direct control of your browser
The question isn’t which one works — they all do. The question is which one minimizes overhead in the two resources that matter most: time and context window tokens.
Methodology
Each tool performed an identical sequence against a production Django app: navigate to an authenticated dashboard, capture a full-page screenshot, read console logs, read network requests, and generate an accessibility snapshot. Same page, same data, same session.
Results
| Metric | agent-browser | Playwright MCP | Claude in Chrome |
|---|---|---|---|
| Wall time | 9.7s | 29.7s | 40.2s |
| Tool calls | 1 | 8 | 8 |
| Screenshot | 57KB PNG to file | 57KB PNG to file + inline | JPEG inline only |
| Console logs | Runtime only | Page-load only | Runtime only (after enable) |
| Network capture | None (route/mock only) | 16 requests | None (after enable) |
| Snapshot size | 2KB YAML | ~4KB YAML | ~3KB flat list |
| Token cost | ~50 tokens | ~3-5K tokens | ~2K tokens |
What the numbers reveal
The round-trip tax is real
agent-browser runs as a single Bash invocation — one tool call, no round-trips. Playwright and Chrome MCP require 6-8 sequential tool calls, each carrying the latency of model inference plus MCP transport. That serialization penalty alone accounts for a 3-4x speed difference.
This is an architectural insight, not an implementation detail. Any MCP-based browser tool will pay this tax. CLI tools that batch operations avoid it entirely.
Token efficiency compounds over sessions
This is the more consequential finding. Playwright MCP returns the full page accessibility snapshot with every response — navigation, clicks, screenshots, everything. Each snapshot is 3-5K tokens of YAML injected directly into the context window.
agent-browser writes outputs to files. The context window sees a one-line confirmation. Over a 20-interaction debugging session, that’s the difference between ~1K tokens (agent-browser) and ~60-100K tokens (Playwright MCP) spent purely on browser output.
When your context window is your most constrained resource, this isn’t a minor optimization — it determines whether your debugging session completes or gets truncated.
Network observability is Playwright’s edge
Playwright MCP was the only tool that passively captured network requests — 16 in total, including static assets, API calls, and analytics pings. agent-browser’s network commands are designed for route interception and mocking, not passive logging. Claude in Chrome requires enabling tracking before the page loads.
For debugging client-server interactions (failed API calls, CORS issues, missing resources), Playwright’s passive network capture is irreplaceable. It’s the one scenario where the token overhead is justified.
Console logging has nuances across all three tools. agent-browser captures runtime console output (messages after page load) via its console command — useful for debugging interactions. Playwright MCP captures page-load console output. Claude in Chrome requires explicit enablement. None of them give you a complete picture out of the box.
Reliability is not equal
Claude in Chrome failed on first attempt (“Browser extension is not connected”) and only worked in the second try. It depends on a running Chrome instance with an active extension — a fragile prerequisite for automated workflows.
agent-browser and Playwright MCP are self-contained. They launch their own browser instances and have no external dependencies beyond the binary itself.
Decision framework
| Scenario | Tool |
|---|---|
| Visual verification after code changes | agent-browser |
| Debugging API/network interactions | Playwright MCP |
| Inspecting state in your actual browser session | Claude in Chrome |
| Long debugging sessions (token budget matters) | agent-browser |
| CI/automated visual testing | agent-browser |
Conclusion
agent-browser should be the default. It’s 3-4x faster, ~50-100x more token-efficient, and the most reliable of the three. Switch to Playwright MCP when you need network observability. Use Claude in Chrome when you need your real browser’s authentication state.
The deeper lesson here is about tool design for AI agents. The best agent tools aren’t the most feature-rich — they’re the ones that respect the agent’s constraints. A tool that writes to files instead of flooding the context window understands that tokens are the scarcest resource. A tool that batches operations into a single invocation understands that round-trips are the primary latency bottleneck.
As AI-driven development matures, the tools that win will be the ones designed around these constraints from the ground up.