5 posts tagged with "claude-code"

Claude Code Doesn't Index Your Codebase. Here's What It Does Instead.

· 21 min read
Vadim Nicolai
Senior Software Engineer

Last verified: March 2026

Boris Cherny's team built RAG into early Claude Code. They tested it against agentic search. Agentic search won — not narrowly. A Claude engineer confirmed it in a Hacker News thread: "In our testing we found that agentic search outperformed [it] by a lot, and this was surprising."

That thread is the clearest primary source on how Claude Code actually works — and why it works that way. Most articles on the topic paraphrase it from memory. This one starts from the source.

Q: Does Claude Code index your codebase? A: No. Claude Code does not pre-index your codebase or use vector embeddings. Instead, it uses filesystem tools — Glob for file pattern matching, Grep for content search, and Read for loading specific files — to explore code on demand as it works through each task. Anthropic calls this "agentic search."


The Confession: What Boris Cherny Actually Said

In a public Hacker News thread, Boris Cherny — principal software engineer at Anthropic and Claude Code's creator — wrote this directly:

"Early versions of Claude Code used RAG + a local vector db, but we found pretty quickly that agentic search generally works better. It is also simpler and doesn't have the same issues around security, privacy, staleness, and reliability."

Cherny's comment was followed by one from a Claude engineer in the same HN thread: "Right — Claude Code doesn't use RAG currently. In our testing we found that agentic search outperformed [it] by a lot, and this was surprising."

The "surprising" qualifier matters. This was not a story where the team started with a principled position and built to confirm it. They built RAG, tested it, found it underperformed, and redesigned. The result is an architecture that runs against the grain of every major competing tool — and the gap is not accidental.

Cherny's background shapes how to read this decision. Before Anthropic, he was a principal engineer at Meta. He describes Claude Code's origin as a personal experiment: he gave the model a bash tool, watched it autonomously write AppleScript to query his music library, and realized the implication. An agent with tools beats a script with pre-retrieved context. That insight drove Claude Code's entire design. The YC Startup Library interview goes deeper on this philosophy: Cherny believes the future of development lies in "agent topologies" — multiple agents with fresh, isolated context windows working in parallel, not a single large agent with accumulated, polluted memory.

The architectural bet against indexing is downstream from that belief.


How Claude Code Actually Searches Your Code

"Agentic" means the model drives the search process rather than receiving pre-retrieved context. Claude Code decides what to look for, picks the right tool, acts on the result, and loops until it has enough to complete the task. The loop is think → act → observe → repeat, continuing until the model produces a plain text response with no tool call attached.

What makes this work in practice is that the tools have very different cost profiles — and Claude Code is designed to use them in cost order.

The Tool Hierarchy with Token Economics

| Tool | What It Does | Token Cost | Use Case |
|---|---|---|---|
| Glob | File path pattern matching | Near-zero — returns paths only | `workers/**/*.toml`, `src/**/*.graphql` |
| Grep | Regex content search (powered by ripgrep) | Lightweight — returns matching lines | `createD1HttpClient`, `is_remote_eu` |
| Read | Full file contents into context | Heavy — 500–5,000 tokens per file | Confirm and load a specific file |
| Explore agent | Isolated read-only sub-agent (Haiku model) | Isolated — does not touch main context window | Deep codebase exploration across many files |

Eighteen built-in tools are confirmed in BrightCoding's reverse-engineering of Claude Code's minified JS, including Bash, Grep, Glob, Read, WebFetch, and the Task tool that spawns sub-agents. The Piebald-AI GitHub repo tracks all system prompt components and sub-agent prompts per version, updated within minutes of each Claude Code release. George Sung independently confirmed the same loop structure in January 2026 by forking Ollama to intercept API traffic.

Glob is the opening move. workers/**/*.toml costs almost nothing — it returns file paths, not file contents. Claude Code uses Glob to narrow the search space before any expensive operations begin.

Grep does heavier lifting: searching file contents by regex. Running grep -r "createD1HttpClient" . returns every line containing that string, with surrounding context. It is fast, exact, and composable. Claude Code chains Grep calls the way a developer would in a terminal — each search informed by the previous result, progressively narrowing toward the relevant files.

```shell
# The kind of grep chain Claude Code runs:
grep -r "createD1HttpClient" src/
grep -r "D1HttpClient" src/db/
grep -r "import.*d1-http" src/
```

Read loads a full file into the context window. A 200-line TypeScript file costs roughly 500–1,500 tokens. Claude Code reserves Read for files already identified as relevant via Glob and Grep — it is the confirm step, not the discovery tool.
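
The rough arithmetic behind that range can be sketched with the common ~4 characters/token heuristic. This ratio is an approximation, not Anthropic's tokenizer — it is only meant to show why a 200-line file lands in the hundreds-to-thousands of tokens.

```rust
// Rough token estimate for a Read call using the ~4 chars/token
// heuristic. An approximation for illustration, not a real tokenizer.
fn estimate_read_tokens(file_contents: &str) -> usize {
    file_contents.len() / 4
}
```

A 200-line file at ~30 characters per line is ~6,000 characters, which this heuristic puts at ~1,500 tokens — the top of the article's quoted range.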

The Explore Sub-Agent Architecture

For deep exploration, Claude Code spawns an Explore sub-agent: a read-only specialist that runs on the Haiku model inside its own isolated context window. The Piebald-AI repo documents three sub-agent prompt types with their sizes as of current versions: Explore agent (516 tokens), Plan mode enhanced (633 tokens), Task tool (294 tokens).

The Explore agent can Glob, Grep, Read, and run limited Bash (list, copy, move). It cannot create or modify files. When it finishes, it returns a summary to the main agent — not raw file contents. That summary preserves the insight while discarding the tokens.

This is the key isolation property: exploration work does not consume the main conversation's context budget. Cherny has described this as essential to his "agent topologies" philosophy — fresh context windows prevent the main session from accumulating irrelevant content from early searches that turned out to be dead ends.

Q: How does Claude Code search code in large repositories? A: Claude Code uses a three-tool hierarchy: Glob (lightweight file path pattern matching), Grep (content search returning matching lines), and Read (full file content into context). For deep exploration, it spawns an Explore sub-agent — a read-only Haiku model with its own isolated context window — to keep heavy search from consuming the main conversation's token budget.


The Economics: Why This Approach Is Viable at Scale

The most important financial fact about Claude Code's architecture is the 92% prompt prefix reuse rate. LMCache's December 2025 analysis found that across all phases of Claude Code's agentic loop — including the ReAct-based sub-agent loops — the same prefix (system prompt, tool definitions, CLAUDE.md contents) appears in 92% of turns.

This matters because of how Anthropic's prompt caching works: cache write tokens cost 1.25x base price, but cache read tokens cost only 0.1x. For a 2M-token session, processing without caching costs $6.00. With prefix caching at 92% reuse, that drops to $1.152 — an 81% cost reduction.
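
One way to reproduce those figures, assuming a $3/M base input price with cache reads billed at 0.1x and cache writes at 1.25x. This is a sketch of the arithmetic, not a billing tool:

```rust
// Session-cost sketch: cached (reused-prefix) tokens are billed as cache
// reads at 0.1x the base price; the rest as cache writes at 1.25x.
// Assumes a $3/M base input price for illustration.
fn session_cost_usd(total_tokens: f64, prefix_reuse: f64) -> f64 {
    let base_per_token = 3.0 / 1_000_000.0;
    let cache_reads = total_tokens * prefix_reuse * base_per_token * 0.1;
    let cache_writes = total_tokens * (1.0 - prefix_reuse) * base_per_token * 1.25;
    cache_reads + cache_writes
}
```

For 2M tokens at 92% reuse this yields $1.152, against a $6.00 uncached baseline (2M × $3/M) — the 81% reduction quoted above.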

Without this, the "burn tokens iteratively" critique would be damning. With it, the economics of agentic search become defensible even on large codebases.

There is a real pricing cliff to understand. Claude API input tokens are priced at $3/million up to 200K tokens per request; beyond 200K, all tokens in that request cost $6/million — a 2x jump. This is a hard threshold, not a gradual escalation. Agentic sessions that accumulate significant context must manage this cliff deliberately. Anthropic's cost documentation estimates heavy API coding sessions at $3,650+/month. Claude Max at $200/month works out to approximately 18x cheaper for intensive use — which is why most developers using Claude Code heavily are on the subscription plan rather than the API.
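
The cliff is easy to express in code: the higher rate applies to every input token in the request, not just the excess. A sketch using the rates quoted above:

```rust
// The 200K-token pricing cliff: crossing the threshold reprices the
// ENTIRE request at the higher rate. Rates ($3/M and $6/M) are the ones
// quoted in the article; this is a sketch, not a billing implementation.
fn request_input_cost_usd(input_tokens: u64) -> f64 {
    let rate_per_million = if input_tokens > 200_000 { 6.0 } else { 3.0 };
    input_tokens as f64 * rate_per_million / 1_000_000.0
}
```

One token over the line roughly doubles the bill for the whole request: 200,000 tokens cost $0.60, while 200,001 cost about $1.20.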

The latency problem with sequential tool calls is real — but being solved. Relace's Fast Agentic Search (FAS) showed what is possible: an RL-trained sub-agent calling 4–12 tools in parallel instead of sequentially. Each sequential tool call takes 1–2 seconds; 20 sequential turns means 20–40 seconds of latency. FAS reduced 20 turns to 5 and 10 turns to 4, a 4x latency reduction, while maintaining accuracy comparable to Claude Sonnet 4.5. The bottleneck is sequential execution, not the agentic approach itself.


How the Competition Does It: Cursor, Windsurf, and Copilot

Claude Code's no-index bet cuts against the design of every major competing tool.

| Tool | Search Strategy | Index Location | Privacy Model | Freshness |
|---|---|---|---|---|
| Claude Code | Agentic: Glob → Grep → Read → Explore agents | No index (runtime search) | Data never leaves machine | Always current (filesystem reads) |
| Cursor | Semantic vector RAG + optional @Codebase | Turbopuffer (cloud) + local cache | Embeddings + masked paths in cloud | Merkle-tree delta sync; incremental lag |
| Windsurf Cascade | AST-level semantic RAG, local index | Local (+ optional remote) | Local-first; enterprise options | Auto-updated on file change |
| GitHub Copilot | Code-tuned transformer embeddings | GitHub API (remote) + local for under 750 files | Embeddings in GitHub cloud | Indexed per commit; local for uncommitted |
| Zed AI | Automatic context discovery (agentic-leaning) | Varies by model provider | Depends on provider | Runtime |

Cursor is the most technically detailed comparison. The Engineers Codex analysis documents the full pipeline: Cursor computes a Merkle tree of hashes of all valid files, sends delta diffs to AWS-cached embedding storage, and queries Turbopuffer — a serverless vector and full-text search engine — at inference time. Only metadata is stored in the cloud: masked paths (each path component hashed with a secret key and fixed nonce), line ranges, and embedding vectors. Raw source code never leaves the machine. Indexing time dropped from a median of 7.87s to 525ms after optimization. Cursor shows an index status indicator; Claude Code shows nothing, because nothing needs to build.
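The delta-sync idea can be illustrated with a flat hash-per-file snapshot. A real Merkle tree adds interior nodes so whole unchanged subtrees are skipped, but the core principle — send hashes, re-embed only what changed — is the same. All names in this sketch are illustrative:

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash every file's contents into a snapshot. A real Merkle tree would
// also hash directories of hashes; this flat version keeps the idea small.
fn snapshot(files: &HashMap<String, String>) -> HashMap<String, u64> {
    files.iter().map(|(path, contents)| {
        let mut h = DefaultHasher::new();
        contents.hash(&mut h);
        (path.clone(), h.finish())
    }).collect()
}

// Diff two snapshots: only new or modified paths need re-embedding.
// (Deletions are ignored here to keep the sketch minimal.)
fn changed_paths(old: &HashMap<String, u64>, new: &HashMap<String, u64>) -> Vec<String> {
    let mut out: Vec<String> = new.iter()
        .filter(|(path, hash)| old.get(path.as_str()) != Some(*hash))
        .map(|(path, _)| path.clone())
        .collect();
    out.sort();
    out
}
```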

Windsurf Cascade takes a different approach: AST-level indexing, building semantic blocks at function, method, and class boundaries rather than naive text chunks. The index starts immediately on workspace open and stays updated automatically on file change. It is local-first, which gives it the freshness advantage of no sync lag.

GitHub Copilot went generally available with semantic search in March 2025. The embedding model is a proprietary transformer fine-tuned on source code. For projects under 750 files, VS Code builds a local advanced index automatically; 750–2,500 files requires a manual trigger; above 2,500 falls back to a basic index. Uncommitted changes use a hybrid local approach.

The user experience difference is immediate: Cursor and Copilot require a setup phase with progress indicators. Claude Code requires nothing. That zero-friction start is not just UX polish — it reflects the architecture. There is genuinely nothing to build.

Q: What is the difference between Claude Code and Cursor indexing? A: Cursor proactively indexes your codebase using tree-sitter chunking and vector embeddings stored in Turbopuffer, updated incrementally via Merkle tree sync. Claude Code does not index at all — it searches on demand using grep-style exact-match tools. Cursor wins on semantic and conceptual search; Claude Code wins on precision, freshness, and zero setup time.


Why Anthropic Chose Grep Over Embeddings

Q: Why doesn't Claude Code use RAG? A: Claude Code's creator Boris Cherny explained on Hacker News that early versions did use RAG with a local vector database, but the team found agentic search consistently outperformed it. The main reasons: precision (grep finds exact matches, embeddings introduce fuzzy positives), simplicity (no index to build or maintain), freshness (a pre-built index drifts from code during active editing), and privacy (no data leaves the machine for embedding computation).

The precision argument is the strongest one for code specifically. createD1HttpClient either appears in a file or it does not. There is no fuzzy positive. Vector embeddings can surface "conceptually adjacent" code that shares no tokens with the target symbol — and in a coding context, conceptual adjacency without textual match is usually noise, not signal.

There is also academic validation. An Amazon Science paper published February 2026 (arXiv 2602.23368, "Keyword Search Is All You Need") ran a systematic comparison of RAG against agentic keyword search across retrieval tasks and found that keyword search via agentic tool use achieves over 90% of RAG-level performance without a vector database. The benchmark focused on document Q&A rather than code navigation specifically — but the principle that exact-match retrieval with iterative refinement competes with semantic search holds in the code context where symbols are precise by definition.

Anthropic's own engineering blog makes the philosophical case explicit. Their September 2025 post, "Effective Context Engineering for AI Agents", states: "Good context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome." The "just in time" framing is key — agents should maintain lightweight identifiers (file paths, function names) and load data at runtime rather than pre-loading a large static context.

The December 2024 "Building Effective Agents" post reinforces this: "The most successful implementations use simple, composable patterns rather than complex frameworks." The basic building block is an LLM enhanced with retrieval, tools, and memory — but critically, with the model generating its own search queries rather than receiving pre-retrieved context.

Four specific objections drove the RAG abandonment decision. Security: an index stored somewhere is a target; Cursor's path masking adds cryptographic complexity that Claude Code avoids entirely. Privacy: embeddings of proprietary code leak information even as dense vectors; research on embedding inversion has shown partial text recovery in some settings. Staleness: an index built at session start is stale as soon as the first file changes. Reliability: every additional system is a failure point; vector DBs have latency spikes, embedding APIs have rate limits, sync pipelines have bugs.


The Real Costs: Token Burn and the Semantic Miss

The strongest published critique of agentic search came from Milvus. Their argument: "Grep is a dead end that drowns you in irrelevant matches, burns tokens, and stalls your workflow. Without semantic understanding, it's like asking your AI to debug blindfolded." They propose their Claude Context vector MCP plugin as a hybrid fix, claiming 40% token reduction.

Milvus sells a vector database. That commercial interest is transparent and worth noting. It does not make the technical criticism wrong.

The token burn problem is real on common terms. Search useState across a React codebase and you will get hundreds of matches across dozens of files. Claude Code must either process all of them (expensive) or refine the query (adds turns). On codebases with inconsistent naming or high churn, the refinement loop can consume substantial context before reaching the target file.

The 200K token pricing cliff makes this worse when hit: any request exceeding 200K input tokens pays 2x on all tokens in that request, not just the excess. The jump from $3/million to $6/million is a hard threshold, not a gradual escalation — and agentic sessions on large codebases with vague prompts can hit it faster than expected.

The semantic miss problem is the other genuine limitation. Grep finds what you name. If createD1HttpClient was renamed buildGatewayClient six months ago, grep finds nothing. Vector embeddings preserve semantic relationships across renames — a real advantage on codebases with heavy refactoring history or cryptic abbreviation conventions.

In practice, Claude Code compensates by running multiple searches: "auth", "session", "token", "middleware", "jwt", "bearer" — triangulating toward the module rather than naming it directly. This multi-step reasoning is something static embedding retrieval cannot do (a vector DB returns its top-k hits and stops). But it costs more turns and more tokens than a single well-placed semantic query would.
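
That triangulation can be mimicked with a simple scoring pass: count how many probe terms each file matches and take the best scorer. A toy sketch with invented file contents:

```rust
use std::collections::HashMap;

// Triangulation sketch: instead of one exact symbol name, score each
// file by how many probe terms it contains and return the best match.
// File contents and probe terms here are invented examples.
fn triangulate<'a>(files: &'a HashMap<String, String>, terms: &[&str]) -> Option<&'a str> {
    files.iter()
        .map(|(path, body)| {
            let hits = terms.iter().filter(|t| body.contains(**t)).count();
            (path.as_str(), hits)
        })
        .filter(|(_, hits)| *hits > 0) // no hits at all: not a candidate
        .max_by_key(|(_, hits)| *hits)
        .map(|(path, _)| path)
}
```

Each probe is one more Grep turn in the real workflow, which is exactly the extra cost the paragraph above describes.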

Where agentic search wins clearly:

  • Exact symbol lookup — function names, class names, import paths are precise by definition
  • Active editing sessions — grep reads current filesystem state; no index can be as fresh
  • Security and privacy contexts — zero data leaves the machine
  • Well-named, medium-sized codebases — consistent naming discipline eliminates most semantic miss risk

Where proactive indexing wins:

  • Large monorepos — millions of lines where iterative grep exploration burns context faster than it narrows
  • Conceptual search — "find all places we handle authentication" without knowing exact symbol names
  • Unfamiliar codebases — when you cannot yet name what you are looking for, semantic similarity is more useful than exact match
  • Enterprise teams — persistent cross-session context without re-exploration cost

Q: Is agentic search better than RAG for code? A: For many workloads, yes. A February 2026 Amazon Science paper (arXiv 2602.23368) found keyword search via agentic tool use achieves over 90% of RAG-level performance without a vector database. For code specifically, exact-match search outperforms semantic retrieval on stable, well-named codebases because code symbols are precise. RAG's advantage is on conceptual search across large repos with inconsistent naming.


What Developers Built to Fill the Gap

The community response to Claude Code's no-index architecture is itself a data point. Developers who needed semantic search on top of agentic search built it as an MCP extension rather than switching tools.

Several projects emerged:

  • Claude Context (Milvus/Zilliz) — an MCP server adding vector-powered semantic search to Claude Code's tool set; the same Milvus that wrote the critique built the fix
  • claude-codebase-indexer — vector-based search with intelligent chunking as a Claude Code add-on
  • claude-code-project-index — a PROJECT_INDEX system for persistent architectural awareness across sessions
  • CocoIndex — real-time codebase indexing designed to work alongside any AI coding agent
  • ast-grep — structural search understanding code ASTs, not raw text; finds patterns like "all arrow functions returning a Promise" without exact symbol names

The architectural significance: Claude Code is simultaneously an MCP client (connecting to external tool servers like these) and an MCP server (exposing its own file editing and command execution tools to Claude Desktop, Cursor, and Windsurf). The MCP documentation describes both directions. The no-index architecture is not a closed position — it is a composable default. Vector search is a plugin away for anyone who needs it.

The community's response tells us who the current architecture serves well (developers on medium-to-large codebases with disciplined naming who need precision and privacy) and who it does not fully serve out of the box (teams working on large legacy systems with inconsistent conventions where conceptual search across sessions would save significant time).


Where This Is Going

Context windows keep expanding. Claude Sonnet 4.6 supports 1M tokens in beta. At that scale, the distinction between "indexing" and "just loading everything" starts to blur — a sufficiently large context window could theoretically hold a medium-sized codebase in its entirety.

There is a catch. NxCode's analysis of Opus 4.6 at 1M tokens documents a 17-point MRCR retrieval accuracy drop as context fills (93% at shorter contexts, 76% at 1M tokens). Large context is available but not free of quality degradation — models lose precision at the edges of their effective attention range. Loading an entire codebase into a context window does not guarantee the model uses that context accurately.

Three trajectories are running in parallel:

Agentic search improves its execution. Relace's parallel tool call result — 4x latency reduction by calling 4–12 tools simultaneously via RL-trained optimization — shows the sequential bottleneck can be engineered away. The fundamental approach stays the same; the execution gets more efficient. Expect Claude Code's own tool execution to move in this direction.

Hybrid architectures become the production consensus. The HN community thread on agentic vs. RAG in production reflects what practitioners are reaching for at enterprise scale: vector prefiltering to narrow candidates, followed by agentic confirmation. Faster first-query response from embeddings, precision and freshness from grep-based verification. Neither architecture alone is the final answer for the largest systems.

Context window economics change the calculus. With 1M token contexts and Anthropic's 81% cost reduction from prefix caching, the "loading an entire codebase is prohibitively expensive" constraint is weakening. Anthropic's principle — "just in time retrieval of the smallest possible set of high-signal tokens" — remains the right engineering philosophy, but the practical threshold for "too large to load" keeps rising.

What is not changing is Cherny's underlying bet. Claude Code is described by its creator as "a Unix utility, not a product". The design principle is "do the simple thing first": memory is a markdown file, prompt summarization is done simply, search is grep. Complexity is deferred until it is demonstrated to be necessary. The RAG experiment demonstrated it was not — at least not for the majority of workloads.

The one scenario where indexing becomes necessary is the scenario that is genuinely hard to grep: a monorepo at Google or Meta scale, with millions of files, multiple programming languages, decades of naming inconsistency, and teams who need to ask conceptual questions about code they have never read. That is a real workload. It is not the workload Claude Code was designed for.

For the rest — developers working on their own codebases, on team projects with shared naming conventions, on repositories they understand well enough to name what they are looking for — the agentic search approach holds. Grep is precise, fresh, and private. The model learns to search the way you would, because it has the same tools you do. And as 1M-token context windows become the baseline, the gap between "search" and "load everything" shrinks further — which means the principle Anthropic bet on (retrieve just in time, keep context tight, prefer simplicity) only becomes more relevant as the underlying capability improves.


References

Primary sources

Architecture and reverse engineering

Competitor architecture

Performance and benchmarks

Community tools

Two Paradigms of Multi-Agent AI: Rust Parallel Agents vs Claude Code Agent Teams

· 28 min read
Vadim Nicolai
Senior Software Engineer

TL;DR

Three multi-agent coordination positions, one codebase. A static Rust/Tokio fan-out assigns 20 agents at compile time with zero coordination overhead. A team.rs library implements the full Claude Code agent-teams model in pure Rust — TaskQueue, Mailbox, PlanGate, ShutdownToken — and the study pipeline now uses it to run a 2-step search→write flow with inter-worker messaging. Claude Code agent teams invert every assumption of static fan-out: dynamic task claiming, file-locked concurrency, full bidirectional messaging. The decision rule is one question: do your agents need to talk to each other? If no, tokio::spawn + Arc<T>. If yes: build team.rs, or use TeamCreate.

Multi-agent AI engineering has become a core discipline in production software development. The interesting question is no longer whether to build multi-agent systems. It is how — and specifically, which architectural pattern to reach for given the nature of the work. The clearest demonstration is that multiple fundamentally different paradigms live inside the same codebase.

When this article was first published, the comparison was binary: the Rust crate used bare tokio::spawn fan-out while Claude Code provided the coordination model. That binary is no longer accurate. The research crate now ships team.rs — a 641-line generic coordination library in pure Rust that implements the complete Claude Code agent-teams model. The codebase now demonstrates all three positions simultaneously.

Why Multi-Agent AI Systems Are Having a Moment in 2026

Agent papers grew from roughly 820 in 2024 to over 2,500 in 2025. Enterprise AI projects using multi-agent architectures reportedly reached 72% in 2025. LangGraph leads orchestration-framework adoption in the ecosystem; AutoGen and CrewAI follow. The concept has moved from research to production infrastructure faster than most practitioners anticipated.

What the research papers do not tell you is which architectural pattern to use. That is the gap this article closes.

Paradigm 1: Infrastructure-Owned Parallelism — The Rust/DeepSeek Approach

The research crate is a real Rust binary that fans out up to 20 parallel DeepSeek agents against Semantic Scholar, collects their outputs, and writes results to Cloudflare D1. Its architecture is aggressive in its simplicity.

The entry point (research/src/bin/research_agent.rs) exposes five subcommands: research (single agent), study (20 parallel agents over agentic-coding topics), prep (10 parallel agents over application-prep topics), enhance (10 agents per application section), and backend (20 agents for backend interview prep). Every subcommand follows the same pattern: define a static list of tasks, queue them, spawn workers, collect results.

The task list is a compile-time constant:

```rust
// research/src/study.rs — 20 topics, statically defined
pub const TOPICS: &[TopicDef] = &[
    TopicDef { slug: "tool-use-patterns", ... },
    TopicDef { slug: "react-agent-loop", ... },
    // ... 18 more
];
```

The task structure is fully known before the binary starts. There is no runtime negotiation over which agent handles which topic.

How the DeepSeek Tool-Use Loop Works in Rust

Each spawned agent runs the same inner loop, implemented in research/src/agent.rs. The loop is a direct implementation of the OpenAI-compatible function-calling protocol — without a Python SDK wrapper, without a framework abstraction layer:

```rust
// research/src/agent.rs — the agentic tool-use loop
impl DeepSeekAgent {
    pub async fn prompt(&self, user_prompt: String) -> Result<String> {
        let mut messages: Vec<Value> = vec![
            json!({"role": "system", "content": self.preamble}),
            json!({"role": "user", "content": user_prompt}),
        ];

        loop {
            let resp: Value = self.http
                .post(&format!("{}/v1/chat/completions", self.base_url))
                .bearer_auth(&self.api_key)
                .json(&body)
                .send().await?
                .json().await?;

            let finish_reason = resp["choices"][0]["finish_reason"]
                .as_str().unwrap_or("stop");

            match finish_reason {
                "tool_calls" => {
                    // Execute each requested tool, append results, loop again
                    messages.push(message.clone());
                    for call in calls {
                        let result = tool.call_json(args).await?;
                        messages.push(json!({
                            "role": "tool",
                            "tool_call_id": call_id,
                            "content": result,
                        }));
                    }
                }
                _ => {
                    // "stop" — return the final content
                    return message["content"].as_str().map(String::from)...;
                }
            }
        }
    }
}
```

The Tool trait that backs this loop uses async_trait and is simple by design:

```rust
#[async_trait]
pub trait Tool: Send + Sync {
    fn name(&self) -> &str;
    fn definition(&self) -> ToolDefinition;
    async fn call_json(&self, args: Value) -> Result<String>;
}
```

Tools register their own JSON Schema via definition(), and the agent loop dispatches by name. In the research crate, the tools are search_papers (Semantic Scholar API) and get_paper_detail. Agents in the study subcommand use both tools for paper lookup; agents in the prep subcommand run without tools — direct chat completions for speed, because their task structure does not require external lookups.
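
The dispatch-by-name pattern can be shown in a synchronous miniature. The real trait is async via `async_trait`; this blocking version only illustrates the registry shape, and `SearchPapers` here is an invented stand-in, not the crate's implementation:

```rust
use std::collections::HashMap;

// Synchronous sketch of name-based tool dispatch. The real trait is
// async; blocking keeps this example self-contained.
trait Tool {
    fn name(&self) -> &str;
    fn call_json(&self, args: &str) -> String;
}

// Invented stand-in for a real tool like search_papers.
struct SearchPapers;
impl Tool for SearchPapers {
    fn name(&self) -> &str { "search_papers" }
    fn call_json(&self, args: &str) -> String {
        format!("results for {}", args) // placeholder for the API call
    }
}

// The agent loop looks up the tool the model named and invokes it.
fn dispatch(tools: &HashMap<String, Box<dyn Tool>>, name: &str, args: &str) -> Option<String> {
    tools.get(name).map(|t| t.call_json(args))
}
```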

Spawning Parallel Agents with Tokio

The prep pipeline still demonstrates the flat fan-out pattern — no inter-worker communication, no dependency graph. The APPLICATION_TOPICS path is the cleanest example of infrastructure-owned parallelism:

```rust
// research/src/study.rs — run_prep()
let queue: TaskQueue<TopicDef> = TaskQueue::new();
for topic_def in APPLICATION_TOPICS {
    queue.push(topic_def.slug, *topic_def, vec![], 2).await;
}

let mailbox = Mailbox::new();
let (_shutdown_tx, shutdown) = shutdown_pair();
let summary = TeamLead::new(APPLICATION_TOPICS.len())
    .run(queue, mailbox, shutdown, move |ctx, task| {
        let api_key = Arc::clone(&api_key);
        let d1 = Arc::clone(&d1);
        let topic_def = task.payload;
        async move {
            info!(worker = %ctx.worker_id, topic = topic_def.slug, "Prep agent starting");
            let row = run_direct_agent(topic_def, &api_key).await?;
            d1.insert_study_topic(&row)
                .await
                .with_context(|| format!("D1 insert failed for {}", topic_def.slug))?;
            info!(worker = %ctx.worker_id, topic = topic_def.slug, "Saved to D1");
            Ok::<(), anyhow::Error>(())
        }
    })
    .await;
```

No mailbox communication between workers. No dependencies. Each worker reads its own topic, makes its own API call, writes its own row to D1. This is the flat fan-out case expressed through the team abstraction — functionally equivalent to a bare tokio::spawn loop, but with retry, idle notifications, and cooperative shutdown included for free.

Shared state is wrapped in Arc<T> and cloned cheaply into each task. A Tokio task carries roughly 64 bytes of overhead and spawns in sub-microsecond time. Spinning up 20 agents adds negligible latency to program startup.
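
The cheap-clone pattern is easy to demonstrate. This sketch uses std threads in place of Tokio tasks — the ownership pattern is identical, and it avoids needing an async runtime here. Each worker receives a pointer clone of the shared data, never a deep copy:

```rust
use std::sync::Arc;
use std::thread;

// Fan out N workers over one shared, read-only value. Arc::clone bumps a
// reference count; the underlying Vec is never copied. std threads stand
// in for tokio::spawn purely to keep the sketch runtime-free.
fn fan_out(shared: Arc<Vec<String>>, workers: usize) -> usize {
    let handles: Vec<_> = (0..workers).map(|_| {
        let shared = Arc::clone(&shared); // refcount bump, not a data copy
        thread::spawn(move || shared.len())
    }).collect();
    // each worker reports how many items it can see; sum as a sanity check
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```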

The Third Path: Implementing Agent-Teams Primitives in Rust

After observing the mismatch between "static fan-out is too rigid for the study pipeline" and "spinning up a full Claude session per research topic is too expensive," the research crate grew a third position: research/src/team.rs, a 641-line Rust coordination library that implements the complete Claude Code agent-teams model natively.

This is not an accidental similarity. The module-level doc comment states the goal explicitly, mapping every agent-teams concept to its Rust equivalent:

| Agent-teams concept | team.rs equivalent |
|---|---|
| Shared task list | `TaskQueue<P>` |
| Atomic task claiming | `TaskQueue::claim` |
| Task dependencies | `depends_on` in `TaskQueue::push` |
| Retry on failure | `max_attempts` + re-queue on fail |
| Queue change notification | `TaskQueue::notify_handle` |
| Lead / worker separation | `TeamLead` + `TeamContext` |
| Worker identity | stable `worker-NN` IDs |
| Peer discovery | `ctx.peer_ids` in `TeamContext` |
| Point-to-point message | `Mailbox::send` |
| Broadcast to all teammates | `Mailbox::broadcast` |
| Idle notifications | worker → team-lead inbox on exit |
| Plan approval gate | `PlanGate` |
| Cooperative shutdown | `ShutdownToken` / `shutdown_pair` |

Every concept from the Claude Code agent-teams documentation has a direct Rust/Tokio equivalent. The target audience is clear: engineers who need the coordination semantics of agent teams but cannot or will not run a full Claude session per task — whether because of WASM constraints, cost at scale, or infrastructure ownership requirements.

The full implementation is in research/src/team.rs.

TaskQueue — Atomic Claiming with Dependency Support

TaskQueue<P> is generic over the task payload type. Its claim() method is the coordination core — it holds the mutex for the full claim operation, computes which tasks have their dependencies satisfied, and claims the lowest available ID:

```rust
// research/src/team.rs — TaskQueue::claim
pub async fn claim(&self, worker: &str) -> Option<(TaskId, String, P)> {
    let mut s = self.inner.lock().await;
    let done: HashSet<TaskId> = s.tasks.values()
        .filter(|t| t.status == TaskStatus::Completed)
        .map(|t| t.id)
        .collect();
    let id = s.tasks.values()
        .filter(|t| {
            t.status == TaskStatus::Pending
                && t.depends_on.iter().all(|d| done.contains(d))
        })
        .map(|t| t.id)
        .min()?; // lowest ID wins (ID-order preference)
    let task = s.tasks.get_mut(&id).unwrap();
    task.status = TaskStatus::Claimed(worker.into());
    task.attempts += 1;
    Some((id, task.name.clone(), task.payload.clone()))
}
```

Tasks are pushed with an explicit dependency list:

// research/src/team.rs — TaskQueue::push
pub async fn push(
&self,
name: impl Into<String>,
payload: P,
depends_on: Vec<TaskId>, // IDs that must be Completed before this can be claimed
max_attempts: u32,
) -> TaskId {
let mut s = self.inner.lock().await;
let id = s.next_id;
s.next_id += 1;
s.tasks.insert(id, TaskEntry {
id,
name: name.into(),
payload,
status: TaskStatus::Pending,
depends_on,
attempts: 0,
max_attempts,
});
id
}

Failure handling re-queues the task as Pending if attempts remain, permanently marks it Failed otherwise, and notifies idle workers via Notify:

// research/src/team.rs — TaskQueue::fail
pub async fn fail(&self, id: TaskId) {
{
let mut s = self.inner.lock().await;
if let Some(t) = s.tasks.get_mut(&id) {
if t.attempts >= t.max_attempts {
warn!(task = %t.name, attempts = t.attempts, "Task permanently failed");
t.status = TaskStatus::Failed;
} else {
warn!(task = %t.name, attempt = t.attempts, max = t.max_attempts,
"Task failed — re-queuing for retry");
t.status = TaskStatus::Pending;
}
}
}
self.changed.notify_waiters();
}

Mailbox — Point-to-Point and Broadcast Messaging

The Mailbox is an Arc-wrapped HashMap<String, VecDeque<Envelope>> — named inboxes, FIFO order. Any string can be an inbox name: worker IDs, task slugs, topic slugs. From the doc comment:

Workers write to named inboxes and read from their own. The inbox name can be a worker ID, a task name, a topic slug — any agreed-upon key. This mirrors the agent-teams mailbox where teammates message each other directly without going through the lead.

Point-to-point send:

// research/src/team.rs — Mailbox::send
pub async fn send(
&self,
from: impl Into<String>,
to: impl Into<String>,
subject: impl Into<String>,
body: impl Into<String>,
) { ... }

Broadcast delivers the same message to every recipient in the slice:

// research/src/team.rs — Mailbox::broadcast
pub async fn broadcast(
&self,
from: impl Into<String>,
recipients: &[&str],
subject: impl Into<String>,
body: impl Into<String>,
) {
let from = from.into();
let subject = subject.into();
let body = body.into();
for recipient in recipients {
self.send(from.clone(), *recipient, subject.clone(), body.clone()).await;
}
}

Blocking receive parks the task until a message arrives:

// research/src/team.rs — Mailbox::recv_wait
pub async fn recv_wait(&self, inbox: &str) -> Envelope {
loop {
if let Some(env) = self.recv(inbox).await {
return env;
}
self.notify.notified().await;
}
}

The Envelope struct carries a monotonic message ID, sender, recipient, subject, and body (plain text or JSON):

// research/src/team.rs
pub struct Envelope {
pub id: u64,
pub from: String,
pub to: String,
pub subject: String,
pub body: String,
}

TeamLead::run() — The Worker Driver

TeamLead holds two fields: worker_count and idle_poll_ms. Its run() method is fully generic — the task payload type, return type, and worker closure are all type parameters:

// research/src/team.rs — TeamLead::run (signature)
pub async fn run<P, R, F, Fut>(
&self,
queue: TaskQueue<P>,
mailbox: Mailbox,
shutdown: ShutdownToken,
worker_fn: F,
) -> QueueSummary
where
P: Send + Clone + 'static,
R: Send + 'static,
F: Fn(TeamContext<P>, WorkerTask<P>) -> Fut + Send + Sync + Clone + 'static,
Fut: std::future::Future<Output = anyhow::Result<R>> + Send,

Each worker loop checks shutdown, claims tasks, invokes the worker closure, and handles success or failure. When idle, workers wait on a Notify handle rather than busy-polling:

// research/src/team.rs — worker loop inside TeamLead::run
loop {
if shutdown.is_cancelled() {
info!(worker = %worker_id, "Shutdown requested — exiting");
break;
}

match queue.claim(&worker_id).await {
Some((id, name, payload)) => {
info!(worker = %worker_id, task = %name, "Claimed task");
let ctx = TeamContext {
worker_id: worker_id.clone(),
peer_ids: peer_ids.clone(),
queue: queue.clone(),
mailbox: mailbox.clone(),
shutdown: shutdown.clone(),
};
let task = WorkerTask { id, name: name.clone(), payload };
match worker_fn(ctx, task).await {
Ok(_) => queue.complete(id).await,
Err(e) => {
tracing::error!(worker = %worker_id, task = %name, "Task failed: {e}");
queue.fail(id).await;
}
}
}
None => {
if queue.all_done().await {
info!(worker = %worker_id, "All tasks done — idle");
break;
}
let notify = queue.notify_handle();
tokio::select! {
_ = notify.notified() => {}
_ = tokio::time::sleep(Duration::from_millis(idle_poll_ms)) => {}
}
}
}
}

// Idle notification — mirrors agent-teams: "teammates notify the lead when they finish"
mailbox.send(
&worker_id,
"team-lead",
"idle",
format!("{worker_id} idle — queue: {} pending, ...", summary_snapshot.pending),
).await;

Workers send an "idle" message to the "team-lead" inbox on exit. This mirrors the agent-teams behavior where teammates automatically notify the lead when they finish.

Peer Discovery via TeamContext

Each worker receives a TeamContext containing its own ID, the list of all peer IDs, the shared queue, the shared mailbox, and the shutdown token:

// research/src/team.rs
pub struct TeamContext<P: Clone + Send + 'static> {
pub worker_id: String,
/// IDs of all other active workers — mirrors agent-teams members array.
pub peer_ids: Vec<String>,
pub queue: TaskQueue<P>,
pub mailbox: Mailbox,
pub shutdown: ShutdownToken,
}

peer_ids is computed by TeamLead::run() before spawning. Each worker gets all IDs except its own:

// research/src/team.rs — inside TeamLead::run
let all_ids: Vec<String> = (1..=self.worker_count)
.map(|i| format!("worker-{:02}", i))
.collect();

// per-worker:
let peer_ids: Vec<String> = all_ids.iter()
.filter(|id| *id != &worker_id)
.cloned()
.collect();

Workers can address each other directly via ctx.mailbox.send(&ctx.worker_id, peer_id, ...) using ctx.peer_ids as the address book — the exact same model as the agent-teams members array.

Cooperative Shutdown via ShutdownToken

The ShutdownToken uses a watch::channel — the lead's sender writes true to signal shutdown, and each worker checks the value between task iterations, never inside task execution:

// research/src/team.rs
#[derive(Clone)]
pub struct ShutdownToken(watch::Receiver<bool>);

impl ShutdownToken {
pub fn is_cancelled(&self) -> bool { *self.0.borrow() }
}

pub struct ShutdownSender(watch::Sender<bool>);

impl ShutdownSender {
pub fn shutdown(&self) { let _ = self.0.send(true); }
}

pub fn shutdown_pair() -> (ShutdownSender, ShutdownToken) {
let (tx, rx) = watch::channel(false);
(ShutdownSender(tx), ShutdownToken(rx))
}

From the doc comment: "Workers poll is_cancelled() between task iterations. They always finish their current task before checking — matching the agent-teams behaviour: 'teammates finish their current request before shutting down'." Workers are never cancelled mid-flight; the shutdown is cooperative.

PlanGate — Plan Approval Gate

PlanGate is the Rust equivalent of Claude Code's plan approval flow. Workers call submit_and_wait() and block on a oneshot::Receiver. The lead calls approve() or reject(), which sends on the oneshot::Sender and unblocks the worker:

// research/src/team.rs — PlanGate
pub async fn submit_and_wait(&self, worker_id: &str, plan: &str) -> PlanDecision {
let (tx, rx) = tokio::sync::oneshot::channel();
info!(worker = %worker_id, "Plan submitted, awaiting approval");
self.pending.lock().await.insert(
worker_id.into(),
PlanEntry { plan: plan.into(), tx },
);
self.notify.notify_waiters();
rx.await.unwrap_or(PlanDecision::Rejected { feedback: "Gate dropped".into() })
}

pub async fn approve(&self, worker_id: &str) {
if let Some(e) = self.pending.lock().await.remove(worker_id) {
info!(worker = %worker_id, "Plan approved");
let _ = e.tx.send(PlanDecision::Approved);
}
}

pub async fn reject(&self, worker_id: &str, feedback: &str) {
if let Some(e) = self.pending.lock().await.remove(worker_id) {
warn!(worker = %worker_id, "Plan rejected");
let _ = e.tx.send(PlanDecision::Rejected { feedback: feedback.into() });
}
}

The minimal example from the module's doc comment shows the full API surface in a dozen lines:

// research/src/team.rs — doc example
let queue: TaskQueue<String> = TaskQueue::new();
queue.push("greet", "hello".into(), vec![], 2).await;

let mailbox = Mailbox::new();
let (_sd_tx, shutdown) = shutdown_pair();

let summary = TeamLead::new(2)
.run(queue, mailbox, shutdown, |_ctx, task| async move {
println!("{}: {}", task.name, task.payload);
Ok::<(), anyhow::Error>(())
})
.await;

assert_eq!(summary.completed, 1);

The 2-Step Mailbox Pipeline: search→write via Mailbox

The study pipeline is where team.rs coordination replaces the old static fan-out. For each of the 20 agentic-coding topics, the pipeline queues two dependent tasks: a Search task that queries Semantic Scholar and deposits findings into the mailbox, and a Write task that reads those findings and generates the study guide.

The ResearchTask Enum

// research/src/study.rs
#[derive(Clone)]
enum ResearchTask {
Search(TopicDef),
Write { topic: TopicDef, category: &'static str },
}

The old run_single_agent() function — which handled the full research-and-write pipeline in one agent — has been replaced by two phase-specific functions: search_topic_papers() (runs the tool-use agent with SearchPapers and GetPaperDetail tools, returns raw findings as markdown) and write_study_guide() (pure-completion agent, no tools, receives findings string, returns a StudyTopicRow). Only the search phase needs the Semantic Scholar API; the write phase is deterministic given the findings.

Queuing Paired Tasks with Dependencies

For each topic, run_topics() pushes two tasks. The write:{slug} task carries the search task's ID in its depends_on list, so TaskQueue::claim cannot return it until the paired search task is completed:

// research/src/study.rs — run_topics()
let queue: TaskQueue<ResearchTask> = TaskQueue::new();
for topic_def in topics {
let search_id = queue
.push(
format!("search:{}", topic_def.slug),
ResearchTask::Search(*topic_def),
vec![], // no dependencies
2, // max 2 attempts
)
.await;
queue
.push(
format!("write:{}", topic_def.slug),
ResearchTask::Write { topic: *topic_def, category },
vec![search_id], // blocked until search completes
2,
)
.await;
}

For 20 topics, this pushes 40 tasks total. The queue enforces that no write:{slug} task can be claimed until its paired search:{slug} is completed.

The TeamLead::run() Call

The old bare tokio::spawn loop is replaced by TeamLead::new(topics.len()).run(...). The number of workers equals the number of topics, so search and write tasks for different topics can overlap even while write tasks within one topic block on their own search:

// research/src/study.rs — run_topics()
let mailbox = Mailbox::new();
let (_shutdown_tx, shutdown) = shutdown_pair();
let summary = TeamLead::new(topics.len())
.run(queue, mailbox, shutdown, move |ctx, task| {
let api_key = Arc::clone(&api_key);
let scholar = Arc::clone(&scholar);
let d1 = Arc::clone(&d1);
async move {
match task.payload {
ResearchTask::Search(topic) => {
info!(worker = %ctx.worker_id, topic = topic.slug, "Search phase starting");
let findings = search_topic_papers(topic, &scholar, &api_key).await?;
ctx.mailbox
.send(&ctx.worker_id, format!("findings:{}", topic.slug), "paper-findings", findings)
.await;
info!(worker = %ctx.worker_id, topic = topic.slug, "Search phase done, findings in mailbox");
}
ResearchTask::Write { topic, category } => {
info!(worker = %ctx.worker_id, topic = topic.slug, "Write phase starting");
let env = ctx.mailbox.recv_wait(&format!("findings:{}", topic.slug)).await;
let row = write_study_guide(topic, category, &env.body, &api_key).await?;
d1.insert_study_topic(&row)
.await
.with_context(|| format!("D1 insert failed for {}", topic.slug))?;
info!(worker = %ctx.worker_id, topic = topic.slug, "Saved to D1");
}
}
Ok::<(), anyhow::Error>(())
}
})
.await;

The mailbox inbox name convention is findings:{slug}. The search worker sends to that inbox; the write worker calls recv_wait(&format!("findings:{slug}")), blocking until the message is available. Task dependency in the queue guarantees the Write task cannot even be claimed until Search completes, so recv_wait unblocks quickly in practice — but the mailbox blocking provides a safety net if the dependency graph and the mailbox ever get slightly out of sync.

Before team.rs existed, the study.rs pipeline was the opposite of this: isolated agents, no inter-worker communication, outputs collected after the fact from D1. Adding the mailbox turned independent parallel agents into a coordinated pipeline where one worker's output is another's input — exactly the pattern the Claude Code agent-teams SendMessage primitive enables.

Paradigm 2: Platform-Managed Agent Teams — The Claude Code Approach

Claude Code's experimental agent teams feature inverts every architectural assumption of static fan-out. Where the Rust system owns its concurrency at the OS level, Claude teams delegate coordination to the platform. Where Rust pre-assigns tasks via a queue, Claude teams use a shared task list with file-locked claiming at runtime. Where the flat Rust fan-out has isolated agents, Claude teammates send messages to each other directly.

The feature is enabled in the nomadically.work repo via .claude/settings.json:

{
"env": {
"CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
}
}

That one line unlocks five coordination primitives: TeamCreate, TaskCreate, TaskUpdate, TaskList, and SendMessage. Each teammate is a full, independent Claude Code session — a separate process with its own context window, its own file system access, and its own ability to write and run code.

How Claude Code Agent Teams Self-Organize

The team lead creates a shared task list stored in ~/.claude/tasks/{team-name}/. Teammates discover available tasks by calling TaskList, claim work by calling TaskUpdate (setting themselves as owner), and the platform uses file locking to prevent two teammates from claiming the same task simultaneously. When a teammate finds something unexpected, they send a direct message to the relevant peer via SendMessage — the lead does not need to relay it.

The nomadically.work repo uses Claude agent teams in its SDD (Spec-Driven Development) orchestrator. The /sdd:ff command spawns a team where spec-writing and design run in parallel — two teammates working simultaneously, each producing artifacts the other may need to reference. The key point is that these phases are not fully independent: a spec decision can change a design constraint. If that happens, the teammates can tell each other directly.

Teammate display cycles through sessions via Shift+Down in-process, or splits into panes in tmux or iTerm2. The team lead's conversation history does not carry to teammates — each starts fresh from the shared CLAUDE.md project context and the task description. This context isolation is a real overhead: a teammate may re-investigate something the lead already resolved, spending tokens to rediscover context the lead has but could not transfer.

Known limitations apply: no session resumption, task status can lag under contention, no nested teams (teammates cannot spawn sub-teams), and the lead is fixed for the team's lifetime. These are experimental constraints, not permanent design decisions — but they are real constraints today.

Comparing the Three Positions: A Decision Framework

These are not competing patterns converging toward the same solution. They occupy distinct positions on a coordination spectrum, each optimal for a different class of work.

The sharpest question to ask before choosing is: do your agents need to talk to each other?

If the answer is no — if you can define all tasks before the run starts, if each agent's output is independent, if a failure in one agent should not affect the scope of another — then flat fan-out is the right call. The run_prep() path in study.rs demonstrates this with TeamLead providing retry and shutdown for free, but no mailbox communication.

If the answer is yes — and you own your infrastructure, need deterministic concurrency control, or operate under WASM or cost constraints that make full Claude sessions per task unviable — then team.rs is the answer. You get the full coordination model (messaging, dependency graphs, plan gates, cooperative shutdown) at Tokio cost, not Claude token cost.

If the answer is yes — and you want the coordination model without writing it, prefer natural-language task definitions, need human-in-the-loop steering mid-run, or cannot afford the maintenance overhead of a custom coordination library — then Claude Code agent teams are the answer.

| Dimension | Rust flat fan-out | Rust with team.rs | Claude Code agent teams |
| --- | --- | --- | --- |
| Task assignment | Static, pre-queued | Dynamic, atomic claim() | Dynamic, file-locked claiming |
| Inter-agent communication | None | Mailbox::send / broadcast | Full bidirectional via SendMessage |
| Task dependency support | None | depends_on: Vec<TaskId> | Blocked/unblocked dependency graph |
| Task retry | Manual | max_attempts + re-queue | Platform-managed |
| Human-in-the-loop | Fire-and-forget | PlanGate::submit_and_wait | Direct message injection to any teammate |
| Cooperative shutdown | None | ShutdownToken / watch channel | Platform-managed |
| Concurrency overhead | ~64 bytes + sub-μs spawn | Same — TeamLead uses tokio::spawn internally | Full context window per teammate; token-linear scaling |
| Partial failure handling | Counter; peers continue | fail() re-queues within max_attempts | Failed teammate replaceable without aborting team |
| Task dynamism | Zero | Re-queue on failure; dependency graph changes effective availability | Tasks can be created, re-assigned, or cancelled at runtime |
| Observability | Structured logs (tracing) | Structured logs + QueueSummary + mailbox inbox counts | Teammate display modes (in-process, tmux, iTerm2) |
| Infrastructure ownership | Full | Full | Platform-managed |

Cost, Latency, and Observability Tradeoffs

The Rust crate's cost model is transparent regardless of which coordination layer you use. Workers make independent API calls to DeepSeek. Each call consumes tokens proportional to the agent's preamble, context, and tool results. Total cost is roughly N times the cost of a single agent — no platform overhead, no coordination messages, no duplicate context. The team.rs coordination layer is in-process Rust with zero token cost.

Claude agent teams cost more per the official documentation, though no specific multiplier is published. Each teammate carries its own full context window. Broadcast messages sent to all teammates multiply by team size. The official recommendation is 3–5 teammates with 5–6 tasks each — beyond that, coordination overhead accumulates faster than parallelism saves.

Latency follows the opposite pattern. The Rust system's wall-clock time is bounded by the slowest agent plus network latency — typically 30–90 seconds for 20 agents running fully parallel. The team.rs 2-step pipeline adds the mailbox handoff latency (sub-millisecond, in-process), which is negligible compared to LLM inference time. A Claude team doing the same breadth of work sequentially within a single session would take proportionally longer.

Operating these systems at production scale means understanding these tradeoffs, not just knowing the API.

When to Build Your Own vs Use Claude Code Agent Teams

Build infrastructure-owned concurrency (Rust team.rs, Python asyncio, TypeScript Promise.all) when:

  • Task structure is fully or partially known before execution starts
  • You need deterministic concurrency control with predictable retry behavior
  • You are running on constrained infrastructure (Cloudflare Workers, WASM) where a full agent session per task is not viable
  • Per-token cost matters at scale — flat API cost per agent, no platform overhead
  • Inter-agent communication is needed but full Claude sessions per worker are too expensive
  • You want compile-time type safety over agent payload shapes

Use Claude Code agent teams when:

  • The task is exploratory — agents may discover things that change the plan
  • Agents need to challenge or build on each other's reasoning in natural language
  • Task dependencies are dynamic — you cannot know the full task graph upfront
  • You want human steering capability mid-run without aborting the whole run
  • Orchestration code itself is a maintenance burden you want to avoid writing
  • Task definitions benefit from natural language rather than typed enum variants

The run_prep() path is an example of flat fan-out. The run_topics() pipeline is an example of team.rs coordination. The SDD orchestrator is an example of Claude agent teams. All three exist in the same codebase because the tasks they handle are structurally different — not because one pattern supersedes the others.

One nuance worth stating plainly: the static fan-out pattern is not Rust-specific. Python's asyncio.gather() and TypeScript's Promise.all() implement the same model. The Rust implementation is tied to the nomadically.work codebase; it is not an argument for Rust as the only language for this problem. The DeepSeek API is OpenAI-compatible; the tool-use loop in agent.rs could be ported to Python in an afternoon. The Rust choice reflects specific constraints — WASM compilation targets, type-safe JSON handling, and zero-cost abstractions for a system intended for Cloudflare Worker environments. Those are valid reasons; they are also not universal.

What This Means for the Future of AI-Powered Software Development

The three positions now occupy distinct points on a coordination spectrum that will remain relevant regardless of how individual frameworks evolve.

At one end: static fan-out, owned concurrency, zero coordination overhead, compile-time task structure. Maximally efficient for embarrassingly parallel work where the task graph is known. Gets faster as inference costs fall and async runtimes improve.

In the middle: owned-infrastructure coordination (team.rs or equivalent), dynamic task claiming, in-process messaging, cooperative shutdown, plan gates. Maximally efficient when you need coordination semantics but cannot pay full-session cost per worker. Gets easier to build as the primitives become better understood.

At the other end: platform-managed coordination, dynamic teams, full messaging infrastructure, runtime task discovery in natural language. Maximally flexible for exploratory work where the task graph emerges during execution. Gets cheaper as context window costs fall and team-size recommendations increase.

The emerging challenge — genuinely unsolved — is automated task structure detection: given a goal, should the system fan-out statically, build a team.rs-style queue, or stand up a full agent team? The agentic frameworks (Claude Agent SDK, OpenAI Agents SDK, LangGraph) are converging on common primitives for describing tasks and dependencies. But the decision of which concurrency model to use still requires human judgment about the nature of the work.

That judgment is increasingly a senior engineering skill — and it is what separates engineers who can operate these systems at production scale from those who merely know the API.


FAQ

What is the Rust equivalent of Claude Code agent teams? The team.rs module in the nomadically.work research crate implements full parity: TaskQueue replaces the shared task list, TaskQueue::claim handles atomic claiming, Mailbox::send and Mailbox::broadcast replace SendMessage, PlanGate implements the plan approval gate, and ShutdownToken (via tokio::sync::watch) handles cooperative shutdown. Every agent-teams primitive has a direct Rust/Tokio equivalent.

What is the difference between multi-agent orchestration and agent swarms? Orchestration implies a coordinator that assigns tasks to workers based on a defined structure — the coordinator knows the plan. Swarms imply emergent coordination where agents self-organize without a central planner. Claude Code agent teams are closer to orchestration (a lead agent coordinates); the team.rs library is also orchestration (a TeamLead drives the queue); the bare tokio::spawn fan-out is neither — it is static parallelism without ongoing coordination of any kind.

How does Claude Code agent teams pricing work? Each teammate is a full Claude session consuming its own token budget. The official documentation describes cost as higher than a single session, scaling linearly with team size. Broadcast messages multiply by team size. Targeted teammate-to-teammate messages add tokens to both sending and receiving contexts. No specific multiplier is published.

Can I run AI agents in parallel with Rust? Yes. For flat fan-out (no inter-agent communication needed), the tokio::spawn + Arc<T> pattern is idiomatic. Wrap shared clients in Arc, clone into each spawned task, collect JoinHandles, await results. For coordination (dynamic claiming, messaging, dependencies, retry), use TeamLead::new(n).run(queue, mailbox, shutdown, worker_fn) from research/src/team.rs. The overhead for either is approximately 64 bytes per task and sub-microsecond spawn latency — the team.rs coordination layer is in-process Rust with no additional cost.

How do I implement inter-agent messaging in Rust? Use a shared Mailbox: a Mutex<HashMap<String, VecDeque<Envelope>>> with a Notify for wake-up. Workers call mailbox.send(from, to, subject, body) to deposit messages into named inboxes; receivers call recv_wait(inbox) to block until a message arrives. For broadcast (send to all peers simultaneously), pass &ctx.peer_ids as recipients. Worker addresses (peer_ids) are pre-computed by TeamLead::run() so every worker can address peers directly without going through the lead.

What is cooperative shutdown in Tokio async Rust? Cooperative shutdown means workers finish their current task before stopping — they are never cancelled mid-flight. In Tokio, implement with watch::channel(false): the lead calls sender.send(true) to signal shutdown; each worker checks *receiver.borrow() between task iterations (not inside task execution). This matches the Claude Code agent-teams behavior where "teammates finish their current request before shutting down." The ShutdownToken / shutdown_pair() pattern in team.rs is a direct implementation of this.

How do I implement task dependencies in an async task queue? Store tasks as HashMap<TaskId, TaskEntry> behind a Mutex. Each TaskEntry has a depends_on: Vec<TaskId> field. claim() locks the queue, computes the set of completed IDs, and picks the lowest-ID pending task whose all dependencies are in that set. On complete() or fail(), call notify.notify_waiters() to wake idle workers blocked on queue.notify_handle().notified().await. Workers that go idle call tokio::select! on the notify handle and a poll timeout, then re-attempt claim() on wake.

What is a plan approval gate in multi-agent systems? A plan gate is a synchronization point where a worker submits its plan and blocks until the lead approves or rejects it — used to give a human or lead agent a chance to review before the worker makes irreversible changes. In Rust, implement with Mutex<HashMap<worker_id, oneshot::Sender<PlanDecision>>>: the worker calls submit_and_wait(plan) which inserts a oneshot channel sender and awaits the receiver. The lead calls approve(worker_id) or reject(worker_id, feedback), which sends on the channel and unblocks the worker. PlanGate in team.rs is a direct implementation.

What is DeepSeek's tool use API? DeepSeek's tool use (function calling) is an OpenAI-compatible API feature where the model returns structured tool_calls JSON when it needs external data. The caller executes the requested function, appends the result as a tool message, and calls the API again. This repeats until finish_reason == "stop". The agent.rs loop implements this directly in Rust without a framework dependency.

When should I use a multi-agent system instead of a single agent? When the task exceeds what a single context window can reliably hold, when subtasks can be parallelized for speed, or when different subtasks benefit from different system prompts or tool sets. Multi-agent overhead is only justified when the task structure genuinely benefits from it — for single-context tasks, a well-prompted single agent is faster and cheaper.

What Rust crates support async LLM agents? The rig crate from 0xPlaygrounds is the most actively maintained Rust LLM agent framework (supports OpenAI, Anthropic, Cohere, and others). async_openai provides lower-level async bindings. The research crate implements its own thin client (agent.rs) against the DeepSeek API directly, plus a full coordination layer (team.rs) — a valid approach when framework overhead outweighs the convenience.


Code samples are taken from research/src/agent.rs, research/src/study.rs, and research/src/team.rs and lightly condensed for readability; no logic has been altered.

Why Do AI Agents Keep Making the Same Mistakes?

· 8 min read
Vadim Nicolai
Senior Software Engineer

Every Claude Code session leaves a trace — tool calls made, files read, edits applied, errors encountered, and ultimately a score reflecting how well the task was completed. Most systems discard this history. We built an agent that mines it.

The Trajectory Miner is the first agent in our six-agent autonomous self-improvement pipeline for nomadically.work, a remote EU job board aggregator. Its job: analyze past sessions, extract recurring patterns and reusable skills, and feed structured intelligence to the rest of the team. It writes no code. It produces raw material that other agents — the Codebase Auditor, Skill Evolver, and Code Improver — consume.

The design draws from four research papers, curated from the VoltAgent/awesome-ai-agent-papers collection. Here is what each paper contributes and how we translated academic ideas into a working system.

Note: The implementation has since evolved from a generic trajectory mining agent into a goal-driven "Pipeline Monitor" focused on job search pipeline health. The research principles described here still underpin the architecture, but the agent's focus has shifted to domain-specific priorities. The data structures and patterns below reflect the original design that these papers informed.

The Stateless Agent Problem

Devin, SWE-agent, OpenHands, Cursor — every major AI coding agent starts each session with a blank slate. They have no memory of what worked yesterday, no record of which approaches failed last week, no institutional knowledge accumulated over hundreds of sessions. Gartner reported a 1,445% surge in multi-agent system inquiries from Q1 2024 to Q2 2025, yet almost none of these systems learn from their own history.

The result is predictable: agents repeat the same mistakes. They grep for patterns when they should trace imports. They edit files they haven't read. They propose fixes that were already tried and rejected. Research on trajectory reduction (AgentDiet) shows that "useless, redundant, and expired information is widespread in agent trajectories" — but the solution isn't just trimming waste. It's extracting what worked and making it available for next time.

Four Papers That Solved Pieces of the Puzzle

AutoRefine: Extracting Reusable Expertise from Trajectories

AutoRefine (Cao et al., 2025) addresses a fundamental inefficiency in LLM agents: they solve similar problems from scratch every time. The paper proposes extracting "reusable expertise" from successful agent trajectories — essentially distilling what worked into transferable knowledge.

The key insight is that agent trajectories contain implicit expertise that can be made explicit through structured extraction. Rather than replaying entire trajectories, AutoRefine identifies the decision points that mattered and the reasoning patterns that led to success.

How we used it: Our Trajectory Miner's Pattern Extraction phase directly implements AutoRefine's approach. When the agent reads past improvement suggestions from ~/.claude/state/improvements/, it clusters them into recurring patterns:

PATTERN: {
  id: "P-001",
  frequency: N,
  dimensions: [...],
  failure_types: [...],
  root_cause_cluster: "...",
  affected_targets: [...],
  example_sessions: [...],
  severity: "critical|high|medium|low",
  suggested_fix_type: "skill_update|prompt_edit|code_fix|architecture|config"
}

Each pattern must appear in at least two sessions to qualify as "recurring" — single occurrences are tracked as "incidents" but don't drive fixes. This threshold prevents overreacting to one-off anomalies, a practical constraint AutoRefine's paper doesn't explicitly address but that we found essential in production.
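As a concrete sketch, the clustering-and-threshold step might look like this in Python. The record fields (`root_cause`, `session_id`) and the `min_sessions` parameter are illustrative — the production miner runs as agent instructions, not standalone code:

```python
from collections import defaultdict

def mine_patterns(improvements, min_sessions=2):
    """Cluster improvement records by root cause and keep only recurring ones.

    `improvements` is a list of dicts like those under
    ~/.claude/state/improvements/ (field names here are hypothetical).
    """
    clusters = defaultdict(list)
    for record in improvements:
        clusters[record["root_cause"]].append(record)

    patterns, incidents = [], []
    for i, (cause, records) in enumerate(sorted(clusters.items()), start=1):
        sessions = sorted({r["session_id"] for r in records})
        entry = {
            "id": f"P-{i:03d}",
            "frequency": len(sessions),
            "root_cause_cluster": cause,
            "example_sessions": sessions,
        }
        # A pattern must recur across >= min_sessions distinct sessions;
        # single occurrences are logged as incidents and not acted on.
        (patterns if len(sessions) >= min_sessions else incidents).append(entry)
    return patterns, incidents
```

The split between `patterns` and `incidents` is the point: one-off anomalies stay visible for later review without ever triggering a fix.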

ProcMEM: Procedural Memory for LLM Agents

ProcMEM (Fang et al., 2025) tackles agent memory from a different angle. Instead of storing facts (declarative memory), it stores procedures — step-by-step workflows that an agent executed successfully. The paper demonstrates that agents with procedural memory significantly outperform those with only declarative memory on repeated tasks.

The paper's core mechanism is a memory system that saves successful action sequences in a structured format, indexed by the type of task they solved. When the agent encounters a similar task, it retrieves the relevant procedure and adapts it.

How we used it: The Trajectory Miner's Procedural Skill Extraction phase implements ProcMEM's idea. For sessions that scored above 0.85 on all dimensions, the agent extracts what worked:

SKILL: {
  id: "S-001",
  description: "What the agent did well",
  trigger: "When to apply this skill",
  steps: [...],
  tools_used: [...],
  context_requirements: [...]
}

The trigger field is critical — it defines when a future agent should recall this skill. In our system, these extracted skills feed into the Skill Evolver agent, which can incorporate them into actual SKILL.md files that all agents read. This closes the loop: good behavior gets codified into instructions.
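A minimal sketch of this extraction filter, assuming each session carries a per-dimension score dictionary (all field names here are hypothetical):

```python
def extract_skills(sessions, threshold=0.85):
    """Emit SKILL records from sessions that scored above `threshold`
    on every evaluation dimension (session schema is illustrative)."""
    qualifying = [
        s for s in sessions
        if all(score > threshold for score in s["scores"].values())
    ]
    return [
        {
            "id": f"S-{i:03d}",
            "description": s["summary"],
            # The trigger tells a future agent *when* to recall this skill.
            "trigger": s["task_type"],
            "tools_used": s["tools_used"],
        }
        for i, s in enumerate(qualifying, start=1)
    ]
```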

SWE-Replay: Recycling Trajectories at Critical Decision Points

SWE-Replay (Ning et al., 2026) focuses specifically on software engineering agents. Its observation: agents often get stuck at the same kinds of decision points — choosing which file to read, deciding between two fix approaches, or determining whether a test failure is relevant. The paper proposes identifying these "critical steps" and replaying successful trajectory fragments from prior sessions.

The innovation is not just replay but selective replay — knowing which moments in a trajectory are the high-leverage decision points where the right choice cascades into success and the wrong choice cascades into failure.

How we used it: The Trajectory Miner identifies Replay Candidates:

REPLAY: {
  stuck_session: "session_id",
  stuck_at: "description of the critical step",
  successful_pattern: "S-xxx",
  expected_improvement: "What would change"
}

This connects failing sessions to successful patterns. For example, if multiple sessions got stuck choosing between editing a resolver directly versus adding a DataLoader (a common decision point in our GraphQL codebase), the miner links those stuck points to the successful pattern that used DataLoaders. The downstream agents then know: when you hit this decision point, here's what worked before.
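The linking step can be sketched as a lookup from stuck decision points to previously extracted skills. Matching `stuck_at` against `trigger` by exact key is a simplification — the real agent reasons over free-text descriptions:

```python
def find_replay_candidates(stuck_sessions, skills):
    """Link sessions that stalled at a decision point to a successful
    skill whose trigger matches that point (schemas are illustrative)."""
    by_trigger = {skill["trigger"]: skill["id"] for skill in skills}
    candidates = []
    for session in stuck_sessions:
        skill_id = by_trigger.get(session["stuck_at"])
        if skill_id:
            candidates.append({
                "stuck_session": session["session_id"],
                "stuck_at": session["stuck_at"],
                "successful_pattern": skill_id,
            })
    return candidates
```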

Beyond Static Summarization: Proactive Self-Questioning

"Beyond Static Summarization" (Li et al., 2025) challenges the common practice of having agents produce flat summaries of their findings. Instead, it proposes that agents should ask themselves probing questions about their own analysis — a form of epistemic self-awareness.

The paper shows that agents that question their own conclusions produce more reliable analysis, catch their own biases, and flag genuine uncertainty rather than presenting everything with false confidence.

How we used it: The Trajectory Miner includes a mandatory Self-Questioning phase. For every pattern discovered, the agent must ask:

  • Is this a symptom or root cause?
  • Could this be caused by missing context rather than bad logic?
  • Is the fix in the skill instructions, the code, or the architecture?
  • Would this pattern disappear if a different model were used?
  • Is there a simpler explanation (e.g., truncated context)?

This prevents the most common failure mode we observed in early versions: the miner would identify a "pattern" that was actually just a side effect of context window truncation. The self-questioning catches this by forcing the agent to consider simpler explanations before proposing complex ones.
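One way to approximate this phase outside the agent loop is a checklist gate that demotes any pattern with a plausible simpler explanation. Both checks and all field names here are illustrative, not the production rule set:

```python
# Each check returns True when a simpler explanation may apply, in which
# case the pattern is demoted to "needs_review" instead of being promoted
# to a fix. Field names are hypothetical.
CHECKS = {
    "symptom_not_root_cause": lambda p: p.get("root_cause_cluster") is None,
    "possible_truncation": lambda p: p.get("any_session_truncated", False),
}

def self_question(pattern):
    flags = [name for name, check in CHECKS.items() if check(pattern)]
    return {
        "pattern_id": pattern["id"],
        "status": "needs_review" if flags else "actionable",
        "flags": flags,
    }
```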

How It Fits in a Six-Agent Pipeline

The Trajectory Miner is the first agent in the improvement pipeline:

mine → audit → evolve/apply (parallel) → verify

It reads from ~/.claude/state/improvements/ — JSON files generated by our stop_hook scoring system, which evaluates every Claude Code session on dimensions like task completion, tool efficiency, skill adherence, and routing accuracy. Sessions scoring below threshold get queued for analysis.

The miner's output — a structured mining report at ~/.claude/state/mining-report.json — becomes the input for two downstream agents:

  1. Codebase Auditor receives pattern IDs to investigate in the actual code
  2. Skill Evolver receives extracted skills to incorporate into agent instructions

The Meta-Optimizer coordinates this flow, deciding when to mine, what to prioritize, and whether the system is in an improvement phase or approaching saturation.
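The miner's I/O contract can be sketched as follows. The paths match the article; the record schema and the 0.7 threshold are assumptions for illustration:

```python
import json
from pathlib import Path

def run_miner(state_dir="~/.claude/state", threshold=0.7):
    """Read improvement records written by the stop hook, queue sessions
    scoring below `threshold`, and write a mining report. Directory layout
    follows the article; the JSON schema is hypothetical."""
    state = Path(state_dir).expanduser()
    queued = []
    for path in sorted((state / "improvements").glob("*.json")):
        record = json.loads(path.read_text())
        if record["overall_score"] < threshold:
            queued.append(record["session_id"])

    # The report becomes the input for the Codebase Auditor and Skill Evolver.
    report = {"queued_sessions": queued}
    (state / "mining-report.json").write_text(json.dumps(report, indent=2))
    return report
```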

What We Learned Building It

Most autonomous coding systems are stateless across sessions. Each invocation starts fresh, repeating mistakes and rediscovering solutions. The Trajectory Miner breaks this pattern by creating institutional memory — not as a monolithic knowledge base, but as structured patterns, procedures, and replay candidates that other agents can act on.

The key design choice was making the miner a pure analyst. It never writes code, never edits prompts, never makes decisions about what to fix. It only produces intelligence. This separation of concerns means it can be aggressive in its analysis without risk — the worst case is a false pattern that gets filtered out by downstream agents.

Seven rules govern its behavior, but the most important is rule 7: "Be skeptical — correlation is not causation." In a system designed to improve itself, the biggest risk is false positives that trigger unnecessary changes, creating churn instead of improvement. The miner's job is not to find everything — it's to find the patterns that are real.

The answer to "why do AI agents keep making the same mistakes" turns out to be simple: nobody built the memory system. The hard part isn't the mining — it's the discipline to only act on patterns that are real.

References

  1. "AutoRefine: From Trajectories to Reusable Expertise for Continual LLM Agent Refinement." arXiv preprint, 2026. https://arxiv.org/abs/2601.22758

  2. Fang, R., et al. "Mem^p: Exploring Agent Procedural Memory." arXiv preprint, 2025. https://arxiv.org/abs/2508.06433

  3. "SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents." arXiv preprint, 2026. https://arxiv.org/abs/2601.22129

  4. Yang, C., et al. "Beyond Static Summarization: Proactive Memory Extraction for LLM Agents." arXiv preprint, 2026. https://arxiv.org/abs/2601.04463


This article is part of a six-part series on building autonomous self-improvement agents, grounded in research from VoltAgent/awesome-ai-agent-papers. Data and implementation details from nomadically.work.

How I Built a UX Team with Claude Code Agent Teams

· 16 min read
Vadim Nicolai
Senior Software Engineer
TL;DR

Set CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in .claude/settings.json. Write a command file in .claude/commands/ and spawn prompts in .claude/team-roles/. Type /ux-team and three agents — UX Lead, UX Researcher, UI Designer — run in parallel: researcher defines personas and journeys, designer builds the component system, lead synthesizes into a spec. File ownership is enforced by persona, not by filesystem. BMAD Method v6 provides the Sally persona and a quality-gate checklist that runs before the spec is marked complete.

BMAD Method + Langfuse + Claude Code Agent Teams in Production

· 16 min read
Vadim Nicolai
Senior Software Engineer

Running AI agents in a real codebase means solving three intertwined problems at once: planning and quality gates (so agents don't drift), observability (so you know what's working), and orchestration (so multiple agents divide work without clobbering each other). In nomadically.work — a remote EU job board with an AI classification and skill-extraction pipeline — these problems are solved by three complementary systems: BMAD v6, Langfuse, and Claude Code Agent Teams. This article explains how each works and how they compose.