25 posts tagged with "LangGraph"

Building stateful, multi-step AI agents and workflows with LangGraph — graphs, reducers, checkpointing, and durable execution.

Closing the Loop: Evaluation, Debate, and Discovery

· 13 min read
Vadim Nicolai
Senior Software Engineer

The most stubborn bottleneck in autonomous knowledge graphs is not retrieval accuracy or latency — it is evaluation. Every edge inserted, every relationship inferred, every hypothesis proposed can be wrong, and the only way to know is to verify. But verification is itself becoming an agentic problem, and the 2026 literature is blunt about it: the evaluator must be as sophisticated as the generator. The question is no longer whether to close the loop but how — and the answer is a layered design that combines a deterministic rule engine, an agent-as-judge, multi-agent debate for contested edges, and autonomous discovery, all gated by a hard abstain-under-uncertainty rule.
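The layering described above can be sketched as a single routing function. This is a minimal illustration, not the article's implementation: `rules`, `judge`, and `debate` are hypothetical callables, and the `accept_at` and `reject_at` thresholds are placeholder assumptions rather than values from the series.

```python
from typing import Callable, Iterable, Optional

Edge = tuple  # hypothetical (head, relation, tail) triple


def evaluate_edge(
    edge: Edge,
    rules: Iterable[Callable[[Edge], bool]],
    judge: Callable[[Edge], float],
    debate: Callable[[Edge], Optional[bool]],
    accept_at: float = 0.8,   # assumed threshold, for illustration only
    reject_at: float = 0.2,   # assumed threshold, for illustration only
) -> str:
    """Return 'accept', 'reject', or 'abstain' for one candidate edge."""
    # Layer 1: deterministic rules act as a hard veto.
    for rule in rules:
        if not rule(edge):
            return "reject"

    # Layer 2: a judge scores the edge; clear cases stop here.
    score = judge(edge)
    if score >= accept_at:
        return "accept"
    if score <= reject_at:
        return "reject"

    # Layer 3: contested edges go to debate, which may fail to agree.
    verdict = debate(edge)
    if verdict is None:
        # Abstain under uncertainty instead of guessing.
        return "abstain"
    return "accept" if verdict else "reject"
```

The design choice worth noting is the third outcome: a contested edge with no consensus is neither written nor rejected, so it can be queued for a human or for later evidence.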

This is article #5, the final guardrail in the Autonomous Knowledge Graphs series. It closes the loop over the graph that #1 builds, #2 reasons over, #3 repairs, and #4 remembers — under the same fleet constraints, with a ≥ 0.80 eval bar on every prompt path and grounding-first provenance throughout.

The Graph as Agent Memory

· 15 min read
Vadim Nicolai
Senior Software Engineer

The graph as agent memory rejects the notebook metaphor. A notebook remembers what you wrote, but not when you believed it, nor when the fact itself was true. Flat vector stores and long-context transformers collapse time into a single present, and an agent that cannot distinguish "I knew this yesterday" from "this is still true today" is not reasoning — it is repeating. A bi-temporal knowledge graph — one that records both valid_at (when the fact held in the world) and recorded_at (when the agent ingested it) — turns memory from a static log into a navigable, revision-conscious archive where nothing is deleted and facts are superseded by stamping invalid_at.
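A minimal in-memory sketch of the bi-temporal idea may help. The field names `valid_at`, `recorded_at`, and `invalid_at` come from the text above; the `Fact` class and the `supersede`, `true_at`, and `known_by` helpers are hypothetical names introduced for illustration, and a real store would also need to record when each invalidation was learned.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_at: datetime                     # when the fact held in the world
    recorded_at: datetime                  # when the agent ingested it
    invalid_at: Optional[datetime] = None  # stamped on supersession


def supersede(facts: List[Fact], new_fact: Fact) -> List[Fact]:
    """Append new_fact and stamp invalid_at on the open fact it replaces.

    Nothing is deleted: the old fact stays in the list with an end time.
    """
    out = []
    for f in facts:
        same_slot = (f.subject, f.predicate) == (new_fact.subject, new_fact.predicate)
        if same_slot and f.invalid_at is None:
            f = replace(f, invalid_at=new_fact.valid_at)
        out.append(f)
    out.append(new_fact)
    return out


def true_at(facts: List[Fact], world_time: datetime) -> List[Fact]:
    """Facts that held in the world at world_time."""
    return [
        f for f in facts
        if f.valid_at <= world_time
        and (f.invalid_at is None or world_time < f.invalid_at)
    ]


def known_by(facts: List[Fact], cutoff: datetime) -> List[Fact]:
    """Facts the agent had ingested on or before cutoff."""
    return [f for f in facts if f.recorded_at <= cutoff]
```

Keeping the two queries separate is what lets an agent tell "I knew this yesterday" (`known_by`) apart from "this is still true today" (`true_at`).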

This is article #4 in the Autonomous Knowledge Graphs series. The lead and account graph from #1 doubles as the agent's long-term memory of an account across months of sessions, under the same fleet constraints: LangGraph control, Cloudflare D1 data, DeepSeek-only egress, grounding-first provenance, a ≥ 0.80 eval bar, and draft-first approval.

Self-Healing Knowledge Graphs: Graphs That Fix Themselves

· 13 min read
Vadim Nicolai
Senior Software Engineer

Provenance is not truth. A triple can be perfectly traced to a published source and still be wrong — contradicted by a later signal, inconsistent with the schema, or hallucinated by the model that extracted it. The industry has spent years building better provenance; the harder problem is what to do when provenance says the fact is sourced but the fact is still garbage. The sharpest 2026 statement of this is TGComplete, which finds that most gold-correct edges have no supporting passage even under exhaustive retrieval — so textual verification measures provenance, not correctness (Kang et al., 2026, arXiv:2606.15833).

This is article #3 in the Autonomous Knowledge Graphs series, and it is a guardrail. Where #1 builds the lead and account graph and #2 reasons over it, this article keeps it accurate over time. It obeys the same fleet constraints — LangGraph control, Cloudflare data, LangSmith observability, DeepSeek-only egress, a ≥ 0.80 eval bar, grounding-first provenance, draft-first approval — and runs as a background sweep over the stored graph.

Reasoning Over the Graph: From GraphRAG to Planning Agents

· 13 min read
Vadim Nicolai
Senior Software Engineer

Agentic GraphRAG treats the knowledge graph not as a static index to retrieve from once, but as a state space to reason over one node at a time. GraphRAG proved that structured knowledge could be retrieved at generation time — but a one-shot subgraph either drowns the LLM in irrelevant triples or misses the one critical edge. A question like "which board members at our lead's parent company also sit on our other accounts' boards?" is a sequence of decisions: which edge to follow, which node to expand, when to backtrack. That is a planning problem, and the 2026 research corpus has converged on agentic traversal to solve it.

This is article #2 in the Autonomous Knowledge Graphs series. It reasons over the lead and account graph that article #1 builds, and obeys the same fleet constraints: a LangGraph control plane, a Cloudflare D1 data plane, DeepSeek-only egress, grounding-first provenance, a ≥ 0.80 eval bar on every prompt path, and draft-first human approval. The worked example is an explainable lead-and-account recommendation: the agent returns not just an answer but the supporting sub-graph as evidence.

Autonomous Knowledge Graph Construction: Graphs That Build Themselves

· 16 min read
Vadim Nicolai
Senior Software Engineer

Autonomous knowledge graph construction is the pattern where one agent loop owns the entire lifecycle of a graph — read a source, search what is already known, verify a candidate fact, then write it — instead of running a one-shot batch extraction and hoping a later merge step cleans up the mess. The cleanest 2026 formulation is RAGA, which gives an LLM agent a CRUD toolset over the graph and constrains it with a Read-Search-Verify-Construct loop (Han & Cheng, 2026, arXiv:2605.17072).

This is the first article in a new series, Autonomous Knowledge Graphs, a connected five-part arc that climbs the same autonomy ladder as The Autonomous Sales Fleet — from human-curated graphs up to graphs that build, reason, repair, remember, and evaluate themselves. Every design in the series obeys the same fleet constraints: a LangGraph control plane, a Cloudflare D1 data plane, DeepSeek-only model egress through one gateway, a grounding-first record on every write — {confidence, reason, source, evidence} — and draft-first human approval at every irreversible step. The worked example throughout is a lead and account knowledge graph: the substrate the rest of the fleet reasons over.
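As a rough sketch of how a Read-Search-Verify-Construct pass might attach the four-field grounding record to every write: the `{confidence, reason, source, evidence}` fields are from the text above, while `Grounding`, `Edge`, `ingest`, the `verify` callable, and the `min_confidence` parameter are hypothetical names and assumptions, not RAGA's API or the fleet's code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


@dataclass(frozen=True)
class Grounding:
    confidence: float
    reason: str
    source: str
    evidence: str


@dataclass(frozen=True)
class Edge:
    triple: Triple
    grounding: Grounding


def ingest(
    graph: Dict[Triple, Edge],
    candidates: Iterable[Tuple[str, str, str, str, str]],
    verify: Callable[[Triple, str], Tuple[float, str]],
    min_confidence: float,  # assumed per-edge threshold, chosen by the caller
) -> List[Triple]:
    """Write verified, previously unknown candidates; return what was written."""
    written = []
    for head, relation, tail, source, evidence in candidates:  # Read
        triple = (head, relation, tail)
        if triple in graph:                                     # Search
            continue
        confidence, reason = verify(triple, evidence)           # Verify
        if confidence < min_confidence:
            continue
        graph[triple] = Edge(                                   # Construct
            triple, Grounding(confidence, reason, source, evidence)
        )
        written.append(triple)
    return written
```

Because the record is part of the edge type, a write without provenance cannot be constructed at all, which is one way to make grounding-first a structural property rather than a convention.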

Sales-Enablement Copilot: Deal Coaching & Objections

· 21 min read
Vadim Nicolai
Senior Software Engineer

The most effective sales-enablement copilot in our production fleet never sends a single message. That cuts against every vendor demo where a glowing AI drafts the perfect rebuttal and fires it off. This sales-enablement copilot does grounded deal coaching and objection handling, but in production the highest-leverage capability is not generation — it is holding fire. The agentic-sales fleet runs a LangGraph state machine where every objection-handling draft is stamped status='draft' and routed to a human for approval. The copilot coaches, suggests, and grounds its advice in company knowledge, but it never touches the send button. That single design choice turns a liability into an asset: the rep gets a grounded, auditable recommendation that she still owns.

On the fleet's autonomy ladder this capability sits deliberately in the middle: it is rep-assist, not self-direction. It automates the plan step (what grounded coaching and rebuttal a given objection deserves) but hands both act and verify to the human. The copilot drafts and grounds; the rep decides and sends. That is a conscious rung below the orchestrator and the lead-to-proposal multi-agent pipeline. The failure cost of an objection rebuttal, such as repeating a hallucinated compliance claim to a live prospect, is high enough that automating the send is not worth the risk.

This is article #4 in The Autonomous Sales Fleet series, and like every entry it adds exactly one capability as one real graph: a company-knowledge-grounded objection-handling copilot that feeds the reply path, backed by a faithfulness gate and a per-vertical playbook of 9 entries. It builds on the shared fleet introduced in An Autonomous CRM Orchestrator with LangGraph (#1) and the typed task sequencing of A Multi-Step Lead-Qualification and Sales-Support Agent (#2).

NL-to-SQL CRM Analytics on Cloudflare D1 + Self-Healing

· 22 min read
Vadim Nicolai
Senior Software Engineer

A sales operator types "how many fintech contacts replied last week?" and gets an answer. No one writes SQL. This is NL-to-SQL CRM analytics on Cloudflare D1: the text_to_sql graph translates the question, runs it on D1, and — when the query fails — heals itself from the database's own error message. That last move is the load-bearing idea behind the self-healing loop: the database is not a passive recipient of your SQL. It is the most honest verifier you have.

That inversion drives Evaluating Open-Source LLM Agents for SQL Generation and Structured Analytics on Relational Databases, by Borovčak, Bagić Babac, and Mornar in Computers, Materials & Continua (2026). You do not demand a perfect one-shot translation. You let the query run, read the error, and regenerate against that diagnostic. The error text is the repair signal. Execution accuracy, not string overlap, is the metric that counts. The 7 numbered findings below are the evidence, and they map onto a 7-node production graph.

This is article #5 of a 10-part series, "The Autonomous Sales Fleet" — one production LangGraph + DeepSeek + Cloudflare-D1 + LangSmith system. Each part realizes one 2026 paper as one real graph. This one is the text_to_sql graph in backend/graphs/text_to_sql_graph.py, one of 39 registered in the fleet. It answers questions over the 4 CRM tables in the Cloudflare D1 database lead-gen-jobs. It generates a SELECT, validates it against a hard read-only gate, runs it, and repairs its own failures up to 2 times. No write path is ever reachable.
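The generate, gate, run, and repair cycle described above can be sketched in a few lines. This is an illustration under stated assumptions, not the contents of `text_to_sql_graph.py`: Python's `sqlite3` stands in for D1, `generate` is a hypothetical callable in place of the model call, and `is_read_only` is a deliberately crude stand-in for the fleet's read-only gate.

```python
import sqlite3
from typing import Callable, List, Optional

MAX_REPAIRS = 2  # the text above states failures are repaired up to 2 times


def is_read_only(sql: str) -> bool:
    """Crude gate: accept only a single statement that starts with SELECT."""
    body = sql.strip().rstrip(";")
    return body.lower().startswith("select") and ";" not in body


def answer(
    question: str,
    generate: Callable[[str, Optional[str]], str],
    conn: sqlite3.Connection,
) -> List[tuple]:
    """Generate SQL, gate it, run it, and feed any error back to the generator."""
    error: Optional[str] = None
    for _ in range(1 + MAX_REPAIRS):  # one first attempt plus the repairs
        sql = generate(question, error)
        if not is_read_only(sql):
            error = "rejected: only a single SELECT statement is allowed"
            continue
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # The database's own message is the repair signal.
            error = str(exc)
    raise RuntimeError(f"gave up after {MAX_REPAIRS} repairs: {error}")
```

The gate runs before every execution, including repaired queries, so a regenerated statement cannot slip a write past it. A production gate would likely add stronger guarantees, such as a read-only connection.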

On the fleet's autonomy ladder this capability sits medium. It fully automates the plan→act span for a read-only analytics question. The graph translates intent to SQL, runs it, and heals its own failures with no human writing a query. The database's SELECT-only gate is what lets it act unattended. The operator reading the 1-to-2 sentence summary is the verify step. It earns that autonomy because the action space is structurally incapable of mutating data. A write-capable version would drop back down the ladder, behind human approval.

Two siblings frame this one. Article #1, Reason→Decompose→Act→Verify — an Autonomous CRM Orchestrator on LangGraph, reasons over signals and dispatches worker graphs. This graph answers the operator's question about the pipeline itself. Article #9, Evidence-Driven Release Gates for LLM Sales Agents, is the eval harness. It holds every prompt path here to the fleet's ≥0.80 bar before a change ships.

Lead Qualification Sequence: Chatbot to Sales Agent

· 26 min read
Vadim Nicolai
Senior Software Engineer

From Scripted Chatbot to Multi-Step Sales Agent: How to Build a Lead Qualification Sequence That Works

A multi-step lead qualification agent earns its autonomy by sequencing work no human queued: it decomposes an inbound signal into an ordered plan, grades each step against real data, and stops at a human-approval interrupt before anything ships. That is the line between a scripted chatbot and an agent — not a newer model or a sharper prompt, but a decision about who gets to sequence work. A chatbot automates a single turn; an agent automates the workflow that turn belongs to. On the fleet's autonomy ladder this capability sits high: it takes over the human plan step for an inbound lead — deciding which qualification and analysis tasks to run, and in what order — while every act stays a draft held for human verify.

The autonomy guard here is conservative by construction. The agent never sends; it composes, and the message is held as a pending draft behind a confirm-before-mutate interrupt, with a deterministic safety veto sitting upstream of the planner so a hostile or malformed plan can never reach a suppressed contact. That is the posture this article builds: reasoning is delegated, action is gated. Article #1's orchestrator dispatches into this qualifier; this is where the fleet first replaces a rep's "is this lead worth my time, and what do I do next?" judgement with a graded, auditable, draft-first sequence.

This is article #2 in The Autonomous Sales Fleet, a connected series describing one production agentic-sales system where each piece adds exactly one capability. The fleet shares a single architecture: a control plane of LangGraph StateGraphs, a data plane on Cloudflare (D1, Workers, Queues), and an observability plane of LangSmith tracing with per-graph golden datasets. Every LLM call exits through one DeepSeek endpoint behind a Cloudflare AI Gateway; no graph ships unless its golden dataset passes an eval gate; every persisted AI decision carries a four-field provenance record; and outreach is always draft-first, held for human approval. This article builds on The Autonomous CRM Orchestrator on LangGraph (#1) and connects forward to the Lead-to-Proposal Multi-Agent Pipeline (#3), which takes the qualified lead as a conceptual starting point.

The strongest evidence for constraining an agent the way this one does comes from AgentArch (Bogavelli, Sharma & Subramani, 2025), a benchmark of 18 agentic configurations across orchestration, prompt strategy, memory, and thinking-tool usage. It finds "significant model-specific architectural preferences" that break the one-size-fits-all assumption, with top models succeeding on only 35.3% of the complex enterprise task and 70.8% of the simpler one. When even the best configuration fails roughly two times in three on the hard task, an open-ended agent loop is a liability, and a closed, typed, narrow planner is the defensible bet. That is precisely the change this article walks through in a real email_orchestrator graph. Industry framing pieces such as Rai (2026) draw the same chatbot-versus-agent line conceptually; the engineering case rests on the indexed and canonical work cited below.

Lead-to-Proposal Multi-Agent Pipeline in LangGraph

· 25 min read
Vadim Nicolai
Senior Software Engineer

From Lead to Proposal: Building a Multi-Agent Pipeline with LangGraph

A lead-to-proposal pipeline in LangGraph runs an autonomous lead→proposal loop: a raw lead enters, and three specialized agents qualify it, research it from grounded facts, and draft a tailored proposal — every intermediate node executing unattended, with no sales rep between them. That is the whole point of decomposing the work into a multi-agent graph rather than one prompt. The loop earns its autonomy by stopping at exactly one place: a human gate on the send, the single action that carries legal and reputational weight.

That gate is what most implementations get wrong. They either automate everything and lose human oversight at the consequential step, or keep a human in every node and forfeit the throughput the automation was supposed to buy. The pipeline below takes neither path. It automates the expensive cognitive labour (qualify, research, draft) and holds the final verify for an operator, who approves a grounded draft rather than composing one from scratch. The bottleneck is not the proposal itself; it is everything upstream of it, and that is precisely what the loop absorbs.

Hierarchical Coach→Worker Delegation for Agent Teams

· 26 min read
Vadim Nicolai
Senior Software Engineer

A flat agent swarm caps its own autonomy. Let every worker talk to every peer with no leader tracking progress, and the system can run for hours without anyone — human or machine — able to say whether the work was actually done. That is the ceiling this article is about. Hierarchical coach→worker delegation raises it: a single coach plans once, delegates to specialized workers, and those workers act unattended against that one plan instead of re-improvising every step. The autonomy gain is not that more agents run; it is that one durable plan governs many executions over time, so the plan→act→verify loop stops being per-run and becomes a property of the whole campaign.

On the fleet's autonomy ladder this capability sits high. The coach automates the plan step across an entire multi-touch campaign — a sequence that unfolds over weeks, not a single run — and worker subgraphs act against that plan unattended, with the human verify preserved only at each draft's approval. This article grounds that argument in two flag-gated graphs from one production agentic-sales fleet: a campaign-level coach (AA02) and a single-email organized team (AA06). It connects both to the organized-teams paper by Guo et al. (2024) and to decades of organizational evidence. The constants, enums, and feature flags below are read from the code, not from a benchmark. The claim is contrarian because the zeitgeist says "swarm good, hierarchy bad." The evidence says the opposite.