20 posts tagged with "LangGraph"

Building stateful, multi-step AI agents and workflows with LangGraph — graphs, reducers, checkpointing, and durable execution.

Knowledge-Graph RAG for Explainable Lead and Account Recommendation

· 11 min read
Vadim Nicolai
Senior Software Engineer

If your first instinct on hearing "knowledge graph" is to reach for Neo4j, you may be over-engineering a lead recommendation system. The past year's research on combining knowledge graphs with retrieval-augmented generation (KG-RAG) for recommendations converges on a pragmatic insight: the most effective KG is often the one you already have. In this design, that is a normalized relational schema of companies, contacts, opportunities, and emails living in a Cloudflare D1 database. Traversing those foreign keys at query time, bounding the fan-out, and feeding the resulting subgraph into an LLM can produce recommendations that are grounded by construction — every explanation path is required to trace back to a real row in the operational store.
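A minimal sketch of the bounded foreign-key traversal described above. It assumes a hypothetical three-table schema (`companies`, `contacts`, `emails`) and uses Python's built-in `sqlite3` with an in-memory database as a stand-in for D1, which is SQLite-based. The table names, column names, and the `max_per_hop` parameter are illustrative and not taken from the article.

```python
import sqlite3

def build_subgraph(conn, company_id, max_per_hop=5):
    """Follow foreign keys outward from one company, capping rows per hop."""
    edges = []
    contacts = conn.execute(
        "SELECT id FROM contacts WHERE company_id = ? ORDER BY id LIMIT ?",
        (company_id, max_per_hop),
    ).fetchall()
    for (contact_id,) in contacts:
        # Every edge names real primary keys, so an explanation path can be
        # checked against the operational rows it came from.
        edges.append(("company", company_id, "employs", "contact", contact_id))
        emails = conn.execute(
            "SELECT id FROM emails WHERE contact_id = ? ORDER BY id LIMIT ?",
            (contact_id, max_per_hop),
        ).fetchall()
        for (email_id,) in emails:
            edges.append(("contact", contact_id, "sent", "email", email_id))
    return edges

# Hypothetical sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contacts (
    id INTEGER PRIMARY KEY,
    company_id INTEGER REFERENCES companies(id),
    name TEXT
);
CREATE TABLE emails (
    id INTEGER PRIMARY KEY,
    contact_id INTEGER REFERENCES contacts(id),
    subject TEXT
);
INSERT INTO companies VALUES (1, 'Acme');
INSERT INTO contacts VALUES (10, 1, 'Ada'), (11, 1, 'Bob'), (12, 1, 'Cy');
INSERT INTO emails VALUES (100, 10, 'Pricing'), (101, 10, 'Demo'), (102, 11, 'Intro');
""")

edges = build_subgraph(conn, company_id=1, max_per_hop=2)
for edge in edges:
    print(edge)
```

The `LIMIT` on each hop is what bounds the fan-out: the worst-case subgraph size depends on `max_per_hop` and the number of hops, not on how many rows the tables hold.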

LLM Lead Conversion-Propensity Scoring for B2B Lead Prioritization

· 12 min read
Vadim Nicolai
Senior Software Engineer

The published literature on lead scoring converges on a couple of recurring findings. A B2B feature-importance analysis identified lead source and lead status as the most predictive conversion features (Frontiers in AI, 2025). And a supervised classifier trained on labelled outcomes tends to beat both rule-based heuristics and manual qualification. Yet many B2B teams deploying an LLM for lead prioritisation skip the classifier, skip the labelled outcomes, and instead ask the model to reason its way to a score from contact evidence. Is that defensible, or is it cargo-cult AI?

LLM Sales-Email Intent Scoring for Inbound Lead Prioritization

· 10 min read
Vadim Nicolai
Senior Software Engineer

A practical LLM-based intent-scoring design can stay deliberately small: make a single call to a language model, read a few floating-point scores, and fall back to a keyword heuristic if the model fails. No multi-agent orchestration. No fine-tuned BERT. No LightGBM ensemble. The 2026 literature supports the trade: an LLM semantic scorer is reported to outperform keyword-based intent detection (Sanjei et al., 2026). The useful insight is that an effective design for sales-email intent scoring can also be one of the simplest: a bounded, schema-constrained LLM step embedded inside an existing dataflow graph, designed to fail open rather than cascade errors downstream. This article unpacks why that design is attractive, what the research actually says, and how to build it without over-engineering.
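One way the single-call, fail-open shape could look, as a sketch under stated assumptions. The `call_model` argument is a hypothetical callable that returns the model's raw JSON string. The keyword weights, the `intent` field name, and the 0 to 1 score range are illustrative and do not come from the article.

```python
import json

# Hypothetical keyword weights for the fallback heuristic.
KEYWORDS = {"pricing": 0.4, "demo": 0.4, "contract": 0.3, "unsubscribe": -0.6}

def keyword_score(text):
    """Fallback scorer: a neutral baseline nudged by keyword hits, clamped to [0, 1]."""
    lowered = text.lower()
    score = 0.5 + sum(w for k, w in KEYWORDS.items() if k in lowered)
    return {"intent": min(1.0, max(0.0, score)), "source": "keyword"}

def parse_scores(raw):
    """Validate the model output against a tiny schema; raise on any violation."""
    data = json.loads(raw)
    intent = data["intent"]
    if isinstance(intent, bool) or not isinstance(intent, (int, float)):
        raise ValueError("intent must be a number")
    if not 0.0 <= intent <= 1.0:
        raise ValueError("intent out of range")
    return {"intent": float(intent), "source": "llm"}

def score_email(text, call_model):
    """One model call; any failure falls open to the keyword heuristic."""
    try:
        return parse_scores(call_model(text))
    except Exception:
        return keyword_score(text)

def healthy_model(text):
    return '{"intent": 0.9}'

def broken_model(text):
    raise TimeoutError("simulated model outage")

def rambling_model(text):
    return "Sure! I think the intent is high."

email = "Can we book a demo next week?"
llm_result = score_email(email, healthy_model)
outage_result = score_email(email, broken_model)
malformed_result = score_email(email, rambling_model)
print(llm_result, outage_result, malformed_result)
```

The `source` field records which path produced the score, so downstream steps can tell a model verdict from a heuristic one without the failure ever propagating as an exception.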

Durable Execution in LangGraph: Agents That Survive Failure and Resume Where They Left Off

· 12 min read
Vadim Nicolai
Senior Software Engineer

Most AI agents are built as a single process holding state in memory: a while loop, local variables, maybe a sleep(). That holds up until the workflow has to outlive the process that started it — and in production it always does. The math is unforgiving: chain ten steps that each succeed 85% of the time and the whole run finishes only about 20% of the time (0.85¹⁰ ≈ 0.20). Without durability, every one of those failures restarts from scratch. The model might be reliable; the tool calls aren't. Better LLMs don't fix network failures — only durable execution does.
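The arithmetic above can be checked in a few lines. This sketch assumes step failures are independent and that retries succeed with the same probability each time. The function names are illustrative.

```python
def run_success_probability(step_success, steps):
    """Probability that every step succeeds in a single uninterrupted pass."""
    return step_success ** steps

def expected_full_runs(step_success, steps):
    """Expected number of from-scratch attempts until one pass fully succeeds."""
    return 1 / run_success_probability(step_success, steps)

def expected_step_executions_with_resume(step_success, steps):
    """Expected step executions when a failed step is retried in place."""
    return steps / step_success

p = run_success_probability(0.85, 10)
restarts = expected_full_runs(0.85, 10)
resumed = expected_step_executions_with_resume(0.85, 10)
print(f"single-pass success: {p:.3f}")
print(f"expected from-scratch attempts: {restarts:.2f}")
print(f"expected step executions with resume: {resumed:.2f}")
```

Under those assumptions, restarting from scratch needs about five full attempts on average, while resuming at the failed step needs about twelve step executions in total.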

The research consensus is that the infrastructure around the model, not the model itself, is where production agents live. The 2026 design-space analysis Dive into Claude Code found that only 1.6% of Claude Code's codebase is AI decision logic; the other 98.4% is operational infrastructure for context management, tool routing, and recovery. LangGraph's answer to that reality is durable execution through its persistence layer — making the agent a row in a checkpoint store, not a stack frame in a living process. This article dissects how that works, the sharp edges it creates, and how to observe a workflow that — by design — no longer runs as a single process.
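To make "a row in a checkpoint store, not a stack frame" concrete, here is a minimal sketch of the idea in plain Python with `sqlite3`. It is not LangGraph's persistence API. The `checkpoints` table, the `run_id` key, and the step functions are all hypothetical, and a real checkpointer handles far more (versioning, concurrency, serialization of richer state).

```python
import json
import sqlite3

def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints "
        "(run_id TEXT PRIMARY KEY, next_step INTEGER, state TEXT)"
    )
    return conn

def load(conn, run_id):
    """Return (next_step, state) for a run, or a fresh start if none exists."""
    row = conn.execute(
        "SELECT next_step, state FROM checkpoints WHERE run_id = ?", (run_id,)
    ).fetchone()
    return (row[0], json.loads(row[1])) if row else (0, {})

def save(conn, run_id, next_step, state):
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
        (run_id, next_step, json.dumps(state)),
    )
    conn.commit()

def run(conn, run_id, steps):
    """Execute steps in order, checkpointing after each one that succeeds."""
    next_step, state = load(conn, run_id)
    for i in range(next_step, len(steps)):
        state = steps[i](state)  # an exception here leaves the last checkpoint intact
        save(conn, run_id, i + 1, state)
    return state

calls = []
attempts = {"fetch": 0}

def plan(state):
    calls.append("plan")
    return {**state, "plan": "ok"}

def fetch(state):
    calls.append("fetch")
    attempts["fetch"] += 1
    if attempts["fetch"] == 1:
        raise ConnectionError("simulated network failure")
    return {**state, "data": 42}

def summarize(state):
    calls.append("summarize")
    return {**state, "done": True}

store = open_store()
steps = [plan, fetch, summarize]
try:
    run(store, "run-1", steps)
except ConnectionError:
    pass  # the process "dies" here; the checkpoint row survives

final_state = run(store, "run-1", steps)  # resumes at the failed step
print(calls)
print(final_state)
```

The second call to `run` never re-executes `plan`, because the stored row already says the next step is `fetch`. With a file path instead of `:memory:`, the same resume would work from a different process.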

LangGraph v3 Event Streaming: Typed Projections Over a Content-Block Protocol

· 13 min read
Vadim Nicolai
Senior Software Engineer

Streaming an LLM to a user is easy. Consuming the stream on the server — token deltas, reasoning deltas, tool-call chunks, per-node state, subgraph events, usage metadata — is the part that turns into a pile of if chunk["type"] == ... branches. I shipped a streaming endpoint last week on LangGraph version="v2", because that is what's installed (1.1.8 locally, 1.2.4 on the server). The hand-rolled consumer was about twenty lines of fragile branching, a keepalive hack to stop a proxy from dropping the connection during DeepSeek's silent reasoning phase, and a manual accumulator that reset whenever langgraph_node changed.

LangGraph's version="v3" event-streaming API is what I'd reach for next, and the diff is the interesting part: it deletes most of that parsing. Instead of one undifferentiated event firehose you branch on, v3 gives you typed, per-channel projections you iterate independently, built on a content-block protocol that makes text, reasoning, tool-call, and multimodal boundaries explicit. v1 and v2 are unchanged. This is a walk through what v3 actually is, what it removes from your code, and where it still leaves work for you.

Observable AI Memory: mem0, LangGraph, and Qdrant with Enterprise-Grade Telemetry

· 13 min read
Vadim Nicolai
Senior Software Engineer

Most "AI memory" demos stop at memory.add() and memory.search(). That works on a laptop. It does not survive contact with production. The real questions are: When this recall is slow, which store is to blame? When a graph's spend triples overnight, which feature caused it? When a customer asks "what did your agent remember about me, and when?", can you answer from an audit log instead of a shrug?

TL;DR — This field report shows how to build an agent memory layer where every operation honors a contract: fail-open, PII-safe, and fully instrumented. Three stores (mem0, Qdrant, LangGraph) are funneled through single chokepoints, and each chokepoint fans out to five telemetry sinks. The result is a stack that answers the hard production questions without guesswork.
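A rough sketch of what one such chokepoint might look like, assuming a hypothetical `memory_op` wrapper. The store functions, the email-only redaction rule, and the two list-backed sinks are stand-ins for illustration. They are not the mem0, Qdrant, or LangGraph APIs, and the article's five sinks are not reproduced here.

```python
import re
import time

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(text):
    """Illustrative PII scrub: masks email addresses only."""
    return EMAIL_RE.sub("[email]", text)

def memory_op(store_name, op_name, fn, payload, sinks, default=None):
    """Single chokepoint: run the operation, fail open, emit one event to every sink."""
    started = time.perf_counter()
    ok, result = True, default
    try:
        result = fn(payload)
    except Exception:
        ok = False  # fail open: the caller receives `default` instead of an exception
    event = {
        "store": store_name,
        "op": op_name,
        "ok": ok,
        "latency_ms": (time.perf_counter() - started) * 1000,
        "payload": redact(payload),  # raw text never reaches telemetry
    }
    for sink in sinks:
        try:
            sink(event)
        except Exception:
            pass  # a failing sink must not break the memory operation
    return result

audit_log, metrics = [], []
sinks = [audit_log.append, metrics.append]

def flaky_search(query):
    raise ConnectionError("simulated vector store outage")

def working_search(query):
    return ["prefers tea"]

hits = memory_op("vector", "search", working_search, "ada@example.com likes tea", sinks, default=[])
misses = memory_op("vector", "search", flaky_search, "ada@example.com likes tea", sinks, default=[])
print(hits, misses, audit_log[-1]["ok"], audit_log[-1]["payload"])
```

Because every call passes through the same wrapper, the per-store latency and the audit trail come from one place rather than from instrumentation scattered across call sites.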

Multi-Modal Evaluation for AI-Generated LEGO Parts: A Production DeepEval Pipeline

· 19 min read
Vadim Nicolai
Senior Software Engineer

Your AI pipeline generates a parts list for a LEGO castle MOC. It says you need 12x "Brick 2 x 4" in Light Bluish Gray, 8x "Arch 1 x 4" in Dark Tan, and 4x "Slope 45 2 x 1" in Sand Green. The text looks plausible. But does the part image next to "Arch 1 x 4" actually show an arch? Does the quantity make sense for a castle build? Would this list genuinely help someone source bricks for the build?

These are multi-modal evaluation questions — they span text accuracy, image-text coherence, and practical usefulness. Standard unit tests cannot answer them. This article walks through a production evaluation pipeline built with DeepEval that evaluates AI-generated LEGO parts lists across five axes, using image metrics that most teams haven't touched yet.

The system is real. It runs in Bricks, a LEGO MOC discovery platform built with Next.js 19, LangGraph, and Neon PostgreSQL. The evaluation judge is DeepSeek — not GPT-4o — because you don't need a frontier model to grade your outputs.

CrewAI's Genuinely Unique Features: An Honest Technical Deep-Dive

· 14 min read
Vadim Nicolai
Senior Software Engineer

TL;DR — CrewAI's real uniqueness is that it models problems as "build a team of people" rather than "build a graph of nodes" (LangGraph) or "build a conversation" (AutoGen). The Crews + Flows dual-layer architecture is the core differentiator. The role-playing persona system and autonomous delegation are ergonomic wins, not technical breakthroughs. The hierarchical manager is conceptually appealing but broken in practice. This post separates what's genuinely novel from what's marketing.