2 posts tagged with "OpenTelemetry"

LangGraph v3 Event Streaming: Typed Projections Over a Content-Block Protocol

June 3, 2026 · 13 min read

Senior Software Engineer

Streaming an LLM to a user is easy. Consuming the stream on the server — token deltas, reasoning deltas, tool-call chunks, per-node state, subgraph events, usage metadata — is the part that turns into a pile of if chunk["type"] == ... branches. I shipped a streaming endpoint last week on LangGraph version="v2", because that is what's installed (1.1.8 locally, 1.2.4 on the server). The hand-rolled consumer was about twenty lines of fragile branching, a keepalive hack to stop a proxy from dropping the connection during DeepSeek's silent reasoning phase, and a manual accumulator that reset whenever langgraph_node changed.

LangGraph's version="v3" event-streaming API is what I'd reach for next, and the diff is the interesting part: it deletes most of that parsing. Instead of one undifferentiated event firehose you branch on, v3 gives you typed, per-channel projections you iterate independently, built on a content-block protocol that makes text, reasoning, tool-call, and multimodal boundaries explicit. v1 and v2 are unchanged. This is a walk through what v3 actually is, what it removes from your code, and where it still leaves work for you.

Observable AI Memory: mem0, LangGraph, and Qdrant with Enterprise-Grade Telemetry

June 2, 2026 · 13 min read

Vadim Nicolai

Senior Software Engineer

Most "AI memory" demos stop at memory.add() and memory.search(). That works on a laptop. It does not survive contact with production. The real questions are: When this recall is slow, which store is to blame? When a graph's spend triples overnight, which feature caused it? When a customer asks "what did your agent remember about me, and when?", can you answer from an audit log instead of a shrug?

TL;DR — This field report shows how to build an agent memory layer where every operation honors a contract: fail-open, PII-safe, and fully instrumented. Three stores (mem0, Qdrant, LangGraph) are funneled through single chokepoints, and each chokepoint fans out to five telemetry sinks. The result is a stack that answers the hard production questions without guesswork.