The Graph as Agent Memory
The graph as agent memory rejects the notebook metaphor. A notebook remembers what you wrote, but not when you believed it, nor when the fact itself was true. Flat vector stores and long-context transformers collapse time into a single present, and an agent that cannot distinguish "I knew this yesterday" from "this is still true today" is not reasoning — it is repeating. A bi-temporal knowledge graph — one that records both valid_at (when the fact held in the world) and recorded_at (when the agent ingested it) — turns memory from a static log into a navigable, revision-conscious archive where nothing is deleted and facts are superseded by stamping invalid_at.
This is article #4 in the Autonomous Knowledge Graphs series. The lead and account graph from #1 doubles as the agent's long-term memory of an account across months of sessions, under the same fleet constraints: LangGraph control, Cloudflare D1 data, DeepSeek-only egress, grounding-first provenance, a ≥ 0.80 eval bar, and draft-first approval.
The Gap: Why Flat Memory Fails Agents
Agent memory today falls into two camps. The flat log gives perfect recall of when something was said but no structure for reasoning across entities. The vector store excels at semantic similarity but has no notion of sequence or validity, so it cannot answer "what changed between last week and today?" Long-context models are offered as a third option, but attention dilutes across irrelevant history as context grows, making multi-hop reasoning unreliable — packing more tokens adds noise, not structure. Agents need structured, temporal, and traceable memory, and a graph is the representation that natively supports all three. The 2026 landscape reflects this: two large surveys reframe the graph as the agent-memory substrate (Yang et al., 2026, arXiv:2602.05665; Huang et al., 2026, arXiv:2602.06052).
Bi-Temporal Graphs: The Why and the How
The anchor is Engram, which proposes a bi-temporal knowledge graph for agent memory with salience decay and asynchronous consolidation (Wang, 2026, arXiv:2606.09900). Each fact — an edge between entity nodes — carries two dates: valid_at records the real-world time the relationship held; recorded_at records ingestion time. A third, invalid_at, is set when the fact is superseded; nothing is physically deleted.
Why two timestamps? Consider an agent managing accounts. A contract tier changes on the 1st, but the agent learns of it on the 5th. Without bi-temporal stamps the system cannot answer "what did the agent think the tier was on the 3rd?" The answer should be the old tier, because the new fact had not been learned yet. A graph that stores only valid_at, queried any time after the 5th, would report the new tier for the 3rd: the fact's validity starts on the 1st, and nothing records that the agent learned it four days later. Each edge is therefore a tuple (subject, predicate, object, valid_at, recorded_at, confidence, reason, source, evidence): the confidence must clear the write-time grounding gate, and reason/source/evidence link the fact back to the conversation or document that produced it.
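The tier scenario can be worked through in a few lines. This is a minimal in-memory sketch, not the storage layer described later: the Edge class, both helper functions, the account and tier names, and the calendar dates are hypothetical, chosen only to mirror the 1st/3rd/5th example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    object: str
    valid_at: date                     # world time: when the fact started to hold
    recorded_at: date                  # system time: when the agent ingested it
    invalid_at: Optional[date] = None  # system time: when a later belief superseded it


def believed_on(edges, subject, predicate, day):
    """Objects the agent believed for (subject, predicate) on `day` (system-time filter)."""
    return [
        e.object
        for e in edges
        if e.subject == subject
        and e.predicate == predicate
        and e.recorded_at <= day
        and (e.invalid_at is None or day < e.invalid_at)
    ]


def naive_valid_on(edges, subject, predicate, day):
    """Single-timestamp lookup: the latest fact whose valid_at is on or before `day`."""
    candidates = [
        e for e in edges
        if e.subject == subject and e.predicate == predicate and e.valid_at <= day
    ]
    return max(candidates, key=lambda e: e.valid_at).object if candidates else None


edges = [
    # Old tier: superseded when the agent learned of the change on the 5th.
    Edge("acme", "has_tier", "pro", date(2026, 1, 10), date(2026, 1, 10),
         invalid_at=date(2026, 3, 5)),
    # New tier: true in the world from the 1st, but only recorded on the 5th.
    Edge("acme", "has_tier", "enterprise", date(2026, 3, 1), date(2026, 3, 5)),
]

on_the_3rd = date(2026, 3, 3)
belief = believed_on(edges, "acme", "has_tier", on_the_3rd)    # ["pro"]
naive = naive_valid_on(edges, "acme", "has_tier", on_the_3rd)  # "enterprise"
```

The system-time filter recovers the old tier for the 3rd, while the valid_at-only lookup reports a tier the agent had not yet learned.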
Reference Architecture: LangGraph + D1 + Bi-Temporal Edges
LangGraph manages the agent lifecycle — extraction, storage, retrieval, evolution — while Cloudflare D1 holds the bi-temporal graph as property-graph tables with indexed timestamp columns:
```sql
CREATE TABLE edges (
  id TEXT PRIMARY KEY,
  subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL,
  valid_at INTEGER NOT NULL,           -- world-time the fact became true
  valid_to INTEGER DEFAULT NULL,       -- world-time it stopped being true (open = still true)
  recorded_at INTEGER NOT NULL,        -- system-time the agent learned it
  invalid_at INTEGER DEFAULT NULL,     -- system-time a later belief superseded it
  salience REAL DEFAULT 1.0,           -- decayed by consolidation; drives eviction
  last_retrieved_at INTEGER,           -- bumped on read; feeds salience decay
  archived_at INTEGER DEFAULT NULL,    -- moved to cold storage (eviction, NOT supersession)
  confidence REAL CHECK(confidence >= 0.0 AND confidence <= 1.0),
  reason TEXT, source TEXT, evidence TEXT,
  UNIQUE(subject, predicate, object, valid_at, recorded_at)
);
CREATE INDEX idx_edges_systime ON edges (recorded_at, invalid_at);
CREATE INDEX idx_edges_validtime ON edges (valid_at, valid_to);
```
This is bi-temporal in the strict sense: valid_at/valid_to is the world-time interval (when the fact was true), and recorded_at/invalid_at is the system-time interval (when the agent believed it). "What did the agent believe on date D?" filters system-time (recorded_at ≤ D < invalid_at); "what was actually true on date D?" filters world-time (valid_at ≤ D < valid_to). In both filters a NULL upper bound means the interval is still open, so it must be treated as later than any D. Crucially, eviction is kept distinct from supersession: a contradicted fact gets invalid_at, while a merely cold fact gets archived_at, so salience-based forgetting never silently drops a still-true edge out of point-in-time belief queries.
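Both point-in-time filters can be sketched against a trimmed copy of the edges table. Python's built-in sqlite3 is used here as a local stand-in for D1, which is an assumption: D1 speaks SQLite's SQL dialect, but its client bindings differ. The helper names, the day-number timestamps, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE edges (
        id TEXT PRIMARY KEY,
        subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL,
        valid_at INTEGER NOT NULL, valid_to INTEGER DEFAULT NULL,
        recorded_at INTEGER NOT NULL, invalid_at INTEGER DEFAULT NULL
    )
""")
# Timestamps are plain day numbers for readability.
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        # pro: true on days [0, 1), believed on days [0, 5)
        ("e1", "acme", "has_tier", "pro", 0, 1, 0, 5),
        # enterprise: true from day 1 (open), believed from day 5 (open)
        ("e2", "acme", "has_tier", "enterprise", 1, None, 5, None),
    ],
)

# System-time filter: a NULL invalid_at means the belief is still held.
BELIEVED_ON = """
    SELECT object FROM edges
    WHERE subject = ? AND predicate = ?
      AND recorded_at <= ?
      AND (invalid_at IS NULL OR ? < invalid_at)
"""
# World-time filter: a NULL valid_to means the fact is still true.
TRUE_ON = """
    SELECT object FROM edges
    WHERE subject = ? AND predicate = ?
      AND valid_at <= ?
      AND (valid_to IS NULL OR ? < valid_to)
"""


def believed_on(conn, subject, predicate, day):
    rows = conn.execute(BELIEVED_ON, (subject, predicate, day, day)).fetchall()
    return [r[0] for r in rows]


def true_on(conn, subject, predicate, day):
    rows = conn.execute(TRUE_ON, (subject, predicate, day, day)).fetchall()
    return [r[0] for r in rows]


belief_day3 = believed_on(conn, "acme", "has_tier", 3)  # ["pro"]
truth_day3 = true_on(conn, "acme", "has_tier", 3)       # ["enterprise"]
```

The explicit IS NULL branch matters: in SQLite a comparison against NULL evaluates to NULL rather than true, so a bare `? < invalid_at` would silently drop every still-open row.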
The Four-Phase Memory Lifecycle
GAM defines a hierarchical graph memory that decouples encoding from consolidation via an event-progression graph plus a topic associative network (Wu et al., 2026, arXiv:2604.12285). This design adapts that into 4 phases:
- Extraction. Raw messages go to a DeepSeek model that emits triples with a confidence score and evidence; only triples clearing the grounding gate survive, trading some recall for less noise.
- Storage. The triple is inserted as an edge. If one with the same subject, predicate, object and valid_at exists, the new recorded_at becomes a separate row (no destructive update), preserving the full history of belief changes.
- Retrieval. The system fixes the relevant time window (now, or an explicit date for historical queries) and returns edges valid in that window. Following Mnemis (Tang et al., 2026, arXiv:2602.15313), retrieval is dual-route: a fast similarity-first System-1 path narrows candidates, then a slower System-2 traversal handles structured multi-hop reasoning.
- Evolution. Consolidation runs asynchronously: redundant edges (repeated recorded_at with unchanged valid_at) are merged, and edges weakened by downstream verification are quarantined rather than deleted. The 2026 memory survey frames exactly this extract → store → retrieve → evolve lifecycle, naming evolution as the stage most systems under-build (Yang et al., 2026).
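The extraction gate and the append-only storage rule can be sketched together. This is a hedged illustration using Python's sqlite3 as a local stand-in for D1; MIN_CONFIDENCE, the store_triple helper, the dictionary shape of a triple, and the sample values are all assumptions, since the article does not fix a gate value or an extractor output format.

```python
import sqlite3
import uuid

MIN_CONFIDENCE = 0.7  # hypothetical grounding-gate threshold


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE edges (
            id TEXT PRIMARY KEY,
            subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL,
            valid_at INTEGER NOT NULL,
            recorded_at INTEGER NOT NULL,
            confidence REAL CHECK(confidence >= 0.0 AND confidence <= 1.0),
            evidence TEXT,
            UNIQUE(subject, predicate, object, valid_at, recorded_at)
        )
    """)
    return conn


def store_triple(conn, triple, recorded_at):
    """Append one extracted triple as a new row; return its id, or None if rejected."""
    if triple["confidence"] < MIN_CONFIDENCE:
        return None  # fails the grounding gate, so it is never written
    edge_id = str(uuid.uuid4())
    try:
        conn.execute(
            "INSERT INTO edges VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (edge_id, triple["subject"], triple["predicate"], triple["object"],
             triple["valid_at"], recorded_at, triple["confidence"], triple["evidence"]),
        )
    except sqlite3.IntegrityError:
        return None  # exact duplicate (same triple, valid_at and recorded_at)
    return edge_id


conn = open_store()
triple = {
    "subject": "acme", "predicate": "has_tier", "object": "enterprise",
    "valid_at": 1, "confidence": 0.92, "evidence": "call transcript, line 14",
}
first = store_triple(conn, triple, recorded_at=5)    # new row
second = store_triple(conn, triple, recorded_at=9)   # same fact re-learned: another row
weak = store_triple(conn, {**triple, "confidence": 0.3}, recorded_at=12)  # gated out
row_count = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]      # 2
```

Re-ingesting the same fact never issues an UPDATE; the second sighting becomes its own row, which is what later consolidation merges.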
Governance: Consolidation and Forgetting
Memory without forgetting grows without bound. The bi-temporal design makes forgetting safe — a fact is never lost, only marked superseded or lowered in access priority. Consolidation has two levers:
- Salience decay. Each cycle lowers the salience of edges whose last_retrieved_at is old; once salience falls below a cold threshold the edge is stamped archived_at and moved to a cold table. This is eviction, distinct from the invalid_at used for supersession, so a still-true fact is never dropped from belief queries just for being cold. Engram uses a configurable decay scheme of this kind.
- Subsumption. When a new edge with the same subject, predicate, object but a later valid_at arrives, the old edge is stamped invalid_at at the new edge's recorded_at. This is the belief-revision mechanism, and it never performs a destructive delete.
The cost is real: every fact update touches at least two rows (an invalid_at stamp on the superseded edge and an insert for the new one), so bi-temporal write amplification accumulates. It is mitigated by periodic cold-compaction, which rewrites the active table without invalidated edges.
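The two consolidation levers can be sketched as a pair of SQL updates. Everything numeric here is an assumption: DECAY, STALE_AFTER and COLD_THRESHOLD are hypothetical values, the multiplicative decay is only one possible scheme, and the move to a cold table is omitted (the sketch only stamps archived_at). The supersede helper also assumes single-valued predicates, so a different object for the same subject and predicate with a later valid_at is what triggers the stamp.

```python
import sqlite3

DECAY = 0.5           # hypothetical multiplicative decay per consolidation cycle
STALE_AFTER = 30      # hypothetical: days without retrieval before decay applies
COLD_THRESHOLD = 0.2  # hypothetical salience floor for eviction


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE edges (
            id TEXT PRIMARY KEY,
            subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL,
            valid_at INTEGER NOT NULL, recorded_at INTEGER NOT NULL,
            invalid_at INTEGER DEFAULT NULL,
            salience REAL DEFAULT 1.0,
            last_retrieved_at INTEGER,
            archived_at INTEGER DEFAULT NULL
        )
    """)
    return conn


def add_edge(conn, edge_id, subject, predicate, obj, valid_at, recorded_at, salience=1.0):
    conn.execute(
        "INSERT INTO edges (id, subject, predicate, object, valid_at, recorded_at,"
        " salience, last_retrieved_at) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (edge_id, subject, predicate, obj, valid_at, recorded_at, salience, recorded_at),
    )


def supersede(conn, edge_id, subject, predicate, obj, valid_at, recorded_at):
    """Belief revision: stamp the older live edge, then append the new one."""
    conn.execute(
        "UPDATE edges SET invalid_at = ? "
        "WHERE subject = ? AND predicate = ? AND object != ? "
        "AND invalid_at IS NULL AND valid_at < ?",
        (recorded_at, subject, predicate, obj, valid_at),
    )
    add_edge(conn, edge_id, subject, predicate, obj, valid_at, recorded_at)


def decay_and_archive(conn, now):
    """Eviction: decay edges not retrieved recently, then stamp the cold ones."""
    conn.execute(
        "UPDATE edges SET salience = salience * ? "
        "WHERE archived_at IS NULL AND last_retrieved_at < ?",
        (DECAY, now - STALE_AFTER),
    )
    conn.execute(
        "UPDATE edges SET archived_at = ? WHERE archived_at IS NULL AND salience < ?",
        (now, COLD_THRESHOLD),
    )


conn = make_db()
add_edge(conn, "e1", "acme", "has_tier", "pro", 0, 0)
add_edge(conn, "e2", "acme", "located_in", "berlin", 0, 0, salience=0.3)
supersede(conn, "e3", "acme", "has_tier", "enterprise", 90, 95)
decay_and_archive(conn, now=100)
state = {
    row[0]: row[1:]
    for row in conn.execute("SELECT id, invalid_at, archived_at, salience FROM edges")
}
```

After the run, e1 carries invalid_at but no archived_at, e2 carries archived_at but no invalid_at, and all three rows still exist: the two stamps stay independent, and the supersession wrote exactly two rows.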
Failure Modes: Semantic Drift and Contamination
Semantic drift. Over many cycles, an entity can accumulate conflicting predicates. Status changing over time is correct, but a tier asserted twice without a clear valid_at produces contradictory data over overlapping windows. The fix is to require every edge to carry a valid_at — explicit from the source, or inferred from the interval between successive recorded_at values — so the timeline stays unambiguous.
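Drift of this kind can be detected mechanically as overlapping world-time windows. The sketch below is one possible check, not part of the described design: the Claim type, the helper names, and the sample tiers are hypothetical, and it assumes single-valued predicates, where two different objects for the same subject and predicate must not overlap in time.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Claim(NamedTuple):
    subject: str
    predicate: str
    object: str
    valid_at: int
    valid_to: Optional[int]  # None means the fact is still true


def overlaps(a, b):
    """True when the half-open windows [valid_at, valid_to) intersect."""
    a_end = float("inf") if a.valid_to is None else a.valid_to
    b_end = float("inf") if b.valid_to is None else b.valid_to
    return a.valid_at < b_end and b.valid_at < a_end


def find_conflicts(claims):
    """Pairs of claims that assert different objects over overlapping windows."""
    by_key = defaultdict(list)
    for claim in claims:
        by_key[(claim.subject, claim.predicate)].append(claim)
    conflicts = []
    for group in by_key.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if a.object != b.object and overlaps(a, b):
                    conflicts.append((a, b))
    return conflicts


clean = [
    Claim("acme", "has_tier", "pro", 0, 10),
    Claim("acme", "has_tier", "enterprise", 10, None),  # starts where pro ends
]
drifted = clean + [Claim("acme", "has_tier", "gold", 5, None)]  # ambiguous window

clean_conflicts = find_conflicts(clean)      # []
drift_conflicts = find_conflicts(drifted)    # gold clashes with pro and with enterprise
```

A status that changes over time passes the check because its windows abut; only the ambiguously dated assertion is flagged.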
Contamination. A single bad extraction can propagate through multi-hop retrieval and poison downstream decisions. The grounding gate is the first defense; the second is a separate DeepSeek pass that re-scores the top retrieved paths and rejects the result set (the agent answers "I don't know") when path confidence is too low. MAGMA argues for separating semantic, temporal, causal, and entity graphs so contamination in one does not spread to the others (Jiang et al., 2026, arXiv:2601.03236) — a direction worth evaluating.
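The second defense can be illustrated with a deliberately simple stand-in. The design above re-scores paths with a separate DeepSeek pass; the sketch below swaps that for a plain product of edge confidences, which is an assumption made only to keep the example self-contained. PATH_THRESHOLD, the function names, and the sample paths are hypothetical.

```python
import math

PATH_THRESHOLD = 0.5  # hypothetical minimum confidence for a usable path


def path_confidence(edge_confidences):
    """Stand-in score: a multi-hop path is only as strong as all of its hops combined."""
    return math.prod(edge_confidences)


def gate_paths(paths, threshold=PATH_THRESHOLD):
    """paths is a list of (answer, [edge confidences]); keep those that clear the bar."""
    scored = [(answer, path_confidence(confs)) for answer, confs in paths]
    kept = [(answer, score) for answer, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def answer(paths):
    """Return the best surviving answer, or decline when nothing clears the gate."""
    kept = gate_paths(paths)
    return kept[0][0] if kept else "I don't know"


strong = [("enterprise", [0.9, 0.9]), ("pro", [0.9, 0.4])]
weak = [("pro", [0.6, 0.6])]

strong_answer = answer(strong)  # "enterprise"
weak_answer = answer(weak)      # "I don't know"
```

Multiplying confidences makes one weak hop sink the whole path, which is the property that keeps a single bad extraction from carrying through a multi-hop answer.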
Decision Table: Graph vs Vector vs Long Context
| Requirement | Graph memory (this design) | Vector memory | Long context |
|---|---|---|---|
| Multi-hop reasoning across entities | Native | Weak (needs chunk linking) | Degrades as context grows |
| Temporal "what did the agent believe on date X?" | Bi-temporal stamps | No time metadata | Must re-read history |
| Audit trail | Full history retained | Overwrites | Only via external logging |
| Very high write throughput | Consolidation is a bottleneck | Fast insert | No writes (static) |
| Frequent destructive updates | Append-only (stamped) | Direct removal | Regenerate context |
For agents reasoning over customer accounts, support tickets, or multi-session projects, graph memory wins on recall quality; the cost is higher infrastructure complexity and slower writes.
Numbered Limitations
1. Consolidation scales with edge count. Full consolidation grows with the active graph; sharding by subject helps but makes cross-shard joins expensive.
2. Entity resolution is unsolved at scale. The graph assumes clean entity IDs, but "Acme Inc." and "Acme Corporation" are the same account; a lightweight deduper helps, and no 2026 paper solves this at scale for agent memory.
3. The confidence threshold is static. A production memory should vary the write gate by context (higher for critical or financial facts, lower for exploratory preferences) rather than hard-coding one value; this design hard-codes it. (MAGMA's multi-graph separation of semantic, temporal, causal, and entity views is a complementary way to make retrieval context-aware.)
4. Write amplification. Every update writes at least two rows; periodic cold-compaction reclaims space but needs an exclusive lock window.
5. No standard benchmark. The 2026 surveys (Yang et al.; Huang et al.) call for one, but none exists, so any latency or accuracy figures are implementation-specific and none are claimed here.
6. Inter-agent merging is open. In a multi-agent fleet, merging two belief graphs with overlapping entities can create contradiction cascades, an active problem with no production-ready solution.
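The static-threshold limitation above suggests a small change: look the gate up per predicate instead of hard-coding one value. The sketch below shows the shape of that idea only. The categories, the predicate mapping, and every threshold value are hypothetical and would need calibration against real extraction data.

```python
DEFAULT_GATE = 0.7  # hypothetical fallback threshold

# Hypothetical per-category gates: stricter for financial facts, looser for preferences.
GATES = {"financial": 0.9, "preference": 0.5}

# Hypothetical mapping from predicate to category.
PREDICATE_CATEGORY = {"has_tier": "financial", "prefers_channel": "preference"}


def gate_for(predicate):
    """Threshold a triple with this predicate must clear at write time."""
    category = PREDICATE_CATEGORY.get(predicate)
    return GATES.get(category, DEFAULT_GATE)


def passes_gate(predicate, confidence):
    return confidence >= gate_for(predicate)


preference_ok = passes_gate("prefers_channel", 0.6)  # True: loose gate
financial_ok = passes_gate("has_tier", 0.6)          # False: strict gate
unknown_ok = passes_gate("works_at", 0.75)           # True: falls back to the default
```

The same 0.6 confidence is accepted for an exploratory preference and rejected for a contract tier, which is the behavior a single hard-coded value cannot express.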
Conclusion
The graph as agent memory is a statement about what an agent is. Bi-temporal graphs let an agent introspect its own history — "when did I learn X?", "was I ever wrong about Y?" — which is a prerequisite for self-correction and for trust: when a user asks "why did you do that?", the answer should be a traceable path through a graph, not a vector-similarity black box. The architecture here — LangGraph orchestration, D1 storage, DeepSeek extraction behind a confidence gate, and bi-temporal stamps on every edge — is pragmatic enough to deploy; the hard problems (entity resolution, inter-agent merging) remain open. The message is simple: give the agent a graph, stamp everything with two dates, and never delete.
Frequently Asked Questions
What is a bi-temporal knowledge graph for agent memory? It is an agent's long-term memory stored as a graph where every edge carries valid_at (when the fact held) and recorded_at (when the agent learned it). A superseded fact is stamped invalid_at rather than deleted, so memory is a revision-conscious archive, not a flat log.
Why do flat logs and vector stores fail as agent memory? A flat log records when something was said but offers no structure across entities; a vector store finds similar memories but has no notion of validity, so it cannot answer "what changed since last week." Long context dilutes attention. Graphs provide structured, temporal, traceable memory.
Why store two timestamps instead of one? If a tier changes on the 1st but the agent learns it on the 5th, only a bi-temporal graph can answer "what did the agent believe on the 3rd?" — the old tier. valid_at captures world time, recorded_at captures ingestion time, and the gap between them is where belief revision lives.
How does the memory avoid unbounded growth without deleting? Consolidation runs asynchronously with two levers: salience decay lowers the priority of edges that have not been retrieved for a while, and subsumption stamps invalid_at when a later valid_at supersedes a fact. Edges whose salience falls below the cold threshold are stamped archived_at and moved to cold storage, where they stay queryable.
When should an agent use graph memory over vector memory? When it must reason multi-hop across entities, answer temporal point-in-time questions, and keep an audit trail — for example over customer accounts across many sessions. Vector memory suits pure similarity at very high write rates; long context suits a single static read.
Autonomous Knowledge Graphs — the series
- Autonomous Knowledge Graph Construction: Graphs That Build Themselves (autonomy: high)
- Reasoning Over the Graph: From GraphRAG to Planning Agents (autonomy: high)
- Self-Healing Knowledge Graphs: Graphs That Fix Themselves (guardrail)
- The Graph as Agent Memory (this article — autonomy: medium)
- Closing the Loop: Evaluation, Debate, and Discovery (guardrail)
A companion thread to The Autonomous Sales Fleet. Next: #5 Closing the Loop.
References
- Liuyin Wang. Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents (Engram). 2026. arXiv:2606.09900. https://arxiv.org/abs/2606.09900
- Zhaofen Wu et al. GAM: Hierarchical Graph-based Agentic Memory for LLM Agents. 2026. arXiv:2604.12285. https://arxiv.org/abs/2604.12285
- Zihao Tang et al. Mnemis: Dual-Route Retrieval on Hierarchical Graphs for Long-Term LLM Memory. 2026. arXiv:2602.15313. https://arxiv.org/abs/2602.15313
- Dongming Jiang et al. MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents. 2026. arXiv:2601.03236. https://arxiv.org/abs/2601.03236
- Chang Yang et al. Graph-based Agent Memory: Taxonomy, Techniques, and Applications. 2026. arXiv:2602.05665. https://arxiv.org/abs/2602.05665
- Wei-Chieh Huang et al. Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey. 2026. arXiv:2602.06052. https://arxiv.org/abs/2602.06052
