Beyond Fact Retrieval:
Episodic Memory for RAG with
Generative Semantic Workspaces
AAAI 2026 Oral
NeurIPS 2025 Workshop on Language Agents and World Models ⭐ Spotlight
TL;DR
GSW achieves state-of-the-art episodic memory performance with an F1-score of 0.85 on EpBench, outperforming structured RAG baselines by up to 20% in recall while reducing context tokens by 51%.
Motivation
Large Language Models face fundamental challenges with long-context reasoning. Current RAG solutions—from semantic embeddings to knowledge graphs—are designed for fact retrieval but fail to build the space-time-anchored narrative representations needed for tracking entities through evolving situations.
The vast majority of texts are not lists of facts but narratives of evolving real-world situations. Crime reports, political briefings, corporate filings, and news coverage all describe actors that adopt roles and transition through states while interacting across space and time.
Our Approach: Generative Semantic Workspaces (GSW)
Brain-Inspired Design: GSW mirrors the neocortical-hippocampal architecture. The Operator (neocortex) extracts semantic roles and states. The Reconciler (hippocampus) binds them into coherent spatiotemporal sequences.
GSW is a neuro-inspired generative memory framework that builds structured, interpretable representations of evolving situations. It comprises two core components:
🔍 Operator
Maps incoming text to intermediate semantic structures:
- Actors & Entities: People, places, objects, times
- Roles: Situation-relevant descriptors
- States: Evolving conditions of actors
- Verbs & Valences: Actions and their arguments
- Spatio-Temporal Links: Shared locations/times
- Forward-Falling Questions: Predicted developments
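The official code is not yet released, so as a rough sketch only: the Operator's intermediate structure listed above might be modeled with simple dataclasses like these (all names and fields are hypothetical, inferred from the bullet list, not taken from the actual implementation):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Actor:
    """A person, place, object, or time participating in the situation."""
    name: str
    roles: List[str] = field(default_factory=list)   # situation-relevant descriptors
    states: List[str] = field(default_factory=list)  # evolving conditions, in narrative order

@dataclass
class Event:
    """A verb with its valence slots and spatio-temporal anchors."""
    verb: str
    valences: Dict[str, str]        # slot name (agent, patient, ...) -> actor name
    location: Optional[str] = None  # shared-location link
    time: Optional[str] = None      # shared-time link

@dataclass
class OperatorOutput:
    """Intermediate semantic structure the Operator emits for one text chunk."""
    actors: List[Actor]
    events: List[Event]
    forward_questions: List[str]    # forward-falling questions: predicted developments
```

One chunk of a crime report, for instance, might yield an `Actor("Dr. Smith", roles=["surgeon"], states=["on duty"])`, an `Event("operates", ...)` anchored to a hospital, and a forward-falling question about the outcome.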
🔄 Reconciler
Integrates semantic structures into a persistent workspace:
- Entity Resolution: Matches mentions of the same entity across the evolving narrative.
- Actor States: Tracks how each actor's state changes over time.
- Spatio-Temporal Coherence: Enforces consistency, grounding actors in the correct place and time.
- Predictive Questions: Answers previously posed forward-falling questions as new evidence arrives.
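The integration steps above amount to folding each chunk's local structure into a persistent workspace. A minimal dict-based sketch, assuming a simple alias table for entity resolution (the real Reconciler is an LLM-driven component; every name here is hypothetical):

```python
def reconcile(workspace: dict, chunk_actors: dict, aliases: dict) -> dict:
    """Fold one chunk's actors into the persistent workspace.

    workspace / chunk_actors: entity name -> {"roles": [...], "states": [...]}
    aliases: surface name -> canonical name (a stand-in for entity resolution)
    """
    for name, info in chunk_actors.items():
        canon = aliases.get(name, name)  # resolve mention to canonical entity
        entry = workspace.setdefault(canon, {"roles": [], "states": []})
        for role in info.get("roles", []):
            if role not in entry["roles"]:       # roles deduplicate
                entry["roles"].append(role)
        entry["states"].extend(info.get("states", []))  # states accumulate as a timeline
    return workspace
```

Feeding two chunks that mention "Dr. Smith" and later just "Smith" (with the alias registered) yields one entity whose states form an ordered history, which is the behavior the Actor States and Entity Resolution bullets describe.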
End-to-End Pipeline: Documents are chunked and processed by the Operator to create local semantic graphs. The Reconciler integrates these into a unified global memory. At query time, entity-specific summaries are retrieved, re-ranked, and passed to an LLM.
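Since the implementation is not yet public, the pipeline can only be sketched abstractly. Here the Operator, Reconciler, retriever/re-ranker, and answering LLM are injected as callables (all stand-ins, not the paper's API), which keeps the control flow visible:

```python
def gsw_pipeline(documents, query, operator, reconciler, retriever, llm,
                 chunk_size=200):
    """Sketch of the GSW flow: chunk -> Operator -> Reconciler -> retrieve -> answer.

    operator(chunk)              -> local semantic structure for one chunk
    reconciler(workspace, local) -> updated global workspace
    retriever(workspace, query)  -> entity-specific summaries, re-ranked
    llm(query, context)          -> final answer over the small retrieved context
    """
    workspace = {}  # persistent global memory built incrementally
    for doc in documents:
        for i in range(0, len(doc), chunk_size):
            local = operator(doc[i:i + chunk_size])   # per-chunk semantic graph
            workspace = reconciler(workspace, local)  # bind into global memory
    context = retriever(workspace, query)  # query-time: entity summaries, re-ranked
    return llm(query, context)
```

The query-time path is what drives the token savings reported below: only the retrieved entity summaries, not the raw documents, reach the final LLM call.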
Results
Performance on EpBench-200 (F1-Score by Query Complexity)
| Method | 0 Cues | 1 Cue | 2 Cues | 3-5 Cues | 6+ Cues | Overall |
|---|---|---|---|---|---|---|
| GSW (Ours) | 0.978 | 0.745 | 0.806 | 0.867 | 0.834 | 0.850 |
| HippoRAG2 | 0.828 | 0.675 | 0.762 | 0.755 | 0.746 | 0.753 |
| Embedding RAG | 0.906 | 0.727 | 0.724 | 0.744 | 0.678 | 0.770 |
| GraphRAG | 0.950 | 0.625 | 0.624 | 0.658 | 0.607 | 0.714 |
| LightRAG | 0.944 | 0.593 | 0.587 | 0.578 | 0.560 | 0.677 |
| Vanilla LLM | 0.883 | 0.709 | 0.582 | 0.484 | 0.323 | 0.642 |
Token Efficiency
| Method | Avg. Tokens/Query | Avg. Cost/Query |
|---|---|---|
| GSW (Ours) | ~3,587 | ~$0.0090 |
| GraphRAG | ~7,340 | ~$0.0184 |
| Embedding RAG | ~8,771 | ~$0.0219 |
| HippoRAG2 | ~8,771 | ~$0.0219 |
| LightRAG | ~40,476 | ~$0.1012 |
| Vanilla LLM | ~101,120 | ~$0.2528 |
Key Insight: GSW's entity-specific summaries provide targeted, query-relevant information—reducing hallucinations and drastically cutting inference costs.
Code
Coming soon.
Cite
If you find our work useful, please cite our paper:
@misc{rajesh2025factretrievalepisodicmemory,
  title={Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces},
  author={Shreyas Rajesh and Pavan Holur and Chenda Duan and David Chong and Vwani Roychowdhury},
  year={2025},
  eprint={2511.07587},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2511.07587}
}