The model is the CPU.
Membrane is the OS.
Persistent, structured memory for any LLM — without retraining, fine-tuning, or modifying the model. The model doesn't know Membrane exists.
Your context window — finite, precious, always filling up.
What current LLMs remember between conversations. Out of the box.
Total Membrane overhead on top of your main model call. Most turns cost less.
Three operations.
Every turn.
Membrane executes three operations in sequence on every conversation turn — transparent to the model, zero configuration required.
Promote
Retrieve the most relevant facts from the persistent store and inject them into the model's context window — before the model sees the message.
Extract
Pull discrete facts from the completed conversation turn and store them with embeddings and explicit typed relationships.
Repair
Check the model's response for contradictions against stored facts using two candidate pools for maximum coverage.
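The three operations above can be sketched as a single per-turn loop. This is an illustrative stand-in, not the Membrane API: `ToyStore` and `membrane_turn` are invented names, and the keyword-overlap ranking stands in for the real embedding-based retrieval.

```python
# Illustrative sketch of the per-turn pipeline: promote -> model call ->
# extract -> repair. The store and model here are toy stand-ins.

class ToyStore:
    def __init__(self):
        self.facts = []

    def promote(self, message, top_k=5):
        # Membrane ranks by embedding similarity; this sketch uses naive
        # keyword overlap so the example stays self-contained.
        words = set(message.lower().split())
        scored = sorted(self.facts,
                        key=lambda f: -len(set(f.lower().split()) & words))
        return scored[:top_k]

    def extract(self, user_message, reply):
        # Membrane uses an LLM call to distill discrete facts; the sketch
        # just stores the raw message.
        self.facts.append(user_message)

    def repair(self, reply):
        # Membrane checks two candidate pools for contradictions.
        return []

def membrane_turn(store, model, user_message, history):
    facts = store.promote(user_message)            # 1. Promote into context
    context = "Known facts:\n" + "\n".join(facts)  #    before the model sees
    reply = model(context, history, user_message)  #    the message
    store.extract(user_message, reply)             # 2. Extract new facts
    store.repair(reply)                            # 3. Repair contradictions
    return reply
```

The model never calls the store — all three operations wrap around it, which is what "transparent to the model" means in practice.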
When a topic cluster grows beyond a threshold, Membrane compresses it into a single summary fact — one LLM call, children deactivated, anchor preserved with its original vector score. Context window slots reclaimed without losing provenance. Matches how the brain consolidates episodic detail into semantic memory.
Before: anchor + cluster members = one context slot per fact. After: anchor + summary (via traversal) = 2 slots
The anchor keeps its original similarity score for direct retrieval. The summary is wired as an ELABORATES child, so traversal surfaces it automatically. Deactivated members remain in the graph — auditable at any time via SUMMARIZES edges.
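The demotion step can be sketched under assumed data shapes — facts as dicts, edges as tuples. The function name, the dict fields, and the edge-tuple encoding are all illustrative; only the behavior (one summarize call, children deactivated but kept, summary wired as an ELABORATES child of the anchor) comes from the text above.

```python
import itertools

_ids = itertools.count()

def demote(cluster, summarize, threshold=5):
    """Compress one topic cluster. cluster = {'anchor': fact, 'members': [facts]}.
    summarize is the single LLM call that writes the summary text."""
    if len(cluster["members"]) < threshold:
        return None                      # below threshold: leave cluster alone
    summary = {
        "id": f"sum{next(_ids)}",
        "content": summarize(cluster["members"]),   # one LLM call, total
        "active": True,
        # SUMMARIZES edges keep the demoted members auditable.
        "edges": [("SUMMARIZES", m["id"]) for m in cluster["members"]],
    }
    for m in cluster["members"]:
        m["active"] = False              # deactivated, never deleted
    # The summary is wired as an ELABORATES child of the anchor, so graph
    # traversal from the anchor surfaces it; the anchor's original vector
    # score is left untouched.
    cluster["anchor"]["edges"].append(("ELABORATES", summary["id"]))
    return summary
```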
Inspect, resolve,
compress.
Membrane exposes its internal state through a set of in-chat commands — inspect what's stored, resolve contradictions, and compact memory on demand.
/facts - Show all stored facts with access counts
/stats - Show fact counts, session ID, history length
/resolve - Walk through unresolved CONTRADICTS pairs interactively — pick which version is current
/demote [n] - Compress fact clusters into summaries. Optional arg: minimum cluster size (default 5)
/session - Start a new session — clears chat history, keeps memory graph intact
/clear - Clear all memory
Run with the graph backend to get the full command surface. /resolve and /demote require Neo4j — they operate on typed edges that the flat backend doesn't track.
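A command surface like this typically sits in front of the model as a simple router. The sketch below is hypothetical — Membrane's actual dispatch code isn't shown in this document — but it illustrates the pattern: commands are intercepted before the turn ever reaches the model.

```python
# Minimal in-chat command router (illustrative, not Membrane's actual code).
# Returns None for normal messages, which then flow to the model as usual.

def route(message, store):
    if not message.startswith("/"):
        return None                              # normal turn: goes to the model
    cmd, *args = message.split()
    if cmd == "/facts":
        return "\n".join(f"{f['content']} (accessed {f['access_count']}x)"
                         for f in store["facts"] if f["active"])
    if cmd == "/demote":
        min_size = int(args[0]) if args else 5   # optional minimum cluster size
        return f"demoting clusters of >= {min_size} facts"
    if cmd == "/clear":
        store["facts"].clear()
        return "memory cleared"
    # /stats, /resolve, and /session would follow the same pattern.
    return f"unknown command {cmd}"
```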
Five typed edges.
Semantic memory.
Facts aren't just stored — they're structured. Every relationship between facts is a typed edge with its own semantics and distinct behavior in retrieval and repair.
New fact replaces old; old is deactivated
Two active facts in tension; neither supersedes the other
Adds specificity without invalidating the anchor
Only interpretable given another fact
Summary compresses a demoted cluster
Every edge carries a created_at timestamp. Deactivated facts are never deleted — full history is always auditable.
Deactivated nodes remain in graph — auditable, never deleted
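The five edge types and their semantics can be written down as an enum. Four names (SUPERSEDES, CONTRADICTS, ELABORATES, SUMMARIZES) are confirmed elsewhere in this document; the fourth edge ("only interpretable given another fact") is never named here, so DEPENDS_ON is a guessed placeholder. The timestamp helper reflects the rule that every edge carries a created_at.

```python
import time
from enum import Enum

class EdgeType(Enum):
    SUPERSEDES = "new fact replaces old; old is deactivated"
    CONTRADICTS = "two active facts in tension; neither supersedes the other"
    ELABORATES = "adds specificity without invalidating the anchor"
    DEPENDS_ON = "only interpretable given another fact"   # name is a guess
    SUMMARIZES = "summary compresses a demoted cluster"

def make_edge(edge_type, src_id, dst_id):
    # Every edge carries a created_at timestamp (per the text above).
    return {"type": edge_type.name, "src": src_id, "dst": dst_id,
            "created_at": time.time()}
```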
Where it works today.
Where it's going.
Membrane is validated and working for individual developers right now. Enterprise, compliance, and clinical deployments are the target markets the architecture was designed for.
Single-Developer Persistent Memory
A developer working with an AI assistant across sessions — Membrane remembers architectural decisions, established patterns, and prior context, so nothing has to be re-explained at the start of every conversation. Drop in, run locally, works out of the box with zero infrastructure.
Enterprise Developer Tooling
AI coding assistants that remember architectural decisions, project constraints, and team conventions across sessions. Every fact, every supersession, every contradiction is a queryable edge with a timestamp and session ID.
Customer-Facing Agents
Support and service agents that accumulate user preferences, prior issue history, and established context — without accumulating raw conversation history in the context window. Memory degrades gracefully via demotion rather than hard truncation at the context limit.
Financial Services
Analysts and advisors using AI tools that must recall prior conversations, client preferences, and established positions. On-premises deployment with a complete audit trail that addresses compliance requirements cloud-based memory services cannot meet.
Healthcare & Clinical Workflows
Clinical AI tools that maintain longitudinal patient context across appointments. HIPAA-sensitive deployments run Membrane entirely on-premises with no data leaving the environment.
Research & Knowledge Work
Long-horizon research assistants where the AI needs to know what's been established, what's been tried, and what's in tension. The CONTRADICTS edge makes unresolved tensions a first-class object rather than a silent overwrite.
Membrane does not depend on any specific model architecture. Any backend implementing .complete() and .chat() can be substituted.
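The two-method contract above can be made explicit with a structural type. Only the method names `.complete()` and `.chat()` come from the text; the signatures below are assumptions, and `EchoBackend` is a trivial stand-in showing that any object with those two methods satisfies the contract.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Backend(Protocol):
    """Structural contract: any object with these two methods qualifies.
    Signatures are assumed, not taken from Membrane's source."""
    def complete(self, prompt: str) -> str: ...
    def chat(self, messages: list[dict]) -> str: ...

class EchoBackend:
    # Trivial stand-in: no inheritance needed, the shape alone suffices.
    def complete(self, prompt: str) -> str:
        return prompt.upper()
    def chat(self, messages: list[dict]) -> str:
        return messages[-1]["content"]
```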
Isn't this just RAG?
No. Here's why.
Conversational facts, not document retrieval.
RAG retrieves document chunks to answer questions. Membrane manages a structured graph of facts distilled from conversation — discrete, relationship-aware, and maintained across sessions. No document index required.
RAG is a lookup. Membrane is working memory.
No access to model weights. No training run.
Fine-tuning bakes knowledge into the model's parameters. Membrane works at the context layer — it can be deployed today, updated in real time, and pointed at any model. Changing what the model 'knows' takes milliseconds, not epochs.
Fine-tuning is slow, expensive, and bakes in stale data.
Your audit trail. In your graph. Owned by you.
Managed cloud memory services are opaque — you don't know what they store, how they retrieve it, or what they've forgotten. Membrane's audit trail is structural: every fact, every relationship, every supersession is a queryable edge in your Neo4j instance. An auditor can reconstruct exactly what the system believed at any point in time.
Cloud services log. Membrane graphs. The difference is queryability.
This isn't another
memory framework.
The starting point wasn't an LLM problem. It was a neuroscience observation: biological memory isn't stored in one place, retrieved by one pathway, or lost all at once. Damage one route and another scaffolds recovery. That redundancy is structural — it's in the architecture, not the data.
Current LLM memory systems don't reflect that. They accumulate raw history, truncate hard at the context limit, and treat forgetting as an engineering problem rather than an architectural one. Membrane treats it as the latter.
The hippocampus doesn't reason.
It stores indices into representations distributed across cortex and mediates retrieval. When one pathway fails, another that encoded the same information differently can scaffold recovery — as in Melodic Intonation Therapy, where stroke patients who lose speech can still sing, and singing becomes the scaffold back to language.
Two structurally different systems running in parallel.
The original design called for a second model (Mamba, RWKV) running alongside the LLM to handle state. What was built is stronger: the second pathway is infrastructure, not a model. It doesn't generate language. It manages state, detects failure, and provides scaffolding — exactly as the non-dominant hemisphere does in MIT.
Model-agnostic by construction, not by accident.
Because the memory system is structurally separate from the model, it works with any LLM. Changing what the model knows takes milliseconds. The audit trail is a graph, not a log. Contradictions are edges, not silent overwrites. These aren't features — they're consequences of getting the architecture right.
Your memory.
In your graph.
Queryable by anyone.
The audit trail is structural, not logged. Every belief change is a graph edge. An auditor can reconstruct exactly what the system believed at any point in time — which session created each belief, what caused it to be updated, and what replaced it.
id - Stable 8-char identifier — portable across sessions
session_id - Which conversation session created the fact
created_at - Timestamp when the fact was first extracted
last_accessed - When the fact was last promoted to context
access_count - How many times this fact has been retrieved
superseded_by - Points to the fact that replaced this one

-- What did the system know about the user's language preference,
-- and when did it change?
MATCH path = (current:Fact)-[:SUPERSEDES*]->(original:Fact)
WHERE current.content CONTAINS 'language'
RETURN path
-- Every fact carries:
--   id             stable 8-char identifier
--   session_id     which conversation created it
--   created_at     full temporal provenance
--   access_count   how many times retrieved
-- Every relationship carries:
--   created_at     when the relationship was established
--   edge type      the semantic meaning
-- Full history is never deleted.

This is the property that distinguishes Membrane from managed cloud memory services: the audit trail is in the graph, accessible to any Cypher query, owned by the operator.
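Because supersession is stored as data, the same reconstruction works outside the database too. A sketch using the fact properties listed above, with plain dicts standing in for graph nodes (the function name and data shapes are illustrative):

```python
def belief_history(facts, fact_id):
    """Walk superseded_by pointers forward from an original fact,
    returning the belief chain oldest-to-newest."""
    by_id = {f["id"]: f for f in facts}
    chain = [by_id[fact_id]]
    while chain[-1].get("superseded_by"):
        chain.append(by_id[chain[-1]["superseded_by"]])
    return chain
```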