Membrane

The model is the CPU.
Membrane is the OS.

Persistent, structured memory for any LLM — without retraining, fine-tuning, or modifying the model. The model doesn't know Membrane exists.

128K tokens

Your context window — finite, precious, always filling up.

0 sessions

What current LLMs remember between conversations. Out of the box.

$0.003 / turn

Total Membrane overhead on top of your main model call. Most turns cost less.

HOW IT WORKS

Three operations.
Every turn.

Membrane executes three operations in sequence on every conversation turn — transparent to the model, zero configuration required.

01
$0
No LLM call

Promote

Before the model call

Retrieve the most relevant facts from the persistent store and inject them into the model's context window — before the model sees the message.
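A minimal sketch of the promote step, assuming a store with a similarity-ranked `search` method. All names here are illustrative, not Membrane's actual API, and word overlap stands in for vector similarity:

```python
from dataclasses import dataclass

@dataclass
class Fact:
    content: str

class FactStore:
    """Toy in-memory store; real Membrane ranks by embedding similarity."""
    def __init__(self, contents):
        self.facts = [Fact(c) for c in contents]

    def search(self, query, top_k=5):
        # Naive word-overlap stand-in for cosine similarity over embeddings.
        q = set(query.lower().split())
        scored = sorted(
            self.facts,
            key=lambda f: len(q & set(f.content.lower().split())),
            reverse=True,
        )
        return scored[:top_k]

def promote(store, user_message, k=3):
    """Before the model call: inject the top-k relevant facts into context.
    The model just sees extra context; it never knows the store exists."""
    facts = store.search(user_message, top_k=k)
    memory = "\n".join(f"- {f.content}" for f in facts)
    return f"Relevant facts from memory:\n{memory}\n\nUser: {user_message}"
```

No LLM call is involved, which is why this step costs $0.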

02
~$0.003
~500–800 tokens

Extract

After the model call

Pull discrete facts from the completed conversation turn and store them with embeddings and explicit typed relationships.
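A sketch of the extract step, assuming the extractor LLM is asked for structured JSON. The prompt shape, key names, and store interface are assumptions for illustration, not Membrane's actual contract:

```python
import json

class MemoryStore:
    """Toy store; real Membrane persists facts with embeddings and typed edges."""
    def __init__(self):
        self.facts, self.edges = [], []

    def add(self, content):
        self.facts.append(content)
        return content

    def link(self, src, relation, dst):
        self.edges.append((src, relation, dst))

def extract(llm, store, user_msg, assistant_msg):
    """After the model call: one small LLM call pulls discrete facts from
    the turn; each is stored, optionally with an explicit typed edge."""
    prompt = (
        "Extract discrete facts from this exchange as a JSON list of "
        'objects with keys "content", "relation" (or null), "anchor".\n'
        f"User: {user_msg}\nAssistant: {assistant_msg}"
    )
    for item in json.loads(llm.complete(prompt)):
        fact = store.add(item["content"])
        if item.get("relation") and item.get("anchor"):
            store.link(fact, item["relation"], item["anchor"])
```

The single extraction call is what the ~$0.003 / ~500–800 tokens above pays for.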

03
$0 most turns
~$0.002 when triggered

Repair

After extraction

Check the model's response for contradictions against stored facts using two candidate pools for maximum coverage.
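A sketch of the repair step. The source doesn't define the two candidate pools, so this assumes one pool of facts promoted this turn and one from a fresh similarity search over the response; the pool composition and verdict prompt are illustrative:

```python
def repair(llm, response, promoted_pool, search_pool):
    """After extraction: screen the response against two candidate pools.
    Each flagged fact would become a CONTRADICTS edge in the graph."""
    contradictions = []
    # Union the pools, deduplicating while preserving order.
    for fact in dict.fromkeys(promoted_pool + search_pool):
        verdict = llm.complete(
            "Does the reply contradict this fact? Answer yes or no.\n"
            f"Fact: {fact}\nReply: {response}"
        )
        if verdict.strip().lower().startswith("yes"):
            contradictions.append(fact)
    return contradictions
```

Most turns surface no candidates, which is why this step is usually free and only ~$0.002 when triggered.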

Total overhead per turn: ~$0.003–0.005 on top of your main model call — regardless of which LLM you use.
MEMORY MANAGEMENT — DEMOTION

When a topic cluster grows beyond a threshold, Membrane compresses it into a single summary fact — one LLM call, children deactivated, anchor preserved with its original vector score. Context window slots reclaimed without losing provenance. Matches how the brain consolidates episodic detail into semantic memory.

Before: anchor + 4 children = 5 slots
After: anchor + summary (via traversal) = 2 slots

The anchor keeps its original similarity score for direct retrieval. The summary is wired as an ELABORATES child, so traversal surfaces it automatically. Deactivated members remain in the graph — auditable at any time via SUMMARIZES edges.
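The demotion flow described above can be sketched as follows. The graph interface and method names are assumptions for illustration, not Membrane's actual API:

```python
class Graph:
    """Toy stand-in for the Neo4j-backed fact graph."""
    def __init__(self):
        self.edges, self.inactive, self.children = [], set(), {}

    def active_children(self, anchor):
        return [c for c in self.children.get(anchor, []) if c not in self.inactive]

    def add_fact(self, content):
        return content

    def link(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def deactivate(self, fact):
        self.inactive.add(fact)  # hidden from retrieval, never deleted

def demote(llm, graph, anchor, min_cluster=5):
    """Compress an oversized cluster: one LLM call produces a summary fact,
    the children are deactivated, and the anchor keeps its vector score."""
    children = graph.active_children(anchor)
    if len(children) < min_cluster - 1:       # anchor + children below threshold
        return None
    summary = graph.add_fact(llm.complete(
        "Summarize these related facts in one sentence:\n"
        + "\n".join(f"- {c}" for c in children)
    ))
    graph.link(summary, "ELABORATES", anchor)  # traversal surfaces the summary
    for child in children:
        graph.link(summary, "SUMMARIZES", child)  # provenance preserved
        graph.deactivate(child)
    return summary
```

After the call, retrieval sees anchor + summary (2 slots) instead of anchor + 4 children (5 slots), while the SUMMARIZES edges keep every demoted member auditable.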

CLI COMMANDS

Inspect, resolve,
compress.

Membrane exposes its internal state through a set of in-chat commands — inspect what's stored, resolve contradictions, and compact memory on demand.

COMMAND       TYPE      WHAT IT DOES
/facts        inspect   Show all stored facts with access counts
/stats        inspect   Show fact counts, session ID, history length
/resolve      manage    Walk through unresolved CONTRADICTS pairs interactively — pick which version is current
/demote [n]   manage    Compress fact clusters into summaries. Optional arg: minimum cluster size (default 5)
/session      session   Start a new session — clears chat history, keeps memory graph intact
/clear        session   Clear all memory

Run with the graph backend to get the full command surface. /resolve and /demote require Neo4j — they operate on typed edges that the flat backend doesn't track.

THE MEMORY GRAPH

Five typed edges.
Semantic memory.

Facts aren't just stored — they're structured. Every relationship between facts is a typed edge with its own semantics and distinct behavior in retrieval and repair.

EDGE          DIRECTION            WRITTEN BY   MEANING
SUPERSEDES    new → old            Extract      New fact replaces old; old is deactivated
CONTRADICTS   f₁ ↔ f₂              Repair       Two active facts in tension; neither supersedes the other
ELABORATES    detail → anchor      Extract      Adds specificity without invalidating the anchor
DEPENDS_ON    dependent → anchor   Extract      Only interpretable given another fact
SUMMARIZES    summary → members    Demotion     Summary compresses a demoted cluster

Every edge carries a created_at timestamp. Deactivated facts are never deleted — full history is always auditable.
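A sketch of how a SUPERSEDES write and its audit walk fit together, using a toy in-memory graph; the class and method names are illustrative, not Membrane's actual API:

```python
from datetime import datetime, timezone

class FactGraph:
    """Toy in-memory stand-in for the Neo4j backend."""
    def __init__(self):
        self.edges = []       # (src, type, dst, created_at)
        self.inactive = set()

    def link(self, src, edge_type, dst):
        # Every edge carries a created_at timestamp.
        self.edges.append((src, edge_type, dst,
                           datetime.now(timezone.utc).isoformat()))

    def supersede(self, new, old):
        # New fact replaces old; old is deactivated, never deleted.
        self.link(new, "SUPERSEDES", old)
        self.inactive.add(old)

    def history(self, fact):
        """Walk SUPERSEDES edges back to the original belief."""
        chain = [fact]
        while True:
            nxt = next((d for s, t, d, _ in self.edges
                        if s == chain[-1] and t == "SUPERSEDES"), None)
            if nxt is None:
                return chain
            chain.append(nxt)
```

Because deactivation is a flag rather than a delete, `history` can always reconstruct the full chain of belief changes.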

example fact graph — 4 facts, 4 edges
[Diagram: facts "prefers Python" (superseded), "switched to Rust", "uses Tokio", "Linux target", plus "Summary: stack", connected by SUPERSEDES, ELABORATES, DEPENDS_ON, and SUMMARIZES edges]

Deactivated nodes remain in graph — auditable, never deleted

USE CASES

Where it works today.
Where it's going.

Membrane is validated and working for individual developers right now. Enterprise, compliance, and clinical deployments are the target markets the architecture was designed for.

working today

Single-Developer Persistent Memory

Working today

A developer working with an AI assistant across sessions — Membrane remembers architectural decisions, established patterns, and prior context without you having to re-explain every conversation. Drop in, run locally, works out of the box with zero infrastructure.

flat JSON backend · zero infrastructure · works today
target market

Enterprise Developer Tooling

Full audit trail

AI coding assistants that remember architectural decisions, project constraints, and team conventions across sessions. Every fact, every supersession, every contradiction is a queryable edge with a timestamp and session ID.

Neo4j backend · full provenance · team memory
target market

Customer-Facing Agents

Graceful memory, not truncation

Support and service agents that accumulate user preferences, prior issue history, and established context — without accumulating raw conversation history in the context window. Memory degrades gracefully via demotion rather than hard truncation at the context limit.

demotion · session persistence · context efficiency
target market

Financial Services

On-premises, compliance-ready

Analysts and advisors using AI tools that must recall prior conversations, client preferences, and established positions. On-premises deployment with a complete audit trail that addresses compliance requirements cloud-based memory services cannot meet.

on-prem option · audit trail
target market

Healthcare & Clinical Workflows

HIPAA, fully on-prem

Clinical AI tools that maintain longitudinal patient context across appointments. HIPAA-sensitive deployments run Membrane entirely on-premises with no data leaving the environment.

HIPAA · no cloud egress · on-prem Neo4j
target market

Research & Knowledge Work

Contradictions as first-class objects

Long-horizon research assistants where the AI needs to know what's been established, what's been tried, and what's in tension. The CONTRADICTS edge makes unresolved tensions a first-class object rather than a silent overwrite.

CONTRADICTS edges · long horizon · knowledge graph
MODEL AGNOSTIC
Claude · GPT-4 · Llama · Mistral · Bedrock · Any API

Membrane does not depend on any specific model architecture. Any backend implementing .complete() and .chat() can be substituted.
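The backend contract can be sketched as a small Protocol. Only the two method names come from the text above; the signatures are assumptions for illustration:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class LLMBackend(Protocol):
    """Any model exposing these two methods can sit behind Membrane."""
    def complete(self, prompt: str) -> str: ...
    def chat(self, messages: list[dict]) -> str: ...

class EchoBackend:
    """Trivial stand-in backend; swap in any vendor client with the same shape."""
    def complete(self, prompt: str) -> str:
        return f"completed: {prompt}"

    def chat(self, messages: list[dict]) -> str:
        return f"replied to {len(messages)} message(s)"

def run_turn(backend: LLMBackend, messages: list[dict]) -> str:
    # The memory layer only ever calls the interface, never a vendor SDK.
    return backend.chat(messages)
```

Swapping models is then a one-line change: construct a different backend and pass it in.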

WHAT MEMBRANE IS NOT

Isn't this just RAG?
No. Here's why.

NOT RAG

Conversational facts, not document retrieval.

RAG retrieves document chunks to answer questions. Membrane manages a structured graph of facts distilled from conversation — discrete, relationship-aware, and maintained across sessions. No document index required.

RAG is a lookup. Membrane is working memory.

NOT FINE-TUNING

No access to model weights. No training run.

Fine-tuning bakes knowledge into the model's parameters. Membrane works at the context layer — it can be deployed today, updated in real time, and pointed at any model. Changing what the model 'knows' takes milliseconds, not epochs.

Fine-tuning is slow, expensive, and bakes in stale data.

NOT A CLOUD MEMORY SERVICE

Your audit trail. In your graph. Owned by you.

Managed cloud memory services are opaque — you don't know what they store, how they retrieve it, or what they've forgotten. Membrane's audit trail is structural: every fact, every relationship, every supersession is a queryable edge in your Neo4j instance. An auditor can reconstruct exactly what the system believed at any point in time.

Cloud services log. Membrane graphs. The difference is queryability.

WHY WE BUILT THIS

This isn't another
memory framework.

The starting point wasn't an LLM problem. It was a neuroscience observation: biological memory isn't stored in one place, retrieved by one pathway, or lost all at once. Damage one route and another scaffolds recovery. That redundancy is structural — it's in the architecture, not the data.

Current LLM memory systems don't reflect that. They accumulate raw history, truncate hard at the context limit, and treat forgetting as an engineering problem rather than an architectural one. Membrane treats it as the latter.

Full write-up coming — follow the repo for updates.
THE INSIGHT

The hippocampus doesn't reason.

It stores indices into representations distributed across cortex and mediates retrieval. When one pathway fails, another that encoded the same information differently can scaffold recovery — as in Melodic Intonation Therapy, where stroke patients who lose speech can still sing, and singing becomes the scaffold back to language.

THE ARCHITECTURE

Two structurally different systems running in parallel.

The original design called for a second model (Mamba, RWKV) running alongside the LLM to handle state. What was built is stronger: the second pathway is infrastructure, not a model. It doesn't generate language. It manages state, detects failure, and provides scaffolding — exactly as the non-dominant hemisphere does in MIT.

THE CONSEQUENCE

Model-agnostic by construction, not by accident.

Because the memory system is structurally separate from the model, it works with any LLM. Changing what the model knows takes milliseconds. The audit trail is a graph, not a log. Contradictions are edges, not silent overwrites. These aren't features — they're consequences of getting the architecture right.

PROVENANCE & AUDITABILITY

Your memory.
In your graph.
Queryable by anyone.

The audit trail is structural, not logged. Every belief change is a graph edge. An auditor can reconstruct exactly what the system believed at any point in time — which session created each belief, what caused it to be updated, and what replaced it.

id             Stable 8-char identifier — portable across sessions
session_id     Which conversation session created the fact
created_at     Timestamp when the fact was first extracted
last_accessed  When the fact was last promoted to context
access_count   How many times this fact has been retrieved
superseded_by  Points to the fact that replaced this one
audit.cypher
Neo4j Cypher
// What did the system know about the user's language preference,
// and when did it change?

MATCH path = (current:Fact)-[:SUPERSEDES*]->(original:Fact)
WHERE current.content CONTAINS 'language'
RETURN path

// Every fact carries:
//   id            stable 8-char identifier
//   session_id    which conversation created it
//   created_at    full temporal provenance
//   access_count  how many times retrieved

// Every relationship carries:
//   created_at    when the relationship was established
//   edge type     the semantic meaning

// Full history is never deleted.

This is the property that distinguishes Membrane from managed cloud memory services: the audit trail is in the graph, accessible to any Cypher query, owned by the operator.