Architecture

Memory architecture

How a character remembers you. Every turn we recall relevant past beats into the prompt; only turns that carry durable, retrieval-worthy information write anything back. A knowledge graph — not a chat log — holds the memory.

Recall every turn, write only when it matters. The same Safety Guard that judges canon and safety also decides, per turn, whether memory extraction runs — so most turns skip the heavy write pipeline entirely.

Knowledge graph

Nodes + edges + 1536-d embeddings in Postgres (pgvector).

Salience-gated

Writes run only when the judge sets extractMemory = true.

Session-scoped

Keyed by session today. Cross-session user memory is on the roadmap.

Data model

Four tables, all scoped by (world_id, session_id). An immutable event log feeds a node/edge graph; a snapshot caches the projected state.

ontology_nodes · memories

stable_key
node_type · semantic_kind
title · summary
embedding vector(1536)
consent_state · status

ontology_edges · relations

from_node_id → to_node_id
relation_label
relation_description
evidence
confidence

world_events · event log

event_type
source_turn_id
payload (jsonb)
created_at

ontology_snapshots · projection cache

projection_version
payload (jsonb)
source_event_ids

db/migrations/001_ontology_scaffold.sql · HNSW cosine index on embedding; nodes upsert by stable_key.
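As a rough illustration of the node table and its upsert rule, here is a minimal in-memory sketch. The column names come from the list above; the TypeScript types, the `NodeStore` class, and the choice to combine `world_id`, `session_id`, and `stable_key` into one lookup key are assumptions for illustration, not the actual migration or runtime code.

```typescript
// Row shape mirroring the ontology_nodes columns listed above.
// Field types are assumptions; the real schema lives in the SQL migration.
interface OntologyNode {
  world_id: string;
  session_id: string;
  stable_key: string;
  node_type: string;
  semantic_kind: string;
  title: string;
  summary: string;
  embedding: number[]; // vector(1536) in Postgres
  consent_state: string;
  status: string;
}

const EMBEDDING_DIM = 1536;

// Hypothetical in-memory stand-in for the nodes table.
class NodeStore {
  private rows = new Map<string, OntologyNode>();

  // Assumption: stable_key is unique within a (world_id, session_id) scope.
  private keyOf(node: OntologyNode): string {
    return `${node.world_id}/${node.session_id}/${node.stable_key}`;
  }

  // Upsert by stable_key: a second write with the same key replaces the row.
  upsert(node: OntologyNode): "inserted" | "updated" {
    if (node.embedding.length !== EMBEDDING_DIM) {
      throw new Error(`expected a ${EMBEDDING_DIM}-d embedding`);
    }
    const key = this.keyOf(node);
    const existed = this.rows.has(key);
    this.rows.set(key, node);
    return existed ? "updated" : "inserted";
  }

  count(): number {
    return this.rows.size;
  }
}
```

Writing the same `stable_key` twice in one scope leaves a single row, which is the property the upsert is there to guarantee.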

Ontology graph

Anchors are a fixed identity backbone. Each meaningful turn grafts a state atom — and a typed edge — onto it. Gold edges are the memory-to-memory links that make it a graph, not a star.

anchor (backbone) · state_atom (memory + embedding) · memory ↔ memory edge
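The difference between a star and a graph can be shown with a small sketch. Everything here is hypothetical (the `Graph` shape, `graft`, `linkMemories`, `memoryEdges`, and the atom-to-anchor edge direction); it only illustrates the structure described above: atoms hang off anchors, and memory ↔ memory edges connect atoms to each other.

```typescript
type NodeKind = "anchor" | "state_atom";

interface GraphNode { id: string; kind: NodeKind; title: string; }
interface GraphEdge { from: string; to: string; label: string; }
interface Graph { nodes: Map<string, GraphNode>; edges: GraphEdge[]; }

// Anchors are created up front and form the fixed backbone.
function createGraph(anchors: GraphNode[]): Graph {
  const nodes = new Map<string, GraphNode>(
    anchors.map((a): [string, GraphNode] => [a.id, a]),
  );
  return { nodes, edges: [] };
}

// Graft a state atom onto an anchor with a typed edge.
// Assumption: the edge points from the atom to its anchor.
function graft(graph: Graph, atom: GraphNode, anchorId: string, label: string): void {
  if (graph.nodes.get(anchorId)?.kind !== "anchor") {
    throw new Error(`${anchorId} is not an anchor`);
  }
  graph.nodes.set(atom.id, atom);
  graph.edges.push({ from: atom.id, to: anchorId, label });
}

// Link two existing memories directly (a memory ↔ memory edge).
function linkMemories(graph: Graph, fromId: string, toId: string, label: string): void {
  const bothAtoms = [fromId, toId].every(
    (id) => graph.nodes.get(id)?.kind === "state_atom",
  );
  if (!bothAtoms) throw new Error("both endpoints must be state atoms");
  graph.edges.push({ from: fromId, to: toId, label });
}

// Edges whose endpoints are both state atoms; an empty result means a pure star.
function memoryEdges(graph: Graph): GraphEdge[] {
  return graph.edges.filter(
    (e) =>
      graph.nodes.get(e.from)?.kind === "state_atom" &&
      graph.nodes.get(e.to)?.kind === "state_atom",
  );
}
```

With only `graft` calls, `memoryEdges` stays empty and the structure is a star around each anchor; the first `linkMemories` call is what turns it into a graph.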

The turn loop

One judge call routes the turn. Recall happens on every turn; the write pipeline is gated behind the judge's extractMemory flag.

User turn

Safety Guard (judge)

mode · risk · flags · note · extractMemory

Read · every turn

1. Load recent beats (recency)

2. Compress → inject into prompt

3. Generate the reply

Write · gated

gate: extractMemory && mode ≠ safe_redirect

false → skip pipeline (most turns)
true → extract → link → embed → persist

lib/ai-engine/runtime.ts · lib/ontology/runtime.ts
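The flow above can be sketched as a single function. This is an illustration under stated assumptions, not the code in `lib/ai-engine/runtime.ts`: the `TurnDeps` interface and all of its method names are hypothetical, and the calls are synchronous here only to keep the sketch short (real model and database calls would be async).

```typescript
type Mode = "in_world_answer" | "memory_consent" | "safe_redirect";

// Only the two judge fields the routing below needs.
interface Verdict { mode: Mode; extractMemory: boolean; }

// Hypothetical dependencies, injected so the control flow can be read in isolation.
interface TurnDeps {
  judge(userTurn: string): Verdict;
  loadRecentBeats(): string[];
  compress(beats: string[]): string;
  generate(userTurn: string, memoryContext: string): string;
  writeMemory(userTurn: string): void; // extract → link → embed → persist
}

function runTurn(userTurn: string, deps: TurnDeps): { reply: string; wrote: boolean } {
  // One judge call routes the turn.
  const verdict = deps.judge(userTurn);

  // Read path: runs on every turn.
  const memoryContext = deps.compress(deps.loadRecentBeats());
  const reply = deps.generate(userTurn, memoryContext);

  // Write path: gated on extractMemory && mode ≠ safe_redirect.
  const shouldWrite = verdict.extractMemory && verdict.mode !== "safe_redirect";
  if (shouldWrite) deps.writeMemory(userTurn);

  return { reply, wrote: shouldWrite };
}
```

In this sketch the write runs inline before the function returns, so a gated turn pays for the write pipeline on the response path while a skipped turn does not.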

The Safety Guard contract

The judge returns five fields. Blocked status, review-required status, and the memory-candidate gate are derived in code, so the LLM is not asked to repeat them.

mode

in_world_answer · memory_consent · safe_redirect (routing decision)

risk

low · medium · high (merges old safetyStatus + canonRisk)

flags

string[] (violation labels, legal_floor:*)

note

string (merges generationGuidance + rationaleForAudit)

extractMemory

boolean (the per-turn memory gate)

Derived in code: safetyStatus = (mode = safe_redirect || risk = high) ? blocked : clear · reviewRequired = risk ≠ low · lib/ai-engine/policy.ts
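The five-field contract and the two derivations can be written down as types plus one pure function. The field names and rules are taken from this section; the names `JudgeResult`, `DerivedPolicy`, and `derivePolicy` are assumptions for illustration and may not match `lib/ai-engine/policy.ts`.

```typescript
type Mode = "in_world_answer" | "memory_consent" | "safe_redirect";
type Risk = "low" | "medium" | "high";

// The five fields the judge returns.
interface JudgeResult {
  mode: Mode;
  risk: Risk;
  flags: string[];
  note: string;
  extractMemory: boolean;
}

// Values computed in code rather than requested from the LLM.
interface DerivedPolicy {
  safetyStatus: "blocked" | "clear";
  reviewRequired: boolean;
}

function derivePolicy(judge: JudgeResult): DerivedPolicy {
  const blocked = judge.mode === "safe_redirect" || judge.risk === "high";
  return {
    safetyStatus: blocked ? "blocked" : "clear",
    reviewRequired: judge.risk !== "low",
  };
}
```

Under these rules a medium-risk in-world answer is clear but still marked for review, while any safe_redirect is blocked regardless of its risk level.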

The salience gate

Memory worth storing arrives in bursts, not on a clock. So the gate is event-driven — and it adds no new model call, because the judge already runs every turn.

No new call

extractMemory rides on the judge that already gates safety.

No bias to fire

Default false. Greetings, acknowledgements, recap → skip.

Hard floor

safe_redirect always forces extractMemory = false.
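One way to enforce "default false" and the hard floor when reading the judge's raw output is sketched below. The function name `resolveExtractMemory` is hypothetical, and treating a missing or non-boolean `extractMemory` as false is an assumption about how malformed output should be handled.

```typescript
// Resolve the memory gate from untrusted judge output.
function resolveExtractMemory(raw: unknown): boolean {
  if (typeof raw !== "object" || raw === null) return false;
  const verdict = raw as { mode?: unknown; extractMemory?: unknown };

  // Hard floor: safe_redirect always forces the gate shut.
  if (verdict.mode === "safe_redirect") return false;

  // No bias to fire: only an explicit boolean true opens the gate.
  return verdict.extractMemory === true;
}
```

A string `"true"`, a missing field, or an unparsable response all resolve to false here, so a malformed judge reply skips the write pipeline rather than triggering it.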

Roadmap

What the current build deliberately leaves for the next increments.

Recall by embedding similarity, not just recency.
Move edge-linking from per-node to a periodic consolidation pass.
Take the write off the response critical path (fire-and-forget).
A periodic backstop that re-scans recent turns the single-turn gate missed.
Promote the memory key from session to a stable user identity.
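For the first roadmap item, recall by embedding similarity could look roughly like the sketch below. It is not part of the current build: `Beat`, `cosineSimilarity`, and `recallBySimilarity` are hypothetical names, and the ranking is done in memory for clarity rather than through the index in Postgres.

```typescript
// Hypothetical shape for a stored beat with its embedding.
interface Beat { id: string; embedding: number[]; }

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k beats most similar to the query embedding, best first.
function recallBySimilarity(query: number[], beats: Beat[], k: number): Beat[] {
  return beats
    .map((beat) => ({ beat, score: cosineSimilarity(query, beat.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.beat);
}
```

In a real deployment the same ranking would more likely be pushed down to pgvector, using its cosine-distance ordering against the existing HNSW index, instead of scoring every row in application code.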
