LLM-powered long-term memory engine

Apache-2.0 · Open Source

Rust core with Python, TypeScript & Rust bindings.
Extracts facts and entity relations from conversations, deduplicates, and stores them in vector + graph databases with full audit history.

Ebbinghaus forgetting curve — stale memories decay, frequently recalled facts grow stronger.
Session-aware recall — memories are typed and queries are auto-classified, so irrelevant context never reaches your agent.

pip install mem7
npm install @mem7ai/mem7
cargo add mem7

Quick Start

Get up and running in minutes. Pick your language.

quickstart.py
from mem7 import Memory
from mem7.config import MemoryConfig, LlmConfig, EmbeddingConfig

config = MemoryConfig(
    llm=LlmConfig(
        base_url="http://localhost:11434/v1",
        api_key="ollama",
        model="qwen2.5:7b",
    ),
    embedding=EmbeddingConfig(
        base_url="http://localhost:11434/v1",
        api_key="ollama",
        model="mxbai-embed-large",
        dims=1024,
    ),
)

m = Memory(config=config)

# Add a memory from conversation
m.add("I love playing tennis and my coach is Sarah.", user_id="alice")

# Semantic search
results = m.search("What sports does Alice play?", user_id="alice")

# Get all memories for a user
memories = m.get_all(user_id="alice")
quickstart.ts
import { MemoryEngine } from "@mem7ai/mem7";

const engine = await MemoryEngine.create(JSON.stringify({
  llm: {
    base_url: "http://localhost:11434/v1",
    api_key: "ollama",
    model: "qwen2.5:7b",
  },
  embedding: {
    base_url: "http://localhost:11434/v1",
    api_key: "ollama",
    model: "mxbai-embed-large",
    dims: 1024,
  },
}));

// Add a memory from conversation
await engine.add(
  [{ role: "user", content: "I love playing tennis and my coach is Sarah." }],
  "alice"
);

// Semantic search
const results = await engine.search("What sports does Alice play?", "alice");

// Get all memories for a user
const memories = await engine.getAll("alice");
main.rs
use mem7::{ChatMessage, MemoryEngine, MemoryEngineConfig};

#[tokio::main]
async fn main() -> mem7::Result<()> {
    let config = MemoryEngineConfig::default();

    let engine = MemoryEngine::new(config).await?;
    let messages = vec![
        ChatMessage {
            role: "user".into(),
            content: "I love playing tennis and my coach is Sarah.".into(),
            images: vec![],
        },
    ];

    // Add a memory
    engine
        .add(&messages, Some("alice"), None, None, None, true)
        .await?;

    // Semantic search
    let results = engine
        .search(
            "What sports does Alice play?",
            Some("alice"),
            None,
            None,
            5,
            None,
            true,
            None,
            None,
        )
        .await?;

    Ok(())
}

Architecture

High-performance Rust core with zero-cost language bindings.

Bindings: Python (PyO3) · TypeScript (napi-rs) · Rust (native)
Rust Core (tokio async runtime): mem7-llm · mem7-embedding · mem7-vector · mem7-history · mem7-dedup · mem7-reranker · mem7-graph · mem7-telemetry · mem7-store

Rust Performance

Async Rust core with tokio runtime. Native speed for embedding, deduplication, and vector operations.

Multi-Language

First-class Python, TypeScript, and Rust APIs. Zero-overhead bindings via PyO3 and napi-rs.

Pluggable Providers

Swap LLMs, embeddings, and vector stores via config. One OpenAI-compatible client covers most providers.

Audit Trail

Every ADD, UPDATE, and DELETE is recorded in a SQLite audit log with full history per memory.

Multi-User

Memories can be scoped by user_id, agent_id, and run_id for multi-tenant, per-agent, and per-session isolation.

Smart Deduplication

LLM-driven deduplication decides whether to add, update, or skip based on existing memories.

Graph Memory

Optional dual-path recall: vector search + graph search run concurrently. Extracts entities & relations via LLM.

Memory Decay

Ebbinghaus forgetting curve deprioritizes stale memories over time. Frequently recalled facts get stronger — just like human memory.

Observability

Built-in OpenTelemetry integration. Export trace spans for every operation via OTLP to Jaeger, Grafana Tempo, or any collector.

Dual-Path Recall

When graph is enabled, add() and search() run both paths concurrently via tokio::join!

add(messages)
Vector Path: LLM extracts facts → embedding vectorizes → dedup & store → memories
Graph Path: LLM extracts entities → LLM extracts relations → store in graph DB → relations
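
The sketch below is an illustrative Python analogue of that fan-out; the real engine does this in Rust with tokio::join!, and the helper functions (extract_facts, embed, dedup_and_store, extract_entities, extract_relations, store_relations) are hypothetical stand-ins for the crate-level stages above, not mem7 APIs.

dual_path_sketch.py
import asyncio

# Hypothetical stand-ins for the pipeline stages shown above;
# the real engine drives its LLM, embedding, and store crates here.
async def extract_facts(messages):    return ["Alice plays tennis", "Alice's coach is Sarah"]
async def embed(facts):               return [[0.0] * 4 for _ in facts]
async def dedup_and_store(facts, _v): return facts
async def extract_entities(messages): return ["Alice", "tennis", "Sarah"]
async def extract_relations(ents):    return [("Alice", "plays", "tennis"), ("Sarah", "coaches", "Alice")]
async def store_relations(rels):      return rels

async def vector_path(messages):
    facts = await extract_facts(messages)          # LLM: extract facts
    vectors = await embed(facts)                   # Embedding: vectorize
    return await dedup_and_store(facts, vectors)   # Dedup & store -> memories

async def graph_path(messages):
    entities = await extract_entities(messages)    # LLM: extract entities
    relations = await extract_relations(entities)  # LLM: extract relations
    return await store_relations(relations)        # Store in graph DB -> relations

async def add(messages):
    # Both paths run concurrently and join at the end,
    # mirroring the tokio::join! fan-out in the Rust core.
    memories, relations = await asyncio.gather(vector_path(messages), graph_path(messages))
    return memories, relations

print(asyncio.run(add(["I love playing tennis and my coach is Sarah."])))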

Memory Decay

Ebbinghaus-inspired forgetting curve — stale memories fade, frequently recalled facts grow stronger.

The Model

S0 = base half-life · α = rehearsal factor · n = access count · τ = last accessed at · γ = decay shape · ρ = min retention
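
The exact curve lives in the Rust core; the sketch below is only a plausible reading of these symbols and of the config parameters further down (stretched-exponential decay, a rehearsal bonus on stability, and a retention floor). Treat the formula, the function, and the variable names as illustrative, not the engine's literal implementation.

retention_sketch.py
import math
import time

def retention(last_accessed_at: float, access_count: int,
              base_half_life_secs: float = 604800.0,   # S0 (7 days)
              rehearsal_factor: float = 0.5,           # α
              decay_shape: float = 0.8,                # γ
              min_retention: float = 0.1) -> float:    # ρ
    """Illustrative stretched-exponential retention with a rehearsal bonus."""
    elapsed = time.time() - last_accessed_at                                    # t - τ
    stability = base_half_life_secs * (1.0 + rehearsal_factor * access_count)   # each recall stretches S
    score = math.exp(-((elapsed / stability) ** decay_shape))
    return max(min_retention, score)                                            # floor: never fully vanishes

week_ago = time.time() - 7 * 24 * 3600
print(retention(week_ago, access_count=0))   # untouched for a week: heavily decayed (~0.37 here)
print(retention(week_ago, access_count=4))   # recalled four times: decays much more slowly (~0.66 here)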

How It Works

1. Write: New memories are stamped with last_accessed_at and access_count = 0.
2. Decay: Over time, the retention score drops — stale memories rank lower in search and dedup.
3. Rehearsal: Each successful retrieval increments access_count and resets the timestamp (async, fire-and-forget; sketched below).
4. Cue Awakening: A highly relevant query naturally revives old memories via high raw_similarity.
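
A runnable sketch of the rehearsal step (step 3 above), using a hypothetical MemoryRecord shape and asyncio's create_task as the fire-and-forget mechanism; the real engine does this bookkeeping inside its Rust store.

rehearsal_sketch.py
import asyncio
import time
from dataclasses import dataclass

@dataclass
class MemoryRecord:                     # hypothetical record shape
    text: str
    last_accessed_at: float
    access_count: int = 0

async def rehearse(record: MemoryRecord) -> None:
    # Bookkeeping write: bump the counter and reset the recency timestamp.
    record.access_count += 1
    record.last_accessed_at = time.time()

async def search(records: list[MemoryRecord]) -> list[MemoryRecord]:
    hits = records[:1]                  # pretend these matched the query
    for hit in hits:
        # Fire-and-forget: the search response does not wait for the update.
        asyncio.create_task(rehearse(hit))
    return hits

async def main() -> None:
    records = [MemoryRecord("Alice loves tennis", time.time() - 3600)]
    await search(records)
    await asyncio.sleep(0)              # give the background task a turn in this demo
    print(records[0].access_count)      # 1

asyncio.run(main())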

Read Path

After vector search and optional reranking, each result's score is multiplied by its retention factor. Results are re-sorted by the decayed score. Graph search results receive the same treatment after BM25 reranking.
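
A toy illustration of that re-sort (the texts, scores, and retention factors below are made up): a stale memory with a slightly higher raw score drops below fresher ones once decay is applied.

decayed_rerank_sketch.py
# Hypothetical search results: (memory, raw or reranked score, retention factor).
results = [
    ("Alice's coach is Sarah",       0.82, 1.00),  # fresh, frequently recalled
    ("Alice used to play badminton", 0.79, 0.18),  # stale, rarely recalled
    ("Alice loves tennis",           0.74, 0.65),
]

# Multiply each score by its retention factor, then re-sort by the decayed score.
decayed = sorted(((text, score * ret) for text, score, ret in results),
                 key=lambda pair: pair[1], reverse=True)
for text, score in decayed:
    print(f"{score:.2f}  {text}")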

Write Path

During deduplication, existing memories retrieved as candidates also have decay applied before the LLM decides ADD/UPDATE/DELETE. Stale memories appear less "close" to new facts, making updates more likely.
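
In the same toy style (the candidate list, the decay multiply, and the llm_decide stub are illustrative stand-ins, not the engine's prompt or schema):

dedup_decay_sketch.py
def llm_decide(new_fact: str, candidates: list[tuple[str, float]]) -> str:
    """Stand-in for the LLM call that sees the new fact alongside the decayed
    candidates and returns ADD, UPDATE, or DELETE."""
    return "UPDATE"

new_fact = "Alice's coach is now Marco."
# Hypothetical dedup candidates: (existing memory, raw similarity, retention factor).
candidates = [("Alice's coach is Sarah", 0.88, 0.20)]

# Decay is applied before the LLM sees the candidates: 0.88 * 0.20 = 0.176,
# so the stale memory no longer reads as a near-duplicate of the new fact.
decayed = [(text, sim * ret) for text, sim, ret in candidates]
print(llm_decide(new_fact, decayed))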

Floor Guarantee

The min_retention parameter (default 10%) ensures no memory fully vanishes. Even a year-old untouched memory still contributes a small signal if the query is highly relevant.

Configuration

Decay is off by default. Enable and tune via config.

decay_config.py
from mem7.config import MemoryConfig, DecayConfig

config = MemoryConfig(
    # ... llm, embedding, etc.
    decay=DecayConfig(
        enabled=True,
        base_half_life_secs=604800.0,  # 7 days
        decay_shape=0.8,
        min_retention=0.1,
        rehearsal_factor=0.5,
    ),
)
decay_config.ts
const engine = await MemoryEngine.create(JSON.stringify({
  // ... llm, embedding, etc.
  decay: {
    enabled: true,
    base_half_life_secs: 604800.0,  // 7 days
    decay_shape: 0.8,
    min_retention: 0.1,
    rehearsal_factor: 0.5,
  },
}));
decay_config.rs
use mem7_config::{MemoryEngineConfig, DecayConfig};

let config = MemoryEngineConfig {
    decay: Some(DecayConfig {
        enabled: true,
        base_half_life_secs: 604800.0,  // 7 days
        decay_shape: 0.8,
        min_retention: 0.1,
        rehearsal_factor: 0.5,
        ..Default::default()
    }),
    ..Default::default()
};
Parameter Default Description
base_half_life_secs 604800.0 Base stability in seconds (7 days) before any rehearsal bonus
decay_shape 0.8 Stretched-exponential shape (0 < γ ≤ 1); lower = slower initial decay
min_retention 0.1 Floor so no memory fully vanishes
rehearsal_factor 0.5 How much each retrieval increases stability

Session-Aware Recall

Context-aware scoring that demotes irrelevant memories based on task type — so design preferences don't leak into bug fixing.

The Scoring Model

The context coefficient is looked up from a (memory_type, task_type) weight matrix. Each fact is typed at write time; each query is classified at read time — in parallel with embedding, adding zero sequential latency.

How It Works

1. Memory Typing (Write): LLM classifies each extracted fact as factual, preference, procedural, or episodic.
2. Query Classification (Read): A lightweight LLM call classifies the query as troubleshooting, design, factual_lookup, planning, or general.
3. Context Coefficient: Score is multiplied by the weight matrix value. E.g. preference × troubleshooting = 0.3.
4. Re-ranked Results: Memories are re-sorted by final score. Contextually irrelevant items drop to the bottom.

Default Weight Matrix

Rows = memory type  |  Columns = task type  |  Values = context coefficient

             troubleshoot   design   factual   planning   general
factual      1.0            0.5      1.0       0.7        1.0
preference   0.3            1.0      0.3       0.8        0.8
procedural   0.8            0.5      0.5       1.0        0.7
episodic     0.5            0.5      0.5       0.5        0.7
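
The lookup itself is just an indexed multiply. A minimal Python sketch using the default weights above; the dictionary layout and function name are illustrative, not the engine's API.

context_score_sketch.py
# Default (memory_type, task_type) weight matrix from the table above.
CONTEXT_WEIGHTS = {
    "factual":    {"troubleshooting": 1.0, "design": 0.5, "factual_lookup": 1.0, "planning": 0.7, "general": 1.0},
    "preference": {"troubleshooting": 0.3, "design": 1.0, "factual_lookup": 0.3, "planning": 0.8, "general": 0.8},
    "procedural": {"troubleshooting": 0.8, "design": 0.5, "factual_lookup": 0.5, "planning": 1.0, "general": 0.7},
    "episodic":   {"troubleshooting": 0.5, "design": 0.5, "factual_lookup": 0.5, "planning": 0.5, "general": 0.7},
}

def context_score(similarity: float, memory_type: str, task_type: str) -> float:
    """Multiply a similarity score by the context coefficient for this query."""
    return similarity * CONTEXT_WEIGHTS[memory_type][task_type]

# A design preference scores well for a design query but is demoted
# when the query is classified as troubleshooting.
print(context_score(0.80, "preference", "design"))           # 0.80
print(context_score(0.80, "preference", "troubleshooting"))  # 0.24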

Zero Latency Overhead

Query classification runs in parallel with embedding via tokio::join!, so it adds no sequential latency to the search path.

Callers can also pass task_type directly to skip the LLM call entirely.

Configuration

Context scoring is off by default. Enable and optionally customize weights.

context_config.py
from mem7.config import MemoryConfig, ContextConfig

config = MemoryConfig(
    # ... llm, embedding, etc.
    context=ContextConfig(enabled=True),
)

m = Memory(config=config)

# Auto-classification (LLM decides task type)
results = m.search("fix Chrome CDP timeout", user_id="alice")

# Override: skip LLM call, tell mem7 this is troubleshooting
results = m.search("fix Chrome CDP timeout", user_id="alice", task_type="troubleshooting")
context_config.ts
const engine = await MemoryEngine.create(JSON.stringify({
  // ... llm, embedding, etc.
  context: { enabled: true },
}));

// Auto-classification
const results = await engine.search("fix Chrome CDP timeout", "alice");

// Override task type
const results2 = await engine.search(
  "fix Chrome CDP timeout", "alice",
  undefined, undefined, undefined, undefined, undefined, "troubleshooting"
);

Supported Providers

One OpenAI-compatible client covers most LLM and embedding providers out of the box.
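
For example, swapping providers is just a matter of pointing the quickstart's LlmConfig and EmbeddingConfig at a different OpenAI-compatible endpoint; the endpoint, key, model names, and dims below are ordinary OpenAI values used for illustration, not mem7 defaults.

openai_config.py
from mem7.config import MemoryConfig, LlmConfig, EmbeddingConfig

# Same config shape as the quickstart; only the endpoint, key, and models change.
config = MemoryConfig(
    llm=LlmConfig(
        base_url="https://api.openai.com/v1",   # any OpenAI-compatible endpoint works
        api_key="sk-...",
        model="gpt-4o-mini",
    ),
    embedding=EmbeddingConfig(
        base_url="https://api.openai.com/v1",
        api_key="sk-...",
        model="text-embedding-3-small",
        dims=1536,
    ),
)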

LLMs

OpenAI
Ollama
vLLM
Groq
Together
DeepSeek
xAI (Grok)
LM Studio
Azure OpenAI
Anthropic (planned)
Gemini (planned)
Vertex AI (planned)
AWS Bedrock (planned)

Any OpenAI-compatible API works

Embeddings

OpenAI
Ollama
Together
LM Studio
Azure OpenAI
FastEmbed (local ONNX)
Hugging Face (planned)
Gemini (planned)
Vertex AI (planned)
AWS Bedrock (planned)

Any OpenAI-compatible API works

Vector Stores

In-memory (FlatIndex)
Upstash Vector
Qdrant (planned)
Chroma (planned)
pgvector (planned)
Milvus (planned)
Pinecone (planned)
Redis (planned)
Weaviate (planned)
Elasticsearch (planned)
FAISS (planned)
MongoDB (planned)

Graph Stores

In-memory (FlatGraph)
Kuzu (embedded)
Neo4j
Memgraph (planned)
Amazon Neptune (planned)

Optional dual-path recall (vector + graph)

Rerankers

Cohere
LLM-based
Jina AI (planned)
Voyage AI (planned)
Cross-encoder (planned)

Optional post-search reranking

Language Bindings

Use mem7 from your preferred language.