LLM-powered long-term memory engine

Apache-2.0 · Open Source

Rust core with Python, TypeScript & Rust bindings.
Extracts facts and entity relations from conversations, deduplicates, and stores them in vector + graph databases with full audit history.

Ebbinghaus forgetting curve — stale memories decay, frequently recalled facts grow stronger.
Session-aware recall — memories are typed and queries are auto-classified, so irrelevant context never reaches your agent.

pip install mem7
npm install @mem7ai/mem7
cargo add mem7

Quick Start

Get up and running in minutes. Pick your language.

quickstart.py
from mem7 import Memory
from mem7.config import MemoryConfig, LlmConfig, EmbeddingConfig

config = MemoryConfig(
    llm=LlmConfig(
        base_url="http://localhost:11434/v1",
        api_key="ollama",
        model="qwen2.5:7b",
    ),
    embedding=EmbeddingConfig(
        base_url="http://localhost:11434/v1",
        api_key="ollama",
        model="mxbai-embed-large",
        dims=1024,
    ),
)

m = Memory(config=config)

# Add a memory from conversation
m.add("I love playing tennis and my coach is Sarah.", user_id="alice")

# Semantic search
results = m.search("What sports does Alice play?", user_id="alice")

# Get all memories for a user
memories = m.get_all(user_id="alice")
quickstart.ts
import { MemoryEngine } from "@mem7ai/mem7";

const engine = await MemoryEngine.create(JSON.stringify({
  llm: {
    base_url: "http://localhost:11434/v1",
    api_key: "ollama",
    model: "qwen2.5:7b",
  },
  embedding: {
    base_url: "http://localhost:11434/v1",
    api_key: "ollama",
    model: "mxbai-embed-large",
    dims: 1024,
  },
}));

// Add a memory from conversation
await engine.add(
  [{ role: "user", content: "I love playing tennis and my coach is Sarah." }],
  "alice"
);

// Semantic search
const results = await engine.search("What sports does Alice play?", "alice");

// Get all memories for a user
const memories = await engine.getAll("alice");
main.rs
use mem7::{ChatMessage, MemoryEngine, MemoryEngineConfig};

#[tokio::main]
async fn main() -> mem7::Result<()> {
    let config = MemoryEngineConfig::default();

    let engine = MemoryEngine::new(config).await?;
    let messages = vec![
        ChatMessage {
            role: "user".into(),
            content: "I love playing tennis and my coach is Sarah.".into(),
            images: vec![],
        },
    ];

    // Add a memory
    engine
        .add(&messages, Some("alice"), None, None, None, true)
        .await?;

    // Semantic search
    let results = engine
        .search(
            "What sports does Alice play?",
            Some("alice"),
            None,
            None,
            5,
            None,
            true,
            None,
            None,
        )
        .await?;

    Ok(())
}

Architecture

High-performance Rust core with zero-cost language bindings.

Bindings: Python (PyO3) · TypeScript (napi-rs) · Rust (native)
Rust Core (tokio async runtime): mem7-llm · mem7-embedding · mem7-vector · mem7-history · mem7-dedup · mem7-reranker · mem7-graph · mem7-telemetry · mem7-store

Rust Performance

Async Rust core with tokio runtime. Native speed for embedding, deduplication, and vector operations.

Multi-Language

First-class Python, TypeScript, and Rust APIs. Zero-overhead bindings via PyO3 and napi-rs.

Pluggable Providers

Swap LLMs, embeddings, and vector stores via config. One OpenAI-compatible client covers most providers.

Audit Trail

Every ADD, UPDATE, and DELETE is recorded in a SQLite audit log with full history per memory.

Multi-User

Memories can be scoped by user_id, agent_id, and run_id for multi-tenant, per-agent, and per-session isolation.

Smart Deduplication

LLM-driven deduplication decides whether to add, update, or skip based on existing memories.

Graph Memory

Optional dual-path recall: vector search + graph search run concurrently. Extracts entities & relations via LLM.

Memory Decay

Ebbinghaus forgetting curve deprioritizes stale memories over time. Frequently recalled facts get stronger — just like human memory.

Observability

Built-in OpenTelemetry integration. Export trace spans for every operation via OTLP to Jaeger, Grafana Tempo, or any collector.

Dual-Path Recall

When graph is enabled, add() and search() run both paths concurrently via tokio::join!

add(messages)
Vector Path: LLM extracts facts → embedding vectorizes → dedup & store → memories
Graph Path: LLM extracts entities → LLM extracts relations → store in graph DB → relations
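
The sketch below is an illustrative Python analogue of that fan-out; the real engine does this in Rust with tokio::join!, and the helper functions (extract_facts, embed, dedup_and_store, extract_entities, extract_relations, store_relations) are hypothetical stand-ins for the crate-level stages above, not mem7 APIs.

dual_path_sketch.py
import asyncio

# Hypothetical stand-ins for the pipeline stages shown above;
# the real engine drives its LLM, embedding, and store crates here.
async def extract_facts(messages):    return ["Alice plays tennis", "Alice's coach is Sarah"]
async def embed(facts):               return [[0.0] * 4 for _ in facts]
async def dedup_and_store(facts, _v): return facts
async def extract_entities(messages): return ["Alice", "tennis", "Sarah"]
async def extract_relations(ents):    return [("Alice", "plays", "tennis"), ("Sarah", "coaches", "Alice")]
async def store_relations(rels):      return rels

async def vector_path(messages):
    facts = await extract_facts(messages)          # LLM: extract facts
    vectors = await embed(facts)                   # Embedding: vectorize
    return await dedup_and_store(facts, vectors)   # Dedup & store -> memories

async def graph_path(messages):
    entities = await extract_entities(messages)    # LLM: extract entities
    relations = await extract_relations(entities)  # LLM: extract relations
    return await store_relations(relations)        # Store in graph DB -> relations

async def add(messages):
    # Both paths run concurrently and join at the end,
    # mirroring the tokio::join! fan-out in the Rust core.
    memories, relations = await asyncio.gather(vector_path(messages), graph_path(messages))
    return memories, relations

print(asyncio.run(add(["I love playing tennis and my coach is Sarah."])))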

Memory Decay

Ebbinghaus-inspired forgetting curve — stale memories fade, frequently recalled facts grow stronger.

The Model

S0 = base half-life · α = rehearsal factor · n = access count · τ = last accessed at · γ = decay shape · ρ = min retention
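
The exact curve lives in the Rust core; the sketch below is only a plausible reading of these symbols and of the config parameters further down (stretched-exponential decay, a rehearsal bonus on stability, and a retention floor). Treat the formula, the function, and the variable names as illustrative, not the engine's literal implementation.

retention_sketch.py
import math
import time

def retention(last_accessed_at: float, access_count: int,
              base_half_life_secs: float = 604800.0,   # S0 (7 days)
              rehearsal_factor: float = 0.5,           # α
              decay_shape: float = 0.8,                # γ
              min_retention: float = 0.1) -> float:    # ρ
    """Illustrative stretched-exponential retention with a rehearsal bonus."""
    elapsed = time.time() - last_accessed_at                                    # t - τ
    stability = base_half_life_secs * (1.0 + rehearsal_factor * access_count)   # each recall stretches S
    score = math.exp(-((elapsed / stability) ** decay_shape))
    return max(min_retention, score)                                            # floor: never fully vanishes

week_ago = time.time() - 7 * 24 * 3600
print(retention(week_ago, access_count=0))   # untouched for a week: heavily decayed (~0.37 here)
print(retention(week_ago, access_count=4))   # recalled four times: decays much more slowly (~0.66 here)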

How It Works

1. Write: New memories are stamped with last_accessed_at and access_count = 0.
2. Decay: Over time, the retention score drops — stale memories rank lower in search and dedup.
3. Rehearsal: Each successful retrieval increments access_count and resets the timestamp (async, fire-and-forget; sketched below).
4. Cue Awakening: A highly relevant query naturally revives old memories via high raw_similarity.
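
A runnable sketch of the rehearsal step (step 3 above), using a hypothetical MemoryRecord shape and asyncio's create_task as the fire-and-forget mechanism; the real engine does this bookkeeping inside its Rust store.

rehearsal_sketch.py
import asyncio
import time
from dataclasses import dataclass

@dataclass
class MemoryRecord:                     # hypothetical record shape
    text: str
    last_accessed_at: float
    access_count: int = 0

async def rehearse(record: MemoryRecord) -> None:
    # Bookkeeping write: bump the counter and reset the recency timestamp.
    record.access_count += 1
    record.last_accessed_at = time.time()

async def search(records: list[MemoryRecord]) -> list[MemoryRecord]:
    hits = records[:1]                  # pretend these matched the query
    for hit in hits:
        # Fire-and-forget: the search response does not wait for the update.
        asyncio.create_task(rehearse(hit))
    return hits

async def main() -> None:
    records = [MemoryRecord("Alice loves tennis", time.time() - 3600)]
    await search(records)
    await asyncio.sleep(0)              # give the background task a turn in this demo
    print(records[0].access_count)      # 1

asyncio.run(main())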

Read Path

After vector search and optional reranking, each result's score is multiplied by its retention factor. Results are re-sorted by the decayed score. Graph search results receive the same treatment after BM25 reranking.
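
A toy illustration of that re-sort (the texts, scores, and retention factors below are made up): a stale memory with a slightly higher raw score drops below fresher ones once decay is applied.

decayed_rerank_sketch.py
# Hypothetical search results: (memory, raw or reranked score, retention factor).
results = [
    ("Alice's coach is Sarah",       0.82, 1.00),  # fresh, frequently recalled
    ("Alice used to play badminton", 0.79, 0.18),  # stale, rarely recalled
    ("Alice loves tennis",           0.74, 0.65),
]

# Multiply each score by its retention factor, then re-sort by the decayed score.
decayed = sorted(((text, score * ret) for text, score, ret in results),
                 key=lambda pair: pair[1], reverse=True)
for text, score in decayed:
    print(f"{score:.2f}  {text}")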

Write Path

During deduplication, existing memories retrieved as candidates also have decay applied before the LLM decides ADD/UPDATE/DELETE. Stale memories appear less "close" to new facts, making updates more likely.
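
In the same toy style (the candidate list, the decay multiply, and the llm_decide stub are illustrative stand-ins, not the engine's prompt or schema):

dedup_decay_sketch.py
def llm_decide(new_fact: str, candidates: list[tuple[str, float]]) -> str:
    """Stand-in for the LLM call that sees the new fact alongside the decayed
    candidates and returns ADD, UPDATE, or DELETE."""
    return "UPDATE"

new_fact = "Alice's coach is now Marco."
# Hypothetical dedup candidates: (existing memory, raw similarity, retention factor).
candidates = [("Alice's coach is Sarah", 0.88, 0.20)]

# Decay is applied before the LLM sees the candidates: 0.88 * 0.20 = 0.176,
# so the stale memory no longer reads as a near-duplicate of the new fact.
decayed = [(text, sim * ret) for text, sim, ret in candidates]
print(llm_decide(new_fact, decayed))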

Floor Guarantee

The min_retention parameter (default 10%) ensures no memory fully vanishes. Even a year-old untouched memory still contributes a small signal if the query is highly relevant.

Configuration

Decay is off by default. Enable and tune via config.

decay_config.py
from mem7.config import MemoryConfig, DecayConfig

config = MemoryConfig(
    # ... llm, embedding, etc.
    decay=DecayConfig(
        enabled=True,
        base_half_life_secs=604800.0,  # 7 days
        decay_shape=0.8,
        min_retention=0.1,
        rehearsal_factor=0.5,
    ),
)
decay_config.ts
const engine = await MemoryEngine.create(JSON.stringify({
  // ... llm, embedding, etc.
  decay: {
    enabled: true,
    base_half_life_secs: 604800.0,  // 7 days
    decay_shape: 0.8,
    min_retention: 0.1,
    rehearsal_factor: 0.5,
  },
}));
decay_config.rs
use mem7_config::{MemoryEngineConfig, DecayConfig};

let config = MemoryEngineConfig {
    decay: Some(DecayConfig {
        enabled: true,
        base_half_life_secs: 604800.0,  // 7 days
        decay_shape: 0.8,
        min_retention: 0.1,
        rehearsal_factor: 0.5,
        ..Default::default()
    }),
    ..Default::default()
};
Parameter Default Description
base_half_life_secs 604800.0 Base stability in seconds (7 days) before any rehearsal bonus
decay_shape 0.8 Stretched-exponential shape (0 < γ ≤ 1); lower = slower initial decay
min_retention 0.1 Floor so no memory fully vanishes
rehearsal_factor 0.5 How much each retrieval increases stability

Session-Aware Recall

Context-aware scoring that demotes irrelevant memories based on task type — so design preferences don't leak into bug fixing.

The Scoring Model

The context coefficient is looked up from a (memory_type, task_type) weight matrix. Each fact is typed at write time; each query is classified at read time — in parallel with embedding, adding zero sequential latency.

How It Works

1. Memory Typing (Write): LLM classifies each extracted fact as factual, preference, procedural, or episodic.
2. Query Classification (Read): A lightweight LLM call classifies the query as troubleshooting, design, factual_lookup, planning, or general.
3. Context Coefficient: Score is multiplied by the weight matrix value. E.g. preference × troubleshooting = 0.3.
4. Re-ranked Results: Memories are re-sorted by final score. Contextually irrelevant items drop to the bottom.

Default Weight Matrix

Rows = memory type  |  Columns = task type  |  Values = context coefficient

             troubleshoot   design   factual   planning   general
factual      1.0            0.5      1.0       0.7        1.0
preference   0.3            1.0      0.3       0.8        0.8
procedural   0.8            0.5      0.5       1.0        0.7
episodic     0.5            0.5      0.5       0.5        0.7
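
The lookup itself is just an indexed multiply. A minimal Python sketch using the default weights above; the dictionary layout and function name are illustrative, not the engine's API.

context_score_sketch.py
# Default (memory_type, task_type) weight matrix from the table above.
CONTEXT_WEIGHTS = {
    "factual":    {"troubleshooting": 1.0, "design": 0.5, "factual_lookup": 1.0, "planning": 0.7, "general": 1.0},
    "preference": {"troubleshooting": 0.3, "design": 1.0, "factual_lookup": 0.3, "planning": 0.8, "general": 0.8},
    "procedural": {"troubleshooting": 0.8, "design": 0.5, "factual_lookup": 0.5, "planning": 1.0, "general": 0.7},
    "episodic":   {"troubleshooting": 0.5, "design": 0.5, "factual_lookup": 0.5, "planning": 0.5, "general": 0.7},
}

def context_score(similarity: float, memory_type: str, task_type: str) -> float:
    """Multiply a similarity score by the context coefficient for this query."""
    return similarity * CONTEXT_WEIGHTS[memory_type][task_type]

# A design preference scores well for a design query but is demoted
# when the query is classified as troubleshooting.
print(context_score(0.80, "preference", "design"))           # 0.80
print(context_score(0.80, "preference", "troubleshooting"))  # 0.24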

Zero Latency Overhead

Query classification runs in parallel with embedding via tokio::join!, so it adds no sequential latency to the search path.

Callers can also pass task_type directly to skip the LLM call entirely.

Configuration

Context scoring is off by default. Enable and optionally customize weights.

context_config.py
from mem7.config import MemoryConfig, ContextConfig

config = MemoryConfig(
    # ... llm, embedding, etc.
    context=ContextConfig(enabled=True),
)

m = Memory(config=config)

# Auto-classification (LLM decides task type)
results = m.search("fix Chrome CDP timeout", user_id="alice")

# Override: skip LLM call, tell mem7 this is troubleshooting
results = m.search("fix Chrome CDP timeout", user_id="alice", task_type="troubleshooting")
context_config.ts
const engine = await MemoryEngine.create(JSON.stringify({
  // ... llm, embedding, etc.
  context: { enabled: true },
}));

// Auto-classification
const results = await engine.search("fix Chrome CDP timeout", "alice");

// Override task type
const results2 = await engine.search(
  "fix Chrome CDP timeout", "alice",
  undefined, undefined, undefined, undefined, undefined, "troubleshooting"
);

Supported Providers

One OpenAI-compatible client covers most LLM and embedding providers out of the box.
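
For example, swapping providers is just a matter of pointing the quickstart's LlmConfig and EmbeddingConfig at a different OpenAI-compatible endpoint; the endpoint, key, model names, and dims below are ordinary OpenAI values used for illustration, not mem7 defaults.

openai_config.py
from mem7.config import MemoryConfig, LlmConfig, EmbeddingConfig

# Same config shape as the quickstart; only the endpoint, key, and models change.
config = MemoryConfig(
    llm=LlmConfig(
        base_url="https://api.openai.com/v1",   # any OpenAI-compatible endpoint works
        api_key="sk-...",
        model="gpt-4o-mini",
    ),
    embedding=EmbeddingConfig(
        base_url="https://api.openai.com/v1",
        api_key="sk-...",
        model="text-embedding-3-small",
        dims=1536,
    ),
)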

LLMs

OpenAI
Ollama
vLLM
Groq
Together
DeepSeek
xAI (Grok)
LM Studio
Azure OpenAI
Anthropic (planned)
Gemini (planned)
Vertex AI (planned)
AWS Bedrock (planned)

Any OpenAI-compatible API works

Embeddings

OpenAI
Ollama
Together
LM Studio
Azure OpenAI
FastEmbed (local ONNX)
Hugging Face (planned)
Gemini (planned)
Vertex AI (planned)
AWS Bedrock (planned)

Any OpenAI-compatible API works

Vector Stores

In-memory (FlatIndex)
Upstash Vector
Qdrant (planned)
Chroma (planned)
pgvector (planned)
Milvus (planned)
Pinecone (planned)
Redis (planned)
Weaviate (planned)
Elasticsearch (planned)
FAISS (planned)
MongoDB (planned)

Graph Stores

In-memory (FlatGraph)
Kuzu (embedded)
Neo4j
Memgraph (planned)
Amazon Neptune (planned)

Optional dual-path recall (vector + graph)

Rerankers

Cohere
LLM-based
Jina AI (planned)
Voyage AI (planned)
Cross-encoder (planned)

Optional post-search reranking

Language Bindings

Use mem7 from your preferred language.