Rust core with Python, TypeScript & Rust bindings.
Extracts facts and entity relations from conversations, deduplicates, and stores them in vector + graph databases with full audit history.
Ebbinghaus forgetting curve — stale memories decay, frequently recalled facts grow stronger.
Session-aware recall — memories are typed and queries are auto-classified, so irrelevant context never reaches your agent.
pip install mem7
npm install @mem7ai/mem7
cargo add mem7
Get up and running in minutes. Pick your language to get started.
from mem7 import Memory
from mem7.config import MemoryConfig, LlmConfig, EmbeddingConfig
config = MemoryConfig(
    llm=LlmConfig(
        base_url="http://localhost:11434/v1",
        api_key="ollama",
        model="qwen2.5:7b",
    ),
    embedding=EmbeddingConfig(
        base_url="http://localhost:11434/v1",
        api_key="ollama",
        model="mxbai-embed-large",
        dims=1024,
    ),
)
m = Memory(config=config)
# Add a memory from conversation
m.add("I love playing tennis and my coach is Sarah.", user_id="alice")
# Semantic search
results = m.search("What sports does Alice play?", user_id="alice")
# Get all memories for a user
memories = m.get_all(user_id="alice")
import { MemoryEngine } from "@mem7ai/mem7";
const engine = await MemoryEngine.create(JSON.stringify({
  llm: {
    base_url: "http://localhost:11434/v1",
    api_key: "ollama",
    model: "qwen2.5:7b",
  },
  embedding: {
    base_url: "http://localhost:11434/v1",
    api_key: "ollama",
    model: "mxbai-embed-large",
    dims: 1024,
  },
}));
// Add a memory from conversation
await engine.add(
[{ role: "user", content: "I love playing tennis and my coach is Sarah." }],
"alice"
);
// Semantic search
const results = await engine.search("What sports does Alice play?", "alice");
// Get all memories for a user
const memories = await engine.getAll("alice");
use mem7::{ChatMessage, MemoryEngine, MemoryEngineConfig};
#[tokio::main]
async fn main() -> mem7::Result<()> {
    let config = MemoryEngineConfig::default();
    let engine = MemoryEngine::new(config).await?;

    let messages = vec![
        ChatMessage {
            role: "user".into(),
            content: "I love playing tennis and my coach is Sarah.".into(),
            images: vec![],
        },
    ];

    // Add a memory
    engine
        .add(&messages, Some("alice"), None, None, None, true)
        .await?;

    // Semantic search
    let results = engine
        .search(
            "What sports does Alice play?",
            Some("alice"),
            None,
            None,
            5,
            None,
            true,
            None,
            None,
        )
        .await?;

    Ok(())
}
High-performance Rust core with zero-cost language bindings.
Async Rust core with tokio runtime. Native speed for embedding, deduplication, and vector operations.
First-class Python, TypeScript, and Rust APIs. Zero-overhead bindings via PyO3 and napi-rs.
Swap LLMs, embeddings, and vector stores via config. One OpenAI-compatible client covers most providers.
Every ADD, UPDATE, and DELETE is recorded in a SQLite audit log with full history per memory.
Memories can be scoped by user_id, agent_id, and run_id for multi-tenant, per-agent, and per-session isolation (see the scoping sketch below).
LLM-driven deduplication decides whether to add, update, or skip based on existing memories.
Optional dual-path recall: vector search + graph search run concurrently. Extracts entities & relations via LLM.
Ebbinghaus forgetting curve deprioritizes stale memories over time. Frequently recalled facts get stronger — just like human memory.
Built-in OpenTelemetry integration. Export trace spans for every operation via OTLP to Jaeger, Grafana Tempo, or any collector.
When graph is enabled, add() and search() run both paths concurrently via tokio::join!
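The scoping described above composes across writes and reads. Continuing from the Python quick start, here is a minimal sketch; the agent_id and run_id keyword arguments are assumed to mirror the optional scope parameters in the Rust add() signature and are not shown in the quick start itself.
# Scope a memory to a user, an agent, and a session.
# agent_id / run_id kwargs are assumed, analogous to the Rust signature.
m.add(
    "I prefer dark mode and vim keybindings.",
    user_id="alice",        # multi-tenant isolation
    agent_id="coding-bot",  # per-agent isolation (assumed kwarg)
    run_id="session-42",    # per-session isolation (assumed kwarg)
)
# Recall only what this agent learned in this session.
results = m.search(
    "What editor settings does Alice prefer?",
    user_id="alice",
    agent_id="coding-bot",
    run_id="session-42",
)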
Ebbinghaus-inspired forgetting curve — stale memories fade, frequently recalled facts grow stronger.
Every new memory is written with a last_accessed_at timestamp and access_count = 0.
Each retrieval bumps access_count and resets the timestamp (async, fire-and-forget).
A retention factor computed from elapsed time and access_count then scales each result's raw_similarity.
After vector search and optional reranking, each result's score is multiplied by its retention factor. Results are re-sorted by the decayed score. Graph search results receive the same treatment after BM25 reranking.
During deduplication, existing memories retrieved as candidates also have decay applied before the LLM decides ADD/UPDATE/DELETE. Stale memories appear less "close" to new facts, making updates more likely.
The min_retention parameter (default 10%) ensures no memory fully vanishes. Even a year-old untouched memory still contributes a small signal if the query is highly relevant.
Decay is off by default. Enable and tune via config.
from mem7.config import MemoryConfig, DecayConfig
config = MemoryConfig(
    # ... llm, embedding, etc.
    decay=DecayConfig(
        enabled=True,
        base_half_life_secs=604800.0,  # 7 days
        decay_shape=0.8,
        min_retention=0.1,
        rehearsal_factor=0.5,
    ),
)
const engine = await MemoryEngine.create(JSON.stringify({
  // ... llm, embedding, etc.
  decay: {
    enabled: true,
    base_half_life_secs: 604800.0, // 7 days
    decay_shape: 0.8,
    min_retention: 0.1,
    rehearsal_factor: 0.5,
  },
}));
use mem7_config::{MemoryEngineConfig, DecayConfig};
let config = MemoryEngineConfig {
    decay: Some(DecayConfig {
        enabled: true,
        base_half_life_secs: 604800.0, // 7 days
        decay_shape: 0.8,
        min_retention: 0.1,
        rehearsal_factor: 0.5,
        ..Default::default()
    }),
    ..Default::default()
};
| Parameter | Default | Description |
|---|---|---|
| base_half_life_secs | 604800.0 | Base stability in seconds (7 days) before any rehearsal bonus |
| decay_shape | 0.8 | Stretched-exponential shape (0 < γ ≤ 1); lower = slower initial decay |
| min_retention | 0.1 | Floor so no memory fully vanishes |
| rehearsal_factor | 0.5 | How much each retrieval increases stability |
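The exact internals aren't reproduced on this page, but a rough sketch of how these four parameters could combine into a stretched-exponential retention factor with a rehearsal bonus looks like the following; treat it as an illustration of the curve's shape, not mem7's actual code.
import math

def retention(age_secs: float, access_count: int,
              base_half_life_secs: float = 604800.0,
              decay_shape: float = 0.8,
              min_retention: float = 0.1,
              rehearsal_factor: float = 0.5) -> float:
    # Each past retrieval stretches the effective stability (rehearsal bonus).
    stability = base_half_life_secs * (1.0 + rehearsal_factor * access_count)
    # Stretched exponential: a lower decay_shape flattens the initial drop.
    decayed = math.exp(-((age_secs / stability) ** decay_shape))
    # Floor so no memory fully vanishes.
    return min_retention + (1.0 - min_retention) * decayed

week = 604800.0
print(retention(week, access_count=0))  # ≈ 0.43: a week-old, never-recalled memory
print(retention(week, access_count=3))  # ≈ 0.66: rehearsal slows the decay
The resulting factor is what multiplies each result's score after vector search, as described above.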
Context-aware scoring that demotes irrelevant memories based on task type — so design preferences don't leak into bug fixing.
The context coefficient is looked up from a (memory_type, task_type) weight matrix.
Each fact is typed at write time; each query is classified at read time — in parallel with embedding, adding zero sequential latency.
Memories are typed as factual, preference, procedural, or episodic.
Queries are classified as troubleshooting, design, factual_lookup, planning, or general.
For example, preference × troubleshooting = 0.3, so stored preferences barely register while you're debugging.
Rows = memory type | Columns = task type | Values = context coefficient
| | troubleshoot | design | factual | planning | general |
|---|---|---|---|---|---|
| factual | 1.0 | 0.5 | 1.0 | 0.7 | 1.0 |
| preference | 0.3 | 1.0 | 0.3 | 0.8 | 0.8 |
| procedural | 0.8 | 0.5 | 0.5 | 1.0 | 0.7 |
| episodic | 0.5 | 0.5 | 0.5 | 0.5 | 0.7 |
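The matrix above is applied as a multiplier on an already-decayed score. A small Python sketch of that lookup follows; the values mirror the table, but the function itself is illustrative rather than mem7's internal API.
# weights[memory_type][task_type] = context coefficient from the table above
WEIGHTS = {
    "factual":    {"troubleshooting": 1.0, "design": 0.5, "factual_lookup": 1.0, "planning": 0.7, "general": 1.0},
    "preference": {"troubleshooting": 0.3, "design": 1.0, "factual_lookup": 0.3, "planning": 0.8, "general": 0.8},
    "procedural": {"troubleshooting": 0.8, "design": 0.5, "factual_lookup": 0.5, "planning": 1.0, "general": 0.7},
    "episodic":   {"troubleshooting": 0.5, "design": 0.5, "factual_lookup": 0.5, "planning": 0.5, "general": 0.7},
}

def context_score(score: float, memory_type: str, task_type: str) -> float:
    # Unknown types fall back to a neutral 1.0 (an assumption, not documented behavior).
    return score * WEIGHTS.get(memory_type, {}).get(task_type, 1.0)

# A preference memory scoring 0.9 drops to 0.27 for a troubleshooting query,
# so "prefers dark mode" doesn't outrank an actual stack-trace fact.
print(context_score(0.9, "preference", "troubleshooting"))  # ≈ 0.27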
Query classification runs in parallel with embedding via tokio::join!, so it adds no sequential latency to the search path.
Callers can also pass task_type directly to skip the LLM call entirely.
Context scoring is off by default. Enable and optionally customize weights.
from mem7.config import MemoryConfig, ContextConfig
config = MemoryConfig(
    # ... llm, embedding, etc.
    context=ContextConfig(enabled=True),
)
m = Memory(config=config)
# Auto-classification (LLM decides task type)
results = m.search("fix Chrome CDP timeout", user_id="alice")
# Override: skip LLM call, tell mem7 this is troubleshooting
results = m.search("fix Chrome CDP timeout", user_id="alice", task_type="troubleshooting")
const engine = await MemoryEngine.create(JSON.stringify({
  // ... llm, embedding, etc.
  context: { enabled: true },
}));
// Auto-classification
const results = await engine.search("fix Chrome CDP timeout", "alice");
// Override task type
const results2 = await engine.search(
"fix Chrome CDP timeout", "alice",
undefined, undefined, undefined, undefined, undefined, "troubleshooting"
);
One OpenAI-compatible client covers most LLM and embedding providers out of the box (see the provider-swap sketch below).
Any OpenAI-compatible API works
Optional dual-path recall (vector + graph)
Optional post-search reranking
Use mem7 from your preferred language.
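Because the client speaks the OpenAI wire format, swapping providers is a config change rather than a code change. Here is a sketch pointing the Python quick-start config at OpenAI instead of Ollama; the model names and 1536-dim embedding size are illustrative choices, not mem7 defaults.
import os
from mem7.config import MemoryConfig, LlmConfig, EmbeddingConfig

config = MemoryConfig(
    llm=LlmConfig(
        base_url="https://api.openai.com/v1",
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o-mini",
    ),
    embedding=EmbeddingConfig(
        base_url="https://api.openai.com/v1",
        api_key=os.environ["OPENAI_API_KEY"],
        model="text-embedding-3-small",
        dims=1536,  # text-embedding-3-small's default output dimensionality
    ),
)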