MemGPT
why we need memory-augmented LLMs
👋 Charles Packer
● PhD candidate @ Sky / BAIR, focus in AI
● Author of MemGPT
○ First paper demonstrating how to give GPT-4 self-editing memory (AI that can learn over time)
● Working on agents since 2017
○ “the dark ages”
○ 5 BC = Before ChatGPT
📧 cpacker@berkeley.edu
🐦 @charlespacker
Agents in 2017 🙈
For LLMs, “memory” is everything
“memory” = context
context includes long-term memory, tool use, ICL, RAG, …
MemGPT - giving LLMs real “memory”
GPT
Why is this the “best” AI product?
What about this?
Search engine AI assistant
MemGPT: Introduction to Memory Augmented Chat
tl;dr
LLMs doing constrained Q/A 🤩
LLMs doing long-range, open-ended tasks 🤨
90%+ of questions are related to one project
No shared context! Why?
We don’t know how to do it…
How to get an LLM to use
● hundreds of chats
● + code base (1M+ LoC)
● + …
● …RAG?
● Lots of retrieval?
● Multi-step retrieval?
● Retrieval that works?
● What about writing?
…long-context LLMs?
Cost + latency
Context pollution
Search engine AI assistant
state management
MemGPT -> giving LLMs real “memory”
MemGPT -> memory via tools
[Diagram: LLM + tools + memory]
[Diagram: ChatGPT — user message (text) → GPT-4 context window (8k max token limit) → agent reply (text)]
Standard LLM setup
e.g., ChatGPT UI + GPT-4 model
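The limitation of this standard setup can be sketched in a few lines (an illustration, not ChatGPT's actual truncation logic; token counts are approximated by word counts):

```python
# Sketch: why a fixed context window loses memory. A standard chat loop
# must drop old turns once the token budget is hit, so anything evicted
# is forgotten forever.

MAX_TOKENS = 20  # stand-in for e.g. GPT-4's 8k-token limit

def count_tokens(text: str) -> int:
    # crude approximation: one token per word
    return len(text.split())

def truncate(history: list[str]) -> list[str]:
    """Drop the oldest turns until the history fits the context window."""
    while sum(count_tokens(t) for t in history) > MAX_TOKENS:
        history = history[1:]  # oldest turn is silently lost
    return history

history = []
for turn in ["my name is Charles and I study AI at Berkeley",
             "I am building an agent called MemGPT",
             "it gives LLMs self-editing memory",
             "what is my name?"]:
    history = truncate(history + [turn])

# By the final turn, the message containing the user's name has been evicted:
print(any("Charles" in t for t in history))  # -> False
```

Nothing outside the window exists for the model, which is exactly the gap MemGPT's memory hierarchy targets.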
[Diagram: events (user message, document upload, system alert 🔔) are parsed into the LLM; the LLM’s outputs are parsed into functions (send message, query database, pause interrupts); virtual context = main context (max token limit) + external context (∞ tokens)]
MemGPT LLM OS setup
Event loop + functions + memory hierarchy
[Diagram repeats]
Fixed-context LLM
e.g., GPT-4 with 8k max tokens
[Diagram repeats]
LLM inputs are “events” (JSON)
System alerts help the LLM manage memory
{ "type": "user_message",
  "content": "how to undo git commit -am?" }

{ "type": "document_upload",
  "info": "9 page PDF",
  "summary": "MemGPT research paper" }

{ "type": "system_alert",
  "content": "Memory warning: 75% of context used." }
[Diagram repeats]
LLM outputs are functions (JSON)
Event loop + functions that allow editing memory
Agent can query out-of-context information with functions
{
  "function": "archival_memory_search",
  "params": {
    "query": "Berkeley LLM Meetup",
    "page": "0"
  }
}
Pages into (finite) LLM context
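The paging behavior can be sketched like this (an illustration, not the MemGPT implementation; the page size, store contents, and substring matching are assumptions):

```python
# Sketch: archival search returns one page of results at a time, so an
# unbounded external store can be read through a finite context window.

PAGE_SIZE = 2  # results per page, kept small so one page fits in context

ARCHIVAL = [
    "2024-05-21: Berkeley LLM Meetup",
    "2024-05-21: Milvus team presenting on vector databases",
    "2024-04-02: MemGPT paper reading group",
    "2024-03-15: agent memory brainstorm notes",
]

def archival_memory_search(query: str, page: int) -> list[str]:
    """Return one page of matching memories (substring match for the sketch)."""
    hits = [m for m in ARCHIVAL if query.lower() in m.lower()]
    start = page * PAGE_SIZE
    return hits[start:start + PAGE_SIZE]

# The agent pages through results instead of loading the whole store:
print(archival_memory_search("2024-05-21", page=0))  # first page: 2 hits
print(archival_memory_search("2024-05-21", page=1))  # next page: empty
```

Real deployments would back this with a vector database and semantic search, but the contract is the same: bounded pages in, unbounded storage out.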
Agent can edit their own memory, including their own context
{
  "function": "core_memory_replace",
  "params": {
    "old_content": "OAI Assistants API",
    "new_content": "MemGPT API"
  }
}
Core memory is a reserved block
[Diagram: main context = system prompt + in-context memory block + working context queue]
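A minimal sketch of the self-edit, under the main-context layout above (block names and prompt format are illustrative, not MemGPT source):

```python
# Sketch: core memory lives in a reserved, always-in-context block, and
# the agent edits it with a string replace. The prompt is recompiled after
# every edit, so the change is visible on the very next LLM call.

core_memory = {
    "persona": "I am a helpful assistant built on the OAI Assistants API.",
    "human": "Name: Charles. Works on LLM agents.",
}

def core_memory_replace(block: str, old_content: str, new_content: str) -> None:
    """Edit the reserved in-context memory block in place."""
    if old_content not in core_memory[block]:
        raise ValueError(f"{old_content!r} not found in block {block!r}")
    core_memory[block] = core_memory[block].replace(old_content, new_content)

def compile_prompt() -> str:
    """System prompt + in-context memory blocks, per the main-context layout."""
    blocks = "\n".join(f"<{k}>{v}</{k}>" for k, v in core_memory.items())
    return "You are MemGPT.\n" + blocks

core_memory_replace("persona", "OAI Assistants API", "MemGPT API")
print("MemGPT API" in compile_prompt())  # -> True
```

Because the block is reserved, the edit survives indefinitely rather than scrolling out of the working context queue.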
{
  "function": "send_message",
  "params": {
    "message": "How may I assist you?"
  }
}

Messaging the user is itself a function
Allows the agent to interact with the system autonomously, w/o user inputs
{ "type": "user_message",
  "content": "what’s happening on may 21 2024?" }

{
  "function": "archival_memory_search",
  "params": {
    "query": "may 21 2024"
  }
}

{
  "function": "send_message",
  "params": {
    "message": "Have you heard about Milvus?"
  }
}

(User’s POV)
🧑 what’s happening on may 21 2024?
🤖 Have you heard about Milvus?
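The multi-step exchange above can be sketched as a loop (the scripted "LLM" below is a hard-coded stand-in for a real model, and the loop structure is an illustration rather than MemGPT source):

```python
# Sketch: from the user's point of view only the final send_message is
# visible, but the agent may run several function calls first. The loop
# keeps stepping until the agent chooses to message the user.

def scripted_llm(events: list[dict]) -> dict:
    """Stand-in for the LLM: search first, then reply (hard-coded demo)."""
    if not any(e.get("function") == "archival_memory_search" for e in events):
        return {"function": "archival_memory_search",
                "params": {"query": "may 21 2024"}}
    return {"function": "send_message",
            "params": {"message": "Have you heard about Milvus?"}}

def run_agent(user_message: str) -> str:
    events = [{"type": "user_message", "content": user_message}]
    while True:
        call = scripted_llm(events)
        events.append(call)                     # function calls stay internal
        if call["function"] == "send_message":  # only this reaches the user
            return call["params"]["message"]
        # (a real loop would execute the function and append its result here)

print(run_agent("what's happening on may 21 2024?"))
# -> Have you heard about Milvus?
```

The intermediate archival search never appears in the chat transcript, which is exactly the user's-POV slide: one question in, one answer out.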
[Diagram repeats]
MemGPT LLM OS setup
Event loop + functions + memory hierarchy
MemGPT -> Building LLM Agents
● Long-term memory management
● Calling & executing custom tools
● Loading external data sources (RAG)
MemGPT
= the OSS platform for building 🛠 and hosting 🏠 LLM agents
[Diagram: Developer → MemGPT Dev Portal / MemGPT CLI ($ memgpt run) → MemGPT server; User → user-facing application → REST API → MemGPT server. The server manages Users, Agents (user_id, agent_id — e.g. a “Personal Assistant” with State + Memories), Tools, and Sources (Documents)]
[Diagram repeats, adding Webhooks to the MemGPT server]
MemGPT
may 21 developer update 🎉
Docker integration - the fastest way to create a MemGPT server
Step 1: docker compose up
Step 2: create/edit/message agents using the MemGPT API
MemGPT ❤
MemGPT streaming API - token streaming
CLI: memgpt run --stream
REST API: use the stream_tokens flag [PR #1280 - staging]
MemGPT API works with both non-streaming + streaming endpoints
If the true LLM backend doesn’t support streaming, “fake streaming”
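One plausible reading of "fake streaming" can be sketched as follows (an assumption about the technique, not MemGPT's actual implementation; chunk size and helper name are illustrative):

```python
# Sketch: "fake streaming" -- generate the full completion with a
# non-streaming backend, then re-chunk it into small deltas so clients
# see the same interface as a truly streaming endpoint.
from collections.abc import Iterator

def fake_stream(full_response: str, chunk_size: int = 4) -> Iterator[str]:
    """Yield a completed response in small token-like chunks."""
    for i in range(0, len(full_response), chunk_size):
        yield full_response[i:i + chunk_size]

chunks = list(fake_stream("How may I assist you?"))
print(chunks)
print("".join(chunks) == "How may I assist you?")  # -> True
```

Clients consuming the API then handle streaming and non-streaming backends identically, which matches the slide's claim that the MemGPT API works with both.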
MemGPT /chat/completions proxy API
Connect your MemGPT server to any /chat/completions service!
For example - 📞 voice call your MemGPT agents using VAPI!
