MemGPT
why we need memory-augmented LLMs
👋 Charles Packer
● PhD candidate @ Sky / BAIR, focus in AI
● Author of MemGPT
○ First paper demonstrating how to give GPT-4 self-editing memory (AI that can learn over time)
● Working on agents since 2017
○ “the dark ages”
○ 5 BC = Before ChatGPT
📧 cpacker@berkeley.edu
🐦 @charlespacker
Agents in 2017 🙈
For LLMs, “memory” is everything
“memory” = context
context includes long-term memory, tool use, ICL, RAG, …
MemGPT - giving LLMs real “memory”
GPT
Why is this the “best” AI product?
What about this?
Search engine AI assistant
MemGPT: Introduction to Memory Augmented Chat
tl;dr
LLMs doing constrained Q/A 🤩
LLMs doing long-range, open-ended tasks 🤨
90%+ of questions are related to one project
No shared context! Why?
We don’t know how to do it…
How to get an LLM to use
● hundreds of chats
● + code base (1M+ LoC)
● + …
● …RAG?
● Lots of retrieval?
● Multi-step retrieval?
● Retrieval that works?
● What about writing?
…long-context LLMs?
Cost + latency
Context pollution
Search engine AI assistant
state management
MemGPT -> giving LLMs real “memory”
MemGPT -> memory via tools
[Diagram: LLM + tools + memory]
[Diagram: ChatGPT — user message (text) → GPT-4 context window (8k max token limit) → agent reply (text)]
Standard LLM setup
e.g., ChatGPT UI + GPT-4 model
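The limitation of this standard setup can be sketched in a few lines (an illustration, not ChatGPT's actual truncation logic; token counts are approximated by word counts):

```python
# Sketch: why a fixed context window loses memory. A standard chat loop
# must drop old turns once the token budget is hit, so anything evicted
# is forgotten forever.

MAX_TOKENS = 20  # stand-in for e.g. GPT-4's 8k-token limit

def count_tokens(text: str) -> int:
    # crude approximation: one token per word
    return len(text.split())

def truncate(history: list[str]) -> list[str]:
    """Drop the oldest turns until the history fits the context window."""
    while sum(count_tokens(t) for t in history) > MAX_TOKENS:
        history = history[1:]  # oldest turn is silently lost
    return history

history = []
for turn in ["my name is Charles and I study AI at Berkeley",
             "I am building an agent called MemGPT",
             "it gives LLMs self-editing memory",
             "what is my name?"]:
    history = truncate(history + [turn])

# By the final turn, the message containing the user's name has been evicted:
print(any("Charles" in t for t in history))  # -> False
```

Nothing outside the window exists for the model, which is exactly the gap MemGPT's memory hierarchy targets.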
[Diagram: events (user message, document upload, system alert 🔔) are parsed into the LLM; the LLM’s outputs are parsed into functions (send message, query database, pause interrupts); virtual context = main context (max token limit) + external context (∞ tokens)]
MemGPT LLM OS setup
Event loop + functions + memory hierarchy
[Diagram repeats]
Fixed-context LLM
e.g., GPT-4 with 8k max tokens
[Diagram repeats]
LLM inputs are “events” (JSON)
System alerts help the LLM manage memory
{ "type": "user_message",
  "content": "how to undo git commit -am?" }

{ "type": "document_upload",
  "info": "9 page PDF",
  "summary": "MemGPT research paper" }

{ "type": "system_alert",
  "content": "Memory warning: 75% of context used." }
[Diagram repeats]
LLM outputs are functions (JSON)
Event loop + functions that allow editing memory
Agent can query out-of-context information with functions
{
  "function": "archival_memory_search",
  "params": {
    "query": "Berkeley LLM Meetup",
    "page": "0"
  }
}
Pages into (finite) LLM context
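The paging behavior can be sketched like this (an illustration, not the MemGPT implementation; the page size, store contents, and substring matching are assumptions):

```python
# Sketch: archival search returns one page of results at a time, so an
# unbounded external store can be read through a finite context window.

PAGE_SIZE = 2  # results per page, kept small so one page fits in context

ARCHIVAL = [
    "2024-05-21: Berkeley LLM Meetup",
    "2024-05-21: Milvus team presenting on vector databases",
    "2024-04-02: MemGPT paper reading group",
    "2024-03-15: agent memory brainstorm notes",
]

def archival_memory_search(query: str, page: int) -> list[str]:
    """Return one page of matching memories (substring match for the sketch)."""
    hits = [m for m in ARCHIVAL if query.lower() in m.lower()]
    start = page * PAGE_SIZE
    return hits[start:start + PAGE_SIZE]

# The agent pages through results instead of loading the whole store:
print(archival_memory_search("2024-05-21", page=0))  # first page: 2 hits
print(archival_memory_search("2024-05-21", page=1))  # next page: empty
```

Real deployments would back this with a vector database and semantic search, but the contract is the same: bounded pages in, unbounded storage out.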
Agent can edit their own memory, including their own context
{
  "function": "core_memory_replace",
  "params": {
    "old_content": "OAI Assistants API",
    "new_content": "MemGPT API"
  }
}
Core memory is a reserved block
[Diagram: main context = system prompt + in-context memory block + working context queue]
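A minimal sketch of the self-edit, under the main-context layout above (block names and prompt format are illustrative, not MemGPT source):

```python
# Sketch: core memory lives in a reserved, always-in-context block, and
# the agent edits it with a string replace. The prompt is recompiled after
# every edit, so the change is visible on the very next LLM call.

core_memory = {
    "persona": "I am a helpful assistant built on the OAI Assistants API.",
    "human": "Name: Charles. Works on LLM agents.",
}

def core_memory_replace(block: str, old_content: str, new_content: str) -> None:
    """Edit the reserved in-context memory block in place."""
    if old_content not in core_memory[block]:
        raise ValueError(f"{old_content!r} not found in block {block!r}")
    core_memory[block] = core_memory[block].replace(old_content, new_content)

def compile_prompt() -> str:
    """System prompt + in-context memory blocks, per the main-context layout."""
    blocks = "\n".join(f"<{k}>{v}</{k}>" for k, v in core_memory.items())
    return "You are MemGPT.\n" + blocks

core_memory_replace("persona", "OAI Assistants API", "MemGPT API")
print("MemGPT API" in compile_prompt())  # -> True
```

Because the block is reserved, the edit survives indefinitely rather than scrolling out of the working context queue.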
{
  "function": "send_message",
  "params": {
    "message": "How may I assist you?"
  }
}

Messaging the user is itself a function
Allows the agent to interact with the system autonomously, w/o user inputs
{ "type": "user_message",
  "content": "what’s happening on may 21 2024?" }

{
  "function": "archival_memory_search",
  "params": {
    "query": "may 21 2024"
  }
}

{
  "function": "send_message",
  "params": {
    "message": "Have you heard about Milvus?"
  }
}

(User’s POV)
🧑 what’s happening on may 21 2024?
🤖 Have you heard about Milvus?
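The multi-step exchange above can be sketched as a loop (the scripted "LLM" below is a hard-coded stand-in for a real model, and the loop structure is an illustration rather than MemGPT source):

```python
# Sketch: from the user's point of view only the final send_message is
# visible, but the agent may run several function calls first. The loop
# keeps stepping until the agent chooses to message the user.

def scripted_llm(events: list[dict]) -> dict:
    """Stand-in for the LLM: search first, then reply (hard-coded demo)."""
    if not any(e.get("function") == "archival_memory_search" for e in events):
        return {"function": "archival_memory_search",
                "params": {"query": "may 21 2024"}}
    return {"function": "send_message",
            "params": {"message": "Have you heard about Milvus?"}}

def run_agent(user_message: str) -> str:
    events = [{"type": "user_message", "content": user_message}]
    while True:
        call = scripted_llm(events)
        events.append(call)                     # function calls stay internal
        if call["function"] == "send_message":  # only this reaches the user
            return call["params"]["message"]
        # (a real loop would execute the function and append its result here)

print(run_agent("what's happening on may 21 2024?"))
# -> Have you heard about Milvus?
```

The intermediate archival search never appears in the chat transcript, which is exactly the user's-POV slide: one question in, one answer out.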
[Diagram repeats]
MemGPT LLM OS setup
Event loop + functions + memory hierarchy
MemGPT -> Building LLM Agents
● Long-term memory management
● Calling & executing custom tools
● Loading external data sources (RAG)
MemGPT
= the OSS platform for building 🛠 and hosting 🏠 LLM agents
[Diagram: Developer → MemGPT Dev Portal / MemGPT CLI ($ memgpt run) → MemGPT server; User → user-facing application → REST API → MemGPT server. The server manages Users, Agents (user_id, agent_id — e.g. a “Personal Assistant” with State + Memories), Tools, and Sources (Documents)]
[Diagram repeats, adding Webhooks to the MemGPT server]
MemGPT
may 21 developer update 🎉
Docker integration - the fastest way to create a MemGPT server
Step 1: docker compose up
Step 2: create/edit/message agents using the MemGPT API
MemGPT ❤
MemGPT streaming API - token streaming
CLI: memgpt run --stream
REST API: use the stream_tokens flag [PR #1280 - staging]
MemGPT API works with both non-streaming + streaming endpoints
If the true LLM backend doesn’t support streaming, “fake streaming”
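One plausible reading of "fake streaming" can be sketched as follows (an assumption about the technique, not MemGPT's actual implementation; chunk size and helper name are illustrative):

```python
# Sketch: "fake streaming" -- generate the full completion with a
# non-streaming backend, then re-chunk it into small deltas so clients
# see the same interface as a truly streaming endpoint.
from collections.abc import Iterator

def fake_stream(full_response: str, chunk_size: int = 4) -> Iterator[str]:
    """Yield a completed response in small token-like chunks."""
    for i in range(0, len(full_response), chunk_size):
        yield full_response[i:i + chunk_size]

chunks = list(fake_stream("How may I assist you?"))
print(chunks)
print("".join(chunks) == "How may I assist you?")  # -> True
```

Clients consuming the API then handle streaming and non-streaming backends identically, which matches the slide's claim that the MemGPT API works with both.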
MemGPT /chat/completions proxy API
Connect your MemGPT server to any /chat/completions service!
For example - 📞 voice call your MemGPT agents using VAPI!
