Zep + LangChain: Give Your AI Assistant Persistent Memory

Your chatbot has amnesia.

Every time a user starts a new session, your AI assistant forgets everything—their name, preferences, previous conversations, all of it. Gone. You're essentially building a goldfish with a PhD.

This isn't a minor inconvenience. It's why your customer support bot asks "How can I help you today?" to someone who's been troubleshooting the same issue for three days.

Zep fixes this. It's a memory layer that gives your LLM applications persistent, long-term memory. And integrating it with LangChain takes about 15 minutes.

What Zep Actually Does

Zep sits between your application and your LLM. When a conversation happens, Zep:

  1. Stores the full conversation history
  2. Automatically summarizes long conversations
  3. Extracts facts and entities from chats
  4. Retrieves relevant context when you need it

The magic is in that last part. When a user returns three weeks later and mentions "that project we discussed," Zep knows exactly what they're talking about.

Prerequisites

  • Python 3.9+
  • A running Zep instance (we'll use Elestio for deployment)
  • Basic familiarity with LangChain

Setting Up Zep

First, deploy Zep on Elestio. Select Zep from the marketplace, pick your provider (2 CPU / 4GB RAM minimum), and click deploy. You'll have a running instance in about 3 minutes.

Once deployed, grab your Zep URL from the Elestio dashboard. It'll look like https://zep-xxxxx.vm.elestio.app.

Installing Dependencies

pip install zep-python langchain langchain-community langchain-openai

The Basic Integration

Here's a minimal example that gives your chatbot memory:

from zep_python import ZepClient, NotFoundError
from zep_python.memory import Memory, Message
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

# Initialize clients
zep = ZepClient(base_url="https://your-zep-instance.vm.elestio.app")
llm = ChatOpenAI(model="gpt-4")

SESSION_ID = "user-123-session"

async def chat_with_memory(user_input: str) -> str:
    # Get existing memory for this session
    try:
        memory = await zep.memory.aget_memory(SESSION_ID)
        context = memory.summary.content if memory.summary else ""
        recent_messages = (memory.messages or [])[-10:]  # Last 10 messages
    except NotFoundError:
        # No memory yet for this session, start fresh
        context = ""
        recent_messages = []

    # Build message history: summary of older turns, then the recent messages
    messages = [
        SystemMessage(content=f"Context from previous conversations: {context}"),
    ]

    for msg in recent_messages:
        if msg.role == "human":
            messages.append(HumanMessage(content=msg.content))
        else:
            messages.append(AIMessage(content=msg.content))

    messages.append(HumanMessage(content=user_input))

    # Get response from LLM
    response = await llm.ainvoke(messages)

    # Store both sides of the exchange in Zep
    await zep.memory.aadd_memory(
        SESSION_ID,
        Memory(messages=[
            Message(role="human", content=user_input),
            Message(role="ai", content=response.content)
        ])
    )

    return response.content

That's it. Your chatbot now remembers conversations across sessions.
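
To try it out, drive chat_with_memory from a small async script. In a real application these calls would live inside your web framework's request handlers:

import asyncio

async def demo():
    print(await chat_with_memory("Hi, I'm Sarah from Acme Corp."))
    print(await chat_with_memory("What's my name?"))  # Answered from stored memory

asyncio.run(demo())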

Using LangChain's Built-in Zep Integration

LangChain has native Zep support, which makes things even cleaner:

from langchain_community.memory import ZepMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI

memory = ZepMemory(
    session_id="user-456",
    url="https://your-zep-instance.vm.elestio.app",
    memory_key="chat_history",
    return_messages=True
)

chain = ConversationChain(
    llm=ChatOpenAI(model="gpt-4"),
    memory=memory,
    verbose=True
)

# This conversation is automatically stored and retrieved
response = chain.predict(input="My name is Sarah and I work at Acme Corp")
# Later...
response = chain.predict(input="What company do I work at?")
# Returns: "You work at Acme Corp, Sarah."

Fact Extraction: The Hidden Power

Zep doesn't just store messages—it extracts structured facts. This is incredibly useful for customer support:

# After some conversations, retrieve extracted facts
memory = await zep.memory.aget_memory("user-456")

# Facts may be plain strings or richer objects depending on your Zep version
for fact in memory.facts or []:
    print(f"- {fact}")

# Output:
# - User's name is Sarah
# - User works at Acme Corp
# - User is interested in enterprise pricing
# - User had an issue with API rate limits on Nov 15

You can use these facts to personalize responses, trigger workflows, or populate CRM records automatically.
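
As a rough sketch of the personalization case, you can fold those facts into a system prompt so every reply is grounded in what Zep already knows about the user. This reuses the zep client and NotFoundError import from the first example and assumes facts come back as plain strings:

async def build_system_prompt(session_id: str) -> str:
    # Turn Zep's extracted facts into a system prompt for personalization
    try:
        memory = await zep.memory.aget_memory(session_id)
        facts = memory.facts or []
    except NotFoundError:
        facts = []

    if not facts:
        return "You are a helpful support assistant."

    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return (
        "You are a helpful support assistant.\n"
        "Here is what you already know about this user:\n"
        f"{fact_lines}"
    )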

Handling Multiple Users

In production, you'll want to create sessions per user:

from typing import Optional

def get_session_id(user_id: str, conversation_id: Optional[str] = None) -> str:
    # One memory space per user, optionally scoped to a single conversation
    if conversation_id:
        return f"{user_id}-{conversation_id}"
    return f"{user_id}-default"

# Each user gets their own memory space
sarah_session = get_session_id("sarah@acme.com")
john_session = get_session_id("john@example.com")
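
Wiring this into the LangChain integration is just a matter of constructing a ZepMemory per session. A rough sketch, reusing get_session_id above and the ZepMemory, ConversationChain, and ChatOpenAI setup from earlier:

def get_chain_for_user(user_id: str, conversation_id: Optional[str] = None) -> ConversationChain:
    # Each user (or user + conversation) gets its own Zep-backed memory
    memory = ZepMemory(
        session_id=get_session_id(user_id, conversation_id),
        url="https://your-zep-instance.vm.elestio.app",
        memory_key="chat_history",
        return_messages=True
    )
    return ConversationChain(llm=ChatOpenAI(model="gpt-4"), memory=memory)

sarah_chain = get_chain_for_user("sarah@acme.com")
john_chain = get_chain_for_user("john@example.com", "billing-issue")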

Troubleshooting

Memory not persisting? Check that your Zep instance is running and accessible. Test with:

curl https://your-zep-instance.vm.elestio.app/healthz

Context window too large? Zep automatically summarizes old conversations, but you control how many recent messages go into the prompt. In the example above, shrink the memory.messages[-10:] slice to a smaller window, such as [-4:].

Slow retrieval? Make sure your Elestio instance has adequate resources. For production workloads with many concurrent users, consider upgrading to 4 CPU / 8GB RAM.

Why Self-Host Your AI Memory?

Here's the thing about conversation history: it's incredibly sensitive data. Every chat contains context about your users, their problems, their business details.

Self-hosting Zep means that data stays on your infrastructure. No third-party has access to your users' conversations. For enterprise customers, this isn't optional—it's a requirement.

With Zep on Elestio, you get managed deployment without the data sovereignty concerns. Automatic backups, SSL, and updates—but your data stays yours.

What's Next

Once you have basic memory working, explore:

  • Custom metadata: Tag sessions with user segments or conversation types
  • Search: Query past conversations semantically (see the sketch after this list)
  • Webhooks: Trigger actions when specific facts are extracted
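
For the search item, here is a rough sketch using zep-python's memory search, reusing the zep client from the first example. Exact import paths and result fields vary a bit between zep-python versions, so treat the shape as an assumption to verify against your installed version:

from zep_python.memory import MemorySearchPayload

async def search_past_conversations(session_id: str, query: str, limit: int = 5):
    # Semantic search over everything Zep has stored for this session
    payload = MemorySearchPayload(text=query)
    results = await zep.memory.asearch_memory(session_id, payload, limit=limit)
    for result in results:
        # Each hit carries the matched message plus a distance score
        print(result.message, result.dist)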

Your AI assistant just got a lot smarter. More importantly, it now remembers that it got smarter.


Thanks for reading! If you're building AI applications that need memory, deploy Zep on Elestio and have it running in minutes.