In this article, we’ll explore how memory types apply to AI agents and how we can use frameworks like LangChain to add memory to AI agents.
Memory in AI agents
AI agent memory refers to an AI system’s ability to store and recall past experiences. Unlike traditional AI models that process tasks independently, memory-enabled agents can retain context, recognize patterns, and adapt based on past interactions, which is crucial for goal-oriented applications.
Three core pillars define memory in agents:
- State: Understanding the present context
- Persistence: Retaining knowledge across sessions
- Selection: Evaluating what information to remember
When AI agents lack memory
Many AI agents today are stateless, meaning they cannot retain information from previous conversations or use past interactions to inform future ones.
This is why it becomes inefficient to repeatedly provide context in tools like ChatGPT when referring to previous discussions: stateless models lack awareness of earlier interactions. While ChatGPT now offers a long-term memory feature that addresses this limitation, many other models and popular coding assistants like GitHub Copilot still lack persistent, user-configurable memory.
How do AI agents use memory?
To understand how memory works in AI agents, it’s helpful to break it down into two main types: short-term memory and long-term memory.
Before looking at each type, it helps to see the other components an agent's memory works alongside:
Perception
Perception allows an AI system to process raw inputs like text or images into usable data. Tools such as Unstructured.io help convert files like PDFs into structured formats the AI can understand.
Motor functions (external tools)
Motor functions let the AI interact with external systems by performing actions through APIs, such as sending emails, updating documents, or triggering services.
Agent orchestrators/frameworks
Orchestrators and frameworks manage all of these components. They retrieve relevant data from memory, promote temporary information into long-term storage, format data for processing, and plan tasks to keep the AI focused on its goals.
Short-term memory (or working memory)
Short-term memory enables an application to remember messages within the same conversation, much like emails are grouped into threads.
In AI, this corresponds to the context window in large language models (LLMs), which temporarily stores user inputs and relevant data during an interaction. Once the session ends, this information is lost unless explicitly saved.
Simulating short-term memory in Python:
While these Python simulations are not a full AI memory system, they show how facts can be stored in structured formats, similar to how AI agents might use vector databases or knowledge graphs.
We’ll cover real-world applications with frameworks like LangChain (below) and use more advanced memory components to manage context, recall, and personalization across interactions.
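Here is a minimal sketch of a bounded message buffer; the ShortTermMemory class and max_messages cap are illustrative, not from any library:

```python
# Minimal sketch of short-term (working) memory: a bounded message buffer,
# similar to the rolling context window an LLM sees during a session.

class ShortTermMemory:
    def __init__(self, max_messages: int = 4):
        self.max_messages = max_messages  # cap mimics a context-window limit
        self.messages: list[str] = []

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Drop the oldest messages once the buffer exceeds its capacity,
        # just as early turns fall out of an LLM's context window.
        self.messages = self.messages[-self.max_messages:]

    def context(self) -> str:
        return "\n".join(self.messages)

memory = ShortTermMemory(max_messages=3)
for turn in ["User: Hi, I'm Alice.",
             "Agent: Hello Alice!",
             "User: Recommend a paper on RAG.",
             "Agent: Try the RAG survey paper."]:
    memory.add(turn)

print(memory.context())  # only the 3 most recent turns survive
```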
Output:
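```
Agent: Hello Alice!
User: Recommend a paper on RAG.
Agent: Try the RAG survey paper.
```

Notice that the oldest turn has already been dropped, which is exactly what happens when a conversation outgrows an LLM's context window.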
Long-term memory
Long-term memory persists beyond a single session and is commonly divided into three types: semantic, episodic, and procedural (image source: LangGraph).
Semantic memory
Semantic memory refers to general knowledge, such as mathematical definitions or scientific facts, like the fact that a triangle has three sides.
In AI systems, semantic memory is often implemented using vector databases to search unstructured data, like retrieving similar documents or generating context-aware responses to provide accurate information.
Simulating semantic memory in Python
Here is a simple Python example that demonstrates the concept of semantic memory, storing and retrieving factual knowledge using key-value pairs.
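A minimal sketch follows; the dictionary and its fact strings are illustrative stand-ins for the vector database a production agent would use:

```python
# Minimal sketch of semantic memory: general facts stored as key-value pairs.
# A real agent would typically use a vector database; exact-key lookup here
# stands in for semantic-similarity search.

semantic_memory = {
    "triangle": "A triangle is a polygon with three sides.",
    "pi": "Pi is the ratio of a circle's circumference to its diameter.",
    "photosynthesis": "Photosynthesis converts light energy into chemical energy.",
}

def recall(concept: str) -> str:
    # Return the stored fact, or admit the gap in knowledge.
    return semantic_memory.get(concept, "I don't know about that yet.")

print(recall("triangle"))
print(recall("gravity"))
```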
Output:
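```
A triangle is a polygon with three sides.
I don't know about that yet.
```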
Episodic memory
Episodic memory involves recalling personal experiences, such as the moment you received a job offer or a conversation from last weekend. This enables agents to be more personalized, referencing previous discussions and maintaining continuity across sessions.
Implementing episodic memory often involves RAG-like systems, which retrieve relevant pieces of prior context from unstructured data such as conversation history based on semantic similarity. This allows the AI to surface contextually accurate information even when no explicit structure exists.
In some cases, knowledge graphs can also be used to represent structured relationships between distinct data objects (e.g., people, locations) and define how they are connected. These are especially useful when explicit, fact-based reasoning is required, such as answering questions like “Which countries share a border with Portugal?”
Simulating episodic memory in Python
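A minimal sketch is below; the episodes list and substring matching are illustrative stand-ins for the RAG-style retrieval described above:

```python
# Minimal sketch of episodic memory: a log of past interactions that the
# agent can search by keyword. In practice a RAG pipeline would retrieve
# episodes by semantic similarity; substring matching stands in here.

episodes = [
    {"session": 1, "event": "User asked for papers on prompt compression."},
    {"session": 1, "event": "Agent returned five arXiv papers."},
    {"session": 2, "event": "User asked to summarize the second paper."},
]

def recall_episodes(keyword: str) -> list[str]:
    # Substring match stands in for vector similarity search.
    return [e["event"] for e in episodes if keyword.lower() in e["event"].lower()]

for event in recall_episodes("paper"):
    print(event)
```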
Output:
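```
User asked for papers on prompt compression.
Agent returned five arXiv papers.
User asked to summarize the second paper.
```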
Procedural memory
Procedural memory captures knowledge about how to carry out tasks. In AI systems, this is typically implemented through functions, algorithms, or code that dictate the agent’s behavior.
It can include everything from basic routines like greeting users to more advanced workflows for problem-solving. Unlike semantic memory, which handles what the agent knows, procedural memory focuses on how that knowledge is applied.
Procedural memory simulation in Python
I created a simple simulation of procedural memory using a Python dictionary to store the agent’s current instructions.
The call_model function pretends to use those instructions to generate a tweet summary. Then I pass some feedback like “make it simpler” to the update_instructions function, which updates the stored instructions.
So, the agent “remembers” how it should behave, updates that memory when it gets feedback, and then uses the new version moving forward. It’s a lightweight way to mimic how AI agents adapt their behavior over time.
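Here is a sketch matching that description; call_model just formats a string instead of calling a real model, and the instruction strings are illustrative:

```python
# Sketch of the procedural-memory simulation described above. The dictionary
# plays the role of stored instructions; call_model and update_instructions
# only mimic LLM calls.

agent_memory = {"instructions": "Summarize the text as a detailed tweet."}

def call_model(text: str) -> str:
    # Pretend to follow the stored instructions when generating a tweet.
    return f"[{agent_memory['instructions']}] -> Tweet about: {text}"

def update_instructions(feedback: str) -> None:
    # A real agent would ask an LLM to rewrite its own instructions;
    # here we map known feedback to a new instruction string.
    if "simpler" in feedback:
        agent_memory["instructions"] = "Summarize the text as a short, simple tweet."

print(call_model("AI agents and memory"))
update_instructions("make it simpler")
print(call_model("AI agents and memory"))
```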
Output:
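```
[Summarize the text as a detailed tweet.] -> Tweet about: AI agents and memory
[Summarize the text as a short, simple tweet.] -> Tweet about: AI agents and memory
```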
Inside the agent’s cognitive loop
Let’s unpack what happens when you interact with a memory-enabled AI assistant. Much like a human brain loops through perception, reasoning, and memory updates, the agent follows a similar sequence behind the scenes:
Step 1: The LLM receives your input
Your user message, along with the current chat history, is formatted into a structured prompt.
Step 2: Tool selection begins
The agent analyzes the prompt and determines which tools are relevant for the task. It may perform a semantic search across academic sources, query its internal knowledge base, or apply input compression to stay within the context window.
Step 3: Iterative reasoning kicks in
Instead of generating a response immediately, the agent enters a reasoning loop structured as: Thought → Tool → Observation → Thought → ...
It evaluates the situation, selects a tool, reviews the result, and refines its thinking. This loop of thought, action, observation, and revision continues until the task is complete or a limit is reached, such as a maximum number of steps or tokens.
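A schematic sketch of this loop is shown below; the fake_llm stub stands in for real model calls, and every name here is illustrative rather than a LangChain API:

```python
# Schematic Thought -> Tool -> Observation loop. The "LLM" is a hard-coded
# stub so the example runs; a real agent would call a model at each step.

def fake_llm(task, scratchpad):
    # Decide the next step: use a tool once, then produce a final answer.
    if not scratchpad:
        return {"action": "search", "input": task}            # Thought -> Tool
    return {"answer": f"Based on {scratchpad[-1]!r}: done."}  # final Thought

def search(query):
    return f"3 papers found for '{query}'"                    # Observation

TOOLS = {"search": search}
MAX_STEPS = 5  # hard limit so the loop always terminates

def run_agent(task):
    scratchpad = []  # working memory for this reasoning cycle
    for _ in range(MAX_STEPS):
        step = fake_llm(task, scratchpad)
        if "answer" in step:
            return step["answer"]
        observation = TOOLS[step["action"]](step["input"])
        scratchpad.append(observation)
    return "Stopped: step limit reached."

print(run_agent("prompt compression"))
```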
Step 4: Memory is updated
After the task is completed, the full interaction is saved to long-term memory. This includes the input, any tools used, observations made, and the final response. The memory is typically stored in a database like MongoDB so the agent can access it in future sessions and maintain context.
Building an AI research agent with memory
What we’re going to build
We’ll be using the code from this official GitHub notebook.
We’ll build a LangChain-powered AI research assistant that can:
- Search arXiv for relevant academic papers using keyword queries.
- Retrieve context-aware research results from a MongoDB-based long-term memory (vector database).
- Store and recall past user conversations using chat history memory.
- Use LLMs to reason over documents and answer natural language queries.
It supports two types of memory:
- Knowledge memory (long-term): Stores and retrieves vectorized research paper abstracts.
- Conversation memory (episodic): Stores chat interactions and uses them to inform future responses.
How will the agent use memory to reason and respond?
In this tutorial, the AI agent uses two key memory types:
- Knowledge memory (long-term): A MongoDB vector database stores research paper abstracts as embeddings. The agent will retrieve relevant documents using semantic similarity, supporting contextual reasoning over time.
- Conversation memory (episodic): A MongoDB collection logs the full chat history. This will enable the agent to reference past interactions and respond with continuity across queries.
While there’s no explicit short-term memory module, the agent’s runtime working memory is implicitly handled by LangChain’s execution loop. It passes relevant context between steps and tools, giving the agent short-term recall during each reasoning cycle.
Infrastructure and tools required
- LangChain: Framework for building LLM workflows and agents
- MongoDB Atlas: Used for both vector search (long-term memory) and chat history storage
- OpenAI / Fireworks LLM: Language model for answering queries and planning actions
- Hugging Face Datasets: Load research papers (with precomputed embeddings)
- arXiv API via LangChain: Query real academic paper metadata and abstracts
Install libraries (Jupyter Notebook)
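A likely install cell is shown below; the exact package list and version pins come from the official notebook, so treat this as an approximation:

```python
# Core libraries for the agent (package list approximate; check the
# official notebook for exact versions):
!pip install langchain langchain-community langchain-openai langchain-fireworks langchain-mongodb pymongo datasets pandas arxiv pymupdf
```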
Set environment variables
In any project involving LLMs and vector databases, you’ll need API keys and a connection string. This step loads those securely into the environment so they can be accessed programmatically without hardcoding them into your scripts.
We'll use:
- OpenAI for embeddings or LLM calls.
- Fireworks AI if using their hosted LLMs.
- MongoDB Atlas to store vectorized data or memory.
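A minimal setup sketch for a notebook environment follows; the environment-variable names (including MONGO_URI) are our convention here, not mandated by any library:

```python
import getpass
import os

# Prompt for secrets at runtime rather than hardcoding them.
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")
os.environ["FIREWORKS_API_KEY"] = getpass.getpass("Fireworks API key: ")
os.environ["MONGO_URI"] = getpass.getpass("MongoDB Atlas connection string: ")
```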
Data ingestion into MongoDB vector database
Load the dataset: This dataset simulates what your agent “knows” up front. Each abstract is semantically embedded, enabling vector similarity search later.
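A sketch of the loading step; the dataset name below follows MongoDB's public examples and is an assumption, so substitute the one in the official notebook if it differs:

```python
import pandas as pd
from datasets import load_dataset

# Load arXiv abstracts with precomputed embeddings from Hugging Face.
# (Dataset name assumed from MongoDB's public examples.)
data = load_dataset("MongoDB/subset_arxiv_papers_with_embeddings")
dataset_df = pd.DataFrame(data["train"])
```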
(Optional) Inspect the data: It’s good practice to preview your data structure. You’ll confirm fields like title, abstract, and especially embedding (your vector list).
```python
print(len(dataset_df))
dataset_df.head()
```
Complete data ingestion into MongoDB
Connect to MongoDB: We’re building the MongoDB collection (similar to a table in relational databases) where vector data will be stored. This is your knowledge base, used later for semantic search.
Insert data: Now the AI agent will have a persistent, searchable knowledge base of research papers, stored as both metadata and embeddings.
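A sketch of both steps, assuming the earlier cells have run; the database and collection names (agent_demo, knowledge) are illustrative:

```python
import os
from pymongo import MongoClient

# Connect to the Atlas cluster and pick a database/collection
# (names are illustrative).
client = MongoClient(os.environ["MONGO_URI"])
collection = client["agent_demo"]["knowledge"]

# Start from a clean collection, then ingest each paper's
# metadata and embedding.
collection.delete_many({})
collection.insert_many(dataset_df.to_dict("records"))
print("Data ingestion into MongoDB completed")
```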
Create vector search index definition (Long-term memory)
This step enables vector search in MongoDB by creating a vector index on the field that holds the embeddings.
This index acts as a long-term semantic memory store. Think of this as giving your agent’s brain a structured, searchable memory where it can recall semantically similar knowledge.
You’re telling MongoDB:
- What kind of index to use: "type": "vector"
- Where to look for the embeddings: "path": "embedding"
- How long the vectors are: "numDimensions": 256 (must match the embedding model)
- How to compare them: "similarity": "cosine"
Example index definition (JSON):
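Using Atlas's vector-index syntax (the fields wrapper is part of that syntax), the definition looks like this:

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 256,
      "similarity": "cosine"
    }
  ]
}
```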
You’ll typically create this index manually via the MongoDB Atlas UI or using the Atlas API. It should match the embedding field and dimensions you used when you ingested the data (usually OpenAI or Fireworks-generated vectors).
Create LangChain retriever (MongoDB)
Create a long-term memory (Knowledge store): In this step, we build a retriever object that allows your AI agent to perform semantic search over the MongoDB collection filled with research paper embeddings. This retriever will be the agent’s “knowledge recall” system.
When the agent receives a question, the retriever embeds it using the same embedding model and retrieves the top-k most similar documents via vector similarity search.
This is like saying: “Out of everything I’ve read before (50,000+ papers), here are the 5 most semantically similar to what you just asked.”
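A sketch using langchain_mongodb, assuming 256-dimension OpenAI embeddings and the illustrative database/collection names from the ingestion step:

```python
import os
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings

# Must match the model/dimensions used to create the stored vectors.
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small", dimensions=256)

vector_store = MongoDBAtlasVectorSearch.from_connection_string(
    os.environ["MONGO_URI"],
    namespace="agent_demo.knowledge",   # db.collection from the ingestion step
    embedding=embedding_model,
    index_name="vector_index",          # must match the Atlas index name
    text_key="abstract",                # field holding the raw text
    embedding_key="embedding",          # field holding the vectors
)

# Top-5 semantic matches per query.
retriever = vector_store.as_retriever(search_kwargs={"k": 5})
```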
Configure the LLM using Fireworks AI
Create the agent “Brain” (Reasoning system): In this step, we will define the language model that powers the AI agent’s reasoning capabilities. Specifically, we configure the Fireworks AI LLM that the agent will use to process queries, decide on tool use, and generate final responses.
So the agent will use memory by accessing:
- Chat history from the agent’s conversation memory
- Retrieved context from the vector knowledge base (long-term memory)
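Here is a configuration sketch with langchain_fireworks; the model name is an assumption, and any Fireworks-hosted chat model with tool-calling support should work:

```python
from langchain_fireworks import ChatFireworks

# Model name is an assumption; pick any Fireworks chat model
# that supports tool calling.
llm = ChatFireworks(
    model="accounts/fireworks/models/firefunction-v1",
    temperature=0,  # deterministic reasoning for tool selection
)
```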
Agent tools creation
Create agent actions (how the agent interacts with the world): This step defines the tools your AI agent can use. These are callable functions the LLM can invoke to complete tasks.
In our case, tools are mostly for retrieving research papers, either from a MongoDB vector store (knowledge base) or directly from arXiv.
Here are the tools we will create:
- Vector Search Tool (Knowledge Base): This tool lets the agent query a MongoDB vector database, which acts as long-term memory. It retrieves abstracts semantically similar to a user’s query.
- arXiv Search Tool (Metadata): Allows the agent to query arXiv for up to 10 matching papers by keyword. Think of this as a way to search outside its internal memory.
- arXiv Lookup Tool (Full Document): Retrieves full content of a specific arXiv paper using its ID (e.g. 704.0001).
- Prompt Compression Tool: If the agent sees a long context or chat history, it can compress it using LLMLingua before submitting to the LLM.
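A sketch of the first three tools is below (the LLMLingua compression tool is omitted here for brevity); it assumes the retriever object from the previous step:

```python
from langchain_community.document_loaders import ArxivLoader
from langchain_core.tools import tool

@tool
def knowledge_base(query: str) -> str:
    """Retrieve semantically similar paper abstracts from long-term memory."""
    docs = retriever.invoke(query)
    return "\n\n".join(doc.page_content for doc in docs)

@tool
def arxiv_search(query: str) -> str:
    """Search arXiv metadata for up to 10 papers matching a keyword query."""
    docs = ArxivLoader(query=query, load_max_docs=10).get_summaries_as_docs()
    return "\n\n".join(doc.page_content for doc in docs)

@tool
def arxiv_lookup(paper_id: str) -> str:
    """Fetch the full text of a specific arXiv paper by ID (e.g. 704.0001)."""
    docs = ArxivLoader(query=paper_id, load_max_docs=1).load()
    return docs[0].page_content if docs else "Paper not found."

tools = [knowledge_base, arxiv_search, arxiv_lookup]
```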
Agent prompt creation
Create personality + memory context: This step sets up the agent’s personality, its domain knowledge, and how it should use its tools. It’s where we define the system message (who the agent is) and the structure of messages it will receive, including memory and instructions.
We will import the ChatPromptTemplate to structure the prompt and MessagesPlaceholder to allow dynamic memory (conversation history) to be inserted.
Define agent’s purpose (System Prompt):
The string below defines:
- The agent’s identity (“helpful research assistant”)
- Tool instructions, i.e., when to use:
  - The knowledge base
  - arXiv metadata search
  - arXiv full-document lookup
  - Prompt compression
It also includes a soft rule, "use compression when the context is too long," mimicking short-term memory management.
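A sketch of the prompt structure follows; the system message paraphrases the rules above rather than reproducing the notebook's exact wording:

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

system_message = (
    "You are a helpful research assistant. Use the knowledge base for topics "
    "you have already ingested, the arXiv search tool for new topics, the "
    "arXiv lookup tool for full papers, and compress the context whenever "
    "the chat history grows too long."
)

prompt = ChatPromptTemplate.from_messages([
    ("system", system_message),
    MessagesPlaceholder(variable_name="chat_history"),      # conversation memory
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),  # tool-call trace
])
```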
Agent memory creation using MongoDB
Create long-term memory (Conversational history): This step connects your agent to a MongoDB collection to persistently store conversation history, enabling it to remember previous user interactions across sessions.
We will import the required memory classes:
- ConversationBufferMemory: Manages the actual memory logic (e.g., return chat history).
- MongoDBChatMessageHistory: Provides a LangChain-compatible interface for MongoDB.
Also, we'll define a function to create or retrieve session history. This lets us dynamically reference a chat session using a unique session_id; each session's messages will be stored in a collection called "history".
Create the conversation memory instance:
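A sketch follows, with the session ID and database name as illustrative choices:

```python
import os

from langchain.memory import ConversationBufferMemory
from langchain_mongodb.chat_message_histories import MongoDBChatMessageHistory

def get_session_history(session_id: str) -> MongoDBChatMessageHistory:
    # Each session's messages live in a "history" collection in Atlas.
    return MongoDBChatMessageHistory(
        connection_string=os.environ["MONGO_URI"],
        session_id=session_id,
        database_name="agent_demo",    # illustrative name
        collection_name="history",
    )

memory = ConversationBufferMemory(
    memory_key="chat_history",   # matches the prompt's MessagesPlaceholder
    chat_memory=get_session_history("latest_agent_session"),
    return_messages=True,
)
```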
What will be stored in MongoDB (long-term memory)?
This long-term memory enables the agent to recall what the user previously asked. With episodic memory, the agent can reason across multiple conversations, like remembering a research topic the user asked about 10 minutes ago or in a previous session.
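For reference, each stored message lands as a document shaped roughly like this (field names follow MongoDBChatMessageHistory's defaults; the exact format can vary by library version):

```python
# Illustrative shape of one stored chat-history document in Atlas:
{
    "SessionId": "latest_agent_session",
    "History": '{"type": "human", "data": {"content": "Get me a list of ..."}}',
}
```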
Agent creation (tool use + memory)
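A sketch that wires the brain, tools, prompt, and memory together; create_tool_calling_agent is one reasonable choice here, assuming a recent LangChain version:

```python
from langchain.agents import AgentExecutor, create_tool_calling_agent

agent = create_tool_calling_agent(llm=llm, tools=tools, prompt=prompt)

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=memory,   # chat history flows in through the prompt placeholder
    verbose=True,    # print the reasoning trace shown in the examples below
)
```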
Agent execution
Now that the agent is fully built with a brain (LLM), tools (actions), and memory (chat history), we can run it by invoking the executor.
We’ll use this call to prompt the agent to search academic literature:
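```python
agent_executor.invoke(
    {"input": "Get me a list of research papers on the topic "
              "Prompt Compression in LLM Applications."}
)
```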
Here, the agent uses the arXiv search tool, which queries the arXiv API and retrieves metadata (e.g., title, authors, date) for up to 10 papers matching your query (see below).
Real-world applications of agent memory
1- Using external memory (arXiv) for knowledge retrieval
We'll ask the agent to retrieve relevant academic papers on a specific, niche topic: prompt compression in LLMs (large language models).
Input:

```python
agent_executor.invoke({"input": "Get me a list of research papers on the topic Prompt Compression in LLM Applications."})
```
Agent Behavior (Console output):
> Entering new AgentExecutor chain…
Output (Agent response): Here is the list of research papers the agent retrieved from arXiv:
Note: chat_history is empty here because this is the initial interaction.
2- Leveraging episodic memory to reference prior conversations
Input:

```python
agent_executor.invoke({"input": "What paper did we speak about from our chat history?"})
```
Agent Behavior (Console Output):
> Entering new AgentExecutor chain…
Output (Agent response):
The agent remembers the chat history using its conversational memory stored in MongoDB.
3- Using long-term knowledge memory to answer contextual queries
Input:

```python
agent_executor.invoke({"input": "Get me some papers you have within your knowledge"})
```
Agent Behavior (Console Output):
> Entering new AgentExecutor chain…
Thanks to the memory wired in via MongoDB, the agent remembers the earlier topic “Prompt Compression” from the conversation. It doesn’t need the user to repeat the term explicitly.
Instead of searching the web again, the agent taps into its long-term knowledge memory (a MongoDB vector store). It performs a semantic search, comparing the embedded query to stored paper embeddings, and returns the most relevant matches.
Output (Agent response):
4- Episodic memory recall
Input:

```python
agent_executor.invoke({"input": "What was the first question I asked?"})
```
Agent Behavior (Console Output):
> Entering new AgentExecutor chain…
Output (Agent response):
When asked, "What was the first question I asked?", the agent recalled the question correctly and responded:
“The first question you asked was: ‘Get me a list of research papers on the topic Prompt Compression.’”
This confirms the agent can recall earlier interactions from the conversation history stored in memory.