In this article, we’ll explore how memory types apply to AI agents and how we can use frameworks like LangChain to add memory to AI agents.
Memory in AI agents
AI agent memory refers to an AI system’s ability to store and recall past experiences. Unlike traditional AI models that process tasks independently, memory-enabled agents can retain context, recognize patterns, and adapt based on past interactions, which is crucial for goal-oriented applications.
Three core pillars define memory in agents:
- State: Understanding the present context
- Persistence: Retaining knowledge across sessions
- Selection: Evaluating what information to remember
When AI agents lack memory
Many AI agents today are stateless, meaning they cannot retain information from previous conversations or use past interactions to influence future ones.
This is why it becomes inefficient to repeatedly provide context in such tools when referring to previous discussions: without memory, they have no awareness of earlier interactions. While ChatGPT now offers a long-term memory feature that addresses this limitation, other models and popular coding assistants like GitHub Copilot do not have persistent, user-configurable memory.
How do AI agents use memory?
To understand how memory works in AI agents, it’s helpful to break it down into two main types: short-term memory and long-term memory.
Before looking at each memory type in detail, it helps to understand the components that surround and act on memory in practice:
Perception
Perception allows an AI system to process raw inputs like text or images into usable data. Tools such as Unstructured.io help convert files like PDFs into structured formats the AI can understand.
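To make this concrete, here is a minimal sketch of perception using the unstructured Python package (the library behind Unstructured.io). The file path is a placeholder, and the package must be installed (pip install "unstructured[pdf]"):
from unstructured.partition.pdf import partition_pdf

# Parse a PDF into structured elements (titles, paragraphs, tables, ...)
elements = partition_pdf(filename="paper.pdf")  # placeholder path

# Preview the first few elements the AI can now work with
for element in elements[:5]:
    print(type(element).__name__, "->", str(element)[:80])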
Motor functions (external tools)
Motor functions let the AI interact with external systems by performing actions through APIs, such as sending emails, updating documents, or triggering services.
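As a hedged sketch, a motor function can be as simple as a Python function that calls an external API. The endpoint below is hypothetical; real services (Gmail API, SendGrid, etc.) have their own schemas and authentication:
import requests

def send_email(to: str, subject: str, body: str) -> bool:
    """Perform an action in the outside world by calling an email API."""
    response = requests.post(
        "https://api.example.com/v1/send-email",  # hypothetical endpoint
        json={"to": to, "subject": subject, "body": body},
        timeout=10,
    )
    return response.ok  # True if the action succeeded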
Agent orchestrators/frameworks
The orchestrator or framework manages all the components: it retrieves relevant data from memory, promotes temporary information into long-term storage, formats data for processing, and plans tasks to keep the AI focused on its goals.
Short-term memory (or working memory)
Short-term memory enables an application to remember messages within the same conversation, much like emails are grouped into threads.
In AI, this corresponds to the context window in large language models (LLMs), which temporarily stores user inputs and relevant data during an interaction. Once the session ends, this information is lost unless explicitly saved.
Simulating short-term memory in Python:
While this Python simulation is not a full AI memory system, it shows how facts can be stored in structured formats, similar to how AI agents might use vector databases or knowledge graphs.
We’ll cover real-world applications with frameworks like LangChain (below) and use more advanced memory components to manage context, recall, and personalization across interactions.
# Simulate short-term memory storing recent interactions among multiple entities
short_term_memory = []
# Simulate a conversation
short_term_memory.append("User: What's on my calendar today?")
short_term_memory.append("Agent: You have a meeting at 2 PM.")
short_term_memory.append("System: Calendar sync completed.")
short_term_memory.append("User: Cancel the meeting.")
# Retain only the most recent 3 messages
short_term_memory = short_term_memory[-3:]
# Print current contents of short-term memory
for line in short_term_memory:
    print(line)
Output:
Agent: You have a meeting at 2 PM.
System: Calendar sync completed.
User: Cancel the meeting.
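As an aside, a more idiomatic way to keep only the most recent N messages is collections.deque with a maxlen, which evicts the oldest entry automatically:
from collections import deque

# A deque with maxlen=3 behaves like a fixed-size context window
short_term_memory = deque(maxlen=3)
for message in [
    "User: What's on my calendar today?",
    "Agent: You have a meeting at 2 PM.",
    "System: Calendar sync completed.",
    "User: Cancel the meeting.",
]:
    short_term_memory.append(message)

print(list(short_term_memory))  # the oldest message has been evicted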
Long-term memory
Long-term memory persists across sessions and is commonly divided into three types:
Memory type | What is stored | Human example | Agent example
---|---|---|---
Semantic | Facts | Things I learned in school | Facts about a user
Episodic | Experiences | Things I did | Past agent actions
Procedural | Instructions | Instincts or motor skills | Agent system prompt

Source: LangGraph [2]
Semantic memory
Semantic memory refers to general knowledge, such as mathematical definitions or scientific facts, like the fact that a triangle has three sides.
In AI systems, semantic memory is often implemented using vector databases to search unstructured data, like retrieving similar documents or generating context-aware responses to provide accurate information.
Simulating semantic memory in Python
Here is a simple Python example that demonstrates the concept of semantic memory, storing and retrieving factual knowledge using key-value pairs.
# Simulated semantic memory using key-value pairs
semantic_memory = {
    ("Triangle", "hasSides"): "Three",
    ("Water", "boilsAt"): "100°C"
}
# Query the memory; return the fact if found, or a fallback message
query = ("Triangle", "hasSides")
print("Answer:", semantic_memory.get(query, "This fact is not stored in memory."))
Output:
Answer: Three
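Note that a key-value lookup only matches exact keys. To hint at how vector databases generalize this, here is a toy sketch of similarity-based recall; the three-dimensional "embeddings" are invented for illustration (real embedding models produce hundreds of dimensions):
import numpy as np

# Toy embeddings, made up for illustration
facts = {
    "A triangle has three sides.": np.array([0.9, 0.1, 0.0]),
    "Water boils at 100°C.": np.array([0.0, 0.8, 0.2]),
}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend embedding of "How many sides does a triangle have?"
query_embedding = np.array([0.85, 0.15, 0.05])

# Recall the most semantically similar fact, not an exact key match
best_fact = max(facts, key=lambda f: cosine_similarity(facts[f], query_embedding))
print("Answer:", best_fact)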
Episodic memory
Episodic memory involves recalling personal experiences, such as the moment you received a job offer or a conversation from last weekend. This enables agents to be more personalized, referencing previous discussions and maintaining continuity across sessions.
Implementing episodic memory often involves RAG-like systems, which retrieve relevant pieces of prior context from unstructured data such as conversation history based on semantic similarity. This allows the AI to surface contextually accurate information even when no explicit structure exists.
In some cases, knowledge graphs can also be used to represent structured relationships between distinct data objects (e.g., people, locations) and define how they are connected. These are especially useful when explicit, fact-based reasoning is required, such as answering questions like “Which countries share a border with Portugal?”
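For illustration, here is a minimal (and deliberately simplified) knowledge-graph lookup in plain Python; production systems would use a graph database such as Neo4j:
# A tiny knowledge graph: (subject, relation) -> objects
knowledge_graph = {
    ("Portugal", "borders"): ["Spain"],
    ("Spain", "borders"): ["Portugal", "France", "Andorra"],  # simplified
}

# "Which countries share a border with Portugal?"
print(knowledge_graph.get(("Portugal", "borders"), []))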
Simulating episodic memory in Python
from datetime import datetime

# List to store episodic memory (past experiences)
episodic_memory = []

# Record a memory of receiving a job offer
episodic_memory.append({
    "timestamp": datetime(2023, 5, 12, 14, 30),
    "event": "job_offer_received",
    "details": "Received an offer from Acme Corp for the Data Scientist position"
})

# Record a memory of a conversation from last weekend
episodic_memory.append({
    "timestamp": datetime(2023, 5, 13, 19, 45),
    "event": "weekend_conversation",
    "details": "Talked with Alex about travel plans over dinner"
})

# Display stored episodes
for memory in episodic_memory:
    print(memory)
Output:
{'timestamp': datetime.datetime(2023, 5, 12, 14, 30), 'event': 'job_offer_received', 'details': 'Received an offer from Acme Corp for the Data Scientist position'}
{'timestamp': datetime.datetime(2023, 5, 13, 19, 45), 'event': 'weekend_conversation', 'details': 'Talked with Alex about travel plans over dinner'}
Procedural memory
Procedural memory captures knowledge about how to carry out tasks. In AI systems, this is typically implemented through functions, algorithms, or code that dictate the agent’s behavior.
It can include everything from basic routines like greeting users to more advanced workflows for problem-solving. Unlike semantic memory, which handles what the agent knows, procedural memory focuses on how that knowledge is applied.

Procedural memory simulation in Python
We can create a simple simulation of procedural memory using a Python dictionary to store the agent’s current instructions.
The call_model function pretends to use those instructions to generate a tweet summary. We then pass feedback like “make it simpler” to the update_instructions function, which updates the stored instructions.
The agent “remembers” how it should behave, updates that memory when it gets feedback, and then uses the new version moving forward. It’s a lightweight way to mimic how AI agents adapt their behavior over time.
# Store initial procedural memory (instructions)
procedural_memory = {
    "instructions": "Summarize the paper in one sentence."
}

# Simulate using the current instructions
def call_model(memory):
    print("Using instructions:", memory["instructions"])
    print("→ Generating tweet summary...\n")

# Simulate updating instructions based on feedback
def update_instructions(memory, feedback):
    print("Feedback received:", feedback)
    memory["instructions"] = "Make it casual and easy to understand."
    print("✅ Instructions updated.\n")

# Run simulation
call_model(procedural_memory)
update_instructions(procedural_memory, "Too technical, make it simpler.")
call_model(procedural_memory)
Output:
Using instructions: Summarize the paper in one sentence.
→ Generating tweet summary...
Feedback received: Too technical, make it simpler.
✅ Instructions updated.
Using instructions: Make it casual and easy to understand.
→ Generating tweet summary...
Inside the agent’s cognitive loop
Let’s unpack what happens when you interact with a memory-enabled AI assistant. Much like a human brain loops through perception, reasoning, and memory updates, the agent follows a similar sequence behind the scenes:
Step 1: The LLM receives your input
Your user message, along with the current chat history, is formatted into a structured prompt.
Step 2: Tool selection begins
The agent analyzes the prompt and determines which tools are relevant for the task. It may perform a semantic search across academic sources, query its internal knowledge base, or apply input compression to stay within the context window.
Step 3: Iterative reasoning kicks in
Instead of generating a response immediately, the agent enters a reasoning loop structured as: Thought → Tool → Observation → Thought → ...
It evaluates the situation, selects a tool, reviews the result, and refines its thinking. This loop of thought, action, observation, and revision continues until the task is complete or a limit is reached, such as a maximum number of steps or tokens.
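Schematically, the loop can be sketched as below. llm_think and run_tool are hypothetical stand-ins for the LLM call and the tool dispatch that a framework like LangChain provides:
MAX_STEPS = 5  # hard limit so the loop always terminates

def run_agent(task, llm_think, run_tool):
    scratchpad = []  # working memory of intermediate thoughts and observations
    for _ in range(MAX_STEPS):
        thought = llm_think(task, scratchpad)          # Thought
        if thought["action"] == "finish":
            return thought["answer"]                   # task complete
        observation = run_tool(thought["action"],      # Tool
                               thought["action_input"])
        scratchpad.append((thought, observation))      # Observation
    return "Step limit reached without a final answer."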
Step 4: Memory is updated
After the task is completed, the full interaction is saved to long-term memory. This includes the input, any tools used, observations made, and the final response. The memory is typically stored in a database like MongoDB so the agent can access it in future sessions and maintain context.
Building an AI research agent with memory
What we’re going to build
We’ll be using the code from this official GitHub notebook.
We’ll build a LangChain-powered AI research assistant that can:
- Search arXiv for relevant academic papers using keyword queries.
- Retrieve context-aware research results from a MongoDB-based long-term memory (vector database).
- Store and recall past user conversations using chat history memory.
- Use LLMs to reason over documents and answer natural language queries.
It supports two types of memory:
- Knowledge memory (long-term): Stores and retrieves vectorized research paper abstracts.
- Conversation memory (episodic): Stores chat interactions and uses them to inform future responses.
How will the agent use memory to reason and respond?
In this tutorial, the AI agent uses two key memory types:
- Knowledge memory (long-term): A MongoDB vector database stores research paper abstracts as embeddings. The agent will retrieve relevant documents using semantic similarity, supporting contextual reasoning over time.
- Conversation memory (episodic): A MongoDB collection logs the full chat history. This will enable the agent to reference past interactions and respond with continuity across queries.
While there’s no explicit short-term memory module, the agent’s runtime working memory is implicitly handled by LangChain’s execution loop. It passes relevant context between steps and tools, giving the agent short-term recall during each reasoning cycle.
Infrastructure and tools required
- LangChain: Framework for building LLM workflows and agents
- MongoDB Atlas: Used for both vector search (long-term memory) and chat history storage
- OpenAI / Fireworks LLM: Language model for answering queries and planning actions
- Hugging Face Datasets: Load research papers (with precomputed embeddings)
- arXiv API via LangChain: Query real academic paper metadata and abstracts
Install libraries (Jupyter Notebook)
!pip install langchain langchain_openai langchain-fireworks langchain-mongodb arxiv pymupdf datasets pymongo
Set environment variables
In any project involving LLMs and vector databases, you’ll need API keys and a connection string. This step loads those securely into the environment so they can be accessed programmatically without hardcoding them into your scripts.
We’ll use:
- OpenAI for embeddings or LLM calls.
- Fireworks AI if using their hosted LLMs.
- MongoDB Atlas to store vectorized data or memory.
import os

# Paste your credentials here (or load them from a .env file)
os.environ["OPENAI_API_KEY"] = ""
os.environ["FIREWORKS_API_KEY"] = ""
os.environ["MONGO_URI"] = ""

FIREWORKS_API_KEY = os.environ.get("FIREWORKS_API_KEY")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
MONGO_URI = os.environ.get("MONGO_URI")
Data ingestion into MongoDB vector database
Load the dataset: This dataset simulates what your agent “knows” up front. Each abstract is semantically embedded, enabling vector similarity search later.
import pandas as pd
from datasets import load_dataset
data = load_dataset("MongoDB/subset_arxiv_papers_with_emebeddings")
dataset_df = pd.DataFrame(data["train"])
(Optional) Inspect the data: It’s good practice to preview your data structure. You’ll confirm fields like title, abstract, and especially embedding (your vector list).
print(len(dataset_df))
dataset_df.head()
First five rows of the dataset:

| | id | submitter | authors | title | comments | journal-ref | doi | report-no | categories | license | abstract | versions | update_date | authors_parsed | embedding |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 704.0001 | Pavel Nadolsky | C. Bal\'azs, E. L. Berger, P. M. Nadolsky, C.-… | Calculation of prompt diphoton production cros… | 37 pages, 15 figures; published version | Phys.Rev.D76:013009,2007 | 10.1103/PhysRevD.76.013009 | ANL-HEP-PR-07-12 | hep-ph | None | A fully differential calculation in perturba… | [{'version': 'v1', 'created': 'Mon, 2 Apr 2007… | 2008-11-26 | [[Balázs, C., ], [Berger, E. L., ], [Nadolsky,… | [0.0594153292, -0.0440569334, -0.0487333685, -… |
| 1 | 704.0002 | Louis Theran | Ileana Streinu and Louis Theran | Sparsity-certifying Graph Decompositions | To appear in Graphs and Combinatorics | None | None | None | math.CO cs.CG | http://arxiv.org/licenses/nonexclusive-distrib… | We describe a new algorithm, the -… | [{'version': 'v1', 'created': 'Sat, 31 Mar 200… | 2008-12-13 | [[Streinu, Ileana, ], [Theran, Louis, ]] | [0.0247399714, -0.065658465, 0.0201423876, -0.… |
| 2 | 704.0003 | Hongjun Pan | Hongjun Pan | The evolution of the Earth-Moon system based o… | 23 pages, 3 figures | None | None | None | physics.gen-ph | None | The evolution of Earth-Moon system is descri… | [{'version': 'v1', 'created': 'Sun, 1 Apr 2007… | 2008-01-13 | [[Pan, Hongjun, ]] | [0.0491479263, 0.0728017688, 0.0604138002, 0.0… |
| 3 | 704.0004 | David Callan | David Callan | A determinant of Stirling cycle numbers counts… | 11 pages | None | None | None | math.CO | None | We show that a determinant of Stirling cycle… | [{'version': 'v1', 'created': 'Sat, 31 Mar 200… | 2007-05-23 | [[Callan, David, ]] | [0.0389556214, -0.0410280302, 0.0410280302, -0… |
| 4 | 704.0005 | Alberto Torchinsky | Wael Abu-Shammala and Alberto Torchinsky | From dyadic to $\Lambda_{\a… | None | Illinois J. Math. 52 (2008) no.2, 681-689 | None | None | math.CA math.FA | None | In this paper we show how to compute the $\L… | [{'version': 'v1', 'created': 'Mon, 2 Apr 2007… | 2013-10-15 | [[Abu-Shammala, Wael, ], [Torchinsky, Alberto, ]] | [0.118412666, -0.0127423415, 0.1185125113, 0.0… |
Complete data ingestion into MongoDB
Connect to MongoDB: We’re building the MongoDB collection (similar to a table in relational databases) where vector data will be stored. This is your knowledge base, used later for semantic search.
from pymongo import MongoClient
# Initialize MongoDB python client
client = MongoClient(MONGO_URI, appname="devrel.content.ai_agent_firechain.python")
DB_NAME = "agent_demo"
COLLECTION_NAME = "knowledge"
ATLAS_VECTOR_SEARCH_INDEX_NAME = "vector_index"
collection = client[DB_NAME][COLLECTION_NAME]
# Delete any existing records in the collection
collection.delete_many({})
Insert data: Now the AI agent will have a persistent, searchable knowledge base of research papers, stored as both metadata and embeddings.
# Data Ingestion
records = dataset_df.to_dict("records")
collection.insert_many(records)
print("Data ingestion into MongoDB completed")
Create vector search index definition (Long-term memory)
This step enables vector search in MongoDB by creating a vector index on the field that holds the embeddings.
This index acts as a long-term semantic memory store. Think of this as giving your agent’s brain a structured, searchable memory where it can recall semantically similar knowledge.
You’re telling MongoDB:
- What kind of index to use: `"type": "vector"`
- Where to look for the embeddings: `"path": "embedding"`
- How long the vectors are: `"numDimensions": 256` (must match the embedding model)
- How to compare them: `"similarity": "cosine"`
Example index definition (JSON):
{
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 256,
"similarity": "cosine"
}
]
}
You’ll typically create this index manually via the MongoDB Atlas UI or using the Atlas API. It should match the embedding field and dimensions you used when you ingested the data (usually OpenAI or Fireworks-generated vectors).
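If you prefer to script it, recent pymongo versions (roughly 4.7+) expose search-index management directly; this sketch assumes an Atlas cluster with vector search enabled:
from pymongo.operations import SearchIndexModel

index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 256,   # must match the embedding model
                "similarity": "cosine",
            }
        ]
    },
    name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
    type="vectorSearch",
)
collection.create_search_index(model=index_model)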
Create LangChain retriever (MongoDB)
Create a long-term memory (Knowledge store): In this step, we build a retriever object that allows your AI agent to perform semantic search over the MongoDB collection filled with research paper embeddings. This retriever will be the agent’s “knowledge recall” system.
When the agent receives a question, the retriever embeds it using the same embedding model and retrieves the top-k most similar documents using vector similarity search.
This is like saying: “Out of everything I’ve read before (50,000+ papers), here are the 5 most semantically similar to what you just asked.”
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small", dimensions=256)
# Vector Store Creation
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
connection_string=MONGO_URI,
namespace=DB_NAME + "." + COLLECTION_NAME,
embedding=embedding_model,
index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
text_key="abstract",
)
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})
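As an optional smoke test (assuming the vector index is active; older LangChain versions use get_relevant_documents instead of invoke):
# Retrieve the 5 most similar abstracts for a sample query
docs = retriever.invoke("prompt compression for large language models")
for doc in docs:
    print(doc.page_content[:100], "...")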
Configure the LLM using Fireworks AI
Create the agent “Brain” (Reasoning system): In this step, we will define the language model that powers the AI agent’s reasoning capabilities. Specifically, we configure the Fireworks AI LLM that the agent will use to process queries, decide on tool use, and generate final responses.
So the agent will use memory by accessing:
- Chat history from the agent’s conversation memory
- Retrieved context from the vector knowledge base (long-term memory)
from langchain_fireworks import ChatFireworks
llm = ChatFireworks(model="accounts/fireworks/models/firefunction-v1", max_tokens=256)
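An optional one-liner to verify that the Fireworks credentials and model are working before building the agent:
print(llm.invoke("Say hello in one short sentence.").content)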
Agent tools creation
Create agent actions (how the agent interacts with the world): This step defines the tools your AI agent can use. These are callable functions the LLM can invoke to complete tasks.
In our case, tools are mostly for retrieving research papers, either from a MongoDB vector store (knowledge base) or directly from arXiv.
Here are the tools we will create:
- Vector Search Tool (Knowledge Base): This tool lets the agent query a MongoDB vector database, which acts as long-term memory. It retrieves abstracts semantically similar to a user’s query.
- arXiv Search Tool (Metadata): Allows the agent to query arXiv for up to 10 matching papers by keyword. Think of this as a way to search outside its internal memory.
- arXiv Lookup Tool (Full Document): Retrieves full content of a specific arXiv paper using its ID (e.g. 704.0001).
- Prompt Compression Tool: If the agent sees a long context or chat history, it can compress it using LLMLingua before submitting to the LLM.
from langchain.agents import tool
from langchain.tools.retriever import create_retriever_tool
from langchain_community.document_loaders import ArxivLoader

# Custom Tool Definition
@tool
def get_metadata_information_from_arxiv(word: str) -> list:
    """
    Fetches and returns metadata for a maximum of ten documents from arXiv matching the given query word.

    Args:
        word (str): The search query to find relevant documents on arXiv.

    Returns:
        list: Metadata about the documents matching the query.
    """
    docs = ArxivLoader(query=word, load_max_docs=10).load()
    # Extract just the metadata from each document
    metadata_list = [doc.metadata for doc in docs]
    return metadata_list


@tool
def get_information_from_arxiv(word: str) -> list:
    """
    Fetches and returns metadata for a single research paper from arXiv matching the given query word, which is the ID of the paper, for example: 704.0001.

    Args:
        word (str): The search query to find the relevant paper on arXiv using the ID.

    Returns:
        list: Data about the paper matching the query.
    """
    doc = ArxivLoader(query=word, load_max_docs=1).load()
    return doc


# If you created a retriever with compression capabilities in an earlier optional cell,
# you can replace 'retriever' with 'compression_retriever'.
# Otherwise, you can also expose compression as a standalone tool for the agent,
# as shown in the `compress_prompt_using_llmlingua` definition below.
retriever_tool = create_retriever_tool(
    retriever=retriever,
    name="knowledge_base",
    description="This serves as the base knowledge source of the agent and contains some records of research papers from arXiv. This tool is used as the first step for exploration and research efforts.",
)

from langchain_community.document_compressors import LLMLinguaCompressor

compressor = LLMLinguaCompressor(model_name="openai-community/gpt2", device_map="cpu")


@tool
def compress_prompt_using_llmlingua(prompt: str, compression_rate: float = 0.5) -> str:
    """
    Compresses a long prompt using the LLMLinguaCompressor.

    Args:
        prompt (str): The data or prompt to be compressed.
        compression_rate (float): The rate at which to compress the data (default is 0.5).

    Returns:
        str: The compressed data or prompt.
    """
    compressed_data = compressor.compress_prompt(
        prompt,
        rate=compression_rate,
        force_tokens=["!", ".", "?", "\n"],
        drop_consecutive=True,
    )
    return compressed_data


tools = [
    retriever_tool,
    get_metadata_information_from_arxiv,
    get_information_from_arxiv,
    compress_prompt_using_llmlingua,
]
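Tools built with @tool are runnable on their own, which is handy for debugging before wiring them into the agent. The "Title" key below comes from ArxivLoader's document metadata (an assumption worth verifying on your LangChain version):
# Call a tool directly to confirm it works
papers = get_metadata_information_from_arxiv.invoke({"word": "prompt compression"})
print(papers[0]["Title"] if papers else "No results")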
Agent prompt creation
Create personality + memory context: This step sets up the agent’s personality, its domain knowledge, and how it should use its tools. It’s where we define the system message (who the agent is) and the structure of messages it will receive, including memory and instructions.
We will import the ChatPromptTemplate to structure the prompt and MessagesPlaceholder to allow dynamic memory (conversation history) to be inserted.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
Define agent’s purpose (System Prompt):
The string below defines:
- The agent’s identity (“helpful research assistant”)
- Tool instructions: When to use:
- Knowledge base
- ArXiv metadata
- ArXiv full doc
- Prompt compression
It also includes a soft rule, “use compression when context is too long,” mimicking short-term memory management.
agent_purpose = """
You are a helpful research assistant equipped with various tools to assist with your tasks efficiently.
You have access to conversational history stored in your input as chat_history.
You are cost-effective and utilize the compress_prompt_using_llmlingua tool whenever you determine that a prompt or conversational history is too long.
Below are instructions on when and how to use each tool in your operations.
1. get_metadata_information_from_arxiv
Purpose: To fetch and return metadata for up to ten documents from arXiv that match a given query word.
When to Use: Use this tool when you need to gather metadata about multiple research papers related to a specific topic.
Example: If you are asked to provide an overview of recent papers on "machine learning," use this tool to fetch metadata for relevant documents.
2. get_information_from_arxiv
Purpose: To fetch and return metadata for a single research paper from arXiv using the paper's ID.
When to Use: Use this tool when you need detailed information about a specific research paper identified by its arXiv ID.
Example: If you are asked to retrieve detailed information about the paper with the ID "704.0001," use this tool.
3. retriever_tool
Purpose: To serve as your base knowledge, containing records of research papers from arXiv.
When to Use: Use this tool as the first step for exploration and research efforts when dealing with topics covered by the documents in the knowledge base.
Example: When beginning research on a new topic that is well-documented in the arXiv repository, use this tool to access the relevant papers.
4. compress_prompt_using_llmlingua
Purpose: To compress long prompts or conversational histories using the LLMLinguaCompressor.
When to Use: Use this tool whenever you determine that a prompt or conversational history is too long to be efficiently processed.
Example: If you receive a very lengthy query or conversation context that exceeds the typical token limits, compress it using this tool before proceeding with further processing.
"""
Compose the Prompt Template:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", agent_purpose),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),
    ]
)
- `("system", agent_purpose)`: sets the agent’s identity and how to use tools
- `MessagesPlaceholder("chat_history")`: injects the stored conversation history so the memory we create below can actually reach the LLM
- `("human", "{input}")`: placeholder for user queries
- `MessagesPlaceholder("agent_scratchpad")`: stores previous tool outputs and LLM steps (like memory of intermediate reasoning)
Agent memory creation using MongoDB
Create long-term memory (Conversational history): This step connects your agent to a MongoDB collection to persistently store conversation history, enabling it to remember previous user interactions across sessions.
We will import the required memory classes:
- ConversationBufferMemory: Manages the actual memory logic (e.g., return chat history).
- MongoDBChatMessageHistory: Provides a LangChain-compatible interface for MongoDB.
Also, we’ll define a function to create or retrieve session history. This lets us dynamically reference a chat session using a unique session_id. Each session’s messages will be stored in a collection called “history”.
Create the conversation memory instance:
from langchain.memory import ConversationBufferMemory
from langchain_mongodb.chat_message_histories import MongoDBChatMessageHistory
def get_session_history(session_id: str) -> MongoDBChatMessageHistory:
return MongoDBChatMessageHistory(
MONGO_URI, session_id, database_name=DB_NAME, collection_name="history"
)
memory = ConversationBufferMemory(
    memory_key="chat_history",
    chat_memory=get_session_history("latest_agent_session"),
    return_messages=True,
)
- `memory_key="chat_history"`: matches the key expected in the prompt template (`MessagesPlaceholder("chat_history")`)
- `return_messages=True`: ensures the memory returns full message objects, not just strings
What will be stored in MongoDB (long-term memory)?
Each document in the "history" collection looks like:
{
"session_id": "latest_agent_session",
"messages": [
{"type": "human", "content": "What is prompt compression?"},
{"type": "ai", "content": "Prompt compression is..."}
]
}
This long-term memory enables the agent to recall what the user previously asked, so it can reason across multiple conversations, such as remembering a research topic the user asked about 10 minutes ago or in a previous session.
Agent creation (tool use + memory)
from langchain.agents import AgentExecutor, create_tool_calling_agent
# Step 1: Create the agent with tool-calling capability
agent = create_tool_calling_agent(
    llm=llm,        # Brain: the LLM you've configured
    tools=tools,    # Actions: search arXiv, access the knowledge base, compress prompts
    prompt=prompt,  # Instructions: custom system message and chat formatting
)

# Step 2: Wrap the agent in an executor that handles reasoning, looping, and memory
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,                # Enables step-by-step output
    handle_parsing_errors=True,  # Prevents crashes from tool-call formatting issues
    memory=memory,               # Long-term memory from MongoDB conversation history
)
Agent execution
Now that the agent is fully built with a brain (LLM), tools (actions), and memory (chat history), we can run it by invoking the executor.
We’ll use this call to prompt the agent to search academic literature:
agent_executor.invoke(
    {
        "input": "Get me a list of research papers on the topic Prompt Compression in LLM Applications."
    }
)
Here, the agent invokes the tool that searches the arXiv API and retrieves metadata (e.g., title, authors, date) for up to 10 papers matching your query (see below).
Real-world applications of agent memory
1- Using external memory (arXiv) for knowledge retrieval
We’ll ask the agent to retrieve relevant academic papers about a specific and niche topic: prompt compression in LLMs (large language models).
Input: {'input': 'Get me a list of research papers on the topic Prompt Compression in LLM Applications.'}
Agent Behavior (Console output):
> Entering new AgentExecutor chain…
Invoking: `get_metadata_information_from_arxiv` with `{'word': 'Prompt Compression in LLM Applications'}`
Output (Agent response): Here is the list of research papers the agent retrieved from arXiv:
{'input': 'Get me a list of research papers on the topic Prompt Compression in LLM Applications.',
'chat_history': '',
'output': 'Here are some research papers on the topic Prompt Compression in LLM Applications:\n\n1. "SelfCP: Compressing Long Prompt to 1/12 Using the Frozen Large Language Model Itself" by Jun Gao\n2. "Adapting LLMs for Efficient Context Processing through Soft Prompt Compression" by Cangqing Wang, Yutian Yang, Ruisi Li, Dan Sun, Ruicong Cai, Yuzhu Zhang, Chengqian Fu, Lillian Floyd\n3. "LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models" by Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu\n4. "Learning to Compress Prompt in Natural Language Formats" by Yu-Neng Chuang, Tianwei Xing, Chia-Yuan Chang, Zirui Liu, Xun Chen, Xia Hu\n5. "PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression"'}
Note: 'chat_history' is empty because this is the initial interaction.
2- Leveraging episodic memory to reference prior conversations
Input: agent_executor.invoke({"input": "What paper did we speak about from our chat history?"})
Agent Behavior (Console Output):
> Entering new AgentExecutor chain…
Invoking: `get_metadata_information_from_arxiv` with `{'word': 'chat history'}`
responded: I need to access the chat history to answer this question.
Output (Agent response):
'chat_history': 'Human: Get me a list of research papers on the topic Prompt Compression in LLM Applications.\nAI: Here are some research papers on the topic Prompt Compression in LLM Applications:\n\n1. "SelfCP: Compressing Long Prompt to 1/12 Using the Frozen Large Language Model Itself" by Jun Gao\n2. "Adapting LLMs for Efficient Context Processing through Soft Prompt Compression" by Cangqing Wang, Yutian Yang, Ruisi Li, Dan Sun, Ruicong Cai, Yuzhu Zhang, Chengqian Fu, Lillian Floyd\n3. "LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models" by Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu\n4. "Learning to Compress Prompt in Natural Language Formats" by Yu-Neng Chuang, Tianwei Xing, Chia-Yuan Chang, Zirui Liu, Xun Chen, Xia Hu\n5. "PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression"',
'output': 'The paper we spoke about from our chat history is "ToxBuster: In-game Chat Toxicity Buster with BERT" by Zachary Yang, Yasmine Maricar, MohammadReza Davari, Nicolas Grenon-Godbout, and Reihaneh Rabbany.'}
The agent retrieves the chat history through its conversational memory stored in MongoDB, although in this run its final answer names a paper that does not appear in that history, a sign that recalled context is not always used faithfully.
3- Using long-term knowledge memory to answer contextual queries
Input: agent_executor.invoke({"input": "Get me some papers you have within your knowledge"})
Agent Behavior (Console Output):
> Entering new AgentExecutor chain…
Invoking: `knowledge_base` with `{'query': 'Prompt Compression'}`
Thanks to the memory wired in via MongoDB, the agent remembers the earlier topic “Prompt Compression” from the conversation. It doesn’t need the user to repeat the term explicitly.
Instead of searching the web again, the agent taps into its long-term knowledge memory (a MongoDB vector store). It performs a semantic search, comparing the embedded query to stored paper embeddings, and returns the most relevant matches.
Output (Agent response):
4- Episodic memory recall
Input: agent_executor.invoke({"input": "What was the first question I asked?"})
Agent Behavior (Console Output):
> Entering new AgentExecutor chain…
The first question you asked was: "Get me a list of research papers on the topic Prompt Compression."
Output (Agent response):
When the agent is asked, “What was the first question I asked?” it recalls the answer correctly and responds:
“The first question you asked was: ‘Get me a list of research papers on the topic Prompt Compression.’”
This confirms the agent can recall earlier interactions from the conversation history stored in memory.
External Links
1. https://langchain-ai.github.io/langgraph/concepts/memory/#manage-short-term-memory
2. https://langchain-ai.github.io/langgraph/concepts/memory/#manage-short-term-memory
3. https://medium.com/@honeyricky1m3/giving-your-ai-a-mind-exploring-memory-frameworks-for-agentic-language-models-c92af355df06
4. https://langchain-ai.github.io/langgraph/concepts/memory/#manage-short-term-memory
5. https://medium.com/@honeyricky1m3/giving-your-ai-a-mind-exploring-memory-frameworks-for-agentic-language-models-c92af355df06