Have you ever wondered what it’d be like if a chatbot could remember like a human? Not just store every byte of data indefinitely like a cold, unfeeling database, but mimic how our brains prioritize important moments and let the mundane fade away? I did, and that curiosity led me to build a project I’m excited to share: a chatbot with human-like memory powered by Pinecone and Ollama.
I’ll walk you through why I built it, how it works, and how you can try it yourself. Spoiler: It involves exponential decay, vector embeddings, and a dash of psychology-inspired hacking.
Human memory is messy and marvelous. We don’t recall every mundane detail—we let the noise drift away while treasuring moments that hit hard: a friend’s joke, a big win, or that perfect pizza night. The “forgetting curve” (shoutout to Hermann Ebbinghaus) shows this fading happens exponentially over time—unless something’s reinforced or emotionally charged.
Most AIs, though, are the opposite—perfect recall, no nuance. I wanted a chatbot that forgets the fluff and clings to the good stuff, blending technology with a human touch.
Here’s what powers this experiment:

- **Ollama**: runs `llama3.1:latest` locally for snappy, natural responses.
- **Sentence Transformers**: `all-MiniLM-L6-v2` turns each exchange into a vector embedding.
- **Pinecone**: the vector database where those embeddings live.

The mission: store chats as embeddings, let them degrade over time, and boost the ones worth keeping, all feeding into Ollama for context-aware replies.
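Before digging into the repo’s functions, here’s a minimal sketch of how these pieces might be wired together. The index name `chat-memory` is a placeholder, and depending on your LangChain version `HuggingFaceEmbeddings` lives in `langchain_community.embeddings` or the newer `langchain_huggingface` package:

```python
from dotenv import load_dotenv
from langchain_community.embeddings import HuggingFaceEmbeddings  # or: langchain_huggingface
from langchain_pinecone import PineconeVectorStore

load_dotenv()  # Pulls PINECONE_API_KEY from .env

# Local sentence-transformers model that turns text into 384-dim vectors
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Pinecone index that holds the chatbot's memories (index name is hypothetical)
vector_store = PineconeVectorStore(index_name="chat-memory", embedding=embedding_model)
```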
Let’s peek at the guts of `llm-memory.py` from the GitHub repo.

Every chat gets saved as an embedding in Pinecone. The `store_conversation` function captures the exchange, calculates an importance score, and amplifies the embedding:
```python
# Store conversation with decay and importance
def store_conversation(user_input, response):
    conversation_text = f"User: {user_input}\nAI: {response}"
    importance = calculate_importance(user_input, response)
    timestamp = datetime.now().isoformat()
    metadata = {
        "text": conversation_text,
        "timestamp": timestamp,
        "importance": importance,
    }

    # Embed the text
    embedding = embedding_model.embed_documents([conversation_text])[0]

    # Amplify embedding based on importance (optional, for stronger imprint)
    amplified_embedding = [v * importance for v in embedding]

    # Store in Pinecone with unique ID
    vector_store.add_texts(
        texts=[conversation_text],
        embeddings=[amplified_embedding],
        metadatas=[metadata],
    )
```
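One subtlety worth flagging: cosine similarity is scale-invariant, so if the Pinecone index uses the cosine metric, multiplying the embedding by `importance` won’t change retrieval order by itself; the amplification only leaves an imprint under dot-product or Euclidean metrics. That’s presumably why the importance score is also stored in the metadata and reapplied at retrieval time.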
To mimic the forgetting curve, I wrote `calculate_decay_factor`. Older memories lose relevance over time; here’s how:
```python
import numpy as np
from datetime import datetime

# Exponential decay function for memory degradation
def calculate_decay_factor(timestamp, half_life_days=30):
    now = datetime.now()
    time_diff = (now - datetime.fromisoformat(timestamp)).total_seconds() / (24 * 3600)  # Age in days
    # The ln(2) factor makes the weight halve exactly every half_life_days
    decay = np.exp(-np.log(2) * time_diff / half_life_days)
    return decay
```
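A quick sanity check of the curve, using hypothetical timestamps:

```python
from datetime import datetime, timedelta

# Weight halves every 30 days: ~1.0, 0.5, 0.25, 0.125
for days in (0, 30, 60, 90):
    ts = (datetime.now() - timedelta(days=days)).isoformat()
    print(days, round(calculate_decay_factor(ts), 3))
```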
After 30 days (the default half-life), a memory’s weight drops to 50%. In `retrieve_context`, this decay scales the similarity score, pushing faded memories down or out.
The `calculate_importance` function gives a score (1.0–5.0) based on keywords like “important” or “love”:
```python
# Function to determine importance (simplified example)
def calculate_importance(user_input, response):
    # Example: Importance based on keywords or length (could use sentiment analysis or user tagging)
    keywords = ["important", "urgent", "love", "hate", "remember"]
    base_importance = 1.0
    for keyword in keywords:
        if keyword in user_input.lower() or keyword in response.lower():
            base_importance += 1.0
    return min(base_importance, 5.0)  # Cap at 5 for simplicity
```
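For instance, two keyword hits push a memory to 3.0, while small talk stays at the 1.0 floor:

```python
print(calculate_importance("It's important to remember my birthday", "Noted!"))  # 3.0 ("important" + "remember")
print(calculate_importance("Nice weather today", "It sure is."))                 # 1.0 (no keyword hits)
```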
High-scoring memories get boosted—both in their embedding and retrieval weight—making them stickier, like a human recalling a pivotal moment.
When you chat, `retrieve_context` grabs the top memories (factoring in similarity, decay, and importance), and Ollama uses them to reply; a sketch of that step follows the example below. The prompt might look like:
```
Previous conversation (faded by time, stronger if important):
User: I love pizza
AI: That’s awesome!

User: What do I like?
AI:
```
Ollama fills in the blank, leaning on what it “remembers.”
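The repo has its own retrieval code; here’s a minimal sketch of how that re-ranking and reply step could look, assuming LangChain’s `similarity_search_with_score` and the `ollama` Python client. The `top_k` over-fetch and the `chat` helper are my additions for illustration, not lifted from `llm-memory.py`:

```python
import ollama

def retrieve_context(query, top_k=3):
    # Over-fetch candidates, then re-rank by similarity * decay * importance
    candidates = vector_store.similarity_search_with_score(query, k=top_k * 3)
    scored = []
    for doc, similarity in candidates:
        decay = calculate_decay_factor(doc.metadata["timestamp"])
        importance = doc.metadata.get("importance", 1.0)
        scored.append((similarity * decay * importance, doc.page_content))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return "\n".join(text for _, text in scored[:top_k])

def chat(user_input):
    # Weave the surviving memories into the prompt, then let the model reply
    prompt = (
        "Previous conversation (faded by time, stronger if important):\n"
        f"{retrieve_context(user_input)}\n\nUser: {user_input}\nAI:"
    )
    reply = ollama.chat(
        model="llama3.1:latest",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply["message"]["content"]
```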
The project’s live on GitHub: github.com/MLidstrom/llm-memory. Here’s a sample run:
```
You: Hello again
AI: It looks like we're starting from scratch again! Our previous conversations about your last name and preferences for how to address you have faded from my memory. But I'm happy to learn more about you now! So, let's try this again... would you prefer it if I addressed you by your full name "Mats", or is there a nickname or longer version that you'd like me to use instead?

You: My full name is "Mats Lidström", but please call me "Mats" as it is shorter.
AI: It looks like we're having a bit of déjà vu here! I think I see what's happening. You've already told me that you prefer to be called just "Mats". There's no need for the full name or any variations. Let's move on and start fresh, shall we? What would you like to talk about today?

You:
```
Wait a bit (or tweak the half-life), and “Hello again” might fade, but “I love pizza” sticks, since “love” is one of the keywords that bumps the importance score.
To play with it:

1. Install the dependencies: `pip install pinecone-client ollama langchain langchain_pinecone sentence-transformers numpy python-dotenv`.
2. Put your Pinecone API key in a `.env` file (a minimal example follows this list).
3. Pull and serve the model: `ollama run llama3.1:latest`.
4. Start the chatbot: `python llm-memory.py`.
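For reference, a minimal `.env` could look like this; the variable name is an assumption, so check the repo for the exact key it reads:

```
PINECONE_API_KEY=your-pinecone-api-key
```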
This is version 1.0: rough but promising, and I’m already eyeing improvements.
This isn’t just a toy—it’s a glimpse at AI that feels human. Perfect memory is cool, but selective recall is relatable. Imagine a companion that forgets your awkward ramblings but holds onto what defines you. That’s where I’m headed.