The Context Problem: Why AI Forgets and How to Fix It

You're three messages into a ChatGPT conversation. It's going well. Then you close the tab. Next time you open it, the AI has no idea what you were talking about. This is the context problem. And it's the biggest barrier to AI being actually useful.

Why Context Matters

Humans don't start every conversation from scratch. We remember:

What we discussed last time
Ongoing projects
Preferences and patterns

AI doesn't. Every conversation is isolated unless you manually provide context. This works fine for one-off questions. It breaks down for ongoing work.

The Technical Challenge

AI models have a context window The amount of text they can "see" at once.

GPT-4: ~128k tokens (~96k words)
Claude: ~200k tokens (~150k words)

That sounds like a lot. Until you realize:

A week of email: ~50k tokens
A codebase: ~500k tokens
Your entire note archive: ~5M tokens

You can't fit everything into context. You need to be selective.

The Three Types of Context

1. Conversational Context

What was said earlier in this conversation. This is handled automatically by most AI tools. The chat history is included in each request.

2. Session Context

Information relevant to this work session, but not part of the current conversation. Example: You're drafting an email. The AI should know:

Who you're emailing
What project this relates to
Past emails with this person

This requires state management outside the conversation.

3. Long-Term Context

Your preferences, past decisions, and knowledge base. Example: The AI should know:

How you prefer emails structured
Your role and company
Projects you're working on

This requires persistent memory across sessions.

How to Build Context-Aware Systems

Here's the architecture:


┌──────────────────────────────────────┐

│          User Request                │

└──────────────┬───────────────────────┘

│

┌──────────────▼───────────────────────┐

│      Context Retrieval Layer         │

│  (Fetches relevant information)      │

└──────────────┬───────────────────────┘

│

┌──────────┼──────────┐

│          │          │

┌───▼────┐ ┌──▼────┐ ┌──▼────┐

│Convo   │ │Session│ │Long-  │

│History │ │State  │ │Term   │

└────────┘ └───────┘ └───────┘

│

┌──────────────▼───────────────────────┐

│         AI Model (GPT-4, etc)        │

└──────────────┬───────────────────────┘

│

┌──────────────▼───────────────────────┐

│            Response                  │

└──────────────────────────────────────┘

When you make a request:

System retrieves relevant context from all three layers
Combines it with your request
Sends to AI model
Returns response

The key: selective retrieval. You don't dump everything into context. You fetch what's relevant.

Implementing Context Retrieval

Here's a simplified example:


def get_relevant_context(user_request, user_id):
# 1.

Conversational context (last N messages)

conversation_history = db.get_recent_messages(user_id, limit=10)
# 2.

Session context (current project, open documents)

session_state = session_store.get(user_id)
# 3.

Long-term context (semantic search over past interactions)

relevant_memories = vector_db.search(

query=user_request,

user_id=user_id,

top_k=5

)
# Combine contexts

full_context = {

"conversation": conversation_history,

"session": session_state,

"memories": relevant_memories

}

return full_context
# Use in AI request
context = get_relevant_context("Draft an email to John", user_id="123")

response = ai_model.generate(

prompt=user_request,

context=context

)

The AI sees:

Recent conversation
Current session state
Relevant past interactions

It doesn't see everything. Just what matters for this request.

The Memory Hierarchy

Not all context is equally important. You need a hierarchy: Tier 1: Immediate Context (always included)

Current conversation
Active documents
Current task

Tier 2: Recent Context (included if relevant)

This week's activity
Ongoing projects
Recent decisions

Tier 3: Historical Context (retrieved via semantic search)

Past projects
Old decisions
Archived knowledge

Think of it like human memory:

Working memory: What you're thinking about right now
Short-term memory: What happened recently
Long-term memory: Everything else, retrieved when needed

The Forgetting Curve

Interestingly, forgetting is a feature, not a bug. You don't want the AI to remember everything forever. Old information becomes noise. A good context system should:

Prioritize recent information (recency bias)
Surface frequently accessed information (frequency bias)
Decay old information (unless explicitly marked as important)


def calculate_relevance_score(memory, current_time):
# Recency: More recent = higher score

days_old = (current_time - memory.timestamp).days

recency_score = 1 / (1 + days_old * 0.1)
# Frequency: More accessed = higher score

frequency_score = min(memory.access_count / 10, 1.0)
# Semantic: More relevant to query = higher score

semantic_score = cosine_similarity(query_embedding, memory.embedding)
# Combined score

return (recency_score * 0.3 +

frequency_score * 0.2 +

semantic_score * 0.5)

This ensures the AI focuses on what matters, not what's merely stored.

Real-World Implementation

I built a context-aware system for a legal team. The challenge:

1000s of past cases
Complex legal precedents
Client-specific preferences

Without context, the AI would generate generic legal language. With context:

Conversational context: Current case discussion
Session context: Open case files, relevant statutes
Long-term context: Similar past cases, client preferences

Result: The AI generated drafts that reflected:

Firm's writing style
Client's preferred language
Relevant precedents from past cases

Accuracy went from "generic and useless" to "needs minor edits."

The Privacy Trade-Off

Context requires memory. Memory requires storage. Where does that data live?

Option 1: Cloud storage

Pro: Accessible anywhere
Con: Your data lives on someone else's servers

Option 2: Local storage

Pro: Full privacy
Con: Not accessible across devices

Option 3: Encrypted cloud

Pro: Accessible + private
Con: More complex setup

For most use cases, encrypted cloud is the right balance.

Building Your Own Context Layer

You don't need to build everything from scratch. Here's a practical stack:

Storage:

PostgreSQL (structured data: users, sessions)
Vector DB (semantic memory: Pinecone, Weaviate)
Redis (session state: fast access)

Retrieval:

Semantic search via embeddings
Recency/frequency scoring
Manual "pin" for important memories

Integration:

API layer that sits between your app and AI models
Injects context before sending requests
Updates memory after responses

The Future of Context

We're moving toward continuous context Systems that maintain state across all your interactions. Imagine:

AI that remembers every conversation you've ever had
Surfaces relevant past discussions automatically
Learns your preferences without explicit training

The technology exists. The challenge is building it responsibly.

Need help building a context-aware AI system? Let's talk.