AI Systems Design

The Context Problem: Why AI Forgets and How to Fix It

Jan 05, 2026 | 9 min read | By Frederick Nwokobia

You're three messages into a ChatGPT conversation. It's going well. Then you close the tab. Next time you open it, the AI has no idea what you were talking about. This is the context problem. And it's the biggest barrier to AI being actually useful.

Why Context Matters

Humans don't start every conversation from scratch. We remember:

  • What we discussed last time
  • Ongoing projects
  • Preferences and patterns

AI doesn't. Every conversation is isolated unless you manually provide context. This works fine for one-off questions. It breaks down for ongoing work.

The Technical Challenge

AI models have a context window: the amount of text they can "see" at once.

  • GPT-4 Turbo: ~128k tokens (~96k words)
  • Claude: ~200k tokens (~150k words)

That sounds like a lot. Until you realize:

  • A week of email: ~50k tokens
  • A codebase: ~500k tokens
  • Your entire note archive: ~5M tokens

You can't fit everything into context. You need to be selective.
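To make the budget concrete, here is a minimal sketch that estimates tokens from word counts using the rough ratio above (about 0.75 words per token) and checks what fits. The helper names and the reply reserve are assumptions for illustration. A real system would count tokens with the model's own tokenizer.

```python
# Rough token budgeting, assuming ~0.75 words per token (the ratio implied above).
# estimate_tokens and fits_in_window are hypothetical helper names.

WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Estimate a text's token count from its whitespace-separated word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def fits_in_window(token_counts: list[int], window: int, reserve: int = 4_000) -> bool:
    """Check whether the items fit, leaving `reserve` tokens for the model's reply."""
    return sum(token_counts) <= window - reserve


# Sizes taken from the rough figures in this article.
week_of_email = 50_000
codebase = 500_000

print(fits_in_window([week_of_email], window=128_000))            # True
print(fits_in_window([week_of_email, codebase], window=200_000))  # False
```

A week of email fits on its own. Add the codebase and it no longer does, even in the larger window.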

The Three Types of Context

1. Conversational Context

What was said earlier in this conversation. This is handled automatically by most AI tools. The chat history is included in each request.

2. Session Context

Information relevant to this work session, but not part of the current conversation. Example: You're drafting an email. The AI should know:

  • Who you're emailing
  • What project this relates to
  • Past emails with this person

This requires state management outside the conversation.

3. Long-Term Context

Your preferences, past decisions, and knowledge base. Example: The AI should know:

  • How you prefer emails structured
  • Your role and company
  • Projects you're working on

This requires persistent memory across sessions.
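One way to keep the three types separate in code is a small container per request. This is only a sketch: the class name, field names, and sample values are made up for illustration, not a standard schema.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBundle:
    """Groups the three kinds of context for one request (hypothetical structure)."""
    # Earlier messages in this conversation.
    conversation: list[str] = field(default_factory=list)
    # State for the current work session.
    session: dict[str, str] = field(default_factory=dict)
    # Preferences and facts that persist across sessions.
    long_term: dict[str, str] = field(default_factory=dict)


# Illustrative values only.
bundle = ContextBundle(
    conversation=["Can you help me reply to John?"],
    session={"recipient": "John", "project": "Q3 launch"},
    long_term={"email_style": "short, bullet points", "role": "Product manager"},
)
```

Keeping the three apart makes it easier to give each its own storage and lifetime later.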

How to Build Context-Aware Systems

Here's the architecture:


┌──────────────────────────────────────┐
│          User Request                │
└──────────────┬───────────────────────┘
               │
┌──────────────▼───────────────────────┐
│      Context Retrieval Layer         │
│  (Fetches relevant information)      │
└──────────────┬───────────────────────┘
               │
    ┌──────────┼──────────┐
    │          │          │
┌───▼────┐  ┌──▼────┐  ┌──▼────┐
│Convo   │  │Session│  │Long-  │
│History │  │State  │  │Term   │
└───┬────┘  └──┬────┘  └──┬────┘
    │          │          │
    └──────────┼──────────┘
               │
┌──────────────▼───────────────────────┐
│         AI Model (GPT-4, etc.)       │
└──────────────┬───────────────────────┘
               │
┌──────────────▼───────────────────────┐
│            Response                  │
└──────────────────────────────────────┘

When you make a request:

  1. System retrieves relevant context from all three layers
  2. Combines it with your request
  3. Sends to AI model
  4. Returns response

The key: selective retrieval. You don't dump everything into context. You fetch what's relevant.
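The "combine" step can be sketched as follows. This is a simplified illustration, not a definitive implementation: build_prompt is a hypothetical name, the relevance scores are assumed to come from your retrieval layer, and the budget is counted in characters rather than tokens to keep the example short.

```python
def build_prompt(request: str, context_items: list[tuple[float, str]], max_chars: int) -> str:
    """Add the highest-scoring context snippets until the character budget is used up."""
    selected = []
    used = 0
    # Each item is (relevance_score, snippet). Highest relevance goes first.
    for _score, snippet in sorted(context_items, key=lambda item: item[0], reverse=True):
        if used + len(snippet) > max_chars:
            continue  # skip snippets that would overflow the budget
        selected.append(snippet)
        used += len(snippet)
    context_block = "\n".join(f"- {s}" for s in selected)
    return f"Context:\n{context_block}\n\nRequest: {request}"


# Made-up snippets and scores for illustration.
items = [
    (0.9, "John prefers short emails."),
    (0.2, "Notes from a 2019 offsite."),
    (0.7, "Project: Q3 launch."),
]
prompt = build_prompt("Draft an email to John", items, max_chars=50)
print(prompt)
```

With a 50-character budget, the two most relevant snippets make it in and the old offsite notes are left out.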

Implementing Context Retrieval

Here's a simplified example:


def get_relevant_context(user_request, user_id):
    # db, session_store, and vector_db are placeholders for your own storage clients.

    # 1. Conversational context (last N messages)
    conversation_history = db.get_recent_messages(user_id, limit=10)

    # 2. Session context (current project, open documents)
    session_state = session_store.get(user_id)

    # 3. Long-term context (semantic search over past interactions)
    relevant_memories = vector_db.search(
        query=user_request,
        user_id=user_id,
        top_k=5,
    )

    # Combine contexts
    full_context = {
        "conversation": conversation_history,
        "session": session_state,
        "memories": relevant_memories,
    }
    return full_context


# Use in AI request (ai_model is a placeholder for your model client)
user_request = "Draft an email to John"
context = get_relevant_context(user_request, user_id="123")

response = ai_model.generate(
    prompt=user_request,
    context=context,
)

The AI sees:

  • Recent conversation
  • Current session state
  • Relevant past interactions

It doesn't see everything. Just what matters for this request.
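What does the semantic search step do under the hood? Here is a minimal in-memory sketch, assuming each memory already has an embedding. The search function, the dict layout, and the toy 2-dimensional vectors are all made up for illustration. A real vector database has its own API and uses embeddings with hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_embedding: list[float], memories: list[dict], top_k: int = 5) -> list[str]:
    """Return the texts of the top_k memories most similar to the query embedding."""
    ranked = sorted(
        memories,
        key=lambda m: cosine_similarity(query_embedding, m["embedding"]),
        reverse=True,
    )
    return [m["text"] for m in ranked[:top_k]]


# Toy embeddings, invented for this example.
memories = [
    {"text": "John prefers short emails", "embedding": [0.9, 0.1]},
    {"text": "Office wifi password changed", "embedding": [0.0, 1.0]},
    {"text": "Last email to John was about pricing", "embedding": [0.8, 0.3]},
]
results = search([1.0, 0.0], memories, top_k=2)
print(results)
```

The wifi note points in a different direction from the query, so it never reaches the model.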

The Memory Hierarchy

Not all context is equally important. You need a hierarchy:

Tier 1: Immediate Context (always included)

  • Current conversation
  • Active documents
  • Current task

Tier 2: Recent Context (included if relevant)

  • This week's activity
  • Ongoing projects
  • Recent decisions

Tier 3: Historical Context (retrieved via semantic search)

  • Past projects
  • Old decisions
  • Archived knowledge

Think of it like human memory:

  • Working memory: What you're thinking about right now
  • Short-term memory: What happened recently
  • Long-term memory: Everything else, retrieved when needed
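The three tiers can be sketched as one selection function. Treat this as an illustration under loud assumptions: select_context is a hypothetical name, items carry a made-up "tier" field, and simple keyword overlap stands in for real semantic search.

```python
def select_context(items: list[dict], query_terms: set[str], top_k: int = 1) -> list[str]:
    """Pick context by tier: 1 always, 2 if relevant, 3 via (stand-in) search."""

    def overlap(item: dict) -> int:
        # Keyword overlap is a crude stand-in for embedding similarity.
        return len(query_terms & set(item["text"].lower().split()))

    # Tier 1: always included.
    tier1 = [i["text"] for i in items if i["tier"] == 1]
    # Tier 2: included only if relevant to the query.
    tier2 = [i["text"] for i in items if i["tier"] == 2 and overlap(i) > 0]
    # Tier 3: retrieved by search, keeping only the top_k matches.
    hits = sorted(
        (i for i in items if i["tier"] == 3 and overlap(i) > 0),
        key=overlap,
        reverse=True,
    )
    tier3 = [i["text"] for i in hits[:top_k]]
    return tier1 + tier2 + tier3


# Made-up items for illustration.
items = [
    {"tier": 1, "text": "Current task: draft the pricing email"},
    {"tier": 2, "text": "Decision this week: pricing moves to annual plans"},
    {"tier": 2, "text": "Team lunch is on Friday"},
    {"tier": 3, "text": "2023 pricing review notes"},
    {"tier": 3, "text": "Archived onboarding checklist"},
]
selected = select_context(items, {"pricing"})
print(selected)
```

The lunch reminder and the onboarding checklist stay out. Everything else earns its place.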

The Forgetting Curve

Interestingly, forgetting is a feature, not a bug. You don't want the AI to remember everything forever. Old information becomes noise. A good context system should:

  • Prioritize recent information (recency bias)
  • Surface frequently accessed information (frequency bias)
  • Decay old information (unless explicitly marked as important)

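A simple scoring function might look like this:

import numpy as np


def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def calculate_relevance_score(memory, current_time, query_embedding):
    # Recency: More recent = higher score
    days_old = (current_time - memory.timestamp).days
    recency_score = 1 / (1 + days_old * 0.1)

    # Frequency: More accessed = higher score (capped at 1.0 after 10 accesses)
    frequency_score = min(memory.access_count / 10, 1.0)

    # Semantic: More relevant to query = higher score
    semantic_score = cosine_similarity(query_embedding, memory.embedding)

    # Combined score (weights sum to 1.0)
    return (recency_score * 0.3 +
            frequency_score * 0.2 +
            semantic_score * 0.5)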

This ensures the AI focuses on what matters, not what's merely stored.
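The "unless explicitly marked as important" part deserves its own sketch. Below, a pinned memory skips the decay entirely. The Memory class and the pinned flag are assumptions for illustration. The decay formula is the same 1 / (1 + 0.1 × days) used in the recency score above.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """Hypothetical memory record with an age in days and an optional pin."""
    text: str
    days_old: int
    pinned: bool = False


def recency_score(memory: Memory) -> float:
    """Decay with age as 1 / (1 + 0.1 * days). Pinned memories do not decay."""
    if memory.pinned:
        return 1.0
    return 1 / (1 + memory.days_old * 0.1)


# Made-up memories for illustration.
memories = [
    Memory("Yesterday's standup notes", days_old=1),
    Memory("Kickoff notes from last year", days_old=365),
    Memory("Client requires UK spelling", days_old=400, pinned=True),
]
ranked = sorted(memories, key=recency_score, reverse=True)
for m in ranked:
    print(f"{recency_score(m):.3f}  {m.text}")
```

With this formula a memory loses half its recency score after 10 days and three quarters after 30. The pinned client preference stays on top no matter how old it gets.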

Real-World Implementation

I built a context-aware system for a legal team. The challenge:

  • Thousands of past cases
  • Complex legal precedents
  • Client-specific preferences

Without context, the AI would generate generic legal language. With context:

  1. Conversational context: Current case discussion
  2. Session context: Open case files, relevant statutes
  3. Long-term context: Similar past cases, client preferences

Result: The AI generated drafts that reflected:

  • Firm's writing style
  • Client's preferred language
  • Relevant precedents from past cases

Output quality went from "generic and useless" to "needs minor edits."

The Privacy Trade-Off

Context requires memory. Memory requires storage. Where does that data live?

Option 1: Cloud storage

  • Pro: Accessible anywhere
  • Con: Your data lives on someone else's servers

Option 2: Local storage

  • Pro: Full privacy
  • Con: Not accessible across devices

Option 3: Encrypted cloud

  • Pro: Accessible + private
  • Con: More complex setup

For most use cases, encrypted cloud is the right balance.

Building Your Own Context Layer

You don't need to build everything from scratch. Here's a practical stack:

Storage:

  • PostgreSQL (structured data: users, sessions)
  • Vector DB (semantic memory: Pinecone, Weaviate)
  • Redis (session state: fast access)

Retrieval:

  • Semantic search via embeddings
  • Recency/frequency scoring
  • Manual "pin" for important memories

Integration:

  • API layer that sits between your app and AI models
  • Injects context before sending requests
  • Updates memory after responses
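Here is a rough sketch of that integration layer. It assumes the model is any callable that takes a prompt string and returns a string, and it uses a plain list with "most recent entries" retrieval in place of real storage and scoring. ContextLayer and fake_model are hypothetical names.

```python
class ContextLayer:
    """Sits between the app and the model: inject context, call the model, update memory."""

    def __init__(self, model, memory=None):
        self.model = model  # any callable: prompt str -> response str
        self.memory = memory if memory is not None else []

    def handle(self, request: str, max_memories: int = 3) -> str:
        # Naive retrieval: the most recent entries stand in for scored search.
        recent = self.memory[-max_memories:]
        if recent:
            context = "\n".join(recent)
            prompt = f"Context:\n{context}\n\nRequest: {request}"
        else:
            prompt = request
        response = self.model(prompt)
        # Update memory after the response so the next request can use it.
        self.memory.append(f"User: {request}")
        self.memory.append(f"AI: {response}")
        return response


# A stand-in model that records the prompts it receives.
seen_prompts = []


def fake_model(prompt: str) -> str:
    seen_prompts.append(prompt)
    return "ok"


layer = ContextLayer(fake_model)
layer.handle("Draft an email to John")
layer.handle("Make it shorter")
print(seen_prompts[1])
```

The second prompt carries the first exchange with it. Your app code never has to think about context.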

The Future of Context

We're moving toward continuous context: systems that maintain state across all your interactions. Imagine:

  • AI that remembers every conversation you've ever had
  • Surfaces relevant past discussions automatically
  • Learns your preferences without explicit training

The technology exists. The challenge is building it responsibly.

