You're three messages into a ChatGPT conversation. It's going well. Then you close the tab. Next time you open it, the AI has no idea what you were talking about. This is the context problem. And it's the biggest barrier to AI being actually useful.
Why Context Matters
Humans don't start every conversation from scratch. We remember:
- What we discussed last time
- Ongoing projects
- Preferences and patterns
AI doesn't. Every conversation is isolated unless you manually provide context. This works fine for one-off questions. It breaks down for ongoing work.
The Technical Challenge
AI models have a context window The amount of text they can "see" at once.
- GPT-4: ~128k tokens (~96k words)
- Claude: ~200k tokens (~150k words)
That sounds like a lot. Until you realize:
- A week of email: ~50k tokens
- A codebase: ~500k tokens
- Your entire note archive: ~5M tokens
You can't fit everything into context. You need to be selective.
The Three Types of Context
1. Conversational Context
What was said earlier in this conversation. This is handled automatically by most AI tools. The chat history is included in each request.
2. Session Context
Information relevant to this work session, but not part of the current conversation. Example: You're drafting an email. The AI should know:
- Who you're emailing
- What project this relates to
- Past emails with this person
This requires state management outside the conversation.
3. Long-Term Context
Your preferences, past decisions, and knowledge base. Example: The AI should know:
- How you prefer emails structured
- Your role and company
- Projects you're working on
This requires persistent memory across sessions.
How to Build Context-Aware Systems
Here's the architecture:
┌──────────────────────────────────────┐
│ User Request │
└──────────────┬───────────────────────┘
│
┌──────────────▼───────────────────────┐
│ Context Retrieval Layer │
│ (Fetches relevant information) │
└──────────────┬───────────────────────┘
│
┌──────────┼──────────┐
│ │ │
┌───▼────┐ ┌──▼────┐ ┌──▼────┐
│Convo │ │Session│ │Long- │
│History │ │State │ │Term │
└────────┘ └───────┘ └───────┘
│
┌──────────────▼───────────────────────┐
│ AI Model (GPT-4, etc) │
└──────────────┬───────────────────────┘
│
┌──────────────▼───────────────────────┐
│ Response │
└──────────────────────────────────────┘
When you make a request:
- System retrieves relevant context from all three layers
- Combines it with your request
- Sends to AI model
- Returns response
The key: selective retrieval. You don't dump everything into context. You fetch what's relevant.
Implementing Context Retrieval
Here's a simplified example:
def get_relevant_context(user_request, user_id):
# 1.
Conversational context (last N messages)
conversation_history = db.get_recent_messages(user_id, limit=10)
# 2.
Session context (current project, open documents)
session_state = session_store.get(user_id)
# 3.
Long-term context (semantic search over past interactions)
relevant_memories = vector_db.search(
query=user_request,
user_id=user_id,
top_k=5
)
# Combine contexts
full_context = {
"conversation": conversation_history,
"session": session_state,
"memories": relevant_memories
}
return full_context
# Use in AI request
context = get_relevant_context("Draft an email to John", user_id="123")
response = ai_model.generate(
prompt=user_request,
context=context
)
The AI sees:
- Recent conversation
- Current session state
- Relevant past interactions
It doesn't see everything. Just what matters for this request.
The Memory Hierarchy
Not all context is equally important. You need a hierarchy: Tier 1: Immediate Context (always included)
- Current conversation
- Active documents
- Current task
Tier 2: Recent Context (included if relevant)
- This week's activity
- Ongoing projects
- Recent decisions
Tier 3: Historical Context (retrieved via semantic search)
- Past projects
- Old decisions
- Archived knowledge
Think of it like human memory:
- Working memory: What you're thinking about right now
- Short-term memory: What happened recently
- Long-term memory: Everything else, retrieved when needed
The Forgetting Curve
Interestingly, forgetting is a feature, not a bug. You don't want the AI to remember everything forever. Old information becomes noise. A good context system should:
- Prioritize recent information (recency bias)
- Surface frequently accessed information (frequency bias)
- Decay old information (unless explicitly marked as important)
def calculate_relevance_score(memory, current_time):
# Recency: More recent = higher score
days_old = (current_time - memory.timestamp).days
recency_score = 1 / (1 + days_old * 0.1)
# Frequency: More accessed = higher score
frequency_score = min(memory.access_count / 10, 1.0)
# Semantic: More relevant to query = higher score
semantic_score = cosine_similarity(query_embedding, memory.embedding)
# Combined score
return (recency_score * 0.3 +
frequency_score * 0.2 +
semantic_score * 0.5)
This ensures the AI focuses on what matters, not what's merely stored.
Real-World Implementation
I built a context-aware system for a legal team. The challenge:
- 1000s of past cases
- Complex legal precedents
- Client-specific preferences
Without context, the AI would generate generic legal language. With context:
- Conversational context: Current case discussion
- Session context: Open case files, relevant statutes
- Long-term context: Similar past cases, client preferences
Result: The AI generated drafts that reflected:
- Firm's writing style
- Client's preferred language
- Relevant precedents from past cases
Accuracy went from "generic and useless" to "needs minor edits."
The Privacy Trade-Off
Context requires memory. Memory requires storage. Where does that data live?
Option 1: Cloud storage
- Pro: Accessible anywhere
- Con: Your data lives on someone else's servers
Option 2: Local storage
- Pro: Full privacy
- Con: Not accessible across devices
Option 3: Encrypted cloud
- Pro: Accessible + private
- Con: More complex setup
For most use cases, encrypted cloud is the right balance.
Building Your Own Context Layer
You don't need to build everything from scratch. Here's a practical stack:
Storage:
- PostgreSQL (structured data: users, sessions)
- Vector DB (semantic memory: Pinecone, Weaviate)
- Redis (session state: fast access)
Retrieval:
- Semantic search via embeddings
- Recency/frequency scoring
- Manual "pin" for important memories
Integration:
- API layer that sits between your app and AI models
- Injects context before sending requests
- Updates memory after responses
The Future of Context
We're moving toward continuous context Systems that maintain state across all your interactions. Imagine:
- AI that remembers every conversation you've ever had
- Surfaces relevant past discussions automatically
- Learns your preferences without explicit training
The technology exists. The challenge is building it responsibly.
Need help building a context-aware AI system? Let's talk.