What Is a Context Window and Why Your AI Keeps Forgetting

You gave ChatGPT your entire project spec. Three messages later, it asked what language you're using. You told Claude about your database schema, had a productive back-and-forth, then it suggested a solution that contradicts the constraints you laid out in message one.
This isn't the AI being dumb. It's a hard architectural limit called the context window, and understanding it changes how you use every LLM.
The context window is the AI's entire working memory
Every time you send a message, the AI doesn't just read your latest input. It reads everything: your message, its previous response, the message before that, all the way back to the beginning of the conversation. The entire conversation gets reassembled and fed in as a single block of text.
The context window is the maximum size of that block, measured in tokens. A token is roughly ¾ of a word. GPT-4o has a 128,000-token context window. Claude has up to 200,000. Gemini claims up to 1 million on some models.
Those numbers sound enormous. They're not.
128,000 tokens is roughly 96,000 words, about the length of a novel. But that budget covers everything: the system prompt, your conversation history, any documents you uploaded, and the AI's response. A typical back-and-forth conversation burns through tokens fast because every previous message gets re-sent with each new request.
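A minimal sketch of that re-sending behavior, assuming a generic chat-style API. The message shapes and the ¾-word-per-token heuristic are illustrative, not any vendor's real tokenizer:

```python
# Rough sketch of what every request actually carries.
# Token counts use the ~3/4-word heuristic from above, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    """Crude estimate: ~4/3 tokens per word."""
    return round(len(text.split()) * 4 / 3)

conversation = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Here is my project spec ..."},
    {"role": "assistant", "content": "Got it. A few questions ..."},
    {"role": "user", "content": "Now refactor the auth module."},
]

# The model never sees just the last message -- the whole list is
# serialized and sent again on every single request.
total = sum(estimate_tokens(m["content"]) for m in conversation)
print(f"Tokens consumed by history this request: ~{total}")
```

This is why a long conversation gets more expensive with every turn: the history grows, and all of it rides along each time.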
What happens when you hit the limit
The AI doesn't crash or throw an error when the conversation exceeds the context window. It does something worse: it silently drops the oldest content.
Most implementations use a sliding window approach. When the total conversation exceeds the limit, the earliest messages get truncated. The AI still responds, but it no longer has access to information from the beginning of the conversation. It's not choosing to forget. It literally can't see those messages anymore.
This is why long conversations degrade. The first 10 messages are sharp and contextualized. By message 40, the AI has lost your original instructions, your tech stack, your constraints, and half the decisions you made together. It starts giving generic advice because it no longer has the specific context that made earlier responses useful.
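A sketch of what that truncation might look like, assuming a naive sliding-window implementation. Real backends vary, and the token estimate is the same rough ¾-word heuristic as above:

```python
# Minimal sliding-window sketch (an assumption about how a typical
# chat backend behaves, not any vendor's actual code).

def estimate_tokens(text: str) -> int:
    """Crude estimate: ~4/3 tokens per word."""
    return round(len(text.split()) * 4 / 3)

def fit_to_window(messages: list[dict], limit: int) -> list[dict]:
    """Drop the oldest non-system messages until the total fits."""
    system, history = messages[:1], messages[1:]
    while history and sum(
        estimate_tokens(m["content"]) for m in system + history
    ) > limit:
        history.pop(0)  # silently discard the earliest message
    return system + history

messages = [{"role": "system", "content": "Be concise."}] + [
    {"role": "user", "content": f"message number {i} " * 50} for i in range(20)
]
trimmed = fit_to_window(messages, limit=2_000)
# Early messages are simply gone -- the model can no longer see them.
print(len(messages), "->", len(trimmed))
```

Note that nothing signals the drop to you or to the model: the request succeeds, and the answer just quietly lacks the missing context.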
"Memory" features don't fix this
ChatGPT's Memory, Claude's memory, and similar features store small facts about you between conversations ("user prefers TypeScript," "user works at a startup"). These are injected as a few hundred tokens at the top of each new conversation.
This helps with surface-level personalization. It does not help with conversation-specific context. The AI might remember you prefer TypeScript, but it won't remember the specific component architecture you discussed three days ago, or the exact database schema you're working with, or the five approaches you already tried and rejected.
Memory features are a thin layer of personalization over the same context window limitation. They save you from re-stating your name and language preference. They don't save you from re-explaining your project.
The math that matters for daily use
Here's a practical calculation. Say you're using a model with a 128K token context window:
- System prompt and memory: ~2,000 tokens
- Your average message: ~200 tokens
- AI's average response: ~800 tokens
- Each exchange (you + AI): ~1,000 tokens
That gives you roughly 126 exchanges before the window fills up. That sounds like a lot, but if you pasted a 5,000-word document at the start (about 6,700 tokens), you're down to 119 exchanges. If the AI is giving long code responses (2,000+ tokens each), you're down to maybe 50-60 exchanges before early context starts dropping.
For complex technical conversations where you're iterating on code and architecture, you can hit the effective limit within 30-40 messages. That's one focused afternoon of work.
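The same arithmetic as a small script you can adapt. The constants mirror the numbers above; the word-to-token ratio is the rough ¾ heuristic, not a real tokenizer:

```python
# Budget arithmetic from the example above, as a sketch.

WINDOW = 128_000          # total context window (tokens)
OVERHEAD = 2_000          # system prompt + memory features
TOKENS_PER_WORD = 4 / 3   # ~3/4 of a word per token, inverted

def remaining_exchanges(pasted_words: int = 0, exchange_tokens: int = 1_000) -> int:
    """How many full exchanges fit before early context starts dropping."""
    pasted = round(pasted_words * TOKENS_PER_WORD)
    return (WINDOW - OVERHEAD - pasted) // exchange_tokens

print(remaining_exchanges())                        # plain back-and-forth: 126
print(remaining_exchanges(pasted_words=5_000))      # after a 5,000-word doc: 119
print(remaining_exchanges(exchange_tokens=2_200))   # long code responses: 57
```

Plug in your own model's window and your typical response length to see how fast your budget actually drains.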
Five strategies that actually work
1. Front-load context in a single message. Instead of trickling in requirements over 10 messages, consolidate everything the AI needs into one comprehensive prompt. Tech stack, constraints, goals, relevant code, all in message one. A single dense message costs fewer tokens than the same information spread across ten exchanges, and it means every response is grounded in the full picture from the start.
2. Start new conversations for new topics. Don't have a 200-message conversation that covers six different features. Start a fresh conversation for each distinct task. Each new conversation gets the full context window budget.
3. Summarize and restart. When a conversation starts degrading (the AI forgets things, contradicts itself, gives generic advice), ask it to summarize the key decisions and current state. Copy that summary, start a new conversation, and paste it in. You get a fresh context window with the accumulated knowledge compressed into a fraction of the tokens.
4. Use project/system prompts for persistent context. Most platforms now support some form of system-level instructions: ChatGPT's Custom Instructions, Claude's system prompt, project-level context. Put your stable information here (tech stack, coding conventions, project architecture) so it's automatically included in every conversation without you re-typing it.
5. Build a personal context document. Maintain a document that describes who you are, what you're building, and how you work. Copy and paste it at the start of important conversations. This is more manual than system prompts, but it gives you full control over what the AI knows about you, and it's portable across platforms.
This last strategy is what Helium's "My Context" feature automates: a structured profile document you copy into any LLM conversation, so the AI starts at your level instead of starting from zero. But even a well-maintained text file gets you 80% of the benefit.
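Strategy #3, summarize and restart, can be sketched as a small helper. `call_llm` here is a hypothetical placeholder for whatever chat-completion client you actually use; its name, signature, and canned return value are all assumptions for illustration:

```python
# Sketch of the "summarize and restart" workflow.

def call_llm(messages: list[dict]) -> str:
    """Hypothetical placeholder -- swap in your provider's chat call.
    Returns a canned summary here so the sketch is self-contained."""
    return "Decisions: use TypeScript. Constraint: the Postgres schema is frozen."

SUMMARIZE_PROMPT = (
    "Summarize this conversation for a fresh session: key decisions, "
    "current state of the code, constraints, and approaches already "
    "tried and rejected. Be specific; omit pleasantries."
)

def restart_with_summary(old_messages: list[dict]) -> list[dict]:
    """Compress a degrading conversation into the seed of a new one."""
    summary = call_llm(
        old_messages + [{"role": "user", "content": SUMMARIZE_PROMPT}]
    )
    # The new conversation starts with the compressed knowledge,
    # spending a fraction of the tokens the old history consumed.
    return [{"role": "user", "content": f"Context from a previous session:\n{summary}"}]
```

The same pattern works entirely by hand: ask for the summary in the chat UI, copy it, and paste it into a fresh conversation.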
The deeper problem: conversations are the wrong unit of knowledge
Context windows are a technical limitation. But the real issue is conceptual: conversations are ephemeral by design. They're great for exploration, bad for retention.
When you have a breakthrough insight in conversation 47, it exists only in that conversation. The next time you start a fresh chat (which you should, per strategy #2), that insight is gone unless you explicitly carried it over.
The developers who get the most from AI don't just have better prompts. They have systems for extracting knowledge from conversations and storing it outside the context window. The conversation is the workspace. The library is the product.
Whether that library is a notes app, a code snippet manager, or a dedicated tool, the principle is the same: anything worth keeping needs to leave the conversation before the context window eats it.