Have you ever noticed that after a long conversation with an AI assistant, it suddenly seems to forget what you said earlier? This isn't a glitch—it's a fundamental limitation known as the context window.
Large language models (LLMs) like GPT-4, Claude, and Gemini can only process a fixed amount of text at once. This limit, called the context window, is measured in tokens (words or pieces of words). Everything the model sees on a given turn must fit inside that window, including your messages, its own earlier replies, and any instructions. When a conversation grows past the limit, the chat application typically drops (or summarizes) the oldest messages to make room for new input. It's like a whiteboard that can only hold so many notes: when it's full, the earliest notes are erased.
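The "erase the oldest notes" strategy can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product implements it. `count_tokens` and `fit_to_window` are hypothetical names, and token counts are approximated by splitting on whitespace, whereas real models use subword tokenizers.

```python
# Minimal sketch of sliding-window truncation: keep the most recent
# messages that fit in a fixed token budget and drop the oldest ones.
# Assumption: tokens are approximated by whitespace-separated words;
# real models use subword tokenizers, so actual counts will differ.


def count_tokens(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer."""
    return len(text.split())


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Return the newest messages whose combined token count fits max_tokens."""
    kept: list[str] = []
    total = 0
    # Walk backwards from the newest message.
    for message in reversed(messages):
        cost = count_tokens(message)
        if total + cost > max_tokens:
            break  # this message and everything older is "erased"
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    chat = [
        "My name is Ada and I live in Lisbon.",
        "I am planning a trip to Japan in April.",
        "What should I pack?",
    ]
    print(fit_to_window(chat, max_tokens=14))
```

With a 14-token budget, the oldest message (the one containing the user's name) no longer fits, so a model that receives only the trimmed history has no way to recall it. Some applications summarize dropped messages instead of discarding them outright.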
For example, a model with an 8,000-token context window can hold roughly the last 6,000 words of a chat, since a token averages about three-quarters of an English word. That is enough for short exchanges, but long discussions and large documents quickly overflow it, and the model appears to be “forgetting.” Recent models have pushed these limits much further, such as Anthropic's Claude with a 100K-token window and Google's Gemini with a 1-million-token window. Every window is still finite, though, so the core constraint remains.
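The token-to-word arithmetic in this example can be checked with a quick estimate. This sketch assumes the common rule of thumb of about 0.75 English words per token; the true ratio varies with the tokenizer and the language. `WORDS_PER_TOKEN` and `approx_words` are illustrative names, not part of any library.

```python
# Rough conversion from a context-window size in tokens to English words.
# Assumption: ~0.75 words per token, a rule of thumb for English text;
# the real ratio depends on the tokenizer and the language.
WORDS_PER_TOKEN = 0.75


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * words_per_token)


if __name__ == "__main__":
    # Window sizes mentioned in the text above.
    for window in (8_000, 100_000, 1_000_000):
        print(f"{window:>9,} tokens ~ {approx_words(window):>7,} words")
```

By this estimate, 8,000 tokens come to about 6,000 words, 100K tokens to about 75,000 words, and 1 million tokens to about 750,000 words.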
Understanding context windows is key to using AI tools effectively. It explains why you sometimes need to repeat important details, and why an AI assistant can't keep a perfect memory of a long conversation.