Context Windows Explained: Why Your LLM’s “Memory” Costs More Than You Think
How an LLM Actually Works

Strip away the chatbot UI and an LLM is doing something surprisingly mechanical.

Tokenization. Text goes in, and the first step breaks it into tokens — sub-word units the model was trained to recognize. A token averages roughly four characters, or about three-quarters of a word, in English. Code tokenizes less efficiently, closer to two tokens per word, because identifiers, operators, and whitespace are split differently. CJK languages can consume 2–8x more tokens than English for equivalent content. Different models use different tokenizers, so 1,000 tokens of GPT-5 input does not cover the same text as 1,000 tokens of Claude input.

Embedding. Each token is converted into a high-dimensional vector — typically 4,096 to 16,384 dimensions in modern frontier models.
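The rules of thumb above can be turned into a quick back-of-the-envelope estimator. This is a rough sketch using only the averages stated here (~4 characters per token for English prose, ~2 tokens per word for code); real counts require running the model's own tokenizer, and the function name is hypothetical:

```python
def estimate_tokens(text: str, kind: str = "prose") -> int:
    """Rough token estimate from the heuristics in the text.

    These are rules of thumb, not a real tokenizer: English prose
    averages ~4 characters per token; code runs closer to 2 tokens
    per word because identifiers and operators split differently.
    """
    if kind == "code":
        return 2 * len(text.split())
    return max(1, round(len(text) / 4))

prose = "The quick brown fox jumps over the lazy dog."
code = "for i in range(10): total += values[i] * weights[i]"
print(estimate_tokens(prose))          # 44 characters -> 11 tokens
print(estimate_tokens(code, "code"))   # 9 whitespace words -> 18 tokens
```

Useful for budgeting a context window before sending a request, but expect the true count from the provider's tokenizer to differ, sometimes substantially for code and non-English text.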
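The embedding step is mechanically just a table lookup: each token id indexes a row of a learned matrix. A minimal sketch with a toy table (the 5-token vocabulary and 8 dimensions are made-up illustration values; frontier models use vocabularies of tens of thousands of tokens and the thousands of dimensions mentioned above):

```python
import numpy as np

# Toy embedding table: 5 token ids, 8 dimensions each.
# Real models: ~100k-token vocabularies, 4,096-16,384 dimensions --
# but the mechanism is the same row lookup.
rng = np.random.default_rng(0)
vocab_size, dim = 5, 8
embedding_table = rng.standard_normal((vocab_size, dim)).astype(np.float32)

token_ids = [2, 0, 4]                 # output of a (hypothetical) tokenizer
vectors = embedding_table[token_ids]  # NumPy fancy indexing: one row per id
print(vectors.shape)                  # (3, 8): three tokens, eight dims each
```

The lookup itself is cheap; the cost comes later, when attention compares every token's vector against every other token in the window.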