Tokens ≠ words: what your context files actually cost

Ask a developer how long their CLAUDE.md is and they'll answer in lines or words. Ask the model and it answers in tokens — and tokens are the only count that matters, because every context window, every rate limit, and every API invoice is denominated in them.

What a token actually is

Most modern LLMs tokenize with byte-pair encoding (BPE) or a close variant: the vocabulary is built from the most frequent character sequences in the training data, not from dictionary words. Common words get one token; rarer words get sliced wherever the statistics say. In OpenAI's cl100k_base encoding, "the" is 1 token, but "optimization" is 2, split as optim + ization. Other model families use their own vocabularies, so exact counts differ, but the pattern holds.
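To make "sliced wherever the statistics say" concrete, here is a toy BPE trainer. It is a teaching sketch on a four-word corpus, not how cl100k was built: the function names, the corpus, and the rule of stopping once no pair occurs twice are all assumptions made for illustration.

```typescript
// Toy byte-pair-encoding trainer: a teaching sketch, not a real tokenizer.
type Merge = [string, string];

// Count how often each adjacent symbol pair occurs across all words.
function countPairs(words: string[][]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const symbols of words) {
    for (let i = 0; i < symbols.length - 1; i++) {
      const key = symbols[i] + "\u0000" + symbols[i + 1];
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return counts;
}

// Replace every adjacent occurrence of (left, right) with the merged symbol.
function applyMerge(symbols: string[], [left, right]: Merge): string[] {
  const out: string[] = [];
  let i = 0;
  while (i < symbols.length) {
    if (i < symbols.length - 1 && symbols[i] === left && symbols[i + 1] === right) {
      out.push(left + right);
      i += 2;
    } else {
      out.push(symbols[i]);
      i += 1;
    }
  }
  return out;
}

// Learn up to `numMerges` merges, always taking the most frequent pair.
// Ties go to the pair seen first; training stops when no pair repeats.
function trainBpe(corpus: string[], numMerges: number): Merge[] {
  let words = corpus.map((w) => Array.from(w));
  const merges: Merge[] = [];
  for (let step = 0; step < numMerges; step++) {
    let best = "";
    let bestCount = 0;
    for (const [key, n] of Array.from(countPairs(words))) {
      if (n > bestCount) {
        best = key;
        bestCount = n;
      }
    }
    if (bestCount < 2) break;
    const [left, right] = best.split("\u0000");
    const merge: Merge = [left, right];
    merges.push(merge);
    words = words.map((w) => applyMerge(w, merge));
  }
  return merges;
}

// Tokenize a word by replaying the learned merges in order.
function tokenize(word: string, merges: Merge[]): string[] {
  let symbols = Array.from(word);
  for (const merge of merges) symbols = applyMerge(symbols, merge);
  return symbols;
}

const demoMerges = trainBpe(["low", "low", "lower", "lowest"], 10);
console.log(tokenize("low", demoMerges));    // [ 'low' ]
console.log(tokenize("lowest", demoMerges)); // [ 'lowe', 's', 't' ]
```

The frequent word collapses into a single token, while the rarer one is left in pieces whose boundaries follow the corpus statistics rather than any notion of syllables or word roots.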

[Figure: the word "optimization" split into its two cl100k tokens, optim and ization (IDs 19680 and 2065).]

The exchange rate is counterintuitive in both directions. "internationalization" — 20 characters — is also just 2 tokens (international + ization), because both pieces are common in the training data. Meanwhile a single 🚀 emoji costs 3 tokens. Length in characters tells you almost nothing.
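Part of the emoji's cost comes from bytes rather than characters. Assuming a byte-level BPE such as cl100k_base, the tokenizer starts from UTF-8 bytes, so the byte count is the worst case for the token count, and rare byte sequences stay close to it. The helper names below are illustrative; the sketch only measures bytes and does not tokenize.

```typescript
// UTF-8 byte length of a single Unicode code point.
function utf8Length(codePoint: number): number {
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  return 4;
}

// Total UTF-8 bytes in a string, walking code points rather than UTF-16 units.
function utf8Bytes(text: string): number {
  let total = 0;
  for (const ch of Array.from(text)) {
    total += utf8Length(ch.codePointAt(0) ?? 0);
  }
  return total;
}

console.log(utf8Bytes("internationalization")); // 20
console.log(utf8Bytes("🚀"));                   // 4
```

Twenty common bytes merge down to 2 tokens, while the emoji's four uncommon bytes only merge down to 3.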

Formatting is where tokens go to die

Here is the same two-row roster as a markdown table versus a plain list, counted with cl100k:

Markdown table (20 tokens):

    | Name | Role |
    |------|------|
    | Ada | Engineer |
    | Lin | Designer |

Plain list (9 tokens):

    - Ada — Engineer
    - Lin — Designer

Pipes, separator rows, and alignment padding all cost tokens, and none of them carry information. The table costs 2.2× the list and says exactly the same thing. Multiply that across a CLAUDE.md with six tables and you've spent hundreds of tokens on decoration the model doesn't need.

It's worth saying clearly: tables aren't bad. When data is genuinely two-dimensional, a table is the right call. The waste is in tables used for things that are really lists, boilerplate phrases repeated in every section, and filler prose ("It is important to note that...") that adds tokens but no instruction.
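For a table that is really a list, the conversion is mechanical. This is a minimal sketch that assumes a simple pipe table with a header row and no escaped pipes inside cells; `tableToList` and `compareCost` are hypothetical helpers, and the token counter is passed in so the code does not depend on any particular tokenizer.

```typescript
// Convert a simple pipe table into a dash list, one item per data row.
// The header row is dropped, so use this only when the columns are self-explanatory.
function tableToList(table: string, separator = ": "): string {
  const rows = table
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("|"))
    .map((line) =>
      line
        .replace(/^\|/, "")
        .replace(/\|$/, "")
        .split("|")
        .map((cell) => cell.trim())
    );
  // A separator row looks like |------|:----:|, optionally with alignment colons.
  const isSeparator = (cells: string[]) => cells.every((c) => /^:?-+:?$/.test(c));
  const dataRows = rows.filter((cells, i) => i > 0 && !isSeparator(cells));
  return dataRows.map((cells) => "- " + cells.join(separator)).join("\n");
}

// Measure both forms with whatever counter the caller supplies.
function compareCost(
  table: string,
  count: (text: string) => number
): { table: number; list: number } {
  return { table: count(table), list: count(tableToList(table)) };
}

const roster = "| Name | Role |\n|------|------|\n| Ada | Engineer |\n| Lin | Designer |";
console.log(tableToList(roster));
// - Ada: Engineer
// - Lin: Designer
```

With js-tiktoken, a counter could be written as `(t) => enc.encode(t).length`, where `enc` comes from `getEncoding("cl100k_base")`. Exact counts depend on the separator and encoding you pick.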

The 5-pass optimizer

Every file MDPilot generates — and any markdown you paste into the optimizer — runs through five passes, each measured against a js-tiktoken baseline so you see exactly what was saved:

1. Tokenize + baseline: count before touching anything.
2. Boilerplate strip: filler phrases and empty-calorie sentences removed by rule.
3. Cross-file dedup: the same setup instructions in README.md and AGENTS.md? One canonical copy, referenced from the other.
4. Verbose compression: long-winded constructions rewritten tighter without changing meaning.
5. Line compression: structural whitespace and redundant separators collapsed.
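The rule-based passes are easy to picture in code. What follows is a rough sketch of the baseline, boilerplate-strip, and line-compression steps only, not MDPilot's actual implementation: the filler patterns, the `Pass` shape, and `optimize` are assumptions for illustration, and the counter is injected so any tokenizer can be plugged in. Cross-file dedup and verbose compression need more than regexes and are left out.

```typescript
type TokenCounter = (text: string) => number;
type Pass = { name: string; run: (text: string) => string };

// Hypothetical filler phrases; a real rule set would be longer and tuned.
const FILLER: RegExp[] = [
  /\bit is important to note that\s+/gi,
  /\bplease note that\s+/gi,
];

const passes: Pass[] = [
  {
    name: "boilerplate strip",
    // Removes the phrase only; it does not re-capitalize what follows.
    run: (text) => FILLER.reduce((acc, re) => acc.replace(re, ""), text),
  },
  {
    name: "line compression",
    run: (text) =>
      text
        .split("\n")
        .map((line) => line.replace(/[ \t]+$/, "")) // strip trailing whitespace
        .join("\n")
        .replace(/\n{3,}/g, "\n\n"), // collapse runs of blank lines to one
  },
];

// Count first, run each pass in order, and record the count after every step.
function optimize(text: string, count: TokenCounter) {
  const baseline = count(text);
  const report: { name: string; tokens: number }[] = [];
  let current = text;
  for (const pass of passes) {
    current = pass.run(current);
    report.push({ name: pass.name, tokens: count(current) });
  }
  const after = count(current);
  const savedPct = baseline === 0 ? 0 : ((baseline - after) / baseline) * 100;
  return { text: current, baseline, after, savedPct, report };
}
```

Calling `optimize(markdown, (t) => enc.encode(t).length)` with a js-tiktoken encoder would give per-pass numbers against the same baseline, which is what makes the savings attributable to a specific step.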

A typical result is a 20–40% reduction. On a 4,000-token context file, that's 800 to 1,600 tokens back: context that now holds more of your actual code instead of formatting overhead.

The counting happens in your browser with js-tiktoken — your markdown never leaves the page just to get measured. Try it on your own CLAUDE.md: the before/after number is usually convincing on its own.