The model's embedding matrix is a giant table: one row per token in its vocabulary,
and each row is a vector of fixed length, the model's embedding dimension. When the tokenizer turns your
prompt into integer ids, the model uses each id as a row index into this table to
pull out one row. That's literally it — no math yet. Just a lookup.
Embedding matrix stats: shape (vocab size × embedding dim) · total memory at fp32 · one row per vocabulary entry (a sample of rows shown above)
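For a concrete sense of scale, here is the arithmetic with GPT-2's published dimensions (50,257 tokens × 768 dims), which may differ from the model shown here. At fp32, every value costs 4 bytes.

```python
# GPT-2-sized example (50,257 x 768 at fp32); substitute your model's numbers.
vocab_size, embed_dim, bytes_per_value = 50_257, 768, 4
size_mib = vocab_size * embed_dim * bytes_per_value / 2**20
print(f"{size_mib:.0f} MiB")  # ~147 MiB for the embedding table alone
```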
Section 2 · The neighborhood (PCA · 2D)
Plotting a sample of tokens after squashing the full embedding space down to 2
dimensions via PCA. The two axes are the directions of greatest
spread across the full vocabulary (PC1 and PC2, each capturing a share of
the total variance). Tokens with similar meanings often end up near each other;
this is learned structure, not designed.
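A sketch of that projection using scikit-learn, with random vectors standing in for real embedding rows (the sizes and data are placeholders, not this page's actual values):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for real embedding rows: 1,000 "tokens", 64 dims each.
# Real embeddings have learned structure, so PC1/PC2 capture much more
# variance than they would for random data like this.
embeddings = rng.standard_normal((1_000, 64)).astype(np.float32)

pca = PCA(n_components=2)
coords = pca.fit_transform(embeddings)  # (1000, 2): one (PC1, PC2) point per token
pc1, pc2 = pca.explained_variance_ratio_ * 100
print(f"PC1 = {pc1:.1f}%, PC2 = {pc2:.1f}% of total variance")
```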