Every animated demo on this page is a real forward pass through a small open-weight model (distilgpt2 or TinyLlama). All the intermediates are captured — tokens, embeddings, residual streams, attention patterns, logits, probabilities — and rendered as a step-by-step slideshow. No magic, no hand-waved arrows.
Built directly on Hugging Face transformers, so every layer's intermediates are observable.
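A minimal sketch of how those intermediates can be captured with transformers. To keep it runnable without a download, it uses a tiny randomly initialized GPT-2 config as a stand-in for distilgpt2; the `output_hidden_states` / `output_attentions` flags are the real API.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny random stand-in for distilgpt2 (assumption: the demos load the real
# pretrained weights; random weights keep this sketch download-free).
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=100)
model = GPT2LMHeadModel(config)
model.eval()

input_ids = torch.tensor([[5, 17, 42]])  # three arbitrary token ids
with torch.no_grad():
    out = model(input_ids, output_hidden_states=True, output_attentions=True)

print(len(out.hidden_states))   # n_layer + 1 residual-stream snapshots -> 3
print(out.attentions[0].shape)  # (batch, heads, seq, seq) -> (1, 2, 3, 3)
print(out.logits.shape)         # -> (1, 3, 100)
```

Every tensor the slideshows render — embeddings, per-layer residual streams, attention maps, logits — comes out of one such forward call.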
A 6-layer GPT-2 base model. Watch the residual stream evolve layer by layer and the LM head score the final residual state against all 50,257 vocabulary rows to pick "April".
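The final step is just one dot product per vocabulary row. A toy NumPy sketch with made-up sizes (distilgpt2's are 768 and 50,257) — the residual state is planted near row 7's embedding, so row 7 should win:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 64, 20        # toy sizes; distilgpt2 uses 768 and 50,257
W_E = rng.standard_normal((vocab, d_model))  # tied token-embedding matrix

# Pretend the final residual state landed close to token 7's embedding row
h_final = W_E[7] + 0.01 * rng.standard_normal(d_model)

logits = W_E @ h_final                        # one score per vocab row
probs = np.exp(logits - logits.max())
probs /= probs.sum()                          # softmax over the vocabulary
print(int(np.argmax(logits)))                 # -> 7: the best-aligned row wins
```

GPT-2 ties the LM head to the embedding matrix, which is why the pick can be read as "which embedding row best matches the final residual state".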
A chat-tuned Llama with RoPE positional encoding and untied LM-head weights. Same pipeline — different scale, different architectural choices.
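RoPE's key property in one toy sketch: positions are encoded by rotating each pair of query/key dimensions, so the q·k score depends only on the position *offset*, not absolute positions. All names and sizes here are illustrative:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate each consecutive pair of dims of x by a position-scaled angle."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)    # one frequency per dim pair
    theta = pos * freqs
    x1, x2 = x[0::2], x[1::2]
    return np.stack([x1 * np.cos(theta) - x2 * np.sin(theta),
                     x1 * np.sin(theta) + x2 * np.cos(theta)],
                    axis=-1).ravel()

q = np.array([1.0, 0.0, 1.0, 0.0])
k = np.array([1.0, 0.0, 1.0, 0.0])
# Same offset (4) at different absolute positions gives the same score
a = rope(q, 3) @ rope(k, 7)
b = rope(q, 10) @ rope(k, 14)
print(np.isclose(a, b))   # -> True
```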
Zoom into one transformer block. Watch a real attention head compute scaled Q·Kᵀ scores, morph through softmax, then mix V. Then the FFN expands 768 → 3072, applies GELU, and contracts back. Real numbers, animated.
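The same two computations in toy NumPy form (sizes shrunk from the demo's 768 → 3072; weights are random, not the model's):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gelu(x):  # tanh approximation, as used by GPT-2
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(1)
seq, d_head, d_model, d_ff = 4, 16, 64, 256  # toy; the demo uses 768 -> 3072

# One attention head: scaled Q.K^T scores, causal mask, softmax, mix V
Q, K, V = (rng.standard_normal((seq, d_head)) for _ in range(3))
scores = Q @ K.T / np.sqrt(d_head)
scores[np.triu(np.ones((seq, seq), dtype=bool), k=1)] = -np.inf  # no peeking
weights = softmax(scores)          # each row sums to 1
head_out = weights @ V             # weighted average of value vectors

# FFN: expand, nonlinearity, contract
x = rng.standard_normal(d_model)
W1 = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
W2 = rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff)
ffn_out = gelu(x @ W1) @ W2
print(weights.shape, head_out.shape, ffn_out.shape)
```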
Same deep-dive on TinyLlama. Bigger numbers (2048 → 5632 SwiGLU FFN), grouped-query attention (32 query heads, 4 KV heads), the SiLU activation curve. RoPE rotation explained for the chosen head.
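The two TinyLlama-specific pieces, sketched with toy random weights (real sizes in the comments): the SwiGLU FFN, where a SiLU-gated branch multiplies an "up" branch elementwise, and grouped-query attention, where each KV head is shared by a group of query heads.

```python
import numpy as np

def silu(x):                      # SiLU ("swish"): x * sigmoid(x)
    return x / (1 + np.exp(-x))

rng = np.random.default_rng(2)
d_model, d_ff = 32, 88            # toy; TinyLlama uses 2048 -> 5632

# SwiGLU FFN: gate branch (SiLU) elementwise-multiplies the up branch
x = rng.standard_normal(d_model)
W_gate = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
W_up   = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
W_down = rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff)
ffn_out = (silu(x @ W_gate) * (x @ W_up)) @ W_down

# Grouped-query attention: each KV head serves a group of query heads
n_q_heads, n_kv_heads = 8, 2      # toy; TinyLlama uses 32 and 4
group = n_q_heads // n_kv_heads   # query heads per KV head
kv = rng.standard_normal((n_kv_heads, 16))
kv_expanded = np.repeat(kv, group, axis=0)  # one KV row per query head
print(ffn_out.shape, kv_expanded.shape)
```

Sharing KV heads shrinks the KV cache by the group factor (8× for TinyLlama) while keeping the full count of query heads.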
2D projection of every vocab token's embedding row. Months cluster, days cluster, animals cluster — without any explicit semantics, just learned co-occurrence.
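A sketch of how such a 2D projection can be made, assuming plain PCA via SVD (the demo's actual projection method isn't specified here). The embedding matrix is faked with two planted clusters to stand in for the real 50,257 × 768 table:

```python
import numpy as np

rng = np.random.default_rng(3)
vocab, d_model = 50, 16          # toy stand-in for the real 50,257 x 768

# Fake embedding rows with two planted clusters sharing offset directions
E = rng.standard_normal((vocab, d_model))
E[:10]   += 5 * rng.standard_normal(d_model)   # cluster A (e.g. "months")
E[10:20] += 5 * rng.standard_normal(d_model)   # cluster B (e.g. "days")

# PCA: center, take the top-2 right singular vectors, project every row
centered = E - E.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ Vt[:2].T
print(coords.shape)              # -> (50, 2): one 2D point per vocab token
```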
Same idea on a different model. The clusters look different: TinyLlama uses a SentencePiece tokenizer with its own separately trained vocabulary.