Diffusion Language Model from Scratch
Diffusion-based language modeling via parallel token unmasking.
Designed and trained a diffusion-based language model from the ground up.
- Replaced autoregressive decoding with confidence-based token unmasking for parallel generation (see the decoding sketch after this list).
- Built the training pipeline with SwiGLU activations, tiktoken BPE encoding, mask-token injection, and RoPE (component sketches below).
- Trained on TinyStories to benchmark coherence and generation behavior.
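
A minimal sketch of what the confidence-based decoding loop could look like: start from an all-mask sequence and, at each round, commit only the highest-confidence predictions at still-masked positions. `MASK_ID`, `NUM_STEPS`, and the linear unmasking schedule here are illustrative assumptions, not the project's actual values.

```python
import torch

MASK_ID = 0      # hypothetical [MASK] token id (assumption)
NUM_STEPS = 8    # number of parallel unmasking rounds (assumption)

@torch.no_grad()
def diffusion_decode(model, seq_len: int, device: str = "cpu") -> torch.Tensor:
    """Start fully masked; each round, commit only the most confident predictions."""
    tokens = torch.full((1, seq_len), MASK_ID, dtype=torch.long, device=device)
    for step in range(NUM_STEPS):
        still_masked = tokens == MASK_ID
        if not still_masked.any():
            break
        logits = model(tokens)                        # (1, seq_len, vocab)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        conf = conf.masked_fill(~still_masked, -1.0)  # rank only masked slots
        # unmask a growing fraction of positions each round (linear schedule)
        target = seq_len * (step + 1) // NUM_STEPS
        k = max(1, target - int((~still_masked).sum()))
        idx = conf.topk(k, dim=-1).indices
        tokens.scatter_(1, idx, pred.gather(1, idx))
    return tokens
```

Because every position is predicted in each forward pass, a full sequence costs `NUM_STEPS` model calls rather than one call per token as in autoregressive decoding.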
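SwiGLU and RoPE are standard transformer components; a compact PyTorch sketch of both (dimensions and names are illustrative, and this RoPE variant rotates split halves of the feature dimension):

```python
import torch
import torch.nn as nn

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: W2(SiLU(x W1) * (x W3)), as in Shazeer (2020)."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden, bias=False)  # value projection
        self.w2 = nn.Linear(hidden, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(nn.functional.silu(self.w1(x)) * self.w3(x))

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (batch, seq, dim)."""
    _, s, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(s, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()             # (seq, dim/2)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```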
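Mask-token injection during training might look like the following: sample a corruption rate per sequence, hide that fraction of tokens behind `[MASK]`, and supervise only the hidden positions. The function name and `mask_id` are assumptions; the input ids would come from tiktoken's BPE encoder (e.g. `tiktoken.get_encoding("gpt2").encode(text)`).

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(model, tokens: torch.Tensor, mask_id: int) -> torch.Tensor:
    """One training step: corrupt tokens at a random rate, predict the originals."""
    b, s = tokens.shape
    rate = torch.rand(b, 1, device=tokens.device)          # per-sequence mask rate
    mask = torch.rand(b, s, device=tokens.device) < rate   # positions to hide
    corrupted = tokens.masked_fill(mask, mask_id)
    logits = model(corrupted)                              # (b, s, vocab)
    return F.cross_entropy(
        logits[mask],   # predictions at masked positions only
        tokens[mask],   # original tokens as targets
    )
end_of_sketch = None  # (sketch only; a real step would also skip all-unmasked batches)
```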