Diffusion Language Model from Scratch

Language modeling by diffusion-style parallel token unmasking.

Designed and trained a diffusion-based language model from the ground up.

  • Replaced autoregressive decoding with confidence-based token unmasking for parallel generation (see the sampling sketch after this list).
  • Built the training pipeline with SwiGLU activations, tiktoken BPE tokenization, mask-token injection, and rotary position embeddings (RoPE); sketches of the training step and feed-forward block follow below.
  • Trained on TinyStories to benchmark coherence and generation behavior.
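
A minimal sketch of confidence-based parallel unmasking, assuming a `model(tokens)` callable that returns per-position logits over the vocabulary. The `MASK_ID`, sequence length, step count, and linear reveal schedule are illustrative assumptions, not necessarily the project's exact choices:

```python
import torch
import torch.nn.functional as F

MASK_ID = 50257  # hypothetical id reserved for the [MASK] token
SEQ_LEN = 64

@torch.no_grad()
def sample(model, num_steps: int = 8, device: str = "cpu") -> torch.Tensor:
    # Start from a fully masked sequence and reveal tokens over num_steps.
    tokens = torch.full((1, SEQ_LEN), MASK_ID, dtype=torch.long, device=device)
    for step in range(num_steps):
        logits = model(tokens)                      # (1, SEQ_LEN, vocab)
        probs = F.softmax(logits, dim=-1)
        conf, preds = probs.max(dim=-1)             # per-position confidence
        still_masked = tokens.eq(MASK_ID)
        conf = conf.masked_fill(~still_masked, -1.0)  # only masked slots compete

        # Linear schedule: cumulative target of revealed tokens grows each step.
        target_revealed = int(SEQ_LEN * (step + 1) / num_steps)
        n_reveal = max(1, target_revealed - int((~still_masked).sum()))
        idx = conf.topk(n_reveal, dim=-1).indices   # highest-confidence positions
        tokens.scatter_(1, idx, preds.gather(1, idx))
    return tokens
```

Because several positions are committed per step, generation takes `num_steps` forward passes instead of one pass per token as in autoregressive decoding.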
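
A minimal sketch of mask-token injection for the training side, assuming tiktoken's GPT-2 encoding with one extra id reserved for `[MASK]`. The uniform per-sequence masking rate is an assumption; schedules vary across diffusion LM papers:

```python
import torch
import torch.nn.functional as F
import tiktoken

enc = tiktoken.get_encoding("gpt2")
MASK_ID = enc.n_vocab  # reserve the next unused id as the mask token

def masked_training_step(model, batch: torch.Tensor) -> torch.Tensor:
    # batch: (B, T) clean token ids.
    rate = torch.rand(batch.size(0), 1, device=batch.device)  # per-sequence mask rate
    is_masked = torch.rand_like(batch, dtype=torch.float) < rate
    corrupted = batch.masked_fill(is_masked, MASK_ID)

    logits = model(corrupted)  # (B, T, vocab)
    # Cross-entropy only on masked positions: the model learns to fill them in.
    return F.cross_entropy(logits[is_masked], batch[is_masked])
```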
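
And a minimal sketch of the SwiGLU feed-forward block, the gated variant used here in place of a plain MLP; dimension names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)  # gate branch
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)    # value branch
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)  # project back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU(x) = (SiLU(x W_gate) * x W_up) W_down
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```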