Diffusion Language Model from Scratch

Language modeling by diffusion-style parallel token unmasking.

Designed and trained a diffusion-based language model from the ground up.

  • Replaced autoregressive decoding with confidence-based token unmasking for parallel generation (see the sampling sketch after this list).
  • Built the training pipeline with SwiGLU activations, tiktoken BPE tokenization, mask-token injection, and rotary position embeddings (RoPE); sketches of the training step and feed-forward block follow below.
  • Trained on TinyStories to benchmark coherence and generation behavior.
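
A minimal sketch of confidence-based parallel unmasking, assuming a `model(tokens)` callable that returns per-position logits over the vocabulary. The `MASK_ID`, sequence length, step count, and linear reveal schedule are illustrative assumptions, not necessarily the project's exact choices:

```python
import torch
import torch.nn.functional as F

MASK_ID = 50257  # hypothetical id reserved for the [MASK] token
SEQ_LEN = 64

@torch.no_grad()
def sample(model, num_steps: int = 8, device: str = "cpu") -> torch.Tensor:
    # Start from a fully masked sequence and reveal tokens over num_steps.
    tokens = torch.full((1, SEQ_LEN), MASK_ID, dtype=torch.long, device=device)
    for step in range(num_steps):
        logits = model(tokens)                      # (1, SEQ_LEN, vocab)
        probs = F.softmax(logits, dim=-1)
        conf, preds = probs.max(dim=-1)             # per-position confidence
        still_masked = tokens.eq(MASK_ID)
        conf = conf.masked_fill(~still_masked, -1.0)  # only masked slots compete

        # Linear schedule: cumulative target of revealed tokens grows each step.
        target_revealed = int(SEQ_LEN * (step + 1) / num_steps)
        n_reveal = max(1, target_revealed - int((~still_masked).sum()))
        idx = conf.topk(n_reveal, dim=-1).indices   # highest-confidence positions
        tokens.scatter_(1, idx, preds.gather(1, idx))
    return tokens
```

Because several positions are committed per step, generation takes `num_steps` forward passes instead of one pass per token as in autoregressive decoding.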
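
A minimal sketch of mask-token injection for the training side, assuming tiktoken's GPT-2 encoding with one extra id reserved for `[MASK]`. The uniform per-sequence masking rate is an assumption; schedules vary across diffusion LM papers:

```python
import torch
import torch.nn.functional as F
import tiktoken

enc = tiktoken.get_encoding("gpt2")
MASK_ID = enc.n_vocab  # reserve the next unused id as the mask token

def masked_training_step(model, batch: torch.Tensor) -> torch.Tensor:
    # batch: (B, T) clean token ids.
    rate = torch.rand(batch.size(0), 1, device=batch.device)  # per-sequence mask rate
    is_masked = torch.rand_like(batch, dtype=torch.float) < rate
    corrupted = batch.masked_fill(is_masked, MASK_ID)

    logits = model(corrupted)  # (B, T, vocab)
    # Cross-entropy only on masked positions: the model learns to fill them in.
    return F.cross_entropy(logits[is_masked], batch[is_masked])
```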
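
And a minimal sketch of the SwiGLU feed-forward block, the gated variant used here in place of a plain MLP; dimension names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)  # gate branch
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)    # value branch
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)  # project back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU(x) = (SiLU(x W_gate) * x W_up) W_down
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```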