lm

A small (enough) transformer model trained on the Tiny Shakespeare dataset.

[Image: Screenshot 2025-01-11 at 08.44.46.png]

Goals:

  • Understand the minimal configuration needed to make an autoregressive LM "work".

Implementations:

  • A simple ASCII tokenizer built from Python's ord() and chr().
  • Top-k sampling (see the sketch after this list).
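
A minimal sketch of those two pieces, assuming PyTorch; the function names encode, decode, and top_k_sample are hypothetical, not the repo's actual API:

    import torch

    def encode(text: str) -> list[int]:
        # Map each character to its ASCII/Unicode code point.
        return [ord(c) for c in text]

    def decode(token_ids: list[int]) -> str:
        # Map code points back to characters.
        return "".join(chr(i) for i in token_ids)

    def top_k_sample(logits: torch.Tensor, k: int = 40) -> int:
        # Keep only the k largest logits, renormalize, and sample one token id.
        values, indices = torch.topk(logits, k)
        probs = torch.softmax(values, dim=-1)
        choice = torch.multinomial(probs, num_samples=1)
        return int(indices[choice].item())

    # Round trip: "hi" -> [104, 105] -> "hi"
    assert decode(encode("hi")) == "hi"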

Notes:

  • Does a decreasing, converging training loss mean the implementation is correct or the model is successful?
    • No! There was a bug in the attention-mask calculation, yet the LM could still overfit the training data with that bug; the text it generated, however, was complete nonsense. A sketch of a correct causal mask follows this list.
  • Are position embeddings really necessary?
    • No! They are not mandatory for an autoregressive model to work, as long as a causal mask is used.
  • Most common implementation mistakes
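
A correct causal mask keeps each position from attending to future tokens. The snippet below is a minimal sketch assuming PyTorch scaled-dot-product attention, not the repo's exact code:

    import math
    import torch

    def causal_attention(q, k, v):
        # q, k, v: (batch, seq_len, head_dim)
        seq_len = q.size(1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        # Mask out positions j > i (future tokens) BEFORE the softmax;
        # a wrong mask can still let the model overfit the training data,
        # but generation degenerates to nonsense.
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return weights @ v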

References:

  • The code was written from scratch, but several bugs were found by comparing it against Andrej Karpathy's nanoGPT repo.
