small language model written and trained fully locally
start date: 10.02
"Chaos made" dataset, character total: 42373
characters:
!"#%&'()+,-./0123456789:;?ABCDEFGHIJKLMNOPQRSTUVWXYZ[]_abcdefghijklmnopqrstuvwxyz~
vocab size: 83
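A minimal sketch of how a character vocab like this is typically built and used for encoding/decoding (a guess at the approach; `text` and the filename are illustrative, not from the repo):

text = open('dataset.txt').read()               # hypothetical filename
chars = sorted(set(text))                       # 83 unique characters
stoi = {ch: i for i, ch in enumerate(chars)}    # char -> token id
itos = {i: ch for ch, i in stoi.items()}        # token id -> char
encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: ''.join(itos[i] for i in ids)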
v1 (repo):
no memory
vector size: 64
token length: 32
2000 training iterations
output example: wel. nng. Ul Jis.","Theresolivip paloop promeve bestimeatofrace,Sht,"Evetelili,"Ed mbode daninukitmy
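A plausible minimal reading of the "no memory" v1 model: each position predicts the next character from its own 64-dim embedding alone, with no context mixing (a sketch under that assumption; the class name is made up):

import torch.nn as nn

class NoMemoryLM(nn.Module):                    # hypothetical name
    def __init__(self, vocab_size=83, n_embd=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, n_embd)
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx):
        # (B, T) token ids -> (B, T, vocab_size) logits; each position sees only itself
        return self.head(self.embed(idx))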
v2:
brain cells added - a little logic + memory to get meaningful output
batch_size = 64
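How batches of size 64 are usually drawn in a char-level training loop like this (a sketch; `data` is assumed to be the encoded dataset as a 1-D tensor, block size matching v1's token length):

import torch

def get_batch(data, block_size=32, batch_size=64):
    # random start offsets, then input/target windows shifted by one character
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x, y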
v3:
no progress on output quality
UI added
opens via localhost
to run the code: streamlit run app.py
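What the Streamlit wrapper could look like (a sketch only; `generate` is assumed to be a function the repo exposes, and the widget labels are invented):

import streamlit as st

st.title("tiny char-level LM")                  # illustrative title
prompt = st.text_input("Prompt", "The ")
if st.button("Generate"):
    out = generate(prompt, max_new_tokens=100)  # `generate` assumed from the repo
    st.write(out)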
v4:
trying to solve the nonsensical randomness
temperature was already tried, so top-k or top-p sampling might solve the issue (see the sampling sketch below)
vector size: 64
block size: 32
batch size: 32
temp: 0.8
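A sketch of temperature + top-k sampling over next-token logits (top-p would instead cut the sorted cumulative probability mass at p; names here are illustrative):

import torch
import torch.nn.functional as F

def sample_next(logits, temperature=0.8, top_k=20):
    # logits: (vocab_size,) raw scores for the next character
    logits = logits / temperature
    v, _ = torch.topk(logits, top_k)
    logits[logits < v[-1]] = float('-inf')      # drop everything outside the top k
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)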
v5:
vector size: 128
block size: 128
batch size: 64
temp: 0.7
v6:
multi-head attention added
dataset changed (simple English sentences: "He walks home", etc.)
feed-forward added: self.blocks = nn.Sequential(MultiHeadAttention(...), FeedForward(...)) (a block sketch follows below)
there was a mistake while loading the weights in app.py (fixed):
load_state_ditch → load_state_dict

why foods fly. 300. He brushes his teethere. 278. The sun feels hair. 86. We eat dish soft song. 298. T
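A sketch of the kind of attention + feed-forward block v6 describes, using torch's built-in nn.MultiheadAttention in place of the repo's own MultiHeadAttention class (an assumption; the real signatures, head count, and width may differ):

import torch
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        return self.net(x)

class Block(nn.Module):
    # attention mixes information across positions; feed-forward transforms each position
    def __init__(self, n_embd=128, n_head=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ff = FeedForward(n_embd)
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)

    def forward(self, x):
        T = x.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=causal)  # causal self-attention
        x = x + a
        return x + self.ff(self.ln2(x))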
27.02: training takes too much time due to hardware issues; trying to run it on GPU. Until then, the 23.02 version is the last one.
v7 (01.03): runs on GPU, better output quality, increased batch size and head count, will try again with a bigger dataset
Vocabulary size: 83
Total tokens: 57224
Model parameters: 0.35M
Step 0: loss 4.7196 | Sample: 0pCel te2Nw&XnJrDOMa4z[qk3e#(g;cCg'Bnm!ltaY-u:HEY~
Step 200: loss 2.5153 | Sample: Grwavin Ithts ly. She bof 2287. ther t out inghero
(...)
Step 29600: loss 1.1040 | Sample: I buys of the plays witter. 426. She closes the st
Step 29800: loss 1.0925 | Sample: of Zepperonic (1994)" "Yo canclin shiki bird. 3142
Model trained & weights saved...
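The usual pattern behind the GPU switch mentioned in v7 (a sketch; `model`, `x`, `y` are placeholders for the repo's own objects):

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)            # move parameters once, before training
x, y = x.to(device), y.to(device)   # move every batch inside the training loop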
13.03: larger dataset, 30000 epochs (0.36M parameters)
Teleman. 80. The stering tall have young from and distand down of the lonce, and charp
