
Smaill

a small language model written and trained fully locally

start date: 10.02

Chaos-made dataset: 42373 characters total. Character set:
!"#%&'()+,-./0123456789:;?ABCDEFGHIJKLMNOPQRSTUVWXYZ[]_abcdefghijklmnopqrstuvwxyz~
vocab size: 83
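A character-level vocabulary like the one above is typically built by mapping each unique character to an integer id. This is a minimal sketch of that idea, not the repo's actual code; the names `stoi`, `itos`, `encode`, and `decode` and the toy `text` are illustrative assumptions.

```python
# Toy stand-in for the real dataset; the actual charset has 83 symbols.
text = "He walks home. She reads a book."

chars = sorted(set(text))                     # unique characters -> vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

print(len(chars), encode("home"))
```

Round-tripping `decode(encode(s))` should return `s` unchanged for any string made of in-vocabulary characters.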


v1 (repo):
no memory
vector size: 64
token length: 32
2000 training iterations

output example: wel. nng. Ul Jis.","Theresolivip paloop promeve bestimeatofrace,Sht,"Evetelili,"Ed mbode daninukitmy
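Given the v1 specs (no memory, 64-dim vectors, 32-token context), the model was presumably close to a per-character lookup: logits for the next character depend only on the current character's embedding. This is a hedged sketch under that assumption, not the repo's actual v1 code; the class name `TinyCharLM` is made up.

```python
import torch
import torch.nn as nn

VOCAB = 83   # vocab size from the charset above
EMB = 64     # "vector size: 64"
BLOCK = 32   # "token length: 32"

class TinyCharLM(nn.Module):
    # Context-free next-character predictor: each position's logits
    # are computed from that position's embedding alone ("no memory").
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.head = nn.Linear(EMB, VOCAB)

    def forward(self, idx):              # idx: (batch, time)
        return self.head(self.emb(idx))  # logits: (batch, time, VOCAB)

model = TinyCharLM()
x = torch.randint(0, VOCAB, (4, BLOCK))
logits = model(x)
print(logits.shape)  # torch.Size([4, 32, 83])
```

With no access to previous characters, sampled text degenerates into locally plausible but globally incoherent strings, consistent with the v1 output above.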

v2:
brain cells added: a little logic plus memory to get more meaningful output
batch_size = 64


v3:
no progress ,
ui added ,
opens via localhost



to run the code: streamlit run app.py
v4:
trying to reduce nonsensical randomness,
temperature alone didn't fix it, so top-k or top-p sampling might solve the issue,
vector size: 64,
block size: 32,
batch size: 32,
temp: 0.8
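Temperature and top-k sampling can be sketched as below. This is a generic implementation of the techniques named above, not code from the repo; the function name `sample_next` is an assumption. Temperature below 1.0 sharpens the next-character distribution, and top-k masks everything outside the k most likely characters before sampling (top-p would instead keep the smallest set of characters whose cumulative probability exceeds p).

```python
import torch
import torch.nn.functional as F

def sample_next(logits, temperature=0.8, top_k=None):
    # logits: (vocab,) raw scores for the next character
    logits = logits / temperature  # temperature < 1.0 sharpens the distribution
    if top_k is not None:
        v, _ = torch.topk(logits, top_k)          # k largest scores, descending
        logits = logits.masked_fill(logits < v[-1], float("-inf"))
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

logits = torch.tensor([2.0, 1.0, 0.1, -3.0])
idx = sample_next(logits, temperature=0.8, top_k=2)
print(idx)  # always 0 or 1: everything outside the top 2 is masked out
```

With `top_k=2` here, characters 2 and 3 get probability zero, so low-likelihood garbage can never be sampled, which is exactly the randomness problem v4 describes.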

v5:
vector size: 128,
block size: 128,
batch size: 64,
temp: 0.7

v6:
multi-head attention added,
dataset changed (simple English sentences: "He walks home." etc.),
feed-forward added: self.blocks = nn.Sequential(MultiHeadAttention(...), FeedForward(...)),
fixed a typo that broke weight loading in app.py: load_state_ditch → load_state_dict
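The `MultiHeadAttention` and `FeedForward` modules composed in that `nn.Sequential` could look like the sketch below. This is a standard causal-attention implementation under assumed sizes (128-dim embeddings, 4 heads), not the repo's actual classes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB, HEADS, BLOCK = 128, 4, 128  # v5-era sizes; the head count is a guess

class Head(nn.Module):
    # One head of masked (causal) self-attention: each position may
    # only attend to itself and earlier positions.
    def __init__(self, head_size):
        super().__init__()
        self.key = nn.Linear(EMB, head_size, bias=False)
        self.query = nn.Linear(EMB, head_size, bias=False)
        self.value = nn.Linear(EMB, head_size, bias=False)
        self.register_buffer("tril", torch.tril(torch.ones(BLOCK, BLOCK)))

    def forward(self, x):  # x: (batch, time, EMB)
        B, T, _ = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        att = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # scaled dot product
        att = att.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        return F.softmax(att, dim=-1) @ v

class MultiHeadAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleList(Head(EMB // HEADS) for _ in range(HEADS))
        self.proj = nn.Linear(EMB, EMB)

    def forward(self, x):
        return self.proj(torch.cat([h(x) for h in self.heads], dim=-1))

class FeedForward(nn.Module):
    # Position-wise MLP applied after attention mixes information.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(EMB, 4 * EMB), nn.ReLU(),
                                 nn.Linear(4 * EMB, EMB))

    def forward(self, x):
        return self.net(x)

blocks = nn.Sequential(MultiHeadAttention(), FeedForward())
x = torch.randn(2, BLOCK, EMB)
print(blocks(x).shape)  # torch.Size([2, 128, 128])
```

Attention gives the model the "memory" that v1 lacked: each character's representation now depends on the whole preceding context, which is why v6 output starts resembling the sentence-style dataset.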


why foods fly. 300. He brushes his teethere. 278. The sun feels hair. 86. We eat dish soft song. 298. T

27.02: training takes too much time due to hardware issues; trying to run it on GPU. Until that works, 23.02 is the last version.

v7 (01.03): runs on GPU, better output quality, increased batch size and head count; will try again with a bigger dataset
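Moving training to the GPU in PyTorch amounts to putting both the model and each batch on the same device. A minimal sketch, with an embedding layer standing in for the real model:

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Embedding(83, 64).to(device)                # stand-in for the model
batch = torch.randint(0, 83, (64, 32), device=device)  # data on the same device
out = model(batch)
print(device, out.shape)
```

Forgetting either `.to(device)` call raises a device-mismatch error, so it is easiest to route every tensor through one `device` variable.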

Vocabulary size: 83
Total tokens: 57224
Model parameters: 0.35M
Step 0: loss 4.7196 | Sample: 0pCel te2Nw&XnJrDOMa4z[qk3e#(g;cCg'Bnm!ltaY-u:HEY~
Step 200: loss 2.5153 | Sample: Grwavin Ithts ly. She bof 2287. ther t out inghero
(...)
Step 29600: loss 1.1040 | Sample: I buys of the plays witter. 426. She closes the st
Step 29800: loss 1.0925 | Sample: of Zepperonic (1994)" "Yo canclin shiki bird. 3142
Model trained & weights saved...


13.03: larger dataset, 30000 epochs (0.36M parameters)
Teleman. 80. The stering tall have young from and distand down of the lonce, and charp
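The training runs logged above follow the usual next-character objective: sample a batch of length-BLOCK windows, predict each window shifted by one character, and minimize cross-entropy. This is a generic sketch of that loop with random stand-in data and a trivially small model, not the repo's training script; the optimizer choice (AdamW) and learning rate are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, EMB, BLOCK, BATCH = 83, 64, 32, 32

# Random token ids standing in for the encoded dataset.
data = torch.randint(0, VOCAB, (10_000,))

def get_batch():
    ix = torch.randint(len(data) - BLOCK - 1, (BATCH,))
    x = torch.stack([data[i:i + BLOCK] for i in ix])
    y = torch.stack([data[i + 1:i + BLOCK + 1] for i in ix])  # shift-by-one targets
    return x, y

model = nn.Sequential(nn.Embedding(VOCAB, EMB), nn.Linear(EMB, VOCAB))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(200):  # the logs above run ~30,000 steps on the real data
    x, y = get_batch()
    logits = model(x)    # (BATCH, BLOCK, VOCAB)
    loss = F.cross_entropy(logits.view(-1, VOCAB), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")
```

On real text the loss drops well below the uniform baseline of ln(83) ≈ 4.42, matching the 4.72 → 1.09 trajectory in the v7 log; on the random stand-in data here it cannot, since there is no structure to learn.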
