
Smaill

a small language model written and trained fully locally

start date: 10.02

Chaos-made dataset: 42373 characters total. Character set:
!"#%&'()+,-./0123456789:;?ABCDEFGHIJKLMNOPQRSTUVWXYZ[]_abcdefghijklmnopqrstuvwxyz~
vocab size: 83
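A character-level vocabulary like the one above is typically built by mapping each unique character to an integer id. This is a minimal sketch of that idea, not the repo's actual code; the names `stoi`, `itos`, `encode`, and `decode` and the toy `text` are illustrative assumptions.

```python
# Toy stand-in for the real dataset; the actual charset has 83 symbols.
text = "He walks home. She reads a book."

chars = sorted(set(text))                     # unique characters -> vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

print(len(chars), encode("home"))
```

Round-tripping `decode(encode(s))` should return `s` unchanged for any string made of in-vocabulary characters.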


v1 (repo):
no memory
vector size: 64
token length: 32
2000 training iterations

output example: wel. nng. Ul Jis.","Theresolivip paloop promeve bestimeatofrace,Sht,"Evetelili,"Ed mbode daninukitmy
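Given the v1 specs (no memory, 64-dim vectors, 32-token context), the model was presumably close to a per-character lookup: logits for the next character depend only on the current character's embedding. This is a hedged sketch under that assumption, not the repo's actual v1 code; the class name `TinyCharLM` is made up.

```python
import torch
import torch.nn as nn

VOCAB = 83   # vocab size from the charset above
EMB = 64     # "vector size: 64"
BLOCK = 32   # "token length: 32"

class TinyCharLM(nn.Module):
    # Context-free next-character predictor: each position's logits
    # are computed from that position's embedding alone ("no memory").
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.head = nn.Linear(EMB, VOCAB)

    def forward(self, idx):              # idx: (batch, time)
        return self.head(self.emb(idx))  # logits: (batch, time, VOCAB)

model = TinyCharLM()
x = torch.randint(0, VOCAB, (4, BLOCK))
logits = model(x)
print(logits.shape)  # torch.Size([4, 32, 83])
```

With no access to previous characters, sampled text degenerates into locally plausible but globally incoherent strings, consistent with the v1 output above.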

v2:
brain cells added: a little logic plus memory to get more meaningful output
batch_size = 64


v3:
no progress ,
ui added ,
opens via localhost



to run the code: streamlit run app.py
v4:
trying to reduce nonsensical randomness,
temperature alone didn't fix it, so top-k or top-p sampling might solve the issue,
vector size: 64,
block size: 32,
batch size: 32,
temp: 0.8
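Temperature and top-k sampling can be sketched as below. This is a generic implementation of the techniques named above, not code from the repo; the function name `sample_next` is an assumption. Temperature below 1.0 sharpens the next-character distribution, and top-k masks everything outside the k most likely characters before sampling (top-p would instead keep the smallest set of characters whose cumulative probability exceeds p).

```python
import torch
import torch.nn.functional as F

def sample_next(logits, temperature=0.8, top_k=None):
    # logits: (vocab,) raw scores for the next character
    logits = logits / temperature  # temperature < 1.0 sharpens the distribution
    if top_k is not None:
        v, _ = torch.topk(logits, top_k)          # k largest scores, descending
        logits = logits.masked_fill(logits < v[-1], float("-inf"))
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

logits = torch.tensor([2.0, 1.0, 0.1, -3.0])
idx = sample_next(logits, temperature=0.8, top_k=2)
print(idx)  # always 0 or 1: everything outside the top 2 is masked out
```

With `top_k=2` here, characters 2 and 3 get probability zero, so low-likelihood garbage can never be sampled, which is exactly the randomness problem v4 describes.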

v5:
vector size: 128,
block size: 128,
batch size: 64,
temp: 0.7

v6:
multi-head attention added,
dataset changed (simple English sentences: "He walks home." etc.),
feed-forward added: self.blocks = nn.Sequential(MultiHeadAttention(...), FeedForward(...)),
fixed a typo that broke weight loading in app.py: load_state_ditch → load_state_dict
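The `MultiHeadAttention` and `FeedForward` modules composed in that `nn.Sequential` could look like the sketch below. This is a standard causal-attention implementation under assumed sizes (128-dim embeddings, 4 heads), not the repo's actual classes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB, HEADS, BLOCK = 128, 4, 128  # v5-era sizes; the head count is a guess

class Head(nn.Module):
    # One head of masked (causal) self-attention: each position may
    # only attend to itself and earlier positions.
    def __init__(self, head_size):
        super().__init__()
        self.key = nn.Linear(EMB, head_size, bias=False)
        self.query = nn.Linear(EMB, head_size, bias=False)
        self.value = nn.Linear(EMB, head_size, bias=False)
        self.register_buffer("tril", torch.tril(torch.ones(BLOCK, BLOCK)))

    def forward(self, x):  # x: (batch, time, EMB)
        B, T, _ = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        att = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # scaled dot product
        att = att.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        return F.softmax(att, dim=-1) @ v

class MultiHeadAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleList(Head(EMB // HEADS) for _ in range(HEADS))
        self.proj = nn.Linear(EMB, EMB)

    def forward(self, x):
        return self.proj(torch.cat([h(x) for h in self.heads], dim=-1))

class FeedForward(nn.Module):
    # Position-wise MLP applied after attention mixes information.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(EMB, 4 * EMB), nn.ReLU(),
                                 nn.Linear(4 * EMB, EMB))

    def forward(self, x):
        return self.net(x)

blocks = nn.Sequential(MultiHeadAttention(), FeedForward())
x = torch.randn(2, BLOCK, EMB)
print(blocks(x).shape)  # torch.Size([2, 128, 128])
```

Attention gives the model the "memory" that v1 lacked: each character's representation now depends on the whole preceding context, which is why v6 output starts resembling the sentence-style dataset.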


why foods fly. 300. He brushes his teethere. 278. The sun feels hair. 86. We eat dish soft song. 298. T

27.02: training takes too much time due to hardware issues; trying to run it on GPU. Until that works, 23.02 is the last version.

v7 (01.03): runs on GPU, better output quality, increased batch size and head count; will try again with a bigger dataset
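Moving training to the GPU in PyTorch amounts to putting both the model and each batch on the same device. A minimal sketch, with an embedding layer standing in for the real model:

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Embedding(83, 64).to(device)                # stand-in for the model
batch = torch.randint(0, 83, (64, 32), device=device)  # data on the same device
out = model(batch)
print(device, out.shape)
```

Forgetting either `.to(device)` call raises a device-mismatch error, so it is easiest to route every tensor through one `device` variable.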

Vocabulary size: 83
Total tokens: 57224
Model parameters: 0.35M
Step 0: loss 4.7196 | Sample: 0pCel te2Nw&XnJrDOMa4z[qk3e#(g;cCg'Bnm!ltaY-u:HEY~
Step 200: loss 2.5153 | Sample: Grwavin Ithts ly. She bof 2287. ther t out inghero
(...)
Step 29600: loss 1.1040 | Sample: I buys of the plays witter. 426. She closes the st
Step 29800: loss 1.0925 | Sample: of Zepperonic (1994)" "Yo canclin shiki bird. 3142
Model trained & weights saved...


13.03: larger dataset, 30000 epochs (0.36M parameters)
Teleman. 80. The stering tall have young from and distand down of the lonce, and charp
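The training runs logged above follow the usual next-character objective: sample a batch of length-BLOCK windows, predict each window shifted by one character, and minimize cross-entropy. This is a generic sketch of that loop with random stand-in data and a trivially small model, not the repo's training script; the optimizer choice (AdamW) and learning rate are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, EMB, BLOCK, BATCH = 83, 64, 32, 32

# Random token ids standing in for the encoded dataset.
data = torch.randint(0, VOCAB, (10_000,))

def get_batch():
    ix = torch.randint(len(data) - BLOCK - 1, (BATCH,))
    x = torch.stack([data[i:i + BLOCK] for i in ix])
    y = torch.stack([data[i + 1:i + BLOCK + 1] for i in ix])  # shift-by-one targets
    return x, y

model = nn.Sequential(nn.Embedding(VOCAB, EMB), nn.Linear(EMB, VOCAB))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(200):  # the logs above run ~30,000 steps on the real data
    x, y = get_batch()
    logits = model(x)    # (BATCH, BLOCK, VOCAB)
    loss = F.cross_entropy(logits.view(-1, VOCAB), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")
```

On real text the loss drops well below the uniform baseline of ln(83) ≈ 4.42, matching the 4.72 → 1.09 trajectory in the v7 log; on the random stand-in data here it cannot, since there is no structure to learn.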
