Using LLMs to chat with PDFs.
- Use Llama
- Make better abstractions (see the interface sketch after this list):
  - Embeddings Model
  - Vector Store
  - LLM
- Better chunking
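
A minimal sketch of what those abstractions could look like, using Python protocols; every name here (`EmbeddingsModel`, `VectorStore`, `LLM`, `chunk_text`) is hypothetical and simply mirrors the diagram below.

```python
from typing import Protocol


class EmbeddingsModel(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]:
        """Map each text to a fixed-size vector."""
        ...


class VectorStore(Protocol):
    def save(self, texts: list[str], vectors: list[list[float]]) -> None:
        """Persist text chunks together with their embeddings."""
        ...

    def search(self, vector: list[float], k: int) -> list[str]:
        """Return the k stored chunks closest to the query vector."""
        ...


class LLM(Protocol):
    def generate(self, prompt: str) -> str:
        """Produce an answer for a fully assembled prompt."""
        ...


def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Naive fixed-size chunking with overlap; 'better chunking' would
    split on sentence or section boundaries instead."""
    step = size - overlap  # assumes overlap < size
    return [text[i : i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```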
```mermaid
graph LR
    em_model[Embeddings Model]
    vs[Vector Store]
    llm[LLM]
    q[Query]
    subgraph ingestion
        pdf[PDF] --> em_model --save--> vs
    end
    subgraph chat
        q --> em_model --search--> vs
        q --> llm
        vs --top k text extracts--> llm
        llm --> answer
    end
```
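
The flow above, wired end to end; a sketch assuming the hypothetical interfaces and `chunk_text` from the earlier block, plus pypdf for the extraction step.

```python
from pypdf import PdfReader


def ingest(pdf_path: str, em_model: EmbeddingsModel, vs: VectorStore) -> None:
    """Ingestion: PDF -> text -> chunks -> embeddings -> vector store."""
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    chunks = chunk_text(text)
    vs.save(chunks, em_model.embed(chunks))


def chat(query: str, em_model: EmbeddingsModel, vs: VectorStore, llm: LLM, k: int = 4) -> str:
    """Chat: embed the query, fetch the top-k extracts, answer with the LLM."""
    extracts = vs.search(em_model.embed([query])[0], k=k)
    prompt = "Context:\n" + "\n---\n".join(extracts) + f"\n\nQuestion: {query}\nAnswer:"
    return llm.generate(prompt)
```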
Notes:
The PDF text extraction is poor; the resulting text contains a lot of noise.
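
Some of that noise can be scrubbed with a cheap cleanup pass; a sketch that handles the usual suspects (hyphenated line breaks, hard wraps, runs of whitespace), not a fix for extraction quality itself.

```python
import re


def clean_extract(text: str) -> str:
    text = re.sub(r"-\n(\w)", r"\1", text)        # rejoin words hyphenated across lines
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)  # unwrap single newlines inside paragraphs
    text = re.sub(r"[ \t]+", " ", text)           # collapse runs of spaces and tabs
    return text.strip()
```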
Tuning parameters will be tricky: what are we optimizing for? Creating some tests would be nice. Simple example queries would do as tests, and observing the system's answers to them would be a good indicator of its performance.
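
A sketch of that kind of test: a handful of example queries paired with keywords the answer should mention. The queries, keywords, and the `answer_fn` hook are all hypothetical.

```python
# Hypothetical example queries and keywords a correct answer should contain.
EXAMPLES = [
    ("What dataset does the paper use?", ["ImageNet"]),
    ("What is the main contribution?", ["attention"]),
]


def evaluate(answer_fn) -> float:
    """Fraction of example queries whose answer mentions all expected keywords."""
    hits = 0
    for query, keywords in EXAMPLES:
        answer = answer_fn(query).lower()
        hits += all(kw.lower() in answer for kw in keywords)
    return hits / len(EXAMPLES)


# e.g. evaluate(lambda q: chat(q, em_model, vs, llm))
```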
The model seems to have knowledge of its own that it uses to answer questions even without the retrieved context. Whether to treat that prior knowledge as an additional source of information, or to suppress it, is an open question.
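
One way to control this is in the prompt itself; a sketch of a context-only policy (dropping the restriction gives the opposite policy, where prior knowledge is welcome).

```python
# Context-only policy: the model must not fall back on its own knowledge.
CONTEXT_ONLY_PROMPT = """\
Answer using ONLY the context below. If the context does not contain
the answer, reply "not found in the document".

Context:
{context}

Question: {question}
Answer:"""

# Usage: llm.generate(CONTEXT_ONLY_PROMPT.format(context=..., question=query))
```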
In total, two models will be used: one for the embeddings and one for text generation. Can a single transformer be used for both? A transformer does, after all, encode the text into hidden representations.
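
In principle yes: any transformer's hidden states can be pooled into embeddings, including those of the generation model itself. A sketch with Hugging Face transformers, mean-pooling the last hidden states of a small causal model (gpt2 is just an example, not a recommendation).

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModel.from_pretrained("gpt2")


def embed(texts: list[str]) -> torch.Tensor:
    """Mean-pool the last hidden states into one vector per text."""
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (batch, seq, dim)
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # ignore padding positions
```

Embeddings pooled from a raw causal LM are typically much weaker than those from a model trained with a contrastive objective, which is why the two-model setup is the common choice.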
Applications
- Research papers
- Financial documents
- News articles