lilbee: a single-executable local search engine you can talk to, running every model on llama.cpp #24625

tobocop2 · 2026-06-14T20:47:21Z


Thanks for llama.cpp. It's the inference engine lilbee runs every local model on. lilbee is a local-first search engine you can talk to: it runs and manages its own models, indexes your files and code, crawls the websites you point it at into a searchable library, and answers with a citation to the source. The model manager is built on llama.cpp, so one program browses Hugging Face, pulls models, and runs them on Metal, Vulkan, or CUDA. Without llama.cpp there is no lilbee.

lilbee ships as a single executable that bundles its own llama.cpp build. Because Hugging Face hosts GGUF models for far more architectures than the pinned runtime supports at any moment, lilbee reads the architecture before downloading and tags incompatible models in the catalog, so you don't wait through a multi-GB pull only to hit "unsupported architecture" at load time. I'm now moving to a llama-server and llama-swap setup so the same single binary scales from a laptop to a multi-GPU machine. I recently verified MiniMax M2 split across three graphics cards, all managed by lilbee.
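For readers curious what an architecture pre-check can look like, here is a minimal sketch, not lilbee's actual code. It assumes a little-endian GGUF file of version 2 or later, where the header is the `GGUF` magic, a `uint32` version, a `uint64` tensor count, and a `uint64` metadata count, followed by key/value pairs, one of which is `general.architecture`. The names `read_gguf_architecture`, `is_supported`, and the `SUPPORTED_ARCHS` set are hypothetical; a real allow-list would have to come from the pinned llama.cpp build.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"
# Byte widths of the fixed-size GGUF metadata value types (type id -> size).
SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
TYPE_STRING, TYPE_ARRAY = 8, 9

# Hypothetical allow-list, for illustration only.
SUPPORTED_ARCHS = {"llama", "qwen2", "gemma"}


def _read(stream, fmt):
    """Read one little-endian scalar described by a struct format."""
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(stream):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _read(stream, "<Q")
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("truncated GGUF string")
    return data.decode("utf-8")


def _skip_value(stream, vtype):
    """Advance past a metadata value we are not interested in."""
    if vtype in SCALAR_SIZES:
        stream.read(SCALAR_SIZES[vtype])
    elif vtype == TYPE_STRING:
        _read_string(stream)
    elif vtype == TYPE_ARRAY:
        elem_type = _read(stream, "<I")
        count = _read(stream, "<Q")
        for _ in range(count):
            _skip_value(stream, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_architecture(stream):
    """Return the general.architecture string, or None if it is absent."""
    if stream.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    _read(stream, "<I")  # format version (assumed >= 2)
    _read(stream, "<Q")  # tensor count, unused here
    kv_count = _read(stream, "<Q")
    for _ in range(kv_count):
        key = _read_string(stream)
        vtype = _read(stream, "<I")
        if key == "general.architecture" and vtype == TYPE_STRING:
            return _read_string(stream)
        _skip_value(stream, vtype)
    return None


def is_supported(stream):
    """True when the file's architecture is in the (hypothetical) allow-list."""
    return read_gguf_architecture(stream) in SUPPORTED_ARCHS
```

Because the function only needs a readable binary stream, it could be fed the first bytes of a remote file wrapped in `io.BytesIO` instead of a full download; how many bytes are enough depends on how much metadata precedes the architecture key.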

Site: https://lilbee.sh. Repo: https://github.com/tobocop2/lilbee. Multi-GPU serving is here: tobocop2/lilbee#267
