A locally operated AI chat with document processing, based on Ollama, Chainlit and Docling.
- On-Premise: Can be set up to work locally
- Document Processing: Supports PDF, DOCX, PPTX, XLSX, HTML, Markdown, and more; intelligent document conversion with options for layout and structure preservation (see the Docling sketch below)
- Flexible Configuration: Customizable models and parameters
- Lightweight: Few dependencies and easy setup
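Document conversion is handled by Docling. As a minimal sketch (assuming Docling is available in the project environment after setup; CLI flags may vary between Docling versions, see docling --help), you can convert a file to Markdown from the command line:
# Convert a PDF to Markdown (illustrative; output lands in the current directory)
uv run docling --to md document.pdf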
Install uv for environment management.
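For example, with the official standalone installer on Linux/macOS (see the uv documentation for other install methods):
# Download and run the uv installer
curl -LsSf https://astral.sh/uv/install.sh | sh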
Set up Ollama as your local LLM server:
# Install ollama (e.g. Linux)
curl -fsSL https://ollama.com/install.sh | sh
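# Optionally verify the installation and that the server is running
# (the Linux install script starts Ollama as a service; this assumes
# the default port 11434 – /api/tags lists locally available models):
ollama --version
curl http://localhost:11434/api/tags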
# Pull models, e.g.:
ollama pull hf.co/unsloth/Qwen3.5-0.8B-GGUF:q6_k
ollama pull hf.co/unsloth/Qwen3.5-35B-A3B-GGUF:q6_k
# Note: currently no Qwen3.5 GGUF from Unsloth works in Ollama due to separate mmproj vision files.
# See https://unsloth.ai/docs/models/qwen3.5#qwen3.5-35b-a3b
# Instead, pull the Qwen3.5 models from the Ollama library:
ollama pull qwen3.5:0.8b
ollama pull qwen3.5:35b
# https://docs.ollama.com/context-length
# To increase the default context size that Ollama uses, you can set the environment variable OLLAMA_CONTEXT_LENGTH. For example, to set it to 64k tokens:
export OLLAMA_CONTEXT_LENGTH=64000
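# Note: the export above only affects the current shell. If Ollama runs as a
# systemd service (the default with the Linux install script), set the variable
# on the service instead (sketch, assuming a systemd-based install):
sudo systemctl edit ollama
# ...then add these two lines in the editor that opens:
#   [Service]
#   Environment="OLLAMA_CONTEXT_LENGTH=64000"
sudo systemctl daemon-reload
sudo systemctl restart ollama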
# Or create an alias of your model with a larger context size (check the Ollama documentation if this is actually needed):
ollama show hf.co/unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF:Q6_K --modelfile > modelfile_qwen3
nano modelfile_qwen3
# To change the default context size, add this line at the end:
PARAMETER num_ctx 64000
# Then create a new model with the larger context size. This does not duplicate the model files themselves.
ollama create Qwen3-64k -f modelfile_qwen3
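# Optionally verify that the alias exists and picked up the parameter:
ollama list
ollama show Qwen3-64k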
# Set this model name in config.yaml to use it in the app.

Install and set up the app:
git clone https://github.com/machinelearningZH/ai-chat
cd ai-chat
uv sync
# Adjust the configuration
nano config.yaml
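# The exact keys depend on the shipped config.yaml; as an illustrative,
# hypothetical sketch only, you would point the app at the model created above:
#   model: Qwen3-64k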
# Also have a look at _core/constants.py for UI texts and prompts.
nano _core/constants.py
# Adjust the Chainlit configuration
nano ~/.chainlit/config.toml
# Make sure you disable telemetry by setting:
[telemetry]
enabled = false
# Start the app (opens in browser at http://localhost:8000):
uv run chainlit run chat.py
# Or set a specific port, enable watch and headless mode, and more:
# https://docs.chainlit.io/backend/command-line
uv run chainlit run chat.py -w -h --port 8501

We use this AI chat internally as a lightweight, local AI assistant with document-processing capabilities that we can operate on-premise. We like Chainlit for its simplicity and configurability. We have also experimented successfully with other frameworks like Open WebUI.
Our current go-to LLM for small on-premise servers is Qwen3.5-35B-A3B, which performs well for general-purpose tasks and works sufficiently well for the German language too.
Chantal Amrhein, Patrick Arnecke – Statistisches Amt Zürich: Team Data
We welcome feedback and contributions! Email us or open an issue or pull request.
We use ruff for linting and formatting.
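For example, to lint and format locally:
uv run ruff check .
uv run ruff format .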
This project is licensed under the MIT License. See the LICENSE file for details.
This software (the Software) incorporates open-source models (the Models) from providers like Ollama, Hugging Face, Docling and OpenAI. The app has been developed according to and with the intent to be used under Swiss law. Please be aware that the EU Artificial Intelligence Act (EU AI Act) may, under certain circumstances, be applicable to your use of the Software. You are solely responsible for ensuring that your use of the Software as well as of the underlying Models complies with all applicable local, national and international laws and regulations. By using this Software, you acknowledge and agree (a) that it is your responsibility to assess which laws and regulations, in particular regarding the use of AI technologies, are applicable to your intended use and to comply therewith, and (b) that you will hold us harmless from any action, claims, liability or loss in respect of your use of the Software.
