Note
This doc is generated by AI because what else can we use
ledoxide is a specialized HTTP server that implements a Vision-Language Model (VLM) based bookkeeping and expense extraction workflow. It is pull-based: clients submit a task and then poll for its result. Its primary goal is to process images of receipts, invoices, or screenshots of transaction records (e.g., social media purchase notifications) and autonomously extract structured billing data: a description (notes), the exact monetary amount, and an appropriate expense category.
The server delegates model execution to an Ollama daemon through the ollama-rs API.
The application is containerized and readily available via Docker.
Run Ollama separately, then point ledoxide at that Ollama instance with OLLAMA_HOST. The value must be in host:port form, without an http:// or https:// scheme.
On Docker Desktop, host.docker.internal:11434 usually reaches Ollama running on the host:
docker run -p 3100:3100 \
  -e OLLAMA_HOST="host.docker.internal:11434" \
  -e AUTH_KEY="your_secret_bearer_token" \
  zhufucdev/ledoxide:latest

On Linux, you can also run the container on the host network and use the default 127.0.0.1:11434 Ollama endpoint:
docker run --network host \
  -e AUTH_KEY="your_secret_bearer_token" \
  zhufucdev/ledoxide:latest

Add this repo as a flake input and use the provided service.
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    ledoxide = {
      url = "github:zhufucdev/ledoxide";
      inputs.nixpkgs.follows = "nixpkgs";
    };
  };
  outputs =
    inputs@{
      self,
      nixpkgs,
      ledoxide,
      ...
    }:
    {
      nixosConfigurations.functionaltux = nixpkgs.lib.nixosSystem {
        modules = [
          # Nix list elements are separated by whitespace, so an inline
          # module function is parenthesized and takes no trailing comma.
          (
            { ... }:
            {
              services.ledoxide = {
                enable = true;
                authKey = "your_secret_bearer_token";
                extraEnv = "OLLAMA_HOST=127.0.0.1:11434";
              };
            }
          )
          ledoxide.nixosModules.ledoxide # omit this if you only want the standalone package!
          ledoxide.nixosModules.package
        ];
      };
    };
}

| Variable | Description |
|---|---|
| `AUTH_KEY` | Used as the Bearer token to protect endpoints. If not provided via flag or env var, a random key is generated and logged on startup. |
| `OLLAMA_HOST` | Ollama endpoint in host:port form. Defaults to 127.0.0.1:11434. Do not include a URL scheme. |
| `RUST_LOG` | Set to debug to enable verbose application logging, including prompts and Ollama responses. |

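
A malformed OLLAMA_HOST value is an easy mistake to make, so it can help to check it before starting the server. The following is a minimal Python sketch of such a preflight check. The function name `parse_ollama_host` is hypothetical, and this is not ledoxide's own parsing logic; it only illustrates the host:port rule described above.

```python
import os


def parse_ollama_host(value: str) -> tuple[str, int]:
    """Split a host:port string and reject values that carry a URL scheme."""
    if "://" in value:
        raise ValueError("OLLAMA_HOST must not include a URL scheme")
    # rpartition keeps bracketed IPv6 hosts such as [::1] intact.
    host, sep, port = value.rpartition(":")
    if not sep or not host or not (port.isascii() and port.isdigit()):
        raise ValueError("OLLAMA_HOST must be in host:port form")
    port_number = int(port)
    if not 0 < port_number < 65536:
        raise ValueError("port must be between 1 and 65535")
    return host, port_number


def ollama_host_from_env() -> tuple[str, int]:
    """Read OLLAMA_HOST, falling back to the documented default."""
    return parse_ollama_host(os.environ.get("OLLAMA_HOST", "127.0.0.1:11434"))
```

Calling `ollama_host_from_env()` in a deployment script surfaces a bad value early, instead of at the first failed model pull.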
When running natively or overriding the Docker command, the following arguments are supported:
- `-b, --bind <BIND>`: The address to bind to (default: 127.0.0.1:3100).
- `-a, --auth-key <AUTH_KEY>`: Bearer token for protected endpoints. If omitted, `AUTH_KEY` is read from the environment or a random key is generated.
- `-c, --categories <CATEGORIES>`: A list of valid categories for expenses (defaults include Groceries, Transport, Rent, Entertainment, Shopping, Drink, and Food).
- `--max-concurrency <N>`: Maximum number of concurrent Ollama task runners (default: 4).
- `--max-memory-size <N>`: Number of finished task records to keep in memory before swapping older records to disk (default: 468,000).
- `--large-model`: Use the larger Ollama model configuration (`gemma4:26b` instead of `gemma4:e4b`).
- `--model-timeout-minutes <MINS>`: Time before an inactive model is evicted from RAM/VRAM to save resources (default: 5).
- `--offline`: Do not ask Ollama to pull or create models on startup; requires the configured models to already exist in Ollama.
The server exposes a simple REST API:
- `GET /`: Returns the server package name and version string.
- `POST /create_task`: Accepts a `multipart/form-data` payload containing an image file or zip archive (key: `image`) and optionally `lm_options`, `vlm_options`, and `categories` JSON fields. Requires: `Authorization: Bearer <AUTH_KEY>` header. Returns: A JSON `TaskControlBlock` containing a unique task ID, indicating the task is pending.
- `GET /get_task/{task_id}`: Checks the status of a specific task by ID. Requires: `Authorization: Bearer <AUTH_KEY>` header. Returns: The task state (`pending`, `running`, or `finished`). If `finished`, it includes the extracted structured data: `notes`, `amount`, and `category`.
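
The submit-then-poll flow of these endpoints could be sketched in Python with only the standard library, as below. The endpoint paths, the `image` form key, and the Bearer header come from the list above. The JSON field names are assumptions, because the response schema is not documented here: `state` is a hypothetical key for the task state, and `id` in the usage comment is a hypothetical key for the task ID. All function names are illustrative.

```python
import json
import time
import urllib.request
import uuid


def build_multipart(field, filename, payload, content_type="application/octet-stream"):
    """Encode one file field as multipart/form-data; returns (body, Content-Type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def create_task_request(base_url, auth_key, image, filename="receipt.jpg"):
    """Build the POST /create_task request with the image under the `image` key."""
    body, content_type = build_multipart("image", filename, image, "image/jpeg")
    return urllib.request.Request(
        f"{base_url}/create_task",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {auth_key}", "Content-Type": content_type},
    )


def get_task_request(base_url, auth_key, task_id):
    """Build the GET /get_task/{task_id} request."""
    return urllib.request.Request(
        f"{base_url}/get_task/{task_id}",
        headers={"Authorization": f"Bearer {auth_key}"},
    )


def send_json(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_until_finished(fetch, state_key="state", interval=2.0, attempts=30):
    """Call fetch() repeatedly until the task reports `finished`.

    `state_key` is an assumed field name; adjust it to the real response.
    """
    for _ in range(attempts):
        task = fetch()
        if task.get(state_key) == "finished":
            return task
        time.sleep(interval)
    raise TimeoutError("task did not finish in time")


# Usage sketch (needs a running server; "id" is a hypothetical field name):
#   base, key = "http://127.0.0.1:3100", "your_secret_bearer_token"
#   with open("receipt.jpg", "rb") as f:
#       created = send_json(create_task_request(base, key, f.read()))
#   result = wait_until_finished(
#       lambda: send_json(get_task_request(base, key, created["id"])))
```

Passing the fetch step as a callable keeps the polling loop independent of the transport, which makes it easy to exercise without a live server.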
- Architecture: The application is written in Rust, leveraging `tokio` for its async runtime and `axum` for HTTP routing.
- Inference API: It uses `ollama-rs` to call an external Ollama daemon. Ollama owns model downloads, quantization, GPU/CPU execution, and model residency.
- Structured Output: Extraction requests use Ollama structured JSON formats backed by Rust schemas to keep notes, amount, and category parsing strict.
- Model Pipeline: The default pipeline uses `gemma4:e4b` for captioning and extraction. With `--large-model`, both stages use `gemma4:26b`.
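
A client that consumes finished tasks may want to apply the same strictness to the three extracted fields. The Python sketch below is one way to do that. It is not the server's Rust schema. It assumes `amount` arrives as a JSON number or numeric string, which this document does not specify, and it uses the default categories listed for `--categories`, which may not be exhaustive. `Extraction` and `parse_extraction` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation

# Defaults named in the CLI documentation; the real list may be longer.
DEFAULT_CATEGORIES = (
    "Groceries", "Transport", "Rent", "Entertainment", "Shopping", "Drink", "Food",
)


@dataclass(frozen=True)
class Extraction:
    notes: str
    amount: Decimal
    category: str


def parse_extraction(raw: dict, categories=DEFAULT_CATEGORIES) -> Extraction:
    """Validate the notes, amount, and category fields of a finished task."""
    notes = raw.get("notes")
    if not isinstance(notes, str):
        raise ValueError("notes must be a string")

    category = raw.get("category")
    if category not in categories:
        raise ValueError(f"unknown category: {category!r}")

    amount_raw = raw.get("amount")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(amount_raw, bool) or not isinstance(amount_raw, (int, float, str)):
        raise ValueError("amount must be a number or numeric string")
    try:
        # Going through str() avoids binary floating-point artifacts.
        amount = Decimal(str(amount_raw))
    except InvalidOperation as exc:
        raise ValueError("amount is not numeric") from exc
    if not amount.is_finite():
        raise ValueError("amount must be finite")

    return Extraction(notes=notes, amount=amount, category=category)
```

Using `Decimal` rather than `float` is a design choice for monetary values: it keeps amounts such as 4.10 exact when they are later summed.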
- Model Memory Timeout: To preserve system RAM and GPU VRAM, `ledoxide` unloads inactive Ollama models after the configurable timeout period (default 5 minutes). Ollama reloads them on the next request.
- Task Swapping: To prevent the server's memory from bloating with historical task data over long uptimes, the internal `Scheduler` implements an on-disk swap queue. When the in-memory finished queue exceeds `--max-memory-size` (default: 468,000 items), older finished tasks are serialized using `postcard` and flushed to a temporary swap file on disk. The `/get_task` endpoint streams over both active memory and the disk swap seamlessly.
- Model Pulling: Unless `--offline` is set, startup checks Ollama for the configured models and pulls or creates them when missing. Ollama manages its own model storage.
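
The task-swapping idea can be illustrated with a small Python sketch. This is not the actual `Scheduler`: it substitutes JSON lines for `postcard`, runs single-threaded, and uses the hypothetical class name `FinishedTaskStore`. It only shows the general shape of a bounded in-memory queue that spills its oldest records to a swap file and consults both on lookup.

```python
import json
from collections import OrderedDict


class FinishedTaskStore:
    """Keep at most `max_memory_size` finished tasks in memory; spill older ones to disk."""

    def __init__(self, max_memory_size: int, swap_path: str):
        self.max_memory_size = max_memory_size
        self.swap_path = swap_path
        self.memory = OrderedDict()  # insertion order doubles as age order

    def put(self, task_id: str, result: dict) -> None:
        self.memory[task_id] = result
        # Evict the oldest records once the in-memory limit is exceeded.
        while len(self.memory) > self.max_memory_size:
            old_id, old_result = self.memory.popitem(last=False)
            with open(self.swap_path, "a", encoding="utf-8") as swap:
                swap.write(json.dumps({"id": old_id, "result": old_result}) + "\n")

    def get(self, task_id: str):
        # Check memory first, then scan the swap file line by line.
        if task_id in self.memory:
            return self.memory[task_id]
        try:
            with open(self.swap_path, encoding="utf-8") as swap:
                for line in swap:
                    record = json.loads(line)
                    if record["id"] == task_id:
                        return record["result"]
        except FileNotFoundError:
            pass  # nothing has been swapped out yet
        return None
```

Because the swap file in this sketch is append-only and never compacted, it grows without bound, which is the same trade-off this document notes for finished tasks that are never removed.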
- Task Removal: Finished tasks remain in memory or the on-disk swap file indefinitely. There is currently no API to "delete" or "acknowledge" a task to free its disk footprint once retrieved. Over extreme uptimes on busy servers, the swap file could grow continuously.
- Ollama Availability: `ledoxide` expects Ollama to be reachable before tasks are created. If `OLLAMA_HOST` points at the wrong address or the daemon is down, model pulls and task execution will fail.
- Model Availability: The default model is `gemma4:e4b`; `--large-model` uses `gemma4:26b`. If these models are not available from your Ollama registry or local store, pre-create compatible models or run with models already present and `--offline`.