Telegram × Gemini CLI: streamed responses, voice, file & photo sharing, local transcription
Quick Start · Features · Config · Architecture
A Telegram bot written in Rust that wraps gemini-cli, letting you chat with Gemini AI directly from Telegram, with real-time streaming, voice transcription, photo & file analysis, and per-topic session isolation.
| Feature | Details |
|---|---|
| Real-time streaming | In-place draft updates while the model is generating, then a final formatted commit |
| Instant feedback | Immediate startup placeholder (`Подключаю Gemini-сессию…`, "Connecting Gemini session…") on cold starts |
| Stop generation | Inline "Stop" button to cancel generation mid-stream |
| Smart message splitting | Long responses auto-split into multiple Telegram messages at newline boundaries, with no truncation |
| Error feedback | Session startup and runtime errors are surfaced to the user (no silent failures) |
| Photo analysis | Send photos (including albums): batched via the aggregator and analyzed by Gemini Vision |
| Document handling | Send files (PDF, XLSX, etc.): downloaded and forwarded to gemini-cli for processing |
| File sharing | Gemini can send files back via the `ATTACH_FILE:` protocol |
| Message aggregation | Sequential messages within 1.5s are batched into a single prompt; handles albums, forwarded batches, and split messages |
| Warm session pool | Keeps prewarmed ACP sessions to reduce first-response latency (`WARM_SESSION_POOL_SIZE`) |
| Session startup retries | Automatic retry with backoff when ACP initialization fails transiently |
| Voice messages | Transcribed locally via Parakeet V3 or in the cloud via OpenAI Whisper |
| Local transcription | Offline, no API keys: NVIDIA Parakeet ONNX (int8, ~478 MB) |
| Forum topics | Each Telegram topic gets an isolated gemini-cli session |
| Thread auto-title | First message sets the topic title; later updates use recent-context summaries |
| Session management | `/new` starts fresh, `/status` shows the active session count |
| Access control | Optional user allowlist via `ALLOWED_USER_IDS` |
| macOS background service | launchd targets keep the bot running 24/7 with auto-restart |
| Setup wizard | Interactive `--setup` generates `.env` with guided prompts |
| Customisable prompt | System prompt configurable via `SYSTEM_PROMPT` in `.env` |
| CI-gated | check + fmt + clippy + test on every push/PR |
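The "smart message splitting" row can be illustrated with a small sketch. It assumes a character budget per message (Telegram's Bot API caps message text at 4096 characters) and prefers to cut just after the last newline that fits. The function name `split_message` is hypothetical, the sketch counts Unicode scalar values as a simplification, and the real logic in `handlers/mod.rs` may differ, for example by respecting HTML tag boundaries.

```rust
/// Split `text` into chunks of at most `max_chars` characters,
/// preferring to break just after a newline. Hypothetical helper,
/// not the project's actual implementation.
fn split_message(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    let mut chunks = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        // Byte offset where the first `max_chars` characters end.
        let window_end = rest
            .char_indices()
            .nth(max_chars)
            .map(|(idx, _)| idx)
            .unwrap_or(rest.len());
        if window_end == rest.len() {
            // Everything that is left fits into one message.
            chunks.push(rest.to_string());
            break;
        }
        // Prefer the last newline inside the window; otherwise hard-split.
        let cut = match rest[..window_end].rfind('\n') {
            Some(pos) if pos > 0 => pos + 1, // newline stays with the earlier chunk
            _ => window_end,
        };
        chunks.push(rest[..cut].to_string());
        rest = &rest[cut..];
    }
    chunks
}
```

Concatenating the returned chunks reproduces the input exactly, so nothing is truncated; a caller would send each chunk as its own Telegram message.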
- Rust ≥ 1.70 (rustup.rs)
- gemini-cli: `npm install -g @google/gemini-cli && gemini`
- Telegram bot token from @BotFather
- ffmpeg: `brew install ffmpeg` (required for voice messages)
- (Optional) OpenAI API key for the cloud Whisper fallback
git clone https://github.com/sleep3r/toodles
cd toodles
# Option A: Interactive setup wizard (recommended)
make setup
# Option B: Manual config
cp .env.example .env
$EDITOR .env
# Run
make run # debug
make release # optimized build
make run-release # run optimized
# Optional: install as macOS launchd service (24/7)
make service-install # build release + install + start
make service-status # check launchd state
make service-logs # tail bot logs

`service-install` copies your project `.env` into `~/.config/toodles/service.env` so launchd can read secrets consistently.
After code changes:
make service-update # rebuild release + restart service

If you change `.env`, run `make service-update` to sync it into the service env file.
Stop / remove service:
make service-stop
make service-uninstall

Optional overrides (passed as Make variables):
make LAUNCHD_LABEL=com.alex.toodles service-install
make TOODLES_ENV_FILE=/path/to/.env service-install
make LAUNCHD_WORKDIR=/Users/alexander service-install

Message flow: Telegram user → toodles (Rust) → gemini-cli subprocess. Replies travel back the same way: the subprocess's stdout is piped to toodles, which edits the Telegram message in place.
- User sends a message (text, photo, document, or voice)
- Messages are aggregated within a 1.5s window (handles albums and split messages)
- On a cold start, a startup status is shown while the ACP session is created (or taken from the warm pool)
- A draft placeholder with a Stop button is attached and updated during generation
- The user can press Stop at any time; generation is cancelled via `CancellationToken`
- The final response is committed with Markdown→Telegram HTML formatting and a plain-text fallback
- Subsequent messages reuse the same topic/chat session automatically
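The aggregation step in the list above behaves like a debounce: each new message restarts the quiet period, and the batch is released once nothing has arrived for the whole window. The sketch below models that with explicit timestamps so it stays synchronous and easy to test. The `Aggregator` type and its methods are hypothetical names for illustration; the real `aggregator.rs` presumably uses async timers instead.

```rust
use std::time::Duration;

/// Hypothetical batcher: collects messages and releases them as one prompt
/// once no new message has arrived for `window`.
struct Aggregator {
    window: Duration,
    pending: Vec<String>,
    /// Arrival time of the most recent message, measured from a fixed start.
    last_arrival: Option<Duration>,
}

impl Aggregator {
    fn new(window: Duration) -> Self {
        Aggregator { window, pending: Vec::new(), last_arrival: None }
    }

    /// Record a message that arrived at time `now`; this restarts the window.
    fn push(&mut self, now: Duration, text: &str) {
        self.pending.push(text.to_string());
        self.last_arrival = Some(now);
    }

    /// Return the combined prompt if the chat has been quiet for `window`.
    fn flush_if_quiet(&mut self, now: Duration) -> Option<String> {
        let last = self.last_arrival?;
        if now.saturating_sub(last) < self.window {
            return None; // still inside the debounce window
        }
        self.last_arrival = None;
        Some(std::mem::take(&mut self.pending).join("\n"))
    }
}
```

With a 1.5s window, two album photos arriving at 0s and 1s are released together at 2.5s, one full window after the second photo.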
toodles supports two transcription backends:

Telegram voice (OGG Opus) → ffmpeg (16 kHz f32) → Parakeet V3 (local) → text, with OpenAI Whisper (cloud) as the fallback.
| Mode | Latency | Cost | Setup |
|---|---|---|---|
| Local (Parakeet V3) | ~2-5s | Free | --setup downloads 478 MB model |
| Cloud (Whisper API) | ~1-3s | ~$0.006/min | Requires OPENAI_API_KEY |
If both are enabled, local transcription is tried first with automatic cloud fallback.
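The local-first policy with cloud fallback can be sketched as a function that tries each enabled backend in order and returns the first success. The `Backend` alias and `transcribe` function are hypothetical; the closures stand in for the Parakeet engine and the Whisper API call, neither of which is shown here.

```rust
/// Hypothetical backend signature: 16 kHz f32 samples in, text or error out.
type Backend<'a> = &'a dyn Fn(&[f32]) -> Result<String, String>;

/// Try the local backend first (if enabled), then the cloud backend.
/// Returns the last error when every enabled backend fails.
fn transcribe(
    audio: &[f32],
    local: Option<Backend<'_>>,
    cloud: Option<Backend<'_>>,
) -> Result<String, String> {
    let mut last_err = String::from("no transcription backend enabled");
    for backend in local.into_iter().chain(cloud) {
        match backend(audio) {
            Ok(text) => return Ok(text),
            Err(err) => last_err = err, // remember the failure, try the next backend
        }
    }
    Err(last_err)
}
```

Passing `None` for a backend models it being disabled, for example when `USE_LOCAL_TRANSCRIPTION` is off or `OPENAI_API_KEY` is unset.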
All configuration is managed through environment variables or .env:
# Required
TELEGRAM_BOT_TOKEN=123456:ABC-DEF...
# Access control (leave empty for unrestricted)
ALLOWED_USER_IDS=123456789,987654321
# Gemini CLI
GEMINI_CLI_PATH=gemini # path to binary
GEMINI_CLI_COMMAND=gemini --acp # optional full ACP command
GEMINI_WORKING_DIR=/path/to/project # optional cwd
GEMINI_YOLO=true # optional auto-approve mode
DRAFT_MODE=verbose # compact | verbose draft UX
THREAD_RENAME_EVERY=4 # 0 disables auto-rename
WARM_SESSION_POOL_SIZE=1 # 0 disables warm prewarmed pool
# Optional: read additional settings from TOML
TOODLES_CONFIG=~/.config/toodles/config.toml
# System prompt: customise the bot's personality
SYSTEM_PROMPT=You are a helpful AI assistant. Keep answers concise.
# Voice: cloud (optional fallback)
OPENAI_API_KEY=sk-...
# Voice: local (recommended)
USE_LOCAL_TRANSCRIPTION=true
MODELS_DIR=~/.toodles/models
# Logging
RUST_LOG=info

Tip: run `make setup` to generate this interactively!
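As an illustration of the `ALLOWED_USER_IDS` semantics described above (comma-separated IDs, empty meaning unrestricted), here is a minimal parsing sketch. The function names are hypothetical and the actual code in `config.rs` may handle malformed input differently; rejecting bad entries is an assumption of this sketch.

```rust
use std::collections::HashSet;

/// Parse an `ALLOWED_USER_IDS`-style value.
/// `Ok(None)` means "no restriction"; `Ok(Some(set))` is the allowlist.
fn parse_allowed_user_ids(raw: &str) -> Result<Option<HashSet<u64>>, String> {
    let mut ids = HashSet::new();
    for part in raw.split(',') {
        let part = part.trim();
        if part.is_empty() {
            continue; // tolerate empty values and trailing commas
        }
        let id = part
            .parse::<u64>()
            .map_err(|e| format!("invalid user id {:?}: {}", part, e))?;
        ids.insert(id);
    }
    Ok(if ids.is_empty() { None } else { Some(ids) })
}

/// Check a sender against the parsed allowlist.
fn is_allowed(allowlist: &Option<HashSet<u64>>, user_id: u64) -> bool {
    match allowlist {
        None => true, // unrestricted
        Some(ids) => ids.contains(&user_id),
    }
}
```

Failing fast on a malformed ID avoids silently running with a narrower or wider allowlist than intended.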
You can also keep settings in ~/.config/toodles/config.toml:
bot_token = "123456:ABC-DEF..."
gemini_cli_command = "gemini --acp"
gemini_working_dir = "/path/to/project"
gemini_yolo = true
draft_mode = "verbose"
thread_rename_every = 4
warm_session_pool_size = 1

You can copy config.example.toml as a starting point.
| Command | Description |
|---|---|
| /start | Get started |
| /new | Start fresh |
| /status | Bot status |
| /thread | Create forum thread |
| /help | Show commands |
/thread works in forum-enabled supergroups where the bot has topic-management rights.
You can call /thread from both the main chat and existing topics; Toodles creates a new topic in the same group.
The first user message in a topic sets its initial title, then Toodles refreshes the title every THREAD_RENAME_EVERY messages using the recent message context.
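One way to read the title-refresh rule is as a predicate over the running message count. The sketch below assumes messages are counted from 1 per topic, that the refresh fires whenever the count is a multiple of `THREAD_RENAME_EVERY`, and that a value of 0 disables only the periodic refresh while the first message still sets the initial title. All three points are assumptions; the source does not spell out the exact counting, and `should_rename` is a hypothetical name.

```rust
/// Decide whether the topic title should be (re)generated.
/// `message_count` is the 1-based number of user messages seen in the topic;
/// `rename_every` mirrors THREAD_RENAME_EVERY (0 = periodic refresh disabled).
fn should_rename(message_count: u32, rename_every: u32) -> bool {
    match (message_count, rename_every) {
        (0, _) => false,              // nothing received yet
        (1, _) => true,               // first message sets the initial title
        (_, 0) => false,              // periodic refresh disabled
        (n, every) => n % every == 0, // refresh on every `every`-th message
    }
}
```

With `THREAD_RENAME_EVERY=4`, this would retitle the topic on messages 1, 4, 8, 12, and so on.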
If the first response sometimes takes too long:
- Set `WARM_SESSION_POOL_SIZE=1` (or `2`) to keep prewarmed ACP sessions ready.
- Keep `GEMINI_WORKING_DIR` on a local SSD path (avoid slow network mounts).
- Check bot logs for repeated ACP initialize retries; transient failures are retried automatically.
If /thread fails with "not enough rights to create a topic", grant the bot admin permission to manage topics.
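The automatic ACP initialization retries mentioned in the tips above follow a common retry-with-backoff pattern. The sketch below is a generic, synchronous version: the attempt limit, the doubling delay, and the name `retry_with_backoff` are illustrative assumptions, and `thread::sleep` stands in for whatever async timer the bot actually awaits.

```rust
use std::thread;
use std::time::Duration;

/// Run `op` up to `max_attempts` times, doubling the delay after each failure.
/// `op` receives the 1-based attempt number. Hypothetical helper.
fn retry_with_backoff<T, E, F>(max_attempts: u32, base_delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "need at least one attempt");
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2); // exponential backoff
                attempt += 1;
            }
        }
    }
}
```

Repeated retry lines in the logs would correspond to the `Err(_)` branch firing several times before a session finally starts.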
src/
├── main.rs - entry point, dispatcher, bot commands
├── config.rs - Config from env + optional TOML (single gemini profile)
├── session.rs - ACP session lifecycle + per-chat/topic session mapping
├── aggregator.rs - message batching with debounce window + file guard ownership
├── telegram_api.rs - raw Telegram API (sendMessageDraft), global HTTP client
├── setup.rs - interactive setup wizard (--setup)
├── transcription.rs - Parakeet V3 engine + model download
└── handlers/
    ├── mod.rs - CancelRegistry, inline stop button, draft streaming, message splitting, Markdown→HTML
    ├── message.rs - text message handler (with aggregation)
    ├── document.rs - document/file handler (download + aggregate + query)
    ├── photo.rs - photo handler (download + aggregate albums + query)
    └── voice.rs - voice handler (transcribe → query)
Session lifecycle:
stateDiagram-v2
[*] --> New: /new or first message
New --> Ready: session created
Ready --> Query: user message
Query --> Placeholder: placeholder + Stop button
Placeholder --> Streaming: line-by-line via BufReader
Streaming --> Cancelled: user clicks Stop
Cancelled --> Ready: Generation stopped
Streaming --> Ready: response committed (Markdown)
Ready --> [*]: /new (reset)
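The state diagram above maps naturally onto a transition function. The following sketch encodes exactly those edges; the enum and event names are hypothetical, and modelling `/new` as a `Reset` event that returns to `New` is an assumption about how the reset edge is implemented.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State { New, Ready, Query, Placeholder, Streaming, Cancelled }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    SessionCreated,
    UserMessage,
    PlaceholderSent,
    OutputStarted,
    StopPressed,
    StopConfirmed,
    ResponseCommitted,
    Reset,
}

/// Return the next state, or `None` if the diagram has no such edge.
fn next_state(state: State, event: Event) -> Option<State> {
    match (state, event) {
        (State::New, Event::SessionCreated) => Some(State::Ready),
        (State::Ready, Event::UserMessage) => Some(State::Query),
        (State::Query, Event::PlaceholderSent) => Some(State::Placeholder),
        (State::Placeholder, Event::OutputStarted) => Some(State::Streaming),
        (State::Streaming, Event::StopPressed) => Some(State::Cancelled),
        (State::Cancelled, Event::StopConfirmed) => Some(State::Ready),
        (State::Streaming, Event::ResponseCommitted) => Some(State::Ready),
        (State::Ready, Event::Reset) => Some(State::New), // assumed: /new starts over
        _ => None,
    }
}
```

Returning `None` for unknown pairs makes invalid sequences, such as pressing Stop before a query exists, explicit rather than silently ignored.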
Each chat or forum topic maps to an isolated ACP session. Queries are serialised per session via tokio::sync::Mutex and a per-session queue. Startup uses retries and an optional warm pool (WARM_SESSION_POOL_SIZE) to reduce first-token latency. During generation, the bot updates one placeholder message (draft UX), supports inline cancellation via CancellationToken, and commits a final Markdown→Telegram HTML response with plain-text fallback. Long responses are split across multiple Telegram messages at newline boundaries. Sequential messages and photo albums are aggregated via a 1.5s debounce window. Temporary files (photos, documents) are kept alive via Arc<TempFileGuard> until the query completes.
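The `Arc<TempFileGuard>` idea from the paragraph above can be sketched with a small RAII type: the file is deleted when the last clone of the `Arc` is dropped, so a download stays on disk for as long as any in-flight query still holds a reference. The struct layout, `path` accessor, and `run_query` function are assumptions for illustration, not the project's actual definitions.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Deletes the wrapped file when dropped. Hypothetical stand-in for the
/// project's TempFileGuard.
struct TempFileGuard {
    path: PathBuf,
}

impl TempFileGuard {
    fn new(path: PathBuf) -> Self {
        TempFileGuard { path }
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempFileGuard {
    fn drop(&mut self) {
        // Best-effort cleanup; the file may already be gone.
        let _ = fs::remove_file(&self.path);
    }
}

/// Hypothetical query step: holds one clone of the guard while it reads the file.
fn run_query(file: Arc<TempFileGuard>) -> std::io::Result<usize> {
    let bytes = fs::read(file.path())?;
    Ok(bytes.len())
    // `file` is dropped here; the file is removed only if this was the last clone.
}
```

Sharing the guard through `Arc` lets the aggregator and the query task each hold a handle without agreeing on who deletes the file.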
make help # show all targets
make build # debug build
make release # optimized build
make run # run (debug)
make run-release # run (release)
make setup # interactive setup wizard
make test # run tests
make lint # clippy
make fmt # format code
make clean # clean artifacts
make service-install # install/start launchd service
make service-sync-env # copy .env into launchd service env
make service-update # rebuild + restart launchd service
make service-stop # stop launchd service
make service-status # print launchd status
make service-logs # tail service logs
make service-uninstall # remove launchd service

MIT, see LICENSE.
