toodles

Telegram × Gemini CLI — streamed responses, voice, file & photo sharing, local transcription

Quick Start · Features · Config · Architecture

A Telegram bot written in Rust that wraps gemini-cli, letting you chat with Gemini AI directly from Telegram — with real-time streaming, voice transcription, photo & file analysis, and per-topic session isolation.

✨ Features

	Feature	Details
💬	Real-time streaming	In-place draft updates while the model is generating, followed by a final formatted message
⏳	Instant feedback	Immediate startup placeholder (`Подключаю Gemini-сессию…`, Russian for "Connecting Gemini session…") on cold starts
🛑	Stop generation	Inline "🛑 Stop" button to cancel generation mid-stream
📝	Smart message splitting	Long responses auto-split into multiple Telegram messages at newline boundaries — no truncation
⚠️	Error feedback	Session startup and runtime errors are surfaced to the user (no silent failure)
📷	Photo analysis	Send photos (including albums) — batched via aggregator and analyzed by Gemini Vision
📄	Document handling	Send files (PDF, XLSX, etc.) — downloaded and forwarded to gemini-cli for processing
📎	File sharing	Gemini can send files back via the `ATTACH_FILE:` protocol
🧩	Message aggregation	Sequential messages within 1.5s are batched into a single prompt — handles albums, forwarded batches, and split messages
🔥	Warm session pool	Keeps prewarmed ACP sessions to reduce first-response latency (`WARM_SESSION_POOL_SIZE`)
♻️	Session startup retries	Automatic retry with backoff when ACP initialization fails transiently
🎙	Voice messages	Transcribed locally via Parakeet V3 or cloud via OpenAI Whisper
🧠	Local transcription	Offline, no API keys — NVIDIA Parakeet ONNX (int8, ~478 MB)
📌	Forum topics	Each Telegram topic gets an isolated gemini-cli session
🏷️	Thread auto-title	First message sets topic title; later updates use recent-context summaries
🔄	Session management	`/new` starts fresh, `/status` shows active count
🔒	Access control	Optional user allowlist via `ALLOWED_USER_IDS`
🖥️	macOS background service	launchd targets keep the bot running 24/7 with auto-restart
🧙	Setup wizard	Interactive `--setup` generates `.env` with guided prompts
🎨	Customisable prompt	System prompt configurable via `SYSTEM_PROMPT` in `.env`
✅	CI-gated	`check + fmt + clippy + test` on every push/PR

🚀 Quick Start

Prerequisites

Rust ≥ 1.70 — rustup.rs
gemini-cli — npm install -g @google/gemini-cli && gemini
Telegram bot token — @BotFather
ffmpeg — brew install ffmpeg (required for voice messages)
(Optional) OpenAI API key — for cloud Whisper fallback

Install & Run

git clone https://github.com/sleep3r/toodles
cd toodles

# Option A: Interactive setup wizard (recommended)
make setup

# Option B: Manual config
cp .env.example .env
$EDITOR .env

# Run
make run            # debug
make release        # optimized build
make run-release    # run optimized

# Optional: install as macOS launchd service (24/7)
make service-install

Run as macOS background service (launchd)

make service-install   # build release + install + start
make service-status    # check launchd state
make service-logs      # tail bot logs

service-install copies your project .env into ~/.config/toodles/service.env so launchd can read secrets consistently.

After code changes:

make service-update    # rebuild release + restart service

If you change .env, run make service-update to sync it into the service env file.

Stop / remove service:

make service-stop
make service-uninstall

Optional overrides (passed as Make variables):

make LAUNCHD_LABEL=com.alex.toodles service-install
make TOODLES_ENV_FILE=/path/to/.env service-install
make LAUNCHD_WORKDIR=/Users/alexander service-install

💬 How It Works

 ┌───────────┐        ┌──────────┐        ┌──────────────┐
 │ Telegram  │───────▶│ toodles  │───────▶│  gemini-cli  │
 │   user    │◀─ edit │  (Rust)  │◀─ pipe │  subprocess  │
 └───────────┘  msg   └──────────┘  stdout└──────────────┘

1. The user sends a message (text, photo, document, or voice).
2. Messages arriving within a 1.5s window are aggregated into one prompt (this handles albums and split messages).
3. On a cold start, a startup status is shown while the ACP session is created (or taken from the warm pool).
4. A draft placeholder with a 🛑 Stop button is attached and updated during generation.
5. The user can press Stop at any time; generation is cancelled via CancellationToken.
6. The final response is committed with Markdown→Telegram HTML formatting and a plain-text fallback.
7. Subsequent messages automatically reuse the same topic/chat session.
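
The 1.5s aggregation window can be pictured as a pure grouping over arrival times. The following is a minimal sketch, assuming a debounce rule in which a batch closes once the gap since the previous message exceeds the window. `Incoming`, `batch_by_gap`, and `to_prompt` are hypothetical names for illustration, not the API of `aggregator.rs`, which runs asynchronously against live updates.

```rust
use std::time::Duration;

/// Hypothetical incoming message: arrival offset plus text.
#[derive(Debug, Clone, PartialEq)]
struct Incoming {
    at: Duration,
    text: String,
}

/// Group messages into batches. A new batch starts whenever the gap to the
/// previous message exceeds `window`. Input must be sorted by arrival time.
fn batch_by_gap(messages: &[Incoming], window: Duration) -> Vec<Vec<Incoming>> {
    let mut batches: Vec<Vec<Incoming>> = Vec::new();
    let mut last_at: Option<Duration> = None;
    for msg in messages {
        let starts_new = match last_at {
            Some(prev) => msg.at.saturating_sub(prev) > window,
            None => true, // the very first message always opens a batch
        };
        if starts_new {
            batches.push(Vec::new());
        }
        batches
            .last_mut()
            .expect("a batch exists after the first message")
            .push(msg.clone());
        last_at = Some(msg.at);
    }
    batches
}

/// Join one batch into a single prompt, one message per line.
fn to_prompt(batch: &[Incoming]) -> String {
    batch
        .iter()
        .map(|m| m.text.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}
```

With a window of `Duration::from_millis(1500)`, messages arriving at 0 ms and 1000 ms land in one batch, while a message at 4000 ms opens a new one.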

🎙 Voice Transcription

toodles supports two transcription backends:

┌────────────────────┐     ┌──────────────┐     ┌───────────┐
│   Telegram Voice   │────▶│    ffmpeg     │────▶│ Parakeet  │──── text
│    (OGG Opus)      │     │  (16kHz f32)  │     │   V3 🦜   │
└────────────────────┘     └──────────────┘     └─────┬─────┘
                                                      │ fallback
                                                ┌─────▼─────┐
                                                │  OpenAI    │
                                                │ Whisper 🌐 │
                                                └───────────┘

Mode	Latency	Cost	Setup
Local (Parakeet V3)	~2-5s	Free	`--setup` downloads 478 MB model
Cloud (Whisper API)	~1-3s	~$0.006/min	Requires `OPENAI_API_KEY`

If both are enabled, local transcription is tried first with automatic cloud fallback.
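
The local-first, cloud-fallback order can be sketched as a loop over backends. This is an illustrative sketch only: the backends are plain closures and `transcribe_with_fallback` is a hypothetical name. Neither the Parakeet engine nor the Whisper API is modelled here.

```rust
/// Try each backend in order and return the name and transcript of the first
/// one that succeeds. If all fail, the error lists every backend's failure.
fn transcribe_with_fallback<'a>(
    audio: &[f32],
    backends: &[(&'a str, &dyn Fn(&[f32]) -> Result<String, String>)],
) -> Result<(&'a str, String), String> {
    let mut errors = Vec::new();
    for (name, backend) in backends {
        match backend(audio) {
            Ok(text) => return Ok((*name, text)),
            // Remember why this backend failed, then move on to the next one.
            Err(e) => errors.push(format!("{}: {}", name, e)),
        }
    }
    Err(format!("all backends failed: {}", errors.join("; ")))
}
```

Passing the local backend first and the cloud backend second reproduces the order described above. Enabling only one mode corresponds to a one-element slice.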

⚙️ Configuration

All configuration is managed through environment variables or .env:

# Required
TELEGRAM_BOT_TOKEN=123456:ABC-DEF...

# Access control (leave empty for unrestricted)
ALLOWED_USER_IDS=123456789,987654321

# Gemini CLI
GEMINI_CLI_PATH=gemini                # path to binary
GEMINI_CLI_COMMAND=gemini --acp       # optional full ACP command
GEMINI_WORKING_DIR=/path/to/project   # optional cwd
GEMINI_YOLO=true                      # optional auto-approve mode
DRAFT_MODE=verbose                    # compact | verbose draft UX
THREAD_RENAME_EVERY=4                 # 0 disables auto-rename
WARM_SESSION_POOL_SIZE=1              # 0 disables the warm session pool

# Optional: read additional settings from TOML
TOODLES_CONFIG=~/.config/toodles/config.toml

# System prompt — customise the bot's personality
SYSTEM_PROMPT=You are a helpful AI assistant. Keep answers concise.

# Voice — cloud (optional fallback)
OPENAI_API_KEY=sk-...

# Voice — local (recommended)
USE_LOCAL_TRANSCRIPTION=true
MODELS_DIR=~/.toodles/models

# Logging
RUST_LOG=info

💡 Tip: Run make setup to generate this interactively!
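
As an illustration of how a value like `ALLOWED_USER_IDS` might be interpreted, here is a minimal sketch. It assumes user IDs fit in `i64` and that an empty value means "unrestricted", as the comment in the example above suggests. `parse_allowed_user_ids` and `is_allowed` are hypothetical helpers, not functions from `config.rs`.

```rust
use std::collections::HashSet;

/// Parse a comma-separated allowlist such as "123456789,987654321".
/// Returns `None` for an empty value, read here as "unrestricted".
fn parse_allowed_user_ids(raw: &str) -> Result<Option<HashSet<i64>>, String> {
    let mut ids = HashSet::new();
    for part in raw.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let id = part
            .parse::<i64>()
            .map_err(|e| format!("invalid user id {:?}: {}", part, e))?;
        ids.insert(id);
    }
    Ok(if ids.is_empty() { None } else { Some(ids) })
}

/// With no allowlist everyone is allowed; otherwise only listed IDs are.
fn is_allowed(allowlist: &Option<HashSet<i64>>, user_id: i64) -> bool {
    match allowlist {
        Some(ids) => ids.contains(&user_id),
        None => true,
    }
}
```

The raw value could come from `std::env::var("ALLOWED_USER_IDS").unwrap_or_default()`. Rejecting malformed entries at startup, rather than silently skipping them, avoids an allowlist that is narrower than intended.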

Optional TOML config

You can also keep settings in ~/.config/toodles/config.toml:

bot_token = "123456:ABC-DEF..."
gemini_cli_command = "gemini --acp"
gemini_working_dir = "/path/to/project"
gemini_yolo = true
draft_mode = "verbose"
thread_rename_every = 4
warm_session_pool_size = 1

You can copy config.example.toml as a starting point.

🤖 Bot Commands

Command	Description
`/start`	Get started 👋
`/new`	Start fresh 🔄
`/status`	Bot status 📊
`/thread`	Create forum thread 🧵
`/help`	Show commands 💡

/thread works in forum-enabled supergroups where the bot has topic-management rights. You can call /thread from both the main chat and existing topics; Toodles creates a new topic in the same group. The first user message in a topic sets its initial title, then Toodles refreshes the title every THREAD_RENAME_EVERY messages using the recent message context.

🧯 Cold-Start Tuning

If the first response sometimes takes too long:

Set WARM_SESSION_POOL_SIZE=1 (or 2) to keep prewarmed ACP sessions ready.
Keep GEMINI_WORKING_DIR on a local SSD path (avoid slow network mounts).
Check bot logs for repeated ACP initialize retries; transient failures are retried automatically.
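
For readers unfamiliar with the pattern, retry with backoff looks roughly like the sketch below. It is a blocking, standard-library-only illustration. The attempt count, the initial delay, and the doubling factor are assumptions, and `retry_with_backoff` is a hypothetical name. The bot itself is asynchronous and its actual retry parameters are not documented here.

```rust
use std::thread;
use std::time::Duration;

/// Run `op` up to `max_attempts` times, doubling the delay after each failure.
/// Returns the first success, or the last error once attempts run out.
/// `op` receives the 1-based attempt number.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

Repeated retry lines in the logs correspond to the `Err(_)` branch firing several times in a row. If that happens on most cold starts, a warm pool hides the delay but does not remove the underlying cause.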

If /thread fails with "not enough rights to create a topic", grant the bot admin permission to manage topics.

📐 Architecture

src/
├── main.rs             — entry point, dispatcher, bot commands
├── config.rs           — Config from env + optional TOML (single gemini profile)
├── session.rs          — ACP session lifecycle + per-chat/topic session mapping
├── aggregator.rs       — message batching with debounce window + file guard ownership
├── telegram_api.rs     — raw Telegram API (sendMessageDraft), global HTTP client
├── setup.rs            — interactive setup wizard (--setup)
├── transcription.rs    — Parakeet V3 engine + model download
└── handlers/
    ├── mod.rs           — CancelRegistry, inline stop button, draft streaming, message splitting, Markdown→HTML
    ├── message.rs       — text message handler (with aggregation)
    ├── document.rs      — document/file handler (download + aggregate + query)
    ├── photo.rs         — photo handler (download + aggregate albums + query)
    └── voice.rs         — voice handler (transcribe → query)

Session lifecycle:

stateDiagram-v2
    [*] --> New: /new or first message
    New --> Ready: session created
    Ready --> Query: user message
    Query --> Placeholder: ⏳ + 🛑 Stop button
    Placeholder --> Streaming: line-by-line via BufReader
    Streaming --> Cancelled: user clicks 🛑
    Cancelled --> Ready: ⬛ Generation stopped
    Streaming --> Ready: response committed (Markdown)
    Ready --> [*]: /new (reset)
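
The diagram above can also be read as a transition table. The sketch below encodes it as a pure function. The `State` variants mirror the diagram, while the `Event` names and the `next` function are hypothetical and chosen for illustration. They are not types from `session.rs`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    New,
    Ready,
    Query,
    Placeholder,
    Streaming,
    Cancelled,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    SessionCreated,
    UserMessage,
    PlaceholderShown,
    StreamStarted,
    StopPressed,
    StopReported,
    ResponseCommitted,
    Reset,
}

/// Return the next state, or `None` if the diagram has no such transition.
fn next(state: State, event: Event) -> Option<State> {
    match (state, event) {
        (State::New, Event::SessionCreated) => Some(State::Ready),
        (State::Ready, Event::UserMessage) => Some(State::Query),
        (State::Query, Event::PlaceholderShown) => Some(State::Placeholder),
        (State::Placeholder, Event::StreamStarted) => Some(State::Streaming),
        (State::Streaming, Event::StopPressed) => Some(State::Cancelled),
        (State::Cancelled, Event::StopReported) => Some(State::Ready),
        (State::Streaming, Event::ResponseCommitted) => Some(State::Ready),
        // Interpretation: /new ends this session and a fresh one begins at New.
        (State::Ready, Event::Reset) => Some(State::New),
        _ => None,
    }
}
```

Returning `None` for unlisted pairs makes invalid sequences explicit, for example a user message arriving before the session exists.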

Each chat or forum topic maps to an isolated ACP session. Queries are serialised per session via tokio::sync::Mutex and a per-session queue. Startup uses retries and an optional warm pool (WARM_SESSION_POOL_SIZE) to reduce first-token latency. During generation, the bot updates one placeholder message (draft UX), supports inline cancellation via CancellationToken, and commits a final Markdown→Telegram HTML response with plain-text fallback. Long responses are split across multiple Telegram messages at newline boundaries. Sequential messages and photo albums are aggregated via a 1.5s debounce window. Temporary files (photos, documents) are kept alive via Arc<TempFileGuard> until the query completes.
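
Splitting at newline boundaries can be sketched as follows. This is an illustrative version, not the code in `handlers/mod.rs`: `split_message` is a hypothetical name, the limit is counted in Unicode scalar values, and a line longer than the limit is cut hard. Telegram caps message text at 4096 characters, so that would be the natural value for `limit`.

```rust
/// Split `text` into chunks of at most `limit` characters, preferring to break
/// right after a newline. A single line longer than `limit` is cut mid-line.
/// Concatenating the chunks yields the original text.
fn split_message(text: &str, limit: usize) -> Vec<String> {
    assert!(limit > 0, "limit must be positive");
    let mut chunks = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        // Byte offset just past the first `limit` characters (or end of text).
        let window_end = rest
            .char_indices()
            .nth(limit)
            .map_or(rest.len(), |(i, _)| i);
        if window_end == rest.len() {
            chunks.push(rest.to_string());
            break;
        }
        // Prefer the last newline inside the window; keep it with the left chunk.
        let cut = match rest[..window_end].rfind('\n') {
            Some(pos) => pos + 1,
            None => window_end,
        };
        chunks.push(rest[..cut].to_string());
        rest = &rest[cut..];
    }
    chunks
}
```

For example, `split_message("aaaa\nbbbb\ncccc", 7)` yields three chunks, one per line, because no two lines fit in seven characters together. Working on `char_indices` keeps every cut on a UTF-8 boundary, so multi-byte text never panics on slicing.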

🛠 Makefile

make help          # show all targets
make build         # debug build
make release       # optimized build
make run           # run (debug)
make run-release   # run (release)
make setup         # interactive setup wizard
make test          # run tests
make lint          # clippy
make fmt           # format code
make clean         # clean artifacts
make service-install   # install/start launchd service
make service-sync-env  # copy .env into launchd service env
make service-update    # rebuild + restart launchd service
make service-stop      # stop launchd service
make service-status    # print launchd status
make service-logs      # tail service logs
make service-uninstall # remove launchd service

📄 License

MIT — see LICENSE.
