Production-grade multi-tenant conversational AI platform — LLM-powered free-text understanding, declarative YAML state machines, one isolated container per client, pluggable connector framework with circuit breakers and retries.
Each client runs its own container from the same image. Configuration alone determines behavior — adding a new tenant requires zero code changes and zero redeployment.
┌──────────────────────────────────────────────────────┐
│                    CONTROL PLANE                     │
│                  (shared VM, :8001)                  │
│                                                      │
│  Tenant & Identity · Flow Authoring · LLM Knowledge  │
│ Task Scheduler · Observability · Tenant Orchestrator │
│                                                      │
│                PostgreSQL (shared DB)                │
└───────────────────────┬──────────────────────────────┘
                        │ boot-time config pull
            ┌───────────┼───────────┐
            ▼           ▼           ▼
       ┌─────────┐ ┌─────────┐ ┌─────────┐
       │Tenant A │ │Tenant B │ │Tenant C │
       │:8002    │ │:8003    │ │...      │
       │         │ │         │ │         │
       │Channel  │ │Channel  │ │Channel  │
       │Adapter  │ │Adapter  │ │Adapter  │
       │    ↓    │ │    ↓    │ │    ↓    │
       │Bot      │ │Bot      │ │Bot      │
       │Engine   │ │Engine   │ │Engine   │
       │ ↙     ↘ │ │ ↙     ↘ │ │ ↙     ↘ │
       │FSM   LLM│ │FSM   LLM│ │FSM   LLM│
       │    ↓    │ │    ↓    │ │    ↓    │
       │Connector│ │Connector│ │Connector│
       │Registry │ │Registry │ │Registry │
       │+ Sched. │ │+ Sched. │ │+ Sched. │
       └─────────┘ └─────────┘ └─────────┘
       DATA PLANE: same image, different config
| Plane | Instances | Responsibility |
|---|---|---|
| Control Plane | 1 shared | Manages tenants, flows, credentials, LLM knowledge, scheduler visibility, observability |
| Data Plane | 1 per client | Receives messages, runs the FSM, invokes connectors, generates responses |
Boot-time config pull. Containers are stateless relative to the Control Plane. At startup each container fetches its full configuration (flow, connector bindings, LLM knowledge, encrypted credentials) via an authenticated API call. No runtime coupling between planes — a CP outage doesn't take down live bots.
SQLite per container. Conversation state and the scheduler's job store live in a SQLite volume scoped to the container. Cross-tenant data access is impossible by construction.
Embedded scheduler. Task scheduling (reminders, recurring jobs) runs inside each Data Plane container via APScheduler + SQLite jobstore. No CP→container network round-trip per dispatch; jobs survive container restarts.
Hexagonal ports throughout. ChannelAdapter, ConnectorPort, StateStorePort, and SchedulerPort are abstract interfaces. Swapping WhatsApp for Telegram, or Google Calendar for Outlook, is an adapter swap — zero engine changes.
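As a rough illustration of the port idea combined with the per-container SQLite store, here is a minimal sketch. The method names (`load`, `save`), the table schema, and the `advance` helper are assumptions made for this example, not the project's actual interfaces.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Optional

Snapshot = tuple[str, dict[str, Any]]  # (current_state, data)


class StateStorePort(ABC):
    """Port the engine depends on. Method names are assumed for this sketch."""

    @abstractmethod
    def load(self, contact_id: str) -> Optional[Snapshot]: ...

    @abstractmethod
    def save(self, contact_id: str, state: str, data: dict[str, Any]) -> None: ...


class InMemoryStore(StateStorePort):
    """Test-style adapter: keeps everything in a dict."""

    def __init__(self) -> None:
        self._rows: dict[str, Snapshot] = {}

    def load(self, contact_id: str) -> Optional[Snapshot]:
        return self._rows.get(contact_id)

    def save(self, contact_id: str, state: str, data: dict[str, Any]) -> None:
        self._rows[contact_id] = (state, dict(data))


class SQLiteStore(StateStorePort):
    """Production-style adapter: one SQLite file per container (schema assumed)."""

    def __init__(self, path: str = ":memory:") -> None:
        # In a container, path would point into the tenant's own volume.
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS conversations ("
            " contact_id TEXT PRIMARY KEY,"
            " current_state TEXT NOT NULL,"
            " data TEXT NOT NULL)"
        )

    def load(self, contact_id: str) -> Optional[Snapshot]:
        row = self._conn.execute(
            "SELECT current_state, data FROM conversations WHERE contact_id = ?",
            (contact_id,),
        ).fetchone()
        return None if row is None else (row[0], json.loads(row[1]))

    def save(self, contact_id: str, state: str, data: dict[str, Any]) -> None:
        # Upsert: one row per contact, overwritten on every turn.
        self._conn.execute(
            "INSERT INTO conversations (contact_id, current_state, data)"
            " VALUES (?, ?, ?)"
            " ON CONFLICT(contact_id) DO UPDATE SET"
            " current_state = excluded.current_state, data = excluded.data",
            (contact_id, state, json.dumps(data)),
        )
        self._conn.commit()


def advance(store: StateStorePort, contact_id: str, next_state: str) -> None:
    """Engine-side code, written once against the port and unaware of the backend."""
    _, data = store.load(contact_id) or ("MENU", {})
    store.save(contact_id, next_state, data)
```

Because `advance` only sees `StateStorePort`, the same call works with either store; swapping backends is a constructor change at wiring time.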
The platform extends the deterministic FSM with LLM-powered free-text understanding as a first-class connector. When a user sends a message that matches no flow transition, the engine can delegate to an LLM before falling back to the default state — all without leaving the current state.
User message ──→ Bot Engine
│
FSM transition match?
├─ YES ──→ execute transition (deterministic, always first)
└─ NO ──→ llm_fallback enabled for this state?
├─ YES ──→ LLMConnector.answer(question, knowledge, context)
│ │
│ response text sent to user
│ current_state unchanged
└─ NO ──→ FSM fallback state
Key design constraints:
- LLM only generates text. It never transitions states, never executes connectors, never mutates conversation data. The FSM remains fully in control.
- Knowledge in context, no vector DB. Each tenant uploads a versioned markdown document (business info, FAQs, hours, prices). Small-business knowledge fits entirely in the prompt — zero extra infrastructure.
- Safe fallback. Out-of-scope questions get a canned "I don't have that information" response, never a hallucination.
- Per-tenant cost control. Usage counters with configurable monthly token quotas; quota exceeded triggers the safe fallback.
- Fully testable without credentials. MockLLMAdapter is scriptable per test scenario, and the make test target passes with no API keys or network.
# In the tenant's boot config (delivered by Control Plane)
llm:
  enabled: true
  provider: anthropic # claude-haiku class, fast and cheap for FAQ
  knowledge_version: v3 # versioned in CP like flows; hot-swappable
  quota_monthly_tokens: 500000

# In any flow state
MENU:
  llm_fallback: true # free-text messages that match no transition go to LLM
  fallback: MENU # FSM fallback if LLM is disabled or quota exceeded

| Message type | Handler |
|---|---|
| Button press / list selection | FSM transition (deterministic) |
| Recognized text payload | FSM transition (deterministic) |
| Free text, in knowledge scope | LLM → answer from knowledge doc |
| Free text, out of knowledge scope | Safe fallback message |
| Free text, quota exceeded | Safe fallback message |
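The routing rules in the table can be sketched as a single function. This is an illustration under assumptions, not the engine's code: `State`, `Quota`, `route`, and the `(answer, tokens_used)` return shape of the LLM callable are hypothetical, and the sketch assumes that an exhausted quota keeps the current state and sends the canned message.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SAFE_FALLBACK = "I don't have that information."


@dataclass
class State:
    transitions: dict[str, str]  # payload -> target state
    fallback: str                # FSM fallback state
    llm_fallback: bool = False


@dataclass
class Quota:
    monthly_tokens: int
    used: int = 0

    def exhausted(self) -> bool:
        return self.used >= self.monthly_tokens


# Hypothetical LLM callable: question -> (answer text, tokens consumed).
LLM = Callable[[str], tuple[str, int]]


def route(
    states: dict[str, State], current: str, message: str, llm: LLM, quota: Quota
) -> tuple[str, Optional[str]]:
    """Return (next_state, reply_text); reply_text is None on pure FSM paths."""
    state = states[current]
    # 1. Deterministic FSM transitions are always tried first.
    if message in state.transitions:
        return state.transitions[message], None
    # 2. No match: delegate to the LLM only if this state opts in.
    if state.llm_fallback:
        if quota.exhausted():
            return current, SAFE_FALLBACK  # assumed: state stays unchanged
        answer, tokens = llm(message)
        quota.used += tokens
        return current, answer  # the LLM only produces text, never a transition
    # 3. Otherwise use the state's FSM fallback.
    return state.fallback, None
```

Passing the LLM as a plain callable mirrors how a scripted mock can replace the real provider in tests.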
| Component | Choice |
|---|---|
| Language | Python 3.14 |
| Web framework | FastAPI + uvicorn |
| Dependency manager | uv |
| Control Plane DB | PostgreSQL 16 + SQLAlchemy async + Alembic |
| Data Plane state | SQLite (per-container volume) |
| Scheduler | APScheduler 3.x + SQLite jobstore |
| Credential encryption | Fernet (symmetric, key managed by CP) |
| LLM provider | Anthropic (claude-haiku class) |
| Linter / formatter | ruff |
| Type checking | mypy |
| Tests | pytest |
# uv (dependency manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 1. Install dependencies
uv sync
# 2. Start Control Plane → http://localhost:8001
make run-control-plane
# 3. Start Data Plane with a dev tenant
TENANT_CONFIG_PATH=tests/configs/dev_tenant.yaml make run-data-plane
# → http://localhost:8002

make up # Control Plane + Data Plane + PostgreSQL
make down # stop (keeps DB)
make clean # stop and wipe DB

| Service | Port |
|---|---|
| Control Plane | localhost:8001 |
| Data Plane | localhost:8002 |
| PostgreSQL | localhost:5433 |
.
├── control_plane/ # Control plane (platform management)
│ ├── main.py # FastAPI app · /health
│ └── Dockerfile
│
├── data_plane/ # Data plane (per-tenant bot runtime)
│ ├── main.py # FastAPI app · lifespan · /health
│ ├── config.py # TenantConfig — loads YAML / CP boot payload
│ ├── engine/
│ │ ├── bot.py # Bot — central orchestrator
│ │ ├── interpreter.py # FSM interpreter: states, transitions, LLM fallback
│ │ ├── flow.py # Flow dataclasses + YAML loader
│ │ ├── outputs.py # Output types (Text, Buttons, List…)
│ │ └── degradation.py # Output degradation by channel capabilities
│ ├── ports/
│ │ ├── channel_adapter.py # ABC ChannelAdapter
│ │ ├── connector.py # ABC ConnectorPort
│ │ ├── state_store.py # ABC StateStorePort
│ │ └── scheduler.py # ABC SchedulerPort
│ ├── adapters/
│ │ ├── channel/
│ │ │ ├── whatsapp.py # WhatsAppAdapter (Cloud API v19, HMAC verified)
│ │ │ ├── http_dev.py # HttpDevChannelAdapter (local dev)
│ │ │ └── factory.py # channel_factory(config) → (adapter, router)
│ │ ├── connectors/
│ │ │ ├── google_calendar/ # GoogleCalendarAdapter
│ │ │ │ ├── adapter.py # Implements CalendarConnector
│ │ │ │ ├── client.py # Service account auth (thread-local)
│ │ │ │ ├── engine.py # compute_slots() — pure function
│ │ │ │ ├── repository.py # Raw Calendar API fetches
│ │ │ │ ├── mutations.py # create/cancel/confirm/mark events
│ │ │ │ ├── queries.py # High-level reads
│ │ │ │ └── parser.py # Description parsing + normalization
│ │ │ ├── llm/
│ │ │ │ ├── adapter.py # AnthropicLLMAdapter (implements LLMConnector)
│ │ │ │ └── mock.py # MockLLMAdapter (scriptable for tests)
│ │ │ └── mock_calendar.py # MockCalendarAdapter (tests)
│ │ ├── scheduler/
│ │ │ ├── apscheduler.py # APSchedulerAdapter (SQLite jobstore)
│ │ │ └── noop.py # NoopScheduler (tests)
│ │ └── state_store/
│ │ ├── sqlite.py # SQLiteStateStore (production)
│ │ └── in_memory.py # InMemoryStateStore (tests)
│ └── connectors/
│ ├── registry.py # ConnectorRegistry: category → implementation
│ ├── circuit_breaker.py # Circuit breaker: CLOSED / OPEN / HALF_OPEN
│ └── categories/
│ ├── calendar.py # CalendarConnector ABC
│ ├── llm.py # LLMConnector ABC
│ └── notification.py # NotificationConnector ABC
│
├── shared/
│ ├── domain/
│ │ ├── messages.py # InternalMessage (channel-agnostic)
│ │ └── conversation.py # ConversationState
│ └── connectors/
│ └── schemas/ # JSON Schemas for connector configs (autodiscovery)
│
├── flows/
│ └── peluqueria_flow.yaml # Example flow: hairdresser appointment booking
│
├── tests/
│ ├── configs/ # Tenant YAMLs for tests
│ └── flows/ # Toy flows for tests
│
├── docker-compose.yml
├── Makefile
└── pyproject.toml
The bot executes a state graph defined in YAML. The same generic engine runs any flow for any client — the flow is data, not code.
id: my_flow_v1
initial_state: MENU
global_transitions:
  - on_payload: "back_to_menu"
    target: MENU
states:
  MENU:
    llm_fallback: true # free-text questions answered from business knowledge
    on_enter:
      - action: send_interactive_buttons
        body: "Hi! What would you like to do?"
        buttons:
          - id: "menu_book"
            title: "Book appointment"
          - id: "menu_cancel"
            title: "Cancel appointment"
    transitions:
      - on_payload: "menu_book"
        target: BOOK_SELECT_SERVICE
      - on_payload: "menu_cancel"
        target: CANCEL_SELECT
    fallback: MENU
  BOOK_SELECT_SERVICE:
    on_enter:
      - action: invoke_connector
        connector: calendar
        operation: get_available_days
        params:
          from_date: "{{data.today}}"
          lookahead_days: 14
        result_key: available_days
      - action: send_dynamic_options
        source_key: "available_days"
        text: "Which day works for you?"
        empty_text: "No days available right now."
    transitions:
      - on_payload_prefix: "day_"
        extract_suffix_as: "selected_date"
        target: BOOK_CONFIRM
    fallback: MENU

| Action | Description |
|---|---|
| send_text | Plain text message |
| send_interactive_buttons | Up to 3 buttons (WhatsApp) |
| send_interactive_list | Option list (up to 10 rows on WhatsApp) |
| send_dynamic_options | List built dynamically from a connector result |
| invoke_connector | Call an external connector (calendar, LLM, etc.) |
| schedule_task | Schedule a one-off or recurring task |
| cancel_task | Cancel a previously scheduled task by idempotency key |
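Action parameters in the flow use `{{data.key}}` placeholders. The sketch below shows one plausible way such placeholders could be resolved; the `render` function and the choice to leave unknown keys untouched are assumptions, and the real engine may use a different templating mechanism.

```python
import re
from typing import Any

# Matches {{data.some_key}}, tolerating spaces inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*data\.(\w+)\s*\}\}")


def render(template: str, data: dict[str, Any]) -> str:
    """Replace {{data.key}} with str(data[key]); unknown keys are left as-is."""

    def substitute(match: re.Match[str]) -> str:
        key = match.group(1)
        return str(data[key]) if key in data else match.group(0)

    return _PLACEHOLDER.sub(substitute, template)
```

For example, `render("Which day after {{data.today}}?", {"today": "2025-01-10"})` substitutes the date, while a template whose key is missing from `data` comes back unchanged.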
| Field | Effect |
|---|---|
| on_payload | Exact match on button/list payload |
| on_payload_prefix | Payload starts with prefix; extract_suffix_as stores suffix in data |
| on_type | Message type (text, button, list) |
| condition | Expression over data (e.g. "data.selected_service") |
| set_data | Write values into data before entering the next state |
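To make the matching fields concrete, here is a minimal sketch of how `on_payload`, `on_payload_prefix` with `extract_suffix_as`, and `set_data` might be evaluated. The `match_transition` function and the dict-based transition shape are assumptions; `on_type` and `condition` are left out for brevity.

```python
from typing import Any, Optional


def match_transition(
    transitions: list[dict[str, Any]], payload: str, data: dict[str, Any]
) -> Optional[str]:
    """Return the target of the first matching transition, or None.

    Mutates `data` when a transition extracts a suffix or sets values.
    """
    for transition in transitions:
        prefix = transition.get("on_payload_prefix")
        if transition.get("on_payload") == payload:
            pass  # exact match, nothing to extract
        elif prefix and payload.startswith(prefix):
            key = transition.get("extract_suffix_as")
            if key:
                data[key] = payload[len(prefix):]
        else:
            continue  # this transition does not apply
        # set_data is written before the next state is entered.
        data.update(transition.get("set_data", {}))
        return transition["target"]
    return None
```

With the `day_` transition from the example flow, a payload of `day_2025-01-10` would return `BOOK_CONFIRM` and store `2025-01-10` under `selected_date`.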
Each Data Plane container loads its configuration from the Control Plane at boot (or from a local YAML pointed to by TENANT_CONFIG_PATH in dev).
# tests/configs/dev_tenant.yaml
tenant_id: my_business
flow_path: flows/peluqueria_flow.yaml
channel:
  type: http_dev # http_dev | whatsapp
connectors:
  calendar:
    type: mock # mock | mock_calendar | google_calendar
  llm:
    type: mock_llm # mock_llm | anthropic

channel:
  type: whatsapp
  phone_number_id: "1234567890"
  access_token: "EAAxxxxxxx"
  app_secret: "abc123"
  verify_token: "my_verify_token"

connectors:
  calendar:
    type: google_calendar
    credentials_path: "/path/to/service_account.json"
    calendar_id: "business@group.calendar.google.com"
    timezone: "Europe/Madrid"
    slot_duration_min: 30
    lookahead_days_client: 14
    lookahead_days_manual: 60
    schedule:
      mon: ["10:00-14:00", "17:00-21:00"]
      tue: ["10:00-14:00", "17:00-21:00"]
      wed: ["10:00-14:00", "17:00-21:00"]
      thu: ["10:00-14:00", "17:00-21:00"]
      fri: ["10:00-14:00", "17:00-21:00"]
      sat: ["10:00-14:00"]

See tests/configs/google_calendar_tenant.yaml.example for the full template.
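Once the YAML (or the Control Plane payload) has been parsed into a dict, it could be validated into a typed object roughly as follows. `TenantConfigSketch`, `parse_tenant_config`, and `config_source` are hypothetical names for this example; the real `TenantConfig` in `data_plane/config.py` may look quite different.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass(frozen=True)
class TenantConfigSketch:
    tenant_id: str
    flow_path: str
    channel_type: str
    connectors: dict[str, str] = field(default_factory=dict)  # category -> type


def parse_tenant_config(raw: Mapping[str, Any]) -> TenantConfigSketch:
    """Validate an already-parsed config dict and keep the fields used here."""
    missing = [key for key in ("tenant_id", "flow_path", "channel") if key not in raw]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    connectors = {
        category: cfg["type"] for category, cfg in raw.get("connectors", {}).items()
    }
    return TenantConfigSketch(
        tenant_id=raw["tenant_id"],
        flow_path=raw["flow_path"],
        channel_type=raw["channel"]["type"],
        connectors=connectors,
    )


def config_source(env: Mapping[str, str]) -> str:
    """Assumed rule: a local YAML path wins in dev, else pull from the Control Plane."""
    return "file" if env.get("TENANT_CONFIG_PATH") else "control_plane"
```

Failing fast on missing keys at boot keeps a misconfigured container from starting with a half-loaded tenant.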
| Channel | Class | Use |
|---|---|---|
| WhatsApp Cloud API | WhatsAppAdapter | Production: HMAC verified, buttons, lists, templates |
| HTTP Dev | HttpDevChannelAdapter | Local dev: POST /inbound + GET /messages (pull model) |
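The "HMAC verified" note for the WhatsApp channel refers to checking the webhook signature. A minimal sketch, assuming the signature arrives as a `sha256=<hex>` value computed over the raw request body with the app secret (the convention Meta webhooks use in the `X-Hub-Signature-256` header); `verify_signature` is a hypothetical helper, not the adapter's actual method.

```python
import hashlib
import hmac


def verify_signature(app_secret: str, raw_body: bytes, header_value: str) -> bool:
    """Return True if header_value is a valid 'sha256=<hex>' HMAC of raw_body."""
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    received = header_value[len(prefix):]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

The check has to run on the raw bytes before any JSON parsing, since re-serializing the body can change whitespace and invalidate the digest.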
# Start the data plane with the http_dev channel
TENANT_CONFIG_PATH=tests/configs/dev_tenant.yaml make run-data-plane
# Send a message
curl -X POST http://localhost:8002/inbound \
-H "Content-Type: application/json" \
-d '{"contact_id": "user_1", "text": "hello"}'
# Read the bot's responses
curl http://localhost:8002/messages

Flows reference connectors by category (connector: calendar); the tenant config selects the concrete implementation (type: google_calendar). The ConnectorRegistry resolves the mapping at runtime, so the engine knows nothing about specific adapters.
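The category-to-implementation lookup could be sketched like this. `ConnectorRegistrySketch`, its `register` / `bind` / `resolve` methods, and `FakeCalendar` are assumptions for illustration and may not match the real `registry.py`.

```python
from typing import Any, Callable

Factory = Callable[[dict[str, Any]], Any]


class ConnectorRegistrySketch:
    """Maps a connector category to the implementation chosen in tenant config."""

    def __init__(self) -> None:
        self._factories: dict[tuple[str, str], Factory] = {}
        self._bound: dict[str, Any] = {}

    def register(self, category: str, type_name: str, factory: Factory) -> None:
        self._factories[(category, type_name)] = factory

    def bind(self, connectors_config: dict[str, dict[str, Any]]) -> None:
        # connectors_config mirrors the `connectors:` block of the tenant YAML.
        for category, cfg in connectors_config.items():
            key = (category, cfg["type"])
            if key not in self._factories:
                raise KeyError(f"no connector registered for {key}")
            self._bound[category] = self._factories[key](cfg)

    def resolve(self, category: str) -> Any:
        # The engine only ever asks by category, e.g. "calendar".
        return self._bound[category]


class FakeCalendar:
    """Hypothetical stand-in for a calendar adapter."""

    def __init__(self, cfg: dict[str, Any]) -> None:
        self.cfg = cfg

    def get_available_days(self, from_date: str, lookahead_days: int) -> list[str]:
        return [from_date]
```

A flow action with `connector: calendar` would then call `registry.resolve("calendar")` and never see the concrete class.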
Cross-cutting concerns applied centrally (not inside each connector):
- Exponential backoff retries (tenacity)
- Circuit breaker: CLOSED → OPEN → HALF_OPEN
- Configurable timeouts
- Structured logging
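The CLOSED → OPEN → HALF_OPEN cycle can be illustrated with a small standalone breaker. This is a sketch under assumed defaults (threshold, timeout, and the injectable `clock` are all hypothetical), not the contents of `circuit_breaker.py`, and it leaves out the retry and timeout layers.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreakerSketch:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None
        self.state = "CLOSED"

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "OPEN":
            assert self._opened_at is not None
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; call rejected")
            self.state = "HALF_OPEN"  # timeout elapsed: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
                self.state = "OPEN"
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self.state = "CLOSED"
        return result
```

Wrapping every connector call in one breaker per connector keeps a failing external API from being hammered while it recovers.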
| Connector | Category | Config type | Status |
|---|---|---|---|
| GoogleCalendarAdapter | CalendarConnector | google_calendar | Implemented |
| AnthropicLLMAdapter | LLMConnector | anthropic | Implemented |
| MockCalendarAdapter | CalendarConnector | mock_calendar | Tests |
| MockLLMAdapter | LLMConnector | mock_llm | Tests |
| MockConnector | Generic | mock | Tests / dev |
Each container runs an embedded APScheduler instance with a SQLite jobstore (in the container volume):
- Scheduled jobs survive container restarts
- The YAML flow defines tasks: (ad-hoc) and standing_schedules: (registered at boot)
- Flow actions: schedule_task / cancel_task with idempotency keys
- Dispatch: APScheduler → POST /run-task (internal) → TaskExecutor → Channel Adapter
- 5-minute misfire grace window
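The idempotency-key and misfire-grace behavior in the list above can be sketched without APScheduler. The class below is an in-memory illustration only: its method names and the "second schedule with the same key is ignored" rule are assumptions. In APScheduler 3.x the job `id` and the `misfire_grace_time` argument of `add_job` would plausibly play these roles.

```python
from datetime import datetime, timedelta

MISFIRE_GRACE = timedelta(minutes=5)


class InMemorySchedulerSketch:
    def __init__(self) -> None:
        # idempotency_key -> (task name, run_at)
        self._jobs: dict[str, tuple[str, datetime]] = {}

    def schedule_task(self, task: str, idempotency_key: str, run_at: datetime) -> bool:
        """Return False if a job with this key already exists (no duplicate)."""
        if idempotency_key in self._jobs:
            return False
        self._jobs[idempotency_key] = (task, run_at)
        return True

    def cancel_task(self, idempotency_key: str) -> bool:
        """Return True if a job was actually removed."""
        return self._jobs.pop(idempotency_key, None) is not None

    def due(self, now: datetime) -> list[str]:
        """Pop jobs whose run time has passed; return keys still within the grace window."""
        runnable: list[str] = []
        for key, (_, run_at) in list(self._jobs.items()):
            if run_at > now:
                continue  # not due yet
            del self._jobs[key]
            if now - run_at <= MISFIRE_GRACE:
                runnable.append(key)
            # else: the job misfired by more than the grace window and is dropped
        return runnable
```

Keys such as `reminder:{{data.event_id}}` make re-entering a booking state harmless: the second `schedule_task` call finds the existing job and does nothing.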
tasks:
  reminder:
    - action: send_text
      text: "Reminder: your appointment is tomorrow at {{data.time}}."

# In a booking state:
on_enter:
  - action: schedule_task
    task: reminder
    idempotency_key: "reminder:{{data.event_id}}"
    run_at: "{{data.reminder_time}}"

make run-control-plane # Start Control Plane on localhost:8001
make run-data-plane # Start Data Plane on localhost:8002 (requires TENANT_CONFIG_PATH)
make up # Full stack via Docker Compose
make down # Stop containers (keeps DB)
make clean # Stop containers and wipe DB (down -v)
make test # Run all tests
make lint # ruff check + mypy
make format # ruff format

make test # all tests
uv run pytest tests/test_bot_engine.py -v # single file
uv run pytest -k "peluqueria" -v # by name

No credentials or network required. All external APIs (Calendar, WhatsApp, LLM) are mocked.