`NO_RELEVANT_SQL` Masks Wren Engine Failures

## Summary

When `wren-engine` is unresponsive (e.g., due to CPU deadlock), `wren-ai-service` silently times out and returns `NO_RELEVANT_SQL` — the same error code used for legitimate LLM SQL generation failures. This makes it nearly impossible to distinguish an infrastructure problem from an LLM quality problem without significant manual investigation.

## Impact

Users and operators are misled into debugging LLM configuration (model selection, prompts, parameters) when the actual root cause is a failed or frozen `wren-engine` instance. In our case, this led to multiple unnecessary model switches before the real issue was identified.

## Steps to Reproduce

1. Start the full WrenAI stack via Docker Compose.
2. Allow `wren-engine` to enter a CPU-saturated deadlock state (observed at ~200% CPU, confirmed unresponsive via `curl`).
3. Submit any natural language query from the UI.
4. Observe the result: UI shows "Clarification needed"; `wren-ai-service` logs `NO_RELEVANT_SQL`.

## Debug Timeline

**Symptom**: Every query returns "Clarification needed" in the UI.

**Step 1 — Checked `wren-ai-service` logs**
```
E wren-ai-service] ask pipeline - NO_RELEVANT_SQL: How many orders are there in total?
```
Log only shows the final outcome. No indication of *why* SQL generation failed.

**Step 2 — Mistakenly suspected LLM quality**
Switched models in order: `gpt-4o-mini` → `gpt-4o` → `claude-3-haiku` → back to `gpt-4o-mini`. All produced the same failure. Time spent: significant.

**Step 3 — Checked `wren-ui` logs, found the actual SQL**
```
[DEBUG] WrenEngineAdaptor - Dry run wren engine with body:
"SELECT COUNT(DISTINCT "olist_orders_dataset"."order_id") AS "total_orders" FROM "olist_orders_dataset""
```
The generated SQL was syntactically valid. The dry run was sent to `wren-engine` at `07:50:38` and the task failed at `07:51:09` — exactly 31 seconds later (matching the `engine_timeout: 30` setting).

**Step 4 — Tested `wren-engine` directly**
```bash
docker exec wrenai-wren-ui-1 curl -s --max-time 5 http://wren-engine:8080/v1/mdl
# exit code 28 (timeout) — engine is not responding
```

**Step 5 — Checked container resource usage**
```bash
docker stats wrenai-wren-engine-1 --no-stream
# CPU: 203%, MEM: 674MB — engine is deadlocked
```

**Resolution**: Restarted `wren-engine`. CPU dropped to normal, engine became responsive, queries succeeded.

## Root Cause

`wren-engine` entered a CPU deadlock and stopped responding to HTTP requests. `wren-ai-service` waited until `engine_timeout` (30s) and then classified the result as `NO_RELEVANT_SQL` with no logging of the actual cause.

## Proposed Improvements

1. **Distinguish timeout from SQL failure**: When a dry run times out or returns a connection error, log and surface a distinct error code (e.g., `ENGINE_TIMEOUT` or `ENGINE_UNAVAILABLE`) instead of `NO_RELEVANT_SQL`.

2. **Add explicit timeout logging in `wren-ai-service`**: When the engine call fails due to timeout, log the actual exception/reason, not just the pipeline outcome.

3. **Surface the error to the UI**: Show a different message when the engine is unreachable (e.g., "Unable to reach SQL engine") vs. when no SQL could be generated. This gives operators an actionable signal immediately.

## Environment

- `wren-ai-service` version: 0.29.0
- `wren-engine` version: 0.22.0
- `wren-ui` version: 0.32.2
- LLM: OpenRouter / `gpt-4o-mini`
- Embedder: Gemini / `gemini-embedding-001`


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`NO_RELEVANT_SQL` Masks Wren Engine Failures #2153

Summary

Impact

Steps to Reproduce

Debug Timeline

Root Cause

Proposed Improvements

Environment

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

NO_RELEVANT_SQL Masks Wren Engine Failures #2153

Description

Summary

Impact

Steps to Reproduce

Debug Timeline

Root Cause

Proposed Improvements

Environment

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`NO_RELEVANT_SQL` Masks Wren Engine Failures #2153