[Bug] RunSkillScriptTool retries SCRIPT_NOT_FOUND indefinitely; default max_llm_calls=500 is the only backstop #5684

@Raman369AI

Description

🔴 Required Information

Describe the Bug:

RunSkillScriptTool.run_async returns SCRIPT_NOT_FOUND as a structured soft-error string when a script path passed by the LLM does not exist in the skill's scripts/ directory. Because the response is a normal tool result (not an exception or terminal signal), the LLM treats it as a transient/recoverable failure and retries — but critically, it hallucinates a different plausible script path on every retry, not the same path. Nothing in SkillToolset tracks total failures across paths, so the loop continues until RunConfig.max_llm_calls is exhausted.

max_llm_calls defaults to 500 (src/google/adk/agents/run_config.py). This means a single invocation can silently consume the entire per-invocation call budget on repeated failing tool calls before the framework intervenes — and max_llm_calls is a global cap on legitimate reasoning, not a defense against a repeated-failure loop on one specific tool.

This is the same defect mode as #5652 (for LoadSkillResourceTool), in the same file (src/google/adk/tools/skill_toolset.py), with the same root cause and the same fix shape. Filing as a separate issue because the affected tool, error code, and code path are distinct.

Steps to Reproduce:

  1. Install google-adk (any version that ships SkillToolset).
  2. Create an agent with a SkillToolset containing a skill whose SKILL.md references executable scripts by natural-language names (e.g. "the setup script", "the build helper") without exact filenames, and configure a code executor on the toolset or agent.
  3. Issue a query that prompts the model to run one of those scripts.
  4. Observe in the trace that the model calls run_skill_script with a hallucinated path, receives SCRIPT_NOT_FOUND, then calls it again with a different hallucinated path, receives SCRIPT_NOT_FOUND again, and loops.

Expected Behavior:

After the first SCRIPT_NOT_FOUND within an invocation, any subsequent run_skill_script failure should return a terminal error code that unambiguously instructs the LLM to stop retrying and report the error. The agent's overall reasoning budget (max_llm_calls) should not be the only thing standing between an imperfect prompt and a runaway invocation.

Observed Behavior:

The same SCRIPT_NOT_FOUND soft error is returned on every attempt regardless of path or how many times it has already failed. The loop terminates only when max_llm_calls is exceeded.

Code path (src/google/adk/tools/skill_toolset.py, RunSkillScriptTool.run_async):

if script is None:
    return {
        "error": f"Script '{file_path}' not found in skill '{skill_name}'.",
        "error_code": "SCRIPT_NOT_FOUND",
    }

No counter, no escalation, no terminal signal.

Environment Details:

  • ADK Library Version: defect exists on main as of commit 327c45f9
  • Desktop OS: Linux (defect is in framework logic, not OS-specific)
  • Python Version: 3.12.3

Model Information:

  • Are you using LiteLLM: N/A (defect is provider-agnostic)
  • Which model: reproducible across any function-calling model — the retry behavior is a consequence of the soft error signal, not model-specific

🟡 Optional Information

Regression:

Not a regression. The defect has existed since RunSkillScriptTool was introduced — RunSkillScriptTool.run_async has never had any retry-guard logic on SCRIPT_NOT_FOUND.

Additional Context:

The same four factors that made #5652 reachable through ordinary use apply identically here:

  1. No script manifest at L2 — the load_skill response intentionally omits available script paths (progressive-disclosure spec). The LLM must infer paths from prose in SKILL.md, and inferred paths are routinely wrong.
  2. Soft error string — SCRIPT_NOT_FOUND looks transient and recoverable to the model; retry is its default response.
  3. No terminal signal — nothing escalates after the first miss.
  4. No scope boundary in default prompt — until #5651 (the LoadSkillResourceTool fix) and the fix for this issue land, the default system instruction does not tell the model to stop retrying on script tool errors.

The script tool's hallucination surface is arguably worse than the resource tool's: filenames like scripts/setup.py, scripts/build.sh, scripts/run.py are common conventions, so plausible-but-wrong guesses are extremely natural.

Considered and rejected alternatives (same as for #5652):

  • Per-path retry guard — the LLM hallucinates a different path on each retry (confirmed in a live trace for the resource tool; the identical pattern applies to scripts), so a per-path list never triggers.
  • Tighten or default-lower max_llm_calls — caps the overall reasoning budget; punishes legitimate long-running agents.
  • User-side after_tool_callback workaround — symptomatic; pushes the fix onto every SkillToolset user.
  • Add available_scripts manifest to the L2 load_skill response — defeats the lazy-loading / token-saving design.
  • New list_skill_scripts tool — violates the L1→L2→L3 progressive disclosure contract.
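To make the third rejected alternative concrete: this is roughly what every SkillToolset user would have to write themselves. It is a standalone sketch, not ADK's callback API — the real after_tool_callback signature takes the tool, args, and tool context, whereas this simplified version takes only a mutable state dict and the tool response:

```python
# Hedged sketch of the rejected user-side workaround. Function name and the
# simplified (state, tool_response) signature are illustrative; they do not
# match ADK's actual after_tool_callback contract.

def after_tool_callback(state: dict, tool_response: dict) -> dict:
    """Rewrite repeated SCRIPT_NOT_FOUND responses into a terminal error."""
    if tool_response.get("error_code") != "SCRIPT_NOT_FOUND":
        return tool_response
    state["script_misses"] = state.get("script_misses", 0) + 1
    if state["script_misses"] > 1:
        # Second and later misses: escalate so the model stops guessing paths.
        return {
            **tool_response,
            "error_code": "SCRIPT_NOT_FOUND_FATAL",
            "error": tool_response["error"] + " Do not retry with another path.",
        }
    return tool_response

state = {}
r1 = after_tool_callback(state, {"error": "x", "error_code": "SCRIPT_NOT_FOUND"})
r2 = after_tool_callback(state, {"error": "x", "error_code": "SCRIPT_NOT_FOUND"})
print(r1["error_code"], r2["error_code"])
```

The point of the sketch is the objection itself: the guard is trivial, but it lives in user code, so every deployment has to rediscover and re-implement it.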

Minimal Reproduction Code:

import asyncio
from unittest import mock
from google.adk.skills import models
from google.adk.tools import skill_toolset, tool_context

skill = mock.create_autospec(models.Skill, instance=True)
skill.name = "demo"
skill.resources = mock.MagicMock()
skill.resources.get_script.return_value = None  # every path "missing"

ctx = mock.MagicMock(spec=tool_context.ToolContext)
ctx.state = {}
ctx.invocation_id = "inv1"
ctx._invocation_context = mock.MagicMock()
ctx.agent_name = "agent"

toolset_obj = skill_toolset.SkillToolset([skill])
tool = skill_toolset.RunSkillScriptTool(toolset_obj)

async def main():
    paths = [
        "scripts/setup.py",
        "scripts/build.sh",     # different path — LLM hallucination pattern
        "scripts/run.py",
    ]
    for i, path in enumerate(paths):
        r = await tool.run_async(
            args={"skill_name": "demo", "file_path": path},
            tool_context=ctx,
        )
        print(i, r["error_code"])
    # On main (unpatched): all 3 print SCRIPT_NOT_FOUND — LLM has no reason to stop
    # With fix applied:    call 0 → SCRIPT_NOT_FOUND, calls 1-2 → SCRIPT_NOT_FOUND_FATAL

asyncio.run(main())

How often has this issue occurred?: Always (100%) — deterministic given any skill whose SKILL.md lets the model infer plausible-looking script paths that don't literally exist.


Proposed Fix

A two-layer fix is in linked PR #5683, mirroring the approach taken for #5652 in #5651:

Code: an invocation-scoped total failure counter inside RunSkillScriptTool.run_async. The counter tracks the number of SCRIPT_NOT_FOUND responses across all paths within an invocation. State key:

temp:_adk_skill_script_not_found_count_<invocation_id>

  • First failure → SCRIPT_NOT_FOUND (unchanged behavior).
  • Any subsequent failure → SCRIPT_NOT_FOUND_FATAL with an explicit stop instruction and the failure count.

The temp: prefix uses ADK's existing convention to prevent persistence to durable storage. The <invocation_id> suffix isolates in-memory backends where temp: keys are not auto-cleared between invocations.

Prompt: a no-retry rule for run_skill_script added to _DEFAULT_SKILL_SYSTEM_INSTRUCTION.
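Hypothetical wording for that rule — the actual text is in #5683; this sketch only captures the intent:

```text
If run_skill_script returns SCRIPT_NOT_FOUND, do not retry with a different
or "corrected" path. Report the missing script to the user instead.
```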

Linked PR: #5683
Companion to: #5652 / #5651

Metadata

Labels: tools ([Component] This issue is related to tools)