[Bug] RunSkillScriptTool retries SCRIPT_NOT_FOUND indefinitely; default max_llm_calls=500 is the only backstop #5684

@Raman369AI

Description

🔴 Required Information

Describe the Bug:

RunSkillScriptTool.run_async returns SCRIPT_NOT_FOUND as a structured soft-error string when a script path passed by the LLM does not exist in the skill's scripts/ directory. Because the response is a normal tool result (not an exception or terminal signal), the LLM treats it as a transient/recoverable failure and retries — but critically, it hallucinates a different plausible script path on every retry, not the same path. Nothing in SkillToolset tracks total failures across paths, so the loop continues until RunConfig.max_llm_calls is exhausted.

max_llm_calls defaults to 500 (src/google/adk/agents/run_config.py). This means a single invocation can silently consume the entire per-invocation call budget on repeated failing tool calls before the framework intervenes — and max_llm_calls is a global cap on legitimate reasoning, not a defense against a repeated-failure loop on one specific tool.

This is the same defect mode as #5652 (for LoadSkillResourceTool), in the same file (src/google/adk/tools/skill_toolset.py), with the same root cause and the same fix shape. Filing as a separate issue because the affected tool, error code, and code path are distinct.

Steps to Reproduce:

  1. Install google-adk (any version that ships SkillToolset).
  2. Create an agent with a SkillToolset containing a skill whose SKILL.md references executable scripts by natural-language names (e.g. "the setup script", "the build helper") without exact filenames, and configure a code executor on the toolset or agent.
  3. Issue a query that prompts the model to run one of those scripts.
  4. Observe in the trace that the model calls run_skill_script with a hallucinated path, receives SCRIPT_NOT_FOUND, then calls it again with a different hallucinated path, receives SCRIPT_NOT_FOUND again, and loops.

Expected Behavior:

After the first SCRIPT_NOT_FOUND within an invocation, any subsequent run_skill_script failure should return a terminal error code that unambiguously instructs the LLM to stop retrying and report the error. The agent's overall reasoning budget (max_llm_calls) should not be the only thing standing between an imperfect prompt and a runaway invocation.

Observed Behavior:

The same SCRIPT_NOT_FOUND soft error is returned on every attempt regardless of path or how many times it has already failed. The loop terminates only when max_llm_calls is exceeded.

Code path (src/google/adk/tools/skill_toolset.py, RunSkillScriptTool.run_async):

if script is None:
    return {
        "error": f"Script '{file_path}' not found in skill '{skill_name}'.",
        "error_code": "SCRIPT_NOT_FOUND",
    }

No counter, no escalation, no terminal signal.

Environment Details:

  • ADK Library Version: defect exists on main as of commit 327c45f9
  • Desktop OS: Linux (defect is in framework logic, not OS-specific)
  • Python Version: 3.12.3

Model Information:

  • Are you using LiteLLM: N/A (defect is provider-agnostic)
  • Which model: reproducible across any function-calling model — the retry behavior is a consequence of the soft error signal, not model-specific

🟡 Optional Information

Regression:

Not a regression. The defect has existed since RunSkillScriptTool was introduced — RunSkillScriptTool.run_async has never had any retry-guard logic on SCRIPT_NOT_FOUND.

Additional Context:

The same four factors that made #5652 reachable through ordinary use apply identically here:

  1. No script manifest at L2 — the load_skill response intentionally omits available script paths (progressive-disclosure spec). The LLM must infer paths from prose in SKILL.md, and inferred paths are routinely wrong.
  2. Soft error string — SCRIPT_NOT_FOUND looks transient and recoverable to the model; retry is its default response.
  3. No terminal signal — nothing escalates after the first miss.
  4. No scope boundary in default prompt — until #5651 (the LoadSkillResourceTool fix) and the fix for this issue land, the default system instruction does not tell the model to stop retrying on script tool errors.

The script tool's hallucination surface is arguably worse than the resource tool's: filenames like scripts/setup.py, scripts/build.sh, scripts/run.py are common conventions, so plausible-but-wrong guesses are extremely natural.

Considered and rejected alternatives (same as for #5652):

  • Per-path retry guard — the LLM hallucinates a different path on each retry (confirmed in a live trace for the resource tool; the identical pattern applies to scripts), so a per-path list never triggers.
  • Tighten or default-lower max_llm_calls — caps the overall reasoning budget; punishes legitimate long-running agents.
  • User-side after_tool_callback workaround — symptomatic; pushes the fix onto every SkillToolset user.
  • Add available_scripts manifest to the L2 load_skill response — defeats the lazy-loading / token-saving design.
  • New list_skill_scripts tool — violates the L1→L2→L3 progressive disclosure contract.
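To make the third rejected alternative concrete: this is roughly what every SkillToolset user would have to write themselves. It is a standalone sketch, not ADK's callback API — the real after_tool_callback signature takes the tool, args, and tool context, whereas this simplified version takes only a mutable state dict and the tool response:

```python
# Hedged sketch of the rejected user-side workaround. Function name and the
# simplified (state, tool_response) signature are illustrative; they do not
# match ADK's actual after_tool_callback contract.

def after_tool_callback(state: dict, tool_response: dict) -> dict:
    """Rewrite repeated SCRIPT_NOT_FOUND responses into a terminal error."""
    if tool_response.get("error_code") != "SCRIPT_NOT_FOUND":
        return tool_response
    state["script_misses"] = state.get("script_misses", 0) + 1
    if state["script_misses"] > 1:
        # Second and later misses: escalate so the model stops guessing paths.
        return {
            **tool_response,
            "error_code": "SCRIPT_NOT_FOUND_FATAL",
            "error": tool_response["error"] + " Do not retry with another path.",
        }
    return tool_response

state = {}
r1 = after_tool_callback(state, {"error": "x", "error_code": "SCRIPT_NOT_FOUND"})
r2 = after_tool_callback(state, {"error": "x", "error_code": "SCRIPT_NOT_FOUND"})
print(r1["error_code"], r2["error_code"])
```

The point of the sketch is the objection itself: the guard is trivial, but it lives in user code, so every deployment has to rediscover and re-implement it.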

Minimal Reproduction Code:

import asyncio
from unittest import mock
from google.adk.skills import models
from google.adk.tools import skill_toolset, tool_context

skill = mock.create_autospec(models.Skill, instance=True)
skill.name = "demo"
skill.resources = mock.MagicMock()
skill.resources.get_script.return_value = None  # every path "missing"

ctx = mock.MagicMock(spec=tool_context.ToolContext)
ctx.state = {}
ctx.invocation_id = "inv1"
ctx._invocation_context = mock.MagicMock()
ctx.agent_name = "agent"

toolset_obj = skill_toolset.SkillToolset([skill])
tool = skill_toolset.RunSkillScriptTool(toolset_obj)

async def main():
    paths = [
        "scripts/setup.py",
        "scripts/build.sh",     # different path — LLM hallucination pattern
        "scripts/run.py",
    ]
    for i, path in enumerate(paths):
        r = await tool.run_async(
            args={"skill_name": "demo", "file_path": path},
            tool_context=ctx,
        )
        print(i, r["error_code"])
    # On main (unpatched): all 3 print SCRIPT_NOT_FOUND — LLM has no reason to stop
    # With fix applied:    call 0 → SCRIPT_NOT_FOUND, calls 1-2 → SCRIPT_NOT_FOUND_FATAL

asyncio.run(main())

How often has this issue occurred?: Always (100%) — deterministic given any skill whose SKILL.md lets the model infer plausible-looking script paths that don't literally exist.


Proposed Fix

A two-layer fix is in linked PR #5683, mirroring the approach taken for #5652 in #5651:

Code: an invocation-scoped total failure counter inside RunSkillScriptTool.run_async. The counter tracks the number of SCRIPT_NOT_FOUND responses across all paths within an invocation. State key:

temp:_adk_skill_script_not_found_count_<invocation_id>

  • First failure → SCRIPT_NOT_FOUND (unchanged behavior).
  • Any subsequent failure → SCRIPT_NOT_FOUND_FATAL with an explicit stop instruction and the failure count.

The temp: prefix uses ADK's existing convention to prevent persistence to durable storage. The <invocation_id> suffix isolates in-memory backends where temp: keys are not auto-cleared between invocations.

Prompt: a no-retry rule for run_skill_script added to _DEFAULT_SKILL_SYSTEM_INSTRUCTION.
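Hypothetical wording for that rule — the actual text is in #5683; this sketch only captures the intent:

```text
If run_skill_script returns SCRIPT_NOT_FOUND, do not retry with a different
or "corrected" path. Report the missing script to the user instead.
```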

Linked PR: #5683
Companion to: #5652 / #5651

Metadata

Labels: tools ([Component] This issue is related to tools)