
fix: terminate infinite retry loop in RunSkillScriptTool on SCRIPT_NOT_FOUND#6

Open
Raman369AI wants to merge 1 commit into main from fix/skill-script-not-found-infinite-loop

Conversation

@Raman369AI
Owner

Fixes google#5684

Companion to google#5651 — closes the same defect class for RunSkillScriptTool.

Summary

RunSkillScriptTool.run_async returns SCRIPT_NOT_FOUND as a structured soft-error string when a script path passed by the LLM does not exist in the skill's scripts/ directory. Because the response is a normal tool result (not an exception or terminal signal), the LLM treats it as transient and retries — and in practice it hallucinates a different plausible script path on each retry (e.g. scripts/setup.py, scripts/build.py, scripts/run.sh). Nothing in RunSkillScriptTool tracks failures across paths, so the loop continues until RunConfig.max_llm_calls (default 500) is exhausted.
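The runaway pattern can be illustrated with a toy loop (a sketch, not ADK code: the fake tool and path generator are stand-ins for the real model/tool interaction):

```python
import itertools

MAX_LLM_CALLS = 500  # RunConfig default, per the description above

def fake_tool(path: str) -> str:
    # The soft error: a normal string result that looks transient to the model.
    return "SCRIPT_NOT_FOUND"

# A toy "model" that hallucinates a fresh plausible path every turn.
paths = (f"scripts/step_{i}.py" for i in itertools.count())

calls = 0
for path in paths:
    calls += 1
    if fake_tool(path) != "SCRIPT_NOT_FOUND":
        break  # never happens: every hallucinated path misses
    if calls >= MAX_LLM_CALLS:
        break  # the only backstop before this fix

# calls reaches 500: nothing short of the global cap stops the loop
```

Because each retry uses a *different* path, any per-path bookkeeping would reset on every turn, which is why the fix below counts misses across all paths.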

This is the same defect mode as LoadSkillResourceTool (google#5652 / google#5651), in the same file, with the same fix shape. Filing as a separate PR for a clean history.

Why this matters

The four conditions that made the resource-loop reachable apply identically to scripts:

  • No script manifest at L2 — the load_skill response intentionally omits a list of available scripts (progressive-disclosure spec). The LLM infers script filenames from prose in SKILL.md, and inferred filenames like scripts/setup.py or scripts/build.sh are routinely wrong.
  • Soft error string — SCRIPT_NOT_FOUND looks transient to the model.
  • No terminal signal — nothing escalates after the first miss.
  • max_llm_calls=500 is the only backstop — a global reasoning cap, not a defense against a known failure mode on one tool.

What changed

Two layers in src/google/adk/tools/skill_toolset.py, plus tests:

1. Code: invocation-scoped failure counter in RunSkillScriptTool.run_async

A total SCRIPT_NOT_FOUND count (across all paths) is tracked on tool_context.state under:

temp:_adk_skill_script_not_found_count_<invocation_id>
  • The temp: prefix uses ADK's existing convention so the value is trimmed from the persisted event delta and never reaches durable session storage.
  • The <invocation_id> suffix ensures correctness on in-memory session backends, where temp: keys are not auto-cleared between invocations.
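As a sketch of the key construction (the helper name and constant are assumptions for illustration; only the `temp:` prefix and the invocation-id suffix come from this description):

```python
# Build the invocation-scoped state key for SCRIPT_NOT_FOUND misses.
# "temp:" marks the key as transient so ADK trims it from the persisted
# event delta; the invocation-id suffix isolates invocations that share
# an in-memory state dict.
_SCRIPT_NOT_FOUND_KEY_PREFIX = "temp:_adk_skill_script_not_found_count_"

def script_not_found_key(invocation_id: str) -> str:
    """Return the per-invocation counter key for script-not-found misses."""
    return _SCRIPT_NOT_FOUND_KEY_PREFIX + invocation_id

key = script_not_found_key("inv-123")
```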

Behavior:

  • First SCRIPT_NOT_FOUND within an invocation (any path) → returns SCRIPT_NOT_FOUND (unchanged).
  • Any subsequent miss in the same invocation → returns the new SCRIPT_NOT_FOUND_FATAL error code with an explicit "do not retry — report the error and stop" message, including the failure number.

The guard is invocation-scoped, path-agnostic (catches hallucinated variants), and adds no overhead on the success path. The counter key is independent of the resource counter introduced in google#5651 — a single miss on each tool does not immediately escalate.
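The escalation logic can be sketched roughly as follows, using a plain dict in place of `tool_context.state`; everything except the two error codes is an assumed name, not the actual ADK implementation:

```python
SCRIPT_NOT_FOUND = "SCRIPT_NOT_FOUND"
SCRIPT_NOT_FOUND_FATAL = "SCRIPT_NOT_FOUND_FATAL"

def record_miss(state: dict, invocation_id: str) -> str:
    """Count a script-not-found miss and choose the error code to return."""
    key = f"temp:_adk_skill_script_not_found_count_{invocation_id}"
    count = state.get(key, 0) + 1
    state[key] = count
    if count == 1:
        return SCRIPT_NOT_FOUND  # first miss: unchanged soft error
    # Any later miss in the same invocation escalates, regardless of
    # which path the model tried this time.
    return (
        f"{SCRIPT_NOT_FOUND_FATAL}: failure #{count} in this invocation. "
        "Do not retry or guess another script path; report the error and stop."
    )

state: dict = {}
first = record_miss(state, "inv-1")   # soft error, unchanged
second = record_miss(state, "inv-1")  # escalates to the fatal code
fresh = record_miss(state, "inv-2")   # a new invocation starts at zero
```

Note the success path never touches the counter, so the guard adds no work when the script path is valid.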

2. Prompt: no-retry rule for run_skill_script

A new rule 5 in _DEFAULT_SKILL_SYSTEM_INSTRUCTION:

"If run_skill_script returns an error (for example SCRIPT_NOT_FOUND), do not retry the same script or guess a different script path. Report the error to the user and stop."

Why both layers

Defense-in-depth, identical reasoning to google#5651: code-only termination produces confusing downstream behavior when the LLM has no semantic reason to stop; prompt-only termination relies on the LLM following the rule, which imperfect upstream prompts can override.

Considered and rejected

| Alternative | Why not |
| --- | --- |
| Per-path retry guard | LLM hallucinates a different script path on every retry — a per-path list never escalates |
| Tighten or default-lower max_llm_calls | Caps overall reasoning budget; punishes legitimate long-running tasks |
| Symptomatic after_tool_callback workaround | Pushes the fix onto every SkillToolset user |
| Add an available_scripts manifest to L2 load_skill | Defeats the lazy-loading / token-saving design |
| Introduce a list_skill_scripts tool | Violates the L1→L2→L3 progressive disclosure contract |
| Share one counter with load_skill_resource | Two distinct tools, two distinct hallucination surfaces; separate counters keep the failure semantics aligned per tool |

Test plan

New tests in tests/unittests/tools/test_skill_toolset.py:

  • test_execute_script_repeated_failure_escalates_to_fatal — second call on the same missing path within an invocation returns SCRIPT_NOT_FOUND_FATAL.
  • test_execute_script_different_path_also_escalates_to_fatal — second call on a different missing path also returns SCRIPT_NOT_FOUND_FATAL; proves the counter is path-agnostic and catches hallucinated variants.
  • test_execute_script_failures_isolated_per_invocation — failures from invocation A do not escalate the first attempt in invocation B even when sharing the same session state dict.
  • test_execute_script_counter_uses_temp_prefix — every key written by the guard starts with temp: so it is trimmed from the persisted event delta.
  • test_execute_script_script_not_found — tightened: first miss returns the unchanged SCRIPT_NOT_FOUND code and message.
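The shape of the path-agnostic and per-invocation tests can be sketched like this; `FakeToolContext` and `run_script` are placeholder stand-ins for the real fixtures and tool, not the actual test code:

```python
class FakeToolContext:
    """Minimal stand-in carrying an invocation id and a shared state dict."""
    def __init__(self, invocation_id: str, state: dict):
        self.invocation_id = invocation_id
        self.state = state

def run_script(ctx: FakeToolContext, path: str) -> str:
    """Simulate the tool's behavior for a script path that does not exist."""
    key = f"temp:_adk_skill_script_not_found_count_{ctx.invocation_id}"
    ctx.state[key] = ctx.state.get(key, 0) + 1
    return "SCRIPT_NOT_FOUND" if ctx.state[key] == 1 else "SCRIPT_NOT_FOUND_FATAL"

shared_state: dict = {}
ctx_a = FakeToolContext("inv-a", shared_state)
assert run_script(ctx_a, "scripts/setup.py") == "SCRIPT_NOT_FOUND"
# A *different* hallucinated path still escalates: the counter is
# per-invocation, not per-path.
assert run_script(ctx_a, "scripts/build.sh") == "SCRIPT_NOT_FOUND_FATAL"
# A new invocation sharing the same state dict starts fresh.
ctx_b = FakeToolContext("inv-b", shared_state)
assert run_script(ctx_b, "scripts/setup.py") == "SCRIPT_NOT_FOUND"
```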

Verification:

  • All 95 tests in test_skill_toolset.py pass (90 pre-existing + 5 added/tightened), 0 regressions.
  • pyink --check clean on both modified files.
  • isort --check-only clean on both modified files.
  • mypy src/google/adk/tools/skill_toolset.py reports the same 19 pre-existing errors as main; zero new errors introduced.

Backwards compatibility

  • The first-failure behavior is unchanged: same SCRIPT_NOT_FOUND error code, same error string. Existing callers see no difference.
  • The new SCRIPT_NOT_FOUND_FATAL code is purely additive.
  • The new state key uses the temp: prefix and is guaranteed not to leak into persisted session storage.
  • The new prompt rule is additive; the prior four rules are unchanged.

Relationship to google#5651

This PR is logically independent and merges cleanly against current main. If google#5651 lands first, the prompts will need a trivial merge (both PRs append a new rule to the same instruction string); the code paths are in different methods and do not conflict.

…T_FOUND

Mirrors the LoadSkillResourceTool fix (google#5651): when the LLM hallucinates a
script path that does not exist in the skill's scripts/ directory,
RunSkillScriptTool returns SCRIPT_NOT_FOUND as a soft error and the model
retries with a different plausible path each turn. Nothing escalates between
attempts, so the loop terminates only when RunConfig.max_llm_calls (default
500) is exhausted.

Adds an invocation-scoped failure counter under
temp:_adk_skill_script_not_found_count_<invocation_id>. The first miss within
an invocation still returns SCRIPT_NOT_FOUND (unchanged); any subsequent miss
returns the new SCRIPT_NOT_FOUND_FATAL with an explicit "do not retry, report
and stop" message and the failure count. The counter is path-agnostic, so it
fires even when the model hallucinates a different script path on each retry.

The temp: prefix keeps the key out of durable session storage; the
invocation_id suffix isolates in-memory backends where temp: keys are not
auto-cleared between invocations.

The default system prompt also gains a no-retry rule for run_skill_script
errors so the model has a semantic reason to stop, complementing the
code-level guard.

Tests cover: first-miss soft error preserved, 2nd-miss-same-path escalates,
2nd-miss-different-path escalates (path-agnostic), per-invocation isolation,
counter key uses temp: prefix.

@gemini-code-assist (bot) left a comment


Code Review

This pull request updates the project version to 1.33.0, introduces a SkillRegistry interface for dynamic skill discovery, and enhances the SkillToolset with a turn-scoped cache. It adds support for PKCE in OAuth2 authentication, makes truncation limits configurable in environment tools, and refactors CLI onboarding logic. Telemetry is improved by updating semantic conventions and streamlining metric attributes. Numerous samples and tests are updated to use newer Gemini models, and various bug fixes are included across A2A, session management, and code execution components. Feedback suggests making the hardcoded cache size in SkillToolset configurable to better handle high-concurrency server environments.

```python
        str,
        dict[str, models.Skill | asyncio.Future[models.Skill | None] | None],
    ] = {}
    self._max_cache_turns = 16
```

Severity: medium

The _max_cache_turns limit is hardcoded to 16. In a server environment where a single SkillToolset instance is shared across many concurrent users, this small cache size will lead to frequent evictions and redundant registry calls. Consider increasing this default or making it configurable to better support high-concurrency scenarios.

@Raman369AI Raman369AI closed this May 13, 2026
@Raman369AI Raman369AI reopened this May 13, 2026


Development

Successfully merging this pull request may close these issues.

[Bug] RunSkillScriptTool retries SCRIPT_NOT_FOUND indefinitely; default max_llm_calls=500 is the only backstop
