fix: terminate infinite retry loop in RunSkillScriptTool on SCRIPT_NOT_FOUND#6
fix: terminate infinite retry loop in RunSkillScriptTool on SCRIPT_NOT_FOUND#6Raman369AI wants to merge 1 commit into
Conversation
…T_FOUND Mirrors the LoadSkillResourceTool fix (google#5651): when the LLM hallucinates a script path that does not exist in the skill's scripts/ directory, RunSkillScriptTool returns SCRIPT_NOT_FOUND as a soft error and the model retries with a different plausible path each turn. Nothing escalates between attempts, so the loop terminates only when RunConfig.max_llm_calls (default 500) is exhausted. Adds an invocation-scoped failure counter under temp:_adk_skill_script_not_found_count_<invocation_id>. The first miss within an invocation still returns SCRIPT_NOT_FOUND (unchanged); any subsequent miss returns the new SCRIPT_NOT_FOUND_FATAL with an explicit "do not retry, report and stop" message and the failure count. The counter is path-agnostic, so it fires even when the model hallucinates a different script path on each retry. The temp: prefix keeps the key out of durable session storage; the invocation_id suffix isolates in-memory backends where temp: keys are not auto-cleared between invocations. The default system prompt also gains a no-retry rule for run_skill_script errors so the model has a semantic reason to stop, complementing the code-level guard. Tests cover: first-miss soft error preserved, 2nd-miss-same-path escalates, 2nd-miss-different-path escalates (path-agnostic), per-invocation isolation, counter key uses temp: prefix.
There was a problem hiding this comment.
Code Review
This pull request updates the project version to 1.33.0, introduces a SkillRegistry interface for dynamic skill discovery, and enhances the SkillToolset with a turn-scoped cache. It adds support for PKCE in OAuth2 authentication, makes truncation limits configurable in environment tools, and refactors CLI onboarding logic. Telemetry is improved by updating semantic conventions and streamlining metric attributes. Numerous samples and tests are updated to use newer Gemini models, and various bug fixes are included across A2A, session management, and code execution components. Feedback suggests making the hardcoded cache size in SkillToolset configurable to better handle high-concurrency server environments.
| str, | ||
| dict[str, models.Skill | asyncio.Future[models.Skill | None] | None], | ||
| ] = {} | ||
| self._max_cache_turns = 16 |
There was a problem hiding this comment.
The _max_cache_turns limit is hardcoded to 16. In a server environment where a single SkillToolset instance is shared across many concurrent users, this small cache size will lead to frequent evictions and redundant registry calls. Consider increasing this default or making it configurable to better support high-concurrency scenarios.
Fixes google#5684
Companion to google#5651 — closes the same defect class for
RunSkillScriptTool.Summary
RunSkillScriptTool.run_asyncreturnsSCRIPT_NOT_FOUNDas a structured soft-error string when a script path passed by the LLM does not exist in the skill'sscripts/directory. Because the response is a normal tool result (not an exception or terminal signal), the LLM treats it as transient and retries — and in practice it hallucinates a different plausible script path on each retry (e.g.scripts/setup.py,scripts/build.py,scripts/run.sh). Nothing inRunSkillScriptTooltracks failures across paths, so the loop continues untilRunConfig.max_llm_calls(default 500) is exhausted.This is the same defect mode as
LoadSkillResourceTool(google#5652 / google#5651), in the same file, with the same fix shape. Filing as a separate PR for a clean history.Why this matters
The four conditions that made the resource-loop reachable apply identically to scripts:
load_skillresponse intentionally omits a list of available scripts (progressive-disclosure spec). The LLM infers script filenames from prose inSKILL.md, and inferred filenames likescripts/setup.pyorscripts/build.share routinely wrong.SCRIPT_NOT_FOUNDlooks transient to the model.max_llm_calls=500is the only backstop — a global reasoning cap, not a defense against a known failure mode on one tool.What changed
Two layers in
src/google/adk/tools/skill_toolset.py, plus tests:1. Code: invocation-scoped failure counter in
RunSkillScriptTool.run_asyncA total
SCRIPT_NOT_FOUNDcount (across all paths) is tracked ontool_context.stateunder:temp:prefix uses ADK's existing convention so the value is trimmed from the persisted event delta and never reaches durable session storage.<invocation_id>suffix ensures correctness on in-memory session backends, wheretemp:keys are not auto-cleared between invocations.Behavior:
SCRIPT_NOT_FOUNDwithin an invocation (any path) → returnsSCRIPT_NOT_FOUND(unchanged).SCRIPT_NOT_FOUND_FATALerror code with an explicit "do not retry — report the error and stop" message, including the failure number.The guard is invocation-scoped, path-agnostic (catches hallucinated variants), and adds no overhead on the success path. The counter key is independent of the resource counter introduced in google#5651 — a single miss on each tool does not immediately escalate.
2. Prompt: no-retry rule for
run_skill_scriptA new rule 5 in
_DEFAULT_SKILL_SYSTEM_INSTRUCTION:Why both layers
Defense-in-depth, identical reasoning to google#5651: code-only termination produces confusing downstream behavior when the LLM has no semantic reason to stop; prompt-only termination relies on the LLM following the rule, which imperfect upstream prompts can override.
Considered and rejected
max_llm_callsafter_tool_callbackworkaroundSkillToolsetuseravailable_scriptsmanifest to L2load_skilllist_skill_scriptstoolload_skill_resourceTest plan
New tests in
tests/unittests/tools/test_skill_toolset.py:test_execute_script_repeated_failure_escalates_to_fatal— second call on the same missing path within an invocation returnsSCRIPT_NOT_FOUND_FATAL.test_execute_script_different_path_also_escalates_to_fatal— second call on a different missing path also returnsSCRIPT_NOT_FOUND_FATAL; proves the counter is path-agnostic and catches hallucinated variants.test_execute_script_failures_isolated_per_invocation— failures from invocation A do not escalate the first attempt in invocation B even when sharing the same session state dict.test_execute_script_counter_uses_temp_prefix— every key written by the guard starts withtemp:so it is trimmed from the persisted event delta.test_execute_script_script_not_found— tightened: first miss returns the unchangedSCRIPT_NOT_FOUNDcode and message.Verification:
test_skill_toolset.pypass (90 pre-existing + 5 added/tightened), 0 regressions.pyink --checkclean on both modified files.isort --check-onlyclean on both modified files.mypy src/google/adk/tools/skill_toolset.pyreports the same 19 pre-existing errors asmain; zero new errors introduced.Backwards compatibility
SCRIPT_NOT_FOUNDerror code, same error string. Existing callers see no difference.SCRIPT_NOT_FOUND_FATALcode is purely additive.temp:prefix and is guaranteed not to leak into persisted session storage.Relationship to google#5651
This PR is logically independent and merges cleanly against current
main. If google#5651 lands first, the prompts will need a trivial merge (both PRs append a new rule to the same instruction string); the code paths are in different methods and do not conflict.