[Bug] LoadSkillResourceTool retries RESOURCE_NOT_FOUND indefinitely; default max_llm_calls=500 is the only backstop #5652

@Raman369AI

🔴 Required Information

Describe the Bug:

LoadSkillResourceTool.run_async returns RESOURCE_NOT_FOUND as a structured soft-error string when a path passed by the LLM does not exist inside the skill's bundled resources. Because the response is a normal tool result (not an exception or terminal signal), the LLM treats it as a transient/recoverable failure and retries — but critically, it hallucinates a different plausible path on every retry, not the same path. Nothing in SkillToolset tracks total failures across paths, so the loop continues until RunConfig.max_llm_calls is exhausted.

max_llm_calls defaults to 500 (src/google/adk/agents/run_config.py:314). This means a single invocation can silently consume the entire per-invocation call budget on repeated failing tool calls before the framework intervenes — and max_llm_calls is a global cap on legitimate reasoning, not a defense against a repeated-failure loop on one specific tool.

Steps to Reproduce:

  1. Install google-adk (any version that ships SkillToolset — verified on 1.32.0).
  2. Create an agent with a SkillToolset containing a skill whose SKILL.md references files by natural-language names (e.g. "Document 1", "the reference guide") without exact filenames.
  3. Issue a query that prompts the model to read one of those resources.
  4. Observe in the trace that the model calls load_skill_resource with a hallucinated path, receives RESOURCE_NOT_FOUND, then calls it again with a different hallucinated path, receives RESOURCE_NOT_FOUND again, and loops.

Expected Behavior:

After the first RESOURCE_NOT_FOUND within an invocation, any subsequent load_skill_resource failure should return a terminal error code that unambiguously instructs the LLM to stop retrying and report the error. The agent's overall reasoning budget (max_llm_calls) should not be the only thing standing between an imperfect prompt and a runaway invocation.

Observed Behavior:

The same RESOURCE_NOT_FOUND soft error is returned on every attempt regardless of path or how many times it has already failed. The loop terminates only when max_llm_calls is exceeded.

Live trace evidence (captured via GET /debug/trace/session/{session_id} against adk web):

SPAN: execute_tool load_skill_resource
  args:       {'file_path': 'references/reference_doc.md', 'skill_name': 'document-classifier'}
  error_code: RESOURCE_NOT_FOUND
  error:      Resource 'references/reference_doc.md' not found in skill 'document-classifier'.

SPAN: execute_tool load_skill_resource
  args:       {'skill_name': 'document-classifier', 'file_path': 'references/Document1.md'}
  error_code: RESOURCE_NOT_FOUND
  error:      Resource 'references/Document1.md' not found in skill 'document-classifier'.

The model tried references/reference_doc.md first, then hallucinated a completely different path (references/Document1.md) on the retry. Both calls returned the same soft error, so the LLM had no signal to stop. This pattern continues until max_llm_calls is exhausted.
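To make the cost of that pattern concrete, here is a minimal, framework-free simulation. It is only a sketch under stated assumptions: `load_skill_resource` below is a stand-in function (not the ADK tool), the "model" is a loop that invents a new path on every turn, and `MAX_LLM_CALLS` is a plain constant mirroring the default quoted above.

```python
MAX_LLM_CALLS = 500  # mirrors the default cited in this issue; a plain constant here


def load_skill_resource(skill_name: str, file_path: str) -> dict:
    """Stand-in for the tool (hypothetical): every path is missing, as in the trace."""
    return {
        "error_code": "RESOURCE_NOT_FOUND",
        "error": f"Resource '{file_path}' not found in skill '{skill_name}'.",
    }


def simulate_invocation(max_llm_calls: int = MAX_LLM_CALLS) -> int:
    """Return how many failing tool calls happen before the call budget runs out."""
    failed_calls = 0
    for attempt in range(max_llm_calls):
        # The simulated model guesses a new, plausible-looking path each turn.
        guess = f"references/guess_{attempt}.md"
        result = load_skill_resource("document-classifier", guess)
        if result["error_code"] != "RESOURCE_NOT_FOUND":
            break  # never reached here: the soft error is all the model ever sees
        failed_calls += 1
    return failed_calls


if __name__ == "__main__":
    print(simulate_invocation())
```

In this simulation the only thing that ends the loop is the budget itself, which is the behavior described under Observed Behavior.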

Environment Details:

  • ADK Library Version: 1.32.0 (issue exists on main as of commit 2d61cb69)
  • Desktop OS: Linux (defect is in framework logic, not OS-specific)
  • Python Version: 3.12.3

Model Information:

  • Are you using LiteLLM: N/A (defect is provider-agnostic)
  • Which model: gemini-3-flash-preview (observed; reproducible across any function-calling model — the retry behavior is a consequence of the soft error signal, not model-specific)

🟡 Optional Information

Regression:

Not a regression. The defect has existed since SkillToolset was introduced — LoadSkillResourceTool.run_async has never had any retry-guard logic.

Additional Context:

Four factors combine to make this loop reachable through ordinary use:

  1. No resource manifest at L2 — the load_skill response intentionally omits available file paths (progressive-disclosure spec). The LLM must infer paths from prose, and inferred paths are routinely wrong.
  2. Soft error string: RESOURCE_NOT_FOUND looks transient and recoverable to the model; retry is its default response.
  3. No terminal signal — nothing escalates after the first miss.
  4. No scope boundary in default prompt — the system instruction doesn't distinguish skill-bundled files from runtime user inputs (e.g. a PDF the user is processing), so the model sometimes routes runtime documents through load_skill_resource and loops on them.

Considered and rejected alternatives:

| Alternative | Why not |
| --- | --- |
| Per-path retry guard | LLM hallucinates a different path on each retry (confirmed in live trace), so a per-path list never triggers |
| Tighten or default-lower max_llm_calls | Caps overall reasoning budget; punishes legitimate long-running agents |
| User-side after_tool_callback workaround | Symptomatic; pushes the fix onto every SkillToolset user |
| Add available_resources manifest to L2 load_skill | Defeats the lazy-loading / token-saving design |
| New list_skill_resources tool | Violates the L1→L2→L3 progressive disclosure contract |
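The first row is the least obvious rejection, so here is a small illustrative comparison. It is a sketch, not ADK code: both guard functions are hypothetical, and it assumes every requested path is missing. Each function returns the index of the call at which the guard would fire, or None if it never fires.

```python
from typing import Optional, Sequence


def per_path_guard(paths: Sequence[str]) -> Optional[int]:
    """Fire only when the exact same path fails a second time."""
    seen = set()
    for index, path in enumerate(paths):
        if path in seen:
            return index
        seen.add(path)
    return None


def total_failure_guard(paths: Sequence[str], allowed: int = 1) -> Optional[int]:
    """Fire once the number of failures, across all paths, exceeds `allowed`."""
    failures = 0
    for index, _path in enumerate(paths):
        failures += 1
        if failures > allowed:
            return index
    return None


# Paths taken from the live trace, plus one more invented guess.
hallucinated = [
    "references/reference_doc.md",
    "references/Document1.md",
    "references/another_guess.md",
]

if __name__ == "__main__":
    print(per_path_guard(hallucinated))       # guard never fires
    print(total_failure_guard(hallucinated))  # guard fires on the second call
```

Because each retry uses a new path, the per-path guard has nothing to match against, while a total counter fires on the second failing call regardless of path.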

Minimal Reproduction Code:

import asyncio
from unittest import mock
from google.adk.skills import models
from google.adk.tools import skill_toolset, tool_context

skill = mock.create_autospec(models.Skill, instance=True)
skill.name = "demo"
skill.resources = mock.MagicMock()
skill.resources.get_reference.return_value = None  # every path "missing"

ctx = mock.MagicMock(spec=tool_context.ToolContext)
ctx.state = {}
ctx.invocation_id = "inv1"
ctx._invocation_context = mock.MagicMock()
ctx.agent_name = "agent"

toolset_obj = skill_toolset.SkillToolset([skill])
tool = skill_toolset.LoadSkillResourceTool(toolset_obj)

async def main():
    paths = [
        "references/missing.md",
        "references/other_guess.md",   # different path — LLM hallucination pattern
        "references/yet_another.md",
    ]
    for i, path in enumerate(paths):
        r = await tool.run_async(
            args={"skill_name": "demo", "file_path": path},
            tool_context=ctx,
        )
        print(i, r["error_code"])
    # On main (unpatched): all 3 print RESOURCE_NOT_FOUND — LLM has no reason to stop
    # With fix applied:    call 0 → RESOURCE_NOT_FOUND, calls 1-2 → RESOURCE_NOT_FOUND_FATAL

asyncio.run(main())

How often has this issue occurred?: Always (100%) — deterministic given any skill whose SKILL.md lets the model infer plausible-looking paths that don't literally exist.


Proposed Fix

A two-layer fix is in linked PR #5651:

Code: an invocation-scoped total failure counter inside LoadSkillResourceTool.run_async. The counter tracks the number of RESOURCE_NOT_FOUND responses across all paths within an invocation (not per-path — live testing confirmed the LLM uses a different path on each retry). State key:

temp:_adk_skill_resource_not_found_count_<invocation_id>

  • First failure → RESOURCE_NOT_FOUND (unchanged behavior).
  • Any subsequent failure → RESOURCE_NOT_FOUND_FATAL with an explicit stop instruction and failure count.

The temp: prefix follows ADK's existing convention for state that is not persisted to durable storage. The <invocation_id> suffix keeps counts from different invocations separate on in-memory backends, where temp: keys are not auto-cleared between invocations.
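For readers who want the shape of the counter without opening the PR, a rough sketch follows. It is not the code in the linked PR: `not_found_response` is a hypothetical helper, `state` is a plain dict standing in for the tool context's state, and the stop message is paraphrased from the patched trace. Only the state key format and the two error codes are taken from this issue.

```python
_KEY_PREFIX = "temp:_adk_skill_resource_not_found_count_"


def not_found_response(
    state: dict, invocation_id: str, skill_name: str, file_path: str
) -> dict:
    """Build the error payload for a missing resource (hypothetical helper).

    The counter is keyed by invocation, not by path, so a different
    hallucinated path on each retry still increments the same count.
    """
    key = f"{_KEY_PREFIX}{invocation_id}"
    failures = state.get(key, 0) + 1
    state[key] = failures

    message = f"Resource '{file_path}' not found in skill '{skill_name}'."
    if failures == 1:
        # First miss in this invocation: unchanged soft error.
        return {"error_code": "RESOURCE_NOT_FOUND", "error": message}

    # Any later miss: terminal error with an explicit stop instruction.
    return {
        "error_code": "RESOURCE_NOT_FOUND_FATAL",
        "error": (
            f"{message} This is resource lookup failure #{failures} this "
            "invocation. Do not retry any path; report the error to the "
            "user and stop."
        ),
    }


if __name__ == "__main__":
    state = {}
    for path in ("references/reference_doc.md", "references/Document1.md"):
        print(not_found_response(state, "inv1", "document-classifier", path)["error_code"])
```

Keying on the invocation ID means a new invocation starts from zero again, even if the backing state dict was never cleared.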

Prompt: a no-retry rule and a scope boundary added to _DEFAULT_SKILL_SYSTEM_INSTRUCTION.

Live trace with fix applied (same session, patched build):

SPAN: execute_tool load_skill_resource
  args:       {'file_path': 'references/reference_doc.md', 'skill_name': 'document-classifier'}
  error_code: RESOURCE_NOT_FOUND
  error:      Resource 'references/reference_doc.md' not found in skill 'document-classifier'.

SPAN: execute_tool load_skill_resource
  args:       {'skill_name': 'document-classifier', 'file_path': 'references/Document1.md'}
  error_code: RESOURCE_NOT_FOUND_FATAL
  error:      Resource 'references/Document1.md' not found in skill 'document-classifier'.
              This is resource lookup failure #2 this invocation. Do not retry any path
              — report the error to the user and stop.

Loop terminated on the second call. The model attempted a different path (Document1.md vs reference_doc.md) — exactly the hallucination pattern that a per-path guard would have missed.

Linked PR: #5651

Metadata

Labels

  • request clarification: [Status] The maintainer need clarification or more information from the author
  • tools: [Component] This issue is related to tools
