
refactor: Stainless-style MCP phase one — execute + search_docs #7

Open
Aliiiu wants to merge 121 commits into main from refactor/stainless-style-mcp-phase-one

Aliiiu (Contributor) commented May 14, 2026

Summary

  • Adds a Stainless/CoinGecko-style 2-tool default surface (execute sandboxed JavaScript + search_docs local MiniSearch index) plus two convenience tools (debank_resolve, debank_get_supported_chain_list). 30 of the 31 legacy debank_* tools become hidden by default; opt back in with --legacy-tools or DEBANK_MCP_LEGACY=1.
  • Each service method gains a public *Raw() JSON-returning variant; markdown methods are thin wrappers with separate fetch/format error contexts. Existing v0.1 markdown output is preserved byte-identical (verified by 31-method snapshot regression).
  • New requirement: Node.js ≥ 22 (isolated-vm@^6). Published binary's shebang already passes --no-node-snapshot.
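
The `*Raw()` split described above can be sketched roughly as follows. This is an illustration only: `ChainService`, `fetchChain`, `toMarkdownTable`, and `wrapError` are hypothetical stand-ins, not the repository's actual identifiers. Only the two error-message shapes ("Failed to fetch X" and "Failed to format X response") come from the spec.

```typescript
// Hypothetical sketch: all names below are illustrative, not the repo's real API.
type Chain = { id: string; name: string };

// Wraps an unknown failure with a contextual message, keeping the original as `cause`.
function wrapError(context: string, error: unknown): Error {
  const detail = error instanceof Error ? error.message : String(error);
  return Object.assign(new Error(`${context}: ${detail}`), { cause: error });
}

// Stand-in formatter; the real project has its own markdown formatter.
function toMarkdownTable(chain: Chain): string {
  return `| id | name |\n| --- | --- |\n| ${chain.id} | ${chain.name} |`;
}

class ChainService {
  private readonly fetchChain: (id: string) => Promise<Chain>;

  constructor(fetchChain: (id: string) => Promise<Chain>) {
    this.fetchChain = fetchChain;
  }

  // JSON-returning variant: owns the fetch error context.
  async getChainRaw(args: { id: string }): Promise<Chain> {
    try {
      return await this.fetchChain(args.id);
    } catch (error) {
      throw wrapError("Failed to fetch chain", error);
    }
  }

  // Markdown variant: thin wrapper that owns only the format error context.
  async getChain(args: { id: string }): Promise<string> {
    const data = await this.getChainRaw(args);
    try {
      return toMarkdownTable(data);
    } catch (error) {
      throw wrapError("Failed to format chain response", error);
    }
  }
}
```

A fetch failure surfaces from `getChain()` with the fetch wording because the wrapper does not re-wrap errors thrown by `getChainRaw()`.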

Architecture

src/mcp/
├── execute/     # isolated-vm sandbox + in-sandbox debank client (lazy-loaded)
├── search-docs/ # MiniSearch over committed embedded-index.ts + cookbook .md files
├── instructions/# instructions.md → instructions.generated.ts (committed)
├── legacy/      # pure tool-metadata.ts + side-effectful tool-handlers.ts
└── tools.ts     # debank_resolve + default-surface debank_get_supported_chain_list

Key architectural choices, all documented in docs/superpowers/specs/2026-05-13-stainless-style-mcp-refactor-phase-one-design.md:

  • Three-layer timeout on every debank.* call in the sandbox: 5s AbortController cancels in-flight request, 6s axios timeout (asymmetric so abort wins), host-side Promise.race guarantees per-call resolution.
  • Lazy isolated-vm load — server starts and search_docs works even if the native addon fails to load; only execute returns the canonical "isolated-vm native module failed to load" failure.
  • Side-effect-free tool-metadata.ts consumed by the build-time docs index generator; runtime handlers and service imports live in a separate tool-handlers.ts.
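
A minimal sketch of the per-call timeout layering, under stated assumptions: `callWithTimeout` and the `Transport` type are hypothetical, and `transport` stands in for the real axios-backed request. The 5 s / 6 s defaults and the message format follow the spec; the function is not the code in `src/mcp/execute/`.

```typescript
// Hypothetical sketch; `transport` stands in for the real HTTP call.
type Transport = (opts: { signal: AbortSignal; timeoutMs: number }) => Promise<unknown>;

async function callWithTimeout(
  method: string,
  transport: Transport,
  abortMs = 5_000,
  transportTimeoutMs = 6_000, // strictly larger, so the abort fires first
): Promise<unknown> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  // One timer both cancels the in-flight request and settles the host-side race,
  // so the call resolves even if the transport ignores the abort signal.
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`DeBank call timed out after ${abortMs / 1000}s: ${method}`));
    }, abortMs);
  });

  try {
    return await Promise.race([
      transport({ signal: controller.signal, timeoutMs: transportTimeoutMs }),
      deadline,
    ]);
  } finally {
    // Avoid leaving a pending timer behind after a fast success.
    clearTimeout(timer);
  }
}
```

The transport-level timeout is only a backstop here; the race is what guarantees the caller gets an answer.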

What's in the diff

87 commits total:

  • 52 planning commits — spec (22 review rounds) and implementation plan (28 review rounds) under docs/superpowers/.
  • 35 refactor commits — the implementation, organized by the 30 plan tasks.

Deviations from plan worth noting

  • src/mcp/execute/client.ts switched from ivm.Callback({async:true}) (per plan) to ivm.Reference with a guest-side wrapper that JSON-serializes args and unpacks {ok, data|error} envelopes. The agent-facing contract (plain async fns, errors as catchable exceptions, results as JS objects) is preserved. Zero-arg case (e.g. debank.chain.getSupportedChainList()) is regression-tested.
  • scripts/build-docs-index.ts and scripts/build-instructions.ts run biome format --write on their generated output as a final step. Determinism verified — empty git diff on rerun.
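
The envelope approach in the first deviation can be illustrated without isolated-vm at all. In the sketch below, `makeHostBridge` and `makeGuestFn` are hypothetical names, and a plain async function plays the role of the `ivm.Reference`; the point is only the JSON-in, `{ok, data|error}`-out contract and the zero-arg case.

```typescript
// Hypothetical sketch of the {ok, data|error} envelope; no isolated-vm involved here.
type Envelope = { ok: true; data: unknown } | { ok: false; error: string };

// Host side: takes JSON-encoded args, never throws, always returns a JSON envelope.
function makeHostBridge(impl: (args: unknown) => Promise<unknown>) {
  return async (argsJson: string): Promise<string> => {
    try {
      const data = await impl(JSON.parse(argsJson));
      return JSON.stringify({ ok: true, data: data ?? null });
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      return JSON.stringify({ ok: false, error: message });
    }
  };
}

// Guest side: looks like a plain async function to the caller.
function makeGuestFn(bridge: (argsJson: string) => Promise<string>) {
  return async (args?: unknown): Promise<unknown> => {
    // JSON.stringify(undefined) is not a string, so zero-arg calls are sent as `null`.
    const envelope = JSON.parse(await bridge(JSON.stringify(args ?? null))) as Envelope;
    if (!envelope.ok) throw new Error(envelope.error);
    return envelope.data;
  };
}
```

Because only strings cross the boundary, host failures arrive in the guest as ordinary catchable exceptions.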

Test plan

  • pnpm run build — clean (prebuild regenerates committed docs/instructions deterministically)
  • git diff --exit-code on embedded-index.ts + instructions.generated.ts — clean
  • pnpm lint — 0 errors (warnings are pre-existing noNonNullAssertion style)
  • pnpm test: 88/88 tests across 16 files (8.6s)
  • pnpm exec tsc --noEmit — clean
  • git diff --check origin/main...HEAD — clean (no trailing-whitespace noise)
  • Snapshot regression: 31/31 service methods produce byte-identical v0.1 markdown
  • Lazy-isolated-vm child-process test: server starts, search_docs works, execute returns canonical native-load failure
  • Zero-arg sandbox calls: debank.chain.getSupportedChainList() regression-tested end-to-end
  • Reviewer: smoke-test against a real DEBANK_API_KEY with the published binary's shebang flow

Migration

For users on v0.1.x who depend on the 30 hidden tools:

```bash
# Restore legacy surface
mcp-debank --legacy-tools

# or
DEBANK_MCP_LEGACY=1 mcp-debank
```

New integrations should use `execute` + `search_docs` instead — see `src/mcp/instructions/instructions.md` for the agent-facing operational guide.
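
For reference, the opt-in check could look something like the sketch below. `isLegacyMode` is a hypothetical helper, and treating only the exact value `1` as enabled is an assumption; the actual parsing in `src/index.ts` may differ.

```typescript
// Hypothetical helper; the real flag handling may accept other values.
function isLegacyMode(
  argv: readonly string[],
  env: Record<string, string | undefined>,
): boolean {
  return argv.includes("--legacy-tools") || env.DEBANK_MCP_LEGACY === "1";
}
```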

🤖 Generated with Claude Code

Aliiiu and others added 30 commits May 13, 2026 14:15
Captures decisions and design for the first sub-project of the
multi-phase refactor toward the CoinGecko/Stainless MCP architecture:
execute (isolated-vm sandbox), search_docs (MiniSearch index over
existing tool definitions), curated instructions, and two convenience
tools. The 28 legacy tools move behind a --legacy-tools flag. HTTP
transport, tool removal, OpenAPI authoring, and resolver migration
are deferred to later sub-projects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Revises the spec in response to seven review findings:

- Reframe 0.2.0 as a breaking change (legacy tools hidden by default
  is a wire-surface break); add explicit changelog/README guidance.
- Add a *Raw() method per service to expose parsed JSON to the sandbox;
  fetchWithToolConfig and formatResponse remain protected.
- Rename "TypeScript" to "JavaScript" throughout the execute tool
  (isolated-vm runs raw V8; no TS transpile in phase one).
- Add per-call host-side AbortController + axios timeout (5 s) so a
  stalled DeBank request cannot outlive the isolate's 30 s budget.
- Split legacy/tools.ts into tool-metadata.ts (pure, indexed at build
  time) and tool-handlers.ts (imports services), keeping module-load
  side effects out of the docs index build.
- Normalize the response contract: outer MCP envelope uses isError
  (camelCase, per MCP spec); inner JSON payload uses ok/result/error/
  log_lines/err_lines. Single §4.0 contract section.
- Replace the search_docs no-match hint that referenced raw HTTP
  (sandbox has no fetch) with a pointer to debank_resolve /
  debank_get_supported_chain_list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves the remaining five findings before implementation:

- Decision #11 + §2.2 step 1: thread a new RequestOptions
  ({signal?, timeout?}) parameter through fetchWithToolConfig /
  postWithToolConfig and the four private fetchDirect/fetchViaGateway/
  postDirect/postViaGateway callees. Pure addition with default
  undefined so the legacy path stays byte-identical. The sandbox
  proxy creates an AbortController per call (5 s), passes the signal
  AND axios timeout, and wraps abort rejections with a "DeBank call
  timed out" message.
- §3.1 cold start rewritten: service singletons and the entity
  resolver are always constructed at startup regardless of --legacy-
  tools, because default tools (execute, debank_resolve,
  debank_get_supported_chain_list) need them. Only tool-handlers.ts
  import is conditional. isolated-vm Isolate stays lazy.
- Metadata schema split into legacyMethodPath (markdown) and
  sandboxMethodPath (Raw/JSON) so the two consumers join through
  unambiguous fields.
- New §5b: build:docs + prebuild script, tsx, and full new-dep list
  (isolated-vm, minisearch, vitest, @vitest/coverage-v8, msw, tsx).
- §6: read FastMCP server version from package.json instead of the
  current hardcoded "1.0.0" at src/index.ts:29.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- §2.5 + file tree + §5b: instructions ship via a generated
  instructions.generated.ts (committed) emitted by a new
  build-instructions.ts script in prebuild. Authoring source stays
  in instructions.md but is NOT shipped — the dist-only files field
  in package.json stays unchanged.
- §2.1 + §2.3: tool schema examples switched from the raw MCP
  inputSchema wire shape to the FastMCP-native parameters: z.object
  shape that matches src/tools/index.ts:55. Added a note that
  FastMCP converts Zod to inputSchema on the wire.
- §3.1: cold-start preamble now lists the two distinct module-load
  side effects in a table (services/index.ts → singletons +
  OpenRouter wiring; cache-manager.ts top-level call →
  initializeCacheManager fire-and-forget, reached transitively
  through entity-resolver.ts). Step 5 explicitly imports
  entity-resolver; step 10 clarifies that "lazy" applies to the
  isolated-vm runtime, while each execute call constructs its own
  fresh Isolate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes graceful-degradation paths that don't belong in a Stainless-
style server (required artifacts should fail loudly, not silently
degrade), and corrects two technical-accuracy issues:

- §3.1 step 5: rewritten as three explicit Gemini paths (cached;
  uncached; no key → null). The earlier wording claimed a non-Gemini
  fallback exists in base-resolver.ts:24-40, which is false — both
  branches use google("gemini-2.5-flash"). The current spec describes
  v0.1's actual behavior and notes that killing the Gemini dep is the
  deferred sub-project in §7.
- §4.4 retitled "Required-artifact failures": missing instructions.md
  fails build:instructions; missing instructions.generated.ts or
  embedded-index.ts fails tsc; missing isolated-vm native module
  surfaces at first execute call, not at startup. Removes the prior
  "log warning and start with empty instructions" path.
- §5b: build-instructions.ts emits content via JSON.stringify(markdown)
  instead of a template literal, eliminating backtick/${} escape
  hazards from code examples in the markdown source.
- §5b: zod-to-json-schema 3.x is confirmed in pnpm-lock.yaml but the
  repo runs zod@^4 — implementation plan must verify v4 compatibility
  at first index-builder commit or switch to Zod 4's built-in
  z.toJSONSchema().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
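
To illustrate the JSON.stringify emit from the §5b item above: a quoted string literal treats backticks and `${}` as inert text, so markdown containing code examples needs no extra escaping. `renderInstructionsModule` and the `INSTRUCTIONS` export name are hypothetical, not necessarily what `build-instructions.ts` emits.

```typescript
// Hypothetical generator helper: emits a TS module exporting the markdown as a string.
function renderInstructionsModule(markdown: string): string {
  // JSON.stringify escapes quotes, backslashes, and newlines; backticks and ${...}
  // pass through unchanged because they have no meaning inside a quoted string.
  return `export const INSTRUCTIONS = ${JSON.stringify(markdown)};\n`;
}
```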
- §2.2 Step 2: make legacy error semantics explicit after the *Raw()
  split. The try/catch + logAndWrapError contextual logging
  (token.service.ts:28-43 pattern) moves into *Raw() — the markdown
  wrapper becomes a one-liner with no catch of its own. Test plan
  notes a byte-identical regression check for the markdown output.
- §2.4: drop the never-defined `suggestions` field from
  debank_resolve. Null returns are {resolved: null, error: "<hint>"}
  with a static chain-ID list — matches §4.3 and avoids speccing a
  similarity metric over chainIds.
- §2.5: align the instructions.generated.ts example with §5b — show
  a JSON.stringify-style emit (quoted string) instead of a template
  literal.
- §4.4: drop the prepublishOnly mention. publish-packages already
  runs build directly, which fires prebuild → build:instructions, so
  no extra hook is needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- §2.2 Step 2: restore a thin catch around formatResponse in the
  markdown wrapper. The earlier no-catch shape would have dropped
  the service-context wrap for toMarkdown failures (a silent
  behavior change for legacy callers). Now both error sources —
  network and formatter — get logAndWrapError with the existing
  contextual messages. Two distinct messages: "Failed to fetch X"
  in *Raw(), "Failed to format X response" in the markdown wrapper.
- §5.2: regression-test scope expanded from "one per service" to all
  28 methods, driven by per-method JSON fixtures under
  tests/fixtures/services/. Explicitly notes every response-shape
  variant (single object, flat array, nested array, POST body,
  usd_value_list special case) is covered.
- §4.3: collapse the two debank_resolve null-path messages into one
  canonical string identical to §2.4. Tests assert against a single
  literal, no drift.
- §3.1 step 10: add an implementation note requiring dynamic-import
  of isolated-vm and sandbox.ts. Static imports at the top of
  sandbox.ts or execute/tool.ts would load the native addon at
  server startup, defeating the lazy step-10 contract. Tests verify
  require.cache absence before first execute.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- §2.2: pin down the *Raw() shape for the one transformed method.
  getUserTotalNetCurveRaw returns the literal DeBank wrapper
  {usd_value_list: NetCurvePoint[]}, not the bare array. The
  markdown wrapper unwraps before formatResponse. The cookbook
  example for net-curve queries must show
  (await debank.user.getUserTotalNetCurve({id})).usd_value_list so
  agents see the wrapper. Verified by grep no other method does this.
- §3.1 step 10: drop the require.cache assertion (incorrect for this
  ESM project) and replace with two tests — vi.mock("isolated-vm")
  spy + a CI smoke test that pnpm rm isolated-vm and runs the server.
  Lives in tests/integration/lazy-isolated-vm.test.ts.
- Decision row 7: rewritten to match §2.2's dual-catch model so the
  summary doesn't imply a catch-free one-liner.
- §2.2: add an explicit "intentional error-string refinement" note.
  Today's single catch labels every error "Failed to fetch X" even
  for toMarkdown failures (misleading); after the split, fetch
  errors keep that wording while formatter errors get
  "Failed to format X response". Log scrapers matching on
  "Failed to fetch" still match all fetch errors; only rare
  formatter-failure log lines change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- §5.2 / test plan: fix net-curve response-shape claim.
  getUserChainNetCurve fetches NetCurvePoint[] directly per
  user.service.ts:424; only getUserTotalNetCurve unwraps a
  {usd_value_list: [...]} envelope. Wording now enumerates the four
  variants — single-object, flat array (incl. chain net curve),
  nested-object-containing-array (only total net curve), POST body.
- §2.2: replace "checked via grep" with a direct invariant
  statement: "getUserTotalNetCurve is the only service method that
  unwraps an API response before formatting." Spec states the
  invariant; the grep was how it was verified, not the spec.
- §3.1 step 10: drop the `pnpm rm isolated-vm` CI test — that
  mutates package.json/pnpm-lock.yaml and bleeds across test runs.
  Replace with a child-process test that spawns
  `node --import tests/integration/no-isolated-vm.loader.mjs ...`,
  an ESM resolve hook that throws ERR_MODULE_NOT_FOUND on
  "isolated-vm". Project metadata stays untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three wording/precision fixes, no architectural changes.

- §1 invariants: the "*() returns markdown via
  formatResponse(await thisRaw())" invariant now acknowledges the
  single getUserTotalNetCurve exception (which unwraps
  data.usd_value_list before formatting). Points readers at §2.2
  for the precise rule.
- §2.2 step 1: scope "byte-identical" to "successful markdown
  output and axios request shape" so it stops contradicting the
  later, intentional formatter error-string refinement.
- §5.2: clarify the regression-test harness. Single-version, not
  dual: snapshot v0.1 markdown into tests/snapshots/services/
  before any service refactor; after the refactor, the running v0.2
  code is asserted equal to those committed snapshots. No
  dual-version test runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- §3.1 + §3.1-preamble: fix the double-registration bug for
  debank_get_supported_chain_list. Step 8 (legacy registration) now
  filters out that name since step 7 already registered it as a
  default convenience tool. FastMCP rejects duplicate names, so
  this would have been a startup error under --legacy-tools. Now
  27 legacy handlers register, not 28.
- §2.3: new explicit rule that index-builder.ts strips any
  underscore-prefixed parameter (currently _userQuery) when
  emitting the docs index. _userQuery exists for the legacy JQ-
  filter context machinery and is meaningless inside execute().
  Teaching agents to pass it from sandbox code would be misleading.
  Legacy tool-handlers.ts registration keeps the field — that's
  where it actually does work. Test asserts indexed params for
  debank_get_chain contain id but not _userQuery.
- §1 invariants: line 49 wording tightened. Markdown formatter
  (toMarkdown) and JQ filter (LLMDataFilter) are LEGACY-PATH-ONLY;
  the sandbox calls *Raw() directly and never touches
  formatResponse. Implementers must not call toMarkdown or
  LLMDataFilter from *Raw() or the sandbox proxy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
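
The underscore-stripping rule from §2.3 above amounts to a key filter. A sketch, with `ParamSchema` and `stripInternalParams` as hypothetical names and a simplified parameter shape (the real index builder works from Zod schemas):

```typescript
// Hypothetical, simplified parameter shape for illustration.
type ParamSchema = Record<string, { type: string; description?: string }>;

// Drops every underscore-prefixed parameter (e.g. _userQuery) before indexing.
function stripInternalParams(params: ParamSchema): ParamSchema {
  return Object.fromEntries(
    Object.entries(params).filter(([name]) => !name.startsWith("_")),
  );
}
```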
- Counting fix throughout: the legacy tool count is 31, not 28
  (verified with `grep -c '^\s*name: \"debank_' src/tools/index.ts`).
  Every tool-count reference in the spec is updated: invariants,
  decision rows, refactor scope, test plan, cold-start sequence,
  release plan, deferred items. The only remaining "28" is a
  token.service.ts:28-43 file-line citation, which is correct.
- Legacy-mode math fixed accordingly: 30 of 31 handlers register
  under --legacy-tools (not 27 of 28), because
  debank_get_supported_chain_list is still owned by the default
  surface and FastMCP rejects duplicate names.
- §6 release copy and changeset text rewritten to say "30 of the 31
  legacy debank_* tools are now hidden by default;
  debank_get_supported_chain_list remains visible as a default
  grounding tool." Less ambiguous than the previous "the 28 tools
  are hidden" framing.
- §5.2 execute/client.ts row: extend test coverage from just
  debank.resolveChain to all three sandbox-facing resolvers —
  resolveChain, resolveChains (success + null-on-any-fail), and
  resolveWrappedToken (success path + null for unknown chain ID
  and null for chain without a wrapped token).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both findings address gaps in the integration test plan that would
let tests pass on one developer's machine and fail (or hit real LLMs)
on another's.

- §5.3: new "Test environment setup" boilerplate at the top of the
  integration section, mandatory in every integration test file via
  vitest setupFiles. Sets a dummy DEBANK_API_KEY (env.ts:18-29
  refuses to import without it), then deletes IQ_GATEWAY_URL,
  IQ_GATEWAY_KEY (base.service.ts:55 routes through the gateway
  whenever both are present — a local .env leak would silently
  bypass the MSW mock for pro-openapi.debank.com),
  GOOGLE_GENERATIVE_AI_API_KEY, and OPENROUTER_API_KEY (no
  accidental real LLM calls). Notes that import order matters —
  services must be imported after setup.ts runs.
- §5.3 resolveChain integration bullet: explicitly mock
  src/lib/entity-resolver via vi.mock so "Polygon" → "matic" is
  deterministic. The previous bullet relied on Gemini being
  available, which it isn't in CI and shouldn't be in any
  integration test. The unit test in §5.2 (mcp/tools.ts row) still
  covers the actual resolver behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two test-isolation hardenings.

- §5.3 test setup: neutralize dotenv before env.ts is imported. The
  previous boilerplate `delete process.env.IQ_GATEWAY_URL` left the
  key undefined, so when env.ts's top-level `dotenv.config()` ran it
  would populate any undefined key from a developer's .env file —
  letting a local IQ_GATEWAY_URL silently bypass the MSW mock for
  pro-openapi.debank.com. New boilerplate uses
  `vi.mock("dotenv", () => ({ config: () => ({parsed: {}}) }))`
  followed by `vi.stubEnv` for each key. Empty strings fail Zod's
  .min(1), so optional fields become undefined per env.ts:18-29.
- §5.2: unit-test rows that exercise the resolver from inside the
  sandbox now explicitly note `vi.mock("../../src/lib/entity-
  resolver", ...)` with fixed stubs. The previous wording
  ("debank.resolveChain('BSC') → 'bsc'") could be read as exercising
  the real Gemini-backed resolver, which would fail without a key
  and burn money with one. Resolver-accuracy testing is explicitly
  out of scope for the automated suite (Gemini calls not exercised).
  resolveWrappedToken stays unmocked — it's a pure chains.ts lookup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- §5.3 test setup: empty-string stubs would have crashed env.ts.
  IQ_GATEWAY_URL is z.url().optional() — empty string fails URL
  validation, not "treated as undefined." Other optional fields use
  z.string().min(1).optional() — empty also fails. Replace
  vi.stubEnv("", "") with delete process.env.X. The vi.mock("dotenv")
  step still closes the .env-reload loophole that motivated the
  previous wording.
- §3.1 step 10 / lazy-isolated-vm child-process test: vi.mock does
  not extend to spawned children — they run real dotenv.config() at
  startup, which would load the developer's .env (Gemini key →
  cache init → unwanted network call). Spawn now passes an explicit
  sanitized env block (only PATH, NODE_ENV, DEBANK_API_KEY,
  DOTENV_CONFIG_PATH=/dev/null) and a cwd in os.tmpdir() so
  dotenv's relative .env lookup finds nothing.
- §5.2 / §5.3: vi.mock specifiers now include the .js extension to
  match the runtime import string in this NodeNext project (e.g.
  src/tools/index.ts:7 imports "../lib/entity-resolver.js"). Vitest
  matches specifiers verbatim against the implementation import;
  dropping the extension would silently skip the mock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
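
The sanitized child environment described above might be assembled along these lines. `buildChildOptions` is a hypothetical helper, and the `NODE_ENV` and `DEBANK_API_KEY` values are placeholder assumptions; only the four allowed keys and the temp-dir cwd come from the spec.

```typescript
import { mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: builds spawn options that keep a developer's .env out of the child.
function buildChildOptions(parentEnv: Record<string, string | undefined>) {
  return {
    // Fresh empty directory, so a relative `.env` lookup finds nothing.
    cwd: mkdtempSync(join(tmpdir(), "mcp-debank-test-")),
    // Allowlist only: gateway and LLM keys from the parent are never inherited.
    env: {
      PATH: parentEnv.PATH ?? "",
      NODE_ENV: "test", // placeholder value (assumption)
      DEBANK_API_KEY: "dummy-key", // placeholder value (assumption)
      DOTENV_CONFIG_PATH: "/dev/null",
    },
  };
}
```

The returned object is shaped so it can be spread into the options argument of `child_process.spawn`.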
- §3.1 step 10 lazy-isolated-vm smoke test: relative paths
  (./tests/..., dist/index.js) wouldn't resolve from a temp cwd.
  Updated the example to resolve both the loader and the entrypoint
  to absolute paths via fileURLToPath(import.meta.url) + path.resolve
  against the repo root before passing to spawn. Adds the mkdtempSync
  cwd construction inline.
- §5.2 + §5.3 resolver mocks: vi.mock({factory}) replaces every
  export. resolveWrappedToken (pure, no LLM) would lose its real
  implementation under a factory that only declares resolveChain.
  Switch to the partial-mock pattern using vi.mock(spec, async
  (importOriginal) => ({ ...await importOriginal(), resolveChain:
  vi.fn(...) })). resolveWrappedToken keeps the real chains.ts
  lookup; only the LLM-backed exports get stubs.
- §2.3 docs index: harmonize the field name. The metadata example
  used `description`, the MiniSearch boost referenced `summary`, and
  the result example used `summary`. Standardize on `description`
  throughout — one name, mirroring src/tools/index.ts:54 where each
  tool definition already has a description. New note explains why
  we don't carry CoinGecko's separate summary+description split
  (their OpenAPI spec has both; ours doesn't).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two corrections to the isolated-vm wiring spec — the previous text
named the wrong primitives and underspecified transfer options.

- §2.2 Step 3: replace "isolated-vm Reference" with
  "ivm.Callback({ async: true })". A Reference requires guest-side
  .apply(...) invocation, which would force hand-written wrapper
  functions in the guest to make debank.user.getX(args) feel native.
  A Callback constructed with { async: true } transfers into the
  guest context as a plain async function — the agent's run(debank)
  body can call it directly via await. Resolver helpers wired the
  same way; resolveWrappedToken uses { async: false } since it's a
  pure chains.ts lookup. Includes a host-side wiring sketch showing
  evalClosure + Callback construction.
- §2.1 Step 5: make the script.run TransferOptions explicit.
  Without promise: true the host receives a Reference to the
  IIFE's unresolved Promise, not the resolved value. Without an
  explicit copy: true (or externalCopy) the complex return can
  come back as a Reference the host can't serialize. Step 5 now
  specifies { timeout: 30_000, promise: true, reference: false,
  copy: true } and links to the isolated-vm TransferOptions docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three isolated-vm wiring corrections.

- §2.2 wiring sketch: the guest-facing name is getUserChainBalance,
  not getUserChainBalanceRaw. The previous evalClosure body
  installed the Raw suffix on globalThis.debank.user, contradicting
  the agent-facing shape in §2.2 and the metadata 'qualified' field
  in §2.3. Fix: evalClosure installs the agent name; only the host
  callback body calls userService.getUserChainBalanceRaw. Added an
  explicit "Naming asymmetry is deliberate" note explaining that
  sandboxMethodPath in metadata is the host-side lookup, not what
  the agent types.
- §2.2 callback signature: ivm.Callback args are copied into the
  host function, not Reference objects. Removed the misleading
  `argsRef: ivm.Reference<unknown>` + `.copySync()` dance; the
  callback signature is `async (args: { chain_id: string; id:
  string })` directly. References the Callback section in the
  isolated-vm docs.
- §2.1 step 5 TransferOptions: drop reference: false. Transfer
  options are positive flags — pick one of copy / externalCopy /
  reference and set it true. reference: false was noise (and could
  fail TS typings). Final shape: { timeout: 30_000, promise: true,
  copy: true }.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- §5b: add a "pretest": "pnpm run build" script to package.json so
  pnpm test always builds first. Without it, CI's pnpm test runs
  the child-process smoke test against a missing or stale dist/.
- §5.1: add an explicit required vitest.config.ts with
  test.setupFiles pointing at tests/integration/setup.ts. Without
  the config file, Vitest doesn't load the setup, the dotenv mock
  + env pruning silently don't run, and a developer's local .env
  leaks into the test process — defeating all the work in §5.3.
- §2.2 step 3 intro: drop the stale getUserChainBalanceRaw example.
  Agent-facing surface uses getUserChainBalance everywhere; the
  Raw suffix is a host-side implementation detail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- §2.2 wiring sketch: move `{ release: true }` from
  Reference#set's transfer-options arg (where it is either rejected
  by TypeScript or silently ignored) onto ExternalCopy#copyInto,
  which is what isolated-vm actually accepts the flag on. Added a
  comment explaining the placement.
- §5.2 execute/client.ts row: split into two explicit assertion
  groups. Group (a) — Naming-asymmetry forwarding — is new: spy on
  userService.getUserChainBalanceRaw and assert that an execute()
  call writing d.user.getUserChainBalance (no Raw) routes through
  that spy. This is the load-bearing §2.2 contract — agent never
  sees Raw, host dispatches to Raw — and was previously implied
  but not asserted directly. Group (b) — Resolver-helper coverage
  via partial mock — is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- §2.1 step 5: add an outer host-side Promise.race wall-clock guard
  around script.run. The isolated-vm script timeout only guards V8
  execution, not arbitrary Promise settlement — a guest body like
  `async function run(){ await new Promise(() => {}); }` would hang
  the host indefinitely because no DeBank call ever fires the per-
  call 5 s timeout. On race timeout the implementation disposes the
  isolate (frees V8 heap) and throws "Execute timed out after 30s".
  Added integration test for never-settling promise.
- §3.1 step 10 / lazy-isolated-vm smoke test: --import only
  preloads a module; it does NOT auto-install resolve hooks. Per
  node:module docs the preload must call register() explicitly.
  Split the loader into two files: no-isolated-vm.register.mjs
  (calls register on the hooks file) and no-isolated-vm.hooks.mjs
  (the actual resolve function throwing ERR_MODULE_NOT_FOUND on
  "isolated-vm"). Updated the spawn command to pass the register
  file to --import.
- §3.1 step 10 lazy-load note: isolated-vm is CommonJS; dynamic-
  importing it from ESM gives a namespace where the package
  exports live under .default. Require normalization:
  `const mod = await import("isolated-vm"); const ivm = mod.default
  ?? mod;`. Without this, ivm.Isolate is undefined on some Node
  versions with a confusing TypeError at first execute.
- §2.2 callback body + step 3 narrative: avoid the
  AbortController-vs-axios-timeout race. Set abort at 5_000 ms and
  axios timeout at 6_000 ms — strictly larger so abort wins under
  normal conditions. Detect both abort (controller.signal.aborted)
  and axios timeout error codes (ECONNABORTED, ETIMEDOUT) in the
  catch. Either path emits the one canonical
  "DeBank call timed out after 5s: <method>" message. Tests assert
  both paths produce the same string.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
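
A sketch of the single-message rule from the last item above: whichever timer fires, the caller sees one canonical string. `toCanonicalTimeout` is a hypothetical helper; it assumes the thrown error still carries a `code` property.

```typescript
// Hypothetical helper: maps either timeout path to the one canonical error, else null.
function toCanonicalTimeout(
  error: unknown,
  signal: AbortSignal,
  method: string,
): Error | null {
  const code = (error as { code?: unknown } | null)?.code;
  const timedOut =
    signal.aborted || code === "ECONNABORTED" || code === "ETIMEDOUT";
  return timedOut
    ? new Error(`DeBank call timed out after 5s: ${method}`)
    : null;
}
```

A `null` return means the error is not a timeout and should propagate unchanged.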
- §2.1 step 5: clear the outer-timeout handle on the success path
  via try/finally so a successful execute doesn't leave a 30 s
  pending callback that fires dispose() on an already-disposed
  isolate (harmless but noisy, breaks fake-timer tests).
  Centralize disposal behind a `disposed` flag for idempotency.
- §2.2 trailing paragraph: replace the stale "30 s outer timeout
  is isolated-vm's built-in script timeout" wording — it
  contradicted the round-20 fix in §2.1. The paragraph now lists
  three layered timers explicitly: script.run timeout (V8 guard),
  outer Promise.race (wall-clock guard), per-call AbortController
  (5 s) + axios timeout (6 s). The §2.1 outer race is the only
  one with a true wall-clock guarantee.
- §5 / lazy-isolated-vm: replace stale "loader is no-isolated-vm.
  loader.mjs" wording with the round-20 split: register.mjs (the
  --import target calling register()) + hooks.mjs (the resolve
  hook). Prevents the implementation plan from creating both old
  and new files.
- §2.2 + §3.2 data-flow narrative: fix two stale "timeout: 5000"
  examples to match the round-20 5 s/6 s split (AbortController
  5 s, axios 6 s strictly larger so abort wins first). The 5/6
  asymmetry is load-bearing for the canonical timeout-error
  rewrite — stale 5/5 examples would have led implementers to
  reintroduce the race.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
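
The outer wall-clock guard with cleanup on both paths could be sketched as below. `runWithDeadline` is a hypothetical name, and `dispose` stands in for disposing the isolate; disposing on the success path as well reflects the fresh-isolate-per-call model and is an assumption about where the real code does it.

```typescript
// Hypothetical sketch; `dispose` stands in for isolate disposal.
async function runWithDeadline<T>(
  work: Promise<T>,
  dispose: () => void,
  deadlineMs = 30_000,
): Promise<T> {
  let disposed = false;
  // Idempotent: safe to call from both the timer and the finally block.
  const disposeOnce = () => {
    if (disposed) return;
    disposed = true;
    dispose();
  };

  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      disposeOnce();
      reject(new Error(`Execute timed out after ${deadlineMs / 1000}s`));
    }, deadlineMs);
  });

  try {
    return await Promise.race([work, deadline]);
  } finally {
    // Clear the handle so a successful run leaves no pending 30 s callback.
    clearTimeout(timer);
    disposeOnce();
  }
}
```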
- §2.2: the round-20 axios-timeout fallback path (err.code ===
  ECONNABORTED / ETIMEDOUT) silently can't fire today because
  extractErrorMessage flattens every AxiosError to new Error(msg),
  discarding code and cause (error-handler.ts:22). Phase one
  patches extractErrorMessage to preserve both: wrap with
  { cause: error } and copy error.code onto the wrapper. Without
  this fix the abort-vs-axios race fix degrades to "only abort
  wins reach the canonical message." Added a §5.2 unit-test row
  for error-handler.ts asserting the .code and .cause preservation
  for AxiosError input.
- §4.1 sandbox-failure table: add the missing row for the outer
  host-side Promise.race timeout (round-20). Distinguishes V8
  script-execution timeout ("Script timed out after 30s") from the
  wall-clock never-settling-promise timeout ("Execute timed out
  after 30s"). The behavior was specified in §2.1 and tested in
  §5.3 but missing from the §4.1 failure table.
- §1 invariants: replace the stale "Dual timeout" invariant with a
  three-layer description matching §2.1 + §2.2 + §3.1: script.run
  timeout (V8 guard), outer Promise.race (true wall-clock guard),
  per-call AbortController + axios. Explicit that only the outer
  race provides a true wall-clock guarantee.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
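
The `code`/`cause` preservation described in the first item above might look like the following. `wrapPreservingCode` is a hypothetical stand-in; the actual change is to `extractErrorMessage` in `error-handler.ts`, whose signature is not shown here.

```typescript
// Hypothetical stand-in for the patched error wrapper.
function wrapPreservingCode(
  error: unknown,
  context: string,
): Error & { code?: string; cause?: unknown } {
  const source = error as { message?: unknown; code?: unknown } | null;
  const detail =
    typeof source?.message === "string" ? source.message : String(error);
  const wrapped: Error & { code?: string; cause?: unknown } = new Error(
    `${context}: ${detail}`,
  );
  // Keep the original error reachable, and copy its code so callers can still
  // match on ECONNABORTED / ETIMEDOUT after wrapping.
  wrapped.cause = error;
  if (typeof source?.code === "string") wrapped.code = source.code;
  return wrapped;
}
```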
30 sequenced tasks with bite-sized steps covering:
- Deps + scripts (Tasks 1-2)
- Vitest config + test setup with dotenv mock and env pruning (3-4)
- error-handler axios code/cause preservation (5)
- RequestOptions through BaseService (6)
- Snapshot v0.1 markdown for all 31 service methods (7)
- Service-by-service *Raw() + dual-catch refactor (8-12)
- Pure tool-metadata + side-effectful tool-handlers (13-14)
- Docs index build + 10 cookbook entries (15)
- Instructions.md + generator + generated TS (16)
- Sandbox (isolated-vm, three-layer timeout, blocklist) (17)
- In-sandbox debank client (Callbacks, dual timeout, error preservation) (18)
- Execute MCP tool with lazy load (19)
- Search_docs MCP tool (20)
- Convenience tools (21)
- New src/index.ts wiring + version from package.json (22)
- Integration tests: execute, search_docs, legacy mode, lazy isolated-vm (23-26)
- Service snapshot regression (27)
- CI workflow (28)
- Release artifacts (29)
- Final integration verification (30)

Plan implements docs/superpowers/specs/2026-05-13-stainless-style-mcp-
refactor-phase-one-design.md (commit 78aa170).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… plan

- High: Task 19 executeTool now catches isolated-vm load failure and
  emits the canonical "isolated-vm native module failed to load…
  pnpm rebuild isolated-vm. Original error: …" message in the
  {ok:false} envelope (spec §4.4). Added a unit test that mocks
  runInSandbox to throw ERR_MODULE_NOT_FOUND. The lazy-isolated-vm
  smoke test only covered server startup; this closes the gap on
  the actual execute() failure path.
- High: Task 14 tool-handlers restores the v0.1 debank_get_chain
  per-tool resolve. resolveEntities() doesn't handle args.id as a
  chain (it only handles id-as-token-when-chain_id-set), so a generic
  call would have regressed debank_get_chain({id:"Ethereum"}).
  Special-case preserved with an explicit comment + regression test.
- Medium: Task 8 zero-arg raw method signature shimmed —
  getSupportedChainListRaw now takes (_args?: Record<string, never>,
  options?: RequestOptions) so the sandbox dispatcher's (args, options)
  call convention doesn't drop the AbortController/timeout for
  truly-zero-arg methods.
- Medium: Task 7 scripts/snapshot-baseline.ts sets DEBANK_API_KEY +
  deletes gateway/LLM vars at the very top, BEFORE any src/ import.
  env.ts refuses to parse otherwise; the vitest setup file doesn't
  apply to standalone tsx scripts.
- Medium: Task 17 sandbox SCRIPT_DEADLINE_MS reads from
  DEBANK_MCP_SANDBOX_DEADLINE_MS env override (test-only knob).
  Task 23 never-settling test sets it to 1000 ms and uses
  vi.resetModules to import a fresh sandbox — test now completes
  in ~1s instead of 30s. Error message interpolates the actual
  deadline so logs stay truthful.
- Low: Task 8 type names corrected — ChainInfo (types.ts:13) and
  GasMarket (types.ts:302) instead of fabricated Chain / GasPrices.
  Added a note that all service Raw types should import from
  ../types.js with the actual exported names.
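
A rough sketch of the Task 17 deadline knob combined with an outer wall-clock race. `readDeadlineMs`, `withWallClockDeadline`, and the `Envelope` type are hypothetical names for illustration; only the env variable name and the "Execute timed out after …" wording come from the plan. The 30s default is taken from the failure-table text and is otherwise an assumption.

```typescript
type Envelope<T> = { ok: true; result: T } | { ok: false; error: string };

// Read the test-only override; fall back to 30s when unset or invalid.
function readDeadlineMs(env: Record<string, string | undefined>): number {
  const parsed = Number(env.DEBANK_MCP_SANDBOX_DEADLINE_MS);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 30_000;
}

// Race the work against a timer so a never-settling promise still resolves.
// The returned promise never rejects: failures become {ok:false} envelopes.
function withWallClockDeadline<T>(
  work: Promise<T>,
  deadlineMs: number,
): Promise<Envelope<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Envelope<T>>((resolve) => {
    timer = setTimeout(
      () =>
        resolve({
          ok: false,
          // Interpolate the actual deadline so logs stay truthful.
          error: `Execute timed out after ${deadlineMs / 1000}s`,
        }),
      deadlineMs,
    );
  });
  const settled = work.then(
    (result): Envelope<T> => ({ ok: true, result }),
    (err: unknown): Envelope<T> => ({
      ok: false,
      error: err instanceof Error ? err.message : String(err),
    }),
  );
  return Promise.race([settled, timeout]).then((envelope) => {
    clearTimeout(timer);
    return envelope;
  });
}

const neverSettles = new Promise<never>(() => {});
withWallClockDeadline(
  neverSettles,
  readDeadlineMs({ DEBANK_MCP_SANDBOX_DEADLINE_MS: "50" }),
).then((envelope) => console.log(envelope));
```

Reading the deadline through a function argument rather than `process.env` directly keeps the sketch testable without module resets; the plan's own approach uses `vi.resetModules` instead.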

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… plan

- Medium: Task 8 ChainService snippets now reproduce v0.1 formatResponse
  options byte-identical — getChain title is "Chain Information:
  ${data.name}" (not "Chain Information"), getGasPrices title is
  "Gas Prices for Chain: ${args.chain_id}" with
  numberFields: ["price", "front_tx_count", "estimated_seconds"].
  Without this, the snapshot regression would have caught the mismatch
  but the engineer would have shipped wrong markdown to whoever was
  rebuilding the baseline. Added a stronger directive that ALL
  formatResponse options across Tasks 8-12 must be copied verbatim
  from the v0.1 method body.
- Medium: Task 23 resolveChain-inside-execute test now calls
  vi.resetModules() BEFORE vi.doMock() and again in the finally
  block. Earlier tests in the same file have already imported
  executeTool, which lazy-imports ./sandbox.js and ./client.js;
  those modules cache the original (real) entity-resolver. Without
  the reset, doMock wouldn't intercept the cached import chain and
  the test could hit the real Gemini path.
- Medium: Task 26 lazy-isolated-vm test now drives the MCP stdio
  handshake explicitly (initialize → notifications/initialized →
  tools/list) and parses line-delimited JSON-RPC responses. The
  previous "any stdout means ready" gate would deadlock because
  FastMCP servers don't proactively announce ready — the client
  initiates the handshake. The new test also asserts the actual
  tool names registered, which is stronger evidence that
  isolated-vm wasn't required to reach server.start.
- Low: Task 14 expected test count corrected from 3 to 4 after
  adding the debank_get_chain regression test in round-1.
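
The handshake-driving approach for Task 26 can be sketched roughly as follows. This is not the project's test: `handshakeLines`, `drainLines`, and `toolNamesFrom` are hypothetical helpers, the request ids are arbitrary, and the `protocolVersion` and `clientInfo` values are assumptions. The sketch only covers building the three client messages and parsing line-delimited JSON-RPC replies from a stdout buffer.

```typescript
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: { code: number; message: string };
}

// The client initiates: initialize, then the initialized notification, then tools/list.
function handshakeLines(): string[] {
  const messages: JsonRpcMessage[] = [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2024-11-05", // assumed version string
        capabilities: {},
        clientInfo: { name: "handshake-sketch", version: "0.0.0" },
      },
    },
    { jsonrpc: "2.0", method: "notifications/initialized" },
    { jsonrpc: "2.0", id: 2, method: "tools/list", params: {} },
  ];
  return messages.map((message) => JSON.stringify(message) + "\n");
}

// Split a stdout buffer into complete JSON lines; keep the trailing partial line.
function drainLines(buffer: string): { messages: JsonRpcMessage[]; rest: string } {
  const parts = buffer.split("\n");
  const rest = parts.pop() ?? "";
  const messages: JsonRpcMessage[] = [];
  for (const line of parts) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      messages.push(JSON.parse(trimmed) as JsonRpcMessage);
    } catch {
      // Non-JSON output (for example a stray log line) is skipped.
    }
  }
  return { messages, rest };
}

// Pull the registered tool names out of the reply with the given id.
function toolNamesFrom(messages: JsonRpcMessage[], id: number): string[] | undefined {
  const reply = messages.find((message) => message.id === id);
  const result = reply?.result as { tools?: Array<{ name: string }> } | undefined;
  return result?.tools?.map((tool) => tool.name);
}
```

In a child-process test, each string from `handshakeLines()` would be written to the child's stdin and `drainLines` applied to accumulated stdout until the `tools/list` reply arrives.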

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… plan

- Medium: Task 17 runInSandbox now wraps getIvm() + Isolate
  construction INSIDE the try block, so a native isolated-vm load
  failure returns the canonical {ok:false, error: "isolated-vm
  native module failed to load. … pnpm rebuild isolated-vm. Original
  error: …"} payload instead of rejecting. The catch detects the
  load-failure path by checking `isolate === undefined`. executeTool's
  outer try/catch (Task 19) is now belt-and-braces only — the
  contract that "runInSandbox never rejects" is the primary
  guarantee. Explicit contract note added at the end of Task 17.
- Medium: Task 26 lazy-isolated-vm test child env now also sets
  DEBANK_MCP_LEGACY: "1" alongside the --legacy-tools argv flag.
  Belt-and-braces against any argv-position quirk in the child;
  same tools/list assertion still verifies the registration path.
- Low: Task 23 expected-output text now says the never-settling
  test takes ~1s (matches the DEBANK_MCP_SANDBOX_DEADLINE_MS=1000
  override applied in round-1), not 30s.
- Low: Task 30 sanity-check now drives the MCP stdio handshake
  explicitly via a heredoc (initialize → notifications/initialized
  → tools/list). The previous text claimed FastMCP "prints a
  handshake on stdout" — it doesn't; clients initiate. Matches the
  lazy-isolated-vm test's handshake driving from round-2.
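
A simplified sketch of the "runInSandbox never rejects" contract from Task 17. It does not use isolated-vm: `IsolateLike`, `loadIsolate`, and `runNeverRejecting` are hypothetical stand-ins, and the load-failure text is a placeholder rather than the canonical message.

```typescript
type SandboxResult = { ok: true; result: unknown } | { ok: false; error: string };

// Hypothetical minimal surface standing in for a real isolate.
interface IsolateLike {
  run(code: string): Promise<unknown>;
  dispose(): void;
}

async function runNeverRejecting(
  loadIsolate: () => Promise<IsolateLike>,
  code: string,
): Promise<SandboxResult> {
  let isolate: IsolateLike | undefined;
  try {
    // Loading happens INSIDE the try so a native-module failure is caught too.
    isolate = await loadIsolate();
    return { ok: true, result: await isolate.run(code) };
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    if (isolate === undefined) {
      // Nothing was constructed, so the failure came from the load path.
      return {
        ok: false,
        error: `sandbox load failed (placeholder text). Original error: ${detail}`,
      };
    }
    return { ok: false, error: detail };
  } finally {
    isolate?.dispose();
  }
}

runNeverRejecting(
  () => Promise.reject(new Error("ERR_MODULE_NOT_FOUND")),
  "return 1",
).then((outcome) => console.log(outcome));
```

Because every path returns an envelope, a caller's outer try/catch becomes a secondary safeguard rather than the primary guarantee.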

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… plan

- Medium: Task 21 default `debank_get_supported_chain_list` is now
  v0.1-shape preserved — accepts `_userQuery`, pipes through
  setQuery on every service, full v0.1 description text. The
  previous stripped-down convenience version would have shipped a
  semantic break: legacy mode skips this tool as a duplicate
  (§3.1 step 8), so users on --legacy-tools never got the original
  shape back. The default registration is now the byte-identical
  v0.1 tool from src/tools/index.ts:52-62.
- Medium: Task 16 instructions.md no longer claims "The server
  already retries upstream transient errors." That contradicts
  spec §4.5 — no server-side retry. Replaced with explicit
  agent-side retry guidance plus a worked retry-loop example
  showing the in-single-execute-body pattern (variables don't
  persist between execute calls).
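
The agent-side retry guidance might look roughly like the sketch below. `retryTransient` and its `call` parameter are hypothetical; the sketch retries immediately with no backoff, because whether timers are available inside the sandbox is not stated here. The whole loop lives in one function body, mirroring the point that state does not carry over between execute calls.

```typescript
// Retry a call up to maxAttempts times and rethrow the last error if all fail.
async function retryTransient<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Remember the failure and fall through to the next attempt.
      lastError = err;
    }
  }
  throw lastError;
}

// Usage with a stand-in call that fails twice before succeeding.
let attemptsSeen = 0;
const flakyCall = async (): Promise<string> => {
  attemptsSeen += 1;
  if (attemptsSeen < 3) throw new Error("upstream 503");
  return "chain list";
};
retryTransient(flakyCall).then((value) => console.log(value, attemptsSeen));
```

Throwing after the final attempt, rather than returning an error-shaped value, keeps the failure visible to whatever wraps the body.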

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… plan

- Medium: Task 18 makeTimeoutWrapped now wraps the *Raw() return
  in new ivm.ExternalCopy(result) per spec §2.2 step 3. ivm is
  threaded in as the first parameter (it's only available inside
  installDebankClient after the dynamic import). Resolver helpers
  also wrap their returns in ExternalCopy for the same boundary-
  contract reason. Without this, complex DeBank responses (deeply
  nested objects) can surface as Reference handles rather than
  copied values on the guest side.
- Medium: Task 16 instructions.md error-handling section corrects
  the throw-vs-return semantics. runInSandbox returns
  {ok:true, result:<value>} for ANY returned value, including
  error-shaped objects like { error: "..." }. Only uncaught throws
  produce {ok:false}. The previous wording would have taught
  agents to return-instead-of-throw and silently miss failures
  in client-side error checks.
- Medium/Low: Task 26 lazy-isolated-vm child-process test now
  follows the initialize → notifications/initialized → tools/list →
  tools/call execute flow. The fourth step asserts isError:true and
  the canonical "isolated-vm native module failed to load… pnpm
  rebuild isolated-vm" payload. This is end-to-end proof of the
  lazy-loading contract: the previous test only proved that
  registration works without the addon, not that execute actually
  fails gracefully when the addon is genuinely unloadable.
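
To illustrate the throw-vs-return semantics described for Task 16, here is a small synchronous sketch. `runBody` and `ExecuteEnvelope` are hypothetical; the sketch only models the rule that any returned value yields ok:true and only an uncaught throw yields ok:false.

```typescript
type ExecuteEnvelope = { ok: true; result: unknown } | { ok: false; error: string };

// Wrap a body the way the envelope rule is described: returns pass through,
// uncaught throws become failures.
function runBody(body: () => unknown): ExecuteEnvelope {
  try {
    return { ok: true, result: body() };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Returning an error-shaped object still counts as success.
const returnedError = runBody(() => ({ error: "chain not found" }));

// Throwing is what actually signals failure.
const thrownError = runBody(() => {
  throw new Error("chain not found");
});

console.log(returnedError.ok, thrownError.ok);
```

A client that checks only the `ok` flag would miss the first case, which is why the corrected wording steers agents toward throwing.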

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… plan

- Medium: Task 13 side-effect-freeness test is now an honest child-
  process check. The previous in-process "does NOT import" assertion
  was a false positive: tool-metadata.js was already imported at the
  top of the test file (cached), and the vitest setupFiles always
  sets DEBANK_API_KEY so a transitive env.ts import wouldn't even
  throw. The new test spawns node with no env vars and imports
  tool-metadata from dist — if anything in the import graph touches
  env.ts the Zod refine fails and the child exits non-zero. Tests
  the load-time invariant honestly.
- Medium: Task 16 instructions wrapped-token section now says the
  agent MUST call debank.resolveWrappedToken explicitly with a
  worked example. The previous wording ("automatically resolve")
  was wrong — the sandbox client forwards args verbatim; nothing
  intercepts the keywords. Matches the actual client.ts behavior
  in Task 18.
- Low: Task 13 first metadata entry description is now byte-
  identical to v0.1 src/tools/index.ts:54 — restored the final
  sentence "Use this to discover available chains before calling
  other chain-specific endpoints." Added a stronger note in Step 2
  that descriptions must be copied verbatim from v0.1 (the snapshots
  test catches body drift; description drift would have shipped
  unnoticed since metadata isn't snapshot-tested).
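
The child-process check from Task 13 could be approximated as below. This is an assumption-laden sketch, not the project's test: `importsWithEmptyEnv` is a hypothetical helper, and it relies on Node passing the first extra command-line argument as `process.argv[1]` when `-e` is used.

```typescript
import { spawnSync } from "node:child_process";

// Spawn a fresh Node process with an EMPTY environment and try to import the
// given specifier. Any load-time side effect that throws (for example an env
// schema check) makes the child exit non-zero.
function importsWithEmptyEnv(specifier: string): boolean {
  const child = spawnSync(
    process.execPath,
    ["--input-type=module", "-e", "await import(process.argv[1]);", specifier],
    { env: {}, encoding: "utf8" },
  );
  return child.status === 0;
}

// A built-in that loads without any env vars, used here as a stand-in target.
console.log(importsWithEmptyEnv("node:path"));
```

In the real test the specifier would point at the built tool-metadata module; a fresh process avoids both the in-process module cache and the env vars injected by the test setup file.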

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t state leak across tool calls)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Aliiiu
Contributor Author

Aliiiu commented May 14, 2026

/gemini review

Aliiiu and others added 2 commits May 14, 2026 17:35
…rns null for non-wrapped tokens

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…) into ok:false envelope

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements a major refactor of the DeBank MCP server, introducing a 'Code Mode' that allows AI agents to run sandboxed JavaScript via a new execute tool and search documentation using search_docs. To optimize LLM performance, 30 legacy tools are now hidden by default but remain accessible via the --legacy-tools flag. The architecture is updated to support layered timeouts and raw JSON responses from services, and a comprehensive test suite using Vitest and MSW has been introduced. I have no feedback to provide.

Aliiiu and others added 25 commits May 14, 2026 17:36
…array + score + string params)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…needsResolution (preserves WBNB/WMATIC/WAVAX support)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ay //)

Converts all two-or-more consecutive // line-comment blocks that appear
outside the top-of-file header into /** */ JSDoc-style blocks, following
the stainless-style convention established in this branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ocks across 8 files)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Solves architecture findings #1 (singleton-state leak) and #4 (formatResponse three-branches-behind-one-method) by deleting the underlying complexity rather than restructuring it.

Removed:
- src/lib/utils/data-filter.ts (LLMDataFilter)
- src/lib/integrations/openrouter.ts
- BaseService.{aiModel, dataFilter, currentQuery, setAIModel, setQuery, formatResponse}
- _userQuery parameter from all legacy tool schemas
- setQuery broadcast block in tool-handlers.ts and tools.ts
- tsconfig.scripts.json (was a tsx-resolution workaround for js-tiktoken)
- js-tiktoken + @openrouter/ai-sdk-provider deps
- OPENROUTER_API_KEY, LLM_MODEL, GOOGLE_GENERATIVE_AI_API_KEY from env.ts
- 3 tests (2 _userQuery-piping tests + 1 singleton-state-leak regression)

Replaced:
- Service markdown methods call toMarkdown() directly instead of this.formatResponse()
- Snapshot baseline script invocation simplified (no --tsconfig flag)

Rationale: Code Mode (the execute tool) pushes projection into agent-authored JS inside the sandbox. The v0.1 host-side LLM filter that ran on huge legacy-tool responses was a dead affordance after the v0.2 refactor — only fired on --legacy-tools paths and required model + query + token-threshold state that bypassed the method signature. The CoinGecko Stainless reference doesn't filter on the host at all.

BREAKING for --legacy-tools users: huge responses are no longer LLM-compressed via _userQuery. Use `execute` with a JS projection instead.
Captures the architectural reasoning so a future explorer doesn't propose adding a host-side LLM filter without surfacing the trade-off. The right layer for any future filtering is the tool layer, not BaseService.
…ilders

Replaces makeHostRef + makeResolverRef + inline sync ref with:
- installServiceCall (dual-timeout, JSON args, timeout-aware error coercion)
- installResolver (spread args, plain errors, optional sync via applySync)

The 3rd pattern (sync resolver) collapses into installResolver as a
sync?: boolean flag — a real micro-variation within "resolver" semantics,
not a third pattern. Envelope construction is deduplicated via two private
helpers (envelopeOk / envelopeFail).

installDebankClient's body is now a flat list: 31 installServiceCall +
3 installResolver. Each call site says only what its semantic requires.

No behavior change. All 8 client.test.ts tests + 97-test full suite pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Each TOOL_METADATA entry's legacyMethodPath/sandboxMethodPath strings
become legacyImpl/sandboxImpl thunks built by lazyMethod<K, M>(svc, mth)
where the generic constraints type-check both arguments against the
actual service singleton shape (via 'import type * as Services').

A typo on either argument is now a compile error. Renaming a method on a
service surfaces in tool-metadata.ts at build time instead of at agent-
call time.

Deletions:
- SERVICE_MAP copy in tool-handlers.ts
- SERVICE_MAP copy in execute/client.ts
- resolveMethod() in tool-handlers.ts
- resolveRaw() in execute/client.ts
- 'every legacyMethodPath / sandboxMethodPath resolves' test in
  tool-handlers.test.ts (the bug class it catches is now a compile error)

Side-effect-freeness preserved: dynamic 'import' lives inside the thunk
closure, never executes at module load. tool-metadata.import.test.ts
still passes — verified the child-process import works with no env vars.
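
A rough sketch of the typed-thunk idea behind lazyMethod<K, M>. The registry, its method bodies, and `loadServices` are hypothetical stand-ins (the real code resolves services through a dynamic import inside the thunk); only the generic-constraint pattern is the point here.

```typescript
// Hypothetical stand-in for the real service singletons.
const serviceRegistry = {
  chain: {
    getSupportedChainListRaw: async (): Promise<string[]> => ["eth", "bsc"],
    getChainRaw: async (args: { id: string }): Promise<{ id: string }> => ({
      id: args.id,
    }),
  },
};
type Services = typeof serviceRegistry;

let loadCount = 0;

// Stand-in for the dynamic import; counts how often services are loaded.
async function loadServices(): Promise<Services> {
  loadCount += 1;
  return serviceRegistry;
}

// K must be a service key and M must be a method key ON THAT service,
// so a typo in either argument fails type-checking.
function lazyMethod<K extends keyof Services, M extends keyof Services[K]>(
  svc: K,
  mth: M,
): () => Promise<Services[K][M]> {
  return async () => {
    const loaded = await loadServices();
    return loaded[svc][mth];
  };
}

const chainListThunk = lazyMethod("chain", "getSupportedChainListRaw");
// lazyMethod("chain", "getSuportedChainListRaw"); // typo: rejected by the compiler

// Building the thunk loads nothing; the load happens on first invocation.
const loadsBeforeCall = loadCount;
```

Calling `chainListThunk()` resolves to the method itself, so a dispatcher would invoke it as `(await chainListThunk())()`.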
…dToken

The WRAPPED_TOKEN_KEYWORDS array moves from validators.ts into
entity-resolver.ts where its sole consumer lives. The token-side
predicate becomes a private isWrappedTokenKeyword(); the chain-side
predicate becomes the exported looksLikeChainName().

The previous needsResolution(str, type) overload split into two
named predicates. Callers that used the chain branch outside this
module (tool-handlers.ts for the debank_get_chain quirk) now import
looksLikeChainName by its actual purpose.

needsResolution is deleted from validators.ts. The module shrinks to
~7 lines, holding only isNotFoundResponse.

Behavior unchanged — same keywords, same predicates, same semantics.
96/96 tests pass without modification.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…oes the work

The custom 80-line serialize() in build-docs-index.ts emitted unquoted
object keys + trailing commas + tab indentation. Biome's `format --write`
post-pass already does all three by default (quoteProperties: 'asNeeded',
trailingCommas: 'all', indentStyle: 'tab').

Replacing `serialize(entries)` with `JSON.stringify(entries, null, "\t")`
produces byte-identical embedded-index.ts after the biome pass — verified
by empty git diff on regeneration.

The deletion test passes cleanly: complexity goes nowhere, because biome
was already doing that work. ~40 lines removed.
…ate literal

The script previously embedded the markdown via JSON.stringify, producing
a 5898-char single-line string with \n escapes throughout. Hard to read,
hard to diff.

Now the script escapes only the three characters that have meaning inside
a JS template literal (backslash, backtick, \${) and emits the markdown
between backticks. The runtime template-literal parser reverses those
escapes — the exported INSTRUCTIONS value is byte-identical (md5
3c05c708...4a49 before and after).

The committed file now reads top-to-bottom as the original markdown
structure with only the necessary backtick escapes around code fences
and inline code spans. Diffs against future instructions.md changes will
be line-accurate.

Behavior unchanged. 96/96 tests pass including the lazy-isolated-vm
child-process MCP handshake that loads INSTRUCTIONS into a live server.
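
The three-character escaping described above can be sketched as follows. `escapeForTemplateLiteral` and `renderInstructionsModule` are hypothetical names, not the generator script's actual functions. Backslashes are escaped first so the backslashes introduced for backticks and `${` are not escaped a second time.

```typescript
// Escape only what has meaning inside a JS template literal.
function escapeForTemplateLiteral(markdown: string): string {
  return markdown
    .replace(/\\/g, "\\\\") // backslash first
    .replace(/`/g, "\\`") // then backticks
    .replace(/\$\{/g, "\\${"); // then interpolation openers
}

// Emit a module whose exported string reads as the original markdown.
function renderInstructionsModule(markdown: string): string {
  return (
    "export const INSTRUCTIONS = `" + escapeForTemplateLiteral(markdown) + "`;\n"
  );
}

const sampleMarkdown = "Call `execute` with ${args}.\nPaths like C:\\tmp stay intact.";
console.log(renderInstructionsModule(sampleMarkdown));
```

One caveat of this approach: template literals normalize CRLF line endings to LF, so the round trip is exact only for LF-terminated markdown.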
Captures the three-location rule (// for file headers + single-line
notes, /** */ for everything else multi-line) and the supporting
sub-rules (no JSDoc tags, no rotting task references, no TODO without
issue, etc.) so future contributors don't revert to defaults.

The convention was applied piecemeal across recent commits (notably
bef121f, 213aac7) but lived only in commit messages and tribal
knowledge. docs/style/ is the natural home as more conventions
accumulate.
…the binding constraint

The 100KB character cap silently dropped results once cumulative
JSON.stringify size exceeded it. In practice the 10-result hit limit
already constrains output well below context limits (10 method entries
≈ 10KB; 10 verbose prose entries ≈ 30KB), but the cap added a silent-
data-loss code path that mirrored the v0.1 LLM filter anti-pattern.

Research confirms CoinGecko's stainless-generated MCP search_docs has
no equivalent cap — they cap at 10 results and trust that ceiling.

This is the same shape of deepening as the v0.1 filter deletion: the
abstraction was guarding against a problem the binding constraint
already handles. Delete it.
…ma, invoke_endpoint)

Replaces the per-endpoint legacy surface with three meta-tools that
mirror CoinGecko Stainless's --tools=dynamic mode:

- list_endpoints(filter?) returns available qualified names + summaries
- get_endpoint_schema(name) returns params + response JSON Schema + example
- invoke_endpoint(name, params, jq_filter?) dispatches by name and
  optionally projects the response through a jq filter (deterministic
  replacement for the deleted v0.1 LLM filter)

Stage 1 of 2: ADDS the new surface. The 30 legacy debank_* tools remain
intact behind --legacy-tools for this commit. Stage 2 deletes them along
with the markdown wrapper layer.

Default tool surface grows from 4 to 7 tools. Each new tool is registered
unconditionally (no flag).

Adds node-jq dependency. Adds responseSchema field to ToolMetadata and
hand-written Zod response schemas at src/mcp/legacy/response-schemas.ts.

Updates instructions.md with a 'When to use which tool' section
teaching the discovery → schema → invoke workflow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stage 2 of 2: completes the dynamic-tools migration. The legacy
debank_* tools (behind --legacy-tools), the markdown wrapper methods
on every service, the toMarkdown formatter primitive, the v0.1
snapshot regression test (Task 27), the baseline-capture script
(Task 7), and the --legacy-tools flag are all removed.

Per-endpoint access now happens via the dynamic-tools triad added in
Stage 1: list_endpoints + get_endpoint_schema + invoke_endpoint (with
jq_filter for host-side projection — the deterministic replacement
for the v0.1 LLM filter).

Services now expose only *Raw() JSON-returning methods. invoke_endpoint
dispatches by qualified name via the lazyMethod-typed sandboxImpl
field (legacyImpl deleted).

Default tool surface: execute, search_docs, debank_resolve,
debank_get_supported_chain_list (now JSON), list_endpoints,
get_endpoint_schema, invoke_endpoint. The --legacy-tools flag and
DEBANK_MCP_LEGACY env var are no longer recognized.

BREAKING CHANGES:
- The 30 debank_* tools (chain, protocol, token, user, transaction)
  are no longer available. Use invoke_endpoint with the qualified
  name from list_endpoints.
- The --legacy-tools flag and DEBANK_MCP_LEGACY env var are removed.
- debank_get_supported_chain_list now returns JSON, not markdown.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The tool was redundant with invoke_endpoint — agents that want the
chain list now call:
  invoke_endpoint({ name: 'debank.chain.getSupportedChainList', params: {} })

debank_resolve stays as a default convenience because resolveChain
is an in-memory string-table lookup (not one of the 31 service
endpoints), and spinning up an isolated-vm isolate just to convert
'BSC' → 'bsc' is overhead the agent shouldn't pay.

Default tool surface is now 6 tools: execute, search_docs,
debank_resolve, list_endpoints, get_endpoint_schema, invoke_endpoint.

Updated instructions.md leading paragraph and search_docs's no-match
hint to point agents at the new path. lazy-isolated-vm child-process
test updated to assert the new toolNames shape.
…entally committed in 9c7f5d4)

The file is the user's CoinGecko tool-docs extract used to inform the
dynamic-tools triad design. It was meant to stay as local research
material, not ship in the repo. `git add -A` in the previous commit
accidentally swept it up; this reverts that and adds an ignore rule.
…ls=dynamic

User-raised challenge: 'if everything is happening in the sandbox;
discovery, filtering and endpoint calling and everything we won't have
a wasted overhead' — and they're right. The 6-tool surface bundled four
tools whose only real justification was sandbox-overhead avoidance, but
one execute call amortizes the isolate boot across all operations in
its body. The argument doesn't hold for any multi-step workflow.

Aligning with CoinGecko's Stainless reference (which puts dynamic tools
behind --tools=dynamic): default surface is now execute + search_docs
only. Four extra tools register when --tools=dynamic or
DEBANK_MCP_TOOLS=dynamic is set:

  - debank_resolve (in-memory chain-name resolver)
  - list_endpoints (filtered listing of available endpoints)
  - get_endpoint_schema (params + response JSON Schema for one endpoint)
  - invoke_endpoint (per-endpoint dispatch with optional jq_filter)

The four extras are still fully functional behind the flag; the only
change is registration. node-jq stays as a dependency since
invoke_endpoint still uses it under the flag.

Renamed defaultConvenienceTools → dynamicConvenienceTools to match
the new gating semantics.
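
The gating rule can be illustrated with a small sketch. `registeredToolNames` is a hypothetical helper; the tool names, the `--tools=dynamic` flag, and the `DEBANK_MCP_TOOLS` variable are taken from the description above, while the exact-match argv parsing is an assumption.

```typescript
const DEFAULT_TOOLS = ["execute", "search_docs"] as const;
const DYNAMIC_TOOLS = [
  "debank_resolve",
  "list_endpoints",
  "get_endpoint_schema",
  "invoke_endpoint",
] as const;

// Default surface is two tools; the four extras register only when the flag
// or the env var opts in.
function registeredToolNames(
  argv: readonly string[],
  env: Record<string, string | undefined>,
): string[] {
  const dynamic =
    argv.some((arg) => arg === "--tools=dynamic") ||
    env.DEBANK_MCP_TOOLS === "dynamic";
  return dynamic ? [...DEFAULT_TOOLS, ...DYNAMIC_TOOLS] : [...DEFAULT_TOOLS];
}

console.log(registeredToolNames([], {}));
console.log(registeredToolNames(["--tools=dynamic"], {}));
```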

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
node-jq spawns a downloaded jq binary via a postinstall script that pnpm
skips unless approved as a built dependency, causing `spawn .../jq ENOENT`
in CI. Switch to jqts (already a dependency) — pure JS, no native binary,
no install-script approval, cross-platform. Unwrap single-output streams
to preserve jq CLI scalar semantics (`.name` -> "Ethereum", not ["Ethereum"]).
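
The unwrapping step can be sketched independently of any jq library. `unwrapJqOutputs` is a hypothetical helper that takes the already-collected output stream as an array; leaving zero-output and multi-output streams as arrays is an assumption, since only the single-output case is described above.

```typescript
// A jq filter yields a stream of outputs. When exactly one value comes back,
// return it bare so `.name` gives "Ethereum" rather than ["Ethereum"].
function unwrapJqOutputs(outputs: readonly unknown[]): unknown {
  return outputs.length === 1 ? outputs[0] : [...outputs];
}

console.log(unwrapJqOutputs(["Ethereum"]));
console.log(unwrapJqOutputs(["eth", "bsc"]));
```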

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>