Add AppleScript Computer Use subagent with safe on-device execution #1771

Merged
tpae merged 7 commits into main from feature/applescript
Jul 1, 2026

@tpae tpae commented Jun 30, 2026


Summary

  • New applescript delegation subagent + run_applescript tool: an agent describes a whole task, a dedicated on-device AppleScript model writes the script, it runs in-process via NSAppleScript, and results/errors feed back so the model can iterate (Finder, Safari, Mail, Notes, System Events, app state).
  • Safe by default: the generated script is shown in the Computer Use confirmation overlay before it runs (confirm-each), with an opt-in auto-run-with-warning mode. macOS Automation (TCC) permission is required and surfaced; NSAppleEventsUsageDescription broadened for agent automation.
  • NSAppleScript execution is serialized process-wide on a dedicated queue to avoid errOSAInvalidID deadlocks under concurrency.
  • Model management: curated AppleScript models (8B / 16B-A4B JANG_4M) download from a new "Models" tab in Computer Use settings, with global defaults and per-agent overrides in the Subagents tab. The subagent owns its own model residency (dedicated bundle) and renders its own model picker + execution-mode control instead of the shared override row.
  • Localization strings added; new prose follows the de-hyphenated "subagent" spelling introduced in #1769 (Standardize on "subagent" spelling (drop hyphenated "sub-agent")).
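The process-wide serialization described in the summary can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this PR: `AppleScriptGate`, its queue label, and `runAppleScript` are hypothetical names. Only `DispatchQueue.sync` and the `NSAppleScript` calls are existing API.

```swift
import Foundation

/// Hypothetical process-wide gate: every script funnels through one serial
/// queue, so two executions never overlap.
final class AppleScriptGate {
    static let shared = AppleScriptGate()
    // The label is illustrative only.
    private let queue = DispatchQueue(label: "example.applescript.serial")

    /// Runs `work` exclusively and returns its result.
    /// Not re-entrant: calling `run` from inside `work` would deadlock.
    func run<T>(_ work: () throws -> T) rethrows -> T {
        try queue.sync(execute: work)
    }
}

#if os(macOS)
/// Hypothetical helper: compiles and runs `source` in-process behind the gate.
func runAppleScript(_ source: String) -> Result<String, NSError> {
    AppleScriptGate.shared.run {
        guard let script = NSAppleScript(source: source) else {
            return .failure(NSError(domain: "example.applescript", code: -1))
        }
        var errorInfo: NSDictionary?
        let descriptor = script.executeAndReturnError(&errorInfo)
        if let info = errorInfo {
            // NSAppleScript reports failures through this dictionary.
            let code = info[NSAppleScript.errorNumber] as? Int ?? -1
            let message = info[NSAppleScript.errorMessage] as? String ?? "unknown error"
            return .failure(NSError(domain: "example.applescript", code: code,
                                    userInfo: [NSLocalizedDescriptionKey: message]))
        }
        return .success(descriptor.stringValue ?? "")
    }
}
#endif
```

Routing every caller through a single shared gate is what makes the guarantee process-wide; a per-instance queue would only serialize the callers that share that instance.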

Test plan

  • swift test --package-path Packages/OsaurusCore --filter AppleScript — 19/19 pass (decode, executor error mapping, loop gate/termination, capability gating).
  • OsaurusCore builds clean; no linter errors; Localizable.xcstrings / InfoPlist.xcstrings are valid JSON.
  • Live: download an AppleScript model from Computer Use → Models, enable AppleScript on an agent, grant Automation permission, then run a feasible task (e.g. "create a note in Notes titled Demo"); confirm the script preview, execution, and the error-iteration loop on failure.
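For readers unfamiliar with what "executor error mapping" covers, here is a rough sketch of mapping AppleScript error numbers to typed failures. The `AppleScriptFailure` enum and `classify` function are hypothetical and are not the types tested in this PR; the numeric codes are the standard Apple event / Mac OS error values.

```swift
/// Hypothetical failure categories an executor might surface to the model.
enum AppleScriptFailure: Equatable {
    case userCancelled
    case appNotRunning
    case automationPermissionDenied
    case targetNotFound
    case timedOut
    case other(code: Int)
}

/// Maps the error number reported by a failed script run to a category.
func classify(errorNumber: Int) -> AppleScriptFailure {
    switch errorNumber {
    case -128: return .userCancelled               // userCanceledErr
    case -600: return .appNotRunning               // procNotFound
    case -1743: return .automationPermissionDenied // errAEEventNotPermitted (TCC denial)
    case -1728: return .targetNotFound             // errAENoSuchObject
    case -1712: return .timedOut                   // errAETimeout
    default: return .other(code: errorNumber)
    }
}
```

Keeping an `.other(code:)` case means unrecognized errors still reach the model with their raw number instead of being dropped.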

Adds an `applescript` delegation subagent and `run_applescript` tool so an
agent can automate the Mac (Finder, Safari, Mail, Notes, System Events, app
state) by describing a whole task. A dedicated on-device AppleScript model
writes the script, it runs in-process via NSAppleScript, and results/errors
feed back so the model can iterate.

- Execution: NSAppleScript calls are serialized process-wide on a dedicated
  queue for thread-safety (avoids errOSAInvalidID deadlocks). Two execution
  modes: confirm-each (default) and auto-run-with-warning.
- Safety: the generated script is shown in the Computer Use confirmation
  overlay before it runs; macOS Automation (TCC) permission is required and
  surfaced. NSAppleEventsUsageDescription broadened for agent automation.
- Models: curated AppleScript models (8B / 16B-A4B JANG_4M) download from a
  new "Models" tab in Computer Use settings, with global defaults and
  per-agent overrides in the Subagents tab.
- The subagent owns its own model residency (dedicated bundle) and renders
  its own model picker + execution-mode control instead of the shared row.
- Localization strings added; 19 swift-testing cases cover decode, executor
  error mapping, loop gate/termination, and capability gating.
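The generate, confirm, execute, feed-back cycle described above could look something like the following. This is a simplified sketch, not the loop shipped here: every name (`ExecutionMode`, `LoopOutcome`, `runTask`, and the closure parameters) is hypothetical, and the model, overlay, and executor are stubbed out as closures.

```swift
/// The two execution modes named in the description.
enum ExecutionMode { case confirmEach, autoRunWithWarning }

/// Hypothetical terminal states of the loop.
enum LoopOutcome: Equatable {
    case succeeded(output: String, attempts: Int)
    case rejectedByUser(attempts: Int)
    case gaveUp(lastError: String)
}

struct ScriptError: Error { let message: String }

func runTask(
    _ task: String,
    mode: ExecutionMode,
    maxAttempts: Int,
    generate: (_ task: String, _ previousError: String?) -> String,
    confirm: (_ script: String) -> Bool,
    warn: (_ script: String) -> Void,
    execute: (_ script: String) -> Result<String, ScriptError>
) -> LoopOutcome {
    var previousError: String? = nil
    for attempt in 1...max(1, maxAttempts) {
        // The model sees the previous error (if any) when writing the next script.
        let script = generate(task, previousError)

        // Gate: confirm-each blocks on approval; auto-run only surfaces a warning.
        switch mode {
        case .confirmEach:
            guard confirm(script) else { return .rejectedByUser(attempts: attempt) }
        case .autoRunWithWarning:
            warn(script)
        }

        switch execute(script) {
        case .success(let output):
            return .succeeded(output: output, attempts: attempt)
        case .failure(let error):
            previousError = error.message
        }
    }
    return .gaveUp(lastError: previousError ?? "no attempts made")
}
```

Bounding the loop with `maxAttempts` gives it a guaranteed termination condition even when the model keeps producing failing scripts.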
@github-actions github-actions bot added the enhancement (New feature or request) label Jun 30, 2026
tpae added 6 commits June 30, 2026 14:29
The "every kind", per-agent-toggle, and SSOT-union golden sets in
SubagentCapabilityRegistryTests are drift guards that pin the exact set of
subagent capabilities. Adding the `applescript` capability grew those sets
by one, so test-core failed on three expectations (it isn't covered by the
`--filter AppleScript` run that validated the feature locally).

Update the golden sets to include applescript and add a matching
`ToolRegistry.agentDelegationAppleScriptToolNames` per-family accessor
(mirroring spawn/image) so the "all == union of the per-family sets"
invariant stays explicit and derived from the registry.
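As an illustration of the "all == union of the per-family sets" invariant, a drift guard of this kind might be shaped as follows. This is a sketch only: `DelegationFamily`, `ToolCatalog`, and the spawn/image tool names are placeholders, and treating `run_applescript` as the AppleScript family's only member is an assumption, not the registry's actual contents.

```swift
/// Hypothetical tool families mirroring the spawn / image / AppleScript split.
enum DelegationFamily: CaseIterable { case spawn, image, appleScript }

struct ToolCatalog {
    /// Per-family accessor. All names here are placeholders for illustration.
    static func toolNames(for family: DelegationFamily) -> Set<String> {
        switch family {
        case .spawn: return ["spawn_example"]
        case .image: return ["image_example"]
        case .appleScript: return ["run_applescript"]
        }
    }

    /// "All" is derived from the per-family sets rather than listed by hand.
    static var allDelegationToolNames: Set<String> {
        DelegationFamily.allCases.reduce(into: Set<String>()) { result, family in
            result.formUnion(toolNames(for: family))
        }
    }

    /// Names that differ between the derived set and a pinned golden set.
    static func drift(against golden: Set<String>) -> Set<String> {
        allDelegationToolNames.symmetricDifference(golden)
    }
}
```

A golden set pinned before the new family was added reports exactly the new tool name as drift, which matches the kind of failure described in the commit message.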

Run the repo's swift-format config over the new AppleScript files and the
registry-test union chain: split multi-argument SubagentActivityEvent calls
to one argument per line (lineBreakBeforeEachArgument), space the
`1 ... count` range, and fix continuation indentation. Whitespace-only; no
behavior change.

The 8B ZAYA AppleScript checkpoint produced low-quality scripts, so remove it
from the curated catalog and promote the Gemma-4 16B-A4B MoE build to the sole
Top Pick / default. Updates the catalog comments, the 16B description (no more
"than the 8B" comparison), and the AppleScriptAction native-tool-format note
(Gemma-4 only).

Track A (reliability, shipping code + guards):
- Route exact identifiers (note title, path, mailbox, URL) into named
  literals: broaden the applescript/mac_query `contents` tool descriptions,
  extend appleScriptGuidance/appleScriptGuidanceCompact, and strengthen the
  subagent literal-usage rule so identifiers are referenced by placeholder
  instead of re-typed.
- Add a query-mode first-turn read-only exemplar (standard + concise prompts)
  so mac_query's first script is a read, not a blocked write.
- Add a find-or-create rule for missing targets (automate prompt) and teach
  the mock world to record `make new note`.
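One way to picture "named literals referenced by placeholder" is to bind each exact identifier once, in an escaped preamble, so the generated script only refers to the variable name. The sketch below is an assumption about the general technique, not the mechanism this PR implements; `appleScriptStringLiteral` and `literalPreamble` are hypothetical helpers.

```swift
import Foundation

/// Quotes a value as an AppleScript string literal, escaping backslashes first
/// and then double quotes.
func appleScriptStringLiteral(_ value: String) -> String {
    let escaped = value
        .replacingOccurrences(of: "\\", with: "\\\\")
        .replacingOccurrences(of: "\"", with: "\\\"")
    return "\"\(escaped)\""
}

/// Emits one `set <name> to "<value>"` line per literal, sorted by name so the
/// output is deterministic. Names are assumed to be valid AppleScript
/// identifiers; this sketch does not validate them.
func literalPreamble(_ literals: [String: String]) -> String {
    literals.keys.sorted().map { name in
        "set \(name) to \(appleScriptStringLiteral(literals[name]!))"
    }.joined(separator: "\n")
}

// Usage: the script body refers to `noteTitle` instead of re-typing the title.
let examplePreamble = literalPreamble(["noteTitle": "Demo"])
let exampleScript = examplePreamble + "\n" + """
tell application "Notes"
    make new note with properties {name:noteTitle}
end tell
"""
```

Because the exact string is written once by the harness rather than by the model, a long title or path cannot be mistyped when the model composes the rest of the script.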

Track B (capability coverage): new scripted anchors + live cases for
multi-app chaining, structured multi-value return, permission recovery,
literal scale (~15), desktop context, and classifier false-positives; add an
`environmentContext` seed to the AppleScript eval schema.

Track C (sweep + default): fix the capability scoreboard to attribute the
real generation model (per-case modelId) instead of the harness-nominal/judge
label, promote `.nameOnly` (the reproduced sweep winner that clears the
many-literal ceiling) to the shipped literal-announcement default, and
realign the sweep variants against the new baseline.

# Conflicts:
#	Packages/OsaurusCore/PrivacyFilter/Views/PrivacyView.swift
@tpae tpae merged commit 6d9e002 into main Jul 1, 2026
5 checks passed
@tpae tpae deleted the feature/applescript branch July 1, 2026 11:28

Labels

enhancement New feature or request released
