Add AppleScript Computer Use subagent with safe on-device execution #1771
Merged
Adds an `applescript` delegation subagent and `run_applescript` tool so an agent can automate the Mac (Finder, Safari, Mail, Notes, System Events, app state) by describing a whole task. A dedicated on-device AppleScript model writes the script, it runs in-process via NSAppleScript, and results/errors feed back so the model can iterate.

- Execution: NSAppleScript calls are serialized process-wide on a dedicated queue for thread-safety (avoids errOSAInvalidID deadlocks). Two execution modes: confirm-each (default) and auto-run-with-warning.
- Safety: the generated script is shown in the Computer Use confirmation overlay before it runs; macOS Automation (TCC) permission is required and surfaced. NSAppleEventsUsageDescription broadened for agent automation.
- Models: curated AppleScript models (8B / 16B-A4B JANG_4M) download from a new "Models" tab in Computer Use settings, with global defaults and per-agent overrides in the Subagents tab.
- The subagent owns its own model residency (dedicated bundle) and renders its own model picker + execution-mode control instead of the shared row.
- Localization strings added; 19 swift-testing cases cover decode, executor error mapping, loop gate/termination, and capability gating.
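The serialization described in the Execution bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ScriptOutcome`, `ScriptRunning`, `NSAppleScriptRunner`, `SerialScriptExecutor`, and the queue label are hypothetical names. Only `NSAppleScript(source:)`, `executeAndReturnError(_:)`, and the `NSAppleScript.errorNumber` / `NSAppleScript.errorMessage` keys are real Foundation API.

```swift
import Foundation

/// Outcome of one script run: output text or a mapped error (hypothetical type).
enum ScriptOutcome: Equatable {
    case success(String)
    case failure(code: Int, message: String)
}

/// Hypothetical seam so the executor can be exercised without running real scripts.
protocol ScriptRunning {
    func run(_ source: String) -> ScriptOutcome
}

#if os(macOS)
/// Runs a script in-process with NSAppleScript and maps its error dictionary.
struct NSAppleScriptRunner: ScriptRunning {
    func run(_ source: String) -> ScriptOutcome {
        guard let script = NSAppleScript(source: source) else {
            return .failure(code: -1, message: "could not create script")
        }
        var errorInfo: NSDictionary?
        let result = script.executeAndReturnError(&errorInfo)
        if let info = errorInfo {
            let code = info[NSAppleScript.errorNumber] as? Int ?? -1
            let message = info[NSAppleScript.errorMessage] as? String ?? "unknown error"
            return .failure(code: code, message: message)
        }
        return .success(result.stringValue ?? "")
    }
}
#endif

/// Funnels every run through one serial queue so no two scripts overlap.
final class SerialScriptExecutor {
    // Assumed label; a serial queue is the DispatchQueue default.
    private let queue = DispatchQueue(label: "example.applescript.serial")
    private let runner: ScriptRunning

    init(runner: ScriptRunning) {
        self.runner = runner
    }

    /// Blocks the caller until its script has run.
    /// Must not be called from a block already running on `queue`, or it deadlocks.
    func execute(_ source: String) -> ScriptOutcome {
        queue.sync { runner.run(source) }
    }
}
```

On macOS, `SerialScriptExecutor(runner: NSAppleScriptRunner())` would be the one shared instance; sharing a single executor is what makes the serialization process-wide in this sketch.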
The "every kind", per-agent-toggle, and SSOT-union golden sets in SubagentCapabilityRegistryTests are drift guards that pin the exact set of subagent capabilities. Adding the `applescript` capability grew those sets by one, so test-core failed on three expectations (it isn't covered by the `--filter AppleScript` run that validated the feature locally). Update the golden sets to include applescript and add a matching `ToolRegistry.agentDelegationAppleScriptToolNames` per-family accessor (mirroring spawn/image) so the "all == union of the per-family sets" invariant stays explicit and derived from the registry.
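A drift guard of this kind might look roughly like the sketch below. Everything here is illustrative: `SubagentKind`, `DelegationTool`, `DemoToolRegistry`, and the `spawn_agent` / `generate_image` tool names are placeholders. Only `run_applescript` and the accessor name `agentDelegationAppleScriptToolNames` come from this PR, and the real registry is likely shaped differently.

```swift
/// Hypothetical stand-in for the capability kinds pinned by the golden sets.
enum SubagentKind: String, CaseIterable {
    case spawn, image, applescript
}

/// Hypothetical registry entry: a tool name tagged with its family.
struct DelegationTool: Hashable {
    let name: String
    let kind: SubagentKind
}

enum DemoToolRegistry {
    // Single source of truth; the per-family accessors are derived from it.
    static let delegationTools: [DelegationTool] = [
        DelegationTool(name: "spawn_agent", kind: .spawn),        // placeholder name
        DelegationTool(name: "generate_image", kind: .image),     // placeholder name
        DelegationTool(name: "run_applescript", kind: .applescript),
    ]

    static var agentDelegationToolNames: Set<String> {
        Set(delegationTools.map { $0.name })
    }

    static func names(for kind: SubagentKind) -> Set<String> {
        Set(delegationTools.filter { $0.kind == kind }.map { $0.name })
    }

    static var agentDelegationSpawnToolNames: Set<String> { names(for: .spawn) }
    static var agentDelegationImageToolNames: Set<String> { names(for: .image) }
    static var agentDelegationAppleScriptToolNames: Set<String> { names(for: .applescript) }
}

/// Returns one message per violated invariant; an empty array means no drift.
func driftGuardFailures() -> [String] {
    // Golden set: must be edited by hand whenever a kind is added or removed.
    let goldenKinds: Set<String> = ["spawn", "image", "applescript"]
    var failures: [String] = []

    let actualKinds = Set(SubagentKind.allCases.map { $0.rawValue })
    if actualKinds != goldenKinds {
        failures.append("kind set drifted: \(actualKinds.sorted())")
    }

    // Explicit union chain: a new family has to be added here too.
    let union = DemoToolRegistry.agentDelegationSpawnToolNames
        .union(DemoToolRegistry.agentDelegationImageToolNames)
        .union(DemoToolRegistry.agentDelegationAppleScriptToolNames)
    if union != DemoToolRegistry.agentDelegationToolNames {
        failures.append("all != union of the per-family sets")
    }
    return failures
}
```

Keeping the union chain spelled out by hand is deliberate in this sketch: adding a family to the table without extending the chain makes the second check fail, which is the kind of breakage the golden sets are meant to surface.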
Run the repo's swift-format config over the new AppleScript files and the registry-test union chain: split multi-argument SubagentActivityEvent calls to one argument per line (lineBreakBeforeEachArgument), space the `1 ... count` range, and fix continuation indentation. Whitespace-only; no behavior change.
The 8B ZAYA AppleScript checkpoint produced low-quality scripts, so remove it from the curated catalog and promote the Gemma-4 16B-A4B MoE build to the sole Top Pick / default. Updates the catalog comments, the 16B description (no more "than the 8B" comparison), and the AppleScriptAction native-tool-format note (Gemma-4 only).
Track A (reliability, shipping code + guards):
- Route exact identifiers (note title, path, mailbox, URL) into named literals: broaden the applescript/mac_query `contents` tool descriptions, extend appleScriptGuidance/appleScriptGuidanceCompact, and strengthen the subagent literal-usage rule so identifiers are referenced by placeholder instead of re-typed.
- Add a query-mode first-turn read-only exemplar (standard + concise prompts) so mac_query's first script is a read, not a blocked write.
- Add a find-or-create rule for missing targets (automate prompt) and teach the mock world to record `make new note`.

Track B (capability coverage): new scripted anchors + live cases for multi-app chaining, structured multi-value return, permission recovery, literal scale (~15), desktop context, and classifier false-positives; add an `environmentContext` seed to the AppleScript eval schema.

Track C (sweep + default): fix the capability scoreboard to attribute the real generation model (per-case modelId) instead of the harness-nominal/judge label, promote `.nameOnly` (the reproduced sweep winner that clears the many-literal ceiling) to the shipped literal-announcement default, and realign the sweep variants against the new baseline.
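One way to picture the named-literal idea from Track A: exact identifiers are bound once, with escaping, and the generated script refers to them by name instead of re-typing them. The sketch below assumes the binding is done with a `set … to "…"` preamble; the commit does not say how the PR performs the substitution, and `NamedLiteral`, `appleScriptStringLiteral`, and `bindLiterals` are hypothetical helpers.

```swift
import Foundation

/// Hypothetical pairing of a placeholder name with the exact value it stands for.
/// The name is assumed to already be a valid AppleScript identifier.
struct NamedLiteral {
    let name: String
    let value: String
}

/// Quotes a value as an AppleScript string literal.
/// Backslashes are escaped first so the quote escapes are not doubled.
func appleScriptStringLiteral(_ value: String) -> String {
    let escaped = value
        .replacingOccurrences(of: "\\", with: "\\\\")
        .replacingOccurrences(of: "\"", with: "\\\"")
    return "\"\(escaped)\""
}

/// Prepends one `set <name> to "<value>"` line per literal to the script body.
func bindLiterals(_ literals: [NamedLiteral], into body: String) -> String {
    let preamble = literals.map {
        "set \($0.name) to \(appleScriptStringLiteral($0.value))"
    }
    return (preamble + [body]).joined(separator: "\n")
}
```

For example, binding `noteTitle` to `Groceries` and passing a body that says `note noteTitle` yields a script whose first line is `set noteTitle to "Groceries"`, so the model never has to reproduce the title character for character.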
# Conflicts:
#   Packages/OsaurusCore/PrivacyFilter/Views/PrivacyView.swift
Summary
- `applescript` delegation subagent + `run_applescript` tool: an agent describes a whole task, a dedicated on-device AppleScript model writes the script, it runs in-process via `NSAppleScript`, and results/errors feed back so the model can iterate (Finder, Safari, Mail, Notes, System Events, app state).
- Each script requires confirmation before it runs by default (`confirm-each`), with an opt-in `auto-run-with-warning` mode. macOS Automation (TCC) permission is required and surfaced; `NSAppleEventsUsageDescription` broadened for agent automation.
- `NSAppleScript` execution is serialized process-wide on a dedicated queue to avoid `errOSAInvalidID` deadlocks under concurrency.

Test plan
- `swift test --package-path Packages/OsaurusCore --filter AppleScript`: 19/19 pass (decode, executor error mapping, loop gate/termination, capability gating).
- `OsaurusCore` builds clean; no linter errors; `Localizable.xcstrings` / `InfoPlist.xcstrings` are valid JSON.
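The write, confirm, run, feed-back cycle from the summary, and the loop gate/termination behaviour the tests mention, can be sketched as below. This is an assumption-laden outline, not the subagent's real loop: `ExecutionMode`, `ScriptFailure`, `LoopResult`, `runScriptLoop`, and the closure parameters are all hypothetical, and the real gate is the Computer Use confirmation overlay, not a closure.

```swift
/// The two modes named in the PR description.
enum ExecutionMode {
    case confirmEach
    case autoRunWithWarning
}

/// Hypothetical error carrying the message that is fed back to the model.
struct ScriptFailure: Error {
    let message: String
}

enum LoopResult: Equatable {
    case succeeded(output: String, attempts: Int)
    case declined(scriptsRun: Int)
    case exhausted(lastError: String)
}

/// Generates a script, gates it on confirmation, runs it, and retries with the
/// error text as feedback until it succeeds, is declined, or attempts run out.
func runScriptLoop(
    task: String,
    mode: ExecutionMode,
    maxAttempts: Int,
    generate: (_ task: String, _ feedback: String?) -> String,
    confirm: (_ script: String) -> Bool,
    execute: (_ script: String) -> Result<String, ScriptFailure>
) -> LoopResult {
    guard maxAttempts >= 1 else {
        return .exhausted(lastError: "no attempts allowed")
    }
    var feedback: String? = nil
    for attempt in 1...maxAttempts {
        let script = generate(task, feedback)

        // Gate: in confirm-each mode nothing runs without approval.
        if mode == .confirmEach && !confirm(script) {
            return .declined(scriptsRun: attempt - 1)
        }

        switch execute(script) {
        case .success(let output):
            return .succeeded(output: output, attempts: attempt)
        case .failure(let failure):
            // The error text becomes the next generation's feedback.
            feedback = failure.message
        }
    }
    return .exhausted(lastError: feedback ?? "")
}
```

Passing the model, the confirmation step, and the executor as closures keeps the termination rules testable with stubs, which is one plausible way to cover gate and termination cases without a model or Automation permission.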