stacklok · jerm-dro · Apr 9, 2026
@@ -0,0 +1,45 @@
+---
+name: incident-triage-lame
+description: Triage active PagerDuty incidents by gathering context from all available services one tool call at a time.
+allowed-tools: Bash
+---
+
+# Incident Triage (No Scripting)
+
+We have active PagerDuty incidents. Build an incident triage report by gathering data from every available service.
+
+Do NOT use `execute_tool_script`. Call each tool individually, one at a time.
+
+1. Check PagerDuty for service health and active incidents
+2. For each degraded service, gather context from Datadog (metrics + logs), GitHub (recent PRs), Slack (#incidents messages), Jira (related issues), and Confluence (runbooks)
+3. Cross-reference the results to identify probable root causes, who's engaged, and what runbooks apply
+
+Format the final report as markdown matching this structure exactly:
+
+```
+# Incident Triage Report
+
+## Service Health
+<paste service health output verbatim>
+
+## Active Incidents
+<paste incidents output verbatim>
+
+## Degraded Service: <name>
+### Metrics
+<paste metrics verbatim>
+### Error Logs
+<paste logs verbatim>
+### Recent PRs (Potential Root Causes)
+<paste prs verbatim>
+### Slack #incidents Context
+<paste messages verbatim>
+### Related Jira Issues
+<paste jira verbatim>
+### Runbooks
+<paste runbooks verbatim>
+
+(repeat for each degraded service)
+```
+
+Include the raw tool output under each heading — do not summarize or rewrite it.
@@ -0,0 +1,79 @@
+---
+name: incident-triage
+description: Triage active PagerDuty incidents by gathering context from all available services using execute_tool_script.
+allowed-tools: Bash
+---
+
+# Incident Triage
+
+We have active PagerDuty incidents. Use `execute_tool_script` to build an incident triage report by gathering data from every available service in a single scripted call.
+
+Write a Starlark script that:
+
+1. Gets the service health list and active incidents from PagerDuty
+2. For each service that is NOT "Operational", gathers context in parallel:
+   - Datadog metrics and error logs for that service
+   - Recent GitHub PRs (look for potential root cause deploys)
+   - Slack #incidents messages for team context
+   - Related Jira issues
+   - Confluence runbooks
+3. Parses the text results to extract key details — incident IDs, error messages, who's involved, what was recently deployed
+4. Formats the result as a **markdown report** and returns it as a string
+
+The script should return a ready-to-display markdown string — NOT a dict. Build the markdown inside the script so no post-processing is needed. Structure it like:
+
+```
+# Incident Triage Report
+
+## Service Health
+<paste service health text>
+
+## Active Incidents
+<paste incidents text>
+
+## Degraded Service: <name>
+### Metrics
+<paste metrics>
+### Error Logs
+<paste logs>
+### Recent PRs (Potential Root Causes)
+<paste prs>
+### Slack #incidents Context
+<paste messages>
+### Related Jira Issues
+<paste jira>
+### Runbooks
+<paste runbooks>
+
+(repeat for each degraded service)
+```
+
+Use loops over the degraded services and string parsing to cross-reference results.
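Reviewer sketch for step 4 (building the markdown inside the script): the report layout above can be produced with plain list building and a final join. This is written in ordinary Python as an illustration, not taken from the PR. The names `SECTIONS` and `render_report`, and the shape of `contexts` (a list of `(service_name, results)` pairs, with `results` in the same order as the six tool calls), are assumptions made for this example.

```python
# Hypothetical helper: section titles in the same order as the six tool calls.
SECTIONS = [
    "Metrics",
    "Error Logs",
    "Recent PRs (Potential Root Causes)",
    "Slack #incidents Context",
    "Related Jira Issues",
    "Runbooks",
]

def render_report(health, incidents, contexts):
    """Return the triage report as one markdown string.

    contexts is assumed to be a list of (service_name, results) pairs,
    where results holds one text blob per entry in SECTIONS.
    """
    lines = [
        "# Incident Triage Report",
        "",
        "## Service Health",
        health,
        "",
        "## Active Incidents",
        incidents,
    ]
    for svc, results in contexts:
        lines += ["", "## Degraded Service: " + svc]
        for title, text in zip(SECTIONS, results):
            lines += ["### " + title, text]
    return "\n".join(lines)

# Placeholder inputs standing in for raw tool output.
report = render_report(
    "health text",
    "incidents text",
    [("checkout", ["m", "l", "p", "s", "j", "r"])],
)
```

Returning a single string keeps the "no post-processing" requirement: the caller only has to display `report`.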
+
+Use `parallel()` to fan out tool calls concurrently. `parallel()` takes a list of zero-arg callables (use `lambda`) and returns results in order. Fan out all services at once:
+
+```python
+def gather_context(svc):
+    results = parallel([
+        lambda s=svc: datadog_datadog_query_metrics(query=s),
+        lambda s=svc: datadog_datadog_search_logs(query=s),
+        lambda s=svc: github_github_search_prs(query=s),
+        lambda s=svc: slack_slack_read_messages(channel="incidents"),
+        lambda s=svc: jira_jira_search_issues(query=s),
+        lambda s=svc: confluence_confluence_search_pages(query=s),
+    ])
+    return results
+
+# Fan out ALL services concurrently (nested parallel)
+contexts = parallel([lambda s=svc: gather_context(s) for svc in degraded_services])
+```
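Reviewer sketch: to try the fan-out pattern outside the middleware, the described `parallel()` contract (a list of zero-arg callables in, results out in the same order) can be emulated in plain Python with a thread pool. This is only a stand-in for local experiments and says nothing about how the middleware implements `parallel()`. `fake_tool` is a hypothetical placeholder for the real tool bindings.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def parallel(thunks):
    """Emulation only: run zero-arg callables concurrently, keep input order."""
    thunks = list(thunks)
    if not thunks:
        return []
    with ThreadPoolExecutor(max_workers=len(thunks)) as pool:
        futures = [pool.submit(thunk) for thunk in thunks]
        # Collect in submission order, not completion order.
        return [future.result() for future in futures]

def fake_tool(name, delay):
    """Hypothetical stand-in for a tool call that takes `delay` seconds."""
    time.sleep(delay)
    return name + ": ok"

def gather_context(svc):
    return parallel([
        lambda s=svc: fake_tool("metrics " + s, 0.02),  # slower call listed first
        lambda s=svc: fake_tool("logs " + s, 0.0),
    ])

degraded_services = ["checkout", "web-app"]
# Nested fan-out: every service, and every tool per service, runs concurrently.
contexts = parallel([lambda s=svc: gather_context(s) for svc in degraded_services])
```

Each nested call creates its own pool here, so the outer workers never wait on a shared, exhausted pool.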
+
+NOTE: Starlark lambdas capture variables by reference. When using `lambda` inside a loop, bind the loop variable via a default argument to avoid the classic closure bug:
+```python
+# WRONG — all lambdas see the final value of svc
+[lambda: query(svc) for svc in services]
+# RIGHT — bind svc at definition time
+[lambda s=svc: query(s) for svc in services]
+```
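Reviewer sketch: the closure pitfall in the NOTE can be reproduced in plain Python, which has the same late-binding behavior. The snippet only demonstrates Python semantics; that Starlark behaves the same way is the NOTE's claim and is not verified here. `query` is a hypothetical stand-in for a tool call.

```python
services = ["checkout", "web-app"]

def query(svc):
    """Hypothetical stand-in for a tool call."""
    return "query:" + svc

# Late binding: each lambda looks up svc when it is called, after the loop ended.
late = [f() for f in [lambda: query(svc) for svc in services]]

# Default argument: svc is evaluated once, when each lambda is defined.
bound = [f() for f in [lambda s=svc: query(s) for svc in services]]
```

With late binding every call queries the last service; with the default argument each lambda keeps its own service.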
+
+IMPORTANT: The script returns a fully formatted markdown report. After calling execute_tool_script, display the result text verbatim. Do NOT summarize, reformat, or add your own analysis — the script output IS the final answer.
@@ -27,6 +27,7 @@ import (
 	"github.com/stacklok/toolhive/pkg/authserver/server/keys"
 	"github.com/stacklok/toolhive/pkg/container/runtime"
 	"github.com/stacklok/toolhive/pkg/groups"
+	"github.com/stacklok/toolhive/pkg/script"
 	"github.com/stacklok/toolhive/pkg/telemetry"
 	"github.com/stacklok/toolhive/pkg/vmcp"
 	"github.com/stacklok/toolhive/pkg/vmcp/aggregator"
@@ -624,6 +625,7 @@ func runServe(cmd *cobra.Command, _ []string) error {
 		Port:                    port,
 		AuthMiddleware:          authMiddleware,
 		AuthzMiddleware:         authzMiddleware,
+		ScriptMiddleware:        script.NewMiddleware(),
 		AuthInfoHandler:         authInfoHandler,
 		AuthServer:              embeddedAuthServer,
 		TelemetryProvider:       telemetryProvider,

@@ -0,0 +1,139 @@
+# Script Middleware Demo
+
+Demonstrates `execute_tool_script` — a Starlark scripting layer that lets agents
+orchestrate multiple MCP tool calls in a single atomic operation.
+
+## What this shows
+
+An agent connected to a VirtualMCPServer with 8 enterprise tool backends
+(Slack, Jira, GitHub, PagerDuty, Datadog, Confluence, Google Drive, Linear)
+uses `execute_tool_script` to gather and cross-reference data across services
+in one call instead of 8+ sequential round-trips.
+
+## Setup (local Kind cluster)
+
+### Prerequisites
+- `kind`, `kubectl`, `docker`, `helm`, and `task` installed
+- ToolHive operator image built locally: `task build-all-images`
+
+### Deploy
+
+```bash
+# From repo root
+./demo/script-middleware/deploy.sh
+```
+
+This creates a Kind cluster, installs the operator, deploys 8 dummy MCP servers
+and a VirtualMCPServer, and exposes it on localhost:4483 through a Kind port
+mapping to NodePort 30080 (no `kubectl port-forward` is involved).
+
+### Connect with Claude Code
+
+```bash
+# In Claude Code settings, add as an MCP server:
+#   URL: http://localhost:4483/mcp
+#   Transport: streamable-http
+
+# Then give Claude this prompt (see below)
+```
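Reviewer sketch: before wiring up an agent, it may help to confirm the endpoint answers at all. The snippet below builds an MCP `initialize` request for the streamable HTTP transport using only the standard library. It assumes the demo endpoint accepts unauthenticated requests, which is not confirmed by this PR. The helper names, the `clientInfo` values, and the chosen `protocolVersion` are illustrative assumptions.

```python
import json
import urllib.request

MCP_URL = "http://localhost:4483/mcp"  # endpoint printed by deploy.sh

def build_initialize_request(url=MCP_URL):
    """Build (but do not send) a JSON-RPC initialize request."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed; use what your client speaks
            "capabilities": {},
            "clientInfo": {"name": "smoke-test", "version": "0.0.1"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )

def smoke_test(url=MCP_URL, timeout=10):
    """Send the request; returns (HTTP status, response Content-Type)."""
    with urllib.request.urlopen(build_initialize_request(url), timeout=timeout) as resp:
        return resp.status, resp.headers.get("Content-Type")
```

Call `smoke_test()` once `deploy.sh` has finished. A connection error at this point suggests the NodePort patch or the port mapping did not take effect.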
+
+### Teardown
+
+```bash
+kind delete cluster --name script-demo
+```
+
+## The Prompt
+
+Give this to Claude (or any MCP-capable agent) after connecting:
+
+> We have active PagerDuty incidents. Use execute_tool_script to build an
+> incident triage report by gathering data from every available service.
+>
+> Write a script that:
+> 1. Gets the service health list and active incidents from PagerDuty
+> 2. For each service that is NOT "Operational", gathers context:
+>    - Datadog metrics and error logs for that service
+>    - Recent GitHub PRs (look for potential root cause deploys)
+>    - Slack #incidents messages for team context
+>    - Related Jira issues
+>    - Confluence runbooks
+> 3. Parses the text results to extract key details (incident IDs, error
+>    messages, who's involved, what was recently deployed)
+> 4. Returns a structured dict mapping each degraded service to its
+>    full triage context
+>
+> The script should use loops and string parsing — don't just call each
+> tool once, cross-reference the results.
+
+### What the agent should produce
+
+A Starlark script that loops over degraded services, calls 5-6 tools per
+service, parses the text output to extract names/IDs/timestamps, and returns
+a structured dict. Something like:
+
+```python
+services = pagerduty_list_services()
+incidents = pagerduty_list_incidents()
+report = {}
+
+for line in services.split("\n"):
+    if "Degraded" in line or "Critical" in line:
+        svc = line.split(" — ")[0].strip()
+
+        metrics = datadog_query_metrics(query=svc, timeframe="last_1h")
+        logs = datadog_search_logs(query="ERROR", service=svc)
+        prs = github_search_prs(query="merged", repo=svc)
+        slack = slack_read_messages(channel="#incidents")
+        jira = jira_search_issues(query=svc, project="ENG")
+        runbook = confluence_search_pages(query=svc + " runbook")
+
+        # Extract people involved from Slack messages
+        people = []
+        for msg in slack.split("\n"):
+            if "]" in msg:
+                who = msg.split("]")[1].split(":")[0].strip()
+                if who and who not in people:
+                    people.append(who)
+
+        # Find incident IDs for this service
+        svc_incidents = []
+        for inc in incidents.split("\n"):
+            if svc in inc:
+                svc_incidents.append(inc.strip())
+
+        report[svc] = {
+            "incidents": svc_incidents,
+            "metrics_summary": metrics,
+            "recent_errors": logs,
+            "recent_prs": prs,
+            "team_engaged": people,
+            "related_jira": jira,
+            "runbook": runbook,
+        }
+
+return report
+```
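Reviewer sketch: the two parsing steps in the example can be checked in isolation with plain Python. The sample lines below are invented for illustration. The actual output format of the dummy servers is not shown in this PR, so the `name <dash> status` and `[time] person: text` layouts are assumptions inferred from the split calls above.

```python
SEP = " \u2014 "  # same delimiter the example script splits on

def degraded_services(services_text):
    """Return names of services whose line mentions Degraded or Critical."""
    names = []
    for line in services_text.split("\n"):
        if "Degraded" in line or "Critical" in line:
            names.append(line.split(SEP)[0].strip())
    return names

def people_engaged(slack_text):
    """Return unique authors, in first-seen order, from '[time] who: text' lines."""
    people = []
    for msg in slack_text.split("\n"):
        if "]" in msg:
            who = msg.split("]")[1].split(":")[0].strip()
            if who and who not in people:
                people.append(who)
    return people

# Hypothetical sample data, not the demo servers' real output.
services_text = "\n".join([
    "checkout" + SEP + "Critical",
    "web-app" + SEP + "Degraded",
    "billing" + SEP + "Operational",
])
slack_text = "\n".join([
    "[10:02] alice: rolling back v2.4.1",
    "[10:03] bob: checking dashboards",
    "[10:05] alice: rollback done",
])

services = degraded_services(services_text)
people = people_engaged(slack_text)
```

Splitting on `]` before `:` matters: the timestamp itself contains a colon, so reversing the order would return part of the time instead of the author.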
+
+## Why this is interesting
+
+1. **Loops + conditionals** — the script iterates over degraded services,
+   not a static list. The agent writes real control flow.
+
+2. **Cross-referencing** — incident IDs from PagerDuty are matched against
+   service names. Slack messages are parsed to extract who's engaged.
+
+3. **8+ tool calls in one round-trip** — without `execute_tool_script`,
+   the agent needs sequential calls with model inference between each.
+   The script runs server-side and returns one aggregated result.
+
+4. **Text parsing** — the script does string splitting and filtering that
+   would otherwise require the model to process raw text from each tool.
+
+## Coherent demo story
+
+The dummy data tells a story: Alice deployed `v2.4.1`, which caused the
+checkout service to time out, spiking web-app latency. PagerDuty fired two
+incidents (SEV1 checkout, SEV2 web-app). The Slack #incidents channel shows
+Alice, Bob, and Carol coordinating. Datadog logs show the exact error chain.
+GitHub shows the merged PR that caused it. The script stitches all of this
+together into a single triage report.
@@ -0,0 +1,95 @@
+#!/usr/bin/env bash
+# Deploy the script middleware demo to a local Kind cluster.
+# Prerequisites: kind, kubectl, docker, helm, task (Taskfile)
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
+CLUSTER_NAME="script-demo"
+
+echo "=== Script Middleware Demo ==="
+echo ""
+
+# 1. Create Kind cluster (if not exists)
+if kind get clusters 2>/dev/null | grep -q "^${CLUSTER_NAME}$"; then
+    echo "Kind cluster '$CLUSTER_NAME' already exists, reusing."
+else
+    echo "Creating Kind cluster '$CLUSTER_NAME'..."
+    cat <<EOF | kind create cluster --name "$CLUSTER_NAME" --config=-
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+nodes:
+  - role: control-plane
+    kubeadmConfigPatches:
+      - |
+        kind: InitConfiguration
+        nodeRegistration:
+          kubeletExtraArgs:
+            node-labels: "ingress-ready=true"
+    extraPortMappings:
+      - containerPort: 30080
+        hostPort: 4483
+        protocol: TCP
+EOF
+fi
+
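+# Newer kind releases merge the cluster into the default kubeconfig and no longer
+# provide `kind get kubeconfig-path`, so respect an existing KUBECONFIG instead.
+export KUBECONFIG="${KUBECONFIG:-$HOME/.kube/config}"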
+kubectl config use-context "kind-${CLUSTER_NAME}"
+
+# 2. Build and load images
+echo ""
+echo "Building operator and vmcp images..."
+cd "$REPO_ROOT"
+task build-all-images 2>&1 | tail -5
+
+echo "Loading images into Kind cluster..."
+kind load docker-image ghcr.io/stacklok/toolhive/operator:latest --name "$CLUSTER_NAME"
+kind load docker-image ghcr.io/stacklok/toolhive/vmcp:latest --name "$CLUSTER_NAME"
+kind load docker-image ghcr.io/stacklok/toolhive/proxyrunner:latest --name "$CLUSTER_NAME"
+
+# 3. Install CRDs and operator
+echo ""
+echo "Installing CRDs..."
+# Use tail, not head: under pipefail, head closing the pipe early can kill kubectl with SIGPIPE.
+kubectl apply -f deploy/charts/operator-crds/files/crds/ 2>&1 | tail -5
+
+echo "Deploying operator..."
+helm upgrade --install thv-operator deploy/charts/operator \
+    --namespace toolhive-system --create-namespace \
+    --set image.tag=latest \
+    --set vmcpImage.tag=latest \
+    --set proxyRunnerImage.tag=latest \
+    --wait --timeout 120s 2>&1 | tail -3
+
+# 4. Deploy demo manifests
+echo ""
+echo "Deploying demo MCP servers..."
+kubectl apply -f "$SCRIPT_DIR/manifests.yaml"
+
+# 5. Wait for VirtualMCPServer
+echo ""
+echo "Waiting for VirtualMCPServer to be ready..."
+kubectl wait --for=condition=Ready virtualmcpserver/demo-vmcp \
+    -n script-demo --timeout=180s 2>&1 || true
+
+# 6. Patch the NodePort to use 30080 (mapped to host 4483)
+echo ""
+echo "Configuring NodePort..."
+VMCP_SVC=$(kubectl get svc -n script-demo -l app.kubernetes.io/instance=demo-vmcp -o name | head -1)
+if [ -n "$VMCP_SVC" ]; then
+    kubectl patch "$VMCP_SVC" -n script-demo --type='json' \
+        -p='[{"op":"replace","path":"/spec/ports/0/nodePort","value":30080}]' 2>/dev/null || true
+fi
+
+echo ""
+echo "=== Demo Ready ==="
+echo ""
+echo "VirtualMCPServer: http://localhost:4483/mcp"
+echo ""
+echo "Tools available: slack (4), jira (4), confluence (2), github (4),"
+echo "                 pagerduty (3), datadog (3), google-drive (2), linear (2)"
+echo "                 + execute_tool_script"
+echo ""
+echo "Connect with an MCP client or add to Claude Code settings:"
+echo '  { "mcpServers": { "demo": { "url": "http://localhost:4483/mcp" } } }'
+echo ""
+echo "Teardown: kind delete cluster --name $CLUSTER_NAME"