45 changes: 45 additions & 0 deletions .claude/skills/incident-triage-lame/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
---
name: incident-triage-lame
description: Triage active PagerDuty incidents by gathering context from all available services one tool call at a time.
allowed-tools: Bash
---

# Incident Triage (No Scripting)

We have active PagerDuty incidents. Build an incident triage report by gathering data from every available service.

Do NOT use `execute_tool_script`. Call each tool individually, one at a time.

1. Check PagerDuty for service health and active incidents
2. For each degraded service, gather context from Datadog (metrics + logs), GitHub (recent PRs), Slack (#incidents messages), Jira (related issues), and Confluence (runbooks)
3. Cross-reference the results to identify probable root causes, who's engaged, and what runbooks apply

Format the final report as markdown matching this structure exactly:

```
# Incident Triage Report

## Service Health
<paste service health output verbatim>

## Active Incidents
<paste incidents output verbatim>

## Degraded Service: <name>
### Metrics
<paste metrics verbatim>
### Error Logs
<paste logs verbatim>
### Recent PRs (Potential Root Causes)
<paste prs verbatim>
### Slack #incidents Context
<paste messages verbatim>
### Related Jira Issues
<paste jira verbatim>
### Runbooks
<paste runbooks verbatim>

(repeat for each degraded service)
```

Include the raw tool output under each heading — do not summarize or rewrite it.
79 changes: 79 additions & 0 deletions .claude/skills/incident-triage/SKILL.md
@@ -0,0 +1,79 @@
---
name: incident-triage
description: Triage active PagerDuty incidents by gathering context from all available services using execute_tool_script.
allowed-tools: Bash
---

# Incident Triage

We have active PagerDuty incidents. Use `execute_tool_script` to build an incident triage report by gathering data from every available service in a single scripted call.

Write a Starlark script that:

1. Gets the service health list and active incidents from PagerDuty
2. For each service that is NOT "Operational", gathers context in parallel:
   - Datadog metrics and error logs for that service
   - Recent GitHub PRs (look for potential root cause deploys)
   - Slack #incidents messages for team context
   - Related Jira issues
   - Confluence runbooks
3. Parses the text results to extract key details — incident IDs, error messages, who's involved, what was recently deployed
4. Formats the result as a **markdown report** and returns it as a string

The script should return a ready-to-display markdown string — NOT a dict. Build the markdown inside the script so no post-processing is needed. Structure it like:

```
# Incident Triage Report

## Service Health
<paste service health text>

## Active Incidents
<paste incidents text>

## Degraded Service: <name>
### Metrics
<paste metrics>
### Error Logs
<paste logs>
### Recent PRs (Potential Root Causes)
<paste prs>
### Slack #incidents Context
<paste messages>
### Related Jira Issues
<paste jira>
### Runbooks
<paste runbooks>

(repeat for each degraded service)
```

Use loops over the degraded services and string parsing to cross-reference results.
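
The report layout above can be assembled with ordinary string building. The following is a minimal sketch in plain Python (not Starlark), assuming the tool results have already been collected as strings; `render_report`, the `contexts` shape, and the sample values are hypothetical and only illustrate the looping pattern.

```python
def render_report(health, incidents, contexts):
    """Build the markdown report from raw tool output strings.

    contexts maps a service name to a dict of section title -> raw text.
    """
    lines = [
        "# Incident Triage Report",
        "",
        "## Service Health",
        health,
        "",
        "## Active Incidents",
        incidents,
    ]
    # One "Degraded Service" block per service, sections in insertion order.
    for svc, sections in contexts.items():
        lines += ["", "## Degraded Service: " + svc]
        for title, text in sections.items():
            lines += ["### " + title, text]
    return "\n".join(lines)


# Hypothetical sample data, for illustration only.
report = render_report(
    "checkout: Degraded",
    "INC-1 checkout SEV1",
    {"checkout": {"Metrics": "p99 latency high", "Error Logs": "timeout"}},
)
```

Returning one joined string keeps the script output ready to display, which is what the skill asks for.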

Use `parallel()` to fan out tool calls concurrently. `parallel()` takes a list of zero-arg callables (use `lambda`) and returns results in order. Fan out all services at once:

```python
def gather_context(svc):
    results = parallel([
        lambda s=svc: datadog_datadog_query_metrics(query=s),
        lambda s=svc: datadog_datadog_search_logs(query=s),
        lambda s=svc: github_github_search_prs(query=s),
        lambda s=svc: slack_slack_read_messages(channel="incidents"),
        lambda s=svc: jira_jira_search_issues(query=s),
        lambda s=svc: confluence_confluence_search_pages(query=s),
    ])
    return results

# Fan out ALL services concurrently (nested parallel)
contexts = parallel([lambda s=svc: gather_context(s) for svc in degraded_services])
```
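
To try out the ordering contract away from the middleware, `parallel()` can be approximated in plain Python. This is only a sketch under assumptions: `parallel_sketch` and `fake_tool` are hypothetical stand-ins, and the real `parallel()` is supplied by the script runtime, whose implementation is not shown here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def parallel_sketch(callables):
    """Run zero-arg callables concurrently; return results in input order."""
    if not callables:
        return []
    with ThreadPoolExecutor(max_workers=len(callables)) as pool:
        futures = [pool.submit(fn) for fn in callables]
        # Collect in submission order, not completion order.
        return [f.result() for f in futures]


def fake_tool(name, delay):
    """Hypothetical stand-in for an MCP tool call."""
    time.sleep(delay)
    return name + " done"


# The slower call is listed first and still comes back first.
results = parallel_sketch([
    lambda: fake_tool("metrics", 0.05),
    lambda: fake_tool("logs", 0.01),
])
```

Because results follow input order, the caller can unpack them positionally (metrics, logs, PRs, and so on) without tagging each result.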

NOTE: Starlark lambdas capture variables by reference. When using `lambda` inside a loop, bind the loop variable via a default argument to avoid the classic closure bug:
```python
# WRONG — all lambdas see the final value of svc
[lambda: query(svc) for svc in services]
# RIGHT — bind svc at definition time
[lambda s=svc: query(s) for svc in services]
```
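
The same pitfall can be reproduced in plain Python, which makes it easy to check locally. This sketch assumes nothing about the script runtime; the service names are taken from the demo data and the lambdas just return the captured value.

```python
services = ["checkout", "web-app"]

# Late binding: each lambda looks up svc when it is called,
# by which time the comprehension has finished.
late = [lambda: svc for svc in services]

# Default argument: s is evaluated once, when each lambda is defined.
bound = [lambda s=svc: s for svc in services]

late_values = [f() for f in late]
bound_values = [f() for f in bound]
```

In Python, `late_values` contains the last service twice, while `bound_values` matches the original list.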

IMPORTANT: The script returns a fully formatted markdown report. After calling execute_tool_script, display the result text verbatim. Do NOT summarize, reformat, or add your own analysis — the script output IS the final answer.
2 changes: 2 additions & 0 deletions cmd/vmcp/app/commands.go
@@ -27,6 +27,7 @@ import (
"github.com/stacklok/toolhive/pkg/authserver/server/keys"
"github.com/stacklok/toolhive/pkg/container/runtime"
"github.com/stacklok/toolhive/pkg/groups"
"github.com/stacklok/toolhive/pkg/script"
"github.com/stacklok/toolhive/pkg/telemetry"
"github.com/stacklok/toolhive/pkg/vmcp"
"github.com/stacklok/toolhive/pkg/vmcp/aggregator"
@@ -624,6 +625,7 @@ func runServe(cmd *cobra.Command, _ []string) error {
Port: port,
AuthMiddleware: authMiddleware,
AuthzMiddleware: authzMiddleware,
ScriptMiddleware: script.NewMiddleware(),
AuthInfoHandler: authInfoHandler,
AuthServer: embeddedAuthServer,
TelemetryProvider: telemetryProvider,
139 changes: 139 additions & 0 deletions demo/script-middleware/README.md
@@ -0,0 +1,139 @@
# Script Middleware Demo

Demonstrates `execute_tool_script` — a Starlark scripting layer that lets agents
orchestrate multiple MCP tool calls in a single atomic operation.

## What this shows

An agent connected to a VirtualMCPServer with 8 enterprise tool backends
(Slack, Jira, GitHub, PagerDuty, Datadog, Confluence, Google Drive, Linear)
uses `execute_tool_script` to gather and cross-reference data across services
in one call instead of 8+ sequential round-trips.

## Setup (local Kind cluster)

### Prerequisites
- `kind`, `kubectl`, `helm`, `docker`, and `task` installed
- ToolHive operator image built locally: `task build-all-images`

### Deploy

```bash
# From repo root
./demo/script-middleware/deploy.sh
```

This creates a Kind cluster, installs the operator, deploys 8 dummy MCP servers
and a VirtualMCPServer, and exposes it on localhost:4483 through a NodePort
that Kind maps to the host.

### Connect with Claude Code

```bash
# In Claude Code settings, add as an MCP server:
# URL: http://localhost:4483/mcp
# Transport: streamable-http

# Then give Claude this prompt (see below)
```

### Teardown

```bash
kind delete cluster --name script-demo
```

## The Prompt

Give this to Claude (or any MCP-capable agent) after connecting:

> We have active PagerDuty incidents. Use execute_tool_script to build an
> incident triage report by gathering data from every available service.
>
> Write a script that:
> 1. Gets the service health list and active incidents from PagerDuty
> 2. For each service that is NOT "Operational", gathers context:
>    - Datadog metrics and error logs for that service
>    - Recent GitHub PRs (look for potential root cause deploys)
>    - Slack #incidents messages for team context
>    - Related Jira issues
>    - Confluence runbooks
> 3. Parses the text results to extract key details (incident IDs, error
>    messages, who's involved, what was recently deployed)
> 4. Returns a structured dict mapping each degraded service to its
>    full triage context
>
> The script should use loops and string parsing — don't just call each
> tool once, cross-reference the results.

### What the agent should produce

A Starlark script that loops over degraded services, calls 5-6 tools per
service, parses the text output to extract names/IDs/timestamps, and returns
a structured dict. Something like:

```python
services = pagerduty_list_services()
incidents = pagerduty_list_incidents()
report = {}

for line in services.split("\n"):
    if "Degraded" in line or "Critical" in line:
        svc = line.split(" — ")[0].strip()

        metrics = datadog_query_metrics(query=svc, timeframe="last_1h")
        logs = datadog_search_logs(query="ERROR", service=svc)
        prs = github_search_prs(query="merged", repo=svc)
        slack = slack_read_messages(channel="#incidents")
        jira = jira_search_issues(query=svc, project="ENG")
        runbook = confluence_search_pages(query=svc + " runbook")

        # Extract people involved from Slack messages
        people = []
        for msg in slack.split("\n"):
            if "]" in msg:
                who = msg.split("]")[1].split(":")[0].strip()
                if who and who not in people:
                    people.append(who)

        # Find incident IDs for this service
        svc_incidents = []
        for inc in incidents.split("\n"):
            if svc in inc:
                svc_incidents.append(inc.strip())

        report[svc] = {
            "incidents": svc_incidents,
            "metrics_summary": metrics,
            "recent_errors": logs,
            "recent_prs": prs,
            "team_engaged": people,
            "related_jira": jira,
            "runbook": runbook,
        }

return report
```
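
The Slack parsing step can be exercised on its own in plain Python. This is a sketch under assumptions: the `[time] name: text` line format is inferred from the split logic above rather than from the dummy server's actual output, and `extract_people` and the sample lines are hypothetical.

```python
def extract_people(slack_text):
    """Return unique speaker names, in first-seen order."""
    people = []
    for msg in slack_text.split("\n"):
        if "]" in msg:
            # Text between the first "]" and the next ":" is treated as the name.
            who = msg.split("]")[1].split(":")[0].strip()
            if who and who not in people:
                people.append(who)
    return people


# Hypothetical sample lines, for illustration only.
sample = "\n".join([
    "[14:02] Alice: rolling back v2.4.1",
    "[14:03] Bob: checkout timeouts confirmed",
    "[14:05] Alice: rollback in progress",
    "line without a timestamp",
])
people = extract_people(sample)
```

Lines that contain `]` but no speaker prefix would be misread as names, so a real script may want a stricter check on the line shape.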

## Why this is interesting

1. **Loops + conditionals** — the script iterates over degraded services,
not a static list. The agent writes real control flow.

2. **Cross-referencing** — incident IDs from PagerDuty are matched against
service names. Slack messages are parsed to extract who's engaged.

3. **8+ tool calls in one round-trip** — without `execute_tool_script`,
the agent needs sequential calls with model inference between each.
The script runs server-side and returns one aggregated result.

4. **Text parsing** — the script does string splitting and filtering that
would otherwise require the model to process raw text from each tool.

## Coherent demo story

The dummy data tells a story: Alice deployed `v2.4.1`, which caused the
checkout service to time out, spiking web-app latency. PagerDuty fired two
incidents (SEV1 checkout, SEV2 web-app). The Slack #incidents channel shows
Alice, Bob, and Carol coordinating. Datadog logs show the exact error chain.
GitHub shows the merged PR that caused it. The script stitches all of this
together into a single triage report.
95 changes: 95 additions & 0 deletions demo/script-middleware/deploy.sh
@@ -0,0 +1,95 @@
#!/usr/bin/env bash
# Deploy the script middleware demo to a local Kind cluster.
# Prerequisites: kind, kubectl, helm, docker, task (Taskfile)
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
CLUSTER_NAME="script-demo"

echo "=== Script Middleware Demo ==="
echo ""

# 1. Create Kind cluster (if not exists)
if kind get clusters 2>/dev/null | grep -q "^${CLUSTER_NAME}$"; then
  echo "Kind cluster '$CLUSTER_NAME' already exists, reusing."
else
  echo "Creating Kind cluster '$CLUSTER_NAME'..."
  cat <<EOF | kind create cluster --name "$CLUSTER_NAME" --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 30080
    hostPort: 4483
    protocol: TCP
EOF
fi

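# kind writes the cluster context to $KUBECONFIG (or ~/.kube/config when unset),
# so reuse that path; `kind get kubeconfig-path` is not available in current kind releases.
export KUBECONFIG="${KUBECONFIG:-$HOME/.kube/config}"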
kubectl config use-context "kind-${CLUSTER_NAME}"

# 2. Build and load images
echo ""
echo "Building operator and vmcp images..."
cd "$REPO_ROOT"
task build-all-images 2>&1 | tail -5

echo "Loading images into Kind cluster..."
kind load docker-image ghcr.io/stacklok/toolhive/operator:latest --name "$CLUSTER_NAME"
kind load docker-image ghcr.io/stacklok/toolhive/vmcp:latest --name "$CLUSTER_NAME"
kind load docker-image ghcr.io/stacklok/toolhive/proxyrunner:latest --name "$CLUSTER_NAME"

# 3. Install CRDs and operator
echo ""
echo "Installing CRDs..."
kubectl apply -f deploy/charts/operator-crds/files/crds/ 2>&1 | head -5

echo "Deploying operator..."
helm upgrade --install thv-operator deploy/charts/operator \
  --namespace toolhive-system --create-namespace \
  --set image.tag=latest \
  --set vmcpImage.tag=latest \
  --set proxyRunnerImage.tag=latest \
  --wait --timeout 120s 2>&1 | tail -3

# 4. Deploy demo manifests
echo ""
echo "Deploying demo MCP servers..."
kubectl apply -f "$SCRIPT_DIR/manifests.yaml"

# 5. Wait for VirtualMCPServer
echo ""
echo "Waiting for VirtualMCPServer to be ready..."
kubectl wait --for=condition=Ready virtualmcpserver/demo-vmcp \
  -n script-demo --timeout=180s 2>&1 || true

# 6. Patch the NodePort to use 30080 (mapped to host 4483)
echo ""
echo "Configuring NodePort..."
VMCP_SVC=$(kubectl get svc -n script-demo -l app.kubernetes.io/instance=demo-vmcp -o name | head -1)
if [ -n "$VMCP_SVC" ]; then
  kubectl patch "$VMCP_SVC" -n script-demo --type='json' \
    -p='[{"op":"replace","path":"/spec/ports/0/nodePort","value":30080}]' 2>/dev/null || true
fi

echo ""
echo "=== Demo Ready ==="
echo ""
echo "VirtualMCPServer: http://localhost:4483/mcp"
echo ""
echo "Tools available: slack (4), jira (4), confluence (2), github (4),"
echo " pagerduty (3), datadog (3), google-drive (2), linear (2)"
echo " + execute_tool_script"
echo ""
echo "Connect with an MCP client or add to Claude Code settings:"
echo ' { "mcpServers": { "demo": { "url": "http://localhost:4483/mcp" } } }'
echo ""
echo "Teardown: kind delete cluster --name $CLUSTER_NAME"