diff --git a/skills/netra-mcp-usage/SKILL.md b/skills/netra-mcp-usage/SKILL.md index e31c2cd..464ca02 100644 --- a/skills/netra-mcp-usage/SKILL.md +++ b/skills/netra-mcp-usage/SKILL.md @@ -12,134 +12,24 @@ Use this skill when you need to inspect traces through Netra MCP tools and want - Query traces in a time range with filtering, sorting, and cursor pagination. - Retrieve full span trees for a selected trace id. - Guide incident/debug workflows from trace search to root-cause analysis. +- Run MCP-driven evaluation workflows for single-turn and multi-turn datasets. ## Primary MCP Tools - `netra_query_traces` - `netra_get_trace_by_id` +- `netra_list_provider_configs` +- `netra_create_dataset` +- `netra_add_dataset_test_case` +- `netra_list_evaluators` +- `netra_add_evaluator` +- `netra_get_test_run_details` -## Workflow -1. Start with a narrow time window and low limit. -2. Add the minimum filters needed to isolate relevant traces. -3. Sort for your objective (recent, slowest, most expensive, errors). -4. Page through results using returned cursor values. -5. Fetch full spans for one trace id. -6. Inspect hierarchy, status, latency, and attributes. 
+## Use-Case specific references -## query_traces Input Schema -Required: -- `startTime` (string, ISO 8601) -- `endTime` (string, ISO 8601) +- Querying traces, filters, sort options, pagination, and incident triage: `references/traces.md` +- Single-turn evaluation flow (providers -> datasets -> test cases -> evaluators -> test run): `references/evaluations-single-turn.md` +- Multi-turn simulation flow (scenario-driven test cases and evaluator config handling): `references/simulation-multi-turn.md` -Optional: -- `limit` (number, 1-100, default 20) -- `cursor` (string) -- `direction` (`up` | `down`, default `down`) -- `sortField` -- `sortOrder` (`asc` | `desc`, default `desc`) -- `filters` (array of filter objects) +## Feedback -### sortField Values -- `latency_ms` -- `name` -- `total_cost` -- `has_pii` -- `has_violation` -- `start_time` -- `environment` -- `service` -- `has_error` -- `total_tokens` - -### Filter Object Schema -Each filter object must include: -- `field` -- `value` -- `type` -- `operator` - -Optional in filter object: -- `key` (for nested/object-style filtering) - -#### field Values -- `name` -- `tenant_id` -- `user_id` -- `session_id` -- `environment` -- `service` -- `metadata` -- `projectIds` -- `project_id` -- `parent_span_id` -- `has_pii` -- `has_violation` -- `has_error` -- `models` -- `total_cost` -- `latency` - -#### type Values -- `string` -- `number` -- `boolean` -- `arrayOptions` -- `attributeKey` -- `object` - -#### operator Values -- `equals` -- `greater_than` -- `less_than` -- `greater_equal_to` -- `less_equal_to` -- `contains` -- `not_equals` -- `any_of` -- `none_of` -- `not_contains` -- `starts_with` -- `ends_with` -- `is_null` -- `is_not_null` - -## Filter Patterns -- Error traces only: - - `field: has_error`, `type: boolean`, `operator: equals`, `value: true` -- Specific session: - - `field: session_id`, `type: string`, `operator: equals`, `value: ` -- High latency: - - `field: latency`, `type: number`, `operator: greater_than`, 
`value: 3000` -- Service scoped: - - `field: service`, `type: string`, `operator: equals`, `value: ` -- Metadata key/value: - - `field: metadata`, `type: object`, `key: `, `operator: equals`, `value: ` - -## Pagination Pattern -1. Run `query_traces` without `cursor`. -2. Capture a `cursor` from returned trace items. -3. Re-run `query_traces` with the cursor and `direction: down`. -4. Continue while `pageInfo.hasNextPage` is true. - -## get_trace_by_id Input Schema -Required: -- `traceId` (string) - -Behavior: -- Returns complete span array for the trace id. -- Use this after `query_traces` to inspect one trace deeply. -- Invalid ids return a not-found style error. - -## Incident Triage Recipe -1. Query for failing traces (`has_error=true`) in the incident window. -2. Sort by `latency_ms` desc to identify worst requests. -3. Pull one trace via `get_trace_by_id`. -4. Validate root span presence and parent-child span flow. -5. Check slow spans and tool/model metadata. -6. Compare with a nearby successful trace if needed. - -## Practical Tips -- Keep initial windows short (5-30 minutes) for faster narrowing. -- Use one or two filters first, then add more only if needed. -- Prefer exact-match IDs (`session_id`, `user_id`, `tenant_id`) when available. -- Use `sortField=total_cost` to find expensive traces quickly. -- If no results: widen time range first, then relax filters. +If the user is unhappy with the results, ask them to open an issue at https://github.com/KeyValueSoftwareSystems/netra-skills/issues/new. diff --git a/skills/netra-mcp-usage/references/evaluations-single-turn.md b/skills/netra-mcp-usage/references/evaluations-single-turn.md new file mode 100644 index 0000000..88886da --- /dev/null +++ b/skills/netra-mcp-usage/references/evaluations-single-turn.md @@ -0,0 +1,168 @@ +--- +name: netra-mcp-evaluations-single-turn +description: End-to-end single-turn evaluation workflow in Netra MCP from provider selection to test run details. 
+---
+
+# Netra MCP Evaluations (Single-Turn)
+
+Use this reference for a schema-correct single-turn evaluation flow using Netra MCP tools.
+
+## End-To-End Flow
+
+1. List provider configurations.
+2. Create a single-turn dataset.
+3. Add single-turn test cases.
+4. List evaluators.
+5. If project evaluators are missing (or you only see library evaluators), create evaluators in the Netra dashboard first.
+6. Attach evaluators to dataset or test cases.
+7. Execute a test run.
+8. Fetch run results using test run id.
+
+## Step 1: List Provider Configurations
+
+Tool: `netra_list_provider_configs`
+
+Purpose:
+- Find valid `provider_id` and `model` values for dataset items.
+- Confirm the provider/model is available for your use case.
+
+Example:
+
+```json
+{}
+```
+
+## Step 2: Create A Single-Turn Dataset
+
+Tool: `netra_create_dataset`
+
+Required choices:
+- `turnType`: `single`
+- `datasetType`: usually `text`
+
+Example:
+
+```json
+{
+  "name": "support-quality-single-turn",
+  "turnType": "single",
+  "datasetType": "text",
+  "tags": ["support", "regression"]
+}
+```
+
+## Step 3: Add Single-Turn Test Cases
+
+Tool: `netra_add_dataset_test_case`
+
+Important:
+- For single-turn datasets, `input` is required.
+- `providerConfig` is required in practice. Always pass `provider_id` and `model` from Step 1.
+
+Example:
+
+```json
+{
+  "datasetId": "",
+  "input": "User asks for a refund after 45 days",
+  "expectedOutput": "Assistant explains policy and offers next best options",
+  "contextData": {
+    "policy": "30-day refund window",
+    "region": "US"
+  },
+  "providerConfig": {
+    "provider_id": "",
+    "model": ""
+  },
+  "tags": ["refund"]
+}
+```
+
+## Step 4: List Evaluators
+
+Tool: `netra_list_evaluators`
+
+Purpose:
+- Discover project evaluators available for attachment.
+- Inspect available library evaluators in `libraryData`.
+
+Example:
+
+```json
+{
+  "turnType": "single",
+  "page": 1,
+  "limit": 20
+}
+```
+
+Decision rule:
+- If project evaluator results are empty and only `libraryData` has entries, stop and instruct the user to create evaluators in the Netra dashboard before continuing.
+
+Suggested instruction to user:
+- "No project evaluators are available yet. Please create/select evaluators in the Netra dashboard for this project, then rerun `netra_list_evaluators`."
+
+## Step 5: Attach Evaluators
+
+Tool: `netra_add_evaluator`
+
+Options:
+- Attach at dataset level (`targetType: dataset`).
+- Attach at test-case level (`targetType: test_case`, requires `datasetItemId`).
+
+Example (dataset-level):
+
+```json
+{
+  "targetType": "dataset",
+  "datasetId": "",
+  "evaluatorId": "",
+  "isActive": true
+}
+```
+
+Example (test-case-level):
+
+```json
+{
+  "targetType": "test_case",
+  "datasetId": "",
+  "datasetItemId": "",
+  "evaluatorId": ""
+}
+```
+
+## Step 6: Execute Test Run
+
+Use your workspace test-run execution tool (commonly named `netra_execute_test_run`) to run the dataset against the target system.
+
+Expected output:
+- A `testRunId` used for retrieval and analysis.
+
+## Step 7: Get Test Run Details
+
+Tool: `netra_get_test_run_details`
+
+Required:
+- `testRunId`
+
+Optional:
+- `page`, `limit`, `filters`
+
+Example:
+
+```json
+{
+  "testRunId": "",
+  "page": 1,
+  "limit": 20
+}
+```
+
+## Practical Checks
+
+1. Always resolve `provider_id` and `model` before adding test cases.
+2. For single-turn cases, verify `input` is present for every item.
+3. Treat missing project evaluators as a setup blocker, not a runtime failure.
+4. Attach evaluators before running test executions to avoid incomplete scoring.
+5. Store and reuse `testRunId` for iterative detail queries.
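The first two practical checks above can be run locally before anything is sent to `netra_add_dataset_test_case`. The following is a minimal Python sketch, not part of Netra MCP: the helper name `check_single_turn_case` and the sample ids (`ds-example`, `prov-example`, `example-model`) are hypothetical placeholders, and only the field names (`datasetId`, `input`, `providerConfig`, `provider_id`, `model`) come from this reference.

```python
def check_single_turn_case(case):
    """Return a list of problems found in a single-turn test case payload.

    Hypothetical helper: it only checks the fields this reference marks as
    required (or required in practice) and does not call any Netra tool.
    """
    problems = []
    if not case.get("datasetId"):
        problems.append("datasetId is missing")
    if not case.get("input"):
        problems.append("input is missing")
    provider = case.get("providerConfig") or {}
    for key in ("provider_id", "model"):
        if not provider.get(key):
            problems.append(f"providerConfig.{key} is missing")
    return problems


# Placeholder values for illustration only; real ids come from Steps 1 and 2.
good_case = {
    "datasetId": "ds-example",
    "input": "User asks for a refund after 45 days",
    "providerConfig": {"provider_id": "prov-example", "model": "example-model"},
}
bad_case = {"datasetId": "ds-example"}

print(check_single_turn_case(good_case))
print(check_single_turn_case(bad_case))
```

Returning a list of problems, rather than raising on the first one, lets an agent report every missing field to the user in a single message.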
diff --git a/skills/netra-mcp-usage/references/simulation-multi-turn.md b/skills/netra-mcp-usage/references/simulation-multi-turn.md
new file mode 100644
index 0000000..522b90e
--- /dev/null
+++ b/skills/netra-mcp-usage/references/simulation-multi-turn.md
@@ -0,0 +1,180 @@
+---
+name: netra-mcp-simulation-multi-turn
+description: End-to-end multi-turn simulation workflow in Netra MCP including scenario authoring guidelines and evaluatorConfig usage.
+---
+
+# Netra MCP Simulation (Multi-Turn)
+
+Use this reference for simulation-style multi-turn evaluations where scenario quality and evaluator configuration drive outcome quality.
+
+## End-To-End Flow
+
+1. List provider configurations.
+2. Create a multi-turn dataset.
+3. Add multi-turn test cases with high-quality scenario metadata.
+4. List evaluators.
+5. If project evaluators are missing (or you only see library evaluators), create evaluators in the Netra dashboard first.
+6. Attach evaluators and include `evaluatorConfig` for multi-turn evaluators where required.
+7. Execute a test run.
+8. Fetch run results using test run id.
+
+## Step 1: List Provider Configurations
+
+Tool: `netra_list_provider_configs`
+
+Purpose:
+- Select valid `provider_id` and `model` for simulation test cases.
+
+Example:
+
+```json
+{}
+```
+
+## Step 2: Create A Multi-Turn Dataset
+
+Tool: `netra_create_dataset`
+
+Required choices:
+- `turnType`: `multi`
+
+Example:
+
+```json
+{
+  "name": "support-agent-simulation",
+  "turnType": "multi",
+  "datasetType": "text",
+  "tags": ["simulation", "support"]
+}
+```
+
+## Step 3: Add Multi-Turn Test Cases
+
+Tool: `netra_add_dataset_test_case`
+
+Important:
+- For multi-turn datasets, `scenario` is required.
+- `providerConfig` is required in practice. Always pass `provider_id` and `model`.
+
+Example:
+
+```json
+{
+  "datasetId": "",
+  "scenarioName": "Refund Delay",
+  "scenario": "Agent resolves a delayed refund by validating policy and giving clear next actions.",
+  "persona": "Frustrated",
+  "behaviourInstructions": "User repeatedly asks for escalation, gives partial details first, and challenges policy responses.",
+  "maxTurns": 8,
+  "providerConfig": {
+    "provider_id": "",
+    "model": ""
+  },
+  "tags": ["refund", "escalation"]
+}
+```
+
+## Multi-Turn Scenario Guidelines
+
+Follow these conventions to improve simulation consistency:
+
+1. `scenarioName` should be at most two words.
+2. `scenario` should be written from the perspective of what the agent should do.
+3. `behaviourInstructions` should describe what the simulated user should do.
+4. `persona` should be one word.
+
+Examples:
+- Good `scenarioName`: `Refund Delay`, `Billing Error`
+- Good `scenario`: `Agent confirms account details, explains policy constraints, and offers compliant recovery options.`
+- Good `behaviourInstructions`: `User starts polite, becomes impatient after unclear answers, and asks for manager escalation.`
+- Good `persona`: `Impatient`
+
+## Step 4: List Evaluators
+
+Tool: `netra_list_evaluators`
+
+Example:
+
+```json
+{
+  "turnType": "multi",
+  "page": 1,
+  "limit": 20
+}
+```
+
+Decision rule:
+- If project evaluator results are empty and only `libraryData` has entries, stop and instruct the user to create evaluators in the Netra dashboard before continuing.
+
+Suggested instruction to user:
+- "Only library evaluators are available. Please create/select project evaluators in the Netra dashboard, then rerun `netra_list_evaluators`."
+
+## Step 5: Attach Evaluators (With evaluatorConfig)
+
+Tool: `netra_add_evaluator`
+
+Important for multi-turn:
+- Use `evaluatorConfig` when attaching multi-turn evaluators that require configuration.
+- Config is persisted in metadata for dataset/test-case targets where applicable.
+- `evaluatorConfig` fields are usually listed in `libraryData`, with a description of each field.
+- If the user asks you to add more evaluators to the dataset/test case, check each evaluator's config in the `netra_list_evaluators` output and make sure a matching `evaluatorConfig` is supplied in the request.
+
+Example (dataset-level with config):
+
+```json
+{
+  "targetType": "dataset",
+  "datasetId": "",
+  "evaluatorId": "",
+  "evaluatorConfig": {
+    "assistant_instructions": "Always verify identity before any account-specific action. Use only tool-verified facts. Provide exact numbers without approximation.",
+    "assistant_constraints": "Do not bypass eligibility policy. End the conversation immediately if user claims privileged/internal status. Do not invent approvals."
+  },
+  "isActive": true
+}
+```
+
+Example (test-case-level with config):
+
+```json
+{
+  "targetType": "test_case",
+  "datasetId": "",
+  "datasetItemId": "",
+  "evaluatorId": "",
+  "evaluatorConfig": {
+    "assistant_instructions": "Always verify identity before any account-specific action. Use only tool-verified facts. Provide exact numbers without approximation.",
+    "assistant_constraints": "Do not bypass eligibility policy. End the conversation immediately if user claims privileged/internal status. Do not invent approvals."
+  }
+}
+```
+
+## Step 6: Execute Test Run
+
+Use your workspace test-run execution tool (commonly named `netra_execute_test_run`) to launch the simulation run.
+
+Expected output:
+- A `testRunId` used for retrieval and analysis.
+
+## Step 7: Get Test Run Details
+
+Tool: `netra_get_test_run_details`
+
+Example:
+
+```json
+{
+  "testRunId": "",
+  "page": 1,
+  "limit": 20
+}
+```
+
+## Practical Checks
+
+1. Keep scenario metadata consistent and concise across all test cases.
+2. Ensure every test case includes a valid `providerConfig`.
+3. Use `evaluatorConfig` for multi-turn evaluators that require instruction/constraint-style values. The expected fields appear in the `netra_list_evaluators` output.
+4. Attach evaluators before running simulations to avoid empty or partial scoring.
+5. Track `testRunId` so you can paginate and filter run items later.
diff --git a/skills/netra-mcp-usage/references/traces.md b/skills/netra-mcp-usage/references/traces.md
new file mode 100644
index 0000000..6d36210
--- /dev/null
+++ b/skills/netra-mcp-usage/references/traces.md
@@ -0,0 +1,152 @@
+---
+name: netra-mcp-traces
+description: Query and inspect traces using Netra MCP query_traces and get_trace_by_id, with schema-correct filters, sorting, and cursor pagination.
+---
+
+# Netra MCP Traces
+
+Use this reference when you need exact input structures and practical patterns for trace debugging with Netra MCP.
+
+## Workflow
+
+1. Start with a narrow time window and low limit.
+2. Add the minimum filters needed to isolate relevant traces.
+3. Sort for your objective (recent, slowest, most expensive, or errors).
+4. Page through results using returned cursor values.
+5. Fetch full spans for one trace id.
+6. Inspect hierarchy, status, latency, and attributes.
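As a rough illustration of steps 1 to 3 of this workflow, the Python sketch below assembles arguments for `query_traces` over a narrow window with one filter and a sort. The helper names (`make_filter`, `build_query_traces_args`) and the fixed timestamp are hypothetical and not part of Netra MCP; only the argument and filter field names are taken from this reference, and the exact ISO 8601 variant the server accepts is an assumption.

```python
from datetime import datetime, timedelta, timezone


def make_filter(field, type_, operator, value, key=None):
    """Build one filter object with the four required fields (plus optional key)."""
    flt = {"field": field, "type": type_, "operator": operator, "value": value}
    if key is not None:
        flt["key"] = key
    return flt


def build_query_traces_args(end, window_minutes=15, limit=10,
                            sort_field="latency_ms", sort_order="desc",
                            filters=None):
    """Assemble query_traces arguments for a narrow window ending at `end`."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    start = end - timedelta(minutes=window_minutes)
    return {
        # Assumption: timezone-aware isoformat() output is an accepted ISO 8601 form.
        "startTime": start.isoformat(),
        "endTime": end.isoformat(),
        "limit": limit,
        "sortField": sort_field,
        "sortOrder": sort_order,
        "filters": list(filters or []),
    }


# Hypothetical incident time, used only to make the example deterministic.
incident_end = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
args = build_query_traces_args(
    incident_end,
    filters=[make_filter("has_error", "boolean", "equals", True)],
)
print(args)
```

Starting from a 15-minute window and a limit of 10 follows the advice to begin narrow; widening the window is then a one-argument change.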
+
+## query_traces Input Schema
+
+Required:
+- `startTime` (string, ISO 8601)
+- `endTime` (string, ISO 8601)
+
+Optional:
+- `limit` (number, 1-100, default 20)
+- `cursor` (string)
+- `direction` (`up` | `down`, default `down`)
+- `sortField`
+- `sortOrder` (`asc` | `desc`, default `desc`)
+- `filters` (array of filter objects)
+
+### sortField Values
+
+- `latency_ms`
+- `name`
+- `total_cost`
+- `has_pii`
+- `has_violation`
+- `start_time`
+- `environment`
+- `service`
+- `has_error`
+- `total_tokens`
+
+### Filter Object Schema
+
+Each filter object must include:
+- `field`
+- `value`
+- `type`
+- `operator`
+
+Optional in filter object:
+- `key` (for nested/object-style filtering)
+
+#### field Values
+
+- `name`
+- `tenant_id`
+- `user_id`
+- `session_id`
+- `environment`
+- `service`
+- `metadata`
+- `projectIds`
+- `project_id`
+- `parent_span_id`
+- `has_pii`
+- `has_violation`
+- `has_error`
+- `models`
+- `total_cost`
+- `latency`
+
+#### type Values
+
+- `string`
+- `number`
+- `boolean`
+- `arrayOptions`
+- `attributeKey`
+- `object`
+
+#### operator Values
+
+- `equals`
+- `greater_than`
+- `less_than`
+- `greater_equal_to`
+- `less_equal_to`
+- `contains`
+- `not_equals`
+- `any_of`
+- `none_of`
+- `not_contains`
+- `starts_with`
+- `ends_with`
+- `is_null`
+- `is_not_null`
+
+## Filter Patterns
+
+- Error traces only:
+  - `field: has_error`, `type: boolean`, `operator: equals`, `value: true`
+- Specific session:
+  - `field: session_id`, `type: string`, `operator: equals`, `value: `
+- High latency:
+  - `field: latency`, `type: number`, `operator: greater_than`, `value: 3000`
+- Service scoped:
+  - `field: service`, `type: string`, `operator: equals`, `value: `
+- Metadata key/value:
+  - `field: metadata`, `type: object`, `key: `, `operator: equals`, `value: `
+
+## Pagination Pattern
+
+1. Run `query_traces` without `cursor`.
+2. Capture a `cursor` from returned trace items.
+3. Re-run `query_traces` with the cursor and `direction: down`.
+4. Continue while `pageInfo.hasNextPage` is true.
+
+## get_trace_by_id Input Schema
+
+Required:
+- `traceId` (string)
+
+Behavior:
+- Returns complete span array for the trace id.
+- Use this after `query_traces` to inspect one trace deeply.
+- Invalid ids return a not-found style error.
+
+## Incident Triage Recipe
+
+1. Query for failing traces (`has_error=true`) in the incident window.
+2. Sort by `latency_ms` desc to identify worst requests.
+3. Pull one trace via `get_trace_by_id`.
+4. Validate root span presence and parent-child span flow.
+5. Check slow spans and tool/model metadata.
+6. Compare with a nearby successful trace if needed.
+
+## Practical Tips
+
+- Keep initial windows short (5-30 minutes) for faster narrowing.
+- Use one or two filters first, then add more only if needed.
+- Prefer exact-match IDs (`session_id`, `user_id`, `tenant_id`) when available.
+- Use `sortField=total_cost` to find expensive traces quickly.
+- If no results: widen time range first, then relax filters.
+
+## References
+
+- https://docs.getnetra.ai/Observability/Traces
+- https://docs.getnetra.ai/netra-mcp
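The pagination pattern described in this reference could be driven by a loop like the Python sketch below. Several things here are assumptions rather than documented behavior: the response is assumed to hold its items under a `traces` key, each item is assumed to carry its own `cursor`, and `call_tool` stands for whatever MCP client function your environment provides. `fake_call_tool` is a canned stand-in so the sketch runs without a server.

```python
def iter_traces(call_tool, base_args, max_pages=10):
    """Yield trace items page by page while pageInfo.hasNextPage is true."""
    args = dict(base_args)
    for _ in range(max_pages):
        page = call_tool("netra_query_traces", args)
        items = page.get("traces", [])  # assumption: items live under "traces"
        yield from items
        has_next = page.get("pageInfo", {}).get("hasNextPage", False)
        if not has_next or not items:
            break
        # Assumption: resume from the cursor of the last item on this page.
        args = {**base_args, "cursor": items[-1]["cursor"], "direction": "down"}


def fake_call_tool(name, args):
    """Stand-in for a real MCP client call; serves two canned pages."""
    if "cursor" not in args:
        return {
            "traces": [{"id": "t1", "cursor": "c1"}, {"id": "t2", "cursor": "c2"}],
            "pageInfo": {"hasNextPage": True},
        }
    return {
        "traces": [{"id": "t3", "cursor": "c3"}],
        "pageInfo": {"hasNextPage": False},
    }


base_args = {
    "startTime": "2024-01-01T11:45:00+00:00",
    "endTime": "2024-01-01T12:00:00+00:00",
    "limit": 2,
}
trace_ids = [item["id"] for item in iter_traces(fake_call_tool, base_args)]
print(trace_ids)
```

The `max_pages` cap is a defensive choice so a misbehaving cursor cannot loop forever; adjust or remove it to suit the size of the window being searched.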