diff --git a/skills/netra-mcp-usage/SKILL.md b/skills/netra-mcp-usage/SKILL.md
index e31c2cd..464ca02 100644
--- a/skills/netra-mcp-usage/SKILL.md
+++ b/skills/netra-mcp-usage/SKILL.md
@@ -12,134 +12,24 @@ Use this skill when you need to inspect traces through Netra MCP tools and want
 - Query traces in a time range with filtering, sorting, and cursor pagination.
 - Retrieve full span trees for a selected trace id.
 - Guide incident/debug workflows from trace search to root-cause analysis.
+- Run MCP-driven evaluation workflows for single-turn and multi-turn datasets.
 
 ## Primary MCP Tools
 - `netra_query_traces`
 - `netra_get_trace_by_id`
+- `netra_list_provider_configs`
+- `netra_create_dataset`
+- `netra_add_dataset_test_case`
+- `netra_list_evaluators`
+- `netra_add_evaluator`
+- `netra_get_test_run_details`
 
-## Workflow
-1. Start with a narrow time window and low limit.
-2. Add the minimum filters needed to isolate relevant traces.
-3. Sort for your objective (recent, slowest, most expensive, errors).
-4. Page through results using returned cursor values.
-5. Fetch full spans for one trace id.
-6. Inspect hierarchy, status, latency, and attributes.
+## Use-Case-Specific References
 
-## query_traces Input Schema
-Required:
-- `startTime` (string, ISO 8601)
-- `endTime` (string, ISO 8601)
+- Querying traces, filters, sort options, pagination, and incident triage: `references/traces.md`
+- Single-turn evaluation flow (providers -> datasets -> test cases -> evaluators -> test run): `references/evaluations-single-turn.md`
+- Multi-turn simulation flow (scenario-driven test cases and evaluator config handling): `references/simulation-multi-turn.md`
 
-Optional:
-- `limit` (number, 1-100, default 20)
-- `cursor` (string)
-- `direction` (`up` | `down`, default `down`)
-- `sortField`
-- `sortOrder` (`asc` | `desc`, default `desc`)
-- `filters` (array of filter objects)
+## Feedback
 
-### sortField Values
-- `latency_ms`
-- `name`
-- `total_cost`
-- `has_pii`
-- `has_violation`
-- `start_time`
-- `environment`
-- `service`
-- `has_error`
-- `total_tokens`
-
-### Filter Object Schema
-Each filter object must include:
-- `field`
-- `value`
-- `type`
-- `operator`
-
-Optional in filter object:
-- `key` (for nested/object-style filtering)
-
-#### field Values
-- `name`
-- `tenant_id`
-- `user_id`
-- `session_id`
-- `environment`
-- `service`
-- `metadata`
-- `projectIds`
-- `project_id`
-- `parent_span_id`
-- `has_pii`
-- `has_violation`
-- `has_error`
-- `models`
-- `total_cost`
-- `latency`
-
-#### type Values
-- `string`
-- `number`
-- `boolean`
-- `arrayOptions`
-- `attributeKey`
-- `object`
-
-#### operator Values
-- `equals`
-- `greater_than`
-- `less_than`
-- `greater_equal_to`
-- `less_equal_to`
-- `contains`
-- `not_equals`
-- `any_of`
-- `none_of`
-- `not_contains`
-- `starts_with`
-- `ends_with`
-- `is_null`
-- `is_not_null`
-
-## Filter Patterns
-- Error traces only:
-  - `field: has_error`, `type: boolean`, `operator: equals`, `value: true`
-- Specific session:
-  - `field: session_id`, `type: string`, `operator: equals`, `value: <session-id>`
-- High latency:
-  - `field: latency`, `type: number`, `operator: greater_than`, `value: 3000`
-- Service scoped:
-  - `field: service`, `type: string`, `operator: equals`, `value: <service-name>`
-- Metadata key/value:
-  - `field: metadata`, `type: object`, `key: <metadata-key>`, `operator: equals`, `value: <value>`
-
-## Pagination Pattern
-1. Run `query_traces` without `cursor`.
-2. Capture a `cursor` from returned trace items.
-3. Re-run `query_traces` with the cursor and `direction: down`.
-4. Continue while `pageInfo.hasNextPage` is true.
-
-## get_trace_by_id Input Schema
-Required:
-- `traceId` (string)
-
-Behavior:
-- Returns complete span array for the trace id.
-- Use this after `query_traces` to inspect one trace deeply.
-- Invalid ids return a not-found style error.
-
-## Incident Triage Recipe
-1. Query for failing traces (`has_error=true`) in the incident window.
-2. Sort by `latency_ms` desc to identify worst requests.
-3. Pull one trace via `get_trace_by_id`.
-4. Validate root span presence and parent-child span flow.
-5. Check slow spans and tool/model metadata.
-6. Compare with a nearby successful trace if needed.
-
-## Practical Tips
-- Keep initial windows short (5-30 minutes) for faster narrowing.
-- Use one or two filters first, then add more only if needed.
-- Prefer exact-match IDs (`session_id`, `user_id`, `tenant_id`) when available.
-- Use `sortField=total_cost` to find expensive traces quickly.
-- If no results: widen time range first, then relax filters.
+If the user is unhappy with the results, ask them to open an issue at https://github.com/KeyValueSoftwareSystems/netra-skills/issues/new.
diff --git a/skills/netra-mcp-usage/references/evaluations-single-turn.md b/skills/netra-mcp-usage/references/evaluations-single-turn.md
new file mode 100644
index 0000000..88886da
--- /dev/null
+++ b/skills/netra-mcp-usage/references/evaluations-single-turn.md
@@ -0,0 +1,168 @@
+---
+name: netra-mcp-evaluations-single-turn
+description: End-to-end single-turn evaluation workflow in Netra MCP from provider selection to test run details.
+---
+
+# Netra MCP Evaluations (Single-Turn)
+
+Use this reference for a schema-correct single-turn evaluation flow using Netra MCP tools.
+
+## End-To-End Flow
+
+1. List provider configurations.
+2. Create a single-turn dataset.
+3. Add single-turn test cases.
+4. List evaluators.
+5. If project evaluators are missing (or you only see library evaluators), create evaluators in the Netra dashboard first.
+6. Attach evaluators to the dataset or to individual test cases.
+7. Execute a test run.
+8. Fetch run results using test run id.
+
+## Step 1: List Provider Configurations
+
+Tool: `netra_list_provider_configs`
+
+Purpose:
+- Find valid `provider_id` and `model` values for dataset items.
+- Confirm the provider/model is available for your use case.
+
+Example:
+
+```json
+{}
+```
+
+## Step 2: Create A Single-Turn Dataset
+
+Tool: `netra_create_dataset`
+
+Required choices:
+- `turnType`: `single`
+- `datasetType`: usually `text`
+
+Example:
+
+```json
+{
+  "name": "support-quality-single-turn",
+  "turnType": "single",
+  "datasetType": "text",
+  "tags": ["support", "regression"]
+}
+```
+
+## Step 3: Add Single-Turn Test Cases
+
+Tool: `netra_add_dataset_test_case`
+
+Important:
+- For single-turn datasets, `input` is required.
+- `providerConfig` is required in practice. Always pass `provider_id` and `model` from Step 1.
+
+Example:
+
+```json
+{
+  "datasetId": "<dataset-id>",
+  "input": "User asks for a refund after 45 days",
+  "expectedOutput": "Assistant explains policy and offers next best options",
+  "contextData": {
+    "policy": "30-day refund window",
+    "region": "US"
+  },
+  "providerConfig": {
+    "provider_id": "<provider-id>",
+    "model": "<model-name>"
+  },
+  "tags": ["refund"]
+}
+```
+
+## Step 4: List Evaluators
+
+Tool: `netra_list_evaluators`
+
+Purpose:
+- Discover project evaluators available for attachment.
+- Inspect available library evaluators in `libraryData`.
+
+Example:
+
+```json
+{
+  "turnType": "single",
+  "page": 1,
+  "limit": 20
+}
+```
+
+Decision rule:
+- If project evaluator results are empty and only `libraryData` has entries, stop and instruct the user to create evaluators in the Netra dashboard before continuing.
+
+Suggested instruction to user:
+- "No project evaluators are available yet. Please create/select evaluators in the Netra dashboard for this project, then rerun `netra_list_evaluators`."
+
+## Step 5: Attach Evaluators
+
+Tool: `netra_add_evaluator`
+
+Options:
+- Attach at dataset level (`targetType: dataset`).
+- Attach at test-case level (`targetType: test_case`, requires `datasetItemId`).
+
+Example (dataset-level):
+
+```json
+{
+  "targetType": "dataset",
+  "datasetId": "<dataset-id>",
+  "evaluatorId": "<evaluator-id>",
+  "isActive": true
+}
+```
+
+Example (test-case-level):
+
+```json
+{
+  "targetType": "test_case",
+  "datasetId": "<dataset-id>",
+  "datasetItemId": "<dataset-item-id>",
+  "evaluatorId": "<evaluator-id>"
+}
+```
+
+## Step 6: Execute Test Run
+
+Use your workspace test-run execution tool (commonly named `netra_execute_test_run`) to run the dataset against the target system.
+
+Expected output:
+- A `testRunId` used for retrieval and analysis.
+
+## Step 7: Get Test Run Details
+
+Tool: `netra_get_test_run_details`
+
+Required:
+- `testRunId`
+
+Optional:
+- `page`, `limit`, `filters`
+
+Example:
+
+```json
+{
+  "testRunId": "<test-run-id>",
+  "page": 1,
+  "limit": 20
+}
+```
+
+## Practical Checks
+
+1. Always resolve `provider_id` and `model` before adding test cases.
+2. For single-turn cases, verify `input` is present for every item.
+3. Treat missing project evaluators as a setup blocker, not a runtime failure.
+4. Attach evaluators before running test executions to avoid incomplete scoring.
+5. Store and reuse `testRunId` for iterative detail queries.
diff --git a/skills/netra-mcp-usage/references/simulation-multi-turn.md b/skills/netra-mcp-usage/references/simulation-multi-turn.md
new file mode 100644
index 0000000..522b90e
--- /dev/null
+++ b/skills/netra-mcp-usage/references/simulation-multi-turn.md
@@ -0,0 +1,180 @@
+---
+name: netra-mcp-simulation-multi-turn
+description: End-to-end multi-turn simulation workflow in Netra MCP including scenario authoring guidelines and evaluatorConfig usage.
+---
+
+# Netra MCP Simulation (Multi-Turn)
+
+Use this reference for simulation-style multi-turn evaluations where scenario quality and evaluator configuration drive outcome quality.
+
+## End-To-End Flow
+
+1. List provider configurations.
+2. Create a multi-turn dataset.
+3. Add multi-turn test cases with high-quality scenario metadata.
+4. List evaluators.
+5. If project evaluators are missing (or you only see library evaluators), create evaluators in the Netra dashboard first.
+6. Attach evaluators and include `evaluatorConfig` for multi-turn evaluators where required.
+7. Execute a test run.
+8. Fetch run results using test run id.
+
+## Step 1: List Provider Configurations
+
+Tool: `netra_list_provider_configs`
+
+Purpose:
+- Select valid `provider_id` and `model` for simulation test cases.
+
+Example:
+
+```json
+{}
+```
+
+## Step 2: Create A Multi-Turn Dataset
+
+Tool: `netra_create_dataset`
+
+Required choices:
+- `turnType`: `multi`
+
+Example:
+
+```json
+{
+  "name": "support-agent-simulation",
+  "turnType": "multi",
+  "datasetType": "text",
+  "tags": ["simulation", "support"]
+}
+```
+
+## Step 3: Add Multi-Turn Test Cases
+
+Tool: `netra_add_dataset_test_case`
+
+Important:
+- For multi-turn datasets, `scenario` is required.
+- `providerConfig` is required in practice. Always pass `provider_id` and `model`.
+
+Example:
+
+```json
+{
+  "datasetId": "<dataset-id>",
+  "scenarioName": "Refund Delay",
+  "scenario": "Agent resolves a delayed refund by validating policy and giving clear next actions.",
+  "persona": "Frustrated",
+  "behaviourInstructions": "User repeatedly asks for escalation, gives partial details first, and challenges policy responses.",
+  "maxTurns": 8,
+  "providerConfig": {
+    "provider_id": "<provider-id>",
+    "model": "<model-name>"
+  },
+  "tags": ["refund", "escalation"]
+}
+```
+
+## Multi-Turn Scenario Guidelines
+
+Follow these conventions to improve simulation consistency:
+
+1. `scenarioName` should be one or two words at most.
+2. `scenario` should be written from the perspective of what the agent should do.
+3. `behaviourInstructions` should describe what the simulated user should do.
+4. `persona` should be one word.
+
+Examples:
+- Good `scenarioName`: `Refund Delay`, `Billing Error`
+- Good `scenario`: `Agent confirms account details, explains policy constraints, and offers compliant recovery options.`
+- Good `behaviourInstructions`: `User starts polite, becomes impatient after unclear answers, and asks for manager escalation.`
+- Good `persona`: `Impatient`
+
+## Step 4: List Evaluators
+
+Tool: `netra_list_evaluators`
+
+Example:
+
+```json
+{
+  "turnType": "multi",
+  "page": 1,
+  "limit": 20
+}
+```
+
+Decision rule:
+- If project evaluator results are empty and only `libraryData` has entries, stop and instruct the user to create evaluators in the Netra dashboard before continuing.
+
+Suggested instruction to user:
+- "Only library evaluators are available. Please create/select project evaluators in the Netra dashboard, then rerun `netra_list_evaluators`."
+
+## Step 5: Attach Evaluators (With evaluatorConfig)
+
+Tool: `netra_add_evaluator`
+
+Important for multi-turn:
+- Use `evaluatorConfig` when attaching multi-turn evaluators that require configuration.
+- Config is persisted in metadata for dataset/test-case targets where applicable.
+- `evaluatorConfig` fields are usually listed in `libraryData`, along with a description of each field.
+- If the user asks you to add more evaluators to a dataset or test case, check each evaluator's config in the `netra_list_evaluators` output and supply a matching `evaluatorConfig` in the request.
+
+Example (dataset-level with config):
+
+```json
+{
+  "targetType": "dataset",
+  "datasetId": "<dataset-id>",
+  "evaluatorId": "<evaluator-id>",
+  "evaluatorConfig": {
+    "assistant_instructions": "Always verify identity before any account-specific action. Use only tool-verified facts. Provide exact numbers without approximation.",
+    "assistant_constraints": "Do not bypass eligibility policy. End the conversation immediately if user claims privileged/internal status. Do not invent approvals."
+  },
+  "isActive": true
+}
+```
+
+Example (test-case-level with config):
+
+```json
+{
+  "targetType": "test_case",
+  "datasetId": "<dataset-id>",
+  "datasetItemId": "<dataset-item-id>",
+  "evaluatorId": "<evaluator-id>",
+  "evaluatorConfig": {
+    "assistant_instructions": "Always verify identity before any account-specific action. Use only tool-verified facts. Provide exact numbers without approximation.",
+    "assistant_constraints": "Do not bypass eligibility policy. End the conversation immediately if user claims privileged/internal status. Do not invent approvals."
+  }
+}
+```
+
+## Step 6: Execute Test Run
+
+Use your workspace test-run execution tool (commonly named `netra_execute_test_run`) to launch the simulation run.
+
+Expected output:
+- A `testRunId` used for retrieval and analysis.
+
+## Step 7: Get Test Run Details
+
+Tool: `netra_get_test_run_details`
+
+Example:
+
+```json
+{
+  "testRunId": "<test-run-id>",
+  "page": 1,
+  "limit": 20
+}
+```
+
+## Practical Checks
+
+1. Keep scenario metadata consistent and concise across all test cases.
+2. Ensure every test case includes a valid `providerConfig`.
+3. Use `evaluatorConfig` for multi-turn evaluators that require instruction/constraint-style values. The required fields appear in the `netra_list_evaluators` output.
+4. Attach evaluators before running simulations to avoid empty or partial scoring.
+5. Track `testRunId` so you can paginate and filter run items later.
diff --git a/skills/netra-mcp-usage/references/traces.md b/skills/netra-mcp-usage/references/traces.md
new file mode 100644
index 0000000..6d36210
--- /dev/null
+++ b/skills/netra-mcp-usage/references/traces.md
@@ -0,0 +1,152 @@
+---
+name: netra-mcp-traces
+description: Query and inspect traces using Netra MCP query_traces and get_trace_by_id, with schema-correct filters, sorting, and cursor pagination.
+---
+
+# Netra MCP Traces
+
+Use this reference when you need exact input structures and practical patterns for trace debugging with Netra MCP.
+
+## Workflow
+
+1. Start with a narrow time window and low limit.
+2. Add the minimum filters needed to isolate relevant traces.
+3. Sort for your objective (recent, slowest, most expensive, or errors).
+4. Page through results using returned cursor values.
+5. Fetch full spans for one trace id.
+6. Inspect hierarchy, status, latency, and attributes.
+
+## query_traces Input Schema
+
+Required:
+- `startTime` (string, ISO 8601)
+- `endTime` (string, ISO 8601)
+
+Optional:
+- `limit` (number, 1-100, default 20)
+- `cursor` (string)
+- `direction` (`up` | `down`, default `down`)
+- `sortField`
+- `sortOrder` (`asc` | `desc`, default `desc`)
+- `filters` (array of filter objects)
+
+### sortField Values
+
+- `latency_ms`
+- `name`
+- `total_cost`
+- `has_pii`
+- `has_violation`
+- `start_time`
+- `environment`
+- `service`
+- `has_error`
+- `total_tokens`
+
+### Filter Object Schema
+
+Each filter object must include:
+- `field`
+- `value`
+- `type`
+- `operator`
+
+Optional in filter object:
+- `key` (for nested/object-style filtering)
+
+#### field Values
+
+- `name`
+- `tenant_id`
+- `user_id`
+- `session_id`
+- `environment`
+- `service`
+- `metadata`
+- `projectIds`
+- `project_id`
+- `parent_span_id`
+- `has_pii`
+- `has_violation`
+- `has_error`
+- `models`
+- `total_cost`
+- `latency`
+
+#### type Values
+
+- `string`
+- `number`
+- `boolean`
+- `arrayOptions`
+- `attributeKey`
+- `object`
+
+#### operator Values
+
+- `equals`
+- `greater_than`
+- `less_than`
+- `greater_equal_to`
+- `less_equal_to`
+- `contains`
+- `not_equals`
+- `any_of`
+- `none_of`
+- `not_contains`
+- `starts_with`
+- `ends_with`
+- `is_null`
+- `is_not_null`
+
+## Filter Patterns
+
+- Error traces only:
+  - `field: has_error`, `type: boolean`, `operator: equals`, `value: true`
+- Specific session:
+  - `field: session_id`, `type: string`, `operator: equals`, `value: <session-id>`
+- High latency:
+  - `field: latency`, `type: number`, `operator: greater_than`, `value: 3000`
+- Service scoped:
+  - `field: service`, `type: string`, `operator: equals`, `value: <service-name>`
+- Metadata key/value:
+  - `field: metadata`, `type: object`, `key: <metadata-key>`, `operator: equals`, `value: <value>`
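+
+As a rough sketch, several of the patterns above could be combined into a single `query_traces` request like the one below. The timestamps and the `3000` threshold are illustrative placeholders, and passing `value` as a native JSON number or string (rather than always a string) is an assumption based on the `type` values listed earlier.
+
+```json
+{
+  "startTime": "2025-01-15T10:00:00Z",
+  "endTime": "2025-01-15T10:30:00Z",
+  "limit": 20,
+  "sortField": "latency_ms",
+  "sortOrder": "desc",
+  "filters": [
+    { "field": "service", "type": "string", "operator": "equals", "value": "<service-name>" },
+    { "field": "latency", "type": "number", "operator": "greater_than", "value": 3000 },
+    { "field": "metadata", "type": "object", "key": "<metadata-key>", "operator": "equals", "value": "<value>" }
+  ]
+}
+```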
+
+## Pagination Pattern
+
+1. Run `query_traces` without `cursor`.
+2. Capture a `cursor` from returned trace items.
+3. Re-run `query_traces` with the cursor and `direction: down`.
+4. Continue while `pageInfo.hasNextPage` is true.
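+
+A follow-up page request might look like the sketch below. The `<cursor-from-previous-page>` placeholder stands for whatever cursor value the previous response returned, and the time window is illustrative. Keeping the time range and limit identical between pages is an assumption, not documented behavior.
+
+```json
+{
+  "startTime": "2025-01-15T10:00:00Z",
+  "endTime": "2025-01-15T10:30:00Z",
+  "limit": 20,
+  "cursor": "<cursor-from-previous-page>",
+  "direction": "down"
+}
+```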
+
+## get_trace_by_id Input Schema
+
+Required:
+- `traceId` (string)
+
+Behavior:
+- Returns complete span array for the trace id.
+- Use this after `query_traces` to inspect one trace deeply.
+- Invalid ids return a not-found style error.
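+
+A minimal request needs only the id. The placeholder below stands for the id of a trace returned by an earlier `query_traces` call:
+
+```json
+{
+  "traceId": "<trace-id>"
+}
+```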
+
+## Incident Triage Recipe
+
+1. Query for failing traces (`has_error=true`) in the incident window.
+2. Sort by `latency_ms` desc to identify worst requests.
+3. Pull one trace via `get_trace_by_id`.
+4. Validate root span presence and parent-child span flow.
+5. Check slow spans and tool/model metadata.
+6. Compare with a nearby successful trace if needed.
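+
+Steps 1 and 2 could be expressed as one `query_traces` request, sketched below with an illustrative incident window. Passing `value` as a JSON boolean is an assumption based on the error-trace filter pattern listed earlier.
+
+```json
+{
+  "startTime": "2025-01-15T14:00:00Z",
+  "endTime": "2025-01-15T14:15:00Z",
+  "limit": 10,
+  "sortField": "latency_ms",
+  "sortOrder": "desc",
+  "filters": [
+    { "field": "has_error", "type": "boolean", "operator": "equals", "value": true }
+  ]
+}
+```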
+
+## Practical Tips
+
+- Keep initial windows short (5-30 minutes) for faster narrowing.
+- Use one or two filters first, then add more only if needed.
+- Prefer exact-match IDs (`session_id`, `user_id`, `tenant_id`) when available.
+- Use `sortField=total_cost` to find expensive traces quickly.
+- If no results: widen time range first, then relax filters.
+
+## References
+
+- https://docs.getnetra.ai/Observability/Traces
+- https://docs.getnetra.ai/netra-mcp