
feat(agent): MVE Experiment Designer#976

Open
mattdot wants to merge 17 commits into microsoft:main from mattdot:main

Conversation

@mattdot
Member

@mattdot mattdot commented Mar 11, 2026

Pull Request

Description

Adds a new conversational coaching agent that guides users through designing a Minimum Viable Experiment (MVE). The agent follows a structured, phase-based process — from problem discovery and hypothesis formation through viability vetting to a complete experiment plan. It helps users translate unknowns and assumptions into crisp, testable hypotheses, evaluates experiment feasibility, and produces actionable MVE plans with session tracking via .copilot-tracking. Includes the agent definition (experiment-designer.agent.md) and companion instructions (experiment-designer.instructions.md) covering MVE domain knowledge, vetting criteria, and experiment type reference.

Related Issue(s)

Closes #973

Type of Change

Select all that apply:

Code & Documentation:

  • Bug fix (non-breaking change fixing an issue)
  • New feature (non-breaking change adding functionality)
  • Breaking change (fix or feature causing existing functionality to change)
  • Documentation update

Infrastructure & Configuration:

  • GitHub Actions workflow
  • Linting configuration (markdown, PowerShell, etc.)
  • Security configuration
  • DevContainer configuration
  • Dependency update

AI Artifacts:

  • Reviewed contribution with prompt-builder agent and addressed all feedback
  • Copilot instructions (.github/instructions/*.instructions.md)
  • Copilot prompt (.github/prompts/*.prompt.md)
  • Copilot agent (.github/agents/*.agent.md)
  • Copilot skill (.github/skills/*/SKILL.md)

Note for AI Artifact Contributors:

  • Agents: Research, indexing/referencing other projects (using standard VS Code GitHub Copilot/MCP tools), planning, and general implementation agents likely already exist. Review .github/agents/ before creating new ones.
  • Skills: Must include both bash and PowerShell scripts. See Skills.
  • Model Versions: Only contributions targeting the latest Anthropic and OpenAI models will be accepted. Older model versions (e.g., GPT-3.5, Claude 3) will be rejected.
  • See Agents Not Accepted and Model Version Requirements.

Other:

  • Script/automation (.ps1, .sh, .py)
  • Other (please describe):

Sample Prompts (for AI Artifact Contributions)

User Request:

  • "I have an idea for [feature/product/approach] but I'm not sure if it will work. Help me design an experiment to validate it before we commit to building it."
  • "We need to test whether [assumption] is true before starting development"
  • "Help me design an MVE for [project/feature]"
  • "Our customer wants us to build X, but there are unknowns around data feasibility / architecture / LLM capability — can we experiment first?"
  • "I want to validate my hypothesis about [topic] with a structured experiment"

Execution Flow:

Phase 1 — Problem & Context Discovery: Agent asks probing questions about the problem statement, customer context, business case, unknowns, and constraints. Creates a tracking directory at .copilot-tracking/mve/{date}/{experiment-name}/ and writes context.md.
Phase 2 — Hypothesis Formation: Agent guides user to translate unknowns into testable hypotheses using the format "We believe [assumption]. We will test this by [method]. We will know we are right/wrong when [measurable outcome]." Prioritizes hypotheses by risk and impact. Writes hypotheses.md.
Phase 3 — MVE Vetting & Red Flag Check: Agent applies four vetting criteria (business sense, crisp problem statement, Responsible AI, clear next steps) and checks against nine red flag patterns (demos, skipping ahead, solved problems, mini-MVP, etc.). Writes vetting.md. If fundamental problems found, returns to Phase 1 or 2.
Phase 4 — Experiment Design: Agent helps choose experiment type, define technical approach, set measurable success/failure criteria per hypothesis, scope timeline to weeks, and plan post-experiment evaluation. Writes experiment-design.md.
Phase 5 — MVE Plan Output: Agent consolidates all phase outputs into a single mve-plan.md document for stakeholder review. Iterates based on user feedback, returning to earlier phases if needed.
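For illustration, a single entry in hypotheses.md following the three-part format might look like this (the topic, priority, and numbers below are invented examples, not from the agent's templates):

```markdown
## H1: Retrieval grounding (Priority: High)

We believe our existing product documentation contains enough detail to answer
tier-1 support questions. We will test this by running retrieval over a
50-question sample drawn from real tickets. We will know we are right/wrong
when at least 80% of answers are rated "grounded and correct" by two reviewers.
```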

Output Artifacts:

context.md — Problem statement, customer context, business justification
hypotheses.md — Prioritized testable hypotheses with assumption/method/outcome
vetting.md — Vetting criteria results and red flag assessment
experiment-design.md — Approach, scope, timeline, resources, success criteria
mve-plan.md — Consolidated plan document for stakeholder review

```markdown
<!-- markdownlint-disable-file -->
# MVE Context: {experiment-name}

## Problem Statement
{User's refined problem statement}

## Customer & Stakeholder Context
{Customer details, priority level, sponsors}

## Known Constraints
{IP, data access, timeline constraints}

## Assumptions & Unknowns
- Unknown 1: ...
- Assumption 1: ...

## Business Case
{Why this experiment matters, what decision it informs}
```

Success Indicators:

  • The .copilot-tracking/mve/{date}/{experiment-name}/ directory contains all five markdown artifacts (context.md, hypotheses.md, vetting.md, experiment-design.md, mve-plan.md)
  • Each hypothesis follows the three-part format: assumption, test method, measurable outcome
  • Hypotheses are prioritized by risk and impact with clear rationale
  • Vetting results explicitly address all four criteria and flag any red flags encountered
  • Success and failure criteria are defined per hypothesis with quantitative thresholds
  • The experiment is scoped to weeks (not months) with explicit out-of-scope boundaries
  • mve-plan.md includes next steps for both validated and invalidated outcomes
  • The agent challenged vague problem statements or untestable hypotheses rather than accepting them uncritically
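As a sketch of the "quantitative thresholds" indicator, per-hypothesis criteria in experiment-design.md could be captured in a small table like this (the thresholds shown are illustrative placeholders):

```markdown
| Hypothesis | Success criterion                        | Failure criterion    |
| ---------- | ---------------------------------------- | -------------------- |
| H1         | >= 80% of sampled answers rated correct  | < 60% rated correct  |
| H2         | p95 latency under 2s on the test sample  | p95 latency over 5s  |
```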


Testing

I've used this agent on a few MVE opportunities to help refine our hypotheses and plan our MVEs.

Checklist

Required Checks

  • [x] Documentation is updated (if applicable)
  • [x] Files follow existing naming conventions
  • [x] Changes are backwards compatible (if applicable)
  • [N/A] Tests added for new functionality (if applicable)

AI Artifact Contributions

  • Used /prompt-analyze to review contribution
  • [x] Addressed all feedback from prompt-builder review
  • [x] Verified contribution follows common standards and type-specific requirements

Required Automated Checks

The following validation commands must pass before merging:

  • Markdown linting: npm run lint:md
  • Spell checking: npm run spell-check
  • Frontmatter validation: npm run lint:frontmatter
  • Skill structure validation: npm run validate:skills
  • Link validation: npm run lint:md-links
  • PowerShell analysis: npm run lint:ps
  • Plugin freshness: npm run plugin:generate

(I can't run the dev container locally, so I'm relying on the CI/CD pipeline to run these checks :) )

Security Considerations

  • [x] This PR does not contain any sensitive or NDA information
  • [N/A] Any new dependencies have been reviewed for security issues
  • [N/A] Security-related scripts follow the principle of least privilege

Additional Notes

mattdot added 4 commits March 10, 2026 16:13
feat(instructions): introduce MVE coaching conventions for Experiment Designer

chore(collections): include Experiment Designer in experimental collections

chore(collections): update experimental collection YAML to reference new agent and instructions

🔧 - Generated by Copilot
@mattdot mattdot requested a review from a team as a code owner March 11, 2026 20:11
@codecov-commenter

codecov-commenter commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.94%. Comparing base (27fbd33) to head (bf61eea).

Additional details and impacted files


```diff
@@            Coverage Diff             @@
##             main     #976      +/-   ##
==========================================
- Coverage   88.04%   86.94%   -1.10%     
==========================================
  Files          45       31      -14     
  Lines        7885     5408    -2477     
==========================================
- Hits         6942     4702    -2240     
+ Misses        943      706     -237     
```
| Flag   | Coverage Δ      |
| ------ | --------------- |
| pester | 86.94% <ø> (ø)  |
| pytest | ?               |

Flags with carried forward coverage won't be shown.
see 14 files with indirect coverage changes


@WilliamBerryiii
Member

@mattdot ... can you look at the hifi and lofi prototype builders in design thinking and see if this covers your needs first?

@mattdot
Member Author

mattdot commented Mar 12, 2026

> @mattdot ... can you look at the hifi and lofi prototype builders in design thinking and see if this covers your needs first?

@WilliamBerryiii not quite. It kind of proposes testing assumptions, but it doesn't really do it with the scientific rigor I'd expect from a true MVE. It feels more like it's proposing a vibe check of the assumptions rather than an experiment result that we have rock solid confidence in.

@WilliamBerryiii
Copy link
Member

WilliamBerryiii commented Mar 13, 2026

> @mattdot ... can you look at the hifi and lofi prototype builders in design thinking and see if this covers your needs first?
>
> @WilliamBerryiii not quite. It kind of proposes testing assumptions, but it doesn't really do it with the scientific rigor I'd expect from a true MVE. It feels more like it's proposing a vibe check of the assumptions rather than an experiment result that we have rock solid confidence in.

One last set of questions (I should have asked earlier but had to think about it) ... where do you think this goes from a collections perspective after it's run in the experimental phase? More Coding Focused? Data Science too?
https://microsoft.github.io/hve-core/docs/getting-started/install#collection-packages

Should this agent's artifact (the experiment.md) be handed off to the PRD-builder and/or Task Researcher for the implementation phase? You've got more experience in this space, are the experiments you're running more of a "rough PRD" scale or more of a "if we had enough tokens, we could probably get this through a task researcher run" 😂 ... This really comes down to do you want the experiment to run PRD -> *-Backlog-Manager for entry into the backlog or go right to coding (or both).

@WilliamBerryiii WilliamBerryiii added this to the v3.2.0 milestone Mar 13, 2026
@mattdot
Member Author

mattdot commented Mar 13, 2026

> @mattdot ... can you look at the hifi and lofi prototype builders in design thinking and see if this covers your needs first?
>
> @WilliamBerryiii not quite. It kind of proposes testing assumptions, but it doesn't really do it with the scientific rigor I'd expect from a true MVE. It feels more like it's proposing a vibe check of the assumptions rather than an experiment result that we have rock solid confidence in.
>
> One last set of questions (I should have asked earlier but had to think about it) ... where do you think this goes from a collections perspective after it's run in the experimental phase? More Coding Focused? Data Science too? https://microsoft.github.io/hve-core/docs/getting-started/install#collection-packages
>
> Should this agent's artifact (the experiment.md) be handed off to the PRD-builder and/or Task Researcher for the implementation phase? You've got more experience in this space, are the experiments you're running more of a "rough PRD" scale or more of a "if we had enough tokens, we could probably get this through a task researcher run" 😂 ... This really comes down to do you want the experiment to run PRD -> *-Backlog-Manager for entry into the backlog or go right to coding (or both).

The output of this is really a plan and hypothesis to go do an experiment on. Once you actually do the experiment, the results of the experiment would be used much like other research could be used, as inputs to PRD or ADR.

For the collections, I could see this in the Data Science and Project Planning collections.

@WilliamBerryiii
Member

@mattdot - should I update this to exit with a handoff document for the ADO and GH backlog managers? Do you anticipate that the experiment generates work items or do we go right to task researcher/planner/implementor/reviewer for workflow execution?

@mattdot
Member Author

mattdot commented Mar 16, 2026

> @mattdot - should I update this to exit with a handoff document for the ADO and GH backlog managers? Do you anticipate that the experiment generates work items or do we go right to task researcher/planner/implementor/reviewer for workflow execution?

I kind of feel like backlog might be the way to go, since you could come out with several hypotheses to test and it would be good to track/work them independently.

WilliamBerryiii and others added 4 commits March 16, 2026 20:31
- add optional Phase 6 generating backlog-brief.md from mve-plan.md
- add backlog-brief.md template to session artifacts and instructions
- add usage guide and end-to-end example for Phase 6 workflow
- enable experiment-to-backlog transition via bridge document

🔬 - Generated by Copilot
@WilliamBerryiii
Member

WilliamBerryiii commented Mar 18, 2026

Changes Pushed: Backlog Bridge Phase

Hey @mattdot — I pushed a commit to your branch that adds Phase 6 (Backlog Bridge) to the Experiment Designer. Here's a summary of what changed and why. Let me know if you're ok with these changes and I'll get the merge going.

What's New

Phase 6: Backlog Bridge — an optional phase that converts completed MVE outputs into a backlog-brief.md document formatted for consumption by ADO or GitHub backlog manager agents via their Discovery Path B.

  • Only triggers when the user explicitly asks to create backlog items from the experiment.
  • Maps each hypothesis to a REQ-NNN requirement with acceptance criteria derived from success criteria.
  • Preserves priority rankings, dependencies, resource requirements, and out-of-scope items.
  • Provides handoff guidance for both ADO and GitHub backlog managers.
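To make the hypothesis-to-requirement mapping concrete, a single hypothesis might translate into a backlog-brief.md entry roughly like this (the requirement content below is an invented illustration, not copied from the actual template):

```markdown
## REQ-001: Validate retrieval grounding (from H1)

- Priority: High (carried over from hypothesis ranking)
- Acceptance criteria (derived from H1 success criteria):
  - [ ] >= 80% of sampled answers rated "grounded and correct" by two reviewers
- Dependencies: access to the support-ticket question sample
- Out of scope: production integration of the retrieval pipeline
```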

Files Changed (2 files, +148 / -24)

.github/agents/experimental/experiment-designer.agent.md

  • Added Phase 6 (Backlog Bridge) section with generation steps and completion guidance.
  • Streamlined Phase 3 (Vetting Criteria and Red Flags) to reference detailed descriptions in the instructions file instead of inlining full text — reduces duplication and keeps the agent file focused on coaching flow.
  • Phase 4 success criteria now says "Refine the success criteria established in Phase 2" to reinforce continuity between phases.
  • Experiment type list replaced with a reference to the instructions file's canonical list.

.github/instructions/experimental/experiment-designer.instructions.md

  • Added backlog-brief.md to the session artifact directory tree and descriptions.
  • Added full Backlog Brief Template with field placeholders and structure.
  • Added Template Field Guidance section explaining how to populate each template section.
  • Added Backlog Bridge Usage Guide covering when to use, inputs/outputs, and handoff to backlog managers.
  • Added Backlog Bridge Example with end-to-end walkthrough from experiment completion to backlog item creation.
  • Suggested Labels format updated to support multiple experiment types as separate labels.

Prompt Builder Review

These changes went through a Prompt Builder evaluation pass (test + evaluate + fix cycle). Key findings addressed:

  • Duplication reduction — Moved detailed vetting criteria and red flag descriptions from the agent file to the instructions file, replacing them with concise labels and cross-references.
  • Phase continuity — Phase 4 success criteria explicitly reference Phase 2 outputs.
  • Label format — Multi-type experiments produce separate labels rather than compound strings.
  • Template hygiene — markdownlint-disable-file included in the template.

All linting (npm run lint:all) passes clean. Prompt Tester confirmed all 5 requirements pass. Prompt Evaluator confirmed all targeted fixes resolved with no new issues.

Commit

feat(agents): add backlog bridge phase to experiment designer

- fix ADO backlog manager intent classification to route structured briefs to Discovery instead of PRD Planning
- add disambiguation heuristics separating PRDs from backlog-brief.md inputs
- add backlog brief keyword signal to GitHub backlog manager Discovery row
- add Backlog Brief document type to GitHub discovery parsing guidelines

🔗 - Generated by Copilot
@WilliamBerryiii
Member

Discovery Path B Alignment (bf61eeab)

This commit ensures both ADO and GitHub backlog managers correctly route backlog-brief.md artifacts to Discovery Path B (artifact-driven discovery) instead of misclassifying them.

Problem

  • ADO Backlog Manager routed all document-bearing requests to PRD Planning (@AzDO PRD to WIT), including structured briefs that should go to Discovery Path B.
  • GitHub Backlog Manager lacked "backlog brief" in its Discovery keyword signals.
  • GitHub Discovery instructions had no Document Parsing Guidelines entry for Backlog Brief documents.

Changes

| File | Change |
| ---- | ------ |
| .github/agents/ado/ado-backlog-manager.agent.md | Added "backlog brief" keyword and "structured requirement briefs" indicator to Discovery row; refined disambiguation heuristics to separate PRDs (→ PRD Planning) from structured briefs (→ Discovery Path B) |
| .github/agents/github/github-backlog-manager.agent.md | Added "backlog brief" to Discovery keyword signals and contextual indicators |
| .github/instructions/github/github-backlog-discovery.instructions.md | Added Backlog Brief rows to Document Parsing Guidelines table (experiment requirements → User story, non-functional constraints → Task) |

Design Note

ADO's ado-wit-discovery.instructions.md was intentionally not modified — it uses generic extraction that handles backlog briefs adequately. The GitHub version has a structured Document Parsing Guidelines table that needed explicit Backlog Brief entries.

This completes the end-to-end path: Experiment Designer → backlog-brief.md → Backlog Manager → Discovery Path B → work items.

Development

Successfully merging this pull request may close these issues.

feat(agents): Minimum Viable Experiment Designer
