Extensibility requests - evals #490

Draft
AleksandricMarko wants to merge 20 commits into main from extensibility_evals

Conversation

@AleksandricMarko (Collaborator)

Please go to the Preview tab and select the appropriate template for your changes:

  • Experiment - For running benchmarks and experiments
  • Standard PR - Just delete this text and fill in description

@haoranpb (Collaborator) left a comment

Looks good overall.

The only piece missing is reading the agent's output during the evaluate step.

This will need some refactoring before it can be merged, but it is fine as-is if you only need to run it locally.

AleksandricMarko and others added 5 commits February 4, 2026 15:51
- Enhanced parse_metrics_ext to search session logs for JSON output
  (agent writes to stdout, not stderr where metrics are parsed)
- Updated step7 instructions to require comprehensive JSON output
  with all workflow step results in specified structure
- Updated prompt template to explicitly request JSON output
- Agent now consistently produces structured JSON output for analysis

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
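
The log search described in the commit above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual `parse_metrics_ext` code: the function name `find_json_output` and the choice to keep the last JSON object in the log are assumptions made for the example.

```python
import json
from typing import Any, Optional


def find_json_output(log_text: str) -> Optional[dict[str, Any]]:
    """Return the last top-level JSON object embedded in log_text, or None.

    Hypothetical helper: the real parse_metrics_ext may locate the agent's
    JSON differently (for example by a marker line or a fenced block).
    """
    decoder = json.JSONDecoder()
    last: Optional[dict[str, Any]] = None
    pos = 0
    while True:
        start = log_text.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns the value plus the index just past its end.
            obj, end = decoder.raw_decode(log_text, start)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one.
            pos = start + 1
            continue
        if isinstance(obj, dict):
            last = obj
        # Skip past the parsed object so nested braces are not re-scanned.
        pos = end
    return last
```

Keeping the last object rather than the first is a design choice for this sketch: if the agent prints intermediate JSON before its final summary, the final summary wins.
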
Implements comparison between agent output and expected results. Validates outcome is FEASIBLE and labels match expected values. Sets resolved based on validation result.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
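
A rough sketch of the validation this commit describes, under stated assumptions: the key names `outcome` and `labels`, the function name `is_resolved`, and the order-insensitive label comparison are all hypothetical and may not match the PR's implementation.

```python
from typing import Any, Mapping, Sequence


def is_resolved(agent_output: Mapping[str, Any],
                expected_labels: Sequence[str]) -> bool:
    """Return True when the outcome is FEASIBLE and the labels match.

    Assumed schema: {"outcome": str, "labels": list[str]}.
    """
    if agent_output.get("outcome") != "FEASIBLE":
        return False
    labels = agent_output.get("labels")
    # Reject missing or malformed label lists instead of raising.
    if not isinstance(labels, list):
        return False
    if not all(isinstance(label, str) for label in labels):
        return False
    # Assumption: label order and duplicates do not matter.
    return set(labels) == set(expected_labels)
```

The result would then be assigned to the task's `resolved` field; a missing or malformed output counts as unresolved rather than as an error.
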
extensibility-request-template: |
  You are working with a Business Central (AL) code repository at {{repo_path}}.

  Task: Analyze and process the extensibility request using the custom agent - "Argus extensibility agent"
Collaborator

We probably don't need to specify this in the prompt.
