microsoft/BC-Bench

BC-Bench

A benchmark for evaluating coding agents on real-world Business Central (AL) development tasks, inspired by SWE-Bench.

Purpose

BC-Bench provides a reproducible evaluation framework for coding agents working on real-world Business Central development tasks:

  • Measure performance of different models on authentic AL issues
  • Quantify the impact of tooling changes (MCP servers, custom instructions, custom agents, etc.)
  • Track progress with transparent, comparable metrics over time
  • Rapidly iterate on agent configurations and setups

Dataset

We follow the SWE-Bench schema with BC-specific adjustments:

  • environment_setup_commit and version are combined into environment_setup_version
  • project_paths enumerates the AL project roots touched by the fix
  • problem_statement and hints_text are not included in the jsonl file; they are stored under problemstatement so that repro steps can include screenshots
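
As a rough illustration of the adjustments above, here is a minimal Python sketch that parses JSONL records and checks them against the BC-specific fields. The field names environment_setup_version, project_paths, problem_statement, and hints_text come from the list above. Everything else is an assumption made for the example: the helper names, the sample record and its values, and the idea that project_paths is a JSON list. This is not the repository's own validation code.

```python
import json

# Field names taken from the README; the helpers below are hypothetical.
BC_FIELDS = ("environment_setup_version", "project_paths")
EXTERNAL_FIELDS = ("problem_statement", "hints_text")  # kept outside the jsonl


def parse_entries(jsonl_text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def check_entry(entry):
    """Return a list of schema problems found in one dataset entry."""
    problems = []
    for name in BC_FIELDS:
        if name not in entry:
            problems.append(f"missing field: {name}")
    for name in EXTERNAL_FIELDS:
        if name in entry:
            problems.append(f"unexpected field: {name}")
    # Assumption: project_paths is stored as a JSON list of project roots.
    if "project_paths" in entry and not isinstance(entry["project_paths"], list):
        problems.append("project_paths should be a list")
    return problems


# Hypothetical sample record; all values are made up for illustration.
SAMPLE = (
    '{"instance_id": "example-1", '
    '"environment_setup_version": "26.0", '
    '"project_paths": ["App/Example"]}\n'
)

if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry.get("instance_id"), check_entry(entry))
```

A check like this could be run over your own tasks before dropping them into dataset/, though the real schema may impose further requirements not shown here.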

Agents Under Evaluation

mini-BC-agent

A minimal agent loop based on mini-swe-agent. Its simplicity makes it perfect for establishing baseline performance. See mini-bc-agent.

GitHub Copilot CLI

The GitHub Copilot CLI supports MCP servers, tools, and agent mode. It closely simulates real developer workflows (both VS Code and Coding Agent), making it an ideal candidate for evaluating automated workflows.

Claude Code

Claude Code is Anthropic's agentic coding tool. It supports MCP servers, custom system prompts, and agent mode. BC-Bench integrates with Claude Code using the same shared configuration as Copilot.

Getting Started

BC-Bench is open source, and you're welcome to fork and adapt it for your own use. We are not accepting external contributions in this repository at this time. You can run evaluations locally and replace the dataset under dataset/ with tasks from your own codebase.
