A benchmark for evaluating coding agents on real-world Business Central (AL) development tasks, inspired by SWE-Bench.
BC-Bench provides a reproducible evaluation framework for coding agents working on real-world Business Central development tasks:
- Measure performance of different models on authentic AL issues
- Quantify the impact of tooling changes (MCP servers, custom instructions, custom agents, etc.)
- Track progress with transparent, comparable metrics over time
- Rapidly iterate on agent configurations and setups
We follow the SWE-Bench schema with BC-specific adjustments:
- `environment_setup_commit` and `version` are combined into `environment_setup_version`
- `project_paths` enumerates the AL project roots touched by the fix
- `problem_statement` and `hints_text` are not included in the jsonl file; they are stored under `problemstatement/` so repro steps can include screenshots
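To make the adjustments concrete, here is a sketch of what a single task record in the jsonl file could look like under this schema. All repo names, commits, versions, and paths below are hypothetical placeholders, not actual dataset entries:

```python
import json

# Hypothetical BC-Bench task record following the adjusted SWE-Bench schema.
# Every value below is illustrative; only the field names come from the schema notes.
record = {
    "instance_id": "example-org__example-bc-app-42",   # hypothetical id
    "repo": "example-org/example-bc-app",              # hypothetical repo
    "base_commit": "abc1234",                          # hypothetical commit
    "environment_setup_version": "26.0",               # replaces environment_setup_commit + version
    "project_paths": ["App", "Test"],                  # AL project roots touched by the fix
    "patch": "...",                                    # gold patch (elided)
    "test_patch": "...",                               # tests introduced by the fix (elided)
    # problem_statement / hints_text are deliberately absent from the jsonl;
    # they live under problemstatement/ so repro steps can include screenshots.
}

line = json.dumps(record)      # one task per line in the .jsonl file
parsed = json.loads(line)
print(parsed["environment_setup_version"])  # → 26.0
print(parsed["project_paths"])              # → ['App', 'Test']
```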
A minimal agent loop based on mini-swe-agent. Its simplicity makes it ideal for establishing baseline performance. See mini-bc-agent.
The GitHub Copilot CLI supports MCP servers, tools, and agent mode. It closely mirrors real developers' workflows (both VS Code and Coding Agent), making it a strong candidate for evaluating automated workflows.
Claude Code is Anthropic's agentic coding tool. It supports MCP servers, custom system prompts, and agent mode. BC-Bench integrates with Claude Code using the same shared configuration as Copilot.
BC-Bench is open source, and you're welcome to fork and adapt it for your own use. We are not accepting external contributions in this repository at this time. You can run evaluations locally and replace the dataset under dataset/ with tasks from your own codebase.
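As a sketch of what swapping in your own tasks involves, assuming a SWE-Bench-style jsonl layout (one JSON object per line; the file name and helper names below are illustrative, not part of BC-Bench):

```python
import json
from pathlib import Path

def load_tasks(path):
    """Read a SWE-Bench-style .jsonl file: one JSON task object per line."""
    tasks = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():  # skip blank lines
            tasks.append(json.loads(line))
    return tasks

def write_tasks(tasks, path):
    """Write tasks back out, one JSON object per line."""
    Path(path).write_text(
        "\n".join(json.dumps(t) for t in tasks) + "\n",
        encoding="utf-8",
    )
```

With helpers like these, replacing the dataset is just writing records for your own codebase's issues to a jsonl file under dataset/.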