A chess engine project built to show how it thinks, and to compare engine techniques across methods and languages.
Board & move generation
- Bitboard representation with Little-Endian Rank-File (LERF) mapping
- Wrap-safe directional shifts and ray-walking for sliding pieces
- Full legal move generation with make/unmake, castling, en passant, promotion
- Zobrist hashing for fast position keys
- FEN (de)serialization
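The LERF mapping and wrap-safe shifts listed above can be illustrated with a minimal sketch. The function and constant names here are illustrative assumptions, not the engine's actual identifiers; the idea is that a file mask is applied before an east or west shift so that a piece on the h-file never wraps onto the a-file of the next rank.

```python
# Sketch only: names are hypothetical, not taken from python/engine.
MASK64 = 0xFFFFFFFFFFFFFFFF
FILE_A = 0x0101010101010101
FILE_H = 0x8080808080808080


def square_index(file: int, rank: int) -> int:
    """LERF: a1 = 0, b1 = 1, ..., h1 = 7, a2 = 8, ..., h8 = 63."""
    return rank * 8 + file


def shift_north(bb: int) -> int:
    return (bb << 8) & MASK64  # bits pushed past rank 8 are discarded


def shift_south(bb: int) -> int:
    return bb >> 8


def shift_east(bb: int) -> int:
    return ((bb & ~FILE_H) << 1) & MASK64  # drop h-file bits before shifting


def shift_west(bb: int) -> int:
    return (bb & ~FILE_A) >> 1  # drop a-file bits before shifting


def ray_attacks(square: int, step, occupied: int) -> int:
    """Walk one direction from `square`, stopping at the edge or first blocker.

    The blocker square itself is included, since it may be a capture target.
    """
    attacks = 0
    bb = step(1 << square)
    while bb:
        attacks |= bb
        if bb & occupied:
            break
        bb = step(bb)
    return attacks
```

For example, `ray_attacks(0, shift_east, 1 << 3)` walks from a1 toward h1 and stops on the blocker at d1, returning the bits for b1, c1, and d1.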
Search (alpha-beta family)
- Minimax with alpha-beta pruning
- Iterative deepening with time management
- Transposition table keyed by Zobrist hash (depth-preferred replacement)
- Move ordering: TT move → promotions → MVV-LVA captures → killer moves → history heuristic (with a self-bounding "gravity" update)
- Quiescence search to tame the horizon effect
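The self-bounding "gravity" update for the history heuristic is often written as shown below. This is one common formulation, and both the cap value and the function name are assumptions rather than the engine's confirmed code. The correction term grows with the current entry, so repeated bonuses pull the value toward the cap without ever crossing it.

```python
# Sketch only: MAX_HISTORY and gravity_update are hypothetical names.
MAX_HISTORY = 16384  # assumed cap; the engine's actual bound may differ


def gravity_update(entry: int, bonus: int) -> int:
    """Return the new history score after applying a (possibly negative) bonus.

    The subtraction scales with the current entry, which keeps the result
    inside [-MAX_HISTORY, MAX_HISTORY] without an explicit clamp on the entry.
    """
    bonus = max(-MAX_HISTORY, min(MAX_HISTORY, bonus))
    return entry + bonus - entry * abs(bonus) // MAX_HISTORY
```

A quiet move that keeps causing cutoffs gains less from each new bonus as its score rises, while a penalty on a high-scoring move bites harder; that asymmetry is what lets the table adapt when the position changes.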
Evaluation (handcrafted, three interchangeable levels)
- Material + piece-square tables
- Pawn structure (doubled / isolated / passed), bishop pair, rough mobility, and a midgame/endgame king-safety table
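As a rough illustration of how the pawn-structure terms can be computed on bitboards, the sketch below counts doubled and isolated pawns with file masks. It assumes the LERF layout described earlier; the function names are hypothetical and the engine's own evaluation may weight or detect these features differently.

```python
# Sketch only: helper names are hypothetical, not the engine's API.
FILE_A = 0x0101010101010101


def file_mask(file: int) -> int:
    """All eight squares of one file (0 = a-file, 7 = h-file)."""
    return FILE_A << file


def popcount(bb: int) -> int:
    return bin(bb).count("1")


def doubled_pawns(pawns: int) -> int:
    """Count the extra pawns on any file that holds more than one."""
    return sum(max(0, popcount(pawns & file_mask(f)) - 1) for f in range(8))


def isolated_pawns(pawns: int) -> int:
    """Count pawns with no friendly pawn on either adjacent file."""
    count = 0
    for f in range(8):
        on_file = pawns & file_mask(f)
        if not on_file:
            continue
        neighbours = 0
        if f > 0:
            neighbours |= file_mask(f - 1)
        if f < 7:
            neighbours |= file_mask(f + 1)
        if not pawns & neighbours:
            count += popcount(on_file)
    return count
```

With pawns on a2, a3, and c2, this reports one doubled pawn and three isolated pawns, since neither the a-file pair nor the c-pawn has a neighbour on the b- or d-file.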
Neural networks (optional nn extra)
- Separate policy and value networks, each with a CNN or light-ResNet trunk (one num_res_blocks knob switches between them)
- Board-to-planes encoding and a 4096 from-to policy head
- Hybrid hooks: the value net drops into the alpha-beta search as a BaseEvaluate, and the policy net supplies move priors
- Supervised training on master games, plus an AlphaZero-style self-play reinforcement-learning loop: a PUCT MCTS guided by the two nets, self-play game generation, and per-generation training with acceptance gating
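A 4096-way from-to policy head has one output per (origin square, destination square) pair, since 64 × 64 = 4096. The sketch below shows one plausible indexing, `from * 64 + to`, on LERF square numbers. The ordering and the helper names are assumptions; the networks in python/nn may lay the outputs out differently.

```python
# Sketch only: the from*64+to ordering is an assumption about the head layout.
def square_from_name(name: str) -> int:
    """Convert algebraic square names to LERF indices, e.g. 'e2' -> 12."""
    return (int(name[1]) - 1) * 8 + (ord(name[0]) - ord("a"))


def move_to_index(from_sq: int, to_sq: int) -> int:
    """Map a (from, to) square pair to a slot in the 4096-wide policy vector."""
    return from_sq * 64 + to_sq


def index_to_move(index: int) -> tuple[int, int]:
    """Inverse of move_to_index: recover (from_sq, to_sq)."""
    return divmod(index, 64)


def uci_to_index(uci: str) -> int:
    """Index for a UCI move string; any promotion suffix is ignored."""
    return move_to_index(square_from_name(uci[:2]), square_from_name(uci[2:4]))
```

A plain from-to index cannot distinguish between promotion pieces, so `e7e8q` and `e7e8n` share a slot in this sketch. How the project resolves that case is not stated here.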
Interfaces
- UCI protocol over stdin/stdout, so any GUI or match runner can drive the engine
- A pygame GUI to play against the bot, with legal-move hints and a promotion picker
Benchmarking harness
- A dependency-free UCI match runner that referees games and emits PGN
- Elo analysis: head-to-head Elo difference with a 95% confidence interval and likelihood-of-superiority, plus a Bradley-Terry rating table for 3+ engines
- A cutechess-cli + Stockfish + Ordo pipeline for the standard workflow
- A 12-position opening suite for positional variety
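The head-to-head Elo figures mentioned in this list follow from standard formulas, sketched below. This is not a copy of harness/elo.py: the function names are hypothetical, and the interval uses the usual normal approximation on the per-game score, which the real script may or may not share.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    eps = 1e-9  # keep the logarithm finite for 0% or 100% scores
    score = min(max(score, eps), 1.0 - eps)
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_summary(wins: int, draws: int, losses: int) -> dict:
    """Elo difference, 95% interval, and likelihood of superiority."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5, or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)
    decisive = wins + losses
    los = 0.5 if decisive == 0 else 0.5 * (
        1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))
    return {
        "score": score,
        "elo": elo_from_score(score),
        "elo_low": elo_from_score(score - 1.96 * se),
        "elo_high": elo_from_score(score + 1.96 * se),
        "los": los,
        "draw_rate": draws / n,
    }
```

For a 60 win, 20 draw, 20 loss result the score is 0.70, which corresponds to roughly +147 Elo. The interval narrows as the number of games grows, which is why short matches rarely separate engines of similar strength.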
Python 3.12 · uv · pygame · Stockfish (sparring partner) · Ordo (rating) · cutechess-cli (match runner) · Docker
uv sync # create the environment from the lockfile
uv run python python/main.py # play against the bot (pygame GUI)
uv run python python/uci.py # talk UCI: uci, isready, position startpos, go movetime 1000
The Python implementation lives under python/ (engine core in python/engine,
neural nets in python/nn), with equivalent-logic ports beside it in cpp/
(C++17) and rust/ (Rust). Each port is a standalone UCI binary that the match
harness and the GUI drive identically. Shared tooling (assets/, harness/)
stays at the repo root.
The C++ and Rust engines are line-for-line ports of the Python core: the same bitboard move generation, handcrafted evaluation, and alpha-beta search with a transposition table. They are verified equivalent — identical perft counts and byte-identical search output (scores, node counts, principal variation) at a fixed depth — and run orders of magnitude faster.
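Perft, the check used to verify the ports, counts the leaf nodes of the legal-move tree at a fixed depth. A minimal sketch follows, written against a hypothetical position interface (`legal_moves()`, `make(move)`, `unmake(move)`); the engine's real method names are not shown in this document and may differ.

```python
# Sketch only: `pos` is any object exposing the hypothetical methods
# legal_moves(), make(move), and unmake(move).
def perft(pos, depth: int) -> int:
    """Count leaf nodes reachable in exactly `depth` plies."""
    if depth == 0:
        return 1
    nodes = 0
    for move in pos.legal_moves():
        pos.make(move)
        nodes += perft(pos, depth - 1)
        pos.unmake(move)
    return nodes


def divide(pos, depth: int) -> dict:
    """Per-root-move leaf counts (depth >= 1), for locating a mismatch."""
    counts = {}
    for move in pos.legal_moves():
        pos.make(move)
        counts[str(move)] = perft(pos, depth - 1)
        pos.unmake(move)
    return counts
```

From the standard starting position the reference counts for depths 1 to 4 are 20, 400, 8,902, and 197,281. When two implementations disagree, comparing their `divide` output at decreasing depths narrows the fault to a single position.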
# C++ → cpp/build/uci
cmake -S cpp -B cpp/build -DCMAKE_BUILD_TYPE=Release && cmake --build cpp/build -j
# Rust → rust/target/release/uci
cargo build --release --manifest-path rust/Cargo.toml
# Validate move generation (should match the reference perft numbers)
cpp/build/perft
cargo run --release --manifest-path rust/Cargo.toml --bin perft
Play the GUI against any of the three brains (the board still renders in Python; only the bot's move is computed by the chosen engine):
uv run python python/main.py --lang py # in-process Python (default)
uv run python python/main.py --lang cpp # C++ UCI binary
uv run python python/main.py --lang rust # Rust UCI binary
To train the supervised networks, install the optional dependencies first. Games are streamed from the angeluriot/chess_games dataset and replayed into (position, move, result) samples:
uv sync --extra nn # adds torch + numpy + datasets
cd python
python -m nn.train --mode policy --max-games 2000 --epochs 5
python -m nn.train --mode value --max-games 2000 --min-elo 2200 --epochs 5
With supervised weights in place, nn.rl runs an AlphaZero-style loop on top of
them: each generation plays self-play games (a PUCT MCTS using the policy net for
priors and the value net for leaf evaluations), trains candidate nets on the
collected (position, MCTS policy, game result) samples, and promotes the
candidate only if it beats the current best in an in-process match. The nets are
warm-started from the latest supervised checkpoints (training from scratch in
pure Python is infeasible), and accepted generations are written to
nn/weights/rl/ — not nn/weights/ — so the GUI keeps using the supervised
nets until you adopt an RL net explicitly. A per-generation metrics CSV (policy
loss, value loss, gate score) lands in harness/results/.
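The selection rule at the heart of a PUCT MCTS, and the gating step that follows training, can be sketched as below. The class, the exploration constant, and the gate threshold are illustrative assumptions; nn.rl may use different values and data structures.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics for one move out of a search node (hypothetical layout)."""
    prior: float            # policy-net probability for this move
    visits: int = 0
    value_sum: float = 0.0  # sum of backed-up values from this move

    @property
    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(edge: Edge, parent_visits: int, c_puct: float = 1.5) -> float:
    """Mean value plus a prior-weighted bonus that decays with visits."""
    u = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + edge.visits)
    return edge.q + u


def select(edges: dict, c_puct: float = 1.5) -> str:
    """Pick the move with the highest PUCT score."""
    parent_visits = max(1, sum(e.visits for e in edges.values()))
    return max(edges, key=lambda m: puct_score(edges[m], parent_visits, c_puct))


def accept_candidate(wins: int, draws: int, losses: int,
                     threshold: float = 0.5) -> bool:
    """Gate: promote only if the candidate's match score exceeds threshold."""
    games = wins + draws + losses
    return games > 0 and (wins + 0.5 * draws) / games > threshold
```

With no visits yet, the move with the largest prior is chosen; once a move has been visited often and returned poor values, the bonus on less-explored moves overtakes it.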
cd python
python -m nn.rl --generations 5 --games-per-gen 20 --sims 80 # train (CPU-friendly)
The GUI can load the RL nets directly, with no promotion step. The Engine-settings
panel has a Weights toggle (supervised / rl) that points the nn
evaluation and policy move source at nn/weights/ or nn/weights/rl/; start
on the RL nets with python main.py --eval nn --weights rl (value net) or
--search policy --weights rl. For the UCI binary / harness (which always read
nn/weights/), copy a chosen RL checkpoint up with python -m nn.promote.
Self-play is the bottleneck of the RL loop, so the move generation and MCTS can
run natively. The eschess_native extension (Rust/PyO3, in rust-ffi/) wraps
the fast Rust engine and runs the whole self-play loop across many games in
parallel — bypassing the GIL — calling back into Python only for batched
policy/value inference. It is optional: when it is not built, nn.rl falls back
to the pure-Python self-play with no change in behaviour.
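The optional-extension behaviour described above can be approximated with a small resolver that probes for the module before choosing a backend. This is a sketch under assumptions: the `"python"` backend name and the error raised when the native module is requested but missing are illustrative, and nn.rl may handle both differently.

```python
import importlib.util


def resolve_backend(requested: str = "auto",
                    module_name: str = "eschess_native") -> str:
    """Choose a self-play backend without importing the extension eagerly.

    "auto" prefers the native module when it is installed and otherwise
    falls back to pure Python. The "python" label is a hypothetical name.
    """
    available = importlib.util.find_spec(module_name) is not None
    if requested == "auto":
        return "native" if available else "python"
    if requested == "native" and not available:
        # Assumed behaviour: fail loudly rather than silently downgrade.
        raise RuntimeError(f"{module_name} is not built; run maturin develop first")
    return requested
```

Probing with `find_spec` rather than a bare `import` keeps the check cheap and avoids loading the compiled module in processes that never run self-play.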
Build it into the environment (needs a Rust toolchain), then select it with
--backend native (the default auto uses it when available):
uv run --extra nn maturin develop --release -m rust-ffi/Cargo.toml # build the module
cd python
python -m nn.rl --generations 5 --games-per-gen 64 --sims 80 --backend native
A quick parity check (native vs pure-Python move generation, encoding, and
self-play) lives in tests/ and runs without torch:
uv run --extra ffi python tests/test_native_parity.py
The Docker image bundles the engine with every external tool the benchmark needs
(Stockfish, Ordo, and cutechess-cli) and builds the C++ and Rust
engines (cpp/build/uci, rust/target/release/uci), so the entire
cross-language pipeline (matches, telemetry, ACL vs Stockfish) is reproducible
with no host setup.
All harness scripts write their output (PGNs, logs, telemetry CSV/PNG, ACL CSV)
to harness/results/. Mount that directory so the output lands on the host;
otherwise it disappears with --rm.
docker build -t eschess .
# Run the engine as a UCI process
docker run --rm -i eschess python python/uci.py
# Benchmark vs a strength-limited Stockfish (PGN + log → harness/results/)
docker run --rm -v "$PWD/harness/results:/app/harness/results" eschess \
python harness/benchmark.py --a-eval medium --stockfish 1320 --games 100
# Cross-language throughput (CSV + PNG → harness/results/)
docker run --rm -v "$PWD/harness/results:/app/harness/results" eschess \
python harness/telemetry.py --depth 7 --plot
# Move quality vs Stockfish: play a few games, then score ACL (PGN + CSV → harness/results/)
docker run --rm -v "$PWD/harness/results:/app/harness/results" eschess bash -c '
python harness/match.py --engine1 cpp/build/uci --name1 eschess-cpp \
--engine2 stockfish --name2 SF-1350 --opt2 UCI_LimitStrength=true \
--opt2 UCI_Elo=1350 --games 4 --movetime 100 --pgn harness/results/g.pgn && \
python harness/acl.py harness/results/g.pgn --ref stockfish --ref-depth 12'
Play a match. The runner launches two UCI engines, referees with Eschess's own board logic (checkmate, stalemate, 50-move, threefold, insufficient material), and writes a PGN:
uv run python harness/match.py \
--engine1 "python3 python/uci.py" --name1 EschessA \
--engine2 "python3 python/uci.py" --name2 EschessB \
--games 100 --movetime 100 --concurrency 4 --openings harness/openings.epd
--concurrency N plays N games in parallel. The engine config lives inside the
engine command, so any matchup works: choose the evaluation with --eval simple|medium|complex|nn and the move source with --search alphabeta|policy
(e.g. --engine1 "python3 python/uci.py --eval complex"), and set the TT size
with --opt1/--opt2 Hash=N. Time per move is the match-wide --movetime — there
is no per-engine depth flag, and the C++/Rust binaries take no flags (fixed
medium eval). The PGN defaults to harness/results/ (override with --pgn).
Point --engine2 at Stockfish for a real benchmark. Stockfish ships only in the
Docker image, so run it there — mounting harness/results keeps the PGN on the
host:
docker run --rm -v "$PWD/harness/results:/app/harness/results" eschess \
python harness/match.py \
--engine1 "python3 python/uci.py" --name1 Eschess \
--engine2 stockfish --name2 SF-1320 \
--opt2 UCI_LimitStrength=true --opt2 UCI_Elo=1320 \
--games 100 --movetime 100 --openings harness/openings.epd
Rate the results. elo.py has no external dependencies, so run it locally on
the PGN that the match left in harness/results/. For two engines it reports the
score, Elo difference ± 95% margin, the confidence interval,
likelihood-of-superiority, and draw rate; for three or more it solves a
Bradley-Terry model for a full rating table:
uv run python harness/elo.py harness/results/<run>.pgn --anchor SF-1320 --anchor-elo 1320
Benchmark the ML model (one shot). harness/benchmark.py wraps the
match-then-rate flow and builds the engine commands for you, writing a timestamped
run folder (PGN + log) to harness/results/. ML configs (nn evaluation,
policy search) need the torch deps (--extra nn); --stockfish benchmarks need
Stockfish, so run those in Docker (the image bundles both):
# value network vs the medium handcrafted eval (local, no external tools)
uv run --extra nn python harness/benchmark.py --a-eval nn --b-eval medium \
--games 100 --movetime 100 --concurrency 4
# policy network vs a 1320 Stockfish (Docker; run folder → harness/results/)
docker run --rm -v "$PWD/harness/results:/app/harness/results" eschess \
python harness/benchmark.py --a-search policy --stockfish 1320 \
--games 100 --concurrency 4
Standard tooling. harness/run_cutechess.sh drives the same benchmark
through cutechess-cli and rates it with Ordo. Those tools ship only in the
Docker image, so run it there (PGN → harness/results/):
docker run --rm -v "$PWD/harness/results:/app/harness/results" eschess \
harness/run_cutechess.sh 100 20+0.2 1320
Cross-language telemetry. harness/telemetry.py drives the Python, C++, and
Rust engines over UCI on a shared set of positions and reports nodes-per-second,
peak memory, and (where perf hardware counters are permitted) cache-miss rate.
Because the three engines are logically equivalent they visit the same nodes at a
fixed depth, so NPS is a clean speed comparison. Results go to a timestamped CSV
under harness/results/; with the bench extra it also renders PNG charts:
uv run --extra bench python harness/telemetry.py --depth 6 --cache --plot
Move quality (Average Centipawn Loss). harness/acl.py scores the moves in a
played PGN against a strong reference engine (Stockfish), reporting ACL split by
game phase (opening / middlegame / endgame) and by player — how much worse each
played move was than the reference's best. The reference is Stockfish, so run it
in Docker; the per-player/phase table is also written to a CSV in
harness/results/:
docker run --rm -v "$PWD/harness/results:/app/harness/results" eschess \
python harness/acl.py harness/results/<run>.pgn --ref stockfish --ref-depth 14
Notes
- The engine is pure Python (~2–16k NPS), so use short fixed-time controls (50–200 ms/move) to keep 100-game matches to minutes rather than hours.
- PGN move text is UCI coordinate notation, not SAN; raters key off the [Result] tag, so this does not affect ratings.
- Ordo refuses to rate a "not well connected" database (e.g. when one engine wins every game); the bundled elo.py handles that case, so the scripts default to it for the headline number.
- Phase 1 - Core engine (done): bitboard representation, alpha-beta minimax with iterative deepening and a Zobrist transposition table, MVV-LVA / killer / history move ordering, and a handcrafted piece-square + structural evaluation.
- Phase 1.5 - UCI & benchmarking (done): UCI protocol wrapper, an automated match harness against a dialed-down Stockfish, and Elo analysis with confidence intervals.
- Phase 2 - Machine learning & hybrids: supervised value/policy networks on master games (done) and an AlphaZero-style self-play RL loop with MCTS (done, nn.rl); still to come, a hybrid orchestrator (opening book early, search/ML in the middlegame, endgame tablebases when the board simplifies).
- Phase 3 - Cross-language benchmarking (done): the same alpha-beta search and evaluation in Python, C++ (cpp/), and Rust (rust/), verified equivalent by perft and byte-identical fixed-depth search; telemetry for nodes-per-second, memory, and cache misses (harness/telemetry.py); and move quality via average centipawn loss vs Stockfish (harness/acl.py).
- Phase 4 - Web deployment & visualization: an interactive sandbox with a learning mode (watch engine-vs-engine games with the search tree and evaluations overlaid) and a competition mode (play a configurable engine), a live self-play training dashboard, and an LLM "chess coach" that narrates the engine's top variations.