An Alpha-Zero-style Connect Four engine trained entirely via self play.
The game logic, Monte Carlo Tree Search, and multi-threaded self play engine are written in Rust (rust/).
The NN is written in Python/PyTorch (src/c4a0/) and interfaces with Rust via PyO3.
- Install clang
# Instructions for Ubuntu/Debian (other OSs may vary)
sudo apt install clang
- Install uv for Python dependency/environment management
curl -LsSf https://astral.sh/uv/install.sh | sh
- Install deps and create virtual env:
uv sync
- Compile the Rust code
uv run python -m ensurepip --upgrade
uv run maturin develop --release
- Train a network
uv run python src/c4a0/main.py train --max-gens=10
- Play against the network
uv run python src/c4a0/main.py play --model=best
- (Optional) Download a Connect Four solver to objectively measure training progress:
git clone https://github.com/PascalPons/connect4.git solver
cd solver
make
# Download opening book to speed up solutions
wget https://github.com/PascalPons/connect4/releases/download/book/7x6.book
Now pass the solver paths to the train, score, and other commands:
uv run python src/c4a0/main.py score solver/c4solver solver/7x6.book
After 9 generations of training (approximately 15 minutes on an RTX 3090), we achieve the following results:
PyTorch NN src/c4a0/nn.py
A resnet-style CNN that takes a board position as input and outputs a policy (a probability distribution over moves, weighted by how promising each move is) and a Q value (a predicted win/loss value in [-1, 1]).
Various NN hyperparameters can be swept via the nn-sweep command.
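The README doesn't specify the tensor layout nn.py uses. As a rough sketch, assuming the board is fed in as two 6x7 planes (the side to move's stones and the opponent's stones), the snippet below shows that encoding and how the two heads' raw outputs might be interpreted: policy logits are softmaxed over legal columns only, and the value is squashed into [-1, 1] with tanh. All function names are hypothetical, and plain Python stands in for PyTorch so the arithmetic is easy to follow.

```python
import math

ROWS, COLS = 6, 7


def encode_board(board, player):
    """Hypothetical input encoding: `board` is a ROWS x COLS grid (row 0 = top) of
    0 (empty), 1 or 2. Returns two planes: `player`'s stones, then the opponent's."""
    mine = [[1.0 if cell == player else 0.0 for cell in row] for row in board]
    theirs = [[1.0 if cell not in (0, player) else 0.0 for cell in row] for row in board]
    return [mine, theirs]


def legal_moves(board):
    """A column is playable while its top cell (row 0) is empty."""
    return [c for c in range(COLS) if board[0][c] == 0]


def masked_softmax(logits, legal):
    """Softmax of the policy logits over legal columns; illegal columns get 0."""
    top = max(logits[c] for c in legal)  # subtract the max for numerical stability
    exps = {c: math.exp(logits[c] - top) for c in legal}
    total = sum(exps.values())
    return [exps[c] / total if c in exps else 0.0 for c in range(COLS)]


def value_from_raw(raw):
    """Squash an unbounded value-head output into [-1, 1] (tanh is an assumption)."""
    return math.tanh(raw)


# Column 3 is completely filled, so it must get zero probability.
board = [[0] * COLS for _ in range(ROWS)]
for r in range(ROWS):
    board[r][3] = 1 if r % 2 == 0 else 2
policy = masked_softmax([0.0] * COLS, legal_moves(board))
print(policy[3], round(sum(policy), 6))  # 0.0 1.0
```

Masking before the softmax (rather than renormalizing afterwards) guarantees that full columns can never be sampled, whatever logits the network emits for them.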
Connect Four Game Logic rust/src/c4r.rs
Implements a compact bitboard representation of the board state (Pos) and all Connect Four rules and game logic.
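The internals of Pos aren't spelled out here. As an illustration, the following is a minimal sketch of a common Connect Four bitboard layout, modeled on the one in Pascal Pons's solver tutorial (c4r.rs may lay its bits out differently): each column gets 7 bits, 6 playable rows plus an always-empty sentinel bit that stops shifts from wrapping between columns, so a win check reduces to a few shift-and-AND operations. The class and helper names are hypothetical.

```python
WIDTH, HEIGHT = 7, 6
H1 = HEIGHT + 1  # bits per column: 6 playable rows + 1 always-empty sentinel bit


def bottom_mask(col):
    """Bit of the lowest cell in `col` (bit index = col * H1 + row, row 0 at the bottom)."""
    return 1 << (col * H1)


def top_mask(col):
    """Bit of the highest playable cell in `col`."""
    return 1 << (col * H1 + HEIGHT - 1)


def alignment(stones):
    """True if `stones` (one player's bitboard) contains four in a row."""
    # Shift distances: 1 = vertical, H1 = horizontal, HEIGHT and H1 + 1 = the two diagonals.
    for shift in (1, H1, HEIGHT, H1 + 1):
        pairs = stones & (stones >> shift)  # two in a row along this direction
        if pairs & (pairs >> (2 * shift)):  # two adjacent pairs = four in a row
            return True
    return False


class Pos:
    """Hypothetical bitboard position modeled on Pascal Pons's layout, not c4r.rs itself."""

    def __init__(self):
        self.current = 0  # stones of the player to move
        self.mask = 0     # stones of both players
        self.moves = 0

    def can_play(self, col):
        return (self.mask & top_mask(col)) == 0

    def play(self, col):
        # Flip perspective: `current` becomes the stones of the next player to move.
        self.current ^= self.mask
        # Adding the column's bottom bit carries up to its first empty cell.
        self.mask |= self.mask + bottom_mask(col)
        self.moves += 1

    def last_mover_won(self):
        # The player who just moved owns every stone not in `current`.
        return alignment(self.current ^ self.mask)


p = Pos()
for col in [0, 1, 0, 1, 0, 1, 0]:  # first player stacks four in column 0
    p.play(col)
print(p.last_mover_won())  # True
```

Storing "stones of the player to move" plus "all stones" (instead of one board per player) makes a move two integer operations and lets the same win check serve both players.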
Monte Carlo Tree Search (MCTS) rust/src/mcts.rs
Implements Monte Carlo Tree Search, the core algorithm behind Alpha-Zero. It probabilistically explores potential game paths and progressively homes in on the best move to play from any position.
MCTS relies on outputs from the NN. The output of MCTS helps train the next generation's NN.
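The details of mcts.rs aren't covered here, but AlphaZero-style search typically picks which child to explore with the PUCT rule, Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a)), where P comes from the NN's policy head and Q averages backed-up NN value estimates. The sketch below assumes that rule, a hypothetical c_puct of 1.5, and the AlphaZero convention of turning root visit counts into the policy target for the next generation; mcts.rs may differ in any of these.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float             # P(s, a): NN policy probability for the move into this node
    visits: int = 0          # N(s, a)
    value_sum: float = 0.0   # sum of backed-up values, from the view of the player who moved here
    children: dict = field(default_factory=dict)  # move -> Node

    def q(self):
        """Mean backed-up value Q(s, a); 0 for unvisited nodes (an assumption)."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent_visits, child, c_puct):
    exploration = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return child.q() + exploration


def select_child(node, c_puct=1.5):
    """Return the (move, child) pair with the highest PUCT score (c_puct is hypothetical)."""
    return max(node.children.items(), key=lambda kv: puct_score(node.visits, kv[1], c_puct))


def backup(path, leaf_value):
    """Propagate a leaf evaluation up the root-to-leaf `path`.

    `leaf_value` is the NN value for the player to move at the leaf. Each node stores
    values from the view of the player who moved into it, so the sign flips every ply.
    """
    value = -leaf_value
    for node in reversed(path):
        node.visits += 1
        node.value_sum += value
        value = -value


def visit_policy(root, n_moves=7, temperature=1.0):
    """Root visit counts -> training policy target (assumes temperature > 0, some visits)."""
    weights = [
        root.children[m].visits ** (1.0 / temperature) if m in root.children else 0.0
        for m in range(n_moves)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# One simulation from a freshly expanded root with two legal moves.
root = Node(prior=1.0, visits=1)
root.children = {0: Node(prior=0.7), 1: Node(prior=0.3)}
move, child = select_child(root)        # both unvisited, so the higher prior wins: move 0
backup([root, child], leaf_value=-1.0)  # the side to move after move 0 is judged lost
print(move, child.q(), visit_policy(root)[:2])  # 0 1.0 [1.0, 0.0]
```

The exploration term shrinks as a child accumulates visits, so early on the search trusts the NN's prior and later it trusts the backed-up Q values; this is how the NN guides MCTS and how MCTS, in turn, produces sharper targets than the raw policy.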
Self Play rust/src/self_play.rs
Uses Rust multi-threading to parallelize self play (training data generation).
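Because self-play games are independent, they parallelize naturally across threads. The sketch below shows only that structure: a hypothetical play_one_game plays uniformly random moves (standing in for MCTS-guided play) and labels every position with the final outcome from the perspective of the player to move, and a thread pool runs many games. In standard CPython the GIL keeps these threads from executing Python code in parallel, so this illustrates the shape of the work rather than the speedup the Rust implementation gets; a real pipeline would also store the MCTS policy for each position.

```python
import random
from concurrent.futures import ThreadPoolExecutor

ROWS, COLS = 6, 7
DIRECTIONS = ((0, 1), (1, 0), (1, 1), (1, -1))  # horizontal, vertical, two diagonals


def wins_at(grid, r, c):
    """True if the stone just placed at (r, c) completes four in a row."""
    player = grid[r][c]
    for dr, dc in DIRECTIONS:
        count = 1
        for sign in (1, -1):  # walk both ways along the line
            rr, cc = r + sign * dr, c + sign * dc
            while 0 <= rr < ROWS and 0 <= cc < COLS and grid[rr][cc] == player:
                count += 1
                rr, cc = rr + sign * dr, cc + sign * dc
        if count >= 4:
            return True
    return False


def play_one_game(seed):
    """Hypothetical self-play game with uniformly random moves standing in for MCTS.

    Returns (moves_before, outcome) samples; outcome is +1 / -1 / 0 from the view of
    the player to move in that position.
    """
    rng = random.Random(seed)
    grid = [[0] * COLS for _ in range(ROWS)]  # row 0 is the bottom
    heights = [0] * COLS
    moves, history = [], []
    player, winner = 1, 0
    while True:
        legal = [c for c in range(COLS) if heights[c] < ROWS]
        if not legal:
            break  # board full: draw
        history.append((tuple(moves), player))
        col = rng.choice(legal)
        row = heights[col]
        grid[row][col] = player
        heights[col] += 1
        moves.append(col)
        if wins_at(grid, row, col):
            winner = player
            break
        player = 3 - player  # players are 1 and 2
    return [(pos, 0 if winner == 0 else (1 if mover == winner else -1)) for pos, mover in history]


def generate_samples(n_games, n_threads=4):
    """Run independent games on a thread pool and flatten their samples."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        games = list(pool.map(play_one_game, range(n_games)))  # seeds 0..n_games-1
    return [sample for game in games for sample in game]


samples = generate_samples(8)
print(len(samples), samples[0])
```

Seeding each game from its index keeps the generated data reproducible regardless of how the pool schedules the work, since pool.map returns results in submission order.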
Solver rust/src/solver.rs
Connect Four is a perfectly solved game. See Pascal Pons's great writeup on how to build a perfect solver. We can use these solutions to objectively measure our NN's performance. Importantly, we never train on these solutions; only our self-play data is used to improve the NN.
solver.rs contains the stdin/stdout interface to the solver, which we use to obtain the objective solutions to our training positions. Because solutions are expensive to compute, we cache them in a local RocksDB database (solutions.db). We then check the moves recommended for our training positions against the solver to measure how often they are optimal.
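A minimal sketch of the cache-then-score flow described above, with two loud substitutions: Python's built-in sqlite3 stands in for RocksDB, and solve_batch is a hypothetical callable standing in for the stdin/stdout round trip to the solver (a real one would write position strings to the solver process and parse its output, whose exact format is not assumed here). Solutions are stored as one score per column (None for a full column), and a recommended move counts as optimal when its score ties the best score.

```python
import json
import sqlite3


class SolutionCache:
    """Persistent position -> per-column-scores cache (sqlite3 standing in for RocksDB)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS solutions (pos TEXT PRIMARY KEY, scores TEXT NOT NULL)"
        )

    def get(self, pos):
        row = self.db.execute("SELECT scores FROM solutions WHERE pos = ?", (pos,)).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, pos, scores):
        self.db.execute(
            "INSERT OR REPLACE INTO solutions (pos, scores) VALUES (?, ?)",
            (pos, json.dumps(scores)),
        )
        self.db.commit()


def solve_with_cache(positions, cache, solve_batch):
    """Return {position: scores}, invoking the expensive solver only for cache misses."""
    # dict.fromkeys dedupes while keeping order
    missing = [p for p in dict.fromkeys(positions) if cache.get(p) is None]
    if missing:
        for pos, scores in zip(missing, solve_batch(missing)):
            cache.put(pos, scores)
    return {p: cache.get(p) for p in positions}


def optimal_move_rate(recommended, solutions):
    """Fraction of positions whose recommended column has the best solver score."""
    hits = 0
    for pos, col in recommended.items():
        scores = solutions[pos]
        best = max(s for s in scores if s is not None)
        hits += scores[col] == best
    return hits / len(recommended)


# Usage with a fake solver that pretends column index 3 is always the unique best move.
calls = []


def fake_solve_batch(positions):
    calls.append(list(positions))
    return [[-1, -1, 0, 2, 0, -1, None] for _ in positions]


cache = SolutionCache()
solve_with_cache(["4", "44"], cache, fake_solve_batch)
sols = solve_with_cache(["4", "44", "444"], cache, fake_solve_batch)  # only "444" is a miss
print(calls)  # [['4', '44'], ['444']]
print(optimal_move_rate({"4": 3, "44": 3, "444": 0}, sols))  # 0.6666666666666666
```

Batching only the cache misses keeps repeated scoring runs cheap: positions that recur across generations (openings especially) are solved once and then served from disk.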