Alpha-Zero Connect Four NN trained via self play

advait/c4a0

c4a0: Connect Four Alpha-Zero

An Alpha-Zero-style Connect Four engine trained entirely via self-play.

The game logic, Monte Carlo Tree Search, and multi-threaded self-play engine are written in Rust (see rust/src/).

The NN is written in Python/PyTorch (see src/c4a0/nn.py) and interfaces with Rust via PyO3.

[Screenshot: Terminal UI]

Usage

  1. Install clang:
       # Instructions for Ubuntu/Debian (other OSes may vary)
       sudo apt install clang
  2. Install uv for Python dependency and environment management:
       curl -LsSf https://astral.sh/uv/install.sh | sh
  3. Install dependencies and create a virtual env:
       uv sync
  4. Compile the Rust code:
       uv run python -m ensurepip --upgrade
       uv run maturin develop --release
  5. Train a network:
       uv run python src/c4a0/main.py train --max-gens=10
  6. Play against the network:
       uv run python src/c4a0/main.py play --model=best
  7. (Optional) Download a Connect Four solver to objectively measure training progress:
       git clone https://github.com/PascalPons/connect4.git solver
       cd solver
       make
       # Download opening book to speed up solutions
       wget https://github.com/PascalPons/connect4/releases/download/book/7x6.book

Now pass the solver paths to the train, score, and other commands:

       uv run python src/c4a0/main.py score solver/c4solver solver/7x6.book

Results

After 9 generations of training (approximately 15 minutes on an RTX 3090), we achieve the following results:

[Figure: Training Results]

Architecture

PyTorch NN src/c4a0/nn.py

A ResNet-style CNN that takes a board position as input and outputs a Policy (a probability distribution over moves, weighted by promise) and a Q Value (a predicted win/loss value in [-1, 1]).

Various NN hyperparameters are sweepable via the nn-sweep command.
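To make the two outputs concrete, here is a minimal pure-Python sketch of how raw network outputs could be turned into a Policy and a Q Value. It assumes a softmax over the legal columns and a tanh squashing for the value, which is common in Alpha-Zero-style networks but is not confirmed to be what src/c4a0/nn.py does. The function names are hypothetical.

```python
import math
from typing import List, Sequence

def policy_from_logits(logits: Sequence[float], legal: Sequence[bool]) -> List[float]:
    """Softmax restricted to legal columns (assumption: full columns get probability 0)."""
    # Subtract the largest legal logit for numerical stability.
    top = max(l for l, ok in zip(logits, legal) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]

def q_from_raw(raw: float) -> float:
    """Squash an unbounded scalar into [-1, 1] (assumption: tanh)."""
    return math.tanh(raw)

# Seven columns, the last one full: probability mass is spread over the other six.
policy = policy_from_logits([0.0] * 7, [True] * 6 + [False])
q_value = q_from_raw(0.3)
```

The policy sums to 1 over playable columns, and the Q Value stays inside the [-1, 1] range described above.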

Connect Four Game Logic rust/src/c4r.rs

Implements a compact bitboard representation of board state (Pos) along with all Connect Four rules and game logic.
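As an illustration of the bitboard idea, here is a Python sketch of one well-known layout (the one described in Pascal Pons's solver writeup): each column occupies 7 bits, 6 playable cells plus one spare bit, packed into a single integer. This is an assumption made for illustration; the actual layout of Pos in rust/src/c4r.rs may differ, and the function names here are hypothetical.

```python
WIDTH, HEIGHT = 7, 6
STRIDE = HEIGHT + 1  # one spare bit per column keeps alignments from wrapping across columns

def bottom_mask(col: int) -> int:
    return 1 << (col * STRIDE)

def top_mask(col: int) -> int:
    return 1 << (HEIGHT - 1 + col * STRIDE)

def can_play(mask: int, col: int) -> bool:
    """A column is playable while its top cell is empty."""
    return mask & top_mask(col) == 0

def play(current: int, mask: int, col: int) -> tuple:
    """Drop a stone for the player to move.

    `mask` holds all stones, `current` holds the stones of the player to move.
    Returns (current, mask) from the point of view of the next player.
    """
    opponent = current ^ mask
    # Adding the bottom bit carries up to the first empty cell of the column.
    new_mask = mask | (mask + bottom_mask(col))
    return opponent, new_mask

def has_won(stones: int) -> bool:
    """Check vertical, horizontal, and both diagonal four-in-a-rows with shifts."""
    for shift in (1, STRIDE, STRIDE - 1, STRIDE + 1):
        pairs = stones & (stones >> shift)
        if pairs & (pairs >> (2 * shift)):
            return True
    return False

# First player stacks column 0 while the second player stacks column 1.
current, mask = 0, 0
for col in [0, 1, 0, 1, 0, 1, 0]:
    current, mask = play(current, mask, col)
first_player_stones = current ^ mask  # stones of the player who just moved
```

Because every rule reduces to a handful of shifts and bitwise operations, move generation and win detection stay cheap, which matters when MCTS visits many positions per move.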

Monte Carlo Tree Search (MCTS) rust/src/mcts.rs

Implements Monte Carlo Tree Search - the core algorithm behind Alpha-Zero. It probabilistically explores potential game pathways and homes in on the best move to play from any position.

MCTS relies on outputs from the NN. The output of MCTS helps train the next generation's NN.
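The way MCTS leans on the NN can be sketched with the PUCT selection rule used in the Alpha-Zero family: each child is scored by its average backed-up value plus an exploration bonus proportional to the NN's policy prior. The Node class, the c_puct value, and the function name below are assumptions for illustration and are not taken from rust/src/mcts.rs.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class Node:
    prior: float            # P(s, a): NN policy probability of the move leading here
    visits: int = 0         # N(s, a): how often search has passed through this node
    value_sum: float = 0.0  # sum of backed-up values, from the parent's point of view
    children: Dict[int, "Node"] = field(default_factory=dict)

    def q(self) -> float:
        """Mean backed-up value; 0.0 for unvisited nodes (an assumption)."""
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node: Node, c_puct: float = 1.5) -> Tuple[int, Node]:
    """Pick the (move, child) pair with the highest PUCT score."""
    sqrt_parent = math.sqrt(max(1, node.visits))

    def puct(child: Node) -> float:
        exploration = c_puct * child.prior * sqrt_parent / (1 + child.visits)
        return child.q() + exploration

    return max(node.children.items(), key=lambda kv: puct(kv[1]))

# With no visits yet, the child with the larger prior is explored first.
root = Node(prior=1.0, visits=1)
root.children = {3: Node(prior=0.8), 0: Node(prior=0.2)}
move, _ = select_child(root)
```

As visit counts grow, the exploration bonus shrinks and the average value dominates, so search shifts from following the NN's prior to trusting what it has actually observed.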

Uses Rust multi-threading to parallelize self-play (training data generation).

Connect Four is a perfectly solved game. See Pascal Pons's great writeup on how to build a perfect solver. We can use these solutions to objectively measure our NN's performance. Importantly, we never train on these solutions; we use only our self-play data to improve the NN's performance.

solver.rs contains the stdin/stdout interface used to obtain the objective solutions to our training positions. Because solutions are expensive to compute, we cache them in a local rocksdb database (solutions.db). We then measure how often the moves recommended for our training positions match the optimal moves as determined by the solver.
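The cache-then-score idea can be sketched in a few lines of Python. This is an illustration under stated assumptions: an in-memory dict stands in for the rocksdb database, solve_fn is a hypothetical callable standing in for the external solver process, positions are keyed by a hypothetical string, and the scores in the usage example are toy data rather than real solver output.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Scores = List[Optional[int]]  # one solver score per column; None marks a full column

class CachedSolver:
    """Wraps an expensive solve function with a lookup cache."""

    def __init__(self, solve_fn: Callable[[str], Scores]) -> None:
        self.solve_fn = solve_fn
        self.cache: Dict[str, Scores] = {}  # stand-in for a persistent database

    def scores(self, position: str) -> Scores:
        if position not in self.cache:
            self.cache[position] = self.solve_fn(position)
        return self.cache[position]

def optimal_moves(scores: Scores) -> Set[int]:
    """Columns whose solver score equals the best achievable score."""
    best = max(s for s in scores if s is not None)
    return {col for col, s in enumerate(scores) if s == best}

def optimal_move_rate(
    samples: Sequence[Tuple[str, Sequence[float]]], solver: CachedSolver
) -> float:
    """Fraction of samples whose highest-probability move is solver-optimal."""
    hits = 0
    for position, policy in samples:
        chosen = max(range(len(policy)), key=lambda col: policy[col])
        if chosen in optimal_moves(solver.scores(position)):
            hits += 1
    return hits / len(samples)

# Toy data: the same position appears twice, so the solver is only called once.
def toy_solve(position: str) -> Scores:
    return [-2, -1, 0, 1, 0, -1, -2]

solver = CachedSolver(toy_solve)
samples = [
    ("", [0.1, 0.1, 0.1, 0.4, 0.1, 0.1, 0.1]),  # picks the center column
    ("", [0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]),  # picks the leftmost column
]
rate = optimal_move_rate(samples, solver)
```

Since the solver's verdicts are only read for evaluation, a metric like this tracks training progress without leaking solved positions into the training data.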