Evolving a Tic Tac Toe AI In Your Browser
This is an experiment implementing @neat-evolution in a real-world web app.
I recently completed an experiment to evolve a Tic Tac Toe AI using the @neat-evolution library. The goal was to see if the NEAT (NeuroEvolution of Augmenting Topologies) algorithm could effectively learn Tic Tac Toe. For background on the library powering this, see my previous post on evolving neural networks in TypeScript.
Try it out
You can play the Tic Tac Toe AI demo here: https://hexagonoids.heygrady.com/experiments/tictactoe/
The source code is available on GitHub: https://github.com/heygrady/hexagonoids/
Key Features
- In-Browser Training: The AI trains directly in the browser using Web Workers to keep the UI responsive.
- Auto-saving progress (using idb-keyval), allowing training to resume after page reloads.
- Hot swapping the NEAT variant, allowing you to test different algorithms like NEAT, HyperNEAT, ES-HyperNEAT, and DES-HyperNEAT.
- Modern frontend stack: Built with SolidJS and nanostores for lightweight state management.
- Thrilling Tic Tac Toe action!
Game play
The game is a standard Tic Tac Toe game where you play against an AI opponent. The AI starts off as a completely untrained neural network and trains in the background as you play. After each match, the AI evolves for a few more generations and selects the best performing network as your next opponent. You should see the AI improve steadily as you play. Usually it only takes a few dozen matches before the AI is playing at a competent level and within 200 generations it should force a tie for nearly every game. You will likely see the same “best” opponent for many generations with occasional breakthroughs that lead to rapid improvement. After 500 or more generations it should max out its fitness and further evolution will only degrade performance.
The game also updates your Glicko-2 rating as you play against the AI, so you can see how well you stack up against the evolving networks.
Settings
There are some settings you can tweak, such as choosing how many generations the AI trains after each match, and which NEAT variant to use. My 7-year-old daughter requested that there be an option to choose the emoji used for the player pieces, but I’m the one who chose unicorns and rainbows as the default pieces.
When you swap the NEAT variant it will save the current population and start (or resume) training using the selected variant. You will likely see the best results using the default options, but it’s fun to experiment with the different algorithms. Tic Tac Toe is a simple game, so the differences between the algorithms are not as pronounced as they would be in more complex games.
Tech Stack
The demo is built with Astro and deployed to Vercel. The UI is built with SolidJS and rendered with the client:only="solid-js" directive, so the component skips server-side rendering entirely (no SSR is needed for this interactive app).
State management uses nanostores, a lightweight (~1KB) framework-agnostic library. I implemented a “draft vs committed” pattern for settings: some changes (like emoji selection) apply immediately, while others (like switching algorithms) stage in a draft until confirmed.
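A minimal sketch of that pattern using nanostores (the store names and settings shape here are illustrative, not the app's actual code):

```ts
import { map } from "nanostores";

// Committed settings drive the app immediately.
export const $settings = map({ emoji: "🦄", variant: "neat", generationsPerMatch: 10 });

// Draft settings stage risky changes (like switching algorithms) until confirmed.
export const $draftSettings = map({ ...$settings.get() });

// Immediate changes (like emoji selection) write straight to the committed store.
export function setEmoji(emoji: string): void {
  $settings.setKey("emoji", emoji);
}

// Staged changes only land when the user confirms them.
export function commitDraft(): void {
  $settings.set({ ...$draftSettings.get() });
}
```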
Persistence relies on
idb-keyval for IndexedDB
storage. The app maintains three separate stores: user settings, player Glicko-2
ratings and match history, and evolved populations (which can grow to several
megabytes as networks evolve). If the stored populations grow too large they are
pruned to save space.
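Roughly how the separate stores can be wired up with idb-keyval (database and key names are my own placeholders; createStore pairs each database with a single object store):

```ts
import { createStore, get, set } from "idb-keyval";

// One database per concern, since idb-keyval's createStore manages a single object store per database.
const settingsStore = createStore("ttt-settings", "keyval");
const ratingsStore = createStore("ttt-ratings", "keyval");
const populationStore = createStore("ttt-populations", "keyval");

export async function saveAll(settings: unknown, ratings: unknown, population: unknown): Promise<void> {
  await set("settings", settings, settingsStore);
  await set("ratings", ratings, ratingsStore);
  await set("population", population, populationStore);
}

export async function loadPopulation(): Promise<unknown> {
  return get("population", populationStore);
}
```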
Fitness
Devising an effective fitness function was the greatest challenge. Tic Tac Toe has only three outcomes: win, lose, or draw. This makes it difficult for the AI to learn from its mistakes; a loss provides little information about what went wrong. After many iterations I settled on a tournament environment that pairs networks in Swiss tournaments and evaluates them with Glicko-2 ratings.
To establish accurate Glicko-2 ratings, the AI needs to play a substantial number of games. Prior to the tournament, I run the networks through a gauntlet against a fixed set of opponents:
- a Minimax AI that plays perfect games
- a Sleeper AI that makes random opening moves and switches to the Minimax AI to finish the game.
This seeds the tournament with a consistent baseline rating for all the networks.
A perfect player will always force a draw; it’s not possible to dominate Tic Tac Toe, so fitness maxes out around 0.8 to 0.9. The final fitness score combines a min/max-normalized Glicko-2 rating with the average match score from the gauntlet games. The weights for the two components can be tuned (but not in the UI, yet); I set them to favor the Glicko-2 score to ensure we’re choosing the strongest network of each generation.
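In rough code terms, the combination looks something like this (the normalization and the specific weights are illustrative placeholders, not the exact values used in the demo):

```ts
// Combine the tournament rating with the gauntlet result into a single fitness score.
function computeFitness(
  rating: number, // the network's Glicko-2 rating after the Swiss tournament
  minRating: number, // lowest rating in the population this generation
  maxRating: number, // highest rating in the population this generation
  gauntletScore: number, // average match score from the gauntlet games (0..1)
  ratingWeight = 0.7, // favors the Glicko-2 signal
  gauntletWeight = 0.3,
): number {
  const normalizedRating =
    maxRating === minRating ? 0.5 : (rating - minRating) / (maxRating - minRating);
  return ratingWeight * normalizedRating + gauntletWeight * gauntletScore;
}
```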
Tournament Strategy and Tic Tac Toe Environment
One key feature of the neat-evolution library is the ability to create custom environments for training. In this case, I created a Tic Tac Toe environment that conducts matches between two networks or pits a network against a gauntlet of fixed AI opponents. The environment handles the game logic, scoring, and fitness evaluation of those matches.
To make this work with the built-in worker-evaluator, I added the ability to provide an evaluation-strategy, which in this case is responsible for running the tournament and calculating the final fitness scores from the gauntlet and Glicko-2 scores.
Bundling with Vite
The Tic Tac Toe demo is part of an Astro app built with Vite. Because the neat-evolution library uses web workers, I configured Vite to handle the worker files correctly. One complication was that the workers are initialized deep within the neat-evolution library, which is not the happy path for Vite. The official documentation on Vite’s worker features is comprehensive, but integrating workers initialized outside the standard patterns required some additional investigation. There were three tricks I used to get this working:
- glob imports
- using the ?worker&url suffix to get the worker URL
- manual chunking to ensure the code the worker imports isn’t bundled with the main-thread code
The ?worker&url suffix tells Vite to process the file as a worker and return
its URL rather than the module itself:
import workerEvaluatorScriptUrl from "@neat-evolution/worker-evaluator/workerEvaluatorScript?worker&url";
I modified the neat-evolution library to use these techniques, allowing for
paths to be passed in from the outside. This was challenging to debug. Without
manual chunking the app would throw cryptic errors like “a is undefined”. The
cause was SolidJS (which accesses the window object) being bundled into the same
chunk as code the worker imported; workers don’t have a window object,
so the minified code failed. After some trial and error with Rollup’s
manualChunks configuration, I was able to isolate the worker code into
separate chunks.
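The configuration ended up looking roughly like this (a simplified sketch; the actual chunk names and matching rules differ):

```ts
// astro.config.mjs (simplified sketch)
import { defineConfig } from "astro/config";

export default defineConfig({
  vite: {
    build: {
      rollupOptions: {
        output: {
          manualChunks(id) {
            // Keep worker-side code out of the main-thread chunks so window-dependent
            // code (like SolidJS) never ends up bundled into a worker chunk.
            if (id.includes("@neat-evolution/worker-evaluator")) {
              return "worker-evaluator";
            }
            if (id.includes("solid-js")) {
              return "solid";
            }
          },
        },
      },
    },
  },
});
```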
Effective training
A great deal of work went into how to encode the Tic Tac Toe board for the neural network inputs and outputs. The board is represented as a flat array of 18 one-hot encoded inputs, two per cell: 9 inputs mark the AI’s pieces and 9 mark the opponent’s pieces. There are 9 outputs, one for each cell on the board, representing the AI’s confidence in placing a piece in that cell. The highest output value is chosen as the AI’s move.
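A minimal sketch of that encoding (function and type names are mine; restricting the final argmax to empty cells is my assumption):

```ts
type Cell = "ai" | "opponent" | null;

// 18 one-hot inputs: indices 0-8 mark the AI's pieces, indices 9-17 mark the opponent's.
function encodeBoard(board: Cell[]): number[] {
  const inputs = new Array<number>(18).fill(0);
  board.forEach((cell, i) => {
    if (cell === "ai") inputs[i] = 1;
    if (cell === "opponent") inputs[i + 9] = 1;
  });
  return inputs;
}

// Pick the cell with the highest confidence among the empty cells.
function decodeMove(outputs: number[], board: Cell[]): number {
  let best = -1;
  for (let i = 0; i < 9; i++) {
    if (board[i] === null && (best === -1 || outputs[i] > outputs[best])) {
      best = i;
    }
  }
  return best;
}
```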
A crucial optimization was normalizing the orientation of the board before feeding it to the neural network. There are a limited number of possible board configurations. If you account for symmetries (rotations and reflections) there are only 765 unique board states. By normalizing the board to a canonical orientation, the neural network can learn more effectively, as it doesn’t have to learn the same patterns multiple times for different orientations.
The canonicalization algorithm searches all 8 symmetries (4 rotations × 2 reflection states) in a single loop:
// Search all 8 symmetries to find the canonical (lexicographically smallest) board
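// (currentBoard, currentMapping, bestKey, bestBoard and bestMapping are initialized from the input board before this loop)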
for (let i = 0; i < 4; i++) {
if (i > 0) {
currentBoard = rotateBoard90(currentBoard);
currentMapping = currentMapping.map(rotateIndex90);
}
// Test current rotation
if (boardToKey(currentBoard) < bestKey) {
bestKey = boardToKey(currentBoard);
bestBoard = currentBoard.slice();
bestMapping = currentMapping.slice();
}
// Test reflection of current rotation
const reflected = reflectBoardVertical(currentBoard);
if (boardToKey(reflected) < bestKey) {
bestKey = boardToKey(reflected);
bestBoard = reflected;
bestMapping = currentMapping.map(reflectIndexVertical);
}
}
The function returns both the canonical board and an index mapping, so after the neural network outputs its move, the result can be transformed back to the original orientation. Results are cached using an LRU cache for performance.
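For reference, the helpers could look roughly like this (my reconstructions based on the names above; I’m assuming reflectBoardVertical swaps the top and bottom rows, and either reflection axis works since the four rotations cover the remaining symmetries):

```ts
// Rotate a flat, row-major 3x3 board 90 degrees clockwise.
function rotateBoard90<T>(board: T[]): T[] {
  const rotated = new Array<T>(9);
  for (let row = 0; row < 3; row++) {
    for (let col = 0; col < 3; col++) {
      rotated[row * 3 + col] = board[(2 - col) * 3 + row];
    }
  }
  return rotated;
}

// Mirror the board top-to-bottom (assumed reading of "vertical" reflection).
function reflectBoardVertical<T>(board: T[]): T[] {
  return [...board.slice(6, 9), ...board.slice(3, 6), ...board.slice(0, 3)];
}

// Join cells into a string so "lexicographically smallest" is well defined.
function boardToKey(board: Array<number | string>): string {
  return board.join("");
}
```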
| Metric | Count | Description |
|---|---|---|
| State Space (Simple) | 5,478 | All reachable legal arrangements of X and O. |
| State Space (Unique) | 765 | All reachable arrangements, removing rotational/reflective duplicates. |
| Game Tree Complexity | 26,830 | The number of distinct games (sequences of moves, not just static snapshots) when rotations and reflections are treated as equivalent. |
After some experimentation, I decided to choose the canonical board state based on the lowest numerical representation of only the opponent’s pieces. This seemed to help the AI learn defensive strategies more effectively, since it would see similar board states more frequently. Canonical states are not spatially consistent, so similar board states might end up in different orientations, which probably hindered learning; anchoring the orientation to only the opponent’s pieces seemed to improve learning speed.
Another innovation was choosing an arbitrary input for the null state (an empty board), which would otherwise feed the network all zeroes. I decided to use zeroes for the AI pieces and ones for the opponent pieces, which seemed to help the network choose more consistent opening moves.
Gauntlet Opponent
This was another area of significant trial and error. Tic Tac Toe is a solved game. You can create a perfect player using the Minimax algorithm. However, a perfect Minimax AI crushes untrained networks every time, creating a learning bottleneck. To address this, I created a Sleeper AI that plays random moves for the first two turns before switching to perfect play. This exposes the networks to a wider variety of board states and allows them to learn more effectively. The random opening moves create more diverse board states, avoiding overfitting to a narrow set of scenarios.
I experimented with a stable of AI opponents but the Minimax AI with random openings worked best.
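A rough sketch of the Sleeper strategy, assuming a minimaxMove helper and the Cell type from the encoding sketch above:

```ts
// Plays randomly for its first two turns, then switches to perfect play.
function sleeperMove(board: Cell[], movesPlayed: number): number {
  const emptyCells = board
    .map((cell, i) => (cell === null ? i : -1))
    .filter((i) => i !== -1);

  if (movesPlayed < 2) {
    // Random opening moves expose the networks to a wider variety of board states.
    return emptyCells[Math.floor(Math.random() * emptyCells.length)];
  }
  // From here on, play like the perfect Minimax opponent.
  return minimaxMove(board);
}
```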
Worker Pools and Worker Actions
I extracted a reusable worker pool and worker action system that can run
arbitrary tasks in web workers. These are now part of the neat-evolution library
as the
@neat-evolution/worker-pool
and
@neat-evolution/worker-actions
modules. This simplified the code for
@neat-evolution/worker-evaluator
and
@neat-evolution/worker-reproducer,
allowing them to focus on their core responsibilities while delegating worker
management to the pool.
The architecture has two layers: WorkerPool manages thread lifecycle using
semaphore-based concurrency control (via
async-sema), while WorkerActions
provides the communication layer. The pool uses a LIFO stack for worker reuse,
promoting cache locality, and decouples thread count from task concurrency—you
can queue more tasks than you have threads.
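A stripped-down sketch of that shape using async-sema (a simplification, not the actual @neat-evolution/worker-pool API):

```ts
import { Sema } from "async-sema";

class TinyWorkerPool {
  private readonly idle: Worker[] = []; // LIFO stack: the most recently used worker is reused first
  private readonly sema: Sema;

  constructor(private readonly createWorker: () => Worker, threadCount: number) {
    this.sema = new Sema(threadCount); // caps concurrent tasks at the thread count
  }

  // More tasks than threads can be queued; the semaphore gates execution.
  async run<T>(task: (worker: Worker) => Promise<T>): Promise<T> {
    await this.sema.acquire();
    const worker = this.idle.pop() ?? this.createWorker();
    try {
      return await task(worker);
    } finally {
      this.idle.push(worker); // return the worker to the top of the stack
      this.sema.release();
    }
  }
}
```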
Flux Standard Actions for Workers
The action system is inspired by
redux-actions and the Flux
Standard Action pattern. The
createAction
function returns action creators with a nice trick: toString() returns the
action type, so action creators can be used directly as object keys:
export function createAction<P = any>(
type: string,
payloadCreator: (...args: any[]) => P = identityPayloadCreator<P>,
) {
const actionCreator = (...args: any[]): WorkerAction<P> => {
const payload = payloadCreator(...args);
return { type, payload };
};
actionCreator.toString = () => type;
return actionCreator;
}
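A hypothetical usage sketch (the action name is made up) showing the toString() trick in practice:

```ts
// Illustrative action, not one that exists in the library.
const evaluateGenome = createAction<{ genomeId: string }>("EVALUATE_GENOME");

// A template literal invokes toString(), so the creator doubles as the handler key.
const handlers: Record<string, (action: WorkerAction<any>) => void> = {
  [`${evaluateGenome}`]: (action) => {
    console.log("evaluating", action.payload.genomeId);
  },
};

// Build an action and look up its handler by type.
const action = evaluateGenome({ genomeId: "abc123" });
handlers[action.type]?.(action);
```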
Dual Message Handling
The system supports two communication patterns. The first is RPC-style
request/response: dispatcher.request(action) returns a promise that resolves
when the worker responds. Request IDs use base36 encoding for compactness
(counter.toString(36)) rather than UUIDs.
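A stripped-down sketch of the request/response correlation (the message envelope shape is my assumption; only the base36 counter detail comes from the library):

```ts
// Pending requests keyed by id; each resolves when a matching response arrives.
const pending = new Map<string, (payload: unknown) => void>();
let counter = 0;

function request(worker: Worker, action: { type: string; payload: unknown }): Promise<unknown> {
  const id = (counter++).toString(36); // compact base36 id instead of a UUID
  return new Promise((resolve) => {
    pending.set(id, resolve);
    worker.postMessage({ id, ...action });
  });
}

// Returns true if the message correlated to a pending request.
function handleResponse(message: { id?: string; payload?: unknown }): boolean {
  if (message.id == null) return false;
  const resolve = pending.get(message.id);
  if (resolve == null) return false;
  pending.delete(message.id);
  resolve(message.payload);
  return true;
}
```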
The second pattern is spontaneous events: workers can emit events that the main thread listens for. The dispatcher handles both in a single message handler:
class Dispatcher {
  // (abridged excerpt; parameters simplified)
  private _onMessage(action: any, context: any) {
    // 1. Handle RPC responses (correlate request → response)
    if (this.requestManager.handleResponse(action)) {
      return;
    }
    // 2. Handle spontaneous events from worker
    const listeners = this.eventListeners.get(action.type);
    if (listeners != null) {
      for (const listener of listeners) {
        listener(action, context);
      }
    }
  }
}
A few other details I’m pleased with: workers signal readiness via a
WORKER_READY message, but there’s a failsafe timeout that auto-sends the ready
signal if the developer forgets to call handler.ready(). The system also
supports Transferable objects for zero-copy passing of ArrayBuffers between
threads.
A Little Help From My Friends
This would not have been possible without my friends Gemini CLI and Claude Code. These tools have been rapidly advancing and picking this project back up after a few months was made much easier with their help. I was particularly impressed with Claude Code’s ability to make smart refactors and remain effective even for long coding sessions. It took some learning to develop a productive workflow but once I got the hang of it, it was quite addictive.
Finalizing this project, getting all of the code committed and released, was
almost fun. I like to use conventional commits and automated release processes,
and using Claude Code to make atomic commits, create the PR with gh, and monitor
the release process was a dream come true. It was really nice to hand over the
bulk of the work of rebasing and squashing a jungle of 200+ commits into a nice
clean PR.
These tools are amazing when they work and spooky when they don’t. It’s like working with the mythical 10x developer who occasionally loses their mind and morphs into a confused intern. They are not a replacement for an experienced developer, but they can be a force multiplier for one. Even if you don’t fully trust the code they write, the workflow gains are real: I will never go back to committing and releasing code the old-fashioned way.
Next Steps
It would be possible to expand the settings to expose many more of the tunable
parameters, such as the weights for the fitness function or the number of
gauntlet games. As part of getting this all working I ended up creating an
EvolutionManager class that abstracted away much of the boilerplate around
managing the NEAT instance, the evaluator, the reproducer, and the training
loop. This could be further abstracted to allow for more flexible configurations
as it currently bakes in the specific configuration needed for the Tic Tac Toe
demo.
Conclusion
This project pushed me to learn far more about Tic Tac Toe than I ever expected. The Glicko-2 rating system, board symmetry optimization, and Swiss tournament pairing were all new territory. The tournament strategy alone required extending the neat-evolution library in ways I hadn’t anticipated, which led to the worker-actions RPC system I’m now quite pleased with.
What I’m most satisfied with is how production-ready the result feels: training persists across page reloads, algorithms can be hot-swapped without losing progress, and it runs smoothly on mobile devices. The persistence layer makes it feel like a real application rather than a throwaway demo.
If you want to see a neural network learn in real-time, give it a try. Watch the fitness climb from 0.5 to 0.8 over a few dozen generations, and see if you can still beat it once it’s had a chance to evolve (you will be able to, for certain).
Written by Grady Kuhnline.