How to teach neural network not to lose?

jaroslav.tavgen · Post by **jaroslav.tavgen** » Fri Feb 27, 2026 9:21 pm

Hi! Could you help me with building a neural network?

To check whether I understand anything about neural networks (I probably don't, LOL), I've decided to teach an NN to play 4x4 tic-tac-toe.

And I always run into the same problem: the neural network learns to play very well but never learns to play perfectly.

For example, the NN that is learning how not to lose as X (it treats a victory and a draw the same way) loses between 14 and 40 games per 10 000 games. It seems that it either stops learning or learns so slowly that the progress is not noticeable and perfecting the game would take forever.

The neural network has:

32 input neurons (each being 0 or 1 for crosses and naughts)
8 hidden layers of 32 neurons each
one output layer
all activation functions are sigmoid
learning rate: 0.00001
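
For readers who want to picture the architecture listed above, here is a minimal sketch of the forward pass in plain Python. The weight initialization (uniform, scaled by the layer width), the single output neuron, and the names `init_network` and `forward` are assumptions for illustration; the post does not say how the weights are initialized or what the original code looks like.

```python
import math
import random

# 32 inputs, 8 hidden layers of 32 neurons, 1 output neuron (as described in the post).
LAYER_SIZES = [32] + [32] * 8 + [1]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(rng):
    """Return one (weights, biases) pair per layer; weights[j][i] links input i to neuron j."""
    layers = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        scale = 1.0 / math.sqrt(n_in)  # assumed init scale, not taken from the post
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Propagate a 32-element input through every layer, sigmoid everywhere."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations[0]  # the single output score, strictly between 0 and 1
```

Calling `forward(init_network(random.Random(0)), [0.0] * 32)` returns one score between 0 and 1 for the empty board.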

The neural network learns as follows: it plays 10 000 games in which crosses are played by the neural network and naughts play random moves. The program counts how many times crosses and naughts won. The neural network does not learn during this phase.

After 10 000 games are played, the statistics are printed out (and then reset) and the learning mode is turned on. Now the program does not keep statistics (how many times crosses or naughts won), but it saves the board state (32 neurons reflecting crosses and naughts, each square being 0 or 1) after each move of the crosses. Once the game is finished, it stores the board state after the last move of the crosses as a new input. If the game ended in a draw or a crosses victory, the new output is 1. If the naughts won, the output is 0.
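
The labeling rule described above can be sketched as follows. The input layout (indices 0 to 15 for crosses, 16 to 31 for naughts) and the names `encode`, `label_for` and `training_sample` are assumptions made for this example; the post only says there are 32 inputs, each 0 or 1, and that the position after the last move of the crosses is stored with the game result.

```python
def encode(board):
    """Turn 16 cells ('X', 'O' or None) into 32 inputs of 0.0 / 1.0.

    Assumed layout: the first 16 inputs mark crosses, the last 16 mark naughts.
    """
    crosses = [1.0 if cell == "X" else 0.0 for cell in board]
    naughts = [1.0 if cell == "O" else 0.0 for cell in board]
    return crosses + naughts


def label_for(winner):
    """Target output: 1 for a crosses win or a draw (winner is None), 0 for a naughts win."""
    return 0.0 if winner == "O" else 1.0


def training_sample(states_after_x_moves, winner):
    """Pair the position after the last crosses move with the result of the game."""
    last_state = states_after_x_moves[-1]
    return encode(last_state), label_for(winner)
```

Under this reading each game contributes exactly one (input, output) pair, taken from the final position the crosses created.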

Once there are 32 input/output pairs, the neural network learns from them for one epoch (backpropagation). Then the counter is reset and the program has to collect 32 new input/output pairs from the next games. It keeps doing so until the next 10 000 games are played. The statistics are neither kept nor printed: the neural network was learning the whole time, so the statistics would not describe a fixed network.
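
The collect-32-then-train step can be sketched as a small buffer. The class name `BatchBuffer` and the `train_step` callback are hypothetical; the callback stands in for whatever backpropagation routine the original program uses, which the post does not show.

```python
class BatchBuffer:
    """Collects (input, output) pairs and triggers one training pass per full batch."""

    def __init__(self, train_step, batch_size=32):
        self.train_step = train_step  # hypothetical callback: receives a list of pairs
        self.batch_size = batch_size
        self.samples = []

    def add(self, inputs, target):
        """Store one pair; train and reset once the batch is full. Returns True if trained."""
        self.samples.append((inputs, target))
        if len(self.samples) == self.batch_size:
            self.train_step(self.samples)
            self.samples = []  # start collecting the next batch from scratch
            return True
        return False
```

With one sample per game, 10 000 learning games would give 312 full batches, with 16 samples left over in the buffer at the end of the phase.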

Then the learning mode is turned off and the statistics of how many times crosses or naughts won are collected again, but the neural network does not learn during these 10 000 games. And so on: the two phases always alternate.
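
The alternation between the two phases might look roughly like this. `play_game` and `on_sample` are hypothetical callbacks: the first is assumed to play one game and return the winner (`"X"`, `"O"` or `"draw"`) together with a training sample, the second is assumed to receive that sample during the learning phase.

```python
def run_cycles(play_game, on_sample, n_cycles, games_per_phase=10_000):
    """Alternate an evaluation phase (statistics only) with a learning phase (samples only)."""
    history = []
    for _ in range(n_cycles):
        # Evaluation phase: count results, never train.
        stats = {"X": 0, "O": 0, "draw": 0}
        for _ in range(games_per_phase):
            winner, _sample = play_game()
            stats[winner] += 1
        history.append(stats)

        # Learning phase: hand every sample to the trainer, keep no statistics.
        for _ in range(games_per_phase):
            _winner, sample = play_game()
            on_sample(sample)
    return history
```

Keeping the evaluation games free of weight updates means each printed loss count describes one fixed network.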

And this neural network hits the same roadblock (14-40 lost games per 10 000). I know that this is not the limit: I've managed to create neural networks that lost only 0 to 10 games per 10 000. But I still could not make a neural network learn not to lose at all.

What should I do to improve the understanding of the neural network?

hgm · Post by **hgm** » Fri Feb 27, 2026 10:05 pm

How does the output layer indicate which move should be played? Is there one output for each cell, and do you play in the cell with the highest output?

jaroslav.tavgen · Post by **jaroslav.tavgen** » Fri Feb 27, 2026 10:56 pm

hgm wrote: Fri Feb 27, 2026 10:05 pm How does the output layer indicate which move should be played? Is there one output for each cell, and do you play in the cell with the highest output?

Yes, the move which gets the highest score at the output layer gets played.

Pseudocode:


best_result = -infinity
for i in 0 .. number_of_empty_squares - 1:
    board[empty_squares[i]] = 1      // try the move
    ...forward propagation           // gives result
    if result > best_result:
        best_result = result
        best_move = empty_squares[i]
    board[empty_squares[i]] = 0      // undo the move
...
board[best_move] = whoseTurn
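
A runnable Python version of the pseudocode above could look like the sketch below. It assumes the network scores a 32-element input list and that the crosses occupy the indices being tried; `evaluate` is a hypothetical stand-in for the forward propagation, and `choose_move` is a name invented for this example.

```python
def choose_move(inputs, empty_squares, evaluate):
    """Try each empty square, score the resulting position, return the best square."""
    best_move = None
    best_result = float("-inf")
    for square in empty_squares:
        inputs[square] = 1.0        # place a cross on this square
        result = evaluate(inputs)   # forward propagation, assumed to return one score
        inputs[square] = 0.0        # take the cross back
        if result > best_result:
            best_result = result
            best_move = square
    return best_move
```

The function returns `None` when there are no empty squares, and it leaves `inputs` unchanged, so the caller still has to write the chosen move onto the board.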
