question about gensfen


jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

question about gensfen

Post by jdart »

I am looking at:

https://github.com/nodchip/Stockfish/bl ... ensfen.cpp

The question is: it looks like by default it occasionally inserts a random move into the played games to increase variety. But then, when the game has ended, it assigns the same result to all the FENs generated from that game. Isn't inserting a random move likely to invalidate the result for any positions before the random move, though? For example, if the random move was a really bad one and the game was lost, you'd assign a "loss" result to the earlier FENs, even though the player might have been doing well before that point.

You can certainly have the same issue when random moves are not inserted, because a move returned from search can be a blunder too, and spoil the game. But it seems to me that the chances of that occurring are higher if you are inserting random moves.

It appears, though, that the code just doesn't worry about this.
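Roughly, as I read it, the labeling step boils down to something like this (hypothetical names, not the actual gensfen code):

Code: Select all

// Sketch of the labeling scheme as I read it (hypothetical names,
// not the actual gensfen code): every position from the game gets
// the single final result, including positions played before any
// randomly inserted move.
#include <string>
#include <vector>

struct TrainingEntry {
    std::string fen;  // position as it occurred in the game
    int result;       // +1 / 0 / -1 from white's point of view
};

std::vector<TrainingEntry> labelGame(const std::vector<std::string>& fens,
                                     int finalResult) {
    std::vector<TrainingEntry> out;
    out.reserve(fens.size());
    for (const std::string& fen : fens)
        out.push_back({fen, finalResult});  // same label for ALL fens
    return out;
}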

--Jon
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: question about gensfen

Post by Daniel Shawul »

You need some randomness for variety and this is often controlled by the "temperature" parameter for MCTS-NN reinforcement learning.
I wouldn't worry about the final score being assigned to all positions as long as the randomness (temperature) is limited to a reasonable value.

I use a similar approach for AB-NNUE selfplay game generation as for regular MCTS-NN training.
Up to move 30, moves are sampled based on their score, i.e. after doing a multi-PV search over all moves and obtaining their scores.
You could use the root moves' node counts for sampling if you don't want to do a multi-PV search.
Using temperature does not affect strength that much unless it is set really high.
The way SF NNUE does it, by inserting a random move, looks like an ad hoc solution compared to the more disciplined AlphaZero approach.
Also make sure that white and black use the same number of random moves in a given game, otherwise the result will be heavily biased to one side.
I had a bug where black was playing one less random move than white (ply < 30 instead of ply <= 30), and its winning percentage was higher
than white's. Once I fixed the bug, white started winning 54% of the random games, as expected ...
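A minimal sketch of that kind of score-based sampling, i.e. a softmax with temperature over the multi-PV root scores (a hypothetical helper, not Daniel's actual code; scores assumed in centipawns from the side to move's point of view):

Code: Select all

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Returns an index into 'scoresCp', picked with probability
// proportional to exp(score / T). Small T approaches always playing
// the best move; large T flattens the distribution toward uniform.
std::size_t sampleMove(const std::vector<double>& scoresCp,
                       double temperatureCp, std::mt19937& rng) {
    double best = *std::max_element(scoresCp.begin(), scoresCp.end());
    std::vector<double> weights;
    weights.reserve(scoresCp.size());
    for (double s : scoresCp)  // subtract max for numerical stability
        weights.push_back(std::exp((s - best) / temperatureCp));
    std::discrete_distribution<std::size_t> dist(weights.begin(),
                                                 weights.end());
    return dist(rng);
}

Note that the cutoff that stops the sampling (e.g. ply <= 30) has to give both colors the same number of sampled moves, which is exactly the ply < 30 vs. ply <= 30 bug described above.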
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: question about gensfen

Post by Joerg Oster »

Daniel Shawul wrote: Sat Dec 26, 2020 3:31 am You need some randomness for variety and this is often controlled by the "temperature" parameter for MCTS-NN reinforcement learning. [...]
But is this really needed?

Networks like the ones in AlphaZero or Lc0, which also provide a policy head,
need to learn to differentiate between good and bad moves.
This doesn't seem to be the case for pure eval networks like NNUE ...
Jörg Oster
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: question about gensfen

Post by Joost Buijs »

jdart wrote: Sat Dec 26, 2020 2:20 am [...] Isn't inserting a random move likely to invalidate the result for any positions before the random move, though? [...]
This is something that worries me too.

For the last year I've been quite busy generating labeled FENs for a draughts program I'm working on, and I was not happy with the results I got by using FENs from self-play games with random moves inserted, basically because inserting random moves invalidates the result. When you play enough games all these errors will probably average out, but it just doesn't seem right.

So I decided to use a different approach: generating a large number of positions from self-play games at very shallow depth, using root shuffling to get enough randomness, and labeling them afterwards by playing a game from each position with a normal, non-randomized search.

Generating games in this way can be done very quickly; however, labeling them afterwards costs an enormous amount of time. It is doable if you limit the number of positions by randomly picking positions from these games and using a not-too-high search depth for labeling. Many positions are already near the endgame, so those games end quickly.
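In pseudo-C++ the two phases look roughly like this (both engine hooks are hypothetical names, just to make the idea concrete):

Code: Select all

// Rough sketch of the two-phase approach: cheap randomized games only
// to harvest positions, then a clean playout per position for the label.
#include <string>
#include <vector>

// Assumed engine hooks, named here for illustration only.
std::string sampleFenFromShallowSelfPlay();          // phase 1
int playOutGame(const std::string& fen, int depth);  // phase 2: +1/0/-1

struct Labeled { std::string fen; int result; };

std::vector<Labeled> buildTrainingSet(int nPositions, int labelDepth) {
    std::vector<Labeled> set;
    set.reserve(nPositions);
    for (int i = 0; i < nPositions; ++i) {
        // Phase 1: shallow, root-shuffled self-play; the result of
        // this noisy game is thrown away.
        std::string fen = sampleFenFromShallowSelfPlay();
        // Phase 2: a normal, non-randomized playout from the position
        // itself, so the label reflects this exact position.
        set.push_back({fen, playOutGame(fen, labelDepth)});
    }
    return set;
}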

Currently I use the same approach for chess; the difference is that I started by extracting positions from a large number of internet games instead of generating them myself. Games from the past and present, human and computer, in the hope that this gives enough randomness. The results of these games are very unreliable, so I have to relabel them too. When I don't need my machines for something else I can run 60 games in parallel for labeling, and my database grows little by little.

Others seem to be using positions from noobpwnftw's database, something I didn't look at yet.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: question about gensfen

Post by Daniel Shawul »

Joerg Oster wrote: Sat Dec 26, 2020 5:14 am
Daniel Shawul wrote: Sat Dec 26, 2020 3:31 am You need some randomness for variety and this is often controlled by the "temperature" parameter for MCTS-NN reinforcement learning. [...]
But is this really needed?

Networks like the ones in AlphaZero or Lc0, which also provide a policy head,
need to learn to differentiate between good and bad moves.
This doesn't seem to be the case for pure eval networks like NNUE ...
The real reason why you need randomness is to explore the reasonable set of games (openings etc.) the engine is going to play.
For example, you want to explore g2g4 much less than e2e4.
Given the limited capacity of nets, one needs to prioritize learning one set of positions over another.
Without temperature, the engine is going to learn a very narrow (if not just one) set of games.

Sampling based on the visit distribution of an 800-node MCTS search mirrors the move choices the engine is making inside the tree.
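As a concrete sketch, visit-count sampling with a temperature looks like this (a generic illustration of the pi(a) proportional to N(a)^(1/T) scheme from AlphaZero, not Scorpio's actual code):

Code: Select all

// Sample a root move index with probability proportional to
// N(a)^(1/T). T = 1 reproduces the raw visit distribution; T -> 0
// degenerates to always picking the most-visited move.
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

std::size_t sampleFromVisits(const std::vector<int>& visits,
                             double temperature, std::mt19937& rng) {
    std::vector<double> weights;
    weights.reserve(visits.size());
    for (int n : visits)
        weights.push_back(std::pow(static_cast<double>(n),
                                   1.0 / temperature));
    std::discrete_distribution<std::size_t> dist(weights.begin(),
                                                 weights.end());
    return dist(rng);
}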
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: question about gensfen

Post by Daniel Shawul »

Joost Buijs wrote: Sat Dec 26, 2020 9:08 am This is something that worries me too. [...] So I decided to use a different approach: generating a large number of positions from self-play games at very shallow depth, using root shuffling to get enough randomness, and labeling them afterwards by playing a game from each position with a normal, non-randomized search. [...]
Lc0 tried using a book and removing temperature altogether, and it didn't work out that well.
Also, randomizing (having blunders) has benefits.
You need bad games in your training data at all points of your training, otherwise your net will forget basic stuff such as piece values.
I believe the SF NNUE trainers also observed similar behavior when using larger-depth (say d20) games for training.
Another advantage is that temperature makes your net prefer less blundery positions. That is why Lc0 shuffles and shuffles to eventually win some dead-draw-looking positions. AlphaZero didn't actually use temperature for endgames (no temperature from move 15 onwards) but Lc0 did find some benefit in using temperature there too. I do it like A0.

If temperature worries you so much, you can use "game splitting" where you have an alternative game starting from the blunder position.
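A rough sketch of one way such a split could be triggered (hypothetical interfaces, and the centipawn threshold is my own illustration, not a prescribed value):

Code: Select all

// One move of a self-play game with "game splitting": if the sampled
// move scores far below the best move, the pre-move position is queued
// so a second, clean game can later be played out from it.
#include <deque>
#include <string>

struct Position { std::string fen; /* full board state etc. */ };
struct StepResult { int bestScoreCp; int playedScoreCp; };

// Assumed engine hook: searches 'pos', plays a (possibly sampled)
// move in place, and reports best vs. played move scores.
StepResult searchAndPlaySampledMove(Position& pos);

void stepWithSplitting(Position& pos, std::deque<Position>& splitQueue,
                       int blunderThresholdCp = 200) {
    Position beforeMove = pos;  // snapshot before the move
    StepResult r = searchAndPlaySampledMove(pos);
    if (r.bestScoreCp - r.playedScoreCp > blunderThresholdCp)
        splitQueue.push_back(beforeMove);  // replay later with the best
                                           // move and label the earlier
                                           // positions from that game
}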
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: question about gensfen

Post by Joerg Oster »

Daniel Shawul wrote: Sat Dec 26, 2020 11:39 am The real reason why you need randomness is to explore the reasonable set of games (openings etc.) the engine is going to play. [...] Without temperature, the engine is going to learn a very narrow (if not just one) set of games.
Yes, but unlike what AlphaZero and Lc0 are/were doing, we are not limited to only playing games from the start position.
We can use all kinds of different start positions, with all kinds of imbalances, from all game phases, Chess960 etc., covering as many king and piece placements as possible.
Training (shallow) NNUEs seems to have more in common with tuning a handcrafted eval than with the 'learning' of deep NNs, which also guide a P-UCT search with their policy head.

Edit: one indication for this is the successful SPSA-tuning of Stockfish's NNUE, see https://github.com/official-stockfish/S ... 2ffd08005d
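For reference, one SPSA step looks like this in its textbook form (gradient-ascent variant for maximizing a measured objective such as match score; the generic algorithm, not fishtest's exact implementation):

Code: Select all

// One SPSA iteration: perturb all parameters simultaneously with a
// random +/-1 vector, evaluate the objective twice, and update every
// parameter from that single estimated directional derivative.
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

void spsaStep(std::vector<double>& theta,
              const std::function<double(const std::vector<double>&)>& f,
              double a, double c, std::mt19937& rng) {
    std::bernoulli_distribution coin(0.5);
    std::vector<double> delta(theta.size());
    for (double& d : delta)
        d = coin(rng) ? 1.0 : -1.0;

    std::vector<double> plus = theta, minus = theta;
    for (std::size_t i = 0; i < theta.size(); ++i) {
        plus[i]  += c * delta[i];
        minus[i] -= c * delta[i];
    }
    // Only two evaluations of f, regardless of the number of parameters.
    double diff = f(plus) - f(minus);
    for (std::size_t i = 0; i < theta.size(); ++i)
        theta[i] += a * diff / (2.0 * c * delta[i]);  // ascent step
}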
Jörg Oster
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: question about gensfen

Post by syzygy »

The obvious way to introduce randomisation would seem to be in the opening. First play some reasonable opening moves from a book, then play some moves randomly picked from the top 5 or so, then play the rest normally. When training, ignore everything up to the last random move.
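Something like this, perhaps (all the engine hooks here are made-up names, just to make the idea concrete):

Code: Select all

// Book moves first, then a few plies picked uniformly from the top 5
// moves of a multi-PV search, then normal play; only positions after
// the last random move are kept for training.
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Assumed engine hooks, for illustration only.
std::string bookMove();
std::vector<std::string> top5MultiPvMoves();
std::string bestMove();
void playMove(const std::string& move);
std::string currentFen();
bool gameOver();

std::vector<std::string> generateTrainingFens(int bookPlies,
                                              int randomPlies,
                                              std::mt19937& rng) {
    for (int i = 0; i < bookPlies && !gameOver(); ++i)
        playMove(bookMove());

    for (int i = 0; i < randomPlies && !gameOver(); ++i) {
        std::vector<std::string> top5 = top5MultiPvMoves();
        std::uniform_int_distribution<std::size_t> pick(0, top5.size() - 1);
        playMove(top5[pick(rng)]);  // random pick from the top 5
    }

    std::vector<std::string> fens;  // everything before this point
    while (!gameOver()) {           // is deliberately ignored
        fens.push_back(currentFen());
        playMove(bestMove());
    }
    return fens;
}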

But I have absolutely no experience with training neural networks :P
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: question about gensfen

Post by Joost Buijs »

Daniel Shawul wrote: Sat Dec 26, 2020 11:52 am [...] If temperature worries you so much, you can use "game splitting" where you have an alternative game starting from the blunder position.
I agree that you need randomness, and that you need all kinds of positions, lost ones with a large material difference too. This doesn't change the fact that labeling a clearly lost position as a win (or vice versa), because the remainder of the game was played sub-optimally due to randomness, is undesirable.

Game splitting is an option, but it is not so easy to determine what the blunder position actually is. This is why I decided to label the positions afterwards; it also has the benefit that positions seen during labeling that are not yet in the database can be added as well. The only difficulty is that this gets rather slow with a very large number of positions: I keep a binary tree of hashes in memory to determine whether a position has been seen before, and with some hundred million positions this process starts getting a bit slow.
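One alternative to the binary tree, sketched below, is a flat hash set of 64-bit position keys (e.g. Zobrist hashes), which gives expected O(1) lookups and better cache behavior, at the cost of accepting the tiny 64-bit collision risk:

Code: Select all

// De-duplicate positions by their 64-bit key instead of walking a
// binary tree; std::unordered_set does the hashing and bucketing.
#include <cstdint>
#include <unordered_set>

struct SeenPositions {
    std::unordered_set<std::uint64_t> keys;

    // Returns true if the key was not seen before (and records it),
    // false if the position is a duplicate.
    bool checkAndInsert(std::uint64_t zobristKey) {
        return keys.insert(zobristKey).second;
    }
};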
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: question about gensfen

Post by jdart »

For eval training (Texel method) I have also "decided to label the positions afterwards," as Joost Buijs did. I have used several methods to generate them, including sampling from the search. But then I play out games from the positions using this script: https://github.com/jdart1/arasan-chess/ ... sitions.py.
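For reference, the error that Texel-method tuning minimizes over such labeled positions is usually written like this (a common formulation; the scaling constant K is fitted to the data first, and this is not necessarily Arasan's exact variant):

Code: Select all

// Mean squared error between game results and a logistic mapping of
// the static eval: p = 1 / (1 + 10^(-K * eval / 400)).
#include <cmath>
#include <vector>

struct Sample {
    double evalCp;  // static eval in centipawns, white's point of view
    double result;  // game result: 1 = white win, 0.5 = draw, 0 = loss
};

double texelError(const std::vector<Sample>& samples, double K) {
    double err = 0.0;
    for (const Sample& s : samples) {
        double p = 1.0 / (1.0 + std::pow(10.0, -K * s.evalCp / 400.0));
        err += (s.result - p) * (s.result - p);
    }
    return err / static_cast<double>(samples.size());
}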