AlphaZero

Discussion of chess software programming and technical issues.


Alexander Lim
Posts: 43
Joined: Sun Mar 10, 2019 1:16 am
Full name: Alexander Lim

Re: AlphaZero

Post by Alexander Lim »

Fafkorn wrote: Tue Apr 28, 2020 6:42 pm I have another question regarding this topic. When I get my policy from the neural network (in my case 4096 numbers), let's say only 3 moves are legal, with policy values 0.1, 0.1, 0.1. This affects the balance between the position reward and the visit term in my U(s, a). Do I have to normalize the policy values, and how can I do this?
Yes, you should normalise the values.

For example, let's suppose there are only 3 legal moves with values of, say, 0.15, 0.4 and 0.25 (so the remaining 0.2 is spread out amongst the 4093 illegal moves). You now normalise by dividing each value by their sum:

0.15 + 0.4 + 0.25 = 0.8

0.15 / 0.8 = 0.1875
0.4 / 0.8 = 0.5
0.25 / 0.8 = 0.3125

So your new policy values are 0.1875, 0.5, 0.3125 which now sum to 1.

In short, you disregard the illegal moves and normalise only over the legal moves.
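
A minimal sketch of that renormalisation in Python/NumPy (the function name and move indices are just for illustration):

import numpy as np

def normalise_policy(policy, legal_indices):
    # policy: raw length-4096 policy vector from the net
    # legal_indices: indices of the legal moves in that encoding
    legal = policy[legal_indices]
    return legal / legal.sum()

# The example above: raw values 0.15, 0.4, 0.25 on three legal moves,
# with the leftover 0.2 spread over the 4093 illegal moves.
policy = np.full(4096, 0.2 / 4093)
policy[[0, 1, 2]] = [0.15, 0.40, 0.25]      # illustrative move indices
print(normalise_policy(policy, [0, 1, 2]))  # [0.1875 0.5    0.3125]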
Fafkorn
Posts: 16
Joined: Tue Apr 14, 2020 1:15 pm
Full name: Pawel Wojcik

Re: AlphaZero

Post by Fafkorn »

Thank you very much. I couldn't find this in any paper.
Fafkorn
Posts: 16
Joined: Tue Apr 14, 2020 1:15 pm
Full name: Pawel Wojcik

Re: AlphaZero

Post by Fafkorn »

When I perform pre-training, I use GM games. Should I encode each played move as the policy target, i.e. (in my case) 4095 zeros and a single 1?
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: AlphaZero

Post by brianr »

I don't know what you mean by pre-training, but here is a suggestion.

Doing supervised learning (SL) from games with only the move actually made set to 1 turns out to reduce net strength by about 100-150 Elo. Training games from Lc0 include policy probabilities for all moves and produce stronger nets. It is possible to re-score PGN files and add some policy moves from short searches. A sample data file has already been provided (with an explanation) here:
https://github.com/dkappe/leela-chess-w ... -Gyal-Data

Using this data can produce a significantly stronger net than the one-move policy alone, but still somewhat weaker than the "full" training data.
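
To make the difference concrete, here is a small illustration of a one-hot target versus a soft target built from search visit counts (my own sketch, using the 4096-move encoding discussed above; the indices and counts are made up):

import numpy as np

POLICY_SIZE = 4096  # the from-square x to-square encoding discussed earlier

# One-hot SL target: only the move actually played gets probability 1.
played = 100                            # illustrative move index
one_hot = np.zeros(POLICY_SIZE)
one_hot[played] = 1.0

# Soft target (as in Lc0 training data): normalised visit counts of a search.
visits = {100: 700, 200: 250, 300: 50}  # illustrative {move index: visits}
soft = np.zeros(POLICY_SIZE)
for move, n in visits.items():
    soft[move] = n
soft /= soft.sum()

# Either kind of target plugs into the usual cross-entropy policy loss:
def policy_loss(target, net_probs, eps=1e-9):
    return -np.sum(target * np.log(net_probs + eps))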
Fafkorn
Posts: 16
Joined: Tue Apr 14, 2020 1:15 pm
Full name: Pawel Wojcik

Re: AlphaZero

Post by Fafkorn »

I think that's what I've been looking for.
By pre-training I meant training the model with data from games played by pro players or engines. After that I wanted to start training the model with self-played games.
thomasahle
Posts: 94
Joined: Thu Feb 27, 2014 8:19 pm

Re: AlphaZero

Post by thomasahle »

Check out Fastchess if you're interested: https://github.com/thomasahle/fastchess . It's a Python implementation of the MCTS approach in the AlphaZero papers, and it uses the simplest "neural" network architecture possible: a linear function from the current boolean board to the next move (a 1895 x 4095 matrix).

If all you want is 2000 Elo, this should be more than enough. Fastchess is 1700-1800 Elo, and it is written in Python.

You need some data to train on. The best easily accessible data is the ccrl-v3 data from http://data.lczero.org/files/
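
If it helps, the idea behind the linear model is roughly the following sketch (using the dimensions quoted above; the real implementation is in the linked repo):

import numpy as np

N_FEATURES = 1895   # boolean board features, per the post
N_MOVES = 4095      # move classes, per the post

W = np.zeros((N_FEATURES, N_MOVES))      # the single weight matrix

def predict(board):
    # board: boolean feature vector of length N_FEATURES
    logits = board.astype(np.float64) @ W
    exp = np.exp(logits - logits.max())  # softmax over all moves
    return exp / exp.sum()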
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: AlphaZero

Post by brianr »

A couple of things. The CCRL Standard Dataset is quite valuable because it provides a trained-net benchmark to compare against a net you have trained from the same data. However, as mentioned earlier, the Bad Gyal data will produce stronger nets, since it carries more Policy information. And, of course, the actual Lc0 data has full Policy info and will produce even better nets.

In terms of pre-processing the data, be aware that Windows struggles with directories that contain a large number of files (more than about 30,000). So I try to limit the number of game/chunk files to about 30,000 per directory in a multi-level directory structure (usually one game per chunk, although some datasets, like I think the Bad Gyal data, have more than one game per chunk). If you try to open a directory with the 2 million files from the CCRL Standard dataset, it will appear as if your system is hanging for a very long time (even with a fast SSD). There may be some Windows tools to help with this, but I used Ubuntu to split the large directory into many smaller sub-directories, which only has to be done once.
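
A rough sketch of that one-time split (my own script shape, not the exact tool used):

import shutil
from pathlib import Path

def split_into_subdirs(src, chunk=30_000):
    # Move the files of one huge directory into numbered sub-directories
    # of at most `chunk` files each. Only has to run once per dataset.
    src = Path(src)
    files = sorted(p for p in src.iterdir() if p.is_file())
    for i, f in enumerate(files):
        sub = src / f"{i // chunk:04d}"
        sub.mkdir(exist_ok=True)
        shutil.move(str(f), str(sub / f.name))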
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: AlphaZero

Post by Milos »

Fafkorn wrote: Wed Apr 29, 2020 5:53 pm Thank you very much. I couldn't find this in any paper.
Using a softmax after the fully connected layer does what you need automatically. It's basic CNN stuff, pretty much the norm in any classification task.
Fafkorn
Posts: 16
Joined: Tue Apr 14, 2020 1:15 pm
Full name: Pawel Wojcik

Re: AlphaZero

Post by Fafkorn »

Yes, but a softmax doesn't know how to avoid giving non-zero values to illegal moves.
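
One standard workaround (not spelled out in this thread) is to mask the illegal logits to -inf before the softmax, so illegal moves come out with exactly zero probability; a sketch, assuming a length-4096 logit vector:

import numpy as np

def masked_softmax(logits, legal_mask):
    # legal_mask: boolean vector, True where the move is legal
    masked = np.where(legal_mask, logits, -np.inf)
    exp = np.exp(masked - masked[legal_mask].max())
    return exp / exp.sum()

logits = np.random.randn(4096)           # raw net outputs
legal = np.zeros(4096, dtype=bool)
legal[[0, 1, 2]] = True                  # illustrative legal moves
probs = masked_softmax(logits, legal)    # sums to 1, exactly 0 on illegal moves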
Fafkorn
Posts: 16
Joined: Tue Apr 14, 2020 1:15 pm
Full name: Pawel Wojcik

Re: AlphaZero

Post by Fafkorn »

Regarding the Fastchess comment:
I think I already have the MCTS implementation and the NN model built. I've started to train my model with some engine-game data. I don't know how to use the lczero data for my own purposes. I guess my policy size (4096 possible moves) is different from theirs.
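
For what it's worth, a 4096-entry policy usually corresponds to a plain from-square x to-square encoding, along the lines of this sketch (an assumption about the setup, not something stated in the thread):

def move_to_index(from_sq, to_sq):
    # from_sq, to_sq in 0..63 (e.g. a1 = 0 ... h8 = 63); 64 * 64 = 4096 classes.
    # Note: this cannot distinguish underpromotions sharing from/to squares.
    return from_sq * 64 + to_sq

As far as I know, Lc0 uses a different move encoding for its policy head, so its policy targets would have to be remapped onto the 4096-move layout before they could be used to train such a net.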