Fafkorn wrote: ↑Tue Apr 28, 2020 6:42 pm
I have another question on this topic. When I get my policy from the neural network (in my case 4096 numbers), suppose only 3 moves are legal, with policy values 0.1, 0.1, 0.1. This affects the balance between the position reward and the visit term in my U(s, a). Do I have to normalise the policy values, and if so, how?
Yes, you should normalise the values.
For example, suppose there are only 3 legal moves with values of say 0.15, 0.4, 0.25 (so the remaining 0.2 is spread out amongst the 4093 illegal moves). You now normalise by dividing each value by their sum: 0.15 + 0.4 + 0.25 = 0.8, which gives 0.1875, 0.5 and 0.3125, summing to 1.
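In code, this is just masking out the illegal entries and dividing by the sum of what remains. A minimal sketch, assuming the policy is a flat array of 4096 values (the function name and the fallback-to-uniform behaviour are my own choices, not from any particular engine):

```python
import numpy as np

def normalise_policy(policy, legal_indices):
    """Zero out illegal moves and renormalise so the legal priors sum to 1."""
    masked = np.zeros_like(policy)
    masked[legal_indices] = policy[legal_indices]
    total = masked.sum()
    if total > 0:
        return masked / total
    # Degenerate case: the net assigns ~0 to every legal move; fall back to uniform.
    masked[legal_indices] = 1.0 / len(legal_indices)
    return masked

# The 3-legal-move example from above:
policy = np.zeros(4096)
policy[[100, 200, 300]] = [0.15, 0.4, 0.25]
priors = normalise_policy(policy, [100, 200, 300])
# priors[100], priors[200], priors[300] -> 0.1875, 0.5, 0.3125
```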
I don't know what you mean by pre-training, but here's a suggestion.
Doing supervised learning (SL) from games with only the actually played move set to 1 turns out to reduce net strength by about 100-150 Elo. Training games from Lc0 include policy probabilities for all moves and produce stronger nets. It is possible to re-score PGN files and add some policy moves from short searches. A sample data file has already been provided (with explanation) here: https://github.com/dkappe/leela-chess-w ... -Gyal-Data
Using this data can produce a significantly stronger net than with only the one move policy, but still somewhat weaker than the "full" training data.
I think that's what I've been looking for.
By pre-training I meant to train model with data from games played by pro players or engines. After that I wanted to start training model with self-played games.
Check out Fastchess if you're interested: https://github.com/thomasahle/fastchess . It's a Python implementation of the MCTS approach in the AlphaZero papers, and it uses the simplest "neural" network architecture possible: a linear function from the current boolean board to the next move (a 1895 x 4095 matrix).
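That "simplest architecture" amounts to one matrix multiply followed by a softmax. A sketch of the idea (the dimensions are the ones quoted above, taken as illustrative; the random weights and feature layout are placeholders, not Fastchess's actual ones):

```python
import numpy as np

# Dimensions from the post; treat them as illustrative rather than exact.
N_FEATURES = 1895   # boolean board-representation features
N_MOVES = 4095      # output move slots

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(N_FEATURES, N_MOVES))  # the whole "net"

def move_probabilities(board_features):
    """One linear layer plus softmax: the simplest possible policy model."""
    logits = board_features @ W
    exp = np.exp(logits - logits.max())   # subtract max for numerical stability
    return exp / exp.sum()

board = rng.integers(0, 2, size=N_FEATURES).astype(float)  # dummy boolean board
probs = move_probabilities(board)        # shape (4095,), sums to 1
```

Training such a model is just multinomial logistic regression on (board, move) pairs, which is why it can be done quickly in pure Python.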
If all you want is 2000 Elo, this should be more than enough: Fastchess is 1700-1800 Elo, and it is written in Python.
You need some data to train on. The best easily accessible data is the ccrl-v3 data from http://data.lczero.org/files/
A couple of things. The CCRL Standard Dataset is quite valuable as it provides a trained net benchmark to compare against a net you have trained from the same data. However, as mentioned earlier, the Bad Gyal data will produce stronger nets having more Policy information. And, of course, the actual Lc0 data has full Policy info and will produce even better nets.
In terms of pre-processing the data, be aware that Windows struggles with directories containing a large number of files (more than about 30,000). So I try to limit the number of game/chunk files per directory to about 30,000, in a multi-level directory structure (chunks are usually one game each, though some datasets, like I think the Bad Gyal data, pack more than one game per chunk). If you try to open a directory with the 2 million files from the CCRL Standard dataset, the system will appear to hang for a very long time, even on a fast SSD. There may be Windows tools to help with this, but I used Ubuntu to split the large directory into many smaller sub-directories, which only has to be done once.
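The split can also be scripted rather than done by hand. A sketch in Python (the 30,000-per-directory figure comes from the post; the `chunk_NNNN` naming is just my own convention):

```python
import os
import shutil

def split_directory(src, dst, files_per_dir=30_000):
    """Move files from one huge directory into numbered sub-directories,
    each holding at most files_per_dir files."""
    for i, name in enumerate(sorted(os.listdir(src))):
        sub = os.path.join(dst, f"chunk_{i // files_per_dir:04d}")
        os.makedirs(sub, exist_ok=True)
        shutil.move(os.path.join(src, name), os.path.join(sub, name))

# Example: split_directory("ccrl_chunks", "ccrl_split")
```

This keeps every directory small enough to browse, and the training pipeline can simply walk the tree recursively.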
Regarding the Fastchess comment:
I think the MCTS implementation and the NN model build are already behind me. I've started to train my model on data from engine games. I don't know how to use the lczero data for my own purposes; I guess my policy size (4096 possible moves) is different from theirs.
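A 4096-entry policy usually means from-square × to-square (64 × 64). A sketch of that indexing, assuming that is your layout (the function is hypothetical, and note Lc0's policy head uses a different, explicit move list, so its training targets would need remapping before they could feed a 4096-slot head):

```python
def uci_to_index(uci):
    """Map a UCI move like 'e2e4' to a from*64+to index in a 4096-slot policy.

    Caveat: this simple from/to scheme cannot distinguish underpromotions
    (e7e8q vs e7e8n share an index), which is one reason other encodings exist.
    """
    def square(s):
        # file a-h -> 0-7, rank 1-8 -> 0-7, square = file + 8*rank
        return (ord(s[0]) - ord('a')) + 8 * (int(s[1]) - 1)
    return square(uci[0:2]) * 64 + square(uci[2:4])

print(uci_to_index("e2e4"))  # e2 = 12, e4 = 28 -> 12*64 + 28 = 796
```

With a mapping like this in both directions, lczero-format policy vectors could at least be projected onto a 4096-slot head, at the cost of merging promotion moves.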