tablebase neural nets


Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

tablebase neural nets

Post by Robert Pope »

Does anyone have experience building their own neural nets to duplicate endgame databases? The only thing close I can think of is dkappe's Ender nets, but those are based on Lc0's architecture, and they were trained on self-play rather than on tablebases.

I decided to give it a try and see how far I could get, but I've run into some challenges and would be interested in comparing experiences. I'm starting with the basic KRK ending, which is either a draw, or mate(d) in 1-16.

For my first go, I set the training labels to 2000 - dtz (times -1 when the side to move is the loser), with draws at 0. I thought that made sense, since the longer the win, the closer it is to a draw. But even with a hugely overparameterized model (100,000+ parameters), I couldn't get a good fit even on the training data, e.g. I had a lot of cases where the actual value was 1996 and the prediction was 1980 or 2004 (outside the range of the whole training set).
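For concreteness, that labeling scheme amounts to something like this (a rough sketch; the wdl/dtz values would come from a real tablebase probe):

Code: Select all

# Sketch only: the regression labels described above.
# wdl is +1/0/-1 from the side to move's point of view, dtz is the
# unsigned tablebase distance; both would come from a real probe.
def regression_label(wdl, dtz):
    if wdl == 0:
        return 0.0                           # draws labeled 0
    value = 2000 - abs(dtz)                  # longer wins sit closer to the draw score
    return value if wdl > 0 else -value      # negate when the side to move is losing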

Then I switched to a categorical model, where the labels are 0-33 for the different possible outcomes. Here I've gotten models that fit the training data very well (90-100% accuracy), but they don't perform well on the validation data.

For people who have tried this, what worked well for you: board representation, network model, etc.? How complicated an endgame were you able to train well? It's annoying that I'm having this much trouble with what should be a very basic endgame.
connor_mcmonigle
Posts: 530
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: tablebase neural nets

Post by connor_mcmonigle »

DTZ prediction strikes me as both difficult to get right and not very relevant to playing strength. Ultimately, a model predicting WDL probabilities for some endgame position will still develop a reasonable notion of "progress" in a winning position, as the positions the model judges to have more certain win/loss values will naturally tend to be the positions with lower DTZ values. However, if you're dead set on predicting DTZ values, it might be interesting to predict the parameters of some discrete distribution (perhaps Poisson or binomial?) and maximize the log probability the predicted distribution assigns to the DTZ label.
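As a rough PyTorch sketch of what I mean (illustrative only; the linear model and random tensors below stand in for a real network and real training data):

Code: Select all

import torch

# Purely illustrative: the model outputs the log-rate of a Poisson
# distribution over dtz; we minimize the negative log likelihood of
# the labeled dtz under that distribution.
net = torch.nn.Linear(768, 1)                 # stand-in for a real evaluation net
features = torch.randn(32, 768)               # stand-in for encoded positions
dtz_target = torch.randint(0, 33, (32, 1)).float()

log_rate = net(features)                      # predicted log of the Poisson rate
loss = torch.nn.PoissonNLLLoss(log_input=True)(log_rate, dtz_target)
loss.backward()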

Seer v2.0.1/2.1.0 (later versions rely on a network trained through several self-play iterations starting from the network used in v2.0.1/2.1.0) relied on a network trained via a process that started with a network trained to predict WDL probabilities for several million common <=6-man positions labeled by way of Syzygy EGTBs (http://talkchess.com/forum3/viewtopic.php?f=2&t=77187). The resultant network proved very capable in endgame positions. It was effectively perfect in <=5-man positions and quite reasonable in common 6-man as well as 7-man positions. The network relies on halfKA inputs and has the following topology (using dense skip connections): halfKA -> ReLU(160x2) -> ReLU(16) -> ReLU(16+16) -> ReLU(16+16+16) -> softmax(3).
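In case it helps, here is my reading of that topology sketched in PyTorch (a reconstruction from the description above rather than Seer's actual source; the sparse halfKA inputs are written as a plain Linear over one-hot vectors for clarity):

Code: Select all

import torch
import torch.nn as nn

# Reconstruction of the described topology, not Seer's actual source.
# Both halfKA perspectives pass through a shared 160-wide transformer,
# are concatenated, and feed three 16-wide layers with dense skips.
class WDLNet(nn.Module):
    def __init__(self, n_features=64 * 12 * 64):
        super().__init__()
        self.transform = nn.Linear(n_features, 160)   # shared feature transformer
        self.fc0 = nn.Linear(2 * 160, 16)
        self.fc1 = nn.Linear(16, 16)
        self.fc2 = nn.Linear(16 + 16, 16)              # input: fc0 and fc1 outputs
        self.out = nn.Linear(16 + 16 + 16, 3)          # input: fc0, fc1 and fc2 outputs

    def forward(self, own, other):
        x = torch.cat([torch.relu(self.transform(own)),
                       torch.relu(self.transform(other))], dim=-1)
        h0 = torch.relu(self.fc0(x))
        h1 = torch.relu(self.fc1(h0))
        h2 = torch.relu(self.fc2(torch.cat([h0, h1], dim=-1)))
        return torch.softmax(self.out(torch.cat([h0, h1, h2], dim=-1)), dim=-1)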

If you're interested in experimenting with it, you can download v2.1.0 here: https://github.com/connormcmonigle/seer ... tag/v2.1.0. Setting up the desired position and entering "eval" will result in a WDL prediction such as the following being printed:

[d]1k6/8/P7/8/3B4/8/8/K7 b - - 0 1

Code: Select all

position fen 1k6/8/P7/8/3B4/8/8/K7 b - - 0 1
eval
score: 0
(w, d, l): (1.95676e-06, 0.999953, 4.55584e-05)
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: tablebase neural nets

Post by Robert Pope »

Thanks, I'll take a look at that.

I guess there were two reasons I was trying to predict dtz instead of wdl:
1. I was starting from the ground up and KRK is 100% won, so predicting wdl isn't very useful.
2. Knowing a position is won doesn't translate into being able to force the mate. If you have a good dtz or dtm prediction, you can drive to mate with a little searching to overcome errors. Maybe I'm wrong, but I'm not convinced that a search attempting to maximize the wdl prediction would do that (e.g. there could be a large region of a 6-piece ending where you can predict the correct wdl perfectly; the best that can do is tell you which moves risk not winning anymore).
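Something like this toy move picker is what I have in mind for point 2 (a sketch only; predict_dtz() and the board methods are hypothetical stand-ins, not a real library):

Code: Select all

# Sketch only: from a won position, play the move that minimizes the
# predicted distance to mate/conversion for the defender.
# predict_dtz() and the board methods are hypothetical stand-ins.
def drive_to_mate_move(board, predict_dtz):
    best_move, best_dtz = None, None
    for move in board.legal_moves():
        board.push(move)
        dtz = predict_dtz(board)    # predicted moves the defender can still hold out
        board.pop()
        if best_dtz is None or dtz < best_dtz:
            best_move, best_dtz = move, dtz
    return best_move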
dkappe
Posts: 1631
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: tablebase neural nets

Post by dkappe »

I did try to do an Ender-type net with the old-style NNUE nets. I trained two nets, one with a full 1B-position dataset and another with only the 18-piece-and-fewer positions plus some extra positions to make up the difference in quantity. Running tests on 10k 16-piece EPDs (played both ways) that Komodo 14 thought were within 200 cp of even, the full net was much better than the Ender net at 20"+0.2". I don't remember the exact results, but it wasn't within 100 Elo.
connor_mcmonigle
Posts: 530
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: tablebase neural nets

Post by connor_mcmonigle »

Robert Pope wrote: Sun Aug 08, 2021 3:29 am ...
2. Knowing a position is won doesn't translate into being able to force the mate. If you have a good dtz or dtm prediction, you can drive to mate with a little searching to overcome errors. Maybe I'm wrong, but I'm not convinced that a search attempting to maximize the wdl prediction would do that (e.g. there could be a large region of a 6-piece ending where you can predict the correct wdl perfectly; the best that can do is tell you which moves risk not winning anymore).

The Pr(W) + 0.5*Pr(D) (expectation) maximization solution certainly isn't perfect, but it proved much better than I was anticipating. I thought I might have to introduce something analogous to a "moves left head", but it didn't turn out to be an issue in practice. DTZ prediction is definitely an interesting problem to try to solve, even if I'm unconvinced it's relevant to playing strength. Perhaps some insight into a good choice for the discrete posterior distribution could be gained by plotting a histogram of dtz values.
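For clarity, the expectation being maximized is just the usual expected score applied to the net's three softmax outputs:

Code: Select all

# Expected score from the net's (win, draw, loss) probabilities;
# the search maximizes this in place of a scalar evaluation.
def expected_score(p_win, p_draw, p_loss):
    return p_win + 0.5 * p_draw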
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: tablebase neural nets

Post by Robert Pope »

Hey, Connor,
I've tried to do some reading on the NNUE HalfKA architecture, but I'm struggling a bit. Could you explain what the format of the inputs to this net is?
Chessprogramming.org says it's 12x64x64 = 45,056 (actually, that must be 11x64x64) times two. I get the 12 pieces, but I don't see what the 64x64 means. My own representation was only 15x64, so I don't see where they get so many inputs.
connor_mcmonigle
Posts: 530
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: tablebase neural nets

Post by connor_mcmonigle »

Robert Pope wrote: Thu Aug 19, 2021 5:10 am Hey, Connor,
I've tried to do some reading on the NNUE HalfKA architecture, but I'm struggling a bit. Could you explain what the format of the inputs are to this net?
Chessprogramming.org says it's 12x64x64 = 45,056 (actually, that must be 11x64x64) times two. I get the 12 pieces, but I don't see what the 64x64 means. My own representation was only 15x64, so I don't see where they get so many inputs.
With HalfKA, you have 768 (piece x square) features per our king's square, i.e. 64x12x64 = 49,152 inputs in total. Consequently, feature indices are given by (king square) x 12 x 64 + (piece type) x 64 + (square).

The other important aspect is the notion of "half" which amounts to a clever way to encode tempo information. You can find more information here: https://github.com/glinscott/nnue-pytor ... cs/nnue.md.
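In code, that indexing is just the following (a minimal sketch; perspective-dependent mirroring and the exact piece ordering are left out):

Code: Select all

# Minimal sketch of the halfKA feature index described above.
# king_sq and sq are 0..63, piece is 0..11 (piece type + colour);
# perspective-dependent mirroring is omitted for clarity.
def halfka_index(king_sq, piece, sq):
    return king_sq * 12 * 64 + piece * 64 + sq   # 64 x 12 x 64 = 49,152 features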
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: tablebase neural nets

Post by brianr »

Robert Pope wrote: Sun Aug 08, 2021 12:11 am Does anyone have experience building their own neural nets to duplicate endgame databases? The only thing close I can think of is dkappe's ender nets, but those are based on lc0's architecture, plus they were training on self-play rather than tablebases.
I tried training some NNs using Lc0-type nets, with sample positions covering all possible piece locations and the "correct" tablebase moves as targets.
With the simple KQvK and KRvK positions the nets could learn to mate pretty well.
They completely failed to learn KBNvK mates, so I gave up.
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: tablebase neural nets

Post by Robert Pope »

connor_mcmonigle wrote: Thu Aug 19, 2021 7:47 am
Robert Pope wrote: Thu Aug 19, 2021 5:10 am Hey, Connor,
I've tried to do some reading on the NNUE HalfKA architecture, but I'm struggling a bit. Could you explain what the format of the inputs are to this net?
Chessprogramming.org says it's 12x64x64 = 45,056 (actually, that must be 11x64x64) times two. I get the 12 pieces, but I don't see what the 64x64 means. My own representation was only 15x64, so I don't see where they get so many inputs.
With HalfKA, you have 768 (piece x square) features per our king's square, i.e. 64x12x64 = 49,152 inputs in total. Consequently, feature indices are given by (king square) x 12 x 64 + (piece type) x 64 + (square).

The other important aspect is the notion of "half" which amounts to a clever way to encode tempo information. You can find more information here: https://github.com/glinscott/nnue-pytor ... cs/nnue.md.
Thanks, that was very helpful!