chess evaluation neural network design

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: chess evaluation neural network design

Post by Daniel Shawul »

AlvaroBegue wrote:Is your plan to filter out non-quiescent positions?

Ideally, you would train by picking a random position, running QS with the current NN, and then using the leaf from that search. This is expensive, but it would optimize something very reasonable: the quality of the prediction of the result of the game given by QS.

Filtering out non-quiescent positions is a cheap approximation, and it's possible that it's perfectly fine. After all, what's important in the evaluation (e.g., is this passed pawn enough advantage to win the game? Is this king-side attack likely to win?) can be learned from looking at quiescent positions only. The search will handle the messy situations.

My intuitions about this have evolved over the years, mostly unencumbered by complicated things like "evidence". :)
The thing is, I am not at all convinced that an NN can solve even shallow tactics.

What I am planning to do is use SEE instead of a full-blown qsearch() to get quiescent positions. Come to think of it, SEE is just a single rollout from a non-quiescent position where you select the lowest-valued attacker at each ply. So at the leaves, I will do the SEE rollout and use the NN for evaluation only once, at the resulting quiescent position. I am thinking this SEE+NN would be better than hoping to resolve tactics with a stack of residual blocks. There are also history planes in the A0 network which supposedly can pick up some move patterns, but maybe "future planes" would have been better. That is, use those 8 x T future planes to look ahead 8 steps forward with SEE-style logic.
I am not sure if it is better to widen (i.e. increase the number of channels with history/future information) or to deepen by concatenating neural networks, where the output of the policy network (or SEE) is used to determine the next position for the opponent, and so on. Just thinking out loud.
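
To make that concrete, here is a small Python sketch (not Scorpio's code) of the "lowest-valued attacker at each ply" rollout on a single square; the piece values and attacker lists are illustrative inputs only.

Code:

PIECE_VALUE = {"P": 100, "N": 325, "B": 325, "R": 500, "Q": 975, "K": 10000}

def see_rollout(target, first_attacker, own_attackers, opp_attackers):
    """Material swing of capturing `target` with `first_attacker` when both
    sides always recapture with their cheapest remaining attacker and may
    stop (stand pat) as soon as continuing would lose material.
    own_attackers / opp_attackers are piece letters, cheapest first."""
    gain = [PIECE_VALUE[target]]
    on_square = first_attacker
    attackers = [list(opp_attackers), list(own_attackers)]  # opponent replies first
    side = 0
    while attackers[side]:
        piece = attackers[side].pop(0)                 # lowest-valued attacker
        gain.append(PIECE_VALUE[on_square] - gain[-1])
        on_square = piece
        side ^= 1
    for d in range(len(gain) - 1, 0, -1):              # minimax the gains back up
        gain[d - 1] = -max(-gain[d - 1], gain[d])
    return gain[0]

# Rook takes a pawn that is defended by a knight: the exchange loses material.
print(see_rollout("P", "R", own_attackers=[], opp_attackers=["N"]))   # -400

In the scheme above, the quiet position reached at the end of such a capture sequence is the one that would be handed to the network, exactly once per leaf.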

Daniel
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: chess evaluation neural network design

Post by AlvaroBegue »

Daniel Shawul wrote:
AlvaroBegue wrote:Is your plan to filter out non-quiescent positions?

Ideally, you would train by picking a random position, running QS with the current NN, and then using the leaf from that search. This is expensive, but it would optimize something very reasonable: the quality of the prediction of the result of the game given by QS.

Filtering out non-quiescent positions is a cheap approximation, and it's possible that it's perfectly fine. After all, what's important in the evaluation (e.g., is this passed pawn enough advantage to win the game? Is this king-side attack likely to win?) can be learned from looking at quiescent positions only. The search will handle the messy situations.

My intuitions about this have evolved over the years, mostly unencumbered by complicated things like "evidence". :)
The thing is, I am not at all convinced that an NN can solve even shallow tactics.

What I am planning to do is use SEE instead of a full-blown qsearch() to get quiescent positions. Come to think of it, SEE is just a single rollout from a non-quiescent position where you select the lowest-valued attacker at each ply. So at the leaves, I will do the SEE rollout and use the NN for evaluation only once, at the resulting quiescent position. I am thinking this SEE+NN would be better than hoping to resolve tactics with a stack of residual blocks. There are also history planes in the A0 network which supposedly can pick up some move patterns, but maybe "future planes" would have been better. That is, use those 8 x T future planes to look ahead 8 steps forward with SEE-style logic.
I am not sure if it is better to widen (i.e. increase the number of channels with history/future information) or to deepen by concatenating neural networks, where the output of the policy network (or SEE) is used to determine the next position for the opponent, and so on. Just thinking out loud.

Daniel
Maybe I didn't express myself clearly. In my plan, I don't think the NN should take care of *any* tactics.

The source of the confusion is probably this paragraph:
Ideally, you would train by picking a random position, running QS with the current NN, and then using the leaf from that search. This is expensive, but it would optimize something very reasonable: the quality of the prediction of the result of the game given by QS.
I think what I said there is correct but confusing. The plan is to call QS, using the NN as the evaluation function. Then how do you measure the fitness of the NN parameters? You treat the whole QS as a black box and try to make it the best possible predictor of the outcome of the game. The procedure to compute the gradient of that function for a particular position would be what I described: Run QS, remember what position gave you the score and then do backpropagation on the evaluation of that position.

Restricting yourself to quiescent positions is an approximation to that. I used to think that the biases introduced by this approximation would hurt the quality of the evaluation function. But now I think that perhaps it's no big deal, if you have enough data to learn from.
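
Just to make that procedure concrete, here is a minimal Python/TensorFlow sketch of such a training step, assuming a hypothetical qsearch() that returns the leaf position whose static evaluation produced the root score and a hypothetical encode() that builds the input planes; it illustrates the idea above and is nobody's actual code.

Code:

import tensorflow as tf

def train_step(model, optimizer, root_position, game_result):
    # Run quiescence search with the current network as the evaluator.
    # qsearch() is assumed to report which leaf's NN score reached the root.
    leaf = qsearch(root_position, eval_fn=lambda pos: model(encode(pos)))

    # Treat the search itself as a non-differentiable black box and
    # backpropagate only through the evaluation of that leaf.
    with tf.GradientTape() as tape:
        prediction = model(encode(leaf))                  # predicted outcome
        loss = tf.reduce_mean(tf.square(prediction - game_result))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss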
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: chess evaluation neural network design

Post by Daniel Shawul »

AlvaroBegue wrote:
Daniel Shawul wrote:
AlvaroBegue wrote:Is your plan to filter out non-quiescent positions?

Ideally, you would train by picking a random position, running QS with the current NN, and then using the leaf from that search. This is expensive, but it would optimize something very reasonable: the quality of the prediction of the result of the game given by QS.

Filtering out non-quiescent positions is a cheap approximation, and it's possible that it's perfectly fine. After all, what's important in the evaluation (e.g., is this passed pawn enough advantage to win the game? Is this king-side attack likely to win?) can be learned from looking at quiescent positions only. The search will handle the messy situations.

My intuitions about this have evolved over the years, mostly unencumbered by complicated things like "evidence". :)
The thing is, I am not at all convinced that an NN can solve even shallow tactics.

What I am planning to do is use SEE instead of a full-blown qsearch() to get quiescent positions. Come to think of it, SEE is just a single rollout from a non-quiescent position where you select the lowest-valued attacker at each ply. So at the leaves, I will do the SEE rollout and use the NN for evaluation only once, at the resulting quiescent position. I am thinking this SEE+NN would be better than hoping to resolve tactics with a stack of residual blocks. There are also history planes in the A0 network which supposedly can pick up some move patterns, but maybe "future planes" would have been better. That is, use those 8 x T future planes to look ahead 8 steps forward with SEE-style logic.
I am not sure if it is better to widen (i.e. increase the number of channels with history/future information) or to deepen by concatenating neural networks, where the output of the policy network (or SEE) is used to determine the next position for the opponent, and so on. Just thinking out loud.

Daniel
Maybe I didn't express myself clearly. In my plan, I don't think the NN should take care of *any* tactics.

The source of the confusion is probably this paragraph:
Ideally, you would train by picking a random position, running QS with the current NN, and then using the leaf from that search. This is expensive, but it would optimize something very reasonable: the quality of the prediction of the result of the game given by QS.
I think what I said there is correct but confusing. The plan is to call QS, using the NN as the evaluation function. Then how do you measure the fitness of the NN parameters? You treat the whole QS as a black box and try to make it the best possible predictor of the outcome of the game. The procedure to compute the gradient of that function for a particular position would be what I described: Run QS, remember what position gave you the score and then do backpropagation on the evaluation of that position.

Restricting yourself to quiescent positions is an approximation to that. I used to think that the biases introduced by this approximation would hurt the quality of the evaluation function. But now I think that perhaps it's no big deal, if you have enough data to learn from.
I understood you the first time, the same way you said it now. I know that is why you generated those quiet labeled EPD files to tune handwritten static evaluation functions -- otherwise one would have to tune on the outcome of a search(d) black-box function instead. I was just explaining why I decided to go for an evaluation NN that won't try to figure out tactics with stacks of residual blocks, and asking whether trying to do so would really be better than a SEE rollout followed by a one-block NN evaluation.

The other method I mentioned was, instead of widening the network with more channels, to deepen it by feeding the output of the policy network into another network (which is equivalent to a manual MCTS rollout with the policy network picking moves). So unlike AlphaZero, we are not going to terminate hard after expansion (i.e. without Monte Carlo rollouts) but do one rollout at each leaf until the position becomes quiet, picking the moves with SEE along the way. If I used the output of a policy network to pick moves instead, this would be like stacking networks for a lookahead search.
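
As a hypothetical Python sketch of that second idea: pick_move() stands in for either SEE or a policy network, and is_quiet() / position.make() for whatever the engine provides; the point is that only one value-network call happens per leaf.

Code:

def rollout_eval(position, pick_move, is_quiet, value_net, max_plies=8):
    # Roll the leaf forward a few plies, letting pick_move() (SEE or a
    # policy network) choose each move, then evaluate once when quiet.
    for _ in range(max_plies):
        if is_quiet(position):
            break
        position = position.make(pick_move(position))
    return value_net(position)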
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: chess evaluation neural network design

Post by AlvaroBegue »

I don't think the main point of the stack of residual blocks is learning tactics. It can learn things like a bishop having nowhere good to go, or whether the king can come and help stop a passed pawn. I think the stack of residual blocks is the essence of how an A0-style evaluation works.

By providing attack maps as inputs, and by using QS (so the network doesn't need to learn tactics) we might be able to reduce the size of the network somewhat, and perhaps we can find a good compromise between complexity of the NN and speed of evaluation.
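
For what it's worth, here is a small Keras sketch of such a reduced network: ordinary piece planes plus extra attack-map planes as input, a short residual tower, and a single tanh output predicting the game result. The plane count and block count are made-up numbers for illustration, not a tested design.

Code:

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Activation("relu")(layers.Add()([x, y]))

def make_value_net(n_planes=14, n_blocks=4):
    # n_planes = 12 piece planes + 2 attack-map planes (illustrative numbers)
    inp = layers.Input(shape=(8, 8, n_planes))
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
    for _ in range(n_blocks):
        x = residual_block(x)
    x = layers.Flatten()(x)
    out = layers.Dense(1, activation="tanh")(x)   # predicted game outcome
    return tf.keras.Model(inp, out)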

I'm not sure what problem you are trying to fix with the idea of using SEE rollouts. Why not just use traditional QS?
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: chess evaluation neural network design

Post by AlvaroBegue »

Oh, perhaps what you are proposing is using an SEE playout as a cheap alternative to running QS for every training sample?
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: chess evaluation neural network design

Post by Daniel Shawul »

AlvaroBegue wrote:I don't think the main point of the stack of residual blocks is learning tactics. It can learn things like a bishop having nowhere good to go, or whether the king can come and help stop a passed pawn. I think the stack of residual blocks is the essence of how an A0-style evaluation works.

By providing attack maps as inputs, and by using QS (so the network doesn't need to learn tactics) we might be able to reduce the size of the network somewhat, and perhaps we can find a good compromise between complexity of the NN and speed of evaluation.

I'm not sure what problem you are trying to fix with the idea of using SEE rollouts. Why not just use traditional QS?
The QS is sort of full-width, so I don't want to have to do many neural-network eval calls at the leaves. A single SEE rollout, on the other hand, still does only one neural-network evaluation while incorporating heuristic tactics into it. If you decide to use a policy network instead, the NN would have to be called at each ply. Maybe the NN won't be as costly as I thought if it mostly does stand-pat cutoffs.

OK, the king is a short-range mover, so it may help to have more stacked residual blocks, but the attack maps should significantly help with all the other pieces. A 2-layer convnet should be able to pick up the value of having the king support/block passed pawns. A deep stack of NN layers is not going to help much, because it is not going to evaluate accurately whether the king has a smooth path to the passed pawn anyway, so it might not be significantly better than a 2-layer convnet that just decides the outcome based on distance, attack maps, etc. The AlphaZero method went to great lengths not to incorporate any domain knowledge -- from the selection of input planes to the stacks of residual blocks -- which may not be optimal if you are willing to use heuristics.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: chess evaluation neural network design

Post by AlvaroBegue »

Now that you mention it, it would make sense to compute the distance from the king to every square on the board and add it as an input plane (well, two, obviously). This can be done naively, or taking into account that the king cannot go through its own pieces or through squares attacked by the opponent.
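
A rough Python sketch of what such a plane could look like, assuming a hypothetical 8x8 boolean array of blocked squares (own pieces and squares attacked by the opponent); the naive version would simply be Chebyshev distance, while the BFS version below respects the blocked squares.

Code:

import numpy as np
from collections import deque

def king_distance_plane(king_sq, blocked):
    """Number of king moves from king_sq to every square, treating blocked
    squares as impassable. king_sq is a (rank, file) pair and blocked is an
    8x8 boolean array (hypothetical input encoding)."""
    dist = np.full((8, 8), 64, dtype=np.int32)   # 64 = unreachable
    r0, c0 = king_sq
    dist[r0, c0] = 0
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < 8 and 0 <= nc < 8 and not blocked[nr, nc] \
                        and dist[nr, nc] > dist[r, c] + 1:
                    dist[nr, nc] = dist[r, c] + 1
                    queue.append((nr, nc))
    return dist

A second plane would be computed for the other king, and both added to the input stack.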

The problem with blindly following captures with SEE is that they might lead you to a really bad position, and then you won’t know that you could have used the stand pat option. I don’t think that would work well. I can try to construct some example position to illustrate what I am saying.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: chess evaluation neural network design

Post by Daniel Shawul »

AlvaroBegue wrote:Now that you mention it, it would make sense to compute the distance from the king to every square on the board and add it as an input plane (well, two, obviously). This can be done naively, or taking into account that the king cannot go through its own pieces or through squares attacked by the opponent.

The problem with blindly following captures with SEE is that they might lead you to a really bad position, and then you won’t know that you could have used the stand pat option. I don’t think that would work well. I can try to construct some example position to illustrate what I am saying.
OK. I wanted to avoid using the policy network because I don't think it works well in chess. The heuristics used in chess are good enough for move ordering, and they are fast, so I want to experiment with those. SEE has a stand-pat option, so I might have to do a couple of NN eval calls along the way, but I still prefer that to a policy NN trained for tactics. I have just finished writing the C++ API to integrate TensorFlow, so it is time for experimentation on search with NN eval...
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: chess evaluation neural network design

Post by AlvaroBegue »

Daniel Shawul wrote:[...] I have just finished writing the C++ API to integrate TensorFlow, so it is time for experimentation on search with NN eval...
How did you do that? Did you use something like tensorflow_cc (https://github.com/FloopCZ/tensorflow_cc)?
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: chess evaluation neural network design

Post by Daniel Shawul »

AlvaroBegue wrote: Sun May 06, 2018 2:07 pm
Daniel Shawul wrote:[...] I have just finished writing the C++ API to integrate TensorFlow, so it is time for experimentation on search with NN eval...
How did you do that? Did you use something like tensorflow_cc (https://github.com/FloopCZ/tensorflow_cc)?
Yes, I am using that. You can statically link the library into your engine to run on the CPU, but for the GPU it is still somewhat inconvenient, because you need to build with Bazel and use dynamic linking instead.
I just ran alpha-beta Scorpio with the 2-layer convnet eval and the NPS plummeted by a factor of about 160:

Code:

EgbbProbe not Loaded!
loading_time = 0s
[st = 11114ms, mt = 29250ms , hply = 0 , moves_left 10]
2 56 1 45  Nb1-c3 e7-e6
3 61 4 150  Nb1-c3 Ng8-f6 Ng1-f3
4 78 7 380  Nb1-c3 e7-e6 Ng1-f3 Nb8-c6
5 50 15 860  Nb1-c3 Nb8-c6 e2-e3 d7-d5 d2-d4
6 31 29 1890  Nb1-c3 d7-d5 Ng1-f3 Nb8-c6 e2-e4 d5xe4 Nc3xe4
6 38 62 4356  e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 d7-d6
7 19 95 6980  e2-e4 Nb8-c6 Nb1-c3 Ng8-f6 d2-d3 d7-d5 e4xd5 Nf6xd5 Nc3xd5 Qd8xd5
7 32 119 8594  Nb1-c3 d7-d5 d2-d4 e7-e6 Ng1-f3 Ng8-e7 Bc1-f4
8 40 169 12112  Nb1-c3 d7-d5 d2-d4 Ng8-f6 Bc1-f4 Bc8-d7 Ng1-f3 Nb8-c6
9 42 253 17522  Nb1-c3 d7-d5 d2-d4 Ng8-f6 Bc1-f4 e7-e6 Ng1-f3 Bf8-d6 Bf4-d2
10 52 381 25962  Nb1-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3 e7-e6 Qd1-d3 Nb8-c6 Bc1-f4 Bf8-e7
11 43 698 45807  Nb1-c3 d7-d5 e2-e3 a7-a6 Ng1-f3 Ng8-f6 Bf1-e2 e7-e6 Ke1-g1 Bf8-e7 a2-a3
11 51 1145 75848  c2-c4 Ng8-f6 Nb1-c3 e7-e6 d2-d4 d7-d5 c4xd5 e6xd5 Ng1-f3 c7-c6 Bc1-f4 Nb8-d7
splits = 0 badsplits = 0 egbb_probes = 0
nodes = 78197 <58 qnodes> time = 11912ms nps = 6564
move c2c4
Bye Bye