YUFe wrote: ↑Thu Jan 16, 2020 10:00 am
They represent the entire game, not just the current state.
Mu0 was tested vs A0 with a search budget of 800 (lol) nodes per decision. I doubt Mu0 would be competent at large search depths without modifications, as it only encodes the root and then searches in the latent space.
However, if it were modified to roll out a number of predicted 6-ply sequences on an actual board, encode the resulting positions, and only then search their subtrees in the latent space, repeating the procedure 6, 12, etc. plies from the root, it would have to encode somewhat fewer positions than if it encoded every node in the tree. The savings wouldn't be that big, though, because there would be relatively fewer transposition-table (TT) hits (something like the sketch below).
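For what it's worth, here's roughly how I picture the control flow (hand-wavy Python; Board and its methods, encode, predict and pick_moves are placeholders I'm making up, not anything from the MuZero paper):

[code]
# Sketch of the "re-ground every K plies" idea; everything named here
# (Board, encode, predict, pick_moves) is a made-up placeholder.
K = 6              # re-encode on a real board every K plies
MAX_DEPTH = 18     # total lookahead in plies
tt = {}            # transposition table keyed by real-board hashes

def search(board, depth=0):
    """Search the subtree below a *real* board position."""
    if depth >= MAX_DEPTH:
        return predict(encode(board))            # value of the leaf's latent state

    key = board.hash()
    if key in tt:                                # TT hits are only possible at the
        return tt[key]                           # re-grounded depths 0, K, 2K, ...

    latent = encode(board)                       # one encoder call per grounded node
    best = None
    for seq in pick_moves(latent, plies=K):      # a few K-ply sequences found by
        child = board.play(seq)                  # searching in latent space, then
        v = search(child, depth + K)             # replayed on the real board
        best = v if best is None else max(best, v)   # side-to-move bookkeeping omitted

    tt[key] = best
    return best
[/code]

The point being that the encoder and the TT only ever see the positions at depths 0, 6, 12, ..., which is where both the savings and the lost TT hits come from.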
As Mu0's value prediction network would be very deep and have a ton of parameters anyway, adding policy prediction heads to the output layer increased the computational cost per latent state only slightly, so it made sense for DeepMind to add those heads to guide the search.
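To illustrate the marginal-cost point (toy PyTorch with made-up sizes, nothing to do with DeepMind's actual architecture): the shared trunk does essentially all the work, and each head is one extra linear layer.

[code]
# Toy illustration of a shared trunk with cheap value/policy heads; sizes invented.
import torch.nn as nn

LATENT, HIDDEN, N_MOVES = 256, 1024, 362

layers = [nn.Linear(LATENT, HIDDEN), nn.ReLU()]
for _ in range(19):                              # deep, expensive shared trunk
    layers += [nn.Linear(HIDDEN, HIDDEN), nn.ReLU()]
trunk = nn.Sequential(*layers)

value_head  = nn.Linear(HIDDEN, 1)               # scalar value
policy_head = nn.Linear(HIDDEN, N_MOVES)         # move prior: ~2% extra parameters here

def predict(latent_state):
    h = trunk(latent_state)
    return value_head(h), policy_head(h).softmax(dim=-1)
[/code]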
On the other hand, in an algorithm with a lightweight value prediction function, it probably wouldn't make sense to call a separate policy predictor; guiding the search with some simple move ordering heuristic and then with the value predictions for the children would likely be enough. In that case (as far as I understand, this is what you had in mind) an encoder would only be used to map a discrete board state to a lower-dimensional real-valued vector that is easier to feed into a kernel/NN/etc. to predict the value, and that also makes training easier.
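i.e. something in the spirit of the following (again just a made-up sketch; the Board methods, heuristic_order and the linear value model are all placeholders, not a reference to any actual engine):

[code]
# Lightweight alternative: small encoder + cheap value model, no policy net.
# The Board methods, heuristic_order and the linear model are placeholders;
# any regressor (kernel, small NN, ...) could sit in the value model's place.
import numpy as np

def encode_board(board):
    """Map a discrete position to a small real-valued feature vector."""
    return np.asarray(board.features(), dtype=np.float32)   # material, mobility, ...

def predict_value(vec, weights):
    """Cheap value predictor: here just a linear model on the encoded vector."""
    return float(weights @ vec)

def order_moves(board, weights):
    """Order moves by a crude heuristic, then by the children's predicted values."""
    moves = heuristic_order(board.legal_moves())             # e.g. captures/checks first
    scored = [(predict_value(encode_board(board.play(m)), weights), m) for m in moves]
    scored.sort(key=lambda t: t[0])   # child values are from the opponent's side,
    return [m for _, m in scored]     # so ascending order puts the best moves first
[/code]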
I'm not an expert, so take my ramblings with a grain of salt