Board adaptive / tuning evaluation function - no NN/AI

YUFe · Post by **YUFe** » Tue Jan 14, 2020 8:48 pm

I have had an insight and a consequent idea.
I do not know how many people have had the idea or what it is called. I also lack the terminology.
I know the insight is right, but its usefulness might be limited.
The idea is about the evaluation function. I am sure there are different functions for different stages of the game; that is not what I mean. My idea is to make the function adaptive on a turn-by-turn basis. The signal for this need is the change in the evaluations along the optimal play route / tree depth. The insight was that the values should never change towards a draw or suddenly jump towards a mate. If they do, the evaluation function has a specific weakness.
While I think this is sound, the trouble is what to do with it. How could it have seen its later value earlier in this particular board setting?
The more obvious form of adaptivity/tuning is to change the function so that it evaluates the best move found as highly as the previously highest-evaluated one, and to use that changed function for the next move or a new search.
I can get more specific but for now it's enough to tell what I was thinking about.
Questions: What is this called and where can I read about it?
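A minimal sketch of the "more obvious" variant, under assumptions that are not in the post: the evaluation is a linear sum of weights times hypothetical feature values, and after each search the weights are nudged so that the static eval of the root moves a fraction `step` of the way toward the value the search returned. The weights start from the same fixed values every game and are never saved, so nothing is learned across runs. The returned discrepancy is the signal described above: a large value suggests the eval missed something in this position.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveEval:
    """Linear eval whose weights are adjusted during a single game only.

    Hypothetical sketch: the features and the step size are assumptions.
    """

    base_weights: list[float]
    step: float = 0.1  # fraction of the discrepancy corrected per search
    weights: list[float] = field(init=False)

    def __post_init__(self) -> None:
        # Every game starts from the same fixed weights: nothing persists.
        self.weights = list(self.base_weights)

    def evaluate(self, features: list[float]) -> float:
        return sum(w * f for w, f in zip(self.weights, features))

    def adapt(self, root_features: list[float], search_value: float) -> float:
        """Nudge the weights so the static root eval approaches the search value.

        Returns the discrepancy (search value minus static eval).
        """
        error = search_value - self.evaluate(root_features)
        norm = sum(f * f for f in root_features) or 1.0
        for i, f in enumerate(root_features):
            # Normalised update: the static eval of these exact features
            # moves by step * error (step = 1.0 would close the gap).
            self.weights[i] += self.step * error * f / norm
        return error


# Usage: the static eval says 2.5, the search found 3.5.
ev = AdaptiveEval(base_weights=[1.0, 0.5])
gap = ev.adapt([2.0, 1.0], search_value=3.5)
print(gap, ev.evaluate([2.0, 1.0]))
```

Normalising by the squared feature length keeps the correction proportional to the discrepancy regardless of how large the feature values are.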

BeyondCritics · Post by **BeyondCritics** » Tue Jan 14, 2020 10:31 pm

This is somewhat reminiscent of temporal difference learning.
You don't need to dive into it that deeply if your main interest is chess programming, though; in that case, start with the Chess Programming Wiki.
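For readers who want a glimpse anyway, here is a minimal sketch of the TD(lambda) error term for a linear eval, assuming `values` are the evaluations of successive positions in one game from one side's point of view (in TD-Leaf they would be the leaf values of the principal variations, and the last entry is often the game result). The function names and the `lam`/`alpha` defaults are illustrative.

```python
def td_lambda_errors(values: list[float], lam: float) -> list[float]:
    """Error term for each position t: sum over j >= t of lam**(j - t) * d_j,
    where d_j = values[j + 1] - values[j] is the temporal difference."""
    deltas = [values[j + 1] - values[j] for j in range(len(values) - 1)]
    errors = [0.0] * len(deltas)
    acc = 0.0
    # Walk backwards so each position reuses the discounted sum of later deltas.
    for t in range(len(deltas) - 1, -1, -1):
        acc = deltas[t] + lam * acc
        errors[t] = acc
    return errors


def td_lambda_update(
    weights: list[float],
    features: list[list[float]],
    values: list[float],
    lam: float = 0.7,
    alpha: float = 0.01,
) -> list[float]:
    """One TD(lambda) pass over a game for a linear eval.

    For a linear eval the gradient w.r.t. the weights is the feature vector,
    so each position's features are scaled by its error term. features[t]
    belongs to the position scored by values[t]; the last value (e.g. the
    game result) gets no update of its own.
    """
    new_weights = list(weights)
    for feats, err in zip(features, td_lambda_errors(values, lam)):
        for i, x in enumerate(feats):
            new_weights[i] += alpha * err * x
    return new_weights
```

With lam = 1 each error term reduces to the last value minus the current one (the differences telescope); with lam = 0 it is just the one-step difference.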

abulmo2 · Post by **abulmo2** » Tue Jan 14, 2020 10:50 pm

It sounds like the oracle approach. It was successful in the 1990s, but was abandoned in favor of tapered evaluation.
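For contrast, a minimal sketch of tapered evaluation: instead of preparing the evaluation once at the root (as an oracle does), every position blends a middlegame and an endgame score by a game-phase value. The phase weights below (minor piece 1, rook 2, queen 4, 24 in total) are one common choice, not a standard; engines differ in the details.

```python
# Phase contribution per piece type (both colours); pawns and kings count 0.
PHASE_WEIGHTS = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24  # 4 knights + 4 bishops + 4 rooks * 2 + 2 queens * 4


def game_phase(piece_counts: dict[str, int]) -> int:
    """Return MAX_PHASE with all pieces on the board, 0 with only kings and pawns."""
    phase = sum(PHASE_WEIGHTS.get(piece, 0) * n for piece, n in piece_counts.items())
    return min(phase, MAX_PHASE)  # promotions can push the raw sum above 24


def tapered_eval(mg_score: int, eg_score: int, phase: int) -> int:
    """Interpolate linearly between the middlegame and endgame scores."""
    return (mg_score * phase + eg_score * (MAX_PHASE - phase)) // MAX_PHASE


start = {"P": 16, "N": 4, "B": 4, "R": 4, "Q": 2, "K": 2}
print(game_phase(start))          # 24
print(tapered_eval(100, 20, 12))  # 60
```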

YUFe · Post by **YUFe** » Wed Jan 15, 2020 7:56 am

BeyondCritics wrote (Tue Jan 14, 2020 10:31 pm): This reminds somewhat to Temporal_difference_learning.

That is close to what I was thinking, except that I do not want to learn an eval function, only to modify it to better fit the current situation. I want to bias it to pay more attention to what is important in the current situation, to make a Shannon Type B search more viable and reduce the effective branching factor.
In my case, the oracle is the search-based eval compared with the (local) root evaluation.

Tony P. · Post by **Tony P.** » Wed Jan 15, 2020 9:29 am

I'm not sure what you mean by 'AI'. Nowadays, it's standard for strong engines to use at least a bit of machine learning, e.g. Texel tuning.

Among current engines, Winter is the one I find the most scientifically sound. It recently started using a small NN for eval, but before v0.6.2 it relied on lightweight 'shallow' machine learning methods because of a lack of computational resources for NN training. Its author posts as jorose on this forum and on the TCEC Discord server, and as makishiima_shogo on Twitch. It's easy to get in touch with him on Discord or Twitch (the tcec_chess_tv channel) and at least get advice on the most suitable algorithms for tuning/adaptivity, even if you don't want to base your engine on Winter.
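Texel tuning, mentioned above, can be sketched in a few lines, assuming positions labelled with the final game result (1, 0.5, 0) and a linear eval in centipawns: map the eval through a sigmoid, measure the mean squared error against the results, and nudge each weight while the error drops. The constant `k`, the step size and the function names are illustrative. This is offline tuning on a set of games, unlike the per-game adaptation discussed in the opening post.

```python
Sample = tuple[list[float], float]  # (feature vector, game result in {0, 0.5, 1})


def win_probability(score_cp: float, k: float = 1.0) -> float:
    """Sigmoid mapping a centipawn score to an expected game result."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))


def mean_squared_error(weights: list[float], positions: list[Sample], k: float = 1.0) -> float:
    """Average squared gap between the predicted and the actual result."""
    total = 0.0
    for features, result in positions:
        score = sum(w * f for w, f in zip(weights, features))
        total += (result - win_probability(score, k)) ** 2
    return total / len(positions)


def texel_tune(
    weights: list[float],
    positions: list[Sample],
    k: float = 1.0,
    step: float = 1.0,
    max_rounds: int = 100,
) -> tuple[list[float], float]:
    """Local search: try +/- step on each weight, keep any change that helps."""
    best = list(weights)
    best_err = mean_squared_error(best, positions, k)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                err = mean_squared_error(trial, positions, k)
                if err < best_err:
                    best, best_err, improved = trial, err, True
                    break  # move on to the next weight
        if not improved:
            break
    return best, best_err
```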

YUFe · Post by **YUFe** » Wed Jan 15, 2020 2:07 pm

Tony P. wrote (Wed Jan 15, 2020 9:29 am): I'm not sure what you mean by 'AI'.

What does anyone mean by it? The spectrum from self-optimizing programs to AGI is wide and gradual.
In this context I mean that the program I would write is the same every time you start it and does not contain a NN.

Nowadays, it's standard for strong engines to use at least a bit of machine learning, e.g. Texel tuning.

That is what I explicitly did not mean.

using a small NN for

That takes me totally off topic, but I have some other ideas:
Has anyone used a NN for hashing or for board state representation/encoding?
One could train a net with few inner nodes/neurons (e.g. fewer than 25) to turn a representation with many (e.g. 385) input neurons into that inner representation and back into the input. That looks useless at first, but it would provide a "notation" that is meaningful and would likely allow one to judge how similar two states are. Applying a normal hash function to that representation would then make sure that similar positions end up with dissimilar hashes.
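To make the input side concrete, here is a sketch of one possible 385-input encoding, assuming 6 piece types x 64 squares with +1 for a white piece and -1 for a black one, plus one input for the side to move (the post does not say how the 385 inputs are laid out). The autoencoder itself is not shown; the sketch only illustrates the two properties discussed: a raw similarity measure between positions, and how an ordinary hash scatters even near-identical encodings.

```python
import hashlib

PIECES = "PNBRQK"  # 6 piece types x 64 squares = 384 inputs, +1 for side to move


def encode_fen(fen: str) -> list[float]:
    """Hypothetical 385-input encoding of a FEN position.

    Input index = piece_type * 64 + square (a1 = 0, h8 = 63); the value is
    +1 for a white piece and -1 for a black one. The last input is the side
    to move (+1 white, -1 black).
    """
    placement, side = fen.split()[:2]
    vec = [0.0] * (len(PIECES) * 64 + 1)
    rank, file = 7, 0  # FEN starts at a8
    for ch in placement:
        if ch == "/":
            rank, file = rank - 1, 0
        elif ch.isdigit():
            file += int(ch)
        else:
            square = rank * 8 + file
            vec[PIECES.index(ch.upper()) * 64 + square] = 1.0 if ch.isupper() else -1.0
            file += 1
    vec[-1] = 1.0 if side == "w" else -1.0
    return vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 for identical encodings, lower as positions diverge."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


def plain_hash(vec: list[float]) -> str:
    """An ordinary hash: near-identical inputs give unrelated outputs."""
    return hashlib.sha256(bytes(int(x) & 0xFF for x in vec)).hexdigest()[:16]


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
AFTER_E4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"
a, b = encode_fen(START), encode_fen(AFTER_E4)
print(len(a), round(cosine_similarity(a, b), 3))  # 385 0.909
print(plain_hash(a) != plain_hash(b))             # True
```

A trained bottleneck of fewer than 25 values would replace the raw 385-value vector in the similarity and hashing steps.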

Tony P. · Post by **Tony P.** » Wed Jan 15, 2020 5:28 pm

YUFe wrote (Wed Jan 15, 2020 2:07 pm): Has anyone used a NN for hashing or board state representation/encoding?

It has been done in MuZero*. I like the approach a lot and think Mu0 would have grown a lot stronger than Alpha0 if DeepMind hadn't switched it from chess to tasks that interested the team more.

Another piece on this topic (albeit on reinforcement learning in general, not chess) that I like is Sourabh Bose's PhD dissertation.

* That said, I do believe that a chess board representation could use far fewer parameters and still be accurate enough.

Tony P. · Post by **Tony P.** » Wed Jan 15, 2020 6:40 pm

OK, there's no need to read that dissertation. There's so much literature available that I can't be sure which method is optimal; I'd need RL techniques just to learn to explore the space of RL papers more efficiently.

Here's a possibly more interesting paper: Measuring Structural Similarities in Finite MDPs. If applied to chess, it would produce a metric that would treat positions with more similar sets of (pseudo)legal moves (identified e.g. by the to- and from- squares and the piece types of promotions) as closer to each other than those with less similar sets of moves.
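A crude stand-in for the move-set idea described above (not the metric from the paper, which is defined on MDPs and is more involved): identify each (pseudo)legal move by its from-square, to-square and promotion piece, and take the Jaccard distance between two positions' move sets. Move generation is assumed to exist elsewhere; the move sets here are written by hand.

```python
from typing import Optional

# (from-square, to-square, promotion piece or None), e.g. ("e7", "e8", "Q")
Move = tuple[str, str, Optional[str]]


def move_set_distance(moves_a: set[Move], moves_b: set[Move]) -> float:
    """Jaccard distance between two positions' (pseudo)legal move sets.

    0.0 means identical move sets, 1.0 means no move in common.
    """
    union = moves_a | moves_b
    if not union:
        return 0.0  # two positions with no moves at all count as identical
    return 1.0 - len(moves_a & moves_b) / len(union)


a = {("e2", "e4", None), ("g1", "f3", None)}
b = {("e2", "e4", None), ("d2", "d4", None)}
print(move_set_distance(a, b))  # 1 - 1/3, about 0.667
```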

YUFe · Post by **YUFe** » Thu Jan 16, 2020 10:00 am

Tony P. wrote (Wed Jan 15, 2020 5:28 pm): It has been done in MuZero ... I do believe that a chess board representation could use far fewer parameters

I researched the "representation" part. I was just talking about an autoencoder for the position. They represent the entire game, not just the current state.
I did not know of MuZero, but then I never liked the Atari 2600.

Tony P. · Post by **Tony P.** » Thu Jan 16, 2020 9:07 pm

YUFe wrote (Thu Jan 16, 2020 10:00 am): They represent the entire game, not just the current state.

Mu0 was tested vs A0 with a search budget of 800 (lol) nodes per decision. I doubt Mu0 would be competent at large search depths without modifications, as it only encodes the root and then searches in the latent space.

However, suppose it were modified to roll out a number of predicted 6-ply sequences on an actual board, encode the resulting positions before searching their subtrees in the latent space, and repeat this procedure 6, 12, etc. plies away from the root. It would then have to encode somewhat fewer nodes than if it encoded every position in the tree, though the savings wouldn't be that big because there'd be relatively fewer TT hits.

As Mu0's value prediction network is very deep and has a ton of parameters anyway, adding policy prediction heads to the output layer didn't increase the computational cost per latent state by much, so it made sense for DeepMind to add those heads to guide the search.

On the other hand, in an algorithm with a lightweight value prediction function, it likely wouldn't make sense to call a separate policy predictor instead of guiding the search with some simple move ordering heuristic and then with the value predictions for the children. Then (as far as I understand, that's what you had in mind) an encoder would only be used to map a discrete board state to a lower-dimensional real-valued vector that would be easier to feed into a kernel/NN/etc. to predict the value, and it would also make the training easier.

I'm not an expert, so take my ramblings with a grain of salt
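A minimal sketch of the encoder-plus-kernel pipeline described above, with a random Gaussian projection standing in for a trained encoder (an assumption: a learned autoencoder would replace it) and a Gaussian-kernel weighted average (Nadaraya-Watson regression) as the lightweight value predictor. All names and parameters are hypothetical.

```python
import math
import random


def make_projection(n_in: int, n_out: int, seed: int = 0) -> list[list[float]]:
    """Random Gaussian projection: a stand-in for a trained encoder (hypothetical)."""
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(n_out)  # keeps squared lengths the same on average
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


def encode(x: list[float], projection: list[list[float]]) -> list[float]:
    """Map a discrete board encoding to a low-dimensional real vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in projection]


def rbf(a: list[float], b: list[float], gamma: float) -> float:
    """Gaussian kernel: 1.0 for equal vectors, towards 0 as they move apart."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_value(
    latent: list[float],
    examples: list[tuple[list[float], float]],
    gamma: float = 1.0,
) -> float:
    """Kernel-weighted average of known values (Nadaraya-Watson regression)."""
    weights = [rbf(latent, z, gamma) for z, _ in examples]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # query is far from every example
    return sum(w * v for w, (_, v) in zip(weights, examples)) / total


# Usage with hand-made latent vectors: the query sits next to the first example.
examples = [([0.0, 0.0], 1.0), ([10.0, 10.0], -1.0)]
print(predict_value([0.1, 0.0], examples))  # close to 1.0
```

In a real engine the examples would be encoded positions with known (searched or game-result) values; the kernel width `gamma` controls how far a known value "reaches".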
