CHEMICAL ENGINES


fantasmadel50
Posts: 112
Joined: Thu Apr 30, 2015 7:36 pm

CHEMICAL ENGINES

Post by fantasmadel50 »

Well, would someone be so kind as to clearly explain the meaning of CMH and MRL? Thank you very much.
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: CHEMICAL ENGINES

Post by Tony P. »

Are you asking what these abbreviations mean in the context of chess programming? I think that:

CMH = Countermove Heuristic

MRL = Modular Reinforcement Learning

These topics are too complex for me to understand; I hope you learn something useful from the linked articles.
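Roughly, as far as I understand it, the countermove heuristic is just a move-ordering trick: remember which reply refuted a given opponent move, and try that reply early the next time the same move appears. A toy sketch of the idea (the names and moves below are made up by me, not taken from any real engine):

Code:

# Toy sketch of the countermove heuristic (illustrative only).
# 'counter' maps the opponent's last move to the reply that last
# refuted it; that reply is tried early in move ordering next time.

counter = {}

def order_moves(moves, last_opponent_move):
    """Put the remembered countermove (if any) at the front."""
    cm = counter.get(last_opponent_move)
    return sorted(moves, key=lambda m: 0 if m == cm else 1)

def on_beta_cutoff(last_opponent_move, refuting_move):
    """Called when refuting_move caused a cutoff in the search."""
    counter[last_opponent_move] = refuting_move

# Tiny usage example with made-up move labels:
on_beta_cutoff("e2e4", "c7c5")
print(order_moves(["g8f6", "c7c5", "e7e5"], "e2e4"))  # c7c5 comes first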

The word 'modular' in 'modular reinforcement learning' means that there are several separate learning 'modules' which are dedicated to distinct aspects of the task (of chess play in our case). For example, for each of the phases of the game, there may be a special module (opening / middlegame / endgame) that is used to tune only the parameters relevant to that phase - it would be illogical to use endgame positions to train the assessment of piece development or to use opening books to train the ability to pass a pawn.
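Just to illustrate what I mean by 'one module per phase', here is a toy sketch (the phase split, parameter names and numbers are all invented for the example):

Code:

# Toy sketch of 'modular' evaluation: one parameter set per game phase,
# and only the module responsible for the current phase gets updated.

PHASES = ("opening", "middlegame", "endgame")

# Hypothetical per-phase parameters (weights of evaluation terms).
params = {phase: {"mobility": 1.0, "king_safety": 1.0, "passed_pawn": 1.0}
          for phase in PHASES}

def game_phase(move_number, pieces_left):
    """Crude phase detection, purely for illustration."""
    if move_number <= 12:
        return "opening"
    return "endgame" if pieces_left <= 10 else "middlegame"

def update_module(position, feedback, lr=0.01):
    """Nudge only the parameters of the module for this position's phase."""
    phase = game_phase(position["move_number"], position["pieces_left"])
    for name in params[phase]:
        params[phase][name] += lr * feedback * position["features"][name]

# Example: an endgame position only touches the endgame module.
pos = {"move_number": 50, "pieces_left": 7,
       "features": {"mobility": 0.2, "king_safety": 0.0, "passed_pawn": 1.0}}
update_module(pos, feedback=+1.0)
print(params["endgame"])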

'Reinforcement learning' means that the engine learns the optimal values of its parameters by playing against itself, making the moves leading to good (in its opinion) positions more often than bad moves (though it's still necessary to tell it to sometimes play the variations that it regards as bad - its initial assessment might be wrong and it might easily miss a good move if it doesn't experiment with each and every move). The engine doesn't know in advance which moves are good - it discovers them by trial and error.
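One common way to force that occasional experimentation is something like epsilon-greedy selection; a toy sketch (nothing engine-specific, the scores are invented):

Code:

import random

# Toy epsilon-greedy move selection for self-play learning:
# mostly pick the move the engine currently thinks is best, but with
# probability epsilon pick a random move, so that an initially
# misjudged move still gets tried now and then.

def choose_move(moves, score_fn, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(moves)   # explore
    return max(moves, key=score_fn)   # exploit current knowledge

# Invented scores standing in for the engine's current opinion.
scores = {"a": 0.3, "b": 0.7, "c": 0.1}
print(choose_move(list(scores), scores.get, epsilon=0.2))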

This is very different from 'supervised learning' where a human expert tells the engine which move is the best in some certain training positions, and then, when the engine encounters an unknown position, it tries to find a similar position where it was previously told which move is the best.
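In that supervised setting, a crude caricature would be a nearest-neighbour lookup over expert-labelled positions; a toy sketch with invented feature vectors:

Code:

# Toy supervised 'most similar labelled position' lookup. Positions are
# represented by invented feature vectors; labels are the expert's moves.

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

training = [
    ((1.0, 0.0, 0.5), "d2d4"),   # expert-labelled positions (made up)
    ((0.2, 0.9, 0.1), "g1f3"),
]

def predict_move(features):
    """Return the expert move of the most similar training position."""
    _, move = min(training, key=lambda item: distance(item[0], features))
    return move

print(predict_move((0.3, 0.8, 0.2)))  # -> g1f3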
abulmo2
Posts: 433
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: CHEMICAL ENGINES

Post by abulmo2 »

Tony P. wrote: 'Reinforcement learning' means that the engine learns the optimal values of its parameters by playing against itself, making the moves leading to good (in its opinion) positions more often than bad moves (though it's still necessary to tell it to sometimes play the variations that it regards as bad - its initial assessment might be wrong and it might easily miss a good move if it doesn't experiment with each and every move). The engine doesn't know in advance which moves are good - it discovers them by trial and error.

This is very different from 'supervised learning' where a human expert tells the engine which move is the best in some certain training positions, and then, when the engine encounters an unknown position, it tries to find a similar position where it was previously told which move is the best.
Your definitions contain some misconceptions. Learning is mostly used to tune the parameters of the evaluation function. The evaluation is not there to tell which move is best, but to score a position. The search (alpha-beta with refinements), by backpropagating scores, decides which move is best. Reinforcement learning not only finds optimal values for the parameters, but also finds by itself the parameters (i.e. the chess concepts) to tune. In reinforcement learning, the human expert chooses the parameters to tune (i.e. the chess concepts to use) and lets the computer tune them from a given set of training data. The human expert is not involved in telling which moves are good or bad in the training data.
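To make that division of labour concrete, here is a bare-bones toy sketch (not taken from any engine): the evaluation only scores positions, and a plain negamax search backpropagates those scores to choose the root move.

Code:

# Bare-bones negamax sketch: evaluate() only scores positions; the
# search backpropagates those scores to decide which root move is best.
# The 'game' below is a toy stand-in, not real chess.

def evaluate(position):
    """Static score from the side to move's point of view (toy)."""
    return position

def gen_moves(position):
    """Toy move generator: three possible 'moves'."""
    return (-2, 1, 3) if abs(position) < 10 else ()

def make_move(position, move):
    """Toy move application: flip the sign for the opponent's view."""
    return -(position + move)

def negamax(position, depth):
    moves = gen_moves(position)
    if depth == 0 or not moves:
        return evaluate(position)
    return max(-negamax(make_move(position, m), depth - 1) for m in moves)

def best_root_move(position, depth):
    return max(gen_moves(position),
               key=lambda m: -negamax(make_move(position, m), depth - 1))

print(best_root_move(0, depth=3))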
Richard Delorme
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: CHEMICAL ENGINES

Post by Tony P. »

You're right in that I wasn't accurate when writing my previous post... Thank you for the clarifications! (I assume you meant supervised learning in your last 2 sentences.)

But I feel that we're talking about somewhat different approaches to evaluation.

You've described the tuning process for the static evaluation function that scores positions only ('the V-function'). I agree that this is almost universally used in modern chess engines.

What I was hinting at is the 'move evaluation network' that Matthew Lai tried to implement in Giraffe (it's been removed from the latest build, 20160923, but it was present in build 20150908, which plays stronger) - it scores position-move pairs (or, in machine-learning terms, 'state-action pairs').

Ideally, if this score ('the Q-function') were accurate enough, there would be no need to search the tree at all; but as it's extremely hard to make it predict the best move accurately straight away, Matthew tried 'probability-based search', where the game tree is explored very selectively: the engine computes the 'likelihood' of a variation being principal as the product of the individual 'likelihoods of superiority' of the moves it consists of, and branches the tree only in the variations that are most 'likely' to be principal.
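To illustrate the 'product of likelihoods' idea, a toy sketch (the numbers are invented and this is not Giraffe's actual code):

Code:

# Toy sketch of probability-based expansion: a line's probability of
# being the principal variation is the product of the per-move
# 'likelihoods of superiority'; only lines above a threshold are expanded.

def line_probability(move_probs):
    """Multiply the per-move probabilities along a variation."""
    p = 1.0
    for prob in move_probs:
        p *= prob
    return p

def worth_expanding(move_probs, threshold=0.05):
    """Branch further only if the line is 'likely enough' to be principal."""
    return line_probability(move_probs) >= threshold

# Invented example: a 3-move line whose moves the network rated
# 0.6, 0.5 and 0.4 likely to be best in their respective positions.
print(line_probability([0.6, 0.5, 0.4]))   # ~0.12
print(worth_expanding([0.6, 0.5, 0.4]))    # True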

I don't know why this 'probabilistic' approach didn't result in a significant Elo increase. There are many possible reasons; I'm inclined to blame the architecture of the neural network. Anyway, that's a topic for another thread.
fantasmadel50
Posts: 112
Joined: Thu Apr 30, 2015 7:36 pm

Re: CHEMICAL ENGINES

Post by fantasmadel50 »

Tony P. wrote:You're right in that I wasn't accurate when writing my previous post... Thank you for the clarifications! (I assume you meant supervised learning in your last 2 sentences.)

But I feel that we're talking about somewhat different approaches to evaluation.

You've described the tuning process for the static evaluation function that scores positions only ('the V-function'). I agree that this is almost universally used in modern chess engines.

What I was hinting at is the 'move evaluation network' that Matthew Lai tried to implement in Giraffe (it's been removed from the latest build, 20160923, but it was present in build 20150908, which plays stronger) - it scores position-move pairs (or, in machine-learning terms, 'state-action pairs').

Ideally, if this score ('the Q-function') were accurate enough, there would be no need to search the tree at all; but as it's extremely hard to make it predict the best move accurately straight away, Matthew tried 'probability-based search', where the game tree is explored very selectively: the engine computes the 'likelihood' of a variation being principal as the product of the individual 'likelihoods of superiority' of the moves it consists of, and branches the tree only in the variations that are most 'likely' to be principal.

I don't know why this 'probabilistic' approach didn't result in a significant Elo increase. There are many possible reasons; I'm inclined to blame the architecture of the neural network. Anyway, that's a topic for another thread.
Thank you so much