Deep Pink: a chess engine using deep learning

nkg114mc · Post by **nkg114mc** » Sun Feb 05, 2017 11:03 pm

Just found this blog post about deep learning in chess, by Erik Bernhardsson

http://blog.yhat.com/posts/deep-learning-chess.html

And the source code:

https://github.com/erikbern/deep-pink

It mentioned an engine "Deep-Pink", which includes a trained neural network model (for evaluation function). Maybe some one would be interested.

What do you think about this work?

jdart · Post by **jdart** » Sun Feb 05, 2017 11:47 pm

A few comments:

1. Training for this engine used an interesting objective function. I don't think I have seen that approach used for chess before.

2. "Players will choose an optimal or near-optimal move" is a bad assumption for the FICS dataset. Players will choose better than random moves, which is part of the theory. But for chess, even expert players are a poor "oracle," because they blunder from time to time and make suboptimal moves pretty often. (Go is apparently different because for a long time at least, pro level players were much better than computers, although by an absolute standard of accuracy they might be making errors).

3. The results reinforce the idea tha a NN or deep learning model sacrifices speed for accuracy, and while it can produce reasonable performance, it is hard for it to compete with speed-optimized engines with faster evals.

--Jon

Tony P. · Post by **Tony P.** » Sun Feb 05, 2017 11:57 pm

Meh. Deep Pink has scored merely 30-35% of points in a match vs Sunfish, which is a simple and weak engine that's not even rated on CCRL (nor other official rating lists, I guess).

Currently, the strongest of those chess engines whose evaluation is based on a neural network is Giraffe. It has achieved the strength of a human IM, which is not bad for such an experimental engine. Unfortunately for the computer chess community but fortunately for the mankind, its author Matthew Lai has been recruited by DeepMind so now he has little spare time left for Giraffe improvement.

DeepMind is constantly improving its technologies and I believe that they'd yield a very strong engine (possibly stronger than Stockfish) if anyone were interested in applying those methods to chess - I'll write more on this matter soon - but chess is already 'solved' (computers play at superhuman ability) so AI researchers' work is now dedicated to harder games and real-life tasks.

Tony P. · Post by **Tony P.** » Mon Feb 06, 2017 12:17 am

jdart wrote:2. "Players will choose an optimal or near-optimal move" is a bad assumption for the FICS dataset. Players will choose better than random moves, which is part of the theory. But for chess, even expert players are a poor "oracle," because they blunder from time to time and make suboptimal moves pretty often. (Go is apparently different because for a long time at least, pro level players were much better than computers, although by an absolute standard of accuracy they might be making errors).

It's OK for AlphaGo to use human games to train on because humans yet evaluate positions better than computers. However, the learner can't become stronger than the teacher. Eventually, Go engines will start using reinforced or unsupervised learning and discover superhuman positional features. What is stopping them from doing so now is the lack of computational power of the hardware.

jdart wrote:3. The results reinforce the idea tha a NN or deep learning model sacrifices speed for accuracy, and while it can produce reasonable performance, it is hard for it to compete with speed-optimized engines with faster evals.

Indeed. The problem with NN engines so far is that their search isn't selective enough. They'll start seeing a few moves further than Stockfish when they learn to spend most of their time on exploring the most critical positions. Conventional engines use heuristic rules to determine when to do quiescent search. Instead of using these 'rules of thumb' (exploring captures, pawn attacks, etc.), the 'criticality' of a position (the extent of the need to search its subtree deeper) can be assessed by a separate neural network.

In particular, it may be worthwhile to train a network to have an 'intuitive' sense of mate attacks and forced draws that are probable a few moves later. For example, if the engine 'sees' that its pieces are connected well towards the weaknesses around the opponent's king, it should start a search for mating patterns like a human master would do.

Tony P. · Post by **Tony P.** » Mon Feb 06, 2017 12:41 am

Actually, I believe there will be a point in computer chess progress when search will no longer be divided into main and quiescent. Engines will, all the way from the root to the leaves, always prefer to dig deeper into those variations that 1) are looking the most promising so far and 2) their model opponent (not a mirror image of the engine like it is in minimax search!) is most likely to move into. By spending most of their time on 2-4 candidate moves per position and only a small (but nonzero) portion of the time on the rest of the moves, they'll achieve a smaller effective branching factor.

The dilemma between the research of all the possibilities and the depth of the research of the best ones is known in reinforcement learning theory as the 'exploration vs exploitation trade-off'.

jdart · Post by **jdart** » Mon Feb 06, 2017 1:02 am

The purpose of the quiescence search, and most of the pruning that happens in the main search, is to cut off rapidly nodes that are losing material. That is pretty necessary because so many parts of the search tree typically involve losing material for no compensation.

There is also no avoiding the horizon effect, not with current or projected hardware speeds. You will always miss things that are outside the horizon. If the horizon is moved closer because you spend more time on eval, then you will miss more.

That is not to say that there can't be better algorithms. But I am afraid that at best these will still be heuristics that work most of the time but fail to be accurate at particular nodes, because you just can't afford spend enough time to get high accuracy.

--Jon

Tony P. · Post by **Tony P.** » Mon Feb 06, 2017 2:02 am

jdart wrote:The purpose of the quiescence search, and most of the pruning that happens in the main search, is to cut off rapidly nodes that are losing material. That is pretty necessary because so many parts of the search tree typically involve losing material for no compensation.

A hanging piece and a losing static exchange are such basic features that I think a good NN will detect them anyway or at least sense the tension and expand the tree a few moves ahead from that node. The ability to develop such a sense can be included as an auxiliary component into the objective of the training.

My optimistic musings are based on the 2016 work of V. Mnih et al. with DeepMind - they've been able to train a computer to attain an 8.8 times higher score than expert humans at Atari games. If you wish to know the technical details, see their arXiv preprints. I haven't studied them in depth yet but it looks like the biggest breakthrough has been the introduction of 'auxiliary tasks' in their latest paper. If anyone wishes, I can try to retell the papers in a less esoteric language and explain how this framework applies to chess.

I'm sure that it's going to be efficient in complex board games because they're a classical example of tasks where the explicit reward is observed very seldom - a point or half of it is awarded only 'in the terminal state' (at the end of the game), and there are no other rewards in the process, so it makes sense to design intermediate pseudo-rewards like those for encountering manually programmed (material, open file possession, etc.) or automatically extracted features.

jdart wrote:There is also no avoiding the horizon effect, not with current or projected hardware speeds. You will always miss things that are outside the horizon. If the horizon is moved closer because you spend more time on eval, then you will miss more.

That is not to say that there can't be better algorithms. But I am afraid that at best these will still be heuristics that work most of the time but fail to be accurate at particular nodes, because you just can't afford spend enough time to get high accuracy.

There's no doubt about it. Chess is still very far from being solved in the game-theoretical sense.

Note however that the practical goal is not to play a Nash equilibrium strategy - it suffices to lure the opponent into a variation where he's more likely to make a mistake than us because his horizon in the particular subtree is expected to be closer than ours.

Tony P. · Post by **Tony P.** » Mon Feb 06, 2017 2:39 am

Tony P. wrote:it makes sense to design intermediate pseudo-rewards like those for encountering manually programmed (material, open file possession, etc.) or automatically extracted features.

This may be an improvement over the simple inclusion of these features into the static eval because log(p) - log(1-p) (where p is the expected value of the number of tournament points to be scored in the game*) generally depends nonlinearly and possibly nonmonotonically on the features, whereas in conventional eval routines, the 'winningness' of a position is usually assumed to be the sum of the bonuses and penalties assigned to the features. DNNs allow to be more flexible and approximate the value function of a position-move pair with a nonlinear function.

* The 'probability' p is between 0 and 1; it can be interpreted as, say, the average number of points per game that the engine will score if it plays a ton of test games from the current position as a gauntlet vs equally strong opponents; Deep Pink's eval is, in fact, an estimate for 2p-1, as 2p-1 == -1 in lost positions, 0 in drawn ones and 1 in won ones. Giraffe's eval seems to likewise be an estimate of 20p-10 except forced mate positions, as it's close to -10 in lost positions and to 10 in won ones when there's no forced mate.

matthewlai · Post by **matthewlai** » Mon Feb 06, 2017 6:03 pm

Tony P. wrote:Actually, I believe there will be a point in computer chess progress when search will no longer be divided into main and quiescent.

This is an interesting idea and something I've spent a lot of time thinking about.

I think it's possible, I'm just not quite sure how, yet.

The horizon effect is due to the fact that some positions are too chaotic (uncertain) to be evaluated statically, and must be searched further.

At the moment we use a very crude heuristic (captures, maybe late passed pawn pushes, etc), but there's no reason why we can't train a neural network to recognize these positions. It may even be able to find patterns that are more complicated than just captures and late pawn pushes.

Basically we want an evaluation function that returns not just a mean, but also a certainty/confidence. I tried a few different approaches, but couldn't make it work.

I made a post about this a few months ago I believe.

Fulvio · Post by **Fulvio** » Mon Feb 06, 2017 7:14 pm

jdart wrote: 2. "Players will choose an optimal or near-optimal move" is a bad assumption for the FICS dataset.

I wonder if anyone has ever tried to use tablebases to train a deep learning engine. After all the code to create all the positions and correct moves already exists; it is simply a matter of adding the training algorithm to the process...

Deep Pink: a chess engine using deep learning

Deep Pink: a chess engine using deep learning

Re: Deep Pink: a chess engine using deep learning

Re: Deep Pink: a chess engine using deep learning

Re: Deep Pink: a chess engine using deep learning

Re: Deep Pink: a chess engine using deep learning

Re: Deep Pink: a chess engine using deep learning

Re: Deep Pink: a chess engine using deep learning

Re: Deep Pink: a chess engine using deep learning

Re: Deep Pink: a chess engine using deep learning

Re: Deep Pink: a chess engine using deep learning