Playing the endgame like a boss !!

Uri Blass
Posts: 8334
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Playing the endgame like a boss !!

Post by Uri Blass » Sun Mar 17, 2019 8:27 am

I do not know whether NNs are going to dominate, but it is clear that Stockfish is going the wrong way and is going to lose first place.
I believe that testing only by playing many games is not the right way to keep improving.

I think the first step, if you have an engine, should be to build a test suite from the engine's own games, using positions where the engine does not find the right move.

A new patch should first be tested on 1000 positions where the engine failed to find the right move.
If there is no improvement, then it is a waste of resources to test at short or long time control, because an improvement in Elo also means an improvement in the engine's move choice in some of the cases.

For every patch that passes, there should be a list of positions where the patch improves the engine's move choice, in order to help other developers.

This does not happen in the Stockfish framework, and people who look at the test results see only that the version with the new patch passed the SPRT test.
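
A minimal sketch of the pre-screening step described above, in Python. It assumes the move each engine version picks has already been collected somehow; the names `Position`, `improved_positions` and `worth_game_testing`, and the position keys and moves in the example, are hypothetical and not part of any existing framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    key: str        # identifier of the position (e.g. its FEN); hypothetical
    best_move: str  # the move the unpatched engine failed to find


def improved_positions(suite, patch_moves):
    """Return the suite positions where the patched engine plays the right move.

    `suite` holds positions the unpatched engine got wrong, so every hit
    is a position where the patch improved the move choice.
    `patch_moves` maps a position key to the move chosen by the patch.
    """
    return [p for p in suite if patch_moves.get(p.key) == p.best_move]


def worth_game_testing(suite, patch_moves):
    """Only spend game-testing resources if at least one position improved."""
    return len(improved_positions(suite, patch_moves)) > 0


# Illustrative data only: three failed positions, the patch fixes one of them.
suite = [
    Position("pos-1", "e2e4"),
    Position("pos-2", "g1f3"),
    Position("pos-3", "a7a8q"),
]
patch_moves = {"pos-1": "d2d4", "pos-2": "g1f3", "pos-3": "a7a8r"}

fixed = improved_positions(suite, patch_moves)
print([p.key for p in fixed], worth_game_testing(suite, patch_moves))
```

The list returned by `improved_positions` is also the list of positions that could be published with a passing patch.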

hgm
Posts: 23000
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Playing the endgame like a boss !!

Post by hgm » Sun Mar 17, 2019 9:46 am

The problem is that training purely for win probability is not consistent. Even if at some point the NN knew perfectly how to win KRK, the training would raise the value of the long wins, making it more difficult for the engine to actually find those wins. This makes the inability to find the long wins a self-fulfilling prophecy. If, given the evaluation noise of the NN, an evaluation gradient of 10% is needed from the start of the long win to the mate, the long wins cannot be evaluated much better than 90% without losing the ability to actually convert them. This is then what the training will converge to (say 92%), with the consequence that it will indeed fail to convert in ~8% of the cases.

Of course you can try to get the evaluation noise down, so that you need a smaller gradient to convert the long wins, but with an NN of a given size there will be a limit to that.

If the training objective had been not the pure win probability S but something like S - DTM*0.5%, the situation where it can convert 100% of the long (say 20-move, allowing it a few sub-optimal moves) KRK wins would be stable: the NN value head would output the 90% it was trained to output, which provides enough gradient to convert with certainty. Further training would then not destroy this, because the certain conversion still takes about 20 moves, so it will just confirm the 90% output. The situation where the conversion is 100% is thus preserved, rather than destroyed by further training.
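
The arithmetic of such a target can be sketched in a few lines of Python. This is only an illustration of the S - DTM*0.5% idea, not code from any engine; the function name `training_target` and its parameter names are made up, and the sketch assumes S is on a 0..1 scale and DTM is counted in moves.

```python
def training_target(s, dtm, penalty_per_move=0.005):
    """Value-head training target: win probability minus a distance-to-mate penalty.

    s                -- game outcome / win probability on a 0..1 scale (assumed)
    dtm              -- distance to mate in moves (assumed unit)
    penalty_per_move -- 0.5% per move, the figure suggested in the post
    """
    return s - dtm * penalty_per_move


# A certain win that takes 20 moves is trained towards about 0.90,
# and the target rises steadily as the mate gets closer.
for dtm in (20, 10, 1):
    print(dtm, round(training_target(1.0, dtm), 3))
```

With this target, a 100% conversion rate in 20-move wins still trains the output towards roughly 90%, so the gradient from the start of the win to the mate survives further training.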

jp
Posts: 320
Joined: Mon Apr 23, 2018 5:54 am

Re: Playing the endgame like a boss !!

Post by jp » Sun Mar 17, 2019 12:16 pm

Alexander Lim wrote:
Sun Mar 17, 2019 3:23 am
Apparently Demis Hassabis said AlphaZero does not suffer from these endgame problems (one of the Leela developers mentions this in a YouTube video). Are there any AlphaZero games played to the endgame with mate to confirm this?
I think we'd need to see this to believe it. Right now it looks like DM's choice to end games early may have unintentionally worked out very well for them.

Some people claim that A0's analysis of the Carlsen-Caruana WCh. endgames was bad compared with SF. I haven't seen it for myself, though. Can anyone here confirm?

abulmo2
Posts: 145
Joined: Fri Dec 16, 2016 10:04 am

Re: Playing the endgame like a boss !!

Post by abulmo2 » Sun Mar 17, 2019 8:18 pm

Uri Blass wrote:
Sun Mar 17, 2019 8:27 am
I think the first step, if you have an engine, should be to build a test suite from the engine's own games, using positions where the engine does not find the right move.
A new patch should first be tested on 1000 positions where the engine failed to find the right move.
Examining positions and games to discover weaknesses of a program is of course good practice; however, it is not sufficient. If you improve your engine on those 1000 positions, you may degrade it on other, untested positions, so that the overall strength actually diminishes. For example, with Amoeba version 2.3 I get the following result on STS 15.0 (avoid pointless exchange): 44/100. With version 2.4, the result improved to 65/100. Unfortunately, at the same time, the result on STS 6.0 (recapturing) degraded from 71/100 to 51/100.
Richard Delorme
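
The trade-off in those numbers can be made explicit with a small Python sketch. Only the four scores come from the post above; the helper name `suite_deltas` is hypothetical.

```python
def suite_deltas(old, new):
    """Per-suite score change between two engine versions."""
    return {name: new[name] - old[name] for name in old}


# Scores out of 100, as reported for Amoeba 2.3 and 2.4.
amoeba_2_3 = {"STS 15.0": 44, "STS 6.0": 71}
amoeba_2_4 = {"STS 15.0": 65, "STS 6.0": 51}

deltas = suite_deltas(amoeba_2_3, amoeba_2_4)
regressions = [name for name, d in deltas.items() if d < 0]
net = sum(deltas.values())

print(deltas)       # change per suite
print(regressions)  # suites that got worse
print(net)          # net change over both suites
```

A gain of 21 on one suite is almost cancelled by a loss of 20 on the other, which is why a score on a single set of positions says little about overall strength.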

Uri Blass
Posts: 8334
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Playing the endgame like a boss !!

Post by Uri Blass » Mon Mar 18, 2019 4:04 am

abulmo2 wrote:
Sun Mar 17, 2019 8:18 pm
Uri Blass wrote:
Sun Mar 17, 2019 8:27 am
I think the first step, if you have an engine, should be to build a test suite from the engine's own games, using positions where the engine does not find the right move.
A new patch should first be tested on 1000 positions where the engine failed to find the right move.
Examining positions and games to discover weaknesses of a program is of course good practice; however, it is not sufficient. If you improve your engine on those 1000 positions, you may degrade it on other, untested positions, so that the overall strength actually diminishes. For example, with Amoeba version 2.3 I get the following result on STS 15.0 (avoid pointless exchange): 44/100. With version 2.4, the result improved to 65/100. Unfortunately, at the same time, the result on STS 6.0 (recapturing) degraded from 71/100 to 51/100.
I am not saying that games should not be played for testing, but games should be played only after providing positions where the patch is supposed to fix the move choice.

In practice, I see a lot of patches in the Stockfish framework without a list of positions that Stockfish now plays better.
