I am thinking about more sensitive frameworks to test very small engine design ideas in much less time.
1. How about using the number of moves to mate in White/Black game pairs as an indicator of superiority?
2. And what about minimizing useless draw/draw results by creating a special test set consisting only of highly unbalanced starting positions?
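Idea 1 can be made concrete as a scoring rule for a two-game mini-match. Below is a minimal sketch that assumes the starting position is so unbalanced that only the stronger side (White here) ever mates, and that each game is summarized by how many moves the White engine needed, or None if it failed to mate (draw, adjudication, move limit). The function names `pair_result` and `match_score` are hypothetical, not from any existing tool.

```python
from typing import Iterable, Optional, Tuple


def pair_result(a_mate: Optional[int], b_mate: Optional[int]) -> float:
    """Score one White/Black game pair from engine A's point of view.

    a_mate: moves A needed to mate while playing White (None = no mate).
    b_mate: moves B needed to mate while playing White (None = no mate).
    Assumption: the weaker (Black) side never mates in these positions.
    Returns 1.0 if A mated faster, 0.0 if B did, 0.5 for a tie.
    """
    # A game without a mate counts as an infinitely slow win attempt.
    a = a_mate if a_mate is not None else float("inf")
    b = b_mate if b_mate is not None else float("inf")
    if a < b:
        return 1.0
    if a > b:
        return 0.0
    return 0.5


def match_score(pairs: Iterable[Tuple[Optional[int], Optional[int]]]) -> float:
    """Total score of engine A over many game pairs."""
    return sum(pair_result(a, b) for a, b in pairs)
```

Because both engines face the same position from both sides, the pair score also rewards the engine that survives longer as Black: if A defends longer, B's mate count as White goes up.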
Number of moves to mate as metric of playing strength
-
- Posts: 894
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
-
- Posts: 28354
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Number of moves to mate as metric of playing strength
For 1, it is probably better to count moves only up to the point where the game reaches a theoretical win (as indicated by endgame tables, EGT). I think 2 is already done in TCEC.
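Counting up to a theoretical win instead of up to mate could look like the sketch below. It assumes a hypothetical `probe` callable that returns True when the endgame tables report the position as won for the stronger side, and False or None otherwise (real tables only cover positions with few pieces, so most middlegame positions would return None). Positions are represented as FEN strings, one per ply, starting from the initial position.

```python
from typing import Callable, Iterable, Optional


def plies_to_theoretical_win(positions: Iterable[str],
                             probe: Callable[[str], Optional[bool]]) -> Optional[int]:
    """Return the ply index of the first position the (hypothetical)
    tablebase probe reports as a theoretical win, or None if the game
    never reaches such a position."""
    for ply, fen in enumerate(positions):
        # probe() is an assumed interface: True = known win,
        # False = known non-win, None = not covered by the tables.
        if probe(fen):
            return ply
    return None
```

This counts only the moves needed to reach a won position, not the moves an engine then spends converting it to mate.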
-
- Posts: 7251
- Joined: Mon May 27, 2013 10:31 am
Re: Number of moves to mate as metric of playing strength
Or maybe make the engine resign early.
Or remove (or add) some pieces from the initial position if that limits the game length.
Or change the chess rules. So there are infinite possibilities to reduce game length.
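"Resign early" is usually implemented as an adjudication rule in the match runner rather than in the engine itself. A minimal sketch, assuming the runner records one evaluation per ply in centipawns from White's point of view; the threshold and streak length below are arbitrary illustrative defaults, not recommended values.

```python
from typing import List, Optional, Tuple


def adjudicate_resign(evals_cp: List[int], threshold_cp: int = 1000,
                      consecutive: int = 6) -> Optional[Tuple[int, str]]:
    """Return (ply, winner) once the evaluation has stayed beyond
    +/- threshold_cp in the same direction for `consecutive` plies,
    otherwise None (the game is played out)."""
    streak = 0
    sign = 0
    for ply, ev in enumerate(evals_cp):
        # +1 = clearly winning for White, -1 = for Black, 0 = unclear
        s = 1 if ev >= threshold_cp else -1 if ev <= -threshold_cp else 0
        if s != 0 and s == sign:
            streak += 1
        else:
            sign = s
            streak = 1 if s != 0 else 0
        if streak >= consecutive:
            return ply, "white" if sign > 0 else "black"
    return None
```

Requiring several consecutive plies, rather than a single large score, guards against adjudicating on one noisy evaluation.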
-
- Posts: 10801
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Number of moves to mate as metric of playing strength
Aleks Peshkov wrote (Sat Sep 17, 2022 3:30 pm):
> I am thinking about more sensitive frameworks to test very small engine design ideas in much less time.
> 1. How about using the number of moves to mate in White/Black game pairs as an indicator of superiority?
> 2. And what about minimizing useless draw/draw results by creating a special test set consisting only of highly unbalanced starting positions?
The number of moves to mate is not a good predictor of superiority in normal chess positions.
For example, Stockfish lacks the knowledge that it should trade pieces when it has a big material advantage, because of some simplifications for this case, and the developers do not care.
Even the latest Stockfish version shows stupid analysis of the following position, and I am sure many weak engines would beat Stockfish in a pair of games from it if the winner is the side that mates faster.
I guess that if you ran a tournament with many engines from this position, with the winner being the side that mates in fewer moves, Stockfish would not finish in the top 100 of the following list.
https://ccrl.chessdom.com/ccrl/4040/
I think it may be interesting to run a 9-round Swiss tournament with all 498 engines in the CCRL 40/15 list from this position (every pairing being a pair of games, one with White and one with Black, where the winner is the side that mates in fewer moves), but I do not know whether anyone can obtain and install all of them.
[fen]1nb1kbn1/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQ - 0 1[/fen]
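To see how unbalanced this start position is, here is a small sketch that sums material from the piece-placement field of a FEN string. It assumes the conventional 1/3/3/5/9 pawn values (an assumption, not Stockfish's evaluation). For the position above, Black is missing the queen and both rooks.

```python
# Conventional piece values in pawns; the king is not counted.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_balance(fen: str) -> int:
    """White material minus Black material, in pawns.

    Only the first FEN field (piece placement) is read; digits
    (empty squares) and '/' (rank separators) are skipped.
    """
    placement = fen.split()[0]
    balance = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digit or '/'
        balance += value if ch.isupper() else -value
    return balance


print(material_balance("1nb1kbn1/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQ - 0 1"))  # 19
```

A plain material count of +19 pawns is of the same order as the roughly +21.6 to +22 scores in the Stockfish output below, although NNUE scores are not pure material counts.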
Stockfish_22091707_x64_avx2:
NNUE evaluation using nn-ad9b42354671.nnue enabled
1/1 00:00 145 145k +21.96 e2-e3
2/2 00:00 539 539k +21.96 e2-e3
3/2 00:00 3k 2,620k +21.96 e2-e3
4/3 00:00 3k 2,966k +22.17 Nb1-c3 c7-c6 h2-h4
5/4 00:00 4k 3,843k +21.96 Nb1-c3 a7-a6 Ng1-f3 c7-c6
6/6 00:00 10k 4,984k +21.82 e2-e3 d7-d6
7/7 00:00 28k 7,009k +21.75 Nb1-c3 Ng8-f6 Ng1-f3 Nb8-c6 g2-g3 e7-e6
8/9 00:00 80k 9,971k +21.63 Nb1-c3 Ng8-f6 Ng1-f3 c7-c6 d2-d3 Nf6-d5 Nc3-e4 Nd5-f6
9/11 00:00 170k 11,323k +21.61 Nb1-c3 Nb8-c6 Ng1-h3 Ng8-f6 Nc3-b5 a7-a6 Nb5xc7+ Ke8-d8 Nc7-a8 Nf6-d5 e2-e4
10/11 00:00 228k 11,391k +21.61 Nb1-c3 Nb8-c6 Nc3-b5 Ke8-d8 Ng1-f3 Ng8-f6 Nf3-g5 a7-a6 Nb5-c3 Nc6-e5
11/16 00:00 560k 10,985k +21.52 Nb1-c3 Nb8-c6 Nc3-d5 Ke8-d8 Ng1-f3 e7-e6 Nd5-e3 Ng8-f6 Ne3-c4 Kd8-e8 g2-g3 Bf8-b4
12/16 00:00 764k 11,580k +21.59 Nb1-c3 Nb8-c6 a2-a3 e7-e6 Ng1-f3 Ng8-f6 e2-e4 d7-d5 e4xd5 Nf6xd5 Nc3-e4
13/18 00:00 1,556k 12,858k +21.41 Nb1-c3 Nb8-c6 d2-d4 e7-e6 a2-a3 d7-d5 f2-f3 Bf8-e7 h2-h4 Ng8-f6 e2-e3 Ke8-f8
14/22 00:00 2,074k 12,884k +21.39 Nb1-c3 Nb8-c6 d2-d4 e7-e6 a2-a3 Ng8-f6 e2-e4 d7-d5 e4-e5 Nf6-e4 Nc3-e2 f7-f6 Ng1-f3 f6xe5 d4xe5
15/24 00:00 3,020k 13,248k +21.46 e2-e3 Ng8-f6 Ng1-e2 Nb8-c6 g2-g3 b7-b6 d2-d3 e7-e6 Bf1-g2 Bf8-d6 f2-f4
16/22 00:00 5,544k 13,966k +21.58 e2-e3 Ng8-f6 Nb1-c3 Nb8-c6 a2-a3 a7-a6 d2-d4 d7-d6 Bf1-d3 Bc8-e6 Ng1-e2 Nf6-d5 Nc3xd5 Be6xd5 O-O
17/26 00:00 8,288k 14,489k +21.63 g2-g3 Ng8-f6 Bf1-g2 Nb8-c6 e2-e3 e7-e6 Ng1-e2 b7-b6 O-O Bf8-b4 a2-a3 Bb4-d6 Nb1-c3 Ke8-f8 b2-b4 Kf8-g8 f2-f4 Bc8-a6
18/21 00:00 8,686k 14,524k +21.61 g2-g3 Ng8-f6 Bf1-g2 Nb8-c6 e2-e3 e7-e6 Ng1-e2 b7-b6 O-O Bf8-d6 d2-d3 Ke8-f8 a2-a3 Bc8-a6 f2-f4 g7-g6 Nb1-c3 Ba6-c8 d3-d4 Kf8-g8
19/21 00:00 9,031k 14,520k +21.63 g2-g3 Ng8-f6 Bf1-g2 Nb8-c6 e2-e3 e7-e6 Ng1-e2 b7-b6 O-O Bf8-d6 d2-d3 Ke8-f8 Nb1-d2 Bc8-a6 a2-a3 Kf8-g8 Rf1-e1 Bd6-c5 b2-b4 Bc5-d6
20/22 00:00 9,960k 14,561k +21.67 g2-g3 Ng8-f6 Bf1-g2 Nb8-c6 e2-e3 e7-e6 Ng1-e2 b7-b6 O-O Bf8-d6 d2-d3 Ke8-f8 a2-a3 Kf8-g8 Nb1-c3 Nf6-d5 Nc3xd5 e6xd5 Bg2xd5 Nc6-e7 Bd5-g2
21/23 00:00 10,585k 14,520k +21.63 g2-g3 Ng8-f6 Bf1-g2 Nb8-c6 e2-e3 e7-e6 Ng1-e2 b7-b6 O-O Bf8-d6 d2-d3 Ke8-f8 c2-c4 a7-a5 Nb1-c3 Bc8-a6 e3-e4 g7-g6 f2-f4 Kf8-g7
22/25 00:00 12,225k 14,589k +21.61 g2-g3 Ng8-f6 Bf1-g2 Nb8-c6 e2-e3 e7-e6 Ng1-e2 b7-b6 O-O Bf8-d6 d2-d3 Ke8-f8 a2-a3 g7-g6 c2-c4 Kf8-g8 Nb1-c3 a7-a5 d3-d4 Bc8-a6 Qd1-a4 Kg8-g7 e3-e4
23/28 00:01 16,730k 14,871k +21.61 g2-g3 Ng8-f6 Bf1-g2 Nb8-c6 e2-e3 e7-e6 Ng1-e2 b7-b6 O-O Bf8-d6 d2-d3 Ke8-f8 a2-a3 g7-g6 c2-c4 Kf8-g8 Nb1-c3 a7-a5 d3-d4 Bc8-a6 Qd1-a4 Kg8-g7 e3-e4
24/29 00:01 19,181k 14,974k +21.61 g2-g3 Nb8-c6 Bf1-g2 e7-e6 e2-e3 b7-b6 d2-d3 Ng8-f6 a2-a3 Bf8-d6 Ng1-e2 Ke8-f8 O-O a7-a5 c2-c4 g7-g6 Nb1-c3 Bc8-a6 e3-e4 Kf8-g7 f2-f4 Ba6-b7 e4-e5
25/30 00:01 29,471k 15,113k +21.61 g2-g3 Nb8-c6 Bf1-g2 e7-e6 e2-e3 b7-b6 d2-d3 Ng8-e7 Nb1-d2 g7-g6 Ng1-e2 Bf8-g7 O-O Ke8-f8 a2-a3 Ne7-f5 d3-d4 a7-a5 c2-c4 Kf8-g8 Qd1-a4 Bc8-a6 b2-b4 Nf5-d6 b4-b5
26/32 00:02 36,241k 15,215k +21.61 g2-g3 Nb8-c6 e2-e3 e7-e6 Bf1-g2 b7-b6 d2-d3 Ng8-e7 Nb1-d2 g7-g6 Ng1-e2 Bf8-g7 O-O Ne7-f5 c2-c3 Nc6-e5 Qd1-c2 Bc8-a6 c3-c4 Ne5-c6 Bg2-h3 Ba6-b7 d3-d4 Ke8-f8 Qc2-a4 Nf5-d6 d4-d5
27/34 00:02 45,468k 15,232k +21.61 g2-g3 Nb8-c6 e2-e3 e7-e6 Bf1-g2 b7-b6 d2-d3 Ng8-e7 Ng1-e2 g7-g6 O-O Bf8-g7 Nb1-d2 Ke8-f8 c2-c3 Bc8-a6 c3-c4 Ba6-b7 Qd1-a4 a7-a5 Qa4-b5 Kf8-g8 f2-f4 Ne7-f5 Nd2-e4 Nf5-d6 Ne4xd6 c7xd6
28/36 00:04 62,161k 15,273k +21.53 g2-g3 Nb8-c6 d2-d3 e7-e6 Bf1-g2 b7-b6 e2-e3 Ng8-e7 Ng1-e2 g7-g6 O-O Bc8-b7 f2-f4 Bf8-g7 c2-c3 Bb7-a6 Nb1-a3 Ke8-f8 Ra1-b1 f7-f5 b2-b4 Nc6-d8 Bg2-h3 Nd8-c6 b4-b5
29/35 00:05 79,784k 15,311k +21.61 g2-g3 Nb8-c6 e2-e3 e7-e6 Bf1-g2 b7-b6 Ng1-e2 Ng8-e7 O-O g7-g6 d2-d3 Ke8-d8 f2-f4 Bc8-a6 c2-c4 Kd8-c8 Nb1-d2 Ne7-f5 Nd2-e4 Bf8-e7 a2-a3 Ba6-b7 g3-g4 Nf5-h4 Bg2-h3 Kc8-b8 Ne4-d2 Be7-c5 d3-d4 Bc5-d6
30/34 00:07 107,886k 15,296k +21.52 g2-g3 Nb8-c6 e2-e3 e7-e6 Bf1-g2 b7-b6 Ng1-e2 Ng8-e7 O-O g7-g6 d2-d3 Bc8-a6 Nb1-d2 Ne7-f5 Nd2-f3 Bf8-g7 c2-c3 Nf5-d6 Nf3-e1 Ba6-b7 Qd1-a4 a7-a5 a2-a3 Ke8-f8 f2-f4 Kf8-g8 Ne1-f3 Bb7-a6 Qa4-c2 Nd6-b7 b2-b4 Nb7-d6
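The 9-round Swiss proposed above needs a pairing step per round. The sketch below is a deliberately simplified Swiss system, not the FIDE Dutch rules: players are ranked by score (ties broken by name), each unpaired player is matched with the highest-ranked unpaired opponent it has not met, a rematch is allowed only when no fresh opponent is left, and with an odd field the lowest-ranked player gets a bye (repeat byes are not prevented). Each pairing would then be played as a White/Black game pair and scored 1, 0.5 or 0. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def swiss_pairings(scores: Dict[str, float],
                   played: Dict[str, Set[str]]
                   ) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Pair one round of a simplified Swiss tournament.

    scores: current total score per engine.
    played: set of opponents each engine has already met.
    Returns (list of pairings, engine with a bye or None).
    """
    ranking = sorted(scores, key=lambda p: (-scores[p], p))
    unpaired = list(ranking)
    bye = None
    if len(unpaired) % 2 == 1:
        bye = unpaired.pop()  # lowest-ranked engine sits out
    pairs = []
    while unpaired:
        first = unpaired.pop(0)
        met = played.get(first, set())
        # Highest-ranked opponent not met yet; otherwise allow a rematch.
        opponent = next((p for p in unpaired if p not in met), unpaired[0])
        unpaired.remove(opponent)
        pairs.append((first, opponent))
    return pairs, bye
```

With 498 engines and 9 rounds, each round produces 249 pairings, i.e. 498 games per round.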
-
- Posts: 894
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
Re: Number of moves to mate as metric of playing strength
I am not trying to limit the length of a single test game. I am trying to reduce the number of test games needed to prove or disprove a design hypothesis.
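The number of games needed can be estimated with a normal approximation: the standard error of the mean score shrinks as sigma / sqrt(N), so resolving a score edge delta = mean - 0.5 at z standard errors needs roughly N = variance * (z / delta)^2 games. A minimal sketch, assuming independent games with known win/draw/loss probabilities (real test frameworks often use sequential tests such as SPRT instead):

```python
import math


def games_needed(win: float, draw: float, loss: float, z: float = 1.96) -> float:
    """Approximate number of independent games before the mean score
    differs from 0.5 by z standard errors (normal approximation)."""
    if not math.isclose(win + draw + loss, 1.0):
        raise ValueError("probabilities must sum to 1")
    mean = win + 0.5 * draw
    delta = mean - 0.5
    if abs(delta) < 1e-12:
        return math.inf  # no edge: no number of games resolves it
    # Per-game variance of a result taking values 1, 0.5 and 0.
    variance = (win * (1 - mean) ** 2
                + draw * (0.5 - mean) ** 2
                + loss * (0 - mean) ** 2)
    return math.ceil(variance * (z / delta) ** 2)


print(games_needed(0.35, 0.40, 0.25))  # 227
```

Since the estimate scales with variance / delta^2, a new test design (faster-mate scoring, unbalanced positions) saves games only if it increases the edge relative to the spread of results.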
-
- Posts: 894
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
Re: Number of moves to mate as metric of playing strength
Uri Blass wrote (Sat Sep 17, 2022 5:40 pm):
> The number of moves to mate is not a good predictor of superiority in normal chess positions.
> For example, Stockfish lacks the knowledge that it should trade pieces when it has a big material advantage, because of some simplifications for this case, and the developers do not care.
> Even the latest Stockfish version shows stupid analysis of the following position, and I am sure many weak engines would beat Stockfish in a pair of games from it if the winner is the side that mates faster.
> I guess that if you ran a tournament with many engines from this position, with the winner being the side that mates in fewer moves, Stockfish would not finish in the top 100 of the following list.
Even if Stockfish does not understand the need for a fast win, it will fight to avoid a fast loss against the weaker side. So in a two-game mini-match it does not matter how fast the theoretical shortest mate is, only who wins faster (or avoids losing for longer).
And I am going to test the engine incrementally against earlier versions of itself while it is still in a rudimentary state.
-
- Posts: 10801
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Number of moves to mate as metric of playing strength
Aleks Peshkov wrote (Sat Sep 17, 2022 5:58 pm), replying to Uri Blass (Sat Sep 17, 2022 5:40 pm):
> Even if Stockfish does not understand the need for a fast win, it will fight to avoid a fast loss against the weaker side. So in a two-game mini-match it does not matter how fast the theoretical shortest mate is, only who wins faster (or avoids losing for longer).
> And I am going to test the engine incrementally against earlier versions of itself while it is still in a rudimentary state.
Stockfish has bad evaluation and does not understand how to avoid losing fast.
Old Wasp with no NN survives against itself for more moves.
Finally, old Wasp survives against Stockfish for significantly more moves.