bob wrote: I can't say "for most reasonable programs". I was directly addressing a claim about Stockfish and Crafty.
The "depth 1" can be tricky. Here's why:
(1) some programs extend depth when they give check (Crafty, for example). While others extend depth when escaping check. That is a big difference, in that with the latter, you drop into the q-search while in check. Nothing wrong, but you might or might not recognize a mate. Which means that with a 1 ply search, some programs will recognize a mate in 1, some won't.
(2) q-search. Some do a very simple q-search. Some escape or give check at the first ply of q-search. Some give check at the first search ply, escape at the second and give check again at the third. Those are not equal.
When I tried this with Stockfish, I ran into the same problem. All 1-ply searches are not created equal.
As I mentioned, I am not sure my test is all that useful, because a program's evaluation is written around its search, and vice versa. If you limit one part, you might be limiting the other without knowing it, skewing the results.
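To make bob's point (1) concrete, here is a generic negamax sketch with a check extension. The board API is python-chess, and the trivial qsearch() stub is mine; this illustrates the idea, not any of the named engines' actual code. With the extension, even a nominal depth=1 search still examines the forced replies to checking moves:
Code: Select all
import chess

INF = float("inf")

def qsearch(board, alpha, beta):
    # Placeholder static value; a real q-search would resolve captures
    # (and, depending on the engine, checks) before returning a score.
    return 0

def search(board, depth, alpha, beta):
    if depth <= 0 or board.is_game_over():
        return qsearch(board, alpha, beta)
    best = -INF
    for move in list(board.legal_moves):
        # Check extension: a checking move is searched one ply deeper,
        # so even a nominal depth=1 search still sees the forced reply.
        ext = 1 if board.gives_check(move) else 0
        board.push(move)
        score = -search(board, depth - 1 + ext, -beta, -alpha)
        board.pop()
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return best

print(search(chess.Board(), 1, -INF, INF))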
To make full use of "eval driven search", I adopted the following methodology:
1/ Use depth=4 to avoid the issues of threats, q-search imbalances, and the inadequacy of a depth=1 search in games.
2/ Most engines do not follow the "go nodes" UCI command well for small numbers of nodes, but they follow the "go depth" command rather well.
3/ Shredder engines (here I used Shredder 12 as the standard candle) follow the "go nodes" command literally, even for small numbers of nodes.
4/ Calculate the average number of nodes each engine searches to depth=4 on many positions (150 late opening, 171 endgame); see the sketch after this list.
5/ Set up the matches: Engine X at depth=4 versus Shredder 12 at the number of nodes that Engine X uses to reach depth=4.
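Step 4 can be automated along these lines. This is only a minimal sketch assuming the python-chess library; the engine path and positions file are hypothetical, not the actual setup:
Code: Select all
import chess
import chess.engine

def average_nodes_to_depth(engine_path, fens, depth=4):
    # Average node count the engine reports for a fixed-depth search
    # over many positions.
    total_nodes = 0
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        for fen in fens:
            board = chess.Board(fen)
            # Equivalent to the UCI "go depth 4", which engines follow well
            info = engine.analyse(board, chess.engine.Limit(depth=depth))
            total_nodes += info.get("nodes", 0)
    return total_nodes / len(fens)

# Hypothetical file with one FEN per line
fens = [line.strip() for line in open("positions.fen")]
print(average_nodes_to_depth("./gaviota", fens))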
In the Shredder GUI, with Gaviota 1.0 as an example:
Code: Select all
Late opening positions:
TotTime: 25s SolTime: 25s
Ply: 0 Positions:150 Avg Nodes: 0 Branching = 0.00
Ply: 1 Positions:150 Avg Nodes: 90 Branching = 0.00
Ply: 2 Positions:150 Avg Nodes: 255 Branching = 2.83
Ply: 3 Positions:150 Avg Nodes: 524 Branching = 2.05
Ply: 4 Positions:150 Avg Nodes: 1254 Branching = 2.39
Engine: Gaviota v1.0 (2048 MB)
by Miguel A. Ballicora
Endgame positions:
TotTime: 28s SolTime: 28s
Ply: 0 Positions:171 Avg Nodes: 0 Branching = 0.00
Ply: 1 Positions:171 Avg Nodes: 46 Branching = 0.00
Ply: 2 Positions:171 Avg Nodes: 158 Branching = 3.43
Ply: 3 Positions:171 Avg Nodes: 386 Branching = 2.44
Ply: 4 Positions:171 Avg Nodes: 928 Branching = 2.40
Engine: Gaviota v1.0 (2048 MB)
by Miguel A. Ballicora
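As an aside on reading this output, the Branching column looks like the plain ratio of consecutive Avg Nodes values; a quick check in Python (my reading of the numbers, not a documented Shredder GUI formula):
Code: Select all
avg_nodes = [90, 255, 524, 1254]         # plies 1 through 4, late opening run
for shallow, deep in zip(avg_nodes, avg_nodes[1:]):
    print(round(deep / shallow, 2))      # 2.83, 2.05, 2.39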
The resulting average node counts to depth=4 for each engine:
Code: Select all
 1) Stockfish 21.03.2015    598 nodes
 2) Komodo 8               1455 nodes
 3) Houdini 4              1957 nodes
 4) RobboLito 0.085        1853 nodes
 5) Texel 1.05             1831 nodes
 6) Gaviota 1.0            1091 nodes
 7) Strelka 2.0            4485 nodes
 8) Fruit 2.1              7294 nodes
 9) Komodo 3               1865 nodes
10) Stockfish 2.1.1        1892 nodes
11) Houdini 1.5            2167 nodes
12) Crafty 24.1            1584 nodes
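A single game of the step-5 pairing could be driven like this; again a python-chess sketch with placeholder engine paths, not the actual test harness (in a real match the engines would also alternate colors between games):
Code: Select all
import chess
import chess.engine

def play_game(engine_x_path, shredder_path, node_budget):
    # Engine X at fixed depth=4 versus Shredder 12 at a fixed node count.
    board = chess.Board()
    with chess.engine.SimpleEngine.popen_uci(engine_x_path) as ex, \
         chess.engine.SimpleEngine.popen_uci(shredder_path) as sh:
        while not board.is_game_over():
            if board.turn == chess.WHITE:
                result = ex.play(board, chess.engine.Limit(depth=4))
            else:
                # Shredder follows "go nodes" literally, even for tiny budgets
                result = sh.play(board, chess.engine.Limit(nodes=node_budget))
            board.push(result.move)
    return board.result()

# e.g. Stockfish's measured budget from the list above
print(play_game("./stockfish", "./shredder12", 598))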
With Shredder 12 as the standard candle, the strength of the evals:
Code: Select all
 # PLAYER                   :  RATING  POINTS  PLAYED    (%)
 1 Gaviota 1.0              :   178.8   735.0    1000  73.5%
 2 Komodo 3 (2011)          :   159.9   713.5    1000  71.3%
 3 Houdini 4                :   125.3   671.5    1000  67.2%
 4 Komodo 8                 :   109.3   651.0    1000  65.1%
 5 Houdini 1.5 (2011)       :   101.6   641.0    1000  64.1%
 6 RobboLito 0.085          :    43.0   561.0    1000  56.1%
 7 Shredder 12              :     0.0  5703.5   12000  47.5%
 8 Stockfish 21.03.2015     :   -11.2   484.0    1000  48.4%
 9 Stockfish 2.1.1 (2011)   :   -22.1   468.5    1000  46.9%
10 Texel 1.05               :   -43.0   439.0    1000  43.9%
11 Strelka 2.0 (Rybka 1.0)  :  -109.6   348.5    1000  34.9%
12 Crafty 24.1              :  -115.8   340.5    1000  34.0%
13 Fruit 2.1                :  -199.1   243.0    1000  24.3%
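For what it's worth, the RATING column is close to what the plain logistic Elo model implies from each engine's score against the Shredder 12 anchor; the actual rating tool may compute it slightly differently:
Code: Select all
import math

def implied_elo(score):
    # Logistic Elo model: rating difference implied by an average score
    return -400 * math.log10(1 / score - 1)

print(round(implied_elo(0.735), 1))   # 177.2; the table shows 178.8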
b) Besides the "anomaly" with Gaviota 1.0 (in fact, there were earlier indications that Gaviota has a strong eval), there is another anomaly: Komodo 3 seems to have a better (larger, maybe) eval than Komodo 8. Larry or Mark could confirm or refute a regression (or thinning) of the eval.
c) The Stockfish eval seems no stronger than that of Shredder 12, but it does seem significantly stronger than that of Crafty 24.1.