Is e4 significantly better than d4?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Is e4 significantly better than d4?

Post by Laskos »

Lc0 is very strong in positional openings. The best net at LTC on strong RTX hardware, both in games and in my opening test-suite, is the JHorthos 320x24b bignet J13B.2-136 (built upon late T40 nets). Within a 1 million node search it practically understands all the human opening theory built over 100 years, as far as purely positional considerations go. So, I decided to see what it thinks about the starting position of chess in very long searches, well surpassing the 16GB capacity of my RAM (using a 128GB SSD cache), and presumably well surpassing known opening knowledge.

GPU: RTX 2070
The batch file can look like this:

Code: Select all

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 400 
setoption name NNCacheSize value 10000000
setoption name WeightsFile value .\J13B.2-136
setoption name ScoreType value win_percentage
setoption name MultiPV value 20
setoption name VerboseMoveStats value true
go nodes 250000000 searchmoves d2d4 g1f3 c2c4
First, a fast 1 million node search from the starting position. This is already enough to see practically all human 1-ply theory of the position (there are no tactics). The quantity to look at is "Q"; the performance from White's POV here is (1+Q)/2.

Code: Select all

info string g2g4  (378 ) N:     286 (+ 0) (P:  1.43%) (Q: -0.40008) (D:  0.282)
(U: 0.60112) (Q+U:  0.20104) (V: -0.4436)
info string f2f3  (346 ) N:     524 (+ 0) (P:  1.72%) (Q: -0.22862) (D:  0.364)
(U: 0.39472) (Q+U:  0.16610) (V: -0.2382)
info string g1h3  (161 ) N:     802 (+ 0) (P:  2.05%) (Q: -0.15754) (D:  0.414)
(U: 0.30835) (Q+U:  0.15081) (V: -0.1577)
info string h2h4  (403 ) N:     876 (+ 0) (P:  2.07%) (Q: -0.13819) (D:  0.408)
(U: 0.28454) (Q+U:  0.14635) (V: -0.1346)
info string b1a3  (34  ) N:     959 (+ 0) (P:  2.17%) (Q: -0.12870) (D:  0.410)
(U: 0.27319) (Q+U:  0.14449) (V: -0.1111)
info string b2b4  (234 ) N:    1080 (+ 0) (P:  2.29%) (Q: -0.11460) (D:  0.413)
(U: 0.25634) (Q+U:  0.14174) (V: -0.0917)
info string f2f4  (351 ) N:    1244 (+ 0) (P:  2.30%) (Q: -0.08744) (D:  0.407)
(U: 0.22316) (Q+U:  0.13572) (V: -0.1041)
info string a2a4  (207 ) N:    1693 (+ 0) (P:  2.57%) (Q: -0.05450) (D:  0.450)
(U: 0.18322) (Q+U:  0.12872) (V: -0.0337)
info string a2a3  (204 ) N:    2642 (+ 0) (P:  2.85%) (Q: -0.01052) (D:  0.450)
(U: 0.13013) (Q+U:  0.11961) (V: -0.0056)
info string d2d3  (288 ) N:    2737 (+ 0) (P:  2.77%) (Q: -0.00428) (D:  0.449)
(U: 0.12238) (Q+U:  0.11810) (V: -0.0129)
info string h2h3  (400 ) N:    2750 (+ 0) (P:  2.91%) (Q: -0.00897) (D:  0.464)
(U: 0.12797) (Q+U:  0.11900) (V: -0.0022)
info string b2b3  (230 ) N:    2991 (+ 0) (P:  2.95%) (Q: -0.00178) (D:  0.422)
(U: 0.11923) (Q+U:  0.11745) (V: -0.0061)
info string b1c3  (36  ) N:    4315 (+ 0) (P:  3.09%) (Q:  0.02509) (D:  0.437)
(U: 0.08652) (Q+U:  0.11161) (V:  0.0120)
info string c2c3  (259 ) N:    5604 (+ 0) (P:  3.62%) (Q:  0.03195) (D:  0.459)
(U: 0.07806) (Q+U:  0.11001) (V:  0.0444)
info string g2g3  (374 ) N:   10261 (+ 0) (P:  5.75%) (Q:  0.04036) (D:  0.464)
(U: 0.06775) (Q+U:  0.10811) (V:  0.0553)
info string e2e3  (317 ) N:   14263 (+ 0) (P:  5.98%) (Q:  0.05455) (D:  0.468)
(U: 0.05062) (Q+U:  0.10517) (V:  0.0571)
info string c2c4  (264 ) N:   28066 (+ 0) (P:  7.63%) (Q:  0.06911) (D:  0.451)
(U: 0.03283) (Q+U:  0.10194) (V:  0.0737)
info string g1f3  (159 ) N:   64502 (+ 0) (P: 10.01%) (Q:  0.08059) (D:  0.481)
(U: 0.01874) (Q+U:  0.09933) (V:  0.0848)
info string d2d4  (293 ) N:  439604 (+ 0) (P: 16.58%) (Q:  0.09024) (D:  0.490)
(U: 0.00456) (Q+U:  0.09480) (V:  0.0909)
info string e2e4  (322 ) N:  578066 (+850) (P: 19.27%) (Q:  0.09088) (D:  0.497)
 (U: 0.00402) (Q+U:  0.09490) (V:  0.1079)
bestmove e2e4 ponder e7e5
So, the lead candidates are e2e4 with a 54.54% performance and d2d4 with 54.51%, an insignificant difference. And it confirms human opening theory.
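The conversion from Q to an expected score described above is a one-liner; applied to the Q values from the table, it reproduces the quoted percentages:

```python
def white_performance(q: float) -> float:
    """Map Lc0's Q value in [-1, 1] to an expected score in [0, 1] from White's POV."""
    return (1 + q) / 2

# Q values of the two lead candidates from the table above
print(f"e2e4: {white_performance(0.09088):.2%}")  # 54.54%
print(f"d2d4: {white_performance(0.09024):.2%}")  # 54.51%
```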


But now things get serious: a longish (many hours) 250+ million node search with this bignet. The difference widens dramatically, so much that d2d4 is hardly explored anymore:

Code: Select all

info string g2g4  (378 ) N:    9986 (+ 0) (P:  1.43%) (Q: -0.39817) (D:  0.275)
(U: 0.50416) (Q+U:  0.10599) (V:  -.----)
info string f2f3  (346 ) N:   17137 (+ 0) (P:  1.72%) (Q: -0.24788) (D:  0.349)
(U: 0.35290) (Q+U:  0.10502) (V:  -.----)
info string g1h3  (161 ) N:   26407 (+ 0) (P:  2.05%) (Q: -0.16915) (D:  0.403)
(U: 0.27364) (Q+U:  0.10449) (V:  -.----)
info string h2h4  (403 ) N:   31379 (+ 0) (P:  2.07%) (Q: -0.12788) (D:  0.409)
(U: 0.23209) (Q+U:  0.10421) (V:  -.----)
info string b1a3  (34  ) N:   31863 (+ 0) (P:  2.17%) (Q: -0.13594) (D:  0.390)
(U: 0.24021) (Q+U:  0.10428) (V:  -.----)
info string b2b4  (234 ) N:   39780 (+ 0) (P:  2.29%) (Q: -0.09927) (D:  0.433)
(U: 0.20329) (Q+U:  0.10402) (V:  -.----)
info string f2f4  (351 ) N:   41298 (+ 0) (P:  2.30%) (Q: -0.09236) (D:  0.413)
(U: 0.19634) (Q+U:  0.10398) (V:  -.----)
info string a2a4  (207 ) N:   58013 (+ 0) (P:  2.57%) (Q: -0.05243) (D:  0.457)
(U: 0.15614) (Q+U:  0.10372) (V:  -.----)
info string d2d3  (288 ) N:   83276 (+ 0) (P:  2.77%) (Q: -0.01397) (D:  0.455)
(U: 0.11743) (Q+U:  0.10347) (V:  -.----)
info string a2a3  (204 ) N:   85754 (+ 0) (P:  2.85%) (Q: -0.01359) (D:  0.447)
(U: 0.11705) (Q+U:  0.10346) (V:  -.----)
info string h2h3  (400 ) N:   97278 (+ 0) (P:  2.91%) (Q: -0.00223) (D:  0.454)
(U: 0.10562) (Q+U:  0.10339) (V:  -.----)
info string b2b3  (230 ) N:   98455 (+ 0) (P:  2.95%) (Q: -0.00236) (D:  0.442)
(U: 0.10575) (Q+U:  0.10339) (V:  -.----)
info string b1c3  (36  ) N:  122110 (+ 0) (P:  3.09%) (Q:  0.01402) (D:  0.437)
(U: 0.08925) (Q+U:  0.10328) (V:  -.----)
info string c2c3  (259 ) N:  201218 (+ 0) (P:  3.62%) (Q:  0.03965) (D:  0.463)
(U: 0.06346) (Q+U:  0.10310) (V:  -.----)
info string g2g3  (374 ) N:  348785 (+ 0) (P:  5.75%) (Q:  0.04488) (D:  0.467)
(U: 0.05818) (Q+U:  0.10305) (V:  -.----)
info string e2e3  (317 ) N:  413542 (+ 0) (P:  5.98%) (Q:  0.05205) (D:  0.475)
(U: 0.05095) (Q+U:  0.10300) (V:  -.----)
info string c2c4  (264 ) N:  688398 (+ 0) (P:  7.63%) (Q:  0.06384) (D:  0.464)
(U: 0.03907) (Q+U:  0.10291) (V:  -.----)
info string g1f3  (159 ) N: 1576963 (+ 0) (P: 10.01%) (Q:  0.08036) (D:  0.500)
(U: 0.02238) (Q+U:  0.10274) (V:  -.----)
info string d2d4  (293 ) N: 5674011 (+ 0) (P: 16.58%) (Q:  0.08522) (D:  0.510)
(U: 0.01030) (Q+U:  0.09552) (V:  -.----)
info string e2e4  (322 ) N: 248014778 (+121) (P: 19.27%) (Q:  0.10216) (D:  0.518)
(U: 0.00027) (Q+U:  0.10243) (V:  -.----)
bestmove e2e4 ponder e7e5
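The way visits pile up on a single move in these tables follows from the PUCT selection rule used by Lc0's MCTS: at each step it descends into the child maximizing Q + U, where U is an exploration bonus proportional to the prior P and shrinking with the child's visit count. A minimal sketch (fixed Q values and a hypothetical c_puct constant; the real search also re-averages Q after every evaluation, which is what makes the concentration as extreme as 248M of 250M nodes):

```python
import math

def puct_child(children, parent_visits, c_puct=1.5):
    """Return the child maximizing Q + U, with U = c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    return max(
        children,
        key=lambda ch: ch["Q"] + c_puct * ch["P"] * math.sqrt(parent_visits) / (1 + ch["N"]),
    )

# Toy root children with priors and Q loosely taken from the first table above
children = [
    {"move": "e2e4", "P": 0.1927, "Q": 0.091, "N": 0},
    {"move": "d2d4", "P": 0.1658, "Q": 0.090, "N": 0},
    {"move": "g1f3", "P": 0.1001, "Q": 0.081, "N": 0},
]
for visits in range(1, 50001):
    puct_child(children, visits)["N"] += 1  # send one simulated visit down the chosen child

for ch in sorted(children, key=lambda c: -c["N"]):
    print(ch["move"], ch["N"])
```

Even with Q frozen, the small edges in Q and P steer most of the visits toward e2e4, with d2d4 a distant second, mirroring the shape of the node counts above.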
e2e4 performs at an increasing 55.11%, while d2d4 performs at a decreasing 54.26%. Observe that d2d4 is not that well explored, and things get even worse for d2d4 when it is explored further. I restricted the moves to be explored to d2d4, g1f3 and c2c4, the next candidates after e2e4. The most explored, as expected, was d2d4:

About 130 million nodes (hours of search again):

Code: Select all

info string c2c4  (264 ) N: 1444102 (+ 0) (P:  7.63%) (Q:  0.05985) (D:  0.477)
(U: 0.01254) (Q+U:  0.07239) (V:  -.----)
info string g1f3  (159 ) N: 6657797 (+ 0) (P: 10.01%) (Q:  0.06870) (D:  0.520)
(U: 0.00357) (Q+U:  0.07227) (V:  -.----)
info string d2d4  (293 ) N: 124278559 (+117) (P: 16.58%) (Q:  0.07200) (D:  0.488)
(U: 0.00032) (Q+U:  0.07231) (V:  -.----)
bestmove d2d4 ponder g8f6
The performance of d2d4 decreases further to 53.60% (while that of e2e4 increased to 55.11% in the previous very long search). Observe that all these values move VERY slowly; there are no jumps like those seen with regular AB engines. So, e4 performs about 40% better than d4, taking the draw as the baseline. All other opening moves are worse than both. The difference IS significant.
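The "about 40% better" figure comes from comparing the two moves' margins over the 50% draw baseline:

```python
draw_baseline = 0.50
e4_perf, d4_perf = 0.5511, 0.5360  # expected scores from the long searches above

# ratio of the two margins over a drawn result
ratio = (e4_perf - draw_baseline) / (d4_perf - draw_baseline)
print(f"e4's margin over a draw is {ratio:.2f}x d4's")  # 1.42x, i.e. roughly 40% better
```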


To get a glimpse of what Leela sees ahead, here is a speculative look following the PV:
  • e4 line:
info depth 36 seldepth 99 time 31466907 nodes 257660432 score cp 5510 hashfull 1000
nps 8188 tbhits 0 multipv 1 pv e2e4 e7e5 g1f3 b8c6 f1b5 g8f6 e1g1 f6e4 d2d4
e4d6 b5c6 d7c6 d4e5 d6f5 d1d8 e8d8 b1c3 f8e7 h2h3 f5h4 f3h4 e7h4 f1d1 d8e8 g2g4
h7h5 f2f3 c8e6 c3e2 e6d5 g1g2 f7f6 e2f4 h5g4 h3g4 a8d8 c1e3 e8f7 e5e6 d5e6 f4e6
f7e6 f3f4 b7b6 g2f3 c6c5 a2a4 h8e8 a4a5 g7g6 c2c4 d8d6 d1d6 e6d6 e3d2 e8d8 d2c3
d6e6 a1h1 g6g5 f4f5 e6e7 f3e2 b6a5 b2b3 d8b8 h1h3 e7f7 c3a5 b8e8 h3e3 h4g3 a5c3
e8e3 e2e3 c7c6 e3d3 f7e7 d3c2 g3f2 c2d3 f2g3 c3d2

After 8 moves, it's still in main theory:
C67
Spanish Game, Berlin Defense, l'Hermet Variation, Berlin Wall Defense

[d]r1bk1b1r/ppp2ppp/2p5/4Pn2/8/5N2/PPP2PPP/RNB2RK1 w - - 0 9

It is played by many top players today. Its performance among 2700+ players from my database is:

Players average: 2761
Performance: 2815

White Win: 22.1%
Draw: 66.2%
Black Win: 11.7%
========================
Overall performance: 55.2%

Although the draw rate is very high (66.2%, compared to 52.4% from the starting position), the White Win/Loss ratio is among the highest (almost 2) in the opening repertoire. According to Lc0, Black cannot defend better against e2e4.


  • d4 line:
info depth 28 seldepth 75 time 38173609 nodes 132380459 score cp 5359 hashfull 1000
nps 3467 tbhits 0 multipv 1 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4 c1d2 b4e7 g1f3
e8g8 f1g2 d7d5 e1g1 c7c6 d1c2 b8d7 d2f4 b7b6 f1d1 c8a6 f3e5 a8c8 b1c3 a6c4 e5c4
d5c4 e2e4 b6b5 a2a4 a7a6 a4b5 a6b5 d4d5 c6d5 e4d5 e6e5 f4g5 e7c5 g2h3 h7h6 h3d7
h6g5 d7b5 c5d4 b5c6 f6e8 c3b5 e8d6 b5d4 e5d4 b2b3 c4b3 c2b3 c8b8 b3d3 b8b4 a1a4
b4a4 c6a4 d8f6 d3d4 f6d4 d1d4 f8b8 d4g4 f7f6 a4d7 g8f7 g4a4 f6f5 a4a6 f7e7 d7e6

It soon goes out of general theory (Catalan Opening, General), but after 6 moves it's still played by many top players today, and it seems not that rare among them (maybe a new trend?).

[d]rnbq1rk1/ppp1bppp/4pn2/3p4/2PP4/5NP1/PP1BPPBP/RN1QK2R w KQ d6 0 7

Its performance among 2700+ players from my database is:

Players average: 2752
Performance: 2775

White Win: 22.2%
Draw: 60.4%
Black Win: 17.4%
========================
Overall performance: 52.4%

The draw rate is somewhat lower here (still pretty high), but the White performance is not that good: too many Black wins. According to Lc0, White cannot do better with d2d4.
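The "overall performance" figures in both tables are just the win percentage plus half the draw percentage, and the Win/Loss ratios follow from the same numbers. A quick check using the database percentages quoted above:

```python
def performance(win: float, draw: float) -> float:
    """Expected score in percent from a win/draw percentage pair."""
    return win + draw / 2

# Berlin Wall position (after 1.e4): W 22.1, D 66.2, L 11.7
print(f"Berlin:  perf {performance(22.1, 66.2):.1f}%, W/L {22.1 / 11.7:.2f}")
# Catalan position (after 1.d4): W 22.2, D 60.4, L 17.4
print(f"Catalan: perf {performance(22.2, 60.4):.1f}%, W/L {22.2 / 17.4:.2f}")
```

The Berlin line scores 55.2% with a W/L ratio near 2; the Catalan line scores 52.4% with a ratio near 1.28.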

============================


Is it all due to some simple or silly peculiarity of Lc0, this net, some tactics, or something like that?
MikeGL
Posts: 1010
Joined: Thu Sep 01, 2011 2:49 pm

Re: Is e4 significantly better than d4?

Post by MikeGL »

Laskos wrote: Fri Sep 13, 2019 2:07 pm [...]

Wow, that's novel and impressive.
1.e4 "best by test" that's what Fischer said.
But I heard, just a few days ago, that someone claimed 1.c4 was best; according to the table above it is placed 4th.
I told my wife that a husband is like a fine wine; he gets better with age. The next day, she locked me in the cellar.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Is e4 significantly better than d4?

Post by Laskos »

MikeGL wrote: Fri Sep 13, 2019 2:30 pm Wow, that's novel and impressive.
1.e4 "best by test" that's what Fischer said.
But I heard, just a few days ago, that someone claimed 1.c4 was best; according to the table above it is placed 4th.
I have now set it to analyze c2c4, and its score seems to be decreasing; no chance for it to make the top 2. I will leave the analysis running for maybe 50-100 million nodes with the best JHorthos bignet. For some reason I sort of trust Lc0 with good nets here, maybe because they are so impressive in my test-suite; at, say, 1 million nodes (even less for the bignet, a slower net) they really seem to "understand" almost all positional opening theory.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Is e4 significantly better than d4?

Post by Daniel Shawul »

What I find dubious about the A0 claims on opening theory is that their conclusions are drawn from games using 800-node-per-ply searches. How can that be more reliable than past methods in which alpha-beta searchers like Stockfish are used to analyze openings for days? The two approaches differ in that the A0 method relies more on statistics: each individual game is not that strong, but the averaged statistics of millions of games may make a valuable positional opening book. The latter focuses more on accurate analysis of book lines and may be the better option in openings where accuracy is paramount.

On that note, now that you are using Lc0 the way Stockfish used to be used, i.e. analyzing openings for hours/days, I don't think the results you get mix well with the 800-node-games method. What I can conclude from your e2e4 vs d2d4 statistics is that maybe the Lc0 search is noticing some tactical lines in e2e4, and given how poor Lc0 is at tactics compared to Stockfish, the better result for e2e4 is to be taken with a grain of salt. In fact I trust more the statistical method with 800-node searches, where e2e4 ~= d2d4.

For me the A0 conclusions about openings amount to nothing more than "this network prefers the French line more than others". You don't get exactly the same opening preferences if you train a new network from scratch. So there could be a "butterfly effect" in which minor changes in initialization or search parameters lead to different opening preferences, but I do concede that since NNs are good at positional understanding, the resulting opening book is also good.
Last edited by Daniel Shawul on Fri Sep 13, 2019 3:57 pm, edited 2 times in total.
ouachita
Posts: 454
Joined: Tue Jan 15, 2013 4:33 pm
Location: Ritz-Carlton, NYC
Full name: Bobby Johnson

Re: Is e4 significantly better than d4?

Post by ouachita »

My success rate over the past 10 years is better with 1.d4
SIM, PhD, MBA, PE
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Is e4 significantly better than d4?

Post by Laskos »

Daniel Shawul wrote: Fri Sep 13, 2019 3:41 pm [...]
So, you trust the nodes=1 eval (P) more than the nodes = hundreds of millions eval (Q)? (P) is mostly what the net assimilated from the 800 nodes/ply matches, doing its best to statistically incorporate that search into the eval.
I wouldn't share your opinion, although it can make one think. One data point is the following: my positional test suite built upon human opening theory contains 200 positions. I suspect that 20-25 of the solutions are wrong or ambiguous; maybe a few more are well above human theory level. Lc0 with the bignet at 1 node solves 125/200 (very high indeed, equivalent to Stockfish at some 50 million nodes). At some 500k nodes it solves 165/200, the plateau being at maybe some 175/200 (or even less). This gives me some confidence that longer searches do help a lot.
Regarding accuracy, this would imply that tactics hard for Lc0 to see might be important. With the 8-14 movers of my opening test suite, where tactics were indeed an issue, I had fewer problems: Leela outperformed SF massively. Here tactics should enter even less. This "tactics" theory of some "tricky moves" or consecutive "unique moves" is similar to claiming that chess is in fact a Black win (zugzwang). I don't find that plausible.
Yes, nets have some drifts and "catastrophic butterflies", but I showed both the short 1 million node search, which agrees with known theory (e2e4 ~= d2d4), and the longer, hundreds-of-millions-node searches, which deviate from known theory. I am not sure where catastrophic butterflies could enter here. MCTS with averaging is stable against butterflies, AFAIK.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Is e4 significantly better than d4?

Post by Laskos »

MikeGL wrote: Fri Sep 13, 2019 2:30 pm Wow, that's novel and impressive.
1.e4 "best by test" that's what Fischer said.
But I heard, just a few days ago, that someone claimed 1.c4 was best; according to the table above it is placed 4th.
No, c4 performs worse and worse in the search; after about 50 million nodes it comes in at a decreasing 52.59%, solidly behind d2d4 (and the best move, e2e4).

Code: Select all

info depth 26 seldepth 65 time 3071296 nodes 51806543 score cp 5259 hashfull 1000
nps 16867 tbhits 0 multipv 1 pv c2c4 e7e5 g2g3 g8f6 f1g2 f8c5 d2d3 b8c6 b1c3
e8g8 a2a3 a7a6 g1f3 d7d6 b2b4 c5a7 e1g1 c6d4 f3d4 a7d4 d1c2 c7c6 c1g5 h7h6
g5f6 d8f6 a1b1 f6e7 a3a4 c8e6 e2e3 d4c3 c2c3 a8c8 f1c1 h6h5 h2h4 g7g5 h4g5
e7g5 c4c5 h5h4 c5d6 h4g3 d3d4

info string c2c4  (264 ) N: 50001131 (+82) (P:  7.63%) (Q:  0.05197) (D:  0.488)
 (U: 0.00021) (Q+U:  0.05218) (V:  -.----)
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Is e4 significantly better than d4?

Post by Daniel Shawul »

Laskos wrote: Fri Sep 13, 2019 4:25 pm
Daniel Shawul wrote: Fri Sep 13, 2019 3:41 pm [...]
So, you trust the nodes=1 eval (P) more than the nodes = hundreds of millions eval (Q)? (P) is mostly what the net assimilated from the 800 nodes/ply matches, doing its best to statistically incorporate that search into the eval.
If the millions-of-nodes search were performed on every ply of the game, of course I would trust it, but here you are only doing it on one position, the start position. However, since you have NN evaluation at the leaves, and assuming it is equivalent in strength to a rollout search, the deep search may indeed be better. My worry is that if the search is being misguided by shallow tactics that turn out to be nothing in the end (a good example is a gambit line), then the method using averaged statistics of 800-node games may make the better opening book.
We are assuming that NN evaluation ~= rollout search, which A0 was doing 50/50 at some point. An actual rollout search to the end of the game captures long-term effects well, which is why the old-style MCTS engines of the pre-NN era were better than any alpha-beta searcher with a heuristic eval (if we ignore the branching factor issue for now).
I wouldn't share your opinion, although it can make one think. One data point is the following: my positional test suite built upon human opening theory contains 200 positions. I suspect that 20-25 of the solutions are wrong or ambiguous; maybe a few more are well above human theory level. Lc0 with the bignet at 1 node solves 125/200 (very high indeed, equivalent to Stockfish at some 50 million nodes). At some 500k nodes it solves 165/200, the plateau being at maybe some 175/200 (or even less). This gives me some confidence that longer searches do help a lot.
Ok. I forgot about the NN evaluation at the tips that is equivalent to a rollouts search and had better positional understanding. However, the story could be different if we use standard evaluation though i.e. make a book by doing milions of 8-ply search per ply game, vs search from the intial postion maybe to depth of 50.
About accuracy, this would imply that tactics hard to see to Lc0 might be important. With my 8-14 movers of the opening testing suite, where tactics was indeed an issue, I had less problems, Leela outperformed SF massively. Here tactics should enter even less. This "tactics" theory of some "tricky moves" or consecutive "unique moves" can be similar to that Chess in fact is a Black Win (zugzwang). I don't find that as plausible.
Yes, nets have some drifts and "catastrophic butterflies", but I put the short 1 million nodes search, which is according to known theory (e2e4 ~= d2d4), and longer, hundreds of millions nodes searches, deviating from known theory. I am not sure where catastrophic butterflies can enter here. MCTS with averaging is stable to butterflies, AFAIK.
AFAIK if you train a new network from scratch with different initialization and random number sequence, then you won't get the same opening distribution every time. I consider the initilaization changes minor, while the consequence (opening perfernce changes) to be big. If more or less the same network is produced on every run, i guess there is no point in training a new network from scratch.
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Is e4 significantly better than d4?

Post by jdart »

I don't think you are ever going to be able to decide this question with a root-level search.

--Jon
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Is e4 significantly better than d4?

Post by Laskos »

Daniel Shawul wrote: Fri Sep 13, 2019 5:24 pm
Laskos wrote: Fri Sep 13, 2019 4:25 pm
Daniel Shawul wrote: Fri Sep 13, 2019 3:41 pm What I find dubious about the A0 claims on opening theory is that their conclusions are drawn from games with 800-node searches per ply.
How can that be more reliable than past methods in which alpha-beta searchers like Stockfish are used to analyze openings for days?
The two approaches differ in that the A0 method relies more on statistics: each individual game is not that strong, but
the aggregate statistics of millions of games may make a valuable positional opening book. The latter focuses more on accurate analysis of book lines
and may be the better option in openings where accuracy is paramount.

On that note, now that you are using Lc0 the way Stockfish used to be used, i.e. analyzing openings for hours/days, I don't think the results you get
mix well with the 800-node-games method. What I can conclude from your e2e4 vs d2d4 statistics is that maybe the Lc0 search is noticing some tactical lines in e2e4, and given how poor Lc0 is at tactics compared to Stockfish, the better result for e2e4 is to be taken with a grain of salt. In fact I trust the statistical method more, with 800-node searches, where e2e4 ~= d2d4.

For me, the A0 opening conclusions amount to nothing more than "this network prefers the French line more than others." You don't get exactly the same opening preferences if you train a new network from scratch. So there could be a "butterfly effect" in which minor changes in initialization or search parameters lead to different opening preferences, but I do concede that since NNs are good at positional understanding, the resulting opening book is also good.
So, you trust a nodes=1 eval (P) more than a nodes = hundreds of millions eval (Q)? (P) is mostly what the net assimilated from the 800-nodes-per-ply matches, doing its best to statistically incorporate that search into the eval.
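The P-vs-Q distinction can be made concrete with a minimal PUCT selection sketch (a simplified form of the exploration term and made-up numbers, not Lc0's exact formula): P is the one-shot prior from the net, Q is the running average of backed-up values, and the search expands the child maximizing Q + U, as in the VerboseMoveStats lines above.

```python
import math

def puct_select(children, total_visits, c_puct=1.5):
    """Pick the child maximizing Q + U (simplified PUCT sketch).

    Each child is a dict with:
      P - policy prior from a single net evaluation (nodes=1 knowledge)
      N - visit count
      W - sum of backed-up values, so Q = W / N
    """
    def score(ch):
        q = ch["W"] / ch["N"] if ch["N"] > 0 else 0.0
        u = c_puct * ch["P"] * math.sqrt(total_visits) / (1 + ch["N"])
        return q + u
    return max(children, key=score)

# Illustrative numbers: a high-prior move whose search results are
# mediocre can lose out to a low-prior move whose Q grew during search.
children = [
    {"move": "e2e4", "P": 0.30, "N": 900, "W": 90.0},   # Q = 0.10
    {"move": "d2d4", "P": 0.10, "N": 100, "W": 20.0},   # Q = 0.20
]
best = puct_select(children, total_visits=1000)  # picks d2d4
```

As N grows the U term shrinks, so at hundreds of millions of nodes the choice is driven almost entirely by Q, with P only steering early exploration.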
If the millions-of-nodes search were performed on every ply of the game, of course I would trust it, but here you are doing it only on one position -- the start position. However, since you have NN evaluation at the leaves, and assuming it is equivalent in strength to a rollouts search, the deep search may indeed be better. My worry is that if the search is being misguided by shallow tactics which turn out to be nothing in the end -- a good example is a gambit line -- then the method that uses the average statistics of 800-node games may be the better opening book.
We are assuming that NN evaluation ~= rollouts search, which A0 was doing 50/50 at some point. An actual rollouts search to the end of the game captures long-term effects well, which is why old-style MCTS from before the NN era was better than any alpha-beta searcher with a heuristic eval (if we ignore the branching-factor issue for now).
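For reference, the 50/50 mixing referred to is the AlphaGo-style leaf evaluation V = (1 - λ)·v_net + λ·z_rollout with λ = 0.5; AlphaZero and Lc0 later dropped the rollout term entirely (λ = 0). A toy sketch, where nn_value and rollout are hypothetical deterministic stand-ins, not real engine calls:

```python
import zlib

def nn_value(position):
    """Toy stand-in for a value-head evaluation in [-1, 1]."""
    h = zlib.crc32(position.encode())
    return (h % 2001 - 1000) / 1000.0

def rollout(position):
    """Toy stand-in for a fast rollout to the end of the game,
    returning a game result in {-1, 0, 1}."""
    return zlib.crc32(position.encode()[::-1]) % 3 - 1

def leaf_value(position, lam=0.5):
    """AlphaGo-style mixed leaf value: (1 - lam) * net + lam * rollout.

    lam = 0.5 is the 50/50 mix discussed above; lam = 0 is the
    pure-net choice AlphaZero and Lc0 settled on.
    """
    return (1.0 - lam) * nn_value(position) + lam * rollout(position)
```

The rollout term injects long-term, end-of-game information at the cost of noise; the pure-net term is smoother but only as far-sighted as what the net absorbed in training.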
I wouldn't share your opinion, although it can make one think. One data point is the following: my positional test suite built upon human opening theory contains 200 positions. I suspect that 20-25 of the solutions are wrong or ambiguous, and maybe a few more are well above human theory level. Lc0 with the bignet at 1 node solves 125/200 (very high indeed, equivalent to Stockfish at some 50 million nodes). At some 500k nodes it solves 165/200, plateauing at maybe some 175/200 (or even less). This gives me some confidence that longer searches do help a lot.
Ok. I forgot about the NN evaluation at the tips, which is equivalent to a rollouts search and has better positional understanding. However, the story could be different with a standard evaluation, i.e. making a book from millions of games with an 8-ply search per move, versus a single search from the initial position to maybe depth 50.
So, this is mostly about the statistics of game outcomes under various forms of search and eval, versus the statistics of an MCTS deep search aggregating NN evals (and NN evals trained with limited rollouts). I do not know where to stand here. The statistics of outcomes has serious flaws regarding the openings: I think with AB engines one can use the 2moves_v1 random file and 3moves_GM to a similar effect, and outcomes from engines that evaluate badly, ply by ply, are almost completely insensitive to the very first moves, which is exactly where those bad evaluations count.
About accuracy: this would imply that tactics hard for Lc0 to see might be important. With the 8-14-movers of my opening test suite, where tactics was indeed an issue, I had fewer problems; Leela outperformed SF massively. Here tactics should enter even less. This "tactics" theory of some "tricky moves" or consecutive "unique moves" is similar to claiming that chess is in fact a Black win (zugzwang). I don't find that plausible.
Yes, nets have some drift and "catastrophic butterflies", but I showed the short 1-million-node search, which agrees with known theory (e2e4 ~= d2d4), alongside the longer searches of hundreds of millions of nodes, which deviate from known theory. I am not sure where catastrophic butterflies could enter here; MCTS with averaging is stable to butterflies, AFAIK.
AFAIK, if you train a new network from scratch with a different initialization and random number sequence, you won't get the same opening distribution every time. I consider the initialization changes minor, while the consequence (opening preference changes) is big. If more or less the same network were produced on every run, I guess there would be no point in training a new network from scratch.
Sure, catastrophic butterflies are all over the place during training; the nets drift like crazy. I guess one can check this effect on a very long search by using some T40 net (maybe a slightly earlier one) or a T30 net. One would need many more nodes, factors of 2-8, as they are weaker per node than this bignet (especially T30), and they scale worse to long searches. I will leave that to someone with big hardware (the RAM does matter for the speed of such a huge search).
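As a footnote on the "MCTS with averaging is stable to butterflies" point above: each node reports Q = W/N, a running average of backed-up leaf values, so a single outlier evaluation shifts a well-visited node's Q by only (v - Q)/N, unlike minimax, where one leaf can flip the root. A minimal sketch:

```python
def backup(path, value):
    """Standard MCTS backup along a root-to-leaf path.

    Each node stores N (visits) and W (summed values); the reported
    Q is W / N, a running average, so one noisy leaf eval moves Q by
    only (value - Q) / N at a node already visited N times.
    """
    for node in path:
        node["N"] += 1
        node["W"] += value
        value = -value  # same value seen from the opponent's side

# A node after a long search: 1000 visits averaging Q = 0.10.
root = {"N": 1000, "W": 100.0}
backup([root], 1.0)            # one wildly optimistic leaf evaluation
q = root["W"] / root["N"]      # ~0.1009: the average barely moves
```

This is why a stray misevaluation deep in a hundreds-of-millions-node search gets diluted rather than propagated, whereas net-to-net training drift changes every leaf eval at once and is not averaged away.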