What the heck happens here?
Posted: Tue Aug 20, 2019 9:22 am
I took the Sim tester, which has some 8,000+ quiet positions with no clear best move, and tested a set of engines, including several Lc0 nets, all at 1 s/position. In addition, I ran SF_dev at the much longer 10 s/position and 30 s/position, and, as a check, one Lc0 net at 0.2 s/position.
Hardware: 4 i7 cores, RTX 2070 GPU.
The similarity matrix is given in full further down in this post.
The dendrogram in SPSS using average linkage between groups via correlation is here:
First, note that all the Lc0 nets, although they come from different runs, show very high similarity to each other. One might expect different runs to drift toward different optima, if not in strength, then at least in move selection in quiet positions with no clear best move. Second, Stockfish_dev at much longer times (x10 and x30) moves away from the family of regular engines and clearly toward the Lc0 family. The similarity of SF_dev x30 with SF_dev is 53.79% (SF 10, at 53.95%, is the only regular engine at 1 s that scores higher), but it goes above 59% with several of the Lc0 nets. At very long times per position, SF_dev clusters with the Lc0 engines. At the same time, Lc0 at a very short time per position doesn't drift away from the Lc0 family.
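To make the clustering claim easier to check, here is a minimal sketch of average-linkage clustering on five engines from the matrix in this post. It is not the SPSS procedure: SPSS was run with correlation as the measure, while this sketch merges directly on the raw similarity percentages, and the function names are made up for illustration.

```python
from itertools import combinations

NAMES = ["Komodo 13.02", "Lc0 42850", "Lc0_320x24b", "SF dev", "SF dev x30"]

# Pairwise similarities (%) copied from the matrix in this post.
# Pairs involving SF dev x30 are taken from row 17 of the matrix.
SIM = {
    ("Komodo 13.02", "Lc0 42850"): 45.25,
    ("Komodo 13.02", "Lc0_320x24b"): 45.38,
    ("Komodo 13.02", "SF dev"): 50.15,
    ("Komodo 13.02", "SF dev x30"): 46.13,
    ("Lc0 42850", "Lc0_320x24b"): 82.58,
    ("Lc0 42850", "SF dev"): 49.73,
    ("Lc0 42850", "SF dev x30"): 59.32,
    ("Lc0_320x24b", "SF dev"): 50.35,
    ("Lc0_320x24b", "SF dev x30"): 59.77,
    ("SF dev", "SF dev x30"): 53.79,
}


def pair_similarity(a, b):
    """Look up a pair regardless of the order it was stored in."""
    return SIM[(a, b)] if (a, b) in SIM else SIM[(b, a)]


def average_linkage(names):
    """Repeatedly merge the two clusters with the highest mean similarity."""
    clusters = [(name,) for name in names]
    merges = []

    def mean_similarity(pair):
        left, right = pair
        total = sum(pair_similarity(a, b) for a in left for b in right)
        return total / (len(left) * len(right))

    while len(clusters) > 1:
        left, right = max(combinations(clusters, 2), key=mean_similarity)
        merges.append((left, right, mean_similarity((left, right))))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges


for left, right, score in average_linkage(NAMES):
    print(f"{score:6.2f}  {' + '.join(left)}  <->  {' + '.join(right)}")
```

On this subset, the two Lc0 nets merge first, SF dev x30 joins them next, and only then does SF dev at 1 s come in, which is the same picture as in the dendrogram.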
I tried to work out what the hell this could plausibly mean.
The most plausible explanation I came up with is:
1/ Some 75-80% of quiet positions in chess that apparently have several possible best moves in fact have a unique solution from the point of view of perfect (WDL) play. This seems a bit strange to me, as I expected a lower number.
2/ Lc0 at 1s/move with a good net gives the correct solution from the perfect player point of view in some 90-95% of cases in these quiet positions.
3/ Strong regular engines at 1s/move give 40-55% correct WDL solutions for these positions.
4/ SF_dev at much longer time control like 30s/move improves that to maybe 75-80% correct WDL solutions.
If you combine /1 - 4/, you get that sort of clustering. This seems plausible to me, but I don't like /1/, because 6-7 men TBs don't seem to behave like that, though maybe they are not representative.
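As a rough back-of-the-envelope check of how /1 - 4/ combine, the sketch below computes only the agreement that comes from both engines playing the unique correct move. The assumptions are mine and not measured: the midpoints of the ranges above are used, the rates in /2 - 4/ are read as applying to the positions that have a unique solution, the engines err independently, and any agreement on other positions or on shared wrong moves is ignored. The result is therefore a floor, not a prediction of the matrix values.

```python
# Midpoints of the ranges guessed in points 1/ to 4/ (assumed, not measured).
UNIQUE = 0.775  # share of positions with a single WDL-correct move, point 1/
P_CORRECT = {
    "Lc0 1s": 0.925,      # point 2/
    "regular 1s": 0.475,  # point 3/
    "SF_dev 30s": 0.775,  # point 4/
}


def agreement_floor(a, b):
    """Share of all positions where both engines find the unique correct move.

    Assumes the two engines' errors are independent of each other.
    """
    return UNIQUE * P_CORRECT[a] * P_CORRECT[b]


PAIRS = [
    ("Lc0 1s", "Lc0 1s"),
    ("SF_dev 30s", "Lc0 1s"),
    ("SF_dev 30s", "regular 1s"),
    ("regular 1s", "regular 1s"),
]

for a, b in PAIRS:
    print(f"{a} vs {b}: at least {100 * agreement_floor(a, b):.1f}%")
```

With these midpoints, the floor for SF_dev 30s against Lc0 comes out at roughly twice its floor against a regular engine at 1 s, so under these assumptions the drift of SF_dev toward the Lc0 cluster at long times is at least qualitatively what one would expect.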
Do you have any idea why this behavior occurs? What seems plausible to you?
The similarity matrix is here:
sim version 3
Key:
1) Andscacs 0.95 (time: 100 ms scale: 10.0)
2) Ethereal 11.50 (time: 100 ms scale: 10.0)
3) Fire 7.1 (time: 100 ms scale: 10.0)
4) Fruit 2.1 (time: 100 ms scale: 10.0)
5) Komodo 13.02 (time: 100 ms scale: 10.0)
6) Lc0 11261 (time: 100 ms scale: 10.0)
7) Lc0 32930 (time: 100 ms scale: 10.0)
8) Lc0 42184 (time: 100 ms scale: 10.0)
9) Lc0 42850 (time: 100 ms scale: 10.0)
10) Lc0_320x24b (time: 100 ms scale: 10.0)
11) Lc0_320x24b x0.2 (time: 100 ms scale: 2.0)
12) Senpai 1.0 (time: 100 ms scale: 10.0)
13) SF 10 (time: 100 ms scale: 10.0)
14) SF 8 (time: 100 ms scale: 10.0)
15) SF dev (time: 100 ms scale: 10.0)
16) SF dev x10 (time: 100 ms scale: 100.0)
17) SF dev x30 (time: 100 ms scale: 300.0)
        1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17
 1. ----- 49.19 45.69 37.95 48.17 45.19 43.86 44.14 44.25 44.57 44.68 46.88 50.36 52.22 49.93 46.58 45.47
 2. 49.19 ----- 48.05 39.58 48.57 47.01 45.40 45.88 46.07 45.51 45.65 48.66 52.15 52.48 52.09 47.49 47.15
 3. 45.69 48.05 ----- 40.17 46.43 42.74 41.85 42.69 42.35 42.17 43.06 45.35 48.36 50.24 47.69 43.76 42.91
 4. 37.95 39.58 40.17 ----- 39.51 35.34 35.08 35.31 35.07 35.45 35.71 46.55 37.81 39.88 37.50 34.78 34.69
 5. 48.17 48.57 46.43 39.51 ----- 46.07 44.70 45.39 45.25 45.38 45.34 48.28 50.10 51.18 50.15 47.10 46.13
 6. 45.19 47.01 42.74 35.34 46.07 ----- 73.45 73.46 73.95 74.13 66.64 42.40 51.41 49.22 50.66 59.04 58.92
 7. 43.86 45.40 41.85 35.08 44.70 73.45 ----- 77.70 77.43 78.42 68.35 42.02 50.12 47.50 49.58 57.71 58.05
 8. 44.14 45.88 42.69 35.31 45.39 73.46 77.70 ----- 84.94 81.80 70.64 43.01 49.70 48.18 49.56 58.30 58.30
 9. 44.25 46.07 42.35 35.07 45.25 73.95 77.43 84.94 ----- 82.58 70.72 42.61 50.08 48.49 49.73 58.45 58.82
10. 44.57 45.51 42.17 35.45 45.38 74.13 78.42 81.80 82.58 ----- 72.29 42.11 50.28 48.05 50.35 59.25 59.07
11. 44.68 45.65 43.06 35.71 45.34 66.64 68.35 70.64 70.72 72.29 ----- 42.61 49.78 47.16 49.42 54.14 54.62
12. 46.88 48.66 45.35 46.55 48.28 42.40 42.02 43.01 42.61 42.11 42.61 ----- 46.42 48.07 46.56 42.70 41.94
13. 50.36 52.15 48.36 37.81 50.10 51.41 50.12 49.70 50.08 50.28 49.78 46.42 ----- 58.76 63.17 55.01 53.95
14. 52.22 52.48 50.24 39.88 51.18 49.22 47.50 48.18 48.49 48.05 47.16 48.07 58.76 ----- 57.13 52.22 51.29
15. 49.93 52.09 47.69 37.50 50.15 50.66 49.58 49.56 49.73 50.35 49.42 46.56 63.17 57.13 ----- 55.39 53.79
16. 46.58 47.49 43.76 34.78 47.10 59.04 57.71 58.30 58.45 59.25 54.14 42.70 55.01 52.22 55.39 ----- 72.27
17. 45.47 47.15 42.91 34.69 46.13 59.22 58.55 58.80 59.32 59.77 55.12 41.94 53.95 51.29 53.79 72.27 -----
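For anyone unfamiliar with the tool, here is a minimal sketch of what one entry in the matrix represents, assuming each entry is the percentage of test positions on which two engines chose the same move. The move lists are invented for illustration and the function is not the Sim tester's actual code.

```python
def similarity(moves_a, moves_b):
    """Percentage of positions on which two engines picked the same move."""
    if not moves_a or len(moves_a) != len(moves_b):
        raise ValueError("need two non-empty move lists of equal length")
    same = sum(a == b for a, b in zip(moves_a, moves_b))
    return 100.0 * same / len(moves_a)


# Hypothetical best moves for four positions, one entry per position.
engine_a = ["e2e4", "g1f3", "d2d4", "c2c4"]
engine_b = ["e2e4", "g1f3", "c2c4", "c2c4"]

print(f"{similarity(engine_a, engine_b):.2f}")  # 3 of 4 moves match: 75.00
```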