What the heck happens here?

Discussion of chess software programming and technical issues.

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

What the heck happens here?

Post by Laskos »

I took the Sim tester, which has some 8,000+ quiet positions with unclear best move, and tested engines, including versions of Lc0 nets, all at 1 second/position, plus SF_dev at much longer 10s/position and 30s/position. Also, as a check, one Lc0 at 0.2s/position.

Hardware: 4 i7 cores, RTX 2070 GPU.

The similarity matrix is here:


sim version 3

  Key:

  1) Andscacs 0.95 (time: 100 ms  scale: 10.0)
  2) Ethereal 11.50 (time: 100 ms  scale: 10.0)
  3) Fire 7.1 (time: 100 ms  scale: 10.0)
  4) Fruit 2.1 (time: 100 ms  scale: 10.0)
  5) Komodo 13.02 (time: 100 ms  scale: 10.0)
  6) Lc0 11261 (time: 100 ms  scale: 10.0)
  7) Lc0 32930 (time: 100 ms  scale: 10.0)
  8) Lc0 42184 (time: 100 ms  scale: 10.0)
  9) Lc0 42850 (time: 100 ms  scale: 10.0)
 10) Lc0_320x24b (time: 100 ms  scale: 10.0)
 11) Lc0_320x24b x0.2 (time: 100 ms  scale: 2.0)
 12) Senpai 1.0 (time: 100 ms  scale: 10.0)
 13) SF 10 (time: 100 ms  scale: 10.0)
 14) SF 8 (time: 100 ms  scale: 10.0)
 15) SF dev (time: 100 ms  scale: 10.0)
 16) SF dev x10 (time: 100 ms  scale: 100.0)
 17) SF dev x30 (time: 100 ms  scale: 300.0)

         1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17
  1.  ----- 49.19 45.69 37.95 48.17 45.19 43.86 44.14 44.25 44.57 44.68 46.88 50.36 52.22 49.93 46.58 45.47
  2.  49.19 ----- 48.05 39.58 48.57 47.01 45.40 45.88 46.07 45.51 45.65 48.66 52.15 52.48 52.09 47.49 47.15
  3.  45.69 48.05 ----- 40.17 46.43 42.74 41.85 42.69 42.35 42.17 43.06 45.35 48.36 50.24 47.69 43.76 42.91
  4.  37.95 39.58 40.17 ----- 39.51 35.34 35.08 35.31 35.07 35.45 35.71 46.55 37.81 39.88 37.50 34.78 34.69
  5.  48.17 48.57 46.43 39.51 ----- 46.07 44.70 45.39 45.25 45.38 45.34 48.28 50.10 51.18 50.15 47.10 46.13
  6.  45.19 47.01 42.74 35.34 46.07 ----- 73.45 73.46 73.95 74.13 66.64 42.40 51.41 49.22 50.66 59.04 58.92
  7.  43.86 45.40 41.85 35.08 44.70 73.45 ----- 77.70 77.43 78.42 68.35 42.02 50.12 47.50 49.58 57.71 58.05
  8.  44.14 45.88 42.69 35.31 45.39 73.46 77.70 ----- 84.94 81.80 70.64 43.01 49.70 48.18 49.56 58.30 58.30
  9.  44.25 46.07 42.35 35.07 45.25 73.95 77.43 84.94 ----- 82.58 70.72 42.61 50.08 48.49 49.73 58.45 58.82
 10.  44.57 45.51 42.17 35.45 45.38 74.13 78.42 81.80 82.58 ----- 72.29 42.11 50.28 48.05 50.35 59.25 59.07
 11.  44.68 45.65 43.06 35.71 45.34 66.64 68.35 70.64 70.72 72.29 ----- 42.61 49.78 47.16 49.42 54.14 54.62
 12.  46.88 48.66 45.35 46.55 48.28 42.40 42.02 43.01 42.61 42.11 42.61 ----- 46.42 48.07 46.56 42.70 41.94
 13.  50.36 52.15 48.36 37.81 50.10 51.41 50.12 49.70 50.08 50.28 49.78 46.42 ----- 58.76 63.17 55.01 53.95
 14.  52.22 52.48 50.24 39.88 51.18 49.22 47.50 48.18 48.49 48.05 47.16 48.07 58.76 ----- 57.13 52.22 51.29
 15.  49.93 52.09 47.69 37.50 50.15 50.66 49.58 49.56 49.73 50.35 49.42 46.56 63.17 57.13 ----- 55.39 53.79
 16.  46.58 47.49 43.76 34.78 47.10 59.04 57.71 58.30 58.45 59.25 54.14 42.70 55.01 52.22 55.39 ----- 72.27
 17.  45.47 47.15 42.91 34.69 46.13 59.22 58.55 58.80 59.32 59.77 55.12 41.94 53.95 51.29 53.79 72.27 -----
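Each entry above is a percentage of positions on which two engines agree. A minimal sketch of that computation, assuming Sim's figure is simply the share of common positions where both engines chose the same move (the engine labels and picks below are hypothetical). In the key, the effective time per position is time x scale, so 100 ms x 10.0 is the 1 s/position run and 100 ms x 300.0 is the 30 s/position run.

```python
def similarity_percent(moves_a: dict, moves_b: dict) -> float:
    """Share (in %) of common positions where two engines picked the same move.

    moves_a, moves_b: position key (e.g. an EPD string) -> chosen move.
    """
    shared = moves_a.keys() & moves_b.keys()
    if not shared:
        raise ValueError("the two runs have no positions in common")
    same = sum(1 for pos in shared if moves_a[pos] == moves_b[pos])
    return 100.0 * same / len(shared)


def similarity_matrix(runs: dict) -> dict:
    """Pairwise similarity for every pair of runs (engine label -> move map)."""
    labels = list(runs)
    return {(a, b): similarity_percent(runs[a], runs[b])
            for i, a in enumerate(labels) for b in labels[i + 1:]}


# Hypothetical picks on four positions; 3 of 4 agree, so the pair scores 75.0.
picks = {
    "engine_x": {"pos1": "e2e4", "pos2": "g1f3", "pos3": "d2d4", "pos4": "c2c4"},
    "engine_y": {"pos1": "e2e4", "pos2": "b1c3", "pos3": "d2d4", "pos4": "c2c4"},
}
print(similarity_matrix(picks))
```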
 
The dendrogram from SPSS (average linkage between groups, with correlation as the measure) is here:

Lc0_02_dendr.jpg


First, all Lc0 nets, even those from different training runs, show very high similarity among themselves. One might expect different runs to drift to different optima, if not in strength, then at least in move selection for quiet positions with no clear best move. Second, Stockfish_dev at much longer times (x10 and x30) moves away from the regular engine family, clearly towards the Lc0 family. The similarity of SF_dev x30 with SF_dev is only 53.79% (of the regular engines, only SF 10 is marginally higher, at 53.95%), but it goes above 59% with some of the Lc0 nets. At very long times per position, SF_dev clusters with the Lc0 engines. At the same time, Lc0 at very short time per position doesn't drift away from the Lc0 family.
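The clustering effect can be reproduced on a six-engine subset of the matrix with a small pure-Python average-linkage (UPGMA) sketch. It uses 100 - similarity as the distance, a simpler choice than the correlation measure used in SPSS, so it is not guaranteed to reproduce the dendrogram exactly; where the matrix is slightly asymmetric (row 17 vs column 17), the upper-triangle values are taken.

```python
from itertools import combinations

# Upper-triangle values from the matrix above (rows 8, 9, 13, 15, 16 vs later columns).
PAIRS = {
    ("Lc0 42184", "Lc0 42850"): 84.94,
    ("Lc0 42184", "SF 10"): 49.70,
    ("Lc0 42184", "SF dev"): 49.56,
    ("Lc0 42184", "SF dev x10"): 58.30,
    ("Lc0 42184", "SF dev x30"): 58.30,
    ("Lc0 42850", "SF 10"): 50.08,
    ("Lc0 42850", "SF dev"): 49.73,
    ("Lc0 42850", "SF dev x10"): 58.45,
    ("Lc0 42850", "SF dev x30"): 58.82,
    ("SF 10", "SF dev"): 63.17,
    ("SF 10", "SF dev x10"): 55.01,
    ("SF 10", "SF dev x30"): 53.95,
    ("SF dev", "SF dev x10"): 55.39,
    ("SF dev", "SF dev x30"): 53.79,
    ("SF dev x10", "SF dev x30"): 72.27,
}
NAMES = ["Lc0 42184", "Lc0 42850", "SF 10", "SF dev", "SF dev x10", "SF dev x30"]
SIM = {frozenset(pair): value for pair, value in PAIRS.items()}


def linkage(c1: frozenset, c2: frozenset) -> float:
    """Average distance (100 - similarity) over all cross-cluster pairs."""
    total = sum(100.0 - SIM[frozenset((x, y))] for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def average_linkage(names):
    """Agglomerative clustering that always merges the two closest clusters."""
    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


for a, b, d in average_linkage(NAMES):
    print(sorted(a), "+", sorted(b), f"at distance {d:.2f}")
```

On this subset the fourth merge joins SF dev x10/x30 with the two Lc0 nets (average distance about 41.53), and the SF 10/SF dev pair only joins them at the final merge (about 47.85).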

I tried to imagine what the hell this could plausibly mean.
The most plausible explanation I came up with is:

1/ Some 75-80% of quiet chess positions that apparently have several possible best moves in fact have a unique solution from the point of view of perfect (WDL) play. This seems a bit strange to me, as I expected a lower number.

2/ Lc0 at 1s/move with a good net finds the correct solution, from the perfect player's point of view, in some 90-95% of these quiet positions.

3/ Strong regular engines at 1s/move find the correct WDL solution in 40-55% of these positions.

4/ SF_dev at a much longer time control, like 30s/move, improves that to maybe 75-80%.


If you combine /1 - 4/, you get that sort of clustering. This seems plausible to me, but I don't like /1/, because 6-7 men TBs don't seem to behave like that; maybe they are not representative, though.
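A toy model makes the combination of /1 - 4/ concrete. It is a sketch under strong, invented assumptions: engines find the unique WDL-correct move independently of each other, and whenever they are not both on that move they coincide at a fixed rate (`agree_otherwise`, a free parameter not taken from any measurement).

```python
def expected_match(p_a: float, p_b: float, unique_frac: float, agree_otherwise: float) -> float:
    """Toy expected match rate between two engines (all inputs in [0, 1]).

    unique_frac:     share of positions with a single WDL-correct move (hypothesis /1/)
    p_a, p_b:        chance each engine finds that move (hypotheses /2/ to /4/)
    agree_otherwise: assumed chance of coinciding when not both on the unique move
    Errors are assumed independent, which is the weakest point of the model.
    """
    both_right = p_a * p_b
    both_wrong_same = (1 - p_a) * (1 - p_b) * agree_otherwise
    unique_part = unique_frac * (both_right + both_wrong_same)
    return unique_part + (1 - unique_frac) * agree_otherwise


U, B = 0.775, 0.4                          # midpoint of /1/, arbitrary agreement rate
LC0, SF_1S, SF_30S = 0.925, 0.475, 0.775   # midpoints of /2/, /3/ and /4/

print(f"Lc0 vs Lc0:      {expected_match(LC0, LC0, U, B):.3f}")
print(f"SF 30s vs Lc0:   {expected_match(SF_30S, LC0, U, B):.3f}")
print(f"SF 30s vs SF 1s: {expected_match(SF_30S, SF_1S, U, B):.3f}")
```

With these numbers SF at 30 s lands closer to Lc0 (about 0.65) than to SF at 1 s (about 0.41), which is the pattern in the matrix. The independence assumption clearly underestimates agreement among regular engines, though: the model gives about 0.35 for two 1 s regular engines, while the matrix shows 50-63% within the SF family.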

Do you have any idea why this behavior occurs? What seems plausible to you?
mhull
Posts: 13447
Joined: Wed Mar 08, 2006 9:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Re: What the heck happens here?

Post by mhull »

Laskos wrote: Tue Aug 20, 2019 9:22 am I took the Sim tester, which has some 8,000+ quiet positions with unclear best move, and tested engines, including versions of Lc0 nets, all at 1 second/position, plus SF_dev at much longer 10s/position and 30s/position. Also, as a check, one Lc0 at 0.2s/position.

I'm not sure I understand your question: is "I don't like /1/" a reference to "1/ Some ..." above? Anyway...

If I understand correctly, the NN serves as the eval function and MCTS as the search. The NN is the key piece of chess knowledge for Lc0, whereas the conventional programs' "knowledge" is split between eval and LMR-enhanced search. At short time controls, Lc0 hits the correct move more often because a high percentage of its understanding is in the NN. The other programs are hampered at short TC, because a high percentage of their "knowledge" is formed on the fly at depth.
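A minimal sketch of the AlphaZero-style PUCT selection rule illustrates this split: with few visits, the exploration term, driven by the network's policy prior, dominates the choice. Lc0's real selection adds refinements such as first-play urgency for unvisited moves, which are deliberately left out here; `c_puct = 1.5` is an arbitrary assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    move: str
    prior: float            # policy-head probability P(s, a)
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # sum of backed-up values, from the parent's side to move

    def q(self) -> float:
        # Unvisited children get Q = 0 here; real engines use an FPU rule instead.
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(children: list, c_puct: float = 1.5) -> Child:
    """Pick argmax of Q + c_puct * P * sqrt(N_parent) / (1 + N)."""
    parent_visits = sum(c.visits for c in children)
    sqrt_n = math.sqrt(max(1, parent_visits))  # avoid an all-zero bonus at the first visit
    return max(children, key=lambda c: c.q() + c_puct * c.prior * sqrt_n / (1 + c.visits))


root = [Child("e2e4", 0.6), Child("d2d4", 0.3), Child("c2c4", 0.1)]
print(select_child(root).move)  # with no visits yet, the highest prior wins
```

As visits accumulate, Q (the value head's verdict, averaged over the subtree) takes over from the prior, which is where search starts to matter.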
Matthew Hull
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: What the heck happens here?

Post by noobpwnftw »

It's quite straightforward:
The learning target of those NNs is to fit the outcome of a one-node evaluation to that of a multi-node search. In essence, they don't learn chess; they learn how to produce the statistically best-fitting outcome of a defined way of searching. In turn, with the rules of chess applied, this makes them mostly able to find correct moves with a relatively low number of evaluations. This also explains why they don't scale: if they do, the NNs are probably under-trained.
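That learning target can be written down as a per-position loss. The sketch below is AlphaZero-style, assuming a scalar value head trained with mean-squared error; it leaves out regularization and other details of real training pipelines. The policy from one network evaluation is pulled towards the visit distribution of a full search at the same position.

```python
import math


def alphazero_style_loss(policy: dict, visits: dict, value_pred: float, game_result: float) -> float:
    """Policy cross-entropy against search visits, plus value MSE.

    policy:      move -> network probability from a single evaluation (one node)
    visits:      move -> visit count from a multi-node MCTS at the same position
    value_pred:  network value in [-1, 1]; game_result: -1, 0 or +1
    """
    total = sum(visits.values())
    policy_loss = -sum((n / total) * math.log(policy[m])
                       for m, n in visits.items() if n > 0)
    value_loss = (value_pred - game_result) ** 2
    return policy_loss + value_loss


visits = {"e2e4": 600, "d2d4": 300, "c2c4": 100}
matched = {"e2e4": 0.6, "d2d4": 0.3, "c2c4": 0.1}
uniform = {m: 1 / 3 for m in visits}
print(alphazero_style_loss(matched, visits, 0.0, 0.0))
print(alphazero_style_loss(uniform, visits, 0.0, 0.0))
```

The policy term is smallest when the single-evaluation policy matches the search's visit distribution exactly, which is the "fit one node to many nodes" idea.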

However, in a drawn position, for example, all correct moves have exactly the same outcome, yet they don't give the opponent the same chance to blunder, so they may be evaluated differently. From a pure WDL point of view, this chance does not exist.
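A toy expected-score function shows how two moves with the same WDL value can still rank differently once opponent errors are priced in. The blunder probabilities and the post-blunder score are invented inputs for illustration, not something an engine reports.

```python
def practical_score(wdl_score: float, blunder_prob: float, score_after_blunder: float) -> float:
    """Expected score of a move against a fallible opponent (toy model).

    wdl_score:           game-theoretic value of the move (1 win, 0.5 draw, 0 loss)
    blunder_prob:        assumed chance the opponent errs in reply
    score_after_blunder: assumed value of the position after that error
    """
    return (1 - blunder_prob) * wdl_score + blunder_prob * score_after_blunder


quiet_draw = practical_score(0.5, 0.02, 1.0)   # opponent rarely goes wrong
tricky_draw = practical_score(0.5, 0.10, 1.0)  # more ways for the opponent to go wrong
print(quiet_draw, tricky_draw)
```

Under perfect play both moves are worth exactly 0.5; only the practical term separates them.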

Getting the most statistically sound move in most positions and getting to the ground truth of chess are two different things; there is nothing wrong with your observation.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: What the heck happens here?

Post by Pio »

Laskos wrote: Tue Aug 20, 2019 9:22 am I took the Sim tester, which has some 8,000+ quiet positions with unclear best move, and tested engines, including versions of Lc0 nets, all at 1 second/position, plus SF_dev at much longer 10s/position and 30s/position. Also, as a check, one Lc0 at 0.2s/position.

Hi Kai!

First of all I must say I find your posts very interesting.

I believe LMR is the culprit ;) because it uses statistics to order the moves.

It would be really interesting if you tested my hypothesis by removing LMR and increasing the search time so that Stockfish with and without LMR performs at the same strength.
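A sketch of a common logarithmic LMR formula, where a history score shifts the reduction, shows where move-ordering statistics enter the search result. The constants (`base`, `divisor`) and the 8192 history scale are illustrative assumptions, not Stockfish's or Komodo's actual values.

```python
import math


def lmr_reduction(depth: int, move_index: int, history: int,
                  base: float = 0.75, divisor: float = 2.25) -> int:
    """Plies to reduce a quiet move searched late in the move list (hypothetical constants).

    depth:      remaining depth in plies
    move_index: 1-based position of the move in the ordered move list
    history:    history-heuristic score (higher = historically better move)
    """
    if depth < 3 or move_index < 4:
        return 0  # shallow nodes and the first few moves are searched at full depth
    r = base + math.log(depth) * math.log(move_index) / divisor
    r -= history / 8192  # statistics shift the reduction: good history, less reduction
    return max(0, min(depth - 1, int(r)))


print(lmr_reduction(20, 30, 0))      # late move, neutral history
print(lmr_reduction(20, 30, 16384))  # same move with strong history is reduced less
```

Switching LMR off corresponds to returning 0 everywhere: every move then gets a full-depth search, which costs time but removes this statistical bias.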

Another interesting thing would be to use the history counters not only for ordering the moves but also to add a little bonus or malus to the evaluation. Of course, you would have to treat the bonus as a separate score, so that you don't get a feedback loop in which good history leads to a better evaluation, which leads to even better history ...
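One possible reading of "a separate score", sketched under assumptions: the history nudge is never folded into the scores that search backs up (and that drive history updates); it is only added when ranking moves at the root. The table layout, the 16384 clamp, the gravity-style update and the 512 scale are all hypothetical choices.

```python
from collections import defaultdict

HISTORY_MAX = 16384                     # assumed clamp for history counters
history = defaultdict(int)              # (piece, to_square) -> counter


def update_history(key, bonus: int) -> None:
    """Gravity-style update: pulls the counter toward +-HISTORY_MAX without exceeding it."""
    history[key] += bonus - history[key] * abs(bonus) // HISTORY_MAX


def history_nudge_cp(key, scale: int = 512) -> int:
    """Small centipawn bonus/malus derived from history (at most +-32 cp here)."""
    return history[key] // scale


def pick_root_move(search_scores: dict):
    """search_scores: move key -> centipawn score from a search that never saw the nudge.

    The nudge only re-ranks root moves, so it cannot feed back into the scores
    that drive update_history, which avoids the loop described above.
    """
    return max(search_scores, key=lambda k: search_scores[k] + history_nudge_cp(k))


update_history(("N", "f3"), 4000)
update_history(("B", "c4"), -4000)
print(pick_root_move({("N", "f3"): 15, ("B", "c4"): 15}))  # tie broken by history
```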

/Pio
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: What the heck happens here?

Post by chrisw »

Laskos wrote: Tue Aug 20, 2019 9:22 am I took the Sim tester, which has some 8,000+ quiet positions with unclear best move, and tested engines, including versions of Lc0 nets, all at 1 second/position, plus SF_dev at much longer 10s/position and 30s/position. Also, as a check, one Lc0 at 0.2s/position.

I took a look at the SIMEX EPDs and classified them by the material on the board.
There are 8238 distinct EPDs in 16 material configurations; the count for each is below. It's not a balanced suite across chess positions, which you would probably need before drawing conclusions beyond simple general similarity.

KQRBNP 5202
KRBNP 999
KQRBP 492
KRBP 301
KRP 268
KQRNP 207
KQRP 152
KBNP 149
KRNP 147
KQBNP 113
KBP 57
KQBP 47
KNP 44
KQP 34
KP 15
KQNP 11
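The 16 counts above add up to 8238, matching the number of EPDs. A minimal sketch of such a classification, assuming the labels list the piece types present for both sides combined, in KQRBNP order (the post doesn't say exactly how positions were grouped):

```python
from collections import Counter

ORDER = "KQRBNP"


def material_signature(epd: str) -> str:
    """Piece types present on the board, both colours merged, e.g. 'KRBP'."""
    placement = epd.split()[0]  # first EPD field: piece placement
    present = {ch.upper() for ch in placement if ch.isalpha()}
    return "".join(p for p in ORDER if p in present)


def classify(epds) -> Counter:
    return Counter(material_signature(e) for e in epds)


sample = [
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -",
    "8/8/4k3/8/8/4K3/4P3/8 w - -",
    "r5k1/5ppp/8/8/8/8/5PPP/3R2K1 w - -",
]
for signature, count in classify(sample).most_common():
    print(signature, count)
```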
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: What the heck happens here?

Post by Rebel »

Laskos wrote: Tue Aug 20, 2019 9:22 am I took the Sim tester, which has some 8,000+ quiet positions with unclear best move, and tested engines, including versions of Lc0 nets, all at 1 second/position, plus SF_dev at much longer 10s/position and 30s/position. Also, as a check, one Lc0 at 0.2s/position.
Can you zip similarity.data and attach it here?

I want to produce an HTML page with SIMEX.
90% of coding is debugging, the other 10% is writing bugs.
Laskos

Re: What the heck happens here?

Post by Laskos »

Rebel wrote: Wed Aug 21, 2019 9:30 am
Laskos wrote: Tue Aug 20, 2019 9:22 am I took the Sim tester, which has some 8,000+ quiet positions with unclear best move, and tested engines, including versions of Lc0 nets, all at 1 second/position, plus SF_dev at much longer 10s/position and 30s/position. Also, as a check, one Lc0 at 0.2s/position.
Can you zip similarity.data and attach it here?

I want to produce an HTML page with SIMEX.
The zip file is too large to be attached here (260KB).
Also, I was not that careful with several regular engines: I may have forgotten to set 1 or 2 of them to 4 threads, as I did for all the other regular engines. The goal was to see the behavior of SF at greatly increased TC. In principle it should be fine; I am just not sure.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: What the heck happens here?

Post by Laskos »

chrisw wrote: Tue Aug 20, 2019 10:10 pm

I took a look at the SIMEX EPDs and classified them by the material on the board.
There are 8238 distinct EPDs in 16 material configurations; the count for each is below. It's not a balanced suite across chess positions, which you would probably need before drawing conclusions beyond simple general similarity.

I guess it shouldn't be that bad. I had a look too; one doesn't need a set that is representative of in-game positions in general. For example, if the set were representative of games move by move, half of the positions would be endgame positions, and I would avoid that.
Laskos

Re: What the heck happens here?

Post by Laskos »

Pio wrote: Tue Aug 20, 2019 10:03 pm
Laskos wrote: Tue Aug 20, 2019 9:22 am I took the Sim tester, which has some 8,000+ quiet positions with unclear best move, and tested engines, including versions of Lc0 nets, all at 1 second/position, plus SF_dev at much longer 10s/position and 30s/position. Also, as a check, one Lc0 at 0.2s/position.

Hi Kai!

First of all I must say I find your posts very interesting.

I believe LMR is the culprit ;) because it uses statistics to order the moves.

It would be really interesting if you tested my hypothesis by removing LMR and increasing the search time so that Stockfish with and without LMR performs at the same strength.

Another interesting thing would be to use the history counters not only for ordering the moves but also to add a little bonus or malus to the evaluation. Of course, you would have to treat the bonus as a separate score, so that you don't get a feedback loop in which good history leads to a better evaluation, which leads to even better history ...

/Pio

Thanks, very interesting.

I am now testing Komodo 13.1 with and without LMR. The time handicap needed for equal strength seems to be a factor of 3-4 (I'm not sure how it scales). So far I have tested regular Komodo at the regular time per position and no-LMR Komodo at three times that. Now a run with no-LMR at 30x time per position is going; let's see if it migrates to the Lc0 branch. It will take some time. For now, Komodo 13.1 (no-LMR at 3x time, i.e. the same strength) clusters as here:
Lc0_03_dendr.jpg
Rebel

Re: What the heck happens here?

Post by Rebel »

Laskos wrote: Wed Aug 21, 2019 12:02 pm
Rebel wrote: Wed Aug 21, 2019 9:30 am
Laskos wrote: Tue Aug 20, 2019 9:22 am I took the Sim tester, which has some 8,000+ quiet positions with unclear best move, and tested engines, including versions of Lc0 nets, all at 1 second/position, plus SF_dev at much longer 10s/position and 30s/position. Also, as a check, one Lc0 at 0.2s/position.
Can you zip similarity.data and attach it here?

I want to produce an HTML page with SIMEX.
The zip file is too large to be attached here (260KB).
Also, I was not that careful with several regular engines: I may have forgotten to set 1 or 2 of them to 4 threads, as I did for all the other regular engines. The goal was to see the behavior of SF at greatly increased TC. In principle it should be fine; I am just not sure.
Okay, I did the SIMEX test with the engines I have, but only at depth=1, thus cancelling (the main) search. So this basically tests the evaluation functions for similarity, an alternative way of looking at things.

http://rebel13.nl/html/kai.html

NN are the Lc0 network files you used.
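The two kinds of runs in this thread differ only in the UCI `go` command: fixed time per position (`go movetime 1000` for 1 s) versus fixed depth (`go depth 1` for the eval-only run). Setting the thread count explicitly for every engine also removes the doubt raised above about which engines ran on 4 threads. Below is a sketch that only builds the command lines. It assumes each engine exposes a `Threads` option (not every engine has one) and leaves out the engine I/O, including waiting for `uciok` and `readyok`.

```python
from typing import Optional


def uci_commands(fen: str, threads: int,
                 movetime_ms: Optional[int] = None, depth: Optional[int] = None) -> list:
    """UCI lines to send for one test position (engine I/O not shown).

    Exactly one of movetime_ms (fixed time per position) or depth
    (e.g. depth=1 for an eval-only comparison) must be given.
    """
    if (movetime_ms is None) == (depth is None):
        raise ValueError("give exactly one of movetime_ms or depth")
    go = f"go movetime {movetime_ms}" if movetime_ms is not None else f"go depth {depth}"
    return [
        "uci",                                      # caller waits for "uciok"
        f"setoption name Threads value {threads}",  # set explicitly for every engine
        "ucinewgame",
        "isready",                                  # caller waits for "readyok"
        f"position fen {fen}",
        go,
    ]


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print("\n".join(uci_commands(start, threads=4, movetime_ms=1000)))
print(uci_commands(start, threads=4, depth=1)[-1])
```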