Komodo MCTS scales worse with TC than Komodo A/B?

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by Laskos »

Laskos wrote: Mon Nov 12, 2018 11:51 pm
lkaufman wrote: Fri Nov 09, 2018 5:42 am A couple points worth mentioning. If you want to eliminate the possible distortion of one engine simply being much stronger than the other, I suggest you test Komodo 12.2 mcts (or wait for the bugfix 12.2.1) vs. Komodo 9 (or 9.02 or 9.1 if you prefer), which is our best free version and is very evenly matched with Komodo 12.2 MCTS in my tests. But most likely you will find that normal Komodo scales better from 1 to 10 seconds on one thread. The reason is that at one second per move, Komodo MCTS doesn't have enough time to really "do its thing" and is more or less a crippled normal Komodo. But at ten seconds per move (or even five) the MCTS aspect is in full effect. So my main point is that how Komodo MCTS scales from 1 to 10 seconds on one thread is not predictive of how it would scale from 5" to 50". My data is inconclusive on this point, I think the scaling is pretty similar. We'll know when CCRL has ratings for both 40/4 and 40/40 for Komodo MCTS, or CEGT for 40/4 and 40/20, or fastgm for 10' and 60', which can be compared with Komodo 9.
I think your three queens solution to the draw problem is interesting, but perhaps not so predictive of normal chess. My preferred solution to the draw problem is to start with positions evaluated around 0.7 or so by Komodo, counting draws as wins for the bad side. With alternating colors, no draws at all, equal chances, and reasonably normal chess.
In 4 days I managed to run the test you proposed, and I interpreted the result using a mathematically sound pentanomial variance (error margin) for paired (side-reversed) games, derived by Michel Van den Bergh and me and described briefly here: https://www.chessprogramming.org/Match_Statistics. My openings are markedly unbalanced (80cp-100cp advantage for White), each is played twice with sides reversed, and the draw rate is kept fairly low. The correct pentanomial error margins in this case are 1.8-2.2 times smaller than the naive trinomial error margins usually shown in rating tools, because the outcomes of the two games in a pair are highly correlated.
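
To illustrate why the pentanomial margin comes out smaller, here is a minimal sketch that computes both error estimates from the same paired results. The pair list is made up for illustration (it is not the data from this match), and the function name is hypothetical. The trinomial estimate treats all games as independent, while the pentanomial estimate treats each side-reversed pair as the independent unit.

```python
import math

def score_and_errors(pairs):
    """pairs: list of (score_game1, score_game2) for one engine,
    each score in {0, 0.5, 1}; the two games share an opening with sides reversed."""
    games = [s for pair in pairs for s in pair]
    n_games = len(games)
    mu = sum(games) / n_games  # mean score per game

    # Naive trinomial estimate: every game is treated as independent.
    var_game = sum((s - mu) ** 2 for s in games) / n_games
    se_trinomial = math.sqrt(var_game / n_games)

    # Pentanomial estimate: the pair is the independent unit.
    # The per-game average of a pair takes the values 0, 0.25, 0.5, 0.75, 1.
    pair_means = [(a + b) / 2 for a, b in pairs]
    n_pairs = len(pairs)
    var_pair = sum((m - mu) ** 2 for m in pair_means) / n_pairs
    se_pentanomial = math.sqrt(var_pair / n_pairs)

    return mu, se_trinomial, se_pentanomial

# Hypothetical 50 pairs (100 games) from unbalanced openings:
# most pairs are split 1-1 because the favoured colour wins both games.
pairs = (
    [(1, 0)] * 30        # won with the good side, lost with the bad side
    + [(0.5, 0.5)] * 6
    + [(1, 0.5)] * 6
    + [(0.5, 0)] * 6
    + [(1, 1)] * 1
    + [(0, 0)] * 1
)

mu, se_tri, se_penta = score_and_errors(pairs)
ratio = se_tri / se_penta
print(f"score={mu:.3f} trinomial SE={se_tri:.4f} pentanomial SE={se_penta:.4f} ratio={ratio:.2f}")
```

With this made-up, strongly correlated data the trinomial error is roughly twice the pentanomial one, in the same range as the 1.8-2.2 factor quoted above.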

The tests are at 6'' per move and 60'' per move, on 1 thread of a 3.8 GHz i7 (4 concurrent games running on 4 cores). I set hash to 512MB in both cases. The results are:

6'' per move:


Games Completed = 100 of 100 (Avg game length = 804.127 sec)
Settings = Gauntlet/512MB/6000ms per move/M 1500cp for 5 moves, D 160 moves/EPD:C:\LittleBlitzer\OP_08_10_W_Trim.epd(5840)
Time = 20394 sec elapsed, 0 sec remaining
 1.  Komodo 9.1               	49.0/100	37-39-24  	(L: m=39 t=0 i=0 a=0)	(D: r=4 i=18 f=0 s=1 a=1)	(tpm=5736.5 d=27.38 nps=2090592)
 2.  Komodo 12.2 MCTS         	51.0/100	39-37-24  	(L: m=37 t=0 i=0 a=0)	(D: r=4 i=18 f=0 s=1 a=1)	(tpm=5934.5 d=12.90 nps=2064)
Komodo 9.1: -6.9 Elo points, with a 1 sigma pentanomial error margin of 15.9 Elo points.


60'' per move


Games Completed = 100 of 100 (Avg game length = 9338.734 sec)
Settings = Gauntlet/512MB/60000ms per move/M 1500cp for 5 moves, D 160 moves/EPD:C:\LittleBlitzer\OP_08_10_W_Trim.epd(5840)
Time = 241335 sec elapsed, 0 sec remaining
 1.  Komodo 9.1               	55.5/100	40-29-31  	(L: m=29 t=0 i=0 a=0)	(D: r=8 i=17 f=1 s=0 a=5)	(tpm=57467.1 d=33.96 nps=2194737)
 2.  Komodo 12.2 MCTS         	44.5/100	29-40-31  	(L: m=39 t=0 i=0 a=1)	(D: r=8 i=17 f=1 s=0 a=5)	(tpm=59428.3 d=17.66 nps=1658)
Komodo 9.1: +38.4 Elo points, with a 1 sigma pentanomial error margin of 15.2 Elo points.

============================================================


  • Difference: 45.3 Elo points.
    1 sigma (pentanomial) for the difference: 22.0 Elo points.

There is a 98.0% likelihood that Komodo 12.2 MCTS scales worse than Komodo 9.1 A/B. That is already a fairly significant result.
Applying your adjudication (not very mathematically sound), that is, counting every draw as a loss for the side with the advantageous starting position, we still get a fairly significant result.

For STC we have:
52:48 MCTS against A/B
For LTC we have:
40:60 MCTS against A/B

That gives a 96% likelihood that A/B scales better, which is again fairly significant, although the mathematically sound pentanomial variance gives an even higher likelihood of 98%.
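
The figures above can be reproduced approximately with the sketch below. It uses the standard logistic Elo formula and the scores and error margins reported in the post. Two things are assumptions on my part: that the two error margins combine in quadrature as independent errors, and that the 96% figure for the adjudicated scores comes from a plain binomial error on 100 games per time control (the post does not say how it was computed).

```python
import math

def elo_from_score(score):
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Komodo 9.1 scores from the two matches (49.0/100 and 55.5/100).
elo_stc = elo_from_score(0.490)   # 6'' per move
elo_ltc = elo_from_score(0.555)   # 60'' per move
diff = elo_ltc - elo_stc

# Assumption: the two 1 sigma pentanomial margins are independent.
sigma_diff = math.hypot(15.9, 15.2)
p_pentanomial = normal_cdf(diff / sigma_diff)

def binomial_se(p, n):
    """Standard error of a proportion p estimated from n independent games."""
    return math.sqrt(p * (1.0 - p) / n)

# Adjudicated scores of MCTS: 52% at STC, 40% at LTC.
# Assumption: plain binomial errors, 100 games each.
shift = 0.52 - 0.40
se_shift = math.hypot(binomial_se(0.52, 100), binomial_se(0.40, 100))
p_adjudicated = normal_cdf(shift / se_shift)

print(f"Elo STC={elo_stc:.1f} LTC={elo_ltc:.1f} diff={diff:.1f} sigma={sigma_diff:.1f}")
print(f"P(A/B scales better): pentanomial={p_pentanomial:.3f} adjudicated={p_adjudicated:.3f}")
```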
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by lkaufman »

Thanks for running this. I don't think that the poor scaling is inherent to MCTS. Rather, we have tuned our parameters at around 3 seconds per move, which means they will be poorly tuned for time controls far from that level. So I have to start doing some tuning runs at much longer time controls to address this problem. We improved Komodo MCTS in blitz by 225 Elo since the last release according to CEGT testing, while I would expect the gains at much longer time controls to be more in the ballpark of 150 or so. We currently have MCTS parameter values set independent of available time, but there is no reason this has to be so. It's just a matter of devoting the resources to seeing what works with more time.

Quick question: What percentage of games were won by the superior side? I estimate +70 cp is about the point where Komodo wins half of the games, and you were using more than that.
Komodo rules!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by Laskos »

Yes, I also observed that, for example, the parameter "MCTS Explore" can have a different optimal value at different TC, with a pretty heavy impact. It seems MCTS is still in a crude, not yet fine-tuned phase, which is normal for such a new approach. It's a bit premature to talk about the scaling now, until things settle a bit through tuning of the search.

As for wins by the superior side in the 80cp-100cp range:
at 6s/move: 72/100
at 60s/move: 64/100

All in all, about 65-70%.

Anyway, congratulations on such a strong MCTS engine, with probably the best MC search implementation.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by lkaufman »

Thanks. Your 65-70% win rate suggests to me that if the goal is 50%, the proper advantage may be more like 0.6 or so. Regarding MCTS Explore, do you have reason to believe that the number should go up with more time, or go down? Knowing this would save us some testing time.
Komodo rules!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by Laskos »

Probably up. At least a test suite that measures strength pretty well, ERET.epd with 111 positions, seems to confirm that. In real games I have checked only at the shortest time control, 3s/move, that the default MCTS Explore=6 is the best. For longer TC I am unable to get a sufficient number of games. 3s/move is about the shortest TC one can use for testing the MCTS version, both on suites and in games. Here are the results for the suite:

3.0s/position:

MCTS Explore=4:
score=21/111 [averages on correct positions: depth=12.2 time=2.53 nodes=419]

Def (6):
score=29/111 [averages on correct positions: depth=12.2 time=2.48 nodes=399]

MCTS Explore=12:
score=25/111 [averages on correct positions: depth=11.8 time=2.35 nodes=368]



10.0s/position:

Def (6):
score=22/111 [averages on correct positions: depth=12.6 time=3.46 nodes=582]

MCTS Explore=12:
score=38/111 [averages on correct positions: depth=12.9 time=4.11 nodes=656]

MCTS Explore=18:
score=35/111 [averages on correct positions: depth=12.8 time=3.83 nodes=622]



60s/position:

Def (6):
score=38/111 [averages on correct positions: depth=13.3 time=6.58 nodes=1032]

MCTS Explore=12:
score=45/111 [averages on correct positions: depth=14.1 time=11.96 nodes=1783]

MCTS Explore=18:
score=51/111 [averages on correct positions: depth=13.9 time=11.06 nodes=1575]

MCTS Explore=24:
score=49/111 [averages on correct positions: depth=13.9 time=10.29 nodes=1607]



We see that while the default value 6 is the best at 3s/move, the best seems to be 12 at 10s/move and 18 at 60s/move.
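
For reference, the suite scores above can be tabulated and the best Explore value per time limit picked out programmatically. This is only a convenience sketch over the numbers already listed; the variable names are made up, and the key 6 stands for the default setting.

```python
# ERET suite scores (solved positions out of 111), copied from the results above.
# Outer key: seconds per position. Inner key: "MCTS Explore" value (6 is the default).
suite_scores = {
    3.0: {4: 21, 6: 29, 12: 25},
    10.0: {6: 22, 12: 38, 18: 35},
    60.0: {6: 38, 12: 45, 18: 51, 24: 49},
}

# For each time limit, pick the Explore value with the highest score.
best_explore = {
    tc: max(scores, key=scores.get) for tc, scores in suite_scores.items()
}

for tc, explore in best_explore.items():
    solved = suite_scores[tc][explore]
    print(f"{tc:>5.1f}s/position: best MCTS Explore={explore} ({solved}/111)")
```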
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by lkaufman »

Thanks, that's quite helpful. Do you have some comparison numbers for other engines? I'm wondering if the problem-solving score of Komodo MCTS is in the ballpark of the score of similarly rated normal engines (Ethereal 11, SF 6, Komodo 9 for example). On the Arasan set Komodo MCTS seems to score less than I would expect from its rating, which I guess is not surprising since the idea of MCTS engines is to find moves likely to work rather than unlikely moves that work only in an exact sequence. But Komodo MCTS is kind of a hybrid, so it may not be too bad at problems.
Komodo rules!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by Laskos »

The test was on one thread. Here are the results on 4 threads at 10s/position for regular engines and Lc0, which can be compared roughly to 60s/position of Komodo MCTS:


10 seconds/position 
4 threads 
512 MB Hash



Houdini 6.03 Tactical 
Result: 81 out of 111 = 72.9%. Average time = 1.80s / 15.44

SF_dev
Result: 80 out of 111 = 72.0%. Average time = 1.60s / 18.06

Houdini 6.03
Result: 78 out of 111 = 70.2%. Average time = 1.92s / 16.83

Komodo 12.1.1
Result: 71 out of 111 = 63.9%. Average time = 2.26s / 17.97

Lc0 v18.1 ID11261
Result: 59 out of 111 = 53.1%. Average time = 1.73s / 7.67

Deep Shredder 13
Result: 55 out of 111 = 49.5%. Average time = 2.68s / 20.96

Ethereal 11.00
Result: 53 out of 111 = 47.7%. Average time = 2.99s / 19.96

Andscacs 0.94
Result: 49 out of 111 = 44.1%. Average time = 2.17s / 18.65

Fire 7.1
Result: 45 out of 111 = 40.5%. Average time = 3.36s / 15.82
So, at its best setting, Komodo MCTS seems to perform on this suite just a bit below or on par with regular engines. But at default settings and 60s/position, it seems to underperform on the suite quite a bit.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by lkaufman »

Ten seconds on four threads is typically equal to 30 seconds on one thread. Interpolating logarithmically between your peak results at ten and 60 seconds gives Komodo MCTS about 46 at that level, a point more than the very strong Fire 7.1 but 7 below Ethereal 11, which is rated a bit below Komodo 12.2.2 MCTS so far. So I completely agree with your comments. We are now testing your suggestion that we can use a higher Explore value at longer time limits. So far the evidence suggests that this is so, but it is far from significant. More games and other values should clarify.
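
The interpolation above can be written out as follows. This is a sketch assuming a straight line in log(time) between the two peak suite scores (38/111 at 10s and 51/111 at 60s); the function name is made up.

```python
import math

def log_interpolate(t, t_lo, y_lo, t_hi, y_hi):
    """Linear interpolation in log(time) between two (time, score) points."""
    frac = math.log(t / t_lo) / math.log(t_hi / t_lo)
    return y_lo + (y_hi - y_lo) * frac

# Peak Komodo MCTS suite scores: 38 at 10s (Explore=12), 51 at 60s (Explore=18).
estimate_30s = log_interpolate(30.0, 10.0, 38, 60.0, 51)
print(f"Estimated peak score at 30s/position: {estimate_30s:.1f} out of 111")
```

The estimate lands near 46, between Fire 7.1 (45) and Ethereal 11.00 (53) in the 4-thread list.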
Komodo rules!
Gary Internet
Posts: 60
Joined: Thu Jan 04, 2018 7:09 pm

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by Gary Internet »

Komodo 12.2 MCTS just hit the FastGM rating list at bullet time control - http://fastgm.de/60-0.60.html

It places right between Ethereal 11.00 and Fire 7.1. On the complete list it ranks exactly one place above Komodo 9 :D

TCEC Division 4 will be a good test of how it performs at long time control. It's already crashed once apparently although the crash isn't being displayed on the TCEC GUI because of a bug of some kind.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by lkaufman »

That placement is totally in line with my own testing. We already found and fixed a bug that was probably related to the crash, although I'm not sure that the problem is fully solved yet. I think that it is likely that the huge number of threads rather than the long time control may be related to the problem. It is already clear that "MCTS Explore" should be raised with longer time limits and/or more threads, but how much and what the relationship is have yet to be determined.
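
As a rough illustration of what a time-dependent setting could look like, the sketch below fits a straight line in log(time) through the three best suite values reported earlier in the thread (6 at 3s, 12 at 10s, 18 at 60s). This is purely a hypothetical starting point: the three points come from a test suite on one thread rather than from games, the functional form is an assumption, and nothing here reflects how Komodo actually sets the parameter.

```python
import math

# (seconds per move, best "MCTS Explore" on the ERET suite), from the thread.
points = [(3.0, 6), (10.0, 12), (60.0, 18)]

# Ordinary least squares for explore = intercept + slope * ln(time).
xs = [math.log(t) for t, _ in points]
ys = [float(e) for _, e in points]
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
intercept = y_mean - slope * x_mean

def suggested_explore(seconds_per_move):
    """Hypothetical Explore value from the fitted log-linear trend."""
    return intercept + slope * math.log(seconds_per_move)

for t in (3.0, 10.0, 30.0, 60.0):
    print(f"{t:>5.1f}s/move -> Explore ~ {suggested_explore(t):.1f}")
```

Any such curve would still need to be checked in games at each time control and thread count.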
Komodo rules!