In my opinion it is better to decide later whether to use lazy SMP.
I suggest using lazy SMP in the final only if there is some evidence supporting the conjecture that lazy SMP is better at long time control.
In other words, use lazy SMP only if you see it score more than 50% against master at long time control; otherwise do not use it.
So far I have not seen more than 50%. The only long time control match that I know of is the following one, in which lazy SMP got one loss and many draws. Unfortunately the starting positions of this match were not good: they were too drawish. IMO it is better to use an opening like 1.e4 h5, where it is clear that White has an advantage but not clear whether White wins, so that hopefully we will get 1.5:0.5 in some of the matches.
http://talkchess.com/forum/viewtopic.ph ... 1&start=10
Martin on the SF loss on time
-
- Posts: 10314
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: Martin on the SF loss on time
Quote: elo differences generally shrink massively between 60" games and TCEC,

Do you mean elo difference, as measured by score? That is an incorrect measure. You have to compare apples with apples.
The correct measure is "resolution", i.e. the elo difference (as measured by score) divided by the error bars (mainly controlled by the draw ratio).
I am quite interested to know if resolution indeed goes down with TC.
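A minimal sketch of how this "resolution" could be computed from a match result, assuming logistic Elo and a delta-method error bar. The function names are hypothetical and are not taken from any rating tool.

```python
import math

def score_and_stderr(wins, draws, losses):
    """Mean score per game (win=1, draw=0.5, loss=0) and its standard error."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result around the mean score.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    return score, math.sqrt(var / n)

def elo_from_score(score):
    """Logistic Elo difference implied by a score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def resolution(wins, draws, losses):
    """Elo difference divided by its one-sigma error bar.

    Assumes a non-degenerate result (0 < score < 1 and not all draws).
    """
    score, se = score_and_stderr(wins, draws, losses)
    # Slope of the logistic curve converts the score error into an Elo error.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return elo_from_score(score) / (slope * se)
```

For the same score and the same number of games, a result with more draws has a smaller error bar and therefore a higher resolution, which is why the draw ratio matters here.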
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Martin on the SF loss on time
Michel wrote:
I am quite interested to know if resolution indeed goes down with TC.

It's a good question; the answer again seems to be empirical. A couple of years ago I did a test with Houdini, and the ratio wins/losses seemed either pretty constant with TC (at longer TC) or mildly increasing. For Komodo-SF matches at TCEC, there was an apparent increase in wins/losses compared to shorter TC. I quickly picked two parameters which seem reasonable to me:
draw_ratio -- monotonic with TC
win/loss
Then:
sigma = sqrt(1 - draw_ratio) -- for small difference between win and loss;
win - loss = (win/loss - 1 + draw_ratio - draw_ratio*win/loss)/(1 + win/loss)
1/ Constant win/loss (some empirical data suggest it)
(win-loss)/sigma: [image not preserved]
2/ win/loss increases as 1/sqrt(1-draw_ratio) (some other empirical data suggest it, especially Komodo-SF matches)
(win-loss)/sigma: [image not preserved]
To establish a more general rule, one has to take some database with engines at different TC, but I bet it will vary with the engine and TC chosen.
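The two scenarios can be tabulated with a short sketch that uses only the formulas given in this post. The starting ratio `R0` is a hypothetical value chosen for illustration, not a measured one.

```python
import math

def win_minus_loss(draw_ratio, win_loss):
    """(win - loss) as a fraction of games, using the formula from the post."""
    return (win_loss - 1 + draw_ratio - draw_ratio * win_loss) / (1 + win_loss)

def resolution(draw_ratio, win_loss):
    """(win - loss) / sigma with the approximation sigma = sqrt(1 - draw_ratio)."""
    return win_minus_loss(draw_ratio, win_loss) / math.sqrt(1 - draw_ratio)

R0 = 1.5  # hypothetical win/loss ratio at zero draws, for illustration only

for d in (0.2, 0.4, 0.6, 0.8):
    constant = resolution(d, R0)                    # scenario 1: win/loss constant
    growing = resolution(d, R0 / math.sqrt(1 - d))  # scenario 2: win/loss ~ 1/sqrt(1 - d)
    print(f"draw_ratio={d:.1f}  constant: {constant:.3f}  growing: {growing:.3f}")
```

With win/loss held constant the expression reduces to (r - 1)/(r + 1) * sqrt(1 - draw_ratio), so resolution falls as the draw ratio rises. In the second scenario the trend depends on the chosen starting ratio.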
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Martin on the SF loss on time
lucasart wrote:
I think SF should go with the master branch, for the final. Lazy SMP is stronger than master, but in 3h games the difference should be small:
* We do not have statistically reliable data to know the elo gain at 3h games, and we never will (because statistically reliable means tens of thousands of games, which we cannot do at this tc).

If the Lazy SMP version scales well, the Elo difference on many cores should be large enough that a reasonable number of games would show it.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Martin on the SF loss on time
I tried to investigate a bit more and to incorporate a draw model (assuming logistic ELO), but it seems the TC scaling has little to do with the usual Bayeselo and Davidson draw models. Then I took the "fastgm" website results at 10 minutes per game and 1 minute per game; for example, Stockfish 6's performance looks like this:
60s + 0.6s
Stockfish 6 64 3369 : 2250 (+1195, =859, -196), 72.2 %
vs.                  : games (  +,   =,   -),  (%) : Diff
Komodo 9 64-bit      :   250 ( 79, 118,  53), 55.2 :  +16
Houdini 4 x64        :   250 ( 98, 116,  36), 62.4 :  +66
Gull 3 x64           :   250 (117, 102,  31), 67.2 : +150
Fire 4 x64           :   250 (128, 106,  16), 72.4 : +163
Equinox 3.30 x64mp   :   250 (138,  93,  19), 73.8 : +192
Critter 1.6a 64-bit  :   250 (127, 104,  19), 71.6 : +192
Bouquet 1.8 x64      :   250 (155,  88,   7), 79.6 : +218
Deep Rybka 4.1 x64   :   250 (171,  70,   9), 82.4 : +263
Hannibal 1.5 x64     :   250 (182,  62,   6), 85.2 : +369
600s + 6s
Stockfish 6 3130 : 2700 (+1163,=1390,-147), 68.8 %
vs.                  : games (  +,   =,   -),  (%) : Diff
Komodo 9             :   300 ( 43, 210,  47), 49.3 :   -3
Houdini 4            :   300 (111, 154,  35), 62.7 :  +73
Gull 3               :   300 (109, 176,  15), 65.7 : +115
Fire 4               :   300 (108, 179,  13), 65.8 : +122
Equinox 3.30         :   300 (134, 160,   6), 71.3 : +163
Critter 1.6a         :   300 (145, 147,   8), 72.8 : +174
Bouquet 1.8          :   300 (167, 128,   5), 77.0 : +201
Rybka 4.1            :   300 (174, 111,  15), 76.5 : +208
Hannibal 1.5         :   300 (172, 125,   3), 78.2 : +249
A global result for this data is:
60s + 0.6s
Stockfish 6 : 2250 (+1195, =859,-196), 72.2 %
d=0.382
w/l=6.10
w-l=0.444
sigma=0.6489
(w-l)/sigma = 0.684
600s + 6s
Stockfish 6 : 2700 (+1163,=1390,-147), 68.8 %
d=0.515
w/l=7.91
w-l=0.376
sigma=0.5863
(w-l)/sigma = 0.641
Resolution decreases with TC. Win/loss does indeed increase with TC, but not enough to offset the diminishing (win-loss), which decreases more than sigma.
I would call this the "large strength difference" decrease in resolution with TC, and it is very well fitted by the model win/loss ~ C/(1-draw) with C between 3 and 4. The plot of resolution versus draw rate looks like this: [plot not preserved]
If one picks only results between engines close in strength from the database, the resolution increases with TC, and now we have a "small strength difference" increase in resolution with TC. It is described by the same win/loss ~ C/(1-draw), but now C is around 1: [plot not preserved]
I think that Miguel's wilos are better suited to describe these results than our usual elos.
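The global figures can be re-derived from the win/draw/loss totals with a short sketch. The sigma formula here is an assumption: the post does not spell it out, but scoring games as +1/0/-1 and taking sqrt((w + l) - (w - l)^2) reproduces the quoted values to about three decimals. The name `summarize` is hypothetical.

```python
import math

def summarize(wins, draws, losses):
    """Per-game statistics with wins scored +1, draws 0 and losses -1."""
    n = wins + draws + losses
    w, d, l = wins / n, draws / n, losses / n
    mean = w - l
    # Assumed definition: standard deviation of a single game's +1/0/-1 result.
    sigma = math.sqrt((w + l) - mean ** 2)
    return {
        "d": d,
        "w/l": wins / losses,
        "w-l": mean,
        "sigma": sigma,
        "resolution": mean / sigma,
        "C": (wins / losses) * (1 - d),  # C in the model win/loss ~ C / (1 - draw)
    }

blitz = summarize(1195, 859, 196)    # 60s + 0.6s totals quoted above
longer = summarize(1163, 1390, 147)  # 600s + 6s totals quoted above

for name, stats in (("60s + 0.6s", blitz), ("600s + 6s", longer)):
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

Solving the model for C at each time control gives a value between 3 and 4 in both cases, in line with the fit described above.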
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Martin on the SF loss on time
lucasart wrote:
I think SF should go with the master branch, for the final. Lazy SMP is stronger than master, but in 3h games the difference should be small:
* We do not have statistically reliable data to know the elo gain at 3h games, and we never will (because statistically reliable means tens of thousands of games, which we cannot do at this tc).
syzygy wrote:
If the Lazy SMP version scales well, the Elo difference on many cores should be large enough that a reasonable number of games would show it.

It is not clear what "scales" means in this context. I have seen references to BOTH higher NPS, AND longer time-to-depth. If I had time, I'd run a few tests on it to see what it does on my 20 or 24 core boxes... But that's time away from working on my code, which is not exactly a good use of time.
-
- Posts: 10314
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Martin on the SF loss on time
lucasart wrote:
I think SF should go with the master branch, for the final. Lazy SMP is stronger than master, but in 3h games the difference should be small:
* We do not have statistically reliable data to know the elo gain at 3h games, and we never will (because statistically reliable means tens of thousands of games, which we cannot do at this tc).
syzygy wrote:
If the Lazy SMP version scales well, the Elo difference on many cores should be large enough that a reasonable number of games would show it.
bob wrote:
It is not clear what "scales" means in this context. I have seen references to BOTH higher NPS, AND longer time-to-depth. If I had time, I'd run a few tests on it to see what it does on my 20 or 24 core boxes... But that's time away from working on my code, which is not exactly a good use of time.

"Scales well" means performing significantly better at longer time control.
The claim is that lazy SMP with many cores performs better than the previous algorithm at long time control.
If there is an advantage of at least 30 elo at time controls other than blitz, then it is possible to show the elo advantage with a test of some hundreds of games, and you do not need thousands of games.
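The sample-size argument can be checked with a rough sketch. It assumes logistic Elo, a normal approximation, a fixed draw ratio, and a requirement that the mean score sit `z` standard errors above 50%. The draw ratio of 0.6 and the function names are hypothetical choices for illustration.

```python
import math

def expected_score(elo_diff):
    """Expected score under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, draw_ratio, z=1.96):
    """Games needed for the mean score to be z standard errors above 50%."""
    score = expected_score(elo_diff)
    wins = score - draw_ratio / 2.0    # win fraction consistent with score and draws
    losses = 1.0 - draw_ratio - wins
    if wins < 0 or losses < 0:
        raise ValueError("draw ratio is incompatible with this Elo difference")
    # Per-game variance of the result (win=1, draw=0.5, loss=0).
    var = (wins * (1 - score) ** 2
           + draw_ratio * (0.5 - score) ** 2
           + losses * score ** 2)
    return math.ceil(z * z * var / (score - 0.5) ** 2)

for elo in (5, 10, 30):
    print(f"{elo:>2} elo at 60% draws: about {games_needed(elo, 0.6)} games")
```

Under these assumptions a 30 elo edge needs on the order of a couple of hundred games, while a 5 elo edge needs several thousand, so both the "some hundreds" and the "tens of thousands" figures in this thread can be right, depending on how large the true difference is.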
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Martin on the SF loss on time
bob wrote:
You did understand that they changed versions prior to this stage? Switching to the new lazy-smp version which apparently exhibits this bug while old versions with normal YBW did not.
syzygy wrote:
Yes I know, but as far as I understand the lag parameter has always been set to 10ms and time management has not really been changed.
Apparently SF loses a bit of time upon finishing a search when it waits for all threads to stop. I would think the YBWC version also stops threads before sending the best move (or it would be cheating) and loses some time on that.
Of course it might simply take the lazy-smp version a bit longer to stop the threads than the previous version, and that difference might be just enough to lose on time under the wrong circumstances...

Indeed, we of course tested lazy_smp in our framework for many tens of thousands of games before submitting it to Martin, and we didn't experience any time loss. People stating that lazy smp was untested simply ignore how SF development works: every patch that goes in is very deeply and strictly tested, much more than in any other engine development that I am aware of.
The only time-related difference between the old version and the lazy one is the way the engine stops and waits for the slave threads to terminate the search before returning the best move.
So, considering the above, my take is that stopping the threads in lazy smp requires more time on the particular TCEC hardware.
We didn't have the chance to test on that hardware, nor on similar hardware, given that very few people have access to such a powerful machine.
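To illustrate the kind of delay being discussed, here is a toy sketch that measures how long it takes for helper threads to notice a stop flag and exit. It is not Stockfish code: the worker only sleeps in place of searching, and the polling interval and thread count are hypothetical. Only the 10 ms lag figure comes from this thread.

```python
import threading
import time

def worker(stop, poll_interval_s):
    """Stand-in for a search thread that only notices `stop` between work chunks."""
    while not stop.is_set():
        time.sleep(poll_interval_s)  # placeholder for searching a batch of nodes

def stop_latency(num_threads, poll_interval_s=0.002):
    """Seconds from raising the stop flag until every helper thread has exited."""
    stop = threading.Event()
    threads = [threading.Thread(target=worker, args=(stop, poll_interval_s))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    time.sleep(0.05)                 # let the "search" run briefly
    start = time.perf_counter()
    stop.set()
    for t in threads:
        t.join()                     # the best move may only be sent after this point
    return time.perf_counter() - start

LAG_BUDGET_S = 0.010                 # the 10 ms lag setting mentioned in the thread
latency = stop_latency(num_threads=8)
print(f"stop latency {latency * 1000:.2f} ms, over budget: {latency > LAG_BUDGET_S}")
```

The point of the sketch is only that the wait for all threads comes out of the clock, so if it grows with the machine or the thread count it can exceed a fixed lag allowance even when time management itself is unchanged.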
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Martin on the SF loss on time
mcostalba wrote:
We didn't have the chance to test on that hardware, nor on similar hardware, given that very few people have access to such a powerful machine.

BTW, even as of today we have not been able to reproduce the time loss on any of our machines, even the most powerful ones.
Louis tried hard to reproduce the time loss on his big hardware machine, but he failed even under the extreme conditions he threw at SF.
As of today, I am not aware of anybody who has had time losses with lazy_smp; the only case seems to be the TCEC machine... if this is not unfortunate, well, I don't know what unlucky means.
-
- Posts: 1154
- Joined: Fri Jun 23, 2006 5:18 am
Re: Martin on the SF loss on time
bob wrote:
It is not clear what "scales" means in this context. I have seen references to BOTH higher NPS, AND longer time-to-depth. If I had time, I'd run a few tests on it to see what it does on my 20 or 24 core boxes... But that's time away from working on my code, which is not exactly a good use of time.

I think scale does not (or should not) refer to either NPS or time-to-depth. I think it refers to program strength measured in elo. Which is unfortunate, since the two items you reference are much less time consuming to measure.
-Sam