stevenaaus wrote: I've added per-game time control to Scid vs. PC's tournament feature if anyone wants to mess around. (Komodo's per-move time control still seems broken.)
I will check that out; I would like that to work.
At the moment it's alpha code, for UCI only... I think I'm doing it right. Code is in svn.
Thanks. CCRL 40/4 gives 70 +/- 38 Elo points (95%) improvement and CEGT 40/20 gives 110 +/- 27 points (95%). Maybe they will converge to something inside my own result of 93 +/- 15 (95%); I am really curious whether my result is valid.
Kai
We have thousands of games that say it's 100 Elo, but that is at time controls of around 40 seconds per game on very fast hardware (an overclocked 6-core i7).
YMMV
Don
Don, I think that tests at 40s per game on fast hardware (with a solid increment, say equivalent to 2.5s + 0.25s) are very representative, excluding time management, which is another problem. I am using 1s + 0.1s (~15s per game) on not-so-fast hardware. For measuring the _difference_ between two engines, it has never let me down by more than 10 Elo points beyond the error margins (say 30,000 games, with error margins of ~3 Elo points at 95% confidence).
   Program                    Score       %     Elo    +    -   Draws
 1 Komodo64 2.01 64 bit  :  262.5/400   65.6   3256   29   29   31.2 %
 2 Komodo64 1.3 JA       :  137.5/400   34.4   3144   29   29   31.2 %
112 +/- 29 Elo points (95% confidence) improvement in self-play, probably a little less in a gauntlet, but the new Reptilian seems to be at the level of SF 2.01. I will leave it running for more games, then run a gauntlet.
Kai
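As a rough cross-check of the figures above, the Elo difference and its 95% interval can be recomputed from the match score with the usual logistic model. This is only a sketch: the split into 200 wins, 125 draws and 75 losses is an assumption reconstructed from 262.5/400 and the 31.2 % draw rate, and the rating tool that produced the table may use a different method.

```python
import math

def elo_from_score(p):
    """Elo difference implied by a score fraction p (0 < p < 1), logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high) using a normal approximation on the score."""
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1 - p) ** 2 + draws * (0.5 - p) ** 2 + losses * p ** 2) / n
    half = z * math.sqrt(var / n)
    return elo_from_score(p), elo_from_score(p - half), elo_from_score(p + half)

# Assumed split, consistent with 262.5/400 and about 31.2 % draws.
elo, low, high = elo_with_margin(200, 125, 75)
print(f"{elo:.0f} Elo, 95% interval about [{low:.0f}, {high:.0f}]")
```

Under these assumptions the point estimate lands near 112 Elo with a margin of roughly 29 points either side, in line with the posted result.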
Kai,
When we test at that pace it makes a big difference how we set "move overhead milliseconds", and the default is currently 20. To be honest, on our tester we don't forfeit even when setting it to zero. It will DEFINITELY affect the result at 1 second games like you are testing. Komodo 1.3 did not have any overhead built in, so in your match Komodo 2 was playing handicapped.
The purpose of "move overhead milliseconds" is to deal with slower graphical interfaces which I believe could be inadvertently imposing a penalty on each move. And if you are manually operating the computer to play a game with a physical clock you could set it to 2000 or more.
Interesting, the average time per move was 102 ms or so for 2.01. Meaning effectively 82 ms once the 20 ms overhead is subtracted? This is a handicap of ~25 Elo points. Thanks for the info.
Kai
Hey, I just sent a response to an earlier post of yours. Good timing.
Our experience is that MOST of the time fast time controls are very representative. But we found that sometimes they are misleading. We are often trying to measure just 2 or 3 Elo, and there are many ideas we have tried that work really nicely at game in 2 or 3 seconds but very poorly at long time controls. And some of the things we do absolutely require much longer time controls to really exercise certain algorithms. For example, if you look at Stockfish, they have progressively more aggressive LMR with depth, so you cannot fully test that algorithm without playing longer time controls. If, for instance, it was a really bad idea, it might still look great at 1 + 0.1. (Of course it's not a bad idea, but this was just an example.)
Sometimes I run 7 ply games to get a feel for an idea, but I don't really use the results. On my home-brewed tester I get 10 games per second at that depth. But a typical time control we run at is game in 3 seconds + 0.03 increment, which is probably faster than your 1 + 0.1 - that gives us a feel, and from there we decide whether to proceed to longer tests. Keep in mind that we are most of the time only looking for 2 or 3 Elo and we cannot afford to be off by 10 Elo.
But I agree with you in principle; we are always looking for ways to cheat in the testing, as this is the primary bottleneck in making progress.
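To give a feel for why testing is the bottleneck when hunting for 2 or 3 Elo, here is a back-of-the-envelope estimate of how many games a given 95% error margin requires between two engines of nearly equal strength. It is a sketch under stated assumptions: a normal approximation, the logistic Elo model, and a 30 % draw ratio chosen for illustration (close to the 31.2 % seen in the table in this thread). The function name `games_needed` is made up for this example.

```python
import math

def games_needed(margin_elo, draw_ratio=0.3, z=1.96):
    """Approximate games for a +/- margin_elo interval between near-equal engines."""
    # Per-game score variance at a 50 % expected score with the given draw ratio.
    var = 0.25 * (1.0 - draw_ratio)
    # Slope of the logistic Elo curve at p = 0.5, in Elo per unit of score.
    elo_per_score = 1600.0 / math.log(10)
    sigma_elo = elo_per_score * math.sqrt(var)
    return math.ceil((z * sigma_elo / margin_elo) ** 2)

for margin in (10, 3, 2):
    print(f"+/- {margin} Elo: about {games_needed(margin)} games")
```

The required number of games grows with the inverse square of the margin, so going from a 3 Elo margin to a 2 Elo margin costs a bit more than twice as many games.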
It's not quite that bad. If the move overhead in reality is only 5 ms, it's not like you give away the other 15 ms; it is still left on your clock. However, there is a delay in getting to use that time, and we have found that it takes too long before the extra 15 ms gets to have much of an impact (you only get a fraction of it back on each move), so in reality you are still playing a big part of the game too fast. So it does have a crippling effect. And you can lose the game before this extra time has built up enough to make much of a difference.
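The effect described above can be illustrated with a toy clock simulation. This is not Komodo's time management: the allocator (an even share of the remaining clock over a fixed horizon, plus the increment, minus the reserved overhead) and all names and parameters are hypothetical, chosen only to show how the over-reserved 15 ms comes back slowly.

```python
def think_times(base_ms, inc_ms, reserved_ms, real_ms, moves=40, horizon=30):
    """Thinking time per move (ms) under a hypothetical even-share allocator."""
    clock = float(base_ms)
    times = []
    for _ in range(moves):
        # Even share of the clock plus the increment, minus the reserved overhead.
        think = clock / horizon + inc_ms - reserved_ms
        times.append(think)
        # The clock loses the thinking time and the real overhead, then gains the increment.
        clock += inc_ms - think - real_ms
    return times

# 1s + 0.1s games with a real overhead of 5 ms in both cases.
cautious = think_times(1000, 100, reserved_ms=20, real_ms=5)
accurate = think_times(1000, 100, reserved_ms=5, real_ms=5)
for move in (0, 10, 20, 39):
    gap = accurate[move] - cautious[move]
    print(f"move {move + 1}: cautious is {gap:.1f} ms faster than it needs to be")
```

In this toy model the cautious setting starts out a full 15 ms short on every move, and the gap shrinks by only a factor of 29/30 per move, so a large part of the game is still played faster than necessary.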
Thanks for posting the results. So it is 9-4 in favor of Komodo 2.01, which is a good improvement indeed, considering how difficult it is to improve at this level (3000 Elo and above).
Note that STS 14 was released recently and is available for download.
[D]4rk2/pp3p2/7N/2rp2pQ/qb2n3/3BP3/PP3PPP/R2K3R w - - 0 21
In this position from So-Grandelius... Komodo puts Bc2 as first choice with a +- score (#1998) when Qxc2 actually mates.