Thomas Zipproth spotted a severe bug in the LittleBlitzerGUI: the 50-move draw rule detection doesn't work properly! Some games that should be drawn by the 50-move rule can still be won or lost. After an investigation of the 55000 games of my LS top10 tournament (a big thanks to Thomas Zipproth!), we found that this bug causes an Elo distortion of the engine rankings in the range of +/- 0-2 Elo. From now on, all test work for the LS rating list is done with cutechess-cli.
It is strongly recommended not to use the LittleBlitzerGUI for testing anymore!!!
Stefan
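For readers who want to check their own game files, here is a minimal sketch of a 50-move rule test. It assumes the position is available as a FEN string, whose fifth field is the halfmove clock (plies since the last capture or pawn move). The function name is hypothetical and this is not how LittleBlitzer or cutechess-cli implement adjudication.

```python
def fifty_move_draw_claimable(fen):
    """Return True if the 50-move rule applies to the given FEN position.

    The fifth FEN field counts plies since the last capture or pawn move;
    50 moves by each side correspond to 100 plies.
    Note: a checkmate delivered on the 100th ply still wins, so a full
    adjudicator has to test for mate before declaring the draw.
    """
    halfmove_clock = int(fen.split()[4])
    return halfmove_clock >= 100
```

A tester could run such a check over the final positions of decisive games to find results that should have been draws.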
Severe bug found in the LittleBlitzerGUI
-
- Posts: 2444
- Joined: Sat Sep 03, 2011 7:25 am
- Location: Berlin, Germany
- Full name: Stefan Pohl
-
- Posts: 2444
- Joined: Sat Sep 03, 2011 7:25 am
- Location: Berlin, Germany
- Full name: Stefan Pohl
Re: Severe bug found in the LittleBlitzerGUI
pohl4711 wrote:After an investigation of the 55000 games of my LS top10 tournament (a big thanks to Thomas Zipproth!), we found that this bug causes an Elo distortion of the engine rankings in the range of +/- 0-2 Elo. From now on, all test work for the LS rating list is done with cutechess-cli.
It is strongly recommended not to use the LittleBlitzerGUI for testing anymore!!!
Stefan
SzG wrote:That 2 Elo distortion does not seem unbearable to me. If we wanted our ranking lists to be as accurate as that, we would have to play at least 100000 games for each engine, a task we surely could not undertake. Our best tested engine, SlowChess Blitz, has 11000 games and is still only at a +/- 7 Elo error margin.
For me two engines with a difference of 5 Elo, maybe even more, are equal.
Correct. And I will use cutechess-cli in the future. But in engine tests with not so many games, the distortion can be bigger. So I wanted to post this warning here for other testers.
Stefan
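The game counts quoted above can be sanity-checked with a rough calculation. This is only a sketch under stated assumptions: the engine scores close to 50%, games are independent, and the Elo curve is linearized around 50%. The function name and the default 95% factor z = 1.96 are illustrative choices, not what any rating tool actually uses.

```python
import math

def elo_error_margin(n_games, draw_ratio=0.0, z=1.96):
    """Approximate +/- Elo margin for a near-50% score over n_games."""
    # Per-game score variance for equal opponents: wins and losses each
    # have probability (1 - draw_ratio) / 2, draws have draw_ratio.
    variance = (1.0 - draw_ratio) / 4.0
    score_margin = z * math.sqrt(variance / n_games)
    # Slope of Elo(s) = 400 * log10(s / (1 - s)) at s = 0.5.
    elo_per_score = 400.0 / (math.log(10) * 0.25)
    return score_margin * elo_per_score
```

With no draws, `elo_error_margin(11000)` comes out at about 6.5 Elo and `elo_error_margin(100000)` at about 2.2 Elo, in line with the +/- 7 Elo margin and the "at least 100000 games" estimate mentioned in the quote. A higher draw ratio shrinks the margin somewhat.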
-
- Posts: 27811
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Severe bug found in the LittleBlitzerGUI
Note that when the Elo ratings calculated with and without this bug differ by 2 Elo, it doesn't imply at all that the ratings with the bug are off by 2 Elo more. Nearly as often they will be 2 Elo better, because the correct calculation has a statistical error that is much larger, and that error may happen to have the opposite sign.
In fact, if I ran a test gauntlet and then randomly selected as many as 10% of the games and replaced their results by a randomly chosen win or loss, this would only marginally drive up the error. If you did very asymmetric testing (on average much better or much worse opponents) there would be a small systematic effect too, but such testing would be unreliable anyway.
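The thought experiment above can be tried with a small Monte Carlo sketch. The assumptions are mine: equally strong opponents, a fixed draw ratio, and each game corrupted independently with the given probability (so roughly, not exactly, 10% of games are replaced). All function names and parameters are hypothetical.

```python
import random
import statistics

def play_game(rng, draw_ratio):
    """Result of one game between equal opponents: 1.0, 0.5 or 0.0."""
    if rng.random() < draw_ratio:
        return 0.5
    return 1.0 if rng.random() < 0.5 else 0.0

def gauntlet_score(rng, n_games, draw_ratio, corrupt_fraction):
    """Average score of one gauntlet, with some results randomly replaced."""
    total = 0.0
    for _ in range(n_games):
        result = play_game(rng, draw_ratio)
        if rng.random() < corrupt_fraction:
            # Replace the true result by a random win or loss.
            result = rng.choice((1.0, 0.0))
        total += result
    return total / n_games

def score_spread(n_games, n_trials, draw_ratio, corrupt_fraction, seed=1):
    """Standard deviation of the gauntlet score over repeated gauntlets."""
    rng = random.Random(seed)
    scores = [gauntlet_score(rng, n_games, draw_ratio, corrupt_fraction)
              for _ in range(n_trials)]
    return statistics.stdev(scores)
```

Comparing `score_spread(1000, 300, 0.3, 0.0)` with `score_spread(1000, 300, 0.3, 0.1)` shows how little the spread changes. Under these assumptions the per-game variance only rises from (1 - d)/4 to (1 - f)(1 - d)/4 + f/4, which for d = 0.3 and f = 0.1 is about a 2% increase in the standard deviation.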
-
- Posts: 284
- Joined: Tue Aug 13, 2013 9:44 am
Re: Severe bug found in the LittleBlitzerGUI
pohl4711 wrote:Thomas Zipproth spotted a severe bug in the LittleBlitzerGUI: the 50-move draw rule detection doesn't work properly! Some games that should be drawn by the 50-move rule can still be won or lost. After an investigation of the 55000 games of my LS top10 tournament (a big thanks to Thomas Zipproth!), we found that this bug causes an Elo distortion of the engine rankings in the range of +/- 0-2 Elo. From now on, all test work for the LS rating list is done with cutechess-cli.
It is strongly recommended not to use the LittleBlitzerGUI for testing anymore!!!
Stefan
Are you sure? Because Dariusz Orzechowski implicitly reports that it works:
Dariusz Orzechowski wrote:Virtually all these "missing" repetition draws were transferred to fifty moves category when contempt was enabled. https://groups.google.com/d/msg/fishcoo ... RqRmM6pasJ