For a while now I've been using bayeselo to compare versions of my chess program.
Over the last couple of weeks I have been pulling my hair out over an inexplicable difference in Elo rating.
Rank Name                  Elo    +    -  games  score  oppo.  draws
   1 Stockfish 030914     2500  -75  202   1697   100%    925     0%
   2 XboardEngine         1703   61   52   1698    91%    980     0%
   3 Embla2001-lmr        1025   14   14   1697    57%   1048    40%
   4 Embla-lmr-2006c2b     999   14   14   1705    53%   1045    38%
   5 Embla-lmr-2006c2b-2   980   14   14   1696    50%   1048    35%
   6 Embla2065_2045        951   13   13   1702    46%   1038    60%
   7 Embla2067_2045        946   13   13   1698    45%   1049    59%
   8 Embla2067_2065_2045   941   13   13   1698    45%   1051    51%
   9 Embla2067             937   13   13   1699    46%   1050    54%
  10 Embla2067_2065        933   13   13   1697    44%   1038    52%
  11 Embla2067b            932   13   13   1700    42%   1040    58%
  12 Embla2065             926   13   13   1699    43%   1050    59%
  13 Embla2045             893   14   15   1700    37%   1041    37%
  14 ParisHilton          -134  133  184   1696     0%   1130     0%
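For a sense of scale, here is the standard logistic Elo expectation, the curve BayesElo's model is built on (leaving out its draw and advantage parameters). The rating numbers below are taken from the list above; this is just a sketch of how the gap translates into an expected score:

```python
def expected_score(elo_a: float, elo_b: float) -> float:
    """Expected score of A against B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))

# The 26-point gap between Embla2001-lmr (1025) and
# Embla-lmr-2006c2b (999) corresponds to roughly a 54% expected score:
print(round(expected_score(1025, 999), 3))  # -> 0.537
```

So the puzzling gap is small in score terms, but with ~1700 games per engine the +/- 14 error bars say it should not be there if the two versions really play identically.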
To verify that I did not accidentally compile a different version, I recompiled the Embla-lmr-2006c2b code as Embla-lmr-2006c2b-2. If I had made a mistake, Embla-lmr-2006c2b and Embla-lmr-2006c2b-2 could end up with different ratings, but one of the two should still rate the same as Embla2001-lmr.
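One quick sanity check for the "wrong binary" theory is to hash the builds; the file names below are assumptions based on the engine names above. A matching hash proves the binaries are bit-identical (the converse does not hold: identical source can still hash differently if the compiler embeds timestamps):

```python
import hashlib

def file_hash(path: str) -> str:
    """SHA-256 of a file's contents, read in one go (fine for small binaries)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Hypothetical usage, assuming the two builds sit in the current directory:
#   file_hash("Embla-lmr-2006c2b") == file_hash("Embla-lmr-2006c2b-2")
# If True, any rating gap between them is pure measurement noise.
```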
Anyone got an idea what could be going on here?
(XboardEngine is tscp 1.81, and ParisHilton is an engine that aims to play the worst move possible, so that I have a lower bound.)