Understanding Bayeselo results

Eric Stock

Understanding Bayeselo results

Post by Eric Stock » Tue Jun 01, 2010 3:38 am

Hi, I ran a match of 1000 games between two engines, then loaded the .pgn file into Bayeselo and executed the commands mm and then ratings.

Here is what I got


ResultSet-EloRating>ratings
Rank Name  Elo   +   - games score oppo. draws
   1 A      14  17  17  1000   54%   -14   43%
   2 B     -14  17  17  1000   46%    14   43%

Ok, so a couple of questions:
1. Are the + and - columns the 95% confidence interval for the rating, i.e. 95% of the time the true rating lies within [-17, +17] of the reported Elo?

2. Why might the draw % be so high? What does this mean for my results? I am testing with Arena, using the opening book "little main book".

Thanks,
Eric Stock

Edmund

Re: Understanding Bayeselo results

Post by Edmund » Tue Jun 01, 2010 9:56 am

Hello Eric,

Firstly, 1) is correct. The + and - values give the bounds of the interval at a confidence level of 95% (i.e. alpha = 5%).
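As a rough sanity check of those numbers, here is a minimal Python sketch that turns the score and draw rate from your table into an Elo difference with an approximate 95% interval. It assumes the plain logistic Elo formula and a normal approximation for the mean score per game, which is a simpler method than the one Bayeselo uses, so the results need not match its output exactly. The function names are made up for illustration, and the inputs are the rounded percentages from the table.

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1), logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

def score_interval(games, score, draw_rate, z=1.96):
    """Normal-approximation interval for the mean score per game."""
    win_rate = score - 0.5 * draw_rate
    # Each game scores 1, 0.5 or 0, so E[x^2] = win_rate * 1 + draw_rate * 0.25.
    variance = win_rate + 0.25 * draw_rate - score ** 2
    margin = z * math.sqrt(variance / games)
    return score - margin, score + margin

# Values read off the ratings table (rounded percentages, so only approximate).
games, score, draw_rate = 1000, 0.54, 0.43

score_lo, score_hi = score_interval(games, score, draw_rate)
elo_diff = elo_from_score(score)
elo_lo, elo_hi = elo_from_score(score_lo), elo_from_score(score_hi)

print(f"Elo difference A - B: {elo_diff:.1f}")
print(f"approx. 95% interval: {elo_lo:.1f} to {elo_hi:.1f}")
```

The point estimate comes out close to the 28 Elo gap between A (+14) and B (-14) in the table. Note that draws reduce the per-game variance, so a high draw rate actually narrows the interval for a given number of games.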

Secondly, 2) the draw rate depends on a couple of factors:
a) the general level of play. The longer the time control or the higher the average Elo of the engines, the higher the draw rate will be.
b) the Elo difference between the players, i.e. the smaller the difference, the higher the chance of a draw, and
c) the similarity of playing style: the more alike the engines play, the higher the chance of a draw. So playing matches between two versions of almost the same engine (testing a change, for example) can also produce higher draw rates.
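To illustrate point b), here is a small Python sketch of a three-outcome logistic model with a draw parameter, of the kind Bayeselo is usually described as using. Treat the exact form as an assumption on my part: draw_elo = 97.3 is only an illustrative value, advantage is set to 0 to ignore the colour bonus, and the function names are hypothetical.

```python
def f(delta):
    """Logistic curve: probability that falls from 1 to 0 as delta grows."""
    return 1.0 / (1.0 + 10.0 ** (delta / 400.0))

def outcome_probs(elo_diff, draw_elo=97.3, advantage=0.0):
    """Return (win, draw, loss) for a player who is elo_diff points stronger.

    draw_elo widens the band of results that count as draws; advantage is
    an optional bonus for the player (e.g. having White). Both defaults are
    illustrative assumptions, not values taken from this thread.
    """
    win = f(-elo_diff - advantage + draw_elo)
    loss = f(elo_diff + advantage + draw_elo)
    draw = 1.0 - win - loss
    return win, draw, loss

# Draw probability for a few Elo differences.
for diff in (0, 100, 200, 400):
    win, draw, loss = outcome_probs(diff)
    print(f"diff={diff:3d}  win={win:.3f}  draw={draw:.3f}  loss={loss:.3f}")
```

In this model the draw probability is largest when the two players are equal and shrinks as the gap grows, and a larger draw_elo raises the draw rate at every difference, which is one way to think about factors a) and c).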

There are some other factors too, such as choosing a drawish opening book. Furthermore, I am not quite sure about the impact of endgame tablebases, but I could imagine a decreased draw rate there.

You might want to take a look at Kirill's draw statistics: http://kirill-kryukov.com/chess/kcec/draw_rate.html. There you will find that 43% is not so unusual for pairs so close in strength.
