hgm wrote (Mon Apr 29, 2019 7:27 am):
I don't expect that your rating for NEG in particular is wrong; more games probably would not help. It is just that all ratings in that region are totally off, as the Elo model tends to break down there. Conventional engines (i.e. alpha-beta searchers) are often that weak because they are buggy. Sometimes they are lucky and the bug doesn't manifest itself during the game, and then they can play quite strong games. So they sometimes beat rather strong opponents (especially if the opponent is also buggy, but just blunders more rarely). The rating-extraction software considers such results impossible unless it assigns the two engines ratings that are much closer than they should be. In a sense, most engines in that Elo range play more against their own bugs than against the opponent, which makes it very hard to determine their relative strength by playing them against each other.
Also, the gap between non-searching engines (including random movers) and alpha-beta searchers is enormous, perhaps as much as 5000 Elo, and there is nothing in that gap that could be used to really measure it. The non-searching engines only score points against alpha-beta searchers because the latter are beaten by their own bugs, and then hand out the points for free, no matter who the opponent is.

The above echoes some of your posts on the Winboard forum years ago. I wish you could expand more on the subject; it seems hardly anyone has ever thought of tackling it. Should you ever find the time to do some research, I believe the findings would make a fascinating read.
Thank you for the insights, Harm, your posts are always greatly appreciated.
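To put a number on Harm's point about the Elo model breaking down, here is a minimal sketch assuming the standard logistic Elo formula. The function names are hypothetical and not taken from any particular rating tool. Under this model a 5000 Elo gap predicts an expected score of roughly 3e-13 per game for the weaker side. If bugs hand even a few percent of the points to a non-searching engine, any fit based on this curve can only explain the result by placing the two engines a few hundred Elo apart.

```python
import math

def elo_expected_score(diff: float) -> float:
    """Expected score of player A against player B under the logistic Elo model,
    where diff = rating(A) - rating(B)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def elo_diff_from_score(score: float) -> float:
    """Invert the logistic model: the rating difference implied by a score
    fraction, valid for 0 < score < 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# A 5000 Elo gap: the weaker side is expected to score about 3e-13 per game.
print(elo_expected_score(-5000))

# Hypothetical scenario: a non-searching engine collects 5 points out of 100
# games only because the searcher sometimes loses to its own bugs.
# The model then "explains" this as a gap of only about 500 Elo.
print(round(elo_diff_from_score(0.05)))
```

The compression comes from the shape of the logistic curve. It has no way of representing "almost always wins, except when a bug strikes", so a handful of free points pulls the fitted ratings together.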