Absolute ELO scale

nionita
Posts: 161
Joined: Fri Oct 22, 2010 7:47 pm
Location: Austria

Re: Absolute ELO scale

Post by nionita » Sat Dec 17, 2016 3:23 pm

hgm wrote:
nionita wrote:Is there any theoretical problem if we define ELO 0 = playing strength of an engine which, in every (legal) position, plays one of the legal moves chosen from a uniform random distribution?
This is a logical way to define things. There is the practical problem, however, that almost anything that does some rudimentary thinking is so much stronger that it scores close to 100% against the random mover, so that you cannot derive the Elo difference from it. So the 'calibration point' would lie (too) far outside the populated range.

So you would need to fill the gap with a series of engines that would bridge the difference. There are some non-searching engines like N.E.G., but even these beat a random mover badly (if they have stalemate avoidance), and probably by too much for reliable Elo determination. And they lose equally badly against searching engines, even if these do just 1-ply + QS.

Perhaps a series of searching engines based on the Beal effect, with progressively deeper search (and random evaluation), could bridge the gap and have reasonable scores against each other, against the random mover on one side, and against searching + evaluating engines on the other. There is no guarantee that bridging the gap with another series of engines would not give you a totally different rating, however. Basing ratings on just a single pairing of players is already bad, and making a chain of such pairings...
I was thinking that, for new engine authors, a standard (weak) reference engine could be useful for measuring progress in the beginning. Even the fact that bugs make you lose more often than necessary against that engine could be helpful to them.

Also, if we run experiments with (new) ML algorithms and get (mostly) weaker results, such a reference would make the results comparable.

Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 9:01 pm
Location: Irvine, CA, USA

Re: Absolute ELO scale

Post by Dirt » Sat Dec 17, 2016 5:02 pm

hgm wrote:There are some non-searching engines like N.E.G., but even these beat a random mover badly (if they have stalemate avoidance), and probably by too much for reliable Elo determination. And they lose equally badly against searching engines, even if these do just 1-ply + QS.
Yeah, I immediately thought of N.E.G. when reading the first post. I think that would be a better zero point, but maybe not good enough.
Deasil is the right way to go.

Uri Blass
Posts: 8530
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Absolute ELO scale

Post by Uri Blass » Sat Dec 17, 2016 5:53 pm

nionita wrote:Is there any theoretical problem if we define ELO 0 = playing strength of an engine which, in every (legal) position, plays one of the legal moves chosen from a uniform random distribution?
Yes

I think that random play is not the natural behavior of weak players, so using a random player as the reference may distort Elo.

Let's take an extreme example.

Suppose that you have an engine that plays like Stockfish with White but plays random moves with Black.

Suppose that it plays in human tournaments.

What is the engine's rating against humans going to be?
It is clear that the rating is going to depend on the opponent.

It is going to score 50% in every match against humans, except against the extremely weak ones, who might draw by stalemate or lose by 2 illegal moves even against the random mover (and except maybe the few humans who have practical chances not to lose against Stockfish with Black).
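
To make the point concrete: under the usual logistic Elo formula, a 50% score against an opponent gives a performance rating equal to that opponent's rating, whatever it is. A minimal sketch in Python, assuming the standard 400-point logistic scale (the function name and the opponent ratings are made up for illustration):

```python
import math

def performance_rating(opponent_rating, score):
    """Rating implied by a score fraction (strictly between 0 and 1)
    against a single opponent, using the standard logistic Elo formula."""
    return opponent_rating + 400.0 * math.log10(score / (1.0 - score))

# Hypothetical opponents: the hybrid engine scores 50% against each of them.
for opponent in (1200, 1800, 2400):
    print(opponent, performance_rating(opponent, 0.5))
```

So the hybrid engine's measured rating simply tracks the field it happens to play in.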

Laskos
Posts: 9312
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Absolute ELO scale

Post by Laskos » Sat Dec 17, 2016 7:08 pm

hgm wrote:There are some non-searching engines like N.E.G., but even these beat a random mover badly (if they have stalemate avoidance), and probably by too much for reliable Elo determination. And they lose equally badly against searching engines, even if these do just 1-ply + QS.
It's not so bad. I ran Andscacs (eval + 1 ply) against a random-mover Andscacs. No time losses and such.


Score of Ands depth=1 vs Ands Random: 39993 - 0 - 7  [1.000] 40000
ELO difference: 1623.18 +/- 165.00
Finished match
In line with the previous test, and with FIDE ratings: I estimate the FIDE strength of 1-ply Andscacs at about 1200. Engines probably dilute ratings a bit compared with FIDE, and weak humans are unlike both the 1-ply mover and the random mover.
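
For reference, the Elo difference in the match output follows from the win/loss/draw counts via the logistic formula. A minimal sketch in Python, assuming the standard 400-point logistic scale (the function name is made up for illustration, and the error margin is not reproduced here):

```python
import math

def elo_diff(wins, losses, draws):
    """Logistic Elo difference implied by a match result."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        # The logistic formula diverges at 0% and 100%.
        raise ValueError("Elo difference is undefined for a 0% or 100% score")
    return 400.0 * math.log10(score / (1.0 - score))

# Result from the match above: 39993 wins, 0 losses, 7 draws.
print(round(elo_diff(39993, 0, 7), 2))
```

This should come out very close to the 1623.18 reported above. Note that the estimate rests entirely on the 7 draws.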

hgm
Posts: 23480
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Absolute ELO scale

Post by hgm » Sat Dec 17, 2016 10:11 pm

Well, you cannot get a reliable rating difference from a score of 0.01%. That is way too sensitive for the Elo model.
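
A rough way to see the sensitivity: in a 40000-game match with no losses, the implied Elo difference depends only on the handful of draws. A minimal sketch in Python, assuming the standard logistic formula (the helper names and the draw counts other than 7 are made up for illustration):

```python
import math

def elo_from_score(score):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))

def score_all_wins_except(games, draws):
    """Score fraction when every game is won except `draws` drawn games."""
    return (games - 0.5 * draws) / games

GAMES = 40000
for draws in (2, 4, 7, 14, 28):
    elo = elo_from_score(score_all_wins_except(GAMES, draws))
    print(f"{draws:2d} draws -> {elo:7.1f} Elo")
```

This far out in the tail, each doubling or halving of the draw count moves the estimate by roughly 400*log10(2), i.e. about 120 Elo, so a few games either way change the result by more than a typical rating class.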

tysen2k
Posts: 4
Joined: Wed Sep 09, 2015 10:19 pm

Re: Absolute ELO scale

Post by tysen2k » Sat Dec 17, 2016 10:51 pm

I've been toying around with the idea of setting the "reference" Elo level to be the level that gives handicap odds a multiplying effect. For example, if you subtract about 425 from current Elo levels, knight odds support about a 1.43x difference in Elo across a wide range of Elo.

Laskos
Posts: 9312
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Absolute ELO scale

Post by Laskos » Sun Dec 18, 2016 1:00 am

hgm wrote:Well, you cannot get a reliable rating difference from a score of 0.01%. That is way too sensitive for the Elo model.
The Elo model for engines seems to be logistic. This is corroborated here by the fact that the differences measured over small intervals add up to the logistic value for the total result over a large span. Besides that, the point was more to show that a random mover can from time to time, say once in 10,000 games, draw against a 1-ply full engine. Not once in 10^20 games.
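
The additivity being checked can be written out as follows. This is a sketch assuming the conventional logistic expected-score curve with base 10 and a 400-point scale; reading it as the check meant here is an assumption. With d the Elo difference and E(d) the expected score, the odds are a pure exponential in d, so odds multiply along a chain of engines exactly when Elo differences add:

```latex
% Logistic expected score and the odds it implies
E(d) = \frac{1}{1 + 10^{-d/400}}
\quad\Longrightarrow\quad
\frac{E(d)}{1 - E(d)} = 10^{d/400}

% Chaining two intervals d_1 and d_2
\frac{E(d_1)}{1 - E(d_1)} \cdot \frac{E(d_2)}{1 - E(d_2)}
= 10^{d_1/400} \cdot 10^{d_2/400}
= 10^{(d_1 + d_2)/400}
= \frac{E(d_1 + d_2)}{1 - E(d_1 + d_2)}
```

If the directly measured score over the whole span matches E(d_1 + d_2 + ...), the logistic shape is consistent with the data over that span.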

hgm
Posts: 23480
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Absolute ELO scale

Post by hgm » Sun Dec 18, 2016 8:50 am

Laskos wrote:Elo model for engines seems to be logistic.
In what range? I am pretty sure there isn't much statistics this far out in the tails. And what holds in one range of ratings might not hold in a completely different range (where engines must be buggy to be as weak as they are).

Laskos
Posts: 9312
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Absolute ELO scale

Post by Laskos » Sun Dec 18, 2016 9:03 am

hgm wrote:
Laskos wrote:Elo model for engines seems to be logistic.
In what range? I am pretty sure there isn't much statistics this far out in the tails. And what holds in one range of ratings might not hold in a completely different range (where engines must be buggy to be as weak as they are).
You might look at this thread and plot:
http://www.talkchess.com/forum/viewtopic.php?t=60791
Image
That is over a span of 1400 ELO points. Sure, no resignations, no time forfeits, no illegal moves. The conditions for logistic behavior can be set easily.

hgm
Posts: 23480
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Absolute ELO scale

Post by hgm » Sun Dec 18, 2016 9:18 am

It seems you only used one method to weaken the engines there, namely reducing the size of the search tree of healthy engines by node count. You cannot assume this would hold for other methods of weakening too (like random pruning, gross misevaluation).
