Absolute ELO scale

Discussion of chess software programming and technical issues.

nionita
Posts: 161
Joined: Fri Oct 22, 2010 7:47 pm
Location: Austria

Absolute ELO scale

Post by nionita » Sat Dec 17, 2016 11:09 am

Is there any theoretical problem if we define ELO 0 as the playing strength of an engine that, in every (legal) position, plays one of the legal moves chosen from a uniform random distribution?
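For concreteness, the proposed reference player could be sketched as below. This is only a minimal illustration: the name `random_mover` and the idea of passing in a ready-made list of legal moves are assumptions for the example, and generating the legal moves is left to whatever move generator the engine already has.

```python
import random


def random_mover(legal_moves, rng=random):
    """Pick one legal move uniformly at random.

    `legal_moves` is any non-empty sequence of moves (hypothetical input;
    producing it is the job of a real move generator, not of this sketch).
    """
    if not legal_moves:
        raise ValueError("no legal moves: the game is already over")
    return rng.choice(legal_moves)


# Example with three made-up move strings.
print(random_mover(["e2e4", "d2d4", "g1f3"]))
```

Passing a seeded `random.Random` instance as `rng` makes games reproducible, which helps when the random mover is used as a calibration opponent.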

Henk
Posts: 5799
Joined: Mon May 27, 2013 8:31 am

Re: Absolute ELO scale

Post by Henk » Sat Dec 17, 2016 11:22 am

Good idea. But I'm no expert. To me working with probabilities is like working with quick sand.

Laskos
Posts: 9414
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Absolute ELO scale

Post by Laskos » Sat Dec 17, 2016 12:45 pm

nionita wrote: Is there any theoretical problem if we define ELO 0 as the playing strength of an engine that, in every (legal) position, plays one of the legal moves chosen from a uniform random distribution?
That would be a good definition. I even tried to see what the ratings would look like, and they are not far off from our current ratings. Daniel released a version of Andscacs with an adjustable rate of randomness; the results at 10'' + 0.1'' were:


   # PLAYER         : RATING    POINTS  PLAYED    (%) 
   1 Random 0%      : 2697.0     935.0    1000   93.5% 
   2 Random 10%     : 2229.8    1033.0    2000   51.6% 
   3 Random 20%     : 1632.3     970.0    2000   48.5% 
   4 Random 30%     : 1156.3     582.0    2000   29.1% 
   5 Random 40%     : 1142.2    1217.0    2000   60.9% 
   6 Random 50%     :  961.6    1148.0    2000   57.4% 
   7 Random 60%     :  604.0     820.5    2000   41.0% 
   8 Random 70%     :  450.9    1097.5    2000   54.9% 
   9 Random 80%     :  204.7     872.0    2000   43.6% 
  10 Random 90%     :   76.6    1115.0    2000   55.8% 
  11 Random 100%    : -155.6     210.0    1000   21.0%
Random 20% is the engine playing an Andscacs move 80% of the time and a random move 20% of the time. The results probably follow a logistic curve over these large spans, so these are logistic ELO, not FIDE ELO. Keep in mind that engines could play worse than random, they can go for some sort of anti-play. What their negative ELO can be is hard to say.
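As a rough illustration of what "logistic ELO" means here, the sketch below converts between a rating difference and an expected score under the usual logistic model with a 400-point scale. The function names are made up for this example, and the post does not say which tool or settings produced the table, so this is not a reconstruction of that calculation.

```python
import math


def expected_score(elo_diff):
    """Expected score of a player rated `elo_diff` above its opponent
    under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_score(score):
    """Inverse of expected_score; only defined for 0 < score < 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# Random 0% scored 93.5% in the table above.
print(round(elo_diff_from_score(0.935)))  # -> 463
```

For comparison, the table lists Random 0% at 467.2 points above Random 10%, which is in the same range as the 463 obtained from its 93.5% score alone.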
Last edited by Laskos on Sat Dec 17, 2016 12:51 pm, edited 1 time in total.

hgm
Posts: 23623
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Absolute ELO scale

Post by hgm » Sat Dec 17, 2016 12:49 pm

nionita wrote: Is there any theoretical problem if we define ELO 0 as the playing strength of an engine that, in every (legal) position, plays one of the legal moves chosen from a uniform random distribution?
This is a logical way to define things. There is the practical problem, however, that almost anything that does some rudimentary thinking is so much stronger that it scores close to 100% against the random mover, so that you cannot reliably derive the Elo difference from the result. So the 'calibration point' would lie (too) far outside the populated range.
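The difficulty with scores close to 100% can be illustrated with a small sketch. It assumes the logistic Elo model and a hypothetical 1000-game match; the function name and the numbers are made up for the example.

```python
import math


def elo_diff_from_points(points, games):
    """Logistic Elo difference implied by a match result.

    Returns +/- infinity for a clean sweep, where the model gives no
    finite estimate at all.
    """
    score = points / games
    if score <= 0.0:
        return -math.inf
    if score >= 1.0:
        return math.inf
    return 400.0 * math.log10(score / (1.0 - score))


# Near 100%, a single draw more or less shifts the estimate a lot ...
for points in (999.0, 999.5, 1000.0):
    print(points, elo_diff_from_points(points, 1000))

# ... while near 50% the same half point barely matters.
for points in (550.0, 550.5):
    print(points, elo_diff_from_points(points, 1000))
```

In this example the half point between 999 and 999.5 moves the estimate by more than 100 Elo, whereas the half point between 550 and 550.5 moves it by less than 1 Elo.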

So you would need to fill the gap with a series of engines that would bridge the difference. There are some non-searching engines like N.E.G., but even these beat a random mover badly (if they have stalemate avoidance), and probably by too much for reliable Elo determination. And they lose equally badly against searching engines, even if these do just 1-ply + QS.

Perhaps a series of searching engines based on the Beal effect, with progressively deeper search (and random evaluation), could bridge the gap and have reasonable scores against each other, against the random mover on one side, and against searching + evaluating engines on the other. There is no guarantee, however, that bridging the gap with another series of engines would not give you a totally different rating. Basing ratings on just a single pairing of players is already bad, and making a chain of such pairings...
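One part of the chaining problem, the accumulation of statistical error, can be sketched as follows. This is a simplified illustration under two stated assumptions: the links are statistically independent, and ratings are perfectly transitive. It therefore does not capture the concern that a different ladder of engines might give a different total. All names and numbers are hypothetical.

```python
import math


def chain_rating(links):
    """Sum Elo differences along a chain of pairings.

    `links` is a list of (elo_difference, standard_error) tuples, one per
    adjacent pairing. Standard errors are combined in quadrature, which
    assumes independent links.
    """
    total = sum(diff for diff, _ in links)
    error = math.sqrt(sum(se * se for _, se in links))
    return total, error


# Hypothetical ladder: ten links of 300 Elo each, every one known to +/-20.
total, error = chain_rating([(300.0, 20.0)] * 10)
print(total, round(error, 1))  # -> 3000.0 63.2
```

Even in this idealized case the uncertainty of the end-to-end difference grows with the square root of the number of links.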

Adam Hair
Posts: 3201
Joined: Wed May 06, 2009 8:31 pm
Location: Fuquay-Varina, North Carolina

Re: Absolute ELO scale

Post by Adam Hair » Sat Dec 17, 2016 12:53 pm

Check out the CCRL 40/4 complete list. The bottom engine is Brutus Rnd, a random mover. It has a better score than it probably should, because several of its opponents have trouble delivering checkmate with a material advantage.

hgm
Posts: 23623
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Absolute ELO scale

Post by hgm » Sat Dec 17, 2016 1:04 pm

Yes, that is another problem. Many engines 'at the bottom of the pack' are quite buggy, and their results are not well described by an Elo model with a bell-shaped curve (like Gaussian or logistic). In particular, they will manage to lose points (draws, but often also losses) against arbitrarily weak opponents, purely due to problems of their own (e.g. by forfeiting on time, playing an illegal move, resigning when ahead, or refusing to checkmate). If you analyze that with a normal Elo model, the weak engines that get handed free points this way get a hugely overvalued rating, as if the points were somehow deserved by their own performance.
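The effect of such gifted points can be sketched with a toy model. It assumes (purely for illustration) that the stronger engine throws away a fixed fraction of its games as losses, whatever the opponent, and otherwise scores according to the logistic model. The function names and the 5% rate are made up.

```python
import math


def observed_score(true_diff, gift_rate):
    """Score of the stronger engine if it throws away a fraction
    `gift_rate` of its games (counted as losses) regardless of opponent."""
    fair = 1.0 / (1.0 + 10.0 ** (-true_diff / 400.0))
    return (1.0 - gift_rate) * fair


def apparent_elo_diff(true_diff, gift_rate):
    """Elo difference a plain logistic model would infer from that score."""
    score = observed_score(true_diff, gift_rate)
    return 400.0 * math.log10(score / (1.0 - score))


# With 5% of games thrown away, the inferred gap stops growing.
for true_diff in (400, 800, 1600, 3200):
    print(true_diff, round(apparent_elo_diff(true_diff, 0.05)))
```

In this toy model the inferred gap can never exceed `400 * log10(0.95 / 0.05)`, about 512 Elo, no matter how weak the opponent really is, so the weak opponent ends up rated far too high.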

stegemma
Posts: 859
Joined: Mon Aug 10, 2009 8:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Absolute ELO scale

Post by stegemma » Sat Dec 17, 2016 1:36 pm

nionita wrote: Is there any theoretical problem if we define ELO 0 as the playing strength of an engine that, in every (legal) position, plays one of the legal moves chosen from a uniform random distribution?
ELO 0 is when an engine always resigns on its first move... and still it can win against another similar engine, while playing black! :)

Maybe it would be better to define a standard search+evaluate and assign to that engine a default ELO (of about 1000, I think). Simpler would be to use a historical open-source engine that will never change in the future.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com

Henk
Posts: 5799
Joined: Mon May 27, 2013 8:31 am

Re: Absolute ELO scale

Post by Henk » Sat Dec 17, 2016 1:51 pm

stegemma wrote:
nionita wrote: Is there any theoretical problem if we define ELO 0 as the playing strength of an engine that, in every (legal) position, plays one of the legal moves chosen from a uniform random distribution?
ELO 0 is when an engine always resigns on its first move... and still it can win against another similar engine, while playing black! :)

Maybe it would be better to define a standard search+evaluate and assign to that engine a default ELO (of about 1000, I think). Simpler would be to use a historical open-source engine that will never change in the future.
I don't know: is it allowed to resign when it is not your turn? And if it is allowed, what happens if both resign at exactly the same time?

Guenther
Posts: 3015
Joined: Wed Oct 01, 2008 4:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Absolute ELO scale

Post by Guenther » Sat Dec 17, 2016 2:15 pm

Laskos wrote:...

Keep in mind that engines could play worse than random, they can go for some sort of anti-play. What their negative ELO can be is hard to say.
Exactly. I would guess the negative rating will just mirror the best possible rating, but on the negative scale. This is for always playing the worst possible move.

So something like (simplified):

 ~ +4000  best move possible
+/-    0  truly random moves
 ~ -4000  worst move possible

Laskos
Posts: 9414
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Absolute ELO scale

Post by Laskos » Sat Dec 17, 2016 3:04 pm

Guenther wrote:
Laskos wrote:...

Keep in mind that engines could play worse than random, they can go for some sort of anti-play. What their negative ELO can be is hard to say.
Exactly. I would guess the negative rating will just mirror the best possible rating, but on the negative scale. This is for always playing the worst possible move.

So something like (simplified):

 ~ +4000  best move possible
+/-    0  truly random moves
 ~ -4000  worst move possible
Also, this should be without resignation, time forfeits or illegal moves: simply play legal moves until the end of the game as defined by FIDE. Otherwise the most negative ELO would simply go to the engine that resigns or forfeits soonest.
