Absolute ELO scale

nionita · Post by **nionita** » Sat Dec 17, 2016 12:09 pm

Is there any theoretical problem if we define ELO 0 = playing strength of an engine which in every position plays one of the legal moves, chosen from a uniform random distribution?

Henk · Post by **Henk** » Sat Dec 17, 2016 12:22 pm

Good idea. But I'm no expert. To me, working with probabilities is like working with quicksand.

Laskos · Post by **Laskos** » Sat Dec 17, 2016 1:45 pm

nionita wrote: Is there any theoretical problem if we define ELO 0 = playing strength of an engine which in every position plays one of the legal moves, chosen from a uniform random distribution?

That would be a good definition. I even tried to see what the ratings would look like, and they are not far off from our current ratings. Daniel released a version of Andscacs with a selectable rate of randomness. The results at 10'' + 0.1'':

   # PLAYER         : RATING    POINTS  PLAYED    (%)
   1 Random 0%      : 2697.0     935.0    1000   93.5%
   2 Random 10%     : 2229.8    1033.0    2000   51.6%
   3 Random 20%     : 1632.3     970.0    2000   48.5%
   4 Random 30%     : 1156.3     582.0    2000   29.1%
   5 Random 40%     : 1142.2    1217.0    2000   60.9%
   6 Random 50%     :  961.6    1148.0    2000   57.4%
   7 Random 60%     :  604.0     820.5    2000   41.0%
   8 Random 70%     :  450.9    1097.5    2000   54.9%
   9 Random 80%     :  204.7     872.0    2000   43.6%
  10 Random 90%     :   76.6    1115.0    2000   55.8%
  11 Random 100%    : -155.6     210.0    1000   21.0%

Random 20% is an engine that plays like Andscacs 80% of the time and randomly 20% of the time. The engines probably follow a logistic curve over these large spans, so these are logistic ELO, not FIDE ELO. Keep in mind that engines could play worse than random; they can go for some sort of anti-play. What their negative ELO could be is hard to say.
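As a rough check on the logistic reading, the standard logistic Elo relations can be sketched as below. The function names are illustrative, and the comparison with the table assumes that Random 0% played all 1000 of its games against Random 10%, which the post does not state.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) of a player rated elo_diff above the opponent (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Inverse relation: rating gap implied by a score fraction strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # 93.5% is the score listed for Random 0% in the table above.
    print(round(elo_diff_from_score(0.935)))  # roughly 463
```

Under that assumption, a 93.5% score corresponds to a gap of roughly 463 points, in the same range as the 467-point difference between the top two entries of the table.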

hgm · Post by **hgm** » Sat Dec 17, 2016 1:49 pm

nionita wrote: Is there any theoretical problem if we define ELO 0 = playing strength of an engine which in every position plays one of the legal moves, chosen from a uniform random distribution?

This is a logical way to define things. There is the practical problem, however, that almost anything that does some rudimentary thinking is so much stronger that it scores close to 100% against the random mover, so that you cannot derive the Elo difference from it. So the 'calibration point' would lie (too) far outside the populated range.
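The saturation problem can be made concrete with a small sketch. It assumes independent games with no draws (a simplification) and applies the delta method to the logistic formula; `elo_standard_error` is a hypothetical helper name, not part of any rating tool.

```python
import math


def elo_standard_error(score: float, games: int) -> float:
    """Approximate standard error (in Elo) of a gap estimated from a score fraction.

    Assumes independent win/loss games, so the score has variance
    score * (1 - score) / games, propagated through
    elo = 400 * log10(score / (1 - score)).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    if games <= 0:
        raise ValueError("games must be positive")
    return 400.0 / (math.log(10) * math.sqrt(games * score * (1.0 - score)))


if __name__ == "__main__":
    for s in (0.5, 0.9, 0.99, 0.999):
        print(f"score {s:.3f}: +/- {elo_standard_error(s, 1000):.0f} Elo")
```

With 1000 games, the uncertainty at a 99.9% score is more than ten times the uncertainty at 50%, and a score of exactly 100% gives no finite estimate at all.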

So you would need to fill the gap with a series of engines that would bridge the difference. There are some non-searching engines like N.E.G., but even these beat a random mover badly (if they have stalemate avoidance), and probably by too much for reliable Elo determination. And they lose equally badly against searching engines, even if these do just 1-ply + QS.

Perhaps a series of searching engines based on the Beal effect, with progressively deeper search (and random evaluation), could bridge the gap and have reasonable scores against each other, with the random mover on one side and searching + evaluating engines on the other. There is no guarantee that bridging the gap with another series of engines would not give you a totally different rating, however. Basing ratings on just a single pairing of players is already bad, and making a chain of such pairings...
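On the purely statistical side, a chain adds up the noise of every link. The sketch below assumes independent links, no draws, and made-up numbers (ten links, each a 75% score over 1000 games). It does not capture the systematic risk that a different series of engines would yield a different rating.

```python
import math


def link_estimate(score: float, games: int) -> tuple[float, float]:
    """Return (elo_diff, standard_error) for one pairing under the logistic model."""
    diff = 400.0 * math.log10(score / (1.0 - score))
    se = 400.0 / (math.log(10) * math.sqrt(games * score * (1.0 - score)))
    return diff, se


def chain_estimate(links: list[tuple[float, int]]) -> tuple[float, float]:
    """Sum the link gaps; combine independent standard errors in quadrature."""
    estimates = [link_estimate(score, games) for score, games in links]
    total = sum(diff for diff, _ in estimates)
    se = math.sqrt(sum(err * err for _, err in estimates))
    return total, se


if __name__ == "__main__":
    hypothetical_chain = [(0.75, 1000)] * 10  # illustrative numbers only
    total, se = chain_estimate(hypothetical_chain)
    print(f"{total:.0f} +/- {se:.0f}")
```

For this made-up chain the total gap is about 1900 Elo with a standard error of about 40, against roughly 13 for a single link.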

Adam Hair · Post by **Adam Hair** » Sat Dec 17, 2016 1:53 pm

Check out the CCRL 40/4 complete list. The bottom engine is Brutus Rnd, a random mover. It does have a better score than it probably should. Several of its opponents have a problem applying checkmate with a material advantage.

hgm · Post by **hgm** » Sat Dec 17, 2016 2:04 pm

Yes, that is another problem. Many engines 'at the bottom of the pack' are quite buggy, and their results are not well described by an Elo model with a bell-shaped curve (like Gaussian or logistic). In particular, they will manage to lose points (draws, but often also losses) against arbitrarily weak opponents, purely due to problems of their own (e.g. by forfeiting on time, playing an illegal move, resigning when ahead, refusing to checkmate). If you analyze that with a normal Elo model, the weak engines that get handed free points this way get a hugely overvalued rating, as if the points were somehow deserved by their own performance.
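A toy model shows how large this distortion can be. It assumes the stronger engine hands over a fixed fraction of its games regardless of the opponent and otherwise follows the logistic curve; the 5% forfeit rate and the 1200-point true gap are made-up numbers, and `apparent_gap` is a hypothetical helper.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score of a player rated elo_diff above the opponent (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def apparent_gap(true_gap: float, forfeit_rate: float) -> float:
    """Gap a plain logistic fit would infer when the stronger side forfeits a fixed fraction.

    The weaker side collects every forfeited game, plus its normal
    logistic share of the games that are actually played out.
    """
    weak_score = forfeit_rate + (1.0 - forfeit_rate) * expected_score(-true_gap)
    return -400.0 * math.log10(weak_score / (1.0 - weak_score))


if __name__ == "__main__":
    print(round(apparent_gap(1200.0, 0.05)))
```

With these numbers the apparent gap comes out a little over 500 instead of 1200, so the weak engine looks about 700 points stronger than its play justifies.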

stegemma · Post by **stegemma** » Sat Dec 17, 2016 2:36 pm

nionita wrote: Is there any theoretical problem if we define ELO 0 = playing strength of an engine which in every position plays one of the legal moves, chosen from a uniform random distribution?

ELO 0 is when an engine always resigns on its first move... and it can still win against another similar engine when playing black!

Maybe it would be better to define a standard search+evaluate engine and assign it a default ELO (of about 1000, I think). Simpler would be to use a historical open-source engine that will never change in the future.

Henk · Post by **Henk** » Sat Dec 17, 2016 2:51 pm

stegemma wrote:
nionita wrote: Is there any theoretical problem if we define ELO 0 = playing strength of an engine which in every position plays one of the legal moves, chosen from a uniform random distribution?
ELO 0 is when an engine always resigns on its first move... and it can still win against another similar engine when playing black!

Maybe it would be better to define a standard search+evaluate engine and assign it a default ELO (of about 1000, I think). Simpler would be to use a historical open-source engine that will never change in the future.

I don't know: is it allowed to resign when it is not your turn? And if it is allowed, what happens if both resign at exactly the same time?

Guenther · Post by **Guenther** » Sat Dec 17, 2016 3:15 pm

Laskos wrote:...

Keep in mind that engines could play worse than random; they can go for some sort of anti-play. What their negative ELO could be is hard to say.

Exactly. I would guess the negative rating will just mirror the best possible rating, but on the negative scale. This is for always playing the worst possible move.

So something like (simplified):

 ~ +4000 best move possible
+/- 0 truly random moves
~ -4000 worst move possible

Laskos · Post by **Laskos** » Sat Dec 17, 2016 4:04 pm

Guenther wrote:
Laskos wrote:...

Keep in mind that engines could play worse than random; they can go for some sort of anti-play. What their negative ELO could be is hard to say.
Exactly. I would guess the negative rating will just mirror the best possible rating, but on the negative scale. This is for always playing the worst possible move.

So something like (simplified):
 ~ +4000 best move possible
+/- 0 truly random moves
~ -4000 worst move possible

Also, this should be without resignation, time forfeits or illegal moves: simply play legal moves until the end of the game as defined by FIDE. Otherwise the most negative ELO will simply go to the engine that resigns or forfeits soonest.
