Absolute ELO scale

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Absolute ELO scale

Post by Laskos »

hgm wrote:It seems you only used one method to weaken the engines there, namely reducing the size of the search tree of healthy engines by node count. You cannot assume this would hold for other methods of weakening too (like random pruning, gross misevaluation).
Yes, but I used an array of different engines, which differ in many characteristics, not just nodes searched. Then, with the Andscacs randomizer and partial randomizer, the logistic again came out as the best-fitting model, on a 3000 ELO scale now. Sure, engines that forfeit most of their games on time and the like will not obey the logistic; in fact, they probably will not obey any model.
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Absolute ELO scale

Post by cdani »

I have made a version of Andscacs that tries to play the worst possible move, in case anyone is interested: Andworst -0.1. The version numbers go backwards :-)

www.andscacs.com/andworst.zip

It can happen that it takes an offered piece because doing so gives the opponent a mate, but if the opponent is a weak engine or a random mover it may not see the mate, so sometimes it achieves a draw against a random mover.

Surely an even worse player can be made.

The Andscacs random version mentioned by Kai is here:

www.andscacs.com/andscacs_r087007.zip
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Absolute ELO scale

Post by hgm »

Laskos wrote:Yes, but I used an array of different engines, which differ in many characteristics, not just nodes searched. Then, with that Andscacs randomizer and partial randomizer, logistic came out too as most adapted, on 3000 ELO scale now. Sure, engines forfeiting most of games on time and such, will not obey logistic, in fact probably will not obey any model.
Well, in a broader view of things these engines are practically identical: they are all alpha-beta searchers with a heuristic evaluation, which uses material, King Safety, Pawn structure, etc. Their source of error is never a gross blunder, just that something is outside of their search horizon. As the probability for this is basically a property of the game tree of Chess, it is not surprising they would all behave in a certain way. With a different source of error, e.g. when a single gross blunder decides the outcome of the game (of which time losses are only one possible case), things might look very different. You did not include any MC-UCT engines, engines without QS, engines with faulty piece values, non-searching engines...
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Absolute ELO scale

Post by cdani »

This new version seems even worse against the random mover:

www.andscacs.com/andworst-0.2.zip
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Absolute ELO scale

Post by Laskos »

cdani wrote:This new version seems even worst against the random mover:

www.andscacs.com/andworst-0.2.zip
Thank you very much, Daniel!

First match: A-worst at 5''+0.05'' versus A-Random:

Score of A-worst vs A-Random: 68 - 932 - 0  [0.068] 1000
ELO difference: -454.76 +/- 43.43
Finished match
About 450 ELO points weaker than the random mover.
Now I will see whether A-worst performs even worse at a longer time control.
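
For reference, the ELO difference reported above follows from the score fraction under the logistic model. Here is a minimal Python sketch, assuming the usual 400-point logistic formula; the function name `elo_difference` is illustrative and not taken from Cutechess-Cli:

```python
import math


def elo_difference(wins, losses, draws):
    """ELO difference implied by a match score under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        # The logistic formula has no finite value at 0% or 100%.
        return float("nan")
    return -400.0 * math.log10(1.0 / score - 1.0)


# First match: A-worst scored 68 wins, 932 losses, 0 draws.
print(round(elo_difference(68, 932, 0), 2))
```

With 68 wins, 932 losses and no draws, the score fraction is 0.068 and the formula gives about -454.76, in line with the reported figure.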
Guenther
Posts: 4605
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Absolute ELO scale

Post by Guenther »

Laskos wrote:
cdani wrote:This new version seems even worst against the random mover:

www.andscacs.com/andworst-0.2.zip
Thank you very much, Daniel!

First match: A-worst at 5''+0.05'' versus A-Random:

Score of A-worst vs A-Random: 68 - 932 - 0  [0.068] 1000
ELO difference: -454.76 +/- 43.43
Finished match
450 ELO points weaker than random mover.
Now I will see if at longer time control A-worst performs even worse.
Is the result WLD? or LDW? or DLW or...?
The best a worst mover should achieve is a draw, never a win.

BTW I take back my suggestion that a rnd mover should be in the middle of a scale between the best and the worst (losers) player.
Thinking about it again, it should be shifted towards a negative value.

(A real random mover should do no search at all; it should just iterate through all legal moves and select one of them at random.
I don't know if this is achieved yet by the available rnd movers?)
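
The no-search random mover described above can be sketched in a few lines of Python. This assumes some move generator already supplies the legal moves (here as plain strings); `pick_random_move` and `STARTING_MOVES` are hypothetical names for illustration, not part of any existing engine:

```python
import random


def pick_random_move(legal_moves, rng=random):
    """Return one legal move chosen uniformly at random, with no search or evaluation."""
    moves = list(legal_moves)
    if not moves:
        return None  # no legal move: checkmate or stalemate
    return rng.choice(moves)


# The 20 legal moves of the starting position, written in coordinate notation.
STARTING_MOVES = [f + "2" + f + r for f in "abcdefgh" for r in "34"]
STARTING_MOVES += ["b1a3", "b1c3", "g1f3", "g1h3"]

print(pick_random_move(STARTING_MOVES))
```

Every legal move has the same probability of being chosen, so the mover's strength does not depend on time control, depth or node count.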
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Absolute ELO scale

Post by Laskos »

Guenther wrote:
Laskos wrote:
cdani wrote:This new version seems even worst against the random mover:

www.andscacs.com/andworst-0.2.zip
Thank you very much, Daniel!

First match: A-worst at 5''+0.05'' versus A-Random:

Score of A-worst vs A-Random: 68 - 932 - 0  [0.068] 1000
ELO difference: -454.76 +/- 43.43
Finished match
450 ELO points weaker than random mover.
Now I will see if at longer time control A-worst performs even worse.
Is the result WLD? or LDW? or DLW or...?
The best a worst mover should reach is a draw, but no win.

BTW I take back my suggestion that a rnd mover should be in the middle of a scale between best/worst(losers) player.
Thinking about it again it should be shifted towards a negative value.

(A real random mover should do no search at all, it should just iterate through all legal moves and select randomly one of those.
I don't know if this is acchieved yet for the available rnd movers?)
AFAIK the Andscacs randomizer does exactly that, picking a legal move at random from all legal moves.
The results are WLD. Now, the second test is trickier: two worst movers at different time controls have a very high draw ratio and a small difference in wins and losses. But sometimes they lose on time or disconnect, which bothers me. I will try to see later what happens. The games between worst movers are very long, hundreds of moves, with most ending in draws. For all the games in this thread it is important not to have any sort of adjudication.
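
The WLD reading can be cross-checked against the bracketed value in the score line. The sketch below assumes that the bracketed number is the first engine's score fraction, (wins + draws/2) / games; the function name `consistent_orderings` is made up for this example:

```python
from itertools import permutations


def consistent_orderings(counts, reported_score, tol=0.0005):
    """List the W/L/D labelings of the three counts that reproduce the reported score."""
    games = sum(counts)
    matches = []
    for labels in permutations("WLD"):
        tally = dict(zip(labels, counts))
        score = (tally["W"] + 0.5 * tally["D"]) / games
        if abs(score - reported_score) <= tol:
            matches.append("".join(labels))
    return matches


# Score line from the first match: 68 - 932 - 0  [0.068]
print(consistent_orderings((68, 932, 0), 0.068))  # ['WLD']
```

Under that assumption, only the win-loss-draw ordering is consistent with a score of 0.068; any ordering that treats the 68 games as draws would give 0.034 or 0.966 instead.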
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Absolute ELO scale

Post by cdani »

Laskos wrote:
First match: A-worst at 5''+0.05'' versus A-Random:

Score of A-worst vs A-Random: 68 - 932 - 0  [0.068] 1000
ELO difference: -454.76 +/- 43.43
Finished match
450 ELO points weaker than random mover.
I have run the same test as you: Random wins most of the games, and just a few are drawn. As Guenther suggested, your result is probably:
Random: 932 wins, and 68 draws. Worst: 0 wins.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Absolute ELO scale

Post by Laskos »

cdani wrote:
Laskos wrote:
First match: A-worst at 5''+0.05'' versus A-Random:

Score of A-worst vs A-Random: 68 - 932 - 0  [0.068] 1000
ELO difference: -454.76 +/- 43.43
Finished match
450 ELO points weaker than random mover.
I have done the same test as you, and Random wins most of the games, and just a few are draw. As Guenther said, probably your result is:
Random: 932 wins, and 68 draws. Worst: 0 wins.
The 68 are wins by A-worst. Can I ask you something: what time control, depth, or node limit do I need to set in Cutechess-Cli for Random so that it works correctly, in the shortest possible time, as a generator of random legal moves? I remember having some problems with it in the past.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Absolute ELO scale

Post by Laskos »

Laskos wrote:
cdani wrote:
Laskos wrote:
First match: A-worst at 5''+0.05'' versus A-Random:

Score of A-worst vs A-Random: 68 - 932 - 0  [0.068] 1000
ELO difference: -454.76 +/- 43.43
Finished match
450 ELO points weaker than random mover.
I have done the same test as you, and Random wins most of the games, and just a few are draw. As Guenther said, probably your result is:
Random: 932 wins, and 68 draws. Worst: 0 wins.
68 are wins of the worst. Can I ask you something: what time control, depth or nodes are needed to set in Cutechess-Cli for Random for it to work correctly in the shortest amount of time as a generator of random legal moves? I remember in the past I had some problems with it.
Yes, it was a problem with my A-Random: I saw that it could make illegal moves or lose on time when the thinking time was too tight. It seems fixed now:
A-worst at 10''+0.1'' versus A-Random (fixed):

Score of A-worst vs A Random: 0 - 998 - 2  [0.001] 1000
ELO difference: -1199.83 +/- nan
Finished match
2 draws.
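
The `nan` error margin can be reproduced under one plausible assumption: that the margin is obtained by taking a 95% normal interval around the score fraction and converting both ends to ELO. The following is a sketch of that assumption, not Cutechess-Cli's actual code; the names `score_to_elo` and `elo_with_margin` are made up here. With a score of 0.001, the lower end of the interval drops below zero, where the logistic formula has no value.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def score_to_elo(score):
    """Logistic ELO difference for a score fraction; nan outside the open interval (0, 1)."""
    if score <= 0.0 or score >= 1.0:
        return float("nan")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, losses, draws):
    """Return (ELO difference, half-width of the 95% interval expressed in ELO)."""
    games = wins + losses + draws
    w, l, d = wins / games, losses / games, draws / games
    mu = w + 0.5 * d
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = w * (1.0 - mu) ** 2 + l * (0.0 - mu) ** 2 + d * (0.5 - mu) ** 2
    stdev = math.sqrt(var / games)
    low, high = mu - Z_95 * stdev, mu + Z_95 * stdev
    margin = (score_to_elo(high) - score_to_elo(low)) / 2.0
    return score_to_elo(mu), margin


print(elo_with_margin(0, 998, 2))
print(elo_with_margin(68, 932, 0))
```

Under these assumptions the 0 - 998 - 2 result gives about -1199.83 with an undefined margin, while the earlier 68 - 932 - 0 result gives a margin of roughly 43.5, close to the reported 43.43.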

PS I tried to play two A-worst engines against each other at different time controls. From time to time an engine loses by "disconnection", rendering the results meaningless. The games are very long and usually drawn. Is it a small bug, or am I again doing something wrong? The time controls were, say, 20''+0.2'' versus 5''+0.05''.