Absolute ELO scale

Laskos · Post by **Laskos** » Sun Dec 18, 2016 10:38 am

hgm wrote: It seems you only used one method to weaken the engines there, namely reducing the size of the search tree of healthy engines by node count. You cannot assume this would hold for other methods of weakening too (like random pruning or gross misevaluation).

Yes, but I used an array of different engines, which differ in many characteristics, not just in nodes searched. Then, with the Andscacs randomizer and partial randomizer, the logistic model again came out as the best fit, now on a 3000 ELO scale. Sure, engines that forfeit most of their games on time and the like will not obey the logistic model; in fact, they probably will not obey any model.

cdani · Post by **cdani** » Sun Dec 18, 2016 11:49 am

I have made a version of Andscacs that tries to play the worst possible move, in case anyone is interested: Andworst -0.1 (the version numbers go backwards).

www.andscacs.com/andworst.zip

It can happen that it takes an offered piece because doing so gives the rival a mate, but if the rival is a weak engine or a random mover it may not see the mate, so it sometimes achieves a draw against a random mover.

Surely an even worse player can be made.

The Andscacs random commented by Kai is here:

www.andscacs.com/andscacs_r087007.zip

hgm · Post by **hgm** » Sun Dec 18, 2016 12:05 pm

Laskos wrote: Yes, but I used an array of different engines, which differ in many characteristics, not just in nodes searched. Then, with the Andscacs randomizer and partial randomizer, the logistic model again came out as the best fit, now on a 3000 ELO scale. Sure, engines that forfeit most of their games on time and the like will not obey the logistic model; in fact, they probably will not obey any model.

Well, in a broader view of things these engines are practically identical: they are all alpha-beta searchers with a heuristic evaluation that uses material, King Safety, Pawn structure, etc. Their source of error is never a gross blunder, just that something is outside their search horizon. As the probability for this is basically a property of the game tree of Chess, it is not surprising they would all behave in a certain way. With a different source of error, e.g. when a single gross blunder decides the outcome of the game (of which time losses are only one possible case), things might look very different. You did not include any MC-UCT engines, engines without QS, engines with faulty piece values, non-searching engines...

cdani · Post by **cdani** » Sun Dec 18, 2016 12:37 pm

This new version seems even worse against the random mover:

www.andscacs.com/andworst-0.2.zip

Laskos · Post by **Laskos** » Sun Dec 18, 2016 2:03 pm

cdani wrote: This new version seems even worse against the random mover:

www.andscacs.com/andworst-0.2.zip

Thank you very much, Daniel!

First match: A-worst at 5''+0.05'' versus A-Random:

Code:

Score of A-worst vs A-Random: 68 - 932 - 0  [0.068] 1000
ELO difference: -454.76 +/- 43.43
Finished match

450 ELO points weaker than random mover.
Now I will see if at longer time control A-worst performs even worse.
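
The ELO difference in the match output above is consistent with inverting the logistic expected-score curve. The following is a minimal sketch of that arithmetic, assuming the usual 400-point logistic scale and a W - L - D ordering of the score line; the function names are illustrative and are not taken from Cutechess-Cli, and the error margin is not reproduced here.

```python
import math


def score_fraction(wins, losses, draws):
    """Fraction of points scored: a win counts 1, a draw counts 0.5."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Invert the logistic model E = 1 / (1 + 10 ** (-d / 400)) for d."""
    if not 0.0 < score < 1.0:
        # A 0% or 100% score maps to an infinite rating difference.
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Numbers from the score line above, read as wins - losses - draws.
score = score_fraction(68, 932, 0)
print(f"score = {score:.3f}, ELO difference = {elo_difference(score):.2f}")
```

With 68 wins and 932 losses the score fraction is 0.068, which this formula maps to roughly -454.76, matching the figure in the match output.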

Guenther · Post by **Guenther** » Sun Dec 18, 2016 2:21 pm


Is the result WLD? or LDW? or DLW or...?
The best a worst mover should reach is a draw, but no win.

BTW I take back my suggestion that a rnd mover should be in the middle of a scale between best/worst(losers) player.
Thinking about it again it should be shifted towards a negative value.

(A real random mover should do no search at all; it should just iterate through all legal moves and randomly select one of them.
I don't know if this has been achieved yet by the available rnd movers?)
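
The kind of random mover described here can be sketched in a few lines. This is only an illustration and not how any of the engines in this thread are implemented: it assumes the list of legal moves comes from some external move generator, and the `pick_random_move` name and the sample move strings are hypothetical.

```python
import random


def pick_random_move(legal_moves, rng=random):
    """Pick one legal move uniformly at random, with no search or evaluation."""
    moves = list(legal_moves)
    if not moves:
        return None  # no legal move: the game is already over
    return rng.choice(moves)


# Stand-in for the output of a real move generator (hypothetical data).
sample_moves = ["e2e4", "d2d4", "g1f3", "b1c3"]
rng = random.Random(2016)  # seeded only to make the demo repeatable
print(pick_random_move(sample_moves, rng))
```

Because every legal move has the same probability, such a player has no preference for captures, checks or mates, which is what separates it from a deliberately worst mover.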

Laskos · Post by **Laskos** » Sun Dec 18, 2016 2:38 pm


Andscacs randomizer AFAIK is doing that, picking randomly a legal move from all legal moves.
The results are WLD. Now, the second test is trickier: two worst movers at different time controls have very high draw ratio and small difference in Win/Loss. But sometimes they lose on time or disconnect, which bothers me. I will try to see later what happens. The games between worst movers are very long, hundreds of moves, most ending in draws. In all these games in this thread it's important to not have any sort of adjudication.

cdani · Post by **cdani** » Sun Dec 18, 2016 3:03 pm

Laskos wrote:
First match: A-worst at 5''+0.05'' versus A-Random:
Code:
Score of A-worst vs A-Random: 68 - 932 - 0  [0.068] 1000
ELO difference: -454.76 +/- 43.43
Finished match
450 ELO points weaker than random mover.

I have run the same test as you, and Random wins most of the games, with just a few draws. As Guenther said, your result is probably:
Random: 932 wins and 68 draws. Worst: 0 wins.

Laskos · Post by **Laskos** » Sun Dec 18, 2016 3:40 pm


68 are wins of the worst. Can I ask you something: what time control, depth or node limit needs to be set in Cutechess-Cli for Random so that it works correctly, in the shortest amount of time, as a generator of random legal moves? I remember having some problems with it in the past.

Laskos · Post by **Laskos** » Sun Dec 18, 2016 3:54 pm


Yes, it was a problem with my A-Random: I saw that it might make illegal moves or lose on time when the thinking time is too tight. It seems fixed now:
A-worst at 10''+0.1'' versus A-Random (fixed):

Code:

Score of A-worst vs A Random: 0 - 998 - 2  [0.001] 1000
ELO difference: -1199.83 +/- nan
Finished match

2 draws.

PS I tried to play two A-worst engines at different time controls; from time to time an engine loses by "disconnection", rendering the results meaningless. Games are very long and usually drawn. Is it a small bug, or am I again doing something wrong? The time controls were, say, 20''+0.2'' versus 5''+0.05''.
