hgm wrote: It seems you only used one method to weaken the engines there, namely reducing the size of the search tree of healthy engines by node count. You cannot assume this would hold for other methods of weakening too (like random pruning, gross misevaluation).
Yes, but I used an array of different engines, which differ in many characteristics, not just nodes searched. Then, with that Andscacs randomizer and partial randomizer, the logistic came out as the best-fitting model too, now on a 3000 ELO scale. Sure, engines forfeiting most of their games on time and such will not obey the logistic; in fact they probably will not obey any model.
Absolute ELO scale
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Absolute ELO scale
-
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Absolute ELO scale
I have made a version of Andscacs that tries to play the worst possible move, in case anyone is interested: Andworst -0.1 (the version numbers go backwards).
www.andscacs.com/andworst.zip
It can happen that it takes an offered piece because doing so gives the rival a mate, but if the rival is a weak engine or a random mover it may not see the mate, so Andworst sometimes achieves a draw against a random mover.
Surely an even worse player can be made.
The random Andscacs that Kai mentioned is here:
www.andscacs.com/andscacs_r087007.zip
Daniel José - http://www.andscacs.com
-
- Posts: 27817
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Absolute ELO scale
Laskos wrote: Yes, but I used an array of different engines, which differ in many characteristics, not just nodes searched. Then, with that Andscacs randomizer and partial randomizer, logistic came out too as most adapted, on 3000 ELO scale now. Sure, engines forfeiting most of games on time and such, will not obey logistic, in fact probably will not obey any model.
Well, in a broader view of things these engines are practically identical: they are all alpha-beta searchers with a heuristic evaluation, which uses material, King Safety, Pawn structure, etc. Their source of error is never a gross blunder, just that something is outside of their search horizon. As the probability for this is basically a property of the game tree of Chess, it is not surprising they would all behave in a certain way. With a different source of error, e.g. when a single gross blunder decides the outcome of the game (of which time losses are only one possible case), things might look very different. You did not include any MC-UCT engines, engines without QS, engines with faulty piece values, non-searching engines...
-
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Absolute ELO scale
Daniel José - http://www.andscacs.com
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Absolute ELO scale
cdani wrote: This new version seems even worst against the random mover:
www.andscacs.com/andworst-0.2.zip
Thank you very much, Daniel!
First match: A-worst at 5''+0.05'' versus A-Random:
Score of A-worst vs A-Random: 68 - 932 - 0 [0.068] 1000
ELO difference: -454.76 +/- 43.43
Finished match
Now I will see if at longer time control A-worst performs even worse.
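For reference, the ELO difference in the output above is consistent with the usual logistic conversion from score fraction to rating difference. The following is a minimal sketch of that conversion; the function names are hypothetical, and it is only assumed (not confirmed from the tool's source) that the match runner uses this same formula.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Logistic model: expected score of a player rated elo_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_difference(score: float) -> float:
    """Inverse of expected_score: rating difference implied by a score fraction."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Score fraction reported in brackets in the match output above.
reported_score = 0.068
print(f"{elo_difference(reported_score):.2f}")
```

Feeding in the reported score of 0.068 gives a difference of about -455, in line with the figure in the match output; `expected_score` maps that difference back to the same score.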
-
- Posts: 4611
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: Absolute ELO scale
Laskos wrote:
cdani wrote: This new version seems even worst against the random mover:
www.andscacs.com/andworst-0.2.zip
Thank you very much, Daniel!
First match: A-worst at 5''+0.05'' versus A-Random:
Score of A-worst vs A-Random: 68 - 932 - 0 [0.068] 1000
ELO difference: -454.76 +/- 43.43
Finished match
450 ELO points weaker than random mover.
Now I will see if at longer time control A-worst performs even worse.
Is the result WLD? or LDW? or DLW or...?
The best a worst mover should reach is a draw, but no win.
BTW I take back my suggestion that a rnd mover should be in the middle of a scale between best/worst(losers) player.
Thinking about it again it should be shifted towards a negative value.
(A real random mover should do no search at all, it should just iterate through all legal moves and select randomly one of those.
I don't know if this is achieved yet for the available rnd movers?)
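The "real random mover" described here can be sketched in a few lines. This is only an illustration under stated assumptions: `pick_random_move` is a hypothetical name, and the list of legal moves is assumed to come from whatever move generator the engine already has; nothing here reflects how the Andscacs randomizer is actually written.

```python
import random
from typing import Optional, Sequence, TypeVar

Move = TypeVar("Move")


def pick_random_move(legal_moves: Sequence[Move],
                     rng: Optional[random.Random] = None) -> Move:
    """Return one legal move chosen uniformly at random, with no search at all."""
    if not legal_moves:
        # No legal moves means checkmate or stalemate; the caller must handle it.
        raise ValueError("no legal moves available")
    chooser = rng if rng is not None else random.Random()
    return chooser.choice(legal_moves)


# Hypothetical usage with moves written as plain coordinate strings.
moves = ["e2e4", "d2d4", "g1f3", "b1c3"]
print(pick_random_move(moves, random.Random(2016)))
```

Passing a seeded `random.Random` makes a game reproducible, which helps when checking whether a loss came from the move choice or from a time or protocol problem.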
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Absolute ELO scale
Guenther wrote:
Laskos wrote:
cdani wrote: This new version seems even worst against the random mover:
www.andscacs.com/andworst-0.2.zip
Thank you very much, Daniel!
First match: A-worst at 5''+0.05'' versus A-Random:
Score of A-worst vs A-Random: 68 - 932 - 0 [0.068] 1000
ELO difference: -454.76 +/- 43.43
Finished match
450 ELO points weaker than random mover.
Now I will see if at longer time control A-worst performs even worse.
Is the result WLD? or LDW? or DLW or...?
The best a worst mover should reach is a draw, but no win.
BTW I take back my suggestion that a rnd mover should be in the middle of a scale between best/worst(losers) player.
Thinking about it again it should be shifted towards a negative value.
(A real random mover should do no search at all, it should just iterate through all legal moves and select randomly one of those.
I don't know if this is achieved yet for the available rnd movers?)
Andscacs randomizer AFAIK is doing that, picking randomly a legal move from all legal moves.
The results are WLD. Now, the second test is trickier: two worst movers at different time controls have very high draw ratio and small difference in Win/Loss. But sometimes they lose on time or disconnect, which bothers me. I will try to see later what happens. The games between worst movers are very long, hundreds of moves, most ending in draws. In all these games in this thread it's important to not have any sort of adjudication.
-
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Absolute ELO scale
Laskos wrote:
First match: A-worst at 5''+0.05'' versus A-Random:
Score of A-worst vs A-Random: 68 - 932 - 0 [0.068] 1000
ELO difference: -454.76 +/- 43.43
Finished match
450 ELO points weaker than random mover.
I have done the same test as you, and Random wins most of the games, and just a few are draws. As Guenther said, your result is probably:
Random: 932 wins, and 68 draws. Worst: 0 wins.
Daniel José - http://www.andscacs.com
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Absolute ELO scale
cdani wrote:
Laskos wrote:
First match: A-worst at 5''+0.05'' versus A-Random:
Score of A-worst vs A-Random: 68 - 932 - 0 [0.068] 1000
ELO difference: -454.76 +/- 43.43
Finished match
450 ELO points weaker than random mover.
I have done the same test as you, and Random wins most of the games, and just a few are draw. As Guenther said, probably your result is:
Random: 932 wins, and 68 draws. Worst: 0 wins.
68 are wins of the worst. Can I ask you something: what time control, depth or nodes are needed to set in Cutechess-Cli for Random for it to work correctly in the shortest amount of time as a generator of random legal moves? I remember in the past I had some problems with it.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Absolute ELO scale
Laskos wrote:
cdani wrote:
Laskos wrote:
First match: A-worst at 5''+0.05'' versus A-Random:
Score of A-worst vs A-Random: 68 - 932 - 0 [0.068] 1000
ELO difference: -454.76 +/- 43.43
Finished match
450 ELO points weaker than random mover.
I have done the same test as you, and Random wins most of the games, and just a few are draw. As Guenther said, probably your result is:
Random: 932 wins, and 68 draws. Worst: 0 wins.
68 are wins of the worst. Can I ask you something: what time control, depth or nodes are needed to set in Cutechess-Cli for Random for it to work correctly in the shortest amount of time as a generator of random legal moves? I remember in the past I had some problems with it.
Yes, it was a problem with my A-Random, I saw that it might make illegal moves or lose on time in too tight thinking time. Seems fixed now:
A-worst at 10''+0.1'' versus A-Random (fixed):
Score of A-worst vs A Random: 0 - 998 - 2 [0.001] 1000
ELO difference: -1199.83 +/- nan
Finished match
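A possible reading of the "+/- nan" above: if the error margin is taken from a normal-approximation confidence interval on the score, the lower bound of that interval drops below zero for a 0 - 998 - 2 result, and the logistic conversion is undefined there. The sketch below assumes that method and the W - L - D order; the function names are hypothetical and the formula is not taken from Cutechess-Cli's source, so the margins it prints need not match the tool to the last digit.

```python
import math
from statistics import NormalDist


def elo_from_score(score: float) -> float:
    """Logistic conversion; NaN when the score is outside the open interval (0, 1)."""
    if not 0.0 < score < 1.0:
        return math.nan
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int, confidence: float = 0.95):
    """Return (ELO difference, half-width of the interval) for a W - L - D result."""
    games = wins + losses + draws
    mean = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - mean) ** 2
                + draws * (0.5 - mean) ** 2
                + losses * (0.0 - mean) ** 2) / games
    std_error = math.sqrt(variance / games)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    low, high = mean - z * std_error, mean + z * std_error
    margin = (elo_from_score(high) - elo_from_score(low)) / 2.0
    return elo_from_score(mean), margin


print(elo_with_margin(68, 932, 0))   # first match
print(elo_with_margin(0, 998, 2))    # match after the A-Random fix
```

Under these assumptions the second call returns a difference near -1200 with a NaN margin, because the lower score bound is negative; with so lopsided a result the ELO figure itself is only a rough indication.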
PS I tried to play two A-worst instances at different time controls; from time to time an engine loses by "disconnection", rendering the results meaningless. Games are very long, and usually draws. Is it a small bug, or am I again doing something wrong? Time controls were, say, 20''+0.2'' versus 5''+0.05''.