UCI_Elo


Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: UCI_Elo

Post by Ferdy »

Ran a new round robin (RR) with engines set to UCI_Elo 1500 at TC 60s+100ms. The new engines are Rybka, Rodent, DanaSah with the engine-opponent setting, a new version of Stockfish, and the new version of my engine.


   # PLAYER                                 :  RATING  ERROR  POINTS  PLAYED   (%)
   1 Cheng 4.39 ucielo 1500                 :     597    195    60.5      64  94.5
   2 Cheese 2.1 ucielo 1500                 :     366    132    53.0      64  82.8
   3 Rybka v2.3.2a ucielo 1500              :     300    122    50.0      64  78.1
   4 Fruit reloaded v3.21 ucielo 1500       :     300    122    50.0      64  78.1
   5 Amyan 1.72 ucielo 1500                 :     231    116    46.5      64  72.7
   6 Houdini 3 ucielo 1500                  :      76    105    37.5      64  58.6
   7 Ufim v8.02 ucielo 1500                 :      76    102    37.5      64  58.6
   8 Rhetoric 1.4.3 ucielo 1500             :      44    102    35.5      64  55.5
   9 MadChess 2.2 ucielo 1500               :     -80    100    27.5      64  43.0
  10 Discocheck 5.2 ucielo 1500             :    -111    100    25.5      64  39.8
  11 Deuterium v2019.2.37.71 ucielo 1500    :    -111    105    25.5      64  39.8
  12 Stockfish 260819 ucielo 1500           :    -126    108    24.5      64  38.3
  13 Rodent IV 021 ucielo 1500              :    -150    105    23.0      64  35.9
  14 Arasan 21.3 ucielo 1500                :    -267    109    16.0      64  25.0
  15 DanaSah 7.9 engine_opp ucielo 1500     :    -276    113    15.5      64  24.2
  16 CT800 V1.34 ucielo 1500                :    -410    137     9.0      64  14.1
  17 Hiarcs 14 ucielo 1500                  :    -461    153     7.0      64  10.9
Games:
https://drive.google.com/file/d/1ekzAfs ... sp=sharing


Updated my TOPSIS ranking by testing these new engines at 1s per position on 5k positions taken from human games played by people rated 1450 to 1550.
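For readers who want to reproduce a 1s-per-position test, the sketch below only builds the UCI command sequence a GUI or script would send to an engine for one test position. `UCI_LimitStrength` and `UCI_Elo` are the standard option names from the UCI protocol; the helper name, the exact command order and the use of `go movetime 1000` for "1s/pos" are assumptions, not a description of the tool used for these results. Reading the engine's replies (`uciok`, `readyok`, `bestmove`) is left to the caller.

```python
def uci_test_commands(fen, elo=1500, movetime_ms=1000):
    """Return UCI commands for searching one test position at limited strength.

    Hypothetical helper: after "uci" the caller should wait for "uciok",
    after "isready" for "readyok", and after "go" read the "bestmove" line.
    """
    return [
        "uci",
        "setoption name UCI_LimitStrength value true",
        f"setoption name UCI_Elo value {elo}",
        "ucinewgame",
        "isready",
        f"position fen {fen}",
        f"go movetime {movetime_ms}",
    ]


if __name__ == "__main__":
    start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    for line in uci_test_commands(start):
        print(line)
```

The engine's `bestmove` would then be compared with the move the human actually played in that position.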
Test result table:


UCI_Elo 1500 engine test results on FIDE Elo 1500
Test positions are taken from players with FIDE Elo 1450 to 1550

                               Engine  Total  Match  High   Low  HACD  LACD     HEMSE
              Ufim v8.02 UCI_Elo 1500   5000   2168  1363  1469   422   361   4737056
             Arasan 21.3 UCI_Elo 1500   5000   1641  1167  2192   495   452   6708957
             CT800 V1.34 UCI_Elo 1500   5000   1642  1149  2209   333   807  10656012
   DanaSah 7.9 human_opp UCI_Elo 1500   5000   1679  1202  2119   461   396   6041935
              Cheng 4.39 UCI_Elo 1500   5000   2068  1466  1466   474   300   5016436
          Discocheck 5.2 UCI_Elo 1500   5000   1948  1308  1744   381   443   5505496
              Amyan 1.72 UCI_Elo 1500   5000   1803  1218  1979   419   561   7648687
            MadChess 2.2 UCI_Elo 1500   5000   1693  1240  2067   424   552   7680236
              Cheese 2.1 UCI_Elo 1500   5000   2123  1486  1391   404   210   3546737
          Rhetoric 1.4.3 UCI_Elo 1500   5000   1853  1349  1798   357   461   5719087
                         Stockfish 10   5000   2244  2340   416   355    46   3238666
                  Arminius 2017-01-01   5000   2216  1827   957   420    68   3236474
           Rybka v2.3.2a UCI_Elo 1500   5000   2128  1368  1504   428   269   4128324
           Rodent IV 021 UCI_Elo 1500   5000   2013  1270  1717   410   537   6605086
   DanaSah 7.9 engine_opp ucielo 1500   5000   1730  1214  2056   425   390   5673680
        Stockfish 260819 UCI_Elo 1500   5000   1337  1252  2411   421   436   6484532
               Houdini 3 UCI_Elo 1500   5000   1722  1290  1988   446   236   4121787
               Hiarcs 14 UCI_Elo 1500   5000   1795  1107  2098   376   695   8627186
 Deuterium v2019.2.37.71 UCI_Elo 1500   5000   1896  1297  1807   433   311   4515235

::Legend::
Total: Number of test positions from human games.
Match: Number of positions where the engine move and the human move are the same.
High : Number of positions where the engine move is stronger than the human move.
Low  : Number of positions where the engine move is weaker than the human move.
HACD : High Average Centipawn Difference: the average difference between the engine
       move score and the human move score over the High positions. Move scores are
       according to Stockfish dev 2019.04.16.
LACD : Low Average Centipawn Difference: the same average difference over the Low
       positions, where the engine move is weaker than the human move.
HEMSE: Human-Engine mean squared error, Sum((HumanScore - EngineScore)^2)/Total;
       smaller is better.
Stockfish 10 and Arminius 2017-01-01 are added to tune the TOPSIS weights. These 2 engines
are well above the UCI_Elo 1500 level and should rank last in TOPSIS.
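The legend's per-engine statistics can be sketched as follows, assuming each test position is stored as a hypothetical `Position` record holding whether the engine matched the human move plus the analyzer's scores for both moves (Stockfish dev 2019.04.16 in this test). How two different moves with equal scores were counted is not stated; this sketch counts them as Low, which keeps Match + High + Low equal to Total as in the table.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """One test position (hypothetical record layout)."""
    same_move: bool       # engine played the same move as the human
    human_score: float    # analyzer score of the human move, centipawns
    engine_score: float   # analyzer score of the engine move, centipawns


def move_stats(positions):
    """Compute the legend's Total/Match/High/Low/HACD/LACD/HEMSE for one engine."""
    total = len(positions)
    match = sum(1 for p in positions if p.same_move)
    # Score gaps where the engine move is stronger (High) or weaker (Low).
    # Assumption: different moves with equal scores are counted as Low.
    high = [p.engine_score - p.human_score for p in positions
            if not p.same_move and p.engine_score > p.human_score]
    low = [p.human_score - p.engine_score for p in positions
           if not p.same_move and p.engine_score <= p.human_score]
    return {
        "Total": total,
        "Match": match,
        "High": len(high),
        "Low": len(low),
        "HACD": sum(high) / len(high) if high else 0.0,
        "LACD": sum(low) / len(low) if low else 0.0,
        # Sum((HumanScore - EngineScore)^2) / Total, over all positions.
        "HEMSE": sum((p.human_score - p.engine_score) ** 2 for p in positions) / total,
    }
```

Because HEMSE squares the gaps, a few positions with very large score differences can dominate it.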


TOPSIS:
3 criteria are applied, each with its weight and direction:
Match, w=0.1, maximize
LACD, w=0.4, maximize
HEMSE, w=0.5, minimize


TOPSIS (mnorm=vector, wnorm=sum) - Solution:
             ALT./CRIT.                Match (max) W.0.1    LACD (max) W.0.4    HEMSE (min) W.0.5    Rank
------------------------------------  -------------------  ------------------  -------------------  ------
      Ufim v8.02 UCI_Elo 1500                2168                 361              4.73706e+06        5
      Arasan 21.3 UCI_Elo 1500               1641                 452              6.70896e+06        12
      CT800 V1.34 UCI_Elo 1500               1642                 807              1.0656e+07         13
 DanaSah 7.9 human_opp UCI_Elo 1500          1679                 396              6.04194e+06        14
      Cheng 4.39 UCI_Elo 1500                2068                 300              5.01644e+06        17
    Discocheck 5.2 UCI_Elo 1500              1948                 443              5.5055e+06         3
      Amyan 1.72 UCI_Elo 1500                1803                 561              7.64869e+06        6
     MadChess 2.2 UCI_Elo 1500               1693                 552              7.68024e+06        8
      Cheese 2.1 UCI_Elo 1500                2123                 210              3.54674e+06        15
    Rhetoric 1.4.3 UCI_Elo 1500              1853                 461              5.71909e+06        2
            Stockfish 10                     2244                  46              3.23867e+06        19
        Arminius 2017-01-01                  2216                  68              3.23647e+06        18
     Rybka v2.3.2a UCI_Elo 1500              2128                 269              4.12832e+06        10
     Rodent IV 021 UCI_Elo 1500              2013                 537              6.60509e+06        1
 DanaSah 7.9 engine_opp ucielo 1500          1730                 390              5.67368e+06        9
   Stockfish 260819 UCI_Elo 1500             1337                 436              6.48453e+06        11
       Houdini 3 UCI_Elo 1500                1722                 236              4.12179e+06        16
       Hiarcs 14 UCI_Elo 1500                1795                 695              8.62719e+06        4
Deuterium v2019.2.37.71 UCI_Elo 1500         1896                 311              4.51524e+06        7
The new number 1, the engine that plays most like a human at around 1500 Elo, is Rodent. In the Rank column, lower means more human-like.
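To make the ranking reproducible, here is a minimal TOPSIS sketch in plain Python. It assumes the textbook formulation that `mnorm=vector, wnorm=sum` suggests: each column is divided by its Euclidean norm, weights are scaled to sum to 1, and alternatives are ranked by relative closeness to the ideal point. It is not scikit-criteria's code. The data are the Match, LACD and HEMSE columns above; with weights 0.1/0.4/0.5 it should put Rodent first and Stockfish 10 and Arminius last, matching both ends of the posted Rank column. The helper `weights_ranking_references_last` (a hypothetical name) grid-searches weights in steps of 0.1 for the tuning idea noted in the legend: weight choices that rank the two full-strength engines last.

```python
import math

# (engine, Match, LACD, HEMSE) copied from the test result table above.
DATA = [
    ("Ufim v8.02", 2168, 361, 4737056),
    ("Arasan 21.3", 1641, 452, 6708957),
    ("CT800 V1.34", 1642, 807, 10656012),
    ("DanaSah 7.9 human_opp", 1679, 396, 6041935),
    ("Cheng 4.39", 2068, 300, 5016436),
    ("Discocheck 5.2", 1948, 443, 5505496),
    ("Amyan 1.72", 1803, 561, 7648687),
    ("MadChess 2.2", 1693, 552, 7680236),
    ("Cheese 2.1", 2123, 210, 3546737),
    ("Rhetoric 1.4.3", 1853, 461, 5719087),
    ("Stockfish 10", 2244, 46, 3238666),
    ("Arminius 2017-01-01", 2216, 68, 3236474),
    ("Rybka v2.3.2a", 2128, 269, 4128324),
    ("Rodent IV 021", 2013, 537, 6605086),
    ("DanaSah 7.9 engine_opp", 1730, 390, 5673680),
    ("Stockfish 260819", 1337, 436, 6484532),
    ("Houdini 3", 1722, 236, 4121787),
    ("Hiarcs 14", 1795, 695, 8627186),
    ("Deuterium v2019.2.37.71", 1896, 311, 4515235),
]
# Direction of each criterion: Match max, LACD max, HEMSE min.
MAXIMIZE = (True, True, False)


def topsis_ranks(rows, weights, maximize=MAXIMIZE):
    """Rank alternatives with TOPSIS (vector normalization, weights scaled to sum 1).

    rows: (name, criterion_1, criterion_2, ...) tuples.
    Returns {name: rank}; rank 1 is the alternative closest to the ideal point.
    """
    k = len(weights)
    total = sum(weights)
    w = [x / total for x in weights]
    # Vector normalization: divide each column by its Euclidean norm, then weight it.
    norms = [math.sqrt(sum(r[j + 1] ** 2 for r in rows)) for j in range(k)]
    v = [[w[j] * r[j + 1] / norms[j] for j in range(k)] for r in rows]
    cols = list(zip(*v))
    # Ideal point takes the best value per criterion, anti-ideal the worst.
    ideal = [max(c) if m else min(c) for c, m in zip(cols, maximize)]
    worst = [min(c) if m else max(c) for c, m in zip(cols, maximize)]
    closeness = {}
    for r, point in zip(rows, v):
        d_best = math.dist(point, ideal)
        d_worst = math.dist(point, worst)
        # Relative closeness in [0, 1]; higher is better.
        closeness[r[0]] = d_worst / (d_best + d_worst)
    order = sorted(closeness, key=closeness.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(order)}


def weights_ranking_references_last(rows, refs=("Stockfish 10", "Arminius 2017-01-01")):
    """Return (Match, LACD, HEMSE) weights in tenths that rank `refs` in the last places."""
    last_places = set(range(len(rows) - len(refs) + 1, len(rows) + 1))
    found = []
    for wm in range(1, 9):
        for wl in range(1, 10 - wm):
            wh = 10 - wm - wl  # every weight is at least 0.1 and they sum to 1.0
            ranks = topsis_ranks(rows, (wm, wl, wh))
            if {ranks[name] for name in refs} == last_places:
                found.append((wm, wl, wh))
    return found


if __name__ == "__main__":
    ranks = topsis_ranks(DATA, (0.1, 0.4, 0.5))
    for name, rank in sorted(ranks.items(), key=lambda kv: kv[1]):
        print(f"{rank:>2}  {name}")
```

Several engines in the middle of the table have closeness values within a few thousandths of each other, so their relative order is sensitive to small changes in the data or weights.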

TOPSIS ref:
https://en.wikipedia.org/wiki/TOPSIS
https://scikit-criteria.readthedocs.io/ ... start.html
Ferdy

Re: UCI_Elo

Post by Ferdy »

Robert Pope wrote: Thu Aug 08, 2019 3:42 pm A little bit of a tangent: if you get an engine to frequently match the moves of a 1500 player, is there any reason to believe the engine would also be 1500?
There is reason to believe that it is close to 1500.
Robert Pope wrote: Thu Aug 08, 2019 3:42 pmIf you match 95% of moves, what are in the 5%? Probably all the dumb blunders. So I would think you would end up with something that picks 1500-level quality moves, but never blunders, and it would actually be maybe 200 elo stronger in practice. Or am I missing something?
There is a possibility.

I have been experimenting with TOPSIS criteria and weights. I added normal (full-strength) Stockfish and Arminius, and both engines top the Match criterion. But of course these engines are beyond 1500, so I selected TOPSIS criteria and weights that rank these two engines last. The same criteria and weights are applied to all engines.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: UCI_Elo

Post by lkaufman »

Ferdy wrote: Sat Jul 06, 2019 8:38 pm
Patrice Duhamel wrote: Sat Jul 06, 2019 12:29 pm Any advice to tune engine parameters to correspond to the right UCI_Elo value ?

In Cheese I have 2 parameters, one to make very small pauses every N nodes,
and another one to add randomness to the evaluation under 2000 ELO.
If we cannot find a reference engine and corresponding UCI_Elo value, we might as well use CCRL as the reference. CCRL is better than other rating lists in this regard because it has tested many weaker programs. Perhaps we can use the single-processor CCRL 40/4 list as a base for UCI_Elo.

If the goal is to make versions that play at the level a human with an arbitrary FIDE rating (2300, for example) would play, then of course it is critical to specify what time limit the human is presumed to be playing at. I have done a fair amount of work on the question of how to estimate equivalent human FIDE ratings from engine rating lists. All of the well-known rating lists underrate the engines in general relative to FIDE ratings, with the amount of underrating increasing at faster time limits. This assumes the engines are running on whatever the list specifies as its standard or reference hardware. I agree that the CCRL list is the closest to human FIDE levels of the major lists, in part because its span is artificially contracted by the use of BayesElo, which largely offsets the tendency of engine-vs-engine rating differences to be larger than if the engines were rated against humans.

If the goal is to predict the rating an engine would earn against humans at 40/2 hours running on the reference hardware, then I would use the CCRL 40/40 list and add perhaps 250 Elo. If the goal is to estimate ratings at 40/40 minutes, I would add at least 300. For blitz ratings against humans, use the 40/4 list and add maybe 500 Elo or so. These are just ballpark numbers based on the limited GM-vs-engine data I have; I hope this is of some help to your work.
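These ballpark adjustments amount to a small lookup. A minimal sketch, assuming a plain table: the keys are hypothetical labels for the three human time controls mentioned, and the offsets are the rough figures from the post ("perhaps 250", "at least 300", "maybe 500 or so"), not fitted values.

```python
# Hypothetical labels -> (rating list to read, rough Elo to add), per the post above.
OFFSETS = {
    "40/2h": ("CCRL 40/40", 250),     # "add perhaps 250"
    "40/40min": ("CCRL 40/40", 300),  # "add at least 300"
    "blitz": ("CCRL 40/4", 500),      # "add maybe 500 elo or so"
}


def estimate_human_rating(list_rating, human_tc):
    """Return (list to use, rough FIDE-equivalent rating against humans)."""
    rating_list, offset = OFFSETS[human_tc]
    return rating_list, list_rating + offset


if __name__ == "__main__":
    for tc in OFFSETS:
        print(tc, estimate_human_rating(2000, tc))
```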
Komodo rules!
Ferdy

Re: UCI_Elo

Post by Ferdy »

lkaufman wrote: Sun Sep 15, 2019 8:51 pm
If the goal is to make versions that play at the level a human with an arbitrary FIDE rating (2300, for example) would play, then of course it is critical to specify what time limit the human is presumed to be playing at. I have done a fair amount of work on the question of how to estimate equivalent human FIDE ratings from engine rating lists. All of the well-known rating lists underrate the engines in general relative to FIDE ratings, with the amount of underrating increasing at faster time limits. This assumes the engines are running on whatever the list specifies as its standard or reference hardware. I agree that the CCRL list is the closest to human FIDE levels of the major lists, in part because its span is artificially contracted by the use of BayesElo, which largely offsets the tendency of engine-vs-engine rating differences to be larger than if the engines were rated against humans. If the goal is to predict the rating an engine would earn against humans at 40/2 hours running on the reference hardware, then I would use the CCRL 40/40 list and add perhaps 250 Elo. If the goal is to estimate ratings at 40/40 minutes, I would add at least 300. For blitz ratings against humans, use the 40/4 list and add maybe 500 Elo or so. These are just ballpark numbers based on the limited GM-vs-engine data I have; I hope this is of some help to your work.
I am primarily interested in FIDE Elo 2000 and below, with the human playing at standard time control. I have started investigating which engines with the UCI_Elo feature, set to UCI_Elo 1500, can play as close as possible to a human rated around 1500. So far I have tested these engines at 1s per position on 5k positions to get some stats.
lkaufman

Re: UCI_Elo

Post by lkaufman »

Ferdy wrote: Sun Sep 15, 2019 11:13 pm
I am primarily interested in FIDE Elo 2000 and below, with the human playing at standard time control. I have started investigating which engines with the UCI_Elo feature, set to UCI_Elo 1500, can play as close as possible to a human rated around 1500. So far I have tested these engines at 1s per position on 5k positions to get some stats.
My best guess is that an engine with a CCRL 40/40 rating of 1500, running on CCRL's reference hardware, would be an even match at 40/2 hours for a human with a FIDE rating of about 1900.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: UCI_Elo

Post by MikeB »

Ferdy wrote: Sun Sep 15, 2019 5:37 pm Ran a new round robin (RR) with engines set to UCI_Elo 1500 at TC 60s+100ms. The new engines are Rybka, Rodent, DanaSah with the engine-opponent setting, a new version of Stockfish, and the new version of my engine.

Honey was updated in its last release to use the correct UCI options/parameters; please feel free to test it if interested. I would be curious to see how it does against your engines at Elo 1500.
pedrox
Posts: 1056
Joined: Fri Mar 10, 2006 6:07 am
Location: Basque Country (Spain)

Re: UCI_Elo

Post by pedrox »

lkaufman wrote: Mon Sep 16, 2019 12:05 am
My best guess is that an engine with a CCRL 40/40 rating of 1500, running on CCRL's reference hardware, would be an even match at 40/2 hours for a human with a FIDE rating of about 1900.
This would be consistent with Kai's formula:

Elo FIDE = (0.7 x Elo CCRL) + 840 = (0.7 x 1500) + 840 = 1890

But I find that this formula does not work well when the FIDE Elo drops below about 1400.

For example, for a very low FIDE Elo such as 1000:

Elo CCRL = (Elo FIDE - 840)/0.7 = (1000 - 840)/0.7 ≈ 229

That is the rating of an engine playing random moves (CCRL 40/4), and I believe a player with a FIDE Elo of 1000 would beat such an engine easily. For this range I find the formulas Elo FIDE = (0.8 x CCRL) + 560 or Elo FIDE = (0.85 x CCRL) + 420 somewhat better.
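All three formulas are the same linear map, Elo FIDE = slope x Elo CCRL + offset, so one pair of functions covers them; the function names are hypothetical and Kai's coefficients are the defaults. A side effect worth noting: all three lines give FIDE 2800 at CCRL 2800 (0.7 x 2800 + 840 = 0.8 x 2800 + 560 = 0.85 x 2800 + 420 = 2800), so the alternatives leave the top end unchanged and lower the estimates below CCRL 2800, more so the weaker the engine.

```python
def fide_from_ccrl(ccrl, slope=0.7, offset=840.0):
    """Linear CCRL -> FIDE estimate; the defaults are Kai's formula."""
    return slope * ccrl + offset


def ccrl_from_fide(fide, slope=0.7, offset=840.0):
    """Inverse of fide_from_ccrl for the same slope and offset."""
    return (fide - offset) / slope


# Kai's formula and the two alternatives suggested for low ratings.
FORMULAS = {"Kai": (0.7, 840.0), "alt 0.8": (0.8, 560.0), "alt 0.85": (0.85, 420.0)}

if __name__ == "__main__":
    print(round(fide_from_ccrl(1500)))   # 1890
    print(round(ccrl_from_fide(1000)))   # 229
    for name, (slope, offset) in FORMULAS.items():
        # FIDE estimate for a random-mover rated about 229 on CCRL 40/4.
        print(name, round(fide_from_ccrl(229, slope, offset)))
```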
pedrox

Re: UCI_Elo

Post by pedrox »

Ferdy wrote: Sun Sep 15, 2019 5:37 pm Ran a new round robin (RR) with engines set to UCI_Elo 1500 at TC 60s+100ms. The new engines are Rybka, Rodent, DanaSah with the engine-opponent setting, a new version of Stockfish, and the new version of my engine.
Thanks for testing DanaSah.

I have a new beta version with limited strength; I don't think I have published it here yet. If you run any other tests, you can use this version.

http://www.mediafire.com/file/jmc6cwjcf ... 2.zip/file

Now, by default, when the UCI_Elo option is selected, the engine plays at the FIDE Elo level as suggested in this thread (the other options remain in the configuration). I have also slightly modified time management for games with increment: it seemed to be playing weaker than it should, so with increment it may now play somewhat stronger, although the strength regulation was calibrated for a time control more like CCRL 40/4.

For the calibration, I played the engine in that mode against CCRL-rated engines; the regulation seems to work between 500 and 2500.


Site/ Country: DESKTOP-AMD, Spain
Level: Tournament 40/3
Hardware: AMD FX(tm)-6100 Six-Core Processor with 8,0 GB Memory
-----------------------------------------------------------------

   Engine         Score                    Da
1: DanaSah 7.9 LS 10,5/20 ···················· 
2: Beowulf 2.4a   9,5/20  =1=001110001110=1000

Beowulf 2.4a (ELO CCRL 2202) - DanaSah 7.9 LS (ELO CCRL 2202) :
9,5/20 8-9-3 (=1=001110001110=1000)  48%   -14

20 games played / Tournament is finished
Name of the tournament: Arena Tournament 1482

-----------------------------------------------------------------

   Engine                 Score                    Da
1: DanaSah 7.9 LS         21,0/40 ···················· 
2: RataAeroespacial 0.2.1 10,5/20 001=11010110101010== 
3: CDrill 1800            8,5/20  1=011101100010001000

CDrill 1800 (ELO CCRL 1786) - DanaSah 7.9 LS (ELO CCRL 1800) :
8,5/20 8-11-1 (1=011101100010001000)  43%   -49

RataAeroespacial 0.2.1 (ELO CCRL 1819) - DanaSah 7.9 LS (ELO CCRL 1800) :
10,5/20 9-8-3 (001=11010110101010==)  53%   +21

40 games played / Tournament is finished
Name of the tournament: Arena Tournament 1479

-----------------------------------------------------------------

   Engine         Score                    Da
1: DanaSah 7.9 LS 21,5/40 ···················· 
2: Goyaz 0.007    11,5/20 =10001=0011110111=01 
3: Minimardi 1.3  7,0/20  =0=00101=0000==0110=

Goyaz 0.007 (ELO CCRL 1409) - DanaSah 7.9 LS (ELO CCRL 1409) :
11,5/20 10-7-3 (=10001=0011110111=01)  58%   +56

Minimardi 1.3 (ELO CCRL 1409) - DanaSah 7.9 LS (ELO CCRL 1409) :
7,0/20 4-10-6 (=0=00101=0000==0110=)  35%  -108 

40 games played / Tournament is finished
Name of the tournament: Arena Tournament 1478

-----------------------------------------------------------------

   Engine              Score                    Da
1: DanaSah 7.9 LS      10,0/20 ···················· 
1: feeks 2018-01-31 LM 10,0/20 =10==0=10=110001101=

feeks 2018-01-31 LM (ELO CCRL 971) - DanaSah 7.9 LS (ELO CCRL 971) : 
10,0/20 7-7-6 (=10==0=10=110001101=)  50%    ±0 

20 games played / Tournament is finished
Name of the tournament: Arena Tournament 1475

-----------------------------------------------------------------

   Engine                Score                    Da
1: DanaSah 7.9 LS        19,5/40 ···················· 
2: EasyPeasy 1.0         10,5/20 1=00==1=1=0111100100 
3: Alouette 0.0.5 64-bit 10,0/20 0===0=0===1==1===1==

Alouette 0.0.5 64-bit (ELO CCRL 700) - DanaSah 7.9 LS (ELO CCRL 690): 
10,0/20 3-3-14 (0===0=0===1==1===1==)  50%    ±0
EasyPeasy 1.0 (ELO CCRL 683) - DanaSah 7.9 LS (ELO CCRL 690) : 
10,5/20 8-7-5 (1=00==1=1=0111100100)  53%   +21 

40 games played / Tournament is finished
Name of the tournament: Arena Tournament 1483

-----------------------------------------------------------------

   Engine         Score                    Da
1: DanaSah 7.9 LS 25,0/40 ···················· 
2: Acqua 20160918 10,5/20 =0011===1==0=0===1=1 
3: Ram 2.0        4,5/20  00001010000000=00110

Acqua 20160918 (ELO CCRL 508) - DanaSah 7.9 LS (ELO CCRL 510) : 
10,5/20 5-4-11 (=0011===1==0=0===1=1)  53%   +21
Ram 2.0 (ELO CCRL 511) - DanaSah 7.9 LS (ELO CCRL 510) : 
4,5/20 4-15-1 (00001010000000=00110)  23%  -210 

40 games played / Tournament is finished
Name of the tournament: Arena Tournament 1487

-----------------------------------------------------------------
Then, to convert to FIDE Elo, I use Kai's formula. In my opinion it fails for values below 1400 FIDE, so perhaps I will try to adjust that in an upcoming version.
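The W-L-D counts and the Da column in these Arena tables can be recomputed from the result strings ('1' win, '0' loss, '=' draw, from the listed opponent's side). Judging from the numbers, Da appears to be 400 x log10(p/(100 - p)) with the score percentage p first rounded half up to a whole percent (9.5/20 = 47.5% is shown as 48% and -14); that is an inference from these tables, not Arena's documented method. The function names are hypothetical.

```python
import math


def parse_results(results):
    """Count wins, losses and draws in an Arena-style result string."""
    wins = results.count("1")
    losses = results.count("0")
    draws = results.count("=")
    if wins + losses + draws != len(results):
        raise ValueError(f"unexpected characters in {results!r}")
    return wins, losses, draws


def rounded_percent(wins, draws, games):
    """Score percentage rounded half up to a whole percent, in integer arithmetic."""
    half_points = 2 * wins + draws
    return (100 * half_points + games) // (2 * games)


def elo_diff(percent):
    """Logistic Elo difference for a score percentage strictly between 0 and 100."""
    if not 0 < percent < 100:
        raise ValueError("Elo difference is unbounded at 0% or 100%")
    return round(400 * math.log10(percent / (100 - percent)))


if __name__ == "__main__":
    w, l, d = parse_results("=1=001110001110=1000")  # Beowulf 2.4a vs DanaSah 7.9 LS
    pct = rounded_percent(w, d, w + l + d)
    print(f"{w}-{l}-{d}", w + d / 2, f"{pct}%", elo_diff(pct))  # 8-9-3 9.5 48% -14
```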
Ferdy

Re: UCI_Elo

Post by Ferdy »

MikeB wrote: Mon Sep 16, 2019 12:35 am Honey , in the last release was updated to use the correct uci options/parameters, please feel free to test if interested. I would be curious to see how it does with your engines using Elo 1500.
Added Honey; it's in the top 6!


UCI_Elo 1500 engine test results on FIDE Elo 1500
Test positions are taken from players with FIDE Elo 1450 to 1550

                               Engine  Total  Match  High   Low  HACD  LACD     HEMSE
              Ufim v8.02 UCI_Elo 1500   5000   2168  1363  1469   422   361   4737056
             Arasan 21.3 UCI_Elo 1500   5000   1641  1167  2192   495   452   6708957
             CT800 V1.34 UCI_Elo 1500   5000   1642  1149  2209   333   807  10656012
              Cheng 4.39 UCI_Elo 1500   5000   2068  1466  1466   474   300   5016436
          Discocheck 5.2 UCI_Elo 1500   5000   1948  1308  1744   381   443   5505496
              Amyan 1.72 UCI_Elo 1500   5000   1803  1218  1979   419   561   7648687
            MadChess 2.2 UCI_Elo 1500   5000   1693  1240  2067   424   552   7680236
              Cheese 2.1 UCI_Elo 1500   5000   2123  1486  1391   404   210   3546737
          Rhetoric 1.4.3 UCI_Elo 1500   5000   1853  1349  1798   357   461   5719087
                         Stockfish 10   5000   2244  2340   416   355    46   3238666
                  Arminius 2017-01-01   5000   2216  1827   957   420    68   3236474
           Rybka v2.3.2a UCI_Elo 1500   5000   2128  1368  1504   428   269   4128324
           Rodent IV 021 UCI_Elo 1500   5000   2013  1270  1717   410   537   6605086
   DanaSah 7.9 engine_opp ucielo 1500   5000   1730  1214  2056   425   390   5673680
        Stockfish 260819 UCI_Elo 1500   5000   1337  1252  2411   421   436   6484532
               Houdini 3 UCI_Elo 1500   5000   1722  1290  1988   446   236   4121787
               Hiarcs 14 UCI_Elo 1500   5000   1795  1107  2098   376   695   8627186
               Honey X5i UCI_Elo 1500   5000   1908  1418  1674   390   372   4968602
 Deuterium v2019.2.37.72 UCI_Elo 1500   5000   1900  1270  1830   440   337   4886933

::Legend::
Total: Number of test positions from human games.
Match: Number of positions where the engine and human moves are the same.
High : Number of positions where the engine move is stronger than the human move.
Low  : Number of positions where the engine move is weaker than the human move.
HACD : High Average Centipawn Difference: the average difference between the
       engine-move score and the human-move score over the High positions,
       with scores from Stockfish dev 2019.04.16.
LACD : Low Average Centipawn Difference: the same average over the Low
       positions, where the engine move is weaker than the human move.
HEMSE: Human-Engine MSE (Mean Squared Error):
       Sum((HumanScore - EngineScore)^2)/Total; smaller is better.
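The legend builds all six statistics from two evaluations per position: the reference engine's score for the human move and its score for the engine move. The sketch below computes them from per-position records. Several parts of it are assumptions:

- The `PositionResult` fields are hypothetical.
- Scores are taken to be centipawns from the reference engine.
- A non-matching move with an equal score is counted as Low. The legend does not say how such ties were handled, only that Match + High + Low equals Total in every row of the table.

```python
from dataclasses import dataclass


@dataclass
class PositionResult:
    human_move: str    # move played in the human game (hypothetical field)
    engine_move: str   # move chosen by the UCI_Elo-limited engine
    human_score: int   # reference-engine score of the human move, in cp
    engine_score: int  # reference-engine score of the engine move, in cp


def summarize(results: list[PositionResult]) -> dict[str, float]:
    """Compute Total, Match, High, Low, HACD, LACD and HEMSE as in the legend."""
    match = 0
    high_diffs: list[int] = []  # engine move stronger than the human move
    low_diffs: list[int] = []   # engine move weaker (or tied, by assumption)
    squared_error = 0
    for r in results:
        diff = r.engine_score - r.human_score
        squared_error += diff * diff
        if r.engine_move == r.human_move:
            match += 1
        elif diff > 0:
            high_diffs.append(diff)
        else:
            low_diffs.append(-diff)
    total = len(results)
    return {
        "Total": total,
        "Match": match,
        "High": len(high_diffs),
        "Low": len(low_diffs),
        "HACD": sum(high_diffs) / len(high_diffs) if high_diffs else 0.0,
        "LACD": sum(low_diffs) / len(low_diffs) if low_diffs else 0.0,
        "HEMSE": squared_error / total if total else 0.0,
    }


# Made-up sample positions, only to show the call
sample = [
    PositionResult("e2e4", "e2e4", 30, 30),
    PositionResult("d2d4", "g1f3", 20, 50),
    PositionResult("c2c4", "b1c3", 40, -60),
]
print(summarize(sample))
```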


TOPSIS ranking: a smaller rank is more human-like
TOPSIS (mnorm=vector, wnorm=sum) - Solution:
             ALT./CRIT.                Match (max) W.0.1    LACD (max) W.0.4    HEMSE (min) W.0.5    Rank
------------------------------------  -------------------  ------------------  -------------------  ------
      Ufim v8.02 UCI_Elo 1500                2168                 361              4.73706e+06        5
      Arasan 21.3 UCI_Elo 1500               1641                 452              6.70896e+06        13
      CT800 V1.34 UCI_Elo 1500               1642                 807              1.0656e+07         14
      Cheng 4.39 UCI_Elo 1500                2068                 300              5.01644e+06        17
    Discocheck 5.2 UCI_Elo 1500              1948                 443              5.5055e+06         3
      Amyan 1.72 UCI_Elo 1500                1803                 561              7.64869e+06        7
     MadChess 2.2 UCI_Elo 1500               1693                 552              7.68024e+06        9
      Cheese 2.1 UCI_Elo 1500                2123                 210              3.54674e+06        15
    Rhetoric 1.4.3 UCI_Elo 1500              1853                 461              5.71909e+06        2
            Stockfish 10                     2244                  46              3.23867e+06        19
        Arminius 2017-01-01                  2216                  68              3.23647e+06        18
     Rybka v2.3.2a UCI_Elo 1500              2128                 269              4.12832e+06        11
     Rodent IV 021 UCI_Elo 1500              2013                 537              6.60509e+06        1
 DanaSah 7.9 engine_opp ucielo 1500          1730                 390              5.67368e+06        10
   Stockfish 260819 UCI_Elo 1500             1337                 436              6.48453e+06        12
       Houdini 3 UCI_Elo 1500                1722                 236              4.12179e+06        16
       Hiarcs 14 UCI_Elo 1500                1795                 695              8.62719e+06        4
       Honey X5i UCI_Elo 1500                1908                 372              4.9686e+06         6
Deuterium v2019.2.37.72 UCI_Elo 1500         1900                 337              4.88693e+06        8
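The table header names the TOPSIS settings:

- each criterion column is vector-normalized (mnorm=vector);
- the weights are rescaled to sum to one (wnorm=sum);
- Match and LACD are maximized, with weights 0.1 and 0.4;
- HEMSE is minimized, with weight 0.5.

Below is a minimal sketch of textbook TOPSIS with those settings, where rank 1 is the alternative with the largest relative closeness to the ideal. It is not the tool that produced the table, so tie handling and rounding may differ. Also, running it on a subset of engines will not reproduce the full 19-engine ranks, because the normalization depends on every row.

```python
import math


def topsis(matrix, weights, benefit):
    """Textbook TOPSIS: returns (closeness, ranks), rank 1 = closest to ideal.

    matrix : rows are alternatives (engines), columns are criteria
    weights: raw criterion weights, rescaled here to sum to 1 (wnorm=sum)
    benefit: True for a criterion to maximize, False for one to minimize
    """
    n_crit = len(weights)
    w_sum = sum(weights)
    w = [wi / w_sum for wi in weights]
    # Vector normalization (mnorm=vector): divide each column by its L2 norm.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[w[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    columns = list(zip(*v))
    ideal = [max(c) if b else min(c) for c, b in zip(columns, benefit)]
    anti_ideal = [min(c) if b else max(c) for c, b in zip(columns, benefit)]
    closeness = []
    for row in v:
        d_pos = math.dist(row, ideal)       # distance to the ideal solution
        d_neg = math.dist(row, anti_ideal)  # distance to the anti-ideal one
        denom = d_pos + d_neg
        closeness.append(d_neg / denom if denom else 0.0)
    order = sorted(range(len(matrix)), key=lambda i: closeness[i], reverse=True)
    ranks = [0] * len(matrix)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return closeness, ranks


# Match (max, 0.1), LACD (max, 0.4), HEMSE (min, 0.5); three rows from the table
engines = ["Rodent IV 021", "Cheese 2.1", "Stockfish 10"]
data = [[2013, 537, 6605086], [2123, 210, 3546737], [2244, 46, 3238666]]
closeness, ranks = topsis(data, [0.1, 0.4, 0.5], [True, True, False])
for name, c, r in zip(engines, closeness, ranks):
    print(f"{r}  {c:.3f}  {name}")
```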
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: UCI_Elo

Post by MikeB »

Ferdy wrote: Wed Sep 18, 2019 3:48 pm
MikeB wrote: Mon Sep 16, 2019 12:35 am Honey, in the last release, was updated to use the correct UCI options/parameters; please feel free to test it if interested. I would be curious to see how it does with your engines at Elo 1500.
Added Honey, it's in the top 6!

I'll take that. Now, what would be interesting is to test at 1500, 1700, 1900 and, say, 2100 to see which engines are closest to humans when you factor in all four levels, akin to an overall Elo championship.