UCI_Elo


Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: UCI_Elo

Post by Ferdy »

Added MadChess to the TOPSIS ranking.

[Image: TOPSIS ranking including MadChess]
Ferdy

Re: UCI_Elo

Post by Ferdy »

Added Stockfish 10 to the 5000 test positions to see how it compares with the UCI_Elo 1500 engines.
The result is a bit surprising: it has the highest Match count. This would mean that humans rated 1500 are capable of making good moves; perhaps some of these are forced moves like captures and check evasions. If this is the case, then a UCI_Elo 1500 engine can be adjusted to make stronger moves. But there is a clear difference in Stockfish 10's performance, and that is its low LACD of 47 cp: when its move is weaker than the human move, it only gives its opponent a small advantage. So to improve the UCI_Elo 1500 approximation, LACD should be increased. Notable engines with high LACD are CT800, Hiarcs, Amyan and MadChess.

Code: Select all

UCI_Elo 1500 engine test results on FIDE Elo 1500
Test positions are taken from players with FIDE Elo 1450 to 1550

Code: Select all


                               Engine  Total  Match  High   Low  HACD  LACD
 Deuterium v2019.2.37.59 UCI_Elo 1500   5000   1891  1360  1749   426   284
              Ufim v8.02 UCI_Elo 1500   5000   2164  1360  1476   447   423
             CT800 V1.34 UCI_Elo 1500   5000   1627  1164  2209   329   914
             Arasan 21.3 UCI_Elo 1500   5000   1634  1178  2188   491   440
       DanaSah 7.9 Human UCI_Elo 1500   5000   1710  1195  2095   434   324
    Stockfish 2019.07.14 UCI_Elo 1500   5000   1304  1231  2465   443   390
              Cheng 4.39 UCI_Elo 1500   5000   2141  1527  1332   427   144
          Discocheck 5.2 UCI_Elo 1500   5000   1947  1308  1745   380   445
               Houdini 3 UCI_Elo 1500   5000   1704  1269  2027   427   174
          Rhetoric 1.4.3 UCI_Elo 1500   5000   1875  1330  1795   360   448
               Hiarcs 14 UCI_Elo 1500   5000   1798  1112  2090   375   685
              Cheese 2.1 UCI_Elo 1500   5000   2138  1532  1330   421   165
              Amyan 1.72 UCI_Elo 1500   5000   1803  1186  2011   460   551
            MadChess 2.2 UCI_Elo 1500   5000   1705  1195  2100   438   526
                         Stockfish 10   5000   2237  2333   430   355    47

Code: Select all

::Legend::
Total: Number of test positions from human games.
Match: Number of positions where the engine move and the human move are the same.
High : Number of positions where the engine move is stronger than the human move.
Low  : Number of positions where the engine move is weaker than the human move.
HACD : High Average Centipawn Difference: average cp difference between the engine move
       and the human move over the High positions, according to Stockfish 2019.04.16.
LACD : Low Average Centipawn Difference: average cp difference between the engine move
       and the human move over the Low positions, according to Stockfish 2019.04.16.
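A minimal Python sketch of how the counts in the legend could be computed, assuming the judge engine has already scored both the human move and the engine move in every position. `PositionResult` and `summarize` are hypothetical names, not the script behind these tables, and how a different move with an identical score was counted is not stated (here it falls in neither High nor Low).

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class PositionResult:
    """One test position, scored by the judge engine (hypothetical record)."""
    human_move: str
    engine_move: str
    human_score: int   # judge score of the human move, cp, side-to-move view
    engine_score: int  # judge score of the engine move, cp, side-to-move view


def summarize(results: Sequence[PositionResult]) -> Dict[str, int]:
    """Compute Total, Match, High, Low, HACD and LACD as defined in the legend."""
    match = high = low = 0
    high_sum = low_sum = 0
    for r in results:
        if r.engine_move == r.human_move:
            match += 1
        elif r.engine_score > r.human_score:
            # Engine move is stronger than the human move.
            high += 1
            high_sum += r.engine_score - r.human_score
        elif r.engine_score < r.human_score:
            # Engine move is weaker than the human move.
            low += 1
            low_sum += r.human_score - r.engine_score
        # Assumption: a different move with an identical score is not counted.
    return {
        "Total": len(results),
        "Match": match,
        "High": high,
        "Low": low,
        "HACD": round(high_sum / high) if high else 0,
        "LACD": round(low_sum / low) if low else 0,
    }


if __name__ == "__main__":
    sample = [
        PositionResult("e4", "e4", 20, 20),
        PositionResult("d4", "e4", 10, 60),
        PositionResult("a3", "e4", -10, 90),
        PositionResult("Nf3", "h4", 30, -120),
    ]
    print(summarize(sample))
```

In this framing, "increasing LACD" means the weakened engine's inferior moves should lose more centipawns on average, not just happen more often.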

Now I have an idea for the TOPSIS weights: modify the criteria weights such that Stockfish 10 ranks badly.
Here is a sample weighting for the criteria [Match, High, Low, HACD, LACD]: weight=[0.5, 0.05, 0.05, 0.1, 0.3].
With these weights Stockfish 10 is ranked 14/15.

Code: Select all

TOPSIS (mnorm=vector, wnorm=sum) - Solution:
             ALT./CRIT.                Match (max) W.0.5    High (min) W.0.05    Low (max) W.0.05    HACD (min) W.0.1    LACD (max) W.0.3    Rank
------------------------------------  -------------------  -------------------  ------------------  ------------------  ------------------  ------
Deuterium v2019.2.37.59 UCI_Elo 1500         1891                 1360                 1749                426                 284            11
      Ufim v8.02 UCI_Elo 1500                2164                 1360                 1476                447                 423            5
      CT800 V1.34 UCI_Elo 1500               1627                 1164                 2209                329                 914            1
      Arasan 21.3 UCI_Elo 1500               1634                 1178                 2188                491                 440            8
   DanaSah 7.9 Human UCI_Elo 1500            1710                 1195                 2095                434                 324            10
 Stockfish 2019.07.14 UCI_Elo 1500           1304                 1231                 2465                443                 390            9
      Cheng 4.39 UCI_Elo 1500                2141                 1527                 1332                427                 144            13
    Discocheck 5.2 UCI_Elo 1500              1947                 1308                 1745                380                 445            6
       Houdini 3 UCI_Elo 1500                1704                 1269                 2027                427                 174            15
    Rhetoric 1.4.3 UCI_Elo 1500              1875                 1330                 1795                360                 448            7
       Hiarcs 14 UCI_Elo 1500                1798                 1112                 2090                375                 685            2
      Cheese 2.1 UCI_Elo 1500                2138                 1532                 1330                421                 165            12
      Amyan 1.72 UCI_Elo 1500                1803                 1186                 2011                460                 551            3
     MadChess 2.2 UCI_Elo 1500               1705                 1195                 2100                438                 526            4
            Stockfish 10                     2237                 2333                 430                 355                  47            14
Ferdy

Re: UCI_Elo

Post by Ferdy »

Added a new criterion to compare the effect of the move of a human rated 1500 with that of an engine set to UCI_Elo 1500. It is called HEMSE.
HEMSE: Human and Engine MSE (Mean Square Error), i.e. Sum((HumanScore - EngineScore)^2)/total; smaller is better. If the score of the human move is close to the score of the engine move, then the human and the engine are similar. When both play the same move the error is zero because their scores are the same. So even if in a given position the human move is not the same as the engine move, if their scores are close according to the judge, Stockfish dev 2019.04.16, then there is a high probability that their strength is close.
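The HEMSE formula as a small Python function, a sketch assuming both lists hold the judge's cp score of the move actually played in each position, in the same order. The name `hemse` is illustrative, and how mate scores were converted to centipawns before squaring is not stated in the thread.

```python
from typing import Sequence


def hemse(human_scores: Sequence[float], engine_scores: Sequence[float]) -> float:
    """HEMSE = Sum((HumanScore - EngineScore)^2) / total; smaller is better.

    Positions where human and engine played the same move contribute 0,
    because the judge gives both moves the identical score.
    """
    if len(human_scores) != len(engine_scores):
        raise ValueError("need one human score and one engine score per position")
    if not human_scores:
        raise ValueError("need at least one position")
    total = len(human_scores)
    return sum((h - e) ** 2 for h, e in zip(human_scores, engine_scores)) / total


# Three positions: same move, engine 50 cp better, engine 150 cp worse.
print(hemse([20, 10, 30], [20, 60, -120]))
```

Because the error is squared, a few large differences dominate the total, which is one reason HEMSE and LACD can rank engines quite differently.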

Added Stockfish 10 (a CCRL 3000+ engine) and Arminius (a CCRL 2600+ engine) to the test to observe their MSE and to help determine appropriate weights for TOPSIS. These 2 engines should be ranked lowest when the weights are applied in TOPSIS.

Code: Select all

UCI_Elo 1500 engine test results on FIDE Elo 1500
Test positions are taken from players with FIDE Elo 1450 to 1550

                               Engine  Total  Match  High   Low  HACD  LACD     HEMSE
 Deuterium v2019.2.37.61 UCI_Elo 1500   5000   1886  1370  1744   424   282   4208112
              Ufim v8.02 UCI_Elo 1500   5000   2168  1363  1469   422   361   4737056
             Arasan 21.3 UCI_Elo 1500   5000   1641  1167  2192   495   452   6708957
             CT800 V1.34 UCI_Elo 1500   5000   1642  1149  2209   333   807  10656012
       DanaSah 7.9 Human UCI_Elo 1500   5000   1679  1202  2119   461   396   6041935
    Stockfish 2019.07.14 UCI_Elo 1500   5000   1335  1245  2420   421   428   6462853
              Cheng 4.39 UCI_Elo 1500   5000   2068  1466  1466   474   300   5016436
          Discocheck 5.2 UCI_Elo 1500   5000   1948  1308  1744   381   443   5505496
              Amyan 1.72 UCI_Elo 1500   5000   1803  1218  1979   419   561   7648687
            MadChess 2.2 UCI_Elo 1500   5000   1693  1240  2067   424   552   7680236
              Cheese 2.1 UCI_Elo 1500   5000   2123  1486  1391   404   210   3546737
          Rhetoric 1.4.3 UCI_Elo 1500   5000   1853  1349  1798   357   461   5719087
                         Stockfish 10   5000   2244  2340   416   355    46   3238666
                  Arminius 2017-01-01   5000   2216  1827   957   420    68   3236474
 

Code: Select all

::Legend::
Total: Number of test positions from human games.
Match: Number of positions where the engine move and the human move are the same.
High : Number of positions where the engine move is stronger than the human move.
Low  : Number of positions where the engine move is weaker than the human move.
HACD : High Average Centipawn Difference: average cp difference between the engine
       move score and the human move score over the High positions, according to
       Stockfish dev 2019.04.16.
LACD : Low Average Centipawn Difference: average cp difference between the engine
       move score and the human move score over the Low positions, according to
       Stockfish dev 2019.04.16.
HEMSE: Human and Engine MSE (Mean Square Error), Sum((HumanScore - EngineScore)^2)/total;
       smaller is better.

Tried running TOPSIS with the weights [0.1, 0.4, 0.5] for the criteria [Match, LACD, HEMSE] respectively. With these weights Stockfish 10 and Arminius are ranked lowest.

Code: Select all

TOPSIS (mnorm=vector, wnorm=sum) - Solution:
             ALT./CRIT.                Match (max) W.0.1    LACD (max) W.0.4    HEMSE (min) W.0.5    Rank
------------------------------------  -------------------  ------------------  -------------------  ------
Deuterium v2019.2.37.61 UCI_Elo 1500         1886                 282              4.20811e+06        7
      Ufim v8.02 UCI_Elo 1500                2168                 361              4.73706e+06        3
      Arasan 21.3 UCI_Elo 1500               1641                 452              6.70896e+06        8
      CT800 V1.34 UCI_Elo 1500               1642                 807              1.0656e+07         6
   DanaSah 7.9 Human UCI_Elo 1500            1679                 396              6.04194e+06        9
 Stockfish 2019.07.14 UCI_Elo 1500           1335                 428              6.46285e+06        10
      Cheng 4.39 UCI_Elo 1500                2068                 300              5.01644e+06        12
    Discocheck 5.2 UCI_Elo 1500              1948                 443              5.5055e+06         2
      Amyan 1.72 UCI_Elo 1500                1803                 561              7.64869e+06        4
     MadChess 2.2 UCI_Elo 1500               1693                 552              7.68024e+06        5
      Cheese 2.1 UCI_Elo 1500                2123                 210              3.54674e+06        11
    Rhetoric 1.4.3 UCI_Elo 1500              1853                 461              5.71909e+06        1
            Stockfish 10                     2244                  46              3.23867e+06        14
        Arminius 2017-01-01                  2216                  68              3.23647e+06        13
The top 3 engines closest to the human player with Elo 1500 are:
Rhetoric, Discocheck and Ufim.
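For anyone wanting to reproduce the ranking, here is a plain-Python TOPSIS sketch using vector normalization of the decision matrix and sum-normalized weights (the `mnorm=vector, wnorm=sum` settings in the header). It is not the tool that produced the tables; the data are the Match, LACD and HEMSE columns above (engine names shortened), with Match and LACD maximized and HEMSE minimized. Closeness is d_worst / (d_best + d_worst), and higher closeness means a better rank.

```python
import math
from typing import List, Sequence


def topsis(matrix: Sequence[Sequence[float]], weights: Sequence[float],
           benefit: Sequence[bool]) -> List[float]:
    """Return the TOPSIS closeness of each row (higher is better)."""
    n = len(weights)
    w = [x / sum(weights) for x in weights]  # wnorm=sum
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    v = [[w[j] * row[j] / norms[j] for j in range(n)] for row in matrix]  # mnorm=vector
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    closeness = []
    for row in v:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        total = d_best + d_worst
        closeness.append(d_worst / total if total else 0.0)
    return closeness


def ranks(scores: Sequence[float]) -> List[int]:
    """Rank 1 = highest closeness."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


# (Match, LACD, HEMSE) per engine, copied from the results table.
ENGINES = {
    "Deuterium v2019.2.37.61": (1886, 282, 4208112),
    "Ufim v8.02": (2168, 361, 4737056),
    "Arasan 21.3": (1641, 452, 6708957),
    "CT800 V1.34": (1642, 807, 10656012),
    "DanaSah 7.9 Human": (1679, 396, 6041935),
    "Stockfish 2019.07.14": (1335, 428, 6462853),
    "Cheng 4.39": (2068, 300, 5016436),
    "Discocheck 5.2": (1948, 443, 5505496),
    "Amyan 1.72": (1803, 561, 7648687),
    "MadChess 2.2": (1693, 552, 7680236),
    "Cheese 2.1": (2123, 210, 3546737),
    "Rhetoric 1.4.3": (1853, 461, 5719087),
    "Stockfish 10": (2244, 46, 3238666),
    "Arminius 2017-01-01": (2216, 68, 3236474),
}

names = list(ENGINES)
closeness = topsis([ENGINES[n] for n in names], weights=[0.1, 0.4, 0.5],
                   benefit=[True, True, False])
rank_of = dict(zip(names, ranks(closeness)))
for name in sorted(names, key=rank_of.get):
    print(f"{rank_of[name]:2d}  {closeness[names.index(name)]:.4f}  {name}")
```

Changing `weights` (and the columns passed in) is a quick way to try the earlier idea of picking weights that push the strong reference engines to the bottom.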
Ferdy

Re: UCI_Elo

Post by Ferdy »

Ferdy wrote: Fri Jul 26, 2019 1:02 am UCI_Elo 1500 engine test results on FIDE Elo 1500
Test positions are taken from players with FIDE Elo 1450 to 1550
Test set (around 60k positions) from human players with FIDE Elo 1450 to 1550.
https://drive.google.com/file/d/1iaA-SA ... sp=sharing
Ferdy

Re: UCI_Elo

Post by Ferdy »

Ferdy wrote: Fri Jul 26, 2019 3:33 am
Test set (around 60k positions) from human players with FIDE Elo 1450 to 1550.
https://drive.google.com/file/d/1iaA-SA ... sp=sharing
Extracted some info on blunder amounts and the opponent's material value from that test set.
From the first plot (top left), humans blunder 50 to 150 cp, or around 1 pawn, from playable positions [+/-50 cp]. These blunders occur more often when the opponent's material is still high. Generally, a higher cp loss is observed as one's own position gets worse while the opponent's material is still high. This info can be used to simulate when a UCI_Elo 1500 engine should blunder and by how much.
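One way this observation could be turned into engine behaviour, as a sketch only: every constant and name below is hypothetical, not taken from the plots or from any engine. The blunder chance scales with the opponent's remaining material, the cp loss is 50 to 150 when the engine's own eval is at least -50 (applying the same window to clearly better positions is an assumption), and the allowed loss widens as the own eval drops. The scored moves are assumed to come from a normal search.

```python
import random
from typing import Sequence, Tuple

# All constants below are hypothetical placeholders, not values from the plots.
FULL_MATERIAL = 39       # non-king material in pawns: 8*1 + 2*3 + 2*3 + 2*5 + 9
MAX_BLUNDER_PROB = 0.2   # blunder chance while the opponent has full material
PLAYABLE_CP = 50         # "playable" position: own eval within about +/-50 cp


def blunder_probability(opp_material: int) -> float:
    """More blunders while the opponent still has a lot of material."""
    return MAX_BLUNDER_PROB * min(opp_material, FULL_MATERIAL) / FULL_MATERIAL


def blunder_window(own_eval: int) -> Tuple[int, int]:
    """Allowed cp loss: 50..150 from playable positions, wider as own eval drops."""
    if own_eval >= -PLAYABLE_CP:
        return 50, 150
    return 50, 150 + (-own_eval - PLAYABLE_CP)


def choose_move(scored_moves: Sequence[Tuple[str, int]], opp_material: int,
                own_eval: int, rng: random.Random) -> str:
    """Pick the best move, or occasionally one that loses a 'human' amount."""
    best_move, best_score = max(scored_moves, key=lambda ms: ms[1])
    if rng.random() >= blunder_probability(opp_material):
        return best_move
    low, high = blunder_window(own_eval)
    pool = [m for m, s in scored_moves if low <= best_score - s <= high]
    return rng.choice(pool) if pool else best_move


rng = random.Random(1)
moves = [("e4", 30), ("a3", -60), ("h4", -200)]  # hypothetical search scores (cp)
print(choose_move(moves, opp_material=39, own_eval=0, rng=rng))
```

Passing the random generator in explicitly keeps test runs reproducible while still allowing varied play in real games.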

[Image: plots of blunder amount and opponent's material from the test set]
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: UCI_Elo

Post by jdart »

I have not tuned UCI_Elo for Arasan except very approximately. There are some parameters used for strength reduction and I am sure they could be set better than they are. Still, it appears an Elo setting of 1500 is not too far off the mark.

--Jon
pedrox
Posts: 1056
Joined: Fri Mar 10, 2006 6:07 am
Location: Basque Country (Spain)

Re: UCI_Elo

Post by pedrox »

Ferdy wrote: Fri Jul 26, 2019 1:02 am [...] The top 3 engines closest to the human player with Elo 1500 are:
Rhetoric, Discocheck and Ufim.
Hiarcs is not in the list, although it was the engine that seemed closest to Kai's formula. Was it an oversight?
pedrox

Re: UCI_Elo

Post by pedrox »

I find a problem with Kai's formula at low CCRL Elo (below 800 CCRL, or 1400 FIDE).

I have been able to establish a relationship between FIDE Elo, USCF Elo and Active Elo (the Elo of dedicated chess machines in
https://www.schach-computer.info/wiki/i ... -Elo-Liste )

Code: Select all

ELO CCRL  ELO FIDE  ELO USCF  ELO ACTIVE  NODES/S   RANDF     RANDF/2
2600      2660      2733      2665        838,861
2500      2590      2662      2594        419,430   0.00      0.00
2400      2520      2590      2522        209,715   12.50     6.25
2300      2450      2519      2451        104,858   25.00     12.50
2200      2380      2448      2380        52,429    37.50     18.75
2100      2310      2376      2308        26,214    50.00     25.00
2000      2240      2305      2237        15,729    75.00     37.50
1900      2170      2233      2165        13,107    100.00    50.00
1800      2100      2162      2094        9,175     125.00    62.50
1700      2030      2091      2023        5,898     150.00    75.00
1600      1960      2019      1951        3,604     183.33    91.67
1500      1890      1948      1880        2,130     216.67    108.33
1400      1820      1857      1789        1,229     250.00    125.00
1300      1750      1763      1695        696       300.00    150.00
1200      1680      1670      1602        389       350.00    175.00
1100      1610      1577      1509        215       400.00    200.00
1000      1540      1483      1415        200       450.00    225.00
900       1470      1390      1322        200       650.00    325.00
800       1400      1297      1229        200       850.00    425.00
700       1330      1203      1135        100       1050.00   525.00
600       1260      1110      1042        100       1250.00   625.00
500       1190      1017      949         100       1450.00   725.00
400       1120      923       855         100       1850.00   925.00
300       1050      830       762         100       2250.00   1125.00
200       980       737       669         100       2650.00   1325.00
100       910       643       575         100       3050.00   1525.00
0         840       550       482         100
I use these formulas (from other experts):

Code: Select all

ELO USCF <--> ELO ACTIVE
Elo USCF = Elo Info-Active + 68

ELO FIDE <--> ELO USCF
Elo USCF = 20 + (1.02×Elo FIDE) if Elo FIDE > 1886
Elo USCF = -570 + (Elo FIDE/0.75) if Elo FIDE ≤ 1886
Elo FIDE = (Elo USCF-20) / 1.02 if Elo USCF > 1945
Elo FIDE = 0.75 * (570 + Elo USCF) if Elo USCF ≤ 1945

ELO FIDE <--> ELO CCRL
Elo FIDE = (0.7 x Elo CCRL) + 840
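The conversion chain written out in Python, as a check that it reproduces the table: CCRL to FIDE with the 0.7/840 formula, FIDE to USCF with the two-branch formula, and USCF to Active by subtracting 68. The function names are illustrative, and rounding to whole points only at the end is an assumption.

```python
from typing import Tuple


def ccrl_to_fide(ccrl: float) -> float:
    return 0.7 * ccrl + 840


def fide_to_uscf(fide: float) -> float:
    if fide > 1886:
        return 20 + 1.02 * fide
    return -570 + fide / 0.75


def uscf_to_fide(uscf: float) -> float:
    """Inverse of fide_to_uscf, using the thresholds quoted above."""
    if uscf > 1945:
        return (uscf - 20) / 1.02
    return 0.75 * (570 + uscf)


def uscf_to_active(uscf: float) -> float:
    return uscf - 68


def convert(ccrl: float) -> Tuple[int, int, int]:
    """CCRL -> (FIDE, USCF, Active), rounded only at the end."""
    fide = ccrl_to_fide(ccrl)
    uscf = fide_to_uscf(fide)
    return round(fide), round(uscf), round(uscf_to_active(uscf))


for ccrl in (2600, 1500, 800, 0):
    print(ccrl, *convert(ccrl))
```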
If we take an engine that plays random moves, with an approximate Elo of 250 CCRL, this formula gives a FIDE Elo above 1000, a USCF Elo above 750 and an Active Elo above 700, which are the minimums of each list. But from the games I have observed, a player with FIDE Elo 1000 plays much better than an engine making random moves: 1000-rated players know the value of the pieces and in most moves they do not lose material. Also, if we take an engine like Ram at 500 CCRL, we can easily see that it will not beat a 1200 FIDE player. Even Alouette at 700 CCRL will not play like a 1300 FIDE player.

In these cases I think these are better approximations:
Elo FIDE = (0.75 x Elo CCRL) + 700, or
Elo FIDE = (0.8 x Elo CCRL) + 560, or even
Elo FIDE = (0.85 x Elo CCRL) + 420

I have applied the 3 formulas progressively, so that, for example, Ram gets a FIDE Elo closer to 850 instead of 1200, a random mover gets 650 instead of 1000, and Alouette plays at about 1100 instead of 1300.
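A small sketch comparing the alternative lines on the examples in the post. All four formulas, including the original 0.7/840, pass through CCRL 2800 = FIDE 2800, so a steeper slope lowers the estimate more the further an engine sits below 2800. The 0.85/420 line gives about 633 for a random mover (CCRL 250) and 845 for Ram (CCRL 500), and the 0.8/560 line gives 1120 for Alouette (CCRL 700), which appear to be the figures quoted; the CCRL ranges where each formula takes over are not given, so this simply prints every formula for every example. The labels are illustrative.

```python
# (slope, intercept) pairs quoted in the post; Kai's original is (0.7, 840).
FORMULAS = {
    "Kai (original)": (0.70, 840),
    "alt 0.75": (0.75, 700),
    "alt 0.80": (0.80, 560),
    "alt 0.85": (0.85, 420),
}

# Engines and approximate CCRL ratings as quoted in the post.
EXAMPLES = {"random mover": 250, "Ram": 500, "Alouette": 700}


def fide_from_ccrl(ccrl: float, slope: float, intercept: float) -> float:
    """Linear CCRL -> FIDE estimate: FIDE = slope * CCRL + intercept."""
    return slope * ccrl + intercept


for name, ccrl in EXAMPLES.items():
    row = ", ".join(f"{label}: {fide_from_ccrl(ccrl, *coef):.0f}"
                    for label, coef in FORMULAS.items())
    print(f"{name} (CCRL {ccrl}): {row}")
```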
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: UCI_Elo

Post by Robert Pope »

A little bit of a tangent: if you get an engine to frequently match the moves of a 1500 player, is there any reason to believe the engine would also be 1500? If you match 95% of moves, what is in the other 5%? Probably all the dumb blunders. So I would think you would end up with something that picks 1500-level quality moves but never blunders, and it would actually be maybe 200 Elo stronger in practice. Or am I missing something?
Ferdy

Re: UCI_Elo

Post by Ferdy »

pedrox wrote: Thu Aug 08, 2019 3:07 pm In the list is not Hiarcs that was the engine that seemed closer to Kai's formula, was it an oversight?
Hiarcs has a bug in its UCI promotion move notation; also, it is commercial, so some people may not be able to try it.