New Tool

Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: New Tool

Post by Rebel »

Ferdy wrote: Fri Mar 13, 2020 11:55 pm
Rebel wrote: Fri Mar 13, 2020 8:32 pm Thanks Ferdy, I will give it a try, but isn't it more precise to involve the solution time in the formula? For example, running a set at 5000ms, an engine that finds the best move at 200ms (and is constant) should receive more points than an engine that finds the move at 4700ms.
That is possible indeed, but these engines are also capable of changing their best move. An engine may like m1 at 200ms but prefer m2 at 4700ms. One idea is simply to test engines at a shorter time, say 200ms; the result would be a ranking of engines at that particular time on a particular system.
In my experience many engines don't get go movetime right below one second: many take too much time, a few others move too fast. Hence I always test at 1000ms or more for reliable results. Even then there is no 100% guarantee: when I ran the latest Lc0 with MT=60000 it moved much too fast, playing after 32 seconds instead of one minute.
90% of coding is debugging, the other 10% is writing bugs.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: New Tool

Post by Ferdy »

Rebel wrote: Sat Mar 14, 2020 8:41 am
Ferdy wrote: Fri Mar 13, 2020 11:55 pm
Rebel wrote: Fri Mar 13, 2020 8:32 pm Thanks Ferdy, I will give it a try, but isn't it more precise to involve the solution time in the formula? For example, running a set at 5000ms, an engine that finds the best move at 200ms (and is constant) should receive more points than an engine that finds the move at 4700ms.
That is possible indeed, but these engines are also capable of changing their best move. An engine may like m1 at 200ms but prefer m2 at 4700ms. One idea is simply to test engines at a shorter time, say 200ms; the result would be a ranking of engines at that particular time on a particular system.
In my experience many engines don't get go movetime right below one second: many take too much time, a few others move too fast. Hence I always test at 1000ms or more for reliable results. Even then there is no 100% guarantee: when I ran the latest Lc0 with MT=60000 it moved much too fast, playing after 32 seconds instead of one minute.
Regarding Lc0 moving fast, you can set the option smartpruningfactor to 0 to maximize its search time.
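
Over the UCI protocol such a setting is sent with setoption before the search is started. A minimal sketch that only builds the command strings; it assumes the option is exposed under the name SmartPruningFactor (check the engine's "uci" output), and the helper name is made up for illustration:

```python
def uci_fixed_time_commands(movetime_ms, options=None):
    """Build the UCI commands for one fixed-time search.

    Hypothetical helper: it only formats strings, it does not talk to an engine.
    """
    commands = []
    for name, value in (options or {}).items():
        commands.append(f"setoption name {name} value {value}")
    commands.append(f"go movetime {movetime_ms}")
    return commands


# Option name assumed to be spelled the way Lc0 lists it in its "uci" output.
for command in uci_fixed_time_commands(60000, {"SmartPruningFactor": 0}):
    print(command)
```

Comparing the requested movetime with the time the engine actually reports is still worthwhile, since not every engine honours the limit.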
Rebel

Re: New Tool

Post by Rebel »

Ferdy wrote: Fri Mar 13, 2020 12:05 pm Nice tool, but would like to request a scoring feature, using scoring rate percentage as points. So for example if score1 has 100cp and score2 has 80cp, we may use for example.

Code:

def perf(cp):
    K = 0.7  # SF
    pr = 100*1/(1+10**(-K*cp/400))
    
    return pr

where K can be adjusted depending on the engine.

for cp = 100
pr = 59.94% or say 60 points

for cp = 80
pr = 57.99% or say 58 points

epd c0 "m1=60, m2=58 ...";
I tried your formula, but it didn't work for me. I looked at the MEA logfile, and everything needed to include solution times in the calculation formula is present there.
Ferdy

Re: New Tool

Post by Ferdy »

Rebel wrote: Sun Mar 15, 2020 10:28 am
Ferdy wrote: Fri Mar 13, 2020 12:05 pm Nice tool, but would like to request a scoring feature, using scoring rate percentage as points. So for example if score1 has 100cp and score2 has 80cp, we may use for example.

Code:

def perf(cp):
    K = 0.7  # SF
    pr = 100*1/(1+10**(-K*cp/400))
    
    return pr

where K can be adjusted depending on the engine.

for cp = 100
pr = 59.94% or say 60 points

for cp = 80
pr = 57.99% or say 58 points

epd c0 "m1=60, m2=58 ...";
Tried your formula, didn't work for me.

Code:

cp = 100
K = 0.7

pr = 100*1/(1+10**(-K*cp/400))
a = -K*cp/400 = -0.7*100/400 = -0.175
pr = 100*1/(1+10**a) = 100/(1 + 10**(-0.175)) = 100/(1 + 0.668) = 100/1.668 = 59.95 or 60
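
The same arithmetic can be checked by running it. A small sketch, with K passed as a parameter (an addition for illustration; the function quoted above hard-codes it) and a hypothetical helper that formats the c0 string:

```python
def perf(cp, K=0.7):
    """Scoring rate in percent for a centipawn score (K = 0.7 as used for SF)."""
    return 100 / (1 + 10 ** (-K * cp / 400))


def c0_points(scores_cp, K=0.7):
    """Hypothetical helper: turn {move: cp} into a c0 string of rounded points."""
    parts = [f"{move}={round(perf(cp, K))}" for move, cp in scores_cp.items()]
    return 'c0 "' + ", ".join(parts) + '";'


print(f"{perf(100):.2f}")  # 59.94
print(f"{perf(80):.2f}")   # 57.99
print(c0_points({"m1": 100, "m2": 80}))  # c0 "m1=60, m2=58";
```

The hand calculation gives 59.95 only because 10**(-0.175) was rounded to 0.668 on the way.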
Rebel

Re: New Tool

Post by Rebel »

UPDATE

I am working on version 1.1. It will support MultiPV 1-8, and the formula that calculates the (bonus) points for a position will include the time at which the key move is found. To give an impression, I created a small (10 positions), not so hard tactical set and let 11 engines run it at 10 seconds per move.

Version 1.0 will give:

Code:

    EPD  : epd\eddy.epd
    Time : 10000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  Stockfish 11         8  3200     8    10  0.800     10  0.800  10000   128    1
 2  Xiphos 0.6           7  2799     7    10  0.700     10  0.700  10000   128    1
 3  Andscacs 0.95        7  2799     7    10  0.700     10  0.700  10000   128    1
 4  Wasp 3.75            6  2399     6    10  0.600     10  0.600  10000   128    1
 5  Komodo 10            5  2000     5    10  0.500     10  0.500  10000   128    1
 6  Ethereal 12          4  1600     4    10  0.400     10  0.400  10000   128    1
 7  Laser 1.7            4  1600     4    10  0.400     10  0.400  10000   128    1
 8  rofChade 2.2         4  1600     4    10  0.400     10  0.400  10000   128    1
 9  Fire 7.1             4  1600     4    10  0.400     10  0.400  10000   128    1
10  RubiChess 1.4        2   800     2    10  0.200     10  0.200  10000   128    1
11  Arasan 21.3          2   800     2    10  0.200     10  0.200  10000   128    1

                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca
Stockfish is the winner with 8/10; Andscacs and Xiphos follow with 7 positions found.

With the (beta) version 1.1, which includes time in the formula, a key move found at 350ms now receives many more points than a key move found at 8000ms.

Code:

    EPD  : epd\eddy.epd
    Time : 10000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  Andscacs 0.95      210  2799     7    10  0.700    300  0.700  10000   128    1
 2  Stockfish 11       155  2068     8    10  0.800    300  0.517  10000   128    1
 3  Komodo 10          150  2000     5    10  0.500    300  0.500  10000   128    1
 4  Xiphos 0.6         132  1760     7    10  0.700    300  0.440  10000   128    1
 5  Fire 7.1           120  1600     4    10  0.400    300  0.400  10000   128    1
 6  rofChade 2.2       120  1600     4    10  0.400    300  0.400  10000   128    1
 7  Wasp 3.75          119  1588     6    10  0.600    300  0.397  10000   128    1
 8  Ethereal 12        102  1360     4    10  0.400    300  0.340  10000   128    1
 9  Laser 1.7           95  1268     4    10  0.400    300  0.317  10000   128    1
10  Arasan 21.3         51   680     2    10  0.200    300  0.170  10000   128    1
11  RubiChess 1.4       32   427     2    10  0.200    300  0.107  10000   128    1

                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca
Andscacs is on top now even though it found fewer key moves, because it solved them quicker than Stockfish. For the same reason Komodo sits in third place, only 5 points behind second-placed Stockfish.
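
The reordering can be read off the two rate columns. A short sketch, assuming (from the tables above, not from the MEA source) that Top Rate is hits divided by positions and Total Rate is score divided by max score:

```python
# (engine, score, hits, positions, max_score) copied from the version 1.1 table
rows = [
    ("Andscacs 0.95", 210, 7, 10, 300),
    ("Stockfish 11", 155, 8, 10, 300),
    ("Komodo 10", 150, 5, 10, 300),
]


def rates(score, hits, positions, max_score):
    """Return (top_rate, total_rate), both rounded to three decimals."""
    return round(hits / positions, 3), round(score / max_score, 3)


for engine, score, hits, positions, max_score in rows:
    top_rate, total_rate = rates(score, hits, positions, max_score)
    print(f"{engine:<15} top={top_rate:.3f} total={total_rate:.3f}")
```

Stockfish keeps the higher top rate (0.800), but Andscacs scores 210 = 7 x 30, apparently the full 30 points on every position it solved, so its total rate equals its top rate.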

The 10 positions for MEA:

Code:

r1bk1n1r/pp1n1q1p/2p2p1R/3p4/3PpN2/2NB2Q1/PPP2PP1/2K1R3 w - - bm Bxe4; c0 "Bxe4=1";
2rr2k1/1pqbbppp/p3p3/4p3/2P5/2N1P1P1/PP2QPBP/2RR2K1 w - - bm c5; c0 "c5=1";
r1b1k2r/p2n1ppp/1p2p3/3p2B1/P2P4/1Nn1P3/5PPP/R3KB1R w KQkq - bm f3; c0 "f3=1";
r3qrk1/4bppp/4p3/p2pP2Q/1p1B4/1PpPP3/P1P2RPP/5RK1 w - - bm Rf6; c0 "Rf6=1";
r1b1kb1r/1p1n1ppp/p2ppn2/6BB/2qNP3/2N5/PPP2PPP/R2Q1RK1 w kq - bm Nxe6; c0 "Nxe6=1";
1br1r1k1/1b1q1pp1/p1p2n1p/1p2n3/1P6/PNN1P3/2QBBPPP/4RRK1 b - - bm c5; c0 "c5=1";
2r1qrk1/pp1b1ppp/4pn2/n1b5/8/2NQ1NP1/PP1BPPBP/R2R2K1 w - - bm b4; c0 "b4=1";
r5k1/2p1b1p1/6bp/p4P2/q3pP2/2PnB2P/PP1N4/KR3Q1R b - - bm Nb4; c0 "Nb4=1";
5R2/2kp4/pr6/1p2P3/1p6/8/1PP2PPP/6K1 b - - bm a5; c0 "a5=1";
1R6/p3k1p1/7p/2b1pP2/P1r3P1/B7/7P/7K w - - bm Rc8; c0 "Rc8=1";
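
For anyone who wants to read such lines in their own scripts, here is a minimal parsing sketch. It assumes the layout shown above (four position fields, then a bm operation and a c0 string of move=points pairs); the function name is hypothetical and this is not how MEA itself reads the file:

```python
import re


def parse_epd_line(line):
    """Split an EPD line into (position, best_moves, points_by_move)."""
    # The first four fields are placement, side to move, castling, en passant.
    fields = line.split(None, 4)
    position = " ".join(fields[:4])
    operations = fields[4] if len(fields) > 4 else ""

    best = re.search(r'\bbm\s+([^;]+);', operations)
    best_moves = best.group(1).split() if best else []

    points = {}
    c0 = re.search(r'\bc0\s+"([^"]*)"', operations)
    if c0:
        for item in c0.group(1).split(","):
            move, sep, value = item.strip().partition("=")
            if sep and value.strip().isdigit():
                points[move.strip()] = int(value.strip())
    return position, best_moves, points


line = '5R2/2kp4/pr6/1p2P3/1p6/8/1PP2PPP/6K1 b - - bm a5; c0 "a5=1";'
print(parse_epd_line(line))
```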
Rebel

Re: New Tool

Post by Rebel »

I have given up on STS, it's not only outdated but can't be updated to something useful.

STS in its original form:

Code:

    EPD  : epd\sts.epd
    Time : 1000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  Bouquet 1.5      35920  3192  1226  1500  0.817  45000  0.798   1000   128    1
 2  Houdini 1.5      35869  3188  1223  1500  0.815  45000  0.797   1000   128    1
 3  Stockfish 11     35124  3124  1208  1500  0.805  45000  0.781   1000   128    1
 4  Komodo 10        34622  3076  1196  1500  0.797  45000  0.769   1000   128    1
 5  Laser 1.7        34156  3036  1162  1500  0.775  45000  0.759   1000   128    1
 6  Ethereal 1.2     33795  3004  1153  1500  0.769  45000  0.751   1000   128    1
 7  Xiphos 0.6       33193  2951  1133  1500  0.755  45000  0.738   1000   128    1
 8  rofChade 2.2     32768  2911  1119  1500  0.746  45000  0.728   1000   128    1
 9  Wasp 3.75        32040  2847  1089  1500  0.726  45000  0.712   1000   128    1
10  Arasan 21.3      30190  2684  1040  1500  0.693  45000  0.671   1000   128    1
11  RubiChess 1.4    29900  2656  1024  1500  0.683  45000  0.664   1000   128    1
12  Fire 7.1         29310  2604  1019  1500  0.679  45000  0.651   1000   128    1
Okay, I already knew that a couple of years ago: the set is tuned with the Rybka family and its derivatives Houdini and Bouquet, the then strongest engines, and it can't be that those programs, 300-400 Elo weaker, outperform Stockfish and Komodo.

Code:

    EPD  : sts-sf11
    Time : 1000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  Houdini 1.5      30125  2676  1026  1500  0.684  45000  0.669   1000   128    1
 2  Komodo 10        29570  2628  1023  1500  0.682  45000  0.657   1000   128    1
 3  Xiphos 0.6       29569  2628  1011  1500  0.674  45000  0.657   1000   128    1
 4  Ethereal 1.2     29103  2588   992  1500  0.661  45000  0.647   1000   128    1
 5  Bouquet 1.5      29080  2584   992  1500  0.661  45000  0.646   1000   128    1
 6  Laser 1.7        28806  2560   980  1500  0.653  45000  0.640   1000   128    1
 7  rofChade 2.2     28693  2552   981  1500  0.654  45000  0.638   1000   128    1
 8  Wasp 3.75        27597  2451   938  1500  0.625  45000  0.613   1000   128    1
 9  Arasan 21.3      26379  2343   909  1500  0.606  45000  0.586   1000   128    1
10  RubiChess 1.4    25996  2311   889  1500  0.593  45000  0.578   1000   128    1
11  Fire 7.1         25731  2287   895  1500  0.597  45000  0.572   1000   128    1
Running TSC with Sf11 on 20 cores, MultiPV=4 and 60000ms (1 minute) per position to create a more reliable points distribution did not help either: Houdini 1.5 still tops the list, with Bouquet fifth.

It shows not only that STS was tuned with the Rybka family of engines but also that the positions were chosen to fit the desired outcome. I will no longer support it; it's beyond hope, although it will have value for starters.

----------------------------------------------------------------------------------------------------------

I switched to Jon's Arasan test suite. That helped:

Code:

    EPD  : epd\arasan.epd
    Time : 1000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  Stockfish 11      1684  1143    62   196  0.316   5880  0.286   1000   128    1
 2  Xiphos 0.6        1335   908    48   196  0.245   5880  0.227   1000   128    1
 3  Ethereal 1.2       672   456    24   196  0.122   5880  0.114   1000   128    1
 4  Komodo 10          623   423    22   196  0.112   5880  0.106   1000   128    1
 5  Wasp 3.75          619   419    22   196  0.112   5880  0.105   1000   128    1
 6  Houdini 1.5        528   355    19   196  0.096   5880  0.089   1000   128    1
 7  Fire 7.1           477   324    17   196  0.086   5880  0.081   1000   128    1
 8  Laser 1.7          460   311    16   196  0.081   5880  0.078   1000   128    1
 9  rofChade 2.2       393   264    14   196  0.071   5880  0.066   1000   128    1
10  Arasan 21.3        267   179    10   196  0.051   5880  0.045   1000   128    1
11  RubiChess 1.4      252   168     9   196  0.045   5880  0.042   1000   128    1
And at 5000ms (5 seconds)

Code:

    EPD  : epd\arasan.epd
    Time : 5000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  Stockfish 11      2556  1739   109   196  0.556   5880  0.435   5000   128    1
 2  Xiphos 0.6        1828  1243    75   196  0.383   5880  0.311   5000   128    1
 3  Komodo 10         1521  1036    60   196  0.306   5880  0.259   5000   128    1
 4  Ethereal 1.2      1086   739    46   196  0.235   5880  0.185   5000   128    1
 5  Houdini 1.5       1067   723    45   196  0.230   5880  0.181   5000   128    1
 6  Wasp 3.75          948   644    40   196  0.204   5880  0.161   5000   128    1
 7  rofChade 2.2       942   640    42   196  0.214   5880  0.160   5000   128    1
 8  Laser 1.7          829   563    38   196  0.194   5880  0.141   5000   128    1
 9  Fire 7.1           675   460    29   196  0.148   5880  0.115   5000   128    1
10  Arasan 21.3        511   343    24   196  0.122   5880  0.086   5000   128    1
11  RubiChess 1.4      322   215    13   196  0.066   5880  0.054   5000   128    1
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: New Tool

Post by Dann Corbit »

Re: "Okay, I already knew that a couple of years ago: the set is tuned with the Rybka family and its derivatives Houdini and Bouquet, the then strongest engines, and it can't be that those programs, 300-400 Elo weaker, outperform Stockfish and Komodo."
I don't think I ever used Bouquet.
At the start of the test, Rybka was used, but Houdini did not exist yet.
As soon as Komodo and Stockfish were at least the third strongest engines, they were used.
When I first started, the strongest engine available was Rybka. I did not have a 64-bit OS, so it was the 32-bit version that I used. About the middle of the test set, Rybka was no longer used because it was no longer the third strongest.

My formula for the analysis was to use the top three engines at one hour each for each position. It spanned a duration of 6 years or so. Hence the hardware and software were both on the order of 64 times stronger at the end of the test formation than at the beginning.

If the methodology of your test improvement system is correct, it should work for any random collection of positions.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Terje
Posts: 347
Joined: Tue Nov 19, 2019 4:34 am
Location: https://github.com/TerjeKir/weiss
Full name: Terje Kirstihagen

Re: New Tool

Post by Terje »

@Rebel You probably mean Ethereal 12, not 1.2? :)
Rebel

Re: New Tool

Post by Rebel »

Yep.
Rebel

Re: New Tool

Post by Rebel »

Dann Corbit wrote: Tue Mar 24, 2020 9:44 am Re: "Okay, I already knew that a couple of years ago: the set is tuned with the Rybka family and its derivatives Houdini and Bouquet, the then strongest engines, and it can't be that those programs, 300-400 Elo weaker, outperform Stockfish and Komodo."
I don't think I ever used Bouquet.
At the start of the test, Rybka was used, but Houdini did not exist yet.
As soon as Komodo and Stockfish were at least the third strongest engines, they were used.
When I first started, the strongest engine available was Rybka. I did not have a 64-bit OS, so it was the 32-bit version that I used. About the middle of the test set, Rybka was no longer used because it was no longer the third strongest.

My formula for the analysis was to use the top three engines at one hour each for each position. It spanned a duration of 6 years or so. Hence the hardware and software were both on the order of 64 times stronger at the end of the test formation than at the beginning.
My admiration for your tireless efforts.
Dann Corbit wrote: Tue Mar 24, 2020 9:44 am If the methodology of your test improvement system is correct, it should work for any random collection of positions.
Yes, but as with STS, I suppose that sets built on today's strongest engines will be outdated within 3 to 5 years and need refreshing.

That leaves those (mainly tactical) sets that have 100% correct best moves: engines will get 10 points for finding the move plus 0-20 points based on how quickly they find it.
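
A minimal sketch of such a scheme, assuming the bonus falls linearly from 20 at 0ms to 0 at the time limit. The linear shape, the rounding and the function name are assumptions for illustration; the actual MEA 1.1 formula is not given in this thread:

```python
def position_points(found, solve_ms, limit_ms, base=10, max_bonus=20):
    """Points for one position: base for the key move plus a speed bonus."""
    if not found:
        return 0
    solve_ms = min(max(solve_ms, 0), limit_ms)  # clamp to the search window
    bonus = max_bonus * (limit_ms - solve_ms) / limit_ms
    return base + round(bonus)


print(position_points(True, 350, 10000))   # 29
print(position_points(True, 8000, 10000))  # 14
print(position_points(False, 0, 10000))    # 0
```

With these assumed numbers the maximum is 30 points per position, which matches the Max Score of 300 for the 10-position set shown earlier.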