Working with CLOP

Adam Hair · Post by **Adam Hair** » Thu Jan 22, 2015 11:34 pm

mar wrote:This: http://www.talkchess.com/forum/viewtopi ... 22&t=50823
From what I understood Miguel did something very similar (as can be seen in the thread).

I have tried multiple times to improve on Gaviota's parameter values with CLOP with no success. No luck with SPSA either. Miguel's method, which is similar to Peter's, seems to work great.

jdart · Post by **jdart** » Fri Jan 23, 2015 2:16 am

I am using NOMAD (https://www.gerad.ca/nomad/).

--Jon

Ferdy · Post by **Ferdy** » Fri Jan 23, 2015 4:12 am

Robert Pope wrote:I finally got CLOP up and running on my computer. That was a real headache, since I didn't realize for quite a while that I needed to install python to run the python cutechess-cli script, and then I couldn't get the PATH variable to stick.

Now that it is running and I am doing my first optimization test (just of standard piece values), I have a few questions:

1. How do you know if a term has been optimized "enough"? If it takes 500,000 games, so be it.

One way is to look at the plot: there may be a point where the parameter value converges. In the sample below I am clopping one parameter with a range of 100 to 1000, and it appears to converge around 800 to 900 (x is time, y is my parameter value).

In my example, one parameter with a range of 100 to 1000 has 1000 - 100 + 1 = 901 unique values. With a minimum of, say, 1000 games per unique value, you may need at least 1000 games * 901 unique values = 901,000 games. But CLOP is smart, so not every unique value will be tested for 1000 games; if it converges early on some values, you will get an early indication.
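The worst-case arithmetic above can be checked with a few lines of Python. This is only a sketch of the exhaustive budget; the function name `exhaustive_games` is made up for illustration, and CLOP itself does not sample this way.

```python
def exhaustive_games(lo, hi, games_per_value):
    """Games needed if every integer value in [lo, hi] got the same number of games."""
    unique_values = hi - lo + 1
    return unique_values, unique_values * games_per_value


# Range 100..1000 with 1000 games per value, as in the post.
values, games = exhaustive_games(100, 1000, 1000)
print(values, games)  # 901 901000
```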

I made a little tool that shows which parameter values were tested most and how they performed.

CLOP Data Reader v3.0

Number of parameters: 1
First parameter: OffensivePercent
Param1: Min 110, Max 1000
Total games: 5212

Param1     W /    L /    D     NetW   Games   Score     LOS
   110     0 /    1 /    1,     -1       2    25.00%   25.00%

[...]

   462     0 /    1 /    1,     -1       2    25.00%   25.00%
   463     1 /    0 /    1,     +1       2    75.00%   75.00%
   464     0 /    1 /    1,     -1       2    25.00%   25.00%
   472     2 /    0 /    0,     +2       2   100.00%   87.50%
   473     2 /    0 /    0,     +2       2   100.00%   87.50%
   478     4 /    0 /    0,     +4       4   100.00%   96.88%
   481     1 /    0 /    1,     +1       2    75.00%   75.00%

[...]
  
   717     2 /    3 /    1,     -1       6    41.67%   34.38%
   719     1 /    0 /    1,     +1       2    75.00%   75.00%
   720     5 /    1 /    0,     +4       6    83.33%   93.75%
   721     2 /    0 /    0,     +2       2   100.00%   87.50%
   722     3 /    0 /    1,     +3       4    87.50%   93.75%
   723     2 /    0 /    0,     +2       2   100.00%   87.50%
   724     2 /    0 /    0,     +2       2   100.00%   87.50%
   726     3 /    1 /    0,     +2       4    75.00%   81.25%

[...]

   839    28 /    1 /    1,    +27      30    95.00%  100.00%
   840    19 /    0 /    1,    +19      20    97.50%  100.00%
   841    20 /    4 /    2,    +16      26    80.77%   99.95%
   842    24 /    2 /    4,    +22      30    86.67%  100.00%
   843    32 /    0 /    0,    +32      32   100.00%  100.00%
   844    15 /    2 /    5,    +13      22    79.55%   99.93%
   845    25 /    2 /    1,    +23      28    91.07%  100.00%
   846    17 /    1 /    2,    +16      20    90.00%  100.00%
   847    16 /    0 /    2,    +16      18    94.44%  100.00%
   848    30 /    0 /    0,    +30      30   100.00%  100.00%
   849    16 /    4 /    0,    +12      20    80.00%   99.64%
   850    21 /    1 /    4,    +20      26    88.46%  100.00%
   851    33 /    1 /    2,    +32      36    94.44%  100.00%
   852    19 /    2 /    3,    +17      24    85.42%   99.99%
   853    18 /    4 /    2,    +14      24    79.17%   99.87%
   854    17 /    1 /    2,    +16      20    90.00%  100.00%
   855    28 /    4 /    2,    +24      34    85.29%  100.00%
   856    19 /    1 /    0,    +18      20    95.00%  100.00%
   857    26 /    0 /    0,    +26      26   100.00%  100.00%
   858    32 /    2 /    2,    +30      36    91.67%  100.00%
   859    26 /    1 /    5,    +25      32    89.06%  100.00%

[...]

   921    22 /    2 /    2,    +20      26    88.46%  100.00%
   922    15 /    3 /    2,    +12      20    80.00%   99.78%
   923    24 /    3 /    1,    +21      28    87.50%  100.00%
   924    27 /    0 /    1,    +27      28    98.21%  100.00%
   925    10 /    3 /    1,     +7      14    75.00%   97.13%
   926    27 /    1 /    2,    +26      30    93.33%  100.00%
   927    11 /    2 /    1,     +9      14    82.14%   99.35%

[...]

   997     7 /    2 /    1,     +5      10    75.00%   94.53%
   998     6 /    0 /    2,     +6       8    87.50%   99.22%
   999     4 /    0 /    0,     +4       4   100.00%   96.88%
  1000    13 /    0 /    1,    +13      14    96.43%   99.99%

Top Parameters: By LOS
[1] par1  838, score  92.65%, LOS 100.000%, Games    34, NetWins   +29
[2] par1  839, score  95.00%, LOS 100.000%, Games    30, NetWins   +27
[3] par1  843, score 100.00%, LOS 100.000%, Games    32, NetWins   +32
[4] par1  848, score 100.00%, LOS 100.000%, Games    30, NetWins   +30
[5] par1  851, score  94.44%, LOS 100.000%, Games    36, NetWins   +32

Look at the number of games played so far by the top parameter values. It is low, but it gives you an early idea: 34 games for value 838, with +29 net wins. Once one of the top values reaches at least 1000 games, I stop CLOP and move on to test matches. In the example above the top values are close to each other, so it does not matter much whether you use 838, 839, 843, 848, or 851 as your final value, as long as the engine using that value shows an improvement in test matches.
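For anyone who wants to reproduce the Score and LOS columns from the W/L/D counts, here is a minimal sketch. It is not the tool's source. I am assuming that Score is (W + D/2) / Games and that LOS is the exact draw-independent likelihood of superiority with a uniform prior, because that formula matches the rows shown (for example 5/1/0 gives 93.75% and 10/3/1 gives 97.13%). The function names are made up.

```python
from math import comb


def score(wins, losses, draws):
    """Fraction of points scored: a win counts 1, a draw counts 0.5."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def los(wins, losses):
    """Likelihood of superiority, ignoring draws (assumed formula).

    Equals P(X <= wins) for X ~ Binomial(wins + losses + 1, 0.5).
    """
    n = wins + losses + 1
    return sum(comb(n, k) for k in range(wins + 1)) / 2 ** n


# Row 720 of the listing: 5 wins, 1 loss, 0 draws.
print(f"{score(5, 1, 0):.2%} {los(5, 1):.2%}")  # 83.33% 93.75%
```

Note how quickly the LOS saturates: a handful of games with no losses already shows 100.00% after rounding, which is why the game count per value matters more than the LOS when deciding to stop.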

2. My CLOP file has me running 3 processors on my Quad. But invariably, after a few hundred games, I start to get processors that drop out. A soft pause won't close out the threads, and I manually have to do a hard close and then go into Task Manager and kill an instance of cutechess-cli and the engine I was playing.

I'm guessing it must be my engine that is terminating prematurely, since it is never the one left idling, but how can I troubleshoot this? I have no idea which game it stuck on to find a corresponding log file.

I use cutechess-cli version 0.5.1 in clop.

nionita · Post by **nionita** » Fri Jan 23, 2015 6:35 pm

mar wrote:It has been used successfully in several engines giving significant gains in gameplay

It's true, but of course you understand the difference between a theoretical proof and "several engines" in which it was successful.

Also maybe there is a proof for that method too, and I don't know about it...

mar · Post by **mar** » Fri Jan 23, 2015 7:04 pm

MTDf should have been theoretically sound as well but practice has proven otherwise.
I got significant improvement thanks to this method, that's proof enough for me.
Of course if you're going to tune hundreds of eval params with CLOP I say good luck and let us know in a couple of years.

lucasart · Post by **lucasart** » Sat Jan 24, 2015 12:58 am

mar wrote:My suggestion is: don't waste time on CLOP. If you want to tune eval there are vastly superior and much faster methods that actually converge.

Agreed. I've wasted so much time trying to tune just 2 or 3 variables with CLOP, and rarely saw it converge. In my experience, it's not uncommon that CLOP needs over 100,000 games to tune only 2 variables. As for 3, forget about it!

Admittedly, I've gained a non-trivial amount of Elo in DiscoCheck thanks to CLOP. But anything else would have done a better job than CLOP with the same amount of resources.

Plus the CLOP interface is horrible. CLOP runs a script that plays a single game. That alone is a serious design flaw, because there is a lot of overhead in starting everything (from cutechess-cli to the engines) just to play a single game, instead of sending a simple ucinewgame to an already running process.
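To make the "one script run per game" point concrete, here is a rough sketch of the shape such a script takes. The argument layout (processor id, seed, then name/value pairs) and the single-letter W/L/D result are what I recall from the example script shipped with CLOP, so treat them as an assumption and check your own copy. `play_one_game` is a hypothetical stand-in; a real script would launch cutechess-cli and both engines at that point, which is exactly the per-game overhead described above.

```python
import random
import sys


def parse_clop_args(argv):
    """Split argv into (processor id, seed, {parameter name: value}).

    Assumed layout: script, processor, seed, name1, value1, name2, value2, ...
    """
    processor = argv[1]
    seed = int(argv[2])
    names = argv[3::2]
    values = [float(v) for v in argv[4::2]]
    return processor, seed, dict(zip(names, values))


def play_one_game(params, seed):
    """Hypothetical stand-in for starting cutechess-cli and the engines.

    Returns a seeded random outcome so the sketch stays self-contained.
    """
    rng = random.Random(seed)
    return rng.choice("WLD")


def main(argv):
    """Play exactly one game and report W, L or D on standard output."""
    _processor, seed, params = parse_clop_args(argv)
    sys.stdout.write(play_one_game(params, seed) + "\n")
```

To use something like this as the actual script, you would call `main(sys.argv)` at the bottom of the file. Every game pays the cost of a fresh interpreter plus fresh engine processes.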

Since CLOP needs to play hundreds of thousands of games to tune only 2 or 3 variables, you need to kill any overhead you can, and you can't even use a tc, but have to resort to depth=6 or so to play super fast games. Not to mention the fact that once you have found the optimal values at depth=6, they are likely not optimal at normal tc testing...

In Stockfish, we use SPSA by Joona, which is much superior in practice (perhaps not in theory, but read my signature about theory and practice). I strongly recommend you look at Joona's SPSA script, instead of wasting your time trying to figure out how the CLOP/Python/cutechess-cli plumbing works (I know how it works, it's a real penance, and a waste of time anyway, trust me!).

lucasart · Post by **lucasart** » Sat Jan 24, 2015 1:13 am

mar wrote:MTDf should have been theoretically sound as well but practice has proven otherwise.
I got significant improvement thanks to this method, that's proof enough for me.
Of course if you're going to tune hundreds of eval params with CLOP I say good luck and let us know in a couple of years.

A couple of years? Even billions of years won't be enough!

Laskos · Post by **Laskos** » Sat Jan 24, 2015 9:05 am

lucasart wrote:
mar wrote:My suggestion is: don't waste time on CLOP. If you want to tune eval there are vastly superior and much faster methods that actually converge.
Agreed. I've wasted so much time trying to tune just 2 or 3 variables with CLOP, and rarely saw it converge. In my experience, it's not uncommon that CLOP needs over 100,000 games to tune only 2 variables. As for 3, forget about it!

Pretty much what I have said at CLOP's release:
http://www.talkchess.com/forum/viewtopic.php?t=40987
Quadratic non-iterative regression on noisy data needed 300,000 datapoints to detect the global optimum even in 2 dimensions on a pretty smooth distribution. Pictures are gone, though. With more than 2-3 dimensions, the detection of global optima is hopeless, and even the many local optima will be missed. Basically CLOP can be useful only in low-dimensional (2-3) perturbative cases, where one can fit the parameters one by one sequentially anyway.

nionita · Post by **nionita** » Sat Jan 24, 2015 10:41 am

lucasart wrote:
mar wrote:MTDf should have been theoretically sound as well but practice has proven otherwise.
I got significant improvement thanks to this method, that's proof enough for me.
Of course if you're going to tune hundreds of eval params with CLOP I say good luck and let us know in a couple of years.
couple of years? Even billions of years won't be enough!

Ok, then I will also try SPSA

Michel · Post by **Michel** » Sat Jan 24, 2015 10:42 am

lucasart wrote:In Stockfish, we use SPSA by Joona, which is much superior in practice (perhaps not in theory, but read my signature about theory and practice).

Now where did that come from?? There are convergence proofs for SPSA but not for CLOP. So the theoretical situation for SPSA is considerably better than for CLOP.

More importantly, CLOP invests all its time in finding a true (perhaps local) optimum whereas SPSA does hill climbing. Since you are not really interested in a true optimum (if you are near an optimum there is little elo to be gained anyway), but rather in an improvement, SPSA can give results fast even with a large number of parameters (the parameters that have no elo impact or are already optimal will perform a random walk).
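To illustrate the hill-climbing behaviour, here is a minimal textbook SPSA sketch. It is not Joona's script and not what Stockfish runs. The toy objective, the gain constants and all names are assumptions chosen for the example; in engine tuning the objective would be a noisy match result rather than a smooth function.

```python
import random


def spsa_maximize(objective, theta, iterations=2000, a=0.1, c=0.1,
                  A=10.0, alpha=0.602, gamma=0.101, rng=None):
    """Climb towards higher objective values using two evaluations per step."""
    rng = rng or random.Random(0)
    theta = list(theta)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # step size, shrinks over time
        c_k = c / (k + 1) ** gamma       # perturbation size, shrinks slowly
        # Perturb every parameter at once by +c_k or -c_k.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = objective(plus) - objective(minus)
        # One shared difference gives a gradient estimate for all parameters.
        theta = [t + a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]
    return theta


def toy_objective(params):
    """Smooth stand-in for playing strength, with its peak at (3, -1)."""
    x, y = params
    return -(x - 3.0) ** 2 - (y + 1.0) ** 2


best = spsa_maximize(toy_objective, [0.0, 0.0])
print([round(v, 2) for v in best])
```

The cost per step is two evaluations no matter how many parameters there are, which is why it scales to many parameters. A parameter the objective does not depend on only receives noise in its update, so it drifts as a random walk.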
