Gary
Joined: 12 Dec 2006 Posts: 338
Posted: Sat Mar 10, 2012 2:07 pm Post subject: CLOP on Stockfish
I've been working on the ThreatAttacks evaluation in Stockfish, trying to come up with an improvement. So, I decided to try CLOP out.
I CLOPtomized the 5 parameters (code diff: https://github.com/glinscott/Stockfish/compare/master...improve_evaluate_threats). After 7500 games, CLOP was reporting excellent numbers: a weighted Elo of 28, with a 95% LCB of 19.
The Max tab was showing a mean Elo of 21, though, with a 95% LCB of -3, and no Max was found.
Running 1500 games at 40/2sec+0.01 against base Stockfish ended up at -3 Elo. So it's close to being good, but not an improvement yet.
Any thoughts on where I could be going wrong with my CLOP usage? Is it just not enough games in CLOP land? I initially tested CLOP at 40/1sec, and the values I got tested at about -30 Elo. So it's definitely doing better at 40/2sec, but it's still not beating base Stockfish.
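A rough sketch of why -3 Elo over 1500 games is inconclusive: the standard logistic Elo formula with a normal-approximation 95% interval on the mean score. The W/D/L counts below are invented for illustration (chosen to land near -3 Elo); they are not the actual match results.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_ci(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) using a normal approximation on the per-game score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (each game scores 1, 0.5 or 0).
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    stderr = math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - z * stderr),
            elo_from_score(score + z * stderr))


if __name__ == "__main__":
    # Hypothetical 1500-game result, roughly -3 Elo.
    elo, lo, hi = elo_with_ci(wins=400, draws=687, losses=413)
    print(f"Elo {elo:+.1f}  95% CI [{lo:+.1f}, {hi:+.1f}]")
```

With those hypothetical counts the interval spans roughly -16 to +10 Elo, so a -3 result by itself cannot separate the patched engine from the base version.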
Marco Costalba
Joined: 14 Jun 2008 Posts: 2090
Posted: Sat Mar 10, 2012 2:51 pm Post subject: Re: CLOP on Stockfish
I am very interested in your tests with CLOP. I have looked at your patch, and here are some quick comments:
1) 7500 games is definitely too low a number for tuning 5 parameters.
2) You have not only added tuning code but also changed the semantics, unifying pawn threats with other pieces' threats and implementing the MultipleThreatBonus idea. I'd suggest a more step-by-step approach. First, rewrite the code the way you like while preserving the same functionality (verify with ./stockfish bench). This is very important so you don't go crazy following multiple patterns at the same time: remember that you are chasing an increase of 5 Elo at most, so it is mandatory to keep external noise to a minimum. Then, once you have added your tuning code and are sure the functionality is the same, start tuning. I'd suggest starting with 2-3 parameters at most or, OTOH, running much longer, say 30K-40K games.
3) If instead you want to test new ideas, I'd suggest testing them one by one.
4) CLOP works no miracles; tuning is hard and requires a lot of self-discipline, and given that SF is a mature engine it is even harder. But don't give up: the effort will be rewarded in the long term. You just need a lot of "patience and perseverance"; this could well be the chess engine developer's motto.
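To put these game counts in perspective, here is a back-of-the-envelope estimate of how many games a single comparison needs before its 95% error bar shrinks to a given Elo margin. It assumes independent games, a score near 50%, and a per-game score variance of (1 - draw rate) / 4; the draw rate is an input you would measure from your own games.

```python
import math


def games_needed(elo_margin: float, draw_rate: float, z: float = 1.96) -> int:
    """Games needed so that z * (standard error of the mean score) equals the score gap of elo_margin."""
    # Score gap corresponding to elo_margin, measured from an even (50%) score.
    delta = 1.0 / (1.0 + 10.0 ** (-elo_margin / 400.0)) - 0.5
    # Per-game score variance at a 50% score: wins and losses deviate by 0.5, draws by 0.
    variance = (1.0 - draw_rate) / 4.0
    return math.ceil((z / delta) ** 2 * variance)


if __name__ == "__main__":
    for margin in (2, 5, 10):
        print(f"+/-{margin} Elo: about {games_needed(margin, draw_rate=0.46)} games")
```

With a 46% draw rate this gives on the order of 10,000 games to resolve ±5 Elo, and that is for one fixed comparison, not for searching a 5-dimensional parameter space.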
Gary
Joined: 12 Dec 2006 Posts: 338
Posted: Sat Mar 10, 2012 3:40 pm Post subject: Re: CLOP on Stockfish
Thanks Marco! I agree that the patch is attempting to take on too much in one go. It was my first attempt with CLOP, so more of an experiment.
Verifying with bench was indeed very important. With all parameters initially set to 100, the bench output was the same as the default. And CLOP did cluster the values around 100, so it seems to be working well.
I'll try breaking it up into smaller chunks and see how it goes.
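The "100 means unchanged" convention can be sketched as integer percentage scaling of default weights. The term names and default values below are hypothetical, not Stockfish's actual evaluation code (which is C++); the point is only that every parameter at 100 reproduces the defaults exactly.

```python
# Hypothetical default (midgame, endgame) weights for a few threat terms.
DEFAULTS = {
    "ThreatByMinor": (10, 20),
    "ThreatByRook": (8, 16),
    "ThreatByQueen": (6, 12),
}


def scaled_weights(percent: dict) -> dict:
    """Scale each default weight pair by percent[name] / 100 using integer arithmetic."""
    out = {}
    for name, (mg, eg) in DEFAULTS.items():
        p = percent.get(name, 100)  # 100 means "leave this term unchanged"
        out[name] = (mg * p // 100, eg * p // 100)
    return out


if __name__ == "__main__":
    assert scaled_weights({}) == DEFAULTS  # all at 100: identical to the defaults
    print(scaled_weights({"ThreatByRook": 150}))
```

Because 100 maps back to the exact default values, an engine built this way should produce the same bench output as the unpatched version when every parameter is 100, which is the check Marco recommends before tuning.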
Rémi Coulom
Joined: 24 Apr 2006 Posts: 350
Posted: Sat Mar 10, 2012 3:41 pm Post subject: Re: CLOP on Stockfish
The win rates CLOP reports are biased towards optimistic values. So you should expect that the strength of the program you get at the end is inferior to what CLOP reports.
If you wish to optimize for small improvements (a few Elo points), then you need to play many more games.
Rémi
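One generic way to see why the win rate of a selected optimum tends to be optimistic is the "winner's curse": if many candidate settings are each measured with a finite number of games and the best-looking one is kept, its measured score is biased upward. The toy simulation below shows that selection effect only; it is not a model of CLOP's actual regression, and all numbers are arbitrary.

```python
import random


def winners_curse(num_candidates: int = 20, games: int = 200,
                  trials: int = 200, seed: int = 1) -> float:
    """Average observed score of the best-looking candidate when every candidate truly scores 0.5."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Each candidate plays `games` win/loss games at a true 50% rate; keep the best observed score.
        best = max(
            sum(rng.random() < 0.5 for _ in range(games)) / games
            for _ in range(num_candidates)
        )
        total += best
    return total / trials


if __name__ == "__main__":
    print(f"1 candidate:   {winners_curse(num_candidates=1):.3f}")
    print(f"20 candidates: {winners_curse(num_candidates=20):.3f}")
```

With one candidate the average stays near 0.5; picking the best of 20 equally strong candidates reports a clearly higher score even though none of them is actually better.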
Gary
Joined: 12 Dec 2006 Posts: 338
Posted: Sat Mar 10, 2012 6:42 pm Post subject: Re: CLOP on Stockfish
Rémi Coulom wrote:
> The win rates CLOP reports are biased towards optimistic values. So you should expect that the strength of the program you get at the end is inferior to what CLOP reports.
> If you wish to optimize for small improvements (a few Elo points), then you need to play many more games.
> Rémi
Thanks Rémi, good to know that the Elo is a bit optimistic. CLOP is a fantastic tool, btw! Thanks for releasing it for us all to play with.
Gary
Joined: 12 Dec 2006 Posts: 338
Posted: Mon Mar 12, 2012 7:26 pm Post subject: Re: CLOP on Stockfish
Rémi Coulom wrote:
> The win rates CLOP reports are biased towards optimistic values. So you should expect that the strength of the program you get at the end is inferior to what CLOP reports.
> If you wish to optimize for small improvements (a few Elo points), then you need to play many more games.
> Rémi
A question regarding the opening book to use for CLOP. Do you think it's better to use different neutral opening positions (based on CLOP seed), or just start from the start position?
I was using fixed-depth testing with the clop-cutechess adapter (which is awesome, thanks Ilari!), and with only small changes to the parameters, I was ending up with the same game a lot of the time. So I switched it to use a different opening position for each trial. Not sure yet how it's working out.
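A minimal sketch of choosing a different start position for each trial, deterministically from the seed CLOP hands to the connection script. The FEN pool and the `opening_for_seed` helper are hypothetical; in practice the positions would come from a file or an opening book.

```python
import random

# Hypothetical pool of neutral start positions (FEN strings).
OPENINGS = [
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",     # initial position
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1",  # after 1.e4
    "rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR b KQkq d3 0 1",  # after 1.d4
    "rnbqkbnr/pppppppp/8/8/2P5/8/PP1PPPPP/RNBQKBNR b KQkq c3 0 1",  # after 1.c4
]


def opening_for_seed(seed: int) -> str:
    """Deterministically map a trial seed to one start position from the pool."""
    return OPENINGS[random.Random(seed).randrange(len(OPENINGS))]


if __name__ == "__main__":
    for s in range(5):
        print(s, opening_for_seed(s))
```

Since the same seed always yields the same position, a trial can be reproduced exactly, while different trials spread over the pool instead of replaying one deterministic fixed-depth game.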
Marco Costalba
Joined: 14 Jun 2008 Posts: 2090
Posted: Mon Mar 12, 2012 7:34 pm Post subject: Re: CLOP on Stockfish
gladius wrote:
> A question regarding the opening book to use for CLOP. Do you think it's better to use different neutral opening positions (based on CLOP seed), or just start from the start position?
We use varied.bin or performance.bin
http://wbec-ridderkerk.nl/html/download.htm
But we never did a serious opening book test, i.e. running different tournaments between the same players with different opening books / start positions (or even without any book) and verifying that the results were consistent.
Rémi Coulom
Joined: 24 Apr 2006 Posts: 350
Posted: Mon Mar 12, 2012 10:03 pm Post subject: Re: CLOP on Stockfish
gladius wrote:
> A question regarding the opening book to use for CLOP. Do you think it's better to use different neutral opening positions (based on CLOP seed), or just start from the start position?
> I was using fixed-depth testing with the clop-cutechess adapter (which is awesome, thanks Ilari!), and with only small changes to the parameters, I was ending up with the same game a lot of the time. So I switched it to use a different opening position for each trial. Not sure yet how it's working out.
It is good to introduce variety in the opening.
It is not extremely important to have balanced openings if you use replications: just play one game as white, and one game as black for each opening.
I don't know whether your cutechess scripts alternated colors. But if the program always plays white, that may also explain some of the optimistic bias in the win rate reported by CLOP.
Rémi
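The replication idea (one game as white and one as black for each opening) can be sketched as a paired schedule, scored from the tuned engine's point of view. The helper names are hypothetical; results use PGN result strings.

```python
from typing import List, Tuple


def paired_schedule(openings: List[str]) -> List[Tuple[str, str]]:
    """Each opening is played twice: once with the tuned engine as White, once as Black."""
    schedule = []
    for fen in openings:
        schedule.append((fen, "white"))
        schedule.append((fen, "black"))
    return schedule


def tuned_score(result: str, tuned_color: str) -> float:
    """Score for the tuned engine given a PGN result string ("1-0", "0-1" or "1/2-1/2")."""
    if result == "1/2-1/2":
        return 0.5
    if result not in ("1-0", "0-1"):
        raise ValueError(f"unexpected result: {result!r}")
    white_won = result == "1-0"
    return 1.0 if white_won == (tuned_color == "white") else 0.0


if __name__ == "__main__":
    # Extreme case: White wins every game regardless of engine strength.
    sched = paired_schedule(["opening A", "opening B"])
    scores = [tuned_score("1-0", color) for _, color in sched]
    print(scores, sum(scores) / len(scores))
```

In the extreme case where White always wins, the paired schedule reports a neutral 50% score, whereas always giving the tuned engine White would report 100%. This is the color bias Rémi describes.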
Gary
Joined: 12 Dec 2006 Posts: 338
Posted: Tue Mar 13, 2012 2:19 pm Post subject: Re: CLOP on Stockfish
mcostalba wrote:
> We use varied.bin or performance.bin
> http://wbec-ridderkerk.nl/html/download.htm
> But we never did a serious opening book test, i.e. running different tournaments between the same players with different opening books / start positions (or even without any book) and verifying that the results were consistent.
Thanks Marco. Testing with varied.bin in fixed-node-count games on parameters with a big Elo impact (like weighting material from 0-100%) worked very well. It quickly figured out that a low material % is bad.
Gary
Joined: 12 Dec 2006 Posts: 338
Posted: Tue Mar 13, 2012 2:21 pm Post subject: Re: CLOP on Stockfish
Rémi Coulom wrote:
> It is good to introduce variety in the opening.
> It is not extremely important to have balanced openings if you use replications: just play one game as white, and one game as black for each opening.
> I don't know whether your cutechess scripts alternated colors. But if the program always plays white, that may also explain some of the optimistic bias in the win rate reported by CLOP.
> Rémi
I was using the "Replications 2" parameter; however, I was not setting the cutechess srand seed based on the CLOP seed, so it was not useful! I fixed that, and the resulting test (using material as in the post above) was very successful.
Thanks!
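A sketch of building a cutechess-cli command for one CLOP trial with the book randomness tied to the CLOP seed through cutechess-cli's `-srand` option. It assumes the connection script already has the seed and the parameter name/value pairs from CLOP, and that the tuned parameters are exposed as UCI options; the option names, engine path and `name=` labels are hypothetical, and the flags should be checked against `cutechess-cli --help` for your version. The list is only built here, not executed.

```python
from typing import Dict, List


def cutechess_args(seed: int, params: Dict[str, str],
                   engine: str = "./stockfish",
                   book: str = "varied.bin",
                   tc: str = "40/2+0.01") -> List[str]:
    """Build a cutechess-cli argument list for a single game of one CLOP trial."""
    # Tuned engine: CLOP's parameter values passed as (hypothetical) UCI options.
    tuned = ["-engine", f"cmd={engine}", "proto=uci", "name=tuned"]
    tuned += [f"option.{name}={value}" for name, value in params.items()]
    # Reference engine with default settings.
    base = ["-engine", f"cmd={engine}", "proto=uci", "name=base"]
    return (["cutechess-cli"] + tuned + base
            + ["-each", f"tc={tc}", f"book={book}",
               "-games", "1",
               # Seed the book move selection from the CLOP seed, so the
               # opening is reproducible for a given seed.
               "-srand", str(seed)])


if __name__ == "__main__":
    print(" ".join(cutechess_args(7, {"MaterialWeight": "90"})))
```

The design choice is that the opening depends only on the seed, so replications that share a seed can share an opening instead of each drawing an unrelated random line from the book.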