Testing very small changes ( <= 5 ELO points of gain)

FrancoisK · Post by **FrancoisK** » Fri Apr 08, 2011 2:10 pm

Hi Firmin,

A major question

As big gains have become miracles for me, I now run 32,000 games per tested change: 16 opponents and 1,000 starting positions (it used to be 16,000 games).
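The 32,000 figure is consistent with playing each of the 1,000 starting positions twice against each of the 16 opponents, once with each colour, which is a common way to cancel out opening bias. A minimal sketch of such a schedule, assuming that pairing (the post does not spell it out); the names `Game` and `schedule` are hypothetical:

```python
from itertools import product
from typing import Iterator, NamedTuple


class Game(NamedTuple):
    opponent: int        # index of the opponent engine
    position: int        # index into the starting-position set
    test_is_white: bool  # colour played by the engine under test


def schedule(n_opponents: int, n_positions: int) -> Iterator[Game]:
    """Yield every (opponent, position, colour) combination.

    Each starting position is played twice against each opponent,
    once with each colour, so opening bias largely cancels out.
    """
    for opp, pos, white in product(range(n_opponents), range(n_positions), (True, False)):
        yield Game(opp, pos, white)


games = list(schedule(16, 1000))
print(len(games))  # 32000
```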
I have no cluster available, so a couple of years ago I built an internet cluster app (à la SETI) so that I can use as many cores as my friends and colleagues agree to give me... I think Don Dailey is using something similar. The downside: because it can run on "any" hardware with unknown load, it has to run in fixed-nodes mode (NPS=xxx in WinBoard) to produce reliable results. That means it does not work for changes that affect NPS, and the results can be distorted in unknown ways, so I have to regularly cross-check at real time controls.
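WinBoard protocol version 2 defines an `nps` command that tells the engine to stop reading the wall clock and instead measure time by its node count, converting nodes to seconds by dividing by the given node rate. A minimal sketch of such a node-based clock inside an engine, assuming a simple per-move time budget; the class `NodeClock` and its methods are hypothetical:

```python
class NodeClock:
    """Virtual clock driven by node count instead of wall time.

    Elapsed 'seconds' = nodes searched / node_rate, so the same search
    stops at the same point on fast or slow, loaded or idle hardware.
    """

    def __init__(self, node_rate: int) -> None:
        if node_rate <= 0:
            raise ValueError("node_rate must be positive")
        self.node_rate = node_rate
        self.nodes = 0

    def count_node(self) -> None:
        # Hypothetical hook: called once per node visited by the search.
        self.nodes += 1

    def elapsed(self) -> float:
        return self.nodes / self.node_rate

    def out_of_time(self, budget_seconds: float) -> bool:
        return self.elapsed() >= budget_seconds


# "st 8" plus "nps 10000" means the search may use at most 80,000 nodes.
clock = NodeClock(node_rate=10_000)
for _ in range(80_000):
    clock.count_node()
print(clock.elapsed(), clock.out_of_time(8.0))  # 8.0 True
```

Because the budget is counted in nodes, a change that makes each node more expensive (a heavier eval, say) costs nothing on this clock, which is why such a setup cannot measure changes that affect NPS.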

François

bob · Post by **bob** » Fri Apr 08, 2011 7:15 pm

Kempelen wrote: For most of us, testing changes is easy when you are testing "the big ones", I mean: extensions and LMR, aspiration windows, lazy eval, futility, TT, and so on. They all give a big amount of Elo. But now those things are implemented in my engine, and I have also been tuning them. My problem now is that I am making small changes that need lots of games to test: the typical 5 Elo points. For example, my last change was to give a little penalty for having no pawns. This is where my testing goes nowhere. I test with 4,000 games (which takes approx. 2 days for me), and the error margins of both versions don't tell me whether the change is good or not. I would possibly need 15,000 games or even more to reach a conclusion, and 4 or 5 days of waiting for the result of that small change.

I don't know about you, but I think I will give my intuition a chance in those situations. I don't want to wait so long for a small change. I will still run the 4,000-game test, but only to check that there is no big bug or breakage.

I would like to know how others deal with this kind of problem and test very small changes. Are you patient enough to run a ton-of-games tourney, or do you rely on other considerations? What is your experience in this field?

Thanks.

There is no solution except to use a large number of games. You can use very fast time controls to speed the process up...
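To see why 4,000 games cannot resolve a 5 Elo change, here is a minimal sketch of the usual normal approximation for a match result. It assumes independent games, a 30% draw rate (a placeholder; measure your own) and the logistic Elo formula; the function names are hypothetical:

```python
import math

Z95 = 1.96  # two-sided 95% normal quantile


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an average score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def per_game_variance(score: float, draw_rate: float) -> float:
    """Variance of one game result (1, 0.5 or 0) for a given mean score and draw rate."""
    win = score - draw_rate / 2.0
    return win * 1.0 + draw_rate * 0.25 - score ** 2


def elo_slope(score: float) -> float:
    """Derivative d(Elo)/d(score) of the logistic formula (delta method)."""
    return 400.0 / (math.log(10) * score * (1.0 - score))


def elo_margin(games: int, score: float = 0.5, draw_rate: float = 0.3) -> float:
    """Approximate 95% Elo error margin after `games` games."""
    se_score = math.sqrt(per_game_variance(score, draw_rate) / games)
    return Z95 * elo_slope(score) * se_score


def games_needed(margin_elo: float, score: float = 0.5, draw_rate: float = 0.3) -> int:
    """Games required for the 95% margin to shrink to `margin_elo`."""
    factor = Z95 * elo_slope(score) / margin_elo
    return math.ceil(per_game_variance(score, draw_rate) * factor ** 2)


for n in (4_000, 15_000, 32_000):
    print(n, round(elo_margin(n), 1))
print(games_needed(5.0))
```

Under these assumptions, 4,000 games leave a 95% margin of roughly ±9 Elo, and about 13,000 games are needed to bring it down to ±5 Elo. That margin applies to one version's result; comparing two versions measured in separate runs adds their variances, so the margin on the difference is about √2 larger, which helps explain the game counts mentioned in this thread.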

Daniel Shawul · Post by **Daniel Shawul** » Fri Apr 08, 2011 8:38 pm

For eval changes, why don't you use very fast time controls? I guess statistics matter more than the quality of the games for such changes.

Ferdy · Post by **Ferdy** » Sun Apr 10, 2011 6:12 pm

Can you elaborate on what this change is?

Kempelen wrote: my last change was to give a little penalty for having no pawns

How do you define a small change? Is it small because the code is simple, or because of the Elo you measured after running a lot of games?

In my case, if the idea is reasonable, I will generally accept a small Elo improvement provided the test reaches at least 10k games (king safety eval changes, for example). One way to minimize the number of test games while still testing your change thoroughly is to use appropriate test suites. For example, if your change is all about passed pawns, try to select starting positions where passed pawns are likely to arise in the game.
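Ferdy's idea of choosing starting positions that match the change can be sketched as a filter over a list of FEN strings. As a crude, assumed proxy for "passed pawns are likely to arise", this keeps only positions that already contain a passed pawn; the helper names are hypothetical, and the parser reads only the piece-placement field of the FEN:

```python
def board_from_fen(fen: str) -> list[list[str]]:
    """Return 8 rows (rank 8 first) of 8 squares; '.' marks an empty square."""
    rows = []
    for rank in fen.split()[0].split("/"):
        row: list[str] = []
        for ch in rank:
            row.extend("." * int(ch) if ch.isdigit() else ch)
        rows.append(row)
    return rows


def count_passed_pawns(fen: str) -> int:
    """Count passed pawns of both colours in the position described by `fen`."""
    board = board_from_fen(fen)
    passed = 0
    for r in range(8):
        for f in range(8):
            piece = board[r][f]
            if piece not in ("P", "p"):
                continue
            # Rows in front of the pawn: towards rank 8 (row 0) for White, rank 1 for Black.
            ahead = range(0, r) if piece == "P" else range(r + 1, 8)
            enemy = "p" if piece == "P" else "P"
            files = range(max(0, f - 1), min(7, f + 1) + 1)
            if not any(board[rr][ff] == enemy for rr in ahead for ff in files):
                passed += 1
    return passed


def select_positions(fens: list[str], min_passed: int = 1) -> list[str]:
    """Keep only starting positions that already contain passed pawns."""
    return [fen for fen in fens if count_passed_pawns(fen) >= min_passed]
```

A real selection would more likely look at pawn structures that tend to produce passed pawns later, but the filtering pattern stays the same.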

To have a very good chance of a big Elo boost, combine your eval and search changes. For example, if you change something in search, also implement a related eval change. If you change your eval to add a queen-and-knight combination attack on the opponent's king, then in search try to avoid reducing knight moves (maneuvering moves) that approach the opponent's king when the queen is already close to it. Likewise, if the last move brought the queen closer to the opponent's king and you now have a knight moving closer to that king, you may choose not to reduce the knight move.
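The search side of Ferdy's example amounts to an exemption from late move reductions. A minimal sketch, assuming squares are (file, rank) pairs from 0 to 7, "close" means king-move (Chebyshev) distance, and a hypothetical threshold `queen_near`; a real engine would call something like this from its LMR condition:

```python
from typing import Optional, Tuple

Square = Tuple[int, int]  # (file, rank), each 0..7


def chebyshev(a: Square, b: Square) -> int:
    """King-move distance between two squares."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def may_reduce_knight_move(from_sq: Square, to_sq: Square,
                           own_queen_sq: Optional[Square], enemy_king_sq: Square,
                           queen_just_approached: bool = False,
                           queen_near: int = 2) -> bool:
    """Return False (do not reduce) for a knight move that approaches the enemy
    king while our queen is close to it or has just moved closer; True otherwise.
    """
    if own_queen_sq is None:
        return True  # no queen, so no queen-and-knight attack to protect
    approaches = chebyshev(to_sq, enemy_king_sq) < chebyshev(from_sq, enemy_king_sq)
    queen_active = (chebyshev(own_queen_sq, enemy_king_sq) <= queen_near
                    or queen_just_approached)
    return not (approaches and queen_active)
```

The `queen_just_approached` flag covers Ferdy's second case, where the previous move brought the queen closer; how that flag is computed depends on the engine's move history.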

Kempelen · Post by **Kempelen** » Mon Apr 11, 2011 10:17 am

Ferdy wrote: Can you elaborate on what this change is?
my last change was to give a little penalty for having no pawns

Well, this is an idea I had, but maybe not a good one. I thought that in an endgame, if I have no pawns, winning could be more difficult, so trying to hold on to the pawns until giving them up is unavoidable could give my engine more chances. Of course, it is such a subtle eval term that I expect it to be worth less than 5 Elo (if it proves good at all in a ton-of-games test).
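A minimal sketch of the kind of term described, assuming a centipawn score from White's point of view, a crude "endgame" test based on total non-pawn material, and illustrative piece values; `NO_PAWNS_PENALTY` and `ENDGAME_MATERIAL` are hypothetical numbers, not Kempelen's actual code:

```python
PIECE_VALUES = {"N": 300, "B": 300, "R": 500, "Q": 900}  # illustrative centipawn values
NO_PAWNS_PENALTY = 20     # hypothetical size of the term, in centipawns
ENDGAME_MATERIAL = 2600   # hypothetical: total non-pawn material at or below this is 'endgame'


def non_pawn_material(pieces: str) -> int:
    """Sum of piece values; kings and pawns count as zero here."""
    return sum(PIECE_VALUES.get(p.upper(), 0) for p in pieces)


def no_pawns_term(white_pieces: str, black_pieces: str) -> int:
    """Score adjustment (White's view) for a side with no pawns left in an endgame.

    Each side's pieces are given as letters, e.g. "KRPP".
    """
    total = non_pawn_material(white_pieces) + non_pawn_material(black_pieces)
    if total > ENDGAME_MATERIAL:
        return 0  # not an endgame: the term does not apply
    score = 0
    for pieces, sign in ((white_pieces, +1), (black_pieces, -1)):
        if "P" not in pieces.upper():
            score -= sign * NO_PAWNS_PENALTY
    return score


print(no_pawns_term("KR", "KRP"))  # -20: White has no pawns in a rook endgame
```

Keeping the penalty small means it mainly acts as a tie-breaker, nudging the engine to keep its last pawn when other factors are roughly equal.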

Ferdy wrote: How do you define a small change? Is it small because the code is simple, or because of the Elo you measured after running a lot of games?

It is because the Elo gain I measured after running a lot of games is less than or equal to 5.

Ferdy wrote: In my case, if the idea is reasonable, I will generally accept a small Elo improvement provided the test reaches at least 10k games (king safety eval changes, for example). One way to minimize the number of test games while still testing your change thoroughly is to use appropriate test suites. For example, if your change is all about passed pawns, try to select starting positions where passed pawns are likely to arise in the game.

To have a very good chance of a big Elo boost, combine your eval and search changes. For example, if you change something in search, also implement a related eval change. If you change your eval to add a queen-and-knight combination attack on the opponent's king, then in search try to avoid reducing knight moves (maneuvering moves) that approach the opponent's king when the queen is already close to it. Likewise, if the last move brought the queen closer to the opponent's king and you now have a knight moving closer to that king, you may choose not to reduce the knight move.

Very good observations; I will take them into account. It seems natural to both of us that a change must be tested and also judged on whether it is reasonable. Your 10k games sound like a check only for whether something is broken, because 10k games may not be enough to resolve such a small gain.

What I am starting to do is something similar to you: check with many games (around 5k for me, 10k for you), and if the change looks reasonable, keep it.

Talking about king safety, this is an eval term I am thinking of rewriting completely. It seems to me a very deep one, and testing it is more complex than I thought, because, as you point out, the search must be tuned accordingly. I have not done it yet, but it looks like I will need a lot of work to get something better than what I have now.

Thanks.
