For most of us, testing changes is easy when you are testing "the big ones", I mean extensions and LMR, aspiration windows, lazy eval, futility pruning, TT, and so on. They all give a big amount of Elo. But in my engine those things are now implemented, and I have also been tuning them. My problem now is that I am making small changes that need lots of games to test: the typical 5 Elo points. For example, my last change was to give a small penalty for having no pawns. This is where my testing goes nowhere. I test with 4,000 games (which takes approx. 2 days for me), and the error margins of both versions don't let me tell whether the change is good or not. I would possibly need 15,000 or even more games to reach a conclusion, and 4 or 5 days of waiting for the result of that small change.
I don't know about you, but I think I will give my intuition a chance in those situations. I don't want to wait so long for a small change. I will still run the 4,000-game test, but only to check that there is no big bug or breakage.
I would like to know how others deal with this kind of problem and test very small changes. Are you patient enough to run a ton-of-games tourney, or do you rely on other kinds of considerations? What is your experience in this field?
Thanks.
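To see why 4,000 games cannot settle a 5 Elo question, here is a minimal Python sketch that turns a win/draw/loss count into an Elo estimate with a 95% error bar. It assumes the standard logistic Elo model and a normal approximation of the mean score; the function names and the sample W/D/L numbers are hypothetical, not results from the engine discussed here.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score (0 < score < 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) with a z-sigma confidence interval (1.96 ~ 95%)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Sample variance of a single game's result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)  # standard error of the mean score
    low, high = score - z * stderr, score + z * stderr
    return elo_from_score(score), elo_from_score(low), elo_from_score(high)

# Hypothetical 4,000-game result: 1100 wins, 1830 draws, 1070 losses.
elo, low, high = elo_with_margin(1100, 1830, 1070)
print(f"{elo:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}]")
```

With this hypothetical result the estimate is about +2.6 Elo with an interval of roughly ±8 Elo, so a 5 Elo improvement is lost inside the noise.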
Testing very small changes ( <= 5 ELO points of gain)
-
Kempelen
-
FrancoisK
Re: Testing very small changes ( <= 5 ELO points of gain)
Hi Firmin,
A major question.
As big gains have become miracles for me, I am now running 32,000 games per tested change: 16 opponents, 1,000 starting positions (it used to be 16,000).
I have no cluster available, so a couple of years ago I built an internet cluster app (à la SETI) to be able to use as many cores as my friends and colleagues would agree to give me. I think Don Dailey is using something similar. The downside: since it can run on "any" hardware with unknown load, it has to run in fixed-nodes mode (NPS=xxx in WinBoard) to produce reliable results. This means it will not work for changes that affect NPS, and the results can be distorted in unknown ways, so I regularly have to cross-check with real time controls.
François
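The game count François gives follows from the setup: 16 opponents × 1,000 starting positions × 2 colors = 32,000 games. A sketch of such a schedule, assuming each position is played once with each color against every opponent and every game gets the same node budget instead of a clock (the `Game` record, the engine name `"test"` and the default node count are hypothetical placeholders, not a WinBoard API):

```python
from dataclasses import dataclass
from itertools import product
from typing import List

@dataclass(frozen=True)
class Game:
    white: str
    black: str
    fen: str
    nodes: int  # node budget used instead of wall-clock time (hypothetical value)

def build_schedule(engine: str, opponents: List[str], positions: List[str],
                   nodes: int = 100_000) -> List[Game]:
    """Pair the engine under test with every opponent on every position,
    once with each color, so any bias in the starting position cancels out."""
    games = []
    for opp, fen in product(opponents, positions):
        games.append(Game(engine, opp, fen, nodes))
        games.append(Game(opp, engine, fen, nodes))
    return games

opponents = [f"opponent{i}" for i in range(16)]
positions = [f"position{j}" for j in range(1000)]  # stand-ins for FEN strings
schedule = build_schedule("test", opponents, positions)
print(len(schedule))  # 32000
```

Because every game has the same node budget, a slow or busy volunteer machine plays the same games as a fast one; the price, as noted above, is that changes affecting NPS become invisible.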
-
bob
Re: Testing very small changes ( <= 5 ELO points of gain)
There is no solution except to use a large number of games. You can use very fast time controls to speed up the process.
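This point can be made quantitative. Near a 50% score the error bar shrinks only with the square root of the number of games, so halving the margin costs four times as many games. A minimal Python sketch, assuming a per-game standard deviation of about 0.37 (roughly 45% draws at an even score) and a 95% (z = 1.96) confidence level; both figures are assumptions to adjust to your own test conditions, and `games_needed` is a hypothetical helper name.

```python
import math

def games_needed(margin_elo: float, sigma: float = 0.37, z: float = 1.96) -> int:
    """Games needed so the z-sigma half-width of the Elo estimate is margin_elo.

    sigma: assumed per-game standard deviation of the result (1 / 0.5 / 0 scoring).
    The estimate is linearized around a 50% score, where the Elo curve is steepest
    in the sense dElo/dscore = 400 / (ln(10) * 0.5 * 0.5).
    """
    elo_per_score = 400.0 / (math.log(10) * 0.25)
    return math.ceil((z * sigma * elo_per_score / margin_elo) ** 2)

for margin in (10, 5, 2.5):
    print(margin, games_needed(margin))
```

Under these assumptions a ±5 Elo half-width takes roughly 10,000 games. When each version's Elo is measured in its own gauntlet and the two results are compared, the variance of the difference doubles, so each version needs about twice as many games again. Very fast time controls do not change these counts; they only shorten the wall-clock time needed to play them.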
-
Daniel Shawul
Re: Testing very small changes ( <= 5 ELO points of gain)
For eval changes, why don't you use very fast time controls? I guess statistics matter more than game quality for such changes.
-
Ferdy
Re: Testing very small changes ( <= 5 ELO points of gain)
Can you elaborate on what this change is?
How do you define a small change? Is it because the code is simple, or because of the Elo you measured after running a lot of games?
In my case, if the idea is reasonable, I will generally accept a small Elo improvement provided I have reached at least 10k test games, for example with king safety eval changes. One way to minimize the number of test games and still test your change thoroughly is to use appropriate test positions. For example, if your change is all about passed pawns, try to select test positions where there is a good chance that passed pawns will arise during the game.
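As a rough way to build such a passed-pawn test set, the sketch below keeps only starting positions that already contain at least one passed pawn. It is plain Python with a minimal FEN reader (only the piece-placement field is parsed) and uses the usual definition: no enemy pawn ahead of the pawn on its own or an adjacent file. Positions where a passed pawn is merely likely to arise later would need a different heuristic; the function names and sample positions are hypothetical.

```python
def parse_board(fen: str) -> dict:
    """Map (file, rank) -> piece letter, both 0..7, from a FEN's first field."""
    board = {}
    for i, row in enumerate(fen.split()[0].split("/")):
        rank = 7 - i  # FEN lists rank 8 first
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)  # run of empty squares
            else:
                board[(file, rank)] = ch
                file += 1
    return board

def is_passed(board: dict, file: int, rank: int, white: bool) -> bool:
    """No enemy pawn in front of this pawn on its own or an adjacent file."""
    enemy = "p" if white else "P"
    step = 1 if white else -1
    r = rank + step
    while 0 <= r <= 7:
        if any(board.get((f, r)) == enemy for f in (file - 1, file, file + 1)):
            return False
        r += step
    return True

def has_passed_pawn(fen: str) -> bool:
    board = parse_board(fen)
    return any(piece in ("P", "p") and is_passed(board, f, r, piece == "P")
               for (f, r), piece in board.items())

candidates = [
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    "4k3/8/8/3P4/8/8/8/4K3 w - - 0 1",
    "4k3/3p4/8/3P4/8/8/8/4K3 w - - 0 1",
]
passed_pawn_set = [fen for fen in candidates if has_passed_pawn(fen)]
print(passed_pawn_set)
```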
To have a good chance of getting a big Elo boost, combine your changes to eval and search. For example, if you change something in search, also implement a related eval change. If you change your eval to add a queen-and-knight combination attack against the opponent's king, then in your search try to avoid reducing knight moves (maneuvering moves) that attack the opponent's king when the queen is already close to that king. Or, in another case, if the last move was a queen move that brought it closer to the opponent's king and you now have a knight move that also approaches that king, you may choose not to reduce the knight move.
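The search-side rule above could look roughly like the predicate below, which a late-move-reduction loop would consult before reducing a move. This is only a sketch: the square encoding, the `MoveInfo` record, the Chebyshev (king-move) distance as the meaning of "close", the `queen_near = 3` threshold, and the `queen_just_approached` flag (assumed to be tracked elsewhere in the search) are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Square = Tuple[int, int]  # (file, rank), both 0..7

def king_distance(a: Square, b: Square) -> int:
    """Chebyshev distance: number of king moves between two squares."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

@dataclass
class MoveInfo:
    piece: str  # "N", "Q", ... for the side to move (hypothetical encoding)
    from_sq: Square
    to_sq: Square

def skip_reduction(move: MoveInfo, our_queen: Optional[Square], enemy_king: Square,
                   queen_just_approached: bool = False, queen_near: int = 3) -> bool:
    """True if this knight move should be searched at full depth instead of reduced."""
    if move.piece != "N" or our_queen is None:
        return False
    knight_approaches = (king_distance(move.to_sq, enemy_king)
                         < king_distance(move.from_sq, enemy_king))
    queen_close = king_distance(our_queen, enemy_king) <= queen_near
    # Case 1: the queen is already near the king.
    # Case 2: our previous move brought the queen closer to the king.
    return knight_approaches and (queen_close or queen_just_approached)

knight = MoveInfo("N", (1, 0), (2, 2))
print(skip_reduction(knight, our_queen=(5, 5), enemy_king=(4, 4)))
```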
-
Kempelen
Re: Testing very small changes ( <= 5 ELO points of gain)
Well, this is an idea I had, but maybe not a good one. I thought that in an endgame, if I have no pawns, winning could be more difficult, so trying to hold on to the pawns until it becomes unavoidable could give my engine more chances. Of course, it is such a subtle eval term that I expected it to be worth less than 5 Elo (if it turns out to be good at all in a ton-of-games test).
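The term described above could be sketched as a small penalty for a side with no pawns left, faded in as the game approaches the endgame. The 15-centipawn value, the `endgame_weight` phase scale (0.0 = opening/middlegame, 1.0 = pure endgame) and the function name are hypothetical; the score is from White's point of view.

```python
NO_PAWNS_PENALTY = 15  # centipawns; hypothetical starting value, to be tuned

def no_pawns_term(white_pawns: int, black_pawns: int, endgame_weight: float) -> int:
    """Penalize a side that has no pawns, scaled by how far the game is into the endgame.

    endgame_weight: 0.0 in the opening/middlegame ... 1.0 in a pure endgame.
    Returns centipawns from White's point of view.
    """
    score = 0
    if white_pawns == 0:
        score -= NO_PAWNS_PENALTY
    if black_pawns == 0:
        score += NO_PAWNS_PENALTY
    return round(score * endgame_weight)

print(no_pawns_term(0, 3, 1.0))
```

Since a term like this is expected to be worth only a few Elo, it falls squarely into the range where very large game counts are needed to confirm it.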
It is because the Elo I got after running a lot of games is less than or equal to 5.
Very good observations, which I will take into account. It seems natural to both of us that a change must be tested and checked for whether it is reasonable. Your 10k games sound like you are only trying to see whether something is broken or not, because 10k games might not be enough.
What I am starting to do is something similar to you: check with many games (around 5k for me, 10k for you), see whether the change is reasonable, and then adopt it.
Talking about king safety, this is an eval term I am thinking of rewriting completely. It seems a very deep one to me, and testing it is more complex than I thought because, as you point out, the search must be tuned accordingly. I have not done it yet, but it sounds like I will need a lot of work to get something better than what I have now.
Thanks.