Thanks. Good match.
With all the hype about Houdini, it should have done better.
I am now playing Houdini vs. IvanHoe 12i.
The winner will play Firebird 1.3.1.
Best,
Gerold.
P.S. It would be nice if you could play this one over with
all the latest, up-to-date engines again.
I have been trying to run some games with "fast" time controls under Arena (it looks like Arena was used in your games also).
After running the games, I load all of the separate tournaments (I run on two quads) into SCID and delete the losses on time (search the headers for "on time" and negate the filter before exporting). Then I use pgn-extract to remove any duplicate games before finally running BayesElo. Even using random positions from Bob Hyatt's 4,000-position suite, there are typically some duplicate games.
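If anyone prefers to script that cleanup pass, here is a minimal sketch with python-chess instead of SCID and pgn-extract; the Termination/comment check is an assumption about how Arena tags time forfeits and may need adjusting for your setup:

[code]
# Sketch of the cleanup pass: drop losses on time, then remove
# duplicate games (same move sequence) before rating.
# Assumptions: input file games.pgn; time forfeits appear in the
# Termination tag or in a trailing "on time" comment.
import chess.pgn

seen = set()  # move sequences already kept, for duplicate removal

with open("games.pgn") as pgn_in, open("clean.pgn", "w") as pgn_out:
    while True:
        game = chess.pgn.read_game(pgn_in)
        if game is None:
            break
        termination = game.headers.get("Termination", "").lower()
        final_comment = (game.end().comment or "").lower()
        if "time" in termination or "on time" in final_comment:
            continue  # skip losses on time
        moves = tuple(m.uci() for m in game.mainline_moves())
        if moves in seen:
            continue  # skip exact duplicates
        seen.add(moves)
        print(game, file=pgn_out, end="\n\n")
[/code]

The cleaned file can then go straight to BayesElo.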
I have found that some engines handle "fast" times much better than others. Moreover, engine startup time can result in time losses (for example, initializing EGTB files, although tablebases are rarely probed at search times of 0.1 seconds or less).
One thing that I have found helps a lot with Arena is to run only one pair of engines per copy of Arena (I run 8 copies on the two quads--no pondering, of course). It is a bit more tedious to set things up this way, but the tournament duplicate command is helpful. This works better than using more engines, since Arena will not shut down and restart each engine if there are only two. I have tried turning engine restart off, but with more than two engines per tournament Arena always seems to restart them anyway. This method saves a lot of time, since in some cases engine startup time is a significant portion of the entire game time. For much longer time controls it would not be worth the bother of setting up separate individual pairings.
My hope is that removing the time losses and duplicate games will minimize any resulting ratings impact. Incidentally, the fast games seem to work quite well when testing evaluation changes, but I use longer times for search-related testing (as others have mentioned). These suggestions may not be as important for larger engine ranking tournaments, but I am primarily interested in testing/measuring small improvements in Tinker, which usually takes several thousand games.
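To put rough numbers on "several thousand games", here is a back-of-envelope sketch of the 95% Elo error bar; the independence of games and the 30% draw rate are assumptions, not measurements:

[code]
# Approximate 95% Elo error bar after n games, to show why small
# improvements need thousands of games. Draw rate is a guess.
import math

def elo(p):
    """Score fraction -> Elo difference."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def error_bar(n, p=0.5, draw=0.3):
    win = p - draw / 2.0              # win/loss rates consistent with score p
    loss = 1.0 - win - draw
    var = win * (1 - p) ** 2 + draw * (0.5 - p) ** 2 + loss * p ** 2
    se = math.sqrt(var / n)           # standard error of the mean score
    return elo(p + 1.96 * se) - elo(p)

for n in (100, 1000, 4000, 16000):
    print(f"{n:6d} games: +/- {error_bar(n):4.1f} Elo")
[/code]

With those assumptions, 100 games gives an error bar near +/- 60 Elo, and getting below +/- 5 Elo takes on the order of fifteen thousand games, which lines up with needing several thousand games to measure small changes.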
brianr wrote:I have been trying to run some games with "fast" time controls under Arena... [snip]
I find Arena is very bad for ultra-fast games (though great for longer ones). I suspect it has something to do with the time accounting related to printing output to the screen. For super-fast games I like cutechess.
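For anyone who wants to try it, here is a minimal sketch of driving cutechess-cli from Python; the engine paths are hypothetical placeholders and the 0.1 s per move setting is only an example:

[code]
# Minimal sketch of launching an ultra-fast match with cutechess-cli.
# Engine paths are hypothetical placeholders; adjust st/concurrency
# for your hardware.
import subprocess

subprocess.run([
    "cutechess-cli",
    "-engine", "cmd=./engine_a", "proto=uci",   # placeholder binary
    "-engine", "cmd=./engine_b", "proto=uci",   # placeholder binary
    "-each", "st=0.1",        # fixed 0.1 s per move
    "-games", "2",            # two games per round...
    "-rounds", "500",         # ...for 1000 games total
    "-repeat",                # replay each opening with colors reversed
    "-recover",               # restart an engine if it crashes
    "-concurrency", "8",
    "-pgnout", "fastgames.pgn",
], check=True)
[/code]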
I have updated Houdini to 1.01 x64 1cpu, but I am getting crashes with
an illegal-move notice. The 1.01 2cpu version has therefore replaced the previous
Houdini, i.e. 1.0 1cpu; I am curious whether the 2cpu version fares better.
Stockfish is replaced by Protector 1.3.5. I know it's a little weak for this group.
brianr wrote:I have been trying to run some games with "fast" time controls under Arena... [snip]
Yep, I have been using Arena lately.
One suggestion against time losses could be to set the Auto Flag to off.
Then you only have to check for illegal moves and the occasional weird Arena adjudication, which happens now and then.
Best,
brianr wrote:I have been trying to run some games with "fast" time controls under Arena... [snip]
One suggestion against time losses could be to set the Auto Flag to off. [snip]
I used to do that, but it creates other problems. For example, when they run out of time, some engines will make a depth-1 move, some will hang, and some will search to a minimum depth (of, say, 6 or 8). This artificially skews results pretty strongly.
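To illustrate, here is a toy, self-contained sketch of the time handling that separates those engines: keep a fallback move so the engine never hangs, and only trust fully completed iterations. search_to_depth() is a stand-in for real engine internals.

[code]
# Toy iterative deepening with a hard deadline: never hang, never
# play nothing, only trust fully completed iterations.
# search_to_depth() is a stand-in for real engine internals.
import random
import time

def search_to_depth(moves, depth, deadline):
    """Fake search whose work grows with depth; aborts at the deadline."""
    for _ in range(10 ** depth):
        if time.monotonic() >= deadline:
            return None, False            # aborted mid-iteration
    return random.choice(moves), True

def pick_move(moves, budget_s):
    deadline = time.monotonic() + budget_s
    best = moves[0]                       # fallback move: never hang
    depth = 1
    while time.monotonic() < deadline:
        move, finished = search_to_depth(moves, depth, deadline)
        if finished:
            best = move                   # keep the last completed result
        depth += 1
    return best

print(pick_move(["e2e4", "d2d4", "g1f3"], budget_s=0.1))
[/code]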
-Sam
On the other hand, some engines crash at longer TCs (one or two hours per engine, I mean). Some problems arose due to the engines, others due to the GUIs. There is no such thing as perfect testing.
My ultra-fast tests started without problems. I am happy so far.