Larry K. requested I run a one core test with Stockfish 301213 playing Houdini 4. Here is the setup.
CPU i7 Q840
TC = 3m+3s
Houdini 4 x64A Contempt=0, Threads=1, All other settings default.
Stockfish 301213 Threads=1, All other settings default.
GM book to 8 moves, programs will play each side of opening.
Hash = 256mb
5-man Syzygy tablebases
Programs play to checkmate, or forced draw.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
Of course fifty games is a tiny sample, but these results are very different from the ones I'm getting with the same pairing, tested in a similar way but at 30" +.3", where Houdini 4 leads by 40 Elo after 317 games. The different time control and sample error could explain the difference, but I am beginning to suspect that the key difference between your test and most others is Hyperthreading. I too got worse results for Houdini than others did until I stopped testing with Hyperthreading on. I don't know why Hyperthreading would favor Stockfish and Komodo over Houdini, but it seems to be so.
Although it's pretty clear that hyperthreading should be off for single core testing, it's less clear for MP testing with threads = cores. As long as only one engine is being used at a time, either way should be "fair". The only issue is whether the engines simply play better in MP mode one way or the other. The consensus seems to be that HT off leads to best play, but there is some disagreement about that. I think this question needs to be answered clearly to determine which way of testing is best for MP. Maybe I'll do some tests to determine this soon.
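To put numbers on "tiny sample" and "sample error", here is a rough sketch of how an Elo difference and an approximate 95% margin can be estimated from a match score. It assumes the usual logistic Elo model and a normal approximation to the score; the function names and the win/draw/loss counts in the usage lines are made up for illustration and are not results from this thread.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction (0 < score < 1),
    assuming the standard logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match result.

    Uses a normal approximation for the mean score, so it is only a
    rough guide, especially for short matches.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    if not 0.0 < score < 1.0:
        raise ValueError("need a score strictly between 0 and 1")
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    eps = 1e-9               # keep the bounds inside (0, 1)
    low = elo_diff(max(eps, score - z * se))
    high = elo_diff(min(1.0 - eps, score + z * se))
    return elo_diff(score), low, high

if __name__ == "__main__":
    # Hypothetical counts, only to show how the margin shrinks with more games.
    for w, d, l in [(15, 25, 10), (150, 250, 100)]:
        elo, low, high = elo_with_margin(w, d, l)
        print(f"+{w} ={d} -{l}: {elo:+.1f} Elo ({low:+.1f} to {high:+.1f})")
```

With the same score percentage, ten times as many games narrows the interval by roughly a factor of three, which is why a fifty-game match cannot settle a difference of a few dozen Elo.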
lkaufman wrote:
Although it's pretty clear that hyperthreading should be off for single core testing, it's less clear for MP testing with threads = cores.
Tom's
4 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012c
No tablebases. No RTB used.
Hash 256
Relative Speed: 20.62
Knodes per second: 9.899
Don't jump to any conclusion, I have not stopped testing yet.
Your theory is still only a theory. But if it is correct, and HT helps Stockfish without hurting Houdini, then that is Houdini's problem.
Stockfish has the option to run HT. That is the whole point of the sleeping threads option.
At the setting being run below, Stockfish and Houdini are pretty equal right now. It could also be that Stockfish gains more from longer think times than Houdini does. This is also a trait of Stockfish.
mwyoung wrote:
Don't jump to any conclusion, I have not stopped testing yet.
Good, I'm far from ready to conclude anything.
Your theory is still only a theory. But if it is correct, and HT helps Stockfish without hurting Houdini, then that is Houdini's problem.
But it may be that HT hurts both engines, but hurts Houdini more. That's what I suspect.
Stockfish has the option to run HT. That is the whole point of the sleeping threads option.
Are you saying that this option should be set differently depending on whether HT is on or off? If so I didn't know that. This suggests my theory is right.
At the setting being run below, Stockfish and Houdini are pretty equal right now. It could also be that Stockfish gains more from longer think times than Houdini does. This is also a trait of Stockfish.
What longer think time? Three minutes on one core is almost the same as one minute on 4 cores. If you are comparing to my result then yes, this should account for about 15 Elo. My current result is -38 for SF vs H4 after 660 games at 30" +.3", one core, one at a time.
I don't know, but I do know it has never hurt Houdini before. I have run this test with Houdini 3, and it beats older versions of Stockfish. I have had this setup for over 2 years.
It is only against this month's versions of Stockfish that Houdini 3, Houdini 4, Critter, and Rybka are having major problems at this time control. That tells me it is not the testing but something in the new version of Stockfish, because Stockfish DD does not perform this way.
I even played Stockfish DD vs. the new Stockfish as a sanity check, and Stockfish DD got beaten badly at this time control.
This is what I am observing in my testing...
How would you suggest testing this? I can run Stockfish vs. Stockfish, 4 cores vs. 8 logical cores. This could tell us whether Stockfish is simply gaining more from HT, since most programmers claim there is no gain from HT.
Regarding the improvement of SF Dec 30 over SF DD: after more than 17,000 games for each against Komodo (TCEC and latest dev. version), the gain for SF is 15.7 Elo points. Anything much different from that must be sample error.
Regarding HT, no, testing with 8 vs 4 is a totally different test than HT on or off. If you can't turn off HT then someone else will have to do this test. Probably I will test whether the results vary much with HT on or off, but I need some accurate info on setting the sleeping threads option, as to whether it depends on HT. Can you or anyone point me to a thread where this was discussed/explained? Testing whether HT helps or hurts a given engine is probably just a matter of comparing NPS each way, although I'm not sure that is accurate in this case.
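For the NPS comparison, something like the following could be run once with HT enabled and once with it disabled (HT itself has to be switched in the BIOS; the script cannot do that). It is only a sketch: it speaks plain UCI to the engine, searches the start position for a fixed time, and reads the last `nps` value the engine reports. The engine path and the `measure_nps` helper are hypothetical, and it assumes the engine exposes a `Threads` option, as both engines in this test do.

```python
import re
import subprocess

NPS_RE = re.compile(r"\bnps (\d+)")

def parse_nps(info_line):
    """Return the nps value from a UCI 'info' line, or None if absent."""
    m = NPS_RE.search(info_line)
    return int(m.group(1)) if m else None

def measure_nps(engine_path, threads, movetime_ms=10000):
    """Search the start position for movetime_ms and return the last
    nps figure the engine reported (None if it never reported one)."""
    proc = subprocess.Popen([engine_path], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True, bufsize=1)

    def send(cmd):
        proc.stdin.write(cmd + "\n")
        proc.stdin.flush()

    def wait_for(token):
        for line in proc.stdout:
            if line.strip() == token:
                return

    send("uci")
    wait_for("uciok")
    send(f"setoption name Threads value {threads}")
    send("isready")
    wait_for("readyok")
    send("position startpos")
    send(f"go movetime {movetime_ms}")

    last_nps = None
    for line in proc.stdout:
        if line.startswith("info"):
            nps = parse_nps(line)
            if nps is not None:
                last_nps = nps
        elif line.startswith("bestmove"):
            break

    send("quit")
    proc.wait()
    return last_nps

if __name__ == "__main__":
    # Hypothetical path; point this at the engine binary under test.
    print(measure_nps("./stockfish", threads=4))
```

As noted above, NPS alone may not tell the whole story for MP search, so this would be a first check and not a substitute for actually playing games both ways.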
Keep us informed.