Stockfish 301213 - Houdini 4 x64A, 1 CPU Core Test

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Stockfish 301213 - Houdini 4 x64A, 1 CPU Core Test

Post by mwyoung »

Larry K. requested I run a one core test with Stockfish 301213 playing Houdini 4. Here is the setup.

CPU i7 Q840
TC = 3m+3s
Houdini 4 x64A: Contempt=0, Threads=1, all other settings default.
Stockfish 301213: Threads=1, all other settings default.
GM book to 8 moves; the programs play each side of every opening.
Hash = 256 MB
5-piece Syzygy tablebases
Programs play to checkmate or a forced draw.


Results after 35 games.

Blitz 3m+3s

#   Engine                       Elo  +/=/-        Score    Points
1   Stockfish 301213 64 SSE4.2   +40  +10/=19/-6   55.71%   19.5/35
2   Houdini 4 Pro x64A           -40  +6/=19/-10   44.29%   15.5/35
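
The +40 / -40 column in the table is consistent with the usual logistic Elo conversion of the score percentage. A minimal sketch of that conversion, assuming the GUI uses the standard formula (the function name `elo_diff` is made up for this example):

```python
import math


def elo_diff(score_fraction: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)


# Stockfish scored 19.5 points out of 35 games (55.71%).
print(round(elo_diff(19.5 / 35)))  # 40
```

The same function applied to 27.5/50 gives about +35, matching the 50-game table.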

"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Results at game 50

Post by mwyoung »

Blitz 3m+3s

#   Engine                       Elo  +/=/-        Score    Points
1   Stockfish 301213 64 SSE4.2   +35  +14/=27/-9   55.00%   27.5/50
2   Houdini 4 Pro x64A           -35  +9/=27/-14   45.00%   22.5/50
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Results at game 50

Post by lkaufman »

Of course fifty games is a tiny sample, but these results are very different from the ones I'm getting with the same pairing, run in a similar way but at 30" + 0.3", where Houdini 4 leads by 40 Elo after 317 games. The different time control and sample error could explain this difference, but I am beginning to suspect that the key difference between your test and most others is Hyperthreading. I too got worse results for Houdini than others did until I stopped testing with Hyperthreading on. I don't know why Hyperthreading would favor Stockfish and Komodo over Houdini, but it seems to be so.
Although it's pretty clear that hyperthreading should be off for single core testing, it's less clear for MP testing with threads = cores. As long as only one engine is being used at a time, either way should be "fair". The only issue is whether the engines simply play better in MP mode one way or the other. The consensus seems to be that HT off leads to best play, but there is some disagreement about that. I think this question needs to be answered clearly to determine which way of testing is best for MP. Maybe I'll do some tests to determine this soon.
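
To put a number on how tiny a fifty-game sample is, here is a rough sketch of a 95% interval for the 50-game result (+14 =27 -9). It assumes independent games and a normal approximation to the mean score, which is only approximate at this sample size; the function names are made up for this example.

```python
import math


def elo_diff(score_fraction: float) -> float:
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)


def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (low, mid, high) Elo estimates from a win/draw/loss record.

    Assumption: games are independent and the mean score is roughly normal.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score (1, 0.5 or 0 points per game).
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    half_width = z * math.sqrt(var / n)
    eps = 1e-9  # keep the bounds strictly inside (0, 1)
    low = min(max(mean - half_width, eps), 1.0 - eps)
    high = min(max(mean + half_width, eps), 1.0 - eps)
    return elo_diff(low), elo_diff(mean), elo_diff(high)


low, mid, high = elo_interval(14, 27, 9)
print(f"{low:+.0f} .. {mid:+.0f} .. {high:+.0f} Elo")  # roughly -30 .. +35 .. +102
```

Under these assumptions the interval comfortably includes zero, so the 50-game result alone does not contradict a Houdini lead at another time control.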
ouachita
Posts: 454
Joined: Tue Jan 15, 2013 4:33 pm
Location: Ritz-Carlton, NYC
Full name: Bobby Johnson

Re: Results at game 50

Post by ouachita »

lkaufman wrote: Although it's pretty clear that hyperthreading should be off for single core testing, it's less clear for MP testing with threads = cores.
Tom's
4 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012c
No tablebases. No RTB used.
Hash 256
Relative Speed: 20.62
Knodes per second: 9.899

Time Control = 4+0

Stockfish 291213 64 SSE4.2x - Houdini 4 x64xCT0 19.0 - 21.0 +6/=26/-8 47.50%

Time Control = 2+2

Stockfish 291213 64 SSE4.2x - Houdini 4 x64xCT0 18.0 - 22.0 +8/=20/-12 45.00%

Bobby

SF311213 v. H4B, 3m+1s

one (E5-2687W) core
H=contempt 0
no tb, pb or rtb.

1 Houdini 4 Pro x64B +31 +27/=55/-18 54.50% 54.5/100
2 Stockfish 311213 64 SSE4.2 -31 +18/=55/-27 45.50% 45.5/100

100 games. My H4 is not "cracked" but might be on steroids. Or perhaps this dev. version needs tweaking?
SIM, PhD, MBA, PE
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Results at game 50

Post by mwyoung »

Don't jump to any conclusions; I have not stopped testing yet.

If your theory is correct (and it is still only a theory), and HT helps Stockfish without hurting Houdini, then that is Houdini's problem.

Stockfish has the option to run HT. That is the whole point of the sleeping threads option.

At the setting being run below, Stockfish and Houdini are pretty equal right now. It could also be that Stockfish gains more from longer think times than Houdini does. This is also a trait of Stockfish.

Blitz 3m+3s

#   Engine                       Elo  +/=/-         Score    Points
1   Houdini 4 Pro x64A            +5  +32/=84/-30   50.68%   74.0/146
2   Stockfish 301213 64 SSE4.2    -5  +30/=84/-32   49.32%   72.0/146
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Results at Game 151

Post by mwyoung »

Blitz 3m+3s

#   Engine                       Elo  +/=/-         Score    Points
1   Stockfish 301213 64 SSE4.2    +2  +33/=86/-32   50.33%   76.0/151
2   Houdini 4 Pro x64A            -2  +32/=86/-33   49.67%   75.0/151
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Results at game 50

Post by lkaufman »

mwyoung wrote: Don't jump to any conclusion, I have not stopped testing yet.

Good, I'm far from ready to conclude anything.

mwyoung wrote: If your theory is correct, and it is still a theory. If HT helps Stockfish, but it is not hurting Houdini. That is Houdini's problem.

But it may be that HT hurts both engines, but hurts Houdini more. That's what I suspect.

mwyoung wrote: Stockfish has the option to run HT. That is the whole point of the sleeping threads option.

Are you saying that this option should be set differently depending on whether HT is on or off? If so, I didn't know that. This suggests my theory is right.

mwyoung wrote: At this setting being run below Stockfish and Houdini are pretty equal right now, it could also be Stockfish gains more from longer think times than Houdini. This is also a trait of Stockfish.

What longer think time? Three minutes on one core is almost the same as one minute on 4 cores. If you are comparing to my result then yes, this should account for about 15 Elo. My current result is -38 for SF vs H4 after 660 games at 30" + 0.3", one core, one game at a time.
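
For a sense of how far apart the two time controls are, here is a back-of-the-envelope sketch of the thinking time each side gets per game. The figure of 60 moves per side is purely an assumption for illustration, and the function name is made up; no Elo value is derived from the ratio.

```python
import math


def seconds_per_game(base_s: float, inc_s: float, moves: int) -> float:
    """Total clock time one side receives over a game of `moves` moves."""
    return base_s + inc_s * moves


ASSUMED_MOVES = 60  # assumption: typical game length per side, not from the thread

long_tc = seconds_per_game(180, 3, ASSUMED_MOVES)    # 3m+3s
short_tc = seconds_per_game(30, 0.3, ASSUMED_MOVES)  # 30" + 0.3"
ratio = long_tc / short_tc

print(f"{long_tc:.0f}s vs {short_tc:.0f}s per side, ratio {ratio:.1f}x, "
      f"about {math.log2(ratio):.1f} doublings")
```

Whatever game length is assumed, the ratio stays between 6 (base time only) and 10 (increment only), so the 3m+3s test gives each engine several times more thinking time per move.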

Larry

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Results at game 50

Post by mwyoung »

I don't know, but I do know it has never hurt Houdini before. I have run this test with Houdini 3, and it beat older versions of Stockfish. I have had this setup for over 2 years.

It is only against this month's versions of Stockfish that Houdini 3, Houdini 4, Critter, and Rybka are having major problems at this time control. That tells me it is not the testing, but something in the new version of Stockfish, because Stockfish DD does not perform this way.

I even played Stockfish DD vs. the new Stockfish as a sanity check. Stockfish DD got beaten badly at this time control.

This is what I am observing in my testing...

How would you suggest testing this? I can run Stockfish vs. Stockfish, 4 cores vs. 8 logical cores. This could tell us whether Stockfish alone is gaining from HT, since most programmers claim there is no gain from HT.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Results at game 50

Post by lkaufman »

Regarding the improvement of the Dec. 30 SF over SF DD: after more than 17,000 games for each against Komodo (TCEC and latest dev. version), the gain for SF is 15.7 Elo points. Anything much different from that must be sample error.
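
As a rough illustration of why 17,000 games pin the result down so much more tightly than a 151-game match, here is a sketch of the approximate 95% error margin as a function of the number of games. It assumes a score near 50%, independent games, and a draw ratio of 0.57 (borrowed from the 86 draws in 151 games reported earlier in the thread, not from the Komodo runs); the names are made up for this example.

```python
import math

# Slope of the logistic Elo curve at a 50% score: d(Elo)/d(score) = 400 / (ln(10) * 0.25)
ELO_PER_SCORE_AT_50 = 400.0 / (math.log(10) * 0.25)


def elo_margin(games: int, draw_ratio: float, z: float = 1.96) -> float:
    """Approximate +/- Elo margin for a match that is close to 50%.

    Assumption: wins and losses are equally likely, so the per-game score
    variance is (1 - draw_ratio) / 4.
    """
    variance = (1.0 - draw_ratio) / 4.0
    std_error = math.sqrt(variance / games)
    return z * std_error * ELO_PER_SCORE_AT_50


ASSUMED_DRAW_RATIO = 0.57  # assumption taken from the 151-game table

for n in (151, 17000):
    print(n, "games: about +/-", round(elo_margin(n, ASSUMED_DRAW_RATIO), 1), "Elo")
```

Under these assumptions a 151-game match carries a margin in the mid-30s of Elo, while 17,000 games bring it down to a few Elo.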
Regarding HT: no, testing with 8 vs 4 is a totally different test from HT on vs. off. If you can't turn off HT then someone else will have to do this test. I will probably test whether the results vary much with HT on or off, but I need some accurate info on setting the sleeping threads option, and on whether it depends on HT. Can you or anyone point me to a thread where this was discussed or explained? Testing whether HT helps or hurts a given engine is probably just a matter of comparing NPS each way, although I'm not sure that is accurate in this case.
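
The NPS comparison could be sketched roughly as follows: capture the engine's UCI `info` output for the same positions with HT on and with HT off, then compare the average `nps` values. The helper names and the sample lines are hypothetical, and as noted above, NPS alone may not tell the whole story.

```python
import re
from statistics import mean

NPS_RE = re.compile(r"\bnps\s+(\d+)")


def parse_nps(line: str):
    """Return the nps value from a UCI 'info' line, or None if absent."""
    if not line.startswith("info"):
        return None
    match = NPS_RE.search(line)
    return int(match.group(1)) if match else None


def average_nps(lines):
    """Average nps over all info lines that report one."""
    values = [v for v in map(parse_nps, lines) if v is not None]
    if not values:
        raise ValueError("no nps values found in engine output")
    return mean(values)


def speedup_percent(nps_ht_on: float, nps_ht_off: float) -> float:
    """Positive means HT on is faster; negative means HT on is slower."""
    return 100.0 * (nps_ht_on / nps_ht_off - 1.0)


# Hypothetical captured output, for illustration only.
ht_off_log = ["info depth 20 score cp 12 nodes 5000000 nps 1250000 time 4000 pv e2e4"]
ht_on_log = ["info depth 20 score cp 12 nodes 4800000 nps 1200000 time 4000 pv e2e4"]
print(f"{speedup_percent(average_nps(ht_on_log), average_nps(ht_off_log)):+.1f}%")
```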
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Results at game 50

Post by mwyoung »

Keep us informed.