Stockfish "Use Sleeping Threads" Test

zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Stockfish "Use Sleeping Threads" Test

Post by zullil »

Karlo Bala wrote:
Why not try with 9 or 10 threads? On my i5 mobile (dual core with HT) stockfish works best with 3 cores.
The issue is what does "works best" mean? :D
Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Re: Stockfish "Use Sleeping Threads" Test

Post by Karlo Bala »

zullil wrote:
Karlo Bala wrote:
Why not try with 9 or 10 threads? On my i5 mobile (dual core with HT) stockfish works best with 3 cores.
The issue is what does "works best" mean? :D
:D

Depth/Time :wink:
Best Regards,
Karlo Balla Jr.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Stockfish "Use Sleeping Threads" Test

Post by bob »

zullil wrote:
bob wrote: Your test is no good. You need to run _several_ different positions, multiple times each, and then average all the times together. SMP is highly non-deterministic and you need a significant number of samples to get a reasonable estimate.
The stockfish bench uses 16 positions for each run. Perhaps Marco can clarify this.

I understand that SMP is quite variable. When you say that the "test is no good" do you mean more than "the results are statistically insignificant?"

I wonder how many times I would need to run each test in order for the average values of the nps results to be significant.
Ideally you want to do the following:

(1) use a significant number of positions. 16 is probably way too low, but could work.

(2) run each position enough times so that you drive the standard deviation of the time for each position down to something reasonable;

(3) run each position for a significant amount of time. Say 3 minutes or whatever so that the noise of very fast searches/splits at shallow depths is drowned out by significant searches to appropriate depths.

(4) every position has to be run to the same depth. Same hash can be an issue but probably should be used. Why? HT will increase speed by maybe 10% max for a well cache-optimized program, quite a bit more for one that is poorly tuned to cache. If you search faster, you search more nodes in a given amount of time, which can distort the final results. For this reason, I always do these kinds of tests with a really large hash, say 4g or 8g, so that the size is not an issue for either.
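Steps (1) to (4) amount to recording the time to a fixed depth for every run and summarizing each position separately. The sketch below only illustrates that bookkeeping: the position labels and timings are made-up placeholders, not measurements from this thread, and `summarize` is a hypothetical helper.

```python
from statistics import mean, stdev

# Hypothetical time-to-depth samples in seconds (NOT real data).
# Each list holds repeated runs of one position at the same fixed depth.
times = {
    "opening":    [61.0, 75.5, 58.2, 90.1, 66.4],
    "middlegame": [120.3, 118.9, 121.7, 119.5, 120.8],
}

def summarize(samples):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    m = mean(samples)
    s = stdev(samples)
    return m, s, s / m

for name, samples in times.items():
    m, s, cv = summarize(samples)
    print(f"{name:12s} mean={m:7.2f}s  sd={s:6.2f}s  cv={cv:.1%}")
```

A position whose coefficient of variation stays large is one that needs more repetitions before its mean can be trusted.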

I have not seen an example of a program that can add an extra thread and only incur a 10% node increase for the same depth. If one could do this, we would see speedups of 7.1 out of 8, or 15.1 out of 16, which is not very realistic with today's selectivity. Given that 10% search overhead is way too low, and given that hyper-threading only helps speed a decent program up by maybe 10%, hyper-threading is always a net loss.

In Crafty, as an example, each thread comes with about a 30% overhead of additional nodes searched. 2 threads will search a tree that is bigger than a one-thread search. Yet it will search 2x faster (no hyper-threading here). However, that single extra thread will waste about 30% of its time searching nodes that the single-thread search avoids, giving a speedup of roughly 1.7x.

If you have 1 real core, two logical cores, and with one core you search 100,000,000 nodes at 1M nodes per second, it takes you 100 seconds to do the search. Turning HT on will run the nps up to 1.1M nodes per second, but the tree increases to about 115,000,000. If you do the division, it takes 105 seconds rather than 100 seconds, because the added overhead more than offsets the 10% speed gain. I get 115,000,000 nodes because the base tree is 100M, each thread will search about 50M, and one thread is going to add 30% of that 50M to the total, or an extra 15M nodes. So both have to search 115M rather than 100M, which is a net loss...
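The arithmetic in this post can be reproduced in a few lines. This is a minimal sketch of the simple model described above (the extra thread wastes a fixed fraction of its half of the tree); the function name and parameters are illustrative, not taken from any engine.

```python
def two_thread_time(base_nodes, base_nps, overhead, nps_factor):
    """Seconds for a two-thread search under the post's simple model.

    base_nodes : nodes visited by the one-thread search
    base_nps   : one-thread speed in nodes per second
    overhead   : fraction of its share the extra thread wastes (0.30 here)
    nps_factor : total speed relative to one thread (1.1 for HT, 2.0 for a real core)
    """
    share = base_nodes / 2                      # each thread handles about half the tree
    total_nodes = base_nodes + overhead * share # wasted work is added on top
    return total_nodes / (base_nps * nps_factor)

one_thread = 100e6 / 1e6                                # 100 seconds
with_ht = two_thread_time(100e6, 1e6, 0.30, 1.1)        # second logical core
with_core = two_thread_time(100e6, 1e6, 0.30, 2.0)      # second physical core

print(f"one thread: {one_thread:.1f}s")
print(f"HT:         {with_ht:.1f}s")
print(f"real core:  {with_core:.1f}s (speedup {one_thread / with_core:.2f}x)")
```

Under these assumptions the hyper-threaded search needs roughly 105 seconds against 100, while a second physical core gives about the 1.7x speedup quoted above.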

I have tested SF on an 8-core box, and since they do not split at the root, their overhead is actually > 30%, which means this can not possibly make them search faster...
bob

Re: Stockfish "Use Sleeping Threads" Test

Post by bob »

mcostalba wrote:
zullil wrote: When you say that the "test is no good" do you mean more than "the results are statistically insignificant?"
No, he means that the result is not as he was expecting ;-)

Just joking, I agree that real games are needed. As a positive note a fast TC is acceptable in this case.
I do not believe fast games are good here. Millions of games have shown that for parallel search, longer is better. The deeper you go, the better the parallel search does, because those nodes near the root get better and better move ordering, which is critical to keep overhead under control...

Better for this particular issue is a reasonable number of positions including opening, middlegame and endgame, tactical and positional. Search them to significant depth and run them enough times to get an average time per position with an acceptable standard deviation.
zullil

Re: Stockfish "Use Sleeping Threads" Test

Post by zullil »

Tord Romstad wrote:
zullil wrote: I understand that SMP is quite variable. When you say that the "test is no good" do you mean more than "the results are statistically insignificant?"
He means that comparing N/s between 8 and 16 threads isn't very interesting. The strength of Stockfish (and any other program) doesn't derive from seeing lots of nodes, but from searching very deeply. The interesting question, therefore, isn't whether the average N/s with 16 threads is higher than with 8 threads, but whether Stockfish is on average able to complete deeper searches in a given amount of time with 16 threads. Searching a 10% higher number of nodes per second doesn't help if you need to search 20% more nodes to search to the same depth.

The only way to be sure is to play lots of games, but I'd be extremely surprised if Stockfish is stronger with 16 than with 8 threads on your machine. Using 16 threads is a huge handicap, and it would be very remarkable if a tiny 10% increase in N/s is enough to compensate.
Here's a (statistically meaningless :D ) test result. Here are two SF searches, each restricted to 3 minutes. The position happens to be the opening position. The first search (with 16 threads) completes depth 28. The second search (with 8 threads) almost completes depth 28.

Code:

Searching: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -
infinite: 0 ponder: 0 time: 0 increment: 0 moves to go: 0
 1     +0.73   00:00       20 Nf3 
 2     +0.12   00:00       44 Nf3 Nf6 
 3     +0.69   00:00      148 Nf3 Nf6 Nc3 
 4     +0.12   00:00      267 Nf3 Nf6 Nc3 Nc6 
 5     +0.32   00:00      566 Nf3 Nf6 Nc3 Nc6 d4 
 6     +0.12   00:00     1059 Nf3 Nf6 Nc3 Nc6 d4 d5 
 7     +0.32   00:00     2005 Nf3 Nf6 Nc3 Nc6 d4 d5 Bf4 
 8     +0.12   00:00     3438 Nf3 Nf6 Nc3 Nc6 d4 d5 Bf4 Bf5 
 9     +0.08   00:00     7613 Nf3 Nf6 Nc3 d5 d4 Nc6 h3 Bf5 Bf4 
 9     +0.36   00:00    23180 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 Nxc3 bxc3 Bf5 Nd4 
10     +0.48   00:00    75565 e4 Nc6 Nc3 e5 Nf3 Nf6 Bb5 Bd6 O-O O-O d4 exd4 
                              Nxd4 
11     +0.48   00:00   107803 e4 Nc6 d4 Nf6 e5 Nd5 Nf3 e6 Bd3 Be7 O-O O-O 
12  <  +0.40   00:00   185032 e4 e5 Nc3 Nc6 Nf3 Nf6 Bb5 Bd6 O-O Nd4 Nxd4 exd4
12  <  +0.32   00:00   222282 e4 e5 Nc3 Nc6 Nf3 Nf6 Bb5 Bd6 O-O Nd4 Nxd4 exd4
12     +0.28   00:00   249382 e4 e5 Nc3 Nc6 Nf3 Nf6 Bb5 Bd6 O-O Nd4 Bd3 Nxf3+
                              Qxf3 O-O Bc4
13     +0.24   00:00   565091 e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 Nc3 Nxc3 dxc3 Nc6
                              Bd3 Be7 O-O O-O
14  >  +0.40   00:00   738276 e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 Bd3 Nc5 Be2
14     +0.24   00:00    1032K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Nh4 g6 Nxf5 gxf5 exd6
                              cxd6 Nc3
15  >  +0.32   00:00    1266K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Nh4 g6 Nxf5 gxf5 exd6
                              cxd6 c4
15  >  +0.40   00:01    1321K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Nh4 g6 Nxf5 gxf5 exd6
                              cxd6 c4
15     +0.36   00:01    1542K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Nh4 Bd7 c4 Nb4 Nc3 c5
                              Nd5 Nxd5 cxd5 cxd4 Qxd4 dxe5 Qxe5
16  >  +0.48   00:01    1985K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Bd3 Bxd3 Qxd3 Nc6 O-O
                              Ndb4 Qc4
16  >  +0.61   00:01    2624K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Bd3 Bxd3 Qxd3 Nc6 O-O
                              Ndb4 Qc4
16     +0.53   00:01    3524K e4 e5 Nf3 Nf6 d4 exd4 e5 Qe7 Be2 Ng4 Qxd4 d6 Bg5
                              f6 exf6 gxf6 Nc3
17     +0.44   00:01    4634K e4 e5 Nf3 Nf6 d4 exd4 e5 Qe7 Be2 Ng4 Qxd4 h5 Nc3
                              Nc6 Qf4 Ncxe5 O-O c6 Nxe5 Qxe5 Qxe5+ Nxe5 Bf4
18     +0.48   00:01    5450K e4 e5 Nf3 Nf6 d4 exd4 e5 Qe7 Be2 Ng4 Qxd4 h5 Qd1
                              Nc6 O-O Ncxe5 h3 Nf6 Nc3 d6 Nxe5 dxe5 Bg5 Qd6 Bc4
                              Qxd1 Rfxd1
19  <  +0.40   00:02    7303K e4 e5 Nf3 Nc6 Nc3 Nf6 d4 exd4 Nxd4 Bb4 Nxc6 Bxc3+
                              bxc3 dxc6
19  <  +0.32   00:02    8369K e4 e5 Nf3 Nc6 d4 exd4 Nxd4 Nf6 Nc3 Bb4 Nxc6 Bxc3+
                              bxc3 bxc6
19     +0.32   00:02    9788K e4 e5 Nf3 Nc6 Bb5 Bd6 O-O Nf6 Bxc6 dxc6 d4 Qe7
                              Qe1 Bg4 dxe5 Bxe5 Nxe5 Qxe5 f3 Be6
20     +0.28   00:03   14135K e4 e5 Nf3 Nf6 d4 Nxe4 Bd3 d5 Nxe5 Bb4+ Nd2 Bxd2+
                              Bxd2 Nxd2 Qxd2 f6 Nf3 Qe7+ Be2 Nc6 O-O O-O Bd3
                              Re8 Rfe1 Be6
21  >  +0.40   00:04   18397K e4 e5 Nf3 Nf6 d4 Nxe4 Bd3 d5 Nxe5 Bb4+ Nd2 Bxd2+
                              Bxd2 Nxd2 Qxd2 f6 Nf3 Qe7+ Be2 Nc6 O-O O-O Bd3
                              Re8 Rfe1 Be6 c3
21     +0.32   00:04   19895K e4 e5 Nf3 Nf6 d4 exd4 e5 Qe7 Be2 Ng4 Qxd4 h5 Qd1
                              Nc6 O-O Ncxe5 h3 Nf6 Nc3 d6 Nd5 Nxd5 Qxd5 Nxf3+
                              Bxf3
22  >  +0.40   00:05   24430K e4 e5 Nf3 Nf6 d4 exd4 e5 Qe7 Be2 Ng4 Qxd4 h5 Qd1
                              Nc6 O-O Ncxe5 h3 Nf6 Nc3 d6 Nxe5 Qxe5 Re1
22     +0.32   00:05   26109K e4 e5 Nf3 Nf6 d4 exd4 e5 Qe7 Be2 Ng4 Qxd4 h5 Qd1
                              Nc6 O-O Ncxe5 h3 Nf6 Nc3 d6 Nxe5 dxe5 Bg5 Qd7 Bb5
                              c6 Bd3
23  <  +0.24   00:06   34758K e4 e5 Nf3 Nf6 d4 exd4 e5 Ne4 Qxd4 f5 exf6 Nxf6
                              Qe3+ Be7 Bd3 d5 O-O Nc6 Nd4 Nxd4 Qxd4 c5 Qe3 c4
23  >  +0.40   00:08   46992K e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 d4 Be7 Bd3 d5 O-O
                              O-O c4 Nc6 cxd5 Qxd5 Bxe4 Qxe4 Nc3
23  >  +0.48   00:09   55690K e4 e5 Nf3 Nc6 Bb5 a6 Bxc6 dxc6 O-O Bg4 h3 Bxf3
                              Qxf3 Qf6 Qb3 O-O-O d3 Bd6 Nc3 Kb8 Be3
23     +0.44   00:10   60461K e4 e5 Nf3 Nc6 Bb5 a6 Bxc6 dxc6 O-O Bg4 h3 Bxf3
                              Qxf3 Nf6 d3 Bd6 Nc3 O-O Bg5 Bc5 Bxf6 Qxf6 Qxf6
                              gxf6 Na4 Ba7 Rae1 Kg7 Nc3
24  <  +0.32   00:17  114766K e4 e6 Nc3 d5 Nf3 Nf6 exd5 exd5 d4 Nc6 Bd3 Be7 O-O
                              O-O Bf4 Bg4 Nb5 a6 Nxc7 Rc8
24  <  +0.20   00:20  140244K e4 e6 Nf3 d5 exd5 exd5 d4 Nf6 Nc3 Nc6 Bb5 Bb4
                              Qe2+ Be6 Ne5 O-O Bxc6 bxc6 O-O c5 Nc6 Qd6
24     +0.28   00:25  168864K e4 e6 Nf3 d5 exd5 exd5 d4 Nf6 Nc3 Nc6 Bd3 Be7 O-O
                              O-O a3 Re8 h3 Be6 Bf4 Bd6 Qd2 a6 Rfe1 Rc8 Ne5
                              Nxd4 Bxh7+ Nxh7 Qxd4
25     +0.28   00:32  222650K e4 e6 Nf3 d5 exd5 exd5 d4 Nf6 Nc3 Nc6 Bb5 Bb4 O-O
                              O-O Bxc6 bxc6 Ne5 c5 Bg5 Bxc3 bxc3 Qd6 Bxf6 Qxf6
                              Re1 cxd4 cxd4 Re8 Qd3 Ba6 Qc3 Rac8
26  <  +0.20   00:43  308820K e4 e6 Nf3 d5 exd5 exd5 d4 Nf6 Nc3 Nc6 Bb5 Bb4 Ne5
                              Bd7 Bxc6 Bxc6 O-O Bxc3 bxc3 O-O Ba3 Re8 Re1 Bb5
                              Qf3 c6 Qg3 Ne4 Qf4 Qc7
26     +0.28   00:51  365805K e4 e6 Nf3 d5 exd5 exd5 d4 Nf6 Nc3 Nc6 Bd3 Bg4 O-O
                              Be7 Be3 Nb4 Bf4 O-O h3 Be6
27  <  +0.20   01:01  441932K e4 e6 Nf3 d5 exd5 exd5 d4 Nf6 Nc3 Bd6 Bg5 Be6 Bd3
                              O-O O-O h6
27  <  +0.12   01:31  682347K e4 e6 Nc3 d5 d4 Nf6 Bg5 Be7 e5 Nfd7 Bxe7 Qxe7 Nf3
                              O-O
27  >  +0.36   02:06  961810K Nf3 Nf6 d4 e6 c4 Bb4+ Nc3 O-O e3 d5 Bd3 Ne4 Qc2
                              Bxc3+ bxc3 Nf6 O-O
27     +0.24   02:48    1298M d4 Nf6 Nf3 e6 c4 Bb4+ Nc3 O-O e3 c5 Bd3 d5 O-O
                              cxd4 exd4 dxc4 Bxc4 Nc6 Be3 Bd7 Rc1 Bd6 a3 a6 Bd3
                              Rc8
28  >  +0.32   02:54    1344M d4 Nf6 Nf3 e6 c4 Bb4+ Nc3 O-O e3 c5 Bd3 d5 O-O
                              cxd4 exd4 dxc4 Bxc4 Nc6 Be3 Bd7 Rc1 Bd6 a3 a6 Bd3
                              Rc8 Ne4 Nxe4 Bxe4
28     +0.32   02:58    1371M d4 Nf6 Nf3 e6 c4 Bb4+ Nc3 O-O e3 c5 Bd3 d5 O-O
                              cxd4 exd4 dxc4 Bxc4 Nc6 Be3 Bd6 Qb1 Bd7 Bd3 Qe7
                              Ng5 g6 Nge4

Nodes: 1386704705
Nodes/second: 7698611
Best move: d4
Ponder move: Nf6


Searching: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -
infinite: 0 ponder: 0 time: 0 increment: 0 moves to go: 0
 1     +0.73   00:00       20 Nf3
 2     +0.12   00:00       44 Nf3 Nf6
 3     +0.69   00:00      148 Nf3 Nf6 Nc3
 4     +0.12   00:00      267 Nf3 Nf6 Nc3 Nc6
 5     +0.32   00:00      566 Nf3 Nf6 Nc3 Nc6 d4
 6     +0.12   00:00     1059 Nf3 Nf6 Nc3 Nc6 d4 d5
 7     +0.32   00:00     2005 Nf3 Nf6 Nc3 Nc6 d4 d5 Bf4
 8     +0.12   00:00     3442 Nf3 Nf6 Nc3 Nc6 d4 d5 Bf4 Bf5
 9     +0.08   00:00     7653 Nf3 Nf6 Nc3 d5 d4 Nc6 h3 Bf5 Bf4
 9     +0.36   00:00    24716 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 Nxc3 bxc3 Bf5 Nd4
10     +0.40   00:00    58448 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 e6 Bb5+ c6 Be2 Be7
                              O-O O-O d4
11  <  +0.24   00:00    78727 e4 Nf6 Nc3 Nc6 Nf3 e6 Bd3 Bd6 O-O a6
11     +0.36   00:00   130925 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 e6 Bc4 Be7 O-O O-O d4
                              Nc6
12  <  +0.28   00:00   164010 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 e6 Bc4 Be7 O-O O-O d4
                              Nc6
12     +0.36   00:00   204119 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 e6 Bc4 Be7 O-O O-O d4
                              Nc6
13  <  +0.28   00:00   227211 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 e6 Bc4 Be7 O-O O-O d4
                              Nxc3 bxc3 Nc6 Kh1 Bd6
13  >  +0.44   00:00   386984 e4 Nf6 e5 Nd5 Nf3 Nc6 d4 e6 Bd3
13     +0.40   00:00   438747 e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Nc3 Nxc3 bxc3 Nc6 Rb1
                              Qc8 Bd3
14     +0.40   00:00   590455 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 e6 Bb5+ c6 Bc4 Nxc3
                              bxc3 Be7 O-O O-O d4 b5
15  <  +0.32   00:00   674740 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 e6 Bb5+ c6 Bc4 Nxc3
                              bxc3 Be7 O-O O-O d4 b5
15  >  +0.48   00:00    1275K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Nc3 Nxc3 bxc3 Nc6 Rb1
                              Bc8 Bb5
15  >  +0.57   00:01    1657K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Nc3 Nxc3 bxc3 Nc6 Rb1
                              Bc8 Bb5
15     +0.28   00:01    2725K e4 e5 Nf3 Nc6 Bc4 Bc5 O-O Nf6 d4 exd4 e5 Ng4 Bg5
                              Be7 Nxd4 Bxg5 Qxg4 Nxe5
16  >  +0.40   00:01    3111K e4 e5 Nf3 Nc6 Bc4 Bc5 O-O Nf6 Nc3 O-O d3 a6 Bg5
                              b5 Bb3 b4 Nd5
16     +0.20   00:01    3299K e4 e5 Nf3 Nc6 Bb5 Nf6 O-O Bc5 Nc3 O-O Nxe5 Nxe5
                              d4 a6 dxe5 axb5 exf6 Qxf6 Nxb5
17  >  +0.36   00:01    4622K e4 e5 Nf3 Nc6 Bb5 Nf6 O-O Bc5 d3 O-O Bxc6 bxc6
                              Nxe5 Re8 Nf3 d5 e5 Ng4 d4
17     +0.36   00:01    4756K e4 e5 Nf3 Nc6 Bb5 Nf6 O-O Bc5 c3 Nxe4 d4 exd4
                              cxd4 Bb6 Nc3 d5 Ne5 O-O Nxc6 bxc6 Bxc6
18     +0.44   00:02    6067K e4 e5 Nf3 Nc6 Bc4 Bd6 Nc3 Nf6 O-O O-O d3 Na5 Bg5
                              c6 Bb3 b5 d4 Nxb3 axb3 exd4 Qxd4
19  <  +0.28   00:02    6923K e4 e5 Nf3 Nc6 Bc4 Bc5 O-O Nf6 Nc3 d6 Na4 Bb6 Nxb6
                              axb6 d4 exd4 Bg5 h6
19     +0.28   00:02    8253K e4 e5 Nf3 Nc6 Bc4 Bc5 O-O Nf6 Nc3 d6 Na4 Bb6 d4
                              exd4 Nxb6 axb6 Nxd4 Ne5 Qe2 Nxc4 Qxc4 O-O Bg5 d5
                              exd5 Qxd5 Qxd5 Nxd5
20     +0.20   00:02   10320K e4 e5 Nf3 Nc6 Bc4 Bc5 O-O Nf6 Nc3 d6 d3 Na5 Bb3
                              O-O Be3 Nxb3 axb3 Bxe3 fxe3 Qe7 Nb5 d5 exd5 Qc5
                              Nc3 Qxe3+ Kh1
20     +0.28   00:03   13979K Nf3 Nf6 e3 Nc6 d4 e6 Bb5 a6 Bxc6 dxc6 O-O Be7 Nc3
                              O-O Ne5 c5 Qd3 Nd7 Nf3 cxd4 exd4
21  <  +0.20   00:03   16175K Nf3 Nf6 e3 e6 d4 Be7 Bd3 d5 O-O O-O c4 dxc4 Bxc4
                              c5 dxc5 Bxc5 Nc3 Nc6 e4 Qxd1 Rxd1 Ng4
21  >  +0.36   00:04   19568K e4 e5 Nf3 Nf6 Nxe5 Nxe4 Qe2 d5 d3 Bd6 Nf3
21     +0.20   00:05   22984K Nf3 Nf6 e3 e6 d4 Be7 Bd3 d5 O-O O-O c4 dxc4 Bxc4
                              c5 Nc3 Nc6 a3 cxd4 exd4 Bd6 Be3 Bd7
22  <  +0.12   00:05   26425K Nf3 Nf6 e3 e6 d4 Be7 Bd3 d5 O-O O-O c4 dxc4 Bxc4
                              c5 Nc3 Nc6 a3 cxd4 exd4 Qb6 d5 Rd8
22  >  +0.28   00:06   31643K e4 e5 Nf3 Nc6 Bc4 Bc5 Nc3 Nf6 O-O d6 d3 O-O Bg5
                              Be6 Nd5 Bxd5 Bxd5 h6 Bxc6 bxc6 Bd2 Kh8
22     +0.24   00:07   33134K e4 e5 Nf3 Nc6 Bc4 Bc5 Nc3 Nf6 d3 d6 Bg5 O-O O-O
                              Be6 Nd5 Bxd5 Bxd5 h6 Bxc6 bxc6 Bxf6 Qxf6 c3 Rfb8
                              b4
23  >  +0.32   00:07   36214K e4 e5 Nf3 Nc6 Bc4 Bc5 Nc3 Nf6 d3 d6 Bg5 O-O O-O
                              Be6 Nd5 Bxd5 Bxd5 h6 Bxf6 Qxf6 c3 Rad8 d4
23  >  +0.40   00:08   45039K e4 e5 Nf3 Nf6 Nxe5 Nxe4 Qe2 d5 d3 Bd6 Nf3
23     +0.24   00:10   56808K e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 d3 Nf6 d4 Nc6 Nc3
                              d5 Bb5 Bb4 Bxc6+ bxc6 O-O O-O Ne5 c5 Be3 Bxc3
                              bxc3 Ne4
24  >  +0.32   00:12   63979K e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 d3 Nf6 d4 Nc6 Nc3
                              d5 Bb5 Bb4 Qe2+ Ne4 Bxc6+ bxc6 Ng5
24     +0.28   00:12   68859K e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 d3 Nf6 d4 Nc6 Nc3
                              d5 Bb5 Bb4 O-O O-O Bxc6 bxc6 Ne5 c5 Bg5 Bxc3 bxc3
                              Bf5 dxc5
25  >  +0.36   00:16   92225K e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 d4 Qe7 Be2
25     +0.32   00:17   99170K e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 d4 Be7 Bd3 d5 O-O
                              O-O Re1 Bf5 Nbd2 Nd6 Nb3 Ne4 Bf4 Bd6 Ne5 Nd7 Bxe4
                              dxe4
26  >  +0.40   00:22  127720K e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 d4 Be7 Bd3 d5 O-O
                              O-O Re1 Bf5 Nbd2 Nd6 Nb3 Ne4 Nfd2 Nd6 Bxf5 Nxf5
                              c3 Nc6 Qh5 Bg5 Nf3
26  >  +0.48   00:28  168981K e4 e5 Nf3 Nc6 Bb5 Nf6 O-O Nxe4 Re1 Nd6 Nxe5 Nxe5
                              Rxe5+ Be7 Bd3 O-O Nc3 Bf6 Re1 Bd4 Nd5 Bxf2+ Kxf2
26     +0.44   00:31  185866K e4 e5 Nf3 Nc6 Bb5 Nf6 O-O Nxe4 Qe2 Ng5 Bxc6 dxc6
                              Qxe5+ Ne6 Re1 Be7 d3 O-O Nc3 Bd6 Qh5 Nf4 Bxf4
                              Bxf4 g3 Bd6 Ng5 Bf5 Nce4 h6 Nxd6 Qxd6 Ne4
27  <  +0.28   00:59  369095K e4 e6 Nf3 d5 Nc3 Nf6 Bb5+ c6 Bd3 Be7 e5 Nfd7 O-O
                              O-O Be2 c5
27     +0.28   01:12  449669K e4 e6 Nf3 d5 exd5 exd5 d4 Bd6 Nc3 Nf6 Bd3 Nc6 O-O
                              O-O Bg5 Be7 a3 Bg4 Be3 Be6 Re1 Ng4 Bd2 Bh4 g3 Be7
                              Qb1 a6
28  <  +0.04   02:30  972687K e4 e6 Nf3 d5 exd5 exd5 d4 Nf6 Bb5+ c6 Bd3 Bd6
                              Qe2+ Be7 O-O O-O Re1 Bd6 Nc3 Re8 Qf1 Be6 Ng5 Qb6
                              Nxe6 fxe6 Ne2 e5 dxe5 Bxe5 c3 Nbd7

Nodes: 1168373026
Nodes/second: 6489627
Best move: e4
Ponder move: e6

bob

Re: Stockfish "Use Sleeping Threads" Test

Post by bob »

zullil wrote:
Tord Romstad wrote:
zullil wrote: I understand that SMP is quite variable. When you say that the "test is no good" do you mean more than "the results are statistically insignificant?"
The interesting question, therefore, isn't whether the average N/s with 16 threads is higher than with 8 threads, but whether Stockfish is on average able to complete deeper searches in a given amount of time with 16 threads.
Thanks. So my test should have been to give each version of the engine some fixed amount of time to search and record the depth that it reached. (And to do this with a collection of positions and repeat these a large number of times to ensure significance of the results.)
No. It is much easier to pick a set of positions that you feel are representative. Opening, middlegame and endgame positions. Some tactical. Some positional. Some with one best move. Some with many nearly equally good moves.

Search them to fixed depth, but for each position choose a depth that makes it use significant time. Say at least one minute per position. As you run them, you will notice how wildly the times will fluctuate for many positions. You may well find a few that are very consistent. You don't need to run those dozens of times. But the more variability you get, the more runs you need to compute a reasonable mean.
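The link between variability and the number of runs can be made concrete with a rough rule of thumb. The sketch below uses a normal approximation (about 95% confidence for z = 1.96), which is an assumption introduced here for illustration and not something proposed in the thread; `runs_needed` and the pilot timings are hypothetical.

```python
import math
from statistics import mean, stdev

def runs_needed(pilot_times, rel_error=0.05, z=1.96):
    """Estimate the runs needed so the mean is known to within rel_error.

    pilot_times : a few time-to-depth samples (seconds) for one position
    rel_error   : tolerated half-width of the interval, as a fraction of the mean
    z           : normal quantile (1.96 is roughly 95% confidence)
    """
    m = mean(pilot_times)
    s = stdev(pilot_times)
    n = (z * s / (rel_error * m)) ** 2
    return max(2, math.ceil(n))  # at least two runs are needed for any spread estimate

# Hypothetical pilot samples (NOT real data).
consistent = [60.0, 61.0, 59.0, 60.0]
erratic = [40.0, 90.0, 60.0, 120.0]

print("consistent position:", runs_needed(consistent))
print("erratic position:   ", runs_needed(erratic))
```

The required count grows with the square of the relative spread, which is why a few well-behaved positions need only a handful of runs while erratic ones need far more.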
bob

Re: Stockfish "Use Sleeping Threads" Test

Post by bob »

First thing is to look at the output -carefully-. :)

For instance, what was the best move at depth 27? Same for both searches? That is what we mean by non-deterministic behaviour of parallel search. If the searches don't even find the same best move, clearly they are not searching the same search space. This causes time to jump around all over the place...
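Checking this by eye is easy to get wrong, so a small parser can help. The sketch below assumes the column layout of the search output posted in this thread (depth, an optional `<` or `>` marker for a failed aspiration window, score, mm:ss, node count, then the PV); the regular expression and `best_move_by_depth` are illustrative, not part of Stockfish. The sample lines are the depth 27 entries from the two searches.

```python
import re

# depth, optional fail-low/fail-high marker, score, mm:ss, node count, first PV move
LINE = re.compile(r"^\s*(\d+)\s+([<>])?\s*([+-]\d+\.\d+)\s+(\d+:\d+)\s+(\S+)\s+(\S+)")

def best_move_by_depth(output):
    """Map each depth to the first PV move of its last unmarked (exact) line."""
    best = {}
    for line in output.splitlines():
        m = LINE.match(line)
        if m and m.group(2) is None:   # skip '<' and '>' re-search lines
            best[int(m.group(1))] = m.group(6)
    return best

sixteen_threads = """\
27  >  +0.36   02:06  961810K Nf3 Nf6 d4 e6 c4 Bb4+ Nc3 O-O e3 d5 Bd3 Ne4 Qc2
                              Bxc3+ bxc3 Nf6 O-O
27     +0.24   02:48    1298M d4 Nf6 Nf3 e6 c4 Bb4+ Nc3 O-O e3 c5 Bd3 d5 O-O
"""
eight_threads = """\
27  <  +0.28   00:59  369095K e4 e6 Nf3 d5 Nc3 Nf6 Bb5+ c6 Bd3 Be7 e5 Nfd7 O-O
27     +0.28   01:12  449669K e4 e6 Nf3 d5 exd5 exd5 d4 Bd6 Nc3 Nf6 Bd3 Nc6 O-O
"""

a = best_move_by_depth(sixteen_threads)
b = best_move_by_depth(eight_threads)
for depth in sorted(a.keys() & b.keys()):
    flag = "same" if a[depth] == b[depth] else "DIFFERENT"
    print(depth, a[depth], b[depth], flag)
```

Continuation lines of a wrapped PV start with a move rather than a depth number, so the pattern ignores them.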
bob

Re: Stockfish "Use Sleeping Threads" Test (Crafty

Post by bob »

zullil wrote: Well, this is also no good statistically, but Crafty-23.4 gets an 11% and a 15% increase in nps using hyperthreading. This is based on the crafty bench command, which searches positions to a fixed depth (as does the SF bench). As Robert Houdart suggests, perhaps this increase has no effect on game results. Note that the speed gain seems to increase with the depth of the fixed-depth searches.
Unfortunately, in the game of chess, we don't win games with NPS. We win games by searching deeply. Crafty adds about 30% overhead for each thread beyond 1. Yet it is only gaining 10-15% speed. So a 15% speed gain, a 30% longer search, that is a net _loss_ of 15% overall going from 1 thread to 2. Normally, with _real_ cores, that extra cpu gives 100% speed improvement, with a 30% larger tree, which is a gain of 70%. But with HT, the speed improvement is less than the search space increase, which hurts.

Look at the "total nodes" output in each run, you will see what I mean. And again, 30% is an estimate. For one case, it went from 6.2 billion to 8.9 billion nodes. HT is not going to offset those 2.6B extra nodes (about 42% larger tree, 15% faster search, larger tree wins).
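The comparison can be checked against the totals printed by the Crafty bench runs in this thread. This is a small sketch with those totals copied in by hand; the dictionary layout and `pct_change` are only illustrative.

```python
# (total nodes, raw nodes per second, elapsed seconds) copied from the bench logs
runs = {
    "bench+3": {"16 threads": (2_520_046_720, 25_540_151, 98.67),
                "8 threads":  (2_265_323_093, 23_084_918, 98.13)},
    "bench+5": {"16 threads": (8_855_963_985, 27_566_344, 321.26),
                "8 threads":  (6_220_117_960, 23_970_549, 259.49)},
}

def pct_change(new, old):
    """Percentage change going from old to new."""
    return 100.0 * (new - old) / old

for bench, result in runs.items():
    ht, base = result["16 threads"], result["8 threads"]
    print(f"{bench}: tree {pct_change(ht[0], base[0]):+.1f}%, "
          f"nps {pct_change(ht[1], base[1]):+.1f}%, "
          f"time {pct_change(ht[2], base[2]):+.1f}%")
```

Whenever the tree grows by a larger percentage than the nps, the time to the same depth goes up, which is the point being made here.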

Code:

LZsMacPro-OSX6: ~/Documents/Chess/Crafty/Crafty-23.4] ./crafty-23.4
unable to open book file [./Books/book.bin].
book is disabled
unable to open book file [./Books/books.bin].
Warning--  xboard 'cores' option disabled
max threads set to 16.
maximum thread group size set to 12.
minimum nodes before a split 4000.
EGTB access enabled
using tbpath=../TB
0 piece tablebase files found
EGTB cache memory = 256M bytes.
Warning--  xboard 'memory' option disabled
hash table memory = 1024M bytes (64M entries).
Warning--  xboard 'memory' option disabled
pawn hash table memory = 256M bytes (8M entries).
choose from book moves randomly (using weights.)
choose from 5 best moves.
pondering disabled.
Audio output disabled
 game/10 minutes primary time control


Crafty v23.4 (16 cpus)

White(1): bench+3
Running benchmark 3. . .
......
Total nodes: 2520046720
Raw nodes per second: 25540151
Total elapsed time: 98.67
White(1): quit



LZsMacPro-OSX6: ~/Documents/Chess/Crafty/Crafty-23.4] ./crafty-23.4
unable to open book file [./Books/book.bin].
book is disabled
unable to open book file [./Books/books.bin].
Warning--  xboard 'cores' option disabled
max threads set to 16.
maximum thread group size set to 12.
minimum nodes before a split 4000.
EGTB access enabled
using tbpath=../TB
0 piece tablebase files found
EGTB cache memory = 256M bytes.
Warning--  xboard 'memory' option disabled
hash table memory = 1024M bytes (64M entries).
Warning--  xboard 'memory' option disabled
pawn hash table memory = 256M bytes (8M entries).
choose from book moves randomly (using weights.)
choose from 5 best moves.
pondering disabled.
Audio output disabled
 game/10 minutes primary time control


Crafty v23.4 (16 cpus)

White(1): mt=8
Warning--  xboard 'cores' option disabled
max threads set to 8.
White(1): bench+3
Running benchmark 3. . .
......
Total nodes: 2265323093
Raw nodes per second: 23084918
Total elapsed time: 98.13
White(1): quit



LZsMacPro-OSX6: ~/Documents/Chess/Crafty/Crafty-23.4] ./crafty-23.4
unable to open book file [./Books/book.bin].
book is disabled
unable to open book file [./Books/books.bin].
Warning--  xboard 'cores' option disabled
max threads set to 16.
maximum thread group size set to 12.
minimum nodes before a split 4000.
EGTB access enabled
using tbpath=../TB
0 piece tablebase files found
EGTB cache memory = 256M bytes.
Warning--  xboard 'memory' option disabled
hash table memory = 1024M bytes (64M entries).
Warning--  xboard 'memory' option disabled
pawn hash table memory = 256M bytes (8M entries).
choose from book moves randomly (using weights.)
choose from 5 best moves.
pondering disabled.
Audio output disabled
 game/10 minutes primary time control


Crafty v23.4 (16 cpus)

White(1): bench+5
Running benchmark 5. . .
......
Total nodes: 8855963985
Raw nodes per second: 27566344
Total elapsed time: 321.26
White(1): quit



LZsMacPro-OSX6: ~/Documents/Chess/Crafty/Crafty-23.4] ./crafty-23.4
unable to open book file [./Books/book.bin].
book is disabled
unable to open book file [./Books/books.bin].
Warning--  xboard 'cores' option disabled
max threads set to 16.
maximum thread group size set to 12.
minimum nodes before a split 4000.
EGTB access enabled
using tbpath=../TB
0 piece tablebase files found
EGTB cache memory = 256M bytes.
Warning--  xboard 'memory' option disabled
hash table memory = 1024M bytes (64M entries).
Warning--  xboard 'memory' option disabled
pawn hash table memory = 256M bytes (8M entries).
choose from book moves randomly (using weights.)
choose from 5 best moves.
pondering disabled.
Audio output disabled
 game/10 minutes primary time control


Crafty v23.4 (16 cpus)

White(1): mt=8
Warning--  xboard 'cores' option disabled
max threads set to 8.
White(1): bench+5
Running benchmark 5. . .
......
Total nodes: 6220117960
Raw nodes per second: 23970549
Total elapsed time: 259.49
White(1): quit


bob

Re: Stockfish "Use Sleeping Threads" Test

Post by bob »

PawnStormZ wrote: Hi Marco.

Am I missing something here? How could a match like this be run on 1 pc? The hyperthreading is a BIOS setting that needs to be either on or off. There is not a way to play one engine where the cpu is using it and one not on the same pc.

Actually you can. If you have 4 physical cores, and enable HT, you still have 4 physical cores but now 8 logical cores. Recent Windows (since Vista at least) and Linux kernels for the past several years will run a 4-thread program scheduling one thread per physical core. If you run an 8-thread program, they will both then schedule one thread per logical core. It will work just fine...
zullil

Re: Stockfish "Use Sleeping Threads" Test

Post by zullil »

bob wrote: First thing is to look at the output -carefully-. :)

For instance, what was the best move at depth 27? Same for both searches? That is what we mean by non-deterministic behaviour of parallel search. If the searches don't even find the same best move, clearly they are not searching the same search space. This causes time to jump around all over the place...
Why quibble about e4 versus d4? :wink:

Yes, I understand your point about the searches being done over different spaces when threading is involved.

Thanks for your replies, by the way.