Stockfish "Use Sleeping Threads" Test
Moderators: hgm, Rebel, chrisw
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: Stockfish "Use Sleeping Threads" Test
Karlo Bala wrote:
> Why not try with 9 or 10 threads? On my i5 mobile (dual core with HT) stockfish works best with 3 cores.
The issue is what does "works best" mean?
-
- Posts: 373
- Joined: Wed Mar 22, 2006 10:17 am
- Location: Novi Sad, Serbia
- Full name: Karlo Balla
Re: Stockfish "Use Sleeping Threads" Test
zullil wrote:
> Karlo Bala wrote:
> > Why not try with 9 or 10 threads? On my i5 mobile (dual core with HT) stockfish works best with 3 cores.
> The issue is what does "works best" mean?
Depth/Time
Best Regards,
Karlo Balla Jr.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Stockfish "Use Sleeping Threads" Test
zullil wrote:
> bob wrote:
> > Your test is no good. You need to run _several_ different positions, multiple times each, and then average all the times together. SMP is highly non-deterministic and you need a significant number of samples to get a reasonable estimate.
> The stockfish bench uses 16 positions for each run. Perhaps Marco can clarify this.
> I understand that SMP is quite variable. When you say that the "test is no good" do you mean more than "the results are statistically insignificant?"
> I wonder how many times I would need to run each test in order for the average values of the nps results to be significant.
Ideally you want to do the following:
(1) use a significant number of positions. 16 is probably way too low, but could work.
(2) run each position enough times so that you drive the standard deviation of the time for each position down to something reasonable;
(3) run each position for a significant amount of time. Say 3 minutes or whatever so that the noise of very fast searches/splits at shallow depths is drowned out by significant searches to appropriate depths.
(4) every position has to be run to the same depth. Same hash can be an issue but probably should be used. Why? HT will increase speed by maybe 10% max for a well cache-optimized program, quite a bit more for one that is poorly tuned to cache. If you search faster, you search more nodes in a given amount of time, which can distort the final results. For this reason, I always do these kinds of tests with a really large hash, say 4g or 8g, so that the size is not an issue for either.
I have not seen an example of a program that can add an extra thread and only incur a 10% node increase for the same depth. If one could do this, we would see speedups of 7.1 out of 8, or 15.1 out of 16, which is not very realistic with today's selectivity. Given that 10% search overhead is way too low, and given that hyper-threading only helps speed a decent program up by maybe 10%, hyper-threading is always a net loss.
In Crafty, as an example, each thread comes with about a 30% overhead of additional nodes searched. 2 threads will search a tree that is bigger than a one-thread search. Yet it will search 2x faster (no hyper-threading here). However, that single extra thread will waste about 30% of its time searching nodes that the single-thread search avoids, giving a speedup of roughly 1.7x.
If you have 1 real core, two logical cores, and with one core you search 100,000,000 nodes at 1M nodes per second, it takes you 100 seconds to do the search. Turning HT on will run the nps up to 1.1M nodes per second, but the tree increases to about 115,000,000. If you do the division, it takes 105 seconds rather than 100 seconds, because the added overhead more than offsets the 10% speed gain. I get 115,000,000 nodes because the base tree is 100M, each thread will search about 50M, and one thread is going to add 30% of that 50M to the total, or an extra 15M nodes. So both have to search 115M rather than 100M, which is a net loss...
I have tested SF on an 8-core box, and since they do not split at the root, their overhead is actually > 30%, which means this can not possibly make them search faster...
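The arithmetic in the example above can be checked with a short sketch. The 30% per-thread overhead and the 10% hyper-threading gain are the estimates quoted in the post, not measured values, and the function names are purely illustrative.

```python
# Sketch of the hyper-threading arithmetic described above.
# All figures are the post's estimates; the function names are hypothetical.

def two_thread_tree(base_nodes: float, overhead: float) -> float:
    """Tree size when two threads split the work evenly and the extra
    thread wastes `overhead` (a fraction) of its share on nodes the
    one-thread search would never visit."""
    share = base_nodes / 2
    return base_nodes + overhead * share


def search_time(nodes: float, nps: float) -> float:
    """Seconds needed to search `nodes` at `nps` nodes per second."""
    return nodes / nps


BASE_NODES = 100_000_000   # one-thread tree
BASE_NPS = 1_000_000       # one-thread speed
OVERHEAD = 0.30            # extra nodes per added thread (estimate)
HT_GAIN = 0.10             # nps gain from hyper-threading (estimate)

one_thread_time = search_time(BASE_NODES, BASE_NPS)
tree = two_thread_tree(BASE_NODES, OVERHEAD)
ht_time = search_time(tree, BASE_NPS * (1 + HT_GAIN))   # 1 core, 2 logical
real_time = search_time(tree, BASE_NPS * 2)             # 2 real cores

print(f"one thread:       {one_thread_time:.1f} s")
print(f"two-thread tree:  {tree:,.0f} nodes")
print(f"HT, 2 logical:    {ht_time:.1f} s")
print(f"2 real cores:     {real_time:.1f} s "
      f"(speedup {one_thread_time / real_time:.2f}x)")
```

With these inputs the hyper-threaded search comes out slower than the single-thread search (about 105 s against 100 s), while two real cores give roughly the 1.7x speedup mentioned above.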
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Stockfish "Use Sleeping Threads" Test
mcostalba wrote:
> zullil wrote:
> > When you say that the "test is no good" do you mean more than "the results are statistically insignificant?"
> No, he means that the result is not as he was expecting
> Just joking, I agree that real games are needed. As a positive note a fast TC is acceptable in this case.
I do not believe fast games are good here. Millions of games have shown that for parallel search, longer is better. The deeper you go, the better the parallel search does, because those nodes near the root get better and better move ordering, which is critical to keep overhead under control...
Better for this particular issue is a reasonable number of positions including opening, middlegame and endgame, tactical and positional. Search them to significant depth and run them enough times to get an average time per position with an acceptable standard deviation.
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: Stockfish "Use Sleeping Threads" Test
Tord Romstad wrote:
> zullil wrote:
> > I understand that SMP is quite variable. When you say that the "test is no good" do you mean more than "the results are statistically insignificant?"
> He means that comparing N/s between 8 and 16 threads isn't very interesting. The strength of Stockfish (and any other program) doesn't derive from seeing lots of nodes, but from searching very deeply. The interesting question, therefore, isn't whether the average N/s with 16 threads is higher than with 8 threads, but whether Stockfish is on average able to complete deeper searches in a given amount of time with 16 threads. Searching a 10% higher number of nodes per second doesn't help if you need to search 20% more nodes to search to the same depth.
> The only way to be sure is to play lots of games, but I'd be extremely surprised if Stockfish is stronger with 16 than with 8 threads on your machine. Using 16 threads is a huge handicap, and it would be very remarkable if a tiny 10% increase in N/s is enough to compensate.
Here's a (statistically meaningless) test result. Here are two SF searches, each restricted to 3 minutes. The position happens to be the opening position. The first search (with 16 threads) completes depth 28. The second search (with 8 threads) almost completes depth 28.
Code: Select all
Searching: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -
infinite: 0 ponder: 0 time: 0 increment: 0 moves to go: 0
1 +0.73 00:00 20 Nf3
2 +0.12 00:00 44 Nf3 Nf6
3 +0.69 00:00 148 Nf3 Nf6 Nc3
4 +0.12 00:00 267 Nf3 Nf6 Nc3 Nc6
5 +0.32 00:00 566 Nf3 Nf6 Nc3 Nc6 d4
6 +0.12 00:00 1059 Nf3 Nf6 Nc3 Nc6 d4 d5
7 +0.32 00:00 2005 Nf3 Nf6 Nc3 Nc6 d4 d5 Bf4
8 +0.12 00:00 3438 Nf3 Nf6 Nc3 Nc6 d4 d5 Bf4 Bf5
9 +0.08 00:00 7613 Nf3 Nf6 Nc3 d5 d4 Nc6 h3 Bf5 Bf4
9 +0.36 00:00 23180 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 Nxc3 bxc3 Bf5 Nd4
10 +0.48 00:00 75565 e4 Nc6 Nc3 e5 Nf3 Nf6 Bb5 Bd6 O-O O-O d4 exd4
Nxd4
11 +0.48 00:00 107803 e4 Nc6 d4 Nf6 e5 Nd5 Nf3 e6 Bd3 Be7 O-O O-O
12 < +0.40 00:00 185032 e4 e5 Nc3 Nc6 Nf3 Nf6 Bb5 Bd6 O-O Nd4 Nxd4 exd4
12 < +0.32 00:00 222282 e4 e5 Nc3 Nc6 Nf3 Nf6 Bb5 Bd6 O-O Nd4 Nxd4 exd4
12 +0.28 00:00 249382 e4 e5 Nc3 Nc6 Nf3 Nf6 Bb5 Bd6 O-O Nd4 Bd3 Nxf3+
Qxf3 O-O Bc4
13 +0.24 00:00 565091 e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 Nc3 Nxc3 dxc3 Nc6
Bd3 Be7 O-O O-O
14 > +0.40 00:00 738276 e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 Bd3 Nc5 Be2
14 +0.24 00:00 1032K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Nh4 g6 Nxf5 gxf5 exd6
cxd6 Nc3
15 > +0.32 00:00 1266K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Nh4 g6 Nxf5 gxf5 exd6
cxd6 c4
15 > +0.40 00:01 1321K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Nh4 g6 Nxf5 gxf5 exd6
cxd6 c4
15 +0.36 00:01 1542K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Nh4 Bd7 c4 Nb4 Nc3 c5
Nd5 Nxd5 cxd5 cxd4 Qxd4 dxe5 Qxe5
16 > +0.48 00:01 1985K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Bd3 Bxd3 Qxd3 Nc6 O-O
Ndb4 Qc4
16 > +0.61 00:01 2624K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Bd3 Bxd3 Qxd3 Nc6 O-O
Ndb4 Qc4
16 +0.53 00:01 3524K e4 e5 Nf3 Nf6 d4 exd4 e5 Qe7 Be2 Ng4 Qxd4 d6 Bg5
f6 exf6 gxf6 Nc3
17 +0.44 00:01 4634K e4 e5 Nf3 Nf6 d4 exd4 e5 Qe7 Be2 Ng4 Qxd4 h5 Nc3
Nc6 Qf4 Ncxe5 O-O c6 Nxe5 Qxe5 Qxe5+ Nxe5 Bf4
18 +0.48 00:01 5450K e4 e5 Nf3 Nf6 d4 exd4 e5 Qe7 Be2 Ng4 Qxd4 h5 Qd1
Nc6 O-O Ncxe5 h3 Nf6 Nc3 d6 Nxe5 dxe5 Bg5 Qd6 Bc4
Qxd1 Rfxd1
19 < +0.40 00:02 7303K e4 e5 Nf3 Nc6 Nc3 Nf6 d4 exd4 Nxd4 Bb4 Nxc6 Bxc3+
bxc3 dxc6
19 < +0.32 00:02 8369K e4 e5 Nf3 Nc6 d4 exd4 Nxd4 Nf6 Nc3 Bb4 Nxc6 Bxc3+
bxc3 bxc6
19 +0.32 00:02 9788K e4 e5 Nf3 Nc6 Bb5 Bd6 O-O Nf6 Bxc6 dxc6 d4 Qe7
Qe1 Bg4 dxe5 Bxe5 Nxe5 Qxe5 f3 Be6
20 +0.28 00:03 14135K e4 e5 Nf3 Nf6 d4 Nxe4 Bd3 d5 Nxe5 Bb4+ Nd2 Bxd2+
Bxd2 Nxd2 Qxd2 f6 Nf3 Qe7+ Be2 Nc6 O-O O-O Bd3
Re8 Rfe1 Be6
21 > +0.40 00:04 18397K e4 e5 Nf3 Nf6 d4 Nxe4 Bd3 d5 Nxe5 Bb4+ Nd2 Bxd2+
Bxd2 Nxd2 Qxd2 f6 Nf3 Qe7+ Be2 Nc6 O-O O-O Bd3
Re8 Rfe1 Be6 c3
21 +0.32 00:04 19895K e4 e5 Nf3 Nf6 d4 exd4 e5 Qe7 Be2 Ng4 Qxd4 h5 Qd1
Nc6 O-O Ncxe5 h3 Nf6 Nc3 d6 Nd5 Nxd5 Qxd5 Nxf3+
Bxf3
22 > +0.40 00:05 24430K e4 e5 Nf3 Nf6 d4 exd4 e5 Qe7 Be2 Ng4 Qxd4 h5 Qd1
Nc6 O-O Ncxe5 h3 Nf6 Nc3 d6 Nxe5 Qxe5 Re1
22 +0.32 00:05 26109K e4 e5 Nf3 Nf6 d4 exd4 e5 Qe7 Be2 Ng4 Qxd4 h5 Qd1
Nc6 O-O Ncxe5 h3 Nf6 Nc3 d6 Nxe5 dxe5 Bg5 Qd7 Bb5
c6 Bd3
23 < +0.24 00:06 34758K e4 e5 Nf3 Nf6 d4 exd4 e5 Ne4 Qxd4 f5 exf6 Nxf6
Qe3+ Be7 Bd3 d5 O-O Nc6 Nd4 Nxd4 Qxd4 c5 Qe3 c4
23 > +0.40 00:08 46992K e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 d4 Be7 Bd3 d5 O-O
O-O c4 Nc6 cxd5 Qxd5 Bxe4 Qxe4 Nc3
23 > +0.48 00:09 55690K e4 e5 Nf3 Nc6 Bb5 a6 Bxc6 dxc6 O-O Bg4 h3 Bxf3
Qxf3 Qf6 Qb3 O-O-O d3 Bd6 Nc3 Kb8 Be3
23 +0.44 00:10 60461K e4 e5 Nf3 Nc6 Bb5 a6 Bxc6 dxc6 O-O Bg4 h3 Bxf3
Qxf3 Nf6 d3 Bd6 Nc3 O-O Bg5 Bc5 Bxf6 Qxf6 Qxf6
gxf6 Na4 Ba7 Rae1 Kg7 Nc3
24 < +0.32 00:17 114766K e4 e6 Nc3 d5 Nf3 Nf6 exd5 exd5 d4 Nc6 Bd3 Be7 O-O
O-O Bf4 Bg4 Nb5 a6 Nxc7 Rc8
24 < +0.20 00:20 140244K e4 e6 Nf3 d5 exd5 exd5 d4 Nf6 Nc3 Nc6 Bb5 Bb4
Qe2+ Be6 Ne5 O-O Bxc6 bxc6 O-O c5 Nc6 Qd6
24 +0.28 00:25 168864K e4 e6 Nf3 d5 exd5 exd5 d4 Nf6 Nc3 Nc6 Bd3 Be7 O-O
O-O a3 Re8 h3 Be6 Bf4 Bd6 Qd2 a6 Rfe1 Rc8 Ne5
Nxd4 Bxh7+ Nxh7 Qxd4
25 +0.28 00:32 222650K e4 e6 Nf3 d5 exd5 exd5 d4 Nf6 Nc3 Nc6 Bb5 Bb4 O-O
O-O Bxc6 bxc6 Ne5 c5 Bg5 Bxc3 bxc3 Qd6 Bxf6 Qxf6
Re1 cxd4 cxd4 Re8 Qd3 Ba6 Qc3 Rac8
26 < +0.20 00:43 308820K e4 e6 Nf3 d5 exd5 exd5 d4 Nf6 Nc3 Nc6 Bb5 Bb4 Ne5
Bd7 Bxc6 Bxc6 O-O Bxc3 bxc3 O-O Ba3 Re8 Re1 Bb5
Qf3 c6 Qg3 Ne4 Qf4 Qc7
26 +0.28 00:51 365805K e4 e6 Nf3 d5 exd5 exd5 d4 Nf6 Nc3 Nc6 Bd3 Bg4 O-O
Be7 Be3 Nb4 Bf4 O-O h3 Be6
27 < +0.20 01:01 441932K e4 e6 Nf3 d5 exd5 exd5 d4 Nf6 Nc3 Bd6 Bg5 Be6 Bd3
O-O O-O h6
27 < +0.12 01:31 682347K e4 e6 Nc3 d5 d4 Nf6 Bg5 Be7 e5 Nfd7 Bxe7 Qxe7 Nf3
O-O
27 > +0.36 02:06 961810K Nf3 Nf6 d4 e6 c4 Bb4+ Nc3 O-O e3 d5 Bd3 Ne4 Qc2
Bxc3+ bxc3 Nf6 O-O
27 +0.24 02:48 1298M d4 Nf6 Nf3 e6 c4 Bb4+ Nc3 O-O e3 c5 Bd3 d5 O-O
cxd4 exd4 dxc4 Bxc4 Nc6 Be3 Bd7 Rc1 Bd6 a3 a6 Bd3
Rc8
28 > +0.32 02:54 1344M d4 Nf6 Nf3 e6 c4 Bb4+ Nc3 O-O e3 c5 Bd3 d5 O-O
cxd4 exd4 dxc4 Bxc4 Nc6 Be3 Bd7 Rc1 Bd6 a3 a6 Bd3
Rc8 Ne4 Nxe4 Bxe4
28 +0.32 02:58 1371M d4 Nf6 Nf3 e6 c4 Bb4+ Nc3 O-O e3 c5 Bd3 d5 O-O
cxd4 exd4 dxc4 Bxc4 Nc6 Be3 Bd6 Qb1 Bd7 Bd3 Qe7
Ng5 g6 Nge4
Nodes: 1386704705
Nodes/second: 7698611
Best move: d4
Ponder move: Nf6
Searching: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -
infinite: 0 ponder: 0 time: 0 increment: 0 moves to go: 0
1 +0.73 00:00 20 Nf3
2 +0.12 00:00 44 Nf3 Nf6
3 +0.69 00:00 148 Nf3 Nf6 Nc3
4 +0.12 00:00 267 Nf3 Nf6 Nc3 Nc6
5 +0.32 00:00 566 Nf3 Nf6 Nc3 Nc6 d4
6 +0.12 00:00 1059 Nf3 Nf6 Nc3 Nc6 d4 d5
7 +0.32 00:00 2005 Nf3 Nf6 Nc3 Nc6 d4 d5 Bf4
8 +0.12 00:00 3442 Nf3 Nf6 Nc3 Nc6 d4 d5 Bf4 Bf5
9 +0.08 00:00 7653 Nf3 Nf6 Nc3 d5 d4 Nc6 h3 Bf5 Bf4
9 +0.36 00:00 24716 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 Nxc3 bxc3 Bf5 Nd4
10 +0.40 00:00 58448 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 e6 Bb5+ c6 Be2 Be7
O-O O-O d4
11 < +0.24 00:00 78727 e4 Nf6 Nc3 Nc6 Nf3 e6 Bd3 Bd6 O-O a6
11 +0.36 00:00 130925 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 e6 Bc4 Be7 O-O O-O d4
Nc6
12 < +0.28 00:00 164010 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 e6 Bc4 Be7 O-O O-O d4
Nc6
12 +0.36 00:00 204119 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 e6 Bc4 Be7 O-O O-O d4
Nc6
13 < +0.28 00:00 227211 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 e6 Bc4 Be7 O-O O-O d4
Nxc3 bxc3 Nc6 Kh1 Bd6
13 > +0.44 00:00 386984 e4 Nf6 e5 Nd5 Nf3 Nc6 d4 e6 Bd3
13 +0.40 00:00 438747 e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Nc3 Nxc3 bxc3 Nc6 Rb1
Qc8 Bd3
14 +0.40 00:00 590455 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 e6 Bb5+ c6 Bc4 Nxc3
bxc3 Be7 O-O O-O d4 b5
15 < +0.32 00:00 674740 e4 Nf6 Nc3 d5 exd5 Nxd5 Nf3 e6 Bb5+ c6 Bc4 Nxc3
bxc3 Be7 O-O O-O d4 b5
15 > +0.48 00:00 1275K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Nc3 Nxc3 bxc3 Nc6 Rb1
Bc8 Bb5
15 > +0.57 00:01 1657K e4 Nf6 e5 Nd5 Nf3 d6 d4 Bf5 Nc3 Nxc3 bxc3 Nc6 Rb1
Bc8 Bb5
15 +0.28 00:01 2725K e4 e5 Nf3 Nc6 Bc4 Bc5 O-O Nf6 d4 exd4 e5 Ng4 Bg5
Be7 Nxd4 Bxg5 Qxg4 Nxe5
16 > +0.40 00:01 3111K e4 e5 Nf3 Nc6 Bc4 Bc5 O-O Nf6 Nc3 O-O d3 a6 Bg5
b5 Bb3 b4 Nd5
16 +0.20 00:01 3299K e4 e5 Nf3 Nc6 Bb5 Nf6 O-O Bc5 Nc3 O-O Nxe5 Nxe5
d4 a6 dxe5 axb5 exf6 Qxf6 Nxb5
17 > +0.36 00:01 4622K e4 e5 Nf3 Nc6 Bb5 Nf6 O-O Bc5 d3 O-O Bxc6 bxc6
Nxe5 Re8 Nf3 d5 e5 Ng4 d4
17 +0.36 00:01 4756K e4 e5 Nf3 Nc6 Bb5 Nf6 O-O Bc5 c3 Nxe4 d4 exd4
cxd4 Bb6 Nc3 d5 Ne5 O-O Nxc6 bxc6 Bxc6
18 +0.44 00:02 6067K e4 e5 Nf3 Nc6 Bc4 Bd6 Nc3 Nf6 O-O O-O d3 Na5 Bg5
c6 Bb3 b5 d4 Nxb3 axb3 exd4 Qxd4
19 < +0.28 00:02 6923K e4 e5 Nf3 Nc6 Bc4 Bc5 O-O Nf6 Nc3 d6 Na4 Bb6 Nxb6
axb6 d4 exd4 Bg5 h6
19 +0.28 00:02 8253K e4 e5 Nf3 Nc6 Bc4 Bc5 O-O Nf6 Nc3 d6 Na4 Bb6 d4
exd4 Nxb6 axb6 Nxd4 Ne5 Qe2 Nxc4 Qxc4 O-O Bg5 d5
exd5 Qxd5 Qxd5 Nxd5
20 +0.20 00:02 10320K e4 e5 Nf3 Nc6 Bc4 Bc5 O-O Nf6 Nc3 d6 d3 Na5 Bb3
O-O Be3 Nxb3 axb3 Bxe3 fxe3 Qe7 Nb5 d5 exd5 Qc5
Nc3 Qxe3+ Kh1
20 +0.28 00:03 13979K Nf3 Nf6 e3 Nc6 d4 e6 Bb5 a6 Bxc6 dxc6 O-O Be7 Nc3
O-O Ne5 c5 Qd3 Nd7 Nf3 cxd4 exd4
21 < +0.20 00:03 16175K Nf3 Nf6 e3 e6 d4 Be7 Bd3 d5 O-O O-O c4 dxc4 Bxc4
c5 dxc5 Bxc5 Nc3 Nc6 e4 Qxd1 Rxd1 Ng4
21 > +0.36 00:04 19568K e4 e5 Nf3 Nf6 Nxe5 Nxe4 Qe2 d5 d3 Bd6 Nf3
21 +0.20 00:05 22984K Nf3 Nf6 e3 e6 d4 Be7 Bd3 d5 O-O O-O c4 dxc4 Bxc4
c5 Nc3 Nc6 a3 cxd4 exd4 Bd6 Be3 Bd7
22 < +0.12 00:05 26425K Nf3 Nf6 e3 e6 d4 Be7 Bd3 d5 O-O O-O c4 dxc4 Bxc4
c5 Nc3 Nc6 a3 cxd4 exd4 Qb6 d5 Rd8
22 > +0.28 00:06 31643K e4 e5 Nf3 Nc6 Bc4 Bc5 Nc3 Nf6 O-O d6 d3 O-O Bg5
Be6 Nd5 Bxd5 Bxd5 h6 Bxc6 bxc6 Bd2 Kh8
22 +0.24 00:07 33134K e4 e5 Nf3 Nc6 Bc4 Bc5 Nc3 Nf6 d3 d6 Bg5 O-O O-O
Be6 Nd5 Bxd5 Bxd5 h6 Bxc6 bxc6 Bxf6 Qxf6 c3 Rfb8
b4
23 > +0.32 00:07 36214K e4 e5 Nf3 Nc6 Bc4 Bc5 Nc3 Nf6 d3 d6 Bg5 O-O O-O
Be6 Nd5 Bxd5 Bxd5 h6 Bxf6 Qxf6 c3 Rad8 d4
23 > +0.40 00:08 45039K e4 e5 Nf3 Nf6 Nxe5 Nxe4 Qe2 d5 d3 Bd6 Nf3
23 +0.24 00:10 56808K e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 d3 Nf6 d4 Nc6 Nc3
d5 Bb5 Bb4 Bxc6+ bxc6 O-O O-O Ne5 c5 Be3 Bxc3
bxc3 Ne4
24 > +0.32 00:12 63979K e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 d3 Nf6 d4 Nc6 Nc3
d5 Bb5 Bb4 Qe2+ Ne4 Bxc6+ bxc6 Ng5
24 +0.28 00:12 68859K e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 d3 Nf6 d4 Nc6 Nc3
d5 Bb5 Bb4 O-O O-O Bxc6 bxc6 Ne5 c5 Bg5 Bxc3 bxc3
Bf5 dxc5
25 > +0.36 00:16 92225K e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 d4 Qe7 Be2
25 +0.32 00:17 99170K e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 d4 Be7 Bd3 d5 O-O
O-O Re1 Bf5 Nbd2 Nd6 Nb3 Ne4 Bf4 Bd6 Ne5 Nd7 Bxe4
dxe4
26 > +0.40 00:22 127720K e4 e5 Nf3 Nf6 Nxe5 d6 Nf3 Nxe4 d4 Be7 Bd3 d5 O-O
O-O Re1 Bf5 Nbd2 Nd6 Nb3 Ne4 Nfd2 Nd6 Bxf5 Nxf5
c3 Nc6 Qh5 Bg5 Nf3
26 > +0.48 00:28 168981K e4 e5 Nf3 Nc6 Bb5 Nf6 O-O Nxe4 Re1 Nd6 Nxe5 Nxe5
Rxe5+ Be7 Bd3 O-O Nc3 Bf6 Re1 Bd4 Nd5 Bxf2+ Kxf2
26 +0.44 00:31 185866K e4 e5 Nf3 Nc6 Bb5 Nf6 O-O Nxe4 Qe2 Ng5 Bxc6 dxc6
Qxe5+ Ne6 Re1 Be7 d3 O-O Nc3 Bd6 Qh5 Nf4 Bxf4
Bxf4 g3 Bd6 Ng5 Bf5 Nce4 h6 Nxd6 Qxd6 Ne4
27 < +0.28 00:59 369095K e4 e6 Nf3 d5 Nc3 Nf6 Bb5+ c6 Bd3 Be7 e5 Nfd7 O-O
O-O Be2 c5
27 +0.28 01:12 449669K e4 e6 Nf3 d5 exd5 exd5 d4 Bd6 Nc3 Nf6 Bd3 Nc6 O-O
O-O Bg5 Be7 a3 Bg4 Be3 Be6 Re1 Ng4 Bd2 Bh4 g3 Be7
Qb1 a6
28 < +0.04 02:30 972687K e4 e6 Nf3 d5 exd5 exd5 d4 Nf6 Bb5+ c6 Bd3 Bd6
Qe2+ Be7 O-O O-O Re1 Bd6 Nc3 Re8 Qf1 Be6 Ng5 Qb6
Nxe6 fxe6 Ne2 e5 dxe5 Bxe5 c3 Nbd7
Nodes: 1168373026
Nodes/second: 6489627
Best move: e4
Ponder move: e6
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Stockfish "Use Sleeping Threads" Test
zullil wrote:
> Tord Romstad wrote:
> > zullil wrote:
> > > I understand that SMP is quite variable. When you say that the "test is no good" do you mean more than "the results are statistically insignificant?"
> > The interesting question, therefore, isn't whether the average N/s with 16 threads is higher than with 8 threads, but whether Stockfish is on average able to complete deeper searches in a given amount of time with 16 threads.
> Thanks. So my test should have been to give each version of the engine some fixed amount of time to search and record the depth that it reached. (And to do this with a collection of positions and repeat these a large number of times to ensure significance of the results.)
No. It is much easier to pick a set of positions that you feel are representative. Opening, middlegame and endgame positions. Some tactical. Some positional. Some with one best move. Some with many nearly equally good moves.
Search them to fixed depth, but for each position choose a depth that makes it use significant time. Say at least one minute per position. As you run them, you will notice how wildly the times will fluctuate for many positions. You may well find a few that are very consistent. You don't need to run those dozens of times. But the more variability you get, the more runs you need to compute a reasonable mean.
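A minimal sketch of the bookkeeping this procedure implies: time each position over several fixed-depth runs, then keep adding runs for the positions whose mean is still poorly pinned down. The timings, the position labels and the 5% threshold are made-up assumptions for illustration, not values from the thread.

```python
import math
import statistics


def summarize(times):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    mean = statistics.mean(times)
    stdev = statistics.stdev(times) if len(times) > 1 else 0.0
    sem = stdev / math.sqrt(len(times))
    return mean, stdev, sem


def needs_more_runs(times, max_rel_error=0.05):
    """True while the standard error exceeds `max_rel_error` of the mean.

    The 5% default is an arbitrary illustrative threshold.
    """
    if len(times) < 2:
        return True  # a single run says nothing about variability
    mean, _, sem = summarize(times)
    return sem / mean > max_rel_error


# Hypothetical time-to-depth measurements in seconds, per position.
runs = {
    "quiet endgame": [61.0, 60.0, 62.0],
    "tactical middlegame": [50.0, 70.0, 90.0],
}

for name, times in runs.items():
    mean, stdev, sem = summarize(times)
    verdict = "run more" if needs_more_runs(times) else "stable enough"
    print(f"{name}: mean {mean:.1f} s, stdev {stdev:.1f} s, "
          f"sem {sem:.1f} s -> {verdict}")
```

The consistent position stops after a few runs, while the one with wildly fluctuating times is flagged for more samples, which matches the advice above.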
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Stockfish "Use Sleeping Threads" Test
First thing is to look at the output -carefully-.
For instance, what was the best move at depth 27? Same for both searches? That is what we mean by non-deterministic behaviour of parallel search. If the searches don't even find the same best move, clearly they are not searching the same search space. This causes time to jump around all over the place...
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Stockfish "Use Sleeping Threads" Test (Crafty
zullil wrote:
> Well, this is also no good statistically, but Crafty-23.4 gets an 11% and a 15% increase in nps using hyperthreading. This is based on the crafty bench command, which searches positions to a fixed depth (as does the SF bench). As Robert Houdart suggests, perhaps this increase has no effect on game results. Note that the speed gain seems to increase with the depth of the fixed-depth searches.
Unfortunately, in the game of chess, we don't win games with NPS. We win games by searching deeply. Crafty adds about 30% overhead for each thread beyond 1. Yet it is only gaining 10-15% speed. So a 15% speed gain, a 30% longer search, that is a net _loss_ of 15% overall going from 1 thread to 2. Normally, with _real_ cores, that extra cpu gives 100% speed improvement, with a 30% larger tree, which is a gain of 70%. But with HT, the speed improvement is less than the search space increase, which hurts.
Look at the "total nodes" output in each run, you will see what I mean. And again, 30% is an estimate. For one case, it went from 6.2 billion to 8.5 billion nodes. HT is not going to offset that 2.3B extra nodes (25% larger tree, 10% faster search, larger tree wins.)
Code: Select all
LZsMacPro-OSX6: ~/Documents/Chess/Crafty/Crafty-23.4] ./crafty-23.4
unable to open book file [./Books/book.bin].
book is disabled
unable to open book file [./Books/books.bin].
Warning-- xboard 'cores' option disabled
max threads set to 16.
maximum thread group size set to 12.
minimum nodes before a split 4000.
EGTB access enabled
using tbpath=../TB
0 piece tablebase files found
EGTB cache memory = 256M bytes.
Warning-- xboard 'memory' option disabled
hash table memory = 1024M bytes (64M entries).
Warning-- xboard 'memory' option disabled
pawn hash table memory = 256M bytes (8M entries).
choose from book moves randomly (using weights.)
choose from 5 best moves.
pondering disabled.
Audio output disabled
game/10 minutes primary time control
Crafty v23.4 (16 cpus)
White(1): bench+3
Running benchmark 3. . .
......
Total nodes: 2520046720
Raw nodes per second: 25540151
Total elapsed time: 98.67
White(1): quit

LZsMacPro-OSX6: ~/Documents/Chess/Crafty/Crafty-23.4] ./crafty-23.4
unable to open book file [./Books/book.bin].
book is disabled
unable to open book file [./Books/books.bin].
Warning-- xboard 'cores' option disabled
max threads set to 16.
maximum thread group size set to 12.
minimum nodes before a split 4000.
EGTB access enabled
using tbpath=../TB
0 piece tablebase files found
EGTB cache memory = 256M bytes.
Warning-- xboard 'memory' option disabled
hash table memory = 1024M bytes (64M entries).
Warning-- xboard 'memory' option disabled
pawn hash table memory = 256M bytes (8M entries).
choose from book moves randomly (using weights.)
choose from 5 best moves.
pondering disabled.
Audio output disabled
game/10 minutes primary time control
Crafty v23.4 (16 cpus)
White(1): mt=8
Warning-- xboard 'cores' option disabled
max threads set to 8.
White(1): bench+3
Running benchmark 3. . .
......
Total nodes: 2265323093
Raw nodes per second: 23084918
Total elapsed time: 98.13
White(1): quit

LZsMacPro-OSX6: ~/Documents/Chess/Crafty/Crafty-23.4] ./crafty-23.4
unable to open book file [./Books/book.bin].
book is disabled
unable to open book file [./Books/books.bin].
Warning-- xboard 'cores' option disabled
max threads set to 16.
maximum thread group size set to 12.
minimum nodes before a split 4000.
EGTB access enabled
using tbpath=../TB
0 piece tablebase files found
EGTB cache memory = 256M bytes.
Warning-- xboard 'memory' option disabled
hash table memory = 1024M bytes (64M entries).
Warning-- xboard 'memory' option disabled
pawn hash table memory = 256M bytes (8M entries).
choose from book moves randomly (using weights.)
choose from 5 best moves.
pondering disabled.
Audio output disabled
game/10 minutes primary time control
Crafty v23.4 (16 cpus)
White(1): bench+5
Running benchmark 5. . .
......
Total nodes: 8855963985
Raw nodes per second: 27566344
Total elapsed time: 321.26
White(1): quit

LZsMacPro-OSX6: ~/Documents/Chess/Crafty/Crafty-23.4] ./crafty-23.4
unable to open book file [./Books/book.bin].
book is disabled
unable to open book file [./Books/books.bin].
Warning-- xboard 'cores' option disabled
max threads set to 16.
maximum thread group size set to 12.
minimum nodes before a split 4000.
EGTB access enabled
using tbpath=../TB
0 piece tablebase files found
EGTB cache memory = 256M bytes.
Warning-- xboard 'memory' option disabled
hash table memory = 1024M bytes (64M entries).
Warning-- xboard 'memory' option disabled
pawn hash table memory = 256M bytes (8M entries).
choose from book moves randomly (using weights.)
choose from 5 best moves.
pondering disabled.
Audio output disabled
game/10 minutes primary time control
Crafty v23.4 (16 cpus)
White(1): mt=8
Warning-- xboard 'cores' option disabled
max threads set to 8.
White(1): bench+5
Running benchmark 5. . .
......
Total nodes: 6220117960
Raw nodes per second: 23970549
Total elapsed time: 259.49
White(1): quit
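As a rough check on the "total nodes" point, the sketch below compares the four bench runs using the totals exactly as printed in the log above; these differ somewhat from the rounded figures quoted in the post. It assumes the first run of each pair used 16 threads and the second 8, as the "max threads set to" lines indicate. The dictionary layout and the function name are illustrative only.

```python
# (total nodes, raw nodes per second, elapsed seconds), keyed by thread count.
# Values are copied from the Crafty bench log above.
RUNS = {
    "bench 3": {
        16: (2520046720, 25540151, 98.67),
        8: (2265323093, 23084918, 98.13),
    },
    "bench 5": {
        16: (8855963985, 27566344, 321.26),
        8: (6220117960, 23970549, 259.49),
    },
}


def ratios(run):
    """Return 16-thread / 8-thread ratios for nps, tree size and elapsed time."""
    nodes16, nps16, time16 = run[16]
    nodes8, nps8, time8 = run[8]
    return nps16 / nps8, nodes16 / nodes8, time16 / time8


for name, run in RUNS.items():
    nps_ratio, node_ratio, time_ratio = ratios(run)
    print(f"{name}: nps x{nps_ratio:.3f}, nodes x{node_ratio:.3f}, "
          f"time x{time_ratio:.3f}")
```

In both benchmarks the 16-thread run reports more nodes per second but also searches a larger tree, and it does not finish sooner than the 8-thread run.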
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Stockfish "Use Sleeping Threads" Test
PawnStormZ wrote:
> Hi Marco.
> Am I missing something here? How could a match like this be run on 1 pc? The hyperthreading is a BIOS setting that needs to be either on or off. There is not a way to play one engine where the cpu is using it and one not on the same pc.
Actually you can. If you have 4 physical cores, and enable HT, you still have 4 physical cores but now 8 logical cores. Recent Windows versions (since Vista at least) and Linux kernels for the past several years will run a 4-thread program scheduling one thread per physical core. If you run an 8-thread program, they will both then schedule one thread per logical core. It will work just fine...
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: Stockfish "Use Sleeping Threads" Test
bob wrote:
> first thing is to look at the output -carefully-.
> For instance, what was the best move at depth 27? Same for both searches? That is what we mean by non-deterministic behaviour of parallel search. If the searches don't even find the same best move, clearly they are not searching the same search space. This causes time to jump around all over the place...
Why quibble about e4 versus d4?
Yes, I understand your point about the searches being done over different spaces when threading is involved.
Thanks for your replies, by the way.