An alternative perft() initial FEN
Moderator: Ras
-
CRoberson
- Posts: 2094
- Joined: Mon Mar 13, 2006 2:31 am
- Location: North Carolina, USA
Re: An alternative perft() initial FEN
I've run Telepath on the position for each of the first 7 ply and obtained the same node counts as you did.
-
xmas79
- Posts: 286
- Joined: Mon Jun 03, 2013 7:05 pm
- Location: Italy
Re: An alternative perft() initial FEN.
Hi all,
I'm interested in NPS scaling with multithread enabled and hashtable disabled (since I don't want to use hashtables in perft).
r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R3K2R w KQkq -
I wrote different perft functions: one in particular where I use fast "stripped down" Make/Unmake functions where I removed hash signature updates etc... and a special MakeUnmakeFast function in the last ply to check only if the move is legal (I have only pseudo-legal move generator). With the fully bloated Make/Unmake with no other tricks in the last ply I can get "only" about half the speed.
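To make the last-ply trick concrete, here is a rough sketch of the idea (function and type names are hypothetical placeholders, not the actual code):

Code: Select all

// Sketch of the last-ply trick (hypothetical names, not the real code).
// With a pseudo-legal generator, depth 1 only needs a legality test per
// move, so the full Make/Unmake (hash updates etc.) is skipped there.
uint64_t perft_fast(Position &pos, int depth)
{
    MoveList moves;
    generate_pseudo_legal(pos, moves);

    if (depth == 1) {
        uint64_t count = 0;
        for (Move m : moves)
            if (is_legal(pos, m))      // "MakeUnmakeFast": legality only
                ++count;
        return count;
    }

    uint64_t nodes = 0;
    for (Move m : moves) {
        make_fast(pos, m);             // stripped down: no hash signature
        if (!own_king_in_check(pos))   // pseudo-legal move may be illegal
            nodes += perft_fast(pos, depth - 1);
        unmake_fast(pos, m);
    }
    return nodes;
}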
This is all on my quad-core Core i7-3630QM@2.40GHz.
Code: Select all
perftfastmt 6
Time elapsed: 28.65400 seconds
Total leaf nodes: 7891984336
275.4M LNPS
perftmt 6
Time elapsed: 60.90700 seconds
Total leaf nodes: 7891984336
129.6M LNPS
And here are the single thread versions:

Code: Select all
perftfast 6
Time elapsed: 134.12200 seconds
Total leaf nodes: 7891984336
58.8M LNPS
perft 6
Time elapsed: 268.17700 seconds
Total leaf nodes: 7891984336
29.4M LNPS
As you can see, I get only about a 4x scaling even though I'm using 8 threads (the CPU has 4 cores with HT). I thought it must be the HT stuff, but using 4 threads seems to halve the scaling factor too.

What NPS scaling do you get with multithread?
And finally, here are the divided results of perft 6 and perft 7:
Code: Select all
perftfast 6
1 Nf3*e5 256118651: r3k2r/1pp1qppp/p1np1n2/2b1N1B1/2B1P1b1/P1NP4/1PP1QPPP/R3K2R b KQkq -
2 Bc4*a6 170724575: r3k2r/1pp1qppp/B1np1n2/2b1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
3 Bc4*f7 22358900: r3k2r/1pp1qBpp/p1np1n2/2b1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
4 Bg5*f6 145467022: r3k2r/1pp1qppp/p1np1B2/2b1p3/2B1P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
5 Qe2-d1 169787452: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP2PPP/R2QK2R b KQkq -
6 Qe2-f1 157901116: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP2PPP/R3KQ1R b KQkq -
7 Qe2-d2 178876552: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PPQ1PPP/R3K2R b KQkq -
8 Qe2-e3 187562592: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NPQN2/1PP2PPP/R3K2R b KQkq -
9 Ra1-b1 163110039: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/1R2K2R b KQkq -
10 Ra1-c1 162063286: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/2R1K2R b KQkq -
11 Ra1-d1 155111784: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/3RK2R b KQkq -
12 Ra1-a2 142402945: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/RPP1QPPP/4K2R b KQkq -
13 Rh1-f1 160853862: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R3KR2 b KQkq -
14 Rh1-g1 168403724: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R3K1R1 b KQkq -
15 Bc4-a2 165001356: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/4P1b1/P1NP1N2/BPP1QPPP/R3K2R b KQkq -
16 Bc4-b3 162876250: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/4P1b1/PBNP1N2/1PP1QPPP/R3K2R b KQkq -
17 Bc4-b5 118534407: r3k2r/1pp1qppp/p1np1n2/1Bb1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
18 Bc4-d5 156746652: r3k2r/1pp1qppp/p1np1n2/2bBp1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
19 Bc4-e6 162159260: r3k2r/1pp1qppp/p1npBn2/2b1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
20 Bg5-c1 163956956: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1b1/P1NP1N2/1PP1QPPP/R1B1K2R b KQkq -
21 Bg5-d2 167436099: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1b1/P1NP1N2/1PPBQPPP/R3K2R b KQkq -
22 Bg5-e3 188923139: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1b1/P1NPBN2/1PP1QPPP/R3K2R b KQkq -
23 Bg5-f4 212478824: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1PBb1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
24 Bg5-h4 157267257: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1bB/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
25 Bg5-h6 179105449: r3k2r/1pp1qppp/p1np1n1B/2b1p3/2B1P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
26 Nc3-b1 133597337: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P2P1N2/1PP1QPPP/RN2K2R b KQkq -
27 Nc3-d1 132979018: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P2P1N2/1PP1QPPP/R2NK2R b KQkq -
28 Nc3-a2 153356008: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P2P1N2/NPP1QPPP/R3K2R b KQkq -
29 Nc3-a4 158446983: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/N1B1P1b1/P2P1N2/1PP1QPPP/R3K2R b KQkq -
30 Nc3-b5 158989454: r3k2r/1pp1qppp/p1np1n2/1Nb1p1B1/2B1P1b1/P2P1N2/1PP1QPPP/R3K2R b KQkq -
31 Nc3-d5 161873336: r3k2r/1pp1qppp/p1np1n2/2bNp1B1/2B1P1b1/P2P1N2/1PP1QPPP/R3K2R b KQkq -
32 Nf3-g1 173731523: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP4/1PP1QPPP/R3K1NR b KQkq -
33 Nf3-d2 184486630: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP4/1PPNQPPP/R3K2R b KQkq -
34 Nf3-d4 213244060: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2BNP1b1/P1NP4/1PP1QPPP/R3K2R b KQkq -
35 Nf3-h4 189677279: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1bN/P1NP4/1PP1QPPP/R3K2R b KQkq -
36 b2-b3 161223442: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/PPNP1N2/2P1QPPP/R3K2R b KQkq -
37 g2-g3 175267258: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1NP1/1PP1QP1P/R3K2R b KQkq -
38 h2-h3 196761843: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N1P/1PP1QPP1/R3K2R b KQkq -
39 a3-a4 182863372: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/P1B1P1b1/2NP1N2/1PP1QPPP/R3K2R b KQkq -
40 d3-d4 203014324: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2BPP1b1/P1N2N2/1PP1QPPP/R3K2R b KQkq -
41 b2-b4 172122868: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/1PB1P1b1/P1NP1N2/2P1QPPP/R3K2R b KQkq -
42 h2-h4 182697403: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1bP/P1NP1N2/1PP1QPP1/R3K2R b KQkq -
43 O-O 176251801: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 b KQkq -
44 O-O-O 162758263: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/2KR3R b KQkq -
45 Ke1-d1 173202558: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R2K3R b KQkq -
46 Ke1-f1 176260981: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4K1R b KQkq -
47 Ke1-d2 193950446: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PPKQPPP/R6R b KQkq -
Time elapsed: 134.12200 seconds
Total leaf nodes: 7891984336
58.8M LNPS

And here is a perft 7 result:
Code: Select all
perftfastmt 7
1 h2-h3 8700041322: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N1P/1PP1QPP1/R3K2R b KQkq -
2 a3-a4 7844893814: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/P1B1P1b1/2NP1N2/1PP1QPPP/R3K2R b KQkq -
3 d3-d4 9408057290: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2BPP1b1/P1N2N2/1PP1QPPP/R3K2R b KQkq -
4 b2-b4 7383734723: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/1PB1P1b1/P1NP1N2/2P1QPPP/R3K2R b KQkq b3
5 h2-h4 7857707773: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1bP/P1NP1N2/1PP1QPP1/R3K2R b KQkq h3
6 O-O 7441315266: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 b kq -
7 Ke1-d1 7176704182: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R2K3R b kq -
8 Ke1-f1 7434607500: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4K1R b kq -
9 Ke1-d2 8429004623: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PPKQPPP/R6R b kq -
10 Rh1-f1 6548916568: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R3KR2 b Qkq -
11 Rh1-g1 6997505595: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R3K1R1 b Qkq -
12 Bc4-a2 6828224394: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/4P1b1/P1NP1N2/BPP1QPPP/R3K2R b KQkq -
13 Bc4-b3 6724378117: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/4P1b1/PBNP1N2/1PP1QPPP/R3K2R b KQkq -
14 Bc4-b5 4815395108: r3k2r/1pp1qppp/p1np1n2/1Bb1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
15 Bc4-d5 6637464383: r3k2r/1pp1qppp/p1np1n2/2bBp1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
16 Bc4-e6 7051856718: r3k2r/1pp1qppp/p1npBn2/2b1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
17 Bg5-d2 6877428331: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1b1/P1NP1N2/1PPBQPPP/R3K2R b KQkq -
18 Bg5-e3 8097829773: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1b1/P1NPBN2/1PP1QPPP/R3K2R b KQkq -
19 Bg5-f4 9110198725: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1PBb1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
20 Bg5-h4 6243857458: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1bB/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
21 Bg5-h6 7609854994: r3k2r/1pp1qppp/p1np1n1B/2b1p3/2B1P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
22 Nc3-b1 5106093309: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P2P1N2/1PP1QPPP/RN2K2R b KQkq -
23 Nc3-d1 5092743663: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P2P1N2/1PP1QPPP/R2NK2R b KQkq -
24 Nc3-a2 6217317396: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P2P1N2/NPP1QPPP/R3K2R b KQkq -
25 Nc3-a4 6581923754: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/N1B1P1b1/P2P1N2/1PP1QPPP/R3K2R b KQkq -
26 Nc3-b5 6743690873: r3k2r/1pp1qppp/p1np1n2/1Nb1p1B1/2B1P1b1/P2P1N2/1PP1QPPP/R3K2R b KQkq -
27 Nc3-d5 7031516349: r3k2r/1pp1qppp/p1np1n2/2bNp1B1/2B1P1b1/P2P1N2/1PP1QPPP/R3K2R b KQkq -
28 Nf3-g1 7058061238: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP4/1PP1QPPP/R3K1NR b KQkq -
29 Nf3-d2 7793889011: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP4/1PPNQPPP/R3K2R b KQkq -
30 Nf3-d4 9751959347: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2BNP1b1/P1NP4/1PP1QPPP/R3K2R b KQkq -
31 Nf3-h4 8146091138: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1bN/P1NP4/1PP1QPPP/R3K2R b KQkq -
32 b2-b3 6563013573: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/PPNP1N2/2P1QPPP/R3K2R b KQkq -
33 g2-g3 7362922786: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1NP1/1PP1QP1P/R3K2R b KQkq -
34 O-O-O 6677652450: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/2KR3R b kq -
35 Nf3*e5 11833091585: r3k2r/1pp1qppp/p1np1n2/2b1N1B1/2B1P1b1/P1NP4/1PP1QPPP/R3K2R b KQkq -
36 Bc4*a6 6911473293: r3k2r/1pp1qppp/B1np1n2/2b1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
37 Bc4*f7 931459632: r3k2r/1pp1qBpp/p1np1n2/2b1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
38 Bg5*f6 6135150412: r3k2r/1pp1qppp/p1np1B2/2b1p3/2B1P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
39 Qe2-d1 7066718673: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP2PPP/R2QK2R b KQkq -
40 Qe2-f1 6205034378: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP2PPP/R3KQ1R b KQkq -
41 Qe2-d2 7713599363: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PPQ1PPP/R3K2R b KQkq -
42 Qe2-e3 8417344023: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NPQN2/1PP2PPP/R3K2R b KQkq -
43 Ra1-b1 6753584464: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/1R2K2R b Kkq -
44 Ra1-c1 6656410108: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/2R1K2R b Kkq -
45 Ra1-d1 6275244828: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/3RK2R b Kkq -
46 Ra1-a2 5588602038: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/RPP1QPPP/4K2R b Kkq -
47 Bg5-c1 6569302985: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1b1/P1NP1N2/1PP1QPPP/R1B1K2R b KQkq -
Time elapsed: 1225.87400 seconds
Total leaf nodes: 332402867326
271.2M LNPS
The perft 7 results are not in move order, because of the non-determinism of the multithreaded run.

Best regards,
Natale.
-
ibid
- Posts: 89
- Joined: Mon Jun 13, 2011 12:09 pm
Re: An alternative perft() initial FEN.
xmas79 wrote: What NPS scaling do you get with multithread?

Something like this should get near-perfect scaling, at least as long as you're using physical cores. A multi-threaded no-hash-table perft(6) for me:
Code: Select all
1 core 17.813
2 cores 8.981 [50.4%]
3 cores 5.992 [33.6%]
4 cores 4.490 [25.2%]

Random thoughts:
- Your cpu can go as high as 3.4 GHz with turbo, which could be throwing off the single thread numbers if you didn't turn it off.
- When I get odd multi-threaded numbers like that, I usually find that multiple threads are doing frequent writes to variables in the same cache line in memory.
-paul
-
xmas79
- Posts: 286
- Joined: Mon Jun 03, 2013 7:05 pm
- Location: Italy
Re: An alternative perft() initial FEN.
ibid wrote: Something like this should get near-perfect scaling, at least as long as you're using physical cores.

Ummhh... funny stuff here.... That's what I think too, but here are my results of a perft(5) (it's late...):
Code: Select all
perftfastmt 5
1 core : 58.0M LNPS [ 100% ]
2 cores: 110.2M LNPS [ 190% ]
3 cores: 160.4M LNPS [ 277% ]
4 cores: 186.9M LNPS [ 322% ]
8 cores: 207.2M LNPS [ 357% ]

The less-than-perfect NPS scaling must be due to something I'm doing in a fancy way... I simply put every position up to ply=3 into a queue and then start processing them in parallel. When a thread finishes its work, it dequeues another position. I see no "waiting" threads, so does that mean I'm doing something in a very inefficient way? How should I split correctly?
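In rough code, the splitting scheme looks something like this (a sketch with hypothetical names; Position and perft_fast stand in for the engine's real types and functions):

Code: Select all

// Sketch of the queue splitting (hypothetical names; Position and
// perft_fast(...) stand in for the engine's real types and functions).
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

struct Position;                                  // engine's position type
uint64_t perft_fast(Position &pos, int depth);    // single-threaded perft

uint64_t perft_mt(std::vector<Position> &queue,   // all positions at ply 3
                  int remaining_depth, int num_threads)
{
    std::atomic<size_t> next{0};      // shared index: the only hot shared data
    std::atomic<uint64_t> total{0};

    auto worker = [&]() {
        uint64_t local = 0;           // thread-local sum, no sharing in loop
        for (;;) {
            size_t i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= queue.size())
                break;                // queue exhausted
            local += perft_fast(queue[i], remaining_depth);
        }
        total.fetch_add(local);       // one shared write per thread at the end
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < num_threads; ++t)
        pool.emplace_back(worker);
    for (auto &th : pool)
        th.join();
    return total.load();
}

With this structure each thread touches shared memory only once per dequeued position, so any remaining scaling loss should come from the hardware (turbo, HT, memory bandwidth) rather than from contention.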
ibid wrote: Random thoughts:
- Your cpu can go as high as 3.4 GHz with turbo, which could be throwing off the single thread numbers if you didn't turn it off.

Disabling turbo boost seems to have little effect on my scaling, except for 8 cores:
Code: Select all
perftfastmt 5
1 core : 43.6M LNPS [ 100% ]
2 cores: 83.5M LNPS [ 192% ]
3 cores: 116.9M LNPS [ 268% ]
4 cores: 133.3M LNPS [ 306% ]
8 cores: 183.0M LNPS [ 419% ] <-- boost

ibid wrote: - When I get odd multi-threaded numbers like that, I usually find that multiple threads are doing frequent writes to variables in the same cache line in memory.

Ahhh, processor things... I think I don't have any false-sharing issue in this code, except for the queue itself of course. But the queue can hold up to, say, 10,000 items, and they get dequeued at a fairly slow rate, so I don't expect that performance drop from it. Splitting up to ply=2 instead of ply=3 reduces the number of queue elements (and hence a possible false-sharing problem), but performance doesn't go up...
Natl.
-
syzygy
- Posts: 5786
- Joined: Tue Feb 28, 2012 11:56 pm
Re: An alternative perft() initial FEN.
xmas79 wrote: Up to 4 cores the performance could be acceptable (even if 3.3x is not 4x), but, seriously, 357% with 8 cores is really ridiculous. As already said, I suspect it's a hyperthreading problem; waiting to see if other people noticed something similar...

Since you mention hyperthreading, you probably have an intel cpu. In that case it is very unlikely you have 8 cores. You most likely have 4 cores with HT, which means 8 logical hardware threads on only 4 physical cores. It is expected that 8 hyperthreads only go a bit faster than 4 threads on a 4 core cpu.
xmas79 wrote: Ahhh, processor things... I think I don't have any false-sharing issue in this code, except for the queue itself of course.

Assuming you don't write into the queue after creating it, there can be no false sharing there. The only thing you share is the atomic counter for picking the next element in the queue (or a mutex or spinlock for accessing and incrementing the counter, if you don't use an atomic increment). As you said, this shouldn't be an issue, as the number of elements in the queue is very low compared to the total number of visited nodes.
Of course not sharing anything else does not mean there is no false sharing.
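As a self-contained toy illustration of the effect (not taken from any perft code in this thread): four threads hammering adjacent counters that share one cache line, versus the same counters padded onto separate lines:

Code: Select all

// Toy false-sharing demo (build: g++ -O2 -std=c++17 -pthread).
// Not from any perft code here -- purely an illustration.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

struct Packed { std::atomic<uint64_t> v{0}; };              // 8 per cache line
struct Padded { alignas(64) std::atomic<uint64_t> v{0}; };  // one line each

template <typename Slot>
double run(int num_threads, uint64_t iters)
{
    std::vector<Slot> slots(num_threads);   // one counter per thread
    auto t0 = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < num_threads; ++t)
        pool.emplace_back([&slots, t, iters] {
            for (uint64_t i = 0; i < iters; ++i)    // frequent writes to the
                slots[t].v.fetch_add(1, std::memory_order_relaxed); // "own" slot
        });
    for (auto &th : pool)
        th.join();
    return std::chrono::duration<double>(
               std::chrono::steady_clock::now() - t0).count();
}

int main()
{
    const uint64_t iters = 50'000'000;
    std::printf("packed: %.2fs\n", run<Packed>(4, iters));  // false sharing
    std::printf("padded: %.2fs\n", run<Padded>(4, iters));  // no false sharing
}

On typical x86 hardware the padded version runs several times faster, even though the threads share nothing logically.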
-
xmas79
- Posts: 286
- Joined: Mon Jun 03, 2013 7:05 pm
- Location: Italy
Re: An alternative perft() initial FEN.
syzygy wrote: Since you mention hyperthreading, you probably have an intel cpu. In that case it is very unlikely you have 8 cores. You most likely have 4 cores with HT, which means 8 logical hardware threads on only 4 physical cores.

xmas79 wrote: ...This is all on my quad-core Core i7-3630QM@2.40GHz...

xmas79 wrote: ...(CPU has 4 cores with HT)...
syzygy wrote: It is expected that 8 hyperthreads only go a bit faster than 4 threads on a 4 core cpu.

What I already suspected (and then confirmed).

syzygy wrote: Of course not sharing anything else does not mean there is no false sharing.

This is far less obvious in my opinion. If I write a multithreaded program that has zero global/shared variables among threads (and cache-aligned variables), I think false sharing is absent too. Am I wrong?