about speed et profiling tools

xr_a_y · Post by **xr_a_y** » Thu Sep 20, 2018 8:58 pm

I did this experiment this week.

Starting from a bare alpha-beta search only config in Weini (without even any sort of move ordering) and only a PST eval, Weini is -500elo versus fairy-max, but is crunching 1Mnps using 1 thread.
On the same hardware, I gradually add some sorting stuff, null move, pvs, aspiration, lmr, lmp, extension ... still with only a PST eval. The good news is that Weini strength is raising as I add more search tricks, and at the end easilly wins fairy-max. The best additions seems to be mvv-lva, TT, and recapture extension for now. Not everything is activated now but Weini speed has fallen now to 600knps.
For information Weini 0.0.24 with better evaluation is now only at 400knps (I count nodes+qnodes).

I easilly understand how speed decreases when evaluation becomes more complex. I have difficulties to understand why it also decreases (a lot) when activating more search features. Of course, they are more branches in the code, TT must be probed, ...

I use perf-stat, perf-top, valgrind, visual studio, gnuprof, ... but I cannot catch real hot-spots now, especially in the optimized compilation ...

Any advices will be appreciated.

Ras · Post by **Ras** » Thu Sep 20, 2018 9:16 pm

You could use gprof on the 1M NPS version and then on the 400k NPS version and compare the data to see where the additional time is being spent now. For example, you could give a node limit to the move time and tell Weini to search let's say a 100 million nodes and then stop.

xr_a_y · Post by **xr_a_y** » Thu Sep 20, 2018 9:30 pm

Ras wrote: ↑Thu Sep 20, 2018 9:16 pm You could use gprof on the 1M NPS version and then on the 400k NPS version and compare the data to see where the additional time is being spent now. For example, you could give a node limit to the move time and tell Weini to search let's say a 100 million nodes and then stop.

This is more or less what I do but profilers cannot give the exact behavior of the optimized (-O3) compiled binary. I mean I cannot access a useful line by line profiling. And of course the not-optimized code seems to differ quite a lot in term of hot spot.

I think what is happening is that the more nodes are cut, the more the movegenerator is the bottleneck (I am not using bitboard for move generation yet...).

Sven · Post by **Sven** » Thu Sep 20, 2018 10:08 pm

xr_a_y wrote: ↑Thu Sep 20, 2018 8:58 pm I easilly understand how speed decreases when evaluation becomes more complex. I have difficulties to understand why it also decreases (a lot) when activating more search features. Of course, they are more branches in the code, TT must be probed, ...

It is not necessarily a speed issue.

A possible explanation might also be that the stronger search implies a higher ratio of QS nodes vs. full-width nodes, and if you almost never call eval() at full-width nodes then QS nodes on average take significantly more time. In my imagination a weaker search frequently visits "bad" nodes that can be refuted very easily by a simple capture and therefore with a very small QS subtree.

You could try to analyze this by collecting statistics about node counts.

Ras · Post by **Ras** » Fri Sep 21, 2018 1:10 am

xr_a_y wrote: ↑Thu Sep 20, 2018 9:30 pmAnd of course the not-optimized code seems to differ quite a lot in term of hot spot.

I would still go with the non-optimised binaries for profiling because that still shows where the program is spending most of its time. For the move generator, just implement perft - that gives you an idea how many NPS you could have without search and eval. If that is something like 10 times the node rate in the real search, then this won't be a bottleneck.

And of course Sven's explanation makes also a lot of sense. There's something else here: once your eval becomes more fine grained, this itself will lower the speed. Let's look at three cases where your coarse eval gave beta, that would have been three cut-offs. Now with the finer one, let's assume you get beta-1, beta and beta+1. That will only be two cut-offs. On the other hand, you get more accuracy. That will likely become a trade-off.

If I take a look at my engine, which is in the same range as yours, I get around 1900 kNPS when analysing in the start position (single thread only). With perft, I get around 11000 kNPS. No bitboards either, and my hardware is quite old (Phenom 2 1100T).

xr_a_y · Post by **xr_a_y** » Fri Sep 21, 2018 6:56 am

Ras wrote: ↑Fri Sep 21, 2018 1:10 am
xr_a_y wrote: ↑Thu Sep 20, 2018 9:30 pmAnd of course the not-optimized code seems to differ quite a lot in term of hot spot.
I would still go with the non-optimised binaries for profiling because that still shows where the program is spending most of its time. For the move generator, just implement perft - that gives you an idea how many NPS you could have without search and eval. If that is something like 10 times the node rate in the real search, then this won't be a bottleneck.

And of course Sven's explanation makes also a lot of sense. There's something else here: once your eval becomes more fine grained, this itself will lower the speed. Let's look at three cases where your coarse eval gave beta, that would have been three cut-offs. Now with the finer one, let's assume you get beta-1, beta and beta+1. That will only be two cut-offs. On the other hand, you get more accuracy. That will likely become a trade-off.

If I take a look at my engine, which is in the same range as yours, I get around 1900 kNPS when analysing in the start position (single thread only). With perft, I get around 11000 kNPS. No bitboards either, and my hardware is quite old (Phenom 2 1100T).

Perft is implemented, Weini generates around 5Mnps (valid nodes), 10Mnps (pseudo nodes), on a very old corei7 and is very far from your engine in term of real search speed.

Joost Buijs · Post by **Joost Buijs** » Fri Sep 21, 2018 7:48 am

I use Intel VTune, from which I feel that it gives me the most information. It only runs on Intel proc. not on AMD, for me no problem because I don't use AMD anyway.

My engine typically does ~2500 kn/s on the starting position (single core on a 4 GHz. Broadwell), perft runs at ~125.000 kn/s. I count each move.do() as a node, most engines count a call to search() as a node which usually gives a higher number because of re-searches.

Most time consuming functions are evaluation, SEE and probing the hash table, move generation only takes a few percent of the total time.

xr_a_y · Post by **xr_a_y** » Fri Sep 21, 2018 7:47 pm

Sven wrote: ↑Thu Sep 20, 2018 10:08 pm
xr_a_y wrote: ↑Thu Sep 20, 2018 8:58 pm I easilly understand how speed decreases when evaluation becomes more complex. I have difficulties to understand why it also decreases (a lot) when activating more search features. Of course, they are more branches in the code, TT must be probed, ...
It is not necessarily a speed issue.

A possible explanation might also be that the stronger search implies a higher ratio of QS nodes vs. full-width nodes, and if you almost never call eval() at full-width nodes then QS nodes on average take significantly more time. In my imagination a weaker search frequently visits "bad" nodes that can be refuted very easily by a simple capture and therefore with a very small QS subtree.

You could try to analyze this by collecting statistics about node counts.

Using this test position

[d] rnbq1rk1/1p1n1ppp/2pbp3/3p2P1/p4P2/1P1BPN1P/PBPP4/RN1QK2R w KQ - 1 10

Activating the major search trick but only a PST eval ==> 450knps
the values of nodes / qnodes is :

Code: Select all

#INFO    - 2018-09-21 19:19:59-228: visitednodes                         104361
#INFO    - 2018-09-21 19:19:59-228: visitedqnodes                        372858
#INFO    - 2018-09-21 19:19:59-228: visitedrealqnodes                    109763

Note that a "real" qnodes is a qnodes that is not cut by the stand pat test.

For what it's worth, here's the seldepth for all depth

Code: Select all

#INFO    - 2018-09-21 19:19:59-228:   0 seldepth  8
#INFO    - 2018-09-21 19:19:59-228:   1 seldepth  6
#INFO    - 2018-09-21 19:19:59-228:   2 seldepth  13
#INFO    - 2018-09-21 19:19:59-228:   3 seldepth  12
#INFO    - 2018-09-21 19:19:59-228:   4 seldepth  15
#INFO    - 2018-09-21 19:19:59-228:   5 seldepth  22
#INFO    - 2018-09-21 19:19:59-228:   6 seldepth  21
#INFO    - 2018-09-21 19:19:59-228:   7 seldepth  25
#INFO    - 2018-09-21 19:19:59-228:   8 seldepth  27

The hotspots are then

Code: Select all

  %            function
 21.54      GetTheat
 10.42      ApplyMove
  5.21      Sort moves
  5.21      Move generator
  5.21      Eval

Now using the alpha/beta bare search (not going to the same depth ...) ==> 1.5Mnps

Code: Select all

#INFO    - 2018-09-21 19:22:41-773: visitednodes                         64
#INFO    - 2018-09-21 19:22:41-773: visitedqnodes                        17410543
#INFO    - 2018-09-21 19:22:41-773: visitedrealqnodes                    8737286

Seldepth speak by themself (without any move ordering, qsearch is just crazy ...)

Code: Select all

#INFO    - 2018-09-21 19:22:41-773:   0 seldepth  25
#INFO    - 2018-09-21 19:22:41-773:   1 seldepth  30
#INFO    - 2018-09-21 19:22:41-773:   2 seldepth  32

So that searching more qnode seems faster than less here.

The hotspots are then

Code: Select all

%                function
42.00       Eval
 16.00      Move generator
  8.00      Qsearch

xr_a_y · Post by **xr_a_y** » Fri Sep 21, 2018 9:40 pm

And here is the full statistic output of Weini 0.0.24 on the same position

Code: Select all

#INFO    - 2018-09-21 21:38:38-910: FEN : rnbq1rk1/1p1n1ppp/2pbp3/3p2P1/p4P2/1P1BPN1P/PBPP4/RN1QK2R w KQ- - 1 10
#INFO    - 2018-09-21 21:38:38-910: HASH  : 14581085400006350364
#INFO    - 2018-09-21 21:38:38-910:  ===================== Position display (end) ==================== 
#INFO    - 2018-09-21 21:38:38-910: Next search time 1200
#INFO    - 2018-09-21 21:38:38-910: SearchSync begin
#INFO    - 2018-09-21 21:38:38-910: SearchSync waiting for threads
#INFO    - 2018-09-21 21:38:38-910: SearchSync all threads ok
1 -24 0 113 (1/6), pv 	0-0  
2 -21 0 592 (2/6), pv 	b3a4 a8a4  
3 -1 1 5456 (3/11), pv 	0-0 a4a3 b2c3  
4 -28 1 8288 (4/14), pv 	b3a4 a8a4 0-0 a4b4  
5 -16 3 14487 (5/16), pv 	b3a4 a8a4 0-0 d7c5 b1c3  
6 -18 5 23042 (6/16), pv 	b3a4 d7c5 0-0 c5a4 b2d4 b8d7  
7 -22 9 39048 (7/17), pv 	b3a4 d7c5 b2d4 c5d3+ c2d3 c6c5 d4c3  
8 -27 17 74902 (8/22), pv 	0-0 d8b6 b2d4 c6c5 d4b2 a4b3 c2b3 b8c6  
9 -27 29 120297 (9/23), pv 	0-0 d8b6 b2c3 c6c5 f1e1 b8c6 a2a3 f7f6 g5f6  
#INFO    - 2018-09-21 21:38:39-414: Window re-search -33 -21 (-21, b3 a4)
10 -19 75 321398 (10/27), pv 	b3a4 d8b6 b2e5 d7e5 f4e5 d6c5 b1c3 b8d7 0-0 b6c7  
#INFO    - 2018-09-21 21:38:40-109: >> TC endeed in AlphaBeta
#INFO    - 2018-09-21 21:38:40-109: nodes per s : 424 knps (n:128978, q:380056 (74%), t:1200ms)
#INFO    - 2018-09-21 21:38:40-109: End of search
#INFO    - 2018-09-21 21:38:40-109: Searcher return move b3 a4(6000) | with score -19 at depth 9
#INFO    - 2018-09-21 21:38:40-109: Best move is b3 a4 with score -19 (depth 9/40)
#INFO    - 2018-09-21 21:38:40-109: betacut                              47767
#INFO    - 2018-09-21 21:38:40-109: betacutroot                          1
#INFO    - 2018-09-21 21:38:40-109: qbetacut                             64105
#INFO    - 2018-09-21 21:38:40-109: qearlybetacut                        203860
#INFO    - 2018-09-21 21:38:40-109: deltaalphacut                        911
#INFO    - 2018-09-21 21:38:40-109: seecut                               309
#INFO    - 2018-09-21 21:38:40-109: seecutAlphaBeta                      2808
#INFO    - 2018-09-21 21:38:40-109: nullmoveverification                 0
#INFO    - 2018-09-21 21:38:40-109: nullmovecutafterverification         0
#INFO    - 2018-09-21 21:38:40-109: nullmovecutbeta                      10842
#INFO    - 2018-09-21 21:38:40-109: nullmovecuttry                       22456
#INFO    - 2018-09-21 21:38:40-109: nullmovecutskipped                   358
#INFO    - 2018-09-21 21:38:40-109: reductionAndPruningReset             1694
#INFO    - 2018-09-21 21:38:40-109: extensionReset                       0
#INFO    - 2018-09-21 21:38:40-109: staticNullMovebetacut                0
#INFO    - 2018-09-21 21:38:40-109: staticNullMovebetacutQtry            0
#INFO    - 2018-09-21 21:38:40-109: staticNullMovebetacutQ               0
#INFO    - 2018-09-21 21:38:40-109: staticNullMovebetacutH               37436
#INFO    - 2018-09-21 21:38:40-109: failedReduction                      0
#INFO    - 2018-09-21 21:38:40-109: futilitycut                          0
#INFO    - 2018-09-21 21:38:40-109: qfutilitycut                         70915
#INFO    - 2018-09-21 21:38:40-109: futilityMovePruned                   79843
#INFO    - 2018-09-21 21:38:40-109: moveCountPruned                      261148
#INFO    - 2018-09-21 21:38:40-109: lmralphabeta                         25374
#INFO    - 2018-09-21 21:38:40-109: lmralphabetafail                     182
#INFO    - 2018-09-21 21:38:40-109: lmrsearch                            245
#INFO    - 2018-09-21 21:38:40-109: lmrsearchfail                        0
#INFO    - 2018-09-21 21:38:40-109: ttARHitExact                         146
#INFO    - 2018-09-21 21:38:40-109: ttARHitAlpha                         1313
#INFO    - 2018-09-21 21:38:40-109: ttARHitBeta                          5237
#INFO    - 2018-09-21 21:38:40-109: ttARHitAlphaOver                     1930
#INFO    - 2018-09-21 21:38:40-109: ttARHitBetaOver                      7047
#INFO    - 2018-09-21 21:38:40-109: ttARHitUsed                          0
#INFO    - 2018-09-21 21:38:40-109: ttARCollision                        1551
#INFO    - 2018-09-21 21:38:40-109: ttAllHit                             15956
#INFO    - 2018-09-21 21:38:40-109: ttHitExact                           0
#INFO    - 2018-09-21 21:38:40-109: ttHitAlpha                           3
#INFO    - 2018-09-21 21:38:40-109: ttHitBeta                            51
#INFO    - 2018-09-21 21:38:40-109: ttHitAlphaOver                       55
#INFO    - 2018-09-21 21:38:40-109: ttHitBetaOver                        174
#INFO    - 2018-09-21 21:38:40-109: ttHitUsed                            4106
#INFO    - 2018-09-21 21:38:40-109: ttCollision                          1551
#INFO    - 2018-09-21 21:38:40-109: ttARQHitExact                        43
#INFO    - 2018-09-21 21:38:40-109: ttARQHitAlpha                        7520
#INFO    - 2018-09-21 21:38:40-109: ttARQHitBeta                         4516
#INFO    - 2018-09-21 21:38:40-109: ttARQHitAlphaOver                    12201
#INFO    - 2018-09-21 21:38:40-109: ttARQHitBetaOver                     6310
#INFO    - 2018-09-21 21:38:40-109: ttARQHitUsed                         0
#INFO    - 2018-09-21 21:38:40-110: ttARQCollision                       20472
#INFO    - 2018-09-21 21:38:40-110: ttQAllHit                            31285
#INFO    - 2018-09-21 21:38:40-110: ttQHitExact                          0
#INFO    - 2018-09-21 21:38:40-110: ttQHitAlpha                          16
#INFO    - 2018-09-21 21:38:40-110: ttQHitBeta                           91
#INFO    - 2018-09-21 21:38:40-110: ttQHitAlphaOver                      362
#INFO    - 2018-09-21 21:38:40-110: ttQHitBetaOver                       226
#INFO    - 2018-09-21 21:38:40-110: ttQHitUsed                           12045
#INFO    - 2018-09-21 21:38:40-110: ttQCollision                         20472
#INFO    - 2018-09-21 21:38:40-110: ttClear                              0
#INFO    - 2018-09-21 21:38:40-110: ttQClear                             0
#INFO    - 2018-09-21 21:38:40-110: ttEClear                             0
#INFO    - 2018-09-21 21:38:40-110: ttELClear                            0
#INFO    - 2018-09-21 21:38:40-110: ttEPClear                            0
#INFO    - 2018-09-21 21:38:40-110: leafNodes                            106125
#INFO    - 2018-09-21 21:38:40-110: visitednodes                         128561
#INFO    - 2018-09-21 21:38:40-110: visitedqnodes                        380056
#INFO    - 2018-09-21 21:38:40-110: visitedrealqnodes                    161266
#INFO    - 2018-09-21 21:38:40-110: windowsuccess                        5
#INFO    - 2018-09-21 21:38:40-110: windowfails                          1
#INFO    - 2018-09-21 21:38:40-110: pvsRootsuccess                       397
#INFO    - 2018-09-21 21:38:40-110: pvsRootfails                         6
#INFO    - 2018-09-21 21:38:40-110: pvsAlphaBetasuccess                  5563
#INFO    - 2018-09-21 21:38:40-110: pvsAlphaBetafails                    198
#INFO    - 2018-09-21 21:38:40-110: analysisnodes                        471263
#INFO    - 2018-09-21 21:38:40-110: analysedlazynodes                    0
#INFO    - 2018-09-21 21:38:40-110: ttEvalLazyHits                       0
#INFO    - 2018-09-21 21:38:40-110: ttEvalHits                           132206
#INFO    - 2018-09-21 21:38:40-110: ttEvalPawnHits                       313647
#INFO    - 2018-09-21 21:38:40-110: ttEvalLazyCollision                  0
#INFO    - 2018-09-21 21:38:40-110: ttEvalCollision                      14932
#INFO    - 2018-09-21 21:38:40-110: ttEvalPawnCollision                  150
#INFO    - 2018-09-21 21:38:40-110: terminalnodes                        0
#INFO    - 2018-09-21 21:38:40-110: openingBookHits                      0
#INFO    - 2018-09-21 21:38:40-110: positionCopy                         2224710
#INFO    - 2018-09-21 21:38:40-110: ttsort                               15899
#INFO    - 2018-09-21 21:38:40-110: ttsorttry                            15899
#INFO    - 2018-09-21 21:38:40-110: pvsort                               0
#INFO    - 2018-09-21 21:38:40-110: pvsorttry                            0
#INFO    - 2018-09-21 21:38:40-110: pvrootsort                           0
#INFO    - 2018-09-21 21:38:40-110: pvrootsorttry                        0
#INFO    - 2018-09-21 21:38:40-110: iidsort                              0
#INFO    - 2018-09-21 21:38:40-110: probCutTry                           2069
#INFO    - 2018-09-21 21:38:40-110: probCutSuccess                       523
#INFO    - 2018-09-21 21:38:40-110: countCheckExtensionRoot              0
#INFO    - 2018-09-21 21:38:40-110: countCheckExtensionSearch            10
#INFO    - 2018-09-21 21:38:40-110: countCheckExtensionAlphaBeta         16244
#INFO    - 2018-09-21 21:38:40-110: countEndGameExtensionSearch          0
#INFO    - 2018-09-21 21:38:40-110: countVeryEndGameExtensionSearch      0
#INFO    - 2018-09-21 21:38:40-110: countPVExtensionSearch               0
#INFO    - 2018-09-21 21:38:40-110: countSingleReplyExtensionSearch      0
#INFO    - 2018-09-21 21:38:40-110: countSingularExtensionAlphaBeta      0
#INFO    - 2018-09-21 21:38:40-110: countNearPromotionExtensionSearch    0
#INFO    - 2018-09-21 21:38:40-110: countNearPromotionExtensionAlphaBeta 5607
#INFO    - 2018-09-21 21:38:40-110: countReCaptureExtensionSearch        6
#INFO    - 2018-09-21 21:38:40-110: countReCaptureExtensionAlphaBeta     5106
#INFO    - 2018-09-21 21:38:40-110: tcEBFAbort                           0
#INFO    - 2018-09-21 21:38:40-110: tcForcedAbort                        1
#INFO    - 2018-09-21 21:38:40-110: tcIdealAbort                         0
#INFO    - 2018-09-21 21:38:40-110: *********************************
#INFO    - 2018-09-21 21:38:40-110: Nodes per depths: 
#INFO    - 2018-09-21 21:38:40-110:   0 nodes  42
#INFO    - 2018-09-21 21:38:40-110:   0 qnodes 71
#INFO    - 2018-09-21 21:38:40-110:   1 nodes  131
#INFO    - 2018-09-21 21:38:40-110:   1 qnodes 461
#INFO    - 2018-09-21 21:38:40-110:   2 nodes  641
#INFO    - 2018-09-21 21:38:40-110:   2 qnodes 4815
#INFO    - 2018-09-21 21:38:40-110:   3 nodes  1695
#INFO    - 2018-09-21 21:38:40-110:   3 qnodes 6593
#INFO    - 2018-09-21 21:38:40-110:   4 nodes  2978
#INFO    - 2018-09-21 21:38:40-110:   4 qnodes 11509
#INFO    - 2018-09-21 21:38:40-110:   5 nodes  5040
#INFO    - 2018-09-21 21:38:40-110:   5 qnodes 18002
#INFO    - 2018-09-21 21:38:40-110:   6 nodes  8876
#INFO    - 2018-09-21 21:38:40-110:   6 qnodes 30172
#INFO    - 2018-09-21 21:38:40-110:   7 nodes  16811
#INFO    - 2018-09-21 21:38:40-110:   7 qnodes 58091
#INFO    - 2018-09-21 21:38:40-110:   8 nodes  28086
#INFO    - 2018-09-21 21:38:40-110:   8 qnodes 92211
#INFO    - 2018-09-21 21:38:40-110:   9 nodes  76453
#INFO    - 2018-09-21 21:38:40-110:   9 qnodes 244945
#INFO    - 2018-09-21 21:38:40-110: EBF                     1.82548
#INFO    - 2018-09-21 21:38:40-110:   2 BF                      3.90665
#INFO    - 2018-09-21 21:38:40-110:   2 QBF                     8.2351
#INFO    - 2018-09-21 21:38:40-110:   3 BF                      3.59707
#INFO    - 2018-09-21 21:38:40-110:   3 QBF                     3.78173
#INFO    - 2018-09-21 21:38:40-110:   4 BF                      2.15543
#INFO    - 2018-09-21 21:38:40-110:   4 QBF                     1.54604
#INFO    - 2018-09-21 21:38:40-110:   5 BF                      1.72437
#INFO    - 2018-09-21 21:38:40-110:   5 QBF                     1.65241
#INFO    - 2018-09-21 21:38:40-110:   6 BF                      1.72642
#INFO    - 2018-09-21 21:38:40-110:   6 QBF                     1.61914
#INFO    - 2018-09-21 21:38:40-110:   7 BF                      1.82634
#INFO    - 2018-09-21 21:38:40-110:   7 QBF                     1.79636
#INFO    - 2018-09-21 21:38:40-110:   8 BF                      1.77884
#INFO    - 2018-09-21 21:38:40-110:   8 QBF                     1.74819
#INFO    - 2018-09-21 21:38:40-110:   9 BF                      2.13256
#INFO    - 2018-09-21 21:38:40-110:   9 QBF                     2.05343
#INFO    - 2018-09-21 21:38:40-110:   0 seldepth  6
#INFO    - 2018-09-21 21:38:40-110:   1 seldepth  6
#INFO    - 2018-09-21 21:38:40-110:   2 seldepth  11
#INFO    - 2018-09-21 21:38:40-110:   3 seldepth  14
#INFO    - 2018-09-21 21:38:40-110:   4 seldepth  16
#INFO    - 2018-09-21 21:38:40-110:   5 seldepth  16
#INFO    - 2018-09-21 21:38:40-110:   6 seldepth  17
#INFO    - 2018-09-21 21:38:40-110:   7 seldepth  22
#INFO    - 2018-09-21 21:38:40-110:   8 seldepth  23
#INFO    - 2018-09-21 21:38:40-110:   9 seldepth  27
#INFO    - 2018-09-21 21:38:40-110:  10 seldepth  30
#INFO    - 2018-09-21 21:38:40-111: TT hit rate (used)      3.19381% (25.7333%)
#INFO    - 2018-09-21 21:38:40-111: TT collision rate       1.20643%
#INFO    - 2018-09-21 21:38:40-111: col rate (theoretical)    3.00343% 
#INFO    - 2018-09-21 21:38:40-111: QTT hit rate (used)     3.16927% (38.5009%)
#INFO    - 2018-09-21 21:38:40-111: QTT collision rate      5.38657%
#INFO    - 2018-09-21 21:38:40-111: col rate (theoretical)    16.1178%
#INFO    - 2018-09-21 21:38:40-111: ETT hit rate            28.0535%
#INFO    - 2018-09-21 21:38:40-111: ETT collision rate      3.16851%
#INFO    - 2018-09-21 21:38:40-111: col rate (theoretical)    5.41326%
#INFO    - 2018-09-21 21:38:40-111: ETTP hit rate           92.5057%
#INFO    - 2018-09-21 21:38:40-111: ETTP collision rate     0.0442403%
#INFO    - 2018-09-21 21:38:40-111: col rate (theoretical)    7.66514%
#INFO    - 2018-09-21 21:38:40-111: pvs success rate        96.5631%
#INFO    - 2018-09-21 21:38:40-111: pvs success rate (root) 98.5112%
#INFO    - 2018-09-21 21:38:40-111: window success rate     83.3333%
#INFO    - 2018-09-21 21:38:40-111: lmr success rate        99.2827%
#INFO    - 2018-09-21 21:38:40-111: lmr success rate (root) 100%
#INFO    - 2018-09-21 21:38:40-111: nullmove success rate   48.2811%
#INFO    - 2018-09-21 21:38:40-111: nullmove skipped rate   1.59423%
#INFO    - 2018-09-21 21:38:40-111: tt sort success rate    100% (should be 100 %)
#INFO    - 2018-09-21 21:38:40-111: Qtt sort success rate   100% (should be 100 %)

Sven · Post by **Sven** » Sat Sep 22, 2018 12:36 am

xr_a_y wrote: ↑Fri Sep 21, 2018 7:47 pm
Sven wrote: ↑Thu Sep 20, 2018 10:08 pm
xr_a_y wrote: ↑Thu Sep 20, 2018 8:58 pm I easilly understand how speed decreases when evaluation becomes more complex. I have difficulties to understand why it also decreases (a lot) when activating more search features. Of course, they are more branches in the code, TT must be probed, ...
It is not necessarily a speed issue.

A possible explanation might also be that the stronger search implies a higher ratio of QS nodes vs. full-width nodes, and if you almost never call eval() at full-width nodes then QS nodes on average take significantly more time. In my imagination a weaker search frequently visits "bad" nodes that can be refuted very easily by a simple capture and therefore with a very small QS subtree.

You could try to analyze this by collecting statistics about node counts.
[...]
Activating the major search trick but only a PST eval ==> 450knps
the values of nodes / qnodes is :
Code: Select all
#INFO    - 2018-09-21 19:19:59-228: visitednodes                         104361
#INFO    - 2018-09-21 19:19:59-228: visitedqnodes                        372858
#INFO    - 2018-09-21 19:19:59-228: visitedrealqnodes                    109763
Note that a "real" qnodes is a qnodes that is not cut by the stand pat test.
[...]
Now using the alpha/beta bare search (not going to the same depth ...) ==> 1.5Mnps
Code: Select all
#INFO    - 2018-09-21 19:22:41-773: visitednodes                         64
#INFO    - 2018-09-21 19:22:41-773: visitedqnodes                        17410543
#INFO    - 2018-09-21 19:22:41-773: visitedrealqnodes                    8737286
[...]
So that searching more qnode seems faster than less here.

I suggest to compare numbers for equal fixed depth only. Also your comparison goes to the extreme by removing *all* search features at once. Smaller steps would be good.

I took the same test position, started with normal current Jumbo, removed search features step by step while keeping the same full evaluation function, and got the following numbers for a fixed-depth search to 10 plies (single-core, VS build which is slower than GCC for me):

Code: Select all

search features    nodes       qsNodes      time    nps        qs%
normal              2.123.407    1.369.755    2,04  1.040.886  65%
no pruning         10.081.246    7.896.069    7,90  1.276.107  78%
no killer, no IID   9.829.471    7.722.409    7,76  1.266.684  79%
no nullmove       182.208.589  141.437.003  130,19  1.399.559  78%
no TT             915.899.029  749.467.322  670,14  1.366.728  82%

Here "no pruning" means that I removed LMR, Futility Pruning, Razoring, and Delta Pruning. "nodes" is the total number of nodes (including QS nodes).

So there are different cases: Removing all pruning features makes the program faster in terms of NPS while increasing the QS ratio. Of course the resulting program is a lot weaker than before. Removing killer moves and IID has almost no effect at this stage (for this test position it even gives a small improvement in "time to depth"). Removing nullmove increases NPS again without increasing the QS ratio (and time to depth goes up drastically here). And finally removing TT increases QS ratio, keeps NPS almost constant, and makes the program much weaker again.

It is hard to say what exactly causes what. For some search features introducing them lowers NPS, for others it doesn't. At least it seems to be confirmed (for this test position) that a stronger search does not necessarily correlate with a higher QS ratio, despite my expectation.

about speed et profiling tools

about speed et profiling tools

Re: about speed et profiling tools

Re: about speed et profiling tools

Re: about speed et profiling tools

Re: about speed et profiling tools

Re: about speed et profiling tools

Re: about speed et profiling tools

Re: about speed et profiling tools

Re: about speed et profiling tools

Re: about speed et profiling tools