Quiescence Search doesn't improve strength

hgm · Post by **hgm** » Tue Mar 02, 2021 11:01 pm

I had a look at Carnivor's move generator, and it is indeed using a very fast algorithm. I had seen that before; someone once told me that one of the GNU Chesses also used this principle. The trick is that all the moves a given piece would have from a given square on an empty board are tabulated, and by running through that table (and for sliders, skipping to the first square of the next direction (which is also tabulated) when you encounter an obstacle), you will never attempt to generate any moves that go off board. (Except that the end of the list is indicated by an off-board square number.) The tables are organized such that the current to-square can be used as index to them for looking up the next square (on the current or the next ray) in the sequence, so that you don't have to keep a separate index.

This was about as fast as you could get on pre-Pentium CPUs. With the advent of super-scalar CPUs (which can do multiple instructions per clock), it does have a big disadvantage, though: it contains an extremely long 'critical path' of long-latency instructions. Every iteration you are doing

ts = someTable[ts];

Even from the L1 cache (where the tables probably reside permanently) this takes 3 clocks on a modern CPU (or maybe even 4, nowadays?). So you cannot do an iteration faster in less than 3 clocks, and since CPUs can nowadays do 4 instructions per clock, you could do 12 instructions in that time. But especially for capture generation you don't have to do that much on the side: just fetch board[ts], and conclude it is empty, most of the time. That is a load, a test and a branch instruction.

So it could actually be faster now to use a separate index to the tables (which would be kept in a register), and just increment that every loop; the increment would be done in parallel with other instructions. Since, unlike (cached) memory fetch, incrementing can be done in a single clock, this makes the critical path much shorter. So I cannot exclude that a mailbox engine could still be sped up compared to Carnivor, by adapting the algorithm to the peculiarities of modern super-scaler CPUs.

Mike Sherwin · Post by **Mike Sherwin** » Tue Mar 02, 2021 11:23 pm

hgm wrote: ↑Tue Mar 02, 2021 11:01 pm I had a look at Carnivor's move generator, and it is indeed using a very fast algorithm. I had seen that before; someone once told me that one of the GNU Chesses also used this principle. The trick is that all the moves a given piece would have from a given square on an empty board are tabulated, and by running through that table (and for sliders, skipping to the first square of the next direction (which is also tabulated) when you encounter an obstacle), you will never attempt to generate any moves that go off board. (Except that the end of the list is indicated by an off-board square number.) The tables are organized such that the current to-square can be used as index to them for looking up the next square (on the current or the next ray) in the sequence, so that you don't have to keep a separate index.

This was about as fast as you could get on pre-Pentium CPUs. With the advent of super-scalar CPUs (which can do multiple instructions per clock), it does have a big disadvantage, though: it contains an extremely long 'critical path' of long-latency instructions. Every iteration you are doing

ts = someTable[ts];

Even from the L1 cache (where the tables probably reside permanently) this takes 3 clocks on a modern CPU (or maybe even 4, nowadays?). So you cannot do an iteration faster in less than 3 clocks, and since CPUs can nowadays do 4 instructions per clock, you could do 12 instructions in that time. But especially for capture generation you don't have to do that much on the side: just fetch board[ts], and conclude it is empty, most of the time. That is a load, a test and a branch instruction.

So it could actually be faster now to use a separate index to the tables (which would be kept in a register), and just increment that every loop; the increment would be done in parallel with other instructions. Since, unlike (cached) memory fetch, incrementing can be done in a single clock, this makes the critical path much shorter. So I cannot exclude that a mailbox engine could still be sped up compared to Carnivor, by adapting the algorithm to the peculiarities of modern super-scaler CPUs.

But what if the engine was multithreaded and was using all the threads? Then every thread would be using a single pipeline anyway.

mvanthoor · Post by **mvanthoor** » Tue Mar 02, 2021 11:33 pm

Mike Sherwin wrote: ↑Tue Mar 02, 2021 10:26 pm Hmm, I was not trying to fool anyone. So, I'm glad you pointed that out.

And here I am, implementing hash move sorting... it does seem to work (the hash move gets a score high enough to move it to the front of the move list, as it is added to the MVV-LVA score that may have already been there, and is higher than the highest MVV-LVA score), and in most positions, there's a massive cut in the search tree... but it still only gains me some time; not even enough to make an extra ply in the middle game.

Talk about expectations.

I'm half expecting that one of these sorting options will "magically" get me from depth 7-8 to somewhere around 14, but it seems that, to some extent, the gain the hash move sorting makes, is partially off-set by the speed loss the engine incurs due to the probe.

Note the following (I might be doing something wrong here):
- I am NOT yet cutting off the entire search by hash value; I'm ONLY probing for the move to test the move sorting.
- I do the probe right before the moves are generated, and if there's a hash entry, I get the move from it; without paying attention to the depth the hash entry was stored for. Can this be done for the move, or should the depth be equal or greater, just as for the values?
- Then I sort the moves.
- I save a "best_move" for the move loop, to put in the hash table when Alpha is raised (where the PV is also updated)
- I put this best move into the hash table when I write the entry. I write an entry where Beta is raised, and just before Alpha is returned (same as is shown on Bruce Moreland's site). I also save the hash flags and evaluation score, but as said, I don't use them yet.
- I write "depth" into the hash entry... intuitively, this feels correct, but by checking some engines, I have seen one that writes ply's.

Some results:

For example, this is a run in a K+P vs. K endgame, without a hash table:

Code: Select all

info score cp 140 depth 1 seldepth 1 time 0 nodes 7 nps 0 pv e2e4
info score cp 90 depth 2 seldepth 2 time 0 nodes 26 nps 0 pv e2e4 e8f7
info score cp 140 depth 3 seldepth 3 time 0 nodes 100 nps 0 pv e1f2 e8f7 e2e4
info score cp 110 depth 4 seldepth 4 time 0 nodes 394 nps 0 pv e1f2 e8d7 e2e4 d7c6
info score cp 140 depth 5 seldepth 5 time 0 nodes 843 nps 0 pv e1d2 e8d7 d2c3 d7c6 e2e4
info score cp 110 depth 6 seldepth 6 time 0 nodes 3666 nps 0 pv e1d2 e8d7 d2c3 d7d6 c3d4 d6c6
info score cp 180 depth 7 seldepth 8 time 2 nodes 11114 nps 5557000 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6d6 e2e4
info score cp 180 depth 8 seldepth 10 time 7 nodes 52829 nps 7547000 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6b5 e2e4 b5c6
info score cp 190 depth 9 seldepth 10 time 27 nodes 141749 nps 5249963 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6d6 e2e4 d6c6 e4e5
info score cp 190 depth 10 seldepth 12 time 90 nodes 624560 nps 6939556 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6d6 e2e4 d6e6 e4e5 e6f5
info score cp 190 depth 11 seldepth 14 time 293 nodes 1523320 nps 5199044 pv e1d2 e8d7 d2c3 d7c6 c3c4 c6d6 c4d4 d6c6 e2e4 c6d6 e4e5 d6c6
info score cp 190 depth 12 seldepth 16 time 829 nodes 5604808 nps 6760926 pv e1d1 e8d7 d1c2 d7c6 c2d3 c6b5 d3d4 b5c6 e2e4 c6d6 e4e5 d6e6 d4e4
info score cp 190 depth 13 seldepth 18 time 3158 nodes 16045639 nps 5080950 pv e1d1 e8d7 d1c2 d7c6 c2c3 c6c5 e2e4 c5d6 c3d4 d6e6 e4e5 e6f5 d4d5
info score cp 190 depth 14 seldepth 20 time 9473 nodes 62300713 nps 6576661 pv e1d1 e8d7 d1c1 d7c6 c1c2 c6b5 c2c3 b5c5 e2e4 c5c6 c3d4 c6d6 e4e5 d6e6 d4e4
info score cp 190 depth 15 seldepth 22 time 44420 nodes 215290136 nps 4846694 pv e1d1 e8d7 d1c1 d7c6 c1c2 c6b5 c2c3 b5c5 e2e4 c5d6 c3d4 d6e6 e4e5 e6f5 d4d5

Same run with a 32 MB Hash table (which in the end is only used 6.1%, because there are a lot of transpositions)

Code: Select all

info score cp 140 depth 1 seldepth 1 time 0 nodes 7 nps 0 pv e2e4
info score cp 90 depth 2 seldepth 2 time 0 nodes 21 nps 0 pv e2e4 e8f7
info score cp 140 depth 3 seldepth 3 time 0 nodes 55 nps 0 pv e2e4 e8f7 e1f2
info score cp 110 depth 4 seldepth 4 time 0 nodes 185 nps 0 pv e2e4 e8f7 e1f2 f7e6
info score cp 140 depth 5 seldepth 5 time 0 nodes 412 nps 0 pv e2e4 e8f7 e1f2 f7e6 f2e3
info score cp 110 depth 6 seldepth 6 time 0 nodes 2449 nps 0 pv e1d2 e8d7 d2c3 d7d6 c3d4 d6c6
info score cp 180 depth 7 seldepth 8 time 1 nodes 7339 nps 7339000 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6d6 e2e4
info score cp 180 depth 8 seldepth 10 time 6 nodes 36287 nps 6047833 hashfull 2 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6b5 e2e4 b5c6
info score cp 190 depth 9 seldepth 10 time 23 nodes 100365 nps 4363696 hashfull 3 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6d6 e2e4 d6c6 e4e5
info score cp 190 depth 10 seldepth 12 time 85 nodes 526843 nps 6198153 hashfull 6 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6d6 e2e4 d6e6 e4e5 e6f5
info score cp 190 depth 11 seldepth 13 time 270 nodes 1274079 nps 4718811 hashfull 10 pv e1d2 e8d7 d2c3 d7c6 c3c4 c6d6 c4d4 d6c6 e2e4 c6d6 e4e5 d6c6
info score cp 190 depth 12 seldepth 16 time 780 nodes 4822260 nps 6182385 hashfull 14 pv e1d1 e8d7 d1c2 d7c6 c2d3 c6b5 d3d4 b5c6 e2e4 c6d6 e4e5 d6e6 d4e4
info score cp 190 depth 13 seldepth 18 time 2862 nodes 13443760 nps 4697331 hashfull 22 pv e1d1 e8d7 d1c2 d7c6 c2c3 c6c5 e2e4 c5d6 c3d4 d6e6 e4e5 e6f5 d4d5
info score cp 190 depth 14 seldepth 20 time 8767 nodes 53252243 nps 6074169 hashfull 36 pv e1d1 e8d7 d1c1 d7c6 c1c2 c6b5 c2c3 b5c5 e2e4 c5c6 c3d4 c6d6 e4e5 d6e6 d4e4
info score cp 190 depth 15 seldepth 22 time 39418 nodes 178134146 nps 4519107 hashfull 61 pv e1d1 e8d7 d1c1 d7c6 c1c2 c6b5 c2c3 b5c5 e2e4 c5d6 c3d4 d6e6 e4e5 e6f5 d4d5

As can be seen, at depth 15, the time without the hash table is 44.4 seconds. With hash table, the time is 39.42 seconds. The search tree has been cut from ~215 million moves, down to ~178 million moves. So yes, it does seem that the hash move sorting is working, but I had expected a lot more from it. One of the CCRL-testers said: "As soon as you implement a hash table, your engine will be at least at 2200 Elo". I don't really believe that anymore, except if the search cuts by hash value gives a massive improvement. I also still need to implement all the other sorting options, such as PV Move sort, killers, and heuristics. It could of course be that all of these sorting options, together with the search cut by hash value, gives a huge boost: in the same way as ELO_GAIN(PST + SQ) > (ELO_GAIN(PST) + ELO_GAIN(SQ))

I have a feeling my expectations are somewhat out of line here.

PS: I have checked the scoring. It works. I have a MOVE 64-bit int (within a struct), which also holds the sort score. The struct has a function called "add_value"; this is used to add the MVV_LVA value, and also the hash move value. It basically gets the current score, XOR's it out of the move, adds current+value together, shifts it left the correct number of bits, and XOR's it in again. I have checked the before and after, and the hash move score is correctly added.

mvanthoor · Post by **mvanthoor** » Tue Mar 02, 2021 11:45 pm

From the starting position, no hash table:

Code: Select all

info score cp 70 depth 1 seldepth 1 time 0 nodes 7 nps 0 pv e2e4
info score cp 0 depth 2 seldepth 4 time 0 nodes 114 nps 0 pv d2d4 d7d5
info score cp 60 depth 3 seldepth 8 time 0 nodes 1037 nps 0 pv d2d4 d7d5 e2e3
info score cp 0 depth 4 seldepth 11 time 5 nodes 16662 nps 3332400 pv d2d4 d7d5 c1g5 c8g4
info score cp 20 depth 5 seldepth 15 time 55 nodes 180683 nps 3285145 pv d2d4 d7d5 c1f4 c8g4 f4e5
info score cp 5 depth 6 seldepth 18 time 403 nodes 1548514 nps 3842467 pv d2d4 d7d5 e2e3 c8d7 c2c4 e7e6 c4d5 e6d5
info score cp 15 depth 7 seldepth 24 time 3350 nodes 12511658 nps 3734823 pv c2c4 b8c6 d2d3 e7e5 e2e4 f8b4 c1d2 d7d6 d2b4 c6b4
info score cp 5 depth 8 seldepth 26 time 27462 nodes 112472885 nps 4095582 pv e2e4 d7d5 e4e5 c8d7 d2d4 e7e6 c2c4 f8e7 c4d5 e6d5

From the starting position, 64 MB hash table:

Code: Select all

info score cp 70 depth 1 seldepth 1 time 0 nodes 7 nps 0 pv e2e4
info score cp 0 depth 2 seldepth 4 time 0 nodes 82 nps 0 pv e2e4 e7e5
info score cp 60 depth 3 seldepth 8 time 0 nodes 473 nps 0 pv e2e4 e7e5 d2d4
info score cp 0 depth 4 seldepth 10 time 2 nodes 6122 nps 3061000 pv e2e4 e7e5 d2d4 d7d5
info score cp 20 depth 5 seldepth 14 time 43 nodes 122093 nps 2839372 hashfull 2 pv e2e4 e7e5 d2d4 d7d5 c1e3
info score cp 5 depth 6 seldepth 17 time 327 nodes 1146897 nps 3507330 hashfull 22 pv d2d4 d7d5 e2e3 c8d7 c2c4 e7e6 c4d5 e6d5
info score cp 15 depth 7 seldepth 24 time 2659 nodes 8952407 nps 3366832 hashfull 151 pv c2c4 b8c6 d2d3 e7e5 e2e4 f8b4 c1d2 d7d6 d2b4 c6b4
info score cp 5 depth 8 seldepth 26 time 18988 nodes 70583728 nps 3717281 hashfull 669 pv e2e4 d7d5 e4e5 c8d7 d2d4 e7e6 c2c4 f8e7 c4d5 e6d5

As before, only probing for the move, and sorting on it.

At depth 8, the search tree became reduced from 112 million nodes to 70 million nodes, a drop of 37.5%. That is no small improvemnt. The needed time drops from 27.5 seconds down to 18.9 seconds. That's also massively better. However, these speedups are not enough to go from depth 7 to 8 in an average middle-game, on fast time controls such as 1m+0.6s or 2m+1s.

So I'm still wondering if I'm doing something wrong, or, just as Lithander, my expectations were too high, and that I just need even MORE move sorting before it really starts to pay off. (And of course implement the cut-off of the entire move loop on the basis of the hash value.)

emadsen · Post by **emadsen** » Tue Mar 02, 2021 11:49 pm

lithander wrote: ↑Tue Mar 02, 2021 4:59 pm 1.) I implemented QSearch and MVV-LVA move ordering and didn't measure a significant ELO improvement in my test. I assumed a bug but couldn't find one. So I came here looking for advice.

2.) The first great tip was to not use time controls but measure to a fixed depth. Now I saw a small improvement in strength which was negated by my inefficient (e.g. use of LINQ) implementation under time (and not depth) constraints.

3.) Why was it only small? Shouldn't you expect much larger gains? Marcel disabled PSTs in Rustics evaluation and was now only counting material like MinimalChess and found that now the QSearch didn't provide him much benefit either.
This was the 2nd learning: You need a better eval for QSearch to provide a significant boost to playing strength. My expectations were wrong. So I implemented PSTs.

Thomas, thanks for the thread summary.

I disagree with #2. In my experience, fixed depth games cannot accurately measure engine strength, either in a gauntlet against other engines or in a self-testing match. Like NPS, depth is a nebulous concept, meaning one thing to engine A and another to engine B (due to variations in how they implement reductions and extensions). In self-testing, fixed-depth matches create a hall of mirrors. You don't know what you're looking at. A real strength increase? Or an illusion due to an eval term in the modified engine enabling it to exploit the unmodified engine, yet unable to perform any better against other engines (because the "bookkeeping" costs of calculating the eval term offset the benefits of the knowledge- meaning the engine is smarter but slower).

#3 makes no sense to me. An engine that evaluates positions solely by material is strategically stupid (not knowing how to develop pieces) and ignorant of the danger posed by passed pawns and attacks on squares near the king. However, if it has a correct quiescence search implementation, it can find discovered checks, forks, attacks on overloaded pieces, etc... all of which win material.

I took MadChess 3.0 Beta (2570 Elo), removed all eval terms except material, and ran a gauntlet tournament against other engines of varying strength, from TSCP 1.81 (1751 Elo) to Sungorus 1.4 (2311 Elo) at 2m+1s time control. I got this result:

Code: Select all

MadChess 3.0 Beta, Material Only,   No Quiescence = 1746 Elo
MadChess 3.0 Beta, Material Only, With Quiescence = 1915 Elo

This suggests a correct implementation of quiescence search is worth 169 Elo. I only had time for a quick test so the error bars are like +/- 50 Elo.

MadChess 3.0 Beta has a fairly fast bitboard implementation, doesn't allocate memory during search, has a hash table for score cutoffs and best moves, orders quiet moves by history heuristic score, implements null move and LMR (reducing between 3 and 6 ply), etc...

So perhaps my experience differs from yours due to me coming at the problem from the opposite direction as you: I've stripped down a mature engine. You're building up a new engine. Perhaps the difference between me finding quiescence search to be worth 169 Elo and you and Marcel finding it not worth much at all (both for material-only eval) can be explained away by the non-linear manner in which chess engine features combine. But, in my opinion, I don't think so. An engine with a correct quiescence search implementation should be stronger than one without it, even if it evaluates only material, because it's less vulnerable to hanging pieces due to the horizon effect.

Of course if you pit both versions of the engine against competitors that are > 300 Elo points stronger, it's likely to lose so often that quiescence search doesn't matter. But when pitted against nearly equal competitors, the version with quiescence search should be measurably stronger.

Ras · Post by **Ras** » Tue Mar 02, 2021 11:56 pm

mvanthoor wrote: ↑Tue Mar 02, 2021 11:45 pmAs before, only probing for the move, and sorting on it.

You won't reach much more than that except in the endgame because that's where actual transpositions happen a lot, in particular in king-pawn endgames. And yes, you can always take the move from a hash table hit. For cutting off however, your remaining search depth must not be larger than the depth from the hash table hit.

Mike Sherwin · Post by **Mike Sherwin** » Wed Mar 03, 2021 12:01 am

mvanthoor wrote: ↑Tue Mar 02, 2021 11:33 pm
Mike Sherwin wrote: ↑Tue Mar 02, 2021 10:26 pm Hmm, I was not trying to fool anyone. So, I'm glad you pointed that out.
And here I am, implementing hash move sorting... it does seem to work (the hash move gets a score high enough to move it to the front of the move list, as it is added to the MVV-LVA score that may have already been there, and is higher than the highest MVV-LVA score), and in most positions, there's a massive cut in the search tree... but it still only gains me some time; not even enough to make an extra ply in the middle game.

Talk about expectations.

I'm half expecting that one of these sorting options will "magically" get me from depth 7-8 to somewhere around 14, but it seems that, to some extent, the gain the hash move sorting makes, is partially off-set by the speed loss the engine incurs due to the probe.

Note the following (I might be doing something wrong here):
- I am NOT yet cutting off the entire search by hash value; I'm ONLY probing for the move to test the move sorting.
- I do the probe right before the moves are generated, and if there's a hash entry, I get the move from it; without paying attention to the depth the hash entry was stored for. Can this be done for the move, or should the depth be equal or greater, just as for the values?
- Then I sort the moves.
- I save a "best_move" for the move loop, to put in the hash table when Alpha is raised (where the PV is also updated)
- I put this best move into the hash table when I write the entry. I write an entry where Beta is raised, and just before Alpha is returned (same as is shown on Bruce Moreland's site). I also save the hash flags and evaluation score, but as said, I don't use them yet.
- I write "depth" into the hash entry... intuitively, this feels correct, but by checking some engines, I have seen one that writes ply's.

Some results:

For example, this is a run in a K+P vs. K endgame, without a hash table:
Code: Select all
info score cp 140 depth 1 seldepth 1 time 0 nodes 7 nps 0 pv e2e4
info score cp 90 depth 2 seldepth 2 time 0 nodes 26 nps 0 pv e2e4 e8f7
info score cp 140 depth 3 seldepth 3 time 0 nodes 100 nps 0 pv e1f2 e8f7 e2e4
info score cp 110 depth 4 seldepth 4 time 0 nodes 394 nps 0 pv e1f2 e8d7 e2e4 d7c6
info score cp 140 depth 5 seldepth 5 time 0 nodes 843 nps 0 pv e1d2 e8d7 d2c3 d7c6 e2e4
info score cp 110 depth 6 seldepth 6 time 0 nodes 3666 nps 0 pv e1d2 e8d7 d2c3 d7d6 c3d4 d6c6
info score cp 180 depth 7 seldepth 8 time 2 nodes 11114 nps 5557000 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6d6 e2e4
info score cp 180 depth 8 seldepth 10 time 7 nodes 52829 nps 7547000 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6b5 e2e4 b5c6
info score cp 190 depth 9 seldepth 10 time 27 nodes 141749 nps 5249963 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6d6 e2e4 d6c6 e4e5
info score cp 190 depth 10 seldepth 12 time 90 nodes 624560 nps 6939556 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6d6 e2e4 d6e6 e4e5 e6f5
info score cp 190 depth 11 seldepth 14 time 293 nodes 1523320 nps 5199044 pv e1d2 e8d7 d2c3 d7c6 c3c4 c6d6 c4d4 d6c6 e2e4 c6d6 e4e5 d6c6
info score cp 190 depth 12 seldepth 16 time 829 nodes 5604808 nps 6760926 pv e1d1 e8d7 d1c2 d7c6 c2d3 c6b5 d3d4 b5c6 e2e4 c6d6 e4e5 d6e6 d4e4
info score cp 190 depth 13 seldepth 18 time 3158 nodes 16045639 nps 5080950 pv e1d1 e8d7 d1c2 d7c6 c2c3 c6c5 e2e4 c5d6 c3d4 d6e6 e4e5 e6f5 d4d5
info score cp 190 depth 14 seldepth 20 time 9473 nodes 62300713 nps 6576661 pv e1d1 e8d7 d1c1 d7c6 c1c2 c6b5 c2c3 b5c5 e2e4 c5c6 c3d4 c6d6 e4e5 d6e6 d4e4
info score cp 190 depth 15 seldepth 22 time 44420 nodes 215290136 nps 4846694 pv e1d1 e8d7 d1c1 d7c6 c1c2 c6b5 c2c3 b5c5 e2e4 c5d6 c3d4 d6e6 e4e5 e6f5 d4d5
Same run with a 32 MB Hash table (which in the end is only used 6.1%, because there are a lot of transpositions)
Code: Select all
info score cp 140 depth 1 seldepth 1 time 0 nodes 7 nps 0 pv e2e4
info score cp 90 depth 2 seldepth 2 time 0 nodes 21 nps 0 pv e2e4 e8f7
info score cp 140 depth 3 seldepth 3 time 0 nodes 55 nps 0 pv e2e4 e8f7 e1f2
info score cp 110 depth 4 seldepth 4 time 0 nodes 185 nps 0 pv e2e4 e8f7 e1f2 f7e6
info score cp 140 depth 5 seldepth 5 time 0 nodes 412 nps 0 pv e2e4 e8f7 e1f2 f7e6 f2e3
info score cp 110 depth 6 seldepth 6 time 0 nodes 2449 nps 0 pv e1d2 e8d7 d2c3 d7d6 c3d4 d6c6
info score cp 180 depth 7 seldepth 8 time 1 nodes 7339 nps 7339000 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6d6 e2e4
info score cp 180 depth 8 seldepth 10 time 6 nodes 36287 nps 6047833 hashfull 2 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6b5 e2e4 b5c6
info score cp 190 depth 9 seldepth 10 time 23 nodes 100365 nps 4363696 hashfull 3 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6d6 e2e4 d6c6 e4e5
info score cp 190 depth 10 seldepth 12 time 85 nodes 526843 nps 6198153 hashfull 6 pv e1d2 e8d7 d2c3 d7c6 c3d4 c6d6 e2e4 d6e6 e4e5 e6f5
info score cp 190 depth 11 seldepth 13 time 270 nodes 1274079 nps 4718811 hashfull 10 pv e1d2 e8d7 d2c3 d7c6 c3c4 c6d6 c4d4 d6c6 e2e4 c6d6 e4e5 d6c6
info score cp 190 depth 12 seldepth 16 time 780 nodes 4822260 nps 6182385 hashfull 14 pv e1d1 e8d7 d1c2 d7c6 c2d3 c6b5 d3d4 b5c6 e2e4 c6d6 e4e5 d6e6 d4e4
info score cp 190 depth 13 seldepth 18 time 2862 nodes 13443760 nps 4697331 hashfull 22 pv e1d1 e8d7 d1c2 d7c6 c2c3 c6c5 e2e4 c5d6 c3d4 d6e6 e4e5 e6f5 d4d5
info score cp 190 depth 14 seldepth 20 time 8767 nodes 53252243 nps 6074169 hashfull 36 pv e1d1 e8d7 d1c1 d7c6 c1c2 c6b5 c2c3 b5c5 e2e4 c5c6 c3d4 c6d6 e4e5 d6e6 d4e4
info score cp 190 depth 15 seldepth 22 time 39418 nodes 178134146 nps 4519107 hashfull 61 pv e1d1 e8d7 d1c1 d7c6 c1c2 c6b5 c2c3 b5c5 e2e4 c5d6 c3d4 d6e6 e4e5 e6f5 d4d5
As can be seen, at depth 15, the time without the hash table is 44.4 seconds. With hash table, the time is 39.42 seconds. The search tree has been cut from ~215 million moves, down to ~178 million moves. So yes, it does seem that the hash move sorting is working, but I had expected a lot more from it. One of the CCRL-testers said: "As soon as you implement a hash table, your engine will be at least at 2200 Elo". I don't really believe that anymore, except if the search cuts by hash value gives a massive improvement. I also still need to implement all the other sorting options, such as PV Move sort, killers, and heuristics. It could of course be that all of these sorting options, together with the search cut by hash value, gives a huge boost: in the same way as ELO_GAIN(PST + SQ) > (ELO_GAIN(PST) + ELO_GAIN(SQ))

I have a feeling my expectations are somewhat out of line here.

PS: I have checked the scoring. It works. I have a MOVE 64-bit int (within a struct), which also holds the sort score. The struct has a function called "add_value"; this is used to add the MVV_LVA value, and also the hash move value. It basically gets the current score, XOR's it out of the move, adds current+value together, shifts it left the correct number of bits, and XOR's it in again. I have checked the before and after, and the hash move score is correctly added.

Just for move ordering I don't think the depth matters. I do think that you should get a lot more than what you did though. And 2200 from 1500 just because of adding a TT is nowhere near reality. Stockfish is Stockfish because it has "hundreds" of little things all finely tuned that give sometimes only a single elo and sometimes a bit more.

Unless a super genius comes along, and I'm talking like a dignitary from heaven, the days of the lone programmer competing with the top engines is way past us. I mean without digging into SF code for example. It would be better if Lithander, you and I joined forces and we can invite HGM to join with us also as well as anyone else to produce something new. I don't know how that would work but it would be excellent if it can be done. It would require each of us keeping our egos in check though, lol!

Mike Sherwin · Post by **Mike Sherwin** » Wed Mar 03, 2021 12:03 am

Referring back to Bricabrac and the unusual code it comes from a basic idea that produces this.
FEN: rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1

Bricabrac:
1 00:00 20 20 +0.33 b1c3
2 00:00 218 218 +0.12 e2e4 d7d5
3 00:00 974 974 +0.34 d2d4 b8c6 e2e4
4 00:00 6k 6k +0.18 g1f3 d7d5 d2d4 e7e6
5 00:00 8k 8k +0.18 g1f3 d7d5 d2d4 g8f6 e2e3
6 00:00 18k 18k +0.14 b1c3 g8f6 e2e4 d7d5 e4d5 f6d5
7 00:00 35k 35k +0.18 b1c3 g8f6 g1f3 d7d5 d2d4 b8c6 e2e3
8 00:00 63k 63k +0.17 b1c3 g8f6 e2e4 b8c6 g1f3 d7d5 e4d5 f6d5
9 00:00 192k 19,224k +0.22 b1c3 g8f6 d2d4 d7d5 c1f4 c8f5 c3b5 b8a6 e2e3
10 00:00 408k 13,604k +0.18 b1c3 g8f6 g1f3 d7d5 d2d4 b8c6 c1f4 e7e6 a1c1 h7h6
11 00:00 1,199k 11,991k +0.30 b1c3 g8f6 e2e4 b8c6 g1f3 e7e5 f1b5 f8d6 b5c6 d7c6 e1g1
12 00:00 2,300k 12,104k +0.11 b1c3 g8f6 g1f3 d7d5 d2d4 b8c6 c1e3 c8e6 a1c1 h7h6 h2h3 a7a6
13 00:00 3,994k 12,481k +0.09 b1c3 g8f6 g1f3 d7d5 d2d4 b8c6 c1e3 c8e6 a1c1 a8c8 h2h3 h7h6 a2a3
14 00:00 6,884k 12,294k +0.18 b1c3 g8f6 g1f3 d7d5 d2d4 b8c6 f3e5 c8e6 e5c6 b7c6 c1f4 g7g6 f4e5 f8g7
15 00:01 12,972k 12,238k +0.07 b1c3 g8f6 g1f3 d7d5 d2d4 c8f5 c1e3 b8c6 a1c1 a8c8 h2h3 e7e6 g2g3 f8d6 f1g2
16 00:02 31,007k 11,880k +0.14 b1c3 g8f6 g1f3 d7d5 d2d4 b8c6 c1f4 c8e6 a1c1 a8c8 e2e3 f6h5 f4g5 h7h6 g5h4 a7a6
17 00:05 61,884k 11,993k +0.09 b1c3 g8f6 g1f3 d7d5 d2d4 b8c6 c1e3 c8e6 a1c1 a8c8 h2h3 h7h6 a2a3 g7g6 g2g3 f8g7 f1g2
18 00:11 135,566k 11,799k +0.10 b1c3 g8f6 g1f3 d7d5 d2d4 c8e6 c1f4 b8c6 e2e3 f6h5 f4g5 h7h6 g5h4 g7g6 f1d3 f8g7 e1g1 e8g8
19 00:23 281,382k 11,964k +0.35 e2e4 g8f6 e4e5 f6d5 g1f3 c7c5 b1c3 d5c3 d2c3 d7d5 c1e3 e7e6 f1b5 b8c6 e1g1 c8d7 a2a4 a8c8 h2h3

If the final pv is explored with SF13 it is quite reasonable with no obvious blunders except maybe positional. I did this in RomiChess a long time ago and the idea in its most raw form lost 50 elo even though Romi was searching 40+ ply deep in about a minute. I think it is worth losing 50 elo vs humans to show more of the expected game. Anyway, I could not figure out how to tune it to gain elo so I lost interest in the idea. I'm just taking a little time with Bric to see if it can gain some elo. Here is the pertinent code within search. Just the most basic idea that produced the above output.

Code: Select all

    if (depth > 3) {
      mov->score = -Search(t, m + n, -beta, -alpha, depth - 3);
      if (mov->score > alpha && mov->score < beta)
        mov->score = -Search(t, m + n, -beta, -alpha, depth - 1);
    }
    else {
      mov->score = -Search(t, m + n, -beta, -alpha, depth - 1);
    }

And sorry to Lithander for hijacking his thread.

mvanthoor · Post by **mvanthoor** » Wed Mar 03, 2021 12:06 am

emadsen wrote: ↑Tue Mar 02, 2021 11:49 pm So perhaps my experience differs from yours due to me coming at the problem from the opposite direction as you: I've stripped down a mature engine. You're building up a new engine. Perhaps the difference between me finding quiescence search to be worth 169 Elo and you and Marcel finding it not worth much at all (both for material-only eval)

It would be quicker to run a test again, than to try to find what I said in this thread

I have seen a noticable improvement with QSearch enabled, a noticable improvement with PST's on top of material only, and a massive (more than 500 Elo) with both enabled.

Of course if you pit both versions of the engine against competitors that are > 300 Elo points stronger, it's likely to lose so often that quiescence search doesn't matter. But when pitted against nearly equal competitors, the version with quiescence search should be measurably stronger.

It is; at least in Rustic's case.

mvanthoor · Post by **mvanthoor** » Wed Mar 03, 2021 12:14 am

Ras wrote: ↑Tue Mar 02, 2021 11:56 pm
mvanthoor wrote: ↑Tue Mar 02, 2021 11:45 pmAs before, only probing for the move, and sorting on it.
You won't reach much more than that except in the endgame because that's where actual transpositions happen a lot, in particular in king-pawn endgames. And yes, you can always take the move from a hash table hit. For cutting off however, your remaining search depth must not be larger than the depth from the hash table hit.

Thanks. I know I have to take the hashflags Alpha, Beta, and Exact into account because of the fact that cutoffs are only bounds, and not exact values; and because cutting off on a result searched less deep than where you are, wouldn't be good.

Even in the endgame, move sorting only doesn't cut the tree enough to be able able to occasionally reach one ply more than a version of Rustic without this feature. The problem here is that Rustic itself is so fast (>5 million nps in the endgame, in some positions), that the search tree cut gotten by the hash move move probe/sort is offset by the speed loss of the engine overall. So the move tree may be smaller, but the engine also searches it slower.

I've started a match between Alpha 1, and the current version, with the only difference being the hash move sorting for the current version. At this point, after 30 out of 60 games at 1m+0.6s (quick test, will run a 200 game test overnight at 2m+1s), the current version WITH hash move sorting stands at 60% score, or +70 Elo. (But the error bars are huge, obviously.)

Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength