Quiescence Search doesn't improve strength

mvanthoor · Post by **mvanthoor** » Thu Feb 25, 2021 6:32 pm

lithander wrote: ↑Thu Feb 25, 2021 5:41 pm Thanks for doing that! I'm (almost ) convinced that I need PSTs in my chess engine! (No, really, I will add them asap)

If you'd run Rustic PASTA against a Rustic PASTA without QSearch is there a significant difference between both versions? If yes: My implementation is probably faulty. If they both play 'abysmal' without much difference my QSearch may actually be okay and the evaluation just needs to be better before it shows.

In the previous match, PASTA actually managed to win 3 games and draw 4, which put it at -480 Elo to Alpha 1. (= +/- CCRL 1200 for PASTA).

I have disabled QSearch in Rustic PASTA, to create Rustic YOLO (You Only Live Once).

PASTA = no PST
YOLO = no PST, no QSearch

After about 30 games out of 100, PASTA is +105 Elo ahead. I'll post the final result.

It's not fun to watch though. Both engines are floundering; if the position ends up qiuet, they shuffle pieces back and forth (because of no PST, because everything that doesn't lose material is equally good), and many of those games end in 3-fold draws. If the opening ends up in a somewhat sharp position, YOLO sometimes makes a mistake. The engine can see 7-8 plies ahead (because of no QSearch and that one removed eval term), and that's deep enough to not make any massively blatant blunders; but it sometimes does lose material, and thus the game.

Both versions still have check extensions, but at seldepth, it's clearly visible that the version with QSearch extends MUCH further; often up to 10 ply extra.

I might run a test of Alpha 1 against Alpha 0.9 (version without QSearch) to see what the result is there.

mvanthoor · Post by **mvanthoor** » Thu Feb 25, 2021 6:36 pm

In short:

Rustic Alpha 1: 1680 Elo
Rustic PASTA: 1200 Elo (without PST, -480 Elo)
Rustic YOLO: 1095 Elo (without PST, without QSearch, -105 Elo)

Rustic ends up at a tiny bit stronger than MinimalChess.

So if we take MinimalChess:

0.2: 1050 Elo
0.2 + QSearch: 1155 Elo (+105)
0.2 + QSearch + PST: 1635 Elo (+480)

MinimalChess would end up a bit weaker than Rustic Alpha 1... assuming your PST's give the same Elo boost as mine (you could try by actually using mine; I don't mind, as the program is open source), you have no bugs, and the speed/search depth is comparable.

mvanthoor · Post by **mvanthoor** » Thu Feb 25, 2021 7:20 pm

After 100 games, the difference between PASTA (no PST) and YOLO (no PST, no QSearch) is +79 Elo for PASTA. (edit: with fairly large error bars, so I still think +/- 100 Elo as mentioned by Sven, is possible at the least.)

The one thing I hadn't expected was the MASSIVE Elo boost provided by the PST's. After the hash table, hash move ordering, and killer/heuristic ordering are done, I'll go and research Texel tuning, to tune the PST's and material values. I wonder how much Elo that last step will gain. I'm seeing values between 70-120 Elo for different engines. But that step is for later

mhouppin · Post by **mhouppin** » Thu Feb 25, 2021 9:58 pm

FWIW, I have been collecting data for two weeks about the rating differences between each version of Stash.
Version 9 has a PST-only eval, and a simple Negascout search. Same for version 10, except there is a quiescence search on the top, and nothing else.
I ran a 200-game minimatch at 60+0.6 between the two versions, using a 2-move (4-ply) book. Here are the results:

Code: Select all

   # PLAYER          :  RATING  ERROR  POINTS  PLAYED   (%)  CFS(%)    W    D    L  D(%)
   1 Stash v10.0     :   416.7   80.6   183.0     200  91.5     100  178   10   12   5.0
   2 Stash v9.0.1    :     0.0   ----    17.0     200   8.5     ---   12   10  178   5.0

Quiescence search is in my opinion a must-have component in any A/B engine to get above 1600-1700 Elo, so not gaining anything from it suggests that there is a bug somewhere. First thing I would check is if you have a stand-pat evaluation in it. Not having it will make the engine think all the captures are "forced", when most of the time they are not, and there are better quiet moves lurking around: https://www.chessprogramming.org/Quiesc ... anding_Pat

mvanthoor · Post by **mvanthoor** » Fri Feb 26, 2021 12:48 am

mhouppin wrote: ↑Thu Feb 25, 2021 9:58 pm FWIW, I have been collecting data for two weeks about the rating differences between each version of Stash.
Version 9 has a PST-only eval, and a simple Negascout search. Same for version 10, except there is a quiescence search on the top, and nothing else.

Heh. Today I ran my own test, between Rustic Alpha 1 (PST + QSearch) and a version with PST but without QSearch. My results were basically the same as yours:

Code: Select all

Score of Rustic 2021-02-15 vs Rustic Alpha 0.9: 175 - 10 - 15 [0.912]
...      Rustic 2021-02-15 playing White: 88 - 4 - 8  [0.920] 100
...      Rustic 2021-02-15 playing Black: 87 - 6 - 7  [0.905] 100
...      White vs Black: 94 - 91 - 15  [0.507] 200
Elo difference: 407.3 +/- 78.2, LOS: 100.0 %, DrawRatio: 7.5 %
200 of 200 games finished.

So not having PST's loses +/- 520 Elo. Not having QSearch loses +/- 400 Elo. Not having both loses +/- 625 Elo.

Lithander: If I substract 625 Elo from Rustic's current CCRL rating, I get: 1680 - 625 = 1055. MinimalChess 0.2, which also doesn't have PST's or QSearch (does it have move ordering?) sits at 1050. It seems to pan out. If you have MVV-LVA in 0.2, and you add some good PST's and QSearch without bugs, MinimalChess should be able to reach a rating somewhere between 1650, +/- 25 Elo or thereabout.

Is it a bitboard engine? Sorry, I forgot. At this point it doesn't matter too much, because both our evaluations are small, but a bitboard engine has a measurable advantage in speed with bigger evaluation functions (and they're easier to write). That counts, because the evaluation is one of the most called function in the program.

lithander · Post by **lithander** » Fri Feb 26, 2021 3:23 am

I have run tests with limited search depth now, between the version with QSearch and one without. It seems like there are diminishing returns on the gained strength with increasing depth. And these drops in effectiveness happen every 2nd ply, apparently.

Code: Select all

100 Games, 2 plies depth:  L: 2 - W: 96 - D: 2
  +600 Elo (+/- nan)
100 Games, 3 plies depth:  L: 2 - W: 95 - D: 3
  +576 Elo (+/- 267.5)

100 Games, 4 plies depth:  L: 14 - W: 69 - D: 17
  +214 ELO (+/- 72.8)
100 Games, 5 plies depth:  L: 15 - W: 75 - D: 10
  +240.8 ELO (+/- 81.0)

100 Games, 6 plies depth:  L: 22 - W: 54 - D: 24
  +115.2 ELO (+/- 62.3)
100 Games, 7 plies depth:  L: 27 - W: 58 - D: 15
  +111.4 ELO (+/- 74.5)

If these diminishing returns are not a sign of a bug then my qsearch might actually produce the correct results. But of course it is a much more expensive way of evaluating a node than my super-simple material counting eval which costs almost nothing. Maybe in time constrained matches (not depth limited ones) qsearch brings you these 100 ELO but the searches end up shallower on average losing you 100 ELO and in the end it all cancels out.

...but Marcels experiments suggest that Qsearch together with PSTs are much more powerful then 100 ELO. And so I tried Marcels PSTs. And because it was getting late I made only one quick tournament of 600 games total limited to 5 plies depth.

Code: Select all

Rank Name                     Elo     +/-   Games   Score    Draw 
   1 MinimalChess PST+Q      538     102     300   95.7%    1.3% 
   2 MinimalChess Q           10      38     300   51.5%    7.7% 
   3 MinimalChess PST        -96      39     300   36.5%    8.3% 
   4 MinimalChess           -284      51     300   16.3%    6.7%

The combination of PSTs and QSearch clearly outperformed the version that had either QSearch or PSTs in isolation. It matches your results where taking one component away loses you 400-500 ELO and leaves the remaining component crippled. That's encouraging, I think!

But it has yet to prove that it also performs well under time limits. But if it doesn't at least it's only too slow and not wrong.

mhouppin wrote: ↑Thu Feb 25, 2021 9:58 pm Quiescence search is in my opinion a must-have component in any A/B engine to get above 1600-1700 Elo, so not gaining anything from it suggests that there is a bug somewhere. First thing I would check is if you have a stand-pat evaluation in it.

I definitely have stand-pat in there and I also try to handle the case where a capture puts the enemy king in check. Should this end the q-search? Or do you allow the king to make a non-capture move to get out of check and continue the capture sequence? (which I do)

hgm · Post by **hgm** » Fri Feb 26, 2021 10:00 am

QS should not slow you down so dramatically that it can compensate a 500 Elo advantage. Two things are very important, though:

1) First try to stand pat, and if that does not score above beta (to give an immediate cutoff), don't forget to make it raise alpha (if it scored above that) before searching any captures.
2) Make sure to the first capture that you search captures the most valuable piece of the opponent you can get.

These two things should be enough to prevent 'search explosion' in QS. Without them, QS would in principle still return the same result, but in some positions would take millions of nodes to evaluate just one position. This manifests itself by the search depth, which normally might be 8-10 ply, suddenly dropping to 2 or 3 ply during a game. Last week I encountered a case in the Janggi engine I am now developing, where due to a bug that messed up (2) it was not even able to finish the 1-ply iteration. (I had forgotten to define the moves as unsigned, so the captures with the larges sort key (= victim value) were interpreted as negative, and thus searched last!)

Sorting all the captures in MVV/LVA order helps to reduce the average number of nodes needed in QS even more, but micro-Max can get by quite well by only extracting the MVV/LVA-wise best move to search that first, and search the other captures in the order they happen to come out of the move generator.

As to checks in QS, there are two ways to treat those:

1) You could simply ignore them. Allow stand-pat, under the assumption that there will be a satisfactory non-capture evasion. (Which will be true most of the time.)
2) Apply a check extension, to search the position to 1 ply, so that also the non-capture evasions will be searched (and stand-pat would not be allowed).

Method (2) is of course more accurate, (it makes you detect the rare cases where the check is a mate earlier), but also slower. When you apply (1), you don't even have to test whether you are in check or not, which also saves time. Micro-Max uses (1), and is completely unaware whether it is in check or not in QS.

mvanthoor · Post by **mvanthoor** » Fri Feb 26, 2021 10:03 am

lithander wrote: ↑Fri Feb 26, 2021 3:23 am I have run tests with limited search depth now, between the version with QSearch and one without. It seems like there are diminishing returns on the gained strength with increasing depth. And these drops in effectiveness happen every 2nd ply, apparently.
Code: Select all
100 Games, 2 plies depth:  L: 2 - W: 96 - D: 2
  +600 Elo (+/- nan)
100 Games, 3 plies depth:  L: 2 - W: 95 - D: 3
  +576 Elo (+/- 267.5)

100 Games, 4 plies depth:  L: 14 - W: 69 - D: 17
  +214 ELO (+/- 72.8)
100 Games, 5 plies depth:  L: 15 - W: 75 - D: 10
  +240.8 ELO (+/- 81.0)

100 Games, 6 plies depth:  L: 22 - W: 54 - D: 24
  +115.2 ELO (+/- 62.3)
100 Games, 7 plies depth:  L: 27 - W: 58 - D: 15
  +111.4 ELO (+/- 74.5)
If these diminishing returns are not a sign of a bug then my qsearch might actually produce the correct results. But of course it is a much more expensive way of evaluating a node than my super-simple material counting eval which costs almost nothing. Maybe in time constrained matches (not depth limited ones) qsearch brings you these 100 ELO but the searches end up shallower on average losing you 100 ELO and in the end it all cancels out.

...but Marcels experiments suggest that Qsearch together with PSTs are much more powerful then 100 ELO. And so I tried Marcels PSTs. And because it was getting late I made only one quick tournament of 600 games total limited to 5 plies depth.
Code: Select all
Rank Name                     Elo     +/-   Games   Score    Draw 
   1 MinimalChess PST+Q      538     102     300   95.7%    1.3% 
   2 MinimalChess Q           10      38     300   51.5%    7.7% 
   3 MinimalChess PST        -96      39     300   36.5%    8.3% 
   4 MinimalChess           -284      51     300   16.3%    6.7% 
The combination of PSTs and QSearch clearly outperformed the version that had either QSearch or PSTs in isolation. It matches your results where taking one component away loses you 400-500 ELO and leaves the remaining component crippled. That's encouraging, I think! But it has yet to prove that it also performs well under time limits. But if it doesn't at least it's only too slow and not wrong.

mhouppin wrote: ↑Thu Feb 25, 2021 9:58 pm Quiescence search is in my opinion a must-have component in any A/B engine to get above 1600-1700 Elo, so not gaining anything from it suggests that there is a bug somewhere. First thing I would check is if you have a stand-pat evaluation in it.
I definitely have stand-pat in there and I also try to handle the case where a capture puts the enemy king in check. Should this end the q-search? Or do you allow the king to make a non-capture move to get out of check and continue the capture sequence? (which I do)

Cool

There has been extensive research about search depth vs. strength (search Google, and you'll find at least a few academic papers). Deminishing returns for QSearch are logical if you think about it. Let's say, one engine can consistently outperform another by 3 ply. Then you get this:

3 ply vs 6ply: advantage: 100%.
12 ply vs. 15 ply: advantage: 25%
30 ply vs. 33 ply: advantage: 10%

The strength difference will diminish, as the search depth will become closer and closer. Same with QSearch: if you search deeper with the normal search, then this is already avoiding the biggest blunders by itself, and QSearch makes it avoid more; the deeper you search, and the further QSearch searches, the more subtle the blunders become. Therefore playing strength will converge.

If I calibrate your table to 0 for the weakest engine, I get this:

Code: Select all

1 MinimalChess PST+Q      822     102     300   95.7%    1.3% 
2 MinimalChess Q          294      38     300   51.5%    7.7% 
3 MinimalChess PST        188      39     300   36.5%    8.3% 
4 MinimalChess              0      51     300   16.3%    6.7%

So 188 points for adding the PST (in my case, that was 520), 294 points for adding the QSearch, and 822 points for adding both (!). However, I tested at the normal 1m+0.6s time controls, not by fixed depth. I'd be quite surprised if you add PST's and QSearch, and your engine would storm to around 1870 Elo. Then I want to know what sort of magic keyboard you have for typing your code.

IS this PST + Q-search version available as a binary already? I'd like to test it, because it's exactly on par with my own engine (assuming it has MVV-LVA, king check extension in alpha/beta, and no other optimizations or hash tables).

lithander · Post by **lithander** » Fri Feb 26, 2021 12:11 pm

mvanthoor wrote: ↑Fri Feb 26, 2021 12:48 am Is it a bitboard engine? Sorry, I forgot. At this point it doesn't matter too much, because both our evaluations are small, but a bitboard engine has a measurable advantage in speed with bigger evaluation functions (and they're easier to write). That counts, because the evaluation is one of the most called function in the program.

No. Up to my first functional min/max version I hadn't discovered this forums yet. I didn't know that I'd eventually have to pit it against Rustic Alpha and so I didn't worry about how others were doing it. In other words I hadn't fallen into this rabbit hole called 'chess programming' yet. I think I didn't even know the words "bitboard" and "mailbox" and just reinvented the wheel because it didn't seem so hard that you'd need to read books or look at other implementations. I still like the resulting move generator code. I think it's pretty, because it's simple. When I forced my wife to watch my making-of videos she understood my explanation of how the move gen works just so. As she's not a programmer I doubt that would have been the case with bitboards!

mvanthoor wrote: ↑Fri Feb 26, 2021 12:48 am So 188 points for adding the PST (in my case, that was 520), 294 points for adding the QSearch, and 822 points for adding both (!). However, I tested at the normal 1m+0.6s time controls, not by fixed depth. I'd be quite surprised if you add PST's and QSearch, and your engine would storm to around 1870 Elo. Then I want to know what sort of magic keyboard you have for typing your code.

IS this PST + Q-search version available as a binary already? I'd like to test it, because it's exactly on par with my own engine (assuming it has MVV-LVA, king check extension in alpha/beta, and no other optimizations or hash tables).

I don't think you can add and subtract ELO points like that. Otherwise, just play against a random mover and win all games and you get infinity ELO and you've solved Chess!

Don't you worry! You're completely safe at this point.

Code: Select all

Score of MinimalChess 0.2.4 PST+Q vs Rustic: 13 - 49 - 13 [0.260]
...      MinimalChess 0.2.4 PST+Q playing White: 7 - 25 - 6  [0.263] 38
...      MinimalChess 0.2.4 PST+Q playing Black: 6 - 24 - 7  [0.257] 37
...      White vs Black: 31 - 31 - 13  [0.500] 75
Elo difference: -181.7 +/- 81.2, LOS: 0.0 %, DrawRatio: 17.3 %
75 of 100 games finished.

...but it's doing good against Sargon 1978 (1249 Elo on CCRL)

Code: Select all

Score of Sargon 1978 vs MinimalChess 0.2.4 PST+Q: 8 - 46 - 10 [0.203]
...      Sargon 1978 playing White: 3 - 24 - 5  [0.172] 32
...      Sargon 1978 playing Black: 5 - 22 - 5  [0.234] 32
...      White vs Black: 25 - 29 - 10  [0.469] 64
Elo difference: -237.5 +/- 97.0, LOS: 0.0 %, DrawRatio: 15.6 %
64 of 100 games finished.

So, I'd estimate MMC with QSearch and PST in it's current form to be at 1400-1500 ELO.

If you really want it I can send you the binary via PM. But I would prefer to take my time to make sure everything these new features are implemented to the best of my ability before I release a version with PSTs and QSearch. C# makes it easy to rapidly prototype stuff before you optimize it. For example the SortMvvLva is only a few lines of code currently and took me 10 minutes to write. I love C# making that possible. But then I've spend hours looking for a bug that probably is just a performance problem. So, as I've just painfully learned, you can implement a perfectly good optimization like qsearch so inefficiently that it negates it's effect entirely under certain conditions (e.g. too simplistic eval, depth > 7plys).

I wouldn't be surprised if the move ordering that looks up pieces in the board array and recomputes the score each time whatever sort-algorithm C# uses behind the scene needs to compare two elements. I know of course how this could be implemented faster but "premature optimization is the root of all evil"... well in this case Donald Knuth's aphorism was misleading me.

Code: Select all

        public static void SortMvvLva(List<Move> moves, Board context)
        {
            int Score(Move move)
            {
                Piece victim = context[move.ToIndex];
                Piece attacker = context[move.FromIndex];
                //Rating: 100 * Victims value first, offset by the attackers value
                return (victim == Piece.None) ? 0 : ((100 * (int)victim) - (int)attacker);
            }
            moves.Sort((a, b) => Score(b).CompareTo(Score(a)));
        }

lithander · Post by **lithander** » Fri Feb 26, 2021 12:51 pm

hgm wrote: ↑Fri Feb 26, 2021 10:00 am Sorting all the captures in MVV/LVA order helps to reduce the average number of nodes needed in QS even more, but micro-Max can get by quite well by only extracting the MVV/LVA-wise best move to search that first, and search the other captures in the order they happen to come out of the move generator.

As to checks in QS, there are two ways to treat those:

1) You could simply ignore them. Allow stand-pat, under the assumption that there will be a satisfactory non-capture evasion. (Which will be true most of the time.)
2) Apply a check extension, to search the position to 1 ply, so that also the non-capture evasions will be searched (and stand-pat would not be allowed).

Method (2) is of course more accurate, (it makes you detect the rare cases where the check is a mate earlier), but also slower. When you apply (1), you don't even have to test whether you are in check or not, which also saves time. Micro-Max uses (1), and is completely unaware whether it is in check or not in QS.

Currently I'm doing (2) and searching all moves, not only captures, if the king is in check. It's probably part of the reason why my qsearch is sometimes very slow. But it's good to know that there is not one 'true' solution and so it's probably best to just test different variants.

I haven't found time and concentration to put much thought into it but at the moment I'm not yet sure how not testing for checks would work. Wouldn't you play illegal moves without noticing? Or is that so rare it doesn't matter?
Also interesting idea to not sort all the captures fully but just start with the most promising one and play the rest in any order.

As always your posts provide a lot of food for thought!

Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength

Re: Quiescence Search doesn't improve strength