checks in q-search.

bob · Post by **bob** » Tue Sep 02, 2008 6:47 am

I have something that sort of works. By "sort of" I mean it looks good in test positions, and saves lots of plies at times. But then I started a cluster run. First the data. Crafty-22.2 is the version with q-search checks. Crafty-22.2R1 is the identical same code except no q-search checks (code is not even in this version, but everything else is identical). The 22.2 (with qchecks) test is not completed, but the data indicates not a lot of difference.

Code: Select all

56053 game(s) loaded, 0 game(s) with unknown result ignored.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.1       180    6    6 11212   78%   -40   17% 
   2 Fruit 2.1           64    5    6 11203   64%   -40   23% 
   3 opponent-21.7       23    5    6 11210   59%   -40   34% 
   4 Glaurung 1.1 SMP     8    6    6 11207   56%   -40   20% 
   5 Crafty-22.2        -35    5    5 17144   43%    15   22% 
   6 Crafty-22.2R1      -42    4    4 38909   42%    15   23% 
   7 Arasan 10.0       -197    6    6 11221   29%   -40   19%

I combined all the PGN from 22.2R1 vs the five opponents, with the PGN from the games using the qchecks verion that have completed so far. Remi suggested combining things into one PGN for most reliable Elo comparison. Remember that 22.2 and 22.2R1 do not play each other although I suppose I could do that if anyone is interested.

About 1/2 way through, there is a 7 Elo difference, but when you compare the ranges with the error bar (-40 to -30) and (-47 to -37) there is some overlap. The remaining 20K games will reduce the error bars for both to +/-4 roughly, I will report the results when they finish probably tomorrow morning (still waiting on A/C repair so we are at 1/2 cluster powered down). I also have one more run with plain null-move R=3, since the adaptive null move (R=2~3) tapered R off near the leaves due to the danger of overlooking a serious threat, and the q-search checks are partially intended to help with that.

So far, +500 lines of code, minimal improvement if it holds up.

More as it becomes available. This version is a pretty clean implementation. that is efficient in terms of speed, and pretty cute in the code that generates direct/discovered checks without any loops.

gladius · Post by **gladius** » Tue Sep 02, 2008 7:00 am

How deep do you generate checks in the q-search? Also, are you pruning see < 0 checks?

Generating checks in the first few plies of q-search seems sound. Almost an "in-between" q-search. Tord mentioned a while back that Glaurung tapers the search more gradually than a hard q-search cutoff.

Pruning see < 0 checks is a bit trickier, but searching all those suicidal sacs is a job for the full width search

.

Tord Romstad · Post by **Tord Romstad** » Tue Sep 02, 2008 10:32 am

gladius wrote:Tord mentioned a while back that Glaurung tapers the search more gradually than a hard q-search cutoff.

Yes: The last 7 plies of the main search are tapered, and becomes gradually more and more like a qsearch (except at PV nodes, where the search is completely full-width all the way to depth=0). Here is how it works:

The first N moves at all nodes are always searched. The number N depends on the remaining depth; it is smaller when the remaining depth is small. After the first N moves, only the following classes of moves are searched (the "threat move" is defined to be the move that refuted the null move at the current node):

Captures, checks, check evasions, and passed pawn pushes. All of these are searched, even when they have SEE < 0.
Moves where the moving piece is the piece captured by the threat move.
Moves which defend the destination square of the threat move, if the destination square is empty or the value of the piece on the destination square is smaller or equal to the value of the capturing piece in the threat move.
Moves with non-negative SEE value which block the ray of the threat move.
Moves with high history scores. The threshold depends on the remaining depth.

Most programmers don't like this scheme, because it sounds "too risky". I don't quite understand what they mean. It is not intuitively obvious to me why a gradual transition to the qsearch should be more risky than a sharp limit.

Tord

Uri Blass · Post by **Uri Blass** » Tue Sep 02, 2008 1:30 pm

Tord Romstad wrote:
gladius wrote:Tord mentioned a while back that Glaurung tapers the search more gradually than a hard q-search cutoff.
Yes: The last 7 plies of the main search are tapered, and becomes gradually more and more like a qsearch (except at PV nodes, where the search is completely full-width all the way to depth=0). Here is how it works:

The first N moves at all nodes are always searched. The number N depends on the remaining depth; it is smaller when the remaining depth is small. After the first N moves, only the following classes of moves are searched (the "threat move" is defined to be the move that refuted the null move at the current node):

Captures, checks, check evasions, and passed pawn pushes. All of these are searched, even when they have SEE < 0.

Moves where the moving piece is the piece captured by the threat move.

Moves which defend the destination square of the threat move, if the destination square is empty or the value of the piece on the destination square is smaller or equal to the value of the capturing piece in the threat move.

Moves with non-negative SEE value which block the ray of the threat move.

Moves with high history scores. The threshold depends on the remaining depth.
Most programmers don't like this scheme, because it sounds "too risky". I don't quite understand what they mean. It is not intuitively obvious to me why a gradual transition to the qsearch should be more risky than a sharp limit.

Tord

I do not think that most programmers tried it but I guess that the main reason is that most programmers do not use it is that they had no time to implement it or they do not know what you do(Glaurung is open source but it does not mean that most programmers studied it).

I do not consider what you do as risky and I think that it may be the opposite and maybe glaurung can earn some rating points by not considering bad checks or bad captures and by not considering part of the passed pawn pushes.

Uri

Uri Blass · Post by **Uri Blass** » Tue Sep 02, 2008 1:37 pm

bob wrote:I have something that sort of works. By "sort of" I mean it looks good in test positions, and saves lots of plies at times. But then I started a cluster run. First the data. Crafty-22.2 is the version with q-search checks. Crafty-22.2R1 is the identical same code except no q-search checks (code is not even in this version, but everything else is identical). The 22.2 (with qchecks) test is not completed, but the data indicates not a lot of difference.
Code: Select all
56053 game(s) loaded, 0 game(s) with unknown result ignored.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.1       180    6    6 11212   78%   -40   17% 
   2 Fruit 2.1           64    5    6 11203   64%   -40   23% 
   3 opponent-21.7       23    5    6 11210   59%   -40   34% 
   4 Glaurung 1.1 SMP     8    6    6 11207   56%   -40   20% 
   5 Crafty-22.2        -35    5    5 17144   43%    15   22% 
   6 Crafty-22.2R1      -42    4    4 38909   42%    15   23% 
   7 Arasan 10.0       -197    6    6 11221   29%   -40   19% 
I combined all the PGN from 22.2R1 vs the five opponents, with the PGN from the games using the qchecks verion that have completed so far. Remi suggested combining things into one PGN for most reliable Elo comparison. Remember that 22.2 and 22.2R1 do not play each other although I suppose I could do that if anyone is interested.

About 1/2 way through, there is a 7 Elo difference, but when you compare the ranges with the error bar (-40 to -30) and (-47 to -37) there is some overlap. The remaining 20K games will reduce the error bars for both to +/-4 roughly, I will report the results when they finish probably tomorrow morning (still waiting on A/C repair so we are at 1/2 cluster powered down). I also have one more run with plain null-move R=3, since the adaptive null move (R=2~3) tapered R off near the leaves due to the danger of overlooking a serious threat, and the q-search checks are partially intended to help with that.

So far, +500 lines of code, minimal improvement if it holds up.

More as it becomes available. This version is a pretty clean implementation. that is efficient in terms of speed, and pretty cute in the code that generates direct/discovered checks without any loops.

I guess that you are going to see another improvement when you use R=3
instead of R=2/3

Uri

BubbaTough · Post by **BubbaTough** » Tue Sep 02, 2008 2:46 pm

Most programmers don't like this scheme, because it sounds "too risky". I don't quite understand what they mean. It is not intuitively obvious to me why a gradual transition to the qsearch should be more risky than a sharp limit.

I didn't used to know either. Now that I have filled my program with similar risky things I am starting to get an inkling. When you do funky pruning like that (or reductions of a similar type) your program begins to play more brilliant and more dumb moves. Even if you can find a risky system that improves your overall results in whatever testing mechanism you use, it can be quite annoying when some weak program mates you in 5 in some endgame (which would not happen if you were not doing risky pruning). I am willing to put up with that kind of thing, and it sounds like Tord is willing to put up with analogous annoyances as long as his average engine performance is improved. But to many falling for simple combos or sequences is simply unacceptable, even if it happens infrequently.

-Sam

FrancoisK · Post by **FrancoisK** » Tue Sep 02, 2008 3:49 pm

I also tried checks in q-search (generation + handling) in bugchess with several flavors (the default is : checks neither generated nor handled)
- handle all checks (ie no stand pat, try all replies if in check), no generation
- same, and generate qs ply 0 checks (see >= 0)
- same, but generate only checks if last move was a null move
- same tests with R = 3 (the default is adaptative 2/3 in Bug)
- handle check in QS only if standpat > alpha
In fast games (40/20 sec), no version was clearly better than my default version over 20000 games.

bob · Post by **bob** » Tue Sep 02, 2008 6:37 pm

gladius wrote:How deep do you generate checks in the q-search? Also, are you pruning see < 0 checks?

Good questions.

First, I have tried two approaches. Generating checks at the first q-search ply, and generating checks at the first two, although only at the second if the first was a check also, otherwise stand-pat would make the second one pointless. Best result was just doing one, which is what this test run used.

Second, I tried that both ways as well, and the best result has been to use the SEE score to dump losing checks. Not doing so hurts the test results although the difference was not that much.

Generating checks in the first few plies of q-search seems sound. Almost an "in-between" q-search. Tord mentioned a while back that Glaurung tapers the search more gradually than a hard q-search cutoff.

Prior to the Jakarta chess tournament, Crafty used the same approach as we used in Cray Blitz. A normal full-width search, followed by 4 plies (at most) of checks/check evasions and any other move we though appropriate (null-move threat extension comes to mind), followed by normal q-search. Bruce Moreland and I were discussing ideas and he suggested we try a simple brain-dead q-search with no checks or anything. we both tried it and found the programs played better. So far, going back closer to the old approach is barely helping, although the null R=3 is yet to complete (probably tomorrow, the R=2~3 adaptive null run ought to be done soon.

Pruning see < 0 checks is a bit trickier, but searching all those suicidal sacs is a job for the full width search .

I agree. Although I didn't find anything tricky since I only generate checking moves and applying SEE is a simple operation.

bob · Post by **bob** » Tue Sep 02, 2008 6:39 pm

Uri Blass wrote:
bob wrote:I have something that sort of works. By "sort of" I mean it looks good in test positions, and saves lots of plies at times. But then I started a cluster run. First the data. Crafty-22.2 is the version with q-search checks. Crafty-22.2R1 is the identical same code except no q-search checks (code is not even in this version, but everything else is identical). The 22.2 (with qchecks) test is not completed, but the data indicates not a lot of difference.
Code: Select all
56053 game(s) loaded, 0 game(s) with unknown result ignored.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.1       180    6    6 11212   78%   -40   17% 
   2 Fruit 2.1           64    5    6 11203   64%   -40   23% 
   3 opponent-21.7       23    5    6 11210   59%   -40   34% 
   4 Glaurung 1.1 SMP     8    6    6 11207   56%   -40   20% 
   5 Crafty-22.2        -35    5    5 17144   43%    15   22% 
   6 Crafty-22.2R1      -42    4    4 38909   42%    15   23% 
   7 Arasan 10.0       -197    6    6 11221   29%   -40   19% 
I combined all the PGN from 22.2R1 vs the five opponents, with the PGN from the games using the qchecks verion that have completed so far. Remi suggested combining things into one PGN for most reliable Elo comparison. Remember that 22.2 and 22.2R1 do not play each other although I suppose I could do that if anyone is interested.

About 1/2 way through, there is a 7 Elo difference, but when you compare the ranges with the error bar (-40 to -30) and (-47 to -37) there is some overlap. The remaining 20K games will reduce the error bars for both to +/-4 roughly, I will report the results when they finish probably tomorrow morning (still waiting on A/C repair so we are at 1/2 cluster powered down). I also have one more run with plain null-move R=3, since the adaptive null move (R=2~3) tapered R off near the leaves due to the danger of overlooking a serious threat, and the q-search checks are partially intended to help with that.

So far, +500 lines of code, minimal improvement if it holds up.

More as it becomes available. This version is a pretty clean implementation. that is efficient in terms of speed, and pretty cute in the code that generates direct/discovered checks without any loops.
I guess that you are going to see another improvement when you use R=3
instead of R=2/3

Uri

I am not so sure. Early testing showed no difference between the two, but I am running the "big test" on everything for a final go/no-go decision.

bob · Post by **bob** » Tue Sep 02, 2008 6:43 pm

FrancoisK wrote:I also tried checks in q-search (generation + handling) in bugchess with several flavors (the default is : checks neither generated nor handled)
- handle all checks (ie no stand pat, try all replies if in check), no generation
- same, and generate qs ply 0 checks (see >= 0)
- same, but generate only checks if last move was a null move
- same tests with R = 3 (the default is adaptative 2/3 in Bug)
- handle check in QS only if standpat > alpha
In fast games (40/20 sec), no version was clearly better than my default version over 20000 games.

I am suspicious that this is what I am going to find as well.

checks in q-search.

checks in q-search.

Re: checks in q-search.

Re: checks in q-search.

Re: checks in q-search.

Re: checks in q-search.

Re: checks in q-search.

Re: checks in q-search.

Re: checks in q-search.

Re: checks in q-search.

Re: checks in q-search.