SEE Observation

brianr · Post by **brianr** » Sun Aug 02, 2009 9:26 pm

I have recently rewritten Tinker's SEE function and happened to run (for me) quite a large number (5,400+) of self-play games with slightly different SEE use in the q-search.

The first version with the "correct" new SEE is 7.47; however, this version did not make use of the attacker and victim already being known for a potential q-search move before calling the SEE function (which is only done in cases where the attacker value is > the defender value, when not in check, for promotions, etc). The net result is that the "get smallest attacker" function was always being called in the SEE function, which would sometimes find a smaller attacker than the potential q-search move being considered.

The second version is 7.49 and uses the known attacker and victim values and simply saves the first call to get the smallest attacker.

I am puzzled because the "wrong" version wins about 5% more games than the "right" version, which naturally got me thinking about this, since the difference is (I think) above the error margin for 5,400 games.

For the case when the q-search move being considered is the only attacker it should not matter what the SEE result is; likewise where the q-search move actually is the lowest attacker.

Nor would it seem to matter when the q-search attacker is not the lowest attacker when the SEE result using the lowest attacker indicates a "poor" capture, since using the actual attacker would be even worse (aside from rare xray situations).

It would seem to make a difference in the odd case where the lowest attacker SEE result is good, but the more valuable real attacker is not. I'm thinking this will mean more moves are actually tried in the q-search, which should be slower, and yet it plays better.
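The odd case above can be made concrete with a toy exchange evaluator. This is only a sketch, not Tinker's code: the names `see`, `see_smallest_first`, and `see_for_move` and the piece values are invented for illustration, at least one attacker is assumed, and x-rays, pins, and promotions are ignored.

```python
# Hypothetical piece values in centipawns (not Tinker's actual values).
PAWN, KNIGHT, QUEEN = 100, 300, 900


def see(victim, attackers, defenders):
    """Toy static exchange evaluation on a single square.

    attackers/defenders are piece values in the order each side would
    capture. X-rays, pins, and promotions are ignored.
    """
    sides = [list(attackers), list(defenders)]
    gain = [victim]               # material won by the first capture
    on_square = sides[0].pop(0)   # piece now sitting on the target square
    side = 1                      # the defending side captures next
    while sides[side]:
        # Speculative balance if the side to move recaptures.
        gain.append(on_square - gain[-1])
        on_square = sides[side].pop(0)
        side ^= 1
    # Walk back: a side stops the exchange when continuing would lose material.
    for i in range(len(gain) - 1, 0, -1):
        gain[i - 1] = min(gain[i - 1], -gain[i])
    return gain[0]


def see_smallest_first(victim, attackers, defenders):
    """SEE that always starts with the least valuable attacker."""
    return see(victim, sorted(attackers), sorted(defenders))


def see_for_move(victim, mover, other_attackers, defenders):
    """SEE that starts with the piece actually making the capture."""
    return see(victim, [mover] + sorted(other_attackers), sorted(defenders))


# A knight defended by a pawn, attacked by a pawn and a queen.
print(see_smallest_first(KNIGHT, [QUEEN, PAWN], [PAWN]))  # pawn captures first
print(see_for_move(KNIGHT, QUEEN, [PAWN], [PAWN]))        # queen captures first
```

With these values the smallest-attacker version scores the exchange as +300 while the queen capture itself scores -500, so a q-search that trusts the first number would still try QxN.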

Incidentally, in Tinker's ecosystem, SEE is not used in the full-width search for capture move ordering, just MVV/LVA. The overall move order is hash move (which is usually the PV move), all captures (and queen promotions), 2 non-capture killers, and finally the remaining non-captures and other moves. History-based LMR is used for non-captures after the first 3 non-caps (or 4 when the hash move is a capture), not including the killers.
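For readers unfamiliar with MVV/LVA (most valuable victim, least valuable attacker), a minimal sketch of that capture ordering follows. The piece values, the `(attacker, victim)` tuple format, and the function names are assumptions made for the example, not Tinker's data structures.

```python
# Hypothetical piece values in centipawns.
VALUE = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900}


def mvv_lva_key(capture):
    """Sort key: most valuable victim first, then least valuable attacker."""
    attacker, victim = capture
    return (-VALUE[victim], VALUE[attacker])


def order_captures(captures):
    """Return captures, given as (attacker, victim) letters, in MVV/LVA order."""
    return sorted(captures, key=mvv_lva_key)


captures = [("Q", "P"), ("N", "P"), ("Q", "Q"), ("P", "Q")]
print(order_captures(captures))
```

Here PxQ sorts ahead of QxQ, and both sort ahead of the pawn captures, without any exchange evaluation being run.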

In the full-width search, postponing poor captures for Tinker results in poorer play. In the q-search, Tinker skips poor captures when not in check (and a few other cases).

Program                          Elo    +   -   Games   Score   Av.Op.  Draws

Tinker 747 x64                 : 2408    7   7  5407    52.4 %   2392   34.8 %
Tinker 749 x64                 : 2392    7   7  5407    47.6 %   2408   34.8 %
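As a sanity check on the table, the score percentage can be converted to an Elo difference with the usual logistic formula. This is a sketch assuming the standard 400-point logistic model; `elo_diff` is a name chosen here, and the rating tool that produced the table may compute its figures differently.

```python
import math


def elo_diff(score):
    """Elo difference implied by an expected score (0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# 52.4% is the score of Tinker 747 in the self-play match above.
print(round(elo_diff(0.524), 1))
```

A 52.4% score works out to a little under 17 Elo, close to the 16-point gap between 2408 and 2392 in the table.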

bob · Post by **bob** » Sun Aug 02, 2009 11:45 pm

brianr wrote:I have recently rewritten Tinker's SEE function and happened to run (for me) quite a large number (5,400+) of self-play games with slightly different SEE use in the q-search.

The first version with the "correct" new SEE is 7.47; however, this version did not make use of the attacker and victim already being known for a potential q-search move before calling the SEE function (which is only done in cases where the attacker value is > the defender value, when not in check, for promotions, etc). The net result is that the "get smallest attacker" function was always being called in the SEE function, which would sometimes find a smaller attacker than the potential q-search move being considered.

The second version is 7.49 and uses the known attacker and victim values and simply saves the first call to get the smallest attacker.

I am puzzled because the "wrong" version wins about 5% more games than the "right" version, which naturally got me thinking about this, since the difference is (I think) above the error margin for 5,400 games.

For the case when the q-search move being considered is the only attacker it should not matter what the SEE result is; likewise where the q-search move actually is the lowest attacker.

Nor would it seem to matter when the q-search attacker is not the lowest attacker when the SEE result using the lowest attacker indicates a "poor" capture, since using the actual attacker would be even worse (aside from rare xray situations).

It would seem to make a difference in the odd case where the lowest attacker SEE result is good, but the more valuable real attacker is not. I'm thinking this will mean more moves are actually tried in the q-search, which should be slower, and yet it plays better.

Incidentally, in Tinker's ecosystem, SEE is not used in the full-width search for capture move ordering, just MVV/LVA. The overall move order is hash move (which is usually the PV move), all captures (and queen promotions), 2 non-capture killers, and finally the remaining non-captures and other moves. History-based LMR is used for non-captures after the first 3 non-caps (or 4 when the hash move is a capture), not including the killers.

In the full-width search, postponing poor captures for Tinker results in poorer play. In the q-search, Tinker skips poor captures when not in check (and a few other cases).
Program                          Elo    +   -   Games   Score   Av.Op.  Draws

Tinker 747 x64                 : 2408    7   7  5407    52.4 %   2392   34.8 %
Tinker 749 x64                 : 2392    7   7  5407    47.6 %   2408   34.8 %

You are inside the error bar, I believe. 40K games is +/-4 to +/-5, so 10K games is +/-8 to +/-10, and you only have half that many games. This would probably be easier to test with pure positions. Run it on WAC at a fixed depth, and then compare total nodes for each version and keep the version with the smaller number, since all you are doing is affecting move order.
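The arithmetic behind those figures can be sketched as follows, assuming the error margin shrinks with the square root of the number of games and taking the 40K-game figure above as the reference point. `scaled_margin` is a name invented for this example.

```python
import math


def scaled_margin(ref_margin, ref_games, games):
    """Scale an Elo error margin, assuming it shrinks as 1/sqrt(games)."""
    return ref_margin * math.sqrt(ref_games / games)


for games in (10_000, 5_407):
    low = scaled_margin(4.0, 40_000, games)
    high = scaled_margin(5.0, 40_000, games)
    print(f"{games} games: +/-{low:.1f} to +/-{high:.1f}")
```

Under that assumption, cutting the games by a factor of four doubles the margin, and 5,407 games gives roughly +/-11 to +/-14.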

brianr · Post by **brianr** » Mon Aug 03, 2009 2:35 am

I also ran both versions against a common set of other opponents and the LOS seemed pretty clear. While the number of games is not as large as the self-play set (which is included), is it not enough or is the LOS wrong?

Rank Name             Elo    +    - games score oppo. draws
   1                  163   35   32   400   71%     5   18%
   2                   60   31   31   400   58%     5   19%
   3                   59   32   31   400   58%     5   18%
   4 Tinker 747 x64    13    7    7  6993   53%    -7   32%
   5 Tinker 749 x64    -3    7    7  6793   49%     5   31%
   6                   -6   38   38   200   47%    13   38%
   7                  -11   29   29   400   48%     5   26%
   8                  -95   32   33   397   36%     5   16%
   9                 -125   33   36   375   32%     5   15%
  10                 -220   35   39   400   22%     5   14%
ResultSet-EloRating>los
                         Ti Ti 
                   99 99100100 99 99100100100
                 0    53 99 99 99 99 99 99100
                 0 46    99 99 99 99 99 99100

Tinker 747 x64   0  0  0    99 80 92 99 99100
Tinker 749 x64   0  0  0  0    54 68 99 99100

                 0  0  0 19 45    57 99 99 99
                 0  0  0  7 31 42    99 99100
                 0  0  0  0  0  0  0    88 99
                 0  0  0  0  0  0  0 11    99
                 0  0  0  0  0  0  0  0  0

Also, the Elo difference seems to be greater than the error range.

bob · Post by **bob** » Mon Aug 03, 2009 6:58 am

brianr wrote:I also ran both versions against a common set of other opponents and the LOS seemed pretty clear.
While the number of games is not as large as the self-play set (which is included),
is it not enough or is the LOS wrong?

Rank Name             Elo    +    - games score oppo. draws
   1                  163   35   32   400   71%     5   18%
   2                   60   31   31   400   58%     5   19%
   3                   59   32   31   400   58%     5   18%
   4 Tinker 747 x64    13    7    7  6993   53%    -7   32%
   5 Tinker 749 x64    -3    7    7  6793   49%     5   31%
   6                   -6   38   38   200   47%    13   38%
   7                  -11   29   29   400   48%     5   26%
   8                  -95   32   33   397   36%     5   16%
   9                 -125   33   36   375   32%     5   15%
  10                 -220   35   39   400   22%     5   14%
ResultSet-EloRating>los
                         Ti Ti 
                   99 99100100 99 99100100100
                 0    53 99 99 99 99 99 99100
                 0 46    99 99 99 99 99 99100

Tinker 747 x64   0  0  0    99 80 92 99 99100
Tinker 749 x64   0  0  0  0    54 68 99 99100

                 0  0  0 19 45    57 99 99 99
                 0  0  0  7 31 42    99 99100
                 0  0  0  0  0  0  0    88 99
                 0  0  0  0  0  0  0 11    99
                 0  0  0  0  0  0  0  0  0

Also, the Elo difference seems to be greater than the error range.

You are just beyond the point of overlap with the extra games. That only leaves a careful analysis of why it is better and, even more importantly, whether it is better at longer games as well, or whether this is a performance issue that becomes more important at very fast games.
