Glaurung 2.2

mcostalba · Post by **mcostalba** » Sun Dec 21, 2008 7:39 pm

Denis P. Mendoza wrote:
I noticed that Stockfish 1.1 performs well on blitz on these settings compared to default (I could be wrong):

Futility Margin 2 = 200
Maximum Razoring Depth = 2
Razoring Margin = 200
It's just preliminary, and results may vary. The one thing I like is, it is beating the latest Togas, go toe-to-toe with some old Rybkas, and giving the new sensation, 'Inert Thinker' a taste of Glaurung's medicine!

Thanks, Denis, for testing. I will try your settings and see if I can confirm your results.

Actually, I also noticed that Stockfish is strong against strong engines; the problem is that it is weak against lower-ranked engines! Perhaps too much pruning??

The version now in development uses less aggressive pruning, and so far the results are not bad. Lately I have become very cautious with pruning. My rule now is: if a test with a more aggressive pruning formula does not give a clear and significant improvement, discard it. My guess is that it is better to stick to a safe algorithm as long as the aggressive one makes a difference of only a few Elo points (even if positive).

For practical reasons I am testing at fast time controls, so if the pruning is dubious there, I would guess it will be much more dubious, or even a regression, at longer time controls (where I cannot properly test).
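
The "discard unless clearly better" rule above can be sketched as a small acceptance test. This is only an illustration, not code from Glaurung or Stockfish: the names `MatchResult`, `estimateElo` and `acceptPruningChange` are hypothetical, the Elo conversion uses the usual logistic model, and the 95% interval is a normal approximation on the per-game score.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical container: results of the new version against the old one.
struct MatchResult { int wins; int draws; int losses; };
struct EloEstimate { double elo; double low; double high; };

// Convert an average score in (0,1) to an Elo difference (logistic model).
double eloFromScore(double s) {
  s = std::clamp(s, 1e-6, 1.0 - 1e-6);  // avoid division by zero / log of zero
  return -400.0 * std::log10(1.0 / s - 1.0);
}

// Point estimate plus an approximate 95% interval (normal approximation).
EloEstimate estimateElo(const MatchResult& r) {
  const double n = r.wins + r.draws + r.losses;
  assert(n > 0);
  const double s = (r.wins + 0.5 * r.draws) / n;
  // Per-game variance of the score around its mean.
  const double var = (r.wins * (1.0 - s) * (1.0 - s)
                    + r.draws * (0.5 - s) * (0.5 - s)
                    + r.losses * s * s) / n;
  const double margin = 1.96 * std::sqrt(var / n);
  return { eloFromScore(s), eloFromScore(s - margin), eloFromScore(s + margin) };
}

// Assumed rule: keep the more aggressive pruning only if even the lower
// bound of the interval exceeds the required gain.
bool acceptPruningChange(const MatchResult& r, double minGainElo) {
  return estimateElo(r).low > minGainElo;
}
```

Under these assumptions, a 52% score over 1,000 games would still be rejected, because the lower bound of the interval falls below zero.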

Marco

Uri Blass · Post by **Uri Blass** » Sun Dec 21, 2008 8:22 pm

I see no reason to assume that more pruning is better for blitz than for long time controls.
More pruning may just as well be worse for blitz and better for long time controls;
I think it depends on your pruning rules.

Uri

bob · Post by **bob** » Sun Dec 21, 2008 10:12 pm

Tord Romstad wrote:
Spock wrote:
Tord Romstad wrote: <snip> recent tests indicate that the version I prepared for the OPCCC is considerably stronger than 2.1. I have therefore fixed a handful of very minor bugs, polished the code a little, and published the result as Glaurung 2.2.
I've played 3 opponents at chess960 so far (3 x 100), and based on those games it is 60-70 ELO stronger than 2.1.
I don't know if that will hold with further games, but so far it is looking very good indeed
Sounds good, but the Elo improvement will be smaller in normal chess. As I wrote before, the most important change in the new version is that the program now evaluates space. This is extremely important in Chess960, but not quite so important when playing normal chess and cheating by using an opening book.

There are still some relatively easy improvements waiting to be made in Chess960: Glaurung still doesn't evaluate development.

Thanks for the early results!

Tord

Here are some quick results from my cluster. Note that these games are for 5 different programs playing Crafty 22.9, so all you see here is G21 or G22 vs Crafty alone.

Rank Name                  Elo    +    - games  score oppo. draws
   1 Glaurung 2.2         2668    6    6 15564    59%  2599   21%
   2 Toga2                2659    2    3 171204   58%  2599   22%
   3 Glaurung 2.1         2655    3    2 186768   58%  2599   21%
   4 Crafty-22.9-11       2600    5    5 31128    51%  2593   21%
   5 Crafty-22.9-12       2599    4    4 31128    51%  2593   21%
   6 Fruit 2.1            2556    3    3 186768   44%  2599   23%
   7 Glaurung 1.1 SMP     2491    3    2 186768   36%  2599   19%

Based on that, against only Crafty, it seems to be about +13 Elo better. Note that the non-2.2 ratings are based on far more games but all against the same 22.9 version of Crafty.

I will try to run a different test today, where G2.1 and G2.2 play against G1, fruit2, crafty 22.9 and toga2 instead so you will play games against everyone. In fact, with just 6 players, I could play a round robin which will only take a few hours so I will fire that up. Everyone will play everyone roughly 8,000 games per opponent...

More later...

One more thing. I have a version of Crafty's evaluation that evaluates space, but I never found anything that worked well against humans. It was sort of "anti-anti-computer chess" in a sense, in that for humans that play anti-computer chess, the space evaluation played right into their hands, because it would lock the pawn structure up when its own pawns are more advanced, giving itself more space, but no way to use the space with no pawn breaks left...

Eelco de Groot · Post by **Eelco de Groot** » Sun Dec 21, 2008 10:19 pm

Thanks much Tord for Glaurung 2.2! I'm glad that improvements were found after all in the version for the OPCCC, although it sometimes is hard to do all the testing well!

I just wanted to say something to Denis and Marco here about the Maximum Razoring Depth. I'm not 100% sure anymore whether this applied to Toga's razoring (copied from Glaurung) exactly as in the original Glaurung version, but I hope I remember correctly that in Glaurung 2.1, where razoring is used only when the initial conditions for starting null move pruning do not apply, razoring only started at remaining depths of 3 and below, so Maximum Razoring Depth = 3 (+1) was exactly the same as Maximum Razoring Depth = 2 (+ 1 * OnePly internally added). I believe that after I changed the conditions for razoring by removing the

    else

from

    } // End of Null move search

    // Razoring:
    else if (depth < RazorDepth

I could increase the Maximum Razoring Depth and now get different results too for higher Maximum Razoring Depths.

I thought removing this Else made some sense as razoring preconditions seemed to be fairly different from nullmove pruning conditions. But I added some safety margins in the pruning conditions (these are also in Toga Checkov Beta 3 and later, but maybe you will not like them for fast time controls; sorry, I never test much, and will not test much with fast games, because you really need thousands of games for statistically good results, and for long time controls it still does not mean much).

Code now reads:

    } // End of Null move search

    // Razoring:
    if (depth < RazorDepth && approximateEval < beta - RazorMargin &&
       evaluate(pos, ei, threadID) < beta - (RazorMargin + depth * 25)) {
      Value v = qsearch(pos, ss, beta-1, beta, Depth(0), ply, threadID);
      if (v < beta - 0x100)
        return v;
    } // [EdG: was: else if (depth...]
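
To make the two checks in the snippet above easier to experiment with in isolation, here is a minimal standalone sketch. It is not Glaurung or Toga source: `RazorParams`, `razoringPrecondition` and `razoringCutoff` are hypothetical names, plain `int` stands in for the engine's `Value` and `Depth` types, and the depth unit is whatever the caller uses.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the engine's UCI parameters.
struct RazorParams {
  int razorDepth;   // razoring is considered only when depth < razorDepth
  int razorMargin;  // base margin below beta
};

// Mirrors the condition in the post: both the cheap approximate eval and the
// full static eval must be well below beta. The extra depth * 25 term is the
// added safety margin, which grows with remaining depth.
bool razoringPrecondition(int depth, int approximateEval, int staticEval,
                          int beta, const RazorParams& p) {
  return depth < p.razorDepth
      && approximateEval < beta - p.razorMargin
      && staticEval < beta - (p.razorMargin + depth * 25);
}

// Given the quiescence search result v, return the value to fail low with,
// or nothing if the normal search should continue.
std::optional<int> razoringCutoff(int v, int beta) {
  if (v < beta - 0x100)
    return v;
  return std::nullopt;
}
```

With a margin of 0x200 and beta = 0, for example, a static eval of -540 at depth 2 passes the base margin but fails the widened one (-562), so the node would not be razored.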

Eelco

bob · Post by **bob** » Sun Dec 21, 2008 11:04 pm

Crafty has had razoring and futility pruning forever. I recently added extended futility pruning, and then spent several days tuning each one. Razoring is actually worth _very_ little in terms of Elo. 2-3 at the most. Futility is a bit more valuable, as is extended futility pruning. I ran hundreds of tests, adjusting the margins I use, and also experimenting with the various depth limits. What I am currently doing was the best I could discover.

I use a razoring margin of 3.00, a futility margin of 1.25, and an extended futility margin of 300. The depth limits are as explained in Heinz's book...
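
For readers who have not seen the scheme, the depth-limited pruning described by Heinz can be sketched roughly as below. This is not Crafty's code. It assumes the usual layout (futility at remaining depth 1, extended futility at depth 2, limited razoring as a one-ply reduction at depth 3), and it assumes the margins quoted above can all be read as centipawns (125, 300, 300); `PruneMargins`, `Action` and `classify` are hypothetical names.

```cpp
#include <cassert>

// Margins in centipawns. The numbers follow the post; treating all three
// as centipawn values is an assumption.
struct PruneMargins {
  int futility = 125;
  int extendedFutility = 300;
  int razoring = 300;
};

enum class Action { SearchNormally, PruneQuietMove, ReduceDepth };

// depth: remaining depth in plies; eval: static evaluation for the side to
// move; quietMove: not a capture, check, or promotion.
Action classify(int depth, int eval, int alpha, bool quietMove,
                const PruneMargins& m = PruneMargins{}) {
  // Frontier node: a quiet move cannot lift the score above alpha.
  if (depth == 1 && quietMove && eval + m.futility <= alpha)
    return Action::PruneQuietMove;
  // Pre-frontier node: same idea with a larger margin.
  if (depth == 2 && quietMove && eval + m.extendedFutility <= alpha)
    return Action::PruneQuietMove;
  // Pre-pre-frontier node: search with reduced depth instead of pruning.
  if (depth == 3 && eval + m.razoring <= alpha)
    return Action::ReduceDepth;
  return Action::SearchNormally;
}
```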

Eelco de Groot · Post by **Eelco de Groot** » Sun Dec 21, 2008 11:35 pm

Eelco de Groot wrote:
I thought removing this Else made some sense as razoring preconditions seemed to be fairly different from nullmove pruning conditions.

I should probably have added that without the sources it is difficult to compare these code changes. In Ancalagon, for example, in addition to the above, I try null move in almost all the zero-width nodes, even when the approximate eval is already pretty bad. So using the 'else' condition would only do razoring if the approximate eval is already very, very low, more than the null move margin of 0xA00 below beta (0xA00 is hexadecimal for 10 full Glaurung pawns; I chose a fairly arbitrary low number here), or if the position is check, though I am not sure whether other things are happening then.

  // Null move margin.  A null move search will not be done if the approximate
  // evaluation of the position is more than NullMoveMargin below beta.
  const Value NullMoveMargin = Value(0xA00);
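
A small sketch of why the `else` matters with such a wide margin. It is heavily simplified: the real null move conditions involve more than the eval and check tests, and `nullMoveAllowed` and `razoringReachable` are hypothetical helpers, with plain `int` in place of `Value`.

```cpp
#include <cassert>

constexpr int PawnValue = 0x100;       // implied by "0xA00 is 10 pawns"
constexpr int NullMoveMargin = 0xA00;
static_assert(NullMoveMargin == 10 * PawnValue, "margin is ten pawns");

// Simplified: null move is skipped when in check or when the approximate
// eval is more than NullMoveMargin below beta.
bool nullMoveAllowed(int approximateEval, int beta, bool inCheck) {
  return !inCheck && approximateEval >= beta - NullMoveMargin;
}

// With "else if", the razoring branch is reached only when the null move
// branch was not taken; with a plain "if" it is always reached.
bool razoringReachable(bool chainedWithElse, int approximateEval, int beta,
                       bool inCheck) {
  return !chainedWithElse || !nullMoveAllowed(approximateEval, beta, inCheck);
}
```

In this simplified model, a position three pawns below beta never reaches the razoring test when the branches are chained, while it does once the `else` is removed.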

Eelco

bob · Post by **bob** » Mon Dec 22, 2008 4:07 am

Here's the result, 6 programs including glaurung 1, 2.1 and 2.2, fruit 2, toga2, and crafty 22.9.

Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2      2662    4    3 38910   60%  2588   25% 
   2 Toga2             2650    4    3 38910   58%  2590   23% 
   3 Glaurung 2.1      2646    3    4 38910   58%  2591   25% 
   4 Crafty-22.9       2594    4    4 38910   49%  2601   21% 
   5 Fruit 2.1         2550    4    3 38910   42%  2610   24% 
   6 Glaurung 1.1 SMP  2497    3    4 38910   34%  2621   20%

Every program played 3891 positions against each of the other five, two games per position with alternating colors.
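
As a quick sanity check on the games column, the round-robin arithmetic can be written out as follows. The helper names `gamesPerPlayer` and `totalGames` are made up for this illustration.

```cpp
#include <cassert>

// Games one program plays: positions x games per position x opponents.
constexpr long gamesPerPlayer(long positions, long gamesPerPosition,
                              long opponents) {
  return positions * gamesPerPosition * opponents;
}

// Distinct games in the whole round robin: each pairing is counted once.
constexpr long totalGames(long players, long positions,
                          long gamesPerPosition) {
  return players * (players - 1) / 2 * positions * gamesPerPosition;
}

// 3891 positions, 2 colors, 5 opponents gives the 38910 in the table.
static_assert(gamesPerPlayer(3891, 2, 5) == 38910, "matches the table");
```

Each pairing therefore contributes 7,782 games, which is in line with the "roughly 8,000 games per opponent" estimated before the run.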

Dirt · Post by **Dirt** » Mon Dec 22, 2008 6:15 am

Very nice. Am I right in thinking all the engines were running in SP mode? Also, what was the time control, and which version of TogaII was used?

bob · Post by **bob** » Mon Dec 22, 2008 7:04 am

SP mode (one thread), no book, no EGTBs, time control was 20 sec + .1sec increment. All I can say about toga2 is that it was very recent and the discussion here indicated this was the best one (at the time, which was within the past month).

No pondering either... as simple as possible.

Spock · Post by **Spock** » Mon Dec 22, 2008 8:08 am

At chess960, after 600 games, Glaurung 2.2 is holding at +62 ELO over Glaurung 2.1

http://www.computerchess.org.uk/ccrl/40 ... _pure.html
(no game downloads from here, that will come when all games completed)

This improvement has allowed Glaurung to jump ahead of both Fruit and Loop.

So, as Tord suggested, it looks like his changes are very beneficial at chess960.

There are 500 more games to come, then that will be completed
