Questions for the Stockfish team

jdart · Post by **jdart** » Fri Jul 23, 2010 9:57 pm

Yes, it is very true about correctness being important.

I never cease to be amazed how buggy a program can be and still play chess, even win games. I've had really serious bugs and it still mostly worked, probably because material still counts for a lot and the program at least knows how to calculate that.

--Jon

Daniel Shawul · Post by **Daniel Shawul** » Fri Jul 23, 2010 10:14 pm

Since you actually sound like you want to move on and I do too (God knows how much energy I lost over the last couple of days generating crap data), the way I understood it from the beginning was eval() = rand() and nothing else, and now you are saying null move etc. should be turned off as well. Tord and I tried it and got shocking results. Maybe you forgot to tell us to turn off all selectivity, as all you had said was random eval.

-----------------------
Well about the Beal's effect being there how would I know it requires a unique type of tree.
The first two points I raised are real issues if you use modern selective search. Who woulda
thunk it was meant for Shannon type A trees..

I don't even know why the rand() is really required. Can't we just return the number of moves at a ply, instead of generating random numbers and taking the maximum? The latter only makes the intended effect less pronounced. Can I call this Daniel's effect, inspired by Beal's effect, and put a disclaimer _for Shannon type A trees only, and probably other unforeseen restrictions to make it work_?
That really sums up what was going on the last couple of days from my perspective.

jwes · Post by **jwes** » Fri Jul 23, 2010 10:37 pm

Daniel Shawul wrote:There seems to be a big difference between skill=1 and skill=0. Crafty scored much better, almost +200 Elo! Never underestimate anything. That's why I objected to it strongly in the first place!
Num. Name          games   score 
   0 Crafty-23.1     300     122 
   1 XboardEngine    300     178 
Rank Name           Elo    +    - games score oppo. draws 
   1 XboardEngine    33   18   18   300   59%   -33   18% 
   2 Crafty-23.1    -33   18   18   300   41%    33   18% 
Games http://sites.google.com/site/dshawul/cr ... ects=0&d=1

It would be good if someone ran the same tests as I did for confirmation.
I ran the skill 0 test again and the result is the same: 80-20% for TSCP.

I was thinking that even 1% of eval could make a significant difference, e.g. moves that drop a queen will evaluate to 9 centipawns less than moves that do not, and when your numbers are in the range 0-100, it could make a significant difference.
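The 9-centipawn figure is plain arithmetic if the skill setting is read as a percentage weight on the real evaluation. A minimal sketch, assuming a linear blend; the function name `blended_eval` and the blend itself are illustrative assumptions, not taken from Crafty's source:

```c
#include <assert.h>

/* Hypothetical blend: skill% of the real score plus (100 - skill)% of a
 * random score. Scores are in centipawns; random_cp is assumed to have
 * been drawn from 0..99 by the caller. Not Crafty's actual code. */
int blended_eval(int skill, int real_cp, int random_cp) {
  assert(skill >= 0 && skill <= 100);
  return (skill * real_cp + (100 - skill) * random_cp) / 100;
}
```

With skill=1 and the same random draw of 50, a line that keeps material level gives `blended_eval(1, 0, 50)` = 49, while a line that drops a queen (-900) gives `blended_eval(1, -900, 50)` = 40. The 9-point gap is small next to the 0-99 random spread, but it always points the same way.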

jwes · Post by **jwes** » Fri Jul 23, 2010 10:40 pm

jdart wrote:Yes, it is very true about correctness being important.

I never cease to be amazed how buggy a program can be and still play chess, even win games. I've had really serious bugs and it still mostly worked, probably because material still counts for a lot and the program at least knows how to calculate that.

--Jon

It would be frustrating to spend a lot of time writing a program and have it lose to a program with a random evaluation.

Uri Blass · Post by **Uri Blass** » Fri Jul 23, 2010 11:02 pm

bob wrote:
Here is the ply-1 move list as it is sorted, with the scores returned by quiesce():
move score
e5 -2
Nf6 -9
e6 -11
d6 -12
a6 -16
f5 -21
f6 -28
c5 -30
Na6 -35
c6 -41
g6 -47
d5 -51
b5 -60
a5 -64
Nc6 -66
b6 -66
h5 -78
Nh6 -82
g5 -88
h6 -98

Something seems wrong here because all the moves have a negative score.

If I understand correctly, it means that a move that forces a repetition during the search is going to be at the top of the list, because it will have an evaluation of 0, which is better than -2. With a deep search this may encourage winning material, because the side with the advantage can often force a repetition.

It is more logical to have random scores in the range -50 < eval < 50, and the playing strength may well be weaker with random evaluation in the right range.

Uri

jdart · Post by **jdart** » Fri Jul 23, 2010 11:29 pm

> It would be frustrating to spend a lot of time writing a program and have it lose to a program with a random evaluation.

Well, I've never had a random eval. But I did have a sign error in eval - so it was scoring something + when it should be -. That is a pretty big deal.

--Jon

bob · Post by **bob** » Fri Jul 23, 2010 11:37 pm

jwes wrote:
Daniel Shawul wrote:There seems to be a big difference between skill=1 and skill=0. Crafty scored much better, almost +200 Elo! Never underestimate anything. That's why I objected to it strongly in the first place!
Num. Name          games   score 
   0 Crafty-23.1     300     122 
   1 XboardEngine    300     178 
Rank Name           Elo    +    - games score oppo. draws 
   1 XboardEngine    33   18   18   300   59%   -33   18% 
   2 Crafty-23.1    -33   18   18   300   41%    33   18% 
Games http://sites.google.com/site/dshawul/cr ... ects=0&d=1

It would be good if someone ran the same tests as I did for confirmation.
I ran the skill 0 test again and the result is the same: 80-20% for TSCP.
I was thinking that even 1% of eval could make a significant difference, e.g. moves that drop a queen will evaluate to 9 centipawns less than moves that do not, and when your numbers are in the range 0-100, it could make a significant difference.

As I said previously, I've already run this test and it made no difference, so long as all the other skill=1 stuff is done as well.

bob · Post by **bob** » Fri Jul 23, 2010 11:57 pm

Daniel Shawul wrote:Since you actually sound like you want to move on and I do too (God knows how much energy I lost over the last couple of days generating crap data), the way I understood it from the beginning was eval() = rand() and nothing else, and now you are saying null move etc. should be turned off as well. Tord and I tried it and got shocking results. Maybe you forgot to tell us to turn off all selectivity, as all you had said was random eval.

To recap, Volker P reported that skill=1 was playing at a 1750 level, and when I tried it, I got around 1800. Initially I thought I had simply broken the skill stuff somehow. I compared the code and not one line was different. I then tried pure random, to see if somehow that 1% was causing the problem. No statistical difference when I ran a cluster test. Then I remembered Don's random eval paper and did a quick re-read and discovered I had walked right into the "Beal effect" without even thinking. And that was that. Don also reported that as depth increased, the "Beal effect" (again, my name, not his, named after him since he discovered this) became stronger. So I began working on an artificial slow-down to drop the depth. And sure enough, this dropped the Elo way down as expected, not bottoming out at 1800.

I had cut pruning, LMR, extensions and such out as skill goes down because I knew the mate recognition code (and repetition recognition code) would be functional, and an 800 player would not pop out and announce a mate in 20 in a tactical position. To prevent that, as the skill command is lowered, so too is all the selective stuff, so that the average depth goes _way_ down. And that appears to be the necessary factor for the Beal effect to work. Deep but narrow lines defeat the probability idea and play more randomly. A pure minimax would probably be better than alpha/beta using this, in fact, but that's conjecture, not observed fact.

The effect is real, and surprising. Hardly useful, however, although it is interesting as a curiosity.

From the beginning, I was solely discussing the skill=1 problem and what I and several others were seeing. There is no bug in move ordering, no way for real scores to "slip in" or anything else. Root moves are ordered essentially randomly. The search is horribly inefficient (fh% drops into the toilet, which is to be expected with random scoring since no move ordering can be good). Etc. All added together produces an opponent much stronger than expected.

I found two ways to get around it. The first is to slow the thing down so that the depth goes in the tank, which minimizes the effect (it is a complete zero at a 1 ply search for obvious reasons). The second is to collapse the range of random numbers so that even with just a few moves you have a good probability of producing a large number since there may only be 5 or 10 different numbers available. I didn't like the "feel" of the latter because it becomes quite dumb, without any ability to fine-tune it. For example, if you just use 0 and 1, it will play terribly. 0-3 is much better. By the time you get to 20 or 30 it is at full "Beal effect" strength it seems.
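The range-collapse idea can be checked with a few lines of arithmetic. The sketch below is only an illustration under a simplifying assumption (independent, uniformly distributed integer scores at a single node); `prob_top_score` is a made-up name, not something from Crafty:

```c
#include <assert.h>

/* Probability that at least one of `moves` independent draws from the
 * integers 0..range-1 (uniform) equals the top value range-1.
 * Computed as 1 - ((range-1)/range)^moves. */
double prob_top_score(int range, int moves) {
  assert(range > 0 && moves >= 0);
  double miss = 1.0;
  for (int i = 0; i < moves; i++)
    miss *= (double)(range - 1) / range;
  return 1.0 - miss;
}
```

With only two values (0 and 1), five moves already hit the top score with probability of about 0.97, so having more moves barely helps. With 100 values the same five moves reach the top score with probability of about 0.05, which leaves room for the side with more moves to gain an edge.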

In summary, 23.3 seems to work, which was my goal. In passing, it is worthwhile to note that a pure random eval can work to a limited extent, although one does need a wide (as opposed to narrow and selective) search so that the effect can exert its influence.

-----------------------
Well about the Beal's effect being there how would I know it requires a unique type of tree.
The first two points I raised are real issues if you use modern selective search. Who woulda
thunk it was meant for Shannon type A trees..

Quite simply. "Read his paper." Always the first tenet of doing research. "Know what has been done before you, so that you don't repeat past mistakes or waste time duplicating past results." At several points in this discussion I pointed out that this appeared to be an issue.

I don't even know why the rand() is really required. Can't we just return the number of moves at a ply, instead of generating random numbers and taking the maximum? The latter only makes the intended effect less pronounced. Can I call this Daniel's effect, inspired by Beal's effect, and put a disclaimer _for Shannon type A trees only, and probably other unforeseen restrictions to make it work_?
That really sums up what was going on the last couple of days from my perspective.

Sure. But # of moves is chess-specific and requires that one generate and count 'em, while a random evaluation is a tad simpler. That was his point. Counting moves becomes _real_ mobility.
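The link between "maximum of random numbers" and "number of moves" can be made concrete for a single node. This is a one-ply simplification, not a model of the full search tree, and `expected_max` is a hypothetical helper written for this illustration:

```c
#include <assert.h>

/* Expected maximum of `moves` independent uniform draws from 0..range-1.
 * P(max <= v) = ((v+1)/range)^moves, so
 * E[max] = sum over v of v * (P(max <= v) - P(max <= v-1)). */
double expected_max(int range, int moves) {
  assert(range > 0 && moves > 0);
  double expect = 0.0, prev_cdf = 0.0;
  for (int v = 0; v < range; v++) {
    double cdf = 1.0;
    for (int i = 0; i < moves; i++)
      cdf *= (double)(v + 1) / range;
    expect += v * (cdf - prev_cdf);
    prev_cdf = cdf;
  }
  return expect;
}
```

For a 0-99 range, one move gives an expected best score of 49.5, and the value rises steadily as the move count grows. Under this simplification the random score acts as a noisy stand-in for a move count, without generating and counting the moves.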

In operating systems we often use a "random" algorithm to serve as a worst-case scenario, since any algorithm can be turned into a random one trivially. In page replacement, random can even be better than things like LRU when a program itself behaves randomly. So it offers an interesting "lower bound" concept. And that's all Don was investigating at the time, and discovered that this "lower bound" was not nearly as bad as he had expected. I ran into the same problem due to lack of thinking, because I was one of the referees for his paper when he published it and knew about the idea before I fell into the same oddity he had discovered.

UncombedCoconut · Post by **UncombedCoconut** » Sat Jul 24, 2010 12:28 am

bob wrote:SKILL has not changed. But I have improved the selective stuff quite a bit. Might be we failed to turn everything off somehow, I'll try to look.

Hi, Dr. Hyatt.
This is my first look through Crafty code, but I was wondering whether your material-based forward pruning should be altered in SKILL mode. For reference, I refer to (23.2a code)

          if (depth < pruning_depth && moves_searched &&
              MaterialSTM(wtm) + pruning_margin[depth] <= alpha) {
            tree->moves_pruned++;
            continue;
          }

I haven't seen anything that disables it in low-skill modes.
If skill is close to zero, this would limit the mobility to 1 for a side which is down material during its last 1-4 plies, depending on depth & deficit. Would this not bias the pseudo-mobility the Beal effect optimizes toward material?

I think it would be somewhat logical to compare alpha to [skill% MaterialSTM(wtm) + (1-skill%)random], since that's how the wood count would evaluate and it causes more mistaken pruning in low-skill modes. Of course other ideas are possible.

My tests are showing random-eval Crafty gets *much* weaker / happier to drop major pieces if this code is disabled. However, I'm not using pure skill=1 (instead, something I was fooling around with for a different test). All I can say is that this could be an interesting way to weaken Crafty more.

bob · Post by **bob** » Sat Jul 24, 2010 12:39 am

UncombedCoconut wrote:
bob wrote:SKILL has not changed. But I have improved the selective stuff quite a bit. Might be we failed to turn everything off somehow, I'll try to look.
Hi, Dr. Hyatt.
This is my first look through Crafty code, but I was wondering whether your material-based forward pruning should be altered in SKILL mode. For reference, I refer to (23.2a code)
          if (depth < pruning_depth && moves_searched &&
              MaterialSTM(wtm) + pruning_margin[depth] <= alpha) {
            tree->moves_pruned++;
            continue;
          }
I haven't seen anything that disables it in low-skill modes.
If skill is close to zero, this would limit the mobility to 1 for a side which is down material during its last 1-4 plies, depending on depth & deficit. Would this not bias the pseudo-mobility the Beal effect optimizes toward material?

I think it would be somewhat logical to compare alpha to [skill% MaterialSTM(wtm) + (1-skill%)random], since that's how the wood count would evaluate and it causes more mistaken pruning in low-skill modes. Of course other ideas are possible.

My tests are showing random-eval Crafty gets *much* weaker / happier to drop major pieces if this code is disabled. However, I'm not using pure skill=1 (instead, something I was fooling around with for a different test). All I can say is that this could be an interesting way to weaken Crafty more.

Look at the margin array, and then remember that for skill 1 the score range is only 0-99. That dumps the futility pruning. LMR and such get their reduction plies set to zero in option.c when you enter "skill=1".

The pruning will _always_ be mistaken in lower skill levels, because alpha and beta are going to be in the range 0-100 at most, so if it did make a decision to prune or not prune (it will choose not because the pruning margins are wider than max alpha/beta values) the decision would be purely random anyway.
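To put numbers on this exchange, the quoted condition can be restated as a standalone predicate. This is a sketch under assumptions: `would_prune` is a hypothetical name, the material and margin are passed in as plain integers, and the margin of 125 used in the note below is made up rather than read from Crafty's `pruning_margin[]`:

```c
#include <assert.h>
#include <stdbool.h>

/* Restates the quoted pruning test as a standalone predicate.
 * material_stm and margin are passed in as plain numbers; the margin
 * values used with it are hypothetical, not Crafty's pruning_margin[]. */
bool would_prune(int depth, int pruning_depth, int moves_searched,
                 int material_stm, int margin, int alpha) {
  assert(margin >= 0);
  return depth < pruning_depth && moves_searched > 0 &&
         material_stm + margin <= alpha;
}
```

With level material and a margin above the 0-99 score range, for example `would_prune(1, 4, 3, 0, 125, 99)`, the test never fires. If the side to move is down by more than the margin, for example `would_prune(1, 4, 3, -500, 125, 50)`, it does fire, and that is the case the material-bias question is about.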
