Questions for the Stockfish team

Discussion of chess software programming and technical issues.


bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Result 10000 - 0

Post by bob »

Chan Rasjid wrote:
bob wrote:
AlvaroBegue wrote:A question for Bob Hyatt: If I understand correctly, your scores in this mode always indicate white is ahead. If a draw is 0, wouldn't this mean that white will always avoid draws and black will always seek them? In that sense, this is not equivalent to using random numbers centered around 0.
I posted the code from Crafty a couple of times. Here is a very simplified version of what it does:

int Evaluate(int wtm) {
  int score = Random(); // returns a value 0 <= v < 100
  return ((wtm) ? score : -score);
}

I am not certain that the negation at the bottom is required.

And doing that is producing a program that is playing around 1800 on the rating lists that measure programs on this end of the rating range...
I think 1800 might be possible as your engine has mates+draws+Beal's mobility.

But I still don't like small one-sided numbers (they might be erroneous?) for white and black; better to comment out Evaluate() and replace it with something like -7000 + (hash % 14001), as long as that doesn't invalidate mate scores.

Rasjid
Let me give an example for why I think this doesn't matter.

Let's suppose we are going to do a simple 2-ply search + captures, with a random eval of 0-99 (there are no captures in my example).

At the root, we have 2 moves, m1-1 and m1-2. m1-1 is a check that has only one way out. m1-2 is a normal move that leaves the opponent with 20 possible replies.

We make move m1-1 (at a max node) and reach the position at ply 2, which is a min node. We make the only possible move and get to ply 3. With no captures, we generate a random number (say 50) and return it. When it is passed back to ply 2 it is negated and arrives as -50. We are finished with this move, so we back it up to the root, where negamax turns it into +50, the score for the first move.

Now we try the second move from the root and again reach ply 2, but here we have 20 moves. We make each one of those and call quiesce(), which returns a random number between 0 and 99. As each score is backed up to ply 2, negamax negates it, so we choose the largest of those numbers, which is actually the smallest random number we found. If we assume the 20 random numbers are 1 through 20, then at ply 2 we choose the largest of -1 to -20, which is -1. We return that to the root, where it becomes +1. And we choose the root move with the larger score: max(50, 1).

It doesn't matter whether the numbers are centered around 0 or not. What is important is that you get a sampling distribution of random numbers and always choose the most favorable result, which then gets backed up thru negamax. At ply P you pick the largest value each time and back it up; at ply P-1 you get those values with the sign changed and pick the largest of them there, which is the smallest value from the next ply, if you think about it.

These values propagate back thru the tree, although the numbers mean nothing. All you can conclude is that if you are at a max node, you pick the move that leads to the largest "score", because that represents the move where you had the largest number of options further down and your opponent had the fewest.

A bit odd until you think about it, and then suddenly nothing matters but the random numbers. All you care about is that the more choices you have, the better your chance of picking from a set that contains at least one large number. And then at the previous ply, your opponent chooses from that set of scores, but negated, so he is picking the smallest...
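
To make the back-up mechanics concrete, here is a direct transcription of that 2-ply example into code: one min node per root move, purely random leaf scores, negamax sign flips. This is a minimal sketch for illustration only, not Crafty's actual code, and the names are hypothetical:

Code: Select all

#include <stdio.h>
#include <stdlib.h>

/* One min node with n_replies moves, each reply evaluated at ply 3 with
   a uniform random score in 0..99.  The node backs up the max of the
   negated leaves, i.e. minus the smallest leaf it saw. */
static int min_node(int n_replies) {
  int best = -10000;            /* below any possible score */
  for (int i = 0; i < n_replies; i++) {
    int leaf = rand() % 100;    /* random eval at ply 3 */
    if (-leaf > best)           /* negamax sign flip back to ply 2 */
      best = -leaf;
  }
  return best;
}

int main(void) {
  srand(1);
  /* Root move m1-1: a check with one forced reply.
     Root move m1-2: a quiet move leaving 20 replies. */
  int m1 = -min_node(1);        /* back the ply-2 value up to the root */
  int m2 = -min_node(20);
  printf("m1-1 -> %d, m1-2 -> %d, root picks %s\n",
         m1, m2, (m1 > m2) ? "m1-1 (the check)" : "m1-2");
  return 0;
}

Run it a few times: m1-1 backs up an ordinary random value (average around 50), while m1-2 backs up the minimum of 20 samples (average around 5), so the root almost always prefers the move that restricts the opponent, exactly as in the max(50, 1) walk-through above.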
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Result 10000 - 0

Post by bob »

Daniel Shawul wrote:I just did the exact same thing as you did, negate score, use 100 as a factor ... all futile attempts from you to avoid the inevitable. 300 games against TSCP with the same disastrous results...
I will add more games if it is not convincing enough.

http://sites.google.com/site/dshawul/ev ... ects=0&d=1

negated score with 0 - 100

Code: Select all

Num. Name            games   score 
   0 Scorpio_random    300       4 
   1 XboardEngine      300     296 
Rank Name             Elo    +    - games score oppo. draws 
   1 XboardEngine     359   81   54   300   99%  -359    0% 
   2 Scorpio_random  -359   54   81   300    1%   359    0% 
0 - 100 without negation

Code: Select all

Num. Name            games   score 
   0 Scorpio_random    300     3.5 
   1 XboardEngine      300   296.5 
Rank Name             Elo    +    - games score oppo. draws 
   1 XboardEngine     359   81   54   300   99%  -359    0% 
   2 Scorpio_random  -359   54   81   300    1%   359    0% 
The negation is irrelevant, it turns out, because the numbers themselves are not important. It is sampling from the distribution of numbers that makes this work.
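
One way to see why neither the centering nor the negation can matter: the search only responds to order statistics. The expected maximum of n uniform samples on 0-99 grows with n (roughly 100*n/(n+1)), wherever the interval sits and whichever sign convention you use. A standalone Monte Carlo check of that claim (an illustration, not code from either engine):

Code: Select all

#include <stdio.h>
#include <stdlib.h>

/* Empirical check: the expected max of n uniform samples grows with n.
   Shifting the range (0..99 vs. -7000..7000) or negating every score
   changes the labels, not the orderings the search sees. */
int main(void) {
  srand(12345);
  for (int n = 1; n <= 20; n++) {
    double sum = 0.0;
    for (int trial = 0; trial < 100000; trial++) {
      int best = 0;
      for (int i = 0; i < n; i++) {
        int v = rand() % 100;
        if (v > best)
          best = v;
      }
      sum += best;
    }
    printf("n=%2d  E[max] ~ %5.1f   (theory ~ %5.1f)\n",
           n, sum / 100000.0, 100.0 * n / (n + 1));
  }
  return 0;
}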
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Result 10000 - 0

Post by bob »

Daniel Shawul wrote:
Did you ever think that perhaps _you_ screwed up the test? Others of us are not having that problem. For me, when others report X, and I run a test and conclude ~X, I go back and look at the test carefully to make sure it is doing what it is supposed to do, rather than shouting that everyone _else_ is wrong.
NO. The change you told me to do is HARD to screw up. Think about it: a one-line code change, what could possibly go wrong...
I just added this at the beginning of eval and touched _nothing_ else.

Code: Select all

int SEARCHER::eval(int lazy) {
	int score = int((100.0 * rand()) / (RAND_MAX + 1.0));
	//print("eval %d\n",score);
	//if(player == black) return -score;
	return score;
     ....
You comment that out and you get the regular Scorpio. Everything is here http://sites.google.com/site/dshawul/ev ... ects=0&d=1: the source code of what I am using and two test games against TSCP. With the negated score, 0 - 100, it hardly makes a difference; same garbage engine...
Anyone can repeat the experiment and confirm.

Stop blowing hot air...
Did you do what I explained? No null-move. No LMR. No extensions. Everything that makes the tree shallow and fat. Long selective lines are going to break this, likely.

Meanwhile, I am working to make this produce worse results, and it is a bit of a challenge. Shallow depth is making a big difference so far, so that may be my key to getting this back down to sub-1000. But look at the post in the general forum to see the original discussion. This is not an imaginary problem.

As far as hot air goes, that would seem to be _your_ province. Beal wrote the paper. Others verified his results. I moved the skill level thread back to the top in the general forum so that you can see the results there. And I explained exactly how to see the problem in current Crafty. What Scorpio does or doesn't do is not the issue. What I and others are seeing is.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Result 10000 - 0

Post by AlvaroBegue »

AlvaroBegue wrote:A question for Bob Hyatt: If I understand correctly, your scores in this mode always indicate white is ahead. If a draw is 0, wouldn't this mean that white will always avoid draws and black will always seek them? In that sense, this is not equivalent to using random numbers centered around 0.
bob wrote:It doesn't matter if the numbers are centered around 0 or not.
You didn't address my question at all.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

rjgibert wrote:My point:
elo 200 it's not 1/4, it's 1/4.16 or 3.16 to 1 against.
elo 400 it's not 1/16, it's 1/11 or 10 to 1
elo 600 it's not 1/64, it's 1/32.62 or 31.62 to 1
elo 800 it's not 1/256, it's 1/101 or 100 to 1
elo 1000 it's not 1/1024, it's 1/317.23 or 316.23 to 1

The odds are powers of 3.16227766 e.g. 3.16^5 = 316.23 approximately. You are computing powers of the probabilities, which is not valid in the context of your post.

And this time, please actually read what I write and actually think about what I write before replying.
And a factor of two overall means exactly what here, with respect to significance? Does this suddenly mean that 0/67 is now proof that the program is at least 1000 Elo worse, if not more? I'm not so worried about the odds against winning; I was simply pointing out that 0/67 is no more than expected if the rating difference is 1000 or more. In the case of Crafty, I want _much_ more, because going from 2800 to 1800 is where I am already, but I want to go _much_ lower. So playing against Crafty (normal) is going to take a _ton_ of games to get a single win.
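
For reference, the numbers in rjgibert's list come straight from the logistic rating model: the weaker side's expected score at a deficit of D Elo is E = 1 / (1 + 10^(D/400)), so the odds against it are (1 - E)/E = 10^(D/400), i.e. a factor of 10^(1/2) ~ 3.1623 per 200 Elo. A quick sketch that reproduces his table:

Code: Select all

#include <stdio.h>
#include <math.h>

/* Expected score for the weaker side at an Elo deficit d, plus the odds
   against it.  Matches rjgibert's figures: d=200 gives 1 in 4.16 (3.16
   to 1 against), d=1000 gives 1 in 317.23 (316.23 to 1 against). */
int main(void) {
  for (int d = 200; d <= 1000; d += 200) {
    double e = 1.0 / (1.0 + pow(10.0, d / 400.0));
    printf("elo %4d: expected score 1/%.2f, %.2f to 1 against\n",
           d, 1.0 / e, (1.0 - e) / e);
  }
  return 0;
}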
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Result 10000 - 0

Post by bob »

AlvaroBegue wrote:
AlvaroBegue wrote:A question for Bob Hyatt: If I understand correctly, your scores in this mode always indicate white is ahead. If a draw is 0, wouldn't this mean that white will always avoid draws and black will always seek them? In that sense, this is not equivalent to using random numbers centered around 0.
bob wrote:It doesn't matter if the numbers are centered around 0 or not.
You didn't address my question at all.
The answer is "unknown". Remember, my original intent was simply to weaken Crafty significantly, in response to lots of user requests for such a feature. Previous attempts by others were deemed unsatisfactory. Turn off the eval and leave material + a normal search and you get a positional idiot + a tactical genius, which doesn't feel right. Dumb down the search and leave the evaluation alone and you get a positional genius + a tactical idiot. Again, doesn't feel right. My approach was a little (or a lot) of both. As the skill setting goes down, search depth shrinks because all the clever pruning stuff + extensions phase out, and the eval gets dumber as it becomes more and more pure random numbers.

This is not a paper I am working on at all, so I have not given months of thought to the problem. When I did the skill command, it was tested on our cluster, and I noticed that skill 70 dropped elo by 200, skill 50 dropped elo by another 200. Seemed to be pretty non-linear and I didn't have any way to test the 600-800-1000 type drop since no programs I had were that weak. I just assumed that skill 1 was totally unusable.

Along came the thread in the general forum about skill 1 getting to be too strong, and I started looking. I have come up with something that seems to solve the problem, which was my only intent. But the quality of moves from a pure random eval (with nothing else) is remarkable, and the quality improves with depth. On my laptop, at 10 secs per move it doesn't make very many tactical goofs, while limiting the depth to 3-4-5 plies causes gross blunders... Hence my work to add a "slow-down" for low skill settings. Now all I need is some calibration data to get an idea of how this changes the Elo, as I'd like a pretty smooth transition from strong to weak...

To try to answer your question, I will try disabling the repetition detection, just to see if that changes anything at skill=1. No idea at the present.
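
The blend described above (the eval becoming "more and more pure random numbers" as skill drops) can be sketched roughly as follows. The linear mix and the 0-99 noise range are assumptions for illustration, not Crafty's actual weighting:

Code: Select all

#include <stdio.h>
#include <stdlib.h>

/* Rough sketch of a skill blend: at skill 100 the eval is the real
   score, at skill 1 it is almost entirely random noise.  Illustration
   only -- Crafty's actual code may differ. */
static int skill_eval(int real_score, int skill) {  /* skill in 1..100 */
  int noise = rand() % 100;                         /* random component */
  return (skill * real_score + (100 - skill) * noise) / 100;
}

int main(void) {
  srand(1);
  for (int skill = 100; skill >= 1; skill -= 33)
    printf("skill %3d: eval(50) = %d\n", skill, skill_eval(50, skill));
  return 0;
}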
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Result 10000 - 0

Post by Gerd Isenberg »

bob wrote: Let me give an example for why I think this doesn't matter.

Let's suppose we are going to do a simple 2-ply search + captures, with a random eval of 0-99 (there are no captures in my example).

At the root, we have 2 moves, m1-1 and m1-2. m1-1 is a check that has only one way out. m1-2 is a normal move that leaves the opponent with 20 possible replies.

We make move m1-1 (at a max node) and reach the position at ply 2, which is a min node. We make the only possible move and get to ply 3. With no captures, we generate a random number (say 50) and return it. When it is passed back to ply 2 it is negated and arrives as -50. We are finished with this move, so we back it up to the root, where negamax turns it into +50, the score for the first move.

Now we try the second move from the root and again reach ply 2, but here we have 20 moves. We make each one of those and call quiesce(), which returns a random number between 0 and 99. As each score is backed up to ply 2, negamax negates it, so we choose the largest of those numbers, which is actually the smallest random number we found. If we assume the 20 random numbers are 1 through 20, then at ply 2 we choose the largest of -1 to -20, which is -1. We return that to the root, where it becomes +1. And we choose the root move with the larger score: max(50, 1).

It doesn't matter whether the numbers are centered around 0 or not. What is important is that you get a sampling distribution of random numbers and always choose the most favorable result, which then gets backed up thru negamax. At ply P you pick the largest value each time and back it up; at ply P-1 you get those values with the sign changed and pick the largest of them there, which is the smallest value from the next ply, if you think about it.

These values propagate back thru the tree, although the numbers mean nothing. All you can conclude is that if you are at a max node, you pick the move that leads to the largest "score", because that represents the move where you had the largest number of options further down and your opponent had the fewest.

A bit odd until you think about it, and then suddenly nothing matters but the random numbers. All you care about is that the more choices you have, the better your chance of picking from a set that contains at least one large number. And then at the previous ply, your opponent chooses from that set of scores, but negated, so he is picking the smallest...
Daniel's results seem to confirm that a symmetrically distributed eval around zero doesn't matter. If you take the max of identically distributed samples, you have the largest, no matter how much smaller the others are:

Code: Select all

max(-100, 100) == max(99, 100)
What happens to the usual >95% fail-high-on-first-move measure, and to the EBF?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Result 10000 - 0

Post by bob »

Gerd Isenberg wrote:
bob wrote: Let me give an example for why I think this doesn't matter.

Let's suppose we are going to do a simple 2-ply search + captures, with a random eval of 0-99 (there are no captures in my example).

At the root, we have 2 moves, m1-1 and m1-2. m1-1 is a check that has only one way out. m1-2 is a normal move that leaves the opponent with 20 possible replies.

We make move m1-1 (at a max node) and reach the position at ply 2, which is a min node. We make the only possible move and get to ply 3. With no captures, we generate a random number (say 50) and return it. When it is passed back to ply 2 it is negated and arrives as -50. We are finished with this move, so we back it up to the root, where negamax turns it into +50, the score for the first move.

Now we try the second move from the root and again reach ply 2, but here we have 20 moves. We make each one of those and call quiesce(), which returns a random number between 0 and 99. As each score is backed up to ply 2, negamax negates it, so we choose the largest of those numbers, which is actually the smallest random number we found. If we assume the 20 random numbers are 1 through 20, then at ply 2 we choose the largest of -1 to -20, which is -1. We return that to the root, where it becomes +1. And we choose the root move with the larger score: max(50, 1).

It doesn't matter whether the numbers are centered around 0 or not. What is important is that you get a sampling distribution of random numbers and always choose the most favorable result, which then gets backed up thru negamax. At ply P you pick the largest value each time and back it up; at ply P-1 you get those values with the sign changed and pick the largest of them there, which is the smallest value from the next ply, if you think about it.

These values propagate back thru the tree, although the numbers mean nothing. All you can conclude is that if you are at a max node, you pick the move that leads to the largest "score", because that represents the move where you had the largest number of options further down and your opponent had the fewest.

A bit odd until you think about it, and then suddenly nothing matters but the random numbers. All you care about is that the more choices you have, the better your chance of picking from a set that contains at least one large number. And then at the previous ply, your opponent chooses from that set of scores, but negated, so he is picking the smallest...
Daniel's results seem to confirm that a symmetrically distributed eval around zero doesn't matter. If you take the max of identically distributed samples, you have the largest, no matter how much smaller the others are:

Code: Select all

max(-100, 100) == max(99, 100)
What happens to the usual >95% fail-high-on-first-move measure, and to the EBF?
EBF goes to hell in a handbasket because, as I mentioned, null-move, LMR, extensions, and pruning all get turned completely off by the time we get down to "skill 1". I have not tested random eval with a normal search, but will do so before long, just to see if this hurts, which I suspect it might. A short, fat tree seems to offer the best chance for "the Beal effect", rather than our current wildly variable-depth searches for different lines... But I can't make that statement with any confidence until I actually test it.

If I disable the current "spin-loop" that slows the search _way_ down at skill 1, the thing plays reasonably. It hardly hangs pawns or pieces. You'd think that with a purely random eval, you could make a capture and it would fail to recapture, or you could threaten a piece and it would ignore it. Not so at all, which is amazing. Even slowed down to 3-4 ply searches at skill 1 on my laptop, it still sees many of the above moves, although it will certainly not recognize any sort of threat with any reliability.

If you have not tried it, you should compile Crafty with -DSKILL, run it with no book and skill 1, and watch how it plays, knowing that the eval is 99% random. It is pretty amazing, at least to me as a decent chess player. When Volker first pointed out that the thing had somehow gotten too strong at skill 1, I was watching TV and just playing instant moves myself, and lost 3 games before I knew what hit me. I then started playing more seriously, and noticed that this thing is not blundering around at all. Some of the moves are ugly at times (we all know what a pure mobility eval can do for moves like a4 and h4, and for bringing the queen out too early), but it is not hanging pieces, and when I do hang a piece, the damn thing rips it instantly...

I was not expecting that, and it took some testing to finally understand what was happening, then some thinking to figure out how to fix it.
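
For anyone who wants to quantify the EBF degradation Gerd asked about, the two usual estimates from an iterative-deepening search are the ratio of node counts between successive iterations and the d-th root of the node count at depth d. A minimal sketch; the node counts here are made-up placeholders, not measurements from Crafty or Scorpio:

Code: Select all

#include <stdio.h>
#include <math.h>

/* Two common EBF estimates from iterative-deepening node counts:
   nodes(d) / nodes(d-1), and nodes(d)^(1/d).  The counts below are
   made-up placeholders; substitute whatever your engine reports. */
int main(void) {
  long long nodes[] = {30, 200, 1500, 12000, 95000, 780000};  /* depth 1..6 */
  int n = sizeof(nodes) / sizeof(nodes[0]);
  for (int d = 2; d <= n; d++)
    printf("depth %d: ratio EBF %.2f, root EBF %.2f\n",
           d, (double)nodes[d - 1] / nodes[d - 2],
           pow((double)nodes[d - 1], 1.0 / d));
  return 0;
}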
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Result 10000 - 0

Post by jwes »

I tried playing against Crafty with skill 1 and depth 2, and it played suspiciously well. It seemed surprisingly unwilling to drop pieces, more so than I would guess the Beal effect would explain. It also means that if someone really cares, they can trace through the play at ply 2 and see if it is just the Beal effect.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Result 10000 - 0

Post by bob »

jwes wrote:I tried playing against Crafty with skill 1 and depth 2, and it played suspiciously well. It seemed surprisingly unwilling to drop pieces, more so than I would guess the Beal effect would explain. It also means that if someone really cares, they can trace through the play at ply 2 and see if it is just the Beal effect.
If you look at the code, there is no other explanation. Clearly the eval is 99% random, 1% real score. No tricks. It surprises me as well, although at depth 2 I didn't find it playing that great. But at st=1 it sure was aware of what it was doing...

I'm going to keep poking around, however.