Questions for the Stockfish team

bob · Post by **bob** » Wed Jul 21, 2010 11:00 pm

Gerd Isenberg wrote:
Daniel Shawul wrote:Gerd,
My objection was to a completely random evaluation which I tried to outline as much as I can here http://talkchess.com/forum/viewtopic.ph ... 66&t=35455. Now we have come to apparent consensus how the score of one side should be negated to the other side for minimax to work... This was originally absent from his reply to me but somehow expected me to understand even after giving me pseudocode how to do it..

I gave up the point that it does some weird kind of mobility evaluation the minute Marco posted it. But I pointed out how bad that eval is and how one-sided it is, completely disregarding the perfect-information game assumption. The supposed engine evaluates like 'poker', as if it can't see what the opponent has to offer. It just evaluates its own mobility and goes on... See points c & d of my post in the link above.

Did they (Don Beal) say an 1800 Elo engine can be constructed this way? Even he (Bob) himself didn't believe it when people first told him it plays like 1800. He thought it played like 800 (he said so in this thread, of course). I can't say what Crafty does or does not do, which is why I am sticking to what he says about the effect with random eval, and I am definitely not getting an 1800 Elo engine.

Daniel
Hi Daniel,
yes, you cannot weigh one's words in these threads, and there are tons of misunderstandings due to implicit knowledge and assumptions about the point, selective reading, impatience etc.

Apparently, if the random range per side is symmetric around zero, you may use the same "evaluation" function for white and black, no matter whether you use explicit min versus max or negamax. I guess if leaf nodes with either white or black to move always get "random" winning scores from their negamax perspective, let's say in the 1000..2000 range (mate being > 15000), things are obviously different, especially with all the usual ID, TT, extension stuff etc. I guess that the latter has some more search "instability problems", and I expect it to be weaker than the symmetric one.

I don't have that ICCA Journal handy, and I'm not sure I ever read Beal's article. I remember some discussions. So no idea on any Elo figures. Maybe Bob's 1800 claim is a bit overestimated due to the huge error introduced by some "random" wins or draws against otherwise > 2800 engines, which don't apply any opponent-model approach. On the other hand, such random-eval engines still have mate scores.

Gerd

This is not "my estimate". This happened in a general forum post where someone that was maintaining such a "crippled list" clearly had Crafty way stronger than it was supposed to be. And there were no 2000+ programs in this list, because of the error that introduces...

Gerd Isenberg · Post by **Gerd Isenberg** » Wed Jul 21, 2010 11:01 pm

AlvaroBegue wrote:
bob wrote:The danger is that with real random numbers, they change significantly, but using the hash signature, it is possible for several positions to share low-order bits, which could be a problem.
Why would the low-order bits in the hash not be essentially random?

Position-independent random numbers would give the same position, whether re-searched or reached by transposition, different scores rather than the same score.
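
A minimal sketch of that distinction, in C. The function names, the xorshift generator, and the 0..99 score range are assumptions chosen for illustration; none of this is taken from Crafty or any other engine. A score derived from the hash key is reproducible for a given position, while a score drawn from a free-running PRNG depends only on how many times the generator has been called.

```c
#include <stdint.h>

/* Hypothetical helper: score derived from the position's hash key.
   The same key always yields the same score, so a re-search or a
   transposition into the same position sees a consistent value. */
int eval_from_hash(uint64_t hash_key) {
    return (int)(hash_key % 100);      /* assumed score range 0..99 */
}

/* Hypothetical helper: score drawn from a PRNG that ignores the position.
   Every call advances the generator, so visiting the same position twice
   will generally produce two different scores. */
static uint64_t prng_state = 0x9E3779B97F4A7C15ULL;  /* any nonzero seed */

int eval_from_prng(void) {
    /* one xorshift64 step; any PRNG would serve for this sketch */
    prng_state ^= prng_state << 13;
    prng_state ^= prng_state >> 7;
    prng_state ^= prng_state << 17;
    return (int)(prng_state % 100);    /* assumed score range 0..99 */
}
```

With the hash-derived variant, a transposition table entry stays valid when the position is reached again; with the position-independent variant it may not match what a fresh evaluation would return.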

bob · Post by **bob** » Wed Jul 21, 2010 11:01 pm

jwes wrote:I just had another idea about why crafty plays too well. Since the search will always choose the first move that returns the minimax score, the move that your move ordering picks first is very likely to be chosen.

However, that ordering is done by q-searches, so it is ordered based on random results, which is _really_ bad. If there are no captures, each root move is scored (by q-search) as a pure random number.

Ralph Stoesser · Post by **Ralph Stoesser** » Wed Jul 21, 2010 11:05 pm

Dann Corbit wrote:
Ralph Stoesser wrote:
Dann Corbit wrote: Here is how it ended up:
Code:
   Program                  Elo    +   -   Games   Score   Av.Op.  Draws
 1 Crafty-232ap00         : 3344  133 121    55    90.0 %   2963   12.7 %
 2 Crafty-23.2a-skill-mod : 3270  113 105    55    83.6 %   2986   14.5 %
 3 Crafty-232ap50         : 3179  102  97    55    75.5 %   2984   12.7 %
 4 Crafty-232ap10         : 3100   88  86    55    63.6 %   3003   18.2 %
 5 Crafty-232ap01         : 2945   87  88    55    39.1 %   3022   16.4 %
 6 Crafty-232am01         : 2889   90  94    55    30.0 %   3036   16.4 %
 7 Crafty-232am10         : 2788  113 126    55    18.2 %   3049    3.6 %
 8 Crafty-232am50         : 2486    0   0    55     0.0 %   3086    0.0 %
It seems you have found a strong setting. What was the time control?
The strong setting that I found is only available with a customized change that allows skill=0. It makes no real sense that such a setting is stronger.

The time control was game in one minute + one second Fischer time increment.

The machine was 4x3GHz, no ponder 64 bit crafty.

It really makes no sense. But if we were in Wonderland, a better PRNG would have played even stronger at skill=0.

Are there any chances you could have mixed up the players in the tournament? If not, then it's time for a little debug session.

bob · Post by **bob** » Wed Jul 21, 2010 11:05 pm

Gerd Isenberg wrote:
bob wrote:
Daniel Shawul wrote:You said random evaluation at first, and then you started bringing in
order first by 0.01 * real eval, which I strenuously objected to,
then you said eval of white = -eval of black, which further breaks the random nature of the eval, period.
I will not try to convince anyone further. Anyone interested to know my position can read all the issues
I raised here http://talkchess.com/forum/viewtopic.ph ... 66&t=35455 from the perspective of random eval and draw their own conclusions.
It really doesn't help if you post voluminous game results with a different setup than what was discussed.
This is basically a strawman argument from you which neglects the completely random evaluation criterion
you originally proposed.

They say insanity is doing the same thing over and over again and expecting different results.
I say it is expecting a consistent miracle from a random event.
I said, quite clearly, that I have tried _all_ of those things, if you read. The original discussion was about Crafty's "skill 1" performance. Just look at 23.2, search for SKILL in evaluate.c. It is not _that_ complicated to understand what I do. skill 100 simply uses 100% of normal eval. Which is negated at the return point in evaluate.c if it is not WTM since the eval is based on +=good for white.

for skill 1, you get:

score = .01 * score + .99 * random()

then it does the normal

return (wtm) ? score : - score;

That has been there since the skill command was added. I assume that if you jump into a discussion, you at least know what it is about. Which would include this information. I then pointed out that I had tried a pure random number with no positional component at all. just:

score = random();

with the usual return.

If you use negamax, then yes, you have to negate the score because black wants the biggest score, but that has to be the opposite of what white wants. So, the normal "colorful" return is used.

And that does give a pseudo-mobility that works just fine. Too fine, in fact...

So before we go on, how about looking at the statement following the "SKILL" token in evaluate.c, so that we are talking about the same thing. I am _always_ talking about what I do in Crafty, not what I imagine others are doing in their programs...
Not everybody likes to download Crafty sources to inspect your code on the fly. It is much more appropriate and convenient for the readers if you explain it with patience.

What is the value range of random? Is it symmetrical around zero or not?

How would these three more or less random evals play?
Code:
negamaxEval ::= rand() > 0.5 ? MATE_SCORE/2 : -MATE_SCORE/2;

negamaxEval ::= wtm ? MATE_SCORE/2 : -MATE_SCORE/2;

negamaxEval ::= MATE_SCORE/2;  // both are winning at a leaf

In Crafty-23.2, the range of numbers is 0 <= N < 99, so between 0.00 and 0.99. The only thing I do that needs some thought is that these numbers are negated if black is on move. I have not yet concluded exactly what the effect of that is.
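
The blend and the sign flip described in this post can be sketched as follows. This is a hypothetical illustration, not Crafty's actual code: the function name `blended_eval`, the integer arithmetic, and the assumption that skill N keeps N percent of the real evaluation are all mine, inferred from the statement that skill 100 uses 100% of the normal eval. The random draw is passed in as `noise` so the sketch stays deterministic.

```c
/* Hypothetical sketch of a skill-weighted evaluation.
   real_score : normal evaluation, positive = good for White
   skill      : 0..100, assumed percentage of the real evaluation kept
   noise      : a random draw supplied by the caller (e.g. 0..99)
   wtm        : nonzero if White is to move */
int blended_eval(int real_score, int skill, int noise, int wtm) {
    /* mix the real evaluation with the random component */
    int score = (skill * real_score + (100 - skill) * noise) / 100;

    /* the mixed score is still from White's point of view; negamax
       expects it from the side to move, so flip the sign for Black */
    return wtm ? score : -score;
}
```

At skill 0 the result is the raw random number with the side-to-move sign applied, which is the "pure random" case discussed in the thread; at skill 100 the noise has no influence.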

I think all 3 of the above suggestions will negate the "Beal effect" because they are not random, and you either get A or B, but nothing in between. The Beal effect depends on a range of numbers assigned with equal probability, so that the more moves there are for one side, the more opportunities that side has to get a bigger number...
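
A small simulation of the effect as described here, assuming each leaf gets an independent uniform score in 0..99 and a node simply takes the best of its children. The generator, the seed, and the names `beal_rand100` and `average_best_of` are assumptions for illustration only. For continuous uniform numbers on [0, 1] the expected maximum of n draws is n/(n+1), so the side with more moves should see a higher average best score.

```c
#include <stdint.h>

static uint64_t beal_state = 88172645463325252ULL;  /* arbitrary nonzero seed */

/* xorshift64 step reduced to the assumed leaf score range 0..99 */
static int beal_rand100(void) {
    beal_state ^= beal_state << 13;
    beal_state ^= beal_state >> 7;
    beal_state ^= beal_state << 17;
    return (int)(beal_state % 100);
}

/* Average, over `trials` nodes, of the best score among `moves` children
   when every child receives an independent uniform random score. */
double average_best_of(int moves, int trials) {
    long total = 0;
    for (int t = 0; t < trials; t++) {
        int best = -1;                      /* below every possible score */
        for (int m = 0; m < moves; m++) {
            int s = beal_rand100();
            if (s > best)
                best = s;
        }
        total += best;
    }
    return (double)total / trials;
}
```

Comparing `average_best_of(5, 20000)` with `average_best_of(30, 20000)` shows the node with 30 moves averaging a noticeably higher best score, which is the pseudo-mobility signal. A two-valued eval such as the three suggestions above offers no such gradient once at least one child draws the high value.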

bob · Post by **bob** » Wed Jul 21, 2010 11:10 pm

Ralph Stoesser wrote:
Dann Corbit wrote:
Ralph Stoesser wrote:
Dann Corbit wrote: Here is how it ended up:
Code:
   Program                  Elo    +   -   Games   Score   Av.Op.  Draws
 1 Crafty-232ap00         : 3344  133 121    55    90.0 %   2963   12.7 %
 2 Crafty-23.2a-skill-mod : 3270  113 105    55    83.6 %   2986   14.5 %
 3 Crafty-232ap50         : 3179  102  97    55    75.5 %   2984   12.7 %
 4 Crafty-232ap10         : 3100   88  86    55    63.6 %   3003   18.2 %
 5 Crafty-232ap01         : 2945   87  88    55    39.1 %   3022   16.4 %
 6 Crafty-232am01         : 2889   90  94    55    30.0 %   3036   16.4 %
 7 Crafty-232am10         : 2788  113 126    55    18.2 %   3049    3.6 %
 8 Crafty-232am50         : 2486    0   0    55     0.0 %   3086    0.0 %
It seems you have found a strong setting. What was the time control?
The strong setting that I found is only available with a customized change that allows skill=0. It makes no real sense that such a setting is stronger.

The time control was game in one minute + one second Fischer time increment.

The machine was 4x3GHz, no ponder 64 bit crafty.
It really makes no sense. But if we were in Wonderland, a better PRNG would have played even stronger at skill=0.
Are there any chances you could have mixed up the players in the tournament? If not, then it's time for a little debug session.

In my tests (so far) skill=1 and skill=0 are scoring the same. And worse than skill=10, which is worse than skill=20, etc. I'd expect the skill = -N versions to play worse because rather than doing null-move searches that reduce the depth to save time, the null-move searches are actually deeper than the normal search which will slow things down significantly and make it play weaker, which is what happened. It almost looks like the skill=0 version is using a normal eval somehow...

Daniel Shawul · Post by **Daniel Shawul** » Wed Jul 21, 2010 11:10 pm

Here is another 300-0, this time with a real random() between 0 and 1000, NOT (hash_key % 1000), as I don't want to let you make that an issue...
crap games 2 http://sites.google.com/site/dshawul/te ... ects=0&d=1
Expect 30000 games in a couple of hours. All I need is more opponents and more positions...

bob · Post by **bob** » Wed Jul 21, 2010 11:13 pm

AlvaroBegue wrote:
bob wrote:The danger is that with real random numbers, they change significantly, but using the hash signature, it is possible for several positions to share low-order bits, which could be a problem.
Why would the low-order bits in the hash not be essentially random?

You are at position P, you make a move and then either generate a random number or use the lower N bits. Then you make another move. Are you certain that the low-order bits of the random number you use to update the board alter the lower N bits, as opposed to just altering a significant number of other bits, which is all we want for our Zobrist hashing? This "Beal effect" depends on uniformly distributed random numbers to come up with this pseudo-mobility stuff... I am not certain the hash signature will offer this. It might, and I am going to test it, but I am not sure enough to say good or bad at the moment. It would certainly be different, however.
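
To make the question concrete, here is a minimal sketch of an incremental Zobrist update and of taking a score from the low N bits. The helper names and the quiet-move simplification are assumptions for illustration; this is not Crafty's hashing code. The low N bits of the signature change only when the XOR of the two keys involved is nonzero in those bits.

```c
#include <stdint.h>

/* Hypothetical helper: incremental Zobrist update for a quiet move.
   XOR out the key for the piece on its old square and XOR in the key
   for the same piece on its new square. */
uint64_t zobrist_move(uint64_t signature, uint64_t key_from, uint64_t key_to) {
    return signature ^ key_from ^ key_to;
}

/* Hypothetical helper: the low n bits of the signature, for 1 <= n <= 63,
   which is what a hash-derived "random" score would be built from. */
uint64_t low_bits(uint64_t signature, int n) {
    return signature & ((1ULL << n) - 1);
}
```

If two keys happen to agree in their low N bits, the move leaves the low-order score untouched even though the full signature changes. With well-mixed 64-bit keys that should be rare, but whether the resulting scores are uniform enough for the Beal effect is exactly what remains to be tested.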

Daniel Shawul · Post by **Daniel Shawul** » Wed Jul 21, 2010 11:13 pm

This is actually a method I came up with in this thread to avoid breaking this same logic. I also suggested using +ve/-ve for white/black so as not to break minimax. But the result remains the same: 300-0 to non-random...

bob · Post by **bob** » Wed Jul 21, 2010 11:16 pm

Tord Romstad wrote:
bob wrote:1 0 is not the best test. That restricts the depth enough that random eval fails, but use something like 1+1 or 2+2 and watch what happens.
As Larry pointed out, it wasn't 1 0, but 2+1. At any rate, here's what happens at 5+2:
Code:
[Event "Test Game"]
[Site "Oslo"]
[Date "2010.07.21"]
[Round "-"]
[White "Stockfish 100720 64bit"]
[Black "tord"]
[Result "0-1"]
[TimeControl "5+2"]

1. e4 e6 2. d4 d5 3. Nd2 Nf6 4. e5 Nfd7 5. f4 c5 6. c3 Nc6 7. Ndf3 Qb6 8.
g3 cxd4 9. cxd4 Bb4+ 10. Kf2 g5 11. Nxg5 Qxd4+ 12. Qxd4 Nxd4 13. Be3 Nc2
14. Rc1 Nxe3 15. Bb5 Nf5 16. N1f3 Ba5 17. Rhd1 a6 18. a3 axb5 19. Nd4 Bb6
20. Kg2 Nxd4 21. Rc3 Nf5 22. Rdc1 Ke7 23. Rb3 Be3 24. Rc7 Bb6 25. Rc1 Ra5
26. Rcc3 Nc5 27. Rxc5 Bxc5 28. Rc3 Bb6 29. Kh3 Bd7 30. Rd3 Nd4 31. f5 Nxf5
32. Kg2 Bd4 33. b4 Ra6 34. Nf3 Bb2 35. a4 bxa4 36. Rd1 a3 37. Ne1 Ne3+ 38.
Kh3 Nxd1 39. Kh4 a2 40. Nc2 Ba4 41. Ne1 a1=Q 42. Nd3
{White resigns} 0-1
For those who can't be bothered to play through the game (admittedly, it isn't among the greatest games ever), white has blundered three pieces by move 20, and continues to make new blunders every few moves throughout the game. Not very different from the 2+1 game. The program is clearly vastly stronger than a random mover, but also far closer to 800 than to 1800. In fact, I'm fairly sure it is also closer to 0 than to 800, but it is difficult to judge at such extremely low levels of play.

Suddenly it won't hang material, and plays decent chess...
I'm sure that's the case for Crafty, but for whatever reason, Stockfish is very different in this respect. With a random eval, it plays far weaker than any human beginner.

You said that null move and other types of pruning were disabled in Crafty at the lowest skill settings. In my test games, Stockfish used its usual search, with all tricks enabled. Could it be that these advanced search tricks perform much worse than plain alpha-beta with a random eval?

So many questions, so few answers.

I do not know. I only knew that I wanted to throttle the depth so that it would not be finding mates in 20 when supposedly playing at the 1000 level. Whether that actually helps or hurts is unknown. I'll add this to my list of things to try, that is normal crafty, with random eval, but everything else intact...
