Questions for the Stockfish team

Daniel Shawul · Post by **Daniel Shawul** » Wed Jul 21, 2010 9:21 pm

Yes it is unsigned. Random numbers from 0 to 1000 are taken as the eval.
Please don't blame me for carrying out these silly tests as this is exactly what he suggested originally,
which he confirmed just in his reply to your post! The +ve/-ve for black/white, or returning random numbers from a symmetric interval about 0, is something which came up later.
I can test other suggestions but first I think it is better to understand the situation and what exactly is said in that paper.

regards,
Daniel

Dann Corbit · Post by **Dann Corbit** » Wed Jul 21, 2010 9:23 pm

Ralph Stoesser wrote:

Dann Corbit wrote: Here is how it ended up:

   Program                  Elo    +   -   Games   Score   Av.Op.  Draws
 1 Crafty-232ap00         : 3344  133 121    55    90.0 %   2963   12.7 %
 2 Crafty-23.2a-skill-mod : 3270  113 105    55    83.6 %   2986   14.5 %
 3 Crafty-232ap50         : 3179  102  97    55    75.5 %   2984   12.7 %
 4 Crafty-232ap10         : 3100   88  86    55    63.6 %   3003   18.2 %
 5 Crafty-232ap01         : 2945   87  88    55    39.1 %   3022   16.4 %
 6 Crafty-232am01         : 2889   90  94    55    30.0 %   3036   16.4 %
 7 Crafty-232am10         : 2788  113 126    55    18.2 %   3049    3.6 %
 8 Crafty-232am50         : 2486    0   0    55     0.0 %   3086    0.0 %

It seems you have found a strong setting. What was the time control?

The strong setting that I found is only available with a customized change that allows skill=0. It makes no real sense that such a setting is stronger.

The time control was game in one minute + one second Fischer time increment.

The machine was 4x3GHz, no ponder 64 bit crafty.

Daniel Shawul · Post by **Daniel Shawul** » Wed Jul 21, 2010 9:44 pm

Hi Gerd
I agree totally. Just in case there is confusion, the previous post was not meant to address you..
Flat view and long threads could be confusing. I will try random number from symmetrical intervals next..

Daniel

Tord Romstad · Post by **Tord Romstad** » Wed Jul 21, 2010 9:58 pm

bob wrote:1 0 is not the best test. That restricts the depth enough that random eval fails, but use something like 1+1 or 2+2 and watch what happens.

As Larry pointed out, it wasn't 1 0, but 2+1. At any rate, here's what happens at 5+2:

[Event "Test Game"]
[Site "Oslo"]
[Date "2010.07.21"]
[Round "-"]
[White "Stockfish 100720 64bit"]
[Black "tord"]
[Result "0-1"]
[TimeControl "5+2"]

1. e4 e6 2. d4 d5 3. Nd2 Nf6 4. e5 Nfd7 5. f4 c5 6. c3 Nc6 7. Ndf3 Qb6 8.
g3 cxd4 9. cxd4 Bb4+ 10. Kf2 g5 11. Nxg5 Qxd4+ 12. Qxd4 Nxd4 13. Be3 Nc2
14. Rc1 Nxe3 15. Bb5 Nf5 16. N1f3 Ba5 17. Rhd1 a6 18. a3 axb5 19. Nd4 Bb6
20. Kg2 Nxd4 21. Rc3 Nf5 22. Rdc1 Ke7 23. Rb3 Be3 24. Rc7 Bb6 25. Rc1 Ra5
26. Rcc3 Nc5 27. Rxc5 Bxc5 28. Rc3 Bb6 29. Kh3 Bd7 30. Rd3 Nd4 31. f5 Nxf5
32. Kg2 Bd4 33. b4 Ra6 34. Nf3 Bb2 35. a4 bxa4 36. Rd1 a3 37. Ne1 Ne3+ 38.
Kh3 Nxd1 39. Kh4 a2 40. Nc2 Ba4 41. Ne1 a1=Q 42. Nd3
{White resigns} 0-1

For those who can't be bothered to play through the game (admittedly, it isn't among the greatest games ever), white has blundered three pieces by move 20, and continues to make new blunders every few moves throughout the game. Not very different from the 2+1 game. The program is clearly vastly stronger than a random mover, but also far closer to 800 than to 1800. In fact, I'm fairly sure it is also closer to 0 than to 800, but it is difficult to judge at such extremely low levels of play.

Suddenly it won't hang material, and plays decent chess...

I'm sure that's the case for Crafty, but for whatever reason, Stockfish is very different in this respect. With a random eval, it plays far weaker than any human beginner.

You said that null move and other types of pruning were disabled in Crafty at the lowest skill settings. In my test games, Stockfish used its usual search, with all tricks enabled. Could it be that these advanced search tricks perform much worse than plain alpha-beta with a random eval?

Gerd Isenberg · Post by **Gerd Isenberg** » Wed Jul 21, 2010 10:10 pm

Daniel Shawul wrote:Yes it is unsigned. Random numbers from 0 to 1000 are taken as the eval.

Of course % 1000 is unsigned, dumb question, sorry.
The pre-leaf side, which maximizes the "negated" eval scores, can only achieve a heuristic draw then.

Please don't blame me for carrying out these silly tests as this is exactly what he suggested originally,
which he confirmed just in his reply to your post! The +ve/-ve for black/white, or returning random numbers from a symmetric interval about 0, is something which came up later.

No way to blame you for testing! I think this random eval topic is quite interesting and quite complicated (for some of us), and it helps to understand search, search instability issues, etc. Is alpha-beta still equivalent to minimax here? I guess not. There are PVS and LMR re-searches, and the shape of the tree changes: "expected" all-nodes with a lot of moves likely become cut-nodes. That is not really near a "minimal tree" in PVS.

I can test other suggestions but first I think it is better to understand the situation and what exactly is said in that paper.

regards,
Daniel

Sure.

Regards,
Gerd
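Gerd's remark about the unsigned eval can be illustrated with a small sketch. It assumes a plain negamax convention (a leaf score is from the point of view of the side to move at the leaf, and the parent negates it), which is an assumption about the engines discussed here; the names `unsigned_eval`, `symmetric_eval` and `pre_leaf_value` are hypothetical.

```python
import random

def unsigned_eval(rng):
    # Random score in [0, 1000), like `rand() % 1000`: never negative.
    return rng.randrange(1000)

def symmetric_eval(rng):
    # Random score from an interval symmetric about 0: [-500, 500).
    return rng.randrange(-500, 500)

def pre_leaf_value(eval_fn, rng, num_moves=30):
    # One-ply negamax: the side to move at the pre-leaf node picks the
    # child whose negated leaf score is largest.
    return max(-eval_fn(rng) for _ in range(num_moves))

rng = random.Random(2010)
unsigned = [pre_leaf_value(unsigned_eval, rng) for _ in range(1000)]
symmetric = [pre_leaf_value(symmetric_eval, rng) for _ in range(1000)]

# With an unsigned eval the pre-leaf side can never score above 0.
print("unsigned  max:", max(unsigned))
print("symmetric max:", max(symmetric))
```

In this toy setting the unsigned version caps the pre-leaf side at 0, while the symmetric version lets either side come out ahead. It says nothing about how a full search with pruning behaves.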

bob · Post by **bob** » Wed Jul 21, 2010 10:39 pm

Dann Corbit wrote:
Ralph Stoesser wrote:
Dann Corbit wrote: Here is how it ended up:
   Program                  Elo    +   -   Games   Score   Av.Op.  Draws
 1 Crafty-232ap00         : 3344  133 121    55    90.0 %   2963   12.7 %
 2 Crafty-23.2a-skill-mod : 3270  113 105    55    83.6 %   2986   14.5 %
 3 Crafty-232ap50         : 3179  102  97    55    75.5 %   2984   12.7 %
 4 Crafty-232ap10         : 3100   88  86    55    63.6 %   3003   18.2 %
 5 Crafty-232ap01         : 2945   87  88    55    39.1 %   3022   16.4 %
 6 Crafty-232am01         : 2889   90  94    55    30.0 %   3036   16.4 %
 7 Crafty-232am10         : 2788  113 126    55    18.2 %   3049    3.6 %
 8 Crafty-232am50         : 2486    0   0    55     0.0 %   3086    0.0 %
It seems you have found a strong setting. What was the time control?
The strong setting that I found is only available with a customized change that allows skill=0. It makes no real sense that such a setting is stronger.

The time control was game in one minute + one second Fischer time increment.

The machine was 4x3GHz, no ponder 64 bit crafty.

What was this, just a RR among the different skill settings, or were other opponents in the mix?

bob · Post by **bob** » Wed Jul 21, 2010 10:41 pm

jwes wrote:
bob wrote:It is definitely odd. Fortunately, to measure this stuff, I have the perfect facility here. The new thread I posted is the result of a 24 hour run to test 11 different skill settings for 30,000 games each.
Would it make any difference if you used the hash key to create the random score rather than random()? That would result in positions always having the same (random) evaluation.

I'd be concerned about the randomness in that regard, although I can test it later to see. The danger is that with real random numbers, they change significantly, but using the hash signature, it is possible for several positions to share low-order bits, which could be a problem.

I'll run it once the current attempt is finished to see...

AlvaroBegue · Post by **AlvaroBegue** » Wed Jul 21, 2010 10:44 pm

bob wrote:The danger is that with real random numbers, they change significantly, but using the hash signature, it is possible for several positions to share low-order bits, which could be a problem.

Why would the low-order bits in the hash not be essentially random?
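A minimal sketch of the hash-based random eval under discussion, assuming a Zobrist-style signature built by XORing independent 64-bit random numbers. The table layout and the names `ZOBRIST`, `signature`, `hash_eval` and `random_position` are hypothetical and not taken from Crafty or Scorpio. It shows that a given position always receives the same score, and it counts how often each low 10-bit pattern occurs so the uniformity question can be checked empirically.

```python
import random
from collections import Counter

rng = random.Random(12345)

# Hypothetical Zobrist table: one 64-bit random number per (piece, square).
NUM_PIECES, NUM_SQUARES = 12, 64
ZOBRIST = [[rng.getrandbits(64) for _ in range(NUM_SQUARES)]
           for _ in range(NUM_PIECES)]

def signature(position):
    # position: iterable of (piece, square) pairs; XOR of their table entries.
    key = 0
    for piece, square in position:
        key ^= ZOBRIST[piece][square]
    return key

def hash_eval(position):
    # Deterministic "random" score in [-500, 500), symmetric about 0.
    return signature(position) % 1000 - 500

def random_position(n_pieces=16):
    # Toy position: n_pieces random pieces on distinct squares.
    squares = rng.sample(range(NUM_SQUARES), n_pieces)
    return [(rng.randrange(NUM_PIECES), sq) for sq in squares]

positions = [random_position() for _ in range(20000)]
low_bits = Counter(signature(p) & 0x3FF for p in positions)  # low 10 bits

print("distinct low-10-bit patterns seen:", len(low_bits), "of 1024")
print("most common pattern count:", max(low_bits.values()))
```

Note that `key % 1000` depends on all 64 bits of the key, whereas the low 10 bits alone would be `key & 0x3FF` (that is, `key % 1024`). The toy positions here are unrelated to each other; positions visited in a real search differ by only a move or two, which this sketch does not model.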

Dann Corbit · Post by **Dann Corbit** » Wed Jul 21, 2010 10:49 pm

bob wrote:
Dann Corbit wrote:
Ralph Stoesser wrote:
Dann Corbit wrote: Here is how it ended up:
   Program                  Elo    +   -   Games   Score   Av.Op.  Draws
 1 Crafty-232ap00         : 3344  133 121    55    90.0 %   2963   12.7 %
 2 Crafty-23.2a-skill-mod : 3270  113 105    55    83.6 %   2986   14.5 %
 3 Crafty-232ap50         : 3179  102  97    55    75.5 %   2984   12.7 %
 4 Crafty-232ap10         : 3100   88  86    55    63.6 %   3003   18.2 %
 5 Crafty-232ap01         : 2945   87  88    55    39.1 %   3022   16.4 %
 6 Crafty-232am01         : 2889   90  94    55    30.0 %   3036   16.4 %
 7 Crafty-232am10         : 2788  113 126    55    18.2 %   3049    3.6 %
 8 Crafty-232am50         : 2486    0   0    55     0.0 %   3086    0.0 %
It seems you have found a strong setting. What was the time control?
The strong setting that I found is only available with a customized change that allows skill=0. It makes no real sense that such a setting is stronger.

The time control was game in one minute + one second Fischer time increment.

The machine was 4x3GHz, no ponder 64 bit crafty.
What was this, just a RR among the different skill settings, or were other opponents in the mix?

Simple round robin of the above programs. Everything makes sense except the zero skill entry.

bob · Post by **bob** » Wed Jul 21, 2010 10:56 pm

Daniel Shawul wrote:What does it take to convince you it is bad ??? Here are some games

scorpio_random : return (hash_key % 1000) at beginning of eval
scorpio_regular : the regular eval

RESULT: Disastrous 67 - 0. Every game tested my patience, and indeed I couldn't take the awfulness after 67 games and had to stop it...

Crap games available for download here http://sites.google.com/site/dshawul/te ... ects=0&d=1

If you insist i can do million games or whatever it takes to convince you...

What is the rating for scorpio (normal)?

Then a math lesson. If A is 200 points weaker, it should win 1 of every 4 games played. If it is 400 points weaker, it should win 1 of every 16 games played. If it is 600 points weaker, it should only win 1 of every 64 games played. If it is 800 points weaker, 1 of every 256. See a pattern? So the random version could be only 800 weaker and not win a single game out of 67, and that would be perfectly normal... And even 0 out of 67 would not be that unusual for an opponent 600 points weaker.

So yes, the test is flawed. Badly. As always, _way_ too few games to measure this.

Also I am not sure that using rightmost 10 bits of hash signature is a good idea, nothing says those low order bits are that random since this is an XOR of a bunch of random numbers. May be OK, may not be.

The only issue I am trying to wrap my head around is whether I should be negating the numbers in positions where it is black to move, which may (or may not) be wrong, and which may (or may not) somehow make the thing play better. Studying this at the moment...

But before you jump to a conclusion about something, do a test that actually supports your conclusion. Why not just stop after 4 games if you are so convinced???

In the case of Crafty, which is probably 2800 on a single CPU with the hardware I am using, I need to drop it by five 200-point increments, which means I would only expect to win maybe one game in every 1,000. Need a _bunch_ of games to measure that. In my case, I want to go _way_ below 1800. To get to 1200 I would expect to win one game out of 64,000 if I didn't blow the math mentally.
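The arithmetic in this post can be checked with a short sketch. It compares the rule of thumb used above (each 200 Elo points divides the winning chances by 4) with the standard logistic Elo expectation, E = 1 / (1 + 10^(d/400)). Treating the expected score as a pure win probability, i.e. ignoring draws, is a simplifying assumption, and the function names are hypothetical.

```python
def rule_of_thumb(deficit):
    # Post's rule: every 200 points weaker divides the win rate by 4.
    return 4.0 ** (-deficit / 200.0)

def logistic_expectation(deficit):
    # Standard Elo expected score for the side that is `deficit` points weaker.
    return 1.0 / (1.0 + 10.0 ** (deficit / 400.0))

def prob_no_wins(p_win, games):
    # Chance of winning none of `games` independent games, each won with p_win.
    return (1.0 - p_win) ** games

for deficit in (200, 400, 600, 800, 1000, 1600):
    p_rule = rule_of_thumb(deficit)
    p_elo = logistic_expectation(deficit)
    print(f"{deficit:5d} weaker: rule 1 in {1 / p_rule:8.0f}, "
          f"logistic 1 in {1 / p_elo:8.0f}, "
          f"P(0 wins in 67) = {prob_no_wins(p_rule, 67):.2f} (rule)")
```

Under the rule of thumb, five increments give 4^5 = 1,024 and eight give 4^8 = 65,536, which match the "one in 1,000" and "one in 64,000" figures in round numbers. The logistic model gives different odds at large deficits (1 in 11 at 400 points, 1 in 101 at 800), so the required number of games depends on which model is assumed.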
