Yes it is unsigned. Random numbers from 0 to 1000 are taken as the eval.
Please don't blame me for carrying out these silly tests, as this is exactly what he suggested originally,
which he confirmed just in his reply to your post! The +ve/-ve for black/white, or returning random numbers from a symmetric interval about 0, is something which came up later.
I can test other suggestions but first I think it is better to understand the situation and what exactly is said in that paper.
regards,
Daniel
Questions for the Stockfish team
Moderators: hgm, Rebel, chrisw
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
-
- Posts: 12606
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Questions for the Stockfish team
Dann Corbit wrote: Here is how it ended up:
Code: Select all
  Program                   Elo    +    -  Games    Score  Av.Op.   Draws
1 Crafty-232ap00         : 3344  133  121     55   90.0 %    2963  12.7 %
2 Crafty-23.2a-skill-mod : 3270  113  105     55   83.6 %    2986  14.5 %
3 Crafty-232ap50         : 3179  102   97     55   75.5 %    2984  12.7 %
4 Crafty-232ap10         : 3100   88   86     55   63.6 %    3003  18.2 %
5 Crafty-232ap01         : 2945   87   88     55   39.1 %    3022  16.4 %
6 Crafty-232am01         : 2889   90   94     55   30.0 %    3036  16.4 %
7 Crafty-232am10         : 2788  113  126     55   18.2 %    3049   3.6 %
8 Crafty-232am50         : 2486    0    0     55    0.0 %    3086   0.0 %
Ralph Stoesser wrote: It seems you have found a strong setting. What was the time control?
The strong setting that I found is only available with a customized change that allows skill=0. It makes no real sense that such a setting is stronger.
The time control was game in one minute + one second Fischer time increment.
The machine was 4x3GHz, no ponder 64 bit crafty.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Questions for the Stockfish team
Hi Gerd
I agree totally. Just in case there is confusion, the previous post was not meant to address you.
Flat view and long threads can be confusing. I will try random numbers from symmetric intervals next.
Daniel
-
- Posts: 1808
- Joined: Wed Mar 08, 2006 9:19 pm
- Location: Oslo, Norway
Re: Questions for the Stockfish team
bob wrote: 1 0 is not the best test. That restricts the depth enough that random eval fails, but use something like 1+1 or 2+2 and watch what happens.
As Larry pointed out, it wasn't 1 0, but 2+1. At any rate, here's what happens at 5+2:
Code: Select all
[Event "Test Game"]
[Site "Oslo"]
[Date "2010.07.21"]
[Round "-"]
[White "Stockfish 100720 64bit"]
[Black "tord"]
[Result "0-1"]
[TimeControl "5+2"]
1. e4 e6 2. d4 d5 3. Nd2 Nf6 4. e5 Nfd7 5. f4 c5 6. c3 Nc6 7. Ndf3 Qb6 8.
g3 cxd4 9. cxd4 Bb4+ 10. Kf2 g5 11. Nxg5 Qxd4+ 12. Qxd4 Nxd4 13. Be3 Nc2
14. Rc1 Nxe3 15. Bb5 Nf5 16. N1f3 Ba5 17. Rhd1 a6 18. a3 axb5 19. Nd4 Bb6
20. Kg2 Nxd4 21. Rc3 Nf5 22. Rdc1 Ke7 23. Rb3 Be3 24. Rc7 Bb6 25. Rc1 Ra5
26. Rcc3 Nc5 27. Rxc5 Bxc5 28. Rc3 Bb6 29. Kh3 Bd7 30. Rd3 Nd4 31. f5 Nxf5
32. Kg2 Bd4 33. b4 Ra6 34. Nf3 Bb2 35. a4 bxa4 36. Rd1 a3 37. Ne1 Ne3+ 38.
Kh3 Nxd1 39. Kh4 a2 40. Nc2 Ba4 41. Ne1 a1=Q 42. Nd3
{White resigns} 0-1
bob wrote: Suddenly it won't hang material, and plays decent chess...
I'm sure that's the case for Crafty, but for whatever reason, Stockfish is very different in this respect. With a random eval, it plays far weaker than any human beginner.
You said that null move and other types of pruning were disabled in Crafty at the lowest skill settings. In my test games, Stockfish used its usual search, with all tricks enabled. Could it be that these advanced search tricks perform much worse than plain alpha-beta with a random eval?
-
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Questions for the Stockfish team
Daniel Shawul wrote: Yes it is unsigned. Random numbers from 0 to 1000 are taken as the eval.
Of course % 1000 is unsigned, dumb question, sorry.
The pre-leaf side, which maximizes the "negated" eval scores, can only achieve a heuristic draw then.
Daniel Shawul wrote: Please don't blame me for carrying out these silly tests, as this is exactly what he suggested originally, which he confirmed just in his reply to your post! The +ve/-ve for black/white, or returning random numbers from a symmetric interval about 0, is something which came up later.
No way to blame you for testing! I think this random eval topic is quite interesting and quite complicated (for some of us), and it helps to understand search, search instability issues, etc. Is alpha-beta still equivalent to minimax here? I guess not. Think of PVS and LMR re-searches and the shape of the tree: "expected" all-nodes with a lot of moves likely become cut-nodes, so the tree is not really near a "minimal tree" in PVS.
Daniel Shawul wrote: I can test other suggestions but first I think it is better to understand the situation and what exactly is said in that paper.
regards,
Daniel
Sure.
Regards,
Gerd
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions for the Stockfish team
Dann Corbit wrote: Here is how it ended up:
Code: Select all
  Program                   Elo    +    -  Games    Score  Av.Op.   Draws
1 Crafty-232ap00         : 3344  133  121     55   90.0 %    2963  12.7 %
2 Crafty-23.2a-skill-mod : 3270  113  105     55   83.6 %    2986  14.5 %
3 Crafty-232ap50         : 3179  102   97     55   75.5 %    2984  12.7 %
4 Crafty-232ap10         : 3100   88   86     55   63.6 %    3003  18.2 %
5 Crafty-232ap01         : 2945   87   88     55   39.1 %    3022  16.4 %
6 Crafty-232am01         : 2889   90   94     55   30.0 %    3036  16.4 %
7 Crafty-232am10         : 2788  113  126     55   18.2 %    3049   3.6 %
8 Crafty-232am50         : 2486    0    0     55    0.0 %    3086   0.0 %
Ralph Stoesser wrote: It seems you have found a strong setting. What was the time control?
Dann Corbit wrote: The strong setting that I found is only available with a customized change that allows skill=0. It makes no real sense that such a setting is stronger. The time control was game in one minute + one second Fischer time increment. The machine was 4x3GHz, no ponder 64 bit crafty.
What was this, just a RR among the different skill settings, or were other opponents in the mix?
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions for the Stockfish team
bob wrote: It is definitely odd. Fortunately, to measure this stuff, I have the perfect facility here. The new thread I posted is the result of a 24 hour run to test 11 different skill settings for 30,000 games each.
jwes wrote: Would it make any difference if you used the hash key to create the random score rather than random()? That would result in positions always having the same (random) evaluation.
I'd be concerned about the randomness in that regard, although I can test it later to see. The danger is that with real random numbers, they change significantly, but using the hash signature, it is possible for several positions to share low-order bits, which could be a problem.
I'll run it once the current attempt is finished to see...
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: Questions for the Stockfish team
bob wrote: The danger is that with real random numbers, they change significantly, but using the hash signature, it is possible for several positions to share low-order bits, which could be a problem.
Why would the low-order bits in the hash not be essentially random?
-
- Posts: 12606
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Questions for the Stockfish team
Dann Corbit wrote: Here is how it ended up:
Code: Select all
  Program                   Elo    +    -  Games    Score  Av.Op.   Draws
1 Crafty-232ap00         : 3344  133  121     55   90.0 %    2963  12.7 %
2 Crafty-23.2a-skill-mod : 3270  113  105     55   83.6 %    2986  14.5 %
3 Crafty-232ap50         : 3179  102   97     55   75.5 %    2984  12.7 %
4 Crafty-232ap10         : 3100   88   86     55   63.6 %    3003  18.2 %
5 Crafty-232ap01         : 2945   87   88     55   39.1 %    3022  16.4 %
6 Crafty-232am01         : 2889   90   94     55   30.0 %    3036  16.4 %
7 Crafty-232am10         : 2788  113  126     55   18.2 %    3049   3.6 %
8 Crafty-232am50         : 2486    0    0     55    0.0 %    3086   0.0 %
Ralph Stoesser wrote: It seems you have found a strong setting. What was the time control?
Dann Corbit wrote: The strong setting that I found is only available with a customized change that allows skill=0. It makes no real sense that such a setting is stronger. The time control was game in one minute + one second Fischer time increment. The machine was 4x3GHz, no ponder 64 bit crafty.
bob wrote: What was this, just a RR among the different skill settings, or were other opponents in the mix?
Simple round robin of the above programs. Everything makes sense except the zero skill entry.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions for the Stockfish team
Daniel Shawul wrote: What does it take to convince you it is bad??? Here are some games:
scorpio_random : return (hash_key % 1000) at beginning of eval
scorpio_regular : the regular eval
RESULT: Disastrous 67 - 0. Every game tested my patience, and indeed I couldn't take the awfulness after 67 games and had to stop it...
Crap games available for download here http://sites.google.com/site/dshawul/te ... ects=0&d=1
If you insist I can do a million games or whatever it takes to convince you...
What is the rating for scorpio (normal)?
Then a math lesson. If A is 200 points weaker, it should win 1 of every 4 games played. If it is 400 points weaker, it should win 1 of every 16 games played. If it is 600 points weaker, it should only win 1 of every 64 games played. If it is 800 points weaker, 1 of every 256. See a pattern? So the random version could be only 800 weaker and not win a single game out of 67, and that would be perfectly normal... And even 0 out of 67 would not be that unusual for an opponent 600 points weaker.
So yes, the test is flawed. Badly. As always, _way_ too few games to measure this.
Also, I am not sure that using the rightmost 10 bits of the hash signature is a good idea; nothing says those low-order bits are that random, since this is an XOR of a bunch of random numbers. May be OK, may not be.
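One way to probe the low-order-bit question is to build a Zobrist-style key (an XOR of random 64-bit numbers, one per piece on a square) and count how often each value of the low 10 bits turns up. The sketch below is only an illustration under assumptions: the table layout, the seeds, and the function names are hypothetical, and the positions are random piece placements rather than positions reached in a real search, where neighbouring positions differ by only a few pieces.

```python
import random
from collections import Counter

PIECES = 12    # assumption: 6 piece types x 2 colours
SQUARES = 64


def make_zobrist_table(seed: int = 2010) -> list[list[int]]:
    """One random 64-bit number per (piece, square) pair."""
    rng = random.Random(seed)
    return [[rng.getrandbits(64) for _ in range(SQUARES)] for _ in range(PIECES)]


def random_position_key(table, rng, n_pieces: int = 16) -> int:
    """XOR together the table entries for n_pieces pieces on distinct squares."""
    key = 0
    for sq in rng.sample(range(SQUARES), n_pieces):
        key ^= table[rng.randrange(PIECES)][sq]
    return key


def low_bit_histogram(n_positions: int = 100_000, bits: int = 10, seed: int = 1) -> Counter:
    """Count how often each value of the lowest `bits` bits of the key occurs."""
    table = make_zobrist_table()
    rng = random.Random(seed)
    mask = (1 << bits) - 1
    return Counter(random_position_key(table, rng) & mask for _ in range(n_positions))


if __name__ == "__main__":
    hist = low_bit_histogram()
    print(f"buckets hit: {len(hist)} of 1024")
    print(f"min/max count: {min(hist.values())}/{max(hist.values())} "
          f"(uniform would give about {100_000 / 1024:.0f} each)")
```

Note that `hash_key % 1000` is not exactly "the rightmost 10 bits", so the histogram here is a proxy. A flat histogram on random placements would not by itself rule out correlations between sibling positions in a tree.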
The only issue I am trying to wrap my head around is whether I should be negating the numbers in positions where it is black to move, which may (or may not) be wrong, and which may (or may not) somehow make the thing play better. Studying this at the moment...
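The two conventions in question can be written down side by side. This is a minimal sketch, assuming a hash-derived score and a symmetric interval about 0 as discussed earlier in the thread; the function names and the `spread` parameter are hypothetical and are not taken from Crafty or Scorpio. It shows only what "negating for black to move" means, not whether it makes the engine play better.

```python
def random_eval_white_pov(hash_key: int, spread: int = 1000) -> int:
    """Deterministic pseudo-random score in [-spread//2, spread//2), from White's view."""
    return hash_key % spread - spread // 2


def random_eval_side_to_move(hash_key: int, white_to_move: bool, spread: int = 1000) -> int:
    """The same score converted to the side-to-move view that a negamax search expects."""
    score = random_eval_white_pov(hash_key, spread)
    return score if white_to_move else -score


if __name__ == "__main__":
    key = 0x9D39247E33776D41  # arbitrary example key, not a real position
    print(random_eval_white_pov(key))
    print(random_eval_side_to_move(key, white_to_move=False))
```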
But before you jump to a conclusion about something, do a test that actually supports your conclusion. Why not just stop after 4 games if you are so convinced???
In the case of Crafty, which is probably 2800 on a single CPU with the hardware I am using, I need to drop it by five 200-point increments, which means I would only expect to win maybe one game in every 1,000. Need a _bunch_ of games to measure that. In my case, I want to go _way_ below 1800. To get to 1200 I would expect to win one game out of 64,000 if I didn't blow the math mentally.
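The arithmetic in the post above can be reproduced with a few lines. The sketch below encodes the post's rule of thumb (a factor of 4 per 200 points) and, for comparison only, the standard logistic Elo expected score, which counts draws as half a point and is therefore not the same quantity as a win rate. The function names are hypothetical, and games are treated as independent.

```python
def rule_of_thumb_odds(elo_gap: float) -> float:
    """Games per expected win under the post's rule: a factor of 4 per 200 Elo."""
    return 4.0 ** (elo_gap / 200.0)


def logistic_expected_score(elo_gap: float) -> float:
    """Expected score of the weaker side under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (elo_gap / 400.0))


def prob_no_wins(win_prob: float, games: int) -> float:
    """Chance of zero wins in `games` independent games."""
    return (1.0 - win_prob) ** games


if __name__ == "__main__":
    for gap in (200, 400, 600, 800, 1000, 1600):
        odds = rule_of_thumb_odds(gap)
        print(f"gap {gap:4d}: rule of thumb 1 in {odds:,.0f}, "
              f"logistic score {logistic_expected_score(gap):.4f}, "
              f"P(0 wins in 67) = {prob_no_wins(1.0 / odds, 67):.3f}")
```

Under the rule of thumb, a side 600 points weaker goes 0 for 67 roughly 35% of the time, and a side 800 points weaker roughly 77% of the time, which is the point being made about the 67-game sample. The rule gives 1,024 for a 1000-point gap and 65,536 for a 1600-point gap, matching the "one in 1,000" and "one in 64,000" estimates. The logistic score is about 0.24 at 200 points, close to "1 in 4", but it falls off more slowly than the rule at larger gaps.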