This is not "my estimate". This happened in a general forum post where someone who was maintaining such a "crippled list" clearly had Crafty way stronger than it was supposed to be. And there were no 2000+ programs in this list, because of the error that introduces...
Gerd Isenberg wrote: Hi Daniel,
Daniel Shawul wrote: Gerd,
My objection was to a completely random evaluation, which I tried to outline as much as I can here http://talkchess.com/forum/viewtopic.ph ... 66&t=35455. Now we have come to an apparent consensus on how the score of one side should be negated for the other side for minimax to work... This was originally absent from his reply to me, but he somehow expected me to understand it even after giving me pseudocode for how to do it.
I gave up the point that it does some weird kind of mobility evaluation the minute Marco posted it. But I pointed out how bad that eval is and how one-sided it is, completely disregarding the perfect-information game assumption. The supposed engine evaluates like 'poker', as if it can't see what the opponent has to offer. It just evaluates its own mobility and goes on... See points c & d of my post in the link above.
Did they (Don Beal) say an 1800 Elo engine can be constructed this way? Even he (Bob) himself didn't believe it when people first told him it plays like 1800. He thought it played like 800 (he said so in this thread, of course). I can't say what Crafty does or does not do; that is why I am sticking to what he says about the effect with random eval, and I am definitely not getting an 1800 Elo engine.
Daniel
yes, you cannot weigh one's words in this thread, and there are tons of misunderstandings due to implicit knowledge and assumptions about the point, selective reading, impatience, etc.
Apparently, if the random range per side is symmetric around zero, you may use the same "evaluation" function for white and black, of course no matter whether you use explicit min versus max or negamax. I guess if both white-to-move and black-to-move leaf nodes always have "random" winning scores from their negamax perspective, let's say in the 1000..2000 range (mate > 15000), things are obviously different, especially with all the usual ID, TT, extension stuff etc. I guess that the latter has some more search "instability problems" and I expect it to be weaker than the symmetric one.
I don't have that ICCA Journal handy, and I'm not sure I ever read Beal's article. I remember some discussions. So no idea on any Elo figures. Maybe Bob's 1800 claim is a bit overestimated due to a huge error from some "random" wins or draws against otherwise > 2800 engines, which don't apply any opponent-model approach. On the other hand, such random eval engines still have mate scores.
Gerd
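The difference between a symmetric and a one-sided random range under negamax can be sketched in a few lines of Python. This is an illustration only, not code from Crafty or any other engine in the thread: the function names and the two ranges (-50..50 and 0..99) are assumptions chosen for the example.

```python
import random

def random_eval_white_pov(rng, lo, hi):
    """Hypothetical leaf 'evaluation': a uniform random integer in [lo, hi],
    expressed from White's point of view."""
    return rng.randint(lo, hi)

def negamax_leaf(score_white_pov, wtm):
    """Convert a White-relative score to the side-to-move perspective
    that a negamax search expects at a leaf."""
    return score_white_pov if wtm else -score_white_pov

def sample_leaf_scores(lo, hi, wtm, n=10000, seed=1):
    """Draw n leaf scores as seen by the side to move."""
    rng = random.Random(seed)
    return [negamax_leaf(random_eval_white_pov(rng, lo, hi), wtm) for _ in range(n)]

# Symmetric range: both sides see scores spread over the same interval.
sym_white = sample_leaf_scores(-50, 50, wtm=True)
sym_black = sample_leaf_scores(-50, 50, wtm=False)

# One-sided range: after negation, Black to move only ever sees scores <= 0.
one_white = sample_leaf_scores(0, 99, wtm=True)
one_black = sample_leaf_scores(0, 99, wtm=False)

print(min(sym_black), max(sym_black))
print(min(one_black), max(one_black))
```

With the symmetric range the negation changes nothing about the distribution each side sees; with the one-sided range the two sides see mirror-image intervals, which is the case discussed above as potentially behaving differently.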
Questions for the Stockfish team
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions for the Stockfish team
-
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Questions for the Stockfish team
Position-independent random would give the same position, whether re-searched or reached by transposition, different scores rather than the same score.
AlvaroBegue wrote: Why would the low-order bits in the hash not be essentially random?
bob wrote: The danger is that with real random numbers, they change significantly, but using the hash signature, it is possible for several positions to share low-order bits, which could be a problem.
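A minimal sketch of the distinction, assuming a toy Zobrist-style hash: a score derived from the hash signature is the same every time a position is visited, while a fresh random draw is not. The table size, the feature indices and all function names here are hypothetical; the `key % 1000` mapping mirrors the `hash_key % 1000` variant mentioned elsewhere in the thread.

```python
import random

def make_zobrist_table(num_features, seed=2010):
    """Hypothetical Zobrist table: one 64-bit random number per
    (piece, square)-style feature index."""
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(num_features)]

def hash_position(features, table):
    """XOR together the table entries of the features present in the position."""
    key = 0
    for f in features:
        key ^= table[f]
    return key

def eval_from_hash(key, score_range=1000):
    """Position-dependent 'random' score: the same key always gives the same score."""
    return key % score_range

def eval_fresh_random(rng, score_range=1000):
    """Position-independent random score: a new draw on every call."""
    return rng.randrange(score_range)

table = make_zobrist_table(768)

# The same set of features reached in a different order (a transposition)
# hashes to the same key, so the hash-derived score is stable.
key_a = hash_position([3, 70, 500], table)
key_b = hash_position([500, 3, 70], table)
print(eval_from_hash(key_a), eval_from_hash(key_b))

# Fresh draws for the "same position" generally differ from visit to visit.
rng = random.Random(42)
fresh_draws = [eval_fresh_random(rng) for _ in range(20)]
print(fresh_draws[:5])
```

Whether the low-order bits of a real engine's signature are uniform enough for this purpose is exactly the open question in the exchange above; the sketch only shows the stability property, not the distribution.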
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions for the Stockfish team
However, that ordering is done by q-searches, so it is ordered based on random results, which is _really_ bad. If there are no captures, each root move is scored (by q-search) as a pure random number.
jwes wrote: I just had another idea about why crafty plays too well. Since the search will always choose the first move that returns the minimax score, the move that your move ordering picks first is very likely to be chosen.
-
- Posts: 408
- Joined: Sat Mar 06, 2010 9:28 am
Re: Questions for the Stockfish team
Dann Corbit wrote: The strong setting that I found is only available with a customized change that allows skill=0. It makes no real sense that such a setting is stronger.
Ralph Stoesser wrote: It seems you have found a strong setting. What was the time control?
Dann Corbit wrote: Here is how it ended up:
   Program                  Elo    +    -  Games   Score  Av.Op.   Draws
 1 Crafty-232ap00         : 3344  133  121    55  90.0 %   2963  12.7 %
 2 Crafty-23.2a-skill-mod : 3270  113  105    55  83.6 %   2986  14.5 %
 3 Crafty-232ap50         : 3179  102   97    55  75.5 %   2984  12.7 %
 4 Crafty-232ap10         : 3100   88   86    55  63.6 %   3003  18.2 %
 5 Crafty-232ap01         : 2945   87   88    55  39.1 %   3022  16.4 %
 6 Crafty-232am01         : 2889   90   94    55  30.0 %   3036  16.4 %
 7 Crafty-232am10         : 2788  113  126    55  18.2 %   3049   3.6 %
 8 Crafty-232am50         : 2486    0    0    55   0.0 %   3086   0.0 %
The time control was game in one minute + one second Fischer time increment.
The machine was 4x3GHz, no ponder 64 bit crafty.
It really makes no sense. But if we were in Wonderland, a better PRNG would have played even stronger at skill=0.
Are there any chances you could have mixed up the players in the tournament? If not, then it's time for a little debug session.
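As a rough cross-check of the table, the Score column can be compared with what the standard logistic Elo formula predicts from the Elo and Av.Op. columns. The post does not name the rating tool, so the formula below is an assumption rather than a description of how the list was computed; `expected_score` is a hypothetical helper.

```python
def expected_score(rating, opponent_rating):
    """Expected score (0..1) under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-(rating - opponent_rating) / 400.0))

# (program, Elo, average opponent Elo, reported score in percent), copied from the table
rows = [
    ("Crafty-232ap00", 3344, 2963, 90.0),
    ("Crafty-232ap50", 3179, 2984, 75.5),
    ("Crafty-232ap01", 2945, 3022, 39.1),
]

for name, elo, avg_opp, reported in rows:
    predicted = 100.0 * expected_score(elo, avg_opp)
    print(f"{name}: predicted {predicted:.1f} %, reported {reported:.1f} %")
```

For these rows the predicted and reported percentages agree to within rounding, which suggests the ratings are internally consistent with the scores. It says nothing about whether the players were mixed up, since relabelled results would produce an equally consistent table.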
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions for the Stockfish team
In Crafty-23.2, the range of numbers is 0 <= N <= 99, so between 0.00 and 0.99. The only thing I do that needs some thought is that these numbers are negated if black is on move. I have not yet concluded exactly what the effect of that is.
Gerd Isenberg wrote: Not everybody likes to download crafty sources to inspect your code on the fly. It is much more appropriate and convenient for the readers if you explain it with patience
bob wrote: I said, quite clearly, that I have tried _all_ of those things, if you read. The original discussion was about Crafty's "skill 1" performance. Just look at 23.2, search for SKILL in evaluate.c. It is not _that_ complicated to understand what I do. skill 100 simply uses 100% of normal eval. Which is negated at the return point in evaluate.c if it is not WTM since the eval is based on +=good for white.
Daniel Shawul wrote: You said random evaluation at first, and then you started bringing
order first by 0.01 * real eval which I strenuously objected to,
then you said eval of white = -eval of black which further breaks the random nature of the eval, period.
I will not try to convince anyone further. Anyone interested to know my position can read all the issues
I raised here http://talkchess.com/forum/viewtopic.ph ... 66&t=35455 with the perspective of random eval and draw their own conclusion.
It really doesn't help if you post voluminous game results with a different setup than what was discussed.
This is basically a strawman argument from you which neglects the complete random evaluation criteria
you originally proposed.
They say insanity is doing the same thing over and over again and expecting different results.
I say it is expecting a consistent miracle from a random event.
for skill 1, you get:
score = .01 * score + .99 * random()
then it does the normal
return (wtm) ? score : - score;
That has been there since the skill command was added. I assume that if you jump into a discussion, you at least know what it is about. Which would include this information. I then pointed out that I had tried a pure random number with no positional component at all. just:
score = random();
with the usual return.
If you use negamax, then yes, you have to negate the score because black wants the biggest score, but that has to be the opposite of what white wants. So, the normal "colorful" return is used.
And that does give a pseudo-mobility that works just fine. Too fine, in fact...
So before we go on, how about looking at the statement following the "SKILL" token in evaluate.c, so that we are talking about the same thing. I am _always_ talking about what I do in Crafty, not what I imagine others are doing in their programs...
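The blend described above can be sketched as follows. This is not Crafty's code: `blended_eval` and `evaluate` are hypothetical names, and the linear weighting by `skill / 100` as well as the 0..99 centipawn random range are assumptions inferred from the statements in this thread ("skill 100 simply uses 100% of normal eval", random values between 0.00 and 0.99).

```python
import random

def blended_eval(real_score, skill, rng, random_range=100):
    """Skill-weighted evaluation from White's point of view (centipawns).

    skill=100 returns the normal evaluation unchanged; skill=1 keeps 1% of it
    and takes 99% from a uniform random value in [0, random_range).
    """
    weight = skill / 100.0
    return weight * real_score + (1.0 - weight) * rng.randrange(random_range)

def evaluate(real_score, skill, wtm, rng):
    """Apply the usual sign flip so the score is relative to the side to move."""
    score = blended_eval(real_score, skill, rng)
    return score if wtm else -score

rng = random.Random(0)
print(blended_eval(250, 100, rng))  # full skill: the real score passes through
print(blended_eval(250, 1, rng))    # skill 1: dominated by the random term
```

Setting the weight to zero gives the pure-random case (`score = random();` with the usual return) that the later experiments in the thread refer to.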
What is the value range of random? Is it symmetrical around zero or not?
How would these three more or less random evals play?
negamaxEval ::= rand() > 0.5 ? MATE_SCORE/2 : -MATE_SCORE/2
negamaxEval ::= wtm ? MATE_SCORE/2 : -MATE_SCORE/2;
negamaxEval ::= MATE_SCORE/2; // both are winning at a leaf
I think all 3 of the above suggestions will negate the "Beal effect" because they are not random, and you either get A or B, but nothing in between. The Beal effect depends on a range of numbers assigned with equal probability, so that the more moves there are for one side, the more opportunities that side has to get a bigger number...
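The one-ply intuition behind this can be checked numerically: the expected maximum of n independent uniform(0, 1) numbers is n / (n + 1), so a side with more moves tends to find a bigger number. The sketch below is only that single-ply illustration under the uniform assumption stated above; a real search alternates max and min over several plies, and the function names are hypothetical.

```python
import random

def expected_best_of(n_moves, trials=20000, seed=7):
    """Monte Carlo estimate of E[max of n_moves independent uniform(0, 1) leaf scores]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.random() for _ in range(n_moves))
    return total / trials

def exact_best_of(n_moves):
    """Closed form for the same expectation: n / (n + 1)."""
    return n_moves / (n_moves + 1)

for n in (1, 5, 20, 40):
    print(n, round(expected_best_of(n), 3), round(exact_best_of(n), 3))
```

The estimate rises from 0.5 with one move toward 1.0 as the move count grows, which is the sense in which a uniform random evaluation acts as a crude mobility term.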
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions for the Stockfish team
In my tests (so far) skill=1 and skill=0 are scoring the same. And worse than skill=10, which is worse than skill=20, etc. I'd expect the skill = -N versions to play worse because rather than doing null-move searches that reduce the depth to save time, the null-move searches are actually deeper than the normal search which will slow things down significantly and make it play weaker, which is what happened. It almost looks like the skill=0 version is using a normal eval somehow...
Ralph Stoesser wrote: It really makes no sense. But if we were in Wonderland, a better PRNG would have played even stronger at skill=0.
Are there any chances you could have mixed up the players in the tournament? If not, then it's time for a little debug session.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Questions for the Stockfish team
Here is another 300-0, this time with real random() between 0 and 1000, NOT (hash_key % 1000), as I don't want to let you make that an issue...
crap games 2 http://sites.google.com/site/dshawul/te ... ects=0&d=1
Expect 30000 games in a couple of hours. All I need is more opponents and more positions...
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions for the Stockfish team
You are at position P, you make a move and then either generate a random number or use the lower N bits. Then you make another move. Are you certain that the low-order bits of the random number you use to update the board alter the lower N bits, as opposed to just altering a significant number of other bits, which is all we want for our Zobrist hashing? This "Beal effect" depends on uniformly distributed random numbers to come up with this pseudo-mobility stuff... I am not certain the hash signature will offer this. It might, and I am going to test it, but I am not sure enough to say good or bad at the moment. It would certainly be different, however.
AlvaroBegue wrote: Why would the low-order bits in the hash not be essentially random?
bob wrote: The danger is that with real random numbers, they change significantly, but using the hash signature, it is possible for several positions to share low-order bits, which could be a problem.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Questions for the Stockfish team
This is actually a method I came up with in this thread to avoid breaking this same logic. I also suggested using +ve/-ve for white/black so as not to break minimax. But the result remains the same: 300-0 to the non-random version.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions for the Stockfish team
So many questions, so few answers.
Tord Romstad wrote: As Larry pointed out, it wasn't 1 0, but 2+1. At any rate, here's what happens at 5+2:
bob wrote: 1 0 is not the best test. That restricts the depth enough that random eval fails, but use something like 1+1 or 2+2 and watch what happens.
For those who can't be bothered to play through the game (admittedly, it isn't among the greatest games ever), white has blundered three pieces by move 20, and continues to make new blunders every few moves throughout the game. Not very different from the 2+1 game. The program is clearly very much stronger than a random mover, but also far closer to 800 than to 1800. In fact, I'm fairly sure it is also closer to 0 than to 800, but it is difficult to judge at such extremely low levels of play.
[Event "Test Game"]
[Site "Oslo"]
[Date "2010.07.21"]
[Round "-"]
[White "Stockfish 100720 64bit"]
[Black "tord"]
[Result "0-1"]
[TimeControl "5+2"]

1. e4 e6 2. d4 d5 3. Nd2 Nf6 4. e5 Nfd7 5. f4 c5 6. c3 Nc6 7. Ndf3 Qb6 8. g3 cxd4
9. cxd4 Bb4+ 10. Kf2 g5 11. Nxg5 Qxd4+ 12. Qxd4 Nxd4 13. Be3 Nc2 14. Rc1 Nxe3
15. Bb5 Nf5 16. N1f3 Ba5 17. Rhd1 a6 18. a3 axb5 19. Nd4 Bb6 20. Kg2 Nxd4
21. Rc3 Nf5 22. Rdc1 Ke7 23. Rb3 Be3 24. Rc7 Bb6 25. Rc1 Ra5 26. Rcc3 Nc5
27. Rxc5 Bxc5 28. Rc3 Bb6 29. Kh3 Bd7 30. Rd3 Nd4 31. f5 Nxf5 32. Kg2 Bd4
33. b4 Ra6 34. Nf3 Bb2 35. a4 bxa4 36. Rd1 a3 37. Ne1 Ne3+ 38. Kh3 Nxd1
39. Kh4 a2 40. Nc2 Ba4 41. Ne1 a1=Q 42. Nd3 {White resigns} 0-1
I'm sure that's the case for Crafty, but for whatever reason, Stockfish is very different in this respect. With a random eval, it plays far weaker than any human beginner.
Suddenly it won't hang material, and plays decent chess...
You said that null move and other types of pruning was disabled in Crafty at the lowest skill settings. In my test games, Stockfish used its usual search, with all tricks enabled. Could it be that these advanced search tricks perform much worse than plain alpha-beta with a random eval?
I do not know. I only knew that I wanted to throttle the depth so that it would not be finding mates in 20 when supposedly playing at the 1000 level. Whether that actually helps or hurts is unknown. I'll add this to my list of things to try, that is normal crafty, with random eval, but everything else intact...