lkaufman wrote:bob wrote:Tord Romstad wrote:.
No. with skill 1, what you get for a score from evaluate is this:
score = 0.01 * real_evaluation + .99 * random();
where random() returns a value between 0 and 100 (0 to 1 pawn).
With skill 1, the material/positional score is almost nothing, the remainder of the score is a pure random number.
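To make the blend concrete, here is a minimal sketch of the formula above. It is not Crafty's actual code: the function name `blended_score` and the `skill_weight` and `rng` parameters are made up for illustration, and scores are assumed to be in centipawns.

```python
import random

def blended_score(real_evaluation, skill_weight=0.01, rng=random):
    """Mix the real evaluation (centipawns) with uniform noise in [0, 100].

    Hypothetical helper: skill_weight=0.01 mirrors the skill 1 formula
    quoted above; the names are assumptions, not Crafty identifiers.
    """
    noise = rng.uniform(0, 100)  # 0 to 1 pawn of pure randomness
    return skill_weight * real_evaluation + (1.0 - skill_weight) * noise

if __name__ == "__main__":
    # A position that is really a knight down (-300) versus a level one (0).
    print(blended_score(-300))
    print(blended_score(0))
```

With the default weight, a real evaluation of -300 contributes only -3 to the result, so the returned score always lies between -3 and 96.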
1 0 is not the best test. That restricts the depth enough that random eval fails, but use something like 1+1 or 2+2 and watch what happens. Suddenly it won't hang material, and plays decent chess...
Two points: The game Tord cited was 2'+1", so he already followed your advice in advance. So there is some discrepancy between his findings and yours. The only obvious culprit is the 0.01 weight on real eval; it's not much, but maybe it biases things enough in favor of good moves to make the difference between 800 and 1800. Hard to believe, but you should play a version with zero weight for real eval against some weak program with a rating of maybe 1600 to see what happens. Do you have any other explanation for the huge difference between Crafty random and Stockfish random? With eval not an issue and with LMR and such turned off in Crafty, there can hardly be an explanation in the difference between the programs.
Somewhere in there I saw a 1+0. That was what I was basing my opinion on. Don't recall whether it was in the PGN or what, will try to look back to see.
I just ran a test with pure random eval between 0 and 100 (0 and 1 pawn). No difference at all.
When you think about it, if you play NxN and the program should recapture, you get two choices:
-300 + random(), which will produce a score of about -3 + random() at skill level 1, or 0 + random() when you recapture rather than leave the piece hanging. You really think that 3 matters? In any case, I ran with pure random and the thing plays pretty well. Not well enough that it will beat me, but it doesn't hang material (maybe an occasional pawn, rarely, but that's about it). It never seems to miss a recapture or leave material hanging, most likely because of the random (Beal) effect.
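A quick back-of-the-envelope check of how little that 3 matters in a single static comparison. This is only a sketch under stated assumptions: the two noise draws are treated as independent and uniform, the function name and parameters are invented for illustration, and it says nothing about what happens once a search is layered on top (that is where the Beal effect comes in).

```python
def prob_hanging_line_scores_higher(material_deficit=300, weight=0.01, noise_max=100):
    """Chance that the 'leave it hanging' score beats the 'recapture' score.

    Hypothetical model (not from any engine source):
      hang      = weight * (-material_deficit) + (1 - weight) * U1
      recapture = (1 - weight) * U2
    with U1, U2 independent and uniform on [0, noise_max].
    Assumes material_deficit >= 0 and 0 <= weight < 1.
    """
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    # hang > recapture  <=>  U1 - U2 > d
    d = weight * material_deficit / (1.0 - weight)
    if d >= noise_max:
        return 0.0
    # P(U1 - U2 > d) for two independent uniforms on [0, noise_max]
    return (noise_max - d) ** 2 / (2.0 * noise_max ** 2)

if __name__ == "__main__":
    print(prob_hanging_line_scores_higher())
```

Under these assumptions the hanging line still gets the higher static score roughly 47% of the time, against 50% with zero weight on the real eval, so the 0.01 term by itself is a very small nudge.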
I don't see any difference to explain. My results are not based on my testing. We had a thread here a few weeks back about someone complaining that way back, skill 1 was around 800, but it had slowly moved up to almost 1800. I looked to see if I had broken the code, and anyone can look at the 23.2 source code to see what skill 1 will do. I simply verified this by hand. I don't have an easy way to test down on that end of the scale; the lowest-rated opponent I have on my cluster is Glaurung 1.something, and it is in the 2500 range. Testing something you want to be under 1,000 simply can't be done with that as the worst opponent.
Believe me, this is real, not imagined. I tried it myself when it first came up. I had _never_ tested with skill level 1 before, and had assumed that this was going to play beyond ugly. Amazingly it played reasonable-looking chess for the most part.
I now have a solution, but I still have the problem of testing on the low end. I now knock the NPS down so that by the time I get to skill 1, the thing is running about 1K nodes per second, which is on down there. Only question is what is the Elo? We are about ready to release this and let those guys test it and give me feedback...
I did go back and look, and I might well have looked at the 1 0 result and thought that it went with the time control entry on the next line... My laptop has a small font and my eyes are 62 years old now.
