Questions for the Stockfish team

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Dann Corbit
Posts: 12550
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Questions for the Stockfish team

Post by Dann Corbit »

bob wrote:
Ralph Stoesser wrote:
Dann Corbit wrote:
Ralph Stoesser wrote:
Dann Corbit wrote: Here is how it ended up:

Code: Select all

   Program                  Elo    +   -   Games   Score   Av.Op.  Draws
 1 Crafty-232ap00         : 3344  133 121    55    90.0 %   2963   12.7 %
 2 Crafty-23.2a-skill-mod : 3270  113 105    55    83.6 %   2986   14.5 %
 3 Crafty-232ap50         : 3179  102  97    55    75.5 %   2984   12.7 %
 4 Crafty-232ap10         : 3100   88  86    55    63.6 %   3003   18.2 %
 5 Crafty-232ap01         : 2945   87  88    55    39.1 %   3022   16.4 %
 6 Crafty-232am01         : 2889   90  94    55    30.0 %   3036   16.4 %
 7 Crafty-232am10         : 2788  113 126    55    18.2 %   3049    3.6 %
 8 Crafty-232am50         : 2486    0   0    55     0.0 %   3086    0.0 %
It seems you have found a strong setting. What was the time control?
The strong setting that I found is only available with a customized change that allows skill=0. It makes no real sense that such a setting is stronger.

The time control was game in one minute + one second Fischer time increment.

The machine was 4x3GHz, no ponder, 64-bit Crafty.

It really makes no sense. But if we were in Wonderland, a better PRNG would have played even stronger at skill=0. ;)
Is there any chance you could have mixed up the players in the tournament? If not, then it's time for a little debugging session.
In my tests (so far), skill=1 and skill=0 score the same, and worse than skill=10, which is worse than skill=20, etc. I'd expect the skill=-N versions to play worse because, rather than doing null-move searches that reduce the depth to save time, their null-move searches are actually deeper than the normal search, which slows things down significantly and makes the engine play weaker; that is what happened. It almost looks like the skill=0 version is using the normal eval somehow...
That is what happened. I had made the change to crafty.rc using UltraEdit, but failed to save that particular file to disk. So it is (in fact) skill=100 and not skill=0.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

Daniel Shawul wrote: Here is another 300-0, this time with a real random() between 0 and 1000, NOT (hash_key % 1000), as I don't want to let you make that an issue...
crap games 2 http://sites.google.com/site/dshawul/te ... ects=0&d=1
Expect 30000 games in a couple of hours. All I need is more opponents and more positions...
What is the time control? If you read Beal's paper, this is a function of search depth as well. Game in 1 second is not going to do it; that is how I am making the current version play very poorly, using a random eval and a very slow NPS to drop the depth back.

This is an example where time is important. The list that originally raised this issue was probably using something like 30+ minute games, which makes this even messier.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Questions for the Stockfish team

Post by Daniel Shawul »

Oh, so now you are afraid of the result, so you start complaining that it should be 30 minutes per game, so that it would take decades. People are not stupid and will see the light at the end of the tunnel!

Just specify your conditions here so that we will get this crap done and dusted once and for all.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

Dann Corbit wrote:
bob wrote:
Ralph Stoesser wrote:
Dann Corbit wrote:
Ralph Stoesser wrote:
Dann Corbit wrote: Here is how it ended up:

Code: Select all

   Program                  Elo    +   -   Games   Score   Av.Op.  Draws
 1 Crafty-232ap00         : 3344  133 121    55    90.0 %   2963   12.7 %
 2 Crafty-23.2a-skill-mod : 3270  113 105    55    83.6 %   2986   14.5 %
 3 Crafty-232ap50         : 3179  102  97    55    75.5 %   2984   12.7 %
 4 Crafty-232ap10         : 3100   88  86    55    63.6 %   3003   18.2 %
 5 Crafty-232ap01         : 2945   87  88    55    39.1 %   3022   16.4 %
 6 Crafty-232am01         : 2889   90  94    55    30.0 %   3036   16.4 %
 7 Crafty-232am10         : 2788  113 126    55    18.2 %   3049    3.6 %
 8 Crafty-232am50         : 2486    0   0    55     0.0 %   3086    0.0 %
It seems you have found a strong setting. What was the time control?
The strong setting that I found is only available with a customized change that allows skill=0. It makes no real sense that such a setting is stronger.

The time control was game in one minute + one second Fischer time increment.

The machine was 4x3GHz, no ponder, 64-bit Crafty.

It really makes no sense. But if we were in Wonderland, a better PRNG would have played even stronger at skill=0. ;)
Is there any chance you could have mixed up the players in the tournament? If not, then it's time for a little debugging session.
In my tests (so far), skill=1 and skill=0 score the same, and worse than skill=10, which is worse than skill=20, etc. I'd expect the skill=-N versions to play worse because, rather than doing null-move searches that reduce the depth to save time, their null-move searches are actually deeper than the normal search, which slows things down significantly and makes the engine play weaker; that is what happened. It almost looks like the skill=0 version is using the normal eval somehow...
That is what happened. I had made the change to crafty.rc using UltraEdit, but failed to save that particular file to disk. So it is (in fact) skill=100 and not skill=0.
And sanity returns to the world. :)

Been there, done that, got the T-shirt too. :)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

Daniel Shawul wrote: This is actually a method I came up with in this thread to avoid breaking this same logic. I also suggested using positive/negative scores for White/Black so as not to break minimax. But the result remains the same: 300-0 to the non-random player...
OK, let's talk about this "breaking minimax." This idea is based on the fact that at any node in the tree, the more moves I have, the better the score I will get, because the scores are completely random. Their values are meaningless as far as chess is concerned. But there is nothing that says scores must be distributed above and below zero. Yet minimax still works just fine. Just start a game with rook odds and test this. You won't be seeing negative scores, yet things work just fine. Or bias your static evaluation by +30,000 which will make _all_ scores > 0. Won't break minimax at all.
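
The claim that a constant bias cannot break minimax can be sketched directly. The toy negamax below is a hypothetical illustration (the tree shape and leaf values are invented, and this is not Crafty's code): it searches a small fixed-depth tree twice, once with raw leaf evals and once with every eval shifted by +30000, and picks the same root move both times.

```python
# Toy negamax over a fixed-depth tree: each root move leads to a
# list of leaf scores (from the side to move at the leaf).
def negamax(node):
    if isinstance(node, int):          # leaf: static eval
        return node
    return max(-negamax(child) for child in node)

def best_root_move(children):
    scores = [-negamax(c) for c in children]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

tree = [[3, 5], [1, 9], [4, 2]]        # three hypothetical root moves
shifted = [[leaf + 30000 for leaf in c] for c in tree]

move_a, score_a = best_root_move(tree)
move_b, score_b = best_root_move(shifted)
assert move_a == move_b                # same move chosen
assert score_b == score_a + 30000      # root score merely shifted
```

At a uniform depth, shifting every static eval by a constant shifts the root score by the same constant and leaves the move ordering untouched.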

I've explained the idea for a 2-ply search. At ply 1, I have N moves to try. I try each, one at a time, and then recursively call search. At ply 2, each ply-1 move leads to a position with a different number of possible moves. For each ply-2 position, I make the moves and get a random number for each. I choose the path with the largest number, then negate that and pass it back to ply 1. So ply 1 will effectively choose the move that gives ply 2 the smallest score (and smallest mobility). You can then repeat the argument one level up to turn this 2-ply search into a 3-ply search.

Minimax works because what is good for one side is bad for the other. All we are using is a simple random distribution, knowing that the more samples we take, the better the chance we get a good number for ourselves, and the worse the position is (mobility-wise) for our opponent. The random number doesn't mean a thing in this experiment. It is just something that drags us toward more mobility for ourselves and less for our opponent. Nothing more, nothing less...
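
The mobility pull described above can be simulated. In this sketch (the move counts of 2 and 10 are invented for illustration), the root chooses between move A, after which the opponent has 2 replies, and move B, after which the opponent has 10; each reply gets a uniform random eval and the opponent takes the max. Because the max of 10 uniform draws usually exceeds the max of 2, the root usually prefers A, the move that restricts the opponent's mobility.

```python
import random

def random_2ply_score(opponent_replies, rng):
    # The opponent picks the reply with the largest random eval;
    # negate it to get the root's view of this root move.
    return -max(rng.random() for _ in range(opponent_replies))

rng = random.Random(42)   # fixed seed so the sketch is reproducible
trials = 10_000
prefers_low_mobility = sum(
    random_2ply_score(2, rng) > random_2ply_score(10, rng)
    for _ in range(trials)
)
# Analytically, P(A scores higher than B) = 5/6, so roughly 83% of
# trials prefer the low-mobility move for the opponent.
assert prefers_low_mobility / trials > 0.75
```

No chess knowledge appears anywhere in this score; the preference for restricting the opponent emerges purely from sampling more or fewer random numbers.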
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Questions for the Stockfish team

Post by michiguel »

Daniel Shawul wrote: You can always use your eval to guide your search, by looking at the value before and after a move to make reduction/pruning decisions. That doesn't necessarily require you to write the eval first, except that you may need to evaluate at every internal node. Specific evaluation features like king safety, passed pawns, or other significant positional terms are sometimes used to trigger search extensions and to guide the engine toward or away from such positions. But I honestly can't think of more.

Also, the question is not which part to do first but which gives more benefit in Elo per time spent. All parts of the engine are of course important and should be designed to complement one another. But the importance of search over eval has already been demonstrated by engines like OliThink, Fruit, etc., while I have yet to see the reverse (a very good eval with an average search).
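
One common form of the eval-guided pruning mentioned in the quote is futility pruning: near the leaves, if the static eval plus an optimistic margin still cannot reach alpha, a quiet move is skipped. A minimal sketch of such a condition (the margin value and function name are invented for illustration, not taken from any particular engine):

```python
# Hypothetical futility-pruning condition: at shallow remaining depth,
# skip a quiet move if even an optimistic margin on top of the static
# eval cannot lift the score to alpha.
FUTILITY_MARGIN_PER_PLY = 100  # centipawns; an arbitrary illustrative value

def can_futility_prune(static_eval, alpha, depth, is_quiet, in_check):
    if depth > 2 or in_check or not is_quiet:
        return False               # only prune quiet moves near the leaves
    return static_eval + FUTILITY_MARGIN_PER_PLY * depth <= alpha

# A quiet move at depth 1 while 350 cp below alpha: prune.
assert can_futility_prune(50, 400, 1, True, False)
# Close to alpha: keep searching.
assert not can_futility_prune(350, 400, 1, True, False)
# Never prune at deeper depths, in check, or for tactical moves.
assert not can_futility_prune(50, 400, 3, True, False)
assert not can_futility_prune(50, 400, 1, False, False)
```

The point is only that a cheap eval comparison can gate search decisions; real engines tune the margins and exclusion conditions extensively.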
Gaviota has a very primitive search (only null move + traditional extensions) and it is grossly outsearched by OliThink, but Gaviota is somewhat stronger.

I do not even consider Gaviota's eval to be good at all and there are way too many things to be done yet.

Miguel
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Questions for the Stockfish team

Post by michiguel »

bob wrote:
michiguel wrote:
bob wrote:
Joost Buijs wrote: I do understand that with infinite depth you don't need an eval at all. With a perfect evaluation function, a 1-ply search would be sufficient as well. This is just theoretical.

It is my feeling that everything depends on the quality of the evaluation. When I look at my own engine, it has an evaluation function comparable to a 1600 player, but it plays at the 2850 level just because it is very good at tactics. I'm pretty sure that when I'm able to improve the evaluation function, its Elo will go up.
OK, some background. It turns out that if you replace Crafty's evaluation with a pure random number, it plays well above 2000 Elo. If you also disable all the search extensions, reductions, null move and such, you still can't get it below 1800. There has been a long discussion about this, something I call "the Beal effect" since Don Beal first reported on this particular phenomenon many years ago. So a basic search + random eval gives an 1800 player. Full search + full eval adds 1000 to that. How much comes from each? Unknown. But I have watched many, many Stockfish vs. Crafty games, and the deciding issue does not seem to be evaluation. We seem to get hurt by endgame search depth more than anything...
And that is where most (all?) engines have the biggest holes in evaluation... the endgame!

Miguel
I have never heard _anyone_ say that Crafty's endgame evaluation is poor. In fact, several GM players have said exactly the opposite. Most engines ignore candidate passed pawns and such. We don't.
Sorry, the endgame analysis of any engine has huge holes, including Crafty's.

Miguel

My comment was based on _watching_ games, where we get out-searched and then end up losing something tactically. Not positionally.
BubbaTough
Posts: 1154
Joined: Fri Jun 23, 2006 5:18 am

Re: Questions for the Stockfish team

Post by BubbaTough »

michiguel wrote: Sorry, the endgame analysis of any engine has huge holes, including Crafty's.
I think "huge holes" is a perfect description. There is no question that computers play many endgames better than humans, which leads many to say computers are good at endgames. On the other hand, there are a large number of important endgame types that computers really don't understand, and that strong human players do understand and can evaluate MUCH better (and in some cases play better). This leads many people to say computers are bad at endgames. I would rather ask a GM (or in some cases even a lowly master) than a computer about a pawn-up rook endgame, a potential fortress, an opposite-colored-bishop endgame, and many, many others. These areas are not unsolvable in computer chess; it's just that the return in Elo per hour spent on them is low compared to most other areas, so they receive little attention.

-Sam
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Questions for the Stockfish team

Post by jwes »

michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:
Joost Buijs wrote: I do understand that with infinite depth you don't need an eval at all. With a perfect evaluation function, a 1-ply search would be sufficient as well. This is just theoretical.

It is my feeling that everything depends on the quality of the evaluation. When I look at my own engine, it has an evaluation function comparable to a 1600 player, but it plays at the 2850 level just because it is very good at tactics. I'm pretty sure that when I'm able to improve the evaluation function, its Elo will go up.
OK, some background. It turns out that if you replace Crafty's evaluation with a pure random number, it plays well above 2000 Elo. If you also disable all the search extensions, reductions, null move and such, you still can't get it below 1800. There has been a long discussion about this, something I call "the Beal effect" since Don Beal first reported on this particular phenomenon many years ago. So a basic search + random eval gives an 1800 player. Full search + full eval adds 1000 to that. How much comes from each? Unknown. But I have watched many, many Stockfish vs. Crafty games, and the deciding issue does not seem to be evaluation. We seem to get hurt by endgame search depth more than anything...
And that is where most (all?) engines have the biggest holes in evaluation... the endgame!

Miguel
I have never heard _anyone_ say that Crafty's endgame evaluation is poor. In fact, several GM players have said exactly the opposite. Most engines ignore candidate passed pawns and such. We don't.
Sorry, the endgame analysis of any engine has huge holes, including Crafty's.

Miguel
I agree. I think the greatest edge that human grandmasters still have over computers is that they can evaluate an endgame as won or drawn where a computer might only give it a score of 1.5, and the GM might even sacrifice material to bring about such an endgame.
bob wrote: My comment was based on _watching_ games, where we get out-searched and then end up losing something tactically. Not positionally.
I have been seeing that too. There are endgame test positions that Stockfish solves 100x faster than Crafty.
BubbaTough
Posts: 1154
Joined: Fri Jun 23, 2006 5:18 am

Re: Questions for the Stockfish team

Post by BubbaTough »

jwes wrote:
bob wrote: My comment was based on _watching_ games, where we get out-searched and then end up losing something tactically. Not positionally.
I have been seeing that too. Some endgame test positions stockfish solves 100x faster than crafty.
When it's 100x faster, it's often because the evaluation is giving hints to the search. These two things are not as independent as most people seem to imply.

-Sam