That is what happened. I had made the change to crafty.rc using UltraEdit, but failed to save that particular file to disk. So it is (in fact) skill=100 and not skill=0.

bob wrote: In my tests (so far) skill=1 and skill=0 are scoring the same, and worse than skill=10, which is worse than skill=20, etc. I'd expect the skill = -N versions to play worse because rather than doing null-move searches that reduce the depth to save time, the null-move searches are actually deeper than the normal search, which will slow things down significantly and make it play weaker, which is what happened. It almost looks like the skill=0 version is using a normal eval somehow...

Ralph Stoesser wrote:
Dann Corbit wrote: The strong setting that I found is only available with a customized change that allows skill=0. It makes no real sense that such a setting is stronger.

Ralph Stoesser wrote: It seems you have found a strong setting. What was the time control?

Dann Corbit wrote: Here is how it ended up:

Code: Select all
    Program                   Elo    +    -  Games  Score   Av.Op.  Draws
 1  Crafty-232ap00          : 3344  133  121    55  90.0 %   2963   12.7 %
 2  Crafty-23.2a-skill-mod  : 3270  113  105    55  83.6 %   2986   14.5 %
 3  Crafty-232ap50          : 3179  102   97    55  75.5 %   2984   12.7 %
 4  Crafty-232ap10          : 3100   88   86    55  63.6 %   3003   18.2 %
 5  Crafty-232ap01          : 2945   87   88    55  39.1 %   3022   16.4 %
 6  Crafty-232am01          : 2889   90   94    55  30.0 %   3036   16.4 %
 7  Crafty-232am10          : 2788  113  126    55  18.2 %   3049    3.6 %
 8  Crafty-232am50          : 2486    0    0    55   0.0 %   3086    0.0 %
The time control was game in one minute + one second Fischer time increment.
The machine was 4x3 GHz; no pondering, 64-bit Crafty.
It really makes no sense. But if we were in Wonderland, a better PRNG would have played even stronger at skill=0.
Is there any chance you could have mixed up the players in the tournament? If not, then it's time for a little debugging session.
Questions for the Stockfish team
Moderators: hgm, Rebel, chrisw
- Posts: 12606
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Questions for the Stockfish team
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions for the Stockfish team
What is the time control? If you read Beal's paper, this is a function of search depth as well. Game in 1 second is not going to do it; that is how I am making the current version play very poorly, using random eval and a very slow NPS to drop the depth back.

Daniel Shawul wrote: Here is another 300-0, this time with a real random() between 0 and 1000, NOT (hash_key % 1000), as I don't want to let you make that an issue...
crap games 2 http://sites.google.com/site/dshawul/te ... ects=0&d=1
Expect 30,000 games in a couple of hours. All I need is more opponents and more positions...
This is an example where time is important. The list that originally raised this issue was probably playing something like 30+ minute games, which makes this even messier.
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Questions for the Stockfish team
Oh, so now you are afraid of the result, so you start complaining that it should be 30 minutes per game, so that it would take decades. People are not stupid and will see the light at the end of the tunnel!
Just specify your conditions here so that we will get this crap done and dusted once and for all.
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions for the Stockfish team
And sanity returns to the world.

Dann Corbit wrote: That is what happened. I had made the change to crafty.rc using UltraEdit, but failed to save that particular file to disk. So it is (in fact) skill=100 and not skill=0. [...]
Been there, done that, got the T-shirt too.
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions for the Stockfish team
OK, let's talk about this "breaking minimax." The idea is based on the fact that at any node in the tree, the more moves I have, the better the score I will get, because the scores are completely random. Their values are meaningless as far as chess is concerned. But there is nothing that says scores must be distributed above and below zero; minimax still works just fine. Just start a game with rook odds and test this: you won't be seeing negative scores, yet things work fine. Or bias your static evaluation by +30,000, which will make _all_ scores > 0. That won't break minimax at all.

Daniel Shawul wrote: This is actually a method I came up with in this thread to avoid breaking this same logic. I also suggested using +ve/-ve scores for white/black so as not to break minimax. But the result remains the same: 300-0 to non-random...
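The claim that a constant bias cannot break minimax is easy to check directly. Below is a minimal negamax sketch (illustrative only, not Crafty code) over random trees whose leaves all sit at the same depth: adding +30,000 to every leaf evaluation never changes which root move gets selected.

```python
import random

def negamax(tree, bias):
    # A node is either a leaf eval (int, from the side to move's view)
    # or a list of child subtrees.
    if isinstance(tree, int):
        return tree + bias
    return max(-negamax(child, bias) for child in tree)

def best_move(tree, bias):
    # Index of the root move with the best backed-up score.
    return max(range(len(tree)), key=lambda i: -negamax(tree[i], bias))

rng = random.Random(0)
for _ in range(200):
    # 3-ply tree: 4 root moves, 3 replies each, 3 leaves per reply
    tree = [[[rng.randint(-500, 500) for _ in range(3)]
             for _ in range(3)] for _ in range(4)]
    assert best_move(tree, 0) == best_move(tree, 30000)
print("move choice unchanged by +30000 bias")
```

Because every leaf is at the same depth, the bias propagates into every root move's score as the same constant, so the argmax is untouched; only the absolute scores change.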
I've explained the idea for a 2-ply search. At ply 1 I have N moves to try. I try each, one at a time, and recursively call search. At ply 2, each ply-1 move leads to a position with a different number of possible moves. For each of those ply-2 positions, I make the moves and get a random number, choose the path with the largest number, negate it, and pass it back to ply 1. So ply 1 will effectively choose the move that gives ply 2 the smallest score (and the smallest mobility). You can then pretend this 2-ply search is done below another root search to make it a 3-ply search.

Minimax makes what is good for one side bad for the other. All we are using is a simple random distribution, knowing that the more samples we take, the better the chance we draw a good number for ourselves, and the worse the position is (mobility-wise) for our opponent. The random number doesn't mean a thing in this experiment. It is just something that drags us toward more mobility for ourselves and less for our opponent. Nothing more, nothing less...
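The mobility-dragging effect described above can be reproduced in a few lines. This toy sketch (hypothetical code, not from any engine) runs a 2-ply negamax with purely random leaf scores and compares a root position with 30 legal moves against one with 5; the branching function is a stand-in for mobility. More samples at our own nodes means a better chance of backing up a large random number, the Beal effect.

```python
import random

def negamax(depth, branching, rng):
    # Leaf "evaluation" is a pure random number, as in the experiment.
    if depth == 0:
        return rng.randint(0, 1000)
    best = -10**9
    for _ in range(branching(depth)):       # branching(depth) = toy mobility
        best = max(best, -negamax(depth - 1, branching, rng))
    return best

def average_root_score(root_moves, trials, seed):
    rng = random.Random(seed)
    # The opponent always has 10 replies; only our root mobility varies.
    branching = lambda d: root_moves if d == 2 else 10
    return sum(negamax(2, branching, rng) for _ in range(trials)) / trials

wide = average_root_score(30, trials=2000, seed=1)    # mobile side
narrow = average_root_score(5, trials=2000, seed=1)   # cramped side
print(wide > narrow)
```

Averaged over many trials, the side with more root moves backs up a clearly higher score, even though no individual leaf value means anything.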
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Questions for the Stockfish team
Gaviota has a very primitive search (only null move + traditional extensions) and it is grossly outsearched by Olithink, but Gaviota is somewhat stronger.

Daniel Shawul wrote: You can always use your eval to guide your search, by looking at the value before and after a move to make reduction/pruning decisions. That doesn't necessarily require you to write the eval first, except that you may need to evaluate at every internal node. Specific evaluation features like king safety, passed pawns, or other significant positional terms are sometimes used to trigger search extensions and to guide the engine toward or away from such positions. But I honestly can't think of more. Also, the question is not which part to do first, but which gives more benefit from an Elo-per-time-spent perspective. All parts of the engine are of course important and should be designed to complement one another. But the importance of search over eval has already been demonstrated by engines like Olithink, Fruit, etc., while I have yet to see the reverse (a very good eval with an average search).

I do not even consider Gaviota's eval to be good at all, and there are way too many things to be done yet.
Miguel
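The "value before and after the move" idea mentioned above can be sketched as a tiny reduction rule. Everything here is hypothetical (the margin, the move-count thresholds, the function name); it only illustrates the shape of an eval-guided pruning decision, not any engine's actual rule.

```python
REDUCTION_MARGIN = 50   # hypothetical threshold, in centipawns

def reduction(eval_before, eval_after, move_number, depth):
    """Extra plies to reduce a quiet move by, based on static evals
    taken before and after the move (both from the mover's view)."""
    if depth < 3 or move_number < 4:
        return 0                        # shallow node or early move: full depth
    if eval_after >= eval_before + REDUCTION_MARGIN:
        return 0                        # eval jumped: the move looks active
    return 1 if move_number < 12 else 2  # reduce late, non-improving moves more

print(reduction(20, 15, 8, 6))   # quiet, non-improving move: reduced by 1
```

The point is only that the search can consult cheap eval signals at every interior node, which is why eval and search quality are hard to separate cleanly.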
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Questions for the Stockfish team
Sorry, the endgame analysis of any engine has huge holes, including Crafty.

bob wrote: I have never heard _anyone_ say that Crafty's endgame evaluation is poor. In fact, several GM players have said exactly the opposite. Most ignore candidate passed pawns and such. We don't.

michiguel wrote: And that is where most (all?) engines had the biggest holes in evaluation... endgame!

bob wrote: OK, some background. It turns out that if you replace Crafty's evaluation with a pure random number, it plays well above 2,000 Elo. If you disable all the search extensions, reductions, null move and such, you still can't get it below 1800. There has been a long discussion about this, something I call "the Beal Effect," since Don Beal first reported on this particular phenomenon many years ago. So a basic search + random eval gives an 1800 player. Full search + full eval adds 1,000 to that. How much comes from each? Unknown. But I have watched many, many Stockfish vs. Crafty games, and the deciding issue does not seem to be evaluation. We seem to get hurt by endgame search depth more than anything...

Joost Buijs wrote: I do understand that with infinite depth you don't need eval at all. With a perfect evaluation function, a 1-ply search will be sufficient as well. This is just theoretical. It is my feeling that everything depends on the quality of the evaluation. When I look at my own engine, it has an evaluation function comparable to a 1600 player, but it plays at the 2850 level just because it is very good at tactics. I'm pretty sure that when I'm able to improve the evaluation function, its Elo will go up.

Miguel
bob wrote: My comment was based on _watching_ games, where we get out-searched and then end up losing something tactically. Not positionally.
- Posts: 1154
- Joined: Fri Jun 23, 2006 5:18 am
Re: Questions for the Stockfish team
I think "huge holes" is a perfect description. There is no question that in many endgames, computers play better than humans; that leads many to say computers are good at endgames. On the other hand, there are a large number of important types of endgames computers really don't understand, and that strong human players do understand and can evaluate MUCH better (and in some cases play better); this leads many people to say computers are bad at endgames. I would rather ask a GM (or in some cases even a lowly master) about a pawn-up rook endgame, a potential fortress, an opposite-colored-bishop endgame, and many, many others, than a computer. These areas are not unsolvable in computer chess. It's just that the return, in Elo per hour spent, of working in this area is low compared to most other areas, so it receives little attention.

michiguel wrote: Sorry, the endgame analysis of any engine has huge holes, including Crafty.
-Sam
- Posts: 778
- Joined: Sat Jul 01, 2006 7:11 am
Re: Questions for the Stockfish team
I agree. I think the greatest edge that human grandmasters still have over computers is that they can evaluate an endgame as won or drawn while a computer might give it a score of 1.5; the GM might even sacrifice material to bring about such an endgame.

michiguel wrote: Sorry, the endgame analysis of any engine has huge holes, including Crafty. [...]
I have been seeing that too. Some endgame test positions Stockfish solves 100x faster than Crafty.

bob wrote: My comment was based on _watching_ games, where we get out-searched and then end up losing something tactically. Not positionally.
- Posts: 1154
- Joined: Fri Jun 23, 2006 5:18 am
Re: Questions for the Stockfish team
When it's 100x faster, it's often because the evaluation is giving hints to the search. These two things are not as independent as most people seem to imply.

jwes wrote: I have been seeing that too. Some endgame test positions Stockfish solves 100x faster than Crafty.

bob wrote: My comment was based on _watching_ games, where we get out-searched and then end up losing something tactically. Not positionally.
-Sam