Questions for the Stockfish team

mcostalba · Post by **mcostalba** » Fri Jul 16, 2010 6:49 pm

bob wrote:
mcostalba wrote:
bob wrote: Don't follow your comments.
I have used official CEGT lists for single core CPU as reference for the numbers I have given.
I use my cluster test results exclusively.

I know this

But you cannot simply ignore that public lists do exist and are the independent official sources for engine comparisons. In a public discussion on engine comparison we should stick to them because they are guaranteed to be independent.

You, as an engine author, should ask yourself a much more interesting question: "Why are my ELO references _so_ different from the public ones?"....and then you have two choices: try to stick to them or prove they are garbage.

Michael Sherwin · Post by **Michael Sherwin** » Fri Jul 16, 2010 7:05 pm

mcostalba wrote:
bob wrote:
mcostalba wrote:
bob wrote: Don't follow your comments.
I have used official CEGT lists for single core CPU as reference for the numbers I have given.
I use my cluster test results exclusively.
I know this

But you cannot simply ignore that public lists do exist and are the independent official sources for engine comparisons. In a public discussion on engine comparison we should stick to them because they are guaranteed to be independent.

You, as an engine author, should ask yourself a much more interesting question: "Why are my ELO references _so_ different from the public ones?"....and then you have two choices: try to stick to them or prove they are garbage.

Are you happy with your little war? If you do not wish to be helpful then just butt out and go away!

jdart · Post by **jdart** » Fri Jul 16, 2010 7:18 pm

All the pieces interact, especially eval and search. Search tricks, especially pruning, that work with one eval function may perform poorly with a different one.

That makes it hard to say what works and doesn't, outside of a particular program.

I've tried variable LMR depth for example and it doesn't work for me, so far.

Unfortunately I think most programmers are just trying various things and keeping the variants that work well, but we aren't seeing a lot of reports about what didn't work, or published measurements of what improvement did or didn't come from particular changes. There has been research of this kind done in the past - by Ernst Heinz for example. But if you're a commercial programmer, especially, you aren't motivated to publish anything of this kind.

I also have some concern that when you're trying to optimize results by tweaking a lot of variables, whether they are eval terms or futility margins, etc., you have a risk of hitting a local maximum that looks good but you may not actually be finding the global maximum where all variables are optimal. This may be why some programs are doing "all the right things" but not performing as well.
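The local-maximum risk can be shown with a toy example. This is only a sketch: `strength` is a made-up two-peak function standing in for measured playing strength, and `hill_climb` is a hypothetical greedy tuner that keeps a change only when it scores better. Neither is taken from any real engine.

```python
import math

def strength(x: float) -> float:
    """Toy 'playing strength' for one tunable parameter (assumed shape).

    It has a local peak near x = 1 and a higher, global peak near x = 4.
    """
    return math.exp(-(x - 1.0) ** 2) + 2.0 * math.exp(-(x - 4.0) ** 2)

def hill_climb(x: float, step: float = 0.1, max_iters: int = 1000) -> float:
    """Greedy tuning: move to a neighbouring value only if it scores better."""
    for _ in range(max_iters):
        best = max((x - step, x, x + step), key=strength)
        if best == x:
            break  # no neighbour is better, so the tuner stops here
        x = best
    return x

from_left = hill_climb(0.0)   # gets stuck on the lower peak
from_right = hill_climb(3.0)  # happens to reach the higher peak
print(from_left, strength(from_left))
print(from_right, strength(from_right))
```

Both runs stop at a point where no small change helps, yet only one of them finds the better setting. With many interacting eval terms and margins the same effect is much harder to spot.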

--Jon

Tord Romstad · Post by **Tord Romstad** » Fri Jul 16, 2010 7:35 pm

Please, guys -- let's avoid having this thread degenerate into a Crafty vs Stockfish flamewar. I'm pretty sure none of us wants that.

I think I speak on behalf of the whole Stockfish team (and most of the readers of this forum) when I say that I have a tremendous respect and admiration for Bob and Crafty. Because Stockfish and Crafty really belong to two entirely different categories of software (one is a UCI engine, the other is a full-featured standalone chess program), it doesn't make sense to compare them directly and say that one of them is better or more successful than the other. They're both great. Moreover, being open source programmers, we're on the same team. If we absolutely have to fight someone (which I would prefer to avoid), let's target those who prefer to keep everything secret.

As to why Stockfish currently does better than Crafty on the public rating lists, I really have no idea. It's been a very long time since I looked at Crafty's source code (perhaps it's time to do it again soon!), and I don't know how it looks these days.

Joona's list of reasons why Stockfish is strong is pretty good, I think, although I suspect the complicated king safety evaluation gives more style than strength.

mcostalba · Post by **mcostalba** » Fri Jul 16, 2010 7:35 pm

Michael Sherwin wrote: Are you happy with your little war? If you do not wish to be helpful then just butt out and go away!

I don't have any war with Bob, neither big nor little, and my butt is fine where it is now.

If you are disappointed with me because I think your post is silly I can understand it....but it is not my problem...and this won't make me change my mind.

That said, I will take your hint and move away.

Tord Romstad · Post by **Tord Romstad** » Fri Jul 16, 2010 7:41 pm

bob wrote: I use my cluster test results exclusively. No book issues or anything else, just plain and simple "engine vs engine." Already found one serious timing bug that will influence rating lists that use "repeating" time controls that I never use or test with (for example, 40/60 repeating: 40 moves in 60 minutes, then repeat). Gross error in time usage in that.

I prefer fischer-clock games to avoid time scrambles also.

So do I.

This is a danger we are all facing: We are always optimizing for our own favorite testing conditions, and there is always a risk that the strength will suffer when somebody else tests the program under different conditions.

For people with limited testing resources, like us in the Stockfish team, the public rating lists are invaluable. I understand why they are not as interesting to you with your ability to get thousands of games very quickly on your cluster, but I still think it would be useful for you to have a look at the public rating lists once in a while, in order to see how Crafty performs under different conditions and discover bugs like the one described above.

Michael Sherwin · Post by **Michael Sherwin** » Fri Jul 16, 2010 8:00 pm

mcostalba wrote:
If you are disappointed with me because I think your post is silly I can understand it....but it is not my problem...and this won't make me change my mind.

I am not disappointed with you, because that would indicate that I expected you to be forthcoming and helpful in the first place. I did not! However, I was confident that some other team members might offer something of value that was good for a meaningful discussion. I am glad that I have not been disappointed. Looking forward to the discussion continuing without you. However, you are more than welcome back if you were to have a change of heart!

Eelco de Groot · Post by **Eelco de Groot** » Fri Jul 16, 2010 8:28 pm

Michael Sherwin wrote:
mcostalba wrote:
zamar wrote:I don't like making comparisons with any specific program (like Crafty), but I can list some major points why I think Stockfish is stronger than most other modern programs:

* Relaxed singular extension
* Logarithmic LMR
* Complicated king safety evaluation
* Fine tuned tables and constants in evaluation
* Speed-optimized code
* Aggressive late move pruning at low depths
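As a rough illustration of the "logarithmic LMR" item: the idea is that the late move reduction grows with the logarithm of both the remaining depth and the move's position in the move list, instead of being a fixed amount. The sketch below is an assumption-laden example only. The function name `lmr_reduction`, the constants `base` and `divisor`, and the cut-offs are hypothetical and are not Stockfish's actual values.

```python
import math

def lmr_reduction(depth: int, move_number: int,
                  base: float = 0.5, divisor: float = 2.25) -> int:
    """Plies to reduce a late move by (hypothetical constants).

    depth       -- remaining search depth in plies
    move_number -- 1-based index of the move in the ordered move list
    """
    # Assumed cut-offs: no reduction for shallow nodes or early moves.
    if depth < 3 or move_number < 4:
        return 0
    r = base + math.log(depth) * math.log(move_number) / divisor
    # Never reduce so far that no depth is left to search.
    return min(int(r), depth - 1)

for depth, move_number in [(2, 10), (6, 6), (10, 20), (20, 30)]:
    print(depth, move_number, lmr_reduction(depth, move_number))
```

A real engine would usually precompute such values in a table and exclude captures, checks and other tactical moves from reduction.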

I think that someone (read Bob) will have to say something on the above points.

I don't like this kind of discussion because it is very prone to handwaving (which is where I predict this thread is heading, BTW...)

Anyhow, one should first try to understand where the 80 ELO points of difference are hidden that _still_ exist (after more than 2 years) relative to Glaurung 2.2 and then, once this point is understood (hint: it won't be understood), add to the table the remaining 260 points.
Here is all I know as I have not done much of an examination of g2.2.

...

It uses Late Move Pruning if the move number is beyond a threshold value

There is some type of feedback from the search to the eval

The move ordering routine is quite 'slick' and far more sophisticated than, say, Crafty's

Tord was a student of Phalanx in which a certain type of extension is done

---

There is more I would suspect, but I know that these things are not in Crafty. Am I on the right track?
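The late move pruning item in the list above ("if the move number is beyond a threshold value") could look roughly like this. It is a minimal sketch under stated assumptions: the threshold formula, the depth limit and the names `lmp_threshold` and `should_prune` are hypothetical, not taken from Glaurung or Stockfish.

```python
def lmp_threshold(depth: int) -> int:
    """Assumed rule: how many moves are always searched at this depth."""
    return 3 + depth * depth

def should_prune(depth: int, move_number: int, is_capture: bool,
                 gives_check: bool, in_check: bool,
                 max_depth: int = 4) -> bool:
    """Return True if a late quiet move may be skipped without searching it.

    move_number is the 1-based index of the move in the ordered move list.
    """
    # Only prune near the leaves, and never prune tactical moves
    # or any move when the side to move is in check.
    if depth > max_depth or in_check or is_capture or gives_check:
        return False
    return move_number > lmp_threshold(depth)

print(should_prune(1, 10, False, False, False))  # late quiet move at low depth
print(should_prune(8, 100, False, False, False)) # too deep for this pruning
```

The pruning depends entirely on move ordering: it is only safe to skip late moves if the good moves reliably come first.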

I read Marco's comments more as saying that there is not always much point in trying to understand why something works. The computational performance, sure, that is up to good programming and a lot of profiling I guess. But why certain extensions work and others do not often has to do with the mathematical structure of chess. And this is interacting with all the structures in the program. Chess is hard enough to understand as it is. If you have to understand another layer of complexity on top of chess, because you never know exactly what the program does, it quickly becomes a kind of alchemy. All you can do is use your intuition and test, write off all the wrong ideas and failed attempts, then test and test some more.

However, instead of playing games with us you could just be nice and tell us! My mom has Alzheimer's and I have so far sacrificed two and a half years of my life to taking care of her day and night. When do you think that I have had the time to find out for myself the secrets of g2.2 or Stockfish?

We all have tremendous respect for what you are doing Michael. You once said you are very rational or something to that effect, and maybe that helps you cope a little better than most people. But I think most people in a situation like yours will feel the need to have some professional carers every once in a while take over from you or you will wear yourself out completely. Isn't there some way you could get somebody to help? I would be a bit afraid that you would ignore the stress signs in yourself because you feel you have to cope, and there is nobody else. Just don't go it alone completely if you don't absolutely have to, I hope you at least have checked any possibilities for occasional help.

Kind regards,
Eelco

Michael Sherwin · Post by **Michael Sherwin** » Fri Jul 16, 2010 8:53 pm

Eelco de Groot wrote:
We all have tremendous respect for what you are doing Michael. You once said you are very rational or something to that effect, and maybe that helps you cope a little better than most people. But I think most people in a situation like yours will feel the need to have some professional carers every once in a while take over from you or you will wear yourself out completely. Isn't there some way you could get somebody to help? I would be a bit afraid that you would ignore the stress signs in yourself because you feel you have to cope, and there is nobody else. Just don't go it alone completely if you don't absolutely have to, I hope you at least have checked any possibilities for occasional help.

Kind regards,
Eelco

Thank you Eelco! This kind of sympathy helps in that I can tell that you understand just how taxing on one's strength the situation is. I am completely worn out. I had some help, but they were just not up to the task.

Milos · Post by **Milos** » Sat Jul 17, 2010 3:22 am

mcostalba wrote: You, as an engine author, should ask yourself a much more interesting question: "Why are my ELO references _so_ different from the public ones?"....and then you have two choices: try to stick to them or prove they are garbage.

There is nothing to prove. They are garbage. As simple as that. Writing all the reasons why would certainly take a couple of days...
