Couple more ideas

vincenegri · Post by **vincenegri** » Mon Apr 06, 2015 1:40 pm

Lyudmil Tsvetkov wrote: I wonder how Joerg is going to introduce a kingside fianchetto bonus starting from this position. How, Joerg?

By submitting tests that use the 8-move book. Such tests are perfectly allowed and the results would be accepted.

You don't have to use the 2-move book. It just provides better resolution in most cases (faster tuning convergence, fewer resources used on the framework, etc.).

For example, I have seen people use the 8-move book when testing changes focused on the endgame, since the 8-move book makes it more likely the game will reach the ending.

vincenegri · Post by **vincenegri** » Mon Apr 06, 2015 1:54 pm

zullil wrote:
I understand that without a large number of games one cannot decide if a patch is positive, negative or neutral. (And part of me is thinking: "So what, no lives will be lost if a bad change is made. Life is short. Take risks. Have fun!") Of course, anyone is free to fork Stockfish and do whatever he wants, free from the constraints imposed by the fishtest protocols.

Quite so. But it takes a lot of resources to do this sort of development. Why else are there so many Stockfish clones that keep rebasing themselves to current master?

It's perhaps no surprise that the creative forks, like sting, focus on test position solving, since that doesn't require so much raw CPU time. Just keep running through your EPD collection.

Komodo uses a lot of self-play testing (and I bet testing against other major engines too). Mark spoke about this during TCEC and about the large financial investment in hardware needed to keep up with fishtest. IIRC they run longer TC games, so they need even *more* hardware.

zullil wrote: I just find myself wondering more and more if the current testing constraints almost preclude improving aspects of Stockfish's play that involve positional play and "long-term planning". Maybe Stockfish has now developed to the point where a new testing protocol is needed, in order for the engine to reach its true potential?

It depends on your development model. I don't think the importance of having quantifiable criteria for patch testing and acceptance in a multi-developer project like SF should be underestimated. Fishcooking can get crabby, for sure, but it would be 100 times worse if the only way to decide which patch went in was via arguments over individual positions, games, etc.

At the same time, I don't think you are going to get a paradigm shift via the SF model, at least not by a series of incremental changes. Probably in the future someone who has worked alone will present a mega-patch with solid local test results to back it up (after all, if you did achieve a paradigm shift it would be worth many Elo, so you would demonstrate 99% LOS quite quickly).

And the good thing is, if the patch truly did increase SF's strength, it would pass the framework test and be accepted.
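To illustrate the "99% LOS quite quickly" remark, here is a minimal sketch of the usual likelihood-of-superiority estimate. It assumes the common normal approximation that uses only decisive games (draws carry no information about which side is stronger); the function name `los` and the sample counts are purely illustrative and are not taken from any fishtest run.

```python
import math


def los(wins: int, losses: int) -> float:
    """Likelihood of superiority under a normal approximation.

    Only decisive games are used; draws are ignored.
    Returns a probability between 0 and 1.
    """
    decisive = wins + losses
    if decisive == 0:
        # No decisive games yet: no evidence either way.
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


if __name__ == "__main__":
    # Hypothetical counts: a small edge vs. a large edge over 200 decisive games.
    print(f"small edge: {los(105, 95):.3f}")
    print(f"large edge: {los(120, 80):.3f}")
```

With the same number of decisive games, the larger win surplus crosses the 99% mark while the smaller one does not, which is why a patch worth many Elo needs far fewer games to show up.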

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Mon Apr 06, 2015 2:00 pm

vincenegri wrote:
Lyudmil Tsvetkov wrote: I wonder how Joerg is going to introduce a kingside fianchetto bonus starting from this position. How, Joerg?
By submitting tests that use the 8-move book. Such tests are perfectly allowed and the results would be accepted.

You don't have to use the 2-move book. It just provides better resolution in most cases (faster tuning convergence, fewer resources used on the framework, etc.).

For example, I have seen people use the 8-move book when testing changes focused on the endgame, since the 8-move book makes it more likely the game will reach the ending.

No, that will not work.

In an 8-move book, all kingside fianchettos will already have been made by move 8, or the book will simply not have them. You cannot introduce a kingside fianchetto with an 8-move book.

What you need is a book 2-3 moves long, bounded by 30 cp evals, with the evals obtained not at depth 12 but at depth 20-25 at least, and using at least 2 engines to filter the positions. The remaining positions can then be reviewed by a human with some chess knowledge, so that only sensible opening positions are left and random move sequences are discarded. That might discard another half of the book.
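The engine-filtering step of that proposal could be sketched roughly as follows. This assumes the evaluations have already been collected (at whatever depth is chosen) from each engine and stored in centipawns per position; `filter_book`, its parameters, and the position labels are hypothetical names for illustration, not part of any existing fishtest or Stockfish tooling.

```python
from typing import Dict, List


def filter_book(
    evals_cp: Dict[str, List[int]],
    bound_cp: int = 30,
    min_engines: int = 2,
) -> List[str]:
    """Keep positions that every engine scores within +/- bound_cp.

    evals_cp maps a position label to the list of centipawn scores,
    one per engine. Positions evaluated by fewer than min_engines
    engines are dropped as insufficiently checked.
    """
    kept = []
    for position, scores in evals_cp.items():
        enough_engines = len(scores) >= min_engines
        all_balanced = all(abs(score) <= bound_cp for score in scores)
        if enough_engines and all_balanced:
            kept.append(position)
    return kept


if __name__ == "__main__":
    # Placeholder labels and made-up scores, for illustration only.
    sample = {
        "pos_a": [12, -8],   # both engines within the bound: kept
        "pos_b": [25, 41],   # second engine exceeds the bound: dropped
        "pos_c": [5],        # only one engine: dropped
    }
    print(filter_book(sample))
```

The human review that follows in the proposal is not modelled here; this only covers the automated part.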

I know this is a loooot of work to do, but you cannot improve without it. It is a project that is done once and for all. It will repay every single hour of work put into it.

SF will be able to improve much more easily and in a much more objective way than it does now. Testing with the current book is truly the biggest waste of resources ever, as good patches might be discarded, bad ones succeed, etc. The resolution of a book with much more carefully selected openings will of course be much better.

Concerning your comment that a closed position arose from this ugly opening, it is true, but I bet that if such ugly openings did not exist, closed positions would arise twice as often...

I think SF should give itself a year's time to perfect its book.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Mon Apr 06, 2015 2:11 pm

vincenegri wrote:
Lyudmil Tsvetkov wrote:
I have been winning hundreds of games against SF based on this rule, and so far I have not witnessed even a single exception...
1) At what time control?

2) Are you claiming you would beat SF in a controlled match? You do realise that you would be the toast of the scene and any number of sites would be delighted to host such an event.

Me at 15 sec for the entire game, with SF having 2 minutes for the game.

Just kidding. The TC is blitz for SF, and I play with double SF's time, but sometimes I win even with equal time.

I am not interested in any public events. What would have meaning for me is for SF to improve its play in such positions, so that it provides better analysis for its users.

I think you understand quite well, that it is not that difficult to win against a top engine, when you know precisely what its deficiencies are.

I have never claimed I am stronger than SF in open games, all I say is that I am able to perform well and sometimes win in closed positions, and also that I know how to steer the game into such positions.

vincenegri · Post by **vincenegri** » Mon Apr 06, 2015 2:15 pm

Lyudmil Tsvetkov wrote:
No, that will not work.

In an 8-move book, all kingside fianchettos will already have been made by move 8, or the book will simply not have them. You cannot introduce a kingside fianchetto with an 8-move book.

I'm not sure what you are trying to achieve here.

If you want the engine to appreciate the value of a king-side fianchetto, and to correctly evaluate and play positions that have such a feature, absolutely you can test using the 8-move book. How the engine got to such a position does not matter. You can make the engine try to preserve the bishop, try to open the diagonal… etc.

And once the engine correctly evaluates the position, and if it is actually the case that the k-side fianchetto is as strong as you believe, the engines will 'learn' to use it.

Note: I suppose you want the engine to be able to 'discover' the KID from first principles, i.e. to be able to play the openings without a book? But such a goal, although an interesting challenge to be sure, will neither gain rating points in engine tournaments (since they almost always use a book, TCEC round 2 being the exception, and that only because the round-robin of many engines avoids tedium) nor be of much use to GMs who are analysing positions many moves further along.

zullil · Post by **zullil** » Mon Apr 06, 2015 2:17 pm

vincenegri wrote:
zullil wrote:
I understand that without a large number of games one cannot decide if a patch is positive, negative or neutral. (And part of me is thinking: "So what, no lives will be lost if a bad change is made. Life is short. Take risks. Have fun!") Of course, anyone is free to fork Stockfish and do whatever he wants, free from the constraints imposed by the fishtest protocols.

Quite so. But it takes a lot of resources to do this sort of development. Why else are there so many Stockfish clones that keep rebasing themselves to current master?

It's perhaps no surprise that the creative forks, like sting, focus on test position solving, since that doesn't require so much raw CPU time. Just keep running through your EPD collection.

Komodo uses a lot of self-play testing (and I bet testing against other major engines too). Mark spoke about this during TCEC and about the large financial investment in hardware needed to keep up with fishtest. IIRC they run longer TC games, so they need even *more* hardware.

zullil wrote: I just find myself wondering more and more if the current testing constraints almost preclude improving aspects of Stockfish's play that involve positional play and "long-term planning". Maybe Stockfish has now developed to the point where a new testing protocol is needed, in order for the engine to reach its true potential?
It depends on your development model. I don't think the importance of having quantifiable criteria for patch testing and acceptance in a multi-developer project like SF should be underestimated. Fishcooking can get crabby, for sure, but it would be 100 times worse if the only way to decide which patch went in was via arguments over individual positions, games, etc.

At the same time, I don't think you are going to get a paradigm shift via the SF model, at least not by a series of incremental changes. Probably in the future someone who has worked alone will present a mega-patch with solid local test results to back it up (after all, if you did achieve a paradigm shift it would be worth many Elo, so you would demonstrate 99% LOS quite quickly).

And the good thing is, if the patch truly did increase SF's strength, it would pass the framework test and be accepted.

Yes, your points are quite valid. And still I wonder if, for example, the "hotspot bonus" idea might prove positive were it possible to test at more standard time controls. I mean, if the engine doesn't have time to find the "point" of the position you've told it to choose, of course that choice will fare poorly in your testing. But I guess I'm just repeating myself...

vincenegri · Post by **vincenegri** » Mon Apr 06, 2015 2:18 pm

Lyudmil Tsvetkov wrote: Me at 15 sec for the entire game, with SF having 2 minutes for the game.

OK, so you were kidding, but that would involve some ninja mouse skills.

vincenegri · Post by **vincenegri** » Mon Apr 06, 2015 2:22 pm

zullil wrote:
Yes, your points are quite valid. And still I wonder if, for example, the "hotspot bonus" idea might prove positive were it possible to test at more standard time controls. I mean, if the engine doesn't have time to find the "point" of the position you've told it to choose, of course that choice will fare poorly in your testing. But I guess I'm just repeating myself...

I'm running a local test of the latest variant at regular fishtest STC conditions right now. Again the VSTC looked good. So far at STC it is zero elo, but the error bars are still large.

If a patch depends on 'finding the point' you would expect it to scale up with longer TC, no?
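For readers wondering what "zero elo, but the error bars are still large" looks like in numbers, here is a rough sketch that turns win/draw/loss counts into an Elo estimate with an approximate 95% interval. It assumes the standard logistic Elo model and a normal approximation on the mean score; the function name `elo_with_error` and the game counts are made up for illustration and are not the actual test results discussed here.

```python
import math


def _score_to_elo(score: float) -> float:
    """Convert an expected score in (0, 1) to an Elo difference."""
    # Clamp to avoid division by zero at the extremes.
    score = min(max(score, 1e-9), 1.0 - 1e-9)
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_error(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, lower, upper) for the given W/D/L counts.

    The interval is built on the mean score using its standard error,
    then mapped to Elo. z = 1.96 gives roughly a 95% interval.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("at least one game is required")
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    stderr = math.sqrt(variance / games)
    return (
        _score_to_elo(score),
        _score_to_elo(score - z * stderr),
        _score_to_elo(score + z * stderr),
    )


if __name__ == "__main__":
    # Hypothetical counts: same score, ten times as many games.
    for wdl in [(100, 200, 100), (1000, 2000, 1000)]:
        elo, lower, upper = elo_with_error(*wdl)
        print(f"{wdl}: {elo:+.1f} Elo [{lower:+.1f}, {upper:+.1f}]")
```

A balanced result over a few hundred games leaves an interval of tens of Elo on either side of zero, so a small positive or negative patch cannot yet be told apart from a neutral one.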

zullil · Post by **zullil** » Mon Apr 06, 2015 3:33 pm

vincenegri wrote:
zullil wrote:
Yes, your points are quite valid. And still I wonder if, for example, the "hotspot bonus" idea might prove positive were it possible to test at more standard time controls. I mean, if the engine doesn't have time to find the "point" of the position you've told it to choose, of course that choice will fare poorly in your testing. But I guess I'm just repeating myself...
I'm running a local test of the latest variant at regular fishtest STC conditions right now. Again the VSTC looked good. So far at STC it is zero elo, but the error bars are still large.

If a patch depends on 'finding the point' you would expect it to scale up with longer TC, no?

Yes, that's what I'd expect.

Sorry, I guess I misunderstood (or failed to read carefully). I thought the hotspot idea failed at VSTC in your local testing. In any case, if my concern is a valid one, the idea should prove better as TC increases. If not, then maybe my concerns about fishtest are unfounded.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Mon Apr 06, 2015 4:26 pm

vincenegri wrote:
Lyudmil Tsvetkov wrote:
No, that will not work.

In an 8-move book, all kingside fianchettos will already have been made by move 8, or the book will simply not have them. You cannot introduce a kingside fianchetto with an 8-move book.
I'm not sure what you are trying to achieve here.

If you want the engine to appreciate the value of a king-side fianchetto, and to correctly evaluate and play positions that have such a feature, absolutely you can test using the 8-move book. How the engine got to such a position does not matter. You can make the engine try to preserve the bishop, try to open the diagonal… etc.

And once the engine correctly evaluates the position, and if it is actually the case that the k-side fianchetto is as strong as you believe, the engines will 'learn' to use it.

Note: I suppose you want the engine to be able to 'discover' the KID from first principles, i.e. to be able to play the openings without a book? But such a goal, although an interesting challenge to be sure, will neither gain rating points in engine tournaments (since they almost always use a book, TCEC round 2 being the exception, and that only because the round-robin of many engines avoids tedium) nor be of much use to GMs who are analysing positions many moves further along.

The point is that by move 8 many book positions would already have gotten rid of kingside fianchettos, or reached a structure that differs from SF's definition of a kingside fianchetto, so testing would be more difficult.

Besides, I am very much afraid that your 8-move book does not contain a sufficient number of kingside fianchettos, unless it is based entirely upon games of world champions.

You cannot do without a kingside fianchetto; it is a most essential feature. You can fianchetto your king's bishop to good effect in at least 90% of all openings, and that would be the best option. That is actually what most world champions have been doing.

GMs will soon start analysing from the very first move, as soon as they start caring about what the best first move is and want an engine to assist them in that task.
