Help me to test an idea for Stockfish (I don't know programm

syzygy · Post by **syzygy** » Sat Jan 04, 2014 8:26 pm

bob wrote:
syzygy wrote:
Lyudmil Tsvetkov wrote:You have a point here, but do not you have terms that apply only for the middlegame, or only for the endgame? Those should also make transitions less smooth, theoretically.

Let us take a term like a bishop pin, for example. Difficult to believe you can do well without it in the middlegame, but it is totally useless in the endgame.
Miguel's point is this: if a particular term is worth say 100 in a "pure middlegame" position and 0 in a "pure endgame" position, then what SF's scheme does is exactly that. In addition, if the position being evaluated is somewhere between a "pure middlegame" and a "pure endgame" position, then the bonus being awarded is somewhere between 100 and 0 (linearly interpolated).
That on/off term is a dangerous way to program, however. Berliner referred to these as "evaluation discontinuities" and they should be avoided at all costs.

I think we all agree on that (except perhaps Lyudmil). It's OK to have a term ON in a position that in all respects is "middlegame" and to have it OFF in a position that in all respects is "endgame", but there will always be positions between the two extremes and for those interpolation has turned out to be a good idea.
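To make the interpolation concrete, here is a minimal sketch of how a term worth 100 in a pure middlegame and 0 in a pure endgame could be blended. The names `interpolate` and `PHASE_MIDGAME` and the 0..128 phase scale are assumptions for illustration, not a quote of SF's source.

```cpp
#include <cassert>

// Hypothetical phase scale for this sketch: 0 = pure endgame, 128 = pure middlegame.
constexpr int PHASE_ENDGAME = 0;
constexpr int PHASE_MIDGAME = 128;

// Blend a middlegame value and an endgame value linearly according to the phase.
// A term that is "ON" in the middlegame and "OFF" in the endgame fades out
// gradually instead of being switched off at one arbitrary point.
constexpr int interpolate(int mg, int eg, int phase) {
    return (mg * phase + eg * (PHASE_MIDGAME - phase)) / PHASE_MIDGAME;
}
```

With these assumed values, `interpolate(100, 0, 64)` gives half the bonus, and neighbouring phases never differ by more than one point, so there is no discontinuity of the kind Berliner warned about.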

Maybe Lyudmil means that the endgame values of the piece square tables should all be set to 0. That would be easy to test. But I highly doubt that this would be better than the (I assume) well-tuned values that SF is using now. One thing is certain: that SF plays the endgame well is in no way evidence that setting the endgame values to 0 would have any benefit.

mohzus · Post by **mohzus** » Sat Jan 04, 2014 8:28 pm

Lyudmil Tsvetkov wrote: What Stockfish needs, for me, is some huge penalty for slow scheduling of tests, both for the middlegame and the endgame.

I think returning the bishop to a square it previously occupied on the first rank does not always and necessarily mean the move is bad, but it might be. The problem would arise when the engine frequently does so. I do not have a specific example to show right now, but in my games when playing black Stockfish very frequently likes to retreat its bishop to e8, as if to guard the king, for example defending additionally g6, but such a retreat would not be justified. I have observed this as a pattern and therefore I suggested the d1/e1 penalty might work.

Sorry, Robert, I asked you because I know you are a kind person, if you find the time, you can test also my idea, otherwise please do not bother.

I think your programming knowledge is higher than mine as the sky to the earth...

My test failed at short time control rather quickly. In my case fixing a particular position worsened stockfish's play in general somehow.
If I do not have a particular position or positions to "fix" then trying arbitrary penalties (I think this would be called "tuning") would be the way to go. I am not really sure what is the best way to tackle this in the fishtest. Probably trying values such as a penalty of 40 cp (the penalty of bishop in e1/d1 is currently worth 30 cp), 50 cp, 60 cp and even 20 cp. I believe I would have to try these tests with a fixed number of games, at least 20k games for each test. Maybe even more. Someone suggested that the removal of the psq table for bishop removed less than 3 elo or so... So maybe I'd need to run 40k games for each one of these tests in order to obtain an error bar of +/- 2 elo. If the framework was less busy maybe they'd allow me, but right now I highly doubt I could try this.
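As a rough check on those game counts, the error bar can be estimated from the number of games. This is only a sketch under stated assumptions: two engines of nearly equal strength, a normal approximation, and a draw rate that you supply yourself (the thread does not give one). The function name `elo_margin_95` is made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Approximate 95% error margin, in Elo, for a match between two roughly equal engines.
// draw_rate is an assumed input, not a figure taken from fishtest.
double elo_margin_95(int games, double draw_rate) {
    // Each game scores 1, 0.5 or 0. With an expected score of 0.5,
    // the per-game variance is (1 - draw_rate) / 4.
    const double sigma = std::sqrt((1.0 - draw_rate) / 4.0);
    const double score_margin = 1.96 * sigma / std::sqrt(static_cast<double>(games));
    // Slope of the logistic Elo curve at a score of 0.5: 1600 / ln(10) Elo per unit of score.
    const double elo_per_score = 1600.0 / std::log(10.0);
    return score_margin * elo_per_score;
}
```

With an assumed draw rate of 60%, `elo_margin_95(40000, 0.6)` comes out at a little over 2 Elo, and halving the number of games widens the margin by a factor of about 1.41.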

Another idea that I have in mind is trying to give different penalties as I just suggested, I send you the executables of these modified stockfish and you tell me which one do you feel plays better. I would then run that version in the fishtest. But this would require you probably a lot of time.

If you have other ideas, I'm all ears. I do have some time these days, no worry about it.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Sat Jan 04, 2014 11:41 pm

syzygy wrote: Maybe Lyudmil means that the endgame values of the piece square tables should all be set to 0. That would be easy to test. But I highly doubt that this would be better than the (I assume) well-tuned values that SF is using now. One thing is certain: that SF plays the endgame well is in no way evidence that setting the endgame values to 0 would have any benefit.

That was exactly what I was referring to, Ronald. I think psqt in endgames is fully useless in top engines (but only in top engines, as they have all the other things that would allow them to play endings without psqt), except probably some bonus for minors on the 5th and 6th ranks.

Anyone who would like to test how psqt performs in middlegame and endgame, could simply switch off the endgame values once, and then the middlegame values, the second time, i.e. setting zero values for each stage respectively. My guess is that engines will not suffer without endgame psqt, but will have a very hard time without middlegame psqt. Another problem with psqt is that whatever tables I have looked at, they are simply disastrous.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Sun Jan 05, 2014 12:06 am

mohzus wrote:
Lyudmil Tsvetkov wrote: What Stockfish needs, for me, is some huge penalty for slow scheduling of tests, both for the middlegame and the endgame.

I think returning the bishop to a square it previously occupied on the first rank does not always and necessarily mean the move is bad, but it might be. The problem would arise when the engine frequently does so. I do not have a specific example to show right now, but in my games when playing black Stockfish very frequently likes to retreat its bishop to e8, as if to guard the king, for example defending additionally g6, but such a retreat would not be justified. I have observed this as a pattern and therefore I suggested the d1/e1 penalty might work.

Sorry, Robert, I asked you because I know you are a kind person, if you find the time, you can test also my idea, otherwise please do not bother.

I think your programming knowledge is higher than mine as the sky to the earth...
My test failed at short time control rather quickly. In my case fixing a particular position worsened stockfish's play in general somehow.
If I do not have a particular position or positions to "fix" then trying arbitrary penalties (I think this would be called "tuning") would be the way to go. I am not really sure what is the best way to tackle this in the fishtest. Probably trying values such as a penalty of 40 cp (the penalty of bishop in e1/d1 is currently worth 30 cp), 50 cp, 60 cp and even 20 cp. I believe I would have to try these tests with a fixed number of games, at least 20k games for each test. Maybe even more. Someone suggested that the removal of the psq table for bishop removed less than 3 elo or so... So maybe I'd need to run 40k games for each one of these tests in order to obtain an error bar of +/- 2 elo. If the framework was less busy maybe they'd allow me, but right now I highly doubt I could try this.

Another idea that I have in mind is trying to give different penalties as I just suggested, I send you the executables of these modified stockfish and you tell me which one do you feel plays better. I would then run that version in the fishtest. But this would require you probably a lot of time.

If you have other ideas, I'm all ears. I do have some time these days, no worry about it.

Hi Robert, so you are st (I would suppose you to be rt )

I really do not have time to do tests (besides, I am not very good at testing); however, I have plenty of time for suggesting ideas for others to test...

About the Fishtest, sorry to tell you my gut feeling, but some 80% of the eval ideas tested are absolutely ludicrous, which means that ideas that have absolutely no chance of passing the test take up 4/5 of the testing time, at least as far as eval ideas are concerned.

I have a fervent desire for someone to make a patch with a bonus for longer chains of pawns, as this is very important and can bring a lot of Elo, but simply no one wants to or is able to write such a patch. And, concerning chains, longer chains are some 80% of what chains are all about. So they do just chain pawns, i.e. defended pawns, and some other chain-related things, but this is only 20% of what chains are all about.

My idea for longer chains was very simple, I will repeat it here again, maybe someone will be able to push a patch with this idea:

You give bonus for longer chains, that is additional to other bonus for the chain pawns, including rank, file, etc.
You give this bonus only to longer chains of pawns of 3 or more pawns in all.
You count the number of chain pawns along the same diagonal (anybody, is it possible to do this?).
In case you find 3 or more pawns along the same diagonal, you give the additional bonus:
- 10cps for 3 chain pawns
- 20cps for 4 chain pawns
- 30cps for 5 chain pawns

That is all, in the end you have a very nice positional engine.

(but that includes base chain pawns in the calculations, they are also important)
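On the question of whether counting pawns along a diagonal is possible: it is. Below is a minimal sketch, not a Stockfish patch. It assumes a plain 64-bit bitboard of one side's pawns seen from White's point of view (a1 = bit 0, h8 = bit 63); the names `square_bb`, `has_pawn`, `chain_length` and `long_chain_bonus` are invented for this example. Chains longer than 5 are not covered by the list above, so capping them at 30 is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <initializer_list>

using Bitboard = std::uint64_t;

// Bit for a square given 0-based file (a=0) and rank (1st rank = 0).
constexpr Bitboard square_bb(int file, int rank) {
    return 1ULL << (rank * 8 + file);
}

// True if the square is on the board and holds one of our pawns.
constexpr bool has_pawn(Bitboard pawns, int file, int rank) {
    return file >= 0 && file < 8 && rank >= 0 && rank < 8
        && ((pawns >> (rank * 8 + file)) & 1ULL) != 0;
}

// Number of consecutive pawns starting at (file, rank) and walking one rank up
// and df files sideways per step (df = +1 or -1). The base pawn is included.
int chain_length(Bitboard pawns, int file, int rank, int df) {
    int n = 0;
    while (has_pawn(pawns, file, rank)) {
        ++n;
        file += df;
        ++rank;
    }
    return n;
}

// Extra bonus in centipawns: 10 for 3 pawns, 20 for 4, 30 for 5 (capped, by assumption).
int long_chain_bonus(Bitboard pawns) {
    int total = 0;
    for (int rank = 0; rank < 8; ++rank)
        for (int file = 0; file < 8; ++file) {
            if (!has_pawn(pawns, file, rank))
                continue;
            for (int df : {-1, +1}) {
                // Only start counting from the base of a chain in this direction.
                if (has_pawn(pawns, file - df, rank - 1))
                    continue;
                const int len = chain_length(pawns, file, rank, df);
                if (len >= 3)
                    total += 10 * (std::min(len, 5) - 2);
            }
        }
    return total;
}
```

For example, pawns on b2, c3, d4, e5 and f6 form one chain of five and would receive 30 cp under this scheme, while a chain of two receives nothing extra. Whether such a bonus gains any Elo can only be settled by testing.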

Another idea I very much would like to see implemented is to tune bishop and knight values for closed positions. The Joona Kiiski closed patch got yellow on LTC, but if you tune the piece values for that patch, it will quite probably pass the test. For closed positions, knight value is tuned up by some 10-20cps, while bishop value down by the same 10-20cps. In the end, instead of yellow, you will get green, but people are testing all kinds of unimaginable eval ideas and somehow miss the more important ones.

Btw, does anybody know why a successful patch that gets a slight, but still winning, percentage after 40 000 games does not get integrated into Stockfish? If you are positive after 40 000 games, you simply integrate it, why not?

Ralph Stoesser · Post by **Ralph Stoesser** » Sun Jan 05, 2014 12:23 am

Lyudmil Tsvetkov wrote:
syzygy wrote: Maybe Lyudmil means that the endgame values of the piece square tables should all be set to 0. That would be easy to test. But I highly doubt that this would be better than the (I assume) well-tuned values that SF is using now. One thing is certain: that SF plays the endgame well is in no way evidence that setting the endgame values to 0 would have any benefit.
That was exactly what I was referring to, Ronald. I think psqt in endgames is fully useless in top engines (but only in top engines, as they have all the other things that would allow them to play endings without psqt), except probably some bonus for minors on the 5th and 6th ranks.

Anyone who would like to test how psqt performs in middlegame and endgame, could simply switch off the endgame values once, and then the middlegame values, the second time, i.e. setting zero values for each stage respectively. My guess is that engines will not suffer without endgame psqt, but will have a very hard time without middlegame psqt. Another problem with psqt is that whatever tables I have looked at, they are simply disastrous.

You may want to read this. http://stockfish.wikispaces.com/psqtab.h

Vinvin · Post by **Vinvin** » Sun Jan 05, 2014 12:23 am

Lyudmil Tsvetkov wrote:You give bonus for longer chains, that is additional to other bonus for the chain pawns, including rank, file, etc.
You give this bonus only to longer chains of pawns of 3 or more pawns in all.
You count the number of chain pawns along the same diagonal (anybody, is it possible to do this?).
In case you find 3 or more pawns along the same diagonal, you give the additional bonus:
- 10cps for 3 chain pawns
- 20cps for 4 chain pawns
- 30cps for 5 chain pawns

I think that more than 4 chain pawns is counterproductive because the chain pawns can be attacked by a lever or by sacking a pawn to create weak pawn(s)
[d]8/8/1p3P2/2p1P3/3P4/2P5/1P6/8 w - - 0 1

syzygy · Post by **syzygy** » Sun Jan 05, 2014 1:06 am

Lyudmil Tsvetkov wrote:Anyone who would like to test how psqt performs in middlegame and endgame, could simply switch off the endgame values once, and then the middlegame values, the second time, i.e. setting zero values for each stage respectively. My guess is that engines will not suffer without endgame psqt, but will have a very hard time without middlegame psqt. Another problem with psqt is that whatever tables I have looked at, they are simply disastrous.

Setting all endgame parts to 0 can be achieved by changing

Code:

#define S(mg, eg) make_score(mg, eg)

into

Code:

#define S(mg, eg) make_score(mg, 0)

Just for fun I am running a test on my machine. After 155 games the score is 50 wins, 18 losses and 82 draws for the unmodified SF.

syzygy · Post by **syzygy** » Sun Jan 05, 2014 1:52 am

syzygy wrote:Just for fun I am running a test on my machine. After 155 games the score is 50 wins, 18 losses and 82 draws for the unmodified SF.

The result after 500 games is 152 - 80 - 268, which is a bit less disastrous, but it seems rather unlikely that your proposal does not lose Elo.
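For anyone who wants to turn such a record into numbers, here is a small sketch that converts wins, losses and draws into an Elo estimate with an approximate 95% interval. It uses the usual logistic Elo formula and a normal approximation; the names `EloEstimate`, `score_to_elo` and `estimate_elo` are invented for this example.

```cpp
#include <cassert>
#include <cmath>

struct EloEstimate {
    double elo;   // point estimate
    double low;   // lower end of the approximate 95% interval
    double high;  // upper end of the approximate 95% interval
};

// Convert an expected score (0 < s < 1) into an Elo difference (logistic model).
double score_to_elo(double s) {
    return 400.0 * std::log10(s / (1.0 - s));
}

// Estimate the Elo difference from the first engine's point of view.
EloEstimate estimate_elo(int wins, int losses, int draws) {
    const double n = wins + losses + draws;
    const double w = wins / n, l = losses / n, d = draws / n;
    const double s = w + 0.5 * d;  // mean score per game
    // Variance of the per-game score around its mean.
    const double var = w * (1.0 - s) * (1.0 - s) + l * s * s + d * (0.5 - s) * (0.5 - s);
    const double margin = 1.96 * std::sqrt(var / n);
    return { score_to_elo(s), score_to_elo(s - margin), score_to_elo(s + margin) };
}
```

Applied to 152 - 80 - 268, this gives roughly +50 Elo for the unmodified SF, and the lower end of the interval is still clearly above zero.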

This is not surprising. There was no good reason to believe that this change would not hurt playing strength. Your gut feeling is not worth any Elo.

michiguel · Post by **michiguel** » Sun Jan 05, 2014 2:03 am

syzygy wrote:
syzygy wrote:Just for fun I am running a test on my machine. After 155 games the score is 50 wins, 18 losses and 82 draws for the unmodified SF.
The result after 500 games is 152 - 80 - 268, which is a bit less disastrous, but it seems rather unlikely that your proposal does not lose Elo.

This is not surprising. There was no good reason to believe that this change would not hurt playing strength. Your gut feeling is not worth any Elo.

This is what I was trying to explain to Lyudmil. Setting endgame values to zero disrupts the middle game interpolated values completely. In fact, those will be lower, which means that the effective material values and their relationship will be messed up too.
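A small sketch of the effect, using a simplified stand-in for the engine's score type. The struct `Score`, the 0..128 phase scale and the example values 100 and 80 are assumptions chosen for illustration, not SF's real representation or numbers.

```cpp
#include <cassert>

// Simplified stand-in: a pair of middlegame and endgame values.
struct Score {
    int mg;
    int eg;
};

constexpr Score make_score(int mg, int eg) { return {mg, eg}; }

constexpr int PHASE_MIDGAME = 128;  // assumed scale: 128 = pure middlegame, 0 = pure endgame

// Linear blend of the two halves of a score for a given phase.
constexpr int interpolate(Score s, int phase) {
    return (s.mg * phase + s.eg * (PHASE_MIDGAME - phase)) / PHASE_MIDGAME;
}
```

With these made-up numbers, `make_score(100, 80)` evaluates to 90 halfway through the game, but `make_score(100, 0)` evaluates to only 50. Only at the pure middlegame end of the scale do the two agree, so zeroing the endgame half changes almost every position, not just endgames.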

Miguel

syzygy · Post by **syzygy** » Sun Jan 05, 2014 2:12 am

michiguel wrote:
syzygy wrote:
syzygy wrote:Just for fun I am running a test on my machine. After 155 games the score is 50 wins, 18 losses and 82 draws for the unmodified SF.
The result after 500 games is 152 - 80 - 268, which is a bit less disastrous, but it seems rather unlikely that your proposal does not lose Elo.

This is not surprising. There was no good reason to believe that this change would not hurt playing strength. Your gut feeling is not worth any Elo.
This is what I was trying to explain to Lyudmil. Setting endgame values to zero disrupts the middle game interpolated values completely. In fact, those will be lower, which means that the effective material values and their relationship will be messed up too.

It is not easy to reason with him (try to find an argument against "don't you see, it is obvious"), but this time the idea was simple to test and refute.
