Help me to test an idea for Stockfish (I don't know programm

hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Help me to test an idea for Stockfish (I don't know prog

Post by hgm »

syzygy wrote:
Lyudmil Tsvetkov wrote:You have a point here, but do not you have terms that apply only for the middlegame, or only for the endgame? Those should also make transitions less smooth, theoretically.

Let us take a term like a bishop pin, for example. Difficult to believe you can do well without it in the middlegame, but it is totally useless in the endgame.
Miguel's point is this: if a particular term is worth say 100 in a "pure middlegame" position and 0 in a "pure endgame" position, then what SF's scheme does is exactly that. In addition, if the position being evaluated is somewhere between a "pure middlegame" and a "pure endgame" position, then the bonus being awarded is somewhere between 100 and 0 (linearly interpolated).
Sure, but what Lyudmil wants is a term that could be important only in the middle game (i.e. not interpolated linearly, but by a second-order polynomial, controlled by a third parameter).
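
To make the difference concrete, here is a minimal sketch of the two interpolation schemes. It is only an illustration under stated assumptions: the phase variable t (1.0 = pure middlegame, 0.0 = pure endgame), the function names, and the choice of the third parameter as the value at t = 0.5 are all hypothetical and not taken from Stockfish.

```python
def linear_taper(mg, eg, t):
    """Linear interpolation: two parameters, mg at t=1 and eg at t=0."""
    return eg + (mg - eg) * t


def quadratic_taper(mg, mid, eg, t):
    """Second-order interpolation through three points.

    The curve passes through eg at t=0, mid at t=0.5 and mg at t=1
    (Lagrange form). `mid` is the hypothetical third parameter.
    """
    w_eg = 2.0 * (t - 0.5) * (t - 1.0)
    w_mid = 4.0 * t * (1.0 - t)
    w_mg = 2.0 * t * (t - 0.5)
    return eg * w_eg + mid * w_mid + mg * w_mg


# A bonus worth 100 in the middlegame and 0 in the endgame.
# With mid=90 the bonus stays close to full value for longer and only
# drops off late, which linear interpolation cannot express.
for t in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(t, linear_taper(100, 0, t), quadratic_taper(100, 90, 0, t))
```

If mid is set to the average of mg and eg, the quadratic form collapses back to the linear one, so the third parameter only matters when the term should bend away from the straight line.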
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Help me to test an idea for Stockfish (I don't know prog

Post by syzygy »

hgm wrote:
syzygy wrote:
Lyudmil Tsvetkov wrote:You have a point here, but do not you have terms that apply only for the middlegame, or only for the endgame? Those should also make transitions less smooth, theoretically.

Let us take a term like a bishop pin, for example. Difficult to believe you can do well without it in the middlegame, but it is totally useless in the endgame.
Miguel's point is this: if a particular term is worth say 100 in a "pure middlegame" position and 0 in a "pure endgame" position, then what SF's scheme does is exactly that. In addition, if the position being evaluated is somewhere between a "pure middlegame" and a "pure endgame" position, then the bonus being awarded is somewhere between 100 and 0 (linearly interpolated).
Sure, but what Lyudmil wants is a term that could be only important in the middle game. (i.e. not interpolated linearly, but by a second-order polynomial, controlled by a third parameter).
Hmm, are you sure he wants that? I have some difficulty reading a desire for quadratic interpolation into his post. Do you mean that the third parameter would be "number of complications"?

In any case, the point would remain that there are good reasons for applying some kind of interpolation. There is no clean split between middle game and endgame, or between positions with many complications and positions with few complications.
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: Help me to test an idea for Stockfish (I don't know prog

Post by xmas79 »

michiguel wrote:In addition, PST values are worthless per se, what matters is the relative value among the squares. Putting all together, and to be more precise, what matters is the relative slope between the squares
This is not completely true: a 10cp penalty for a white knight on g1 and a 10cp bonus for a white knight on f3 will make a difference of 20cp on a Ng1-f3 move, but if you have a 10cp penalty for a bishop on c1 and a black knight captures that bishop, the penalty will disappear. That means that big bonuses/penalties in the PST invite strange things IMHO. Want to capture the opponent's bishops and save yours? Add +100cp to every square of the bishop PST. Does that make sense?
mohzus
Posts: 106
Joined: Tue Sep 24, 2013 2:54 am

Re: Help me to test an idea for Stockfish (I don't know prog

Post by mohzus »

Lyudmil Tsvetkov wrote:Hi Robert.
Now that you have started modifying this table, if you would accept to test if possible a pseudoidea of mine: it seems to me that white bishops on d1 and e1 in the middlegame deserve much bigger penalty than just 8cps. I would probably double it. Actually, I think bishops retreating on d1 and e1 are much worse located than c1 and f1. There they take some other pieces' places. And Stockfish sometimes likes to retreat on e8 with its black bishop when it should not do so under any circumstances.

So maybe -25cps penalty for white bishops on d1/e1, black on d8/e8 for the middlegame could be meaningful.

Sorry to jump into your thread like a bum, but I could not resist seeing those nice tables.
Hi Lyudmil!
I personally have no problem applying your idea. I will wait a few days though, since the fishtest framework is ultra busy right now. People are submitting new ideas at a pace I have not seen before. My test has been scheduled, but it is still far from being run.
In fact, what I did was based on more than just a "gut feeling" that SF needed a bigger penalty for bishops on their starting squares. I made up my mind from a position that occurred in the nTCEC (a King's Indian game against Bouquet, where SF played Bg5, Black replied with h6 and then SF played Bc1, which was equivalent to passing a move right in the opening).
I tried adding a penalty of 8 cp, bringing the total penalty to 43 cp. That didn't fix SF; it would still play Bc1. I tried a penalty of 54 cp, and SF would still play Bc1. Lastly I tried doubling the penalty (i.e. a penalty of 70 cp) and this "fixed" SF at least up to depth 34: it would now play Be3, which seems better than Bc1 to me. If you are interested, the FEN is r1bq1rk1/1pp2pb1/n2p1npp/p2Pp1B1/2P1P3/2N2N2/PP2BPPP/R2Q1RK1 w - - 0 10.
If you have a position in mind where SF played Bd1 or Be1 when you believe it should not have, please let me know. I will try your idea by modifying the code and adding penalties until the problem is fixed.
In general I don't think this method of "fixing" SF would make it stronger in terms of Elo. Maybe a bit, but not much. But I could be wrong too, I don't know much about programming and even less about chess programming.

Lastly, you could even create a github account and try to apply your changes to SF by yourself :) I'm sure this forum would help you if you have any trouble doing so.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Help me to test an idea for Stockfish (I don't know prog

Post by michiguel »

xmas79 wrote:
michiguel wrote:In addition, PST values are worthless per se, what matters is the relative value among the squares. Putting all together, and to be more precise, what matters is the relative slope between the squares
This is not completely true: a 10cp penalty for a white knight in g1 and a 10cp bonus for a white knight in f3 will make a difference of 20cp on a Ng1-f3 move, but if you have a 10cp penalty for bishop in c1 and a black knight captures that bishop, penalty will disappear.
That penalty is part of the material. Every bonus/penalty of the PST is intrinsically added to the material. What matters is the difference between them, as I mentioned.

That means that big bonus/penalty on PST are the way for strange things IMHO. Wanna capture opponent bishops and save yours? Add +100cp to every cell on the bishop PST. Makes sense?
You just increased the value of the bishop in that way. In fact, some engines do not have values for the pieces, they incorporate them directly into the PST. That is why the absolute values do not mean much in terms of "placing" pieces, just the relative values. The absolute values are part of the material.
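
A small sketch of that argument, with made-up numbers. The piece value, the three-square table and the helper names are hypothetical and only serve to show that a constant offset on every square behaves exactly like extra material.

```python
# Hypothetical values in centipawns; not taken from any real engine.
BISHOP_VALUE = 330
BISHOP_PST = {"c1": -10, "e3": 5, "g5": 10}


def piece_worth(material, pst, square):
    """What the evaluation loses when the piece on `square` is captured."""
    return material + pst[square]


def move_gain(pst, origin, target):
    """Evaluation change from moving the piece from `origin` to `target`."""
    return pst[target] - pst[origin]


def shift_table(pst, offset):
    """Add the same offset to every square of a table."""
    return {square: value + offset for square, value in pst.items()}


shifted = shift_table(BISHOP_PST, 100)

# Placement preferences are untouched by the offset...
print(move_gain(BISHOP_PST, "c1", "e3"), move_gain(shifted, "c1", "e3"))
# ...while the capture value changes as if the bishop itself were worth more.
print(piece_worth(BISHOP_VALUE, shifted, "c1"),
      piece_worth(BISHOP_VALUE + 100, BISHOP_PST, "c1"))
```

Both lines print a matching pair: the offset never shows up in a move from one square to another, only in trades, which is where material lives.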

Miguel
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Help me to test an idea for Stockfish (I don't know prog

Post by michiguel »

Lyudmil Tsvetkov wrote:
michiguel wrote:
Lyudmil Tsvetkov wrote:Btw., who needs endgame psqt?

My impression is that endgames in general are so simple for modern top engines, that they would do quite well even without endgame psqt. But still not without midgame psqt. Has anyone tried to measure the impact of psqt in the middlegame and endgame? I would think that for the middlegame the impact would be measurable, but probably negligible for the endgame.
"Endgame" values have nothing to do with the "end game" per se, but they are a way to determine a straight line that links all the different degrees of material content. An identical (and possibly theoretically better for tuning!) way to do all this would be to have an "initial" value and a "slope" (score change per unit of material change).

In addition, PST values are worthless per se, what matters is the relative value among the squares. Putting it all together, and to be more precise, what matters is the relative slope between the squares. It is not entirely correct to think in absolute terms of endgame or middlegame. In fact, this scheme of having two values and interpolating between them was set up to avoid broken transitions and make them smooth.

Miguel
You have a point here, but do not you have terms that apply only for the middlegame, or only for the endgame? Those should also make transitions less smooth, theoretically.

Let us take a term like a bishop pin, for example. Difficult to believe you can do well without it in the middlegame, but it is totally useless in the endgame. Actually, if you score a bishop pin in the endgame, it is quite probable that the transition will be less smooth, as a bishop pin has meaning only in positions with many pieces and complications, and is meaningless in the endgame.

Same with psqt, if endgame psqt only does damage, why not leave psqt just for the middlegame? When you have a position with just 5 or 6 pieces at most, psqt will hardly help, but maybe I am wrong and that is why I asked if someone has tested the performance of psqt separately for the middlegame and endgame.

One thing I know for sure: I have never seen a top engine go wrong in a simple pawn endgame, and never means never; I might have seen a couple of examples where a top engine misplayed simple one- or two-piece endings, but this is extremely rare. On the other hand, engines still make many mistakes in the middlegame. Conclusion: engines play the endgame much better than the middlegame, so why not experiment with putting different eval and search weight on the two stages? Especially as concerns eval, I might define a much bigger range of parameters that would apply only in the middlegame, or only in the endgame.
I am not sure I was clear. The endgame PST is not about the endgame. It is just a name. With interpolated techniques, there is no such thing as endgames. The whole game is a gradient. You can read about it here
http://chessprogramming.wikispaces.com/Tapered+Eval

If you remove the endgame PST, you completely disrupt the values that will be assigned in the middlegame and even the opening.
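
A minimal sketch of why that happens, assuming a plain linear tapered eval. The phase scale (24 = all pieces on the board, 0 = none left), the two PST entries and the function names are hypothetical. It also shows that the "two values" form and the "initial value plus slope" form are the same straight line.

```python
MAX_PHASE = 24  # hypothetical phase scale: 24 = full material, 0 = bare kings


def tapered(mg, eg, phase):
    """Linear blend of the two table entries according to game phase."""
    return (mg * phase + eg * (MAX_PHASE - phase)) / MAX_PHASE


def initial_plus_slope(initial, slope, phase):
    """Same line, written as a start value plus a change per unit of material."""
    return initial + slope * (MAX_PHASE - phase)


mg, eg = -35.0, -10.0            # made-up entries for one square
slope = (eg - mg) / MAX_PHASE    # score change per unit of material traded

phase = 18                       # a middlegame with a few pieces traded
print(tapered(mg, eg, phase))                 # value with both tables
print(initial_plus_slope(mg, slope, phase))   # identical, slope form
print(tapered(mg, 0.0, phase))                # endgame table zeroed out
```

Zeroing the "endgame" entry changes the slope, so the value at phase 18 changes as well, even though that position is nowhere near an endgame.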

Miguel
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Help me to test an idea for Stockfish (I don't know prog

Post by Lyudmil Tsvetkov »

mohzus wrote:
Lyudmil Tsvetkov wrote:Hi Robert.
Now that you have started modifying this table, if you would accept to test if possible a pseudoidea of mine: it seems to me that white bishops on d1 and e1 in the middlegame deserve much bigger penalty than just 8cps. I would probably double it. Actually, I think bishops retreating on d1 and e1 are much worse located than c1 and f1. There they take some other pieces' places. And Stockfish sometimes likes to retreat on e8 with its black bishop when it should not do so under any circumstances.

So maybe -25cps penalty for white bishops on d1/e1, black on d8/e8 for the middlegame could be meaningful.

Sorry to jump into your thread like a bum, but I could not resist seeing those nice tables.
Hi Lyudmil!
I personally have no problem applying your idea. I will wait a few days though, since the fishtest framework is ultra busy right now. People are submitting new ideas at a pace I have not seen before. My test has been scheduled, but it is still far from being run.
In fact, what I did was based on more than just a "gut feeling" that SF needed a bigger penalty for bishops on their starting squares. I made up my mind from a position that occurred in the nTCEC (a King's Indian game against Bouquet, where SF played Bg5, Black replied with h6 and then SF played Bc1, which was equivalent to passing a move right in the opening).
I tried adding a penalty of 8 cp, bringing the total penalty to 43 cp. That didn't fix SF; it would still play Bc1. I tried a penalty of 54 cp, and SF would still play Bc1. Lastly I tried doubling the penalty (i.e. a penalty of 70 cp) and this "fixed" SF at least up to depth 34: it would now play Be3, which seems better than Bc1 to me. If you are interested, the FEN is r1bq1rk1/1pp2pb1/n2p1npp/p2Pp1B1/2P1P3/2N2N2/PP2BPPP/R2Q1RK1 w - - 0 10.
If you have a position in mind where SF played Bd1 or Be1 when you believe it should not have, please let me know. I will try your idea by modifying the code and adding penalties until the problem is fixed.
In general I don't think this method of "fixing" SF would make it stronger in terms of Elo. Maybe a bit, but not much. But I could be wrong too, I don't know much about programming and even less about chess programming.

Lastly, you could even create a github account and try to apply your changes to SF by yourself :) I'm sure this forum would help you if you have any trouble doing so.
What Stockfish needs, for me, is some huge penalty for slow scheduling of tests, both for the middlegame and the endgame. :D

I think returning the bishop to a square it previously occupied on the first rank does not always and necessarily mean the move is bad, but it might be. The problem arises when the engine does so frequently. I do not have a specific example to show right now, but in my games Stockfish, when playing Black, very frequently likes to retreat its bishop to e8, as if to guard the king, for example by additionally defending g6, even when such a retreat is not justified. I have observed this as a pattern, and therefore I suggested the d1/e1 penalty might work.

Sorry, Robert, I asked you because I know you are a kind person. If you find the time, you can also test my idea; otherwise please do not bother.

I think your programming knowledge is as far above mine as the sky is above the earth...
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Help me to test an idea for Stockfish (I don't know prog

Post by Lyudmil Tsvetkov »

michiguel wrote:
Lyudmil Tsvetkov wrote:
michiguel wrote:
Lyudmil Tsvetkov wrote:Btw., who needs endgame psqt?

My impression is that endgames in general are so simple for modern top engines, that they would do quite well even without endgame psqt. But still not without midgame psqt. Has anyone tried to measure the impact of psqt in the middlegame and endgame? I would think that for the middlegame the impact would be measurable, but probably negligible for the endgame.
"Endgame" values have nothing to do with the "end game" per se, but they are a way to determine a straight line that links all the different degrees of material content. An identical (and possibly theoretically better for tuning!) way to do all this would be to have an "initial" value and a "slope" (score change per unit of material change).

In addition, PST values are worthless per se, what matters is the relative value among the squares. Putting it all together, and to be more precise, what matters is the relative slope between the squares. It is not entirely correct to think in absolute terms of endgame or middlegame. In fact, this scheme of having two values and interpolating between them was set up to avoid broken transitions and make them smooth.

Miguel
You have a point here, but do not you have terms that apply only for the middlegame, or only for the endgame? Those should also make transitions less smooth, theoretically.

Let us take a term like a bishop pin, for example. Difficult to believe you can do well without it in the middlegame, but it is totally useless in the endgame. Actually, if you score a bishop pin in the endgame, it is quite probable that the transition will be less smooth, as a bishop pin has meaning only in positions with many pieces and complications, and is meaningless in the endgame.

Same with psqt, if endgame psqt only does damage, why not leave psqt just for the middlegame? When you have a position with just 5 or 6 pieces at most, psqt will hardly help, but maybe I am wrong and that is why I asked if someone has tested the performance of psqt separately for the middlegame and endgame.

One thing I know for sure: I have never seen a top engine go wrong in a simple pawn endgame, and never means never; I might have seen a couple of examples where a top engine misplayed simple one- or two-piece endings, but this is extremely rare. On the other hand, engines still make many mistakes in the middlegame. Conclusion: engines play the endgame much better than the middlegame, so why not experiment with putting different eval and search weight on the two stages? Especially as concerns eval, I might define a much bigger range of parameters that would apply only in the middlegame, or only in the endgame.
I am not sure I was clear. The endgame PST is not about the endgame. It is just a name. With interpolated techniques, there is no such thing as endgames. The whole game is a gradient. You can read about it here
http://chessprogramming.wikispaces.com/Tapered+Eval

If you remove the endgame PST, you completely disrupt the values that will be assigned in the middlegame and even the opening.

Miguel
I understood you quite well, Miguel.

I read about tapered eval and know what it means and how important it is.

You would disrupt tapered eval if the endgame psqt is meaningful. If it is not, you disrupt nothing. And in the endgame, I really think, you do not need psqt very much; it might actually be misleading there, just like assigning a bonus for a bishop pin in the endgame would be. So when you interpolate between middlegame and endgame, you just stick to the middlegame psqt values until the transition is done, for example after material decreases below a certain point. Then you use no psqt at all.

That will hardly be damaging in any way, not least because my impression is that most psqt tables are very crude and engines do not rely on them as much as on the interplay of the remaining tuned terms.
Ajedrecista
Posts: 1968
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Suggestions for Fishtest.

Post by Ajedrecista »

Hello:

I am not writing in this thread about SF but about Fishtest. There are times when tests with three cores are scheduled and I see users with (for example) five cores contributing to that test while other single-core tests are running simultaneously. Is that a waste of resources? Wouldn't it be better to limit those MP tests to computers/users with 3, 6, 9... cores? There are a lot of users with quads that use three cores, so there should not be any problems.

Regarding SPRT tests, I have a suggestion that may or may not be useful. Please let me explain the idea in the next paragraphs... it is about predicting the remaining time of an SPRT test.

I think there are some SPRT simulators written by Lucas, Michel, etc. I even wrote one! The idea is to start predicting the remaining time once a certain number of games is reached (say 1000, for example). Then the drawelo and bayeselo parameters can be estimated from the sample:

Code:

W = wins/games.
L = losses/games.

drawelo = 200*log10[(1 - W)*(1 - L)/(W*L)].
bayeselo = 200*log10{(1 - L)*W/[L*(1 - W)]}.
Then run some hundreds or thousands of simulations to get the mean or median number of games expected before the SPRT stopping rule triggers. Since each batch of simulations can take several seconds, it is best to run them at a fixed frequency (for example once a minute) instead of updating the simulations after each finished game. For example:

Code:

faster_first_move

LLR: -0.19 (-2.94,2.94) [0.00,6.00]
Total: 50846 W: 8150 L: 7923 D: 34773

sprt @ 60+0.05
th 1

Code:

drawelo ~ 290.6060
bayeselo ~ 2.9142
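
For anyone who wants to check these two numbers, here is a minimal sketch of the estimate. It assumes the logarithm in the formulas is base 10, which is what makes the figures above come out; the function name is hypothetical.

```python
import math


def estimate_bayeselo(wins, losses, draws):
    """Estimate (drawelo, bayeselo) from a W/L/D sample.

    Assumes base-10 logarithms and the two formulas given in the post.
    """
    games = wins + losses + draws
    w = wins / games
    l = losses / games
    drawelo = 200.0 * math.log10((1.0 - w) * (1.0 - l) / (w * l))
    bayeselo = 200.0 * math.log10((1.0 - l) * w / (l * (1.0 - w)))
    return drawelo, bayeselo


# Sample from the faster_first_move test: W: 8150 L: 7923 D: 34773
drawelo, bayeselo = estimate_bayeselo(8150, 7923, 34773)
print(f"drawelo ~ {drawelo:.4f}")
print(f"bayeselo ~ {bayeselo:.4f}")
```

The result should agree with the quoted values up to rounding. Note that the estimate needs at least one win and one loss in the sample, otherwise the ratios are undefined.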
I get an average of 82178 games after 1000 simulations (calculation time: 39.76 seconds on my PC from 2006). There were 446 passes and 554 fails in my SPRT simulation.

Then, on average: 82178 - 50846 = 31332 remaining games. Knowing the speed and number of cores of the users running this SPRT test (at this moment only two cores with a speed of 1.06 MNps per core), you can estimate the remaining time taking the mean as reference (you can also take the median as reference, as well as the shortest and longest simulations to get an idea of the shortest and longest remaining time after X simulations).
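
The procedure can be sketched roughly as follows. This is a simplified stand-in and not the simulator used for the numbers above, nor the Fishtest code: it assumes the BayesElo win/draw/loss model implied by the two formulas, keeps drawelo fixed at its estimate, and runs a plain Wald SPRT on the three outcomes. All function and parameter names are hypothetical.

```python
import math
import random


def bayeselo_probs(elo, drawelo):
    """Return (win, draw, loss) probabilities under the BayesElo model."""
    win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return win, 1.0 - win - loss, loss


def simulate_sprt(true_elo, drawelo, elo0=0.0, elo1=6.0,
                  alpha=0.05, beta=0.05, start=(0, 0, 0),
                  rng=None, max_games=1_000_000):
    """Play random games until the LLR leaves (lower, upper).

    `start` is (wins, draws, losses) already played. Returns the total
    number of games and whether the test passed (upper bound crossed).
    A run that hits `max_games` is reported as not passed.
    """
    rng = rng or random.Random()
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    p_true = bayeselo_probs(true_elo, drawelo)
    p0 = bayeselo_probs(elo0, drawelo)
    p1 = bayeselo_probs(elo1, drawelo)
    # Log-likelihood-ratio increment for a win, a draw and a loss.
    step = [math.log(b / a) for a, b in zip(p0, p1)]
    llr = sum(n * s for n, s in zip(start, step))
    games = sum(start)
    while lower < llr < upper and games < max_games:
        outcome = rng.choices((0, 1, 2), weights=p_true)[0]
        llr += step[outcome]
        games += 1
    return games, llr >= upper
```

Calling simulate_sprt a few hundred times with the estimated bayeselo and drawelo and start=(8150, 34773, 7923), then averaging the returned game counts and subtracting the 50846 games already played, gives the kind of remaining-games figure described above. With alpha = beta = 0.05 the bounds come out at about -2.94 and 2.94, matching the LLR interval shown for the test.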

With ten more played games (+1 -1 = 8):

Code:

faster_first_move

LLR: -0.20 (-2.94,2.94) [0.00,6.00]
Total: 50856 W: 8151 L: 7924 D: 34781

sprt @ 60+0.05
th 1

Code:

drawelo ~ 290.6209
bayeselo ~ 2.9138
Now I get an average of 83593 games and 83593 - 50856 = 32737 remaining games, with 442 passes and 558 fails in 1000 simulations.

It could be applied even more easily to fixed-game tests. Of course I am clueless about the viability of my suggestions.

Regards from Spain.

Ajedrecista.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Help me to test an idea for Stockfish (I don't know prog

Post by bob »

syzygy wrote:
Lyudmil Tsvetkov wrote:You have a point here, but do not you have terms that apply only for the middlegame, or only for the endgame? Those should also make transitions less smooth, theoretically.

Let us take a term like a bishop pin, for example. Difficult to believe you can do well without it in the middlegame, but it is totally useless in the endgame.
Miguel's point is this: if a particular term is worth say 100 in a "pure middlegame" position and 0 in a "pure endgame" position, then what SF's scheme does is exactly that. In addition, if the position being evaluated is somewhere between a "pure middlegame" and a "pure endgame" position, then the bonus being awarded is somewhere between 100 and 0 (linearly interpolated).
That on/off term is a dangerous way to program, however. Berliner referred to these as "evaluation discontinuities" and they should be avoided at all costs. Right around the discontinuity, strange things happen in an exhaustive search, where the program might elect to destroy its own king safety and then trade the piece that takes it across that boundary. All for a couple of centipawns, where it turned a won position into a draw due to broken pawn structure or whatever. Smoothly transitioning, whether using the fruit-like interpolation or Miguel's slope approach avoids this discontinuity (there is still one present since most use integer scores, but that's a different topic) and the associated bizarre behavior it causes at times.
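
As a rough illustration of the discontinuity, the sketch below compares an on/off bonus with an interpolated one across all game phases. The phase scale (0 to 24), the threshold and the function names are assumptions made for the example, not anything from a real engine.

```python
MAX_PHASE = 24  # hypothetical phase scale: 24 = full material, 0 = none


def step_bonus(mg_bonus, phase, threshold=12):
    """On/off term: full bonus above the threshold, nothing below it."""
    return mg_bonus if phase >= threshold else 0


def tapered_bonus(mg_bonus, eg_bonus, phase):
    """Interpolated term: blends smoothly between the two values."""
    return (mg_bonus * phase + eg_bonus * (MAX_PHASE - phase)) / MAX_PHASE


def largest_jump(values):
    """Biggest change in score between two neighbouring phases."""
    return max(abs(a - b) for a, b in zip(values, values[1:]))


phases = range(MAX_PHASE + 1)
step_values = [step_bonus(100, p) for p in phases]
tapered_values = [tapered_bonus(100, 0, p) for p in phases]

print(largest_jump(step_values))     # the whole bonus vanishes in one trade
print(largest_jump(tapered_values))  # only a small slice changes per trade
```

With the on/off version a single exchange that crosses the threshold is worth the entire bonus, which is exactly the kind of incentive a deep search will find and exploit.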

I used to use Miguel's suggestion, but it has a drawback in integers. If you "scale" each term, there are not very many "steps" available in a term that is just 10 centipawns. If you first add 'em all together, and scale the final product, the scaling goes significantly smoother.
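
The integer effect is easy to see in a toy example. The sketch below assumes five terms of 10 centipawns each and a phase scale of 0 to 24; both numbers and the function names are made up for illustration.

```python
MAX_PHASE = 24          # hypothetical phase scale
TERMS = [10] * 5        # five small terms of 10 centipawns each


def scale_each_term(terms, phase):
    """Scale every term separately with integer division, then add."""
    return sum(term * phase // MAX_PHASE for term in terms)


def scale_the_sum(terms, phase):
    """Add all terms first, then scale the total once."""
    return sum(terms) * phase // MAX_PHASE


phases = range(MAX_PHASE + 1)
per_term = [scale_each_term(TERMS, p) for p in phases]
summed = [scale_the_sum(TERMS, p) for p in phases]

print(len(set(per_term)))  # number of distinct scores when scaling per term
print(len(set(summed)))    # number of distinct scores when scaling the sum
print(scale_each_term(TERMS, 13), scale_the_sum(TERMS, 13))
```

Scaling each 10 cp term on its own can only ever produce 11 different values, so the total moves in coarse steps, while scaling the sum gives a separate value for every phase.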