Help me to test an idea for Stockfish (I don't know programm

mohzus
Posts: 106
Joined: Tue Sep 24, 2013 2:54 am

Re: Help me to test an idea for Stockfish (I don't know prog

Post by mohzus »

Lyudmil Tsvetkov wrote:
Hi Robert, so you are st (I would suppose you to be rt ) :)

I really do not have time to do tests (besides, I am not very good at testing); however, I have plenty of time for suggesting ideas for others to test... :D
Yes it's me. I think it stands for "stockfishbot" which is the name of my account in the fishtest and in FICS.
Lyudmil Tsvetkov wrote:The Fishtest, sorry to tell you my gut feeling, but some 80% of eval ideas tested are absolutely ludicrous, which means that ideas that have absolutely no chance to pass the test take up 4/5 of the testing time, at least what concerns eval ideas.

I have a fervent desire for someone to make a patch with bonus for longer chains of pawns, as this is very important, and can bring a lot of elo, but no one simply wants or is able to do such patch. And, concerning chains, longer chains are some 80% of what chains are all about. So they do just chain pawns, i.e. defended pawns, and some other chain-related things, but this is only 20% of what chains are all about.

My idea for longer chains was very simple, I will repeat it here again, maybe someone will be able to push a patch with this idea:

You give bonus for longer chains, that is additional to other bonus for the chain pawns, including rank, file, etc.
You give this bonus only to longer chains of pawns of 3 or more pawns in all.
You count the number of chain pawns along the same diagonal (anybody, is it possible to do this?).
In case you find 3 or more pawns along the same diagonal, you give the additional bonus:
- 10cps for 3 chain pawns
- 20cps for 4 chain pawns
- 30cps for 5 chain pawns

That is all, in the end you have a very nice positional engine.

(but that includes base chain pawns in the calculations, they are also important)

Another idea I very much would like to see implemented is to tune bishop and knight values for closed positions. The Joona Kiiski closed patch got yellow on LTC, but if you tune the piece values for that patch, it will quite probably pass the test. For closed positions, knight value is tuned up by some 10-20cps, while bishop value down by the same 10-20cps. In the end, instead of yellow, you will get green, but people are testing all kinds of unimaginable eval ideas and somehow miss the more important ones.
Good luck with that. If only I knew programming, I would try your ideas :) I somehow have the feeling that you know some of Stockfish's weaknesses really well.
Lyudmil Tsvetkov wrote:Btw, does anybody know why a successful patch that gets although a slight, but still winning percentage after 40 000 games does not get integrated into Stockfish? If you are positive after 40 000 games, you simply integrate it, why not?
I think that if it's an SPRT test that hasn't passed after 40k games but still shows a positive score, the patch adds little to no Elo. Such patches usually add several lines of code, so if they bring almost no Elo they are not considered worth it, at least under the SF philosophy.
It's not that rare to see Marco Costalba commit a patch that shows a very small or no Elo loss but simplifies the code.

To Ronald de Man, where do I have to modify "#define S(mg, eg) make_score(mg, eg)" ? Because I see it in evaluate.cpp (line 98), in pawns.cpp (line 31) and in psqtab (line 25).

When I tried my SF modification locally, I got 28 wins, 12 losses and 60 draws in favor of my version. In the fishtest it went up to

LLR: 1.26 (-2.94,2.94) [-1.50,4.50]
Total: 5214 W: 1029 L: 970 D: 3215
So that is +59 wins over 5214 games. But then it went downhill quickly. I'm a bit disheartened, I must say :)
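To put the two results above side by side, here is a minimal sketch that turns a W/L/D record into a score fraction and a logistic Elo estimate. It does not reproduce fishtest's LLR calculation; the function names are made up for illustration. The (-2.94, 2.94) interval happens to match Wald's SPRT bounds for alpha = beta = 0.05, but that fishtest uses those error rates is an inference from the numbers, not something stated in this thread.

```cpp
#include <cassert>
#include <cmath>

// Score fraction from the patched engine's point of view (a draw counts half).
double scoreFraction(int wins, int losses, int draws) {
    const int games = wins + losses + draws;
    assert(games > 0);
    return (wins + 0.5 * draws) / games;
}

// Logistic Elo difference implied by a score fraction strictly between 0 and 1.
double eloFromScore(double score) {
    assert(score > 0.0 && score < 1.0);
    return 400.0 * std::log10(score / (1.0 - score));
}

// Wald's SPRT stopping bounds on the log-likelihood ratio
// for type I error rate alpha and type II error rate beta.
double sprtLowerBound(double alpha, double beta) {
    return std::log(beta / (1.0 - alpha));
}

double sprtUpperBound(double alpha, double beta) {
    return std::log((1.0 - beta) / alpha);
}
```

With these helpers, the local 28-12-60 result corresponds to roughly +56 Elo, while the fishtest snapshot of 1029-970-3215 corresponds to roughly +4 Elo. A 100-game match simply cannot resolve a difference that small, which is why the local result looked so much better than what followed.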
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Help me to test an idea for Stockfish (I don't know prog

Post by Lyudmil Tsvetkov »

Ralph Stoesser wrote:
Lyudmil Tsvetkov wrote:
syzygy wrote: Maybe Lyudmil means that the endgame values of the piece square tables should all be set to 0. That would be easy to test. But I highly doubt that this would be better than the (I assume) well-tuned values that SF is using now. One thing is certain: that SF plays the endgame well is in no way evidence that setting the endgame values to 0 would have any benefit.
That was exactly what I was referring to, Ronald. I think psqt in endgames is fully useless in top engines (but only in top engines, as they have all other things that would allow them play endings without psqt), except probably some bonus for minors on the 5th and 6th ranks.

Anyone who would like to test how psqt performs in middlegame and endgame, could simply switch off the endgame values once, and then the middlegame values, the second time, i.e. setting zero values for each stage respectively. My guess is that engines will not suffer without endgame psqt, but will have a very hard time without middlegame psqt. Another problem with psqt is that whatever tables I have looked at, they are simply disastrous.
You may want to read this. http://stockfish.wikispaces.com/psqtab.h
Thanks Ralph, it is a lot easier that way.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Help me to test an idea for Stockfish (I don't know prog

Post by Lyudmil Tsvetkov »

syzygy wrote:
michiguel wrote:
syzygy wrote:
syzygy wrote:Just for fun I am running a test on my machine. After 155 games the score is 50 wins, 18 losses and 82 draws for the unmodified SF.
The result after 500 games is 152 - 80 - 268, which is a bit less disastrous, but it seems rather unlikely that your proposal does not lose Elo.

This is not surprising. There was no good reason to believe that this change would not hurt playing strength. Your gut feeling is not worth any Elo.
This is what I was trying to explain to Lyudmil. Setting endgame values to zero disrupts the middle game interpolated values completely. In fact, those will be lower, which means that the effective material values and their relationship will be messed up too.
It is not easy to reason with him (try to find an argument against "don't you see, it is obvious"), but this time the idea was simple to test and refute.
Well, the result seems convincing, but I might as well ask you to run the necessary number of games so that you have a statistical LOS (of course, I will not do that).

My suggestion was not that this will work automatically for Stockfish, but that it is an idea worth trying. You know very well that between a crude idea and implementation there is a looong way to go. You should adjust some things, draw your conclusions, etc. For example, you might want also to check how Stockfish performs without middlegame psqt and compare which table is more important (which was our initial aim). Then you might want to look for a way around crude interpolations (for example, when you switch on endgame eval).

I never said Stockfish will perform better without endgame psqt, but that endgame psqt is mostly useless, much more so than middlegame psqt.
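For reference, the likelihood of superiority (LOS) mentioned above can be approximated from the decisive games alone. This is a minimal sketch using the common normal approximation; the function name `los` is chosen here for illustration and is not taken from any engine or testing tool.

```cpp
#include <cassert>
#include <cmath>

// Approximate likelihood of superiority for the side with `wins` wins.
// Only decisive games enter the formula; draws do not change it
// under this normal approximation.
double los(int wins, int losses) {
    const int decisive = wins + losses;
    assert(decisive > 0);
    return 0.5 * (1.0 + std::erf((wins - losses) / std::sqrt(2.0 * decisive)));
}
```

Applied to the 500-game result quoted above (152 wins and 80 losses for the unmodified SF), `los(152, 80)` comes out above 99.9%, so under this approximation the sample is already large enough to be conclusive about the direction of the difference.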
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Help me to test an idea for Stockfish (I don't know prog

Post by Lyudmil Tsvetkov »

Vinvin wrote:
Lyudmil Tsvetkov wrote:You give bonus for longer chains, that is additional to other bonus for the chain pawns, including rank, file, etc.
You give this bonus only to longer chains of pawns of 3 or more pawns in all.
You count the number of chain pawns along the same diagonal (anybody, is it possible to do this?).
In case you find 3 or more pawns along the same diagonal, you give the additional bonus:
- 10cps for 3 chain pawns
- 20cps for 4 chain pawns
- 30cps for 5 chain pawns
I think that more than 4 chain pawns is counterproductive because the chain pawns can be attacked by a lever or by sacking a pawn to create weak pawn(s)
[d]8/8/1p3P2/2p1P3/3P4/2P5/1P6/8 w - - 0 1
Longer chains limit enemy pieces' activity in a much more efficient way than chains of just 2 pawns. The longer the chain, the better it performs. Your example with the 5 chain members is very good: this white chain would deserve the biggest additional bonus, followed by chains of 4 and 3 pawns. Such chains simply control a much larger area of the board. The fact that the chain members could be attacked does not change anything; it is like that for each eval term: it changes with time.

See for example how Komodo uses its chains, much more efficiently than Stockfish.
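On the question of whether pawns along the same diagonal can be counted: yes, with bitboards it takes only a few shifts. The following is a minimal sketch, not Stockfish code. The square numbering (bit 0 = a1, bit 63 = h8), all names, and the decision to cap the bonus at the 5-pawn value are assumptions made for illustration; only the 10/20/30 cp figures come from the proposal above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using Bitboard = std::uint64_t;  // assumed layout: bit 0 = a1, bit 7 = h1, bit 63 = h8

constexpr Bitboard FileA = 0x0101010101010101ULL;
constexpr Bitboard FileH = FileA << 7;

// Move every set bit one step along a diagonal, dropping bits that
// would otherwise wrap around the edge of the board.
Bitboard shiftNorthEast(Bitboard b) { return (b & ~FileH) << 9; }
Bitboard shiftNorthWest(Bitboard b) { return (b & ~FileA) << 7; }

// Length of the longest unbroken run of pawns in the direction of `shift`.
int longestRun(Bitboard pawns, Bitboard (*shift)(Bitboard)) {
    int length = 0;
    // After k iterations, `run` holds the pawns that have at least k
    // consecutive pawns directly behind them on the diagonal.
    for (Bitboard run = pawns; run != 0; run = shift(run) & pawns)
        ++length;
    return length;
}

// Longest diagonal chain over both diagonal directions.
int longestChain(Bitboard pawns) {
    return std::max(longestRun(pawns, shiftNorthEast),
                    longestRun(pawns, shiftNorthWest));
}

// Proposed extra bonus in centipawns: 10 / 20 / 30 for chains of 3 / 4 / 5.
// Capping longer chains at the 5-pawn value is an assumption.
int longChainBonus(Bitboard pawns) {
    const int length = std::min(longestChain(pawns), 5);
    return length >= 3 ? 10 * (length - 2) : 0;
}

// Helper for building test positions: file and rank are both 0-based.
constexpr Bitboard square(int file, int rank) {
    return Bitboard(1) << (8 * rank + file);
}
```

For the diagram above, the white pawns on b2, c3, d4, e5 and f6 form one chain of five, so `longChainBonus` returns 30, while the black pawns on b6 and c5 form a chain of two and get nothing. Whether such a term gains Elo is a separate question that only testing can answer.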
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Help me to test an idea for Stockfish (I don't know prog

Post by Lyudmil Tsvetkov »

mohzus wrote:
Lyudmil Tsvetkov wrote:
Hi Robert, so you are st (I would suppose you to be rt ) :)

I really do not have time to do tests (besides, I am not very good at testing); however, I have plenty of time for suggesting ideas for others to test... :D
Yes it's me. I think it stands for "stockfishbot" which is the name of my account in the fishtest and in FICS.
Lyudmil Tsvetkov wrote:The Fishtest, sorry to tell you my gut feeling, but some 80% of eval ideas tested are absolutely ludicrous, which means that ideas that have absolutely no chance to pass the test take up 4/5 of the testing time, at least what concerns eval ideas.

I have a fervent desire for someone to make a patch with bonus for longer chains of pawns, as this is very important, and can bring a lot of elo, but no one simply wants or is able to do such patch. And, concerning chains, longer chains are some 80% of what chains are all about. So they do just chain pawns, i.e. defended pawns, and some other chain-related things, but this is only 20% of what chains are all about.

My idea for longer chains was very simple, I will repeat it here again, maybe someone will be able to push a patch with this idea:

You give bonus for longer chains, that is additional to other bonus for the chain pawns, including rank, file, etc.
You give this bonus only to longer chains of pawns of 3 or more pawns in all.
You count the number of chain pawns along the same diagonal (anybody, is it possible to do this?).
In case you find 3 or more pawns along the same diagonal, you give the additional bonus:
- 10cps for 3 chain pawns
- 20cps for 4 chain pawns
- 30cps for 5 chain pawns

That is all, in the end you have a very nice positional engine.

(but that includes base chain pawns in the calculations, they are also important)

Another idea I very much would like to see implemented is to tune bishop and knight values for closed positions. The Joona Kiiski closed patch got yellow on LTC, but if you tune the piece values for that patch, it will quite probably pass the test. For closed positions, knight value is tuned up by some 10-20cps, while bishop value down by the same 10-20cps. In the end, instead of yellow, you will get green, but people are testing all kinds of unimaginable eval ideas and somehow miss the more important ones.
Good luck with that. If only I knew programming, I would try your ideas :) I somehow have the feeling that you know some of Stockfish's weaknesses really well.
Lyudmil Tsvetkov wrote:Btw, does anybody know why a successful patch that gets although a slight, but still winning percentage after 40 000 games does not get integrated into Stockfish? If you are positive after 40 000 games, you simply integrate it, why not?
I think that if it's an SPRT test that hasn't passed after 40k games but still shows a positive score, the patch adds little to no Elo. Such patches usually add several lines of code, so if they bring almost no Elo they are not considered worth it, at least under the SF philosophy.
It's not that rare to see Marco Costalba commit a patch that shows a very small or no Elo loss but simplifies the code.

To Ronald de Man, where do I have to modify "#define S(mg, eg) make_score(mg, eg)" ? Because I see it in evaluate.cpp (line 98), in pawns.cpp (line 31) and in psqtab (line 25).

When I tried my SF modification locally, I got 28 wins, 12 losses and 60 draws in favor of my version. In the fishtest it went up to

LLR: 1.26 (-2.94,2.94) [-1.50,4.50]
Total: 5214 W: 1029 L: 970 D: 3215
So that is +59 wins over 5214 games. But then it went downhill quickly. I'm a bit disheartened, I must say :)
Absolutely, Robert, absolutely, I subscribe to everything you say.
But still, I would not bury the closed eval patch, but rather integrate it and try to improve on it. In this case it is not a matter of simplification, speed, etc.; it is about introducing a whole new branch into Stockfish's play that should pay off with time and subsequent modifications of closed eval.

Stockfish badly needs that. In engine-engine matches only about 1% of games are closed and engines as a rule do not know how to play such positions. But if you know how to proceed in closed games, suddenly 50% of all games played might relate to closed positions.
Ralph Stoesser
Posts: 408
Joined: Sat Mar 06, 2010 9:28 am

Re: Help me to test an idea for Stockfish (I don't know prog

Post by Ralph Stoesser »

Lyudmil Tsvetkov wrote:
syzygy wrote:
michiguel wrote:
syzygy wrote:
syzygy wrote:Just for fun I am running a test on my machine. After 155 games the score is 50 wins, 18 losses and 82 draws for the unmodified SF.
The result after 500 games is 152 - 80 - 268, which is a bit less disastrous, but it seems rather unlikely that your proposal does not lose Elo.

This is not surprising. There was no good reason to believe that this change would not hurt playing strength. Your gut feeling is not worth any Elo.
This is what I was trying to explain to Lyudmil. Setting endgame values to zero disrupts the middle game interpolated values completely. In fact, those will be lower, which means that the effective material values and their relationship will be messed up too.
It is not easy to reason with him (try to find an argument against "don't you see, it is obvious"), but this time the idea was simple to test and refute.
Well, the result seems convincing, but I might as well ask you to run the necessary number of games so that you have a statistical LOS (of course, I will not do that).

My suggestion was not that this will work automatically for Stockfish, but that it is an idea worth trying. You know very well that between a crude idea and implementation there is a looong way to go. You should adjust some things, draw your conclusions, etc. For example, you might want also to check how Stockfish performs without middlegame psqt and compare which table is more important (which was our initial aim). Then you might want to look for a way around crude interpolations (for example, when you switch on endgame eval).

I never said Stockfish will perform better without endgame psqt, but that endgame psqt is mostly useless, much more so than middlegame psqt.
Don't forget about the king. King endgame PSQTs encode the knowledge to centralize the king. Removing that knowledge should hurt enough that one can foresee that "removing" all endgame PSQTs will not work. Moreover, whether something is worth trying depends not only on the idea itself but also on how hard it is to set up the experiment correctly. If you set all midgame or endgame PSQTs to 0, you simply do not get what you would probably expect. Instead, you mess up the whole balance of the PSQTs. No chance that someone moderately experienced would schedule such a test for you. Typical case of "try it and see for yourself", preferably locally on your own machine, I would say :-). If one did not have to determine the bench number of a patch, it would be easy for less technically inclined people to set up a patch that changes some predefined values and to schedule a test in the testing framework. Only a web browser would be needed to do so. But to determine the bench number one has to compile the patch and run the bench command, and that seems to be a hurdle for many people.
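To make the "balance" point concrete, here is a minimal sketch of a tapered evaluation, where every term carries a middlegame and an endgame value and the two are blended by game phase. The 0..128 phase scale and the names are assumptions for illustration; this is not a copy of Stockfish's interpolation code.

```cpp
#include <cassert>

// Assumed phase scale: 128 = full middlegame material, 0 = pure endgame.
constexpr int PhaseMidgame = 128;

// Blend a middlegame and an endgame value according to the game phase.
int interpolate(int mg, int eg, int phase) {
    assert(phase >= 0 && phase <= PhaseMidgame);
    return (mg * phase + eg * (PhaseMidgame - phase)) / PhaseMidgame;
}
```

A square bonus of 40 in both tables is worth 40 at any phase. Zero the endgame half and the same bonus is worth only 20 at phase 64, i.e. long before anything resembling an endgame is on the board. So zeroing the endgame tables also shrinks the values used in late middlegames, which is the disruption described above.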
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Help me to test an idea for Stockfish (I don't know prog

Post by Lyudmil Tsvetkov »

Ralph Stoesser wrote:
Lyudmil Tsvetkov wrote:
syzygy wrote:
michiguel wrote:
syzygy wrote:
syzygy wrote:Just for fun I am running a test on my machine. After 155 games the score is 50 wins, 18 losses and 82 draws for the unmodified SF.
The result after 500 games is 152 - 80 - 268, which is a bit less disastrous, but it seems rather unlikely that your proposal does not lose Elo.

This is not surprising. There was no good reason to believe that this change would not hurt playing strength. Your gut feeling is not worth any Elo.
This is what I was trying to explain to Lyudmil. Setting endgame values to zero disrupts the middle game interpolated values completely. In fact, those will be lower, which means that the effective material values and their relationship will be messed up too.
It is not easy to reason with him (try to find an argument against "don't you see, it is obvious"), but this time the idea was simple to test and refute.
Well, the result seems convincing, but I might as well ask you to run the necessary number of games so that you have a statistical LOS (of course, I will not do that).

My suggestion was not that this will work automatically for Stockfish, but that it is an idea worth trying. You know very well that between a crude idea and implementation there is a looong way to go. You should adjust some things, draw your conclusions, etc. For example, you might want also to check how Stockfish performs without middlegame psqt and compare which table is more important (which was our initial aim). Then you might want to look for a way around crude interpolations (for example, when you switch on endgame eval).

I never said Stockfish will perform better without endgame psqt, but that endgame psqt is mostly useless, much more so than middlegame psqt.
Don't forget about the king. King endgame PSQTs encode the knowledge to centralize the king. Removing that knowledge should hurt enough that one can foresee that "removing" all endgame PSQTs will not work. Moreover, whether something is worth trying depends not only on the idea itself but also on how hard it is to set up the experiment correctly. If you set all midgame or endgame PSQTs to 0, you simply do not get what you would probably expect. Instead, you mess up the whole balance of the PSQTs. No chance that someone moderately experienced would schedule such a test for you. Typical case of "try it and see for yourself", preferably locally on your own machine, I would say :-). If one did not have to determine the bench number of a patch, it would be easy for less technically inclined people to set up a patch that changes some predefined values and to schedule a test in the testing framework. Only a web browser would be needed to do so. But to determine the bench number one has to compile the patch and run the bench command, and that seems to be a hurdle for many people.
Hi Ralph.

But I never meant to disable the king psqt or the pawn psqt. I was probably not specific enough: my intention was to disable only the NBRQ tables in the endgame. Now I think that might have been the main reason for Ronald's results. Did you disable all tables, Ronald? King and pawn tables are of course useful for the endgame.

Now, I am not so concerned about psqt, Ralph, but more about the good positional ideas that might fail. Actually, personally I am not bothered how Stockfish will perform, my only problem is that I intend playing a match against Stockfish in the summer with full strength (now I am playing with about 20-30% strength), and in case Stockfish does not improve by then with at least some 100-150 elo, forget about Stockfish drawing the match... :shock: :shock:

That is my biggest concern, really. :D
velmarin
Posts: 1600
Joined: Mon Feb 21, 2011 9:48 am

Re: Help me to test an idea for Stockfish (I don't know prog

Post by velmarin »

How funny, this thread is priceless.
:P
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Help me to test an idea for Stockfish (I don't know prog

Post by syzygy »

Lyudmil Tsvetkov wrote:My suggestion was not that this will work automatically for Stockfish, but that it is an idea worth trying.
This is what you wrote:
My guess is that engines will not suffer without endgame psqt
Very clearly SF does suffer.
You know very well that between a crude idea and implementation there is a looong way to go. You should adjust some things, draw your conclusions, etc.
Sure, so after setting all values to 0 as per your suggestion we should now adjust each of the values until SF is back at its old level, right?

I suggest instead to just stick to the tried and tested endgame values. Maybe it can help to tweak some of them, but setting them all to 0 and expecting that a well-tuned engine such as SF would not suffer is... naive...
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Help me to test an idea for Stockfish (I don't know prog

Post by Sven »

Lyudmil Tsvetkov wrote:But I never meant to disable the king psqt or the pawn psqt. I was probably not specific enough: my intention was to disable only the NBRQ tables in the endgame. Now I think that might have been the main reason for Ronald's results. Did you disable all tables, Ronald? King and pawn tables are of course useful for the endgame.
According to the link Ralph gave above, for SF there are two PSQ tables that are very significant for playing strength: Knight and King tables (see bottom of that page). So I would assume that setting Knight values to zero for the endgame would hurt strength.

Pawn and Rook PSQ tables are almost switched off for the endgame already in SF (at least according to that link above), both have constant values for all squares (Pawn = -8, Rook = +3) which could as well be replaced by modifying the endgame material value by that score and setting the endgame PSQ value to 0. Doing so would also be the better choice for other PSQ tables when experimenting with "switching them off": instead of only setting a table to zero, the material value should be adjusted accordingly (e.g. by adding some intelligently weighted average of the PSQ values) as well so that "switching off" only means not to get different values per square but keeps the overall scoring balanced. This problem would disappear if all PSQ values were in fact already averaged around zero and the difference were already part of the material score (but that is another issue).
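The "switch off but keep the balance" idea can be sketched as follows. This is a minimal illustration with made-up names, and it uses a plain arithmetic mean over all 64 squares where the post above suggests an intelligently weighted average, so treat it as the simplest possible variant rather than a recommendation.

```cpp
#include <array>
#include <cassert>
#include <numeric>

struct FlattenedTable {
    int materialValue;        // material value with the table's average folded in
    std::array<int, 64> psq;  // per-square values, all zero after flattening
};

// "Switch off" a PSQ table without shifting the overall score level:
// move the table's mean into the material value and zero the squares.
FlattenedTable switchOffTable(int materialValue, const std::array<int, 64>& psq) {
    const int sum = std::accumulate(psq.begin(), psq.end(), 0);
    const int average = sum / 64;  // integer division, rounds toward zero
    return FlattenedTable{materialValue + average, {}};
}
```

For a table that is constant anyway, such as the endgame pawn table with -8 on every square mentioned above, this changes nothing in the evaluation: a hypothetical endgame pawn value of 100 simply becomes 92 with an all-zero table. For a table with real per-square structure, the positional knowledge is lost but the piece is still valued about the same on average, which isolates the effect one actually wants to measure.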