Ahhh... the holy grail of computer chess

marcelk · Post by **marcelk** » Wed Aug 24, 2011 1:55 am

Don wrote:
marcelk wrote:
For me it is time to focus more on my long-term engine plans (esp. autonomous knowledge synthesis)

Ahhh the holy grail of computer chess! If you figure that out please let me know. I think many of us have toyed with that at one time or another.

Open a new thread and tell us your thoughts on "autonomous knowledge synthesis", maybe we can brainstorm this in the forum (instead of obsessive over what happens to Vas.)

that works so well in my current engine but that is also quite easy to decouple from chess if needed.

I can tell you that is an extremely painful path that is even slower than the standard approach of human-assisted improvements... (and it doesn't save human effort either, not yet...). But I have just noted that over the past half year, without changing a single line of code, my engine has grown from scoring a solid 0% against Crafty to 45% on ICC. Crafty playing on 8 cores, Rookie on 4, and crafty searching 5 to 6 ply more.

So the grail seems possible, give or take another decade.

So far I have done evaluation parameter tuning and opening book learning and I consider those solved. My immediate next step will be search tuning. After that evaluation pattern synthesis and I think somewhere around that point it would probably stop. (I can remotely imagine a program to learn by itself about passers and king safety, but I cannot see an algorithm coming up with concepts such as null-move search..)

Don · Post by **Don** » Wed Aug 24, 2011 2:54 am

marcelk wrote:
Don wrote:
marcelk wrote:
For me it is time to focus more on my long-term engine plans (esp. autonomous knowledge synthesis)

Ahhh the holy grail of computer chess! If you figure that out please let me know. I think many of us have toyed with that at one time or another.

Open a new thread and tell us your thoughts on "autonomous knowledge synthesis", maybe we can brainstorm this in the forum (instead of obsessive over what happens to Vas.)

that works so well in my current engine but that is also quite easy to decouple from chess if needed.
I can tell you that is an extremely painful path that is even slower than the standard approach of human-assisted improvements... (and it doesn't save human effort either, not yet...). But I have just noted that over the past half year, without changing a single line of code, my engine has grown from scoring a solid 0% against Crafty to 45% on ICC. Crafty playing on 8 cores, Rookie on 4, and crafty searching 5 to 6 ply more.

So the grail seems possible, give or take another decade.

So far I have done evaluation parameter tuning and opening book learning and I consider those solved. My immediate next step will be search tuning. After that evaluation pattern synthesis and I think somewhere around that point it would probably stop. (I can remotely imagine a program to learn by itself about passers and king safety, but I cannot see an algorithm coming up with concepts such as null-move search..)

So how do you do parameter tuning? This dicussion should be in programming and technical discussions section - but I am interested in hearing your ideas.

marcelk · Post by **marcelk** » Wed Aug 24, 2011 7:11 pm

Don wrote:
marcelk wrote: So far I have done evaluation parameter tuning and opening book learning and I consider those solved. My immediate next step will be search tuning. After that evaluation pattern synthesis and I think somewhere around that point it would probably stop. (I can remotely imagine a program to learn by itself about passers and king safety, but I cannot see an algorithm coming up with concepts such as null-move search..)
So how do you do parameter tuning? This dicussion should be in programming and technical discussions section - but I am interested in hearing your ideas.

[ I have asked the moderators to move the thread ]

Without going into the specifics of the realization (which I prefer not to disclose before the program development has leveled out), it is:

1. a least square fit,

2. established through hill-climbing,

3. on an oracle function that is comparing the result of 2-ply searches with the original game outcome,

4. in positions that were randomly selected from GM games (no filtering, not even for duplicates),

5. with the tuning holistically applied to the set of ~300 parameters in my program,

6. and resisting the urge to correct 'wrong' parameters manually.

My results so far show the program is extremely eager to sacrifice 1 or 2 pawns in the opening (which I need to correct with book) and has an aggressive, Tal-like, attacking style that the super-strong programs have no problem defending against, but the weaker ones can't handle. (But since Rookie v3 gets out-searched by 6 ply by those strong programs I think that is solvable by focussing on search next)

I'm speculating that this style is because the tuning now favors positions where humans tend to lose their games. Therefore I'm currently experimenting with replacing the set of GM positions with a hybrid set from GMs (45%), CCRL games (45%) and self server games (10%).

Don · Post by **Don** » Wed Aug 24, 2011 8:20 pm

marcelk wrote:
Don wrote:
marcelk wrote: So far I have done evaluation parameter tuning and opening book learning and I consider those solved. My immediate next step will be search tuning. After that evaluation pattern synthesis and I think somewhere around that point it would probably stop. (I can remotely imagine a program to learn by itself about passers and king safety, but I cannot see an algorithm coming up with concepts such as null-move search..)
So how do you do parameter tuning? This dicussion should be in programming and technical discussions section - but I am interested in hearing your ideas.
[ I have asked the moderators to move the thread ]

Without going into the specifics of the realization (which I prefer not to disclose before the program development has leveled out), it is:

1. a least square fit,

2. established through hill-climbing,

3. on an oracle function that is comparing the result of 2-ply searches with the original game outcome,

4. in positions that were randomly selected from GM games (no filtering, not even for duplicates),

5. with the tuning holistically applied to the set of ~300 parameters in my program,

6. and resisting the urge to correct 'wrong' parameters manually.

My results so far show the program is extremely eager to sacrifice 1 or 2 pawns in the opening (which I need to correct with book) and has an aggressive, Tal-like, attacking style that the super-strong programs have no problem defending against, but the weaker ones can't handle. (But since Rookie v3 gets out-searched by 6 ply by those strong programs I think that is solvable by focussing on search next)

I'm speculating that this style is because the tuning now favors positions where humans tend to lose their games. Therefore I'm currently experimenting with replacing the set of GM positions with a hybrid set from GMs (45%), CCRL games (45%) and self server games (10%).

It's difficult to determine success with automatic tuning because it depends on what you are comparing it to. If you can get more out if that if you had hand tune the weights yourself, then it's a limited form of success, but it may still be far inferior to weights someone else might be able to produce manually.

On another subject: In learning there is the concept of the "training signal", for example what do you compare against to know something is good or bad? In your description the training signal was the game outcome. With Temporal Different Learning the training signal is the evaluation of future positions and your distance from them. With population based algorithms (such as PBIL and genetic algorithms) the training signal is the "fitness function."

You want something that produces the strongest signal with least effort. I'm not sure game outcome is an ideal signal because it seems very weak to me. In other words a LOT of effort to get back 1 or 2 bits of information. However it does have some nice characteristics too, you cannot argue with results.

A very strong signal is produced by temporal concepts. You use the score of a future position to indicate the goodness of a current position and basically operate upon the delta. An example to clarify: if your evaluation at move 5 reports a negative score but at move 15 you find that you are winning, then there is some reason to believe your score at move 5 was in error. It's possible that the opponent blundered of course, but in self-play it's a reasonable assumption that even if the score was negative there may have been more resources in the position than you were aware of. So in this kind of learning the idea is to assign some credit to each move leading up to the current state and gradually make weight modifications that bring the reality in line with the programs thinking.

The beauty of this is that the final result essentially becomes a training signal too, but so does every intermediate position.

Years ago we used TDL on a version of Cilkchess and had some success. Like your experience we found that the program was aggressive - it was making all sorts of interesting sacrifices. I cannot say that they were all sound, but we were testing at limited depth, I think 6 ply. At any depth, a move is a calculated gamble and being wrong about a sacrifice, whether winning or losing, can cost you. So if there is to be error then why not error on the side of being too aggressive once in a while?

We didn't keep that version even though it tested quite well for technical reasons.

mike_bike_kite · Post by **mike_bike_kite** » Wed Aug 24, 2011 10:43 pm

Don wrote:So how do you do [evaluation]parameter tuning?

OMG - this is a question I think I can answer (at least if I understand the question correctly).

*Put all the weightings for each of your evaluation parts into a weighting table.
*Initialise these values with whatever you think is correct.
*Duplicate this table but alter one parameter - I just either add or subtract 1/2 the original value.
*Play the program against itself a number of times where one side has the new values and the other the old values.
*If the new values come out on top then adopt these values.
*Continue until bored
*If any of the values look really weird (ie 0 or -ve) then this indicates your code is wrong.

Don · Post by **Don** » Wed Aug 24, 2011 11:00 pm

mike_bike_kite wrote:
Don wrote:So how do you do [evaluation]parameter tuning?
OMG - this is a question I think I can answer (at least if I understand the question correctly).
*Put all the weightings for each of your evaluation parts into a weighting table.
*Initialise these values with whatever you think is correct.
*Duplicate this table but alter one parameter - I just either add or subtract 1/2 the original value.
*Play the program against itself a number of times where one side has the new values and the other the old values.
*If the new values come out on top then adopt these values.
*Continue until bored
*If any of the values look really weird (ie 0 or -ve) then this indicates your code is wrong.

This idea probably works given enough CPU power but is incredibly inefficient. The changes would be almost random (with only slight pressure to improve) and you would have to run a lot of games for each single change.

I think what you might like is some variation of simulated annealing. You basically make lot's of changes at one time, test, do again. Keep the changes that work, and base the next pass on modifications of those weights. After each pass the amount of change decreases gradually - that is called the annealing schedule.

Learning could be improved I think if a human put limits on the values each parameter could take i.e. we know that it would be unreasonable for a knight be less than 200 or more than 500 and that most positional terms should not be more than half a pawn - and any that are could be identified.

michiguel · Post by **michiguel** » Wed Aug 24, 2011 11:44 pm

marcelk wrote:
Don wrote:
marcelk wrote: So far I have done evaluation parameter tuning and opening book learning and I consider those solved. My immediate next step will be search tuning. After that evaluation pattern synthesis and I think somewhere around that point it would probably stop. (I can remotely imagine a program to learn by itself about passers and king safety, but I cannot see an algorithm coming up with concepts such as null-move search..)
So how do you do parameter tuning? This dicussion should be in programming and technical discussions section - but I am interested in hearing your ideas.
[ I have asked the moderators to move the thread ]

Without going into the specifics of the realization (which I prefer not to disclose before the program development has leveled out), it is:

1. a least square fit,

2. established through hill-climbing,

3. on an oracle function that is comparing the result of 2-ply searches with the original game outcome,

4. in positions that were randomly selected from GM games (no filtering, not even for duplicates),

5. with the tuning holistically applied to the set of ~300 parameters in my program,

6. and resisting the urge to correct 'wrong' parameters manually.

My results so far show the program is extremely eager to sacrifice 1 or 2 pawns in the opening (which I need to correct with book) and has an aggressive, Tal-like, attacking style that the super-strong programs have no problem defending against, but the weaker ones can't handle. (But since Rookie v3 gets out-searched by 6 ply by those strong programs I think that is solvable by focussing on search next)

I'm speculating that this style is because the tuning now favors positions where humans tend to lose their games. Therefore I'm currently experimenting with replacing the set of GM positions with a hybrid set from GMs (45%), CCRL games (45%) and self server games (10%).

It looks very similar to what I posted here a couple of years ago, both, tuning and the style.
http://www.talkchess.com/forum/viewtopic.php?t=27266
I used games from CCRL.

Miguel

michiguel · Post by **michiguel** » Wed Aug 24, 2011 11:50 pm

Don wrote:
marcelk wrote:
Don wrote:
marcelk wrote: So far I have done evaluation parameter tuning and opening book learning and I consider those solved. My immediate next step will be search tuning. After that evaluation pattern synthesis and I think somewhere around that point it would probably stop. (I can remotely imagine a program to learn by itself about passers and king safety, but I cannot see an algorithm coming up with concepts such as null-move search..)
So how do you do parameter tuning? This dicussion should be in programming and technical discussions section - but I am interested in hearing your ideas.
[ I have asked the moderators to move the thread ]

Without going into the specifics of the realization (which I prefer not to disclose before the program development has leveled out), it is:

1. a least square fit,

2. established through hill-climbing,

3. on an oracle function that is comparing the result of 2-ply searches with the original game outcome,

4. in positions that were randomly selected from GM games (no filtering, not even for duplicates),

5. with the tuning holistically applied to the set of ~300 parameters in my program,

6. and resisting the urge to correct 'wrong' parameters manually.

My results so far show the program is extremely eager to sacrifice 1 or 2 pawns in the opening (which I need to correct with book) and has an aggressive, Tal-like, attacking style that the super-strong programs have no problem defending against, but the weaker ones can't handle. (But since Rookie v3 gets out-searched by 6 ply by those strong programs I think that is solvable by focussing on search next)

I'm speculating that this style is because the tuning now favors positions where humans tend to lose their games. Therefore I'm currently experimenting with replacing the set of GM positions with a hybrid set from GMs (45%), CCRL games (45%) and self server games (10%).
It's difficult to determine success with automatic tuning because it depends on what you are comparing it to. If you can get more out if that if you had hand tune the weights yourself, then it's a limited form of success, but it may still be far inferior to weights someone else might be able to produce manually.

On another subject: In learning there is the concept of the "training signal", for example what do you compare against to know something is good or bad? In your description the training signal was the game outcome. With Temporal Different Learning the training signal is the evaluation of future positions and your distance from them. With population based algorithms (such as PBIL and genetic algorithms) the training signal is the "fitness function."

You want something that produces the strongest signal with least effort. I'm not sure game outcome is an ideal signal because it seems very weak to me. In other words a LOT of effort to get back 1 or 2 bits of information. However it does have some nice characteristics too, you cannot argue with results.

Using results, which is what I do, has something nice: You can start from scratch and you do not need predefined values for the parameters (at least in theory). To put it in practical terms, you have a very quick convergence from horrible starting numbers.

Once you do that, you can always move to what you mention in the next paragraph (which I do later).

Miguel

A very strong signal is produced by temporal concepts. You use the score of a future position to indicate the goodness of a current position and basically operate upon the delta. An example to clarify: if your evaluation at move 5 reports a negative score but at move 15 you find that you are winning, then there is some reason to believe your score at move 5 was in error. It's possible that the opponent blundered of course, but in self-play it's a reasonable assumption that even if the score was negative there may have been more resources in the position than you were aware of. So in this kind of learning the idea is to assign some credit to each move leading up to the current state and gradually make weight modifications that bring the reality in line with the programs thinking.

The beauty of this is that the final result essentially becomes a training signal too, but so does every intermediate position.

Years ago we used TDL on a version of Cilkchess and had some success. Like your experience we found that the program was aggressive - it was making all sorts of interesting sacrifices. I cannot say that they were all sound, but we were testing at limited depth, I think 6 ply. At any depth, a move is a calculated gamble and being wrong about a sacrifice, whether winning or losing, can cost you. So if there is to be error then why not error on the side of being too aggressive once in a while?

We didn't keep that version even though it tested quite well for technical reasons.

marcelk · Post by **marcelk** » Thu Aug 25, 2011 12:51 am

Don wrote: It's difficult to determine success with automatic tuning because it depends on what you are comparing it to. If you can get more out if that if you had hand tune the weights yourself, then it's a limited form of success, but it may still be far inferior to weights someone else might be able to produce manually.

Indeed. My primary objective is to remove the human chess expert from the design loop, and secondary reducing effort and third to find a more economic alternative for tuning than game play. The first is achieved as I can safely add patterns and don't have to worry about weights. The second is not, because working on the tuning problem costs me a lot of time too (but that is an initial cost only, not a recurring cost). The thirds I think is achieved.

For a paper I would have to show a correlation between tuning and ELO in games. The process currently is somewhat slow (~1 month to tune 300 parameters) so it is a hard to get enough data points with limited resources. Thus far I have always observed and ELO improvement after adding a pattern and retuning, so I'm confident the correlation can be shown and I intend to do so at some point in time.

The relation between game ELO and tuning has an indirection (or open end if you wish): The input GM games are not the same as the engine games that result from the process. Therefore manual expert tuning might exploit that difference, in theory. But in reality I think that if such manual improvement is possible that that is a sign of a missing pattern, not of a mistuned weight, and in my case some common chess patterns are not in there and the tuner is trying to compensate that with other parameters. That leads to exploitable play.

I also observe that a carefully tuned manual vector by chess experts is competitive with an auto tuned vector. At fixed depth games my simple eval outperforms some well-known engines, but not all of them. (I see that also as a sign of missing patterns though)

marcelk · Post by **marcelk** » Thu Aug 25, 2011 1:12 am

Don wrote: You want something that produces the strongest signal with least effort. I'm not sure game outcome is an ideal signal because it seems very weak to me. In other words a LOT of effort to get back 1 or 2 bits of information. However it does have some nice characteristics too, you cannot argue with results.

Yes, I agree with you.

The problem with TD is that you need something to bootstrap from (at least that is my assumption). Ok, I bootstrapped from GM games but I could just as well have used 1000 ELO games and bootstrap from that. And using game outcome can uncover knowledge that the oracle doesn't possess (or knowledge that even the GMs don't have but that be filtered out of their games). I'm not exactly sure if TD can correct blind spots. I have to admid that I haven't thought too much about that yet.

Game outcome is indeed a weak signal but a 2-ply search is a cheap operation also. I was trying to find something more economic than game playing which I consider an extremely wasteful method in terms of resources.

The only reason I do 2-ply instead of 0-ply is because I haven't optimized the communication between tuner and engine. That I/O still has such a big contribution that the ply difference is hardly noticeable. (First make something that works, then make it fast.)

Ahhh... the holy grail of computer chess

Ahhh... the holy grail of computer chess

Re: Ahhh... the holy grail of computer chess

Re: Ahhh... the holy grail of computer chess

Re: Ahhh... the holy grail of computer chess

Re: Ahhh... the holy grail of computer chess

Re: Ahhh... the holy grail of computer chess

Re: Ahhh... the holy grail of computer chess

Re: Ahhh... the holy grail of computer chess

Re: Ahhh... the holy grail of computer chess

Re: Ahhh... the holy grail of computer chess