Positional learning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bhlangonijr
Posts: 482
Joined: Thu Oct 16, 2008 4:23 am
Location: Milky Way

Re: Positional learning

Post by bhlangonijr »

OliverUwira wrote:
bhlangonijr wrote: IMO the problem is that we are trying to fit a learning system onto the current evaluation structure, in which all evaluation terms are combined linearly to produce the static evaluation. With all these human-like evaluation terms, it is very unlikely that a learning system will find a good balance between them.
I think a better approach would be to take the many evaluation terms we know and translate them into more "fundamental" rules. I can imagine that for pawn structure, for example. Then it would be more feasible to model a workable learning system.
I'm trying to go in a similar direction. In fact, I haven't written a line of code for four weeks but have spent quite some time brainstorming ideas about evaluation design.

The approach I want to try is expressing positional characteristics as functional relations.

Take e.g. bishops. Their strength increases when pawns come off and decreases when pawns are blocked or hemmed in on their colour. I try to model notions like this as a term V = Base * F(x). In the above cases, x might be the number of pawns or, respectively, the number of blocked pawns.

In the case of blocked pawns I would use a logarithmic function, so that the penalty does not increase too aggressively. One blocked pawn on the wrong colour is often enough to give the other side a serious advantage (e.g. in ram positions, i.e. isolated pawns on d4/d5 with light-squared bishops), but if there are more, the "marginal utility" is likely to diminish.

The idea I have is that if the F(x) are continuously differentiable, the feature sum will be as well, which will facilitate the application of curve fitting algorithms and similar techniques, maybe even TDL.
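
To make that concrete, here is a minimal sketch of what one such term could look like. The shape of F(x) and every constant here are illustrative placeholders, not tuned values:

Code: Select all

#include <math.h>

/* Hypothetical base value of the term, in centipawns. */
#define BISHOP_BASE 330.0

/* F(x) for pawns blocked on the bishop's colour: logarithmic, so the
 * first blocked pawn costs the most and further ones have a diminishing
 * marginal effect. Smooth in x, hence friendly to curve fitting. */
static double blocked_pawn_factor(double blocked_pawns)
{
    return 1.0 - 0.08 * log(1.0 + blocked_pawns);
}

/* V = Base * F(x) */
static double bishop_term(double blocked_pawns)
{
    return BISHOP_BASE * blocked_pawn_factor(blocked_pawns);
}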

I'm quite excited to find out how this approach is going to work out :D
That's a very interesting idea and very close to what I am thinking of. I imagine many of the positional evaluation terms that exist today as being merely "symptoms" of a more essential chess property. For instance, think about the rook on the 7th rank. We usually give a bonus for putting the rook on the 7th rank, although in the short term the engine has no clue what it is good for. A more "essential" rule would look at the (Chebyshev) distance between the rook and the opposing king, the weakness of the opposing pawn formation on that side (especially the pawns composing the king shield), and so on. I think almost all of the information needed to build a simple function out of this is based on deltas of piece locations and attacked squares. It makes more sense to me to use those relations to feed the inputs of a learning system instead of trying to tune arbitrary heuristic evaluation terms.
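
Just to illustrate, such a relational input could be as simple as the sketch below (the 0..63 square encoding and the normalization are assumptions of mine):

Code: Select all

#include <stdlib.h>

/* Squares encoded 0..63, a1 = 0. */
static int chebyshev_distance(int sq1, int sq2)
{
    int file_delta = abs((sq1 & 7) - (sq2 & 7));
    int rank_delta = abs((sq1 >> 3) - (sq2 >> 3));
    return file_delta > rank_delta ? file_delta : rank_delta;
}

/* One candidate input for the learning system: how close our rook is
 * to the opposing king, normalized to [0,1]. */
static double rook_king_proximity(int rook_sq, int king_sq)
{
    return 1.0 - chebyshev_distance(rook_sq, king_sq) / 7.0;
}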

Keep us posted about your findings!
bhlangonijr
Posts: 482
Joined: Thu Oct 16, 2008 4:23 am
Location: Milky Way

Re: Positional learning

Post by bhlangonijr »

Don wrote: Very few learning systems actually figure out new concepts on their own; they start with the raw materials we provide them - stuff that has to be engineered by hand. It would be refreshing to see something different - something that can figure things out on its own and create new concepts.

The concepts we use are arbitrary rules of thumb that we made up anyway, based on observation. They are not first principles, even though we think of them as the fundamentals. The only fundamental is to try to mate the opponent, and everything else flows backwards from that. It's easier to do if you have more material, better mobility, etc.
I agree with everything you said, but I am not saying that the learning system should figure out new concepts based on the raw inputs. Actually, what I am saying is that we should draw the common properties from the various arbitrary evaluation terms and then use only these fundamental terms to feed the learning system. It would be one more step besides the normalization of the inputs.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Positional learning

Post by Don »

bhlangonijr wrote:
Don wrote: Very few learning systems actually figure out new concepts on their own; they start with the raw materials we provide them - stuff that has to be engineered by hand. It would be refreshing to see something different - something that can figure things out on its own and create new concepts.

The concepts we use are arbitrary rules of thumb that we made up anyway, based on observation. They are not first principles, even though we think of them as the fundamentals. The only fundamental is to try to mate the opponent, and everything else flows backwards from that. It's easier to do if you have more material, better mobility, etc.
I agree with everything you said, but I am not saying that the learning system should figure out new concepts based on the raw inputs. Actually, what I am saying is that we should draw the common properties from the various arbitrary evaluation terms and then use only these fundamental terms to feed the learning system. It would be one more step besides the normalization of the inputs.
I knew you were not saying that; it's just something that I thought would be a useful characteristic of a good learning system.

I have often wondered what fundamental concepts could be presented to a learning system to give it the most flexibility without imposing our own notions too much. All I could come up with is which piece is on which square - the "human" learning system is able to produce all sorts of concepts from just that, such as assigning rough values to the pieces, rooks on open files, the king being safe in the corner, etc. A system has to be able to invent patterns that are meaningful.
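
As a sketch of how raw that input really is (the twelve-kind piece encoding is just an assumption of mine), the whole feature vector would be one indicator per (piece, square) pair:

Code: Select all

enum { NUM_PIECE_KINDS = 12, NUM_SQUARES = 64 };

/* board[sq] holds 0..11 for the twelve piece kinds, or -1 if empty. */
static void piece_square_features(const int board[NUM_SQUARES],
                                  double features[NUM_PIECE_KINDS * NUM_SQUARES])
{
    /* Every higher-level concept - piece values, open files, king
     * safety - must be invented by the learner from these bits alone. */
    for (int i = 0; i < NUM_PIECE_KINDS * NUM_SQUARES; i++)
        features[i] = 0.0;
    for (int sq = 0; sq < NUM_SQUARES; sq++)
        if (board[sq] >= 0)
            features[board[sq] * NUM_SQUARES + sq] = 1.0;
}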
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Positional learning

Post by Michael Sherwin »

Oh, let's see now. Maybe the question should be: 'What open source engine has used its learning ability to increase its performance and Elo rating over time against ever-strengthening opposition?'

Wait a minute, let me think, since this is a really tough question.

... thinking ... thinking ... thinking

I got the answer!

Well, there is only one that I can think of--RomiChess!

Check out the performance at the Open War tournaments for version P3k.

Or better yet, check out P3k at WBEC:

15th edition, 4th division #13

16th edition, 4th division #3

17th edition, 3rd division #4

The 18th edition is next: 2nd division, #?. DanaSah is seeded 7th, and RomiChessP3k tied with DanaSah 4.45 at Open War 7, ahead of Francesca MAD 0.14 and a slew of other engines stronger than Romi!

Of course, according to some authors of note, the learning in RomiChess is crap: flawed and without value. Ed Schroeder, though, said "this learning will make all other learning obsolete"!

Edit: I forgot to mention that RomiChessP3k started WBEC off with a Dr. Wael Deeb opening book. Romi had to adjust to the book first and refine the lines through learning.
bhlangonijr
Posts: 482
Joined: Thu Oct 16, 2008 4:23 am
Location: Milky Way

Re: Positional learning

Post by bhlangonijr »

Michael Sherwin wrote: Oh, let's see now. Maybe the question should be: 'What open source engine has used its learning ability to increase its performance and Elo rating over time against ever-strengthening opposition?'

Wait a minute, let me think, since this is a really tough question.

... thinking ... thinking ... thinking

I got the answer!

Well, there is only one that I can think of--RomiChess!

Check out the performance at the Open War tournaments for version P3k.

Or better yet, check out P3k at WBEC:

15th edition, 4th division #13

16th edition, 4th division #3

17th edition, 3rd division #4

The 18th edition is next: 2nd division, #?. DanaSah is seeded 7th, and RomiChessP3k tied with DanaSah 4.45 at Open War 7, ahead of Francesca MAD 0.14 and a slew of other engines stronger than Romi!

Of course, according to some authors of note, the learning in RomiChess is crap: flawed and without value. Ed Schroeder, though, said "this learning will make all other learning obsolete"!

Edit: I forgot to mention that RomiChessP3k started WBEC off with a Dr. Wael Deeb opening book. Romi had to adjust to the book first and refine the lines through learning.
Nice. I will take a look at Romi's source code. I'd appreciate an overview of the method you used.

Thanks,
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Positional learning

Post by Michael Sherwin »

bhlangonijr wrote:
Michael Sherwin wrote: Oh, let's see now. Maybe the question should be: 'What open source engine has used its learning ability to increase its performance and Elo rating over time against ever-strengthening opposition?'

Wait a minute, let me think, since this is a really tough question.

... thinking ... thinking ... thinking

I got the answer!

Well, there is only one that I can think of--RomiChess!

Check out the performance at the Open War tournaments for version P3k.

Or better yet, check out P3k at WBEC:

15th edition, 4th division #13

16th edition, 4th division #3

17th edition, 3rd division #4

The 18th edition is next: 2nd division, #?. DanaSah is seeded 7th, and RomiChessP3k tied with DanaSah 4.45 at Open War 7, ahead of Francesca MAD 0.14 and a slew of other engines stronger than Romi!

Of course, according to some authors of note, the learning in RomiChess is crap: flawed and without value. Ed Schroeder, though, said "this learning will make all other learning obsolete"!

Edit: I forgot to mention that RomiChessP3k started WBEC off with a Dr. Wael Deeb opening book. Romi had to adjust to the book first and refine the lines through learning.
Nice. I will take a look at Romi's source code. I'd appreciate an overview of the method you used.

Thanks,
Very simply, Romi uses two types of learning:

1. Monkey see, monkey do. Romi remembers winning lines, regardless of which side played the moves, and incorporates them into the opening book; it can play them back instantly, up to 180 ply, as long as the stats for that line remain good.

2. Pavlov's dog experiments adapted to computer chess. Each side's moves are given a slight bonus if that side won, and the other side's moves are given a slight penalty. So good moves can get a slight penalty and bad moves can get a slight bonus; however, over time those are corrected. These bonuses/penalties are loaded into the hash table before each move by the computer. If Romi is losing game after game, this will cause Romi to 'fish' for better moves to play until Romi starts to win.
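
In rough C, the reinforcement step might look like the sketch below. This is only an illustration of the idea, not Romi's actual code; learn_probe(), the entry layout, and the constants are made up:

Code: Select all

#include <stdint.h>

#define WIN_BONUS    4   /* size of the nudge, in centipawns */
#define LOSS_PENALTY 4

struct LearnEntry {
    uint64_t key;    /* position hash */
    int16_t  score;  /* accumulated bonus/penalty */
};

/* Hypothetical lookup into the learn file (creating entries as needed). */
extern struct LearnEntry *learn_probe(uint64_t key);

/* After a decisive game, nudge the winner's moves up and the loser's
 * moves down. Preloading these scores into the hash table before each
 * search biases move choice; repeated losses drive scores down until
 * the engine 'fishes' for different moves. */
static void reinforce_game(const uint64_t *keys, int num_moves, int white_won)
{
    for (int ply = 0; ply < num_moves; ply++) {
        struct LearnEntry *e = learn_probe(keys[ply]);
        int white_to_move = (ply % 2 == 0);
        if (white_to_move == white_won)
            e->score += WIN_BONUS;
        else
            e->score -= LOSS_PENALTY;
    }
}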
OliverUwira
Posts: 170
Joined: Mon Sep 13, 2010 9:57 am
Location: Frankfurt am Main

Re: Positional learning

Post by OliverUwira »

Michael Sherwin wrote: Very simply, Romi uses two types of learning:

1. Monkey see, monkey do. Romi remembers winning lines, regardless of which side played the moves, and incorporates them into the opening book; it can play them back instantly, up to 180 ply, as long as the stats for that line remain good.

2. Pavlov's dog experiments adapted to computer chess. Each side's moves are given a slight bonus if that side won, and the other side's moves are given a slight penalty. So good moves can get a slight penalty and bad moves can get a slight bonus; however, over time those are corrected. These bonuses/penalties are loaded into the hash table before each move by the computer. If Romi is losing game after game, this will cause Romi to 'fish' for better moves to play until Romi starts to win.
This is about book learning only, isn't it? Have you also experimented with self-learning evaluation parameters?
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Positional learning

Post by Michael Sherwin »

OliverUwira wrote:
Michael Sherwin wrote: Very simply, Romi uses two types of learning:

1. Monkey see, monkey do. Romi remembers winning lines, regardless of which side played the moves, and incorporates them into the opening book; it can play them back instantly, up to 180 ply, as long as the stats for that line remain good.

2. Pavlov's dog experiments adapted to computer chess. Each side's moves are given a slight bonus if that side won, and the other side's moves are given a slight penalty. So good moves can get a slight penalty and bad moves can get a slight bonus; however, over time those are corrected. These bonuses/penalties are loaded into the hash table before each move by the computer. If Romi is losing game after game, this will cause Romi to 'fish' for better moves to play until Romi starts to win.
This is about book learning only, isn't it? Have you also experimented with self-learning evaluation parameters?
Yes, mainly book learning. However, it also mimics quite closely how humans progress over the years once their understanding of chess itself stops progressing: they copy other players' moves and 'fish' around for better ones. Edit: It is not just book learning, though, because when Romi fishes around for a move she might find one that is not in the book, and if it wins it may become book. So it does go beyond simple book-selection learning into book creation. Start Romi off with an empty learn file and she will create her own book totally from scratch!

I have only thought about eval learning, not experimented with it. My thinking is:

Define a plausible range for each parameter. Assign random values within the ranges. Play many thousands of very fast games and save the results. Repeat this many thousands of times. It can all be automated. Use the best N results to slightly restrict the range for each parameter. Repeat until all parameters have been restricted to a single value.

Something like that.
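
In outline it could look like the sketch below. play_match() is hypothetical (it would run a batch of very fast games and return the match score), and for simplicity this shrinks each range toward the single best trial rather than the best N:

Code: Select all

#include <stdlib.h>

#define NUM_PARAMS      10    /* illustrative */
#define TRIALS_PER_PASS 1000
#define SHRINK          0.95  /* how much each pass tightens the ranges */

struct Range { double lo, hi; };

/* Hypothetical: plays a batch of very fast games with the given
 * parameter values and returns the match score. */
extern double play_match(const double params[NUM_PARAMS]);

static double rand_in(struct Range r)
{
    return r.lo + (r.hi - r.lo) * ((double)rand() / RAND_MAX);
}

static void tune(struct Range ranges[NUM_PARAMS])
{
    double best[NUM_PARAMS], best_score = -1e300;
    double width = 1.0;

    while (width > 1e-6) {  /* until every range is a single value */
        for (int t = 0; t < TRIALS_PER_PASS; t++) {
            double params[NUM_PARAMS];
            for (int p = 0; p < NUM_PARAMS; p++)
                params[p] = rand_in(ranges[p]);
            double score = play_match(params);
            if (score > best_score) {
                best_score = score;
                for (int p = 0; p < NUM_PARAMS; p++)
                    best[p] = params[p];
            }
        }
        /* Slightly restrict each range around the best values so far. */
        width = 0.0;
        for (int p = 0; p < NUM_PARAMS; p++) {
            double w = (ranges[p].hi - ranges[p].lo) * SHRINK;
            ranges[p].lo = best[p] - w / 2.0;
            ranges[p].hi = best[p] + w / 2.0;
            if (w > width) width = w;
        }
    }
}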
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Positional learning

Post by Michael Sherwin »

Michael Sherwin wrote:
OliverUwira wrote:
Michael Sherwin wrote: Very simply, Romi uses two types of learning:

1. Monkey see, monkey do. Romi remembers winning lines, regardless of which side played the moves, and incorporates them into the opening book; it can play them back instantly, up to 180 ply, as long as the stats for that line remain good.

2. Pavlov's dog experiments adapted to computer chess. Each side's moves are given a slight bonus if that side won, and the other side's moves are given a slight penalty. So good moves can get a slight penalty and bad moves can get a slight bonus; however, over time those are corrected. These bonuses/penalties are loaded into the hash table before each move by the computer. If Romi is losing game after game, this will cause Romi to 'fish' for better moves to play until Romi starts to win.
This is about book learning only, isn't it? Have you also experimented with self-learning evaluation parameters?
Yes, mainly book learning. However, it also mimics quite closely how humans progress over the years once their understanding of chess itself stops progressing: they copy other players' moves and 'fish' around for better ones. Edit: It is not just book learning, though, because when Romi fishes around for a move she might find one that is not in the book, and if it wins it may become book. So it does go beyond simple book-selection learning into book creation. Start Romi off with an empty learn file and she will create her own book totally from scratch!

I have only thought about eval learning, not experimented with it. My thinking is:

Define a plausible range for each parameter. Assign random values within the ranges. Play many thousands of very fast games and save the results. Repeat this many thousands of times. It can all be automated. Use the best N results to slightly restrict the range for each parameter. Repeat until all parameters have been restricted to a single value.

Something like that.
After giving it some more thought: only Monkey See Monkey Do (MSMD) is book learning. Without it, Romi would never play a book move. The Pavlovian adaptation would still work well over time as a purely position-based (not positional, as in eval parameters) learning/modification method. It is just that, with MSMD, the Pavlovian learning for winning lines is quickly absorbed into the book, making it seem at first glance as though everything is merely book learning.