Positional learning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Positional learning

Post by Michael Sherwin »

bhlangonijr wrote:
Michael Sherwin wrote: Oh, let's see now, maybe the question should be, 'What open source engine has used its learning ability to increase its performance and Elo rating over time against ever-strengthening opposition?'

Wait a minute, let me think, since this is a really tough question.

... thinking ... thinking ... thinking

I got the answer!

Well, there is only one that I can think of--RomiChess!

Check out the performance at the Open War tournaments for version P3k.

Or better yet, check out P3k at WBEC:

15th edition, 4th division #13

16th edition, 4th division #3

17th edition, 3rd division #4

18th edition next, 2nd division #?. DanaSah is seeded 7th, and RomiChessP3k tied with DanaSah 4.45 at Open War 7, ahead of Francesca MAD 0.14 and a slew of other engines stronger than Romi!

Of course, according to some authors of note, the learning in RomiChess is crap: flawed and without value. Ed Schroeder, though, said, "this learning will make all other learning obsolete"!

Edit: I forgot to mention that RomiChessP3k started WBEC off with a Dr. Wael Deeb opening book. Romi had to adjust to the book first and refine the lines through learning.
Nice. I will take a look at Romi's source code. I'd appreciate an overview of the method you have used.

Thanks,
Very simply, Romi uses two types of learning:

1. Monkey see, monkey do. Romi remembers winning lines, regardless of which side played the moves, and incorporates them into the opening book; it can play them back instantly, up to 180 ply, as long as the stats for that line remain good.

2. Pavlov's dog experiments adapted to computer chess. Each side's moves are given a slight bonus if that side won, and the other side's moves are given a slight penalty. So a good move can receive a slight penalty and a bad move a slight bonus; over time, though, these are corrected. The bonuses/penalties are loaded into the hash table before each move the computer makes. If Romi is losing game after game, this causes Romi to 'fish' for better moves to play until Romi starts to win.
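Roughly, the Pavlov side looks like the sketch below. All the names here (LearnNode, hash_seed, the bonus values) are invented for the illustration, not Romi's actual code:

Code: Select all

#include <stdint.h>
#include <stdio.h>

#define WIN_BONUS    4   /* centipawns added to each move the winner played  */
#define LOSS_PENALTY 4   /* centipawns taken from each move the loser played */

typedef struct {
    uint64_t key;     /* Zobrist key of the position the move was made from */
    int16_t  adjust;  /* accumulated bonus/penalty for the move played here */
} LearnNode;

/* Placeholder for the engine's transposition-table store. */
static void hash_seed(uint64_t key, int adjust)
{
    printf("seed TT: key=%016llx adjust=%+d\n",
           (unsigned long long)key, adjust);
}

/* After a finished game: every move by the winning side gets a small bonus,
   every move by the losing side a small penalty.  A good move on the losing
   side is penalized too, but repeated games are expected to correct that. */
void update_learning(LearnNode *line, int length, int white_won)
{
    for (int ply = 0; ply < length; ply++) {
        int white_to_move = (ply % 2 == 0);
        line[ply].adjust += (white_to_move == white_won)
                          ? WIN_BONUS : -LOSS_PENALTY;
    }
}

/* Before the engine moves: load the learned adjustments into the hash
   table so the search is nudged toward rewarded continuations. */
void preload_hash(const LearnNode *line, int length)
{
    for (int ply = 0; ply < length; ply++)
        hash_seed(line[ply].key, line[ply].adjust);
}

The point of preloading rather than rewriting the eval is that the search itself stays untouched; the learned scores simply bias move choice at positions Romi has seen before.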
OliverUwira
Posts: 170
Joined: Mon Sep 13, 2010 9:57 am
Location: Frankfurt am Main

Re: Positional learning

Post by OliverUwira »

Michael Sherwin wrote: Very simply, Romi uses two types of learning:

1. Monkey see, monkey do. Romi remembers winning lines, regardless of which side played the moves, and incorporates them into the opening book; it can play them back instantly, up to 180 ply, as long as the stats for that line remain good.

2. Pavlov's dog experiments adapted to computer chess. Each side's moves are given a slight bonus if that side won, and the other side's moves are given a slight penalty. So a good move can receive a slight penalty and a bad move a slight bonus; over time, though, these are corrected. The bonuses/penalties are loaded into the hash table before each move the computer makes. If Romi is losing game after game, this causes Romi to 'fish' for better moves to play until Romi starts to win.
This is about book learning only, isn't it? Have you also experimented with self-learning evaluation parameters?
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Positional learning

Post by Michael Sherwin »

OliverUwira wrote:
This is about book learning only, isn't it? Have you also experimented with self-learning evaluation parameters?
Yes, mainly book learning. However, it also mimics quite closely how humans progress over the years once their understanding of chess itself stops progressing: they copy other players' moves and 'fish' around for better ones. Edit: Not just book learning, though. When Romi fishes around for a move she might find one that is not in book, and if it wins it may become book. So it does go beyond simple book-selection learning into book creation. Start Romi off with an empty learn file and she will create her own book totally from scratch!
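The book-creation side can be pictured with a toy sketch like this one (invented names and a toy hash table, not Romi's actual data structures): a won game's line is filed into the learn data, and a stored reply is only played back while its record stays positive.

Code: Select all

#include <stdint.h>

#define MAX_PLY   180    /* lines are replayed up to 180 ply */
#define BOOK_SIZE 65536

typedef struct {
    uint64_t key;     /* Zobrist key of the position        */
    uint32_t move;    /* stored reply, engine encoding      */
    int      wins;    /* results of games through this node */
    int      losses;
} BookNode;

static BookNode book[BOOK_SIZE];   /* toy open-addressing table */

static BookNode *book_slot(uint64_t key)
{
    uint64_t i = key % BOOK_SIZE;  /* toy probe; assumes table never fills */
    while (book[i].key != 0 && book[i].key != key)
        i = (i + 1) % BOOK_SIZE;
    return &book[i];
}

/* After a game: a win files every position/move pair of the line into the
   book, regardless of which side played it; a loss only degrades the stats
   of positions already in book, so bad lines stop being played. */
void absorb_game(const uint64_t keys[], const uint32_t moves[],
                 int length, int won)
{
    if (length > MAX_PLY) length = MAX_PLY;
    for (int ply = 0; ply < length; ply++) {
        BookNode *n = book_slot(keys[ply]);
        if (n->key == 0) {           /* new position */
            if (!won) continue;      /* only wins create new book entries */
            n->key  = keys[ply];
            n->move = moves[ply];
        }
        if (won) n->wins++; else n->losses++;
    }
}

/* At move time: play the remembered reply instantly if its stats are good. */
uint32_t book_probe(uint64_t key)
{
    BookNode *n = book_slot(key);
    if (n->key == key && n->wins > n->losses)
        return n->move;   /* instant book move */
    return 0;             /* 0 = not in book, search instead */
}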

I have only thought about eval learning, not experimented with it. The thinking is:

Define a plausible range for each parameter. Assign random values within the ranges. Play many thousands of very fast games and save the results. Repeat this many thousands of times; it can all be automated. Use the best N results to slightly restrict the range for each parameter. Repeat until every parameter has been restricted to a single value.

Something like that.
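To make the idea concrete, here is a rough C sketch of that loop. play_match(), the parameter count, and the hidden optimum it converges to are all made up for the illustration; a real tuner would play thousands of fast engine-vs-engine games per parameter vector instead.

Code: Select all

#include <stdio.h>
#include <stdlib.h>

#define NPARAMS   3
#define NSAMPLES  200   /* random parameter vectors tried per round */
#define NBEST     20    /* best results used to shrink the ranges   */
#define NROUNDS   30    /* shrink iterations (sketch: fixed count)  */

typedef struct { double lo, hi; } Range;

/* Stand-in for "play many very fast games with these values and score
   the result"; here a hidden optimum at (50, 300, 900) is simulated. */
static double play_match(const double p[NPARAMS])
{
    static const double best[NPARAMS] = { 50.0, 300.0, 900.0 };
    double score = 0.0;
    for (int k = 0; k < NPARAMS; k++)
        score -= (p[k] - best[k]) * (p[k] - best[k]);
    return score;
}

static double rand_in(Range r)
{
    return r.lo + (r.hi - r.lo) * ((double)rand() / RAND_MAX);
}

int main(void)
{
    Range range[NPARAMS] = { {0,100}, {200,400}, {700,1100} };
    double keep[NBEST][NPARAMS], keep_score[NBEST];

    for (int round = 0; round < NROUNDS; round++) {
        for (int i = 0; i < NBEST; i++) keep_score[i] = -1e30;

        /* Sample random vectors inside the current ranges; keep the best. */
        for (int s = 0; s < NSAMPLES; s++) {
            double p[NPARAMS];
            for (int k = 0; k < NPARAMS; k++) p[k] = rand_in(range[k]);
            double score = play_match(p);

            int worst = 0;
            for (int i = 1; i < NBEST; i++)
                if (keep_score[i] < keep_score[worst]) worst = i;
            if (score > keep_score[worst]) {
                keep_score[worst] = score;
                for (int k = 0; k < NPARAMS; k++) keep[worst][k] = p[k];
            }
        }

        /* Shrink each range part-way toward the span of the best results,
           so a single round cannot over-commit to a lucky sample. */
        for (int k = 0; k < NPARAMS; k++) {
            double lo = keep[0][k], hi = keep[0][k];
            for (int i = 1; i < NBEST; i++) {
                if (keep[i][k] < lo) lo = keep[i][k];
                if (keep[i][k] > hi) hi = keep[i][k];
            }
            range[k].lo += 0.5 * (lo - range[k].lo);
            range[k].hi -= 0.5 * (range[k].hi - hi);
        }
    }

    for (int k = 0; k < NPARAMS; k++)
        printf("param %d: [%.1f, %.1f]\n", k, range[k].lo, range[k].hi);
    return 0;
}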
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Positional learning

Post by Michael Sherwin »

Michael Sherwin wrote:
Yes, mainly book learning. However, it also mimics quite closely how humans progress over the years once their understanding of chess itself stops progressing. ...
After giving it some more thought, only Monkey See Monkey Do (MSMD) is book learning. Without it, Romi would never play a book move. Pavlov's adaptation would still work well over time as a purely position (not positional, as in eval parameters) learning/modification method. It is just that, with MSMD, Pavlov's learning for winning lines is quickly absorbed into the book, making it seem at first glance as though everything is merely book learning.