final drawscore testing results.

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Post by bob »

After a lot of testing and such, with a couple of ideas still queued up, here's what I have found works the best for me, so far, with an improvement of +10 Elo over the older and simpler approach.

1. I compute a value "rd" (rating difference) = opponents_rating - craftys_rating

2. If this is > 0, I add 50; if it is < 0, I subtract 50. I then divide by 100, which simply counts the number of 100 Elo "intervals" between Crafty and the opponent.

3. I now compute draw_score = rd * 20 + 20, where rd is now the interval count from step 2. That is, for each 100 point difference in rating (rounded to the nearest 100, due to the +/- 50 above) the draw score is increased/decreased by 20. Ignoring the +20 bias, if Crafty is stronger, then the drawscore is -20 * #_of_intervals; if Crafty is weaker, then the drawscore is +20 * #_of_intervals.

That +20 bias was tuned to Crafty and is likely unique to it. It simply shifts the drawscore upward by 0.2 pawns in all cases. This likely means that at some points in the tree, Crafty's score is just a little optimistic, and it would normally play for a win and lose, whereas with the larger draw score it will play for the draw unless the score is > 0.2 (against an equal opponent, of course)...
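The three steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as described in the post, not Crafty's actual code: the function name and the `step`/`bias` parameters are made up here, and the division is assumed to truncate toward zero (as C integer division does), which together with the +/- 50 rounds the difference to the nearest 100.

```python
def draw_score(opponent_rating: int, own_rating: int,
               step: int = 20, bias: int = 20) -> int:
    """Draw score in centipawns from a rating difference (illustrative sketch)."""
    # Step 1: rating difference, positive when the opponent is stronger.
    rd = opponent_rating - own_rating
    # Step 2: add/subtract 50, then divide by 100 with truncation toward zero.
    # (ASSUMPTION: truncating division; this rounds rd to the nearest 100.)
    sign = (rd > 0) - (rd < 0)
    intervals = sign * ((abs(rd) + 50) // 100)
    # Step 3: 20 centipawns per interval, plus the engine-specific +20 bias.
    return intervals * step + bias


if __name__ == "__main__":
    for opp in (2300, 2500, 2600, 2649, 2650, 2700):
        print(opp, draw_score(opp, 2600))
```

Against an equal opponent this gives just the bias (20); one interval weaker gives 0, one interval stronger gives 40.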

I looked at overall match results, and at results between individual opponents (as shown in the last post on this subject). The only hypothesis left is that the above seems to do better against higher-rated opponents, but I am not sure that it is the "optimal" answer against lower-rated opponents. I am testing that next, with that "20" varied: doubled if Crafty is stronger, but left at the normal value if Crafty is weaker.

This is also a purely linear formula, whereas Elo is not a linear function of rating difference. So there might be a better formula for "RD", which is the second thing on my list to play around with...

Note that this only works if you know your opponent's rating. Xboard provides that when playing on ICC. My automated referee program provides it in my cluster testing, also by the "rating -myrating- -opponent-rating-" command...
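For readers wiring this into their own engine, here is a minimal sketch of reading such a command. The argument order (own rating first, then the opponent's) is taken from the description above; the function name and the choice to return None on anything unexpected are assumptions made for the example.

```python
from typing import Optional, Tuple


def parse_rating_command(line: str) -> Optional[Tuple[int, int]]:
    """Parse 'rating <own> <opponent>' and return (own, opponent), else None."""
    parts = line.split()
    # Expect exactly the command word plus two integer arguments.
    if len(parts) != 3 or parts[0] != "rating":
        return None
    try:
        own, opponent = int(parts[1]), int(parts[2])
    except ValueError:
        return None
    return own, opponent


if __name__ == "__main__":
    print(parse_rating_command("rating 2600 2450"))
```

The rating difference used in step 1 would then be opponent minus own.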

to be continued...

But in any case, +10 is always welcome...
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: final drawscore testing results.

Post by Don »

So if I understand this correctly, the formula is symmetrical, right? I'm not surprised; I tested the formula playing down and it was quite a big boost in Elo.

It's easy to see how this can improve your Elo. Imagine that your draw score is zero, you play 1000 Elo down, and your program is black. You come out of book with a white advantage (or after a few "ordinary moves" white still has an advantage) and you find a way to draw. It would be ridiculous to go for the repetition. One loss against a player 1000 Elo weaker won't help your Elo any.

I guess the same principle applies when you are playing up.

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: final drawscore testing results.

Post by bob »

I tried both symmetrical and asymmetrical approaches. Nothing worked better than simple symmetry. I tried various levels of resolution. Units of 100 seemed the best, but not much better than units of 200. I even tried a function that approximates the Elo idea, i.e. 200 points worse means losing 3 of every 4 games; 400 points worse is not twice as bad, it means losing 15 of every 16 games. That didn't seem to help either, but I do have the drawback that I have not searched for opponents that weak to factor in to my cluster testing. All of my opponents are in the range -300 to +200. So on the + side my tests were all about the same, obviously...

Running it on ICC and watching as much as possible to see what this does, if anything.
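To make the non-linearity concrete, here is a small sketch of the standard logistic Elo expectation. It is not the function that was tested above (that one is not given in the post); the function name is made up. Under this model the weaker side's expected score is about 0.24 at 200 points down and about 0.09 at 400 points down, so the "3 of 4" and "15 of 16" figures are rough approximations, and the expected score counts a draw as half a point rather than being a pure loss rate.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5) for a player whose rating is
    rating_diff points above the opponent's (negative when weaker)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


if __name__ == "__main__":
    for diff in (0, -100, -200, -300, -400):
        print(diff, round(expected_score(diff), 4))
```

Doubling the rating gap from 200 to 400 does not halve the expected score, which is the sense in which the linear draw-score formula and Elo are not directly comparable.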
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: final drawscore testing results.

Post by Don »

So when Komodo is in tournaments I plan to simply use 1 centipawn of contempt per N Elo points. I'll estimate the Elo of my opponents conservatively to deal with uncertainty, assuming they are closer to Komodo than I think.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: final drawscore testing results.

Post by bob »

I think one centipawn is WAY too small. My best results were with values in the 20-50 centipawn range per 100 Elo in difference. Once I got past 100 I started to see issues in trying too hard to avoid draws with weaker opponents...
BubbaTough
Posts: 1154
Joined: Fri Jun 23, 2006 5:18 am

Re: final drawscore testing results.

Post by BubbaTough »

I don't see why 20 + (x/100)*20 is better than 20 + x/5... it's not impossible, but 20 + x/5 is more intuitive.

-Sam
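A quick way to see how far apart the two formulas are is to evaluate both over a range of rating differences x. The sketch below assumes the stepped version rounds x to the nearest 100 as described earlier in the thread; both function names are made up for the comparison.

```python
def stepped(x: int) -> int:
    """20 + (x/100)*20 with x first rounded to the nearest 100 (half away from zero)."""
    sign = (x > 0) - (x < 0)
    return 20 + sign * ((abs(x) + 50) // 100) * 20


def continuous(x: int) -> float:
    """20 + x/5, i.e. one centipawn per 5 Elo with no rounding."""
    return 20 + x / 5


if __name__ == "__main__":
    worst = max(abs(stepped(x) - continuous(x)) for x in range(-400, 401))
    print("largest gap in centipawns:", worst)
    for x in (-150, -50, 0, 49, 50, 100):
        print(x, stepped(x), continuous(x))
```

The two agree at exact multiples of 100 and never differ by more than 10 centipawns in between, so any measured difference between them would have to come from that small rounding effect.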
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: final drawscore testing results.

Post by Don »

bob wrote: I think one centipawn is WAY too small.
1 centipawn per N Elo, so whether it's too big or too small depends on N, doesn't it?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: final drawscore testing results.

Post by bob »

bob wrote: I think one centipawn is WAY too small.
Don wrote: 1 centipawn per N Elo, so whether it's too big or too small depends on N, doesn't it?
Yes. But in the current context, I was assuming N was in the same range I was talking about. I'm still testing and playing around with ideas here...


hgm
Posts: 27828
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: final drawscore testing results.

Post by hgm »

When your pawn value is pretty much constant over game phase, I would expect that the contempt would have to be scaled with game phase. If you are playing a real patzer, being 100cP down in the opening phase should not really make you go for a draw yet. Better sac the Pawn and crush the opponent with your Queen and Rooks. OTOH, trying to avoid a draw in KRPKRP might be a completely hopeless affair even against a patzer, and giving the Pawn to avoid a draw will be a very bad idea unless you know he cannot search deeper than 1 ply.
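One way to picture this kind of scaling is the sketch below. It is purely hypothetical: the phase weights (minor = 1, rook = 2, queen = 4, for a maximum of 24) are a common convention and not something given in this thread, and the linear scaling is just the simplest choice.

```python
# Hypothetical phase weights per non-pawn piece; 24 with all pieces on the board.
PHASE_WEIGHTS = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24


def scaled_draw_score(base_draw_score: int, piece_counts: dict) -> int:
    """Scale a rating-based draw score linearly with remaining non-pawn material.

    piece_counts maps piece letters to the total count for both sides,
    e.g. {"R": 2} for a KRPKRP ending.
    """
    total = sum(w * piece_counts.get(p, 0) for p, w in PHASE_WEIGHTS.items())
    total = min(total, MAX_PHASE)  # promotions could push material past the start value
    return round(base_draw_score * total / MAX_PHASE)


if __name__ == "__main__":
    opening = {"N": 4, "B": 4, "R": 4, "Q": 2}
    print(scaled_draw_score(-60, opening))   # full contempt
    print(scaled_draw_score(-60, {"R": 2}))  # KRPKRP: mostly phased out
```

With a base draw score of -60 against a much weaker opponent, this keeps the full -60 in the opening and shrinks it to -10 in KRPKRP.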
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: final drawscore testing results.

Post by Don »

My intuition on this is that it should apply mostly to the early game, and that the contempt for "real" draws (such as stalemate, the 50-move rule or late endgame repetitions) should be almost zero.

However, I actually tested that (by phasing out most of the contempt as the game progresses) and it did not test as well.