final drawscore testing results.

bob · Post by **bob** » Tue Sep 13, 2011 7:06 pm

After a lot of testing and such, with a couple of ideas still queued up, here's what I have found works the best for me, so far, with an improvement of +10 Elo over the older and simpler approach.

1. I compute a value "rd" (rating difference) = opponents_rating - craftys_rating

2. If this is > 0, I add 50; if it is < 0, I subtract 50. I then divide by 100, which simply counts the number of 100-Elo "intervals" between Crafty and the opponent.

3. I now compute draw_score = rd * 20 + 20. That is, for each 100-point difference in rating (rounded to the nearest 100, due to the +/- 50 above), the draw score is increased or decreased by 20 centipawns. If Crafty is stronger, the draw score is 20 - 20 * #_of_intervals; if Crafty is weaker, it is 20 + 20 * #_of_intervals.
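The three steps can be sketched as below. This is a minimal illustration, not Crafty's source: the function name and signature are hypothetical, and the result is in centipawns from the engine's side. Rounding `abs(rd) + 50` down and then restoring the sign reproduces what "+/- 50, then integer-divide by 100" does with C-style truncating division.

```python
def draw_score(my_rating: int, opponent_rating: int) -> int:
    """Draw score in centipawns (hypothetical helper following steps 1-3)."""
    rd = opponent_rating - my_rating           # step 1: rating difference
    # Step 2: number of 100-Elo intervals, rounded to the nearest interval
    # with halves rounded away from zero.
    intervals = (abs(rd) + 50) // 100
    if rd < 0:
        intervals = -intervals
    return intervals * 20 + 20                 # step 3: 20 cp per interval + 20 cp bias


print(draw_score(2600, 2800))  # opponent 200 Elo stronger -> 60
print(draw_score(2800, 2600))  # opponent 200 Elo weaker   -> -20
print(draw_score(2600, 2600))  # equal ratings             -> 20
```

With the +20 bias, an opponent about 100 Elo weaker is exactly where the draw score crosses 0.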

That +20 bias was tuned for Crafty and is likely specific to it. It simply shifts the draw score upward by 0.2 pawns in all cases. This likely means that at some points in the tree Crafty's score is a little optimistic, so it would normally play for a win and lose; with the larger draw score, it will play for the draw unless the score is > 0.2 (against an equal opponent, of course)...
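To make the effect of the bias concrete, here is a toy root-move choice in which a drawing move (e.g. one that repeats) is valued at the draw score instead of 0. The function name and move labels are hypothetical; in a real search the draw score is simply returned at draw nodes and ordinary minimax does the rest.

```python
from typing import Dict, Optional


def choose_move(move_scores: Dict[str, Optional[int]], draw_score: int) -> str:
    """Pick the best root move; None marks a move that draws (e.g. repetition).

    Scores are centipawns from the engine's side. Toy illustration only.
    """
    # A drawing move is worth exactly the draw score; other moves keep their eval.
    scored = {m: (draw_score if s is None else s) for m, s in move_scores.items()}
    return max(scored, key=lambda m: scored[m])


moves = {"Nf3": 15, "Qd1 (repeats)": None}
print(choose_move(moves, 0))   # draw score 0: keeps playing with Nf3
print(choose_move(moves, 20))  # draw score +20: takes the repetition
```

A +15 edge is enough to avoid the draw when draws score 0, but not when they score +20; only lines above +0.2 pawns are preferred to the repetition.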

I looked at overall match results, and at results against individual opponents (as shown in the last post on this subject). The only hypothesis left is that the above seems to do better against higher-rated opponents, but I am not sure that it is the "optimal" answer against lower-rated opponents. I am testing that next: the per-interval "20" doubled if Crafty is stronger, but left at the normal value if Crafty is weaker.
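One reading of that variant is sketched below, under the assumption that only the per-interval weight changes (doubled when Crafty is the stronger side) while the +20 bias stays as it is. The function and parameter names are illustrative, not from Crafty.

```python
def draw_score_asym(my_rating: int, opponent_rating: int,
                    per_interval: int = 20, bias: int = 20) -> int:
    """Asymmetric draw score in centipawns (hypothetical variant)."""
    rd = opponent_rating - my_rating
    intervals = (abs(rd) + 50) // 100      # nearest 100-Elo interval
    if rd < 0:
        # Engine is stronger: per-interval weight doubled, so the draw
        # score drops faster and draws against weaker opponents are avoided.
        return bias - intervals * 2 * per_interval
    # Engine is weaker (or equal): the normal per-interval weight.
    return bias + intervals * per_interval


print(draw_score_asym(2800, 2600))  # -60 (the symmetric formula gives -20)
print(draw_score_asym(2600, 2800))  # 60, unchanged
```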

This is also a purely linear formula, whereas the Elo expected-score curve is not linear. So there might be a better formula for "rd", which is the second thing on my list to play around with...
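As one example of a non-linear alternative (an assumption, not something tested in this thread), the rating gap can be passed through the standard logistic Elo expectation, so the draw score grows quickly for moderate gaps and saturates for very large ones. `scale` and `bias` are hypothetical tuning constants in centipawns.

```python
def expected_score(my_rating: float, opponent_rating: float) -> float:
    """Standard logistic Elo expectation for the engine, in [0, 1]."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - my_rating) / 400.0))


def draw_score_nonlinear(my_rating: float, opponent_rating: float,
                         scale: float = 100.0, bias: float = 20.0) -> int:
    """Draw score in centipawns: bias + scale * (1 - 2E).

    1 - 2E is 0 for equal ratings, approaches +1 against much stronger
    opponents and -1 against much weaker ones, so the result stays
    within bias +/- scale.
    """
    e = expected_score(my_rating, opponent_rating)
    return round(bias + scale * (1.0 - 2.0 * e))


print(draw_score_nonlinear(2600, 2800))  # 72
print(draw_score_nonlinear(2800, 2600))  # -32
```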

Note that this only works if you know your opponent's rating. Xboard provides that when playing on ICC, and my automated referee program provides it in my cluster testing, also via the "rating -myrating- -opponent-rating-" command...
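For reference, in the xboard/WinBoard engine protocol the `rating` command carries the engine's own rating first and the opponent's second, with 0 sent for an unrated player. A minimal sketch of handling it follows; the function name is hypothetical, and treating an unrated side as "unknown" (so the engine can fall back to its default draw score) is a design assumption.

```python
from typing import Optional, Tuple


def parse_rating_command(line: str) -> Optional[Tuple[int, int]]:
    """Return (my_rating, opponent_rating) from 'rating <mine> <theirs>'.

    Returns None for any other command, for malformed input, or when either
    side is unrated (sent as 0).
    """
    parts = line.split()
    if len(parts) != 3 or parts[0] != "rating":
        return None
    try:
        mine, theirs = int(parts[1]), int(parts[2])
    except ValueError:
        return None
    if mine == 0 or theirs == 0:
        return None  # unknown rating: caller keeps its default draw score
    return mine, theirs


print(parse_rating_command("rating 2650 2400"))  # (2650, 2400)
```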

to be continued...

But in any case, +10 is always welcome...

Don · Post by **Don** » Fri Sep 16, 2011 8:57 pm

So if I understand this correctly, the formula is symmetrical, right? I'm not surprised; I tested the formula playing down and it was quite a big boost in Elo.

It's easy to see how this can improve your Elo. Imagine that your draw score is zero, you are playing an opponent 1000 Elo weaker, and your program is Black. You come out of book with a White advantage (or White still has an advantage after a few "ordinary moves"), and you find a way to draw. It would be ridiculous to go for the repetition. One loss against a player 1000 Elo weaker won't help your Elo any.

I guess the same principle applies when you are playing up.


bob · Post by **bob** » Fri Sep 16, 2011 11:08 pm

Don wrote: So if I understand this correctly the formula is symmetrical right?

I tried both symmetrical and asymmetrical approaches. Nothing worked better than simple symmetry. I tried various levels of resolution: units of 100 seemed the best, but not much better than units of 200. I even tried a function that approximates the Elo curve, i.e. 200 points worse means losing about 3 of every 4 games, and 400 points worse is not twice as bad: it means losing about 10 of every 11 games. That didn't seem to help either, but I do have the drawback that I have not searched for opponents that weak to factor into my cluster testing. All of my opponents are in the range -300 to +200, so on the + side my tests were all about the same, obviously...
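The Elo arithmetic behind this paragraph, under the standard logistic model, with $D$ the opponent's rating minus your own and $E$ your expected score:

$$
E(D) = \frac{1}{1 + 10^{D/400}}, \qquad \frac{1 - E}{E} = 10^{D/400}
$$

$$
E(200) = \frac{1}{1 + 10^{0.5}} \approx 0.24, \qquad E(400) = \frac{1}{1 + 10} \approx 0.09
$$

So 200 points down you score about 1 game in 4, and 400 points down about 1 in 11. Because the odds $(1-E)/E$ grow exponentially in $D$, doubling the gap squares the odds (about 3.16:1 becomes 10:1) rather than doubling them.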

Running it on ICC and watching as much as possible to see what this does, if anything.

Don · Post by **Don** » Fri Sep 16, 2011 11:52 pm

So when Komodo is in tournaments, I plan to simply use 1 centipawn of contempt per N Elo points. I'll estimate the Elo of my opponents conservatively to deal with uncertainty, assuming they are closer to Komodo than I think.
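A sketch of that rule, expressed as a draw score with the same sign convention as bob's formula (positive when a draw is welcome). Two assumptions are not from the post: N = 5 (which works out to 20 cP per 100 Elo) and "conservative" implemented as shrinking the estimated rating gap toward zero by a fixed factor. The function name and the `n` and `shrink` parameters are hypothetical.

```python
def draw_score_per_n(my_rating: float, opponent_estimate: float,
                     n: float = 5.0, shrink: float = 0.75) -> int:
    """1 centipawn of draw score per `n` Elo of (shrunken) rating gap.

    Positive when the opponent is estimated to be stronger (a draw is
    welcome), negative when weaker (avoid draws). `shrink` < 1 pulls the
    uncertain estimate toward our own rating.
    """
    gap = (opponent_estimate - my_rating) * shrink
    return int(gap / n)  # int() truncates toward zero


print(draw_score_per_n(3000, 2600))              # -60
print(draw_score_per_n(3000, 2600, shrink=1.0))  # -80
```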

bob · Post by **bob** » Sat Sep 17, 2011 1:57 am

Don wrote: So when Komodo is in tournaments I plan to simply use 1 centipawn contempt per N elo points.

I think one centipawn is WAY too small. My best results were with values in the 20-50 centipawn range per 100 Elo in difference. Once I got past 100 I started to see issues in trying too hard to avoid draws with weaker opponents...

BubbaTough · Post by **BubbaTough** » Sat Sep 17, 2011 2:11 am

bob wrote: I think one centipawn is WAY too small. My best results were with values in the 20-50 centipawn range per 100 Elo in difference. Once I got past 100 I started to see issues in trying too hard to avoid draws with weaker opponents...

I don't see why 20 + (x/100)*20 is better than 20 + x/5... it's not impossible, but 20 + x/5 is more intuitive.
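A quick way to see how the two proposals differ is to evaluate both over a few rating gaps, with x = opponent's rating minus your own (in Elo) and results in centipawns. The stepped version mirrors bob's rounding to the nearest 100; the linear one is 20 + x/5 truncated to whole centipawns (that truncation is an assumption). Both function names are illustrative.

```python
def stepped(x: int) -> int:
    """bob's version: 20 cp per 100-Elo interval (nearest interval) + 20."""
    intervals = (abs(x) + 50) // 100
    if x < 0:
        intervals = -intervals
    return 20 + intervals * 20


def linear(x: int) -> int:
    """Sam's suggestion 20 + x/5, truncated toward zero."""
    return 20 + int(x / 5)


for x in (-250, -149, -100, 0, 49, 50, 149, 200):
    print(x, stepped(x), linear(x))
```

They agree at every multiple of 100 Elo; in between, the stepped version jumps by 20 cP at each ±50 boundary, while 20 + x/5 moves by 1 cP every 5 Elo.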

-Sam

Don · Post by **Don** » Sat Sep 17, 2011 3:27 am

bob wrote: I think one centipawn is WAY too small.

1 centipawn per N Elo, so whether it's too big or too small depends on N, doesn't it?

bob · Post by **bob** » Sun Sep 18, 2011 5:20 pm

Don wrote: 1 centipawn per N ELO, so whether it's too big or too small depends on N doesn't it?

Yes. But in the current context, I was assuming N was in the same range I was talking about. I'm still testing and playing around with ideas here...

hgm · Post by **hgm** » Sun Sep 18, 2011 8:37 pm

When your pawn value is pretty much constant over the game phase, I would expect that the contempt has to be scaled with game phase. If you are playing a real patzer, being 100 cP down in the opening phase should not really make you go for a draw yet. Better to sac the Pawn and crush the opponent with your Queen and Rooks. OTOH, trying to avoid a draw in KRPKRP might be a completely hopeless affair even against a patzer, and giving up the Pawn to avoid a draw will be a very bad idea unless you know he cannot search deeper than 1 ply.
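A minimal sketch of that idea, assuming a common tapered-evaluation phase measure (knight/bishop = 1, rook = 2, queen = 4, so 24 at the start) that is not taken from this thread. The piece-string input, the function names, and the `floor` parameter are hypothetical.

```python
# Phase weights from a common tapered-eval convention (an assumption here).
PHASE_WEIGHTS = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24  # 4 minors * 1 + 4 rooks * 2 + 2 queens * 4


def game_phase(pieces: str) -> int:
    """Phase from a string of piece letters, either colour, e.g. 'KRPkrp'."""
    phase = sum(PHASE_WEIGHTS.get(p.upper(), 0) for p in pieces)
    return min(phase, MAX_PHASE)  # promotions could push it past the start value


def scaled_draw_score(base: int, pieces: str, floor: float = 0.0) -> int:
    """Scale a draw score toward 0 as material comes off.

    floor is the fraction kept in a bare endgame (0.0 = phase out completely).
    """
    frac = game_phase(pieces) / MAX_PHASE
    return round(base * (floor + (1.0 - floor) * frac))


start = "RNBQKBNRPPPPPPPPrnbqkbnrpppppppp"
print(scaled_draw_score(-60, start))     # full value in the opening: -60
print(scaled_draw_score(-60, "KRPkrp"))  # KRPKRP: -10
```

Setting `floor` above 0 keeps part of the draw score in the endgame instead of phasing it out entirely.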

Don · Post by **Don** » Mon Sep 19, 2011 4:39 am

hgm wrote: When your pawn value is pretty much constant over game phase, I would expect the contempt would have to be scaled with game phase.

My intuition on this is that contempt should apply mostly to the early game, and that "real" draws (such as stalemate, the 50-move rule, or late-endgame repetitions) should be scored close to zero.

However, I actually tested that (by phasing out most of the contempt as the game progresses), and it did not test as well.