After a lot of testing, and with a couple of ideas still queued up, here is what I have found works best for me so far, with an improvement of +10 Elo over the older and simpler approach.
1. I compute a value "rd" (rating difference) = opponents_rating - craftys_rating
2. If this is > 0, I add 50; if it is < 0, I subtract 50. I then divide by 100 (integer division), which simply counts the number of 100 Elo "intervals" between Crafty and the opponent.
3. I now compute draw_score = rd * 20 + 20. That is, for each 100 point difference in rating (rounded to the nearest 100, due to the +/- 50 above) the draw score is increased/decreased by 20. Before the +20 is added, the drawscore is -20 * #_of_intervals if Crafty is stronger, and +20 * #_of_intervals if Crafty is weaker.
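The three steps above can be sketched in a few lines of Python. This is only an illustration, not Crafty's actual code: the function names are made up, and it assumes the division by 100 truncates toward zero, as integer division does in C.

```python
def rating_intervals(opponent_rating: int, own_rating: int) -> int:
    """Number of 100-Elo intervals between the two players (steps 1 and 2)."""
    rd = opponent_rating - own_rating
    if rd > 0:
        rd += 50
    elif rd < 0:
        rd -= 50
    # Assumption: truncate toward zero, like C integer division.
    # Python's // floors instead, so the sign is handled by hand.
    quotient = abs(rd) // 100
    return quotient if rd >= 0 else -quotient


def draw_score(opponent_rating: int, own_rating: int,
               per_interval: int = 20, bias: int = 20) -> int:
    """Draw score in centipawns from the engine's point of view (step 3)."""
    return rating_intervals(opponent_rating, own_rating) * per_interval + bias


if __name__ == "__main__":
    for opponent in (2300, 2470, 2500, 2550, 2700):
        print(opponent, draw_score(opponent, 2500))
```

With the engine at 2500, an opponent at 2700 gives a draw score of +60 and an opponent at 2300 gives -20, while any opponent within 49 points gets just the +20 bias.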
That +20 bias was tuned for Crafty and is likely unique to it. It simply shifts the drawscore upward by 0.2 pawns in all cases. This likely means that at some points in the tree Crafty's score is just a little optimistic: it would normally play for a win and lose, whereas with the larger draw score it will play for the draw unless the score is > 0.2 (against an equal opponent, of course)...
I looked at overall match results, and at results against individual opponents (as shown in the last post on this subject). The only hypothesis left is that the above seems to do better against higher-rated opponents, but I am not sure that it is the "optimal" answer against lower-rated opponents. I am testing that next, with that "20" varied: doubled if Crafty is stronger, but left at the normal value if Crafty is weaker.
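For concreteness, the asymmetric variant being tested might look roughly like the sketch below. The function name is hypothetical, and keeping the +20 bias unchanged in this variant is an assumption; the post only says that the per-interval "20" is doubled when Crafty is the stronger side.

```python
def asymmetric_draw_score(opponent_rating: int, own_rating: int,
                          per_interval: int = 20, bias: int = 20) -> int:
    """Draw score in centipawns, with the per-interval value doubled
    when the engine is the stronger side."""
    rd = opponent_rating - own_rating
    # Nearest number of 100-Elo intervals, as a non-negative count.
    intervals = (abs(rd) + 50) // 100
    if rd < 0:
        # Engine is stronger: use twice the normal per-interval value.
        return -intervals * (2 * per_interval) + bias
    # Engine is weaker (or equal): normal per-interval value.
    return intervals * per_interval + bias


if __name__ == "__main__":
    print(asymmetric_draw_score(2300, 2500))  # engine stronger by 200
    print(asymmetric_draw_score(2700, 2500))  # engine weaker by 200
```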
This is also a purely linear formula, whereas Elo is not a linear function, so the two are not really comparable. So there might be a better formula for "rd", which is the second thing on my list to play around with...
Note that this only works if you know your opponent's rating. Xboard provides that when playing on ICC. My automated referee program provides it in my cluster testing, also via the "rating -myrating- -opponent-rating-" command...
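On the engine side, picking up the ratings from that command could be as simple as the following sketch. The parser name is made up, and the only thing assumed about the input is the form given above: the word "rating" followed by the engine's own rating and then the opponent's rating.

```python
from typing import Optional, Tuple


def parse_rating_command(line: str) -> Optional[Tuple[int, int]]:
    """Return (own_rating, opponent_rating), or None if the line is not
    a well-formed rating command."""
    parts = line.split()
    if len(parts) != 3 or parts[0] != "rating":
        return None
    try:
        return int(parts[1]), int(parts[2])
    except ValueError:
        # Non-numeric ratings: treat the command as unusable.
        return None


if __name__ == "__main__":
    print(parse_rating_command("rating 2600 2450"))
    print(parse_rating_command("ping 1"))
```

If no usable rating arrives, the engine has no rating difference to work with and would have to fall back on a fixed draw score.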
to be continued...
But in any case, +10 is always welcome...
final drawscore testing results.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: final drawscore testing results.
So if I understand this correctly, the formula is symmetrical, right? I'm not surprised; I tested the formula playing down and it was quite a big boost in Elo.
It's easy to see how this can improve your Elo. Imagine that your draw score is zero, you play 1000 Elo down, and your program is black. You come out of book with a white advantage (or after a few "ordinary moves" white still has an advantage) and you find a way to draw. It would be ridiculous to go for the repetition. One loss against a 1000 weaker player won't help your Elo any.
I guess the same principle applies when you are playing up.
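The scenario can be made concrete with a toy comparison. The numbers are illustrative only: -30 centipawns stands for "white still has a small advantage" seen from the engine's side as Black, and -180 is what draw_score = rd * 20 + 20 gives for an opponent 1000 Elo weaker (rd = -10). The function is a stand-in for what a search does implicitly when a repetition line competes with the best non-drawing line.

```python
def prefers_repetition(best_non_draw_eval: int, draw_score: int) -> bool:
    """True if taking the repetition scores at least as well as playing on.
    Both values are centipawns from the engine's point of view."""
    return draw_score >= best_non_draw_eval


if __name__ == "__main__":
    slightly_worse = -30  # illustrative evaluation of the best non-drawing line

    # Draw score of zero: the draw looks better than -30, so the engine repeats.
    print(prefers_repetition(slightly_worse, 0))

    # Draw score of -180 against a much weaker opponent: the engine plays on.
    print(prefers_repetition(slightly_worse, -180))
```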
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: final drawscore testing results.
I tried both symmetrical and asymmetrical approaches. Nothing worked better than simple symmetry. I tried various levels of resolution. Units of 100 seemed the best, but not much better compared to units of 200. I even tried a function that approximates the Elo idea, i.e. 200 points worse means losing roughly 3 of every 4; 400 points worse is not twice as bad, it means losing roughly 15 of every 16 games. That didn't seem to help either, but I do have the drawback that I have not searched for opponents that weak to factor into my cluster testing. All of my opponents are in the range -300 to +200. So on the + side my tests were all about the same, obviously...
Running it on ICC and watching as much as possible to see what this does, if anything.
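To show how non-linear the Elo curve is, here is a small sketch of the usual logistic expected-score formula. The post does not say which approximation was actually tried, so this is an assumption made purely for illustration, and expected score counts a draw as half a point.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5) for a player whose rating is
    rating_diff points above (positive) or below (negative) the opponent's."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


if __name__ == "__main__":
    for diff in (0, -100, -200, -300, -400):
        print(diff, round(expected_score(diff), 3))
```

Under this formula a 200 point deficit gives an expected score of about 0.24, and a 400 point deficit gives about 0.09, so doubling the rating gap does far more than double the disadvantage. The fractions quoted above are rounder approximations of the same effect.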
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: final drawscore testing results.
So when Komodo is in tournaments I plan to simply use 1 centipawn of contempt per N Elo points. I'll estimate the Elo of my opponents conservatively to deal with uncertainty, assuming they are closer to Komodo than I think.
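A rough sketch of that plan follows. The post gives no value for N and does not say how the conservative estimate is made, so both `elo_per_centipawn` and the `shrink` factor are hypothetical knobs, and this is not Komodo's code. Positive contempt here means the engine is the stronger side and should value a draw below zero.

```python
def contempt_centipawns(own_rating: int, opponent_estimate: int,
                        elo_per_centipawn: float, shrink: float = 0.5) -> int:
    """Contempt of 1 centipawn per `elo_per_centipawn` Elo of rating gap.

    `shrink` (hypothetical, between 0 and 1) pulls the estimated gap toward
    zero, i.e. it assumes the opponent is closer in strength than estimated.
    """
    conservative_gap = (own_rating - opponent_estimate) * shrink
    # int() truncates toward zero, so small gaps yield no contempt.
    return int(conservative_gap / elo_per_centipawn)


if __name__ == "__main__":
    # Illustrative numbers only: a 200 Elo estimated gap, halved, at N = 10.
    print(contempt_centipawns(3000, 2800, elo_per_centipawn=10))
```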
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: final drawscore testing results.
I think one centipawn is WAY too small. My best results were with values in the 20-50 centipawn range per 100 Elo of difference. Once I got past 100 I started to see issues with trying too hard to avoid draws against weaker opponents...
-
- Posts: 1154
- Joined: Fri Jun 23, 2006 5:18 am
Re: final drawscore testing results.
I don't see why 20 + (x/100)*20 is better than 20 + x/5... It's not impossible, but 20 + x/5 is more intuitive.
-Sam
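The two formulas can be compared directly. In the sketch below, `stepped` follows the rounded 100-Elo intervals described earlier in the thread (assuming truncating integer division) and `linear` is the 20 + x/5 form, with x the opponent's rating minus the engine's. The function names are made up for the comparison.

```python
def stepped(x: int) -> int:
    """20 + 20 * (x rounded to the nearest 100, counted in intervals)."""
    intervals = (abs(x) + 50) // 100
    return 20 + 20 * (intervals if x >= 0 else -intervals)


def linear(x: int) -> float:
    """The continuous alternative, 20 + x/5."""
    return 20 + x / 5


def max_gap(lo: int, hi: int) -> float:
    """Largest absolute difference between the two over lo..hi inclusive."""
    return max(abs(stepped(x) - linear(x)) for x in range(lo, hi + 1))


if __name__ == "__main__":
    for x in (-200, -50, 0, 49, 50, 100):
        print(x, stepped(x), linear(x))
    print("max gap:", max_gap(-300, 300))
```

The two agree at every multiple of 100 and never differ by more than 10 centipawns in between, so any difference in test results would have to come from that small rounding effect.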
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: final drawscore testing results.
bob wrote: I think one centipawn is WAY too small.
1 centipawn per N Elo, so whether it's too big or too small depends on N, doesn't it?
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: final drawscore testing results.
Yes. But in the current context, I was assuming N was in the same range I was talking about. I'm still testing and playing around with ideas here...
-
- Posts: 27828
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: final drawscore testing results.
When your pawn value is pretty much constant over game phase, I would expect that the contempt would have to be scaled with game phase. If you are playing a real patzer, being 100 cP down in the opening phase should not really make you go for a draw yet. Better sac the Pawn and crush the opponent with your Queen and Rooks. OTOH, trying to avoid a draw in KRPKRP might be a completely hopeless affair even against a patzer, and giving the Pawn to avoid a draw will be a very bad idea unless you know he cannot search deeper than 1 ply.
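One way to scale contempt with game phase is sketched below. The phase weights (minor = 1, rook = 2, queen = 4, for a maximum of 24) are a common convention in engines but are an assumption here, as are the function names. `base_contempt` is taken to be a non-negative magnitude whose sign the caller applies.

```python
# Assumed phase weights per piece; pawns and kings do not count.
PHASE_WEIGHTS = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24  # 4 knights + 4 bishops + 4 rooks * 2 + 2 queens * 4


def game_phase(piece_counts: dict) -> int:
    """Phase from 0 (bare endgame) to MAX_PHASE (all pieces on the board).
    piece_counts maps piece letters to the total count for both sides."""
    phase = sum(PHASE_WEIGHTS[piece] * count
                for piece, count in piece_counts.items()
                if piece in PHASE_WEIGHTS)
    return min(phase, MAX_PHASE)


def scaled_contempt(base_contempt: int, piece_counts: dict) -> int:
    """Contempt magnitude scaled linearly by the remaining material."""
    return base_contempt * game_phase(piece_counts) // MAX_PHASE


if __name__ == "__main__":
    opening = {"N": 4, "B": 4, "R": 4, "Q": 2, "P": 16}
    krpkrp = {"R": 2, "P": 2}
    print(scaled_contempt(50, opening))
    print(scaled_contempt(50, krpkrp))
```

With a base of 50 centipawns, the full contempt applies in the opening and only 8 centipawns remain in KRPKRP.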
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: final drawscore testing results.
My intuition on this is that it should apply mostly to the early game, and that "real" draws (such as stalemate, the 50-move rule, or late endgame repetitions) should be almost zero.
However, I actually tested that (by phasing out most of the contempt as the game progresses) and it did not test as well.