Dragon 2.6.1 Elo levels

lkaufman · Post by **lkaufman** » Sat Jan 29, 2022 3:00 am

We're getting enough results from human Rapid games with the Dragon 2.6.1 Elo levels to draw some preliminary conclusions, though it's still early and we need a lot more data. Some of the data is from private testing so I'll just give the tentative conclusion. It seems that Dragon 2.6.1 plays stronger than the intended level at all elo settings near or above 1700 (all the way to top human level). On average I think the error is at least 100 elo, maybe more. So until we are able to be more precise, I suggest that anyone setting the Elo for a Rapid (human) match, if your elo is 1700 or more, you should set it to 100 below your actual elo for a reasonably fair match. At 1600 and below, I don't have much reason to suspect a significant error. Our next release will adjust the settings based on the best information we have at that time.
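As a rough illustration of the rule of thumb above, here is a minimal sketch. The function name and the hard 1700 cutoff are assumptions made for the example, and the 100-point offset is only the tentative figure quoted in the post, not a final calibration.

```python
def suggested_dragon_elo(player_elo: int, offset: int = 100, threshold: int = 1700) -> int:
    """Return a tentative Elo setting for a Rapid game against Dragon 2.6.1.

    Hypothetical helper: applies the preliminary rule of thumb from the post
    (set the engine about 100 below your own rating at 1700 and above).
    """
    if player_elo >= threshold:
        return player_elo - offset
    # At 1600 and below no significant error is suspected. The post gives no
    # figure for ratings between 1600 and 1700, so they are left unchanged here.
    return player_elo


if __name__ == "__main__":
    for rating in (1500, 1700, 2100):
        print(rating, "->", suggested_dragon_elo(rating))
```

The offset is a parameter so it can be changed once better data is available.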

Chessqueen · Post by **Chessqueen** » Sat Jan 29, 2022 3:30 am

lkaufman wrote: ↑Sat Jan 29, 2022 3:00 am We're getting enough results from human Rapid games with the Dragon 2.6.1 Elo levels to draw some preliminary conclusions, though it's still early and we need a lot more data. Some of the data is from private testing so I'll just give the tentative conclusion. It seems that Dragon 2.6.1 plays stronger than the intended level at all elo settings near or above 1700 (all the way to top human level). On average I think the error is at least 100 elo, maybe more. So until we are able to be more precise, I suggest that anyone setting the Elo for a Rapid (human) match, if your elo is 1700 or more, you should set it to 100 below your actual elo for a reasonably fair match. At 1600 and below, I don't have much reason to suspect a significant error. Our next release will adjust the settings based on the best information we have at that time.

That is good to know, but it is still not clear to me. My idea is that Komodo Dragon 2.6.1 could possibly draw a 6-game match at knight odds against a GM rated around 2400 to 2425, since GM Ben Finegold was a little too strong. Or do you just have to use the MCTS version for all the knight odds matches from now on? Another question I have is why the MCTS version performs better than the standard one when playing at knight odds: is it more tactical or more positional than the standard version, or does it play riskier chess? https://ratings.fide.com/profile/2000261

lkaufman · Post by **lkaufman** » Sat Jan 29, 2022 6:48 am

Chessqueen wrote: ↑Sat Jan 29, 2022 3:30 am
That is good to know, but it is still not clear to me. My idea is that Komodo Dragon 2.6.1 could possibly draw a 6-game match at knight odds against a GM rated around 2400 to 2425, since GM Ben Finegold was a little too strong. Or do you just have to use the MCTS version for all the knight odds matches from now on? Another question I have is why the MCTS version performs better than the standard one when playing at knight odds: is it more tactical or more positional than the standard version, or does it play riskier chess? https://ratings.fide.com/profile/2000261

My comments above have to do with setting Elo levels for even-game play; they have nothing to do with knight odds play, where we use the full power of the engine. I don't know for sure that MCTS will perform better at knight odds vs humans than standard mode, but the preponderance of evidence suggests that it does (including many unofficial test games played). My estimate is that regular mode might perform around 2400 giving knight odds in Rapid, and MCTS mode more like 2450. As to why this is so, it is probably because MCTS chooses the move that does best against a variety of plausible opposition moves, whereas standard mode always assumes that the opponent will play what it would play. Humans being very different from engines, the MCTS assumption is more realistic against a human. In other words, it plays the odds, rather than trusting the opponent to play perfectly.
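The difference between the two assumptions can be shown with a toy example. This is only a sketch of the idea, not Dragon's actual search: the move names, the scores, and the equal weighting of replies are all invented for illustration.

```python
from typing import Dict

# Hypothetical toy position: for each candidate move, the score (from our
# side, in pawns) after each plausible reply by the opponent.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "solid": {"best_reply": -2.9, "natural_reply": -2.8, "careless_reply": -2.7},
    "tricky": {"best_reply": -3.2, "natural_reply": -1.5, "careless_reply": 0.5},
}


def pick_worst_case(candidates: Dict[str, Dict[str, float]]) -> str:
    """Assume the opponent always finds the reply that is worst for us."""
    return max(candidates, key=lambda move: min(candidates[move].values()))


def pick_average(candidates: Dict[str, Dict[str, float]]) -> str:
    """Average over plausible replies, weighting them equally (an assumption)."""
    def mean(scores: Dict[str, float]) -> float:
        return sum(scores.values()) / len(scores)

    return max(candidates, key=lambda move: mean(candidates[move]))


if __name__ == "__main__":
    print("worst-case choice:", pick_worst_case(CANDIDATES))
    print("average-case choice:", pick_average(CANDIDATES))
```

With these made-up numbers the worst-case rule prefers the solid move, while averaging over replies prefers the tricky one, which is the "playing the odds" behaviour described above.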

Odd Gunnar Malin · Post by **Odd Gunnar Malin** » Sat Jan 29, 2022 2:21 pm

Hi.
I see this discussion drifted away already on the first reply, but that's OK with me for now. I have started some training sessions to prepare for this summer's national championship. I play in the 50+ group and have little time to do much more in my free time.
Anyhow, my current rating with Hiarcs as GUI and Dragon 2.6.1 in 15+10 games is 1634 (21 games). I let it select the rating automatically, i.e. it sets it to my current rating. I play those games as against a real opponent, without cheating with a paper book in hand or any other information. The book I use is based on games by players I will meet in the next tournament, and I don't study the book before the game. Of course I study the line played after a game; that's why I create those books, to learn the openings I will meet.
I will keep playing these games against Dragon. If you want, I can put them up on Lichess (study) or chess.com (library) and share them.

Edit: Forgot to mention that I have created a little utility to slow down Dragon in a somewhat intelligent way. I spoke about this in an earlier thread.

Chessqueen · Post by **Chessqueen** » Sat Jan 29, 2022 3:54 pm

Odd Gunnar Malin wrote: ↑Sat Jan 29, 2022 2:21 pm Hi.
I see this discussion drifted away already on the first reply, but that's OK with me for now. I have started some training sessions to prepare for this summer's national championship. I play in the 50+ group and have little time to do much more in my free time.
Anyhow, my current rating with Hiarcs as GUI and Dragon 2.6.1 in 15+10 games is 1634 (21 games). I let it select the rating automatically, i.e. it sets it to my current rating. I play those games as against a real opponent, without cheating with a paper book in hand or any other information. The book I use is based on games by players I will meet in the next tournament, and I don't study the book before the game. Of course I study the line played after a game; that's why I create those books, to learn the openings I will meet.
I will keep playing these games against Dragon. If you want, I can put them up on Lichess (study) or chess.com (library) and share them.

Edit: Forgot to mention that I have created a little utility to slow down Dragon in a somewhat intelligent way. I spoke about this in an earlier thread.

Good luck with your training, and once you have figured out how to beat Komodo Dragon 2.6.1 at 1650 UCI_Elo, raise it to 1700.

lkaufman · Post by **lkaufman** » Sat Jan 29, 2022 6:04 pm

Odd Gunnar Malin wrote: ↑Sat Jan 29, 2022 2:21 pm Hi.
I see this discussion drifted away already on the first reply, but that's OK with me for now. I have started some training sessions to prepare for this summer's national championship. I play in the 50+ group and have little time to do much more in my free time.
Anyhow, my current rating with Hiarcs as GUI and Dragon 2.6.1 in 15+10 games is 1634 (21 games). I let it select the rating automatically, i.e. it sets it to my current rating. I play those games as against a real opponent, without cheating with a paper book in hand or any other information. The book I use is based on games by players I will meet in the next tournament, and I don't study the book before the game. Of course I study the line played after a game; that's why I create those books, to learn the openings I will meet.
I will keep playing these games against Dragon. If you want, I can put them up on Lichess (study) or chess.com (library) and share them.

Edit: Forgot to mention that I have created a little utility to slow down Dragon in a somewhat intelligent way. I spoke about this in an earlier thread.

You wrote once that your FIDE rating is 1714, so this would appear to support my hypothesis that the Elo settings on Dragon 2.6.1 play somewhat stronger than the stated elo at 15' + 10" time control. Results are pretty consistent on this point.
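To put the 80-point gap between 1714 and 1634 in perspective, the standard Elo expected-score formula can be sketched as follows. This is the textbook formula, not anything taken from Dragon, and with only 21 games the gap is a rough indication at best.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Ratings quoted in the thread: FIDE rating vs. rating achieved against Dragon.
    fide_rating = 1714
    rating_vs_dragon = 1634
    print(round(expected_score(fide_rating, rating_vs_dragon), 3))  # prints 0.613
```

In other words, an 80-point rating difference corresponds to the higher-rated side scoring roughly 61% instead of 50%.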

Odd Gunnar Malin · Post by **Odd Gunnar Malin** » Sat Jan 29, 2022 7:26 pm

lkaufman wrote: ↑Sat Jan 29, 2022 6:04 pm
Odd Gunnar Malin wrote: ↑Sat Jan 29, 2022 2:21 pm Anyhow, my current rating with Hiarcs as GUI and Dragon 2.6.1 in 15+10 games is 1634 (21 games).
You wrote once that your FIDE rating is 1714, so this would appear to support my hypothesis that the Elo settings on Dragon 2.6.1 play somewhat stronger than the stated elo at 15' + 10" time control. Results are pretty consistent on this point.

Yes, I know. It was to confirm your findings.

Cornfed · Post by **Cornfed** » Sun Jan 30, 2022 3:54 am

Chessqueen wrote: ↑Sat Jan 29, 2022 3:54 pm
Good luck with your training, and once you have figured out how to beat Komodo Dragon 2.6.1 at 1650 UCI_Elo, raise it to 1700.

I am no programmer, but as different playing sites' ratings vary so much and so many play online these days...and OTB is more of a 'gold standard' because we try harder...the cat or wife doesn't interrupt us, etc...would it ever be possible to let Dragon analyze, say, 50 games of an individual and determine for itself a suitable level of play to let me theoretically get close to a 50/50 result? It would seem to work for either OTB at long time controls or even online at 15min/10sec...or whatever.

Let's say Dragon, based on a good set of my games (I'm just saying 50 as an example), deemed me a 2050 player. I could use that and be more assured of a 50/50 result, with no 'adjusting on the fly', game to game, as I think Fritz tries to do. If I wanted to 'play up', I could set it to 2200 and still have my chances on a given day.

lkaufman · Post by **lkaufman** » Sun Jan 30, 2022 6:06 am

Cornfed wrote: ↑Sun Jan 30, 2022 3:54 am
I am no programmer, but as different playing sites' ratings vary so much and so many play online these days...and OTB is more of a 'gold standard' because we try harder...the cat or wife doesn't interrupt us, etc...would it ever be possible to let Dragon analyze, say, 50 games of an individual and determine for itself a suitable level of play to let me theoretically get close to a 50/50 result? It would seem to work for either OTB at long time controls or even online at 15min/10sec...or whatever.

Let's say Dragon, based on a good set of my games (I'm just saying 50 as an example), deemed me a 2050 player. I could use that and be more assured of a 50/50 result, with no 'adjusting on the fly', game to game, as I think Fritz tries to do. If I wanted to 'play up', I could set it to 2200 and still have my chances on a given day.

It is certainly possible to develop a version of Dragon that would review a file of games and (for the time limit at which the games were played) give an estimated rating. We don't currently have good data on the average error rate at multiple time limits for players of various ratings (which of course in turn depends on whether we are talking about OTB FIDE ratings or online game scores with online ratings), but there is no problem in principle, it just takes a lot of work. But I think we'll soon be able to say with some precision that if your FIDE rating is X or your chess.com Rapid rating is Y or your lichess rapid rating is Z (given enough games to be valid), then do this simple calculation to determine a fair setting for Dragon Elo. If you only have an online rating and it is unrealistic due to interruptions or playing drunk or whatever, you just need to play 50 or so games under proper conditions to see what your real level is. Most likely this would be more accurate than a rating based on reviewing games, although I would very much like to be able to estimate ratings at various time limits from game scores. Then we could answer questions such as "Does Magnus Carlsen play better Rapid chess than Botvinnik or Tal played Classical chess?", or "What Classical Time limit Elo rating today would be of equal quality to Hikaru playing blitz?" or "Who would win a match between Ben Finegold and Paul Morphy?".

Cornfed · Post by **Cornfed** » Sun Jan 30, 2022 6:09 pm

lkaufman wrote: ↑Sun Jan 30, 2022 6:06 am
It is certainly possible to develop a version of Dragon that would review a file of games and (for the time limit at which the games were played) give an estimated rating. We don't currently have good data on the average error rate at multiple time limits for players of various ratings (which of course in turn depends on whether we are talking about OTB FIDE ratings or online game scores with online ratings), but there is no problem in principle, it just takes a lot of work. But I think we'll soon be able to say with some precision that if your FIDE rating is X or your chess.com Rapid rating is Y or your lichess rapid rating is Z (given enough games to be valid), then do this simple calculation to determine a fair setting for Dragon Elo. If you only have an online rating and it is unrealistic due to interruptions or playing drunk or whatever, you just need to play 50 or so games under proper conditions to see what your real level is. Most likely this would be more accurate than a rating based on reviewing games, although I would very much like to be able to estimate ratings at various time limits from game scores. Then we could answer questions such as "Does Magnus Carlsen play better Rapid chess than Botvinnik or Tal played Classical chess?", or "What Classical Time limit Elo rating today would be of equal quality to Hikaru playing blitz?" or "Who would win a match between Ben Finegold and Paul Morphy?".

Yes, you get the idea! I like it because it is tailor-made to the individual, and you, the individual, do not have to try to arrive at different ratings at different TCs one game at a time.

Off the top of my head I see possible issues (perhaps just phantoms...) with your approach, perhaps ones which have occurred to you as well.

Online: almost all my online games (most everyone's, really) are blitz. While I've beaten players rated as high as 2700 on chess.com, I can certainly lose to people rated lower than myself. Truthfully, my settings only allow games against opponents rated no more than 50 points below me, with no upper limit, as that is more testing, so I lose more than I would if I played a wider range of players.

My local club has their monthly G30+5 today on lichess. I quit playing in those over a year ago because, even with all that time, people still tend to blitz their moves out and blunder, and it all became a pointless waste of an evening for me. Of course, it is hard for everyone to maintain their focus in longer games 'online'...especially when it's 'for fun' with no money involved. My 'Classical' (25 min +) rating there was 2258 and rising before I quit. Rapid? Similar, but only after 19 games. On chess.com my rating (I forget what it is, 2190 I think) benefited from various opponents having been caught cheating, and because they gift you rating points as if you had won (as I recall), my rating there is probably well north of what it should be. It certainly is in "daily" (Correspondence), where I've been gifted hundreds of rating points by their cheat detection.

I know comparatively few people have ever played enough FIDE-rated games. I've played USCF tournament chess for right at 50 years now and, despite hundreds of tournaments, have never been able to play in a FIDE-rated event...other than a World Open, but that wasn't too many games. I wonder if the same holds true in other countries around the world which have their own rating systems? Perhaps there is a good formula for each to match those individuals to a FIDE rating...but not likely across various time controls (?). The process just seems so convoluted, with a lot of issues in comparing ratings which are iffy from site to site anyway.

Yes, being able to have Dragon point everyone to some 'truth' as to how different players of different generations might fare against others would be unique and great fodder for message boards indeed. That, going hand in hand with being able to feed it 50 or so of one's own games and let it set an internal level equal to one's own, would certainly be something Team Stockfish would never offer.

Their only reason for being is to pursue Elo.
