Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?

mmt · Post by **mmt** » Thu Feb 06, 2020 10:54 pm

Ovyron wrote: ↑Thu Feb 06, 2020 9:54 pm Not if it produces a lot more games where the engine presents a high eval in a drawn game but fewer wins in general.

It's not a problem if an engine gets a better score in drawn positions that seem like an advantage to it. Moving towards a winning score would still happen because wins would still count much more. And you can calibrate the eval to the actual W/D/L like I suggested.
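To make the idea concrete, here is a minimal sketch of one way a partial draw score could work. The exact formula is not given in this part of the thread, so everything here is an assumption for illustration: the logistic curve, the `scale` and `draw_weight` parameters, and the use of a single average eval per game. Wins and losses keep their full value, and a draw moves only a little away from 0.5.

```python
import math

def expected_score(eval_pawns, scale=4.0):
    """Map an eval in pawns to an expected score in (0, 1).

    Assumption: a logistic curve. `scale` is a hypothetical constant that
    would have to be calibrated against the real W/D/L results.
    """
    return 1.0 / (1.0 + 10.0 ** (-eval_pawns / scale))

def game_score(result, avg_eval_pawns, draw_weight=0.2):
    """Score one game from the engine's point of view.

    Wins and losses are untouched. A draw is nudged away from 0.5 in the
    direction of the eval, but `draw_weight` keeps the nudge small so that
    a real win always counts much more.
    """
    if result == "win":
        return 1.0
    if result == "loss":
        return 0.0
    return 0.5 + draw_weight * (expected_score(avg_eval_pawns) - 0.5)

if __name__ == "__main__":
    for result, avg_eval in [("win", 3.0), ("draw", 1.5), ("draw", 0.0), ("draw", -1.5)]:
        print(result, avg_eval, round(game_score(result, avg_eval), 3))
```

With `draw_weight=0.2`, a draw can never score above 0.6 or below 0.4, so it stays well short of a win or a loss.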

Ovyron · Post by **Ovyron** » Thu Feb 06, 2020 11:06 pm

Okay then, but specifically for engine developers the number of games played plays a role, because ELO gets more accurate with more games. If you're going to extract information from drawn games, then why not just adjudicate the games in some manner and play more of them?

Say, instead of playing to mate, you play until both versions show a score of 2.00 and count that as a win (or -2.00 as a loss), then play more games. What is the lowest score that you can use to adjudicate? Counting draws as draws but playing more games might produce a more accurate ELO than playing a draw to the end and adjusting how much the engines that drew get. Instead of playing on after some 1.00 score is reached, for either a full point or a draw plus bonus, you just award the full point right away and play another game. Would playing more games be better than fewer games with draw bonuses/penalties?
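The adjudication rule described above could be sketched roughly like this. It assumes both engines' evals are available per ply and are both expressed in pawns from the first engine's point of view; the function name and the input layout are made up for the example.

```python
def adjudicate(evals_a, evals_b, threshold=2.00):
    """Stop a game early once both engines agree on a decisive score.

    evals_a, evals_b: per-ply evals in pawns reported by engine A and
    engine B, both from engine A's point of view (an assumption of this
    sketch). Returns (result_for_a, ply) or None if the game must go on.
    """
    for ply, (eval_a, eval_b) in enumerate(zip(evals_a, evals_b)):
        if eval_a >= threshold and eval_b >= threshold:
            return "win", ply
        if eval_a <= -threshold and eval_b <= -threshold:
            return "loss", ply
    return None

if __name__ == "__main__":
    print(adjudicate([0.3, 1.1, 2.4], [0.2, 0.9, 2.1]))  # both reach 2.00 at ply 2
    print(adjudicate([0.1, 2.5], [0.0, 1.5]))            # engines disagree, no result
```

Lowering `threshold` saves more time per game but raises the risk of scoring a position as won that would have ended in a draw, which is the trade-off the question is about.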

mmt · Post by **mmt** » Fri Feb 07, 2020 1:31 am

With early adjudication, you'd have no idea how the program works when it's far ahead or far behind and that's still very important. With my method, you still have this info.

Ovyron · Post by **Ovyron** » Fri Feb 07, 2020 2:32 am

mmt wrote: ↑Fri Feb 07, 2020 1:31 am With early adjudication, you'd have no idea how the program works when it's far ahead or far behind and that's still very important. With my method, you still have this info.

An engine that is far ahead and misses the winning continuation, yet gets more elo, or another that is far behind but manages to find the drawing continuation, yet gets less elo, don't tell us more than one that got to that elo by actually winning games.

I do hope someone implements your idea so we get to see how this works in practice, though. What percentage of games do you think can be saved (they don't need to be played) by scoring draws depending on how the games went? We don't need to play games for this, we just need already played games and the resulting ELO, then we get rid of this portion of the games and see if we can get the known ELO to be matched with fewer games by looking at draws.
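The experiment proposed here, reusing games that were already played, might look something like the sketch below. It is only an outline under assumptions: `scores` is a list of per-game scores for one engine (1, 0.5, 0 for plain scoring, or draw scores adjusted by some partial-scoring rule), the reference ELO comes from the full set of games, and the helper names are invented for the example.

```python
import math
import random

def elo_from_scores(scores):
    """ELO difference implied by an average score (standard logistic formula)."""
    s = sum(scores) / len(scores)
    s = min(max(s, 1e-6), 1.0 - 1e-6)  # avoid log of zero on 0% or 100% samples
    return -400.0 * math.log10(1.0 / s - 1.0)

def subsample_error(scores, reference_elo, keep_fraction, trials=200, seed=0):
    """Mean absolute ELO error when only a fraction of the games is kept."""
    rng = random.Random(seed)
    keep = max(1, int(len(scores) * keep_fraction))
    errors = []
    for _ in range(trials):
        sample = rng.sample(scores, keep)
        errors.append(abs(elo_from_scores(sample) - reference_elo))
    return sum(errors) / len(errors)

if __name__ == "__main__":
    # Made-up results: 40 wins, 40 draws, 20 losses with plain draw scoring.
    plain = [1.0] * 40 + [0.5] * 40 + [0.0] * 20
    reference = elo_from_scores(plain)
    for fraction in (1.0, 0.5, 0.25):
        print(fraction, round(subsample_error(plain, reference, fraction), 1))
```

Running it once with plain draw scores and once with partially scored draws, against the same reference ELO, would show whether the partial scores get close to the known ELO with fewer games.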

mmt · Post by **mmt** » Fri Feb 07, 2020 4:25 am

Ovyron wrote: ↑Fri Feb 07, 2020 2:32 am
An engine that is far ahead and misses the winning continuation, yet gets more elo, or another that is far behind but manages to find the drawing continuation, yet gets less elo, don't tell us more than one that got to that elo by actually winning games.

It's not about which engine gets more or less ELO at all. I doubt any tournaments would use Maz scoring. It's about which engine is better, like my soccer example.

Ovyron wrote: ↑Fri Feb 07, 2020 2:32 am I do hope someone implements your idea so we get to see how this works in practice, though. What percentage of games do you think can be saved (they don't need to be played) by scoring draws depending on how the games went? We don't need to play games for this, we just need already played games and the resulting ELO, then we get rid of this portion of the games and see if we can get the known ELO to be matched with fewer games by looking at draws.

It doesn't seem like a hard project to implement. I think there is no use guessing if we can get real numbers. I'll do it when I have some time.

Ovyron · Post by **Ovyron** » Fri Feb 07, 2020 5:59 am

mmt wrote: ↑Fri Feb 07, 2020 4:25 am It's not about which engine gets more or less ELO at all. I doubt any tournaments would use Maz scoring. It's about which engine is better, like my soccer example.

ELO is all about which engine is better. People spend their time testing engines ad nauseam because that's what ELO promises; it even has error bars that shrink with more games and a "Likelihood of Superiority" that is supposed to guarantee scientifically that one engine is better than another once it reaches 100%. Any method used to rank engines from better to worse would be approached by ELO after enough games.

If you managed to sort engines from worst to best while skipping ELO, without having to play so many games, it'd be a revolution!
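For reference, the ELO difference and "Likelihood of Superiority" mentioned above are usually computed from match results along these lines. This is a sketch using the commonly quoted formulas (logistic ELO from the score fraction, and a normal approximation for LOS that ignores draws); individual rating tools may compute them differently.

```python
import math

def elo_diff(wins, draws, losses):
    """ELO difference implied by a match score, with draws counted as 0.5."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)

def los(wins, losses):
    """Likelihood of superiority from decisive games only (normal approximation)."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

if __name__ == "__main__":
    print(round(elo_diff(75, 0, 25), 1))  # 75% score
    print(round(los(60, 40), 3))          # 60 wins against 40 losses
    print(los(10, 10))                    # equal results
```

Note that draws drop out of this LOS formula entirely, and that mathematically it only approaches 100% as the win margin grows.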
