michiguel wrote: I do not see any compelling reason why a given game should weigh more than others.
Miguel
A win against a strong opponent is surely worth more than a win against a weaker opponent?
All rating systems give that outcome. The question is: Should a win against a strong opponent and a loss to a weak opponent be treated differently than a loss to a strong opponent and a win against a weak opponent?
Right, that is the way to frame it.
You play A and B. Should winning against A and losing to B be treated differently from losing to A and beating B? Or differently from drawing against both?
Ideally, winning against a stronger opponent and losing to a weaker one should give the same rating but a bigger rating variance (in other words, a two-dimensional model should be used).
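For concreteness, here is a minimal sketch of the plain one-dimensional Elo/Bradley-Terry model behind this exchange (not Ordo's or BayesElo's actual code; the opponent ratings 2800 and 2400 are invented). The maximum-likelihood performance rating against a fixed set of opponents depends only on the total score, so all three outcome patterns below get the same estimate:

```python
def expected(r, opp, scale=400.0):
    """Expected per-game score of a player rated r against an opponent rated opp."""
    return 1.0 / (1.0 + 10.0 ** ((opp - r) / scale))

def performance(results, lo=-4000.0, hi=4000.0):
    """Rating r solving sum_i expected(r, opp_i) = total score, by bisection."""
    opps = [opp for opp, s in results]
    total = sum(s for opp, s in results)
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if sum(expected(mid, o) for o in opps) < total:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

A, B = 2800.0, 2400.0                      # hypothetical opponent ratings
print(performance([(A, 1.0), (B, 0.0)]))   # beat the strong one, lost to the weak one
print(performance([(A, 0.0), (B, 1.0)]))   # the reverse
print(performance([(A, 0.5), (B, 0.5)]))   # two draws -- all three print 2600.0
```

Note that in this one-dimensional model even the error bar comes out the same in all three cases, since the curvature of the likelihood depends only on the opponents; making the mixed result "noisier" than two draws is exactly what the two-dimensional suggestion above is about.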
michiguel wrote: Should a win against a strong opponent and a loss to a weak opponent be treated differently than a loss to a strong opponent and a win against a weak opponent?
Rating, as a statistical quantity, is the strength of an engine, which is unknown and is merely estimated from its performance. Therefore you cannot use a strength estimate derived from the same results as a priori information for estimating strength, because that would be circular reasoning.
You could only do that if your a priori strength estimate is independent (not derived from the current results) and if you knew how reliable it is (i.e. its error bars).
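A sketch of the legitimate use Milos describes: an independent earlier estimate with known error bars combined with the current performance. This assumes both estimates are Gaussian, and all numbers are invented; it is not BayesElo's actual scheme:

```python
def combine(prior_mean, prior_sd, perf_mean, perf_sd):
    """Precision-weighted average of two independent Gaussian estimates."""
    w1 = 1.0 / prior_sd ** 2
    w2 = 1.0 / perf_sd ** 2
    mean = (w1 * prior_mean + w2 * perf_mean) / (w1 + w2)
    sd = (w1 + w2) ** -0.5
    return mean, sd

# Prior from an earlier, independent event; performance from the current one.
print(combine(2700.0, 50.0, 2760.0, 30.0))  # -> (~2744, ~25.7)
```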
Look at the engine rankings, then look at the "Performance" column, which is the Elo calculation of the engines based only on the games played in that tournament. Note that WaDuttie, in 8th place with 5 points, has a higher Elo performance rating than Telepath in 4th with 5.5 points. That directly contradicts what Larry and Miguel have been saying, yes? Or have I misunderstood? I don't know what tool is used for those ratings.
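Hypothetical numbers (the actual crosstable is not reproduced here) showing how that can happen under a logistic Elo model: a performance rating depends on the strength of the opposition, not only on the points scored.

```python
import math

def perf(avg_opp, points, games, scale=400.0):
    """Performance rating against a field of roughly equal-rated opponents."""
    s = points / games
    return avg_opp + scale * math.log10(s / (1.0 - s))

print(perf(2500.0, 5.0, 9))  # ~2539: 5.0 points against a 2500-rated field
print(perf(2350.0, 5.5, 9))  # ~2429: 5.5 points against a 2350-rated field
```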
Milos wrote: You could only do that if your a priori strength estimation is independent (not derived from the current results) and if you knew how reliable it is (i.e. error bars).
I agree. All of the games must be weighted the same unless we have a priori information on which to base the weighting.
Adam Hair wrote: One assumption being used in the discussion is that the engines play a RR. Telepath and Waduuttie did not play the same opponents.
Exactly.
In a chess tournament with 10 players that play a single or double RR, would anyone dispute that the player with the highest number of points at the end is the winner?
If the tournament is not RR, then obviously you need to resort to tricky calculations that take into account the strengths of the opponents you've played.
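Miguel's round-robin claim can be checked mechanically: in a balanced RR, fitting a plain one-dimensional Elo/Bradley-Terry model to the crosstable reproduces the ordering by raw points. A small sketch with an invented 4-engine double-RR crosstable (not Ordo's actual fitting code):

```python
# score[i][j] = points engine i took from engine j (2 games per pair, invented)
score = [
    [0.0, 1.5, 1.5, 2.0],
    [0.5, 0.0, 1.0, 1.5],
    [0.5, 1.0, 0.0, 1.0],
    [0.0, 0.5, 1.0, 0.0],
]
n, games_per_pair = 4, 2.0

def expected(d, scale=400.0):
    """Expected per-game score at rating difference d."""
    return 1.0 / (1.0 + 10.0 ** (-d / scale))

ratings = [0.0] * n
for _ in range(5000):  # crude fixed-step ascent on the log-likelihood
    for i in range(n):
        grad = sum(score[i][j] - games_per_pair * expected(ratings[i] - ratings[j])
                   for j in range(n) if j != i)
        ratings[i] += grad

by_points  = sorted(range(n), key=lambda i: -sum(score[i]))
by_ratings = sorted(range(n), key=lambda i: -ratings[i])
print(by_points, by_ratings)  # identical orderings: [0, 1, 2, 3] twice
```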
RR is the best format, in my opinion.
Dr.D
That is very good, as everyone can take the list he likes.
Regards
Ingo
I took the results that ended in a draw and saved them into a separate file. Then I pasted those results into the original file (which means every draw is present twice). Then I ran Ordo with that file (td.pgn) and reproduced exactly what BayesElo had, except that I had to expand the scale a bit (with -z243.5; the default is 202, i.e. a 76% winning expectancy), since counting the draws twice contracted the scale.
So, the reason for the discrepancy in the ranking order is exactly that: BE counts the draws twice, Ordo once.
Miguel
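The contraction is easy to reproduce with toy numbers (these are invented, not the thread's td.pgn data): duplicating every draw pulls each score fraction toward 50%, which shrinks all implied rating differences, and a larger scale (the -z value) is then needed to stretch them back.

```python
import math

def elo_diff(score, games, scale=400.0):
    """Elo difference implied by a score fraction under the logistic model."""
    s = score / games
    return scale * math.log10(s / (1.0 - s))

wins, draws, losses = 6, 2, 2
games = wins + draws + losses
d_once  = elo_diff(wins + 0.5 * draws, games)    # draws counted once  -> ~147 Elo
d_twice = elo_diff(wins + draws, games + draws)  # each draw duplicated -> ~120 Elo
print(d_once, d_twice, d_once / d_twice)         # contraction ratio ~1.22
```

In this toy case the contraction factor is about 1.22, the same ballpark as the 243.5/202 ≈ 1.21 rescaling Miguel reports above, though the exact factor depends on the overall draw rate.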
Rating lists should be as free of controversy as possible. It seems that BE is controversial and Ordo is less so. Maybe Ordo is better suited for rating lists.