Don wrote:
Another problem I have identified is the contempt factor. I am of the opinion that the relative strength of the opponent should be communicated to the engine and built into the protocol, because this is becoming an important issue too. It can be worth 100 Elo or more if you are playing way up or down by several hundred Elo (I studied this too). Unless that is handled, a perfect player will draw far too many games against weak opponents. In human play it's rare not to have a rough idea of the strength of your opponent. That can be communicated via some user-defined contempt factor, but I would like to see it built into the protocol.
Don
I think the big problem is not building strength-of-opponent information into the protocol (in fact I think there is support for this using the UCI_Opponent option); it's getting people to use it. If you play on ICC you can get this information automatically, but for things like testing and rating groups it is a long hill to climb to convince the community to support that.
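For reference, the UCI protocol defines UCI_Opponent as a string option of the form "[GM|IM|FM|WGM|WIM|none] [elo|none] [computer|human] name". A minimal Python sketch of parsing it (the dict layout here is just one possible choice, not anything standardized):

```python
def parse_uci_opponent(value):
    """Parse a UCI_Opponent string:
    '[GM|IM|FM|WGM|WIM|none] [<elo>|none] [computer|human] <name>'."""
    title, elo, kind, name = value.split(" ", 3)
    return {
        "title": None if title == "none" else title,
        "elo": None if elo == "none" else int(elo),
        "computer": kind == "computer",
        "name": name,
    }

# e.g. sent by the GUI via 'setoption name UCI_Opponent value ...':
opp = parse_uci_opponent("GM 2800 human Garry Kasparov")
```

With the rating in hand (or None for "I don't know"), the engine is free to scale its contempt however it likes.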
Laskos wrote:It was Houdini 3 ply 14 vs ply 15, ultra-bullet, 80% draws are impossible. The highest I have seen at very long time controls is 73% or so.
Correct, with Houdini 3 this is nearly impossible - it suggests that the opening positions for the test are not well chosen.
Even at long time controls Houdini will have close to 50% decided games. See for example the 90 min + 30 sec/move tests I played with the Houdini 3 beta against Houdini 2, Stockfish 2.3.1 and Komodo 5, which overall came to +135 -50 =175.
Robert
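For reference, Robert's +135 -50 =175 result can be turned into an implied Elo difference with the standard logistic rating model (a generic formula, not tied to any particular rating list):

```python
import math

def elo_from_match(wins, losses, draws):
    """Elo difference implied by a match score under the logistic model."""
    score = (wins + draws / 2) / (wins + losses + draws)
    return -400 * math.log10(1 / score - 1)

diff = elo_from_match(135, 50, 175)       # about +84 Elo
decisive = (135 + 50) / (135 + 50 + 175)  # about 51% decided games
```

The decisive-game fraction of roughly 51% is what "close to 50% decided games" refers to above.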
The problem with your data is that we don't know whether the relatively low draw rate here is because of Houdini's superiority or its style.
But it does raise an issue I wanted to mention. How much a program draws is not just a function of Elo; it's also a function of style. Some programs are more willing to take chances to win games and some are less willing. I believe that Houdini is more of a risk taker than many other programs, including Komodo, and perhaps Komodo is one of the least willing to take risks among the top programs.
In order to talk about this in a meaningful way and have it make sense, we have to isolate the style of a program from its strength. I propose the following experiment - as a kind of stylistic "risk-averseness" quotient for any given chess program:
Play a round robin with a variety of programs of different styles. Pick an arbitrary reference program and time-adjust all the other programs to play at the same strength - as closely as feasible. If you succeed, all programs should score very close to 50%.
The programs that are willing to take risks, while having the same 50% score, will also have more decisive results. For example, if we assume that Houdini is a more dynamic and aggressive program than most, it should lose more games than the other programs in exchange for more wins.
I don't think Houdini or Komodo represent the extremes, however. The very old Genius program had a reputation of being "boring" and not taking chances. The Kittinger programs were the opposite. I think a modern-day version of Genius would make Komodo look like a risk-taking monster, and a modern-day time-adjusted Constellation program would make Houdini seem like a timid and careful player. I don't know enough about the style of other interesting programs such as Junior and Hiarcs - how would they rate?
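The proposed quotient is easy to state in code: once the engines are time-adjusted to equal strength, style shows up as the fraction of decisive games. A sketch, with invented records for two hypothetical engines:

```python
def score(wins, losses, draws):
    """Match score as a fraction of available points."""
    return (wins + draws / 2) / (wins + losses + draws)

def risk_quotient(wins, losses, draws):
    """Fraction of decisive games - a style proxy once strength is equalized."""
    return (wins + losses) / (wins + losses + draws)

# Both hypothetical engines score exactly 50%, but with different styles:
aggressive = risk_quotient(30, 30, 40)  # 0.6 - many decisive games
solid = risk_quotient(15, 15, 70)       # 0.3 - mostly draws
```

The point of the time-adjustment step is exactly this: with the score pinned at 50%, the decisive-game fraction measures style alone.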
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Laskos wrote:I played a gauntlet with Komodo 5 at different time/move vs Houdini 3 at 1s/move. This took some time because the time controls are not very short, and I have only seen such tests performed at ultra-short fixed-time or fixed-depth controls (by Don and Adam).
From 2s/move to 4s/move (blitz) +81 Elo points
From 1s/move to 2s/move +93 Elo points
From 0.5s/move to 1s/move (bullet) +107 Elo points
The fit is:
107*0.87^(log2 t) ~ 107*t^(-0.20) Elo points per doubling of time (or of cores, assuming perfect scaling), where t is the time per move in seconds.
Extrapolating to longer time controls: for 120min/40 moves on one core it gives 107*180^(-0.20) ~ 40 Elo points per doubling of time. On eight cores at the same 120min/40moves LTC it's ~30 Elo points per doubling. Of course, this is an extrapolation.
Further speculation: out to an infinite time control, the improvement from 1s/move is 107/(1-0.87) ~ 820 Elo points, so Komodo 5 is limited to something like 4000 Elo points strength (calibrated to the current lists) at infinite time control.
I think the formula 107*(time per move in seconds)^(-0.20) Elo points is useful as a rule of thumb for the gain from doubling time. This is on one modern core; on several cores the time should be multiplied by the number of cores.
Kai
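As a numerical check, the rule of thumb above can be coded directly (the constants 107 and -0.20 are taken from the fit; the long-time-control value is an extrapolation, not a measurement):

```python
def elo_per_doubling(t_seconds):
    """Fitted Elo gain from one doubling of time, on one modern core."""
    return 107 * t_seconds ** -0.20

bullet = elo_per_doubling(1)    # 107, matches the 0.5s -> 1s measurement
blitz = elo_per_doubling(4)     # ~81, matches the 2s -> 4s measurement
ltc = elo_per_doubling(180)     # ~38, the 120min/40moves extrapolation

# Geometric series of all future doublings from 1 s/move:
ceiling = 107 / (1 - 0.87)      # ~823 Elo above the 1 s/move level
```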
Based on a study of the public blitz rating lists, I concluded that at that level the average value of a doubling for all Komodo versions was 90 Elo. This is at a 6" average level (40/4'). Your results are lower than this, but not drastically so. Part of the difference may be that as each Komodo version gets stronger than the previous one, the doubling value tends to decline, as it is effectively moving up the curve toward higher time limits. Put another way, the average Komodo on the lists may play about like Komodo 5 at the 3" level, for which you show 81 Elo, only 9 less.
Robert wrote:Correct, with Houdini 3 this is nearly impossible - it suggests that the opening positions for the test are not well chosen. Even at long time control Houdini will have close to 50% decided games.
Note that it is possible that playing Houdini 3 against Houdini 3, rather than against other programs, increases the number of draws.
His results only showed that Houdini was superior - it's difficult to draw any other conclusion from them, except that the draw ratio is actually incredibly high for such a good result.
I still think that 15 plies against 14 plies should not give so many draws unless you have a bad choice of opening positions (for example, using a big book that causes the engines to start playing only at move 20 or move 30 is a bad idea).
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Sam wrote:I think the big problem is not building strength of opponent information into the protocol (in fact I think there is support for this using the UCI_Opponent option); it's getting people to use it.
But you can still build support for it. The default is "I don't know", which mirrors the human situation. We either know the opponent's rating or we don't. With computers it's always "we don't."
You also have to provide the "I don't know" option so that the program can use its own default, as it's not necessarily zero.
But with UCI_Opponent I guess you are right, it's already built in. In tournaments we actually do adjust the draw-score parameter in Komodo.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
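The "I don't know" default can be sketched as follows; the function name, numbers, and clamping are purely illustrative assumptions, not Komodo's actual scheme:

```python
def draw_score_offset(own_elo, opponent_elo=None, engine_default=0):
    """Choose a contempt-style draw-score offset in centipawns (illustrative).

    opponent_elo=None is the 'I don't know' case: fall back to the
    engine's own default, which is not necessarily zero."""
    if opponent_elo is None:
        return engine_default
    diff = own_elo - opponent_elo
    # Shun draws against much weaker opposition, welcome them against stronger.
    return max(-100, min(100, diff // 4))
```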
Based on a study of the public blitz rating lists, I concluded that at that level the average value of a doubling for all Komodo versions was 90 elo. This is at a 6" average level (40/4'). Your results are lower than this but not drastically so. Part of the difference may be because as each Komodo version gets stronger than the previous one, the doubling value tends to decline as it is effectively moving up the curve to higher time limits. Put another way, the average Komodo on the lists may play about like Komodo 5 at the 3" level, for which you show 81 elo, only 9 less.
What you say seems to confirm my results: the cores I test on are modern i7 3.5GHz cores, about twice as fast as the cores behind lists like the 40/4' one, so effectively their 6'' is equivalent to my 3''. Yes, 80-90 points at their 6'' (my 3'') is very plausible. I was curious about the extrapolation to 120min/40moves: my extrapolation gives some 40 points gain per doubling on one core, 30 points on 8 cores, at this long time control. Do you have empirical data at this TC to confirm the prediction?