Doubling of time control

IWB · Post by **IWB** » Sat Oct 22, 2016 4:17 pm

corres wrote:
IWB wrote:
corres wrote:
IWB wrote: ... and you don't need long time controls to get a proper ranking for rating lists, you just compress the result and make it more difficult to produce and to distinguish entries ...
You would be right if all engines responded in the same way to a change in move time.
But that is not the case.
I am pretty sure I am (given a decent time overall)!
I am running 5 + 3 and the only difference to longer time controls I can see is that the longer ones are more compressed. Every change in ranking is within error margins (unfortunately the longer lists have ridiculously huge error margins). The threshold for producing a proper list seems to be below that 5 + 3 (ponder on, one core, average HW). Ponder off I don't know, but the CEGT 40/4 is not that different from my list, so I assume that is OK too, even if the overall processing time (with one core) is below mine ...
If the difference in Elo between participants is rather high and/or the difference in move time is not too big, the order of engines may be the same. However, if shifting the move time had no effect, then the TCEC Rapid would be won by Stockfish and not Houdini, for example.

No, you forget the error margins, I mentioned them. TCEC is by far not a rating list. Within the longer time rating lists the error bar is huge, so sometimes an engine might change position, but it doesn't move into another league.

To take it to the extreme: you can't determine the best engine from a few games (TCEC); you can, however, determine the winner of a tourney.

Error margins!

Btw, the data shown here indicate something different from what you believe.

Regards
Ingo

PS: maybe Houdini IS better than Stockfish. I haven't seen any data to conclude one or the other ... but that's not the point of this discussion.
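
To put numbers on the error-margin point above, here is a minimal sketch, assuming the usual logistic Elo model and treating games as independent trials. The function names and the win/draw/loss counts are made up for illustration; they are not taken from TCEC or any rating list.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction (0 < score < 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def _clamp(p, eps=1e-9):
    # Keep the score strictly inside (0, 1) so the logarithm stays defined.
    return min(max(p, eps), 1.0 - eps)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high): point estimate and an approximate 95% interval."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    low = elo_from_score(_clamp(score - z * se))
    high = elo_from_score(_clamp(score + z * se))
    return elo_from_score(_clamp(score)), low, high

# Hypothetical match: the same 55% score from 100 games and from 10,000 games.
for w, d, l in [(30, 50, 20), (3000, 5000, 2000)]:
    elo, low, high = elo_with_margin(w, d, l)
    print(f"{w + d + l:6d} games: {elo:+.1f} Elo, interval [{low:+.1f}, {high:+.1f}]")
```

With these invented counts, a 55% score is worth roughly +35 Elo either way, but over 100 games the interval still includes zero, while over 10,000 games it does not. That is the difference between a tourney winner and a rating-list entry.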

beram · Post by **beram** » Sat Oct 22, 2016 4:20 pm

IWB wrote:Thx for that (Andreas and Jesus)

The interesting thing is that adding time and cores does not necessarily mean better game play from a certain point onwards. Cores and long games can be used differently!

... and you don't need long time controls to get a proper ranking for rating lists, you just compress the result and make it more difficult to produce and to distinguish entries ...

Fully agree with that, Ingo.

Henk · Post by **Henk** » Sat Oct 22, 2016 4:42 pm

Looks like the Elo improvements of top engines will get smaller and smaller each year, or is that nonsense?

mjlef · Post by **mjlef** » Sat Oct 22, 2016 5:02 pm

I wonder if all engines would eventually reach the same Elo, or is this just limited to Komodo? Perhaps a run with a 10-year-old engine might be interesting. Or even a run with an old program that does no pruning (no nullmove, LMR, futility...). A few more data points would be helpful, but that becomes impractical for longer time controls.

shrapnel · Post by **shrapnel** » Sat Oct 22, 2016 5:29 pm

beram wrote:When engine B lost substantially, say >55%, to engine A at STC, programmers and fanboys of B would (will) often claim that it would all be different under LTC conditions.
When the same good winning results of engine A over B were then achieved at LTC, these people simply repeated the mantra, saying that it would all be very different if the match were played at very LTC.

Hmm.... I think you've got the wrong end of the stick here.
In the ongoing TCEC Rapids, Stockfish (let's call it your Engine B) is losing quite badly to Houdini (Engine A).
BUT the majority still believe that in the LTC Superfinal, the fantastically strong Stockfish will beat Houdini!
I'm one of the people who want Houdini to win, but after your sneering Post, I think I'll switch to Adam's side and hope that Stockfish wins, just to disprove your theory !

lkaufman · Post by **lkaufman** » Sat Oct 22, 2016 5:35 pm

IWB wrote:Thx for that (Andreas and Jesus)

The interesting thing is that adding time and cores does not necessarily mean better game play from a certain point onwards. Cores and long games can be used differently!

... and you don't need long time controls to get a proper ranking for rating lists, you just compress the result and make it more difficult to produce and to distinguish entries ...

Comparison of the 40/40 (or 40/20) CCRL and CEGT lists with their 40/4 lists consistently shows Komodo versions ranking higher relative to Stockfish versions on the longer TC lists. This was even more clearly true of Komodo relative to Houdini. At one time I calculated that each doubling of TC favored Komodo over Houdini by eight Elo points. But your IPON list is closer to the 40/20 list (which I think is really roughly a 40/10 list on modern hardware) than to the 40/4 lists, so I can't be certain that this trend would continue much beyond the level of your test. I think the level you use is a good compromise between the accuracy of longer TC games and the need for large numbers of games.
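
As a rough worked example of the eight-Elo-per-doubling figure: the sketch below assumes the shift stays constant per doubling, which is only an assumption (the paragraph above says the trend may not continue), and it ignores hardware differences between lists. The function names are hypothetical.

```python
import math

ELO_PER_DOUBLING = 8.0  # figure quoted in the post (Komodo relative to Houdini)

def doublings(tc_short_minutes, tc_long_minutes):
    """Number of time-control doublings between two base times."""
    return math.log2(tc_long_minutes / tc_short_minutes)

def relative_shift(tc_short_minutes, tc_long_minutes,
                   elo_per_doubling=ELO_PER_DOUBLING):
    """Relative Elo shift implied by a constant gain per doubling (assumed)."""
    return elo_per_doubling * doublings(tc_short_minutes, tc_long_minutes)

# 40/4 -> 40/20 is log2(5) doublings; 40/4 -> 40/40 is log2(10) doublings.
for long_tc in (20, 40):
    n = doublings(4, long_tc)
    print(f"40/4 -> 40/{long_tc}: {n:.2f} doublings, "
          f"{relative_shift(4, long_tc):.1f} Elo relative shift")
```

Under that assumption, going from 40/4 to 40/40 is about 3.3 doublings, so the relative shift would be on the order of 25 to 30 Elo.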
On a completely unrelated matter, my daughter is getting married today, and her husband-to-be has the last name of ...Bauer! Do you have family in Massachusetts?

Henk · Post by **Henk** » Sat Oct 22, 2016 5:39 pm

Bauer = Pawn

beram · Post by **beram** » Sat Oct 22, 2016 6:03 pm

shrapnel wrote:
beram wrote:When engine B lost substantially, say >55%, to engine A at STC, programmers and fanboys of B would (will) often claim that it would all be different under LTC conditions.
When the same good winning results of engine A over B were then achieved at LTC, these people simply repeated the mantra, saying that it would all be very different if the match were played at very LTC.
Hmm.... I think you've got the wrong end of the stick here.
In the ongoing TCEC Rapids, Stockfish (let's call it your Engine B) is losing quite badly to Houdini (Engine A).
BUT the majority still believe that in the LTC Superfinal, the fantastically strong Stockfish will beat Houdini!
I'm one of the people who want Houdini to win, but after your sneering Post, I think I'll switch to Adam's side and hope that Stockfish wins, just to disprove your theory !

Sorry, but you are comparing apples to oranges.
I am talking about match play and you are referring to tournament play.

Btw, I am also hoping that Houdini is as good as it promises to be, based on the roughly 160 games we have seen so far in TCEC.

corres · Post by **corres** » Sat Oct 22, 2016 6:09 pm

IWB wrote:
corres wrote:
IWB wrote:
corres wrote:
IWB wrote: ... and you don't need long time controls to get a proper ranking for rating lists, you just compress the result and make it more difficult to produce and to distinguish entries ...
You would be right if all engines responded in the same way to a change in move time.
But that is not the case.
I am pretty sure I am (given a decent time overall)!
I am running 5 + 3 and the only difference to longer time controls I can see is that the longer ones are more compressed. Every change in ranking is within error margins (unfortunately the longer lists have ridiculously huge error margins). The threshold for producing a proper list seems to be below that 5 + 3 (ponder on, one core, average HW). Ponder off I don't know, but the CEGT 40/4 is not that different from my list, so I assume that is OK too, even if the overall processing time (with one core) is below mine ...
If the difference in Elo between participants is rather high and/or the difference in move time is not too big, the order of engines may be the same. However, if shifting the move time had no effect, then the TCEC Rapid would be won by Stockfish and not Houdini, for example.

No, you forget the error margins, I mentioned them. TCEC is by far not a rating list. Within the longer time rating lists the error bar is huge, so sometimes an engine might change position, but it doesn't move into another league.

To take it to the extreme: you can't determine the best engine from a few games (TCEC); you can, however, determine the winner of a tourney.

Error margins!

Btw, the data shown here indicate something different from what you believe.

Regards
Ingo

PS: maybe Houdini IS better than Stockfish. I haven't seen any data to conclude one or the other ... but that's not the point of this discussion.

Because of the limited number of games (the longer the move time, the fewer the games), the reference to the error margin looks like a good argument. Statistically, it really is a good argument.
But the error margin covers up the difference in how the engines behave.
TCEC does not produce a reliable ordering of participants (error margin!), but the error margin is not the reason that TCEC Stage 3 was won by Stockfish while the TCEC Rapid will be won by Houdini.
The real cause lies in the difference between the structure of Stockfish and the structure of Houdini, which results in a difference in how their behavior depends on move time.
By the way, the data shown here is for Komodo 9.3 only.
Greetings
Robert

Laskos · Post by **Laskos** » Sun Oct 23, 2016 5:51 am

mjlef wrote:I wonder if all engines would eventually reach the same elo, or is this just limited to Komodo? Perhaps a run with a 10 year old engine might be interesting. Or even a run with an old program that does no pruning (no nullmove, LMR, futility...). A few more data points would be helpful, but becomes impractical for longer time controls.

It says that at this 3700-3800 CCRL Elo level the doubling won't give any gain and the draw rate becomes 100% for Komodo in self-play. I once got a few data points with Mephisto Gideon, an early 1990s engine, maybe 2300 Elo level in its time on a 486.
http://talkchess.com/forum/viewtopic.php?p=657446
It does show similar diminishing returns, although it doesn't have nullmove, LMR, futility, etc. With it I got a vague 4800 Elo points for a perfect engine, but my data were not of the quality that Andreas got.

I have now fitted Andreas's data points for total Elo to the simplest decaying exponential, and got an excellent fit, better and simpler than what Jesus got. The average error is about 3 Elo points. It again puts the perfect engine in the range of 3700 Elo points. Although I would have preferred to have 4000+ for it, the fits seem to converge towards these lower values. Returns do seem to diminish pretty fast at the longer TCs that Andreas tested. I have never tested at such long time controls in so many games.
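
For anyone who wants to try this kind of fit, here is a minimal sketch of one plausible form, elo(n) = e_inf - a * exp(-n / tau), with n the number of doublings. The exact form used for the fit above is not stated, so this is an assumption. The data below are SYNTHETIC, generated from arbitrary parameters (3700 only echoes the figure mentioned above); they are not Andreas's measurements. The fit scans tau over a grid and solves the two linear parameters by ordinary least squares.

```python
import math

def fit_linear(xs, ys):
    """Least squares for y = c0 + c1 * x; returns (c0, c1, sum of squared residuals)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = my - c1 * mx
    sse = sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys))
    return c0, c1, sse

def fit_decaying_exponential(doublings, elos, taus):
    """Fit elo(n) = e_inf - a * exp(-n / tau) by scanning tau over a grid.

    For a fixed tau the model is linear in e_inf and a, so each grid point
    needs only one linear least-squares solve. Returns (e_inf, a, tau, sse).
    """
    best = None
    for tau in taus:
        xs = [math.exp(-n / tau) for n in doublings]
        c0, c1, sse = fit_linear(xs, elos)
        if best is None or sse < best[3]:
            best = (c0, -c1, tau, sse)
    return best

# SYNTHETIC, noise-free data from made-up parameters (not real measurements).
TRUE_E_INF, TRUE_A, TRUE_TAU = 3700.0, 700.0, 3.0
ns = list(range(9))  # 0..8 doublings of the base time control
elos = [TRUE_E_INF - TRUE_A * math.exp(-n / TRUE_TAU) for n in ns]

taus = [0.5 + 0.1 * k for k in range(96)]  # grid from 0.5 to 10.0
e_inf, a, tau, sse = fit_decaying_exponential(ns, elos, taus)
print(f"asymptote {e_inf:.1f} Elo, amplitude {a:.1f}, tau {tau:.1f} doublings")
```

The fitted e_inf plays the role of the "perfect engine" rating: it is the level the curve approaches as the number of doublings grows. With real, noisy data the asymptote is sensitive to the last few points, which would explain why different fits land on different ceilings.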
