The data:
TC = 3' + 2" -40 Elo (5000 games)
TC = 10' + 6" -30 Elo (5000 games)
TC = 30' + 15" -21 Elo (2000 games)
TC = 90' + 30" -10 Elo ( 500 games)
So what we have here is an approximate 10 Elo gain for Komodo for each tripling of time. If we just glance at the data, a clear trend is immediately apparent: the aforementioned 10 Elo gain for 3x time. Or is it a 10 Elo loss for each 1/3rd of time? Do we really know?
To be absolutely certain we are witnessing a genuine Elo gain as opposed to Elo compression, the best evidence we could have is a data point where Komodo is clearly stronger than Stockfish. Assuming the naive trend continues, the two engines should be equal at 270' + 90", and at 810' + 270" Komodo should have a 10 Elo lead.
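For anyone who wants to check the extrapolation, here is a minimal sketch of the "naive path". It assumes a flat 10 Elo per tripling of base time, anchored at the 90' data point; the function name and the anchoring choice are mine, not anything from Mark's post.

```python
import math

# Komodo's Elo relative to Stockfish by base time in minutes (from the data above).
observed = {3: -40, 10: -30, 30: -21, 90: -10}

GAIN_PER_TRIPLING = 10  # assumption: the rough trend read off the data


def naive_elo(base_minutes, anchor_minutes=90, anchor_elo=-10):
    """Extrapolate the Elo gap assuming a constant gain per tripling of time."""
    triplings = math.log(base_minutes / anchor_minutes, 3)
    return anchor_elo + GAIN_PER_TRIPLING * triplings


for minutes in (270, 810):
    print(f"{minutes}' -> {naive_elo(minutes):+.1f} Elo")
```

This only restates the assumption in code. It says nothing about whether the trend really continues past 90'.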
Obviously, this presents practical problems, but luckily we have a solution. I believe Mark & Larry have alluded to better scaling for Komodo with core count as well. Thus, assuming this claim to be true, we can up the core count and reduce the time, and Komodo should still come out ahead.
(810' + 270") / 32 ≈ 25.3' + 8.4"
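The same division as a small sketch, assuming (unrealistically) that N cores are worth exactly N times the clock time. The helper name is just for illustration.

```python
def scale_time_control(base_minutes, increment_seconds, cores):
    """Divide a time control by the core count, assuming perfect thread scaling."""
    return base_minutes / cores, increment_seconds / cores


base, inc = scale_time_control(810, 270, 32)
print(f"{base:.1f}' + {inc:.1f}\"")
```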
Obviously scaling won't be perfect, so let's round up to 30' + 15" on 32+ cores. With both superior time and thread scaling, that should be more than enough for Komodo to assert its dominance. Now, all we need is someone with such a machine to run a 2000 game match (we will forget that we have seen fishtest results swing after even 15-20k games).
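To put a rough number on how much a 2000 game match can tell us, here is a back-of-the-envelope error margin. It assumes two roughly equal engines, independent games, and a 60% draw rate. The draw rate is purely my assumption for illustration, so plug in your own.

```python
import math


def elo_margin_95(games, draw_rate=0.6):
    """Approximate 95% error margin (in Elo) for a match between near-equal engines.

    Assumes independent games and an expected score close to 50%.
    draw_rate is an illustrative assumption, not a measured value.
    """
    # Per-game score variance when wins and losses are equally likely.
    variance = (1.0 - draw_rate) / 4.0
    std_error = math.sqrt(variance / games)
    # Slope of the Elo curve at a 50% score: 400 / (ln(10) * 0.25).
    elo_per_score = 1600.0 / math.log(10)
    return 1.96 * std_error * elo_per_score


for n in (500, 2000, 5000, 20000):
    print(f"{n:>6} games: +/- {elo_margin_95(n):.1f} Elo")
```

Under those assumptions the margin for 2000 games comes out at roughly the same size as the 10 Elo step we are trying to measure, and the 500 game point is considerably worse.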
But we do have one fairly recent data point to look upon, the last TCEC. It was only 20 cores for stage 3 but the time control of 150' + 15" was certainly large enough. And yet....
I would say that in order to claim that engine A genuinely out-scales engine B, you need to be able to show a (reasonable) data point where engine A actually beats engine B.
Why is this necessary? Let's look at the data Mark posted once again. Now, imagine I fiddle with SF's time management a tiny bit to make it slightly sub-optimal. This change will certainly be felt at low time controls, but will essentially disappear as T approaches infinity. At any rate, the upshot is that I artificially lower SF's strength by 30 Elo at the shortest time control (its lead drops from 40 to 10), but SF's strength at the longest time control is virtually unchanged. Now SF simply has a 10 Elo advantage forever according to the "scaling" trajectory. Or maybe Komodo is the one with worse time management.
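Here is that thought experiment as a tiny sketch, using only the two endpoints of the data. The 30 Elo penalty at 3' and the zero penalty at 90' are the hypothetical numbers from the paragraph above, not measurements.

```python
import math

# SF's lead over Komodo at the shortest and longest base times (minutes).
sf_lead = {3: 40, 90: 10}

# Hypothetical time-management handicap: hurts at 3', vanishes at 90'.
penalty = {3: 30, 90: 0}

handicapped = {t: sf_lead[t] - penalty[t] for t in sf_lead}


def slope_per_tripling(lead):
    """Change in SF's lead per tripling of time, between the two endpoints."""
    (t0, e0), (t1, e1) = sorted(lead.items())
    return (e1 - e0) / math.log(t1 / t0, 3)


print("original   :", round(slope_per_tripling(sf_lead), 1), "Elo per tripling")
print("handicapped:", round(slope_per_tripling(handicapped), 1), "Elo per tripling")
```

Same engines, same search, and the apparent "scaling" goes flat. The slope alone cannot tell you which story is the real one.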
The truth is I don't know. Nobody can know just from that set of data. It could be Elo compression. It could be poor time management by Komodo. It could be that Komodo does indeed scale better with time than Stockfish. It could also be a whole host of other issues, from eval, to pruning techniques, to opening selections, to branching factor, etc. I don't know. But I do know that for someone to say they do know means that they either haven't thought it all through, made a mistake, are rather arrogant, or are being purposefully disingenuous. Personally, I like to give people the benefit of the doubt and just assume they made a mistake and/or forgot to take some factors into consideration. I know I do that all the time (usually several times per day).
