Scaling of engines from FGRL rating list

Milos · Post by **Milos** » Tue Apr 11, 2017 2:05 am

mjlef wrote:You present no data and suggest worthless modifications to programs to ruin what data might be collected. Kai just showed the scaling effect in the time control ranges he presented. It is possible that scaling could change remarkably at a much longer time control. But we have not said it would, and neither has Kai.

You are not taking this seriously, so I will stop taking you seriously too.

What Kai shows is just contempt since data is from the multiple opponents tournament which is useless for scaling purposes as already proven without doubt (one just needs to look at draw percentages, only a blind man or someone advertising a product would not see the obvious). You also never show real data. You just "show" numbers. And since you are selling a product, sorry that I don't believe your numbers. Show us some real PGNs and then we can talk. Till then I call BS on your results about superior scaling and I am certainly not the only one here.

jdart · Post by **jdart** » Tue Apr 11, 2017 2:41 am

I don't think it is so strong that it is bumping against the limits of what is possible in terms of strength.

But I am quite amazed at how strong it is tactically, even compared to Houdini and Komodo.

--Jon

Laskos · Post by **Laskos** » Tue Apr 11, 2017 2:54 am

Milos wrote:
mjlef wrote:You present no data and suggest worthless modifications to programs to ruin what data might be collected. Kai just showed the scaling effect in the time control ranges he presented. It is possible that scaling could change remarkably at a much longer time control. But we have not said it would, and neither has Kai.

You are not taking this seriously, so I will stop taking you seriously too.
What Kai shows is just contempt since data is from the multiple opponents tournament which is useless for scaling purposes as already proven without doubt (one just needs to look at draw percentages, only a blind man or someone advertising a product would not see the obvious). You also never show real data. You just "show" numbers. And since you are selling a product, sorry that I don't believe your numbers. Show us some real PGNs and then we can talk. Till then I call BS on your results about superior scaling and I am certainly not the only one here.

I took the 10 engines shown in the excellent FGRL Top 10 rating list. I was not intending to compare Komodo and Stockfish, and they are close in scaling anyway, at least in my first results, maybe within error margins. Andscacs 0.89 and Fritz 15 stand out as scaling well and badly, respectively, probably outside error margins. It's not very hard to come up with even an inversion due to scaling. I knew that the Ippos scale badly, so I compared RobboLito 0.10, a predecessor of Houdini 5, with Komodo 5, a predecessor of Komodo 10 and very close in rating.

100ms/move:

Games Completed = 1000 of 1000 (Avg game length = 14.311 sec)
Settings = Gauntlet/32MB/100ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 3695 sec elapsed, 0 sec remaining
 1.  Komodo 5 64-bit          	474.5/1000	341-392-267  	(L: m=1 t=0 i=0 a=391)	(D: r=121 i=44 f=10 s=1 a=91)	(tpm=110.3 d=12.92 nps=1710990)
 2.  RobboLito 0.10 SMP x64   	525.5/1000	392-341-267  	(L: m=1 t=0 i=0 a=340)	(D: r=121 i=44 f=10 s=1 a=91)	(tpm=108.2 d=12.61 nps=2330838)

500ms/move:

Games Completed = 1000 of 1000 (Avg game length = 72.125 sec)
Settings = Gauntlet/32MB/500ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 18221 sec elapsed, 0 sec remaining
 1.  Komodo 5 64-bit          	540.0/1000	364-284-352  	(L: m=0 t=0 i=0 a=284)	(D: r=155 i=66 f=7 s=4 a=120)	(tpm=506.1 d=15.97 nps=1670582)
 2.  RobboLito 0.10 SMP x64   	460.0/1000	284-364-352  	(L: m=0 t=0 i=0 a=364)	(D: r=155 i=66 f=7 s=4 a=120)	(tpm=510.7 d=15.60 nps=2321888)

The result is outside the 2SD interval. This inversion cannot be attributed to any kind of contempt. Engines can scale differently: if you take an ancient engine like Mephisto Gideon in modern conditions, you will see that its gain per doubling of time at 40/60'' is roughly 60 Elo points, while a modern engine can gain 120 Elo points per doubling at this time control.
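
For anyone who wants to check the 2SD statement, here is a minimal Python sketch. It takes Komodo 5's W-L-D counts from the two LittleBlitzer runs above, computes the score and its standard error under a simple trinomial model, and compares the shift in score between 100ms/move and 500ms/move with the combined error. The function names are made up for illustration, and treating all games (and the two runs) as independent is an assumption, not something LittleBlitzer reports.

```python
import math


def elo_from_score(p):
    """Logistic Elo difference implied by a score fraction p (0 < p < 1)."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def match_stats(wins, losses, draws):
    """Return (mean score per game, standard error of that mean).

    Assumes independent games with outcomes 1, 0 and 0.5.
    """
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the win/loss/draw outcome around the mean score.
    var = (wins * (1.0 - mean) ** 2
           + losses * (0.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2) / n
    return mean, math.sqrt(var / n)


# Komodo 5 against RobboLito 0.10, W-L-D taken from the tables above.
fast_mean, fast_se = match_stats(341, 392, 267)  # 100ms/move
slow_mean, slow_se = match_stats(364, 284, 352)  # 500ms/move

# Shift in score between the two time controls, and its combined error
# (assumes the two runs are independent of each other).
diff = slow_mean - fast_mean
diff_se = math.sqrt(fast_se ** 2 + slow_se ** 2)

print("100ms/move: score %.4f, Elo %+.1f" % (fast_mean, elo_from_score(fast_mean)))
print("500ms/move: score %.4f, Elo %+.1f" % (slow_mean, elo_from_score(slow_mean)))
print("score shift = %.4f, i.e. %.2f combined SD" % (diff, diff / diff_se))
```

Under these assumptions the score shift comes out at roughly 3.5 combined standard deviations, so well outside 2SD. If games are paired by opening with colours reversed, the true error would be somewhat different, so take this only as a rough check.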

Milos · Post by **Milos** » Tue Apr 11, 2017 3:26 am

Laskos wrote:Andscacs 0.89 and Fritz 15 stand out as scaling well and badly, respectively, probably outside error margins. It's not very hard to come up with even an inversion due to scaling. I knew that the Ippos scale badly, so I compared RobboLito 0.10, a predecessor of Houdini 5, with Komodo 5, a predecessor of Komodo 10 and very close in rating.

100ms/move:
Games Completed = 1000 of 1000 (Avg game length = 14.311 sec)
Settings = Gauntlet/32MB/100ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 3695 sec elapsed, 0 sec remaining
 1.  Komodo 5 64-bit          	474.5/1000	341-392-267  	(L: m=1 t=0 i=0 a=391)	(D: r=121 i=44 f=10 s=1 a=91)	(tpm=110.3 d=12.92 nps=1710990)
 2.  RobboLito 0.10 SMP x64   	525.5/1000	392-341-267  	(L: m=1 t=0 i=0 a=340)	(D: r=121 i=44 f=10 s=1 a=91)	(tpm=108.2 d=12.61 nps=2330838)
500ms/move:
Games Completed = 1000 of 1000 (Avg game length = 72.125 sec)
Settings = Gauntlet/32MB/500ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 18221 sec elapsed, 0 sec remaining
 1.  Komodo 5 64-bit          	540.0/1000	364-284-352  	(L: m=0 t=0 i=0 a=284)	(D: r=155 i=66 f=7 s=4 a=120)	(tpm=506.1 d=15.97 nps=1670582)
 2.  RobboLito 0.10 SMP x64   	460.0/1000	284-364-352  	(L: m=0 t=0 i=0 a=364)	(D: r=155 i=66 f=7 s=4 a=120)	(tpm=510.7 d=15.60 nps=2321888)
The result is outside the 2SD interval. This inversion cannot be attributed to any kind of contempt. Engines can scale differently: if you take an ancient engine like Mephisto Gideon in modern conditions, you will see that its gain per doubling of time at 40/60'' is roughly 60 Elo points, while a modern engine can gain 120 Elo points per doubling at this time control.

This is a nice example of an inversion, but I would not really attribute it to actual strength scaling. I would attribute it more to the fact that Komodo at that time was known to be a slow engine, i.e. particularly bad at hyperbullet, probably due to extra-cautious time management and slow search initialization. Robbo, on the other hand, was known for time management really optimized for hyperbullet, very quick search initialization and very well optimized code in general, which resulted in an inflated rating at hyperbullet.

carldaman · Post by **carldaman** » Tue Apr 11, 2017 7:33 am

Isaac wrote:Is the following guess/interpretation gibberish or plausible? :
Since Stockfish 8 seems to be the highest rated engine in the list, it is closer to a perfect player and finds "the best" move quicker than all the other engines on average. Hence it cannot scale much better with a longer time control because there is not as much strength left to gain compared to any other engine.
Thank you math/logic guys for answering.

The fallacy in the interpretation you posted is the assumption that the top engine is extremely close to perfection at any time control (TC), no matter how short, thus leaving no room for fast scaling. Even if you could allow for near-perfect strength at LTC, it cannot hold true at a very short TC.

Stockfish, or whatever engine is best right now, is not that close to being perfect, and even less so at short time controls. Lots of room should therefore remain for 'scaling' towards more strength with more time given.
Of course, the strongest engine at a given time control may not necessarily be the better/best 'scaler', and this will not guarantee being on top at all TCs.

CL

mjlef · Post by **mjlef** » Tue Apr 11, 2017 2:45 pm

jhellis3 wrote:
Science and absolute certainty:
Thanks for the lecture prof Mark, you are such a smart guy. Nothing I like more than being talked down to...

I think your flippant remarks are not helping you convince people.
Like I said earlier (perhaps you are a bit slow on the uptake?), I am not here to convince anybody. I am not here to promote an agenda *cough*. I present my viewpoints, and let other people do with them what they may.

In my view, false belief is its own punishment.

Selective editing I see. I said:

Quote:
In science there is no "absolute certainty"

You responded:
Actually, there is. It is called reality.

That is incorrect. My statement is just what this conversation is about. You need lots of data in science to show something is likely right. "absolute reality" is not science and has nothing to do with my point. Instead of responding to the issues you launch ineffective rhetoric.

I am not trying to talk down to you, but you do like rhetoric which in this case is not useful. There is a lot to learn here if we are all willing to listen.

jhellis3 · Post by **jhellis3** » Tue Apr 11, 2017 6:06 pm

There is a lot to learn here if we are all willing to listen.

Just not for you right... gross.

You need lots of data in science to show something is likely right

More down talking... Jesus Wept.

I am not trying to talk down to you, but you do like rhetoric which in this case is not useful.

Right.... which is why you did it again... At any rate I will take well reasoned "rhetoric" (if actual games played by engines is called that now) over snake oil and bad science any day of the week...

Isaac · Post by **Isaac** » Tue Apr 11, 2017 8:56 pm

carldaman wrote:
Isaac wrote:Is the following guess/interpretation gibberish or plausible? :
Since Stockfish 8 seems to be the highest rated engine in the list, it is closer to a perfect player and finds "the best" move quicker than all the other engines on average. Hence it cannot scale much better with a longer time control because there is not as much strength left to gain compared to any other engine.
Thank you math/logic guys for answering.
The fallacy in the interpretation you posted is the assumption that the top engine is extremely close to perfection at any time control (TC), no matter how short, thus leaving no room for fast scaling. Even if you could allow for near-perfect strength at LTC, it cannot hold true at a very short TC.

Stockfish, or whatever engine is best right now, is not that close to being perfect, and even less so at short time controls. Lots of room should therefore remain for 'scaling' towards more strength with more time given.
Of course, the strongest engine at a given time control may not necessarily be the better/best 'scaler', and this will not guarantee being on top at all TCs.

CL

I agree with you, thank you for the reply. It makes sense.

Isaac · Post by **Isaac** » Tue Apr 11, 2017 9:18 pm

jhellis3 wrote:Instead of saying Andscacs scales well with increasing time, we might say it actually just scales horribly with decreasing time. The problem is we only looked at 2 data points and have no way of knowing for sure, without broadening our scope. And it doesn't even have to be one or the other, but could potentially be a combination of both, where Andscacs does scale better with more time but not nearly as much as it first appears because it also scales relatively poorly with less time.

Hello Joseph, I would like to understand you here but I fail to see the difference between scaling better with increasing time and scaling badly with decreasing time. To me, it is exactly the same, just another way of describing the same effect.

For example, I can't imagine a way to scale well with both increasing and decreasing time control. Will you (or anyone else) please help me figure this particular case out? Thank you.

cdani · Post by **cdani** » Tue Apr 11, 2017 10:17 pm

Isaac wrote: Hello Joseph, I would like to understand you here but I fail to see the difference between scaling better with increasing time and scaling badly with decreasing time. To me, it is exactly the same, just another way of describing the same effect.

For example, I can't imagine a way to scale well with both increasing and decreasing time control. Will you (or anyone else) please help me figure this particular case out? Thank you.

You can tweak the search of an engine to make it play worse at short time controls, but more or less the same at LTC. It is not that he is talking about doing this on purpose, but that some improvements to the engine may have produced this effect instead of the more desirable one of improving it at all time controls.
Anyway, this is more in the field of speculation, as the "character" of an engine is due to so many parts that it is open to any interpretation.
