Scaling of engines from FGRL rating list

Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Scaling of engines from FGRL rating list.

Post by Milos »

mjlef wrote:You present no data and suggest worthless modifications to programs to ruin what data might be collected. Kai just showed the scaling effect in the time control ranges he presented. It is possible that scaling could change remarkably at a much longer time control. But we have not said it would, and neither has Kai.

You are not taking this seriously, so I will stop taking you seriously too.
What Kai shows is just contempt, since the data is from a multiple-opponent tournament, which is useless for scaling purposes, as already proven beyond doubt (one just needs to look at the draw percentages; only a blind man or someone advertising a product would not see the obvious). You also never show real data. You just "show" numbers. And since you are selling a product, sorry, but I don't believe your numbers. Show us some real PGNs and then we can talk. Till then I call BS on your results about superior scaling, and I am certainly not the only one here.
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Scaling of engines from FGRL rating list

Post by jdart »

I don't think it is so strong that it is bumping against the limits of what is possible in terms of strength.

But I am quite amazed at how strong it is tactically, even compared to Houdini and Komodo.

--Jon
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Scaling of engines from FGRL rating list.

Post by Laskos »

Milos wrote:
mjlef wrote:You present no data and suggest worthless modifications to programs to ruin what data might be collected. Kai just showed the scaling effect in the time control ranges he presented. It is possible that scaling could change remarkably at a much longer time control. But we have not said it would, and neither has Kai.

You are not taking this seriously, so I will stop taking you seriously too.
What Kai shows is just contempt, since the data is from a multiple-opponent tournament, which is useless for scaling purposes, as already proven beyond doubt (one just needs to look at the draw percentages; only a blind man or someone advertising a product would not see the obvious). You also never show real data. You just "show" numbers. And since you are selling a product, sorry, but I don't believe your numbers. Show us some real PGNs and then we can talk. Till then I call BS on your results about superior scaling, and I am certainly not the only one here.
I took the 10 engines shown in the excellent FGRL Top 10 rating list. I was not intending to compare Komodo and Stockfish, and they are close in scaling anyway, at least in my first results, maybe within error margins. Andscacs 0.89 and Fritz 15 stand out as scaling well and badly, respectively, probably outside error margins. It's not very hard to come up with even an inversion due to scaling. I knew that the Ippos scale badly, so I compared RobboLito 0.10, a predecessor of Houdini 5, with Komodo 5, a predecessor of Komodo 10 that is very close to it in rating.

100ms/move:

Games Completed = 1000 of 1000 (Avg game length = 14.311 sec)
Settings = Gauntlet/32MB/100ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 3695 sec elapsed, 0 sec remaining
 1.  Komodo 5 64-bit          	474.5/1000	341-392-267  	(L: m=1 t=0 i=0 a=391)	(D: r=121 i=44 f=10 s=1 a=91)	(tpm=110.3 d=12.92 nps=1710990)
 2.  RobboLito 0.10 SMP x64   	525.5/1000	392-341-267  	(L: m=1 t=0 i=0 a=340)	(D: r=121 i=44 f=10 s=1 a=91)	(tpm=108.2 d=12.61 nps=2330838)
500ms/move:

Games Completed = 1000 of 1000 (Avg game length = 72.125 sec)
Settings = Gauntlet/32MB/500ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 18221 sec elapsed, 0 sec remaining
 1.  Komodo 5 64-bit          	540.0/1000	364-284-352  	(L: m=0 t=0 i=0 a=284)	(D: r=155 i=66 f=7 s=4 a=120)	(tpm=506.1 d=15.97 nps=1670582)
 2.  RobboLito 0.10 SMP x64   	460.0/1000	284-364-352  	(L: m=0 t=0 i=0 a=364)	(D: r=155 i=66 f=7 s=4 a=120)	(tpm=510.7 d=15.60 nps=2321888)
The result is outside the 2SD interval. This inversion cannot be attributed to any kind of Contempt. Engines can scale differently: if you take an ancient engine like Mephisto Gideon in modern conditions, you will see that a doubling at 40/60'' is worth roughly 60 Elo points to it, while a modern engine can gain 120 Elo points at this time control.
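
For anyone who wants to check the 2SD remark from the W-L-D counts above, here is a minimal sketch in Python. It assumes the games are independent (paired openings would correlate them somewhat) and that "the result" means the shift in Komodo 5's score between the two runs. The function names are illustrative only and are not part of LittleBlitzer.

```python
import math

def score_stats(wins, losses, draws):
    """Return (mean score per game, standard error of that mean)."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of a result that takes the values 1, 0.5 or 0.
    var = (wins * 1.0 + draws * 0.25) / n - mean ** 2
    return mean, math.sqrt(var / n)

def elo_from_score(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Komodo 5 W-L-D counts taken from the two runs quoted above.
fast_mean, fast_se = score_stats(341, 392, 267)  # 100ms/move
slow_mean, slow_se = score_stats(364, 284, 352)  # 500ms/move

# z-score of the change in Komodo's score between the two time controls,
# treating the two runs as independent samples (an assumption).
z_shift = (slow_mean - fast_mean) / math.hypot(fast_se, slow_se)

for label, mean, se in (("100ms", fast_mean, fast_se),
                        ("500ms", slow_mean, slow_se)):
    z_even = (mean - 0.5) / se  # distance from an even score, in SD units
    print(f"{label}: score {mean:.4f} +/- {se:.4f}, "
          f"Elo {elo_from_score(mean):+.1f}, z vs 50% = {z_even:+.2f}")
print(f"shift between runs: z = {z_shift:.2f}")
```

Under these assumptions the shift between the two runs comes out well beyond two standard errors, while the 100ms result taken alone stays within 2SD of an even score.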
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Scaling of engines from FGRL rating list.

Post by Milos »

Laskos wrote:Andscacs 0.89 and Fritz 15 stand out as scaling well and badly, respectively, probably outside error margins. It's not very hard to come up with even an inversion due to scaling. I knew that the Ippos scale badly, so I compared RobboLito 0.10, a predecessor of Houdini 5, with Komodo 5, a predecessor of Komodo 10 that is very close to it in rating.

100ms/move:

Games Completed = 1000 of 1000 (Avg game length = 14.311 sec)
Settings = Gauntlet/32MB/100ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 3695 sec elapsed, 0 sec remaining
 1.  Komodo 5 64-bit          	474.5/1000	341-392-267  	(L: m=1 t=0 i=0 a=391)	(D: r=121 i=44 f=10 s=1 a=91)	(tpm=110.3 d=12.92 nps=1710990)
 2.  RobboLito 0.10 SMP x64   	525.5/1000	392-341-267  	(L: m=1 t=0 i=0 a=340)	(D: r=121 i=44 f=10 s=1 a=91)	(tpm=108.2 d=12.61 nps=2330838)
500ms/move:

Games Completed = 1000 of 1000 (Avg game length = 72.125 sec)
Settings = Gauntlet/32MB/500ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 18221 sec elapsed, 0 sec remaining
 1.  Komodo 5 64-bit          	540.0/1000	364-284-352  	(L: m=0 t=0 i=0 a=284)	(D: r=155 i=66 f=7 s=4 a=120)	(tpm=506.1 d=15.97 nps=1670582)
 2.  RobboLito 0.10 SMP x64   	460.0/1000	284-364-352  	(L: m=0 t=0 i=0 a=364)	(D: r=155 i=66 f=7 s=4 a=120)	(tpm=510.7 d=15.60 nps=2321888)
The result is outside the 2SD interval. This inversion cannot be attributed to any kind of Contempt. Engines can scale differently: if you take an ancient engine like Mephisto Gideon in modern conditions, you will see that a doubling at 40/60'' is worth roughly 60 Elo points to it, while a modern engine can gain 120 Elo points at this time control.
This is a nice example of an inversion, but I would not really attribute it to actual strength scaling. I would attribute it more to the fact that Komodo at that time was known to be a slow engine, i.e. particularly bad at hyperbullet, probably due to extra-cautious time management and slow search initialization. Robbo, on the other hand, was known for time management really optimized for hyperbullet, as well as very quick search initialization and very well optimized code, which resulted in an inflated rating at hyperbullet.
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: Scaling of engines from FGRL rating list

Post by carldaman »

Isaac wrote:Is the following guess/interpretation gibberish or plausible? :
Since Stockfish 8 seems to be the highest rated engine in the list, it is closer to a perfect player and finds "the best" move quicker than all the other engines on average. Hence it cannot scale much better with a longer time control because it has less room to improve than any other engine.
Thank you math/logic guys for answering.
The fallacy in the interpretation you posted is the assumption that the top engine is extremely close to perfection at any time control (TC), no matter how short, thus leaving no room for fast scaling. Even if you allow for near-perfect strength at a long TC (LTC), that cannot hold true at a very short TC (STC).

Stockfish, or whatever engine is best right now, is not that close to being perfect, and even less so at short time controls. Lots of room should therefore remain for 'scaling' towards more strength with more time given.
Of course, the strongest engine at a given time control may not necessarily be the better/best 'scaler', and this will not guarantee being on top at all TCs.

CL
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Scaling of engines from FGRL rating list.

Post by mjlef »

jhellis3 wrote:
Science and absolute certainty:
Thanks for the lecture prof Mark, you are such a smart guy :roll: . Nothing I like more than being talked down to.... :roll: .
I think your flippant remarks are not helping you convince people.
Like I said earlier (perhaps you are a bit slow on the uptake?), I am not here to convince anybody. I am not here to promote an agenda *cough*. I present my viewpoints, and let other people do with them what they may.

In my view, false belief is its own punishment :).
Selective editing I see. I said:

Quote:
In science there is no "absolute certainty"

You responded:
Actually, there is. It is called reality.

That is incorrect. My statement is just what this conversation is about. You need lots of data in science to show something is likely right. "absolute reality" is not science and has nothing to do with my point. Instead of responding to the issues you launch ineffective rhetoric.

I am not trying to talk down to you, but you do like rhetoric which in this case is not useful. There is a lot to learn here if we are all willing to listen.
jhellis3
Posts: 546
Joined: Sat Aug 17, 2013 12:36 am

Re: Scaling of engines from FGRL rating list.

Post by jhellis3 »

There is a lot to learn here if we are all willing to listen.
Just not for you right... gross.
You need lots of data in science to show something is likely right
More down talking... Jesus Wept.
I am not trying to talk down to you, but you do like rhetoric which in this case is not useful.
Right.... which is why you did it again... At any rate I will take well reasoned "rhetoric" (if actual games played by engines are called that now) over snake oil and bad science any day of the week...
Isaac
Posts: 265
Joined: Sat Feb 22, 2014 8:37 pm

Re: Scaling of engines from FGRL rating list

Post by Isaac »

carldaman wrote:
Isaac wrote:Is the following guess/interpretation gibberish or plausible? :
Since Stockfish 8 seems to be the highest rated engine in the list, it is closer to a perfect player and finds "the best" move quicker than all the other engines on average. Hence it cannot scale much better with a longer time control because it has less room to improve than any other engine.
Thank you math/logic guys for answering.
The fallacy in the interpretation you posted is the assumption that the top engine is extremely close to perfection at any time control (TC), no matter how short, thus leaving no room for fast scaling. Even if you allow for near-perfect strength at a long TC (LTC), that cannot hold true at a very short TC (STC).

Stockfish, or whatever engine is best right now, is not that close to being perfect, and even less so at short time controls. Lots of room should therefore remain for 'scaling' towards more strength with more time given.
Of course, the strongest engine at a given time control may not necessarily be the better/best 'scaler', and this will not guarantee being on top at all TCs.

CL
I agree with you, thank you for the reply. It makes sense.
Isaac
Posts: 265
Joined: Sat Feb 22, 2014 8:37 pm

Re: Scaling of engines from FGRL rating list.

Post by Isaac »

jhellis3 wrote:Instead of saying Andscacs scales well with increasing time, we might say it actually just scales horribly with decreasing time. The problem is we only looked at 2 data points and have no way of knowing for sure, without broadening our scope. And it doesn't even have to be one or the other, but could potentially be a combination of both, where Andscacs does scale better with more time but not nearly as much as it first appears because it also scales relatively poorly with less time.
Hello Joseph, I would like to understand you here but I fail to see the difference between scaling better with increasing time and scaling badly with decreasing time. To me, it is exactly the same, just another way of describing the same effect.

For example, I can't imagine a way to scale well at both increasing and decreasing time controls. Will you (or anyone else) please help me to figure this particular case out? Thank you.
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Scaling of engines from FGRL rating list.

Post by cdani »

Isaac wrote: Hello Joseph, I would like to understand you here but I fail to see the difference between scaling better with increasing time and scaling badly with decreasing time. To me, it is exactly the same, just another way of describing the same effect.

For example, I can't imagine a way to scale well at both increasing and decreasing time controls. Will you (or anyone else) please help me to figure this particular case out? Thank you.
You can tweak the search of an engine to make it play worse at short time controls but more or less the same at LTC. This is not about doing it on purpose; rather, some improvements to the engine may have produced this effect instead of the more desirable one of improving it at all time controls.
Anyway, this is more in the realm of speculation, as the "character" of an engine is due to so many parts that it is open to any interpretation.