TalkChess.com

Posted: **Mon Dec 10, 2012 8:14 pm**

I ran a gauntlet with Komodo 5 at different times per move against Houdini 3 at 1s/move. This took some time because the time controls are not very short; until now I had only seen such tests performed at ultra-short fixed-time or fixed-depth controls (by Don and Adam).

    Program                            Games    Elo

  1 Komodo 5 4s                    :   2000    3131
  2 Komodo 5 2s                    :   2000    3050
  3 Komodo 5 1s                    :   4016    2957
  4 Komodo 5 0.5s                  :   4017    2850

  5 Houdini 3 1s                   :  12033    3036

The scaling of Komodo 5 with doubling time is:

From 2s/move to 4s/move (blitz)     +81 Elo points
From 1s/move to 2s/move             +93 Elo points
From 0.5s/move to 1s/move (bullet) +107 Elo points
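
As a quick check, the gains listed above can be recomputed from the rating table. This is a minimal Python sketch: the names `komodo_elo` and `doubling_gains` are purely illustrative, and the ratings are copied from the table. It also prints the ratio between successive gains, i.e. how fast the benefit of one more doubling shrinks.

```python
# Komodo 5 ratings from the gauntlet table, keyed by seconds per move.
komodo_elo = {0.5: 2850, 1.0: 2957, 2.0: 3050, 4.0: 3131}

def doubling_gains(elo_by_time):
    """Return (from_time, to_time, gain) for each consecutive pair of time controls.

    Assumes consecutive keys differ by a factor of two, as in the table.
    """
    times = sorted(elo_by_time)
    return [(a, b, elo_by_time[b] - elo_by_time[a]) for a, b in zip(times, times[1:])]

gains = doubling_gains(komodo_elo)
for a, b, g in gains:
    print(f"From {a}s/move to {b}s/move: +{g} Elo")

# Ratio of each gain to the previous one.
ratios = [later[2] / earlier[2] for earlier, later in zip(gains, gains[1:])]
print([round(r, 3) for r in ratios])
```

Both ratios land close to 0.87.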

The fit is:
107*(0.87)^{log2(t)} = 107*t^(-0.20) Elo points per doubling of time (or of cores, assuming perfect scaling), where t is the time in seconds per move.

Extrapolating to longer time controls: 120min/40 moves is 180 s/move, so on one core the fit gives 107*180^(-0.20) ~ 40 Elo points per doubling time. On eight cores at the same 120min/40 moves LTC it's ~30 Elo points per doubling time. Of course, this is an extrapolation.
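
The fit and the extrapolation can be evaluated with a few lines of Python. This is only a sketch under the assumptions stated in the post: the function name `gain_per_doubling` and its parameters are made up for illustration, and cores are treated as a pure time multiplier (perfect scaling). Comparing with the measured gains, the formula evaluated at t reproduces the doubling that ends at t (t = 2 gives the 1s to 2s gain). With the exponent taken as exactly -0.20, the 180 s/move values come out a little under the rounded figures quoted above.

```python
import math

def gain_per_doubling(seconds_per_move, cores=1, base_gain=107.0, exponent=-0.20):
    """Elo gain from one doubling, per the fitted rule 107 * t^(-0.20).

    Assumes perfect scaling with cores, so effective time = time * cores.
    """
    effective_time = seconds_per_move * cores
    return base_gain * effective_time ** exponent

# The two forms of the fit agree because log2(0.87) is about -0.20.
print(round(math.log2(0.87), 3))

# Fitted gains at the measured time controls.
for t in (1, 2, 4):
    print(t, round(gain_per_doubling(t), 1))

ltc = 120 * 60 / 40  # 120 min / 40 moves = 180 s per move
print(round(gain_per_doubling(ltc), 1))           # one core
print(round(gain_per_doubling(ltc, cores=8), 1))  # eight cores, perfect scaling assumed
```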

Further speculation: extending to an infinite time control, the total improvement from 1s/move is the geometric sum 107/(1-0.87) ~ 820 Elo points, so that Komodo 5 is limited to a strength of something like 4000 Elo points (calibrated to the current lists) at infinite time control.
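
The 107/(1-0.87) figure is the limit of a geometric series: each doubling is assumed to bring 0.87 times the gain of the previous one. A small sketch of how the partial sums approach that limit (the function name `total_gain` and its parameters are illustrative, not from the post):

```python
def total_gain(first_gain=107.0, ratio=0.87, doublings=None):
    """Sum of Elo gains over successive doublings, each `ratio` times the previous.

    With doublings=None, return the closed-form limit first_gain / (1 - ratio).
    """
    if doublings is None:
        return first_gain / (1.0 - ratio)
    return first_gain * (1.0 - ratio ** doublings) / (1.0 - ratio)

# Partial sums after n doublings, then the infinite-time limit.
for n in (1, 5, 10, 20, 50):
    print(n, round(total_gain(doublings=n), 1))
print("limit", round(total_gain(), 1))
```

Under this model most of the total is collected within the first twenty or so doublings.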

I think the formula 107*(time per move in seconds)^(-0.20) Elo points is useful as a rule of thumb for the gain from doubling time. This is for one modern core; on several cores the time should be multiplied by the number of cores.

Kai

Posted: **Mon Dec 10, 2012 9:11 pm**

Your testing method is somewhat susceptible to systematic error, because you test against a fixed-strength opponent. So the ratings you get are sensitive to the rating model, and the saturation you see could very well be caused by the shape of the rating curve being different from what the Elo calculator assumes.

Posted: **Mon Dec 10, 2012 10:22 pm**

hgm wrote: Your testing method is somewhat susceptible to systematic error, because you test against a fixed-strength opponent. So the ratings you get are sensitive to the rating model, and the saturation you see could very well be caused by the shape of the rating curve being different from what the Elo calculator assumes.

I don't think this is a serious issue here. The ratings of Komodo are connected only through Houdini, and the points given strictly reflect the Elo curve. It's true that the Elo curve might deviate from the true rating curve for engines, but I have seen possible deviations only in the tails, over a span of some 1200 Elo points (where the curve is possibly a Gaussian), not over the 250 points shown here. Yes, the points here assume the Elo model, but I don't think that is problematic on this span.

Posted: **Mon Dec 10, 2012 10:51 pm**

Well, what you think and what you measure are two different things...

If you plan to continue this investigation, I would be really curious to see what happens if you also throw in a 0.25s, 0.5s and 2s Houdini.

Posted: **Mon Dec 10, 2012 10:55 pm**

BTW, the CSS forum has a search-depth test with Houdini 3. The result is weird:

Depth   Elo diff.  Result

1  -  2   222    217.5-782.5  (+82=271-647)
2  -  3   185    256-744      (+75=362-563)
3  -  4   151    295.5-704.5  (+67=457-476)
4  -  5   128    323.5-676.5  (+69=509-422)
5  -  6   122    331.5-668.5  (+69=525-406)
6  -  7   107    350.5-649.5  (+72=557-371)
7  -  8   120    333.5-666.5  (+54=559-387)
8  -  9   175    268-732      (+28=480-492)
9  -  10  123    330.5-669.5  (+39=583-378)
10 -  11   92    370-630      (+29=682-289)
11 -  12   75    393.5-606.5  (+49=689-262)
12 -  13   63    410.5-589.5  (+41=739-220)
13 -  14   63    411-589      (+28=766-206)
14 -  15   46    433.5-566.5  (+36=795-169)

What on earth happens in the 8-9 ply match!?
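
For reference, the Elo difference column follows from the match score via the usual logistic Elo formula. A minimal Python sketch (the function name `elo_diff` is illustrative; the points are copied from a few rows of the table, each match being 1000 games):

```python
import math

def elo_diff(points, games):
    """Elo difference implied by a match score under the logistic Elo model."""
    score = points / games
    return 400.0 * math.log10(score / (1.0 - score))

# (shallower depth, deeper depth, points scored by the deeper search out of 1000)
rows = [(1, 2, 782.5), (7, 8, 666.5), (8, 9, 732.0), (9, 10, 669.5), (14, 15, 566.5)]
for shallow, deep, pts in rows:
    print(f"{shallow}-{deep}: {elo_diff(pts, 1000):.1f}")
```

This reproduces the tabulated differences to within rounding, so the 8-9 outlier is already present in the raw score and is not an artifact of the conversion.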

Posted: **Mon Dec 10, 2012 11:11 pm**

The result is weird anyway: the draw fraction seems to go up enormously at higher depth. I would consider 80% draws a ridiculously high draw fraction, between nearly equal engines.
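
The rise in the draw fraction can be read straight off the W/D/L figures. A small sketch using four rows of the table from the CSS forum post (the dictionary and function names are illustrative):

```python
# depth pair -> (wins, draws, losses) of the shallower search, from the table
results = {
    "1-2": (82, 271, 647),
    "5-6": (69, 525, 406),
    "10-11": (29, 682, 289),
    "14-15": (36, 795, 169),
}

def draw_fraction(wins, draws, losses):
    """Share of drawn games in a match."""
    return draws / (wins + draws + losses)

for pair, wdl in results.items():
    print(f"{pair}: {100 * draw_fraction(*wdl):.1f}% draws")
```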

Posted: **Mon Dec 10, 2012 11:24 pm**

Extremes happen from time to time. Here is a snapshot from a 1+1 match I am running on 8 threads. After 100 games I have:

ProDeo 1.82 (work) vs ProDeo 1.81 (main) 10-12-2012 using [TIME 1+1]

Testing:

[HR_DEPTH = 1] * default=3

  # ENGINE    : RATING     POINTS   PLAYED  (%)
   1 WORK1    : 2649.4      11.0      13   84.6%
   2 MAIN5    : 2625.7      10.5      13   80.8%
   3 MAIN7    : 2596.3       9.0      12   75.0%
   4 WORK4    : 2587.5       9.5      13   73.1%
   5 MAIN3    : 2577.7       8.5      12   70.8%
   6 MAIN6    : 2549.0       7.0      11   63.6%
   7 MAIN2    : 2544.8       7.5      12   62.5%
   8 WORK8    : 2500.0       7.0      14   50.0%
   9 MAIN8    : 2500.0       7.0      14   50.0%
  10 WORK2    : 2455.2       4.5      12   37.5%
  11 WORK6    : 2451.0       4.0      11   36.4%
  12 WORK3    : 2422.3       3.5      12   29.2%
  13 MAIN4    : 2412.5       3.5      13   26.9%
  14 WORK7    : 2403.7       3.0      12   25.0%
  15 WORK5    : 2374.3       2.5      13   19.2%
  16 MAIN1    : 2350.6       2.0      13   15.4%

Engine WORK (elo 2500) vs Engine MAIN (elo 2500) estimated TPR 2465 (-35)
28-34-38 (100) match score 45.0 - 55.0 (45.0%)
Won-loss 28-38 = -10 (100 games) draws 34.0%
LOS = 11.1%  Elo Error Margin +56 -56

WORK  4:15:26 (25.920M nodes) NPS = 1.691K
MAIN  4:17:49 (25.971M nodes) NPS = 1.679K

Depth Stats      MIDG    END0   END1   END2
WORK             11.63  12.09  12.80  16.70
MAIN             11.20  12.16  12.41  16.12

MAIN = ProDeo 1.81
WORK = ProDeo 1.81 + a code change

Please explain why WORK1 scores 11/13 while the same version (WORK5) scores only 2.5/13, with the total score at -10.

I am confident that after 2000 games (or so) things will be clear.
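
The summary figures in the snapshot (Elo difference, LOS, error margin) can be approximated from the 28-34-38 result alone. The sketch below uses common textbook approximations: the logistic Elo formula, a normal approximation for LOS based on wins and losses only, and a 95% interval from the per-game variance. These formulas are assumptions and not necessarily what the ProDeo match tool computes; the function name `match_stats` is illustrative.

```python
import math

def match_stats(wins, draws, losses):
    """Return (score, elo, los, margin) for one side of a match."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games

    def to_elo(s):
        return 400.0 * math.log10(s / (1.0 - s))

    # Likelihood of superiority, normal approximation; draws are ignored here.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    # Per-game variance of the result (1, 0.5 or 0 points) around the mean score.
    var = (wins * (1 - score) ** 2 + draws * (0.5 - score) ** 2
           + losses * score ** 2) / games
    half_width = 1.96 * math.sqrt(var / games)  # 95% interval on the score
    margin = (to_elo(score + half_width) - to_elo(score - half_width)) / 2.0
    return score, to_elo(score), los, margin

score, elo, los, margin = match_stats(28, 34, 38)
print(f"score={score:.3f} elo={elo:+.1f} LOS={100 * los:.1f}% margin=+/-{margin:.0f}")
```

With an error margin of this size on 100 games, the 12 to 14 games played by each individual instance say very little on their own.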

Posted: **Mon Dec 10, 2012 11:38 pm**

hgm wrote: Well, what you think and what you measure are two different things...

If you plan to continue this investigation, I would be really curious to see what happens if you also throw in a 0.25s, 0.5s and 2s Houdini.

Apart from the Elo model, which is pretty irrelevant here (on the -100 to +100 point interval all models are almost linear), the problem I see is the still-large error margins. These 2,000-4,000 blitz games matches take a lot of time, so I will not throw in different TC Houdinis, as I just wanted to see the rule of thumb law.
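
To illustrate how little the choice of rating model matters on a +/-100 point interval, here is a sketch comparing the logistic Elo curve with a Gaussian one. The Gaussian width of 200*sqrt(2) is the classical choice and is an assumption here, as are the function names.

```python
import math

def logistic_expected(diff):
    """Expected score for an Elo difference under the logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def gaussian_expected(diff, sigma=200.0 * math.sqrt(2.0)):
    """Expected score under a normal model with standard deviation sigma (assumed)."""
    return 0.5 * (1.0 + math.erf(diff / (sigma * math.sqrt(2.0))))

for d in (-100, -50, 0, 50, 100, 400, 800):
    l, g = logistic_expected(d), gaussian_expected(d)
    print(f"{d:5d}  logistic={l:.4f}  gaussian={g:.4f}  gap={l - g:+.4f}")
```

Within +/-100 points the two curves differ by a fraction of a percent in expected score; the gap only becomes noticeable several hundred points out.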

Posted: **Tue Dec 11, 2012 8:24 am**

Interesting test! I guess the value most people assume for a doubling of cores is around 40 Elo points, but that is at longer time controls. For super-fast matches I think the architecture of the engine makes a huge difference when different engines are pitted against each other. I would think the difference between process-based and threaded engines would be especially large at super-fast time controls and would change when moving to longer time controls.

Posted: **Tue Dec 11, 2012 8:37 am**

Laskos wrote: These 2,000-4,000 blitz games matches take a lot of time, so I will not throw in different TC Houdinis, as I just wanted to see the rule of thumb law.

Well, the 'rule of thumb' seems to be 100 Elo per doubling. (Which, btw, is more than I expected. I always assumed 70 Elo per doubling.) The rest of the analysis seems to be mostly analysing noise, based on a +7 +/- 4.2 @ 1s and a -19 +/- 6.3 @ 4s.

Elo points gain from doubling time
