Houdini 3 running for the IPON

pohl4711 · Post by **pohl4711** » Tue Oct 16, 2012 12:29 pm

I would be disappointed with less than 40 Elo improvement for Houdini 3 in IPON.
Either way, we're very close to the 50 Elo gain I "officially" announced; you cannot expect any more precision from any rating list, nor from my own development testing gauntlets (which measured 50 to 55 Elo).

Robert

Hello,

With 2040 of 10000 games now played in my LS ranking-list gauntlet of Houdini 3, its score is 68.2%. Houdini 2.0c's final result after 10000 games against the same 10 strong opponents was 62%.
+6.2% is around +43 Elo. Houdini 3 seems to be very strong, but before drawing any conclusions I want to play the complete 10000 games.
Final result here (hopefully) Sunday evening...

Best - Stefan

Houdini · Post by **Houdini** » Tue Oct 16, 2012 12:51 pm

Stefan, thanks for the testing.
Not that it's really important, but the jump from 62.0% (+85 Elo) to 68.2% (+133 Elo) is 48 Elo.

The "1% = 7 Elo" rule is only valid up to about 60% performance, see for example the Elo table http://www.pradu.us/old/Nov27_2008/Buzz/elotable.html which I used for the numbers above.
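
For reference, a short derivation of where the rule of thumb comes from. It assumes the logistic Elo model, which reproduces the +85 and +133 figures above; the linked table itself may be built differently.

```latex
D(p) = 400 \log_{10}\frac{p}{1-p},
\qquad
\frac{dD}{dp} = \frac{400}{\ln 10}\cdot\frac{1}{p(1-p)}

\left.\frac{dD}{dp}\right|_{p=0.50} \approx 694.9 \ \text{Elo per unit score} \approx 6.9 \ \text{Elo per 1\%}

\left.\frac{dD}{dp}\right|_{p=0.65} \approx 763.6 \ \text{Elo per unit score} \approx 7.6 \ \text{Elo per 1\%}
```

Under this assumption, 7 Elo per 1% is the slope near a 50% score. The slope grows as the score moves away from 50%, which is why the 6.2% between 62% and 68.2% is worth more than 6.2 × 7 ≈ 43 Elo.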

Robert

overlord · Post by **overlord** » Tue Oct 16, 2012 1:53 pm

Robert,

I believe that the Elo gain would be bigger if we didn't count weaker opponents. Ratings for top engines should be calculated only from games against the 5 best engines. It is the same for people: Anand and Carlsen would never have such high ratings if they played ordinary open tournaments with many weaker players. They simply are not able to beat everybody... The playing style of the new Houdini 3 looks really amazing.

Leto · Post by **Leto** » Tue Oct 16, 2012 3:34 pm

Testing at IPON is almost done, and H3 is showing a +51 Elo improvement so far:

Houdini 3

Houdini 3 STD - Komodo 5 (3012) 81.5 - 45.5 64.17% Perf=3113
Houdini 3 STD - Critter 1.4a (2990) 80.5 - 49.5 61.92% Perf=3074
Houdini 3 STD - Stockfish 2.2.2 JA (2972) 91.5 - 38.5 70.38% Perf=3122
Houdini 3 STD - Deep Rybka 4.1 (2965) 85.5 - 42.5 66.80% Perf=3086
Houdini 3 STD - Naum 4.2 (2840) 108.0 - 21.0 83.72% Perf=3124
Houdini 3 STD - HIARCS 14 WCSC 32b (2824) 106.0 - 22.0 82.81% Perf=3097
Houdini 3 STD - Gull 1.2 (2805) 113.5 - 14.5 88.67% Perf=3162
Houdini 3 STD - Hannibal 1.2 (2801) 110.5 - 18.5 85.66% Perf=3111
Houdini 3 STD - Deep Shredder 12 (2800) 103.5 - 23.5 81.50% Perf=3057
Houdini 3 STD - Deep Sjeng c't 2010 32b (2795) 116.5 - 13.5 89.62% Perf=3169
Houdini 3 STD - Spike 1.4 32b (2784) 116.0 - 13.0 89.92% Perf=3164
Houdini 3 STD - spark-1.0 (2770) 111.5 - 17.5 86.43% Perf=3091
Houdini 3 STD - Protector 1.4.0 (2761) 108.5 - 19.5 84.77% Perf=3059
Houdini 3 STD - Deep Junior 13.3 (2754) 113.0 - 17.0 86.92% Perf=3083
Houdini 3 STD - Quazar 0.4 (2740) 114.5 - 14.5 88.76% Perf=3098
Houdini 3 STD - Zappa Mexico II (2709) 117.0 - 13.0 90.00% Perf=3090
Houdini 3 STD - MinkoChess 1.3 (2696) 121.0 - 8.0 93.80% Perf=3167
1798.5 - 391.5 82.12% Perf=3088

2190 out of 2550 games played

IWB · Post by **IWB** » Tue Oct 16, 2012 3:37 pm

I have already received a few similar requests. As I am interested in that myself, I will run a short test to see where it would be in the IPON. Nonetheless, I will not include it officially.

Bye
Ingo

Vinvin wrote:
IWB wrote: As usual you find the running tourney here: http://www.inwoba.de

Have fun
Ingo
Ingo, after this match, would you mind playing a shorter match with H3 tactical against non-top engines (let's say 100 games against Deep Rybka 4.1, Deep Fritz 13 32b, Naum 4.2, Chiron 1.1a, HIARCS 14 WCSC 32b and Gull 1.2)?

I haven't seen any rating for H3 tactical from games yet ...

Thanks,
Vincent

IWB · Post by **IWB** » Tue Oct 16, 2012 3:39 pm

Leto wrote: Testing at IPON almost done, ...

There are 650 games missing! Do not forget the games against Fritz and Chiron ...

Nonetheless I have to admit that the first impression looks promising.

Bye
Ingo

Ajedrecista · Post by **Ajedrecista** » Tue Oct 16, 2012 4:53 pm

Hello Johan:

Maharadja wrote:
Houdini 3 STD - Komodo 5 (3012) 71.5 - 43.5 62.17% Perf=3098
Houdini 3 STD - Critter 1.4a (2990) 69.0 - 46.0 60.00% Perf=3060
Houdini 3 STD - Stockfish 2.2.2 JA (2972) 81.5 - 33.5 70.87% Perf=3126
Houdini 3 STD - Deep Rybka 4.1 (2965) 76.5 - 37.5 67.11% Perf=3088
Houdini 3 STD - Naum 4.2 (2840) 95.0 - 18.0 84.07% Perf=3128
Houdini 3 STD - HIARCS 14 WCSC 32b (2824) 93.0 - 21.0 81.58% Perf=3082
Houdini 3 STD - Gull 1.2 (2805) 97.5 - 14.5 87.05% Perf=3136
Houdini 3 STD - Hannibal 1.2 (2801) 96.5 - 17.5 84.65% Perf=3097
Houdini 3 STD - Deep Shredder 12 (2800) 92.0 - 22.0 80.70% Perf=3048
Houdini 3 STD - Deep Sjeng c't 2010 32b (2795) 101.5 - 12.5 89.04% Perf=3158
Houdini 3 STD - Spike 1.4 32b (2784) 103.0 - 12.0 89.57% Perf=3157
Houdini 3 STD - spark-1.0 (2770) 98.5 - 15.5 86.40% Perf=3091
Houdini 3 STD - Protector 1.4.0 (2761) 96.5 - 17.5 84.65% Perf=3057
Houdini 3 STD - Deep Junior 13.3 (2754) 98.5 - 15.5 86.40% Perf=3075
Houdini 3 STD - Quazar 0.4 (2740) 101.5 - 11.5 89.82% Perf=3118
Houdini 3 STD - Zappa Mexico II (2709) 103.0 - 11.0 90.35% Perf=3097
Houdini 3 STD - MinkoChess 1.3 (2696) 106.5 - 7.5 93.42% Perf=3156
1581.5 - 356.5 81.60% Perf=3082

1938 out of 2550 games played

Hi,

I don't understand how this average rating is calculated.
Why can't we sum up the ratings per engine and divide by 17?
Thus: 52772/17 = 3104.235294

Your question was widely discussed back in January:

http://www.talkchess.com/forum/viewtopic.php?t=41773

That thread is very long, so you will need patience. You can read it and try to draw your own conclusions, though that may be difficult given the large amount of material posted there. Good luck!
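
As a rough illustration of why the two numbers differ, here is a minimal Python sketch. It assumes each per-opponent Perf is the opponent's rating plus the logistic Elo difference for the score against that opponent (this matches the Perf column quoted above to within a point or so), and that the overall figure pools all games against the games-weighted average opponent rating. How IPON actually computes its ratings may differ; the names below are illustrative only.

```python
import math

# (opponent rating, points for Houdini 3, points for the opponent),
# copied from the quoted table of 1938 games.
results = [
    (3012, 71.5, 43.5), (2990, 69.0, 46.0), (2972, 81.5, 33.5),
    (2965, 76.5, 37.5), (2840, 95.0, 18.0), (2824, 93.0, 21.0),
    (2805, 97.5, 14.5), (2801, 96.5, 17.5), (2800, 92.0, 22.0),
    (2795, 101.5, 12.5), (2784, 103.0, 12.0), (2770, 98.5, 15.5),
    (2761, 96.5, 17.5), (2754, 98.5, 15.5), (2740, 101.5, 11.5),
    (2709, 103.0, 11.0), (2696, 106.5, 7.5),
]

def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by a score fraction in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

# Approach 1: compute a performance per opponent, then average the 17 values.
per_opponent = [r + elo_from_score(w / (w + l)) for r, w, l in results]
mean_of_perfs = sum(per_opponent) / len(per_opponent)

# Approach 2 (assumed): pool all games first, then convert the overall score once.
games = sum(w + l for _, w, l in results)
points = sum(w for _, w, _ in results)
avg_opponent = sum(r * (w + l) for r, w, l in results) / games
pooled_perf = avg_opponent + elo_from_score(points / games)

print(f"mean of per-opponent performances: {mean_of_perfs:.0f}")
print(f"pooled performance:                {pooled_perf:.0f}")
```

With these inputs the first number comes out near 3104 and the second in the low 3080s, close to the Perf=3082 in the table. The score-to-Elo conversion is nonlinear, so the lopsided results against the weaker engines pull the simple average upward.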

Regards from Spain.

Ajedrecista.

Ajedrecista · Post by **Ajedrecista** » Tue Oct 16, 2012 5:04 pm

Hello Stefan:

pohl4711 wrote:
I would be disappointed with less than 40 Elo improvement for Houdini 3 in IPON.
Either way, we're very close to the 50 Elo gain I "officially" announced; you cannot expect any more precision from any rating list, nor from my own development testing gauntlets (which measured 50 to 55 Elo).

Robert
Hello,

With 2040 of 10000 games now played in my LS ranking-list gauntlet of Houdini 3, its score is 68.2%. Houdini 2.0c's final result after 10000 games against the same 10 strong opponents was 62%.
+6.2% is around +43 Elo. Houdini 3 seems to be very strong, but before drawing any conclusions I want to play the complete 10000 games.
Final result here (hopefully) Sunday evening...

Best - Stefan

Thank you very much for your LS lists. I stay tuned for the final results!

I also noticed yesterday that improving from 62% to 68.2% is not a gain of 43 Elo but a little more (ignoring the error bars). I did not want to post about it, but I see that Robert noticed it too (thanks for Houdini 3 and the table).

Stefan, I recommend following the definition of Elo differences itself: if you have two scores of 62% - 38% and 68.2% - 31.8%, then the expected difference is 400*[log10(68.2/31.8) - log10(62/38)] ~ 47.5 Elo. This equation can be applied to any two scores you want, except the extreme cases of 0% - 100% and vice versa, of course.
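
A minimal Python sketch of this calculation, taking the logarithm as base 10. The function names are illustrative and do not come from any rating tool.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        # 0% and 100% have no finite Elo difference under this formula.
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

def elo_gain(old_score: float, new_score: float) -> float:
    """Elo gain between two scores achieved against the same opposition."""
    return elo_from_score(new_score) - elo_from_score(old_score)

old_score, new_score = 0.62, 0.682
gain = elo_gain(old_score, new_score)

# The "1% = 7 Elo" rule of thumb, for comparison.
rule_of_thumb = (new_score - old_score) * 100 * 7

print(f"{old_score:.1%} -> {elo_from_score(old_score):+.0f} Elo")
print(f"{new_score:.1%} -> {elo_from_score(new_score):+.0f} Elo")
print(f"gain by definition:   {gain:.1f} Elo")
print(f"gain by 7-Elo rule:   {rule_of_thumb:.1f} Elo")
```

This reproduces the +85 and +133 figures and the gain of about 47.5 Elo quoted in the thread, against about 43 Elo from the linear rule.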

Regards from Spain.

Ajedrecista.

IWB · Post by **IWB** » Tue Oct 16, 2012 5:11 pm

If you continue clicking on my page I might get a new daily record for the IPON.

The last one was on the 28th of December 2011: one day after the Komodo 4 run and one day into the Critter 1.4 test!

Thx for that support

Reg
Ingo

Houdini · Post by **Houdini** » Tue Oct 16, 2012 5:42 pm

A nice 3100 average performance against the top 4:

Houdini 3 STD - Komodo 5 (3012)              88.0 - 48.0    64.71%    Perf=3117
Houdini 3 STD - Critter 1.4a (2990)          86.5 - 52.5    62.23%    Perf=3076
Houdini 3 STD - Stockfish 2.2.2 JA (2972)    98.0 - 41.0    70.50%    Perf=3123
Houdini 3 STD - Deep Rybka 4.1 (2965)        92.5 - 44.5    67.52%    Perf=3092

Behind that it's a massacre, no engine scores over 18%.

Robert
