Long TC matches with Houdini 3 Beta

Martin Thoresen
Posts: 1833
Joined: Thu Jun 22, 2006 12:07 am

Re: Long TC matches with Houdini 3 Beta

Post by Martin Thoresen »

Impressive results Robert, I can only say congratulations!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Long TC matches with Houdini 3 Beta

Post by Laskos »

MM wrote:
Laskos wrote:
Seems like a 50-60 point improvement at long TC, which is impressive. Also, in tactical mode it beats everything on test suites.
Pity that the tactical mode weakens the engine overall, otherwise it would be the default.

+50/60 Elo? Maybe, but I would be very prudent about it.
I think it's too soon to estimate an improvement. Note that many opening lines of the matches are pretty long, and that probably favors one side or the other (Houdini usually adapts to a wider range of middlegame positions better than other engines).
Note also that 360 games are still not a large enough sample to jump to conclusions.

On the other hand, considering that H3 is probably not tactically stronger than H2, or at least not much stronger, if H3 really is that strong overall at long time control it would mean that Robert Houdart has done superb work on the positional play.

I would be curious to see some results of H3 at Chess960, in which the influence of the opening book is zero.

Best Regards
Noomen test positions + reversed colours are pretty fair and representative, I guess. Not only do these positions not favour one side, but even a position that heavily favours one side only compresses the differences when played with reversed colours. 360 games give an error of about 25 points, so let's see, but for now it is very impressive: the W/L ratio against the top engines is 2-3 for H3.
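
As a rough check on the 25-point figure, here is a minimal sketch of the error margin for a match of a given length. It assumes a win/draw/loss model with independent games and the usual logistic Elo formula; the function names and the 25%/50%/25% win/draw/loss split are illustrative assumptions, not numbers from this thread.

```python
import math


def elo_from_score(score):
    """Convert a score fraction (0 < score < 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin(games, win_rate, draw_rate, z=1.96):
    """Approximate half-width of the Elo confidence interval for one match.

    win_rate and draw_rate are per-game probabilities (assumed values);
    z=1.96 corresponds to a 95% interval under a normal approximation.
    The interval score +/- z*se must stay strictly inside (0, 1).
    """
    loss_rate = 1.0 - win_rate - draw_rate
    score = win_rate + 0.5 * draw_rate
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (win_rate * (1.0 - score) ** 2
                + draw_rate * (0.5 - score) ** 2
                + loss_rate * (0.0 - score) ** 2)
    std_error = math.sqrt(variance / games)
    low = elo_from_score(score - z * std_error)
    high = elo_from_score(score + z * std_error)
    return (high - low) / 2.0


# 360 games between equal engines, half of them drawn (assumed split).
print(round(elo_margin(360, 0.25, 0.50), 1))
```

With 360 games between equal engines and half the games drawn, the 95% margin comes out at roughly 25 Elo; a lower draw rate widens it.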

Kai
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: Long TC matches with Houdini 3 Beta

Post by MM »

I didn't say that "positions" favor one side; I said that medium-long opening book lines can favor one side or the other. I think Houdini adapts better than other engines to the different middlegame positions, even when it doesn't like them, and even to the same position with White and Black.

That's why, when I can, I prefer to test at Chess960; I don't like to see engines skip the opening and play already prepared positions.

Best Regards
MM
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Long TC matches with Houdini 3 Beta

Post by Laskos »

Chess960 is a slightly different game, and I think Critter is optimized for it. Maybe it's better to test with shorter lines; testing groups use book lines or opening positions of different lengths, but to me those Noomen positions are good enough. Maybe Houdini adapts to middlegames better, but theoretically the longer the (balanced) lines, the more balanced the results.

Kai
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Long TC matches with Houdini 3 Beta

Post by Houdini »

MM wrote: That's why, when I can, I prefer to test at Chess960; I don't like to see engines skip the opening and play already prepared positions.
I expect the improvement for Chess960 to be slightly larger than for normal chess.

Two months ago I ran Chess960 matches against Critter 1.6a with Houdini 2.0c and Houdini 3 DEV; see my post http://www.talkchess.com/forum/viewtopic.php?p=476331 and following. Each match was 1920 games at 2'+2", single thread.

Code:

Houdini 2.0c - Critter 1.6a  : 910-1010 (-18 Elo ± 12 Elo)
Houdini 3 DEV - Critter 1.6a : 1134-786 (+64 Elo ± 12 Elo)
Measured gain was (82 Elo ± 17 Elo).
The current Houdini 3 is approx. 10 Elo stronger than the DEV version of July.
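
The figures above can be reproduced from the raw points with the standard logistic Elo formula. The sketch below assumes that "910-1010" means points scored by each side (a win counting 1, a draw 0.5) and that the two ± 12 error bars are independent, so they combine in quadrature; `elo_diff` is a hypothetical helper name.

```python
import math


def elo_diff(points_for, points_against):
    """Elo difference implied by a match score under the logistic model."""
    return 400.0 * math.log10(points_for / points_against)


# Match scores as posted (1920 games each).
h2_vs_critter = elo_diff(910, 1010)
h3_vs_critter = elo_diff(1134, 786)

# Gain of Houdini 3 DEV over Houdini 2.0c against the same opponent.
gain = h3_vs_critter - h2_vs_critter

# Independent errors add in quadrature: sqrt(12^2 + 12^2).
gain_error = math.hypot(12.0, 12.0)

print(round(h2_vs_critter), round(h3_vs_critter), round(gain), round(gain_error))
# prints: -18 64 82 17
```

Using the same opponent for both matches is what makes the subtraction meaningful: any bias specific to Critter or to the openings affects both results in the same way.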

Robert
Rolf
Posts: 6081
Joined: Fri Mar 10, 2006 11:14 pm
Location: Munster, Nuremberg, Princeton

Re: Long TC matches with Houdini 3 Beta

Post by Rolf »

Could you please publish these 1920 games? Thanks
-Popper and Lakatos are good but I'm stuck on Leibowitz
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Long TC matches with Houdini 3 Beta

Post by lucasart »

carldaman wrote: Nice result, Robert. Also, it's ironic and interesting that the win % ≈ phi; I wonder if there is any significance ;)

Regards,
CL
You mean 1/phi?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: Long TC matches with Houdini 3 Beta

Post by MM »

Hi Robert, thank you. I was aware of that match, and I'm glad to hear that the current Houdini 3 is about 10 Elo stronger than that. Don't you think it would be interesting to run some other quick matches (2'+2" is good) against other engines (e.g. Stockfish, Rybka 4.1)?

Best Regards
MM
Uri Blass
Posts: 10281
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Long TC matches with Houdini 3 Beta

Post by Uri Blass »

I think that for a comparison between long and short time controls it is better to use the same type of time control, the same positions and the same opponents.

This is the reason I suggested 6'+2'' (90/15+30/15) or 3'+1'' (90/30+30/30).
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: Long TC matches with Houdini 3 Beta

Post by Albert Silver »

Uri Blass wrote:

Code:

90+1

Rank Name                       Elo      +      -    games   score   oppo.   draws 
   1 Komodo 4471.02 64 bit    3060.9   12.9   12.9    2530   57.9%  2990.0   35.4% 
   2 Komodo 4467.01 64 bit    3027.6    8.7    8.7    5300   54.4%  2990.0   42.0% 
   3 Houdini 1.5a x64         3025.2    7.2    7.2    7884   49.7%  3027.6   39.1% 
   4 Komodo 4468.00 64 bit    3024.8    8.7    8.7    5298   54.1%  2990.1   42.4% 
   5 Komodo 4471.01 64 bit    3021.6    8.6    8.6    5321   53.7%  2990.0   43.5% 
   6 Komodo 5 64 bit dev      3020.7    8.7    8.7    5313   53.7%  2990.1   43.1% 
   7 Critter 1.4 64-bit SSE4  3000.0    7.1    7.1    7957   46.7%  3027.6   44.4% 
   8 Stockfish 2.2.2 JA       2945.0    7.1    7.1    7921   40.4%  3027.6   42.4% 


120+2

Rank Name                       Elo      +      -    games   score   oppo.   draws 
   1 Komodo 4467.01 64 bit    3036.1    8.5    8.5    5594   55.0%  2992.1   43.9% 
   2 Houdini 1.5a x64         3030.6    6.0    6.0   11500   50.4%  3027.7   42.1% 
   3 Komodo 4463.00 64 bit    3029.7    6.1    6.1   10939   54.3%  2992.3   44.6% 
   4 Komodo 4466.02 64 bit    3027.1    7.5    7.5    7127   53.9%  2992.2   45.0% 
   5 Komodo 5 64 bit dev      3021.9    6.1    6.1   10906   53.4%  2992.2   44.4% 
   6 Critter 1.4 64-bit SSE4  3000.0    5.9    5.9   11530   46.8%  3027.7   45.5% 
   7 Stockfish 2.2.2 JA       2946.2    5.9    5.9   11536   40.7%  3027.7   45.9% 
Based on these results, Komodo 4471.02 64 bit seems significantly stronger than the other versions, but for some reason you tested it only in the 90+1 list.

I wonder whether it really was a big improvement over the other Komodo versions, or whether there is some mistake in the data or a problem with the machine that tested it.
It's not, and the result is invalid. There was a bug in the tester, and it gave this bogus result.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."