Long TC matches with Houdini 3 Beta

Martin Thoresen · Post by **Martin Thoresen** » Fri Sep 28, 2012 11:21 am

Impressive results, Robert. I can only say congratulations!

Laskos · Post by **Laskos** » Fri Sep 28, 2012 11:26 am

MM wrote:
Laskos wrote:

It seems to be a 50-60 point improvement at long TC, which is impressive. Also, in tactical mode it beats everything on test suites.
Pity that the tactical mode weakens the engine overall, otherwise it would be the default.

+50/60 Elo? Maybe, but I would be very cautious about it.
I think it's too soon to estimate an improvement. Note that many opening lines of the matches are pretty long, and that probably favors one side or the other (Houdini usually adapts to a wider range of middlegame positions better than other engines).
Note also that 360 games are still not a large enough sample to jump to conclusions.

On the other hand, H3 is probably not tactically stronger than H2, or at least not much stronger. So if H3 really is this strong overall at long time control, it would mean that Robert Houdart has done superb work on the positional play.

I would be curious to see some results of H3 at Chess960, where the influence of the opening book is zero.

Best Regards

Noomen test positions + reversed colours are pretty fair and representative, I guess. Not only do these positions not favour one side, but a position that heavily favours one side only compresses the differences when it is played with reversed colours. 360 games give an error margin of about 25 points, so let's see, but for now it's very impressive: the W/L ratio against the top engines is 2-3 for H3.
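
As a rough check on the 25-point figure, here is a minimal sketch of how an Elo error margin can be estimated from the number of games. It assumes a 95% interval, a score near 50%, independent games, and a draw ratio of about 50%; the draw ratio is not stated in the thread and is only a guess, and the function name is illustrative.

```python
import math

def elo_margin_95(games: int, score: float = 0.5, draw_ratio: float = 0.5) -> float:
    """Approximate 95% Elo error margin for a match (delta method).

    score and draw_ratio are assumed values, not figures from the thread.
    """
    win = score - draw_ratio / 2          # fraction of games won
    loss = 1.0 - score - draw_ratio / 2   # fraction of games lost
    if win < 0 or loss < 0:
        raise ValueError("score and draw_ratio are inconsistent")
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (win * (1.0 - score) ** 2
           + draw_ratio * (0.5 - score) ** 2
           + loss * (0.0 - score) ** 2)
    std_err = math.sqrt(var / games)
    # Slope of Elo(p) = 400 * log10(p / (1 - p)) at p = score.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return 1.96 * std_err * slope

print(f"{elo_margin_95(360):.0f}")  # about 25 under these assumptions
```

With fewer draws the margin grows: at a 30% draw ratio the same 360 games give roughly 30 points. Reversed-colour pairs are also not strictly independent, so this is only an approximation.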

Kai

MM · Post by MM » Fri Sep 28, 2012 11:38 am

I didn't say that the "positions" favor one side; I said that medium-to-long opening book lines can favor one side or the other. I think Houdini adapts better than other engines to the different middlegame positions, even when it doesn't like them, and even in the same position with White and Black.

That's why, when I can, I prefer to test at Chess960. I don't like to see engines skip the opening and play already prepared positions.

Best Regards

Laskos · Post by **Laskos** » Fri Sep 28, 2012 11:53 am

Chess960 is a somewhat different game, and I think Critter is optimized for it. Maybe it's better to test with shorter lines; testing groups use book lines or opening positions of different lengths, but to me those Noomen positions are good enough. Maybe Houdini does adapt to middlegames better, but in theory, the longer the (balanced) lines, the more balanced the results.

Kai

Houdini · Post by **Houdini** » Fri Sep 28, 2012 11:59 am

MM wrote: That's why, when I can, I prefer to test at Chess960. I don't like to see engines skip the opening and play already prepared positions.

I expect the improvement for Chess960 to be slightly larger than for normal chess.

Two months ago I ran Chess960 matches against Critter 1.6a with Houdini 2.0c and Houdini 3 DEV; see my post http://www.talkchess.com/forum/viewtopic.php?p=476331 and the following ones. Each match was 1920 games at 2'+2", single thread.

Code:

Houdini 2.0c - Critter 1.6a  : 910-1010 (-18 Elo ± 12 Elo)
Houdini 3 DEV - Critter 1.6a : 1134-786 (+64 Elo ± 12 Elo)

The measured gain was 82 Elo ± 17 Elo.
The current Houdini 3 is approx. 10 Elo stronger than the DEV version of July.
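
The Elo figures above can be reproduced from the match scores. This is a minimal sketch, assuming the two numbers in each result are points totals (wins plus half the draws), the usual logistic Elo formula, and independent error margins that combine as a root sum of squares. The function names are illustrative and not taken from any tool mentioned in the thread.

```python
import math

def elo_diff(points_for: float, points_against: float) -> float:
    """Elo difference implied by a match score under the logistic model."""
    return 400.0 * math.log10(points_for / points_against)

def combined_margin(*margins: float) -> float:
    """Error margin of a difference of independent estimates."""
    return math.sqrt(sum(m * m for m in margins))

h2 = elo_diff(910, 1010)   # Houdini 2.0c vs Critter 1.6a
h3 = elo_diff(1134, 786)   # Houdini 3 DEV vs Critter 1.6a
gain = h3 - h2
margin = combined_margin(12, 12)

print(f"{h2:+.0f} {h3:+.0f} gain {gain:.0f} ± {margin:.0f}")
# prints: -18 +64 gain 82 ± 17
```

Because both matches were played against the same opponent under the same conditions, the gain is simply the difference of the two Elo results.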

Robert

Rolf · Post by **Rolf** » Fri Sep 28, 2012 12:20 pm

Could you please publish these 1920 games? Thanks

lucasart · Post by **lucasart** » Fri Sep 28, 2012 12:31 pm

carldaman wrote: Nice result, Robert. Also, it's ironic and interesting that the win % ≈ phi; I wonder if there is any significance.

Regards,
CL

You mean 1/phi?

MM · Post by MM » Fri Sep 28, 2012 12:53 pm

Hi Robert, thank you. I was aware of that match, and I'm glad to hear that the current Houdini 3 is about 10 Elo stronger than that. Don't you think it would be interesting to run some other quick matches (2'+2" is good) against other engines (e.g. Stockfish, Rybka 4.1)?

Best Regards

Uri Blass · Post by **Uri Blass** » Fri Sep 28, 2012 1:38 pm

I think that for a comparison between long and short time controls, it is better to use the same type of time control, the same positions, and the same opponents.

This is why I suggested 6'+2" (90/15+30/15) or 3'+1" (90/30+30/30).

Albert Silver · Post by **Albert Silver** » Fri Sep 28, 2012 1:55 pm

Uri Blass wrote:

Code:

90+1

Rank Name                       Elo      +      -    games   score   oppo.   draws 
   1 Komodo 4471.02 64 bit    3060.9   12.9   12.9    2530   57.9%  2990.0   35.4% 
   2 Komodo 4467.01 64 bit    3027.6    8.7    8.7    5300   54.4%  2990.0   42.0% 
   3 Houdini 1.5a x64         3025.2    7.2    7.2    7884   49.7%  3027.6   39.1% 
   4 Komodo 4468.00 64 bit    3024.8    8.7    8.7    5298   54.1%  2990.1   42.4% 
   5 Komodo 4471.01 64 bit    3021.6    8.6    8.6    5321   53.7%  2990.0   43.5% 
   6 Komodo 5 64 bit dev      3020.7    8.7    8.7    5313   53.7%  2990.1   43.1% 
   7 Critter 1.4 64-bit SSE4  3000.0    7.1    7.1    7957   46.7%  3027.6   44.4% 
   8 Stockfish 2.2.2 JA       2945.0    7.1    7.1    7921   40.4%  3027.6   42.4% 


120+2

Rank Name                       Elo      +      -    games   score   oppo.   draws 
   1 Komodo 4467.01 64 bit    3036.1    8.5    8.5    5594   55.0%  2992.1   43.9% 
   2 Houdini 1.5a x64         3030.6    6.0    6.0   11500   50.4%  3027.7   42.1% 
   3 Komodo 4463.00 64 bit    3029.7    6.1    6.1   10939   54.3%  2992.3   44.6% 
   4 Komodo 4466.02 64 bit    3027.1    7.5    7.5    7127   53.9%  2992.2   45.0% 
   5 Komodo 5 64 bit dev      3021.9    6.1    6.1   10906   53.4%  2992.2   44.4% 
   6 Critter 1.4 64-bit SSE4  3000.0    5.9    5.9   11530   46.8%  3027.7   45.5% 
   7 Stockfish 2.2.2 JA       2946.2    5.9    5.9   11536   40.7%  3027.7   45.9%

Based on the results, Komodo 4471.02 64 bit seems significantly stronger than the other versions, but for some reason you tested it only in the 90+1 list.

I wonder if it was really a big improvement relative to the other versions of Komodo, or whether there is some mistake in the data or some problem with the machine that tested it.

It's not, and the result is invalid. There was a bug in the tester, which produced this bogus result.
