Houdini 4 has been released

Houdini · Post by **Houdini** » Tue Nov 26, 2013 3:50 pm

lkaufman wrote: Are you saying that your results were about the same at 10" and at 120"? That would be surprising. If you don't mind, can you tell us your result specifically at 120" + 1.2", including the number of games played and the elo range of the opposition? I'm curious because so far the LS list reports +27 elo which probably differs from your +45 by more than the combined error, unless your 120" sample was rather small.
My own results for h4 against h3 are very good, but as we know these gains often shrink when testing against foreign opposition.
Anyway, you have clearly made a solid improvement over h3, and Houdini 4 is again going to be topping the rapid rating lists for now.

At 10"+0.1" I play 27,000 or 30,000 games (9 or 10 opponents x 3000).
At 120"+1.2" I play 3,600 or 4,000 games (9 or 10 opponents x 400).

The opponents for Houdini 4 range from Naum 4 (the weakest) to Houdini 3 (the strongest).
The gauntlets are run for every dev version of the engine, and in the end the evolution (hopefully progress) is documented.

I've been using this system since Houdini 1.5; in my current lists I have the following relative ratings, setting Houdini 1.03a at a conventional 3000 points:

> 10"+0.1":
Houdini 1.03a: 3000
Houdini 1.5: 3045
Houdini 2.0: 3070
Houdini 3: 3120
Houdini 4: 3165

> 120"+1.2"
Houdini 1.03a: 3000
Houdini 1.5: 3050
Houdini 2.0: 3080
Houdini 3: 3140
Houdini 4: 3185

As you can see the results are very similar at 10" and 120".
It also shows the progress in 3.5 years.
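For readers who want the version-to-version gains spelled out, here is a minimal sketch that just subtracts consecutive entries of the two lists above. The dictionary layout and the helper name `version_gains` are illustrative choices, not part of Robert's test setup.

```python
# Relative ratings copied from the two lists above (Houdini 1.03a fixed at 3000).
ratings = {
    '10"+0.1"': {
        "Houdini 1.03a": 3000, "Houdini 1.5": 3045, "Houdini 2.0": 3070,
        "Houdini 3": 3120, "Houdini 4": 3165,
    },
    '120"+1.2"': {
        "Houdini 1.03a": 3000, "Houdini 1.5": 3050, "Houdini 2.0": 3080,
        "Houdini 3": 3140, "Houdini 4": 3185,
    },
}


def version_gains(table):
    """Return (version, gain over the previous version) pairs in release order."""
    names = list(table)  # dicts keep insertion order, i.e. release order here
    return [(names[i], table[names[i]] - table[names[i - 1]])
            for i in range(1, len(names))]


for time_control, table in ratings.items():
    print(time_control, version_gains(table))
```

Both lists give the same Houdini 3 to Houdini 4 step, and the earlier steps differ by 5 to 10 points between the two time controls.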

Cheers,
Robert

lkaufman · Post by **lkaufman** » Tue Nov 26, 2013 4:06 pm

Houdini wrote:
lkaufman wrote: Are you saying that your results were about the same at 10" and at 120"? That would be surprising. If you don't mind, can you tell us your result specifically at 120" + 1.2", including the number of games played and the elo range of the opposition? I'm curious because so far the LS list reports +27 elo which probably differs from your +45 by more than the combined error, unless your 120" sample was rather small.
My own results for h4 against h3 are very good, but as we know these gains often shrink when testing against foreign opposition.
Anyway, you have clearly made a solid improvement over h3, and Houdini 4 is again going to be topping the rapid rating lists for now.
At 10"+0.1" I play 27,000 or 30,000 games (9 or 10 opponents x 3000).
At 120"+1.2" I play 3,600 or 4,000 games (9 or 10 opponents x 400).

The opponents for Houdini 4 range from Naum 4 (the weakest) to Houdini 3 (the strongest).
The gauntlets are run for every dev version of the engine, and in the end the evolution (hopefully progress) is documented.

I've been using this system since Houdini 1.5; in my current lists I have the following relative ratings, setting Houdini 1.03a at a conventional 3000 points:

> 10"+0.1":
Houdini 1.03a: 3000
Houdini 1.5: 3045
Houdini 2.0: 3070
Houdini 3: 3120
Houdini 4: 3165

> 120"+1.2"
Houdini 1.03a: 3000
Houdini 1.5: 3050
Houdini 2.0: 3080
Houdini 3: 3140
Houdini 4: 3185

As you can see the results are very similar at 10" and 120".
It also shows the progress in 3.5 years.

Cheers,
Robert

Thanks. Since your methodology seems to be very similar to the LS list, and since LS uses a time limit in between the two you use, can you offer any theory other than sample error for the discrepancy between your +45 figure and LS +27 figure (as last reported, subject to change of course)?

Milos · Post by **Milos** » Tue Nov 26, 2013 4:33 pm

lkaufman wrote: Thanks. Since your methodology seems to be very similar to the LS list, and since LS uses a time limit in between the two you use, can you offer any theory other than sample error for the discrepancy between your +45 figure and LS +27 figure (as last reported, subject to change of course)?

It is very easy to explain. RH has +45 (without TBs) with 1SD of approximately 7 Elo, SP has +27 with 1SD of approximately 10 Elo. So it is quite probable that the "real" improvement is 37 Elo, which is roughly in the middle.
In addition to that, SP has a higher average rating of opponents, which compresses the ratings more.

Houdini · Post by **Houdini** » Tue Nov 26, 2013 5:02 pm

lkaufman wrote: Thanks. Since your methodology seems to be very similar to the LS list, and since LS uses a time limit in between the two you use, can you offer any theory other than sample error for the discrepancy between your +45 figure and LS +27 figure (as last reported, subject to change of course)?

I have no idea.
The time that I wondered about rating list results is long gone. One version gets 10 or 15 points too much, the next version gets 10 or 15 points too few... in the long run it more or less evens out.

All the best,
Robert

A Distel · Post by **A Distel** » Tue Nov 26, 2013 6:01 pm

Houdini wrote: Houdini 4 has been released today.
More information on our web page http://www.cruxis.com/chess/houdini.htm . Make sure to refresh the browser cache using F5 or Ctrl-F5 to get the most recent page.

Thank you for reading,
Robert

Can we expect a 'ChessBase' version of 'Houdini 4 (DVD)'?

lkaufman · Post by **lkaufman** » Tue Nov 26, 2013 7:13 pm

Houdini wrote:
lkaufman wrote: After 500 direct bullet (30" + 0.3") games against H3 I'm showing +48 elo, close enough to the claimed 50, but normally elo gains diminish with increased time limit and also against unrelated engines, so I'll "predict" that the real gain (say on the CEGT 5' +3" list, which is similar to IPON) will be around 30 elo. We'll see.
In my tests at 10"+0.1" and 120"+1.2" against 9 opponents Houdini 4 without table bases is about 45 Elo better than Houdini 3.
The Syzygy 6-men add another 5 to 10 Elo in my tests at 60"+1" time control, which explains the official number of "50 Elo" for the release.

How much this will produce in rating lists is always the big surprise. Since the time management of Houdini 4 has changed as well, I'm not even trying to predict these numbers with a precision better than 20 Elo...

Cheers,
Robert

One more question: When you say that the Syzygy 6 men tb adds five to ten elo, that appears to mean compared to no TB. But shouldn't the proper comparison be with the best TB supported by Houdini 3? Or are you saying that Syzygy is that much better than any other supported TB?
By the way I'm now running a recent Komodo version (which was already tested vs H3) against H4 to measure the improvement against an unrelated opponent at 1' +.5". So far it is showing roughly midway between the 27 LS figure so far and your 45 figure. I'll report fully at the end, when I should have 4000 games.

lkaufman · Post by **lkaufman** » Tue Nov 26, 2013 7:25 pm

Milos wrote:
lkaufman wrote: Thanks. Since your methodology seems to be very similar to the LS list, and since LS uses a time limit in between the two you use, can you offer any theory other than sample error for the discrepancy between your +45 figure and LS +27 figure (as last reported, subject to change of course)?
It is very easy to explain. RH has +45 (without TBs) with 1SD of approximately 7 Elo, SP has +27 with 1SD of approximately 10 Elo. So it is quite probable that the "real" improvement is 37 Elo, which is roughly in the middle.
In addition to that, SP has a higher average rating of opponents, which compresses the ratings more.

The numbers you mention for 1SD look about right for 2SD to me, assuming something close to half draws. For 2800 games with half draws and a 27 elo gap I get a margin of error of 9.35, which might be about ten with somewhat less than half draws. These are two SD values, not 1 SD. I think you made some simple error. The combined margin of error is much less than the sum of the two errors; for 10 and 7 it is about 12.1, way below 17 or 18.
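The arithmetic behind those figures can be sketched as follows. This is an illustration under stated assumptions, not Larry's actual calculation: games are treated as independent, the logistic Elo curve is used, and the function names are made up for the example. With 2800 games, half draws and a 27 Elo gap it gives a bit over 9 Elo at two SD, close to the 9.35 quoted, and combining 10 and 7 in quadrature gives about 12 rather than 17.

```python
import math


def expected_score(elo_diff):
    """Expected score for a rating advantage of elo_diff (logistic Elo curve)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_sd(games, elo_diff, draw_rate):
    """One-SD error, in Elo, of a rating gap measured over `games` games.

    Assumes independent games scored win = 1, draw = 0.5, loss = 0.
    """
    score = expected_score(elo_diff)
    win_rate = score - draw_rate / 2.0
    # Per-game variance of the result: E[x^2] - E[x]^2.
    variance = win_rate + 0.25 * draw_rate - score ** 2
    score_se = math.sqrt(variance / games)
    # Slope of the Elo curve converts an error in score into an error in Elo.
    elo_per_score = 400.0 / (math.log(10) * score * (1.0 - score))
    return score_se * elo_per_score


def combined_error(a, b):
    """Error of a difference of two independent estimates (quadrature sum)."""
    return math.hypot(a, b)


one_sd = elo_sd(games=2800, elo_diff=27, draw_rate=0.5)
print(f"1 SD = {one_sd:.2f} Elo, 2 SD = {2 * one_sd:.2f} Elo")
print(f"combined error of 10 and 7 = {combined_error(10, 7):.1f} Elo")
```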
I think your 37 estimate might be about right, but it is not reasonable to consider 27 and 45 elo with the given numbers of games to be just sample error. There should be some other factor.

Larry

Milos · Post by **Milos** » Tue Nov 26, 2013 9:01 pm

lkaufman wrote: The numbers you mention for 1SD look about right for 2SD to me, assuming something close to half draws. For 2800 games with half draws and a 27 elo gap I get a margin of error of 9.35, which might be about ten with somewhat less than half draws. These are two SD values, not 1 SD. I think you made some simple error. The combined margin of error is much less than the sum of the two errors; for 10 and 7 it is about 12.1, way below 17 or 18.
I think your 37 estimate might be about right, but it is not reasonable to consider 27 and 45 elo with the given numbers of games to be just sample error. There should be some other factor.

Larry

For 2800 games, assuming draw and win rates slightly better than H3's (which has 44% and 62% respectively), i.e. 41% and 66%, 1SD between 2 opponents would be 0.66%. Since there are many opponents here, the SD is larger and you have to multiply it by at least sqrt(2), which gives 6.5 Elo. On RH's side 1SD is about 5.8 Elo. Combined, this gives 8.7 Elo for 1SD. 2SD is then around 17 Elo, which is already the difference between the two results.
In addition to that, the LS list uses stronger opponents (by 60-80 Elo on average), which translates into 5-10 Elo of rating compression.
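The chain of numbers above can be reproduced with a short sketch. Two things here are assumptions rather than anything stated in the post: the conversion of roughly 7 Elo per percent of score (the slope of the Elo curve near a 50% score), which is one way to arrive at the 6.5 Elo figure, and the variable names. The sqrt(2) factor and the 5.8 Elo value for RH's list are taken as given from the post.

```python
import math


def score_sd(games, score, draw_rate):
    """One-SD error of the score fraction over `games` independent games."""
    win_rate = score - draw_rate / 2.0
    variance = win_rate + 0.25 * draw_rate - score ** 2
    return math.sqrt(variance / games)


# 2800 games, 41% draws, 66% score, as in the post.
sd_percent = 100.0 * score_sd(2800, 0.66, 0.41)

# ASSUMPTION: Elo per percent of score taken at a 50% score (about 6.9).
ELO_PER_PERCENT = 400.0 / (math.log(10) * 0.25) / 100.0

# Factor sqrt(2) for the many-opponent gauntlet, as claimed in the post.
ls_sd_elo = sd_percent * math.sqrt(2) * ELO_PER_PERCENT
rh_sd_elo = 5.8  # value quoted in the post for RH's list

combined_sd = math.hypot(ls_sd_elo, rh_sd_elo)
print(f"1 SD of score: {sd_percent:.2f}%")
print(f"LS list 1 SD: {ls_sd_elo:.1f} Elo")
print(f"combined 1 SD: {combined_sd:.1f} Elo, 2 SD: {2 * combined_sd:.1f} Elo")
```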

ernest · Post by **ernest** » Tue Nov 26, 2013 9:09 pm

Hi Robert,

Any possibility for a 32-bit version ???

lkaufman · Post by **lkaufman** » Tue Nov 26, 2013 9:25 pm

Milos wrote:
lkaufman wrote: The numbers you mention for 1SD look about right for 2SD to me, assuming something close to half draws. For 2800 games with half draws and a 27 elo gap I get a margin of error of 9.35, which might be about ten with somewhat less than half draws. These are two SD values, not 1 SD. I think you made some simple error. The combined margin of error is much less than the sum of the two errors; for 10 and 7 it is about 12.1, way below 17 or 18.
I think your 37 estimate might be about right, but it is not reasonable to consider 27 and 45 elo with the given numbers of games to be just sample error. There should be some other factor.

Larry
For 2800 games, assuming draw and win rates slightly better than H3's (which has 44% and 62% respectively), i.e. 41% and 66%, 1SD between 2 opponents would be 0.66%. Since there are many opponents here, the SD is larger and you have to multiply it by at least sqrt(2), which gives 6.5 Elo. On RH's side 1SD is about 5.8 Elo. Combined, this gives 8.7 Elo for 1SD. 2SD is then around 17 Elo, which is already the difference between the two results.
In addition to that, the LS list uses stronger opponents (by 60-80 Elo on average), which translates into 5-10 Elo of rating compression.

So the 18 elo gap is just one elo more than the margin of error, so certainly it is possible. But I don't understand your second point. Longer time limits mean rating compression, but the strength of the opposition should not affect properly calculated ratings in general, unless one uses the broken "elostat" which wrongly averages ratings. Why do you claim that stronger opponents make for rating compression in general (not specifically for Houdini)?
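To make the point about averaging ratings concrete, here is a toy illustration. It is not EloStat's actual code, and the player and opponent ratings are hypothetical values chosen for the example. It scores a player exactly as the Elo curve predicts against two opponents, then computes a performance rating from the averaged opponent rating, which comes out below the player's true rating.

```python
import math


def expected_score(elo_diff):
    """Expected score for a rating advantage of elo_diff (logistic Elo curve)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score):
    """Inverse of expected_score: rating advantage implied by a score."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# HYPOTHETICAL ratings, chosen only to make the effect visible.
true_rating = 3000
opponents = [2600, 3000]

# The player scores exactly what the Elo curve predicts against each opponent.
mean_score = sum(expected_score(true_rating - r) for r in opponents) / len(opponents)

# Averaging method: one performance rating from the mean opponent rating.
mean_opponent = sum(opponents) / len(opponents)
performance = mean_opponent + elo_from_score(mean_score)

print(f"mean score {mean_score:.3f}, performance from averaged ratings {performance:.0f}")
```

Because the Elo curve is not linear, the averaged-rating performance falls short of 3000 even though every individual result matched expectation, so the size of the shortfall depends on how the opposition is spread.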
