Houdini 4 has been released

Houdini · Post by **Houdini** » Tue Nov 26, 2013 12:35 pm

Modern Times wrote:It is fair to say also, most ratings lists would not be using 6-men bases. I don't have enough SSD space for Syzygy 6-men, only 5-men on SSD, so that is what I will use. I have 6-men on hard disk just for analysis purposes.

Most of my 6-men Syzygy tests have been made with a normal hard disk, not with a SSD.
With the 6-men on a normal hard disk I find the strength improvement of 5 to 10 Elo I reported above for 60"+1" games, played between Houdini 4 with and without table bases.

Cheers,
Robert

shrapnel · Post by **shrapnel** » Tue Nov 26, 2013 2:06 pm

Houdini wrote:
Modern Times wrote:It is fair to say also, most ratings lists would not be using 6-men bases. I don't have enough SSD space for Syzygy 6-men, only 5-men on SSD, so that is what I will use. I have 6-men on hard disk just for analysis purposes.
Most of my 6-men Syzygy tests have been made with a normal hard disk, not with a SSD.
With the 6-men on a normal hard disk I find the strength improvement of 5 to 10 Elo I reported above for 60"+1" games, played between Houdini 4 with and without table bases.

Cheers,
Robert

Hi Mr Houdart
I already have the 6-men Nalimov EGTBs on a fast SSD Disk.
Do I really need to bother with the 6-men Syzygy? It will be tiresome downloading the 6-men Syzygy, but I'll do it if you think it will give a significant performance boost to H4.

lkaufman · Post by **lkaufman** » Tue Nov 26, 2013 2:23 pm

pohl4711 wrote:
lkaufman wrote:
gerold wrote:
Dr.Wael Deeb wrote:The price of the new Houdini is reasonable considering the improvements advertised on its web page....
Dr.D

P.S. I know that I'll be executed by Kim for this comment of mine

Hi Doc.
My guess is Houdini 4 is 70 Elo stronger than the 2nd place engine.

Best,
Gerold.
I suppose you are talking about bullet chess and only official releases. Anyway H4 did end up beating h3 by 54 elo (+351 -197 =452) in my thousand game bullet match (30" +.3"), and is currently about 50 elo ahead of h3 at a slower 2' + 1.2" after about a thousand games (I'll actually run 4000 as I have 4 thousand game matches running on two monster computers). Allowing for some dropoff with longer time and when playing foreign opponents, I expect about +40 over h3 in the "new IPON" meaning the 5' + 3" CEGT list, which I think is overall the most meaningful list now considering both quantity of games and relevance of the time limit. This would put h4 about fifty above Komodo 6 or about 30 above our current version at this time limit. If so we're probably already equal with h4 at serious tournament levels but we have some work to do to catch h4 at blitz levels.
Most meaningful is the ratinglist with the highest quantity of games and with the highest number of strong opponents, when a strong engine like Houdini 4 is tested. And that is the LS-ratinglist with all the strong Ippolit-derivatives and Amitis, the Stockfish-derivative.
And the intermediate result after 2150 of 10000 games is only +30 Elo for Houdini 4 (compared with the final result of Houdini 3 in the LS-top10-tournament), although the LS-tests are done with bullet-speed, which is not of high "relevance"...
Final (and meaningful) result on Thursday on the LS-website...

Stefan

In general I approve of your methods and list, and I consider it an excellent way of measuring progress for any given engine. However it seems to me that including multiple Ippo derivatives (and now multiple Stockfish versions or derivatives) is pretty much like just playing more games against Ippo and Stockfish. Personally I consider closeness in rating to the engine being tested to be more important than variety of opposition, so I'm not suggesting you change anything. Which list is best depends on how people use the engine. For those who like to compete with other engines at bullet speed your list is clearly the best. For those who use it to review their games or for live analysis of tournament games the CEGT 5' + 3" list is more relevant. For analysis the CCRL 40/40 list is the best one that is not hopelessly out of date, along with the TCEC rating list which of course suffers from small samples.
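The bullet match figure quoted above (+351 -197 =452, reported as 54 Elo) can be checked with the usual logistic Elo formula. The following is a minimal sketch, not anyone's actual testing tool: the function names `elo_from_score` and `match_elo` are made up for illustration, and the error estimate assumes independent games and uses a simple delta-method approximation.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_elo(wins, losses, draws):
    """Return (Elo difference, approximate 1-SD error in Elo) for one match.

    Assumption: games are independent, which is only roughly true in practice.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result, with win=1, draw=0.5, loss=0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / games
    se_score = math.sqrt(var / games)
    # Delta method: slope of the Elo curve at the observed score.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return elo_from_score(score), slope * se_score

elo, sd = match_elo(wins=351, losses=197, draws=452)
print(f"{elo:+.0f} Elo, 1 SD about {sd:.0f} Elo")
```

With these numbers the score is 57.7%, which works out to about 54 Elo, matching the reported figure, with a one-standard-deviation uncertainty of roughly 8 Elo for a thousand games.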

Modern Times · Post by **Modern Times** » Tue Nov 26, 2013 2:29 pm

Houdini wrote: Most of my 6-men Syzygy tests have been made with a normal hard disk, not with a SSD.
With the 6-men on a normal hard disk I find the strength improvement of 5 to 10 Elo I reported above for 60"+1" games, played between Houdini 4 with and without table bases.

Cheers,
Robert

Righto then, I'll use 6-men! Chess960 tests will start later today.

lkaufman · Post by **lkaufman** » Tue Nov 26, 2013 2:31 pm

Houdini wrote:
lkaufman wrote: After 500 direct bullet (30" + 0.3") games against H3 I'm showing +48 elo, close enough to the claimed 50, but normally elo gains diminish with increased time limit and also against unrelated engines, so I'll "predict" that the real gain (say on the CEGT 5' +3" list, which is similar to IPON) will be around 30 elo. We'll see.
In my tests at 10"+0.1" and 120"+1.2" against 9 opponents Houdini 4 without table bases is about 45 Elo better than Houdini 3.
The Syzygy 6-men add another 5 to 10 Elo in my tests at 60"+1" time control, which explains the official number of "50 Elo" for the release.

How much this will produce in rating lists is always the big surprise, inasmuch as the time management of Houdini 4 has changed as well I'm not even trying to predict these numbers with a precision better than 20 Elo...

Cheers,
Robert

Are you saying that your results were about the same at 10" and at 120"? That would be surprising. If you don't mind, can you tell us your result specifically at 120" + 1.2", including the number of games played and the elo range of the opposition? I'm curious because so far the LS list reports +27 elo which probably differs from your +45 by more than the combined error, unless your 120" sample was rather small.
My own results for h4 against h3 are very good, but as we know these gains often shrink when testing against foreign opposition.
Anyway, you have clearly made a solid improvement over h3, and Houdini 4 is again going to be topping the rapid rating lists for now.

Houdini · Post by **Houdini** » Tue Nov 26, 2013 3:50 pm

lkaufman wrote: Are you saying that your results were about the same at 10" and at 120"? That would be surprising. If you don't mind, can you tell us your result specifically at 120" + 1.2", including the number of games played and the elo range of the opposition? I'm curious because so far the LS list reports +27 elo which probably differs from your +45 by more than the combined error, unless your 120" sample was rather small.
My own results for h4 against h3 are very good, but as we know these gains often shrink when testing against foreign opposition.
Anyway, you have clearly made a solid improvement over h3, and Houdini 4 is again going to be topping the rapid rating lists for now.

At 10"+0.1" I play 27,000 or 30,000 games (9 or 10 opponents x 3000).
At 120"+1.2" I play 3,600 or 4,000 games (9 or 10 opponents x 400).

The opponents for Houdini 4 range from Naum 4 (the weakest) to Houdini 3 (the strongest).
The gauntlets are run for every dev version of the engine, and at the end the evolution (hopefully progress) is documented.

I've been using this system since Houdini 1.5; in my current lists I have the following relative ratings, setting Houdini 1.03a at a conventional 3000 points:

> 10"+0.1":
Houdini 1.03a: 3000
Houdini 1.5: 3045
Houdini 2.0: 3070
Houdini 3: 3120
Houdini 4: 3165

> 120"+1.2"
Houdini 1.03a: 3000
Houdini 1.5: 3050
Houdini 2.0: 3080
Houdini 3: 3140
Houdini 4: 3185

As you can see the results are very similar at 10" and 120".
It also shows the progress in 3.5 years.

Cheers,
Robert
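The version-to-version gains implied by the two lists above are simple differences. A small sketch, with the ratings copied from the post and a hypothetical helper name `gains`:

```python
# Relative ratings from the post, with Houdini 1.03a fixed at 3000.
ratings = {
    '10"+0.1"': {
        "Houdini 1.03a": 3000, "Houdini 1.5": 3045, "Houdini 2.0": 3070,
        "Houdini 3": 3120, "Houdini 4": 3165,
    },
    '120"+1.2"': {
        "Houdini 1.03a": 3000, "Houdini 1.5": 3050, "Houdini 2.0": 3080,
        "Houdini 3": 3140, "Houdini 4": 3185,
    },
}

def gains(table):
    """Rating gain of each version over the previous one, in list order."""
    names = list(table)
    return {f"{a} -> {b}": table[b] - table[a] for a, b in zip(names, names[1:])}

for time_control, table in ratings.items():
    print(time_control)
    for step, gain in gains(table).items():
        print(f"  {step}: +{gain}")
```

The last step, Houdini 3 to Houdini 4, is +45 at both time controls, which is the similarity the post points to.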

lkaufman · Post by **lkaufman** » Tue Nov 26, 2013 4:06 pm

Houdini wrote:
lkaufman wrote: Are you saying that your results were about the same at 10" and at 120"? That would be surprising. If you don't mind, can you tell us your result specifically at 120" + 1.2", including the number of games played and the elo range of the opposition? I'm curious because so far the LS list reports +27 elo which probably differs from your +45 by more than the combined error, unless your 120" sample was rather small.
My own results for h4 against h3 are very good, but as we know these gains often shrink when testing against foreign opposition.
Anyway, you have clearly made a solid improvement over h3, and Houdini 4 is again going to be topping the rapid rating lists for now.
At 10"+0.1" I play 27,000 or 30,000 games (9 or 10 opponents x 3000).
At 120"+1.2" I play 3,600 or 4,000 games (9 or 10 opponents x 400).

The opponents for Houdini 4 range from Naum 4 (the weakest) to Houdini 3 (the strongest).
The gauntlets are run for every dev version of the engine, and at the end the evolution (hopefully progress) is documented.

I've been using this system since Houdini 1.5; in my current lists I have the following relative ratings, setting Houdini 1.03a at a conventional 3000 points:

> 10"+0.1":
Houdini 1.03a: 3000
Houdini 1.5: 3045
Houdini 2.0: 3070
Houdini 3: 3120
Houdini 4: 3165

> 120"+1.2"
Houdini 1.03a: 3000
Houdini 1.5: 3050
Houdini 2.0: 3080
Houdini 3: 3140
Houdini 4: 3185

As you can see the results are very similar at 10" and 120".
It also shows the progress in 3.5 years.

Cheers,
Robert

Thanks. Since your methodology seems to be very similar to the LS list, and since LS uses a time limit in between the two you use, can you offer any theory other than sample error for the discrepancy between your +45 figure and LS +27 figure (as last reported, subject to change of course)?

Milos · Post by **Milos** » Tue Nov 26, 2013 4:33 pm

lkaufman wrote: Thanks. Since your methodology seems to be very similar to the LS list, and since LS uses a time limit in between the two you use, can you offer any theory other than sample error for the discrepancy between your +45 figure and LS +27 figure (as last reported, subject to change of course)?

It is very easy to explain. RH has +45 (without TBs) with 1 SD of approximately 7 Elo, SP has +27 with 1 SD of approximately 10 Elo. So it is quite probable that the "real" improvement is about 37 Elo, which is roughly in the middle.
In addition, SP has a higher average rating of opponents, which compresses the ratings more.
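The comparison above can be put into numbers. This is only a rough sketch: it assumes the two measurements are independent and approximately normal, takes the standard deviations of 7 and 10 Elo as given in the post, and ignores the differences in opponents and time control. The function name `compare` is made up for illustration.

```python
import math

def compare(est_a, sd_a, est_b, sd_b):
    """Compare two independent estimates of the same quantity.

    Returns (z, pooled, pooled_sd): the gap measured in standard deviations
    of the difference, plus the inverse-variance weighted mean and its SD.
    """
    sd_diff = math.sqrt(sd_a ** 2 + sd_b ** 2)
    z = (est_a - est_b) / sd_diff
    w_a, w_b = 1.0 / sd_a ** 2, 1.0 / sd_b ** 2
    pooled = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    pooled_sd = math.sqrt(1.0 / (w_a + w_b))
    return z, pooled, pooled_sd

z, pooled, pooled_sd = compare(45, 7, 27, 10)
print(f"gap = {z:.2f} SD, pooled estimate = {pooled:.1f} +/- {pooled_sd:.1f} Elo")
```

Under these assumptions the 18 Elo gap is about 1.5 standard deviations of the difference, so it is not strongly significant, and the weighted estimate comes out near 39 Elo, in the same region as the 37 Elo suggested above.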

Houdini · Post by **Houdini** » Tue Nov 26, 2013 5:02 pm

lkaufman wrote: Thanks. Since your methodology seems to be very similar to the LS list, and since LS uses a time limit in between the two you use, can you offer any theory other than sample error for the discrepancy between your +45 figure and LS +27 figure (as last reported, subject to change of course)?

I have no idea.
The time that I wondered about rating list results is long gone. One version gets 10 or 15 points too much, the next version gets 10 or 15 points too few... in the long run it more or less evens out.

All the best,
Robert

A Distel · Post by **A Distel** » Tue Nov 26, 2013 6:01 pm

Houdini wrote:Houdini 4 has been released today.
More information on our web page http://www.cruxis.com/chess/houdini.htm . Make sure to refresh the browser cache using F5 or Ctrl-F5 to get the most recent page.

Thank you for reading,
Robert

Can we expect a 'ChessBase' version of 'Houdini 4 (DVD)'?
