CEGT 40/20 update including DeepSjeng2.5+ChessTiger2007.1

Wolfgang · Post by **Wolfgang** » Mon Apr 30, 2007 9:57 am

mclane wrote:.....
i wonder how those different results happen.

Come on Thorsten, how long have you been testing chess computers and engines? 20 years or 20 days?

different conditions, time controls, books, number of games etc. etc.

Look at the CCRL list. CT2007.1 has played only 34 games so far (error bar +/- 100) and is about 130 points (!) behind CT2007. Even though very few games have been played so far, I can't believe that it will be significantly ahead of the older version after (many) more games.
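To make the "error bar +/- 100" after 34 games concrete, here is a rough sketch of how such a margin can be estimated from a win/draw/loss record. The W/D/L figures below are invented for illustration (they are not CCRL or CEGT data), and the normal approximation on the score is an assumption; the rating lists use their own tools, which may compute the interval differently.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an expected score (logistic model)."""
    # Clamp so the logarithm stays defined for 0% or 100% scores.
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, lower, upper) using a normal approximation on the score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    half_width = z * math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - half_width),
            elo_from_score(score + half_width))

# Hypothetical records: the same 50% score after 34 and after 340 games.
for w, d, l in [(12, 10, 12), (120, 100, 120)]:
    elo, lo, hi = elo_with_margin(w, d, l)
    print(f"{w + d + l:4d} games: {elo:+.0f} Elo, 95% interval {lo:+.0f} .. {hi:+.0f}")
```

Under these assumptions the 34-game interval should come out at roughly +/- 100 Elo and the 340-game one at roughly +/- 30, which is why a 130-point gap after 34 games says very little yet.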

i do mainly test in ARENA.

That's my favourite one too. But CT also runs well under others; I also tested with Shredder Classic and ERT and did not face problems.

since the name of the engine-executable of tiger2007 is similar to that of tiger2007.1 it could be possible that someone tests e.g. the old version but believes he is testing the new version.

Maybe for "someone", but not for experienced testers like Werner. We all know the problem with identical names, but it is quite easy to check which version is playing (double-click the .exe, type "uci" + Enter).
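The manual check described here (start the .exe, type "uci", press Enter) can also be scripted. A minimal sketch, assuming the engine speaks UCI and answers with an `id name ...` line before `uciok`; the function names are made up for this example, and some engines may need a short wait before `quit` is sent.

```python
import subprocess

def parse_engine_name(output_lines):
    """Return the text after 'id name' from a UCI engine's reply, or None."""
    for line in output_lines:
        line = line.strip()
        if line.startswith("id name "):
            return line[len("id name "):]
    return None

def query_engine_name(command, timeout=10):
    """Start the engine, send 'uci' and 'quit', and read back its 'id name' line."""
    result = subprocess.run(
        command,
        input="uci\nquit\n",
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return parse_engine_name(result.stdout.splitlines())

# Parsing demo on a made-up reply (not real engine output):
sample = ["id name ExampleEngine 1.0", "id author Someone", "uciok"]
print(parse_engine_name(sample))  # prints "ExampleEngine 1.0"
```

With a real engine one would call something like `query_engine_name([r"C:\Engines\engine.exe"])` (a hypothetical path) for each executable and compare the reported names, which catches the case of two versions sharing a file name.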

as you maybe know, the CEGT list also has problems differentiating between the different Loop versions.
so CT is not the only case where the list cannot tell different versions apart. i wonder why.
there is another strange thing in the CEGT list: shredder10.1 is behind.
so we have in fact 3 cases where something very strange is going on. i wonder if there is a hidden parameter in the testing methods that produces wrong data or a wrong interpretation.

As Werner just wrote, your posting is quite unfair, because you know very well that things like this can happen (they need not, but they can). It is ridiculous to speak about a "hidden parameter in testing". Our testing methods are well known and have not changed. Even if there were such a "hidden" thing (how could I do that?), why does it affect only three engines and not 300?

Wolfgang

Graham Banks · Post by **Graham Banks** » Mon Apr 30, 2007 10:14 am

Hi Wolfgang,

I'd like to back Werner and yourself up here.
I think that Thorsten is being too harsh on all testers and testing groups whose results don't compare with his.

Regards, Graham.

Wolfgang · Post by **Wolfgang** » Mon Apr 30, 2007 10:37 am

Graham Banks wrote:Hi Wolfgang,

I'd like to back Werner and yourself up here.
I think that Thorsten is being too harsh on all testers and testing groups whose results don't compare with his.

Regards, Graham.

thanks Graham

btw, do you (not you personally, but CCRL) plan to play further games with Tiger 2007.1?

Shaun · Post by **Shaun** » Mon Apr 30, 2007 10:40 am

Werner,

Thanks again for posting your latest update.

I certainly watch your results with interest - I find it useful to compare the CEGT and our CCRL results - particularly when results are unexpected.

The existence of multiple testing groups/rating lists is vital for accuracy: it helps prevent incorrect ratings, because significant differences get investigated and errors found.

With regard to Chess Tiger 2007.1, it looks to me like this is an improvement at fast time controls, but as the time control gets slower the results get relatively worse - obviously more games are required to prove this.

(In a way I hope this is proved as I fear that sometimes good changes may be thrown out because they fail at blitz and a change in testing strategy may prevent this).

When testing Toga versions for Thomas I ran 40/4 then 40/12 and finally 40/40 - the 40/12 test was to hopefully identify trends to save 40/40 testing - more investigation is required regarding testing methodology to validate theory against results.

And could even faster time controls be used to spot trends???

Anyway keep up the good work

Shaun

Graham Banks · Post by **Graham Banks** » Mon Apr 30, 2007 10:44 am

Wolfgang wrote: btw, do you (not you personally, but CCRL) plan to play further games with Tiger 2007.1?

Yes we intend to test Chess Tiger 2007.1 as thoroughly as we tested Chess Tiger 2007.
Hopefully its results will pick up!

Regards, Graham.

Shaun · Post by **Shaun** » Mon Apr 30, 2007 10:46 am

Wolfgang,

I will be running additional games with CT 2007.1 at all CCRL time controls.

One interesting result (to me at least) will be 40/12. I currently have a CT 2007 gauntlet running (several hundred games on a single PC; I think it has been running for 4 weeks or so now). Once this is finished I will upgrade to CT 2007.1 and re-run it, the only difference being the new CT2007.1 engine. This gives an ideal comparison, as everything else is identical. Unfortunately I won't have the results for about a month.

Time has been critical recently, so I am running a lot of longer tournaments to save admin.

Shaun

Wolfgang · Post by **Wolfgang** » Mon Apr 30, 2007 10:58 am

Shaun wrote:Wolfgang,

I will be running additional games with CT 2007.1 at all CCRL time controls.

One interesting result (to me at least) will be 40/12. I currently have a CT 2007 gauntlet running (several hundred games on a single PC; I think it has been running for 4 weeks or so now). Once this is finished I will upgrade to CT 2007.1 and re-run it, the only difference being the new CT2007.1 engine. This gives an ideal comparison, as everything else is identical. Unfortunately I won't have the results for about a month.

Time has been critical recently, so I am running a lot of longer tournaments to save admin.

Shaun

Hi Shaun,

thanks for the info. After my blitz-test with new DeepSjeng 2.5 is finished, I will start one with CT2007.1, maybe there is more improvement with shorter time controls?!

Take your time for the tests, it is no problem to wait some weeks. As you say, time becomes more and more critical for nearly all of us.

mclane · Post by **mclane** » Mon Apr 30, 2007 12:08 pm

i have started ct2007.1 versus toga1.2.1a.
same time control.

so far:


   Engine                         Points   Ch  To    S-B
1: ChessTiger2007.1 UCI           2.5/3   ··· 1=1    1.25
2: TogaII1.2.1a [performance.bin] 0.5/3   0=0 ···    1.25

3 of 50 games played
Tournament name: 2007.1 match toga
Location/Country: SCW, Germany
Level: Tournament 40/20
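For readers wondering why both engines show the same S-B value despite the lopsided score: a short sketch, assuming the S-B column is the standard Sonneborn-Berger score (each game result multiplied by the opponent's total points) and that the result strings use `1`, `=` and `0`. How Arena computes the column internally is not stated in the thread, so treat this as an assumption that happens to reproduce the figures.

```python
def parse_results(cell):
    """Convert a crosstable cell such as '1=1' into a list of game points."""
    points = {"1": 1.0, "=": 0.5, "0": 0.0}
    return [points[ch] for ch in cell]

def sonneborn_berger(crosstable):
    """crosstable[a][b] is a's result string against b; returns {name: (score, sb)}."""
    totals = {
        name: sum(sum(parse_results(cell)) for cell in row.values())
        for name, row in crosstable.items()
    }
    table = {}
    for name, row in crosstable.items():
        # Each game counts its result times the opponent's total score.
        sb = sum(sum(parse_results(cell)) * totals[opp] for opp, cell in row.items())
        table[name] = (totals[name], sb)
    return table

# The three games from the crosstable in this post.
crosstable = {
    "ChessTiger2007.1": {"TogaII1.2.1a": "1=1"},
    "TogaII1.2.1a": {"ChessTiger2007.1": "0=0"},
}
for name, (score, sb) in sonneborn_berger(crosstable).items():
    print(f"{name}: score {score}, S-B {sb}")
```

In a two-engine match both S-B values are simply the product of the two scores (2.5 x 0.5 = 1.25), so the column carries no extra information here; it only becomes useful as a tie-break in tournaments with more participants.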

Tony Thomas · Post by **Tony Thomas** » Mon Apr 30, 2007 12:10 pm

Graham Banks wrote:Hi Wolfgang,

I'd like to back Werner and yourself up here.
I think that Thorsten is being too harsh on all testers and testing groups whose results don't compare with his.

Regards, Graham.

Thorsten doesn't seem to understand anything about statistics or its relevance in chess. He seems to think that we can repeat the same results in chess every day, regardless of the location, computer, book and other such things. I wonder why the highest-rated human player doesn't win all the tournaments? Maybe it is his tournament that has problems. I don't even understand why he would rant about Shredder 10.1, as it only addresses compatibility problems, and Stephan said nothing about any improvements in strength.

mclane · Post by **mclane** » Mon Apr 30, 2007 12:13 pm

where am i harsh ? please quote the word or sentence that indicated this.

when i have 3 programs in a rating list,

loop versions.
shredder versions
ct2007 versions.

and i get with all 3 versions strange results i have to ask myself
WHY i do get those results.

here on all my machines tiger2007.1 is doing much better than 2007.
