Glaurung 2.2

IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Glaurung 2.2

Post by IWB »

Hello Tord and others,
Tord Romstad wrote: Hi all,

...

When I tested the most recent development version before the tournament, it performed poorly, and I ended up using version 2.1 in the tournament. Evidently, I didn't test thoroughly enough, because recent tests indicate that the version I prepared for the OPCCC is considerably stronger than 2.1.

...

Tord
Since this thread is still here, I'll add some of my test results, even though this is not the tournament part of the forum. Sorry if anyone is upset by this, but I am trying to keep things together.

Glaurung 2.2 JA 1T        : 2536  1000 (+257,=370,-373), 44.2 %

LoopMP 12.32                  : 100 (+ 32,= 36,- 32), 50.0 %
Deep Shredder 10 x64 1T       : 100 (+ 32,= 36,- 32), 50.0 %
Spike 1.2 Turin               : 100 (+ 44,= 36,- 20), 62.0 %
Fruit 05/11/03                : 100 (+ 31,= 43,- 26), 52.5 %
Toga II 1.4 beta5c BB         : 100 (+ 29,= 41,- 30), 49.5 %
Rybka 2.2n2 mp 1T             : 100 (+  9,= 31,- 60), 24.5 %
Shredder Bonn 1T              : 100 (+ 21,= 26,- 53), 34.0 %
DSjeng WC2008 x64 1T          : 100 (+ 26,= 43,- 31), 47.5 %
Naum 4                        : 100 (+ 13,= 32,- 55), 29.0 %
H12.1 MP 1T                   : 100 (+ 20,= 46,- 34), 43.0 %

Glaurung 2.1 1T           : 2508  800 (+212,=273,-315), 43.6 %

LoopMP 12.32                  : 100 (+ 27,= 35,- 38), 44.5 %
Deep Shredder 10 x64 1T       : 100 (+ 33,= 32,- 35), 49.0 %
Spike 1.2 Turin               : 100 (+ 39,= 35,- 26), 56.5 %
Fruit 05/11/03                : 100 (+ 27,= 36,- 37), 45.0 %
Naum 3.1 1T                   : 100 (+ 21,= 37,- 42), 39.5 %
Toga II 1.4 beta5c BB         : 100 (+ 24,= 34,- 42), 41.0 %
Deep Shredder 11 x64 1T       : 100 (+ 22,= 28,- 50), 36.0 %
Shredder Bonn 1T              : 100 (+ 19,= 36,- 45), 37.0 %
Played with ponder on, one thread, 256 MB hash, only 4-piece tablebases, opening positions, alternating colors, no learning, 6 min + 3 sec, equal hardware (of course).
OK, not exactly the same opponents, and maybe a bit too much Shredder, but all this results in a gain of 28 Elo with Elostat.
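
For readers who want to see how a score percentage relates to an Elo difference, here is a minimal sketch using the standard logistic Elo formula. This is not Elostat's actual algorithm, which is not described in this thread; the function names `score_fraction` and `score_to_elo` are purely illustrative.

```python
import math

def score_fraction(wins, draws, losses):
    """Fraction of points scored, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)

def score_to_elo(score):
    """Elo difference implied by a score fraction under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

# Totals taken from the two tables above.
g22 = score_fraction(257, 370, 373)  # Glaurung 2.2 JA, 1000 games
g21 = score_fraction(212, 273, 315)  # Glaurung 2.1, 800 games

print(f"2.2: {g22:.1%} -> {score_to_elo(g22):+.1f} Elo vs. its own opponent pool")
print(f"2.1: {g21:.1%} -> {score_to_elo(g21):+.1f} Elo vs. its own opponent pool")
```

Each figure (roughly -40 and -45) is relative to the average opponent that version faced. Because the two opponent pools differ, the two numbers cannot simply be subtracted; a rating tool has to take the individual opponents' ratings into account, which is why the reported gain is larger than the raw percentages alone would suggest.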

Bye
Ingo
Spock

Re: Glaurung 2.2

Post by Spock »

After 900 chess960 games now (9 opponents x 100) Glaurung 2.2 is holding a +55 ELO gain over 2.1

http://www.computerchess.org.uk/ccrl/40 ... _pure.html

Still more opponents to play, but I don't expect the rating to change much now. I'd expect it to hold on to a 50+ gain, fingers crossed
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Glaurung 2.2

Post by bob »

Spock wrote: After 900 chess960 games now (9 opponents x 100) Glaurung 2.2 is holding a +55 ELO gain over 2.1

http://www.computerchess.org.uk/ccrl/40 ... _pure.html

Still more opponents to play, but I don't expect the rating to change much now. I'd expect it to hold on to a 50+ gain, fingers crossed
Almost every test I run sees major changes after even 1,000 games, so with that few games anything can happen. Never forget the "error bar".
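
To put a rough number on the "error bar", here is a minimal sketch that estimates a 95% interval from a win/draw/loss record. It assumes a simple normal approximation on the score and the logistic Elo formula; it is not the method used by BayesElo, Elostat, or the rating lists mentioned in this thread, and the function names are illustrative.

```python
import math

def elo(score):
    """Elo difference implied by a score fraction under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(wins, draws, losses, z=1.96):
    """Return (estimate, low, high) in Elo for a W/D/L record.

    Assumes the interval on the score stays strictly inside (0, 1),
    which holds for results that are not extremely lopsided.
    """
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return elo(s), elo(s - z * se), elo(s + z * se)

# The 1,000-game Glaurung 2.2 result posted earlier in the thread.
est, low, high = elo_interval(257, 370, 373)
print(f"{est:+.1f} Elo, 95% interval {low:+.1f} to {high:+.1f}")
```

Under these assumptions a 1,000-game match still carries a margin of roughly ±17 Elo, and the margin shrinks only with the square root of the number of games.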
Spock

Re: Glaurung 2.2

Post by Spock »

bob wrote:
Almost every test I run sees major changes after even 1,000 games, so with that few games anything can happen. Never forget the "error bar".
Major changes are theoretically possible after 1,000 games, but I've never seen it happen to me. But my testing is a small fraction of the testing that you carry out... The error bars are there of course, I 100% agree with that
Spock

Re: Glaurung 2.2

Post by Spock »

bob wrote: Almost every test I run sees major changes after even 1,000 games, so with that few games anything can happen. Never forget the "error bar".
The FRC site is running an older version of the web pages, but on the 40/40 list you can see "Rating changes with played games" graphs. Take a look at the one for Spike (you need to scroll down, the last graph).

http://www.computerchess.org.uk/ccrl/40 ... _1_2_Turin

Unfortunately there is no scale on the x axis, but since there are just under 6,000 games you can estimate it roughly. Reading the graph, you can see maybe a 5-10 ELO change between 1,000 and 6,000 games at the very most.
Green line is actual rating, red lines are upper and lower error margins, and blue line is number of games.
PauloSoare
Posts: 1335
Joined: Thu Mar 09, 2006 5:30 am
Location: Cabo Frio, Brasil

Re: Glaurung 2.2

Post by PauloSoare »

Thanks for Glaurung 2.2, Tord. Glaurung 2.2 (JA) remains strong in endgames. In the Eigenmann_Endgame_Test, using a Q6600 with 4 cores and HT = 1 GB, it scored the same number of points as Glaurung 2.1 (JA), just behind Rybka 3 and Zappa Mexico II.
Uri was the first person to call my attention to Glaurung's endgame play.
Important: the official Eigenmann_Endgame_Test is made by Walter, and he runs the test with a single processor. The link:

http://glareanverlag.wordpress.com/2007 ... endspiele/
PauloSoare

Re: Glaurung 2.2

Post by PauloSoare »

Please, Bob, make a Crafty UCI.
Thanks,
Paulo Soares
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: Glaurung 2.2

Post by Dr.Wael Deeb »

PauloSoare wrote: Please, Bob, make a Crafty UCI.
Thanks,
Paulo Soares
Only in your dreams Paulo :lol:
It's easier to make a cow fly regards,
Dr.D
No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward…
bob

Re: Glaurung 2.2

Post by bob »

Dr.Wael Deeb wrote:
PauloSoare wrote: Please, Bob, make a Crafty UCI.
Thanks,
Paulo Soares
Only in your dreams Paulo :lol:
It's easier to make a cow fly regards,
Dr.D
You are correct for several reasons.

(1) It is a non-trivial process, since UCI usurps many responsibilities that normally lie with the engine, such as which book moves to play, which EGTB move to play (at root positions), when to ponder, what to ponder, how long to search, etc. The UI, IMHO, should be responsible for _interfacing_ between the user and the engine. That is what UI means. GUI just adds a graphical chess board display to the mixture. I don't want the UI to handle the book or book learning, to decide what EGTB move to play and bypass Crafty's "swindle mode" and "missing EGTB code", or to tell me when to move, etc. I want the UI to accept a move from the user and pass it to my program, then accept a move from my program and pass it back to the user. It might be asked to pass other information as well, such as a PV or score, and it might be asked to pass clock information back to the program, since the UI typically manages the clock: it is responsible for displaying it and for deciding which clock should run based on whose turn it is to move.

Current UIs are going way too far, particularly when they are used by multiple people in a tournament. And since the winboard protocol is perfectly capable of allowing Crafty to play as it chooses, and since this protocol works well with Crafty, I don't see any justification to whack Crafty to pieces to make it conform to the UCI protocol. Yes, the "button and knob stuff" that you can define for each engine is neat. But the rest of the protocol is not so neat...
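
As a rough illustration of the difference in who drives the game, the sketch below builds the lines a GUI might send for a single turn under each protocol. It is a simplified picture of the message traffic, not a complete implementation of either protocol; the helper names `uci_turn` and `xboard_turn` are hypothetical, and the xboard variant assumes the engine has enabled `usermove` through the protocol's feature negotiation.

```python
def uci_turn(moves, wtime_ms, btime_ms, inc_ms):
    """Lines a GUI sends to a UCI engine to request one move.

    The GUI resends the whole game and issues an explicit 'go', so the GUI
    decides when the engine searches. The engine answers 'bestmove <move>'.
    """
    position = "position startpos"
    if moves:
        position += " moves " + " ".join(moves)
    go = f"go wtime {wtime_ms} btime {btime_ms} winc {inc_ms} binc {inc_ms}"
    return [position, go]

def xboard_turn(last_move, engine_time_cs, opponent_time_cs):
    """Lines a GUI sends to an xboard/WinBoard engine after the opponent moves.

    The GUI only reports the clocks (in centiseconds) and the move; the engine
    keeps its own game state and replies 'move <move>' when it chooses to.
    """
    return [
        f"time {engine_time_cs}",
        f"otim {opponent_time_cs}",
        f"usermove {last_move}",
    ]

# One turn after 1. e4 at 6 min + 3 sec, seen from each protocol.
for line in uci_turn(["e2e4"], 360000, 360000, 3000):
    print("UCI    >", line)
for line in xboard_turn("e2e4", 36000, 36000):
    print("xboard >", line)
```

In the UCI case the engine is told what position to look at and when to start; in the xboard case it is told what happened and left to manage its own book, pondering, and timing.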
bob

Re: Glaurung 2.2

Post by bob »

Spock wrote:
bob wrote:
Almost every test I run sees major changes after even 1,000 games, so with that few games anything can happen. Never forget the "error bar".
Major changes are theoretically possible after 1,000 games, but I've never seen it happen to me. But my testing is a small fraction of the testing that you carry out... The error bars are there of course, I 100% agree with that
I'll try to post some samples. I have a small shell script I run in a window that every 30 seconds goes out and grabs all the PGN from the current match running on the cluster and then displays the BayesElo output. I can't get to the files right now, but I have captured that for an hour or so and it shows what I mean. It is quite common to see the Elo change by 30 from 1,000 games to 32,000 games, and on occasion it has changed by much more. Particularly after just a couple of hundred... I will post this later today...
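
The script itself is not shown in this thread. As a rough idea of what such a polling loop could look like, here is a minimal sketch in Python rather than shell. Everything specific in it is an assumption: the file pattern, the merged file name, and that a `bayeselo` executable is on the PATH and accepts its usual interactive commands (`readpgn`, `elo`, `mm`, `exactdist`, `ratings`, `x`) on standard input.

```python
import glob
import os
import subprocess
import time

# Commands normally typed at the BayesElo prompt (assumed to work via stdin).
BAYESELO_SCRIPT = "readpgn {pgn}\nelo\nmm\nexactdist\nratings\nx\nx\n"

def merge_pgn(pattern, merged_path):
    """Concatenate all PGN files matching pattern into merged_path.

    Returns the number of source files merged; the merged file itself is
    skipped if it happens to match the pattern.
    """
    target = os.path.abspath(merged_path)
    paths = [p for p in sorted(glob.glob(pattern)) if os.path.abspath(p) != target]
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8", errors="replace") as src:
                out.write(src.read().rstrip("\n") + "\n\n")
    return len(paths)

def rating_table(merged_path, exe="bayeselo"):
    """Run the rating tool on the merged PGN and return its text output."""
    script = BAYESELO_SCRIPT.format(pgn=merged_path)
    done = subprocess.run([exe], input=script, capture_output=True, text=True, check=True)
    return done.stdout

def watch(pattern, merged_path="all.pgn", interval=30, rounds=None):
    """Every `interval` seconds, merge the PGN files and print the ratings."""
    count = 0
    while rounds is None or count < rounds:
        if merge_pgn(pattern, merged_path) > 0:
            print(rating_table(merged_path))
        time.sleep(interval)
        count += 1

# Example (hypothetical path): watch("/cluster/match/*.pgn", interval=30)
```

Saving each printed table with its game count is enough to see how far the Elo estimate drifts between, say, 1,000 and 32,000 games.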