different kinds of testing


hgm
Posts: 28354
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: different kinds of testing

Post by hgm »

Don wrote: xboard is actually pretty fast even with the GUI compared to Arena, but I'm sure the graphics still slow it down considerably.

Can you run fractional time controls with fractional increments? Like game in 6 seconds + 0.1 second increment?

Since I have my own tester and run on linux, I really like using xboard for watching a few games at slower time controls.

On my wish list is for xboard to support UCI in addition to the xboard protocol, so that matches could be played without the annoying Polyglot adapter. That would be simple to do, except for the support for changing engine settings from within the program, which requires a bunch of widget programming.

Fractional seconds can be achieved by using a time-odds factor. E.g. if you give both engines a time factor of 10 and then ask for 1 min + 1 sec/move, you will in fact play 6 sec + 0.1 sec/move. The problem, however, is that WB protocol does not support transmitting a fractional increment to the engine, so the engines would receive level 0 0:06 0 after rounding, and the increment would come as a surprise to them. Sending the fraction anyway is no solution: most engines would not understand a fraction in the TC string, and might fail to recognize the level command altogether.

The graphics do indeed slow XBoard down (it seems that changing the icon on the task bar especially eats time), which is why the -noGUI option should be used to suppress all graphics.

I am not sure why you think using Polyglot is annoying. Its use is completely transparent. Absolutely nothing would change for the user if XBoard implemented the UCI protocol itself: you would still have to supply the -fUCI option to tell XBoard the engine is UCI. Performance-wise it would probably make no difference either: the conversion to UCI would have to be done anyway, and it makes little difference whether a thread in XBoard does it or a separate process does.

The engine settings can already be controlled from the user interface, although I admit that the layout of the Engine Settings dialog is not as nice as in WinBoard. It looks a bit jumbled, especially for an engine with many UCI options.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: different kinds of testing

Post by Don »

hgm wrote:
I am not sure why you think using Polyglot is annoying. Its use is completely transparent.
The Polyglot interface works fine. The annoying part isn't its functionality; it's all the additional infrastructure. Now it's not just a tester and 2 programs, it's a tester, 2 programs, 2 polyglot.ini files and the Polyglot software. And Polyglot has support for even more stuff, such as books. That's not a bad thing of course, but it starts to get crazy that you need so much configuration just to get 2 programs testing together.

Anyway, I have no complaints about xboard - it's a really nice tool and it's the only linux GUI that works without bugs - (which is a pathetic state of affairs.) I don't really need to use it much for testing because I have my own tester which is super powered, but it's not designed as a user interface. So xboard is my user interface. I still use xboard for casual testing because my tester does not yet support xboard style programs (something I intend to fix soon.) I will also be fixing my own program to use the xboard protocol as an option.

Is there an adapter that goes from xboard to UCI protocol?
hgm
Posts: 28354
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: different kinds of testing

Post by hgm »

Don wrote:Is there an adapter that goes from xboard to UCI protocol?
Yes, Polyglot. But I probably misunderstand your question, as I am sure you know that...
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: different kinds of testing

Post by Don »

hgm wrote:
Don wrote:Is there an adapter that goes from xboard to UCI protocol?
Yes, Polyglot. But I probably misunderstand your question, as I am sure you know that...
I didn't know that it went both ways. I thought it only went from UCI to xboard protocol.

- Don
hgm
Posts: 28354
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: different kinds of testing

Post by hgm »

Oh, you mean a WB engine on a UCI GUI. Sorry for the misunderstanding; no, Polyglot does not do that. (It does do UCI to UCI, though, to act as a book adapter.)

Yes, the inverse adapter does also exist. It is called WB2UCI. I never used it, though.
Will Singleton
Posts: 128
Joined: Thu Mar 09, 2006 5:14 pm
Location: Los Angeles, CA

Re: different kinds of testing

Post by Will Singleton »

I'm wondering whether it's better to test from a set of opening positions, flipping the colors, or rather from game start using books. I guess you'd get more volatility using books, so you'd need more games. But if you eliminate the books, then you're not testing from the positions your program would typically get into.

And if you use a set of opening positions, I guess you'd have to have a large set if you wanted to play 1000 games without duplication. Anyone have a set like that?

Will
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: different kinds of testing

Post by Don »

Will Singleton wrote:I'm wondering whether it's better to test from a set of opening positions, flipping the colors, or rather from game start using books. I guess you'd get more volatility using books, so you'd need more games. But if you eliminate the books, then you're not testing from the positions your program would typically get into.

And if you use a set of opening positions, I guess you'd have to have a large set if you wanted to play 1000 games without duplication. Anyone have a set like that?

Will
I created a set of openings from several hundred thousand master games. My book goes to a depth of 10 ply. This is not very deep, but I am more interested in testing the program than the book. A custom book can be designed later.

I sanitized the huge PGN file I extracted these from so that no games repeated. I required that any given move be played at least N times so that I am not looking at ridiculous positions. I don't remember what N was, but it generated more than 3785 different starting positions, 5 moves (or 10 ply) deep.

I should also mention that I checked them for transpositions. The resulting end positions are all unique.

I also checked for "likely" transpositions by letting the program play through all the openings, and I removed an opening if it transposed into another opening within 4 ply of leaving book. Or something like that; I don't remember the exact rule. Nothing you can easily do would guarantee no eventual transpositions (you could eventually transpose into a rook vs king ending if you go crazy).

So I ended up with 3785 carefully selected starting positions for testing. So for any 2 players I can generate 7570 games (each opening played once with each color). Sometimes I want bigger samples than this, but my tester does round robins between up to 256 players, so even 3 players will give me about 15000 games per player.

My autotester also takes care that openings are not played in the same sequence. (Actually, the order is deterministic, based on a hash of the 2 players' names.) Even between the same two players, you will not play the same first opening as White that your opponent played as White against you. I think this is fairly important, so that your shorter tests are not always hammering the same openings.

I can give my openings to anyone who wants them. They are in a C file that looks like this:


#define BOOK_SIZE 3785

char *book[BOOK_SIZE] = {

"d2d4 g8f6 c2c4 e7e6 b1c3 f8b4 c1d2 e8g8 g1f3 d7d5",
"e2e4 d7d6 d2d4 g8f6 b1c3 g7g6 f1e2 f8g7 h2h4 c7c5",
"d2d4 d7d6 c2c4 e7e5 g1f3 e5e4 f3d2 f7f5 e2e3 g8f6",
"e2e4 e7e5 g1f3 b8c6 f1b5 g8f6 e1g1 d7d6 d2d4 e5d4",
"c2c4 g8f6 b1c3 e7e6 g1f3 d7d5 d2d4 f8b4 e2e3 e8g8",
...
};
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: different kinds of testing

Post by bob »

Will Singleton wrote:I'm wondering whether it's better to test from a set of opening positions, flipping the colors, or rather from game start using books. I guess you'd get more volatility using books, so you'd need more games. But if you eliminate the books, then you're not testing from the positions your program would typically get into.

And if you use a set of opening positions, I guess you'd have to have a large set if you wanted to play 1000 games without duplication. Anyone have a set like that?

Will
I have a set of 4,000 positions. And a set of 8,000, and finally a set of 16,000 for really high-resolution comparisons.

Testing without a book does skew things a bit, but it does so in a _good_ way. A good book can only help your program, and when testing without one you are improving the program's play overall. That is a good thing, because book tricks such as transpositions, or early h3/a3 moves, etc., will lead you into positions you failed to prepare a book line for.
Jan Brouwer
Posts: 201
Joined: Thu Mar 22, 2007 7:12 pm
Location: Netherlands

Re: different kinds of testing

Post by Jan Brouwer »

Don wrote:
Jan Brouwer wrote: I am interested to hear what specific testing regimes people are using.
I use 20 opening positions x 2 (white/black) x 6 opponents = 240 games at 10 s + 1 s/move. This takes about a night...
For the last two versions of my program, I have also used a kind of verification test that repeats the above 4 times.
I realise that this means that the uncertainty in the results is quite large, but the process seems to work so far, my program has reached 2669 on the CCRL list.
Which program is yours?
It's called Rotor (http://home.kpn.nl/f2hjbrouwer120/index.html).
But I wonder how much further I can take this as the improvements become smaller and smaller.
Probably I also need to start testing at shorter time controls, which means finding an alternative to Arena.
I think you can still get good testing with modest means. But you are pretty much forced into much faster testing if you want to get a statistically viable sample. You can also run carefully constructed tests during periods of time when you know you will be doing other things and cannot work on chess.

But you cannot do this very well with Arena. Do what I did: build your own tester that is not graphical. (The graphics kills the speed when you are talking about hyper-bullet time controls.)
Yes, I will probably do this, maybe a good opportunity to improve my C# skills a bit :wink:
It is also encouraging to hear that testing at very fast time controls is a viable method.
Aser Huerga
Posts: 812
Joined: Tue Jun 16, 2009 10:09 am
Location: Spain

Re: different kinds of testing

Post by Aser Huerga »

Jan Brouwer wrote: It's called Rotor (http://home.kpn.nl/f2hjbrouwer120/index.html).


Hello Jan,

please, could you check this link? Clicking on Rotor 0.5 sends you to a page where there is no clear download link.

Thanks.