Quantifying progress && Naked WAC - TalkChess.com

AxolotlFever: Posts: 50; Joined: Sun Nov 11, 2018 9:26 pm; Location: Germany; Full name: Louis Mackenzie-Smith

Quantifying progress && Naked WAC

Post by AxolotlFever » Tue Dec 11, 2018 8:38 pm

Hi all,

I am trying to make my engine better, and it can be hard to tell if it is weak because it has bugs, or just because the evaluation function is not so good and everything is slow.

First, I would be really grateful if anyone could give me ideas on how to quantify progress. I could connect to Arena and play other engines, but 1. they are all too strong for me and 2. it takes a while to see results.

Currently I am using the WAC test suite at 20 seconds a move, and I get about 150 / 300 without null move / TT / LMR. With everything turned on, I get to about 170. Has anyone else compared their regular WAC scores with a "naked" WAC?

Kind regards, and thanks for any help,
Louis

PS: I would in particular find it really useful if you could say something like "at 30 seconds a move, with only basic alpha beta, you should get a score of xxx / 300". This would give me a clear way of evaluating where my engine is failing. Thanks!

My chess engine is Axolotl: https://github.com/louism33/Axolotl
And it uses: https://github.com/louism33/
Othello/Reversi: https://github.com/louism33/Mudpuppy

xr_a_y: Posts: 1871; Joined: Sat Nov 25, 2017 2:28 pm; Location: France

Re: Quantifying progress && Naked WAC

Post by xr_a_y » Tue Dec 11, 2018 9:14 pm

You can play 1000 games at a 0:10/40 or 0:20/40 time control against TSCP or micro-Max.
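A match result like this can be turned into an Elo estimate with the standard logistic formula. The following is only a minimal sketch of that arithmetic; the function names are made up for illustration, and the interval is a rough normal approximation rather than anything a rating list would publish.

```python
import math


def elo_from_score(score):
    """Convert a score fraction (0 < score < 1) into an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_elo(wins, draws, losses):
    """Return (estimate, low, high) in Elo, with a rough 95% interval."""
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = 1.96 * math.sqrt(var / n)
    eps = 1e-9  # keep the score strictly inside (0, 1) so the log is defined

    def clamp(s):
        return min(max(s, eps), 1.0 - eps)

    return tuple(elo_from_score(clamp(s))
                 for s in (score, score - margin, score + margin))


if __name__ == "__main__":
    est, low, high = match_elo(550, 100, 350)
    print(f"Elo difference: {est:+.1f} (95% interval {low:+.1f} to {high:+.1f})")
```

The interval shrinks with the square root of the number of games, which is why a match of around 1000 games is suggested rather than a few dozen.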

Vivien Clauzon
Weini
https://github.com/tryingsomestuff/Weini
Minic
https://github.com/tryingsomestuff/Minic
https://github.com/tryingsomestuff/NNUE-Nets

Ratosh: Posts: 77; Joined: Mon Apr 16, 2018 6:56 pm

Re: Quantifying progress && Naked WAC

Post by Ratosh » Tue Dec 11, 2018 11:02 pm

Hi Louis,

If you want to test the base strength of your engine, you can play against some really low-rated engine from CCRL or any other rating list, but the best way to measure progress is SPRT (sequential probability ratio test). I use OpenBench by Andrew Grant. It integrates with GitHub branches, so you can test whether a branch is "better" than development. It's pretty similar to Fishtest, but you run both the server and the client on your own machine. On my OpenBench fork you can see the changes I made to the client to build my engine with Gradle.

PS.: The link is pointing to my fork.
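To give an idea of what an SPRT decision looks like, here is a minimal sketch using a normal approximation to the log-likelihood ratio. It is not taken from OpenBench or Fishtest, which may use different models; the function names and the default bounds (`elo0=0`, `elo1=5`, `alpha=beta=0.05`) are assumptions chosen for illustration.

```python
import math


def expected_score(elo):
    """Expected score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt(wins, draws, losses, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05):
    """Return (llr, decision) for H0: Elo = elo0 against H1: Elo = elo1.

    Uses the normal approximation
        LLR = N * (s1 - s0) * (2 * mean - s0 - s1) / (2 * var)
    where s0 and s1 are the expected scores under H0 and H1.
    """
    n = wins + draws + losses
    if n == 0:
        return 0.0, "continue"
    mean = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    if var == 0.0:
        # All games had the same result; the approximation is undefined.
        return 0.0, "continue"
    s0, s1 = expected_score(elo0), expected_score(elo1)
    llr = n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr >= upper:
        return llr, "accept H1"
    if llr <= lower:
        return llr, "accept H0"
    return llr, "continue"


if __name__ == "__main__":
    print(sprt(wins=120, draws=200, losses=100))
```

The point of the sequential test is that the match stops as soon as one of the two bounds is crossed, so clearly good or clearly bad patches need far fewer games than a fixed-length match.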

Dann Corbit: Posts: 12538; Joined: Wed Mar 08, 2006 8:57 pm; Location: Redmond, WA USA

Re: Quantifying progress && Naked WAC

Post by Dann Corbit » Wed Dec 12, 2018 12:20 am

AxolotlFever wrote: ↑Tue Dec 11, 2018 8:38 pm Hi all,

I am trying to make my engine better, and it can be hard to tell if it is weak because it has bugs, or just because the evaluation function is not so good and everything is slow.

First, I would be really grateful if anyone can give me ideas of how to quantify progress. I could connect to arena and play other engines, but 1. they are all too strong for me and 2. this takes a while to see results.

Currently I am using the WAC test suite, at 20 seconds a move, and I get about 150 / 300, without null move / tt / lmr. With everything turned on, I get to about 170. Has anyone else compared their regular WAC scores with a "naked" WAC ?

Kind regards, and thanks for any help,
Louis

PS: I would in particular find it really useful if you could say something like "at 30 seconds a move, with only basic alpha beta, you should get a score of xxx / 300". This would give me a clear way of evaluating where my engine is failing. Thanks!

The absolute score of the WAC is probably not that important.
What it is most useful for is to find holes in evaluation.
If it is a king safety position and you do not find it, then look at your king safety.

Another fun test is to permute the WAC and see if you get the same results (you should).
You might see minor differences due to SMP or if you order the moves differently.
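The permutation check can be sketched roughly as follows. This assumes a hypothetical `solve(position)` callable that wraps your engine and returns true when it finds the expected move; nothing here is tied to a particular engine or protocol.

```python
import random


def solved_set(positions, solve):
    """Run solve() on each position in order; return the set that was solved."""
    return {p for p in positions if solve(p)}


def permutation_check(positions, solve, seed=0):
    """Solve the suite in the given order and in a shuffled order.

    Returns the sorted list of positions whose result differs between the
    two runs. An empty list means the order did not matter.
    """
    shuffled = list(positions)
    random.Random(seed).shuffle(shuffled)  # seeded, so a failure is repeatable
    original = solved_set(positions, solve)
    permuted = solved_set(shuffled, solve)
    return sorted(original ^ permuted)


if __name__ == "__main__":
    # Toy stand-in for an engine: "solves" positions with an even-length name.
    demo = ["pos-a", "pos-bb", "pos-ccc", "pos-dddd"]
    print(permutation_check(demo, lambda p: len(p) % 2 == 0))
```

If the list is not empty on a single-threaded run, a likely suspect is state leaking from one position to the next, for example a transposition table or history table that is not cleared between positions.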

Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

xr_a_y: Posts: 1871; Joined: Sat Nov 25, 2017 2:28 pm; Location: France

Re: Quantifying progress && Naked WAC

Post by xr_a_y » Fri Dec 28, 2018 5:50 pm

Is there a formula for WAC that converts a result of xxx/300 into an estimated Elo? I know some other test suites provide this; maybe someone has one for WAC?

hgm: Posts: 27790; Joined: Fri Mar 10, 2006 10:06 am; Location: Amsterdam; Full name: H G Muller

Re: Quantifying progress && Naked WAC

Post by hgm » Fri Dec 28, 2018 7:28 pm

That doesn't seem possible. It used to be that tuning your engine for maximum score would actually decrease its Elo. (Not sure whether this is still the case for modern engines.) As extremely weak engines (e.g. with hardly any search depth) would also not score very well, that would mean any relation between WAC score and Elo would not be a 1:1 mapping.

JVMerlino: Posts: 1357; Joined: Wed Mar 08, 2006 10:15 pm; Location: San Francisco, California

Re: Quantifying progress && Naked WAC

Post by JVMerlino » Fri Dec 28, 2018 7:49 pm

hgm wrote: ↑Fri Dec 28, 2018 7:28 pm That doesn't seem possible. It used to be that tuning your engine for maximum score would actually decrease its Elo. (Not sure whether this is still the case for modern engines.) As extremely weak engines (e.g. with hardly any search depth) would also not score very well, that would mean any relation between WAC score and Elo would not be a 1:1 mapping.

While I (mostly) agree with this, scoring only 170/300 at 20 seconds per move shows some clear problems. Myrddin v0.82, which was rated around 1925, could get 262/300 at 10 seconds per move. That's on hardware that would be 10+ years old now.

I also highly doubt that the problem is in the evaluation (as the OP suggested this as one of the possible issues). I would be curious to see some output from the engine showing nodes per second and time to depth. Or even the entire output from your WAC test. This could enlighten us quite a bit.

xr_a_y: Posts: 1871; Joined: Sat Nov 25, 2017 2:28 pm; Location: France

Re: Quantifying progress && Naked WAC

Post by xr_a_y » Fri Dec 28, 2018 8:40 pm

For what it's worth, Minic 0.28 scores 266/300 at 3 seconds per position, single-threaded, on 7-year-old hardware.

Dann Corbit: Posts: 12538; Joined: Wed Mar 08, 2006 8:57 pm; Location: Redmond, WA USA

Re: Quantifying progress && Naked WAC

Post by Dann Corbit » Fri Dec 28, 2018 8:46 pm

xr_a_y wrote: ↑Fri Dec 28, 2018 5:50 pm Is there a formula for WAC that converts a result of xxx/300 into an estimated Elo? I know some other test suites provide this; maybe someone has one for WAC?

The main purpose for a simple test like WAC is to find programming errors and missing evaluation features.

You can tune your engine to WAC in the early stages of writing it, but for best results in game play you should tune it with game play.

If you get many wrong answers with WAC problems at 10 seconds per position, then you have errors in your evaluation and search.

A modern engine should easily get 275/300 at that time control. There are many engines which will get all but WAC.230 correct at ten seconds per move. WAC.230 is also busted, in that the best move does not win (proved by Alex Szabo) because there is an equally brilliant refutation where the defender creates a passed pawn. Still, the rook sacrifice is the only move that has any winning chances, so in that sense it really is the best move.
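For finding which positions fail, rather than just the total, a small harness along these lines may help. It is a sketch under assumptions: `pick_move(position)` is a hypothetical wrapper that returns your engine's move in the same notation as the suite's `bm` field, the two EPD lines are made-up toy records and not real WAC entries, and the parser is simplified (it would break on a semicolon inside a quoted operand).

```python
def parse_epd(line):
    """Split an EPD record into (position, operations).

    The first four fields are the position; the rest are
    semicolon-terminated operations such as: bm Qg6; id "name";
    """
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    ops = {}
    if len(fields) > 4:
        for op in fields[4].split(";"):
            op = op.strip()
            if not op:
                continue
            opcode, _, operand = op.partition(" ")
            ops[opcode] = operand.strip().strip('"')
    return position, ops


def score_suite(lines, pick_move):
    """Return (solved, failed) lists of position ids."""
    solved, failed = [], []
    for line in lines:
        if not line.strip():
            continue
        position, ops = parse_epd(line)
        best_moves = ops.get("bm", "").split()  # "bm" may list several moves
        name = ops.get("id", position)
        if pick_move(position) in best_moves:
            solved.append(name)
        else:
            failed.append(name)
    return solved, failed


if __name__ == "__main__":
    # Toy records for illustration only; not taken from the WAC suite.
    toy = [
        'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4 d4; id "toy.1";',
        '4k3/8/8/8/8/8/8/4K2R w K - bm O-O; id "toy.2";',
    ]
    solved, failed = score_suite(toy, lambda position: "e4")
    print(f"{len(solved)}/{len(toy)} solved, failed: {failed}")
```

Grouping the failed ids by theme (king safety, passed pawns, deep tactics) is what turns the raw score into a list of things to look at in the evaluation and search.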
