Poor man's testing process please

xr_a_y · Post by **xr_a_y** » Fri Mar 20, 2020 1:44 pm

I still don't get the issue with knowledge and facility.
If you have a working engine implementing the Xboard or UCI protocol (and if not, just do it asap; it is easy in its simpler form), you can use whatever OS you want and run a few short games. Thus you can test.
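As a rough idea of how small that "simpler form" can be, here is a minimal UCI front end, sketched in Python for illustration (an engine written in another language, such as Pascal, would need the same few branches). `UciAdapter`, `run` and the `search` hook are hypothetical names; `search` stands in for the engine's own search and is assumed to take the last `position ...` command and return a move in long algebraic notation such as `e2e4`.

```python
import sys
from typing import Callable, Iterable, List


class UciAdapter:
    """Minimal UCI front end wrapped around an engine's own search (sketch)."""

    def __init__(self, name: str, author: str, search: Callable[[str], str]) -> None:
        self.name = name
        self.author = author
        self.search = search  # hypothetical hook into the engine's search
        self.position = "position startpos"

    def handle(self, line: str) -> List[str]:
        """Return the lines to send back for one command from the GUI."""
        cmd = line.strip()
        if cmd == "uci":
            return [f"id name {self.name}", f"id author {self.author}", "uciok"]
        if cmd == "isready":
            return ["readyok"]
        if cmd == "ucinewgame":
            self.position = "position startpos"
            return []
        if cmd.startswith("position "):
            self.position = cmd  # the search hook parses the FEN/moves itself
            return []
        if cmd == "go" or cmd.startswith("go "):
            return [f"bestmove {self.search(self.position)}"]
        return []  # commands this sketch does not understand are ignored


def run(adapter: UciAdapter, lines: Iterable[str], emit: Callable[[str], None]) -> None:
    """Feed GUI commands to the adapter until 'quit', emitting every reply."""
    for line in lines:
        if line.strip() == "quit":
            break
        for reply in adapter.handle(line):
            emit(reply)


def main() -> None:
    # Placeholder search that always answers e2e4: replace it with the real one.
    adapter = UciAdapter("MyEngine", "Me", search=lambda position: "e2e4")
    run(adapter, sys.stdin, lambda text: print(text, flush=True))
```

Only `uci`, `isready`, `ucinewgame`, `position` and `go` are handled, which is roughly the minimum a GUI needs to play games; a real engine would also read the time controls from the `go` line.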

abulmo2 · Post by **abulmo2** » Fri Mar 20, 2020 2:07 pm

lauriet wrote: ↑Fri Mar 20, 2020 8:33 am I'm afraid I have very limited knowledge and facility to do any "high-tech" testing.
I think the best I can do is to use test suites.
I have looked at 'STS'.
Can anyone explain how I can use this to give me a 'Ball Park' idea.
I'm not looking for +/- 10 elo resolution, but would be happy to know if my engine is 1800 or 2200.

Test suites are very inaccurate. I just use them as a 5-minute check that I did not introduce a big bug. Playing (a lot of) games is the only way to get an estimate of a program's Elo. LC0 is notorious for doing poorly on test suites like STS, although it is incredibly strong at playing games.
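To see why many games are needed, a match result can be turned into an Elo difference with the usual logistic formula, Δ = -400·log10(1/s - 1) where s is the score fraction, plus a rough 95% interval from a normal approximation of the mean score. A minimal sketch (the function name is hypothetical); the result is relative to the opponents played, so a ballpark rating needs opponents of known strength.

```python
import math
from typing import Tuple


def elo_from_wdl(wins: int, draws: int, losses: int) -> Tuple[float, float, float]:
    """Elo difference implied by a match, with a rough 95% interval (sketch).

    Logistic Elo model plus a normal approximation; raises ValueError when the
    sample is empty or the interval reaches a 0% or 100% score.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (each game is worth 1, 0.5 or 0 points).
    var = (wins + 0.25 * draws) / n - score ** 2
    stderr = math.sqrt(var / n)
    low_s, high_s = score - 1.96 * stderr, score + 1.96 * stderr
    if not (0.0 < low_s and high_s < 1.0):
        raise ValueError("score too extreme or too few games for this approximation")

    def to_elo(s: float) -> float:
        return -400.0 * math.log10(1.0 / s - 1.0)

    return to_elo(score), to_elo(low_s), to_elo(high_s)
```

For example, 50 wins and 50 losses give 0 Elo with an interval of roughly ±69 Elo, so about 100 games can already separate 1800 from 2200, while resolving 10 Elo takes thousands of games.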

xr_a_y · Post by **xr_a_y** » Fri Mar 20, 2020 2:17 pm

xr_a_y wrote: ↑Fri Mar 20, 2020 1:44 pm I still don't get the issue with knowledge and facility.
If you have a working engine implementing the Xboard or UCI protocol (and if not, just do it asap; it is easy in its simpler form), you can use whatever OS you want and run a few short games. Thus you can test.

Look at other Pascal engines (for instance alouette) here: https://github.com/rchastain/alouette/b ... te.pas#L93

Daniel Anulliero · Post by **Daniel Anulliero** » Sat Mar 21, 2020 8:06 am

Henk wrote: ↑Fri Mar 20, 2020 12:01 pm
lauriet wrote: ↑Thu Mar 19, 2020 12:41 am Hi all,
I have pretty much no facilities for testing changes in my program for gains or losses.
What is the best compromise that will get me in the ball park? I can gather, and already have, a lot of stats, but is there a most logical/reliable way to determine if a change is a plus or a minus?

Time to depth ?
Node counts ?
Cut offs ?
TT hits?
etc, etc.

Thanks
Laurie (LTchess2)
They say I am a bad tester. Play one game and see what errors it made. Try to correct these errors in your software. Extract positions. Always make sure that errors are reproducible.

Of course you can follow Henk's testing methodology.
Just ask him how much Elo he gained for Skipper with that.

Henk · Post by **Henk** » Sat Mar 21, 2020 10:30 am

Daniel Anulliero wrote: ↑Sat Mar 21, 2020 8:06 am
Of course you can follow Henk's testing methodology.
Just ask him how much Elo he gained for Skipper with that.

[-200, 0] Elo.
Probably the first or second version was the best, but I stole/copied the piece-square values from a chess programming website.

lauriet · Post by **lauriet** » Sat Mar 21, 2020 10:39 pm

My program has its own text interface, and I don't have the time/knowledge/inclination to implement an external GUI or a mechanism to pipe its inputs and outputs.
I have started to try "time to depth", cutting out features like null move, futility, move ordering, etc., and this gives what appears to be a good indication of whether each one is a benefit or not.
Let me state again that I'm not looking to discern a 10 Elo difference, but rather whether the change is in the ballpark.
I am assuming that if time to depth improves and the printed PV doesn't change, then the change must be helpful... Right?

hgm · Post by **hgm** » Sat Mar 21, 2020 11:05 pm

Well, that would only be true if it would not change the PV on a few thousand different test positions. That it leaves one PV unchanged in one position doesn't mean it wouldn't cause blundering in many other positions.
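Running a few thousand positions is easy to automate if they are stored as EPD records, the format test suites such as STS are usually distributed in. A minimal sketch, assuming a hypothetical `best_move` callback that runs the engine on the position (the four FEN fields of the EPD record; append move counters if the engine needs a full FEN) and returns its choice in SAN, the notation used by the `bm` operation. It only counts exact `bm` hits, ignores any partial-credit scheme a suite may define, and assumes no operand contains a `;`.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def parse_epd(line: str) -> Tuple[str, Dict[str, List[str]]]:
    """Split one EPD record into its four position fields and its operations."""
    fields = line.strip().split(maxsplit=4)
    position = " ".join(fields[:4])
    ops: Dict[str, List[str]] = {}
    rest = fields[4] if len(fields) > 4 else ""
    for op in rest.split(";"):  # simplification: no ';' inside operands
        parts = op.split()
        if parts:
            ops[parts[0]] = [p.strip('"') for p in parts[1:]]
    return position, ops


def solve_rate(epd_lines: Iterable[str], best_move: Callable[[str], str]) -> Tuple[int, int]:
    """Count (solved, total) over records that carry a 'bm' operation."""
    solved = total = 0
    for line in epd_lines:
        if not line.strip():
            continue
        position, ops = parse_epd(line)
        if "bm" not in ops:
            continue
        total += 1
        if best_move(position) in ops["bm"]:  # any listed best move counts
            solved += 1
    return solved, total
```

Running the same file before and after a change, and comparing which positions flipped, is a cheap sanity check, though it says little about playing strength.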

Time-to-depth also doesn't mean much, as depth is an ill-defined concept in a search that uses reductions and extensions. You can greatly reduce the time for a given depth by reducing more side branches, or pruning them altogether. But that will also make the engine weaker for the same nominal depth.

BTW, the nice thing about engines is that you never need to implement a mechanism to pipe its inputs and outputs. It is the GUI that does this for you, and the engine doesn't need to know anything about it. It only needs a text interface, reading from stdin, and writing to stdout.

If you don't have the time to let your engine's text interface respond to the commands 'new', 'force' and 'go', I guess the only way will be to play the games against other engines over the board, as an operator, or feed it the couple of thousand test positions by hand.
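For reference, 'new', 'force' and 'go' come from the xboard/WinBoard protocol (CECP). A minimal Python sketch of such a loop follows; `XboardAdapter`, `run` and the `search` hook are hypothetical, with `search` assumed to receive the game's moves so far in coordinate notation from the start position and return the engine's move. The `protover` reply announces `sigint=0` so that xboard does not send interrupt signals.

```python
import re
import sys
from typing import Callable, Iterable, List

# Coordinate notation as xboard sends it by default, e.g. e2e4, e1g1 or e7e8q.
MOVE_RE = re.compile(r"^[a-h][1-8][a-h][1-8][qrbn]?$")


class XboardAdapter:
    """Minimal CECP front end wrapped around an engine's own search (sketch)."""

    def __init__(self, name: str, search: Callable[[List[str]], str]) -> None:
        self.name = name
        self.search = search  # hypothetical hook into the engine's search
        self.moves: List[str] = []
        self.force = False

    def handle(self, line: str) -> List[str]:
        """Return the lines to send back for one command from the GUI."""
        cmd = line.strip()
        if cmd.startswith("protover"):
            return [f'feature myname="{self.name}" sigint=0 done=1']
        if cmd == "new":  # start position, leave force mode, engine plays Black
            self.moves, self.force = [], False
            return []
        if cmd == "force":  # only record moves from now on
            self.force = True
            return []
        if cmd == "go":  # play for the side to move
            self.force = False
            return self._play()
        if MOVE_RE.match(cmd):  # the opponent's move
            self.moves.append(cmd)
            return [] if self.force else self._play()
        return []  # clocks, 'level', 'result' etc. are ignored here

    def _play(self) -> List[str]:
        move = self.search(self.moves)
        self.moves.append(move)
        return [f"move {move}"]


def run(adapter: XboardAdapter, lines: Iterable[str], emit: Callable[[str], None]) -> None:
    """Feed GUI commands to the adapter until 'quit', emitting every reply."""
    for line in lines:
        if line.strip() == "quit":
            break
        for reply in adapter.handle(line):
            emit(reply)


def main() -> None:
    # Placeholder search that always answers e7e5: replace it with the real one.
    run(XboardAdapter("MyEngine", search=lambda moves: "e7e5"), sys.stdin,
        lambda text: print(text, flush=True))
```

After `new` the engine plays Black and answers each incoming move; after `force` it only records moves; `go` makes it play the side to move from that point on.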

Ovyron · Post by **Ovyron** » Sun Mar 22, 2020 12:37 am

lauriet wrote: ↑Sat Mar 21, 2020 10:39 pm My program has its own text interface, and I don't have the time/knowledge/inclination to implement an external GUI or a mechanism to pipe its inputs and outputs.

You could save a lot, and I mean A LOT of time if you devoted a bit of it to implement a protocol like Winboard's or UCI's and allowed your engine to play games in a GUI automatically.

This is on the scale of making the next three years of progress in a single month, so you should really consider it.

noobpwnftw · Post by **noobpwnftw** » Sun Mar 22, 2020 10:46 am

There are established testing frameworks, and with a little communication effort you can get "free" compute resources to run reasonably conclusive tests in a timely manner.
Even on the other side, people are using more and more comprehensive testing procedures, so chess-related development as a whole seems to have reached a high level of scientific rigor in recent years.
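Frameworks of this kind (for example Stockfish's fishtest and OpenBench) decide when to stop a test with a sequential probability ratio test (SPRT). The sketch below uses a common normal approximation of the log-likelihood ratio on the per-game score; the bounds elo0 = 0 and elo1 = 5 (logistic Elo) and α = β = 0.05 are illustrative defaults, not values taken from any particular framework.

```python
import math
from typing import Tuple


def expected_score(elo: float) -> float:
    """Logistic Elo model: expected score for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt(wins: int, draws: int, losses: int,
         elo0: float = 0.0, elo1: float = 5.0,
         alpha: float = 0.05, beta: float = 0.05) -> Tuple[float, str]:
    """Approximate SPRT of H0: Elo = elo0 against H1: Elo = elo1 (sketch)."""
    n = wins + draws + losses
    if n == 0:
        return 0.0, "continue"
    score = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - score ** 2  # per-game score variance
    if var <= 0.0:  # e.g. only wins so far: not enough information yet
        return 0.0, "continue"
    s0, s1 = expected_score(elo0), expected_score(elo1)
    # Normal approximation: LLR = n * (s1 - s0) * (2*score - s0 - s1) / (2*var)
    llr = n * (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * var)
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr >= upper:
        return llr, "accept H1"
    if llr <= lower:
        return llr, "accept H0"
    return llr, "continue"
```

Games are added in batches and `sprt` is called after each batch until it returns an accept verdict; with Elo bounds this narrow that typically takes thousands of games, which is where shared compute helps.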

jp · Post by jp » Sun Mar 22, 2020 11:32 am

noobpwnftw wrote: ↑Sun Mar 22, 2020 10:46 am Even on the other side, people are using more and more comprehensive testing procedures

What do you mean by "on the other side"?
