Obsidian (DEV)

Ciekce
Posts: 197
Joined: Sun Oct 30, 2022 5:26 pm
Full name: Conor Anstey

Re: Obsidian (DEV)

Post by Ciekce »

Dann Corbit wrote: Sat Dec 07, 2024 2:40 am The positions may be useless to you, but they are very useful to me.
They show the sort of positions that many engines struggle with.
I am not the only one who finds them useful, but if they are useless to you, I certainly cannot refute such a claim.
I am sorry that you were unable to extract any value from them. Perhaps someday, I will provide a set of positions with more value to you.
I am not referring to your specific set, I am talking about any use of test suites for testing engines.
Dann Corbit wrote: Sat Dec 07, 2024 2:44 am You will likely find (at least at fast time control) that the modified engine is stronger in game play.
I have not completed these tests, but in all my previous tests, the same changes produced about +30 Elo at game in one minute plus half a second bonus.
The benefits are harder and harder to measure at longer time control, due to increasing draw rate.
if only there were standardised ways to test changes to engines

if only there were a way to decrease the draw rates in tests

if only such techniques had existed for years

what a world we would live in
Chris Formula
Posts: 123
Joined: Sun Aug 21, 2016 7:59 am
Full name: Chris Euler

Re: Obsidian (DEV)

Post by Chris Formula »

Obsidian dev-14.10 DC: https://drive.google.com/file/d/1cZbmZm ... fAj0d/view

These are alternative binaries for Obsidian dev-14.10 containing Dann's mod.

Enjoy!

Chris
Dann Corbit
Posts: 12797
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Obsidian (DEV)

Post by Dann Corbit »

Ciekce wrote: Sat Dec 07, 2024 2:56 am I am not referring to your specific set, I am talking about any use of test suites for testing engines.
Such tests are not useful for finding out how strong an engine is at game play, because they clearly do not deliver that answer.
What such tests are useful for is finding out how strong an engine is at solving difficult positions.
They also give an indication of how strong an engine is in tactical positions (though we know that most positions in the game of chess are quiet rather than tactical).

There is an entire genre of interest in the game of chess involved with solving positions.
You will find that a high percentage of posts in this forum are about solving positions, and people are also very curious about how strong engines are at solving positions.

There will never be an easy way to find out which of two engines of about the same strength is stronger than the other.
Testing at game in one minute plus 500 milliseconds bonus is done for practicality's sake.
A minimum of 800 games per engine pair (1000 for a truly sensible answer) takes about a week on my machine.
Testing at game in four minutes plus a one-second increment takes a little over a month. I have found that by the time that sort of test is finished, there will almost always be a new version of the engine, invalidating the test.
So to test reliably over longer time spans requires a testing body like CCRL, CEGT and the like.
Not everyone's engine gets accepted to those lists, so I have to test myself to understand (roughly) how good or bad something is.
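The arithmetic behind those test durations can be sketched roughly as follows. This is only an upper-bound model, not a measurement: the function name, the assumption that both sides use their whole clock, and the figure of 80 moves per side are all hypothetical choices made for illustration. Real match times depend on actual game lengths, adjudication rules, and how many games run in parallel.

```python
def estimate_match_hours(games, base_s, inc_s, moves_per_side=80, concurrency=1):
    """Upper-bound wall-clock hours for a match.

    Assumes (hypothetically) that each side uses its full base time plus
    every increment, and that every game lasts `moves_per_side` moves.
    """
    per_side_s = base_s + inc_s * moves_per_side   # clock available to one side
    per_game_s = 2 * per_side_s                    # both sides think in turn
    return games * per_game_s / concurrency / 3600.0


# Game in 1 minute + 0.5 s increment versus game in 4 minutes + 1 s increment.
fast = estimate_match_hours(1000, base_s=60, inc_s=0.5)
slow = estimate_match_hours(1000, base_s=240, inc_s=1.0)
print(f"1+0.5: {fast:.1f} h   4+1: {slow:.1f} h   ratio: {slow / fast:.1f}x")
```

Under this model the ratio between two time controls does not depend on the number of games, only on the clock settings and the assumed game length, which is why a longer time control multiplies the cost of every test.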
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Jouni
Posts: 3690
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: Obsidian (DEV)

Post by Jouni »

I think test suites are always interesting! Because chess is solved to be a draw, and UHO cheating doesn't change that :wink: .
Jouni
Ciekce
Posts: 197
Joined: Sun Oct 30, 2022 5:26 pm
Full name: Conor Anstey

Re: Obsidian (DEV)

Post by Ciekce »

Dann Corbit wrote: Sat Dec 07, 2024 9:35 am There will never be an easy way to find out which of two engines of about the same strength is stronger than the other.
damn that's crazy I guess SPRT doesn't exist
Dann Corbit wrote: Sat Dec 07, 2024 9:35 am Testing at game in one minute plus 500 milliseconds bonus is done for practicality's sake.
A minimum of 800 games per engine pair (1000 for a truly sensible answer) takes about a week on my machine.
if only there were normal TCs for this that didn't take 393292389 years for each test

if only there was a place that obsidian changes could be tested on distributed hardware

if only you were complaining about solved problems, instead of terrible intractable issues that completely block you from doing any sort of sane testing




wait...
Dann Corbit
Posts: 12797
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Obsidian (DEV)

Post by Dann Corbit »

Ciekce wrote: Sat Dec 07, 2024 11:36 am damn that's crazy I guess SPRT doesn't exist
I am aware of SPRT.
SPRT tests prove results for the given time control.
The assumption is that the result will hold true for other time controls.
The tests at other time controls always have different Elo values, sometimes even negative.
The tests used for SPRT are generally hyperbullet and bullet sort of ranges.
How do these tests translate to longer time control?
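For readers unfamiliar with SPRT, the idea can be sketched as below. This is a minimal illustration using a normal approximation to the log-likelihood ratio over win/draw/loss counts; it is not the code of any particular testing framework, and real frameworks typically use more refined models. The function names and the default bounds (elo0=0, elo1=5, alpha=beta=0.05) are assumptions chosen for the example.

```python
import math


def expected_score(elo):
    """Expected score for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins, draws, losses, elo0, elo1):
    """Approximate log-likelihood ratio of H1 (elo1) against H0 (elo0)."""
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    if var == 0.0:
        return 0.0  # all results identical: no information in this model
    var_of_mean = var / n
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var_of_mean)


def sprt_decision(wins, draws, losses, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05):
    """Stop when the LLR crosses a bound; otherwise keep playing games."""
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"
```

The test is evaluated after every game (or batch of games) and stops as soon as a bound is crossed, so the number of games is not fixed in advance. The verdict applies to the time control at which the games were played.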
Ciekce wrote: Sat Dec 07, 2024 11:36 am
if only there were normal TCs for this that didn't take 393292389 years for each test

if only there was a place that obsidian changes could be tested on distributed hardware

if only you were complaining about solved problems, instead of terrible intractable issues that completely block you from doing any sort of sane testing
Testing at long time control is the only way to know if an engine will perform at long time control.
Fast tests just produce useful guesses. Most of the time it translates.
Sometimes a change has to be reverted.
Of course, nobody wants to test at 40/2hrs except SSDF because it takes eons to complete the test.

I am not complaining about super-fast test speeds to make engine changes.
In general, we see the Elo values rising rapidly.
But testing at different time controls produces different information.
Perhaps this sort of information is not interesting to you. But it is interesting to me.
chesskobra
Posts: 355
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: Obsidian (DEV)

Post by chesskobra »

Besides what Dann said above about interest in test suites, I have another perspective. I ran a program called sts_rating.py by fsmosca with different engines. I now see that Obsidian 14.11 has a much improved score over previous versions. For example, on Obsidian 14.01 I consistently got an STS score in the 3430-3440 range, and now on 14.11 I get scores in the 3480-3490 range. This is interesting information for me. It does not matter whether the STS ratings obtained by the program closely match CCRL ratings. But it would be interesting if the STS ratings correlated well with game play ratings, or if they roughly preserved the rating ladder. If they don't correlate well, the test suite may be improved or the scoring may be calibrated better. The STS test has 1500 positions. One can imagine creating, say, 5000-10000 test positions on which an STS-like rating would quite accurately estimate the playing strength, or would correlate well with game play strength. Some authors have even written books about the "300 most important positions that every player must know", and creating such a test suite for engines is a similar goal.
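One way to check whether a suite "preserves the rating ladder" is a rank correlation between suite scores and game-play ratings. The sketch below computes Spearman's rank correlation in plain Python. The function names are hypothetical, and the numbers in the usage example are made-up placeholders, not results from any real engine or rating list.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Placeholder numbers only: suite scores and game-play ratings for four
# hypothetical engines, listed in the same order.
suite_scores = [3300, 3350, 3400, 3450]
game_ratings = [3000, 3100, 3050, 3200]
print(f"rank correlation: {spearman(suite_scores, game_ratings):.2f}")
```

A value near 1 would mean the suite orders engines the same way game play does, even if the absolute numbers differ from a rating list.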
cc2150dx
Posts: 407
Joined: Sat Nov 30, 2013 9:51 am
Full name: Jason Coombs

Re: Obsidian (DEV)

Post by cc2150dx »

Hey Chris, do you happen to have the Obsidian dev-14.14 that is playing over at https://www.chess.com/computer-chess-championship available for download?