ShashChess

Discussion of anything and everything relating to chess playing software and machines.


Uri Blass
Posts: 10374
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: ShashChess

Post by Uri Blass

amchess wrote: Wed Apr 24, 2024 2:40 pm Two observations: the openings are balanced (not unbalanced) which means a guarantee of a truckload of draws. The thinking time is 1 minute plus 1 second. Considering the balance of the two engines, their strength, and the randomness we found, this test proves absolutely nothing.
The test does not show which engine is stronger, but I disagree that it proves absolutely nothing.
I think it clearly proves that the rating difference between the engines is small (less than 20 Elo with more than 99% confidence), and this result is not obvious without testing.
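Uri's claim can be checked with a normal-approximation interval on the match score. The thread does not list the actual W/D/L figures, so the two results below are hypothetical. `elo_interval` is an illustrative helper and is not part of any testing tool. The sketch shows that, for 200 games, the width of a 99% interval depends heavily on how many games are decisive.

```python
import math

EPS = 1e-9  # keeps scores strictly inside (0, 1) so the log stays finite


def elo_from_score(score: float) -> float:
    """Logistic Elo difference that corresponds to an expected score."""
    score = min(max(score, EPS), 1.0 - EPS)
    return 400.0 * math.log10(score / (1.0 - score))


def elo_interval(wins: int, draws: int, losses: int, z: float = 2.576):
    """Return (estimate, lower, upper) Elo for a W/D/L result.

    Uses a normal approximation on the mean score and treats games as
    independent; z = 2.576 gives a two-sided 99% interval.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (elo_from_score(score),
            elo_from_score(score - z * se),
            elo_from_score(score + z * se))


# Hypothetical 200-game results with the same score but different draw rates.
for wdl in [(5, 190, 5), (20, 160, 20)]:
    est, lo, hi = elo_interval(*wdl)
    print(wdl, f"{est:+.1f} Elo, 99% interval [{lo:+.1f}, {hi:+.1f}]")
```

Under these assumptions, 95% draws give an interval of roughly ±14 Elo. With 80% draws, the interval widens to roughly ±28 Elo. Whether 200 games pins the gap below 20 Elo therefore depends on the draw rate actually observed.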
Pali
Posts: 27
Joined: Wed Dec 01, 2021 12:23 pm
Full name: Doruk Sekercioglu

Re: ShashChess

Post by Pali

DrEinstein wrote: Wed Apr 24, 2024 10:03 pm
Pali wrote: Wed Apr 24, 2024 9:14 pm
amchess wrote: Wed Apr 24, 2024 2:40 pm Two observations: the openings are balanced (not unbalanced) which means a guarantee of a truckload of draws. The thinking time is 1 minute plus 1 second. Considering the balance of the two engines, their strength, and the randomness we found, this test proves absolutely nothing.
https://github.com/amchess/ShashChess/wiki/Matches

It is exactly the same number of games you played in your own tests, so it's just as valid.
Yes, both matches had 200 games, but please read the user names more carefully. I have absolutely nothing to do with amchess or ShashChess; I only thought it would help some people understand what such results mean: nothing. amchess' comment is absolutely correct... more or less for both matches. However, amchess used UHO openings, which decrease the number of games needed for a given error margin, or decrease the margin if the number of games is kept constant. Note that increasing the TC doesn't change anything in the statistics!
Yes, amchess' comment is absolutely correct. In addition, my comment that the two tests are equally valid is also absolutely correct.

UHO books don't decrease the number of games necessary for a given error margin; they lower the draw rate.
Pali
Posts: 27
Joined: Wed Dec 01, 2021 12:23 pm
Full name: Doruk Sekercioglu

Re: ShashChess

Post by Pali

Uri Blass wrote: Wed Apr 24, 2024 10:24 pm
The test does not show which engine is stronger, but I disagree that it proves absolutely nothing.
I think it clearly proves that the rating difference between the engines is small (less than 20 Elo with more than 99% confidence), and this result is not obvious without testing.
I fully agree with your first sentence. It proves that 200 games is simply not a large enough sample, as the two conflicting results clearly show.
Uri Blass
Posts: 10374
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: ShashChess

Post by Uri Blass

Pali wrote: Wed Apr 24, 2024 10:27 pm
I fully agree with your first sentence. It proves that 200 games is simply not a large enough sample, as the two conflicting results clearly show.
Whether 200 games is large enough depends on the result.

200 games can clearly be enough to show that A is stronger than B
if you get 100 wins for A, 1 win for B, and 99 draws.
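Uri's point can be made concrete with the usual likelihood-of-superiority (LOS) approximation, which uses only wins and losses because draws say nothing about which side is stronger. The 30-24 split below is hypothetical and is chosen only to produce a 6-game lead. `likelihood_of_superiority` is an illustrative helper name.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that A is stronger than B.

    Normal approximation on decisive games only: draws are ignored.
    """
    if wins + losses == 0:
        return 0.5  # no decisive games, no information
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))


print(f"{likelihood_of_superiority(100, 1):.6f}")  # Uri's 100-1 example
print(f"{likelihood_of_superiority(30, 24):.3f}")  # hypothetical 6-game lead
```

The 100-1 result gives a value indistinguishable from 1. A 6-game lead out of 54 decisive games gives only about 0.79, which is far from conclusive.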
Pali
Posts: 27
Joined: Wed Dec 01, 2021 12:23 pm
Full name: Doruk Sekercioglu

Re: ShashChess

Post by Pali

Uri Blass wrote: Wed Apr 24, 2024 10:30 pm
Whether 200 games is large enough depends on the result.

200 games can clearly be enough to show that A is stronger than B
if you get 100 wins for A, 1 win for B, and 99 draws.
I again fully agree. In this specific case, 200 games is not large enough: amchess' test in unbalanced openings suggests a 6-game lead in favor of ShashChess, while Jouni's test in balanced openings suggests a 2-game lead in favor of SF.
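A two-sample z-test on the mean scores checks whether the two 200-game results actually conflict. The W/D/L splits below are hypothetical. They only reproduce a 6-game lead with a UHO-like draw rate and a 2-game deficit with a balanced-book draw rate, both from ShashChess' side. The test also assumes both matches measure the same quantity, which may not hold if UHO and balanced books reward different things.

```python
import math


def score_and_se(wins: int, draws: int, losses: int):
    """Mean score per game and its standard error for one match."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    return score, math.sqrt(var / n)


def difference_z(match_a, match_b) -> float:
    """z-statistic for the difference in mean score between two matches."""
    sa, se_a = score_and_se(*match_a)
    sb, se_b = score_and_se(*match_b)
    return (sa - sb) / math.sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical W/D/L from ShashChess' side: +6 with UHO, -2 with a balanced book.
uho = (60, 86, 54)
balanced = (20, 158, 22)
print(f"z = {difference_z(uho, balanced):.2f}")
```

With these assumed splits, z comes out well under 1, which is ordinary noise. The two results would then not contradict each other; both are simply too small to separate the engines.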
DrEinstein
Posts: 75
Joined: Wed Sep 15, 2021 8:50 pm
Full name: Albert Einstein

Re: ShashChess

Post by DrEinstein

Pali wrote: Wed Apr 24, 2024 10:26 pm
Yes, amchess' comment is absolutely correct. In addition, my comment that the two tests are equally valid is also absolutely correct.

UHO books don't decrease the number of games necessary for a given error margin; they lower the draw rate.
I really didn't want to calculate the errors for both matches. Errors depend on several things. Draw rate is one: the lower the draw rate, the more games are decisive (1-0 or 0-1), and the smaller the error bars are relative to the size of the gap. It is quite difficult to explain in fewer words and without formulas. But with UHO, definitely fewer games are needed to get the same "quality" of result. This is the main reason why people use these kinds of openings.
Edit: You are probably right that the errors are (more or less) constant, but the Elo difference increases with UHO by roughly a factor of 2, iirc.
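Pali and DrEinstein are describing two effects that pull in opposite directions. Fewer draws raise the per-game spread of results, while unbalanced openings can widen the score gap. A rough way to see which effect dominates is the number of games needed for the expected gap to reach a given z-score, N ≈ (z·σ / (s - 0.5))². The per-game frequencies below are invented for illustration, and `games_needed` is a hypothetical helper.

```python
import math


def games_needed(p_win: float, p_draw: float, p_loss: float, z: float = 1.96) -> int:
    """Approximate games for the expected score gap to reach z standard errors.

    p_win, p_draw, p_loss are per-game probabilities that sum to 1;
    z = 1.96 corresponds to a two-sided 95% level.
    """
    score = p_win + 0.5 * p_draw
    # Per-game variance of the result (1, 0.5 or 0).
    var = (p_win * (1.0 - score) ** 2
           + p_draw * (0.5 - score) ** 2
           + p_loss * score ** 2)
    gap = score - 0.5
    if gap == 0:
        raise ValueError("equal engines: no finite number of games suffices")
    return math.ceil((z * math.sqrt(var) / abs(gap)) ** 2)


# Invented per-game frequencies, for illustration only.
print(games_needed(0.06, 0.90, 0.04))  # balanced book: 90% draws, +1% score
print(games_needed(0.30, 0.44, 0.26))  # UHO-like: 44% draws, +2% score
print(games_needed(0.34, 0.40, 0.26))  # UHO-like: 40% draws, +4% score
```

With these made-up numbers, doubling the gap is not enough to offset the larger spread, but quadrupling it is. Whether UHO saves games therefore depends on how much the gap grows relative to the spread, not on the draw rate alone.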
fishpov
Posts: 103
Joined: Sat Mar 07, 2015 6:05 pm

Re: ShashChess

Post by fishpov

Hi Amchess,

I want to point out that in the Fritz GUI, the properties dialog of ShashChess 35.2
does not show these 2 options:
Fix montecarlo
Speedup WinProbability

Thanks
Uri Blass
Posts: 10374
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: ShashChess

Post by Uri Blass

DrEinstein wrote: Wed Apr 24, 2024 11:02 pm
I really didn't want to calculate the errors for both matches. Errors depend on several things. Draw rate is one: the lower the draw rate, the more games are decisive (1-0 or 0-1), and the smaller the error bars are relative to the size of the gap. It is quite difficult to explain in fewer words and without formulas. But with UHO, definitely fewer games are needed to get the same "quality" of result. This is the main reason why people use these kinds of openings.
Edit: You are probably right that the errors are (more or less) constant, but the Elo difference increases with UHO by roughly a factor of 2, iirc.
UHO is a different type of game, and in theory it is possible that the better engine with UHO is not the better engine without UHO.
Graham Banks
Posts: 41532
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: ShashChess

Post by Graham Banks

Uri Blass wrote: Thu Apr 25, 2024 1:29 am
UHO is a different type of game, and in theory it is possible that the better engine with UHO is not the better engine without UHO.
Quite correct.
If either CCRL or CEGT were to use UHO openings, it would skew and invalidate their previous testing.
That is why Stefan kept two lists: his main list and his UHO list.
gbanksnz at gmail.com
amchess
Posts: 336
Joined: Tue Dec 05, 2017 2:42 pm

Re: ShashChess

Post by amchess

Statistically, it does change, because at 1m+1s much more pruning is done.
In addition, ShashChess dynamically evaluates the quality of the position to direct the search; it should be obvious that it needs more time...
The 200 games in our match all start from unbalanced positions, not dead draws.
Indeed, your result shows that, to try to beat ShashChess, you have to use Stockfish at its peak performance (the bullet monster) and ShashChess at its weakest, and even that is not enough!
If you run the same test between Stockfish and itself, you will see obvious randomness in the final result.
In fact, the original search algorithm of Stockfish (and of any decent modern chess engine) inherently has its own randomness.
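A toy simulation of 200-game matches between two equally strong engines illustrates amchess' point about self-play randomness. The 60% draw rate is an assumption. Real engine games are not independent coin flips, so this shows only the binomial noise floor, not the engines' own search nondeterminism. `simulate_leads` is a hypothetical helper.

```python
import random
import statistics


def simulate_leads(n_games: int = 200, draw_rate: float = 0.6,
                   n_matches: int = 5000, seed: int = 1) -> list[int]:
    """W-L lead of engine A in many simulated matches between equal engines.

    Each game is independent: a draw with probability draw_rate,
    otherwise a win for either side with equal probability.
    """
    rng = random.Random(seed)
    leads = []
    for _ in range(n_matches):
        lead = 0
        for _ in range(n_games):
            r = rng.random()
            if r < draw_rate:
                continue  # draw: lead unchanged
            # Split the decisive probability mass evenly between the two sides.
            lead += 1 if r < draw_rate + (1.0 - draw_rate) / 2 else -1
        leads.append(lead)
    return leads


leads = simulate_leads()
print("std dev of lead:", round(statistics.pstdev(leads), 1))
print("share of matches with |lead| >= 6:",
      sum(abs(x) >= 6 for x in leads) / len(leads))
```

Under these assumptions, the lead between identical engines typically swings by about ±9 games (in theory √(200 × 0.4) ≈ 8.9). A 6-game lead is therefore within ordinary noise on its own.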