Seer 2.0.0

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: Seer 2.0.0

Post by connor_mcmonigle »

I've released a new version of Seer here: https://github.com/connormcmonigle/seer ... tag/v2.6.0. Under UHO self-play testing conditions, this release adds approximately 100 Elo.
User avatar
pohl4711
Posts: 2731
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: Seer 2.0.0

Post by pohl4711 »

Nice. I started the testrun. Because my UHO openings spread out the Elo differences, I expect 40-50 Elo of progress in my ratinglist.
Results in 3 days, if all works as expected. Perhaps Seer 2.6.0 can reach the final HCE Stockfish - that would be a milestone...
Let's hope the new Seer plays more aggressively - in my EAS-ratinglist, Seer 2.5.0 is only on rank 40 (of 45 engines)...

Stay tuned.
connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: Seer 2.0.0

Post by connor_mcmonigle »

pohl4711 wrote: Sat Oct 22, 2022 10:22 am Nice. I started the testrun. Because my UHO openings spread out the Elo differences, I expect 40-50 Elo of progress in my ratinglist.
Results in 3 days, if all works as expected. Perhaps Seer 2.6.0 can reach the final HCE Stockfish - that would be a milestone...
Let's hope the new Seer plays more aggressively - in my EAS-ratinglist, Seer 2.5.0 is only on rank 40 (of 45 engines)...

Stay tuned.
Awesome. Thanks for testing! I appreciate it.
Regarding your EAS list, I do believe your use of different pools in testing engines is somewhat of a methodological flaw. Stronger engines blunder into allowing their opponents advantageous sacrifices far less frequently than weaker engines do, which skews the results. However, I will admit you're at least actually measuring something, as the rankings correspond roughly with my intuitions about the relative aggression of the various engines on your list. I feel that Seer 2.6 tends to play a bit more aggressively than 2.5, so I'm definitely curious to see whether that feeling is reflected in its performance on your EAS list.
User avatar
pohl4711
Posts: 2731
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: Seer 2.0.0

Post by pohl4711 »

connor_mcmonigle wrote: Sat Oct 22, 2022 2:04 pm Regarding your EAS list, I do believe your use of different pools in testing engines is somewhat of a methodological flaw.
No, it is not. Look here:
https://talkchess.com/forum3/viewtopic.php?f=6&t=80813
connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: Seer 2.0.0

Post by connor_mcmonigle »

pohl4711 wrote: Sat Oct 22, 2022 2:55 pm
connor_mcmonigle wrote: Sat Oct 22, 2022 2:04 pm Regarding your EAS list, I do believe your use of different pools in testing engines is somewhat of a methodological flaw.
No, it is not. Look here:
https://talkchess.com/forum3/viewtopic.php?f=6&t=80813
I don't exactly see how that refutes my argument. If you placed Seer into that pool, it would prove incredibly dominant and therefore have a correspondingly inflated EAS rating -> the EAS rating is a function of the pool. This is a problem. Elo, on the other hand, is (at least theoretically) invariant to the opponent, which enables consistent results across differing pools. Perhaps it would be possible to formulate a similarly invariant metric for aggressiveness by incorporating the relative and absolute strength of the opponents (something like "Elo-normalized aggressiveness").
User avatar
pohl4711
Posts: 2731
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: Seer 2.0.0

Post by pohl4711 »

connor_mcmonigle wrote: Sun Oct 23, 2022 7:08 am
pohl4711 wrote: Sat Oct 22, 2022 2:55 pm
connor_mcmonigle wrote: Sat Oct 22, 2022 2:04 pm Regarding your EAS list, I do believe your use of different pools in testing engines is somewhat of a methodological flaw.
No, it is not. Look here:
https://talkchess.com/forum3/viewtopic.php?f=6&t=80813
I don't exactly see how that refutes my argument. If you placed Seer into that pool, it would prove incredibly dominant and therefore have a correspondingly inflated EAS rating -> the EAS rating is a function of the pool. This is a problem. Elo, on the other hand, is (at least theoretically) invariant to the opponent, which enables consistent results across differing pools. Perhaps it would be possible to formulate a similarly invariant metric for aggressiveness by incorporating the relative and absolute strength of the opponents (something like "Elo-normalized aggressiveness").
Of course, the opponents must not be so much stronger or weaker that Seer (or any other EAS-evaluated engine) would win or lose nearly all games. That's clear. But the EAS-ratinglist is built on the games of my SPCC-ratinglist, and there I test each engine vs. opponents around the (expected) strength of the engine being tested. The linked example shows clearly that, for the valid functionality of the EAS-scoring, it makes no difference whether the engine scores nearly 50% or only 42%. And except for Stockfish and KomodoDragon, all engines in the SPCC-ratinglist score between 40% and 60%. For Stockfish and KomodoDragon there are simply not enough strong opponents... (But look at KomodoDragon in the EAS-ratinglist: only rank 27, although KomodoDragon 3.1 has a score of 63.2% vs. its 11 opponents.)

This works because the EAS-tool operates on percentages, not absolute numbers. If an engine has more wins, it has to play more sacrifices for the same EAS-score than an engine with fewer wins, which needs fewer sacrifice wins for the same score. The same holds for the short-wins scoring and the bad-draws scoring. If an engine has 100 wins and played one rook sac, its score is the same as that of an engine with 1000 wins and 10 rook sacs. That's why the EAS-tool works, as long as an engine's opponents are not ridiculously stronger or weaker than the engine itself - and that has been part of my testing method since long before the EAS-ratinglist was introduced.
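The percentage idea described here can be sketched in a few lines (a simplified illustration only - `sacrifice_rate` is a hypothetical helper, not the actual EAS-tool):

```python
def sacrifice_rate(sac_wins: int, total_wins: int) -> float:
    """Share of an engine's won games that contained a sacrifice, in percent."""
    if total_wins == 0:
        return 0.0
    return 100.0 * sac_wins / total_wins

# 100 wins with 1 rook sac scores the same as 1000 wins with 10 rook sacs:
assert sacrifice_rate(1, 100) == sacrifice_rate(10, 1000) == 1.0
```

Because only the ratio matters, an engine that wins more games must also produce proportionally more sacrifice wins to keep the same score.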

Here again, the important posting of the linked thread:

Just look at Slow Chess 2.9:
Score 43.1%, like Berserk 10. And EAS Score is 95348, nearly 3x bigger than Berserk 10. Rank 4 in EAS ratinglist...

The EAS calculations are all done with percent values, for the reason that it should not matter how strong the engine plays and how high its score is!
From my website:
"Because a weaker player can play aggressively, too, the EAS-Score (= Engine Aggressiveness Score, see explanation below) and all other statistics are built on percentages of the won games of an engine/player. So, if an engine has won more games, it must win more short games or games with sacrifices. A weaker engine, which has won fewer games, needs fewer short wins or sacrifice wins for the same score."


Or look at the full-ratinglist, where all played games of the engines are stored and no Stockfish-dev-versions are included (below the full ratinglist, the full EAS-ratinglist follows):
https://www.sp-cc.de/files/spcc_full_list.txt

Here you have Berserk 9 with 13000 played games, no SF-devs as opponent and a score of 49.2% (nearly 50%):
17 Berserk 9 avx2 : 3647 5 5 13000 49.2% 3653 70.2%
And the EAS-score stays as bad as always (rank 158 of 166 entries!!!):
158 35269 10.09% 05.89% 26.71% Berserk 9 avx2

And SlowChess 2.9 has 19000 games here, with only 42.5% score:
26 Slow Chess 2.9 avx2 : 3585 4 4 17000 42.5% 3641 66.7%
And the EAS-score stays high (Rank 17 of 166):
17 92177 23.99% 23.54% 17.84% Slow Chess 2.9 avx2
User avatar
pohl4711
Posts: 2731
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: Seer 2.0.0

Post by pohl4711 »

Let's take a look at my full ratinglist and at Seer so far:

43 Seer 2.5.0 avx2 : 3526 20000 49.4% (Elo, number of games, overall score)
73 Seer 2.4.0 avx2 : 3444 16000 46.3%
99 Seer 2.3.0 avx : 3367 14000 54.0%

EAS-ratinglist:
87 Seer 2.3.0 avx :53830 16.77% 23.64% 25.56% (EAS-Score, percents of sacs, percents of short wins, percents of bad draws)
136 Seer 2.4.0 avx2 :41199 12.85% 19.46% 27.27%
146 Seer 2.5.0 avx2 :39084 11.57% 16.57% 28.93%

What we see here is typical: engines that get stronger mostly start playing less aggressively. Nevertheless, Seer 2.5.0 has a higher score vs. its opponents (49.4%) in the SPCC-ratinglist than Seer 2.4.0 (46.3%) has, but the EAS-score of Seer 2.5.0 is lower than the EAS-score of Seer 2.4.0...

So, if the pool of opponents is not completely crazy (the engine scoring nearly 0% or 100%) and the number of played games is high, the EAS-tool is working. And both conditions are true in my testings: each engine plays at least 7000 games, and each engine is tested vs. opponents which are relatively close in strength to the engine itself.

PS: The good news is that strong engines can play aggressively: Slow Chess 2.9 shows that this is possible... Slow Chess 2.9 is on rank 11 of the SPCC-ratinglist and on rank 4 of the EAS-ratinglist (even though its ratinglist-score is only 43%!)... If other strong engines, like Seer, Ethereal, Koivisto or Berserk, fail to get a high EAS-score, it is because they do not play aggressive chess, not because the EAS-tool is not working!
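The two conditions named in the post (at least 7000 games, and a score that is neither near 0% nor near 100%) could be expressed as a simple validity check. This is a hypothetical sketch: `eas_pool_ok` and the exact 40-60% band (used here as a stand-in for "not completely crazy") are assumptions, not part of the EAS-tool:

```python
def eas_pool_ok(games: int, score_pct: float,
                min_games: int = 7000,
                band: tuple = (40.0, 60.0)) -> bool:
    """Return True if an engine's results look usable for EAS scoring:
    enough games played, and a score neither near 0% nor near 100%."""
    lo, hi = band
    return games >= min_games and lo <= score_pct <= hi

# Seer 2.5.0 from the list above: 20000 games, 49.4% score.
assert eas_pool_ok(20000, 49.4)
# A lopsided pool (95% score) would not qualify, however many games were played.
assert not eas_pool_ok(20000, 95.0)
```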
User avatar
Rebel
Posts: 7312
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Seer 2.0.0

Post by Rebel »

pohl4711 wrote: Sun Oct 23, 2022 8:36 am What we see here is typical: engines that get stronger mostly start playing less aggressively.
That's a correct observation: nets get better because they become more balanced, and thus less aggressive. I think my second, reckless Rebel 14.1 net was a good example :D

I have been trying to make my new net more aggressive, from your tool I get:

Code: Select all

Rebel 15.1 - Number of all sacrifices         : 306 (06.12% of all games) (14.85% of won games) 
Rebel beta - Number of all sacrifices         : 375 (07.50% of all games) (17.93% of won games) 
What does it mean?
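The two percentages in that output can be reproduced from the raw counts. In this sketch, the totals of 5000 games and 2061 wins for Rebel 15.1 are back-calculated guesses that happen to match the quoted percentages, not figures from the tool:

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)

# Hypothetical totals chosen to match the quoted Rebel 15.1 line:
games, wins, sacs = 5000, 2061, 306
assert pct(sacs, games) == 6.12   # "06.12% of all games"
assert pct(sacs, wins) == 14.85   # "14.85% of won games"
```

So the beta net's higher figures (7.50% / 17.93%) mean a larger share of both its games and its wins involved sacrifices.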
90% of coding is debugging, the other 10% is writing bugs.
connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: Seer 2.0.0

Post by connor_mcmonigle »

pohl4711 wrote: Sun Oct 23, 2022 8:36 am Let's take a look at my full ratinglist and at Seer so far:

43 Seer 2.5.0 avx2 : 3526 20000 49.4% (Elo, number of games, overall score)
73 Seer 2.4.0 avx2 : 3444 16000 46.3%
99 Seer 2.3.0 avx : 3367 14000 54.0%

EAS-ratinglist:
87 Seer 2.3.0 avx :53830 16.77% 23.64% 25.56% (EAS-Score, percents of sacs, percents of short wins, percents of bad draws)
136 Seer 2.4.0 avx2 :41199 12.85% 19.46% 27.27%
146 Seer 2.5.0 avx2 :39084 11.57% 16.57% 28.93%

What we see here is typical: engines that get stronger mostly start playing less aggressively. Nevertheless, Seer 2.5.0 has a higher score vs. its opponents (49.4%) in the SPCC-ratinglist than Seer 2.4.0 (46.3%) has, but the EAS-score of Seer 2.5.0 is lower than the EAS-score of Seer 2.4.0...

So, if the pool of opponents is not completely crazy (the engine scoring nearly 0% or 100%) and the number of played games is high, the EAS-tool is working. And both conditions are true in my testings: each engine plays at least 7000 games, and each engine is tested vs. opponents which are relatively close in strength to the engine itself.

PS: The good news is that strong engines can play aggressively: Slow Chess 2.9 shows that this is possible... Slow Chess 2.9 is on rank 11 of the SPCC-ratinglist and on rank 4 of the EAS-ratinglist (even though its ratinglist-score is only 43%!)... If other strong engines, like Seer, Ethereal, Koivisto or Berserk, fail to get a high EAS-score, it is because they do not play aggressive chess, not because the EAS-tool is not working!
The relative Elo of the pool compared to the Elo of the given engine determines the score percentage, whereas the absolute Elo of the pool impacts the EAS rating. Therefore, two engines having comparable score percentages vs. their respective pools does not imply their EAS ratings can be compared, as the average absolute Elo of their respective pools could be wildly different. Only the EAS ratings of engines tested against similar pools of opponents can be meaningfully compared (and your rating list correctly shows SlowChess 2.9 is more "aggressive" than Seer, given SlowChess was tested against a pool with average absolute Elo at least as high as Seer's). I do think it is possible to remedy this with some normalization of the EAS rating such that the expected EAS rating is approximately invariant to the pool of opponents.
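One possible shape for such a normalization (entirely hypothetical - the exponential form, the anchor Elo, and the doubling constant are illustrative assumptions, not part of the EAS-tool):

```python
def normalized_eas(raw_eas: float, pool_mean_elo: float,
                   anchor_elo: float = 3500.0,
                   elo_per_doubling: float = 400.0) -> float:
    """Scale a raw EAS score toward a common anchor Elo, so scores earned
    against pools of different absolute strength become roughly comparable:
    a score earned against a stronger pool is weighted up, and one earned
    against a weaker pool is weighted down."""
    return raw_eas * 2.0 ** ((pool_mean_elo - anchor_elo) / elo_per_doubling)

# The same raw score, earned against a 400-Elo-stronger pool, counts double.
assert normalized_eas(50000, 3900.0) == 2 * normalized_eas(50000, 3500.0)
```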
Uri Blass
Posts: 10803
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Seer 2.0.0

Post by Uri Blass »

pohl4711 wrote: Sun Oct 23, 2022 8:18 am
connor_mcmonigle wrote: Sun Oct 23, 2022 7:08 am
pohl4711 wrote: Sat Oct 22, 2022 2:55 pm
connor_mcmonigle wrote: Sat Oct 22, 2022 2:04 pm Regarding your EAS list, I do believe your use of different pools in testing engines is somewhat of a methodological flaw.
No, it is not. Look here:
https://talkchess.com/forum3/viewtopic.php?f=6&t=80813
I don't exactly see how that refutes my argument. If you placed Seer into that pool, it would prove incredibly dominant and therefore have a correspondingly inflated EAS rating -> the EAS rating is a function of the pool. This is a problem. Elo, on the other hand, is (at least theoretically) invariant to the opponent, which enables consistent results across differing pools. Perhaps it would be possible to formulate a similarly invariant metric for aggressiveness by incorporating the relative and absolute strength of the opponents (something like "Elo-normalized aggressiveness").
Of course, the opponents must not be so much stronger or weaker that Seer (or any other EAS-evaluated engine) would win or lose nearly all games. That's clear. But the EAS-ratinglist is built on the games of my SPCC-ratinglist, and there I test each engine vs. opponents around the (expected) strength of the engine being tested. The linked example shows clearly that, for the valid functionality of the EAS-scoring, it makes no difference whether the engine scores nearly 50% or only 42%. And except for Stockfish and KomodoDragon, all engines in the SPCC-ratinglist score between 40% and 60%. For Stockfish and KomodoDragon there are simply not enough strong opponents... (But look at KomodoDragon in the EAS-ratinglist: only rank 27, although KomodoDragon 3.1 has a score of 63.2% vs. its 11 opponents.)

This works because the EAS-tool operates on percentages, not absolute numbers. If an engine has more wins, it has to play more sacrifices for the same EAS-score than an engine with fewer wins, which needs fewer sacrifice wins for the same score. The same holds for the short-wins scoring and the bad-draws scoring. If an engine has 100 wins and played one rook sac, its score is the same as that of an engine with 1000 wins and 10 rook sacs. That's why the EAS-tool works, as long as an engine's opponents are not ridiculously stronger or weaker than the engine itself - and that has been part of my testing method since long before the EAS-ratinglist was introduced.

Here again, the important posting of the linked thread:

Just look at Slow Chess 2.9:
Score 43.1%, like Berserk 10. And EAS Score is 95348, nearly 3x bigger than Berserk 10. Rank 4 in EAS ratinglist...

The EAS calculations are all done with percent values, for the reason that it should not matter how strong the engine plays and how high its score is!
From my website:
"Because a weaker player can play aggressively, too, the EAS-Score (= Engine Aggressiveness Score, see explanation below) and all other statistics are built on percentages of the won games of an engine/player. So, if an engine has won more games, it must win more short games or games with sacrifices. A weaker engine, which has won fewer games, needs fewer short wins or sacrifice wins for the same score."


Or look at the full-ratinglist, where all played games of the engines are stored and no Stockfish-dev-versions are included (below the full ratinglist, the full EAS-ratinglist follows):
https://www.sp-cc.de/files/spcc_full_list.txt

Here you have Berserk 9 with 13000 played games, no SF-devs as opponent and a score of 49.2% (nearly 50%):
17 Berserk 9 avx2 : 3647 5 5 13000 49.2% 3653 70.2%
And the EAS-score stays as bad as always (rank 158 of 166 entries!!!):
158 35269 10.09% 05.89% 26.71% Berserk 9 avx2

And SlowChess 2.9 has 19000 games here, with only 42.5% score:
26 Slow Chess 2.9 avx2 : 3585 4 4 17000 42.5% 3641 66.7%
And the EAS-score stays high (Rank 17 of 166):
17 92177 23.99% 23.54% 17.84% Slow Chess 2.9 avx2
I think that the way you calculate the Engine Aggressiveness Score is unfair.

Imagine an engine that has 1 win out of 100 games, and the win is with a sacrifice (the rest of the games are draws and losses). The engine improves, and against the same opponents it has 10 wins out of 100 games, of which 2 wins are with sacrifices (one of them the same game that the older version won with a sacrifice).

The engine is more aggressive from my point of view, because it wins more games with sacrifices, but when you look at the percentage of won games you do not see it.
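Uri's scenario in numbers (a minimal sketch of the percent-of-won-games metric; the counts come from his hypothetical example):

```python
def sac_share_of_wins(sac_wins: int, wins: int) -> float:
    """Percentage of an engine's wins that involved a sacrifice."""
    return 100.0 * sac_wins / wins

old_version = sac_share_of_wins(1, 1)    # 1 win, and it was a sacrifice: 100%
new_version = sac_share_of_wins(2, 10)   # 2 sacrifice wins out of 10: only 20%
# The improved version wins more sacrifice games in absolute terms,
# yet the percentage-based metric ranks it as less aggressive.
assert new_version < old_version
```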