patricia devlog

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Guenther
Posts: 4622
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: patricia devlog

Post by Guenther »

Whiskers wrote: Thu Apr 04, 2024 5:45 pm
Guenther wrote: Thu Apr 04, 2024 8:55 am
Whiskers wrote: Thu Apr 04, 2024 3:18 am New baseline pool for Patricia. These engines are about equal strength or stronger, with the exception of poor Wahoo (I didn't expect it to perform that poorly!)

Code: Select all

Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    290153  31.42%  29.50%  08.47%   61   Patricia  
   2    126973  16.04%  22.23%  15.86%   64   Svart 6  
   3    116043  09.46%  31.21%  24.78%   60   Wahoo v4  
   4    110608  12.72%  26.20%  20.54%   64   Leorik 3.0.1  
   5    109185  13.46%  28.64%  21.11%   65   Velvet 3.1  
   6     79544  08.61%  18.72%  25.89%   68   StockNemo 5.7  
   7     52576  09.23%  08.87%  21.35%   83   Princhess 0.16  
-------------------------------------------------------------------
This is a very odd list to me. Velvet and Princhess so low? How? Princhess is one of the best engines I've seen in terms of playing style. Its draw rate on CCRL, at 31%, is lower than that of every single engine rated higher than 2865. And Velvet is, well, Velvet.

Patricia's no longer able to wipe opponents out at the speed of light; as most of its sacrifices are at least somewhat unsound, it often wins despite the sacrifices rather than because of them against these stronger opponents. However, the EAS score is still way higher than any other engine in the gauntlet, so I am satisfied with her performance and will start development of 2.1/3.0 using this pool.

I really can't shake the feeling that optimizing for a particular pool of engines isn't the best idea... maybe I should test using a gauntlet against more (like 20) engines?
Would you mind also showing the real scores for comparison?
Elo scores, you mean? Sure, I'll do it as soon as I get home. Patricia is pretty much right in the middle strength-wise.
Just the normal score (success in percentage, or points/games, or whatever).
https://rwbc-chess.de

trollwatch:
Talkchess nowadays is a joke - it is full of trolls/idiots/people stuck in the pleistocene > 80% of the posts fall into this category...
chesskobra
Posts: 175
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: patricia devlog

Post by chesskobra »

As I suggested in another thread, it would be interesting to look at points scored per 100 moves, because engines scoring short wins would do well on such metrics.
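A minimal sketch of the metric suggested above, assuming each game is summarized as a (points, moves) pair for the engine in question. The function name and the sample numbers are made up for illustration and are not taken from the gauntlet.

```python
def points_per_100_moves(games):
    """Return points scored per 100 moves played.

    `games` is an iterable of (points, moves) pairs, where points is
    1.0 for a win, 0.5 for a draw and 0.0 for a loss, and moves is the
    game length in full moves.
    """
    games = list(games)
    total_points = sum(points for points, _ in games)
    total_moves = sum(moves for _, moves in games)
    if total_moves == 0:
        raise ValueError("no moves played")
    return 100.0 * total_points / total_moves


# Hypothetical sample: a short win, a long win and a draw
# (2.5 points over 200 moves in total).
sample = [(1.0, 30), (1.0, 90), (0.5, 80)]
print(points_per_100_moves(sample))
```

With the same number of points, an engine that finishes its games in fewer moves gets a higher value, which is why short wins are rewarded.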
Whiskers
Posts: 163
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

Guenther wrote: Thu Apr 04, 2024 5:53 pm
Just the normal score (success in percentage, or points/games, or whatever).
This is 1800 games per engine; I did it this way because the power went out halfway through my last test, so I had to start the cutechess script again.

Book is 4moves-noob.epd (slightly unbalanced), time control is 10+0.1.

Code: Select all

Rank Name                          Elo     +/-   Games   Score    Draw 
   1 Leorik 3.0.1                  123      14    1800   67.1%   31.7% 
   2 Svart 6                       104      14    1800   64.5%   29.3% 
   3 StockNemo 5.7                  41      13    1800   55.9%   30.8% 
   4 Patricia                      -18      14    1800   47.4%   24.9% 
   5 Princhess 0.15                -68      14    1800   40.4%   28.1% 
   6 Velvet 3.1                    -69      14    1800   40.3%   26.9% 
   7 Wahoo v4                     -111      15    1800   34.5%   21.4%
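
For anyone who wants to cross-check the Elo column against the Score column, here is a sketch using the usual logistic Elo model. It is an assumption that the tool which produced the table uses exactly this formula, and because the printed scores are rounded to one decimal place, the result can be off by about a point.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Score fractions copied from the table above.
for name, score in [("Svart 6", 0.645), ("Patricia", 0.474), ("Wahoo v4", 0.345)]:
    print(f"{name}: {elo_from_score(score):+.0f}")
```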
   
Guenther
Posts: 4622
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: patricia devlog

Post by Guenther »

Thanks for posting the real scores too. After checking the opponents in CCRL Blitz, it seems Wahoo's result was just to be expected.
Patricia 2.0 should already be around 3200 (CCRL scale) now, and congrats on the successful tuning towards aggressiveness!

Code: Select all

Name             CCRL   Games   Err
------------------------------------
StockNemo 5.7    3285    1397    15
Leorik 3.0       3284    1047    17
Svart 6          3261    1199    16
(Patricia 2.0)   ----    ----  ----
Velvet 3.1       3161     900    17
Princhess 0.16   3154     850    20
Wahoo v4         3081    1240    16
Whiskers
Posts: 163
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

Firstly, I fixed a couple of bugs that were causing some undefined behavior (occasional attempts to access an index one greater than what the array had storage for). I could never get Patricia to crash on either of my machines, but now that it actually passes all Valgrind/UBSan checks I feel safe releasing a bugfix version.

I'm generating data for a friend and can't do testing at the moment, so I decided to try my hand at skill levels. The first and most obvious idea is to just limit depth.
At 60 + 0.6, depth 2 Patricia is about 1100-1200 CCRL, depth 3 is 1400-1500, and depth 4 is about 1700. When I play against Patricia, its mistakes and blunders feel relatively normal, like my own on a bad day in blitz (until she starts losing, at least, at which point she starts throwing all her pieces into the garbage!). Depth 1 cannot spot mates in 1 if the mating move is a quiet one, so I discarded that idea.

The next idea is to limit nodes; above all, I want to avoid putting in code to intentionally force blunders. Patricia already makes enough funny mistakes at full strength :D
Whiskers
Posts: 163
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

Patricia now has several skill levels, which were roughly calibrated by testing at 60 + 0.6 (at shorter time controls these skill levels will be stronger; at longer time controls they will be weaker). The ratings are loosely anchored to the CCRL 40|15 rating minus 50.

These skill levels use the UCI_Elo option and are:
1200 Elo: depth 2 Patricia
1400 Elo: 1000 node Patricia
1600 Elo: 1600 node Patricia
1800 Elo: depth 4 Patricia
2000 Elo: 4000 node Patricia
2200 Elo: 8000 node Patricia
2600 Elo: 64000 node Patricia
3200 Elo (or any other value that doesn't fit in the above categories): Full strength Patricia
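
The mapping above can be sketched as a simple lookup table. This is an illustration only, not Patricia's actual code: the names SearchLimit, SKILL_LEVELS and limit_for_uci_elo are hypothetical, and only the Elo values and limits are taken from the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchLimit:
    depth: Optional[int] = None  # maximum search depth, None = unlimited
    nodes: Optional[int] = None  # maximum node count, None = unlimited


# Values copied from the list above; the table layout itself is hypothetical.
SKILL_LEVELS = {
    1200: SearchLimit(depth=2),
    1400: SearchLimit(nodes=1000),
    1600: SearchLimit(nodes=1600),
    1800: SearchLimit(depth=4),
    2000: SearchLimit(nodes=4000),
    2200: SearchLimit(nodes=8000),
    2600: SearchLimit(nodes=64000),
}

FULL_STRENGTH = SearchLimit()


def limit_for_uci_elo(uci_elo):
    """Return the search limit for a UCI_Elo value.

    Any value not listed in the table falls back to full strength.
    """
    return SKILL_LEVELS.get(uci_elo, FULL_STRENGTH)


print(limit_for_uci_elo(1800))
print(limit_for_uci_elo(3200))
```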

I played a couple test games and had a friend play some test games against the weaker versions of Patricia. She put up a good fight and played in quite a human manner, except for perhaps being a bit too tactically sharp and for having an unfortunate tendency to throw all her pieces away when losing.

I also implemented go depth and go nodes; next up are a proper PV and multithreading.
Whiskers
Posts: 163
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

Multithreading support has now been added to Patricia, with up to 1024 threads supported. In doing so, however, I discovered something that I need some time to think about.

When I first tested it, multithreading did not give nearly as much Elo as I thought it would: just 60 Elo for 4 threads, on an unbalanced book. After verifying that there was nothing fishy locally, I wondered if it might be due to the sacrifices Patricia plays. So I removed the sacrifice bonuses in eval, and lo and behold, suddenly multithreading was giving the gains I had expected.

It seems that I've reached a point in strength where most of Patricia's losses come from the ridiculous sacrifices she's forced to make. A lot of Patricia's sacrifices have great compensation, but some are basically just giving the opponent piece odds. All the threads in the world can't stop Patricia from playing three bad sacrifices in a row and getting into a losing position.

It seems, then, that for Patricia 3 I'm going to have to take a step back and figure out how to get her to play into positions where good (especially close to best move) sacrifices are plentiful. In some positions there just are no good sacrifices to be played, and in those scenarios I want to code Patricia so that she steers the game into more exciting waters and then sacrifices, instead of immediately forcing a "sacrifice" that just hangs a pawn.
Whiskers
Posts: 163
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

I decided to extract data from SPCC testing to get better data for retraining Patricia's net. To do this, I grabbed Patricia's games, as well as all the games played in SPCC testing (found on the site), used the interesting wins filter to search for, well, interesting games, used pgn-extract to grab the FENs (with best moves and scores) from the PGNs, and wrote a script to perform filtering + conversion on those FENs. This yielded about 8.25M "interesting" FENs; if retraining Patricia's network on them yields positive results, I'll probably grab CCRL games as well.

For testing the newly retrained net I'm going to remove the features that directly force sacrifices in Patricia. I feel like they're a bit unhealthy for how she plays, especially as the bonuses get *huge* for some sacrifices. I think I'm also not going to let Patricia give bonuses for sacrifices if she's losing, because sacrifices in losing positions are really just throwing pieces in the garbage and are not conducive whatsoever to her style of play.
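
The "no sacrifice bonus when losing" idea could look roughly like the sketch below. Everything here is hypothetical: the function name, the centipawn scale and the -100 cutoff are assumptions for illustration, not values from Patricia.

```python
def sacrifice_bonus(base_bonus, static_eval_cp, losing_threshold_cp=-100):
    """Return the style bonus for a sacrifice, or 0 when the side is losing.

    `static_eval_cp` is the evaluation in centipawns from the point of
    view of the side making the sacrifice. `losing_threshold_cp` is a
    made-up cutoff below which the position counts as losing.
    """
    if static_eval_cp < losing_threshold_cp:
        return 0  # losing: do not reward giving away more material
    return base_bonus


print(sacrifice_bonus(50, 30))    # roughly equal position: bonus kept
print(sacrifice_bonus(50, -300))  # clearly losing position: bonus dropped
```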
pohl4711
Posts: 2460
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: patricia devlog

Post by pohl4711 »

Whiskers wrote: Mon Apr 08, 2024 7:14 am Patricia now has several skill levels, which were roughly determined by testing at 60 + 0.6 (at shorter time controls these skill levels will be stronger, at longer time controls they will be weaker). The ratings are somewhat anchored to CCRL 40|15 rating - 50.

My 2 cents: fixed nodes or fixed depth are a good way to reduce strength without damaging the playing style. But please note that in the endgame the number of nodes (or the max depth) must be increased, otherwise there is a huge Elo loss in the endgame. As I mentioned before, the TheKing-Element Chesscomputer offers limited-node levels too, but doubles and quadruples the node limit as the board gets more and more empty (= endgame).
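
A rough sketch of the doubling and quadrupling idea described above. The piece-count thresholds (16 and 8) are invented for illustration; they are not taken from the TheKing-Element levels or from Patricia.

```python
def scaled_node_limit(base_nodes, piece_count):
    """Scale a node budget up as the board empties.

    `piece_count` is the total number of pieces and pawns on the board,
    kings included (32 at the start of the game). The thresholds 16 and
    8 are illustrative guesses.
    """
    if piece_count <= 8:
        return base_nodes * 4  # nearly empty board: quadruple the budget
    if piece_count <= 16:
        return base_nodes * 2  # half-empty board: double the budget
    return base_nodes          # opening and middlegame: unchanged


for pieces in (32, 16, 8):
    print(pieces, scaled_node_limit(1000, pieces))
```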
Whiskers
Posts: 163
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

pohl4711 wrote: Wed Apr 10, 2024 2:14 pm
My 2 cents: fixed nodes or fixed depth are a good way to reduce strength without damaging the playing style. But please note that in the endgame the number of nodes (or the max depth) must be increased, otherwise there is a huge Elo loss in the endgame. As I mentioned before, the TheKing-Element Chesscomputer offers limited-node levels too, but doubles and quadruples the node limit as the board gets more and more empty (= endgame).
I definitely understand this for max depth (and will come back around to revising Patricia's skill levels before releasing), but why do endgames need more nodes? Thanks to the transposition table, engines can hit very high depths with comparatively few nodes.