The PSTs of Carnivor

abulmo2
Posts: 433
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: The PSTs of Carnivor

Post by abulmo2 »

Here is a comparison of different PST & material weights used in dumber.
dumber-pesto 1.2 = Pesto's weights.
dumber-1.2 = Dumber's default weights.
dumber-car 1.2 = Carnivor's weights.

Code: Select all

  # PLAYER                      : RATING  ERROR   POINTS  PLAYED    (%)
   1 dumber-pesto 1.2            : 2006.6   25.3    586.5     700   83.8%
   2 dumber-1.2                  : 1939.7   24.0    541.5     700   77.4%
   3 tscp-pesto                  : 1804.3   21.4    435.0     700   62.1%
   4 dumber-car 1.2              : 1735.8   23.0    377.0     700   53.9%
   5 tscp181                     : 1692.0   21.2    340.0     700   48.6%
   6 rustic.alpha1               : 1623.7   22.8    284.5     700   40.6%
   7 MinimalChessEngine 0.3      : 1549.1   26.1    229.0     700   32.7%
   8 MinimalChessEngine 0.2.1    :  888.8   86.8      6.5     700    0.9%
Tournament settings: 40/0:05+0.5, played on cutechess under Linux.
Ratings are computed by ordo and anchored so that tscp181 is at 1692 (its CCRL 40/15 rating).
Richard Delorme
Mike Sherwin
Posts: 868
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: The PSTs of Carnivor

Post by Mike Sherwin »

abulmo2 wrote: Fri Mar 12, 2021 7:49 am Here is a comparison of different PST & material weights used in dumber.
Wow thanks! That is very informative. Makes me think that I can improve CAR PST even more. :D
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: The PSTs of Carnivor

Post by mvanthoor »

abulmo2 wrote: Fri Mar 12, 2021 7:49 am Here is a comparison of different PST & material weights used in dumber.
This jibes perfectly with my own tests, even though I used BayesElo.

Rustic Alpha 1 is at the expected rating; it's 1677 in CCRL, but you have mixed in 2 versions of TSCP... one of which is stronger than the one on CCRL. Rustic Alpha 1 plays badly against TSCP, and underperforms against it by about 50 Elo, compared to other engines. And now it does it twice.

Cool to see that TSCP can be increased by over 100 Elo just by weights, and that the difference between the lowest and highest dumber is about 270 Elo. So, a tuned and tapered evaluation _can_ yield up to 250-300 points, depending on how good the original PSTs were.
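
For readers unfamiliar with the term, a tapered evaluation blends a middlegame score and an endgame score according to how much material is left on the board. Below is a minimal Python sketch of the idea. The 0..24 phase scale and the per-piece phase weights are a common convention that I am assuming here; they are not taken from Dumber, TSCP, Rustic or any other engine in this thread.

```python
# Hypothetical phase weights per piece type; pawns and kings do not count.
PHASE_WEIGHT = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24  # 4 knights + 4 bishops + 4 rooks * 2 + 2 queens * 4


def game_phase(piece_counts):
    """Return the phase in 0..MAX_PHASE from a dict like {"N": 4, "Q": 2}."""
    phase = sum(PHASE_WEIGHT[p] * n
                for p, n in piece_counts.items() if p in PHASE_WEIGHT)
    return min(phase, MAX_PHASE)  # promotions can push the raw sum above 24


def tapered(mg_score, eg_score, phase):
    """Linearly interpolate between the middlegame and endgame scores."""
    return (mg_score * phase + eg_score * (MAX_PHASE - phase)) // MAX_PHASE
```

With all pieces on the board the phase is 24 and only the middlegame tables count; with bare kings and pawns the phase is 0 and only the endgame tables count.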

Nice. I'll be happy to see what tuning, and then tapering/tuning does for Rustic. (With some extra move ordering in the mix, which it doesn't yet have.) At that point, we'll be at Rustic 5.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: The PSTs of Carnivor

Post by lithander »

abulmo2 wrote: Fri Mar 12, 2021 7:49 am Here is a comparison of different PST & material weights used in dumber.
That's really interesting! Thanks for sharing! :)

MMC doesn't use tapered eval and I had already planned to try Pesto over the weekend because it seems to perform incredibly well. But why is it so good? You said in the other thread your weights are "tuned and tapered" and isn't that exactly what Pesto does, too? I don't understand why there's such a big difference between Pesto's "tuned and tapered" evaluation and anybody else's. Any ideas?
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: The PSTs of Carnivor

Post by mvanthoor »

lithander wrote: Fri Mar 12, 2021 11:20 am That's really interesting! Thanks for sharing! :)

MMC doesn't use tapered eval and I had already planned to try Pesto over the weekend because it seems to perform incredibly well. But why is it so good? You said in the other thread your weights are "tuned and tapered" and isn't that exactly what Pesto does, too? I don't understand why there's such a big difference between Pesto's "tuned and tapered" evaluation and anybody else's. Any ideas?
Think of the relationship between the Dumb and Dumber engines. Dumber is a stripped version of Dumb.

Many people are referring to "PeSTO" as piece square tables; that is correct, but they didn't appear out of thin air. They are the PSTs of the engine PeSTO.

PeSTO is a stripped version of RofChade, and that is a massively strong engine. I _assume_ the PeSTO tables have been tuned against evaluations obtained with RofChade's evaluation function, and thus they include a lot of RofChade's positional knowledge (in a simplified form, obviously, because you can't fit everything in PSTs).
abulmo2
Posts: 433
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: The PSTs of Carnivor

Post by abulmo2 »

mvanthoor wrote: Fri Mar 12, 2021 1:05 pm
Think of the relationship between the Dumb and Dumber engines. Dumber is a stripped version of Dumb.

Many people are referring to "PeSTO" as piece square tables; that is correct, but they didn't appear out of thin air. They are the PSTs of the engine PeSTO.

PeSTO is a stripped version of RofChade, and that is a massively strong engine. I _assume_ the PeSTO tables have been tuned against evaluations obtained with RofChade's evaluation function, and thus they include a lot of RofChade's positional knowledge (in a simplified form, obviously, because you can't fit everything in PSTs).
I think the story is quite different: http://talkchess.com/forum3/viewtopic.php?f=2&t=68311
Originally Rofchade had only PSTs & material, and Pesto is more a derivative of early Rofchade than a stripped version. Somewhere in the discussion about Rofchade, Ronald said it uses a set of positions, quiet-labeled.epd, created by Alexandru Mosoi.
abulmo2
Posts: 433
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: The PSTs of Carnivor

Post by abulmo2 »

lithander wrote: Fri Mar 12, 2021 11:20 am But why is it so good? You said in the other thread your weights are "tuned and tapered" and isn't that exactly what Pesto does, too? I don't understand why there's such a big difference between Pesto's "tuned and tapered" evaluation and anybody else's. Any ideas?
Pesto's weights are fully asymmetrical; Dumber's are not (only the king table is asymmetrical). The set of games / positions used for training is different, the tuning algorithm is different, etc. So many differences can explain why one is better than the other. I will probably retune my weights one day to see if I can get something better. The only thing I know for sure is that human hand-written evaluations are incredibly wrong and weak. Our brains understand nothing about chess ;-)
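
To make the symmetry point concrete: a left-right symmetric table only needs four values per rank, mirrored across the d/e files, while a fully asymmetrical table stores all eight. Here is a small Python sketch of that difference. The numbers are made up for illustration and are neither Dumber's nor Pesto's actual weights, and the helper names are hypothetical.

```python
def expand_symmetric(half_table):
    """Expand 8 ranks x 4 files (a-d) into 8 ranks x 8 files by mirroring."""
    return [row + row[::-1] for row in half_table]


def is_symmetric(table):
    """True if every rank reads the same from the a-file and from the h-file."""
    return all(row == row[::-1] for row in table)


# Made-up knight values for files a-d, one row per rank.
half = [[-50, -40, -30, -30]] + [[-40, -20, 0, 5]] * 6 + [[-50, -40, -30, -30]]
full = expand_symmetric(half)

# An asymmetrical table can, for example, treat the kingside differently.
asymmetric = [row[:] for row in full]
asymmetric[0][6] = 10  # made-up bonus on the g-file of the first row
```

A symmetric table gives the tuner 32 free values per piece instead of 64, so it has less room to fit the data but also fewer parameters to get wrong.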
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: The PSTs of Carnivor

Post by mvanthoor »

abulmo2 wrote: Fri Mar 12, 2021 2:44 pm The only thing I know for sure is that human hand-written evaluations are incredibly wrong and weak. Our brains understand nothing about chess ;-)
The problem is that we as humans can't determine if "putting a rook on an open file" should be valued at +0.12, or +0.17...

Take this into account for many parameters, such as the bishop pair (should this be +0.10, or +0.11?) and you can end up with an evaluation that is much stronger when tuned by a computer, because it can test a bazillion versions one after another.
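
One common way to let a computer settle such values is Texel-style tuning: map the static evaluation to an expected game result with a sigmoid, then pick the parameter value that minimises the squared error over a set of labelled positions. The toy Python sketch below tunes a single bonus by brute force. The scaling constant K, the data and all function names are invented for illustration; this is not a description of how any engine in this thread was actually tuned.

```python
K = 1.0 / 400.0  # hypothetical scaling constant; real tuners usually fit it first


def expected_score(eval_cp):
    """Map a centipawn evaluation to an expected result in 0..1."""
    return 1.0 / (1.0 + 10.0 ** (-K * eval_cp))


def mean_squared_error(bonus, positions):
    """positions: list of (base_eval_cp, feature_count, game_result)."""
    total = 0.0
    for base, count, result in positions:
        total += (result - expected_score(base + bonus * count)) ** 2
    return total / len(positions)


def tune(positions, candidates):
    """Brute-force search: return the candidate bonus with the lowest error."""
    return min(candidates, key=lambda b: mean_squared_error(b, positions))


# Invented data: (eval without the bonus, rooks on open files, result for White)
data = [(30, 1, 1.0), (-10, 1, 0.5), (50, 0, 1.0), (0, -1, 0.0), (20, 2, 1.0)]
best = tune(data, range(0, 41))
```

A real tuner works on hundreds of thousands of positions and many parameters at once, usually with local search or gradient descent rather than an exhaustive loop.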
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: The PSTs of Carnivor

Post by mvanthoor »

abulmo2 wrote: Fri Mar 12, 2021 2:36 pm
I think the story is quite different http://talkchess.com/forum3/viewtopic.php?f=2&t=68311
Originally Rofchade had only PSTs & material, and Pesto is more a derivative of early Rofchade than a stripped version. Somewhere in the discussion about Rofchade, Ronald said it uses a set of positions, quiet-labeled.epd, created by Alexandru Mosoi.
Thanks for the correction. I've always believed, on the basis of other posts, that PeSTO was a stripped RofChade, instead of an early version of RofChade (later renamed and developed separately).
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: The PSTs of Carnivor

Post by lithander »

I've run a few more tests with the Carnivor PSTs with the new King Table.

Code: Select all

MinimalChess 0.3 vs tscp181: 225 - 588 - 187  [0.319] 1000
Elo difference: -132.1 +/- 20.5, LOS: 0.0 %, DrawRatio: 18.7 %
MinimalChess 0.3 Carnivor vs tscp181: 234 - 612 - 154  [0.311] 1000
Elo difference: -138.2 +/- 21.1, LOS: 0.0 %, DrawRatio: 15.4 %
-6 ELO

MinimalChess 0.3 vs Rustic: 242 - 561 - 197  [0.341] 1000
Elo difference: -114.8 +/- 20.1, LOS: 0.0 %, DrawRatio: 19.7 %
MinimalChess 0.3 Carnivor vs Rustic: 217 - 576 - 207  [0.321] 1000
Elo difference: -130.5 +/- 20.1, LOS: 0.0 %, DrawRatio: 20.7 %
-15 ELO

MinimalChess 0.3 vs Shallow Blue 2.0.0 64-bit: 258 - 496 - 246  [0.381] 1000
Elo difference: -84.3 +/- 19.1, LOS: 0.0 %, DrawRatio: 24.6 %
MinimalChess 0.3 Carnivor vs Shallow Blue 2.0.0 64-bit: 214 - 618 - 168  [0.298] 1000
Elo difference: -148.8 +/- 21.1, LOS: 0.0 %, DrawRatio: 16.8 %
-64 ELO

MinimalChess 0.3 vs FracTalv1.0: 283 - 313 - 404  [0.485] 1000
Elo difference: -10.4 +/- 16.6, LOS: 11.0 %, DrawRatio: 40.4 %
MinimalChess 0.3 Carnivor vs FracTalv1.0: 308 - 320 - 372  [0.494] 1000
Elo difference: -4.2 +/- 17.1, LOS: 31.6 %, DrawRatio: 37.2 %
+6 ELO

Sadly it seems like the default PSTs were doing a little better against all engines but FracTalv1.0. Of course, despite running 1000 games each, the error bars are still quite large, often larger than the difference between the PSTs.
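
For anyone who wants to check such numbers by hand, here is a short Python sketch that turns a win/loss/draw record into an Elo difference with an approximate 95% margin. It uses the usual logistic Elo formula and a normal approximation for the mean score. I am assuming this is close to what the match tool does, because it reproduces the figures above to within rounding, but I have not checked its source.

```python
import math


def elo_from_score(score):
    """Convert a score fraction (0 < score < 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, losses, draws, z=1.96):
    """Return (elo_difference, approximate 95% margin) for one match."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    # Per-game variance of the result around the mean score.
    var = (wins * (1 - mu) ** 2 + losses * mu ** 2
           + draws * (0.5 - mu) ** 2) / n
    half = z * math.sqrt(var / n)
    margin = (elo_from_score(mu + half) - elo_from_score(mu - half)) / 2
    return elo_from_score(mu), margin


default = elo_with_margin(225, 588, 187)   # MinimalChess 0.3 vs tscp181
carnivor = elo_with_margin(234, 612, 154)  # MinimalChess 0.3 Carnivor vs tscp181
gap = carnivor[0] - default[0]
```

For the tscp181 pairing, the gap between the two runs is far smaller than either margin, so those 1000 games alone cannot separate the two sets of tables.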

I also tried replacing only the King table in my default PSTs with your suggested version but in my tests that didn't make any positive difference either. Maybe Carnivor's PSTs need more search depth than MMC can provide to show their true potential.