The PSTs of Carnivor

abulmo2 · Post by **abulmo2** » Fri Mar 12, 2021 7:49 am

Here is a comparison of different PST & material weights used in Dumber.
dumber-pesto 1.2 = Pesto's weights.
dumber-1.2 = Dumber's default weights.
dumber-car 1.2 = Carnivor's weights.

  # PLAYER                      : RATING  ERROR   POINTS  PLAYED    (%)
   1 dumber-pesto 1.2            : 2006.6   25.3    586.5     700   83.8%
   2 dumber-1.2                  : 1939.7   24.0    541.5     700   77.4%
   3 tscp-pesto                  : 1804.3   21.4    435.0     700   62.1%
   4 dumber-car 1.2              : 1735.8   23.0    377.0     700   53.9%
   5 tscp181                     : 1692.0   21.2    340.0     700   48.6%
   6 rustic.alpha1               : 1623.7   22.8    284.5     700   40.6%
   7 MinimalChessEngine 0.3      : 1549.1   26.1    229.0     700   32.7%
   8 MinimalChessEngine 0.2.1    :  888.8   86.8      6.5     700    0.9%

Tournament settings: 40/0:05+0.5, played on cutechess under Linux.
Ratings are computed by ordo and anchored so that tscp181 is at 1692 (its CCRL 40/15 rating).

Mike Sherwin · Post by **Mike Sherwin** » Fri Mar 12, 2021 8:30 am

abulmo2 wrote: ↑Fri Mar 12, 2021 7:49 am Here is a comparison of different PST & material weights used in dumber.

Wow thanks! That is very informative. Makes me think that I can improve CAR PST even more.

mvanthoor · Post by **mvanthoor** » Fri Mar 12, 2021 10:06 am

abulmo2 wrote: ↑Fri Mar 12, 2021 7:49 am Here is a comparison of different PST & material weights used in dumber.

This jibes perfectly with my own tests, even though I used BayesElo.

Rustic Alpha 1 is at the expected rating; it's 1677 in CCRL, but you have mixed in 2 versions of TSCP... one of which is stronger than the one on CCRL. Rustic Alpha 1 plays badly against TSCP, and underperforms against it by about 50 Elo, compared to other engines. And now it does it twice.

Cool to see that TSCP can be improved by over 100 Elo just by changing the weights, and that the difference between the lowest and highest Dumber is about 270 Elo. So, a tuned and tapered evaluation _can_ yield up to 250-300 points, depending on how good the original PSTs were.
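
As a quick check of those two figures, here is a small sketch that recomputes the gaps from the ratings in the table and converts the larger one into an expected head-to-head score with the standard logistic Elo formula. The `expected_score` helper and the `RATINGS` dictionary are names made up for this illustration; the numbers are copied from the table above.

```python
def expected_score(rating_diff):
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

# Ratings copied from the ordo table in the first post.
RATINGS = {
    "dumber-pesto 1.2": 2006.6,
    "dumber-1.2": 1939.7,
    "tscp-pesto": 1804.3,
    "dumber-car 1.2": 1735.8,
    "tscp181": 1692.0,
}

# Gap between the best and worst weight set inside Dumber.
dumber_spread = RATINGS["dumber-pesto 1.2"] - RATINGS["dumber-car 1.2"]
# Gain TSCP gets from swapping in the PeSTO weights.
tscp_gain = RATINGS["tscp-pesto"] - RATINGS["tscp181"]

print(f"Dumber spread: {dumber_spread:.1f} Elo")
print(f"TSCP gain:     {tscp_gain:.1f} Elo")
print(f"Expected score at that spread: {expected_score(dumber_spread):.3f}")
```

Under that model, a gap of about 270 Elo corresponds to the stronger version scoring a bit over 80% in a direct match.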

Nice. I'll be happy to see what tuning, and then tapering/tuning does for Rustic. (With some extra move ordering in the mix, which it doesn't yet have.) At that point, we'll be at Rustic 5.

lithander · Post by **lithander** » Fri Mar 12, 2021 11:20 am

abulmo2 wrote: ↑Fri Mar 12, 2021 7:49 am Here is a comparison of different PST & material weights used in dumber.

That's really interesting! Thanks for sharing!

MMC doesn't use tapered eval and I had already planned to try Pesto over the weekend because it seems to perform incredibly well. But why is it so good? You said in the other thread your weights are "tuned and tapered" and isn't that exactly what Pesto does, too? I don't understand why there's such a big difference between Pesto's "tuned and tapered" evaluation and anybody else's. Any ideas?
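
For readers who have not met the term, this is a minimal sketch of what "tapered" means: the engine keeps one midgame and one endgame score and blends them by game phase. The phase weights (knight 1, bishop 1, rook 2, queen 4, capped at 24) follow the scheme commonly described for PeSTO-style evaluation; the function names and the example scores are assumptions made up for this illustration, not code from any engine in this thread.

```python
# Phase contribution per piece type, counted over both sides.
# The start position gives 4*1 + 4*1 + 4*2 + 2*4 = 24.
PHASE_WEIGHT = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24

def game_phase(piece_counts):
    """Return the game phase: 24 = full midgame, 0 = bare endgame."""
    phase = sum(PHASE_WEIGHT.get(piece, 0) * count
                for piece, count in piece_counts.items())
    # Promotions can push the raw sum above 24, so cap it.
    return min(phase, MAX_PHASE)

def tapered_score(mg_score, eg_score, phase):
    """Blend midgame and endgame scores linearly by phase.

    Uses Python floor division; a C engine would truncate toward zero.
    """
    return (mg_score * phase + eg_score * (MAX_PHASE - phase)) // MAX_PHASE

start_counts = {"N": 4, "B": 4, "R": 4, "Q": 2, "P": 16, "K": 2}
# Hypothetical king-in-the-centre scores: bad in the midgame, good in the endgame.
print(tapered_score(-30, 20, game_phase(start_counts)))  # midgame value
print(tapered_score(-30, 20, 12))                        # halfway blend
print(tapered_score(-30, 20, 0))                         # endgame value
```

An engine without tapering effectively uses a single table for the whole game, so it cannot express "king safe in the corner early, active in the centre late" with one set of numbers.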

mvanthoor · Post by **mvanthoor** » Fri Mar 12, 2021 1:05 pm

lithander wrote: ↑Fri Mar 12, 2021 11:20 am That's really interesting! Thanks for sharing!

MMC doesn't use tapered eval and I had already planned to try Pesto over the weekend because it seems to perform incredibly well. But why is it so good? You said in the other thread your weights are "tuned and tapered" and isn't that exactly what Pesto does, too? I don't understand why there's such a big difference between Pesto's "tuned and tapered" evaluation and anybody else's. Any ideas?

Think of the relationship between the Dumb and Dumber engines. Dumber is a stripped version of Dumb.

Many people refer to "PeSTO" as piece-square tables; that is correct, but they didn't appear out of thin air. They are the PSTs of the engine PeSTO.

PeSTO is a stripped version of RofChade, and that is a massively strong engine. I _assume_ the PeSTO tables have been tuned against evaluations obtained with RofChade's evaluation function, and thus they include a lot of RofChade's positional knowledge (in a simplified form, obviously, because you can't fit everything in PSTs).

abulmo2 · Post by **abulmo2** » Fri Mar 12, 2021 2:36 pm

mvanthoor wrote: ↑Fri Mar 12, 2021 1:05 pm
Think of the relationship between the Dumb and Dumber engines. Dumber is a stripped version of Dumb.

Many people refer to "PeSTO" as piece-square tables; that is correct, but they didn't appear out of thin air. They are the PSTs of the engine PeSTO.

PeSTO is a stripped version of RofChade, and that is a massively strong engine. I _assume_ the PeSTO tables have been tuned against evaluations obtained with RofChade's evaluation function, and thus they include a lot of RofChade's positional knowledge (in a simplified form, obviously, because you can't fit everything in PSTs).

I think the story is quite different: http://talkchess.com/forum3/viewtopic.php?f=2&t=68311
Originally Rofchade had only PSTs & material, and Pesto is more a derivative of early Rofchade than a stripped version. Somewhere in the discussion about Rofchade, Ronald said it uses the quiet-labeled.epd set of positions created by Alexandru Mosoi.

abulmo2 · Post by **abulmo2** » Fri Mar 12, 2021 2:44 pm

lithander wrote: ↑Fri Mar 12, 2021 11:20 am But why is it so good? You said in the other thread your weights are "tuned and tapered" and isn't that exactly what Pesto does, too? I don't understand why there's such a big difference between Pesto's "tuned and tapered" evaluation and anybody else's. Any ideas?

Pesto's weights are fully asymmetrical; Dumber's are not (only the king table is asymmetrical). The set of games / positions used for training is different, the tuning algorithm is different, etc. So many differences can explain why one is better than the other. I will probably retune my weights one day to see if I can get something better. The only thing I know for sure is that human hand-written evaluations are incredibly wrong and weak. Our brains understand nothing about chess.
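
To make the symmetric/asymmetric distinction concrete, here is a small sketch, assuming "symmetrical" means mirrored left-to-right across the board. A symmetric table only needs files a-d (32 values) and mirrors them onto files e-h, while an asymmetric table stores all 64 squares independently, which gives the tuner twice as many free parameters per piece. The helper names and the sample values are hypothetical and are not taken from Dumber or PeSTO.

```python
def expand_symmetric(half_table):
    """Expand 8 ranks x 4 files (a-d) into 8 x 8 by mirroring onto files e-h."""
    return [row + row[::-1] for row in half_table]

def is_file_symmetric(table):
    """True if every rank reads the same from the a-file and from the h-file."""
    return all(row == row[::-1] for row in table)

# Hypothetical knight values for files a-d; every rank uses the same row here.
half_knight = [[-50, -30, -10, 0] for _ in range(8)]
full_knight = expand_symmetric(half_knight)

# An asymmetric table can, for example, value the g-file differently from the b-file.
asym_knight = [row[:] for row in full_knight]
asym_knight[0][6] += 15

print(full_knight[0])                  # [-50, -30, -10, 0, 0, -10, -30, -50]
print(is_file_symmetric(full_knight))  # True
print(is_file_symmetric(asym_knight))  # False
```

Because kings castle more often to one side, left/right asymmetry can carry real information, which may be why Dumber already makes an exception for the king table.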

mvanthoor · Post by **mvanthoor** » Fri Mar 12, 2021 4:39 pm

abulmo2 wrote: ↑Fri Mar 12, 2021 2:44 pm The only thing I know for sure is that human hand-written evaluations are incredibly wrong and weak. Our brains understand nothing about chess.

The problem is that we as humans can't determine if "putting a rook on an open file" should be valued at +0.12, or +0.17...

Take this into account for many parameters, such as the bishop pair (should this be +0.10, or +0.11?) and you can end up with an evaluation that is much stronger when tuned by a computer, because it can test a bazillion versions one after another.
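
A toy sketch of how a computer can settle such values: map the evaluation to an expected score with a sigmoid, measure the squared error against actual game results, and nudge each weight up or down while the error keeps dropping. This is one common approach (often called Texel-style tuning) and not necessarily what PeSTO or Dumber used. The two features (rook on open file, bishop pair), the seven labelled samples, and all function names are invented for this illustration.

```python
def sigmoid(eval_cp):
    """Map a centipawn evaluation to an expected score between 0 and 1."""
    return 1.0 / (1.0 + 10.0 ** (-eval_cp / 400.0))

def mean_squared_error(weights, samples):
    """Average squared gap between predicted score and actual game result."""
    total = 0.0
    for features, result in samples:
        eval_cp = sum(w * f for w, f in zip(weights, features))
        total += (result - sigmoid(eval_cp)) ** 2
    return total / len(samples)

def local_search(weights, samples, step=5, max_passes=1000):
    """Try +step/-step on each weight; keep a change only if the error drops."""
    best = list(weights)
    best_err = mean_squared_error(best, samples)
    for _ in range(max_passes):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                err = mean_squared_error(trial, samples)
                if err < best_err:
                    best, best_err, improved = trial, err, True
                    break
        if not improved:
            break
    return best, best_err

# Each sample: ([rook-on-open-file difference, bishop-pair difference], result),
# with the result from White's point of view (1 = win, 0.5 = draw, 0 = loss).
SAMPLES = [
    ([1, 0], 1.0), ([1, 0], 0.5), ([0, 1], 1.0), ([0, 1], 0.0),
    ([1, 1], 1.0), ([-1, 0], 0.0), ([-1, 0], 0.5),
]

start = [0, 0]
start_err = mean_squared_error(start, SAMPLES)
tuned, tuned_err = local_search(start, SAMPLES)
print("tuned weights (centipawns):", tuned)
print(f"error: {start_err:.4f} -> {tuned_err:.4f}")
```

A real tuner runs the same loop over hundreds of thousands of labelled positions and hundreds of weights, which is exactly the kind of repetitive fine adjustment a human cannot do by feel.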

mvanthoor · Post by **mvanthoor** » Fri Mar 12, 2021 4:41 pm

abulmo2 wrote: ↑Fri Mar 12, 2021 2:36 pm
I think the story is quite different: http://talkchess.com/forum3/viewtopic.php?f=2&t=68311
Originally Rofchade had only PSTs & material, and Pesto is more a derivative of early Rofchade than a stripped version. Somewhere in the discussion about Rofchade, Ronald said it uses the quiet-labeled.epd set of positions created by Alexandru Mosoi.

Thanks for the correction. I've always believed, on the basis of other posts, that PeSTO was a stripped RofChade, instead of an early version of RofChade (and then renamed and developed separately).

lithander · Post by **lithander** » Fri Mar 12, 2021 11:20 pm

I've run a few more tests with the Carnivor PSTs with the new King Table.

MinimalChess 0.3 vs tscp181: 225 - 588 - 187  [0.319] 1000
Elo difference: -132.1 +/- 20.5, LOS: 0.0 %, DrawRatio: 18.7 %
MinimalChess 0.3 Carnivor vs tscp181: 234 - 612 - 154  [0.311] 1000
Elo difference: -138.2 +/- 21.1, LOS: 0.0 %, DrawRatio: 15.4 %
-6 ELO

MinimalChess 0.3 vs Rustic: 242 - 561 - 197  [0.341] 1000
Elo difference: -114.8 +/- 20.1, LOS: 0.0 %, DrawRatio: 19.7 %
MinimalChess 0.3 Carnivor vs Rustic: 217 - 576 - 207  [0.321] 1000
Elo difference: -130.5 +/- 20.1, LOS: 0.0 %, DrawRatio: 20.7 %
-15 ELO

MinimalChess 0.3 vs Shallow Blue 2.0.0 64-bit: 258 - 496 - 246  [0.381] 1000
Elo difference: -84.3 +/- 19.1, LOS: 0.0 %, DrawRatio: 24.6 %
MinimalChess 0.3 Carnivor vs Shallow Blue 2.0.0 64-bit: 214 - 618 - 168  [0.298] 1000
Elo difference: -148.8 +/- 21.1, LOS: 0.0 %, DrawRatio: 16.8 %
-64 ELO

MinimalChess 0.3 vs FracTalv1.0: 283 - 313 - 404  [0.485] 1000
Elo difference: -10.4 +/- 16.6, LOS: 11.0 %, DrawRatio: 40.4 %
MinimalChess 0.3 Carnivor vs FracTalv1.0: 308 - 320 - 372  [0.494] 1000
Elo difference: -4.2 +/- 17.1, LOS: 31.6 %, DrawRatio: 37.2 %
+6 ELO

Sadly it seems like the default PSTs were doing a little better against all engines but FracTalv1.0. Of course, despite running 1000 games each, the error bars are still quite large, often larger than the difference between the PSTs.
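
For anyone who wants to reproduce the figures, here is a sketch of how an Elo difference and its error margin can be derived from a win/loss/draw record. It uses the logistic Elo formula and a 95% normal-approximation interval on the mean score; this reproduces the first result line above, but treat it as an assumed reconstruction rather than the exact code of the tool that printed those lines. The function names are made up for this example.

```python
import math

def elo_from_score(score):
    """Convert a score fraction (strictly between 0 and 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, losses, draws, z=1.959964):
    """Return (Elo difference, half-width of the ~95% interval)."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    # Per-game variance of the result around the mean score.
    variance = (wins * (1.0 - mu) ** 2
                + losses * (0.0 - mu) ** 2
                + draws * (0.5 - mu) ** 2) / n
    stdev_of_mean = math.sqrt(variance / n)
    low = elo_from_score(mu - z * stdev_of_mean)
    high = elo_from_score(mu + z * stdev_of_mean)
    return elo_from_score(mu), (high - low) / 2.0

# MinimalChess 0.3 vs tscp181: 225 wins, 588 losses, 187 draws.
elo, margin = elo_with_margin(225, 588, 187)
print(f"Elo difference: {elo:.1f} +/- {margin:.1f}")
```

With margins of about 20 Elo on each match, a gap of 6 to 16 Elo between the two PST sets is well inside the noise; only the Shallow Blue result (about 64 Elo) clearly exceeds it.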

I also tried replacing only the King table in my default PSTs with your suggested version but in my tests that didn't make any positive difference either. Maybe Carnivor's PSTs need more search depth than MMC can provide to show their true potential.
