The PSTs of Carnivor

Ras
Posts: 2488
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: The PSTs of Carnivor

Post by Ras »

lithander wrote: Fri Mar 12, 2021 11:20 pm +6 ELO
Doesn't mean much with such a low number of games. In order to even tell that there is any progress at all with something like +6 Elo, you'll need 10k games at least.
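To put a rough number on that, here is a minimal sketch of the usual normal-approximation error estimate for a match result. The function names, the 30% draw rate and the win/loss/draw counts are illustrative assumptions, not results from this thread.

```python
import math

def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, losses, draws, z=1.96):
    """Return (elo, low, high) with an approximate 95% interval.

    Assumes independent games and a normal approximation, and that
    score +/- z * std_error stays strictly inside (0, 1).
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, which is 1, 0.5 or 0: E[x^2] - E[x]^2.
    variance = (wins + 0.25 * draws) / n - score ** 2
    std_error = math.sqrt(variance / n)
    return (elo_from_score(score),
            elo_from_score(score - z * std_error),
            elo_from_score(score + z * std_error))

# Hypothetical results: 30% draws and an edge of roughly +6 Elo.
for wins, losses, draws in [(359, 341, 300), (3586, 3414, 3000)]:
    elo, low, high = elo_with_margin(wins, losses, draws)
    games = wins + losses + draws
    print(f"{games:6d} games: {elo:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}]")
```

Under these assumptions the interval is roughly ±18 Elo at 1,000 games and shrinks to a bit under ±6 Elo at 10,000 games, which is in line with the estimate above.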
Rasmus Althoff
https://www.ct800.net
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: The PSTs of Carnivor

Post by lithander »

Ras wrote: Sat Mar 13, 2021 12:22 am
lithander wrote: Fri Mar 12, 2021 11:20 pm +6 ELO
Doesn't mean much with such a low number of games. In order to even tell that there is any progress at all with something like +6 Elo, you'll need 10k games at least.
That was meant to go into the code block like all the other "+/- ELO" summaries. :oops: I even mentioned that the error margin is larger than the measured difference to make readers aware that these results are all pretty rough.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
Uri Blass
Posts: 10323
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: The PSTs of Carnivor

Post by Uri Blass »

mvanthoor wrote: Fri Mar 12, 2021 4:39 pm
abulmo2 wrote: Fri Mar 12, 2021 2:44 pm The only thing I know for sure is that human hand-written evaluations are incredibly wrong and weak. Our brains understand nothing about chess ;-)
The problem is that we as humans can't determine if "putting a rook on an open file" should be valued at +0.12, or +0.17...

Take this into account for many parameters, such as the bishop pair (should this be +0.10, or +0.11?) and you can end up with an evaluation that is much stronger when tuned by a computer, because it can test a bazillion versions one after another.
Only chess programmers use this type of evaluation, and only in their engines.
When chess players or chess programmers play chess themselves, they never add numbers like 0.12 to evaluate a position.
Their evaluation is not purely material, but material is the only thing chess players actually count, so that they know who has the material advantage.
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: The PSTs of Carnivor

Post by lithander »

abulmo2 wrote: Fri Mar 12, 2021 7:49 am Here is a comparison of different PST & material weights used in dumber.
dumber-pesto 1.2 = PeSTO's weights.
dumber-1.2 = Dumber's default weights.
dumber-car 1.2 = Carnivor's weights.

Code: Select all

  # PLAYER                      : RATING  ERROR   POINTS  PLAYED    (%)
   1 dumber-pesto 1.2            : 2006.6   25.3    586.5     700   83.8%
   2 dumber-1.2                  : 1939.7   24.0    541.5     700   77.4%
   3 tscp-pesto                  : 1804.3   21.4    435.0     700   62.1%
   4 dumber-car 1.2              : 1735.8   23.0    377.0     700   53.9%
   5 tscp181                     : 1692.0   21.2    340.0     700   48.6%
   6 rustic.alpha1               : 1623.7   22.8    284.5     700   40.6%
   7 MinimalChessEngine 0.3      : 1549.1   26.1    229.0     700   32.7%
   8 MinimalChessEngine 0.2.1    :  888.8   86.8      6.5     700    0.9%
Tournament's settings: 40/0:05+0.5 played on cutechess under Linux.
Rating is computed by ordo and set so that tscp181 reaches 1692 (its CCRL 40/15's rating).
I have tried PeSTO's tapered evaluation in MMC now and I'm literally blown away.

It gained 183 ELO in self play!

Code: Select all

MinimalChess 0.3 PeSTO vs MinimalChess 0.3: 582 - 98 - 320  [0.742] 1000
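For readers wondering how a match summary like this turns into an Elo figure, here is a minimal sketch of the standard logistic conversion. It assumes the summaries list wins - losses - draws (the counts add up to the game total and reproduce the bracketed score under that reading). The function name `elo_diff` is made up for illustration.

```python
import math

def elo_diff(wins, losses, draws):
    """Elo difference implied by a match result, from the first engine's view.

    Undefined for a 0% or 100% score, where the logistic model diverges.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))

# Self-play result quoted above: 582 wins, 98 losses, 320 draws.
print(round(elo_diff(582, 98, 320), 1))
```

For the self-play result this gives a score of 0.742 and a difference of about 183 Elo.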
But against other engines it gained even more! MMC PeSTO is now 191 ELO ahead of Rustic Alpha

Code: Select all

MinimalChess 0.3 PeSTO vs Rustic: 679 - 178 - 143  [0.750] 1000
...and 122 ELO ahead of TSCP181...

Code: Select all

MinimalChess 0.3 PeSTO vs tscp181: 312 - 137 - 66  [0.670] 515
...and while against Dumber 1.2 it's still 150 ELO behind...

Code: Select all

MinimalChess 0.3 PeSTO vs dumber-1.2: 202 - 605 - 193  [0.298] 1000
...this result, too, confirms that I have turned a sub 1600 engine into a ~1800 one just by replacing one set of magic numbers with a different (slightly larger) one.

What sorcery is this!?
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: The PSTs of Carnivor

Post by mvanthoor »

lithander wrote: Sat Mar 13, 2021 2:49 pm I have tried PeSTO's tapered evaluation in MMC now and I'm literally blown away.

...this result, too, confirms that I have turned a sub 1600 engine into a ~1800 one just by replacing one set of magic numbers with a different (slightly larger) one.

What sorcery is this!?
That is the entire reason why I said...

- Just use my PST's, and don't worry about it... that's just data you're going to replace later, by a tuner that produces PST's that you can't actually understand.
- Now you replaced the data with precisely that (but not generated by yourself, I assume), and you gained almost 200 Elo
- And created an engine with a tentative rating of 1850.

You are now +191 Elo ahead of Alpha 1, and you came from about 100 Elo behind. That is roughly a 300 Elo boost, just from switching from a standard set of PST's to a tuned and tapered set.

Now you understand why I follow the path to first write the basic engine (Alpha 1), then add a TT (Alpha 2), finish move ordering (Alpha 3), write a tuner (Rustic 4), and then add a tapered, tuned evaluation (Rustic 5). Let's just add the results.

Alpha 1: 1677
Alpha 2: 1677 + 105 from TT = 1782
Alpha 3: 1782 + 35 expected from move ordering = 1817
Rustic 4: 1817 + ? (Tuning evaluation) = unknown
Rustic 5: 1782 + 300 (Tapered and tuned evaluation over standard): 2082.

Rustic 5 will have feature parity with Dumber 1.2, except it will also have a TT. If I subtract the 105 Elo from the TT, the rating of Rustic 5 would be 1977... which is around Dumber 1.2's rating. (Dumber 1.2 performs at around 1960 Elo against Rustic Alpha 1). If all this comes to pass as I expect, I can indeed reach Vice's rating of 2050, WITHOUT any of Vice's massive list of evaluation terms, and WITHOUT some of the other features it has. On top of Rustic 5, I'll be able to add all sorts of pruning to make it faster, even before I start writing more of the evaluation.

Improving evaluation does a lot, as you may also have seen with very strong engines, that got even stronger when their evaluation was replaced by a NNUE.

To be clear:
- You now have 2 sets of PST's?
- Those are the tuned PST's from PeSTO?
- You do tapering between the two sets of PST's (or a hard switch at some point)? If you do tapering: how / where do you switch game phase? (Determine when / how far you're in the middlegame/endgame?)

Now the only decision you have to make is... do you keep the PeSTO evaluation, or do you rip it out again, write your own tuner, and tune the evaluation yourself? You've now seen what's possible with a tuned, tapered evaluation.

The reason why I never try such things is that I don't want the glimpses of what is possible, because I get very impatient when I do. I'd rather work my way up to it myself. Also, there are many people who are so eager to gain rating that they copy/paste things like this, get a good result, and move on to the next feature, resulting in an engine constructed of copy/pasted material whose origins the author sometimes doesn't have an inkling of.

Congratulations on the massive Elo jump. Are you going to leave it in there, or are you going to achieve it on your own, with your own tuner, data, and evaluation? You now know it's possible... :) See you around the 2000 mark, at some point :)

Development of my engine is rather slow, because I demand from myself that I understand every detail of each feature I put into the engine. If there's something I don't fully understand, I'll keep studying it and testing it to find out; I'm not going to put anything into the engine I don't fully understand. (I have to understand it, if I want to write about it on rustic-chess.org, for example.)
Author of Rustic, an engine written in Rust.
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: The PSTs of Carnivor

Post by lithander »

mvanthoor wrote: Sat Mar 13, 2021 3:21 pm To be clear:
- You now have 2 sets of PST's?
- Those are the tuned PST's from PeSTO?
- You do tapering between the two sets of PST's (or a hard switch at some point)? If you do tapering: how / where do you switch game phase? (Determine when / how far you're in the middlegame/endgame?)
3x yes! I just read the thread abulmo2 linked where Ronald Friederich himself provided the PeSTO tables. He also explained how he did the tapering:
Compute the MG and EG scores separately and mix them later. The interpolation factor is derived from adding up all the midgame piece values except pawns. If this sum is greater than 6192 you use 100% of the MG score; if it's smaller than 518 you use 100% of the EG score. In between you interpolate linearly. I assumed that these thresholds were also found during the tuning process and didn't mess with them.
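The interpolation described above can be sketched as follows. This is only an illustration of the description in this post, not PeSTO's or MinimalChess's actual code: the names are hypothetical, and it assumes the material sum is taken over both sides.

```python
# Thresholds quoted in the post above (midgame piece values, pawns excluded).
MG_LIMIT = 6192  # at or above this sum, use 100% of the midgame score
EG_LIMIT = 518   # at or below this sum, use 100% of the endgame score

def midgame_weight(material_without_pawns):
    """Return 1.0 for a pure midgame, 0.0 for a pure endgame, linear in between."""
    clamped = max(EG_LIMIT, min(MG_LIMIT, material_without_pawns))
    return (clamped - EG_LIMIT) / (MG_LIMIT - EG_LIMIT)

def tapered_score(mg_score, eg_score, material_without_pawns):
    """Blend separately computed midgame and endgame scores."""
    weight = midgame_weight(material_without_pawns)
    return mg_score * weight + eg_score * (1.0 - weight)

# Hypothetical scores: +100 in the midgame tables, +20 in the endgame tables.
for material in (7000, 3355, 400):
    print(material, tapered_score(100, 20, material))
```

With plenty of material the result is the pure MG score, with almost none it is the pure EG score, and halfway between the two thresholds both scores count equally.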
mvanthoor wrote: Sat Mar 13, 2021 3:21 pm Now the only decision you have to make is... do you keep the PeSTO evaluation, or do you rip it out again, write your own tuner, and tune the evaluation yourself? You've now seen what's possible with a tuned, tapered evaluation.
Tough question indeed.

I have written a Minimal Neural Classifier from scratch and a Minimal Bitcoin Miner and each was interesting and educational but I was finished after a few days and moved on. (I hate cryptocoins actually for all the energy they waste on pointless computation)

MinimalChess was meant to be like that. Hence the name. I just wanted to understand how chess engines work after I got interested in chess by watching The Queen's Gambit.

But now, after adding PeSTO, MinimalChess has, against all plans and expectations, actually evolved into a decent engine. Watching it play, it looks completely fine now to my amateur's eye.
I've set up a Q+K vs K endgame that Stockfish analyzes as mate in 9. MMC PeSTO wins this against Stockfish in 11 moves.
It wins a R+K vs K that is mate in 15 in 17 moves. And a 2B+K vs K, which is mate in 18, is won after 20 moves.

It's hard to go back now and say, I don't need it, it's not minimal. I don't feel like wrapping this project up and calling it done only because I wasn't more ambitious when I started it. I would feel much better about my first chess engine if it actually played some decent chess - like it does with PeSTO. Now that I've seen that, I want it! ;)

But the other question is of course how I feel about just taking the complete evaluation from some other engine. Would that even be legal? When I joined the forum I was a little bemused by all the drama around derivative work and properly crediting or not crediting the original authors. But now I get it. My personal stance on that is that I'm not going to release a version 0.4 that gains 200 ELO from copying PeSTO's tables. But there's probably going to be a version 0.4 eventually, with some kind of tapered evaluation and "my own" tables derived from tuning.

Is this still minimal? No. But compared to other evaluations like the one from VICE you mentioned, it's still a very simple way to calculate a positional evaluation. If you auto-tune it you don't even need to know much about chess, you just need a dataset with labeled positions. Everyone on here said that an engine needs some chess knowledge beyond material value to function properly and that I should add PSTs. Now tapered evaluation is basically the same thing, except that you have two sets of PSTs. And that it stops the engine from blundering endgames! ;)
mvanthoor wrote: Sat Mar 13, 2021 3:21 pm Congratulations on the massive Elo jump. Are you going to leave it in there, or are you going to achieve it on your own, with your own tuner, data, and evaluation? You now know it's possible... :) See you around the 2000 mark, at some point :)
I think we both agree that congratulations are only in order if I can do it without PeSTO's tables. The 2000 mark is probably something Minimal Chess will never cross though. But who knows, I felt like I was "almost" done a few times, now! ;)
mvanthoor wrote: Sat Mar 13, 2021 3:21 pm I'm not going to put anything into the engine I don't fully understand.
Same here! That's my golden rule I won't break. But I understand tapered evals and PSTs and I understand how Texel's tuning method works. (Also that it only finds local minima, so finding PSTs as good as PeSTO's involves a large amount of luck.) The only thing I don't understand is the meaning of the individual values it produces. But nobody does. It's like asking someone to explain the meaning of the individual weights in a neural network. You can't.
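For readers unfamiliar with the method, here is a toy sketch of a Texel-style local search: squash the evaluation into an expected score with a sigmoid, measure the mean squared error against the game results, and nudge one weight at a time while the error drops. The single-feature evaluation, the four-position dataset and all names are hypothetical simplifications. A real tuner works on a very large set of labeled positions and also fits the scaling constant K first.

```python
def expected_score(score_cp, k=1.0):
    """Map a centipawn score to an expected game result in (0, 1)."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))

def evaluate(weights, features):
    """Toy linear evaluation: weighted sum of position features."""
    return sum(w * f for w, f in zip(weights, features))

def mean_squared_error(weights, dataset):
    total = 0.0
    for features, result in dataset:
        total += (result - expected_score(evaluate(weights, features))) ** 2
    return total / len(dataset)

def local_search(weights, dataset, step=1, max_passes=10_000):
    """Change one weight by +/- step at a time, keeping changes that lower the error.

    Stops at the first point where no single step helps, which is only a
    local minimum of the error.
    """
    best = list(weights)
    best_error = mean_squared_error(best, dataset)
    for _ in range(max_passes):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                candidate = list(best)
                candidate[i] += delta
                error = mean_squared_error(candidate, dataset)
                if error < best_error:
                    best, best_error = candidate, error
                    improved = True
                    break
        if not improved:
            break
    return best, best_error

# Hypothetical data: (material difference in pawns,), result for White.
DATASET = [((1,), 1.0), ((1,), 0.5), ((-1,), 0.0), ((-1,), 0.5)]
tuned, error = local_search([100], DATASET)
print(tuned, round(error, 4))
```

On this toy data the side that is a pawn up scores 75%, and the search settles on the pawn value at which the sigmoid predicts about that score. With thousands of PST entries, where the search ends up depends on the starting values, which is the local-minimum issue mentioned above.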
Mike Sherwin
Posts: 868
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: The PSTs of Carnivor

Post by Mike Sherwin »

PeSTO, imo, is tuned for the "90%". It is my prediction that if 1000 games were played where both sides castled queenside, then PeSTO would not perform as well.
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: The PSTs of Carnivor

Post by lithander »

Mike Sherwin wrote: Sat Mar 13, 2021 7:00 pm PeSTO, imo, is tuned for the "90%". It is my prediction that if 1000 games were played where both sides castled queenside, then PeSTO would not perform as well.
I suppose with a custom opening book where the mainlines are removed we could test that! Does such a book exist?
Mike Sherwin
Posts: 868
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: The PSTs of Carnivor

Post by Mike Sherwin »

lithander wrote: Sat Mar 13, 2021 8:07 pm
Mike Sherwin wrote: Sat Mar 13, 2021 7:00 pm PeSTO, imo, is tuned for the "90%". It is my prediction that if 1000 games were played where both sides castled queenside, then PeSTO would not perform as well.
I suppose with a custom opening book where the mainlines are removed we could test that! Does such a book exist?
I think you can just download a large PGN opening book and use a database to select those openings in which both sides castle queenside and save them to their own PGN file. Then just use that file as the opening book in the GUI. I think any PGN file should work as the source. Maybe download a GM PGN file with a few hundred thousand high-quality games. I know Arena allows any PGN file to be used as an opening book to a set depth.
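If no database program is at hand, the selection step could also be scripted. The following is a rough sketch using only the standard library. It assumes every game starts from the normal position with White to move, starts with an `[Event ...]` tag, and contains no variations in parentheses or `;` comments. The function names and the file names in the usage comment are hypothetical.

```python
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def san_moves(movetext):
    """Reduce PGN movetext to a flat list of SAN moves."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)  # drop {comments}
    text = re.sub(r"\$\d+", " ", text)          # drop NAGs such as $1
    text = re.sub(r"\d+\.+", " ", text)         # drop move numbers "12." / "12..."
    return [token for token in text.split() if token not in RESULTS]

def both_castle_queenside(movetext):
    """True if White and Black each play O-O-O somewhere in the game."""
    moves = san_moves(movetext)
    white = any(move.startswith("O-O-O") for move in moves[0::2])
    black = any(move.startswith("O-O-O") for move in moves[1::2])
    return white and black

def select_games(pgn_text):
    """Return the games (as PGN strings) in which both sides castle queenside."""
    games = re.split(r"\n(?=\[Event )", pgn_text.strip())
    selected = []
    for game in games:
        # Tag pairs start with "[", everything else is movetext.
        movetext = " ".join(
            line for line in game.splitlines() if not line.startswith("[")
        )
        if both_castle_queenside(movetext):
            selected.append(game.strip())
    return selected

# Usage sketch (file names are placeholders):
# with open("games.pgn", encoding="utf-8", errors="replace") as source:
#     picked = select_games(source.read())
# with open("queenside.pgn", "w", encoding="utf-8") as target:
#     target.write("\n\n".join(picked) + "\n")
```

The resulting file can then be loaded as the opening book, with the book depth set in the GUI.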