The PSTs of Carnivor
-
- Posts: 2488
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: The PSTs of Carnivor
Doesn't mean much with such a low number of games. To even tell that there is any progress at all with something like +6 Elo, you'll need at least 10k games.
Rasmus Althoff
https://www.ct800.net
-
- Posts: 881
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: The PSTs of Carnivor
That was meant to go into the code block like all the other "+/- ELO" summaries. I even mentioned that the error margin is larger than the measured difference to make readers aware that these results are all pretty rough.
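To put a rough number on how large the error margin is for a given number of games, here is a minimal Python sketch. It is not taken from ordo, cutechess or any engine in this thread; the function names and the normal approximation are assumptions made for illustration, and real rating tools may compute their error bars differently.

```python
import math

def score_to_elo(score: float) -> float:
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (elo_difference, error_margin) using a normal approximation.

    Assumes the match was not a clean sweep (score of exactly 0 or 1).
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Variance of a single game result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    std_error = math.sqrt(variance / games)
    # Clamp so the logistic conversion stays defined for lopsided results.
    low = max(score - z * std_error, 1e-6)
    high = min(score + z * std_error, 1.0 - 1e-6)
    margin = (score_to_elo(high) - score_to_elo(low)) / 2.0
    return score_to_elo(score), margin
```

With a hypothetical record of 3600 wins, 3400 losses and 3000 draws over 10,000 games, the margin comes out a bit under ±6 Elo, which is in line with the 10k-games rule of thumb. With a tenth of the games it is roughly three times as large.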
-
- Posts: 10323
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: The PSTs of Carnivor
mvanthoor wrote: ↑Fri Mar 12, 2021 4:39 pm
  The problem is that we as humans can't determine if "putting a rook on an open file" should be valued at +0.12, or +0.17...
  Take this into account for many parameters, such as the bishop pair (should this be +0.10, or +0.11?) and you can end up with an evaluation that is much stronger when tuned by a computer, because it can test a bazillion versions one after another.
Only chess programmers use this type of evaluation, and only in their engines.
When chess players or chess programmers play chess themselves, they never add numbers like 0.12 to evaluate a position.
Evaluation is more than counting material, but material is the only thing chess players actually count, so that they know who has the material advantage.
-
- Posts: 881
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: The PSTs of Carnivor
abulmo2 wrote: ↑Fri Mar 12, 2021 7:49 am
  Here is a comparison of different PST & material weights used in dumber.
  dumber-pesto 1.2 = Pesto's weights.
  dumber-1.2 = Dumber's default weights.
  dumber-car 1.2 = Carnivor's weights.
  Tournament settings: 40/0:05+0.5 played on cutechess under Linux.
  Code:
  # PLAYER                   : RATING  ERROR  POINTS  PLAYED    (%)
  1 dumber-pesto 1.2         : 2006.6   25.3   586.5     700  83.8%
  2 dumber-1.2               : 1939.7   24.0   541.5     700  77.4%
  3 tscp-pesto               : 1804.3   21.4   435.0     700  62.1%
  4 dumber-car 1.2           : 1735.8   23.0   377.0     700  53.9%
  5 tscp181                  : 1692.0   21.2   340.0     700  48.6%
  6 rustic.alpha1            : 1623.7   22.8   284.5     700  40.6%
  7 MinimalChessEngine 0.3   : 1549.1   26.1   229.0     700  32.7%
  8 MinimalChessEngine 0.2.1 :  888.8   86.8     6.5     700   0.9%
  Rating is computed by ordo and set so that tscp181 reaches 1692 (its CCRL 40/15 rating).
I have tried PeSTO tapered evaluation in MMC now and I'm literally blown away.
It gained 183 ELO in self play!
Code:
MinimalChess 0.3 PeSTO vs MinimalChess 0.3: 582 - 98 - 320 [0.742] 1000
Code:
MinimalChess 0.3 PeSTO vs Rustic: 679 - 178 - 143 [0.750] 1000
Code:
MinimalChess 0.3 PeSTO vs tscp181: 312 - 137 - 66 [0.670] 515
Code:
MinimalChess 0.3 PeSTO vs dumber-1.2: 202 - 605 - 193 [0.298] 1000
What sorcery is this!?
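For reference, the Elo figures quoted for these matches follow from the score shown in square brackets. A minimal sketch, assuming the result lines read "wins - losses - draws" and using the standard logistic Elo formula; the function name and the regular expression are made up for illustration and are not part of cutechess or any other tool:

```python
import math
import re

# Matches lines such as "A vs B: 582 - 98 - 320 [0.742] 1000".
RESULT_LINE = re.compile(r"^(?P<pair>.+): (?P<w>\d+) - (?P<l>\d+) - (?P<d>\d+) \[")

def elo_from_result_line(line: str) -> float:
    """Elo difference of the first engine over the second (undefined for 0% or 100%)."""
    match = RESULT_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    wins, losses, draws = (int(match[key]) for key in ("w", "l", "d"))
    score = (wins + 0.5 * draws) / (wins + losses + draws)
    return -400.0 * math.log10(1.0 / score - 1.0)

line = "MinimalChess 0.3 PeSTO vs MinimalChess 0.3: 582 - 98 - 320 [0.742] 1000"
print(round(elo_from_result_line(line), 1))
```

For the self-play line this lands at roughly 183.5, consistent with the 183 Elo reported above. The match against dumber-1.2 gives a negative value, since the score is below 50%.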
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: The PSTs of Carnivor
lithander wrote: ↑Sat Mar 13, 2021 2:49 pm
  I have tried PeSTO tapered evaluation in MMC now and I'm literally blown away.
  ...this result, too, confirms that I have turned a sub 1600 engine into a ~1800 one just by replacing one set of magic numbers with a different (slightly larger) one.
  What sorcery is this!?
That is the entire reason why I said...
- Just use my PST's, and don't worry about it... that's just data you're going to replace later, by a tuner that produces PST's that you can't actually understand.
- Now you replaced the data with precisely that (but not generated by yourself, I assume), and you gained almost 200 Elo
- And created an engine with a tentative rating of 1850.
You are now +191 Elo ahead of Alpha 1, and you came from about 100 Elo behind. That is a roughly 300 Elo boost from switching from a standard set of PST's to a tuned and tapered set.
Now you understand why I follow the path to first write the basic engine (Alpha 1), then add a TT (Alpha 2), finish move ordering (Alpha 3), write a tuner (Rustic 4), and then add a tapered, tuned evaluation (Rustic 5). Let's just add the results.
Alpha 1: 1677
Alpha 2: 1677 + 105 from TT = 1782
Alpha 3: 1782 + 35 expected from move ordering = 1817
Rustic 4: 1817 + ? (Tuning evaluation) = unknown
Rustic 5: 1782 + 300 (Tapered and tuned evaluation over standard): 2082.
Rustic 5 will have feature parity with Dumber 1.2, except it will also have a TT. If I subtract the 105 Elo from the TT, the rating of Rustic 5 would be 1977... which is around Dumber 1.2's rating. (Dumber 1.2 performs at around 1960 Elo against Rustic Alpha 1). If all this comes to pass as I expect, I can indeed reach Vice's rating of 2050, WITHOUT any of Vice's massive list of evaluation terms, and WITHOUT some of the other features it has. On top of Rustic 5, I'll be able to add all sorts of pruning to make it faster, even before I start writing more of the evaluation.
Improving evaluation does a lot, as you may also have seen with very strong engines, that got even stronger when their evaluation was replaced by a NNUE.
To be clear:
- You now have 2 sets of PST's?
- Those are the tuned PST's from PeSTO?
- You do tapering between the two sets of PST's (or a hard switch at some point)? If you do tapering: how / where do you switch game phase? (Determine when / how far you're in the middlegame/endgame?)
Now the only decision you have to make is... do you keep the PeSTO evaluation, or do you rip it out again, write your own tuner, and tune the evaluation yourself? You've now seen what's possible with a tuned, tapered evaluation.
The reason why I never try such things is that I don't want the glimpses of what is possible, because I get very impatient when I do. I'd rather work my way up to it myself. Also, there are many people who are so eager to gain rating that they copy/paste things like this, get a good result, and move on to the next feature, resulting in an engine that is constructed of copy/pasted material of which the author of the engine sometimes doesn't have an inkling of how that material came about.
Congratulations on the massive Elo jump. Are you going to leave it in there, or are you going to achieve it on your own, with your own tuner, data, and evaluation? You now know it's possible... See you around the 2000 mark, at some point.
Development of my engine is rather slow, because I demand from myself that I understand every detail of each feature I put into the engine. If there's something I don't fully understand, I'll keep studying it and testing it to find out; I'm not going to put anything into the engine I don't fully understand. (I have to understand it, if I want to write about it on rustic-chess.org, for example.)
-
- Posts: 881
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: The PSTs of Carnivor
mvanthoor wrote: ↑Sat Mar 13, 2021 3:21 pm
  To be clear:
  - You now have 2 sets of PST's?
  - Those are the tuned PST's from PeSTO?
  - You do tapering between the two sets of PST's (or a hard switch at some point)? If you do tapering: how / where do you switch game phase? (Determine when / how far you're in the middlegame/endgame?)
3x yes! I just read the thread abulmo2 linked where Ronald Friederich himself provided the PeSTO tables. He also explained how he did the tapering:
Compute the MG and EG scores separately and mix them later. The interpolation factor is derived from adding up all the midgame piece values except pawns. If this sum is greater than 6192 you use 100% of the MG score; if it's smaller than 518 you use 100% of the EG score. In between you interpolate linearly. I assumed that these thresholds were also found during the tuning process and didn't mess with them.
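A minimal sketch of that blend as described in this post, in Python. It is not PeSTO's or MinimalChess's actual source, which may derive the game phase differently; the function and constant names are hypothetical, and only the two thresholds (6192 and 518) come from the post.

```python
MG_LIMIT = 6192  # non-pawn material at or above this: pure midgame score
EG_LIMIT = 518   # non-pawn material at or below this: pure endgame score

def midgame_weight(non_pawn_material: int) -> float:
    """Interpolation factor in [0, 1]; 1.0 means use only the midgame score.

    non_pawn_material is the sum of the midgame piece values of all
    pieces on the board except pawns (both sides).
    """
    clamped = max(EG_LIMIT, min(MG_LIMIT, non_pawn_material))
    return (clamped - EG_LIMIT) / (MG_LIMIT - EG_LIMIT)

def tapered_score(mg_score: float, eg_score: float, non_pawn_material: int) -> float:
    """Linear mix of separately computed midgame and endgame scores."""
    weight = midgame_weight(non_pawn_material)
    return weight * mg_score + (1.0 - weight) * eg_score
```

The two scores are each summed from their own set of PSTs first, so the blend itself costs only one multiply-add per evaluation.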
Tough question indeed.
I have written a Minimal Neural Classifier from scratch and a Minimal Bitcoin Miner and each was interesting and educational but I was finished after a few days and moved on. (I hate cryptocoins actually for all the energy they waste on pointless computation)
MinimalChess was meant to be like that. Hence the name. I just wanted to understand how chess engines work after I got interested in chess by watching The Queen's Gambit.
But now, after adding PeSTO, MinimalChess has, against all plans and expectations, actually evolved into a decent engine. Watching it play, it looks completely fine now to my amateur's eye.
I've set up a Q+K vs K endgame that Stockfish analyzes as mate in 9. MMC PeSTO wins this against Stockfish in 11 moves.
It wins a R+K vs K that is a mate in 15 in 17 moves. And a 2B+K vs K, which is mate in 18, is won after 20 moves.
It's hard to go back now and say, I don't need it, it's not minimal. I don't feel like wrapping this project up and calling it done only because I wasn't more ambitious when I started it. I would feel much better about my first chess engine if it actually played some decent chess - like it does with PeSTO. Now that I've seen that, I want it!
But the other question is of course how I feel about just taking the complete evaluation from some other engine. Would that even be legal? When I joined the forum I was a little bemused by all the drama around derivative work and properly crediting or not crediting the original authors. But now I get it. My personal stance on that is that I'm not going to release a version 0.4 that gains 200 ELO from copying PeSTO's tables. But there's probably going to be a version 0.4 eventually, with some kind of tapered evaluation and "my own" tables derived from tuning.
Is this still minimal? No. But compared to other evaluations like the one from VICE you mentioned, it's still a very simple way to calculate a positional evaluation. If you auto-tune it you don't even need to know much about chess, you just need a dataset with labeled positions. Everyone on here said that an engine needs some chess knowledge beyond material value to function properly and that I should add PSTs. Now tapered evaluation is basically the same thing, just that you have two sets of PSTs. And it stops the engine from blundering endgames!
I think we both agree that congratulations are only in order if I can do it without PeSTO's tables. The 2000 mark is probably something Minimal Chess will never cross though. But who knows, I felt like I was "almost" done a few times, now!
Same here! That's my golden rule I won't break. But I understand tapered evals and PSTs and I understand how Texel's tuning method works. (Also that it only finds local minima, so finding PSTs as good as PeSTO's involves a large amount of luck.) The only thing I don't understand is the meaning of the individual values it produces. But nobody does. It's like asking someone to explain the meaning of the individual weights in a neural network. You can't.
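For readers who haven't seen it, a Texel-style tuner can be sketched roughly like this. This is a generic illustration under simplifying assumptions, not the tuner used for PeSTO or planned for MinimalChess: the evaluation is a plain weighted sum of hypothetical feature counts, the scaling constant k is left at 1.0 instead of being fitted, and all names are made up.

```python
def win_probability(score_cp: float, k: float = 1.0) -> float:
    """Map a centipawn score to an expected game result between 0 and 1."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))

def mean_squared_error(weights, positions, k: float = 1.0) -> float:
    """positions: list of (features, result), result is 1, 0.5 or 0 for White."""
    total = 0.0
    for features, result in positions:
        score = sum(w * f for w, f in zip(weights, features))
        total += (result - win_probability(score, k)) ** 2
    return total / len(positions)

def tune(weights, positions, step: int = 1, max_passes: int = 100, k: float = 1.0):
    """Nudge one weight at a time by +/- step, keeping only improvements."""
    best = list(weights)
    best_error = mean_squared_error(best, positions, k)
    for _ in range(max_passes):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                candidate = list(best)
                candidate[i] += delta
                error = mean_squared_error(candidate, positions, k)
                if error < best_error:
                    best, best_error = candidate, error
                    improved = True
                    break
        if not improved:
            break  # no single-step change helps: a local minimum
    return best, best_error
```

The loop stops as soon as no single nudge lowers the error, which is exactly why the result is only a local minimum and depends on the starting weights.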
-
- Posts: 868
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: The PSTs of Carnivor
PeSTO, imo, is tuned for the "90%". It is my prediction that if 1000 games were played where both sides castled queenside then PeSTO would not perform as well.
-
- Posts: 881
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: The PSTs of Carnivor
Mike Sherwin wrote: ↑Sat Mar 13, 2021 7:00 pm
  PeSTO, imo, is tuned for the "90%". It is my prediction that if 1000 games were played where both sides castled queenside then PeSTO would not perform as well.
I suppose with a custom opening book where the mainlines are removed we could test that! Does such a book exist?
-
- Posts: 868
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: The PSTs of Carnivor
lithander wrote: ↑Sat Mar 13, 2021 8:07 pm
  I suppose with a custom opening book where the mainlines are removed we could test that! Does such a book exist?
I think you can just download a large PGN opening book and use a database to select those openings in which both sides castle queenside, then save them to their own PGN file. Then use that file as the opening book in the GUI. I think any PGN file should work as the source. Maybe download a GM PGN file with a few hundred thousand high-quality games. I know Arena allows any PGN file to be used as an opening book to a set depth.
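If no database tool is at hand, the selection step could also be scripted. A rough Python sketch, assuming each game's movetext is available as a plain string in standard SAN with letter-O castling ("O-O-O") and without nested variations in parentheses; the function name is hypothetical and a real PGN library would be more robust:

```python
import re

def both_castle_queenside(movetext: str) -> bool:
    """True if White and Black each play O-O-O somewhere in the game."""
    # Drop {...} comments, NAGs like $1, and the trailing result marker.
    text = re.sub(r"\{[^}]*\}", " ", movetext)
    text = re.sub(r"\$\d+", " ", text)
    text = re.sub(r"(1-0|0-1|1/2-1/2|\*)\s*$", " ", text)
    # Drop move numbers such as "12." or "12...".
    text = re.sub(r"\d+\.+", " ", text)
    moves = text.split()
    white_moves = moves[0::2]  # moves alternate, starting with White
    black_moves = moves[1::2]

    def castled_long(side_moves):
        # startswith also accepts check or mate suffixes like "O-O-O+".
        return any(move.startswith("O-O-O") for move in side_moves)

    return castled_long(white_moves) and castled_long(black_moves)

game = "1. d4 d5 2. Nc3 Nc6 3. Bf4 Bf5 4. Qd2 Qd7 5. O-O-O O-O-O 1/2-1/2"
print(both_castle_queenside(game))
```

Games that pass the filter can then be written to their own PGN file and used as the opening book, truncated to whatever depth the GUI allows.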