Progress on Rustic

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Progress on Rustic

Post by Ras »

lithander wrote: Thu Mar 18, 2021 2:40 pm Googling "Quiet move history" didn't turn up many results, though. Do you have a link?
See https://www.chessprogramming.org/History_Heuristic.
And I tried Internal Iterative Deepening but I think my move generator is too slow for that to make it worth the additional cost at the rather low search depths (7-8) I achieve at the moment.
I'm doing IID only if the remaining search depth is at least 6 plies and no hash/PV move is available, and then with sharply reduced depth. You won't get spectacular results from IID - it works more like worst-case insurance, but as such it's pretty good to have.
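In rough Rust terms, that gating logic looks like this (the threshold and reduction below are illustrative values, not CT800's or anyone's actual code):

```rust
// Sketch of IID gating: only run the internal shallow search when
// enough depth remains and no hash/PV move is available. The exact
// threshold and reduction are assumptions for illustration.

const IID_MIN_DEPTH: i32 = 6; // try IID only with >= 6 plies left

fn iid_depth(depth: i32, has_hash_move: bool) -> Option<i32> {
    if depth >= IID_MIN_DEPTH && !has_hash_move {
        // Sharply reduced depth: the shallow search only has to
        // surface a plausible best move for ordering, not a score.
        Some(depth / 2)
    } else {
        None
    }
}
```

If `iid_depth` returns `Some(d)`, the node is first searched to depth `d` purely to obtain a move for ordering; otherwise IID is skipped.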
Rasmus Althoff
https://www.ct800.net
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Progress on Rustic

Post by mvanthoor »

The release for Rustic Alpha 2 has been republished:

https://github.com/mvanthoor/rustic/rel ... ag/alpha-2

Binaries have been added for Rustic Alpha 1.1, which fixes a bug where GUI overhead was not taken into account in MoveTime mode:

https://github.com/mvanthoor/rustic/rel ... /alpha-1.1

For rating list testers: if you don't use MoveTime mode (e.g., 5 seconds per move), there's no need to test Alpha 1.1.
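The idea behind the fix is roughly this (a sketch; the 50 ms overhead constant is an illustrative assumption, not Rustic's actual value):

```rust
// Reserve some milliseconds for GUI/communication overhead so the
// engine never overshoots a fixed-time-per-move budget and loses on
// time. The constant is an assumed example value.
const GUI_OVERHEAD_MS: u64 = 50;

fn movetime_budget(movetime_ms: u64) -> u64 {
    // saturating_sub avoids underflow for very short move times.
    movetime_ms.saturating_sub(GUI_OVERHEAD_MS)
}
```

With a 5-second MoveTime, the engine would then think for 4.95 seconds and spend the rest on communication.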

Thanks to mar and emadsen for the sparring thread regarding the transposition table. That made finding the problem easier than expected :)
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Progress on Rustic

Post by mvanthoor »

First skeleton of the tutorial/book is (finally) uploaded.

https://rustic-chess.org/

Now I "only" need to add pages and write them. If people have any suggestions, I'd be glad to hear them - except for "when are you going to add page X or Y", because I don't know. I'll probably start with the board representation, but I might suddenly switch to describing the PSTs. The site will fill out gradually and, at some point, catch up with Rustic's development stage.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Progress on Rustic

Post by mvanthoor »

Today the CCRL list was updated with the test results for Rustic Alpha 2:

Rustic Alpha 2

Thanks for testing guys :)

Even so, I think the engine's rating is disappointing: the addition of the TT + TT-move sorting yielded +200 Elo in self-play, and +165 in my gauntlets. That would put the engine at around 1840, but it only achieved 1781 on the CCRL list. There are a few possible explanations:

- I forgot to set up hash tables for all the other engines in my gauntlet, and ran both Rustic Alpha 1 and 2 against those engines. It could be that both massively overperformed. I'm rerunning two new gauntlets, with both Alpha 1 and Alpha 2.
- I only test Rustic against other engines: the other engines don't play one another. (I don't have the computing power to do this; if I upgrade to a 16+ core computer at some time, I may have the option to do a few round-robin tournaments.) It could be that my results are skewed because of this.
- CCRL used different time controls (2m+1s for them, and 1m+0.6s for me), different opening books, and apart from one, different opponents.
- It could also be that I mistakenly uploaded the binary with the TT move sorting bug in it, which _would_ have caused the engine to underperform by 60 Elo. I'll ask Gabor to redownload it to be sure. Even so, it doesn't matter much: even if I did, the next test with Alpha 3 will just show a bigger Elo jump.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Progress on Rustic

Post by Sven »

mvanthoor wrote: Sat Mar 27, 2021 11:28 pm but it only achieved 1781 in the CCRL-list. There can be a few possibilities:

[...]
- I only test Rustic against other engines: the other engines don't play one another. (I don't have the computing power to do this; if I upgrade to a 16+ core computer at some time, I may have the option to do a few round-robin tournaments.) It could be that my results are skewed because of this.
You can safely exclude this as a reason. Results from games between the other engines don't influence the rating estimate you get for your engine. A long while ago I had the same thought as you, but I learned that the opposite was true.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
emadsen
Posts: 434
Joined: Thu Apr 26, 2012 1:51 am
Location: Oak Park, IL, USA
Full name: Erik Madsen

Re: Progress on Rustic

Post by emadsen »

Sven wrote: Sun Mar 28, 2021 6:46 pm
mvanthoor wrote: Sat Mar 27, 2021 11:28 pm but it only achieved 1781 in the CCRL-list. There can be a few possibilities:

[...]
- I only test Rustic against other engines: the other engines don't play one another. (I don't have the computing power to do this; if I upgrade to a 16+ core computer at some time, I may have the option to do a few round-robin tournaments.) It could be that my results are skewed because of this.
You can safely exclude this as a reason. Results from games between the other engines don't influence the rating estimation you get for your engine. A long while ago I had the same thought as you but I learned that the opposite was true.
Sven, can you elaborate on this? My initial reaction is I don’t agree with you. A rating is meaningful only in relation to the ratings of other engines. It has no intrinsic significance. If that relationship is not well established due to lack of play against a variety of opponents, or lack of play of opponents amongst themselves, then the rating cannot be trusted. But I have not conducted a formal study of the matter. If you have, would you please share your results?

When I bought a new computer three years ago, the first thing I did was run tournaments among four classes of engines (separated by estimated strength), estimate Elo, then run more tournaments among the lower half of a class against the higher half of the lower class to ensure good cross-pollination of games and eliminate isolated groups of engines. See my Tournaments page for details.

I viewed this as a critical prerequisite before ever attempting to measure the strength of MadChess. I combine these games with games from a gauntlet tournament pitting a particular version of MadChess against ten opponents of strength in the range +/- 100 Elo.

Your comment above suggests this is unnecessary. I'm struggling to understand what I'm missing here, because my technique has produced private Elo estimates of MadChess's strength that correlate highly with CCRL Elo measurements when I release a public version of my engine. (I use Ordo and anchor four engines to their CCRL Elo ratings.) Interested to hear your thoughts.
My C# chess engine: https://www.madchess.net
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Progress on Rustic

Post by mvanthoor »

emadsen wrote: Sun Mar 28, 2021 7:35 pm Sven, can you elaborate on this? My initial reaction is I don’t agree with you. A rating is meaningful only in relation to the ratings of other engines. It has no intrinsic significance. If that relationship is not well established due to lack of play against a variety of opponents, or lack of play of opponents amongst themselves, then the rating cannot be trusted. But I have not conducted a formal study of the matter. If you have, would you please share your results?

When I bought a new computer three years ago, the first thing I did was run tournaments among four classes of engines (separated by estimated strength), estimate Elo, then run more tournaments among the lower half of a class against the higher half of the lower class to ensure good cross-pollination of games and eliminate isolated groups of engines. See my Tournaments page for details.

I viewed this as a critical prerequisite before ever attempting to measure the strength of MadChess. I combine these games with games from a gauntlet tournament pitting a particular version of MadChess against ten opponents of strength in the range +/- 100 Elo.

Your comment above suggests this is unnecessary. I’m struggling to understand what I’m missing here. Because my technique has produced private Elo estimates of MadChess strength that highly correlate with CCRL Elo measurements when I release a public version of my engine. (I use Ordo and anchor four engines to their CCRL Elo rating.) Interested to hear your thoughts.
I was having the same thoughts, basically. It is possible that A defeats B, and B defeats C, but C still defeats A. That ratings don't have to turn out as expected can be seen with the new engine BitGenie. In my own test of this new engine, it scored exactly 50% against Rustic Alpha 2, which would suggest that the engines end up at the same spot in the rating list, but Rustic is 30 Elo stronger on CCRL. Being in a rating list also means that if an engine's rating drops or increases, and your engine played against that engine, your engine's rating also changes.
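For reference, the standard logistic Elo model (generic math, nothing Rustic- or CCRL-specific) relates match score and rating difference like this:

```rust
// Elo difference implied by a match score (0.0 < score < 1.0).
fn elo_diff(score: f64) -> f64 {
    -400.0 * (1.0 / score - 1.0).log10()
}

// Expected score for a given Elo advantage.
fn expected_score(elo_diff: f64) -> f64 {
    1.0 / (1.0 + 10.0_f64.powf(-elo_diff / 400.0))
}
```

A 50% score maps to a 0 Elo difference, while a 30 Elo edge only predicts about a 54% score, so a 50% result over a modest number of games is quite compatible with a 30 Elo gap on the list.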

The one thing I was wondering about is:
- Did someone from CCRL grab the buggy binary and start a test with it? That would explain the 60 Elo difference between the result and my expectations. (Fixing the bug increased the engine's strength by 60 Elo.)
- Are my expectations skewed, because of the way I test the engine? Sven says no, you say yes (or, probably).
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
emadsen
Posts: 434
Joined: Thu Apr 26, 2012 1:51 am
Location: Oak Park, IL, USA
Full name: Erik Madsen

Re: Progress on Rustic

Post by emadsen »

mvanthoor wrote: Sun Mar 28, 2021 9:47 pm I was having the same thoughts, basically. It is possible that A defeats B, B defeats C, but still C can defeat A. That ratings don't have to turn out the same, even if expected, can be seen in the new engine BitGenie. In my own test of this new engine, it scored exactly 50% against Rustic Alpha 2, which would suggest that the engines would end up at the same spot in the rating list, but Rustic is 30 Elo stronger on CCRL.
Yes, this is very common. Engine A has eval knowledge that engine B doesn't, or "sees" moves engine B doesn't due to differences in search reductions, and it beats up engine B. However, engine C also has this particular eval knowledge, and/or also "sees" the same moves as engine A due to similar search reduction logic, and is faster or has a more efficient cache implementation, so it beats up engine A. Meanwhile, engine B leverages a different exploit to consistently defeat engine C. These non-transitive strength relationships occur frequently. That's why it's necessary to create a diverse pool of chess engines in order to improve the accuracy of engine strength measurements.

We don't need all-play-all tournaments. I can accurately predict the result of MadChess playing games against Stockfish. It will lose all the games. I don't need to include those games in my engine database. However, among engines within a couple hundred Elo points of each other, all-play-all tournaments reduce the error margins of engine strength estimates.
mvanthoor wrote: Sun Mar 28, 2021 9:47 pmDid someone from CCRL grab the buggy binary and start a test with it? That would explain the 60 Elo difference between the result and my expectations.
This reinforces the argument that one should never release a software update without incrementing the version number. In my opinion, the onus is on the software publisher (in this case, you as the chess engine author) to release a new version with the bug fix and with an incremented version number in the download file and in the id response to the uci command. In my opinion, it's too much to ask CCRL to track two binaries with the same version number but different code. Just my opinion; I'm not speaking for them.
My C# chess engine: https://www.madchess.net
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Progress on Rustic

Post by mvanthoor »

emadsen wrote: Sun Mar 28, 2021 10:39 pm This reinforces the argument that one should never release a software update without incrementing the version number. In my opinion, the onus is on the software publisher (in this case, you as the chess engine author) to release a new version with the bug fix and with an incremented version number in the download file and in the id response to the uci command. In my opinion, it's too much to ask CCRL to track two binaries with the same version number but different code. Just my opinion; I'm not speaking for them.
I've thought about it, and I probably will do so in the future. I've asked the CCRL testers if it's possible to remove the games from the DB and retest the engine, or to test it again in a fairly large tournament to "fix" the rating as much as possible. If this isn't possible or is undesirable, then I'll just leave it be. I assume the testers will at least replace the binary with the current one for future games against other engines, so the rating should fix itself in time.

When I release Alpha 3, it will have this fix (plus the improvements I intend it to have), and the Elo jump will then be bigger and correct again. Alpha 2 will then remain in the rating list with a somewhat lower rating.

The binary was up for about half an hour before I removed the release during our TT discussion.

Ah well; the Alpha releases are not that important, because I'm still adding basic stuff. Alpha 3 will have Killer+History, and maybe Aspiration Windows + PVS, but that could be Alpha 4, depending on how much Elo Killer+History brings. I don't yet know when I'll consider the engine "done" and ready to drop the Alpha part. This could be the version with the tuned eval, or the version with the tapered+tuned eval. I'll start on the tuning after PVS.

The worst that could happen is that Alpha 2 is in the list at a somewhat lower rating than expected; lesson learned.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
lithander
Posts: 880
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Progress on Rustic

Post by lithander »

emadsen wrote: Sun Mar 28, 2021 7:35 pm Sven, can you elaborate on this? My initial reaction is I don’t agree with you. A rating is meaningful only in relation to the ratings of other engines. It has no intrinsic significance. If that relationship is not well established due to lack of play against a variety of opponents, or lack of play of opponents amongst themselves, then the rating cannot be trusted. But I have not conducted a formal study of the matter. If you have, would you please share your results?
I think you can skip the part where these engines play each other if you know their ratings *and* use those ratings as anchors when you process your PGNs. In Ordo, for example, you can pass a CSV file via the -m parameter (it's explained in the manual).
emadsen wrote: Sun Mar 28, 2021 10:39 pm This reinforces the argument that one should never release a software update without incrementing the version number. In my opinion, the onus is on the software publisher (in this case, you as the chess engine author) to release a new version with the bug fix and with an incremented version number in the download file and in the id response to the uci command. In my opinion, it's too much to ask CCRL to track two binaries with the same version number but different code. Just my opinion; I'm not speaking for them.
I know of build systems that automatically increase the number with each build, or that include the version-control revision in the build. Definitely a best practice.

But in my private project, without a build system, I would have to remember to increase the version manually with every source push. So far I've only changed the version number when making a build, and I would certainly increment it whenever I release binaries. But that wouldn't stop someone from checking out a specific revision from git *after* important features have been added but *before* I tagged the next version; it would then play under the previous version number but much stronger.
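For a Rust project like Rustic, a minimal build.rs could embed the git revision automatically. This is a sketch, not Rustic's actual setup: it assumes the build runs inside a git checkout, and GIT_REV is a name I made up:

```rust
// build.rs sketch: embed the short git revision in the binary so
// every build is distinguishable, even between tagged releases.
use std::process::Command;

// Query the short git revision at build time; fall back to "unknown"
// when the build does not run inside a git checkout or git is absent.
fn git_rev() -> String {
    Command::new("git")
        .args(["rev-parse", "--short", "HEAD"])
        .output()
        .ok()
        .filter(|out| out.status.success())
        .map(|out| String::from_utf8_lossy(&out.stdout).trim().to_string())
        .unwrap_or_else(|| "unknown".to_string())
}

fn main() {
    // Expose the revision to the engine source, which can then report
    // it in the `id name` line, e.g. "Rustic Alpha 2 (a1b2c3d)".
    println!("cargo:rustc-env=GIT_REV={}", git_rev());
}
```

In Cargo, a file named build.rs in the crate root runs before compilation, and the cargo:rustc-env directive makes the value readable in the engine source via env!("GIT_REV").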

Do you increase the version with each push manually or do you have an automatic system?
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess