Progress on Rustic

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Progress on Rustic

Post by Ras »

lithander wrote: Thu Mar 18, 2021 2:40 pm Googling "Quiet move history" didn't turn up many results, though. Do you have a link?
See https://www.chessprogramming.org/History_Heuristic.
And I tried Internal Iterative Deepening but I think my move generator is too slow for that to make it worth the additional cost at the rather low search depths (7-8) I achieve at the moment.
I'm doing IID only if the remaining search depth is at least 6 plies and no hash/PV move is available, and then with sharply reduced depth. You won't get spectacular results from IID - it works more like worst-case insurance, but as such it's pretty good to have.
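In rough Rust terms, that gating logic looks like this (the threshold and reduction below are illustrative values, not CT800's or anyone's actual code):

```rust
// Sketch of IID gating: only run the internal shallow search when
// enough depth remains and no hash/PV move is available. The exact
// threshold and reduction are assumptions for illustration.

const IID_MIN_DEPTH: i32 = 6; // try IID only with >= 6 plies left

fn iid_depth(depth: i32, has_hash_move: bool) -> Option<i32> {
    if depth >= IID_MIN_DEPTH && !has_hash_move {
        // Sharply reduced depth: the shallow search only has to
        // surface a plausible best move for ordering, not a score.
        Some(depth / 2)
    } else {
        None
    }
}
```

If `iid_depth` returns `Some(d)`, the node is first searched to depth `d` purely to obtain a move for ordering; otherwise IID is skipped.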
Rasmus Althoff
https://www.ct800.net
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Progress on Rustic

Post by mvanthoor »

The release for Rustic Alpha 2 has been republished:

https://github.com/mvanthoor/rustic/rel ... ag/alpha-2

Binaries have been added for Rustic Alpha 1.1, which fixes a bug where GUI overhead was not taken into account in MoveTime mode:

https://github.com/mvanthoor/rustic/rel ... /alpha-1.1

For rating list testers: if you don't use MoveTime mode (e.g., 5 seconds per move), there's no need to test Alpha 1.1.
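The idea behind the fix is roughly this (a sketch; the 50 ms overhead constant is an illustrative assumption, not Rustic's actual value):

```rust
// Reserve some milliseconds for GUI/communication overhead so the
// engine never overshoots a fixed-time-per-move budget and loses on
// time. The constant is an assumed example value.
const GUI_OVERHEAD_MS: u64 = 50;

fn movetime_budget(movetime_ms: u64) -> u64 {
    // saturating_sub avoids underflow for very short move times.
    movetime_ms.saturating_sub(GUI_OVERHEAD_MS)
}
```

With a 5-second MoveTime, the engine would then think for 4.95 seconds and spend the rest on communication.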

Thanks to mar and emadsen for the sparring thread regarding the transposition table. That made finding the problem easier than expected :)
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Progress on Rustic

Post by mvanthoor »

First skeleton of the tutorial/book is (finally) uploaded.

https://rustic-chess.org/

Now I "only" need to add pages and write them. If people have any suggestions, I'd be glad to hear them - except for "when are you going to add page X or Y", because I don't know. I'll probably start with the board representation, but I might suddenly switch to describing the PSTs. The site will fill out gradually and, at some point, catch up with Rustic's development stage.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Progress on Rustic

Post by mvanthoor »

Today the CCRL list was updated with the test results for Rustic Alpha 2:

Rustic Alpha 2

Thanks for testing guys :)

Even so, I think the engine's rating is disappointing: the addition of the TT + TT-move sorting yielded +200 Elo in self-play, and +165 in my gauntlets. That would put the engine at around 1840, but it only achieved 1781 on the CCRL list. There are a few possible explanations:

- I forgot to set up hash tables for all the other engines in my gauntlet, and ran both Rustic Alpha 1 and 2 against those engines. It could be that both massively overperformed. I'm rerunning two new gauntlets, with both Alpha 1 and Alpha 2.
- I only test Rustic against other engines: the other engines don't play one another. (I don't have the computing power to do this; if I upgrade to a 16+ core computer at some time, I may have the option to do a few round-robin tournaments.) It could be that my results are skewed because of this.
- CCRL used different time controls (2m+1s for them, and 1m+0.6s for me), different opening books, and apart from one, different opponents.
- It could also be that I mistakenly uploaded the binary with the TT move sorting bug in it, which _would_ have caused the engine to underperform by 60 Elo. I'll ask Gabor to redownload it to be sure. Even so, it doesn't matter much: even if I did, the next test with Alpha 3 will just show a bigger Elo jump.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Progress on Rustic

Post by Sven »

mvanthoor wrote: Sat Mar 27, 2021 11:28 pm but it only achieved 1781 in the CCRL-list. There can be a few possibilities:

[...]
- I only test Rustic against other engines: the other engines don't play one another. (I don't have the computing power to do this; if I upgrade to a 16+ core computer at some time, I may have the option to do a few round-robin tournaments.) It could be that my results are skewed because of this.
You can safely exclude this as a reason. Results from games between the other engines don't influence the rating estimate you get for your engine. A long while ago I had the same thought as you, but I learned that the opposite was true.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
emadsen
Posts: 434
Joined: Thu Apr 26, 2012 1:51 am
Location: Oak Park, IL, USA
Full name: Erik Madsen

Re: Progress on Rustic

Post by emadsen »

Sven wrote: Sun Mar 28, 2021 6:46 pm
mvanthoor wrote: Sat Mar 27, 2021 11:28 pm but it only achieved 1781 in the CCRL-list. There can be a few possibilities:

[...]
- I only test Rustic against other engines: the other engines don't play one another. (I don't have the computing power to do this; if I upgrade to a 16+ core computer at some time, I may have the option to do a few round-robin tournaments.) It could be that my results are skewed because of this.
You can safely exclude this as a reason. Results from games between the other engines don't influence the rating estimation you get for your engine. A long while ago I had the same thought as you but I learned that the opposite was true.
Sven, can you elaborate on this? My initial reaction is I don’t agree with you. A rating is meaningful only in relation to the ratings of other engines. It has no intrinsic significance. If that relationship is not well established due to lack of play against a variety of opponents, or lack of play of opponents amongst themselves, then the rating cannot be trusted. But I have not conducted a formal study of the matter. If you have, would you please share your results?

When I bought a new computer three years ago, the first thing I did was run tournaments among four classes of engines (separated by estimated strength), estimate Elo, then run more tournaments among the lower half of a class against the higher half of the lower class to ensure good cross-pollination of games and eliminate isolated groups of engines. See my Tournaments page for details.

I viewed this as a critical prerequisite before ever attempting to measure the strength of MadChess. I combine these games with games from a gauntlet tournament pitting a particular version of MadChess against ten opponents of strength in the range +/- 100 Elo.

Your comment above suggests this is unnecessary. I'm struggling to understand what I'm missing here, because my technique has produced private Elo estimates of MadChess's strength that correlate highly with CCRL Elo measurements when I release a public version of my engine. (I use Ordo and anchor four engines to their CCRL Elo ratings.) Interested to hear your thoughts.
My C# chess engine: https://www.madchess.net
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Progress on Rustic

Post by mvanthoor »

emadsen wrote: Sun Mar 28, 2021 7:35 pm Sven, can you elaborate on this? My initial reaction is I don’t agree with you. A rating is meaningful only in relation to the ratings of other engines. It has no intrinsic significance. If that relationship is not well established due to lack of play against a variety of opponents, or lack of play of opponents amongst themselves, then the rating cannot be trusted. But I have not conducted a formal study of the matter. If you have, would you please share your results?

When I bought a new computer three years ago, the first thing I did was run tournaments among four classes of engines (separated by estimated strength), estimate Elo, then run more tournaments among the lower half of a class against the higher half of the lower class to ensure good cross-pollination of games and eliminate isolated groups of engines. See my Tournaments page for details.

I viewed this as a critical prerequisite before ever attempting to measure the strength of MadChess. I combine these games with games from a gauntlet tournament pitting a particular version of MadChess against ten opponents of strength in the range +/- 100 Elo.

Your comment above suggests this is unnecessary. I’m struggling to understand what I’m missing here. Because my technique has produced private Elo estimates of MadChess strength that highly correlate with CCRL Elo measurements when I release a public version of my engine. (I use Ordo and anchor four engines to their CCRL Elo rating.) Interested to hear your thoughts.
I was having the same thoughts, basically. It is possible that A defeats B, and B defeats C, but C still defeats A. That ratings don't have to turn out as expected can be seen with the new engine BitGenie. In my own test of this new engine, it scored exactly 50% against Rustic Alpha 2, which would suggest that the engines end up at the same spot in the rating list, but Rustic is 30 Elo stronger on CCRL. Being in a rating list also means that if an engine's rating drops or increases, and your engine played against that engine, your engine's rating also changes.
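For reference, the standard logistic Elo model (generic math, nothing Rustic- or CCRL-specific) relates match score and rating difference like this:

```rust
// Elo difference implied by a match score (0.0 < score < 1.0).
fn elo_diff(score: f64) -> f64 {
    -400.0 * (1.0 / score - 1.0).log10()
}

// Expected score for a given Elo advantage.
fn expected_score(elo_diff: f64) -> f64 {
    1.0 / (1.0 + 10.0_f64.powf(-elo_diff / 400.0))
}
```

A 50% score maps to a 0 Elo difference, while a 30 Elo edge only predicts about a 54% score, so a 50% result over a modest number of games is quite compatible with a 30 Elo gap on the list.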

The one thing I was wondering about is:
- Did someone from CCRL grab the buggy binary and start a test with it? That would explain the 60 Elo difference between the result and my expectations. (Fixing the bug increased the engine's strength by 60 Elo.)
- Are my expectations skewed, because of the way I test the engine? Sven says no, you say yes (or, probably).
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
emadsen
Posts: 434
Joined: Thu Apr 26, 2012 1:51 am
Location: Oak Park, IL, USA
Full name: Erik Madsen

Re: Progress on Rustic

Post by emadsen »

mvanthoor wrote: Sun Mar 28, 2021 9:47 pm I was having the same thoughts, basically. It is possible that A defeats B, B defeats C, but still C can defeat A. That ratings don't have to turn out the same, even if expected, can be seen in the new engine BitGenie. In my own test of this new engine, it scored exactly 50% against Rustic Alpha 2, which would suggest that the engines would end up at the same spot in the rating list, but Rustic is 30 Elo stronger on CCRL.
Yes, this is very common. Engine A has eval knowledge that engine B doesn't, or "sees" moves engine B doesn't due to differences in search reductions, and it beats up engine B. However, engine C also has this particular eval knowledge, and/or also "sees" the same moves as engine A due to similar search reduction logic, and is faster or has a more efficient cache implementation, so it beats up engine A. Meanwhile, engine B leverages a different exploit to consistently defeat engine C. These non-transitive strength relationships occur frequently. That's why it's necessary to create a diverse pool of chess engines in order to improve the accuracy of engine strength measurements.

We don't need all-play-all tournaments. I can accurately predict the result of MadChess playing games against Stockfish. It will lose all the games. I don't need to include those games in my engine database. However, among engines within a couple hundred Elo points of each other, all-play-all tournaments reduce the error margins of engine strength estimates.
mvanthoor wrote: Sun Mar 28, 2021 9:47 pmDid someone from CCRL grab the buggy binary and start a test with it? That would explain the 60 Elo difference between the result and my expectations.
This reinforces the argument that one should never release a software update without incrementing the version number. In my opinion, the onus is on the software publisher (in this case, you as the chess engine author) to release a new version with the bug fix and with an incremented version number in the download file and in the id response to the uci command. In my opinion, it's too much to ask CCRL to track two binaries with the same version number but different code. Just my opinion; I'm not speaking for them.
My C# chess engine: https://www.madchess.net
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Progress on Rustic

Post by mvanthoor »

emadsen wrote: Sun Mar 28, 2021 10:39 pm This reinforces the argument that one should never release a software update without incrementing the version number. In my opinion, the onus is on the software publisher (in this case, you as the chess engine author) to release a new version with the bug fix and with an incremented version number in the download file and in the id response to the uci command. In my opinion, it's too much to ask CCRL to track two binaries with the same version number but different code. Just my opinion; I'm not speaking for them.
I've thought about it, and I probably will do so in the future. I've asked the CCRL testers if it's possible to remove the games from the DB and retest the engine, or to test it again in a fairly large tournament to "fix" the rating as much as possible. If this isn't possible or is undesirable, then I'll just leave it be. I assume the testers will at least replace the binary with the current one for future games against other engines, so the rating should fix itself in time.

When I release Alpha 3, it will have this fix (plus the improvements I intend it to have), and the Elo jump will then be bigger and correct again. Alpha 2 will then remain in the rating list with a somewhat lower rating.

The binary was up for about half an hour before I removed the release during our TT discussion.

Ah well; the Alpha releases are not that important, because I'm still adding basic stuff. Alpha 3 will have Killer+History, and maybe Aspiration Windows + PVS, but that could be Alpha 4, depending on how much Elo Killer+History brings. I don't yet know when I'll consider the engine "done" and ready to drop the Alpha part. This could be the version with the tuned eval, or the version with the tapered+tuned eval. I'll start on the tuning after PVS.

The worst that could happen is that Alpha 2 is in the list at a somewhat lower rating than expected; lesson learned.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
lithander
Posts: 880
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Progress on Rustic

Post by lithander »

emadsen wrote: Sun Mar 28, 2021 7:35 pm Sven, can you elaborate on this? My initial reaction is I don’t agree with you. A rating is meaningful only in relation to the ratings of other engines. It has no intrinsic significance. If that relationship is not well established due to lack of play against a variety of opponents, or lack of play of opponents amongst themselves, then the rating cannot be trusted. But I have not conducted a formal study of the matter. If you have, would you please share your results?
I think you can skip the part where these engines play each other if you know their ratings *and* use those ratings as anchors when you process your PGNs. In Ordo, for example, you can pass a CSV file via the -m parameter (it's explained in the manual).
emadsen wrote: Sun Mar 28, 2021 10:39 pm This reinforces the argument that one should never release a software update without incrementing the version number. In my opinion, the onus is on the software publisher (in this case, you as the chess engine author) to release a new version with the bug fix and with an incremented version number in the download file and in the id response to the uci command. In my opinion, it's too much to ask CCRL to track two binaries with the same version number but different code. Just my opinion; I'm not speaking for them.
I know of build systems that automatically increase the number with each build, or that include the version-control revision in the build. Definitely a best practice.

But in my private project, without a build system, I would have to remember to increase the version manually with every source push. So far I've only changed the version number when making a build, and I would certainly increment it whenever I release binaries. But that wouldn't stop someone from checking out a specific revision from git *after* important features have been added but *before* I tagged the next version; it would then play under the previous version number but much stronger.
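For a Rust project like Rustic, a minimal build.rs could embed the git revision automatically. This is a sketch, not Rustic's actual setup: it assumes the build runs inside a git checkout, and GIT_REV is a name I made up:

```rust
// build.rs sketch: embed the short git revision in the binary so
// every build is distinguishable, even between tagged releases.
use std::process::Command;

// Query the short git revision at build time; fall back to "unknown"
// when the build does not run inside a git checkout or git is absent.
fn git_rev() -> String {
    Command::new("git")
        .args(["rev-parse", "--short", "HEAD"])
        .output()
        .ok()
        .filter(|out| out.status.success())
        .map(|out| String::from_utf8_lossy(&out.stdout).trim().to_string())
        .unwrap_or_else(|| "unknown".to_string())
}

fn main() {
    // Expose the revision to the engine source, which can then report
    // it in the `id name` line, e.g. "Rustic Alpha 2 (a1b2c3d)".
    println!("cargo:rustc-env=GIT_REV={}", git_rev());
}
```

In Cargo, a file named build.rs in the crate root runs before compilation, and the cargo:rustc-env directive makes the value readable in the engine source via env!("GIT_REV").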

Do you increase the version with each push manually or do you have an automatic system?
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess