emadsen wrote: ↑Sun Mar 28, 2021 7:35 pm
Sven, can you elaborate on this? My initial reaction is I don’t agree with you. A rating is meaningful only in relation to the ratings of other engines. It has no intrinsic significance. If that relationship is not well established due to lack of play against a variety of opponents, or lack of play of opponents amongst themselves, then the rating cannot be trusted. But I have not conducted a formal study of the matter. If you have, would you please share your results?

I think you can skip the part where these engines play each other if you know their relative ratings (from CCRL) *and* if you use these ratings as anchors when you process your PGNs. In Ordo, for example, you can pass a CSV file of anchors via the -m parameter (it's explained in the manual).
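To make the anchoring idea concrete, here is a minimal sketch that writes such an anchor list. The engine names and ratings are made up, and I'm assuming the file takes one quoted engine name plus its rating per row, so please check the Ordo manual for the exact format before relying on it.

```python
import csv
import io


def anchors_csv(anchors):
    """Render {engine name: rating} as rows like "Name",2100."""
    buf = io.StringIO()
    # QUOTE_NONNUMERIC quotes the names and leaves the ratings bare.
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for name, rating in anchors.items():
        writer.writerow([name, rating])
    return buf.getvalue()


# Hypothetical anchors: in practice, copy the names exactly as they
# appear in your PGN headers and take the ratings from the CCRL list.
ANCHORS = {"Engine A 1.0": 2100, "Engine B 2.3": 1950}

# Save this output as anchors.csv and pass that file to Ordo via -m.
print(anchors_csv(ANCHORS), end="")
```

The names have to match the PGN player names exactly, otherwise the anchors can't be matched to the engines in your games.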
emadsen wrote: ↑Sun Mar 28, 2021 10:39 pm
This reinforces the argument that one never release a software update without incrementing the version number. In my opinion, the onus is on the software publisher- in this case, you as the chess engine author- to release a new version with the bug fix and with an incremented version number in the download file and in the id response to the uci command. In my opinion, it's too much to ask CCRL to track this (two binaries with same version number but different code). Just my opinion- I'm not speaking for them.

I know of build systems that increase the number with each build automatically. Or that will include the revision of the version control system in the build. Definitely a best practice.
But with my private project and no build system, I would have to remember to increase the version manually with every source push. So far I have only changed the version number when making a build, and I would certainly increment it whenever I release binaries. But that wouldn't stop someone from checking out a specific revision from git *after* important features have been added but *before* I have tagged the next version; that build would play under the previous version number but be much stronger.
Do you increase the version with each push manually or do you have an automatic system?
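For what it's worth, one way to get the revision into the build without a full build system is to ask git at build time. This is only a sketch under my own assumptions: `MyEngine` and the `unknown` fallback are placeholders, and it presumes the script runs inside the checkout with release tags present.

```python
import subprocess


def git_describe():
    """Return the output of `git describe`, or None if git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--always", "--dirty"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git missing, or not inside a repository
    return result.stdout.strip()


def uci_id_name(engine, describe, fallback="unknown"):
    """Build the line an engine would send in reply to the uci command."""
    return f"id name {engine} {describe or fallback}"


# "MyEngine" is a placeholder name for this illustration.
print(uci_id_name("MyEngine", git_describe()))
```

On a tagged commit `git describe` prints just the tag. A few commits later it appends the commit count and an abbreviated hash, so a checkout taken between two releases no longer reports the old version number.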
Well, that's still bad because people will look for an engine rated ~1800, play Rustic Alpha 2 (the fixed build), which then plays 60 Elo stronger than advertised, and draw the wrong conclusions (e.g. their engines will appear weaker to them than they really are). So I think it's still worth investigating.
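To put a number on that, here is a small sketch using the standard logistic Elo formula (not any particular rating tool's exact model). A 60 Elo edge corresponds to an expected score of roughly 58 to 59 percent.

```python
def expected_score(elo_diff):
    """Expected score for the side that is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# An opponent advertised as an equal but actually 60 Elo stronger:
opponent = expected_score(60)
print(f"opponent scores about {opponent:.1%}, you score about {1 - opponent:.1%}")
```

So an engine that is truly ~1800 would score clearly below 50% against the fixed build, and its author could wrongly conclude that it is weaker than 1800.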