BayesElo recompiled and faster

mar
Posts: 2654
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: BayesElo recompiled and faster

Post by mar »

Modern Times wrote: Mon Dec 16, 2024 12:28 pm I tried it on a 1.2M game database. Identical output and around 15% faster than your January x64 compile of the existing code.
ok that looks good, not the biggest improvement ever but at least there's something
Modern Times
Posts: 3703
Joined: Thu Jun 07, 2012 11:02 pm

Re: BayesElo recompiled and faster

Post by Modern Times »

Yes, every bit is worth having.
mar

Re: BayesElo recompiled and faster

Post by mar »

ok - uploaded a new version with some more autovec hints that measurably speed up mm,
link is the same, may need a browser refresh

(got ~12% total speedup for step 4 of CCRL update on a small sample set (year 2024) - might be worth a shot)
Modern Times

Re: BayesElo recompiled and faster

Post by Modern Times »

I think the next possible "win" is for Graham, Basti and Gabor to investigate the 7-zip version and options.

I recall that at the outset a balance had to be struck between speed, size of the compressed files, and memory usage.

Currently the parameters are:

m0=ppmd: This sets the compression method to PPMd.
mem=128m: This sets the amount of memory the PPMd model uses to 128 megabytes.
o=7: This sets the PPMd model order to 7; a higher order generally compresses text better but more slowly.
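
For reference, these switches are passed to 7-Zip as one colon-separated method string. A minimal sketch that only assembles the command line and does not run 7z; the function name and the archive and file names are made up for illustration:

```python
def ppmd_args(archive, files, mem="128m", order=7):
    """Build a 7z 'add' command that uses PPMd.

    mem   -> value for mem= (PPMd memory size), or None to use the default
    order -> value for o= (PPMd model order), or None to use the default
    """
    method = "-m0=ppmd"
    if mem is not None:
        method += f":mem={mem}"
    if order is not None:
        method += f":o={order}"
    return ["7z", "a", method, archive, *files]


# Current settings, then the same command with mem= and o= left at defaults.
print(" ".join(ppmd_args("games.7z", ["games.pgn"])))
print(" ".join(ppmd_args("games.7z", ["games.pgn"], mem=None, order=None)))
```

Timing both variants on the same PGN file and comparing archive sizes would show how much the explicit settings cost or save.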

With the size of the archives generated, there is a trade-off - it is probably possible to create the archives faster, but if they are bigger then there is a cost in upload time to the server, and more server disk space used, although the latter is no issue.

Using the latest version of 7-zip may also help.

The download options particularly on the 4015 site are extensive, hundreds of files. So there could be some improvements to be had. First thing for them to experiment with is to use the latest 7-zip version with the defaults and see how that compares.
mar

Re: BayesElo recompiled and faster

Post by mar »

as for the compression step: might be worth a shot.
I succeeded in parallelizing the pgn compression step so it's much faster now (scales with the number of cores),
but the upload will still suffer. investigating better compression options/recent 7z might be worth a try.
another option would be to create say a tar or another archive so that the upload would only be 1 huge file that would
be further unpacked on the server
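
A minimal sketch of per-file parallel compression, not the actual CCRL script. It assumes each pgn file is compressed independently, and it uses Python's built-in lzma as a stand-in because the standard library has no PPMd; a real script would more likely launch one 7z process per file. The function names are made up.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def compress_one(src):
    """Compress one file to <name>.xz next to it and return the new path."""
    src = Path(src)
    dst = src.with_name(src.name + ".xz")
    dst.write_bytes(lzma.compress(src.read_bytes()))
    return dst


def compress_all(paths, workers=4):
    """Compress many files concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_one, paths))
```

Because the files do not depend on each other, this is the easy kind of step to parallelize: the only shared state is the list of paths.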

I only have a small subset of years (2023 and 2024) so I don't even know which step is the slowest.
but going from 1=>2 years made step 4 (rating history) 4x slower, i.e. quadratic wrt either number of engines or games;
judging from bayeselo code this should only depend on engines times opponents, but I may be wrong
best would be if someone could make mm incremental somehow for this step, but that's way above my paygrade
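
The "quadratic" reading follows from the two timings: if run time grows like n^k, two measurements give k = log(t2/t1) / log(n2/n1). A small sketch, assuming the input size roughly doubles when going from one year to two (the function name is made up):

```python
import math


def scaling_exponent(n1, t1, n2, t2):
    """Estimate k in t ~ n**k from two (size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# 1 year -> 2 years of games, step 4 becomes 4x slower: k comes out at about 2.
print(scaling_exponent(1, 1.0, 2, 4.0))
```

Two data points cannot tell whether the driver is the number of engines or the number of games; a third run with a different mix would be needed for that.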

I'm currently toying with using different mm parameters for the history graphs, but I'd have to fix some offsets first to match to the rating list. doing mm instead of mm 1 1 gave a nice 2.2x total speedup (in reality faster because step 4 includes graph generation and other steps that take some time)
I'm even wondering whether those graphs are really that useful; perhaps dropping them altogether would speed things up a lot, food for thought

I tried to parallelize the sorting step, but it seems some/many parts depend on each other so out of luck, might be able to squeeze
something out of it but I'd have to figure out which parts can be coupled and run in parallel

we'll see, either way the whole process is way more involved and complex than I thought
(I'm learning to appreciate the amount of work that went into both the development and also into building each update -
we typically only care about final elo gains of our engines)
and many parts depend on each other so they'd be hard/impossible to parallelize, unfortunately
Modern Times

Re: BayesElo recompiled and faster

Post by Modern Times »

I got rid of mem=128m and o=7, and it compresses faster for sure.

m0=ppmd needs to be retained - it is the best option for text files apparently, which pgn files are.
Modern Times

Re: BayesElo recompiled and faster

Post by Modern Times »

mar wrote: Sun Dec 22, 2024 12:57 pm we'll see, either way the whole process is way more involved and complex than I thought
(I'm learning to appreciate the amount of work that went into both the development and also into building each update -
we typically only care about final elo gains of our engines)
Yes, a huge amount goes on.

For example, originally Kirill just had to parse chessbase format pgns. Then several other GUIs came along, sometimes with a slightly different pgn format, and not to mention Arena pgns where some testers had the "save mainline" option enabled in the early days, resulting in truly awful pgns.
Modern Times

Re: BayesElo recompiled and faster

Post by Modern Times »

mar wrote: Sun Dec 22, 2024 12:57 pm we'll see, either way the whole process is way more involved and complex than I thought
(I'm learning to appreciate the amount of work that went into both the development and also into building each update -
we typically only care about final elo gains of our engines)
The CCRL scripts came from work Kirill did on his own (long discontinued) website

https://kirill-kryukov.com/chess/kcec/
mar

Re: BayesElo recompiled and faster

Post by mar »

oh nice, good to know what KCEC actually stands for and where the scripts originate from :) good stuff

meanwhile I succeeded at parallelizing step 4 (rating history) with some extra magic, now getting about 2x speedup total for that step (including the latest bayeselo avx2 compile), unfortunately couldn't spawn more than 3 workers for that step for some reason
(so only 4 threads are crunching in parallel since the calling thread helps), but still better than nothing.
I've already shared the changed scripts with Graham, hopefully this might chop some time off the process, along with fully parallelized pgn compression.
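
The "3 workers plus the calling thread" arrangement can be sketched as a shared task queue that the caller drains alongside the helpers. This is only an illustration of the pattern under that assumption, not how the changed scripts actually do it; all names are made up, and a real rating-history step would probably hand each task to a separate bayeselo process.

```python
import queue
import threading


def run_with_helpers(tasks, func, helpers=3):
    """Run func over tasks using `helpers` threads plus the calling thread."""
    tasks = list(tasks)
    todo = queue.SimpleQueue()
    for index, task in enumerate(tasks):
        todo.put((index, task))
    results = [None] * len(tasks)

    def drain():
        # Keep taking tasks until the queue is empty.
        while True:
            try:
                index, task = todo.get_nowait()
            except queue.Empty:
                return
            results[index] = func(task)

    threads = [threading.Thread(target=drain) for _ in range(helpers)]
    for thread in threads:
        thread.start()
    drain()  # the caller works too instead of just waiting
    for thread in threads:
        thread.join()
    return results


print(run_with_helpers(range(8), lambda x: x * x))
```

With helpers=3 there are four threads pulling from the queue, which matches the numbers in the post. The sketch does not propagate exceptions raised inside helper threads.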
Modern Times

Re: BayesElo recompiled and faster

Post by Modern Times »

Sounds good.

The blitz update is already fairly reasonable - less than an hour and a half for me and Gabor, but the 4015 one is where a reduction will be really useful. I can't recall how long that took last time I ran it, maybe close to 3 hours.