Edmund wrote:Say I have a collection of games I want to store in a database using as little memory as possible.
Because the move-choice in the opening is very restricted I would like to save the first couple of moves by linking into a table with n entries.
Is there any algorithm that would tell me how to optimally populate the table with opening moves?
It sort of reminds me on huffman-encoding, but it is slightly different.
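(To illustrate the question: one simple greedy way to populate such a table is to count how often each opening prefix occurs in the game collection and keep the n most frequent ones, so common openings get short indices. A sketch in Python, with made-up toy data; nothing here claims to be optimal:)

```python
from collections import Counter

def build_opening_table(games, n, prefix_len=8):
    """Greedily pick the n most frequent opening prefixes.

    games: iterable of move sequences (tuples of move strings).
    Returns a dict mapping prefix -> table index, most frequent first.
    """
    counts = Counter()
    for game in games:
        # Count every prefix up to prefix_len moves, so a popular
        # short prefix can win over a rarer long one.
        for k in range(1, min(prefix_len, len(game)) + 1):
            counts[tuple(game[:k])] += 1
    top = [prefix for prefix, _ in counts.most_common(n)]
    return {prefix: i for i, prefix in enumerate(top)}

# Hypothetical toy collection: three games, two sharing an opening.
games = [
    ("e4", "e5", "Nf3", "Nc6"),
    ("e4", "e5", "Nf3", "Nf6"),
    ("d4", "d5", "c4"),
]
table = build_opening_table(games, n=4)
```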
Everything posted so far is 1980s technology.
You can go two ways: either use a clever format that stores things in few bits and be happy with that, or really dig into compression algorithms.
The known compression algorithms will of course hammer down the size of such a database in a way no clever format can match.
For anything I did in computer chess, especially bittables but also games, the totally superior algorithm is PPM*, with the * (asterisk) standing for all sorts of variations on the PPM algorithm. It's the multidimensional compression idea of PPM that kicks so much butt. Literally following it is never a good idea in fields like this; it's about the spirit and the open mind.
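(The spirit of PPM, in a nutshell: predict the next symbol from the longest context seen so far, and fall back to shorter contexts when the long one has no statistics. A toy order-2 sketch of just the prediction part, my own illustration, not a real coder:)

```python
from collections import defaultdict, Counter

class ToyPPM:
    """Toy PPM-style predictor: keeps symbol counts per context of
    orders 2, 1 and 0, and predicts from the longest context that
    has any statistics, escaping to shorter contexts otherwise."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        # contexts[k] maps a k-symbol context tuple to symbol counts.
        self.contexts = [defaultdict(Counter) for _ in range(max_order + 1)]

    def update(self, history, symbol):
        for k in range(self.max_order + 1):
            ctx = tuple(history[-k:]) if k else ()
            self.contexts[k][ctx][symbol] += 1

    def predict(self, history):
        # Longest-context-first: the essence of PPM's fallback scheme.
        for k in range(self.max_order, -1, -1):
            ctx = tuple(history[-k:]) if k else ()
            counts = self.contexts[k].get(ctx)
            if counts:
                return counts.most_common(1)[0][0]
        return None

model = ToyPPM()
data = "abracadabra"
for i, ch in enumerate(data):
    model.update(data[:i], ch)
```

A real PPM coder feeds those context probabilities into an arithmetic coder and handles escape probabilities properly; this only shows the multi-order fallback idea.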
Now realize that a big winner for many modern compressors is a sophisticated, huge dictionary. That is tricky to use here, as it means you need to decompress everything starting from the start.
The bad news is that there aren't many compressors that can decompress blockwise and still compress well.
Yet if you combine blockwise decompression with PPM*, you'll beat the hell out of anything compression-wise in computer chess, be it games, be it EGTBs.
By the way, for such supercompression attempts, don't just go by the compression benchmark lists. Those proved totally misleading for my EGTBs: usually someone with some parameter tuning gets the tiny benchmark set really tiny, and when I tried the same settings, the resulting compressed Diep files were factors bigger.
For huge databases, supercompression versus normal fast compression is a really huge difference in size; a factor of 4 is quite common.
I had one of the guys hosting a supercompression website (I can dig up the chap's name if you're interested) compare all compressors on a few smaller Diep EGTBs (uncompressed around 300MB). I was quite shocked to see how an oldie program out of New Zealand kicked the hell out of anything I thought was good.
Realize the big problem, however: all those compressors decompress hardly any faster than they compress.
If it takes 24 hours on one core to compress, say, 200MB, it'll also take half a day to decompress that 200MB.
So it would be quite a challenge to show up with something good that isn't just doing some braindead run-length encoding.
The idea of ordering the moves a tad is actually not so stupid. That will get you magnificent compression even with relatively braindead compression algorithms.
What's relatively fast is the Kadatch compressor. If you send him an email, maybe he'll want to contribute it. It uses old methods, yet it's very, very fast and it can work blockwise, so you unpack just a 64KB block (which is probably what you want) without needing to decompress everything else, and it kicks the hell out of old slow junk like gzip, bzip2 and such.
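(The blockwise trick itself is easy to mimic with a stock library: compress in independent 64KB chunks and keep a size table, so one chunk can be inflated without touching the rest. A sketch using Python's zlib as a stand-in, since I don't have the Kadatch code:)

```python
import zlib

BLOCK = 64 * 1024  # 64KB blocks, each decompressible on its own

def compress_blocks(data):
    """Compress data as independent blocks; return (packed, sizes)."""
    blobs = [zlib.compress(data[i:i + BLOCK], 9)
             for i in range(0, len(data), BLOCK)]
    return b"".join(blobs), [len(b) for b in blobs]

def read_block(packed, sizes, k):
    """Inflate only block k, using the size table as an index."""
    start = sum(sizes[:k])
    return zlib.decompress(packed[start:start + sizes[k]])

data = bytes(range(256)) * 1024          # 256KB of sample data
packed, sizes = compress_blocks(data)
assert read_block(packed, sizes, 2) == data[2 * BLOCK:3 * BLOCK]
```

The price of independence is that each block starts with an empty dictionary, which is exactly the compression-ratio penalty discussed above.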
The interesting thing would be to build a PPM compressor that takes huge time to compress but stores things in a simple-to-decompress manner. Of course that will eat some overhead, but you'd still end up at 25% of the size of most other compressors and half the size of 7-zip.
Note that 7-zip does a pretty good job, yet of course it uses a dictionary, and I bet that's a problem for you.
Note that compression is a field much like computer chess: you can get anything you want, customized as you like, as long as you are prepared to pay. Something world-class doesn't cost that much more, and you'll kick butt with your code then. You get what you pay for.
Want the world's strongest engine? No problem, just pay someone a salary. Peanuts. A big range of programmers can deliver it to you then; just know whom you pick.
Usually they pick the guy who never invented an algorithm in that field: bad idea. Same with compression...
Without payment, be happy with 1980s algorithms combined a tad. Except for Kadatch I don't know any compressor that can compress buckets very well. You can try to adapt 7-zip to it, but I'm not sure that's a good idea; once it loses the dictionary I bet it'll suck.
If you want to roll something yourself, realize that a PPM-type algorithm that works per bucket, decompresses fast, and meanwhile takes huge time to compress would be something you could really sell to companies storing data...