syzygy wrote:
Dann Corbit wrote:
Some (such as LZ4 HC) compress very slowly but get a very good compression ratio and decompress with blazing speed.
That sort of thing is IDEAL for chess endgame database systems.
But many of these compression algorithms only work well with very large block sizes, which is not ideal for endgame databases that need to be probed during the search.
Another thing is that when storing a chess endgame database, it is possible to leave out a lot of information that can be quickly recomputed. A dedicated compression algorithm can make better use of this fact than a general purpose compressor.
Ronald,
could you please provide the final compression ratio for the Syzygy TBs?
Just to have an idea of the compression efficiency of your algorithm.
Also could you please elaborate on how your compression algorithm can make better use of leaving out information that can be quickly recomputed?
My WDL checkers database is compressed with a RLE scheme.
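Roughly, the idea is something like this (just an illustrative sketch; the (value, run length) byte pairs and the names are made up for the example, not my exact format):
Code:
#include <stdint.h>
#include <stddef.h>

/* WDL value per position: 0 = loss, 1 = draw, 2 = win.                    */
/* Run-length encode as (value, run length) byte pairs, runs capped at 255. */
size_t rle_encode(const uint8_t *wdl, size_t n, uint8_t *out)
{
    size_t o = 0, i = 0;
    while (i < n) {
        uint8_t v = wdl[i];
        size_t run = 1;
        while (i + run < n && wdl[i + run] == v && run < 255)
            run++;
        out[o++] = v;
        out[o++] = (uint8_t)run;
        i += run;
    }
    return o;  /* compressed size in bytes */
}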
In my DTM checkers database I use another compressor.
Like you, I prefer to decompress until I reach the key index every time I probe.
I use a very fast and pure LZ decompressor (no Huffman), with 4K block sizes for fast decompression.
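A probe then looks roughly like this (schematic only; decompress_block() stands in for my pure-LZ decoder, and the block offset and length tables are simplified):
Code:
#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE 4096   /* uncompressed values per block */

/* Stand-in for my pure-LZ (no Huffman) decoder: expands one compressed
   block into BLOCK_SIZE uncompressed values. */
void decompress_block(const uint8_t *comp, size_t comp_len, uint8_t *out);

/* Probe: locate the block containing 'index', decompress only that
   block, and read the value at the right offset. */
uint8_t probe(const uint8_t *comp_data,
              const uint32_t *block_offsets,   /* start of each compressed block   */
              const uint32_t *block_lengths,   /* compressed length of each block  */
              uint64_t index)
{
    uint8_t buf[BLOCK_SIZE];
    uint64_t block  = index / BLOCK_SIZE;
    uint64_t offset = index % BLOCK_SIZE;

    decompress_block(comp_data + block_offsets[block],
                     block_lengths[block], buf);
    return buf[offset];
}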
But I'm getting a poor compression ratio: the compressed data is about 12% of the uncompressed size.
I mark positions with captures (for the side to move and for the side not to move) as "don't cares".
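Marking them as don't cares just means the compressor is free to store whatever value helps it most there, since the probe code can resolve such positions itself with a quick capture search. A simple way to exploit that (illustrative only; has_capture_stm()/has_capture_opp() stand in for my move generator) is to overwrite those positions with the previous value so runs and matches get longer:
Code:
#include <stdint.h>
#include <stdbool.h>

/* Stand-ins for my move generator / index-to-board decoding. */
bool has_capture_stm(uint64_t index);   /* side to move has a capture     */
bool has_capture_opp(uint64_t index);   /* side not to move has a capture */

/* Before compression, overwrite don't-care positions with the previous
   stored value, so the compressor sees longer runs / longer matches. */
void fill_dont_cares(uint8_t *values, uint64_t n)
{
    if (n == 0)
        return;
    uint8_t prev = values[0];
    for (uint64_t i = 0; i < n; i++) {
        if (has_capture_stm(i) || has_capture_opp(i))
            values[i] = prev;          /* repeat previous value */
        else
            prev = values[i];
    }
}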
If your compression ratio is better, then I'm thinking of doing something like you are doing.
At this precise moment I'm compressing my 8-man DTM checkers database.
The uncompressed size is about 460 GB, and I'm expecting about 60 GB for the compressed size, which is still a bit poor as a compression ratio.
So could you please explain how your compression algorithm takes advantage of information that can be quickly recomputed?
best regards,
Alvaro