michiguel wrote:diep wrote:michiguel wrote:I am done writing the compressor for the endgame tablebases (tbcp). I have done a basic research on some compression schemes, all of them incorporated as an option. The compressor/decompressor help is at the end.
The analysis of the different schemes are here:
http://sites.google.com/site/gaviotachessengine/schemes
All of them, AFAIK, are either BSD license, or public domain. I wrote myself the huffman coding and the RLE. The other ones are from free libraries. There gazillion of good alternatives, but most of them are GPLed (and the idea is to have a more permissive option).
The default will be
LZMA, which is the one that provides the smallest memory. AFAIK, it is smaller than Nalimov's and it fits in a 8 GB USB stick. However, there are other valid alternatives with potential (Zlib-9 and LZF). Those could give better performance in the future, as the people move to solid state disks. The decompression speed of LZF is close to the speed of light (and the source code is really small!!).
I still have to try several ideas to improve this, and a couple of other compression libraries (which may be faster but no chance to compete with the compression ratio of LZMA). But I do not want to delay this any further.
So, there is a chance (but slim) that I can find a better alternative, or more likely, a better compression algorithm & free library could be designed by somebody else in the future. In those cases, I would like to keep the flexibility for progress. I think that I may request to set LZMA as a default that every engine developer should support if they support Gaviota TBs, and, if in the future we find something else, we could move to a new scheme, but always supporting LZMA to guarantee backward compatibility. In other words, if testers decide to maintain only one copy of LZMA version, they should always be covered. The library with the probing code that I will release, should be able to this, but the engine developers will be responsible to provide a switch or a WB/UCI option etc. so the user can inform the engine what TBs are available.
Miguel
Code: Select all
miguel@avefenix:/media/bigdisk/dev/tbcp$ ./tbcp -h
Compression/decompression utility for Gaviota Table Bases.
It uses different free compression algorithms. More information at
http://sites.google.com/site/gaviotachessengine/
quick example: tbcp gtb
compresses all the endgame table base files in the folder "gtb"
with the default compression scheme 2, and no deletion of the source files
usage: tbcp [-OPTION]... [folder]
-h give this help
-v give version number and exits
-c compress (default)
-d decompress
-k keep source files, after compression/decompression (default)
-r remove source files, after compression/decompression
-V verify encoding/decoding on the fly
-q quiet (no output except error messages)
-i <file> work on the given individual file, rather than a folder
-s # sets compression scheme (0-2, default=2)
-n # operates on up to 3-, 4- o 5-piece tablebases (default=5)
Copyright (c) 2009-2010 Miguel A. Ballicora
There is NO WARRANTY of any kind
Hello Miguel,
How does your compressor compare sizewise to Kadatch?
Apparently, when I use LZMA (the heart of 7z) it is a bit better than Kadatch = 6.5 GiB vs 7.2 GiB.
The algorithms you quote seem like an 80s chess program to me.
LZMA is what 7z uses!
BTW, I am testing what is open source with a BSD type license. The idea is to release the probing code so everybody can use it the tables.
Compression is an entire field of course; a good todays compressor is really hard to build and fulltime work, just like a strong chessprogram is.
Vincent
Yes, I am plugging available source code (some of them needed to be tweaked). I coded myself Huffman coding just for fun, that was it. I am not intending to be perfect, just to have something good enough. I think that LZMA is not worst that Nalimov/Kadatch, and that is ok for me.
Miguel
Oh wait i made a mistake, as i assumed lzma not usable because of the dictionary it used. Note Kadatch is just some 80s techniques programmed very well by him; it's really a fast compressor. Yet only works 32 bits and works at page sizes that are uncompressed whereas i like to have it work different, see below.
Kadatch doesn't use those big dynamic dictionaries, yet it compresses factors worse (factor 3 or so) than possible with a generic compressor.
Realize those dictionaries really compress most of all. I have 1 compressor from an english student here which just using a good dictionary at lightning speed gets a real good compression.
7-zip here using its default dictionary is in fact worse than that.
However to get 1 byte out of the entire EGTB you need to uncompress the entire thing, that is a hell of a disadvantage.
What i'm looking for is in fact a compressor that can compress to bucket sizes; so the buckets that are compressed have n bytes max, not the uncompressed data being n bytes; this because some EGTBs here get a factor 2000 compression or so. Of course when decompressing we don't want to store it in the RAM at all, we just want the byte number we look for.
The best compression is a special PPM compressor as i posted before, so i consider that most interesting.
Mind if i email you some diep EGTBs to you, to see how your compressor is doing at it?
note i'm at messengers under diepchess (skype,yahoo,aim etc).
Vincent