Yet, they are correct...Teemu Pudas wrote:Because hard drive manufacturers are liars. 1GB = 10^9 bytes according to them.
Miguel
The ability of 7z (LZMA internally, I think) to compress these TBs is stunning. Unfortunately, when they are compressed in blocks, it loses that spectacular ratio. Still, it is the best by a clear margin. Fitting on a DVD is very appealing for storage and/or distribution. How are Nalimov TBs distributed by companies? On two DVDs?beachknight wrote:I have zipped all 145 files (38,4 GB) using 7-zip with Ultra & Maximum
schemes. Resulting compressed size is around 3,9 GB and
compression ratio is around 10 %.
It seems that 1 normal DVD is sufficient for storing the Gaviota TBs.
Best,
Don't know exactly, but I never saw a single DVD that contains all of them.michiguel wrote:The ability of 7z (LZMA internally, I think) to compress these TBs is stunning. Unfortunately, when they are compressed in blocks, it loses that spectacular ratio. Still, it is the best by a clear margin. Fitting on a DVD is very appealing for storage and/or distribution. How are Nalimov TBs distributed by companies? On two DVDs?beachknight wrote:I have zipped all 145 files (38,4 GB) using 7-zip with Ultra & Maximum
schemes. Resulting compressed size is around 3,9 GB and
compression ratio is around 10 %.
It seems that 1 normal DVD is sufficient for storing the Gaviota TBs.
Best,
Miguel
Companies also include some 6-piece tablebases. For example, I found this in the old Convekta store: "The most complete collection of Nalimov tablebases on 12 DVDs of double density (100 GB in total)"michiguel wrote: How are Nalimov TBs distributed by companies? On two DVDs?
Miguel
I had some communication with Matt Mahoney years ago. His responses didn't impress me a lot (understatement). What he produced to 'top the charts' is basically something that is not very usable for EGTB compression; he just cut and pasted together what was needed to get the job done there, based upon statistics. As an all-round compressor it is useless. Adapting a form of PPM into something that works bucket-sized, with some overhead within each bucket to allow lookups of the right compressed data in O(log n), is going to be far more successful than whatever he has coded.michiguel wrote:From what I read, LZMA is LZ77 + arithmetic coding, so I guess that there is literature for both. From all the links I surveyed to collect free libraries, I may conclude that it may be a good idea to ask this guyPradu wrote:Are there any good resources describing LZMA? I realize that there's already some C code, but I don't think it will be that useful to learn the algorithm.michiguel wrote:I am done writing the compressor for the endgame tablebases (tbcp)...
http://www.mattmahoney.net/
He is a professor, an expert in data compression, and runs an impressive benchmark page:
http://www.mattmahoney.net/dc/text.html
Miguel
Hello Miguel,michiguel wrote:I am done writing the compressor for the endgame tablebases (tbcp). I have done a basic research on some compression schemes, all of them incorporated as an option. The compressor/decompressor help is at the end.
The analysis of the different schemes are here:
http://sites.google.com/site/gaviotachessengine/schemes
All of them, AFAIK, are either BSD-licensed or public domain. I wrote the Huffman coding and the RLE myself. The other ones are from free libraries. There are gazillions of good alternatives, but most of them are GPLed (and the idea is to have a more permissive option).
The default will be LZMA, which is the one that gives the smallest total size. AFAIK, it is smaller than Nalimov's and it fits on an 8 GB USB stick. However, there are other valid alternatives with potential (Zlib-9 and LZF). Those could give better performance in the future, as people move to solid-state disks. The decompression speed of LZF is close to the speed of light (and the source code is really small!!).
I still have to try several ideas to improve this, and a couple of other compression libraries (which may be faster, but have no chance to compete with the compression ratio of LZMA). But I do not want to delay this any further.
So, there is a chance (but a slim one) that I can find a better alternative or, more likely, that a better compression algorithm and free library will be designed by somebody else in the future. In those cases, I would like to keep the flexibility for progress. I think that I may request that LZMA be the default that every engine developer should support if they support Gaviota TBs; if in the future we find something else, we could move to a new scheme, but always supporting LZMA to guarantee backward compatibility. In other words, if testers decide to maintain only one copy, the LZMA version, they should always be covered. The library with the probing code that I will release should be able to do this, but the engine developers will be responsible for providing a switch or a WB/UCI option etc. so the user can inform the engine what TBs are available.
Miguel
Code: Select all
miguel@avefenix:/media/bigdisk/dev/tbcp$ ./tbcp -h
Compression/decompression utility for Gaviota Table Bases.
It uses different free compression algorithms.
More information at http://sites.google.com/site/gaviotachessengine/

quick example:
  tbcp gtb
    compresses all the endgame table base files in the folder "gtb"
    with the default compression scheme 2, and no deletion of the source files

usage: tbcp [-OPTION]... [folder]
  -h         give this help
  -v         give version number and exit
  -c         compress (default)
  -d         decompress
  -k         keep source files, after compression/decompression (default)
  -r         remove source files, after compression/decompression
  -V         verify encoding/decoding on the fly
  -q         quiet (no output except error messages)
  -i <file>  work on the given individual file, rather than a folder
  -s #       sets compression scheme (0-2, default=2)
  -n #       operates on up to 3-, 4- or 5-piece tablebases (default=5)

Copyright (c) 2009-2010 Miguel A. Ballicora
There is NO WARRANTY of any kind
Apparently, when I use LZMA (the heart of 7z), it is a bit better than Kadatch: 6.5 GiB vs. 7.2 GiB.diep wrote:Hello Miguel,michiguel wrote:I am done writing the compressor for the endgame tablebases (tbcp). I have done basic research on some compression schemes, all of them incorporated as an option. The compressor/decompressor help is at the end.
How does your compressor compare sizewise to Kadatch?
LZMA is what 7z uses!
The algorithms you quote seem like an 80s chess program to me.
Yes, I am plugging in available source code (some of it needed to be tweaked). I coded the Huffman coding myself just for fun; that was it. I am not intending to be perfect, just to have something good enough. I think that LZMA is not worse than Nalimov/Kadatch, and that is OK with me.
Compression is an entire field, of course; a good present-day compressor is really hard to build and is full-time work, just like a strong chess program is.
Vincent
Ouch! The good thing is that memory and HD space are getting cheaper, so huge compression might not be so critical.diep wrote:I had some communication with Matt Mahoney years ago. His responses didn't impress me a lot (understatement). What he produced to 'top the charts' is basically something that is not very usable for EGTB compression; he just cut and pasted together what was needed to get the job done there, based upon statistics. As an all-round compressor it is useless. Adapting a form of PPM into something that works bucket-sized, with some overhead within each bucket to allow lookups of the right compressed data in O(log n), is going to be far more successful than whatever he has coded.michiguel wrote:From what I read, LZMA is LZ77 + arithmetic coding, so I guess that there is literature for both. From all the links I surveyed to collect free libraries, I may conclude that it may be a good idea to ask this guyPradu wrote:Are there any good resources describing LZMA? I realize that there's already some C code, but I don't think it will be that useful to learn the algorithm.michiguel wrote:I am done writing the compressor for the endgame tablebases (tbcp)...
You can study 7-zip, for example; it has totally superior compression compared to Mahoney's thing, and this while it isn't even number 100 or so on the all-time list of supercompression. A New Zealand compressor has by far the best compression for Diep's WDL EGTBs; however, the compression time of those compressors is like 24 hours on a single core for 2 EGTBs totalling 200 MB uncompressed. Now that wouldn't be a problem, of course, were it not that the decompression time is also 24 hours.
The problem I found is that most PPM packages are GPL, and I was looking for BSD- or MIT-type licenses. There are gazillions of GPL compressors everywhere.
That totally hammered away all other compressors. It's no coincidence, of course, that PPM is so superior for EGTBs, as EGTBs are in fact multi-dimensional table representations anyway.
Vincent