Gaviota TBs, compression schemes

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota TBs, compression schemes

Post by michiguel »

Teemu Pudas wrote:Because hard drive manufacturers are liars. 1GB = 10^9 bytes according to them.
Yet, they are correct...

Miguel
ernest
Posts: 2041
Joined: Wed Mar 08, 2006 8:30 pm

Re: Gaviota TBs, compression schemes

Post by ernest »

By the way, my Corsair "Flash Voyager GT" 8 GB stick, on which I have put my Nalimov 3-4-5 men,
has a capacity of 8 271 753 216 bytes (7.70 GB).
This is from looking at the "properties" of that "F drive".
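
The two numbers are just decimal vs. binary units. A minimal C check of the arithmetic, using the byte count above (Windows reports the binary figure but labels it "GB"):

Code: Select all

#include <stdio.h>

int main(void)
{
    /* capacity reported for the 8 GB stick, in bytes */
    double bytes = 8271753216.0;

    /* manufacturers count in powers of 10, most OSes in powers of 2 */
    printf("decimal: %.2f GB\n",  bytes / 1000000000.0);               /* 8.27 GB  */
    printf("binary:  %.2f GiB\n", bytes / (1024.0 * 1024.0 * 1024.0)); /* 7.70 GiB */
    return 0;
}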

So go figure... :D
User avatar
beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: Gaviota TBs, compression schemes

Post by beachknight »

I have zipped all 145 files (38.4 GB) using 7-zip with the Ultra & Maximum
settings. The resulting compressed size is around 3.9 GB, so the
compression ratio is around 10%.

It seems that 1 normal DVD is sufficient for storing the Gaviota TBs.
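
If anybody wants to reproduce it, something like this should work with the 7-Zip command line (-mx=9 corresponds to the Ultra level; the folder and archive names are just an example):

Code: Select all

7z a -t7z -mx=9 gaviota.7z gtb/*.gtb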

Best,
hi, merhaba, hallo HT
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota TBs, compression schemes

Post by michiguel »

beachknight wrote:I have zipped all 145 files (38.4 GB) using 7-zip with the Ultra & Maximum
settings. The resulting compressed size is around 3.9 GB, so the
compression ratio is around 10%.

It seems that 1 normal DVD is sufficient for storing the Gaviota TBs.

Best,
The ability of 7z (LZMA internally, I think) to compress these TBs is stunning. Unfortunately, when the data is compressed in blocks, it loses that spectacular ratio. Still, it is the best by a clear margin. Fitting in a DVD is very appealing for storage and/or distribution. How are Nalimov TBs distributed by companies? On two DVDs?
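
The block penalty is inherent to probing: each block must decompress on its own, so each one restarts with an empty dictionary. A rough sketch of the per-block side with zlib (block size and names are only illustrative, not the actual Gaviota code):

Code: Select all

#include <stdlib.h>
#include <zlib.h>

#define BLOCKSIZE (64 * 1024)   /* illustrative probing block size */

/* Compress buf[0..len) in independent BLOCKSIZE chunks, so that any
   block can later be decompressed on its own; returns the total
   compressed size, or 0 on failure. */
static size_t compress_in_blocks(const unsigned char *buf, size_t len)
{
    size_t total = 0, pos;
    uLong bound = compressBound(BLOCKSIZE);
    unsigned char *out = malloc(bound);

    if (out == NULL)
        return 0;
    for (pos = 0; pos < len; pos += BLOCKSIZE) {
        uLong  insize  = (len - pos < BLOCKSIZE) ? (uLong)(len - pos) : BLOCKSIZE;
        uLongf outsize = bound;
        /* every block starts from an empty dictionary: this is where
           the ratio is lost compared to one pass over the whole file */
        if (compress2(out, &outsize, buf + pos, insize, 9) != Z_OK) {
            free(out);
            return 0;
        }
        total += outsize;
    }
    free(out);
    return total;
}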

Miguel
User avatar
beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: Gaviota TBs, compression schemes

Post by beachknight »

michiguel wrote:
beachknight wrote:I have zipped all 145 files (38.4 GB) using 7-zip with the Ultra & Maximum
settings. The resulting compressed size is around 3.9 GB, so the
compression ratio is around 10%.

It seems that 1 normal DVD is sufficient for storing the Gaviota TBs.

Best,
The ability of 7z (LZMA internally, I think) to compress these TBs is stunning. Unfortunately, when the data is compressed in blocks, it loses that spectacular ratio. Still, it is the best by a clear margin. Fitting in a DVD is very appealing for storage and/or distribution. How are Nalimov TBs distributed by companies? On two DVDs?

Miguel
Don't know exactly, but I never saw a single DVD that contains all the
3-4-5 Nalimovs. Probably 2 DVDs for all of 3-4-5 and a couple of 6-men.

Best,

PS1: I assume that the compressed Gaviotas will be around 6-8 GB,
i.e. max. 2 DVDs.

PS2: When do you release the new Gaviota that can process the compressed TBs?
I must delete the uncompressed ones soon.
hi, merhaba, hallo HT
tano-urayoan
Posts: 638
Joined: Thu Aug 30, 2007 8:23 pm
Location: San Juan, Puerto Rico

Re: Gaviota TBs, compression schemes

Post by tano-urayoan »

michiguel wrote: How are Nalimov TBs distributed by companies? On two DVDs?

Miguel
Companies also include some 6-piece tablebases. For example, I found this in the old Convekta store: "The most complete collection of Nalimov tablebases on 12 DVDs of double density (100 GB in total)"

From this store: http://shop.chesscafe.com/item.asp?PID=373
This Nalimov DVD includes the most complete version of the 3-4-5 piece NALIMOV Endgame Tablebases (290 files).

I suppose they use a double-layered DVD for this.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Gaviota TBs, compression schemes

Post by diep »

michiguel wrote:
Pradu wrote:
michiguel wrote:I am done writing the compressor for the endgame tablebases (tbcp)...
Are there any good resources describing LZMA? I realize that there's already some C code, but I don't think it will be that useful for learning the algorithm.
From what I read, LZMA is LZ77 + arithmetic coding, so I guess that there is literature for both. From all the links I surveyed to collect free libraries, I conclude that it may be a good idea to ask this guy:
http://www.mattmahoney.net/
He is a professor who is an expert in data compression, and he runs an impressive benchmark page:
http://www.mattmahoney.net/dc/text.html

Miguel
Some years ago I had some communication with Matt Mahoney. His responses didn't impress me a lot (understatement). What he produced to 'top the charts' is basically not very usable for EGTB compression; he just cut-and-pasted together whatever was needed to get the job done there, based upon statistics, so as an all-round compressor it's useless. Adapting a form of PPM into something that works bucket-sized, with some overhead within each bucket to allow lookups of the right compressed data in O(log n), is going to be far more successful than whatever he has coded.
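
The lookup side is simple in principle, by the way: keep a table of where each compressed bucket starts, find the bucket in O(1) (or O(log n) if you binary-search the table), then decompress only that bucket. A minimal sketch of the idea (names and layout are only illustrative, not Diep's actual format):

Code: Select all

#include <stdio.h>
#include <stdint.h>

/* Per-file index: offsets[i] is where compressed bucket i starts;
   the table has nblocks + 1 entries, the last one being the file
   size, so bucket i occupies offsets[i+1] - offsets[i] bytes. */
struct tb_index {
    uint32_t  nblocks;
    uint64_t *offsets;
};

/* Read the compressed bytes of bucket i into out (which must hold at
   least offsets[i+1] - offsets[i] bytes); returns bytes read, 0 on
   error. Real code would use fseeko/_fseeki64 for files over 2 GB. */
static size_t read_bucket(FILE *f, const struct tb_index *ix,
                          uint32_t i, unsigned char *out)
{
    uint64_t beg, end;

    if (i >= ix->nblocks)
        return 0;
    beg = ix->offsets[i];
    end = ix->offsets[i + 1];
    if (fseek(f, (long)beg, SEEK_SET) != 0)
        return 0;
    return fread(out, 1, (size_t)(end - beg), f);
}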

You can study 7-zip, for example; its compression is totally superior to Mahoney's thing, and that even though it isn't even number 100 or so on the all-time list of supercompression. A New Zealand compressor has by far the best compression for Diep's WDL EGTBs; however, the compression time of those compressors is something like 24 hours on a single core for 2 EGTBs totalling 200 MB uncompressed. Now, that wouldn't be a problem, of course, were it not that the decompression time is also 24 hours.

That totally hammered all other compressors. It's not a coincidence, of course, that PPM is so superior for EGTBs, as EGTBs are in fact multi-dimensional table representations anyway.

Vincent
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Gaviota TBs, compression schemes

Post by diep »

michiguel wrote:I am done writing the compressor for the endgame tablebases (tbcp). I have done basic research on some compression schemes, all of which are incorporated as options. The compressor/decompressor help is at the end.

The analysis of the different schemes is here:
http://sites.google.com/site/gaviotachessengine/schemes

All of them, AFAIK, are either BSD-licensed or public domain. I wrote the Huffman coding and the RLE myself. The other ones are from free libraries. There are gazillions of good alternatives, but most of them are GPLed (and the idea is to have a more permissive option).

The default will be LZMA, which is the one that provides the smallest footprint. AFAIK, it is smaller than Nalimov's, and it fits in an 8 GB USB stick. However, there are other valid alternatives with potential (Zlib-9 and LZF). Those could give better performance in the future, as people move to solid state disks. The decompression speed of LZF is close to the speed of light (and the source code is really small!!).

I still have to try several ideas to improve this, and a couple of other compression libraries (which may be faster but have no chance to compete with the compression ratio of LZMA). But I do not want to delay this any further.

So, there is a chance (but a slim one) that I can find a better alternative, or, more likely, a better compression algorithm & free library could be designed by somebody else in the future. In those cases, I would like to keep the flexibility for progress. I think that I may request that LZMA be set as a default that every engine developer should support if they support Gaviota TBs; if in the future we find something else, we could move to a new scheme, but always supporting LZMA to guarantee backward compatibility. In other words, if testers decide to maintain only one copy of the LZMA version, they should always be covered. The library with the probing code that I will release should be able to do this, but the engine developers will be responsible for providing a switch or a WB/UCI option etc. so the user can inform the engine what TBs are available.

Miguel

Code: Select all

miguel@avefenix:/media/bigdisk/dev/tbcp$ ./tbcp -h

Compression/decompression utility for Gaviota Table Bases.
It uses different free compression algorithms. More information at
http://sites.google.com/site/gaviotachessengine/

quick example: tbcp gtb
 compresses all the endgame table base files in the folder "gtb"
 with the default compression scheme 2, and no deletion of the source files

usage: tbcp [-OPTION]... [folder]

 -h        give this help
 -v        give version number and exits
 -c        compress (default)
 -d        decompress
 -k        keep   source files, after compression/decompression (default)
 -r        remove source files, after compression/decompression
 -V        verify encoding/decoding on the fly
 -q        quiet (no output except error messages)
 -i <file> work on the given individual file, rather than a folder
 -s   #    sets compression scheme (0-2, default=2)
 -n   #    operates on up to 3-, 4- or 5-piece tablebases (default=5)

Copyright (c) 2009-2010 Miguel A. Ballicora
There is NO WARRANTY of any kind
Hello Miguel,

How does your compressor compare size-wise to Kadatch's?

The algorithms you quote seem to me like something out of an 80s chess program.

Compression is an entire field, of course; a good present-day compressor is really hard to build and is full-time work, just like a strong chess program is.

Vincent
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota TBs, compression schemes

Post by michiguel »

diep wrote:
michiguel wrote:I am done writing the compressor for the endgame tablebases (tbcp). I have done basic research on some compression schemes, all of which are incorporated as options. The compressor/decompressor help is at the end.

The analysis of the different schemes is here:
http://sites.google.com/site/gaviotachessengine/schemes

All of them, AFAIK, are either BSD-licensed or public domain. I wrote the Huffman coding and the RLE myself. The other ones are from free libraries. There are gazillions of good alternatives, but most of them are GPLed (and the idea is to have a more permissive option).

The default will be LZMA, which is the one that provides the smallest footprint. AFAIK, it is smaller than Nalimov's, and it fits in an 8 GB USB stick. However, there are other valid alternatives with potential (Zlib-9 and LZF). Those could give better performance in the future, as people move to solid state disks. The decompression speed of LZF is close to the speed of light (and the source code is really small!!).

I still have to try several ideas to improve this, and a couple of other compression libraries (which may be faster but have no chance to compete with the compression ratio of LZMA). But I do not want to delay this any further.

So, there is a chance (but a slim one) that I can find a better alternative, or, more likely, a better compression algorithm & free library could be designed by somebody else in the future. In those cases, I would like to keep the flexibility for progress. I think that I may request that LZMA be set as a default that every engine developer should support if they support Gaviota TBs; if in the future we find something else, we could move to a new scheme, but always supporting LZMA to guarantee backward compatibility. In other words, if testers decide to maintain only one copy of the LZMA version, they should always be covered. The library with the probing code that I will release should be able to do this, but the engine developers will be responsible for providing a switch or a WB/UCI option etc. so the user can inform the engine what TBs are available.

Miguel

Code: Select all

miguel@avefenix:/media/bigdisk/dev/tbcp$ ./tbcp -h

Compression/decompression utility for Gaviota Table Bases.
It uses different free compression algorithms. More information at
http://sites.google.com/site/gaviotachessengine/

quick example: tbcp gtb
 compresses all the endgame table base files in the folder "gtb"
 with the default compression scheme 2, and no deletion of the source files

usage: tbcp [-OPTION]... [folder]

 -h        give this help
 -v        give version number and exits
 -c        compress (default)
 -d        decompress
 -k        keep   source files, after compression/decompression (default)
 -r        remove source files, after compression/decompression
 -V        verify encoding/decoding on the fly
 -q        quiet (no output except error messages)
 -i <file> work on the given individual file, rather than a folder
 -s   #    sets compression scheme (0-2, default=2)
 -n   #    operates on up to 3-, 4- or 5-piece tablebases (default=5)

Copyright (c) 2009-2010 Miguel A. Ballicora
There is NO WARRANTY of any kind
Hello Miguel,

How does your compressor compare size-wise to Kadatch's?
Apparently, when I use LZMA (the heart of 7z), it is a bit better than Kadatch's: 6.5 GiB vs. 7.2 GiB.

The algorithms you quote seem to me like something out of an 80s chess program.
LZMA is what 7z uses!

BTW, I am testing what is open source with a BSD-type license. The idea is to release the probing code so everybody can use the tables.

Compression is an entire field, of course; a good present-day compressor is really hard to build and is full-time work, just like a strong chess program is.

Vincent
Yes, I am plugging in available source code (some of it needed to be tweaked). I coded the Huffman coding myself just for fun; that was it. I am not trying to be perfect, just to have something good enough. I think that LZMA is not worse than Nalimov/Kadatch, and that is OK for me.
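
Keeping several schemes behind one interface is cheap anyway. A minimal sketch of the dispatch idea (this is just an illustration, not the actual tbcp or probing-code interface):

Code: Select all

#include <stddef.h>
#include <string.h>

/* Every scheme exposes the same decoder signature, so the probing
   code can dispatch on the scheme id stored with the files; new
   schemes can be added later while the LZMA default stays supported. */
typedef int (*decode_fn)(const unsigned char *in,  size_t in_len,
                         unsigned char       *out, size_t *out_len);

/* stand-in decoder that just copies the input through; the real table
   would hold the RLE, Huffman, Zlib, LZF and LZMA decoders */
static int decode_copy(const unsigned char *in, size_t in_len,
                       unsigned char *out, size_t *out_len)
{
    if (*out_len < in_len)
        return -1;
    memcpy(out, in, in_len);
    *out_len = in_len;
    return 0;
}

static const decode_fn schemes[] = {
    decode_copy     /* one slot per scheme; LZMA would be the default */
};

int tb_decode(int scheme, const unsigned char *in, size_t in_len,
              unsigned char *out, size_t *out_len)
{
    if (scheme < 0 || (size_t)scheme >= sizeof schemes / sizeof schemes[0])
        return -1;
    return schemes[scheme](in, in_len, out, out_len);
}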

Miguel
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota TBs, compression schemes

Post by michiguel »

diep wrote:
michiguel wrote:
Pradu wrote:
michiguel wrote:I am done writing the compressor for the endgame tablebases (tbcp)...
Are there any good resources describing LZMA? I realize that there's already some C code, but I don't think it will be that useful for learning the algorithm.
From what I read, LZMA is LZ77 + arithmetic coding, so I guess that there is literature for both. From all the links I surveyed to collect free libraries, I conclude that it may be a good idea to ask this guy:
http://www.mattmahoney.net/
He is a professor who is an expert in data compression, and he runs an impressive benchmark page:
http://www.mattmahoney.net/dc/text.html

Miguel
Some years ago I had some communication with Matt Mahoney. His responses didn't impress me a lot (understatement). What he produced to 'top the charts' is basically not very usable for EGTB compression; he just cut-and-pasted together whatever was needed to get the job done there, based upon statistics, so as an all-round compressor it's useless. Adapting a form of PPM into something that works bucket-sized, with some overhead within each bucket to allow lookups of the right compressed data in O(log n), is going to be far more successful than whatever he has coded.

You can study 7-zip, for example; its compression is totally superior to Mahoney's thing, and that even though it isn't even number 100 or so on the all-time list of supercompression. A New Zealand compressor has by far the best compression for Diep's WDL EGTBs; however, the compression time of those compressors is something like 24 hours on a single core for 2 EGTBs totalling 200 MB uncompressed. Now, that wouldn't be a problem, of course, were it not that the decompression time is also 24 hours.
Ouch! The good thing is that memory and HD space are getting cheaper, so a huge compression ratio might not be so critical.

That totally hammered all other compressors. It's not a coincidence, of course, that PPM is so superior for EGTBs, as EGTBs are in fact multi-dimensional table representations anyway.


Vincent
The problem I found is that most of the PPM packages are GPLed, and I was looking for BSD- or MIT-type licenses. There are gazillions of GPL compressors everywhere.

Miguel