Gaviota TBs, compression schemes

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota TBs, compression schemes

Post by michiguel »

beachknight wrote:
michiguel wrote:
beachknight wrote:I have zipped all 145 files (38.4 GB) using 7-zip with the Ultra & Maximum
schemes. The resulting compressed size is around 3.9 GB and the
compression ratio is around 10%.

It seems that 1 normal DVD is sufficient for storing the Gaviota TBs.

Best,
The ability of 7z (LZMA internally, I think) to compress these TBs is stunning. Unfortunately, when it is compressed in blocks, it loses that spectacular ratio. Still, it is the best by a clear margin. Fitting on a DVD is very appealing for storage and/or distribution. How are Nalimov TBs distributed by companies? On two DVDs?

Miguel
Don't know exactly. But I never saw a single DVD that contains all the
3-4-5 Nalimovs. Probably 2 DVDs for all the 3-4-5 and a couple of 6-men.

Best,

PS1: I assume that the compressed Gaviotas will be around 6-8 GB,
i.e. max. 2 DVDs.

PS2: When do you release the new Gaviota that can process the compressed Gaviota TBs?
I must delete the uncompressed ones soon.
I just compiled it in Windows, both the compressor and a Gaviota version that uses it (they were developed in Linux). I need to make sure that everything is fine (a quick test seemed to be ok). So, I will release this very soon, today or tomorrow...

Miguel
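Miguel's point above about block compression losing LZMA's solid-archive ratio can be illustrated with a small Python sketch. The data here is synthetic and the 4 KiB block size is just an assumption for illustration; real TB files differ in degree, not in kind:

```python
import lzma

# Synthetic, highly repetitive stand-in for tablebase contents (~1.3 MB).
data = (b"KRKdraw" * 500 + b"KQKwin" * 500) * 200

# Solid compression: one LZMA stream over the whole input.
solid = len(lzma.compress(data))

# Block compression: an independent LZMA stream per 4 KiB block, as an
# engine needs for random probing. Every block pays the stream overhead
# and restarts the dictionary, so the total comes out many times larger.
block_size = 4096
blocks = sum(len(lzma.compress(data[i:i + block_size]))
             for i in range(0, len(data), block_size))

print(solid, blocks)   # block total is many times the solid size
```

The gap is exactly the trade-off discussed in this thread: solid archives are great for distribution (one DVD), while block compression is what probing code actually needs.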
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Gaviota TBs, compression schemes

Post by diep »

michiguel wrote:
diep wrote:
michiguel wrote:I am done writing the compressor for the endgame tablebases (tbcp). I have done basic research on some compression schemes, all of them incorporated as options. The compressor/decompressor help is at the end.

The analysis of the different schemes is here:
http://sites.google.com/site/gaviotachessengine/schemes

All of them, AFAIK, are either BSD-licensed or public domain. I wrote the Huffman coding and the RLE myself. The other ones are from free libraries. There are gazillions of good alternatives, but most of them are GPLed (and the idea is to have a more permissive option).

The default will be LZMA, which is the one that produces the smallest files. AFAIK, it is smaller than Nalimov's and it fits on an 8 GB USB stick. However, there are other valid alternatives with potential (Zlib-9 and LZF). Those could give better performance in the future, as people move to solid-state disks. The decompression speed of LZF is close to the speed of light (and the source code is really small!!).

I still have to try several ideas to improve this, and a couple of other compression libraries (which may be faster but no chance to compete with the compression ratio of LZMA). But I do not want to delay this any further.

So, there is a chance (but slim) that I can find a better alternative, or, more likely, a better compression algorithm & free library could be designed by somebody else in the future. In those cases, I would like to keep the flexibility for progress. I think that I may request to set LZMA as a default that every engine developer should support if they support Gaviota TBs; if in the future we find something else, we could move to a new scheme, but always supporting LZMA to guarantee backward compatibility. In other words, if testers decide to maintain only one copy, the LZMA version, they should always be covered. The library with the probing code that I will release should be able to do this, but the engine developers will be responsible for providing a switch or a WB/UCI option etc. so the user can inform the engine what TBs are available.

Miguel

Code:

miguel@avefenix:/media/bigdisk/dev/tbcp$ ./tbcp -h

Compression/decompression utility for Gaviota Table Bases.
It uses different free compression algorithms. More information at
http://sites.google.com/site/gaviotachessengine/

quick example: tbcp gtb
 compresses all the endgame table base files in the folder "gtb"
 with the default compression scheme 2, and no deletion of the source files

usage: tbcp [-OPTION]... [folder]

 -h        give this help
 -v        give version number and exit
 -c        compress (default)
 -d        decompress
 -k        keep   source files, after compression/decompression (default)
 -r        remove source files, after compression/decompression
 -V        verify encoding/decoding on the fly
 -q        quiet (no output except error messages)
 -i <file> work on the given individual file, rather than a folder
 -s   #    sets compression scheme (0-2, default=2)
 -n   #    operates on up to 3-, 4-, or 5-piece tablebases (default=5)

Copyright (c) 2009-2010 Miguel A. Ballicora
There is NO WARRANTY of any kind
Hello Miguel,

How does your compressor compare sizewise to Kadatch?
Apparently, when I use LZMA (the heart of 7z) it is a bit better than Kadatch: 6.5 GiB vs 7.2 GiB.

The algorithms you quote seem like an 80s chess program to me.
LZMA is what 7z uses!

BTW, I am testing what is open source with a BSD-type license. The idea is to release the probing code so everybody can use the tables.

Compression is an entire field, of course; a good modern compressor is really hard to build and is full-time work, just like a strong chess program is.

Vincent
Yes, I am plugging in available source code (some of it needed to be tweaked). I coded the Huffman coding myself just for fun; that was it. I am not intending to be perfect, just to have something good enough. I think that LZMA is not worse than Nalimov/Kadatch, and that is ok for me.

Miguel
Oh wait, I made a mistake, as I assumed LZMA was not usable because of the dictionary it uses. Note Kadatch is just some 80s techniques programmed very well by him; it's really a fast compressor. Yet it only works in 32 bits and works on fixed uncompressed page sizes, whereas I would like it to work differently, see below.

Kadatch doesn't use those big dynamic dictionaries, yet it compresses much worse (a factor of 3 or so) than what is possible with a generic compressor.

Realize that those dictionaries do most of the compressing. I have a compressor from an English student here which, just by using a good dictionary, gets really good compression at lightning speed.

7-zip here, using its default dictionary, is in fact worse than that.

However, to get 1 byte out of the entire EGTB you need to uncompress the entire thing; that is a hell of a disadvantage.

What I'm looking for is in fact a compressor that can compress to bucket sizes: the buckets that come out compressed have n bytes max, rather than the uncompressed data being n bytes. This is because some EGTBs here get a factor 2000 compression or so. Of course, when decompressing we don't want to store the whole thing in RAM at all; we just want the byte number we are looking for.

The best compression is a special PPM compressor, as I posted before, so I consider that the most interesting.

Mind if I email some Diep EGTBs to you, to see how your compressor does on them?

Note I'm on the messengers under diepchess (Skype, Yahoo, AIM, etc.).

Vincent
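The bucket idea Vincent describes, capping the compressed size of each block instead of the uncompressed size, can be sketched in Python with zlib as a stand-in codec. `compress_to_buckets` and `read_byte` are hypothetical names for illustration, and the binary search assumes compressed size grows roughly monotonically with input length:

```python
import zlib

def compress_to_buckets(data: bytes, max_bucket: int = 4096):
    """Split data into independently compressed buckets whose
    COMPRESSED size is at most max_bucket bytes, instead of fixing
    the uncompressed block size as most schemes do."""
    buckets = []  # list of (uncompressed_length, compressed_bytes)
    pos = 0
    while pos < len(data):
        # Binary-search the largest chunk that still compresses under
        # the cap (assumes size grows roughly with input length).
        lo, hi = 1, len(data) - pos
        best = None
        while lo <= hi:
            mid = (lo + hi) // 2
            comp = zlib.compress(data[pos:pos + mid], 9)
            if len(comp) <= max_bucket:
                best = (mid, comp)
                lo = mid + 1
            else:
                hi = mid - 1
        if best is None:  # even a single byte exceeds the cap
            raise ValueError("max_bucket is too small")
        buckets.append(best)
        pos += best[0]
    return buckets

def read_byte(buckets, offset):
    """Random access: decompress only the bucket holding `offset`."""
    for length, comp in buckets:
        if offset < length:
            return zlib.decompress(comp)[offset]
        offset -= length
    raise IndexError(offset)
```

Because each bucket records how many uncompressed bytes it covers, a probe decompresses one bucket at most; even a file that compresses 2000x never has to be expanded whole in RAM.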
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota TBs, compression schemes

Post by michiguel »

diep wrote:
Mind if I email some Diep EGTBs to you, to see how your compressor does on them?

Note I'm on the messengers under diepchess (Skype, Yahoo, AIM, etc.).

Vincent
I just uploaded it with the engine that can use them compressed.
http://sites.google.com/site/gaviotache ... e/download

a very brief explanation is here:
http://sites.google.com/site/gaviotache ... e/releases

Code:

Compression/decompression utility for Gaviota Table Bases.
It uses different free compression algorithms. More information at
http://sites.google.com/site/gaviotachessengine/

quick example: tbcp gtb
 compresses all the endgame table base files in the folder "gtb"
 with the default compression scheme 2, and no deletion of the source files

usage: tbcp [-OPTION]... [folder]
 -h        print this help
 -v        print version number and exit
 -L        display the license information
 -c        compress (default)
 -d        decompress
 -k        keep, do not delete input files (default)
 -r        remove input files after compression/decompression
 -V        verify encoding/decoding on the fly
 -q        quiet (no output except error messages)
 -i <file> work on the given individual file, rather than a folder
 -s   #    set compression scheme (1-4, default=4)
 -S        compression scheme help
 -n   #    operate on up to 3-, 4-, or 5-piece tablebases (default=5)

Copyright (c) 2009-2010 Miguel A. Ballicora
There is NO WARRANTY of any kind
Typing tbcp -S gives the compression schemes. Scheme (1) is disabled for now; 4 is the default and the one recommended for Gaviota.

Code:

Scheme  Extension   Algorithm          Ratio        Decompression   Compression
   1      .cp1       Huffman       medium (3.2x)        fast           fast
   2      .cp2         LZF         medium (2.7x)     super fast     super fast
   3      .cp3        Zlib-9    very good (4.9x)     very fast      very slow
   4      .cp4        LZMA-5      OPTIMUM (5.9x)        fast        very slow
Miguel
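For a feel of the ratio column with stock libraries, Python's zlib and lzma modules can stand in for the Zlib-9 and LZMA-5 schemes. The data is synthetic, so the numbers will not match the table, which was measured on real TB files:

```python
import lzma
import zlib

# Toy stand-in for tablebase data, 512,000 bytes of repetitive content.
data = (b"\x00" * 64 + bytes(range(64))) * 4000

# (name, compress, decompress) triples mirroring schemes 3 and 4.
schemes = [
    ("Zlib-9", lambda d: zlib.compress(d, 9), zlib.decompress),
    ("LZMA",   lzma.compress,                 lzma.decompress),
]

for name, comp, decomp in schemes:
    c = comp(data)
    assert decomp(c) == data          # every scheme must round-trip losslessly
    print(f"{name}: {len(data) / len(c):.1f}x")
```

The round-trip assertion is the same check tbcp's -V flag performs on the fly: compress, decompress, and verify the output matches the input.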
jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: Gaviota TBs, compression schemes

Post by jshriver »

Amazing work, as usual. Any chance of a Linux build? I'll gladly wait patiently, or I'm willing to make one for you if you want.

-Josh
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota TBs, compression schemes

Post by michiguel »

jshriver wrote:Amazing work, as usual. Any chance of a Linux build? I'll gladly wait patiently, or I'm willing to make one for you if you want.

-Josh
Linux versions, 32 and 64 bits, are available on the web site. I will wait to make sure everything is fine when people test Gaviota with the compressed TBs. After that, the plan is to release the probing code.

Miguel
Cardoso
Posts: 362
Joined: Thu Mar 16, 2006 7:39 pm
Location: Portugal
Full name: Alvaro Cardoso

Re: Gaviota TBs, compression schemes

Post by Cardoso »

Sorry for going very slightly off topic, but I'm quite interested in compressing my 460 GB DTM Portuguese checkers EGTB. The last time I looked (but did not implement anything) I was considering LZO; did you try this compression scheme?
It seems fast at decompressing.
Although I also have W/D/L (2-bit) EGTB generators, I don't like them since they don't give progression towards mate (if there is a mate in checkers).
So I have had DTM EGTBs since 2002 but never implemented compression of any kind, since compressing DTM EGTBs is hard.
So, what is your impression of LZO?

http://www.oberhumer.com/opensource/lzo/

best regards,
Alvaro
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Gaviota TBs, compression schemes

Post by Gian-Carlo Pascutto »

LZO is GPL so it's useless.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota TBs, compression schemes

Post by michiguel »

Gian-Carlo Pascutto wrote:LZO is GPL so it's useless.
That is exactly why I did not even try it. We mentioned this before: LZF is in the family of those super-fast compressors/decompressors with a suboptimal compression ratio. That one has a permissive flavor of the BSD license.

LZF
http://oldhome.schmorp.de/marc/liblzf.html

This one I did not try (yet):
http://www.fastlz.org/download.htm

Miguel
beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: Gaviota TBs, compression schemes

Post by beachknight »

michiguel wrote:
Linux versions, 32 and 64 bits, available in the web site. I will wait to make sure everything is fine when the people test Gaviota with the compressed TBs. After that, the plan is to release the probing code.

Miguel
Hi,

Producing the compressed Gaviota TBs just completed.

6,54 GB (7.025.792.015 bytes)

145 files

tbtest 5: --> PASSED.

Well done, Miguel.

Best,

PS: I suppose x64 versions of Gaviota will also use them, right?
hi, merhaba, hallo HT
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota TBs, compression schemes

Post by michiguel »

beachknight wrote:
Hi,

Producing compressed G TBs just completed.

6,54 GB (7.025.792.015 bytes)

145 files

tbtest 5: --> PASSED.

Well done, Miguel.

Best,

PS: I suppose x64 versions of Gaviota will also use them, right?
Yes, they are supposed to work portably on any system: Linux, Windows, or whatever OS Gaviota (or any other engine with the same TB library) is compiled for in the future.

Miguel