Gaviota 0.74.6 (special for EGTB builders)

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

Re: Gaviota 0.74.6 (special for EGTB builders)

Post by Edmund »

Spock wrote:
Dann Corbit wrote: 38 GB for full set, they are not compressed.
I am negotiating with Miguel on a compression scheme. The compressed files would be about 3.8 GB.
;-)
The advantage of them being uncompressed would be much faster access by the engine, would it not ?
If the data is not cached, the main bottleneck is HD I/O. An important goal when compressing EGTBs is to keep random-access probing possible.
Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Gaviota 0.74.6 (special for EGTB builders)

Post by Dann Corbit »

Spock wrote:
Dann Corbit wrote: 38 GB for full set, they are not compressed.
I am negotiating with Miguel on a compression scheme. The compressed files would be about 3.8 GB.
;-)
The advantage of them being uncompressed would be much faster access by the engine, would it not ?
Decompression speed is about 20 MB/sec, and I don't think most (any?) disk drives can compete with that currently. Of course, there is a cost: the CPUs doing the decompression (getting 20 MB/sec requires 2 CPUs) would be partially occupied and take away from the chess calculation speed. EGTB access in general has approximately this effect in every EGTB system I know of.
Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

Re: Gaviota 0.74.6 (special for EGTB builders)

Post by Edmund »

Dann Corbit wrote:
Volker Pittlik wrote:A more general question: how big are all the 5-men EGTBs? I've been running the program for around two days now. I've got 117 files generated so far and around 23 GB are used. That seems a bit much to me.

Wouldn't it be possible to spread a verified set of tbs via emule or a torrent?

Regards

Volker
38 GB for full set, they are not compressed.
I am negotiating with Miguel on a compression scheme. The compressed files would be about 3.8 GB.
;-)
How is this compression ratio possible?

Just to take Nalimov EGTB for comparison:
The indexing is a bit more complex so the uncompressed size of 3-5men is 30,612,186,969 bytes
These files are compressed with DATACOMP by Andrew Kadatch and the resulting filesize is 7,580,348,624 bytes = 24.76% of uncompressed

The compression ratio is very much dependent on the block size. A larger block size gives better compression, but takes longer to decompress and to get the value of a single position. IIRC, Nalimov together with Hyatt ran a couple of tests and arrived at the current (optimal) block size of 8 KB.

So what does this EGTB have that Nalimov's doesn't, more than doubling the compression ratio?
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota 0.74.6 (special for EGTB builders)

Post by michiguel »

Edmund wrote:Hello Miguel!

Great to see your EGTB Generator is working! It would be really interesting to me if you could explain some of the details, like the indexing used or the generation algorithm. I just tried to figure out the indexing myself but wasn't able to fully grasp it.

What I got is that it uses the trick of indexing the kings separately - 462 positions for pawnless endings - but it doesn't take into account the fact that two pieces may not stand on the same square. Furthermore, both tables (wtm, btm) are integrated into one file.
Correct. It takes the same-square constraint into account only when the pieces are of the same type (e.g. two white rooks), because I index those as a pair. I think that taking into account that two pieces cannot be in the same square is an unneeded complication. How much memory would I save? Moreover, if I am so worried about memory, I would like to compress the files later. Then, that trick may make the life of the compressor more difficult.
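
As a side note, the 462 figure mentioned in the question comes from the standard symmetry trick for pawnless tables: the white king is confined to the a1-d1-d4 triangle, adjacent kings are excluded, and a further restriction applies when the white king sits on the long diagonal. A small stand-alone sketch that reproduces the count (an illustration of the trick, not Gaviota's indexing code):

Code: Select all

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int count = 0;
    for (int wk = 0; wk < 64; wk++) {
        int wf = wk % 8, wr = wk / 8;
        if (wf > 3 || wr > wf) continue;          /* wK in the a1-d1-d4 triangle (10 squares) */
        for (int bk = 0; bk < 64; bk++) {
            int bf = bk % 8, br = bk / 8;
            if (bk == wk) continue;                                   /* same square    */
            if (abs(bf - wf) <= 1 && abs(br - wr) <= 1) continue;     /* kings adjacent */
            if (wf == wr && br > bf) continue;    /* wK on the long diagonal: keep bK on or
                                                     below it, breaking the mirror symmetry */
            count++;
        }
    }
    printf("%d king configurations\n", count);    /* prints 462 */
    return 0;
}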

But I didn't get the way the pieces are indexed. For example, the KRK file starts with "0x0303 0309 0909 0909 0309 0909 0909 0909". This sequence seems odd, because I would have expected it to start with a broken position, as the rook is standing on the same square as one of the kings, and even if broken positions are not marked I don't quite understand the following values.

Thanks in advance,
Edmund
It starts with a broken position. The first two bits are used for information.

Code: Select all

enum { iDRAW = 0,  iWMATE = 1,  iBMATE = 2,  iFORBID = 3,
       iDRAWt = 4, iWMATEt = 5, iBMATEt = 6, iUNKNOWN = 7 };

The third bit (values 4 to 7) is used only to flag temporary values during the generation process; it is later overwritten by distance-to-mate information. During the generation I use two bytes, but they are packed at the end into one byte.
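
Reading the quoted KRK bytes with that layout would look roughly like this; a minimal sketch based only on the description above (the exact placement of the distance-to-mate field is an assumption, not Gaviota's documented format):

Code: Select all

/* Low two bits of each stored byte = result flag; the remaining bits are
 * assumed here to hold the distance to mate. The first KRK byte, 0x03,
 * decodes to iFORBID, i.e. a broken position, as stated above. */
enum { iDRAW = 0, iWMATE = 1, iBMATE = 2, iFORBID = 3 };

static int tb_result(unsigned char b)   { return b & 0x03; }   /* result flag       */
static int tb_distance(unsigned char b) { return b >> 2;    }  /* assumed DTM field */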

I will eventually release the generation and probing code under a BSD license.

Miguel
Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Gaviota 0.74.6 (special for EGTB builders)

Post by Dann Corbit »

Edmund wrote:
Dann Corbit wrote:
Volker Pittlik wrote:A more general question: how big are all the 5-men EGTBs? I've been running the program for around two days now. I've got 117 files generated so far and around 23 GB are used. That seems a bit much to me.

Wouldn't it be possible to spread a verified set of tbs via emule or a torrent?

Regards

Volker
38 GB for full set, they are not compressed.
I am negotiating with Miguel on a compression scheme. The compressed files would be about 3.8 GB.
;-)
How is this compression ratio possible?

Just to take Nalimov EGTB for comparison:
The indexing is a bit more complex so the uncompressed size of 3-5men is 30,612,186,969 bytes
These files are compressed with DATACOMP by Andrew Kadatch and the resulting filesize is 7,580,348,624 bytes = 24.76% of uncompressed

The compression ratio is very much dependent on the block size. A larger block size gives better compression, but takes longer to decompress and to get the value of a single position. IIRC, Nalimov together with Hyatt ran a couple of tests and arrived at the current (optimal) block size of 8 KB.

So what does this EGTB have that Nalimov's doesn't, more than doubling the compression ratio?
The proposed compression system is the 7-zip library (the library is formally declared public domain):
http://www.7-zip.org/sdk.html
Note that as of 4.62: "4.62: Some fixes. LZMA SDK is placed in the public domain. "
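
For reference, the SDK exposes a simple one-shot interface in LzmaLib.h. Compressing one EGTB block with it could look roughly like the sketch below; the block handling, buffer sizes and parameter choices are placeholders, not the scheme actually being negotiated:

Code: Select all

#include <stddef.h>
#include "LzmaLib.h"   /* from the LZMA SDK linked above */

/* Compress one block; returns the compressed length, or 0 on error.
 * The 5 props bytes describe the encoder settings and are needed again
 * for decompression (they can be stored once per file). */
static size_t compress_block(const unsigned char *src, size_t src_len,
                             unsigned char *dst, size_t dst_cap,
                             unsigned char props[5])
{
    size_t dst_len = dst_cap;
    size_t props_len = 5;
    int rc = LzmaCompress(dst, &dst_len, src, src_len,
                          props, &props_len,
                          9,           /* compression level         */
                          1 << 24,     /* dictionary size           */
                          3, 0, 2,     /* lc, lp, pb (SDK defaults) */
                          32,          /* fb                        */
                          2);          /* number of threads         */
    return rc == SZ_OK ? dst_len : 0;
}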

The proposed lookup system would be either FastDB or Gigabase (also public domain):
http://www.garret.ru/databases.html

FastDB is a fully memory-mapped database; Gigabase is a demand-paged memory-mapped database.

But Miguel needs to think about what direction he wants to go in, and I don't want to push him. He may have alternative ideas.

The reason I imagine 10:1 compression is that I used 7-zip to compress the 4-man files into 15,939,772 bytes (gtb.7z), which contains:

Code: Select all

10/23/2009  03:37 PM         1,862,784 kbbk.gtb
10/23/2009  03:36 PM            59,136 kbk.gtb
10/23/2009  03:36 PM         3,784,704 kbkb.gtb
10/23/2009  03:36 PM         3,784,704 kbkn.gtb
10/23/2009  03:38 PM        12,582,912 kbkp.gtb
10/23/2009  03:37 PM         3,784,704 kbnk.gtb
10/23/2009  03:38 PM        12,582,912 kbpk.gtb
10/23/2009  03:36 PM            59,136 knk.gtb
10/23/2009  03:36 PM         3,784,704 knkn.gtb
10/23/2009  03:38 PM        12,582,912 knkp.gtb
10/23/2009  03:37 PM         1,862,784 knnk.gtb
10/23/2009  03:38 PM        12,582,912 knpk.gtb
10/23/2009  03:36 PM           196,608 kpk.gtb
10/23/2009  03:39 PM         9,437,184 kpkp.gtb
10/23/2009  03:39 PM         4,718,592 kppk.gtb
10/23/2009  03:37 PM         3,784,704 kqbk.gtb
10/23/2009  03:36 PM            59,136 kqk.gtb
10/23/2009  03:36 PM         3,784,704 kqkb.gtb
10/23/2009  03:36 PM         3,784,704 kqkn.gtb
10/23/2009  03:37 PM        12,582,912 kqkp.gtb
10/23/2009  03:36 PM         3,784,704 kqkq.gtb
10/23/2009  03:36 PM         3,784,704 kqkr.gtb
10/23/2009  03:37 PM         3,784,704 kqnk.gtb
10/23/2009  03:38 PM        12,582,912 kqpk.gtb
10/23/2009  03:36 PM         1,862,784 kqqk.gtb
10/23/2009  03:36 PM         3,784,704 kqrk.gtb
10/23/2009  03:37 PM         3,784,704 krbk.gtb
10/23/2009  03:36 PM            59,136 krk.gtb
10/23/2009  03:36 PM         3,784,704 krkb.gtb
10/23/2009  03:36 PM         3,784,704 krkn.gtb
10/23/2009  03:37 PM        12,582,912 krkp.gtb
10/23/2009  03:36 PM         3,784,704 krkr.gtb
10/23/2009  03:37 PM         3,784,704 krnk.gtb
10/23/2009  03:38 PM        12,582,912 krpk.gtb
10/23/2009  03:37 PM         1,862,784 krrk.gtb
              35 File(s)    183,258,624 bytes
The compression dictionary can be saved as a separate object.
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota 0.74.6 (special for EGTB builders)

Post by michiguel »

Spock wrote:
Dann Corbit wrote: 38 GB for full set, they are not compressed.
I am negotiating with Miguel on a compression scheme. The compressed files would be about 3.8 GB.
;-)
The advantage of them being uncompressed would be much faster access by the engine, would it not ?
Short answer: most likely, uncompressed files are faster on solid-state drives.

On a normal HD: I do not know, probably about the same. What I observe with Gaviota is that access to the tables, as it is now, is good enough.

The time needed to probe the tables is determined by

Raw Probing time = HD Access + Transfer time + Decompression time
Probing Time = Raw Probing Time * Cache miss rate%

On a normal HD with no compression, the access is the bottleneck. For that reason, I transfer not only the position that I need, but a contiguous block of ~8000 positions. This may increase the total time just a tiny bit compared to transferring the info of only one position. What do I gain? I fill up my cache, so the next time I need a position, it is already in memory. In other words, I decrease the cache miss rate in the above formula.
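
A stripped-down illustration of that block-transfer idea (just a sketch with made-up names, assuming one byte per position; it is not Gaviota's actual probing code):

Code: Select all

#include <stdio.h>

#define BLOCK_POSITIONS 8000          /* ~8000 entries fetched per disk access */

static unsigned char cache_block[BLOCK_POSITIONS];
static long cache_start = -1;         /* index of the first cached position */

/* Return the stored byte for 'index', reading a whole block on a cache miss. */
static int probe(FILE *tb, long index)
{
    if (cache_start < 0 || index < cache_start || index >= cache_start + BLOCK_POSITIONS) {
        cache_start = (index / BLOCK_POSITIONS) * BLOCK_POSITIONS;
        fseek(tb, cache_start, SEEK_SET);            /* one seek: the HD-access cost   */
        fread(cache_block, 1, BLOCK_POSITIONS, tb);  /* amortized over ~8000 positions */
    }
    return cache_block[index - cache_start];
}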

Why could compression be an advantage? It may not add a lot of time if access is the bottleneck, and it may allow transferring more positions to fill the cache, decreasing the cache miss rate even more. This may depend on many circumstances. However, this is all true only if the access is the bottleneck. On an SSD the access time is much lower (there is no disk spinning), so decompression can become the bottleneck. This may require tests.

Compression has of course other benefits, like not hogging memory from users.

Miguel
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota 0.74.6 (special for EGTB builders)

Post by michiguel »

SzG wrote:
Spock wrote:
michiguel wrote: tbcheck gtb

PS: if it works well, I may include the utility tbcheck in the distribution or I may integrate it to the engine.
I ran it on my tablebases just to double-check, and all were ok
I would include it as a separate exe in the distribution
At last I have the full gtb set on my hard drive. Ray helped me out with the last 4 files.

I also ran tbcheck and all files were ok.

Thank you for your efforts, Miguel.

BTW, what is the tbval command for?
Great. I guess that the problems originated when the first version was hogging too much memory and the crashes generated corrupted files. It should not happen now if someone else tries to repeat the process. I am curious to see Volker's experience.

tbval validates the files. It is not of much use to the user, but it is critical for the developer to run it at least once. It goes position by position and does a 1-ply search (probing the child positions that are also in the same EGTB file). The result of this search should be identical to the result stored. It takes a long time (less than the generation, though), but if it passes, it guarantees that the generation was correct. You can try tbval 3 or tbval 4.

So there are three ways to check the files
1) tbtest: tests one position of the file to make sure that the file is present and is what it is supposed to be. Super fast, but it might miss corruption of a fragment, or even a truncated file.

2) tbcheck: checks the integrity of the file by generating signatures and comparing them against those of the original file. I use only 32-bit signatures because that should be enough. Not that fast, but it is extremely unlikely that a corrupted file goes undetected (about a 1-in-a-billion chance). A rough sketch of this kind of signature check follows the list below.

3) tbval: what I explained above. Very slow, but it is impossible to have a wrong file if this test passes.
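
For illustration, a 32-bit signature of the kind tbcheck uses can be computed with a plain CRC-32 over the file. This is only a stand-in; the actual signature algorithm used by tbcheck is not specified here:

Code: Select all

#include <stdio.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320) over a whole file. */
static unsigned long crc32_file(FILE *f)
{
    unsigned long crc = 0xFFFFFFFFUL;
    int c;
    while ((c = fgetc(f)) != EOF) {
        crc ^= (unsigned long)c;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320UL & (0UL - (crc & 1UL)));
    }
    return crc ^ 0xFFFFFFFFUL;
}

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file.gtb\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror(argv[1]); return 1; }
    printf("%s  %08lx\n", argv[1], crc32_file(f) & 0xFFFFFFFFUL);
    fclose(f);
    return 0;
}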

Miguel
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota 0.74.6 (special for EGTB builders)

Post by michiguel »

Edmund wrote:
Dann Corbit wrote:
Volker Pittlik wrote:A more general question: how big are all the 5-men EGTBs? I've been running the program for around two days now. I've got 117 files generated so far and around 23 GB are used. That seems a bit much to me.

Wouldn't it be possible to spread a verified set of tbs via emule or a torrent?

Regards

Volker
38 GB for full set, they are not compressed.
I am negotiating with Miguel on a compression scheme. The compressed files would be about 3.8 GB.
;-)
How is this compression ratio possible?

Just to take Nalimov EGTB for comparison:
The indexing is a bit more complex so the uncompressed size of 3-5men is 30,612,186,969 bytes
These files are compressed with DATACOMP by Andrew Kadatch and the resulting filesize is 7,580,348,624 bytes = 24.76% of uncompressed

The compression ratio is very much dependent on the block size. A larger block size gives better compression, but takes longer to decompress and to get the value of a single position. IIRC, Nalimov together with Hyatt ran a couple of tests and arrived at the current (optimal) block size of 8 KB.

So what does this EGTB have that Nalimov's doesn't, more than doubling the compression ratio?
Dann already answered. Maybe it is the 7z algorithm. I wonder what the compression ratio would be for the Nalimov files using 7z.

I do not think that the Gaviota files have anything special. If there is something special, it may be the fact that the indexing is simpler or that the positions are indexed in a way that slower pieces (Kings) are closer. But I do not think that the latter is anything new. I guess that simplicity sometimes makes compression easier.

Dann showed that the 4-men TB files compress to less than 1/10 of their size (15,939,772 out of 183,258,624 bytes, about 8.7%). Maybe the 5-men files do not compress that well? I checked a handful and the ratio varies from 10% to 20%. So, to be cautious, I think that it will be between 3.8 GB and a bit less than 7 GB.


Miguel
Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

Re: Gaviota 0.74.6 (special for EGTB builders)

Post by Edmund »

Dann Corbit wrote: [...]
The proposed compression system is the 7-zip library (the library is formally declared public domain):
http://www.7-zip.org/sdk.html
Note that as of 4.62: "4.62: Some fixes. LZMA SDK is placed in the public domain. "
[...]
Compression is one thing, probing is another.
For 4-men it's not so much of a problem - you can probably decompress them as a whole, as they are not too big - but with 5-men and certainly 6-men one has to work with blocks. Is this still possible with general-purpose compression libraries?
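
One common way to keep per-block probing possible with a general-purpose library is to compress each fixed-size block independently and keep a table of compressed-block offsets, so a probe seeks to a single block and decompresses only that. A rough sketch using LzmaUncompress from the LZMA SDK; the on-disk layout, the index structure and the one-byte-per-position assumption are hypothetical, not the format of any existing tablebase:

Code: Select all

#include <stdio.h>
#include <stdlib.h>
#include "LzmaLib.h"   /* LZMA SDK one-shot decompressor */

#define BLOCK_SIZE 8192                /* uncompressed bytes (positions) per block, assumed */

/* Hypothetical per-file index: offset[i] is where compressed block i starts;
 * offset[num_blocks] marks the end of the last block. */
struct tb_index {
    unsigned long long *offset;
    unsigned num_blocks;
    unsigned char props[5];            /* LZMA properties, stored once per file */
};

/* Probe one position: locate its block, read and decompress only that block.
 * Error handling is omitted to keep the sketch short. */
static int probe_position(FILE *f, const struct tb_index *ix, unsigned long pos)
{
    unsigned blk = (unsigned)(pos / BLOCK_SIZE);
    size_t clen = (size_t)(ix->offset[blk + 1] - ix->offset[blk]);
    unsigned char *cbuf = malloc(clen);
    unsigned char ubuf[BLOCK_SIZE];
    size_t ulen = BLOCK_SIZE;

    fseek(f, (long)ix->offset[blk], SEEK_SET);
    fread(cbuf, 1, clen, f);
    LzmaUncompress(ubuf, &ulen, cbuf, &clen, ix->props, 5);
    free(cbuf);
    return ubuf[pos % BLOCK_SIZE];
}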

Older references:
http://kirill-kryukov.com/chess/discuss ... f=6&t=1558
http://www.open-aurec.com/wbforum/viewt ... 8&start=20
Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Gaviota 0.74.6 (special for EGTB builders)

Post by Dann Corbit »

michiguel wrote:
Spock wrote:
Dann Corbit wrote: 38 GB for full set, they are not compressed.
I am negotiating with Miguel on a compression scheme. The compressed files would be about 3.8 GB.
;-)
The advantage of them being uncompressed would be much faster access by the engine, would it not ?
Short answer: most likely, uncompressed files are faster on solid-state drives.

On a normal HD: I do not know, probably about the same. What I observe with Gaviota is that access to the tables, as it is now, is good enough.

The time needed to probe the tables is determined by

Raw Probing time = HD Access + Transfer time + Decompression time
Probing Time = Raw Probing Time * Cache miss rate%

On a normal HD with no compression, the access is the bottleneck. For that reason, I transfer not only the position that I need, but a contiguous block of ~8000 positions. This may increase the total time just a tiny bit compared to transferring the info of only one position. What do I gain? I fill up my cache, so the next time I need a position, it is already in memory. In other words, I decrease the cache miss rate in the above formula.

Why could compression be an advantage? It may not add a lot of time if access is the bottleneck, and it may allow transferring more positions to fill the cache, decreasing the cache miss rate even more. This may depend on many circumstances. However, this is all true only if the access is the bottleneck. On an SSD the access time is much lower (there is no disk spinning), so decompression can become the bottleneck. This may require tests.

Compression has of course other benefits, like not hogging memory from users.

Miguel
Also, once the 8 man Ballicora tablebase files become standardized, it will save a significant amount of storage. Otherwise, we will have to spend another $2.50 on a 12 petabyte memory stick.
;-)