Spock wrote:
Dann Corbit wrote: 38 GB for full set, they are not compressed. I am negotiating with Miguel on a compression scheme. The compressed files would be about 3.8 GB.
The advantage of them being uncompressed would be much faster access by the engine, would it not?
If the data is not cached, the main bottleneck is HD I/O. An important goal when compressing EGTBs is to keep random-access probing possible.
Gaviota 0.74.6 (special for EGTB builders)
Moderator: Ras
-
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: Gaviota 0.74.6 (special for EGTB builders)
-
- Posts: 12777
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Gaviota 0.74.6 (special for EGTB builders)
Spock wrote:
Dann Corbit wrote: 38 GB for full set, they are not compressed. I am negotiating with Miguel on a compression scheme. The compressed files would be about 3.8 GB.
The advantage of them being uncompressed would be much faster access by the engine, would it not?
Decompression speed is about 20 MB/sec, and I don't think most (any?) disk drives can compete with that currently. Of course, there is a cost: the CPUs doing the decompression (getting 20 MB/sec requires 2 CPUs) are partially occupied, which takes away from the chess calculation speed. Then again, EGTB access in general has this approximate effect in every EGTB system that I know of.
-
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: Gaviota 0.74.6 (special for EGTB builders)
Volker Pittlik wrote:
A more general question: how big are all 5-men EGTBs? I'm running the program around two days now. I got 117 files generated so far and around 23 GB are used. That seems a bit much to me.
Wouldn't it be possible to spread a verified set of TBs via emule or a torrent?
Regards
Volker
Dann Corbit wrote:
38 GB for full set, they are not compressed. I am negotiating with Miguel on a compression scheme. The compressed files would be about 3.8 GB.
How is this compression ratio possible?
Just to take the Nalimov EGTB for comparison:
The indexing is a bit more complex, so the uncompressed size of the 3-5 men set is 30,612,186,969 bytes.
These files are compressed with DATACOMP by Andrew Kadatch, and the resulting size is 7,580,348,624 bytes = 24.76% of uncompressed.
The compression ratio is very much dependent on the block size. A larger block size results in better compression, but it takes longer to decompress a block to get the value of a single position. IIRC, Nalimov together with Hyatt ran a couple of tests and arrived at the current (optimal) block size of 8 KB.
So what does this EGTB have that Nalimov's doesn't, more than doubling the compression ratio?
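As a side note on the trade-off Edmund describes, here is a minimal sketch of block-based probing in C. The block size, the identity "codec" stub, and the offsets[] table are illustrative assumptions, not Nalimov's or Gaviota's actual format; the point is only that a random probe must fetch and decode an entire block, so per-probe decode cost grows with block size.
Code: Select all
#include <stdio.h>

#define BLOCK_SIZE 8192  /* the 8 KB block size discussed above */

/* Stand-in codec: a real table would call an LZMA/zlib-style routine here.
   The whole block must be decoded even to fetch a single byte. */
static void decompress_block(const unsigned char *src, size_t srclen,
                             unsigned char *dst)
{
    size_t i;
    for (i = 0; i < srclen; i++)       /* identity stub; a real codec would */
        dst[i] = src[i];               /* expand srclen bytes to BLOCK_SIZE */
}

/* offsets[] is a hypothetical header table holding the file position of
   each compressed block; one byte per position is assumed. */
static int probe(FILE *f, const long *offsets, unsigned long pos_index,
                 unsigned char *value)
{
    unsigned long blk = pos_index / BLOCK_SIZE;
    unsigned long off = pos_index % BLOCK_SIZE;
    unsigned char cbuf[BLOCK_SIZE], buf[BLOCK_SIZE];
    size_t clen = (size_t)(offsets[blk + 1] - offsets[blk]);

    if (clen > BLOCK_SIZE)
        return -1;                     /* compressed block never exceeds raw */
    if (fseek(f, offsets[blk], SEEK_SET) != 0 ||
        fread(cbuf, 1, clen, f) != clen)
        return -1;
    decompress_block(cbuf, clen, buf); /* work grows with BLOCK_SIZE */
    *value = buf[off];
    return 0;
}
Doubling the block size improves the ratio but doubles the decode work per random probe, which is the compromise behind the 8 KB figure quoted above.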
-
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Gaviota 0.74.6 (special for EGTB builders)
Edmund wrote:
Hello Miguel!
Great to see your EGTB generator is working! It would be really interesting to me if you could explain some of the details, like the indexing used or the generation algorithm. I just tried to figure out the indexing myself but wasn't able to fully grasp it.
From what I got, it uses the trick of indexing the kings separately - 462 configurations for pawnless files. But it doesn't take into account the fact that two pieces may not stand on the same square. Furthermore, both tables (wtm, btm) are integrated into one file.
It starts with a broken position. The first two bits are used for information.
But I didn't get the way the pieces are indexed. For example, the KRK file starts with "0x0303 0309 0909 0909 0309 0909 0909 0909". This sequence seems odd, because I would have expected it to start with a broken position, as the rook is standing on the same square as one of the kings, and even if broken positions are not marked I don't quite understand the following values.
Thanks in advance,
Edmund
Correct. It takes into account that two pieces can be on the same square only when they are of the same type (e.g. two white rooks), because I index them as a pair. I think that taking into account that two pieces cannot be on the same square is an unneeded complication. How much memory would I save? Moreover, if I am that worried about memory, I would rather compress the files later, and that trick may make the life of the compressor more difficult.
Code: Select all
enum { iDRAW  = 0, iWMATE  = 1, iBMATE  = 2, iFORBID  = 3,
       iDRAWt = 4, iWMATEt = 5, iBMATEt = 6, iUNKNOWN = 7 };
The third bit (values 4 to 7) is used only to flag temporary values during the generation process; it is later overwritten by distance-to-mate information. During the generation I use two bytes per position, but at the end they are packed into one byte.
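To make the layout concrete, here is one possible packing consistent with that description. The exact Gaviota bit layout is not given in the thread, so RESULT_BITS and the helper names below are assumptions of this sketch, not the real format.
Code: Select all
/* Hypothetical final-byte layout: result enum in the low bits,
   distance-to-mate in the remaining high bits. */
#define RESULT_BITS 2
#define RESULT_MASK ((1u << RESULT_BITS) - 1u)

static unsigned char pack(unsigned result, unsigned dtm)
{
    return (unsigned char)((dtm << RESULT_BITS) | (result & RESULT_MASK));
}

static unsigned unpack_result(unsigned char b) { return b & RESULT_MASK; }
static unsigned unpack_dtm(unsigned char b)    { return b >> RESULT_BITS; }
With two info bits, six bits remain for the distance, so a real encoder would need some escape for longer mates; the thread does not say how Gaviota handles that.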
I will eventually release the generation and probing code under a BSD license.
Miguel
-
- Posts: 12777
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Gaviota 0.74.6 (special for EGTB builders)
Edmund wrote:
How is this compression ratio possible? [...]
So what does this EGTB have that Nalimov's doesn't, more than doubling the compression ratio?
Proposed compression system is the 7-zip library (the library is formally declared public domain):
http://www.7-zip.org/sdk.html
Note that as of 4.62: "4.62: Some fixes. LZMA SDK is placed in the public domain. "
Proposed lookup system would be either FastDB or Gigabase (also public domain):
http://www.garret.ru/databases.html
FastDB is a fully memory-mapped database; Gigabase is a demand-paged memory-mapped database.
But Miguel needs to think about what direction he wants to go in, and I don't want to push him. He may have alternative ideas.
The reason I imagine 10:1 compression is that I used 7-zip to compress the 4-men files into 15,939,772 bytes (gtb.7z), which contains:
Code: Select all
10/23/2009 03:37 PM 1,862,784 kbbk.gtb
10/23/2009 03:36 PM 59,136 kbk.gtb
10/23/2009 03:36 PM 3,784,704 kbkb.gtb
10/23/2009 03:36 PM 3,784,704 kbkn.gtb
10/23/2009 03:38 PM 12,582,912 kbkp.gtb
10/23/2009 03:37 PM 3,784,704 kbnk.gtb
10/23/2009 03:38 PM 12,582,912 kbpk.gtb
10/23/2009 03:36 PM 59,136 knk.gtb
10/23/2009 03:36 PM 3,784,704 knkn.gtb
10/23/2009 03:38 PM 12,582,912 knkp.gtb
10/23/2009 03:37 PM 1,862,784 knnk.gtb
10/23/2009 03:38 PM 12,582,912 knpk.gtb
10/23/2009 03:36 PM 196,608 kpk.gtb
10/23/2009 03:39 PM 9,437,184 kpkp.gtb
10/23/2009 03:39 PM 4,718,592 kppk.gtb
10/23/2009 03:37 PM 3,784,704 kqbk.gtb
10/23/2009 03:36 PM 59,136 kqk.gtb
10/23/2009 03:36 PM 3,784,704 kqkb.gtb
10/23/2009 03:36 PM 3,784,704 kqkn.gtb
10/23/2009 03:37 PM 12,582,912 kqkp.gtb
10/23/2009 03:36 PM 3,784,704 kqkq.gtb
10/23/2009 03:36 PM 3,784,704 kqkr.gtb
10/23/2009 03:37 PM 3,784,704 kqnk.gtb
10/23/2009 03:38 PM 12,582,912 kqpk.gtb
10/23/2009 03:36 PM 1,862,784 kqqk.gtb
10/23/2009 03:36 PM 3,784,704 kqrk.gtb
10/23/2009 03:37 PM 3,784,704 krbk.gtb
10/23/2009 03:36 PM 59,136 krk.gtb
10/23/2009 03:36 PM 3,784,704 krkb.gtb
10/23/2009 03:36 PM 3,784,704 krkn.gtb
10/23/2009 03:37 PM 12,582,912 krkp.gtb
10/23/2009 03:36 PM 3,784,704 krkr.gtb
10/23/2009 03:37 PM 3,784,704 krnk.gtb
10/23/2009 03:38 PM 12,582,912 krpk.gtb
10/23/2009 03:37 PM 1,862,784 krrk.gtb
35 File(s) 183,258,624 bytes
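(That is 15,939,772 / 183,258,624 ≈ 8.7% of the original size, or roughly an 11.5:1 ratio.)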
-
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Gaviota 0.74.6 (special for EGTB builders)
Spock wrote:
Dann Corbit wrote: 38 GB for full set, they are not compressed. I am negotiating with Miguel on a compression scheme. The compressed files would be about 3.8 GB.
The advantage of them being uncompressed would be much faster access by the engine, would it not?
Short answer: most likely, uncompressed files are faster on solid-state drives.
On a normal HD I do not know; probably about the same. What I observe with Gaviota is that access to the tables as it is now is good enough.
The time needed to probe the tables is determined by
Raw probing time = HD access + transfer time + decompression time
Probing time = raw probing time * cache miss rate
On a normal HD with no compression, the access is the bottleneck. For that reason, I transfer not only the position that I need, but a contiguous block of ~8000 positions. This may increase the total time just a tiny bit compared to transferring the info of only one position. What do I gain? I fill up my cache, so the next time I need a nearby position it is already in memory. In other words, I decrease the cache miss rate in the above formula.
Why could compression be an advantage? It may not add a lot of time if access is the bottleneck, and it may allow transferring more positions to fill the cache, decreasing the cache miss rate even more. This depends on many circumstances, and it is all true only while access is the bottleneck. On a solid-state drive the access time is much lower (there is no disk spinning), so decompression can become the bottleneck. This will require tests.
Compression has, of course, other benefits, like not hogging the users' disk space.
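A minimal sketch of the block caching described above, for the uncompressed case. The single-slot cache, the run size, and the one-byte-per-position layout are illustrative assumptions, not Gaviota's implementation.
Code: Select all
#include <stdio.h>

#define RUN 8000                     /* ~8000 positions per transfer */

static unsigned char cache[RUN];
static long cache_base = -1;         /* first index held; -1 = empty */

/* One byte per position is assumed, so file offset == position index. */
static unsigned char probe_uncompressed(FILE *f, long idx)
{
    long base = (idx / RUN) * RUN;   /* start of the run containing idx */
    if (base != cache_base) {        /* miss: one seek plus one big read */
        fseek(f, base, SEEK_SET);
        if (fread(cache, 1, RUN, f) > 0)   /* short read only at file end */
            cache_base = base;
    }
    return cache[idx - base];        /* hit: no disk access at all */
}
The big read costs little more than a one-byte read because the seek dominates; that is exactly why the cache miss rate, not the transfer size, is the term worth attacking in the formula above.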
Miguel
-
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Gaviota 0.74.6 (special for EGTB builders)
SzG wrote:
Spock wrote:
michiguel wrote: tbcheck gtb
PS: if it works well, I may include the utility tbcheck in the distribution, or I may integrate it into the engine.
I ran it on my tablebases just to double-check, and all were OK. I would include it as a separate exe in the distribution.
At last I have the full gtb set on my hard drive. Ray helped me out with the last 4 files. I also ran tbcheck and all files were OK.
Thank you for your efforts, Miguel.
BTW, what is the tbval command for?
Great. I guess the problems originated when the first version was hogging too much memory, and crashes generated corrupted files. It should not happen now if someone else tries to repeat the process. I am curious to see Volker's experience.
tbval validates the files. There is not much use for it for the user, but it is critical for the developer to run it at least once. It goes position by position and does a 1-ply search (probing the child positions that are also in the same EGTB file). The result of this search should be identical to the result stored. It takes a long time (less than the generation, though), but if it passes, it guarantees that the generation was correct. You can try tbval 3 or tbval 4.
So there are three ways to check the files:
1) tbtest: tests one position of each file to make sure that the file is present and is what it is supposed to be. Super fast, but it might miss corruption of a fragment, or even a truncated file.
2) tbcheck: checks the integrity of each file by generating signatures and comparing them with the originals. I use only 32-bit signatures because that should be enough. Not that fast, but it is extremely unlikely that a corrupted file goes undetected (a 32-bit signature gives roughly a 1 in 4 billion chance of a random collision).
3) tbval: what I explained above. Very slow, but it is impossible to have a wrong file if this test passes (a toy version of this check is sketched below).
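To illustrate the tbval idea, here is a self-contained toy in C: every stored value must equal the negamax of the stored values of its 1-ply successors. The three-position "table", the move lists, and the win/loss encoding are fabricated for the example; the real tool works on the distance-to-mate data in the .gtb files.
Code: Select all
#include <stdio.h>

#define NPOS 3
static int stored_wdl[NPOS] = { +1, -1, +1 };  /* win/loss, mover's view */
static int nmoves[NPOS]     = { 2, 1, 1 };
static int child[NPOS][2]   = { {1, 2}, {0}, {1} };  /* successor indices */

/* Returns -1 if every position is consistent, else the first bad index. */
static int tbval(void)
{
    int p, m;
    for (p = 0; p < NPOS; p++) {
        int best = -2;                        /* worse than any real value */
        for (m = 0; m < nmoves[p]; m++) {
            int v = -stored_wdl[child[p][m]]; /* negamax: flip side to move */
            if (v > best) best = v;
        }
        if (best != stored_wdl[p]) return p;  /* corrupt file or generator bug */
    }
    return -1;
}

int main(void)
{
    int bad = tbval();
    if (bad < 0) printf("table is self-consistent\n");
    else         printf("mismatch at position %d\n", bad);
    return 0;
}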
Miguel
-
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Gaviota 0.74.6 (special for EGTB builders)
Edmund wrote:
How is this compression ratio possible? [...]
So what does this EGTB have that Nalimov's doesn't, more than doubling the compression ratio?
Dann already answered. Maybe it is the 7z algorithm. I wonder what the compression ratio of the Nalimov files would be with 7z.
I do not think that the Gaviota files have anything special. If there is something special, it may be the fact that the indexing is simpler, or that the positions are indexed in a way that keeps the slow-moving pieces (the kings) close together in the index. But I do not think the latter is anything new. I guess simplicity sometimes makes compression easier.
Dann showed that the 4-men TB files compress to less than a tenth of their size. Maybe the 5-men files do not compress that well? I checked a handful, and the ratio varies from 10% to 20%. So, to be cautious, I think the total will be somewhere between 3.8 GB and roughly 7 GB.
Miguel
-
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: Gaviota 0.74.6 (special for EGTB builders)
Dann Corbit wrote: [...]
Proposed compression system is the 7-zip library (the library is formally declared public domain):
http://www.7-zip.org/sdk.html
Note that as of 4.62: "4.62: Some fixes. LZMA SDK is placed in the public domain."
[...]
Compression is one thing, probing is another.
For the 4-men files it is not so much of a problem: you can probably decompress them as a whole, as they are not too big. But with 5 men, and certainly with 6 men, one has to work with blocks. Is this still possible with general-purpose compression libraries?
Older references:
http://kirill-kryukov.com/chess/discuss ... f=6&t=1558
http://www.open-aurec.com/wbforum/viewt ... 8&start=20
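For what it's worth, block probing is possible with the LZMA SDK if every block is compressed independently and an offset table is kept; here is a hedged sketch using LzmaUncompress() from the SDK's LzmaLib.h. The offset table, the placement of the LZMA property bytes, and the block size are assumptions of this sketch, not a format anyone in the thread has committed to.
Code: Select all
#include <stdio.h>
#include <stdlib.h>
#include "LzmaLib.h"    /* LzmaUncompress(), LZMA_PROPS_SIZE, from the SDK */

#define BLOCK_SIZE 8192

/* Decompress block b into out[]; offsets[b]..offsets[b+1] delimit the
   compressed block on disk, and props holds the LZMA property bytes that
   this sketch assumes are stored once in the file header. */
static int read_block(FILE *f, const long *offsets,
                      const unsigned char *props, unsigned b,
                      unsigned char *out /* BLOCK_SIZE bytes */)
{
    size_t srclen = (size_t)(offsets[b + 1] - offsets[b]);
    size_t dstlen = BLOCK_SIZE;    /* last block of a file may be shorter */
    unsigned char *cbuf = malloc(srclen);
    int rc;

    if (cbuf == NULL)
        return -1;
    if (fseek(f, offsets[b], SEEK_SET) != 0 ||
        fread(cbuf, 1, srclen, f) != srclen) {
        free(cbuf);
        return -1;
    }
    /* returns SZ_OK (0) on success; srclen is updated to bytes consumed */
    rc = LzmaUncompress(out, &dstlen, cbuf, &srclen, props, LZMA_PROPS_SIZE);
    free(cbuf);
    return rc == SZ_OK ? 0 : -1;
}
Compressing each block separately costs some ratio relative to one solid stream, which is presumably where the block-size tuning Edmund mentioned earlier comes in.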
-
- Posts: 12777
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Gaviota 0.74.6 (special for EGTB builders)
michiguel wrote:
Short answer: most likely, uncompressed files are faster on solid-state drives. [...]
Also, once the 8-man Ballicora tablebase files become standardized, it will save a significant amount of storage. Otherwise, we will have to spend another $2.50 on a 12-petabyte memory stick.
