michiguel wrote:
I guess that the problems originated when the first version was hogging too much memory and crashes generated corrupted files. It should not happen now if someone else tries to repeat the process.
Well, someone else (me) tried to repeat the process. I used a notebook with Intel Core Duo P8700 CPU, 2 gigs of RAM, Win XP 32-bit.
All went well for 3 days until KppKQ generation. There I got "problems at joining" messages (what's that?) and the file got corrupted.
Can the problem be that generation was running on 2 cores?
I'll try to resume generating (deleting the corrupted KppKQ file first) using only 1 core anyway. Let's see what happens. I've got nothing to lose.
I am very curious to know what happens when you do that, for debugging purposes.
The message was something like this, correct?
thread 1: problems at joining
That is a message to inform what is going on internally. It is cryptic because it is for the developer, not the user. There is a problem when one of the cores finishes its work and is supposed to "join" the main one. I may have a bug somewhere, and it may be triggered only in some rare situation. I have not seen it, but that does not mean it is not there. As I said before, debugging this type of program is painful because each run takes not minutes but hours or days. I will try to reproduce this. If I can reproduce it, I can fix it.
Do you remember if it was at the beginning of the generation of that particular file?
Dann Corbit wrote:
[...]
Proposed compression system is the 7-zip library (the library is formally declared public domain): http://www.7-zip.org/sdk.html
Note that as of 4.62: "4.62: Some fixes. LZMA SDK is placed in the public domain. "
[...]
Compression is one thing, probing is another.
For 4-men it's not so much of a problem: you can probably decompress them as a whole, as they are not too big. But with 5-men, and certainly 6-men, one has to work with blocks. Is this still possible with general-purpose compression libraries?
My notion is as follows:
1. Take the entire set of EGTB files and compress them with maximum compression. This will create a fairly optimal compression dictionary.
2. Use this dictionary to compress slices and store the slices in the database using the seek value as the key.
3. We will also need to store this dictionary table.
Now, when you want to probe offset 14259800, we do a database search for that value, since we use the offset as our artificial key. That way, you can use the same API for both uncompressed and compressed access.
Dann Corbit wrote:
[...]
Now, when you want to probe offset 14259800, we do a database search for that value, since we use the offset as our artificial key. That way, you can use the same API for both uncompressed and compressed access.
Right out of Eugene's "play book" (except for the small detail that the compression he uses is tailored to EGTBs and compresses better than general-purpose algorithms). Picking the block size is a challenge for general-purpose computers, which can be anything from slow to very fast; ditto for disk drives, which can go from crawling to 15K rpm and beyond.
SzG wrote:
All went well for 3 days until KppKQ generation. There I got "problems at joining" messages (what's that?) and the file got corrupted.
Can the problem be that generation was running on 2 cores?
I'll try to resume generating (deleting the corrupted KppKQ file first) using only 1 core anyway. Let's see what happens. I've got nothing to lose.
I am very curious to know what happens when you do that, for debugging purposes.
The message was something like this, correct?
thread 1: problems at joining
Yes.
michiguel wrote:
Do you remember if it was at the beginning of the generation of that particular file?
I am not sure. I think it was at the beginning of KppKR generation. Then I stopped and ran tbcheck, which showed KppKQ corrupt. Again, not sure.
I found the bug. I will be posting a fix in the next hour or so. If it is what I think it is, I did not port a function from Linux to Windows properly. That explains why I did not see this behavior on my machine before.
SzG wrote:
I'll try to resume generating (deleting the corrupted KppKQ file first) using only 1 core anyway. Let's see what happens. I've got nothing to lose.
I am very curious to know what happens when you do that, for debugging purposes.
Well, generating on only 1 core seems to be successful. It was running all night; I have just stopped it in order to run tbcheck, and all files up to KBBKp are healthy.
The rest are absent, not yet generated. Now resuming.
I uploaded v0.74.9, which hopefully fixes it.
Anyway, stopping and resuming should work too. The bug manifests itself only after a long time of running. There is a variable that counts the number of threads that have been generated up to a given point. Since many threads are created and killed over the course of the EGTB generation, the variable keeps growing until it is so big that no more threads can be generated. Whenever you stop, you start from zero. So, if I am correct, you should be able to finish.
For the programmers: when I ported pthread_join() to Windows, I did not know that after WaitForSingleObject() I should call CloseHandle().