Nalimov and memory for indexes (are you aware?)

bob · Post by **bob** » Tue Mar 02, 2010 10:05 pm

Gian-Carlo Pascutto wrote:
BrandonSi wrote:not all indexes would be loaded into memory at the same time,
They are. It's a limitation of the Nalimov code.

I don't think of it as a "limitation". Who would want to page in and out the indices, so that you can then figure out what to page in / out from a specific EGTB file? I/O is already slow enough. Constantly reading indices before reading data would make it two times slower...

bob · Post by **bob** » Tue Mar 02, 2010 10:06 pm

michiguel wrote:
Gian-Carlo Pascutto wrote:
Harvey Williamson wrote: Rybka is very bad at this: whatever you set as the TB cache, it will use that amount times the number of cores. So on an 8-core machine, if you set the cache at 64, Rybka will take 8x64. I have not seen other engines do this.
Probably an issue with every engine that is multiprocessed. Zappa will likely be affected too.

It's not only the caches, the indexes will also get replicated. Nalimov wrote his code for Crafty at a time when Crafty was multithreaded, and the code just sucks for multiprocessed engines.
I guess that at least they could ask for EGTB Cache/core rather than EGTB Cache, to let the user know what is going on.

Anyway, I am very interested in your expert opinion:

I think it is not a good idea to have (for instance) 4 caches of 32 MiB, one for each thread. It would be better to have a 128 MiB cache shared by all threads, and properly protected for read/write operations (that is what I am doing currently with Gaviota TBs). Of course, this creates a problem when all threads start to hit the cache, and it could potentially degrade the parallel scalability. However, the EGTB probe is dominated so much by HD access that anything else is almost irrelevant. Having a bigger shared cache significantly decreases the likelihood of HD access. I prefer to reduce HD accesses rather than the potential overlap of two threads trying to access the EGTB cache. Most of those problems have already been faced by the hash table probe.

What do you think?

Miguel

That's why threads are better when using Eugene's code. The indices are shared among all threads, as are the cache blocks. If you use processes instead, you duplicate _everything_ which is a poor use of memory.
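A rough sketch of the shared-cache idea Miguel describes, for anyone who wants to see it spelled out: one cache for all threads, guarded by a single mutex. This is not Gaviota's or Nalimov's actual code. The names (`cache_probe`, `load_block_from_disk`), the slot count and the block size are all made up for illustration, and the "disk read" is a stand-in that just fills a buffer.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 1024   /* hypothetical; a real cache would be far larger */
#define BLOCK_BYTES 64     /* hypothetical block size */

typedef struct {
    uint64_t key;          /* identifies one (file, block) pair */
    int valid;
    unsigned char data[BLOCK_BYTES];
} cache_slot;

static cache_slot cache[CACHE_SLOTS];              /* one cache for all threads */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static long disk_reads = 0;                        /* counts simulated HD accesses */

/* Stand-in for the slow disk read (assumption: real code would read a file). */
static void load_block_from_disk(uint64_t key, unsigned char *out) {
    memset(out, (int)(key & 0xff), BLOCK_BYTES);
}

/* Copies the block for `key` into `out`. Returns 1 on a cache hit, 0 on a miss. */
int cache_probe(uint64_t key, unsigned char *out) {
    cache_slot *s = &cache[key % CACHE_SLOTS];
    int hit;
    pthread_mutex_lock(&cache_lock);
    hit = s->valid && s->key == key;
    if (!hit) {            /* miss: fetch once, every thread benefits afterwards */
        load_block_from_disk(key, s->data);
        s->key = key;
        s->valid = 1;
        disk_reads++;
    }
    memcpy(out, s->data, BLOCK_BYTES);
    pthread_mutex_unlock(&cache_lock);
    return hit;
}

static void *worker(void *arg) {
    unsigned char buf[BLOCK_BYTES];
    (void)arg;
    for (uint64_t k = 0; k < 100; k++)
        cache_probe(k, buf);
    return NULL;
}

/* Four threads probe the same 100 blocks; returns the number of disk reads. */
long run_shared_cache_demo(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    return disk_reads;
}
```

With four private caches each thread would have fetched those 100 blocks itself; here each block is fetched once. Holding the lock across the disk read is the simplest possible choice and is exactly the contention Miguel mentions. A real implementation would probably want finer-grained locking.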

bob · Post by **bob** » Tue Mar 02, 2010 10:13 pm

Werner wrote:Hi,
I think in this case Windows task manager does not show it correctly. If you look at the remaining free memory, you see that the 64 MB is used only once.

Here is what ought to happen, and what _does_ happen under Linux. If you initialize the EGTBs _before_ you use fork() to create new processes, all is well. The indices are allocated, and filled in, and then after a fork() the memory is shared among all new processes via the "copy-on-write" logic that says to share all memory by just duplicating the page table entries so that each process has a separate page table, but with identical contents. Each writable page of memory is temporarily flagged as no-write. If one of those gets modified, the O/S first duplicates the unmodified data by copying it to a new page of RAM, then modifies the page table of the writing process so that it now points to that new page of RAM with write permission, and then continues. Since the indices never get modified, they should be shared among all processes just as if they were using threads. And since there is no modification, there are no race issues and no need for any locks. So it really doesn't get "duplicated" under Linux, you get exactly one copy no matter how many processes you run. For windows, I don't know if this is true, but would certainly expect it to work like that.

So the duplication is "virtual" but not "physical" and there really is only one copy of the indices in RAM and everyone is sharing them without knowing they are doing so.
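The initialize-then-fork ordering described above looks roughly like this on a POSIX system. It is only a sketch: `tb_index`, `init_index` and the index size are hypothetical stand-ins for the real EGTB indices, and the demo checks only that each child sees the parent's data without re-initializing it. It does not measure physical memory.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define INDEX_ENTRIES (1u << 20)   /* hypothetical index size */

static unsigned *tb_index;         /* stand-in for the EGTB indices */

/* Allocate and fill the index once, in the parent. */
static void init_index(void) {
    tb_index = malloc(INDEX_ENTRIES * sizeof *tb_index);
    if (!tb_index) exit(1);
    for (unsigned i = 0; i < INDEX_ENTRIES; i++)
        tb_index[i] = i * 2654435761u;   /* arbitrary filler values */
}

/* Forks `nchildren` readers; returns how many read the index correctly. */
int run_fork_demo(int nchildren) {
    int ok = 0, status;

    init_index();                  /* _before_ fork(): pages can be shared */
    fflush(stdout);                /* avoid duplicated stdio buffers in children */

    for (int c = 0; c < nchildren; c++) {
        pid_t pid = fork();
        if (pid < 0) break;        /* fork failed; stop creating children */
        if (pid == 0) {
            /* Child: only reads the index, so no page is ever copied. */
            unsigned probe = 12345;
            _exit(tb_index[probe] == probe * 2654435761u ? 0 : 1);
        }
    }

    /* Parent: collect the children and count the successful ones. */
    while (wait(&status) > 0)
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;

    free(tb_index);
    return ok;
}
```

If a child wrote to `tb_index`, only the pages it touched would be copied for that child. Calling `init_index()` after the fork() instead would give every process its own private copy, which is the duplication being complained about in this thread.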

michiguel · Post by **michiguel** » Tue Mar 02, 2010 11:47 pm

bob wrote:
Gian-Carlo Pascutto wrote:
BrandonSi wrote:not all indexes would be loaded into memory at the same time,
They are. It's a limitation of the Nalimov code.
I don't think of it as a "limitation". Who would want to page in and out the indices, so that you can then figure out what to page in / out from a specific EGTB file? I/O is already slow enough. Constantly reading indices before reading data would make it two times slower...

It will be two times slower if you read the indexes every single time you probe. However, you can cache the ones you read more often. For a given position you hit only ~20% of all the files. So, I think that keeping only 20% of the indexes in cache should be very safe (even less may suffice). The performance penalty to go fetch the indexes on rare occasions should be negligible.
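To make the idea concrete, here is a minimal sketch of keeping only a fraction of the per-file indexes resident and evicting the least recently used one on a miss. Everything in it is an assumption for illustration: the file count (100), the resident limit (20, i.e. 20%), the names, and the fake `load_index_from_disk`. It is single-threaded and is not how Nalimov's or Gaviota's code works.

```c
#include <stdlib.h>

#define NUM_FILES    100   /* hypothetical number of tablebase files */
#define MAX_RESIDENT 20    /* keep about 20% of the indexes in memory */

typedef struct {
    int file_id;               /* -1 means the slot is empty */
    unsigned long last_used;   /* tick of the most recent access */
    int *index;                /* stand-in for the real index data */
} index_slot;

static index_slot resident[MAX_RESIDENT];
static unsigned long clock_tick;
static int index_loads;        /* counts simulated index reads from disk */

/* Stand-in for reading one file's index from disk. */
static int *load_index_from_disk(int file_id) {
    int *p = malloc(4 * sizeof *p);
    if (!p) exit(1);
    p[0] = file_id;
    index_loads++;
    return p;
}

/* Must be called once before get_index(). */
void index_cache_init(void) {
    for (int i = 0; i < MAX_RESIDENT; i++) {
        resident[i].file_id = -1;
        resident[i].last_used = 0;
        resident[i].index = NULL;
    }
    clock_tick = 0;
    index_loads = 0;
}

/* Returns the index for `file_id`, loading it (and evicting the LRU) on a miss. */
const int *get_index(int file_id) {
    int victim = 0;
    clock_tick++;
    for (int i = 0; i < MAX_RESIDENT; i++) {
        if (resident[i].file_id == file_id) {      /* hit: no disk access */
            resident[i].last_used = clock_tick;
            return resident[i].index;
        }
        if (resident[i].last_used < resident[victim].last_used)
            victim = i;                            /* remember the oldest slot */
    }
    free(resident[victim].index);                  /* free(NULL) is a no-op */
    resident[victim].index = load_index_from_disk(file_id);
    resident[victim].file_id = file_id;
    resident[victim].last_used = clock_tick;
    return resident[victim].index;
}

/* A hot set of 20 files probed 1000 times, then one rare file. */
int run_index_cache_demo(void) {
    index_cache_init();
    for (int round = 0; round < 1000; round++)
        for (int f = 0; f < 20; f++)
            get_index(f);
    get_index(NUM_FILES - 1);
    return index_loads;
}
```

In the demo the 20,000 probes of the hot set cost 20 index loads, and the single rare file costs one more. That is the "negligible penalty" case: the extra read happens only when a probe leaves the working set.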

BTW, there is an advantage to keeping wtm and btm positions in the same file. Nalimov EGTBs keep separate files for those.

Miguel

michiguel · Post by **michiguel** » Tue Mar 02, 2010 11:54 pm

Gian-Carlo Pascutto wrote:
michiguel wrote:
M ANSARI wrote:Well I have to agree that there is something different with R3 and Nalimov memory usage. I noticed that when I put EGTB's on a USB drive it takes ages for the engine to load and unload. For some reason if EGTB's are on HDD then this is not a problem. Once loaded things are OK. I don't see this behaviour with other engines and to be honest I have never figured this one out.
Each of the threads load their own indexes? Then it is 20 MiB x cores?

Miguel
Each *process*, see my comment above.

Oops, now I get it. Mmmhhh... I have to think if Gaviota TBs are MP friendly or not... I think they should be if everything is initialized before forking.

Miguel

There's a well-known bug where SMP Rybka doesn't use tablebases correctly, and it's closely related: none of the Nalimov stuff is shared and Vasik forgot to pass a parameter from the master process to the slaves.

I'm glad I got rid of sh*t like that in DS 3.0

lmader · Post by **lmader** » Wed Mar 03, 2010 12:21 am

bob wrote:For windows, I don't know if this is true, but would certainly expect it to work like that.

Interestingly, although Windows supports copy-on-write semantics for many things, there isn't a Windows API method for creating a process that behaves the way fork() does in *nix. Specifically, there isn't anything at the Windows API level that copies the page tables into the newly created process. The Windows CreateProcess() API method creates a brand new fresh process, analogous to fork() followed by exec().

Windows prefers using threads for this scenario, but I realize that this is different, and has both pros and cons.

bob · Post by **bob** » Wed Mar 03, 2010 1:12 am

lmader wrote:
bob wrote:For windows, I don't know if this is true, but would certainly expect it to work like that.
Interestingly, although Windows supports copy-on-write semantics for many things, there isn't a Windows API method for creating a process that behaves the way fork() does in *nix. Specifically, there isn't anything at the Windows API level that copies the page tables into the newly created process. The Windows CreateProcess() API method creates a brand new fresh process, analogous to fork() followed by exec().

Windows prefers using threads for this scenario, but I realize that this is different, and has both pros and cons.

This is simply a far better idea. No idea why windows would not use this, since it is well-known and has been in Linux for several years... Threads are not equivalent, since they share _everything_. copy-on-write shares everything until it is modified and slowly builds up copies of modified data, while still sharing that which has not (or can not) be modified including instructions. Far cache-friendlier as well.

bob · Post by **bob** » Wed Mar 03, 2010 1:14 am

michiguel wrote:
Gian-Carlo Pascutto wrote:
michiguel wrote:
M ANSARI wrote:Well I have to agree that there is something different with R3 and Nalimov memory usage. I noticed that when I put EGTB's on a USB drive it takes ages for the engine to load and unload. For some reason if EGTB's are on HDD then this is not a problem. Once loaded things are OK. I don't see this behaviour with other engines and to be honest I have never figured this one out.
Each of the threads load their own indexes? Then it is 20 MiB x cores?

Miguel
Each *process*, see my comment above.

Oops, now I get it. Mmmhhh... I have to think if Gaviota TBs are MP friendly or not... I think they should be if everything is initialized before forking.

Don't bet on it if you use windows. Someone pointed out that windows doesn't do copy-on-write as unix does... which would produce duplicates of everything when you fork().

Miguel

There's a well-known bug where SMP Rybka doesn't use tablebases correctly, and it's closely related: none of the Nalimov stuff is shared and Vasik forgot to pass a parameter from the master process to the slaves.

I'm glad I got rid of sh*t like that in DS 3.0

lmader · Post by **lmader** » Wed Mar 03, 2010 1:40 am

bob wrote:Don't bet on it if you use windows. Someone pointed out that windows doesn't do copy-on-write as unix does... which would produce duplicates of everything when you fork().

Well, it's not really that Windows doesn't do copy-on-write in similar ways to *nix, it's that the Windows API doesn't support creating a process with a copy of the parent's data. There is no equivalent of fork() in Windows.

Which is a little weird and unfortunate.

So doing this with multiple processes in Windows would be harder. There are probably a million ways to share a cache of memory between processes in Windows, you just can't do it with the fork() semantics.

Michel · Post by **Michel** » Wed Mar 03, 2010 7:19 am

lmader wrote:There is no equivalent of fork() in Windows.

I was told that the windows kernel supports it. It is just not documented.
