7-men Syzygy attempt

noobpwnftw · Post by **noobpwnftw** » Sun Mar 18, 2018 10:34 am

Nordlandia wrote:Bojun Guo: have you checked if probing 7-man inflict larger speed penalty than usual. Probing 6-men cause slight engine speed reduction, not sure how much force 7-man add on speed.

Maybe too early to say.

There is slow down when it needs to probe TB and there is speed up when it doesn't have to search any further.
I believe personally that the more pieces you have in TB the more chances it would justify the time spent on probing.
But I cannot give you the answers based on theory though, and I don't have a reasonable amount of TBs yet.

koedem · Post by **koedem** » Sun Mar 18, 2018 2:34 pm

I think the important point is whether they fit on your SSD. If they do the slowdown shouldn't be too big. (i.e. little enough that it's worth it)

However if you have to put the TB on a HDD the much larger lookup times probably make it too slow to being offset by the gain from using TBs. (though I am just speculating of course)

If you can simply host it that's even better of course. I think one doesn't need the 5v1s, right? (I think I remember seeing some code in SF that handles incomplete TBs and obviously 5v1s can be handled by the engine without problems)

syzygy · Post by **syzygy** » Sun Mar 18, 2018 3:06 pm

koedem wrote:However if you have to put the TB on a HDD the much larger lookup times probably make it too slow to being offset by the gain from using TBs. (though I am just speculating of course)

It will depend on the probing depth. Doing a TB probe allows the search to cut the whole subtree, so if the subtree is large enough (remaining depth is high enough) the probe will be worth it already for that reason. The improved accuracy is then an added bonus.

If you can simply host it that's even better of course. I think one doesn't need the 5v1s, right? (I think I remember seeing some code in SF that handles incomplete TBs and obviously 5v1s can be handled by the engine without problems)

You mean 6v1. There is little reason not to generate it, if only to be the first with a complete set.

noobpwnftw · Post by **noobpwnftw** » Sun Mar 18, 2018 3:38 pm

The 6v1 ones are indeed meaningless in sense of improving engine strength, but they are important in the way that they require the generator to handle larger index space and for various verifications.

I also built 6v1 up to 2 pawns just to test the pawn code.

Only if these files are correct can I then safely assume that building the rest of the combinations would also produce correct results, it is time consuming and resource heavy so I really don't wish to find things broke after a couple of months and start over.

Rein Halbersma · Post by **Rein Halbersma** » Sun Mar 18, 2018 5:09 pm

syzygy wrote:
koedem wrote:However if you have to put the TB on a HDD the much larger lookup times probably make it too slow to being offset by the gain from using TBs. (though I am just speculating of course)
It will depend on the probing depth. Doing a TB probe allows the search to cut the whole subtree, so if the subtree is large enough (remaining depth is high enough) the probe will be worth it already for that reason. The improved accuracy is then an added bonus.

IIRC, you still do memory-mapped IO with unconditional probing? In checkers/draughts, where the dbs are adding quite a bit of playing strength, people write their own caching and conditionally probe the dbs. Typically they keep 4K (or whatever size the disk controller nowadays uses) blocks in an LRU-cache. When the search encounters an endgame, first the cache is being probed (unconditionally), if there is a cache miss, then some tunable function of search depth and search bound is being used to decide whether a disk read should be done. If so, a new block is being loaded and overwrites the LRU block in the cache. This disk read could also be done asynchronously in a separate thread, so that if a re-search is done the block is in cache.

syzygy · Post by **syzygy** » Sun Mar 18, 2018 7:01 pm

Rein Halbersma wrote:
syzygy wrote:
koedem wrote:However if you have to put the TB on a HDD the much larger lookup times probably make it too slow to being offset by the gain from using TBs. (though I am just speculating of course)
It will depend on the probing depth. Doing a TB probe allows the search to cut the whole subtree, so if the subtree is large enough (remaining depth is high enough) the probe will be worth it already for that reason. The improved accuracy is then an added bonus.
IIRC, you still do memory-mapped IO with unconditional probing?

Yes, it's so much easier. Of course probing can be limited to a particular remaining search depth and above.

In checkers/draughts, where the dbs are adding quite a bit of playing strength, people write their own caching and conditionally probe the dbs. Typically they keep 4K (or whatever size the disk controller nowadays uses) blocks in an LRU-cache.

But how are you going to do that efficiently in user space if you have 32 threads or so trying to access the TBs at the same time? I prefer to leave all the locking logic to the kernel, which can probably do it more efficiently anyway.

When the search encounters an endgame, first the cache is being probed (unconditionally), if there is a cache miss, then some tunable function of search depth and search bound is being used to decide whether a disk read should be done. If so, a new block is being loaded and overwrites the LRU block in the cache. This disk read could also be done asynchronously in a separate thread, so that if a re-search is done the block is in cache.

But the block to be read from disk might already/still be in the disk cache without the program knowing it.

On Linux it is possible (with mincore()) to check whether a particular virtual-memory address corresponds to an existing physical page. In theory that could be used to implement "smart" probing, but I don't know how efficient it would be.

With 6-piece TBs on SSD and a decent amount of RAM, there is no real probing overhead. With 7-piece TBs on HDD that will of course be different.

Rein Halbersma · Post by **Rein Halbersma** » Sun Mar 18, 2018 11:11 pm

syzygy wrote:
Rein Halbersma wrote:
syzygy wrote:
koedem wrote:However if you have to put the TB on a HDD the much larger lookup times probably make it too slow to being offset by the gain from using TBs. (though I am just speculating of course)
It will depend on the probing depth. Doing a TB probe allows the search to cut the whole subtree, so if the subtree is large enough (remaining depth is high enough) the probe will be worth it already for that reason. The improved accuracy is then an added bonus.
IIRC, you still do memory-mapped IO with unconditional probing?
Yes, it's so much easier. Of course probing can be limited to a particular remaining search depth and above.

In checkers/draughts, where the dbs are adding quite a bit of playing strength, people write their own caching and conditionally probe the dbs. Typically they keep 4K (or whatever size the disk controller nowadays uses) blocks in an LRU-cache.
But how are you going to do that efficiently in user space if you have 32 threads or so trying to access the TBs at the same time? I prefer to leave all the locking logic to the kernel, which can probably do it more efficiently anyway.

The locking is indeed done for each disk read, but for the shortest amount of time. See this GitHub repo by Ed Gilbert (author of the Kingsrow checkers/draughts program) https://github.com/eygilbert/egdb_intl which I helped port to Linux. In particular see https://github.com/eygilbert/egdb_intl/ ... all_v2.cpp and grep for "lock_guard". I think this has only been tested for 8 threads so I don't know if it would scale up to 32 threads. For 8 threads and an SSD, there is hardly any loss in NPS though. In the old days with limited RAM and HDD, the search speed could drop by 10x when near the endgame.

When the search encounters an endgame, first the cache is being probed (unconditionally), if there is a cache miss, then some tunable function of search depth and search bound is being used to decide whether a disk read should be done. If so, a new block is being loaded and overwrites the LRU block in the cache. This disk read could also be done asynchronously in a separate thread, so that if a re-search is done the block is in cache.
But the block to be read from disk might already/still be in the disk cache without the program knowing it.

That can indeed happen if you try to outguess the general OS caching by a domain-specific cache in user-land.

koedem · Post by **koedem** » Mon Mar 19, 2018 1:49 am

syzygy wrote:
If you can simply host it that's even better of course. I think one doesn't need the 5v1s, right? (I think I remember seeing some code in SF that handles incomplete TBs and obviously 5v1s can be handled by the engine without problems)
You mean 6v1. There is little reason not to generate it, if only to be the first with a complete set.

I meant for me as end user there is no reason to download 5v1 (yes I mean that one) since it's not really necessary for play. (and then I rather use my SSD space for a select few important 7 men tables rather than completely useless 5v1 ones)

koedem · Post by **koedem** » Mon Mar 19, 2018 1:57 am

syzygy wrote:
koedem wrote:However if you have to put the TB on a HDD the much larger lookup times probably make it too slow to being offset by the gain from using TBs. (though I am just speculating of course)
It will depend on the probing depth. Doing a TB probe allows the search to cut the whole subtree, so if the subtree is large enough (remaining depth is high enough) the probe will be worth it already for that reason. The improved accuracy is then an added bonus.

Of course it depends on how much you save but the interesting bit should be what it averages out to, right? (if the average lookup time from an HDD which is rather large can be compensated by the average savings in search)

syzygy wrote:But how are you going to do that efficiently in user space if you have 32 threads or so trying to access the TBs at the same time? I prefer to leave all the locking logic to the kernel, which can probably do it more efficiently anyway.

In a smart OS like Linux that might work indeed however on Windows already the 6 men tables are problematic since the default Windows memory mapping will throw out all idle processes to load in more tb files. So when you come back to your computer after a few hours it takes like an hour till all the pages are in the RAM again. (up until then you have 300 pagefaults per second and unresponsive applications)
Why that happens, even when setting the priority of the SF process to lowest is beyond me and I am not sure how to change that behavior. (except of course there being a different probing code)

syzygy · Post by **syzygy** » Mon Mar 19, 2018 12:42 pm

koedem wrote:
syzygy wrote:
If you can simply host it that's even better of course. I think one doesn't need the 5v1s, right? (I think I remember seeing some code in SF that handles incomplete TBs and obviously 5v1s can be handled by the engine without problems)
You mean 6v1. There is little reason not to generate it, if only to be the first with a complete set.
I meant for me as end user there is no reason to download 5v1 (yes I mean that one) since it's not really necessary for play. (and then I rather use my SSD space for a select few important 7 men tables rather than completely useless 5v1 ones)

OK, I thought you were suggesting to skip the generation of 6v1. 5v1 of course exists already.

If you want to probe 5v2, you need 5v1 (and 4v2 and their subtables). But compared with 7-piece TBs, the 5v1 files are really tiny anyway. (And the 5v2 ones are smaller than they would have been if the probing code had not required 5v1 to be present).

7-men Syzygy attempt

Re: 7-men Syzygy attempt

Re: 7-men Syzygy attempt

Re: 7-men Syzygy attempt

Re: 7-men Syzygy attempt

Re: 7-men Syzygy attempt

Re: 7-men Syzygy attempt

Re: 7-men Syzygy attempt

Re: 7-men Syzygy attempt

Re: 7-men Syzygy attempt

Re: 7-men Syzygy attempt