New 6-piece tablebases

lucasart · Post by **lucasart** » Sat Apr 06, 2013 2:56 am

syzygy wrote:
lucasart wrote:How does the size of 5-men table size base compare with: the Nalimov TB ? the Gaviota TB ? And in terms of "average probing speed" ?
The 5-piece Nalimov TB are over 7 GB. I believe the 6-piece are over 1.2 TB (and those lack the 5v1 tables).
The 5-piece Gaviota TB are 6.5GB with LZMA compression, see here.

Average probing speed of my tables should be considerably better than that of Nalimov tables, but to be honest I have not measured it. But probing 1.2 TB Nalimov 6-piece tables will of course lead to much more disk trashing than probing 68.3 GB of tables. And 68.3 GB on SSD is much more affordable than 1.2 TB on SSD.

Woaw! So you're being modest, but in reality, your TB beats the pants out of everything out there. And yes, probing speed will surely be linked to size (disk access much slower than RAM). The size of your TB is more comparable to the size of a bitbase than a TB, so it's really a big improvement over Nalimov and Gaviota.

Would it be possible to modify your code to generate a bitbase only:
- one bit per position (disregarding 50-move counter)
- bit just says if it's a draw or not.

if not a draw, good engines will (in the vast majority of cases) be able to find the win on their own. the real difficulty is to understand that certain positions are a "no progress situation", despite a material advantage, or an apparent advantage in terms of eval score.

A very small bitbase with only draw info, that could fit in RAM for 5-men, would be really nice.

syzygy · Post by **syzygy** » Sat Apr 06, 2013 4:57 am

lucasart wrote:The size of your TB is more comparable to the size of a bitbase than a TB, so it's really a big improvement over Nalimov and Gaviota.

I believe most available "bitbases" are really WDL tables. So it's fair to compare my WDL tables to bitbases.

Would it be possible to modify your code to generate a bitbase only:
- one bit per position (disregarding 50-move counter)
- bit just says if it's a draw or not.

That is possible, but I don't think it would reduce their size by much. E.g. with KQvKR the positions where black K or R can immediately capture the white Q are already stored as "don't care". The few positions where black can mate or take the white Q in more than 1 ply don't take up much space, and the space that it takes up seems to be worthwhile since the evaluation will usually not catch these situations. Leaving it out would either result in erroneous results or force the engine to search a few more ply at each node where it could otherwise immediately return a result.

For tables that really have only two values such as KPvK or KBNvK, there would be no difference at all.

The story is the similar for the additional size due to the 50-move rule: for most tables there is hardly any impact and for tables such as KBBvKN having the information is quite useful.

If I find some time I might try to generate some draw/non-draw tables just to give some numbers.

A very small bitbase with only draw info, that could fit in RAM for 5-men, would be really nice.

Nowadays 378 MB should fit in RAM

.

What would really decrease the size is to make the WDL tables single sided. But that would affect probing speed quite significantly. Depending on how much RAM is available it might still be an overall win for 6-piece tables, but I have not tried it out.

Houdini · Post by **Houdini** » Sat Apr 06, 2013 5:25 pm

syzygy wrote:The probing code is fully multithreaded as well.

Ronald,

Congrats for your new EGTB system!
Could you comment on the multi-threaded performance? Suppose 8 threads are simultaneously probing, how much of the probing code can run simultaneously?

All the EGTB solutions I've tested so far (Gaviota, Nalimov, Scorpio) have a serious problem with multi-threaded access, too much of the code is behind a single mutex (critical section) and when 8 threads probe there are 7 threads waiting while the first thread is doing its probe.

Robert

Daniel Shawul · Post by **Daniel Shawul** » Sat Apr 06, 2013 6:02 pm

Houdini wrote:
syzygy wrote:The probing code is fully multithreaded as well.
Ronald,

Congrats for your new EGTB system!
Could you comment on the multi-threaded performance? Suppose 8 threads are simultaneously probing, how much of the probing code can run simultaneously?

All the EGTB solutions I've tested so far (Gaviota, Nalimov, Scorpio) have a serious problem with multi-threaded access, too much of the code is behind a single mutex (critical section) and when 8 threads probe there are 7 threads waiting while the first thread is doing its probe.

Robert

Hah..this is false at least for scorpio. I have code that allows upto 8 searchers to do probes simultaneously. The LRU cache has locks but that is just about it. Please lets not spread FUD.

syzygy · Post by **syzygy** » Sat Apr 06, 2013 6:37 pm

Houdini wrote:Could you comment on the multi-threaded performance? Suppose 8 threads are simultaneously probing, how much of the probing code can run simultaneously?

Multi-threaded access is completely simultaneous, except for a single lock that is taken when accessing a particular table for the first time (this is negligible).

One of the reasons for not needing the mutex that the other systems have is that I let the OS do all the caching. I simply mmap() the compressed tables into memory and directly acccess the compressed data. For 5-piece tables that should never be a problem (it's just 378 MB). For 6-piece tables I don't know how well e.g. Windows handles it.

Accessing 6-piece tables on 32-bit systems would need some work, because mmap()ing them will run into problems, but on 32-bit systems it seems to make more sense to stay with 5-piece tables anyway. I think the decompression code currently expects a 64-bit processor, but that can be changed quite easily.

Houdini · Post by **Houdini** » Sat Apr 06, 2013 7:54 pm

Daniel Shawul wrote:Hah..this is false at least for scorpio. I have code that allows upto 8 searchers to do probes simultaneously. The LRU cache has locks but that is just about it. Please lets not spread FUD.

AFAIK with large cache sizes a lot of time is spent traversing the LRU cache, and that is exactly what you are locking.
Do you have any performance data that would suggest otherwise?

Robert

Houdini · Post by **Houdini** » Sat Apr 06, 2013 7:55 pm

syzygy wrote:Multi-threaded access is completely simultaneous, except for a single lock that is taken when accessing a particular table for the first time (this is negligible).

That's great, I'm looking forward to testing your system.

Cheers,
Robert

Daniel Shawul · Post by **Daniel Shawul** » Sat Apr 06, 2013 8:31 pm

Houdini wrote:
Daniel Shawul wrote:Hah..this is false at least for scorpio. I have code that allows upto 8 searchers to do probes simultaneously. The LRU cache has locks but that is just about it. Please lets not spread FUD.
AFAIK with large cache sizes a lot of time is spent traversing the LRU cache, and that is exactly what you are locking.
Do you have any performance data that would suggest otherwise?

You ask me data for your claim? You are the one who said all 3 TBs can only probe with one thread while the rest are stuck, so lets see it.
My caches have always been small and they store _decompressed_ data. So a hit is very fast while if you let the OS cache it you will have to decompress a block of data everytime. I tested this a long time ago with upto 4 threads and found no problems, and the lru cache did speed things up.

Houdini · Post by **Houdini** » Sat Apr 06, 2013 8:46 pm

Daniel Shawul wrote:You ask me data for your claim? You are the one who said all 3 TBs can only probe with one thread while the rest are stuck, so lets see it.
My caches have always been small and they store _decompressed_ data. So a hit is very fast while if you let the OS cache it you will have to decompress a block of data everytime. I tested this a long time ago with upto 4 threads and found no problems, and the lru cache did speed things up.

In my experience implementing Scorpio bitbase probing in Houdini there is a significant reduction of the efficiency of the probing code when using multiple threads (4 or more), because of the locks on the LRU cache.
It's entirely up to you to decide to do something with this information or not, I have no further interest in this.

Daniel Shawul · Post by **Daniel Shawul** » Sat Apr 06, 2013 8:54 pm

Houdini wrote:
Daniel Shawul wrote:You ask me data for your claim? You are the one who said all 3 TBs can only probe with one thread while the rest are stuck, so lets see it.
My caches have always been small and they store _decompressed_ data. So a hit is very fast while if you let the OS cache it you will have to decompress a block of data everytime. I tested this a long time ago with upto 4 threads and found no problems, and the lru cache did speed things up.
In my experience implementing Scorpio bitbase probing in Houdini there is a significant reduction of the efficiency of the probing code when using multiple threads (4 or more), because of the locks on the LRU cache.
It's entirely up to you to decide to do something with this information or not, I have no further interest in this.

I decide to keep them in based on what i have. Stop spreading FUD or own up to your claims.

New 6-piece tablebases

Re: New 6-piece tablebases

Re: New 6-piece tablebases

Re: New 6-piece tablebases

Re: New 6-piece tablebases

Re: New 6-piece tablebases

Re: New 6-piece tablebases

Re: New 6-piece tablebases

Re: New 6-piece tablebases

Re: New 6-piece tablebases

Re: New 6-piece tablebases