7-men Syzygy attempt

syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: 7-man Syzygy attempt.

Post by syzygy »

abulmo2 wrote:You forgot the dtz50 table:
You were not talking about dtz50 tables:
apart from a few trivial cases that a 1-ply search can decipher.
A 1-ply search is insufficient to solve KRvK and KQvK.
Moreover, you cut my quote where the important sentence was the following one:
Of course the same can be said for many 4-men to 8-men tables
Ah duh... Why do you think the 4-, 5- and 6-men tables do not efficiently store the trivial cases?
For example, are the 5-men positions with an isolated king useful?
They are very useful because they cut down the size of the 4v2 tables.
I understand that, as a table maker, you want to be exhaustive; but I guess you could agree that an incomplete 7-men table restricted to hard-to-decipher positions is somewhat more useful than a bunch of trivial 3-, 4-, 5- and 6-men positions.
Do you realise that that bunch of trivial positions takes up less than 0.1% of the space of a non-trivial 7-men table? There is nothing to save here.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: 7-man Syzygy attempt.

Post by Dann Corbit »

syzygy wrote:
Do you realise that that bunch of trivial positions takes up less than 0.1% of the space of a non-trivial 7-men table? There is nothing to save here.
I want the "absurd" tables too.

Besides the interesting statistics, consider the number of drawn positions in
KQQQQQk.
It's huge.
Of course, in a case like that, a tempo is a precious quantity for the disadvantaged side.

Let's look at the down-side:
Very small amount of space consumed by tables that will not be called during game play.
Very small amount of time lost during computation.

And now the positives:
Simplification of computation of other tables.
Mathematical completeness.
Interesting statistics generated.
And when you are done, you can delete any files you don't want.

It seems a no-brainer to me.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: 7-man Syzygy attempt.

Post by noobpwnftw »

Some more progress:

All 6+1 pawnless tables are available. The pawnful ones are currently being built and uploaded, as are the 5+2 pawnless ones. So far there has been no occurrence of DTZ overflow; I guess those few cases are in 4+3.

Hint: most of the calculation work runs quite fast, and CAS atomics are used during intermediate memory access; more precisely, "lock cmpxchg8b" on random unaligned memory addresses. This is by design and I don't think it can be avoided; I have already done enough fiddling.

This hits the memory bandwidth limit at about 60-90 threads on Xeon V4 and Skylake-SP processors. With more threads it backfires: on the latter there is up to a 10x slowdown affecting all threads, while on the former only the extra threads show high memory access latency. So the generator does not benefit from having more threads.

Thread binding does not seem to help, and I guess a hard cap of 64 threads should be there as a reminder to everyone else not to waste more time on this matter.
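
For readers unfamiliar with the CAS pattern mentioned above, here is a minimal sketch in C11. It is not the generator's actual code: the one-byte-per-position layout and the names `table_update` and `table_read` are assumptions made for illustration, and unlike the unaligned accesses described in the post, every word here is naturally aligned. On x86 a 64-bit compare-exchange like this is typically compiled to a lock-prefixed cmpxchg instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: one byte per position, packed eight to a 64-bit word. */
enum { ENTRIES_PER_WORD = 8 };

/* Atomically replace the byte for position `idx`, but only if it still holds
   `expected`. Returns true when this thread performed the update. */
static bool table_update(_Atomic uint64_t *table, uint64_t idx,
                         uint8_t expected, uint8_t desired)
{
    assert(table);
    _Atomic uint64_t *word = &table[idx / ENTRIES_PER_WORD];
    unsigned shift = (unsigned)(idx % ENTRIES_PER_WORD) * 8;
    uint64_t mask = (uint64_t)0xff << shift;

    uint64_t old = atomic_load_explicit(word, memory_order_relaxed);
    for (;;) {
        if ((uint8_t)((old & mask) >> shift) != expected)
            return false;  /* the entry no longer holds the expected value */
        uint64_t updated = (old & ~mask) | ((uint64_t)desired << shift);
        /* On failure `old` is refreshed with the current word, so the loop
           retries against up-to-date contents. */
        if (atomic_compare_exchange_weak_explicit(word, &old, updated,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed))
            return true;
    }
}

/* Read the byte stored for position `idx`. */
static uint8_t table_read(_Atomic uint64_t *table, uint64_t idx)
{
    uint64_t word = atomic_load_explicit(&table[idx / ENTRIES_PER_WORD],
                                         memory_order_acquire);
    return (uint8_t)(word >> ((idx % ENTRIES_PER_WORD) * 8));
}
```

The retry loop only spins when another thread changed the same word in between, which is why contention on a huge table is expected to be rare; the cost discussed in this thread comes from the locked memory access itself.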
Nordlandia
Posts: 2821
Joined: Fri Sep 25, 2015 9:38 pm
Location: Sortland, Norway

Re: 7-man Syzygy attempt.

Post by Nordlandia »

As of now, it looks highly likely that the WDL tables will be roughly 10 TB in size. For simple adjudication only WDL is needed. 10 TB HDDs are fairly affordable.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: 7-man Syzygy attempt.

Post by Dann Corbit »

Nordlandia wrote:As of now, it looks highly likely that the WDL tables will be roughly 10 TB in size. For simple adjudication only WDL is needed. 10 TB HDDs are fairly affordable.
There is a commercial 100 TB SSD drive.
I guess in 5 years a 100 TB SSD drive will be less than $1000.
Anyone who wants will be able to afford storage for the full 7 man files.
Jesse Gersenson
Posts: 593
Joined: Sat Aug 20, 2011 9:43 am

Re: 7-man Syzygy attempt.

Post by Jesse Gersenson »

noobpwnftw wrote:Some more progress:

All 6+1 pawnless tables are available. The pawnful ones are currently being built and uploaded, as are the 5+2 pawnless ones. So far there has been no occurrence of DTZ overflow; I guess those few cases are in 4+3.

Hint: most of the calculation work runs quite fast, and CAS atomics are used during intermediate memory access; more precisely, "lock cmpxchg8b" on random unaligned memory addresses. This is by design and I don't think it can be avoided; I have already done enough fiddling.

This hits the memory bandwidth limit at about 60-90 threads on Xeon V4 and Skylake-SP processors. With more threads it backfires: on the latter there is up to a 10x slowdown affecting all threads, while on the former only the extra threads show high memory access latency. So the generator does not benefit from having more threads.

Thread binding does not seem to help, and I guess a hard cap of 64 threads should be there as a reminder to everyone else not to waste more time on this matter.
Was that on bare metal or on the VM? I was going to suggest you request a special build from VMware with the core limit increased.
Nordlandia
Posts: 2821
Joined: Fri Sep 25, 2015 9:38 pm
Location: Sortland, Norway

Re: 7-man Syzygy attempt.

Post by Nordlandia »

Dann Corbit wrote:
Nordlandia wrote:As of now, it looks highly likely that the WDL tables will be roughly 10 TB in size. For simple adjudication only WDL is needed. 10 TB HDDs are fairly affordable.
There is a commercial 100 TB SSD drive.
I guess in 5 years a 100 TB SSD drive will be less than $1000.
Anyone who wants will be able to afford storage for the full 7 man files.
The 7-man files are around the corner. Storing them on SSDs is extremely expensive. Alternatively, using HDDs for adjudication in cutechess is less expensive. In the long run, as you imply, more and more people will be able to afford them for raw analysis. HDDs simply impose a severe bottleneck, so their only practical use is adjudication.
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: 7-man Syzygy attempt.

Post by noobpwnftw »

Jesse Gersenson wrote: Was that on bare metal or on the VM? I was going to suggest you request a special build from VMware with the core limit increased.
On bare metal.

In a VM the problem is still there but less obvious, because fewer threads are running and because of the way the hypervisor schedules things. I also tested it on an evaluation version of the latest Hyper-V VM, which supports a maximum of 240 cores, and it backfired similarly. So it would probably be the same if I had that special build.

The whole picture is: on older platforms, if you do that, it just makes your extra threads run slower; on the latest platform it affects your entire system, and the total combined performance is way lower. This is a disturbing fact that I find hard to believe.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: 7-man Syzygy attempt.

Post by syzygy »

noobpwnftw wrote:Hint: most of the calculation work runs quite fast, and CAS atomics are used during intermediate memory access; more precisely, "lock cmpxchg8b" on random unaligned memory addresses. This is by design and I don't think it can be avoided; I have already done enough fiddling.
Yes, the locked instructions are necessary to guarantee correctness.

I would expect there to be hardly any contention even with very many threads when generating a 7-piece table, so inserting pause instructions shouldn't make a difference. Without contention, locked instructions don't generate much overhead on modern CPUs. But this might be different on NUMA machines, in particular if the accessed memory location is a remote location. (Even without the lock prefix things might be slowing down though if the NUMA interconnect starts to run out of bandwidth.)
noobpwnftw wrote:This hits the memory bandwidth limit at about 60-90 threads on Xeon V4 and Skylake-SP processors. With more threads it backfires: on the latter there is up to a 10x slowdown affecting all threads, while on the former only the extra threads show high memory access latency. So the generator does not benefit from having more threads.
So "Skylake Scalable Performance" is not so scalable?
noobpwnftw wrote:Thread binding does not seem to help, and I guess a hard cap of 64 threads should be there as a reminder to everyone else not to waste more time on this matter.
Optimising NUMA memory allocation together with thread binding might still help a bit. I will have a look at it.
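
As a rough illustration of what pairing NUMA allocation with thread binding could look like, the sketch below only computes a plan: which contiguous slice of the table each thread would own and which NUMA node that thread and slice would live on. The names `struct slice` and `numa_slice` are hypothetical, and the actual binding and node-local allocation (on Linux, for instance, via `pthread_setaffinity_np` and libnuma's `numa_alloc_onnode`) is deliberately left out. Whether this helps at all depends on how many of the generator's accesses stay inside a thread's own slice, which is an open question here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plan entry: thread `tid` owns entries [begin, end) and is
   meant to run on, and have its slice allocated on, NUMA node `node`. */
struct slice {
    uint64_t begin;
    uint64_t end;
    int node;
};

/* Split `entries` table entries as evenly as possible over `threads`
   threads, and spread the threads evenly over `nodes` NUMA nodes. */
static struct slice numa_slice(uint64_t entries, int threads, int nodes, int tid)
{
    assert(threads > 0 && nodes > 0 && nodes <= threads);
    assert(tid >= 0 && tid < threads);

    uint64_t base = entries / (uint64_t)threads;   /* minimum slice size    */
    uint64_t extra = entries % (uint64_t)threads;  /* first `extra` get +1  */
    uint64_t t = (uint64_t)tid;

    struct slice s;
    s.begin = base * t + (t < extra ? t : extra);
    s.end = s.begin + base + (t < extra ? 1 : 0);
    s.node = tid * nodes / threads;  /* consecutive threads share a node */
    return s;
}
```

With 10 entries, 4 threads and 2 nodes, threads 0 and 1 would cover entries 0-5 on node 0 and threads 2 and 3 would cover entries 6-9 on node 1.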
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: 7-man Syzygy attempt.

Post by syzygy »

Btw, are there differences in scalability between the generator, the permutator and the compressor? It might make sense to use different numbers of threads for these parts...
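
If the three parts do turn out to scale differently, a per-stage thread count could be expressed along these lines. This is only a sketch: `struct thread_plan`, `make_thread_plan` and `clamp_threads` are hypothetical names, the 64-thread cap on the generator is the figure suggested earlier in the thread, and letting the permutator and compressor use every hardware thread is a placeholder assumption until someone measures them.

```c
#include <assert.h>

/* The three parts named in the thread. */
enum stage { STAGE_GENERATE, STAGE_PERMUTE, STAGE_COMPRESS, STAGE_COUNT };

/* Cap for the generator, taken from the suggestion earlier in the thread. */
enum { GENERATOR_THREAD_CAP = 64 };

struct thread_plan {
    int threads[STAGE_COUNT];
};

/* Keep a requested thread count within [1, cap]. */
static int clamp_threads(int requested, int cap)
{
    if (requested < 1)
        return 1;
    return requested > cap ? cap : requested;
}

/* Build a plan for a machine with `hw_threads` hardware threads. */
static struct thread_plan make_thread_plan(int hw_threads)
{
    assert(hw_threads >= 1);
    struct thread_plan plan;
    plan.threads[STAGE_GENERATE] = clamp_threads(hw_threads, GENERATOR_THREAD_CAP);
    /* Placeholder assumption: the other stages are not capped until measured. */
    plan.threads[STAGE_PERMUTE] = hw_threads;
    plan.threads[STAGE_COMPRESS] = hw_threads;
    return plan;
}
```

On a 96-thread machine this would run the generator with 64 threads and the other two stages with 96.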