? about lazy SMP

Chessnut1071 · Post by **Chessnut1071** » Mon Sep 05, 2022 6:31 am

Assume you have a very strong evaluation function that is about 70% correct finding the best move.

?: why do you need lazy SMP? Even if you are 50% correct, isn't that better than lazy SMP?

?: Does anybody have data on the evaluation success rate and SMP; otherwise, I have a lot of work ahead.

thx in advance

expositor · Post by **expositor** » Mon Sep 05, 2022 7:00 am

I'm not sure I understand the question; you can have both a strong evaluation function and use lazy SMP.

Could you explain what you currently think "lazy SMP" means and why you think lazy SMP may not be useful?

hgm · Post by **hgm** » Mon Sep 05, 2022 9:04 am

You don't seem to understand the effect of SMP. It is just a method to speed up a program, by dividing up the work over several CPUs, which would otherwise be idle. Ideally using 8 CPUs instead of 1 would give you the same result 8 times faster.

Rebel · Post by **Rebel** » Mon Sep 05, 2022 9:06 am

Chessnut1071 wrote: ↑Mon Sep 05, 2022 6:31 am Assume you have a very strong evaluation function that is about 70% correct finding the best move.

?: why do you need lazy SMP? Even if you are 50% correct, isn't that better than lazy SMP?

?: Does anybody have data on the evaluation success rate and SMP; otherwise, I have a lot of work ahead.

thx in advance

I guess you mean lazy-eval?

Your question makes more sense then?

Chessnut1071 · Post by **Chessnut1071** » Mon Sep 05, 2022 2:29 pm

hgm wrote: ↑Mon Sep 05, 2022 9:04 am You don't seem to understand the effect of SMP. It is just a method to speed up a program, by dividing up the work over several CPUs, which would otherwise be idle. Ideally using 8 CPUs instead of 1 would give you the same result 8 times faster.

Roger that; however, I have only one CPU but many threads. So, dividing up the work between the threads is all I have to work with. The idea behind the lazy SMP is to set off the search following slightly different paths and have the fastest one finish ahead of the others. I agree most of the time this is very effective at speeding up the search; however, my objective is finding the fastest checkmate in a given position. The FEN below is an example:

"1n3r2/3k2pp/pp1P4/1p4b1/1q3B2/5Q2/PPP2PP1/R4RK1 w - - 0 1"; // Chess.com forum May 3, 2010

Using alpha/beta and a simple Zobrist hash [which doesn't overlap any more: thx] there are well over 80 billion nodes to search. With average luck it's at least 40 billion. I have a 12-bit evaluation function for 1st move, last move and middle move for white and one eval for black. I have a very fat 64-bit ulong which holds 17 positional variables and one 12-bit eval. The eval includes 20 optimized parameters for each of the white and black components. Without lazy SMP my engine finds the solution in under 7 seconds, looking at 11,502,081 nodes: 9,936,168 white and 1,565,913 black. Perhaps it's because I have only one CPU, but lazy SMP more than doubles the time in this circumstance. I'm using Emil Ostensen's suggestion for implementation of lazy SMP. Perhaps my code is not bug free. If so, can anybody find the solution to the FEN above examining fewer nodes than the 11,502,081 above with a better implementation of lazy SMP? Since my engine is slow, nodes are a better comparison than time.
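For readers who have not met the Zobrist hashing mentioned above, here is a minimal Python sketch. It is not the poster's code: the board representation (a dict from square index to piece letter), the names `ZOBRIST`, `SIDE_TO_MOVE`, `full_hash` and `update_hash`, and the fixed seed are all assumptions made for illustration, and castling rights and en passant are left out.

```python
import random

PIECES = "PNBRQKpnbrqk"
rng = random.Random(2022)  # fixed seed so the keys are reproducible (illustrative choice)

# One random 64-bit key per (piece, square) pair, plus one key for the side to move.
ZOBRIST = {(p, sq): rng.getrandbits(64) for p in PIECES for sq in range(64)}
SIDE_TO_MOVE = rng.getrandbits(64)

def full_hash(board, white_to_move):
    """Hash a position from scratch. board maps square index (0..63) to a piece letter."""
    h = 0
    for sq, piece in board.items():
        h ^= ZOBRIST[(piece, sq)]
    if not white_to_move:
        h ^= SIDE_TO_MOVE
    return h

def update_hash(h, piece, from_sq, to_sq, captured=None):
    """Incrementally update the hash for a simple move (no castling, promotion or en passant)."""
    h ^= ZOBRIST[(piece, from_sq)]        # remove the piece from its old square
    if captured is not None:
        h ^= ZOBRIST[(captured, to_sq)]   # remove the captured piece
    h ^= ZOBRIST[(piece, to_sq)]          # place the piece on its new square
    h ^= SIDE_TO_MOVE                     # flip the side to move
    return h
```

Because XOR is its own inverse, the incremental update should always agree with a hash recomputed from scratch, which makes a handy self-check when hunting hash bugs.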

hgm · Post by **hgm** » Mon Sep 05, 2022 3:06 pm

If you have only a single CPU there is nothing to gain. And in fact a lot to lose, because the speedup that can be reached in chess is not perfect, as some of the CPUs over which you divide the work will partly duplicate each other's effort. If you use more threads than CPUs that extra work all has to be performed by the same CPU, which was running at its maximum utilization anyway.

One caveat: CPUs with Intel architecture often have a feature called Hyper Threading, which effectively splits the CPU into two equal parts, each with half the capacity. For tasks that leave the CPU idle a large fraction of the time waiting for instructions with long latency, this can lead to better CPU utilization. Some chess programs benefit from turning HT on.
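One practical consequence of the point above is to never start more search threads than the machine has logical CPUs. A minimal Python sketch, where the helper name `usable_threads` and its parameters are hypothetical:

```python
import os

def usable_threads(requested, logical_cpus=None):
    """Clamp a requested search-thread count to the logical CPUs available."""
    if logical_cpus is None:
        # os.cpu_count() counts logical CPUs (hyper-threads included) and may return None.
        logical_cpus = os.cpu_count() or 1
    return max(1, min(requested, logical_cpus))

# On a machine with one logical CPU, a request for 8 threads collapses to 1.
print(usable_threads(8, logical_cpus=1))
```

Whether the hyper-threads counted here actually help is engine-specific, so the result is best treated as an upper bound rather than a recommendation.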

Chessnut1071 · Post by **Chessnut1071** » Mon Sep 05, 2022 4:00 pm

hgm wrote: ↑Mon Sep 05, 2022 3:06 pm If you have only a single CPU there is nothing to gain. And in fact a lot to lose, because the speedup that can be reached in chess is not perfect, as some of the CPUs over which you divide the work will partly duplicate each other's effort. If you use more threads than CPUs that extra work all has to be performed by the same CPU, which was running at its maximum utilization anyway.

One caveat: CPUs with Intel architecture often have a feature called Hyper Threading, which effectively splits the CPU into two equal parts, each with half the capacity. For tasks that leave the CPU idle a large fraction of the time waiting for instructions with long latency, this can lead to better CPU utilization. Some chess programs benefit from turning HT on.

I totally agree, it makes no sense to me why lazy SMP works with only one CPU; however, Emil Ostensen, "A Complete Chess Engine Parallelized Using Lazy SMP", shows [pp. 67-78] a significant reduction in search time using just threads! Apparently, others found similar reductions with one CPU and many threads. Ostensen listed a table of optimized thread-to-depth ratios for best results [p. 68]. Perhaps lazy SMP is better applied to Elo than my checkmate objective. If there is a way to benefit from one CPU and many threads I would like to know how they did it with an example. thx for the reply.

chrisw · Post by **chrisw** » Mon Sep 05, 2022 5:39 pm

Chessnut1071 wrote: ↑Mon Sep 05, 2022 4:00 pm
hgm wrote: ↑Mon Sep 05, 2022 3:06 pm If you have only a single CPU there is nothing to gain. And in fact a lot to lose, because the speedup that can be reached in chess is not perfect, as some of the CPUs over which you divide the work will partly duplicate each other's effort. If you use more threads than CPUs that extra work all has to be performed by the same CPU, which was running at its maximum utilization anyway.

One caveat: CPUs with Intel architecture often have a feature called Hyper Threading, which effectively splits the CPU into two equal parts, each with half the capacity. For tasks that leave the CPU idle a large fraction of the time waiting for instructions with long latency, this can lead to better CPU utilization. Some chess programs benefit from turning HT on.

I totally agree, it makes no sense to me why lazy SMP works with only one CPU; however, Emil Ostensen, "A Complete Chess Engine Parallelized Using Lazy SMP", shows [pp. 67-78] a significant reduction in search time using just threads! Apparently, others found similar reductions with one CPU and many threads. Ostensen listed a table of optimized thread-to-depth ratios for best results [p. 68]. Perhaps lazy SMP is better applied to Elo than my checkmate objective. If there is a way to benefit from one CPU and many threads I would like to know how they did it with an example. thx for the reply.

It probably helps to read more thoroughly, especially the results and testing conditions.
1. The PC used for testing has 4 cores.
2. The author compared time to depth on a set of test positions with his engine set to 1, 2 and 4 cores (threads).
3. He reported the 4-core engine running at 2x the speed of the 1-core engine, with 2 cores somewhere in between.

All being in accord with engine programmer crowd knowledge since whenever.

The author did NOT report that setting threads to, say, 4 on a one core CPU gave an advantage. Unsurprisingly, since he didn’t test for that, since his PC had 4 cores.
He did report that 4 threads on a 4 core CPU gave better results than 1 thread on a 4 core CPU.
Since your CPU has 1 core only, you’re currently unable to repeat his test setup.

JVMerlino · Post by **JVMerlino** » Mon Sep 05, 2022 7:12 pm

Chessnut1071 wrote: ↑Mon Sep 05, 2022 2:29 pm [d] 1n3r2/3k2pp/pp1P4/1p4b1/1q3B2/5Q2/PPP2PP1/R4RK1 w - - 0 1

Without lazy SMP my engine finds the solution in under 7 seconds, looking at 11,502,081 nodes: 9,936,168 white and 1,565,913 black. If so, can anybody find the solution to the FEN above examining fewer nodes than the 11,502,081 above with a better implementation of lazy SMP? Since my engine is slow, nodes are a better comparison than time.

With only one core, Myrddin finds Mate in 7 at depth 7 in 0.06 seconds after searching only 76,414 nodes. Myrddin's Lazy SMP implementation just launches extra processes with slightly different search depth parameters so that the primary process can find the best move more quickly, but searching MANY more nodes, of which an embarrassingly large percentage are redundant but hopefully found in the hash table.

So in this case, with 4 cores, Myrddin finds Mate in 7 at depth 7 in 0.04 seconds (about 2/3 the time), but after searching 156,279 nodes (more than 2x nodes).
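The general shape of that approach can be sketched in a few lines of Python. This is a toy, not Myrddin's code: it uses threads rather than processes, a plain dict as the shared hash table, and a subtraction game (take 1 to 3 stones, taking the last stone wins) in place of chess. The names `negamax` and `lazy_smp` and the helper depth offsets are assumptions for illustration.

```python
import threading

def negamax(pile, depth, table):
    """Value for the side to move: +1 win, -1 loss, 0 unknown at the search horizon."""
    if pile == 0:
        return -1                      # the opponent just took the last stone
    if depth == 0:
        return 0
    entry = table.get(pile)
    if entry is not None and entry[0] >= depth:
        return entry[1]                # reuse a result stored by any thread
    best = max(-negamax(pile - take, depth - 1, table)
               for take in (1, 2, 3) if take <= pile)
    table[pile] = (depth, best)        # share this result with the other threads
    return best

def lazy_smp(pile, depth, helpers=3):
    """Main search at `depth`; helpers search slightly deeper into the same table."""
    table = {}
    threads = [threading.Thread(target=negamax,
                                args=(pile, depth + 1 + i % 2, table))
               for i in range(helpers)]
    for t in threads:
        t.start()
    result = negamax(pile, depth, table)
    for t in threads:
        t.join()                       # a real engine would signal helpers to stop instead
    return result

print(lazy_smp(10, 10))  # side to move wins: take 2 and leave a multiple of 4
```

The helpers never report a result of their own; their only contribution is filling the shared table, which is why much of their work is redundant and why the scheme only pays off when the extra threads run on otherwise idle cores.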

Chessnut1071 · Post by **Chessnut1071** » Mon Sep 05, 2022 8:43 pm

JVMerlino wrote: ↑Mon Sep 05, 2022 7:12 pm
Chessnut1071 wrote: ↑Mon Sep 05, 2022 2:29 pm [d] 1n3r2/3k2pp/pp1P4/1p4b1/1q3B2/5Q2/PPP2PP1/R4RK1 w - - 0 1

Without lazy SMP my engine finds the solution in under 7 seconds, looking at 11,502,081 nodes: 9,936,168 white and 1,565,913 black. If so, can anybody find the solution to the FEN above examining fewer nodes than the 11,502,081 above with a better implementation of lazy SMP? Since my engine is slow, nodes are a better comparison than time.

With only one core, Myrddin finds Mate in 7 at depth 7 in 0.06 seconds after searching only 76,414 nodes. Myrddin's Lazy SMP implementation just launches extra processes with slightly different search depth parameters so that the primary process can find the best move more quickly, but searching MANY more nodes, of which an embarrassingly large percentage are redundant but hopefully found in the hash table.

So in this case, with 4 cores, Myrddin finds Mate in 7 at depth 7 in 0.04 seconds (about 2/3 the time), but after searching 156,279 nodes (more than 2x nodes).

Any info on what Myrddin was running on, language it used, and what algorithm it used to achieve those results? I notice that even with 4 cores there's not much improvement because of the redundancy.
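To put numbers on that redundancy, the figures quoted above can be run through a small Python helper. The function name `compare_runs` and its fields are hypothetical; the inputs are the times and node counts reported for Myrddin.

```python
def compare_runs(base_time, base_nodes, smp_time, smp_nodes, cores):
    """Compare a single-core run with a multi-core run of the same search."""
    speedup = base_time / smp_time
    return {
        "speedup": speedup,                    # how much faster the multi-core run finished
        "efficiency": speedup / cores,         # fraction of the ideal linear speedup
        "node_ratio": smp_nodes / base_nodes,  # total work relative to one core
    }

# 1 core: 0.06 s and 76,414 nodes; 4 cores: 0.04 s and 156,279 nodes.
stats = compare_runs(0.06, 76_414, 0.04, 156_279, cores=4)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the speedup is about 1.5x on 4 cores, so a little under 40% of the ideal, while the total node count roughly doubles.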
