SMP (new style)

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

SMP (new style)

Post by Rebel »

For Bob, not derailing his other SMP thread.

With so many cores available, why not include (add) a seond approach to it?

Say you have 16 cores, use 12 of them for regular SMP and 4 for a special search. I am currently experimenting with this (special) serach. On my quad I divide the available root moves over the 3 remaining cores and reward the best one.

Often you get some surprising results due to the A/B behavior where 1 or 2 of the other cores become a lot deeper and can pick up the best move much earlier. 2 examples, of course I am cherry picking.

[d] r1b1kb1r/1p1n1ppp/p2ppn2/6BB/2qNP3/2N5/PPP2PPP/R2Q1RK1 w kq -

Code: Select all

 Core-1 00:00:15 13 0.36 h5e2 (2.006.884)  all moves searched
 Core-2 00:00:20 15 0.26 d4e6 (1.681.782)
 Core-3 00:00:19 16 0.79 d4e6 (1.900.065)
 Core-4 00:00:09 13 0.42 h5e2 (1.581.967)
Best move 1.Nxe6! is picked up much earlier on core-3 (0:19) (depth 16) while the main search in core-1 is still at depth 13.

[D]rnb1kb1r/1p3ppp/p5q1/4p3/3N4/4BB2/PPPQ1P1P/R3K2R w KQkq -

Code: Select all

 Core-1 00:00:08 14 -0.10 d4b3 (2.179.111)   all moves searched
 Core-2 00:00:10 13 0.42 e1c1 (1.888.508)
 Core-3 00:00:06 14 -0.23 d2c3 (1.848.833)
 Core-4 00:00:05 14 -4.21 f3b7 (1.871.985)
Oddities like these also often happen.

The best move 1. 0-0-0 is found on core-2 at D=13 while the main search on core-1 is already at D=14 and missed 0-0-0. Another effect of the A/B.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: SMP (new style)

Post by Joost Buijs »

What is the difference with an SMP search that only splits at the root?
I don't see what is so special about it.
User avatar
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: SMP (new style)

Post by Rebel »

Like I said, the first core behaves as a normal SP program, all moves are searched. That's the difference.

The other 3 cores are fed with a couple of interesting moves, not too many.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: SMP (new style)

Post by bob »

Rebel wrote:For Bob, not derailing his other SMP thread.

With so many cores available, why not include (add) a seond approach to it?

Say you have 16 cores, use 12 of them for regular SMP and 4 for a special search. I am currently experimenting with this (special) serach. On my quad I divide the available root moves over the 3 remaining cores and reward the best one.

Often you get some surprising results due to the A/B behavior where 1 or 2 of the other cores become a lot deeper and can pick up the best move much earlier. 2 examples, of course I am cherry picking.

[d] r1b1kb1r/1p1n1ppp/p2ppn2/6BB/2qNP3/2N5/PPP2PPP/R2Q1RK1 w kq -

Code: Select all

 Core-1 00:00:15 13 0.36 h5e2 (2.006.884)  all moves searched
 Core-2 00:00:20 15 0.26 d4e6 (1.681.782)
 Core-3 00:00:19 16 0.79 d4e6 (1.900.065)
 Core-4 00:00:09 13 0.42 h5e2 (1.581.967)
Best move 1.Nxe6! is picked up much earlier on core-3 (0:19) (depth 16) while the main search in core-1 is still at depth 13.

[D]rnb1kb1r/1p3ppp/p5q1/4p3/3N4/4BB2/PPPQ1P1P/R3K2R w KQkq -

Code: Select all

 Core-1 00:00:08 14 -0.10 d4b3 (2.179.111)   all moves searched
 Core-2 00:00:10 13 0.42 e1c1 (1.888.508)
 Core-3 00:00:06 14 -0.23 d2c3 (1.848.833)
 Core-4 00:00:05 14 -4.21 f3b7 (1.871.985)
Oddities like these also often happen.

The best move 1. 0-0-0 is found on core-2 at D=13 while the main search on core-1 is already at D=14 and missed 0-0-0. Another effect of the A/B.
If I reach the point where parallel search is diminishing, that's an idea. In fact, it is an idea used by Jonathan Schaeffer in Sun Phoenix back in the late 80's...

I'm just not ready to "resign" and say that there is nothing more that can be squeezed from a straight parallel search.

The scenario you gave I see all the time. But you have to realize it is the exception rather than the rule. I have a couple of positions that when searched to the right depth produce HUGE speedups... For example 3800 seconds on one core to 50 seconds on 20. Because the best root move is never #1 in the list... But overall the average speedup for 20 cores seems to be about 13x, measuring things in the most unfavorable way possible...
User avatar
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: SMP (new style)

Post by Rebel »

bob wrote:
Rebel wrote:For Bob, not derailing his other SMP thread.

With so many cores available, why not include (add) a seond approach to it?

Say you have 16 cores, use 12 of them for regular SMP and 4 for a special search. I am currently experimenting with this (special) serach. On my quad I divide the available root moves over the 3 remaining cores and reward the best one.

Often you get some surprising results due to the A/B behavior where 1 or 2 of the other cores become a lot deeper and can pick up the best move much earlier. 2 examples, of course I am cherry picking.

[d] r1b1kb1r/1p1n1ppp/p2ppn2/6BB/2qNP3/2N5/PPP2PPP/R2Q1RK1 w kq -

Code: Select all

 Core-1 00:00:15 13 0.36 h5e2 (2.006.884)  all moves searched
 Core-2 00:00:20 15 0.26 d4e6 (1.681.782)
 Core-3 00:00:19 16 0.79 d4e6 (1.900.065)
 Core-4 00:00:09 13 0.42 h5e2 (1.581.967)
Best move 1.Nxe6! is picked up much earlier on core-3 (0:19) (depth 16) while the main search in core-1 is still at depth 13.

[D]rnb1kb1r/1p3ppp/p5q1/4p3/3N4/4BB2/PPPQ1P1P/R3K2R w KQkq -

Code: Select all

 Core-1 00:00:08 14 -0.10 d4b3 (2.179.111)   all moves searched
 Core-2 00:00:10 13 0.42 e1c1 (1.888.508)
 Core-3 00:00:06 14 -0.23 d2c3 (1.848.833)
 Core-4 00:00:05 14 -4.21 f3b7 (1.871.985)
Oddities like these also often happen.

The best move 1. 0-0-0 is found on core-2 at D=13 while the main search on core-1 is already at D=14 and missed 0-0-0. Another effect of the A/B.
If I reach the point where parallel search is diminishing, that's an idea.
Sure, that's the whole point, if extra cores don't bring much any longer, do something else with it.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: SMP (new style)

Post by bob »

Rebel wrote:
bob wrote:
Rebel wrote:For Bob, not derailing his other SMP thread.

With so many cores available, why not include (add) a seond approach to it?

Say you have 16 cores, use 12 of them for regular SMP and 4 for a special search. I am currently experimenting with this (special) serach. On my quad I divide the available root moves over the 3 remaining cores and reward the best one.

Often you get some surprising results due to the A/B behavior where 1 or 2 of the other cores become a lot deeper and can pick up the best move much earlier. 2 examples, of course I am cherry picking.

[d] r1b1kb1r/1p1n1ppp/p2ppn2/6BB/2qNP3/2N5/PPP2PPP/R2Q1RK1 w kq -

Code: Select all

 Core-1 00:00:15 13 0.36 h5e2 (2.006.884)  all moves searched
 Core-2 00:00:20 15 0.26 d4e6 (1.681.782)
 Core-3 00:00:19 16 0.79 d4e6 (1.900.065)
 Core-4 00:00:09 13 0.42 h5e2 (1.581.967)
Best move 1.Nxe6! is picked up much earlier on core-3 (0:19) (depth 16) while the main search in core-1 is still at depth 13.

[D]rnb1kb1r/1p3ppp/p5q1/4p3/3N4/4BB2/PPPQ1P1P/R3K2R w KQkq -

Code: Select all

 Core-1 00:00:08 14 -0.10 d4b3 (2.179.111)   all moves searched
 Core-2 00:00:10 13 0.42 e1c1 (1.888.508)
 Core-3 00:00:06 14 -0.23 d2c3 (1.848.833)
 Core-4 00:00:05 14 -4.21 f3b7 (1.871.985)
Oddities like these also often happen.

The best move 1. 0-0-0 is found on core-2 at D=13 while the main search on core-1 is already at D=14 and missed 0-0-0. Another effect of the A/B.
If I reach the point where parallel search is diminishing, that's an idea.
Sure, that's the whole point, if extra cores don't bring much any longer, do something else with it.
Unfortunately I am not there yet.

Schaeffer's biggest problem was deciding what to do when the two different searches didn't agree...
User avatar
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: SMP (new style)

Post by Rebel »

bob wrote:
Rebel wrote:
bob wrote:
Rebel wrote:For Bob, not derailing his other SMP thread.

With so many cores available, why not include (add) a seond approach to it?

Say you have 16 cores, use 12 of them for regular SMP and 4 for a special search. I am currently experimenting with this (special) serach. On my quad I divide the available root moves over the 3 remaining cores and reward the best one.

Often you get some surprising results due to the A/B behavior where 1 or 2 of the other cores become a lot deeper and can pick up the best move much earlier. 2 examples, of course I am cherry picking.

[d] r1b1kb1r/1p1n1ppp/p2ppn2/6BB/2qNP3/2N5/PPP2PPP/R2Q1RK1 w kq -

Code: Select all

 Core-1 00:00:15 13 0.36 h5e2 (2.006.884)  all moves searched
 Core-2 00:00:20 15 0.26 d4e6 (1.681.782)
 Core-3 00:00:19 16 0.79 d4e6 (1.900.065)
 Core-4 00:00:09 13 0.42 h5e2 (1.581.967)
Best move 1.Nxe6! is picked up much earlier on core-3 (0:19) (depth 16) while the main search in core-1 is still at depth 13.

[D]rnb1kb1r/1p3ppp/p5q1/4p3/3N4/4BB2/PPPQ1P1P/R3K2R w KQkq -

Code: Select all

 Core-1 00:00:08 14 -0.10 d4b3 (2.179.111)   all moves searched
 Core-2 00:00:10 13 0.42 e1c1 (1.888.508)
 Core-3 00:00:06 14 -0.23 d2c3 (1.848.833)
 Core-4 00:00:05 14 -4.21 f3b7 (1.871.985)
Oddities like these also often happen.

The best move 1. 0-0-0 is found on core-2 at D=13 while the main search on core-1 is already at D=14 and missed 0-0-0. Another effect of the A/B.
If I reach the point where parallel search is diminishing, that's an idea.
Sure, that's the whole point, if extra cores don't bring much any longer, do something else with it.
Unfortunately I am not there yet.

Schaeffer's biggest problem was deciding what to do when the two different searches didn't agree...
It's not so hard. In the first example the Nxe6 move is found at depth 16 (on core-3) while the main search (on core-1) is still at depth 13. Score is higher, depth is higher, take it.

Second example is more tricky, the main search is at D=14 while something better (0-0-0) is found at core-2 one ply less deep at D=13. And since main search at D13 didn't find 0-0-0 its safe to assune the move is good anyway. For the moment, for security I allow a difference of one ply less, not more. Besides A/B LMR at depth 1 also plays a role here, On the first core 0-0-0 gets a LMR reduction, in the second core it gets not because it's high in the move-list. Fail-highs are not the problem.

Remains a Fail_Low in the main serch at core-1. If something better is the remaing cores with equal or higher depths you can make use of that too.

It's still in an experimental phase but I noticed already it's crucial what type of moves you spread to the other cores. There must be some kind of intelligent division. I (for instance) tried the (simple system of) pawn/knight moves to core-2, bishop and rook moves to core-3, queen and king into core-4 and it produces nothing, seldom something better is found.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: SMP (new style)

Post by bob »

Rebel wrote:
bob wrote:
Rebel wrote:
bob wrote:
Rebel wrote:For Bob, not derailing his other SMP thread.

With so many cores available, why not include (add) a seond approach to it?

Say you have 16 cores, use 12 of them for regular SMP and 4 for a special search. I am currently experimenting with this (special) serach. On my quad I divide the available root moves over the 3 remaining cores and reward the best one.

Often you get some surprising results due to the A/B behavior where 1 or 2 of the other cores become a lot deeper and can pick up the best move much earlier. 2 examples, of course I am cherry picking.

[d] r1b1kb1r/1p1n1ppp/p2ppn2/6BB/2qNP3/2N5/PPP2PPP/R2Q1RK1 w kq -

Code: Select all

 Core-1 00:00:15 13 0.36 h5e2 (2.006.884)  all moves searched
 Core-2 00:00:20 15 0.26 d4e6 (1.681.782)
 Core-3 00:00:19 16 0.79 d4e6 (1.900.065)
 Core-4 00:00:09 13 0.42 h5e2 (1.581.967)
Best move 1.Nxe6! is picked up much earlier on core-3 (0:19) (depth 16) while the main search in core-1 is still at depth 13.

[D]rnb1kb1r/1p3ppp/p5q1/4p3/3N4/4BB2/PPPQ1P1P/R3K2R w KQkq -

Code: Select all

 Core-1 00:00:08 14 -0.10 d4b3 (2.179.111)   all moves searched
 Core-2 00:00:10 13 0.42 e1c1 (1.888.508)
 Core-3 00:00:06 14 -0.23 d2c3 (1.848.833)
 Core-4 00:00:05 14 -4.21 f3b7 (1.871.985)
Oddities like these also often happen.

The best move 1. 0-0-0 is found on core-2 at D=13 while the main search on core-1 is already at D=14 and missed 0-0-0. Another effect of the A/B.
If I reach the point where parallel search is diminishing, that's an idea.
Sure, that's the whole point, if extra cores don't bring much any longer, do something else with it.
Unfortunately I am not there yet.

Schaeffer's biggest problem was deciding what to do when the two different searches didn't agree...
It's not so hard. In the first example the Nxe6 move is found at depth 16 (on core-3) while the main search (on core-1) is still at depth 13. Score is higher, depth is higher, take it.

Second example is more tricky, the main search is at D=14 while something better (0-0-0) is found at core-2 one ply less deep at D=13. And since main search at D13 didn't find 0-0-0 its safe to assune the move is good anyway. For the moment, for security I allow a difference of one ply less, not more. Besides A/B LMR at depth 1 also plays a role here, On the first core 0-0-0 gets a LMR reduction, in the second core it gets not because it's high in the move-list. Fail-highs are not the problem.

Remains a Fail_Low in the main serch at core-1. If something better is the remaing cores with equal or higher depths you can make use of that too.

It's still in an experimental phase but I noticed already it's crucial what type of moves you spread to the other cores. There must be some kind of intelligent division. I (for instance) tried the (simple system of) pawn/knight moves to core-2, bishop and rook moves to core-3, queen and king into core-4 and it produces nothing, seldom something better is found.
It is not so simple. Suppose the score at depth 15, which is higher, is actually significantly lower than the depth 13 move, were it searched to depth 15. But the other case is also just as important. What happens if the depth 15 score is LOWER? Are you sure the depth 13 score won't drop by the time it gets to depth 15?

Jonathan pointed out that it is hard to compare two numbers when they don't come from equivalent domains. In this case the primary domain is depth.

I still think the right approach is to exclude this noise completely, by searching everything at a given ply in parallel, splitting and re-splitting until it is all done, you don't have to compare a depth 15 apple with a depth 13 orange...