ABDADA speedup results

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: ABDADA speedup results

Post by Daniel Shawul »

Yes, that is how I have it, and it seems to work without problems so far. I have some rare crashes that I haven't fixed, but hopefully they will be gone when I rewrite it with processes.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: ABDADA speedup results

Post by Daniel Shawul »

Laszlo Gaspar wrote:Hi Daniel,

I used threads as it was easy in Java. Nowadays I am thinking about a cluster-aware version too, but there is still nothing concrete. Thank you for pointing out the LMR problem; I had missed it in my design :)!
Now that you have ABDADA you could try out the lazy SMP idea by Daniel Homan! I first got an improvement but later tests didn't confirm it, and I am curious...
I had some discussions with him and also did tests some time ago, but I couldn't get more than a 1.25x speedup or so. The problem is that most of the work is redundant and work sharing is done only at the root in an even/odd manner. ABDADA does work sharing at internal nodes too, so it scales well. Without that, your only hopes are speculative parallel search (sharing work before the first move is even fully searched) and starting two separate searches at two different depths. I tested the first one and it didn't help, but in any case I very much doubt either would come anywhere close to ABDADA or YBW with regard to searching the minimal tree.
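For reference, here is a minimal sketch of the ABDADA idea I mean: each hash entry keeps a count of threads currently inside the node, and the move loop runs in two passes, deferring moves that some other thread is already searching. This is illustrative only, not Scorpio's or Nebiyu's actual code; Position, Move, the move helpers and the two TT hooks are just placeholder names for whatever the engine already provides.

Code: Select all

// Illustrative ABDADA sketch only -- not an engine's actual code.
#include <vector>
#include <algorithm>

constexpr int INF           = 30000;
constexpr int ON_EVALUATION = INF + 1;   // sentinel: another thread is inside this node

struct Position;
struct Move;
std::vector<Move> generateMoves(Position&);
void makeMove(Position&, const Move&);
void unmakeMove(Position&, const Move&);
int  evaluate(Position&);

// Shared-TT hooks (placeholders): the start-probe bumps a per-entry counter of
// threads currently searching the node and, when 'exclusive' is set and that
// counter is already non-zero, reports ON_EVALUATION through ttScore.
// The finish-store writes the result and decrements the counter again.
bool ttProbeStart(Position&, int depth, int alpha, int beta, bool exclusive, int& ttScore);
void ttStoreFinish(Position&, int depth, int score);

int abdada(Position& pos, int alpha, int beta, int depth, bool exclusive) {
    if (depth <= 0) return evaluate(pos);

    int ttScore;
    if (ttProbeStart(pos, depth, alpha, beta, exclusive, ttScore))
        return ttScore;                 // hash cutoff, or ON_EVALUATION when deferred

    std::vector<Move> moves = generateMoves(pos);
    std::vector<Move> deferred;
    int best = -INF;

    // Pass 1: the first move is always searched; the rest are searched
    // "exclusively" and get deferred if another thread is already on them.
    for (std::size_t i = 0; i < moves.size() && best < beta; ++i) {
        makeMove(pos, moves[i]);
        int score = -abdada(pos, -beta, -std::max(alpha, best), depth - 1, i > 0);
        unmakeMove(pos, moves[i]);
        if (score == -ON_EVALUATION) { deferred.push_back(moves[i]); continue; }
        best = std::max(best, score);
    }

    // Pass 2: pick up the deferred moves, this time without exclusivity.
    for (std::size_t i = 0; i < deferred.size() && best < beta; ++i) {
        makeMove(pos, deferred[i]);
        int score = -abdada(pos, -beta, -std::max(alpha, best), depth - 1, false);
        unmakeMove(pos, deferred[i]);
        best = std::max(best, score);
    }

    ttStoreFinish(pos, depth, best);
    return best;
}
The key point is the second pass: by the time a deferred move is retried, the other thread has usually already stored its result in the TT, so little work is wasted.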
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: ABDADA speedup results

Post by Daniel Shawul »

I just finished porting ABDADA to Scorpio, and it seems the speedup dropped to 1.5x on two processors and about 2.2x on 4 processors. That is a massive drop for the 4-processor case. I have uploaded the source code at the github link below if anyone is interested. I did the porting today, so there may be bugs. Anyway, I will keep working on it to see if it can be improved.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: ABDADA speedup results

Post by Michel »

Well, a speedup of 1.5 is 40 Elo, which is what other people have reported.
It is BTW also the Elo increase of Toga per processor doubling. Toga uses a shared-hash algorithm ("lazy SMP").
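(For the record: assuming the commonly quoted figure of roughly 70 Elo per effective doubling of speed, a 1.5x speedup works out to about log2(1.5) * 70 ≈ 0.58 * 70 ≈ 41 Elo, so the 40 Elo reports are consistent.)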

Note 1.5^2 = 2.25, which is approximately 2.2. So this is not unexpected.

To judge ABDADA versus YBW one would have to understand the price of not communicating beta cutoffs among the threads.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: ABDADA speedup results

Post by Daniel Shawul »

Michel wrote:Well, a speedup of 1.5 is 40 Elo, which is what other people have reported.
It is BTW also the Elo increase of Toga per processor doubling. Toga uses a shared-hash algorithm ("lazy SMP").
That really surprises me though. If I turn off exclusiveP (set it to false) to make it similar to Toga's scheme, my speedup clearly falls to 1.1x. Especially for Nebiyu it falls from 1.7x to 1.1x, so the work sharing pays off. My guess is that in complex engines the move ordering between the searches of two threads is very different, so the odd/even move sharing may not help as much as it does in a simplistic engine with very predictable ordering. I am just trying to explain why Nebiyu benefits much more than Scorpio, assuming I haven't introduced a bug.
Note 1.5^2 = 2.25, which is approximately 2.2. So this is not unexpected.
Ok, that makes sense. I still haven't finished testing, so I will try runs at bigger depths, d=22 or so, and produce definitive numbers.
To judge ABDADA versus YBW one would have to understand the price of not communicating beta cutoffs among the threads.
Maybe, but that is impossible to do since one thread wouldn't know the state of the other. It is hard to analyze, but for the work sharing to be effective both threads should more or less follow the same path, helping each other out.
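To make it concrete, with exclusiveP off the deferral logic disappears, so (ignoring my per-iteration synchronization) the scheme reduces to every thread running the same search and communicating only through the shared hash table, roughly like the sketch below. Again illustrative only; cloneRootPosition and the abdada function from the earlier sketch are placeholders.

Code: Select all

// Illustrative only: the degenerate "shared hash table" case.  With
// exclusiveP off nothing is ever deferred, so the threads just run
// identical searches that communicate through the shared TT alone.
#include <thread>
#include <vector>

struct Position { /* per-thread copy of the game state (placeholder) */ };
Position cloneRootPosition();                                          // placeholder
int abdada(Position&, int alpha, int beta, int depth, bool exclusive); // sketch above

void searchRootSHT(int depth, int nThreads) {
    std::vector<std::thread> pool;
    for (int t = 0; t < nThreads; ++t)
        pool.emplace_back([depth] {
            Position pos = cloneRootPosition();
            // exclusive == false everywhere: no deferrals, pure shared-TT search
            abdada(pos, -30000, 30000, depth, false);
        });
    for (auto& th : pool) th.join();
}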
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

YBW, ABDADA and SHT comparison

Post by Daniel Shawul »

I have now fixed some serious bugs, and Scorpio is also getting speedups as high as 1.7x or more from ABDADA. And to quench my curiosity about how the Toga scheme compares against ABDADA, I added an option to turn off exclusiveP (set it to false). That is all that is needed and the threads then work in parallel. This may not be exactly the same as Toga's scheme, since my threads synchronize at each iteration depth, but the answer I got is a resounding win for coordinating through the TT like ABDADA does. The speedup falls from 1.75x to 1.1x, and the overhead almost doubles. The log file that took 1.2 hours to run can be seen here. A summary of the averaged results for the three options I have now, i.e. YBW, ABDADA, and SHT:

Code: Select all

YBW: 1.123	2.070	1.993
ABD: 1.265	1.748	1.915
SHT: 1.997	1.161	1.893
As you can see, the overhead rises dramatically from 1.26 to 1.99, slowing the search down. The synchronization should not hurt a lot, because when one thread finishes the iteration the other should just probe the TT and finish quickly. But to satisfy my curiosity I will strictly follow Toga's way and see what I get. I used a fixed depth of 20 for the run, but I feel that at deeper depths YBW becomes harder to catch. I started a depth=24 run, but it failed after 5 hours.
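For reference, the speedup and overhead numbers are just ratios between the one-thread and two-thread runs: roughly, time-to-depth for the speedup and total nodes searched for the overhead. A tiny illustration with made-up numbers (not taken from the actual logs):

Code: Select all

// Illustrative only -- hypothetical numbers, not from the log files.
#include <cstdio>

struct RunStats {
    double    seconds;   // wall time to reach the fixed depth
    long long nodes;     // total nodes searched by all threads
};

int main() {
    RunStats serial   = {100.0, 50000000LL};   // hypothetical 1-thread run
    RunStats parallel = { 57.0, 63000000LL};   // hypothetical 2-thread run

    double speedup  = serial.seconds / parallel.seconds;             // time-to-depth ratio
    double overhead = double(parallel.nodes) / double(serial.nodes); // extra nodes searched

    std::printf("speedup %.2fx  overhead %.2fx\n", speedup, overhead);
    return 0;
}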

Daniel
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: YBW, ABDADA and SHT comparison

Post by Daniel Shawul »

I completed a depth=23 run in 6 hours. Here is a brief summary of the previous and new results.

LogFiles:

LogDepth20
LogDepth23

Depth=20

Code: Select all

YBW: 1.123   2.070   1.993 
ABD: 1.265   1.748   1.915 
SHT: 1.997   1.161   1.893 
Depth=23

Code: Select all

YBW: 1.070	2.164	1.963 
ABD: 1.393	1.708	1.920 
SHT: 2.005	1.111	1.982 
The conclusions I am drawing from these are
  • The SHT scheme is very bad, at least for Scorpio (speedup of 1.11x). I don't know how others get a benefit from it without ABDADA-like sharing ... Maybe root splitting at the top, or searching at different depths, can help, but I think it is clear that it will not give results close to the others.
  • ABDADA is really good, at least on 2 processors. The authors mentioned that ABDADA beats YBW for chess, while it loses the comparison for Reversi, which has poorer move ordering. Here it did not beat YBW, but it did not disappoint at all. There are still improvements to be made, because right now I do not allow work sharing at the root. That is just for convenience, so that I do not have to exit the iteration when a fail high occurs.
There are other primitive parallelization schemes, such as unsynchronized ID on parts of the move list, or searching different sections of the aspiration window, but I don't think these are worth my time. Let me know if there are other easy parallelization schemes I forgot.

Daniel