ABDADA speedup results

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: ABDADA speedup results

Post by Daniel Shawul »

Yes, that is how I have it, and it seems to work without problems so far. I have some rare crashes that I haven't fixed, but hopefully they will be gone when I rewrite it with processes.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: ABDADA speedup results

Post by Daniel Shawul »

Laszlo Gaspar wrote:Hi Daniel,

I used threads as it was easy in Java. Nowadays I am thinking about a cluster-aware version too, but there is still nothing concrete. Thank you for pointing out the LMR problem; I had missed it in my design :)!
Now that you have ABDADA you could try out the lazy SMP idea by Daniel Homan! I first got an improvement but later tests didn't confirm it, and I am curious...
I had some discussions with him and also did tests some time ago, but I couldn't get more than a 1.25x speedup or so. The problem is that most of the work is redundant and work sharing is done only at the root in an even/odd manner. ABDADA does work sharing at internal nodes too, so it scales well. Without that, your only hopes are speculative parallel search (sharing work before the first move is even fully searched) and starting two separate searches at two different depths. I tested the first one and it didn't help, but in any case I very much doubt either would come anywhere close to ABDADA or YBW with regard to searching the minimal tree.
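For reference, here is a minimal sketch of the ABDADA idea I mean: each hash entry keeps a count of threads currently inside the node, and the move loop runs in two passes, deferring moves that some other thread is already searching. This is illustrative only, not Scorpio's or Nebiyu's actual code; Position, Move, the move helpers and the two TT hooks are just placeholder names for whatever the engine already provides.

Code: Select all

// Illustrative ABDADA sketch only -- not an engine's actual code.
#include <vector>
#include <algorithm>

constexpr int INF           = 30000;
constexpr int ON_EVALUATION = INF + 1;   // sentinel: another thread is inside this node

struct Position;
struct Move;
std::vector<Move> generateMoves(Position&);
void makeMove(Position&, const Move&);
void unmakeMove(Position&, const Move&);
int  evaluate(Position&);

// Shared-TT hooks (placeholders): the start-probe bumps a per-entry counter of
// threads currently searching the node and, when 'exclusive' is set and that
// counter is already non-zero, reports ON_EVALUATION through ttScore.
// The finish-store writes the result and decrements the counter again.
bool ttProbeStart(Position&, int depth, int alpha, int beta, bool exclusive, int& ttScore);
void ttStoreFinish(Position&, int depth, int score);

int abdada(Position& pos, int alpha, int beta, int depth, bool exclusive) {
    if (depth <= 0) return evaluate(pos);

    int ttScore;
    if (ttProbeStart(pos, depth, alpha, beta, exclusive, ttScore))
        return ttScore;                 // hash cutoff, or ON_EVALUATION when deferred

    std::vector<Move> moves = generateMoves(pos);
    std::vector<Move> deferred;
    int best = -INF;

    // Pass 1: the first move is always searched; the rest are searched
    // "exclusively" and get deferred if another thread is already on them.
    for (std::size_t i = 0; i < moves.size() && best < beta; ++i) {
        makeMove(pos, moves[i]);
        int score = -abdada(pos, -beta, -std::max(alpha, best), depth - 1, i > 0);
        unmakeMove(pos, moves[i]);
        if (score == -ON_EVALUATION) { deferred.push_back(moves[i]); continue; }
        best = std::max(best, score);
    }

    // Pass 2: pick up the deferred moves, this time without exclusivity.
    for (std::size_t i = 0; i < deferred.size() && best < beta; ++i) {
        makeMove(pos, deferred[i]);
        int score = -abdada(pos, -beta, -std::max(alpha, best), depth - 1, false);
        unmakeMove(pos, deferred[i]);
        best = std::max(best, score);
    }

    ttStoreFinish(pos, depth, best);
    return best;
}
The key point is the second pass: by the time a deferred move is retried, the other thread has usually already stored its result in the TT, so little work is wasted.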
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: ABDADA speedup results

Post by Daniel Shawul »

I just finished porting ABDADA to Scorpio, and it seems the speedup dropped to 1.5x on two processors and about 2.2x on 4 processors. That is a massive drop for the 4-processor case. I have uploaded the source code at the github link below if anyone is interested. I did the porting today, so there may be bugs. Anyway, I will keep working on it to see if it can be improved.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: ABDADA speedup results

Post by Michel »

Well, a speedup of 1.5 is 40 Elo, which is what other people have reported.
It is BTW also the Elo increase of Toga per processor doubling. Toga uses a shared-hash algorithm ("lazy SMP").
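(For the record: assuming the commonly quoted figure of roughly 70 Elo per effective doubling of speed, a 1.5x speedup works out to about log2(1.5) * 70 ≈ 0.58 * 70 ≈ 41 Elo, so the 40 Elo reports are consistent.)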

Note 1.5^2 = 2.25, which is approximately 2.2. So this is not unexpected.

To judge ABDADA versus YBW one would have to understand the price of not communicating beta cutoffs among the threads.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: ABDADA speedup results

Post by Daniel Shawul »

Michel wrote:Well, a speedup of 1.5 is 40 Elo, which is what other people have reported.
It is BTW also the Elo increase of Toga per processor doubling. Toga uses a shared-hash algorithm ("lazy SMP").
That really surprises me though. If I turn off exclusiveP (set it to false) to make it similar to Toga's scheme, my speedup clearly falls to 1.1x. Especially for Nebiyu it falls from 1.7x to 1.1x, so the work sharing pays off. My guess is that in complex engines the move ordering between the searches of two threads is very different, so the odd/even move sharing may not help as much as it does in a simplistic engine with very predictable ordering. I am just trying to explain why Nebiyu benefits much more than Scorpio, assuming I haven't introduced a bug.
Note 1.5^2 = 2.25, which is approximately 2.2. So this is not unexpected.
Ok, that makes sense. I still haven't finished testing, so I will try runs at bigger depths, d=22 or so, and produce definitive numbers.
To judge ABDADA versus YBW one would have to understand the price of not communicating beta cutoffs among the threads.
Maybe, but that is impossible to do since one thread wouldn't know the state of the other. It is hard to analyze, but for the work sharing to be effective both threads should more or less follow the same path, helping each other out.
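To make it concrete, with exclusiveP off the deferral logic disappears, so (ignoring my per-iteration synchronization) the scheme reduces to every thread running the same search and communicating only through the shared hash table, roughly like the sketch below. Again illustrative only; cloneRootPosition and the abdada function from the earlier sketch are placeholders.

Code: Select all

// Illustrative only: the degenerate "shared hash table" case.  With
// exclusiveP off nothing is ever deferred, so the threads just run
// identical searches that communicate through the shared TT alone.
#include <thread>
#include <vector>

struct Position { /* per-thread copy of the game state (placeholder) */ };
Position cloneRootPosition();                                          // placeholder
int abdada(Position&, int alpha, int beta, int depth, bool exclusive); // sketch above

void searchRootSHT(int depth, int nThreads) {
    std::vector<std::thread> pool;
    for (int t = 0; t < nThreads; ++t)
        pool.emplace_back([depth] {
            Position pos = cloneRootPosition();
            // exclusive == false everywhere: no deferrals, pure shared-TT search
            abdada(pos, -30000, 30000, depth, false);
        });
    for (auto& th : pool) th.join();
}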
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

YBW, ABDADA and SHT comparison

Post by Daniel Shawul »

I have now fixed some serious bugs, and Scorpio is also getting speedups as high as 1.7x or more from ABDADA. And to quench my curiosity about how the Toga scheme compares against ABDADA, I added an option to turn off exclusiveP (set it to false). That is all that is needed and the threads then work in parallel. This may not be exactly the same as Toga's scheme, since my threads synchronize at each iteration depth, but the answer I got is a resounding win for coordinating through the TT like ABDADA does. The speedup falls from 1.75x to 1.1x, and the overhead almost doubles. The log file that took 1.2 hours to run can be seen here. A summary of the averaged results for the three options I have now, i.e. YBW, ABDADA, and SHT:

Code: Select all

YBW: 1.123	2.070	1.993
ABD: 1.265	1.748	1.915
SHT: 1.997	1.161	1.893
As you can see, the overhead rises dramatically from 1.26 to 1.99, slowing the search down. The synchronization should not hurt a lot, because when one thread finishes the iteration the other should just probe the TT and finish quickly. But to satisfy my curiosity I will strictly follow Toga's way and see what I get. I used a fixed depth of 20 for the run, but I feel that at deeper depths YBW becomes harder to catch. I started a depth=24 run, but it failed after 5 hours.
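For reference, the speedup and overhead numbers are just ratios between the one-thread and two-thread runs: roughly, time-to-depth for the speedup and total nodes searched for the overhead. A tiny illustration with made-up numbers (not taken from the actual logs):

Code: Select all

// Illustrative only -- hypothetical numbers, not from the log files.
#include <cstdio>

struct RunStats {
    double    seconds;   // wall time to reach the fixed depth
    long long nodes;     // total nodes searched by all threads
};

int main() {
    RunStats serial   = {100.0, 50000000LL};   // hypothetical 1-thread run
    RunStats parallel = { 57.0, 63000000LL};   // hypothetical 2-thread run

    double speedup  = serial.seconds / parallel.seconds;             // time-to-depth ratio
    double overhead = double(parallel.nodes) / double(serial.nodes); // extra nodes searched

    std::printf("speedup %.2fx  overhead %.2fx\n", speedup, overhead);
    return 0;
}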

Daniel
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: YBW, ABDADA and SHT comparison

Post by Daniel Shawul »

I completed a depth=23 run in 6 hours. Here is a brief summary of the previous and new results.

LogFiles:

LogDepth20
LogDepth23

Depth=20

Code: Select all

YBW: 1.123   2.070   1.993 
ABD: 1.265   1.748   1.915 
SHT: 1.997   1.161   1.893 
Depth=23

Code: Select all

YBW: 1.070	2.164	1.963 
ABD: 1.393	1.708	1.920 
SHT: 2.005	1.111	1.982 
The conclusions I am drawing from these are
  • The SHT scheme is very bad, at least for Scorpio (speedup of 1.11x). I don't know how others get a benefit from it without ABDADA-like sharing ... Maybe root splitting at the top, or searching at different depths, can help, but I think it is clear that it will not give results close to the others.
  • ABDADA is really good, at least on 2 processors. The authors mentioned that ABDADA beats YBW for chess, while it loses the comparison for Reversi, which has poorer move ordering. Here it did not beat YBW, but it did not disappoint at all. There are still improvements to be made, because right now I do not allow work sharing at the root. That is just for convenience, so that I do not have to exit the iteration when a fail high occurs.
There are other primitive parallelization schemes, such as unsynchronized ID on parts of the move list, or searching different sections of the aspiration window, but I don't think these are worth my time. Let me know if there are other easy parallelization schemes I forgot.

Daniel