### Re: Trying to improve lazy smp

Posted:

**Mon Apr 13, 2015 11:24 pm**I assume you are playing tens of thousands of games to get 4-5 elo accuracy???

Hosted by Your Move Chess & Games - chessusa.com

http://www.talkchess.com/forum3/

Page **3** of **6**

Posted: **Mon Apr 13, 2015 11:24 pm**

I assume you are playing tens of thousands of games to get 4-5 elo accuracy???

Posted: **Tue Apr 14, 2015 12:01 am**

I wonder how that strategy would compare with analyzing the root node, the node after making the move with the second highest score (stored in a list from the previous iteration), the third highest score, and the 4th highest score and so on, for a small number of threads.cdani wrote:Yes Well. I will try to improve.bob wrote: Make a note: "quick and dirty" is often "quick and buggy". Done that too many times, haven't done it in quite a while however, as I simply always "measure twice, cut once" nowadays. My dad had it right.

New best version at 4 threads: +122 elo (from +117)

Not published for the moment, I'm triying it at 8 threads.

Added to depths, in readable format, of the four threads: 0 (obviously), +1, +2, +3.

Yes, is the one I wanted to try first, but by prudence I arrived to it slowly.

So the strategy is:

0 (obviosly)

+1a

+1b

+1c

...

Perhaps another method would be a function of the differences in the scores of the siblings (throw more horsepower to the strongest nodes).

e.g.

0 (obviosly)

+1a gets 2 threads for +40 score

+1b gets 1 thread for + 15 score

+1c gets 1 thread for + 12 score

or even:

0 (obviosly)

+1a gets 1 threads for +40 score

+2a (1st child node gets 1 thread due to the +40 score)

+1b gets 1 thread for + 15 score

+1c gets 1 thread for + 12 score

Posted: **Tue Apr 14, 2015 2:32 am**

This entire idea sounds like a LOT of waiting and twiddling overhead to me.Dann Corbit wrote:I wonder how that strategy would compare with analyzing the root node, the node after making the move with the second highest score (stored in a list from the previous iteration), the third highest score, and the 4th highest score and so on, for a small number of threads.cdani wrote:Yes Well. I will try to improve.bob wrote: Make a note: "quick and dirty" is often "quick and buggy". Done that too many times, haven't done it in quite a while however, as I simply always "measure twice, cut once" nowadays. My dad had it right.

New best version at 4 threads: +122 elo (from +117)

Not published for the moment, I'm triying it at 8 threads.

Added to depths, in readable format, of the four threads: 0 (obviously), +1, +2, +3.

Yes, is the one I wanted to try first, but by prudence I arrived to it slowly.

So the strategy is:

0 (obviosly)

+1a

+1b

+1c

...

Perhaps another method would be a function of the differences in the scores of the siblings (throw more horsepower to the strongest nodes).

e.g.

0 (obviosly)

+1a gets 2 threads for +40 score

+1b gets 1 thread for + 15 score

+1c gets 1 thread for + 12 score

or even:

0 (obviosly)

+1a gets 1 threads for +40 score

+2a (1st child node gets 1 thread due to the +40 score)

+1b gets 1 thread for + 15 score

+1c gets 1 thread for + 12 score

Posted: **Tue Apr 14, 2015 8:11 am**

My objective is to try to gain elo by increasing step by step the added depth to the different threads. So when I achieve a new improvement with minimal confidence, instead of increasing the confidence of this point I try to obtain a new point with higher gains. When at some point I will not be able to obtain an improvement, then I can spend resources soundly to consolidate the last point. The previous points are always consolidated automatically with higher confidence when a next point is found, simply because you know that if with higher depths you have some confidence that it works, with lower depths it must work nearly sure.bob wrote:I assume you are playing tens of thousands of games to get 4-5 elo accuracy???

At 4 threads I try to obtain some relative sure results. At 8 threads or more, my objective is simply to be sure that at least is an improvement over 4, because of the limited resources.

In any case the objectives where to have a working multithreading engine that scales minimally, and also to try to prove this variation over lazy smp, and those things are already achieved over what I was hoping.

I will continue trying to improve this, also with slower time control, the next thing I will try to have minimal confidence that is not a regression.

Posted: **Wed Apr 15, 2015 2:05 am**

If you make a change that supposedly gains +8 elo as you reported, you have to have the error bar much lower than +/- 8 to have any confidence in that result. IE a BUNCH of games is needed. If you are trying to prove a +40 Elo change, that takes fewer games. But you specifically gave something like +8 which needs 20-30K games to measure that accurately. If the error bar is +/- 40, you really don't have any idea whether your change is good or bad if the potential gain is only a few Elo...cdani wrote:My objective is to try to gain elo by increasing step by step the added depth to the different threads. So when I achieve a new improvement with minimal confidence, instead of increasing the confidence of this point I try to obtain a new point with higher gains. When at some point I will not be able to obtain an improvement, then I can spend resources soundly to consolidate the last point. The previous points are always consolidated automatically with higher confidence when a next point is found, simply because you know that if with higher depths you have some confidence that it works, with lower depths it must work nearly sure.bob wrote:I assume you are playing tens of thousands of games to get 4-5 elo accuracy???

At 4 threads I try to obtain some relative sure results. At 8 threads or more, my objective is simply to be sure that at least is an improvement over 4, because of the limited resources.

In any case the objectives where to have a working multithreading engine that scales minimally, and also to try to prove this variation over lazy smp, and those things are already achieved over what I was hoping.

I will continue trying to improve this, also with slower time control, the next thing I will try to have minimal confidence that is not a regression.

Posted: **Wed Apr 15, 2015 2:08 am**

Well, in this particular case I think his only change was the addition of `+ (depth > 7)' at the end of the formula, but with 8 threads that condition evaluates to false every time, which means his change can't possibly be an 8-point improvement in Elo.bob wrote:If you make a change that supposedly gains +8 elo as you reported, you have to have the error bar much lower than +/- 8 to have any confidence in that result. IE a BUNCH of games is needed. If you are trying to prove a +40 Elo change, that takes fewer games. But you specifically gave something like +8 which needs 20-30K games to measure that accurately. If the error bar is +/- 40, you really don't have any idea whether your change is good or bad if the potential gain is only a few Elo...cdani wrote:My objective is to try to gain elo by increasing step by step the added depth to the different threads. So when I achieve a new improvement with minimal confidence, instead of increasing the confidence of this point I try to obtain a new point with higher gains. When at some point I will not be able to obtain an improvement, then I can spend resources soundly to consolidate the last point. The previous points are always consolidated automatically with higher confidence when a next point is found, simply because you know that if with higher depths you have some confidence that it works, with lower depths it must work nearly sure.bob wrote:I assume you are playing tens of thousands of games to get 4-5 elo accuracy???

At 4 threads I try to obtain some relative sure results. At 8 threads or more, my objective is simply to be sure that at least is an improvement over 4, because of the limited resources.

In any case the objectives where to have a working multithreading engine that scales minimally, and also to try to prove this variation over lazy smp, and those things are already achieved over what I was hoping.

I will continue trying to improve this, also with slower time control, the next thing I will try to have minimal confidence that is not a regression.

Posted: **Wed Apr 15, 2015 7:38 am**

Of course you have all the reason. But I'm trying to say that, as this is new and we don't know which is the upper bound, instead of spending efforts to be sure that a new point if sound, it's better to try higher points first to have an idea of the true upper bound and only then spend resources to refine the best points found.bob wrote:If you make a change that supposedly gains +8 elo as you reported, you have to have the error bar much lower than +/- 8 to have any confidence in that result. IE a BUNCH of games is needed. If you are trying to prove a +40 Elo change, that takes fewer games. But you specifically gave something like +8 which needs 20-30K games to measure that accurately. If the error bar is +/- 40, you really don't have any idea whether your change is good or bad if the potential gain is only a few Elo...

If you are trying to find the value in cp of a rook, and you try 100 and then 150, instead of being sure of those 150, is sound to try first 200, even with a relative high margin of error.

Posted: **Wed Apr 15, 2015 7:40 am**

Yes, the change that has won 8 elo was at 8 threads and was the bad one.AlvaroBegue wrote: Well, in this particular case I think his only change was the addition of `+ (depth > 7)' at the end of the formula, but with 8 threads that condition evaluates to false every time, which means his change can't possibly be an 8-point improvement in Elo.

Posted: **Wed Apr 15, 2015 3:17 pm**

You can buy a used dual 6 core Xeon system on ebay pretty cheap. Here is one for $610:

http://www.ebay.com/itm/HP-Z800-Worksta ... 33a0db6e8d

For leasing, you can try Amazon as suggested. But also look at http://webnx.com/dedicated-servers/los- ... u-servers/. They have quad CPU systems too.

--Jon

http://www.ebay.com/itm/HP-Z800-Worksta ... 33a0db6e8d

For leasing, you can try Amazon as suggested. But also look at http://webnx.com/dedicated-servers/los- ... u-servers/. They have quad CPU systems too.

--Jon

Posted: **Wed Apr 15, 2015 6:54 pm**

Look at Daniel Shawul's code and Dr. Hyatt's code. They both have interesting Numa implementations.cdani wrote:Thanks!! I will review this. I never worked with NUMA, so I will start.matthewlai wrote:I would recommend Amazon EC2 for that.cdani wrote: After those attempts, I will try to obtain access to a 12 or 16 core machine, to see how this must be modified to scale well at those machines. May be some of you have experience on an ISP that offers such services. I will pay just for a month, because I suppose it will not be any cheap, but I think it will be enough.

With cc2.8xlarge for example, you can get an entire dual socket Intel Xeon E5-2670 machine (2x8 cores, 32 hyperthreads) for about $2/hour, rounded up to hour.

Or, if you don't mind your instance (VM) being shut down randomly (in my experience, that very rarely happens), you can get an instance from Amazon's spare capacity for about 1/10 to 1/5 the price.

However, that's now a NUMA system (since it's dual socket), so that makes things a little more complicated, if your engine doesn't already support NUMA.