Trying to improve lazy smp

bob · Post by **bob** » Tue Apr 14, 2015 1:24 am

I assume you are playing tens of thousands of games to get 4-5 elo accuracy???

Dann Corbit · Post by **Dann Corbit** » Tue Apr 14, 2015 2:01 am

cdani wrote:
bob wrote: Make a note: "quick and dirty" is often "quick and buggy". Done that too many times, haven't done it in quite a while however, as I simply always "measure twice, cut once" nowadays. My dad had it right.
Yes Well. I will try to improve.

New best version at 4 threads: +122 elo (from +117)

Not published for the moment, I'm triying it at 8 threads.

Added to depths, in readable format, of the four threads: 0 (obviously), +1, +2, +3.

Yes, is the one I wanted to try first, but by prudence I arrived to it slowly.

I wonder how that strategy would compare with analyzing the root node, the node after making the move with the second highest score (stored in a list from the previous iteration), the third highest score, and the 4th highest score and so on, for a small number of threads.
So the strategy is:
0 (obviosly)
+1a
+1b
+1c
...

Perhaps another method would be a function of the differences in the scores of the siblings (throw more horsepower to the strongest nodes).
e.g.
0 (obviosly)
+1a gets 2 threads for +40 score
+1b gets 1 thread for + 15 score
+1c gets 1 thread for + 12 score

or even:
0 (obviosly)
+1a gets 1 threads for +40 score
+2a (1st child node gets 1 thread due to the +40 score)
+1b gets 1 thread for + 15 score
+1c gets 1 thread for + 12 score

bob · Post by **bob** » Tue Apr 14, 2015 4:32 am

Dann Corbit wrote:
cdani wrote:
bob wrote: Make a note: "quick and dirty" is often "quick and buggy". Done that too many times, haven't done it in quite a while however, as I simply always "measure twice, cut once" nowadays. My dad had it right.
Yes Well. I will try to improve.

New best version at 4 threads: +122 elo (from +117)

Not published for the moment, I'm triying it at 8 threads.

Added to depths, in readable format, of the four threads: 0 (obviously), +1, +2, +3.

Yes, is the one I wanted to try first, but by prudence I arrived to it slowly.
I wonder how that strategy would compare with analyzing the root node, the node after making the move with the second highest score (stored in a list from the previous iteration), the third highest score, and the 4th highest score and so on, for a small number of threads.
So the strategy is:
0 (obviosly)
+1a
+1b
+1c
...

Perhaps another method would be a function of the differences in the scores of the siblings (throw more horsepower to the strongest nodes).
e.g.
0 (obviosly)
+1a gets 2 threads for +40 score
+1b gets 1 thread for + 15 score
+1c gets 1 thread for + 12 score

or even:
0 (obviosly)
+1a gets 1 threads for +40 score
+2a (1st child node gets 1 thread due to the +40 score)
+1b gets 1 thread for + 15 score
+1c gets 1 thread for + 12 score

This entire idea sounds like a LOT of waiting and twiddling overhead to me.

cdani · Post by **cdani** » Tue Apr 14, 2015 10:11 am

bob wrote:I assume you are playing tens of thousands of games to get 4-5 elo accuracy???

My objective is to try to gain elo by increasing step by step the added depth to the different threads. So when I achieve a new improvement with minimal confidence, instead of increasing the confidence of this point I try to obtain a new point with higher gains. When at some point I will not be able to obtain an improvement, then I can spend resources soundly to consolidate the last point. The previous points are always consolidated automatically with higher confidence when a next point is found, simply because you know that if with higher depths you have some confidence that it works, with lower depths it must work nearly sure.

At 4 threads I try to obtain some relative sure results. At 8 threads or more, my objective is simply to be sure that at least is an improvement over 4, because of the limited resources.

In any case the objectives where to have a working multithreading engine that scales minimally, and also to try to prove this variation over lazy smp, and those things are already achieved over what I was hoping.

I will continue trying to improve this, also with slower time control, the next thing I will try to have minimal confidence that is not a regression.

bob · Post by **bob** » Wed Apr 15, 2015 4:05 am

cdani wrote:
bob wrote:I assume you are playing tens of thousands of games to get 4-5 elo accuracy???
My objective is to try to gain elo by increasing step by step the added depth to the different threads. So when I achieve a new improvement with minimal confidence, instead of increasing the confidence of this point I try to obtain a new point with higher gains. When at some point I will not be able to obtain an improvement, then I can spend resources soundly to consolidate the last point. The previous points are always consolidated automatically with higher confidence when a next point is found, simply because you know that if with higher depths you have some confidence that it works, with lower depths it must work nearly sure.

At 4 threads I try to obtain some relative sure results. At 8 threads or more, my objective is simply to be sure that at least is an improvement over 4, because of the limited resources.

In any case the objectives where to have a working multithreading engine that scales minimally, and also to try to prove this variation over lazy smp, and those things are already achieved over what I was hoping.

I will continue trying to improve this, also with slower time control, the next thing I will try to have minimal confidence that is not a regression.

If you make a change that supposedly gains +8 elo as you reported, you have to have the error bar much lower than +/- 8 to have any confidence in that result. IE a BUNCH of games is needed. If you are trying to prove a +40 Elo change, that takes fewer games. But you specifically gave something like +8 which needs 20-30K games to measure that accurately. If the error bar is +/- 40, you really don't have any idea whether your change is good or bad if the potential gain is only a few Elo...

AlvaroBegue · Post by **AlvaroBegue** » Wed Apr 15, 2015 4:08 am

bob wrote:
cdani wrote:
bob wrote:I assume you are playing tens of thousands of games to get 4-5 elo accuracy???
My objective is to try to gain elo by increasing step by step the added depth to the different threads. So when I achieve a new improvement with minimal confidence, instead of increasing the confidence of this point I try to obtain a new point with higher gains. When at some point I will not be able to obtain an improvement, then I can spend resources soundly to consolidate the last point. The previous points are always consolidated automatically with higher confidence when a next point is found, simply because you know that if with higher depths you have some confidence that it works, with lower depths it must work nearly sure.

At 4 threads I try to obtain some relative sure results. At 8 threads or more, my objective is simply to be sure that at least is an improvement over 4, because of the limited resources.

In any case the objectives where to have a working multithreading engine that scales minimally, and also to try to prove this variation over lazy smp, and those things are already achieved over what I was hoping.

I will continue trying to improve this, also with slower time control, the next thing I will try to have minimal confidence that is not a regression.
If you make a change that supposedly gains +8 elo as you reported, you have to have the error bar much lower than +/- 8 to have any confidence in that result. IE a BUNCH of games is needed. If you are trying to prove a +40 Elo change, that takes fewer games. But you specifically gave something like +8 which needs 20-30K games to measure that accurately. If the error bar is +/- 40, you really don't have any idea whether your change is good or bad if the potential gain is only a few Elo...

Well, in this particular case I think his only change was the addition of `+ (depth > 7)' at the end of the formula, but with 8 threads that condition evaluates to false every time, which means his change can't possibly be an 8-point improvement in Elo.

cdani · Post by **cdani** » Wed Apr 15, 2015 9:38 am

bob wrote:If you make a change that supposedly gains +8 elo as you reported, you have to have the error bar much lower than +/- 8 to have any confidence in that result. IE a BUNCH of games is needed. If you are trying to prove a +40 Elo change, that takes fewer games. But you specifically gave something like +8 which needs 20-30K games to measure that accurately. If the error bar is +/- 40, you really don't have any idea whether your change is good or bad if the potential gain is only a few Elo...

Of course you have all the reason. But I'm trying to say that, as this is new and we don't know which is the upper bound, instead of spending efforts to be sure that a new point if sound, it's better to try higher points first to have an idea of the true upper bound and only then spend resources to refine the best points found.

If you are trying to find the value in cp of a rook, and you try 100 and then 150, instead of being sure of those 150, is sound to try first 200, even with a relative high margin of error.

cdani · Post by **cdani** » Wed Apr 15, 2015 9:40 am

AlvaroBegue wrote: Well, in this particular case I think his only change was the addition of `+ (depth > 7)' at the end of the formula, but with 8 threads that condition evaluates to false every time, which means his change can't possibly be an 8-point improvement in Elo.

Yes, the change that has won 8 elo was at 8 threads and was the bad one.

jdart · Post by **jdart** » Wed Apr 15, 2015 5:17 pm

You can buy a used dual 6 core Xeon system on ebay pretty cheap. Here is one for $610:

http://www.ebay.com/itm/HP-Z800-Worksta ... 33a0db6e8d

For leasing, you can try Amazon as suggested. But also look at http://webnx.com/dedicated-servers/los- ... u-servers/. They have quad CPU systems too.

--Jon

Dann Corbit · Post by **Dann Corbit** » Wed Apr 15, 2015 8:54 pm

cdani wrote:
matthewlai wrote:
cdani wrote: After those attempts, I will try to obtain access to a 12 or 16 core machine, to see how this must be modified to scale well at those machines. May be some of you have experience on an ISP that offers such services. I will pay just for a month, because I suppose it will not be any cheap, but I think it will be enough.
I would recommend Amazon EC2 for that.

With cc2.8xlarge for example, you can get an entire dual socket Intel Xeon E5-2670 machine (2x8 cores, 32 hyperthreads) for about $2/hour, rounded up to hour.

Or, if you don't mind your instance (VM) being shut down randomly (in my experience, that very rarely happens), you can get an instance from Amazon's spare capacity for about 1/10 to 1/5 the price.

However, that's now a NUMA system (since it's dual socket), so that makes things a little more complicated, if your engine doesn't already support NUMA.
Thanks!! I will review this. I never worked with NUMA, so I will start.

Look at Daniel Shawul's code and Dr. Hyatt's code. They both have interesting Numa implementations.

Trying to improve lazy smp

Re: Trying to improve lazy smp

Re: Trying to improve lazy smp

Re: Trying to improve lazy smp

Re: Trying to improve lazy smp

Re: Trying to improve lazy smp

Re: Trying to improve lazy smp

Re: Trying to improve lazy smp

Re: Trying to improve lazy smp

Re: Trying to improve lazy smp

Re: Trying to improve lazy smp