Hi Robert!
I understand what you mean and of course I have to agree with you.
I have one possible suggestion for why it might help to search with one thread to depth d and another to depth d+1. If your threads share the same history table, then you might get some artificial aging that will be beneficial and might help both the depth d search and the depth d+1 search.
I am just speculating, but it might be the case that you will actually look at quite a different tree than if you do LMR based on your history table.
I might be very wrong, and I have not finished my chess engine, so be nice to me.
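The aging idea above can be made concrete with a toy model. The sketch below is speculative and self-contained: the butterfly table layout, the "gravity" update formula, and the bonus values are all assumptions for illustration, not taken from any particular engine. The point is only that a second searcher's writes into a shared entry pull it back toward zero, which behaves like implicit aging.

```python
HISTORY_MAX = 16384

class SharedHistory:
    """Butterfly-style history table indexed by [from_sq][to_sq]."""
    def __init__(self):
        self.table = [[0] * 64 for _ in range(64)]

    def update(self, frm, to, bonus):
        # "History gravity" update: every write shrinks the existing entry
        # a little, so frequent traffic on an entry acts like aging.
        h = self.table[frm][to]
        self.table[frm][to] = h + bonus - h * abs(bonus) // HISTORY_MAX

# One searcher alone, repeatedly rewarding the same quiet move:
alone = SharedHistory()
for _ in range(100):
    alone.update(12, 28, 300)

# Same rewards, but a hypothetical depth d+1 helper keeps writing small
# penalties into the shared entry -- its traffic ages the other's data:
shared = SharedHistory()
for _ in range(100):
    shared.update(12, 28, 300)
    shared.update(12, 28, -50)
```

The shared entry ends up noticeably lower than the single-searcher entry, so move ordering (and any LMR decisions based on it) would genuinely differ between the two setups.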
Lazy SMP, part 2
Moderators: hgm, Rebel, chrisw
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Lazy SMP, part 2
If you do a serial search that is iterative, you will get your baseline. If another thread is always doing depth + 1, the main thread is going to benefit from higher-quality moves that were put in the hash table by the main thread from the previous iteration. Of course, we would also expect the performance to increase due to moves already being searched.

Pio wrote: Hi Robert!
I understand what you mean and of course I have to agree with you.
I have one possible suggestion for why it might help to search with one thread to depth d and another to depth d+1. If your threads share the same history table, then you might get some artificial aging that will be beneficial and might help both the depth d search and the depth d+1 search.
I am just speculating, but it might be the case that you will actually look at quite a different tree than if you do LMR based on your history table.
I might be very wrong and I have not finished my chess engine so be nice to me
Due to the way LMR works, anything discovered by the deeper search in the previous iteration is going to have a beneficial effect on the move ordering, and hence you might not reduce some moves you ordinarily would have.
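That interaction can be sketched as a toy reduction rule. The function name, thresholds, and inputs below are hypothetical, chosen only to show how a hash-move or history hit produced by the deeper search can rescue a move from reduction:

```python
def lmr_reduction(move_index, depth, is_hash_move, history_score):
    """Toy LMR rule: reduce late moves unless the shared tables vouch for them."""
    if depth < 3 or move_index < 4:
        return 0                    # never reduce shallow nodes or early moves
    if is_hash_move or history_score > 1000:
        return 0                    # the deeper helper search promoted this move
    return 1 + move_index // 8      # reduce later moves progressively more
```

A move at index 10 would normally be reduced by 2 plies here; once the depth+1 thread installs it as the hash move, the reduction disappears and the shallower search explores a different tree.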
Another way to do this on a 4-core machine, for example, is to start with a 1-, 2-, 3-, and 4-ply search. When a search completes, it frees up a thread, so you immediately start the next un-started depth. Let's assume the 1-ply search completes while the 2-, 3-, and 4-ply searches are still running - you would start a 5-ply search on the newly freed thread. The program reports that a 1-ply search is complete, all the accounting focuses on the 2-ply search at that point, and so on.
You can do the same thing with half-ply increments, since it may be that running 4 full-ply iterations at the same time is not very efficient.
I suspect that stopping the other threads and moving to the next iteration has almost the same impact as keeping them all running in the fashion I describe because the hash table essentially records the state anyway when you stop. So your way of doing this may be simpler and just as good.
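The depth-handout scheme above can be sketched with a shared counter that each worker bumps to claim the next un-started depth. This is an illustrative skeleton, not an engine: `fake_search` stands in for a real iterative search, and the depth cap of 8 is arbitrary.

```python
import threading

next_depth = 1
depth_lock = threading.Lock()
completed = []                 # list.append is atomic under CPython's GIL

def fake_search(depth):
    return depth               # placeholder for an alpha-beta search to 'depth'

def worker(max_depth):
    global next_depth
    while True:
        with depth_lock:
            depth = next_depth        # claim the next un-started depth
            next_depth += 1
        if depth > max_depth:
            return
        completed.append(fake_search(depth))

workers = [threading.Thread(target=worker, args=(8,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Four workers start on depths 1-4; as each finishes, it immediately claims the next higher depth, exactly as described, and every depth is searched exactly once.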
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Lazy SMP, part 2
Correction! I meant that the main thread would benefit from the depth+1 search from the previous iteration.

Don wrote: If you do a serial search that is iterative, you will get your baseline. If another thread is always doing depth + 1, the main thread is going to benefit from higher-quality moves that were put in the hash table by the main thread from the previous iteration.

Pio wrote: Hi Robert!
I understand what you mean and of course I have to agree with you.
I have one possible suggestion for why it might help to search with one thread to depth d and another to depth d+1. If your threads share the same history table, then you might get some artificial aging that will be beneficial and might help both the depth d search and the depth d+1 search.
I am just speculating, but it might be the case that you will actually look at quite a different tree than if you do LMR based on your history table.
I might be very wrong and I have not finished my chess engine so be nice to me
Of course we would also expect the performance to increase due to moves already being searched.
Due to the way LMR works, anything discovered by the deeper search in the previous iteration is going to have a beneficial effect on the move ordering, and hence you might not reduce some moves you ordinarily would have.
Another way to do this on a 4 core machine for example is start with a 1,2,3 and 4 ply search. When 1 search completes, it frees up a thread so immediately start the next un-started depth. Let's assume the 1 ply search completes and the 2,3 and 4 are still running - you would start a 5 ply search on the newly freed thread. The program reports that a 1 ply search is complete and all the accounting is focused on the 2 ply search at this point and so on.
You can do the same thing with half ply increments since it may be that 4 full ply iterations at the same time is not very efficient.
I suspect that stopping the other threads and moving to the next iteration has almost the same impact as keeping them all running in the fashion I describe because the hash table essentially records the state anyway when you stop. So your way of doing this may be simpler and just as good.
-
- Posts: 937
- Joined: Fri Mar 10, 2006 4:29 pm
- Location: Germany
Re: Lazy SMP, part 2
That perfectly describes what's happening in a YBW search, doesn't it?

bob wrote: The moral of that story is that on occasion, that extra search space finds something good. On other occasions, it slows you down. It MUST average out to nothing. Otherwise, you could improve your serial search to accomplish the same thing.
All in all no qualitative gain, only speedup.
But I still think this is not true for the SHT-approach ...
I'm afraid our opinions differ here, which is ok.
Jörg Oster
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Lazy SMP, part 2
My point is this. IF your approach works, one can do that SAME thing in a purely serial/sequential search. You just write two identical search functions, and call the first, which selects one move, searches one node (using the shared HT), and returns. You call the other search, which does the same thing. Then you call the first, which searches one more node (deeper or wider, it doesn't matter). Then back to the second. In essence, you are doing a parallel search by interleaving the nodes. If the shared TT actually improves quality, so will this.

Joerg Oster wrote: That perfectly describes what's happening in a YBW search, doesn't it?

bob wrote: The moral of that story is that on occasion, that extra search space finds something good. On other occasions, it slows you down. It MUST average out to nothing. Otherwise, you could improve your serial search to accomplish the same thing.
All in all no qualitative gain, only speedup.
But I still think this is not true for the SHT-approach ...
I'm afraid our opinions differ here, which is ok.
I simply do not believe in the concept, myself.
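The interleaving argument can be mimicked with two generator-based toy searches stepping one "node" at a time over a shared table. Everything here is schematic and assumed for illustration: positions are just string labels, and the table stores only a depth per position.

```python
shared_tt = {}   # position -> depth of the entry stored for it

def toy_search(name, positions, depth):
    """Yield once per node visited, probing and storing in the shared TT."""
    for pos in positions:
        if shared_tt.get(pos, -1) >= depth:
            yield (name, pos, "tt hit")    # a deep-enough entry already exists
        else:
            shared_tt[pos] = depth         # pretend we searched this node
            yield (name, pos, "searched")

helper = toy_search("d+1", ["p1", "p2", "p3"], depth=7)
main   = toy_search("d",   ["p1", "p2", "p3"], depth=6)

# Strictly alternate single nodes between the two searches -- one serial
# program doing exactly what two lazy-SMP threads would do in parallel:
log = []
for helper_step, main_step in zip(helper, main):
    log.append(helper_step)
    log.append(main_step)
```

Because the helper's deeper entries land in the table first, every node of the main search becomes a TT hit, which is the whole claimed benefit; and since this runs in a single thread, any such gain is in principle available to a serial search too.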
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Lazy SMP, part 2
Or better yet, if there is something there, try to figure out what the principle is so that you can take advantage of it more purposefully.

bob wrote: My point is this. IF your approach works, one can do that SAME thing in a purely serial/sequential search. You just write two identical search functions, and call the first, which selects one move, searches one node (using the shared HT), and returns. You call the other search, which does the same thing. Then you call the first, which searches one more node (deeper or wider, it doesn't matter). Then back to the second. In essence, you are doing a parallel search by interleaving the nodes. If the shared TT actually improves quality, so will this.

Joerg Oster wrote: That perfectly describes what's happening in a YBW search, doesn't it?

bob wrote: The moral of that story is that on occasion, that extra search space finds something good. On other occasions, it slows you down. It MUST average out to nothing. Otherwise, you could improve your serial search to accomplish the same thing.
All in all no qualitative gain, only speedup.
But I still think this is not true for the SHT-approach ...
I'm afraid our opinions differ here, which is ok.
I simply do not believe in the concept, myself.
-
- Posts: 171
- Joined: Wed Dec 28, 2011 8:44 pm
- Location: United States
TurboBoost complications
Well, I discovered at least one issue that might have affected my results... Turbo-boost. I realized that Turbo-boost may be affecting my tests of multiple cores... both the time-to-depth test and the game-playing tests. My chip is an i7, so when I ran the single-core time-to-depth test, it was 15-20% faster than it would have been if the chip had been fully loaded. So the relative improvement in time to depth for multiple threads is actually somewhat better than my first post indicates.
On the game results, the interpretation is more difficult: in the single-thread case, all four cores are running all the time because I am playing four games in parallel, but in the four-thread case, one core is running for the opponents (and therefore might get turbo-boost) while all four are running for my program. In principle, this reduces the rating of the four-thread version relative to the single-thread version, because the opponents are stronger for the four-thread version, but given the tiny time-slices the opponents are playing in, I am not sure this matters.
The only way to tell is to try to disable TurboBoost. Does anyone know a safe way to do this? There is no option in the BIOS. Here is a website claiming to have an approach, but I don't know anything about the tool they are referencing.
http://luisjdominguezp.tumblr.com/post/ ... t-in-linux
Are there any other adaptive speed changes the i7 does that might also affect the results? What do others do when testing to avoid problems like these?
- Dan
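For what it's worth, on Linux the usual knobs live in sysfs rather than in a separate tool. Which file exists depends on the CPU frequency driver in use, so treat these as paths to check rather than guaranteed to be present on any given system:

```shell
# intel_pstate driver (common on i7 systems): writing 1 disables Turbo Boost
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

# acpi-cpufreq driver: writing 0 disables boost
echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost

# Pinning the governor removes other frequency-scaling noise while testing
sudo cpupower frequency-set -g performance
```

These settings do not persist across reboots, which is convenient for testing: disable turbo for the SMP measurements, then reboot to get it back for real games.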
-
- Posts: 937
- Joined: Fri Mar 10, 2006 4:29 pm
- Location: Germany
Re: TurboBoost complications
Hi Dan,
are you really sure there is no such option in the BIOS?
What mainboard do you have? Maybe you have to switch into an 'Advanced Mode' to see all the available options?
Maybe it will help to change the CPU settings to manual instead of using the automatic settings. Good luck.
Jörg Oster
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: TurboBoost complications
You need to disable that for sensible testing. It's ok to run with it in real games, as when it kicks in you get a performance boost. But it does screw up SMP comparisons significantly, because you now have yet another variable in the equation.

dchoman wrote: Well, I discovered at least one issue that might have affected my results... Turbo-boost. I realized that Turbo-boost may be affecting my tests of multiple cores... both the time-to-depth test and the game-playing tests. My chip is an i7, so when I ran the single-core time-to-depth test, it was 15-20% faster than it would have been if the chip had been fully loaded. So the relative improvement in time to depth for multiple threads is actually somewhat better than my first post indicates.
On the game results the interpretation is more difficult, in the single-thread case all four cores are running all the time because I am playing four games in parallel, but in the four-thread case, one core is running for the opponents (and therefore might get turbo-boost) while all four are running for my program. In principle, this reduces the rating of the four-thread version relative to the single-thread version because the opponents are stronger for the four-thread version, but given the tiny time-slices the opponents are playing in, I am not sure this matters.
The only way to tell is to try to disable TurboBoost. Anyone know a safe way to do this? There is no option in the BIOS. Here is a website claiming to have an approach, but I don't know anything about the tool they are referencing.
http://luisjdominguezp.tumblr.com/post/ ... t-in-linux
Are there any other adaptive speed changes the i7 does that might also affect the results? What do others do when testing to avoid problems like these?
- Dan
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: TurboBoost complications
So the rating agencies should have this disabled. Do you have a sense of whether it would benefit MP or SP more, or is it just random?

bob wrote: You need to disable that for sensible testing. It's ok to run with it in real games, as when it kicks in you get a performance boost. But it does screw up SMP comparisons significantly, because you now have yet another variable in the equation.

dchoman wrote: Well, I discovered at least one issue that might have affected my results... Turbo-boost. I realized that Turbo-boost may be affecting my tests of multiple cores... both the time-to-depth test and the game-playing tests. My chip is an i7, so when I ran the single-core time-to-depth test, it was 15-20% faster than it would have been if the chip had been fully loaded. So the relative improvement in time to depth for multiple threads is actually somewhat better than my first post indicates.
On the game results the interpretation is more difficult, in the single-thread case all four cores are running all the time because I am playing four games in parallel, but in the four-thread case, one core is running for the opponents (and therefore might get turbo-boost) while all four are running for my program. In principle, this reduces the rating of the four-thread version relative to the single-thread version because the opponents are stronger for the four-thread version, but given the tiny time-slices the opponents are playing in, I am not sure this matters.
The only way to tell is to try to disable TurboBoost. Anyone know a safe way to do this? There is no option in the BIOS. Here is a website claiming to have an approach, but I don't know anything about the tool they are referencing.
http://luisjdominguezp.tumblr.com/post/ ... t-in-linux
Are there any other adaptive speed changes the i7 does that might also affect the results? What do others do when testing to avoid problems like these?
- Dan