A main difference moving to the Dynamic Tree Splitting Algorithm is that the whole search has to be laid out iterative instead of recursive. Now my question is, how can this be achieved best?
My first approach was similar to how it is done in ZCT: a switch branch inside of an infinite loop. In the switch there are several stages defined and when jumping to a different part (eg. from pv-search to quies-search) the stage where it should return afterwards is stored in the stack, the target stage (beginning of quies) is stored and then with "continue;" the instruction pointer goes to the beginning of the switch branch where it then jumps to the target stage.
After reading an article yesterday I think there is a much easier maybe nicer solution to this. Its more about emulating the behavior of the usual function call algorithm. Instead of the stages we use jump labels. At the beginning of the function a table is filled with the addresses of these labels (I found a solution where this is done in assembly - or in gcc there is even the option to use the &&label operator) Now when a new stage is called, it directly jumps to it, storing the address of the return label on the stack for when it has to return. To conclude, this approach doesn't do the extra work jumping to the beginning of the switch statement every time it wants to enter another section of the program. Additionally all the statements like continue; break; etc. become obsolete.
Are there any other approaches?
DTS Structure
Moderators: hgm, Rebel, chrisw
Re: DTS Structure
Just wait for your result.
DTS is much complicated than YBW, whose performance is also better than YBW, not much. If we only care about the ELO gain, it's hard to measure the difference between DTS and YBW.
DTS is much complicated than YBW, whose performance is also better than YBW, not much. If we only care about the ELO gain, it's hard to measure the difference between DTS and YBW.
-
- Posts: 1922
- Joined: Thu Mar 09, 2006 12:51 am
- Location: Earth
Re: DTS Structure
Hello Edmund,Edmund wrote:A main difference moving to the Dynamic Tree Splitting Algorithm is that the whole search has to be laid out iterative instead of recursive. Now my question is, how can this be achieved best?
My first approach was similar to how it is done in ZCT: a switch branch inside of an infinite loop. In the switch there are several stages defined and when jumping to a different part (eg. from pv-search to quies-search) the stage where it should return afterwards is stored in the stack, the target stage (beginning of quies) is stored and then with "continue;" the instruction pointer goes to the beginning of the switch branch where it then jumps to the target stage.
After reading an article yesterday I think there is a much easier maybe nicer solution to this. Its more about emulating the behavior of the usual function call algorithm. Instead of the stages we use jump labels. At the beginning of the function a table is filled with the addresses of these labels (I found a solution where this is done in assembly - or in gcc there is even the option to use the &&label operator) Now when a new stage is called, it directly jumps to it, storing the address of the return label on the stack for when it has to return. To conclude, this approach doesn't do the extra work jumping to the beginning of the switch statement every time it wants to enter another section of the program. Additionally all the statements like continue; break; etc. become obsolete.
Are there any other approaches?
Your solution might work a bit better, but personally I doubt there would be much speed gain. If you look at the assembler output from search.c in ZCT, you'll see it isn't really that bad--at each RETURN, there's an unconditional jump to the top of the function, where it calls search_maintenance(), and from there, uses the search state to index a jump table. So the net win is quite small, you win one unconditional jump, a bounds check on the search state (which should always be not taken of course) and one memory access to a very small table.
Here's what gcc -O3 gives me:
Code: Select all
.L171:
movq %r14, %rsi
movq %r13, %rdi
call search_maintenance
testl %eax, %eax
jne .L174
movq 32(%rsp), %rbx
movl 120(%rbx), %esi
cmpl $12, %esi
ja .L7
mov %esi, %eax
jmp *.L21(,%rax,8)
.section .rodata
.align 8
.align 4
.L21:
.quad .L8
.quad .L9
.quad .L10
.quad .L157
.quad .L12
.quad .L15
.quad .L14
.quad .L15
.quad .L16
.quad .L17
.quad .L18
.quad .L19
.quad .L20
And there are actually other ways to formulate it. Teemu Pudas was able to do it with real function calls but without being recursive. It's pretty clean, but it would be even slower than ZCT. His method is here: http://www.open-aurec.com/wbforum/viewt ... 76#p187476
-
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: DTS Structure
Where would be the fun? And its not so much more complicated if you think about it.liuzy wrote:Just wait for your result.
DTS is much complicated than YBW, whose performance is also better than YBW, not much. If we only care about the ELO gain, it's hard to measure the difference between DTS and YBW.
-
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: DTS Structure
first of all thanks for the link Zach. That was a very interesting read. Teemus approach with overloading the usual function statements is quite neat. Its also interesting that already hgm and vincent were working into similar directions with iterative search via gotos.Zach Wegner wrote:
Hello Edmund,
Your solution might work a bit better, but personally I doubt there would be much speed gain. If you look at the assembler output from search.c in ZCT, you'll see it isn't really that bad--at each RETURN, there's an unconditional jump to the top of the function, where it calls search_maintenance(), and from there, uses the search state to index a jump table. So the net win is quite small, you win one unconditional jump, a bounds check on the search state (which should always be not taken of course) and one memory access to a very small table.
...
regarding the comparison with your code, I wouldn't replace the switch statement for the sake of one additional jump. Its rather for the improved readability, which is very important in such structures.
-
- Posts: 224
- Joined: Mon Mar 12, 2007 7:31 pm
- Location: Bonn, Germany
Re: DTS Structure
Iterative search is also useful for YBW. Also, it helps profiling because you don't have cycles in the call tree.Edmund wrote:A main difference moving to the Dynamic Tree Splitting Algorithm is that the whole search has to be laid out iterative instead of recursive. Now my question is, how can this be achieved best?
I personally don't like the macro based solutions for iterative search from other programs.
EDIT: because I dislike macros in general and try to minimize their use. I also dislike a huge case switch. The latter was fine for qsearch, but as a case switch cannot jump into the middle of loops, it requires ugly adaptions of the code.
I thought quite a long time, how to write a readable iterative search. I found a macro free solution. The price are - gotos. I hadn't used goto since my youth with C64 BASIC (not counting jumping out of nested loops) but for an iterative search I found gotos really useful.
Search looks like this:
Code: Select all
namespace Label
{
enum _ {after_scout, after_research, ...};
}
class Node
{
int alpha, beta;
Move move;
MoveList moves;
int value;
Label::_ ret_addr;
};
search ()
{
Node* node = &d_nodes[0];
Node* child = node+1;
recurse:
node->moves.init();
while ((node->move = node->moves.next()))
{
child->alpha = ...
child->beta = ...
child->ret_addr = Label::after_scout;
++node;
++child;
goto recurse;
// when we return here, node and child have their original values
// and child is evaluated
after_scout:
if (child->value < -node->alpha)
{
child->alpha = ...
child->beta = ...
child->ret_addr = Label::after_research;
++node
++child;
goto recurse;
after_research:
...
}
if (cutoff)
{
node->value = ...
goto node_done;
}
}
node_done:
--node;
--child;
switch (node->ret_addr)
{
case Label after_research:
goto after_research;
...
}
}
-
- Posts: 224
- Joined: Mon Mar 12, 2007 7:31 pm
- Location: Bonn, Germany
Re: DTS Structure
Code: Select all
switch (child->ret_addr)
-
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: DTS Structure
Thats about the algorithm I was thinking of too. I just wonder how the inline assembly should be slower than the additional indirection.Onno Garms wrote:...
Using gcc enhancements of C++ or using inline assembly in msvc, it is possible to store the address of the lables rather then the enum. This saves the switch at the end, one can directly jump to the label. However I found the inline assembly based solution in msvc slower then with the case switch. Also I could not find a way to port to 64 bit in msvc. So I decided to stay in the C++ standard and have the case switch.
I was thinking of something like:
first a jumptable is initialized holding the addresses of all labels
Code: Select all
// init Jumptable
int jmptable[64];
#define ADD_JUMPER(jmp) \
mov edx, DWORD PTR jmp \
mov [esi], edx \
add esi, 4
__asm
{
lea esi, jmptable ;Load address of the jmptable into esi
ADD_JUMPER(PV)
ADD_JUMPER(ZW)
ADD_JUMPER(QS)
[...]
}
#undef ADD_JUMPER
[...]
-
- Posts: 12542
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: DTS Structure
I sometimes use array(s) of function pointers.Edmund wrote:Thats about the algorithm I was thinking of too. I just wonder how the inline assembly should be slower than the additional indirection.Onno Garms wrote:...
Using gcc enhancements of C++ or using inline assembly in msvc, it is possible to store the address of the lables rather then the enum. This saves the switch at the end, one can directly jump to the label. However I found the inline assembly based solution in msvc slower then with the case switch. Also I could not find a way to port to 64 bit in msvc. So I decided to stay in the C++ standard and have the case switch.
I was thinking of something like:
first a jumptable is initialized holding the addresses of all labelsthen to jump to a specific label I would have used a macro with an inlined assembly statement.Code: Select all
// init Jumptable int jmptable[64]; #define ADD_JUMPER(jmp) \ mov edx, DWORD PTR jmp \ mov [esi], edx \ add esi, 4 __asm { lea esi, jmptable ;Load address of the jmptable into esi ADD_JUMPER(PV) ADD_JUMPER(ZW) ADD_JUMPER(QS) [...] } #undef ADD_JUMPER [...]
E.g. something like this:
genmoves[piecetype][color](rank, file);
where genmoves is a 6x2 array of function pointers.
It removes all the goto branches (if you don't count calls and returns).
That approach used to be a *lot* faster than a switch if it were in a critical path, but lately the difference has pretty well vanished.
Here is a benchmark program to test your compiler for pointer, array, and switch invocation (a profiler will tell you which approach is fastest):
Code: Select all
#include <math.h>
#include <stdio.h>
typedef double (*f_t) (double);
static f_t f[] = {log, log10, sqrt, cos, cosh, exp, sin, sinh, tan, tanh, 0};
static double accum0 = 0;
static double accum1 = 0;
static double accum2 = 0;
void arr(void)
{
int i;
double d = 0;
for (i = 0; f[i]; i++) {
d += f[i] (0.5);
}
accum0 += d;
}
void poi(void)
{
f_t *flist = f;
double d = 0;
while (*flist) {
f_t ff = *flist;
d += ff(0.5);
flist++;
}
accum1 += d;
}
void swi(void)
{
int i;
double d = 0;
for (i = 0; f[i]; i++) {
switch (i) {
case 0:
d += f[0] (0.5);
break;
case 1:
d += f[1] (0.5);
break;
case 2:
d += f[2] (0.5);
break;
case 3:
d += f[3] (0.5);
break;
case 4:
d += f[4] (0.5);
break;
case 5:
d += f[5] (0.5);
break;
case 6:
d += f[6] (0.5);
break;
case 7:
d += f[7] (0.5);
break;
case 8:
d += f[8] (0.5);
break;
case 9:
d += f[9] (0.5);
break;
default:
break;
}
}
accum2 += d;
}
int main(void)
{
long i;
for (i = 0; i < 1000000; i++) {
arr();
poi();
swi();
}
printf("%.20g, %.20g, %.20g\n", accum0, accum1, accum2);
return 0;
}
-
- Posts: 1922
- Joined: Thu Mar 09, 2006 12:51 am
- Location: Earth
Re: DTS Structure
That's quite close to the way it used to be in ZCT (apparently before the first public release). I had a big switch at the top, each statement just a goto (or a return in one case). Having it at the top is also marginally more flexible, you can call qsearch directly, and for when children join a node at a split point, they can go straight into the move loop.Onno Garms wrote:Iterative search is also useful for YBW. Also, it helps profiling because you don't have cycles in the call tree.Edmund wrote:A main difference moving to the Dynamic Tree Splitting Algorithm is that the whole search has to be laid out iterative instead of recursive. Now my question is, how can this be achieved best?
I personally don't like the macro based solutions for iterative search from other programs.
EDIT: because I dislike macros in general and try to minimize their use. I also dislike a huge case switch. The latter was fine for qsearch, but as a case switch cannot jump into the middle of loops, it requires ugly adaptions of the code.
I thought quite a long time, how to write a readable iterative search. I found a macro free solution. The price are - gotos. I hadn't used goto since my youth with C64 BASIC (not counting jumping out of nested loops) but for an iterative search I found gotos really useful.
Search looks like this:
Using gcc enhancements of C++ or using inline assembly in msvc, it is possible to store the address of the lables rather then the enum. This saves the switch at the end, one can directly jump to the label. However I found the inline assembly based solution in msvc slower then with the case switch. Also I could not find a way to port to 64 bit in msvc. So I decided to stay in the C++ standard and have the case switch.Code: Select all
namespace Label { enum _ {after_scout, after_research, ...}; } class Node { int alpha, beta; Move move; MoveList moves; int value; Label::_ ret_addr; }; search () { Node* node = &d_nodes[0]; Node* child = node+1; recurse: node->moves.init(); while ((node->move = node->moves.next())) { child->alpha = ... child->beta = ... child->ret_addr = Label::after_scout; ++node; ++child; goto recurse; // when we return here, node and child have their original values // and child is evaluated after_scout: if (child->value < -node->alpha) { child->alpha = ... child->beta = ... child->ret_addr = Label::after_research; ++node ++child; goto recurse; after_research: ... } if (cutoff) { node->value = ... goto node_done; } } node_done: --node; --child; switch (node->ret_addr) { case Label after_research: goto after_research; ... } }
Frankly going that far to avoid macros just makes the code ugly. At the very least you can wrap all the child-> stuff into a function, but you still have a goto and a label sitting around.
And of course switches can jump into the middle of a loop. Why couldn't they?