Direction of travel

Guetti · Post by **Guetti** » Mon Jan 21, 2008 9:26 pm

Gerd Isenberg wrote:
Guetti wrote:I'm wondering whether there is an advantage of this method except speed? i.e. Does one need to access the movelist of ply x-1 at ply x?
It might be interesting to know, whether the move about to make, was already tried two plies before, but obviously failed to cut. Probably considering some "connections" of the two moves played instead with this move about to make. Another late pruning measure?

Hm, don't some kind of heuristics usually take care of that?

Guetti wrote:Or can I just keep a local movelist in the search for every ply?

Since we have ...-cut-all-cut-all... and rarely more than maxmoves/2 at all- and pv-nodes, we'll have some gaps or "defragmentation" of pages and cachelines with two dimensional maxply * maxmovesPerPly arrays. IIRC Dieter Buerssner used local arrays on the stack.

Sorry, now you lost me. Can you explain this to a complete beginner like me?

Robert Pope · Post by **Robert Pope** » Mon Jan 21, 2008 9:36 pm

bob wrote:
Robert Pope wrote:Maybe I shouldn't care. It just seems cleaner than having to explicitly size the array each time.
Don't do it that way. Declare one large array. then for each ply, have a pointer into the list to point to the last move for that ply. The next ply will start at that point +1 and have a pointer to the end of it's move list. move_list[5000] will cover any case you can hope to search. And it will be a bit more efficient as well...

Thanks for the idea -- once I get the perft validated, I may give it a try to switch over.

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Jan 21, 2008 10:12 pm

Guetti wrote:
Gerd Isenberg wrote:
Guetti wrote:I'm wondering whether there is an advantage of this method except speed? i.e. Does one need to access the movelist of ply x-1 at ply x?
It might be interesting to know, whether the move about to make, was already tried two plies before, but obviously failed to cut. Probably considering some "connections" of the two moves played instead with this move about to make. Another late pruning measure?
Hm, don't some kind of heuristics usually take care of that?

which?

Guetti wrote:Or can I just keep a local movelist in the search for every ply?

Since we have ...-cut-all-cut-all... and rarely more than maxmoves/2 at all- and pv-nodes, we'll have some gaps or "defragmentation" of pages and cachelines with two dimensional maxply * maxmovesPerPly arrays. IIRC Dieter Buerssner used local arrays on the stack.
Sorry, now you lost me. Can you explain this to a complete beginner like me?

I mean a two dimensional array

MOVE movelistByPly[MAX_PLIES][256]

So 256 moves per ply preallocated, let say 4 bytes per move - 1KByte per ply. 4KByte pages are virtually mapped into physical memory and caches, each page takes one entry inside the TLB.

At cut nodes you'll likely generate no moves at all - or maybe only a few winning captures or killers. Otherwise at all-nodes you'll need to generate let say 30..40 moves in general, but you'll need to pre-allocate the array for worst case.

Thus, inside a 4K page, likely about 80 moves (4 plies) are stored - instead of 1024. Which is worse because you'll need more pages. Another issue might be (hardware) pre-fetching of cachelines.

In general, keeping things together improves cache and memory utilization and performance.

Guetti · Post by **Guetti** » Mon Jan 21, 2008 10:39 pm

Gerd Isenberg wrote:
Guetti wrote: Hm, don't some kind of heuristics usually take care of that?
which?

Sorry, I was thinking about history tables or refutation tables.

Guetti wrote:Or can I just keep a local movelist in the search for every ply?

Since we have ...-cut-all-cut-all... and rarely more than maxmoves/2 at all- and pv-nodes, we'll have some gaps or "defragmentation" of pages and cachelines with two dimensional maxply * maxmovesPerPly arrays. IIRC Dieter Buerssner used local arrays on the stack.
Sorry, now you lost me. Can you explain this to a complete beginner like me?

I mean a two dimensional array

MOVE movelistByPly[MAX_PLIES][256]

So 256 moves per ply preallocated, let say 4 bytes per move - 1KByte per ply. 4KByte pages are virtually mapped into physical memory and caches, each page takes one entry inside the TLB.

At cut nodes you'll likely generate no moves at all - or maybe only a few winning captures or killers. Otherwise at all-nodes you'll need to generate let say 30..40 moves in general, but you'll need to pre-allocate the array for worst case.

Thus, inside a 4K page, likely about 80 moves (4 plies) are stored - instead of 1024. Which is worse because you'll need more pages. Another issue might be (hardware) pre-fetching of cachelines.

In general, keeping things together improves cache and memory utilization and performance.

Ok, I understand now. Thanks.
I will try to switch to the method Bob mentioned. What initially scared me a bit is the fact, that when I accidentally misalign the pointer of the position in the array, I start to overwrite previous moves.

Guetti · Post by **Guetti** » Tue Jan 22, 2008 3:42 pm

I thought about this some, and I think I leave it as it is for the time being. There is no sense in prematurely optimizing too much. I tried the approach of the 2-dim array you were mentioning, but I could not observe a big difference in speed compared to the creation of a move[265] array with every call of search even for perft. Because I not only use a list of moves, but also a sorting score for every move, the current method seems much more convenient for me.
I also had a look at other engines about that, and i.e. glaurung also creates a class containing an move[256] array among other stuff with every call to search, if I understood correctly.

- Andy

Direction of travel

Re: Direction of travel

Re: Direction of travel

Re: Direction of travel

Re: Direction of travel

Re: Direction of travel