Way to reduce amount of stored data in engine

Sergei S. Markoff · Post by **Sergei S. Markoff** » Tue Sep 13, 2011 7:08 pm

Current eval functions and shallow searches are good predictors for the deep searches. So, when you're need to store tree (in openings book, EGTB or TT) it's possible to store _only_ moves that are differs from "obvious". Possible it can reduce amount of stored info by many times. Anyone tried this approach?

mcostalba · Post by **mcostalba** » Tue Sep 13, 2011 7:25 pm

Sergei S. Markoff wrote:Current eval functions and shallow searches are good predictors for the deep searches. So, when you're need to store tree (in openings book, EGTB or TT) it's possible to store _only_ moves that are differs from "obvious". Possible it can reduce amount of stored info by many times. Anyone tried this approach?

Given that in some position the best move is _not_ the obvious one you have anyhow to foreseen a TT entry record that has a slot for storing a move, so you could _theoretically_ save same space only if you foreseen two different TT entries structs, one when you have a move and one when you don't, but I don't see how you can fit 2 different record types in the same table. You could use 2 TT tables with different layouts, but it seems to me what I call "searching for troubles".

If instead you think of _not_ saving the TT entry altogether when the cut-off move is the most obvious one than you need to consider that TT probing is done well early in the search(), after that you still have null search, razoring IID and perhaps other stuff before you reach main loop where you try the "obvious" ttMove.

Don · Post by **Don** » Tue Sep 13, 2011 8:38 pm

Sergei S. Markoff wrote:Current eval functions and shallow searches are good predictors for the deep searches. So, when you're need to store tree (in openings book, EGTB or TT) it's possible to store _only_ moves that are differs from "obvious". Possible it can reduce amount of stored info by many times. Anyone tried this approach?

The problem with this approach is that you have to know whether a move was even considered to be in the book from the position in question. In other words the move to be consider "book" is either the move the computer likes, in the exception table, or neither - and there is no simple way to know if it's "neither."

Don · Post by **Don** » Tue Sep 13, 2011 9:34 pm

There is a way to represent a book using 1 byte per move. The old spraklen programs used this to conserve space. You basically build a tree where 6 bits represent the move (the Nth move in a sorted legal move list) and 2 bits shape the tree. You can think of the 2 bits as left and right parenthesis. Left parenthesis starts a new child node, right parenthesis closes the current node. So a book with 2 choices for white only looks like this:

(e4)(d4)

And can be expressed with 2 bytes.

Since it's recursive, you can build a sub book inside one:

(e4(e5))(d4)

Easily parsed by a computer of course. One limitation is that it cannot handle a position with over 64 moves but I'm pretty sure that is rare enough that you can build a practical book with it. 64 is not the actual limitation - if there are 70 legal moves you are still ok if the one you want is one of the first 64.

None of this is really interesting for books however unless you are building a program for a highly constrained toy computer with severely limited memory.

But your idea might be interesting in the context of an endgame database. Let's assume a bit base. If you can produce a function that gets the right answer the majority of the time, you would only have to store the exceptions. However a bit base is already very compact and it's difficult to improve on that. There is no regular pattern to the exception list so you have to store each one with many bits. So your function must be almost perfect if you expect substantial savings, perhaps needing to get it right 99% of the time or more. Keep in mind that many endings are mostly wins or draws anyway and are compressed - so you will beat your head against the wall trying to improve on this with heuristics.

I came up with an interesting data structure when building a perfect Rubiks cube solver many years ago. The solver found the shortest solution. It was a very difficult problem back when I did this because computers were slow and did not have much memory. Such an idea might work for an endgame database combined with a search. Here is how it worked for the cube:

1. Build a fixed size hash table.
2. From the solved cube, work backwards.
3. From each position hash the cube.
4. Don't worry about collisions in the hash table.
5. Store the current depth in the hash table entry IF it's the lowest depth that hashes here.

Each entry was only 4 bits.

So given any cube I could do a lookup in the hash table and tell you that the solution is at least N ply away. I shared many positions with a single entry by doing this. During the search this could prune away enormous sub-tree's because the search was iterative and I could stop searching if the remaining depth was less than this value. You pre-build this table with a depth first search until it is completely full and it can be any size, the bigger the better.

Sergei S. Markoff wrote:Current eval functions and shallow searches are good predictors for the deep searches. So, when you're need to store tree (in openings book, EGTB or TT) it's possible to store _only_ moves that are differs from "obvious". Possible it can reduce amount of stored info by many times. Anyone tried this approach?

Sergei S. Markoff · Post by **Sergei S. Markoff** » Tue Sep 13, 2011 10:03 pm

At practice you can use something like Huffman tree.
Let's assume that you have move generator with some clever ordering. For example, based on quiescence search. It produces a list of moves, where the 1st move is the best one, 2nd — next to the first and so on.
Then you just analyzes every node of your book and replaces move itself to the number in the list. Obviously, number 0 will be much frequent, 1 — the next and so on. Then you're creating Huffman tree that will provide your with optimal bit sequences for each number (or using arithmetic coding to pack data). In ideal case you can achive much less than 4 bits per move

Don · Post by **Don** » Wed Sep 14, 2011 1:43 am

Sergei S. Markoff wrote:At practice you can use something like Huffman tree.
Let's assume that you have move generator with some clever ordering. For example, based on quiescence search. It produces a list of moves, where the 1st move is the best one, 2nd — next to the first and so on.
Then you just analyzes every node of your book and replaces move itself to the number in the list. Obviously, number 0 will be much frequent, 1 — the next and so on. Then you're creating Huffman tree that will provide your with optimal bit sequences for each number (or using arithmetic coding to pack data). In ideal case you can achive much less than 4 bits per move

Huffman is a very compact way to express the moves themselves and a good idea, however you still have to have some structure to this, for example you need to be able to reference a position. If you use a tree structure your pointers are at least 32 bits. So how do you figure out where to start looking for a move?

Sergei S. Markoff · Post by **Sergei S. Markoff** » Wed Sep 14, 2011 2:35 am

Don wrote:Huffman is a very compact way to express the moves themselves and a good idea, however you still have to have some structure to this, for example you need to be able to reference a position. If you use a tree structure your pointers are at least 32 bits. So how do you figure out where to start looking for a move?

If the book is path-dependent (can't handle transpositions) — you can just go from the root till the current position. But of course it isn't good book.

For the path-independent book it's neccessary to create some index. In my old SmarThink book I just have <hash><move> pairs sorted by <hash> to perform binary search. Of course it's not too clever from the packing point of view. Of course there are several easy ways of compacting sorted sequence of hashes. Just store one of say 100 hashes as base and next 99 value deltas. Deltas will be easy to pack because all of them will be nearly same values because positions are uniformly distributed through the hash space. So just use the same Huffman/arithmetic coding. When you're going to find position, perform binary search through base hashes and then extract block of 99 hashes to find neccessary position.

Edmund · Post by **Edmund** » Wed Sep 14, 2011 7:15 pm

Sergei S. Markoff wrote:
Don wrote:Huffman is a very compact way to express the moves themselves and a good idea, however you still have to have some structure to this, for example you need to be able to reference a position. If you use a tree structure your pointers are at least 32 bits. So how do you figure out where to start looking for a move?
If the book is path-dependent (can't handle transpositions) — you can just go from the root till the current position. But of course it isn't good book.

For the path-independent book it's neccessary to create some index. In my old SmarThink book I just have <hash><move> pairs sorted by <hash> to perform binary search. Of course it's not too clever from the packing point of view. Of course there are several easy ways of compacting sorted sequence of hashes. Just store one of say 100 hashes as base and next 99 value deltas. Deltas will be easy to pack because all of them will be nearly same values because positions are uniformly distributed through the hash space. So just use the same Huffman/arithmetic coding. When you're going to find position, perform binary search through base hashes and then extract block of 99 hashes to find neccessary position.

The real question is where you want to save space. If you want to save space on the hard-disk you can use the packed huffman data and if you want to use it in game play you can convert it to a hashed position-move database.

Anyway I think this format is much more helpful for large databases of games, where the main purpose is just archiving and not position search.

Sergei S. Markoff · Post by **Sergei S. Markoff** » Wed Sep 14, 2011 7:48 pm

Edmund wrote:The real question is where you want to save space. If you want to save space on the hard-disk you can use the packed huffman data and if you want to use it in game play you can convert it to a hashed position-move database.

Anyway I think this format is much more helpful for large databases of games, where the main purpose is just archiving and not position search.

Packed structure can be used to search this tree -- there is no problem. Both ways -- on disk or in memory. You just need to perform binary search through "base" hashes and then extract neccessary hash/move block.

Don · Post by **Don** » Wed Sep 14, 2011 9:10 pm

Sergei S. Markoff wrote:
Edmund wrote:The real question is where you want to save space. If you want to save space on the hard-disk you can use the packed huffman data and if you want to use it in game play you can convert it to a hashed position-move database.

Anyway I think this format is much more helpful for large databases of games, where the main purpose is just archiving and not position search.
Packed structure can be used to search this tree -- there is no problem. Both ways -- on disk or in memory. You just need to perform binary search through "base" hashes and then extract neccessary hash/move block.

Another data structure to be considered is a bloom filter. I haven't done any serious requirements or feasibility study but it's based on set membership and the size of the Key is irrelevant. So you could store a moves as the tuple (position, mv)

Given a position P you query each legal move to see if the key (P, move) is in the set and if it is you consider it a book move.

One important issue is that this is a probabilistic data structure which can produce false positive but never false negatives. With 10 bits you have something less than 1% chance of a false positive (which is not nearly enough) but you can improve that enormously by using more bits or using a second bloom filter which is designed specifically to identify false positives. If the book is designed such that the program knows when it's out of book, you will be able to know with certainty which items will generate false matches. This will be orders of magnitude less than the number of moves in the book and can be handled with a second anti-bloom filter which would be very tiny in comparison. This can be taken to a third or fourth level if needed to prevent false positives.

I don't think you can do any better than this using such a simple data structure. You can store moves using Huffman coding very compactly, but you need to impose some type of indexing or other data structure to make it work. So I think if you wanted the simplest compact data structure it would be hard to beat this and you get the advantage of a hashed data structure that would automatically pick up transpositions.

Way to reduce amount of stored data in engine

Way to reduce amount of stored data in engine

Re: Way to reduce amount of stored data in engine

Re: Way to reduce amount of stored data in engine

Re: Way to reduce amount of stored data in engine

Re: Way to reduce amount of stored data in engine

Re: Way to reduce amount of stored data in engine

Re: Way to reduce amount of stored data in engine

Re: Way to reduce amount of stored data in engine

Re: Way to reduce amount of stored data in engine

Re: Way to reduce amount of stored data in engine