Is there such a thing as branchless move generation?

Daniel Shawul · Post by **Daniel Shawul** » Sat Jun 16, 2012 1:17 pm

In 90s we just didn't store the moves at all. You generate 1 move and then search it. Worked for cpu's with 128 bytes RAM. The 128 bytes victory goes to Frans Morsch if i remember well

Doing anything in a cache is not clever. part of L1 for fast hashtable, that should be the max you want to store and what's closer to the root is allowed to probe the RAM, nothing else you'd want to store in the RAM...

Algorithmically the hashtable is always the problem to solve well.

It is a bit different for monte carlo since you need to pick one move _uniformly_ and search only that move. With alpha-beta scheme you use in the 90's you know you are going to come back to that ply and search the rest of the moves. So you can go sequentially on the move list. OTOH for MC ,first you have to count how many moves you have, which for hex is easy but not chess. Then you generate a random move index. Then you generate that move which again requires looping over all moves (without generating them) until you reach that index. So there are three loops over the move generator , kind of an overkill I know.And that is only for pseudo legal moves, generating a random legal move requires more effort.

Daniel Shawul · Post by **Daniel Shawul** » Sat Jun 16, 2012 1:27 pm

Are you sure 8 bits will work to store each move ? from&to will atleast require 12 bits. That will save me a lot of trouble as I am using a generous 32 bit for each move. Other than that I generate a single random move just like you did by skipping moves until I reach the target move.

diep · Post by **diep** » Sat Jun 16, 2012 1:30 pm

Daniel Shawul wrote:
In 90s we just didn't store the moves at all. You generate 1 move and then search it. Worked for cpu's with 128 bytes RAM. The 128 bytes victory goes to Frans Morsch if i remember well

Doing anything in a cache is not clever. part of L1 for fast hashtable, that should be the max you want to store and what's closer to the root is allowed to probe the RAM, nothing else you'd want to store in the RAM...

Algorithmically the hashtable is always the problem to solve well.
It is a bit different for monte carlo since you need to pick one move _uniformly_ and search only that move. With alpha-beta scheme you use in the 90's you know you are going to come back to that ply and search the rest of the moves. So you can go sequentially on the move list. OTOH for MC ,first you have to count how many moves you have, which for hex is easy but not chess. Then you generate a random move index. Then you generate that move which again requires looping over all moves (without generating them) until you reach that index. So there are three loops over the move generator , kind of an overkill I know.And that is only for pseudo legal moves, generating a random legal move requires more effort.

Since when are you in the business of generating legal moves, you're interested in doing Peter McKenzie's perft test faster or something?

I remember when he invented it in 90s and started posting the numbers of it, yet i don't think that very interesting.

Isn't your thing supposed to play chess somehow?

diep · Post by **diep** » Sat Jun 16, 2012 1:35 pm

Daniel Shawul wrote:Are you sure 8 bits will work to store each move ? from&to will atleast require 12 bits. That will save me a lot of trouble as I am using a generous 32 bit for each move. Other than that I generate a single random move just like you did by skipping moves until I reach the target move.

Well you can afford wasting 20 out of 32 bits on a gpu i guess.

Remember that you are in a big luxury position with so many kilobytes for each processing element / streamcore.

That is a big luxury position.

So what you need basically is a few registers for Zobrist, depth left, 1 bound value, and then a few registers for the position and/or list of pieces. That all fits easily inside the registers of a streamcore.

The rest is just for luxury purposes

The real bottleneck there is, is when you would want to write a tad bigger program, say using a tad bigger evaluation function than the Fruit/Rybka clones have, that part of the L1 for the instructions is rather little.

Real huge evaluation functions are tricky in gpgpu.

Just searching deep in beancounter means is quite possible though.

I never understood all the monte carlo attempts.

Daniel Shawul · Post by **Daniel Shawul** » Sat Jun 16, 2012 3:56 pm

Since when are you in the business of generating legal moves, you're interested in doing Peter McKenzie's perft test faster or something?

I remember when he invented it in 90s and started posting the numbers of it, yet i don't think that very interesting.

Isn't your thing supposed to play chess somehow?

You are missing the point. Monte Carlo requires you to generate a a single legal move uniformly among the exiting legal moves. Now what I described above is not even for legal move but for pseudo legal. If you want the correct uniform distribution over the legal moves, you have to select among the legal moves. What I do for legal move generation is kind of a hack that changes the uniform distribution. I don't check for legality of all moves while generating them. I select uniformly from the pseudo-legal moves as I described above, then test the selected move for legality. If it fails it is marked as tested so the generator will skip that move. So when you are in check for example, most moves will be invald and the legal move generation takes a bit of time. In the extreme case that you don't have any legal move (like when being in check mate) all bits will be set. That is what I use the bitsets for when generating a legal move.

diep · Post by **diep** » Sat Jun 16, 2012 4:51 pm

Daniel Shawul wrote:
Since when are you in the business of generating legal moves, you're interested in doing Peter McKenzie's perft test faster or something?

I remember when he invented it in 90s and started posting the numbers of it, yet i don't think that very interesting.

Isn't your thing supposed to play chess somehow?
You are missing the point. Monte Carlo requires you to generate a a single legal move uniformly among the exiting legal moves. Now what I described above is not even for legal move but for pseudo legal. If you want the correct uniform distribution over the legal moves, you have to select among the legal moves. What I do for legal move generation is kind of a hack that changes the uniform distribution. I don't check for legality of all moves while generating them. I select uniformly from the pseudo-legal moves as I described above, then test the selected move for legality. If it fails it is marked as tested so the generator will skip that move. So when you are in check for example, most moves will be invald and the legal move generation takes a bit of time. In the extreme case that you don't have any legal move (like when being in check mate) all bits will be set. That is what I use the bitsets for when generating a legal move.

I posted before i'm not a big fan of Monte Carlo.
What you describe however seems overcomplicating things too much to me.

Just make a special case for being in check.

Is there such a thing as branchless move generation?

Re: Zeta CL uses Magic Bitboards

Re: Is there such a thing as branchless move generation?

Re: Zeta CL uses Magic Bitboards

Re: Is there such a thing as branchless move generation?

Re: Zeta CL uses Magic Bitboards

Re: Zeta CL uses Magic Bitboards