Finally I got the negamax scores running, and now my little engine plays chess on a GPU.
http://zeta-chess.blogspot.com/
Still a lot of work to do... therefore I am looking for sponsors, hardware, money, whatever.
--
Srdja
How many nodes per second do you get with 128 threads?
With its current "SPPS" search, which is comparable to a Negamax without AlphaBeta pruning, Zeta achieves ~500,000 nodes per second with 128 threads.
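For readers unfamiliar with the term, here is a minimal Python sketch of a plain Negamax without AlphaBeta pruning. It is not Zeta's code (Zeta is written in OpenCL); the toy tree representation, the `negamax` name and the node counter are assumptions made up for illustration.

```python
from typing import List, Union

# Hypothetical toy game tree: a leaf is an int score seen from the side
# to move at that leaf, an inner node is a list of child subtrees.
Tree = Union[int, List["Tree"]]


def negamax(node: Tree, counter: List[int]) -> int:
    """Plain negamax: visits every node, no AlphaBeta cutoffs."""
    counter[0] += 1  # count visited nodes, as a nodes-per-second counter would
    if isinstance(node, int):
        return node  # leaf: static score for the side to move
    # A child's score is from the opponent's point of view, so negate it.
    return max(-negamax(child, counter) for child in node)


if __name__ == "__main__":
    visited = [0]
    tree: Tree = [[3, -2], [5, 1]]
    print("score:", negamax(tree, visited), "nodes visited:", visited[0])
```

Without pruning every node of the tree is visited, which is why the node count grows so quickly with depth.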
Did you calculate the speedup between 1 and 128 threads?
That's not possible in that way, because the SPPS search assumes a minimum of 128 threads.
Did you submit your engine to the AMD OpenCL coding competition innovation challenge: http://community.topcoder.com/amdapp/am ... etition-2/ ?
Not yet.
In your blog you wrote:
smatovic wrote:
With its current "SPPS" search, which is comparable to a Negamax without AlphaBeta pruning, Zeta achieves ~500,000 nodes per second with 128 threads.
No Quiescence Search, no Castling/En Passant moves. Very simple Eval.
As I thought, the SPPS search with 128 parallel threads has problems when the engine enters the Q-Search. In Q-Search only capture moves are considered, and the fewer moves there are, the more power I lose with SPPS.

What is the reason for this? Could you explain how your "SPPS" scheme is related to Q-Search? Since you do not want to split a QS node over different threads, there must be some other reason which I do not understand yet.
Very nice to read that you have managed to create a chess-playing engine on a GPU!
What is the reason for this? Could you explain how your "SPPS" scheme is related to Q-Search? Since you do not want to split a QS node over different threads, there must be some other reason which I do not understand yet.
With SPPS, a simple parallel processing scheme, I use 128 threads in parallel to process one board position with a maximum of 128 children (my personal assumption). Because each of these 128 threads generates at most 128 children, the next iteration has 128*128 children. This means the average occupancy of the 128 threads depends on the number of children per position. For example, if we get an average of 32 children per node during a chess game, then with SPPS I "lose" 128-32=96 idle threads, but I gain a SIMD-friendly process with no communication overhead.
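The occupancy arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope model under the stated 128-thread assumption, not part of Zeta; the names `GROUP_SIZE`, `occupancy` and `average_occupancy` and the sample child counts are made up for illustration.

```python
from typing import Iterable, Tuple

GROUP_SIZE = 128  # threads working on one position, as assumed in the post


def occupancy(children: int, group_size: int = GROUP_SIZE) -> Tuple[float, int]:
    """Return (fraction of busy threads, number of idle threads) for one node."""
    busy = min(children, group_size)  # one thread per child, capped at the group size
    return busy / group_size, group_size - busy


def average_occupancy(child_counts: Iterable[int], group_size: int = GROUP_SIZE) -> float:
    """Mean fraction of busy threads over a sequence of visited nodes."""
    counts = list(child_counts)
    if not counts:
        raise ValueError("need at least one node")
    return sum(min(c, group_size) for c in counts) / (len(counts) * group_size)


if __name__ == "__main__":
    frac, idle = occupancy(32)
    print(f"32 children: {frac:.0%} busy, {idle} threads idle")
    # Hypothetical node sequence: one normal node, then two capture-only nodes.
    print(f"average: {average_occupancy([32, 4, 2]):.1%}")
```

With 32 children per node, `occupancy(32)` gives 0.25 busy and 96 idle threads, matching the figure above. Nodes with only a handful of capture moves, as in Q-Search, push the average down much further.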
Does it work with any video card? Or just with nVidias? (perhaps you use CUDA?)
I use OpenCL, so it works on GPUs, CPUs, APUs...