What does your code look like that chooses the sorting network function to call?
New Engine announcement: Pygmalion
Moderators: hgm, Rebel, chrisw
-
- Posts: 179
- Joined: Tue Jun 15, 2021 8:11 pm
- Full name: Emanuel Torres
Re: New Engine announcement: Pygmalion
[Moderation warning] This signature violated the rule against commercial exhortations.
-
- Posts: 2657
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: New Engine announcement: Pygmalion
Curious, there are not that many OpenCL engines out there. What kind of design did you choose? Did you couple multiple gpu-threads to one worker? What kind of move generation?R. Tomasi wrote: ↑Thu Sep 02, 2021 3:00 pm ...
Over the last years the "engine" was rewritten several times and ported to C++. I even played around with an OpenCL version in the hopes of harnessing the power of modern GPUs, but (as you may already guess) failed miserably at efficiently parallelizing the search in a way that fits these vector-based architectures of the GPUs. About two years ago I had a version that was playing reasonably strong (it consistently beat the FairyMax version bundled with Winboard at the time), but it was buggy, full of search instabilities and generally not anything I would show around without being ashamed of my work.
...
--
Srdja
-
- Posts: 307
- Joined: Wed Sep 01, 2021 4:08 pm
- Location: Germany
- Full name: Roland Tomasi
Re: New Engine announcement: Pygmalion
Code: Select all
template<typename VALUE, typename SCORE>
class sortAlgorithm
{
private:
/*
[...]
*/
typedef void SORTFUNCTION(VALUE* pValues, SCORE* pScores);
constexpr static inline SORTFUNCTION* m_Sort[31]
{
&sort_N2,
&sort_N3,
&sort_N4,
&sort_N5,
&sort_N6,
&sort_N7,
&sort_N8,
&sort_N9,
&sort_N10,
&sort_N11,
&sort_N12,
&sort_N13,
&sort_N14,
&sort_N15,
&sort_N16,
&sort_N17,
&sort_N18,
&sort_N19,
&sort_N20,
&sort_N21,
&sort_N22,
&sort_N23,
&sort_N24,
&sort_N25,
&sort_N26,
&sort_N27,
&sort_N28,
&sort_N29,
&sort_N30,
&sort_N31,
&sort_N32
};
constexpr static size_t sort_tail() noexcept
{
return 32;
}
public:
constexpr static void sortValues(VALUE* pValues, SCORE* pScores, const size_t length) noexcept
{
if (length > 1)
{
if (length <= sort_tail())
{
m_Sort[length - 2](pValues, pScores);
}
else
{
quickSort(0, length - 1, pValues, pScores);
}
}
}
}
-
- Posts: 307
- Joined: Wed Sep 01, 2021 4:08 pm
- Location: Germany
- Full name: Roland Tomasi
Re: New Engine announcement: Pygmalion
My Idea was to break down the processing of a node into different phases (prologue, make move, returned from move, loop next move, epilogue) and have a kernel that can process such a phase. Then in essence process all moves of a node in a vector-parallel fashion using these kernels. Trouble is, that depending on what phase the kernel is in it will take different time to execute it, such that all moves always wait for the one that takes the longest to process. Also, the different threads branch differently, which is a huge performance hit on a GPU (you typically want neighbouring workers to branch the same way).smatovic wrote: ↑Sun Sep 05, 2021 11:55 amCurious, there are not that many OpenCL engines out there. What kind of design did you choose? Did you couple multiple gpu-threads to one worker? What kind of move generation?R. Tomasi wrote: ↑Thu Sep 02, 2021 3:00 pm ...
Over the last years the "engine" was rewritten several times and ported to C++. I even played around with an OpenCL version in the hopes of harnessing the power of modern GPUs, but (as you may already guess) failed miserably at efficiently parallelizing the search in a way that fits these vector-based architectures of the GPUs. About two years ago I had a version that was playing reasonably strong (it consistently beat the FairyMax version bundled with Winboard at the time), but it was buggy, full of search instabilities and generally not anything I would show around without being ashamed of my work.
...
--
Srdja
Concerning the move generation: well, I simply ported my C++ code to C99 and replaced anything recursive by equivalent iterations (since you cannot do recursion on the GPU).
It simply did not work out the way I hoped. Although it did play chess, it became relatively quickly clear that this design is totally inefficient.
Edit: I may have the code lying around somewhere. If you're interested I can try to dig it out, but I don't want to promise you that I still have it - I may have lost that code. I'm not sure.
-
- Posts: 2657
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: New Engine announcement: Pygmalion
Not sure, maybe I had a similar idea, I called it SPPS, simple parallel processing scheme, couple one work-group, 128 threads, to one worker, process up to 128 moves of a node in parallel, store the moves as tree in a way that you can process the moves in parallel. SPPS was able to play chess, but IIRC I had to realize that I am wasting most of the threads during Quiescence-Search and such an approach can not work out...R. Tomasi wrote: ↑Sun Sep 05, 2021 12:05 pm My Idea was to break down the processing of a node into different phases (prologue, make move, returned from move, loop next move, epilogue) and have a kernel that can process such a phase. Then in essence process all moves of a node in a vector-parallel fashion using these kernels. Trouble is, that depending on what phase the kernel is in it will take different time to execute it, such that all moves always wait for the one that takes the longest to process. Also, the different threads branch differently, which is a huge performance hit on a GPU (you typically want neighbouring workers to branch the same way).
Concerning the move generation: well, I simply ported my C++ code to C99 and replaced anything recursive by equivalent iterations (since you cannot do recursion on the GPU).
It simply did not work out the way I hoped. Although it did play chess, it became relatively quickly clear that this design is totally inefficient.
Edit: I may have the code lying around somewhere. If you're interested I can try to dig it out, but I don't want to promise you that I still have it - I may have lost that code. I'm not sure.
--
Srdja
-
- Posts: 307
- Joined: Wed Sep 01, 2021 4:08 pm
- Location: Germany
- Full name: Roland Tomasi
Re: New Engine announcement: Pygmalion
Sounds like a similar concept, yes. Anyways, I dumped that idea relatively quickly. I think there still are ways to use the GPU in a chess engine, but imho it would have to be some sort of hybrid-aproach.smatovic wrote: ↑Sun Sep 05, 2021 12:31 pmNot sure, maybe I had a similar idea, I called it SPPS, simple parallel processing scheme, couple one work-group, 128 threads, to one worker, process up to 128 moves of a node in parallel, store the moves as tree in a way that you can process the moves in parallel. SPPS was able to play chess, but IIRC I had to realize that I am wasting most of the threads during Quiescence-Search and such an approach can not work out...R. Tomasi wrote: ↑Sun Sep 05, 2021 12:05 pm My Idea was to break down the processing of a node into different phases (prologue, make move, returned from move, loop next move, epilogue) and have a kernel that can process such a phase. Then in essence process all moves of a node in a vector-parallel fashion using these kernels. Trouble is, that depending on what phase the kernel is in it will take different time to execute it, such that all moves always wait for the one that takes the longest to process. Also, the different threads branch differently, which is a huge performance hit on a GPU (you typically want neighbouring workers to branch the same way).
Concerning the move generation: well, I simply ported my C++ code to C99 and replaced anything recursive by equivalent iterations (since you cannot do recursion on the GPU).
It simply did not work out the way I hoped. Although it did play chess, it became relatively quickly clear that this design is totally inefficient.
Edit: I may have the code lying around somewhere. If you're interested I can try to dig it out, but I don't want to promise you that I still have it - I may have lost that code. I'm not sure.
--
Srdja
-
- Posts: 307
- Joined: Wed Sep 01, 2021 4:08 pm
- Location: Germany
- Full name: Roland Tomasi
Re: New Engine announcement: Pygmalion
pygmalion-mechanics
This module provides the framework for defining moves and how they are supposed to be made/unmade. It builds on top of pygmalion-state. There is no move generation or legality checks going on, that's all in the pygmalion-dynamics module which builds on top of pygmalion-mechanics. The mechanics module is first and foremost responsible for describing what exactly a certain movetype is supposed to do with the board. It also provides everything needed to keep the move history of a game and bloomfilters for repetition checks.
As already done with the pygmalion-state module it is configured using a description template:
The template arguments have the following semantics:
Again it's using the CRTP for static polymorphism. The fromSquare and toSquare methods probably should be moved from this class into the inheriting class, since for some movetypes (like nullmove) these may not be well-defined. I plan to do that eventually, but for the moment I could not be bothered tidying this up. It's on the to-do list, though.
Building on top of that base-class the module defines some movetypes which one typically encounters in boardgames:
Now, since typically you will need more than one of these movetypes in a boardgame (in chess for example you need transport, captures, promotions, promocaptures) another class that inherits from move is provided: disjunctive move. What it does is multiplexing different movetypes into a single movetype.
In chess, for example, my final move class looks like that:
This turns out to be a quite compact way of storing moves, for chess a move will require 16bit storage size and it could even be extended with movetypes for offering/accepting draws without increasing the storage size.
This module provides the framework for defining moves and how they are supposed to be made/unmade. It builds on top of pygmalion-state. There is no move generation or legality checks going on, that's all in the pygmalion-dynamics module which builds on top of pygmalion-mechanics. The mechanics module is first and foremost responsible for describing what exactly a certain movetype is supposed to do with the board. It also provides everything needed to keep the move history of a game and bloomfilters for repetition checks.
As already done with the pygmalion-state module it is configured using a description template:
Code: Select all
template<typename MOVE, size_t COUNT_BITS_BLOOMFILTER, size_t COUNT_VALUES_BLOOMFILTER>
class descriptor_mechanics;
- COUNT_BITS_BLOOMFILTER defines how many bits of a position hash should be used for the repetition bloomfilter.
- COUNT_VALUES_BLOOMFILTER defines the range of the individual entries in the bloomfilter. Essentially this should be larger than the maximum number of moves a game can have, which obviously depends on the rules of the boardgame that is targetted.
- MOVE is the class that defines a move of the game. It must inherit from the move template/class that is also defined in the pygmalion-mechanics module. Afaik there is no elegant way in C++17 to enforce template arguments inheriting a certain base class. I think C++20 provides ways to do that, but I have not really looked into that, so don't take me at face-value here.
Code: Select all
template<typename BOARD, size_t COUNT_BITS, typename MOVEDATA, typename INSTANCE>
class move
{
public:
using instanceType = INSTANCE;
using boardType = BOARD;
using descriptorState = typename boardType::descriptorState;
#include <pygmalion-state/include_state.h>
constexpr static const size_t countBits{ COUNT_BITS };
using movebitsType = uint_t<countBits, false>;
using movedataType = MOVEDATA;
protected:
constexpr move() noexcept = default;
constexpr move(move&&) noexcept = default;
constexpr move(const move&) noexcept = default;
constexpr move& operator=(move&&) noexcept = default;
constexpr move& operator=(const move&) noexcept = default;
~move() noexcept = default;
public:
std::string name() const noexcept
{
return static_cast<const instanceType*>(this)->name_Implementation();
}
constexpr movedataType doMove(boardType& position, const movebitsType& moveBits) const noexcept
{
return static_cast<const instanceType*>(this)->doMove_Implementation(position, moveBits);
}
constexpr void undoMove(boardType& position, const movedataType& data) const noexcept
{
static_cast<const instanceType*>(this)->undoMove_Implementation(position, data);
}
bool parse(const boardType& position, std::string& text, movebitsType& moveBits) const noexcept
{
return static_cast<const instanceType*>(this)->parse_Implementation(position, text, moveBits);
}
std::string toString(const boardType& position, const movebitsType& moveBits) const noexcept
{
return static_cast<const instanceType*>(this)->toString_Implementation(position, moveBits);
}
constexpr squaresType otherOccupancyDelta(const boardType& position, const movebitsType& moveBits) const noexcept
{
return static_cast<const instanceType*>(this)->otherOccupancyDelta_Implementation(position, moveBits);
}
constexpr squaresType ownOccupancyDelta(const boardType& position, const movebitsType& moveBits) const noexcept
{
return static_cast<const instanceType*>(this)->ownOccupancyDelta_Implementation(position, moveBits);
}
constexpr squareType fromSquare(const boardType& position, const movebitsType& moveBits) const noexcept
{
return static_cast<const instanceType*>(this)->fromSquare_Implementation(position, moveBits);
}
constexpr squareType toSquare(const boardType& position, const movebitsType& moveBits) const noexcept
{
return static_cast<const instanceType*>(this)->toSquare_Implementation(position, moveBits);
}
};
Building on top of that base-class the module defines some movetypes which one typically encounters in boardgames:
- transportmove : a move that just transports a piece from one square to another
- capturemove: a move that transports a piece from one square to another and removes the piece that occupies the target square
- promotionmove: a move that removes the piece from it's initial square and drops a different piece on the target square
- promocapturemove: a move that removes the piece from it's initial square and drops a different piece on the target square after removing the piece that occupies the target square
- dropmove: a move that just spawns a piece on a square
- killmove: a move that removes a piece from a square
- nullmove: does nothing, except from changing which player is on the move
Now, since typically you will need more than one of these movetypes in a boardgame (in chess for example you need transport, captures, promotions, promocaptures) another class that inherits from move is provided: disjunctive move. What it does is multiplexing different movetypes into a single movetype.
In chess, for example, my final move class looks like that:
Code: Select all
class combinedmoves :
public pygmalion::mechanics::disjunctivemove<board, queenpromocapturemove, queenpromotionmove, rookpromocapturemove, rookpromotionmove, bishoppromocapturemove, bishoppromotionmove, knightpromocapturemove, knightpromotionmove, doublepushmove, enpassantmove, kingsidecastlemove, queensidecastlemove, capturemove, quietmove, nullmove>
-
- Posts: 323
- Joined: Tue Aug 31, 2021 10:32 pm
- Full name: tcusr
Re: New Engine announcement: Pygmalion
you can do a static_assert and use std::is_base_of to verify a class inherits from another one
-
- Posts: 307
- Joined: Wed Sep 01, 2021 4:08 pm
- Location: Germany
- Full name: Roland Tomasi
Re: New Engine announcement: Pygmalion
You're right! Thanks for pointing that out. I guess I'll incorporate static asserts of that kind. It's more or less a cosmetic thing, since when the inheritance does not fit, it will not compile anyways. But it's better to have one static assert failing instead of like 100 compiler errors without clear indication of what went wrong.
-
- Posts: 307
- Joined: Wed Sep 01, 2021 4:08 pm
- Location: Germany
- Full name: Roland Tomasi