I need help, performance wall!

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

aberent

I need help, performance wall!

Post by aberent »

I need help

I have been developing my own C# chess engine for over a year now. I had no prior experience in chess programming so I learned everything from various articles on the internet.

Unfortunately recently I hit a performance wall.

My chess engine can search happily to ply 5 in about 10-15 seconds per move. However at ply 5 its not a very strong chess engine. I am trying to get it to ply 7. However it just takes too long to play a game, often taking up to 2 minutes per move.

I have spent months examining my code, profiling, looking for anything that might be improved. Often I have found things to fix, shaving seconds per move here and there. Ultimately though the engine is still slow. At this point I think I have a design flaw.

Here is what I know.

The slowest portion of my chess engine, according to my profiler is the move generator. By far it takes up 50% of the cpu cycles. I have optimized it to heck and still no luck.

One of the shortcoming of my chess engine I am aware of is what is often refered to as the flooding technique. Basicaly instead of having a move and unmove method I simply make a copy of all 64 board squares. I know this is slower, however according to my profiler this code is responcible for aproximatly 5% of the cpu cycles. Implementing the move and unmove method is not going to solve my problem (I tried it)

The other short coming of my design is that I calculate the score of every possible move and sort, before I enter alpha beta. This makes my sort almost perfect, but it means that I have to generate moves and evaluate score for chess boards that might never be searched due to an alapha beta cutoff. I have also created an alternative version of my chess engine,that only evaluates the move it is on now (not all at once). This is faster but not all that much. Plus not having examined all the moves ahead of time, I have no idea if I am in a check mate (or check) . This makes tha alternative design weaker at the same ply.

At this point I really need some help. A while back I have started a blog documenting the development of my chess engine. All of the source code is available there. The blog is www.chessbin.com

Can you guys please have a look at let me know what I am doing wrong.

Thank you

Adam Berent
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: I need help, performance wall!

Post by bob »

aberent wrote:I need help

I have been developing my own C# chess engine for over a year now. I had no prior experience in chess programming so I learned everything from various articles on the internet.

Unfortunately recently I hit a performance wall.

My chess engine can search happily to ply 5 in about 10-15 seconds per move. However at ply 5 its not a very strong chess engine. I am trying to get it to ply 7. However it just takes too long to play a game, often taking up to 2 minutes per move.

I have spent months examining my code, profiling, looking for anything that might be improved. Often I have found things to fix, shaving seconds per move here and there. Ultimately though the engine is still slow. At this point I think I have a design flaw.

Here is what I know.

The slowest portion of my chess engine, according to my profiler is the move generator. By far it takes up 50% of the cpu cycles. I have optimized it to heck and still no luck.

One of the shortcoming of my chess engine I am aware of is what is often refered to as the flooding technique. Basicaly instead of having a move and unmove method I simply make a copy of all 64 board squares. I know this is slower, however according to my profiler this code is responcible for aproximatly 5% of the cpu cycles. Implementing the move and unmove method is not going to solve my problem (I tried it)

The other short coming of my design is that I calculate the score of every possible move and sort, before I enter alpha beta. This makes my sort almost perfect, but it means that I have to generate moves and evaluate score for chess boards that might never be searched due to an alapha beta cutoff. I have also created an alternative version of my chess engine,that only evaluates the move it is on now (not all at once). This is faster but not all that much. Plus not having examined all the moves ahead of time, I have no idea if I am in a check mate (or check) . This makes tha alternative design weaker at the same ply.

At this point I really need some help. A while back I have started a blog documenting the development of my chess engine. All of the source code is available there. The blog is www.chessbin.com

Can you guys please have a look at let me know what I am doing wrong.

Thank you

Adam Berent
Before you address performance, you have to make sure you are using the right algorithm. It sounds like you are not using alpha/beta, for starters???
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: I need help, performance wall!

Post by Dann Corbit »

aberent wrote:I need help

I have been developing my own C# chess engine for over a year now. I had no prior experience in chess programming so I learned everything from various articles on the internet.

Unfortunately recently I hit a performance wall.

My chess engine can search happily to ply 5 in about 10-15 seconds per move. However at ply 5 its not a very strong chess engine. I am trying to get it to ply 7. However it just takes too long to play a game, often taking up to 2 minutes per move.

I have spent months examining my code, profiling, looking for anything that might be improved. Often I have found things to fix, shaving seconds per move here and there. Ultimately though the engine is still slow. At this point I think I have a design flaw.

Here is what I know.

The slowest portion of my chess engine, according to my profiler is the move generator. By far it takes up 50% of the cpu cycles. I have optimized it to heck and still no luck.

One of the shortcoming of my chess engine I am aware of is what is often refered to as the flooding technique. Basicaly instead of having a move and unmove method I simply make a copy of all 64 board squares. I know this is slower, however according to my profiler this code is responcible for aproximatly 5% of the cpu cycles. Implementing the move and unmove method is not going to solve my problem (I tried it)

The other short coming of my design is that I calculate the score of every possible move and sort, before I enter alpha beta. This makes my sort almost perfect, but it means that I have to generate moves and evaluate score for chess boards that might never be searched due to an alapha beta cutoff. I have also created an alternative version of my chess engine,that only evaluates the move it is on now (not all at once). This is faster but not all that much. Plus not having examined all the moves ahead of time, I have no idea if I am in a check mate (or check) . This makes tha alternative design weaker at the same ply.

At this point I really need some help. A while back I have started a blog documenting the development of my chess engine. All of the source code is available there. The blog is www.chessbin.com

Can you guys please have a look at let me know what I am doing wrong.

Thank you

Adam Berent
The problem is not your move generator, despite your profile.
The problem will be in your search.
1. You must use some form of alpha-beta
2. You must use some form of null-move pruning

I will take a look at your engine.
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: I need help, performance wall!

Post by Dann Corbit »

Dann Corbit wrote:
aberent wrote:I need help

I have been developing my own C# chess engine for over a year now. I had no prior experience in chess programming so I learned everything from various articles on the internet.

Unfortunately recently I hit a performance wall.

My chess engine can search happily to ply 5 in about 10-15 seconds per move. However at ply 5 its not a very strong chess engine. I am trying to get it to ply 7. However it just takes too long to play a game, often taking up to 2 minutes per move.

I have spent months examining my code, profiling, looking for anything that might be improved. Often I have found things to fix, shaving seconds per move here and there. Ultimately though the engine is still slow. At this point I think I have a design flaw.

Here is what I know.

The slowest portion of my chess engine, according to my profiler is the move generator. By far it takes up 50% of the cpu cycles. I have optimized it to heck and still no luck.

One of the shortcoming of my chess engine I am aware of is what is often refered to as the flooding technique. Basicaly instead of having a move and unmove method I simply make a copy of all 64 board squares. I know this is slower, however according to my profiler this code is responcible for aproximatly 5% of the cpu cycles. Implementing the move and unmove method is not going to solve my problem (I tried it)

The other short coming of my design is that I calculate the score of every possible move and sort, before I enter alpha beta. This makes my sort almost perfect, but it means that I have to generate moves and evaluate score for chess boards that might never be searched due to an alapha beta cutoff. I have also created an alternative version of my chess engine,that only evaluates the move it is on now (not all at once). This is faster but not all that much. Plus not having examined all the moves ahead of time, I have no idea if I am in a check mate (or check) . This makes tha alternative design weaker at the same ply.

At this point I really need some help. A while back I have started a blog documenting the development of my chess engine. All of the source code is available there. The blog is www.chessbin.com

Can you guys please have a look at let me know what I am doing wrong.

Thank you

Adam Berent
The problem is not your move generator, despite your profile.
The problem will be in your search.
1. You must use some form of alpha-beta
2. You must use some form of null-move pruning

I will take a look at your engine.
ChessBinStarterKit seems to be just a GUI with no engine attached, and the engine is just a control but not attached to anything. If you publish the code to a working chess engine it will be much easier to profile things and help you in your search.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: I need help, performance wall!

Post by Sven »

I have taken a look into your source code which is, as Dann noted already, not present in some compact form via download but can be read step by step. First of all I want to mention that I like the way you describe the design and implementation of your program. It has been quite easy for me to read your code based on your descriptions.

Now that you have read my first sentences, you might already have an idea of my next sentence, and especially of the word it begins with. Yes, correct, it is "however" :-)

However, there is at least one thing which I consider to be simply not appropriate for designing a chess engine, and that is the way you deal with the generation of valid moves and their ordering at the beginning of each new node. What you seem to do at each node is:

- you generate all possible moves;
- you store a list of all *positions* resulting from making these moves;
- you check each of these positions for legality, and throw illegal ones out of the list;
- for the remaining list of valid successor positions, you evaluate each of these positions using your normal evaluation function;
- you sort the whole list of positions based on the score;
- finally, you search all moves in that order with a standard alpha-beta search.

If I got this about right then I think this is *by far* more than necessary. Let me tell you what you *should* do instead. I know this would turn out to be a major redesign, and this will hurt you for sure. But you should realize where you stay without it, I think.

- Generate all possible moves, and store the *moves* (not the resulting positions) in a move list. One "move" object represents kind of an "event" and contains at least the square numbers of the "from" and "to" squares plus the type of the promotion piece.
- Evaluate all these moves without actually making them on the board by assigning some simple, static value that can be retrieved very quickly from the move itself plus the current board state. Use this value for move ordering, instead of the evaluation function. Take care that most captures are ordered to the top of the move list. As a first approach, use the MVV/LVA principle to order captures, and some other heuristic, like history heuristic, to order non-captures. Also use the killer move heuristic to improve ordering of non-captures.
- Do not care about move legality in the context of move generation; instead, after making a pseudo-legal move on the board during search, check for legality and return an appropriate value for illegal positions. Due to the many cutoffs alpha-beta implies, this saves many legality checks for positions which never occur on the board.

It is true that there are also a lot of engines implementing a legal move generator. But they do it differently, with a lot of knowledge, for instance about pins or about which moves may be illegal at all and which not. They do not do "make - legality check - unmake" for all possible moves.

If you create a new version of your engine and do it this way, I'm pretty sure this will already give you a boost of about two or three plies.

You can do more, of course. I did not take enough time to read your quiescence search code carefully enough in order to judge about it. But at a first glance, it looks a little bit unusual for me, in that I don't recognize the typical, most important "stand pat" element in your quiescence search algorithm. It seems you are using the same search function for both full search and quiescence search, and I feel there must be something wrong with it. You need to implement quiescence search in a separate function where you start with calling the evaluation function, and if the score does not already cause a beta cutoff, try to improve on this score by searching all captures in an "intelligent" order (e.g. also MVV/LVA which is common).

These are some of the most obvious points for me, although there may be more.

Sven
CRoberson
Posts: 2055
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Re: I need help, performance wall!

Post by CRoberson »

2 minutes to achieve 7 ply means tree explosion -> you have a search problem. Telepath gets 19+ ply in 2 minutes. 14 ply in 7
seconds. I don't think a person with enough ability to code a chess program can make one slow enough (and I mean nps slow) to be that
slow. So, your problem has to be search.

Also, when asking for performance help, give us some performance numbers: Nodes per Second, search depth in T time on a given defined position,
plus total nodes (if possible a break down on Search nodes and QSearch nodes). With that, we can see things much quicker.

This is interesting: I met a high school kid from Atlanta a few months ago. He had the same issue - 2 minutes to complete 7 ply. He claimed
to be doing null move, alpha-beta and all the other best known algorithms. He was also testing for checkmate at every move generation.
I told him to drop that - it is a major waste of time. Also, he was running an elaborate idea on every move for move ordering.

If you are doing alpha-beta and you're still this slow, your move-ordering is likely bad. The other possibility is serious bugs.
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: I need help, performance wall!

Post by stegemma »

aberent wrote:...to get it to ply 7. However it just takes too long to play a game, often taking up to 2 minutes per move.
...move generator. By far it takes up 50% of the cpu cycles. I have optimized it to heck and still no luck.
...
Adam Berent
If the move generator takes 50% of the whole time, consider that optimizing this part can save only a part of the time. If you'll get a 3 time faster move generator then your whole time will becomes:

50% other things + 1/3 * 50% generator = 67%

that means that working hard you can find a move in 80 seconds instead of about 120 seconds. That's interesting but does'nt solve your problem.

As said by Bob, maybe there are something wrong in your alfa-beta or in other algorithms. The idea to evaluate any move is obviously wrong, as you said. This could give little benefit if you change it, but add this to a faster move generator, exact alfa-beta, a little Transposition Table and maybe Null Moves and you can get a faster program. Not only one of all of this stuff can give the answer to your problem, but a combination of all of them.

I would procede step by step, starting from the way you use generated moves: you should compute and save the less as you can, when you generate moves. You can complete them as soon as you execute them.

The base idea is to delay anything that could be delayed, everywhere and everytime. alfa beta can avoid at all what you have delayed...
aberent

Re: I need help, performance wall!

Post by aberent »

Thank you for your reply Sven ,

Yours was by far the best, as you actually looked at my code, the other people just replied with "use alpha beta" which as you know, I already do.

As I mentioned in my post I have done exactly what you suggested. I haven't posted that code on my website yet, since I am not quite happy with its performance.

Maybe I can email you that code so you can review it?

Essentially in the alternative implementation I do the following,

1. Get all the positions for all possible moves (Legal and Illegal)
2. Apply a value mainly by looking at captures and castling
3. Sort
4. Start a Loop through positions
5. Make First Move
6. Alpha Beta
7. If Cut-off Return
8. Else Loop

As a result here is the difference between the 2 implementations:

1. Ply 7 Alpha Beta All Valid Moves Ahead of Time from Starting Position

Nodes per Second: 32879
Total Time: 25 seconds
Total Nodes: 854449

2. Ply 7 Alpha Beta One Valid Move at a Time from Starting Position

Nodes per Second: 132486
Total Time: 45 seconds
Total Nodes: 6090649

As you see the one move implementation is faster in nodes per second, but because the opening position has no captures, it is not trying the best move first. This gets better in the middle game, however not to the point where I can get 1 minute per move average.

How else can I sort before examining each position, please help

For everyone else who could not find my code the url is:

http://www.chessbin.com/page/Developmen ... ngine.aspx
Fguy64
Posts: 814
Joined: Sat May 09, 2009 4:51 pm
Location: Toronto

Re: I need help, performance wall!

Post by Fguy64 »

i haven't looked at your code, I'm not a c++ guy, but as far as I can see your list of steps for your alternative method does shed some light on the problem. I see possibly quite a bit of unnecessary work there, I'm not sure you want to be storing all those positions in step 1, then evaluating them before you even call alphabeta. That seems counter productive to me.

Give some thought as to how things might work out if a call to alphaBeta was the very first thing on the list. At least that's what I do, and I'm quite satisfied with the performance.

My source code (java) is available through the link in my signature. You are welcome to have a look if you think it might be useful to you.

Good luck, I'm sure Sven will set you straight.
aberent

Re: I need help, performance wall!

Post by aberent »

Maybe you can describe, how do you sort the positions before calling alpha bata?

You must have some quick evaluation or else how do you search for the more likely best moves first?