I need help, performance wall!

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Sven
Posts: 3853
Joined: Thu May 15, 2008 7:57 pm
Location: Berlin, Germany
Full name: Sven Schüle
Contact:

Re: I need help, performance wall!

Post by Sven » Wed Oct 21, 2009 9:50 pm

I have taken a look into your source code which is, as Dann noted already, not present in some compact form via download but can be read step by step. First of all I want to mention that I like the way you describe the design and implementation of your program. It has been quite easy for me to read your code based on your descriptions.

Now that you have read my first sentences, you might already have an idea of my next sentence, and especially of the word it begins with. Yes, correct, it is "however" :-)

However, there is at least one thing which I consider to be simply not appropriate for designing a chess engine, and that is the way you deal with the generation of valid moves and their ordering at the beginning of each new node. What you seem to do at each node is:

- you generate all possible moves;
- you store a list of all *positions* resulting from making these moves;
- you check each of these positions for legality, and throw illegal ones out of the list;
- for the remaining list of valid successor positions, you evaluate each of these positions using your normal evaluation function;
- you sort the whole list of positions based on the score;
- finally, you search all moves in that order with a standard alpha-beta search.

If I got this about right then I think this is *by far* more than necessary. Let me tell you what you *should* do instead. I know this would turn out to be a major redesign, and this will hurt you for sure. But you should realize where you stay without it, I think.

- Generate all possible moves, and store the *moves* (not the resulting positions) in a move list. One "move" object represents kind of an "event" and contains at least the square numbers of the "from" and "to" squares plus the type of the promotion piece.
- Evaluate all these moves without actually making them on the board by assigning some simple, static value that can be retrieved very quickly from the move itself plus the current board state. Use this value for move ordering, instead of the evaluation function. Take care that most captures are ordered to the top of the move list. As a first approach, use the MVV/LVA principle to order captures, and some other heuristic, like history heuristic, to order non-captures. Also use the killer move heuristic to improve ordering of non-captures.
- Do not care about move legality in the context of move generation; instead, after making a pseudo-legal move on the board during search, check for legality and return an appropriate value for illegal positions. Due to the many cutoffs alpha-beta implies, this saves many legality checks for positions which never occur on the board.

It is true that there are also a lot of engines implementing a legal move generator. But they do it differently, with a lot of knowledge, for instance about pins or about which moves may be illegal at all and which not. They do not do "make - legality check - unmake" for all possible moves.

If you create a new version of your engine and do it this way, I'm pretty sure this will already give you a boost of about two or three plies.

You can do more, of course. I did not take enough time to read your quiescence search code carefully enough in order to judge about it. But at a first glance, it looks a little bit unusual for me, in that I don't recognize the typical, most important "stand pat" element in your quiescence search algorithm. It seems you are using the same search function for both full search and quiescence search, and I feel there must be something wrong with it. You need to implement quiescence search in a separate function where you start with calling the evaluation function, and if the score does not already cause a beta cutoff, try to improve on this score by searching all captures in an "intelligent" order (e.g. also MVV/LVA which is common).

These are some of the most obvious points for me, although there may be more.

Sven

CRoberson
Posts: 2008
Joined: Mon Mar 13, 2006 1:31 am
Location: North Carolina, USA
Contact:

Re: I need help, performance wall!

Post by CRoberson » Wed Oct 21, 2009 10:07 pm

2 minutes to achieve 7 ply means tree explosion -> you have a search problem. Telepath gets 19+ ply in 2 minutes. 14 ply in 7
seconds. I don't think a person with enough ability to code a chess program can make one slow enough (and I mean nps slow) to be that
slow. So, your problem has to be search.

Also, when asking for performance help, give us some performance numbers: Nodes per Second, search depth in T time on a given defined position,
plus total nodes (if possible a break down on Search nodes and QSearch nodes). With that, we can see things much quicker.

This is interesting: I met a high school kid from Atlanta a few months ago. He had the same issue - 2 minutes to complete 7 ply. He claimed
to be doing null move, alpha-beta and all the other best known algorithms. He was also testing for checkmate at every move generation.
I told him to drop that - it is a major waste of time. Also, he was running an elaborate idea on every move for move ordering.

If you are doing alpha-beta and you're still this slow, your move-ordering is likely bad. The other possibility is serious bugs.

User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 8:05 pm
Location: Italy
Full name: Stefano Gemma
Contact:

Re: I need help, performance wall!

Post by stegemma » Thu Oct 22, 2009 10:34 am

aberent wrote:...to get it to ply 7. However it just takes too long to play a game, often taking up to 2 minutes per move.
...move generator. By far it takes up 50% of the cpu cycles. I have optimized it to heck and still no luck.
...
Adam Berent
If the move generator takes 50% of the whole time, consider that optimizing this part can save only a part of the time. If you'll get a 3 time faster move generator then your whole time will becomes:

50% other things + 1/3 * 50% generator = 67%

that means that working hard you can find a move in 80 seconds instead of about 120 seconds. That's interesting but does'nt solve your problem.

As said by Bob, maybe there are something wrong in your alfa-beta or in other algorithms. The idea to evaluate any move is obviously wrong, as you said. This could give little benefit if you change it, but add this to a faster move generator, exact alfa-beta, a little Transposition Table and maybe Null Moves and you can get a faster program. Not only one of all of this stuff can give the answer to your problem, but a combination of all of them.

I would procede step by step, starting from the way you use generated moves: you should compute and save the less as you can, when you generate moves. You can complete them as soon as you execute them.

The base idea is to delay anything that could be delayed, everywhere and everytime. alfa beta can avoid at all what you have delayed...

aberent

Re: I need help, performance wall!

Post by aberent » Thu Oct 22, 2009 3:03 pm

Thank you for your reply Sven ,

Yours was by far the best, as you actually looked at my code, the other people just replied with "use alpha beta" which as you know, I already do.

As I mentioned in my post I have done exactly what you suggested. I haven't posted that code on my website yet, since I am not quite happy with its performance.

Maybe I can email you that code so you can review it?

Essentially in the alternative implementation I do the following,

1. Get all the positions for all possible moves (Legal and Illegal)
2. Apply a value mainly by looking at captures and castling
3. Sort
4. Start a Loop through positions
5. Make First Move
6. Alpha Beta
7. If Cut-off Return
8. Else Loop

As a result here is the difference between the 2 implementations:

1. Ply 7 Alpha Beta All Valid Moves Ahead of Time from Starting Position

Nodes per Second: 32879
Total Time: 25 seconds
Total Nodes: 854449

2. Ply 7 Alpha Beta One Valid Move at a Time from Starting Position

Nodes per Second: 132486
Total Time: 45 seconds
Total Nodes: 6090649

As you see the one move implementation is faster in nodes per second, but because the opening position has no captures, it is not trying the best move first. This gets better in the middle game, however not to the point where I can get 1 minute per move average.

How else can I sort before examining each position, please help

For everyone else who could not find my code the url is:

http://www.chessbin.com/page/Developmen ... ngine.aspx

Fguy64
Posts: 814
Joined: Sat May 09, 2009 2:51 pm
Location: Toronto
Contact:

Re: I need help, performance wall!

Post by Fguy64 » Thu Oct 22, 2009 3:21 pm

i haven't looked at your code, I'm not a c++ guy, but as far as I can see your list of steps for your alternative method does shed some light on the problem. I see possibly quite a bit of unnecessary work there, I'm not sure you want to be storing all those positions in step 1, then evaluating them before you even call alphabeta. That seems counter productive to me.

Give some thought as to how things might work out if a call to alphaBeta was the very first thing on the list. At least that's what I do, and I'm quite satisfied with the performance.

My source code (java) is available through the link in my signature. You are welcome to have a look if you think it might be useful to you.

Good luck, I'm sure Sven will set you straight.

aberent

Re: I need help, performance wall!

Post by aberent » Thu Oct 22, 2009 4:33 pm

Maybe you can describe, how do you sort the positions before calling alpha bata?

You must have some quick evaluation or else how do you search for the more likely best moves first?

Fguy64
Posts: 814
Joined: Sat May 09, 2009 2:51 pm
Location: Toronto
Contact:

Re: I need help, performance wall!

Post by Fguy64 » Thu Oct 22, 2009 4:41 pm

aberent wrote:Maybe you can describe, how do you sort the positions before calling alpha bata?

You must have some quick evaluation or else how do you search for the more likely best moves first?
I don't sort positions, I sort moves. That's a very important distinction.

Also, there is no moveOrdering before the initial call to alphaBeta, only before the recursive calls.

I use MVV/LVA move ordering. in which captures are evaluated before regular moves. That is pretty standard. And you only need to know the current position (before the moves) not the position after.

Your strategy may indeed result in a better move ordering, but it sounds like it might be defeating the purpose, in that the work is needed to provide the ordering outweighs the work saved by using the ordering, do you see what I mean? Also, I suspect that your alphaBeta pruning might not be working as you expect, but I am not sure on that point.

I found the following site very helpful for all of this stuff.

http://web.archive.org/web/200404032117 ... /index.htm
Last edited by Fguy64 on Thu Oct 22, 2009 4:45 pm, edited 1 time in total.

User avatar
hgm
Posts: 24615
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: I need help, performance wall!

Post by hgm » Thu Oct 22, 2009 4:43 pm

First search the PV move of the previous iteration (when you are still on that PV). Next capture the most valuable piece that you can capture. That should more or less do it, all the rest is refinement.

aberent

Re: I need help, performance wall!

Post by aberent » Thu Oct 22, 2009 4:49 pm

hgm wrote:First search the PV move of the previous iteration (when you are still on that PV). Next capture the most valuable piece that you can capture. That should more or less do it, all the rest is refinement.
What does PV move stand for (Principle Variation)? Can you please explain in more detail, sorry I am still learning.

Aleks Peshkov
Posts: 882
Joined: Sun Nov 19, 2006 8:16 pm
Location: Russia

Re: I need help, performance wall!

Post by Aleks Peshkov » Thu Oct 22, 2009 4:49 pm

If you actually sort (shuffle) whole _positions_ it is an obvious bottleneck.

Post Reply