Engine vs Engine - Plays Same Moves

Post by **Cheney** » Thu Feb 09, 2017 7:12 pm

Hi Everyone!

I decided to put some time into my chess project again (it has been sitting on the digital shelf for a year), and while doing some testing I noticed something I believe I also saw back then. Maybe you have an idea what it is, if it is anything at all? There are two parts...

When engines A and B play, they play the same moves for 20+ moves. Where the positions fork, the next 10-15+ moves are also the same, and then there is another fork. Thus, the game results are always the same: if they play 50 games, there will actually be only about 5 different games up to about move 40. The kicker is that if engine A loses one game, it pretty much loses them all.

Is this to be expected? I know there is some "randomness" in there, but not by my design. I do not have an opening book and am using the same game times. I believe this is expected because the engine does not "learn" between games, and that the answer for me is to add more intelligence, perhaps specifically to clock management.

The other oddity is that I can have these same two engines play each other on a slightly different computer. Engine A, which loses 90% of the games on computer #1, wins 60% of the games on computer #2. I am not a computer architect, but I know they are both i5s with 3 MB cache, and the second one (where engine A wins 60%) is slightly faster (2.5 GHz vs 2.6 GHz).

Is this a computer architecture thing? Maybe the opponent engine is tuned for a specific type of processor where my engine is not? Maybe because of that little extra speed in the CPU, my programming is a little more efficient than that of the opponent's engine?

Any insights are greatly appreciated.

Thanks!

Post by **AlvaroBegue** » Thu Feb 09, 2017 7:21 pm

Differences between architectures do exist, but that's a distraction from your main problem: Your testing setup is useless because there is not enough variation in the games.

The obvious solution is to use a database of positions to start from (I believe pretty much everybody does this for testing). You can also add a small random number to your evaluation function to encourage diversity.
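As a minimal sketch of the evaluation-noise idea (Python only for illustration; the function names, the centipawn scores and the 5 cp amplitude are hypothetical, not taken from any engine in this thread), a small uniform random offset is added to each score before the best move is chosen. For simplicity it is applied here to a flat list of root scores rather than inside a real search. Near-equal moves then get picked in varying proportions, while a move more than twice the amplitude worse can never win:

```python
import random

# Hypothetical static scores (centipawns) for three nearly equal root moves.
CANDIDATES = {"e2e4": 31, "d2d4": 30, "g1f3": 29}


def noisy_eval(static_cp: int, rng: random.Random, amplitude_cp: int = 5) -> int:
    """Static evaluation plus a uniform random offset in [-amplitude_cp, +amplitude_cp]."""
    return static_cp + rng.randint(-amplitude_cp, amplitude_cp)


def pick_move(candidates: dict[str, int], rng: random.Random, amplitude_cp: int = 5) -> str:
    """Return the move whose noisy score is highest (ties go to the first move listed)."""
    return max(candidates, key=lambda move: noisy_eval(candidates[move], rng, amplitude_cp))


if __name__ == "__main__":
    # amplitude_cp=0 reproduces the deterministic choice;
    # random.Random() without a seed gives a different sequence on each run.
    print(pick_move(CANDIDATES, random.Random(), amplitude_cp=0))
    print(pick_move(CANDIDATES, random.Random()))
```

Keeping the amplitude small is the design choice that limits the strength loss: only moves that are already almost equal get shuffled.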

Post by **ZirconiumX** » Thu Feb 09, 2017 7:23 pm

Nothing is wrong, your engine is working as intended.

As a general rule, engines tend to be deterministic on a single thread. This is useful because it aids debugging to easily reproduce an issue.

If you want randomness, you'll need an opening book, but if you exhaust that, you will eventually end up with repeated games.

Post by **Cheney** » Fri Feb 10, 2017 1:39 am

Thanks guys! When I last worked on this (a year ago), I had an opening book, but I think it was giving me some grief, so I took it out. It did create varied games, though. I will bring it back into this version after some more testing.

As for the different computer architectures, it just baffles me that I can have my engine play another engine on one computer and it consistently wins 40 out of 50, but when I have them play on another computer it loses by almost the same margin. This could be related to the same idea of repeating the same games, but with the environment otherwise equal, is it really possible the engine is just having a bad day?

There has to be something more to it than that; maybe I should compile my code on that second computer and give it a try, just guessing here.

Thanks again!

Post by **Ras** » Fri Feb 10, 2017 1:51 am

Cheney wrote: Thanks guys! When I last worked on this (a year ago), I had an opening book

You should.

but think it was giving me some grief

That is a bug in the opening book. It can happen if you don't make sure that the book actually matches your engine. E.g. with my CT800, the Dutch opening with Black is a predictable disaster; that's why I relegated it to passive knowledge. Closed positions are also bad.

It's a lot of work to have the book match the engine. Even if top-range engines are happy with some openings, that doesn't mean your engine is fine, too.

This could be related to the same idea of repeating the same games

It is. If you are going multi-threaded, then the order in which moves get searched depends on the architecture. If you only have, say, 5 different games, then it is like throwing dice which architecture will give you the "nice" 5 games.

My advice: get an opening book. First, it will be more fun to play your engine. Second, if the book matches your engine style, the engine will be stronger.

Post by **hgm** » Fri Feb 10, 2017 8:00 am

Cheney wrote: As for the different architectures of computers, it just baffles me that I can have my engine play another engine on one computer and it consistently win 40 out of 50 but when I have them play on another computer it loses by almost the same margin.

If your engine is single-threaded and doesn't make any decision that depends on timing (e.g. it uses a fixed depth, or searches a given number of nodes), then it should be 100% deterministic and play exactly the same on all computers. But if its decisions depend on timing, the speed of the computer becomes a factor. If your engine aborts the search after a timeout, having the timeout occur after a different number of nodes has been searched would leave different information in the hash table, which could affect later moves. And if you always complete iterations, the decision whether there is still time for a next iteration sometimes has to be taken very close to the time limit for this, so that a slight change in speed can cause the engine to search one ply deeper. And sometimes it prefers a different move in such a deeper search.
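To make the iteration-boundary effect concrete, here is a toy model (Python; every parameter is hypothetical). Iteration d is assumed to cost 1000 · 3^(d-1) nodes, a new iteration is only started while less than half the time budget has been used, and an iteration is always completed once started. The 2.5 vs 2.6 GHz clocks from the first post are modelled, as a further assumption, as a 4% difference in nodes per second:

```python
def completed_depth(nps: float, budget_s: float, root_nodes: int = 1000,
                    branching: float = 3.0, start_fraction: float = 0.5,
                    max_depth: int = 64) -> int:
    """Toy iterative deepening that always completes an iteration once it is started.

    Iteration d is assumed to cost root_nodes * branching**(d - 1) nodes, and a new
    iteration is only started while less than start_fraction of the budget is used.
    """
    elapsed = 0.0
    depth = 0
    while depth < max_depth and elapsed < start_fraction * budget_s:
        depth += 1
        elapsed += root_nodes * branching ** (depth - 1) / nps
    return depth


if __name__ == "__main__":
    print(completed_depth(1_000_000, 0.72))  # slower machine: 6
    print(completed_depth(1_040_000, 0.72))  # 4% faster machine: 7
```

With a 0.72 s budget, the slower machine has used 0.364 s after depth 6, just over the 0.36 s start limit, so it stops; the faster one has used 0.35 s and squeezes in depth 7. If the extra ply changes the preferred move, the games diverge from that point on.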

These sources of randomness are very weak, though. I once measured this, and on average only one in 40 moves was different. This is the reason why you get results like 70-30 rather than always 100-0, 0-100 or 50-50. You can drive up the randomness by randomizing the move order of non-captures at the root, or by adding a random score to every root move (but this will weaken the engine a little) or to every evaluation. The random generator should then be initialized from something that varies from run to run (like the start time), however, or this would not help.
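A minimal sketch of the first suggestion, randomizing the order of non-captures at the root with the generator seeded from the start time. It assumes (hypothetically) that moves are SAN-style strings in which an "x" marks a capture, and that captures keep their original order ahead of the quiet moves. In an alpha-beta search the first of several equally scored moves is usually the one kept, so shuffling the quiet moves can change which of them gets played without changing any scores:

```python
import random
import time
from typing import Callable


def shuffle_root_quiets(root_moves: list[str], is_capture: Callable[[str], bool],
                        rng: random.Random) -> list[str]:
    """Keep captures first in their original order; shuffle the non-captures behind them."""
    captures = [m for m in root_moves if is_capture(m)]
    quiets = [m for m in root_moves if not is_capture(m)]
    rng.shuffle(quiets)
    return captures + quiets


def is_capture_san(move: str) -> bool:
    """Hypothetical capture test for SAN-style strings such as 'Nxe5'."""
    return "x" in move


# Seed once per run from something that varies (here the start time),
# and log the seed so that an interesting game can be reproduced later.
SEED = time.time_ns()
rng = random.Random(SEED)

if __name__ == "__main__":
    moves = ["Nxe5", "Nf3", "d4", "c4", "Bxf7+", "e4"]
    print(SEED, shuffle_root_quiets(moves, is_capture_san, rng))
```

Passing a fixed seed instead of the time gives back fully reproducible games, which is what you want when debugging.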

As already pointed out, you can start from a variety of positions or opening lines (or use a variety of opponents) to avoid this problem altogether. Almost every GUI will allow you to select a file with positions (FEN) or opening lines (PGN) to be used in an engine-engine match, or supports a book from which it randomly picks moves to be fed to the playing engines, so that you don't have to support a book yourself. Many books are freely available. In testing you don't care whether the book suits your engine or not. If the book has sufficient variety (i.e. not only e2-e4 openings and such), it should give a good impression of the average performance of your engine. (Which is usually what you want, unless you are developing an engine only intended to analyze games starting with e2-e4, which would be clueless after any other move.)

Post by **Cheney** » Sat Feb 11, 2017 1:32 am

Thanks for all the replies to help me wrap my mind around this.

My engine is single-threaded and I have been running games at 2:0+1 for the most part. I know that on computers with different speeds, different depths will be reached and thus different moves will be played. However, I would expect the same to apply to the opponent's engine. If one engine tests significantly better than another engine on one computer, then how can the weaker engine be superior on a slower or faster computer? Sure, my engine does not go as deep and searches fewer nodes, but doesn't that apply to the opponent as well? Maybe his sorting is better at lower depths, where my engine relies on speed for more depth?

I am going to build my opening book back into the engine and see what happens. As for randomizing a score in the eval or at the root, I will save that for later.

Thanks again!

Post by **Sven** » Sat Feb 11, 2017 12:27 pm

Cheney wrote: I am going to rebuild my opening book into the engine and see what happens.

I think that the ability of an engine to use its own opening book is less important than a proper testing strategy. You may need an opening book for online tournaments (like the monthly HGM blitz tourney) where your engine plays only a few games on your own hardware. But for testing your changes during development, as well as for testing in the context of rating lists like CCRL, it will not be needed. There it is more important to get reproducible results (and thus no randomizing!) and, in the case of your own testing, to play many (usually very fast) games from different starting positions. From 100 games, for instance, you can't derive any reliable statement about the change in playing strength; that requires more like 1000 or 2000 games per version.
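To see why 100 games say so little, here is a rough sketch (Python; it uses the usual logistic Elo formula and a normal approximation to the mean game score, ignores any correlation between paired games, and the win/draw/loss counts are made up for illustration) of an Elo estimate with an approximate 95% interval:

```python
import math


def score_to_elo(score: float) -> float:
    """Logistic Elo difference implied by an expected score strictly between 0 and 1."""
    score = min(max(score, 1e-6), 1.0 - 1e-6)  # keep log10 finite at 0% or 100%
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) using a normal approximation to the mean per-game score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(variance / n)
    return score_to_elo(score), score_to_elo(score - margin), score_to_elo(score + margin)


if __name__ == "__main__":
    for wins, draws, losses in ((40, 30, 30), (400, 300, 300)):
        elo, low, high = elo_with_interval(wins, draws, losses)
        print(f"{wins + draws + losses} games: {elo:+.0f} Elo, "
              f"95% interval [{low:+.0f}, {high:+.0f}]")
```

For the made-up 55% result, 100 games give about +35 Elo with an interval of roughly -22 to +93, which does not even settle whether the change helped; the same proportions over 1000 games narrow it to roughly +17 to +53. Because the interval only shrinks with the square root of the number of games, halving it costs four times as many games.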

Post by **Ras** » Sun Feb 12, 2017 12:02 am

Sven Schüle wrote: But for testing your changes during development, as well as for testing in the context of rating lists like CCRL, it will not be needed.

Doesn't that involve the risk of optimising the program for the handful of different games it plays, only to find out that in a broader context the changes were bad?

Post by **Daniel Anulliero** » Sun Feb 12, 2017 1:52 am

Now I use 250 opening lines, always the same ones, each played with colors reversed, so I run 500 games in my self-tests.
Always using the same openings is important to measure your improvements (or not, lol).
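A minimal sketch of such a fixed, color-reversed schedule (Python; the opening labels and engine names are placeholders, and feeding the lines to a GUI or match runner is left out). Each line is played once with each engine as White, so an opening that favors one side cannot tilt the total:

```python
from typing import Iterator, NamedTuple


class Pairing(NamedTuple):
    opening: str  # e.g. a FEN or a PGN move list; here just a placeholder label
    white: str
    black: str


def reversed_color_schedule(openings: list[str], engine_a: str,
                            engine_b: str) -> Iterator[Pairing]:
    """Yield two games per opening line: A as White first, then the same line with colors swapped."""
    for line in openings:
        yield Pairing(line, engine_a, engine_b)
        yield Pairing(line, engine_b, engine_a)


if __name__ == "__main__":
    openings = [f"line_{i:03d}" for i in range(250)]  # hypothetical fixed opening set
    games = list(reversed_color_schedule(openings, "new_version", "old_version"))
    print(len(games), games[0], games[1])
```

Keeping the opening list identical from one version to the next is what makes two test runs directly comparable.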
