new axolotl!

AxolotlFever · Post by **AxolotlFever** » Sun Dec 16, 2018 2:49 pm

Hello everyone,

I am very happy to announce the latest version of axolotl, my Java chess engine, version 1.1! There are not many "big ticket" improvements, just a lot of bug fixing, better UCI integration and a gigantic refactor.

I have completely split the Legal move generation from the engine proper, and they are now separate github projects:

move gen:
https://github.com/louism33/ChessCore

This program, which is available through maven, is what I wrote to generate all legal moves in Java. It uses bitboards, in which "0" is Square H1, and 63 is square A8. I always wanted to make a maven import so here it is. It has a speed of about 25million nodes per second. If you have any comments about how I go about things, I would be extremely happy to hear them! Thank you very much to all the kind people who gave me such good advice on the last version of my engine.

maven import:
https://oss.sonatype.org/content/reposi ... chesscore/

engine:
https://github.com/louism33/Axolotl

Axolotl gets about 250 / 300 on the Win at Chess puzzles. It has a long way to go, and I have done my best to make improvements. I would be extremely grateful to anyone who uses her in a match (binaries available on github https://github.com/louism33/Axolotl/releases ) or through maven ( https://oss.sonatype.org/content/reposi ... -SNAPSHOT/ ).

The source code is all on github, and any advice is hugely appreciated. In particular, I am curious about how to increase the speed, without sacrificing quality. We are at about 150 knps, and I would love to hit a million.

Hopefully it is now good enough to win some games!

Kind regards, and thanks to all those who took the time to help me get to this point,

All the very best,
Louis

odomobo · Post by **odomobo** » Mon Dec 17, 2018 9:59 am

Looking good! As far as performance goes, you'll need to use a profiler to see where your slowdowns are happening. There's really no way around it. Otherwise, you're stumbling around in the dark.

It pains me to see that your transposition table is backed by 2 arrays of longs, but unfortunately this does seem to be the optimal way to do it in java.

You might find the following interesting, since you have fixed-sized move lists: viewtopic.php?topic_view=threads&p=410026&t=39332

odomobo · Post by **odomobo** » Wed Dec 19, 2018 5:20 am

I thought about it, and you can back your transposition table by a single array of longs, where even indexes are keys, and odd indexes are the corresponding entries. This means every tt lookup will be a single cache miss instead of 2. However, I think your engine would need to be quite a bit faster before you'd be able to see the difference in performance.

AxolotlFever · Post by **AxolotlFever** » Wed Dec 19, 2018 8:56 am

Hey odo,

Wow that is a really interesting idea! I will look into it, but I think you are right - that there are more pressing problems to deal with... How does a c++ engine handle a transposition table? Do they not use two backing arrays?
Thanks,
Louis

odomobo · Post by **odomobo** » Wed Dec 19, 2018 9:59 am

In c or c++, you can store arbitrary objects in-place in an array (store the object by value, instead of by reference). For, for example, if your tt entries looked like:

Code: Select all

struct tt_entry {
    uint64_t hash;
    int32_t score;
    int32_t depth;
};

Then you could do something like:

Code: Select all

tt_entry transposition_table[NUM_ENTRIES];

A tt_entry would take up 16 bytes, so the transposition table would really be a contiguous chunk of 16*NUM_ENTRIES bytes in memory. Due to the size of transposition tables compared to cache size, nearly every TT lookup will be a cache miss (at least for L2 cache). Cache lines for modern processors are 64 bytes in size, so as long as your transposition table is aligned on (at least) the 16 byte boundary, then entries will not cross cache line boundaries (requiring 2 cache line fetches to look up 1 entry).

You can also do tricks like align the table on a 64-byte boundary, and have each TT index look up into a 4-entry "bucket". You have to search all 4 entries (instead of just checking the 1), but that should be much cheaper than the initial cost of the cache miss. Since you can choose which of the 4 entries to overwrite, you can use a more intelligent replacement algorithm, to hopefully store more useful entries in the table.

Hope this all makes sense!

AxolotlFever · Post by **AxolotlFever** » Wed Dec 19, 2018 10:11 am

Thanks very much! Some really interesting ideas there. I hae been considering using a bucket system, so the next version will include this

Luckily (?) in Java, I don't think that it is as necessary to understand the points you mentioned, but you have made me ver interested in them, so I think c++ will have to be the next language I look into

PS: If you had to, very roughly, estimate the strength boost from moving from Java to C++, what kind of number would you say? If no additional features are included, does +100 ELO sound realistic ?

Ras · Post by **Ras** » Wed Dec 19, 2018 4:44 pm

AxolotlFever wrote: ↑Wed Dec 19, 2018 10:11 amestimate the strength boost from moving from Java to C++, what kind of number would you say? If no additional features are included, does +100 ELO sound realistic ?

I think we can assume something in the ballpark of 50 Elo per speed doubling. So for a 100 Elo, you would need 4 times the speed. How much faster C++ is depends on the application, but if C++ is more than 2 times faster, there's something wrong with the Java program, usually.

One of the things that you need to avoid strictly is allocating objects on the heap in search, eval and movegen. That needs to be done either on the stack or statically, or otherwise you effectlively limit your performance by running into the allocator.

mar · Post by **mar** » Wed Dec 19, 2018 5:27 pm

AxolotlFever wrote: ↑Wed Dec 19, 2018 10:11 am PS: If you had to, very roughly, estimate the strength boost from moving from Java to C++, what kind of number would you say? If no additional features are included, does +100 ELO sound realistic ?

Very roughly from what I've seen, that's one doubling roughly.
I compared TSCP (non-bitboard) C vs C# and the C# version ran 1.8x times slower. How much is one doubling depends on TC, at bullet, I would say 100 elo is realisitic.
I don't know about Java, but I remember when Peter went with Cuckoo from Java to C++ (now Texel), he gained some 100+ elo in CCRL, but there were already some functional changes plus I'm not sure if even that would be fair (maybe the Java version tested was 32-bit and Texel was 64-bit already?)
There are not enough samples to say for sure (not aware of many 1:1 compiled vs managed language functionally equivalent engines).

Anyway - if you're hitting 150kn/s, then you should probably look at some profiling tool to see what's really holding you back before you consider to switch the language.

odomobo · Post by **odomobo** » Thu Dec 20, 2018 5:42 am

AxolotlFever wrote: ↑Wed Dec 19, 2018 10:11 am ... I think c++ will have to be the next language I look into

C++ is one of my favorite languages, and it's incredibly powerful, but it also has many confusing parts and legacy rough-edges. C, on the other hand, is a pretty simple language with only a relatively few rough edges, but it lacks many of the high-level abstractions C++ gives you to help manage and simplify your code. They both have the same performance constraints (theoretically, the fastest possible performance).

That being said, there's still a far way you can go with Java, you'll just hit a performance brick wall at some point (the highest java engine I know of on CCRL is Carballo which is around 2700 ELO). Trying to switch over to C or C++ would likely just prove a distraction at this point. On the other hand, I guess it never hurts to learn a new language.

Ratosh · Post by **Ratosh** » Fri Dec 21, 2018 1:02 am

Hey Luis,

You can get to 3000 CCRL running on a JVM. Chess22k is written in Java and is about 3100 CCRL, Pirarucu is written in Kotlin and is about 2950 CCRL. My engine (Pirarucu) does not have a fast move generator, but has a decent search speed.

new axolotl!

new axolotl!

Re: new axolotl!

Re: new axolotl!

Re: new axolotl!

Re: new axolotl!

Re: new axolotl!

Re: new axolotl!

Re: new axolotl!

Re: new axolotl!

Re: new axolotl!