AI performs poorly despite decent depth.

Post by **mvanthoor** » Tue Apr 11, 2023 11:02 am

AngularMomentum wrote: ↑Tue Apr 11, 2023 1:57 am I obviously wasn’t asking anyone else to look at my code and find bugs for me. I was simply asking if the performance is normal for the features I implemented.

Nobody can tell from the description you gave. We need a known-good benchmark. For most people, that benchmark is a CCRL rating (as this is the most well-known rating list). You'd therefore have to test your engine in a gauntlet against a few CCRL-rated engines and establish an approximate rating. It would then be possible to look at engines rated around the same as yours and compare feature sets.
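
To make the gauntlet idea concrete, here is a minimal sketch of how a match result against a single rated opponent could be turned into an approximate rating. It assumes the standard logistic Elo model; the function names `elo_difference` and `estimate_rating` are made up for this illustration, and a real rating list uses more careful statistics (error bars, many opponents, many games).

```python
import math


def elo_difference(wins: int, draws: int, losses: int) -> float:
    """Elo difference implied by a match score under the logistic Elo model."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        # A 0% or 100% score gives an unbounded estimate.
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def estimate_rating(opponent_rating: float, wins: int, draws: int, losses: int) -> float:
    """Approximate rating from one match against an opponent with a known rating."""
    return opponent_rating + elo_difference(wins, draws, losses)
```

With several gauntlet opponents, one simple (and rough) approach would be to compute `estimate_rating` per opponent and average the results.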

The Alpha 3 version of my engine implements this:

- iterative deepening
- alpha/beta
- hash table
- MVV-LVA ordering
- hash table move ordering
- check extension
- pvs
- killer moves
- PSQT-only evaluation (not tapered yet, not tuned)

The rating in the CCRL 2m1s is 1920.

The 4.0-beta version adds a tapered and tuned evaluation on top of that. Its projected rating is going to be somewhere around 2170 +/- 20 Elo.

My engine is strong for its feature set: when comparing it against engines at around the same CCRL rating, Rustic mostly has _fewer_ features to achieve that rating. That means that these other engines either have bugs or are (much) slower. Or both.

Move generation: 100-150 million nps in Perft, 15-20 million in actual search.
Search: PVS, quiescence search.
Move Ordering: MVV-LVA when the captured piece is more valuable than the moving piece, SEE for all other moves.
Evaluation: piece square table with 10cp side to move bonus.

That is really fast. Is the engine multithreaded already?
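
For reference, the MVV-LVA ordering named in both feature lists above can be sketched as follows. This is only an illustration under assumptions: captures are represented as hypothetical `(attacker, victim)` piece-letter pairs, and the scoring scheme (victim rank dominates, cheaper attacker breaks ties) is one common way to do it, not necessarily what either engine in this thread does. It does not cover the SEE part of the ordering described in the quote.

```python
# Pieces from least to most valuable; the index serves as a rank.
PIECE_ORDER = "PNBRQK"


def mvv_lva_score(attacker: str, victim: str) -> int:
    """Higher score means 'try earlier': most valuable victim, least valuable attacker."""
    n = len(PIECE_ORDER)
    victim_rank = PIECE_ORDER.index(victim)
    attacker_rank = PIECE_ORDER.index(attacker)
    # The victim term is scaled by n, so it always outweighs the attacker term.
    return victim_rank * n + (n - 1 - attacker_rank)


def order_captures(captures):
    """Sort (attacker, victim) pairs so the most promising captures come first."""
    return sorted(captures, key=lambda c: mvv_lva_score(c[0], c[1]), reverse=True)
```

In a real engine the scores would typically be precomputed in a small table indexed by piece types rather than looked up in a string.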

Post by **AngularMomentum** » Tue Apr 11, 2023 1:12 pm

mvanthoor wrote: ↑Tue Apr 11, 2023 11:02 am
That is really fast. Is the engine multithreaded already?

I’ll try testing my engine against some others myself, as others already suggested.
As for the speed, I got a large boost by generating legal moves directly (instead of doing the move and seeing if the king is in check) and by generating all pawn moves with just a few bitshifts instead of looping through every pawn.
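
The "few bitshifts" idea for pawns can be sketched like this. It is a minimal illustration, assuming a bitboard layout where a1 is bit 0 and h8 is bit 63 and White moves toward higher bits; all names are invented for the example. It only produces target squares for pushes and captures, and leaves out promotions, en passant, and the pin handling that fully legal generation needs.

```python
FULL = 0xFFFF_FFFF_FFFF_FFFF      # keep results within 64 bits (Python ints are unbounded)
RANK_3 = 0x0000_0000_00FF_0000    # squares a3..h3
FILE_A = 0x0101_0101_0101_0101
FILE_H = 0x8080_8080_8080_8080


def white_pawn_pushes(pawns: int, occupied: int):
    """Return (single_push_targets, double_push_targets) for all white pawns at once."""
    empty = ~occupied & FULL
    single = (pawns << 8) & empty
    # Only pawns that reached rank 3 with a single push may advance a second square.
    double = ((single & RANK_3) << 8) & empty
    return single, double


def white_pawn_captures(pawns: int, enemy: int):
    """Return (targets_toward_a_file, targets_toward_h_file) that hit enemy pieces."""
    # Mask off the edge file first so the shift cannot wrap around the board.
    left = ((pawns & ~FILE_A) << 7) & enemy
    right = ((pawns & ~FILE_H) << 9) & enemy & FULL
    return left, right
```

Each call handles every pawn in parallel, which is where the speedup over a per-pawn loop comes from.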

Post by **AngularMomentum** » Wed Apr 12, 2023 6:31 am

smatovic wrote: ↑Tue Apr 11, 2023 8:06 am There is a known search:eval co-dependency: if you search deep but your eval is weak, it won't work out. In general this depends on the engine; compare, for example, PeSTO with PSQT only, or OliThink with mobility eval only.

I usually run STS1-15 for my Zeta engines (~2000 CCRL Elo); my development steps are still big enough to get a feeling for the progress via test suites:

Re: STS re-re-re-re-re-visited
https://talkchess.com/forum3/viewtopic. ... 10#p945710

E.g.: run STS to get a rough impression of your development, then measure Elo gain/loss via self-play, then measure Elo against other opponents.

I would say your search depth with given NPS and feature set is okay, you might want to take a look at your effective branching factor:

https://www.chessprogramming.org/Branch ... ing_Factor

Chess has an average game branching factor of ~36. With perfect AB move ordering you get an EBF of ~6, and every additional (pruning) technique lowers the EBF further, for example to <2 for modern top AB engines.

I cannot judge your eval.

--
Srdja

My EBF seems to be about 6-7, using (non quiescence nodes)^(1/depth) but that’s using transposition tables. It should improve once I implement killer moves.
And this isn’t the main point of the discussion, but is mobility really good enough to be used alone as an evaluation term? I’m quite surprised.
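
The EBF estimate used in this post, nodes^(1/depth), can be sketched as below. The function names are made up for the example. The second helper reflects the usual textbook result that alpha-beta with perfect move ordering visits roughly b^(d/2) nodes, which is where the ~6 for b ≈ 36 in the quote comes from.

```python
import math


def effective_branching_factor(nodes: int, depth: int) -> float:
    """EBF as the depth-th root of the node count for a search to the given depth."""
    if nodes <= 0 or depth <= 0:
        raise ValueError("nodes and depth must be positive")
    return nodes ** (1.0 / depth)


def ideal_alpha_beta_ebf(branching_factor: float) -> float:
    """Approximate EBF of alpha-beta with perfect move ordering: sqrt(b)."""
    return math.sqrt(branching_factor)
```

Another common way to estimate EBF is the ratio of node counts between consecutive iterative-deepening iterations; the two methods can give somewhat different numbers for the same search.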

Post by **smatovic** » Wed Apr 12, 2023 7:16 am

AngularMomentum wrote: ↑Wed Apr 12, 2023 6:31 am ...
And this isn’t the main point of the discussion, but is mobility really good enough to be used alone as an evaluation term? I’m quite surprised.

That's the magic about OliThink

https://www.chessprogramming.org/OliThink

OliThink's evaluation consists almost of material balance and mobility, plus a very simple pawn structure evaluation, rewarding passed pawns.
[...]
Oliver Brausch published OliThink 5.4.0 with a big leap in playing strength due to modifications in evaluation of likely drawn endgames

https://computerchess.org.uk/ccrl/404/c ... 9_7_64-bit

OliThink 5.9.7 64-bit #128 (2897)

--
Srdja
