Stuck at 2400

Discussion of chess software programming and technical issues.

Moderator: Ras

Carbec
Posts: 162
Joined: Thu Jan 20, 2022 9:42 am
Location: France
Full name: Philippe Chevalier

Stuck at 2400

Post by Carbec »

Hello all,

For several weeks I have been trying to improve my engine. It already has a lot of "classic" features:
alpha-beta, quiescence search, transposition table, null move pruning, PVS. The evaluation is very basic:
material and PeSTO tables. All of this works rather well; I ran some tournaments against Sungorus
and Blunder 7.1, and I estimate my Elo at 2400.
I think that now is the time to improve the evaluation. I tried to implement several ideas,
but for the moment I only lose Elo. I obviously looked at what others do, but it is generally
very complex, and I would like to understand what I am doing.
Do you know of any papers or links that could help me understand what to do?


Thanks a lot
Philippe
Henk
Posts: 7251
Joined: Mon May 27, 2013 10:31 am

Re: Stuck at 2400

Post by Henk »

Just read somewhere that if you need help, then you are not good at what you are doing. So it's best to do something that suits your skills. The problem is that this is usually boring work.

So if you are good at what you are doing, you don't need or appreciate any help.

Oh wait, maybe you are asking for (chess programming) tutorials to improve your skills. I don't think they exist for engines of that level.
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Stuck at 2400

Post by lithander »

If your evaluation is just PSQTs and nothing else, then 2400 is a good rating, and working on the eval next seems like the right idea. You can try adding evaluation terms that capture information about the position that PSQTs alone can't express. Some ideas are very simple, like a bonus for the bishop pair or a tempo bonus. The classics that really make a strength difference are mobility and pawn structure. Also king safety, but that's trickier - at least I couldn't get it to work yet.
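As a sketch of what two of the simplest such terms might look like in C++ (the `SideCounts` struct and both bonus values are purely illustrative; a real engine would read the counts from its board representation and tune the numbers):

```cpp
// Illustrative per-side piece counts; a real engine would derive these
// from its board representation.
struct SideCounts {
    int bishops;
};

// Bonus values in centipawns; the exact numbers would need tuning.
constexpr int BISHOP_PAIR_BONUS = 30;
constexpr int TEMPO_BONUS = 10;

// Score from white's point of view for two terms that PSQTs alone
// cannot express: owning both bishops, and having the move.
int simpleTerms(const SideCounts& white, const SideCounts& black, bool whiteToMove) {
    int score = 0;
    if (white.bishops >= 2) score += BISHOP_PAIR_BONUS;
    if (black.bishops >= 2) score -= BISHOP_PAIR_BONUS;
    score += whiteToMove ? TEMPO_BONUS : -TEMPO_BONUS;
    return score;
}
```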

Note that when you add new evaluation terms, the old PSQT values are no longer optimal. If you literally used PeSTO's values and never changed them when you tried to improve your evaluation by adding more terms, then that's probably why it didn't work. Imo the best way to find values that fit together is to create an infrastructure that allows you to "tune" the engine automatically. At the beginning you can use existing datasets (the Zurichess one is pretty popular) and something simple like Texel's tuning, although gradient descent is more powerful and not significantly more complicated. Long term you may want to be able to create your own datasets of annotated positions, but that can wait.
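The core of Texel-style tuning is an error function: map each position's static eval through a sigmoid to an expected score and measure the mean squared error against the actual game results; tuning then means adjusting eval parameters to shrink that error. A minimal sketch, where the `Sample` struct and the scaling constant `K` are illustrative:

```cpp
#include <cmath>
#include <vector>

// One annotated position: the engine's static eval (centipawns, from
// white's point of view) and the game result (1.0 = white win,
// 0.5 = draw, 0.0 = white loss).
struct Sample {
    double eval;
    double result;
};

// Map an eval to an expected score in [0, 1]. K is a scaling constant
// that has to be fitted to the engine's own eval scale first.
double sigmoid(double eval, double K) {
    return 1.0 / (1.0 + std::pow(10.0, -K * eval / 400.0));
}

// Texel-style mean squared error over the dataset. Tuning means
// adjusting the eval parameters to minimize this value.
double meanSquaredError(const std::vector<Sample>& data, double K) {
    double sum = 0.0;
    for (const Sample& s : data) {
        double diff = s.result - sigmoid(s.eval, K);
        sum += diff * diff;
    }
    return sum / data.size();
}
```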

I don't know of any papers or books but always found this forum very helpful in giving me new ideas to try.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Stuck at 2400

Post by algerbrex »

A couple of suggestions:

I agree with Thomas (lithander): focus on developing an automated way to tune your evaluation. I know it can seem daunting at first, because it did to me, and that's why I put it off for so long. But you eventually get to a point where hand-tuning the evaluation parameters becomes incredibly tedious and largely unfruitful. Automated tuning makes this process significantly easier and allows you to make progress much more quickly. For example, I spent many weeks trying to hand-tune king safety, and nothing I did gained any strength. But last summer, once I finally bit the bullet, implemented a gradient descent tuner, and had it tune my king safety, not only did it produce values that gained 40 Elo, it did so in a fraction of the time I had spent messing around.
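For a single linear weight, one gradient descent step can be sketched as below; the one-feature toy model (eval = weight times feature) and all the names here are illustrative, and a real tuner updates many weights per pass over the dataset:

```cpp
#include <cmath>
#include <vector>

// A position reduced to a single feature (e.g. the number of attackers
// near the enemy king) and the game result in [0, 1].
struct Pos {
    double feature;
    double result;
};

double sigmoid(double eval, double K) {
    return 1.0 / (1.0 + std::pow(10.0, -K * eval / 400.0));
}

// One gradient-descent update for a single linear weight w, where the
// toy model is eval = w * feature. Returns the updated weight.
double gradientStep(double w, const std::vector<Pos>& data, double K, double lr) {
    const double c = K * std::log(10.0) / 400.0;  // d(sigmoid)/d(eval) scale
    double grad = 0.0;
    for (const Pos& p : data) {
        double s = sigmoid(w * p.feature, K);
        // derivative of (result - s)^2 with respect to w
        grad += -2.0 * (p.result - s) * s * (1.0 - s) * c * p.feature;
    }
    grad /= data.size();
    return w - lr * grad;  // step against the gradient
}
```

Repeating this step over the whole parameter vector until the error stops improving is all a basic tuner does.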

I know you've probably heard this a lot, but it's worth repeating because it's so important. Check, double-check, triple-check, and quadruple-check your code for bugs. As your code base grows larger and more complicated, it's incredibly easy to make a tiny error (e.g. putting a + instead of a -) that can cause significant strength losses. I can't tell you how many times I've thought I'd finally ironed out all the major bugs in my engine, only to discover I'd made a glaring mistake. For instance, in my release of Blunder 8.0.0, I had completely forgotten to re-add code to detect draws correctly, which sometimes made the engine play weird, nonsensical moves towards the end of games. Make sure to add sanity checks and unit tests to your engine where and when appropriate. The more testing and checking for bugs, the better.
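As an example of the kind of one-line helper where a wrong operator hides a large Elo loss, and where a unit test catches the mistake immediately: flipping a square index vertically for the black side's PSQT lookup (assuming a1 = 0 and h8 = 63):

```cpp
// Flipping a 0..63 square index vertically, e.g. so black can reuse
// white's piece-square tables. XOR with 56 flips the rank and keeps
// the file; an accidental ^ 63 or + 56 here silently ruins the eval.
int mirrorSquare(int sq) {
    return sq ^ 56;
}
```

A handful of asserts (corners map to corners, mirroring twice is the identity) is enough to pin a helper like this down forever.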

Start simple with evaluation changes, and test thoroughly. By this point in your engine's development it's really a necessity to have a robust testing framework, because often what looks like a nice improvement after a couple hundred games turns into nothing after enough testing is done. If you want something concrete, start with adding evaluation terms for pawn structure and passed pawns. That should net some nice Elo, but you'll likely need to spend a good bit of time tweaking values if doing it by hand.
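Passed-pawn detection is a good first bitboard exercise. A sketch for white, assuming the common convention bit i = square i with a1 = 0: build a mask of the pawn's own file plus the adjacent files, restricted to the ranks strictly in front of the pawn, and check it against the black pawns.

```cpp
#include <cstdint>

// A white pawn on square sq (a1 = 0 ... h8 = 63) is passed if no black
// pawn sits on its own file or an adjacent file on any rank strictly
// in front of it.
uint64_t whitePassedMask(int sq) {
    const uint64_t FILE_A = 0x0101010101010101ULL;
    int file = sq % 8;
    int rank = sq / 8;
    uint64_t files = FILE_A << file;              // own file
    if (file > 0) files |= FILE_A << (file - 1);  // left neighbor file
    if (file < 7) files |= FILE_A << (file + 1);  // right neighbor file
    // keep only the ranks strictly in front of the pawn
    uint64_t front = (rank >= 7) ? 0ULL : (~0ULL << (8 * (rank + 1)));
    return files & front;
}

bool isWhitePassed(int sq, uint64_t blackPawns) {
    return (blackPawns & whitePassedMask(sq)) == 0;
}
```

In practice these masks are precomputed into a 64-entry table per color at startup rather than rebuilt per call.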

Don't be afraid to experiment and go off the beaten path. Every engine is different, and chess engine programming isn't an exact science. Sometimes you'll need to experiment with something unique to make progress. For a long time I couldn't get late move reductions to gain any Elo in my engine, and I kept getting frustrated with it and taking it out. It wasn't until I tried making some slight modifications that I finally gained a significant amount of strength from it.

Don't be afraid to take a step back to make progress either. Blunder would not have reached the strength it has today if I hadn't been willing to scrap some code here and there and find better ways to implement different techniques and ideas. Blunder has had two major refactorings so far, versions 5.0.0 and 8.0.0, and the upcoming version 9.0.0 will also mark a pretty significant refactoring job.
Carbec
Posts: 162
Joined: Thu Jan 20, 2022 9:42 am
Location: France
Full name: Philippe Chevalier

Re: Stuck at 2400

Post by Carbec »

Thanks a lot for your encouragements.

I don't know exactly what you mean by "testing framework". Is it a piece of software, or a way of doing tests? For my part, I have some special positions I use to test the number of nodes searched, the speed, and of course the move found. I also use some test suites, like WinAtChess, IQ4, and ECMGCP (from the Arasan site). The first is now a little useless, as I solve 299/300 positions at 5 s per position. And lastly, I run some tournaments against other engines, lately Sungorus and Blunder 7.1.

I was so frustrated with my last tries that I decided to remove all my evaluation bonuses and revert to an older version.
It is the one I released on GitHub. For the future, I will concentrate on learning about "tuning". I see that it is the big thing
on the forums.

Philippe
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Stuck at 2400

Post by lithander »

Carbec wrote: Fri Mar 10, 2023 1:00 pm I don't know exactly what you mean by "testing framework". Is it a piece of software, or a way of doing tests? For my part, I have some special positions I use to test the number of nodes searched, the speed, and of course the move found. I also use some test suites, like WinAtChess, IQ4, and ECMGCP (from the Arasan site). The first is now a little useless, as I solve 299/300 positions at 5 s per position. And lastly, I run some tournaments against other engines, lately Sungorus and Blunder 7.1.
I could imagine you could write unit tests and have a testing framework set up to run them after each change. Maybe that's what Christian means? But I'm not doing that either. I do the same things you do: quick tests with position suites to a fixed depth, plus selfplay matches and gauntlets, mostly. Since my move generator is done I don't use Perft anymore, but in the beginning it was very valuable.
Carbec wrote: Fri Mar 10, 2023 1:00 pm I was so frustrated with my last tries that I decided to remove all my evaluation bonuses and revert to an older version.
It is the one I released on GitHub. For the future, I will concentrate on learning about "tuning". I see that it is the big thing
on the forums.
It's only a big thing on the forums because it answers a crucial question: why are some PSQT values better than others, and where do they originally come from? Something that governs how well, and in what style, your engine plays is obviously an important part of your engine, so it's natural that all engine programmers eventually ask a variant of this question. Hence it comes up frequently on the forum.

I wanted to be able to find good values on my own, and 'tuning' was the only way to do that. If you're able to find these values by hand, I admire you. But you can also try 'genetic programming', like JoAnn Peeler recently wrote about, here. Tuning PSQT values is the beginner-friendly first step; eventually you'll be training thousands of NNUE weights, but it's basically the same thing: it generates the 'data' that your evaluation needs to evaluate chess positions. I'm not aware of evaluation approaches that are powerful yet rely 100% on code alone, with no 'arbitrary' weights, constants, or parameters. (Maybe MCTS with random playouts would qualify, but is it really powerful?)
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Stuck at 2400

Post by algerbrex »

Carbec wrote: Fri Mar 10, 2023 1:00 pm I don't know exactly what you mean by "testing framework" ...
Ah, sorry, I should have explained more.

All I really mean is having a consistent approach to testing your engine. In other words, whenever you think you've made a strength-gaining change to your engine, how do you plan on testing it? For most, this looks like using SPRT and self-play to establish an idea of the Elo gain, and then, from there, some might go on to do gauntlet testing. Another approach might be to use the things you mentioned, like WAC positions. There's a variety of approaches available here, but by "framework" I meant something a bit more conceptual. Perhaps a better phrase would be testing strategy.
Carbec
Posts: 162
Joined: Thu Jan 20, 2022 9:42 am
Location: France
Full name: Philippe Chevalier

Re: Stuck at 2400

Post by Carbec »

Ok, I now have a better idea. Finally, I am not so far off. :)
What is that SPRT you mentioned? Some software?

I have another question, about GitHub. I put the latest sources on it and then created a release with the binary. For some reason, GitHub attached old sources to it, not the right ones, of course. Why? And how can I fix that? I'm sorry to ask such a dumb question, but I'm not familiar with GitHub.
The source is here: https://github.com/Carbecq/Zangdar

Thanks,
Philippe
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Stuck at 2400

Post by algerbrex »

Carbec wrote: Fri Mar 10, 2023 6:02 pm Ok, I now have a better idea. Finally, I am not so far off. :)
What is that SPRT you mentioned? Some software?

I have another question, about GitHub. I put the latest sources on it and then created a release with the binary. For some reason, GitHub attached old sources to it, not the right ones, of course. Why? And how can I fix that? I'm sorry to ask such a dumb question, but I'm not familiar with GitHub.
The source is here: https://github.com/Carbecq/Zangdar

Thanks,
Philippe
Ah, sorry again, I should've explained the terms more :lol:

SPRT (sequential probability ratio test) is the name of a kind of statistical test that can be run using CuteChess. Without going into all of the details, it basically allows you to check that engine A is stronger than engine B within a certain margin of error. It's what many authors use to verify that the changes they make to their engines are actually improvements.
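The idea can be sketched as follows, under the BayesElo outcome model that CuteChess's SPRT mode is based on; this is a simplification, and the `drawElo` parameter and the exact bookkeeping here are illustrative:

```cpp
#include <cmath>

// Win/draw/loss probabilities for a given Elo difference under the
// BayesElo model; drawElo controls how draw-prone the match-up is.
struct Probs {
    double win, draw, loss;
};

Probs outcomeProbs(double elo, double drawElo) {
    Probs p;
    p.win  = 1.0 / (1.0 + std::pow(10.0, (drawElo - elo) / 400.0));
    p.loss = 1.0 / (1.0 + std::pow(10.0, (drawElo + elo) / 400.0));
    p.draw = 1.0 - p.win - p.loss;
    return p;
}

// Log-likelihood ratio of H1 (elo = elo1) against H0 (elo = elo0) for
// the observed game counts. The test accepts H1 once this crosses
// log((1 - beta) / alpha) and accepts H0 once it drops below
// log(beta / (1 - alpha)); otherwise it keeps playing games.
double sprtLLR(int wins, int draws, int losses,
               double elo0, double elo1, double drawElo) {
    Probs p0 = outcomeProbs(elo0, drawElo);
    Probs p1 = outcomeProbs(elo1, drawElo);
    return wins   * std::log(p1.win  / p0.win)
         + draws  * std::log(p1.draw / p0.draw)
         + losses * std::log(p1.loss / p0.loss);
}
```

The appeal over a fixed-length match is that the test stops as soon as the evidence is strong either way, which saves a lot of games on clear regressions and clear improvements.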

As for your GitHub question: you fully control which binaries are uploaded, so if the binaries are not up to date, you can re-upload new ones. If you mean the source doesn't match your current source, then you'll need to jump through some hoops to make your release tag point to your most recent commit instead of the commit it was originally pointing to.
silentshark
Posts: 327
Joined: Sat Mar 27, 2010 7:15 pm

Re: Stuck at 2400

Post by silentshark »

Carbec wrote: Thu Mar 09, 2023 1:12 pm Hello all,

For several weeks I have been trying to improve my engine. It already has a lot of "classic" features:
alpha-beta, quiescence search, transposition table, null move pruning, PVS. The evaluation is very basic:
material and PeSTO tables. All of this works rather well; I ran some tournaments against Sungorus
and Blunder 7.1, and I estimate my Elo at 2400.
I think that now is the time to improve the evaluation. I tried to implement several ideas,
but for the moment I only lose Elo. I obviously looked at what others do, but it is generally
very complex, and I would like to understand what I am doing.
Do you know of any papers or links that could help me understand what to do?


Thanks a lot
Philippe
Mobility and king safety eval terms will help.