When there is not progress

Kempelen · Post by **Kempelen** » Tue Nov 17, 2009 10:18 am

Hello,

This post is to ask for a little help. I have been improving Rodin for two years now. It has all the major features an engine can have, and I have arrived at a point where new problems appear. First, an introduction: these are the engines I am currently testing my engine against:

Rank Name            Elo    +    - games score oppo. draws 
   1 Danasah        2481   41   39   200   67%  2362   21% 
   2 RomiChess P3K  2472   42   40   200   64%  2362   13% 
   3 Eeyore         2447   41   40   200   61%  2362   13% 
   4 ThorsHammer22  2418   41   40   200   57%  2362   11% 
   5 Bruja          2391   41   41   200   54%  2362    9% 
   6 Knightx192     2386   39   39   200   53%  2362   18% 
   7 Rodin v2.8b    2362   13   13  2200   51%  2349   14% 
   8 Dirty          2342   39   39   200   47%  2362   19% 
   9 Scidlet        2288   39   39   200   40%  2362   22% 
  10 ZCT            2282   40   41   200   40%  2362   13% 
  11 RattateChess   2229   41   43   200   34%  2362   11% 
  12 BlackBishop    2102   46   50   200   20%  2362    9%

These are the results of my last test tournament, at 1 min 0 secs. My main problem is that I have been testing new features for around two months in this format and have never obtained any gain. I have been measuring things like futility margins, passed pawn bonuses, king safety adjustments, ... and in every tournament I get the same result: 51%/52%.
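To put the 51%/52% figures in perspective, a score can be converted to an Elo difference and an approximate error margin. The following is a minimal sketch, assuming the standard logistic Elo model and independent games; the function names are made up for illustration, and the rating tool that produced the table above may use a different statistical model.

```cpp
#include <cassert>
#include <cmath>

// Elo difference implied by a score fraction p (0 < p < 1)
// under the standard logistic rating model.
double elo_from_score(double p) {
    assert(p > 0.0 && p < 1.0);
    return -400.0 * std::log10(1.0 / p - 1.0);
}

// Approximate 95% margin (in Elo) above a measured score, treating games
// as independent trials with outcomes 1, 0.5 and 0.
// score and draw_ratio are fractions; games is the number of games played.
double elo_margin_95(double score, double draw_ratio, int games) {
    assert(games > 0);
    const double win  = score - draw_ratio / 2.0;
    const double loss = 1.0 - win - draw_ratio;
    assert(win >= 0.0 && loss >= 0.0);
    // Per-game variance of the result around the mean score.
    const double variance = win * (1.0 - score) * (1.0 - score)
                          + draw_ratio * (0.5 - score) * (0.5 - score)
                          + loss * score * score;
    const double std_err = std::sqrt(variance / games);
    return elo_from_score(score + 1.96 * std_err) - elo_from_score(score);
}
```

Under these assumptions, going from 51% to 52% corresponds to roughly 7 Elo, while `elo_margin_95(0.51, 0.14, 2200)` comes out at roughly 13 Elo, in line with the +13/-13 shown for Rodin in the table. A change worth a few Elo is therefore hard to distinguish from noise at this number of games.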

I don't know how you deal with this when it happens to you. For me it is quite frustrating, and it leads me to ask myself questions like: Do I have a bug that hides good results? Are those engines suitable for testing my engine? Does it mean I need to change my testing scheme?

Well, this post is only to ask you for advice and tips on how to deal with this situation. What I mainly fear is that something will break. Many of you have more expertise than me, so your comments will be welcome.....

Greetings,
Fermin

P.S.: A situation like the one I have just described shows how difficult it is to reach the top of the rating list. It is like playing the piano or any other skill.... you can be good, but it is impossible to reach the top when releasing your first engine. You need training and a lot of experience. Rybka, Ippolit and a few others may be good programs, but they are highly suspect.... Nobody is born knowing how to play the piano, paint or play chess without considerable training time.

jesper_nielsen · Post by **jesper_nielsen** » Tue Nov 17, 2009 11:05 am

I know the feeling. I have been in the same situation for about a year. All my efforts have returned a whopping zero Elo points of strength gain!

What I did was walk away from the project for a while. Six months or so. And then slowly the motivation returned to try again.

A couple of things I tried that actually seem to have increased my motivation:

1. Use a profiler heavily to try and find performance improvements. These kinds of improvements are brilliant, because they are 100% gains. No worrying if the change is for the better or worse. Another good aspect of this approach was that it forced me to look at some of the very old parts of my code. This sparked my imagination! "Hmm... This could be improved!"

2. I am currently changing my test strategy. Previously I used Arena to run 1+1 games. But I have changed to use the cutechess-cli tool now. This enables me to run very fast games without having time forfeits all over the place. This way I get to play a lot more test-games in the limited time available.

When I have implemented the new test strategy, I plan to start removing stuff from the evaluation function. Basically, to make sure that every part of the evaluation has proved to me that it belongs.

Another idea: If your changes to the evaluation function don't change the score, then maybe your engine has a "blind spot" where the other engines keep pushing it. To look for a "blind spot" you could examine the losses manually, and see if you can categorize them by the reason for the loss. "Underestimated the danger of a kingside attack". "Pushed a passed pawn too far and lost it". Maybe a pattern will emerge?!

Kind regards,
Jesper

EDIT: P.S. Now all I need is a good reason why you should take advice from someone who hasn't improved his engine in more than a year!

mcostalba · Post by **mcostalba** » Tue Nov 17, 2009 12:53 pm

I am not very experienced, but something like what you describe also happens to me and, ironically, on the same subjects, namely futility pruning, which is a beast. You can completely change it, make small changes or big changes, or almost completely remove it, and the result will always be the same.

King safety and passed pawns are two other very hard parts.

What I do in these cases is start reading old code and writing cleanup patches. This is rewarding for me because they are relaxing to do, the code looks better at the end, and it is easy to verify that I have not introduced regressions: at the end of a cleanup patch I test functionality, and it must be the same.

Another possibility is to fire up my profiler and do some profiling sessions. This is also rewarding, but less and less so, because I think the big holes have already been fixed, and now I can spend many days (of course only a little time each day) just to gain 0.5%. A speed increase of 1% is very welcome and is more and more rare these days.

But both cleanups and profiling are coming to a natural end, at least for my technical skills....

Kempelen · Post by **Kempelen** » Tue Nov 17, 2009 1:10 pm

mcostalba wrote: Another possibility is to fire up my profiler and do some profiling sessions. This is also rewarding, but less and less so, because I think the big holes have already been fixed, and now I can spend many days (of course only a little time each day) just to gain 0.5%. A speed increase of 1% is very welcome and is more and more rare these days.

Only one question: how do you profile? My engine has two threads, one for thinking and the other for main communication. I am using MinGW, and the profiler does not work with more than one thread.....

mcostalba · Post by **mcostalba** » Tue Nov 17, 2009 1:26 pm

Kempelen wrote: Only one question: how do you profile? My engine has two threads, one for thinking and the other for main communication. I am using MinGW, and the profiler does not work with more than one thread.....

I always profile on a single thread because the results are more reproducible. I use Intel VTune on Windows and KCachegrind on Linux.

KCachegrind is nice and easy to use. VTune is more detailed and can also properly account for RAM access latencies, which are the real key to getting a fast binary but are also very difficult to handle, because they often require a trial-and-error approach.

Michael Sherwin · Post by **Michael Sherwin** » Tue Nov 17, 2009 3:56 pm

mcostalba wrote: futility pruning, which is a beast. You can completely change it, make small changes or big changes, or almost completely remove it, and the result will always be the same.

I do not in general believe that futility pruning can help. It is based on material + any positional value + margin being less than alpha. When a move is made that is not good enough, the null move will take care of it anyway. So reduce the depth by one, and near the leaf nodes it makes no difference to the null move, as only a qsearch will be done in either case.

However, in RomiChess the null move is not done at depth = 1. Against some programs this hurts, though against most it helps.

So, there is this line instead.

if(!inCheck && depth == 1 && h->eval + 150 < alpha) return CaptSearch ...

This line helps. It might be razoring--I am not sure.

mcostalba · Post by **mcostalba** » Tue Nov 17, 2009 4:46 pm

Michael Sherwin wrote:
So, there is this line instead.

if(!inCheck && depth == 1 && h->eval + 150 < alpha) return CaptSearch ...

This line helps. It might be razoring--I am not sure.

Yes, it is a kind of razoring, although in our implementation it is something like this (simplified):

if (!inCheck && depth == 1 && h->eval + 150 < alpha)
   if (qsearch() + 150 < alpha) return CaptSearch ...
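
The difference between the two variants quoted above can be sketched as follows. This is only an illustration under stated assumptions: the capture search is passed in as a callback, the function names and the `result` out-parameter are hypothetical, and the margin of 150 is simply the constant from the posted line.

```cpp
#include <functional>

// Margin taken from the posted line; the unit is assumed to be centipawns.
constexpr int kRazorMargin = 150;

// Variant 1: at depth 1, when the static eval is far below alpha,
// return the capture search result directly.
// Returns true and fills `result` when the node is razored.
bool razor_simple(bool inCheck, int depth, int staticEval, int alpha,
                  const std::function<int()>& captSearch, int& result) {
    if (!inCheck && depth == 1 && staticEval + kRazorMargin < alpha) {
        result = captSearch();
        return true;
    }
    return false;  // caller continues with the normal search
}

// Variant 2: same pre-condition, but the capture search must confirm
// that the position really is below alpha by the margin.
bool razor_verified(bool inCheck, int depth, int staticEval, int alpha,
                    const std::function<int()>& captSearch, int& result) {
    if (!inCheck && depth == 1 && staticEval + kRazorMargin < alpha) {
        const int v = captSearch();
        if (v + kRazorMargin < alpha) {
            result = v;
            return true;
        }
    }
    return false;  // not confirmed: fall through to the normal search
}
```

The verified form costs nothing extra when the capture search confirms the fail low, and falls back to a full search when a capture sequence recovers the missing material.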

Eelco de Groot · Post by **Eelco de Groot** » Tue Nov 17, 2009 5:12 pm

mcostalba wrote:
Michael Sherwin wrote:
So, there is this line instead.

if(!inCheck && depth == 1 && h->eval + 150 < alpha) return CaptSearch ...

This line helps. It might be razoring--I am not sure.

Yes, it is a kind of razoring, although in our implementation it is something like this (simplified):

if (!inCheck && depth == 1 && h->eval + 150 < alpha)
   if (qsearch() + 150 < alpha) return CaptSearch ...

This is the razoring code in Rainbow Serpent. It differs slightly from that in Stockfish: as far as I know, Stockfish does razor when in check, but RomiChess does not...

  // Razoring
  const Depth RazorDepth = 6*OnePly;

  // Indexed by remaining depth in half-ply steps: 1 ply, 1.5 ply, ..., 6.5 ply
  const Value RazorMargins[12] = {
    Value(0x180), Value(0x300), Value(0x300), Value(0x3C0),   // 1, 1.5, 2, 2.5 ply
    Value(0x3F0), Value(0x430), Value(0x480), Value(0x500),   // 3, 3.5, 4, 4.5 ply
    Value(0x590), Value(0x640), Value(0x700), Value(0x800)    // 5, 5.5, 6, 6.5 ply
  };

  // Indexed by remaining depth in half-ply steps: 1 ply, 1.5 ply, ..., 6.5 ply
  const Value RazorApprMargins[12] = {
    Value(0x500), Value(0x510), Value(0x520), Value(0x530),   // 1, 1.5, 2, 2.5 ply
    Value(0x540), Value(0x560), Value(0x580), Value(0x5A0),   // 3, 3.5, 4, 4.5 ply
    Value(0x5C0), Value(0x5E0), Value(0x620), Value(0x660)    // 5, 5.5, 6, 6.5 ply
  };
  // [EdG These higher values are probably not necessary if only used in simple endgames? Just experimenting here, but original code probably correct]

    // Null move search not allowed, try razoring
    else if (   !value_is_mate(beta)
             &&  depth < RazorDepth
             &&  ttMove == MOVE_NONE
             && !isCheck
             // Parentheses added (assumed intent): && binds tighter than ||,
             // so without them the whole condition splits in two at the ||.
             && (ss[ply - 1].reduction || Iteration >= 10)
             &&  approximateEval < beta - RazorApprMargins[int(depth) - 2]
             &&  ss[ply - 1].currentMove != MOVE_NULL
             && !pos.has_pawn_on_7th(pos.side_to_move()))
    {
        // Threshold below which the node is razored
        const Value razorBeta = beta - RazorMargins[int(depth) - 2];

        staticEval = evaluate(pos, ei, threadID);
        if (staticEval < razorBeta)
        {
            // Zero-window quiescence search around the razoring threshold
            Value v = qsearch(pos, ss, razorBeta - 1, razorBeta, Depth(0), ply, threadID);
            if (v < razorBeta)
                return v;
        }
    }
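
One thing worth checking in a long condition like the one above is how a lone `||` interacts with the surrounding `&&` clauses. In C++, `&&` binds tighter than `||`, so `a && r || i && b` parses as `(a && r) || (i && b)`. A minimal sketch with hypothetical boolean stand-ins for the clause groups:

```cpp
// guards        : the clauses before the "reduction || Iteration" pair
// reduced       : stands in for ss[ply - 1].reduction
// lateIteration : stands in for Iteration >= 10
// rest          : the clauses after the pair

// How the compiler groups "guards && reduced || lateIteration && rest".
bool as_parsed(bool guards, bool reduced, bool lateIteration, bool rest) {
    return (guards && reduced) || (lateIteration && rest);
}

// Grouping that treats "reduced || lateIteration" as one clause.
bool grouped(bool guards, bool reduced, bool lateIteration, bool rest) {
    return guards && (reduced || lateIteration) && rest;
}
```

With `guards` false but `lateIteration` and `rest` true, `as_parsed` is true while `grouped` is false. In the first form, guards such as the mate check or the in-check test would be skipped whenever the second half holds.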

mcostalba · Post by **mcostalba** » Tue Nov 17, 2009 5:21 pm

Eelco de Groot wrote:
This is the razoring code in Rainbow Serpent. It differs slightly from that in Stockfish: as far as I know, Stockfish does razor when in check, but RomiChess does not...

I removed that condition in a patch of 18/12/2008, when it seemed to hurt a bit after 500 games, but I didn't log by how much or the test result.

Those were my first attempts and I had not yet developed a proper testing procedure... perhaps it is time to give that condition another spin.

Thanks for pointing this out.

BTW, calling the evaluation just before qsearch seems redundant to me, because the evaluation is the first thing qsearch does, and if it is above the limit it returns immediately. So with your code you end up calling the evaluation a second time without any need and without effect, because it will fail low for sure given the pre-qsearch condition.

Eelco de Groot · Post by **Eelco de Groot** » Tue Nov 17, 2009 6:22 pm

mcostalba wrote:
Eelco de Groot wrote:
This is the razoring code in Rainbow Serpent. It differs slightly from that in Stockfish: as far as I know, Stockfish does razor when in check, but RomiChess does not...

I removed that condition in a patch of 18/12/2008, when it seemed to hurt a bit after 500 games, but I didn't log by how much or the test result.

Those were my first attempts and I had not yet developed a proper testing procedure... perhaps it is time to give that condition another spin.

Thanks for pointing this out.

BTW, calling the evaluation just before qsearch seems redundant to me, because the evaluation is the first thing qsearch does, and if it is above the limit it returns immediately. So with your code you end up calling the evaluation a second time without any need and without effect, because it will fail low for sure given the pre-qsearch condition.

Hi Marco,

Yes, I did not test it in tournaments. My reasoning is more that if, because of being in check, you have to avoid the null move at higher depths (higher than the normal case of not doing a null move at remaining depth == OnePly), it is probably not so safe to razor either. At the moment I do not do a null move in quiescence search, but there, too, a check in a previous position might interfere? Not sure. Usually you want to extend in check positions, so razoring, with a razor depth of at most six plies, seems unsafe then.

But this is geared more towards analysis than Blitz.

The idea for staticEval was, I believe, copied from an idea that Bob mentioned: if the stand pat would return anyway, you now save a function call to qsearch(). In the other case I admit you do a double eval, but chances are high that you will still achieve your reduction, because the razoring condition will be satisfied. I could try to store the staticEval in the transposition table before calling qsearch(), but I don't think that is worth the delay?

A much cleaner idea would be to pass the staticEval as a parameter to qsearch(), but I wanted to save that for a possible optimization later.
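
The idea of passing an already computed static eval into qsearch() can be sketched like this. It is a toy illustration only: `ToyPosition`, the `kNoEval` sentinel and the reduced `qsearch` signature are all hypothetical, and only the stand-pat part of a quiescence search is shown.

```cpp
// Toy position: carries a fixed static score and counts evaluate() calls.
struct ToyPosition {
    int score;
    int evalCalls;
};

int evaluate(ToyPosition& pos) {
    ++pos.evalCalls;
    return pos.score;
}

// Sentinel meaning "no static eval supplied by the caller".
constexpr int kNoEval = 1 << 30;

// Stand-pat part of a quiescence search; capture generation is omitted.
int qsearch(ToyPosition& pos, int alpha, int beta, int knownEval = kNoEval) {
    // Reuse the caller's eval when available instead of evaluating again.
    const int standPat = (knownEval != kNoEval) ? knownEval : evaluate(pos);
    if (standPat >= beta)
        return standPat;
    if (standPat > alpha)
        alpha = standPat;
    // ... captures would be searched here ...
    return alpha;
}
```

A razoring caller that has already computed `staticEval` would then call `qsearch(pos, alpha, beta, staticEval)`, so the position is evaluated once instead of twice while the search result stays the same.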

I hope I am recalling my considerations more or less correctly, but I did think about it a bit! Maybe I got it wrong... These razoring experiments were pretty much the only changes so far to the code in Stockfish; it is just a coincidence that Michael Sherwin mentioned razoring in Stockfish.

The other idea, I believe, is that in the (probably rare) cases where qsearch does not fulfill the razoring conditions, I can still use staticEval for futility pruning.

I had almost forgotten the adjusted alpha and beta parameters for calling qsearch(), which differ from Stockfish, but I think they are important in cases like this where you use a futility margin. Well, I just hope I got that right. Again, I'm not completely sure yet, but it seemed correct in other cases to do it like this.

Regards,
Eelco
