Cache over-writing and PV's

Daniel Mehrmann · Post by **Daniel Mehrmann** » Fri Oct 17, 2014 12:34 pm

hgm wrote:Printing a complete PV as opposed to badly clipping it is another feature that has a huge impact on the usefulness of an engine. There do exist good solutions for that (e.g. the method used in Crafty). But of course they don't provide any Elo. The overhead to bother with saving the PV might even cost 0.1 Elo. So forget about it...

It's probably best to disable all SF output and only send "best move" at the end of a search. I guess they would earn one, two or more ELO points for free.

Don't get me wrong, it takes a lot of hard work to reach the level where SF currently stands. With respect: Congratulations! (No joke)

But for the typical chess player this engine is a nightmare. I'm not talking about your testing work. It's more about PV's and UCI options as well.

The user doesn't need an engine which reaches depth 20 or higher in a few seconds with a lot of PV and score changes. The user doesn't need UCI search-related options like "split depth" and so on...

What the fu** is "split depth" or "slaves" ??

Last but not least, you should follow the UCI protocol instead of going your own way (example: playing strength).

I could write so much more, but I don't wanna help you with your "ELO" race.
In fact, if people ask me which engine they should use to analyze games, and they do ask, my suggestion is Komodo or Houdini, and to stay away from SF.

hgm · Post by **hgm** » Fri Oct 17, 2014 1:52 pm

xmas79 wrote: I do not like people who start "offending" other people when they have no arguments, because there are absolutely none in this particular situation...

Oh well, there are one or two people here that seem to think any form of disagreement needs at least one nasty ad-hominem remark to be taken seriously. You will get used to it!

bob · Post by **bob** » Fri Oct 17, 2014 2:33 pm

syzygy wrote:
gladius wrote:
hgm wrote:Note that the design goal of Stockfish is not to be a good or useful engine. Its goal is to optimize beating a small set of opponents in bullet games under conditions where hash replacement virtually does not occur. Any patches that would improve its hashing, or solve the problems that you are experiencing, would be instantly rejected, as they would not cause a higher win rate in the bullet games.
This is just not true. The goal is indeed to be a "good and useful" engine. The *means* to achieve this is indeed by self-testing in bullet matches most of the time. This has proven to be a very effective strategy for improving strength. Changes that affect hash strategy are certainly tested at hash levels where replacement plays a major role (4mb hash at 60s).
I'm still waiting for any of the critics to explain how they are successfully using 10-minute searches to improve their engines. Those engines must be really excellent at playing long time controls! I'm sure they'd blow SF away.

Oh wait...

Now there's an argument.

But SF testing is far from perfect and has too many regression errors. Any time you see a comment "you should only test the same idea 3-4 times max, as otherwise you will likely get a false positive", it makes you wonder.... or it should. The early SPRT terminations make testing go faster, but at a significant cost.

And not all modifications work well at bullet. Some only show a gain at longer time controls. So that IS a valid criticism to make for anyone that uses bullet games in the wrong conditions.

Zenmastur · Post by **Zenmastur** » Fri Oct 17, 2014 3:43 pm

Stan Arts wrote:This second cache is already very common. The classical approach is a depth and exact score (possibly PV nodes) preferred part of the table, that eventually saves a lot of interesting nodes that took a lot of work high up the tree.
And an always replace part that simply stores every node searched. That gives you a lot of local hits or stuff deep down the tree.

Might not be just a lack of memory, as a program like Stockfish does a lot of funky stuff depending on move ordering, and so re-searching with hash information is sometimes going to have you search a different search space. Just a guess.

Hmmm... common huh...
Too bad there isn't a list that users could use to determine which programs have which features etc., to make better informed decisions about which program would be best for a particular purpose.

As far as the cache getting flushed, I highly doubt this is the cause, since I rely on the fact that it doesn't get flushed between positions. SF and many other programs I have used will, on most occasions, pick up a mate that is in 15+ moves at ply one and immediately (20 milliseconds or so) advance to iteration 30-60 (depending on the depth of search of the previous position) when I back up one, two or even four plies in a line of analysis I'm working on. This is only possible if the cache hasn't been flushed.

So I don't think this is the problem.

xmas79 wrote:
Aren't you using "stockfish"? Why did you choose stockfish? Is it a random choice, or have you pondered on it and then picked one of the strongest chess programs in the world?

Well, we clearly think about programs differently. I blow by all the hyperbole about different programs. I evaluate them based on whether they meet my particular needs. A lot of the time I use Crafty because it has a lot of utilities that I find very useful.

Originally I wasn't using Stockfish. I had tried it and it absolutely sucked rocks for finding mates. Then I read a post that said this had been fixed, so I downloaded the latest version and tried it. It had, in fact, been fixed. Not only that, but it was faster and was finding deeper mates than the program I was using. Since it's also a very strong OTB program I switched. Nothing magical, it just does a better job than some other programs do.

hgm wrote:Most programs use some sort of depth-preferred hashing, which would be very reluctant to overwrite results of very deep searches. So it would indeed be very fishy when, after having searched all daughter positions to N ply and found mate in at most M there, searching the original position would not instantly in all iterations up to N+1 report mate in M+1. The depth-N entries of the daughter nodes cannot possibly have been overwritten, as no search deeper than N has been performed, and it is pretty inconceivable that some 50 moves in a table of billions would collide with each other. (And even that should not be a problem in most hash designs, unless more than 3 of them mapped into the same bucket...)

I'm not sure I completely understand your point. 8 GB of cache is 512 million entries. At 10 million nodes per second (assume 10 million cache writes are executed; I know this is a large overestimation, but bear with me for a minute) we have about 50 seconds before the cache starts being overwritten. In a five-day-long search (5*24*60*60/50 = 8640) the cache could on average be overwritten as many as 8640 times. Now let's assume I overestimated by a factor of 10. Has anything changed that would change this argument? Most of the cache will be overwritten.
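For what it's worth, that estimate is easy to check. A minimal sketch, assuming 16-byte entries (which is what 8 GB for 512 million entries works out to) and one table write per node searched; `table_turnovers` is just an illustrative name, not anything from an engine:

```python
def table_turnovers(hash_bytes, entry_bytes, writes_per_sec, search_secs):
    """Return (entries, seconds to write the table once, number of turnovers)."""
    entries = hash_bytes // entry_bytes
    fill_secs = entries / writes_per_sec      # time for as many writes as there are entries
    return entries, fill_secs, search_secs / fill_secs


FIVE_DAYS = 5 * 24 * 60 * 60                  # seconds

# Assumed figures from the post: 8 GB table, 10 million writes per second.
entries, fill_secs, turnovers = table_turnovers(8 * 2**30, 16, 10_000_000, FIVE_DAYS)

# Same table, write rate overestimated by a factor of 10.
_, slow_fill_secs, slow_turnovers = table_turnovers(8 * 2**30, 16, 1_000_000, FIVE_DAYS)

print(entries, round(fill_secs, 1), round(turnovers))
print(round(slow_fill_secs, 1), round(slow_turnovers))
```

With these assumptions the table takes roughly 54 seconds to fill and sees about 8,000 tables' worth of writes in five days, and still about 800 if the write rate is ten times lower. This only counts writes; which entries get replaced depends on the replacement scheme.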

hgm wrote:So this likely has to do with undesirable flushing of the hashed tree of previous moves. E.g. after searching the position after g1f3, taking it back, and trying g1e2, the engine might figure that the position after g1f3 now cannot occur anymore, as you already moved the Knight elsewhere, and thus consider all entries belonging to that search now worthless, no matter how deep a search they were from, and overwrite them. In a game this would of course be true, but in interactive analysis it can be very counter-productive, dooming your approach.

My experience is that this either doesn't happen at all or happens only rarely and I use this fact to do a lot of my deep analysis.

hgm wrote: Note that the design goal of Stockfish is not to be a good or useful engine. Its goal is to optimize beating a small set of opponents in bullet games under conditions where hash replacement virtually does not occur. Any patches that would improve its hashing, or solve the problems that you are experiencing, would be instantly rejected, as they would not cause a higher win rate in the bullet games.

jdart wrote:Stockfish doesn't exclusively use bullet games for testing. Still, you are right it is probably not tuned for very deep long time control searches.

--Jon

I have noticed that some developers are solely interested in the engine's performance in the next TCEC or other tournaments. This is unfortunate. On the other hand, I haven't seen a great deal of evidence that testing at short time controls hurts long time control performance. I have seen several posts that imply this is true, but I'm not convinced this is the case.

gladius wrote: This is just not true. The goal is indeed to be a "good and useful" engine. The *means* to achieve this is indeed by self-testing in bullet matches most of the time. This has proven to be a very effective strategy for improving strength. Changes that affect hash strategy are certainly tested at hash levels where replacement plays a major role (4mb hash at 60s).

You are also wrong about such a fix being "instantly rejected". Where does this extremely pessimistic view of SF development come from?

Yes, the main goal is definitely to increase ELO, but usability of the engine is also a big goal. For example, the issue with SF having a lot of trouble finding mates in SF5 was fixed recently, and was an ELO neutral fix at best.

I can attest to the fact that this was a MAJOR improvement and is why I started using Stockfish. While it may have been ELO neutral, it made a huge difference in its overall usefulness for analysis.

gladius wrote: I took a look at the search output, and I think everything looks okay actually. SF is not actually losing the mate, it's finding shorter and shorter mates as it goes deeper. You should ignore the fail high/low output of SF along the way (ie. looks like 62/75- or 62/75+ instead of just 62/75), if we remove that your search log looks like this:

I wish I would have saved the analysis that I let run to iteration 70, but there is no use crying over spilt milk. I don't have the time to let it run to that depth again. From what I remember, it didn't settle on a single move like Nc6 or Bf5. What it was doing was finding a mate in 24, say -Bf5 then failing low a bunch of times until Bf5 was no longer showing a mating score, then finding a mate in 24 -Nd7 and then failing low, then finding a mate in 24 -Nc6, then failing low, then finding a mate in 24 -a6, -a5 then the king moves etc. I actually ran this search to depths greater than 60 iterations 5 or 6 times. All of them did this without exception.

gladius wrote:The fail high/low output is the result of an incomplete search, it's just meant to give an idea of what the engine is working on - not what its current best score/line is. Once the fail high/low has been resolved, you get the actual output, and SF sticks to the mate in its main lines.

...
31/47 00:03.460 24,213k 6,998k +59.75 43. ... Kc6 44.Be6 Na6 45.f5 Kc7 46.fxg6 hxg6 47.Rg4 Nc5 48.Rc4 Kd6 49.Rxc5 Kxc5 50.h4 Kd6 51.Bf7 g5 52.hxg5 Ke7 53.g6 Kf8 54.a5 Kg7 55.Kb4 Kf8 56.a6 Kg7 57.a7 Kf6 58.a8Q Ke7 59.g7 Kxf7 60.g8Q+ Kf6 61.Qh8+ Kg6
32/47 00:03.517 24,595k 6,993k +59.75 43. ... Kc6 44.Be6 Na6 45.f5 Kc7 46.fxg6 hxg6 47.Rg4 Nc5 48.Rc4 Kd6 49.Rxc5 Kxc5 50.h4 Kd6 51.Bf7 g5 52.hxg5 Ke7 53.g6 Kf8 54.a5 Kg7 55.Kb4 Kf8 56.a6 Kg7 57.a7 Kf6 58.a8Q Ke7 59.g7 Kxf7 60.g8Q+ Kf6 61.Qh8+ Kg6
33/47 00:03.668 25,642k 6,991k +59.75 43. ... Kc6 44.Be6 Na6 45.f5 Kc7 46.fxg6 hxg6 47.Rg4 Nc5 48.Rc4 Kd6 49.Rxc5 Kxc5 50.h4 Kd6 51.Bf7 g5 52.hxg5 Ke7 53.g6 Kf8 54.a5 Kg7 55.Kb4 Kf8 56.a6 Kg7 57.a7 Kf6 58.a8Q Ke7 59.g7 Kxf7 60.g8Q+ Kf6 61.Qh8+ Kg6
34/47 00:03.728 26,085k 6,997k +59.75 43. ... Kc6 44.Be6 Na6 45.f5 Kc7 46.fxg6 hxg6 47.Rg4 Nc5 48.Rc4 Kd6 49.Rxc5 Kxc5 50.h4 Kd6 51.Bf7 g5 52.hxg5 Ke7 53.g6 Kf8 54.a5 Kg7 55.Kb4 Kf8 56.a6 Kg7 57.a7 Kf6 58.a8Q Ke7 59.g7 Kxf7 60.g8Q+ Kf6 61.Qh8+ Kg6 62.Qc6+
35/51 00:13.651 101,418k 7,429k +M29 43. ... h6 44.Be6 Be8 45.f5 Nc6 46.f6 Ne5 47.Re4 Ng6 48.Bf5 Bf7 49.Bxg6 Bxg6 50.Rg4 Bh5 51.Rh4 Bf7 52.Rxh6 Kd6 53.Rh7 Bg6 54.Ra7 Ke6
36/51 00:30.541 227,743k 7,457k +M28 43. ... h5 44.Rd5 Be8 45.Bb5 Bf7 46.Rf5 Be6 47.Rxh5 Bg4 48.Rh7+ Kb6 49.Rh6+ Ka5 50.Rh8 Nd7 51.Bxd7 Bxd7 52.Rh7 Be6 53.f5 Bg8 54.Re7 Kxa4 55.Rg7 Bd5 56.f6 Kb5 57.Rg5 Kc6 58.Rxd5
37/51 00:31.176 232,078k 7,444k +M28 43. ... h5 44.Rd5 Be8 45.Bb5 Bf7 46.Rf5 Be6 47.Rxh5 Bg4 48.Rh7+ Kb6 49.Rh6+ Ka5 50.Rh8 Nd7 51.Bxd7 Bxd7 52.Rh7 Be6 53.f5 Bg8 54.Re7 Kxa4 55.Rg7 Bd5 56.f6 Kb5 57.Rg5 Kc6 58.Rxd5
38/51 00:35.573 262,931k 7,391k +M28 43. ... h5 44.Rd5 Be8 45.Bb5 Bf7 46.Rf5 Be6 47.Rxh5 Bg4 48.Rh7+ Kd8 49.a5 Bf5 50.Rh8+ Kc7 51.Rh6 Kb7 52.Rf6 Be4 53.a6+ Nxa6 54.Bxa6+ Kc7 55.Bd3 Bf3 56.f5 Kd8 57.Rh6 Kc7 58.f6 Bd5 59.Bc4 Bxc4 60.Kxc4 Kd6 61.Rh7 Ke6 62.f7 Ke7 63.Kd5 Kf8 64.Kc5 Ke7 65.f8R+ Kxf8 66.Ra7
39/51 00:41.484 323,672k 7,802k +M28 43. ... h5 44.Rd5 Be8 45.Bb5 Bf7 46.Rf5 Be6 47.Rxh5 Bg4 48.Rh7+ Kd8 49.a5 Bf5 50.Rh8+ Kc7 51.Rh6 Kb7 52.Rf6 Be4 53.a6+ Nxa6 54.Bxa6+ Kc7 55.Bd3 Bf3 56.f5 Kd8 57.Rh6 Kc7 58.f6 Bd5 59.Bc4 Bxc4 60.Kxc4 Kd6 61.Rh7 Ke6 62.f7 Ke7 63.Kd5 Kf8 64.Kc5 Ke7 65.f8R+ Kxf8
40/51 00:57.943 482,098k 8,320k +M28 43. ... h5 44.Rd5 Be4 45.Rxh5 Nd7 46.a5 Kb7 47.Rh6 Nf8 48.Rf6 Nd7 49.a6+ Ka8 50.Re6 Bh1 51.Re8+ Ka7 52.Re7 Bc6 53.f5 Kb6 54.Rxd7 Bxd7 55.f6 Be8
41/54 01:01.566 518,168k 8,416k +M28 43. ... h5 44.Rd5 Be4 45.Rxh5 Nd7 46.a5 Kb7 47.Rh6 Nf8 48.Rf6 Nd7 49.a6+ Ka8 50.Re6 Bf3 51.Re8+ Ka7 52.Re7 Bc6 53.f5 Kb6 54.Rxd7 Bxd7 55.f6 Be8
42/56 01:16.226 664,758k 8,721k +M28 43. ... h5 44.Rd5 Be4 45.Rxh5 Nd7 46.a5 Kb7 47.Rh6 Nf8 48.Rf6 Nd7 49.a6+ Ka8 50.Re6 Bf3 51.Re8+ Ka7 52.Re7 Bc6 53.f5 Kb6 54.Rxd7 Bxd7 55.f6 Be8
43/56 01:55.052 1,054,333k 9,164k +M28 43. ... Bf5 44.Rd5 Be4 45.Re5 Bg2 46.Re2
44/57 02:03.174 1,142,733k 9,277k +M28 43. ... Bf5 44.Rd5 Be4 45.Re5 Bg2 46.Re2
45/57 02:06.098 1,173,265k 9,304k +M28 43. ... Bf5 44.Rd5 Be4 45.Re5 Bg2 46.Re2
46/57 02:34.998 1,454,168k 9,382k +M28 43. ... Bf5 44.Rd5 Be4 45.Re5 Bg2 46.Re2
47/57 03:03.619 1,748,551k 9,523k +M28 43. ... Bf5 44.Rd5 Be4 45.Re5 Bg2 46.Re2
48/57 03:12.095 1,840,669k 9,582k +M28 43. ... Bf5 44.Rd5 Be4 45.Re5 Bg2 46.Re2
49/57 04:05.105 2,391,383k 9,757k +M28 43. ... Bf5 44.Rd5 Be4 45.Re5 Bg2 46.Re2
50/57 04:46.988 2,829,157k 9,858k +M28 43. ... Bf5 44.Rd5 Be4 45.Re5 Bg2 46.Re2
51/57 06:50.326 4,074,104k 9,929k +M28 43. ... Bf5 44.Rd5 Be4 45.Re5 Bg2 46.Re2
52/67 48:36.682 29,443,642k 10,095k +M26 43. ... Bf5 44.Rd5 Be4 45.Re5 Bg2 46.Re2 Bf3 47.Re7+ Kd6 48.Rxh7 Nd7 49.a5 Be4 50.Rh6+ Kc5 51.Rh5+ Kd6 52.Kd4 Bf3 53.Rh6+ Kc7
53/71 5:06:52.470 185,058,793k 10,051k +M26 43. ... Bf5 44.Rd5 Be4 45.Re5 Bg2 46.Re2 Bf3 47.Re7+ Kd6 48.Rxh7 Nd7 49.a5 Be4 50.Rh6+ Kc5 51.Rh5+ Kd6
54/71 7:04:25.286 256,362,361k 10,067k +M24 43. ... Bf5 44.Rd5 Bg6 45.f5 Bh5 46.f6 Bg6 47.Rd4 Nc6 48.f7 Bxf7 49.Bxf7 Ne5 50.Bh5 Kb6 51.Rd5 Nc6 52.Rd6 Kc7 53.Rxc6+ Kxc6
55/71 7:33:53.106 274,534,343k 10,081k +M24 43. ... Bf5 44.Rd5 Bg6 45.f5 Bh5 46.f6 Bg6 47.Rd4 Nc6 48.f7 Bxf7 49.Bxf7 Ne5 50.Bh5 Kb6 51.Rd5 Nc6 52.Rd6 Kc7 53.Rxc6+ Kxc6 54.Kd4 Kc7 55.Bf7
56/75 25:07:30.658 899,477,435k 9,944k +M24 43. ... Bf5 44.Rd5 Bg6 45.f5 Bh5 46.f6 Bg6 47.Rd4
57/75 30:12:54.512 1,086,334,318k 9,987k +M24 43. ... Nc6 44.Rd5 Ne7 45.Re5
58/75 35:11:25.737 1,269,825,516k 10,023k +M24 43. ... Nc6 44.Rd5 Ne7 45.Re5
59/75 38:14:36.866 1,386,206,840k 10,069k +M24 43. ... Nc6 44.Rd5 Ne7 45.Re5
60/75 43:51:45.495 1,598,624,748k 10,124k +M24 43. ... Nc6 44.Rd5 Ne7 45.Re5
61/75 51:01:08.022 1,866,180,440k 10,161k +M24 43. ... Nc6 44.Rd5 Ne7 45.Re5

Since I downloaded this version it has been running 24/7 on my machines, generally multiple instances per machine. I've become quite familiar with the way it behaves. I can tell when it has a solid line of play and when it is going to change its PV, sometimes hours in advance of when the change actually happens. To be honest, I have the code but haven't looked at it, so I don't know much about its internal workings. But I have analyzed with it enough to have a very good idea when it's "happy" with the PV it's displaying.

When it outputs a full-length PV it's generally OK with the line. If on subsequent iterations it clips even one move from the PV, it means there is something wrong with the PV. If left running, the PV is very likely to change. It may take multiple iterations. The score may or may not change. When it starts clipping multiple moves from a formerly full-length PV, it means something major is wrong with the PV: it is going to change, and it's probably going to change sooner rather than later. When I let it run to 70 iterations, the PV for the last three iterations was a single move, Nd7 if I recall, and it had been failing low just before this. This is a sure sign that the program is going to change both the PV and its score. How many iterations that would take, I have no clue. A single iteration could take less time than the last iteration took, or it could take 100-200 times as long. One thing is for sure: if time is limited, the last thing you want to see is the program chop the PV to one move after failing low and then repeat it for several iterations, because it means the move and its current score are probably worthless.

Picking up on these cues is a great help during analysis because it lets me know when to let it run longer on a position or when to move to the next or previous position in the line of play. This is one reason I have grown to like SF so much in the short time I've been using it. Being able to read cues in the program's output makes it perfect for correspondence games. It saves tons of time.

syzygy wrote: I'm still waiting for any of the critics to explain how they are successfully using 10-minute searches to improve their engines. Those engines must be really excellent at playing long time controls! I'm sure they'd blow SF away.

Oh wait...

I think everybody understands what you are saying, though some don't want to admit it.

xmas79 wrote: Oh boy, a search at depth 61 with only... one..two..three...mmmhh four... wow.. four moves, which should have 48 moves to show the +M24 REALLY looks OK to you?? Please... As a programmer I can understand all the whole output (including fail high and fail low which are a fundamental part of the search) and agree with you that everything actually IS ok (since the engine will play the correct move anyway), but from a pure user point of view (which seems to me the OP POV), this is a HUGE problem IMHO.

Just my 2 cents,
Natale.

As I stated above, it's been my experience that when SF starts chopping its PV, something in the milk isn't white.

hgm wrote:Printing a complete PV as opposed to badly clipping it is another feature that has a huge impact on the usefulness of an engine. There do exist good solutions for that (e.g. the method used in Crafty). But of course they don't provide any Elo. The overhead to bother with saving the PV might even cost 0.1 Elo. So forget about it...

I don't know if I agree 100%, but I do know that if you're dredging your cache for the PV and the time control is long enough that major portions of the cache are overwritten before the move is made or analysis stopped, the last PV output is likely to be trashed. This makes it completely unreliable and is a nightmare as far as time is concerned. I save the PVs and then use the engine to search each and every move in them to ferret out all the trash moves. I can't recall a single case where the PV was accurate. I always find errors and the evaluations always change. This is a huge waste of time. Not all of this time is wasted, but a major fraction of it is. The really bad part is that it takes tons of manual intervention to perform these tasks. This is the real killer. Making even marginal improvements to the accuracy of the PVs that are output may not add any ELO to the program for OTB play, but it would save tons of time during analysis.
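To make the failure mode concrete, here is a minimal sketch of dredging a PV out of a hash table. It is a toy under stated assumptions: a position is represented by the tuple of moves played from the root (a stand-in for a real hash key), the table is a plain dict from position to best move, and `dredge_pv` is a hypothetical name. The first line of moves is taken from the search log above; the replacement move `Rd4` is made up for illustration.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, ...]   # moves played from the root (toy stand-in for a hash key)


def dredge_pv(table: Dict[Position, str], root: Position = (), max_len: int = 64) -> List[str]:
    """Follow stored best moves from the root until an entry is missing."""
    pv: List[str] = []
    pos = root
    while len(pv) < max_len and pos in table:
        move = table[pos]
        pv.append(move)
        pos = pos + (move,)  # "make" the move to reach the next position
    return pv


line = ["Bf5", "Rd5", "Be4", "Re5", "Bg2", "Re2"]
table = {tuple(line[:i]): line[i] for i in range(len(line))}

full_pv = dredge_pv(table)

# An unrelated store evicts the entry three plies down: the PV is clipped there.
del table[tuple(line[:3])]
clipped_pv = dredge_pv(table)

# A later search stores a different move at ply 1: the walk now follows
# a line that never produced the score reported at the root.
table[("Bf5",)] = "Rd4"
diverged_pv = dredge_pv(table)

print(full_pv, clipped_pv, diverged_pv)
```

A missing entry only shortens the line, while a replaced entry silently changes it, and nothing in the walk can tell the difference between a replaced entry and the original one.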

Regards,

Forrest

Joerg Oster · Post by **Joerg Oster** » Fri Oct 17, 2014 5:19 pm

gladius wrote:
hgm wrote:Note that the design goal of Stockfish is not to be a good or useful engine. Its goal is to optimize beating a small set of opponents in bullet games under conditions where hash replacement virtually does not occur. Any patches that would improve its hashing, or solve the problems that you are experiencing, would be instantly rejected, as they would not cause a higher win rate in the bullet games.
This is just not true. The goal is indeed to be a "good and useful" engine. The *means* to achieve this is indeed by self-testing in bullet matches most of the time. This has proven to be a very effective strategy for improving strength. Changes that affect hash strategy are certainly tested at hash levels where replacement plays a major role (4mb hash at 60s).

You are also wrong about such a fix being "instantly rejected". Where does this extremely pessimistic view of SF development come from?

Yes, the main goal is definitely to increase ELO, but usability of the engine is also a big goal. For example, the issue with SF having a lot of trouble finding mates in SF5 was fixed recently, and was an ELO neutral fix at best.

Hi Gary,

sorry, but I must disagree.
There are many examples of patches that didn't make it into the official branch, only because they don't gain elo. For example KBBK detection of draws with bishops on the same color, 3-fold rep patch, better verification search, etc. But they would be quite useful in analysis!

hgm · Post by **hgm** » Fri Oct 17, 2014 8:05 pm

Zenmastur wrote:I'm not sure I completely understand your point. 8 GB of cache is 512 million entries. At 10 million nodes per second (assume 10 million cache writes are executed; I know this is a large overestimation, but bear with me for a minute) we have about 50 seconds before the cache starts being overwritten. In a five-day-long search (5*24*60*60/50 = 8640) the cache could on average be overwritten as many as 8640 times. Now let's assume I overestimated by a factor of 10. Has anything changed that would change this argument? Most of the cache will be overwritten.

The point is that any engine more sophisticated than micro-Max is selective in what it overwrites, and (for instance) always overwrites the least deep of each group of 4 hash entries. This should save the complete search tree up to the depth that still fits in 3GB (3/4 of the hash size) basically forever, no matter how many days you continue searching. All those billions and billions of nodes further from the root will have to fight for the remaining 1GB. They have no chance to ever force out something that was closer to the root. So in particular the 40 moves at ply 1 should be safe from overwriting. They would be safe even with a 16KB hash table.
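A minimal sketch of that replacement rule, assuming buckets of 4 and "always overwrite the shallowest entry in the bucket". The class and field names are made up for illustration, and the handling of a position that is already stored is an assumption of this sketch, not something taken from any particular engine:

```python
from dataclasses import dataclass
from typing import List, Optional

BUCKET_SIZE = 4  # entries that share one table index, as in the post


@dataclass
class Entry:
    key: int     # full hash key of the position
    depth: int   # depth of the search that produced this entry
    move: str    # best move found


class BucketTable:
    def __init__(self, num_buckets: int) -> None:
        self.buckets: List[List[Optional[Entry]]] = [
            [None] * BUCKET_SIZE for _ in range(num_buckets)
        ]

    def store(self, key: int, depth: int, move: str) -> None:
        bucket = self.buckets[key % len(self.buckets)]
        # Same position already stored: keep whichever result is deeper (assumption).
        for i, entry in enumerate(bucket):
            if entry is not None and entry.key == key:
                if depth >= entry.depth:
                    bucket[i] = Entry(key, depth, move)
                return
        # Otherwise use an empty slot if there is one.
        for i, entry in enumerate(bucket):
            if entry is None:
                bucket[i] = Entry(key, depth, move)
                return
        # Bucket full: always overwrite the least deep of the 4 entries.
        victim = min(range(BUCKET_SIZE), key=lambda i: bucket[i].depth)
        bucket[victim] = Entry(key, depth, move)

    def probe(self, key: int) -> Optional[Entry]:
        for entry in self.buckets[key % len(self.buckets)]:
            if entry is not None and entry.key == key:
                return entry
        return None


# One bucket only, so every store collides: three deep entries, then a flood of shallow ones.
table = BucketTable(num_buckets=1)
for key, depth in ((1, 40), (2, 39), (3, 38)):
    table.store(key, depth, f"deep{key}")
for key in range(100, 1100):
    table.store(key, depth=key % 5 + 1, move="shallow")

survivors = sorted(e.key for e in table.buckets[0] if e is not None and e.key < 100)
print(survivors)
```

Flooding the bucket with shallow entries only ever churns the fourth slot, so the three deepest entries per bucket survive however long the search runs.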

Zenmastur wrote:On the other hand, I haven't seen a great deal of evidence that testing at short time controls hurts long time control performance. I have seen several posts that imply this is true, but I'm not convinced this is the case.

I never implied that the time control mattered. What I wanted to point out is that when win rate in games becomes the only criterion, general quality will badly suffer. Why repair it when an undo command immediately crashes the engine? There are no undo commands in games... It is easily calculated that stupid behavior that occurs so frequently it sticks out like a mountain might not result in any measurable Elo loss.

Zenmastur wrote:I don't know if I agree 100%, but I do know that if you're dredging your cache for the PV and the time control is long enough that major portions of the cache are overwritten before the move is made or analysis stopped, the last PV output is likely to be trashed.

This is exactly why a quality engine does not dredge the PV out of the hash. (I did that in Joker, and usually the PV is completely different from the one that actually produced the score. E.g. a PV that ends in a checkmate, while the score is not a mate score.)

The more reliable way is the 'triangular-array method'. The problem there is that it suffers from hash cuts, which is why some engines do not allow hash cuts in PV nodes. Crafty does allow the hash cuts, but has a separate hash table to hash complete PVs, so the tail of the PV can be restored on a hash cutoff. (And if that is overwritten, there is of course no hash cut.)
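For readers who have not seen it, a minimal sketch of the triangular-array idea on a toy game tree. Assumptions: the tree is a nested dict of move names with integer leaf scores from the side to move's point of view, there is no hash table at all, and the names (`pv_table`, `pv_length`, `negamax`) are illustrative rather than Crafty's.

```python
MAX_PLY = 8

# Row `ply` holds the best line found from the node at that ply;
# only columns ply .. pv_length[ply]-1 of a row are meaningful.
pv_table = [[None] * MAX_PLY for _ in range(MAX_PLY)]
pv_length = [0] * MAX_PLY


def negamax(node, ply, alpha, beta):
    pv_length[ply] = ply                  # no PV found at this node yet
    if isinstance(node, int):             # leaf: static score for the side to move
        return node
    for move, child in node.items():
        score = -negamax(child, ply + 1, -beta, -alpha)
        if score >= beta:
            return beta                   # fail high: this node is not on the PV
        if score > alpha:
            alpha = score
            pv_table[ply][ply] = move     # new best move at this ply ...
            for i in range(ply + 1, pv_length[ply + 1]):
                pv_table[ply][i] = pv_table[ply + 1][i]   # ... followed by the child's line
            pv_length[ply] = pv_length[ply + 1]
    return alpha


# Toy tree, three plies deep (move names are arbitrary).
tree = {
    "a": {"a1": {"x": 3, "y": 5}, "a2": {"x": -2, "y": 4}},
    "b": {"b1": {"x": 1, "y": 0}, "b2": {"x": 6, "y": 7}},
}

score = negamax(tree, 0, -1000, 1000)
pv = pv_table[0][:pv_length[0]]
print(score, pv)
```

Each ply owns one row; when a move raises alpha, the node writes that move and copies the child's row behind it. A node that returns early on a hash cut never fills its row, which is where the clipping problem with hash cuts comes from.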

Stan Arts · Post by **Stan Arts** » Fri Oct 17, 2014 8:59 pm

hgm wrote: This is exactly why a quality engine does not dredge the PV out of the hash. (I did that in Joker, and usually the PV is completely different from the one that actually produced the score. E.g. a PV that ends in a checkmate, while the score is not a mate score.)

That sounds pretty bad/buggy. What caused that?

Because my hash dredged PV's have a pretty high accuracy rate.

Could it have something to do with your replacement scheme and the order in which you retrieved moves from which part of the hash?

The biggest, in fact the only, problem I have is with repetitions: my program allows a repetition but will play something else the third time. Of course the hash PV can't diverge, and then it shows a faulty rep-draw PV. Then again, it doesn't go 0.00 there like an idiot, while that seems to be the norm.

Come to think of it I could fix the faulty rep thing by "guessing" a best move/looking what else is available in hash when score isn't 0.00. Well, or a hash triangle thing! Hmm.

Adam Hair · Post by **Adam Hair** » Fri Oct 17, 2014 9:10 pm

bob wrote:
syzygy wrote:
gladius wrote:
hgm wrote:Note that the design goal of Stockfish is not to be a good or useful engine. Its goal is to optimize beating a small set of opponents in bullet games under conditions where hash replacement virtually does not occur. Any patches that would improve its hashing, or solve the problems that you are experiencing, would be instantly rejected, as they would not cause a higher win rate in the bullet games.
This is just not true. The goal is indeed to be a "good and useful" engine. The *means* to achieve this is indeed by self-testing in bullet matches most of the time. This has proven to be a very effective strategy for improving strength. Changes that affect hash strategy are certainly tested at hash levels where replacement plays a major role (4mb hash at 60s).
I'm still waiting for any of the critics to explain how they are successfully using 10-minute searches to improve their engines. Those engines must be really excellent at playing long time controls! I'm sure they'd blow SF away.

Oh wait...
Now there's an argument.

But SF testing is far from perfect and has too many regression errors. Any time you see a comment "you should only test the same idea 3-4 times max, as otherwise you will likely get a false positive", it makes you wonder.... or it should. The early SPRT terminations make testing go faster, but at a significant cost.

I am not sure that I understand what you mean by "significant cost". I may be misremembering, but I think that the SPRT bounds (which are used to determine when to pass or reject a change) have been set to hold both false positives and negatives to 5%.
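For reference, a minimal sketch of how such stopping bounds follow from the error rates in Wald's SPRT, assuming alpha = beta = 0.05 as recalled above. How the log-likelihood ratio itself is computed from game results is left out, and the function names are hypothetical:

```python
import math


def sprt_bounds(alpha: float = 0.05, beta: float = 0.05):
    """Wald's stopping bounds on the log-likelihood ratio (LLR).

    alpha: accepted false-positive rate (passing a change that does not help)
    beta:  accepted false-negative rate (rejecting a change that does help)
    """
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    return lower, upper


def sprt_decision(llr: float, alpha: float = 0.05, beta: float = 0.05) -> str:
    lower, upper = sprt_bounds(alpha, beta)
    if llr >= upper:
        return "pass"
    if llr <= lower:
        return "reject"
    return "continue"


lower, upper = sprt_bounds()
print(round(lower, 3), round(upper, 3))
print(sprt_decision(1.2), sprt_decision(3.1), sprt_decision(-3.0))
```

With 5% on both sides the test stops once the log-likelihood ratio leaves roughly the interval (-2.94, +2.94); until then more games are played.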

bob wrote: And not all modifications work well at bullet. Some only show a gain at longer time controls. So that IS a valid criticism to make for anyone that uses bullet games in the wrong conditions.

syzygy · Post by **syzygy** » Fri Oct 17, 2014 9:12 pm

xmas79 wrote:
hgm wrote:Printing a complete PV as opposed to badly clipping it is another feature that has a huge impact on the usefulness of an engine. There do exist good solutions for that (e.g. the method used in Crafty). But of course they don't provide any Elo. The overhead to bother with saving the PV might even cost 0.1 Elo. So forget about it...
I have no problem with that at all, I know it is ok for a developer, but it is "crystal clear" that it is NOT for a user. And that should also be crystal clear to people that take up the cudgels on stockfish. I do not like people who start "offending" other people when they have no arguments, because there are absolutely none in this particular situation...

"Oh boy", seems I got you upset.

syzygy · Post by **syzygy** » Fri Oct 17, 2014 9:35 pm

Zenmastur wrote:I don't know if I agree 100%, but I do know that if you're dredging your cache for the PV and the time control is long enough that major portions of the cache are overwritten before the move is made or analysis stopped, the last PV output is likely to be trashed. This makes it completely unreliable and is a nightmare as far as time is concerned.

Clipping of the PV in SF does not make the non-clipped part unreliable.

Zenmastur wrote:I save the PV's and then use the engine to search each and every move in it to ferret out all the trash moves. I can't ever recall a case where the PV was accurate. I always find errors and the evaluations always change.

Sure, but what do you expect?

If the root position is searched to depth 40, then the position after 20 half moves was searched only to depth 20. That's just how things work.

Even if you get SF (or any other engine) to return a long PV, putting a lot of trust in the second half of the PV isn't very wise.
