Don wrote: So we tried being a bit more conservative with this version (with LMR specifically) in order to try to understand what was going on. When we did this, the program gets weaker and the time/Elo scalability does not improve. It's just a slower program that does not scale as well.
It is incredible that, although universally used, LMR is still unknown territory in many ways. The scalability of LMR in particular is still far from being completely understood.
I cannot help but wonder if the evaluation function is closely related to how aggressive you can be with LMR. If so I suspect it's not as simple as just how good the evaluation function is. I'm not sure the quality of the evaluation function is a one dimensional thing anyway. In fact I do not believe you can characterize chess programs simply by considering "search" and "evaluation" as 2 totally independent components.
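For readers less familiar with the technique under discussion, the basic LMR idea can be sketched as follows. This is a simplified illustration of the general principle, not Stockfish's or Komodo's actual code; the reduction formula and the thresholds are invented for the example, and the `aggressive` flag stands in for the kind of tuning knob Don describes:

```python
import math

def lmr_reduction(depth, move_number, aggressive=True):
    """Reduction (in plies) for a quiet move searched late in the move list.
    Illustrative formula only; real engines tune this empirically."""
    if depth < 3 or move_number < 4:
        return 0  # first few moves and shallow nodes get full depth
    base = math.log(depth) * math.log(move_number) / 2.0
    if not aggressive:
        base *= 0.7  # a more conservative tuning, as discussed above
    return int(base)
```

The point of the thread is that whether the aggressive or the conservative setting scales better with time is exactly what is hard to predict, and may interact with the evaluation.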
Just as an example: if you could build a chess program by plugging in separate components from various programs, could you produce a stronger program? For example, if you used Rybka's search and Stockfish's evaluation, or vice versa, I think the resulting program would be weaker than both programs.
Jim Ablett wrote: Stockfish 1.9 JA by the Stockfish team.
We have mainly removed evaluation terms that proved to be almost
useless. We now have access to a better hardware facility(*) and we
are able to test our evaluation code at a finer resolution, so as to
remove old stuff that we never dared to touch.
Is the evaluation stuff almost useless because it made Stockfish slower, or simply because it is not clear that the evaluation is better even at a fixed number of nodes?
In the second case it is a good idea to remove it.
In the first case I am not sure, because I suspect that evaluation knowledge (that is productive assuming no price in speed) can help more at longer time controls.
Extrapolating to long time controls is a difficult and tricky exercise. Regarding evaluation, my experience is that most cases (but not all) can be safely proved at fast TC and will also hold at longer TC.
Anyhow, to answer your question, it is the first case. We never test at fixed depth because it is a very artificial condition and _could_ lead to artifacts.
It appears from our own testing of Komodo that most evaluation improvements can be proved at very fast time controls or even fixed depth games. But there are some notable exceptions. I think highly dynamic things such as king safety need more depth.
My feeling, based on looking at games, is that the Stockfish team removed some king safety evaluation that was productive at long time controls.
The evaluation seems to be the main change between Stockfish 1.8 and Stockfish 1.9, not the pruning, and Stockfish 1.9 still seems to reach significantly higher depths than other programs.
Yet another reason why the term "depth" is no longer meaningful. One can make a program search to any arbitrary depth by fiddling with LMR and forward pruning restrictions. But in general, we want depth N+1 to be stronger than depth N. It is easy to make that not happen. We no longer have a good way of describing the trees we search when trying to use a term like "depth" or even "effective branching factor". Both have a specific definition, but today's programs are doing things that affect both, and not in the expected ways. If two programs have a different definition of "depth", then "effective branching factor" is meaningless, since it compares the tree sizes for two consecutive "depths" where depth is no longer a constant term between two programs. Back in the early days, when Chess 4.x reported 6-ply searches on the Cyber 176, we knew that a 4-ply search was going to get ripped apart. Today, comparing depth doesn't mean much...
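For reference, the specific definition being appealed to here, EBF as the growth ratio of the tree between consecutive iterative-deepening iterations, can be computed mechanically. The node counts in the test are invented; the point of the post above is precisely that this number stops being comparable once "depth" means different things in different programs:

```python
def effective_branching_factor(nodes_by_depth):
    """Geometric mean of the node-count ratio between consecutive
    iterative-deepening iterations: (N_last / N_first)^(1/(k-1)).
    Only meaningful if 'depth' means the same thing at every step."""
    k = len(nodes_by_depth)
    return (nodes_by_depth[-1] / nodes_by_depth[0]) ** (1 / (k - 1))
```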
Apart from that, the increment is almost negligible at fast time
controls (as we normally test); perhaps we could have some surprises
(hopefully not bad ones) at longer TC, where the new time management
code by Joona could kick in with a bit of luck.
Another change from 1.8 is that we lowered the aggressiveness of both
LMR and pruning; we will see if this pays off at longer TC. At fast TC
the change is almost zero....
It seems that with longer time controls SF really gets a little turbo!
GOOD WORK !!
The new TM is close to the edge.
SF lost 2 of 718 games on time; I have never seen a SF version lose on time before. So I think a very small bit of tuning could solve this little problem. But all in all, time management with ponder = on is very good; it could be worth 5 Elo over the previous versions, which means the TM is definitely better.
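A toy sketch of the kind of allocation any such time manager has to make (with invented constants, not Joona's actual code) shows where a safety margin against time losses comes in:

```python
def allocate_time(remaining_ms, increment_ms, moves_to_go=30, safety_ms=50):
    """Toy allocation: spread the clock over an assumed number of
    remaining moves, spend most of the increment, and always keep a
    small safety margin so the engine cannot lose on time.
    All constants here are invented for illustration."""
    budget = remaining_ms / moves_to_go + 0.8 * increment_ms
    return max(1.0, min(budget, remaining_ms - safety_ms))
```

Tuning that margin is exactly the kind of "very small bit of tuning" that trades a few time losses against a few Elo of extra thinking time.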
Furthermore, I think SF needs a bit more king safety.
Vs. Deep Fritz 12, 3 games were lost very quickly. The lost game vs. DF 12 that I attach in this message was one of the best computer chess games I ever saw. In this case the brilliance comes from the opponent, but that is not important.
But all in all a fantastic version.
I am very happy with SF 1.9.1 JA !!
Also vs. Smarthink (a 300 Elo difference) there was a very quickly lost game ...