Stockfish randomicity

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

amchess
Posts: 331
Joined: Tue Dec 05, 2017 2:42 pm

Stockfish randomicity

Post by amchess »

I have noticed that when running a game, even at LTC and starting from a specific position, Stockfish and its derivatives do not always play the same move. There is therefore randomness in the algorithm. Analyzing the code, the only point where it appears in the search is the following:

Code: Select all

// Choose best move. For each move score we add two terms, both dependent on
// weakness. One is deterministic and bigger for weaker levels, and one is
// random. Then we choose the move with the resulting highest score.
for (size_t i = 0; i < multiPV; ++i)
{
    // This is our magic formula
    int push = int((  weakness * int(topScore - rootMoves[i].score)
                    + delta * (rng.rand() % int(weakness))) / 128);

    if (rootMoves[i].score + push >= maxScore)
    {
        maxScore = rootMoves[i].score + push;
        best     = rootMoves[i].pv[0];
    }
}
Why was this randomness introduced into the selection of the "best move"? It is more pronounced in sharp positions, and if you want to test an engine you have to choose such positions, since NNUE has significantly raised the level of play...
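For concreteness, here is a small standalone sketch of what that formula does with made-up numbers (the weakness, delta and score values are purely illustrative, not taken from any actual Stockfish configuration):

Code: Select all

// Toy illustration of the skill-level "push" formula quoted above.
// All values (weakness, delta, the root scores) are invented for the example.
#include <cstdio>
#include <cstdlib>

int main()
{
    const int weakness = 100;                // illustrative: bigger = weaker play
    const int delta    = 150;                // illustrative spread between best and worst score
    const int scores[] = { 150, 140, 60 };   // hypothetical root move scores, best first
    const int topScore = scores[0];

    std::srand(42);                          // any seed; Stockfish uses its own PRNG

    int maxScore = -32000;
    int bestIdx  = 0;
    for (int i = 0; i < 3; ++i)
    {
        // Deterministic term grows with the distance from the top score,
        // random term is uniform in [0, weakness).
        int push = (weakness * (topScore - scores[i])
                    + delta * (std::rand() % weakness)) / 128;

        if (scores[i] + push >= maxScore)
        {
            maxScore = scores[i] + push;
            bestIdx  = i;
        }
    }
    std::printf("picked move index %d with adjusted score %d\n", bestIdx, maxScore);
    return 0;
}

Run it a few times with different seeds and the picked index changes, which is exactly the intended behaviour at reduced skill.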
Joerg Oster
Posts: 939
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: Stockfish randomicity

Post by Joerg Oster »

amchess wrote: Thu Sep 21, 2023 12:16 pm I have noticed that when running a game, even at LTC and starting from a specific position, Stockfish and its derivatives do not always play the same move. There is therefore randomness in the algorithm. Analyzing the code, the only point where it appears in the search is the following:

Code: Select all

// Choose best move. For each move score we add two terms, both dependent on
// weakness. One is deterministic and bigger for weaker levels, and one is
// random. Then we choose the move with the resulting highest score.
for (size_t i = 0; i < multiPV; ++i)
{
    // This is our magic formula
    int push = int((  weakness * int(topScore - rootMoves[i].score)
                    + delta * (rng.rand() % int(weakness))) / 128);

    if (rootMoves[i].score + push >= maxScore)
    {
        maxScore = rootMoves[i].score + push;
        best     = rootMoves[i].pv[0];
    }
}
Why was this randomness introduced into the selection of the "best move"? It is more pronounced in sharp positions, and if you want to test an engine you have to choose such positions, since NNUE has significantly raised the level of play...
This is only for playing with Skill Levels.
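Roughly speaking, that randomized pick is only ever reached when a reduced skill level is set; a minimal illustrative sketch of that kind of gating (none of these names are real Stockfish identifiers):

Code: Select all

// Illustrative gating only: at full strength the engine simply takes the
// first root move; the randomized pick is reserved for reduced skill.
#include <cstdio>
#include <string>

std::string pick_randomized_move() { return "e2e4"; } // stand-in for the quoted loop
std::string pick_top_root_move()   { return "d2d4"; } // stand-in for rootMoves[0].pv[0]

int main()
{
    const int  skillLevel   = 20;             // 20 = full strength, < 20 = limited
    const bool skillEnabled = skillLevel < 20;

    std::string best = skillEnabled ? pick_randomized_move()
                                    : pick_top_root_move();
    std::printf("best move: %s\n", best.c_str());
    return 0;
}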
Jörg Oster
amchess
Posts: 331
Joined: Tue Dec 05, 2017 2:42 pm

Re: Stockfish randomicity

Post by amchess »

Yes, my error, but the problem persists.
In fact, it is related to Lazy SMP and the OS's thread scheduling.
With NNUE, truly sharp positions must be chosen for testing.
By definition, these are positions for which the first and second choices are evaluated as roughly equivalent, but one leads to a draw and the other to a win.
So the sharper the positions, the noisier the results. How, then, can one be sure of the effectiveness of a patch (i.e. an Elo increase), especially at long time controls, where only a limited number of games can be run?
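As a toy illustration of why a shared-memory parallel search cannot be deterministic in practice (this is not Stockfish code, just threads racing on shared state, the way Lazy SMP workers race on the shared hash table):

Code: Select all

// Toy demo: several "search threads" race to report a result; whichever
// finishes first wins, and that depends on OS scheduling, so the output can
// change from run to run even though every thread does deterministic work.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int main()
{
    std::atomic<int> firstFinisher{ -1 };

    auto worker = [&](int id) {
        volatile long sink = 0;               // simulate a bit of search work
        for (long i = 0; i < 1000000 + id * 1000; ++i)
            sink += i;
        int expected = -1;
        firstFinisher.compare_exchange_strong(expected, id); // first one wins
    };

    std::vector<std::thread> threads;
    for (int id = 0; id < 4; ++id)
        threads.emplace_back(worker, id);
    for (auto& t : threads)
        t.join();

    std::printf("thread %d finished first this run\n", firstFinisher.load());
    return 0;
}

In a real Lazy SMP search the analogue is which thread writes a given hash entry first; that changes move ordering between runs and, occasionally, the move that is finally chosen.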
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish randomicity

Post by syzygy »

amchess wrote: Thu Sep 21, 2023 8:27 pm Yes, my error, but the problem persists.
In fact, it is related to Lazy SMP and the OS's thread scheduling.
A deterministic yet efficient parallel search is not possible to achieve.

A single-threaded search is deterministic, but this is a kind of fake determinism. Change the hash table size, you get different moves. Change the Zobrist values, you get different moves. Change anything, you get different moves.
How, then, can one be sure of the effectiveness of a patch (i.e. an Elo increase), especially at long time controls, where only a limited number of games can be run?
The only way to know whether a patch improves play is to play thousands of different games. If anything, lack of determinism is an advantage, not a disadvantage.
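To put a number on "thousands of games", a quick back-of-the-envelope error-bar calculation (the win/draw/loss counts below are invented):

Code: Select all

// Elo estimate and ~95% error bar from a match result, using the usual
// score-based approximation: elo = -400 * log10(1/score - 1).
// The W/D/L counts are made up for illustration.
#include <cmath>
#include <cstdio>

int main()
{
    const double wins = 520, draws = 960, losses = 480;   // hypothetical 1960-game match
    const double n     = wins + draws + losses;
    const double score = (wins + 0.5 * draws) / n;         // mean score per game

    // Per-game variance of the score, then standard error of the mean.
    const double var = (  wins   * std::pow(1.0 - score, 2)
                        + draws  * std::pow(0.5 - score, 2)
                        + losses * std::pow(0.0 - score, 2)) / n;
    const double se  = std::sqrt(var / n);

    auto toElo = [](double s) { return -400.0 * std::log10(1.0 / s - 1.0); };

    std::printf("score %.4f -> %+.1f Elo, 95%% interval about [%+.1f, %+.1f]\n",
                score, toElo(score),
                toElo(score - 1.96 * se), toElo(score + 1.96 * se));
    return 0;
}

Even with roughly 2000 games the 95% interval spans about 20 Elo, which is why fishtest needs SPRT runs of tens of thousands of games to resolve patches worth only a couple of Elo.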
amchess
Posts: 331
Joined: Tue Dec 05, 2017 2:42 pm

Re: Stockfish randomicity

Post by amchess »

Of course.
The problem is that to play hundreds of games, you have to use very short time controls.
This favors patches with very aggressive pruning, which is not necessarily useful at longer time controls.
As a result, Stockfish performs worse at long time controls and at solving complicated positions.
The statistics of the various patches in the Stockfish framework also show this.
Red patches have been integrated based on very fast time controls, but not yet confirmed at LTC (which for them is 10s+1s!).
One idea could be a tournament where the engine plays not only against itself but also against engines of more or less similar strength, to see how it fares against them compared to the previous version.
I also think that the test positions cannot be random, because it is getting harder and harder to find ones whose outcomes are not essentially "decided". This is where chess knowledge might come into play, for an intelligent testing strategy.
In short, imho, there is food for thought, and a lot of it...
Ciekce
Posts: 125
Joined: Sun Oct 30, 2022 5:26 pm
Full name: Conor Anstey

Re: Stockfish randomicity

Post by Ciekce »

SF does test every patch at LTC.
amchess
Posts: 331
Joined: Tue Dec 05, 2017 2:42 pm

Re: Stockfish randomicity

Post by amchess »

LTC = 10s+0.1s
but very long time controls are a problem of time and resources.
pgg106
Posts: 25
Joined: Wed Mar 09, 2022 3:40 pm
Full name: . .

Re: Stockfish randomicity

Post by pgg106 »

LTC = 60s + 0.6s,
and that's not counting all the tests that are also run at VLTC or with more than one core. With how many SF LTCs you've merged, you should know by now what the time control is.
To anyone reading this post in the future: don't ask for help on talkchess. It's a dead site where you'll only get led astray; the few people talking sense here come from the Stockfish Discord server. Just join it and actual devs will help you.
connor_mcmonigle
Posts: 533
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: Stockfish randomicity

Post by connor_mcmonigle »

amchess wrote: Sat Sep 23, 2023 10:27 am ...
chess knowledge might come into play, for an intelligent testing strategy.
...
You've not proposed any alternative, nor offered any explanation regarding your claim that the chaotic behavior (in the sense of high sensitivity to initial conditions) of Stockfish's search makes testing difficult. Not much food for thought as I see it, and "chess knowledge" seems entirely irrelevant.

Playing chess well is about playing "good moves" (moves which preserve the game-theoretic value of a position) with high probability and "bad moves" (moves which change the game-theoretic value of a position) with low probability. If you slightly tweak SF's initial search conditions, you can, in a sense, sample from SF's latent policy over the action space (and uncover that latent policy by performing millions of searches and collecting statistics). Good patches are those which yield latent policies assigning higher probability to "good moves" relative to the previous version (over the stationary distribution of positions determined by that policy).
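A sketch of the "sample the latent policy" idea: repeat the search under slightly perturbed conditions (different hash size, thread count, random seed, ...) and tally which move comes out on top. search_once() below is a hypothetical placeholder, not a real engine API; it just draws from a fixed distribution so the example stays self-contained:

Code: Select all

// Hypothetical sketch: estimate an engine's empirical policy for one position
// by repeating perturbed searches and counting the returned best moves.
#include <cstdio>
#include <map>
#include <random>
#include <string>

// Placeholder for "run one search with perturbed initial conditions".
std::string search_once(std::mt19937& rng)
{
    static const std::string moves[] = { "e2e4", "d2d4", "c2c4" };
    std::discrete_distribution<int> d({ 70.0, 25.0, 5.0 });   // stand-in for the search's bias
    return moves[d(rng)];
}

int main()
{
    std::mt19937 rng(12345);
    std::map<std::string, int> counts;
    const int samples = 10000;

    for (int i = 0; i < samples; ++i)
        ++counts[search_once(rng)];          // one perturbed search = one sample

    for (const auto& [move, n] : counts)     // the empirical policy for this position
        std::printf("%s : %.3f\n", move.c_str(), double(n) / samples);
    return 0;
}

Comparing two such empirical distributions (old binary vs. new binary) over many positions is, in spirit, what the games-based testing already does, with the positions drawn from actual play.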
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish randomicity

Post by syzygy »

amchess wrote: Sat Sep 23, 2023 10:27 am Of course.
The problem is that to play hundreds of games, you have to use very short time controls.
This favors patches with very aggressive pruning, which is not necessarily useful at longer time controls.
As a result, Stockfish performs worse at long time controls and at solving complicated positions.
Totally different topic. You just seem to be looking for something to complain about.

The pros and cons of testing at ultrabullet time controls have been discussed to death already. Reality shows that it works.