Here is some logic for testing engines:
1.
Let's assume that Stockfish plays perfect chess.
Now Stockfish plays Engine X ... 1,000, 2,000 or perhaps 15,000 games!
The error bar cannot be right, because Stockfish is playing perfect chess.
2.
This means: the closer the play gets to perfect chess, the smaller the error bar, and the error bar calculation becomes more and more wrong as the engines become more powerful. That's pure logic! Now to the topic ... how many games do we need?
You need 1 game ...
But you should know that the probability of a draw is 7.5%.
OK, so you need a bit more than 1 game.
The results with 1,000, 2,000, 3,000, 15,000 or 1000000000000000000000000000000 games will be the same.
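To make the error bar argument concrete, here is a minimal sketch of how an error bar is commonly derived from a match result, assuming independent games and a normal approximation. The function name `score_error_bar` and the example results are hypothetical and only for illustration, not measured data.

```python
import math


def score_error_bar(wins, draws, losses, z=1.96):
    """Return (score, half_width) for a match result.

    Assumption: games are independent and the mean score is roughly
    normally distributed; z=1.96 corresponds to about 95% confidence.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("need at least one game")
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the outcome (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    return score, z * math.sqrt(variance / n)


# Hypothetical results: a balanced match and a very one-sided match.
for games in (1000, 15000):
    balanced = score_error_bar(games * 30 // 100, games * 40 // 100, games * 30 // 100)
    wins = games * 925 // 1000
    one_sided = score_error_bar(wins, games - wins, 0)
    print(games, "balanced: %.4f +/- %.4f" % balanced)
    print(games, "one-sided: %.4f +/- %.4f" % one_sided)
```

With these made-up numbers the one-sided match gets a clearly narrower error bar than the balanced one for the same number of games, and a result consisting only of wins gets an error bar of exactly zero.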
3.
In recent years I have run some experiments with my FEOBOS balanced opening book. The question is:
How many points will an engine that is close to perfection (endgame, transition into endgames), like Stockfish, score against an engine 700-800 Elo weaker? If we know that, we can say:
With a balanced opening book, an engine is able to draw in 7.5% of cases against an opponent that plays perfect chess ... provided it plays without any blunder. For many ECO codes the probability of a draw is high when the number of pieces on the board is small.
4.
Now it's really easy:
How strong can an engine be that plays near perfection in computer chess?
It should score around 92.5% against all opponents, because with a balanced opening book every other engine (not more than around 800 Elo weaker) should score at most 7.5% of the points if ... that engine gets through a game free of blunders. And the reason is ... every engine has the advantage of starting from a balanced opening system.
5.
At the moment the strongest engine scores around 83.5% of the points!
About 9%, at most 10%, more points are possible.
If Stockfish 18 plays at 3575 Elo, then around 3750 - 3800 Elo is the maximum possible.
Keep in mind what is certain to end in a draw across the 500 ECO codes.
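As a rough cross-check of these numbers, here is a minimal sketch that converts a score percentage into an Elo difference, assuming the standard logistic Elo model. The function name `elo_diff_from_score` is hypothetical, and the 83.5% and 92.5% inputs are simply the figures from above applied to the same opponent pool.

```python
import math


def elo_diff_from_score(score):
    """Elo advantage implied by an expected score (0 < score < 1).

    Assumption: the logistic Elo model, where
    expected score = 1 / (1 + 10 ** (-diff / 400)).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


current = elo_diff_from_score(0.835)   # score of the strongest engine today
ceiling = elo_diff_from_score(0.925)   # assumed score of near-perfect play
print("current lead: %.0f Elo" % current)
print("ceiling lead: %.0f Elo" % ceiling)
print("headroom:     %.0f Elo" % (ceiling - current))
```

Under this model the step from 83.5% to 92.5% is worth roughly 155 Elo, which is in the same region as the 3750 - 3800 estimate, though somewhat lower. The exact figure depends on the rating model and on the opponents in the pool.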
Often I read here ...
More Elo isn't possible?
That's wrong.
At 30 Elo per year, the engines will be playing chess near perfection in around 6-8 years!
If the Stockfish developers ...
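The 6-8 years figure is simple arithmetic. A minimal sketch, assuming a constant gain of 30 Elo per year and the ratings given above (the helper name `years_to_ceiling` is hypothetical):

```python
def years_to_ceiling(current_elo, ceiling_elo, elo_per_year):
    """Years needed to close the rating gap at a constant yearly gain."""
    if elo_per_year <= 0:
        raise ValueError("elo_per_year must be positive")
    return (ceiling_elo - current_elo) / elo_per_year


# Assumed inputs: 3575 Elo today, ceiling of 3750 to 3800, 30 Elo per year.
for ceiling_elo in (3750, 3800):
    years = years_to_ceiling(3575, ceiling_elo, 30)
    print(ceiling_elo, "reached in about %.1f years" % years)
```

That gives a little under 6 years for 3750 and 7.5 years for 3800, as long as the yearly gain really stays constant.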
This was a question for a sleepless night.
I gave this answer to myself and now I sleep like a baby!
Maybe this helps others with their sleepless nights!
Best
Frank
And after all this you will come to the conclusion:
Elo will no longer be suitable for measuring playing strength if the scores achieved continue to rise at the rate we see in opening theory.