syzygy wrote: ↑Sat Apr 15, 2023 11:57 pm
2. When doing ultrabullet testing on such a CPU, should you run more parallel matches than there are physical cores (each match being between two single-threaded engines without pondering)?
This is probably not worth it. You get more games but of lower quality (since nps per engine will be much lower). If you increase the time control for each game to correct for the nps loss, you still get a few more games than without hyperthreading, but there will be a lot more noise, which decreases the statistical relevance of the results. Noise means you need many more games to get the same statistical significance. (And if your statistical model does not reflect this, you will have results that are less reliable than you think.)
Modern Times wrote: ↑Sun Apr 16, 2023 12:48 pm
Stefan Pohl does exactly this for all his Stockfish and other ratings lists - he runs 20 threads on his 12-core Ryzens. Everyone seems happy with the results, I guess because of the large number of games he runs.
It would be interesting to see how the draw rate compares to 12 parallel matches on 12 cores.
Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost
-
- Posts: 5694
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost
-
- Posts: 28353
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost
This is a pretty safe bet. The CPU back end knows nothing about hyper-threading; this is purely a front-end matter (fetcher and decoder). Performance would suffer if it did not always schedule the oldest execution-ready microOp in the re-order buffer, because leaving an old instruction unexecuted can stall the retire unit (and thus the entire pipeline). It wouldn't care at all which hyper-thread put that microOp in the re-order buffer. I think it is safe to assume that the designers are not stupid, and I would have to see proof to the contrary before I believe otherwise.
"Noise increases, so draw rate goes down and statistical significance goes with it. You'll have to increase the number of games to get the same reliability."
Statistical significance goes up if draw rate goes down. Drawn games do not provide any info at all, it is like they were never played. The strength comparison comes purely from the ratio of wins and losses, and the more you have of those, the smaller the statistical error in it will be.
"Last time I checked hyperthreading did not give 50% more speed."
That depends on the program. If a program uses a unique unit every cycle, there would be no speedup at all if you run two of those on one core.
-
- Posts: 5694
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost
hgm wrote: ↑Sun Apr 16, 2023 10:16 pm Statistical significance goes up if draw rate goes down. Drawn games do not provide any info at all, it is like they were never played. The strength comparison comes purely from the ratio of wins and losses, and the more you have of those, the smaller the statistical error in it will be.
No, win+loss is worse than 2xdraw.
My example was:
On your 6700K, A wins 10 games, draws 90 games. "Outcome" is 55-45.
On your 7950X, A wins 55 games, loses 45 games. "Outcome" is 55-45.
6700K: (wins - losses) / sqrt(wins + losses) = (10 - 0) / sqrt(10 + 0) = 10 / sqrt(10) ≈ 3.16.
7950X: (wins - losses) / sqrt(wins + losses) = (55 - 45) / sqrt(55 + 45) = 10 / sqrt(100) = 1.
So you get much more information from the 10 decided games on the 6700K than from the 100 decided games on the (noisy) 7950X.
In practice it won't be this extreme, but the point is that a lower draw rate as a result of noise decreases statistical significance. (And nothing else would make sense.)
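For anyone who wants to check the arithmetic, here is a minimal Python sketch of the comparison above. The function names `z_score` and `score_fraction` are made up for illustration and do not come from any testing tool; the formula is the normal approximation (wins - losses) / sqrt(wins + losses) used in the example.

```python
import math

def z_score(wins, losses, draws=0):
    """(W - L) / sqrt(W + L); draws are accepted but do not enter the formula."""
    decided = wins + losses
    if decided == 0:
        return 0.0
    return (wins - losses) / math.sqrt(decided)

def score_fraction(wins, losses, draws=0):
    """Points scored per game, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + losses + draws)

# The two records from the example, as (wins, losses, draws).
records = {
    "6700K (10 wins, 90 draws)": (10, 0, 90),
    "7950X (55 wins, 45 losses)": (55, 45, 0),
}
for label, (w, l, d) in records.items():
    print(f"{label}: score={score_fraction(w, l, d):.2f}, z={z_score(w, l, d):.2f}")
```

Both records give the same 55% score, and only the z value differs.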
-
- Posts: 28353
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost
That is not a relevant comparison. Of course the result would not stay the same if you decrease draw rate by adding noise. The noisy result would be more like 60+/30=/10-.
-
- Posts: 5694
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost
hgm wrote: Statistical significance goes up if draw rate goes down. Drawn games do not provide any info at all, it is like they were never played. The strength comparison comes purely from the ratio of wins and losses, and the more you have of those, the smaller the statistical error in it will be.
Well, I know you are the last person in the world willing to admit being wrong. You seem to believe the universe would explode if you do.
Going from LTC to STC effectively increases noise. I don't think the general experience is that the rating difference is affected a lot, i.e. the outcome in terms of point difference is essentially the same. But of course the draw rate goes down, and with it the statistical significance (at the same number of games).
-
- Posts: 28353
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost
So when you run out of arguments, you think an ad hominem will prove your point?
The rating differences are affected a lot, and testers have complained about this abundantly. Rating scales get compressed when the draw rate goes up. People have even proposed new rating systems (discounting draws) to combat this effect.
The LOS usually does not suffer much, because error bars get similarly compressed. The assumption that noise will turn all draws into wins and losses in an exact 50-50 ratio, even if one of the engines is far stronger, is just wrong.
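To put numbers on the compression claim, here is a rough Python sketch. It assumes the usual logistic Elo curve and a normal-approximation error bar on the score; the three 1000-game records are hypothetical, chosen only so that the win:loss ratio stays 3:2 while the draw count rises. `elo_from_score` and `elo_with_error` are illustrative names, not taken from any rating tool.

```python
import math

def elo_from_score(s):
    """Logistic Elo difference implied by a score fraction s (0 < s < 1)."""
    return -400.0 * math.log10(1.0 / s - 1.0)

def elo_with_error(wins, losses, draws):
    """Return (Elo difference, roughly one-sigma half-width in Elo)."""
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    se = math.sqrt(var / n)
    # Map one standard error of the score through the Elo curve.
    half_width = (elo_from_score(s + se) - elo_from_score(s - se)) / 2
    return elo_from_score(s), half_width

# Hypothetical records as (wins, losses, draws): same 3:2 win:loss ratio.
for record in [(300, 200, 500), (150, 100, 750), (60, 40, 900)]:
    elo, err = elo_with_error(*record)
    print(f"W/L/D = {record}: Elo = {elo:+.1f} +/- {err:.1f}")
```

Printing the Elo difference next to its error bar lets you see how both move as the draw count rises.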
-
- Posts: 5694
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost
Just a factual observation.
You started out with "wrong on several levels" and now you're just slowly retreating and changing position.
hgm wrote: The LOS usually does not suffer much, because error bars get similarly compressed.
You were arguing that LOS went up...
hgm wrote: The assumption that noise will turn all draws into wins and losses in an exact 50-50 ratio, even if one of the engines is far stronger, is just wrong.
The 10-0-90 example was extreme, as I wrote, but still there is no reason to expect that considerable noise favours the stronger engine.
In real tests the Elo difference will be very small, which means even small noise will have a considerable impact, and it will just lead to more wins AND losses. Again, win+loss is much worse than 2xdraw.
This is wrong:
hgm wrote: Statistical significance goes up if draw rate goes down. Drawn games do not provide any info at all, it is like they were never played. The strength comparison comes purely from the ratio of wins and losses, and the more you have of those, the smaller the statistical error in it will be.
A draw does not give information, but a win plus a loss decreases information.
-
- Posts: 28353
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost
You must misunderstand what I wrote, because I did not retreat from anything. You are still arguing from the totally unrealistic fiction that 'noise' (e.g. lower-quality games because of a faster TC) would replace draws by an equal number of wins and losses. But it won't if the engines are not equally strong. If one of the engines were really as much stronger as your oh-so-reliable 10/90/0 result suggests, resolving the draws would more likely result in 95/0/5, not in the 55/0/45 that you compare it with.
And I did not say anything about LOS, other than that it is determined by the number of wins and losses, before the sentence you quote. This is just your imagination.
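The two assumptions being argued about can be put side by side in a short Python sketch. Everything here is hypothetical: the base probabilities, the fraction of draws that get resolved, and the helper names `resolve_draws` and `games_needed`. It uses the exact per-game variance of (wins - losses) rather than the sqrt(wins + losses) shortcut.

```python
import math

def resolve_draws(w, d, l, fraction, strong_share):
    """Turn `fraction` of the draws into decisive games.

    `strong_share` is the share of those games won by the first engine:
    0.5 models an even split, w / (w + l) keeps the existing win:loss ratio.
    """
    moved = d * fraction
    return w + moved * strong_share, d - moved, l + moved * (1.0 - strong_share)

def games_needed(w, d, l, z=2.0):
    """Games until the expected (wins - losses) equals z standard deviations."""
    mean = w - l
    var = (w + l) - mean ** 2  # per-game variance of (+1 win, 0 draw, -1 loss)
    return math.ceil(z * z * var / (mean * mean))

base = (0.10, 0.85, 0.05)  # hypothetical per-game win/draw/loss probabilities
even = resolve_draws(*base, fraction=0.5, strong_share=0.5)
ratio = resolve_draws(*base, fraction=0.5,
                      strong_share=base[0] / (base[0] + base[2]))
for label, probs in [("base", base), ("even split", even), ("ratio kept", ratio)]:
    print(f"{label:>10}: W/D/L = {probs[0]:.3f}/{probs[1]:.3f}/{probs[2]:.3f}, "
          f"games for z=2: {games_needed(*probs)}")
```

Under the even split the win-loss gap stays the same while the variance grows, so more games are needed. If the resolved draws keep the win:loss ratio, the gap grows and fewer games are needed. Which model is closer to real engine tests is exactly the point in dispute, and the sketch does not settle it.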
-
- Posts: 5694
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost
What is totally unrealistic is that adding noise would help. Just plug in a random generator and you'll instantly get more accurate statistics. Right.
Both intuitively and in reality noise is unwelcome. That you are confused by the knowledge that "draws do not count" is just that, it confused you.
Some time ago you were suggesting that a 1 Elo difference could not be measured. Clearly it can be, with the right testing setup and enough games. But noise is not going to help there.
-
- Posts: 40
- Joined: Fri Apr 16, 2021 4:44 pm
- Full name: Jakob Progsch
Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost
I wouldn't trust a situation where different engines share a physical CPU core. For two instances of the same engine it seems reasonable to assume the instruction slots will even out between them (although even there I'd want to do some experiments). But I don't think one can expect "fair" instruction scheduling for different engines that have a different instruction mix. Also, the presence of wide vector instructions, for example, can affect boost behavior, so that one engine's behavior bleeds into the other's.