NPS is really a poor metric to measure quality of YBWC implementation: you just raise minimum split depth and you get all the nps that you want.bob wrote: Your YBWC is NOT "state of the art". Talk to some of your guys that are doing actual measurements. Louis Z. comes to mind For example, my current code is showing a 19.5x NPS speedup on 20 cores. How are you doing? When my NPS speedup was stuck at 16x or so a couple of months ago the SMP speedup was hanging at 13x on 20 cores. How are you doing? Current code is faster. I am running the tests now, but it takes a week or so to run them.
Indeed a way to measure the efficency and the low overhead of the YBWC implementation is to see at what minimum split depth you have the optimum. In SF our sweet spot is 4 but recently with some scheme we even went down to minimum split depth of just 2 (!) plies without losing ELO (please note that I have written ELO, not NPS, TTD or other indirect stuff). IMO a near optimal ELO at minimum split depth 2 is a testament to the very efficient copying and locking and minimal overhead of the YBWC implementation. How are you doing?
Regarding lazy SMP I only add that is far from trivial to get something that works well: many SF developers (me included) have tried and failed before one of us, known by alias mbootsector, come up with the good one. Then many people improved above it, as is common in our development model. So the fact that you didn't get success with lazy SMP does not yield a lot of info per se.