First, I tested a bunch of positions using 8 processors to compare NPS, and I was quite surprised to find that the thread-based version is not just faster, but is _significantly_ faster. Below I will give output from one position (the others all show about the same ratio). I ran the same position 4 times with the process (fork) based version, and 4 times with the new thread-based version. Everything else is identical between the two.
log.001: time=30.12 mat=0 n=403743991 fh=94% nps=13.4M
log.002: time=31.35 mat=0 n=425472609 fh=94% nps=13.6M
log.003: time=38.80 mat=0 n=515449589 fh=94% nps=13.3M
log.004: time=31.21 mat=0 n=416896300 fh=94% nps=13.4M
log.005: time=19.06 mat=0 n=360009325 fh=94% nps=18.9M
log.006: time=19.35 mat=0 n=365336707 fh=94% nps=18.9M
log.007: time=25.68 mat=0 n=467414358 fh=94% nps=18.2M
log.008: time=16.90 mat=0 n=320243950 fh=94% nps=18.9M
Again, to recap, all 8 runs above use the same position and the same .craftyrc to set the same hash size, etc. The only difference is that the first 4 runs are version 22.1, and the last 4 are version 22.1 with threads rather than fork().
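Huge difference: the thread runs average roughly 18.7M NPS against roughly 13.4M for the fork() runs, which is about 40% faster.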
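For anyone who wants to check the ratio, here is a small Python sketch that recomputes NPS as n/time from the eight log lines above and compares the two groups. The helper name `nps_from_line` is made up for this post, and the script assumes the first four lines are the fork() runs and the last four are the thread runs, as described above. It is just the arithmetic, not anything Crafty itself does.

```python
import re
from statistics import mean

# The eight log lines quoted above, copied verbatim.
LOG = """\
log.001: time=30.12 mat=0 n=403743991 fh=94% nps=13.4M
log.002: time=31.35 mat=0 n=425472609 fh=94% nps=13.6M
log.003: time=38.80 mat=0 n=515449589 fh=94% nps=13.3M
log.004: time=31.21 mat=0 n=416896300 fh=94% nps=13.4M
log.005: time=19.06 mat=0 n=360009325 fh=94% nps=18.9M
log.006: time=19.35 mat=0 n=365336707 fh=94% nps=18.9M
log.007: time=25.68 mat=0 n=467414358 fh=94% nps=18.2M
log.008: time=16.90 mat=0 n=320243950 fh=94% nps=18.9M
"""

def nps_from_line(line):
    """Hypothetical helper: nodes searched divided by elapsed seconds."""
    m = re.search(r"time=([\d.]+).*?\bn=(\d+)", line)
    seconds, nodes = float(m.group(1)), int(m.group(2))
    return nodes / seconds

rates = [nps_from_line(line) for line in LOG.splitlines()]

# Assumption from the post: runs 1-4 are fork(), runs 5-8 are threads.
fork_nps = mean(rates[:4])
thread_nps = mean(rates[4:])
speedup = thread_nps / fork_nps

print(f"fork mean NPS:   {fork_nps / 1e6:.1f}M")
print(f"thread mean NPS: {thread_nps / 1e6:.1f}M")
print(f"thread / fork:   {speedup:.2f}x")
```

Note this compares NPS only. The node counts differ from run to run, as they normally do in a parallel search, so time-to-depth is a separate question.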