Fulvio wrote (Mon Nov 15, 2021 6:55 pm):
"In my experience (SCID reads PGN files in 128kb chunks, automatically doubling the buffer up to 128MB if it encounters larger games) O.S. are pretty good at optimizing point 1. Moving it to a separate thread may increase complexity without improving the performance."

My experience (Windows 7) is that reading files in text mode is relatively slow, even when the file is cached, because Windows has to process special characters (CRLF translation and the like) and extra copies are made. I don't have the data at hand, but I have definitely seen file reads take on the order of 10% of the runtime when processing PGNs at ~100 MB/s. In that worst case, offloading the reads to a different thread would give roughly a 10% speed improvement. Admittedly, using binary mode closes that gap significantly, so if someone is designing a parser I'd suggest going that way and handling the special characters in the parser instead (mmap only operates in binary mode anyway).
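A minimal sketch of the binary-mode approach: read raw chunks and let the parser do the newline handling that text mode would otherwise do. The function name `iter_pgn_lines` is hypothetical, and the 128 KB default chunk size is simply borrowed from the SCID figure in the quote; this is not SCID's code or any existing parser's. It handles LF and CRLF line endings only.

```python
import io
from typing import BinaryIO, Iterator


def iter_pgn_lines(f: BinaryIO, chunk_size: int = 128 * 1024) -> Iterator[bytes]:
    """Yield the lines of a binary-mode file without their line terminators.

    Hypothetical helper. The newline handling that text mode would do happens
    here instead: data is split on LF and one trailing CR per line is dropped,
    so both LF and CRLF files work. Lone-CR line endings are not handled.
    """
    carry = b""  # incomplete last line left over from the previous chunk
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        lines = (carry + chunk).split(b"\n")
        carry = lines.pop()  # partial line, or b"" if the chunk ended with LF
        for line in lines:
            yield line[:-1] if line.endswith(b"\r") else line
    if carry:  # the last line had no terminating LF
        yield carry[:-1] if carry.endswith(b"\r") else carry


if __name__ == "__main__":
    pgn = b'[Event "Test"]\r\n[Result "*"]\r\n\r\n1. e4 e5 *\r\n'
    # A tiny chunk size forces CRLF pairs to be split across chunk boundaries.
    for line in iter_pgn_lines(io.BytesIO(pgn), chunk_size=5):
        print(line)
```

With a real file this would be used as `iter_pgn_lines(open(path, "rb"))`. The `carry + chunk` concatenation still copies the leftover partial line; with 128 KB chunks and PGN-sized lines that copy is small, but a tuned parser would more likely scan the buffer in place.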
Overall, I don't like relying on the OS caching the file, since it gives me less feedback about whether the data is ready or not. When doing asynchronous IO explicitly, you get a better understanding of the actual bottlenecks. I'd go as far as disabling caching (opening with FILE_FLAG_NO_BUFFERING on Windows) for large files that are read sequentially, and relying on explicit buffers with async reads instead, so as not to fill the memory available for the file cache with useless data. This makes mmap undesirable, as it forces caching.
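Windows overlapped IO with FILE_FLAG_NO_BUFFERING can't be shown portably, so this sketch swaps in a plainer technique to illustrate the explicit-buffer part of the idea: a dedicated reader thread fills a small, fixed pool of preallocated buffers (double buffering with the default of two), so reading overlaps with parsing and the parser knows exactly when a chunk is ready. It still goes through the OS file cache. `threaded_chunks`, `chunk_size` and `n_buffers` are hypothetical names. If the file were opened with FILE_FLAG_NO_BUFFERING, Windows would additionally require the buffer address, the read size and the file offset to be multiples of the volume's sector size.

```python
import io
import queue
import threading
from typing import BinaryIO, Iterator


def threaded_chunks(f: BinaryIO, chunk_size: int = 1 << 20,
                    n_buffers: int = 2) -> Iterator[memoryview]:
    """Yield chunks of f that a background thread reads ahead (hypothetical helper).

    Buffers are preallocated once and recycled, so each yielded memoryview is
    only valid until the next chunk is requested.
    """
    free: queue.Queue = queue.Queue()    # empty buffers the reader may fill
    filled: queue.Queue = queue.Queue()  # (buffer, byte count), None at EOF, or an exception
    for _ in range(n_buffers):
        free.put(bytearray(chunk_size))

    def reader() -> None:
        try:
            while True:
                buf = free.get()         # blocks until the parser hands a buffer back
                n = f.readinto(buf)
                if not n:                # 0 bytes read: end of file
                    filled.put(None)
                    return
                filled.put((buf, n))
        except BaseException as exc:     # pass IO errors on to the consuming thread
            filled.put(exc)

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    while True:
        item = filled.get()              # blocks only when the reader is behind
        if item is None:
            break
        if isinstance(item, BaseException):
            raise item
        buf, n = item
        yield memoryview(buf)[:n]
        free.put(buf)                    # recycle the buffer for the next read
    thread.join()


if __name__ == "__main__":
    data = b"1. e4 e5 2. Nf3 Nc6 *\n" * 10_000
    total = sum(len(c) for c in threaded_chunks(io.BytesIO(data), chunk_size=4096))
    print(total == len(data))  # True
```

The fixed buffer pool is what gives the explicit feedback: if the parser ever waits on `filled.get()`, reading is the bottleneck; if the reader waits on `free.get()`, parsing is. A fuller version would also need a way to stop the reader thread when the consumer abandons the iterator early.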