Writing to a Text File (Thread Safe)

Don · Post by **Don** » Sun Aug 11, 2013 8:56 pm

syzygy wrote:
Don wrote:You asked me where I got the 512 from. I found it in the pipe man page, but presumably the POSIX standard does not impose this limit on printf statements for conforming C libraries. This applies to ALL writes, not just printf:
PIPE_BUF

POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be atomic: the output data is written to the pipe as a contiguous sequence. Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may interleave the data with data written by other processes. POSIX.1-2001 requires PIPE_BUF to be at least 512 bytes. (On Linux, PIPE_BUF is 4096 bytes.)
Ok, that explains the 512. But note that this only applies to write()s to pipes and not to printf() (which is neither a write(), nor outputs to a pipe).

I'm probably wrong in my thinking, but I always thought that, semantically, sending output with fprintf was the same as writing to a pipe. Otherwise it seems silly: create a pipe, fork the program and write to it with fprintf, and then it IS a pipe, yet the program has not changed (except for the fork).

Once you use (FILE *) objects, I/O is buffered by the C library and the connection with write() is lost. Conceivably, printf() could insert data into the buffer character by character interleaved with characters written by other threads. Or printf() could issue a separate write() for each 10 characters. It is only the atomicity guarantee for printf() mandated by POSIX that prevents this.

It might still be wise to use locks and not depend on printf() being atomic if you want to run on Windows.

I have no idea how Windows does this so unless I knew I would just go ahead and do the locks too.
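For the case syzygy describes, several stdio calls that must stay together, POSIX provides flockfile() and funlockfile(). A minimal sketch, assuming a POSIX-compliant C library; print_pv() and its arguments are hypothetical. (The Microsoft CRT offers _lock_file()/_unlock_file() for the same purpose.)

```c
#define _POSIX_C_SOURCE 200809L   /* for flockfile()/funlockfile() */
#include <stdio.h>

/* Hypothetical helper: print a search result as a header line followed by
   one line per move. flockfile() acquires the same per-stream lock that each
   stdio call takes internally (the lock is recursive), so no other thread's
   output to `out` can land between these lines. It does not protect other
   FILE objects, or other processes, writing to the same underlying file. */
void print_pv(FILE *out, int depth, const char *const moves[], int nmoves)
{
    flockfile(out);
    fprintf(out, "depth %d\n", depth);
    for (int i = 0; i < nmoves; i++)
        fprintf(out, "  %s\n", moves[i]);
    funlockfile(out);
}
```

Inside the locked region the *_unlocked variants such as putc_unlocked() may also be used, since the caller already owns the stream lock.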

Henk · Post by **Henk** » Sun Aug 11, 2013 10:56 pm

Locking in Unix operating systems is done with semaphores or mutexes.
In .NET it's Monitor.Enter()/Monitor.Exit() (the C# lock statement).
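As a sketch of the Unix side in C: a POSIX semaphore initialized to 1 can serve as the lock around output. The names below are hypothetical, and unnamed semaphores created with sem_init() are not available everywhere (macOS, for example, does not implement them), which is one reason a pthread mutex is the more common choice.

```c
#include <errno.h>
#include <semaphore.h>
#include <stdio.h>

/* A POSIX semaphore with an initial count of 1 acts as a simple lock:
   sem_wait() takes it, sem_post() releases it. (Hypothetical names.) */
static sem_t out_sem;

int out_sem_init(void)
{
    /* pshared = 0: shared only among the threads of this process */
    return sem_init(&out_sem, 0, 1);
}

void locked_puts(FILE *out, const char *s)
{
    while (sem_wait(&out_sem) != 0 && errno == EINTR)
        ;                           /* retry if interrupted by a signal */
    fputs(s, out);
    sem_post(&out_sem);
}
```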

bob · Post by **bob** » Sun Aug 11, 2013 11:12 pm

syzygy wrote:
Don wrote:
syzygy wrote:
Don wrote:I think in Linux a small write is atomic. There is an internal OS-defined write buffer size (I think it's called PIPE_BUFFER_SIZE or something like that) which determines how much is written in one go. So if you had 100 different processes writing to the end of a text file, I don't think they would get mixed together, but I'm not sure about the order they get written in, if that is an issue for you. I don't know how other OSes do it.

I believe the size of that is defined to be 512 and may be a POSIX standard.
It is not sufficient (and in fact not so important from the programmer's point of view) that the OS implements atomic writes. The programmer usually does not invoke the OS directly. The question is what the C-library (or e.g. the C# runtime in case of C#) guarantees.

If you use streams (FILE *) in C, then POSIX guarantees that stream operations are atomic (link). So if two threads each perform a single fprintf(), there is a guarantee that they will be executed sequentially, i.e. the program will not crash, and the two output strings will not be interleaved. However, the moment one thread uses two separate fprintf()s that need to stay together, you will need to lock.
I think my understanding is exactly the same as yours on this.
I don't think so.

Therefore, if you are doing simple logging with short lines of less than 512 bytes, you do not need a critical section; just use printf the way you normally would. If you need to printf several lines that must stay grouped together for context, you need to use a critical section even if the total you intend to write is less than 512 bytes.
No, the question is only whether you use one single printf() or multiple printf()s. Single printf()s are guaranteed to be executed atomically. This guarantee is provided by the (POSIX-compliant) C library. There is no 512 byte issue here. And it has really nothing to do with how the OS performs low level writes.

Otherwise printf lines from other threads may get placed between the lines you want grouped together. If you can group them together into a single string with a single printf and they total less than 512 bytes, you don't need a critical section, but you really need to be quite sure of that before printing, and it's probably more trouble to check than it's worth.
I don't know where you got the "512" from...

If you look at Crafty's utility.c file, you can see my Print() function. It uses printf() to send a message to the console, and fprintf() to send it to a file. I can guarantee you with 100% accuracy, fprintf() is absolutely NOT atomic. I have inadvertently done fprintf()s from more than one thread and ended up with corrupted log files. The problem is that each thread will access the descriptor, discover where the next byte to write is, and then write there. Then the byte offset gets updated. Sometimes the two writes overwrite the same area of the file, which might leave part of the first write following the second one if the first one was longer. But then updating the file position pointer skips too much, leaving null/garbage bytes in the file.

And there's no question this still happens: when debugging some of the 23.6 changes, I was seeing corrupted log files. The solution was a simple lock before and unlock after the fprintf().

This was failing on my MacBook (OS X), on our cluster (different flavors of Linux depending on the cluster), and I also saw it on ICC using the 8-core box I generally play on (another Linux version)...
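bob's fix (lock before, unlock after) as a minimal sketch with a pthread mutex. This is not Crafty's actual Print(); log_print() and print_lock are hypothetical names. Provided all output goes through this one function, holding a single lock around both calls also keeps the console copy and the log copy of a message from being separated by another thread's output.

```c
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

/* One process-wide lock guarding all diagnostic output (hypothetical name). */
static pthread_mutex_t print_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical Print()-style helper: sends the same formatted message to
   the console and to a log file while holding print_lock. */
void log_print(FILE *console, FILE *logf, const char *fmt, ...)
{
    va_list ap;

    pthread_mutex_lock(&print_lock);
    va_start(ap, fmt);
    vfprintf(console, fmt, ap);
    va_end(ap);
    va_start(ap, fmt);          /* a va_list must be restarted to be read again */
    vfprintf(logf, fmt, ap);
    va_end(ap);
    fflush(logf);               /* push the log line to the OS before unlocking */
    pthread_mutex_unlock(&print_lock);
}
```

The mutex only orders threads within one process; it does nothing for separate processes appending to the same log.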

syzygy · Post by **syzygy** » Sun Aug 11, 2013 11:14 pm

Don wrote:
syzygy wrote:
Don wrote:You asked me where I got the 512 from. I found it in the pipe man page, but presumably the POSIX standard does not impose this limit on printf statements for conforming C libraries. This applies to ALL writes, not just printf:
PIPE_BUF

POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be atomic: the output data is written to the pipe as a contiguous sequence. Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may interleave the data with data written by other processes. POSIX.1-2001 requires PIPE_BUF to be at least 512 bytes. (On Linux, PIPE_BUF is 4096 bytes.)
Ok, that explains the 512. But note that this only applies to write()s to pipes and not to printf() (which is neither a write(), nor outputs to a pipe).
I'm probably wrong in my thinking, but I always thought that, semantically, sending output with fprintf was the same as writing to a pipe. Otherwise it seems silly: create a pipe, fork the program and write to it with fprintf, and then it IS a pipe, yet the program has not changed (except for the fork).

(FILE *) is a buffering wrapper around a file descriptor. (This buffering does not refer to OS buffering but to buffering implemented by the C library.) File descriptors can refer to regular files, directories, devices, sockets and pipes. Write() operates directly on a file descriptor, i.e. without the buffering of (FILE *) operations. Printf() operates on (FILE *) objects.

If stdout is (a buffering wrapper around) a pipe to another process, then printf will indeed end up write()ing to a pipe, but there is really no reason to expect that atomicity of write() gives atomicity of printf(). For all we know a single printf() can result in 100 write()s or 100 printf()s might be bundled into a single write(). (Of course fflush() gives some control on this.)

All file descriptors have semantic similarities, but that does not mean that they share the same atomicity properties. E.g., write()s to pipes of fewer than PIPE_BUF bytes are atomic, but it seems there is no POSIX atomicity guarantee for write()s to files.
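The pipe limit quoted from the man page can be checked at run time. A minimal sketch, assuming a POSIX system; pipe_buf_size() is a hypothetical helper name. It asks fpathconf() for _PC_PIPE_BUF on a freshly created pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>   /* pipe(), fpathconf(), close() */

/* Return PIPE_BUF for a new pipe, or -1 on error. A write() of at most this
   many bytes to a pipe is atomic. POSIX only promises a value of at least
   512 (_POSIX_PIPE_BUF). The limit says nothing about regular files or
   about printf(). */
long pipe_buf_size(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    long n = fpathconf(fds[1], _PC_PIPE_BUF);
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

On Linux this should report 4096, matching the man page.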

In any event, because (FILE *) is a buffering wrapper around a file descriptor, any atomicity of operations on the underlying file descriptor does not translate into atomicity of operations on the (FILE *) object. However, POSIX does separately guarantee atomicity of operations on (FILE *) objects.
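The library-level buffering described here is easy to observe. A minimal sketch, assuming a POSIX system; demo_stdio_buffering() is a hypothetical name. It wraps a pipe in a fully buffered FILE object (as stdout typically is when redirected), then checks whether the bytes from one fprintf() have reached the pipe before and after fflush().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Stores the result of read() on the pipe just after fprintf() in *before
   and just after fflush() in *after. Returns 0, or -1 on setup failure. */
int demo_stdio_buffering(ssize_t *before, ssize_t *after)
{
    int fds[2];
    char buf[64];

    if (pipe(fds) != 0)
        return -1;
    fcntl(fds[0], F_SETFL, O_NONBLOCK);     /* read() must not block */
    FILE *out = fdopen(fds[1], "w");
    if (out == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    setvbuf(out, NULL, _IOFBF, BUFSIZ);      /* full buffering, set before any I/O */

    fprintf(out, "hello\n");
    *before = read(fds[0], buf, sizeof buf); /* -1 (EAGAIN): still in stdio's buffer */
    fflush(out);                             /* stdio now issues the write() */
    *after = read(fds[0], buf, sizeof buf);  /* the 6 bytes have arrived */

    fclose(out);                             /* also closes fds[1] */
    close(fds[0]);
    return 0;
}
```

Which write()s stdio issues, and when, is up to the library; that is why the atomicity of write() says nothing by itself about printf().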

bob · Post by **bob** » Sun Aug 11, 2013 11:16 pm

syzygy wrote:
Don wrote:You asked me where I got the 512 from. I found it in the pipe man page, but presumably the POSIX standard does not impose this limit on printf statements for conforming C libraries. This applies to ALL writes, not just printf:
PIPE_BUF

POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be atomic: the output data is written to the pipe as a contiguous sequence. Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may interleave the data with data written by other processes. POSIX.1-2001 requires PIPE_BUF to be at least 512 bytes. (On Linux, PIPE_BUF is 4096 bytes.)
Ok, that explains the 512. But note that this only applies to write()s to pipes and not to printf() (which is neither a write(), nor outputs to a pipe). Once you use (FILE *) objects, I/O is buffered by the C library and the connection with write() is lost. Conceivably, printf() could insert data into the buffer character by character interleaved with characters written by other threads. Or printf() could issue a separate write() for each 10 characters. It is only the atomicity guarantee for printf() mandated by POSIX that prevents this.

It might still be wise to use locks and not depend on printf() being atomic if you want to run on Windows.

Actually printf() IS a write(). The write is just in the C library. That's one of the causes of the WinBoard protocol buffering issues most people have, as they tend to use printf() and scanf() to write/read data. The only reasonable way to read is to use read() and bypass all but system buffering.
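One way to follow this advice on the input side, as a sketch: read_line() is a hypothetical helper that reads one command from a descriptor (for an engine, usually STDIN_FILENO) with read() instead of scanf(). Reading a single byte per call is slow, but engine input is small, and no input is left in a hidden stdio buffer, so a later select() or poll() on the descriptor reflects what is actually pending.

```c
#include <errno.h>
#include <unistd.h>

/* Read one '\n'-terminated line from fd into buf, strip the newline and
   NUL-terminate it. Returns the line length, or -1 on EOF/error before any
   byte was read. Lines longer than size - 1 are returned in pieces. */
ssize_t read_line(int fd, char *buf, size_t size)
{
    size_t n = 0;
    char c;

    if (size == 0)
        return -1;
    while (n + 1 < size) {
        ssize_t r = read(fd, &c, 1);
        if (r < 0 && errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (r <= 0) {               /* EOF or error */
            if (n == 0)
                return -1;
            break;                  /* return the partial last line */
        }
        if (c == '\n')
            break;
        buf[n++] = c;
    }
    buf[n] = '\0';
    return (ssize_t)n;
}
```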

bob · Post by **bob** » Sun Aug 11, 2013 11:18 pm

Don wrote:
syzygy wrote:
Don wrote:You asked me where I got the 512 from. I found it in the pipe man page, but presumably the POSIX standard does not impose this limit on printf statements for conforming C libraries. This applies to ALL writes, not just printf:
PIPE_BUF

POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be atomic: the output data is written to the pipe as a contiguous sequence. Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may interleave the data with data written by other processes. POSIX.1-2001 requires PIPE_BUF to be at least 512 bytes. (On Linux, PIPE_BUF is 4096 bytes.)
Ok, that explains the 512. But note that this only applies to write()s to pipes and not to printf() (which is neither a write(), nor outputs to a pipe).
I'm probably wrong in my thinking, but I always thought that, semantically, sending output with fprintf was the same as writing to a pipe. Otherwise it seems silly: create a pipe, fork the program and write to it with fprintf, and then it IS a pipe, yet the program has not changed (except for the fork).

Once you use (FILE *) objects, I/O is buffered by the C library and the connection with write() is lost. Conceivably, printf() could insert data into the buffer character by character interleaved with characters written by other threads. Or printf() could issue a separate write() for each 10 characters. It is only the atomicity guarantee for printf() mandated by POSIX that prevents this.

It might still be wise to use locks and not depend on printf() being atomic if you want to run on Windows.
I have no idea how Windows does this so unless I knew I would just go ahead and do the locks too.

This is mangled. printf simply does (internally) a write to stdout (descriptor 1). If you create a pipe and use dup2() to copy the pipe's write descriptor onto stdout, then any printf now just writes to the pipe instead. And nobody knows otherwise.
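The dup2() step as a minimal sketch, assuming a POSIX system; capture_printf() and the "move e2e4" text are hypothetical. The printf() call itself is unchanged; only descriptor 1 now refers to a pipe.

```c
#include <stdio.h>
#include <unistd.h>

/* Redirect stdout (descriptor 1) into a pipe, printf() into it, restore
   stdout, and read back what arrived. Returns bytes captured or -1. */
ssize_t capture_printf(char *buf, size_t size)
{
    int fds[2];

    fflush(stdout);                  /* don't let earlier output leak into the pipe */
    int saved = dup(STDOUT_FILENO);  /* keep the original stdout */
    if (saved < 0)
        return -1;
    if (pipe(fds) != 0) {
        close(saved);
        return -1;
    }
    dup2(fds[1], STDOUT_FILENO);     /* descriptor 1 is now the pipe's write end */
    close(fds[1]);

    printf("move e2e4\n");           /* same call, new destination */
    fflush(stdout);                  /* stdio's buffer goes out via write() */

    dup2(saved, STDOUT_FILENO);      /* put the original stdout back */
    close(saved);

    ssize_t n = read(fds[0], buf, size - 1);
    close(fds[0]);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```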

Interleaved characters are not the only issue. I gave the other one elsewhere in this thread, and it is an absolute issue that can be shown easily.

Don · Post by **Don** » Sun Aug 11, 2013 11:21 pm

bob wrote:
syzygy wrote:
Don wrote:
syzygy wrote:
Don wrote:I think in Linux a small write is atomic. There is an internal OS-defined write buffer size (I think it's called PIPE_BUFFER_SIZE or something like that) which determines how much is written in one go. So if you had 100 different processes writing to the end of a text file, I don't think they would get mixed together, but I'm not sure about the order they get written in, if that is an issue for you. I don't know how other OSes do it.

I believe the size of that is defined to be 512 and may be a POSIX standard.
It is not sufficient (and in fact not so important from the programmer's point of view) that the OS implements atomic writes. The programmer usually does not invoke the OS directly. The question is what the C-library (or e.g. the C# runtime in case of C#) guarantees.

If you use streams (FILE *) in C, then POSIX guarantees that stream operations are atomic (link). So if two threads each perform a single fprintf(), there is a guarantee that they will be executed sequentially, i.e. the program will not crash, and the two output strings will not be interleaved. However, the moment one thread uses two separate fprintf()s that need to stay together, you will need to lock.
I think my understanding is exactly the same as yours on this.
I don't think so.

Therefore, if you are doing simple logging with short lines of less than 512 bytes, you do not need a critical section; just use printf the way you normally would. If you need to printf several lines that must stay grouped together for context, you need to use a critical section even if the total you intend to write is less than 512 bytes.
No, the question is only whether you use one single printf() or multiple printf()s. Single printf()s are guaranteed to be executed atomically. This guarantee is provided by the (POSIX-compliant) C library. There is no 512 byte issue here. And it has really nothing to do with how the OS performs low level writes.

Otherwise printf lines from other threads may get placed between the lines you want grouped together. If you can group them together into a single string with a single printf and they total less than 512 bytes, you don't need a critical section, but you really need to be quite sure of that before printing, and it's probably more trouble to check than it's worth.
I don't know where you got the "512" from...
If you look at Crafty's utility.c file, you can see my Print() function. It uses printf() to send a message to the console, and fprintf() to send it to a file. I can guarantee you with 100% accuracy, fprintf() is absolutely NOT atomic. I have inadvertently done fprintf()s from more than one thread and ended up with corrupted log files. The problem is that each thread will access the descriptor, discover where the next byte to write is, and then write there. Then the byte offset gets updated. Sometimes the two writes overwrite the same area of the file, which might leave part of the first write following the second one if the first one was longer. But then updating the file position pointer skips too much, leaving null/garbage bytes in the file.

And there's no question this still happens: when debugging some of the 23.6 changes, I was seeing corrupted log files. The solution was a simple lock before and unlock after the fprintf().

This was failing on my MacBook (OS X), on our cluster (different flavors of Linux depending on the cluster), and I also saw it on ICC using the 8-core box I generally play on (another Linux version)...

Have you done this test in the last 10 years? The documentation does say that printf is atomic so something does not quite wash. I have been burned many times by documentation though ....
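The test Don asks about can be sketched as a small stress program: several threads fprintf() complete lines to one shared FILE object, then every line is checked. The names, thread count and line format are all hypothetical. On a C library that implements the POSIX per-stream locking it should report 0 damaged lines. A clean run does not prove atomicity, and it only covers threads sharing one FILE object, not separate FILE objects or separate processes writing to the same file.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NTHREADS 8
#define NLINES   1000
#define LINE_FMT "thread %d line %d ........................................\n"

static FILE *shared_fp;             /* one FILE object shared by all threads */

static void *writer(void *arg)
{
    int id = *(const int *)arg;
    for (int i = 0; i < NLINES; i++)
        fprintf(shared_fp, LINE_FMT, id, i);   /* one call per line, no extra lock */
    return NULL;
}

/* Returns the number of damaged lines (0 means every line arrived intact),
   or -1 if setup failed. */
long stress_fprintf(void)
{
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];
    int started = 0;

    shared_fp = tmpfile();
    if (shared_fp == NULL)
        return -1;
    for (int t = 0; t < NTHREADS; t++) {
        ids[t] = t;
        if (pthread_create(&tid[t], NULL, writer, &ids[t]) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    if (started != NTHREADS) {
        fclose(shared_fp);
        return -1;
    }

    rewind(shared_fp);
    char line[256], expect[256];
    long total = 0, bad = 0;
    while (fgets(line, sizeof line, shared_fp) != NULL) {
        int id, i;
        total++;
        /* intact means byte-for-byte what a single fprintf() produced */
        if (sscanf(line, "thread %d line %d", &id, &i) != 2) {
            bad++;
            continue;
        }
        snprintf(expect, sizeof expect, LINE_FMT, id, i);
        if (strcmp(line, expect) != 0)
            bad++;
    }
    fclose(shared_fp);
    if (total != (long)NTHREADS * NLINES)
        bad++;                      /* lines were lost or split */
    return bad;
}
```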

syzygy · Post by **syzygy** » Sun Aug 11, 2013 11:57 pm

bob wrote:Actually printf() IS a write(). The write is just in the C library. That's one of the causes of the WinBoard protocol buffering issues most people have, as they tend to use printf() and scanf() to write/read data. The only reasonable way to read is to use read() and bypass all but system buffering.

Fine with me if you want to equate printf() to write(), but for this discussion this is pretty useless.

syzygy · Post by **syzygy** » Mon Aug 12, 2013 12:13 am

Don wrote:
bob wrote:I can guarantee you with 100% accuracy, fprintf() is absolutely NOT atomic.
Have you done this test in the last 10 years? The documentation does say that printf is atomic so something does not quite wash. I have been burned many times by documentation though ....

I have not done any testing myself, but from what I am reading fprintf() really is atomic on POSIX-compliant platforms at the level of the (FILE *) object.

This means that if you fprintf() to the same (FILE *) object from multiple threads within a single process, characters from multiple output strings are not interleaved.

However, if multiple processes perform printf()s to private (FILE *) objects all wrapping the same shared file descriptor, then there is no atomicity guarantee. Each process locks its own (FILE *) object, which does not prevent concurrent write()s of partial lines to the shared file descriptor.

So maybe Bob is recalling an experience when Crafty was using processes instead of threads.
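For the multi-process case, one common mitigation is to format each line in memory and issue a single write() on a descriptor opened with O_APPEND. POSIX specifies that with O_APPEND the file offset is moved to the end of the file before each write, with no intervening file modification in between, so writers cannot overwrite each other's data or leave gaps the way bob describes. It does not in general promise that a long write() to a regular file cannot be split, so short lines remain the safer assumption. A minimal sketch; append_line() and the 512-byte buffer are hypothetical choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: format one complete log line in memory, then hand it
   to the kernel with a single write(). Intended for a descriptor opened as
   open(path, O_WRONLY | O_CREAT | O_APPEND, 0644), so every write starts at
   the current end of file. Returns bytes written, or -1 on error. */
ssize_t append_line(int fd, const char *fmt, ...)
{
    char buf[512];
    va_list ap;

    va_start(ap, fmt);
    int len = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (len < 0 || (size_t)len >= sizeof buf)
        return -1;                  /* refuse to write a truncated line */

    ssize_t w;
    do
        w = write(fd, buf, (size_t)len);
    while (w < 0 && errno == EINTR);  /* retry if interrupted by a signal */
    return w;                         /* caller may compare against len */
}
```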

sje · Post by **sje** » Mon Aug 12, 2013 12:48 am

syzygy wrote:So maybe Bob is recalling an experience when Crafty was using processes instead of threads.

At one time, a Linux thread was really just a lightweight process. This may have changed.
