- Mandatory File Locking For The Linux Operating System
- Andy Walker <andy@lysaker.kvaerner.no>
- 15 April 1996
- (Updated September 2007)
- 0. Why you should avoid mandatory locking
- -----------------------------------------
- The Linux implementation suffers from a number of difficult-to-fix race
- conditions which in practice make it unreliable:
- - The write system call checks for a mandatory lock only once
- at its start. It is therefore possible for a lock request to
- be granted after this check but before the data is modified.
- A process may then see file data change even while it holds a
- mandatory lock.
- - Similarly, an exclusive lock may be granted on a file after
- the kernel has decided to proceed with a read, but before the
- read has actually completed, and the reading process may see
- the file data in a state which should not have been visible
- to it.
- - Comparable races make the claimed mutual exclusion between locks
- and mmap() equally unreliable.
- 1. What is mandatory locking?
- ------------------------------
- Mandatory locking is kernel enforced file locking, as opposed to the more usual
- cooperative file locking used to guarantee sequential access to files among
- processes. File locks are applied using the flock() and fcntl() system calls
- (and the lockf() library routine, which is a wrapper around fcntl()). It is
- normally a process's responsibility to check for locks on a file it wishes to
- update, before applying its own lock, updating the file and unlocking it again.
- The most commonly used example of this (and in the case of sendmail, the most
- troublesome) is access to a user's mailbox. The mail user agent and the mail
- transfer agent must guard against updating the mailbox at the same time, and
- prevent reading the mailbox while it is being updated.
- In a perfect world all processes would use and honour a cooperative, or
- "advisory" locking scheme. However, the world isn't perfect, and there's
- a lot of poorly written code out there.
- In trying to address this problem, the designers of System V UNIX came up
- with a "mandatory" locking scheme, whereby the operating system kernel would
- block attempts by a process to write to a file that another process holds a
- "read" -or- "shared" lock on, and block attempts to either read or write to a
- file that a process holds a "write" -or- "exclusive" lock on.
- The System V mandatory locking scheme was intended to have as little impact as
- possible on existing user code. The scheme is based on marking individual files
- as candidates for mandatory locking, and using the existing fcntl()/lockf()
- interface for applying locks just as if they were normal, advisory locks.
- Note 1: In saying "file" in the paragraphs above I am actually not telling
- the whole truth. System V locking is based on fcntl(). The granularity of
- fcntl() is such that it allows the locking of byte ranges in files, in addition
- to entire files, so the mandatory locking rules also have byte level
- granularity.
- Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite
- borrowing the fcntl() locking scheme from System V. The mandatory locking
- scheme is defined by the System V Interface Definition (SVID) Version 3.
- 2. Marking a file for mandatory locking
- ---------------------------------------
- A file is marked as a candidate for mandatory locking by setting the group-id
- bit in its file mode but removing the group-execute bit. This is an otherwise
- meaningless combination, and was chosen by the System V implementors so as not
- to break existing user programs.
- Note that the group-id bit is usually automatically cleared by the kernel when
- a setgid file is written to. This is a security measure. The kernel has been
- modified to recognize the special case of a mandatory lock candidate and to
- refrain from clearing this bit. Similarly the kernel has been modified not
- to run mandatory lock candidates with setgid privileges.
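
As a rough illustration of the mode-bit rule described above, the following C
sketch tests for and applies the "setgid set, group-execute clear" combination.
The function names are hypothetical and are not part of any kernel or libc
interface. Marking the file is only one precondition: on Linux the filesystem
also had to be mounted with the "mand" option for the locks to be enforced.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if the mode has the set-group-id bit set
 * and the group-execute bit clear, i.e. the combination used to mark a
 * mandatory lock candidate; returns 0 otherwise. */
int is_mandatory_lock_candidate(mode_t mode)
{
    return (mode & S_ISGID) && !(mode & S_IXGRP);
}

/* Hypothetical helper: returns the given mode with setgid added and
 * group-execute removed. All other bits are left untouched. */
mode_t mandatory_lock_mode(mode_t mode)
{
    return (mode | S_ISGID) & ~(mode_t)S_IXGRP;
}

/* Hypothetical helper: marks the file at `path` as a candidate.
 * Returns 0 on success, -1 with errno set on failure. */
int mark_for_mandatory_locking(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    /* Pass only the permission bits (not the file type bits) to chmod(). */
    return chmod(path, mandatory_lock_mode(st.st_mode) & 07777);
}
```

From the shell, the equivalent of mark_for_mandatory_locking() is usually
written as "chmod g+s,g-x file".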
- 3. Available implementations
- ----------------------------
- I have considered the implementations of mandatory locking available with
- SunOS 4.1.x, Solaris 2.x and HP-UX 9.x.
- Generally I have tried to make the most sense out of the behaviour exhibited
- by these three reference systems. There are many anomalies.
- All the reference systems reject all calls to open() for a file on which
- another process has outstanding mandatory locks. This is in direct
- contravention of SVID 3, which states that only calls to open() with the
- O_TRUNC flag set should be rejected. The Linux implementation follows the SVID
- definition, which is the "Right Thing", since only calls with O_TRUNC can
- modify the contents of the file.
- HP-UX even disallows open() with O_TRUNC for a file with advisory locks, not
- just mandatory locks. That would appear to contravene POSIX.1.
- mmap() is another interesting case. All the operating systems mentioned
- prevent mandatory locks from being applied to an mmap()'ed file, but HP-UX
- also disallows advisory locks for such a file. SVID actually specifies the
- paranoid HP-UX behaviour.
- In my opinion only MAP_SHARED mappings should be immune from locking, and then
- only from mandatory locks - that is what is currently implemented.
- SunOS is so hopeless that it doesn't even honour the O_NONBLOCK flag for
- mandatory locks, so reads and writes to locked files always block when they
- should return EAGAIN.
- I'm afraid that this is such an esoteric area that the semantics described
- below are just as valid as any others, so long as the main points seem to
- agree.
- 4. Semantics
- ------------
- 1. Mandatory locks can only be applied via the fcntl()/lockf() locking
- interface - in other words the System V/POSIX interface. BSD style
- locks using flock() never result in a mandatory lock.
- 2. If a process has locked a region of a file with a mandatory read lock, then
- other processes are permitted to read from that region. If any of these
- processes attempts to write to the region it will block until the lock is
- released, unless the process has opened the file with the O_NONBLOCK
- flag in which case the system call will return immediately with the error
- status EAGAIN.
- 3. If a process has locked a region of a file with a mandatory write lock, all
- attempts to read or write to that region block until the lock is released,
- unless a process has opened the file with the O_NONBLOCK flag in which case
- the system call will return immediately with the error status EAGAIN.
- 4. Calls to open() with O_TRUNC, or to creat(), on an existing file that has
- any mandatory locks owned by other processes will be rejected with the
- error status EAGAIN.
- 5. Attempts to apply a mandatory lock to a file that is memory mapped and
- shared (via mmap() with MAP_SHARED) will be rejected with the error status
- EAGAIN.
- 6. Attempts to create a shared memory map of a file (via mmap() with MAP_SHARED)
- that has any mandatory locks in effect will be rejected with the error status
- EAGAIN.
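
The rules above can be exercised from user space with something like the
following C sketch. It takes a byte-range lock through fcntl() and wraps
write() so that the EAGAIN case from rules 2 and 3 is reported separately.
lock_region() and try_write() are hypothetical names chosen for this example.
Whether the lock is enforced as mandatory depends on the file being marked as a
candidate and on kernel and mount support; otherwise the same call produces an
ordinary advisory lock.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: apply a lock of the given type (F_RDLCK, F_WRLCK, or
 * F_UNLCK to release) to `len` bytes starting at `start`. A `len` of 0 means
 * "to the end of the file". Uses F_SETLK, so it fails rather than waits when
 * a conflicting lock exists. Returns 0 on success, -1 with errno set. */
int lock_region(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: write to a descriptor, distinguishing "would block".
 * Returns the number of bytes written, -2 if write() failed with EAGAIN
 * (for example, O_NONBLOCK is set and another process holds a conflicting
 * mandatory lock), or -1 on any other error. */
ssize_t try_write(int fd, const void *buf, size_t n)
{
    ssize_t r = write(fd, buf, n);

    if (r < 0 && errno == EAGAIN)
        return -2;
    return r;
}
```

A process that holds the lock itself is not blocked by it, so calling
lock_region(fd, F_WRLCK, 0, 10) and then try_write() on the same descriptor
succeeds; the EAGAIN path is only taken for a conflicting lock held by another
process.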
- 5. Which system calls are affected?
- -----------------------------------
- Those which read or modify a file's contents, not just the inode. That gives read(),
- write(), readv(), writev(), open(), creat(), mmap(), truncate() and
- ftruncate(). truncate() and ftruncate() are considered to be "write" actions
- for the purposes of mandatory locking.
- The affected region is usually defined as stretching from the current position
- for the total number of bytes read or written. For the truncate calls it is
- defined as the bytes of a file removed or added (we must also consider bytes
- added, as a lock can specify just "the whole file", rather than a specific
- range of bytes.)
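
To make the region rules above concrete, here is a small C sketch that computes
the affected byte range for a read or write and for a truncate, and tests it
against a lock range. This is an illustration only and not the kernel's code;
struct region and the three function names are invented for this example. A
lock length of 0 is treated as "to the end of the file", as with fcntl().

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical type: a byte range [start, start + len). */
struct region {
    off_t start;
    off_t len;
};

/* Region touched by a read or write: from the current position for the
 * number of bytes transferred. */
struct region io_region(off_t pos, size_t count)
{
    struct region r = { pos, (off_t)count };
    return r;
}

/* Region touched by a truncate: the bytes removed when shrinking, or the
 * bytes added when growing. */
struct region truncate_region(off_t old_size, off_t new_size)
{
    struct region r;

    if (new_size < old_size) {
        r.start = new_size;
        r.len = old_size - new_size;
    } else {
        r.start = old_size;
        r.len = new_size - old_size;
    }
    return r;
}

/* Returns 1 if a lock on [lstart, lstart + llen) overlaps the region, with
 * llen == 0 meaning the lock runs to the end of the file; 0 otherwise. */
int lock_overlaps(off_t lstart, off_t llen, struct region r)
{
    if (r.len == 0)
        return 0;                      /* nothing is touched */
    if (lstart >= r.start + r.len)
        return 0;                      /* lock starts after the region */
    if (llen != 0 && lstart + llen <= r.start)
        return 0;                      /* lock ends before the region */
    return 1;
}
```

Growing a 40-byte file to 100 bytes touches bytes 40 to 99. A lock on the first
40 bytes does not overlap that range, but a whole-file lock (start 0, length 0)
does, which is why added bytes have to be considered.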
- Note 3: I may have overlooked some system calls that need mandatory lock
- checking in my eagerness to get this code out the door. Please let me know, or
- better still fix the system calls yourself and submit a patch to me or Linus.
- 6. Warning!
- -----------
- Not even root can override a mandatory lock, so runaway processes can wreak
- havoc if they lock crucial files. The way around it is to change the file
- permissions (remove the setgid bit) before trying to read or write to it.
- Of course, that might be a bit tricky if the system is hung :-(
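
The workaround described above, removing the setgid bit so the file is no
longer a mandatory lock candidate, might look like this in C. The function name
is hypothetical, and the sketch assumes the caller has permission to change the
file's mode.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: clear the set-group-id bit on `path`, leaving the
 * other permission bits as they are. Returns 0 on success, -1 with errno
 * set on failure. */
int clear_mandatory_mark(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    /* Keep only the permission bits, then drop setgid. */
    return chmod(path, (st.st_mode & 07777) & ~(mode_t)S_ISGID);
}
```

The shell equivalent is "chmod g-s file".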