File management in the Linux kernel
-----------------------------------

This document describes how locking for files (struct file)
and the file descriptor table (struct files_struct) works.

Up until 2.6.12, the file descriptor table was protected
with a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag. Typically
this would be the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected using
a reference count (->f_count).

In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements - the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for the updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are in a separate structure - struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself. On a subsequent
expansion of fdtable, a new fdtable structure is allocated
and files->fdt points to the new structure. The fdtable
structure is freed with RCU, and lock-free readers see either
the old fdtable or the new fdtable, making the update
appear atomic. Here are the locking rules for
the fdtable structure -

1. All references to the fdtable must be done through
   the files_fdtable() macro :

	struct fdtable *fdt;

	rcu_read_lock();

	fdt = files_fdtable(files);
	....
	/* n is a valid index only if it is below max_fds */
	if (n < fdt->max_fds)
		....
	...
	rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either fcheck() or fcheck_files() APIs. These
   take care of barrier requirements due to lock-free lookup.
   An example :

	struct file *file;

	rcu_read_lock();
	file = fcheck(fd);
	if (file) {
		...
	}
	....
	rcu_read_unlock();

5. Handling of the file structures is special. Since the look-up
   of the fd (fget()/fget_light()) is lock-free, it is possible
   that a look-up may race with the last put() operation on the
   file structure. This is avoided using atomic_long_inc_not_zero()
   on ->f_count :

	rcu_read_lock();
	file = fcheck_files(files, fd);
	if (file) {
		if (atomic_long_inc_not_zero(&file->f_count))
			*fput_needed = 1;
		else
			/* Didn't get the reference, someone's freed */
			file = NULL;
	}
	rcu_read_unlock();
	....
	return file;

   atomic_long_inc_not_zero() detects if the refcount is already zero or
   goes to zero during the increment. If it does, we fail
   fget()/fget_light().
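
The increment-unless-zero step can be illustrated outside the kernel.
What follows is only a minimal userspace sketch using C11 atomics, not
the kernel's implementation: struct demo_file, demo_inc_not_zero(),
demo_get() and demo_put() are hypothetical names invented for this
example, and no RCU is involved.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; only the refcount matters here. */
struct demo_file {
	atomic_long f_count;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true on success, false if the count was (or became) zero.
 */
static bool demo_inc_not_zero(atomic_long *count)
{
	long old = atomic_load(count);

	while (old != 0) {
		/* On failure the CAS reloads 'old' with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;
}

/* Look-up helper in the spirit of fget(): NULL means no reference taken. */
static struct demo_file *demo_get(struct demo_file *f)
{
	if (f && demo_inc_not_zero(&f->f_count))
		return f;
	return NULL;
}

/* Drop a reference; returns true if this was the last one. */
static bool demo_put(struct demo_file *f)
{
	return atomic_fetch_sub(&f->f_count, 1) == 1;
}
```

Once the last demo_put() has brought the count to zero, demo_get()
returns NULL instead of resurrecting the object, which mirrors the
reason fget()/fget_light() fail in that case.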

6. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However it is advisable to use files_fdtable()
   and fcheck()/fcheck_files() which take care of these issues.

7. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread may expand the files, thereby creating a new
   fdtable and making the earlier fdtable pointer stale.
   For example :

	spin_lock(&files->file_lock);
	fd = locate_fd(files, file, start);
	if (fd >= 0) {
		/* locate_fd() may have expanded fdtable, load the ptr */
		fdt = files_fdtable(files);
		__set_open_fd(fd, fdt);
		__clear_close_on_exec(fd, fdt);
		spin_unlock(&files->file_lock);
	.....

   Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
   the fdtable pointer (fdt) must be loaded after locate_fd().
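
To make the stale-pointer hazard concrete, here is a small userspace
sketch of a table that starts out embedded and is replaced on expansion.
It is an illustration under stated assumptions, not kernel code: every
demo_* name is hypothetical, a C11 release store and acquire load stand
in for rcu_assign_pointer() and rcu_dereference(), callers are assumed
to serialize updates (the role of files->file_lock), and replaced tables
are kept on a chain until teardown instead of being freed through RCU.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_INITIAL_FDS 4
#define DEMO_MAX_FDS     (1u << 20)	/* arbitrary cap for this sketch */

struct demo_fdtable {
	unsigned int max_fds;
	void **fd;			/* array of "file" pointers */
	struct demo_fdtable *retired;	/* table this one replaced */
};

struct demo_files {
	_Atomic(struct demo_fdtable *) fdt;	/* published table */
	struct demo_fdtable fdtab;		/* initial embedded table */
	void *fd_array[DEMO_INITIAL_FDS];
};

static void demo_files_init(struct demo_files *files)
{
	memset(files->fd_array, 0, sizeof(files->fd_array));
	files->fdtab.max_fds = DEMO_INITIAL_FDS;
	files->fdtab.fd = files->fd_array;
	files->fdtab.retired = NULL;
	atomic_init(&files->fdt, &files->fdtab);
}

/* Stand-in for files_fdtable(): acquire load of the published pointer. */
static struct demo_fdtable *demo_files_fdtable(struct demo_files *files)
{
	return atomic_load_explicit(&files->fdt, memory_order_acquire);
}

/* Returns 1 if a new table was published, 0 if none was needed, -1 on error. */
static int demo_expand(struct demo_files *files, unsigned int nr)
{
	struct demo_fdtable *cur = demo_files_fdtable(files);
	struct demo_fdtable *new_fdt;
	unsigned int size = cur->max_fds;

	if (nr < size)
		return 0;
	if (nr >= DEMO_MAX_FDS)
		return -1;
	while (size <= nr)
		size *= 2;

	new_fdt = malloc(sizeof(*new_fdt));
	if (!new_fdt)
		return -1;
	new_fdt->fd = calloc(size, sizeof(*new_fdt->fd));
	if (!new_fdt->fd) {
		free(new_fdt);
		return -1;
	}
	memcpy(new_fdt->fd, cur->fd, cur->max_fds * sizeof(*cur->fd));
	new_fdt->max_fds = size;
	new_fdt->retired = cur;

	/* Publish only after the new table is fully initialized. */
	atomic_store_explicit(&files->fdt, new_fdt, memory_order_release);
	return 1;
}

static int demo_install(struct demo_files *files, unsigned int fd, void *file)
{
	struct demo_fdtable *fdt;

	if (demo_expand(files, fd) < 0)
		return -1;
	/* demo_expand() may have replaced the table: load the pointer now. */
	fdt = demo_files_fdtable(files);
	fdt->fd[fd] = file;
	return 0;
}

/* Free every table except the embedded one, newest first. */
static void demo_files_destroy(struct demo_files *files)
{
	struct demo_fdtable *fdt = demo_files_fdtable(files);

	while (fdt != &files->fdtab) {
		struct demo_fdtable *next = fdt->retired;

		free(fdt->fd);
		free(fdt);
		fdt = next;
	}
}
```

A pointer obtained before demo_install() triggers an expansion still
refers to the old, smaller table, so indexing it with the new fd would
run past the end of its array. This is the same reason fdt is reloaded
after locate_fd() in the example above.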