- Changes since 2.5.0:
- ---
- [recommended]
- New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(),
- sb_set_blocksize() and sb_min_blocksize().
- Use them.
- (sb_find_get_block() replaces 2.4's get_hash_table())
- ---
- [recommended]
- New methods: ->alloc_inode() and ->destroy_inode().
- Remove inode->u.foo_inode_i
- Declare
- struct foo_inode_info {
- /* fs-private stuff */
- struct inode vfs_inode;
- };
- static inline struct foo_inode_info *FOO_I(struct inode *inode)
- {
- return list_entry(inode, struct foo_inode_info, vfs_inode);
- }
- Use FOO_I(inode) instead of &inode->u.foo_inode_i;
- Add foo_alloc_inode() and foo_destroy_inode() - the former should allocate
- foo_inode_info and return the address of ->vfs_inode, the latter should free
- FOO_I(inode) (see in-tree filesystems for examples).
- Make them ->alloc_inode and ->destroy_inode in your super_operations.
- Keep in mind that now you need explicit initialization of private data,
- typically between calling iget_locked() and unlocking the inode.
- At some point that will become mandatory.
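The embedding pattern above can be tried out in user space. The following is a minimal sketch, not kernel code: `struct inode` is a one-field stand-in, `foo_flags` is a hypothetical private field, and `calloc()`/`free()` stand in for whatever allocator a real filesystem would use. The real ->alloc_inode() also takes a superblock argument, which is omitted here. `container_of()` is spelled out locally; it does the same pointer arithmetic as the `list_entry()` call shown above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct inode; only what the sketch needs. */
struct inode {
	unsigned long i_ino;
};

/* Step back from a member pointer to the structure that contains it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct foo_inode_info {
	int foo_flags;           /* fs-private stuff (hypothetical field) */
	struct inode vfs_inode;  /* embedded VFS inode */
};

static inline struct foo_inode_info *FOO_I(struct inode *inode)
{
	return container_of(inode, struct foo_inode_info, vfs_inode);
}

/* Allocate the container and hand back the address of ->vfs_inode. */
static struct inode *foo_alloc_inode(void)
{
	struct foo_inode_info *fi = calloc(1, sizeof(*fi));

	if (!fi)
		return NULL;
	return &fi->vfs_inode;
}

/* Free the container the inode is embedded in, not the inode itself. */
static void foo_destroy_inode(struct inode *inode)
{
	free(FOO_I(inode));
}
```

Because the inode is embedded rather than pointed to, one allocation covers both the VFS part and the private part, and FOO_I() costs only a constant pointer adjustment.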
- ---
- [mandatory]
- Change of file_system_type method (->read_super to ->get_sb)
- ->read_super() is no more. Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV.
- Turn your foo_read_super() into a function that would return 0 in case of
- success and negative number in case of error (-EINVAL unless you have more
- informative error value to report). Call it foo_fill_super(). Now declare
- int foo_get_sb(struct file_system_type *fs_type,
- int flags, const char *dev_name, void *data, struct vfsmount *mnt)
- {
- return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
- mnt);
- }
- (or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
- filesystem).
- Replace DECLARE_FSTYPE... with explicit initializer and have ->get_sb set as
- foo_get_sb.
- ---
- [mandatory]
- Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames.
- Most likely there is no need to change anything, but if you relied on
- global exclusion between renames for some internal purpose - you need to
- change your internal locking. Otherwise exclusion warranties remain the
- same (i.e. parents and victim are locked, etc.).
- ---
- [informational]
- Now we have the exclusion between ->lookup() and directory removal (by
- ->rmdir() and ->rename()). If you used to need that exclusion and did
- it by internal locking (most filesystems couldn't care less) - you
- can relax your locking.
- ---
- [mandatory]
- ->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
- ->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename()
- and ->readdir() are called without BKL now. Grab it on entry, drop upon return
- - that will guarantee the same locking you used to have. If your method or its
- parts do not need BKL - better yet, now you can shift lock_kernel() and
- unlock_kernel() so that they would protect exactly what needs to be
- protected.
- ---
- [mandatory]
- BKL is also moved from around sb operations. ->write_super() is now called
- without BKL held. BKL should have been shifted into individual fs sb_op
- functions. If you don't need it, remove it.
- ---
- [informational]
- check for ->link() target not being a directory is done by callers. Feel
- free to drop it...
- ---
- [informational]
- ->link() callers hold ->i_mutex on the object we are linking to. Some of your
- problems might be over...
- ---
- [mandatory]
- new file_system_type method - kill_sb(superblock). If you are converting
- an existing filesystem, set it according to ->fs_flags:
- FS_REQUIRES_DEV - kill_block_super
- FS_LITTER - kill_litter_super
- neither - kill_anon_super
- FS_LITTER is gone - just remove it from fs_flags.
- ---
- [mandatory]
- FS_SINGLE is gone (actually, that had happened back when ->get_sb()
- went in - and hadn't been documented ;-/). Just remove it from fs_flags
- (and see ->get_sb() entry for other actions).
- ---
- [mandatory]
- ->setattr() is called without BKL now. Caller _always_ holds ->i_mutex, so
- watch for ->i_mutex-grabbing code that might be used by your ->setattr().
- Callers of notify_change() need ->i_mutex now.
- ---
- [recommended]
- New super_block field "struct export_operations *s_export_op" for
- explicit support for exporting, e.g. via NFS. The structure is fully
- documented at its declaration in include/linux/fs.h, and in
- Documentation/filesystems/nfs/Exporting.
- Briefly it allows for the definition of decode_fh and encode_fh operations
- to encode and decode filehandles, and allows the filesystem to use
- a standard helper function for decode_fh, and provide file-system specific
- support for this helper, particularly get_parent.
- It is planned that this will be required for exporting once the code
- settles down a bit.
- [mandatory]
- s_export_op is now required for exporting a filesystem.
- isofs, ext2, ext3, reiserfs, fat
- can be used as examples of very different filesystems.
- ---
- [mandatory]
- iget4() and the read_inode2 callback have been superseded by iget5_locked()
- which has the following prototype,
- struct inode *iget5_locked(struct super_block *sb, unsigned long ino,
- int (*test)(struct inode *, void *),
- int (*set)(struct inode *, void *),
- void *data);
- 'test' is an additional function that can be used when the inode
- number is not sufficient to identify the actual file object. 'set'
- should be a non-blocking function that initializes those parts of a
- newly created inode to allow the test function to succeed. 'data' is
- passed as an opaque value to both test and set functions.
- When the inode has been created by iget5_locked(), it will be returned with the
- I_NEW flag set and will still be locked. The filesystem then needs to finalize
- the initialization. Once the inode is initialized it must be unlocked by
- calling unlock_new_inode().
- The filesystem is responsible for setting (and possibly testing) i_ino
- when appropriate. There is also a simpler iget_locked function that
- just takes the superblock and inode number as arguments and does the
- test and set for you.
- e.g.
- inode = iget_locked(sb, ino);
- if (inode->i_state & I_NEW) {
- err = read_inode_from_disk(inode);
- if (err < 0) {
- iget_failed(inode);
- return err;
- }
- unlock_new_inode(inode);
- }
- Note that if the process of setting up a new inode fails, then iget_failed()
- should be called on the inode to render it dead, and an appropriate error
- should be passed back to the caller.
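To make the roles of 'test', 'set' and 'data' concrete, here is a toy user-space model. Everything in it is an assumption made for illustration: the fixed-size `toy_cache`, the `generation` field used as a second key, the `foo_key` structure and the `toy_` helpers are hypothetical, the superblock and hash arguments of the real iget5_locked() are dropped, and `I_NEW` is given an arbitrary value. It shows only the lookup-or-create flow, not the kernel's locking.

```c
#include <stddef.h>

#define I_NEW 0x1u              /* stand-in flag value for this sketch */
#define TOY_CACHE_SLOTS 8

/* Stand-in for struct inode; 'generation' is a hypothetical second key. */
struct inode {
	unsigned long i_ino;
	unsigned long generation;
	unsigned int i_state;
	int in_cache;
};

/* The opaque 'data' handed to both callbacks. */
struct foo_key {
	unsigned long ino;
	unsigned long generation;
};

static struct inode toy_cache[TOY_CACHE_SLOTS];

/* 'test': does this cached inode match the object described by data? */
static int foo_test(struct inode *inode, void *data)
{
	struct foo_key *key = data;

	return inode->i_ino == key->ino &&
	       inode->generation == key->generation;
}

/* 'set': fill in just enough of a new inode for foo_test() to succeed. */
static int foo_set(struct inode *inode, void *data)
{
	struct foo_key *key = data;

	inode->i_ino = key->ino;
	inode->generation = key->generation;
	return 0;
}

/* Toy lookup-or-create; newly created inodes come back with I_NEW set. */
static struct inode *toy_iget5(int (*test)(struct inode *, void *),
			       int (*set)(struct inode *, void *),
			       void *data)
{
	struct inode *free_slot = NULL;
	int i;

	for (i = 0; i < TOY_CACHE_SLOTS; i++) {
		struct inode *inode = &toy_cache[i];

		if (!inode->in_cache) {
			if (!free_slot)
				free_slot = inode;
			continue;
		}
		if (test(inode, data))
			return inode;           /* cache hit */
	}
	if (!free_slot || set(free_slot, data))
		return NULL;                    /* cache full or set() failed */
	free_slot->in_cache = 1;
	free_slot->i_state |= I_NEW;
	return free_slot;
}

/* Counterpart of unlock_new_inode(): initialization is finished. */
static void toy_unlock_new_inode(struct inode *inode)
{
	inode->i_state &= ~I_NEW;
}
```

Two objects that share an inode number but differ in generation end up as distinct cached inodes, which is the case the 'test' callback exists for.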
- ---
- [recommended]
- ->getattr() finally getting used. See instances in nfs, minix, etc.
- ---
- [mandatory]
- ->revalidate() is gone. If your filesystem had it - provide ->getattr()
- and let it call whatever you had as ->revalidate() + (for symlinks that
- had ->revalidate()) add calls in ->follow_link()/->readlink().
- ---
- [mandatory]
- ->d_parent changes are not protected by BKL anymore. Read access is safe
- if at least one of the following is true:
- * filesystem has no cross-directory rename()
- * we know that parent had been locked (e.g. we are looking at
- ->d_parent of ->lookup() argument).
- * we are called from ->rename().
- * the child's ->d_lock is held
- Audit your code and add locking if needed. Notice that any place that is
- not protected by the conditions above is risky even in the old tree - you
- had been relying on BKL and that's prone to screwups. Old tree had quite
- a few holes of that kind - unprotected access to ->d_parent leading to
- anything from oops to silent memory corruption.
- ---
- [mandatory]
- FS_NOMOUNT is gone. If you use it - just set MS_NOUSER in flags
- (see rootfs for one kind of solution and bdev/socket/pipe for another).
- ---
- [recommended]
- Use bdev_read_only(bdev) instead of is_read_only(kdev). The latter
- is still alive, but only because of the mess in drivers/s390/block/dasd.c.
- As soon as it gets fixed is_read_only() will die.
- ---
- [mandatory]
- ->permission() is called without BKL now. Grab it on entry, drop upon
- return - that will guarantee the same locking you used to have. If
- your method or its parts do not need BKL - better yet, now you can
- shift lock_kernel() and unlock_kernel() so that they would protect
- exactly what needs to be protected.
- ---
- [mandatory]
- ->statfs() is now called without BKL held. BKL should have been
- shifted into individual fs sb_op functions where it's not clear that
- it's safe to remove it. If you don't need it, remove it.
- ---
- [mandatory]
- is_read_only() is gone; use bdev_read_only() instead.
- ---
- [mandatory]
- destroy_buffers() is gone; use invalidate_bdev().
- ---
- [mandatory]
- fsync_dev() is gone; use fsync_bdev(). NOTE: lvm breakage is
- deliberate; as soon as struct block_device * is propagated in a reasonable
- way by that code fixing will become trivial; until then nothing can be
- done.
- ---
- [mandatory]
- Block truncation on error exit from ->write_begin and ->direct_IO has
- moved from the generic methods (block_write_begin, cont_write_begin,
- nobh_write_begin, blockdev_direct_IO*) to callers. Take a look at
- ext2_write_failed and callers for an example.
- ---
- [mandatory]
- ->truncate is going away. The whole truncate sequence needs to be
- implemented in ->setattr, which is now mandatory for filesystems
- implementing on-disk size changes. Start with a copy of the old inode_setattr
- and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence so
- that it runs in this order: zeroing blocks using block_truncate_page or similar
- helpers, size update, and finally on-disk truncation, which should not fail.
- inode_change_ok now includes the size checks for ATTR_SIZE and must be called
- in the beginning of ->setattr unconditionally.
- ---
- [mandatory]
- ->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
- be used instead. It gets called whenever the inode is evicted, whether it has
- remaining links or not. Caller does *not* evict the pagecache or inode-associated
- metadata buffers; getting rid of those is responsibility of method, as it had
- been for ->delete_inode().
- ->drop_inode() returns int now; it's called on final iput() with
- inode->i_lock held and it returns true if the filesystem wants the inode to be
- dropped. As before, generic_drop_inode() is still the default and it's been
- updated appropriately. generic_delete_inode() is also alive and it consists
- simply of return 1. Note that all actual eviction work is done by caller after
- ->drop_inode() returns.
- clear_inode() is gone; use end_writeback() instead. As before, it must
- be called exactly once on each call of ->evict_inode() (as it used to be for
- each call of ->delete_inode()). Unlike before, if you are using inode-associated
- metadata buffers (i.e. mark_buffer_dirty_inode()), it's your responsibility to
- call invalidate_inode_buffers() before end_writeback().
- No async writeback (and thus no calls of ->write_inode()) will happen
- after end_writeback() returns, so actions that should not overlap with ->write_inode()
- (e.g. freeing on-disk inode if i_nlink is 0) ought to be done after that call.
- NOTE: checking i_nlink in the beginning of ->write_inode() and bailing out
- if it's zero is not *and* *never* *had* *been* enough. Final unlink() and iput()
- may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly
- free the on-disk inode, you may end up doing that while ->write_inode() is writing
- to it.
- ---
- [mandatory]
- .d_delete() now only advises the dcache as to whether or not to cache
- unreferenced dentries, and is now only called when the dentry refcount goes to
- 0. Even on 0 refcount transition, it must be able to tolerate being called 0,
- 1, or more times (e.g. constant, idempotent).
- ---
- [mandatory]
- .d_compare() calling convention and locking rules are significantly
- changed. Read updated documentation in Documentation/filesystems/vfs.txt (and
- look at examples of other filesystems) for guidance.
- ---
- [mandatory]
- .d_hash() calling convention and locking rules are significantly
- changed. Read updated documentation in Documentation/filesystems/vfs.txt (and
- look at examples of other filesystems) for guidance.
- ---
- [mandatory]
- dcache_lock is gone, replaced by fine grained locks. See fs/dcache.c
- for details of what locks to replace dcache_lock with in order to protect
- particular things. Most of the time, a filesystem only needs ->d_lock, which
- protects *all* the dcache state of a given dentry.
- ---
- [mandatory]
- Filesystems must RCU-free their inodes, if they can have been accessed
- via rcu-walk path walk (basically, if the file can have had a path name in the
- vfs namespace).
- Even though i_dentry and i_rcu share storage in a union, we will
- initialize the former in inode_init_always(), so just leave it alone in
- the callback. It used to be necessary to clean it there, but not anymore
- (starting at 3.2).
- ---
- [recommended]
- vfs now tries to do path walking in "rcu-walk mode", which avoids
- atomic operations and scalability hazards on dentries and inodes (see
- Documentation/filesystems/path-lookup.txt). d_hash and d_compare changes
- (above) are examples of the changes required to support this. For more complex
- filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so
- no changes are required to the filesystem. However, this is costly and loses
- the benefits of rcu-walk mode. We will begin to add filesystem callbacks that
- are rcu-walk aware, shown below. Filesystems should take advantage of this
- where possible.
- ---
- [mandatory]
- d_revalidate is a callback that is made on every path element (if
- the filesystem provides it), which used to require dropping out of rcu-walk
- mode. It may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD
- should be returned if the filesystem cannot handle rcu-walk. See
- Documentation/filesystems/vfs.txt for more details.
- permission and check_acl are inode permission checks that are called
- on many or all directory inodes on the way down a path walk (to check for
- exec permission). These must now be rcu-walk aware (flags & IPERM_FLAG_RCU).
- See Documentation/filesystems/vfs.txt for more details.
-
- ---
- [mandatory]
- In ->fallocate() you must check the mode option passed in. If your
- filesystem does not support hole punching (deallocating space in the middle of a
- file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode.
- Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set,
- so the i_size should not change when hole punching, even when punching the end
- of a file off.
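As a rough illustration of the mode check, here is a user-space sketch for a filesystem that cannot punch holes. `foo_check_fallocate_mode()` is a hypothetical helper, not a kernel function. The two flag values mirror include/linux/falloc.h and are defined locally only so the sketch builds anywhere. Rejecting every other unknown flag is an extra assumption of this sketch, not something the entry above requires.

```c
#include <errno.h>

/* Values mirror include/linux/falloc.h; defined here for portability. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE  0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE 0x02
#endif

/* Hypothetical mode check for a filesystem without hole punching. */
static int foo_check_fallocate_mode(int mode)
{
	/* Hole punching is not supported: say so explicitly. */
	if (mode & FALLOC_FL_PUNCH_HOLE)
		return -EOPNOTSUPP;

	/* Assumption of this sketch: refuse flags we do not know about. */
	if (mode & ~FALLOC_FL_KEEP_SIZE)
		return -EOPNOTSUPP;

	return 0;
}
```

A real ->fallocate() would run a check like this before touching any on-disk state, so that an unsupported request fails without side effects.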
- ---
- [mandatory]
- ->get_sb() is gone. Switch to use of ->mount(). Typically it's just
- a matter of switching from calling get_sb_... to mount_... and changing the
- function type. If you were doing it manually, just switch from setting ->mnt_root
- to some pointer to returning that pointer. On errors return ERR_PTR(...).
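The "return the pointer, or ERR_PTR(...) on error" convention can be sketched in user space as follows. The three helpers are simplified local re-implementations of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), written from memory for illustration. `struct dentry`, `foo_root` and the single `options_ok` parameter of `foo_mount()` are hypothetical stand-ins; the real ->mount() takes the filesystem type, flags, device name and data.

```c
#include <errno.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Stand-in for the kernel's struct dentry. */
struct dentry {
	const char *d_name;
};

static struct dentry foo_root = { "/" };

/*
 * Old style stored the root in mnt->mnt_root and returned 0 or -errno.
 * New style returns the root dentry itself, or an encoded error.
 */
static struct dentry *foo_mount(int options_ok)
{
	if (!options_ok)
		return ERR_PTR(-EINVAL);
	return &foo_root;
}
```

A caller tests the result with IS_ERR() and extracts the error code with PTR_ERR(), so one return value carries both the success and failure cases.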
- ---
- [mandatory]
- ->permission() and generic_permission() have lost the flags
- argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask.
- generic_permission() has also lost the check_acl argument; ACL checking
- has been taken to VFS and filesystems need to provide a non-NULL ->i_op->get_acl
- to read an ACL from disk.
- ---
- [mandatory]
- If you implement your own ->llseek() you must handle SEEK_HOLE and
- SEEK_DATA. You can handle this by returning -EINVAL, but it would be nicer to
- support it in some way. The generic handler assumes that the entire file is
- data and there is a virtual hole at the end of the file. So if the provided
- offset is less than i_size and SEEK_DATA is specified, return the same offset.
- If the above is true for the offset and you are given SEEK_HOLE, return the end
- of the file. If the offset is i_size or greater return -ENXIO in either case.
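The generic behaviour described above (whole file is data, one virtual hole at the end) fits in a few lines. This is a user-space sketch under stated assumptions: `foo_seek_data_hole()` is a hypothetical helper that computes only the resulting offset and does not update any file position, and SEEK_DATA and SEEK_HOLE are defined locally with their Linux values when the C library headers do not provide them.

```c
#include <errno.h>

/* Linux values; normally provided by <unistd.h> with _GNU_SOURCE. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/*
 * Treat the whole file as data with a single virtual hole at i_size.
 * Returns the new offset, or a negative errno value.
 */
static long long foo_seek_data_hole(long long offset, int whence,
				    long long i_size)
{
	if (whence != SEEK_DATA && whence != SEEK_HOLE)
		return -EINVAL;

	/* At or past end of file: neither data nor another hole follows. */
	if (offset < 0 || offset >= i_size)
		return -ENXIO;

	/* Inside the file: we are already in data; the hole is at EOF. */
	return whence == SEEK_DATA ? offset : i_size;
}
```

A filesystem that tracks real holes would replace the last line with a lookup in its extent or block map, keeping the same error cases.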
- ---
- [mandatory]
- If you have your own ->fsync() you must make sure to call
- filemap_write_and_wait_range() so that all dirty pages are synced out properly.
- You must also keep in mind that ->fsync() is not called with i_mutex held
- anymore, so if you require i_mutex locking you must make sure to take it and
- release it yourself.
- ---
- [mandatory]
- d_alloc_root() is gone, along with a lot of bugs caused by code
- misusing it. Replacement: d_make_root(inode). The difference is,
- d_make_root() drops the reference to inode if dentry allocation fails.
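The ownership rule can be modelled with a plain reference counter. This is a toy user-space sketch, not the kernel implementation: the `toy_` helpers, the `toy_fail_alloc` switch used to force an allocation failure, and the cut-down `struct inode` and `struct dentry` are all hypothetical.

```c
#include <stdlib.h>

/* Stand-ins for the kernel structures; only a refcount and a link. */
struct inode {
	int i_count;
};

struct dentry {
	struct inode *d_inode;
};

static int toy_fail_alloc;      /* set to 1 to simulate allocation failure */

/* Drop one reference, as iput() would. */
static void toy_iput(struct inode *inode)
{
	if (inode)
		inode->i_count--;
}

static struct dentry *toy_d_alloc(void)
{
	if (toy_fail_alloc)
		return NULL;
	return calloc(1, sizeof(struct dentry));
}

/*
 * Model of the d_make_root() contract: the caller's inode reference is
 * consumed either way. On success it now belongs to the dentry; on
 * failure it is dropped here, so the caller must not iput() again.
 */
static struct dentry *toy_d_make_root(struct inode *inode)
{
	struct dentry *root;

	if (!inode)
		return NULL;

	root = toy_d_alloc();
	if (!root) {
		toy_iput(inode);
		return NULL;
	}
	root->d_inode = inode;
	return root;
}
```

With this contract the caller's error path shrinks to checking for NULL and returning -ENOMEM, with no separate iput() to forget or to call twice.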
- ---
- [mandatory]
- vfs_readdir() is gone; switch to iterate_dir() instead.