Skip to content

Commit 3d9c5e1

Browse files
fdmananagregkh
authored andcommitted
btrfs: fix race between setting last_dir_index_offset and inode logging
[ Upstream commit 59a0dd4 ] At inode_logged() if we find that the inode was not logged before we update its ->last_dir_index_offset to (u64)-1 with the goal that the next directory log operation will see the (u64)-1 and then figure out it must check what was the index of the last logged dir index key and update ->last_dir_index_offset to that key's offset (this is done in update_last_dir_index_offset()). This however has a possibility for a time window where a race can happen and lead to directory logging skipping dir index keys that should be logged. The race happens like this: 1) Task A calls inode_logged(), sees ->logged_trans as 0 and then checks that the inode item was logged before, but before it sets the inode's ->last_dir_index_offset to (u64)-1... 2) Task B is at btrfs_log_inode() which calls inode_logged() early, and that has set ->last_dir_index_offset to (u64)-1; 3) Task B then enters log_directory_changes() which calls update_last_dir_index_offset(). There it sees ->last_dir_index_offset is (u64)-1 and that the inode was logged before (ctx->logged_before is true), and so it searches for the last logged dir index key in the log tree and it finds that it has an offset (index) value of N, so it sets ->last_dir_index_offset to N, so that we can skip index keys that are less than or equal to N (later at process_dir_items_leaf()); 4) Task A now sets ->last_dir_index_offset to (u64)-1, undoing the update that task B just did; 5) Task B will now skip every index key when it enters process_dir_items_leaf(), since ->last_dir_index_offset is (u64)-1. Fix this by making inode_logged() not touch ->last_dir_index_offset and initializing it to 0 when an inode is loaded (at btrfs_alloc_inode()) and then having update_last_dir_index_offset() treat a value of 0 as meaning we must check the log tree and update with the index of the last logged index key. This is fine since the minimum possible value for ->last_dir_index_offset is 1 (BTRFS_DIR_START_INDEX - 1 = 2 - 1 = 1). This also simplifies the management of ->last_dir_index_offset and now all accesses to it are done under the inode's log_mutex. Fixes: 0f8ce49 ("btrfs: avoid inode logging during rename and link when possible") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
1 parent 37c4910 commit 3d9c5e1

File tree

3 files changed

+4
-16
lines changed

3 files changed

+4
-16
lines changed

fs/btrfs/btrfs_inode.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -247,7 +247,7 @@ struct btrfs_inode {
247247
u64 new_delalloc_bytes;
248248
/*
249249
* The offset of the last dir index key that was logged.
250-
* This is used only for directories.
250+
* This is used only for directories. Protected by 'log_mutex'.
251251
*/
252252
u64 last_dir_index_offset;
253253
};

fs/btrfs/inode.c

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7767,6 +7767,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
77677767
ei->last_sub_trans = 0;
77687768
ei->logged_trans = 0;
77697769
ei->delalloc_bytes = 0;
7770+
/* new_delalloc_bytes and last_dir_index_offset are in a union. */
77707771
ei->new_delalloc_bytes = 0;
77717772
ei->defrag_bytes = 0;
77727773
ei->disk_i_size = 0;

fs/btrfs/tree-log.c

Lines changed: 2 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -3448,19 +3448,6 @@ static int inode_logged(const struct btrfs_trans_handle *trans,
34483448
inode->logged_trans = trans->transid;
34493449
spin_unlock(&inode->lock);
34503450

3451-
/*
3452-
* If it's a directory, then we must set last_dir_index_offset to the
3453-
* maximum possible value, so that the next attempt to log the inode does
3454-
* not skip checking if dir index keys found in modified subvolume tree
3455-
* leaves have been logged before, otherwise it would result in attempts
3456-
* to insert duplicate dir index keys in the log tree. This must be done
3457-
* because last_dir_index_offset is an in-memory only field, not persisted
3458-
* in the inode item or any other on-disk structure, so its value is lost
3459-
* once the inode is evicted.
3460-
*/
3461-
if (S_ISDIR(inode->vfs_inode.i_mode))
3462-
inode->last_dir_index_offset = (u64)-1;
3463-
34643451
return 1;
34653452
}
34663453

@@ -4052,7 +4039,7 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
40524039

40534040
/*
40544041
* If the inode was logged before and it was evicted, then its
4055-
* last_dir_index_offset is (u64)-1, so we don't the value of the last index
4042+
* last_dir_index_offset is 0, so we don't know the value of the last index
40564043
* key offset. If that's the case, search for it and update the inode. This
40574044
* is to avoid lookups in the log tree every time we try to insert a dir index
40584045
* key from a leaf changed in the current transaction, and to allow us to always
@@ -4068,7 +4055,7 @@ static int update_last_dir_index_offset(struct btrfs_inode *inode,
40684055

40694056
lockdep_assert_held(&inode->log_mutex);
40704057

4071-
if (inode->last_dir_index_offset != (u64)-1)
4058+
if (inode->last_dir_index_offset != 0)
40724059
return 0;
40734060

40744061
if (!ctx->logged_before) {

0 commit comments

Comments
 (0)