Al Viro (2): do_last(): fetch directory ->i_mode and ->i_uid before it's too late vfs: fix do_last() regression
Amir Goldstein (1): debugfs: simplify __debugfs_remove_file()
Jan Kara (23): jbd2: Fixup stale comment in commit code jbd2: Completely fill journal descriptor blocks ext4: Move marking of handle as sync to ext4_add_nondir() ext4: Do not iput inode under running transaction ext4: Fix ext4_should_journal_data() for EA inodes ext4: Use ext4_journal_extend() instead of jbd2_journal_extend() ext4: Avoid unnecessary revokes in ext4_alloc_branch() ext4: Provide function to handle transaction restarts ext4, jbd2: Provide accessor function for handle credits ocfs2: Use accessor function for h_buffer_credits jbd2: Reorganize jbd2_journal_stop() jbd2: Drop pointless check from jbd2_journal_stop() jbd2: Drop pointless wakeup from jbd2_journal_stop() jbd2: Factor out common parts of stopping and restarting a handle jbd2: Account descriptor blocks into t_outstanding_credits jbd2: Drop jbd2_space_needed() jbd2: Reserve space for revoke descriptor blocks jbd2: Rename h_buffer_credits to h_total_credits jbd2: Make credit checking more strict ext4: Reserve revoke credits for freed blocks jbd2: Provide trace event for handle restarts jbd2: Fine tune estimate of necessary descriptor blocks jbd2: make jbd2_handle_buffer_credits() handle reserved handles
Liu Song (1): jbd2: remove repeated assignments in __jbd2_log_wait_for_space()
Shijie Luo (1): ext4: add cond_resched() to ext4_protect_reserved_inode
Wen Huang (1): libertas: Fix two buffer overflows at parsing bss descriptor
Yufen Yu (2): bdi: fix use-after-free for the bdi device bdi: fix kabi for struct backing_dev_info
yangerkun (1): ext4: reserve revoke credits in __ext4_new_inode
yu kuai (3): block: rename 'q->debugfs_dir' and 'q->blk_trace->dir' in blk_unregister_queue() simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems debugfs: fix kabi for function debugfs_remove_recursive
block/blk-cgroup.c | 7 +- block/blk-sysfs.c | 44 +++++ drivers/net/wireless/marvell/libertas/cfg.c | 16 +- fs/debugfs/inode.c | 126 +++----------- fs/ext4/block_validity.c | 1 + fs/ext4/ext4.h | 4 +- fs/ext4/ext4_jbd2.c | 32 +++- fs/ext4/ext4_jbd2.h | 106 ++++++++++-- fs/ext4/extents.c | 89 ++++++---- fs/ext4/ialloc.c | 4 +- fs/ext4/indirect.c | 125 ++++++++------ fs/ext4/inode.c | 31 +--- fs/ext4/migrate.c | 103 +++++------ fs/ext4/namei.c | 39 +++-- fs/ext4/resize.c | 46 ++--- fs/ext4/xattr.c | 94 ++++------ fs/jbd2/checkpoint.c | 3 +- fs/jbd2/commit.c | 9 +- fs/jbd2/journal.c | 35 +++- fs/jbd2/revoke.c | 6 + fs/jbd2/transaction.c | 254 ++++++++++++++++------------ fs/libfs.c | 65 +++++++ fs/namei.c | 17 +- fs/ocfs2/alloc.c | 32 ++-- fs/ocfs2/journal.c | 8 +- fs/tracefs/inode.c | 111 ++---------- include/linux/backing-dev-defs.h | 7 + include/linux/device.h | 5 + include/linux/fs.h | 2 + include/linux/jbd2.h | 87 ++++++---- include/linux/tracefs.h | 1 - include/trace/events/ext4.h | 13 +- include/trace/events/jbd2.h | 16 +- kernel/trace/trace.c | 4 +- kernel/trace/trace_events.c | 6 +- kernel/trace/trace_hwlat.c | 2 +- mm/backing-dev.c | 49 +++++- 37 files changed, 894 insertions(+), 705 deletions(-)
From: Wen Huang huangwenabc@gmail.com
commit e5e884b42639c74b5b57dc277909915c0aefc8bb upstream.
add_ie_rates() copys rates without checking the length in bss descriptor from remote AP.when victim connects to remote attacker, this may trigger buffer overflow. lbs_ibss_join_existing() copys rates without checking the length in bss descriptor from remote IBSS node.when victim connects to remote attacker, this may trigger buffer overflow. Fix them by putting the length check before performing copy.
This fix addresses CVE-2019-14896 and CVE-2019-14897. This also fix build warning of mixed declarations and code.
Reported-by: kbuild test robot lkp@intel.com Signed-off-by: Wen Huang huangwenabc@gmail.com Signed-off-by: Kalle Valo kvalo@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/net/wireless/marvell/libertas/cfg.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/marvell/libertas/cfg.c b/drivers/net/wireless/marvell/libertas/cfg.c index 57edfad..c9401c1 100644 --- a/drivers/net/wireless/marvell/libertas/cfg.c +++ b/drivers/net/wireless/marvell/libertas/cfg.c @@ -273,6 +273,10 @@ static int lbs_add_supported_rates_tlv(u8 *tlv) int hw, ap, ap_max = ie[1]; u8 hw_rate;
+ if (ap_max > MAX_RATES) { + lbs_deb_assoc("invalid rates\n"); + return tlv; + } /* Advance past IE header */ ie += 2;
@@ -1717,6 +1721,9 @@ static int lbs_ibss_join_existing(struct lbs_private *priv, struct cmd_ds_802_11_ad_hoc_join cmd; u8 preamble = RADIO_PREAMBLE_SHORT; int ret = 0; + int hw, i; + u8 rates_max; + u8 *rates;
/* TODO: set preamble based on scan result */ ret = lbs_set_radio(priv, preamble, 1); @@ -1775,9 +1782,12 @@ static int lbs_ibss_join_existing(struct lbs_private *priv, if (!rates_eid) { lbs_add_rates(cmd.bss.rates); } else { - int hw, i; - u8 rates_max = rates_eid[1]; - u8 *rates = cmd.bss.rates; + rates_max = rates_eid[1]; + if (rates_max > MAX_RATES) { + lbs_deb_join("invalid rates"); + goto out; + } + rates = cmd.bss.rates; for (hw = 0; hw < ARRAY_SIZE(lbs_rates); hw++) { u8 hw_rate = lbs_rates[hw].bitrate / 5; for (i = 0; i < rates_max; i++) {
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit 0db45889453644bb5d3e3c6044f4d81b910d41ef category: bugfix bugzilla: 25031 CVE: NA ---------------------------
jbd2_journal_next_log_block() does not look at transaction->t_outstanding_credits. Remove the misleading comment.
Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-2-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/commit.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index d4e6288..92927950 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -639,8 +639,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
/* * start_this_handle() uses t_outstanding_credits to determine - * the free space in the log, but this counter is changed - * by jbd2_journal_next_log_block() also. + * the free space in the log. */ atomic_dec(&commit_transaction->t_outstanding_credits);
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit b90bfdf581194a0fa5f6c26fef1e522f15f6212e category: bugfix bugzilla: 25031 CVE: NA ---------------------------
With 32-bit block numbers, we don't allocate the array for journal buffer heads large enough for corresponding descriptor tags to fill the descriptor block. Thus we end up writing out half-full descriptor blocks to the journal unnecessarily growing the transaction. Fix the logic to allocate the array large enough.
Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-3-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/journal.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index 10d480d..ae8e991 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -1103,6 +1103,16 @@ static void jbd2_stats_proc_exit(journal_t *journal) remove_proc_entry(journal->j_devname, proc_jbd2_stats); }
+/* Minimum size of descriptor tag */ +static int jbd2_min_tag_size(void) +{ + /* + * Tag with 32-bit block numbers does not use last four bytes of the + * structure + */ + return sizeof(journal_block_tag_t) - 4; +} + /* * Management for journal control blocks: functions to create and * destroy journal_t structures, and to initialise and read existing @@ -1161,7 +1171,8 @@ static journal_t *journal_init_common(struct block_device *bdev, journal->j_fs_dev = fs_dev; journal->j_blk_offset = start; journal->j_maxlen = len; - n = journal->j_blocksize / sizeof(journal_block_tag_t); + /* We need enough buffers to write out full descriptor block. */ + n = journal->j_blocksize / jbd2_min_tag_size(); journal->j_wbufsize = n; journal->j_wbuf = kmalloc_array(n, sizeof(struct buffer_head *), GFP_KERNEL);
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit a9e26328adfa82b1f3c941bc6e3daea47631abce category: bugfix bugzilla: 25031 CVE: NA ---------------------------
Every caller of ext4_add_nondir() marks handle as sync if directory has DIRSYNC set. Move this marking to ext4_add_nondir() so reduce some duplication.
Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-4-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/namei.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 43dcb91..6de5974 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2432,9 +2432,12 @@ static void ext4_dec_count(handle_t *handle, struct inode *inode) static int ext4_add_nondir(handle_t *handle, struct dentry *dentry, struct inode *inode) { + struct inode *dir = d_inode(dentry->d_parent); int err = ext4_add_entry(handle, dentry, inode); if (!err) { ext4_mark_inode_dirty(handle, inode); + if (IS_DIRSYNC(dir)) + ext4_handle_sync(handle); d_instantiate_new(dentry, inode); return 0; } @@ -2475,8 +2478,6 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode, inode->i_fop = &ext4_file_operations; ext4_set_aops(inode); err = ext4_add_nondir(handle, dentry, inode); - if (!err && IS_DIRSYNC(dir)) - ext4_handle_sync(handle); } if (handle) ext4_journal_stop(handle); @@ -2507,8 +2508,6 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry, init_special_inode(inode, inode->i_mode, rdev); inode->i_op = &ext4_special_inode_operations; err = ext4_add_nondir(handle, dentry, inode); - if (!err && IS_DIRSYNC(dir)) - ext4_handle_sync(handle); } if (handle) ext4_journal_stop(handle); @@ -3195,9 +3194,6 @@ static int ext4_symlink(struct inode *dir, } EXT4_I(inode)->i_disksize = inode->i_size; err = ext4_add_nondir(handle, dentry, inode); - if (!err && IS_DIRSYNC(dir)) - ext4_handle_sync(handle); - if (handle) ext4_journal_stop(handle); goto out_free_encrypted_link;
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit 9b88f9fb0d2fc8f7e71e75a42c5a064bc6cfffd2 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
When ext4_mkdir(), ext4_symlink(), ext4_create(), or ext4_mknod() fail to add entry into directory, it ends up dropping freshly created inode under the running transaction and thus inode truncation happens under that transaction. That breaks assumptions that evict() does not get called from a transaction context and at least in ext4_symlink() case it can result in inode eviction deadlocking in inode_wait_for_writeback() when flush worker finds symlink inode, starts to write it back and blocks on starting a transaction. So change the code in ext4_mkdir() and ext4_add_nondir() to drop inode reference only after the transaction is stopped. We also have to add inode to the orphan list in that case as otherwise the inode would get leaked in case we crash before inode deletion is committed.
CC: stable@vger.kernel.org Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-5-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/namei.c | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 6de5974..40cca4c 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2429,21 +2429,29 @@ static void ext4_dec_count(handle_t *handle, struct inode *inode) }
+/* + * Add non-directory inode to a directory. On success, the inode reference is + * consumed by dentry is instantiation. This is also indicated by clearing of + * *inodep pointer. On failure, the caller is responsible for dropping the + * inode reference in the safe context. + */ static int ext4_add_nondir(handle_t *handle, - struct dentry *dentry, struct inode *inode) + struct dentry *dentry, struct inode **inodep) { struct inode *dir = d_inode(dentry->d_parent); + struct inode *inode = *inodep; int err = ext4_add_entry(handle, dentry, inode); if (!err) { ext4_mark_inode_dirty(handle, inode); if (IS_DIRSYNC(dir)) ext4_handle_sync(handle); d_instantiate_new(dentry, inode); + *inodep = NULL; return 0; } drop_nlink(inode); + ext4_orphan_add(handle, inode); unlock_new_inode(inode); - iput(inode); return err; }
@@ -2477,10 +2485,12 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode, inode->i_op = &ext4_file_inode_operations; inode->i_fop = &ext4_file_operations; ext4_set_aops(inode); - err = ext4_add_nondir(handle, dentry, inode); + err = ext4_add_nondir(handle, dentry, &inode); } if (handle) ext4_journal_stop(handle); + if (!IS_ERR_OR_NULL(inode)) + iput(inode); if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries)) goto retry; return err; @@ -2507,10 +2517,12 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry, if (!IS_ERR(inode)) { init_special_inode(inode, inode->i_mode, rdev); inode->i_op = &ext4_special_inode_operations; - err = ext4_add_nondir(handle, dentry, inode); + err = ext4_add_nondir(handle, dentry, &inode); } if (handle) ext4_journal_stop(handle); + if (!IS_ERR_OR_NULL(inode)) + iput(inode); if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries)) goto retry; return err; @@ -2663,10 +2675,12 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) if (err) { out_clear_inode: clear_nlink(inode); + ext4_orphan_add(handle, inode); unlock_new_inode(inode); ext4_mark_inode_dirty(handle, inode); + ext4_journal_stop(handle); iput(inode); - goto out_stop; + goto out_retry; } ext4_inc_count(handle, dir); ext4_update_dx_flag(dir); @@ -2680,6 +2694,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) out_stop: if (handle) ext4_journal_stop(handle); +out_retry: if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries)) goto retry; return err; @@ -3193,9 +3208,11 @@ static int ext4_symlink(struct inode *dir, inode->i_size = disk_link.len - 1; } EXT4_I(inode)->i_disksize = inode->i_size; - err = ext4_add_nondir(handle, dentry, inode); + err = ext4_add_nondir(handle, dentry, &inode); if (handle) ext4_journal_stop(handle); + if (inode) + iput(inode); goto out_free_encrypted_link;
err_drop_inode:
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit 321238fbfb49003c66caecb1eefb5238dce27b61 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
Similarly to directories, EA inodes do only journalled modifications to their data. Change ext4_should_journal_data() to return true for them so that we don't have to special-case them during truncate.
Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-7-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/ext4_jbd2.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 402dc36..d161a0e 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -407,6 +407,7 @@ static inline int ext4_inode_journal_mode(struct inode *inode) return EXT4_INODE_WRITEBACK_DATA_MODE; /* writeback */ /* We do not support data journalling with delayed allocation */ if (!S_ISREG(inode->i_mode) || + ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE) || test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA || (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA) && !test_opt(inode->i_sb, DELALLOC))) {
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit 6cb367c2d1f8875043aa2d238eca9a2602dc1f72 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
Use ext4 helper ext4_journal_extend() instead of opencoding it in ext4_try_to_expand_extra_isize().
Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-8-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/inode.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 58e92e0..a430120 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -6027,8 +6027,7 @@ static int ext4_try_to_expand_extra_isize(struct inode *inode, * If this is felt to be critical, then e2fsck should be run to * force a large enough s_min_extra_isize. */ - if (ext4_handle_valid(handle) && - jbd2_journal_extend(handle, + if (ext4_journal_extend(handle, EXT4_DATA_TRANS_BLOCKS(inode->i_sb)) != 0) return -ENOSPC;
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit f2890730f8292831b7741d89a65b9c6834d85ee6 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
Error cleanup path in ext4_alloc_branch() calls ext4_forget() on freshly allocated indirect blocks with 'metadata' set to 1. This results in generating revoke records for these blocks. However this is unnecessary as the freed blocks are only allocated in the current transaction and thus they will never be journalled. Make this cleanup path similar to e.g. cleanup in ext4_splice_branch() and use ext4_free_blocks() to handle block forgetting by passing EXT4_FREE_BLOCKS_FORGET and not EXT4_FREE_BLOCKS_METADATA to ext4_free_blocks(). This also allows allocating transaction not to reserve any credits for revoke records.
Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-9-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/indirect.c | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c index 6f3f7d5..c547d7a 100644 --- a/fs/ext4/indirect.c +++ b/fs/ext4/indirect.c @@ -333,11 +333,14 @@ static int ext4_alloc_branch(handle_t *handle, for (i = 0; i <= indirect_blks; i++) { if (i == indirect_blks) { new_blocks[i] = ext4_mb_new_blocks(handle, ar, &err); - } else + } else { ar->goal = new_blocks[i] = ext4_new_meta_blocks(handle, ar->inode, ar->goal, ar->flags & EXT4_MB_DELALLOC_RESERVED, NULL, &err); + /* Simplify error cleanup... */ + branch[i+1].bh = NULL; + } if (err) { i--; goto failed; @@ -379,18 +382,25 @@ static int ext4_alloc_branch(handle_t *handle, } return 0; failed: + if (i == indirect_blks) { + /* Free data blocks */ + ext4_free_blocks(handle, ar->inode, NULL, new_blocks[i], + ar->len, 0); + i--; + } for (; i >= 0; i--) { /* * We want to ext4_forget() only freshly allocated indirect - * blocks. Buffer for new_blocks[i-1] is at branch[i].bh and - * buffer at branch[0].bh is indirect block / inode already - * existing before ext4_alloc_branch() was called. + * blocks. Buffer for new_blocks[i] is at branch[i+1].bh + * (buffer at branch[0].bh is indirect block / inode already + * existing before ext4_alloc_branch() was called). Also + * because blocks are freshly allocated, we don't need to + * revoke them which is why we don't set + * EXT4_FREE_BLOCKS_METADATA. */ - if (i > 0 && i != indirect_blks && branch[i].bh) - ext4_forget(handle, 1, ar->inode, branch[i].bh, - branch[i].bh->b_blocknr); - ext4_free_blocks(handle, ar->inode, NULL, new_blocks[i], - (i == indirect_blks) ? ar->len : 1, 0); + ext4_free_blocks(handle, ar->inode, branch[i+1].bh, + new_blocks[i], 1, + branch[i+1].bh ? EXT4_FREE_BLOCKS_FORGET : 0); } return err; }
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit a413036791d040e33badcc634453a4d0c0705499 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
Provide ext4_journal_ensure_credits_fn() function to ensure transaction has given amount of credits and call helper function to prepare for restarting a transaction. This allows to remove some boilerplate code from various places, add proper error handling for the case where transaction extension or restart fails, and reduces following changes needed for proper revoke record reservation tracking.
Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-10-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Conflict: fs/ext4/ext4.h
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/ext4.h | 3 +- fs/ext4/ext4_jbd2.c | 11 +++++++ fs/ext4/ext4_jbd2.h | 48 +++++++++++++++++++++++++++ fs/ext4/extents.c | 68 ++++++++++++++++++++++---------------- fs/ext4/indirect.c | 93 +++++++++++++++++++++++++++++---------------------- fs/ext4/inode.c | 26 --------------- fs/ext4/migrate.c | 95 ++++++++++++++++++++--------------------------------- fs/ext4/resize.c | 46 ++++++-------------------- fs/ext4/xattr.c | 90 +++++++++++++++++++------------------------------- 9 files changed, 233 insertions(+), 247 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index ff483a8..1fe3ad7 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2505,7 +2505,6 @@ extern struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, extern int ext4_truncate(struct inode *); extern int ext4_break_layouts(struct inode *); extern int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length); -extern int ext4_truncate_restart_trans(handle_t *, struct inode *, int nblocks); extern void ext4_set_inode_flags(struct inode *); extern int ext4_alloc_da_blocks(struct inode *inode); extern void ext4_set_aops(struct inode *inode); @@ -3187,6 +3186,8 @@ extern int ext4_swap_extents(handle_t *handle, struct inode *inode1, struct inode *inode2, ext4_lblk_t lblk1, ext4_lblk_t lblk2, ext4_lblk_t count, int mark_unwritten,int *err); +extern int ext4_datasem_ensure_credits(handle_t *handle, struct inode *inode, + int check_cred, int restart_cred);
/* move_extent.c */ extern void ext4_double_down_write_data_sem(struct inode *first, diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 7c70b08..2b98d89 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -133,6 +133,17 @@ handle_t *__ext4_journal_start_reserved(handle_t *handle, unsigned int line, return handle; }
+int __ext4_journal_ensure_credits(handle_t *handle, int check_cred, + int extend_cred) +{ + if (!ext4_handle_valid(handle)) + return 0; + if (handle->h_buffer_credits >= check_cred) + return 0; + return ext4_journal_extend(handle, + extend_cred - handle->h_buffer_credits); +} + static void ext4_journal_abort_handle(const char *caller, unsigned int line, const char *err_fn, struct buffer_head *bh, diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index d161a0e..1ad43b7 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -346,6 +346,54 @@ static inline int ext4_journal_restart(handle_t *handle, int nblocks) return 0; }
+int __ext4_journal_ensure_credits(handle_t *handle, int check_cred, + int extend_cred); + + +/* + * Ensure @handle has at least @check_creds credits available. If not, + * transaction will be extended or restarted to contain at least @extend_cred + * credits. Before restarting transaction @fn is executed to allow for cleanup + * before the transaction is restarted. + * + * The return value is < 0 in case of error, 0 in case the handle has enough + * credits or transaction extension succeeded, 1 in case transaction had to be + * restarted. + */ +#define ext4_journal_ensure_credits_fn(handle, check_cred, extend_cred, fn) \ +({ \ + __label__ __ensure_end; \ + int err = __ext4_journal_ensure_credits((handle), (check_cred), \ + (extend_cred)); \ + \ + if (err <= 0) \ + goto __ensure_end; \ + err = (fn); \ + if (err < 0) \ + goto __ensure_end; \ + err = ext4_journal_restart((handle), (extend_cred)); \ + if (err == 0) \ + err = 1; \ +__ensure_end: \ + err; \ +}) + +/* + * Ensure given handle has at least requested amount of credits available, + * possibly restarting transaction if needed. + */ +static inline int ext4_journal_ensure_credits(handle_t *handle, int credits) +{ + return ext4_journal_ensure_credits_fn(handle, credits, credits, 0); +} + +static inline int ext4_journal_ensure_credits_batch(handle_t *handle, + int credits) +{ + return ext4_journal_ensure_credits_fn(handle, credits, + EXT4_MAX_TRANS_DATA, 0); +} + static inline int ext4_journal_blocks_per_page(struct inode *inode) { if (EXT4_JOURNAL(inode) != NULL) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index efb5355..92a19e3 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -100,29 +100,40 @@ static int ext4_split_extent_at(handle_t *handle, static int ext4_find_delayed_extent(struct inode *inode, struct extent_status *newes);
-static int ext4_ext_truncate_extend_restart(handle_t *handle, - struct inode *inode, - int needed) +static int ext4_ext_trunc_restart_fn(struct inode *inode, int *dropped) { - int err; - - if (!ext4_handle_valid(handle)) - return 0; - if (handle->h_buffer_credits >= needed) - return 0; /* - * If we need to extend the journal get a few extra blocks - * while we're at it for efficiency's sake. + * Drop i_data_sem to avoid deadlock with ext4_map_blocks. At this + * moment, get_block can be called only for blocks inside i_size since + * page cache has been already dropped and writes are blocked by + * i_mutex. So we can safely drop the i_data_sem here. */ - needed += 3; - err = ext4_journal_extend(handle, needed - handle->h_buffer_credits); - if (err <= 0) - return err; - err = ext4_truncate_restart_trans(handle, inode, needed); - if (err == 0) - err = -EAGAIN; + BUG_ON(EXT4_JOURNAL(inode) == NULL); + ext4_discard_preallocations(inode); + up_write(&EXT4_I(inode)->i_data_sem); + *dropped = 1; + return 0; +}
- return err; +/* + * Make sure 'handle' has at least 'check_cred' credits. If not, restart + * transaction with 'restart_cred' credits. The function drops i_data_sem + * when restarting transaction and gets it after transaction is restarted. + * + * The function returns 0 on success, 1 if transaction had to be restarted, + * and < 0 in case of fatal error. + */ +int ext4_datasem_ensure_credits(handle_t *handle, struct inode *inode, + int check_cred, int restart_cred) +{ + int ret; + int dropped = 0; + + ret = ext4_journal_ensure_credits_fn(handle, check_cred, restart_cred, + ext4_ext_trunc_restart_fn(inode, &dropped)); + if (dropped) + down_write(&EXT4_I(inode)->i_data_sem); + return ret; }
/* @@ -2723,9 +2734,13 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode, } credits += EXT4_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb);
- err = ext4_ext_truncate_extend_restart(handle, inode, credits); - if (err) + err = ext4_datasem_ensure_credits(handle, inode, credits, + credits); + if (err) { + if (err > 0) + err = -EAGAIN; goto out; + }
err = ext4_ext_get_access(handle, inode, path + depth); if (err) @@ -5238,13 +5253,10 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, * descriptor) for each block group; assume two block * groups */ - if (handle->h_buffer_credits < 7) { - credits = ext4_writepage_trans_blocks(inode); - err = ext4_ext_truncate_extend_restart(handle, inode, credits); - /* EAGAIN is success */ - if (err && err != -EAGAIN) - return err; - } + credits = ext4_writepage_trans_blocks(inode); + err = ext4_datasem_ensure_credits(handle, inode, 7, credits); + if (err < 0) + return err;
err = ext4_ext_get_access(handle, inode, path); return err; diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c index c547d7a..0e88d5b 100644 --- a/fs/ext4/indirect.c +++ b/fs/ext4/indirect.c @@ -705,27 +705,62 @@ int ext4_ind_trans_blocks(struct inode *inode, int nrblocks) return DIV_ROUND_UP(nrblocks, EXT4_ADDR_PER_BLOCK(inode->i_sb)) + 4; }
+static int ext4_ind_trunc_restart_fn(handle_t *handle, struct inode *inode, + struct buffer_head *bh, int *dropped) +{ + int err; + + if (bh) { + BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata"); + err = ext4_handle_dirty_metadata(handle, inode, bh); + if (unlikely(err)) + return err; + } + err = ext4_mark_inode_dirty(handle, inode); + if (unlikely(err)) + return err; + /* + * Drop i_data_sem to avoid deadlock with ext4_map_blocks. At this + * moment, get_block can be called only for blocks inside i_size since + * page cache has been already dropped and writes are blocked by + * i_mutex. So we can safely drop the i_data_sem here. + */ + BUG_ON(EXT4_JOURNAL(inode) == NULL); + ext4_discard_preallocations(inode); + up_write(&EXT4_I(inode)->i_data_sem); + *dropped = 1; + return 0; +} + /* * Truncate transactions can be complex and absolutely huge. So we need to * be able to restart the transaction at a conventient checkpoint to make * sure we don't overflow the journal. * * Try to extend this transaction for the purposes of truncation. If - * extend fails, we need to propagate the failure up and restart the - * transaction in the top-level truncate loop. --sct - * - * Returns 0 if we managed to create more room. If we can't create more - * room, and the transaction must be restarted we return 1. + * extend fails, we restart transaction. */ -static int try_to_extend_transaction(handle_t *handle, struct inode *inode) +static int ext4_ind_truncate_ensure_credits(handle_t *handle, + struct inode *inode, + struct buffer_head *bh) { - if (!ext4_handle_valid(handle)) - return 0; - if (ext4_handle_has_enough_credits(handle, EXT4_RESERVE_TRANS_BLOCKS+1)) - return 0; - if (!ext4_journal_extend(handle, ext4_blocks_for_truncate(inode))) - return 0; - return 1; + int ret; + int dropped = 0; + + ret = ext4_journal_ensure_credits_fn(handle, EXT4_RESERVE_TRANS_BLOCKS, + ext4_blocks_for_truncate(inode), + ext4_ind_trunc_restart_fn(handle, inode, bh, &dropped)); + if (dropped) + down_write(&EXT4_I(inode)->i_data_sem); + if (ret <= 0) + return ret; + if (bh) { + BUFFER_TRACE(bh, "retaking write access"); + ret = ext4_journal_get_write_access(handle, bh); + if (unlikely(ret)) + return ret; + } + return 0; }
/* @@ -860,27 +895,9 @@ static int ext4_clear_blocks(handle_t *handle, struct inode *inode, return 1; }
- if (try_to_extend_transaction(handle, inode)) { - if (bh) { - BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata"); - err = ext4_handle_dirty_metadata(handle, inode, bh); - if (unlikely(err)) - goto out_err; - } - err = ext4_mark_inode_dirty(handle, inode); - if (unlikely(err)) - goto out_err; - err = ext4_truncate_restart_trans(handle, inode, - ext4_blocks_for_truncate(inode)); - if (unlikely(err)) - goto out_err; - if (bh) { - BUFFER_TRACE(bh, "retaking write access"); - err = ext4_journal_get_write_access(handle, bh); - if (unlikely(err)) - goto out_err; - } - } + err = ext4_ind_truncate_ensure_credits(handle, inode, bh); + if (err < 0) + goto out_err;
for (p = first; p < last; p++) *p = 0; @@ -1063,11 +1080,9 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode, */ if (ext4_handle_is_aborted(handle)) return; - if (try_to_extend_transaction(handle, inode)) { - ext4_mark_inode_dirty(handle, inode); - ext4_truncate_restart_trans(handle, inode, - ext4_blocks_for_truncate(inode)); - } + if (ext4_ind_truncate_ensure_credits(handle, inode, + NULL) < 0) + return;
/* * The forget flag here is critical because if diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index a430120..a152e9f 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -164,32 +164,6 @@ int ext4_inode_is_fast_symlink(struct inode *inode) }
/* - * Restart the transaction associated with *handle. This does a commit, - * so before we call here everything must be consistently dirtied against - * this transaction. - */ -int ext4_truncate_restart_trans(handle_t *handle, struct inode *inode, - int nblocks) -{ - int ret; - - /* - * Drop i_data_sem to avoid deadlock with ext4_map_blocks. At this - * moment, get_block can be called only for blocks inside i_size since - * page cache has been already dropped and writes are blocked by - * i_mutex. So we can safely drop the i_data_sem here. - */ - BUG_ON(EXT4_JOURNAL(inode) == NULL); - jbd_debug(2, "restarting handle %p\n", handle); - up_write(&EXT4_I(inode)->i_data_sem); - ret = ext4_journal_restart(handle, nblocks); - down_write(&EXT4_I(inode)->i_data_sem); - ext4_discard_preallocations(inode); - - return ret; -} - -/* * Called at the last iput() if i_nlink is zero. */ void ext4_evict_inode(struct inode *inode) diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c index a98bfca..814119f 100644 --- a/fs/ext4/migrate.c +++ b/fs/ext4/migrate.c @@ -50,29 +50,9 @@ static int finish_range(handle_t *handle, struct inode *inode, needed = ext4_ext_calc_credits_for_single_extent(inode, lb->last_block - lb->first_block + 1, path);
- /* - * Make sure the credit we accumalated is not really high - */ - if (needed && ext4_handle_has_enough_credits(handle, - EXT4_RESERVE_TRANS_BLOCKS)) { - up_write((&EXT4_I(inode)->i_data_sem)); - retval = ext4_journal_restart(handle, needed); - down_write((&EXT4_I(inode)->i_data_sem)); - if (retval) - goto err_out; - } else if (needed) { - retval = ext4_journal_extend(handle, needed); - if (retval) { - /* - * IF not able to extend the journal restart the journal - */ - up_write((&EXT4_I(inode)->i_data_sem)); - retval = ext4_journal_restart(handle, needed); - down_write((&EXT4_I(inode)->i_data_sem)); - if (retval) - goto err_out; - } - } + retval = ext4_datasem_ensure_credits(handle, inode, needed, needed); + if (retval < 0) + goto err_out; retval = ext4_ext_insert_extent(handle, inode, &path, &newext, 0); err_out: up_write((&EXT4_I(inode)->i_data_sem)); @@ -196,26 +176,6 @@ static int update_tind_extent_range(handle_t *handle, struct inode *inode,
}
-static int extend_credit_for_blkdel(handle_t *handle, struct inode *inode) -{ - int retval = 0, needed; - - if (ext4_handle_has_enough_credits(handle, EXT4_RESERVE_TRANS_BLOCKS+1)) - return 0; - /* - * We are freeing a blocks. During this we touch - * superblock, group descriptor and block bitmap. - * So allocate a credit of 3. We may update - * quota (user and group). - */ - needed = 3 + EXT4_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb); - - if (ext4_journal_extend(handle, needed) != 0) - retval = ext4_journal_restart(handle, needed); - - return retval; -} - static int free_dind_blocks(handle_t *handle, struct inode *inode, __le32 i_data) { @@ -223,6 +183,7 @@ static int free_dind_blocks(handle_t *handle, __le32 *tmp_idata; struct buffer_head *bh; unsigned long max_entries = inode->i_sb->s_blocksize >> 2; + int err;
bh = ext4_sb_bread(inode->i_sb, le32_to_cpu(i_data), 0); if (IS_ERR(bh)) @@ -231,7 +192,12 @@ static int free_dind_blocks(handle_t *handle, tmp_idata = (__le32 *)bh->b_data; for (i = 0; i < max_entries; i++) { if (tmp_idata[i]) { - extend_credit_for_blkdel(handle, inode); + err = ext4_journal_ensure_credits(handle, + EXT4_RESERVE_TRANS_BLOCKS); + if (err < 0) { + put_bh(bh); + return err; + } ext4_free_blocks(handle, inode, NULL, le32_to_cpu(tmp_idata[i]), 1, EXT4_FREE_BLOCKS_METADATA | @@ -239,7 +205,9 @@ static int free_dind_blocks(handle_t *handle, } } put_bh(bh); - extend_credit_for_blkdel(handle, inode); + err = ext4_journal_ensure_credits(handle, EXT4_RESERVE_TRANS_BLOCKS); + if (err < 0) + return err; ext4_free_blocks(handle, inode, NULL, le32_to_cpu(i_data), 1, EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET); @@ -270,7 +238,9 @@ static int free_tind_blocks(handle_t *handle, } } put_bh(bh); - extend_credit_for_blkdel(handle, inode); + retval = ext4_journal_ensure_credits(handle, EXT4_RESERVE_TRANS_BLOCKS); + if (retval < 0) + return retval; ext4_free_blocks(handle, inode, NULL, le32_to_cpu(i_data), 1, EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET); @@ -283,7 +253,10 @@ static int free_ind_block(handle_t *handle, struct inode *inode, __le32 *i_data)
/* ei->i_data[EXT4_IND_BLOCK] */ if (i_data[0]) { - extend_credit_for_blkdel(handle, inode); + retval = ext4_journal_ensure_credits(handle, + EXT4_RESERVE_TRANS_BLOCKS); + if (retval < 0) + return retval; ext4_free_blocks(handle, inode, NULL, le32_to_cpu(i_data[0]), 1, EXT4_FREE_BLOCKS_METADATA | @@ -318,12 +291,9 @@ static int ext4_ext_swap_inode_data(handle_t *handle, struct inode *inode, * One credit accounted for writing the * i_data field of the original inode */ - retval = ext4_journal_extend(handle, 1); - if (retval) { - retval = ext4_journal_restart(handle, 1); - if (retval) - goto err_out; - } + retval = ext4_journal_ensure_credits(handle, 1); + if (retval < 0) + goto err_out;
i_data[0] = ei->i_data[EXT4_IND_BLOCK]; i_data[1] = ei->i_data[EXT4_DIND_BLOCK]; @@ -391,15 +361,19 @@ static int free_ext_idx(handle_t *handle, struct inode *inode, ix = EXT_FIRST_INDEX(eh); for (i = 0; i < le16_to_cpu(eh->eh_entries); i++, ix++) { retval = free_ext_idx(handle, inode, ix); - if (retval) - break; + if (retval) { + put_bh(bh); + return retval; + } } } put_bh(bh); - extend_credit_for_blkdel(handle, inode); + retval = ext4_journal_ensure_credits(handle, EXT4_RESERVE_TRANS_BLOCKS); + if (retval < 0) + return retval; ext4_free_blocks(handle, inode, NULL, block, 1, EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET); - return retval; + return 0; }
/* @@ -574,9 +548,9 @@ int ext4_ext_migrate(struct inode *inode) }
/* We mark the tmp_inode dirty via ext4_ext_tree_init. */ - if (ext4_journal_extend(handle, 1) != 0) - ext4_journal_restart(handle, 1); - + retval = ext4_journal_ensure_credits(handle, 1); + if (retval < 0) + goto out_stop; /* * Mark the tmp_inode as of size zero */ @@ -594,6 +568,7 @@ int ext4_ext_migrate(struct inode *inode)
/* Reset the extent details */ ext4_ext_tree_init(handle, tmp_inode); +out_stop: ext4_journal_stop(handle); out: unlock_new_inode(tmp_inode); diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c index 4d5c0fc..a1e6082 100644 --- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -389,30 +389,6 @@ static struct buffer_head *bclean(handle_t *handle, struct super_block *sb, }
/* - * If we have fewer than thresh credits, extend by EXT4_MAX_TRANS_DATA. - * If that fails, restart the transaction & regain write access for the - * buffer head which is used for block_bitmap modifications. - */ -static int extend_or_restart_transaction(handle_t *handle, int thresh) -{ - int err; - - if (ext4_handle_has_enough_credits(handle, thresh)) - return 0; - - err = ext4_journal_extend(handle, EXT4_MAX_TRANS_DATA); - if (err < 0) - return err; - if (err) { - err = ext4_journal_restart(handle, EXT4_MAX_TRANS_DATA); - if (err) - return err; - } - - return 0; -} - -/* * set_flexbg_block_bitmap() mark clusters [@first_cluster, @last_cluster] used. * * Helper function for ext4_setup_new_group_blocks() which set . @@ -451,8 +427,8 @@ static int set_flexbg_block_bitmap(struct super_block *sb, handle_t *handle, continue; }
- err = extend_or_restart_transaction(handle, 1); - if (err) + err = ext4_journal_ensure_credits_batch(handle, 1); + if (err < 0) return err;
bh = sb_getblk(sb, flex_gd->groups[group].block_bitmap); @@ -544,8 +520,8 @@ static int setup_new_flex_group_blocks(struct super_block *sb, struct buffer_head *gdb;
ext4_debug("update backup group %#04llx\n", block); - err = extend_or_restart_transaction(handle, 1); - if (err) + err = ext4_journal_ensure_credits_batch(handle, 1); + if (err < 0) goto out;
gdb = sb_getblk(sb, block); @@ -602,8 +578,8 @@ static int setup_new_flex_group_blocks(struct super_block *sb,
/* Initialize block bitmap of the @group */ block = group_data[i].block_bitmap; - err = extend_or_restart_transaction(handle, 1); - if (err) + err = ext4_journal_ensure_credits_batch(handle, 1); + if (err < 0) goto out;
bh = bclean(handle, sb, block); @@ -631,8 +607,8 @@ static int setup_new_flex_group_blocks(struct super_block *sb,
/* Initialize inode bitmap of the @group */ block = group_data[i].inode_bitmap; - err = extend_or_restart_transaction(handle, 1); - if (err) + err = ext4_journal_ensure_credits_batch(handle, 1); + if (err < 0) goto out; /* Mark unused entries in inode bitmap used */ bh = bclean(handle, sb, block); @@ -1109,10 +1085,8 @@ static void update_backups(struct super_block *sb, sector_t blk_off, char *data, ext4_fsblk_t backup_block;
/* Out of journal space, and can't get more - abort - so sad */ - if (ext4_handle_valid(handle) && - handle->h_buffer_credits == 0 && - ext4_journal_extend(handle, EXT4_MAX_TRANS_DATA) && - (err = ext4_journal_restart(handle, EXT4_MAX_TRANS_DATA))) + err = ext4_journal_ensure_credits_batch(handle, 1); + if (err < 0) break;
if (meta_bg == 0) diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index f73fc90..5ea8f6d 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -967,55 +967,6 @@ int __ext4_xattr_set_credits(struct super_block *sb, struct inode *inode, return credits; }
-static int ext4_xattr_ensure_credits(handle_t *handle, struct inode *inode, - int credits, struct buffer_head *bh, - bool dirty, bool block_csum) -{ - int error; - - if (!ext4_handle_valid(handle)) - return 0; - - if (handle->h_buffer_credits >= credits) - return 0; - - error = ext4_journal_extend(handle, credits - handle->h_buffer_credits); - if (!error) - return 0; - if (error < 0) { - ext4_warning(inode->i_sb, "Extend journal (error %d)", error); - return error; - } - - if (bh && dirty) { - if (block_csum) - ext4_xattr_block_csum_set(inode, bh); - error = ext4_handle_dirty_metadata(handle, NULL, bh); - if (error) { - ext4_warning(inode->i_sb, "Handle metadata (error %d)", - error); - return error; - } - } - - error = ext4_journal_restart(handle, credits); - if (error) { - ext4_warning(inode->i_sb, "Restart journal (error %d)", error); - return error; - } - - if (bh) { - error = ext4_journal_get_write_access(handle, bh); - if (error) { - ext4_warning(inode->i_sb, - "Get write access failed (error %d)", - error); - return error; - } - } - return 0; -} - static int ext4_xattr_inode_update_ref(handle_t *handle, struct inode *ea_inode, int ref_change) { @@ -1153,6 +1104,24 @@ static int ext4_xattr_inode_inc_ref_all(handle_t *handle, struct inode *parent, return saved_err; }
+static int ext4_xattr_restart_fn(handle_t *handle, struct inode *inode, + struct buffer_head *bh, bool block_csum, bool dirty) +{ + int error; + + if (bh && dirty) { + if (block_csum) + ext4_xattr_block_csum_set(inode, bh); + error = ext4_handle_dirty_metadata(handle, NULL, bh); + if (error) { + ext4_warning(inode->i_sb, "Handle metadata (error %d)", + error); + return error; + } + } + return 0; +} + static void ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent, struct buffer_head *bh, @@ -1189,13 +1158,23 @@ static int ext4_xattr_inode_inc_ref_all(handle_t *handle, struct inode *parent, continue; }
- err = ext4_xattr_ensure_credits(handle, parent, credits, bh, - dirty, block_csum); - if (err) { + err = ext4_journal_ensure_credits_fn(handle, credits, credits, + ext4_xattr_restart_fn(handle, parent, bh, block_csum, + dirty)); + if (err < 0) { ext4_warning_inode(ea_inode, "Ensure credits err=%d", err); continue; } + if (err > 0) { + err = ext4_journal_get_write_access(handle, bh); + if (err) { + ext4_warning_inode(ea_inode, + "Re-get write access err=%d", + err); + continue; + } + }
err = ext4_xattr_inode_dec_ref(handle, ea_inode); if (err) { @@ -2866,11 +2845,8 @@ int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode, struct inode *ea_inode; int error;
- error = ext4_xattr_ensure_credits(handle, inode, extra_credits, - NULL /* bh */, - false /* dirty */, - false /* block_csum */); - if (error) { + error = ext4_journal_ensure_credits(handle, extra_credits); + if (error < 0) { EXT4_ERROR_INODE(inode, "ensure credits (error %d)", error); goto cleanup; }
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit a9a8344ee1714f835ba394077e8c13d751e2f148 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
Provide accessor function to get number of credits available in a handle and use it from ext4. Later, computation of available credits won't be so straightforward.
Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-11-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/ext4_jbd2.c | 13 +++++++------ fs/ext4/ext4_jbd2.h | 7 ------- fs/ext4/xattr.c | 2 +- include/linux/jbd2.h | 6 ++++++ 4 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 2b98d89..731bbfd 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -119,8 +119,8 @@ handle_t *__ext4_journal_start_reserved(handle_t *handle, unsigned int line, return ext4_get_nojournal();
sb = handle->h_journal->j_private; - trace_ext4_journal_start_reserved(sb, handle->h_buffer_credits, - _RET_IP_); + trace_ext4_journal_start_reserved(sb, + jbd2_handle_buffer_credits(handle), _RET_IP_); err = ext4_journal_check_start(sb); if (err < 0) { jbd2_journal_free_reserved(handle); @@ -138,10 +138,10 @@ int __ext4_journal_ensure_credits(handle_t *handle, int check_cred, { if (!ext4_handle_valid(handle)) return 0; - if (handle->h_buffer_credits >= check_cred) + if (jbd2_handle_buffer_credits(handle) >= check_cred) return 0; return ext4_journal_extend(handle, - extend_cred - handle->h_buffer_credits); + extend_cred - jbd2_handle_buffer_credits(handle)); }
static void ext4_journal_abort_handle(const char *caller, unsigned int line, @@ -289,7 +289,7 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line, handle->h_type, handle->h_line_no, handle->h_requested_credits, - handle->h_buffer_credits, err); + jbd2_handle_buffer_credits(handle), err); return err; } ext4_error_inode(inode, where, line, @@ -300,7 +300,8 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line, handle->h_type, handle->h_line_no, handle->h_requested_credits, - handle->h_buffer_credits, err); + jbd2_handle_buffer_credits(handle), + err); } } else { if (inode) diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 1ad43b7..9d51c05 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -288,13 +288,6 @@ static inline int ext4_handle_is_aborted(handle_t *handle) return 0; }
-static inline int ext4_handle_has_enough_credits(handle_t *handle, int needed) -{ - if (ext4_handle_valid(handle) && handle->h_buffer_credits < needed) - return 0; - return 1; -} - #define ext4_journal_start_sb(sb, type, nblocks) \ __ext4_journal_start_sb((sb), __LINE__, (type), (nblocks), 0)
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 5ea8f6d..f11aeb9 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -2318,7 +2318,7 @@ static struct buffer_head *ext4_xattr_get_block(struct inode *inode) flags & XATTR_CREATE); brelse(bh);
- if (!ext4_handle_has_enough_credits(handle, credits)) { + if (jbd2_handle_buffer_credits(handle) < credits) { error = -ENOSPC; goto cleanup; } diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 268f300..dd5d9df 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1651,6 +1651,12 @@ static inline tid_t jbd2_get_latest_transaction(journal_t *journal) return tid; }
+ +static inline int jbd2_handle_buffer_credits(handle_t *handle) +{ + return handle->h_buffer_credits; +} + #ifdef __KERNEL__
#define buffer_trace_init(bh) do {} while (0)
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit 9797a902480521dc8e7a478e38f0c896ffff8784 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
Use the jbd2 accessor function for h_buffer_credits.
Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-12-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ocfs2/alloc.c | 32 ++++++++++++++++---------------- fs/ocfs2/journal.c | 4 ++-- 2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c index a342f00..40842c4 100644 --- a/fs/ocfs2/alloc.c +++ b/fs/ocfs2/alloc.c @@ -2302,9 +2302,9 @@ static int ocfs2_extend_rotate_transaction(handle_t *handle, int subtree_depth, int ret = 0; int credits = (path->p_tree_depth - subtree_depth) * 2 + 1 + op_credits;
- if (handle->h_buffer_credits < credits) + if (jbd2_handle_buffer_credits(handle) < credits) ret = ocfs2_extend_trans(handle, - credits - handle->h_buffer_credits); + credits - jbd2_handle_buffer_credits(handle));
return ret; } @@ -2381,7 +2381,7 @@ static int ocfs2_rotate_tree_right(handle_t *handle, struct ocfs2_path *right_path, struct ocfs2_path **ret_left_path) { - int ret, start, orig_credits = handle->h_buffer_credits; + int ret, start, orig_credits = jbd2_handle_buffer_credits(handle); u32 cpos; struct ocfs2_path *left_path = NULL; struct super_block *sb = ocfs2_metadata_cache_get_super(et->et_ci); @@ -3162,7 +3162,7 @@ static int ocfs2_rotate_tree_left(handle_t *handle, struct ocfs2_path *path, struct ocfs2_cached_dealloc_ctxt *dealloc) { - int ret, orig_credits = handle->h_buffer_credits; + int ret, orig_credits = jbd2_handle_buffer_credits(handle); struct ocfs2_path *tmp_path = NULL, *restart_path = NULL; struct ocfs2_extent_block *eb; struct ocfs2_extent_list *el; @@ -3400,8 +3400,8 @@ static int ocfs2_merge_rec_right(struct ocfs2_path *left_path, right_path);
ret = ocfs2_extend_rotate_transaction(handle, subtree_index, - handle->h_buffer_credits, - right_path); + jbd2_handle_buffer_credits(handle), + right_path); if (ret) { mlog_errno(ret); goto out; @@ -3562,8 +3562,8 @@ static int ocfs2_merge_rec_left(struct ocfs2_path *right_path, right_path);
ret = ocfs2_extend_rotate_transaction(handle, subtree_index, - handle->h_buffer_credits, - left_path); + jbd2_handle_buffer_credits(handle), + left_path); if (ret) { mlog_errno(ret); goto out; @@ -3637,7 +3637,7 @@ static int ocfs2_merge_rec_left(struct ocfs2_path *right_path, le16_to_cpu(el->l_next_free_rec) == 1) { /* extend credit for ocfs2_remove_rightmost_path */ ret = ocfs2_extend_rotate_transaction(handle, 0, - handle->h_buffer_credits, + jbd2_handle_buffer_credits(handle), right_path); if (ret) { mlog_errno(ret); @@ -3683,7 +3683,7 @@ static int ocfs2_try_to_merge_extent(handle_t *handle, if (ctxt->c_split_covers_rec && ctxt->c_has_empty_extent) { /* extend credit for ocfs2_remove_rightmost_path */ ret = ocfs2_extend_rotate_transaction(handle, 0, - handle->h_buffer_credits, + jbd2_handle_buffer_credits(handle), path); if (ret) { mlog_errno(ret); @@ -3739,7 +3739,7 @@ static int ocfs2_try_to_merge_extent(handle_t *handle,
/* extend credit for ocfs2_remove_rightmost_path */ ret = ocfs2_extend_rotate_transaction(handle, 0, - handle->h_buffer_credits, + jbd2_handle_buffer_credits(handle), path); if (ret) { mlog_errno(ret); @@ -3769,7 +3769,7 @@ static int ocfs2_try_to_merge_extent(handle_t *handle,
/* extend credit for ocfs2_remove_rightmost_path */ ret = ocfs2_extend_rotate_transaction(handle, 0, - handle->h_buffer_credits, + jbd2_handle_buffer_credits(handle), path); if (ret) { mlog_errno(ret); @@ -3813,7 +3813,7 @@ static int ocfs2_try_to_merge_extent(handle_t *handle, if (ctxt->c_split_covers_rec) { /* extend credit for ocfs2_remove_rightmost_path */ ret = ocfs2_extend_rotate_transaction(handle, 0, - handle->h_buffer_credits, + jbd2_handle_buffer_credits(handle), path); if (ret) { mlog_errno(ret); @@ -5376,7 +5376,7 @@ static int ocfs2_truncate_rec(handle_t *handle, if (ocfs2_is_empty_extent(&el->l_recs[0]) && index > 0) { /* extend credit for ocfs2_remove_rightmost_path */ ret = ocfs2_extend_rotate_transaction(handle, 0, - handle->h_buffer_credits, + jbd2_handle_buffer_credits(handle), path); if (ret) { mlog_errno(ret); @@ -5445,8 +5445,8 @@ static int ocfs2_truncate_rec(handle_t *handle, }
ret = ocfs2_extend_rotate_transaction(handle, 0, - handle->h_buffer_credits, - path); + jbd2_handle_buffer_credits(handle), + path); if (ret) { mlog_errno(ret); goto out; diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index fc1f209e..212414c 100644 --- a/fs/ocfs2/journal.c +++ b/fs/ocfs2/journal.c @@ -434,7 +434,7 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks) if (!nblocks) return 0;
- old_nblocks = handle->h_buffer_credits; + old_nblocks = jbd2_handle_buffer_credits(handle);
trace_ocfs2_extend_trans(old_nblocks, nblocks);
@@ -475,7 +475,7 @@ int ocfs2_allocate_extend_trans(handle_t *handle, int thresh)
BUG_ON(!handle);
- old_nblks = handle->h_buffer_credits; + old_nblks = jbd2_handle_buffer_credits(handle); trace_ocfs2_allocate_extend_trans(old_nblks, thresh);
if (old_nblks < thresh)
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit dfaf5ffda227be3e867fee7c0f6a66749392fbd0 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
Move code in jbd2_journal_stop() around a bit. It removes some unnecessary code duplication and will make factoring out parts common with jbd2__journal_restart() easier.
Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-14-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/transaction.c | 40 ++++++++++++++++------------------------ 1 file changed, 16 insertions(+), 24 deletions(-)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 524953b..4a12025 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -1672,41 +1672,34 @@ int jbd2_journal_stop(handle_t *handle) tid_t tid; pid_t pid;
+ if (--handle->h_ref > 0) { + jbd_debug(4, "h_ref %d -> %d\n", handle->h_ref + 1, + handle->h_ref); + if (is_handle_aborted(handle)) + return -EIO; + return 0; + } if (!transaction) { /* - * Handle is already detached from the transaction so - * there is nothing to do other than decrease a refcount, - * or free the handle if refcount drops to zero + * Handle is already detached from the transaction so there is + * nothing to do other than free the handle. */ - if (--handle->h_ref > 0) { - jbd_debug(4, "h_ref %d -> %d\n", handle->h_ref + 1, - handle->h_ref); - return err; - } else { - if (handle->h_rsv_handle) - jbd2_free_handle(handle->h_rsv_handle); - goto free_and_exit; - } + if (handle->h_rsv_handle) + jbd2_free_handle(handle->h_rsv_handle); + goto free_and_exit; } journal = transaction->t_journal; + tid = transaction->t_tid;
J_ASSERT(journal_current_handle() == handle); + J_ASSERT(atomic_read(&transaction->t_updates) > 0);
if (is_handle_aborted(handle)) err = -EIO; - else - J_ASSERT(atomic_read(&transaction->t_updates) > 0); - - if (--handle->h_ref > 0) { - jbd_debug(4, "h_ref %d -> %d\n", handle->h_ref + 1, - handle->h_ref); - return err; - }
jbd_debug(4, "Handle %p going down\n", handle); trace_jbd2_handle_stats(journal->j_fs_dev->bd_dev, - transaction->t_tid, - handle->h_type, handle->h_line_no, + tid, handle->h_type, handle->h_line_no, jiffies - handle->h_start_jiffies, handle->h_sync, handle->h_requested_credits, (handle->h_requested_credits - @@ -1791,7 +1784,7 @@ int jbd2_journal_stop(handle_t *handle) jbd_debug(2, "transaction too old, requesting commit for " "handle %p\n", handle); /* This is non-blocking */ - jbd2_log_start_commit(journal, transaction->t_tid); + jbd2_log_start_commit(journal, tid);
/* * Special case: JBD2_SYNC synchronous updates require us @@ -1807,7 +1800,6 @@ int jbd2_journal_stop(handle_t *handle) * once we do this, we must not dereference transaction * pointer again. */ - tid = transaction->t_tid; if (atomic_dec_and_test(&transaction->t_updates)) { wake_up(&journal->j_wait_updates); if (journal->j_barrier_count)
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit 150549ed2fcf4be9bf3efedd99b72924dff26166 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
If a transaction is larger than journal->j_max_transaction_buffers, that is a bug and not a trigger for transaction commit. Also the very next attempt to start new handle will start transaction commit anyway. So just remove the pointless check. Arguably, we could start transaction commit whenever the transaction size is *close* to journal->j_max_transaction_buffers. This has a potential to reduce latency of the next jbd2_journal_start() at the cost of somewhat smaller transactions. However for this to have any effect, it would mean that there isn't someone already waiting in jbd2_journal_start() which means metadata load for the fs is pretty light anyway so probably this optimization is not worth it.
Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-15-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/transaction.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 4a12025..9697fb3 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -1769,13 +1769,10 @@ int jbd2_journal_stop(handle_t *handle)
/* * If the handle is marked SYNC, we need to set another commit - * going! We also want to force a commit if the current - * transaction is occupying too much of the log, or if the - * transaction is too old now. + * going! We also want to force a commit if the transaction is too + * old now. */ if (handle->h_sync || - (atomic_read(&transaction->t_outstanding_credits) > - journal->j_max_transaction_buffers) || time_after_eq(jiffies, transaction->t_expires)) { /* Do this even for aborted journals: an abort still * completes the commit thread, it just doesn't write
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit 5559b2d81b51de75cb7864bb1fbb82982f7e8fff category: bugfix bugzilla: 25031 CVE: NA ---------------------------
When we drop last handle from a transaction and journal->j_barrier_count
0, jbd2_journal_stop() wakes up journal->j_wait_transaction_locked
wait queue. This looks pointless - wait for outstanding handles always happens on journal->j_wait_updates waitqueue. journal->j_wait_transaction_locked is used to wait for transaction state changes and by start_this_handle() for waiting until journal->j_barrier_count drops to 0. The first case is clearly irrelevant here since only jbd2 thread changes transaction state. The second case looks related but jbd2_journal_unlock_updates() is responsible for the wakeup in this case. So just drop the wakeup.
Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-16-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/transaction.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 9697fb3..e2184f2 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -1797,11 +1797,8 @@ int jbd2_journal_stop(handle_t *handle) * once we do this, we must not dereference transaction * pointer again. */ - if (atomic_dec_and_test(&transaction->t_updates)) { + if (atomic_dec_and_test(&transaction->t_updates)) wake_up(&journal->j_wait_updates); - if (journal->j_barrier_count) - wake_up(&journal->j_wait_transaction_locked); - }
rwsem_release(&journal->j_trans_commit_map, 1, _THIS_IP_);
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit ec8b6f600e49dc87a8564807fec4193bf93ee2b5 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
jbd2__journal_restart() has quite some code that is common with jbd2_journal_stop(). Factor this functionality into stop_this_handle() helper and use it from both functions. Note that this also drops t_handle_lock protection from jbd2__journal_restart() as jbd2_journal_stop() does the same thing without it.
Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-17-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/transaction.c | 98 ++++++++++++++++++++++++--------------------------- 1 file changed, 46 insertions(+), 52 deletions(-)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index e2184f2..58eb697 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -481,12 +481,17 @@ handle_t *jbd2_journal_start(journal_t *journal, int nblocks) } EXPORT_SYMBOL(jbd2_journal_start);
-void jbd2_journal_free_reserved(handle_t *handle) +static void __jbd2_journal_unreserve_handle(handle_t *handle) { journal_t *journal = handle->h_journal;
WARN_ON(!handle->h_reserved); sub_reserved_credits(journal, handle->h_buffer_credits); +} + +void jbd2_journal_free_reserved(handle_t *handle) +{ + __jbd2_journal_unreserve_handle(handle); jbd2_free_handle(handle); } EXPORT_SYMBOL(jbd2_journal_free_reserved); @@ -621,6 +626,28 @@ int jbd2_journal_extend(handle_t *handle, int nblocks) return result; }
+static void stop_this_handle(handle_t *handle) +{ + transaction_t *transaction = handle->h_transaction; + journal_t *journal = transaction->t_journal; + + J_ASSERT(journal_current_handle() == handle); + J_ASSERT(atomic_read(&transaction->t_updates) > 0); + current->journal_info = NULL; + atomic_sub(handle->h_buffer_credits, + &transaction->t_outstanding_credits); + if (handle->h_rsv_handle) + __jbd2_journal_unreserve_handle(handle->h_rsv_handle); + if (atomic_dec_and_test(&transaction->t_updates)) + wake_up(&journal->j_wait_updates); + + rwsem_release(&journal->j_trans_commit_map, 1, _THIS_IP_); + /* + * Scope of the GFP_NOFS context is over here and so we can restore the + * original alloc context. + */ + memalloc_nofs_restore(handle->saved_alloc_context); +}
/** * int jbd2_journal_restart() - restart a handle . @@ -643,52 +670,34 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask) transaction_t *transaction = handle->h_transaction; journal_t *journal; tid_t tid; - int need_to_start, ret; + int need_to_start;
/* If we've had an abort of any type, don't even think about * actually doing the restart! */ if (is_handle_aborted(handle)) return 0; journal = transaction->t_journal; + tid = transaction->t_tid;
/* * First unlink the handle from its current transaction, and start the * commit on that. */ - J_ASSERT(atomic_read(&transaction->t_updates) > 0); - J_ASSERT(journal_current_handle() == handle); - - read_lock(&journal->j_state_lock); - spin_lock(&transaction->t_handle_lock); - atomic_sub(handle->h_buffer_credits, - &transaction->t_outstanding_credits); - if (handle->h_rsv_handle) { - sub_reserved_credits(journal, - handle->h_rsv_handle->h_buffer_credits); - } - if (atomic_dec_and_test(&transaction->t_updates)) - wake_up(&journal->j_wait_updates); - tid = transaction->t_tid; - spin_unlock(&transaction->t_handle_lock); + jbd_debug(2, "restarting handle %p\n", handle); + stop_this_handle(handle); handle->h_transaction = NULL; - current->journal_info = NULL;
- jbd_debug(2, "restarting handle %p\n", handle); + /* + * TODO: If we use READ_ONCE / WRITE_ONCE for j_commit_request we can + * get rid of pointless j_state_lock traffic like this. + */ + read_lock(&journal->j_state_lock); need_to_start = !tid_geq(journal->j_commit_request, tid); read_unlock(&journal->j_state_lock); if (need_to_start) jbd2_log_start_commit(journal, tid); - - rwsem_release(&journal->j_trans_commit_map, 1, _THIS_IP_); handle->h_buffer_credits = nblocks; - /* - * Restore the original nofs context because the journal restart - * is basically the same thing as journal stop and start. - * start_this_handle will start a new nofs context. - */ - memalloc_nofs_restore(handle->saved_alloc_context); - ret = start_this_handle(journal, handle, gfp_mask); - return ret; + return start_this_handle(journal, handle, gfp_mask); } EXPORT_SYMBOL(jbd2__journal_restart);
@@ -1684,16 +1693,12 @@ int jbd2_journal_stop(handle_t *handle) * Handle is already detached from the transaction so there is * nothing to do other than free the handle. */ - if (handle->h_rsv_handle) - jbd2_free_handle(handle->h_rsv_handle); + memalloc_nofs_restore(handle->saved_alloc_context); goto free_and_exit; } journal = transaction->t_journal; tid = transaction->t_tid;
- J_ASSERT(journal_current_handle() == handle); - J_ASSERT(atomic_read(&transaction->t_updates) > 0); - if (is_handle_aborted(handle)) err = -EIO;
@@ -1763,9 +1768,6 @@ int jbd2_journal_stop(handle_t *handle)
if (handle->h_sync) transaction->t_synchronous_commit = 1; - current->journal_info = NULL; - atomic_sub(handle->h_buffer_credits, - &transaction->t_outstanding_credits);
/* * If the handle is marked SYNC, we need to set another commit @@ -1792,27 +1794,19 @@ int jbd2_journal_stop(handle_t *handle) }
/* - * Once we drop t_updates, if it goes to zero the transaction - * could start committing on us and eventually disappear. So - * once we do this, we must not dereference transaction - * pointer again. + * Once stop_this_handle() drops t_updates, the transaction could start + * committing on us and eventually disappear. So we must not + * dereference transaction pointer again after calling + * stop_this_handle(). */ - if (atomic_dec_and_test(&transaction->t_updates)) - wake_up(&journal->j_wait_updates); - - rwsem_release(&journal->j_trans_commit_map, 1, _THIS_IP_); + stop_this_handle(handle);
if (wait_for_commit) err = jbd2_log_wait_commit(journal, tid);
- if (handle->h_rsv_handle) - jbd2_journal_free_reserved(handle->h_rsv_handle); free_and_exit: - /* - * Scope of the GFP_NOFS context is over here and so we can restore the - * original alloc context. - */ - memalloc_nofs_restore(handle->saved_alloc_context); + if (handle->h_rsv_handle) + jbd2_free_handle(handle->h_rsv_handle); jbd2_free_handle(handle); return err; }
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit 9f356e5a4f12008fa0df8b6385fc0ab830416e72 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
Currently, journal descriptor blocks were not accounted in transaction->t_outstanding_credits and we were just leaving some slack space in the journal for them (in jbd2_log_space_left() and jbd2_space_needed()). This is making proper accounting (and reservation we want to add) of descriptor blocks difficult so switch to accounting descriptor blocks in transaction->t_outstanding_credits and just reserve the same amount of credits in t_outstanding credits for journal descriptor blocks when creating transaction.
Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-18-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Conflict: include/linux/jbd2.h
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/commit.c | 6 ++++-- fs/jbd2/journal.c | 1 + fs/jbd2/transaction.c | 20 ++++++++++++-------- include/linux/jbd2.h | 22 +++++++--------------- 4 files changed, 24 insertions(+), 25 deletions(-)
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index 92927950..4a3300e 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -557,8 +557,7 @@ void jbd2_journal_commit_transaction(journal_t *journal) stats.run.rs_logging = jiffies; stats.run.rs_flushing = jbd2_time_diff(stats.run.rs_flushing, stats.run.rs_logging); - stats.run.rs_blocks = - atomic_read(&commit_transaction->t_outstanding_credits); + stats.run.rs_blocks = commit_transaction->t_nr_buffers; stats.run.rs_blocks_logged = 0;
J_ASSERT(commit_transaction->t_nr_buffers <= @@ -886,6 +885,9 @@ void jbd2_journal_commit_transaction(journal_t *journal) if (err) jbd2_journal_abort(journal, err);
+ WARN_ON_ONCE( + atomic_read(&commit_transaction->t_outstanding_credits) < 0); + /* * Now disk caches for filesystem device are flushed so we are safe to * erase checkpointed transactions from the log by updating journal diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index ae8e991..f46434b 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -845,6 +845,7 @@ struct buffer_head * bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize); if (!bh) return NULL; + atomic_dec(&transaction->t_outstanding_credits); lock_buffer(bh); memset(bh->b_data, 0, journal->j_blocksize); header = (journal_header_t *)bh->b_data; diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 58eb697..845c6c5 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -63,6 +63,17 @@ void jbd2_journal_free_transaction(transaction_t *transaction) }
/* + * We reserve t_outstanding_credits >> JBD2_CONTROL_BLOCKS_SHIFT for + * transaction descriptor blocks. + */ +#define JBD2_CONTROL_BLOCKS_SHIFT 5 + +static int jbd2_descriptor_blocks_per_trans(journal_t *journal) +{ + return journal->j_max_transaction_buffers >> JBD2_CONTROL_BLOCKS_SHIFT; +} + +/* * jbd2_get_transaction: obtain a new transaction_t object. * * Simply allocate and initialise a new transaction. Create it in @@ -88,6 +99,7 @@ void jbd2_journal_free_transaction(transaction_t *transaction) spin_lock_init(&transaction->t_handle_lock); atomic_set(&transaction->t_updates, 0); atomic_set(&transaction->t_outstanding_credits, + jbd2_descriptor_blocks_per_trans(journal) + atomic_read(&journal->j_reserved_credits)); atomic_set(&transaction->t_handle_count, 0); INIT_LIST_HEAD(&transaction->t_inode_list); @@ -600,14 +612,6 @@ int jbd2_journal_extend(handle_t *handle, int nblocks) goto unlock; }
- if (wanted + (wanted >> JBD2_CONTROL_BLOCKS_SHIFT) > - jbd2_log_space_left(journal)) { - jbd_debug(3, "denied handle %p %d blocks: " - "insufficient log space\n", handle, nblocks); - atomic_sub(nblocks, &transaction->t_outstanding_credits); - goto unlock; - } - trace_jbd2_handle_extend(journal->j_fs_dev->bd_dev, transaction->t_tid, handle->h_type, handle->h_line_no, diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index dd5d9df..b340d60 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -683,8 +683,10 @@ struct transaction_s atomic_t t_updates;
/* - * Number of buffers reserved for use by all handles in this transaction - * handle but not yet modified. [t_handle_lock] + * Number of blocks reserved for this transaction in the journal. + * This is including all credits reserved when starting transaction + * handles as well as all journal descriptor blocks needed for this + * transaction. [none] */ atomic_t t_outstanding_credits;
@@ -1566,19 +1568,12 @@ static inline int jbd2_journal_has_csum_v2or3(journal_t *journal) }
/* - * We reserve t_outstanding_credits >> JBD2_CONTROL_BLOCKS_SHIFT for - * transaction control blocks. - */ -#define JBD2_CONTROL_BLOCKS_SHIFT 5 - -/* * Return the minimum number of blocks which must be free in the journal * before a new transaction may be started. Must be called under j_state_lock. */ static inline int jbd2_space_needed(journal_t *journal) { - int nblocks = journal->j_max_transaction_buffers; - return nblocks + (nblocks >> JBD2_CONTROL_BLOCKS_SHIFT); + return journal->j_max_transaction_buffers; }
/* @@ -1590,11 +1585,8 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal) long free = journal->j_free - 32;
if (journal->j_committing_transaction) { - unsigned long committing = atomic_read(&journal-> - j_committing_transaction->t_outstanding_credits); - - /* Transaction + control blocks */ - free -= committing + (committing >> JBD2_CONTROL_BLOCKS_SHIFT); + free -= atomic_read(&journal-> + j_committing_transaction->t_outstanding_credits); } return max_t(long, free, 0); }
From: Liu Song liu.song11@zte.com.cn
mainline inclusion from mainline-5.2-rc1 commit fb203751099eecf145317685ee480a51e5b246de category: bugfix bugzilla: 25031 CVE: NA ---------------------------
At the beginning, nblocks has been assigned. There is no need to repeat the assignment in the while loop, and remove it.
Signed-off-by: Liu Song liu.song11@zte.com.cn Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Jan Kara jack@suse.cz
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/checkpoint.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index 0f77b11..62cf497 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -132,7 +132,6 @@ void __jbd2_log_wait_for_space(journal_t *journal) return; } spin_lock(&journal->j_list_lock); - nblocks = jbd2_space_needed(journal); space_left = jbd2_log_space_left(journal); if (space_left < nblocks) { int chkpt = journal->j_checkpoint_transactions != NULL;
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit 77444ac4f9537bc4211f928959d5231445e30c6e category: bugfix bugzilla: 25031 CVE: NA ---------------------------
The function is now just a trivial wrapper returning journal->j_max_transaction_buffers. Drop it.
Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-19-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/checkpoint.c | 2 +- fs/jbd2/transaction.c | 5 +++-- include/linux/jbd2.h | 9 --------- 3 files changed, 4 insertions(+), 12 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index 62cf497..96bf339 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -110,7 +110,7 @@ void __jbd2_log_wait_for_space(journal_t *journal) int nblocks, space_left; /* assert_spin_locked(&journal->j_state_lock); */
- nblocks = jbd2_space_needed(journal); + nblocks = journal->j_max_transaction_buffers; while (jbd2_log_space_left(journal) < nblocks) { write_unlock(&journal->j_state_lock); mutex_lock_io(&journal->j_checkpoint_mutex); diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 845c6c5..544ac8f 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -245,12 +245,13 @@ static int add_transaction_credits(journal_t *journal, int blocks, * *before* starting to dirty potentially checkpointed buffers * in the new transaction. */ - if (jbd2_log_space_left(journal) < jbd2_space_needed(journal)) { + if (jbd2_log_space_left(journal) < journal->j_max_transaction_buffers) { atomic_sub(total, &t->t_outstanding_credits); read_unlock(&journal->j_state_lock); jbd2_might_wait_for_commit(journal); write_lock(&journal->j_state_lock); - if (jbd2_log_space_left(journal) < jbd2_space_needed(journal)) + if (jbd2_log_space_left(journal) < + journal->j_max_transaction_buffers) __jbd2_log_wait_for_space(journal); write_unlock(&journal->j_state_lock); return 1; diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index b340d60..66d7361 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1568,15 +1568,6 @@ static inline int jbd2_journal_has_csum_v2or3(journal_t *journal) }
/* - * Return the minimum number of blocks which must be free in the journal - * before a new transaction may be started. Must be called under j_state_lock. - */ -static inline int jbd2_space_needed(journal_t *journal) -{ - return journal->j_max_transaction_buffers; -} - -/* * Return number of free blocks in the log. Must be called under j_state_lock. */ static inline unsigned long jbd2_log_space_left(journal_t *journal)
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit fdc3ef882a5d59c1709a13b5486ae2b1632e12b6 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
Extend functions for starting, extending, and restarting transaction handles to take number of revoke records handle must be able to accommodate. These functions then make sure transaction has enough credits to be able to store resulting revoke descriptor blocks. Also revoke code tracks number of revoke records created by a handle to catch situation where some place didn't reserve enough space for revoke records. Similarly to standard transaction credits, space for unused reserved revoke records is released when the handle is stopped.
On the ext4 side we currently take a simplistic approach of reserving space for 1024 revoke records for any transaction. This grows amount of credits reserved for each handle only by a few and is enough for any normal workload so that we don't hit warnings in jbd2. We will refine the logic in following commits.
Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Conflict: include/linux/jbd2.h
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/ext4_jbd2.c | 2 +- fs/ext4/ext4_jbd2.h | 4 ++-- fs/jbd2/journal.c | 21 ++++++++++++++++++++ fs/jbd2/revoke.c | 6 ++++++ fs/jbd2/transaction.c | 54 ++++++++++++++++++++++++++++++++++++++++++++------- fs/ocfs2/journal.c | 4 ++-- include/linux/jbd2.h | 43 ++++++++++++++++++++++++++++++---------- 7 files changed, 112 insertions(+), 22 deletions(-)
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 731bbfd..b81190b 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -78,7 +78,7 @@ handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line, journal = EXT4_SB(sb)->s_journal; if (!journal) return ext4_get_nojournal(); - return jbd2__journal_start(journal, blocks, rsv_blocks, GFP_NOFS, + return jbd2__journal_start(journal, blocks, rsv_blocks, 1024, GFP_NOFS, type, line); }
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 9d51c05..cdac72f 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -328,14 +328,14 @@ static inline handle_t *ext4_journal_current_handle(void) static inline int ext4_journal_extend(handle_t *handle, int nblocks) { if (ext4_handle_valid(handle)) - return jbd2_journal_extend(handle, nblocks); + return jbd2_journal_extend(handle, nblocks, 1024); return 0; }
static inline int ext4_journal_restart(handle_t *handle, int nblocks) { if (ext4_handle_valid(handle)) - return jbd2_journal_restart(handle, nblocks); + return jbd2__journal_restart(handle, nblocks, 1024, GFP_NOFS); return 0; }
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index f46434b..5577e0df 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -1505,6 +1505,21 @@ void jbd2_journal_update_sb_errno(journal_t *journal) } EXPORT_SYMBOL(jbd2_journal_update_sb_errno);
+static int journal_revoke_records_per_block(journal_t *journal) +{ + int record_size; + int space = journal->j_blocksize - sizeof(jbd2_journal_revoke_header_t); + + if (jbd2_has_feature_64bit(journal)) + record_size = 8; + else + record_size = 4; + + if (jbd2_journal_has_csum_v2or3(journal)) + space -= sizeof(struct jbd2_journal_block_tail); + return space / record_size; +} + /* * Read the superblock for a given journal, performing initial * validation of the format. @@ -1613,6 +1628,8 @@ static int journal_get_superblock(journal_t *journal) sizeof(sb->s_uuid)); }
+ journal->j_revoke_records_per_block = + journal_revoke_records_per_block(journal); set_buffer_verified(bh);
return 0; @@ -1933,6 +1950,8 @@ int jbd2_journal_set_features (journal_t *journal, unsigned long compat, sb->s_feature_ro_compat |= cpu_to_be32(ro); sb->s_feature_incompat |= cpu_to_be32(incompat); unlock_buffer(journal->j_sb_buffer); + journal->j_revoke_records_per_block = + journal_revoke_records_per_block(journal);
return 1; #undef COMPAT_FEATURE_ON @@ -1963,6 +1982,8 @@ void jbd2_journal_clear_features(journal_t *journal, unsigned long compat, sb->s_feature_compat &= ~cpu_to_be32(compat); sb->s_feature_ro_compat &= ~cpu_to_be32(ro); sb->s_feature_incompat &= ~cpu_to_be32(incompat); + journal->j_revoke_records_per_block = + journal_revoke_records_per_block(journal); } EXPORT_SYMBOL(jbd2_journal_clear_features);
diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c index 69b9bc3..5f438dc 100644 --- a/fs/jbd2/revoke.c +++ b/fs/jbd2/revoke.c @@ -371,6 +371,11 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr, } #endif
+ if (WARN_ON_ONCE(handle->h_revoke_credits <= 0)) { + if (!bh_in) + brelse(bh); + return -EIO; + } /* We really ought not ever to revoke twice in a row without first having the revoke cancelled: it's illegal to free a block twice without allocating it in between! */ @@ -391,6 +396,7 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr, __brelse(bh); } } + handle->h_revoke_credits--;
jbd_debug(2, "insert revoke for block %llu, bh_in=%p\n",blocknr, bh_in); err = insert_revoke_hash(journal, blocknr, diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 544ac8f..0218f58 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -101,6 +101,7 @@ static int jbd2_descriptor_blocks_per_trans(journal_t *journal) atomic_set(&transaction->t_outstanding_credits, jbd2_descriptor_blocks_per_trans(journal) + atomic_read(&journal->j_reserved_credits)); + atomic_set(&transaction->t_outstanding_revokes, 0); atomic_set(&transaction->t_handle_count, 0); INIT_LIST_HEAD(&transaction->t_inode_list); INIT_LIST_HEAD(&transaction->t_private_list); @@ -387,6 +388,7 @@ static int start_this_handle(journal_t *journal, handle_t *handle, update_t_max_wait(transaction, ts); handle->h_transaction = transaction; handle->h_requested_credits = blocks; + handle->h_revoke_credits_requested = handle->h_revoke_credits; handle->h_start_jiffies = jiffies; atomic_inc(&transaction->t_updates); atomic_inc(&transaction->t_handle_count); @@ -420,8 +422,8 @@ static handle_t *new_handle(int nblocks) }
handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks, - gfp_t gfp_mask, unsigned int type, - unsigned int line_no) + int revoke_records, gfp_t gfp_mask, + unsigned int type, unsigned int line_no) { handle_t *handle = journal_current_handle(); int err; @@ -435,6 +437,8 @@ handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks, return handle; }
+ nblocks += DIV_ROUND_UP(revoke_records, + journal->j_revoke_records_per_block); handle = new_handle(nblocks); if (!handle) return ERR_PTR(-ENOMEM); @@ -450,6 +454,7 @@ handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks, rsv_handle->h_journal = journal; handle->h_rsv_handle = rsv_handle; } + handle->h_revoke_credits = revoke_records;
err = start_this_handle(journal, handle, gfp_mask); if (err < 0) { @@ -490,7 +495,7 @@ handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks, */ handle_t *jbd2_journal_start(journal_t *journal, int nblocks) { - return jbd2__journal_start(journal, nblocks, 0, GFP_NOFS, 0, 0); + return jbd2__journal_start(journal, nblocks, 0, 0, GFP_NOFS, 0, 0); } EXPORT_SYMBOL(jbd2_journal_start);
@@ -564,6 +569,7 @@ int jbd2_journal_start_reserved(handle_t *handle, unsigned int type, * int jbd2_journal_extend() - extend buffer credits. * @handle: handle to 'extend' * @nblocks: nr blocks to try to extend by. + * @revoke_records: number of revoke records to try to extend by. * * Some transactions, such as large extends and truncates, can be done * atomically all at once or in several stages. The operation requests @@ -580,7 +586,7 @@ int jbd2_journal_start_reserved(handle_t *handle, unsigned int type, * return code < 0 implies an error * return code > 0 implies normal transaction-full status. */ -int jbd2_journal_extend(handle_t *handle, int nblocks) +int jbd2_journal_extend(handle_t *handle, int nblocks, int revoke_records) { transaction_t *transaction = handle->h_transaction; journal_t *journal; @@ -602,6 +608,12 @@ int jbd2_journal_extend(handle_t *handle, int nblocks) goto error_out; }
+ nblocks += DIV_ROUND_UP( + handle->h_revoke_credits_requested + revoke_records, + journal->j_revoke_records_per_block) - + DIV_ROUND_UP( + handle->h_revoke_credits_requested, + journal->j_revoke_records_per_block); spin_lock(&transaction->t_handle_lock); wanted = atomic_add_return(nblocks, &transaction->t_outstanding_credits); @@ -621,6 +633,8 @@ int jbd2_journal_extend(handle_t *handle, int nblocks)
handle->h_buffer_credits += nblocks; handle->h_requested_credits += nblocks; + handle->h_revoke_credits += revoke_records; + handle->h_revoke_credits_requested += revoke_records; result = 0;
jbd_debug(3, "extended handle %p by %d\n", handle, nblocks); @@ -635,10 +649,31 @@ static void stop_this_handle(handle_t *handle) { transaction_t *transaction = handle->h_transaction; journal_t *journal = transaction->t_journal; + int revokes;
J_ASSERT(journal_current_handle() == handle); J_ASSERT(atomic_read(&transaction->t_updates) > 0); current->journal_info = NULL; + /* + * Subtract necessary revoke descriptor blocks from handle credits. We + * take care to account only for revoke descriptor blocks the + * transaction will really need as large sequences of transactions with + * small numbers of revokes are relatively common. + */ + revokes = handle->h_revoke_credits_requested - handle->h_revoke_credits; + if (revokes) { + int t_revokes, revoke_descriptors; + int rr_per_blk = journal->j_revoke_records_per_block; + + WARN_ON_ONCE(DIV_ROUND_UP(revokes, rr_per_blk) + > handle->h_buffer_credits); + t_revokes = atomic_add_return(revokes, + &transaction->t_outstanding_revokes); + revoke_descriptors = + DIV_ROUND_UP(t_revokes, rr_per_blk) - + DIV_ROUND_UP(t_revokes - revokes, rr_per_blk); + handle->h_buffer_credits -= revoke_descriptors; + } atomic_sub(handle->h_buffer_credits, &transaction->t_outstanding_credits); if (handle->h_rsv_handle) @@ -658,6 +693,7 @@ static void stop_this_handle(handle_t *handle) * int jbd2_journal_restart() - restart a handle . * @handle: handle to restart * @nblocks: nr credits requested + * @revoke_records: number of revoke record credits requested * @gfp_mask: memory allocation flags (for start_this_handle) * * Restart a handle for a multi-transaction filesystem @@ -670,7 +706,8 @@ static void stop_this_handle(handle_t *handle) * credits. We preserve reserved handle if there's any attached to the * passed in handle. */ -int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask) +int jbd2__journal_restart(handle_t *handle, int nblocks, int revoke_records, + gfp_t gfp_mask) { transaction_t *transaction = handle->h_transaction; journal_t *journal; @@ -701,7 +738,10 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask) read_unlock(&journal->j_state_lock); if (need_to_start) jbd2_log_start_commit(journal, tid); - handle->h_buffer_credits = nblocks; + handle->h_buffer_credits = nblocks + + DIV_ROUND_UP(revoke_records, + journal->j_revoke_records_per_block); + handle->h_revoke_credits = revoke_records; return start_this_handle(journal, handle, gfp_mask); } EXPORT_SYMBOL(jbd2__journal_restart); @@ -709,7 +749,7 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask)
int jbd2_journal_restart(handle_t *handle, int nblocks) { - return jbd2__journal_restart(handle, nblocks, GFP_NOFS); + return jbd2__journal_restart(handle, nblocks, 0, GFP_NOFS); } EXPORT_SYMBOL(jbd2_journal_restart);
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index 212414c..08ca856 100644 --- a/fs/ocfs2/journal.c +++ b/fs/ocfs2/journal.c @@ -441,7 +441,7 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks) #ifdef CONFIG_OCFS2_DEBUG_FS status = 1; #else - status = jbd2_journal_extend(handle, nblocks); + status = jbd2_journal_extend(handle, nblocks, 0); if (status < 0) { mlog_errno(status); goto bail; @@ -481,7 +481,7 @@ int ocfs2_allocate_extend_trans(handle_t *handle, int thresh) if (old_nblks < thresh) return 0;
- status = jbd2_journal_extend(handle, OCFS2_MAX_TRANS_DATA); + status = jbd2_journal_extend(handle, OCFS2_MAX_TRANS_DATA, 0); if (status < 0) { mlog_errno(status); goto bail; diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 66d7361..c7c196c 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -481,6 +481,7 @@ struct jbd2_inode { * @h_journal: Which journal handle belongs to - used iff h_reserved set. * @h_rsv_handle: Handle reserved for finishing the logical operation. * @h_buffer_credits: Number of remaining buffers we are allowed to dirty. + * @h_revoke_credits: Number of remaining revoke records available for handle * @h_ref: Reference count on this handle. * @h_err: Field for caller's use to track errors through large fs operations. * @h_sync: Flag for sync-on-close. @@ -491,6 +492,7 @@ struct jbd2_inode { * @h_line_no: For handle statistics. * @h_start_jiffies: Handle Start time. * @h_requested_credits: Holds @h_buffer_credits after handle is started. + * @h_revoke_credits_requested: Holds @h_revoke_credits after handle is started. * @saved_alloc_context: Saved context while transaction is open. **/
@@ -508,6 +510,8 @@ struct jbd2_journal_handle
handle_t *h_rsv_handle; int h_buffer_credits; + int h_revoke_credits; + int h_revoke_credits_requested; int h_ref; int h_err;
@@ -691,6 +695,17 @@ struct transaction_s atomic_t t_outstanding_credits;
/* + * Number of revoke records for this transaction added by already + * stopped handles. [none] + */ + atomic_t t_outstanding_revokes; + + /* + * How many handles used this transaction? [none] + */ + atomic_t t_handle_count; + + /* * Forward and backward links for the circular list of all transactions * awaiting checkpoint. [j_list_lock] */ @@ -708,11 +723,6 @@ struct transaction_s ktime_t t_start_time;
/* - * How many handles used this transaction? [t_handle_lock] - */ - atomic_t t_handle_count; - - /* * This transaction is being forced and some process is * waiting for it to finish. */ @@ -1029,6 +1039,13 @@ struct journal_s int j_max_transaction_buffers;
/** + * @j_revoke_records_per_block: + * + * Number of revoke records that fit in one descriptor block. + */ + int j_revoke_records_per_block; + + /** * @j_commit_interval: * * What is the maximum transaction lifetime before we begin a commit? @@ -1362,14 +1379,16 @@ static inline handle_t *journal_current_handle(void)
extern handle_t *jbd2_journal_start(journal_t *, int nblocks); extern handle_t *jbd2__journal_start(journal_t *, int blocks, int rsv_blocks, - gfp_t gfp_mask, unsigned int type, - unsigned int line_no); + int revoke_records, gfp_t gfp_mask, + unsigned int type, unsigned int line_no); extern int jbd2_journal_restart(handle_t *, int nblocks); -extern int jbd2__journal_restart(handle_t *, int nblocks, gfp_t gfp_mask); +extern int jbd2__journal_restart(handle_t *, int nblocks, + int revoke_records, gfp_t gfp_mask); extern int jbd2_journal_start_reserved(handle_t *handle, unsigned int type, unsigned int line_no); extern void jbd2_journal_free_reserved(handle_t *handle); -extern int jbd2_journal_extend (handle_t *, int nblocks); +extern int jbd2_journal_extend(handle_t *handle, int nblocks, + int revoke_records); extern int jbd2_journal_get_write_access(handle_t *, struct buffer_head *); extern int jbd2_journal_get_create_access (handle_t *, struct buffer_head *); extern int jbd2_journal_get_undo_access(handle_t *, struct buffer_head *); @@ -1637,7 +1656,11 @@ static inline tid_t jbd2_get_latest_transaction(journal_t *journal)
static inline int jbd2_handle_buffer_credits(handle_t *handle) { - return handle->h_buffer_credits; + journal_t *journal = handle->h_transaction->t_journal; + + return handle->h_buffer_credits - + DIV_ROUND_UP(handle->h_revoke_credits_requested, + journal->j_revoke_records_per_block); }
#ifdef __KERNEL__
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit 933f1c1e0b75bbc29730eef07c9e196c6dfd37e5 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
The credit counter now contains both buffer and revoke descriptor block credits. Rename to counter to h_total_credits to reflect that. No functional change.
Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-21-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Conflict: fs/jbd2/transaction.c
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/transaction.c | 28 ++++++++++++++-------------- include/linux/jbd2.h | 9 +++++---- 2 files changed, 19 insertions(+), 18 deletions(-)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 0218f58..45f9f06 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -288,12 +288,12 @@ static int start_this_handle(journal_t *journal, handle_t *handle, gfp_t gfp_mask) { transaction_t *transaction, *new_transaction = NULL; - int blocks = handle->h_buffer_credits; + int blocks = handle->h_total_credits; int rsv_blocks = 0; unsigned long ts = jiffies;
if (handle->h_rsv_handle) - rsv_blocks = handle->h_rsv_handle->h_buffer_credits; + rsv_blocks = handle->h_rsv_handle->h_total_credits;
/* * Limit the number of reserved credits to 1/2 of maximum transaction @@ -415,7 +415,7 @@ static handle_t *new_handle(int nblocks) handle_t *handle = jbd2_alloc_handle(GFP_NOFS); if (!handle) return NULL; - handle->h_buffer_credits = nblocks; + handle->h_total_credits = nblocks; handle->h_ref = 1;
return handle; @@ -504,7 +504,7 @@ static void __jbd2_journal_unreserve_handle(handle_t *handle) journal_t *journal = handle->h_journal;
WARN_ON(!handle->h_reserved); - sub_reserved_credits(journal, handle->h_buffer_credits); + sub_reserved_credits(journal, handle->h_total_credits); }
void jbd2_journal_free_reserved(handle_t *handle) @@ -628,10 +628,10 @@ int jbd2_journal_extend(handle_t *handle, int nblocks, int revoke_records) trace_jbd2_handle_extend(journal->j_fs_dev->bd_dev, transaction->t_tid, handle->h_type, handle->h_line_no, - handle->h_buffer_credits, + handle->h_total_credits, nblocks);
- handle->h_buffer_credits += nblocks; + handle->h_total_credits += nblocks; handle->h_requested_credits += nblocks; handle->h_revoke_credits += revoke_records; handle->h_revoke_credits_requested += revoke_records; @@ -666,15 +666,15 @@ static void stop_this_handle(handle_t *handle) int rr_per_blk = journal->j_revoke_records_per_block;
WARN_ON_ONCE(DIV_ROUND_UP(revokes, rr_per_blk) - > handle->h_buffer_credits); + > handle->h_total_credits); t_revokes = atomic_add_return(revokes, &transaction->t_outstanding_revokes); revoke_descriptors = DIV_ROUND_UP(t_revokes, rr_per_blk) - DIV_ROUND_UP(t_revokes - revokes, rr_per_blk); - handle->h_buffer_credits -= revoke_descriptors; + handle->h_total_credits -= revoke_descriptors; } - atomic_sub(handle->h_buffer_credits, + atomic_sub(handle->h_total_credits, &transaction->t_outstanding_credits); if (handle->h_rsv_handle) __jbd2_journal_unreserve_handle(handle->h_rsv_handle); @@ -738,7 +738,7 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, int revoke_records, read_unlock(&journal->j_state_lock); if (need_to_start) jbd2_log_start_commit(journal, tid); - handle->h_buffer_credits = nblocks + + handle->h_total_credits = nblocks + DIV_ROUND_UP(revoke_records, journal->j_revoke_records_per_block); handle->h_revoke_credits = revoke_records; @@ -1443,12 +1443,12 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh) * of the transaction. This needs to be done * once a transaction -bzzz */ - if (handle->h_buffer_credits <= 0) { + if (handle->h_total_credits <= 0) { ret = -ENOSPC; goto out_unlock_bh; } jh->b_modified = 1; - handle->h_buffer_credits--; + handle->h_total_credits--; }
/* @@ -1692,7 +1692,7 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh) drop: if (drop_reserve) { /* no need to reserve log space for this block -bzzz */ - handle->h_buffer_credits++; + handle->h_total_credits++; } return err;
@@ -1753,7 +1753,7 @@ int jbd2_journal_stop(handle_t *handle) jiffies - handle->h_start_jiffies, handle->h_sync, handle->h_requested_credits, (handle->h_requested_credits - - handle->h_buffer_credits)); + handle->h_total_credits));
/* * Implement synchronous transaction batching. If the handle diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index c7c196c..ee6de17e 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -480,7 +480,8 @@ struct jbd2_inode { * @h_transaction: Which compound transaction is this update a part of? * @h_journal: Which journal handle belongs to - used iff h_reserved set. * @h_rsv_handle: Handle reserved for finishing the logical operation. - * @h_buffer_credits: Number of remaining buffers we are allowed to dirty. + * @h_total_credits: Number of remaining buffers we are allowed to add to + journal. These are dirty buffers and revoke descriptor blocks. * @h_revoke_credits: Number of remaining revoke records available for handle * @h_ref: Reference count on this handle. * @h_err: Field for caller's use to track errors through large fs operations. @@ -491,7 +492,7 @@ struct jbd2_inode { * @h_type: For handle statistics. * @h_line_no: For handle statistics. * @h_start_jiffies: Handle Start time. - * @h_requested_credits: Holds @h_buffer_credits after handle is started. + * @h_requested_credits: Holds @h_total_credits after handle is started. * @h_revoke_credits_requested: Holds @h_revoke_credits after handle is started. * @saved_alloc_context: Saved context while transaction is open. **/ @@ -509,7 +510,7 @@ struct jbd2_journal_handle };
handle_t *h_rsv_handle; - int h_buffer_credits; + int h_total_credits; int h_revoke_credits; int h_revoke_credits_requested; int h_ref; @@ -1658,7 +1659,7 @@ static inline int jbd2_handle_buffer_credits(handle_t *handle) { journal_t *journal = handle->h_transaction->t_journal;
- return handle->h_buffer_credits - + return handle->h_total_credits - DIV_ROUND_UP(handle->h_revoke_credits_requested, journal->j_revoke_records_per_block); }
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit d090707edab59cb07047d6d7e138ffcc3bdc42be category: bugfix bugzilla: 25031 CVE: NA ---------------------------
Make checking of available credits in jbd2_journal_dirty_metadata() more strict. There should be always enough credits in the handle to write all potential revoke descriptors. Also we warn in case there are not enough credits since this is a bug in the filesystem.
Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-22-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/transaction.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 45f9f06..247f61fc 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -1443,7 +1443,7 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh) * of the transaction. This needs to be done * once a transaction -bzzz */ - if (handle->h_total_credits <= 0) { + if (WARN_ON_ONCE(jbd2_handle_buffer_credits(handle) <= 0)) { ret = -ENOSPC; goto out_unlock_bh; }
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit 83448bdfb59731c2f54784ed3f4a93ff95be6e7e category: bugfix bugzilla: 25031 CVE: NA ---------------------------
So far we have reserved only relatively high fixed amount of revoke credits for each transaction. We over-reserved by large amount for most cases but when freeing large directories or files with data journalling, the fixed amount is not enough. In fact the worst case estimate is inconveniently large (maximum extent size) for freeing of one extent.
We fix this by doing proper estimate of the amount of blocks that need to be revoked when removing blocks from the inode due to truncate or hole punching and otherwise reserve just a small amount of revoke credits for each transaction to accommodate freeing of xattrs block or so.
Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-23-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/ext4.h | 3 +- fs/ext4/ext4_jbd2.c | 20 ++++++----- fs/ext4/ext4_jbd2.h | 84 +++++++++++++++++++++++++++++++-------------- fs/ext4/extents.c | 27 +++++++++++---- fs/ext4/ialloc.c | 2 +- fs/ext4/indirect.c | 12 ++++--- fs/ext4/inode.c | 2 +- fs/ext4/migrate.c | 24 ++++++++----- fs/ext4/resize.c | 16 ++++++--- fs/ext4/xattr.c | 4 ++- include/trace/events/ext4.h | 13 ++++--- 11 files changed, 140 insertions(+), 67 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 1fe3ad7..43a681b 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3187,7 +3187,8 @@ extern int ext4_swap_extents(handle_t *handle, struct inode *inode1, ext4_lblk_t lblk2, ext4_lblk_t count, int mark_unwritten,int *err); extern int ext4_datasem_ensure_credits(handle_t *handle, struct inode *inode, - int check_cred, int restart_cred); + int check_cred, int restart_cred, + int revoke_cred);
/* move_extent.c */ extern void ext4_double_down_write_data_sem(struct inode *first, diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index b81190b..d3b8cde 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -65,12 +65,14 @@ static int ext4_journal_check_start(struct super_block *sb) }
handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line, - int type, int blocks, int rsv_blocks) + int type, int blocks, int rsv_blocks, + int revoke_creds) { journal_t *journal; int err;
- trace_ext4_journal_start(sb, blocks, rsv_blocks, _RET_IP_); + trace_ext4_journal_start(sb, blocks, rsv_blocks, revoke_creds, + _RET_IP_); err = ext4_journal_check_start(sb); if (err < 0) return ERR_PTR(err); @@ -78,8 +80,8 @@ handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line, journal = EXT4_SB(sb)->s_journal; if (!journal) return ext4_get_nojournal(); - return jbd2__journal_start(journal, blocks, rsv_blocks, 1024, GFP_NOFS, - type, line); + return jbd2__journal_start(journal, blocks, rsv_blocks, revoke_creds, + GFP_NOFS, type, line); }
int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle) @@ -134,14 +136,16 @@ handle_t *__ext4_journal_start_reserved(handle_t *handle, unsigned int line, }
int __ext4_journal_ensure_credits(handle_t *handle, int check_cred, - int extend_cred) + int extend_cred, int revoke_cred) { if (!ext4_handle_valid(handle)) return 0; - if (jbd2_handle_buffer_credits(handle) >= check_cred) + if (jbd2_handle_buffer_credits(handle) >= check_cred && + handle->h_revoke_credits >= revoke_cred) return 0; - return ext4_journal_extend(handle, - extend_cred - jbd2_handle_buffer_credits(handle)); + extend_cred = max(0, extend_cred - jbd2_handle_buffer_credits(handle)); + revoke_cred = max(0, revoke_cred - handle->h_revoke_credits); + return ext4_journal_extend(handle, extend_cred, revoke_cred); }
static void ext4_journal_abort_handle(const char *caller, unsigned int line, diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index cdac72f..25396e5 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -261,7 +261,8 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line, __ext4_handle_dirty_super(__func__, __LINE__, (handle), (sb))
handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line, - int type, int blocks, int rsv_blocks); + int type, int blocks, int rsv_blocks, + int revoke_creds); int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle);
#define EXT4_NOJOURNAL_MAX_REF_COUNT ((unsigned long) 4096) @@ -288,21 +289,41 @@ static inline int ext4_handle_is_aborted(handle_t *handle) return 0; }
+static inline int ext4_free_metadata_revoke_credits(struct super_block *sb, + int blocks) +{ + /* Freeing each metadata block can result in freeing one cluster */ + return blocks * EXT4_SB(sb)->s_cluster_ratio; +} + +static inline int ext4_trans_default_revoke_credits(struct super_block *sb) +{ + return ext4_free_metadata_revoke_credits(sb, 8); +} + #define ext4_journal_start_sb(sb, type, nblocks) \ - __ext4_journal_start_sb((sb), __LINE__, (type), (nblocks), 0) + __ext4_journal_start_sb((sb), __LINE__, (type), (nblocks), 0, \ + ext4_trans_default_revoke_credits(sb))
#define ext4_journal_start(inode, type, nblocks) \ - __ext4_journal_start((inode), __LINE__, (type), (nblocks), 0) + __ext4_journal_start((inode), __LINE__, (type), (nblocks), 0, \ + ext4_trans_default_revoke_credits((inode)->i_sb))
-#define ext4_journal_start_with_reserve(inode, type, blocks, rsv_blocks) \ - __ext4_journal_start((inode), __LINE__, (type), (blocks), (rsv_blocks)) +#define ext4_journal_start_with_reserve(inode, type, blocks, rsv_blocks)\ + __ext4_journal_start((inode), __LINE__, (type), (blocks), (rsv_blocks),\ + ext4_trans_default_revoke_credits((inode)->i_sb)) + +#define ext4_journal_start_with_revoke(inode, type, blocks, revoke_creds) \ + __ext4_journal_start((inode), __LINE__, (type), (blocks), 0, \ + (revoke_creds))
static inline handle_t *__ext4_journal_start(struct inode *inode, unsigned int line, int type, - int blocks, int rsv_blocks) + int blocks, int rsv_blocks, + int revoke_creds) { return __ext4_journal_start_sb(inode->i_sb, line, type, blocks, - rsv_blocks); + rsv_blocks, revoke_creds); }
#define ext4_journal_stop(handle) \ @@ -325,22 +346,23 @@ static inline handle_t *ext4_journal_current_handle(void) return journal_current_handle(); }
-static inline int ext4_journal_extend(handle_t *handle, int nblocks) +static inline int ext4_journal_extend(handle_t *handle, int nblocks, int revoke) { if (ext4_handle_valid(handle)) - return jbd2_journal_extend(handle, nblocks, 1024); + return jbd2_journal_extend(handle, nblocks, revoke); return 0; }
-static inline int ext4_journal_restart(handle_t *handle, int nblocks) +static inline int ext4_journal_restart(handle_t *handle, int nblocks, + int revoke) { if (ext4_handle_valid(handle)) - return jbd2__journal_restart(handle, nblocks, 1024, GFP_NOFS); + return jbd2__journal_restart(handle, nblocks, revoke, GFP_NOFS); return 0; }
int __ext4_journal_ensure_credits(handle_t *handle, int check_cred, - int extend_cred); + int extend_cred, int revoke_cred);
/* @@ -353,18 +375,19 @@ int __ext4_journal_ensure_credits(handle_t *handle, int check_cred, * credits or transaction extension succeeded, 1 in case transaction had to be * restarted. */ -#define ext4_journal_ensure_credits_fn(handle, check_cred, extend_cred, fn) \ +#define ext4_journal_ensure_credits_fn(handle, check_cred, extend_cred, \ + revoke_cred, fn) \ ({ \ __label__ __ensure_end; \ int err = __ext4_journal_ensure_credits((handle), (check_cred), \ - (extend_cred)); \ + (extend_cred), (revoke_cred)); \ \ if (err <= 0) \ goto __ensure_end; \ err = (fn); \ if (err < 0) \ goto __ensure_end; \ - err = ext4_journal_restart((handle), (extend_cred)); \ + err = ext4_journal_restart((handle), (extend_cred), (revoke_cred)); \ if (err == 0) \ err = 1; \ __ensure_end: \ @@ -373,18 +396,16 @@ int __ext4_journal_ensure_credits(handle_t *handle, int check_cred,
/* * Ensure given handle has at least requested amount of credits available, - * possibly restarting transaction if needed. + * possibly restarting transaction if needed. We also make sure the transaction + * has space for at least ext4_trans_default_revoke_credits(sb) revoke records + * as freeing one or two blocks is very common pattern and requesting this is + * very cheap. */ -static inline int ext4_journal_ensure_credits(handle_t *handle, int credits) +static inline int ext4_journal_ensure_credits(handle_t *handle, int credits, + int revoke_creds) { - return ext4_journal_ensure_credits_fn(handle, credits, credits, 0); -} - -static inline int ext4_journal_ensure_credits_batch(handle_t *handle, - int credits) -{ - return ext4_journal_ensure_credits_fn(handle, credits, - EXT4_MAX_TRANS_DATA, 0); + return ext4_journal_ensure_credits_fn(handle, credits, credits, + revoke_creds, 0); }
static inline int ext4_journal_blocks_per_page(struct inode *inode) @@ -479,6 +500,19 @@ static inline int ext4_should_writeback_data(struct inode *inode) return ext4_inode_journal_mode(inode) & EXT4_INODE_WRITEBACK_DATA_MODE; }
+static inline int ext4_free_data_revoke_credits(struct inode *inode, int blocks) +{ + if (test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA) + return 0; + if (!ext4_should_journal_data(inode)) + return 0; + /* + * Data blocks in one extent are contiguous, just account for partial + * clusters at extent boundaries + */ + return blocks + 2*(EXT4_SB(inode->i_sb)->s_cluster_ratio - 1); +} + /* * This function controls whether or not we should try to go down the * dioread_nolock code paths, which makes it safe to avoid taking diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 92a19e3..211944e 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -124,13 +124,14 @@ static int ext4_ext_trunc_restart_fn(struct inode *inode, int *dropped) * and < 0 in case of fatal error. */ int ext4_datasem_ensure_credits(handle_t *handle, struct inode *inode, - int check_cred, int restart_cred) + int check_cred, int restart_cred, + int revoke_cred) { int ret; int dropped = 0;
ret = ext4_journal_ensure_credits_fn(handle, check_cred, restart_cred, - ext4_ext_trunc_restart_fn(inode, &dropped)); + revoke_cred, ext4_ext_trunc_restart_fn(inode, &dropped)); if (dropped) down_write(&EXT4_I(inode)->i_data_sem); return ret; @@ -1851,7 +1852,8 @@ static void ext4_ext_try_to_merge_up(handle_t *handle, * group descriptor to release the extent tree block. If we * can't get the journal credits, give up. */ - if (ext4_journal_extend(handle, 2)) + if (ext4_journal_extend(handle, 2, + ext4_free_metadata_revoke_credits(inode->i_sb, 1))) return;
/* @@ -2641,7 +2643,7 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode, { struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); int err = 0, correct_index = 0; - int depth = ext_depth(inode), credits; + int depth = ext_depth(inode), credits, revoke_credits; struct ext4_extent_header *eh; ext4_lblk_t a, b; unsigned num; @@ -2733,9 +2735,18 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode, credits += (ext_depth(inode)) + 1; } credits += EXT4_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb); + /* + * We may end up freeing some index blocks and data from the + * punched range. Note that partial clusters are accounted for + * by ext4_free_data_revoke_credits(). + */ + revoke_credits = + ext4_free_metadata_revoke_credits(inode->i_sb, + ext_depth(inode)) + + ext4_free_data_revoke_credits(inode, b - a + 1);
err = ext4_datasem_ensure_credits(handle, inode, credits, - credits); + credits, revoke_credits); if (err) { if (err > 0) err = -EAGAIN; @@ -2858,7 +2869,9 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, ext_debug("truncate since %u to %u\n", start, end);
/* probably first extent we're gonna free will be last in block */ - handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, depth + 1); + handle = ext4_journal_start_with_revoke(inode, EXT4_HT_TRUNCATE, + depth + 1, + ext4_free_metadata_revoke_credits(inode->i_sb, depth)); if (IS_ERR(handle)) return PTR_ERR(handle);
@@ -5254,7 +5267,7 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, * groups */ credits = ext4_writepage_trans_blocks(inode); - err = ext4_datasem_ensure_credits(handle, inode, 7, credits); + err = ext4_datasem_ensure_credits(handle, inode, 7, credits, 0); if (err < 0) return err;
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 091a18a..0b5e65f 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -927,7 +927,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, BUG_ON(nblocks <= 0); handle = __ext4_journal_start_sb(dir->i_sb, line_no, handle_type, nblocks, - 0); + 0, 0); if (IS_ERR(handle)) { err = PTR_ERR(handle); ext4_std_error(sb, err); diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c index 0e88d5b..2625dab 100644 --- a/fs/ext4/indirect.c +++ b/fs/ext4/indirect.c @@ -742,13 +742,14 @@ static int ext4_ind_trunc_restart_fn(handle_t *handle, struct inode *inode, */ static int ext4_ind_truncate_ensure_credits(handle_t *handle, struct inode *inode, - struct buffer_head *bh) + struct buffer_head *bh, + int revoke_creds) { int ret; int dropped = 0;
ret = ext4_journal_ensure_credits_fn(handle, EXT4_RESERVE_TRANS_BLOCKS, - ext4_blocks_for_truncate(inode), + ext4_blocks_for_truncate(inode), revoke_creds, ext4_ind_trunc_restart_fn(handle, inode, bh, &dropped)); if (dropped) down_write(&EXT4_I(inode)->i_data_sem); @@ -895,7 +896,8 @@ static int ext4_clear_blocks(handle_t *handle, struct inode *inode, return 1; }
- err = ext4_ind_truncate_ensure_credits(handle, inode, bh); + err = ext4_ind_truncate_ensure_credits(handle, inode, bh, + ext4_free_data_revoke_credits(inode, count)); if (err < 0) goto out_err;
@@ -1081,7 +1083,9 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode, if (ext4_handle_is_aborted(handle)) return; if (ext4_ind_truncate_ensure_credits(handle, inode, - NULL) < 0) + NULL, + ext4_free_metadata_revoke_credits( + inode->i_sb, 1)) < 0) return;
/* diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index a152e9f..7dbc5f7 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -6002,7 +6002,7 @@ static int ext4_try_to_expand_extra_isize(struct inode *inode, * force a large enough s_min_extra_isize. */ if (ext4_journal_extend(handle, - EXT4_DATA_TRANS_BLOCKS(inode->i_sb)) != 0) + EXT4_DATA_TRANS_BLOCKS(inode->i_sb), 0) != 0) return -ENOSPC;
if (ext4_write_trylock_xattr(inode, &no_expand) == 0) diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c index 814119f..2398bce 100644 --- a/fs/ext4/migrate.c +++ b/fs/ext4/migrate.c @@ -50,7 +50,7 @@ static int finish_range(handle_t *handle, struct inode *inode, needed = ext4_ext_calc_credits_for_single_extent(inode, lb->last_block - lb->first_block + 1, path);
- retval = ext4_datasem_ensure_credits(handle, inode, needed, needed); + retval = ext4_datasem_ensure_credits(handle, inode, needed, needed, 0); if (retval < 0) goto err_out; retval = ext4_ext_insert_extent(handle, inode, &path, &newext, 0); @@ -182,10 +182,11 @@ static int free_dind_blocks(handle_t *handle, int i; __le32 *tmp_idata; struct buffer_head *bh; + struct super_block *sb = inode->i_sb; unsigned long max_entries = inode->i_sb->s_blocksize >> 2; int err;
- bh = ext4_sb_bread(inode->i_sb, le32_to_cpu(i_data), 0); + bh = ext4_sb_bread(sb, le32_to_cpu(i_data), 0); if (IS_ERR(bh)) return PTR_ERR(bh);
@@ -193,7 +194,8 @@ static int free_dind_blocks(handle_t *handle, for (i = 0; i < max_entries; i++) { if (tmp_idata[i]) { err = ext4_journal_ensure_credits(handle, - EXT4_RESERVE_TRANS_BLOCKS); + EXT4_RESERVE_TRANS_BLOCKS, + ext4_free_metadata_revoke_credits(sb, 1)); if (err < 0) { put_bh(bh); return err; @@ -205,7 +207,8 @@ static int free_dind_blocks(handle_t *handle, } } put_bh(bh); - err = ext4_journal_ensure_credits(handle, EXT4_RESERVE_TRANS_BLOCKS); + err = ext4_journal_ensure_credits(handle, EXT4_RESERVE_TRANS_BLOCKS, + ext4_free_metadata_revoke_credits(sb, 1)); if (err < 0) return err; ext4_free_blocks(handle, inode, NULL, le32_to_cpu(i_data), 1, @@ -238,7 +241,8 @@ static int free_tind_blocks(handle_t *handle, } } put_bh(bh); - retval = ext4_journal_ensure_credits(handle, EXT4_RESERVE_TRANS_BLOCKS); + retval = ext4_journal_ensure_credits(handle, EXT4_RESERVE_TRANS_BLOCKS, + ext4_free_metadata_revoke_credits(inode->i_sb, 1)); if (retval < 0) return retval; ext4_free_blocks(handle, inode, NULL, le32_to_cpu(i_data), 1, @@ -254,7 +258,8 @@ static int free_ind_block(handle_t *handle, struct inode *inode, __le32 *i_data) /* ei->i_data[EXT4_IND_BLOCK] */ if (i_data[0]) { retval = ext4_journal_ensure_credits(handle, - EXT4_RESERVE_TRANS_BLOCKS); + EXT4_RESERVE_TRANS_BLOCKS, + ext4_free_metadata_revoke_credits(inode->i_sb, 1)); if (retval < 0) return retval; ext4_free_blocks(handle, inode, NULL, @@ -291,7 +296,7 @@ static int ext4_ext_swap_inode_data(handle_t *handle, struct inode *inode, * One credit accounted for writing the * i_data field of the original inode */ - retval = ext4_journal_ensure_credits(handle, 1); + retval = ext4_journal_ensure_credits(handle, 1, 0); if (retval < 0) goto err_out;
@@ -368,7 +373,8 @@ static int free_ext_idx(handle_t *handle, struct inode *inode, } } put_bh(bh); - retval = ext4_journal_ensure_credits(handle, EXT4_RESERVE_TRANS_BLOCKS); + retval = ext4_journal_ensure_credits(handle, EXT4_RESERVE_TRANS_BLOCKS, + ext4_free_metadata_revoke_credits(inode->i_sb, 1)); if (retval < 0) return retval; ext4_free_blocks(handle, inode, NULL, block, 1, @@ -548,7 +554,7 @@ int ext4_ext_migrate(struct inode *inode) }
/* We mark the tmp_inode dirty via ext4_ext_tree_init. */ - retval = ext4_journal_ensure_credits(handle, 1); + retval = ext4_journal_ensure_credits(handle, 1, 0); if (retval < 0) goto out_stop; /* diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c index a1e6082..462469a 100644 --- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -388,6 +388,12 @@ static struct buffer_head *bclean(handle_t *handle, struct super_block *sb, return bh; }
+static int ext4_resize_ensure_credits_batch(handle_t *handle, int credits) +{ + return ext4_journal_ensure_credits_fn(handle, credits, + EXT4_MAX_TRANS_DATA, 0, 0); +} + /* * set_flexbg_block_bitmap() mark clusters [@first_cluster, @last_cluster] used. * @@ -427,7 +433,7 @@ static int set_flexbg_block_bitmap(struct super_block *sb, handle_t *handle, continue; }
- err = ext4_journal_ensure_credits_batch(handle, 1); + err = ext4_resize_ensure_credits_batch(handle, 1); if (err < 0) return err;
@@ -520,7 +526,7 @@ static int setup_new_flex_group_blocks(struct super_block *sb, struct buffer_head *gdb;
ext4_debug("update backup group %#04llx\n", block); - err = ext4_journal_ensure_credits_batch(handle, 1); + err = ext4_resize_ensure_credits_batch(handle, 1); if (err < 0) goto out;
@@ -578,7 +584,7 @@ static int setup_new_flex_group_blocks(struct super_block *sb,
/* Initialize block bitmap of the @group */ block = group_data[i].block_bitmap; - err = ext4_journal_ensure_credits_batch(handle, 1); + err = ext4_resize_ensure_credits_batch(handle, 1); if (err < 0) goto out;
@@ -607,7 +613,7 @@ static int setup_new_flex_group_blocks(struct super_block *sb,
/* Initialize inode bitmap of the @group */ block = group_data[i].inode_bitmap; - err = ext4_journal_ensure_credits_batch(handle, 1); + err = ext4_resize_ensure_credits_batch(handle, 1); if (err < 0) goto out; /* Mark unused entries in inode bitmap used */ @@ -1085,7 +1091,7 @@ static void update_backups(struct super_block *sb, sector_t blk_off, char *data, ext4_fsblk_t backup_block;
/* Out of journal space, and can't get more - abort - so sad */ - err = ext4_journal_ensure_credits_batch(handle, 1); + err = ext4_resize_ensure_credits_batch(handle, 1); if (err < 0) break;
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index f11aeb9..37986d5 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1159,6 +1159,7 @@ static int ext4_xattr_restart_fn(handle_t *handle, struct inode *inode, }
err = ext4_journal_ensure_credits_fn(handle, credits, credits, + ext4_free_metadata_revoke_credits(parent->i_sb, 1), ext4_xattr_restart_fn(handle, parent, bh, block_csum, dirty)); if (err < 0) { @@ -2845,7 +2846,8 @@ int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode, struct inode *ea_inode; int error;
- error = ext4_journal_ensure_credits(handle, extra_credits); + error = ext4_journal_ensure_credits(handle, extra_credits, + ext4_free_metadata_revoke_credits(inode->i_sb, 1)); if (error < 0) { EXT4_ERROR_INODE(inode, "ensure credits (error %d)", error); goto cleanup; diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index 0dfb174..68e1b06 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -1745,15 +1745,16 @@
TRACE_EVENT(ext4_journal_start, TP_PROTO(struct super_block *sb, int blocks, int rsv_blocks, - unsigned long IP), + int revoke_creds, unsigned long IP),
- TP_ARGS(sb, blocks, rsv_blocks, IP), + TP_ARGS(sb, blocks, rsv_blocks, revoke_creds, IP),
TP_STRUCT__entry( __field( dev_t, dev ) __field(unsigned long, ip ) __field( int, blocks ) __field( int, rsv_blocks ) + __field( int, revoke_creds ) ),
TP_fast_assign( @@ -1761,11 +1762,13 @@ __entry->ip = IP; __entry->blocks = blocks; __entry->rsv_blocks = rsv_blocks; + __entry->revoke_creds = revoke_creds; ),
- TP_printk("dev %d,%d blocks, %d rsv_blocks, %d caller %pS", - MAJOR(__entry->dev), MINOR(__entry->dev), - __entry->blocks, __entry->rsv_blocks, (void *)__entry->ip) + TP_printk("dev %d,%d blocks %d, rsv_blocks %d, revoke_creds %d, " + "caller %pS", MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->blocks, __entry->rsv_blocks, __entry->revoke_creds, + (void *)__entry->ip) );
TRACE_EVENT(ext4_journal_start_reserved,
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit 0094f981bbaca3ae707c95c5e5977429d29c2dd0 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
Provide trace event for handle restarts to ease debugging.
Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-24-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/transaction.c | 8 +++++++- include/trace/events/jbd2.h | 16 +++++++++++++++- 2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 247f61fc..3b1efc4 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -713,6 +713,7 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, int revoke_records, journal_t *journal; tid_t tid; int need_to_start; + int ret;
/* If we've had an abort of any type, don't even think about * actually doing the restart! */ @@ -742,7 +743,12 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, int revoke_records, DIV_ROUND_UP(revoke_records, journal->j_revoke_records_per_block); handle->h_revoke_credits = revoke_records; - return start_this_handle(journal, handle, gfp_mask); + ret = start_this_handle(journal, handle, gfp_mask); + trace_jbd2_handle_restart(journal->j_fs_dev->bd_dev, + ret ? 0 : handle->h_transaction->t_tid, + handle->h_type, handle->h_line_no, + handle->h_total_credits); + return ret; } EXPORT_SYMBOL(jbd2__journal_restart);
diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h index 2310b25..d16a328 100644 --- a/include/trace/events/jbd2.h +++ b/include/trace/events/jbd2.h @@ -133,7 +133,7 @@ (unsigned long) __entry->ino) );
-TRACE_EVENT(jbd2_handle_start, +DECLARE_EVENT_CLASS(jbd2_handle_start_class, TP_PROTO(dev_t dev, unsigned long tid, unsigned int type, unsigned int line_no, int requested_blocks),
@@ -161,6 +161,20 @@ __entry->type, __entry->line_no, __entry->requested_blocks) );
+DEFINE_EVENT(jbd2_handle_start_class, jbd2_handle_start, + TP_PROTO(dev_t dev, unsigned long tid, unsigned int type, + unsigned int line_no, int requested_blocks), + + TP_ARGS(dev, tid, type, line_no, requested_blocks) +); + +DEFINE_EVENT(jbd2_handle_start_class, jbd2_handle_restart, + TP_PROTO(dev_t dev, unsigned long tid, unsigned int type, + unsigned int line_no, int requested_blocks), + + TP_ARGS(dev, tid, type, line_no, requested_blocks) +); + TRACE_EVENT(jbd2_handle_extend, TP_PROTO(dev_t dev, unsigned long tid, unsigned int type, unsigned int line_no, int buffer_credits,
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit 19014d697147c6aea3a34eea00a2844e698d070f category: bugfix bugzilla: 25031 CVE: NA ---------------------------
Currently we reserve j_max_transaction_buffers / 32 for transaction descriptor blocks. Now that revoke descriptors are accounted for separately this estimate is unnecessarily high and we can actually compute much tighter estimate. In the common case of 32k journal blocks and 4k blocksize this actually reduces the amount of reserved descriptor blocks from 256 to ~25 which allows us to fit more real data into a transaction.
Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191105164437.32602-25-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/transaction.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 3b1efc4..fddfd6e 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -63,14 +63,25 @@ void jbd2_journal_free_transaction(transaction_t *transaction) }
/* - * We reserve t_outstanding_credits >> JBD2_CONTROL_BLOCKS_SHIFT for - * transaction descriptor blocks. + * Base amount of descriptor blocks we reserve for each transaction. */ -#define JBD2_CONTROL_BLOCKS_SHIFT 5 - static int jbd2_descriptor_blocks_per_trans(journal_t *journal) { - return journal->j_max_transaction_buffers >> JBD2_CONTROL_BLOCKS_SHIFT; + int tag_space = journal->j_blocksize - sizeof(journal_header_t); + int tags_per_block; + + /* Subtract UUID */ + tag_space -= 16; + if (jbd2_journal_has_csum_v2or3(journal)) + tag_space -= sizeof(struct jbd2_journal_block_tail); + /* Commit code leaves a slack space of 16 bytes at the end of block */ + tags_per_block = (tag_space - 16) / journal_tag_bytes(journal); + /* + * Revoke descriptors are accounted separately so we need to reserve + * space for commit block and normal transaction descriptor blocks. + */ + return 1 + DIV_ROUND_UP(journal->j_max_transaction_buffers, + tags_per_block); }
/*
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-5.5-rc1 commit 3c845acd0237caef617f330a0e3b37ad8ae9fea5 category: bugfix bugzilla: 25031 CVE: NA ---------------------------
The helper jbd2_handle_buffer_credits() doesn't correctly handle reserved handles which can lead to crashes. Fix it getting of journal pointer to work for reserved handles as well.
Fixes: a9a8344ee171 ("ext4, jbd2: Provide accessor function for handle credits") Reported-by: Eric Biggers ebiggers@kernel.org Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191115102210.29445-1-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- include/linux/jbd2.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index ee6de17e..7dd9d25 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1654,10 +1654,14 @@ static inline tid_t jbd2_get_latest_transaction(journal_t *journal) return tid; }
- static inline int jbd2_handle_buffer_credits(handle_t *handle) { - journal_t *journal = handle->h_transaction->t_journal; + journal_t *journal; + + if (!handle->h_reserved) + journal = handle->h_transaction->t_journal; + else + journal = handle->h_journal;
return handle->h_total_credits - DIV_ROUND_UP(handle->h_revoke_credits_requested,
From: yangerkun yangerkun@huawei.com
mainline inclusion from mainline-5.5-rc3 commit a70fd5ac2ea787cafe07b69dadd16b3648ad64ac category: bugfix bugzilla: 25031 CVE: NA -----------------------------------
It's possible that __ext4_new_inode will release the xattr block, so it will trigger a warning since there is revoke credits will be 0 if the handle == NULL. The below scripts can reproduce it easily.
------------[ cut here ]------------ WARNING: CPU: 0 PID: 3861 at fs/jbd2/revoke.c:374 jbd2_journal_revoke+0x30e/0x540 fs/jbd2/revoke.c:374 ... __ext4_forget+0x1d7/0x800 fs/ext4/ext4_jbd2.c:248 ext4_free_blocks+0x213/0x1d60 fs/ext4/mballoc.c:4743 ext4_xattr_release_block+0x55b/0x780 fs/ext4/xattr.c:1254 ext4_xattr_block_set+0x1c2c/0x2c40 fs/ext4/xattr.c:2112 ext4_xattr_set_handle+0xa7e/0x1090 fs/ext4/xattr.c:2384 __ext4_set_acl+0x54d/0x6c0 fs/ext4/acl.c:214 ext4_init_acl+0x218/0x2e0 fs/ext4/acl.c:293 __ext4_new_inode+0x352a/0x42b0 fs/ext4/ialloc.c:1151 ext4_mkdir+0x2e9/0xbd0 fs/ext4/namei.c:2774 vfs_mkdir+0x386/0x5f0 fs/namei.c:3811 do_mkdirat+0x11c/0x210 fs/namei.c:3834 do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:294 ... -------------------------------------
scripts: mkfs.ext4 /dev/vdb mount /dev/vdb /mnt cd /mnt && mkdir dir && for i in {1..8}; do setfacl -dm "u:user_"$i":rx" dir; done mkdir dir/dir1 && mv dir/dir1 ./ sh repro.sh && add some user
[root@localhost ~]# cat repro.sh while [ 1 -eq 1 ]; do rm -rf dir rm -rf dir1/dir1 mkdir dir for i in {1..8}; do setfacl -dm "u:test"$i":rx" dir; done setfacl -m "u:user_9:rx" dir & mkdir dir1/dir1 & done
Before exec repro.sh, dir1 has inherit the default acl from dir, and xattr block of dir1 dir is not the same, so the h_refcount of these two dir's xattr block will be 1. Then repro.sh can trigger the warning with the situation show as below. The last h_refcount can be clear with mkdir, and __ext4_new_inode has not reserved revoke credits, so the warning will happened, fix it by reserve revoke credits in __ext4_new_inode.
Thread 1 Thread 2 mkdir dir set default acl(will create a xattr block blk1 and the refcount of ext4_xattr_header will be 1) ... mkdir dir1/dir1 ->....->ext4_init_acl ->__ext4_set_acl(set default acl, will reuse blk1, and h_refcount will be 2)
setfacl->ext4_set_acl->... ->ext4_xattr_block_set(will create new block blk2 to store xattr)
->__ext4_set_acl(set access acl, since h_refcount of blk1 is 2, will create blk3 to store xattr)
->ext4_xattr_release_block(dec h_refcount of blk1 to 1) ->ext4_xattr_release_block(dec h_refcount and since it is 0, will release the block and trigger the warning)
Link: https://lore.kernel.org/r/20191213014900.47228-1-yangerkun@huawei.com Reported-by: Hulk Robot hulkci@huawei.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: yangerkun yangerkun@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/ialloc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 0b5e65f..0abb72d 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -926,8 +926,8 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, if (!handle) { BUG_ON(nblocks <= 0); handle = __ext4_journal_start_sb(dir->i_sb, line_no, - handle_type, nblocks, - 0, 0); + handle_type, nblocks, 0, + ext4_trans_default_revoke_credits(sb)); if (IS_ERR(handle)) { err = PTR_ERR(handle); ext4_std_error(sb, err);
From: Al Viro viro@zeniv.linux.org.uk
commit d0cb50185ae942b03c4327be322055d622dc79f6 upstream.
may_create_in_sticky() call is done when we already have dropped the reference to dir.
Fixes: 30aba6656f61e (namei: allow restricted O_CREAT of FIFOs and regular files) Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/namei.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c index 6448cfb..1dd68b3 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1009,7 +1009,8 @@ static int may_linkat(struct path *link) * may_create_in_sticky - Check whether an O_CREAT open in a sticky directory * should be allowed, or not, on files that already * exist. - * @dir: the sticky parent directory + * @dir_mode: mode bits of directory + * @dir_uid: owner of directory * @inode: the inode of the file to open * * Block an O_CREAT open of a FIFO (or a regular file) when: @@ -1025,18 +1026,18 @@ static int may_linkat(struct path *link) * * Returns 0 if the open is allowed, -ve on error. */ -static int may_create_in_sticky(struct dentry * const dir, +static int may_create_in_sticky(umode_t dir_mode, kuid_t dir_uid, struct inode * const inode) { if ((!sysctl_protected_fifos && S_ISFIFO(inode->i_mode)) || (!sysctl_protected_regular && S_ISREG(inode->i_mode)) || - likely(!(dir->d_inode->i_mode & S_ISVTX)) || - uid_eq(inode->i_uid, dir->d_inode->i_uid) || + likely(!(dir_mode & S_ISVTX)) || + uid_eq(inode->i_uid, dir_uid) || uid_eq(current_fsuid(), inode->i_uid)) return 0;
- if (likely(dir->d_inode->i_mode & 0002) || - (dir->d_inode->i_mode & 0020 && + if (likely(dir_mode & 0002) || + (dir_mode & 0020 && ((sysctl_protected_fifos >= 2 && S_ISFIFO(inode->i_mode)) || (sysctl_protected_regular >= 2 && S_ISREG(inode->i_mode))))) { return -EACCES; @@ -3265,6 +3266,8 @@ static int do_last(struct nameidata *nd, struct file *file, const struct open_flags *op) { struct dentry *dir = nd->path.dentry; + kuid_t dir_uid = dir->d_inode->i_uid; + umode_t dir_mode = dir->d_inode->i_mode; int open_flag = op->open_flag; bool will_truncate = (open_flag & O_TRUNC) != 0; bool got_write = false; @@ -3400,7 +3403,7 @@ static int do_last(struct nameidata *nd, error = -EISDIR; if (d_is_dir(nd->path.dentry)) goto out; - error = may_create_in_sticky(dir, + error = may_create_in_sticky(dir_mode, dir_uid, d_backing_inode(nd->path.dentry)); if (unlikely(error)) goto out;
From: Al Viro viro@zeniv.linux.org.uk
commit 6404674acd596de41fd3ad5f267b4525494a891a upstream.
Brown paperbag time: fetching ->i_uid/->i_mode really should've been done from nd->inode. I even suggested that, but the reason for that has slipped through the cracks and I went for dir->d_inode instead - made for more "obvious" patch.
Analysis:
- at the entry into do_last() and all the way to step_into(): dir (aka nd->path.dentry) is known not to have been freed; so's nd->inode and it's equal to dir->d_inode unless we are already doomed to -ECHILD. inode of the file to get opened is not known.
- after step_into(): inode of the file to get opened is known; dir might be pointing to freed memory/be negative/etc.
- at the call of may_create_in_sticky(): guaranteed to be out of RCU mode; inode of the file to get opened is known and pinned; dir might be garbage.
The last was the reason for the original patch. Except that at the do_last() entry we can be in RCU mode and it is possible that nd->path.dentry->d_inode has already changed under us.
In that case we are going to fail with -ECHILD, but we need to be careful; nd->inode is pointing to valid struct inode and it's the same as nd->path.dentry->d_inode in "won't fail with -ECHILD" case, so we should use that.
Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" tommi.t.rantala@nokia.com Reported-by: syzbot+190005201ced78a74ad6@syzkaller.appspotmail.com Wearing-brown-paperbag: Al Viro viro@zeniv.linux.org.uk Cc: stable@kernel.org Fixes: d0cb50185ae9 ("do_last(): fetch directory ->i_mode and ->i_uid before it's too late") Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/namei.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c index 1dd68b3..18ddae1 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3266,8 +3266,8 @@ static int do_last(struct nameidata *nd, struct file *file, const struct open_flags *op) { struct dentry *dir = nd->path.dentry; - kuid_t dir_uid = dir->d_inode->i_uid; - umode_t dir_mode = dir->d_inode->i_mode; + kuid_t dir_uid = nd->inode->i_uid; + umode_t dir_mode = nd->inode->i_mode; int open_flag = op->open_flag; bool will_truncate = (open_flag & O_TRUNC) != 0; bool got_write = false;
From: Yufen Yu yuyufen@huawei.com
hulk inclusion category: bugfix bugzilla: 30109 CVE: NA ---------------------------
We reported kernel crash:
[201962.639350] Call trace: [201962.644403] string+0x28/0xa0 [201962.650501] vsnprintf+0x5f0/0x748 [201962.657472] seq_vprintf+0x70/0x98 [201962.664442] seq_printf+0x7c/0xa0 [201962.671238] __blkg_prfill_rwstat+0x84/0x128 [201962.679949] blkg_prfill_rwstat_field+0x94/0xc0 [201962.689182] blkcg_print_blkgs+0xcc/0x140 [201962.697370] blkg_print_stat_bytes+0x4c/0x60 [201962.706083] cgroup_seqfile_show+0x58/0xc0 [201962.714446] kernfs_seq_show+0x44/0x50 [201962.722112] seq_read+0xd4/0x4a8 [201962.728732] kernfs_fop_read+0x16c/0x218 [201962.736748] __vfs_read+0x60/0x188 [201962.743717] vfs_read+0x94/0x150 [201962.750338] ksys_read+0x6c/0xd8 [201962.756958] __arm64_sys_read+0x24/0x30 [201962.764800] el0_svc_common+0x78/0x130 [201962.772466] el0_svc_handler+0x38/0x78 [201962.780131] el0_svc+0x8/0xc
__blkg_prfill_rwstat() tried to get the device name by 'bdi->dev', while the 'dev' have been freed by bdi_release(). The race as following:
blkg_print_stat_bytes __scsi_remove_device del_gendisk bdi_unregister
put_device(bdi->dev) kfree(bdi->dev)
__blkg_prfill_rwstat blkg_dev_name //use the freed bdi->dev dev_name(blkg->q->backing_dev_info->dev)
bdi->dev = NULL
Since blkg_dev_name() have been coverd by rcu_read_lock/unlock(), we wait all rcu reader before free 'bdi->dev' to avoid use-after-free.
Link: https://lore.kernel.org/linux-block/20200211140038.146629-1-yuyufen@huawei.c... Signed-off-by: Yufen Yu yuyufen@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- block/blk-cgroup.c | 7 ++++-- include/linux/backing-dev-defs.h | 5 +++- include/linux/device.h | 5 ++++ mm/backing-dev.c | 49 ++++++++++++++++++++++++++++++++++++---- 4 files changed, 58 insertions(+), 8 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index a06547f..e592167 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -473,8 +473,11 @@ static int blkcg_reset_stats(struct cgroup_subsys_state *css, const char *blkg_dev_name(struct blkcg_gq *blkg) { /* some drivers (floppy) instantiate a queue w/o disk registered */ - if (blkg->q->backing_dev_info->dev) - return dev_name(blkg->q->backing_dev_info->dev); + struct rcu_device *rcu_dev; + + rcu_dev = rcu_dereference(blkg->q->backing_dev_info->rcu_dev); + if (rcu_dev) + return dev_name(&rcu_dev->dev); return NULL; } EXPORT_SYMBOL_GPL(blkg_dev_name); diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index 3cfe7de..73fb6bc 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -201,7 +201,10 @@ struct backing_dev_info { #endif wait_queue_head_t wb_waitq;
- struct device *dev; + union { + struct rcu_device *rcu_dev; + struct device *dev; + }; struct device *owner;
struct timer_list laptop_mode_wb_timer; diff --git a/include/linux/device.h b/include/linux/device.h index 23a1529..0d6960c 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -1073,6 +1073,11 @@ struct device { KABI_RESERVE(16) };
+struct rcu_device { + struct device dev; + struct rcu_head rcu_head; +}; + static inline struct device *kobj_to_dev(struct kobject *kobj) { return container_of(kobj, struct device, kobj); diff --git a/mm/backing-dev.c b/mm/backing-dev.c index 00277c8..040d778 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -872,19 +872,43 @@ struct backing_dev_info *bdi_alloc_node(gfp_t gfp_mask, int node_id) } EXPORT_SYMBOL(bdi_alloc_node);
+static void bdi_device_release(struct device *dev) +{ + struct rcu_device *rcu_dev = container_of(dev, + struct rcu_device, dev); + kfree(rcu_dev); +} + int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args) { struct device *dev; + struct rcu_device *rcu_dev; + int retval = -ENODEV;
if (bdi->dev) /* The driver needs to use separate queues per device */ return 0;
- dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args); - if (IS_ERR(dev)) - return PTR_ERR(dev); + rcu_dev = kzalloc(sizeof(struct rcu_device), GFP_KERNEL); + if (!rcu_dev) + return -ENOMEM; + + /* initialize device */ + dev = &rcu_dev->dev; + device_initialize(dev); + dev->class = bdi_class; + dev->release = bdi_device_release; + dev->devt = MKDEV(0, 0); + dev_set_drvdata(dev, (void *)bdi); + retval = kobject_set_name_vargs(&dev->kobj, fmt, args); + if (retval) + goto error; + + retval = device_add(dev); + if (retval) + goto error;
cgwb_bdi_register(bdi); - bdi->dev = dev; + bdi->rcu_dev = rcu_dev;
bdi_debug_register(bdi, dev_name(dev)); set_bit(WB_registered, &bdi->wb.state); @@ -895,6 +919,10 @@ int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args)
trace_writeback_bdi_register(bdi); return 0; + +error: + kfree(rcu_dev); + return retval; } EXPORT_SYMBOL(bdi_register_va);
@@ -937,17 +965,28 @@ static void bdi_remove_from_list(struct backing_dev_info *bdi) synchronize_rcu_expedited(); }
+static void bdi_put_device_rcu(struct rcu_head *rcu) +{ + struct rcu_device *rcu_dev = container_of(rcu, struct rcu_device, + rcu_head); + put_device(&rcu_dev->dev); +} void bdi_unregister(struct backing_dev_info *bdi) { /* make sure nobody finds us on the bdi_list anymore */ + struct rcu_device *rcu_dev = bdi->rcu_dev; bdi_remove_from_list(bdi); wb_shutdown(&bdi->wb); cgwb_bdi_unregister(bdi);
if (bdi->dev) { bdi_debug_unregister(bdi); + get_device(bdi->dev); device_unregister(bdi->dev); - bdi->dev = NULL; + rcu_assign_pointer(bdi->dev, NULL); + + /* wait all rcu reader of bdi->dev before free dev */ + call_rcu(&rcu_dev->rcu_head, bdi_put_device_rcu); }
if (bdi->owner) {
From: Yufen Yu yuyufen@huawei.com
hulk inclusion category: bugfix bugzilla: 30109 CVE: NA ---------------------------
Signed-off-by: Yufen Yu yuyufen@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- include/linux/backing-dev-defs.h | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index 73fb6bc..f2bda82 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -201,10 +201,14 @@ struct backing_dev_info { #endif wait_queue_head_t wb_waitq;
+#ifndef __GENKSYMS__ union { struct rcu_device *rcu_dev; struct device *dev; }; +#else + struct device *dev; +#endif struct device *owner;
struct timer_list laptop_mode_wb_timer;
From: Shijie Luo luoshijie1@huawei.com
mainline inclusion from mainline-v5.6-rc2 commit af133ade9a40794a37104ecbcc2827c0ea373a3c category: bugfix bugzilla: 13690 CVE: CVE-2020-8992
-------------------------------------------------
When journal size is set too big by "mkfs.ext4 -J size=", or when we mount a crafted image to make journal inode->i_size too big, the loop, "while (i < num)", holds cpu too long. This could cause soft lockup.
[ 529.357541] Call trace: [ 529.357551] dump_backtrace+0x0/0x198 [ 529.357555] show_stack+0x24/0x30 [ 529.357562] dump_stack+0xa4/0xcc [ 529.357568] watchdog_timer_fn+0x300/0x3e8 [ 529.357574] __hrtimer_run_queues+0x114/0x358 [ 529.357576] hrtimer_interrupt+0x104/0x2d8 [ 529.357580] arch_timer_handler_virt+0x38/0x58 [ 529.357584] handle_percpu_devid_irq+0x90/0x248 [ 529.357588] generic_handle_irq+0x34/0x50 [ 529.357590] __handle_domain_irq+0x68/0xc0 [ 529.357593] gic_handle_irq+0x6c/0x150 [ 529.357595] el1_irq+0xb8/0x140 [ 529.357599] __ll_sc_atomic_add_return_acquire+0x14/0x20 [ 529.357668] ext4_map_blocks+0x64/0x5c0 [ext4] [ 529.357693] ext4_setup_system_zone+0x330/0x458 [ext4] [ 529.357717] ext4_fill_super+0x2170/0x2ba8 [ext4] [ 529.357722] mount_bdev+0x1a8/0x1e8 [ 529.357746] ext4_mount+0x44/0x58 [ext4] [ 529.357748] mount_fs+0x50/0x170 [ 529.357752] vfs_kern_mount.part.9+0x54/0x188 [ 529.357755] do_mount+0x5ac/0xd78 [ 529.357758] ksys_mount+0x9c/0x118 [ 529.357760] __arm64_sys_mount+0x28/0x38 [ 529.357764] el0_svc_common+0x78/0x130 [ 529.357766] el0_svc_handler+0x38/0x78 [ 529.357769] el0_svc+0x8/0xc [ 541.356516] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [mount:18674]
Link: https://lore.kernel.org/r/20200211011752.29242-1-luoshijie1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Shijie Luo luoshijie1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com Reviewed-by: zhangyi (F) yi.zhang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/block_validity.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/ext4/block_validity.c b/fs/ext4/block_validity.c index d4d4fdf..ff8e120 100644 --- a/fs/ext4/block_validity.c +++ b/fs/ext4/block_validity.c @@ -203,6 +203,7 @@ static int ext4_protect_reserved_inode(struct super_block *sb, return PTR_ERR(inode); num = (inode->i_size + sb->s_blocksize - 1) >> sb->s_blocksize_bits; while (i < num) { + cond_resched(); map.m_lblk = i; map.m_len = num - i; n = ext4_map_blocks(NULL, inode, &map, 0);
From: yu kuai yukuai3@huawei.com
hulk inclusion category: bugfix bugzilla: 30213 CVE: CVE-2019-19770
---------------------------
syzbot is reporting use after free bug in debugfs_remove[1].
This is because in request_queue, 'q->debugfs_dir' and 'q->blk_trace->dir' could be the same dir. And in __blk_release_queue(), blk_mq_debugfs_unregister() will remove everything inside the dir.
With futher investigation of the reporduce repro, the problem can be reporduced by following procedure:
1. LOOP_CTL_ADD, create a request_queue q1, blk_mq_debugfs_register() will create the dir. 2. LOOP_CTL_REMOVE, blk_release_queue() will add q1 to release queue. 3. LOOP_CTL_ADD, create another request_queue q2,blk_mq_debugfs_register() will fail because the dir aready exist. 4. BLKTRACESETUP, create two files(msg and dropped) inside the dir. 5. call __blk_release_queue() for q1, debugfs_remove_recursive() will delete the files created in step 4. 6. LOOP_CTL_REMOVE, blk_release_queue() will add q2 to release queue. And when __blk_release_queue() is called for q2, blk_trace_shutdown() will try to release the two files created in step 4, wich are aready released in step 5.
thread1 |kworker |thread2 ----------------------- | ------------------------ | -------------------- loop_control_ioctl | | loop_add | | blk_mq_debugfs_register| | debugfs_create_dir | | loop_control_ioctl | | loop_remove | | blk_release_queue | | schedule_work | | | |loop_control_ioctl | | loop_add | | ... | |blk_trace_ioctl | | __blk_trace_setup | | debugfs_create_file |__blk_release_queue | | blk_mq_debugfs_unregister| | debugfs_remove_recursive| | |loop_control_ioctl | | loop_remove | | ... |__blk_release_queue | | blk_trace_shutdown | | debugfs_remove |
commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") pushed the final release of request_queue to a workqueue, so, when loop_add() is called again(step 3), __blk_release_queue() might not been called yet, which causes the problem.
Fix the problem by renaming 'q->debugfs_dir' or 'q->blk_trace->dir' in blk_unregister_queue() if they exist.
[1] https://syzkaller.appspot.com/bug?extid=903b72a010ad6b7a40f2 References: CVE-2019-19770 Fixes: commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") Reported-by: syzbot syz...@syzkaller.appspotmail.com Signed-off-by: yu kuai yukuai3@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- block/blk-sysfs.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 8286640..d249d7f 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -11,6 +11,7 @@ #include <linux/blktrace_api.h> #include <linux/blk-mq.h> #include <linux/blk-cgroup.h> +#include <linux/debugfs.h>
#include "blk.h" #include "blk-mq.h" @@ -958,6 +959,48 @@ int blk_register_queue(struct gendisk *disk) } EXPORT_SYMBOL_GPL(blk_register_queue);
+#ifdef CONFIG_DEBUG_FS +/* + * blk_prepare_release_queue - rename q->debugfs_dir and q->blk_trace->dir + * @q: request_queue of which the dir to be renamed belong to. + * + * Because the final release of request_queue is in a workqueue, the + * cleanup might not been finished yet while the same device start to + * create. It's not correct if q->debugfs_dir still exist while trying + * to create a new one. + */ +static void blk_prepare_release_queue(struct request_queue *q) +{ + struct dentry *new = NULL; + struct dentry **old = NULL; + char name[DNAME_INLINE_LEN]; + int i = 0; + +#ifdef CONFIG_BLK_DEBUG_FS + if (!IS_ERR_OR_NULL(q->debugfs_dir)) + old = &q->debugfs_dir; +#endif +#ifdef CONFIG_BLK_DEV_IO_TRACE + /* q->debugfs_dir and q->blk_trace->dir can't both exist */ + if (q->blk_trace && !IS_ERR_OR_NULL(q->blk_trace->dir)) + old = &q->blk_trace->dir; +#endif + if (old == NULL) + return; +#define MAX_ATTEMPT 1024 + while (new == NULL && i < MAX_ATTEMPT) { + sprintf(name, "ready_to_remove_%s_%d", + kobject_name(q->kobj.parent), i++); + new = debugfs_rename(blk_debugfs_root, *old, + blk_debugfs_root, name); + } + if (new) + *old = new; +} +#else +#define blk_prepare_release_queue(q) do { } while (0) +#endif + /** * blk_unregister_queue - counterpart of blk_register_queue() * @disk: Disk of which the request queue should be unregistered from sysfs. @@ -982,6 +1025,7 @@ void blk_unregister_queue(struct gendisk *disk) * concurrent elv_iosched_store() calls. */ mutex_lock(&q->sysfs_lock); + blk_prepare_release_queue(q);
blk_queue_flag_clear(QUEUE_FLAG_REGISTERED, q);
From: Amir Goldstein amir73il@gmail.com
mainline inclusion from mainline-5.3-rc1 commit 823e545c027795997f29ec5c255aff605cf39e85 category: bugfix bugzilla: 24454 CVE: MA
---------------------------
Move simple_unlink()+d_delete() from __debugfs_remove_file() into caller __debugfs_remove() and rename helper for post remove file to __debugfs_file_removed().
This will simplify adding fsnotify_unlink() hook.
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: yu kuai yukuai3@huawei.com Reviewed-by: zhangyi (F) yi.zhang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/debugfs/inode.c | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-)
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c index e5126fa..d68477b 100644 --- a/fs/debugfs/inode.c +++ b/fs/debugfs/inode.c @@ -620,13 +620,10 @@ struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent, } EXPORT_SYMBOL_GPL(debugfs_create_symlink);
-static void __debugfs_remove_file(struct dentry *dentry, struct dentry *parent) +static void __debugfs_file_removed(struct dentry *dentry) { struct debugfs_fsdata *fsd;
- simple_unlink(d_inode(parent), dentry); - d_delete(dentry); - /* * Paired with the closing smp_mb() implied by a successful * cmpxchg() in debugfs_file_get(): either @@ -647,16 +644,15 @@ static int __debugfs_remove(struct dentry *dentry, struct dentry *parent)
if (simple_positive(dentry)) { dget(dentry); - if (!d_is_reg(dentry)) { - if (d_is_dir(dentry)) - ret = simple_rmdir(d_inode(parent), dentry); - else - simple_unlink(d_inode(parent), dentry); - if (!ret) - d_delete(dentry); + if (d_is_dir(dentry)) { + ret = simple_rmdir(d_inode(parent), dentry); } else { - __debugfs_remove_file(dentry, parent); + simple_unlink(d_inode(parent), dentry); } + if (!ret) + d_delete(dentry); + if (d_is_reg(dentry)) + __debugfs_file_removed(dentry); dput(dentry); } return ret;
From: yu kuai yukuai3@huawei.com
mainline inclusion from mainline-5.6-rc1 commit a3d1e7eb5abe3aa1095bc75d1a6760d3809bd672 category: bugfix bugzilla: 24454 CVE: NA
---------------------------
two requirements: no file creations in IS_DEADDIR and no cross-directory renames whatsoever.
Signed-off-by: Al Viro viro@zeniv.linux.org.uk
Conflicts: fs/debugfs/inode.c fs/libfs.c fs/tracefs/inode.c include/linux/debugfs.h include/linux/fs.h include/linux/tracefs.h kernel/trace/trace.c functional changes: replace current_time() with current_fs_time() remove call to fsnotify_rmdir() and fsnotify_unlink()
Signed-off-by: yu kuai yukuai3@huawei.com Reviewed-by: zhangyi (F) yi.zhang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/debugfs/inode.c | 118 ++++++-------------------------------------- fs/libfs.c | 65 ++++++++++++++++++++++++ fs/tracefs/inode.c | 111 +++++------------------------------------ include/linux/debugfs.h | 2 +- include/linux/fs.h | 2 + include/linux/tracefs.h | 1 - kernel/trace/trace.c | 4 +- kernel/trace/trace_events.c | 6 +-- kernel/trace/trace_hwlat.c | 2 +- 9 files changed, 99 insertions(+), 212 deletions(-)
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c index d68477b..11664fd 100644 --- a/fs/debugfs/inode.c +++ b/fs/debugfs/inode.c @@ -310,7 +310,10 @@ static struct dentry *start_creating(const char *name, struct dentry *parent) parent = debugfs_mount->mnt_root;
inode_lock(d_inode(parent)); - dentry = lookup_one_len(name, parent, strlen(name)); + if (unlikely(IS_DEADDIR(d_inode(parent)))) + dentry = ERR_PTR(-ENOENT); + else + dentry = lookup_one_len(name, parent, strlen(name)); if (!IS_ERR(dentry) && d_really_is_positive(dentry)) { dput(dentry); dentry = ERR_PTR(-EEXIST); @@ -638,59 +641,15 @@ static void __debugfs_file_removed(struct dentry *dentry) wait_for_completion(&fsd->active_users_drained); }
-static int __debugfs_remove(struct dentry *dentry, struct dentry *parent) -{ - int ret = 0; - - if (simple_positive(dentry)) { - dget(dentry); - if (d_is_dir(dentry)) { - ret = simple_rmdir(d_inode(parent), dentry); - } else { - simple_unlink(d_inode(parent), dentry); - } - if (!ret) - d_delete(dentry); - if (d_is_reg(dentry)) - __debugfs_file_removed(dentry); - dput(dentry); - } - return ret; -} - -/** - * debugfs_remove - removes a file or directory from the debugfs filesystem - * @dentry: a pointer to a the dentry of the file or directory to be - * removed. If this parameter is NULL or an error value, nothing - * will be done. - * - * This function removes a file or directory in debugfs that was previously - * created with a call to another debugfs function (like - * debugfs_create_file() or variants thereof.) - * - * This function is required to be called in order for the file to be - * removed, no automatic cleanup of files will happen when a module is - * removed, you are responsible here. - */ -void debugfs_remove(struct dentry *dentry) +static void remove_one(struct dentry *victim) { - struct dentry *parent; - int ret; - - if (IS_ERR_OR_NULL(dentry)) - return; - - parent = dentry->d_parent; - inode_lock(d_inode(parent)); - ret = __debugfs_remove(dentry, parent); - inode_unlock(d_inode(parent)); - if (!ret) - simple_release_fs(&debugfs_mount, &debugfs_mount_count); + if (d_is_reg(victim)) + __debugfs_file_removed(victim); + simple_release_fs(&debugfs_mount, &debugfs_mount_count); } -EXPORT_SYMBOL_GPL(debugfs_remove);
/** - * debugfs_remove_recursive - recursively removes a directory + * debugfs_remove - recursively removes a directory * @dentry: a pointer to a the dentry of the directory to be removed. If this * parameter is NULL or an error value, nothing will be done. * @@ -702,65 +661,16 @@ void debugfs_remove(struct dentry *dentry) * removed, no automatic cleanup of files will happen when a module is * removed, you are responsible here. */ -void debugfs_remove_recursive(struct dentry *dentry) +void debugfs_remove(struct dentry *dentry) { - struct dentry *child, *parent; - if (IS_ERR_OR_NULL(dentry)) return;
- parent = dentry; - down: - inode_lock(d_inode(parent)); - loop: - /* - * The parent->d_subdirs is protected by the d_lock. Outside that - * lock, the child can be unlinked and set to be freed which can - * use the d_u.d_child as the rcu head and corrupt this list. - */ - spin_lock(&parent->d_lock); - list_for_each_entry(child, &parent->d_subdirs, d_child) { - if (!simple_positive(child)) - continue; - - /* perhaps simple_empty(child) makes more sense */ - if (!list_empty(&child->d_subdirs)) { - spin_unlock(&parent->d_lock); - inode_unlock(d_inode(parent)); - parent = child; - goto down; - } - - spin_unlock(&parent->d_lock); - - if (!__debugfs_remove(child, parent)) - simple_release_fs(&debugfs_mount, &debugfs_mount_count); - - /* - * The parent->d_lock protects agaist child from unlinking - * from d_subdirs. When releasing the parent->d_lock we can - * no longer trust that the next pointer is valid. - * Restart the loop. We'll skip this one with the - * simple_positive() check. - */ - goto loop; - } - spin_unlock(&parent->d_lock); - - inode_unlock(d_inode(parent)); - child = parent; - parent = parent->d_parent; - inode_lock(d_inode(parent)); - - if (child != dentry) - /* go up */ - goto loop; - - if (!__debugfs_remove(child, parent)) - simple_release_fs(&debugfs_mount, &debugfs_mount_count); - inode_unlock(d_inode(parent)); + simple_pin_fs(&debug_fs_type, &debugfs_mount, &debugfs_mount_count); + simple_recursive_removal(dentry, remove_one); + simple_release_fs(&debugfs_mount, &debugfs_mount_count); } -EXPORT_SYMBOL_GPL(debugfs_remove_recursive); +EXPORT_SYMBOL_GPL(debugfs_remove);
/** * debugfs_rename - rename a file/directory in the debugfs filesystem diff --git a/fs/libfs.c b/fs/libfs.c index 9840beb..f7a0cf4 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -234,6 +234,71 @@ ssize_t generic_read_dir(struct file *filp, char __user *buf, size_t siz, loff_t }; EXPORT_SYMBOL(simple_dir_inode_operations);
+static struct dentry *find_next_child(struct dentry *parent, struct dentry *prev) +{ + struct dentry *child = NULL; + struct list_head *p = prev ? &prev->d_child : &parent->d_subdirs; + + spin_lock(&parent->d_lock); + while ((p = p->next) != &parent->d_subdirs) { + struct dentry *d = container_of(p, struct dentry, d_child); + if (simple_positive(d)) { + spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED); + if (simple_positive(d)) + child = dget_dlock(d); + spin_unlock(&d->d_lock); + if (likely(child)) + break; + } + } + spin_unlock(&parent->d_lock); + dput(prev); + return child; +} + +void simple_recursive_removal(struct dentry *dentry, + void (*callback)(struct dentry *)) +{ + struct dentry *this = dget(dentry); + while (true) { + struct dentry *victim = NULL, *child; + struct inode *inode = this->d_inode; + + inode_lock(inode); + if (d_is_dir(this)) + inode->i_flags |= S_DEAD; + while ((child = find_next_child(this, victim)) == NULL) { + // kill and ascend + // update metadata while it's still locked + inode->i_ctime = current_time(inode); + clear_nlink(inode); + inode_unlock(inode); + victim = this; + this = this->d_parent; + inode = this->d_inode; + inode_lock(inode); + if (simple_positive(victim)) { + d_invalidate(victim); // avoid lost mounts + if (callback) + callback(victim); + dput(victim); // unpin it + } + if (victim == dentry) { + inode->i_ctime = inode->i_mtime = + current_time(inode); + if (d_is_dir(dentry)) + drop_nlink(inode); + inode_unlock(inode); + dput(dentry); + return; + } + } + inode_unlock(inode); + this = child; + } +} +EXPORT_SYMBOL(simple_recursive_removal); + static const struct super_operations simple_super_operations = { .statfs = simple_statfs, }; diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 7098c49..7cfb2c7 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -333,7 +333,10 @@ static struct dentry *start_creating(const char *name, struct dentry *parent) parent = tracefs_mount->mnt_root;
inode_lock(parent->d_inode); - dentry = lookup_one_len(name, parent, strlen(name)); + if (unlikely(IS_DEADDIR(parent->d_inode))) + dentry = ERR_PTR(-ENOENT); + else + dentry = lookup_one_len(name, parent, strlen(name)); if (!IS_ERR(dentry) && dentry->d_inode) { dput(dentry); dentry = ERR_PTR(-EEXIST); @@ -499,119 +502,27 @@ __init struct dentry *tracefs_create_instance_dir(const char *name, return dentry; }
-static int __tracefs_remove(struct dentry *dentry, struct dentry *parent) +static void remove_one(struct dentry *victim) { - int ret = 0; - - if (simple_positive(dentry)) { - if (dentry->d_inode) { - dget(dentry); - switch (dentry->d_inode->i_mode & S_IFMT) { - case S_IFDIR: - ret = simple_rmdir(parent->d_inode, dentry); - break; - default: - simple_unlink(parent->d_inode, dentry); - break; - } - if (!ret) - d_delete(dentry); - dput(dentry); - } - } - return ret; -} - -/** - * tracefs_remove - removes a file or directory from the tracefs filesystem - * @dentry: a pointer to a the dentry of the file or directory to be - * removed. - * - * This function removes a file or directory in tracefs that was previously - * created with a call to another tracefs function (like - * tracefs_create_file() or variants thereof.) - */ -void tracefs_remove(struct dentry *dentry) -{ - struct dentry *parent; - int ret; - - if (IS_ERR_OR_NULL(dentry)) - return; - - parent = dentry->d_parent; - inode_lock(parent->d_inode); - ret = __tracefs_remove(dentry, parent); - inode_unlock(parent->d_inode); - if (!ret) - simple_release_fs(&tracefs_mount, &tracefs_mount_count); + simple_release_fs(&tracefs_mount, &tracefs_mount_count); }
/** - * tracefs_remove_recursive - recursively removes a directory + * tracefs_remove - recursively removes a directory * @dentry: a pointer to a the dentry of the directory to be removed. * * This function recursively removes a directory tree in tracefs that * was previously created with a call to another tracefs function * (like tracefs_create_file() or variants thereof.) */ -void tracefs_remove_recursive(struct dentry *dentry) +void tracefs_remove(struct dentry *dentry) { - struct dentry *child, *parent; - if (IS_ERR_OR_NULL(dentry)) return;
- parent = dentry; - down: - inode_lock(parent->d_inode); - loop: - /* - * The parent->d_subdirs is protected by the d_lock. Outside that - * lock, the child can be unlinked and set to be freed which can - * use the d_u.d_child as the rcu head and corrupt this list. - */ - spin_lock(&parent->d_lock); - list_for_each_entry(child, &parent->d_subdirs, d_child) { - if (!simple_positive(child)) - continue; - - /* perhaps simple_empty(child) makes more sense */ - if (!list_empty(&child->d_subdirs)) { - spin_unlock(&parent->d_lock); - inode_unlock(parent->d_inode); - parent = child; - goto down; - } - - spin_unlock(&parent->d_lock); - - if (!__tracefs_remove(child, parent)) - simple_release_fs(&tracefs_mount, &tracefs_mount_count); - - /* - * The parent->d_lock protects agaist child from unlinking - * from d_subdirs. When releasing the parent->d_lock we can - * no longer trust that the next pointer is valid. - * Restart the loop. We'll skip this one with the - * simple_positive() check. - */ - goto loop; - } - spin_unlock(&parent->d_lock); - - inode_unlock(parent->d_inode); - child = parent; - parent = parent->d_parent; - inode_lock(parent->d_inode); - - if (child != dentry) - /* go up */ - goto loop; - - if (!__tracefs_remove(child, parent)) - simple_release_fs(&tracefs_mount, &tracefs_mount_count); - inode_unlock(parent->d_inode); + simple_pin_fs(&trace_fs_type, &tracefs_mount, &tracefs_mount_count); + simple_recursive_removal(dentry, remove_one); + simple_release_fs(&tracefs_mount, &tracefs_mount_count); }
/** diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h index 3b0ba54..5fdcfef 100644 --- a/include/linux/debugfs.h +++ b/include/linux/debugfs.h @@ -82,7 +82,7 @@ struct dentry *debugfs_create_automount(const char *name, void *data);
void debugfs_remove(struct dentry *dentry); -void debugfs_remove_recursive(struct dentry *dentry); +#define debugfs_remove_recursive debugfs_remove
const struct file_operations *debugfs_real_fops(const struct file *filp);
diff --git a/include/linux/fs.h b/include/linux/fs.h index 0d9a2e2..daf6aa5 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3231,6 +3231,8 @@ extern void iterate_supers_type(struct file_system_type *, extern int simple_rmdir(struct inode *, struct dentry *); extern int simple_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, unsigned int); +extern void simple_recursive_removal(struct dentry *, + void (*callback)(struct dentry *)); extern int noop_fsync(struct file *, loff_t, loff_t, int); extern int noop_set_page_dirty(struct page *page); extern void noop_invalidatepage(struct page *page, unsigned int offset, diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 5b727a1..955a5b0 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -32,7 +32,6 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *tracefs_create_dir(const char *name, struct dentry *parent);
void tracefs_remove(struct dentry *dentry); -void tracefs_remove_recursive(struct dentry *dentry);
struct dentry *tracefs_create_instance_dir(const char *name, struct dentry *parent, int (*mkdir)(const char *name), diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 32602a1..e32aae0 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -7846,7 +7846,7 @@ static int instance_mkdir(const char *name)
ret = event_trace_add_tracer(tr->dir, tr); if (ret) { - tracefs_remove_recursive(tr->dir); + tracefs_remove(tr->dir); goto out_free_tr; }
@@ -7914,7 +7914,7 @@ static int instance_rmdir(const char *name) event_trace_del_tracer(tr); ftrace_clear_pids(tr); ftrace_destroy_function_files(tr); - tracefs_remove_recursive(tr->dir); + tracefs_remove(tr->dir); free_trace_buffers(tr);
for (i = 0; i < tr->nr_topts; i++) { diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index ec340e1..e90135e 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -704,7 +704,7 @@ static void remove_subsystem(struct trace_subsystem_dir *dir) return;
if (!--dir->nr_events) { - tracefs_remove_recursive(dir->entry); + tracefs_remove(dir->entry); list_del(&dir->list); __put_system_dir(dir); } @@ -723,7 +723,7 @@ static void remove_event_file_dir(struct trace_event_file *file) } spin_unlock(&dir->d_lock);
- tracefs_remove_recursive(dir); + tracefs_remove(dir); }
list_del(&file->list); @@ -3061,7 +3061,7 @@ int event_trace_del_tracer(struct trace_array *tr)
down_write(&trace_event_sem); __trace_remove_event_dirs(tr); - tracefs_remove_recursive(tr->event_dir); + tracefs_remove(tr->event_dir); up_write(&trace_event_sem);
tr->event_dir = NULL; diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c index 8030e24d..c6cd54c 100644 --- a/kernel/trace/trace_hwlat.c +++ b/kernel/trace/trace_hwlat.c @@ -553,7 +553,7 @@ static int init_tracefs(void) return 0;
err: - tracefs_remove_recursive(top_dir); + tracefs_remove(top_dir); return -ENOMEM; }
From: yu kuai yukuai3@huawei.com
hulk inclusion category: bugfix bugzilla: 24454 CVE: NA
---------------------------
debugfs_remove_recursive was changed from a function to an alias to debugfs_remove in patch "simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems". Change it back to a function.
Signed-off-by: yu kuai yukuai3@huawei.com Reviewed-by: zhangyi (F) yi.zhang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/debugfs/inode.c | 6 ++++++ include/linux/debugfs.h | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c index 11664fd..59e5e49 100644 --- a/fs/debugfs/inode.c +++ b/fs/debugfs/inode.c @@ -672,6 +672,12 @@ void debugfs_remove(struct dentry *dentry) } EXPORT_SYMBOL_GPL(debugfs_remove);
+void debugfs_remove_recursive(struct dentry *dentry) +{ + debugfs_remove(dentry); +} +EXPORT_SYMBOL_GPL(debugfs_remove_recursive); + /** * debugfs_rename - rename a file/directory in the debugfs filesystem * @old_dir: a pointer to the parent dentry for the renamed object. This diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h index 5fdcfef..3b0ba54 100644 --- a/include/linux/debugfs.h +++ b/include/linux/debugfs.h @@ -82,7 +82,7 @@ struct dentry *debugfs_create_automount(const char *name, void *data);
void debugfs_remove(struct dentry *dentry); -#define debugfs_remove_recursive debugfs_remove +void debugfs_remove_recursive(struct dentry *dentry);
const struct file_operations *debugfs_real_fops(const struct file *filp);