Kernel

kernel@openeuler.org

  • 64 participants
  • 19409 discussions
[PATCH openEuler-22.03-LTS-SP1] ext4: avoid deadlock in fs reclaim with page writeback
by Baokun Li 11 Jun '24

From: Jan Kara <jack(a)suse.cz>

mainline inclusion
from mainline-v6.4-rc2
commit 00d873c17e29cc32d90ca852b82685f1673acaa5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9SYGK
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Ext4 has a filesystem wide lock protecting ext4_writepages() calls to
avoid races with switching of journalled data flag or inode format.
This lock can however cause a deadlock like:

CPU0                                CPU1

ext4_writepages()
  percpu_down_read(sbi->s_writepages_rwsem);
                                    ext4_change_inode_journal_flag()
                                      percpu_down_write(sbi->s_writepages_rwsem);
                                        - blocks, all readers block from now on
  ext4_do_writepages()
    ext4_init_io_end()
      kmem_cache_zalloc(io_end_cachep, GFP_KERNEL)
        fs_reclaim frees dentry...
          dentry_unlink_inode()
            iput() - last ref =>
              iput_final()
                - inode dirty =>
                write_inode_now()...
                  ext4_writepages() tries to acquire sbi->s_writepages_rwsem
                    and blocks forever

Make sure we cannot recurse into filesystem reclaim from writeback code
to avoid the deadlock.

Reported-by: syzbot+6898da502aef574c5f8a(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/0000000000004c66b405fa108e27@google.com
Fixes: c8585c6fcaf2 ("ext4: fix races between changing inode journal mode and ext4_writepages")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230504124723.20205-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Conflicts:
	fs/ext4/ext4.h
	fs/ext4/super.c
	fs/ext4/inode.c
[Because we have merged in 583b2fb551d0 ("ext4: fix race between
writepages and remount") and we don't have ext4_do_writepages() yet and
i_mmap_sem hasn't switched to invalidate_lock yet.]
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
---
 fs/ext4/ext4.h    | 25 +++++++++++++++++++++++++
 fs/ext4/inode.c   | 18 ++++++++++--------
 fs/ext4/migrate.c | 11 ++++++-----
 fs/ext4/super.c   | 11 ++++++-----
 4 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5d5ae6f44510..3cf1fe773f4f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -37,6 +37,7 @@
 #include <linux/falloc.h>
 #include <linux/percpu-rwsem.h>
 #include <linux/fiemap.h>
+#include <linux/sched/mm.h>
 #ifdef __KERNEL__
 #include <linux/compat.h>
 #endif
@@ -1689,6 +1690,30 @@ static inline struct ext4_inode_info *EXT4_I(struct inode *inode)
 	return container_of(inode, struct ext4_inode_info, vfs_inode);
 }
 
+static inline int ext4_writepages_down_read(struct super_block *sb)
+{
+	percpu_down_read(&EXT4_SB(sb)->s_writepages_rwsem);
+	return memalloc_nofs_save();
+}
+
+static inline void ext4_writepages_up_read(struct super_block *sb, int ctx)
+{
+	memalloc_nofs_restore(ctx);
+	percpu_up_read(&EXT4_SB(sb)->s_writepages_rwsem);
+}
+
+static inline int ext4_writepages_down_write(struct super_block *sb)
+{
+	percpu_down_write(&EXT4_SB(sb)->s_writepages_rwsem);
+	return memalloc_nofs_save();
+}
+
+static inline void ext4_writepages_up_write(struct super_block *sb, int ctx)
+{
+	memalloc_nofs_restore(ctx);
+	percpu_up_write(&EXT4_SB(sb)->s_writepages_rwsem);
+}
+
 static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
 {
 	return ino == EXT4_ROOT_INO ||
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f26d9b88d48d..9c438215cf56 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2705,13 +2705,14 @@ static int ext4_writepages(struct address_space *mapping,
 	struct blk_plug plug;
 	bool give_up_on_write = false;
 	unsigned long retry_warn_ddl = 0;
+	int alloc_ctx;
 #define RETRY_WARN_TIMEOUT (30 * HZ)
 
 	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
 		return -EIO;
 
-	percpu_down_read(&sbi->s_writepages_rwsem);
+	alloc_ctx = ext4_writepages_down_read(inode->i_sb);
 	trace_ext4_writepages(inode, wbc);
 
 	/*
@@ -2932,7 +2933,7 @@ static int ext4_writepages(struct address_space *mapping,
 out_writepages:
 	trace_ext4_writepages_result(inode, wbc, ret,
 				     nr_to_write - wbc->nr_to_write);
-	percpu_up_read(&sbi->s_writepages_rwsem);
+	ext4_writepages_up_read(inode->i_sb, alloc_ctx);
 	return ret;
 }
 
@@ -2943,17 +2944,18 @@ static int ext4_dax_writepages(struct address_space *mapping,
 	long nr_to_write = wbc->nr_to_write;
 	struct inode *inode = mapping->host;
 	struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
+	int alloc_ctx;
 
 	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
 		return -EIO;
 
-	percpu_down_read(&sbi->s_writepages_rwsem);
+	alloc_ctx = ext4_writepages_down_read(inode->i_sb);
 	trace_ext4_writepages(inode, wbc);
 
 	ret = dax_writeback_mapping_range(mapping, sbi->s_daxdev, wbc);
 	trace_ext4_writepages_result(inode, wbc, ret,
 				     nr_to_write - wbc->nr_to_write);
-	percpu_up_read(&sbi->s_writepages_rwsem);
+	ext4_writepages_up_read(inode->i_sb, alloc_ctx);
 	return ret;
 }
 
@@ -6065,7 +6067,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 	journal_t *journal;
 	handle_t *handle;
 	int err;
-	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	int alloc_ctx;
 
 	/*
 	 * We have to be very careful here: changing a data block's
@@ -6103,7 +6105,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 		}
 	}
 
-	percpu_down_write(&sbi->s_writepages_rwsem);
+	alloc_ctx = ext4_writepages_down_write(inode->i_sb);
 	jbd2_journal_lock_updates(journal);
 
 	/*
@@ -6120,7 +6122,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 		err = jbd2_journal_flush(journal);
 		if (err < 0) {
 			jbd2_journal_unlock_updates(journal);
-			percpu_up_write(&sbi->s_writepages_rwsem);
+			ext4_writepages_up_write(inode->i_sb, alloc_ctx);
 			return err;
 		}
 		ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
@@ -6128,7 +6130,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 	ext4_set_aops(inode);
 
 	jbd2_journal_unlock_updates(journal);
-	percpu_up_write(&sbi->s_writepages_rwsem);
+	ext4_writepages_up_write(inode->i_sb, alloc_ctx);
 
 	if (val)
 		up_write(&EXT4_I(inode)->i_mmap_sem);
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 04320715d61f..274f7ab37fff 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -409,7 +409,6 @@ static int free_ext_block(handle_t *handle, struct inode *inode)
 
 int ext4_ext_migrate(struct inode *inode)
 {
-	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	handle_t *handle;
 	int retval = 0, i;
 	__le32 *i_data;
@@ -419,6 +418,7 @@ int ext4_ext_migrate(struct inode *inode)
 	unsigned long max_entries;
 	__u32 goal, tmp_csum_seed;
 	uid_t owner[2];
+	int alloc_ctx;
 
 	/*
 	 * If the filesystem does not support extents, or the inode
@@ -434,7 +434,7 @@ int ext4_ext_migrate(struct inode *inode)
 		 */
 		return retval;
 
-	percpu_down_write(&sbi->s_writepages_rwsem);
+	alloc_ctx = ext4_writepages_down_write(inode->i_sb);
 
 	/*
 	 * Worst case we can touch the allocation bitmaps and a block
@@ -586,7 +586,7 @@ int ext4_ext_migrate(struct inode *inode)
 	unlock_new_inode(tmp_inode);
 	iput(tmp_inode);
 out_unlock:
-	percpu_up_write(&sbi->s_writepages_rwsem);
+	ext4_writepages_up_write(inode->i_sb, alloc_ctx);
 	return retval;
 }
 
@@ -605,6 +605,7 @@ int ext4_ind_migrate(struct inode *inode)
 	ext4_fsblk_t blk;
 	handle_t *handle;
 	int ret, ret2 = 0;
+	int alloc_ctx;
 
 	if (!ext4_has_feature_extents(inode->i_sb) ||
 	    (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
@@ -621,7 +622,7 @@ int ext4_ind_migrate(struct inode *inode)
 	if (test_opt(inode->i_sb, DELALLOC))
 		ext4_alloc_da_blocks(inode);
 
-	percpu_down_write(&sbi->s_writepages_rwsem);
+	alloc_ctx = ext4_writepages_down_write(inode->i_sb);
 
 	handle = ext4_journal_start(inode, EXT4_HT_MIGRATE, 1);
 	if (IS_ERR(handle)) {
@@ -665,6 +666,6 @@ int ext4_ind_migrate(struct inode *inode)
 	ext4_journal_stop(handle);
 	up_write(&EXT4_I(inode)->i_data_sem);
 out_unlock:
-	percpu_up_write(&sbi->s_writepages_rwsem);
+	ext4_writepages_up_write(inode->i_sb, alloc_ctx);
 	return ret;
 }
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ac032f517a60..2bdb16df5c23 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5939,6 +5939,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 	ext4_group_t g;
 	unsigned int journal_ioprio = DEFAULT_JOURNAL_IOPRIO;
 	int err = 0;
+	int alloc_ctx;
 #ifdef CONFIG_QUOTA
 	int i, j;
 	char *to_free[EXT4_MAXQUOTAS];
@@ -5991,13 +5992,13 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 	 * here s_writepages_rwsem to avoid race between writepages ops and
 	 * remount.
 	 */
-	percpu_down_write(&sbi->s_writepages_rwsem);
+	alloc_ctx = ext4_writepages_down_write(sb);
 	if (!parse_options(data, sb, NULL, &journal_ioprio, 1)) {
 		err = -EINVAL;
-		percpu_up_write(&sbi->s_writepages_rwsem);
+		ext4_writepages_up_write(sb, alloc_ctx);
 		goto restore_opts;
 	}
-	percpu_up_write(&sbi->s_writepages_rwsem);
+	ext4_writepages_up_write(sb, alloc_ctx);
 
 	if ((old_opts.s_mount_opt & EXT4_MOUNT_JOURNAL_CHECKSUM) ^
 	    test_opt(sb, JOURNAL_CHECKSUM)) {
@@ -6215,7 +6216,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 	return 0;
 
 restore_opts:
-	percpu_down_write(&sbi->s_writepages_rwsem);
+	alloc_ctx = ext4_writepages_down_write(sb);
 	sb->s_flags = old_sb_flags;
 	sbi->s_mount_opt = old_opts.s_mount_opt;
 	sbi->s_mount_opt2 = old_opts.s_mount_opt2;
@@ -6224,7 +6225,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 	sbi->s_commit_interval = old_opts.s_commit_interval;
 	sbi->s_min_batch_time = old_opts.s_min_batch_time;
 	sbi->s_max_batch_time = old_opts.s_max_batch_time;
-	percpu_up_write(&sbi->s_writepages_rwsem);
+	ext4_writepages_up_write(sb, alloc_ctx);
 
 	if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
 		ext4_release_system_zone(sb);
--
2.31.1
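The core of the patch above is scoping: any allocation made while s_writepages_rwsem is held must not recurse into filesystem reclaim. As a rough illustration of that pattern only (a sketch, not openEuler code; the lock and helper names below are invented), the same "take lock, then forbid FS reclaim for the whole scope" pairing looks like this:

#include <linux/percpu-rwsem.h>
#include <linux/sched/mm.h>

static DEFINE_STATIC_PERCPU_RWSEM(example_writeback_rwsem);

/* Take the writeback lock and forbid FS reclaim until unlock. */
static unsigned int example_writeback_lock(void)
{
	percpu_down_read(&example_writeback_rwsem);
	/* Allocations in this scope now behave as if they were GFP_NOFS. */
	return memalloc_nofs_save();
}

static void example_writeback_unlock(unsigned int nofs_flags)
{
	memalloc_nofs_restore(nofs_flags);
	percpu_up_read(&example_writeback_rwsem);
}

The advantage over annotating individual call sites with GFP_NOFS is that every allocation reached from the locked region, including ones buried in library code such as kmem_cache_zalloc() above, inherits the restriction.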
[PATCH OLK-5.10] ext4: avoid deadlock in fs reclaim with page writeback
by Baokun Li 11 Jun '24

From: Jan Kara <jack(a)suse.cz>

mainline inclusion
from mainline-v6.4-rc2
commit 00d873c17e29cc32d90ca852b82685f1673acaa5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9SYGK
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Ext4 has a filesystem wide lock protecting ext4_writepages() calls to
avoid races with switching of journalled data flag or inode format.
This lock can however cause a deadlock like:

CPU0                                CPU1

ext4_writepages()
  percpu_down_read(sbi->s_writepages_rwsem);
                                    ext4_change_inode_journal_flag()
                                      percpu_down_write(sbi->s_writepages_rwsem);
                                        - blocks, all readers block from now on
  ext4_do_writepages()
    ext4_init_io_end()
      kmem_cache_zalloc(io_end_cachep, GFP_KERNEL)
        fs_reclaim frees dentry...
          dentry_unlink_inode()
            iput() - last ref =>
              iput_final()
                - inode dirty =>
                write_inode_now()...
                  ext4_writepages() tries to acquire sbi->s_writepages_rwsem
                    and blocks forever

Make sure we cannot recurse into filesystem reclaim from writeback code
to avoid the deadlock.

Reported-by: syzbot+6898da502aef574c5f8a(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/0000000000004c66b405fa108e27@google.com
Fixes: c8585c6fcaf2 ("ext4: fix races between changing inode journal mode and ext4_writepages")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230504124723.20205-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Conflicts:
	fs/ext4/ext4.h
	fs/ext4/super.c
	fs/ext4/inode.c
[Because we have merged in b66c23ec7fcf ("ext4: fix race between
writepages and remount") and we don't have ext4_do_writepages() yet and
i_mmap_sem hasn't switched to invalidate_lock yet.]
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
---
 fs/ext4/ext4.h    | 25 +++++++++++++++++++++++++
 fs/ext4/inode.c   | 18 ++++++++++--------
 fs/ext4/migrate.c | 11 ++++++-----
 fs/ext4/super.c   | 12 +++++++-----
 4 files changed, 48 insertions(+), 18 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b701f7ac7f66..a9530484147d 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -37,6 +37,7 @@
 #include <linux/falloc.h>
 #include <linux/percpu-rwsem.h>
 #include <linux/fiemap.h>
+#include <linux/sched/mm.h>
 #ifdef __KERNEL__
 #include <linux/compat.h>
 #endif
@@ -1701,6 +1702,30 @@ static inline struct ext4_inode_info *EXT4_I(struct inode *inode)
 	return container_of(inode, struct ext4_inode_info, vfs_inode);
 }
 
+static inline int ext4_writepages_down_read(struct super_block *sb)
+{
+	percpu_down_read(&EXT4_SB(sb)->s_writepages_rwsem);
+	return memalloc_nofs_save();
+}
+
+static inline void ext4_writepages_up_read(struct super_block *sb, int ctx)
+{
+	memalloc_nofs_restore(ctx);
+	percpu_up_read(&EXT4_SB(sb)->s_writepages_rwsem);
+}
+
+static inline int ext4_writepages_down_write(struct super_block *sb)
+{
+	percpu_down_write(&EXT4_SB(sb)->s_writepages_rwsem);
+	return memalloc_nofs_save();
+}
+
+static inline void ext4_writepages_up_write(struct super_block *sb, int ctx)
+{
+	memalloc_nofs_restore(ctx);
+	percpu_up_write(&EXT4_SB(sb)->s_writepages_rwsem);
+}
+
 static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
 {
 	return ino == EXT4_ROOT_INO ||
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 412af3c5fde4..0f8d23aa2838 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2709,13 +2709,14 @@ static int ext4_writepages(struct address_space *mapping,
 	struct blk_plug plug;
 	bool give_up_on_write = false;
 	unsigned long retry_warn_ddl = 0;
+	int alloc_ctx;
 #define RETRY_WARN_TIMEOUT (30 * HZ)
 
 	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
 		return -EIO;
 
-	percpu_down_read(&sbi->s_writepages_rwsem);
+	alloc_ctx = ext4_writepages_down_read(inode->i_sb);
 	trace_ext4_writepages(inode, wbc);
 
 	/*
@@ -2936,7 +2937,7 @@ static int ext4_writepages(struct address_space *mapping,
 out_writepages:
 	trace_ext4_writepages_result(inode, wbc, ret,
 				     nr_to_write - wbc->nr_to_write);
-	percpu_up_read(&sbi->s_writepages_rwsem);
+	ext4_writepages_up_read(inode->i_sb, alloc_ctx);
 	return ret;
 }
 
@@ -2947,17 +2948,18 @@ static int ext4_dax_writepages(struct address_space *mapping,
 	long nr_to_write = wbc->nr_to_write;
 	struct inode *inode = mapping->host;
 	struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
+	int alloc_ctx;
 
 	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
 		return -EIO;
 
-	percpu_down_read(&sbi->s_writepages_rwsem);
+	alloc_ctx = ext4_writepages_down_read(inode->i_sb);
 	trace_ext4_writepages(inode, wbc);
 
 	ret = dax_writeback_mapping_range(mapping, sbi->s_daxdev, wbc);
 	trace_ext4_writepages_result(inode, wbc, ret,
 				     nr_to_write - wbc->nr_to_write);
-	percpu_up_read(&sbi->s_writepages_rwsem);
+	ext4_writepages_up_read(inode->i_sb, alloc_ctx);
 	return ret;
 }
 
@@ -6107,7 +6109,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 	journal_t *journal;
 	handle_t *handle;
 	int err;
-	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	int alloc_ctx;
 
 	/*
 	 * We have to be very careful here: changing a data block's
@@ -6145,7 +6147,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 		}
 	}
 
-	percpu_down_write(&sbi->s_writepages_rwsem);
+	alloc_ctx = ext4_writepages_down_write(inode->i_sb);
 	jbd2_journal_lock_updates(journal);
 
 	/*
@@ -6162,7 +6164,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 		err = jbd2_journal_flush(journal);
 		if (err < 0) {
 			jbd2_journal_unlock_updates(journal);
-			percpu_up_write(&sbi->s_writepages_rwsem);
+			ext4_writepages_up_write(inode->i_sb, alloc_ctx);
 			return err;
 		}
 		ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
@@ -6170,7 +6172,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 	ext4_set_aops(inode);
 
 	jbd2_journal_unlock_updates(journal);
-	percpu_up_write(&sbi->s_writepages_rwsem);
+	ext4_writepages_up_write(inode->i_sb, alloc_ctx);
 
 	if (val)
 		up_write(&EXT4_I(inode)->i_mmap_sem);
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index b0ea646454ac..50e449ec62bf 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -409,7 +409,6 @@ static int free_ext_block(handle_t *handle, struct inode *inode)
 
 int ext4_ext_migrate(struct inode *inode)
 {
-	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	handle_t *handle;
 	int retval = 0, i;
 	__le32 *i_data;
@@ -419,6 +418,7 @@ int ext4_ext_migrate(struct inode *inode)
 	unsigned long max_entries;
 	__u32 goal, tmp_csum_seed;
 	uid_t owner[2];
+	int alloc_ctx;
 
 	/*
	 * If the filesystem does not support extents, or the inode
@@ -435,7 +435,7 @@ int ext4_ext_migrate(struct inode *inode)
 		 */
 		return retval;
 
-	percpu_down_write(&sbi->s_writepages_rwsem);
+	alloc_ctx = ext4_writepages_down_write(inode->i_sb);
 
 	/*
	 * Worst case we can touch the allocation bitmaps and a block
@@ -587,7 +587,7 @@ int ext4_ext_migrate(struct inode *inode)
 	unlock_new_inode(tmp_inode);
 	iput(tmp_inode);
out_unlock:
-	percpu_up_write(&sbi->s_writepages_rwsem);
+	ext4_writepages_up_write(inode->i_sb, alloc_ctx);
 	return retval;
 }
 
@@ -606,6 +606,7 @@ int ext4_ind_migrate(struct inode *inode)
 	ext4_fsblk_t blk;
 	handle_t *handle;
 	int ret, ret2 = 0;
+	int alloc_ctx;
 
 	if (!ext4_has_feature_extents(inode->i_sb) ||
 	    (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
@@ -622,7 +623,7 @@ int ext4_ind_migrate(struct inode *inode)
 	if (test_opt(inode->i_sb, DELALLOC))
 		ext4_alloc_da_blocks(inode);
 
-	percpu_down_write(&sbi->s_writepages_rwsem);
+	alloc_ctx = ext4_writepages_down_write(inode->i_sb);
 
 	handle = ext4_journal_start(inode, EXT4_HT_MIGRATE, 1);
 	if (IS_ERR(handle)) {
@@ -666,6 +667,6 @@ int ext4_ind_migrate(struct inode *inode)
 	ext4_journal_stop(handle);
 	up_write(&EXT4_I(inode)->i_data_sem);
out_unlock:
-	percpu_up_write(&sbi->s_writepages_rwsem);
+	ext4_writepages_up_write(inode->i_sb, alloc_ctx);
 	return ret;
 }
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 90f886184fc6..bfa592bfd298 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5961,6 +5961,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 	ext4_group_t g;
 	unsigned int journal_ioprio = DEFAULT_JOURNAL_IOPRIO;
 	int err = 0;
+	int alloc_ctx;
 #ifdef CONFIG_QUOTA
 	int enable_quota = 0;
 	int i, j;
@@ -6014,13 +6015,13 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 	 * here s_writepages_rwsem to avoid race between writepages ops and
 	 * remount.
 	 */
-	percpu_down_write(&sbi->s_writepages_rwsem);
+	alloc_ctx = ext4_writepages_down_write(sb);
 	if (!parse_options(data, sb, NULL, &journal_ioprio, 1)) {
 		err = -EINVAL;
-		percpu_up_write(&sbi->s_writepages_rwsem);
+		ext4_writepages_up_write(sb, alloc_ctx);
 		goto restore_opts;
 	}
-	percpu_up_write(&sbi->s_writepages_rwsem);
+	ext4_writepages_up_write(sb, alloc_ctx);
 
 	if ((old_opts.s_mount_opt & EXT4_MOUNT_JOURNAL_CHECKSUM) ^
 	    test_opt(sb, JOURNAL_CHECKSUM)) {
@@ -6247,7 +6248,8 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 	if ((sb->s_flags & SB_RDONLY) && !(old_sb_flags & SB_RDONLY) &&
 	    sb_any_quota_suspended(sb))
 		dquot_resume(sb, -1);
-	percpu_down_write(&sbi->s_writepages_rwsem);
+
+	alloc_ctx = ext4_writepages_down_write(sb);
 	sb->s_flags = old_sb_flags;
 	sbi->s_mount_opt = old_opts.s_mount_opt;
 	sbi->s_mount_opt2 = old_opts.s_mount_opt2;
@@ -6256,7 +6258,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 	sbi->s_commit_interval = old_opts.s_commit_interval;
 	sbi->s_min_batch_time = old_opts.s_min_batch_time;
 	sbi->s_max_batch_time = old_opts.s_max_batch_time;
-	percpu_up_write(&sbi->s_writepages_rwsem);
+	ext4_writepages_up_write(sb, alloc_ctx);
 
 	if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
 		ext4_release_system_zone(sb);
--
2.31.1
[PATCH OLK-5.10] phonet: fix rtm_phonet_notify() skb allocation
by Wang Wensheng 11 Jun '24

From: Eric Dumazet <edumazet(a)google.com>

mainline inclusion
from mainline-v6.9
commit d8cac8568618dcb8a51af3db1103e8d4cc4aeea7
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9U4L2
CVE: CVE-2024-36946
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

fill_route() stores three components in the skb:

- struct rtmsg
- RTA_DST (u8)
- RTA_OIF (u32)

Therefore, rtm_phonet_notify() should use

NLMSG_ALIGN(sizeof(struct rtmsg)) +
nla_total_size(1) +
nla_total_size(4)

Fixes: f062f41d0657 ("Phonet: routing table Netlink interface")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Acked-by: Rémi Denis-Courmont <courmisch(a)gmail.com>
Link: https://lore.kernel.org/r/20240502161700.1804476-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Wang Wensheng <wangwensheng4(a)huawei.com>
---
 net/phonet/pn_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/phonet/pn_netlink.c b/net/phonet/pn_netlink.c
index 59aebe296890..dd4c7e9a634f 100644
--- a/net/phonet/pn_netlink.c
+++ b/net/phonet/pn_netlink.c
@@ -193,7 +193,7 @@ void rtm_phonet_notify(int event, struct net_device *dev, u8 dst)
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
 
-	skb = nlmsg_new(NLMSG_ALIGN(sizeof(struct ifaddrmsg)) +
+	skb = nlmsg_new(NLMSG_ALIGN(sizeof(struct rtmsg)) +
 			nla_total_size(1) + nla_total_size(4), GFP_KERNEL);
 	if (skb == NULL)
 		goto errout;
--
2.17.1
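The sizing rule the patch restores can be read directly off the attributes that fill_route() emits: the payload is the aligned family header plus one nla_total_size() per attribute. A minimal sketch of that arithmetic (the helper name is made up for illustration):

#include <net/netlink.h>
#include <linux/rtnetlink.h>

/* Payload for one phonet route notification: rtmsg + RTA_DST (u8) + RTA_OIF (u32). */
static inline size_t example_route_nlmsg_payload(void)
{
	return NLMSG_ALIGN(sizeof(struct rtmsg)) +
	       nla_total_size(1) +	/* RTA_DST */
	       nla_total_size(4);	/* RTA_OIF */
}

Undersizing the buffer passed to nlmsg_new() (the old code sized it for struct ifaddrmsg, which is smaller than struct rtmsg) can make construction of the message fail, so the notification is never delivered.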
[PATCH openEuler-1.0-LTS] i40e: Fix NULL pointer dereference in i40e_dbg_dump_desc
by Li Huafei 11 Jun '24

From: Norbert Zulinski <norbertx.zulinski(a)intel.com>

stable inclusion
from stable-v5.10.85
commit e5b7fb2198abc50058f1a29c395b004f76ab1c83
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9S22G
CVE: CVE-2021-47501
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

commit 23ec111bf3549aae37140330c31a16abfc172421 upstream.

When trying to dump VFs VSI RX/TX descriptors using debugfs there was a
crash due to NULL pointer dereference in i40e_dbg_dump_desc. Added a
check to i40e_dbg_dump_desc that checks if VSI type is correct for
dumping RX/TX descriptors.

Fixes: 02e9c290814c ("i40e: debugfs interface")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch(a)intel.com>
Signed-off-by: Norbert Zulinski <norbertx.zulinski(a)intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski(a)intel.com>
Tested-by: Gurucharan G <gurucharanx.g(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Conflicts:
	drivers/net/ethernet/intel/i40e/i40e_debugfs.c
[ Conflicts caused by not merging 44ea803e2fa7 ("i40e: introduce new
  dump desc XDP command"). ]
Signed-off-by: Li Huafei <lihuafei1(a)huawei.com>
---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 56b911a5dd8be..3e6c6585012f9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -506,6 +506,14 @@ static void i40e_dbg_dump_desc(int cnt, int vsi_seid, int ring_id, int desc_n,
 		dev_info(&pf->pdev->dev, "vsi %d not found\n", vsi_seid);
 		return;
 	}
+	if (vsi->type != I40E_VSI_MAIN &&
+	    vsi->type != I40E_VSI_FDIR &&
+	    vsi->type != I40E_VSI_VMDQ2) {
+		dev_info(&pf->pdev->dev,
+			 "vsi %d type %d descriptor rings not available\n",
+			 vsi_seid, vsi->type);
+		return;
+	}
 	if (ring_id >= vsi->num_queue_pairs || ring_id < 0) {
 		dev_info(&pf->pdev->dev, "ring %d not found\n", ring_id);
 		return;
--
2.25.1
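The added guard is essentially a whitelist of the VSI types that actually own descriptor rings; VF VSIs do not, which is why dumping them dereferenced NULL. A hypothetical way such a check could be factored into a helper (a sketch only, assuming it sits in i40e_debugfs.c where struct i40e_vsi and the type constants are visible; this helper is not part of the patch):

/* Only these VSI types carry Tx/Rx descriptor rings worth dumping. */
static bool example_vsi_has_descriptor_rings(const struct i40e_vsi *vsi)
{
	return vsi->type == I40E_VSI_MAIN ||
	       vsi->type == I40E_VSI_FDIR ||
	       vsi->type == I40E_VSI_VMDQ2;
}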
[PATCH openEuler-22.03-LTS-SP1] hugetlbfs: fix hugetlbfs_statfs() locking
by Yifan Qiao 11 Jun '24

From: Mina Almasry <almasrymina(a)google.com>

mainline inclusion
from mainline-v5.19-rc1
commit 4b25f030ae69ba710eff587cabb4c57cb7e7a8a1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9SY02
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

----------------------------------------------------------------------

After commit db71ef79b59b ("hugetlb: make free_huge_page irq safe"), the
subpool lock should be locked with spin_lock_irq() and all call sites
was modified as such, except for the ones in hugetlbfs_statfs().

Link: https://lkml.kernel.org/r/20220429202207.3045-1-almasrymina@google.com
Fixes: db71ef79b59b ("hugetlb: make free_huge_page irq safe")
Signed-off-by: Mina Almasry <almasrymina(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Yifan Qiao <qiaoyifan4(a)huawei.com>
---
 fs/hugetlbfs/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 8274f096e89f..73a008dfe322 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1122,12 +1122,12 @@ static int hugetlbfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	if (sbinfo->spool) {
 		long free_pages;
 
-		spin_lock(&sbinfo->spool->lock);
+		spin_lock_irq(&sbinfo->spool->lock);
 		buf->f_blocks = sbinfo->spool->max_hpages;
 		free_pages = sbinfo->spool->max_hpages
 			- sbinfo->spool->used_hpages;
 		buf->f_bavail = buf->f_bfree = free_pages;
-		spin_unlock(&sbinfo->spool->lock);
+		spin_unlock_irq(&sbinfo->spool->lock);
 		buf->f_files = sbinfo->max_inodes;
 		buf->f_ffree = sbinfo->free_inodes;
 	}
--
2.39.2
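The rule behind this one-line change: since db71ef79b59b allows free_huge_page() to take the subpool lock from IRQ context, every process-context holder of that lock must disable interrupts too, otherwise the same CPU can interrupt the holder and deadlock on the lock it already owns. A generic sketch of the pattern, with made-up names rather than the hugetlbfs ones:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_pool_lock);	/* also taken from IRQ context elsewhere */
static long example_used_pages, example_max_pages;

/* Process context: must use the irq-disabling lock variant. */
static long example_report_free_pages(void)
{
	long free_pages;

	spin_lock_irq(&example_pool_lock);	/* not spin_lock() */
	free_pages = example_max_pages - example_used_pages;
	spin_unlock_irq(&example_pool_lock);

	return free_pages;
}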
[PATCH OLK-5.10] hugetlbfs: fix hugetlbfs_statfs() locking
by Yifan Qiao 11 Jun '24

From: Mina Almasry <almasrymina(a)google.com>

mainline inclusion
from mainline-v5.19-rc1
commit 4b25f030ae69ba710eff587cabb4c57cb7e7a8a1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9SY02
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

----------------------------------------------------------------------

After commit db71ef79b59b ("hugetlb: make free_huge_page irq safe"), the
subpool lock should be locked with spin_lock_irq() and all call sites
was modified as such, except for the ones in hugetlbfs_statfs().

Link: https://lkml.kernel.org/r/20220429202207.3045-1-almasrymina@google.com
Fixes: db71ef79b59b ("hugetlb: make free_huge_page irq safe")
Signed-off-by: Mina Almasry <almasrymina(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Yifan Qiao <qiaoyifan4(a)huawei.com>
---
 fs/hugetlbfs/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 508c20ef9061..1203178914cc 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1133,12 +1133,12 @@ static int hugetlbfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	if (sbinfo->spool) {
 		long free_pages;
 
-		spin_lock(&sbinfo->spool->lock);
+		spin_lock_irq(&sbinfo->spool->lock);
 		buf->f_blocks = sbinfo->spool->max_hpages;
 		free_pages = sbinfo->spool->max_hpages
 			- sbinfo->spool->used_hpages;
 		buf->f_bavail = buf->f_bfree = free_pages;
-		spin_unlock(&sbinfo->spool->lock);
+		spin_unlock_irq(&sbinfo->spool->lock);
 		buf->f_files = sbinfo->max_inodes;
 		buf->f_ffree = sbinfo->free_inodes;
 	}
--
2.39.2
[PATCH OLK-6.6] gpiolib: cdev: fix uninitialised kfifo
by Luo Gengkun 11 Jun '24

From: Kent Gibson <warthog618(a)gmail.com>

mainline inclusion
from mainline-v6.9
commit ee0166b637a5e376118e9659e5b4148080f1d27e
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9U7YV
CVE: CVE-2024-36898
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

If a line is requested with debounce, and that results in debouncing in
software, and the line is subsequently reconfigured to enable edge
detection then the allocation of the kfifo to contain edge events is
overlooked. This results in events being written to and read from an
uninitialised kfifo. Read events are returned to userspace.

Initialise the kfifo in the case where the software debounce is already
active.

Fixes: 65cff7046406 ("gpiolib: cdev: support setting debounce")
Signed-off-by: Kent Gibson <warthog618(a)gmail.com>
Link: https://lore.kernel.org/r/20240510065342.36191-1-warthog618@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
Conflicts:
	drivers/gpio/gpiolib-cdev.c
[fix context conflict]
Signed-off-by: Luo Gengkun <luogengkun2(a)huawei.com>
---
 drivers/gpio/gpiolib-cdev.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
index 84125e55de10..29b671c92621 100644
--- a/drivers/gpio/gpiolib-cdev.c
+++ b/drivers/gpio/gpiolib-cdev.c
@@ -1121,6 +1121,8 @@ static int edge_detector_update(struct line *line,
 				struct gpio_v2_line_config *lc,
 				unsigned int line_idx, u64 edflags)
 {
+	u64 eflags;
+	int ret;
 	u64 active_edflags = READ_ONCE(line->edflags);
 	unsigned int debounce_period_us =
 			gpio_v2_line_config_debounce_period(lc, line_idx);
@@ -1132,6 +1134,18 @@ static int edge_detector_update(struct line *line,
 	/* sw debounced and still will be...*/
 	if (debounce_period_us && READ_ONCE(line->sw_debounced)) {
 		WRITE_ONCE(line->desc->debounce_period_us, debounce_period_us);
+		/*
+		 * ensure event fifo is initialised if edge detection
+		 * is now enabled.
+		 */
+		eflags = edflags & GPIO_V2_LINE_EDGE_FLAGS;
+		if (eflags && !kfifo_initialized(&line->req->events)) {
+			ret = kfifo_alloc(&line->req->events,
+					  line->req->event_buffer_size,
+					  GFP_KERNEL);
+			if (ret)
+				return ret;
+		}
 		return 0;
 	}
--
2.34.1
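The essence of the fix is lazy allocation: the event kfifo is normally allocated when edge detection is first requested, so the reconfigure path has to notice the "software-debounced but fifo never allocated" case before events start flowing. A stripped-down sketch of that check with illustrative names rather than the gpiolib ones:

#include <linux/kfifo.h>
#include <linux/slab.h>
#include <linux/types.h>

struct example_edge_stream {
	DECLARE_KFIFO_PTR(events, u64);
	unsigned int event_buffer_size;
};

/* Allocate the event fifo on demand; a no-op if it already exists. */
static int example_enable_edge_events(struct example_edge_stream *s)
{
	if (kfifo_initialized(&s->events))
		return 0;

	return kfifo_alloc(&s->events, s->event_buffer_size, GFP_KERNEL);
}

Without the kfifo_initialized() check, enabling edges on an already-debounced line would push events into a fifo whose backing buffer was never allocated, which is exactly the uninitialised access the CVE describes.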
[PATCH OLK-5.10] xfs: fix interval filtering in multi-step fsmap queries
by Zizhi Wo 11 Jun '24

From: "Darrick J. Wong" <djwong(a)kernel.org> mainline inclusion from mainline-v6.5-rc1 commit 63ef7a35912dd743cabd65d5bb95891625c0dd46 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IA470G Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- I noticed a bug in ranged GETFSMAP queries: EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 8:80 [0..7]: static fs metadata 0 (0..7) 8 <snip> 9: 8:80 [192..223]: 137 0..31 0 (192..223) 32 That's not right -- we asked what block maps block 208, and we should've received a mapping for inode 137 offset 16. Instead, we get nothing. The root cause of this problem is a mis-interaction between the fsmap code and how btree ranged queries work. xfs_btree_query_range returns any btree record that overlaps with the query interval, even if the record starts before or ends after the interval. Similarly, GETFSMAP is supposed to return a recordset containing all records that overlap the range queried. However, it's possible that the recordset is larger than the buffer that the caller provided to convey mappings to userspace. In /that/ case, userspace is supposed to copy the last record returned to fmh_keys[0] and call GETFSMAP again. In this case, we do not want to return mappings that we have already supplied to the caller. The call to xfs_btree_query_range is the same, but now we ignore any records that start before fmh_keys[0]. Unfortunately, we didn't implement the filtering predicate correctly. The predicate should only be called when we're calling back for more records. Accomplish this by setting info->low.rm_blockcount to a nonzero value and ensuring that it is cleared as necessary. As a result, we no longer want to adjust dkeys[0] in the main setup function because that's confusing. This patch doesn't touch the logdev/rtbitmap backends because they have bigger problems that will be addressed by subsequent patches. Found via xfs/556 with parent pointers enabled. Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl") Signed-off-by: Darrick J. Wong <djwong(a)kernel.org> Reviewed-by: Dave Chinner <dchinner(a)redhat.com> Conflicts: fs/xfs/xfs_fsmap.c [Because there are many conflicting patches that need to be adapted, the in-place context adaptation is performed directly.] Signed-off-by: Zizhi Wo <wozizhi(a)huawei.com> --- fs/xfs/xfs_fsmap.c | 67 +++++++++++++++++++++++++++++++++------------- 1 file changed, 48 insertions(+), 19 deletions(-) diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 595450a99ae2..deaed94011ad 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -161,7 +161,14 @@ struct xfs_getfsmap_info { u64 missing_owner; /* owner of holes */ u32 dev; /* device id */ xfs_agnumber_t agno; /* AG number, if applicable */ - struct xfs_rmap_irec low; /* low rmap key */ + /* + * Low rmap key for the query. If low.rm_blockcount is nonzero, this + * is the second (or later) call to retrieve the recordset in pieces. + * xfs_getfsmap_rec_before_start will compare all records retrieved + * by the rmapbt query to filter out any records that start before + * the last record. + */ + struct xfs_rmap_irec low; struct xfs_rmap_irec high; /* high rmap key */ bool last; /* last extent? 
*/ }; @@ -237,6 +244,17 @@ xfs_getfsmap_format( xfs_fsmap_from_internal(rec, xfm); } +static inline bool +xfs_getfsmap_rec_before_start( + struct xfs_getfsmap_info *info, + const struct xfs_rmap_irec *rec, + xfs_daddr_t rec_daddr) +{ + if (info->low.rm_blockcount) + return xfs_rmap_compare(rec, &info->low) < 0; + return false; +} + /* * Format a reverse mapping for getfsmap, having translated rm_startblock * into the appropriate daddr units. @@ -260,7 +278,7 @@ xfs_getfsmap_helper( * Filter out records that start before our startpoint, if the * caller requested that. */ - if (xfs_rmap_compare(rec, &info->low) < 0) { + if (xfs_getfsmap_rec_before_start(info, rec, rec_daddr)) { rec_daddr += XFS_FSB_TO_BB(mp, rec->rm_blockcount); if (info->next_daddr < rec_daddr) info->next_daddr = rec_daddr; @@ -604,9 +622,27 @@ __xfs_getfsmap_datadev( error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]); if (error) return error; - info->low.rm_blockcount = 0; + info->low.rm_blockcount = XFS_BB_TO_FSBT(mp, keys[0].fmr_length); xfs_getfsmap_set_irec_flags(&info->low, &keys[0]); + /* Adjust the low key if we are continuing from where we left off. */ + if (info->low.rm_blockcount == 0) { + /* empty */ + } else if (XFS_RMAP_NON_INODE_OWNER(info->low.rm_owner) || + (info->low.rm_flags & (XFS_RMAP_ATTR_FORK | + XFS_RMAP_BMBT_BLOCK | + XFS_RMAP_UNWRITTEN))) { + info->low.rm_startblock += info->low.rm_blockcount; + info->low.rm_owner = 0; + info->low.rm_offset = 0; + + start_fsb += info->low.rm_blockcount; + if (XFS_FSB_TO_DADDR(mp, start_fsb) >= eofs) + return 0; + } else { + info->low.rm_offset += info->low.rm_blockcount; + } + info->high.rm_startblock = -1U; info->high.rm_owner = ULLONG_MAX; info->high.rm_offset = ULLONG_MAX; @@ -657,12 +693,8 @@ __xfs_getfsmap_datadev( * Set the AG low key to the start of the AG prior to * moving on to the next AG. */ - if (info->agno == start_ag) { - info->low.rm_startblock = 0; - info->low.rm_owner = 0; - info->low.rm_offset = 0; - info->low.rm_flags = 0; - } + if (info->agno == start_ag) + memset(&info->low, 0, sizeof(info->low)); } /* Report any gap at the end of the AG */ @@ -886,21 +918,17 @@ xfs_getfsmap( * blocks could be mapped to several other files/offsets. * According to rmapbt record ordering, the minimal next * possible record for the block range is the next starting - * offset in the same inode. Therefore, bump the file offset to - * continue the search appropriately. For all other low key - * mapping types (attr blocks, metadata), bump the physical - * offset as there can be no other mapping for the same physical - * block range. + * offset in the same inode. Therefore, each fsmap backend bumps + * the file offset to continue the search appropriately. For + * all other low key mapping types (attr blocks, metadata), each + * fsmap backend bumps the physical offset as there can be no + * other mapping for the same physical block range. */ dkeys[0] = head->fmh_keys[0]; if (dkeys[0].fmr_flags & (FMR_OF_SPECIAL_OWNER | FMR_OF_EXTENT_MAP)) { - dkeys[0].fmr_physical += dkeys[0].fmr_length; - dkeys[0].fmr_owner = 0; if (dkeys[0].fmr_offset) return -EINVAL; - } else - dkeys[0].fmr_offset += dkeys[0].fmr_length; - dkeys[0].fmr_length = 0; + } memset(&dkeys[1], 0xFF, sizeof(struct xfs_fsmap)); if (!xfs_getfsmap_check_keys(dkeys, &head->fmh_keys[1])) @@ -948,6 +976,7 @@ xfs_getfsmap( info.dev = handlers[i].dev; info.last = false; info.agno = NULLAGNUMBER; + info.low.rm_blockcount = 0; error = handlers[i].fn(tp, dkeys, &info); if (error) break; -- 2.39.2
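The key idea in the fix is the sentinel: a zero rm_blockcount in the low key means "first call, keep everything that overlaps the query range", while a nonzero value means "continuation, drop records that sort before where the previous call stopped". A condensed sketch of that predicate (it mirrors xfs_getfsmap_rec_before_start() above, shown standalone for clarity and assuming the xfs rmap definitions are in scope as they are in xfs_fsmap.c):

/* Continuation calls filter out records already reported to userspace. */
static bool example_rec_before_resume_key(const struct xfs_rmap_irec *rec,
					  const struct xfs_rmap_irec *low)
{
	if (low->rm_blockcount == 0)	/* first call: no filtering */
		return false;

	return xfs_rmap_compare(rec, low) < 0;
}

Encoding "is this a continuation?" in the low key itself is what lets the per-device handlers, rather than the generic setup code, decide how to bump the resume point.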
[PATCH OLK-6.6] blk-iocost: do not WARN if iocg was already offlined
by Li Nan 11 Jun '24

mainline inclusion
from mainline-v6.9-rc5
commit 01bc4fda9ea0a6b52f12326486f07a4910666cf6
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9UABH
CVE: CVE-2024-36908
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

In iocg_pay_debt(), warn is triggered if 'active_list' is empty, which
is intended to confirm iocg is active when it has debt. However, warn
can be triggered during a blkcg or disk removal, if iocg_waitq_timer_fn()
is run at that time:

  WARNING: CPU: 0 PID: 2344971 at block/blk-iocost.c:1402 iocg_pay_debt+0x14c/0x190
  Call trace:
   iocg_pay_debt+0x14c/0x190
   iocg_kick_waitq+0x438/0x4c0
   iocg_waitq_timer_fn+0xd8/0x130
   __run_hrtimer+0x144/0x45c
   __hrtimer_run_queues+0x16c/0x244
   hrtimer_interrupt+0x2cc/0x7b0

The warn in this situation is meaningless. Since this iocg is being
removed, the state of the 'active_list' is irrelevant, and 'waitq_timer'
is canceled after removing 'active_list' in ioc_pd_free(), which ensures
iocg is freed after iocg_waitq_timer_fn() returns.

Therefore, add the check if iocg was already offlined to avoid warn when
removing a blkcg or disk.

Signed-off-by: Li Nan <linan122(a)huawei.com>
Reviewed-by: Yu Kuai <yukuai3(a)huawei.com>
Acked-by: Tejun Heo <tj(a)kernel.org>
Link: https://lore.kernel.org/r/20240419093257.3004211-1-linan666@huaweicloud.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Li Nan <linan122(a)huawei.com>
---
 block/blk-iocost.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 6a3e8a21c0fc..3ac79f0c098f 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1439,8 +1439,11 @@ static void iocg_pay_debt(struct ioc_gq *iocg, u64 abs_vpay,
 	lockdep_assert_held(&iocg->ioc->lock);
 	lockdep_assert_held(&iocg->waitq.lock);
 
-	/* make sure that nobody messed with @iocg */
-	WARN_ON_ONCE(list_empty(&iocg->active_list));
+	/*
+	 * make sure that nobody messed with @iocg. Check iocg->pd.online
+	 * to avoid warn when removing blkcg or disk.
+	 */
+	WARN_ON_ONCE(list_empty(&iocg->active_list) && iocg->pd.online);
 	WARN_ON_ONCE(iocg->inuse > 1);
 
 	iocg->abs_vdebt -= min(abs_vpay, iocg->abs_vdebt);
--
2.39.2
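Put differently, the invariant "an iocg that still carries debt is on the active list" is only expected to hold while the policy data is online; during blkcg or disk teardown an empty list is legitimate. A hypothetical helper expressing the relaxed assertion (a sketch only; it would have to live inside blk-iocost.c, where struct ioc_gq is defined):

/* Warn only while the iocg is online, i.e. while the invariant should hold. */
static void example_assert_iocg_active(struct ioc_gq *iocg)
{
	lockdep_assert_held(&iocg->ioc->lock);
	WARN_ON_ONCE(list_empty(&iocg->active_list) && iocg->pd.online);
}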
[PATCH openEuler-22.03-LTS-SP1] gpiolib: cdev: Fix use after free in lineinfo_changed_notify
by Wupeng Ma 11 Jun '24

From: Zhongqiu Han <quic_zhonhan(a)quicinc.com>

stable inclusion
from stable-v6.6.31
commit 95ca7c90eaf5ea8a8460536535101e3e81160e2a
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9UG5Z
CVE: CVE-2024-36899
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit 02f6b0e1ec7e0e7d059dddc893645816552039da ]

The use-after-free issue occurs as follows: when the GPIO chip device
file is being closed by invoking gpio_chrdev_release(), watched_lines
is freed by bitmap_free(), but the unregistration of lineinfo_changed_nb
notifier chain failed due to waiting write rwsem. Additionally, one of
the GPIO chip's lines is also in the release process and holds the
notifier chain's read rwsem. Consequently, a race condition leads to
the use-after-free of watched_lines.

Here is the typical stack when issue happened:

[free]
gpio_chrdev_release()
  --> bitmap_free(cdev->watched_lines) <-- freed
  --> blocking_notifier_chain_unregister()
    --> down_write(&nh->rwsem) <-- waiting rwsem
      --> __down_write_common()
        --> rwsem_down_write_slowpath()
          --> schedule_preempt_disabled()
            --> schedule()

[use]
st54spi_gpio_dev_release()
  --> gpio_free()
    --> gpiod_free()
      --> gpiod_free_commit()
        --> gpiod_line_state_notify()
          --> blocking_notifier_call_chain()
            --> down_read(&nh->rwsem); <-- held rwsem
              --> notifier_call_chain()
                --> lineinfo_changed_notify()
                  --> test_bit(xxxx, cdev->watched_lines) <-- use after free

The side effect of the use-after-free issue is that a GPIO line event
is being generated for userspace where it shouldn't. However, since the
chrdev is being closed, userspace won't have the chance to read that
event anyway.

To fix the issue, call the bitmap_free() function after the
unregistration of lineinfo_changed_nb notifier chain.

Fixes: 51c1064e82e7 ("gpiolib: add new ioctl() for monitoring changes in line info")
Signed-off-by: Zhongqiu Han <quic_zhonhan(a)quicinc.com>
Link: https://lore.kernel.org/r/20240505141156.2944912-1-quic_zhonhan@quicinc.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Conflicts:
	drivers/gpio/gpiolib-cdev.c
[Ma Wupeng: context conflicts]
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
---
 drivers/gpio/gpiolib-cdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
index 0f98a07efcd6..62666174a2c9 100644
--- a/drivers/gpio/gpiolib-cdev.c
+++ b/drivers/gpio/gpiolib-cdev.c
@@ -2328,9 +2328,9 @@ static int gpio_chrdev_release(struct inode *inode, struct file *file)
 	struct gpio_chardev_data *cdev = file->private_data;
 	struct gpio_device *gdev = cdev->gdev;
 
-	bitmap_free(cdev->watched_lines);
 	blocking_notifier_chain_unregister(&gdev->notifier,
 					   &cdev->lineinfo_changed_nb);
+	bitmap_free(cdev->watched_lines);
 	put_device(&gdev->dev);
 	kfree(cdev);
 
--
2.25.1
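The general teardown rule the fix applies: first make sure no new readers can start (here by unregistering from the notifier chain, whose down_write waits for in-flight callbacks holding the read side of the rwsem), and only then free the data those readers dereference. A self-contained sketch with invented names, not the actual gpiolib structures:

#include <linux/bitmap.h>
#include <linux/notifier.h>
#include <linux/slab.h>

struct example_chardev_data {
	struct notifier_block nb;
	unsigned long *watched_lines;
};

static void example_chardev_release(struct example_chardev_data *cdev,
				    struct blocking_notifier_head *nh)
{
	/* 1. After this returns, the callback can no longer run. */
	blocking_notifier_chain_unregister(nh, &cdev->nb);
	/* 2. Nothing can observe the bitmap any more: safe to free. */
	bitmap_free(cdev->watched_lines);
	kfree(cdev);
}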