mailweb.openeuler.org
Manage this list

Keyboard Shortcuts

Thread View

  • j: Next unread message
  • k: Previous unread message
  • j a: Jump to all threads
  • j l: Jump to MailingList overview

Kernel

Threads by month
  • ----- 2026 -----
  • March
  • February
  • January
  • ----- 2025 -----
  • December
  • November
  • October
  • September
  • August
  • July
  • June
  • May
  • April
  • March
  • February
  • January
  • ----- 2024 -----
  • December
  • November
  • October
  • September
  • August
  • July
  • June
  • May
  • April
  • March
  • February
  • January
  • ----- 2023 -----
  • December
  • November
  • October
  • September
  • August
  • July
  • June
  • May
  • April
  • March
  • February
  • January
  • ----- 2022 -----
  • December
  • November
  • October
  • September
  • August
  • July
  • June
  • May
  • April
  • March
  • February
  • January
  • ----- 2021 -----
  • December
  • November
  • October
  • September
  • August
  • July
  • June
  • May
  • April
  • March
  • February
  • January
  • ----- 2020 -----
  • December
  • November
  • October
  • September
  • August
  • July
  • June
  • May
  • April
  • March
  • February
  • January
  • ----- 2019 -----
  • December
kernel@openeuler.org

  • 49 participants
  • 23081 discussions
[PATCH openEuler-1.0-LTS] RDMA/umad: Reject negative data_len in ib_umad_write
by Liu Kai 26 Mar '26

26 Mar '26
From: YunJe Shin <yjshin0438(a)gmail.com> stable inclusion from mainline-v7.0-rc1 commit 5551b02fdbfd85a325bb857f3a8f9c9f33397ed2 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13865 CVE: CVE-2026-23243 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- ib_umad_write computes data_len from user-controlled count and the MAD header sizes. With a mismatched user MAD header size and RMPP header length, data_len can become negative and reach ib_create_send_mad(). This can make the padding calculation exceed the segment size and trigger an out-of-bounds memset in alloc_send_rmpp_list(). Add an explicit check to reject negative data_len before creating the send buffer. KASAN splat: [ 211.363464] BUG: KASAN: slab-out-of-bounds in ib_create_send_mad+0xa01/0x11b0 [ 211.364077] Write of size 220 at addr ffff88800c3fa1f8 by task spray_thread/102 [ 211.365867] ib_create_send_mad+0xa01/0x11b0 [ 211.365887] ib_umad_write+0x853/0x1c80 Fixes: 2be8e3ee8efd ("IB/umad: Add P_Key index support") Signed-off-by: YunJe Shin <ioerts(a)kookmin.ac.kr> Link: https://patch.msgid.link/20260203100628.1215408-1-ioerts@kookmin.ac.kr Signed-off-by: Leon Romanovsky <leon(a)kernel.org> Conflicts: drivers/infiniband/core/user_mad.c [add type casting in check_sub_overflow()] Signed-off-by: Liu Kai <liukai284(a)huawei.com> --- drivers/infiniband/core/user_mad.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index b37aada92a62a..5d51794b69d46 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -486,7 +486,8 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf, struct ib_ah *ah; struct ib_rmpp_mad *rmpp_mad; __be64 *tid; - int ret, data_len, hdr_len, copy_offset, rmpp_active; + int ret, hdr_len, copy_offset, rmpp_active; + size_t data_len; u8 base_version; if (count < hdr_size(file) + IB_MGMT_RMPP_HDR) @@ -557,7 +558,11 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf, } base_version = ((struct ib_mad_hdr *)&packet->mad.data)->base_version; - data_len = count - hdr_size(file) - hdr_len; + if (check_sub_overflow(count, (size_t) (hdr_size(file) + hdr_len), + &data_len)) { + ret = -EINVAL; + goto err_ah; + } packet->msg = ib_create_send_mad(agent, be32_to_cpu(packet->mad.hdr.qpn), packet->mad.hdr.pkey_index, rmpp_active, -- 2.34.1
2 2
0 0
[PATCH OLK-5.10] ext4: fix memory leak in ext4_ext_shift_extents()
by Yongjian Sun 26 Mar '26

26 Mar '26
From: Zilin Guan <zilin(a)seu.edu.cn> mainline inclusion from mainline-v7.0-rc1 commit ca81109d4a8f192dc1cbad4a1ee25246363c2833 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13898 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- In ext4_ext_shift_extents(), if the extent is NULL in the while loop, the function returns immediately without releasing the path obtained via ext4_find_extent(), leading to a memory leak. Fix this by jumping to the out label to ensure the path is properly released. Fixes: a18ed359bdddc ("ext4: always check ext4_ext_find_extent result") Signed-off-by: Zilin Guan <zilin(a)seu.edu.cn> Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com> Reviewed-by: Baokun Li <libaokun1(a)huawei.com> Link: https://patch.msgid.link/20251225084800.905701-1-zilin@seu.edu.cn Signed-off-by: Theodore Ts'o <tytso(a)mit.edu> Cc: stable(a)kernel.org Signed-off-by: Yongjian Sun <sunyongjian1(a)huawei.com> --- fs/ext4/extents.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index ceca82a0fd5c..786701524736 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -5278,7 +5278,8 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle, if (!extent) { EXT4_ERROR_INODE(inode, "unexpected hole at %lu", (unsigned long) *iterator); - return -EFSCORRUPTED; + ret = -EFSCORRUPTED; + goto out; } if (SHIFT == SHIFT_LEFT && *iterator > le32_to_cpu(extent->ee_block)) { -- 2.39.2
2 1
0 0
[PATCH OLK-6.6] quota: fix livelock between quotactl and freeze_super
by Yongjian Sun 26 Mar '26

26 Mar '26
From: Abhishek Bapat <abhishekbapat(a)google.com> mainline inclusion from mainline-v7.0-rc1 commit 77449e453dfc006ad738dec55374c4cbc056fd39 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13898 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- When a filesystem is frozen, quotactl_block() enters a retry loop waiting for the filesystem to thaw. It acquires s_umount, checks the freeze state, drops s_umount and uses sb_start_write() - sb_end_write() pair to wait for the unfreeze. However, this retry loop can trigger a livelock issue, specifically on kernels with preemption disabled. The mechanism is as follows: 1. freeze_super() sets SB_FREEZE_WRITE and calls sb_wait_write(). 2. sb_wait_write() calls percpu_down_write(), which initiates synchronize_rcu(). 3. Simultaneously, quotactl_block() spins in its retry loop, immediately executing the sb_start_write() - sb_end_write() pair. 4. Because the kernel is non-preemptible and the loop contains no scheduling points, quotactl_block() never yields the CPU. This prevents that CPU from reaching an RCU quiescent state. 5. synchronize_rcu() in the freezer thread waits indefinitely for the quotactl_block() CPU to report a quiescent state. 6. quotactl_block() spins indefinitely waiting for the freezer to advance, which it cannot do as it is blocked on the RCU sync. This results in a hang of the freezer process and 100% CPU usage by the quota process. While this can occur intermittently on multi-core systems, it is reliably reproducing on a node with the following script, running both the freezer and the quota toggle on the same CPU: # mkfs.ext4 -O quota /dev/sda 2g && mkdir a_mount # mount /dev/sda -o quota,usrquota,grpquota a_mount # taskset -c 3 bash -c "while true; do xfs_freeze -f a_mount; \ xfs_freeze -u a_mount; done" & # taskset -c 3 bash -c "while true; do quotaon a_mount; \ quotaoff a_mount; done" & Adding cond_resched() to the retry loop fixes the issue. It acts as an RCU quiescent state, allowing synchronize_rcu() in percpu_down_write() to complete. Fixes: 576215cffdef ("fs: Drop wait_unfrozen wait queue") Signed-off-by: Abhishek Bapat <abhishekbapat(a)google.com> Link: https://patch.msgid.link/20260115213103.1089129-1-abhishekbapat@google.com Signed-off-by: Jan Kara <jack(a)suse.cz> Signed-off-by: Yongjian Sun <sunyongjian1(a)huawei.com> --- fs/quota/quota.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/quota/quota.c b/fs/quota/quota.c index 0e41fb84060f..5be53cae2c95 100644 --- a/fs/quota/quota.c +++ b/fs/quota/quota.c @@ -899,6 +899,7 @@ static struct super_block *quotactl_block(const char __user *special, int cmd) sb_start_write(sb); sb_end_write(sb); put_super(sb); + cond_resched(); goto retry; } return sb; -- 2.39.2
2 1
0 0
[PATCH OLK-6.6] ext4: fix dirtyclusters double decrement on fs shutdown
by Yongjian Sun 26 Mar '26

26 Mar '26
From: Brian Foster <bfoster(a)redhat.com> mainline inclusion from mainline-v7.0-rc1 commit 94a8cea54cd935c54fa2fba70354757c0fc245e3 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13898 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- fstests test generic/388 occasionally reproduces a warning in ext4_put_super() associated with the dirty clusters count: WARNING: CPU: 7 PID: 76064 at fs/ext4/super.c:1324 ext4_put_super+0x48c/0x590 [ext4] Tracing the failure shows that the warning fires due to an s_dirtyclusters_counter value of -1. IOW, this appears to be a spurious decrement as opposed to some sort of leak. Further tracing of the dirty cluster count deltas and an LLM scan of the resulting output identified the cause as a double decrement in the error path between ext4_mb_mark_diskspace_used() and the caller ext4_mb_new_blocks(). First, note that generic/388 is a shutdown vs. fsstress test and so produces a random set of operations and shutdown injections. In the problematic case, the shutdown triggers an error return from the ext4_handle_dirty_metadata() call(s) made from ext4_mb_mark_context(). The changed value is non-zero at this point, so ext4_mb_mark_diskspace_used() does not exit after the error bubbles up from ext4_mb_mark_context(). Instead, the former decrements both cluster counters and returns the error up to ext4_mb_new_blocks(). The latter falls into the !ar->len out path which decrements the dirty clusters counter a second time, creating the inconsistency. To avoid this problem and simplify ownership of the cluster reservation in this codepath, lift the counter reduction to a single place in the caller. This makes it more clear that ext4_mb_new_blocks() is responsible for acquiring cluster reservation (via ext4_claim_free_clusters()) in the !delalloc case as well as releasing it, regardless of whether it ends up consumed or returned due to failure. Fixes: 0087d9fb3f29 ("ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc") Signed-off-by: Brian Foster <bfoster(a)redhat.com> Reviewed-by: Baokun Li <libaokun1(a)huawei.com> Link: https://patch.msgid.link/20260113171905.118284-1-bfoster@redhat.com Signed-off-by: Theodore Ts'o <tytso(a)mit.edu> Cc: stable(a)kernel.org Conflicts: fs/ext4/mballoc-test.c fs/ext4/mballoc.c [mainline 2f94711b098b & 7c9fa399a369 not applied] Signed-off-by: Yongjian Sun <sunyongjian1(a)huawei.com> --- fs/ext4/mballoc.c | 21 +++++---------------- 1 file changed, 5 insertions(+), 16 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index ba5a8f2c7bff..81c3ba818aca 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -4071,8 +4071,7 @@ void ext4_exit_mballoc(void) * Returns 0 if success or error code */ static noinline_for_stack int -ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, - handle_t *handle, unsigned int reserv_clstrs) +ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, handle_t *handle) { struct buffer_head *bitmap_bh = NULL; struct ext4_group_desc *gdp; @@ -4158,13 +4157,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, ext4_unlock_group(sb, ac->ac_b_ex.fe_group); percpu_counter_sub(&sbi->s_freeclusters_counter, ac->ac_b_ex.fe_len); - /* - * Now reduce the dirty block count also. Should not go negative - */ - if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED)) - /* release all the reserved blocks if non delalloc */ - percpu_counter_sub(&sbi->s_dirtyclusters_counter, - reserv_clstrs); if (sbi->s_log_groups_per_flex) { ext4_group_t flex_group = ext4_flex_group(sbi, @@ -6402,7 +6394,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle, ext4_mb_pa_put_free(ac); } if (likely(ac->ac_status == AC_STATUS_FOUND)) { - *errp = ext4_mb_mark_diskspace_used(ac, handle, reserv_clstrs); + *errp = ext4_mb_mark_diskspace_used(ac, handle); if (*errp) { ext4_discard_allocated_blocks(ac); goto errout; @@ -6433,12 +6425,9 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle, out: if (inquota && ar->len < inquota) dquot_free_block(ar->inode, EXT4_C2B(sbi, inquota - ar->len)); - if (!ar->len) { - if ((ar->flags & EXT4_MB_DELALLOC_RESERVED) == 0) - /* release all the reserved blocks if non delalloc */ - percpu_counter_sub(&sbi->s_dirtyclusters_counter, - reserv_clstrs); - } + /* release any reserved blocks */ + if (reserv_clstrs) + percpu_counter_sub(&sbi->s_dirtyclusters_counter, reserv_clstrs); trace_ext4_allocate_blocks(ar, (unsigned long long)block); -- 2.39.2
2 1
0 0
[PATCH OLK-6.6] fat: avoid parent link count underflow in rmdir
by Yongjian Sun 26 Mar '26

26 Mar '26
From: Zhiyu Zhang <zhiyuzhang999(a)gmail.com> mainline inclusion from mainline-v7.0-rc1 commit 8cafcb881364af5ef3a8b9fed4db254054033d8a category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13898 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- Corrupted FAT images can leave a directory inode with an incorrect i_nlink (e.g. 2 even though subdirectories exist). rmdir then unconditionally calls drop_nlink(dir) and can drive i_nlink to 0, triggering the WARN_ON in drop_nlink(). Add a sanity check in vfat_rmdir() and msdos_rmdir(): only drop the parent link count when it is at least 3, otherwise report a filesystem error. Link: https://lkml.kernel.org/r/20260101111148.1437-1-zhiyuzhang999@gmail.com Fixes: 9a53c3a783c2 ("[PATCH] r/o bind mounts: unlink: monitor i_nlink") Signed-off-by: Zhiyu Zhang <zhiyuzhang999(a)gmail.com> Reported-by: Zhiyu Zhang <zhiyuzhang999(a)gmail.com> Closes: https://lore.kernel.org/linux-fsdevel/aVN06OKsKxZe6-Kv@casper.infradead.org… Tested-by: Zhiyu Zhang <zhiyuzhang999(a)gmail.com> Acked-by: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp> Cc: Al Viro <viro(a)zeniv.linux.org.uk> Cc: Christian Brauner <brauner(a)kernel.org> Cc: Jan Kara <jack(a)suse.cz> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Yongjian Sun <sunyongjian1(a)huawei.com> --- fs/fat/namei_msdos.c | 7 ++++++- fs/fat/namei_vfat.c | 7 ++++++- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c index 2116c486843b..e189fbf95fca 100644 --- a/fs/fat/namei_msdos.c +++ b/fs/fat/namei_msdos.c @@ -325,7 +325,12 @@ static int msdos_rmdir(struct inode *dir, struct dentry *dentry) err = fat_remove_entries(dir, &sinfo); /* and releases bh */ if (err) goto out; - drop_nlink(dir); + if (dir->i_nlink >= 3) + drop_nlink(dir); + else { + fat_fs_error(sb, "parent dir link count too low (%u)", + dir->i_nlink); + } clear_nlink(inode); fat_truncate_time(inode, NULL, S_CTIME); diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c index 3cf22a6727f1..7d7ac30c6eff 100644 --- a/fs/fat/namei_vfat.c +++ b/fs/fat/namei_vfat.c @@ -806,7 +806,12 @@ static int vfat_rmdir(struct inode *dir, struct dentry *dentry) err = fat_remove_entries(dir, &sinfo); /* and releases bh */ if (err) goto out; - drop_nlink(dir); + if (dir->i_nlink >= 3) + drop_nlink(dir); + else { + fat_fs_error(sb, "parent dir link count too low (%u)", + dir->i_nlink); + } clear_nlink(inode); fat_truncate_time(inode, NULL, S_ATIME|S_MTIME); -- 2.39.2
2 1
0 0
[PATCH OLK-6.6] ext4: don't set EXT4_GET_BLOCKS_CONVERT when splitting before submitting I/O
by Yongjian Sun 26 Mar '26

26 Mar '26
From: Zhang Yi <yi.zhang(a)huawei.com> mainline inclusion from mainline-v7.0-rc1 commit feaf2a80e78f89ee8a3464126077ba8683b62791 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13898 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- When allocating blocks during within-EOF DIO and writeback with dioread_nolock enabled, EXT4_GET_BLOCKS_PRE_IO was set to split an existing large unwritten extent. However, EXT4_GET_BLOCKS_CONVERT was set when calling ext4_split_convert_extents(), which may potentially result in stale data issues. Assume we have an unwritten extent, and then DIO writes the second half. [UUUUUUUUUUUUUUUU] on-disk extent U: unwritten extent [UUUUUUUUUUUUUUUU] extent status tree |<- ->| ----> dio write this range First, ext4_iomap_alloc() call ext4_map_blocks() with EXT4_GET_BLOCKS_PRE_IO, EXT4_GET_BLOCKS_UNWRIT_EXT and EXT4_GET_BLOCKS_CREATE flags set. ext4_map_blocks() find this extent and call ext4_split_convert_extents() with EXT4_GET_BLOCKS_CONVERT and the above flags set. Then, ext4_split_convert_extents() calls ext4_split_extent() with EXT4_EXT_MAY_ZEROOUT, EXT4_EXT_MARK_UNWRIT2 and EXT4_EXT_DATA_VALID2 flags set, and it calls ext4_split_extent_at() to split the second half with EXT4_EXT_DATA_VALID2, EXT4_EXT_MARK_UNWRIT1, EXT4_EXT_MAY_ZEROOUT and EXT4_EXT_MARK_UNWRIT2 flags set. However, ext4_split_extent_at() failed to insert extent since a temporary lack -ENOSPC. It zeroes out the first half but convert the entire on-disk extent to written since the EXT4_EXT_DATA_VALID2 flag set, but left the second half as unwritten in the extent status tree. [0000000000SSSSSS] data S: stale data, 0: zeroed [WWWWWWWWWWWWWWWW] on-disk extent W: written extent [WWWWWWWWWWUUUUUU] extent status tree Finally, if the DIO failed to write data to the disk, the stale data in the second half will be exposed once the cached extent entry is gone. Fix this issue by not passing EXT4_GET_BLOCKS_CONVERT when splitting an unwritten extent before submitting I/O, and make ext4_split_convert_extents() to zero out the entire extent range to zero for this case, and also mark the extent in the extent status tree for consistency. Fixes: b8a8684502a0 ("ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate") Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com> Reviewed-by: Baokun Li <libaokun1(a)huawei.com> Cc: stable(a)kernel.org Message-ID: <20251129103247.686136-4-yi.zhang(a)huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso(a)mit.edu> Conflicts: fs/ext4/extents.c [mainline commit dac092195b6a not applied] Signed-off-by: Yongjian Sun <sunyongjian1(a)huawei.com> --- fs/ext4/extents.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 19eae32fc128..83b2e5d1e00a 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3713,11 +3713,15 @@ static int ext4_split_convert_extents(handle_t *handle, /* Convert to unwritten */ if (flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN) { split_flag |= EXT4_EXT_DATA_VALID1; - /* Convert to initialized */ - } else if (flags & EXT4_GET_BLOCKS_CONVERT) { + /* Split the existing unwritten extent */ + } else if (flags & (EXT4_GET_BLOCKS_UNWRIT_EXT | + EXT4_GET_BLOCKS_CONVERT)) { split_flag |= ee_block + ee_len <= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0; - split_flag |= (EXT4_EXT_MARK_UNWRIT2 | EXT4_EXT_DATA_VALID2); + split_flag |= EXT4_EXT_MARK_UNWRIT2; + /* Convert to initialized */ + if (flags & EXT4_GET_BLOCKS_CONVERT) + split_flag |= EXT4_EXT_DATA_VALID2; } flags |= EXT4_GET_BLOCKS_PRE_IO; return ext4_split_extent(handle, inode, ppath, map, split_flag, flags); @@ -3888,7 +3892,7 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode, /* get_block() before submitting IO, split the extent */ if (flags & EXT4_GET_BLOCKS_PRE_IO) { ret = ext4_split_convert_extents(handle, inode, map, ppath, - flags | EXT4_GET_BLOCKS_CONVERT); + flags); if (ret < 0) { err = ret; goto out2; -- 2.39.2
2 1
0 0
[PATCH OLK-6.6] ext4: fix memory leak in ext4_ext_shift_extents()
by Yongjian Sun 26 Mar '26

26 Mar '26
From: Zilin Guan <zilin(a)seu.edu.cn> mainline inclusion from mainline-v7.0-rc1 commit ca81109d4a8f192dc1cbad4a1ee25246363c2833 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13898 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- In ext4_ext_shift_extents(), if the extent is NULL in the while loop, the function returns immediately without releasing the path obtained via ext4_find_extent(), leading to a memory leak. Fix this by jumping to the out label to ensure the path is properly released. Fixes: a18ed359bdddc ("ext4: always check ext4_ext_find_extent result") Signed-off-by: Zilin Guan <zilin(a)seu.edu.cn> Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com> Reviewed-by: Baokun Li <libaokun1(a)huawei.com> Link: https://patch.msgid.link/20251225084800.905701-1-zilin@seu.edu.cn Signed-off-by: Theodore Ts'o <tytso(a)mit.edu> Cc: stable(a)kernel.org Signed-off-by: Yongjian Sun <sunyongjian1(a)huawei.com> --- fs/ext4/extents.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 6126f98bac63..19eae32fc128 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -5281,7 +5281,8 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle, if (!extent) { EXT4_ERROR_INODE(inode, "unexpected hole at %lu", (unsigned long) *iterator); - return -EFSCORRUPTED; + ret = -EFSCORRUPTED; + goto out; } if (SHIFT == SHIFT_LEFT && *iterator > le32_to_cpu(extent->ee_block)) { -- 2.39.2
2 1
0 0
[PATCH OLK-6.6] blktrace: fix __this_cpu_read/write in preemptible context
by Yongjian Sun 26 Mar '26

26 Mar '26
From: Chaitanya Kulkarni <kch(a)nvidia.com> mainline inclusion from mainline-v7.0-rc3 commit da46b5dfef48658d03347cda21532bcdbb521e67 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14005 CVE: CVE-2026-23374 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- tracing_record_cmdline() internally uses __this_cpu_read() and __this_cpu_write() on the per-CPU variable trace_cmdline_save, and trace_save_cmdline() explicitly asserts preemption is disabled via lockdep_assert_preemption_disabled(). These operations are only safe when preemption is off, as they were designed to be called from the scheduler context (probe_wakeup_sched_switch() / probe_wakeup()). __blk_add_trace() was calling tracing_record_cmdline(current) early in the blk_tracer path, before ring buffer reservation, from process context where preemption is fully enabled. This triggers the following using blktests/blktrace/002: blktrace/002 (blktrace ftrace corruption with sysfs trace) [failed] runtime 0.367s ... 0.437s something found in dmesg: [ 81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33 [ 81.239580] null_blk: disk nullb1 created [ 81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516 [ 81.362842] caller is tracing_record_cmdline+0x10/0x40 [ 81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G N 7.0.0-rc1lblk+ #84 PREEMPT(full) [ 81.362877] Tainted: [N]=TEST [ 81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 [ 81.362881] Call Trace: [ 81.362884] <TASK> [ 81.362886] dump_stack_lvl+0x8d/0xb0 ... (See '/mnt/sda/blktests/results/nodev/blktrace/002.dmesg' for the entire message) [ 81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33 [ 81.239580] null_blk: disk nullb1 created [ 81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516 [ 81.362842] caller is tracing_record_cmdline+0x10/0x40 [ 81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G N 7.0.0-rc1lblk+ #84 PREEMPT(full) [ 81.362877] Tainted: [N]=TEST [ 81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 [ 81.362881] Call Trace: [ 81.362884] <TASK> [ 81.362886] dump_stack_lvl+0x8d/0xb0 [ 81.362895] check_preemption_disabled+0xce/0xe0 [ 81.362902] tracing_record_cmdline+0x10/0x40 [ 81.362923] __blk_add_trace+0x307/0x5d0 [ 81.362934] ? lock_acquire+0xe0/0x300 [ 81.362940] ? iov_iter_extract_pages+0x101/0xa30 [ 81.362959] blk_add_trace_bio+0x106/0x1e0 [ 81.362968] submit_bio_noacct_nocheck+0x24b/0x3a0 [ 81.362979] ? lockdep_init_map_type+0x58/0x260 [ 81.362988] submit_bio_wait+0x56/0x90 [ 81.363009] __blkdev_direct_IO_simple+0x16c/0x250 [ 81.363026] ? __pfx_submit_bio_wait_endio+0x10/0x10 [ 81.363038] ? rcu_read_lock_any_held+0x73/0xa0 [ 81.363051] blkdev_read_iter+0xc1/0x140 [ 81.363059] vfs_read+0x20b/0x330 [ 81.363083] ksys_read+0x67/0xe0 [ 81.363090] do_syscall_64+0xbf/0xf00 [ 81.363102] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 81.363106] RIP: 0033:0x7f281906029d [ 81.363111] Code: 31 c0 e9 c6 fe ff ff 50 48 8d 3d 66 63 0a 00 e8 59 ff 01 00 66 0f 1f 84 00 00 00 00 00 80 3d 41 33 0e 00 00 74 17 31 c0 0f 0c [ 81.363113] RSP: 002b:00007ffca127dd48 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 81.363120] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f281906029d [ 81.363122] RDX: 0000000000001000 RSI: 0000559f8bfae000 RDI: 0000000000000000 [ 81.363123] RBP: 0000000000001000 R08: 0000002863a10a81 R09: 00007f281915f000 [ 81.363124] R10: 00007f2818f77b60 R11: 0000000000000246 R12: 0000559f8bfae000 [ 81.363126] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000a [ 81.363142] </TASK> The same BUG fires from blk_add_trace_plug(), blk_add_trace_unplug(), and blk_add_trace_rq() paths as well. The purpose of tracing_record_cmdline() is to cache the task->comm for a given PID so that the trace can later resolve it. It is only meaningful when a trace event is actually being recorded. Ring buffer reservation via ring_buffer_lock_reserve() disables preemption, and preemption remains disabled until the event is committed :- __blk_add_trace() __trace_buffer_lock_reserve() __trace_buffer_lock_reserve() ring_buffer_lock_reserve() preempt_disable_notrace(); <--- With this fix blktests for blktrace pass: blktests (master) # ./check blktrace blktrace/001 (blktrace zone management command tracing) [passed] runtime 3.650s ... 3.647s blktrace/002 (blktrace ftrace corruption with sysfs trace) [passed] runtime 0.411s ... 0.384s Fixes: 7ffbd48d5cab ("tracing: Cache comms only after an event occurred") Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki(a)wdc.com> Suggested-by: Steven Rostedt <rostedt(a)goodmis.org> Signed-off-by: Chaitanya Kulkarni <kch(a)nvidia.com> Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> Conflicts: kernel/trace/blktrace.c [mainline e472eca538358 & 48886b9d668 not applied] Signed-off-by: Yongjian Sun <sunyongjian1(a)huawei.com> --- kernel/trace/blktrace.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index d5d94510afd3..ce797d8dd451 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -251,8 +251,6 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes, cpu = raw_smp_processor_id(); if (blk_tracer) { - tracing_record_cmdline(current); - buffer = blk_tr->array_buffer.buffer; trace_ctx = tracing_gen_ctx_flags(0); event = trace_buffer_lock_reserve(buffer, TRACE_BLK, @@ -260,6 +258,8 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes, trace_ctx); if (!event) return; + + tracing_record_cmdline(current); t = ring_buffer_event_data(event); goto record_it; } -- 2.39.2
2 1
0 0
[PATCH OLK-5.10] ext4: fix dirtyclusters double decrement on fs shutdown
by Yongjian Sun 26 Mar '26

26 Mar '26
From: Brian Foster <bfoster(a)redhat.com> mainline inclusion from mainline-v7.0-rc1 commit 94a8cea54cd935c54fa2fba70354757c0fc245e3 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13898 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- fstests test generic/388 occasionally reproduces a warning in ext4_put_super() associated with the dirty clusters count: WARNING: CPU: 7 PID: 76064 at fs/ext4/super.c:1324 ext4_put_super+0x48c/0x590 [ext4] Tracing the failure shows that the warning fires due to an s_dirtyclusters_counter value of -1. IOW, this appears to be a spurious decrement as opposed to some sort of leak. Further tracing of the dirty cluster count deltas and an LLM scan of the resulting output identified the cause as a double decrement in the error path between ext4_mb_mark_diskspace_used() and the caller ext4_mb_new_blocks(). First, note that generic/388 is a shutdown vs. fsstress test and so produces a random set of operations and shutdown injections. In the problematic case, the shutdown triggers an error return from the ext4_handle_dirty_metadata() call(s) made from ext4_mb_mark_context(). The changed value is non-zero at this point, so ext4_mb_mark_diskspace_used() does not exit after the error bubbles up from ext4_mb_mark_context(). Instead, the former decrements both cluster counters and returns the error up to ext4_mb_new_blocks(). The latter falls into the !ar->len out path which decrements the dirty clusters counter a second time, creating the inconsistency. To avoid this problem and simplify ownership of the cluster reservation in this codepath, lift the counter reduction to a single place in the caller. This makes it more clear that ext4_mb_new_blocks() is responsible for acquiring cluster reservation (via ext4_claim_free_clusters()) in the !delalloc case as well as releasing it, regardless of whether it ends up consumed or returned due to failure. Fixes: 0087d9fb3f29 ("ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc") Signed-off-by: Brian Foster <bfoster(a)redhat.com> Reviewed-by: Baokun Li <libaokun1(a)huawei.com> Link: https://patch.msgid.link/20260113171905.118284-1-bfoster@redhat.com Signed-off-by: Theodore Ts'o <tytso(a)mit.edu> Cc: stable(a)kernel.org Conflicts: fs/ext4/mballoc-test.c fs/ext4/mballoc.c [mainline 2f94711b098b & 7c9fa399a369 not applied] Signed-off-by: Yongjian Sun <sunyongjian1(a)huawei.com> --- fs/ext4/mballoc.c | 21 +++++---------------- 1 file changed, 5 insertions(+), 16 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 9d4e8e3c74e2..683f1e220645 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -3300,8 +3300,7 @@ void ext4_exit_mballoc(void) * Returns 0 if success or error code */ static noinline_for_stack int -ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, - handle_t *handle, unsigned int reserv_clstrs) +ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, handle_t *handle) { struct buffer_head *bitmap_bh = NULL; struct ext4_group_desc *gdp; @@ -3388,13 +3387,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, ext4_unlock_group(sb, ac->ac_b_ex.fe_group); percpu_counter_sub(&sbi->s_freeclusters_counter, ac->ac_b_ex.fe_len); - /* - * Now reduce the dirty block count also. Should not go negative - */ - if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED)) - /* release all the reserved blocks if non delalloc */ - percpu_counter_sub(&sbi->s_dirtyclusters_counter, - reserv_clstrs); if (sbi->s_log_groups_per_flex) { ext4_group_t flex_group = ext4_flex_group(sbi, @@ -5272,7 +5264,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle, ext4_mb_pa_free(ac); } if (likely(ac->ac_status == AC_STATUS_FOUND)) { - *errp = ext4_mb_mark_diskspace_used(ac, handle, reserv_clstrs); + *errp = ext4_mb_mark_diskspace_used(ac, handle); if (*errp) { ext4_discard_allocated_blocks(ac); goto errout; @@ -5304,12 +5296,9 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle, kmem_cache_free(ext4_ac_cachep, ac); if (inquota && ar->len < inquota) dquot_free_block(ar->inode, EXT4_C2B(sbi, inquota - ar->len)); - if (!ar->len) { - if ((ar->flags & EXT4_MB_DELALLOC_RESERVED) == 0) - /* release all the reserved blocks if non delalloc */ - percpu_counter_sub(&sbi->s_dirtyclusters_counter, - reserv_clstrs); - } + /* release any reserved blocks */ + if (reserv_clstrs) + percpu_counter_sub(&sbi->s_dirtyclusters_counter, reserv_clstrs); trace_ext4_allocate_blocks(ar, (unsigned long long)block); -- 2.39.2
2 1
0 0
[PATCH OLK-5.10] ext4: don't set EXT4_GET_BLOCKS_CONVERT when splitting before submitting I/O
by Yongjian Sun 26 Mar '26

26 Mar '26
From: Zhang Yi <yi.zhang(a)huawei.com> mainline inclusion from mainline-v7.0-rc1 commit feaf2a80e78f89ee8a3464126077ba8683b62791 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13898 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- When allocating blocks during within-EOF DIO and writeback with dioread_nolock enabled, EXT4_GET_BLOCKS_PRE_IO was set to split an existing large unwritten extent. However, EXT4_GET_BLOCKS_CONVERT was set when calling ext4_split_convert_extents(), which may potentially result in stale data issues. Assume we have an unwritten extent, and then DIO writes the second half. [UUUUUUUUUUUUUUUU] on-disk extent U: unwritten extent [UUUUUUUUUUUUUUUU] extent status tree |<- ->| ----> dio write this range First, ext4_iomap_alloc() call ext4_map_blocks() with EXT4_GET_BLOCKS_PRE_IO, EXT4_GET_BLOCKS_UNWRIT_EXT and EXT4_GET_BLOCKS_CREATE flags set. ext4_map_blocks() find this extent and call ext4_split_convert_extents() with EXT4_GET_BLOCKS_CONVERT and the above flags set. Then, ext4_split_convert_extents() calls ext4_split_extent() with EXT4_EXT_MAY_ZEROOUT, EXT4_EXT_MARK_UNWRIT2 and EXT4_EXT_DATA_VALID2 flags set, and it calls ext4_split_extent_at() to split the second half with EXT4_EXT_DATA_VALID2, EXT4_EXT_MARK_UNWRIT1, EXT4_EXT_MAY_ZEROOUT and EXT4_EXT_MARK_UNWRIT2 flags set. However, ext4_split_extent_at() failed to insert extent since a temporary lack -ENOSPC. It zeroes out the first half but convert the entire on-disk extent to written since the EXT4_EXT_DATA_VALID2 flag set, but left the second half as unwritten in the extent status tree. [0000000000SSSSSS] data S: stale data, 0: zeroed [WWWWWWWWWWWWWWWW] on-disk extent W: written extent [WWWWWWWWWWUUUUUU] extent status tree Finally, if the DIO failed to write data to the disk, the stale data in the second half will be exposed once the cached extent entry is gone. Fix this issue by not passing EXT4_GET_BLOCKS_CONVERT when splitting an unwritten extent before submitting I/O, and make ext4_split_convert_extents() to zero out the entire extent range to zero for this case, and also mark the extent in the extent status tree for consistency. Fixes: b8a8684502a0 ("ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate") Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com> Reviewed-by: Baokun Li <libaokun1(a)huawei.com> Cc: stable(a)kernel.org Message-ID: <20251129103247.686136-4-yi.zhang(a)huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso(a)mit.edu> Conflicts: fs/ext4/extents.c [mainline commit dac092195b6a not applied] Signed-off-by: Yongjian Sun <sunyongjian1(a)huawei.com> --- fs/ext4/extents.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 786701524736..4c77284be84d 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3714,11 +3714,15 @@ static int ext4_split_convert_extents(handle_t *handle, /* Convert to unwritten */ if (flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN) { split_flag |= EXT4_EXT_DATA_VALID1; - /* Convert to initialized */ - } else if (flags & EXT4_GET_BLOCKS_CONVERT) { + /* Split the existing unwritten extent */ + } else if (flags & (EXT4_GET_BLOCKS_UNWRIT_EXT | + EXT4_GET_BLOCKS_CONVERT)) { split_flag |= ee_block + ee_len <= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0; - split_flag |= (EXT4_EXT_MARK_UNWRIT2 | EXT4_EXT_DATA_VALID2); + split_flag |= EXT4_EXT_MARK_UNWRIT2; + /* Convert to initialized */ + if (flags & EXT4_GET_BLOCKS_CONVERT) + split_flag |= EXT4_EXT_DATA_VALID2; } flags |= EXT4_GET_BLOCKS_PRE_IO; return ext4_split_extent(handle, inode, ppath, map, split_flag, flags); @@ -3883,7 +3887,7 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode, /* get_block() before submitting IO, split the extent */ if (flags & EXT4_GET_BLOCKS_PRE_IO) { ret = ext4_split_convert_extents(handle, inode, map, ppath, - flags | EXT4_GET_BLOCKS_CONVERT); + flags); if (ret < 0) { err = ret; goto out2; -- 2.39.2
2 1
0 0
  • ← Newer
  • 1
  • 2
  • 3
  • 4
  • ...
  • 2309
  • Older →

HyperKitty Powered by HyperKitty