This patch set fix some bug for forcealign:
patch 1 ~ 4 : fix tail alignment issue while approach to no space. patch 5 : fix file offset not align due to revert allocated offset while bmap alloc. patch 6 ~ 8 : fix expose stale data issue while truance down file. patch 9 : fix forcealign not compatible with reflink and realtime.
Dave Chinner (3): xfs: only allow minlen allocations when near ENOSPC xfs: always tail align maxlen allocations xfs: align args->minlen for forced allocation alignment
John Garry (2): xfs: Don't revert allocated offset for forcealign
Long Li (3): xfs: set minlen to align for forcealign xfs: correct the truncate blocksize of forcealign xfs: forcealign not compatible with reflink and realtime device
Zhang Yi (2): math64: add rem_u64() to just return the remainder iomap: pass blocksize to iomap_truncate_page()
fs/iomap/buffered-io.c | 8 ++++---- fs/xfs/libxfs/xfs_alloc.c | 31 ++++++++++++++++++----------- fs/xfs/libxfs/xfs_bmap.c | 42 +++++++++++++++++++++++++++++---------- fs/xfs/xfs_iops.c | 10 ++++++++-- fs/xfs/xfs_super.c | 19 +++++++++++++++++- include/linux/iomap.h | 4 ++-- include/linux/math64.h | 24 ++++++++++++++++++++++ 7 files changed, 106 insertions(+), 32 deletions(-)
From: Dave Chinner dchinner@redhat.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA
Reference: https://lore.kernel.org/linux-xfs/20240705162450.3481169-1-john.g.garry@orac...
--------------------------------
When we are near ENOSPC and don't have enough free space for an args->maxlen allocation, xfs_alloc_space_available() will trim args->maxlen to equal the available space. However, this function has only checked that there is enough contiguous free space for an aligned args->minlen allocation to succeed. Hence there is no guarantee that an args->maxlen allocation will succeed, nor that the available space will allow for correct alignment of an args->maxlen allocation.
Further, by trimming args->maxlen arbitrarily, it breaks an assumption made in xfs_alloc_fix_len() that if the caller wants aligned allocation, then args->maxlen will be set to an aligned value. It then skips the tail alignment and so we end up with extents that aren't aligned to extent size hint boundaries as we approach ENOSPC.
To avoid this problem, don't reduce args->maxlen by some random, arbitrary amount. If args->maxlen is too large for the available space, reduce the allocation to a minlen allocation as we know we have contiguous free space available for this to succeed and always be correctly aligned.
Signed-off-by: Dave Chinner dchinner@redhat.com Signed-off-by: John Garry john.g.garry@oracle.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 95bfe89651ad..19d89bf19b48 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2385,14 +2385,23 @@ xfs_alloc_space_available( if (available < (int)max(args->total, alloc_len)) return false;
+ if (flags & XFS_ALLOC_FLAG_CHECK) + return true; + /* - * Clamp maxlen to the amount of free space available for the actual - * extent allocation. + * If we can't do a maxlen allocation, then we must reduce the size of + * the allocation to match the available free space. We know how big + * the largest contiguous free space we can allocate is, so that's our + * upper bound. However, we don't exaclty know what alignment/size + * constraints have been placed on the allocation, so we can't + * arbitrarily select some new max size. Hence make this a minlen + * allocation as we know that will definitely succeed and match the + * callers alignment constraints. */ - if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) { - args->maxlen = available; + alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop; + if (longest < alloc_len) { + args->maxlen = args->minlen; ASSERT(args->maxlen > 0); - ASSERT(args->maxlen >= args->minlen); }
return true;
From: Dave Chinner dchinner@redhat.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA
Reference: https://lore.kernel.org/linux-xfs/20240705162450.3481169-1-john.g.garry@orac...
--------------------------------
When we do a large allocation, the core free space allocation code assumes that args->maxlen is aligned to args->prod/args->mod. hence if we get a maximum sized extent allocated, it does not do tail alignment of the extent.
However, this assumes that nothing modifies args->maxlen between the original allocation context setup and trimming the selected free space extent to size. This assumption has recently been found to be invalid - xfs_alloc_space_available() modifies args->maxlen in low space situations - and there may be more situations we haven't yet found like this.
Force aligned allocation introduces the requirement that extents are correctly tail aligned, resulting in this occasional latent alignment failure to be reclassified from an unimportant curiousity to a must-fix bug.
Removing the assumption about args->maxlen allocations always being tail aligned is trivial, and should not impact anything because args->maxlen for inodes with extent size hints configured are already aligned. Hence all this change does it avoid weird corner cases that would have resulted in unaligned extent sizes by always trimming the extent down to an aligned size.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: "Darrick J. Wong" djwong@kernel.org [provisional on v1 series comment] Signed-off-by: John Garry john.g.garry@oracle.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 19d89bf19b48..23c0e666d2f4 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -408,20 +408,18 @@ xfs_alloc_compute_diff( * Fix up the length, based on mod and prod. * len should be k * prod + mod for some k. * If len is too small it is returned unchanged. - * If len hits maxlen it is left alone. */ -STATIC void +static void xfs_alloc_fix_len( - xfs_alloc_arg_t *args) /* allocation argument structure */ + struct xfs_alloc_arg *args) { - xfs_extlen_t k; - xfs_extlen_t rlen; + xfs_extlen_t k; + xfs_extlen_t rlen = args->len;
ASSERT(args->mod < args->prod); - rlen = args->len; ASSERT(rlen >= args->minlen); ASSERT(rlen <= args->maxlen); - if (args->prod <= 1 || rlen < args->mod || rlen == args->maxlen || + if (args->prod <= 1 || rlen < args->mod || (args->mod == 0 && rlen < args->prod)) return; k = rlen % args->prod;
From: Dave Chinner dchinner@redhat.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA
Reference: https://lore.kernel.org/linux-xfs/20240705162450.3481169-1-john.g.garry@orac...
--------------------------------
If args->minlen is not aligned to the constraints of forced alignment, we may do minlen allocations that are not aligned when we approach ENOSPC. Avoid this by always aligning args->minlen appropriately. If alignment of minlen results in a value smaller than the alignment constraint, fail the allocation immediately.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: John Garry john.g.garry@oracle.com Conflicts: fs/xfs/libxfs/xfs_bmap.c [Conflicts in xfs_bmap_select_minlen()] Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_bmap.c | 31 ++++++++++++++++++++++++------- 1 file changed, 24 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 7682dfe2f701..284cc73b8bef 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -3253,32 +3253,47 @@ xfs_bmap_longest_free_extent( return error; }
-static void +static int xfs_bmap_select_minlen( struct xfs_bmalloca *ap, struct xfs_alloc_arg *args, xfs_extlen_t *blen, int notinit) { + xfs_extlen_t nlen = 0; + if (notinit || *blen < ap->minlen) { /* * Since we did a BUF_TRYLOCK above, it is possible that * there is space for this request. */ - args->minlen = ap->minlen; + nlen = ap->minlen; } else if (*blen < args->maxlen) { /* * If the best seen length is less than the request length, * use the best as the minimum. */ - args->minlen = *blen; + + nlen = *blen; } else { /* * Otherwise we've seen an extent as big as maxlen, use that * as the minimum. */ - args->minlen = args->maxlen; + nlen = args->maxlen; } + + if (args->alignment > 1) { + nlen = rounddown(nlen, args->alignment); + if (nlen < ap->minlen) { + if (xfs_inode_forcealign(ap->ip) && + (ap->datatype & XFS_ALLOC_USERDATA)) + return -ENOSPC; + nlen = ap->minlen; + } + } + args->minlen = nlen; + return 0; }
STATIC int @@ -3311,8 +3326,8 @@ xfs_bmap_btalloc_nullfb( break; }
- xfs_bmap_select_minlen(ap, args, blen, notinit); - return 0; + error = xfs_bmap_select_minlen(ap, args, blen, notinit); + return error; }
STATIC int @@ -3349,7 +3364,9 @@ xfs_bmap_btalloc_filestreams(
}
- xfs_bmap_select_minlen(ap, args, blen, notinit); + error = xfs_bmap_select_minlen(ap, args, blen, notinit); + if (error) + return error;
/* * Set the failure fallback case to look in the selected AG as stream
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA
--------------------------------
Set the minlen of extent allocation arguments to the alignment size to enforce alignment. This prevents the extent from being misaligned at the tail when we approach a no-space situation.
Although xfs_bmap_select_minlen updates args->minlen alignment, args-> minlen may revert to ap->minlen in subsequent extent allocations, which could violate the tail alignment.
Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_bmap.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 284cc73b8bef..34a603c4b2c7 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -3510,6 +3510,7 @@ xfs_bmap_btalloc( */ if (xfs_inode_forcealign(ap->ip) && align) { args.alignment = align; + args.minlen = ap->minlen = align; if (stripe_align == 0 || stripe_align % align) stripe_align = align; } else {
From: John Garry john.g.garry@oracle.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA
Reference: https://lore.kernel.org/linux-xfs/20240705162450.3481169-1-john.g.garry@orac...
--------------------------------
In xfs_bmap_process_allocated_extent(), for when we found that we could not provide the requested length completely, the mapping is moved so that we can provide as much as possible for the original request.
For forcealign, this would mean ignoring alignment guaranteed, so don't do this.
Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: John Garry john.g.garry@oracle.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_bmap.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 34a603c4b2c7..bf09737cfc71 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -3687,10 +3687,12 @@ xfs_bmap_btalloc( * very fragmented so we're unlikely to be able to satisfy the * hints anyway. */ - if (ap->length <= orig_length) - ap->offset = orig_offset; - else if (ap->offset + ap->length < orig_offset + orig_length) - ap->offset = orig_offset + orig_length - ap->length; + if (!(xfs_inode_forcealign(ap->ip) && align)) { + if (ap->length <= orig_length) + ap->offset = orig_offset; + else if (ap->offset + ap->length < orig_offset + orig_length) + ap->offset = orig_offset + orig_length - ap->length; + } xfs_bmap_btalloc_accounting(ap, &args); } else { ap->blkno = NULLFSBLOCK;
From: Zhang Yi yi.zhang@huawei.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA
Reference: https://lore.kernel.org/linux-fsdevel/20240529095206.2568162-1-yi.zhang@huaw...
--------------------------------
Add a new helper rem_u64() to only get the remainder of unsigned 64bit divide with 32bit divisor.
Signed-off-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Long Li leo.lilong@huawei.com --- include/linux/math64.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/include/linux/math64.h b/include/linux/math64.h index 66deb1fdc2ef..b5c4d1df08e5 100644 --- a/include/linux/math64.h +++ b/include/linux/math64.h @@ -3,6 +3,7 @@ #define _LINUX_MATH64_H
#include <linux/types.h> +#include <linux/log2.h> #include <vdso/math64.h> #include <asm/div64.h>
@@ -11,6 +12,20 @@ #define div64_long(x, y) div64_s64((x), (y)) #define div64_ul(x, y) div64_u64((x), (y))
+/** + * rem_u64 - remainder of unsigned 64bit divide with 32bit divisor + * @dividend: unsigned 64bit dividend + * @divisor: unsigned 32bit divisor + * + * Return: dividend % divisor + */ +static inline u32 rem_u64(u64 dividend, u32 divisor) +{ + if (is_power_of_2(divisor)) + return dividend & (divisor - 1); + return dividend % divisor; +} + /** * div_u64_rem - unsigned 64bit divide with 32bit divisor with remainder * @dividend: unsigned 64bit dividend @@ -85,6 +100,15 @@ static inline s64 div64_s64(s64 dividend, s64 divisor) #define div64_long(x, y) div_s64((x), (y)) #define div64_ul(x, y) div_u64((x), (y))
+#ifndef rem_u64 +static inline u32 rem_u64(u64 dividend, u32 divisor) +{ + if (is_power_of_2(divisor)) + return dividend & (divisor - 1); + return do_div(dividend, divisor); +} +#endif + #ifndef div_u64_rem static inline u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder) {
From: Zhang Yi yi.zhang@huawei.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA
Reference: https://lore.kernel.org/linux-fsdevel/20240529095206.2568162-1-yi.zhang@huaw...
--------------------------------
iomap_truncate_page() always assumes the block size of the truncating inode is i_blocksize(), this is not always true for some filesystems, e.g. XFS does extent size alignment for realtime inodes. Drop this assumption and pass the block size for zeroing into iomap_truncate_page(), allow filesystems to indicate the correct block size.
Suggested-by: Dave Chinner david@fromorbit.com Signed-off-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/iomap/buffered-io.c | 8 ++++---- fs/xfs/xfs_iops.c | 5 +++-- include/linux/iomap.h | 4 ++-- 3 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 45471ee7e919..0bb3257cba42 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -17,6 +17,7 @@ #include <linux/bio.h> #include <linux/sched/signal.h> #include <linux/migrate.h> +#include <linux/math64.h> #include "trace.h"
#include "../internal.h" @@ -1044,11 +1045,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, EXPORT_SYMBOL_GPL(iomap_zero_range);
int -iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, - const struct iomap_ops *ops) +iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize, + bool *did_zero, const struct iomap_ops *ops) { - unsigned int blocksize = i_blocksize(inode); - unsigned int off = pos & (blocksize - 1); + unsigned int off = rem_u64(pos, blocksize);
/* Block boundary? Nothing to do */ if (!off) diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index a527a544a684..092fb02d1a13 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -769,6 +769,7 @@ xfs_setattr_size( int error; uint lock_flags = 0; bool did_zeroing = false; + unsigned int blocksize = i_blocksize(inode);
ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL)); ASSERT(xfs_isilocked(ip, XFS_MMAPLOCK_EXCL)); @@ -830,8 +831,8 @@ xfs_setattr_size( newsize); if (error) return error; - error = iomap_truncate_page(inode, newsize, &did_zeroing, - &xfs_buffered_write_iomap_ops); + error = iomap_truncate_page(inode, newsize, blocksize, + &did_zeroing, &xfs_buffered_write_iomap_ops); }
if (error) diff --git a/include/linux/iomap.h b/include/linux/iomap.h index d14a729d40ce..1b6e22741d43 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -207,8 +207,8 @@ int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len, const struct iomap_ops *ops); int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, const struct iomap_ops *ops); -int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, - const struct iomap_ops *ops); +int iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize, + bool *did_zero, const struct iomap_ops *ops); vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops); int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA
--------------------------------
Now xfs_itruncate_extents() unmap extsize aligned extents for forcealign, when unaligned truncating down a forceflign file which extsize is bigger than one block, xfs_truncate_page() only zeros out the tail EOF block, this could expose stale data.
If we truncate file that contains a large enough written extent:
|< ext >|< ext >| ...WWWWWWWWWWWWWWWWWWWWWzzzzzzzzzzzz ^ (new EOF) ^ old EOF
Since we only zeros out the tail of the EOF block, and xfs_itruncate_extents() unmap the whole ailgned extents, it becomes this state:
|< ext >| ...WWWzWWWWWWWWWWWWW ^ new EOF
Then if we do an extending write like this, the blocks in the previous tail extent becomes stale:
|< ext >| |< ext >| ...WWWzSSSSSSSSSSSSS......WWWWWWWWWWWzzzzzz ^ old EOF ^ append start ^ new EOF
Fix this by zeroing out the tail allocation uint for forcealign.
Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_iops.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 092fb02d1a13..0640fc666d83 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -769,7 +769,7 @@ xfs_setattr_size( int error; uint lock_flags = 0; bool did_zeroing = false; - unsigned int blocksize = i_blocksize(inode); + unsigned int blocksize = 0;
ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL)); ASSERT(xfs_isilocked(ip, XFS_MMAPLOCK_EXCL)); @@ -777,6 +777,11 @@ xfs_setattr_size( ASSERT((iattr->ia_valid & (ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET| ATTR_MTIME_SET|ATTR_TIMES_SET)) == 0);
+ if (xfs_inode_forcealign(ip) && ip->i_d.di_extsize > 1) + blocksize = ip->i_d.di_extsize; + else + blocksize = i_blocksize(inode); + oldsize = inode->i_size; newsize = iattr->ia_size;
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA
--------------------------------
Reflink will not be supported for forcealign yet, so disallow a mount under this condition. This is because we have the limitation of pageache writeback not knowing how to writeback an entire allocation unut, after covert extent form cowfork to datafork, force alignment constraints may be break, so reject a mount with relink.
RT vol will not be supported for forcealign yet, so disallow a mount under this condition. It will be possible to support RT vol and forcealign in future. For this, the inode extsize must be a multiple of rtextsize - this is enforced already in xfs_ioctl_setattr_check_extsize() and xfs_inode_validate_extsize().
Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_super.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index d43f76a4b99a..f2ff547e760c 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1658,10 +1658,19 @@ xfs_fc_fill_super( } }
- if (xfs_has_forcealign(mp)) + if (xfs_has_forcealign(mp)) { xfs_warn(mp, "EXPERIMENTAL forced data extent alignment feature in use. Use at your own risk!");
+ if (xfs_has_realtime(mp)) { + xfs_alert(mp, + "forcealign not supported for realtime device!"); + error = -EINVAL; + goto out_filestream_unmount; + } + + } + if (xfs_has_atomicwrites(mp)) xfs_warn(mp, "EXPERIMENTAL atomicwrites feature in use. Use at your own risk!"); @@ -1674,6 +1683,14 @@ xfs_fc_fill_super( goto out_filestream_unmount; }
+ if (xfs_has_forcealign(mp)) { + xfs_alert(mp, + "reflink not compatible with forcealign!"); + error = -EINVAL; + goto out_filestream_unmount; + } + + if (xfs_globals.always_cow) { xfs_info(mp, "using DEBUG-only always_cow mode."); mp->m_always_cow = true;
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/10847 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/W...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/10847 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/W...
LGTM
在 2024/8/8 21:32, Long Li 写道:
From: Dave Chinner dchinner@redhat.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA
Reference: https://lore.kernel.org/linux-xfs/20240705162450.3481169-1-john.g.garry@orac...
When we are near ENOSPC and don't have enough free space for an args->maxlen allocation, xfs_alloc_space_available() will trim args->maxlen to equal the available space. However, this function has only checked that there is enough contiguous free space for an aligned args->minlen allocation to succeed. Hence there is no guarantee that an args->maxlen allocation will succeed, nor that the available space will allow for correct alignment of an args->maxlen allocation.
Further, by trimming args->maxlen arbitrarily, it breaks an assumption made in xfs_alloc_fix_len() that if the caller wants aligned allocation, then args->maxlen will be set to an aligned value. It then skips the tail alignment and so we end up with extents that aren't aligned to extent size hint boundaries as we approach ENOSPC.
To avoid this problem, don't reduce args->maxlen by some random, arbitrary amount. If args->maxlen is too large for the available space, reduce the allocation to a minlen allocation as we know we have contiguous free space available for this to succeed and always be correctly aligned.
Signed-off-by: Dave Chinner dchinner@redhat.com Signed-off-by: John Garry john.g.garry@oracle.com Signed-off-by: Long Li leo.lilong@huawei.com
fs/xfs/libxfs/xfs_alloc.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 95bfe89651ad..19d89bf19b48 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2385,14 +2385,23 @@ xfs_alloc_space_available( if (available < (int)max(args->total, alloc_len)) return false;
- if (flags & XFS_ALLOC_FLAG_CHECK)
return true;
- /*
* Clamp maxlen to the amount of free space available for the actual
* extent allocation.
* If we can't do a maxlen allocation, then we must reduce the size of
* the allocation to match the available free space. We know how big
* the largest contiguous free space we can allocate is, so that's our
* upper bound. However, we don't exaclty know what alignment/size
* constraints have been placed on the allocation, so we can't
* arbitrarily select some new max size. Hence make this a minlen
* allocation as we know that will definitely succeed and match the
*/* callers alignment constraints.
- if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) {
args->maxlen = available;
- alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop;
- if (longest < alloc_len) {
ASSERT(args->maxlen > 0);args->maxlen = args->minlen;
ASSERT(args->maxlen >= args->minlen);
}
return true;
LGTM
在 2024/8/8 21:32, Long Li 写道:
From: Dave Chinner dchinner@redhat.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA
Reference: https://lore.kernel.org/linux-xfs/20240705162450.3481169-1-john.g.garry@orac...
When we do a large allocation, the core free space allocation code assumes that args->maxlen is aligned to args->prod/args->mod. hence if we get a maximum sized extent allocated, it does not do tail alignment of the extent.
However, this assumes that nothing modifies args->maxlen between the original allocation context setup and trimming the selected free space extent to size. This assumption has recently been found to be invalid - xfs_alloc_space_available() modifies args->maxlen in low space situations - and there may be more situations we haven't yet found like this.
Force aligned allocation introduces the requirement that extents are correctly tail aligned, resulting in this occasional latent alignment failure to be reclassified from an unimportant curiousity to a must-fix bug.
Removing the assumption about args->maxlen allocations always being tail aligned is trivial, and should not impact anything because args->maxlen for inodes with extent size hints configured are already aligned. Hence all this change does it avoid weird corner cases that would have resulted in unaligned extent sizes by always trimming the extent down to an aligned size.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: "Darrick J. Wong" djwong@kernel.org [provisional on v1 series comment] Signed-off-by: John Garry john.g.garry@oracle.com Signed-off-by: Long Li leo.lilong@huawei.com
fs/xfs/libxfs/xfs_alloc.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 19d89bf19b48..23c0e666d2f4 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -408,20 +408,18 @@ xfs_alloc_compute_diff(
- Fix up the length, based on mod and prod.
- len should be k * prod + mod for some k.
- If len is too small it is returned unchanged.
*/
- If len hits maxlen it is left alone.
-STATIC void +static void xfs_alloc_fix_len(
- xfs_alloc_arg_t *args) /* allocation argument structure */
- struct xfs_alloc_arg *args) {
- xfs_extlen_t k;
- xfs_extlen_t rlen;
xfs_extlen_t k;
xfs_extlen_t rlen = args->len;
ASSERT(args->mod < args->prod);
- rlen = args->len; ASSERT(rlen >= args->minlen); ASSERT(rlen <= args->maxlen);
- if (args->prod <= 1 || rlen < args->mod || rlen == args->maxlen ||
- if (args->prod <= 1 || rlen < args->mod || (args->mod == 0 && rlen < args->prod)) return; k = rlen % args->prod;
LGTM
在 2024/8/8 21:32, Long Li 写道:
From: Dave Chinner dchinner@redhat.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA
Reference: https://lore.kernel.org/linux-xfs/20240705162450.3481169-1-john.g.garry@orac...
If args->minlen is not aligned to the constraints of forced alignment, we may do minlen allocations that are not aligned when we approach ENOSPC. Avoid this by always aligning args->minlen appropriately. If alignment of minlen results in a value smaller than the alignment constraint, fail the allocation immediately.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: John Garry john.g.garry@oracle.com Conflicts: fs/xfs/libxfs/xfs_bmap.c [Conflicts in xfs_bmap_select_minlen()] Signed-off-by: Long Li leo.lilong@huawei.com
fs/xfs/libxfs/xfs_bmap.c | 31 ++++++++++++++++++++++++------- 1 file changed, 24 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 7682dfe2f701..284cc73b8bef 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -3253,32 +3253,47 @@ xfs_bmap_longest_free_extent( return error; }
-static void +static int xfs_bmap_select_minlen( struct xfs_bmalloca *ap, struct xfs_alloc_arg *args, xfs_extlen_t *blen, int notinit) {
- xfs_extlen_t nlen = 0;
- if (notinit || *blen < ap->minlen) { /*
*/
- Since we did a BUF_TRYLOCK above, it is possible that
- there is space for this request.
args->minlen = ap->minlen;
} else if (*blen < args->maxlen) { /*nlen = ap->minlen;
*/
- If the best seen length is less than the request length,
- use the best as the minimum.
args->minlen = *blen;
} else { /*nlen = *blen;
*/
- Otherwise we've seen an extent as big as maxlen, use that
- as the minimum.
args->minlen = args->maxlen;
nlen = args->maxlen;
}
if (args->alignment > 1) {
nlen = rounddown(nlen, args->alignment);
if (nlen < ap->minlen) {
if (xfs_inode_forcealign(ap->ip) &&
(ap->datatype & XFS_ALLOC_USERDATA))
return -ENOSPC;
nlen = ap->minlen;
}
}
args->minlen = nlen;
return 0; }
STATIC int
@@ -3311,8 +3326,8 @@ xfs_bmap_btalloc_nullfb( break; }
- xfs_bmap_select_minlen(ap, args, blen, notinit);
- return 0;
error = xfs_bmap_select_minlen(ap, args, blen, notinit);
return error; }
STATIC int
@@ -3349,7 +3364,9 @@ xfs_bmap_btalloc_filestreams(
}
- xfs_bmap_select_minlen(ap, args, blen, notinit);
error = xfs_bmap_select_minlen(ap, args, blen, notinit);
if (error)
return error;
/*
- Set the failure fallback case to look in the selected AG as stream
LGTM
在 2024/8/8 21:32, Long Li 写道:
From: John Garry john.g.garry@oracle.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA
Reference: https://lore.kernel.org/linux-xfs/20240705162450.3481169-1-john.g.garry@orac...
In xfs_bmap_process_allocated_extent(), for when we found that we could not provide the requested length completely, the mapping is moved so that we can provide as much as possible for the original request.
For forcealign, this would mean ignoring alignment guaranteed, so don't do this.
Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: John Garry john.g.garry@oracle.com Signed-off-by: Long Li leo.lilong@huawei.com
fs/xfs/libxfs/xfs_bmap.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 34a603c4b2c7..bf09737cfc71 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -3687,10 +3687,12 @@ xfs_bmap_btalloc( * very fragmented so we're unlikely to be able to satisfy the * hints anyway. */
if (ap->length <= orig_length)
ap->offset = orig_offset;
else if (ap->offset + ap->length < orig_offset + orig_length)
ap->offset = orig_offset + orig_length - ap->length;
if (!(xfs_inode_forcealign(ap->ip) && align)) {
if (ap->length <= orig_length)
ap->offset = orig_offset;
else if (ap->offset + ap->length < orig_offset + orig_length)
ap->offset = orig_offset + orig_length - ap->length;
xfs_bmap_btalloc_accounting(ap, &args); } else { ap->blkno = NULLFSBLOCK;}
On 2024/8/8 21:32, Long Li wrote:
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA
Set the minlen of extent allocation arguments to the alignment size to enforce alignment. This prevents the extent from being misaligned at the tail when we approach a no-space situation.
Although xfs_bmap_select_minlen updates args->minlen alignment, args-> minlen may revert to ap->minlen in subsequent extent allocations, which could violate the tail alignment.
Signed-off-by: Long Li leo.lilong@huawei.com
fs/xfs/libxfs/xfs_bmap.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 284cc73b8bef..34a603c4b2c7 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -3510,6 +3510,7 @@ xfs_bmap_btalloc( */ if (xfs_inode_forcealign(ap->ip) && align) { args.alignment = align;
args.minlen = ap->minlen = align;
建议按照主线的方案,在force aligned的场景下bypass掉出问题的流程,而不是在 这里强制更新minlen。
if (stripe_align == 0 || stripe_align % align) stripe_align = align;
} else {