[PATCH OLK-5.10 v5 00/12] xfs: some fix for forcealign

Long Li

14 Oct 2024

8:54 p.m.

This patch set fixes several forcealign bugs:

patch 1 ~ 6  : fix tail alignment issues when approaching ENOSPC.
patch 7      : reject forcealign together with reflink or a realtime device.
patch 8      : only the data fork needs bunmap alignment for forcealign.
patch 9 ~ 12 : fix truncate for forcealign.

Dave Chinner (3):
  xfs: only allow minlen allocations when near ENOSPC
  xfs: always tail align maxlen allocations
  xfs: align args->minlen for forced allocation alignment

John Garry (1):
  xfs: Don't revert allocated offset for forcealign

Long Li (5):
  xfs: don't attempting non-aligned fallbacks alloc for forcealign
  xfs: simplify extent allocation alignment
  xfs: forcealign not compatible with reflink and realtime device
  xfs: only bunmap align in datafork for forcealign
  xfs: correct the truncate blocksize of forcealign

Zhang Yi (3):
  math64: add rem_u64() to just return the remainder
  iomap: pass blocksize to iomap_truncate_page()
  xfs: refactor the truncating order

 fs/iomap/buffered-io.c    |   8 +--
 fs/xfs/libxfs/xfs_alloc.c |  31 +++++----
 fs/xfs/libxfs/xfs_bmap.c  | 140 ++++++++++++++++++++------------------
 fs/xfs/xfs_iops.c         | 124 +++++++++++++++++----------------
 fs/xfs/xfs_super.c        |  19 +++++-
 include/linux/iomap.h     |   4 +-
 include/linux/math64.h    |  24 +++++++
 7 files changed, 208 insertions(+), 142 deletions(-)

--
2.39.2

Long Li

14 Oct

8:54 p.m.

New subject: [PATCH OLK-5.10 v5 01/12] xfs: only allow minlen allocations when near ENOSPC

From: Dave Chinner <dchinner@redhat.com>

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3
CVE: NA

Reference: https://lore.kernel.org/linux-xfs/20240705162450.3481169-1-john.g.garry@orac...

--------------------------------

When we are near ENOSPC and don't have enough free space for an
args->maxlen allocation, xfs_alloc_space_available() will trim
args->maxlen to equal the available space. However, this function
has only checked that there is enough contiguous free space for an
aligned args->minlen allocation to succeed. Hence there is no
guarantee that an args->maxlen allocation will succeed, nor that the
available space will allow for correct alignment of an args->maxlen
allocation.

Further, by trimming args->maxlen arbitrarily, it breaks an
assumption made in xfs_alloc_fix_len() that if the caller wants
aligned allocation, then args->maxlen will be set to an aligned
value. It then skips the tail alignment and so we end up with
extents that aren't aligned to extent size hint boundaries as we
approach ENOSPC.

To avoid this problem, don't reduce args->maxlen by some random,
arbitrary amount. If args->maxlen is too large for the available
space, reduce the allocation to a minlen allocation as we know we
have contiguous free space available for this to succeed and always
be correctly aligned.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 95bfe89651ad..19d89bf19b48 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2385,14 +2385,23 @@ xfs_alloc_space_available(
 	if (available < (int)max(args->total, alloc_len))
 		return false;
 
+	if (flags & XFS_ALLOC_FLAG_CHECK)
+		return true;
+
 	/*
-	 * Clamp maxlen to the amount of free space available for the actual
-	 * extent allocation.
+	 * If we can't do a maxlen allocation, then we must reduce the size of
+	 * the allocation to match the available free space. We know how big
+	 * the largest contiguous free space we can allocate is, so that's our
+	 * upper bound. However, we don't exaclty know what alignment/size
+	 * constraints have been placed on the allocation, so we can't
+	 * arbitrarily select some new max size. Hence make this a minlen
+	 * allocation as we know that will definitely succeed and match the
+	 * callers alignment constraints.
 	 */
-	if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) {
-		args->maxlen = available;
+	alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop;
+	if (longest < alloc_len) {
+		args->maxlen = args->minlen;
 		ASSERT(args->maxlen > 0);
-		ASSERT(args->maxlen >= args->minlen);
 	}
 
 	return true;
--
2.39.2
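The fallback rule in this patch can be illustrated with a small userspace sketch. This is not the kernel code: struct alloc_args and downgrade_to_minlen() are hypothetical stand-ins for the relevant fields of struct xfs_alloc_arg and for the tail of xfs_alloc_space_available(), and "longest" is assumed to be the longest contiguous free extent, in blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the struct xfs_alloc_arg fields used here. */
struct alloc_args {
	uint32_t minlen;       /* minimum acceptable extent length, in blocks */
	uint32_t maxlen;       /* preferred extent length, in blocks */
	uint32_t alignment;    /* required start alignment, in blocks (>= 1) */
	uint32_t minalignslop; /* extra slop reserved for a later aligned retry */
};

/*
 * If the longest free extent cannot hold an aligned maxlen allocation,
 * fall back to a minlen allocation instead of clamping maxlen to some
 * arbitrary (possibly unaligned) value.
 * Returns true when the request was downgraded.
 */
static bool downgrade_to_minlen(struct alloc_args *args, uint32_t longest)
{
	uint64_t alloc_len;

	assert(args->alignment >= 1);

	/* Worst-case space needed to place an aligned maxlen extent. */
	alloc_len = (uint64_t)args->maxlen + (args->alignment - 1) +
		    args->minalignslop;
	if (longest < alloc_len) {
		args->maxlen = args->minlen;
		assert(args->maxlen > 0);
		return true;
	}
	return false;
}
```

With minlen = 4, maxlen = 64 and alignment = 4, a longest free extent of 60 blocks downgrades the request to 4 blocks, while 67 blocks leaves maxlen untouched.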

Long Li

8:54 p.m.

New subject: [PATCH OLK-5.10 v5 02/12] xfs: always tail align maxlen allocations

From: Dave Chinner <dchinner@redhat.com>

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3
CVE: NA

Reference: https://lore.kernel.org/linux-xfs/20240705162450.3481169-1-john.g.garry@orac...

--------------------------------

When we do a large allocation, the core free space allocation code
assumes that args->maxlen is aligned to args->prod/args->mod. Hence
if we get a maximum sized extent allocated, it does not do tail
alignment of the extent.

However, this assumes that nothing modifies args->maxlen between the
original allocation context setup and trimming the selected free
space extent to size. This assumption has recently been found to be
invalid - xfs_alloc_space_available() modifies args->maxlen in low
space situations - and there may be more situations we haven't yet
found like this.

Force aligned allocation introduces the requirement that extents are
correctly tail aligned, resulting in this occasional latent
alignment failure being reclassified from an unimportant curiosity
to a must-fix bug.

Removing the assumption about args->maxlen allocations always being
tail aligned is trivial, and should not impact anything because
args->maxlen for inodes with extent size hints configured is
already aligned. Hence all this change does is avoid weird corner
cases that would have resulted in unaligned extent sizes by always
trimming the extent down to an aligned size.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> [provisional on v1 series comment]
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 19d89bf19b48..23c0e666d2f4 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -408,20 +408,18 @@ xfs_alloc_compute_diff(
  * Fix up the length, based on mod and prod.
  * len should be k * prod + mod for some k.
  * If len is too small it is returned unchanged.
- * If len hits maxlen it is left alone.
  */
-STATIC void
+static void
 xfs_alloc_fix_len(
-	xfs_alloc_arg_t	*args)		/* allocation argument structure */
+	struct xfs_alloc_arg	*args)
 {
-	xfs_extlen_t	k;
-	xfs_extlen_t	rlen;
+	xfs_extlen_t		k;
+	xfs_extlen_t		rlen = args->len;
 
 	ASSERT(args->mod < args->prod);
-	rlen = args->len;
 	ASSERT(rlen >= args->minlen);
 	ASSERT(rlen <= args->maxlen);
-	if (args->prod <= 1 || rlen < args->mod || rlen == args->maxlen ||
+	if (args->prod <= 1 || rlen < args->mod ||
 	    (args->mod == 0 && rlen < args->prod))
 		return;
 	k = rlen % args->prod;
--
2.39.2
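The "len should be k * prod + mod" rule that this patch now applies unconditionally can be sketched in plain C. This is a simplified userspace illustration, not the kernel function: fix_len() is a hypothetical name, it takes plain integers instead of struct xfs_alloc_arg, and it returns the adjusted length instead of storing it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Trim len down so that len == k * prod + mod for some k.
 * The length is returned unchanged when no trimming is needed, when it is
 * too small to trim, or when trimming would drop it below minlen.
 * Note there is no "len == maxlen" exception: a maximum-sized extent is
 * tail aligned like any other.
 */
static uint32_t fix_len(uint32_t len, uint32_t minlen,
			uint32_t prod, uint32_t mod)
{
	uint32_t k;
	int64_t rlen;

	assert(mod < prod);
	if (prod <= 1 || len < mod || (mod == 0 && len < prod))
		return len;

	k = len % prod;
	if (k == mod)
		return len;
	if (k > mod)
		rlen = (int64_t)len - (k - mod);
	else
		rlen = (int64_t)len - prod + (mod - k);

	/* Signed arithmetic catches a length that would underflow minlen. */
	if (rlen < (int64_t)minlen)
		return len;
	return (uint32_t)rlen;
}
```

For example, fix_len(13, 1, 4, 0) trims a 13-block extent to 12 blocks, and fix_len(10, 1, 4, 3) trims 10 blocks to 7 (7 = 1 * 4 + 3).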

Long Li

8:54 p.m.

New subject: [PATCH OLK-5.10 v5 03/12] xfs: align args->minlen for forced allocation alignment

From: Dave Chinner <dchinner@redhat.com>

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3
CVE: NA

Reference: https://lore.kernel.org/linux-xfs/20240705162450.3481169-1-john.g.garry@orac...

--------------------------------

If args->minlen is not aligned to the constraints of forced
alignment, we may do minlen allocations that are not aligned when we
approach ENOSPC. Avoid this by always aligning args->minlen
appropriately. If alignment of minlen results in a value smaller
than the alignment constraint, fail the allocation immediately.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>

Conflicts:
	fs/xfs/libxfs/xfs_bmap.c
[Conflicts in xfs_bmap_select_minlen()]
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7682dfe2f701..284cc73b8bef 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3253,32 +3253,47 @@ xfs_bmap_longest_free_extent(
 	return error;
 }
 
-static void
+static int
 xfs_bmap_select_minlen(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
 	xfs_extlen_t		*blen,
 	int			notinit)
 {
+	xfs_extlen_t		nlen = 0;
+
 	if (notinit || *blen < ap->minlen) {
 		/*
 		 * Since we did a BUF_TRYLOCK above, it is possible that
 		 * there is space for this request.
 		 */
-		args->minlen = ap->minlen;
+		nlen = ap->minlen;
 	} else if (*blen < args->maxlen) {
 		/*
 		 * If the best seen length is less than the request length,
 		 * use the best as the minimum.
 		 */
-		args->minlen = *blen;
+
+		nlen = *blen;
 	} else {
 		/*
 		 * Otherwise we've seen an extent as big as maxlen, use that
 		 * as the minimum.
 		 */
-		args->minlen = args->maxlen;
+		nlen = args->maxlen;
 	}
+
+	if (args->alignment > 1) {
+		nlen = rounddown(nlen, args->alignment);
+		if (nlen < ap->minlen) {
+			if (xfs_inode_forcealign(ap->ip) &&
+			    (ap->datatype & XFS_ALLOC_USERDATA))
+				return -ENOSPC;
+			nlen = ap->minlen;
+		}
+	}
+	args->minlen = nlen;
+	return 0;
 }
 
 STATIC int
@@ -3311,8 +3326,8 @@ xfs_bmap_btalloc_nullfb(
 			break;
 	}
 
-	xfs_bmap_select_minlen(ap, args, blen, notinit);
-	return 0;
+	error = xfs_bmap_select_minlen(ap, args, blen, notinit);
+	return error;
 }
 
 STATIC int
@@ -3349,7 +3364,9 @@ xfs_bmap_btalloc_filestreams(
 
 	}
 
-	xfs_bmap_select_minlen(ap, args, blen, notinit);
+	error = xfs_bmap_select_minlen(ap, args, blen, notinit);
+	if (error)
+		return error;
 
 	/*
 	 * Set the failure fallback case to look in the selected AG as stream
--
2.39.2
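The minlen alignment step added here can be shown with a short userspace sketch. select_minlen() and its parameters are hypothetical: "candidate" stands for the length chosen by the three-way branch above, "floor_minlen" for ap->minlen, and "strict" for the combined "forcealign inode and user data" condition.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Round the candidate minimum length down to the start alignment.
 * If that drops it below the caller's floor, a strict (forcealign) request
 * fails with -ENOSPC, while any other request falls back to the unaligned
 * floor.
 */
static int select_minlen(uint32_t candidate, uint32_t floor_minlen,
			 uint32_t alignment, bool strict, uint32_t *minlen)
{
	uint32_t nlen = candidate;

	assert(minlen);
	if (alignment > 1) {
		nlen -= nlen % alignment;	/* rounddown(nlen, alignment) */
		if (nlen < floor_minlen) {
			if (strict)
				return -ENOSPC;
			nlen = floor_minlen;
		}
	}
	*minlen = nlen;
	return 0;
}
```

With an alignment of 4 blocks, a candidate of 10 becomes 8; a candidate of 3 rounds down to 0, which fails for a strict request and falls back to the floor otherwise.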

Long Li

8:54 p.m.

New subject: [PATCH OLK-5.10 v5 04/12] xfs: Don't revert allocated offset for forcealign

From: John Garry <john.g.garry@oracle.com>

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3
CVE: NA

Reference: https://lore.kernel.org/linux-xfs/20240705162450.3481169-1-john.g.garry@orac...

--------------------------------

In xfs_bmap_process_allocated_extent(), when we find that we cannot
provide the requested length completely, the mapping is moved so that
we can provide as much as possible of the original request.

For forcealign, this would mean ignoring the alignment guarantee, so
don't do this.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 284cc73b8bef..35728779b0d6 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3686,10 +3686,12 @@ xfs_bmap_btalloc(
 		 * very fragmented so we're unlikely to be able to satisfy the
 		 * hints anyway.
 		 */
-		if (ap->length <= orig_length)
-			ap->offset = orig_offset;
-		else if (ap->offset + ap->length < orig_offset + orig_length)
-			ap->offset = orig_offset + orig_length - ap->length;
+		if (!(xfs_inode_forcealign(ap->ip) && align)) {
+			if (ap->length <= orig_length)
+				ap->offset = orig_offset;
+			else if (ap->offset + ap->length < orig_offset + orig_length)
+				ap->offset = orig_offset + orig_length - ap->length;
+		}
 		xfs_bmap_btalloc_accounting(ap, &args);
 	} else {
 		ap->blkno = NULLFSBLOCK;
--
2.39.2

Long Li

8:54 p.m.

New subject: [PATCH OLK-5.10 v5 05/12] xfs: don't attempting non-aligned fallbacks alloc for forcealign

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3
CVE: NA

--------------------------------

When forced allocation alignment is specified, the extent will be
aligned to the extent size hint size rather than stripe alignment.
If aligned allocation cannot be done, then the allocation is failed
rather than attempting non-aligned fallbacks.

Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 35728779b0d6..89f843a63842 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3565,7 +3565,13 @@ xfs_bmap_btalloc(
 	 * is only set if the allocation length is >= the stripe unit and the
 	 * allocation offset is at the end of file.
 	 */
-	if (!(ap->tp->t_flags & XFS_TRANS_LOWMODE) && ap->aeof) {
+	args.minalignslop = 0;
+	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
+		if (args.alignment > 1 && xfs_inode_forcealign(ap->ip)) {
+			args.fsbno = NULLFSBLOCK;
+			goto alloc_out;
+		}
+	} else if (ap->aeof) {
 		if (!ap->offset) {
 			args.alignment = stripe_align;
 			atype = args.type;
@@ -3577,7 +3583,6 @@ xfs_bmap_btalloc(
 			if (blen > args.alignment &&
 			    blen <= args.maxlen + args.alignment)
 				args.minlen = blen - args.alignment;
-			args.minalignslop = 0;
 		} else {
 			/*
 			 * First try an exact bno allocation.
@@ -3604,8 +3609,6 @@ xfs_bmap_btalloc(
 			else
 				args.minalignslop = 0;
 		}
-	} else {
-		args.minalignslop = 0;
 	}
 	args.postallocs = 1;
 	args.minleft = ap->minleft;
@@ -3632,8 +3635,16 @@ xfs_bmap_btalloc(
 			return error;
 	}
 
-	if (isaligned && args.fsbno == NULLFSBLOCK &&
-	    (args.alignment <= 1 || !xfs_inode_forcealign(ap->ip))) {
+	if (args.fsbno == NULLFSBLOCK && args.alignment > 1 &&
+	    xfs_inode_forcealign(ap->ip)) {
+		/*
+		 * Don't attempting non-aligned fallbacks alloc
+		 * for forcealign
+		 */
+		goto alloc_out;
+	}
+
+	if (isaligned && args.fsbno == NULLFSBLOCK) {
 		/*
 		 * allocation failed, so turn off alignment and
 		 * try again.
@@ -3660,6 +3671,8 @@ xfs_bmap_btalloc(
 			return error;
 		ap->tp->t_flags |= XFS_TRANS_LOWMODE;
 	}
+
+alloc_out:
 	if (args.fsbno != NULLFSBLOCK) {
 		/*
 		 * check the allocation happened at the same or higher AG than
--
2.39.2

Long Li

8:54 p.m.

New subject: [PATCH OLK-5.10 v5 06/12] xfs: simplify extent allocation alignment

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3
CVE: NA

--------------------------------

We currently align extent allocation to stripe unit or stripe width.
That is specified by an external parameter to the allocation code,
which then manipulates the xfs_alloc_args alignment configuration in
interesting ways.

The args->alignment field specifies extent start alignment, but
because we may be attempting non-aligned allocation first there are
also slop variables that allow for those allocation attempts to
account for aligned allocation if they fail.

This gets much more complex as we introduce forced allocation
alignment, where extent size hints are used to generate the extent
start alignment. Extent size hints currently only affect extent
lengths (via args->prod and args->mod) and so with this change we
will have two different start alignment conditions.

Avoid this complexity by always using args->alignment to indicate
extent start alignment, and always using args->prod/mod to indicate
extent length adjustment.

Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 78 ++++++++++++++--------------------------
 1 file changed, 27 insertions(+), 51 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 89f843a63842..9e6a609450fe 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3262,6 +3262,10 @@ xfs_bmap_select_minlen(
 {
 	xfs_extlen_t		nlen = 0;
 
+	/* Adjust best length for extent start alignment. */
+	if (*blen > args->alignment)
+		*blen -= args->alignment;
+
 	if (notinit || *blen < ap->minlen) {
 		/*
 		 * Since we did a BUF_TRYLOCK above, it is possible that
@@ -3436,9 +3440,8 @@ xfs_bmap_btalloc(
 	xfs_fileoff_t	orig_offset;
 	xfs_extlen_t	orig_length;
 	xfs_extlen_t	blen;
-	xfs_extlen_t	nextminlen = 0;
+	xfs_extlen_t	alignment;
 	int		nullfb;		/* true if ap->firstblock isn't set */
-	int		isaligned;
 	int		tryagain;
 	int		error;
 	int		stripe_align;
@@ -3497,7 +3500,7 @@ xfs_bmap_btalloc(
 	/*
 	 * Normal allocation, done through xfs_alloc_vextent.
 	 */
-	tryagain = isaligned = 0;
+	tryagain = 0;
 	memset(&args, 0, sizeof(args));
 	args.tp = ap->tp;
 	args.mp = mp;
@@ -3508,13 +3511,12 @@ xfs_bmap_btalloc(
 	 * xfs_get_cowextsz_hint() returns extsz_hint for when forcealign is
 	 * set as forcealign and cowextsz_hint are mutually exclusive
 	 */
-	if (xfs_inode_forcealign(ap->ip) && align) {
+	if (xfs_inode_forcealign(ap->ip))
 		args.alignment = align;
-		if (stripe_align == 0 || stripe_align % align)
-			stripe_align = align;
-	} else {
+	else if (stripe_align)
+		args.alignment = stripe_align;
+	else
 		args.alignment = 1;
-	}
 
 	/* Trim the allocation back to the maximum an AG can fit. */
 	args.maxlen = min(ap->length, mp->m_ag_max_usable);
@@ -3571,44 +3573,21 @@ xfs_bmap_btalloc(
 			args.fsbno = NULLFSBLOCK;
 			goto alloc_out;
 		}
-	} else if (ap->aeof) {
-		if (!ap->offset) {
-			args.alignment = stripe_align;
-			atype = args.type;
-			isaligned = 1;
-			/*
-			 * Adjust minlen to try and preserve alignment if we
-			 * can't guarantee an aligned maxlen extent.
-			 */
-			if (blen > args.alignment &&
-			    blen <= args.maxlen + args.alignment)
-				args.minlen = blen - args.alignment;
-		} else {
-			/*
-			 * First try an exact bno allocation.
-			 * If it fails then do a near or start bno
-			 * allocation with alignment turned on.
-			 */
-			atype = args.type;
-			tryagain = 1;
-			args.type = XFS_ALLOCTYPE_THIS_BNO;
-			/*
-			 * Compute the minlen+alignment for the
-			 * next case.  Set slop so that the value
-			 * of minlen+alignment+slop doesn't go up
-			 * between the calls.
-			 */
-			if (blen > stripe_align && blen <= args.maxlen)
-				nextminlen = blen - stripe_align;
-			else
-				nextminlen = args.minlen;
-			if (nextminlen + stripe_align > args.minlen + 1)
-				args.minalignslop =
-					nextminlen + stripe_align -
-					args.minlen - 1;
-			else
-				args.minalignslop = 0;
-		}
+		args.alignment = 1;
+	} else if (ap->aeof && ap->offset) {
+		/*
+		 * First try an exact bno allocation.
+		 * If it fails then do a near or start bno
+		 * allocation with alignment turned on.
+		 */
+		alignment = args.alignment;
+		atype = args.type;
+		tryagain = 1;
+		args.type = XFS_ALLOCTYPE_THIS_BNO;
+		args.fsbno = ap->blkno;
+
+		args.alignment = 1;
+		args.minalignslop = alignment - args.alignment;
 	}
 	args.postallocs = 1;
 	args.minleft = ap->minleft;
@@ -3627,10 +3606,8 @@ xfs_bmap_btalloc(
 		 */
 		args.type = atype;
 		args.fsbno = ap->blkno;
-		args.alignment = stripe_align;
-		args.minlen = nextminlen;
+		args.alignment = alignment;
 		args.minalignslop = 0;
-		isaligned = 1;
 		if ((error = xfs_alloc_vextent(&args)))
 			return error;
 	}
@@ -3644,12 +3621,11 @@ xfs_bmap_btalloc(
 		goto alloc_out;
 	}
 
-	if (isaligned && args.fsbno == NULLFSBLOCK) {
+	if (args.alignment > 1 && args.fsbno == NULLFSBLOCK) {
 		/*
 		 * allocation failed, so turn off alignment and
 		 * try again.
 		 */
-		args.type = atype;
 		args.fsbno = ap->blkno;
 		args.alignment = 0;
 		if ((error = xfs_alloc_vextent(&args)))
--
2.39.2

Long Li

8:54 p.m.

New subject: [PATCH OLK-5.10 v5 07/12] xfs: forcealign not compatible with reflink and realtime device

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3
CVE: NA

--------------------------------

Reflink is not yet supported together with forcealign, so disallow a
mount under this condition. Pagecache writeback does not know how to
write back an entire allocation unit, so after an extent is converted
from the CoW fork to the data fork the forced alignment constraints
may be broken. Reject a mount with reflink for this reason.

RT vol is not yet supported for forcealign either, so disallow a
mount under this condition. It will be possible to support RT vol and
forcealign in future. For this, the inode extsize must be a multiple
of rtextsize - this is enforced already in
xfs_ioctl_setattr_check_extsize() and xfs_inode_validate_extsize().

Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/xfs_super.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index d43f76a4b99a..f2ff547e760c 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1658,10 +1658,19 @@ xfs_fc_fill_super(
 		}
 	}
 
-	if (xfs_has_forcealign(mp))
+	if (xfs_has_forcealign(mp)) {
 		xfs_warn(mp,
 "EXPERIMENTAL forced data extent alignment feature in use. Use at your own risk!");
 
+		if (xfs_has_realtime(mp)) {
+			xfs_alert(mp,
+	"forcealign not supported for realtime device!");
+			error = -EINVAL;
+			goto out_filestream_unmount;
+		}
+
+	}
+
 	if (xfs_has_atomicwrites(mp))
 		xfs_warn(mp,
 "EXPERIMENTAL atomicwrites feature in use. Use at your own risk!");
@@ -1674,6 +1683,14 @@ xfs_fc_fill_super(
 			goto out_filestream_unmount;
 		}
 
+		if (xfs_has_forcealign(mp)) {
+			xfs_alert(mp,
+	"reflink not compatible with forcealign!");
+			error = -EINVAL;
+			goto out_filestream_unmount;
+		}
+
+
 		if (xfs_globals.always_cow) {
 			xfs_info(mp, "using DEBUG-only always_cow mode.");
 			mp->m_always_cow = true;
--
2.39.2

Long Li

8:54 p.m.

New subject: [PATCH OLK-5.10 v5 08/12] xfs: only bunmap align in datafork for forcealign

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3
CVE: NA

--------------------------------

For forcealign, bunmap only needs to be aligned in the data fork, so
add an extra data fork check.

Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 9e6a609450fe..1323259192d6 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5297,7 +5297,7 @@ __xfs_bunmapi(
 	isrt = (whichfork == XFS_DATA_FORK) && XFS_IS_REALTIME_INODE(ip);
 	end = start + len;
 	if (xfs_inode_forcealign(ip) && ip->i_d.di_extsize > 1
-			&& S_ISREG(VFS_I(ip)->i_mode)) {
+			&& S_ISREG(VFS_I(ip)->i_mode) && whichfork == XFS_DATA_FORK) {
 		start = roundup_64(start, ip->i_d.di_extsize);
 		end = rounddown_64(end, ip->i_d.di_extsize);
 		len = end - start;
--
2.39.2
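The range adjustment guarded by this check shrinks the unmap range inward to extsize boundaries. A minimal userspace sketch of that arithmetic follows; struct blk_range and align_unmap_range() are hypothetical names, all values are in filesystem blocks, and clamping an empty result to a zero length is a choice made for this sketch rather than something taken from the hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a block range [start, start + len). */
struct blk_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Shrink [start, start + len) inward so that both ends sit on extsize
 * boundaries: the start is rounded up and the end is rounded down.
 * If no whole aligned extent fits, the returned length is 0.
 */
static struct blk_range align_unmap_range(uint64_t start, uint64_t len,
					  uint32_t extsize)
{
	struct blk_range r;
	uint64_t end = start + len;
	uint64_t astart, aend;

	assert(extsize > 0);
	astart = (start + extsize - 1) / extsize * extsize;	/* round up */
	aend = end / extsize * extsize;				/* round down */

	r.start = astart;
	r.len = aend > astart ? aend - astart : 0;
	return r;
}
```

With an extsize of 8 blocks, the range [5, 25) shrinks to [8, 24), while the range [9, 14) contains no whole aligned extent and comes back with a length of 0.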

Long Li

8:54 p.m.

New subject: [PATCH OLK-5.10 v5 09/12] math64: add rem_u64() to just return the remainder

Long Li

8:54 p.m.

New subject: [PATCH OLK-5.10 v5 10/12] iomap: pass blocksize to iomap_truncate_page()

Long Li

8:54 p.m.

New subject: [PATCH OLK-5.10 v5 11/12] xfs: refactor the truncating order

From: Zhang Yi <yi.zhang@huawei.com>

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3
CVE: NA

Reference: https://lore.kernel.org/linux-fsdevel/20240529095206.2568162-1-yi.zhang@huaw...

--------------------------------

When truncating down an inode, we call xfs_truncate_page() to zero out
the tail partial block that is beyond the new EOF, which prevents
exposing stale data. But xfs_truncate_page() always assumes the
blocksize is i_blocksize(inode), which is not always true: if a file
has a large allocation unit we should align to that unit size, e.g. a
realtime inode should align to the rtextsize.

The current xfs_setattr_size() can't support zeroing out a large
alignment size on truncate down since the processing order is wrong.
We first zero out through xfs_truncate_page(), and then update the
inode size through truncate_setsize() immediately. If the zeroed range
is larger than a folio, the writeback path would not write back zeroed
pagecache beyond the EOF folio, so it doesn't write zeroes to the
entire tail extent and could expose stale data after an appending
write into the next aligned extent.

We need to adjust the order to zero out tail aligned blocks, write
back zeroed or cached data, update i_size and drop the cache beyond
the aligned EOF block, preparing for the fix of realtime inodes and
supporting the upcoming forced alignment feature.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/xfs_iops.c | 117 ++++++++++++++++++++++++----------------------
 1 file changed, 61 insertions(+), 56 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 092fb02d1a13..d8caa2f495ff 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -769,7 +769,7 @@ xfs_setattr_size(
 	int			error;
 	uint			lock_flags = 0;
 	bool			did_zeroing = false;
-	unsigned int		blocksize = i_blocksize(inode);
+	bool			write_back = false;
 
 	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
 	ASSERT(xfs_isilocked(ip, XFS_MMAPLOCK_EXCL));
@@ -806,21 +806,10 @@ xfs_setattr_size(
 	 */
 	inode_dio_wait(inode);
 
-	/*
-	 * File data changes must be complete before we start the transaction to
-	 * modify the inode.  This needs to be done before joining the inode to
-	 * the transaction because the inode cannot be unlocked once it is a
-	 * part of the transaction.
-	 *
-	 * Start with zeroing any data beyond EOF that we may expose on file
-	 * extension, or zeroing out the rest of the block on a downward
-	 * truncate.
-	 */
-	if (newsize > oldsize) {
-		trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
-		error = iomap_zero_range(inode, oldsize, newsize - oldsize,
-				&did_zeroing, &xfs_buffered_write_iomap_ops);
-	} else {
+	write_back = newsize > ip->i_d.di_size && oldsize != ip->i_d.di_size;
+	if (newsize < oldsize) {
+		unsigned int blocksize = i_blocksize(inode);
+
 		/*
 		 * iomap won't detect a dirty page over an unwritten block (or a
 		 * cow block over a hole) and subsequently skips zeroing the
@@ -828,53 +817,69 @@ xfs_setattr_size(
 		 * convert the block before the pagecache truncate.
 		 */
 		error = filemap_write_and_wait_range(inode->i_mapping, newsize,
-						     newsize);
+					roundup_64(newsize, blocksize) - 1);
 		if (error)
 			return error;
+
 		error = iomap_truncate_page(inode, newsize, blocksize,
 				&did_zeroing, &xfs_buffered_write_iomap_ops);
-	}
+		if (error)
+			return error;
+		/*
+		 * We are going to log the inode size change in this transaction
+		 * so any previous writes that are beyond the on disk EOF and
+		 * the new EOF that have not been written out need to be written
+		 * here.  If we do not write the data out, we expose ourselves
+		 * to the null files problem. Note that this includes any block
+		 * zeroing we did above; otherwise those blocks may not be
+		 * zeroed after a crash.
+		 */
+		if (did_zeroing || write_back) {
+			error = filemap_write_and_wait_range(inode->i_mapping,
+					min_t(loff_t, ip->i_d.di_size, newsize),
+					roundup_64(newsize, blocksize) - 1);
+			if (error)
+				return error;
+		}
 
-	if (error)
-		return error;
+		/*
+		 * Updating i_size after writing back to make sure the zeroed
+		 * blocks could been written out, and drop all the page cache
+		 * range that beyond blocksize aligned new EOF block.
+		 *
+		 * We've already locked out new page faults, so now we can
+		 * safely remove pages from the page cache knowing they won't
+		 * get refaulted until we drop the XFS_MMAP_EXCL lock after the
+		 * extent manipulations are complete.
+		 */
+		i_size_write(inode, newsize);
+		truncate_pagecache(inode, roundup_64(newsize, blocksize));
+	} else {
+		/*
+		 * Start with zeroing any data beyond EOF that we may expose on
+		 * file extension.
+		 */
+		if (newsize > oldsize) {
+			trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
+			error = iomap_zero_range(inode, oldsize, newsize - oldsize,
+					&did_zeroing, &xfs_buffered_write_iomap_ops);
+			if (error)
+				return error;
+		}
 
-	/*
-	 * We've already locked out new page faults, so now we can safely remove
-	 * pages from the page cache knowing they won't get refaulted until we
-	 * drop the XFS_MMAP_EXCL lock after the extent manipulations are
-	 * complete. The truncate_setsize() call also cleans partial EOF page
-	 * PTEs on extending truncates and hence ensures sub-page block size
-	 * filesystems are correctly handled, too.
-	 *
-	 * We have to do all the page cache truncate work outside the
-	 * transaction context as the "lock" order is page lock->log space
-	 * reservation as defined by extent allocation in the writeback path.
-	 * Hence a truncate can fail with ENOMEM from xfs_trans_alloc(), but
-	 * having already truncated the in-memory version of the file (i.e. made
-	 * user visible changes). There's not much we can do about this, except
-	 * to hope that the caller sees ENOMEM and retries the truncate
-	 * operation.
-	 *
-	 * And we update in-core i_size and truncate page cache beyond newsize
-	 * before writeback the [di_size, newsize] range, so we're guaranteed
-	 * not to write stale data past the new EOF on truncate down.
-	 */
-	truncate_setsize(inode, newsize);
+		/*
+		 * The truncate_setsize() call also cleans partial EOF page
+		 * PTEs on extending truncates and hence ensures sub-page block
+		 * size filesystems are correctly handled, too.
+		 */
+		truncate_setsize(inode, newsize);
 
-	/*
-	 * We are going to log the inode size change in this transaction so
-	 * any previous writes that are beyond the on disk EOF and the new
-	 * EOF that have not been written out need to be written here.  If we
-	 * do not write the data out, we expose ourselves to the null files
-	 * problem. Note that this includes any block zeroing we did above;
-	 * otherwise those blocks may not be zeroed after a crash.
-	 */
-	if (did_zeroing ||
-	    (newsize > ip->i_d.di_size && oldsize != ip->i_d.di_size)) {
-		error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
-						ip->i_d.di_size, newsize - 1);
-		if (error)
-			return error;
+		if (did_zeroing || write_back) {
+			error = filemap_write_and_wait_range(inode->i_mapping,
+					ip->i_d.di_size, newsize - 1);
+			if (error)
+				return error;
+		}
 	}
 
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
--
2.39.2
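The byte range that has to be zeroed and written back before i_size is updated on a truncate down can be sketched as follows. This is a userspace illustration under stated assumptions: struct byte_range, roundup_u64() and tail_zero_range() are hypothetical names, and "blocksize" is the allocation unit in bytes (one filesystem block in this patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: an inclusive byte range [first, last]. */
struct byte_range {
	uint64_t first;
	uint64_t last;
};

/* Round x up to the next multiple of y (y > 0). */
static uint64_t roundup_u64(uint64_t x, uint32_t y)
{
	return (x + y - 1) / y * y;
}

/*
 * Compute the tail range between the new EOF and the end of the
 * allocation unit that contains it. Returns false when the new EOF is
 * already aligned, in which case there is no partial tail to zero.
 */
static bool tail_zero_range(uint64_t newsize, uint32_t blocksize,
			    struct byte_range *r)
{
	uint64_t aligned_end;

	assert(blocksize > 0);
	aligned_end = roundup_u64(newsize, blocksize);
	if (aligned_end == newsize)
		return false;

	r->first = newsize;
	r->last = aligned_end - 1;	/* matches roundup_64(newsize, blocksize) - 1 */
	return true;
}
```

Truncating to 5000 bytes with a 4096-byte unit gives the range [5000, 8191]; with a 16384-byte unit the same truncate covers [5000, 16383], which is why the zeroed data must be written back before the page cache beyond the aligned EOF is dropped.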

Long Li

8:54 p.m.

New subject: [PATCH OLK-5.10 v5 12/12] xfs: correct the truncate blocksize of forcealign

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3
CVE: NA

--------------------------------

xfs_itruncate_extents() now unmaps extsize-aligned extents for
forcealign. When truncating down a forcealign file to an unaligned
size, where the extsize is bigger than one block, xfs_truncate_page()
only zeroes out the tail EOF block, which could expose stale data.

If we truncate a file that contains a large enough written extent:

   |<     ext     >|<     ext     >|
...WWWWWWWWWWWWWWWWWWWWWzzzzzzzzzzzz
      ^ (new EOF)       ^ old EOF

Since we only zero out the tail of the EOF block, and
xfs_itruncate_extents() unmaps whole aligned extents, it becomes
this state:

   |<     ext     >|
...WWWzWWWWWWWWWWWWW
      ^ new EOF

Then if we do an extending write like this, the blocks in the previous
tail extent become stale:

   |<     ext     >|      |<     ext     >|
...WWWzSSSSSSSSSSSSS......WWWWWWWWWWWzzzzzz
      ^ old EOF           ^ append start  ^ new EOF

Fix this by zeroing out the tail allocation unit for forcealign.

Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/xfs_iops.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index d8caa2f495ff..30dc960951ca 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -770,6 +770,7 @@ xfs_setattr_size(
 	uint			lock_flags = 0;
 	bool			did_zeroing = false;
 	bool			write_back = false;
+	unsigned int		blocksize = 0;
 
 	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
 	ASSERT(xfs_isilocked(ip, XFS_MMAPLOCK_EXCL));
@@ -777,6 +778,11 @@ xfs_setattr_size(
 	ASSERT((iattr->ia_valid & (ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET|
 		ATTR_MTIME_SET|ATTR_TIMES_SET)) == 0);
 
+	if (xfs_inode_forcealign(ip) && ip->i_d.di_extsize > 1)
+		blocksize = ip->i_d.di_extsize << i_blocksize(inode);
+	else
+		blocksize = i_blocksize(inode);
+
 	oldsize = inode->i_size;
 	newsize = iattr->ia_size;
 
@@ -808,8 +814,6 @@ xfs_setattr_size(
 
 	write_back = newsize > ip->i_d.di_size && oldsize != ip->i_d.di_size;
 	if (newsize < oldsize) {
-		unsigned int blocksize = i_blocksize(inode);
-
 		/*
 		 * iomap won't detect a dirty page over an unwritten block (or a
 		 * cow block over a hole) and subsequently skips zeroing the
--
2.39.2
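The intent of the blocksize selection, a truncate unit equal to the extsize for forcealign files and one filesystem block otherwise, can be sketched in userspace as below. truncate_unit_bytes() and its parameters are hypothetical; the sketch assumes extsize is counted in filesystem blocks and that blkbits is log2 of the block size. In mainline kernels i_blocksize() returns the block size in bytes (1 << i_blkbits), so this sketch shifts by the block-size exponent; whether the shift count in the hunk above is what was intended is something to confirm with the patch author.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the truncate zeroing unit in bytes.
 *
 * extsize_blocks: extent size hint in filesystem blocks (assumption).
 * blkbits:        log2 of the filesystem block size, e.g. 12 for 4 KiB.
 */
static uint64_t truncate_unit_bytes(bool forcealign, uint32_t extsize_blocks,
				    unsigned int blkbits)
{
	assert(blkbits < 32);

	/* Forcealign files zero and write back a whole allocation unit. */
	if (forcealign && extsize_blocks > 1)
		return (uint64_t)extsize_blocks << blkbits;

	/* Everything else uses a single filesystem block. */
	return (uint64_t)1 << blkbits;
}
```

For a forcealign file with an extsize of 4 blocks on a 4 KiB-block filesystem this yields 16384 bytes; without forcealign, or with an extsize of 1, it yields 4096 bytes.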

patchwork bot

9 p.m.

Feedback: The patch(es) you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/12187
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/5...
