From: Dave Chinner <dchinner(a)redhat.com>
mainline inclusion
from mainline-v6.9-rc2
commit f2e812c1522dab847912309b00abcc762dd696da
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA6CG1
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
-------------------------------------------------
syzbot reported an ext4 panic during a page fault where found a
journal handle when it didn't expect to find one. The structure
it tripped over had a value of 'TRAN' in the first entry in the
structure, and that indicates it tripped over a struct xfs_trans
instead of a jbd2 handle.
The reason for this is that the page fault was taken during a
copy-out to a user buffer from an xfs bulkstat operation. XFS uses
an "empty" transaction context for bulkstat to do automated metadata
buffer cleanup, and so the transaction context is valid across the
copyout of the bulkstat info into the user buffer.
We are using empty transaction contexts like this in XFS to reduce
the risk of failing to release objects we reference during the
operation, especially during error handling. Hence we really need to
ensure that we can take page faults from these contexts without
leaving landmines for the code processing the page fault to trip
over.
However, this same behaviour could happen from any other filesystem
that triggers a page fault or any other exception that is handled
on-stack from within a task context that has current->journal_info
set. Having a page fault from some other filesystem bounce into XFS
where we have to run a transaction isn't a bug at all, but the usage
of current->journal_info means that this could result corruption of
the outer task's journal_info structure.
The problem is purely that we now have two different contexts that
now think they own current->journal_info. IOWs, no filesystem can
allow page faults or on-stack exceptions while current->journal_info
is set by the filesystem because the exception processing might use
current->journal_info itself.
If we end up with nested XFS transactions whilst holding an empty
transaction, then it isn't an issue as the outer transaction does
not hold a log reservation. If we ignore the current->journal_info
usage, then the only problem that might occur is a deadlock if the
exception tries to take the same locks the upper context holds.
That, however, is not a problem that setting current->journal_info
would solve, so it's largely an irrelevant concern here.
IOWs, we really only use current->journal_info for a warning check
in xfs_vm_writepages() to ensure we aren't doing writeback from a
transaction context. Writeback might need to do allocation, so it
can need to run transactions itself. Hence it's a debug check to
warn us that we've done something silly, and largely it is not all
that useful.
So let's just remove all the use of current->journal_info in XFS and
get rid of all the potential issues from nested contexts where
current->journal_info might get misused by another filesystem
context.
Reported-by: syzbot+cdee56dbcdf0096ef605(a)syzkaller.appspotmail.com
Signed-off-by: Dave Chinner <dchinner(a)redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong(a)kernel.org>
Reviewed-by: Mark Tinguely <mark.tinguely(a)oracle.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Chandan Babu R <chandanbabu(a)kernel.org>
Conflicts:
fs/xfs/scrub/common.c
fs/xfs/xfs_aops.c
[ a03297a0ca9f2("xfs: manage inode DONTCACHE status at irele time") is
not applied. 21b4ee7029c90("xfs: drop ->writepage completely") is not
applied.]
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
---
fs/xfs/xfs_aops.c | 13 -------------
fs/xfs/xfs_icache.c | 8 +++++---
fs/xfs/xfs_trans.h | 9 +--------
3 files changed, 6 insertions(+), 24 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index ef7db9defef1..e751cf34ccba 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -552,12 +552,6 @@ xfs_vm_writepage(
{
struct xfs_writepage_ctx wpc = { };
- if (WARN_ON_ONCE(current->journal_info)) {
- redirty_page_for_writepage(wbc, page);
- unlock_page(page);
- return 0;
- }
-
return iomap_writepage(page, wbc, &wpc.ctx, &xfs_writeback_ops);
}
@@ -568,13 +562,6 @@ xfs_vm_writepages(
{
struct xfs_writepage_ctx wpc = { };
- /*
- * Writing back data in a transaction context can result in recursive
- * transactions. This is bad, so issue a warning and get out of here.
- */
- if (WARN_ON_ONCE(current->journal_info))
- return 0;
-
xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
}
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index bdf55ed88680..fae443e0d120 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -2043,8 +2043,10 @@ xfs_inodegc_want_queue_work(
* - Memory shrinkers queued the inactivation worker and it hasn't finished.
* - The queue depth exceeds the maximum allowable percpu backlog.
*
- * Note: If the current thread is running a transaction, we don't ever want to
- * wait for other transactions because that could introduce a deadlock.
+ * Note: If we are in a NOFS context here (e.g. current thread is running a
+ * transaction) the we don't want to block here as inodegc progress may require
+ * filesystem resources we hold to make progress and that could result in a
+ * deadlock. Hence we skip out of here if we are in a scoped NOFS context.
*/
static inline bool
xfs_inodegc_want_flush_work(
@@ -2052,7 +2054,7 @@ xfs_inodegc_want_flush_work(
unsigned int items,
unsigned int shrinker_hits)
{
- if (current->journal_info)
+ if (current->flags & PF_MEMALLOC_NOFS)
return false;
if (shrinker_hits > 0)
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 731af3f04c83..d512fdd5f7c3 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -271,19 +271,14 @@ static inline void
xfs_trans_set_context(
struct xfs_trans *tp)
{
- ASSERT(current->journal_info == NULL);
tp->t_pflags = memalloc_nofs_save();
- current->journal_info = tp;
}
static inline void
xfs_trans_clear_context(
struct xfs_trans *tp)
{
- if (current->journal_info == tp) {
- memalloc_nofs_restore(tp->t_pflags);
- current->journal_info = NULL;
- }
+ memalloc_nofs_restore(tp->t_pflags);
}
static inline void
@@ -291,10 +286,8 @@ xfs_trans_switch_context(
struct xfs_trans *old_tp,
struct xfs_trans *new_tp)
{
- ASSERT(current->journal_info == old_tp);
new_tp->t_pflags = old_tp->t_pflags;
old_tp->t_pflags = 0;
- current->journal_info = new_tp;
}
#endif /* __XFS_TRANS_H__ */
--
2.31.1
From: Dave Chinner <dchinner(a)redhat.com>
mainline inclusion
from mainline-v6.9-rc2
commit f2e812c1522dab847912309b00abcc762dd696da
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA6CG1
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
-------------------------------------------------
syzbot reported an ext4 panic during a page fault where found a
journal handle when it didn't expect to find one. The structure
it tripped over had a value of 'TRAN' in the first entry in the
structure, and that indicates it tripped over a struct xfs_trans
instead of a jbd2 handle.
The reason for this is that the page fault was taken during a
copy-out to a user buffer from an xfs bulkstat operation. XFS uses
an "empty" transaction context for bulkstat to do automated metadata
buffer cleanup, and so the transaction context is valid across the
copyout of the bulkstat info into the user buffer.
We are using empty transaction contexts like this in XFS to reduce
the risk of failing to release objects we reference during the
operation, especially during error handling. Hence we really need to
ensure that we can take page faults from these contexts without
leaving landmines for the code processing the page fault to trip
over.
However, this same behaviour could happen from any other filesystem
that triggers a page fault or any other exception that is handled
on-stack from within a task context that has current->journal_info
set. Having a page fault from some other filesystem bounce into XFS
where we have to run a transaction isn't a bug at all, but the usage
of current->journal_info means that this could result corruption of
the outer task's journal_info structure.
The problem is purely that we now have two different contexts that
now think they own current->journal_info. IOWs, no filesystem can
allow page faults or on-stack exceptions while current->journal_info
is set by the filesystem because the exception processing might use
current->journal_info itself.
If we end up with nested XFS transactions whilst holding an empty
transaction, then it isn't an issue as the outer transaction does
not hold a log reservation. If we ignore the current->journal_info
usage, then the only problem that might occur is a deadlock if the
exception tries to take the same locks the upper context holds.
That, however, is not a problem that setting current->journal_info
would solve, so it's largely an irrelevant concern here.
IOWs, we really only use current->journal_info for a warning check
in xfs_vm_writepages() to ensure we aren't doing writeback from a
transaction context. Writeback might need to do allocation, so it
can need to run transactions itself. Hence it's a debug check to
warn us that we've done something silly, and largely it is not all
that useful.
So let's just remove all the use of current->journal_info in XFS and
get rid of all the potential issues from nested contexts where
current->journal_info might get misused by another filesystem
context.
Reported-by: syzbot+cdee56dbcdf0096ef605(a)syzkaller.appspotmail.com
Signed-off-by: Dave Chinner <dchinner(a)redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong(a)kernel.org>
Reviewed-by: Mark Tinguely <mark.tinguely(a)oracle.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Chandan Babu R <chandanbabu(a)kernel.org>
Signed-off-by: Zhihao Cheng <chengzhihao(a)huaweicloud.com>
---
fs/xfs/scrub/common.c | 4 +---
fs/xfs/xfs_aops.c | 7 -------
fs/xfs/xfs_icache.c | 8 +++++---
fs/xfs/xfs_trans.h | 9 +--------
4 files changed, 7 insertions(+), 21 deletions(-)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 23944fcc1a6c..08e292485268 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -978,9 +978,7 @@ xchk_irele(
struct xfs_scrub *sc,
struct xfs_inode *ip)
{
- if (current->journal_info != NULL) {
- ASSERT(current->journal_info == sc->tp);
-
+ if (sc->tp) {
/*
* If we are in a transaction, we /cannot/ drop the inode
* ourselves, because the VFS will trigger writeback, which
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a2e1f73f84dd..82f18f28c1c1 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -477,13 +477,6 @@ xfs_vm_writepages(
{
struct xfs_writepage_ctx wpc = { };
- /*
- * Writing back data in a transaction context can result in recursive
- * transactions. This is bad, so issue a warning and get out of here.
- */
- if (WARN_ON_ONCE(current->journal_info))
- return 0;
-
xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
}
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index f4971bd1de05..370c0df1a7a5 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -2037,8 +2037,10 @@ xfs_inodegc_want_queue_work(
* - Memory shrinkers queued the inactivation worker and it hasn't finished.
* - The queue depth exceeds the maximum allowable percpu backlog.
*
- * Note: If the current thread is running a transaction, we don't ever want to
- * wait for other transactions because that could introduce a deadlock.
+ * Note: If we are in a NOFS context here (e.g. current thread is running a
+ * transaction) the we don't want to block here as inodegc progress may require
+ * filesystem resources we hold to make progress and that could result in a
+ * deadlock. Hence we skip out of here if we are in a scoped NOFS context.
*/
static inline bool
xfs_inodegc_want_flush_work(
@@ -2046,7 +2048,7 @@ xfs_inodegc_want_flush_work(
unsigned int items,
unsigned int shrinker_hits)
{
- if (current->journal_info)
+ if (current->flags & PF_MEMALLOC_NOFS)
return false;
if (shrinker_hits > 0)
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 4e38357237c3..ead65f5f8dc3 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -277,19 +277,14 @@ static inline void
xfs_trans_set_context(
struct xfs_trans *tp)
{
- ASSERT(current->journal_info == NULL);
tp->t_pflags = memalloc_nofs_save();
- current->journal_info = tp;
}
static inline void
xfs_trans_clear_context(
struct xfs_trans *tp)
{
- if (current->journal_info == tp) {
- memalloc_nofs_restore(tp->t_pflags);
- current->journal_info = NULL;
- }
+ memalloc_nofs_restore(tp->t_pflags);
}
static inline void
@@ -297,10 +292,8 @@ xfs_trans_switch_context(
struct xfs_trans *old_tp,
struct xfs_trans *new_tp)
{
- ASSERT(current->journal_info == old_tp);
new_tp->t_pflags = old_tp->t_pflags;
old_tp->t_pflags = 0;
- current->journal_info = new_tp;
}
#endif /* __XFS_TRANS_H__ */
--
2.31.1
From: Dave Chinner <dchinner(a)redhat.com>
mainline inclusion
from mainline-v6.9-rc2
commit f2e812c1522dab847912309b00abcc762dd696da
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA6CG1
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
-------------------------------------------------
syzbot reported an ext4 panic during a page fault where found a
journal handle when it didn't expect to find one. The structure
it tripped over had a value of 'TRAN' in the first entry in the
structure, and that indicates it tripped over a struct xfs_trans
instead of a jbd2 handle.
The reason for this is that the page fault was taken during a
copy-out to a user buffer from an xfs bulkstat operation. XFS uses
an "empty" transaction context for bulkstat to do automated metadata
buffer cleanup, and so the transaction context is valid across the
copyout of the bulkstat info into the user buffer.
We are using empty transaction contexts like this in XFS to reduce
the risk of failing to release objects we reference during the
operation, especially during error handling. Hence we really need to
ensure that we can take page faults from these contexts without
leaving landmines for the code processing the page fault to trip
over.
However, this same behaviour could happen from any other filesystem
that triggers a page fault or any other exception that is handled
on-stack from within a task context that has current->journal_info
set. Having a page fault from some other filesystem bounce into XFS
where we have to run a transaction isn't a bug at all, but the usage
of current->journal_info means that this could result corruption of
the outer task's journal_info structure.
The problem is purely that we now have two different contexts that
now think they own current->journal_info. IOWs, no filesystem can
allow page faults or on-stack exceptions while current->journal_info
is set by the filesystem because the exception processing might use
current->journal_info itself.
If we end up with nested XFS transactions whilst holding an empty
transaction, then it isn't an issue as the outer transaction does
not hold a log reservation. If we ignore the current->journal_info
usage, then the only problem that might occur is a deadlock if the
exception tries to take the same locks the upper context holds.
That, however, is not a problem that setting current->journal_info
would solve, so it's largely an irrelevant concern here.
IOWs, we really only use current->journal_info for a warning check
in xfs_vm_writepages() to ensure we aren't doing writeback from a
transaction context. Writeback might need to do allocation, so it
can need to run transactions itself. Hence it's a debug check to
warn us that we've done something silly, and largely it is not all
that useful.
So let's just remove all the use of current->journal_info in XFS and
get rid of all the potential issues from nested contexts where
current->journal_info might get misused by another filesystem
context.
Reported-by: syzbot+cdee56dbcdf0096ef605(a)syzkaller.appspotmail.com
Signed-off-by: Dave Chinner <dchinner(a)redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong(a)kernel.org>
Reviewed-by: Mark Tinguely <mark.tinguely(a)oracle.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Chandan Babu R <chandanbabu(a)kernel.org>
Conflicts:
fs/xfs/scrub/common.c
fs/xfs/xfs_aops.c
[ a03297a0ca9f2("xfs: manage inode DONTCACHE status at irele time") is
not applied. 21b4ee7029c90("xfs: drop ->writepage completely") is not
applied.]
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
---
fs/xfs/xfs_aops.c | 13 -------------
fs/xfs/xfs_icache.c | 8 +++++---
fs/xfs/xfs_trans.h | 9 +--------
3 files changed, 6 insertions(+), 24 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index ef7db9defef1..e751cf34ccba 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -552,12 +552,6 @@ xfs_vm_writepage(
{
struct xfs_writepage_ctx wpc = { };
- if (WARN_ON_ONCE(current->journal_info)) {
- redirty_page_for_writepage(wbc, page);
- unlock_page(page);
- return 0;
- }
-
return iomap_writepage(page, wbc, &wpc.ctx, &xfs_writeback_ops);
}
@@ -568,13 +562,6 @@ xfs_vm_writepages(
{
struct xfs_writepage_ctx wpc = { };
- /*
- * Writing back data in a transaction context can result in recursive
- * transactions. This is bad, so issue a warning and get out of here.
- */
- if (WARN_ON_ONCE(current->journal_info))
- return 0;
-
xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
}
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index bdf55ed88680..fae443e0d120 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -2043,8 +2043,10 @@ xfs_inodegc_want_queue_work(
* - Memory shrinkers queued the inactivation worker and it hasn't finished.
* - The queue depth exceeds the maximum allowable percpu backlog.
*
- * Note: If the current thread is running a transaction, we don't ever want to
- * wait for other transactions because that could introduce a deadlock.
+ * Note: If we are in a NOFS context here (e.g. current thread is running a
+ * transaction) the we don't want to block here as inodegc progress may require
+ * filesystem resources we hold to make progress and that could result in a
+ * deadlock. Hence we skip out of here if we are in a scoped NOFS context.
*/
static inline bool
xfs_inodegc_want_flush_work(
@@ -2052,7 +2054,7 @@ xfs_inodegc_want_flush_work(
unsigned int items,
unsigned int shrinker_hits)
{
- if (current->journal_info)
+ if (current->flags & PF_MEMALLOC_NOFS)
return false;
if (shrinker_hits > 0)
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 731af3f04c83..d512fdd5f7c3 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -271,19 +271,14 @@ static inline void
xfs_trans_set_context(
struct xfs_trans *tp)
{
- ASSERT(current->journal_info == NULL);
tp->t_pflags = memalloc_nofs_save();
- current->journal_info = tp;
}
static inline void
xfs_trans_clear_context(
struct xfs_trans *tp)
{
- if (current->journal_info == tp) {
- memalloc_nofs_restore(tp->t_pflags);
- current->journal_info = NULL;
- }
+ memalloc_nofs_restore(tp->t_pflags);
}
static inline void
@@ -291,10 +286,8 @@ xfs_trans_switch_context(
struct xfs_trans *old_tp,
struct xfs_trans *new_tp)
{
- ASSERT(current->journal_info == old_tp);
new_tp->t_pflags = old_tp->t_pflags;
old_tp->t_pflags = 0;
- current->journal_info = new_tp;
}
#endif /* __XFS_TRANS_H__ */
--
2.31.1
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9K0H3
--------------------------------
In __blkdev_issue_discard(), q->limits.discard_granularity is read
multi times.
WARN_ON_ONCE(!q->limits.discard_granularity) [1]
...
q->limits.discard_granularity >> SECTOR_SHIFT [2]
...
bio_aligned_discard_max_sectors [3]
It can be changed to 0 after check [1], such as submitting ioctl
'LOOP_SET_STATUS'. This is undesirable. If 'discard_granularity' is set
to 0, BUG_ON might be triggered and the loop of 'while(nr_sects)' will
never exit(see fixes commit). Fix it by checking 'req_sects' before
submit io, return error code directly if it is 0.
Fixes: b35fd7422c2f ("block: check queue's limits.discard_granularity in __blkdev_issue_discard()")
Signed-off-by: Li Nan <linan122(a)huawei.com>
---
block/blk-lib.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index e90614fd8d6a..59ff99bc4698 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -27,7 +27,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
struct bio **biop)
{
struct request_queue *q = bdev_get_queue(bdev);
- struct bio *bio = *biop;
+ struct bio *bio = NULL;
unsigned int op;
sector_t bs_mask, part_offset = 0;
@@ -92,6 +92,17 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
req_sects = min_t(sector_t, nr_sects,
granularity_aligned_lba - sector_mapped);
+ if (!req_sects) {
+ /* just put the bio allocated in this function */
+ if (bio) {
+ bio_io_error(bio);
+ bio_put(bio);
+ }
+ return -EOPNOTSUPP;
+ }
+ if (!bio)
+ bio = *biop;
+
WARN_ON_ONCE((req_sects << 9) > UINT_MAX);
bio = blk_next_bio(bio, 0, gfp_mask);
--
2.39.2