bugfix for arch/fs/kernel modules
Arnd Bergmann (2):
  asm-generic: fix ffs -Wshadow warning
  seqlock: avoid -Wshadow warnings

Chen Jiahao (1):
  arm64: seccomp: fix the incorrect name of syscall __NR_compat_exit in
    secure computing mode

Liang Wang (1):
  lib: use PFN_PHYS() in devmem_is_allowed()

Lin Ruizhe (1):
  amba-pl011: Fix no irq issue due to no IRQ domain found

Mark Rutland (1):
  arm64: fix compat syscall return truncation

Peter Zijlstra (1):
  kthread: Fix PF_KTHREAD vs to_kthread() race

Yu Kuai (4):
  block: ensure the memory order between bi_private and bi_status
  Revert "[Huawei] block: avoid creating invalid symlink file for
    patitions"
  Revert "[Backport] block: take bd_mutex around delete_partitions in
    del_gendisk"
  blk: reuse lookup_sem to serialize partition operations

Zhihao Cheng (2):
  mtd: mtdconcat: Judge callback existence based on the master
  mtd: mtdconcat: Check _read,_write callbacks existence before
    assignment
 arch/arm/mm/mmap.c                       |  2 +-
 arch/arm64/include/asm/ptrace.h          | 12 +++++++-
 arch/arm64/include/asm/seccomp.h         |  2 +-
 arch/arm64/include/asm/syscall.h         | 19 +++++++------
 arch/arm64/kernel/ptrace.c               |  2 +-
 arch/arm64/kernel/signal.c               |  3 +-
 arch/arm64/kernel/syscall.c              |  9 ++----
 block/genhd.c                            | 14 +--------
 block/partitions/core.c                  |  4 +++
 drivers/mtd/mtdconcat.c                  | 33 +++++++++++++++-------
 drivers/tty/serial/amba-pl011.c          | 12 +++++++-
 fs/block_dev.c                           | 36 ++++++++++++++++++++----
 include/asm-generic/bitops/builtin-ffs.h |  5 +---
 include/linux/seqlock.h                  | 14 ++++-----
 kernel/kthread.c                         | 33 ++++++++++++++++++----
 kernel/sched/fair.c                      |  2 +-
 16 files changed, 134 insertions(+), 68 deletions(-)
From: Arnd Bergmann arnd@arndb.de
mainline inclusion from mainline-v5.11 commit 6f6573a4044adefbd07f1bd951a2041150e888d7 category: bugfix bugzilla: 176150 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------------
gcc -Wshadow warns about the ffs() definition that has the same name as the global ffs() built-in:
include/asm-generic/bitops/builtin-ffs.h:13:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow]
This is annoying because 'make W=2' warns every time this header gets included.
Change it to use a #define instead, making callers directly reference the builtin.
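A minimal userspace sketch of the same idea, assuming GCC or Clang: mapping a name straight onto the builtin with a #define leaves no function declaration for -Wshadow to flag. The names my_ffs and lowest_set_bit are hypothetical, chosen so the sketch does not collide with the libc ffs().

```c
/* Hypothetical name; the kernel header defines ffs() itself. */
#define my_ffs(x) __builtin_ffs(x)

/* Returns the 1-based index of the lowest set bit, or 0 if no bit is set. */
static int lowest_set_bit(int x)
{
	return my_ffs(x);
}
```

Callers see no behavioral difference; only the declaration that triggered the warning is gone.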
Signed-off-by: Arnd Bergmann arnd@arndb.de
Signed-off-by: Zhang Jianhua chris.zjh@huawei.com
Reviewed-by: He Ying heying24@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com
---
 include/asm-generic/bitops/builtin-ffs.h | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/asm-generic/bitops/builtin-ffs.h b/include/asm-generic/bitops/builtin-ffs.h
index 458c85ebcd15..1dacfdb4247e 100644
--- a/include/asm-generic/bitops/builtin-ffs.h
+++ b/include/asm-generic/bitops/builtin-ffs.h
@@ -10,9 +10,6 @@
  * the libc and compiler builtin ffs routines, therefore
  * differs in spirit from the above ffz (man ffs).
  */
-static __always_inline int ffs(int x)
-{
-	return __builtin_ffs(x);
-}
+#define ffs(x) __builtin_ffs(x)
 
 #endif
From: Arnd Bergmann arnd@arndb.de
mainline inclusion from mainline-v5.11 commit a07c45312f06e288417049208c344ad76074627d category: bugfix bugzilla: 176150 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------------
When building with W=2, there is a flood of warnings about the seqlock macros shadowing local variables:
19806 linux/seqlock.h:331:11: warning: declaration of 'seq' shadows a previous local [-Wshadow] 48 linux/seqlock.h:348:11: warning: declaration of 'seq' shadows a previous local [-Wshadow] 8 linux/seqlock.h:379:11: warning: declaration of 'seq' shadows a previous local [-Wshadow]
Prefix the local variables to make the warning useful elsewhere again.
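The effect of the prefix can be sketched in userspace, assuming GCC or Clang (statement expressions are a GNU C extension). read_even() and snapshot() are hypothetical stand-ins, not kernel APIs; the point is only that the macro's temporary no longer shares a name with the caller's variable.

```c
/*
 * Statement-expression macro whose temporary is called __seq, so a
 * caller-side variable named "seq" is not shadowed.
 */
#define read_even(p)						\
({								\
	unsigned __seq;						\
								\
	while ((__seq = *(volatile unsigned *)(p)) & 1)		\
		;	/* odd value: a writer is active */	\
	__seq;							\
})

static unsigned snapshot(unsigned *counter)
{
	/* With a temporary named "seq" in the macro, -Wshadow would fire here. */
	unsigned seq = read_even(counter);

	return seq;
}
```

Double-underscore names are reserved for the implementation in ordinary userspace code; they are used here only to mirror the patch.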
Fixes: 52ac39e5db51 ("seqlock: seqcount_t: Implement all read APIs as statement expressions") Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20201026165044.3722931-1-arnd@kernel.org Signed-off-by: Zhang Jianhua chris.zjh@huawei.com Reviewed-by: He Ying heying24@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com --- include/linux/seqlock.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h index 1ac20d75b061..fb89b05066f4 100644 --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -328,13 +328,13 @@ SEQCOUNT_LOCKNAME(ww_mutex, struct ww_mutex, true, &s->lock->base, ww_mu */ #define __read_seqcount_begin(s) \ ({ \ - unsigned seq; \ + unsigned __seq; \ \ - while ((seq = __seqcount_sequence(s)) & 1) \ + while ((__seq = __seqcount_sequence(s)) & 1) \ cpu_relax(); \ \ kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX); \ - seq; \ + __seq; \ })
/** @@ -345,10 +345,10 @@ SEQCOUNT_LOCKNAME(ww_mutex, struct ww_mutex, true, &s->lock->base, ww_mu */ #define raw_read_seqcount_begin(s) \ ({ \ - unsigned seq = __read_seqcount_begin(s); \ + unsigned _seq = __read_seqcount_begin(s); \ \ smp_rmb(); \ - seq; \ + _seq; \ })
/** @@ -376,11 +376,11 @@ SEQCOUNT_LOCKNAME(ww_mutex, struct ww_mutex, true, &s->lock->base, ww_mu */ #define raw_read_seqcount(s) \ ({ \ - unsigned seq = __seqcount_sequence(s); \ + unsigned __seq = __seqcount_sequence(s); \ \ smp_rmb(); \ kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX); \ - seq; \ + __seq; \ })
/**
From: Chen Jiahao chenjiahao16@huawei.com
hulk inclusion category: bugfix bugzilla: 176178 https://gitee.com/openeuler/kernel/issues/I4DDEL
--------
In secure computing strict mode (mode 1), the compat syscall allowlist mistakenly contains __NR_compat_read where __NR_compat_exit is expected. As a result, a task that calls exit(0) in strict mode is killed with SIGKILL, which is not the expected behavior. This patch fixes the entry.
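To illustrate why one wrong table entry matters, here is a minimal sketch of a 0-terminated allowlist lookup. The syscall numbers, the table name and mode1_allowed() are assumptions made for the example; this is not the kernel's seccomp code.

```c
#include <stdbool.h>

/* Hypothetical syscall numbers, for illustration only. */
enum { NR_EXIT = 1, NR_READ = 3, NR_WRITE = 4, NR_SIGRETURN = 119 };

/* Strict-mode allowlist, terminated by 0 like the patched table. */
static const int mode1_syscalls[] = {
	NR_READ, NR_WRITE,
	NR_EXIT, NR_SIGRETURN,
	0, /* null terminated */
};

/* Returns true if nr appears in the 0-terminated allowlist. */
static bool mode1_allowed(const int *list, int nr)
{
	for (; *list; list++)
		if (*list == nr)
			return true;
	return false;
}
```

With the read entry duplicated in place of the exit entry, the lookup for exit fails, which corresponds to the SIGKILL described above.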
Fixes: 2227c11c5f07 ("[Huawei] arm64: secomp: fix the secure computing mode 1 syscall check for ilp32")
Signed-off-by: Chen Jiahao chenjiahao16@huawei.com
Reviewed-by: Liao Chang liaochang1@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com
---
 arch/arm64/include/asm/seccomp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/seccomp.h b/arch/arm64/include/asm/seccomp.h
index 0be58ac682c4..bc3ab2468f03 100644
--- a/arch/arm64/include/asm/seccomp.h
+++ b/arch/arm64/include/asm/seccomp.h
@@ -25,7 +25,7 @@ static inline const int *get_compat_mode1_syscalls(void)
 #ifdef CONFIG_AARCH32_EL0
 	static const int mode1_syscalls_a32[] = {
 		__NR_compat_read, __NR_compat_write,
-		__NR_compat_read, __NR_compat_sigreturn,
+		__NR_compat_exit, __NR_compat_sigreturn,
 		0, /* null terminated */
 	};
 #endif
From: Lin Ruizhe linruizhe@huawei.com
hulk inclusion category: bugfix bugzilla: 176552 https://gitee.com/openeuler/kernel/issues/I4DDEL
-------------------------------------------------
If the pl011 interrupt is connected to the MBIGEN interrupt controller and mbigen is initialized too late, the UART ends up with no IRQ because no IRQ domain is found. The log shows:
  "irq: no irq domain found for uart0 !"
When dev->irq[0] is zero, try to get IRQ by of_irq_get() again, and return -EPROBE_DEFER if the IRQ domain is not yet created.
Use the deferred probing mechanism to fix the issue.
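The control flow can be sketched in plain C as follows. lookup_irq() is a hypothetical stand-in for of_irq_get(), and the IRQ number 42 is made up; EPROBE_DEFER is the kernel-internal error code that asks the driver core to retry the probe later.

```c
#define EPROBE_DEFER 517	/* kernel-internal errno: retry the probe later */

/* Hypothetical stand-in for of_irq_get(): fails until the domain exists. */
static int lookup_irq(int domain_ready)
{
	return domain_ready ? 42 : -EPROBE_DEFER;
}

/*
 * Mirrors the probe logic: keep a nonzero cached IRQ, otherwise look it
 * up again and propagate a negative error so the probe can be deferred.
 */
static int resolve_irq(int *irq, int domain_ready)
{
	int ret;

	if (*irq)
		return 0;	/* already mapped */

	ret = lookup_irq(domain_ready);
	if (ret < 0)
		return ret;	/* e.g. -EPROBE_DEFER */

	*irq = ret;
	return 0;
}
```

A first call made before the domain exists returns -EPROBE_DEFER and leaves the IRQ at zero; a later retry fills it in.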
Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Lin ruizhe linruizhe@huawei.com Signed-off-by: He Ying heying24@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com --- drivers/tty/serial/amba-pl011.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c index 87dc3fc15694..51ca2d4a8bb3 100644 --- a/drivers/tty/serial/amba-pl011.c +++ b/drivers/tty/serial/amba-pl011.c @@ -41,6 +41,7 @@ #include <linux/sizes.h> #include <linux/io.h> #include <linux/acpi.h> +#include <linux/of_irq.h>
#include "amba-pl011.h"
@@ -2665,9 +2666,18 @@ static int pl011_probe(struct amba_device *dev, const struct amba_id *id) uap->vendor = vendor; uap->fifosize = vendor->get_fifosize(dev); uap->port.iotype = vendor->access_32b ? UPIO_MEM32 : UPIO_MEM; - uap->port.irq = dev->irq[0]; uap->port.ops = &amba_pl011_pops;
+ /* if no irq domain found, irq number is 0, try again */ + if (!dev->irq[0] && dev->dev.of_node) { + ret = of_irq_get(dev->dev.of_node, 0); + if (ret < 0) + return ret; + dev->irq[0] = ret; + } + + uap->port.irq = dev->irq[0]; + snprintf(uap->type, sizeof(uap->type), "PL011 rev%u", amba_rev(dev));
ret = pl011_setup_port(&dev->dev, uap, &dev->res, portnr);
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: bugfix bugzilla: 167067 https://gitee.com/openeuler/kernel/issues/I4DDEL
--------------------------------
When running stress test on null_blk under linux-4.19.y, the following warning is reported:
percpu_ref_switch_to_atomic_rcu: percpu ref (css_release) <= 0 (-3) after switching to atomic
The cause is that css_put() is invoked twice on the same bio as shown below:
CPU 1:                                  CPU 2:

// IO completion kworker                // IO submit thread
                                        __blkdev_direct_IO_simple
                                          submit_bio

bio_endio
  bio_uninit(bio)
    css_put(bi_css)
    bi_css = NULL
                                        set_current_state(TASK_UNINTERRUPTIBLE)
  bio->bi_end_io
    blkdev_bio_end_io_simple
      bio->bi_private = NULL
                                        // bi_private is NULL
                                        READ_ONCE(bio->bi_private)
        wake_up_process
          smp_mb__after_spinlock

                                        bio_uninit(bio)
                                          // read bi_css as non-NULL
                                          // so call css_put() again
                                          css_put(bi_css)
Because there are no memory barriers between the reads and the writes of bi_private and bi_css, reading bi_private as NULL does not guarantee that bi_css is also read as NULL on hosts with a weak memory model (e.g. ARM64).
In the latest kernel source, css_put() has been removed from bio_uninit(), but the memory-ordering problem still exists, because the order between bio->bi_private and {bi_status|bi_blkg} is also assumed in __blkdev_direct_IO_simple(). It is reproducible that __blkdev_direct_IO_simple() may read bi_status as 0 even if bi_status is set to an errno in req_bio_endio().
In __blkdev_direct_IO(), the memory order between dio->waiter and dio->bio.bi_status is not guaranteed either. So far this has not been reproduced, perhaps because dio->waiter and dio->bio.bi_status are in the same cache line, but it is better to guarantee the memory order there as well.
Fix it by using smp_wmb() and smp_rmb() to guarantee the order between {bio->bi_private|dio->waiter} and {bi_status|bi_blkg}.
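A userspace analogue of the barrier pairing, written with C11 atomics, may help. It is a sketch under stated assumptions: struct fake_bio and both functions are hypothetical, and release/acquire fences stand in for smp_wmb()/smp_rmb(), which they approximate but do not equal.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the bio fields involved. */
struct fake_bio {
	int status;			/* plays the role of bi_status */
	_Atomic(void *) private;	/* plays the role of bi_private */
};

/* Completion side: publish status, then clear private. */
static void complete_io(struct fake_bio *bio, int status)
{
	bio->status = status;
	/* Like smp_wmb(): order the status store before the store below. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&bio->private, NULL, memory_order_relaxed);
}

/* Waiter side: returns 1 and fills *status once private reads as NULL. */
static int io_done(struct fake_bio *bio, int *status)
{
	if (atomic_load_explicit(&bio->private, memory_order_relaxed))
		return 0;
	/* Like smp_rmb(): order the load above before the status load. */
	atomic_thread_fence(memory_order_acquire);
	*status = bio->status;
	return 1;
}
```

Without the two fences, a weakly ordered CPU could let the waiter observe private as NULL and still read a stale status, which is the failure mode described above.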
Fixes: 189ce2b9dcc3 ("block: fast-path for small and simple direct I/O requests") Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com --- fs/block_dev.c | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c index 7bc660054d21..d2881fd351a3 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -229,6 +229,11 @@ static void blkdev_bio_end_io_simple(struct bio *bio) { struct task_struct *waiter = bio->bi_private;
+ /* + * Paired with smp_rmb() in __blkdev_direct_IO_simple() to ensure + * the order between bi_private and bi_xxx. + */ + smp_wmb(); WRITE_ONCE(bio->bi_private, NULL); blk_wake_io_task(waiter); } @@ -288,8 +293,15 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, qc = submit_bio(&bio); for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); - if (!READ_ONCE(bio.bi_private)) + if (!READ_ONCE(bio.bi_private)) { + /* + * Paired with smp_wmb() in + * blkdev_bio_end_io_simple(). + */ + smp_rmb(); break; + } + if (!(iocb->ki_flags & IOCB_HIPRI) || !blk_poll(bdev_get_queue(bdev), qc, true)) blk_io_schedule(); @@ -358,6 +370,11 @@ static void blkdev_bio_end_io(struct bio *bio) } else { struct task_struct *waiter = dio->waiter;
+ /* + * Paired with smp_rmb() in __blkdev_direct_IO() to + * ensure the order between dio->waiter and bio->bi_xxx. + */ + smp_wmb(); WRITE_ONCE(dio->waiter, NULL); blk_wake_io_task(waiter); } @@ -483,8 +500,11 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); - if (!READ_ONCE(dio->waiter)) + if (!READ_ONCE(dio->waiter)) { + /* Paired with smp_wmb() in blkdev_bio_end_io(). */ + smp_rmb(); break; + }
if (!(iocb->ki_flags & IOCB_HIPRI) || !blk_poll(bdev_get_queue(bdev), qc, true))
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: bugfix bugzilla: 55097 https://gitee.com/openeuler/kernel/issues/I4DDEL
-------------------------------------------------
This reverts commit 6afb4716beff7102784a06fda7df7cd703721a8d.
The patch set for partition symlink cleanup introduces a deadlock in the nbd, loop and xen-blkfront drivers, so revert it.
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com --- block/genhd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/genhd.c b/block/genhd.c index 5a7196286241..b6e7c00c384a 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -920,7 +920,6 @@ void del_gendisk(struct gendisk *disk) bdev = bdget_disk(disk, 0); if (bdev) mutex_lock(&bdev->bd_mutex); - disk->flags &= ~GENHD_FL_UP; /* invalidate stuff */ disk_part_iter_init(&piter, disk, DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE); @@ -936,6 +935,7 @@ void del_gendisk(struct gendisk *disk)
invalidate_partition(disk, 0); set_capacity(disk, 0); + disk->flags &= ~GENHD_FL_UP; up_write(&disk->lookup_sem);
if (!(disk->flags & GENHD_FL_HIDDEN))
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: bugfix bugzilla: 55097 https://gitee.com/openeuler/kernel/issues/I4DDEL
-------------------------------------------------
This reverts commit 5ff55bd87e0c5a1f7ca9c802b73368ea1cfa282f.
The patch set for partition symlink cleanup introduces a deadlock in the nbd, loop and xen-blkfront drivers, so revert it.
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com --- block/genhd.c | 12 ------------ 1 file changed, 12 deletions(-)
diff --git a/block/genhd.c b/block/genhd.c index b6e7c00c384a..6566eacc807d 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -901,7 +901,6 @@ void del_gendisk(struct gendisk *disk) { struct disk_part_iter piter; struct hd_struct *part; - struct block_device *bdev;
might_sleep();
@@ -913,13 +912,6 @@ void del_gendisk(struct gendisk *disk) * disk is marked as dead (GENHD_FL_UP cleared). */ down_write(&disk->lookup_sem); - /* - * If bdev is null, that means memory allocate fail. Then - * add_partitions can also fail. - */ - bdev = bdget_disk(disk, 0); - if (bdev) - mutex_lock(&bdev->bd_mutex); /* invalidate stuff */ disk_part_iter_init(&piter, disk, DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE); @@ -928,10 +920,6 @@ void del_gendisk(struct gendisk *disk) delete_partition(part); } disk_part_iter_exit(&piter); - if (bdev) { - mutex_unlock(&bdev->bd_mutex); - bdput(bdev); - }
invalidate_partition(disk, 0); set_capacity(disk, 0);
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: bugfix bugzilla: 55097 https://gitee.com/openeuler/kernel/issues/I4DDEL
-------------------------------------------------
Currently there is no protection between partition operations (e.g. partition rescan) and delete_partition() in del_gendisk(), so the following scenario is possible:

CPU 1                                   CPU 2

blkdev_ioctl                            del_gendisk
  blkdev_reread_part
    lock bd_mutex
    drop_partitions
    check_partition
                                        lock lookup_sem
                                        // for each partition
                                        delete_partition

    // for each partition
    add_partition
The newly added partitions, the device files (e.g. /dev/sdXN) and the symlinks in /sys/class/block will be left behind. If the deleted disk comes online again, the partition scan fails with the following error:
  sysfs: cannot create duplicate filename '/class/block/sdaN'
  sdX: pN could not be added: 17
The vanilla kernel tries to fix that with commit c76f48eb5c08 ("block: take bd_mutex around delete_partitions in del_gendisk"), but that commit introduces a deadlock in the nbd/loop/xen-blkfront drivers. These in-tree drivers can be fixed, but there may be other affected block drivers, especially out-of-tree ones, so fix it in another way.
Two methods were considered. The first is to wait in del_gendisk() for partition operations to finish. That works, but it needs new fields in gendisk (a bool and a wait_queue_head_t). The second is to reuse lookup_sem and GENHD_FL_UP to serialize partition operations and del_gendisk(). The latter is chosen, and the details follow.
There are five partition operations:
  (1) add_partition() in blkpg_ioctl()
  (2) delete_partition() in blkpg_ioctl()
  (3) resize in blkpg_ioctl()
  (4) partition rescan and revalidate in bdev_disk_changed()
  (5) delete_partition() in del_gendisk()
Ops (1)~(4) already take bd_mutex, so using down_read() to serialize with down_write() in del_gendisk() is OK. Op (3) only updates the values in hd_struct, so no lock is needed, because it already increases the ref of hd_struct.
lookup_sem is used to prevent a newly-created blocking device inode from associating with a deleting gendisk, and the locking order is:
  part->bd_mutex -> disk->lookup_sem
or
  whole->bd_mutex -> disk->lookup_sem
Now it is also used to serialize the partition operations and the new locking order will be:
part->bd_mutex -> whole->bd_mutex -> disk->lookup_sem
and it is OK.
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com --- block/partitions/core.c | 4 ++++ fs/block_dev.c | 12 ++++++++---- 2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/block/partitions/core.c b/block/partitions/core.c index 1f031f074ffd..569b0ca9f6e1 100644 --- a/block/partitions/core.c +++ b/block/partitions/core.c @@ -523,6 +523,7 @@ int bdev_add_partition(struct block_device *bdev, int partno, int ret;
mutex_lock(&bdev->bd_mutex); + down_read(&disk->lookup_sem); if (!(disk->flags & GENHD_FL_UP)) { ret = -ENXIO; goto out; @@ -537,6 +538,7 @@ int bdev_add_partition(struct block_device *bdev, int partno, ADDPART_FLAG_NONE, NULL); ret = PTR_ERR_OR_ZERO(part); out: + up_read(&disk->lookup_sem); mutex_unlock(&bdev->bd_mutex); return ret; } @@ -553,6 +555,7 @@ int bdev_del_partition(struct block_device *bdev, int partno)
mutex_lock(&bdevp->bd_mutex); mutex_lock_nested(&bdev->bd_mutex, 1); + down_read(&bdev->bd_disk->lookup_sem);
ret = -ENXIO; part = disk_get_part(bdev->bd_disk, partno); @@ -569,6 +572,7 @@ int bdev_del_partition(struct block_device *bdev, int partno) delete_partition(part); ret = 0; out_unlock: + up_read(&bdev->bd_disk->lookup_sem); mutex_unlock(&bdev->bd_mutex); mutex_unlock(&bdevp->bd_mutex); bdput(bdevp); diff --git a/fs/block_dev.c b/fs/block_dev.c index d2881fd351a3..46801789f2dc 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1427,14 +1427,16 @@ int bdev_disk_changed(struct block_device *bdev, bool invalidate) int ret;
lockdep_assert_held(&bdev->bd_mutex); - - if (!(disk->flags & GENHD_FL_UP)) - return -ENXIO; + down_read(&disk->lookup_sem); + if (!(disk->flags & GENHD_FL_UP)) { + ret = -ENXIO; + goto out; + }
rescan: ret = blk_drop_partitions(bdev); if (ret) - return ret; + goto out;
clear_bit(GD_NEED_PART_SCAN, &disk->state);
@@ -1469,6 +1471,8 @@ int bdev_disk_changed(struct block_device *bdev, bool invalidate) kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE); }
+out: + up_read(&disk->lookup_sem); return ret; } /*
From: Mark Rutland mark.rutland@arm.com
mainline inclusion from mainline-5.14-rc2 commit e30e8d46cf605d216a799a28c77b8a41c328613a category: bugfix bugzilla: 176549 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------------------
Due to inconsistencies in the way we manipulate compat GPRs, we have a few issues today:
* For audit and tracing, where error codes are handled as a (native) long, negative error codes are expected to be sign-extended to the native 64-bits, or they may fail to be matched correctly. Thus a syscall which fails with an error may erroneously be identified as succeeding.
* For ptrace, *all* compat return values should be sign-extended for consistency with 32-bit arm, but we currently only do this for negative return codes.
* As we may transiently set the upper 32 bits of some compat GPRs while in the kernel, these can be sampled by perf, which is somewhat confusing. This means that where a syscall returns a pointer above 2G, this will be sign-extended, but will not be mistaken for an error as error codes are constrained to the inclusive range [-4096, -1] where no user pointer can exist.
To fix all of these, we must consistently use helpers to get/set the compat GPRs, ensuring that we never write the upper 32 bits of the return code, and always sign-extend when reading the return code. This patch does so, with the following changes:
* We re-organise syscall_get_return_value() to always sign-extend for compat tasks, and reimplement syscall_get_error() atop. We update syscall_trace_exit() to use syscall_get_return_value().
* We consistently use syscall_set_return_value() to set the return value, ensuring the upper 32 bits are never set unexpectedly.
* As the core audit code currently uses regs_return_value() rather than syscall_get_return_value(), we special-case this for compat_user_mode(regs) such that this will do the right thing. Going forward, we should try to move the core audit code over to syscall_get_return_value().
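The get/set convention can be illustrated with a small standalone sketch. sign_extend64() below follows the shape of the kernel helper of the same name; compat_set_return() and compat_get_return() are hypothetical names for the two directions, and the sketch relies on arithmetic right shift of signed values, as GCC and Clang provide.

```c
#include <stdint.h>

/* Same shape as the kernel's sign_extend64(): bit "index" is the sign bit. */
static int64_t sign_extend64(uint64_t value, int index)
{
	int shift = 63 - index;

	return (int64_t)(value << shift) >> shift;
}

/* Setting: a compat task only ever gets the lower 32 bits written. */
static uint64_t compat_set_return(int64_t ret)
{
	return (uint32_t)ret;
}

/* Getting: always sign-extend from bit 31 for a compat task. */
static int64_t compat_get_return(uint64_t reg)
{
	return sign_extend64(reg, 31);
}
```

For example, -14 is stored as 0xfffffff2 and read back as -14, while a pointer such as 0x80000000 reads back as a large negative number that lies outside the error-code range.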
Cc: stable@vger.kernel.org Reported-by: He Zhe zhe.he@windriver.com Reported-by: weiyuchen weiyuchen3@huawei.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20210802104200.21390-1-mark.rutland@arm.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Jiahao Chen chenjiahao16@huawei.com Reviewed-by: Liao Chang liaochang1@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com --- arch/arm64/include/asm/ptrace.h | 12 +++++++++++- arch/arm64/include/asm/syscall.h | 19 ++++++++++--------- arch/arm64/kernel/ptrace.c | 2 +- arch/arm64/kernel/signal.c | 3 ++- arch/arm64/kernel/syscall.c | 9 +++------ 5 files changed, 27 insertions(+), 18 deletions(-)
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h index eb9803a29f8c..34ed891da81b 100644 --- a/arch/arm64/include/asm/ptrace.h +++ b/arch/arm64/include/asm/ptrace.h @@ -316,7 +316,17 @@ static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
static inline unsigned long regs_return_value(struct pt_regs *regs) { - return regs->regs[0]; + unsigned long val = regs->regs[0]; + + /* + * Audit currently uses regs_return_value() instead of + * syscall_get_return_value(). Apply the same sign-extension here until + * audit is updated to use syscall_get_return_value(). + */ + if (a32_user_mode(regs)) + val = sign_extend64(val, 31); + + return val; }
static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc) diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h index 579300bb03fe..45ac51ba05fc 100644 --- a/arch/arm64/include/asm/syscall.h +++ b/arch/arm64/include/asm/syscall.h @@ -33,22 +33,23 @@ static inline void syscall_rollback(struct task_struct *task, regs->regs[0] = regs->orig_x0; }
- -static inline long syscall_get_error(struct task_struct *task, - struct pt_regs *regs) +static inline long syscall_get_return_value(struct task_struct *task, + struct pt_regs *regs) { - unsigned long error = regs->regs[0]; + unsigned long val = regs->regs[0];
if (is_a32_compat_thread(task_thread_info(task))) - error = sign_extend64(error, 31); + val = sign_extend64(val, 31);
- return IS_ERR_VALUE(error) ? error : 0; + return val; }
-static inline long syscall_get_return_value(struct task_struct *task, - struct pt_regs *regs) +static inline long syscall_get_error(struct task_struct *task, + struct pt_regs *regs) { - return regs->regs[0]; + unsigned long error = syscall_get_return_value(task, regs); + + return IS_ERR_VALUE(error) ? error : 0; }
static inline void syscall_set_return_value(struct task_struct *task, diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index 70315c44a134..74b925c8d534 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -1841,7 +1841,7 @@ void syscall_trace_exit(struct pt_regs *regs) audit_syscall_exit(regs);
if (flags & _TIF_SYSCALL_TRACEPOINT) - trace_sys_exit(regs, regs_return_value(regs)); + trace_sys_exit(regs, syscall_get_return_value(current, regs));
if (flags & (_TIF_SYSCALL_TRACE | _TIF_SINGLESTEP)) tracehook_report_syscall(regs, PTRACE_SYSCALL_EXIT); diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 955a9dceed4b..519be804d381 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -29,6 +29,7 @@ #include <asm/unistd.h> #include <asm/fpsimd.h> #include <asm/ptrace.h> +#include <asm/syscall.h> #include <asm/signal32.h> #include <asm/traps.h> #include <asm/vdso.h> @@ -666,7 +667,7 @@ static void do_signal(struct pt_regs *regs) retval == -ERESTART_RESTARTBLOCK || (retval == -ERESTARTSYS && !(ksig.ka.sa.sa_flags & SA_RESTART)))) { - regs->regs[0] = -EINTR; + syscall_set_return_value(current, regs, -EINTR, 0); regs->pc = continue_addr; }
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c index 158680961477..66ca9534bd69 100644 --- a/arch/arm64/kernel/syscall.c +++ b/arch/arm64/kernel/syscall.c @@ -50,10 +50,7 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno, ret = do_ni_syscall(regs, scno); }
- if (is_a32_compat_task()) - ret = lower_32_bits(ret); - - regs->regs[0] = ret; + syscall_set_return_value(current, regs, 0, ret); }
static inline bool has_syscall_work(unsigned long flags) @@ -128,7 +125,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr, * syscall. do_notify_resume() will send a signal to userspace * before the syscall is restarted. */ - regs->regs[0] = -ERESTARTNOINTR; + syscall_set_return_value(current, regs, -ERESTARTNOINTR, 0); return; }
@@ -149,7 +146,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr, * anyway. */ if (scno == NO_SYSCALL) - regs->regs[0] = -ENOSYS; + syscall_set_return_value(current, regs, -ENOSYS, 0); scno = syscall_trace_enter(regs); if (scno == NO_SYSCALL) goto trace_exit;
From: Liang Wang wangliang101@huawei.com
hulk inclusion category: bugfix bugzilla: 176713 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://lore.kernel.org/stable/20210731025057.78825-1-wangliang101@huawei.co...
--------------------------------
On 32-bit systems whose physical addresses are wider than 32 bits, a physical address may not fit in 32 bits. Use PFN_PHYS() in devmem_is_allowed(); otherwise the physical address may overflow and be truncated.
We found this bug when mapping a high address with the devmem tool: when CONFIG_STRICT_DEVMEM is enabled on ARM with ARM_LPAE and devmem is used to map a high address that is not in the iomem address range, an unexpected error indicating no permission is returned.
This bug was introduced in v2.6.37, and the function was moved to lib in v5.11.
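The truncation is easy to reproduce in a standalone sketch. The typedefs below are assumptions that model a 32-bit kernel with LPAE (32-bit unsigned long, 64-bit phys_addr_t) on any host; pfn_shift_naive() and pfn_phys() are hypothetical names for the old and new expressions.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

typedef uint32_t ulong32_t;	/* models "unsigned long" on a 32-bit kernel */
typedef uint64_t phys_addr_t;	/* models phys_addr_t with LPAE enabled */

/* Buggy form: the shift happens in 32 bits and the top bits are lost. */
static phys_addr_t pfn_shift_naive(ulong32_t pfn)
{
	return pfn << PAGE_SHIFT;
}

/* PFN_PHYS()-style form: widen first, then shift. */
static phys_addr_t pfn_phys(ulong32_t pfn)
{
	return (phys_addr_t)pfn << PAGE_SHIFT;
}
```

For the page frame at 4 GiB (pfn 0x100000), the naive form yields 0 while the widened form yields 0x100000000, so only the latter can be checked correctly against the iomem ranges.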
Link: https://lkml.kernel.org/r/20210731025057.78825-1-wangliang101@huawei.com
Fixes: 087aaffcdf9c ("ARM: implement CONFIG_STRICT_DEVMEM by disabling access to RAM via /dev/mem")
Fixes: 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()")
Signed-off-by: Liang Wang wangliang101@huawei.com
Reviewed-by: Luis Chamberlain mcgrof@kernel.org
Cc: Palmer Dabbelt palmerdabbelt@google.com
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org
Cc: Russell King linux@armlinux.org.uk
Cc: Liang Wang wangliang101@huawei.com
Cc: Xiaoming Ni nixiaoming@huawei.com
Cc: Kefeng Wang wangkefeng.wang@huawei.com
Cc: stable@vger.kernel.org [2.6.37+]
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Stephen Rothwell sfr@canb.auug.org.au
[KF: fix devmem_is_allowed() on ARM]
Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com
Reviewed-by: Tong Tiangen tongtiangen@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com
---
 arch/arm/mm/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index b8d912ac9e61..c64124fb34ea 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -179,7 +179,7 @@ int valid_mmap_phys_addr_range(unsigned long pfn, size_t size)
  */
 int devmem_is_allowed(unsigned long pfn)
 {
-	if (iomem_is_exclusive(pfn << PAGE_SHIFT))
+	if (iomem_is_exclusive(PFN_PHYS(pfn)))
 		return 0;
 	if (!page_is_ram(pfn))
 		return 1;
From: Zhihao Cheng chengzhihao1@huawei.com
hulk inclusion category: bugfix bugzilla: 175251 https://gitee.com/openeuler/kernel/issues/I4DDEL
-------------------------------------------------
Since commit 46b5889cc2c5 ("mtd: implement proper partition handling") was applied, an mtd partition device no longer holds some callback functions, such as _block_isbad and _block_markbad. In addition, mtd_block_isbad() looks up the mtd device's master device and invokes the master's callback. As a result, the following sequence can make mtd_block_isbad() always return 0, even though the mtd device has bad blocks:
  1. Split a mtd device into 3 partitions: PA, PB, PC
     [ Each mtd partition device won't have the callback function _block_isbad(). ]
  2. Concatenate PA and PB as a new mtd device PN
     [ mtd_concat_create() finds that each subdev has no callback function _block_isbad(), so PN is not assigned the callback function concat_block_isbad(). ]
Then mtd_block_isbad() finds that "!master->_block_isbad" is true and always returns 0.
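The decision rule the patch adopts can be sketched with a trimmed-down device structure. struct fake_mtd and the helpers are hypothetical; the real struct mtd_info and mtd_get_master() carry far more state. The sketch only shows that callback existence is decided on the master, while the partition itself holds no callback.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down device: real struct mtd_info has many more fields. */
struct fake_mtd {
	struct fake_mtd *parent;		/* NULL for a master device */
	int (*block_isbad)(struct fake_mtd *);	/* set on masters only */
};

/* Walk up the partition chain to the master, in the spirit of mtd_get_master(). */
static struct fake_mtd *get_master(struct fake_mtd *mtd)
{
	while (mtd->parent)
		mtd = mtd->parent;
	return mtd;
}

/* Decide callback existence from the master, not from the partition. */
static int has_block_isbad(struct fake_mtd *subdev)
{
	return get_master(subdev)->block_isbad != NULL;
}

/* Dummy callback so the master has something to point at. */
static int always_good(struct fake_mtd *mtd)
{
	(void)mtd;
	return 0;
}
```

Checking the partition's own pointer would report "no callback" and reproduce the bug; checking the master reports the callback that will actually be invoked.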
Reproducer:
  // reproduce.c
  static int __init init_diy_module(void)
  {
  	struct mtd_info *mtd[2];
  	struct mtd_info *mtd_combine = NULL;

  	/* get_mtd_device_nm() returns an ERR_PTR() on failure, not NULL */
  	mtd[0] = get_mtd_device_nm("NAND simulator partition 0");
  	if (IS_ERR(mtd[0])) {
  		pr_err("cannot find mtd1\n");
  		return -EINVAL;
  	}
  	mtd[1] = get_mtd_device_nm("NAND simulator partition 1");
  	if (IS_ERR(mtd[1])) {
  		pr_err("cannot find mtd2\n");
  		return -EINVAL;
  	}

  	put_mtd_device(mtd[0]);
  	put_mtd_device(mtd[1]);

  	mtd_combine = mtd_concat_create(mtd, 2, "Combine mtd");
  	if (mtd_combine == NULL) {
  		pr_err("combine failed\n");
  		return -EINVAL;
  	}

  	mtd_device_register(mtd_combine, NULL, 0);
  	pr_info("Combine success\n");

  	return 0;
  }
  1. ID="0x20,0xac,0x00,0x15"
  2. modprobe nandsim id_bytes=$ID parts=50,100 badblocks=100
  3. insmod reproduce.ko
  4. flash_erase /dev/mtd3 0 0
     libmtd: error!: MEMERASE64 ioctl failed for eraseblock 100 (mtd3)
     error 5 (Input/output error)
     // Should be "flash_erase: Skipping bad block at 00c80000"
Link: https://lkml.org/lkml/2021/7/30/1148 Fixes: 46b5889cc2c54bac ("mtd: implement proper partition handling") Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com --- drivers/mtd/mtdconcat.c | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/drivers/mtd/mtdconcat.c b/drivers/mtd/mtdconcat.c
index 6e4d0017c0bd..af51eee6b5e8 100644
--- a/drivers/mtd/mtdconcat.c
+++ b/drivers/mtd/mtdconcat.c
@@ -641,6 +641,7 @@ struct mtd_info *mtd_concat_create(struct mtd_info *subdev[], /* subdevices to c
 	int i;
 	size_t size;
 	struct mtd_concat *concat;
+	struct mtd_info *subdev_master = NULL;
 	uint32_t max_erasesize, curr_erasesize;
 	int num_erase_region;
 	int max_writebufsize = 0;
@@ -679,17 +680,19 @@ struct mtd_info *mtd_concat_create(struct mtd_info *subdev[], /* subdevices to c
 	concat->mtd.subpage_sft = subdev[0]->subpage_sft;
 	concat->mtd.oobsize = subdev[0]->oobsize;
 	concat->mtd.oobavail = subdev[0]->oobavail;
-	if (subdev[0]->_writev)
+
+	subdev_master = mtd_get_master(subdev[0]);
+	if (subdev_master->_writev)
 		concat->mtd._writev = concat_writev;
-	if (subdev[0]->_read_oob)
+	if (subdev_master->_read_oob)
 		concat->mtd._read_oob = concat_read_oob;
-	if (subdev[0]->_write_oob)
+	if (subdev_master->_write_oob)
 		concat->mtd._write_oob = concat_write_oob;
-	if (subdev[0]->_block_isbad)
+	if (subdev_master->_block_isbad)
 		concat->mtd._block_isbad = concat_block_isbad;
-	if (subdev[0]->_block_markbad)
+	if (subdev_master->_block_markbad)
 		concat->mtd._block_markbad = concat_block_markbad;
-	if (subdev[0]->_panic_write)
+	if (subdev_master->_panic_write)
 		concat->mtd._panic_write = concat_panic_write;
 
 	concat->mtd.ecc_stats.badblocks = subdev[0]->ecc_stats.badblocks;
@@ -721,14 +724,22 @@ struct mtd_info *mtd_concat_create(struct mtd_info *subdev[], /* subdevices to c
 				    subdev[i]->flags & MTD_WRITEABLE;
 		}
 
+		subdev_master = mtd_get_master(subdev[i]);
 		concat->mtd.size += subdev[i]->size;
 		concat->mtd.ecc_stats.badblocks +=
 			subdev[i]->ecc_stats.badblocks;
 		if (concat->mtd.writesize != subdev[i]->writesize ||
 		    concat->mtd.subpage_sft != subdev[i]->subpage_sft ||
 		    concat->mtd.oobsize != subdev[i]->oobsize ||
-		    !concat->mtd._read_oob != !subdev[i]->_read_oob ||
-		    !concat->mtd._write_oob != !subdev[i]->_write_oob) {
+		    !concat->mtd._read_oob != !subdev_master->_read_oob ||
+		    !concat->mtd._write_oob != !subdev_master->_write_oob) {
+			/*
+			 * Check against subdev[i] for data members, because
+			 * subdev's attributes may be different from master
+			 * mtd device. Check against subdev's master mtd
+			 * device for callbacks, because the existence of
+			 * subdev's callbacks is decided by master mtd device.
+			 */
 			kfree(concat);
 			printk("Incompatible OOB or ECC data on \"%s\"\n",
 			       subdev[i]->name);

From: Zhihao Cheng chengzhihao1@huawei.com
hulk inclusion
category: bugfix
bugzilla: 175251 https://gitee.com/openeuler/kernel/issues/I4DDEL
-------------------------------------------------
Since 2431c4f5b46c3 ("mtd: Implement mtd_{read,write}() as wrappers around mtd_{read,write}_oob()") does not allow _read|_write and _read_oob|_write_oob to exist at the same time, we should check for the existence of the "_read" and "_write" callbacks in the subdev's master device (we can trust the master device since it has already been registered) before assigning them. Otherwise, the following warning occurs when creating a concatenated device:
WARNING: CPU: 2 PID: 6728 at drivers/mtd/mtdcore.c:595 add_mtd_device+0x7f/0x7b0
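The rule described above can be modeled in plain userspace C. The following is only a sketch under stated assumptions, not the driver code: struct demo_mtd, demo_op(), demo_get_master(), demo_concat_init() and demo_callbacks_valid() are hypothetical stand-ins invented for illustration. They mimic the idea that the concatenated device mirrors only the callbacks its subdev's master implements, so that it never exposes both the plain and the OOB variant of an operation.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mtd_info. */
struct demo_mtd {
	struct demo_mtd *parent;	/* NULL for a master device */
	int (*read)(void);
	int (*write)(void);
	int (*read_oob)(void);
	int (*write_oob)(void);
};

/* Hypothetical placeholder for the concat_*() handlers. */
static int demo_op(void)
{
	return 0;
}

/* Walk up to the top-level device (assumed to resemble mtd_get_master()). */
static struct demo_mtd *demo_get_master(struct demo_mtd *mtd)
{
	while (mtd->parent)
		mtd = mtd->parent;
	return mtd;
}

/* Install a concat handler only where the master has the callback. */
static void demo_concat_init(struct demo_mtd *concat, struct demo_mtd *subdev0)
{
	struct demo_mtd *master = demo_get_master(subdev0);

	if (master->read)
		concat->read = demo_op;
	if (master->write)
		concat->write = demo_op;
	if (master->read_oob)
		concat->read_oob = demo_op;
	if (master->write_oob)
		concat->write_oob = demo_op;
}

/* Model of the rule: plain and OOB variants must not coexist. */
static int demo_callbacks_valid(const struct demo_mtd *mtd)
{
	return !((mtd->read && mtd->read_oob) ||
		 (mtd->write && mtd->write_oob));
}
```

In this model, a partition whose master provides only read_oob/write_oob yields a concatenated device with no read/write handlers, and demo_callbacks_valid() accepts it. Assigning read unconditionally, as the code did before this patch, makes the check fail.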
Link: https://lkml.org/lkml/2021/7/30/1148
Fixes: 2431c4f5b46c3 ("mtd: Implement mtd_{read,write}() around ...")
Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com
Reviewed-by: Jason Yan yanaijie@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com
---
 drivers/mtd/mtdconcat.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/mtdconcat.c b/drivers/mtd/mtdconcat.c
index af51eee6b5e8..f685a581df48 100644
--- a/drivers/mtd/mtdconcat.c
+++ b/drivers/mtd/mtdconcat.c
@@ -694,6 +694,10 @@ struct mtd_info *mtd_concat_create(struct mtd_info *subdev[], /* subdevices to c
 		concat->mtd._block_markbad = concat_block_markbad;
 	if (subdev_master->_panic_write)
 		concat->mtd._panic_write = concat_panic_write;
+	if (subdev_master->_read)
+		concat->mtd._read = concat_read;
+	if (subdev_master->_write)
+		concat->mtd._write = concat_write;
 
 	concat->mtd.ecc_stats.badblocks = subdev[0]->ecc_stats.badblocks;
 
@@ -755,8 +759,6 @@ struct mtd_info *mtd_concat_create(struct mtd_info *subdev[], /* subdevices to c
 	concat->mtd.name = name;
 
 	concat->mtd._erase = concat_erase;
-	concat->mtd._read = concat_read;
-	concat->mtd._write = concat_write;
 	concat->mtd._sync = concat_sync;
 	concat->mtd._lock = concat_lock;
 	concat->mtd._unlock = concat_unlock;

From: Peter Zijlstra peterz@infradead.org
mainline inclusion
from mainline-v5.13-rc1
commit 3a7956e25e1d7b3c148569e78895e1f3178122a9
bugzilla: 52510 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The kthread_is_per_cpu() construct relies on only being called on PF_KTHREAD tasks (per the WARN in to_kthread). This gives rise to the following usage pattern:
if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
However, as reported by syzkaller, this is broken. The scenario is:
	CPU0				CPU1 (running p)

	(p->flags & PF_KTHREAD) // true

					begin_new_exec()
					  me->flags &= ~(PF_KTHREAD|...);
	kthread_is_per_cpu(p)
	  to_kthread(p)
	    WARN(!(p->flags & PF_KTHREAD) <-- *SPLAT*
Introduce __to_kthread() that omits the WARN and is sure to check both values.
Use this to remove the problematic pattern for kthread_is_per_cpu() and fix a number of other kthread_*() functions that have similar issues but are currently not used in ways that would expose the problem.
Notably kthread_func() is only ever called on 'current', while kthread_probe_data() is only used for PF_WQ_WORKER, which implies the task is from kthread_create*().
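The "check both values" idea can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: struct demo_task, struct demo_kthread, DEMO_PF_KTHREAD, DEMO_KTHREAD_IS_PER_CPU and the demo_*() functions are hypothetical names, and the flag values are arbitrary. Being single-threaded, it does not reproduce the race itself. It only shows that a caller no longer needs a separate flag test before the lookup, because the lookup returns NULL once the flag has been cleared.

```c
#include <stddef.h>

#define DEMO_PF_KTHREAD		0x1u	/* hypothetical flag value */
#define DEMO_KTHREAD_IS_PER_CPU	0x1ul	/* hypothetical flag value */

/* Hypothetical stand-in for struct kthread. */
struct demo_kthread {
	unsigned long flags;
};

/* Hypothetical stand-in for struct task_struct. */
struct demo_task {
	unsigned int flags;
	void *set_child_tid;	/* holds the kthread pointer for kthreads */
};

/*
 * Return the kthread data only if both the pointer is set and the task
 * is still flagged as a kthread; otherwise return NULL without warning.
 */
static struct demo_kthread *demo_to_kthread_checked(const struct demo_task *p)
{
	void *kthread = p->set_child_tid;

	if (kthread && !(p->flags & DEMO_PF_KTHREAD))
		kthread = NULL;
	return kthread;
}

/* Safe to call on any task: non-kthreads simply yield 0. */
static int demo_kthread_is_per_cpu(const struct demo_task *p)
{
	struct demo_kthread *kthread = demo_to_kthread_checked(p);

	if (!kthread)
		return 0;
	return (kthread->flags & DEMO_KTHREAD_IS_PER_CPU) != 0;
}
```

With this shape, a call site can reduce to a single demo_kthread_is_per_cpu(p) test, which mirrors the change made to can_migrate_task() in this patch.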
Fixes: ac687e6e8c26 ("kthread: Extract KTHREAD_IS_PER_CPU")
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Reviewed-by: Valentin Schneider Valentin.Schneider@arm.com
Link: https://lkml.kernel.org/r/YH6WJc825C4P0FCK@hirez.programming.kicks-ass.net
Signed-off-by: Zheng Zucheng zhengzucheng@huawei.com

Conflicts:
	kernel/sched/core.c
Reviewed-by: Chen Hui judy.chenhui@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com
---
 kernel/kthread.c    | 33 +++++++++++++++++++++++++++------
 kernel/sched/fair.c |  2 +-
 2 files changed, 28 insertions(+), 7 deletions(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 9825cf89c614..508fe5278285 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -84,6 +84,25 @@ static inline struct kthread *to_kthread(struct task_struct *k)
 	return (__force void *)k->set_child_tid;
 }
 
+/*
+ * Variant of to_kthread() that doesn't assume @p is a kthread.
+ *
+ * Per construction; when:
+ *
+ *   (p->flags & PF_KTHREAD) && p->set_child_tid
+ *
+ * the task is both a kthread and struct kthread is persistent. However
+ * PF_KTHREAD on it's own is not, kernel_thread() can exec() (See umh.c and
+ * begin_new_exec()).
+ */
+static inline struct kthread *__to_kthread(struct task_struct *p)
+{
+	void *kthread = (__force void *)p->set_child_tid;
+	if (kthread && !(p->flags & PF_KTHREAD))
+		kthread = NULL;
+	return kthread;
+}
+
 void free_kthread_struct(struct task_struct *k)
 {
 	struct kthread *kthread;
@@ -168,8 +187,9 @@ EXPORT_SYMBOL_GPL(kthread_freezable_should_stop);
  */
 void *kthread_func(struct task_struct *task)
 {
-	if (task->flags & PF_KTHREAD)
-		return to_kthread(task)->threadfn;
+	struct kthread *kthread = __to_kthread(task);
+	if (kthread)
+		return kthread->threadfn;
 	return NULL;
 }
 EXPORT_SYMBOL_GPL(kthread_func);
@@ -199,10 +219,11 @@ EXPORT_SYMBOL_GPL(kthread_data);
  */
 void *kthread_probe_data(struct task_struct *task)
 {
-	struct kthread *kthread = to_kthread(task);
+	struct kthread *kthread = __to_kthread(task);
 	void *data = NULL;
 
-	copy_from_kernel_nofault(&data, &kthread->data, sizeof(data));
+	if (kthread)
+		copy_from_kernel_nofault(&data, &kthread->data, sizeof(data));
 	return data;
 }
 
@@ -514,9 +535,9 @@ void kthread_set_per_cpu(struct task_struct *k, int cpu)
 	set_bit(KTHREAD_IS_PER_CPU, &kthread->flags);
 }
 
-bool kthread_is_per_cpu(struct task_struct *k)
+bool kthread_is_per_cpu(struct task_struct *p)
 {
-	struct kthread *kthread = to_kthread(k);
+	struct kthread *kthread = __to_kthread(p);
 	if (!kthread)
 		return false;
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bd42f8e9f5ad..32fea109e604 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7804,7 +7804,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 		return 0;
 
 	/* Disregard pcpu kthreads; they are where they need to be. */
-	if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
+	if (kthread_is_per_cpu(p))
 		return 0;
 
 	if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {