From: Peter Zijlstra peterz@infradead.org
mainline inclusion
from mainline-v5.12-rc1
commit 87ccc826bf1c9e5ab4c2f649b404e02c63e47622
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5LCHG
CVE: NA
--------------------------------
Currently REG_SP_INDIRECT is unused but means (%rsp + offset), change it to mean (%rsp) + offset.
The reason is that we're going to swizzle stack in the middle of a C function with non-trivial stack footprint. This means that when the unwinder finds the ToS, it needs to dereference it (%rsp) and then add the offset to the next frame, resulting in: (%rsp) + offset
This is somewhat unfortunate, since REG_BP_INDIRECT is used (by DRAP) and thus needs to retain the current (%rbp + offset).
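To make the distinction concrete, here is a minimal user-space sketch, not kernel code. It models memory as a small array of 64-bit words and treats an address as a byte offset into that array. load_word(), sp_indirect_old() and sp_indirect_new() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: memory is MEM_WORDS 64-bit words, addressed by byte offset. */
#define MEM_WORDS 16

/* Load the aligned 64-bit word stored at byte offset 'addr'. */
static uint64_t load_word(const uint64_t *mem, uint64_t addr)
{
	assert(addr % sizeof(uint64_t) == 0);
	assert(addr / sizeof(uint64_t) < MEM_WORDS);
	return mem[addr / sizeof(uint64_t)];
}

/* Old meaning, (%rsp + offset): add the offset first, then dereference. */
static uint64_t sp_indirect_old(const uint64_t *mem, uint64_t rsp, int64_t offset)
{
	return load_word(mem, rsp + (uint64_t)offset);
}

/* New meaning, (%rsp) + offset: dereference first, then add the offset. */
static uint64_t sp_indirect_new(const uint64_t *mem, uint64_t rsp, int64_t offset)
{
	return load_word(mem, rsp) + (uint64_t)offset;
}
```

If the word at address 16 holds 96 and the word at address 24 holds 1234, then sp_indirect_old(mem, 16, 8) yields 1234 while sp_indirect_new(mem, 16, 8) yields 104. The second form is the one needed when the top of stack holds a pointer to the previous stack.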
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Reviewed-by: Miroslav Benes mbenes@suse.cz
Acked-by: Josh Poimboeuf jpoimboe@redhat.com
Signed-off-by: Yipeng Zou zouyipeng@huawei.com
Reviewed-by: Zhang Jianhua chris.zjh@huawei.com
Reviewed-by: Liao Chang liaochang1@huawei.com
Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com
---
 arch/x86/kernel/unwind_orc.c | 5 ++++-
 tools/objtool/orc_dump.c     | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index bafe953f5d7f..eea8ec5eca3b 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -450,7 +450,7 @@ bool unwind_next_frame(struct unwind_state *state)
 		break;
 
 	case ORC_REG_SP_INDIRECT:
-		sp = state->sp + orc->sp_offset;
+		sp = state->sp;
 		indirect = true;
 		break;
 
@@ -500,6 +500,9 @@ bool unwind_next_frame(struct unwind_state *state)
 	if (indirect) {
 		if (!deref_stack_reg(state, sp, &sp))
 			goto err;
+
+		if (orc->sp_reg == ORC_REG_SP_INDIRECT)
+			sp += orc->sp_offset;
 	}
 
 	/* Find IP, SP and possibly regs: */
diff --git a/tools/objtool/orc_dump.c b/tools/objtool/orc_dump.c
index faa444270ee3..ba28830aace2 100644
--- a/tools/objtool/orc_dump.c
+++ b/tools/objtool/orc_dump.c
@@ -64,7 +64,7 @@ static void print_reg(unsigned int reg, int offset)
 	if (reg == ORC_REG_BP_INDIRECT)
 		printf("(bp%+d)", offset);
 	else if (reg == ORC_REG_SP_INDIRECT)
-		printf("(sp%+d)", offset);
+		printf("(sp)%+d", offset);
 	else if (reg == ORC_REG_UNDEFINED)
 		printf("(und)");
 	else
From: Josh Poimboeuf jpoimboe@redhat.com
mainline inclusion
from mainline-v5.12-rc3
commit b59cc97674c947861783ca92b9a6e7d043adba96
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5LCHG
CVE: NA
--------------------------------
The ORC unwinder attempts to fall back to frame pointers when ORC data is missing for a given instruction. It sets state->error, but then tries to keep going as a best-effort type of thing. That may result in further warnings if the unwinder gets lost.
Until we have some way to register generated code with the unwinder, missing ORC will be expected, and occasionally going off the rails will also be expected. So don't warn about it.
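The intended behaviour can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: struct fake_state, fake_warn_current() and fake_mark_missing_orc() are hypothetical stand-ins, and a counter replaces the real warning output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of the unwind state used here. */
struct fake_state {
	bool is_current;	/* models state->task == current */
	bool error;		/* set once the unwinder has lost reliable data */
	int warnings;		/* counts warnings instead of printing them */
};

/* Warn only for the current task and only while no error was recorded. */
static void fake_warn_current(struct fake_state *s)
{
	assert(s != NULL);
	if (s->is_current && !s->error)
		s->warnings++;
}

/* Models the fallback path: ORC data is missing, so mark the state. */
static void fake_mark_missing_orc(struct fake_state *s)
{
	assert(s != NULL);
	s->error = true;
}
```

In this model a warning issued before fake_mark_missing_orc() is counted, while any warning issued afterwards is silently dropped, which is the effect the extra !state->error check is meant to have.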
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Signed-off-by: Borislav Petkov bp@suse.de
Tested-by: Ivan Babrou ivan@cloudflare.com
Link: https://lkml.kernel.org/r/06d02c4bbb220bd31668db579278b0352538efbb.161253464...
Signed-off-by: Yipeng Zou zouyipeng@huawei.com
Reviewed-by: Zhang Jianhua chris.zjh@huawei.com
Reviewed-by: Liao Chang liaochang1@huawei.com
Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com
---
 arch/x86/kernel/unwind_orc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index eea8ec5eca3b..fe1bfb3eed89 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -11,7 +11,7 @@
 
 #define orc_warn_current(args...)	\
 ({	\
-	if (state->task == current)	\
+	if (state->task == current && !state->error)	\
 		orc_warn(args);	\
 })
 
From: Dmitry Monakhov dmtrmonakhov@yandex-team.ru
mainline inclusion
from mainline-v5.18-rc5
commit 6c8ef58a50b5fab6e364b558143490a2014e2a4f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5LCHG
CVE: NA
--------------------------------
A crash was observed in the ORC unwinder:
  BUG: stack guard page was hit at 000000000dd984a2 (stack is 00000000d1caafca..00000000613712f0)
  kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
  CPU: 93 PID: 23787 Comm: context_switch1 Not tainted 5.4.145 #1
  RIP: 0010:unwind_next_frame
  Call Trace:
   <NMI>
   perf_callchain_kernel
   get_perf_callchain
   perf_callchain
   perf_prepare_sample
   perf_event_output_forward
   __perf_event_overflow
   perf_ibs_handle_irq
   perf_ibs_nmi_handler
   nmi_handle
   default_do_nmi
   do_nmi
   end_repeat_nmi
This was really two bugs:
1) The perf IBS code passed inconsistent regs to the unwinder.
2) The unwinder didn't handle the bad input gracefully.
Fix the latter bug. The ORC unwinder needs to be immune against bad inputs. The problem is that stack_access_ok() doesn't recheck the validity of the full range of registers after switching to the next valid stack with get_stack_info(). Fix that.
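The difference between the old and the new check can be shown with a small user-space model. It is a sketch, not the kernel implementation: struct range, in_range() and lookup_stack() are hypothetical stand-ins for struct stack_info, on_stack() and get_stack_info(), and the model assumes the lookup only considers the single start address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct stack_info: one [begin, end) range. */
struct range {
	uintptr_t begin;
	uintptr_t end;
};

/* True if [addr, addr + len) lies entirely inside the range. */
static bool in_range(const struct range *r, uintptr_t addr, size_t len)
{
	return r->begin < r->end && addr >= r->begin &&
	       addr < r->end && len <= r->end - addr;
}

/*
 * Find the known stack containing the single address 'addr' and store it
 * in *cur. Returns 0 on success, -1 if no stack matches. Only 'addr' is
 * examined, never addr + len.
 */
static int lookup_stack(const struct range *stacks, size_t n,
			uintptr_t addr, struct range *cur)
{
	assert(cur != NULL);
	for (size_t i = 0; i < n; i++) {
		if (in_range(&stacks[i], addr, 1)) {
			*cur = stacks[i];
			return 0;
		}
	}
	return -1;
}

/* Old logic: trusts a successful lookup without rechecking the length. */
static bool access_ok_old(const struct range *stacks, size_t n,
			  struct range *cur, uintptr_t addr, size_t len)
{
	if (!in_range(cur, addr, len) && lookup_stack(stacks, n, addr, cur))
		return false;
	return true;
}

/* Fixed logic: after switching stacks, recheck the full range. */
static bool access_ok_new(const struct range *stacks, size_t n,
			  struct range *cur, uintptr_t addr, size_t len)
{
	if (in_range(cur, addr, len))
		return true;
	return !lookup_stack(stacks, n, addr, cur) && in_range(cur, addr, len);
}
```

With stacks at [0x1000, 0x2000) and [0x3000, 0x4000), a 16-byte access at 0x3ff8 starts on the second stack but runs 8 bytes past its end. access_ok_old() accepts it, access_ok_new() rejects it.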
[ jpoimboe: rewrote commit log ]
Signed-off-by: Dmitry Monakhov dmtrmonakhov@yandex-team.ru
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
Link: https://lore.kernel.org/r/1650353656-956624-1-git-send-email-dmtrmonakhov@ya...
Signed-off-by: Peter Zijlstra peterz@infradead.org
Signed-off-by: Yipeng Zou zouyipeng@huawei.com
Reviewed-by: Zhang Jianhua chris.zjh@huawei.com
Reviewed-by: Liao Chang liaochang1@huawei.com
Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com
---
 arch/x86/kernel/unwind_orc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index fe1bfb3eed89..3ff76f88e220 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -329,11 +329,11 @@ static bool stack_access_ok(struct unwind_state *state, unsigned long _addr,
 	struct stack_info *info = &state->stack_info;
 	void *addr = (void *)_addr;
 
-	if (!on_stack(info, addr, len) &&
-	    (get_stack_info(addr, state->task, info, &state->stack_mask)))
-		return false;
+	if (on_stack(info, addr, len))
+		return true;
 
-	return true;
+	return !get_stack_info(addr, state->task, info, &state->stack_mask) &&
+		on_stack(info, addr, len);
 }
 
 static bool deref_stack_reg(struct unwind_state *state, unsigned long addr,
From: "Eric W. Biederman" ebiederm@xmission.com
mainline inclusion
from mainline-v5.15-rc1
commit d21918e5a94a862ccb297b9f2be38574c865fda0
category: bugfix
bugzilla: 187336, https://gitee.com/openeuler/kernel/issues/I5LCBR
CVE: NA
--------------------------------
Replace get_nr_threads with atomic_read(&current->signal->live) as that is a more accurate number that is decremented sooner.
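Why the earlier decrement matters can be sketched with a user-space model based on C11 atomics. Everything here is hypothetical: struct fake_signal and the fake_*() helpers only imitate a counter that drops at the start of thread exit and a second one that drops once teardown has finished.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the two per-process counters being compared. */
struct fake_signal {
	atomic_int live;	/* threads that have not started exiting */
	int nr_threads;		/* threads not yet fully torn down */
};

static void fake_signal_init(struct fake_signal *sig, int threads)
{
	assert(threads > 0);
	atomic_init(&sig->live, threads);
	sig->nr_threads = threads;
}

/* Early exit step: the thread stops counting as live. */
static void fake_exit_begin(struct fake_signal *sig)
{
	atomic_fetch_sub(&sig->live, 1);
}

/* Late exit step: the thread is finally removed from the group. */
static void fake_exit_finish(struct fake_signal *sig)
{
	sig->nr_threads--;
}

/* "Last remaining thread" test based on the early counter. */
static bool last_thread_by_live(struct fake_signal *sig)
{
	return atomic_load(&sig->live) == 1;
}

/* The same test based on the late counter. */
static bool last_thread_by_count(const struct fake_signal *sig)
{
	return sig->nr_threads == 1;
}
```

Between fake_exit_begin() and fake_exit_finish() of the second-to-last thread, last_thread_by_live() already reports true while last_thread_by_count() still reports false. That window is what the commit closes.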
Acked-by: Kees Cook keescook@chromium.org
Link: https://lkml.kernel.org/r/87lf6z6qbd.fsf_-_@disp2133
Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com
Conflicts: kernel/seccomp.c
Signed-off-by: GONG, Ruiqi gongruiqi1@huawei.com
Reviewed-by: Wang Weiyang wangweiyang2@huawei.com
Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com
---
 kernel/seccomp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 369b63ac7cf9..e5a5334c2f96 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -751,7 +751,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 		seccomp_log(this_syscall, SIGSYS, action, true);
 		/* Dump core only if this is the last remaining thread. */
 		if (action == SECCOMP_RET_KILL_PROCESS ||
-		    get_nr_threads(current) == 1) {
+		    (atomic_read(&current->signal->live) == 1)) {
 			siginfo_t info;
 
 			/* Show the original registers in the dump. */
From: Juergen Gross jgross@suse.com
stable inclusion
from stable-4.19.253
commit 36e2f161fb01795722f2ff1a24d95f08100333dd
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5JTYM
CVE: CVE-2022-36123
--------------------------------
[ Upstream commit 38fa5479b41376dc9d7f57e71c83514285a25ca0 ]
The .brk section has the same properties as .bss: it is an alloc-only section and should be cleared before being used.
Not doing so is especially a problem for Xen PV guests, as the hypervisor will validate page tables (check for writable page tables and hypervisor private bits) before accepting them to be used.
Make sure .brk is initially zero by letting clear_bss() clear the brk area, too.
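As a rough user-space illustration of the idea, the sketch below zeroes two regions that are each described by a start and an end pointer. The arrays fake_bss and fake_brk are hypothetical stand-ins for the linker-provided section bounds, and clear_region(), clear_bss_model() and all_zero() are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the .bss and .brk areas. */
static char fake_bss[64];
static char fake_brk[32];

/* Zero every byte in [start, stop). */
static void clear_region(char *start, char *stop)
{
	assert(stop >= start);
	memset(start, 0, (size_t)(stop - start));
}

/* Model of clear_bss() after the fix: both alloc-only areas are zeroed. */
static void clear_bss_model(void)
{
	clear_region(fake_bss, fake_bss + sizeof(fake_bss));
	clear_region(fake_brk, fake_brk + sizeof(fake_brk));
}

/* Returns 1 if every byte of the buffer is zero, 0 otherwise. */
static int all_zero(const char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```

Filling both arrays with a non-zero pattern and then calling clear_bss_model() leaves all_zero() true for both of them. Without the second clear_region() call, stale bytes in fake_brk would survive, which is the situation described above for early page tables.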
Signed-off-by: Juergen Gross jgross@suse.com
Signed-off-by: Borislav Petkov bp@suse.de
Link: https://lore.kernel.org/r/20220630071441.28576-3-jgross@suse.com
Signed-off-by: Sasha Levin sashal@kernel.org
Signed-off-by: GONG, Ruiqi gongruiqi1@huawei.com
Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com
Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com
---
 arch/x86/kernel/head64.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 88dc38b4a147..90c2613af36b 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -383,6 +383,8 @@ static void __init clear_bss(void)
 {
 	memset(__bss_start, 0,
 	       (unsigned long) __bss_stop - (unsigned long) __bss_start);
+	memset(__brk_base, 0,
+	       (unsigned long) __brk_limit - (unsigned long) __brk_base);
 }
 
 static unsigned long get_cmd_line_ptr(void)
From: Yu Kuai yukuai3@huawei.com
hulk inclusion
category: bugfix
bugzilla: 187345, https://gitee.com/openeuler/kernel/issues/I5KZZ0
CVE: NA
--------------------------------
Otherwise, a NULL pointer crash can be triggered when a bio is handled in blk_mq_submit_bio() while the queue is not yet initialized.
Since the queue is registered right after initialization, use the flag 'QUEUE_FLAG_REGISTERED' to make sure the queue is initialized. Although this slightly delays the point at which bios can be handled, it does no harm in real use cases.
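The gating pattern can be sketched in plain C as follows. This is a single-threaded model under stated assumptions: struct fake_queue, FAKE_QUEUE_FLAG_REGISTERED and the fake_*() helpers are hypothetical, the bit number is arbitrary, and no memory ordering between setting the pointer and setting the flag is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit number; the kernel's real flag value is not assumed. */
enum { FAKE_QUEUE_FLAG_REGISTERED = 3 };

enum submit_result { SUBMIT_OK, SUBMIT_IO_ERROR };

struct fake_queue {
	unsigned long flags;	/* bitmask of state flags */
	void *hw_ctx;		/* NULL until initialization has finished */
};

/* Test a single bit in a flag word. */
static bool fake_test_bit(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* Registration is the last step, so the flag implies hw_ctx is valid. */
static void fake_register(struct fake_queue *q, void *hw_ctx)
{
	assert(hw_ctx != NULL);
	q->hw_ctx = hw_ctx;
	q->flags |= 1UL << FAKE_QUEUE_FLAG_REGISTERED;
}

/* Fail the request early instead of touching an uninitialized queue. */
static enum submit_result fake_submit(struct fake_queue *q)
{
	if (!fake_test_bit(FAKE_QUEUE_FLAG_REGISTERED, &q->flags))
		return SUBMIT_IO_ERROR;

	/* hw_ctx was stored before the flag was set, so it is usable here. */
	return q->hw_ctx ? SUBMIT_OK : SUBMIT_IO_ERROR;
}
```

Submitting to a freshly zeroed fake_queue returns SUBMIT_IO_ERROR. After fake_register() the same call returns SUBMIT_OK.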
Signed-off-by: Yu Kuai yukuai3@huawei.com
Reviewed-by: Jason Yan yanaijie@huawei.com
Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com
---
 block/blk-mq.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 34d4fdb4e717..eb89afa84ac5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1957,6 +1957,11 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	struct request *same_queue_rq = NULL;
 	blk_qc_t cookie;
 
+	if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags)) {
+		bio_io_error(bio);
+		return BLK_QC_T_NONE;
+	}
+
 	blk_queue_bounce(q, &bio);
 
 	blk_queue_split(q, &bio);
From: Yu Kuai yukuai3@huawei.com
hulk inclusion
category: bugfix
bugzilla: 187345, https://gitee.com/openeuler/kernel/issues/I5KZZ0
CVE: NA
--------------------------------
Commit faf2662e328c ("block: fix that part scan is disabled in device_add_disk()") introduced a regression:

Test procedures:
dmsetup create test --notable
dmsetup remove test

Test result:
dmsetup hangs forever

Root cause:
before:
1) dmsetup create
   add_disk_add_disk_no_queue_reg()
    scan partitions
    uevent
2) blk_register_queue() -> notable will not call this
3) dmsetup remove
   wait for uevent

after:
1) dmsetup create
   add_disk_add_disk_no_queue_reg()
2) blk_register_queue() -> notable will not call this
    scan_partitions
    uevent
3) dmsetup remove
   wait for uevent -> impossible for notable

Fix the problem by moving scan_partitions and uevent from blk_register_queue() to the end of add_disk_add_disk_no_queue_reg().
Fixes: faf2662e328c ("block: fix that part scan is disabled in device_add_disk()")
Signed-off-by: Yu Kuai yukuai3@huawei.com
Reviewed-by: Jason Yan yanaijie@huawei.com
Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com
---
 block/blk-sysfs.c | 48 -----------------------------------------------
 block/genhd.c     | 43 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 48 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 530f1bf36c87..30898a7855d7 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -881,42 +881,6 @@ struct kobj_type blk_queue_ktype = {
 	.release = blk_release_queue,
 };
 
-static void disk_init_partition(struct gendisk *disk)
-{
-	struct device *ddev = disk_to_dev(disk);
-	struct block_device *bdev;
-	struct disk_part_iter piter;
-	struct hd_struct *part;
-
-	/* No minors to use for partitions */
-	if (!disk_part_scan_enabled(disk))
-		goto exit;
-
-	/* No such device (e.g., media were just removed) */
-	if (!get_capacity(disk))
-		goto exit;
-
-	bdev = bdget_disk(disk, 0);
-	if (!bdev)
-		goto exit;
-
-	bdev->bd_invalidated = 1;
-	if (blkdev_get(bdev, FMODE_READ, NULL))
-		goto exit;
-	blkdev_put(bdev, FMODE_READ);
-
-exit:
-	/* announce disk after possible partitions are created */
-	dev_set_uevent_suppress(ddev, 0);
-	kobject_uevent(&ddev->kobj, KOBJ_ADD);
-
-	/* announce possible partitions */
-	disk_part_iter_init(&piter, disk, 0);
-	while ((part = disk_part_iter_next(&piter)))
-		kobject_uevent(&part_to_dev(part)->kobj, KOBJ_ADD);
-	disk_part_iter_exit(&piter);
-}
-
 /**
  * blk_register_queue - register a block layer queue with sysfs
  * @disk: Disk of which the request queue should be registered with sysfs.
@@ -972,21 +936,9 @@ int blk_register_queue(struct gendisk *disk)
 		}
 	}
 
-	/*
-	 * Set the flag at last, so that block devcie can't be opened
-	 * before it's registration is done.
-	 */
-	disk->flags |= GENHD_FL_UP;
 	ret = 0;
 unlock:
 	mutex_unlock(&q->sysfs_lock);
-	/*
-	 * Init partitions after releasing 'sysfs_lock', otherwise lockdep
-	 * will be confused because it will treat 'bd_mutex' from different
-	 * devices as the same lock.
-	 */
-	if (!ret)
-		disk_init_partition(disk);
 
 	/*
 	 * SCSI probing may synchronously create and destroy a lot of
diff --git a/block/genhd.c b/block/genhd.c
index 124f8d94584c..4a748603c881 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -639,6 +639,42 @@ static void register_disk(struct device *parent, struct gendisk *disk)
 	}
 }
 
+static void disk_init_partition(struct gendisk *disk)
+{
+	struct device *ddev = disk_to_dev(disk);
+	struct block_device *bdev;
+	struct disk_part_iter piter;
+	struct hd_struct *part;
+
+	/* No minors to use for partitions */
+	if (!disk_part_scan_enabled(disk))
+		goto exit;
+
+	/* No such device (e.g., media were just removed) */
+	if (!get_capacity(disk))
+		goto exit;
+
+	bdev = bdget_disk(disk, 0);
+	if (!bdev)
+		goto exit;
+
+	bdev->bd_invalidated = 1;
+	if (blkdev_get(bdev, FMODE_READ, NULL))
+		goto exit;
+	blkdev_put(bdev, FMODE_READ);
+
+exit:
+	/* announce disk after possible partitions are created */
+	dev_set_uevent_suppress(ddev, 0);
+	kobject_uevent(&ddev->kobj, KOBJ_ADD);
+
+	/* announce possible partitions */
+	disk_part_iter_init(&piter, disk, 0);
+	while ((part = disk_part_iter_next(&piter)))
+		kobject_uevent(&part_to_dev(part)->kobj, KOBJ_ADD);
+	disk_part_iter_exit(&piter);
+}
+
 /**
  * __device_add_disk - add disk information to kernel list
  * @parent: parent device for the disk
@@ -704,6 +740,13 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
 
 	disk_add_events(disk);
 	blk_integrity_add(disk);
+
+	/*
+	 * Set the flag at last, so that block devcie can't be opened
+	 * before it's registration is done.
+	 */
+	disk->flags |= GENHD_FL_UP;
+	disk_init_partition(disk);
 }
 
 void device_add_disk(struct device *parent, struct gendisk *disk)