From: Thomas Gleixner <tglx@linutronix.de>
stable inclusion from stable-v4.19.274 commit d6a300076d11a6e27b4d4f7fd986ec66ee97a3e1 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
commit d125d1349abeb46945dc5e98f7824bf688266f13 upstream.
syzbot reported an RCU stall which is caused by setting up an alarmtimer with a very small interval and ignoring the signal. The reproducer arms the alarm timer with a relative expiry of 8ns and an interval of 9ns. Not a problem per se, but that's an issue when the signal is ignored, because the timer is then immediately rearmed, as there is no way to delay the rearming to the signal delivery path. See posix_timer_fn() and commit 58229a189942 ("posix-timers: Prevent softirq starvation by small intervals and SIG_IGN") for details.
The reproducer does not set SIG_IGN explicitly, but it sets up the timer's signal as SIGCONT. That has the same effect as explicitly setting SIG_IGN for a signal, as SIGCONT is ignored if there is no handler set and the task is not ptraced.
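For illustration, a minimal userspace sketch of such a setup (hypothetical, reconstructed from the description above, not the actual syzbot reproducer; CLOCK_REALTIME_ALARM additionally requires CAP_WAKE_ALARM) could look like this:

	#include <signal.h>
	#include <time.h>
	#include <unistd.h>

	int main(void)
	{
		/* SIGCONT with no handler and no tracer is implicitly ignored */
		struct sigevent sev = {
			.sigev_notify = SIGEV_SIGNAL,
			.sigev_signo  = SIGCONT,
		};
		/* 8ns relative expiry and 9ns interval, as described above */
		struct itimerspec its = {
			.it_value    = { .tv_nsec = 8 },
			.it_interval = { .tv_nsec = 9 },
		};
		timer_t tid;

		if (timer_create(CLOCK_REALTIME_ALARM, &sev, &tid))
			return 1;
		if (timer_settime(tid, 0, &its, NULL))
			return 1;
		for (;;)
			pause();	/* the kernel rearms the timer forever */
	}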
The log clearly shows that:
[pid 5102] --- SIGCONT {si_signo=SIGCONT, si_code=SI_TIMER, si_timerid=0, si_overrun=316014, si_int=0, si_ptr=NULL} ---
It works because the tasks are traced and therefore the signal is queued so the tracer can see it, which delays the restart of the timer to the signal delivery path. But then the tracer is killed:
[pid 5087] kill(-5102, SIGKILL <unfinished ...>
...
./strace-static-x86_64: Process 5107 detached
and after it's gone the stall can be observed:
syzkaller login: [   79.439102][    C0] hrtimer: interrupt took 68471 ns
[  184.460538][    C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
...
[  184.658237][    C1] rcu: Stack dump where RCU GP kthread last ran:
[  184.664574][    C1] Sending NMI from CPU 1 to CPUs 0:
[  184.669821][    C0] NMI backtrace for cpu 0
[  184.669831][    C0] CPU: 0 PID: 5108 Comm: syz-executor192 Not tainted 6.2.0-rc6-next-20230203-syzkaller #0
...
[  184.670036][    C0] Call Trace:
[  184.670041][    C0]  <IRQ>
[  184.670045][    C0]  alarmtimer_fired+0x327/0x670
posix_timer_fn() prevents that by checking whether the interval for timers which have their signal ignored is smaller than a jiffie, and artificially delays such timers by shifting the next expiry out by a jiffie. That's accurate vs. the overrun accounting, but slightly inaccurate vs. timer_gettime(2).
The comment in that function says what needs to be done and there was a fix available for the regular userspace induced SIG_IGN mechanism, but that did not work due to the implicit ignore for SIGCONT and similar signals. This needs to be worked on, but for now the only available workaround is to do exactly what posix_timer_fn() does:
Increase the interval of self-rearming timers, which have their signal ignored, to at least a jiffie.
Interestingly this has been fixed before via commit ff86bf0c65f1 ("alarmtimer: Rate limit periodic intervals") already, but that fix got lost in a later rework.
Reported-by: syzbot+b9564ba6e8e00694511b@syzkaller.appspotmail.com
Fixes: f2c45807d399 ("alarmtimer: Switch over to generic set/get/rearm routine")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87k00q1no2.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 kernel/time/alarmtimer.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 6a2ba39889bd..56af8a97cf2d 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -476,11 +476,35 @@ u64 alarm_forward(struct alarm *alarm, ktime_t now, ktime_t interval)
 }
 EXPORT_SYMBOL_GPL(alarm_forward);
 
-u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+static u64 __alarm_forward_now(struct alarm *alarm, ktime_t interval, bool throttle)
 {
 	struct alarm_base *base = &alarm_bases[alarm->type];
+	ktime_t now = base->gettime();
+
+	if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS) && throttle) {
+		/*
+		 * Same issue as with posix_timer_fn(). Timers which are
+		 * periodic but the signal is ignored can starve the system
+		 * with a very small interval. The real fix which was
+		 * promised in the context of posix_timer_fn() never
+		 * materialized, but someone should really work on it.
+		 *
+		 * To prevent DOS fake @now to be 1 jiffie out which keeps
+		 * the overrun accounting correct but creates an
+		 * inconsistency vs. timer_gettime(2).
+		 */
+		ktime_t kj = NSEC_PER_SEC / HZ;
+
+		if (interval < kj)
+			now = ktime_add(now, kj);
+	}
+
+	return alarm_forward(alarm, now, interval);
+}
 
-	return alarm_forward(alarm, base->gettime(), interval);
+u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+{
+	return __alarm_forward_now(alarm, interval, false);
 }
 EXPORT_SYMBOL_GPL(alarm_forward_now);
 
@@ -554,9 +578,10 @@ static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm,
 	if (posix_timer_event(ptr, si_private) && ptr->it_interval) {
 		/*
 		 * Handle ignored signals and rearm the timer. This will go
-		 * away once we handle ignored signals proper.
+		 * away once we handle ignored signals proper. Ensure that
+		 * small intervals cannot starve the system.
 		 */
-		ptr->it_overrun += alarm_forward_now(alarm, ptr->it_interval);
+		ptr->it_overrun += __alarm_forward_now(alarm, ptr->it_interval, true);
 		++ptr->it_requeue_pending;
 		ptr->it_active = 1;
 		result = ALARMTIMER_RESTART;
From: Vishal Verma <vishal.l.verma@intel.com>
stable inclusion from stable-v4.19.275 commit 5f8401d7dba21e549306fe4a7a9ff8bf3bd8d56a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
[ Upstream commit fb6df4366f86dd252bfa3049edffa52d17e7b895 ]
Lockdep reports that acpi_nfit_shutdown() may deadlock against an opportune acpi_nfit_scrub(). acpi_nfit_scrub() is run from inside a 'work' and therefore has already acquired workqueue-internal locks. It also acquires acpi_desc->init_mutex. acpi_nfit_shutdown() first acquires init_mutex, and was subsequently attempting to cancel any pending workqueue items. This reversed locking order causes a potential deadlock:
======================================================
WARNING: possible circular locking dependency detected
6.2.0-rc3 #116 Tainted: G O N
------------------------------------------------------
libndctl/1958 is trying to acquire lock:
ffff888129b461c0 ((work_completion)(&(&acpi_desc->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x43/0x450

but task is already holding lock:
ffff888129b460e8 (&acpi_desc->init_mutex){+.+.}-{3:3}, at: acpi_nfit_shutdown+0x87/0xd0 [nfit]
which lock already depends on the new lock.
...
Possible unsafe locking scenario:
       CPU0                          CPU1
       ----                          ----
  lock(&acpi_desc->init_mutex);
                                     lock((work_completion)(&(&acpi_desc->dwork)->work));
                                     lock(&acpi_desc->init_mutex);
  lock((work_completion)(&(&acpi_desc->dwork)->work));
*** DEADLOCK ***
Since the workqueue manipulation is protected by its own internal locking, the cancellation of pending work doesn't need to be done under acpi_desc->init_mutex. Move cancel_delayed_work_sync() outside the init_mutex to fix the deadlock. Any work that starts after acpi_nfit_shutdown() drops the lock will see ARS_CANCEL, and the cancel_delayed_work_sync() will safely flush it out.
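The shape of both the bug and the fix, reduced to a generic sketch (hypothetical names, not the actual nfit code):

	#include <linux/mutex.h>
	#include <linux/workqueue.h>

	static DEFINE_MUTEX(init_mutex);
	static struct delayed_work dwork;

	static void scrub_fn(struct work_struct *work)
	{
		mutex_lock(&init_mutex);	/* the work takes the mutex... */
		/* ... scrub ... */
		mutex_unlock(&init_mutex);
	}

	static void shutdown_broken(void)
	{
		mutex_lock(&init_mutex);
		/* ...so waiting for it under the same mutex can deadlock */
		cancel_delayed_work_sync(&dwork);
		mutex_unlock(&init_mutex);
	}

	static void shutdown_fixed(void)
	{
		mutex_lock(&init_mutex);
		/* publish the cancel flag under the lock */
		mutex_unlock(&init_mutex);
		cancel_delayed_work_sync(&dwork);	/* wait with the lock dropped */
	}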
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/20230112-acpi_nfit_lockdep-v1-1-660be4dd10be@intel...
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/acpi/nfit/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 4b863bcd2259..77ed2f255146 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -3437,8 +3437,8 @@ void acpi_nfit_shutdown(void *data)
 
 	mutex_lock(&acpi_desc->init_mutex);
 	set_bit(ARS_CANCEL, &acpi_desc->scrub_flags);
-	cancel_delayed_work_sync(&acpi_desc->dwork);
 	mutex_unlock(&acpi_desc->init_mutex);
+	cancel_delayed_work_sync(&acpi_desc->dwork);
 
 	/*
 	 * Bounce the nvdimm bus lock to make sure any in-flight
From: Zhen Lei <thunder.leizhen@huawei.com>
stable inclusion from stable-v4.19.276 commit b84d49628b0fcb1c4925b565ccfeb50a0b0f4630 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
[ Upstream commit 47904aed898a08f028572b9b5a5cc101ddfb2d82 ]
The type of member ->irqs_sum is unsigned long, but kstat_cpu_irqs_sum() returns unsigned int, which can result in truncation. Therefore, change the kstat_cpu_irqs_sum() function's return value to unsigned long to avoid truncation.
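A standalone illustration of the truncation (plain userspace C, with the kernel accessors reduced to trivial functions):

	#include <stdio.h>

	static unsigned long irqs_sum = 4294967297UL;	/* UINT_MAX + 2 on 64-bit */

	static unsigned int  sum_before(void) { return irqs_sum; }	/* truncates */
	static unsigned long sum_after(void)  { return irqs_sum; }

	int main(void)
	{
		printf("before: %u\n",  sum_before());	/* prints 1 */
		printf("after:  %lu\n", sum_after());	/* prints 4294967297 */
		return 0;
	}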
Fixes: f2c66cd8eedd ("/proc/stat: scalability of irq num per cpu")
Reported-by: Elliott, Robert (Servers) <elliott@hpe.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Josh Don <joshdon@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 include/linux/kernel_stat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 594d6c7ee5d4..42c0b770bca1 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -75,7 +75,7 @@ extern unsigned int kstat_irqs_usr(unsigned int irq);
 /*
  * Number of interrupts per cpu, since bootup
  */
-static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
+static inline unsigned long kstat_cpu_irqs_sum(unsigned int cpu)
 {
 	return kstat_cpu(cpu).irqs_sum;
 }
From: Daniil Tatianin <d-tatianin@yandex-team.ru>
stable inclusion from stable-v4.19.276 commit 331db828d34c37c466e4915fd50ea51415da143c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
[ Upstream commit ca843a4c79486e99a19b859ef0b9887854afe146 ]
Previously acpi_ns_simple_repair() would crash if expected_btypes contained any combination of ACPI_RTYPE_NONE with a different type, e.g. ACPI_RTYPE_NONE | ACPI_RTYPE_INTEGER, because of slightly incorrect logic in the !return_object branch, which wouldn't return AE_AML_NO_RETURN_VALUE for such cases.
Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool.
Link: https://github.com/acpica/acpica/pull/811
Fixes: 61db45ca2163 ("ACPICA: Restore code that repairs NULL package elements in return values.")
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/acpi/acpica/nsrepair.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/acpi/acpica/nsrepair.c b/drivers/acpi/acpica/nsrepair.c
index ff2ab8fbec38..8de80bf7802b 100644
--- a/drivers/acpi/acpica/nsrepair.c
+++ b/drivers/acpi/acpica/nsrepair.c
@@ -181,8 +181,9 @@ acpi_ns_simple_repair(struct acpi_evaluate_info *info,
 	 * Try to fix if there was no return object. Warning if failed to fix.
 	 */
 	if (!return_object) {
-		if (expected_btypes && (!(expected_btypes & ACPI_RTYPE_NONE))) {
-			if (package_index != ACPI_NOT_PACKAGE_ELEMENT) {
+		if (expected_btypes) {
+			if (!(expected_btypes & ACPI_RTYPE_NONE) &&
+			    package_index != ACPI_NOT_PACKAGE_ELEMENT) {
 				ACPI_WARN_PREDEFINED((AE_INFO,
 						      info->full_pathname,
 						      ACPI_WARN_ALWAYS,
@@ -196,14 +197,15 @@ acpi_ns_simple_repair(struct acpi_evaluate_info *info,
 				if (ACPI_SUCCESS(status)) {
 					return (AE_OK);	/* Repair was successful */
 				}
-			} else {
+			}
+
+			if (expected_btypes != ACPI_RTYPE_NONE) {
 				ACPI_WARN_PREDEFINED((AE_INFO,
 						      info->full_pathname,
 						      ACPI_WARN_ALWAYS,
 						      "Missing expected return value"));
+				return (AE_AML_NO_RETURN_VALUE);
 			}
-
-			return (AE_AML_NO_RETURN_VALUE);
 		}
 	}
From: Armin Wolf <W_Armin@gmx.de>
stable inclusion from stable-v4.19.276 commit 6671af7f52c382963b482c6ae55f3e6ee582e0f6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
[ Upstream commit f2ac14b5f197e4a2dec51e5ceaa56682ff1592bc ]
When encountering a string bigger than the destination buffer (32 bytes), the string is not properly NUL-terminated, causing buffer overreads later.
This for example happens on the Inspiron 3505, where the battery model name is larger than 32 bytes, which leads to sysfs showing the model name together with the serial number string (which is NUL-terminated and thus prevents worse).
Fix this by using strscpy() which ensures that the result is always NUL-terminated.
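A sketch of the difference (strscpy() is a kernel helper, so a minimal stand-in is used here to keep the example self-contained):

	#include <stdio.h>
	#include <string.h>

	/* Stand-in for the kernel's strscpy(): copies at most size - 1
	 * bytes and always NUL-terminates the destination. */
	static void strscpy_like(char *dst, const char *src, size_t size)
	{
		snprintf(dst, size, "%s", src);
	}

	int main(void)
	{
		char model[8];
		const char *name = "model-name-longer-than-the-buffer";

		strncpy(model, name, sizeof(model));
		/* model[] now has no terminating NUL; reading it as a
		 * string runs past the buffer into adjacent memory. */

		strscpy_like(model, name, sizeof(model));
		printf("%s\n", model);	/* safely truncated: "model-n" */
		return 0;
	}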
Fixes: 106449e870b3 ("ACPI: Battery: Allow extract string from integer")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/acpi/battery.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c
index cb97b6105f52..ac426905e7ee 100644
--- a/drivers/acpi/battery.c
+++ b/drivers/acpi/battery.c
@@ -448,7 +448,7 @@ static int extract_package(struct acpi_battery *battery,
 			u8 *ptr = (u8 *)battery + offsets[i].offset;
 			if (element->type == ACPI_TYPE_STRING ||
 			    element->type == ACPI_TYPE_BUFFER)
-				strncpy(ptr, element->string.pointer, 32);
+				strscpy(ptr, element->string.pointer, 32);
 			else if (element->type == ACPI_TYPE_INTEGER) {
 				strncpy(ptr, (u8 *)&element->integer.value,
 					sizeof(u64));
From: Herbert Xu <herbert@gondor.apana.org.au>
stable inclusion from stable-v4.19.276 commit 1effbddaff60eeef8017c6dea1ee0ed970164d14 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
[ Upstream commit 32e62025e5e52fbe4812ef044759de7010b15dbc ]
As it is, seqiv only handles the special return value of EINPROGRESS, which means that in all other cases it will free data related to the request.
However, as the caller of seqiv may specify MAY_BACKLOG, we also need to expect EBUSY and treat it in the same way. Otherwise backlogged requests will trigger a use-after-free.
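The resulting callback pattern, sketched generically (a simplified illustration, not the literal seqiv code):

	/* Completion callback for a request that may have been submitted
	 * with CRYPTO_TFM_REQ_MAY_BACKLOG. Both -EINPROGRESS and -EBUSY
	 * mean the request is still in flight and the callback will run
	 * again, so per-request data must not be freed yet. */
	static void example_encrypt_complete(struct crypto_async_request *base,
					     int err)
	{
		struct aead_request *req = base->data;

		if (err == -EINPROGRESS || err == -EBUSY)
			return;		/* not the final completion */

		/* final completion: releasing request data is now safe */
		aead_request_complete(req, err);
	}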
Fixes: 0a270321dbf9 ("[CRYPTO] seqiv: Add Sequence Number IV Generator")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 crypto/seqiv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/crypto/seqiv.c b/crypto/seqiv.c
index 39dbf2f7e5f5..ca68608ab14e 100644
--- a/crypto/seqiv.c
+++ b/crypto/seqiv.c
@@ -30,7 +30,7 @@ static void seqiv_aead_encrypt_complete2(struct aead_request *req, int err)
 	struct aead_request *subreq = aead_request_ctx(req);
 	struct crypto_aead *geniv;
 
-	if (err == -EINPROGRESS)
+	if (err == -EINPROGRESS || err == -EBUSY)
 		return;
 
 	if (err)
From: Herbert Xu <herbert@gondor.apana.org.au>
stable inclusion from stable-v4.19.276 commit a023f1a938ad43642ad68f68527001e5686f5e60 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
[ Upstream commit 564cabc0ca0bdfa8f0fc1ae74b24d0a7554522c5 ]
Use the akcipher_request_complete helper instead of calling the completion function directly. In fact the previous code was buggy in that EINPROGRESS was never passed back to the original caller.
Fixes: 3d5b1ecdea6f ("crypto: rsa - RSA padding algorithm")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 crypto/rsa-pkcs1pad.c | 34 +++++++++++++++-------------------
 1 file changed, 15 insertions(+), 19 deletions(-)
diff --git a/crypto/rsa-pkcs1pad.c b/crypto/rsa-pkcs1pad.c
index 48bd07e22a1d..e2fe43f4b4c6 100644
--- a/crypto/rsa-pkcs1pad.c
+++ b/crypto/rsa-pkcs1pad.c
@@ -216,16 +216,14 @@ static void pkcs1pad_encrypt_sign_complete_cb(
 		struct crypto_async_request *child_async_req, int err)
 {
 	struct akcipher_request *req = child_async_req->data;
-	struct crypto_async_request async_req;
 
 	if (err == -EINPROGRESS)
-		return;
+		goto out;
+
+	err = pkcs1pad_encrypt_sign_complete(req, err);
 
-	async_req.data = req->base.data;
-	async_req.tfm = crypto_akcipher_tfm(crypto_akcipher_reqtfm(req));
-	async_req.flags = child_async_req->flags;
-	req->base.complete(&async_req,
-			   pkcs1pad_encrypt_sign_complete(req, err));
+out:
+	akcipher_request_complete(req, err);
 }
 
 static int pkcs1pad_encrypt(struct akcipher_request *req)
@@ -334,15 +332,14 @@ static void pkcs1pad_decrypt_complete_cb(
 		struct crypto_async_request *child_async_req, int err)
 {
 	struct akcipher_request *req = child_async_req->data;
-	struct crypto_async_request async_req;
 
 	if (err == -EINPROGRESS)
-		return;
+		goto out;
+
+	err = pkcs1pad_decrypt_complete(req, err);
 
-	async_req.data = req->base.data;
-	async_req.tfm = crypto_akcipher_tfm(crypto_akcipher_reqtfm(req));
-	async_req.flags = child_async_req->flags;
-	req->base.complete(&async_req, pkcs1pad_decrypt_complete(req, err));
+out:
+	akcipher_request_complete(req, err);
 }
 
 static int pkcs1pad_decrypt(struct akcipher_request *req)
@@ -502,15 +499,14 @@ static void pkcs1pad_verify_complete_cb(
 		struct crypto_async_request *child_async_req, int err)
 {
 	struct akcipher_request *req = child_async_req->data;
-	struct crypto_async_request async_req;
 
 	if (err == -EINPROGRESS)
-		return;
+		goto out;
 
-	async_req.data = req->base.data;
-	async_req.tfm = crypto_akcipher_tfm(crypto_akcipher_reqtfm(req));
-	async_req.flags = child_async_req->flags;
-	req->base.complete(&async_req, pkcs1pad_verify_complete(req, err));
+	err = pkcs1pad_verify_complete(req, err);
+
+out:
+	akcipher_request_complete(req, err);
 }
 
 /*
From: Jann Horn <jannh@google.com>
stable inclusion from stable-v4.19.276 commit 003e49fab13d0de9cda625489c402e5d18012a8b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
[ Upstream commit 9f76d59173d9d146e96c66886b671c1915a5c5e5 ]
The nanosleep syscalls use the restart_block mechanism, with a quirk: The `type` and `rmtp`/`compat_rmtp` fields are set up unconditionally on syscall entry, while the rest of the restart_block is only set up in the unlikely case that the syscall is actually interrupted by a signal (or pseudo-signal) that doesn't have a signal handler.
If the restart_block was set up by a previous syscall (futex(..., FUTEX_WAIT, ...) or poll()) and hasn't been invalidated somehow since then, this will clobber some of the union fields used by futex_wait_restart() and do_restart_poll().
If userspace afterwards wrongly calls the restart_syscall syscall, futex_wait_restart()/do_restart_poll() will read struct fields that have been clobbered.
This doesn't actually lead to anything particularly interesting because none of the union fields contain trusted kernel data, and futex(..., FUTEX_WAIT, ...) and poll() aren't syscalls where it makes much sense to apply seccomp filters to their arguments.
So the current consequences are just of the "if userspace does bad stuff, it can damage itself, and that's not a problem" flavor.
But still, it seems like a hazard for future developers, so invalidate the restart_block when partly setting it up in the nanosleep syscalls.
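For context, the union at issue, abbreviated from include/linux/restart_block.h (most fields elided):

	struct restart_block {
		long (*fn)(struct restart_block *);
		union {
			struct {			/* futex_wait_restart() */
				u32 __user *uaddr;
				u32 val;
				/* ... */
			} futex;
			struct {			/* nanosleep */
				clockid_t clockid;
				enum timespec_type type;
				/* ... */
			} nanosleep;
			struct {			/* do_restart_poll() */
				struct pollfd __user *ufds;
				/* ... */
			} poll;
		};
	};

Since the members share storage, unconditionally writing the nanosleep fields on syscall entry clobbers whatever a previous futex() or poll() left behind; setting ->fn to do_no_restart_syscall at the same time makes the partially initialized state harmless.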
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230105134403.754986-1-jannh@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 kernel/time/hrtimer.c      | 2 ++
 kernel/time/posix-stubs.c  | 2 ++
 kernel/time/posix-timers.c | 2 ++
 3 files changed, 6 insertions(+)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 32ee24f5142a..8512f06f0ebe 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1838,6 +1838,7 @@ SYSCALL_DEFINE2(nanosleep, struct __kernel_timespec __user *, rqtp,
 	if (!timespec64_valid(&tu))
 		return -EINVAL;
 
+	current->restart_block.fn = do_no_restart_syscall;
 	current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
 	current->restart_block.nanosleep.rmtp = rmtp;
 	return hrtimer_nanosleep(&tu, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
@@ -1858,6 +1859,7 @@ COMPAT_SYSCALL_DEFINE2(nanosleep, struct compat_timespec __user *, rqtp,
 	if (!timespec64_valid(&tu))
 		return -EINVAL;
 
+	current->restart_block.fn = do_no_restart_syscall;
 	current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE;
 	current->restart_block.nanosleep.compat_rmtp = rmtp;
 	return hrtimer_nanosleep(&tu, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
index 2c6847d5d69b..362c159fb3f8 100644
--- a/kernel/time/posix-stubs.c
+++ b/kernel/time/posix-stubs.c
@@ -144,6 +144,7 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
 		return -EINVAL;
 	if (flags & TIMER_ABSTIME)
 		rmtp = NULL;
+	current->restart_block.fn = do_no_restart_syscall;
 	current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
 	current->restart_block.nanosleep.rmtp = rmtp;
 	return hrtimer_nanosleep(&t, flags & TIMER_ABSTIME ?
@@ -230,6 +231,7 @@ COMPAT_SYSCALL_DEFINE4(clock_nanosleep, clockid_t, which_clock, int, flags,
 		return -EINVAL;
 	if (flags & TIMER_ABSTIME)
 		rmtp = NULL;
+	current->restart_block.fn = do_no_restart_syscall;
 	current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE;
 	current->restart_block.nanosleep.compat_rmtp = rmtp;
 	return hrtimer_nanosleep(&t, flags & TIMER_ABSTIME ?
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 12c4048c5423..d86fdabc5748 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -1225,6 +1225,7 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
 		return -EINVAL;
 	if (flags & TIMER_ABSTIME)
 		rmtp = NULL;
+	current->restart_block.fn = do_no_restart_syscall;
 	current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
 	current->restart_block.nanosleep.rmtp = rmtp;
 
@@ -1252,6 +1253,7 @@ COMPAT_SYSCALL_DEFINE4(clock_nanosleep, clockid_t, which_clock, int, flags,
 		return -EINVAL;
 	if (flags & TIMER_ABSTIME)
 		rmtp = NULL;
+	current->restart_block.fn = do_no_restart_syscall;
 	current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE;
 	current->restart_block.nanosleep.compat_rmtp = rmtp;
 
From: Breno Leitao <leitao@debian.org>
stable inclusion from stable-v4.19.276 commit ca582161f5900991e26240e17f30740a8f3b9f2b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
[ Upstream commit 0125acda7d76b943ca55811df40ed6ec0ecf670f ]
Currently, x86_spec_ctrl_base is read at boot time and speculative bits are set if Kconfig items are enabled. For example, IBRS is enabled if CONFIG_CPU_IBRS_ENTRY is configured, etc. These MSR bits are not cleared if the mitigations are disabled.
This is a problem when kexec-ing a kernel that has the mitigation disabled from a kernel that has the mitigation enabled. In this case, the MSR bits are not cleared during the new kernel boot. As a result, this might have some performance degradation that is hard to pinpoint.
This problem does not happen if the machine is (hard) rebooted because the bit will be cleared by default.
[ bp: Massage. ]
Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20221128153148.1129350-1-leitao@debian.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 arch/x86/include/asm/msr-index.h |  4 ++++
 arch/x86/kernel/cpu/bugs.c       | 10 +++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index f34221811090..1336c900e723 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -50,6 +50,10 @@
 #define SPEC_CTRL_RRSBA_DIS_S_SHIFT	6	   /* Disable RRSBA behavior */
 #define SPEC_CTRL_RRSBA_DIS_S		BIT(SPEC_CTRL_RRSBA_DIS_S_SHIFT)
 
+/* A mask for bits which the kernel toggles when controlling mitigations */
+#define SPEC_CTRL_MITIGATIONS_MASK	(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD \
+							| SPEC_CTRL_RRSBA_DIS_S)
+
 #define MSR_IA32_PRED_CMD		0x00000049 /* Prediction Command */
 #define PRED_CMD_IBPB			BIT(0)	   /* Indirect Branch Prediction Barrier */
 
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index f73fa10e0a9f..836000481438 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -128,9 +128,17 @@ void __init check_bugs(void)
 	 * have unknown values. AMD64_LS_CFG MSR is cached in the early AMD
 	 * init code as it is not enumerated and depends on the family.
 	 */
-	if (boot_cpu_has(X86_FEATURE_MSR_SPEC_CTRL))
+	if (cpu_feature_enabled(X86_FEATURE_MSR_SPEC_CTRL)) {
 		rdmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
 
+		/*
+		 * Previously running kernel (kexec), may have some controls
+		 * turned ON. Clear them and let the mitigations setup below
+		 * rediscover them based on configuration.
+		 */
+		x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
+	}
+
 	/* Select the proper CPU mitigations before patching alternatives: */
 	spectre_v1_select_mitigation();
 	spectre_v2_select_mitigation();
From: Yang Jihong <yangjihong1@huawei.com>
stable inclusion from stable-v4.19.276 commit 4334c26f53585a45455af324c08a4b0036bfaa8d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
commit 868a6fc0ca2407622d2833adefe1c4d284766c4c upstream.
Since the following commit:
commit f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")
modified the update timing of the KPROBE_FLAG_OPTIMIZED, an optimized_kprobe may be in the optimizing or unoptimizing state when op.kp->flags has KPROBE_FLAG_OPTIMIZED and op->list is not empty.
The check logic in __recover_optprobed_insn() is incorrect: a kprobe queued for unoptimizing may be mistaken for one that is still being optimized. As a result, incorrect instructions are copied.
The optprobe_queued_unopt() function needs to be exported so that it can be invoked from the arch directory.
Link: https://lore.kernel.org/all/20230216034247.32348-2-yangjihong1@huawei.com/
Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")
Cc: stable@vger.kernel.org
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 arch/x86/kernel/kprobes/opt.c | 4 ++--
 include/linux/kprobes.h       | 1 +
 kernel/kprobes.c              | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 544bd41a514c..0ce2106a1e7d 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -56,8 +56,8 @@ unsigned long __recover_optprobed_insn(kprobe_opcode_t *buf, unsigned long addr)
 		/* This function only handles jump-optimized kprobe */
 		if (kp && kprobe_optimized(kp)) {
 			op = container_of(kp, struct optimized_kprobe, kp);
-			/* If op->list is not empty, op is under optimizing */
-			if (list_empty(&op->list))
+			/* If op is optimized or under unoptimizing */
+			if (list_empty(&op->list) || optprobe_queued_unopt(op))
 				goto found;
 		}
 	}
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index c28204e22b54..a8811a56e6e0 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -329,6 +329,7 @@ extern int proc_kprobes_optimization_handler(struct ctl_table *table,
 					     size_t *length, loff_t *ppos);
 #endif
 extern void wait_for_kprobe_optimizer(void);
+bool optprobe_queued_unopt(struct optimized_kprobe *op);
 #else
 static inline void wait_for_kprobe_optimizer(void) { }
 #endif /* CONFIG_OPTPROBES */
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 877fe4b36dfe..46d5ba206a95 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -625,7 +625,7 @@ void wait_for_kprobe_optimizer(void)
 	mutex_unlock(&kprobe_mutex);
 }
 
-static bool optprobe_queued_unopt(struct optimized_kprobe *op)
+bool optprobe_queued_unopt(struct optimized_kprobe *op)
 {
 	struct optimized_kprobe *_op;
 
From: Yang Jihong <yangjihong1@huawei.com>
stable inclusion from stable-v4.19.276 commit add105f090a6ad3af1caaf1d81f896208cfc7afb category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
commit f1c97a1b4ef709e3f066f82e3ba3108c3b133ae6 upstream.
When arch_prepare_optimized_kprobe() calculates the jump destination address, it copies the original instructions from the jmp-optimized kprobe (see __recover_optprobed_insn()), and the address is calculated based on the length of the original instructions.
arch_check_optimized_kprobe() does not check KPROBE_FLAG_OPTIMIZED when checking whether a jmp-optimized kprobe exists. As a result, setup_detour_execution() may jump into a range that has been overwritten by the jump destination address, resulting in an invalid opcode error.
For example, assume that we register two kprobes whose addresses are <func+9> and <func+11> in the "func" function. The original code of "func" is as follows:
0xffffffff816cb5e9 <+9>:  push   %r12
0xffffffff816cb5eb <+11>: xor    %r12d,%r12d
0xffffffff816cb5ee <+14>: test   %rdi,%rdi
0xffffffff816cb5f1 <+17>: setne  %r12b
0xffffffff816cb5f5 <+21>: push   %rbp
1. Register the kprobe for <func+11>; assume that it is kp1 and its corresponding optimized_kprobe is op1. After the optimization, the "func" code changes to:
0xffffffff816cc079 <+9>:  push   %r12
0xffffffff816cc07b <+11>: jmp    0xffffffffa0210000
0xffffffff816cc080 <+16>: incl   0xf(%rcx)
0xffffffff816cc083 <+19>: xchg   %eax,%ebp
0xffffffff816cc084 <+20>: (bad)
0xffffffff816cc085 <+21>: push   %rbp
Now op1->flags == KPROBE_FLAG_OPTIMIZED;
2. Register the kprobe for <func+9>; assume that it is kp2 and its corresponding optimized_kprobe is op2.
register_kprobe(kp2)
  register_aggr_kprobe
    alloc_aggr_kprobe
      __prepare_optimized_kprobe
        arch_prepare_optimized_kprobe
          __recover_optprobed_insn    // copy original bytes from kp1->optinsn.copied_insn,
                                      // jump address = <func+14>
3. disable kp1:
disable_kprobe(kp1)
  __disable_kprobe
    ...
    if (p == orig_p || aggr_kprobe_disabled(orig_p)) {
      ret = disarm_kprobe(orig_p, true)       // add op1 in unoptimizing_list, not unoptimized
      orig_p->flags |= KPROBE_FLAG_DISABLED;  // op1->flags == KPROBE_FLAG_OPTIMIZED | KPROBE_FLAG_DISABLED
    ...
4. unregister kp2:

__unregister_kprobe_top
  ...
  if (!kprobe_disabled(ap) && !kprobes_all_disarmed) {
    optimize_kprobe(op)
      ...
      if (arch_check_optimized_kprobe(op) < 0)  // because op1 has KPROBE_FLAG_DISABLED, this does not return
        return;
      p->kp.flags |= KPROBE_FLAG_OPTIMIZED;     // now op2 has KPROBE_FLAG_OPTIMIZED
  }
"func" code now is:
0xffffffff816cc079 <+9>:  int3
0xffffffff816cc07a <+10>: push   %rsp
0xffffffff816cc07b <+11>: jmp    0xffffffffa0210000
0xffffffff816cc080 <+16>: incl   0xf(%rcx)
0xffffffff816cc083 <+19>: xchg   %eax,%ebp
0xffffffff816cc084 <+20>: (bad)
0xffffffff816cc085 <+21>: push   %rbp
5. If "func" is called, the int3 handler calls setup_detour_execution():
if (p->flags & KPROBE_FLAG_OPTIMIZED) {
	...
	regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX;
	...
}
The code at the jump destination address is:
0xffffffffa021072c: push   %r12
0xffffffffa021072e: xor    %r12d,%r12d
0xffffffffa0210731: jmp    0xffffffff816cb5ee <func+14>
However, <func+14> is not a valid instruction start address. As a result, an error occurs.
Link: https://lore.kernel.org/all/20230216034247.32348-3-yangjihong1@huawei.com/
Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 arch/x86/kernel/kprobes/opt.c | 2 +-
 include/linux/kprobes.h       | 1 +
 kernel/kprobes.c              | 4 ++--
 3 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 0ce2106a1e7d..36b5a493e5b4 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -328,7 +328,7 @@ int arch_check_optimized_kprobe(struct optimized_kprobe *op)
 
 	for (i = 1; i < op->optinsn.size; i++) {
 		p = get_kprobe(op->kp.addr + i);
-		if (p && !kprobe_disabled(p))
+		if (p && !kprobe_disarmed(p))
 			return -EEXIST;
 	}
 
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index a8811a56e6e0..3dd7bb3fbc44 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -330,6 +330,7 @@ extern int proc_kprobes_optimization_handler(struct ctl_table *table,
 #endif
 extern void wait_for_kprobe_optimizer(void);
 bool optprobe_queued_unopt(struct optimized_kprobe *op);
+bool kprobe_disarmed(struct kprobe *p);
 #else
 static inline void wait_for_kprobe_optimizer(void) { }
 #endif /* CONFIG_OPTPROBES */
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 46d5ba206a95..70a91f0a9755 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -418,8 +418,8 @@ static inline int kprobe_optready(struct kprobe *p)
 	return 0;
 }
 
-/* Return true(!0) if the kprobe is disarmed. Note: p must be on hash list */
-static inline int kprobe_disarmed(struct kprobe *p)
+/* Return true if the kprobe is disarmed. Note: p must be on hash list */
+bool kprobe_disarmed(struct kprobe *p)
 {
 	struct optimized_kprobe *op;
 
From: Johan Hovold <johan+linaro@kernel.org>
stable inclusion from stable-v4.19.276 commit 2dcccf91bc4e9937dccf86c9b1f5026ffd72a80b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
commit b06730a571a9ff1ba5bd6b20bf9e50e5a12f1ec6 upstream.
The sanity check for an already mapped virq is done outside of the irq_domain_mutex-protected section which means that an (unlikely) racing association may not be detected.
Fix this by factoring out the association implementation, which will also be used in a follow-on change to fix a shared-interrupt mapping race.
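The factoring follows the usual *_locked() convention; a generic sketch (hypothetical names, not the actual irqdomain code):

	static DEFINE_MUTEX(example_domain_mutex);

	/* Caller must hold example_domain_mutex, so the "already
	 * associated?" check and the association itself are atomic
	 * with respect to racing mappers. */
	static int example_associate_locked(unsigned int virq, unsigned long hwirq)
	{
		lockdep_assert_held(&example_domain_mutex);
		/* ... sanity checks and the actual association ... */
		return 0;
	}

	int example_associate(unsigned int virq, unsigned long hwirq)
	{
		int ret;

		mutex_lock(&example_domain_mutex);
		ret = example_associate_locked(virq, hwirq);
		mutex_unlock(&example_domain_mutex);

		return ret;
	}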
Fixes: ddaf144c61da ("irqdomain: Refactor irq_domain_associate_many()")
Cc: stable@vger.kernel.org # 3.11
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-2-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 kernel/irq/irqdomain.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index a3c94cdf44e7..ddbe00c591aa 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -515,8 +515,8 @@ void irq_domain_disassociate(struct irq_domain *domain, unsigned int irq)
 	irq_domain_clear_mapping(domain, hwirq);
 }
 
-int irq_domain_associate(struct irq_domain *domain, unsigned int virq,
-			 irq_hw_number_t hwirq)
+static int irq_domain_associate_locked(struct irq_domain *domain, unsigned int virq,
+				       irq_hw_number_t hwirq)
 {
 	struct irq_data *irq_data = irq_get_irq_data(virq);
 	int ret;
@@ -529,7 +529,6 @@ int irq_domain_associate(struct irq_domain *domain, unsigned int virq,
 	if (WARN(irq_data->domain, "error: virq%i is already associated", virq))
 		return -EINVAL;
 
-	mutex_lock(&irq_domain_mutex);
 	irq_data->hwirq = hwirq;
 	irq_data->domain = domain;
 	if (domain->ops->map) {
@@ -546,7 +545,6 @@ int irq_domain_associate(struct irq_domain *domain, unsigned int virq,
 		}
 		irq_data->domain = NULL;
 		irq_data->hwirq = 0;
-		mutex_unlock(&irq_domain_mutex);
 		return ret;
 	}
 
@@ -557,12 +555,23 @@ int irq_domain_associate(struct irq_domain *domain, unsigned int virq,
 
 	domain->mapcount++;
 	irq_domain_set_mapping(domain, hwirq, irq_data);
-	mutex_unlock(&irq_domain_mutex);
 
 	irq_clear_status_flags(virq, IRQ_NOREQUEST);
 
 	return 0;
 }
+
+int irq_domain_associate(struct irq_domain *domain, unsigned int virq,
+			 irq_hw_number_t hwirq)
+{
+	int ret;
+
+	mutex_lock(&irq_domain_mutex);
+	ret = irq_domain_associate_locked(domain, virq, hwirq);
+	mutex_unlock(&irq_domain_mutex);
+
+	return ret;
+}
 EXPORT_SYMBOL_GPL(irq_domain_associate);
 
 void irq_domain_associate_many(struct irq_domain *domain, unsigned int irq_base,
From: Johan Hovold <johan+linaro@kernel.org>
stable inclusion from stable-v4.19.276 commit 1b9fe6e4930155f1083a4dc32d3a47b1c4e8f55c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
commit 3f883c38f5628f46b30bccf090faec054088e262 upstream.
The global irq_domain_mutex is held when mapping interrupts from non-hierarchical domains but currently not when disposing them.
This specifically means that updates of the domain mapcount are racy (the mapcount is currently only used for statistics in debugfs).
Make sure to hold the global irq_domain_mutex also when disposing mappings from non-hierarchical domains.
Fixes: 9dc6be3d4193 ("genirq/irqdomain: Add map counter")
Cc: stable@vger.kernel.org # 4.13
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-3-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 kernel/irq/irqdomain.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index ddbe00c591aa..df2fc1dbd7fa 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -494,6 +494,9 @@ void irq_domain_disassociate(struct irq_domain *domain, unsigned int irq)
 		return;
 
 	hwirq = irq_data->hwirq;
+
+	mutex_lock(&irq_domain_mutex);
+
 	irq_set_status_flags(irq, IRQ_NOREQUEST);
 
 	/* remove chip and handler */
@@ -513,6 +516,8 @@ void irq_domain_disassociate(struct irq_domain *domain, unsigned int irq)
 
 	/* Clear reverse map for this hwirq */
 	irq_domain_clear_mapping(domain, hwirq);
+
+	mutex_unlock(&irq_domain_mutex);
 }
 
 static int irq_domain_associate_locked(struct irq_domain *domain, unsigned int virq,
From: Johan Hovold <johan+linaro@kernel.org>
stable inclusion from stable-v4.19.276 commit 525eb5cb8edfb7014711c2c87827ed17af8872fb category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
commit e3b7ab025e931accdc2c12acf9b75c6197f1c062 upstream.
In case a newly allocated IRQ ever ends up not having any associated struct irq_data it would not even be possible to dispose the mapping.
Replace the bogus disposal with a WARN_ON().
This will also be used to fix a shared-interrupt mapping race, hence the CC-stable tag.
Fixes: 1e2a7d78499e ("irqdomain: Don't set type when mapping an IRQ")
Cc: stable@vger.kernel.org # 4.8
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-4-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 kernel/irq/irqdomain.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index df2fc1dbd7fa..89f704ba2e6b 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -832,13 +832,8 @@ unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec)
 	}
 
 	irq_data = irq_get_irq_data(virq);
-	if (!irq_data) {
-		if (irq_domain_is_hierarchy(domain))
-			irq_domain_free_irqs(virq, 1);
-		else
-			irq_dispose_mapping(virq);
+	if (WARN_ON(!irq_data))
 		return 0;
-	}
 
 	/* Store trigger type */
 	irqd_set_trigger_type(irq_data, type);
From: Zhihao Cheng <chengzhihao1@huawei.com>
stable inclusion from stable-v4.19.278 commit 59eee0cdf8c036f554add97a4da7c06d7a9ff34a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
commit f5361da1e60d54ec81346aee8e3d8baf1be0b762 upstream.
If the boot loader inode has never been used before, the EXT4_IOC_SWAP_BOOT ioctl will initialize it, including setting the i_size to 0. However, if the "never before used" boot loader inode has a non-zero i_size, then i_disksize will be non-zero, and the inconsistency between i_size and i_disksize can trigger a kernel warning:
WARNING: CPU: 0 PID: 2580 at fs/ext4/file.c:319
CPU: 0 PID: 2580 Comm: bb Not tainted 6.3.0-rc1-00004-g703695902cfa
RIP: 0010:ext4_file_write_iter+0xbc7/0xd10
Call Trace:
 vfs_write+0x3b1/0x5c0
 ksys_write+0x77/0x160
 __x64_sys_write+0x22/0x30
 do_syscall_64+0x39/0x80
Reproducer:
1. create corrupted image and mount it:
     mke2fs -t ext4 /tmp/foo.img 200
     debugfs -wR "sif <5> size 25700" /tmp/foo.img
     mount -t ext4 /tmp/foo.img /mnt
     cd /mnt
     echo 123 > file
2. Run the reproducer program:
     posix_memalign(&buf, 1024, 1024)
     fd = open("file", O_RDWR | O_DIRECT);
     ioctl(fd, EXT4_IOC_SWAP_BOOT);
     write(fd, buf, 1024);
Fix this by setting i_disksize as well as i_size to zero when initializing the boot loader inode.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217159
Cc: stable@kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20230308032643.641113-1-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 fs/ext4/ioctl.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 4a5f9a41c519..6e8fd8171cb0 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -181,6 +181,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 	ei_bl->i_flags = 0;
 	inode_set_iversion(inode_bl, 1);
 	i_size_write(inode_bl, 0);
+	EXT4_I(inode_bl)->i_disksize = inode_bl->i_size;
 	inode_bl->i_mode = S_IFREG;
 	if (ext4_has_feature_extents(sb)) {
 		ext4_set_inode_flag(inode_bl, EXT4_INODE_EXTENTS);
From: Baokun Li <libaokun1@huawei.com>
stable inclusion from stable-v4.19.279 commit 3aea195acd977e82d970cbc7078f983880c7ee6a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
[ Upstream commit 5cd740287ae5e3f9d1c46f5bfe8778972fd6d3fe ]
In ext4_fill_super(), the EXT4_ORPHAN_FS flag is cleared after ext4_orphan_cleanup() is executed. Therefore, while that flag is set, __ext4_iget() returns no error when asked for an inode whose i_nlink is 0. If the inode is a special inode, a NULL pointer dereference may occur. If i_nlink is 0 for any inode (other than the boot loader inode) obtained with the EXT4_IGET_SPECIAL flag, the file system is corrupted. Therefore, make the ext4_iget() function return an error if it gets such an abnormal special inode.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199179
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216541
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216539
Reported-by: Luís Henriques <lhenriques@suse.de>
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230107032126.4165860-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 fs/ext4/inode.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b327fde0225a..1dbaebce54c6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5019,13 +5019,6 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
 		goto bad_inode;
 	raw_inode = ext4_raw_inode(&iloc);
 
-	if ((ino == EXT4_ROOT_INO) && (raw_inode->i_links_count == 0)) {
-		ext4_error_inode(inode, function, line, 0,
-				 "iget: root inode unallocated");
-		ret = -EFSCORRUPTED;
-		goto bad_inode;
-	}
-
 	if ((flags & EXT4_IGET_HANDLE) &&
 	    (raw_inode->i_links_count == 0) && (raw_inode->i_mode == 0)) {
 		ret = -ESTALE;
@@ -5096,11 +5089,16 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
 	 * NeilBrown 1999oct15
 	 */
 	if (inode->i_nlink == 0) {
-		if ((inode->i_mode == 0 ||
+		if ((inode->i_mode == 0 || flags & EXT4_IGET_SPECIAL ||
 		     !(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ORPHAN_FS)) &&
 		    ino != EXT4_BOOT_LOADER_INO) {
-			/* this inode is deleted */
-			ret = -ESTALE;
+			/* this inode is deleted or unallocated */
+			if (flags & EXT4_IGET_SPECIAL) {
+				ext4_error_inode(inode, function, line, 0,
+						 "iget: special inode unallocated");
+				ret = -EFSCORRUPTED;
+			} else
+				ret = -ESTALE;
 			goto bad_inode;
 		}
 		/* The only unlinked inodes we let through here have
From: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
stable inclusion from stable-v4.19.279 commit ffdf8d81c48822a329af9f31dc239090f4a60761 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
commit cbebd68f59f03633469f3ecf9bea99cd6cce3854 upstream.
cmdline_find_option() may fail before doing any initialization of the buffer array. This may lead to unpredictable results when the same buffer is used later in calls to strncmp() function. Fix the issue by returning early if cmdline_find_option() returns an error.
Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE.
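The hazard in a standalone sketch (with a hypothetical stand-in for cmdline_find_option(), which can return an error without writing to the buffer):

	#include <stdio.h>
	#include <string.h>

	/* Stand-in: returns the option length on success, -1 if the
	 * option is absent, in which case buf is never written. */
	static int find_option(const char *cmdline, const char *arg,
			       char *buf, int bufsize)
	{
		(void)cmdline; (void)arg; (void)buf; (void)bufsize;
		return -1;
	}

	int main(void)
	{
		char buffer[16];	/* stack garbage until initialized */

		/* Fixed pattern: return before the buffer is ever compared */
		if (find_option("root=/dev/sda", "mem_encrypt", buffer,
				sizeof(buffer)) < 0)
			return 0;

		if (!strncmp(buffer, "on", sizeof(buffer)))
			puts("SME enabled");
		return 0;
	}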
Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230306160656.14844-1-n.zhandarovich@fintech.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 arch/x86/mm/mem_encrypt_identity.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 4cbaacbe889a..0fadc69d2621 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -563,7 +563,8 @@ void __init sme_enable(struct boot_params *bp)
 	cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
 				     ((u64)bp->ext_cmd_line_ptr << 32));
 
-	cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer));
+	if (cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer)) < 0)
+		return;
 
 	if (!strncmp(buffer, cmdline_on, sizeof(buffer)))
 		sme_me_mask = me_mask;
From: "Jason A. Donenfeld" Jason@zx2c4.com
stable inclusion from stable-v4.19.274 commit e4935368448ce8097dada35163598e93567f1110 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit d7bf7f3b813e3755226bcb5114ad2ac477514ebf ]
add_latent_entropy() is called every time a process forks, in kernel_clone(). This in turn calls add_device_randomness() using the latent entropy global state. add_device_randomness() does two things:
1) Mixes into the input pool the latent entropy argument passed; and

2) Mixes in a cycle counter, a sort of measurement of when the event took place, the high precision bits of which are presumably difficult to predict.

(1) is impossible without CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y. But (2) is always possible. However, currently CONFIG_GCC_PLUGIN_LATENT_ENTROPY=n disables both (1) and (2), instead of just (1).

This commit causes the CONFIG_GCC_PLUGIN_LATENT_ENTROPY=n case to still do (2) by passing NULL (len 0) to add_device_randomness() when add_latent_entropy() is called.
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Emese Revfy <re.emese@gmail.com>
Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

Conflicts:
	include/linux/random.h

Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
Reviewed-by: yiyang <yiyang13@huawei.com>
Reviewed-by: guozihua <guozihua@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 include/linux/random.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/random.h b/include/linux/random.h
index 37209b3b22ae..d05e70d56d26 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -20,15 +20,15 @@ struct random_ready_callback {
 
 extern void add_device_randomness(const void *, unsigned int);
 
-#if defined(CONFIG_GCC_PLUGIN_LATENT_ENTROPY) && !defined(__CHECKER__)
 static inline void add_latent_entropy(void)
 {
+#if defined(CONFIG_GCC_PLUGIN_LATENT_ENTROPY) && !defined(__CHECKER__)
 	add_device_randomness((const void *)&latent_entropy,
 			      sizeof(latent_entropy));
-}
 #else
-static inline void add_latent_entropy(void) {}
+	add_device_randomness(NULL, 0);
 #endif
+}
 
 extern void add_input_randomness(unsigned int type, unsigned int code,
 				 unsigned int value) __latent_entropy;
From: Dave Hansen <dave.hansen@linux.intel.com>
stable inclusion from stable-v4.19.274 commit f8e54da1c729cc23d9a7b7bd42379323e7fb7979 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1 CVE: NA
--------------------------------
commit 74e19ef0ff8061ef55957c3abd71614ef0f42f47 upstream.
The results of "access_ok()" can be mis-speculated. The result is that you can end up speculatively executing:
	if (access_ok(from, size))
		// Right here
even for bad from/size combinations. At first glance, it would be ideal to just add a speculation barrier to "access_ok()" so that its results can never be mis-speculated.
But there are lots of system calls just doing access_ok() via "copy_to_user()" and friends (example: fstat() and friends). Those are generally not problematic because they do not _consume_ data from userspace other than the pointer. They are also very quick and common system calls that should not be needlessly slowed down.
"copy_from_user()" on the other hand uses a user-controller pointer and is frequently followed up with code that might affect caches. Take something like this:
	if (!copy_from_user(&kernelvar, uptr, size))
		do_something_with(kernelvar);
If userspace passes in an evil 'uptr' that *actually* points to a kernel addresses, and then do_something_with() has cache (or other) side-effects, it could allow userspace to infer kernel data values.
Add a barrier to the common copy_from_user() code to prevent mis-speculated values which happen after the copy.
Also add a stub for architectures that do not define barrier_nospec(). This makes the macro usable in generic code.
Since the barrier is now usable in generic code, the x86 #ifdef in the BPF code can also go away.
Reported-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Borkmann <daniel@iogearbox.net> # BPF bits
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Conflicts:
	lib/usercopy.c

Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 include/linux/nospec.h | 4 ++++
 kernel/bpf/core.c      | 2 --
 lib/usercopy.c         | 7 +++++++
 3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/linux/nospec.h b/include/linux/nospec.h
index 0c5ef54fd416..207ef2a20e48 100644
--- a/include/linux/nospec.h
+++ b/include/linux/nospec.h
@@ -9,6 +9,10 @@
 
 struct task_struct;
 
+#ifndef barrier_nospec
+# define barrier_nospec() do { } while (0)
+#endif
+
 /**
  * array_index_mask_nospec() - generate a ~0 mask when index < size, 0 otherwise
  * @index: array element index
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 625edbc8360d..abf30609e0d9 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1430,9 +1430,7 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
 		 * reuse preexisting logic from Spectre v1 mitigation that
 		 * happens to produce the required code on x86 for v4 as well.
 		 */
-#ifdef CONFIG_X86
 		barrier_nospec();
-#endif
 		CONT;
 #define LDST(SIZEOP, SIZE)						\
 	STX_MEM_##SIZEOP:						\
diff --git a/lib/usercopy.c b/lib/usercopy.c
index c2bfbcaeb3dc..68be1c66704b 100644
--- a/lib/usercopy.c
+++ b/lib/usercopy.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/uaccess.h>
+#include <linux/nospec.h>
 
 /* out-of-line parts */
 
@@ -9,6 +10,12 @@ unsigned long _copy_from_user(void *to, const void __user *from, unsigned long n
 	unsigned long res = n;
 	might_fault();
 	if (likely(access_ok(from, n))) {
+		/*
+		 * Ensure that bad access_ok() speculation will not
+		 * lead to nasty side effects *after* the copy is
+		 * finished:
+		 */
+		barrier_nospec();
 		kasan_check_write(to, n);
 		res = raw_copy_from_user(to, from, n);
 	}