There are two major types of uncorrected recoverable (UCR) errors :
- Synchronous error: The error is detected and raised at the point of the consumption in the execution flow, e.g. when a CPU tries to access a poisoned cache line. The CPU will take a synchronous error exception such as Synchronous External Abort (SEA) on Arm64 and Machine Check Exception (MCE) on X86. OS requires to take action (for example, offline failure page/kill failure thread) to recover this uncorrectable error.
- Asynchronous error: The error is detected out of processor execution context, e.g. when an error is detected by a background scrubber. Some data in the memory are corrupted. But the data have not been consumed. OS is optional to take action to recover this uncorrectable error.
Currently, both synchronous and asynchronous error use memory_failure_queue() to schedule memory_failure() exectute in kworker context. As a result, when a user-space process is accessing a poisoned data, a data abort is taken and the memory_failure() is executed in the kworker context:
- will send wrong si_code by SIGBUS signal in early_kill mode, and - can not kill the user-space in some cases resulting a synchronous error infinite loop
Issue 1: send wrong si_code in early_kill mode
Since commit a70297d22132 ("ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events")', the flag MF_ACTION_REQUIRED could be used to determine whether a synchronous exception occurs on ARM64 platform. When a synchronous exception is detected, the kernel is expected to terminate the current process which has accessed poisoned page. This is done by sending a SIGBUS signal with an error code BUS_MCEERR_AR, indicating an action-required machine check error on read.
However, when kill_proc() is called to terminate the processes who have the poisoned page mapped, it sends the incorrect SIGBUS error code BUS_MCEERR_AO because the context in which it operates is not the one where the error was triggered.
To reproduce this problem:
# STEP1: enable early kill mode #sysctl -w vm.memory_failure_early_kill=1 vm.memory_failure_early_kill = 1
# STEP2: inject an UCE error and consume it to trigger a synchronous error #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 5 addr 0xffffb0d75000 page not present Test passed
The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO error and it is not fact.
To fix it, queue memory_failure() as a task_work so that it runs in the context of the process that is actually consuming the poisoned data.
After this patch set:
# STEP1: enable early kill mode #sysctl -w vm.memory_failure_early_kill=1 vm.memory_failure_early_kill = 1
# STEP2: inject an UCE error and consume it to trigger a synchronous error #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 4 addr 0xffffb0d75000 page not present Test passed
The si_code (code 4) from einj_mem_uc indicates that it is BUS_MCEERR_AR error as we expected.
Issue 2: a synchronous error infinite loop due to memory_failure() failed
If a user-space process, e.g. devmem, a poisoned page which has been set HWPosion flag, kill_accessing_process() is called to send SIGBUS to the current processs with error info. Because the memory_failure() is executed in the kworker contex, it will just do nothing but return EFAULT. So, devmem will access the posioned page and trigger an excepction again, resulting in a synchronous error infinite loop. Such loop may cause platform firmware to exceed some threshold and reboot when Linux could have recovered from this error.
To reproduce this problem:
# STEP 1: inject an UCE error, and kernel will set HWPosion flag for related page #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 4 addr 0xffffb0d75000 page not present Test passed
# STEP 2: access the same page and it will trigger a synchronous error infinite loop devmem 0x4092d55b400
To fix it, if memory_failure() failed, perform a force kill to current process.
Issue 3: a synchronous error infinite loop due to no memory_failure() queued
No memory_failure() work is queued unless all bellow preconditions check passed:
- `if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))` in ghes_handle_memory_failure() - `if (flags == -1)` in ghes_handle_memory_failure() - `if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))` in ghes_do_memory_failure() - `if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) ` in ghes_do_memory_failure()
If the preconditions are not passed, the user-space process will trigger SEA again. This loop can potentially exceed the platform firmware threshold or even trigger a kernel hard lockup, leading to a system reboot.
To fix it, if no memory_failure() queued, perform a force kill to current process.
And the the memory errors triggered in kernel-mode[5], also relies on this patchset to kill the failure thread.
Lv Ying and XiuQi from Huawei also proposed to address similar problem[2][4]. Acknowledge to discussion with them.
[1] Add ARMv8 RAS virtualization support in QEMU https://patchew.org/QEMU/20200512030609.19593-1-gengdongjiu@huawei.com/ [2] https://lore.kernel.org/lkml/20221205115111.131568-3-lvying6@huawei.com/ [3] https://lkml.kernel.org/r/20220914064935.7851-1-xueshuai@linux.alibaba.com [4] https://lore.kernel.org/lkml/20221209095407.383211-1-lvying6@huawei.com/ [5] https://patchwork.kernel.org/project/linux-arm-kernel/cover/20240528085915.1...
Shuai Xue (4): ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered mm: memory-failure: move return value documentation to function declaration ACPI: APEI: handle synchronous exceptions in task work
arch/x86/kernel/cpu/mce/core.c | 5 -- drivers/acpi/apei/ghes.c | 114 ++++++++++++++++++++++----------- include/acpi/ghes.h | 3 - include/linux/mm.h | 1 - mm/memory-failure.c | 19 ++---- 5 files changed, 82 insertions(+), 60 deletions(-)
From: Shuai Xue xueshuai@linux.alibaba.com
mainline inclusion from mainline-v6.8-rc1 commit a70297d2213253853e95f5b49651f924990c6d3b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB0OV7 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
There are two major types of uncorrected recoverable (UCR) errors :
- Synchronous error: The error is detected and raised at the point of the consumption in the execution flow, e.g. when a CPU tries to access a poisoned cache line. The CPU will take a synchronous error exception such as Synchronous External Abort (SEA) on Arm64 and Machine Check Exception (MCE) on X86. OS requires to take action (for example, offline failure page/kill failure thread) to recover this uncorrectable error.
- Asynchronous error: The error is detected out of processor execution context, e.g. when an error is detected by a background scrubber. Some data in the memory are corrupted. But the data have not been consumed. OS is optional to take action to recover this uncorrectable error.
When APEI firmware first is enabled, a platform may describe one error source for the handling of synchronous errors (e.g. MCE or SEA notification ), or for handling asynchronous errors (e.g. SCI or External Interrupt notification). In other words, we can distinguish synchronous errors by APEI notification. For synchronous errors, kernel will kill the current process which accessing the poisoned page by sending SIGBUS with BUS_MCEERR_AR. In addition, for asynchronous errors, kernel will notify the process who owns the poisoned page by sending SIGBUS with BUS_MCEERR_AO in early kill mode. However, the GHES driver always sets mf_flags to 0 so that all synchronous errors are handled as asynchronous errors in memory failure.
To this end, set memory failure flags as MF_ACTION_REQUIRED on synchronous events.
Signed-off-by: Shuai Xue xueshuai@linux.alibaba.com Tested-by: Ma Wupeng mawupeng1@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Reviewed-by: Xiaofei Tan tanxiaofei@huawei.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Reviewed-by: James Morse james.morse@arm.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Conflicts: drivers/acpi/apei/ghes.c [fix context conflicts] Signed-off-by: Tong Tiangen tongtiangen@huawei.com --- drivers/acpi/apei/ghes.c | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 6535b444ad7b..151ca604b139 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -99,6 +99,20 @@ static inline bool is_hest_type_generic_v2(struct ghes *ghes) return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2; }
+/* + * A platform may describe one error source for the handling of synchronous + * errors (e.g. MCE or SEA), or for handling asynchronous errors (e.g. SCI + * or External Interrupt). On x86, the HEST notifications are always + * asynchronous, so only SEA on ARM is delivered as a synchronous + * notification. + */ +static inline bool is_hest_sync_notify(struct ghes *ghes) +{ + u8 notify_type = ghes->generic->notify.type; + + return notify_type == ACPI_HEST_NOTIFY_SEA; +} + /* * This driver isn't really modular, however for the time being, * continuing to use module_param is the easiest way to remain @@ -466,7 +480,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags) }
static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, - int sev) + int sev, bool sync) { int flags = -1; int sec_sev = ghes_severity(gdata->error_severity); @@ -480,7 +494,7 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED)) flags = MF_SOFT_OFFLINE; if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE) - flags = 0; + flags = sync ? MF_ACTION_REQUIRED : 0;
if (flags != -1) return ghes_do_memory_failure(mem_err->physical_addr, flags); @@ -488,9 +502,11 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, return false; }
-static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int sev) +static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, + int sev, bool sync) { struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata); + int flags = sync ? MF_ACTION_REQUIRED : 0; bool queued = false; int sec_sev, i; char *p; @@ -514,7 +530,7 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int s * and don't filter out 'corrected' error here. */ if (is_cache && has_pa) { - queued = ghes_do_memory_failure(err_info->physical_fault_addr, 0); + queued = ghes_do_memory_failure(err_info->physical_fault_addr, flags); p += err_info->length; continue; } @@ -635,6 +651,7 @@ static bool ghes_do_proc(struct ghes *ghes, const guid_t *fru_id = &guid_null; char *fru_text = ""; bool queued = false; + bool sync = is_hest_sync_notify(ghes);
sev = ghes_severity(estatus->error_severity); apei_estatus_for_each_section(estatus, gdata) { @@ -652,13 +669,13 @@ static bool ghes_do_proc(struct ghes *ghes, ghes_edac_report_mem_error(sev, mem_err);
arch_apei_report_mem_error(sev, mem_err); - queued = ghes_handle_memory_failure(gdata, sev); + queued = ghes_handle_memory_failure(gdata, sev, sync); } else if (guid_equal(sec_type, &CPER_SEC_PCIE)) { ghes_handle_aer(gdata); } else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) { - queued = ghes_handle_arm_hw_error(gdata, sev); + queued = ghes_handle_arm_hw_error(gdata, sev, sync); #ifdef CONFIG_ACPI_APEI_GHES_TS_CORE } else if (guid_equal(sec_type, &CPER_SEC_TS_CORE)) {
From: Shuai Xue xueshuai@linux.alibaba.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB0OV7 CVE: NA
Reference: https://lore.kernel.org/lkml/20241202030527.20586-2-xueshuai@linux.alibaba.c...
----------------------------------------------------------------------
Synchronous error was detected as a result of user-space process accessing a 2-bit uncorrected error. The CPU will take a synchronous error exception such as Synchronous External Abort (SEA) on Arm64. The kernel will queue a memory_failure() work which poisons the related page, unmaps the page, and then sends a SIGBUS to the process, so that a system wide panic can be avoided.
However, no memory_failure() work will be queued when abnormal synchronous errors occur. These errors can include situations such as invalid PA, unexpected severity, no memory failure config support, invalid GUID section, etc. In such case, the user-space process will trigger SEA again. This loop can potentially exceed the platform firmware threshold or even trigger a kernel hard lockup, leading to a system reboot.
Fix it by performing a force kill if no memory_failure() work is queued for synchronous errors.
Signed-off-by: Shuai Xue xueshuai@linux.alibaba.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Reviewed-by: Yazen Ghannam yazen.ghannam@amd.com
Conflicts: drivers/acpi/apei/ghes.c [fix context conflicts and print format] Signed-off-by: Tong Tiangen tongtiangen@huawei.com --- drivers/acpi/apei/ghes.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 151ca604b139..b1bd85055f46 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -698,6 +698,16 @@ static bool ghes_do_proc(struct ghes *ghes, #endif }
+ /* + * If no memory failure work is queued for abnormal synchronous + * errors, do a force kill. + */ + if (sync && !queued) { + pr_err(HW_ERR GHES_PFX "%s:%d: hardware memory corruption (SIGBUS)\n", + current->comm, task_pid_nr(current)); + force_sig(SIGBUS); + } + return queued; }
From: Shuai Xue xueshuai@linux.alibaba.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB0OV7 CVE: NA
Reference: https://lore.kernel.org/lkml/20241202030527.20586-3-xueshuai@linux.alibaba.c...
----------------------------------------------------------------------
Part of return value comments for memory_failure() were originally documented at the call site. Move those comments to the function declaration to improve code readability and to provide developers with immediate access to function usage and return information.
Signed-off-by: Shuai Xue xueshuai@linux.alibaba.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Reviewed-by: Yazen Ghannam yazen.ghannam@amd.com
Conflicts: arch/x86/kernel/cpu/mce/core.c mm/memory-failure.c [fix context conflicts and memory_failure() return value] Signed-off-by: Tong Tiangen tongtiangen@huawei.com --- arch/x86/kernel/cpu/mce/core.c | 5 ----- mm/memory-failure.c | 6 ++++++ 2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 6cb84c19a3bd..497ff1901d9d 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1276,11 +1276,6 @@ static void kill_me_maybe(struct callback_head *cb) return; }
- /* - * -EHWPOISON from memory_failure() means that it already sent SIGBUS - * to the current process with the proper error info, so no need to - * send SIGBUS here again. - */ if (ret == -EHWPOISON) return;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index a5ab0b2f213b..32b26b58e92d 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1524,6 +1524,12 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, * * Must run in process context (e.g. a work queue) with interrupts * enabled and no spinlocks hold. + * Return: + * 0 - success, + * -ENXIO - memory not managed by the kernel + * -EHWPOISON - the page was already poisoned, potentially + * kill process, + * other negative values - failure. */ int memory_failure(unsigned long pfn, int flags) {
From: Shuai Xue xueshuai@linux.alibaba.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB0OV7 CVE: NA
Reference: https://lore.kernel.org/lkml/20241202030527.20586-4-xueshuai@linux.alibaba.c...
----------------------------------------------------------------------
The memory uncorrected error could be signaled by asynchronous interrupt (specifically, SPI in arm64 platform), e.g. when an error is detected by a background scrubber, or signaled by synchronous exception (specifically, data abort exception in arm64 platform), e.g. when a CPU tries to access a poisoned cache line. Currently, both synchronous and asynchronous error use memory_failure_queue() to schedule memory_failure() to exectute in a kworker context.
As a result, when a user-space process is accessing a poisoned data, a data abort is taken and the memory_failure() is executed in the kworker context, memory_failure():
- will send wrong si_code by SIGBUS signal in early_kill mode, and - can not kill the user-space in some cases resulting a synchronous error infinite loop
Issue 1: send wrong si_code in early_kill mode
Since commit a70297d22132 ("ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events")', the flag MF_ACTION_REQUIRED could be used to determine whether a synchronous exception occurs on ARM64 platform. When a synchronous exception is detected, the kernel is expected to terminate the current process which has accessed poisoned page. This is done by sending a SIGBUS signal with an error code BUS_MCEERR_AR, indicating an action-required machine check error on read.
However, when kill_proc() is called to terminate the processes who have the poisoned page mapped, it sends the incorrect SIGBUS error code BUS_MCEERR_AO because the context in which it operates is not the one where the error was triggered.
To reproduce this problem:
#sysctl -w vm.memory_failure_early_kill=1 vm.memory_failure_early_kill = 1
# STEP2: inject an UCE error and consume it to trigger a synchronous error #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 5 addr 0xffffb0d75000 page not present Test passed
The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO error and it is not the fact.
After this patch:
# STEP1: enable early kill mode #sysctl -w vm.memory_failure_early_kill=1 vm.memory_failure_early_kill = 1 # STEP2: inject an UCE error and consume it to trigger a synchronous error #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 4 addr 0xffffb0d75000 page not present Test passed
The si_code (code 4) from einj_mem_uc indicates that it is a BUS_MCEERR_AR error as we expected.
Issue 2: a synchronous error infinite loop
If a user-space process, e.g. devmem, accesses a poisoned page for which the HWPoison flag is set, kill_accessing_process() is called to send SIGBUS to current processs with error info. Because the memory_failure() is executed in the kworker context, it will just do nothing but return EFAULT. So, devmem will access the posioned page and trigger an exception again, resulting in a synchronous error infinite loop. Such exception loop may cause platform firmware to exceed some threshold and reboot when Linux could have recovered from this error.
To reproduce this problem:
# STEP 1: inject an UCE error, and kernel will set HWPosion flag for related page #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 4 addr 0xffffb0d75000 page not present Test passed
# STEP 2: access the same page and it will trigger a synchronous error infinite loop devmem 0x4092d55b400
To fix above two issues, queue memory_failure() as a task_work so that it runs in the context of the process that is actually consuming the poisoned data.
Signed-off-by: Shuai Xue xueshuai@linux.alibaba.com Tested-by: Ma Wupeng mawupeng1@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Reviewed-by: Xiaofei Tan tanxiaofei@huawei.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com
Conflicts: include/linux/mm.h drivers/acpi/apei/ghes.c [fix context conflicts] Signed-off-by: Tong Tiangen tongtiangen@huawei.com --- drivers/acpi/apei/ghes.c | 77 +++++++++++++++++++++++----------------- include/acpi/ghes.h | 3 -- include/linux/mm.h | 1 - mm/memory-failure.c | 13 ------- 4 files changed, 44 insertions(+), 50 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index b1bd85055f46..e638fa4b7426 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -441,28 +441,41 @@ static void ghes_clear_estatus(struct ghes *ghes, }
/* - * Called as task_work before returning to user-space. - * Ensure any queued work has been done before we return to the context that - * triggered the notification. + * struct ghes_task_work - for synchronous RAS event + * + * @twork: callback_head for task work + * @pfn: page frame number of corrupted page + * @flags: work control flags + * + * Structure to pass task work to be handled before + * returning to user-space via task_work_add(). */ -static void ghes_kick_task_work(struct callback_head *head) +struct ghes_task_work { + struct callback_head twork; + u64 pfn; + int flags; +}; + +static void memory_failure_cb(struct callback_head *twork) { - struct acpi_hest_generic_status *estatus; - struct ghes_estatus_node *estatus_node; - u32 node_len; + struct ghes_task_work *twcb = container_of(twork, struct ghes_task_work, twork); + int ret;
- estatus_node = container_of(head, struct ghes_estatus_node, task_work); - if (IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE)) - memory_failure_queue_kick(estatus_node->task_work_cpu); + ret = memory_failure(twcb->pfn, twcb->flags); + gen_pool_free(ghes_estatus_pool, (unsigned long)twcb, sizeof(*twcb));
- estatus = GHES_ESTATUS_FROM_NODE(estatus_node); - node_len = GHES_ESTATUS_NODE_LEN(cper_estatus_len(estatus)); - gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node, node_len); + if (!ret || ret == -EHWPOISON || ret == -EOPNOTSUPP) + return; + + pr_err("%#llx: Sending SIGBUS to %s:%d due to hardware memory corruption\n", + twcb->pfn, current->comm, task_pid_nr(current)); + force_sig(SIGBUS); }
static bool ghes_do_memory_failure(u64 physical_addr, int flags) { unsigned long pfn; + struct ghes_task_work *twcb;
if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE)) return false; @@ -475,6 +488,18 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags) return false; }
+ if (flags == MF_ACTION_REQUIRED && current->mm) { + twcb = (void *)gen_pool_alloc(ghes_estatus_pool, sizeof(*twcb)); + if (!twcb) + return false; + + twcb->pfn = pfn; + twcb->flags = flags; + init_task_work(&twcb->twork, memory_failure_cb); + task_work_add(current, &twcb->twork, TWA_RESUME); + return true; + } + memory_failure_queue(pfn, flags); return true; } @@ -642,7 +667,7 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata, schedule_work(&entry->work); }
-static bool ghes_do_proc(struct ghes *ghes, +static void ghes_do_proc(struct ghes *ghes, const struct acpi_hest_generic_status *estatus) { int sev, sec_sev; @@ -707,8 +732,6 @@ static bool ghes_do_proc(struct ghes *ghes, current->comm, task_pid_nr(current)); force_sig(SIGBUS); } - - return queued; }
static void __ghes_print_estatus(const char *pfx, @@ -1004,9 +1027,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work) struct ghes_estatus_node *estatus_node; struct acpi_hest_generic *generic; struct acpi_hest_generic_status *estatus; - bool task_work_pending; u32 len, node_len; - int ret;
llnode = llist_del_all(&ghes_estatus_llist); /* @@ -1021,25 +1042,16 @@ static void ghes_proc_in_irq(struct irq_work *irq_work) estatus = GHES_ESTATUS_FROM_NODE(estatus_node); len = cper_estatus_len(estatus); node_len = GHES_ESTATUS_NODE_LEN(len); - task_work_pending = ghes_do_proc(estatus_node->ghes, estatus); + + ghes_do_proc(estatus_node->ghes, estatus); + if (!ghes_estatus_cached(estatus)) { generic = estatus_node->generic; if (ghes_print_estatus(NULL, generic, estatus)) ghes_estatus_cache_add(generic, estatus); } - - if (task_work_pending && current->mm) { - estatus_node->task_work.func = ghes_kick_task_work; - estatus_node->task_work_cpu = smp_processor_id(); - ret = task_work_add(current, &estatus_node->task_work, - TWA_RESUME); - if (ret) - estatus_node->task_work.func = NULL; - } - - if (!estatus_node->task_work.func) - gen_pool_free(ghes_estatus_pool, - (unsigned long)estatus_node, node_len); + gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node, + node_len);
llnode = next; } @@ -1100,7 +1112,6 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
estatus_node->ghes = ghes; estatus_node->generic = ghes->generic; - estatus_node->task_work.func = NULL; estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
if (__ghes_read_estatus(estatus, buf_paddr, fixmap_idx, len)) { diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h index 544d92789b34..ac8125c28c5a 100644 --- a/include/acpi/ghes.h +++ b/include/acpi/ghes.h @@ -33,9 +33,6 @@ struct ghes_estatus_node { struct llist_node llnode; struct acpi_hest_generic *generic; struct ghes *ghes; - - int task_work_cpu; - struct callback_head task_work; };
struct ghes_estatus_cache { diff --git a/include/linux/mm.h b/include/linux/mm.h index 00bc6978391b..d9d8e68c2109 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3172,7 +3172,6 @@ enum mf_flags { }; extern int memory_failure(unsigned long pfn, int flags); extern void memory_failure_queue(unsigned long pfn, int flags); -extern void memory_failure_queue_kick(int cpu); extern int unpoison_memory(unsigned long pfn); extern int sysctl_memory_failure_early_kill; extern int sysctl_memory_failure_recovery; diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 32b26b58e92d..3944859bb587 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1798,19 +1798,6 @@ static void memory_failure_work_func(struct work_struct *work) } }
-/* - * Process memory_failure work queued on the specified CPU. - * Used to avoid return-to-userspace racing with the memory_failure workqueue. - */ -void memory_failure_queue_kick(int cpu) -{ - struct memory_failure_cpu *mf_cpu; - - mf_cpu = &per_cpu(memory_failure_cpu, cpu); - cancel_work_sync(&mf_cpu->work); - memory_failure_work_func(&mf_cpu->work); -} - static int __init memory_failure_init(void) { struct memory_failure_cpu *mf_cpu;
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/14132 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/V...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/14132 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/V...