tree: https://gitee.com/openeuler/kernel.git OLK-5.10
head: 5c3dff2dbd96a0929b37a4f5331b76089f8f6285
commit: f2c902d8c653f8021f9761092a27f7b9db42b662 [2574/2574] tracing: Make tracepoint lockdep check actually test something
config: x86_64-buildonly-randconfig-003-20241210 (https://download.01.org/0day-ci/archive/20241211/202412110343.SnsYTxbB-lkp@…)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241211/202412110343.SnsYTxbB-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412110343.SnsYTxbB-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/gpu/drm/amd/amdgpu/../display/dc/dce110/dce110_timing_generator.o: warning: objtool: dce110_timing_generator_set_test_pattern()+0x40d: unreachable instruction
--
>> drivers/gpu/drm/amd/amdgpu/../display/dc/dce110/dce110_transform_v.o: warning: objtool: dce110_xfmv_set_scaler()+0x1047: unreachable instruction
--
>> drivers/gpu/drm/amd/amdgpu/../display/dc/dce120/dce120_timing_generator.o: warning: objtool: dce120_timing_generator_set_test_pattern()+0x96d: unreachable instruction
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
tree: https://gitee.com/openeuler/kernel.git OLK-6.6
head: c0c5f5fdc62998e0c68e6c77d6aae2566185bfd5
commit: bf177ad1d8f72824180b44563c09f37562f645de [1610/1610] drivers: initial support for rnpgbevf drivers from Mucse Technology
config: loongarch-allyesconfig (https://download.01.org/0day-ci/archive/20241210/202412101859.KXPZLAMD-lkp@…)
compiler: loongarch64-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241210/202412101859.KXPZLAMD-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412101859.KXPZLAMD-lkp@intel.com/
All errors (new ones prefixed by >>):
loongarch64-linux-ld: drivers/net/ethernet/mucse/rnp/rnp_mbx.o:(.data.rel.local+0x0): multiple definition of `mbx_ops_generic'; drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.o:(.data.rel.ro.local+0x0): first defined here
loongarch64-linux-ld: drivers/net/ethernet/mucse/rnpm/rnpm_main.o: in function `.LANCHOR3':
rnpm_main.c:(.bss+0x110): multiple definition of `mpe_pkt_version'; drivers/net/ethernet/mucse/rnp/rnp_main.o:rnp_main.c:(.bss+0x60): first defined here
loongarch64-linux-ld: drivers/net/ethernet/mucse/rnpm/rnpm_main.o: in function `.LANCHOR3':
rnpm_main.c:(.bss+0x114): multiple definition of `mpe_src_port'; drivers/net/ethernet/mucse/rnp/rnp_main.o:rnp_main.c:(.bss+0x64): first defined here
loongarch64-linux-ld: drivers/net/ethernet/mucse/rnpm/rnpm_mbx.o:(.data.rel.local+0x0): multiple definition of `mbx_ops_generic'; drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.o:(.data.rel.ro.local+0x0): first defined here
loongarch64-linux-ld: drivers/net/ethernet/mucse/rnpm/rnpm_pcs.o:(.data.rel.local+0x0): multiple definition of `pcs_ops_generic'; drivers/net/ethernet/mucse/rnp/rnp_pcs.o:(.data.rel.local+0x0): first defined here
loongarch64-linux-ld: drivers/net/ethernet/mucse/rnpm/rnpm_ptp.o: in function `.LANCHOR1':
rnpm_ptp.c:(.data.rel.ro.local+0x0): multiple definition of `mac_ptp'; drivers/net/ethernet/mucse/rnp/rnp_ptp.o:rnp_ptp.c:(.data.rel.ro.local+0x0): first defined here
loongarch64-linux-ld: drivers/net/ethernet/mucse/rnpgbe/rnpgbe_main.o: in function `.LANCHOR7':
rnpgbe_main.c:(.data.rel.ro+0x0): multiple definition of `rnp10_netdev_ops'; drivers/net/ethernet/mucse/rnp/rnp_main.o:rnp_main.c:(.data.rel.ro+0x0): first defined here
loongarch64-linux-ld: drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.o:(.data.rel.local+0x0): multiple definition of `mbx_ops_generic'; drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.o:(.data.rel.ro.local+0x0): first defined here
loongarch64-linux-ld: drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.o: in function `mbx_cookie_zalloc':
rnpgbe_mbx_fw.c:(.text+0x128): multiple definition of `mbx_cookie_zalloc'; drivers/net/ethernet/mucse/rnpm/rnpm_mbx_fw.o:rnpm_mbx_fw.c:(.text+0x6a4): first defined here
loongarch64-linux-ld: drivers/net/ethernet/mucse/rnpgbe/rnpgbe_sriov.o: in function `check_ari_mode':
rnpgbe_sriov.c:(.text+0x2acc): multiple definition of `check_ari_mode'; drivers/net/ethernet/mucse/rnp/rnp_sriov.o:rnp_sriov.c:(.text+0x2af8): first defined here
loongarch64-linux-ld: drivers/net/ethernet/mucse/rnpgbe/rnpgbe_ptp.o: in function `.LANCHOR1':
rnpgbe_ptp.c:(.data.rel.ro.local+0x0): multiple definition of `mac_ptp'; drivers/net/ethernet/mucse/rnp/rnp_ptp.o:rnp_ptp.c:(.data.rel.ro.local+0x0): first defined here
loongarch64-linux-ld: drivers/net/ethernet/mucse/rnpgbevf/rnpgbevf_main.o: in function `remove_mbx_irq':
>> rnpgbevf_main.c:(.text+0xb6bc): multiple definition of `remove_mbx_irq'; drivers/net/ethernet/mucse/rnpvf/rnpvf_main.o:rnpvf_main.c:(.text+0xc0b0): first defined here
loongarch64-linux-ld: drivers/net/ethernet/mucse/rnpgbevf/rnpgbevf_main.o: in function `register_mbx_irq':
rnpgbevf_main.c:(.text+0xbc28): multiple definition of `register_mbx_irq'; drivers/net/ethernet/mucse/rnpvf/rnpvf_main.o:rnpvf_main.c:(.text+0xc61c): first defined here
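The linker errors above all have the same shape: several drivers built into one image (allyesconfig) each define a global symbol with the same generic name. A minimal user-space sketch of the two usual remedies follows. The struct layout and the demo_ helper names are hypothetical, since the report does not show the real definitions, and which remedy fits each symbol depends on whether other files in the driver reference it.

```c
/* Hypothetical ops table; the real struct layout is not shown in the report. */
struct demo_mbx_ops {
	int (*read)(int reg);
};

/* Illustrative callback only; not a real driver function. */
static int rnpgbevf_demo_read(int reg)
{
	return reg + 1;
}

/*
 * Remedy 1: a symbol used by a single file gets internal linkage.
 * No external symbol is emitted, so it cannot collide at link time.
 */
static const struct demo_mbx_ops mbx_ops_local = {
	.read = rnpgbevf_demo_read,
};

/*
 * Remedy 2: a symbol shared across files keeps external linkage but
 * carries a driver-specific prefix, so each driver exports a unique name.
 */
const struct demo_mbx_ops rnpgbevf_mbx_ops_generic = {
	.read = rnpgbevf_demo_read,
};

/* Uses the file-local table, as code in the same file would. */
int rnpgbevf_demo_use_local(int reg)
{
	return mbx_ops_local.read(reg);
}
```

Prefixing touches every user of the symbol, so it is usually done together with the matching extern declarations in the driver's headers.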
Hi Carrie.Cai,
FYI, the error/warning still remains.
tree: https://gitee.com/openeuler/kernel.git OLK-6.6
head: c0c5f5fdc62998e0c68e6c77d6aae2566185bfd5
commit: 914854f2adb6988ac3b6521088ec96833d6743e2 [1609/1609] driver: crypto - update support for Mont-TSSE Driver
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20241210/202412101347.Sow2DxhA-lkp@…)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241210/202412101347.Sow2DxhA-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412101347.Sow2DxhA-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from drivers/crypto/montage/tsse/tsse_ipc_api.c:10:
In file included from drivers/crypto/montage/tsse/tsse_dev.h:13:
In file included from include/linux/pci.h:1669:
In file included from include/linux/dmapool.h:14:
In file included from include/linux/scatterlist.h:8:
In file included from include/linux/mm.h:2243:
include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
508 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
509 | item];
| ~~~~
include/linux/vmstat.h:515:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
515 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
516 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
522 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
include/linux/vmstat.h:527:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
527 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
528 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
include/linux/vmstat.h:536:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
536 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
537 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
>> drivers/crypto/montage/tsse/tsse_ipc_api.c:62:36: warning: variable 'device_handle' is uninitialized when used here [-Wuninitialized]
62 | service_instance->device_handle = device_handle;
| ^~~~~~~~~~~~~
drivers/crypto/montage/tsse/tsse_ipc_api.c:56:19: note: initialize the variable 'device_handle' to silence this warning
56 | int device_handle;
| ^
| = 0
6 warnings generated.
vim +/device_handle +62 drivers/crypto/montage/tsse/tsse_ipc_api.c
41
42 /**
43 * tsse_im_service_handle_alloc() - Allocate IPC Message service handle for specific service.
44 * @name: IPC Message service name
45 * @cb: request callback for the service
46 * @handle: function output for the service handle
47 * Return: 0 if allocated successfully, other values for failure
48 */
49 int tsse_im_service_handle_alloc(
50 const char *name,
51 tsse_im_cb_func cb,
52 tsse_im_service_handle *handle)
53 {
54 struct tsse_service_instance *service_instance;
55 int ret;
56 int device_handle;
57
58 service_instance = kzalloc(sizeof(struct tsse_service_instance), GFP_ATOMIC);
59 if (!service_instance)
60 return -ENOMEM;
61 service_instance->service_opened = 0;
> 62 service_instance->device_handle = device_handle;
63 service_instance->cb = cb;
64 strscpy(service_instance->service_name, name, TSSE_IM_SERVICE_NAME_LEN);
65
66 ret = tsse_schedule_device_handle(service_instance);
67 if (ret) {
68 kfree(service_instance);
69 return ret;
70 }
71
72 ret = tsse_service_open(service_instance);
73 if (ret) {
74 pr_err("%s(): open service: %s failed: %d\n",
75 __func__, service_instance->service_name, ret);
76 kfree(service_instance);
77 return ret;
78 }
79 *handle = service_instance;
80 return 0;
81 }
82 EXPORT_SYMBOL_GPL(tsse_im_service_handle_alloc);
83
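The new warning says that the local 'device_handle' declared at line 56 is never assigned before it is copied into the instance at line 62. A minimal user-space sketch of one possible fix follows: drop the local and store an explicit sentinel until the scheduler picks a device. It assumes that tsse_schedule_device_handle() is the function that selects the device, which the report does not confirm. All demo_ names and the value 3 are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical, reduced stand-in for struct tsse_service_instance. */
struct demo_service_instance {
	int service_opened;
	int device_handle;
};

/*
 * Hypothetical stand-in for tsse_schedule_device_handle(); assumed here
 * to choose a device and record it in the instance.
 */
int demo_schedule_device_handle(struct demo_service_instance *inst)
{
	inst->device_handle = 3; /* arbitrary device index for the demo */
	return 0;
}

struct demo_service_instance *demo_service_alloc(void)
{
	struct demo_service_instance *inst = calloc(1, sizeof(*inst));

	if (!inst)
		return NULL;
	/* Explicit sentinel instead of copying an uninitialized local. */
	inst->device_handle = -1;
	if (demo_schedule_device_handle(inst)) {
		free(inst);
		return NULL;
	}
	return inst;
}
```

If the handle is instead meant to come from the caller, the alternative is to pass it in as a parameter; the report alone does not say which was intended.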
There are two major types of uncorrected recoverable (UCR) errors:
- Synchronous error: The error is detected and raised at the point of
consumption in the execution flow, e.g. when a CPU tries to access
a poisoned cache line. The CPU takes a synchronous error exception
such as Synchronous External Abort (SEA) on Arm64 and Machine Check
Exception (MCE) on x86. The OS is required to take action (for example,
offlining the failed page or killing the failing thread) to recover from
this uncorrectable error.
- Asynchronous error: The error is detected outside of processor execution
context, e.g. when an error is detected by a background scrubber. Some data
in memory is corrupted, but it has not been consumed. The OS may
optionally take action to recover from this uncorrectable error.
Currently, both synchronous and asynchronous errors use
memory_failure_queue() to schedule memory_failure() to execute in kworker
context. As a result, when a user-space process accesses poisoned
data, a data abort is taken and memory_failure() is executed in the
kworker context, which:
- sends the wrong si_code with the SIGBUS signal in early_kill mode, and
- cannot kill the user-space process in some cases, resulting in a
synchronous error infinite loop
Issue 1: wrong si_code sent in early_kill mode
Since commit a70297d22132 ("ACPI: APEI: set memory failure flags as
MF_ACTION_REQUIRED on synchronous events"), the flag MF_ACTION_REQUIRED
can be used to determine whether a synchronous exception occurred on the
ARM64 platform. When a synchronous exception is detected, the kernel is
expected to terminate the current process, which has accessed the poisoned
page. This is done by sending a SIGBUS signal with the error code
BUS_MCEERR_AR, indicating an action-required machine check error on
read.
However, when kill_proc() is called to terminate the processes that have
the poisoned page mapped, it sends the incorrect SIGBUS error code
BUS_MCEERR_AO because the context in which it operates is not the one
where the error was triggered.
To reproduce this problem:
# STEP1: enable early kill mode
#sysctl -w vm.memory_failure_early_kill=1
vm.memory_failure_early_kill = 1
# STEP2: inject an UCE error and consume it to trigger a synchronous error
#einj_mem_uc single
0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400
injecting ...
triggering ...
signal 7 code 5 addr 0xffffb0d75000
page not present
Test passed
The si_code (code 5) from einj_mem_uc indicates a BUS_MCEERR_AO error,
which is wrong: the error was consumed synchronously.
To fix it, queue memory_failure() as a task_work so that it runs in
the context of the process that is actually consuming the poisoned data.
After this patch set:
# STEP1: enable early kill mode
#sysctl -w vm.memory_failure_early_kill=1
vm.memory_failure_early_kill = 1
# STEP2: inject an UCE error and consume it to trigger a synchronous error
#einj_mem_uc single
0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400
injecting ...
triggering ...
signal 7 code 4 addr 0xffffb0d75000
page not present
Test passed
The si_code (code 4) from einj_mem_uc indicates a BUS_MCEERR_AR error,
as expected.
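The "code 4" and "code 5" values in the transcripts are the si_code field that a SIGBUS handler receives in its siginfo_t. A minimal sketch of how a user-space tool might decode them follows. The helper name demo_sigbus_code_name() is hypothetical, and the fallback defines only apply if <signal.h> does not already provide the constants.

```c
#include <signal.h>

/* Fallbacks matching the Linux values, used only if <signal.h> lacks them. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Map the si_code of a SIGBUS to a readable name (hypothetical helper). */
const char *demo_sigbus_code_name(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return "BUS_MCEERR_AR"; /* action required: data was consumed */
	case BUS_MCEERR_AO:
		return "BUS_MCEERR_AO"; /* action optional: not yet consumed */
	default:
		return "other";
	}
}
```

A handler installed with SA_SIGINFO would pass info->si_code to this helper and could treat BUS_MCEERR_AR as fatal for the faulting thread.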
Issue 2: a synchronous error infinite loop because memory_failure() fails
If a user-space process, e.g. devmem, accesses a poisoned page that already
has the HWPoison flag set, kill_accessing_process() is called to send SIGBUS
to the current process with error info. Because memory_failure() is
executed in kworker context, it does nothing and just returns
EFAULT. devmem then accesses the poisoned page again and triggers another
exception, resulting in a synchronous error infinite loop. Such a
loop may cause platform firmware to exceed some threshold and reboot
when Linux could have recovered from the error.
To reproduce this problem:
# STEP 1: inject an UCE error, and the kernel will set the HWPoison flag for the related page
#einj_mem_uc single
0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400
injecting ...
triggering ...
signal 7 code 4 addr 0xffffb0d75000
page not present
Test passed
# STEP 2: access the same page and it will trigger a synchronous error infinite loop
devmem 0x4092d55b400
To fix it, if memory_failure() fails, force-kill the current process.
Issue 3: a synchronous error infinite loop because no memory_failure() is queued
No memory_failure() work is queued unless all of the precondition checks below pass:
- `if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))` in ghes_handle_memory_failure()
- `if (flags == -1)` in ghes_handle_memory_failure()
- `if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))` in ghes_do_memory_failure()
- `if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr))` in ghes_do_memory_failure()
If any precondition check does not pass, the user-space process will trigger SEA again.
This loop can potentially exceed the platform firmware threshold or even
trigger a kernel hard lockup, leading to a system reboot.
To fix it, if no memory_failure() is queued, force-kill the current process.
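The precondition list and the proposed fix can be read as a small decision function. The following is a user-space model only, not the kernel code. It assumes each listed condition is a bail-out test, so work is queued only when all of them are false. The struct, its fields, and the model_ names are hypothetical, and the 0x0002 value is assumed to mirror CPER_MEM_VALID_PA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror CPER_MEM_VALID_PA from include/linux/cper.h. */
#define MODEL_CPER_MEM_VALID_PA 0x0002u

/* Hypothetical flattened view of the inputs to the four checks. */
struct model_mem_event {
	uint64_t validation_bits;
	int flags;             /* -1: no memory_failure() flags were chosen */
	bool apei_mf_enabled;  /* stands in for CONFIG_ACPI_APEI_MEMORY_FAILURE */
	bool pfn_valid;
	bool platform_page;    /* stands in for arch_is_platform_page() */
	bool sync;             /* true for a synchronous (consumed) error */
};

/* Returns true only if none of the listed bail-out conditions triggers. */
bool model_memory_failure_queued(const struct model_mem_event *ev)
{
	if (!(ev->validation_bits & MODEL_CPER_MEM_VALID_PA))
		return false;
	if (ev->flags == -1)
		return false;
	if (!ev->apei_mf_enabled)
		return false;
	if (!ev->pfn_valid && !ev->platform_page)
		return false;
	return true;
}

/* Proposed behavior: a synchronous error with nothing queued is force-killed. */
bool model_should_force_kill(const struct model_mem_event *ev)
{
	return ev->sync && !model_memory_failure_queued(ev);
}
```

The model makes the loop visible: for a synchronous error, every false return from the first function is a path where the task would otherwise re-execute the faulting access.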
Memory errors triggered in kernel mode[5] also rely on this
patchset to kill the failing thread.
Lv Ying and XiuQi from Huawei also proposed to address a similar problem[2][4].
Thanks to them for the discussion.
[1] Add ARMv8 RAS virtualization support in QEMU https://patchew.org/QEMU/20200512030609.19593-1-gengdongjiu@huawei.com/
[2] https://lore.kernel.org/lkml/20221205115111.131568-3-lvying6@huawei.com/
[3] https://lkml.kernel.org/r/20220914064935.7851-1-xueshuai@linux.alibaba.com
[4] https://lore.kernel.org/lkml/20221209095407.383211-1-lvying6@huawei.com/
[5] https://patchwork.kernel.org/project/linux-arm-kernel/cover/20240528085915.…
Shuai Xue (4):
ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
synchronous events
ACPI: APEI: send SIGBUS to current task if synchronous memory error
not recovered
mm: memory-failure: move return value documentation to function
declaration
ACPI: APEI: handle synchronous exceptions in task work
arch/x86/kernel/cpu/mce/core.c | 5 --
drivers/acpi/apei/ghes.c | 114 ++++++++++++++++++++++-----------
include/acpi/ghes.h | 3 -
include/linux/mm.h | 1 -
mm/memory-failure.c | 19 ++----
5 files changed, 82 insertions(+), 60 deletions(-)
--
2.25.1
Hi Shuai,
FYI, the error/warning still remains.
tree: https://gitee.com/openeuler/kernel.git OLK-6.6
head: db988390007bce595dba0dfd782c610578e26d2d
commit: 4213ff7957de370c1cfe528c2bad1eb2e499038a [1610/1610] net/ethernet/huawei/hinic3: Add the CQM on which the RDMA depends
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20241210/202412101054.7uczAWCS-lkp@…)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241210/202412101054.7uczAWCS-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412101054.7uczAWCS-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from drivers/net/ethernet/huawei/hinic3/cqm/cqm_bitmap_table.c:6:
In file included from include/linux/pci.h:1663:
In file included from include/linux/dmapool.h:14:
In file included from include/linux/scatterlist.h:8:
In file included from include/linux/mm.h:2204:
include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
508 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
509 | item];
| ~~~~
include/linux/vmstat.h:515:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
515 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
516 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
522 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
include/linux/vmstat.h:527:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
527 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
528 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
include/linux/vmstat.h:536:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
536 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
537 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/huawei/hinic3/cqm/cqm_bitmap_table.c:371:3: error: a randomized struct can only be initialized with a designated initializer
371 | {check_for_use_node_alloc, cqm_buf_use_node_alloc_page},
| ^
drivers/net/ethernet/huawei/hinic3/cqm/cqm_bitmap_table.c:372:3: error: a randomized struct can only be initialized with a designated initializer
372 | {check_for_nouse_node_alloc, cqm_buf_unused_node_alloc_page}
| ^
drivers/net/ethernet/huawei/hinic3/cqm/cqm_bitmap_table.c:376:3: error: a randomized struct can only be initialized with a designated initializer
376 | {check_use_non_vram, cqm_buf_free_page_common}
| ^
drivers/net/ethernet/huawei/hinic3/cqm/cqm_bitmap_table.c:382:25: error: invalid application of 'sizeof' to an incomplete type 'const struct malloc_memory[]'
382 | u32 malloc_funcs_num = ARRAY_SIZE(g_malloc_funcs);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/kernel.h:57:32: note: expanded from macro 'ARRAY_SIZE'
57 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
| ^~~~~
drivers/net/ethernet/huawei/hinic3/cqm/cqm_bitmap_table.c:399:23: error: invalid application of 'sizeof' to an incomplete type 'const struct free_memory[]'
399 | u32 free_funcs_num = ARRAY_SIZE(g_free_funcs);
| ^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/kernel.h:57:32: note: expanded from macro 'ARRAY_SIZE'
57 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
| ^~~~~
5 warnings and 5 errors generated.
vim +371 drivers/net/ethernet/huawei/hinic3/cqm/cqm_bitmap_table.c
369
370 static const struct malloc_memory g_malloc_funcs[] = {
> 371 {check_for_use_node_alloc, cqm_buf_use_node_alloc_page},
372 {check_for_nouse_node_alloc, cqm_buf_unused_node_alloc_page}
373 };
374
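The first three errors come from positional initializers on a struct whose layout the compiler may randomize. With structure layout randomization enabled, structs made up entirely of function pointers are typically selected automatically, so member order is not stable. The two sizeof errors look like follow-on failures from the rejected initializers. A minimal sketch of the designated-initializer form follows. The field names and all demo_ functions are hypothetical, because the report does not show the struct definition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout; the real field names are not in the report. */
struct demo_malloc_memory {
	bool (*check_alloc_mode)(int node);
	int (*malloc_func)(int node);
};

static bool demo_check_use_node(int node) { return node >= 0; }
static int demo_alloc_use_node(int node) { return node; }
static bool demo_check_nouse_node(int node) { return node < 0; }
static int demo_alloc_unused_node(int node) { (void)node; return -1; }

/* Designated initializers name each member, so member order no longer matters. */
static const struct demo_malloc_memory demo_malloc_funcs[] = {
	{ .check_alloc_mode = demo_check_use_node,
	  .malloc_func = demo_alloc_use_node },
	{ .check_alloc_mode = demo_check_nouse_node,
	  .malloc_func = demo_alloc_unused_node },
};

#define DEMO_ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Walk the table and call the first allocator whose check accepts the node. */
int demo_alloc(int node)
{
	for (size_t i = 0; i < DEMO_ARRAY_SIZE(demo_malloc_funcs); i++) {
		if (demo_malloc_funcs[i].check_alloc_mode(node))
			return demo_malloc_funcs[i].malloc_func(node);
	}
	return -2; /* no allocator matched */
}
```

The same change would apply to the g_free_funcs table at line 376 of the reported file.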