[PATCH OLK-6.6 00/10] ACPI & PCI bugfix from 22.03 SP3


Xiongfeng Wang

22 Dec 2023

4:12 p.m.

Xiongfeng Wang (10):
  pci: do not save 'PCI_BRIDGE_CTL_BUS_RESET'
  PCI: check BIR before mapping MSI-X Table
  PCI: Fail MSI-X mapping if MSI-X Table offset is out of range of BAR space
  sysrq: avoid concurrently info printing by 'sysrq-trigger'
  PCI: Add MCFG quirks for some Hisilicon Chip host controllers
  PCI: add a member in 'struct pci_bus' to record the original 'pci_ops'
  PCI/AER: increments pci bus reference count in aer-inject process
  ntp: Avoid undefined behaviour in second_overflow()
  hinic: ethtool: Allow userspace to set more aggregation params
  PCI/sysfs: Take reference on device to be removed

 drivers/acpi/pci_mcfg.c                           |  4 ++++
 drivers/net/ethernet/huawei/hinic/hinic_ethtool.c | 10 +++++-----
 drivers/pci/bus.c                                 |  2 ++
 drivers/pci/msi/msi.c                             | 11 +++++++++++
 drivers/pci/pci-sysfs.c                           |  9 +++++++--
 drivers/pci/pci.c                                 |  3 +++
 drivers/pci/pcie/aer_inject.c                     |  9 +++++++++
 drivers/pci/probe.c                               | 12 +++++++++---
 drivers/tty/sysrq.c                               |  6 ++++++
 include/linux/pci.h                               |  1 +
 kernel/time/ntp.c                                 |  2 ++
 11 files changed, 59 insertions(+), 10 deletions(-)

--
2.20.1


patchwork bot

22 Dec

3:57 p.m.

Feedback: The patch(es) which you have sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/3533
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/I...

Xiongfeng Wang

4:12 p.m.

New subject: [PATCH OLK-6.6 01/10] pci: do not save 'PCI_BRIDGE_CTL_BUS_RESET'

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8QLHB
CVE: NA

---------------------------

When I inject a PCIe fatal error into a Mellanox netdevice, 'dmesg' shows that the device recovered successfully, but 'lspci' does not show the device. I checked the configuration space of the slot where the netdevice is inserted and found that the bit 'PCI_BRIDGE_CTL_BUS_RESET' is set. It turned out that this bit is saved in 'saved_config_space' of 'struct pci_dev' when 'pci_pm_runtime_suspend()' is called, so 'PCI_BRIDGE_CTL_BUS_RESET' is set every time we restore the configuration space.

This patch avoids saving the bit 'PCI_BRIDGE_CTL_BUS_RESET' when we save the configuration space of a bridge.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 drivers/pci/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index a607f277ccf1..dab00ab19364 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1692,6 +1692,9 @@ int pci_save_state(struct pci_dev *dev)
 		pci_dbg(dev, "save config %#04x: %#010x\n",
 			i * 4, dev->saved_config_space[i]);
 	}
+	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+		dev->saved_config_space[PCI_BRIDGE_CONTROL / 4] &=
+			~(PCI_BRIDGE_CTL_BUS_RESET << 16);
 	dev->state_saved = true;

 	i = pci_save_pcie_state(dev);
--
2.20.1
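The index and shift in the hunk above may look odd at first, so here is a small user-space sketch of the same masking. It assumes the saved configuration space is an array of 32-bit dwords and uses the register values from include/uapi/linux/pci_regs.h; the helper name clear_saved_bus_reset() is hypothetical and not part of the patch.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_BRIDGE_CONTROL        0x3e  /* 16-bit register at byte offset 0x3e */
#define PCI_BRIDGE_CTL_BUS_RESET  0x40  /* Secondary Bus Reset bit */

/*
 * Hypothetical helper (illustration only): clear Secondary Bus Reset in a
 * saved copy of config space that is stored as 32-bit dwords.
 */
static void clear_saved_bus_reset(uint32_t *saved)
{
	/*
	 * 0x3e / 4 == 15, so the register lives in dword 15, which covers
	 * bytes 0x3c..0x3f. Bridge Control is the upper 16 bits of that
	 * dword, hence the shift by 16.
	 */
	saved[PCI_BRIDGE_CONTROL / 4] &=
		~((uint32_t)PCI_BRIDGE_CTL_BUS_RESET << 16);
}
```

Only the one bit is cleared; the interrupt line/pin bytes in the lower half of the dword and the other Bridge Control bits are left untouched.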

Xiongfeng Wang

4:12 p.m.

New subject: [PATCH OLK-6.6 02/10] PCI: check BIR before mapping MSI-X Table

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8QLHB
CVE: NA

-------------------

We use 'bir' as the index into the array resource[DEVICE_COUNT_RESOURCE]. A wrong 'bir' causes an out-of-range access. This patch adds a check for 'bir'.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

Conflicts:
	drivers/pci/msi.c

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
---
 drivers/pci/msi/msi.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index ef1d8857a51b..21355bfe58a3 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -559,6 +559,11 @@ static void __iomem *msix_map_region(struct pci_dev *dev,
 	pci_read_config_dword(dev, dev->msix_cap + PCI_MSIX_TABLE,
 			      &table_offset);
 	bir = (u8)(table_offset & PCI_MSIX_TABLE_BIR);
+	if (bir >= DEVICE_COUNT_RESOURCE) {
+		dev_err(&dev->dev, "MSI-X Table BIR is out of range !\n");
+		return NULL;
+	}
+
 	flags = pci_resource_flags(dev, bir);
 	if (!flags || (flags & IORESOURCE_UNSET))
 		return NULL;
--
2.20.1

Xiongfeng Wang

4:12 p.m.

New subject: [PATCH OLK-6.6 03/10] PCI: Fail MSI-X mapping if MSI-X Table offset is out of range of BAR space

From: Xiongfeng Wang <xiongfeng.wang@linaro.org>

euler inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8QLHB
CVE: NA

----------------------------------------

This patch adds a check for the offset of the MSI-X Table. If the offset is outside the BAR space that BIR selects, we fail the MSI-X mapping.

Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

Conflicts:
	drivers/pci/msi.c

Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org>
---
 drivers/pci/msi/msi.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 21355bfe58a3..130bf74de69b 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -569,6 +569,12 @@ static void __iomem *msix_map_region(struct pci_dev *dev,
 		return NULL;

 	table_offset &= PCI_MSIX_TABLE_OFFSET;
+	if (table_offset >= pci_resource_len(dev, bir)) {
+		dev_err(&dev->dev,
+			"MSI-X Table offset is out of range of BAR:%d!\n",
+			bir);
+		return NULL;
+	}
 	phys_addr = pci_resource_start(dev, bir) + table_offset;

 	return ioremap(phys_addr, nr_entries * PCI_MSIX_ENTRY_SIZE);
--
2.20.1
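As a rough illustration of the two MSI-X checks in patches 02 and 03, the following user-space sketch decodes the Table Offset/BIR register and rejects out-of-range values. The masks match include/uapi/linux/pci_regs.h; 'struct bar', NUM_BARS and msix_table_addr() are hypothetical names, and the sketch assumes only the six standard BARs rather than the kernel's DEVICE_COUNT_RESOURCE, whose value depends on the configuration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_MSIX_TABLE_BIR     0x00000007u  /* BAR index, low 3 bits */
#define PCI_MSIX_TABLE_OFFSET  0xfffffff8u  /* table offset inside that BAR */
#define NUM_BARS               6            /* assumption: standard BARs only */

/* Hypothetical stand-in for a device resource: start address and length. */
struct bar {
	uint64_t start;
	uint64_t len;
};

/*
 * Compute the address of the MSI-X table from the raw Table Offset/BIR
 * register value. Returns false instead of producing an address when the
 * BIR or the offset is out of range.
 */
static bool msix_table_addr(const struct bar *bars, uint32_t reg,
			    uint64_t *addr)
{
	uint32_t bir = reg & PCI_MSIX_TABLE_BIR;
	uint32_t offset = reg & PCI_MSIX_TABLE_OFFSET;

	if (bir >= NUM_BARS)            /* index would run past the array */
		return false;
	if (offset >= bars[bir].len)    /* table would start outside the BAR */
		return false;

	*addr = bars[bir].start + offset;
	return true;
}
```

A BAR that was never assigned has length 0 in this model, so any offset into it is rejected by the second check as well.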

Xiongfeng Wang

4:12 p.m.

New subject: [PATCH OLK-6.6 04/10] sysrq: avoid concurrently info printing by 'sysrq-trigger'

hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8QLHB CVE: NA --------------------------------- When we print system information by echo 't' into 'sysrq-trigger' on several cores at the same time, we got the following calltrace. [ 1352.854632] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6 [ 1352.854633] Modules linked in: nf_log_arp nf_log_ipv6 nf_log_ipv4 nf_log_common binfmt_misc salsa20_generic camellia_generic cast6_generic cast_common rfkill serpent_generic twofish_generic twofish_common xts lrw tgr192 wp512 rmd320 rmd256 rmd160 rmd128 md4 sha512_generic loop jprob(OE) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vfat fat hns_roce_hw_v2 hns_roce ib_core aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce sha2_ce ipmi_ssif ofpart sha256_arm64 sha1_ce cmdlinepart [ 1352.854649] hi_sfc ses enclosure mtd sg sbsa_gwdt ipmi_si ipmi_devintf ipmi_msghandler spi_dw_mmio sch_fq_codel ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod realtek hclge hisi_sas_v3_hw hisi_sas_main ahci libsas libahci hns3 hinic libata usb_storage hnae3 megaraid_sas scsi_transport_sas i2c_designware_platform i2c_designware_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_vs] [ 1352.854658] CPU: 6 PID: 220569 Comm: sh Kdump: loaded Tainted: G OEL 4.19.90-vhulk2001.1.0.0026.aarch64 #1 [ 1352.854659] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 1.06 10/29/2019 [ 1352.854659] pstate: 80400089 (Nzcv daIf +PAN -UAO) [ 1352.854660] pc : queued_spin_lock_slowpath+0x1d8/0x2e0 [ 1352.854660] lr : print_cpu+0x414/0x690 [ 1352.854660] sp : ffff0001743afb80 [ 1352.854661] x29: ffff0001743afb80 
x28: ffff805fcef6e880 [ 1352.854662] x27: 0000000000000000 x26: 0000000000000000 [ 1352.854662] x25: ffff000008cab000 x24: ffff000008cab000 [ 1352.854663] x23: 0000000000000000 x22: 0000000000000000 [ 1352.854664] x21: ffff000009478000 x20: 0000000000900001 [ 1352.854664] x19: ffff000009478d20 x18: ffffffffffffffff [ 1352.854665] x17: 0000000000000000 x16: 0000000000000000 [ 1352.854666] x15: ffff000009273708 x14: ffff00000947af60 [ 1352.854667] x13: ffff00000947abab x12: ffff00000929d000 [ 1352.854668] x11: 0000000000006fc8 x10: ffff00000947a1c0 [ 1352.854668] x9 : 0000000000000001 x8 : 0000000000000000 [ 1352.854669] x7 : ffff0000092737c8 x6 : ffff803fffc9e1c0 [ 1352.854670] x5 : 0000000000000000 x4 : ffff803fffc9e1c0 [ 1352.854671] x3 : ffff000008f5e000 x2 : 00000000001c0000 [ 1352.854671] x1 : 0000000000000000 x0 : ffff803fffc9e1c8 [ 1352.854672] Call trace: [ 1352.854673] queued_spin_lock_slowpath+0x1d8/0x2e0 [ 1352.854673] print_cpu+0x414/0x690 [ 1352.854673] sysrq_sched_debug_show+0x50/0x80 [ 1352.854674] show_state_filter+0xc0/0xd0 [ 1352.854674] sysrq_handle_showstate+0x18/0x28 [ 1352.854674] __handle_sysrq+0xa0/0x190 [ 1352.854675] write_sysrq_trigger+0x70/0x88 [ 1352.854675] proc_reg_write+0x80/0xd8 [ 1352.854675] __vfs_write+0x60/0x190 [ 1352.854676] vfs_write+0xac/0x1c0 [ 1352.854676] ksys_write+0x74/0xf0 [ 1352.854676] __arm64_sys_write+0x24/0x30 [ 1352.854677] el0_svc_common+0x78/0x130 [ 1352.854677] el0_svc_handler+0x38/0x78 [ 1352.854677] el0_svc+0x8/0xc [ 1352.854678] Kernel panic - not syncing: Hard LOCKUP [ 1352.854679] CPU: 6 PID: 220569 Comm: sh Kdump: loaded Tainted: G OEL 4.19.90-vhulk2001.1.0.0026.aarch64 #1 [ 1352.854679] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 1.06 10/29/2019 [ 1352.854679] Call trace: [ 1352.854680] dump_backtrace+0x0/0x198 [ 1352.854680] show_stack+0x24/0x30 [ 1352.854681] dump_stack+0xa4/0xc4 [ 1352.854681] panic+0x130/0x304 [ 1352.854681] __stack_chk_fail+0x0/0x28 [ 1352.854682] 
watchdog_hardlockup_check+0x138/0x140
[ 1352.854682] sdei_watchdog_callback+0x20/0x30
[ 1352.854682] sdei_event_handler+0x50/0xf0
[ 1352.854683] __sdei_handler+0xd8/0x228
[ 1352.854683] __sdei_asm_handler+0xbc/0x134
[ 1352.854683] queued_spin_lock_slowpath+0x1d8/0x2e0
[ 1352.854684] print_cpu+0x414/0x690
[ 1352.854684] sysrq_sched_debug_show+0x50/0x80
[ 1352.854684] show_state_filter+0xc0/0xd0
[ 1352.854685] sysrq_handle_showstate+0x18/0x28
[ 1352.854685] __handle_sysrq+0xa0/0x190
[ 1352.854685] write_sysrq_trigger+0x70/0x88
[ 1352.854686] proc_reg_write+0x80/0xd8
[ 1352.854686] __vfs_write+0x60/0x190
[ 1352.854686] vfs_write+0xac/0x1c0
[ 1352.854687] ksys_write+0x74/0xf0
[ 1352.854687] __arm64_sys_write+0x24/0x30
[ 1352.854687] el0_svc_common+0x78/0x130
[ 1352.854688] el0_svc_handler+0x38/0x78
[ 1352.854688] el0_svc+0x8/0xc

This happens because there are many processes in the system. 'print_cpu()' acquires 'sched_debug_lock', prints some information, and releases 'sched_debug_lock'. This procedure takes about 4 seconds in our test case. When four cores concurrently print system info via sysrq, the last core has to wait 12 seconds to get the spinlock, which causes a hard lockup.

Signed-off-by: Kai Shen <shenkai8@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
---
 drivers/tty/sysrq.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 6b4a28bcf2f5..5cc7b98d2d73 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -1148,6 +1148,9 @@ int unregister_sysrq_key(u8 key, const struct sysrq_key_op *op_p)
 EXPORT_SYMBOL(unregister_sysrq_key);

 #ifdef CONFIG_PROC_FS
+
+static DEFINE_MUTEX(sysrq_mutex);
+
 /*
  * writing 'C' to /proc/sysrq-trigger is like sysrq-C
  */
@@ -1159,7 +1162,10 @@ static ssize_t write_sysrq_trigger(struct file *file, const char __user *buf,

 		if (get_user(c, buf))
 			return -EFAULT;
+
+		mutex_lock(&sysrq_mutex);
 		__handle_sysrq(c, false);
+		mutex_unlock(&sysrq_mutex);
 	}

 	return count;
--
2.20.1
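The 12-second figure in the description follows from simple arithmetic: if each holder keeps the lock for about 4 seconds, the last of four contenders waits for the three holders ahead of it. A tiny sketch of that calculation (the function name last_waiter_delay() is hypothetical and the model ignores lock hand-over cost):

```c
/*
 * Worst-case wait, in seconds, for the last of n_cpus contenders when each
 * holder keeps the lock for hold_secs. Illustration only.
 */
static unsigned int last_waiter_delay(unsigned int n_cpus,
				      unsigned int hold_secs)
{
	/* Every contender ahead of the last one holds the lock once. */
	return n_cpus ? (n_cpus - 1) * hold_secs : 0;
}
```

With the patch applied, writers to 'sysrq-trigger' queue on 'sysrq_mutex' first. A mutex waiter sleeps instead of spinning, so only one writer at a time reaches the spinlock inside 'print_cpu()'.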

Xiongfeng Wang

4:12 p.m.

New subject: [PATCH OLK-6.6 05/10] PCI: Add MCFG quirks for some Hisilicon Chip host controllers

euler inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8QLHB
CVE: NA

-------------------------------------------------

The PCIe controller in some Hisilicon chips is not completely ECAM-compliant. Some of its PCIe cores do not support ECAM.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 drivers/acpi/pci_mcfg.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 860014b89b8e..96eccc4b1678 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -79,6 +79,10 @@ static struct mcfg_fixup mcfg_quirks[] = {
 	HISI_QUAD_DOM("HIP07 ", 4, &hisi_pcie_ops),
 	HISI_QUAD_DOM("HIP07 ", 8, &hisi_pcie_ops),
 	HISI_QUAD_DOM("HIP07 ", 12, &hisi_pcie_ops),
+	HISI_QUAD_DOM("HIP12 ", 0x20, &hisi_pcie_ops),
+	HISI_QUAD_DOM("HIP12 ", 0x24, &hisi_pcie_ops),
+	HISI_QUAD_DOM("HIP12 ", 0x28, &hisi_pcie_ops),
+	HISI_QUAD_DOM("HIP12 ", 0x2c, &hisi_pcie_ops),

 #define THUNDER_PEM_RES(addr, node) \
 	DEFINE_RES_MEM((addr) + ((u64) (node) << 44), 0x39 * SZ_16M)
--
2.20.1

Xiongfeng Wang

4:12 p.m.

New subject: [PATCH OLK-6.6 06/10] PCI: add a member in 'struct pci_bus' to record the original 'pci_ops'

hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8QLHB CVE: NA ------------------------------------------------------------------------- When I test 'aer-inject' with the following procedures: 1. inject a fatal error into a upstream PCI bridge 2. remove the upstream bridge by sysfs 3. rescan the PCI tree by 'echo 1 > /sys/bus/pci/rescan' 4. execute command 'rmmod aer-inject' 5. remove the upstream bridge by sysfs again I came across the following Oops. [ 799.713238] Internal error: Oops: 96000007 [#1] SMP [ 799.718099] Process bash (pid: 10683, stack limit = 0x00000000125a3b1b) [ 799.724686] CPU: 108 PID: 10683 Comm: bash Kdump: loaded Not tainted 4.19.36 #2 [ 799.731962] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.05 09/18/2019 [ 799.739325] pstate: 40400009 (nZcv daif +PAN -UAO) [ 799.744104] pc : pci_remove_bus+0xc0/0x1c0 [ 799.748182] lr : pci_remove_bus+0x94/0x1c0 [ 799.752260] sp : ffffa02e335df940 [ 799.755560] x29: ffffa02e335df940 x28: ffff2000088216a8 [ 799.760849] x27: 1ffff405c66bbfbc x26: ffff20000a9518c0 [ 799.766139] x25: ffffa02dea6ec418 x24: 1ffff405bd4dd883 [ 799.771427] x23: ffffa02e72576628 x22: 1ffff405ce4aecc0 [ 799.776715] x21: ffffa02e72576608 x20: ffff200002e75080 [ 799.782003] x19: ffffa02e72576600 x18: 0000000000000000 [ 799.787291] x17: 0000000000000000 x16: 0000000000000000 [ 799.792578] x15: 0000000000000001 x14: dfff200000000000 [ 799.797866] x13: ffff20000a6dfaf0 x12: 0000000000000000 [ 799.803154] x11: 1fffe4000159b217 x10: ffff04000159b217 [ 799.808442] x9 : dfff200000000000 x8 : ffff20000acd90bf [ 799.813730] x7 : 0000000000000000 x6 : 0000000000000000 [ 799.819017] x5 : 0000000000000001 x4 : 0000000000000000 [ 799.824306] x3 : 1ffff405dbe62603 x2 : 1fffe400005cea11 [ 799.829593] x1 : dfff200000000000 x0 : ffff200002e75088 [ 799.834882] Call trace: [ 799.837323] pci_remove_bus+0xc0/0x1c0 [ 799.841056] pci_remove_bus_device+0xd0/0x2f0 [ 799.845392] 
pci_stop_and_remove_bus_device_locked+0x2c/0x40
[ 799.851028] remove_store+0x1b8/0x1d0
[ 799.854679] dev_attr_store+0x60/0x80
[ 799.858330] sysfs_kf_write+0x104/0x170
[ 799.862149] kernfs_fop_write+0x23c/0x430
[ 799.866143] __vfs_write+0xec/0x4e0
[ 799.869615] vfs_write+0x12c/0x3d0
[ 799.873001] ksys_write+0xd0/0x190
[ 799.876389] __arm64_sys_write+0x70/0xa0
[ 799.880298] el0_svc_common+0xfc/0x278
[ 799.884030] el0_svc_handler+0x50/0xc0
[ 799.887764] el0_svc+0x8/0xc
[ 799.890634] Code: d2c40001 f2fbffe1 91002280 d343fc02 (38e16841)
[ 799.896700] kernel fault(0x1) notification starting on CPU 108

This happens because, when we allocate a new bus during rescan, the 'pci_ops' of the newly allocated 'pci_bus' is inherited from its parent PCI bus. However, the 'pci_ops' of the parent bus may have been changed to 'aer_inj_pci_ops' in 'aer_inject()'. When we unload the module 'aer_inject', 'aer_inject_exit' only restores the 'pci_ops' for the PCI bus of the error-injected device and the root port. After the module has been unloaded, the 'pci_ops' of the newly allocated PCI bus is still 'aer_inj_pci_ops', and accessing it triggers an Oops.

This patch adds a member 'backup_ops' to 'struct pci_bus' to record the original 'ops'. When we allocate a child PCI bus, we assign the 'backup_ops' of the parent bus to the 'ops' of the child bus. The best approach may be to not modify the 'pci_ops' in 'struct pci_bus' at all, but that would require refactoring much of the 'aer_inject' framework. I haven't found a better way to handle it.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>

Conflicts:
	include/linux/pci.h

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
---
 drivers/pci/probe.c | 12 +++++++++---
 include/linux/pci.h |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 43159965e09e..1681a9f454f4 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -897,6 +897,7 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)

 	bus->sysdata = bridge->sysdata;
 	bus->ops = bridge->ops;
+	bus->backup_ops = bus->ops;
 	bus->number = bus->busn_res.start = bridge->busnr;
 #ifdef CONFIG_PCI_DOMAINS_GENERIC
 	if (bridge->domain_nr == PCI_DOMAIN_NR_NOT_SET)
@@ -1098,10 +1099,15 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
 	child->bus_flags = parent->bus_flags;

 	host = pci_find_host_bridge(parent);
-	if (host->child_ops)
+	if (host->child_ops) {
 		child->ops = host->child_ops;
-	else
-		child->ops = parent->ops;
+	} else {
+		if (parent->backup_ops)
+			child->ops = parent->backup_ops;
+		else
+			child->ops = parent->ops;
+	}
+	child->backup_ops = child->ops;

 	/*
 	 * Initialize some portions of the bus device, but don't register
diff --git a/include/linux/pci.h b/include/linux/pci.h
index b56417276042..9c8d2cddf465 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -657,6 +657,7 @@ struct pci_bus {
 	struct resource busn_res;	/* Bus numbers routed to this bus */

 	struct pci_ops	*ops;		/* Configuration access functions */
+	struct pci_ops	*backup_ops;
 	void		*sysdata;	/* Hook for sys-specific extension */
 	struct proc_dir_entry *procdir;	/* Directory entry in /proc/bus/pci */

--
2.20.1
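The inheritance rule introduced in pci_alloc_child_bus() can be modelled in a few lines of user-space C. This is only a sketch: 'struct ops', 'struct bus' and init_child() are hypothetical stand-ins for 'struct pci_ops', 'struct pci_bus' and the kernel function, and the model ignores everything except the two pointers.

```c
#include <stddef.h>

/* Hypothetical stand-in for 'struct pci_ops'. */
struct ops {
	const char *name;
};

/* Hypothetical stand-in for the two relevant members of 'struct pci_bus'. */
struct bus {
	const struct ops *ops;        /* may be overridden temporarily */
	const struct ops *backup_ops; /* original accessors, kept unchanged */
};

/*
 * Choose the accessors for a newly allocated child bus, following the same
 * order as the patched code: host bridge child_ops first, then the parent's
 * backup_ops, and only then the parent's current ops.
 */
static void init_child(struct bus *child, const struct bus *parent,
		       const struct ops *host_child_ops)
{
	if (host_child_ops)
		child->ops = host_child_ops;
	else if (parent->backup_ops)
		child->ops = parent->backup_ops;
	else
		child->ops = parent->ops;

	child->backup_ops = child->ops;
}
```

If the parent's 'ops' has been swapped for an injected set, a child created during rescan still picks up the original accessors through 'backup_ops', so nothing is left pointing into the module after it is unloaded.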

Xiongfeng Wang

4:12 p.m.

New subject: [PATCH OLK-6.6 07/10] PCI/AER: increments pci bus reference count in aer-inject process

hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I8QLHB CVE: NA ------------------------------------------------------------------------- When I test 'aer-inject' with the following procedures: 1. inject a fatal error into a PCI device 2. remove the parent device by sysfs 3. execute command 'rmmod aer-inject' I came across the following use-after-free. [ 297.581524] ================================================================== [ 297.581543] BUG: KASAN: use-after-free in pci_bus_set_ops+0xb4/0xb8 [ 297.581545] Read of size 8 at addr ffff802edbde80e0 by task rmmod/21839 [ 297.581552] CPU: 119 PID: 21839 Comm: rmmod Kdump: loaded Not tainted 4.19.36 #1 [ 297.581554] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.05 09/18/2019 [ 297.581556] Call trace: [ 297.581561] dump_backtrace+0x0/0x360 [ 297.581563] show_stack+0x24/0x30 [ 297.581569] dump_stack+0xd8/0x104 [ 297.581576] print_address_description+0x68/0x278 [ 297.581578] kasan_report+0x204/0x330 [ 297.581580] __asan_report_load8_noabort+0x30/0x40 [ 297.581582] pci_bus_set_ops+0xb4/0xb8 [ 297.581591] aer_inject_exit+0x198/0x334 [aer_inject] [ 297.581595] __arm64_sys_delete_module+0x310/0x490 [ 297.581601] el0_svc_common+0xfc/0x278 [ 297.581603] el0_svc_handler+0x50/0xc0 [ 297.581605] el0_svc+0x8/0xc [ 297.581608] Allocated by task 1: [ 297.581611] kasan_kmalloc+0xe0/0x190 [ 297.581614] kmem_cache_alloc_trace+0x104/0x218 [ 297.581616] pci_alloc_bus+0x50/0x2e0 [ 297.581618] pci_add_new_bus+0xa8/0xe08 [ 297.581620] pci_scan_bridge_extend+0x884/0xb28 [ 297.581623] pci_scan_child_bus_extend+0x350/0x628 [ 297.581625] pci_scan_child_bus+0x24/0x30 [ 297.581627] pci_scan_bridge_extend+0x3b8/0xb28 [ 297.581629] pci_scan_child_bus_extend+0x350/0x628 [ 297.581631] pci_scan_child_bus+0x24/0x30 [ 297.581635] acpi_pci_root_create+0x558/0x888 [ 297.581640] pci_acpi_scan_root+0x198/0x330 [ 297.581641] acpi_pci_root_add+0x7bc/0xbb0 [ 297.581646] acpi_bus_attach+0x2f4/0x728 [ 297.581647] 
acpi_bus_attach+0x1b0/0x728 [ 297.581649] acpi_bus_attach+0x1b0/0x728 [ 297.581651] acpi_bus_scan+0xa0/0x110 [ 297.581657] acpi_scan_init+0x20c/0x500 [ 297.581659] acpi_init+0x54c/0x5d4 [ 297.581661] do_one_initcall+0xbc/0x480 [ 297.581665] kernel_init_freeable+0x5fc/0x6ac [ 297.581670] kernel_init+0x18/0x128 [ 297.581671] ret_from_fork+0x10/0x18 [ 297.581673] Freed by task 19270: [ 297.581675] __kasan_slab_free+0x120/0x228 [ 297.581677] kasan_slab_free+0x10/0x18 [ 297.581678] kfree+0x80/0x1f8 [ 297.581680] release_pcibus_dev+0x54/0x68 [ 297.581686] device_release+0xd4/0x1c0 [ 297.581689] kobject_put+0x12c/0x400 [ 297.581691] device_unregister+0x30/0xc0 [ 297.581693] pci_remove_bus+0xe8/0x1c0 [ 297.581695] pci_remove_bus_device+0xd0/0x2f0 [ 297.581697] pci_stop_and_remove_bus_device_locked+0x2c/0x40 [ 297.581701] remove_store+0x1b8/0x1d0 [ 297.581703] dev_attr_store+0x60/0x80 [ 297.581708] sysfs_kf_write+0x104/0x170 [ 297.581710] kernfs_fop_write+0x23c/0x430 [ 297.581713] __vfs_write+0xec/0x4e0 [ 297.581714] vfs_write+0x12c/0x3d0 [ 297.581715] ksys_write+0xd0/0x190 [ 297.581716] __arm64_sys_write+0x70/0xa0 [ 297.581718] el0_svc_common+0xfc/0x278 [ 297.581720] el0_svc_handler+0x50/0xc0 [ 297.581721] el0_svc+0x8/0xc [ 297.581724] The buggy address belongs to the object at ffff802edbde8000 which belongs to the cache kmalloc-2048 of size 2048 [ 297.581726] The buggy address is located 224 bytes inside of 2048-byte region [ffff802edbde8000, ffff802edbde8800) [ 297.581727] The buggy address belongs to the page: [ 297.581730] page:ffff7e00bb6f7a00 count:1 mapcount:0 mapping:ffff8026de810780 index:0x0 compound_mapcount: 0 [ 297.591520] flags: 0x2ffffe0000008100(slab|head) [ 297.596121] raw: 2ffffe0000008100 ffff7e00bb6f5008 ffff7e00bb6ff608 ffff8026de810780 [ 297.596123] raw: 0000000000000000 00000000000f000f 00000001ffffffff 0000000000000000 [ 297.596124] page dumped because: kasan: bad access detected [ 297.596126] Memory state around the buggy address: [ 297.596128] 
ffff802edbde7f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 297.596129] ffff802edbde8000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 297.596131] >ffff802edbde8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 297.596132] ^
[ 297.596133] ffff802edbde8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 297.596135] ffff802edbde8180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 297.596135] ==================================================================

This happens because the 'pci_bus' has already been freed when we unload the module and restore the 'pci_ops' member of 'pci_bus'. This patch increments the reference count of 'pci_bus' when we modify its member 'pci_ops' and decrements the reference count after we have restored that member.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
---
 drivers/pci/bus.c             | 2 ++
 drivers/pci/pcie/aer_inject.c | 9 +++++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 9c2137dae429..8232e2adc82f 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -441,9 +441,11 @@ struct pci_bus *pci_bus_get(struct pci_bus *bus)
 		get_device(&bus->dev);
 	return bus;
 }
+EXPORT_SYMBOL(pci_bus_get);

 void pci_bus_put(struct pci_bus *bus)
 {
 	if (bus)
 		put_device(&bus->dev);
 }
+EXPORT_SYMBOL(pci_bus_put);
diff --git a/drivers/pci/pcie/aer_inject.c b/drivers/pci/pcie/aer_inject.c
index 2dab275d252f..e3c47e4a7325 100644
--- a/drivers/pci/pcie/aer_inject.c
+++ b/drivers/pci/pcie/aer_inject.c
@@ -26,6 +26,7 @@
 #include <linux/device.h>

 #include "portdrv.h"
+#include "../pci.h"

 /* Override the existing corrected and uncorrected error masks */
 static bool aer_mask_override;
@@ -307,6 +308,13 @@ static int pci_bus_set_aer_ops(struct pci_bus *bus)
 	spin_lock_irqsave(&inject_lock, flags);
 	if (ops == &aer_inj_pci_ops)
 		goto out;
+	/*
+	 * increments the reference count of the pci bus. Otherwise, when we
+	 * restore the 'pci_ops' in 'aer_inject_exit', the 'pci_bus' may have
+	 * been freed.
+	 */
+	pci_bus_get(bus);
+
 	pci_bus_ops_init(bus_ops, bus, ops);
 	list_add(&bus_ops->list, &pci_bus_ops_list);
 	bus_ops = NULL;
@@ -530,6 +538,7 @@ static void __exit aer_inject_exit(void)

 	while ((bus_ops = pci_bus_ops_pop())) {
 		pci_bus_set_ops(bus_ops->bus, bus_ops->ops);
+		pci_bus_put(bus_ops->bus);
 		kfree(bus_ops);
 	}

--
2.20.1
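The effect of the added get/put pair can be shown with a toy reference-count model. This is a sketch under simplifying assumptions: 'struct bus', bus_get() and bus_put() are hypothetical stand-ins for 'struct pci_bus', pci_bus_get() and pci_bus_put(), and "freeing" is modelled by a flag rather than by releasing memory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted 'struct pci_bus'. */
struct bus {
	int refcount;
	bool freed;   /* set when the last reference is dropped */
};

/* Take a reference; mirrors the NULL-tolerant shape of pci_bus_get(). */
static struct bus *bus_get(struct bus *b)
{
	if (b)
		b->refcount++;
	return b;
}

/* Drop a reference and mark the object freed when none remain. */
static void bus_put(struct bus *b)
{
	if (b && --b->refcount == 0)
		b->freed = true;
}
```

In this model the injector calls bus_get() when it overrides the ops. A later sysfs removal drops the bus's own reference, but the object survives until the injector restores the ops and calls bus_put() on exit.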

Xiongfeng Wang

4:12 p.m.

New subject: [PATCH OLK-6.6 08/10] ntp: Avoid undefined behaviour in second_overflow()

hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8QLHB CVE: NA ---------------------------------------- When I ran Syzkaller testsuite, I got the following call trace. Reviewed-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> ================================================================================ UBSAN: Undefined behaviour in kernel/time/ntp.c:457:16 signed integer overflow: 9223372036854775807 + 500 cannot be represented in type 'long int' CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.25-dirty #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xca/0x13e lib/dump_stack.c:113 ubsan_epilogue+0xe/0x81 lib/ubsan.c:159 handle_overflow+0x193/0x1e2 lib/ubsan.c:190 second_overflow+0x403/0x540 kernel/time/ntp.c:457 accumulate_nsecs_to_secs kernel/time/timekeeping.c:2002 [inline] logarithmic_accumulation kernel/time/timekeeping.c:2046 [inline] timekeeping_advance+0x2bb/0xec0 kernel/time/timekeeping.c:2114 tick_do_update_jiffies64.part.2+0x1a0/0x350 kernel/time/tick-sched.c:97 tick_do_update_jiffies64 kernel/time/tick-sched.c:1229 [inline] tick_nohz_update_jiffies kernel/time/tick-sched.c:499 [inline] tick_nohz_irq_enter kernel/time/tick-sched.c:1232 [inline] tick_irq_enter+0x1fd/0x240 kernel/time/tick-sched.c:1249 irq_enter+0xc4/0x100 kernel/softirq.c:353 entering_irq arch/x86/include/asm/apic.h:517 [inline] entering_ack_irq arch/x86/include/asm/apic.h:523 [inline] smp_apic_timer_interrupt+0x20/0x480 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864 </IRQ> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58 Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 c0 21 b1 83 e8 a1 0a 02 ff e9 ab fd ff ff 4c 89 e7 e8 77 b6 a5 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 
0018:ffff888106307d20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881062e4f1c
RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848c78a0
R13: 0000000000000003 R14: 1ffff11020c60fae R15: 0000000000000000
 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
 default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
 cpuidle_idle_call kernel/sched/idle.c:153 [inline]
 do_idle+0x2ca/0x420 kernel/sched/idle.c:262
 cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
 start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
================================================================================

This happens because time_maxerror was set to 0x7FFFFFFFFFFFFFFF by the user. It overflows when 'MAXFREQ / NSEC_PER_USEC' is added to it in 'second_overflow()'. This patch adds a limit check and saturates the value when the user sets 'time_maxerror'.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
---
 kernel/time/ntp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 406dccb79c2b..5b1ea7d5478b 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -728,6 +728,8 @@ static inline void process_adjtimex_modes(const struct __kernel_timex *txc,

 	if (txc->modes & ADJ_MAXERROR)
 		time_maxerror = txc->maxerror;
+	if (time_maxerror > NTP_PHASE_LIMIT)
+		time_maxerror = NTP_PHASE_LIMIT;

 	if (txc->modes & ADJ_ESTERROR)
 		time_esterror = txc->esterror;
--
2.20.1
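To see why saturating at the point where the user sets the value is enough, here is a user-space sketch of the two steps. The constants are copied from include/linux/timex.h as I understand it (treat the exact values as an assumption), and set_maxerror() and tick_maxerror() are hypothetical names that model the adjtimex path and the per-second increment in second_overflow().

```c
/* Assumed to mirror include/linux/timex.h; values are not from the patch. */
#define NSEC_PER_USEC    1000L
#define MAXPHASE         500000000L
#define MAXFREQ          500000
#define NTP_PHASE_LIMIT  ((MAXPHASE / NSEC_PER_USEC) << 5)

/* Model of the patched adjtimex path: clamp the user-supplied value. */
static long long set_maxerror(long long requested)
{
	return requested > NTP_PHASE_LIMIT ? NTP_PHASE_LIMIT : requested;
}

/*
 * Model of the per-second update: add MAXFREQ / NSEC_PER_USEC (500) and
 * saturate. The addition cannot overflow once the input is already
 * bounded by NTP_PHASE_LIMIT.
 */
static long long tick_maxerror(long long maxerror)
{
	maxerror += MAXFREQ / NSEC_PER_USEC;
	if (maxerror > NTP_PHASE_LIMIT)
		maxerror = NTP_PHASE_LIMIT;
	return maxerror;
}
```

Without the clamp in set_maxerror(), passing 0x7FFFFFFFFFFFFFFF through tick_maxerror() would perform the signed addition that UBSAN reports above.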

Xiongfeng Wang

4:12 p.m.

New subject: [PATCH OLK-6.6 09/10] hinic: ethtool: Allow userspace to set more aggregation params

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8QLHB
CVE: NA

--------------------------------

Allow userspace to set the following interrupt aggregation params by ethtool:
  Adaptive RX: on/off
  rx/tx-usecs
  rx/tx-usecs-low/high
  rx/tx-frames
  rx/tx-frames-low/high

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>

Conflicts:
	drivers/net/ethernet/huawei/hinic/hinic_ethtool.c

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
---
 drivers/net/ethernet/huawei/hinic/hinic_ethtool.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c b/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
index f4b680286911..5da76f56460c 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
@@ -1823,11 +1823,11 @@ static const struct ethtool_ops hinic_ethtool_ops = {
 };

 static const struct ethtool_ops hinicvf_ethtool_ops = {
-	.supported_coalesce_params = ETHTOOL_COALESCE_RX_USECS |
-				     ETHTOOL_COALESCE_RX_MAX_FRAMES |
-				     ETHTOOL_COALESCE_TX_USECS |
-				     ETHTOOL_COALESCE_TX_MAX_FRAMES,
-
+	.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
+				     ETHTOOL_COALESCE_MAX_FRAMES |
+				     ETHTOOL_COALESCE_USECS_LOW_HIGH |
+				     ETHTOOL_COALESCE_MAX_FRAMES_LOW_HIGH |
+				     ETHTOOL_COALESCE_PKT_RATE_RX_USECS,
 	.get_link_ksettings = hinic_get_link_ksettings,
 	.get_drvinfo = hinic_get_drvinfo,
 	.get_link = ethtool_op_get_link,
--
2.20.1

Xiongfeng Wang

4:12 p.m.

New subject: [PATCH OLK-6.6 10/10] PCI/sysfs: Take reference on device to be removed

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8QLHB
CVE: NA

------------------------------------

When I ran some aer-inject and sysfs remove stress tests, I got the
following use-after-free call trace:

==================================================================
BUG: KASAN: use-after-free in pci_stop_bus_device+0x174/0x178
Read of size 8 at addr fffffc3e2e402218 by task bash/26311

CPU: 38 PID: 26311 Comm: bash Tainted: G W 4.19.105+ #82
Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B161.01 06/10/2021
Call trace:
 dump_backtrace+0x0/0x360
 show_stack+0x24/0x30
 dump_stack+0x130/0x164
 print_address_description+0x68/0x278
 kasan_report+0x204/0x330
 __asan_report_load8_noabort+0x30/0x40
 pci_stop_bus_device+0x174/0x178
 pci_stop_and_remove_bus_device_locked+0x24/0x40
 remove_store+0x1c8/0x1e0
 dev_attr_store+0x60/0x80
 sysfs_kf_write+0x104/0x170
 kernfs_fop_write+0x23c/0x430
 __vfs_write+0xec/0x4e0
 vfs_write+0x12c/0x3d0
 ksys_write+0xe8/0x208
 __arm64_sys_write+0x70/0xa0
 el0_svc_common+0x10c/0x450
 el0_svc_handler+0x50/0xc0
 el0_svc+0x10/0x14

Allocated by task 684:
 kasan_kmalloc+0xe0/0x190
 kmem_cache_alloc_trace+0x110/0x240
 pci_alloc_dev+0x4c/0x110
 pci_scan_single_device+0x100/0x218
 pci_scan_slot+0x8c/0x2d8
 pci_scan_child_bus_extend+0x90/0x628
 pci_scan_child_bus+0x24/0x30
 pci_scan_bridge_extend+0x3b8/0xb28
 pci_scan_child_bus_extend+0x350/0x628
 pci_rescan_bus+0x24/0x48
 pcie_do_fatal_recovery+0x390/0x4b0
 handle_error_source+0x124/0x158
 aer_isr+0x5a0/0x800
 process_one_work+0x598/0x1250
 worker_thread+0x384/0xf08
 kthread+0x2a4/0x320
 ret_from_fork+0x10/0x18

Freed by task 685:
 __kasan_slab_free+0x120/0x228
 kasan_slab_free+0x10/0x18
 kfree+0x88/0x218
 pci_release_dev+0xb4/0xd8
 device_release+0x6c/0x1c0
 kobject_put+0x12c/0x400
 put_device+0x24/0x30
 pci_dev_put+0x24/0x30
 handle_error_source+0x12c/0x158
 aer_isr+0x5a0/0x800
 process_one_work+0x598/0x1250
 worker_thread+0x384/0xf08
 kthread+0x2a4/0x320
 ret_from_fork+0x10/0x18

The buggy address belongs to the object at fffffc3e2e402200
 which belongs to the cache kmalloc-4096 of size 4096
The buggy address is located 24 bytes inside of
 4096-byte region [fffffc3e2e402200, fffffc3e2e403200)
The buggy address belongs to the page:
page:ffff7ff0f8b90000 count:1 mapcount:0 mapping:ffffdc365f016e00 index:0x0 compound_mapcount: 0
flags: 0x6ffffe0000008100(slab|head)
raw: 6ffffe0000008100 ffff7f70d83aae00 0000000300000003 ffffdc365f016e00
raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 fffffc3e2e402100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 fffffc3e2e402180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

...

 fffffc3e2e402200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
 fffffc3e2e402280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 fffffc3e2e402300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

It is caused by the following race condition:

CPU0                                      CPU1
remove_store()                            aer_isr()
  device_remove_file_self()                 handle_error_source()
  pci_stop_and_remove_bus_device_locked       pcie_do_fatal_recovery()
    (blocked)                                   pci_lock_rescan_remove()   #CPU1 acquire the lock
                                                pci_stop_and_remove_bus_device()
                                                pci_unlock_rescan_remove() #CPU1 release the lock
    pci_lock_rescan_remove()  #CPU0 acquire the lock
                                              pci_dev_put()  #free pci_dev
    pci_stop_and_remove_bus_device()
      pci_stop_bus_device()   #use-after-free
    pci_unlock_rescan_remove()

An AER interrupt is triggered on CPU1, and CPU1 starts to process it.
The work item 'aer_isr()' is scheduled on CPU1. It calls into
pcie_do_fatal_recovery() and acquires the lock
'pci_rescan_remove_lock'. Before it removes the sysfs entries
corresponding to the faulty PCI device, a sysfs remove operation is
executed on CPU0. CPU0 uses device_remove_file_self() to remove the
sysfs directory and waits for the lock to be released. After CPU1
finishes pci_stop_and_remove_bus_device(), it releases the lock and
frees the 'pci_dev' in pci_dev_put(). CPU0 then acquires the lock and
accesses the 'pci_dev', which triggers the use-after-free.

To fix this issue, we increase the reference count in remove_store()
before removing the device and decrease it at the end.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

Conflicts:
	drivers/pci/pci-sysfs.c

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
---
 drivers/pci/pci-sysfs.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 3317b9354716..e3373cdc5244 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -483,12 +483,17 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
 			    const char *buf, size_t count)
 {
 	unsigned long val;
+	struct pci_dev *pdev = to_pci_dev(dev);
 
 	if (kstrtoul(buf, 0, &val) < 0)
 		return -EINVAL;
 
-	if (val && device_remove_file_self(dev, attr))
-		pci_stop_and_remove_bus_device_locked(to_pci_dev(dev));
+	if (val) {
+		pci_dev_get(pdev);
+		if (device_remove_file_self(dev, attr))
+			pci_stop_and_remove_bus_device_locked(pdev);
+		pci_dev_put(pdev);
+	}
 	return count;
 }
 static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0220, NULL,
-- 
2.20.1
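
The pinning pattern that the patch applies in remove_store() can be
illustrated with a small user-space sketch. This is not kernel code:
'struct toy_dev', the toy_* helpers and the 'toy_released' counter are
hypothetical names made up for illustration, the reference count is a
plain int rather than the kernel's atomic refcount, and the "race" is
replayed sequentially in a single thread. It only shows why taking an
extra reference before the racing path drops its own keeps the object
valid until the final put.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pci_dev: it carries only a refcount. */
struct toy_dev {
	int refcount;
};

/* Counts objects whose last reference was dropped (illustration only). */
int toy_released;

struct toy_dev *toy_dev_alloc(void)
{
	struct toy_dev *d = calloc(1, sizeof(*d));

	if (d)
		d->refcount = 1;	/* reference owned by the "bus" */
	return d;
}

struct toy_dev *toy_dev_get(struct toy_dev *d)
{
	if (d)
		d->refcount++;
	return d;
}

void toy_dev_put(struct toy_dev *d)
{
	if (d && --d->refcount == 0) {
		toy_released++;
		free(d);
	}
}

/* Models the recovery path on CPU1 dropping the bus reference. */
void toy_recovery_path(struct toy_dev *d)
{
	toy_dev_put(d);
}

/*
 * Models the patched remove_store(): pin the device, let the other
 * path run while we would be blocked on the lock, touch the device,
 * then unpin. Returns the refcount seen while the device is pinned.
 */
int toy_remove_store(struct toy_dev *d)
{
	int seen;

	toy_dev_get(d);		/* plays the role of pci_dev_get(pdev) */
	toy_recovery_path(d);	/* racing path drops its reference */
	seen = d->refcount;	/* safe: our reference keeps d alive */
	toy_dev_put(d);		/* plays the role of pci_dev_put(pdev) */
	return seen;
}
```

Without the toy_dev_get() call, toy_recovery_path() would drop the only
reference and free the object, and the following read of d->refcount
would be the user-space analogue of the access that KASAN reported in
pci_stop_bus_device().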
