- Kernel - mailweb.openeuler.org

[PATCH OLK-6.6 V2] pciehp: fix a race between pciehp and removing operations by sysfs
by Xiongfeng Wang 26 Jan '24

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8YPFU
CVE: NA

-------------------------------------------------

When I ran a stress test combining PCIe hotplug with removal operations
through sysfs, I got a hung task, and the following call traces were printed.

INFO: task irq/746-pciehp:41551 blocked for more than 120 seconds.
      Tainted: P W OE 4.19.25-
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
irq/746-pciehp D 0 41551 2 0x00000228
Call trace:
 __switch_to+0x94/0xe8
 __schedule+0x270/0x8b0
 schedule+0x2c/0x88
 schedule_preempt_disabled+0x14/0x20
 __mutex_lock.isra.1+0x1fc/0x540
 __mutex_lock_slowpath+0x24/0x30
 mutex_lock+0x80/0xa8
 pci_lock_rescan_remove+0x20/0x28
 pciehp_configure_device+0x30/0x140
 pciehp_handle_presence_or_link_change+0x35c/0x4b0
 pciehp_ist+0x1cc/0x1d0
 irq_thread_fn+0x30/0x80
 irq_thread+0x128/0x200
 kthread+0x134/0x138
 ret_from_fork+0x10/0x18
INFO: task bash:6424 blocked for more than 120 seconds.
      Tainted: P W OE 4.19.25-
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bash D 0 6424 2231 0x00000200
Call trace:
 __switch_to+0x94/0xe8
 __schedule+0x270/0x8b0
 schedule+0x2c/0x88
 schedule_timeout+0x224/0x448
 wait_for_common+0x198/0x2a0
 wait_for_completion+0x28/0x38
 kthread_stop+0x60/0x190
 __free_irq+0x1c0/0x348
 free_irq+0x40/0x88
 pcie_shutdown_notification+0x54/0x80
 pciehp_remove+0x30/0x50
 pcie_port_remove_service+0x3c/0x58
 device_release_driver_internal+0x1b4/0x250
 device_release_driver+0x28/0x38
 bus_remove_device+0xd4/0x160
 device_del+0x128/0x348
 device_unregister+0x24/0x78
 remove_iter+0x48/0x58
 device_for_each_child+0x6c/0xb8
 pcie_port_device_remove+0x2c/0x48
 pcie_portdrv_remove+0x5c/0x68
 pci_device_remove+0x48/0xd8
 device_release_driver_internal+0x1b4/0x250
 device_release_driver+0x28/0x38
 pci_stop_bus_device+0x84/0xb8
 pci_stop_and_remove_bus_device_locked+0x24/0x40
 remove_store+0xa4/0xb8
 dev_attr_store+0x44/0x60
 sysfs_kf_write+0x58/0x80
 kernfs_fop_write+0xe8/0x1f0
 __vfs_write+0x60/0x190
 vfs_write+0xac/0x1c0
 ksys_write+0x6c/0xd8
 __arm64_sys_write+0x24/0x30
 el0_svc_common+0xa0/0x180
 el0_svc_handler+0x38/0x78
 el0_svc+0x8/0xc

When we remove a slot through sysfs, 'pci_stop_and_remove_bus_device_locked()'
is called. This function takes the global mutex 'pci_rescan_remove_lock' and
removes the slot. If the irq thread 'pciehp_ist' is still running, we wait
until it exits.

If a pciehp interrupt happens immediately after we remove the slot through
sysfs, but before we free the pciehp irq in
'pci_stop_and_remove_bus_device_locked()', 'pciehp_ist' will hang because the
global mutex 'pci_rescan_remove_lock' is held by the sysfs operation. But the
sysfs operation is waiting for the pciehp irq thread 'pciehp_ist' to end. Then
a hung task occurs.

So these two kinds of operation, removing through the attention button and
removing through /sys/devices/pci***/remove, should not be executed at the
same time. This patch adds a variable to mark that one of these operations is
in progress. When this variable is set, another requested operation is
rejected.

With a single global variable 'slot_being_removed_rescanned' marking whether a
slot is being removed or rescanned, a hotplug operation on one slot is delayed
whenever another slot is being removed or rescanned. But if the two slots are
under different root ports, they should not influence each other. This patch
makes the flag 'slot_being_removed_rescanned' per root port so that a hotplug
operation on one slot doesn't influence slots below another root port.

We record the root port in struct pci_dev when the pci device is initialized
and added into the system, instead of using 'pcie_find_root_port()' to find
the root port when we need it, because iterating the pci tree needs the
protection of 'pci_lock_rescan_remove()'. That would make the problem more
complex because the lock is very coarse-grained. We don't need to worry about
'use-after-free' because child pci devices are always removed before the root
port device is removed.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Reviewed-by: Hanjun Guo <guohanjun(a)huawei.com>

Conflicts:
	drivers/pci/hotplug/pciehp_ctrl.c
	drivers/pci/hotplug/pciehp_hpc.c
	include/linux/pci.h

Signed-off-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
---
 drivers/pci/hotplug/pciehp.h      |  5 ++
 drivers/pci/hotplug/pciehp_ctrl.c | 40 ++++++++++++++++
 drivers/pci/hotplug/pciehp_hpc.c  | 79 ++++++++++++++++++++++++++-----
 drivers/pci/pci-sysfs.c           | 22 +++++++++
 drivers/pci/probe.c               |  5 ++
 include/linux/pci.h               |  8 ++++
 include/linux/workqueue.h         |  2 +
 7 files changed, 150 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index e0a614acee05..dad48d151ce2 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -199,6 +199,11 @@ static inline const char *slot_name(struct controller *ctrl)
 	return hotplug_slot_name(&ctrl->hotplug_slot);
 }
 
+static inline struct pci_dev *ctrl_dev(struct controller *ctrl)
+{
+	return ctrl->pcie->port;
+}
+
 static inline struct controller *to_ctrl(struct hotplug_slot *hotplug_slot)
 {
 	return container_of(hotplug_slot, struct controller, hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index dcdbfcf404dd..458ab6f6379c 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -143,6 +143,8 @@ void pciehp_queue_pushbutton_work(struct work_struct *work)
 {
 	struct controller *ctrl = container_of(work, struct controller,
 					       button_work.work);
+	int events = ctrl->button_work.data;
+	struct pci_dev *rpdev = ctrl_dev(ctrl)->rpdev;
 
 	mutex_lock(&ctrl->state_lock);
 	switch (ctrl->state) {
@@ -153,6 +155,15 @@ void pciehp_queue_pushbutton_work(struct work_struct *work)
 		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
 		break;
 	default:
+		if (events) {
+			atomic_or(events, &ctrl->pending_events);
+			if (!pciehp_poll_mode)
+				irq_wake_thread(ctrl->pcie->irq, ctrl);
+		} else {
+			if (rpdev)
+				clear_bit(0,
+					  &rpdev->slot_being_removed_rescanned);
+		}
 		break;
 	}
 	mutex_unlock(&ctrl->state_lock);
@@ -160,6 +171,8 @@ void pciehp_queue_pushbutton_work(struct work_struct *work)
 
 void pciehp_handle_button_press(struct controller *ctrl)
 {
+	struct pci_dev *rpdev = ctrl_dev(ctrl)->rpdev;
+
 	mutex_lock(&ctrl->state_lock);
 	switch (ctrl->state) {
 	case OFF_STATE:
@@ -176,6 +189,7 @@ void pciehp_handle_button_press(struct controller *ctrl)
 		/* blink power indicator and turn off attention */
 		pciehp_set_indicators(ctrl, PCI_EXP_SLTCTL_PWR_IND_BLINK,
 				      PCI_EXP_SLTCTL_ATTN_IND_OFF);
+		ctrl->button_work.data = 0;
 		schedule_delayed_work(&ctrl->button_work, 5 * HZ);
 		break;
 	case BLINKINGOFF_STATE:
@@ -199,10 +213,14 @@ void pciehp_handle_button_press(struct controller *ctrl)
 			ctrl_info(ctrl, "Slot(%s): Button press: canceling request to power on\n",
 				  slot_name(ctrl));
 		}
+		if (rpdev)
+			clear_bit(0, &rpdev->slot_being_removed_rescanned);
 		break;
 	default:
 		ctrl_err(ctrl, "Slot(%s): Button press: ignoring invalid state %#x\n",
 			 slot_name(ctrl), ctrl->state);
+		if (rpdev)
+			clear_bit(0, &rpdev->slot_being_removed_rescanned);
 		break;
 	}
 	mutex_unlock(&ctrl->state_lock);
@@ -210,6 +228,8 @@ void pciehp_handle_button_press(struct controller *ctrl)
 
 void pciehp_handle_disable_request(struct controller *ctrl)
 {
+	struct pci_dev *rpdev = ctrl_dev(ctrl)->rpdev;
+
 	mutex_lock(&ctrl->state_lock);
 	switch (ctrl->state) {
 	case BLINKINGON_STATE:
@@ -221,11 +241,14 @@ void pciehp_handle_disable_request(struct controller *ctrl)
 	mutex_unlock(&ctrl->state_lock);
 
 	ctrl->request_result = pciehp_disable_slot(ctrl, SAFE_REMOVAL);
+	if (rpdev)
+		clear_bit(0, &rpdev->slot_being_removed_rescanned);
 }
 
 void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 {
 	int present, link_active;
+	struct pci_dev *rpdev = ctrl_dev(ctrl)->rpdev;
 
 	/*
 	 * If the slot is on and presence or link has changed, turn it off.
@@ -266,6 +289,8 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 				  slot_name(ctrl));
 		}
 		mutex_unlock(&ctrl->state_lock);
+		if (rpdev)
+			clear_bit(0, &rpdev->slot_being_removed_rescanned);
 		return;
 	}
 
@@ -288,6 +313,8 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 		mutex_unlock(&ctrl->state_lock);
 		break;
 	}
+	if (rpdev)
+		clear_bit(0, &rpdev->slot_being_removed_rescanned);
 }
 
 static int __pciehp_enable_slot(struct controller *ctrl)
@@ -408,6 +435,14 @@ int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot)
 int pciehp_sysfs_disable_slot(struct hotplug_slot *hotplug_slot)
 {
 	struct controller *ctrl = to_ctrl(hotplug_slot);
+	struct pci_dev *rpdev = ctrl_dev(ctrl)->rpdev;
+
+	if (rpdev && test_and_set_bit(0,
+				&rpdev->slot_being_removed_rescanned)) {
+		ctrl_info(ctrl, "Slot(%s): Slot is being removed or rescanned, please try later!\n",
+			  slot_name(ctrl));
+		return -EINVAL;
+	}
 
 	mutex_lock(&ctrl->state_lock);
 	switch (ctrl->state) {
@@ -418,6 +453,8 @@ int pciehp_sysfs_disable_slot(struct hotplug_slot *hotplug_slot)
 		wait_event(ctrl->requester,
 			   !atomic_read(&ctrl->pending_events) &&
 			   !ctrl->ist_running);
+		if (rpdev)
+			clear_bit(0, &rpdev->slot_being_removed_rescanned);
 		return ctrl->request_result;
 	case POWEROFF_STATE:
 		ctrl_info(ctrl, "Slot(%s): Already in powering off state\n",
@@ -436,5 +473,8 @@ int pciehp_sysfs_disable_slot(struct hotplug_slot *hotplug_slot)
 	}
 	mutex_unlock(&ctrl->state_lock);
 
+	if (rpdev)
+		clear_bit(0, &rpdev->slot_being_removed_rescanned);
+
 	return -ENODEV;
 }
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index fd713abdfb9f..a0fdcd4dc358 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -45,11 +45,6 @@ static const struct dmi_system_id inband_presence_disabled_dmi_table[] = {
 	{}
 };
 
-static inline struct pci_dev *ctrl_dev(struct controller *ctrl)
-{
-	return ctrl->pcie->port;
-}
-
 static irqreturn_t pciehp_isr(int irq, void *dev_id);
 static irqreturn_t pciehp_ist(int irq, void *dev_id);
 static int pciehp_poll(void *data);
@@ -694,6 +689,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 {
 	struct controller *ctrl = (struct controller *)dev_id;
 	struct pci_dev *pdev = ctrl_dev(ctrl);
+	struct pci_dev *rpdev = pdev->rpdev;
 	irqreturn_t ret;
 	u32 events;
 
@@ -716,8 +712,20 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 	}
 
 	/* Check Attention Button Pressed */
-	if (events & PCI_EXP_SLTSTA_ABP)
-		pciehp_handle_button_press(ctrl);
+	if (events & PCI_EXP_SLTSTA_ABP) {
+		if (!rpdev || (rpdev && !test_and_set_bit(0,
+				&rpdev->slot_being_removed_rescanned)))
+			pciehp_handle_button_press(ctrl);
+		else {
+			if (ctrl->state == BLINKINGOFF_STATE ||
+			    ctrl->state == BLINKINGON_STATE)
+				pciehp_handle_button_press(ctrl);
+			else
+				ctrl_info(ctrl, "Slot(%s): Slot operation failed because a remove or"
+					  " rescan operation is under processing, please try later!\n",
+					  slot_name(ctrl));
+		}
+	}
 
 	/* Check Power Fault Detected */
 	if (events & PCI_EXP_SLTSTA_PFD) {
@@ -741,10 +749,59 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 	 * or Data Link Layer State Changed events.
 	 */
 	down_read_nested(&ctrl->reset_lock, ctrl->depth);
-	if (events & DISABLE_SLOT)
-		pciehp_handle_disable_request(ctrl);
-	else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
-		pciehp_handle_presence_or_link_change(ctrl, events);
+	if (events & DISABLE_SLOT) {
+		if (!rpdev || (rpdev && !test_and_set_bit(0,
+				&rpdev->slot_being_removed_rescanned)))
+			pciehp_handle_disable_request(ctrl);
+		else {
+			if (ctrl->state == BLINKINGOFF_STATE ||
+			    ctrl->state == BLINKINGON_STATE)
+				pciehp_handle_disable_request(ctrl);
+			else {
+				ctrl_info(ctrl, "Slot(%s): DISABLE_SLOT event in remove or rescan process!\n",
+					  slot_name(ctrl));
+				/*
+				 * we use the work_struct private data to store
+				 * the event type
+				 */
+				ctrl->button_work.data = DISABLE_SLOT;
+				/*
+				 * If 'work.timer' is pending, schedule the work will
+				 * cause BUG_ON().
+				 */
+				if (!timer_pending(&ctrl->button_work.timer))
+					schedule_delayed_work(&ctrl->button_work, 3 * HZ);
+				else
+					ctrl_info(ctrl, "Slot(%s): Didn't schedule delayed_work because timer is pending!\n",
+						  slot_name(ctrl));
+			}
+		}
+	} else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC)) {
+		if (!rpdev || (rpdev && !test_and_set_bit(0,
+				&rpdev->slot_being_removed_rescanned)))
+			pciehp_handle_presence_or_link_change(ctrl, events);
+		else {
+			if (ctrl->state == BLINKINGOFF_STATE ||
+			    ctrl->state == BLINKINGON_STATE)
+				pciehp_handle_presence_or_link_change(ctrl,
+								      events);
+			else {
+				/*
+				 * When we are removing or rescanning through
+				 * sysfs, suprise link down/up happens. So we
+				 * will handle this event 3 seconds later.
+				 */
+				ctrl_info(ctrl, "Slot(%s): Surprise link down/up in remove or rescan process!\n",
+					  slot_name(ctrl));
+				ctrl->button_work.data = events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC);
+				if (!timer_pending(&ctrl->button_work.timer))
+					schedule_delayed_work(&ctrl->button_work, 3 * HZ);
+				else
+					ctrl_info(ctrl, "Slot(%s): Didn't schedule delayed_work because timer is pending!\n",
+						  slot_name(ctrl));
+			}
+		}
+	}
 	up_read(&ctrl->reset_lock);
 
 	ret = IRQ_HANDLED;
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index e3373cdc5244..c7e948b4a5fb 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -484,16 +484,38 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
 {
 	unsigned long val;
 	struct pci_dev *pdev = to_pci_dev(dev);
+	struct pci_dev *rpdev = pdev->rpdev;
 
 	if (kstrtoul(buf, 0, &val) < 0)
 		return -EINVAL;
 
+	if (rpdev && test_and_set_bit(0,
+				&rpdev->slot_being_removed_rescanned)) {
+		pr_info("Slot is being removed or rescanned, please try later!\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * if 'dev' is root port itself, 'pci_stop_and_remove_bus_device()' may
+	 * free the 'rpdev', but we need to clear
+	 * 'rpdev->slot_being_removed_rescanned' in the end. So get 'rpdev' to
+	 * avoid possible 'use-after-free'.
+	 */
+	if (rpdev)
+		pci_dev_get(rpdev);
+
 	if (val) {
 		pci_dev_get(pdev);
 		if (device_remove_file_self(dev, attr))
 			pci_stop_and_remove_bus_device_locked(pdev);
 		pci_dev_put(pdev);
 	}
+
+	if (rpdev) {
+		clear_bit(0, &rpdev->slot_being_removed_rescanned);
+		pci_dev_put(rpdev);
+	}
+
 	return count;
 }
 static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0220, NULL,
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 1681a9f454f4..ec710d190e6e 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2584,6 +2584,11 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 	/* Set up MSI IRQ domain */
 	pci_set_msi_domain(dev);
 
+	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
+		dev->rpdev = dev;
+	else
+		dev->rpdev = pcie_find_root_port(dev);
+
 	/* Notifier could use PCI capabilities */
 	dev->match_driver = false;
 	ret = device_add(&dev->dev);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 9c8d2cddf465..131fa5ce64df 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -529,6 +529,13 @@ struct pci_dev {
 
 	/* These methods index pci_reset_fn_methods[] */
 	u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
+
+	/*
+	 * This flag is only set on root ports. When a slot below a root port
+	 * is being removed or rescanned, this flag is set.
+	 */
+	unsigned long slot_being_removed_rescanned;
+	struct pci_dev *rpdev; /* root port pci_dev */
 };
 
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
@@ -1082,6 +1089,7 @@ extern struct bus_type pci_bus_type;
 /* Do NOT directly access these two variables, unless you are arch-specific PCI
  * code, or PCI core code. */
 extern struct list_head pci_root_buses;	/* List of all known PCI buses */
+
 /* Some device drivers need know if PCI is initiated */
 int no_pci_devices(void);
 
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 24b1e5070f4d..571dd2ec4381 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -115,6 +115,8 @@ struct delayed_work {
 	/* target workqueue and CPU ->timer uses to queue ->work */
 	struct workqueue_struct *wq;
 	int cpu;
+	/* delayed_work private data, only used in pciehp now */
+	unsigned long data;
 };
 
 struct rcu_work {
-- 
2.20.1
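
The serialization idea in the patch above (one "busy" bit per root port, taken with a test-and-set and released with a clear) can be sketched in user space. This is only an illustration under stated assumptions: `struct root_port`, `slot_op_try_begin()` and `slot_op_end()` are hypothetical names, and C11 `atomic_flag` stands in for the kernel's `test_and_set_bit()` / `clear_bit()` on `slot_being_removed_rescanned`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-root-port state kept in struct pci_dev. */
struct root_port {
	atomic_flag busy;	/* plays the role of slot_being_removed_rescanned */
};

/*
 * Try to start a remove/rescan operation below 'rp'.
 * Returns true if the caller now owns the flag, false if another
 * operation is already in progress and the caller should back off.
 * A NULL root port means there is nothing to serialize against.
 */
static bool slot_op_try_begin(struct root_port *rp)
{
	if (!rp)
		return true;
	/* atomic_flag_test_and_set() returns the previous value of the flag. */
	return !atomic_flag_test_and_set(&rp->busy);
}

/* Finish an operation that was started with slot_op_try_begin(). */
static void slot_op_end(struct root_port *rp)
{
	if (rp)
		atomic_flag_clear(&rp->busy);
}
```

Because the flag lives in each root port, an operation below one root port does not block an operation below another. Every path that wins the flag must also release it, which is why the patch adds a `clear_bit()` to each exit path of the handlers.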

[PATCH OLK-5.10] netlink: fix potential sleeping issue in mqueue_flush_file
by Zhengchao Shao 26 Jan '24

mainline inclusion
from mainline-v6.8-rc1
commit 234ec0b6034b16869d45128b8cd2dc6ffe596f04
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8Z1LM

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

I analyze the potential sleeping issue of the following processes:

Thread A                                Thread B
...                                     netlink_create  //ref = 1
do_mq_notify                            ...
  sock = netlink_getsockbyfilp          ...     //ref = 2
  info->notify_sock = sock;             ...
...                                     netlink_sendmsg
...                                       skb = netlink_alloc_large_skb  //skb->head is vmalloced
...                                       netlink_unicast
...                                         sk = netlink_getsockbyportid //ref = 3
...                                         netlink_sendskb
...                                           __netlink_sendskb
...                                             skb_queue_tail //put skb to sk_receive_queue
...                                         sock_put //ref = 2
...                                     ...
...                                     netlink_release
...                                       deferred_put_nlk_sk //ref = 1
mqueue_flush_file
  spin_lock
  remove_notification
    netlink_sendskb
      sock_put  //ref = 0
        sk_free
          ...
          __sk_destruct
            netlink_sock_destruct
              skb_queue_purge  //get skb from sk_receive_queue
                ...
                __skb_queue_purge_reason
                  kfree_skb_reason
                    __kfree_skb
                    ...
                    skb_release_all
                      skb_release_head_state
                        netlink_skb_destructor
                          vfree(skb->head)  //sleeping while holding spinlock

In netlink_sendmsg, if the memory pointed to by skb->head is allocated by
vmalloc and the skb is put on sk_receive_queue without being freed, the
sleeping bug occurs when the mqueue executes flush. Use vfree_atomic instead
of vfree in netlink_skb_destructor to solve the issue.

Fixes: c05cdb1b864f ("netlink: allow large data transfers from user-space")
Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
Link: https://lore.kernel.org/r/20240122011807.2110357-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
---
 net/netlink/af_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 9737c3229c12..e3f34dccc97a 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -366,7 +366,7 @@ static void netlink_skb_destructor(struct sk_buff *skb)
 	if (is_vmalloc_addr(skb->head)) {
 		if (!skb->cloned ||
 		    !atomic_dec_return(&(skb_shinfo(skb)->dataref)))
-			vfree(skb->head);
+			vfree_atomic(skb->head);
 
 		skb->head = NULL;
 	}
-- 
2.34.1
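
The general pattern behind replacing a possibly sleeping free with an atomic-safe one is to only queue the buffer while in the restricted context and do the real free later. The following is a minimal single-threaded user-space sketch of that pattern, not the kernel's implementation of vfree_atomic: `struct deferred_queue`, `free_deferred()` and `drain_deferred()` are hypothetical names, and `free()` stands in for the blocking release.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical deferred-free queue; these names are not kernel APIs. */
struct deferred_node {
	struct deferred_node *next;
};

struct deferred_queue {
	struct deferred_node *head;
	size_t pending;
};

/*
 * Safe to call from a context that must not sleep: it only links the
 * buffer into a list, reusing the buffer's own first bytes as the node.
 * The buffer must be at least sizeof(struct deferred_node) bytes long.
 */
static void free_deferred(struct deferred_queue *q, void *buf)
{
	struct deferred_node *n = buf;

	n->next = q->head;
	q->head = n;
	q->pending++;
}

/*
 * Runs later, from a context where a blocking free is allowed.
 * Returns the number of buffers released.
 */
static size_t drain_deferred(struct deferred_queue *q)
{
	size_t freed = 0;

	while (q->head) {
		struct deferred_node *n = q->head;

		q->head = n->next;
		free(n);	/* stands in for the potentially sleeping free */
		freed++;
	}
	q->pending = 0;
	return freed;
}
```

A real implementation would also need the queue itself to be safe against concurrent producers, which this sketch leaves out for brevity.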

[PATCH openEuler-1.0-LTS] netlink: fix potential sleeping issue in mqueue_flush_file
by Zhengchao Shao 26 Jan '24

mainline inclusion
from mainline-v6.8-rc1
commit 234ec0b6034b16869d45128b8cd2dc6ffe596f04
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8Z1LM

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

I analyze the potential sleeping issue of the following processes:

Thread A                                Thread B
...                                     netlink_create  //ref = 1
do_mq_notify                            ...
  sock = netlink_getsockbyfilp          ...     //ref = 2
  info->notify_sock = sock;             ...
...                                     netlink_sendmsg
...                                       skb = netlink_alloc_large_skb  //skb->head is vmalloced
...                                       netlink_unicast
...                                         sk = netlink_getsockbyportid //ref = 3
...                                         netlink_sendskb
...                                           __netlink_sendskb
...                                             skb_queue_tail //put skb to sk_receive_queue
...                                         sock_put //ref = 2
...                                     ...
...                                     netlink_release
...                                       deferred_put_nlk_sk //ref = 1
mqueue_flush_file
  spin_lock
  remove_notification
    netlink_sendskb
      sock_put  //ref = 0
        sk_free
          ...
          __sk_destruct
            netlink_sock_destruct
              skb_queue_purge  //get skb from sk_receive_queue
                ...
                __skb_queue_purge_reason
                  kfree_skb_reason
                    __kfree_skb
                    ...
                    skb_release_all
                      skb_release_head_state
                        netlink_skb_destructor
                          vfree(skb->head)  //sleeping while holding spinlock

In netlink_sendmsg, if the memory pointed to by skb->head is allocated by
vmalloc and the skb is put on sk_receive_queue without being freed, the
sleeping bug occurs when the mqueue executes flush. Use vfree_atomic instead
of vfree in netlink_skb_destructor to solve the issue.

Fixes: c05cdb1b864f ("netlink: allow large data transfers from user-space")
Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
Link: https://lore.kernel.org/r/20240122011807.2110357-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
---
 net/netlink/af_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 6b5be7a426ac..fd0fa4c37a3b 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -374,7 +374,7 @@ static void netlink_skb_destructor(struct sk_buff *skb)
 	if (is_vmalloc_addr(skb->head)) {
 		if (!skb->cloned ||
 		    !atomic_dec_return(&(skb_shinfo(skb)->dataref)))
-			vfree(skb->head);
+			vfree_atomic(skb->head);
 
 		skb->head = NULL;
 	}
-- 
2.34.1

[PATCH OLK-6.6 V2] blk-mq: avoid housekeeping CPUs scheduling a worker on a non-housekeeping CPU
by Xiongfeng Wang 26 Jan '24

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8YB4M
CVE: N/A

---------------------------------

When NOHZ_FULL is enabled, such as in HPC situations, CPUs are divided into
housekeeping CPUs and non-housekeeping CPUs. Non-housekeeping CPUs are
NOHZ_FULL CPUs and are often monopolized by a userspace process, such as an
HPC application process. Any sort of interruption is not expected.

blk_mq_hctx_next_cpu() selects each CPU in 'hctx->cpumask' alternately to
schedule the work thread blk_mq_run_work_fn(). When 'hctx->cpumask' contains
both housekeeping and non-housekeeping CPUs, a housekeeping CPU that wants to
issue an I/O request may schedule a worker on a non-housekeeping CPU. This may
affect the performance of the userspace application running on
non-housekeeping CPUs.

So let's just schedule the worker thread on the current CPU when the current
CPU is a housekeeping CPU.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
---
 block/blk-mq.c                  | 11 ++++++++++-
 include/linux/sched/isolation.h |  3 +++
 kernel/sched/isolation.c        |  8 ++++++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6ab7f360ff2a..52cfbeb75355 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -24,6 +24,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/sched/topology.h>
 #include <linux/sched/signal.h>
+#include <linux/sched/isolation.h>
 #include <linux/delay.h>
 #include <linux/crash_dump.h>
 #include <linux/prefetch.h>
@@ -2214,9 +2215,17 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
  */
 void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 {
+	int work_cpu;
+
 	if (unlikely(blk_mq_hctx_stopped(hctx)))
 		return;
-	kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work,
+
+	if (enhanced_isolcpus && tick_nohz_full_enabled() &&
+	    housekeeping_cpu(raw_smp_processor_id(), HK_TYPE_WQ))
+		work_cpu = raw_smp_processor_id();
+	else
+		work_cpu = blk_mq_hctx_next_cpu(hctx);
+	kblockd_mod_delayed_work_on(work_cpu, &hctx->run_work,
 				    msecs_to_jiffies(msecs));
 }
 EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index fe1a46f30d24..3894e74e8dc5 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -19,6 +19,7 @@ enum hk_type {
 };
 
 #ifdef CONFIG_CPU_ISOLATION
+extern bool enhanced_isolcpus;
 DECLARE_STATIC_KEY_FALSE(housekeeping_overridden);
 extern int housekeeping_any_cpu(enum hk_type type);
 extern const struct cpumask *housekeeping_cpumask(enum hk_type type);
@@ -29,6 +30,8 @@ extern void __init housekeeping_init(void);
 
 #else
 
+#define enhanced_isolcpus 0
+
 static inline int housekeeping_any_cpu(enum hk_type type)
 {
 	return smp_processor_id();
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 373d42c707bc..3884c245faf5 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -239,3 +239,11 @@ static int __init housekeeping_isolcpus_setup(char *str)
 	return housekeeping_setup(str, flags);
 }
 __setup("isolcpus=", housekeeping_isolcpus_setup);
+
+bool enhanced_isolcpus;
+static int __init enhanced_isolcpus_setup(char *str)
+{
+	enhanced_isolcpus = true;
+	return 0;
+}
+__setup("enhanced_isolcpus", enhanced_isolcpus_setup);
-- 
2.20.1
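
The CPU choice made in blk_mq_delay_run_hw_queue() above can be modelled as a small pure function. This is a hedged user-space sketch only: `struct cpu_policy`, `is_housekeeping()` and `pick_work_cpu()` are hypothetical names, the 64-bit mask stands in for the kernel's housekeeping cpumask, and `round_robin_cpu` stands in for the value blk_mq_hctx_next_cpu() would return.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs; in the patch these come from kernel helpers. */
struct cpu_policy {
	bool enhanced_isolcpus;	/* "enhanced_isolcpus" given on the command line */
	bool nohz_full_enabled;	/* result of tick_nohz_full_enabled() */
	uint64_t housekeeping;	/* bit n set means CPU n is a housekeeping CPU */
};

/* True if 'cpu' is marked as a housekeeping CPU in the policy mask. */
static bool is_housekeeping(const struct cpu_policy *p, unsigned int cpu)
{
	return cpu < 64 && ((p->housekeeping >> cpu) & 1);
}

/*
 * Keep the work on the submitting CPU when it is a housekeeping CPU and
 * the feature is active; otherwise fall back to the round-robin choice.
 */
static unsigned int pick_work_cpu(const struct cpu_policy *p,
				  unsigned int current_cpu,
				  unsigned int round_robin_cpu)
{
	if (p->enhanced_isolcpus && p->nohz_full_enabled &&
	    is_housekeeping(p, current_cpu))
		return current_cpu;
	return round_robin_cpu;
}
```

Note that the fallback branch is unchanged from the old behaviour, so a non-housekeeping submitter can still land work on any CPU in the mask; the patch only stops housekeeping CPUs from pushing work onto isolated ones.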

[PATCH OLK-6.6] pci: Enable acs for QLogic HBA cards
by Zeng Heng 26 Jan '24

From: Xishi Qiu <qiuxishi(a)huawei.com>

euler inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8Z1IW
CVE: N/A

-------------------------------------------------

Add support of port isolation for QLogic HBA cards.

Signed-off-by: Xishi Qiu <qiuxishi(a)huawei.com>
Signed-off-by: Fang Ying <fangying1(a)huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: Hui Wang <john.wanghui(a)huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5(a)huawei.com>

Conflicts:
	drivers/pci/quirks.c

Signed-off-by: Xuefeng Wang <wxf.wang(a)hisilicon.com>
Reviewed-by: Yang Yingliang <yangyingliang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11(a)huawei.com>

Conflicts:
	drivers/pci/quirks.c

Reviewed-by: wangxiongfeng <wangxiongfeng2(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
Signed-off-by: Zeng Heng <zengheng4(a)huawei.com>
---
 drivers/pci/quirks.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index ae95d0950772..83a875b53111 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5050,6 +5050,8 @@ static const struct pci_dev_acs_enabled {
 	{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID, pci_quirk_intel_spt_pch_acs },
 	{ 0x19a2, 0x710, pci_quirk_mf_endpoint_acs }, /* Emulex BE3-R */
 	{ 0x10df, 0x720, pci_quirk_mf_endpoint_acs }, /* Emulex Skyhawk-R */
+	{ 0x1077, 0x2031, pci_quirk_mf_endpoint_acs}, /* QLogic QL2672 */
+	{ 0x1077, 0x2532, pci_quirk_mf_endpoint_acs},
 	/* Cavium ThunderX */
 	{ PCI_VENDOR_ID_CAVIUM, PCI_ANY_ID, pci_quirk_cavium_acs },
 	/* Cavium multi-function devices */
-- 
2.25.1

[PATCH openEuler-1.0-LTS 0/2] dhugetlb: skip unexpected migration
by Liu Shixin 26 Jan '24

Fix unexpected migration of pages from dynamic hugetlb pool.

Liu Shixin (2):
  dhugetlb: introduce page_belong_to_dynamic_hugetlb() function
  dhugetlb: skip unexpected migration

 include/linux/hugetlb.h |  5 +++++
 include/linux/migrate.h |  6 +++++-
 mm/compaction.c         |  3 +++
 mm/hugetlb.c            | 16 +++++++++++-----
 mm/mempolicy.c          | 10 ++++++++--
 mm/migrate.c            |  3 +++
 mm/page_isolation.c     |  3 ++-
 7 files changed, 37 insertions(+), 9 deletions(-)

-- 
2.25.1

[PATCH OLK-6.6] arm64: topology: Support PHYTIUM CPU
by Zeng Heng 26 Jan '24

From: Hanjun Guo <guohanjun(a)huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8Z0KI
CVE: NA

---------------------------

Add support for PHYTIUM topology detection. It would be better to use the
PPTT ACPI table to report the topology, but we can live with this for now.

Signed-off-by: Hanjun Guo <guohanjun(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>

Conflicts:
	drivers/base/arch_topology.c

Signed-off-by: Zeng Heng <zengheng4(a)huawei.com>
---
 arch/arm64/include/asm/cputype.h | 1 +
 drivers/base/arch_topology.c     | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 202588ad92e8..8629961ad39b 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -60,6 +60,7 @@
 #define ARM_CPU_IMP_FUJITSU		0x46
 #define ARM_CPU_IMP_HISI		0x48
 #define ARM_CPU_IMP_APPLE		0x61
+#define ARM_CPU_IMP_PHYTIUM		0x70
 #define ARM_CPU_IMP_AMPERE		0xC0
 
 #define ARM_CPU_PART_AEM_V8		0xD0F
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index b741b5ba82bd..0db635e94492 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -862,6 +862,14 @@ void store_cpu_topology(unsigned int cpuid)
 	cpuid_topo->core_id = cpuid;
 	cpuid_topo->package_id = cpu_to_node(cpuid);
 
+#if defined(CONFIG_ARM64)
+	if (read_cpuid_implementor() == ARM_CPU_IMP_PHYTIUM) {
+		cpuid_topo->thread_id = 0;
+		cpuid_topo->core_id = cpuid;
+		cpuid_topo->package_id = 0;
+	}
+#endif
+
 	pr_debug("CPU%u: package %d core %d thread %d\n",
 		 cpuid, cpuid_topo->package_id, cpuid_topo->core_id,
 		 cpuid_topo->thread_id);
-- 
2.25.1
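
The override above keys off the implementer field of the CPU ID register, which occupies bits [31:24] of MIDR_EL1. A minimal user-space sketch of that check follows. `struct cpu_topo`, `midr_implementor()` and `fill_topology()` are hypothetical names, the MIDR value is passed in rather than read from hardware, and the default `thread_id = -1` is an assumption about the surrounding kernel code that the hunk does not show.

```c
#include <stdint.h>

#define MIDR_IMPLEMENTOR_SHIFT	24
#define MIDR_IMPLEMENTOR_MASK	(0xffu << MIDR_IMPLEMENTOR_SHIFT)
#define ARM_CPU_IMP_PHYTIUM	0x70	/* value taken from the patch */

/* Hypothetical stand-in for the kernel's per-CPU topology record. */
struct cpu_topo {
	int thread_id;
	int core_id;
	int package_id;
};

/* Extract the implementer code from a MIDR_EL1 value (bits [31:24]). */
static unsigned int midr_implementor(uint32_t midr)
{
	return (midr & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT;
}

/*
 * Fill in the default topology, then apply the PHYTIUM override:
 * no SMT threads and a single package, whatever the NUMA node is.
 */
static void fill_topology(struct cpu_topo *t, uint32_t midr,
			  unsigned int cpuid, int node)
{
	t->thread_id = -1;	/* assumed default, not shown in the hunk */
	t->core_id = (int)cpuid;
	t->package_id = node;

	if (midr_implementor(midr) == ARM_CPU_IMP_PHYTIUM) {
		t->thread_id = 0;
		t->core_id = (int)cpuid;
		t->package_id = 0;
	}
}
```

The design consequence is that every PHYTIUM core reports package 0 even on multi-node systems, which is the limitation the commit message alludes to when it says a PPTT table would be the better source.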

[openeuler:OLK-6.6 691/2767] binfmt_elf32.c:undefined reference to `arch_elf_adjust_prot'
by kernel test robot 25 Jan '24

tree:   https://gitee.com/openeuler/kernel.git OLK-6.6
head:   6188d51075eac8cf7cf6909fec1e73fbc51aa29b
commit: 7de3ab4c3dd938fae3626a6344830b018eb7ba4f [691/2767] arm64: introduce binfmt_elf32.c
config: arm64-randconfig-004-20240125 (https://download.01.org/0day-ci/archive/20240125/202401252015.eYqQYSXZ-lkp@…)
compiler: aarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240125/202401252015.eYqQYSXZ-lkp@…)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401252015.eYqQYSXZ-lkp@intel.com/

All errors (new ones prefixed by >>):

>> aarch64-linux-ld: Unexpected GOT/PLT entries detected!
>> aarch64-linux-ld: Unexpected run-time procedure linkages detected!
   aarch64-linux-ld: arch/arm64/kernel/binfmt_elf32.o: in function `load_elf_interp':
>> binfmt_elf32.c:(.text+0xa70): undefined reference to `arch_elf_adjust_prot'
   aarch64-linux-ld: arch/arm64/kernel/binfmt_elf32.o: in function `load_elf_binary':
   binfmt_elf32.c:(.text+0x1d88): undefined reference to `arch_elf_adjust_prot'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

[openeuler:OLK-6.6 2629/2767] include/linux/resctrl.h:291: undefined reference to `lockdep_is_cpus_held'
by kernel test robot 25 Jan '24

tree:   https://gitee.com/openeuler/kernel.git OLK-6.6
head:   6188d51075eac8cf7cf6909fec1e73fbc51aa29b
commit: c054efea286659a6504fdeafc0a611c6a3b124cc [2629/2767] x86/resctrl: Claim get_domain_from_cpu() for resctrl
config: x86_64-randconfig-012-20240125 (https://download.01.org/0day-ci/archive/20240125/202401252038.VOiTn1TA-lkp@…)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240125/202401252038.VOiTn1TA-lkp@…)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401252038.VOiTn1TA-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: vmlinux.o: in function `resctrl_get_domain_from_cpu':
>> include/linux/resctrl.h:291: undefined reference to `lockdep_is_cpus_held'
>> ld: include/linux/resctrl.h:291: undefined reference to `lockdep_is_cpus_held'
>> ld: include/linux/resctrl.h:291: undefined reference to `lockdep_is_cpus_held'

vim +291 include/linux/resctrl.h

   274	
   275	/*
   276	 * Caller must be in a RCU read-side critical section, or hold the
   277	 * cpuhp read lock to prevent the struct rdt_domain being freed.
   278	 */
   279	static inline struct rdt_domain *
   280	resctrl_get_domain_from_cpu(int cpu, struct rdt_resource *r)
   281	{
   282		struct rdt_domain *d;
   283	
   284		/*
   285		 * Walking r->domains, ensure it can't race with cpuhp.
   286		 * Because this is called via IPI by rdt_ctrl_update(), assertions
   287		 * about locks this thread holds will lead to false positives. Check
   288		 * someone is holding the CPUs lock.
   289		 */
   290		if (IS_ENABLED(CONFIG_LOCKDEP))
 > 291			lockdep_is_cpus_held();
   292	
   293		list_for_each_entry_rcu(d, &r->domains, list) {
   294			/* Find the domain that contains this CPU */
   295			if (cpumask_test_cpu(cpu, &d->cpu_mask))
   296				return d;
   297		}
   298	
   299		return NULL;
   300	}
   301	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

[PATCH OLK-6.6 0/1] arm64: Turn on CONFIG_IPI_AS_NMI in openeuler_defconfig
by Liao Chen 25 Jan '24

This patch turns on CONFIG_IPI_AS_NMI in openeuler_defconfig, which follows
upstream commit 6188d51075ea (arm64: Add CONFIG_IPI_AS_NMI to IPI as NMI
feature).

CONFIG_IPI_AS_NMI=y

Liao Chen (1):
  arm64: Turn on CONFIG_IPI_AS_NMI in openeuler_defconfig

 arch/arm64/configs/openeuler_defconfig | 1 +
 1 file changed, 1 insertion(+)

-- 
2.34.1
