From: Ziming Du <duziming2@huawei.com>

Offering: HULK
hulk inclusion
category: bugfix
bugzilla: 190803

--------------------------------

During system shutdown, a deadlock may occur between the AER recovery
process and device shutdown as follows:

The device_shutdown path holds the device_lock throughout the entire
process and waits for the irq handlers to complete when releasing nodes:

  device_shutdown
    device_lock                        # A hold device_lock
    pci_device_shutdown
      pcie_port_device_remove
        remove_iter
          device_unregister
            device_del
              bus_remove_device
                device_release_driver
                  devres_release_all
                    release_nodes      # B wait for irq handlers

The aer_isr path will acquire device_lock in pci_bus_reset():

  aer_isr                              # B execute irq process
    aer_isr_one_error
      aer_process_err_devices
        handle_error_source
          pcie_do_recovery
            aer_root_reset
              pci_bus_error_reset
                pci_bus_reset          # A acquire device_lock

The circular dependency causes a system hang. Fix it by using
pci_bus_trylock() instead of pci_bus_lock() in pci_bus_reset(). When the
lock is unavailable, return -EAGAIN, as in similar cases.

Fixes: c4eed62a2143 ("PCI/ERR: Use slot reset if available")
Signed-off-by: Ziming Du <duziming2@huawei.com>
Signed-off-by: Zhang Hongtao <zhanghongtao35@huawei.com>
---
 drivers/pci/pci.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b93605616d4e4..d1a8531df0271 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5309,15 +5309,21 @@ static int pci_bus_reset(struct pci_bus *bus, int probe)
 	if (probe)
 		return 0;
 
-	pci_bus_lock(bus);
+	/*
+	 * Replace blocking lock with trylock to prevent deadlock during bus reset.
+	 * Same as above except return -EAGAIN if the bus cannot be locked.
+	 */
+	if (pci_bus_trylock(bus)) {
+		might_sleep();
 
-	might_sleep();
+		ret = pci_bridge_secondary_bus_reset(bus->self);
 
-	ret = pci_bridge_secondary_bus_reset(bus->self);
+		pci_bus_unlock(bus);
 
-	pci_bus_unlock(bus);
+		return ret;
+	}
 
-	return ret;
+	return -EAGAIN;
 }
 
 /**
-- 
2.43.0
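
For illustration only, the trylock pattern used by this patch can be sketched in userspace with pthreads. This is not kernel code: fake_device_lock, fake_secondary_bus_reset() and fake_bus_reset() are hypothetical names standing in for device_lock, pci_bridge_secondary_bus_reset() and pci_bus_reset(). The sketch assumes a default (non-recursive) mutex, so a failed trylock means the lock is held somewhere else.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the device lock held by the shutdown path. */
static pthread_mutex_t fake_device_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for pci_bridge_secondary_bus_reset(). */
static int fake_secondary_bus_reset(void)
{
	return 0;
}

/*
 * Userspace analogue of the patched pci_bus_reset(): never block on the
 * lock. If another path already holds it, give up and report -EAGAIN so
 * that the holder can finish instead of waiting on us.
 */
static int fake_bus_reset(void)
{
	int ret;

	if (pthread_mutex_trylock(&fake_device_lock) != 0)
		return -EAGAIN;	/* lock busy: do not wait, avoid the AB-BA cycle */

	ret = fake_secondary_bus_reset();

	pthread_mutex_unlock(&fake_device_lock);

	return ret;
}
```

With the lock free, fake_bus_reset() performs the reset and returns its result. With the lock already held, as the shutdown path does while it waits for irq handlers, it returns -EAGAIN immediately rather than blocking.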