From: xiabing xiabing12@h-partners.com
Arnd Bergmann (1):
  scsi: hisi_sas: Work around build failure in suspend function

Qi Liu (1):
  scsi: hisi_sas: Add slave_destroy interface for v3 hw

Xiang Chen (1):
  scsi: sd: try more retries of START_STOP when resuming scsi device

Yihang Li (2):
  scsi: hisi_sas: Block requests before take debugfs snapshot
  scsi: hisi_sas: Check usage count only when the runtime PM status is
    RPM_SUSPENDING

 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 31 +++++++++++++++++++++-----
 drivers/scsi/sd.c                      | 24 +++++++++++++++-----
 2 files changed, 43 insertions(+), 12 deletions(-)
From: Xiang Chen chenxiang66@hisilicon.com
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7BNF8
CVE: NA
----------------------------------------------------------------------
When a START_STOP command is sent to resume a scsi_device, it may be interrupted by exception handling such as a host reset or FLR. Once the START_STOP command fails, the runtime_status of the SCSI device is set to error, and it is difficult for the user to recover it. So retry the command up to three times to increase robustness, in the same way that sd_sync_cache() retries SYNCHRONIZE_CACHE when suspending a SCSI device.
Signed-off-by: Xiang Chen chenxiang66@hisilicon.com
Signed-off-by: xiabing xiabing12@h-partners.com
---
 drivers/scsi/sd.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b5bd2d3c0e87..7c8bfb26c995 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3663,6 +3663,7 @@ static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
 	struct scsi_sense_hdr sshdr;
+	int retries;
 	int ret = 0;

 	if (!sdkp)	/* E.g.: runtime suspend following sd_remove() */
@@ -3693,9 +3694,15 @@ static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
 	if (sdkp->device->manage_start_stop) {
 		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
 		/* an error is not worth aborting a system sleep */
-		ret = sd_start_stop_device(sdkp, 0);
-		if (ignore_stop_errors)
-			ret = 0;
+		for (retries = 3; retries > 0; --retries) {
+			ret = sd_start_stop_device(sdkp, 0);
+			if (!ret)
+				break;
+			if (ignore_stop_errors) {
+				ret = 0;
+				break;
+			}
+		}
 	}

 	return ret;
@@ -3714,6 +3721,7 @@ static int sd_suspend_runtime(struct device *dev)
 static int sd_resume(struct device *dev)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
+	int retries;
 	int ret;

 	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
@@ -3723,9 +3731,13 @@ static int sd_resume(struct device *dev)
 		return 0;

 	sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
-	ret = sd_start_stop_device(sdkp, 1);
-	if (!ret)
-		opal_unlock_from_suspend(sdkp->opal_dev);
+	for (retries = 3; retries > 0; --retries) {
+		ret = sd_start_stop_device(sdkp, 1);
+		if (!ret) {
+			opal_unlock_from_suspend(sdkp->opal_dev);
+			break;
+		}
+	}
 	return ret;
 }
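
For readers who want to try the bounded-retry pattern from this patch outside the kernel, here is a minimal userspace sketch. It is not code from sd.c: demo_retry(), demo_flaky_op() and the -5 error value are hypothetical stand-ins chosen for illustration. It only mirrors the shape of the loop (count down from a fixed number of tries, stop at the first success, return the last result).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation type: returns 0 on success, non-zero on failure. */
typedef int (*demo_op_fn)(void *ctx);

/*
 * Call op() up to max_tries times and stop at the first success.
 * Returns the result of the last attempt, like the loop in the patch.
 */
static int demo_retry(demo_op_fn op, void *ctx, int max_tries)
{
	int ret = -1;
	int tries;

	assert(op != NULL && max_tries > 0);

	for (tries = max_tries; tries > 0; --tries) {
		ret = op(ctx);
		if (!ret)
			break;
	}
	return ret;
}

/* Hypothetical flaky operation: fails until *remaining_failures reaches 0. */
static int demo_flaky_op(void *ctx)
{
	int *remaining_failures = ctx;

	if (*remaining_failures > 0) {
		--*remaining_failures;
		return -5;	/* illustrative stand-in for -EIO */
	}
	return 0;
}
```

With two transient failures and three tries the call succeeds on the last attempt; with three failures the last error is returned to the caller unchanged.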
From: Qi Liu liuqi115@huawei.com
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7BNF8
CVE: NA
----------------------------------------------------------------------
A WARNING is triggered when a link reset of a remote PHY runs at the same time as rmmod of the SAS driver. The WARNING log follows:
WARNING: CPU: 61 PID: 21818 at drivers/base/core.c:1347 __device_links_no_driver+0xb4/0xc0
Call trace:
 __device_links_no_driver+0xb4/0xc0
 device_links_driver_cleanup+0xb0/0xfc
 __device_release_driver+0x198/0x23c
 device_release_driver+0x38/0x50
 bus_remove_device+0x130/0x140
 device_del+0x184/0x434
 __scsi_remove_device+0x118/0x150
 scsi_remove_target+0x1bc/0x240
 sas_rphy_remove+0x90/0x94
 sas_rphy_delete+0x24/0x3c
 sas_destruct_devices+0x64/0xa0 [libsas]
 sas_revalidate_domain+0xe4/0x150 [libsas]
 process_one_work+0x1e0/0x46c
 worker_thread+0x15c/0x464
 kthread+0x160/0x170
 ret_from_fork+0x10/0x20
---[ end trace 71e059eb58f85d4a ]---
During SAS phy up, link->status is set to DL_STATE_AVAILABLE in device_links_driver_bound(). This state is then seen by __device_links_no_driver() before the driver is removed, which triggers the WARNING.

So add the slave_destroy interface to make sure the link is removed after the workqueue has been flushed.
Fixes: 16fd4a7c59170 ("scsi: hisi_sas: Add device link between SCSI devices and hisi_hba")
Signed-off-by: Qi Liu liuqi115@huawei.com
Signed-off-by: John Garry john.garry@huawei.com
Signed-off-by: xiabing xiabing12@h-partners.com
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 22c4ba835bda..8fcca9af507b 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2870,7 +2870,8 @@ static int slave_configure_v3_hw(struct scsi_device *sdev)
 		return 0;

 	if (!device_link_add(&sdev->sdev_gendev, dev,
-			     DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)) {
+			     DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME |
+			     DL_FLAG_RPM_ACTIVE)) {
 		if (pm_runtime_enabled(dev)) {
 			dev_info(dev, "add device link failed, disable runtime PM for the host\n");
 			pm_runtime_disable(dev);
@@ -2880,6 +2881,15 @@ static int slave_configure_v3_hw(struct scsi_device *sdev)
 	return 0;
 }

+static void slave_destroy_v3_hw(struct scsi_device *sdev)
+{
+	struct Scsi_Host *shost = dev_to_shost(&sdev->sdev_gendev);
+	struct hisi_hba *hisi_hba = shost_priv(shost);
+	struct device *dev = hisi_hba->dev;
+
+	device_link_remove(&sdev->sdev_gendev, dev);
+}
+
 static struct device_attribute *host_attrs_v3_hw[] = {
 	&dev_attr_phy_event_threshold,
 	&dev_attr_intr_conv_v3_hw,
@@ -3268,6 +3278,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.eh_device_reset_handler = sas_eh_device_reset_handler,
 	.eh_target_reset_handler = sas_eh_target_reset_handler,
 	.slave_alloc = hisi_sas_slave_alloc,
+	.slave_destroy = slave_destroy_v3_hw,
 	.target_destroy = sas_target_destroy,
 	.ioctl = sas_ioctl,
 #ifdef CONFIG_COMPAT
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7BNF8
CVE: NA
----------------------------------------------------------------------
When FIO is running and the debugfs dump is triggered repeatedly, some SATA I/Os fail to be returned to the upper layer because HISI_SAS_REJECT_CMD_BIT is set. The SCSI layer then invokes the error handling thread, but sas_ata_hard_reset() also fails because HISI_SAS_REJECT_CMD_BIT is set. As a result, the device is disabled. Call scsi_block_requests() and wait for outstanding commands to complete before setting HISI_SAS_REJECT_CMD_BIT to avoid SATA I/O failures.
Signed-off-by: Yihang Li liyihang9@huawei.com
Signed-off-by: xiabing xiabing12@h-partners.com
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 8fcca9af507b..c89b30a89735 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -3074,21 +3074,24 @@ static const struct hisi_sas_debugfs_reg debugfs_ras_reg = {

 static void debugfs_snapshot_prepare_v3_hw(struct hisi_hba *hisi_hba)
 {
-	set_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
-
-	hisi_sas_write32(hisi_hba, DLVRY_QUEUE_ENABLE, 0);
+	struct Scsi_Host *shost = hisi_hba->shost;

+	scsi_block_requests(shost);
 	wait_cmds_complete_timeout_v3_hw(hisi_hba, 100, 5000);
-
+	set_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
 	hisi_sas_sync_irqs(hisi_hba);
+	hisi_sas_write32(hisi_hba, DLVRY_QUEUE_ENABLE, 0);
 }

 static void debugfs_snapshot_restore_v3_hw(struct hisi_hba *hisi_hba)
 {
+	struct Scsi_Host *shost = hisi_hba->shost;
+
 	hisi_sas_write32(hisi_hba, DLVRY_QUEUE_ENABLE,
 			 (u32)((1ULL << hisi_hba->queue_count) - 1));

 	clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
+	scsi_unblock_requests(shost);
 }

 static void read_iost_itct_cache_v3_hw(struct hisi_hba *hisi_hba,
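
The ordering this patch relies on (block new requests, drain what is in flight, only then start rejecting commands) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not driver code: struct demo_host and every demo_* function are hypothetical, and "draining" is reduced to zeroing a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a host; none of these names come from the driver. */
struct demo_host {
	bool blocked;		/* upper layer holds back new requests */
	bool reject_cmds;	/* driver fails any command handed to it */
	int inflight;		/* commands already issued to the hardware */
	int failed;		/* commands failed because reject_cmds was set */
};

/*
 * Submit one command.
 * Returns 1 if issued, 0 if held back (to be retried later by the upper
 * layer), -1 if the driver rejected it.
 */
static int demo_submit(struct demo_host *h)
{
	assert(h != 0);

	if (h->blocked)
		return 0;
	if (h->reject_cmds) {
		h->failed++;
		return -1;
	}
	h->inflight++;
	return 1;
}

/* Stand-in for waiting until all issued commands have completed. */
static void demo_drain(struct demo_host *h)
{
	h->inflight = 0;
}

/* Block first, drain second, reject last: no command can observe the flag. */
static void demo_snapshot_prepare(struct demo_host *h)
{
	h->blocked = true;
	demo_drain(h);
	h->reject_cmds = true;
}

/* Undo in reverse order: stop rejecting before letting requests through. */
static void demo_snapshot_restore(struct demo_host *h)
{
	h->reject_cmds = false;
	h->blocked = false;
}
```

In this model a command submitted during the snapshot window is held back rather than failed, so the failed counter stays at zero across a prepare/restore cycle.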
From: Arnd Bergmann arnd@arndb.de
mainline inclusion
from mainline-v6.4-rc1
commit e01e2290f094
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7BNF8
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
The suspend/resume functions in this driver seem to have multiple problems, the latest one just got introduced by a bugfix:
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: In function '_suspend_v3_hw':
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:5142:39: error: 'struct dev_pm_info' has no member named 'usage_count'
 5142 |         if (atomic_read(&device->power.usage_count)) {
As far as I can tell, the 'usage_count' is not meant to be accessed by device drivers at all, though I don't know what the driver is supposed to do instead.
Another problem is the use of the deprecated UNIVERSAL_DEV_PM_OPS(), and marking functions as __maybe_unused to avoid warnings about unused functions. This should probably be changed to using DEFINE_RUNTIME_DEV_PM_OPS().
Both changes require actually understanding what the driver needs to do, and being able to test this, so instead here is the simplest patch to make it pass the randconfig builds.
Fixes: e368d38cb952 ("scsi: hisi_sas: Exit suspend state when usage count is greater than 0")
Signed-off-by: Arnd Bergmann arnd@arndb.de
Link: https://lore.kernel.org/r/20230405083611.3376739-1-arnd@kernel.org
Reviewed-by: Xiang Chen chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen martin.petersen@oracle.com
Signed-off-by: xiabing xiabing12@h-partners.com
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index c89b30a89735..84fb2873a6a1 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -5101,11 +5101,13 @@ static int _suspend_v3_hw(struct device *device)
 	flush_workqueue(hisi_hba->wq);
 	interrupt_disable_v3_hw(hisi_hba);

+#ifdef CONFIG_PM
 	if (atomic_read(&device->power.usage_count)) {
 		dev_err(dev, "PM suspend: host status cannot be suspended\n");
 		rc = -EBUSY;
 		goto err_out;
 	}
+#endif

 	rc = disable_host_v3_hw(hisi_hba);
 	if (rc) {
@@ -5124,7 +5126,9 @@ static int _suspend_v3_hw(struct device *device)

 err_out_recover_host:
 	enable_host_v3_hw(hisi_hba);
+#ifdef CONFIG_PM
 err_out:
+#endif
 	interrupt_enable_v3_hw(hisi_hba);
 	clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
 	clear_bit(HISI_SAS_RESET_BIT, &hisi_hba->flags);
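
The reason both the check and its label need a guard can be shown with a small standalone sketch. It assumes a hypothetical DEMO_CONFIG_PM macro and struct demo_pm_info in place of CONFIG_PM and struct dev_pm_info; the -16 value is an illustrative stand-in for -EBUSY. When the macro is undefined the member does not exist, so the access must be compiled out, and the label is compiled out with it so that no unused-label warning is emitted.

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_PM; remove to mimic a build without PM. */
#define DEMO_CONFIG_PM 1

/* Hypothetical stand-in for struct dev_pm_info. */
struct demo_pm_info {
	int suspended;
#ifdef DEMO_CONFIG_PM
	int usage_count;	/* only present when the option is enabled */
#endif
};

static int demo_suspend(struct demo_pm_info *pm)
{
	int rc = 0;

	assert(pm != 0);

#ifdef DEMO_CONFIG_PM
	/* The member exists only in this configuration, so guard the access. */
	if (pm->usage_count) {
		rc = -16;	/* illustrative stand-in for -EBUSY */
		goto err_out;
	}
#endif

	pm->suspended = 1;
	return 0;

#ifdef DEMO_CONFIG_PM
err_out:	/* guarded too: nothing jumps here when the check is compiled out */
#endif
	pm->suspended = 0;
	return rc;
}
```

The sketch compiles with or without DEMO_CONFIG_PM defined; only the guarded check and label disappear in the second case.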
Users can suspend the machine with 'echo freeze > /sys/power/state', but the suspend will fail because the SAS controller cannot be suspended:
[root@localhost ~]# echo freeze > /sys/power/state
-bash: echo: write error: Device or resource busy
[15104.142955] PM: suspend entry (s2idle)
...
[15104.283465] hisi_sas_v3_hw 0000:32:04.0: entering suspend state
[15104.283480] hisi_sas_v3_hw 0000:30:04.0: entering suspend state
[15104.283500] hisi_sas_v3_hw 0000:32:04.0: PM suspend: host status cannot be suspended
[15104.283508] hisi_sas_v3_hw 0000:30:04.0: PM suspend: host status cannot be suspended
[15104.283516] hisi_sas_v3_hw 0000:32:04.0: PM: pci_pm_suspend(): suspend_v3_hw+0x0/0x210 [hisi_sas_v3_hw] returns -16
[15104.283527] hisi_sas_v3_hw 0000:32:04.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1c0 returns -16
[15104.283524] hisi_sas_v3_hw 0000:30:04.0: PM: pci_pm_suspend(): suspend_v3_hw+0x0/0x210 [hisi_sas_v3_hw] returns -16
[15104.283533] hisi_sas_v3_hw 0000:32:04.0: PM: failed to suspend async: error -16
[15104.283536] hisi_sas_v3_hw 0000:30:04.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1c0 returns -16
[15104.283542] hisi_sas_v3_hw 0000:30:04.0: PM: failed to suspend async: error -16
The problem is that when the ->runtime_suspend() callback suspend_v3_hw() is executing, the current runtime PM status is RPM_ACTIVE and the usage count of the controller is not 0, so the function returns -EBUSY immediately.

To fix it, check the device usage count only when the runtime PM status is RPM_SUSPENDING.
Signed-off-by: Yihang Li liyihang9@huawei.com
Signed-off-by: xiabing xiabing12@h-partners.com
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 84fb2873a6a1..5a5957834924 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -5102,7 +5102,8 @@ static int _suspend_v3_hw(struct device *device)
 	interrupt_disable_v3_hw(hisi_hba);

 #ifdef CONFIG_PM
-	if (atomic_read(&device->power.usage_count)) {
+	if ((device->power.runtime_status == RPM_SUSPENDING) &&
+	    atomic_read(&device->power.usage_count)) {
 		dev_err(dev, "PM suspend: host status cannot be suspended\n");
 		rc = -EBUSY;
 		goto err_out;
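
The effect of the new condition can be written down as a small truth-table check. This is a userspace sketch, not kernel code: enum demo_rpm_status and demo_suspend_must_abort() are hypothetical names that only mirror the shape of the test above (abort only if the status is "suspending" and the usage count is non-zero).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the runtime PM status values. */
enum demo_rpm_status {
	DEMO_RPM_ACTIVE,
	DEMO_RPM_RESUMING,
	DEMO_RPM_SUSPENDED,
	DEMO_RPM_SUSPENDING,
};

/*
 * Return true if the suspend callback should bail out with a busy error.
 * Before the patch the decision depended on usage_count alone; after it,
 * a non-zero count matters only while a runtime suspend is in progress.
 */
static bool demo_suspend_must_abort(enum demo_rpm_status status,
				    int usage_count)
{
	assert(usage_count >= 0);

	return status == DEMO_RPM_SUSPENDING && usage_count != 0;
}
```

In this model a system suspend that arrives with the status still active and a non-zero count is no longer refused, which is the failing case shown in the log.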
Feedback: The patch(es) which you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/1035 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/thread/WF...