From: xiabing xiabing12@h-partners.com
bugfix
Xingui Yang (2): scsi: hisi_sas: Modify v3 HW SATA disk error state completion processing scsi: hisi_sas: Change DMA setup lock timeout to 2.5s
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
From: Xingui Yang yangxingui@huawei.com
mainline inclusion from mainline-v6.2-rc1 commit 4ef4f1a6155571d3d53583a4e8e7ccbbec220b8a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I76WNL CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
When an NCQ error occurs, the controller will abnormally complete the I/Os that are newly delivered to disk, and bit8 in CQ dw3 will be set which indicates that the SATA disk is in error state. The current processing flow is to set ts->stat to SAS_OPEN_REJECT and then sas_ata_task_done() will set FIS stat to ATA_ERR. After analyzing the I/O by ata_eh_analyze_tf(), err_mask will set to AC_ERR_HSM. If media error occurs for four times within 10 minutes and the chip rejects new I/Os for four times, NCQ will be disabled due to excessive errors, which is undesirable.
Therefore, use sas_task_abort() to handle abnormally completed I/Os when SATA disk is in error state, as these abnormally completed I/Os are already processed by sas_ata_device_link_abort() and qc->flag are set to ATA_QCFLAG_FAILED. If sas_task_abort() is used, qc->err_mask will not be modified in EH. Unlike the current process flow, it will not increase the count of ECAT_TOUT_HSM and not turn off NCQ. Like other I/Os on the disk that do not have an error but do not return after the NCQ error, they are retried after the EH.
Signed-off-by: Xingui Yang yangxingui@huawei.com Signed-off-by: John Garry john.garry@huawei.com Link: https://lore.kernel.org/r/1665998435-199946-5-git-send-email-john.garry@huaw... Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: xiabing xiabing12@h-partners.com --- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index 6e40c4c06650..b282188f2971 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -427,6 +427,8 @@ #define CMPLT_HDR_DEV_ID_OFF 16 #define CMPLT_HDR_DEV_ID_MSK (0xffff << CMPLT_HDR_DEV_ID_OFF) /* dw3 */ +#define SATA_DISK_IN_ERROR_STATUS_OFF 8 +#define SATA_DISK_IN_ERROR_STATUS_MSK (0x1 << SATA_DISK_IN_ERROR_STATUS_OFF) #define CMPLT_HDR_SATA_DISK_ERR_OFF 16 #define CMPLT_HDR_SATA_DISK_ERR_MSK (0x1 << CMPLT_HDR_SATA_DISK_ERR_OFF) #define CMPLT_HDR_IO_IN_TARGET_OFF 17 @@ -2238,7 +2240,8 @@ slot_err_v3_hw(struct hisi_hba *hisi_hba, struct sas_task *task, } else if (dma_rx_err_type & RX_DATA_LEN_UNDERFLOW_MSK) { ts->residual = trans_tx_fail_type; ts->stat = SAS_DATA_UNDERRUN; - } else if (dw3 & CMPLT_HDR_IO_IN_TARGET_MSK) { + } else if ((dw3 & CMPLT_HDR_IO_IN_TARGET_MSK) || + (dw3 & SATA_DISK_IN_ERROR_STATUS_MSK)) { ts->stat = SAS_PHY_DOWN; slot->abort = 1; } else {
From: Xingui Yang yangxingui@huawei.com
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I76WNL CVE: NA
----------------------------------------------------------------------
DMA setup lock timeout protection is added when DMA setup frames are received, it's a function outside the protocol and used to prevent SATA disk I/Os from being delivered for a long time. The default value is 100ms, it's too strict and easily triggered timeout when the disk is overloaded or faulty. Based on the average I/O latency of 300 disks, we adjust the value to 2.5s.
Signed-off-by: Xingui Yang yangxingui@huawei.com Signed-off-by: Yihang Li liyihang9@huawei.com Reviewed-by: Xiang Chen chenxiang66@hisilicon.com Signed-off-by: xiabing xiabing12@h-partners.com --- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index b282188f2971..22c4ba835bda 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -29,6 +29,7 @@ #define SATA_INITI_D2H_STORE_ADDR_LO 0x60 #define SATA_INITI_D2H_STORE_ADDR_HI 0x64 #define CFG_MAX_TAG 0x68 +#define TRANS_LOCK_ICT_TIME 0X70 #define HGC_SAS_TX_OPEN_FAIL_RETRY_CTRL 0x84 #define HGC_SAS_TXFAIL_RETRY_CTRL 0x88 #define HGC_GET_ITV_TIME 0x90 @@ -626,6 +627,8 @@ static void init_reg_v3_hw(struct hisi_hba *hisi_hba) hisi_sas_write32(hisi_hba, DLVRY_QUEUE_ENABLE, (u32)((1ULL << hisi_hba->queue_count) - 1)); hisi_sas_write32(hisi_hba, CFG_MAX_TAG, 0xfff0400); + /* time / CLK_AHB = 2.5s / 2ns = 0x4A817C80 */ + hisi_sas_write32(hisi_hba, TRANS_LOCK_ICT_TIME, 0x4A817C80); hisi_sas_write32(hisi_hba, HGC_SAS_TXFAIL_RETRY_CTRL, 0x108); hisi_sas_write32(hisi_hba, CFG_AGING_TIME, 0x1); hisi_sas_write32(hisi_hba, INT_COAL_EN, 0x1); @@ -2967,6 +2970,7 @@ static const struct hisi_sas_debugfs_reg_lu debugfs_global_reg_lu[] = { HISI_SAS_DEBUGFS_REG(SATA_INITI_D2H_STORE_ADDR_LO), HISI_SAS_DEBUGFS_REG(SATA_INITI_D2H_STORE_ADDR_HI), HISI_SAS_DEBUGFS_REG(CFG_MAX_TAG), + HISI_SAS_DEBUGFS_REG(TRANS_LOCK_ICT_TIME), HISI_SAS_DEBUGFS_REG(HGC_SAS_TX_OPEN_FAIL_RETRY_CTRL), HISI_SAS_DEBUGFS_REG(HGC_SAS_TXFAIL_RETRY_CTRL), HISI_SAS_DEBUGFS_REG(HGC_GET_ITV_TIME),