From: Xiang Chen chenxiang66@hisilicon.com
For ssp and stp commands, most fields of slot->cmd_hdr will be overwrote by function prep_ssp_v3_hw() and prep_ata_v3_hw(), so it is not necessary to memset them in front, and just write dw7 and sg_len to 0 for them (dw5 is alway 0). For smp and abort commands, they are not used frequently and most fields of them are not overwrote, so memset slot->cmd_hdr separately for them.
Signed-off-by: Xiang Chen chenxiang66@hisilicon.com --- drivers/scsi/hisi_sas/hisi_sas_main.c | 1 - drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 3 +++ drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 6 ++++++ drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 8 ++++++++ 4 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c index fe40e3ad68c0..9792b2628933 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c @@ -439,7 +439,6 @@ void hisi_sas_task_deliver(struct hisi_hba *hisi_hba, slot->is_internal = tmf; task->lldd_task = slot;
- memset(slot->cmd_hdr, 0, sizeof(struct hisi_sas_cmd_hdr)); memset(hisi_sas_cmd_hdr_addr_mem(slot), 0, HISI_SAS_COMMAND_TABLE_SZ); memset(hisi_sas_status_buf_addr_mem(slot), 0, sizeof(struct hisi_sas_err_record)); diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c index 3059d19e4368..282d8b9d882e 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c @@ -922,6 +922,7 @@ static void prep_smp_v1_hw(struct hisi_hba *hisi_hba, dma_addr_t req_dma_addr; unsigned int req_len;
+ memset(hdr, 0, sizeof(struct hisi_sas_cmd_hdr)); /* req */ sg_req = &task->smp_task.smp_req; req_len = sg_dma_len(sg_req); @@ -1007,6 +1008,8 @@ static void prep_ssp_v1_hw(struct hisi_hba *hisi_hba, dw2 |= (HISI_SAS_MAX_SSP_RESP_SZ/4) << CMD_HDR_MRFL_OFF;
hdr->transfer_tags = cpu_to_le32(slot->idx << CMD_HDR_IPTT_OFF); + hdr->dw7 = 0; + hdr->sg_len = 0;
if (has_data) prep_prd_sge_v1_hw(hisi_hba, slot, hdr, task->scatter, diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c index 64ed3e472e65..c7c892ac98e4 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c @@ -1705,6 +1705,7 @@ static void prep_smp_v2_hw(struct hisi_hba *hisi_hba, dma_addr_t req_dma_addr; unsigned int req_len;
+ memset(hdr, 0, sizeof(struct hisi_sas_cmd_hdr)); /* req */ sg_req = &task->smp_task.smp_req; req_dma_addr = sg_dma_address(sg_req); @@ -1784,6 +1785,8 @@ static void prep_ssp_v2_hw(struct hisi_hba *hisi_hba, hdr->dw2 = cpu_to_le32(dw2);
hdr->transfer_tags = cpu_to_le32(slot->idx); + hdr->dw7 = 0; + hdr->sg_len = 0;
if (has_data) prep_prd_sge_v2_hw(hisi_hba, slot, hdr, task->scatter, @@ -2551,6 +2554,8 @@ static void prep_ata_v2_hw(struct hisi_hba *hisi_hba,
/* dw3 */ hdr->transfer_tags = cpu_to_le32(slot->idx); + hdr->dw7 = 0; + hdr->sg_len = 0;
if (has_data) prep_prd_sge_v2_hw(hisi_hba, slot, hdr, task->scatter, @@ -2617,6 +2622,7 @@ static void prep_abort_v2_hw(struct hisi_hba *hisi_hba, /* Set the timeout to 10ms less than internal abort timeout */ mod_timer(timer, jiffies + msecs_to_jiffies(100));
+ memset(hdr, 0, sizeof(struct hisi_sas_cmd_hdr)); /* dw0 */ hdr->dw0 = cpu_to_le32((5 << CMD_HDR_CMD_OFF) | /*abort*/ (port->id << CMD_HDR_PORT_OFF) | diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index 699c49865b77..636ca7ccd1f6 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -1262,6 +1262,9 @@ static void prep_ssp_v3_hw(struct hisi_hba *hisi_hba, hdr->dw2 = cpu_to_le32(dw2); hdr->transfer_tags = cpu_to_le32(slot->idx);
+ hdr->dw7 = 0; + hdr->sg_len = 0; + if (has_data) { prep_prd_sge_v3_hw(hisi_hba, slot, hdr, task->scatter, slot->n_elem); @@ -1343,6 +1346,7 @@ static void prep_smp_v3_hw(struct hisi_hba *hisi_hba, dma_addr_t req_dma_addr; unsigned int req_len;
+ memset(hdr, 0, sizeof(struct hisi_sas_cmd_hdr)); /* req */ sg_req = &task->smp_task.smp_req; req_len = sg_dma_len(sg_req); @@ -1433,6 +1437,9 @@ static void prep_ata_v3_hw(struct hisi_hba *hisi_hba, /* dw3 */ hdr->transfer_tags = cpu_to_le32(slot->idx);
+ hdr->dw7 = 0; + hdr->sg_len = 0; + if (has_data) prep_prd_sge_v3_hw(hisi_hba, slot, hdr, task->scatter, slot->n_elem); @@ -1458,6 +1465,7 @@ static void prep_abort_v3_hw(struct hisi_hba *hisi_hba, struct hisi_sas_cmd_hdr *hdr = slot->cmd_hdr; struct hisi_sas_port *port = slot->port;
+ memset(hdr, 0, sizeof(struct hisi_sas_cmd_hdr)); /* dw0 */ hdr->dw0 = cpu_to_le32((5U << CMD_HDR_CMD_OFF) | /*abort*/ (port->id << CMD_HDR_PORT_OFF) |
On 19/11/2021 03:34, chenxiang wrote:
Is there any measureable performance difference here? It seems risky and sligtly messy, so would really need to be wrote it. More below.
From: Xiang Chen chenxiang66@hisilicon.com
For ssp and stp commands, most fields of slot->cmd_hdr will be overwrote
overwritten
Capitalize acroynms, like /s/ssp/SSP/
by function prep_ssp_v3_hw() and prep_ata_v3_hw(), so it is not necessary
use prep_ssp_vX_hw
to memset them in front,
/s/memset/zero/
/s/in front/up front/
and just write dw7 and sg_len to 0 for them (dw5 is alway 0). For smp and abort commands, they are not used frequently and most fields of them are not overwrote, so memset slot->cmd_hdr separately for them.
Signed-off-by: Xiang Chen chenxiang66@hisilicon.com
drivers/scsi/hisi_sas/hisi_sas_main.c | 1 - drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 3 +++ drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 6 ++++++ drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 8 ++++++++ 4 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c index fe40e3ad68c0..9792b2628933 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c @@ -439,7 +439,6 @@ void hisi_sas_task_deliver(struct hisi_hba *hisi_hba, slot->is_internal = tmf; task->lldd_task = slot;
- memset(slot->cmd_hdr, 0, sizeof(struct hisi_sas_cmd_hdr)); memset(hisi_sas_cmd_hdr_addr_mem(slot), 0, HISI_SAS_COMMAND_TABLE_SZ); memset(hisi_sas_status_buf_addr_mem(slot), 0, sizeof(struct hisi_sas_err_record));
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c index 3059d19e4368..282d8b9d882e 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c @@ -922,6 +922,7 @@ static void prep_smp_v1_hw(struct hisi_hba *hisi_hba, dma_addr_t req_dma_addr; unsigned int req_len;
- memset(hdr, 0, sizeof(struct hisi_sas_cmd_hdr)); /* req */ sg_req = &task->smp_task.smp_req; req_len = sg_dma_len(sg_req);
@@ -1007,6 +1008,8 @@ static void prep_ssp_v1_hw(struct hisi_hba *hisi_hba, dw2 |= (HISI_SAS_MAX_SSP_RESP_SZ/4) << CMD_HDR_MRFL_OFF;
hdr->transfer_tags = cpu_to_le32(slot->idx << CMD_HDR_IPTT_OFF);
hdr->dw7 = 0;
hdr->sg_len = 0;
if (has_data) prep_prd_sge_v1_hw(hisi_hba, slot, hdr, task->scatter,
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c index 64ed3e472e65..c7c892ac98e4 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c @@ -1705,6 +1705,7 @@ static void prep_smp_v2_hw(struct hisi_hba *hisi_hba, dma_addr_t req_dma_addr; unsigned int req_len;
- memset(hdr, 0, sizeof(struct hisi_sas_cmd_hdr));
sizeof(*hdr) is preferred
/* req */ sg_req = &task->smp_task.smp_req; req_dma_addr = sg_dma_address(sg_req); @@ -1784,6 +1785,8 @@ static void prep_ssp_v2_hw(struct hisi_hba *hisi_hba, hdr->dw2 = cpu_to_le32(dw2);
hdr->transfer_tags = cpu_to_le32(slot->idx);
- hdr->dw7 = 0;
- hdr->sg_len = 0;
please order them as they appear in the structure, i.e. sg_len before dw7. It may be that we should re-order all of them to help avoid mistake in missing zero'ing a field.
And why not zero dif_prd_table_addr? What if the previous usage of the header had this set?
if (has_data) prep_prd_sge_v2_hw(hisi_hba, slot, hdr, task->scatter, @@ -2551,6 +2554,8 @@ static void prep_ata_v2_hw(struct hisi_hba *hisi_hba,
/* dw3 */ hdr->transfer_tags = cpu_to_le32(slot->idx);
- hdr->dw7 = 0;
- hdr->sg_len = 0;
在 2021/11/19 23:09, John Garry 写道:
On 19/11/2021 03:34, chenxiang wrote:
Is there any measureable performance difference here? It seems risky and sligtly messy, so would really need to be wrote it. More below.
Before i saw memset() occupies some here with FlameGraph and perf. I think maybe we can remove them. For slot->cmd_hdr, we can optimize as the patch does. For slot->buf (including status buffer/command table), maybe we don't allocate it and let them allocated with request, and then they will be initialized as 0 in function scsi_init_command() (it zeroes all of them including scsi command + private data of LLDD at one time, and i think it is more efficience).
From: Xiang Chen chenxiang66@hisilicon.com
For ssp and stp commands, most fields of slot->cmd_hdr will be overwrote
overwritten
Capitalize acroynms, like /s/ssp/SSP/
ok
by function prep_ssp_v3_hw() and prep_ata_v3_hw(), so it is not necessary
use prep_ssp_vX_hw
ok
to memset them in front,
/s/memset/zero/
/s/in front/up front/
ok
and just write dw7 and sg_len to 0 for them (dw5 is alway 0). For smp and abort commands, they are not used frequently and most fields of them are not overwrote, so memset slot->cmd_hdr separately for them.
Signed-off-by: Xiang Chen chenxiang66@hisilicon.com
drivers/scsi/hisi_sas/hisi_sas_main.c | 1 - drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 3 +++ drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 6 ++++++ drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 8 ++++++++ 4 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c index fe40e3ad68c0..9792b2628933 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c @@ -439,7 +439,6 @@ void hisi_sas_task_deliver(struct hisi_hba *hisi_hba, slot->is_internal = tmf; task->lldd_task = slot;
- memset(slot->cmd_hdr, 0, sizeof(struct hisi_sas_cmd_hdr)); memset(hisi_sas_cmd_hdr_addr_mem(slot), 0,
HISI_SAS_COMMAND_TABLE_SZ); memset(hisi_sas_status_buf_addr_mem(slot), 0, sizeof(struct hisi_sas_err_record)); diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c index 3059d19e4368..282d8b9d882e 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c @@ -922,6 +922,7 @@ static void prep_smp_v1_hw(struct hisi_hba *hisi_hba, dma_addr_t req_dma_addr; unsigned int req_len;
- memset(hdr, 0, sizeof(struct hisi_sas_cmd_hdr)); /* req */ sg_req = &task->smp_task.smp_req; req_len = sg_dma_len(sg_req);
@@ -1007,6 +1008,8 @@ static void prep_ssp_v1_hw(struct hisi_hba *hisi_hba, dw2 |= (HISI_SAS_MAX_SSP_RESP_SZ/4) << CMD_HDR_MRFL_OFF; hdr->transfer_tags = cpu_to_le32(slot->idx << CMD_HDR_IPTT_OFF);
- hdr->dw7 = 0;
- hdr->sg_len = 0; if (has_data) prep_prd_sge_v1_hw(hisi_hba, slot, hdr, task->scatter,
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c index 64ed3e472e65..c7c892ac98e4 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c @@ -1705,6 +1705,7 @@ static void prep_smp_v2_hw(struct hisi_hba *hisi_hba, dma_addr_t req_dma_addr; unsigned int req_len;
- memset(hdr, 0, sizeof(struct hisi_sas_cmd_hdr));
sizeof(*hdr) is preferred
ok
/* req */ sg_req = &task->smp_task.smp_req; req_dma_addr = sg_dma_address(sg_req);
@@ -1784,6 +1785,8 @@ static void prep_ssp_v2_hw(struct hisi_hba *hisi_hba, hdr->dw2 = cpu_to_le32(dw2); hdr->transfer_tags = cpu_to_le32(slot->idx);
- hdr->dw7 = 0;
- hdr->sg_len = 0;
please order them as they appear in the structure, i.e. sg_len before dw7. It may be that we should re-order all of them to help avoid mistake in missing zero'ing a field.
ok
And why not zero dif_prd_table_addr? What if the previous usage of the header had this set?
Hdr->sg_len includes two part: length of DATA SGL/SEG and DIF SGL/SGE. Only when the length is not zero, then hardware will access those data. If length of DATA SGL or DIF SGL/SGE is not zero, the field DATA SGL/SGE and DIF SGL/SGE base address will be overwritten. If all of them is zero, we don't need to care about it as the hardware will not access them. So i think we don't need to zero them.
if (has_data) prep_prd_sge_v2_hw(hisi_hba, slot, hdr, task->scatter,
@@ -2551,6 +2554,8 @@ static void prep_ata_v2_hw(struct hisi_hba *hisi_hba, /* dw3 */ hdr->transfer_tags = cpu_to_le32(slot->idx);
- hdr->dw7 = 0;
- hdr->sg_len = 0;
.
On 22/11/2021 02:24, chenxiang (M) wrote:
在 2021/11/19 23:09, John Garry 写道:
On 19/11/2021 03:34, chenxiang wrote:
Is there any measureable performance difference here? It seems risky and sligtly messy, so would really need to be wrote it. More below.
Before i saw memset() occupies some here with FlameGraph and perf. I think maybe we can remove them.
I'd still rather know the performance gain.
For slot->cmd_hdr, we can optimize as the patch does. For slot->buf (including status buffer/command table), maybe we don't allocate it and let them allocated with request, and then they will be initialized as 0 in function scsi_init_command() (it zeroes all of them including scsi command + private data of LLDD at one time, and i think it is more efficience).
From: Xiang Chen chenxiang66@hisilicon.com
...
And why not zero dif_prd_table_addr? What if the previous usage of the header had this set?
Hdr->sg_len includes two part: length of DATA SGL/SEG and DIF SGL/SGE. Only when the length is not zero, then hardware will access those data. If length of DATA SGL or DIF SGL/SGE is not zero, the field DATA SGL/SGE and DIF SGL/SGE base address will be overwritten. If all of them is zero, we don't need to care about it as the hardware will not access them. So i think we don't need to zero them.
I'll still just rather zero these fields for safety
Thanks, John
在 2021/11/22 18:23, John Garry 写道:
On 22/11/2021 02:24, chenxiang (M) wrote:
在 2021/11/19 23:09, John Garry 写道:
On 19/11/2021 03:34, chenxiang wrote:
Is there any measureable performance difference here? It seems risky and sligtly messy, so would really need to be wrote it. More below.
Before i saw memset() occupies some here with FlameGraph and perf. I think maybe we can remove them.
I'd still rather know the performance gain.
This is a small change, and i don't think we can see any different from the performance. If you think it is not need to do that, i am ok to drop it.
For slot->cmd_hdr, we can optimize as the patch does. For slot->buf (including status buffer/command table), maybe we don't allocate it and let them allocated with request, and then they will be initialized as 0 in function scsi_init_command() (it zeroes all of them including scsi command + private data of LLDD at one time, and i think it is more efficience).
From: Xiang Chen chenxiang66@hisilicon.com
...
And why not zero dif_prd_table_addr? What if the previous usage of the header had this set?
Hdr->sg_len includes two part: length of DATA SGL/SEG and DIF SGL/SGE. Only when the length is not zero, then hardware will access those data. If length of DATA SGL or DIF SGL/SGE is not zero, the field DATA SGL/SGE and DIF SGL/SGE base address will be overwritten. If all of them is zero, we don't need to care about it as the hardware will not access them. So i think we don't need to zero them.
I'll still just rather zero these fields for safety
Thanks, John
.