Linuxarm

Re: [PATCH v8 4/4] Documentation: add debugfs description for hisi migration
by Shameerali Kolothum Thodi 14 Aug '24
> -----Original Message-----
> From: liulongfang <liulongfang(a)huawei.com>
> Sent: Tuesday, August 6, 2024 1:29 PM
> To: alex.williamson(a)redhat.com; jgg(a)nvidia.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi(a)huawei.com>; Jonathan Cameron
> <jonathan.cameron(a)huawei.com>
> Cc: kvm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org;
> linuxarm(a)openeuler.org; liulongfang <liulongfang(a)huawei.com>
> Subject: [PATCH v8 4/4] Documentation: add debugfs description for hisi
> migration
>
> Add a debugfs document description file to help users understand
> how to use the hisilicon accelerator live migration driver's
> debugfs.
>
> Update the file paths that need to be maintained in MAINTAINERS
>
> Signed-off-by: Longfang Liu <liulongfang(a)huawei.com>
LGTM,
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi(a)huawei.com>
Thanks,
Shameer

Re: [PATCH v7 2/4] hisi_acc_vfio_pci: create subfunction for data reading
by Shameerali Kolothum Thodi 05 Aug '24
> -----Original Message-----
> From: liulongfang <liulongfang(a)huawei.com>
> Sent: Tuesday, July 30, 2024 1:15 PM
> To: alex.williamson(a)redhat.com; jgg(a)nvidia.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi(a)huawei.com>; Jonathan Cameron
> <jonathan.cameron(a)huawei.com>
> Cc: kvm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org;
> linuxarm(a)openeuler.org; liulongfang <liulongfang(a)huawei.com>
> Subject: [PATCH v7 2/4] hisi_acc_vfio_pci: create subfunction for data
> reading
>
> This patch extracts the code that reads data from the device into
> a sub-function. It can then be called both during the device status
> data saving phase of the live migration process and by the device
> status data read function in debugfs, thereby reducing redundant
> code in the driver.
>
> Signed-off-by: Longfang Liu <liulongfang(a)huawei.com>
> ---
LGTM,
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi(a)huawei.com>
Thanks,
Shameer
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 54 +++++++++++--------
> 1 file changed, 33 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index 45351be8e270..a8c53952d82e 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -486,31 +486,11 @@ static int vf_qm_load_data(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
> return 0;
> }
>
> -static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
> - struct hisi_acc_vf_migration_file *migf)
> +static int vf_qm_read_data(struct hisi_qm *vf_qm, struct acc_vf_data
> *vf_data)
> {
> - struct acc_vf_data *vf_data = &migf->vf_data;
> - struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> struct device *dev = &vf_qm->pdev->dev;
> int ret;
>
> - if (unlikely(qm_wait_dev_not_ready(vf_qm))) {
> - /* Update state and return with match data */
> - vf_data->vf_qm_state = QM_NOT_READY;
> - hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
> - migf->total_length = QM_MATCH_SIZE;
> - return 0;
> - }
> -
> - vf_data->vf_qm_state = QM_READY;
> - hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
> -
> - ret = vf_qm_cache_wb(vf_qm);
> - if (ret) {
> - dev_err(dev, "failed to writeback QM Cache!\n");
> - return ret;
> - }
> -
> ret = qm_get_regs(vf_qm, vf_data);
> if (ret)
> return -EINVAL;
> @@ -536,6 +516,38 @@ static int vf_qm_state_save(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
> return -EINVAL;
> }
>
> + return 0;
> +}
> +
> +static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
> + struct hisi_acc_vf_migration_file *migf)
> +{
> + struct acc_vf_data *vf_data = &migf->vf_data;
> + struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> + struct device *dev = &vf_qm->pdev->dev;
> + int ret;
> +
> + if (unlikely(qm_wait_dev_not_ready(vf_qm))) {
> + /* Update state and return with match data */
> + vf_data->vf_qm_state = QM_NOT_READY;
> + hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
> + migf->total_length = QM_MATCH_SIZE;
> + return 0;
> + }
> +
> + vf_data->vf_qm_state = QM_READY;
> + hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
> +
> + ret = vf_qm_cache_wb(vf_qm);
> + if (ret) {
> + dev_err(dev, "failed to writeback QM Cache!\n");
> + return ret;
> + }
> +
> + ret = vf_qm_read_data(vf_qm, vf_data);
> + if (ret)
> + return -EINVAL;
> +
> migf->total_length = sizeof(struct acc_vf_data);
> return 0;
> }
> --
> 2.24.0

Re: [PATCH v7 1/4] hisi_acc_vfio_pci: extract public functions for container_of
by Shameerali Kolothum Thodi 05 Aug '24
> -----Original Message-----
> From: liulongfang <liulongfang(a)huawei.com>
> Sent: Tuesday, July 30, 2024 1:15 PM
> To: alex.williamson(a)redhat.com; jgg(a)nvidia.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi(a)huawei.com>; Jonathan Cameron
> <jonathan.cameron(a)huawei.com>
> Cc: kvm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org;
> linuxarm(a)openeuler.org; liulongfang <liulongfang(a)huawei.com>
> Subject: [PATCH v7 1/4] hisi_acc_vfio_pci: extract public functions for
> container_of
>
> In the current driver, vdev is obtained from struct
> hisi_acc_vf_core_device through the container_of function.
> This method is used in many places in the driver. In order to
> reduce this repetition, it is extracted into a common function.
>
> Signed-off-by: Longfang Liu <liulongfang(a)huawei.com>
> ---
LGTM,
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi(a)huawei.com>
Thanks,
Shameer
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 21 ++++++++++---------
> 1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index 9a3e97108ace..45351be8e270 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -630,6 +630,12 @@ static void hisi_acc_vf_disable_fds(struct
> hisi_acc_vf_core_device *hisi_acc_vde
> }
> }
>
> +static struct hisi_acc_vf_core_device *hisi_acc_get_vf_dev(struct
> vfio_device *vdev)
> +{
> + return container_of(vdev, struct hisi_acc_vf_core_device,
> + core_device.vdev);
> +}
> +
> static void hisi_acc_vf_reset(struct hisi_acc_vf_core_device *hisi_acc_vdev)
> {
> hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
> @@ -1033,8 +1039,7 @@ static struct file *
> hisi_acc_vfio_pci_set_device_state(struct vfio_device *vdev,
> enum vfio_device_mig_state new_state)
> {
> - struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(vdev,
> - struct hisi_acc_vf_core_device, core_device.vdev);
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(vdev);
> enum vfio_device_mig_state next_state;
> struct file *res = NULL;
> int ret;
> @@ -1075,8 +1080,7 @@ static int
> hisi_acc_vfio_pci_get_device_state(struct vfio_device *vdev,
> enum vfio_device_mig_state *curr_state)
> {
> - struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(vdev,
> - struct hisi_acc_vf_core_device, core_device.vdev);
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(vdev);
>
> mutex_lock(&hisi_acc_vdev->state_mutex);
> *curr_state = hisi_acc_vdev->mig_state;
> @@ -1280,8 +1284,7 @@ static long hisi_acc_vfio_pci_ioctl(struct vfio_device
> *core_vdev, unsigned int
>
> static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
> {
> - struct hisi_acc_vf_core_device *hisi_acc_vdev =
> container_of(core_vdev,
> - struct hisi_acc_vf_core_device, core_device.vdev);
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(core_vdev);
> struct vfio_pci_core_device *vdev = &hisi_acc_vdev->core_device;
> int ret;
>
> @@ -1304,8 +1307,7 @@ static int hisi_acc_vfio_pci_open_device(struct
> vfio_device *core_vdev)
>
> static void hisi_acc_vfio_pci_close_device(struct vfio_device *core_vdev)
> {
> - struct hisi_acc_vf_core_device *hisi_acc_vdev =
> container_of(core_vdev,
> - struct hisi_acc_vf_core_device, core_device.vdev);
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(core_vdev);
> struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
>
> iounmap(vf_qm->io_base);
> @@ -1320,8 +1322,7 @@ static const struct vfio_migration_ops
> hisi_acc_vfio_pci_migrn_state_ops = {
>
> static int hisi_acc_vfio_pci_migrn_init_dev(struct vfio_device *core_vdev)
> {
> - struct hisi_acc_vf_core_device *hisi_acc_vdev =
> container_of(core_vdev,
> - struct hisi_acc_vf_core_device, core_device.vdev);
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(core_vdev);
> struct pci_dev *pdev = to_pci_dev(core_vdev->dev);
> struct hisi_qm *pf_qm = hisi_acc_get_pf_qm(pdev);
>
> --
> 2.24.0

Re: [PATCH v7 4/4] Documentation: add debugfs description for hisi migration
by Shameerali Kolothum Thodi 05 Aug '24
> -----Original Message-----
> From: liulongfang <liulongfang(a)huawei.com>
> Sent: Tuesday, July 30, 2024 1:15 PM
> To: alex.williamson(a)redhat.com; jgg(a)nvidia.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi(a)huawei.com>; Jonathan Cameron
> <jonathan.cameron(a)huawei.com>
> Cc: kvm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org;
> linuxarm(a)openeuler.org; liulongfang <liulongfang(a)huawei.com>
> Subject: [PATCH v7 4/4] Documentation: add debugfs description for hisi
> migration
>
> Add a debugfs document description file to help users understand
> how to use the hisilicon accelerator live migration driver's
> debugfs.
>
> Update the file paths that need to be maintained in MAINTAINERS
>
> Signed-off-by: Longfang Liu <liulongfang(a)huawei.com>
> ---
> .../ABI/testing/debugfs-hisi-migration | 25 +++++++++++++++++++
> 1 file changed, 25 insertions(+)
> create mode 100644 Documentation/ABI/testing/debugfs-hisi-migration
>
> diff --git a/Documentation/ABI/testing/debugfs-hisi-migration
> b/Documentation/ABI/testing/debugfs-hisi-migration
> new file mode 100644
> index 000000000000..053f3ebba9b1
> --- /dev/null
> +++ b/Documentation/ABI/testing/debugfs-hisi-migration
> @@ -0,0 +1,25 @@
> +What:
> /sys/kernel/debug/vfio/<device>/migration/hisi_acc/dev_data
> +Date: Jul 2024
> +KernelVersion: 6.11
> +Contact: Longfang Liu <liulongfang(a)huawei.com>
> +Description: Read the configuration data and some status data
> + required for device live migration. These data include device
> + status data, queue configuration data, some task
> configuration
> + data and device attribute data. The output format of the data
> + is defined by the live migration driver.
> +
> +What:
> /sys/kernel/debug/vfio/<device>/migration/hisi_acc/migf_data
> +Date: Jul 2024
> +KernelVersion: 6.11
> +Contact: Longfang Liu <liulongfang(a)huawei.com>
> +Description: Read the data from the last completed live migration.
> + This data includes the same device status data as in
> "dev_data".
> + And some device status data after the migration is
> completed.
Actually what info is different from dev_data here? Only that it is the
dev_data after a migration is attempted/completed, right?
Thanks,
Shameer
> +
> +What:
> /sys/kernel/debug/vfio/<device>/migration/hisi_acc/cmd_state
> +Date: Jul 2024
> +KernelVersion: 6.11
> +Contact: Longfang Liu <liulongfang(a)huawei.com>
> +Description: Used to obtain the device command sending and receiving
> + channel status. Returns failure or success logs based on the
> + results.
> --
> 2.24.0

Re: [PATCH v7 3/4] hisi_acc_vfio_pci: register debugfs for hisilicon migration driver
by Shameerali Kolothum Thodi 05 Aug '24
> -----Original Message-----
> From: liulongfang <liulongfang(a)huawei.com>
> Sent: Tuesday, July 30, 2024 1:15 PM
> To: alex.williamson(a)redhat.com; jgg(a)nvidia.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi(a)huawei.com>; Jonathan Cameron
> <jonathan.cameron(a)huawei.com>
> Cc: kvm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org;
> linuxarm(a)openeuler.org; liulongfang <liulongfang(a)huawei.com>
> Subject: [PATCH v7 3/4] hisi_acc_vfio_pci: register debugfs for hisilicon
> migration driver
>
> On the debugfs framework of VFIO, if the CONFIG_VFIO_DEBUGFS macro is
> enabled, the debug function is registered for the live migration driver
> of the HiSilicon accelerator device.
>
> After registering the HiSilicon accelerator device on the debugfs
> framework of live migration of vfio, a directory file "hisi_acc"
> of debugfs is created, and then three debug function files are
> created in this directory:
>
> vfio
> |
> +---<dev_name1>
> | +---migration
> | +--state
> | +--hisi_acc
> | +--dev_data
> | +--migf_data
> | +--cmd_state
> |
> +---<dev_name2>
> +---migration
> +--state
> +--hisi_acc
> +--dev_data
> +--migf_data
> +--cmd_state
>
> dev_data file: read device data that needs to be migrated from the
> current device in real time
> migf_data file: read the migration data of the last live migration
> from the current driver.
> cmd_state: used to get the cmd channel state for the device.
>
> +----------------+ +--------------+ +---------------+
> | migration dev | | src dev | | dst dev |
> +-------+--------+ +------+-------+ +-------+-------+
> | | |
> | +------v-------+ +-------v-------+
> | | saving_migf | | resuming_migf |
> read | | file | | file |
> | +------+-------+ +-------+-------+
> | | copy |
> | +------------+----------+
> | |
> +-------v--------+ +-------v--------+
> | data buffer | | debug_migf |
> +-------+--------+ +-------+--------+
> | |
> cat | cat |
> +-------v--------+ +-------v--------+
> | dev_data | | migf_data |
> +----------------+ +----------------+
>
> When accessing debugfs, users can obtain the most recent status data
> of the device through the "dev_data" file, which reads the complete
> recent status data of the device. If the device is currently being
> migrated, the read will wait for the migration to complete.
> The data from the last completed migration is stored in debug_migf.
> Users can read it via "migf_data".
>
> Signed-off-by: Longfang Liu <liulongfang(a)huawei.com>
> ---
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 220 ++++++++++++++++++
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 6 +
> 2 files changed, 226 insertions(+)
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index a8c53952d82e..ae8946901e73 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -627,15 +627,31 @@ static void hisi_acc_vf_disable_fd(struct
> hisi_acc_vf_migration_file *migf)
> mutex_unlock(&migf->lock);
> }
>
> +static void hisi_acc_debug_migf_copy(struct hisi_acc_vf_core_device
> *hisi_acc_vdev,
> + struct hisi_acc_vf_migration_file *src_migf)
> +{
> + struct hisi_acc_vf_migration_file *dst_migf = hisi_acc_vdev-
> >debug_migf;
> +
> + if (!dst_migf)
> + return;
> +
> + dst_migf->disabled = true;
This is always set to true, so why bother printing it as part of migf_data
below? See also the comments on "disabled" below. Is there any value in
giving this info to the user?
> + dst_migf->total_length = src_migf->total_length;
> + memcpy(&dst_migf->vf_data, &src_migf->vf_data,
> + sizeof(struct acc_vf_data));
> +}
> +
> static void hisi_acc_vf_disable_fds(struct hisi_acc_vf_core_device
> *hisi_acc_vdev)
> {
> if (hisi_acc_vdev->resuming_migf) {
> + hisi_acc_debug_migf_copy(hisi_acc_vdev, hisi_acc_vdev-
> >resuming_migf);
> hisi_acc_vf_disable_fd(hisi_acc_vdev->resuming_migf);
> fput(hisi_acc_vdev->resuming_migf->filp);
> hisi_acc_vdev->resuming_migf = NULL;
> }
>
> if (hisi_acc_vdev->saving_migf) {
> + hisi_acc_debug_migf_copy(hisi_acc_vdev, hisi_acc_vdev-
> >saving_migf);
> hisi_acc_vf_disable_fd(hisi_acc_vdev->saving_migf);
> fput(hisi_acc_vdev->saving_migf->filp);
> hisi_acc_vdev->saving_migf = NULL;
> @@ -1294,6 +1310,201 @@ static long hisi_acc_vfio_pci_ioctl(struct
> vfio_device *core_vdev, unsigned int
> return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> }
>
> +static int hisi_acc_vf_debug_check(struct seq_file *seq, struct vfio_device
> *vdev)
> +{
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(vdev);
> + struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> + int ret;
> +
> + if (!vdev->mig_ops) {
> + seq_printf(seq, "%s\n", "device does not support live
> migration!\n");
> + return -EINVAL;
> + }
> +
> + /*
> + * When the device is not opened, the io_base is not mapped.
> + * The driver cannot perform device read and write operations.
> + */
I think it is better to make sure the lock is held in this function before
checking this. Use lockdep_assert_held(lock).
> + if (!hisi_acc_vdev->dev_opened) {
> + seq_printf(seq, "%s\n", "device not opened!\n");
> + return -EINVAL;
> + }
> +
> + ret = qm_wait_dev_not_ready(vf_qm);
> + if (ret) {
> + seq_printf(seq, "%s\n", "VF device not ready!\n");
> + return -EBUSY;
> + }
> +
> + return 0;
> +}
> +
> +static int hisi_acc_vf_debug_cmd(struct seq_file *seq, void *data)
> +{
> + struct device *vf_dev = seq->private;
> + struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev);
> + struct vfio_device *vdev = &core_device->vdev;
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(vdev);
> + struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> + u64 value;
> + int ret;
> +
> + mutex_lock(&hisi_acc_vdev->state_mutex);
> + ret = hisi_acc_vf_debug_check(seq, vdev);
> + if (ret) {
> + mutex_unlock(&hisi_acc_vdev->state_mutex);
> + return ret;
> + }
> +
> + value = readl(vf_qm->io_base + QM_MB_CMD_SEND_BASE);
> + if (value == QM_MB_CMD_NOT_READY) {
> + mutex_unlock(&hisi_acc_vdev->state_mutex);
> + seq_printf(seq, "mailbox cmd channel not ready!\n");
> + return -EINVAL;
> + }
> + mutex_unlock(&hisi_acc_vdev->state_mutex);
> + seq_printf(seq, "mailbox cmd channel ready!\n");
> +
> + return 0;
> +}
> +
> +static int hisi_acc_vf_dev_read(struct seq_file *seq, void *data)
> +{
> + struct device *vf_dev = seq->private;
> + struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev);
> + struct vfio_device *vdev = &core_device->vdev;
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(vdev);
> + size_t vf_data_sz = offsetofend(struct acc_vf_data, padding);
> + struct acc_vf_data *vf_data = NULL;
> + bool migf_state;
> + int ret;
> +
> + vf_data = kzalloc(sizeof(struct acc_vf_data), GFP_KERNEL);
> + if (!vf_data)
> + return -ENOMEM;
> +
> + mutex_lock(&hisi_acc_vdev->state_mutex);
> + ret = hisi_acc_vf_debug_check(seq, vdev);
> + if (ret) {
> + mutex_unlock(&hisi_acc_vdev->state_mutex);
> + goto migf_err;
> + }
> +
> + vf_data->vf_qm_state = hisi_acc_vdev->vf_qm_state;
> + ret = vf_qm_read_data(&hisi_acc_vdev->vf_qm, vf_data);
> + if (ret) {
> + mutex_unlock(&hisi_acc_vdev->state_mutex);
> + goto migf_err;
> + }
> +
> + if (hisi_acc_vdev->resuming_migf)
> + migf_state = hisi_acc_vdev->resuming_migf->disabled;
> + else if (hisi_acc_vdev->saving_migf)
> + migf_state = hisi_acc_vdev->saving_migf->disabled;
> + else
> + migf_state = true;
I am still not sure what information we are getting from this "disabled". The
value "true" means the migf file is released, isn't it? How is that equivalent
to reporting it as "data valid" below? Also, "migf_state" is a misleading name here.
Thanks,
Shameer
> + mutex_unlock(&hisi_acc_vdev->state_mutex);
> +
> + seq_hex_dump(seq, "Dev Data:", DUMP_PREFIX_OFFSET, 16, 1,
> + (unsigned char *)vf_data,
> + vf_data_sz, false);
> +
> + seq_printf(seq,
> + "acc device:\n"
> + "device ready: %u\n"
> + "device opened: %d\n"
> + "data valid: %d\n"
> + "data size: %lu\n",
> + hisi_acc_vdev->vf_qm_state,
> + hisi_acc_vdev->dev_opened,
> + migf_state,
> + sizeof(struct acc_vf_data));
> +
> +migf_err:
> + kfree(vf_data);
> +
> + return ret;
> +}
> +
> +static int hisi_acc_vf_migf_read(struct seq_file *seq, void *data)
> +{
> + struct device *vf_dev = seq->private;
> + struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev);
> + struct vfio_device *vdev = &core_device->vdev;
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(vdev);
> + size_t vf_data_sz = offsetofend(struct acc_vf_data, padding);
> + struct hisi_acc_vf_migration_file *debug_migf = hisi_acc_vdev-
> >debug_migf;
> +
> + /* Check whether the live migration operation has been performed
> */
> + if (debug_migf->total_length < QM_MATCH_SIZE) {
> + seq_printf(seq, "%s\n", "device not migrated!\n");
> + return -EAGAIN;
> + }
> +
> + seq_hex_dump(seq, "Mig Data:", DUMP_PREFIX_OFFSET, 16, 1,
> + (unsigned char *)&debug_migf->vf_data,
> + vf_data_sz, false);
> +
> + seq_printf(seq,
> + "acc device:\n"
> + "device ready: %u\n"
> + "device opened: %d\n"
> + "data valid: %d\n"
> + "data size: %lu\n",
> + hisi_acc_vdev->vf_qm_state,
> + hisi_acc_vdev->dev_opened,
> + debug_migf->disabled,
> + debug_migf->total_length);
> +
> + return 0;
> +}
> +
> +static int hisi_acc_vfio_debug_init(struct hisi_acc_vf_core_device
> *hisi_acc_vdev)
> +{
> + struct vfio_device *vdev = &hisi_acc_vdev->core_device.vdev;
> + struct dentry *vfio_dev_migration = NULL;
> + struct dentry *vfio_hisi_acc = NULL;
> + struct device *dev = vdev->dev;
> + void *migf = NULL;
> +
> + if (!debugfs_initialized() ||
> + !IS_ENABLED(CONFIG_VFIO_DEBUGFS))
> + return 0;
> +
> + migf = kzalloc(sizeof(struct hisi_acc_vf_migration_file), GFP_KERNEL);
> + if (!migf)
> + return -ENOMEM;
> + hisi_acc_vdev->debug_migf = migf;
> +
> + vfio_dev_migration = debugfs_lookup("migration", vdev-
> >debug_root);
> + if (!vfio_dev_migration) {
> + kfree(migf);
> + hisi_acc_vdev->debug_migf = NULL;
> + dev_err(dev, "failed to lookup migration debugfs file!\n");
> + return -ENODEV;
> + }
> +
> + vfio_hisi_acc = debugfs_create_dir("hisi_acc", vfio_dev_migration);
> + debugfs_create_devm_seqfile(dev, "dev_data", vfio_hisi_acc,
> + hisi_acc_vf_dev_read);
> + debugfs_create_devm_seqfile(dev, "migf_data", vfio_hisi_acc,
> + hisi_acc_vf_migf_read);
> + debugfs_create_devm_seqfile(dev, "cmd_state", vfio_hisi_acc,
> + hisi_acc_vf_debug_cmd);
> +
> + return 0;
> +}
> +
> +static void hisi_acc_vf_debugfs_exit(struct hisi_acc_vf_core_device
> *hisi_acc_vdev)
> +{
> + if (!debugfs_initialized() ||
> + !IS_ENABLED(CONFIG_VFIO_DEBUGFS))
> + return;
> +
> + if (hisi_acc_vdev->debug_migf)
> + kfree(hisi_acc_vdev->debug_migf);
> +}
> +
> static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
> {
> struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(core_vdev);
> @@ -1311,9 +1522,11 @@ static int hisi_acc_vfio_pci_open_device(struct
> vfio_device *core_vdev)
> return ret;
> }
> hisi_acc_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
> + hisi_acc_vdev->dev_opened = true;
> }
>
> vfio_pci_core_finish_enable(vdev);
> +
> return 0;
> }
>
> @@ -1322,7 +1535,10 @@ static void hisi_acc_vfio_pci_close_device(struct
> vfio_device *core_vdev)
> struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(core_vdev);
> struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
>
> + mutex_lock(&hisi_acc_vdev->state_mutex);
> + hisi_acc_vdev->dev_opened = false;
> iounmap(vf_qm->io_base);
> + mutex_unlock(&hisi_acc_vdev->state_mutex);
> vfio_pci_core_close_device(core_vdev);
> }
>
> @@ -1413,6 +1629,9 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev
> *pdev, const struct pci_device
> ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device);
> if (ret)
> goto out_put_vdev;
> +
> + if (ops == &hisi_acc_vfio_pci_migrn_ops)
> + hisi_acc_vfio_debug_init(hisi_acc_vdev);
> return 0;
>
> out_put_vdev:
> @@ -1425,6 +1644,7 @@ static void hisi_acc_vfio_pci_remove(struct pci_dev
> *pdev)
> struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_drvdata(pdev);
>
> vfio_pci_core_unregister_device(&hisi_acc_vdev->core_device);
> + hisi_acc_vf_debugfs_exit(hisi_acc_vdev);
> vfio_put_device(&hisi_acc_vdev->core_device.vdev);
> }
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> index 5bab46602fad..f86f3b88b09e 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> @@ -32,6 +32,7 @@
> #define QM_SQC_VFT_BASE_MASK_V2 GENMASK(15, 0)
> #define QM_SQC_VFT_NUM_SHIFT_V2 45
> #define QM_SQC_VFT_NUM_MASK_V2 GENMASK(9, 0)
> +#define QM_MB_CMD_NOT_READY 0xffffffff
>
> /* RW regs */
> #define QM_REGS_MAX_LEN 7
> @@ -111,5 +112,10 @@ struct hisi_acc_vf_core_device {
> int vf_id;
> struct hisi_acc_vf_migration_file *resuming_migf;
> struct hisi_acc_vf_migration_file *saving_migf;
> +
> + /* To make sure the device is opened */
> + bool dev_opened;
> + /* To save migration data */
> + struct hisi_acc_vf_migration_file *debug_migf;
> };
> #endif /* HISI_ACC_VFIO_PCI_H */
> --
> 2.24.0

Re: [PATCH] soc: hisilicon: Support memory repair driver on Kunpeng SoC
by Jonathan Cameron 19 Jul '24
On Tue, 9 Jul 2024 17:34:18 +0800
Xiaofei Tan <tanxiaofei(a)huawei.com> wrote:
> Kunpeng SoCs support several memory repair capabilities. Normally,
> such repairs are done in the firmware startup stage, or are triggered
> by the BMC (Baseboard Management Controller) and done in firmware at
> runtime. The OS can't perceive such capabilities.
>
> Some repairs support online operation, while others don't. One reason
> for not supporting online repair is that the memory is in use and the
> hardware does not support backpressure on the primary access channel.
>
> In order to support more online memory repair, we seek solutions from
> the OS: try to isolate the memory page first, do the repair, and
> de-isolate after the repair is done.
>
> This patch supports online ACLS (Adapter Cache Line Sparing) repair,
> and may add other repairs such as PPR in the future.
>
> Signed-off-by: Xiaofei Tan <tanxiaofei(a)huawei.com>
As you know PPR is on the list of features we are proposing to cover
with the 'new' EDAC stuff / RAS control that Shiju is working on so
as to bring a unified interface to CXL and anything else (e.g. this)
that needs it. (I happen to know there is a proposal from Micron to
discuss PPR specifically at LPC).
I'm not keen to see an upstream solution for a specific device that we then
have to move over later.
More significantly, what stops the user of this from shooting themselves in
the foot? How do we know the memory is offline when repair is attempted?
That was the bit I was least sure about for the CXL version (and why it
needs to be kernel mediated rather than just made a userspace problem).
If this is something custom to keep openEuler happy in the short term then
fair enough, but we don't want lots of (OS in the loop) PPR solutions.
I did raise an open question in one of those threads on whether anyone
currently cares about poison repair on a running system. Interest so far is
minor, so it may be challenging to convince people of this, except that it
might be used on initial boot to remove lines known bad from previous boots
etc. before onlining the memory.
If we are considering a DAX / ACPI Specific Purpose Memory model for the
memory we are repairing, things get easier, as we can do the repair after
tearing down the mappings.
Various comments inline. May well overlap with earlier reviews though!
> ---
> drivers/soc/hisilicon/Kconfig | 10 +
> drivers/soc/hisilicon/Makefile | 1 +
> drivers/soc/hisilicon/hisi_mem_ras.c | 553 +++++++++++++++++++++++++++
> drivers/soc/hisilicon/hisi_mem_ras.h | 84 ++++
Needs Documentation/ABI/*
> 4 files changed, 648 insertions(+)
> create mode 100644 drivers/soc/hisilicon/hisi_mem_ras.c
> create mode 100644 drivers/soc/hisilicon/hisi_mem_ras.h
>
> diff --git a/drivers/soc/hisilicon/Kconfig b/drivers/soc/hisilicon/Kconfig
> index ae8e55a776e0..983cafc38e5c 100644
> --- a/drivers/soc/hisilicon/Kconfig
> +++ b/drivers/soc/hisilicon/Kconfig
> @@ -44,4 +44,14 @@ config HISI_HBMCACHE
> To compile the driver as a module, choose M here:
> the module will be called hisi_hbmcache.
>
> +config HISI_MEM_RAS
> + tristate "Add support for HISI Memory repair"
> + depends on ACPI
> + help
> + Add Memory repair support for Hisilicon device, which can be used to query
> + memory repair cap and repair hardware error in memory devices. This feature
> + need to work with hardware firmwares.
I'd drop the last sentence. It's a detail the kernel doesn't need.
> +
> + If not sure say no.
> +
> endmenu
> diff --git a/drivers/soc/hisilicon/Makefile b/drivers/soc/hisilicon/Makefile
> index 4ef8823250e8..b339a67bcd77 100644
> --- a/drivers/soc/hisilicon/Makefile
> +++ b/drivers/soc/hisilicon/Makefile
> @@ -6,3 +6,4 @@ obj-$(CONFIG_KUNPENG_HCCS) += kunpeng_hccs.o
> obj-$(CONFIG_HISI_HBMDEV) += hisi_hbmdev.o
> obj-$(CONFIG_HISI_HBMCACHE) += hisi_hbmcache.o
> obj-$(CONFIG_ARM64_PBHA) += pbha.o
> +obj-$(CONFIG_HISI_MEM_RAS) += hisi_mem_ras.o
> diff --git a/drivers/soc/hisilicon/hisi_mem_ras.c b/drivers/soc/hisilicon/hisi_mem_ras.c
> new file mode 100644
> index 000000000000..a29c41ccdba1
> --- /dev/null
> +++ b/drivers/soc/hisilicon/hisi_mem_ras.c
> @@ -0,0 +1,553 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) Huawei Technologies Co., Ltd. 2024. All rights reserved.
> + */
> +
> +#include <linux/kobject.h>
> +#include <linux/module.h>
> +#include <linux/acpi.h>
> +#include <linux/iopoll.h>
> +#include <linux/platform_device.h>
> +#include <linux/mm.h>
> +#include <acpi/pcc.h>
> +
> +#include "hisi_mem_ras.h"
> +#define DRV_NAME "hisi_mem_ras"
> +
> +#define HISI_MEM_PCC_CMD_WAIT_RETRIES_NUM 500ULL
> +#define HISI_MEM_POLL_STATUS_TIME_INTERVAL_US 3
> +
> +static struct hisi_mem_ras *mras;
> +
> +static int hisi_mem_get_paddr_range(struct hisi_mem_dev *hdev)
> +{
> + struct platform_device *pdev = hdev->pdev;
> + struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
> + struct hisi_mem_register_ctx ctx = {0};
Might as well set dev here.
> + acpi_handle handle = adev->handle;
> + acpi_status status;
> +
> + if (!acpi_has_method(handle, METHOD_NAME__CRS)) {
> + dev_err(&pdev->dev, "No _CRS method.\n");
> + return -ENODEV;
> + }
> +
> + ctx.dev = &pdev->dev;
> + status = acpi_walk_resources(handle, METHOD_NAME__CRS,
> + hisi_mem_get_paddr_range_cb, &ctx);
> + if (ACPI_FAILURE(status))
> + return ctx.err;
> +
> + hdev->paddr_min = ctx.paddr_min;
> + hdev->paddr_max = ctx.paddr_max;
> + return 0;
> +}
> +
> +
> +static int hisi_mem_register_pcc_channel(struct hisi_mem_dev *hdev)
> +{
> + struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
> + struct mbox_client *cl = &cl_info->client;
> + struct pcc_mbox_chan *pcc_chan;
> + struct platform_device *pdev = hdev->pdev;
> + int rc;
> +
> + cl->dev = &pdev->dev;
> + cl->tx_block = false;
> + cl->knows_txdone = true;
> + cl->tx_done = hisi_mem_chan_tx_done;
> + cl->rx_callback = hdev->verspec_data->rx_callback;
> + init_completion(&cl_info->done);
> +
> + pcc_chan = pcc_mbox_request_channel(cl, hdev->chan_id);
> + if (IS_ERR(pcc_chan)) {
> + dev_err(&pdev->dev, "PPC channel request failed.\n");
> + rc = -ENODEV;
> + goto out;
return dev_err_probe(&pdev->dev, -ENODEV,...);
> + }
> + cl_info->pcc_chan = pcc_chan;
Maybe worth some __free() magic and delaying setting this until all
error paths are done then using a no_free_ptr() to steal the pointer
from the autocleanup. Will let you do direct returns and
use dev_err_probe() giving cleaner result.
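For reference, a user-space sketch of the cleanup-attribute pattern that the kernel's DEFINE_FREE()/__free()/no_free_ptr() helpers (from <linux/cleanup.h>) are built on. The macro names below are illustrative stand-ins, not the real kernel ones:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-ins for the kernel's __free()/no_free_ptr() */
#define __free(fn) __attribute__((cleanup(fn##_cleanup)))
#define no_free_ptr(p) ({ void *__t = (p); (p) = NULL; __t; })

static int free_count;

static void kfree_cleanup(void *pp)
{
	void *p = *(void **)pp;

	if (p) {
		free_count++;
		free(p);
	}
}

/* Error paths just return: the cleanup attribute frees buf for us.
 * On success, no_free_ptr() steals the pointer so nothing is freed. */
static char *make_buf(int fail)
{
	char *buf __free(kfree) = malloc(16);

	if (!buf)
		return NULL;
	if (fail)
		return NULL;		/* buf auto-freed here */
	strcpy(buf, "ok");
	return no_free_ptr(buf);	/* ownership passes to caller */
}
```

With a DEFINE_FREE() wrapper around pcc_mbox_free_channel(), the register function could then use direct `return dev_err_probe(...)` on every failure and a single no_free_ptr() on success.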
> + cl_info->mbox_chan = pcc_chan->mchan;
> +
> + /*
> + * pcc_chan->latency is just a nominal value. In reality the remote
> + * processor could be much slower to reply. So add an arbitrary amount
> + * of wait on top of nominal.
> + */
> + cl_info->deadline_us =
> + HISI_MEM_PCC_CMD_WAIT_RETRIES_NUM * pcc_chan->latency;
> + if (!hdev->verspec_data->has_txdone_irq &&
> + cl_info->mbox_chan->mbox->txdone_irq) {
> + dev_err(&pdev->dev, "PCC IRQ in PCCT is enabled.\n");
> + rc = -EINVAL;
> + goto err_mbx_channel_free;
> + } else if (hdev->verspec_data->has_txdone_irq &&
> + !cl_info->mbox_chan->mbox->txdone_irq) {
> + dev_err(&pdev->dev, "PCC IRQ in PCCT isn't supported.\n");
> + rc = -EINVAL;
> + goto err_mbx_channel_free;
> + }
> +
> + if (pcc_chan->shmem_base_addr) {
> + cl_info->pcc_comm_addr = ioremap(pcc_chan->shmem_base_addr,
> + pcc_chan->shmem_size);
> + if (!cl_info->pcc_comm_addr) {
> + dev_err(&pdev->dev, "Failed to ioremap PCC communication region for channel-%u.\n",
> + hdev->chan_id);
> + rc = -ENOMEM;
> + goto err_mbx_channel_free;
> + }
> + }
> +
> + return 0;
> +
> +err_mbx_channel_free:
> + pcc_mbox_free_channel(cl_info->pcc_chan);
> +out:
Never have an out label that just returns. Just return inline as it is
easier to read.
> + return rc;
> +}
> +
> +
> +static int hisi_mem_wait_cmd_complete_by_poll(struct hisi_mem_dev *hdev)
> +{
> + struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
> + struct acpi_pcct_shared_memory __iomem *comm_base =
> + cl_info->pcc_comm_addr;
That's ugly formatting. Just indent one tab more than the line above
before cl_info->...
> + struct platform_device *pdev = hdev->pdev;
> + u16 status;
> + int ret;
> +
> + /*
> + * Poll PCC status register every 3us(delay_us) for maximum of
> + * deadline_us(timeout_us) until PCC command complete bit is set(cond)
> + */
> + ret = readw_poll_timeout(&comm_base->status, status,
> + status & PCC_STATUS_CMD_COMPLETE,
> + HISI_MEM_POLL_STATUS_TIME_INTERVAL_US,
> + cl_info->deadline_us);
> + if (unlikely(ret))
> + dev_err(&pdev->dev, "poll PCC status failed, ret = %d.\n", ret);
> +
> + return ret;
> +}
> +
> +static int hisi_mem_wait_cmd_complete_by_irq(struct hisi_mem_dev *hdev)
> +{
> + struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
> + struct platform_device *pdev = hdev->pdev;
> +
> + if (!wait_for_completion_timeout(&cl_info->done,
> + usecs_to_jiffies(cl_info->deadline_us))) {
> + dev_err(&pdev->dev, "PCC command executed timeout!\n");
> + return -ETIMEDOUT;
> + }
> +
> + return 0;
> +}
> +
> +static inline void hisi_mem_fill_pcc_shared_mem_region(struct hisi_mem_dev *hdev,
> + u8 cmd,
> + struct hisi_mem_desc *desc,
Given this is the request, why use the union rather than
struct hisi_mem_req_desc?
> + void __iomem *comm_space,
> + u16 space_size)
> +{
> + struct acpi_pcct_shared_memory tmp = {
> + .signature = PCC_SIGNATURE | hdev->chan_id,
> + .command = cmd,
> + .status = 0,
I'd not bother initializing status. C will do it for you and it's a fairly
sensible default.
> + };
> +
> + memcpy_toio(hdev->cl_info.pcc_comm_addr, (void *)&tmp,
Should never need to cast to or from a void *
> + sizeof(struct acpi_pcct_shared_memory));
sizeof(tmp)
> +
> + /* Copy the message to the PCC comm space */
> + memcpy_toio(comm_space, (void *)desc, space_size);
If desc is less than the size of the memspace this is going to copy in
extra. Maybe that's prevented somewhere but it's not obvious to me.
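If there's no guarantee elsewhere, clamping the length avoids copying past the descriptor. A plain-C sketch of what a min_t()-style clamp would do in the driver (names here are illustrative, not from the patch):

```c
#include <stddef.h>
#include <string.h>

/* Copy at most the smaller of the descriptor size and the available
 * communication space, so no bytes beyond the descriptor are read. */
static size_t copy_desc(void *dst, size_t space_size,
			const void *src, size_t desc_size)
{
	size_t n = desc_size < space_size ? desc_size : space_size;

	memcpy(dst, src, n);
	return n;
}
```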
> +}
> +
> +static inline void hisi_mem_fill_ext_pcc_shared_mem_region(struct hisi_mem_dev *hdev,
> + u8 cmd,
> + struct hisi_mem_desc *desc,
> + void __iomem *comm_space,
> + u16 space_size)
> +{
> + struct acpi_pcct_ext_pcc_shared_memory tmp = {
> + .signature = PCC_SIGNATURE | hdev->chan_id,
> + .flags = PCC_CMD_COMPLETION_NOTIFY,
> + .length = HISI_MEM_PCC_SHARE_MEM_BYTES,
> + .command = cmd,
> + };
> +
> + memcpy_toio(hdev->cl_info.pcc_comm_addr, (void *)&tmp,
Should be no point in void * casts.
> + sizeof(struct acpi_pcct_ext_pcc_shared_memory));
sizeof(tmp)
> +
> + /* Copy the message to the PCC comm space */
> + memcpy_toio(comm_space, (void *)desc, space_size);
> +}
> +
> +static const struct hisi_mem_verspecific_data hisi04b1_verspec_data = {
> + .rx_callback = NULL,
> + .wait_cmd_complete = hisi_mem_wait_cmd_complete_by_poll,
> + .fill_pcc_shared_mem = hisi_mem_fill_pcc_shared_mem_region,
> + .shared_mem_size = sizeof(struct acpi_pcct_shared_memory),
> + .has_txdone_irq = false,
> +};
> +
> +static const struct hisi_mem_verspecific_data hisi04b2_verspec_data = {
> + .rx_callback = hisi_mem_pcc_rx_callback,
> + .wait_cmd_complete = hisi_mem_wait_cmd_complete_by_irq,
> + .fill_pcc_shared_mem = hisi_mem_fill_ext_pcc_shared_mem_region,
> + .shared_mem_size = sizeof(struct acpi_pcct_ext_pcc_shared_memory),
> + .has_txdone_irq = true,
> +};
> +
> +static int hisi_mem_pcc_cmd_send(struct hisi_mem_dev *hdev, u8 cmd,
> + struct hisi_mem_desc *desc)
The operation isn't done in place, so I'd split this into request and response.
They can use the same data if that actually makes sense. I'm not sure it does though.
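One possible shape for that split, with the request and response typed all the way through (user-space sketch only; fw_exec() stands in for the PCC round trip and every name here is hypothetical):

```c
#include <stdint.h>

/* Hypothetical typed request/response, replacing the in-place union. */
struct mem_req {
	uint64_t paddr;
	uint8_t subcmd;
};

struct mem_rsp {
	uint8_t ret_status;	/* 0: success */
};

/* Fake "firmware": succeeds only for subcommand 0. */
static void fw_exec(const struct mem_req *req, struct mem_rsp *rsp)
{
	rsp->ret_status = (req->subcmd == 0) ? 0 : 3;
}

/* The send path takes a const request and fills a separate response,
 * so callers can't confuse the two views of one buffer. */
static int cmd_send(const struct mem_req *req, struct mem_rsp *rsp)
{
	fw_exec(req, rsp);	/* stands in for mailbox send + wait + copy */
	return 0;
}
```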
> +{
> + const struct hisi_mem_verspecific_data *verspec_data = hdev->verspec_data;
> + struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
> + struct platform_device *pdev = hdev->pdev;
> + void __iomem *comm_space;
> + u16 space_size;
> + int ret;
> +
> + comm_space = cl_info->pcc_comm_addr + verspec_data->shared_mem_size;
> + space_size = HISI_MEM_PCC_SHARE_MEM_BYTES - verspec_data->shared_mem_size;
> + verspec_data->fill_pcc_shared_mem(hdev, cmd, desc,
> + comm_space, space_size);
> + if (verspec_data->has_txdone_irq)
> + reinit_completion(&cl_info->done);
> +
> + /* Ring doorbell */
> + ret = mbox_send_message(cl_info->mbox_chan, &cmd);
> + if (ret < 0) {
> + dev_err(&pdev->dev, "Send PCC mbox message failed, ret = %d.\n",
> + ret);
> + goto end;
> + }
> +
> + ret = verspec_data->wait_cmd_complete(hdev);
> + if (ret)
> + goto end;
> +
> + /* Copy response data */
> + memcpy_fromio((void *)desc, comm_space, space_size);
Should be no need to cast to void *
> +
> +end:
> + if (verspec_data->has_txdone_irq)
> + mbox_chan_txdone(cl_info->mbox_chan, ret);
> + else
> + mbox_client_txdone(cl_info->mbox_chan, ret);
> + return ret;
> +}
> +
> +int hisi_mem_do_acls_query(struct hisi_mem_dev *hdev, int *status, u64 paddr)
> +{
> + struct platform_device *pdev = hdev->pdev;
> + struct hisi_mem_desc desc;
> + int ret;
> +
> + memset(&desc, 0, sizeof(desc));
> + desc.req.paddr = paddr;
> + desc.req.subcmd = HISI_MEM_RAS_QUERY_CAP;
> + ret = hisi_mem_pcc_cmd_send(hdev, HISI_MEM_RAS_ACLS, &desc);
Similar comments to below.
> + if (ret) {
> + dev_err(&pdev->dev, "do acls query failed, ret = %d.\n", ret);
> + return ret;
> + }
> +
> + *status = desc.rsp.retStatus;
> + return 0;
> +}
> +
> +static ssize_t acls_query_store(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct hisi_mem_dev *hdev, *tmp;
> + int status, ret;
> + u64 paddr;
> +
> + if (kstrtoull(buf, 16, &paddr))
> + return -EINVAL;
> +
> + list_for_each_entry_safe(hdev, tmp, &mras->list, list) {
> + struct platform_device *pdev = hdev->pdev;
> +
> + if (paddr < hdev->paddr_min || paddr > hdev->paddr_max)
> + continue;
> +
> + ret = hisi_mem_do_acls_query(hdev, &status, paddr);
> + if (ret)
> + return ret;
> +
> + if (status == HISI_MEM_ACLS_OK) {
> + return count;
As below, I'd rather see errors out of line and the good path inline:

	if (status == HISI_MEM_ACLS_QUERY_NO_RES) {
		dev_info(&pdev->dev, "acls query no resource left.\n");
		return -ENOSPC;
	} else if (status != HISI_MEM_ACLS_OK) {
		dev_err(&pdev->dev, "acls query error code %d.\n", status);
		return -EIO;
	}
	return count;

It's a minor thing but if people are reviewing a lot of code this
convention saves them thinking about what is the good path and what is the bad.
> + } else if (status == HISI_MEM_ACLS_QUERY_NO_RES) {
> + dev_info(&pdev->dev, "acls query no resource left.\n");
> + return -ENOSPC;
> + }
> +
> + dev_err(&pdev->dev, "acls query error code %d.\n", status);
> + return -EIO;
> + }
> +
> + return -ENODEV;
> +}
> +
> +static struct kobj_attribute acls_query_store_attribute =
> + __ATTR(acls_query, 0200, NULL, acls_query_store);
> +
> +int hisi_mem_do_acls_repair(struct hisi_mem_dev *hdev, int *status, u64 paddr)
> +{
> + struct hisi_mem_desc desc;
> + struct platform_device *pdev;
> + int ret;
> +
> + memset(&desc, 0, sizeof(desc));
> + desc.req.paddr = paddr;
> + desc.req.subcmd = HISI_MEM_RAS_DO_REPAIR;
	struct hisi_mem_desc desc = {
		.req = {
			.paddr = paddr,
			.subcmd = HISI_MEM_RAS_DO_REPAIR,
		},
	};

Probably better than memset unless the descriptor has holes that
we are relying on filling with zero. If so, add a comment.
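Worth noting that designated initializers do zero every member that isn't named (only padding bytes are formally unspecified). A quick user-space check, with the struct layout mirrored here just for illustration:

```c
#include <stdint.h>

/* Mirror of the driver's request descriptor layout, rebuilt here only
 * to show that unnamed members come out zeroed. */
struct req_desc {
	uint64_t paddr;
	uint8_t subcmd;
	uint8_t rsv[3];
};

static struct req_desc make_req(uint64_t paddr, uint8_t subcmd)
{
	struct req_desc d = {
		.paddr = paddr,
		.subcmd = subcmd,
		/* .rsv not named: zero-initialized per C11 6.7.9 */
	};
	return d;
}
```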
> + ret = hisi_mem_pcc_cmd_send(hdev, HISI_MEM_RAS_ACLS, &desc);
As below, I'd use separate storage for req and rsp so we can type them
all the way through. They are tiny so who cares about overhead ;)
> + if (ret) {
> + dev_err(&pdev->dev, "do acls repair failed, ret = %d.\n", ret);
> + return ret;
> + }
> +
> + *status = desc.rsp.retStatus;
> + return 0;
> +}
> +
> +static ssize_t acls_repair_store(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct hisi_mem_dev *hdev, *tmp;
> + u64 paddr, ret;
> + int status;
> +
> + if (kstrtoull(buf, 16, &paddr))
> + return -EINVAL;
> +
> + list_for_each_entry_safe(hdev, tmp, &mras->list, list) {
> + struct platform_device *pdev = hdev->pdev;
> +
> + if (paddr < hdev->paddr_min || paddr > hdev->paddr_max)
> + continue;
Hmm. So this is why the unified control.
I suspect the question an upstream proposal would get is why your
instances can't all have their own controls + paddr_min/max exposed in sysfs,
then issue repair on the right ones only.
A generic interface is going to include info on granularity as well.
How big is the repair?
> +
> + ret = hisi_mem_do_acls_repair(hdev, &status, paddr);
> + if (ret)
> + return ret;
> +
> + if (status == HISI_MEM_ACLS_OK)
> + return count;
Flip this so the error condition is out of line, not the good one.
> +
> + dev_err(&pdev->dev, "acls repair error code %d.\n", status);
> + return -EIO;
> + }
> +
> + return -ENODEV;
> +}
> +static struct kobj_attribute acls_repair_store_attribute =
> + __ATTR(acls_repair, 0200, NULL, acls_repair_store);
> +
> +static struct attribute *acls_attrs[] = {
> + &acls_query_store_attribute.attr,
> + &acls_repair_store_attribute.attr,
> + NULL,
No comma after null terminators.
> +};
> +
> +static struct attribute_group acls_attr_group = {
> + .attrs = acls_attrs,
> +};
As these are associated with init / exit, not probe / remove, move
them down to make that association more obvious.
> +
> +static int hisi_mem_ras_probe(struct platform_device *pdev)
> +{
> + struct hisi_mem_dev *hdev;
> + int ret = -ENOMEM;
> +
> + hdev = kzalloc(sizeof(struct hisi_mem_dev), GFP_KERNEL);
> + if (!hdev)
> + return -ENOMEM;
> +
> + hdev->pdev = pdev;
> + ret = hisi_mem_get_pcc_chan_id(hdev);
> + if (ret)
> + goto free_hdev;
> +
> + ret = hisi_mem_register_pcc_channel(hdev);
This lot all needs tearing down in remove..
> + if (ret)
> + goto free_hdev;
> +
> + platform_set_drvdata(pdev, hdev);
> + list_add_tail(&hdev->list, &mras->list);
> + hdev->verspec_data = &hisi04b2_verspec_data;
> + return 0;
> +
> +free_hdev:
> + kfree(hdev);
> + return ret;
> +}
> +
> +static int hisi_mem_ras_remove(struct platform_device *pdev)
> +{
> + struct hisi_mem_dev *hdev = platform_get_drvdata(pdev);
> +
> + platform_set_drvdata(pdev, NULL);
Should be no need to do that.
> + list_del(&hdev->list);
Use a devm_add_action_or_reset() for this.
> +
> + return 0;
> +}
> +
> +static const struct acpi_device_id hisi_mem_acpi_match[] = {
> + { "HISI0FF1", 0},
> + { },
No need for comma on null terminator.
> +};
> +MODULE_DEVICE_TABLE(acpi, hisi_mem_acpi_match);
> +
> +static struct platform_driver hisi_mem_ras_driver = {
> + .probe = hisi_mem_ras_probe,
> + .remove = hisi_mem_ras_remove,
> + .driver = {
> + .name = "hisi_mem_ras",
> + .acpi_match_table = hisi_mem_acpi_match,
> + },
> +};
> +
> +static int __init hisi_mem_ras_init(void)
> +{
> + int ret;
> +
> + mras = kzalloc(sizeof(struct hisi_mem_ras), GFP_KERNEL);
sizeof(*mras)
> + if (!mras)
> + return -ENOMEM;
> +
> + mras->acls_kobj = kobject_create_and_add("acls", kernel_kobj);
> + if (!mras->acls_kobj) {
> + kfree(mras);
> + return -ENOMEM;
> + }
> +
> + ret = sysfs_create_group(mras->acls_kobj, &acls_attr_group);
> + if (ret) {
> + mras->acls_kobj = NULL;
> + kobject_put(mras->acls_kobj);
Put something you just made NULL?
Use a goto for this stuff so it's consistent.
> + return ret;
> + }
This is unusual to see. I guess ABI docs will make it clear
what the intent is. Gut feeling is it won't be upstreamable though.
If the aim is bringing the various PPR control instances together then
that's what the RAS feature control proposal is all about.
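On the earlier error-path points: the init function wants the usual goto ladder, releasing in reverse order, and the kobject must be put before (not after) the pointer is cleared. A minimal stand-alone sketch of the ladder convention (resource names are fake, purely for illustration):

```c
/* Tracks which fake resources were released, so the ladder is testable. */
static int freed[2];

static void release(int id)
{
	freed[id] = 1;
}

/* fail_at selects which acquisition fails; everything acquired before
 * it is released in reverse order via the goto ladder. */
static int init_demo(int fail_at)
{
	/* acquire A (index 0) */
	if (fail_at == 0)
		return -1;
	/* acquire B (index 1) */
	if (fail_at == 1)
		goto err_release_a;
	/* acquire C */
	if (fail_at == 2)
		goto err_release_b;
	return 0;

err_release_b:
	release(1);
err_release_a:
	release(0);
	return -1;
}
```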
> +
> + ret = platform_driver_register(&hisi_mem_ras_driver);
> + if (ret) {
> + sysfs_remove_group(mras->acls_kobj, &acls_attr_group);
> + kobject_put(mras->acls_kobj);
> + kfree(mras);
> + return ret;
> + }
> +
> + INIT_LIST_HEAD(&mras->list);
> +
> + return 0;
> +}
> +module_init(hisi_mem_ras_init);
> +
> +static void __exit hisi_mem_ras_exit(void)
> +{
> + platform_driver_unregister(&hisi_mem_ras_driver);
> + sysfs_remove_group(mras->acls_kobj, &acls_attr_group);
> + kobject_put(mras->acls_kobj);
> + kfree(mras);
> +}
> +module_exit(hisi_mem_ras_exit);
> +
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Xiaofei Tan <tanxiaofei(a)huawei.com>");
> +MODULE_DESCRIPTION("HISILICON Memory RAS driver");
> +MODULE_ALIAS("platform:" DRV_NAME);
> diff --git a/drivers/soc/hisilicon/hisi_mem_ras.h b/drivers/soc/hisilicon/hisi_mem_ras.h
> new file mode 100644
> index 000000000000..a9acbb0c1012
> --- /dev/null
> +++ b/drivers/soc/hisilicon/hisi_mem_ras.h
> @@ -0,0 +1,84 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) Huawei Technologies Co., Ltd. 2023. All rights reserved.
If posting somewhere, update the year.
> + */
> +
> +#ifndef _HISI_MEM_RAS_H
> +#define _HISI_MEM_RAS_H
> +
> +struct hisi_mem_ras {
> + struct list_head list;
> + struct kobject *acls_kobj;
> +};
> +
> +struct hisi_mem_mbox_client_info {
> + struct mbox_client client;
> + struct mbox_chan *mbox_chan;
> + struct pcc_mbox_chan *pcc_chan;
> + u64 deadline_us;
> + void __iomem *pcc_comm_addr;
> + struct completion done;
> +};
> +
> +struct hisi_mem_desc;
> +struct hisi_mem_dev;
> +
> +struct hisi_mem_verspecific_data {
> + void (*rx_callback)(struct mbox_client *cl, void *mssg);
> + int (*wait_cmd_complete)(struct hisi_mem_dev *hdev);
> + void (*fill_pcc_shared_mem)(struct hisi_mem_dev *hdev,
> + u8 cmd, struct hisi_mem_desc *desc,
As above, I'm not convinced a fill should take the union. You only
fill with requests.
> + void __iomem *comm_space,
> + u16 space_size);
> + u16 shared_mem_size;
> + bool has_txdone_irq;
> +};
> +
> +struct hisi_mem_dev {
> + struct list_head list;
> + struct platform_device *pdev;
> + //struct acpi_device *acpi_dev;
Tidy that up.
> + const struct hisi_mem_verspecific_data *verspec_data;
> + /* device capabilities from firmware, like hisi_mem_CAPS_xxx. */
> + u64 caps;
> + u8 chan_id;
> + u64 paddr_min;
> + u64 paddr_max;
> + struct mutex lock;
> + struct hisi_mem_mbox_client_info cl_info;
> +};
> +
> +#define HISI_MEM_PCC_SHARE_MEM_BYTES 128
> +enum hisi_mem_ras_cmd_type {
> + HISI_MEM_RAS_ACLS = 0x4,
> + HISI_MEM_RAS_PPR = 0X5,
> +};
> +
> +enum hisi_mem_ras_subcmd_type {
> + HISI_MEM_RAS_QUERY_CAP = 0x0,
> + HISI_MEM_RAS_DO_REPAIR = 0x1,
> +};
> +
> +struct hisi_mem_req_desc {
> + u64 paddr;
> + u8 subcmd;
> + u8 rsv[3];
> +};
> +
> +#define HISI_MEM_ACLS_OK 0
> +#define HISI_MEM_ACLS_QUERY_NO_RES 1
> +#define HISI_MEM_ACLS_REPAIR_FAIL 2
> +#define HISI_MEM_ACLS_EINVAL 3
> +struct hisi_mem_rsp_desc {
> + u8 retStatus; /* 0: success, other: failure */
> + u8 rsv[7];
> +};
> +
> +struct hisi_mem_desc {
> + union {
> + struct hisi_mem_req_desc req;
> + struct hisi_mem_rsp_desc rsp;
> + };
> +};
> +
> +#endif
18 Jul '24
Hi Tanxiaofei,
Some more comments inline.
>-----Original Message-----
>From: tanxiaofei <tanxiaofei(a)huawei.com>
>Sent: 09 July 2024 10:34
>To: Shiju Jose <shiju.jose(a)huawei.com>; Jonathan Cameron
><jonathan.cameron(a)huawei.com>; Mauro Carvalho Chehab
><M.Chehab(a)huawei.com>; Roberto Sassu <roberto.sassu(a)huawei.com>;
>Guohanjun (Hanjun Guo) <guohanjun(a)huawei.com>
>Cc: linuxarm(a)openeuler.org; tanxiaofei <tanxiaofei(a)huawei.com>
>Subject: [PATCH] soc: hisilicon: Support memory repair driver on Kunpeng SoC
>
>Kunpeng SoC support several memory repair capability. Normally, such repairs
>are done in firmware startup stage or triggered by BMC(Baseboard
>Management Controller) and done in firmware in runtime stage. OS can't
>perceive such capability.
>
>Some support online repair, while others can't. One reason of not supporting
>online repair is that the memory is in use and the hardware does not support
>backpressure on the primary access channel.
>
>In order to support more online memory repair, we seek solutions from OS. That
>is try to isolate the memory page first, and do repair, and de-isloate after repair
>done.
>
>This patch support online ACLS(Adapter Cache Line Sparing) repair, and may add
>other repair such as PPR in future.
>
>Signed-off-by: Xiaofei Tan <tanxiaofei(a)huawei.com>
>---
> drivers/soc/hisilicon/Kconfig | 10 +
> drivers/soc/hisilicon/Makefile | 1 +
> drivers/soc/hisilicon/hisi_mem_ras.c | 553 +++++++++++++++++++++++++++
>drivers/soc/hisilicon/hisi_mem_ras.h | 84 ++++
> 4 files changed, 648 insertions(+)
> create mode 100644 drivers/soc/hisilicon/hisi_mem_ras.c
> create mode 100644 drivers/soc/hisilicon/hisi_mem_ras.h
>
>diff --git a/drivers/soc/hisilicon/Kconfig b/drivers/soc/hisilicon/Kconfig index
>ae8e55a776e0..983cafc38e5c 100644
>--- a/drivers/soc/hisilicon/Kconfig
>+++ b/drivers/soc/hisilicon/Kconfig
>@@ -44,4 +44,14 @@ config HISI_HBMCACHE
> To compile the driver as a module, choose M here:
> the module will be called hisi_hbmcache.
>
>+config HISI_MEM_RAS
>+ tristate "Add support for HISI Memory repair"
>+ depends on ACPI
>+ help
>+ Add Memory repair support for Hisilicon device, which can be used to
>query
>+ memory repair cap and repair hardware error in memory devices. This
>feature
>+ need to work with hardware firmwares.
>+
>+ If not sure say no.
>+
> endmenu
>diff --git a/drivers/soc/hisilicon/Makefile b/drivers/soc/hisilicon/Makefile index
>4ef8823250e8..b339a67bcd77 100644
>--- a/drivers/soc/hisilicon/Makefile
>+++ b/drivers/soc/hisilicon/Makefile
>@@ -6,3 +6,4 @@ obj-$(CONFIG_KUNPENG_HCCS) += kunpeng_hccs.o
> obj-$(CONFIG_HISI_HBMDEV) += hisi_hbmdev.o
> obj-$(CONFIG_HISI_HBMCACHE) += hisi_hbmcache.o
> obj-$(CONFIG_ARM64_PBHA) += pbha.o
>+obj-$(CONFIG_HISI_MEM_RAS) += hisi_mem_ras.o
>diff --git a/drivers/soc/hisilicon/hisi_mem_ras.c
>b/drivers/soc/hisilicon/hisi_mem_ras.c
>new file mode 100644
>index 000000000000..a29c41ccdba1
>--- /dev/null
>+++ b/drivers/soc/hisilicon/hisi_mem_ras.c
>@@ -0,0 +1,553 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/*
>+ * Copyright (C) Huawei Technologies Co., Ltd. 2024. All rights reserved.
>+ */
>+
>+#include <linux/kobject.h>
>+#include <linux/module.h>
>+#include <linux/acpi.h>
>+#include <linux/iopoll.h>
>+#include <linux/platform_device.h>
>+#include <linux/mm.h>
>+#include <acpi/pcc.h>
>+
>+#include "hisi_mem_ras.h"
>+#define DRV_NAME "hisi_mem_ras"
>+
>+#define HISI_MEM_PCC_CMD_WAIT_RETRIES_NUM 500ULL
>+#define HISI_MEM_POLL_STATUS_TIME_INTERVAL_US 3
>+
>+static struct hisi_mem_ras *mras;
>+
>+struct hisi_mem_register_ctx {
>+ struct device *dev;
>+ u8 chan_id;
>+ int err;
>+ u64 paddr_min;
>+ u64 paddr_max;
>+};
>+
>+static acpi_status hisi_mem_get_chan_id_cb(struct acpi_resource *ares,
>+ void *context)
>+{
>+ struct acpi_resource_generic_register *reg;
>+ struct hisi_mem_register_ctx *ctx = context;
>+
>+ if (ares->type != ACPI_RESOURCE_TYPE_GENERIC_REGISTER)
>+ return AE_OK;
>+
>+ reg = &ares->data.generic_reg;
>+ if (reg->space_id != ACPI_ADR_SPACE_PLATFORM_COMM) {
>+ dev_err(ctx->dev, "Bad register resource.\n");
>+ ctx->err = -EINVAL;
>+ return AE_ERROR;
>+ }
>+ ctx->chan_id = reg->access_size;
Passing PCC channel ID in register access size may not be correct.
>+
>+ return AE_OK;
>+}
>+
>+static int hisi_mem_get_pcc_chan_id(struct hisi_mem_dev *hdev) {
>+ struct platform_device *pdev = hdev->pdev;
>+ struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
>+ struct hisi_mem_register_ctx ctx = {0};
>+ acpi_handle handle = adev->handle;
>+ acpi_status status;
>+
>+ if (!acpi_has_method(handle, METHOD_NAME__CRS)) {
>+ dev_err(&pdev->dev, "No _CRS method.\n");
>+ return -ENODEV;
>+ }
>+
>+ ctx.dev = &pdev->dev;
>+ status = acpi_walk_resources(handle, METHOD_NAME__CRS,
>+ hisi_mem_get_chan_id_cb, &ctx);
>+ if (ACPI_FAILURE(status))
>+ return ctx.err;
>+
Please add the following here, as hdev->chan_id is not set:
	hdev->chan_id = ctx.chan_id;
>+ return 0;
>+}
>+
>+static acpi_status hisi_mem_get_paddr_range_cb(struct acpi_resource *ares,
>+ void *context)
>+{
>+ struct hisi_mem_register_ctx *ctx = context;
>+ struct acpi_resource_address64 *addr64;
>+
>+ if (ares->type != ACPI_RESOURCE_TYPE_ADDRESS64)
>+ return AE_OK;
>+
>+ addr64 = &ares->data.address64;
>+ ctx->paddr_min = addr64->address.minimum;
>+ ctx->paddr_max = addr64->address.maximum;
>+
>+ return AE_OK;
>+}
>+
>+static int hisi_mem_get_paddr_range(struct hisi_mem_dev *hdev) {
>+ struct platform_device *pdev = hdev->pdev;
>+ struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
>+ struct hisi_mem_register_ctx ctx = {0};
>+ acpi_handle handle = adev->handle;
>+ acpi_status status;
>+
>+ if (!acpi_has_method(handle, METHOD_NAME__CRS)) {
>+ dev_err(&pdev->dev, "No _CRS method.\n");
>+ return -ENODEV;
>+ }
>+
>+ ctx.dev = &pdev->dev;
>+ status = acpi_walk_resources(handle, METHOD_NAME__CRS,
>+ hisi_mem_get_paddr_range_cb, &ctx);
>+ if (ACPI_FAILURE(status))
>+ return ctx.err;
>+
>+ hdev->paddr_min = ctx.paddr_min;
>+ hdev->paddr_max = ctx.paddr_max;
>+ return 0;
>+}
>+
>+static void hisi_mem_chan_tx_done(struct mbox_client *cl, void *msg,
>+int ret) {
>+ if (ret < 0)
>+ pr_debug("TX did not complete: CMD sent:0x%x, ret:%d\n",
>+ *(u8 *)msg, ret);
>+ else
>+ pr_debug("TX completed. CMD sent:0x%x, ret:%d\n",
>+ *(u8 *)msg, ret);
>+}
>+
>+static void hisi_mem_pcc_rx_callback(struct mbox_client *cl, void
>+*mssg) {
>+ struct hisi_mem_mbox_client_info *cl_info =
>+ container_of(cl, struct hisi_mem_mbox_client_info,
>client);
>+
>+ complete(&cl_info->done);
>+}
>+
>+static void hisi_mem_unregister_pcc_channel(struct hisi_mem_dev *hdev)
>+{
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+
>+ if (cl_info->pcc_comm_addr)
>+ iounmap(cl_info->pcc_comm_addr);
>+ pcc_mbox_free_channel(hdev->cl_info.pcc_chan);
>+}
>+
>+static int hisi_mem_register_pcc_channel(struct hisi_mem_dev *hdev) {
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct mbox_client *cl = &cl_info->client;
>+ struct pcc_mbox_chan *pcc_chan;
>+ struct platform_device *pdev = hdev->pdev;
>+ int rc;
>+
>+ cl->dev = &pdev->dev;
>+ cl->tx_block = false;
>+ cl->knows_txdone = true;
>+ cl->tx_done = hisi_mem_chan_tx_done;
>+ cl->rx_callback = hdev->verspec_data->rx_callback;
>+ init_completion(&cl_info->done);
>+
>+ pcc_chan = pcc_mbox_request_channel(cl, hdev->chan_id);
Where is hdev->chan_id being set? I think it is the same ctx.chan_id set in
hisi_mem_get_chan_id_cb()?
>+ if (IS_ERR(pcc_chan)) {
>+ dev_err(&pdev->dev, "PPC channel request failed.\n");
PPC -> PCC
>+ rc = -ENODEV;
>+ goto out;
>+ }
>+ cl_info->pcc_chan = pcc_chan;
>+ cl_info->mbox_chan = pcc_chan->mchan;
>+
>+ /*
>+ * pcc_chan->latency is just a nominal value. In reality the remote
>+ * processor could be much slower to reply. So add an arbitrary amount
>+ * of wait on top of nominal.
>+ */
>+ cl_info->deadline_us =
>+ HISI_MEM_PCC_CMD_WAIT_RETRIES_NUM * pcc_chan->latency;
>+ if (!hdev->verspec_data->has_txdone_irq &&
>+ cl_info->mbox_chan->mbox->txdone_irq) {
>+ dev_err(&pdev->dev, "PCC IRQ in PCCT is enabled.\n");
>+ rc = -EINVAL;
>+ goto err_mbx_channel_free;
>+ } else if (hdev->verspec_data->has_txdone_irq &&
>+ !cl_info->mbox_chan->mbox->txdone_irq) {
>+ dev_err(&pdev->dev, "PCC IRQ in PCCT isn't supported.\n");
>+ rc = -EINVAL;
>+ goto err_mbx_channel_free;
>+ }
>+
>+ if (pcc_chan->shmem_base_addr) {
>+ cl_info->pcc_comm_addr = ioremap(pcc_chan->shmem_base_addr,
>+ pcc_chan->shmem_size);
>+ if (!cl_info->pcc_comm_addr) {
>+ dev_err(&pdev->dev, "Failed to ioremap PCC communication region for channel-%u.\n",
>+ hdev->chan_id);
>+ rc = -ENOMEM;
>+ goto err_mbx_channel_free;
>+ }
>+ }
>+
>+ return 0;
>+
>+err_mbx_channel_free:
>+ pcc_mbox_free_channel(cl_info->pcc_chan);
>+out:
>+ return rc;
>+}
>+
>+
>+static int hisi_mem_wait_cmd_complete_by_poll(struct hisi_mem_dev
>+*hdev) {
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct acpi_pcct_shared_memory __iomem *comm_base =
>+ cl_info->pcc_comm_addr;
>+ struct platform_device *pdev = hdev->pdev;
>+ u16 status;
>+ int ret;
>+
>+ /*
>+ * Poll PCC status register every 3us(delay_us) for maximum of
>+ * deadline_us(timeout_us) until PCC command complete bit is set(cond)
>+ */
>+ ret = readw_poll_timeout(&comm_base->status, status,
>+ status & PCC_STATUS_CMD_COMPLETE,
>+ HISI_MEM_POLL_STATUS_TIME_INTERVAL_US,
>+ cl_info->deadline_us);
>+ if (unlikely(ret))
>+ dev_err(&pdev->dev, "poll PCC status failed, ret = %d.\n", ret);
>+
>+ return ret;
>+}
>+
>+static int hisi_mem_wait_cmd_complete_by_irq(struct hisi_mem_dev *hdev)
>+{
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct platform_device *pdev = hdev->pdev;
>+
>+ if (!wait_for_completion_timeout(&cl_info->done,
>+ usecs_to_jiffies(cl_info->deadline_us))) {
>+ dev_err(&pdev->dev, "PCC command executed timeout!\n");
>+ return -ETIMEDOUT;
>+ }
>+
>+ return 0;
>+}
>+
>+static inline void hisi_mem_fill_pcc_shared_mem_region(struct hisi_mem_dev
>*hdev,
>+ u8 cmd,
>+ struct hisi_mem_desc *desc,
>+ void __iomem *comm_space,
>+ u16 space_size)
>+{
>+ struct acpi_pcct_shared_memory tmp = {
>+ .signature = PCC_SIGNATURE | hdev->chan_id,
>+ .command = cmd,
>+ .status = 0,
>+ };
>+
>+ memcpy_toio(hdev->cl_info.pcc_comm_addr, (void *)&tmp,
>+ sizeof(struct acpi_pcct_shared_memory));
>+
>+ /* Copy the message to the PCC comm space */
>+ memcpy_toio(comm_space, (void *)desc, space_size); }
>+
>+static inline void hisi_mem_fill_ext_pcc_shared_mem_region(struct
>hisi_mem_dev *hdev,
>+ u8 cmd,
>+ struct hisi_mem_desc
>*desc,
>+ void __iomem
>*comm_space,
>+ u16 space_size)
>+{
>+ struct acpi_pcct_ext_pcc_shared_memory tmp = {
>+ .signature = PCC_SIGNATURE | hdev->chan_id,
>+ .flags = PCC_CMD_COMPLETION_NOTIFY,
>+ .length = HISI_MEM_PCC_SHARE_MEM_BYTES,
>+ .command = cmd,
>+ };
>+
>+ memcpy_toio(hdev->cl_info.pcc_comm_addr, (void *)&tmp,
>+ sizeof(struct acpi_pcct_ext_pcc_shared_memory));
>+
>+ /* Copy the message to the PCC comm space */
>+ memcpy_toio(comm_space, (void *)desc, space_size); }
>+
>+static const struct hisi_mem_verspecific_data hisi04b1_verspec_data = {
>+ .rx_callback = NULL,
>+ .wait_cmd_complete = hisi_mem_wait_cmd_complete_by_poll,
>+ .fill_pcc_shared_mem = hisi_mem_fill_pcc_shared_mem_region,
>+ .shared_mem_size = sizeof(struct acpi_pcct_shared_memory),
>+ .has_txdone_irq = false,
>+};
>+
>+static const struct hisi_mem_verspecific_data hisi04b2_verspec_data = {
>+ .rx_callback = hisi_mem_pcc_rx_callback,
>+ .wait_cmd_complete = hisi_mem_wait_cmd_complete_by_irq,
>+ .fill_pcc_shared_mem = hisi_mem_fill_ext_pcc_shared_mem_region,
>+ .shared_mem_size = sizeof(struct acpi_pcct_ext_pcc_shared_memory),
>+ .has_txdone_irq = true,
>+};
>+
>+static int hisi_mem_pcc_cmd_send(struct hisi_mem_dev *hdev, u8 cmd,
>+ struct hisi_mem_desc *desc)
>+{
>+ const struct hisi_mem_verspecific_data *verspec_data = hdev->verspec_data;
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct platform_device *pdev = hdev->pdev;
>+ void __iomem *comm_space;
>+ u16 space_size;
>+ int ret;
>+
>+ comm_space = cl_info->pcc_comm_addr + verspec_data->shared_mem_size;
>+ space_size = HISI_MEM_PCC_SHARE_MEM_BYTES - verspec_data->shared_mem_size;
>+ verspec_data->fill_pcc_shared_mem(hdev, cmd, desc,
>+ comm_space, space_size);
>+ if (verspec_data->has_txdone_irq)
>+ reinit_completion(&cl_info->done);
>+
>+ /* Ring doorbell */
>+ ret = mbox_send_message(cl_info->mbox_chan, &cmd);
>+ if (ret < 0) {
>+ dev_err(&pdev->dev, "Send PCC mbox message failed, ret =
>%d.\n",
>+ ret);
>+ goto end;
>+ }
>+
>+ ret = verspec_data->wait_cmd_complete(hdev);
>+ if (ret)
>+ goto end;
>+
>+ /* Copy response data */
>+ memcpy_fromio((void *)desc, comm_space, space_size);
>+
>+end:
>+ if (verspec_data->has_txdone_irq)
>+ mbox_chan_txdone(cl_info->mbox_chan, ret);
>+ else
>+ mbox_client_txdone(cl_info->mbox_chan, ret);
>+ return ret;
>+}
>+
>+int hisi_mem_do_acls_query(struct hisi_mem_dev *hdev, int *status, u64
>+paddr) {
>+ struct platform_device *pdev = hdev->pdev;
>+ struct hisi_mem_desc desc;
>+ int ret;
>+
>+ memset(&desc, 0, sizeof(desc));
>+ desc.req.paddr = paddr;
>+ desc.req.subcmd = HISI_MEM_RAS_QUERY_CAP;
>+ ret = hisi_mem_pcc_cmd_send(hdev, HISI_MEM_RAS_ACLS, &desc);
>+ if (ret) {
>+ dev_err(&pdev->dev, "do acls query failed, ret = %d.\n", ret);
>+ return ret;
>+ }
>+
>+ *status = desc.rsp.retStatus;
>+ return 0;
>+}
>+
>+static ssize_t acls_query_store(struct kobject *kobj,
>+ struct kobj_attribute *attr,
>+ const char *buf, size_t count)
>+{
>+ struct hisi_mem_dev *hdev, *tmp;
>+ int status, ret;
>+ u64 paddr;
>+
>+ if (kstrtoull(buf, 16, &paddr))
>+ return -EINVAL;
>+
>+ list_for_each_entry_safe(hdev, tmp, &mras->list, list) {
>+ struct platform_device *pdev = hdev->pdev;
>+
>+ if (paddr < hdev->paddr_min || paddr > hdev->paddr_max)
>+ continue;
>+
>+ ret = hisi_mem_do_acls_query(hdev, &status, paddr);
>+ if (ret)
>+ return ret;
>+
>+ if (status == HISI_MEM_ACLS_OK) {
>+ return count;
>+ } else if (status == HISI_MEM_ACLS_QUERY_NO_RES) {
>+ dev_info(&pdev->dev, "acls query no resource left.\n");
>+ return -ENOSPC;
>+ }
>+
>+ dev_err(&pdev->dev, "acls query error code %d.\n", status);
>+ return -EIO;
>+ }
>+
>+ return -ENODEV;
>+}
>+
>+static struct kobj_attribute acls_query_store_attribute =
>+ __ATTR(acls_query, 0200, NULL, acls_query_store);
>+
>+int hisi_mem_do_acls_repair(struct hisi_mem_dev *hdev, int *status, u64
>+paddr) {
>+ struct hisi_mem_desc desc;
>+ struct platform_device *pdev;
>+ int ret;
>+
>+ memset(&desc, 0, sizeof(desc));
>+ desc.req.paddr = paddr;
>+ desc.req.subcmd = HISI_MEM_RAS_DO_REPAIR;
>+ ret = hisi_mem_pcc_cmd_send(hdev, HISI_MEM_RAS_ACLS, &desc);
>+ if (ret) {
>+ dev_err(&pdev->dev, "do acls repair failed, ret = %d.\n", ret);
>+ return ret;
>+ }
>+
>+ *status = desc.rsp.retStatus;
>+ return 0;
>+}
>+
>+static ssize_t acls_repair_store(struct kobject *kobj,
>+ struct kobj_attribute *attr,
>+ const char *buf, size_t count)
>+{
>+ struct hisi_mem_dev *hdev, *tmp;
>+ u64 paddr, ret;
>+ int status;
>+
>+ if (kstrtoull(buf, 16, &paddr))
>+ return -EINVAL;
>+
>+ list_for_each_entry_safe(hdev, tmp, &mras->list, list) {
>+ struct platform_device *pdev = hdev->pdev;
>+
>+ if (paddr < hdev->paddr_min || paddr > hdev->paddr_max)
>+ continue;
>+
>+ ret = hisi_mem_do_acls_repair(hdev, &status, paddr);
>+ if (ret)
>+ return ret;
>+
>+ if (status == HISI_MEM_ACLS_OK)
>+ return count;
>+
>+ dev_err(&pdev->dev, "acls repair error code %d.\n", status);
>+ return -EIO;
>+ }
>+
>+ return -ENODEV;
>+}
>+static struct kobj_attribute acls_repair_store_attribute =
>+ __ATTR(acls_repair, 0200, NULL, acls_repair_store);
>+
>+static struct attribute *acls_attrs[] = {
>+ &acls_query_store_attribute.attr,
>+ &acls_repair_store_attribute.attr,
>+ NULL,
>+};
>+
>+static struct attribute_group acls_attr_group = {
>+ .attrs = acls_attrs,
>+};
>+
>+static int hisi_mem_ras_probe(struct platform_device *pdev) {
>+ struct hisi_mem_dev *hdev;
>+ int ret = -ENOMEM;
>+
>+ hdev = kzalloc(sizeof(struct hisi_mem_dev), GFP_KERNEL);
>+ if (!hdev)
>+ return -ENOMEM;
>+
>+ hdev->pdev = pdev;
>+ ret = hisi_mem_get_pcc_chan_id(hdev);
>+ if (ret)
>+ goto free_hdev;
>+
>+ ret = hisi_mem_register_pcc_channel(hdev);
>+ if (ret)
>+ goto free_hdev;
>+
>+ platform_set_drvdata(pdev, hdev);
>+ list_add_tail(&hdev->list, &mras->list);
>+ hdev->verspec_data = &hisi04b2_verspec_data;
1. hdev->verspec_data is being accessed in the above hisi_mem_register_pcc_channel() function.
Need to set hdev->verspec_data before calling hisi_mem_register_pcc_channel()?
2. Both hisi04b1_verspec_data and hisi04b2_verspec_data are defined, seems hisi04b1_verspec_data not used.
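A minimal, self-contained sketch of the reordering suggested in comment 1 (the struct and function names below are simplified stand-ins for the driver's types, not the real kernel API): assign the version-specific data pointer before calling anything that dereferences it.

```c
#include <assert.h>

/* Illustrative stand-ins for the driver structures. */
struct verspec_data { int has_txdone_irq; };

struct mem_dev {
    const struct verspec_data *verspec_data;
    int registered;
};

static const struct verspec_data v2_data = { .has_txdone_irq = 1 };

/* Mirrors hisi_mem_register_pcc_channel(): it reads verspec_data,
 * so the pointer must already be valid when this runs. */
static int register_pcc_channel(struct mem_dev *hdev)
{
    if (!hdev->verspec_data)
        return -1;  /* would be a NULL dereference in the real driver */
    hdev->registered = hdev->verspec_data->has_txdone_irq;
    return 0;
}

/* Fixed probe ordering: set verspec_data first, then register. */
static int probe_fixed(struct mem_dev *hdev)
{
    hdev->verspec_data = &v2_data;
    return register_pcc_channel(hdev);
}
```

With the assignment hoisted above the registration call, the callback and IRQ-mode fields are populated when the PCC channel is set up.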
>+ return 0;
>+
>+free_hdev:
>+ kfree(hdev);
>+ return ret;
>+}
>+
>+static int hisi_mem_ras_remove(struct platform_device *pdev)
>+{
>+ struct hisi_mem_dev *hdev = platform_get_drvdata(pdev);
>+
>+ platform_set_drvdata(pdev, NULL);
>+ list_del(&hdev->list);
>+
>+ return 0;
>+}
>+
>+static const struct acpi_device_id hisi_mem_acpi_match[] = {
>+ { "HISI0FF1", 0},
>+ { },
>+};
>+MODULE_DEVICE_TABLE(acpi, hisi_mem_acpi_match);
>+
>+static struct platform_driver hisi_mem_ras_driver = {
>+ .probe = hisi_mem_ras_probe,
>+ .remove = hisi_mem_ras_remove,
>+ .driver = {
>+ .name = "hisi_mem_ras",
>+ .acpi_match_table = hisi_mem_acpi_match,
>+ },
>+};
>+
>+static int __init hisi_mem_ras_init(void)
>+{
>+ int ret;
>+
>+ mras = kzalloc(sizeof(struct hisi_mem_ras), GFP_KERNEL);
>+ if (!mras)
>+ return -ENOMEM;
>+
>+ mras->acls_kobj = kobject_create_and_add("acls", kernel_kobj);
>+ if (!mras->acls_kobj) {
>+ kfree(mras);
>+ return -ENOMEM;
>+ }
>+
>+ ret = sysfs_create_group(mras->acls_kobj, &acls_attr_group);
>+ if (ret) {
>+ mras->acls_kobj = NULL;
This should be done after below kobject_put.
>+ kobject_put(mras->acls_kobj);
>+ return ret;
>+ }
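The cleanup-ordering point made above can be sketched as a self-contained toy model (kobj_create()/kobj_put() here are illustrative stand-ins, not the kernel's kobject API): drop the reference while the pointer is still valid, and only then clear it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object: put() needs a valid pointer, so clearing the
 * pointer before put() would silently leak the object. */
struct kobj { int refs; };

static int puts_done;

static struct kobj *kobj_create(void)
{
    struct kobj *k = malloc(sizeof(*k));
    if (k)
        k->refs = 1;
    return k;
}

static void kobj_put(struct kobj *k)
{
    if (!k)
        return;  /* put on NULL is a no-op, i.e. the leak case */
    if (--k->refs == 0) {
        free(k);
        puts_done++;
    }
}

/* Fixed error path: put first, clear the pointer second. */
static int init_error_path(struct kobj **slot)
{
    *slot = kobj_create();
    if (!*slot)
        return -1;
    /* ... imagine sysfs_create_group() failing here ... */
    kobj_put(*slot);
    *slot = NULL;
    return -2;  /* propagate the creation error */
}
```

The same shape applies to the driver's init path: `kobject_put(mras->acls_kobj)` must run before `mras->acls_kobj = NULL`.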
>+
>+ ret = platform_driver_register(&hisi_mem_ras_driver);
>+ if (ret) {
>+ sysfs_remove_group(mras->acls_kobj, &acls_attr_group);
>+ kobject_put(mras->acls_kobj);
>+ kfree(mras);
>+ return ret;
>+ }
>+
>+ INIT_LIST_HEAD(&mras->list);
>+
>+ return 0;
>+}
>+module_init(hisi_mem_ras_init);
>+
>+static void __exit hisi_mem_ras_exit(void)
>+{
>+ platform_driver_unregister(&hisi_mem_ras_driver);
>+ sysfs_remove_group(mras->acls_kobj, &acls_attr_group);
>+ kobject_put(mras->acls_kobj);
>+ kfree(mras);
>+}
>+module_exit(hisi_mem_ras_exit);
>+
>+MODULE_LICENSE("GPL v2");
>+MODULE_AUTHOR("Xiaofei Tan <tanxiaofei(a)huawei.com>");
>+MODULE_DESCRIPTION("HISILICON Memory RAS driver");
>+MODULE_ALIAS("platform:" DRV_NAME);
>diff --git a/drivers/soc/hisilicon/hisi_mem_ras.h
>b/drivers/soc/hisilicon/hisi_mem_ras.h
>new file mode 100644
>index 000000000000..a9acbb0c1012
>--- /dev/null
>+++ b/drivers/soc/hisilicon/hisi_mem_ras.h
>@@ -0,0 +1,84 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+/*
>+ * Copyright (C) Huawei Technologies Co., Ltd. 2023. All rights reserved.
>+ */
>+
>+#ifndef _HISI_MEM_RAS_H
>+#define _HISI_MEM_RAS_H
>+
>+struct hisi_mem_ras {
>+ struct list_head list;
>+ struct kobject *acls_kobj;
>+};
>+
>+struct hisi_mem_mbox_client_info {
>+ struct mbox_client client;
>+ struct mbox_chan *mbox_chan;
>+ struct pcc_mbox_chan *pcc_chan;
>+ u64 deadline_us;
>+ void __iomem *pcc_comm_addr;
>+ struct completion done;
>+};
>+
>+struct hisi_mem_desc;
>+struct hisi_mem_dev;
>+
>+struct hisi_mem_verspecific_data {
>+ void (*rx_callback)(struct mbox_client *cl, void *mssg);
>+ int (*wait_cmd_complete)(struct hisi_mem_dev *hdev);
>+ void (*fill_pcc_shared_mem)(struct hisi_mem_dev *hdev,
>+ u8 cmd, struct hisi_mem_desc *desc,
>+ void __iomem *comm_space,
>+ u16 space_size);
>+ u16 shared_mem_size;
>+ bool has_txdone_irq;
>+};
>+
>+struct hisi_mem_dev {
>+ struct list_head list;
>+ struct platform_device *pdev;
>+ //struct acpi_device *acpi_dev;
>+ const struct hisi_mem_verspecific_data *verspec_data;
>+ /* device capabilities from firmware, like hisi_mem_CAPS_xxx. */
>+ u64 caps;
>+ u8 chan_id;
>+ u64 paddr_min;
>+ u64 paddr_max;
>+ struct mutex lock;
>+ struct hisi_mem_mbox_client_info cl_info;
>+};
>+
>+#define HISI_MEM_PCC_SHARE_MEM_BYTES 128
>+enum hisi_mem_ras_cmd_type {
>+ HISI_MEM_RAS_ACLS = 0x4,
>+ HISI_MEM_RAS_PPR = 0X5,
>+};
>+
>+enum hisi_mem_ras_subcmd_type {
>+ HISI_MEM_RAS_QUERY_CAP = 0x0,
>+ HISI_MEM_RAS_DO_REPAIR = 0x1,
>+};
>+
>+struct hisi_mem_req_desc {
>+ u64 paddr;
>+ u8 subcmd;
>+ u8 rsv[3];
>+};
>+
>+#define HISI_MEM_ACLS_OK 0
>+#define HISI_MEM_ACLS_QUERY_NO_RES 1
>+#define HISI_MEM_ACLS_REPAIR_FAIL 2
>+#define HISI_MEM_ACLS_EINVAL 3
>+struct hisi_mem_rsp_desc {
>+ u8 retStatus; /* 0: success, other: failure */
>+ u8 rsv[7];
>+};
>+
>+struct hisi_mem_desc {
>+ union {
>+ struct hisi_mem_req_desc req;
>+ struct hisi_mem_rsp_desc rsp;
>+ };
>+};
>+
>+#endif
>--
>2.33.0
Thanks,
Shiju
10 Jul '24
Hi Tanxiaofei,
Please find few comments inline after quick look.
I will go through in more detail bit later.
Thanks,
Shiju
>-----Original Message-----
>From: tanxiaofei <tanxiaofei(a)huawei.com>
>Sent: 09 July 2024 10:34
>To: Shiju Jose <shiju.jose(a)huawei.com>; Jonathan Cameron
><jonathan.cameron(a)huawei.com>; Mauro Carvalho Chehab
><M.Chehab(a)huawei.com>; Roberto Sassu <roberto.sassu(a)huawei.com>;
>Guohanjun (Hanjun Guo) <guohanjun(a)huawei.com>
>Cc: linuxarm(a)openeuler.org; tanxiaofei <tanxiaofei(a)huawei.com>
>Subject: [PATCH] soc: hisilicon: Support memory repair driver on Kunpeng SoC
>
>Kunpeng SoC supports several memory repair capabilities. Normally, such
>repairs are done in the firmware startup stage, or are triggered by the
>BMC (Baseboard Management Controller) and done in firmware at runtime.
>The OS cannot perceive these capabilities.
>
>Some repairs can be done online, while others can't. One reason for not
>supporting online repair is that the memory is in use and the hardware does
>not support backpressure on the primary access channel.
>
>To support more online memory repair, we seek a solution from the OS: try to
>isolate the memory page first, do the repair, and de-isolate the page after
>the repair is done.
>
>This patch supports online ACLS (Adapter Cache Line Sparing) repair, and may
>add other repairs such as PPR in the future.
>
>Signed-off-by: Xiaofei Tan <tanxiaofei(a)huawei.com>
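The isolate/repair/de-isolate flow described in the commit message, reduced to an abstract sketch (every function name here is a placeholder, not a kernel API — the real driver would use page-isolation machinery and the PCC mailbox command for the repair itself):

```c
#include <assert.h>

enum page_state { PAGE_ONLINE, PAGE_ISOLATED };

static enum page_state state = PAGE_ONLINE;
static int repairs_done;

/* Placeholder steps standing in for page isolation and the ACLS command. */
static void isolate(void)        { state = PAGE_ISOLATED; }
static void deisolate(void)      { state = PAGE_ONLINE; }
static int  do_acls_repair(void) { repairs_done++; return 0; }

/* Online repair: only touch the hardware while the page is out of use. */
static int online_repair(void)
{
    int ret;

    isolate();
    ret = do_acls_repair();
    deisolate();
    return ret;
}
```

The point of the ordering is that the repair command never races with normal accesses to the page being repaired.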
>---
> drivers/soc/hisilicon/Kconfig | 10 +
> drivers/soc/hisilicon/Makefile | 1 +
> drivers/soc/hisilicon/hisi_mem_ras.c | 553 +++++++++++++++++++++++++++
>drivers/soc/hisilicon/hisi_mem_ras.h | 84 ++++
> 4 files changed, 648 insertions(+)
> create mode 100644 drivers/soc/hisilicon/hisi_mem_ras.c
> create mode 100644 drivers/soc/hisilicon/hisi_mem_ras.h
>
>diff --git a/drivers/soc/hisilicon/Kconfig b/drivers/soc/hisilicon/Kconfig index
>ae8e55a776e0..983cafc38e5c 100644
>--- a/drivers/soc/hisilicon/Kconfig
>+++ b/drivers/soc/hisilicon/Kconfig
>@@ -44,4 +44,14 @@ config HISI_HBMCACHE
> To compile the driver as a module, choose M here:
> the module will be called hisi_hbmcache.
>
>+config HISI_MEM_RAS
>+ tristate "Add support for HISI Memory repair"
>+ depends on ACPI
>+ help
>+ Add memory repair support for HiSilicon devices, which can be used to
>+ query the memory repair capability and repair hardware errors in memory
>+ devices. This feature needs to work with the hardware firmware.
>+
>+ If unsure, say no.
>+
> endmenu
>diff --git a/drivers/soc/hisilicon/Makefile b/drivers/soc/hisilicon/Makefile index
>4ef8823250e8..b339a67bcd77 100644
>--- a/drivers/soc/hisilicon/Makefile
>+++ b/drivers/soc/hisilicon/Makefile
>@@ -6,3 +6,4 @@ obj-$(CONFIG_KUNPENG_HCCS) += kunpeng_hccs.o
> obj-$(CONFIG_HISI_HBMDEV) += hisi_hbmdev.o
> obj-$(CONFIG_HISI_HBMCACHE) += hisi_hbmcache.o
> obj-$(CONFIG_ARM64_PBHA) += pbha.o
>+obj-$(CONFIG_HISI_MEM_RAS) += hisi_mem_ras.o
>diff --git a/drivers/soc/hisilicon/hisi_mem_ras.c
>b/drivers/soc/hisilicon/hisi_mem_ras.c
>new file mode 100644
>index 000000000000..a29c41ccdba1
>--- /dev/null
>+++ b/drivers/soc/hisilicon/hisi_mem_ras.c
>@@ -0,0 +1,553 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/*
>+ * Copyright (C) Huawei Technologies Co., Ltd. 2024. All rights reserved.
>+ */
>+
>+#include <linux/kobject.h>
>+#include <linux/module.h>
>+#include <linux/acpi.h>
>+#include <linux/iopoll.h>
>+#include <linux/platform_device.h>
>+#include <linux/mm.h>
>+#include <acpi/pcc.h>
>+
>+#include "hisi_mem_ras.h"
>+#define DRV_NAME "hisi_mem_ras"
>+
>+#define HISI_MEM_PCC_CMD_WAIT_RETRIES_NUM 500ULL
>+#define HISI_MEM_POLL_STATUS_TIME_INTERVAL_US 3
>+
>+static struct hisi_mem_ras *mras;
>+
>+struct hisi_mem_register_ctx {
>+ struct device *dev;
>+ u8 chan_id;
>+ int err;
>+ u64 paddr_min;
>+ u64 paddr_max;
>+};
>+
>+static acpi_status hisi_mem_get_chan_id_cb(struct acpi_resource *ares,
>+ void *context)
>+{
>+ struct acpi_resource_generic_register *reg;
>+ struct hisi_mem_register_ctx *ctx = context;
>+
>+ if (ares->type != ACPI_RESOURCE_TYPE_GENERIC_REGISTER)
>+ return AE_OK;
>+
>+ reg = &ares->data.generic_reg;
>+ if (reg->space_id != ACPI_ADR_SPACE_PLATFORM_COMM) {
>+ dev_err(ctx->dev, "Bad register resource.\n");
>+ ctx->err = -EINVAL;
>+ return AE_ERROR;
>+ }
>+ ctx->chan_id = reg->access_size;
>+
>+ return AE_OK;
>+}
>+
>+static int hisi_mem_get_pcc_chan_id(struct hisi_mem_dev *hdev)
>+{
>+ struct platform_device *pdev = hdev->pdev;
>+ struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
>+ struct hisi_mem_register_ctx ctx = {0};
>+ acpi_handle handle = adev->handle;
>+ acpi_status status;
>+
>+ if (!acpi_has_method(handle, METHOD_NAME__CRS)) {
>+ dev_err(&pdev->dev, "No _CRS method.\n");
>+ return -ENODEV;
>+ }
>+
>+ ctx.dev = &pdev->dev;
>+ status = acpi_walk_resources(handle, METHOD_NAME__CRS,
>+ hisi_mem_get_chan_id_cb, &ctx);
>+ if (ACPI_FAILURE(status))
>+ return ctx.err;
>+
>+ return 0;
>+}
>+
>+static acpi_status hisi_mem_get_paddr_range_cb(struct acpi_resource *ares,
>+ void *context)
>+{
>+ struct hisi_mem_register_ctx *ctx = context;
>+ struct acpi_resource_address64 *addr64;
>+
>+ if (ares->type != ACPI_RESOURCE_TYPE_ADDRESS64)
>+ return AE_OK;
>+
>+ addr64 = &ares->data.address64;
>+ ctx->paddr_min = addr64->address.minimum;
>+ ctx->paddr_max = addr64->address.maximum;
>+
>+ return AE_OK;
>+}
>+
>+static int hisi_mem_get_paddr_range(struct hisi_mem_dev *hdev)
>+{
>+ struct platform_device *pdev = hdev->pdev;
>+ struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
>+ struct hisi_mem_register_ctx ctx = {0};
>+ acpi_handle handle = adev->handle;
>+ acpi_status status;
>+
>+ if (!acpi_has_method(handle, METHOD_NAME__CRS)) {
>+ dev_err(&pdev->dev, "No _CRS method.\n");
>+ return -ENODEV;
>+ }
>+
>+ ctx.dev = &pdev->dev;
>+ status = acpi_walk_resources(handle, METHOD_NAME__CRS,
>+ hisi_mem_get_paddr_range_cb, &ctx);
>+ if (ACPI_FAILURE(status))
>+ return ctx.err;
>+
>+ hdev->paddr_min = ctx.paddr_min;
>+ hdev->paddr_max = ctx.paddr_max;
>+ return 0;
>+}
>+
>+static void hisi_mem_chan_tx_done(struct mbox_client *cl, void *msg, int ret)
>+{
>+ if (ret < 0)
>+ pr_debug("TX did not complete: CMD sent:0x%x, ret:%d\n",
>+ *(u8 *)msg, ret);
>+ else
>+ pr_debug("TX completed. CMD sent:0x%x, ret:%d\n",
>+ *(u8 *)msg, ret);
>+}
>+
>+static void hisi_mem_pcc_rx_callback(struct mbox_client *cl, void *mssg)
>+{
>+ struct hisi_mem_mbox_client_info *cl_info =
>+ container_of(cl, struct hisi_mem_mbox_client_info, client);
>+
>+ complete(&cl_info->done);
>+}
>+
>+static void hisi_mem_unregister_pcc_channel(struct hisi_mem_dev *hdev)
This function is not used.
>+{
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+
>+ if (cl_info->pcc_comm_addr)
>+ iounmap(cl_info->pcc_comm_addr);
>+ pcc_mbox_free_channel(hdev->cl_info.pcc_chan);
>+}
>+
>+static int hisi_mem_register_pcc_channel(struct hisi_mem_dev *hdev)
>+{
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct mbox_client *cl = &cl_info->client;
>+ struct pcc_mbox_chan *pcc_chan;
>+ struct platform_device *pdev = hdev->pdev;
>+ int rc;
>+
>+ cl->dev = &pdev->dev;
>+ cl->tx_block = false;
>+ cl->knows_txdone = true;
>+ cl->tx_done = hisi_mem_chan_tx_done;
>+ cl->rx_callback = hdev->verspec_data->rx_callback;
>+ init_completion(&cl_info->done);
>+
>+ pcc_chan = pcc_mbox_request_channel(cl, hdev->chan_id);
>+ if (IS_ERR(pcc_chan)) {
>+ dev_err(&pdev->dev, "PCC channel request failed.\n");
>+ rc = -ENODEV;
>+ goto out;
>+ }
>+ cl_info->pcc_chan = pcc_chan;
>+ cl_info->mbox_chan = pcc_chan->mchan;
>+
>+ /*
>+ * pcc_chan->latency is just a nominal value. In reality the remote
>+ * processor could be much slower to reply. So add an arbitrary amount
>+ * of wait on top of nominal.
>+ */
>+ cl_info->deadline_us =
>+ HISI_MEM_PCC_CMD_WAIT_RETRIES_NUM * pcc_chan->latency;
>+ if (!hdev->verspec_data->has_txdone_irq &&
>+ cl_info->mbox_chan->mbox->txdone_irq) {
>+ dev_err(&pdev->dev, "PCC IRQ in PCCT is enabled.\n");
>+ rc = -EINVAL;
>+ goto err_mbx_channel_free;
>+ } else if (hdev->verspec_data->has_txdone_irq &&
>+ !cl_info->mbox_chan->mbox->txdone_irq) {
>+ dev_err(&pdev->dev, "PCC IRQ in PCCT isn't supported.\n");
>+ rc = -EINVAL;
>+ goto err_mbx_channel_free;
>+ }
>+
>+ if (pcc_chan->shmem_base_addr) {
>+ cl_info->pcc_comm_addr = ioremap(pcc_chan->shmem_base_addr,
>+ pcc_chan->shmem_size);
>+ if (!cl_info->pcc_comm_addr) {
>+ dev_err(&pdev->dev, "Failed to ioremap PCC communication region for channel-%u.\n",
>+ hdev->chan_id);
>+ rc = -ENOMEM;
>+ goto err_mbx_channel_free;
>+ }
>+ }
>+
>+ return 0;
>+
>+err_mbx_channel_free:
>+ pcc_mbox_free_channel(cl_info->pcc_chan);
>+out:
>+ return rc;
>+}
>+
>+
>+static int hisi_mem_wait_cmd_complete_by_poll(struct hisi_mem_dev *hdev)
>+{
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct acpi_pcct_shared_memory __iomem *comm_base = cl_info->pcc_comm_addr;
>+ struct platform_device *pdev = hdev->pdev;
>+ u16 status;
>+ int ret;
>+
>+ /*
>+ * Poll PCC status register every 3us(delay_us) for maximum of
>+ * deadline_us(timeout_us) until PCC command complete bit is set(cond)
>+ */
>+ ret = readw_poll_timeout(&comm_base->status, status,
>+ status & PCC_STATUS_CMD_COMPLETE,
>+ HISI_MEM_POLL_STATUS_TIME_INTERVAL_US,
>+ cl_info->deadline_us);
>+ if (unlikely(ret))
>+ dev_err(&pdev->dev, "poll PCC status failed, ret = %d.\n", ret);
>+
>+ return ret;
>+}
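The poll-until-complete logic above can be modelled outside the kernel like this (time is simulated by counting 3 µs poll intervals instead of calling udelay(), and -110 stands in for -ETIMEDOUT):

```c
#include <assert.h>
#include <stdint.h>

#define CMD_COMPLETE 0x1  /* stand-in for PCC_STATUS_CMD_COMPLETE */

/* Simplified model of readw_poll_timeout(): poll a status word every
 * interval_us until the completion bit is set or the deadline passes. */
static int poll_status(const volatile uint16_t *status,
                       uint32_t interval_us, uint32_t deadline_us)
{
    uint32_t waited = 0;

    for (;;) {
        if (*status & CMD_COMPLETE)
            return 0;
        if (waited >= deadline_us)
            return -110;        /* -ETIMEDOUT */
        waited += interval_us;  /* stand-in for udelay(interval_us) */
    }
}
```

In the driver, `interval_us` is `HISI_MEM_POLL_STATUS_TIME_INTERVAL_US` (3) and `deadline_us` is the retry count multiplied by the channel's nominal latency.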
>+
>+static int hisi_mem_wait_cmd_complete_by_irq(struct hisi_mem_dev *hdev)
>+{
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct platform_device *pdev = hdev->pdev;
>+
>+ if (!wait_for_completion_timeout(&cl_info->done,
>+ usecs_to_jiffies(cl_info->deadline_us))) {
>+ dev_err(&pdev->dev, "PCC command executed timeout!\n");
>+ return -ETIMEDOUT;
>+ }
>+
>+ return 0;
>+}
>+
>+static inline void hisi_mem_fill_pcc_shared_mem_region(struct hisi_mem_dev *hdev,
>+ u8 cmd,
>+ struct hisi_mem_desc *desc,
>+ void __iomem *comm_space,
>+ u16 space_size)
>+{
>+ struct acpi_pcct_shared_memory tmp = {
>+ .signature = PCC_SIGNATURE | hdev->chan_id,
>+ .command = cmd,
>+ .status = 0,
>+ };
>+
>+ memcpy_toio(hdev->cl_info.pcc_comm_addr, (void *)&tmp,
>+ sizeof(struct acpi_pcct_shared_memory));
>+
>+ /* Copy the message to the PCC comm space */
>+ memcpy_toio(comm_space, (void *)desc, space_size);
>+}
>+
>+static inline void hisi_mem_fill_ext_pcc_shared_mem_region(struct hisi_mem_dev *hdev,
>+ u8 cmd,
>+ struct hisi_mem_desc *desc,
>+ void __iomem *comm_space,
>+ u16 space_size)
>+{
>+ struct acpi_pcct_ext_pcc_shared_memory tmp = {
>+ .signature = PCC_SIGNATURE | hdev->chan_id,
>+ .flags = PCC_CMD_COMPLETION_NOTIFY,
>+ .length = HISI_MEM_PCC_SHARE_MEM_BYTES,
>+ .command = cmd,
>+ };
>+
>+ memcpy_toio(hdev->cl_info.pcc_comm_addr, (void *)&tmp,
>+ sizeof(struct acpi_pcct_ext_pcc_shared_memory));
>+
>+ /* Copy the message to the PCC comm space */
>+ memcpy_toio(comm_space, (void *)desc, space_size);
>+}
>+
>+static const struct hisi_mem_verspecific_data hisi04b1_verspec_data = {
>+ .rx_callback = NULL,
>+ .wait_cmd_complete = hisi_mem_wait_cmd_complete_by_poll,
>+ .fill_pcc_shared_mem = hisi_mem_fill_pcc_shared_mem_region,
>+ .shared_mem_size = sizeof(struct acpi_pcct_shared_memory),
>+ .has_txdone_irq = false,
>+};
>+
>+static const struct hisi_mem_verspecific_data hisi04b2_verspec_data = {
>+ .rx_callback = hisi_mem_pcc_rx_callback,
>+ .wait_cmd_complete = hisi_mem_wait_cmd_complete_by_irq,
>+ .fill_pcc_shared_mem = hisi_mem_fill_ext_pcc_shared_mem_region,
>+ .shared_mem_size = sizeof(struct acpi_pcct_ext_pcc_shared_memory),
>+ .has_txdone_irq = true,
>+};
>+
>+static int hisi_mem_pcc_cmd_send(struct hisi_mem_dev *hdev, u8 cmd,
>+ struct hisi_mem_desc *desc)
>+{
>+ const struct hisi_mem_verspecific_data *verspec_data = hdev->verspec_data;
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct platform_device *pdev = hdev->pdev;
>+ void __iomem *comm_space;
>+ u16 space_size;
>+ int ret;
>+
>+ comm_space = cl_info->pcc_comm_addr + verspec_data->shared_mem_size;
>+ space_size = HISI_MEM_PCC_SHARE_MEM_BYTES - verspec_data->shared_mem_size;
>+ verspec_data->fill_pcc_shared_mem(hdev, cmd, desc,
>+ comm_space, space_size);
>+ if (verspec_data->has_txdone_irq)
>+ reinit_completion(&cl_info->done);
>+
>+ /* Ring doorbell */
>+ ret = mbox_send_message(cl_info->mbox_chan, &cmd);
>+ if (ret < 0) {
>+ dev_err(&pdev->dev, "Send PCC mbox message failed, ret = %d.\n",
>+ ret);
>+ goto end;
>+ }
>+
>+ ret = verspec_data->wait_cmd_complete(hdev);
>+ if (ret)
>+ goto end;
>+
>+ /* Copy response data */
>+ memcpy_fromio((void *)desc, comm_space, space_size);
>+
>+end:
>+ if (verspec_data->has_txdone_irq)
>+ mbox_chan_txdone(cl_info->mbox_chan, ret);
>+ else
>+ mbox_client_txdone(cl_info->mbox_chan, ret);
>+ return ret;
>+}
>+
>+int hisi_mem_do_acls_query(struct hisi_mem_dev *hdev, int *status, u64 paddr)
>+{
static function?
>+ struct platform_device *pdev = hdev->pdev;
>+ struct hisi_mem_desc desc;
>+ int ret;
>+
>+ memset(&desc, 0, sizeof(desc));
>+ desc.req.paddr = paddr;
>+ desc.req.subcmd = HISI_MEM_RAS_QUERY_CAP;
>+ ret = hisi_mem_pcc_cmd_send(hdev, HISI_MEM_RAS_ACLS, &desc);
>+ if (ret) {
>+ dev_err(&pdev->dev, "do acls query failed, ret = %d.\n", ret);
>+ return ret;
>+ }
>+
>+ *status = desc.rsp.retStatus;
>+ return 0;
>+}
>+
>+static ssize_t acls_query_store(struct kobject *kobj,
>+ struct kobj_attribute *attr,
>+ const char *buf, size_t count)
>+{
>+ struct hisi_mem_dev *hdev, *tmp;
>+ int status, ret;
>+ u64 paddr;
>+
>+ if (kstrtoull(buf, 16, &paddr))
>+ return -EINVAL;
>+
>+ list_for_each_entry_safe(hdev, tmp, &mras->list, list) {
>+ struct platform_device *pdev = hdev->pdev;
>+
>+ if (paddr < hdev->paddr_min || paddr > hdev->paddr_max)
>+ continue;
>+
>+ ret = hisi_mem_do_acls_query(hdev, &status, paddr);
>+ if (ret)
>+ return ret;
>+
>+ if (status == HISI_MEM_ACLS_OK) {
>+ return count;
>+ } else if (status == HISI_MEM_ACLS_QUERY_NO_RES) {
>+ dev_info(&pdev->dev, "acls query no resource left.\n");
>+ return -ENOSPC;
>+ }
>+
>+ dev_err(&pdev->dev, "acls query error code %d.\n", status);
>+ return -EIO;
>+ }
>+
>+ return -ENODEV;
>+}
>+
>+static struct kobj_attribute acls_query_store_attribute =
>+ __ATTR(acls_query, 0200, NULL, acls_query_store);
0200 for write?
0444 if query is read only.
May be we can use DEVICE_ATTR_RO/ DEVICE_ATTR_RW/ DEVICE_ATTR_WO instead?
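For reference, the hex parsing done by the `kstrtoull(buf, 16, &paddr)` calls in these store callbacks behaves roughly like this userspace approximation (strtoull-based, so edge-case handling may differ slightly from the kernel helper):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace approximation of kstrtoull(buf, 16, &res): the whole string
 * (minus one optional trailing newline, as sysfs writes usually carry)
 * must be a valid hexadecimal number, with or without a "0x" prefix. */
static int parse_hex_u64(const char *buf, uint64_t *res)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(buf, &end, 16);
    if (errno || end == buf)
        return -EINVAL;
    if (*end == '\n')
        end++;            /* tolerate "echo addr > file" */
    if (*end != '\0')
        return -EINVAL;   /* trailing garbage rejected */
    *res = v;
    return 0;
}
```

A user would then drive the attribute with something like `echo 0x2080000000 > /sys/kernel/acls/acls_query`.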
>+
>+int hisi_mem_do_acls_repair(struct hisi_mem_dev *hdev, int *status, u64 paddr)
>+{
static function?
>+ struct hisi_mem_desc desc;
>+ struct platform_device *pdev;
>+ int ret;
>+
>+ memset(&desc, 0, sizeof(desc));
>+ desc.req.paddr = paddr;
>+ desc.req.subcmd = HISI_MEM_RAS_DO_REPAIR;
>+ ret = hisi_mem_pcc_cmd_send(hdev, HISI_MEM_RAS_ACLS, &desc);
>+ if (ret) {
>+ dev_err(&pdev->dev, "do acls repair failed, ret = %d.\n", ret);
>+ return ret;
>+ }
>+
>+ *status = desc.rsp.retStatus;
>+ return 0;
>+}
>+
>+static ssize_t acls_repair_store(struct kobject *kobj,
>+ struct kobj_attribute *attr,
>+ const char *buf, size_t count)
>+{
>+ struct hisi_mem_dev *hdev, *tmp;
>+ u64 paddr, ret;
>+ int status;
>+
>+ if (kstrtoull(buf, 16, &paddr))
>+ return -EINVAL;
>+
>+ list_for_each_entry_safe(hdev, tmp, &mras->list, list) {
>+ struct platform_device *pdev = hdev->pdev;
>+
>+ if (paddr < hdev->paddr_min || paddr > hdev->paddr_max)
>+ continue;
>+
>+ ret = hisi_mem_do_acls_repair(hdev, &status, paddr);
>+ if (ret)
>+ return ret;
>+
>+ if (status == HISI_MEM_ACLS_OK)
>+ return count;
>+
>+ dev_err(&pdev->dev, "acls repair error code %d.\n", status);
>+ return -EIO;
>+ }
>+
>+ return -ENODEV;
>+}
>+static struct kobj_attribute acls_repair_store_attribute =
>+ __ATTR(acls_repair, 0200, NULL, acls_repair_store);
>+
>+static struct attribute *acls_attrs[] = {
>+ &acls_query_store_attribute.attr,
>+ &acls_repair_store_attribute.attr,
>+ NULL,
>+};
>+
>+static struct attribute_group acls_attr_group = {
>+ .attrs = acls_attrs,
>+};
>+
>+static int hisi_mem_ras_probe(struct platform_device *pdev)
>+{
>+ struct hisi_mem_dev *hdev;
>+ int ret = -ENOMEM;
May not require to initialize here.
>+
>+ hdev = kzalloc(sizeof(struct hisi_mem_dev), GFP_KERNEL);
>+ if (!hdev)
>+ return -ENOMEM;
>+
>+ hdev->pdev = pdev;
>+ ret = hisi_mem_get_pcc_chan_id(hdev);
>+ if (ret)
>+ goto free_hdev;
>+
>+ ret = hisi_mem_register_pcc_channel(hdev);
>+ if (ret)
>+ goto free_hdev;
>+
>+ platform_set_drvdata(pdev, hdev);
>+ list_add_tail(&hdev->list, &mras->list);
>+ hdev->verspec_data = &hisi04b2_verspec_data;
>+ return 0;
>+
>+free_hdev:
>+ kfree(hdev);
>+ return ret;
>+}
>+
>+static int hisi_mem_ras_remove(struct platform_device *pdev)
>+{
>+ struct hisi_mem_dev *hdev = platform_get_drvdata(pdev);
>+
>+ platform_set_drvdata(pdev, NULL);
>+ list_del(&hdev->list);
>+
>+ return 0;
>+}
>+
>+static const struct acpi_device_id hisi_mem_acpi_match[] = {
>+ { "HISI0FF1", 0},
>+ { },
>+};
>+MODULE_DEVICE_TABLE(acpi, hisi_mem_acpi_match);
>+
>+static struct platform_driver hisi_mem_ras_driver = {
>+ .probe = hisi_mem_ras_probe,
>+ .remove = hisi_mem_ras_remove,
>+ .driver = {
>+ .name = "hisi_mem_ras",
>+ .acpi_match_table = hisi_mem_acpi_match,
>+ },
>+};
>+
>+static int __init hisi_mem_ras_init(void)
>+{
>+ int ret;
>+
>+ mras = kzalloc(sizeof(struct hisi_mem_ras), GFP_KERNEL);
>+ if (!mras)
>+ return -ENOMEM;
>+
>+ mras->acls_kobj = kobject_create_and_add("acls", kernel_kobj);
>+ if (!mras->acls_kobj) {
>+ kfree(mras);
>+ return -ENOMEM;
>+ }
>+
>+ ret = sysfs_create_group(mras->acls_kobj, &acls_attr_group);
>+ if (ret) {
>+ mras->acls_kobj = NULL;
>+ kobject_put(mras->acls_kobj);
kfree(mras); missing here.
>+ return ret;
>+ }
>+
>+ ret = platform_driver_register(&hisi_mem_ras_driver);
>+ if (ret) {
>+ sysfs_remove_group(mras->acls_kobj, &acls_attr_group);
>+ kobject_put(mras->acls_kobj);
>+ kfree(mras);
>+ return ret;
>+ }
>+
>+ INIT_LIST_HEAD(&mras->list);
>+
>+ return 0;
>+}
>+module_init(hisi_mem_ras_init);
>+
>+static void __exit hisi_mem_ras_exit(void)
>+{
>+ platform_driver_unregister(&hisi_mem_ras_driver);
>+ sysfs_remove_group(mras->acls_kobj, &acls_attr_group);
>+ kobject_put(mras->acls_kobj);
>+ kfree(mras);
>+}
>+module_exit(hisi_mem_ras_exit);
>+
>+MODULE_LICENSE("GPL v2");
>+MODULE_AUTHOR("Xiaofei Tan <tanxiaofei(a)huawei.com>");
>+MODULE_DESCRIPTION("HISILICON Memory RAS driver");
>+MODULE_ALIAS("platform:" DRV_NAME);
>diff --git a/drivers/soc/hisilicon/hisi_mem_ras.h
>b/drivers/soc/hisilicon/hisi_mem_ras.h
>new file mode 100644
>index 000000000000..a9acbb0c1012
>--- /dev/null
>+++ b/drivers/soc/hisilicon/hisi_mem_ras.h
>@@ -0,0 +1,84 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+/*
>+ * Copyright (C) Huawei Technologies Co., Ltd. 2023. All rights reserved.
>+ */
>+
>+#ifndef _HISI_MEM_RAS_H
>+#define _HISI_MEM_RAS_H
>+
>+struct hisi_mem_ras {
>+ struct list_head list;
>+ struct kobject *acls_kobj;
>+};
>+
>+struct hisi_mem_mbox_client_info {
>+ struct mbox_client client;
>+ struct mbox_chan *mbox_chan;
>+ struct pcc_mbox_chan *pcc_chan;
>+ u64 deadline_us;
>+ void __iomem *pcc_comm_addr;
>+ struct completion done;
>+};
>+
>+struct hisi_mem_desc;
>+struct hisi_mem_dev;
>+
>+struct hisi_mem_verspecific_data {
>+ void (*rx_callback)(struct mbox_client *cl, void *mssg);
>+ int (*wait_cmd_complete)(struct hisi_mem_dev *hdev);
>+ void (*fill_pcc_shared_mem)(struct hisi_mem_dev *hdev,
>+ u8 cmd, struct hisi_mem_desc *desc,
>+ void __iomem *comm_space,
>+ u16 space_size);
>+ u16 shared_mem_size;
>+ bool has_txdone_irq;
>+};
>+
>+struct hisi_mem_dev {
>+ struct list_head list;
>+ struct platform_device *pdev;
>+ //struct acpi_device *acpi_dev;
Not used.
>+ const struct hisi_mem_verspecific_data *verspec_data;
>+ /* device capabilities from firmware, like hisi_mem_CAPS_xxx. */
>+ u64 caps;
>+ u8 chan_id;
>+ u64 paddr_min;
>+ u64 paddr_max;
>+ struct mutex lock;
>+ struct hisi_mem_mbox_client_info cl_info;
>+};
>+
>+#define HISI_MEM_PCC_SHARE_MEM_BYTES 128
>+enum hisi_mem_ras_cmd_type {
>+ HISI_MEM_RAS_ACLS = 0x4,
>+ HISI_MEM_RAS_PPR = 0X5,
>+};
>+
>+enum hisi_mem_ras_subcmd_type {
>+ HISI_MEM_RAS_QUERY_CAP = 0x0,
>+ HISI_MEM_RAS_DO_REPAIR = 0x1,
>+};
>+
>+struct hisi_mem_req_desc {
>+ u64 paddr;
>+ u8 subcmd;
>+ u8 rsv[3];
>+};
>+
>+#define HISI_MEM_ACLS_OK 0
>+#define HISI_MEM_ACLS_QUERY_NO_RES 1
>+#define HISI_MEM_ACLS_REPAIR_FAIL 2
>+#define HISI_MEM_ACLS_EINVAL 3
>+struct hisi_mem_rsp_desc {
>+ u8 retStatus; /* 0: success, other: failure */
>+ u8 rsv[7];
>+};
>+
>+struct hisi_mem_desc {
>+ union {
>+ struct hisi_mem_req_desc req;
>+ struct hisi_mem_rsp_desc rsp;
>+ };
>+};
>+
>+#endif
>--
>2.33.0
Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of the XQC address
by Shameerali Kolothum Thodi 13 May '24
13 May '24
> -----Original Message-----
> From: liulongfang <liulongfang(a)huawei.com>
> Sent: Monday, May 13, 2024 9:16 AM
> To: Alex Williamson <alex.williamson(a)redhat.com>; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi(a)huawei.com>
> Cc: jgg(a)nvidia.com; Jonathan Cameron <jonathan.cameron(a)huawei.com>;
> kvm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org;
> linuxarm(a)openeuler.org
> Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of
> the XQC address
>
> On 2024/5/9 22:29, Alex Williamson wrote:
> > On Thu, 9 May 2024 09:37:51 +0000
> > Shameerali Kolothum Thodi <shameerali.kolothum.thodi(a)huawei.com>
> wrote:
> >
> >>> -----Original Message-----
> >>> From: Alex Williamson <alex.williamson(a)redhat.com>
> >>> Sent: Wednesday, May 8, 2024 7:00 PM
> >>> To: liulongfang <liulongfang(a)huawei.com>
> >>> Cc: jgg(a)nvidia.com; Shameerali Kolothum Thodi
> >>> <shameerali.kolothum.thodi(a)huawei.com>; Jonathan Cameron
> >>> <jonathan.cameron(a)huawei.com>; kvm(a)vger.kernel.org; linux-
> >>> kernel(a)vger.kernel.org; linuxarm(a)openeuler.org
> >>> Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register
> location of
> >>> the XQC address
> >>
> >> [...]
> >>
> >>>> HiSilicon accelerator equipment can perform general services after
> >>> completing live migration.
> >>>> This kind of business is executed through the user mode driver and only
> >>> needs to use SQE and CQE.
> >>>>
> >>>> At the same time, this device can also perform kernel-mode services in
> the
> >>> VM through the crypto
> >>>> subsystem. This kind of service requires the use of EQE.
> >>>>
> >>>> Finally, if the device is abnormal, the driver needs to perform a device
> >>> reset, and AEQE needs to
> >>>> be used in this case.
> >>>>
> >>>> Therefore, a complete device live migration function needs to ensure
> that
> >>> device functions are
> >>>> normal in all these scenarios.
> >>>> Therefore, this data still needs to be migrated.
> >>>
> >>> Ok, I had jumped to an in-kernel host driver in reference to "kernel
> >>> mode" rather than a guest kernel. Migrating with bad data only affects
> >>> the current configuration of the device, reloading a guest driver to
> >>> update these registers or a reset of the device would allow proper
> >>> operation of the device, correct?
> >>
> >> Yes, after talking to Longfang, the device RAS will trigger a reset and
> >> would function after reset.
> >>
> >>>
> >>> But I think this still isn't really a complete solution, we know
> >>> there's a bug in the migration data stream, so not only would we fix
> >>> the data stream, but I think we should also take measures to prevent
> >>> loading a known bad data stream. AIUI migration of this device while
> >>> running in kernel mode (ie. a kernel driver within a guest VM) is
> >>> broken. Therefore, the least we can do in a new kernel, knowing that
> >>> there was previously a bug in the migration data stream, is to fail to
> >>> load that migration data because it risks this scenario where the
> >>> device is broken after migration. Shouldn't we then also increment a
> >>> migration version field in the data stream to block migrations that
> >>> risk this breakage, or barring that, change the magic data field to
> >>> prevent the migration? Thanks,
> >>
> >> Ok. We could add a new ACC_DEV_MAGIC_V2 and prevent the migration
> >> in vf_qm_check_match(). The only concern here is that, it will completely
> >> block old kernel to new kernel migration and since we can recover the
> >> device after the reset whether it is too restrictive or not.
> >
> > What's the impact to the running driver, kernel or userspace, if the
> > device is reset? Migration is intended to be effectively transparent
>
> If the device is reset, the user's task needs to be restarted.
> If an exception has been detected, the best way is not to migrate.
>
> > to the driver. If the driver stalls and needs to reset the device,
> > what has the migration driver accomplished versus an offline migration?
> >
> > If there's a way to detect from the migration data if the device is
> > running in kernel mode or user mode then you could potentially accept
> > and send v1 magic conditional that the device is in user mode and
> > require v2 magic for any migration where the device is in kernel mode.
> > This all adds complication though and seems like it has corner cases
> > where we might allow migration to an old kernel that might trap the
> > device there if the use case changes.
> >
>
> The driver has no way to check whether the device is running in
> kernel mode or user mode. Moreover, the device supports running
> user-mode services and kernel-mode services at the same time.
>
> > Essentially it comes down to what should the migration experience be
> > and while restricting old->new and new->old migration is undesirable,
> > it seems old->old migration is effectively already broken anyway. As
> > you consider a v2 magic, perhaps consider how the migration data
> > structure might be improved overall to better handle new features and
> > bugs. Thanks,
> >
>
> We discussed a plan:
> Update ACC_DEV_MAGIC to ACC_DEV_MAGIC_VERSION and configure its
> last byte
> as version information:
>
> /* QM match information, last byte is version number */
> #define ACC_DEV_MAGIC_VERSION 0XACCDEVFEEDCAFE01
Oops.. can't have a V there. But the idea is to replace the magic with one
whose last byte is version info, which can be used in the future for handling
bugs/features etc.
Thanks,
Shameer
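One way the version-in-last-byte scheme could look, sketched with a hypothetical constant (the real value is still being decided in this thread, and as noted above a valid magic must contain only hex digits):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, purely illustrative: the top 7 bytes are the
 * device magic, the low byte is the migration-data version number. */
#define ACC_DEV_MAGIC_MASK 0xFFFFFFFFFFFFFF00ULL
#define ACC_DEV_MAGIC_V1   0xAC0DE5FEEDCAFE01ULL

/* Match only on the magic portion, ignoring the version byte. */
static int magic_matches(uint64_t remote, uint64_t local)
{
    return (remote & ACC_DEV_MAGIC_MASK) == (local & ACC_DEV_MAGIC_MASK);
}

/* Extract the version byte for compatibility checks. */
static unsigned int magic_version(uint64_t magic)
{
    return (unsigned int)(magic & 0xFF);
}
```

With this split, `vf_qm_check_match()` could accept any matching magic and then decide per version whether the incoming migration stream is safe to load.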
Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of the XQC address
by Shameerali Kolothum Thodi 09 May '24
09 May '24
> -----Original Message-----
> From: Alex Williamson <alex.williamson(a)redhat.com>
> Sent: Wednesday, May 8, 2024 7:00 PM
> To: liulongfang <liulongfang(a)huawei.com>
> Cc: jgg(a)nvidia.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi(a)huawei.com>; Jonathan Cameron
> <jonathan.cameron(a)huawei.com>; kvm(a)vger.kernel.org; linux-
> kernel(a)vger.kernel.org; linuxarm(a)openeuler.org
> Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of
> the XQC address
[...]
> > HiSilicon accelerator equipment can perform general services after
> completing live migration.
> > This kind of business is executed through the user mode driver and only
> needs to use SQE and CQE.
> >
> > At the same time, this device can also perform kernel-mode services in the
> VM through the crypto
> > subsystem. This kind of service requires the use of EQE.
> >
> > Finally, if the device is abnormal, the driver needs to perform a device
> reset, and AEQE needs to
> > be used in this case.
> >
> > Therefore, a complete device live migration function needs to ensure that
> device functions are
> > normal in all these scenarios.
> > Therefore, this data still needs to be migrated.
>
> Ok, I had jumped to an in-kernel host driver in reference to "kernel
> mode" rather than a guest kernel. Migrating with bad data only affects
> the current configuration of the device, reloading a guest driver to
> update these registers or a reset of the device would allow proper
> operation of the device, correct?
Yes, after talking to Longfang, the device RAS will trigger a reset and
would function after reset.
>
> But I think this still isn't really a complete solution, we know
> there's a bug in the migration data stream, so not only would we fix
> the data stream, but I think we should also take measures to prevent
> loading a known bad data stream. AIUI migration of this device while
> running in kernel mode (ie. a kernel driver within a guest VM) is
> broken. Therefore, the least we can do in a new kernel, knowing that
> there was previously a bug in the migration data stream, is to fail to
> load that migration data because it risks this scenario where the
> device is broken after migration. Shouldn't we then also increment a
> migration version field in the data stream to block migrations that
> risk this breakage, or barring that, change the magic data field to
> prevent the migration? Thanks,
Ok. We could add a new ACC_DEV_MAGIC_V2 and prevent the migration
in vf_qm_check_match(). The only concern here is that, it will completely
block old kernel to new kernel migration and since we can recover the
device after the reset whether it is too restrictive or not.
Thanks,
Shameer