This series fixes several software bugs in the accelerator live migration driver and resolves some remaining clean-code issues.
Longfang Liu (5):
  hisi_acc_vfio_pci: Fixes a memory leak bug
  hisi_acc_vfio_pci: Fixes error return code issue
  hisi_acc_vfio_pci: Remove useless function parameter
  hisi_acc_vfio_pci: Fix device data address combination problem
  hisi_acc_vfio_pci: Fix some clean code issues

 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 66 ++++++++++---------
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h |  1 -
 2 files changed, 34 insertions(+), 33 deletions(-)
During the stop-copy phase of live migration, the driver allocates memory to hold the migrated data.

If an error occurs while the driver reads the device data, the driver reports the error to QEMU and exits the current migration state. But this memory is not released, which leads to a memory leak.

So add the missing memory release operation.
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index ea762e28c1cc..8fd68af2ed5f 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -828,15 +828,15 @@ hisi_acc_vf_stop_copy(struct hisi_acc_vf_core_device *hisi_acc_vdev)
 		return ERR_PTR(err);
 	}
 
-	stream_open(migf->filp->f_inode, migf->filp);
-	mutex_init(&migf->lock);
-
 	ret = vf_qm_state_save(hisi_acc_vdev, migf);
 	if (ret) {
-		fput(migf->filp);
+		kfree(migf);
 		return ERR_PTR(ret);
 	}
 
+	stream_open(migf->filp->f_inode, migf->filp);
+	mutex_init(&migf->lock);
+
 	return migf;
 }
On Thu, 15 Sep 2022 09:31:53 +0800 Longfang Liu <liulongfang@huawei.com> wrote:

> During the stop-copy phase of live migration, the driver allocates memory to hold the migrated data.
>
> If an error occurs while the driver reads the device data, the driver reports the error to QEMU and exits the current migration state. But this memory is not released, which leads to a memory leak.
>
> So add the missing memory release operation.
>
> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Longfang Liu <liulongfang@huawei.com>
> ---
>  drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index ea762e28c1cc..8fd68af2ed5f 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -828,15 +828,15 @@ hisi_acc_vf_stop_copy(struct hisi_acc_vf_core_device *hisi_acc_vdev)
>  		return ERR_PTR(err);
>  	}
>  
> -	stream_open(migf->filp->f_inode, migf->filp);
> -	mutex_init(&migf->lock);
> -
>  	ret = vf_qm_state_save(hisi_acc_vdev, migf);
>  	if (ret) {
> -		fput(migf->filp);

Sorry, why did this fput() get removed? Thanks,

Alex

> +		kfree(migf);
>  		return ERR_PTR(ret);
>  	}
>  
> +	stream_open(migf->filp->f_inode, migf->filp);
> +	mutex_init(&migf->lock);
> +
>  	return migf;
>  }
On Tue, Sep 20, 2022 at 10:34:43AM -0600, Alex Williamson wrote:
> On Thu, 15 Sep 2022 09:31:53 +0800 Longfang Liu <liulongfang@huawei.com> wrote:
>
> > During the stop-copy phase of live migration, the driver allocates memory to hold the migrated data.
> >
> > If an error occurs while the driver reads the device data, the driver reports the error to QEMU and exits the current migration state. But this memory is not released, which leads to a memory leak.

Why isn't it released? The fput() releases it:

static int hisi_acc_vf_release_file(struct inode *inode, struct file *filp)
{
	struct hisi_acc_vf_migration_file *migf = filp->private_data;

	hisi_acc_vf_disable_fd(migf);
	mutex_destroy(&migf->lock);
	kfree(migf);
	^^^^^^^^^^

This patch looks wrong to me.

Jason
-----Original Message-----
From: Jason Gunthorpe [mailto:jgg@nvidia.com]
Sent: 20 September 2022 17:38
To: Alex Williamson <alex.williamson@redhat.com>
Cc: liulongfang <liulongfang@huawei.com>; Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; cohuck@redhat.com; linux-kernel@vger.kernel.org; linuxarm@openeuler.org
Subject: Re: [PATCH 1/5] hisi_acc_vfio_pci: Fixes a memory leak bug
> This patch looks wrong to me.

That's right. Missed that. Sorry for the oversight.
Thanks, Shameer
On 2022/9/21 1:03, Shameerali Kolothum Thodi wrote:
> That's right. Missed that. Sorry for the oversight.

Yes, fput() will call the release op of the file, which here is hisi_acc_vf_release_file(), and that completes the release of migf, so this patch is unnecessary.

But there is another place that may need to be modified: the hisi_acc_vf_disable_fd() call in hisi_acc_vf_disable_fds() looks unnecessary, because an fput() follows it. Is this correct?
Thanks, Longfang.
-----Original Message-----
From: liulongfang
Sent: 21 September 2022 04:13
To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; Jason Gunthorpe <jgg@nvidia.com>; Alex Williamson <alex.williamson@redhat.com>
Cc: cohuck@redhat.com; linux-kernel@vger.kernel.org; linuxarm@openeuler.org
Subject: Re: [PATCH 1/5] hisi_acc_vfio_pci: Fixes a memory leak bug
> Yes, fput() will call the release op of the file, which here is hisi_acc_vf_release_file(), and that completes the release of migf, so this patch is unnecessary.
>
> But there is another place that may need to be modified: the hisi_acc_vf_disable_fd() call in hisi_acc_vf_disable_fds() looks unnecessary, because an fput() follows it. Is this correct?

I don't think that is correct either. fput() decrements the reference count and calls release() only when the count reaches zero. We take an explicit get_file() reference for hisi_acc_vf_disable_fds(), don't we?
Thanks, Shameer
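
For readers following the fput()/get_file() argument, here is a small user-space sketch of the reference-counting behaviour described above. It is only a toy model: toy_file, toy_get(), toy_put() and toy_release() are made-up names standing in for struct file, get_file(), fput() and hisi_acc_vf_release_file(), and the real fput() does more than this (its final release may be deferred).

```c
#include <assert.h>
#include <stdlib.h>

/* Toy user-space stand-ins; none of these names exist in the kernel. */
struct toy_file {
	int refcount;		/* models the struct file reference count */
	void *private_data;	/* models filp->private_data (the migf) */
};

static int toy_release_calls;	/* how many times the release op ran */

/* Models hisi_acc_vf_release_file(): frees the private data. */
static void toy_release(struct toy_file *f)
{
	free(f->private_data);
	f->private_data = NULL;
	toy_release_calls++;
}

/* Create a file object holding one reference and its private data. */
static struct toy_file *toy_open(void)
{
	struct toy_file *f = calloc(1, sizeof(*f));

	if (!f)
		return NULL;
	f->private_data = malloc(64);
	if (!f->private_data) {
		free(f);
		return NULL;
	}
	f->refcount = 1;
	return f;
}

/* Models get_file(): take one more reference. */
static void toy_get(struct toy_file *f)
{
	f->refcount++;
}

/*
 * Models fput(): drop one reference and run the release op only when
 * the last reference goes away. Returns 1 if the object was released
 * and freed, 0 if other references still keep it alive.
 */
static int toy_put(struct toy_file *f)
{
	assert(f->refcount > 0);
	if (--f->refcount > 0)
		return 0;
	toy_release(f);
	free(f);
	return 1;
}
```

With one extra toy_get(), the first toy_put() leaves the private data alone and only the second one frees it. On the error path in patch 1 nothing else holds a reference, so the single put is already enough to free migf.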
On 2022/9/21 15:27, Shameerali Kolothum Thodi wrote:
> I don't think that is correct either. fput() decrements the reference count and calls release() only when the count reaches zero. We take an explicit get_file() reference for hisi_acc_vf_disable_fds(), don't we?
>
> Thanks,
> Shameer

OK! Neither of these needs to be modified, so there is no need to add them to the patchset. I will update the patchset and send out the next version.
Thanks, Longfang.
While the device information of the two ends of a live migration is checked for compatibility, the migration must be aborted if the isolation state of the two devices does not match.

The current driver does not return an error code in this case, which needs to be fixed.
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 8fd68af2ed5f..3790b76a578e 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -405,7 +405,7 @@ static int vf_qm_check_match(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 
 	if (vf_data->que_iso_cfg != que_iso_state) {
 		dev_err(dev, "failed to match isolation state\n");
-		return ret;
+		return -EINVAL;
 	}
 
 	ret = qm_write_regs(vf_qm, QM_VF_STATE, &vf_data->vf_qm_state, 1);
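
To illustrate why "return ret" is a problem in the hunk above, here is a minimal user-space sketch. It assumes that ret still holds 0 from an earlier successful call when the mismatch is detected, which the diff context suggests but does not show. toy_read_iso_state() and both toy_check_match_*() functions are hypothetical names, not driver code.

```c
#include <errno.h>

/* Hypothetical register read that always succeeds and reports state 1. */
static int toy_read_iso_state(unsigned int *state)
{
	*state = 1;
	return 0;
}

/* Before the fix: a mismatch returns ret, which is still 0 at that point. */
static int toy_check_match_old(unsigned int expected)
{
	unsigned int state;
	int ret;

	ret = toy_read_iso_state(&state);
	if (ret)
		return ret;

	if (state != expected)
		return ret;	/* 0, so the caller sees success */

	return 0;
}

/* After the fix: a mismatch is reported with an explicit error code. */
static int toy_check_match_new(unsigned int expected)
{
	unsigned int state;
	int ret;

	ret = toy_read_iso_state(&state);
	if (ret)
		return ret;

	if (state != expected)
		return -EINVAL;

	return 0;
}
```

Under that assumption, the old variant lets a migration with mismatched isolation state continue, while the new one makes the caller stop.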
Remove the unused hisi_acc_vdev parameter of vf_qm_fun_reset() and make sure the device is enabled before the reset operation is performed.
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 3790b76a578e..c172a52088b7 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -345,8 +345,7 @@ static struct hisi_acc_vf_core_device *hssi_acc_drvdata(struct pci_dev *pdev)
 			    core_device);
 }
 
-static void vf_qm_fun_reset(struct hisi_acc_vf_core_device *hisi_acc_vdev,
-			    struct hisi_qm *qm)
+static void vf_qm_fun_reset(struct hisi_qm *qm)
 {
 	int i;
 
@@ -662,7 +661,10 @@ static void hisi_acc_vf_start_device(struct hisi_acc_vf_core_device *hisi_acc_vd
 	if (hisi_acc_vdev->vf_qm_state != QM_READY)
 		return;
 
-	vf_qm_fun_reset(hisi_acc_vdev, vf_qm);
+	/* Make sure the device is enabled */
+	qm_dev_cmd_init(vf_qm);
+
+	vf_qm_fun_reset(vf_qm);
 }
 
 static int hisi_acc_vf_load_state(struct hisi_acc_vf_core_device *hisi_acc_vdev)
The queue address of the accelerator device should be assembled into a DMA address by combining its low and high 32 bits. The previous combination is wrong and needs to be fixed.
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index c172a52088b7..fce49c7f5db8 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -527,12 +527,12 @@ static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 		return -EINVAL;
 
 	/* Every reg is 32 bit, the dma address is 64 bit. */
-	vf_data->eqe_dma = vf_data->qm_eqc_dw[2];
+	vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
 	vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
-	vf_data->eqe_dma |= vf_data->qm_eqc_dw[1];
-	vf_data->aeqe_dma = vf_data->qm_aeqc_dw[2];
+	vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
+	vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
 	vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
-	vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[1];
+	vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
 
 	/* Through SQC_BT/CQC_BT to get sqc and cqc address */
 	ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
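
The combination in the hunk above can be sketched in plain C as follows. This is only an illustration: toy_combine_dma() is a made-up helper, and TOY_ADDR_SHIFT assumes that QM_XQC_ADDR_OFFSET is a 32-bit shift, which matches the "every reg is 32 bit" comment but is not shown in this patch.

```c
#include <stdint.h>

/* Assumed shift width; stands in for QM_XQC_ADDR_OFFSET. */
#define TOY_ADDR_SHIFT 32

/*
 * Build a 64-bit address from two 32-bit words, in the same order as
 * the fixed code: start from the high word (dw[1]), shift it up, then
 * OR in the low word (dw[0]).
 */
static uint64_t toy_combine_dma(uint32_t low, uint32_t high)
{
	uint64_t addr = high;

	addr <<= TOY_ADDR_SHIFT;
	addr |= low;
	return addr;
}
```

For example, a low word of 0x89abcdef and a high word of 0x01234567 give 0x0123456789abcdef. Taking the words from the wrong indices, as the old code did, would build the address from unrelated fields.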
1. Fix some code comments
2. Fix some code style issues
3. Delete an unused macro definition
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 40 +++++++++----------
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h |  1 -
 2 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index fce49c7f5db8..c4857e171da9 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -426,10 +426,10 @@ static int vf_qm_get_match_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 	int ret;
 
 	vf_data->acc_magic = ACC_DEV_MAGIC;
-	/* save device id */
+	/* Save device id */
 	vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
 
-	/* vf qp num save from PF */
+	/* VF qp num save from PF */
 	ret = pf_qm_get_qp_num(pf_qm, vf_id, &vf_data->qp_base);
 	if (ret <= 0) {
 		dev_err(dev, "failed to get vft qp nums!\n");
@@ -473,19 +473,19 @@ static int vf_qm_load_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 
 	ret = qm_set_regs(qm, vf_data);
 	if (ret) {
-		dev_err(dev, "Set VF regs failed\n");
+		dev_err(dev, "set VF regs failed\n");
 		return ret;
 	}
 
 	ret = hisi_qm_mb(qm, QM_MB_CMD_SQC_BT, qm->sqc_dma, 0, 0);
 	if (ret) {
-		dev_err(dev, "Set sqc failed\n");
+		dev_err(dev, "set sqc failed\n");
 		return ret;
 	}
 
 	ret = hisi_qm_mb(qm, QM_MB_CMD_CQC_BT, qm->cqc_dma, 0, 0);
 	if (ret) {
-		dev_err(dev, "Set cqc failed\n");
+		dev_err(dev, "set cqc failed\n");
 		return ret;
 	}
 
@@ -640,15 +640,16 @@ static void hisi_acc_vf_disable_fds(struct hisi_acc_vf_core_device *hisi_acc_vde
 static void
 hisi_acc_vf_state_mutex_unlock(struct hisi_acc_vf_core_device *hisi_acc_vdev)
 {
-again:
-	spin_lock(&hisi_acc_vdev->reset_lock);
-	if (hisi_acc_vdev->deferred_reset) {
+	while (true) {
+		spin_lock(&hisi_acc_vdev->reset_lock);
+		if (!hisi_acc_vdev->deferred_reset)
+			break;
+
 		hisi_acc_vdev->deferred_reset = false;
 		spin_unlock(&hisi_acc_vdev->reset_lock);
 		hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
 		hisi_acc_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
 		hisi_acc_vf_disable_fds(hisi_acc_vdev);
-		goto again;
 	}
 	mutex_unlock(&hisi_acc_vdev->state_mutex);
 	spin_unlock(&hisi_acc_vdev->reset_lock);
@@ -709,10 +710,9 @@ static ssize_t hisi_acc_vf_resume_write(struct file *filp, const char __user *bu
 
 	if (pos)
 		return -ESPIPE;
-	pos = &filp->f_pos;
 
-	if (*pos < 0 ||
-	    check_add_overflow((loff_t)len, *pos, &requested_length))
+	if (filp->f_pos < 0 ||
+	    check_add_overflow((loff_t)len, filp->f_pos, &requested_length))
 		return -EINVAL;
 
 	if (requested_length > sizeof(struct acc_vf_data))
@@ -729,7 +729,7 @@ static ssize_t hisi_acc_vf_resume_write(struct file *filp, const char __user *bu
 		done = -EFAULT;
 		goto out_unlock;
 	}
-	*pos += len;
+	filp->f_pos += len;
 	done = len;
 	migf->total_length += len;
 out_unlock:
@@ -772,14 +772,14 @@ static ssize_t hisi_acc_vf_save_read(struct file *filp, char __user *buf, size_t
 {
 	struct hisi_acc_vf_migration_file *migf = filp->private_data;
 	ssize_t done = 0;
+	size_t min_len;
 	int ret;
 
 	if (pos)
 		return -ESPIPE;
-	pos = &filp->f_pos;
 
 	mutex_lock(&migf->lock);
-	if (*pos > migf->total_length) {
+	if (filp->f_pos > migf->total_length) {
 		done = -EINVAL;
 		goto out_unlock;
 	}
@@ -789,15 +789,15 @@ static ssize_t hisi_acc_vf_save_read(struct file *filp, char __user *buf, size_t
 		goto out_unlock;
 	}
 
-	len = min_t(size_t, migf->total_length - *pos, len);
-	if (len) {
-		ret = copy_to_user(buf, &migf->vf_data, len);
+	min_len = min_t(size_t, migf->total_length - filp->f_pos, len);
+	if (min_len) {
+		ret = copy_to_user(buf, &migf->vf_data, min_len);
 		if (ret) {
 			done = -EFAULT;
 			goto out_unlock;
 		}
-		*pos += len;
-		done = len;
+		filp->f_pos += min_len;
+		done = min_len;
 	}
 out_unlock:
 	mutex_unlock(&migf->lock);
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
index 5494f4983bbe..8e4bf21deae1 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
@@ -16,7 +16,6 @@
 #define SEC_CORE_INT_STATUS 0x301008
 #define HPRE_HAC_INT_STATUS 0x301800
 #define HZIP_CORE_INT_STATUS 0x3010AC
-#define QM_QUE_ISO_CFG 0x301154
 
 #define QM_VFT_CFG_RDY 0x10006c
 #define QM_VFT_CFG_OP_WR 0x100058
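
As a rough illustration of the min_len clamping in the save_read hunk above, here is a user-space sketch. toy_save_read() and its parameters are hypothetical, memcpy() stands in for copy_to_user(), and -1 stands in for the error codes. One deliberate difference from the hunk: the sketch offsets the source buffer by the current position, while the driver code shown copies from the start of vf_data.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy at most len bytes from data[*pos..total) into buf and advance
 * *pos. Returns the number of bytes copied, or -1 if *pos is already
 * past the end of the data.
 */
static long toy_save_read(const unsigned char *data, size_t total,
			  size_t *pos, unsigned char *buf, size_t len)
{
	size_t min_len;

	if (*pos > total)
		return -1;

	/* Clamp the request to what is left, like min_t() in the hunk. */
	min_len = total - *pos;
	if (len < min_len)
		min_len = len;

	if (min_len) {
		memcpy(buf, data + *pos, min_len);
		*pos += min_len;
	}
	return (long)min_len;
}
```

Keeping the clamped length in its own variable, as the patch does with min_len, leaves the caller-supplied len untouched, which makes it easier to see which value is the request and which is the amount actually copied.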