Linuxarm

Re: [PATCH v8 4/4] Documentation: add debugfs description for hisi migration
by Shameerali Kolothum Thodi 14 Aug '24
> -----Original Message-----
> From: liulongfang <liulongfang(a)huawei.com>
> Sent: Tuesday, August 6, 2024 1:29 PM
> To: alex.williamson(a)redhat.com; jgg(a)nvidia.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi(a)huawei.com>; Jonathan Cameron
> <jonathan.cameron(a)huawei.com>
> Cc: kvm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org;
> linuxarm(a)openeuler.org; liulongfang <liulongfang(a)huawei.com>
> Subject: [PATCH v8 4/4] Documentation: add debugfs description for hisi
> migration
>
> Add a debugfs document description file to help users understand
> how to use the hisilicon accelerator live migration driver's
> debugfs.
>
> Update the file paths that need to be maintained in MAINTAINERS
>
> Signed-off-by: Longfang Liu <liulongfang(a)huawei.com>
LGTM,
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi(a)huawei.com>
Thanks,
Shameer

Re: [PATCH v7 2/4] hisi_acc_vfio_pci: create subfunction for data reading
by Shameerali Kolothum Thodi 05 Aug '24
> -----Original Message-----
> From: liulongfang <liulongfang(a)huawei.com>
> Sent: Tuesday, July 30, 2024 1:15 PM
> To: alex.williamson(a)redhat.com; jgg(a)nvidia.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi(a)huawei.com>; Jonathan Cameron
> <jonathan.cameron(a)huawei.com>
> Cc: kvm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org;
> linuxarm(a)openeuler.org; liulongfang <liulongfang(a)huawei.com>
> Subject: [PATCH v7 2/4] hisi_acc_vfio_pci: create subfunction for data
> reading
>
> This patch extracts the code that reads data from the device into
> a sub-function. It can then be called both during the device status
> data saving phase of the live migration process and by the device
> status data read function in debugfs, thereby reducing redundant
> code in the driver.
>
> Signed-off-by: Longfang Liu <liulongfang(a)huawei.com>
> ---
LGTM,
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi(a)huawei.com>
Thanks,
Shameer
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 54 +++++++++++--------
> 1 file changed, 33 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index 45351be8e270..a8c53952d82e 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -486,31 +486,11 @@ static int vf_qm_load_data(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
> return 0;
> }
>
> -static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
> - struct hisi_acc_vf_migration_file *migf)
> +static int vf_qm_read_data(struct hisi_qm *vf_qm, struct acc_vf_data
> *vf_data)
> {
> - struct acc_vf_data *vf_data = &migf->vf_data;
> - struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> struct device *dev = &vf_qm->pdev->dev;
> int ret;
>
> - if (unlikely(qm_wait_dev_not_ready(vf_qm))) {
> - /* Update state and return with match data */
> - vf_data->vf_qm_state = QM_NOT_READY;
> - hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
> - migf->total_length = QM_MATCH_SIZE;
> - return 0;
> - }
> -
> - vf_data->vf_qm_state = QM_READY;
> - hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
> -
> - ret = vf_qm_cache_wb(vf_qm);
> - if (ret) {
> - dev_err(dev, "failed to writeback QM Cache!\n");
> - return ret;
> - }
> -
> ret = qm_get_regs(vf_qm, vf_data);
> if (ret)
> return -EINVAL;
> @@ -536,6 +516,38 @@ static int vf_qm_state_save(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
> return -EINVAL;
> }
>
> + return 0;
> +}
> +
> +static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
> + struct hisi_acc_vf_migration_file *migf)
> +{
> + struct acc_vf_data *vf_data = &migf->vf_data;
> + struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> + struct device *dev = &vf_qm->pdev->dev;
> + int ret;
> +
> + if (unlikely(qm_wait_dev_not_ready(vf_qm))) {
> + /* Update state and return with match data */
> + vf_data->vf_qm_state = QM_NOT_READY;
> + hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
> + migf->total_length = QM_MATCH_SIZE;
> + return 0;
> + }
> +
> + vf_data->vf_qm_state = QM_READY;
> + hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
> +
> + ret = vf_qm_cache_wb(vf_qm);
> + if (ret) {
> + dev_err(dev, "failed to writeback QM Cache!\n");
> + return ret;
> + }
> +
> + ret = vf_qm_read_data(vf_qm, vf_data);
> + if (ret)
> + return -EINVAL;
> +
> migf->total_length = sizeof(struct acc_vf_data);
> return 0;
> }
> --
> 2.24.0

Re: [PATCH v7 1/4] hisi_acc_vfio_pci: extract public functions for container_of
by Shameerali Kolothum Thodi 05 Aug '24
> -----Original Message-----
> From: liulongfang <liulongfang(a)huawei.com>
> Sent: Tuesday, July 30, 2024 1:15 PM
> To: alex.williamson(a)redhat.com; jgg(a)nvidia.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi(a)huawei.com>; Jonathan Cameron
> <jonathan.cameron(a)huawei.com>
> Cc: kvm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org;
> linuxarm(a)openeuler.org; liulongfang <liulongfang(a)huawei.com>
> Subject: [PATCH v7 1/4] hisi_acc_vfio_pci: extract public functions for
> container_of
>
> In the current driver, vdev is obtained from struct
> hisi_acc_vf_core_device through the container_of function.
> This method is used in many places in the driver. In order to
> reduce this repetition, it is extracted into a common function.
>
> Signed-off-by: Longfang Liu <liulongfang(a)huawei.com>
> ---
LGTM,
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi(a)huawei.com>
Thanks,
Shameer
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 21 ++++++++++---------
> 1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index 9a3e97108ace..45351be8e270 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -630,6 +630,12 @@ static void hisi_acc_vf_disable_fds(struct
> hisi_acc_vf_core_device *hisi_acc_vde
> }
> }
>
> +static struct hisi_acc_vf_core_device *hisi_acc_get_vf_dev(struct
> vfio_device *vdev)
> +{
> + return container_of(vdev, struct hisi_acc_vf_core_device,
> + core_device.vdev);
> +}
> +
> static void hisi_acc_vf_reset(struct hisi_acc_vf_core_device *hisi_acc_vdev)
> {
> hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
> @@ -1033,8 +1039,7 @@ static struct file *
> hisi_acc_vfio_pci_set_device_state(struct vfio_device *vdev,
> enum vfio_device_mig_state new_state)
> {
> - struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(vdev,
> - struct hisi_acc_vf_core_device, core_device.vdev);
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(vdev);
> enum vfio_device_mig_state next_state;
> struct file *res = NULL;
> int ret;
> @@ -1075,8 +1080,7 @@ static int
> hisi_acc_vfio_pci_get_device_state(struct vfio_device *vdev,
> enum vfio_device_mig_state *curr_state)
> {
> - struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(vdev,
> - struct hisi_acc_vf_core_device, core_device.vdev);
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(vdev);
>
> mutex_lock(&hisi_acc_vdev->state_mutex);
> *curr_state = hisi_acc_vdev->mig_state;
> @@ -1280,8 +1284,7 @@ static long hisi_acc_vfio_pci_ioctl(struct vfio_device
> *core_vdev, unsigned int
>
> static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
> {
> - struct hisi_acc_vf_core_device *hisi_acc_vdev =
> container_of(core_vdev,
> - struct hisi_acc_vf_core_device, core_device.vdev);
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(core_vdev);
> struct vfio_pci_core_device *vdev = &hisi_acc_vdev->core_device;
> int ret;
>
> @@ -1304,8 +1307,7 @@ static int hisi_acc_vfio_pci_open_device(struct
> vfio_device *core_vdev)
>
> static void hisi_acc_vfio_pci_close_device(struct vfio_device *core_vdev)
> {
> - struct hisi_acc_vf_core_device *hisi_acc_vdev =
> container_of(core_vdev,
> - struct hisi_acc_vf_core_device, core_device.vdev);
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(core_vdev);
> struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
>
> iounmap(vf_qm->io_base);
> @@ -1320,8 +1322,7 @@ static const struct vfio_migration_ops
> hisi_acc_vfio_pci_migrn_state_ops = {
>
> static int hisi_acc_vfio_pci_migrn_init_dev(struct vfio_device *core_vdev)
> {
> - struct hisi_acc_vf_core_device *hisi_acc_vdev =
> container_of(core_vdev,
> - struct hisi_acc_vf_core_device, core_device.vdev);
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(core_vdev);
> struct pci_dev *pdev = to_pci_dev(core_vdev->dev);
> struct hisi_qm *pf_qm = hisi_acc_get_pf_qm(pdev);
>
> --
> 2.24.0

Re: [PATCH v7 4/4] Documentation: add debugfs description for hisi migration
by Shameerali Kolothum Thodi 05 Aug '24
> -----Original Message-----
> From: liulongfang <liulongfang(a)huawei.com>
> Sent: Tuesday, July 30, 2024 1:15 PM
> To: alex.williamson(a)redhat.com; jgg(a)nvidia.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi(a)huawei.com>; Jonathan Cameron
> <jonathan.cameron(a)huawei.com>
> Cc: kvm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org;
> linuxarm(a)openeuler.org; liulongfang <liulongfang(a)huawei.com>
> Subject: [PATCH v7 4/4] Documentation: add debugfs description for hisi
> migration
>
> Add a debugfs document description file to help users understand
> how to use the hisilicon accelerator live migration driver's
> debugfs.
>
> Update the file paths that need to be maintained in MAINTAINERS
>
> Signed-off-by: Longfang Liu <liulongfang(a)huawei.com>
> ---
> .../ABI/testing/debugfs-hisi-migration | 25 +++++++++++++++++++
> 1 file changed, 25 insertions(+)
> create mode 100644 Documentation/ABI/testing/debugfs-hisi-migration
>
> diff --git a/Documentation/ABI/testing/debugfs-hisi-migration
> b/Documentation/ABI/testing/debugfs-hisi-migration
> new file mode 100644
> index 000000000000..053f3ebba9b1
> --- /dev/null
> +++ b/Documentation/ABI/testing/debugfs-hisi-migration
> @@ -0,0 +1,25 @@
> +What:
> /sys/kernel/debug/vfio/<device>/migration/hisi_acc/dev_data
> +Date: Jul 2024
> +KernelVersion: 6.11
> +Contact: Longfang Liu <liulongfang(a)huawei.com>
> +Description: Read the configuration data and some status data
> + required for device live migration. These data include device
> + status data, queue configuration data, some task
> configuration
> + data and device attribute data. The output format of the data
> + is defined by the live migration driver.
> +
> +What:
> /sys/kernel/debug/vfio/<device>/migration/hisi_acc/migf_data
> +Date: Jul 2024
> +KernelVersion: 6.11
> +Contact: Longfang Liu <liulongfang(a)huawei.com>
> +Description: Read the data from the last completed live migration.
> + This data includes the same device status data as in
> "dev_data".
> + And some device status data after the migration is
> completed.
Actually what info is different from dev_data here? Only that it is the
dev_data after a migration is attempted/completed, right?
Thanks,
Shameer
> +
> +What:
> /sys/kernel/debug/vfio/<device>/migration/hisi_acc/cmd_state
> +Date: Jul 2024
> +KernelVersion: 6.11
> +Contact: Longfang Liu <liulongfang(a)huawei.com>
> +Description: Used to obtain the device command sending and receiving
> + channel status. Returns failure or success logs based on the
> + results.
> --
> 2.24.0

Re: [PATCH v7 3/4] hisi_acc_vfio_pci: register debugfs for hisilicon migration driver
by Shameerali Kolothum Thodi 05 Aug '24
> -----Original Message-----
> From: liulongfang <liulongfang(a)huawei.com>
> Sent: Tuesday, July 30, 2024 1:15 PM
> To: alex.williamson(a)redhat.com; jgg(a)nvidia.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi(a)huawei.com>; Jonathan Cameron
> <jonathan.cameron(a)huawei.com>
> Cc: kvm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org;
> linuxarm(a)openeuler.org; liulongfang <liulongfang(a)huawei.com>
> Subject: [PATCH v7 3/4] hisi_acc_vfio_pci: register debugfs for hisilicon
> migration driver
>
> On the debugfs framework of VFIO, if the CONFIG_VFIO_DEBUGFS macro is
> enabled, the debug function is registered for the live migration driver
> of the HiSilicon accelerator device.
>
> After registering the HiSilicon accelerator device on the debugfs
> framework of live migration of vfio, a directory file "hisi_acc"
> of debugfs is created, and then three debug function files are
> created in this directory:
>
> vfio
> |
> +---<dev_name1>
> | +---migration
> | +--state
> | +--hisi_acc
> | +--dev_data
> | +--migf_data
> | +--cmd_state
> |
> +---<dev_name2>
> +---migration
> +--state
> +--hisi_acc
> +--dev_data
> +--migf_data
> +--cmd_state
>
> dev_data file: read device data that needs to be migrated from the
> current device in real time
> migf_data file: read the migration data of the last live migration
> from the current driver.
> cmd_state: used to get the cmd channel state for the device.
>
> +----------------+ +--------------+ +---------------+
> | migration dev | | src dev | | dst dev |
> +-------+--------+ +------+-------+ +-------+-------+
> | | |
> | +------v-------+ +-------v-------+
> | | saving_migf | | resuming_migf |
> read | | file | | file |
> | +------+-------+ +-------+-------+
> | | copy |
> | +------------+----------+
> | |
> +-------v--------+ +-------v--------+
> | data buffer | | debug_migf |
> +-------+--------+ +-------+--------+
> | |
> cat | cat |
> +-------v--------+ +-------v--------+
> | dev_data | | migf_data |
> +----------------+ +----------------+
>
> When accessing debugfs, users can obtain the most recent status data
> of the device through the "dev_data" file, which reads the complete
> recent status data of the device. If the device is currently being
> migrated, the read will wait for the migration to complete.
> The data from the last completed migration is stored in debug_migf.
> Users can read it via "migf_data".
>
> Signed-off-by: Longfang Liu <liulongfang(a)huawei.com>
> ---
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 220 ++++++++++++++++++
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 6 +
> 2 files changed, 226 insertions(+)
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index a8c53952d82e..ae8946901e73 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -627,15 +627,31 @@ static void hisi_acc_vf_disable_fd(struct
> hisi_acc_vf_migration_file *migf)
> mutex_unlock(&migf->lock);
> }
>
> +static void hisi_acc_debug_migf_copy(struct hisi_acc_vf_core_device
> *hisi_acc_vdev,
> + struct hisi_acc_vf_migration_file *src_migf)
> +{
> + struct hisi_acc_vf_migration_file *dst_migf = hisi_acc_vdev-
> >debug_migf;
> +
> + if (!dst_migf)
> + return;
> +
> + dst_migf->disabled = true;
This is always set to true, so why bother printing it as part of migf_data
below? See also the comments on "disabled" below. Is there any value in
giving this info to the user?
> + dst_migf->total_length = src_migf->total_length;
> + memcpy(&dst_migf->vf_data, &src_migf->vf_data,
> + sizeof(struct acc_vf_data));
> +}
> +
> static void hisi_acc_vf_disable_fds(struct hisi_acc_vf_core_device
> *hisi_acc_vdev)
> {
> if (hisi_acc_vdev->resuming_migf) {
> + hisi_acc_debug_migf_copy(hisi_acc_vdev, hisi_acc_vdev-
> >resuming_migf);
> hisi_acc_vf_disable_fd(hisi_acc_vdev->resuming_migf);
> fput(hisi_acc_vdev->resuming_migf->filp);
> hisi_acc_vdev->resuming_migf = NULL;
> }
>
> if (hisi_acc_vdev->saving_migf) {
> + hisi_acc_debug_migf_copy(hisi_acc_vdev, hisi_acc_vdev-
> >saving_migf);
> hisi_acc_vf_disable_fd(hisi_acc_vdev->saving_migf);
> fput(hisi_acc_vdev->saving_migf->filp);
> hisi_acc_vdev->saving_migf = NULL;
> @@ -1294,6 +1310,201 @@ static long hisi_acc_vfio_pci_ioctl(struct
> vfio_device *core_vdev, unsigned int
> return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> }
>
> +static int hisi_acc_vf_debug_check(struct seq_file *seq, struct vfio_device
> *vdev)
> +{
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(vdev);
> + struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> + int ret;
> +
> + if (!vdev->mig_ops) {
> + seq_printf(seq, "%s\n", "device does not support live
> migration!\n");
> + return -EINVAL;
> + }
> +
> + /*
> + * When the device is not opened, the io_base is not mapped.
> + * The driver cannot perform device read and write operations.
> + */
I think it is better to make sure the lock is held in this function before
checking this. Use lockdep_assert_held(lock).
> + if (!hisi_acc_vdev->dev_opened) {
> + seq_printf(seq, "%s\n", "device not opened!\n");
> + return -EINVAL;
> + }
> +
> + ret = qm_wait_dev_not_ready(vf_qm);
> + if (ret) {
> + seq_printf(seq, "%s\n", "VF device not ready!\n");
> + return -EBUSY;
> + }
> +
> + return 0;
> +}
> +
> +static int hisi_acc_vf_debug_cmd(struct seq_file *seq, void *data)
> +{
> + struct device *vf_dev = seq->private;
> + struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev);
> + struct vfio_device *vdev = &core_device->vdev;
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(vdev);
> + struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> + u64 value;
> + int ret;
> +
> + mutex_lock(&hisi_acc_vdev->state_mutex);
> + ret = hisi_acc_vf_debug_check(seq, vdev);
> + if (ret) {
> + mutex_unlock(&hisi_acc_vdev->state_mutex);
> + return ret;
> + }
> +
> + value = readl(vf_qm->io_base + QM_MB_CMD_SEND_BASE);
> + if (value == QM_MB_CMD_NOT_READY) {
> + mutex_unlock(&hisi_acc_vdev->state_mutex);
> + seq_printf(seq, "mailbox cmd channel not ready!\n");
> + return -EINVAL;
> + }
> + mutex_unlock(&hisi_acc_vdev->state_mutex);
> + seq_printf(seq, "mailbox cmd channel ready!\n");
> +
> + return 0;
> +}
> +
> +static int hisi_acc_vf_dev_read(struct seq_file *seq, void *data)
> +{
> + struct device *vf_dev = seq->private;
> + struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev);
> + struct vfio_device *vdev = &core_device->vdev;
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(vdev);
> + size_t vf_data_sz = offsetofend(struct acc_vf_data, padding);
> + struct acc_vf_data *vf_data = NULL;
> + bool migf_state;
> + int ret;
> +
> + vf_data = kzalloc(sizeof(struct acc_vf_data), GFP_KERNEL);
> + if (!vf_data)
> + return -ENOMEM;
> +
> + mutex_lock(&hisi_acc_vdev->state_mutex);
> + ret = hisi_acc_vf_debug_check(seq, vdev);
> + if (ret) {
> + mutex_unlock(&hisi_acc_vdev->state_mutex);
> + goto migf_err;
> + }
> +
> + vf_data->vf_qm_state = hisi_acc_vdev->vf_qm_state;
> + ret = vf_qm_read_data(&hisi_acc_vdev->vf_qm, vf_data);
> + if (ret) {
> + mutex_unlock(&hisi_acc_vdev->state_mutex);
> + goto migf_err;
> + }
> +
> + if (hisi_acc_vdev->resuming_migf)
> + migf_state = hisi_acc_vdev->resuming_migf->disabled;
> + else if (hisi_acc_vdev->saving_migf)
> + migf_state = hisi_acc_vdev->saving_migf->disabled;
> + else
> + migf_state = true;
I am still not sure what information we are getting from this "disabled". The
value "true" means the migf file is released, isn't it? How is that equivalent
to reporting it as "data valid" below? Also, "migf_state" is a misleading name here.
Thanks,
Shameer
> + mutex_unlock(&hisi_acc_vdev->state_mutex);
> +
> + seq_hex_dump(seq, "Dev Data:", DUMP_PREFIX_OFFSET, 16, 1,
> + (unsigned char *)vf_data,
> + vf_data_sz, false);
> +
> + seq_printf(seq,
> + "acc device:\n"
> + "device ready: %u\n"
> + "device opened: %d\n"
> + "data valid: %d\n"
> + "data size: %lu\n",
> + hisi_acc_vdev->vf_qm_state,
> + hisi_acc_vdev->dev_opened,
> + migf_state,
> + sizeof(struct acc_vf_data));
> +
> +migf_err:
> + kfree(vf_data);
> +
> + return ret;
> +}
> +
> +static int hisi_acc_vf_migf_read(struct seq_file *seq, void *data)
> +{
> + struct device *vf_dev = seq->private;
> + struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev);
> + struct vfio_device *vdev = &core_device->vdev;
> + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(vdev);
> + size_t vf_data_sz = offsetofend(struct acc_vf_data, padding);
> + struct hisi_acc_vf_migration_file *debug_migf = hisi_acc_vdev-
> >debug_migf;
> +
> + /* Check whether the live migration operation has been performed
> */
> + if (debug_migf->total_length < QM_MATCH_SIZE) {
> + seq_printf(seq, "%s\n", "device not migrated!\n");
> + return -EAGAIN;
> + }
> +
> + seq_hex_dump(seq, "Mig Data:", DUMP_PREFIX_OFFSET, 16, 1,
> + (unsigned char *)&debug_migf->vf_data,
> + vf_data_sz, false);
> +
> + seq_printf(seq,
> + "acc device:\n"
> + "device ready: %u\n"
> + "device opened: %d\n"
> + "data valid: %d\n"
> + "data size: %lu\n",
> + hisi_acc_vdev->vf_qm_state,
> + hisi_acc_vdev->dev_opened,
> + debug_migf->disabled,
> + debug_migf->total_length);
> +
> + return 0;
> +}
> +
> +static int hisi_acc_vfio_debug_init(struct hisi_acc_vf_core_device
> *hisi_acc_vdev)
> +{
> + struct vfio_device *vdev = &hisi_acc_vdev->core_device.vdev;
> + struct dentry *vfio_dev_migration = NULL;
> + struct dentry *vfio_hisi_acc = NULL;
> + struct device *dev = vdev->dev;
> + void *migf = NULL;
> +
> + if (!debugfs_initialized() ||
> + !IS_ENABLED(CONFIG_VFIO_DEBUGFS))
> + return 0;
> +
> + migf = kzalloc(sizeof(struct hisi_acc_vf_migration_file), GFP_KERNEL);
> + if (!migf)
> + return -ENOMEM;
> + hisi_acc_vdev->debug_migf = migf;
> +
> + vfio_dev_migration = debugfs_lookup("migration", vdev-
> >debug_root);
> + if (!vfio_dev_migration) {
> + kfree(migf);
> + hisi_acc_vdev->debug_migf = NULL;
> + dev_err(dev, "failed to lookup migration debugfs file!\n");
> + return -ENODEV;
> + }
> +
> + vfio_hisi_acc = debugfs_create_dir("hisi_acc", vfio_dev_migration);
> + debugfs_create_devm_seqfile(dev, "dev_data", vfio_hisi_acc,
> + hisi_acc_vf_dev_read);
> + debugfs_create_devm_seqfile(dev, "migf_data", vfio_hisi_acc,
> + hisi_acc_vf_migf_read);
> + debugfs_create_devm_seqfile(dev, "cmd_state", vfio_hisi_acc,
> + hisi_acc_vf_debug_cmd);
> +
> + return 0;
> +}
> +
> +static void hisi_acc_vf_debugfs_exit(struct hisi_acc_vf_core_device
> *hisi_acc_vdev)
> +{
> + if (!debugfs_initialized() ||
> + !IS_ENABLED(CONFIG_VFIO_DEBUGFS))
> + return;
> +
> + if (hisi_acc_vdev->debug_migf)
> + kfree(hisi_acc_vdev->debug_migf);
> +}
> +
> static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
> {
> struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(core_vdev);
> @@ -1311,9 +1522,11 @@ static int hisi_acc_vfio_pci_open_device(struct
> vfio_device *core_vdev)
> return ret;
> }
> hisi_acc_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
> + hisi_acc_vdev->dev_opened = true;
> }
>
> vfio_pci_core_finish_enable(vdev);
> +
> return 0;
> }
>
> @@ -1322,7 +1535,10 @@ static void hisi_acc_vfio_pci_close_device(struct
> vfio_device *core_vdev)
> struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(core_vdev);
> struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
>
> + mutex_lock(&hisi_acc_vdev->state_mutex);
> + hisi_acc_vdev->dev_opened = false;
> iounmap(vf_qm->io_base);
> + mutex_unlock(&hisi_acc_vdev->state_mutex);
> vfio_pci_core_close_device(core_vdev);
> }
>
> @@ -1413,6 +1629,9 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev
> *pdev, const struct pci_device
> ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device);
> if (ret)
> goto out_put_vdev;
> +
> + if (ops == &hisi_acc_vfio_pci_migrn_ops)
> + hisi_acc_vfio_debug_init(hisi_acc_vdev);
> return 0;
>
> out_put_vdev:
> @@ -1425,6 +1644,7 @@ static void hisi_acc_vfio_pci_remove(struct pci_dev
> *pdev)
> struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_drvdata(pdev);
>
> vfio_pci_core_unregister_device(&hisi_acc_vdev->core_device);
> + hisi_acc_vf_debugfs_exit(hisi_acc_vdev);
> vfio_put_device(&hisi_acc_vdev->core_device.vdev);
> }
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> index 5bab46602fad..f86f3b88b09e 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> @@ -32,6 +32,7 @@
> #define QM_SQC_VFT_BASE_MASK_V2 GENMASK(15, 0)
> #define QM_SQC_VFT_NUM_SHIFT_V2 45
> #define QM_SQC_VFT_NUM_MASK_V2 GENMASK(9, 0)
> +#define QM_MB_CMD_NOT_READY 0xffffffff
>
> /* RW regs */
> #define QM_REGS_MAX_LEN 7
> @@ -111,5 +112,10 @@ struct hisi_acc_vf_core_device {
> int vf_id;
> struct hisi_acc_vf_migration_file *resuming_migf;
> struct hisi_acc_vf_migration_file *saving_migf;
> +
> + /* To make sure the device is opened */
> + bool dev_opened;
> + /* To save migration data */
> + struct hisi_acc_vf_migration_file *debug_migf;
> };
> #endif /* HISI_ACC_VFIO_PCI_H */
> --
> 2.24.0

Re: [PATCH] soc: hisilicon: Support memory repair driver on Kunpeng SoC
by Jonathan Cameron 19 Jul '24
On Tue, 9 Jul 2024 17:34:18 +0800
Xiaofei Tan <tanxiaofei(a)huawei.com> wrote:
> Kunpeng SoCs support several memory repair capabilities. Normally,
> such repairs are done in the firmware startup stage, or are triggered
> by the BMC (Baseboard Management Controller) and done in firmware at
> runtime. The OS can't perceive such capabilities.
>
> Some repairs support online operation, while others don't. One reason
> for not supporting online repair is that the memory is in use and the
> hardware does not support backpressure on the primary access channel.
>
> In order to support more online memory repair, we seek solutions from
> the OS: try to isolate the memory page first, do the repair, and
> de-isolate after the repair is done.
>
> This patch supports online ACLS (Adapter Cache Line Sparing) repair,
> and may add other repairs such as PPR in the future.
>
> Signed-off-by: Xiaofei Tan <tanxiaofei(a)huawei.com>
As you know PPR is on the list of features we are proposing to cover
with the 'new' EDAC stuff / RAS control that Shiju is working on so
as to bring a unified interface to CXL and anything else (e.g. this)
that needs it. (I happen to know there is a proposal from Micron to
discuss PPR specifically at LPC).
I'm not keen to see an upstream solution for a specific device that we then
have to move over later.
More significantly, what stops the user of this from shooting themselves in
the foot? How do we know the memory is offline when repair is attempted?
That was the bit I was least sure about for the CXL version (and why it
needs to be kernel mediated rather than just made a userspace problem).
If this is something custom to keep openEuler happy in the short term then
fair enough, but we don't want lots of (OS in the loop) PPR solutions.
I did raise an open question in one of those threads on whether anyone
currently cares about poison repair on a running system. Interest so far is
minor, so it may be challenging to convince people of this, except that it
might be used on initial boot to remove lines known bad from previous boots
etc. before onlining the memory.
If we are considering a DAX / ACPI Specific Purpose Memory model for the
memory we are repairing, things get easier, as we can do the repair after
tearing down the mappings.
Various comments inline. May well overlap with earlier reviews though!
> ---
> drivers/soc/hisilicon/Kconfig | 10 +
> drivers/soc/hisilicon/Makefile | 1 +
> drivers/soc/hisilicon/hisi_mem_ras.c | 553 +++++++++++++++++++++++++++
> drivers/soc/hisilicon/hisi_mem_ras.h | 84 ++++
Needs Documentation/ABI/*
> 4 files changed, 648 insertions(+)
> create mode 100644 drivers/soc/hisilicon/hisi_mem_ras.c
> create mode 100644 drivers/soc/hisilicon/hisi_mem_ras.h
>
> diff --git a/drivers/soc/hisilicon/Kconfig b/drivers/soc/hisilicon/Kconfig
> index ae8e55a776e0..983cafc38e5c 100644
> --- a/drivers/soc/hisilicon/Kconfig
> +++ b/drivers/soc/hisilicon/Kconfig
> @@ -44,4 +44,14 @@ config HISI_HBMCACHE
> To compile the driver as a module, choose M here:
> the module will be called hisi_hbmcache.
>
> +config HISI_MEM_RAS
> + tristate "Add support for HISI Memory repair"
> + depends on ACPI
> + help
> + Add Memory repair support for Hisilicon device, which can be used to query
> + memory repair cap and repair hardware error in memory devices. This feature
> + need to work with hardware firmwares.
I'd drop the last sentence. It's a detail the kernel doesn't need.
> +
> + If not sure say no.
> +
> endmenu
> diff --git a/drivers/soc/hisilicon/Makefile b/drivers/soc/hisilicon/Makefile
> index 4ef8823250e8..b339a67bcd77 100644
> --- a/drivers/soc/hisilicon/Makefile
> +++ b/drivers/soc/hisilicon/Makefile
> @@ -6,3 +6,4 @@ obj-$(CONFIG_KUNPENG_HCCS) += kunpeng_hccs.o
> obj-$(CONFIG_HISI_HBMDEV) += hisi_hbmdev.o
> obj-$(CONFIG_HISI_HBMCACHE) += hisi_hbmcache.o
> obj-$(CONFIG_ARM64_PBHA) += pbha.o
> +obj-$(CONFIG_HISI_MEM_RAS) += hisi_mem_ras.o
> diff --git a/drivers/soc/hisilicon/hisi_mem_ras.c b/drivers/soc/hisilicon/hisi_mem_ras.c
> new file mode 100644
> index 000000000000..a29c41ccdba1
> --- /dev/null
> +++ b/drivers/soc/hisilicon/hisi_mem_ras.c
> @@ -0,0 +1,553 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) Huawei Technologies Co., Ltd. 2024. All rights reserved.
> + */
> +
> +#include <linux/kobject.h>
> +#include <linux/module.h>
> +#include <linux/acpi.h>
> +#include <linux/iopoll.h>
> +#include <linux/platform_device.h>
> +#include <linux/mm.h>
> +#include <acpi/pcc.h>
> +
> +#include "hisi_mem_ras.h"
> +#define DRV_NAME "hisi_mem_ras"
> +
> +#define HISI_MEM_PCC_CMD_WAIT_RETRIES_NUM 500ULL
> +#define HISI_MEM_POLL_STATUS_TIME_INTERVAL_US 3
> +
> +static struct hisi_mem_ras *mras;
> +
> +static int hisi_mem_get_paddr_range(struct hisi_mem_dev *hdev)
> +{
> + struct platform_device *pdev = hdev->pdev;
> + struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
> + struct hisi_mem_register_ctx ctx = {0};
Might as well set dev here.
> + acpi_handle handle = adev->handle;
> + acpi_status status;
> +
> + if (!acpi_has_method(handle, METHOD_NAME__CRS)) {
> + dev_err(&pdev->dev, "No _CRS method.\n");
> + return -ENODEV;
> + }
> +
> + ctx.dev = &pdev->dev;
> + status = acpi_walk_resources(handle, METHOD_NAME__CRS,
> + hisi_mem_get_paddr_range_cb, &ctx);
> + if (ACPI_FAILURE(status))
> + return ctx.err;
> +
> + hdev->paddr_min = ctx.paddr_min;
> + hdev->paddr_max = ctx.paddr_max;
> + return 0;
> +}
> +
> +
> +static int hisi_mem_register_pcc_channel(struct hisi_mem_dev *hdev)
> +{
> + struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
> + struct mbox_client *cl = &cl_info->client;
> + struct pcc_mbox_chan *pcc_chan;
> + struct platform_device *pdev = hdev->pdev;
> + int rc;
> +
> + cl->dev = &pdev->dev;
> + cl->tx_block = false;
> + cl->knows_txdone = true;
> + cl->tx_done = hisi_mem_chan_tx_done;
> + cl->rx_callback = hdev->verspec_data->rx_callback;
> + init_completion(&cl_info->done);
> +
> + pcc_chan = pcc_mbox_request_channel(cl, hdev->chan_id);
> + if (IS_ERR(pcc_chan)) {
> + dev_err(&pdev->dev, "PPC channel request failed.\n");
> + rc = -ENODEV;
> + goto out;
return dev_err_probe(&pdev->dev, -ENODEV,...);
> + }
> + cl_info->pcc_chan = pcc_chan;
Maybe worth some __free() magic and delaying setting this until all
error paths are done then using a no_free_ptr() to steal the pointer
from the autocleanup. Will let you do direct returns and
use dev_err_probe() giving cleaner result.
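For reference, a user-space sketch of the cleanup-attribute pattern that the kernel's DEFINE_FREE()/__free()/no_free_ptr() helpers (from <linux/cleanup.h>) are built on. The macro names below are illustrative stand-ins, not the real kernel ones:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-ins for the kernel's __free()/no_free_ptr() */
#define __free(fn) __attribute__((cleanup(fn##_cleanup)))
#define no_free_ptr(p) ({ void *__t = (p); (p) = NULL; __t; })

static int free_count;

static void kfree_cleanup(void *pp)
{
	void *p = *(void **)pp;

	if (p) {
		free_count++;
		free(p);
	}
}

/* Error paths just return: the cleanup attribute frees buf for us.
 * On success, no_free_ptr() steals the pointer so nothing is freed. */
static char *make_buf(int fail)
{
	char *buf __free(kfree) = malloc(16);

	if (!buf)
		return NULL;
	if (fail)
		return NULL;		/* buf auto-freed here */
	strcpy(buf, "ok");
	return no_free_ptr(buf);	/* ownership passes to caller */
}
```

With a DEFINE_FREE() wrapper around pcc_mbox_free_channel(), the register function could then use direct `return dev_err_probe(...)` on every failure and a single no_free_ptr() on success.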
> + cl_info->mbox_chan = pcc_chan->mchan;
> +
> + /*
> + * pcc_chan->latency is just a nominal value. In reality the remote
> + * processor could be much slower to reply. So add an arbitrary amount
> + * of wait on top of nominal.
> + */
> + cl_info->deadline_us =
> + HISI_MEM_PCC_CMD_WAIT_RETRIES_NUM * pcc_chan->latency;
> + if (!hdev->verspec_data->has_txdone_irq &&
> + cl_info->mbox_chan->mbox->txdone_irq) {
> + dev_err(&pdev->dev, "PCC IRQ in PCCT is enabled.\n");
> + rc = -EINVAL;
> + goto err_mbx_channel_free;
> + } else if (hdev->verspec_data->has_txdone_irq &&
> + !cl_info->mbox_chan->mbox->txdone_irq) {
> + dev_err(&pdev->dev, "PCC IRQ in PCCT isn't supported.\n");
> + rc = -EINVAL;
> + goto err_mbx_channel_free;
> + }
> +
> + if (pcc_chan->shmem_base_addr) {
> + cl_info->pcc_comm_addr = ioremap(pcc_chan->shmem_base_addr,
> + pcc_chan->shmem_size);
> + if (!cl_info->pcc_comm_addr) {
> + dev_err(&pdev->dev, "Failed to ioremap PCC communication region for channel-%u.\n",
> + hdev->chan_id);
> + rc = -ENOMEM;
> + goto err_mbx_channel_free;
> + }
> + }
> +
> + return 0;
> +
> +err_mbx_channel_free:
> + pcc_mbox_free_channel(cl_info->pcc_chan);
> +out:
Never have an out label that just returns. Just return inline as it is
easier to read.
> + return rc;
> +}
> +
> +
> +static int hisi_mem_wait_cmd_complete_by_poll(struct hisi_mem_dev *hdev)
> +{
> + struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
> + struct acpi_pcct_shared_memory __iomem *comm_base =
> + cl_info->pcc_comm_addr;
That's ugly formatting. Just indent one tab more than the line above
before cl_info->...
> + struct platform_device *pdev = hdev->pdev;
> + u16 status;
> + int ret;
> +
> + /*
> + * Poll PCC status register every 3us(delay_us) for maximum of
> + * deadline_us(timeout_us) until PCC command complete bit is set(cond)
> + */
> + ret = readw_poll_timeout(&comm_base->status, status,
> + status & PCC_STATUS_CMD_COMPLETE,
> + HISI_MEM_POLL_STATUS_TIME_INTERVAL_US,
> + cl_info->deadline_us);
> + if (unlikely(ret))
> + dev_err(&pdev->dev, "poll PCC status failed, ret = %d.\n", ret);
> +
> + return ret;
> +}
> +
> +static int hisi_mem_wait_cmd_complete_by_irq(struct hisi_mem_dev *hdev)
> +{
> + struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
> + struct platform_device *pdev = hdev->pdev;
> +
> + if (!wait_for_completion_timeout(&cl_info->done,
> + usecs_to_jiffies(cl_info->deadline_us))) {
> + dev_err(&pdev->dev, "PCC command executed timeout!\n");
> + return -ETIMEDOUT;
> + }
> +
> + return 0;
> +}
> +
> +static inline void hisi_mem_fill_pcc_shared_mem_region(struct hisi_mem_dev *hdev,
> + u8 cmd,
> + struct hisi_mem_desc *desc,
Given this is the request, why use the union rather than
struct hisi_mem_req_desc?
> + void __iomem *comm_space,
> + u16 space_size)
> +{
> + struct acpi_pcct_shared_memory tmp = {
> + .signature = PCC_SIGNATURE | hdev->chan_id,
> + .command = cmd,
> + .status = 0,
I'd not bother initializing status. C will do it for you and it's a fairly
sensible default.
> + };
> +
> + memcpy_toio(hdev->cl_info.pcc_comm_addr, (void *)&tmp,
Should never need to cast to or from a void *
> + sizeof(struct acpi_pcct_shared_memory));
sizeof(tmp)
> +
> + /* Copy the message to the PCC comm space */
> + memcpy_toio(comm_space, (void *)desc, space_size);
If desc is less than the size of the memspace this is going to copy in
extra. Maybe that's prevented somewhere but it's not obvious to me.
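If there's no guarantee elsewhere, clamping the length avoids copying past the descriptor. A plain-C sketch of what a min_t()-style clamp would do in the driver (names here are illustrative, not from the patch):

```c
#include <stddef.h>
#include <string.h>

/* Copy at most the smaller of the descriptor size and the available
 * communication space, so no bytes beyond the descriptor are read. */
static size_t copy_desc(void *dst, size_t space_size,
			const void *src, size_t desc_size)
{
	size_t n = desc_size < space_size ? desc_size : space_size;

	memcpy(dst, src, n);
	return n;
}
```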
> +}
> +
> +static inline void hisi_mem_fill_ext_pcc_shared_mem_region(struct hisi_mem_dev *hdev,
> + u8 cmd,
> + struct hisi_mem_desc *desc,
> + void __iomem *comm_space,
> + u16 space_size)
> +{
> + struct acpi_pcct_ext_pcc_shared_memory tmp = {
> + .signature = PCC_SIGNATURE | hdev->chan_id,
> + .flags = PCC_CMD_COMPLETION_NOTIFY,
> + .length = HISI_MEM_PCC_SHARE_MEM_BYTES,
> + .command = cmd,
> + };
> +
> + memcpy_toio(hdev->cl_info.pcc_comm_addr, (void *)&tmp,
Should be no point in void * casts.
> + sizeof(struct acpi_pcct_ext_pcc_shared_memory));
sizeof(tmp)
> +
> + /* Copy the message to the PCC comm space */
> + memcpy_toio(comm_space, (void *)desc, space_size);
> +}
> +
> +static const struct hisi_mem_verspecific_data hisi04b1_verspec_data = {
> + .rx_callback = NULL,
> + .wait_cmd_complete = hisi_mem_wait_cmd_complete_by_poll,
> + .fill_pcc_shared_mem = hisi_mem_fill_pcc_shared_mem_region,
> + .shared_mem_size = sizeof(struct acpi_pcct_shared_memory),
> + .has_txdone_irq = false,
> +};
> +
> +static const struct hisi_mem_verspecific_data hisi04b2_verspec_data = {
> + .rx_callback = hisi_mem_pcc_rx_callback,
> + .wait_cmd_complete = hisi_mem_wait_cmd_complete_by_irq,
> + .fill_pcc_shared_mem = hisi_mem_fill_ext_pcc_shared_mem_region,
> + .shared_mem_size = sizeof(struct acpi_pcct_ext_pcc_shared_memory),
> + .has_txdone_irq = true,
> +};
> +
> +static int hisi_mem_pcc_cmd_send(struct hisi_mem_dev *hdev, u8 cmd,
> + struct hisi_mem_desc *desc)
The operation isn't done in place, so I'd split this into request and response.
They can use the same data if that actually makes sense. I'm not sure it does though.
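One possible shape for that split, with the request and response typed all the way through (user-space sketch only; fw_exec() stands in for the PCC round trip and every name here is hypothetical):

```c
#include <stdint.h>

/* Hypothetical typed request/response, replacing the in-place union. */
struct mem_req {
	uint64_t paddr;
	uint8_t subcmd;
};

struct mem_rsp {
	uint8_t ret_status;	/* 0: success */
};

/* Fake "firmware": succeeds only for subcommand 0. */
static void fw_exec(const struct mem_req *req, struct mem_rsp *rsp)
{
	rsp->ret_status = (req->subcmd == 0) ? 0 : 3;
}

/* The send path takes a const request and fills a separate response,
 * so callers can't confuse the two views of one buffer. */
static int cmd_send(const struct mem_req *req, struct mem_rsp *rsp)
{
	fw_exec(req, rsp);	/* stands in for mailbox send + wait + copy */
	return 0;
}
```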
> +{
> + const struct hisi_mem_verspecific_data *verspec_data = hdev->verspec_data;
> + struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
> + struct platform_device *pdev = hdev->pdev;
> + void __iomem *comm_space;
> + u16 space_size;
> + int ret;
> +
> + comm_space = cl_info->pcc_comm_addr + verspec_data->shared_mem_size;
> + space_size = HISI_MEM_PCC_SHARE_MEM_BYTES - verspec_data->shared_mem_size;
> + verspec_data->fill_pcc_shared_mem(hdev, cmd, desc,
> + comm_space, space_size);
> + if (verspec_data->has_txdone_irq)
> + reinit_completion(&cl_info->done);
> +
> + /* Ring doorbell */
> + ret = mbox_send_message(cl_info->mbox_chan, &cmd);
> + if (ret < 0) {
> + dev_err(&pdev->dev, "Send PCC mbox message failed, ret = %d.\n",
> + ret);
> + goto end;
> + }
> +
> + ret = verspec_data->wait_cmd_complete(hdev);
> + if (ret)
> + goto end;
> +
> + /* Copy response data */
> + memcpy_fromio((void *)desc, comm_space, space_size);
Should be no need to cast to void *
> +
> +end:
> + if (verspec_data->has_txdone_irq)
> + mbox_chan_txdone(cl_info->mbox_chan, ret);
> + else
> + mbox_client_txdone(cl_info->mbox_chan, ret);
> + return ret;
> +}
> +
> +int hisi_mem_do_acls_query(struct hisi_mem_dev *hdev, int *status, u64 paddr)
> +{
> + struct platform_device *pdev = hdev->pdev;
> + struct hisi_mem_desc desc;
> + int ret;
> +
> + memset(&desc, 0, sizeof(desc));
> + desc.req.paddr = paddr;
> + desc.req.subcmd = HISI_MEM_RAS_QUERY_CAP;
> + ret = hisi_mem_pcc_cmd_send(hdev, HISI_MEM_RAS_ACLS, &desc);
Similar comments to below.
> + if (ret) {
> + dev_err(&pdev->dev, "do acls query failed, ret = %d.\n", ret);
> + return ret;
> + }
> +
> + *status = desc.rsp.retStatus;
> + return 0;
> +}
> +
> +static ssize_t acls_query_store(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct hisi_mem_dev *hdev, *tmp;
> + int status, ret;
> + u64 paddr;
> +
> + if (kstrtoull(buf, 16, &paddr))
> + return -EINVAL;
> +
> + list_for_each_entry_safe(hdev, tmp, &mras->list, list) {
> + struct platform_device *pdev = hdev->pdev;
> +
> + if (paddr < hdev->paddr_min || paddr > hdev->paddr_max)
> + continue;
> +
> + ret = hisi_mem_do_acls_query(hdev, &status, paddr);
> + if (ret)
> + return ret;
> +
> + if (status == HISI_MEM_ACLS_OK) {
> + return count;
As below, I'd rather see errors out of line and the good path inline:

	if (status == HISI_MEM_ACLS_QUERY_NO_RES) {
		dev_info(&pdev->dev, "acls query no resource left.\n");
		return -ENOSPC;
	} else if (status != HISI_MEM_ACLS_OK) {
		dev_err(&pdev->dev, "acls query error code %d.\n", status);
		return -EIO;
	}
	return count;

It's a minor thing but if people are reviewing a lot of code this
convention saves them thinking about what is the good path and what is the bad.
> + } else if (status == HISI_MEM_ACLS_QUERY_NO_RES) {
> + dev_info(&pdev->dev, "acls query no resource left.\n");
> + return -ENOSPC;
> + }
> +
> + dev_err(&pdev->dev, "acls query error code %d.\n", status);
> + return -EIO;
> + }
> +
> + return -ENODEV;
> +}
> +
> +static struct kobj_attribute acls_query_store_attribute =
> + __ATTR(acls_query, 0200, NULL, acls_query_store);
> +
> +int hisi_mem_do_acls_repair(struct hisi_mem_dev *hdev, int *status, u64 paddr)
> +{
> + struct hisi_mem_desc desc;
> + struct platform_device *pdev;
> + int ret;
> +
> + memset(&desc, 0, sizeof(desc));
> + desc.req.paddr = paddr;
> + desc.req.subcmd = HISI_MEM_RAS_DO_REPAIR;
	struct hisi_mem_desc desc = {
		.req = {
			.paddr = paddr,
			.subcmd = HISI_MEM_RAS_DO_REPAIR,
		},
	};

Probably better than memset unless the descriptor has holes that
we are relying on filling with zero. If so, add a comment.
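Worth noting that designated initializers do zero every member that isn't named (only padding bytes are formally unspecified). A quick user-space check, with the struct layout mirrored here just for illustration:

```c
#include <stdint.h>

/* Mirror of the driver's request descriptor layout, rebuilt here only
 * to show that unnamed members come out zeroed. */
struct req_desc {
	uint64_t paddr;
	uint8_t subcmd;
	uint8_t rsv[3];
};

static struct req_desc make_req(uint64_t paddr, uint8_t subcmd)
{
	struct req_desc d = {
		.paddr = paddr,
		.subcmd = subcmd,
		/* .rsv not named: zero-initialized per C11 6.7.9 */
	};
	return d;
}
```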
> + ret = hisi_mem_pcc_cmd_send(hdev, HISI_MEM_RAS_ACLS, &desc);
As below, I'd use separate storage for req and rsp so we can type them
all the way through. They are tiny so who cares about overhead ;)
> + if (ret) {
> + dev_err(&pdev->dev, "do acls repair failed, ret = %d.\n", ret);
> + return ret;
> + }
> +
> + *status = desc.rsp.retStatus;
> + return 0;
> +}
> +
> +static ssize_t acls_repair_store(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct hisi_mem_dev *hdev, *tmp;
> + u64 paddr, ret;
> + int status;
> +
> + if (kstrtoull(buf, 16, &paddr))
> + return -EINVAL;
> +
> + list_for_each_entry_safe(hdev, tmp, &mras->list, list) {
> + struct platform_device *pdev = hdev->pdev;
> +
> + if (paddr < hdev->paddr_min || paddr > hdev->paddr_max)
> + continue;
Hmm. So this is why the unified control.
I suspect the question an upstream proposal would get is why your
instances can't all have their own controls + paddr_min/max exposed in sysfs,
then issue repair on the right ones only.
A generic interface is going to include info on granularity as well.
How big is the repair?
> +
> + ret = hisi_mem_do_acls_repair(hdev, &status, paddr);
> + if (ret)
> + return ret;
> +
> + if (status == HISI_MEM_ACLS_OK)
> + return count;
Flip this so the error condition is out of line, not the good one.
> +
> + dev_err(&pdev->dev, "acls repair error code %d.\n", status);
> + return -EIO;
> + }
> +
> + return -ENODEV;
> +}
> +static struct kobj_attribute acls_repair_store_attribute =
> + __ATTR(acls_repair, 0200, NULL, acls_repair_store);
> +
> +static struct attribute *acls_attrs[] = {
> + &acls_query_store_attribute.attr,
> + &acls_repair_store_attribute.attr,
> + NULL,
No comma after null terminators.
> +};
> +
> +static struct attribute_group acls_attr_group = {
> + .attrs = acls_attrs,
> +};
As these are associated with init / exit, not probe / remove, move
them down to make that association more obvious.
> +
> +static int hisi_mem_ras_probe(struct platform_device *pdev)
> +{
> + struct hisi_mem_dev *hdev;
> + int ret = -ENOMEM;
> +
> + hdev = kzalloc(sizeof(struct hisi_mem_dev), GFP_KERNEL);
> + if (!hdev)
> + return -ENOMEM;
> +
> + hdev->pdev = pdev;
> + ret = hisi_mem_get_pcc_chan_id(hdev);
> + if (ret)
> + goto free_hdev;
> +
> + ret = hisi_mem_register_pcc_channel(hdev);
This lot all needs tearing down in remove..
> + if (ret)
> + goto free_hdev;
> +
> + platform_set_drvdata(pdev, hdev);
> + list_add_tail(&hdev->list, &mras->list);
> + hdev->verspec_data = &hisi04b2_verspec_data;
> + return 0;
> +
> +free_hdev:
> + kfree(hdev);
> + return ret;
> +}
> +
> +static int hisi_mem_ras_remove(struct platform_device *pdev)
> +{
> + struct hisi_mem_dev *hdev = platform_get_drvdata(pdev);
> +
> + platform_set_drvdata(pdev, NULL);
Should be no need to do that.
> + list_del(&hdev->list);
Use a devm_add_action_or_reset() for this.
> +
> + return 0;
> +}
> +
> +static const struct acpi_device_id hisi_mem_acpi_match[] = {
> + { "HISI0FF1", 0},
> + { },
No need for comma on null terminator.
> +};
> +MODULE_DEVICE_TABLE(acpi, hisi_mem_acpi_match);
> +
> +static struct platform_driver hisi_mem_ras_driver = {
> + .probe = hisi_mem_ras_probe,
> + .remove = hisi_mem_ras_remove,
> + .driver = {
> + .name = "hisi_mem_ras",
> + .acpi_match_table = hisi_mem_acpi_match,
> + },
> +};
> +
> +static int __init hisi_mem_ras_init(void)
> +{
> + int ret;
> +
> + mras = kzalloc(sizeof(struct hisi_mem_ras), GFP_KERNEL);
sizeof(*mras)
> + if (!mras)
> + return -ENOMEM;
> +
> + mras->acls_kobj = kobject_create_and_add("acls", kernel_kobj);
> + if (!mras->acls_kobj) {
> + kfree(mras);
> + return -ENOMEM;
> + }
> +
> + ret = sysfs_create_group(mras->acls_kobj, &acls_attr_group);
> + if (ret) {
> + mras->acls_kobj = NULL;
> + kobject_put(mras->acls_kobj);
Put something you just made NULL?
Use a goto for this stuff so it's consistent.
> + return ret;
> + }
This is unusual to see. I guess ABI docs will make it clear
what the intent is. Gut feeling is it won't be upstreamable though.
If the aim is bringing the various PPR control instances together then
that's what the RAS feature control proposal is all about.
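On the earlier error-path points: the init function wants the usual goto ladder, releasing in reverse order, and the kobject must be put before (not after) the pointer is cleared. A minimal stand-alone sketch of the ladder convention (resource names are fake, purely for illustration):

```c
/* Tracks which fake resources were released, so the ladder is testable. */
static int freed[2];

static void release(int id)
{
	freed[id] = 1;
}

/* fail_at selects which acquisition fails; everything acquired before
 * it is released in reverse order via the goto ladder. */
static int init_demo(int fail_at)
{
	/* acquire A (index 0) */
	if (fail_at == 0)
		return -1;
	/* acquire B (index 1) */
	if (fail_at == 1)
		goto err_release_a;
	/* acquire C */
	if (fail_at == 2)
		goto err_release_b;
	return 0;

err_release_b:
	release(1);
err_release_a:
	release(0);
	return -1;
}
```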
> +
> + ret = platform_driver_register(&hisi_mem_ras_driver);
> + if (ret) {
> + sysfs_remove_group(mras->acls_kobj, &acls_attr_group);
> + kobject_put(mras->acls_kobj);
> + kfree(mras);
> + return ret;
> + }
> +
> + INIT_LIST_HEAD(&mras->list);
> +
> + return 0;
> +}
> +module_init(hisi_mem_ras_init);
> +
> +static void __exit hisi_mem_ras_exit(void)
> +{
> + platform_driver_unregister(&hisi_mem_ras_driver);
> + sysfs_remove_group(mras->acls_kobj, &acls_attr_group);
> + kobject_put(mras->acls_kobj);
> + kfree(mras);
> +}
> +module_exit(hisi_mem_ras_exit);
> +
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Xiaofei Tan <tanxiaofei(a)huawei.com>");
> +MODULE_DESCRIPTION("HISILICON Memory RAS driver");
> +MODULE_ALIAS("platform:" DRV_NAME);
> diff --git a/drivers/soc/hisilicon/hisi_mem_ras.h b/drivers/soc/hisilicon/hisi_mem_ras.h
> new file mode 100644
> index 000000000000..a9acbb0c1012
> --- /dev/null
> +++ b/drivers/soc/hisilicon/hisi_mem_ras.h
> @@ -0,0 +1,84 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) Huawei Technologies Co., Ltd. 2023. All rights reserved.
If posting somewhere, update the year.
> + */
> +
> +#ifndef _HISI_MEM_RAS_H
> +#define _HISI_MEM_RAS_H
> +
> +struct hisi_mem_ras {
> + struct list_head list;
> + struct kobject *acls_kobj;
> +};
> +
> +struct hisi_mem_mbox_client_info {
> + struct mbox_client client;
> + struct mbox_chan *mbox_chan;
> + struct pcc_mbox_chan *pcc_chan;
> + u64 deadline_us;
> + void __iomem *pcc_comm_addr;
> + struct completion done;
> +};
> +
> +struct hisi_mem_desc;
> +struct hisi_mem_dev;
> +
> +struct hisi_mem_verspecific_data {
> + void (*rx_callback)(struct mbox_client *cl, void *mssg);
> + int (*wait_cmd_complete)(struct hisi_mem_dev *hdev);
> + void (*fill_pcc_shared_mem)(struct hisi_mem_dev *hdev,
> + u8 cmd, struct hisi_mem_desc *desc,
As above, I'm not convinced a fill should take the union. You only
fill with requests.
> + void __iomem *comm_space,
> + u16 space_size);
> + u16 shared_mem_size;
> + bool has_txdone_irq;
> +};
> +
> +struct hisi_mem_dev {
> + struct list_head list;
> + struct platform_device *pdev;
> + //struct acpi_device *acpi_dev;
Tidy that up.
> + const struct hisi_mem_verspecific_data *verspec_data;
> + /* device capabilities from firmware, like hisi_mem_CAPS_xxx. */
> + u64 caps;
> + u8 chan_id;
> + u64 paddr_min;
> + u64 paddr_max;
> + struct mutex lock;
> + struct hisi_mem_mbox_client_info cl_info;
> +};
> +
> +#define HISI_MEM_PCC_SHARE_MEM_BYTES 128
> +enum hisi_mem_ras_cmd_type {
> + HISI_MEM_RAS_ACLS = 0x4,
> + HISI_MEM_RAS_PPR = 0X5,
> +};
> +
> +enum hisi_mem_ras_subcmd_type {
> + HISI_MEM_RAS_QUERY_CAP = 0x0,
> + HISI_MEM_RAS_DO_REPAIR = 0x1,
> +};
> +
> +struct hisi_mem_req_desc {
> + u64 paddr;
> + u8 subcmd;
> + u8 rsv[3];
> +};
> +
> +#define HISI_MEM_ACLS_OK 0
> +#define HISI_MEM_ACLS_QUERY_NO_RES 1
> +#define HISI_MEM_ACLS_REPAIR_FAIL 2
> +#define HISI_MEM_ACLS_EINVAL 3
> +struct hisi_mem_rsp_desc {
> + u8 retStatus; /* 0: success, other: failure */
> + u8 rsv[7];
> +};
> +
> +struct hisi_mem_desc {
> + union {
> + struct hisi_mem_req_desc req;
> + struct hisi_mem_rsp_desc rsp;
> + };
> +};
> +
> +#endif
18 Jul '24
Hi Tanxiaofei,
Some more comments inline.
>-----Original Message-----
>From: tanxiaofei <tanxiaofei(a)huawei.com>
>Sent: 09 July 2024 10:34
>To: Shiju Jose <shiju.jose(a)huawei.com>; Jonathan Cameron
><jonathan.cameron(a)huawei.com>; Mauro Carvalho Chehab
><M.Chehab(a)huawei.com>; Roberto Sassu <roberto.sassu(a)huawei.com>;
>Guohanjun (Hanjun Guo) <guohanjun(a)huawei.com>
>Cc: linuxarm(a)openeuler.org; tanxiaofei <tanxiaofei(a)huawei.com>
>Subject: [PATCH] soc: hisilicon: Support memory repair driver on Kunpeng SoC
>
>Kunpeng SoC support several memory repair capability. Normally, such repairs
>are done in firmware startup stage or triggered by BMC(Baseboard
>Management Controller) and done in firmware in runtime stage. OS can't
>perceive such capability.
>
>Some support online repair, while others can't. One reason of not supporting
>online repair is that the memory is in use and the hardware does not support
>backpressure on the primary access channel.
>
>In order to support more online memory repair, we seek solutions from OS. That
>is try to isolate the memory page first, and do repair, and de-isloate after repair
>done.
>
>This patch support online ACLS(Adapter Cache Line Sparing) repair, and may add
>other repair such as PPR in future.
>
>Signed-off-by: Xiaofei Tan <tanxiaofei(a)huawei.com>
>---
> drivers/soc/hisilicon/Kconfig | 10 +
> drivers/soc/hisilicon/Makefile | 1 +
> drivers/soc/hisilicon/hisi_mem_ras.c | 553 +++++++++++++++++++++++++++
>drivers/soc/hisilicon/hisi_mem_ras.h | 84 ++++
> 4 files changed, 648 insertions(+)
> create mode 100644 drivers/soc/hisilicon/hisi_mem_ras.c
> create mode 100644 drivers/soc/hisilicon/hisi_mem_ras.h
>
>diff --git a/drivers/soc/hisilicon/Kconfig b/drivers/soc/hisilicon/Kconfig index
>ae8e55a776e0..983cafc38e5c 100644
>--- a/drivers/soc/hisilicon/Kconfig
>+++ b/drivers/soc/hisilicon/Kconfig
>@@ -44,4 +44,14 @@ config HISI_HBMCACHE
> To compile the driver as a module, choose M here:
> the module will be called hisi_hbmcache.
>
>+config HISI_MEM_RAS
>+ tristate "Add support for HISI Memory repair"
>+ depends on ACPI
>+ help
>+ Add Memory repair support for Hisilicon device, which can be used to
>query
>+ memory repair cap and repair hardware error in memory devices. This
>feature
>+ need to work with hardware firmwares.
>+
>+ If not sure say no.
>+
> endmenu
>diff --git a/drivers/soc/hisilicon/Makefile b/drivers/soc/hisilicon/Makefile index
>4ef8823250e8..b339a67bcd77 100644
>--- a/drivers/soc/hisilicon/Makefile
>+++ b/drivers/soc/hisilicon/Makefile
>@@ -6,3 +6,4 @@ obj-$(CONFIG_KUNPENG_HCCS) += kunpeng_hccs.o
> obj-$(CONFIG_HISI_HBMDEV) += hisi_hbmdev.o
> obj-$(CONFIG_HISI_HBMCACHE) += hisi_hbmcache.o
> obj-$(CONFIG_ARM64_PBHA) += pbha.o
>+obj-$(CONFIG_HISI_MEM_RAS) += hisi_mem_ras.o
>diff --git a/drivers/soc/hisilicon/hisi_mem_ras.c
>b/drivers/soc/hisilicon/hisi_mem_ras.c
>new file mode 100644
>index 000000000000..a29c41ccdba1
>--- /dev/null
>+++ b/drivers/soc/hisilicon/hisi_mem_ras.c
>@@ -0,0 +1,553 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/*
>+ * Copyright (C) Huawei Technologies Co., Ltd. 2024. All rights reserved.
>+ */
>+
>+#include <linux/kobject.h>
>+#include <linux/module.h>
>+#include <linux/acpi.h>
>+#include <linux/iopoll.h>
>+#include <linux/platform_device.h>
>+#include <linux/mm.h>
>+#include <acpi/pcc.h>
>+
>+#include "hisi_mem_ras.h"
>+#define DRV_NAME "hisi_mem_ras"
>+
>+#define HISI_MEM_PCC_CMD_WAIT_RETRIES_NUM 500ULL
>+#define HISI_MEM_POLL_STATUS_TIME_INTERVAL_US 3
>+
>+static struct hisi_mem_ras *mras;
>+
>+struct hisi_mem_register_ctx {
>+ struct device *dev;
>+ u8 chan_id;
>+ int err;
>+ u64 paddr_min;
>+ u64 paddr_max;
>+};
>+
>+static acpi_status hisi_mem_get_chan_id_cb(struct acpi_resource *ares,
>+ void *context)
>+{
>+ struct acpi_resource_generic_register *reg;
>+ struct hisi_mem_register_ctx *ctx = context;
>+
>+ if (ares->type != ACPI_RESOURCE_TYPE_GENERIC_REGISTER)
>+ return AE_OK;
>+
>+ reg = &ares->data.generic_reg;
>+ if (reg->space_id != ACPI_ADR_SPACE_PLATFORM_COMM) {
>+ dev_err(ctx->dev, "Bad register resource.\n");
>+ ctx->err = -EINVAL;
>+ return AE_ERROR;
>+ }
>+ ctx->chan_id = reg->access_size;
Passing PCC channel ID in register access size may not be correct.
>+
>+ return AE_OK;
>+}
>+
>+static int hisi_mem_get_pcc_chan_id(struct hisi_mem_dev *hdev) {
>+ struct platform_device *pdev = hdev->pdev;
>+ struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
>+ struct hisi_mem_register_ctx ctx = {0};
>+ acpi_handle handle = adev->handle;
>+ acpi_status status;
>+
>+ if (!acpi_has_method(handle, METHOD_NAME__CRS)) {
>+ dev_err(&pdev->dev, "No _CRS method.\n");
>+ return -ENODEV;
>+ }
>+
>+ ctx.dev = &pdev->dev;
>+ status = acpi_walk_resources(handle, METHOD_NAME__CRS,
>+ hisi_mem_get_chan_id_cb, &ctx);
>+ if (ACPI_FAILURE(status))
>+ return ctx.err;
>+
Please add the following here, as hdev->chan_id is not set:
	hdev->chan_id = ctx.chan_id;
>+ return 0;
>+}
>+
>+static acpi_status hisi_mem_get_paddr_range_cb(struct acpi_resource *ares,
>+ void *context)
>+{
>+ struct hisi_mem_register_ctx *ctx = context;
>+ struct acpi_resource_address64 *addr64;
>+
>+ if (ares->type != ACPI_RESOURCE_TYPE_ADDRESS64)
>+ return AE_OK;
>+
>+ addr64 = &ares->data.address64;
>+ ctx->paddr_min = addr64->address.minimum;
>+ ctx->paddr_max = addr64->address.maximum;
>+
>+ return AE_OK;
>+}
>+
>+static int hisi_mem_get_paddr_range(struct hisi_mem_dev *hdev) {
>+ struct platform_device *pdev = hdev->pdev;
>+ struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
>+ struct hisi_mem_register_ctx ctx = {0};
>+ acpi_handle handle = adev->handle;
>+ acpi_status status;
>+
>+ if (!acpi_has_method(handle, METHOD_NAME__CRS)) {
>+ dev_err(&pdev->dev, "No _CRS method.\n");
>+ return -ENODEV;
>+ }
>+
>+ ctx.dev = &pdev->dev;
>+ status = acpi_walk_resources(handle, METHOD_NAME__CRS,
>+ hisi_mem_get_paddr_range_cb, &ctx);
>+ if (ACPI_FAILURE(status))
>+ return ctx.err;
>+
>+ hdev->paddr_min = ctx.paddr_min;
>+ hdev->paddr_max = ctx.paddr_max;
>+ return 0;
>+}
>+
>+static void hisi_mem_chan_tx_done(struct mbox_client *cl, void *msg,
>+int ret) {
>+ if (ret < 0)
>+ pr_debug("TX did not complete: CMD sent:0x%x, ret:%d\n",
>+ *(u8 *)msg, ret);
>+ else
>+ pr_debug("TX completed. CMD sent:0x%x, ret:%d\n",
>+ *(u8 *)msg, ret);
>+}
>+
>+static void hisi_mem_pcc_rx_callback(struct mbox_client *cl, void
>+*mssg) {
>+ struct hisi_mem_mbox_client_info *cl_info =
>+ container_of(cl, struct hisi_mem_mbox_client_info,
>client);
>+
>+ complete(&cl_info->done);
>+}
>+
>+static void hisi_mem_unregister_pcc_channel(struct hisi_mem_dev *hdev)
>+{
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+
>+ if (cl_info->pcc_comm_addr)
>+ iounmap(cl_info->pcc_comm_addr);
>+ pcc_mbox_free_channel(hdev->cl_info.pcc_chan);
>+}
>+
>+static int hisi_mem_register_pcc_channel(struct hisi_mem_dev *hdev) {
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct mbox_client *cl = &cl_info->client;
>+ struct pcc_mbox_chan *pcc_chan;
>+ struct platform_device *pdev = hdev->pdev;
>+ int rc;
>+
>+ cl->dev = &pdev->dev;
>+ cl->tx_block = false;
>+ cl->knows_txdone = true;
>+ cl->tx_done = hisi_mem_chan_tx_done;
>+ cl->rx_callback = hdev->verspec_data->rx_callback;
>+ init_completion(&cl_info->done);
>+
>+ pcc_chan = pcc_mbox_request_channel(cl, hdev->chan_id);
Where is hdev->chan_id being set? I think it is the same ctx.chan_id set in
hisi_mem_get_chan_id_cb()?
>+ if (IS_ERR(pcc_chan)) {
>+ dev_err(&pdev->dev, "PPC channel request failed.\n");
PPC -> PCC
>+ rc = -ENODEV;
>+ goto out;
>+ }
>+ cl_info->pcc_chan = pcc_chan;
>+ cl_info->mbox_chan = pcc_chan->mchan;
>+
>+ /*
>+ * pcc_chan->latency is just a nominal value. In reality the remote
>+ * processor could be much slower to reply. So add an arbitrary amount
>+ * of wait on top of nominal.
>+ */
>+ cl_info->deadline_us =
>+ HISI_MEM_PCC_CMD_WAIT_RETRIES_NUM * pcc_chan->latency;
>+ if (!hdev->verspec_data->has_txdone_irq &&
>+ cl_info->mbox_chan->mbox->txdone_irq) {
>+ dev_err(&pdev->dev, "PCC IRQ in PCCT is enabled.\n");
>+ rc = -EINVAL;
>+ goto err_mbx_channel_free;
>+ } else if (hdev->verspec_data->has_txdone_irq &&
>+ !cl_info->mbox_chan->mbox->txdone_irq) {
>+ dev_err(&pdev->dev, "PCC IRQ in PCCT isn't supported.\n");
>+ rc = -EINVAL;
>+ goto err_mbx_channel_free;
>+ }
>+
>+ if (pcc_chan->shmem_base_addr) {
>+ cl_info->pcc_comm_addr = ioremap(pcc_chan->shmem_base_addr,
>+ pcc_chan->shmem_size);
>+ if (!cl_info->pcc_comm_addr) {
>+ dev_err(&pdev->dev, "Failed to ioremap PCC communication region for channel-%u.\n",
>+ hdev->chan_id);
>+ rc = -ENOMEM;
>+ goto err_mbx_channel_free;
>+ }
>+ }
>+
>+ return 0;
>+
>+err_mbx_channel_free:
>+ pcc_mbox_free_channel(cl_info->pcc_chan);
>+out:
>+ return rc;
>+}
>+
>+
>+static int hisi_mem_wait_cmd_complete_by_poll(struct hisi_mem_dev
>+*hdev) {
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct acpi_pcct_shared_memory __iomem *comm_base =
>+ cl_info->pcc_comm_addr;
>+ struct platform_device *pdev = hdev->pdev;
>+ u16 status;
>+ int ret;
>+
>+ /*
>+ * Poll PCC status register every 3us(delay_us) for maximum of
>+ * deadline_us(timeout_us) until PCC command complete bit is set(cond)
>+ */
>+ ret = readw_poll_timeout(&comm_base->status, status,
>+ status & PCC_STATUS_CMD_COMPLETE,
>+ HISI_MEM_POLL_STATUS_TIME_INTERVAL_US,
>+ cl_info->deadline_us);
>+ if (unlikely(ret))
>+ dev_err(&pdev->dev, "poll PCC status failed, ret = %d.\n", ret);
>+
>+ return ret;
>+}
>+
>+static int hisi_mem_wait_cmd_complete_by_irq(struct hisi_mem_dev *hdev)
>+{
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct platform_device *pdev = hdev->pdev;
>+
>+ if (!wait_for_completion_timeout(&cl_info->done,
>+ usecs_to_jiffies(cl_info->deadline_us))) {
>+ dev_err(&pdev->dev, "PCC command executed timeout!\n");
>+ return -ETIMEDOUT;
>+ }
>+
>+ return 0;
>+}
>+
>+static inline void hisi_mem_fill_pcc_shared_mem_region(struct hisi_mem_dev
>*hdev,
>+ u8 cmd,
>+ struct hisi_mem_desc *desc,
>+ void __iomem *comm_space,
>+ u16 space_size)
>+{
>+ struct acpi_pcct_shared_memory tmp = {
>+ .signature = PCC_SIGNATURE | hdev->chan_id,
>+ .command = cmd,
>+ .status = 0,
>+ };
>+
>+ memcpy_toio(hdev->cl_info.pcc_comm_addr, (void *)&tmp,
>+ sizeof(struct acpi_pcct_shared_memory));
>+
>+ /* Copy the message to the PCC comm space */
>+ memcpy_toio(comm_space, (void *)desc, space_size); }
>+
>+static inline void hisi_mem_fill_ext_pcc_shared_mem_region(struct
>hisi_mem_dev *hdev,
>+ u8 cmd,
>+ struct hisi_mem_desc
>*desc,
>+ void __iomem
>*comm_space,
>+ u16 space_size)
>+{
>+ struct acpi_pcct_ext_pcc_shared_memory tmp = {
>+ .signature = PCC_SIGNATURE | hdev->chan_id,
>+ .flags = PCC_CMD_COMPLETION_NOTIFY,
>+ .length = HISI_MEM_PCC_SHARE_MEM_BYTES,
>+ .command = cmd,
>+ };
>+
>+ memcpy_toio(hdev->cl_info.pcc_comm_addr, (void *)&tmp,
>+ sizeof(struct acpi_pcct_ext_pcc_shared_memory));
>+
>+ /* Copy the message to the PCC comm space */
>+ memcpy_toio(comm_space, (void *)desc, space_size); }
>+
>+static const struct hisi_mem_verspecific_data hisi04b1_verspec_data = {
>+ .rx_callback = NULL,
>+ .wait_cmd_complete = hisi_mem_wait_cmd_complete_by_poll,
>+ .fill_pcc_shared_mem = hisi_mem_fill_pcc_shared_mem_region,
>+ .shared_mem_size = sizeof(struct acpi_pcct_shared_memory),
>+ .has_txdone_irq = false,
>+};
>+
>+static const struct hisi_mem_verspecific_data hisi04b2_verspec_data = {
>+ .rx_callback = hisi_mem_pcc_rx_callback,
>+ .wait_cmd_complete = hisi_mem_wait_cmd_complete_by_irq,
>+ .fill_pcc_shared_mem = hisi_mem_fill_ext_pcc_shared_mem_region,
>+ .shared_mem_size = sizeof(struct acpi_pcct_ext_pcc_shared_memory),
>+ .has_txdone_irq = true,
>+};
>+
>+static int hisi_mem_pcc_cmd_send(struct hisi_mem_dev *hdev, u8 cmd,
>+ struct hisi_mem_desc *desc)
>+{
>+ const struct hisi_mem_verspecific_data *verspec_data = hdev->verspec_data;
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct platform_device *pdev = hdev->pdev;
>+ void __iomem *comm_space;
>+ u16 space_size;
>+ int ret;
>+
>+ comm_space = cl_info->pcc_comm_addr + verspec_data->shared_mem_size;
>+ space_size = HISI_MEM_PCC_SHARE_MEM_BYTES - verspec_data->shared_mem_size;
>+ verspec_data->fill_pcc_shared_mem(hdev, cmd, desc,
>+ comm_space, space_size);
>+ if (verspec_data->has_txdone_irq)
>+ reinit_completion(&cl_info->done);
>+
>+ /* Ring doorbell */
>+ ret = mbox_send_message(cl_info->mbox_chan, &cmd);
>+ if (ret < 0) {
>+ dev_err(&pdev->dev, "Send PCC mbox message failed, ret =
>%d.\n",
>+ ret);
>+ goto end;
>+ }
>+
>+ ret = verspec_data->wait_cmd_complete(hdev);
>+ if (ret)
>+ goto end;
>+
>+ /* Copy response data */
>+ memcpy_fromio((void *)desc, comm_space, space_size);
>+
>+end:
>+ if (verspec_data->has_txdone_irq)
>+ mbox_chan_txdone(cl_info->mbox_chan, ret);
>+ else
>+ mbox_client_txdone(cl_info->mbox_chan, ret);
>+ return ret;
>+}
>+
>+int hisi_mem_do_acls_query(struct hisi_mem_dev *hdev, int *status, u64
>+paddr) {
>+ struct platform_device *pdev = hdev->pdev;
>+ struct hisi_mem_desc desc;
>+ int ret;
>+
>+ memset(&desc, 0, sizeof(desc));
>+ desc.req.paddr = paddr;
>+ desc.req.subcmd = HISI_MEM_RAS_QUERY_CAP;
>+ ret = hisi_mem_pcc_cmd_send(hdev, HISI_MEM_RAS_ACLS, &desc);
>+ if (ret) {
>+ dev_err(&pdev->dev, "do acls query failed, ret = %d.\n", ret);
>+ return ret;
>+ }
>+
>+ *status = desc.rsp.retStatus;
>+ return 0;
>+}
>+
>+static ssize_t acls_query_store(struct kobject *kobj,
>+ struct kobj_attribute *attr,
>+ const char *buf, size_t count)
>+{
>+ struct hisi_mem_dev *hdev, *tmp;
>+ int status, ret;
>+ u64 paddr;
>+
>+ if (kstrtoull(buf, 16, &paddr))
>+ return -EINVAL;
>+
>+ list_for_each_entry_safe(hdev, tmp, &mras->list, list) {
>+ struct platform_device *pdev = hdev->pdev;
>+
>+ if (paddr < hdev->paddr_min || paddr > hdev->paddr_max)
>+ continue;
>+
>+ ret = hisi_mem_do_acls_query(hdev, &status, paddr);
>+ if (ret)
>+ return ret;
>+
>+ if (status == HISI_MEM_ACLS_OK) {
>+ return count;
>+ } else if (status == HISI_MEM_ACLS_QUERY_NO_RES) {
>+ dev_info(&pdev->dev, "acls query no resource left.\n");
>+ return -ENOSPC;
>+ }
>+
>+ dev_err(&pdev->dev, "acls query error code %d.\n", status);
>+ return -EIO;
>+ }
>+
>+ return -ENODEV;
>+}
>+
>+static struct kobj_attribute acls_query_store_attribute =
>+ __ATTR(acls_query, 0200, NULL, acls_query_store);
>+
>+int hisi_mem_do_acls_repair(struct hisi_mem_dev *hdev, int *status, u64
>+paddr) {
>+ struct hisi_mem_desc desc;
>+ struct platform_device *pdev;
>+ int ret;
>+
>+ memset(&desc, 0, sizeof(desc));
>+ desc.req.paddr = paddr;
>+ desc.req.subcmd = HISI_MEM_RAS_DO_REPAIR;
>+ ret = hisi_mem_pcc_cmd_send(hdev, HISI_MEM_RAS_ACLS, &desc);
>+ if (ret) {
>+ dev_err(&pdev->dev, "do acls repair failed, ret = %d.\n", ret);
>+ return ret;
>+ }
>+
>+ *status = desc.rsp.retStatus;
>+ return 0;
>+}
>+
>+static ssize_t acls_repair_store(struct kobject *kobj,
>+ struct kobj_attribute *attr,
>+ const char *buf, size_t count)
>+{
>+ struct hisi_mem_dev *hdev, *tmp;
>+ u64 paddr, ret;
>+ int status;
>+
>+ if (kstrtoull(buf, 16, &paddr))
>+ return -EINVAL;
>+
>+ list_for_each_entry_safe(hdev, tmp, &mras->list, list) {
>+ struct platform_device *pdev = hdev->pdev;
>+
>+ if (paddr < hdev->paddr_min || paddr > hdev->paddr_max)
>+ continue;
>+
>+ ret = hisi_mem_do_acls_repair(hdev, &status, paddr);
>+ if (ret)
>+ return ret;
>+
>+ if (status == HISI_MEM_ACLS_OK)
>+ return count;
>+
>+ dev_err(&pdev->dev, "acls repair error code %d.\n", status);
>+ return -EIO;
>+ }
>+
>+ return -ENODEV;
>+}
>+static struct kobj_attribute acls_repair_store_attribute =
>+ __ATTR(acls_repair, 0200, NULL, acls_repair_store);
>+
>+static struct attribute *acls_attrs[] = {
>+ &acls_query_store_attribute.attr,
>+ &acls_repair_store_attribute.attr,
>+ NULL,
>+};
>+
>+static struct attribute_group acls_attr_group = {
>+ .attrs = acls_attrs,
>+};
>+
>+static int hisi_mem_ras_probe(struct platform_device *pdev) {
>+ struct hisi_mem_dev *hdev;
>+ int ret = -ENOMEM;
>+
>+ hdev = kzalloc(sizeof(struct hisi_mem_dev), GFP_KERNEL);
>+ if (!hdev)
>+ return -ENOMEM;
>+
>+ hdev->pdev = pdev;
>+ ret = hisi_mem_get_pcc_chan_id(hdev);
>+ if (ret)
>+ goto free_hdev;
>+
>+ ret = hisi_mem_register_pcc_channel(hdev);
>+ if (ret)
>+ goto free_hdev;
>+
>+ platform_set_drvdata(pdev, hdev);
>+ list_add_tail(&hdev->list, &mras->list);
>+ hdev->verspec_data = &hisi04b2_verspec_data;
1. hdev->verspec_data is being accessed in the above hisi_mem_register_pcc_channel() function.
Need to set hdev->verspec_data before calling hisi_mem_register_pcc_channel()?
2. Both hisi04b1_verspec_data and hisi04b2_verspec_data are defined, seems hisi04b1_verspec_data not used.
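A minimal, self-contained sketch of the reordering suggested in comment 1 (the struct and function names below are simplified stand-ins for the driver's types, not the real kernel API): assign the version-specific data pointer before calling anything that dereferences it.

```c
#include <assert.h>

/* Illustrative stand-ins for the driver structures. */
struct verspec_data { int has_txdone_irq; };

struct mem_dev {
    const struct verspec_data *verspec_data;
    int registered;
};

static const struct verspec_data v2_data = { .has_txdone_irq = 1 };

/* Mirrors hisi_mem_register_pcc_channel(): it reads verspec_data,
 * so the pointer must already be valid when this runs. */
static int register_pcc_channel(struct mem_dev *hdev)
{
    if (!hdev->verspec_data)
        return -1;  /* would be a NULL dereference in the real driver */
    hdev->registered = hdev->verspec_data->has_txdone_irq;
    return 0;
}

/* Fixed probe ordering: set verspec_data first, then register. */
static int probe_fixed(struct mem_dev *hdev)
{
    hdev->verspec_data = &v2_data;
    return register_pcc_channel(hdev);
}
```

With the assignment hoisted above the registration call, the callback and IRQ-mode fields are populated when the PCC channel is set up.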
>+ return 0;
>+
>+free_hdev:
>+ kfree(hdev);
>+ return ret;
>+}
>+
>+static int hisi_mem_ras_remove(struct platform_device *pdev)
>+{
>+ struct hisi_mem_dev *hdev = platform_get_drvdata(pdev);
>+
>+ platform_set_drvdata(pdev, NULL);
>+ list_del(&hdev->list);
>+
>+ return 0;
>+}
>+
>+static const struct acpi_device_id hisi_mem_acpi_match[] = {
>+ { "HISI0FF1", 0},
>+ { },
>+};
>+MODULE_DEVICE_TABLE(acpi, hisi_mem_acpi_match);
>+
>+static struct platform_driver hisi_mem_ras_driver = {
>+ .probe = hisi_mem_ras_probe,
>+ .remove = hisi_mem_ras_remove,
>+ .driver = {
>+ .name = "hisi_mem_ras",
>+ .acpi_match_table = hisi_mem_acpi_match,
>+ },
>+};
>+
>+static int __init hisi_mem_ras_init(void)
>+{
>+ int ret;
>+
>+ mras = kzalloc(sizeof(struct hisi_mem_ras), GFP_KERNEL);
>+ if (!mras)
>+ return -ENOMEM;
>+
>+ mras->acls_kobj = kobject_create_and_add("acls", kernel_kobj);
>+ if (!mras->acls_kobj) {
>+ kfree(mras);
>+ return -ENOMEM;
>+ }
>+
>+ ret = sysfs_create_group(mras->acls_kobj, &acls_attr_group);
>+ if (ret) {
>+ mras->acls_kobj = NULL;
This should be done after below kobject_put.
>+ kobject_put(mras->acls_kobj);
>+ return ret;
>+ }
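The cleanup-ordering point made above can be sketched as a self-contained toy model (kobj_create()/kobj_put() here are illustrative stand-ins, not the kernel's kobject API): drop the reference while the pointer is still valid, and only then clear it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object: put() needs a valid pointer, so clearing the
 * pointer before put() would silently leak the object. */
struct kobj { int refs; };

static int puts_done;

static struct kobj *kobj_create(void)
{
    struct kobj *k = malloc(sizeof(*k));
    if (k)
        k->refs = 1;
    return k;
}

static void kobj_put(struct kobj *k)
{
    if (!k)
        return;  /* put on NULL is a no-op, i.e. the leak case */
    if (--k->refs == 0) {
        free(k);
        puts_done++;
    }
}

/* Fixed error path: put first, clear the pointer second. */
static int init_error_path(struct kobj **slot)
{
    *slot = kobj_create();
    if (!*slot)
        return -1;
    /* ... imagine sysfs_create_group() failing here ... */
    kobj_put(*slot);
    *slot = NULL;
    return -2;  /* propagate the creation error */
}
```

The same shape applies to the driver's init path: `kobject_put(mras->acls_kobj)` must run before `mras->acls_kobj = NULL`.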
>+
>+ ret = platform_driver_register(&hisi_mem_ras_driver);
>+ if (ret) {
>+ sysfs_remove_group(mras->acls_kobj, &acls_attr_group);
>+ kobject_put(mras->acls_kobj);
>+ kfree(mras);
>+ return ret;
>+ }
>+
>+ INIT_LIST_HEAD(&mras->list);
>+
>+ return 0;
>+}
>+module_init(hisi_mem_ras_init);
>+
>+static void __exit hisi_mem_ras_exit(void)
>+{
>+ platform_driver_unregister(&hisi_mem_ras_driver);
>+ sysfs_remove_group(mras->acls_kobj, &acls_attr_group);
>+ kobject_put(mras->acls_kobj);
>+ kfree(mras);
>+}
>+module_exit(hisi_mem_ras_exit);
>+
>+MODULE_LICENSE("GPL v2");
>+MODULE_AUTHOR("Xiaofei Tan <tanxiaofei(a)huawei.com>");
>+MODULE_DESCRIPTION("HISILICON Memory RAS driver");
>+MODULE_ALIAS("platform:" DRV_NAME);
>diff --git a/drivers/soc/hisilicon/hisi_mem_ras.h
>b/drivers/soc/hisilicon/hisi_mem_ras.h
>new file mode 100644
>index 000000000000..a9acbb0c1012
>--- /dev/null
>+++ b/drivers/soc/hisilicon/hisi_mem_ras.h
>@@ -0,0 +1,84 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+/*
>+ * Copyright (C) Huawei Technologies Co., Ltd. 2023. All rights reserved.
>+ */
>+
>+#ifndef _HISI_MEM_RAS_H
>+#define _HISI_MEM_RAS_H
>+
>+struct hisi_mem_ras {
>+ struct list_head list;
>+ struct kobject *acls_kobj;
>+};
>+
>+struct hisi_mem_mbox_client_info {
>+ struct mbox_client client;
>+ struct mbox_chan *mbox_chan;
>+ struct pcc_mbox_chan *pcc_chan;
>+ u64 deadline_us;
>+ void __iomem *pcc_comm_addr;
>+ struct completion done;
>+};
>+
>+struct hisi_mem_desc;
>+struct hisi_mem_dev;
>+
>+struct hisi_mem_verspecific_data {
>+ void (*rx_callback)(struct mbox_client *cl, void *mssg);
>+ int (*wait_cmd_complete)(struct hisi_mem_dev *hdev);
>+ void (*fill_pcc_shared_mem)(struct hisi_mem_dev *hdev,
>+ u8 cmd, struct hisi_mem_desc *desc,
>+ void __iomem *comm_space,
>+ u16 space_size);
>+ u16 shared_mem_size;
>+ bool has_txdone_irq;
>+};
>+
>+struct hisi_mem_dev {
>+ struct list_head list;
>+ struct platform_device *pdev;
>+ //struct acpi_device *acpi_dev;
>+ const struct hisi_mem_verspecific_data *verspec_data;
>+ /* device capabilities from firmware, like hisi_mem_CAPS_xxx. */
>+ u64 caps;
>+ u8 chan_id;
>+ u64 paddr_min;
>+ u64 paddr_max;
>+ struct mutex lock;
>+ struct hisi_mem_mbox_client_info cl_info;
>+};
>+
>+#define HISI_MEM_PCC_SHARE_MEM_BYTES 128
>+enum hisi_mem_ras_cmd_type {
>+ HISI_MEM_RAS_ACLS = 0x4,
>+ HISI_MEM_RAS_PPR = 0X5,
>+};
>+
>+enum hisi_mem_ras_subcmd_type {
>+ HISI_MEM_RAS_QUERY_CAP = 0x0,
>+ HISI_MEM_RAS_DO_REPAIR = 0x1,
>+};
>+
>+struct hisi_mem_req_desc {
>+ u64 paddr;
>+ u8 subcmd;
>+ u8 rsv[3];
>+};
>+
>+#define HISI_MEM_ACLS_OK 0
>+#define HISI_MEM_ACLS_QUERY_NO_RES 1
>+#define HISI_MEM_ACLS_REPAIR_FAIL 2
>+#define HISI_MEM_ACLS_EINVAL 3
>+struct hisi_mem_rsp_desc {
>+ u8 retStatus; /* 0: success, other: failure */
>+ u8 rsv[7];
>+};
>+
>+struct hisi_mem_desc {
>+ union {
>+ struct hisi_mem_req_desc req;
>+ struct hisi_mem_rsp_desc rsp;
>+ };
>+};
>+
>+#endif
>--
>2.33.0
Thanks,
Shiju
10 Jul '24
Hi Tanxiaofei,
Please find few comments inline after quick look.
I will go through in more detail bit later.
Thanks,
Shiju
>-----Original Message-----
>From: tanxiaofei <tanxiaofei(a)huawei.com>
>Sent: 09 July 2024 10:34
>To: Shiju Jose <shiju.jose(a)huawei.com>; Jonathan Cameron
><jonathan.cameron(a)huawei.com>; Mauro Carvalho Chehab
><M.Chehab(a)huawei.com>; Roberto Sassu <roberto.sassu(a)huawei.com>;
>Guohanjun (Hanjun Guo) <guohanjun(a)huawei.com>
>Cc: linuxarm(a)openeuler.org; tanxiaofei <tanxiaofei(a)huawei.com>
>Subject: [PATCH] soc: hisilicon: Support memory repair driver on Kunpeng SoC
>
>Kunpeng SoC supports several memory repair capabilities. Normally, such
>repairs are done in the firmware startup stage, or are triggered by the
>BMC (Baseboard Management Controller) and done in firmware at runtime.
>The OS cannot perceive these capabilities.
>
>Some repairs can be done online, while others can't. One reason for not
>supporting online repair is that the memory is in use and the hardware does
>not support backpressure on the primary access channel.
>
>To support more online memory repair, we seek a solution from the OS: try to
>isolate the memory page first, do the repair, and de-isolate the page after
>the repair is done.
>
>This patch supports online ACLS (Adapter Cache Line Sparing) repair, and may
>add other repairs such as PPR in the future.
>
>Signed-off-by: Xiaofei Tan <tanxiaofei(a)huawei.com>
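The isolate/repair/de-isolate flow described in the commit message, reduced to an abstract sketch (every function name here is a placeholder, not a kernel API — the real driver would use page-isolation machinery and the PCC mailbox command for the repair itself):

```c
#include <assert.h>

enum page_state { PAGE_ONLINE, PAGE_ISOLATED };

static enum page_state state = PAGE_ONLINE;
static int repairs_done;

/* Placeholder steps standing in for page isolation and the ACLS command. */
static void isolate(void)        { state = PAGE_ISOLATED; }
static void deisolate(void)      { state = PAGE_ONLINE; }
static int  do_acls_repair(void) { repairs_done++; return 0; }

/* Online repair: only touch the hardware while the page is out of use. */
static int online_repair(void)
{
    int ret;

    isolate();
    ret = do_acls_repair();
    deisolate();
    return ret;
}
```

The point of the ordering is that the repair command never races with normal accesses to the page being repaired.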
>---
> drivers/soc/hisilicon/Kconfig | 10 +
> drivers/soc/hisilicon/Makefile | 1 +
> drivers/soc/hisilicon/hisi_mem_ras.c | 553 +++++++++++++++++++++++++++
>drivers/soc/hisilicon/hisi_mem_ras.h | 84 ++++
> 4 files changed, 648 insertions(+)
> create mode 100644 drivers/soc/hisilicon/hisi_mem_ras.c
> create mode 100644 drivers/soc/hisilicon/hisi_mem_ras.h
>
>diff --git a/drivers/soc/hisilicon/Kconfig b/drivers/soc/hisilicon/Kconfig index
>ae8e55a776e0..983cafc38e5c 100644
>--- a/drivers/soc/hisilicon/Kconfig
>+++ b/drivers/soc/hisilicon/Kconfig
>@@ -44,4 +44,14 @@ config HISI_HBMCACHE
> To compile the driver as a module, choose M here:
> the module will be called hisi_hbmcache.
>
>+config HISI_MEM_RAS
>+ tristate "Add support for HISI Memory repair"
>+ depends on ACPI
>+ help
>+ Add memory repair support for HiSilicon devices, which can be used to
>+ query the memory repair capability and repair hardware errors in memory
>+ devices. This feature needs to work with the hardware firmware.
>+
>+ If unsure, say no.
>+
> endmenu
>diff --git a/drivers/soc/hisilicon/Makefile b/drivers/soc/hisilicon/Makefile index
>4ef8823250e8..b339a67bcd77 100644
>--- a/drivers/soc/hisilicon/Makefile
>+++ b/drivers/soc/hisilicon/Makefile
>@@ -6,3 +6,4 @@ obj-$(CONFIG_KUNPENG_HCCS) += kunpeng_hccs.o
> obj-$(CONFIG_HISI_HBMDEV) += hisi_hbmdev.o
> obj-$(CONFIG_HISI_HBMCACHE) += hisi_hbmcache.o
> obj-$(CONFIG_ARM64_PBHA) += pbha.o
>+obj-$(CONFIG_HISI_MEM_RAS) += hisi_mem_ras.o
>diff --git a/drivers/soc/hisilicon/hisi_mem_ras.c
>b/drivers/soc/hisilicon/hisi_mem_ras.c
>new file mode 100644
>index 000000000000..a29c41ccdba1
>--- /dev/null
>+++ b/drivers/soc/hisilicon/hisi_mem_ras.c
>@@ -0,0 +1,553 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/*
>+ * Copyright (C) Huawei Technologies Co., Ltd. 2024. All rights reserved.
>+ */
>+
>+#include <linux/kobject.h>
>+#include <linux/module.h>
>+#include <linux/acpi.h>
>+#include <linux/iopoll.h>
>+#include <linux/platform_device.h>
>+#include <linux/mm.h>
>+#include <acpi/pcc.h>
>+
>+#include "hisi_mem_ras.h"
>+#define DRV_NAME "hisi_mem_ras"
>+
>+#define HISI_MEM_PCC_CMD_WAIT_RETRIES_NUM 500ULL
>+#define HISI_MEM_POLL_STATUS_TIME_INTERVAL_US 3
>+
>+static struct hisi_mem_ras *mras;
>+
>+struct hisi_mem_register_ctx {
>+ struct device *dev;
>+ u8 chan_id;
>+ int err;
>+ u64 paddr_min;
>+ u64 paddr_max;
>+};
>+
>+static acpi_status hisi_mem_get_chan_id_cb(struct acpi_resource *ares,
>+ void *context)
>+{
>+ struct acpi_resource_generic_register *reg;
>+ struct hisi_mem_register_ctx *ctx = context;
>+
>+ if (ares->type != ACPI_RESOURCE_TYPE_GENERIC_REGISTER)
>+ return AE_OK;
>+
>+ reg = &ares->data.generic_reg;
>+ if (reg->space_id != ACPI_ADR_SPACE_PLATFORM_COMM) {
>+ dev_err(ctx->dev, "Bad register resource.\n");
>+ ctx->err = -EINVAL;
>+ return AE_ERROR;
>+ }
>+ ctx->chan_id = reg->access_size;
>+
>+ return AE_OK;
>+}
>+
>+static int hisi_mem_get_pcc_chan_id(struct hisi_mem_dev *hdev)
>+{
>+ struct platform_device *pdev = hdev->pdev;
>+ struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
>+ struct hisi_mem_register_ctx ctx = {0};
>+ acpi_handle handle = adev->handle;
>+ acpi_status status;
>+
>+ if (!acpi_has_method(handle, METHOD_NAME__CRS)) {
>+ dev_err(&pdev->dev, "No _CRS method.\n");
>+ return -ENODEV;
>+ }
>+
>+ ctx.dev = &pdev->dev;
>+ status = acpi_walk_resources(handle, METHOD_NAME__CRS,
>+ hisi_mem_get_chan_id_cb, &ctx);
>+ if (ACPI_FAILURE(status))
>+ return ctx.err;
>+
>+ return 0;
>+}
>+
>+static acpi_status hisi_mem_get_paddr_range_cb(struct acpi_resource *ares,
>+ void *context)
>+{
>+ struct hisi_mem_register_ctx *ctx = context;
>+ struct acpi_resource_address64 *addr64;
>+
>+ if (ares->type != ACPI_RESOURCE_TYPE_ADDRESS64)
>+ return AE_OK;
>+
>+ addr64 = &ares->data.address64;
>+ ctx->paddr_min = addr64->address.minimum;
>+ ctx->paddr_max = addr64->address.maximum;
>+
>+ return AE_OK;
>+}
>+
>+static int hisi_mem_get_paddr_range(struct hisi_mem_dev *hdev)
>+{
>+ struct platform_device *pdev = hdev->pdev;
>+ struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
>+ struct hisi_mem_register_ctx ctx = {0};
>+ acpi_handle handle = adev->handle;
>+ acpi_status status;
>+
>+ if (!acpi_has_method(handle, METHOD_NAME__CRS)) {
>+ dev_err(&pdev->dev, "No _CRS method.\n");
>+ return -ENODEV;
>+ }
>+
>+ ctx.dev = &pdev->dev;
>+ status = acpi_walk_resources(handle, METHOD_NAME__CRS,
>+ hisi_mem_get_paddr_range_cb, &ctx);
>+ if (ACPI_FAILURE(status))
>+ return ctx.err;
>+
>+ hdev->paddr_min = ctx.paddr_min;
>+ hdev->paddr_max = ctx.paddr_max;
>+ return 0;
>+}
>+
>+static void hisi_mem_chan_tx_done(struct mbox_client *cl, void *msg, int ret)
>+{
>+ if (ret < 0)
>+ pr_debug("TX did not complete: CMD sent:0x%x, ret:%d\n",
>+ *(u8 *)msg, ret);
>+ else
>+ pr_debug("TX completed. CMD sent:0x%x, ret:%d\n",
>+ *(u8 *)msg, ret);
>+}
>+
>+static void hisi_mem_pcc_rx_callback(struct mbox_client *cl, void *mssg)
>+{
>+ struct hisi_mem_mbox_client_info *cl_info =
>+ container_of(cl, struct hisi_mem_mbox_client_info, client);
>+
>+ complete(&cl_info->done);
>+}
>+
>+static void hisi_mem_unregister_pcc_channel(struct hisi_mem_dev *hdev)
This function is not used.
>+{
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+
>+ if (cl_info->pcc_comm_addr)
>+ iounmap(cl_info->pcc_comm_addr);
>+ pcc_mbox_free_channel(hdev->cl_info.pcc_chan);
>+}
>+
>+static int hisi_mem_register_pcc_channel(struct hisi_mem_dev *hdev)
>+{
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct mbox_client *cl = &cl_info->client;
>+ struct pcc_mbox_chan *pcc_chan;
>+ struct platform_device *pdev = hdev->pdev;
>+ int rc;
>+
>+ cl->dev = &pdev->dev;
>+ cl->tx_block = false;
>+ cl->knows_txdone = true;
>+ cl->tx_done = hisi_mem_chan_tx_done;
>+ cl->rx_callback = hdev->verspec_data->rx_callback;
>+ init_completion(&cl_info->done);
>+
>+ pcc_chan = pcc_mbox_request_channel(cl, hdev->chan_id);
>+ if (IS_ERR(pcc_chan)) {
>+ dev_err(&pdev->dev, "PCC channel request failed.\n");
>+ rc = -ENODEV;
>+ goto out;
>+ }
>+ cl_info->pcc_chan = pcc_chan;
>+ cl_info->mbox_chan = pcc_chan->mchan;
>+
>+ /*
>+ * pcc_chan->latency is just a nominal value. In reality the remote
>+ * processor could be much slower to reply. So add an arbitrary amount
>+ * of wait on top of nominal.
>+ */
>+ cl_info->deadline_us =
>+ HISI_MEM_PCC_CMD_WAIT_RETRIES_NUM * pcc_chan->latency;
>+ if (!hdev->verspec_data->has_txdone_irq &&
>+ cl_info->mbox_chan->mbox->txdone_irq) {
>+ dev_err(&pdev->dev, "PCC IRQ in PCCT is enabled.\n");
>+ rc = -EINVAL;
>+ goto err_mbx_channel_free;
>+ } else if (hdev->verspec_data->has_txdone_irq &&
>+ !cl_info->mbox_chan->mbox->txdone_irq) {
>+ dev_err(&pdev->dev, "PCC IRQ in PCCT isn't supported.\n");
>+ rc = -EINVAL;
>+ goto err_mbx_channel_free;
>+ }
>+
>+ if (pcc_chan->shmem_base_addr) {
>+ cl_info->pcc_comm_addr = ioremap(pcc_chan->shmem_base_addr,
>+ pcc_chan->shmem_size);
>+ if (!cl_info->pcc_comm_addr) {
>+ dev_err(&pdev->dev, "Failed to ioremap PCC communication region for channel-%u.\n",
>+ hdev->chan_id);
>+ rc = -ENOMEM;
>+ goto err_mbx_channel_free;
>+ }
>+ }
>+
>+ return 0;
>+
>+err_mbx_channel_free:
>+ pcc_mbox_free_channel(cl_info->pcc_chan);
>+out:
>+ return rc;
>+}
>+
>+
>+static int hisi_mem_wait_cmd_complete_by_poll(struct hisi_mem_dev *hdev)
>+{
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct acpi_pcct_shared_memory __iomem *comm_base = cl_info->pcc_comm_addr;
>+ struct platform_device *pdev = hdev->pdev;
>+ u16 status;
>+ int ret;
>+
>+ /*
>+ * Poll PCC status register every 3us(delay_us) for maximum of
>+ * deadline_us(timeout_us) until PCC command complete bit is set(cond)
>+ */
>+ ret = readw_poll_timeout(&comm_base->status, status,
>+ status & PCC_STATUS_CMD_COMPLETE,
>+ HISI_MEM_POLL_STATUS_TIME_INTERVAL_US,
>+ cl_info->deadline_us);
>+ if (unlikely(ret))
>+ dev_err(&pdev->dev, "poll PCC status failed, ret = %d.\n", ret);
>+
>+ return ret;
>+}
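The poll-until-complete logic above can be modelled outside the kernel like this (time is simulated by counting 3 µs poll intervals instead of calling udelay(), and -110 stands in for -ETIMEDOUT):

```c
#include <assert.h>
#include <stdint.h>

#define CMD_COMPLETE 0x1  /* stand-in for PCC_STATUS_CMD_COMPLETE */

/* Simplified model of readw_poll_timeout(): poll a status word every
 * interval_us until the completion bit is set or the deadline passes. */
static int poll_status(const volatile uint16_t *status,
                       uint32_t interval_us, uint32_t deadline_us)
{
    uint32_t waited = 0;

    for (;;) {
        if (*status & CMD_COMPLETE)
            return 0;
        if (waited >= deadline_us)
            return -110;        /* -ETIMEDOUT */
        waited += interval_us;  /* stand-in for udelay(interval_us) */
    }
}
```

In the driver, `interval_us` is `HISI_MEM_POLL_STATUS_TIME_INTERVAL_US` (3) and `deadline_us` is the retry count multiplied by the channel's nominal latency.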
>+
>+static int hisi_mem_wait_cmd_complete_by_irq(struct hisi_mem_dev *hdev)
>+{
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct platform_device *pdev = hdev->pdev;
>+
>+ if (!wait_for_completion_timeout(&cl_info->done,
>+ usecs_to_jiffies(cl_info->deadline_us))) {
>+ dev_err(&pdev->dev, "PCC command executed timeout!\n");
>+ return -ETIMEDOUT;
>+ }
>+
>+ return 0;
>+}
>+
>+static inline void hisi_mem_fill_pcc_shared_mem_region(struct hisi_mem_dev *hdev,
>+ u8 cmd,
>+ struct hisi_mem_desc *desc,
>+ void __iomem *comm_space,
>+ u16 space_size)
>+{
>+ struct acpi_pcct_shared_memory tmp = {
>+ .signature = PCC_SIGNATURE | hdev->chan_id,
>+ .command = cmd,
>+ .status = 0,
>+ };
>+
>+ memcpy_toio(hdev->cl_info.pcc_comm_addr, (void *)&tmp,
>+ sizeof(struct acpi_pcct_shared_memory));
>+
>+ /* Copy the message to the PCC comm space */
>+ memcpy_toio(comm_space, (void *)desc, space_size);
>+}
>+
>+static inline void hisi_mem_fill_ext_pcc_shared_mem_region(struct hisi_mem_dev *hdev,
>+ u8 cmd,
>+ struct hisi_mem_desc *desc,
>+ void __iomem *comm_space,
>+ u16 space_size)
>+{
>+ struct acpi_pcct_ext_pcc_shared_memory tmp = {
>+ .signature = PCC_SIGNATURE | hdev->chan_id,
>+ .flags = PCC_CMD_COMPLETION_NOTIFY,
>+ .length = HISI_MEM_PCC_SHARE_MEM_BYTES,
>+ .command = cmd,
>+ };
>+
>+ memcpy_toio(hdev->cl_info.pcc_comm_addr, (void *)&tmp,
>+ sizeof(struct acpi_pcct_ext_pcc_shared_memory));
>+
>+ /* Copy the message to the PCC comm space */
>+ memcpy_toio(comm_space, (void *)desc, space_size);
>+}
>+
>+static const struct hisi_mem_verspecific_data hisi04b1_verspec_data = {
>+ .rx_callback = NULL,
>+ .wait_cmd_complete = hisi_mem_wait_cmd_complete_by_poll,
>+ .fill_pcc_shared_mem = hisi_mem_fill_pcc_shared_mem_region,
>+ .shared_mem_size = sizeof(struct acpi_pcct_shared_memory),
>+ .has_txdone_irq = false,
>+};
>+
>+static const struct hisi_mem_verspecific_data hisi04b2_verspec_data = {
>+ .rx_callback = hisi_mem_pcc_rx_callback,
>+ .wait_cmd_complete = hisi_mem_wait_cmd_complete_by_irq,
>+ .fill_pcc_shared_mem = hisi_mem_fill_ext_pcc_shared_mem_region,
>+ .shared_mem_size = sizeof(struct acpi_pcct_ext_pcc_shared_memory),
>+ .has_txdone_irq = true,
>+};
>+
>+static int hisi_mem_pcc_cmd_send(struct hisi_mem_dev *hdev, u8 cmd,
>+ struct hisi_mem_desc *desc)
>+{
>+ const struct hisi_mem_verspecific_data *verspec_data = hdev->verspec_data;
>+ struct hisi_mem_mbox_client_info *cl_info = &hdev->cl_info;
>+ struct platform_device *pdev = hdev->pdev;
>+ void __iomem *comm_space;
>+ u16 space_size;
>+ int ret;
>+
>+ comm_space = cl_info->pcc_comm_addr + verspec_data->shared_mem_size;
>+ space_size = HISI_MEM_PCC_SHARE_MEM_BYTES - verspec_data->shared_mem_size;
>+ verspec_data->fill_pcc_shared_mem(hdev, cmd, desc,
>+ comm_space, space_size);
>+ if (verspec_data->has_txdone_irq)
>+ reinit_completion(&cl_info->done);
>+
>+ /* Ring doorbell */
>+ ret = mbox_send_message(cl_info->mbox_chan, &cmd);
>+ if (ret < 0) {
>+ dev_err(&pdev->dev, "Send PCC mbox message failed, ret = %d.\n",
>+ ret);
>+ goto end;
>+ }
>+
>+ ret = verspec_data->wait_cmd_complete(hdev);
>+ if (ret)
>+ goto end;
>+
>+ /* Copy response data */
>+ memcpy_fromio((void *)desc, comm_space, space_size);
>+
>+end:
>+ if (verspec_data->has_txdone_irq)
>+ mbox_chan_txdone(cl_info->mbox_chan, ret);
>+ else
>+ mbox_client_txdone(cl_info->mbox_chan, ret);
>+ return ret;
>+}
>+
>+int hisi_mem_do_acls_query(struct hisi_mem_dev *hdev, int *status, u64 paddr)
>+{
static function?
>+ struct platform_device *pdev = hdev->pdev;
>+ struct hisi_mem_desc desc;
>+ int ret;
>+
>+ memset(&desc, 0, sizeof(desc));
>+ desc.req.paddr = paddr;
>+ desc.req.subcmd = HISI_MEM_RAS_QUERY_CAP;
>+ ret = hisi_mem_pcc_cmd_send(hdev, HISI_MEM_RAS_ACLS, &desc);
>+ if (ret) {
>+ dev_err(&pdev->dev, "do acls query failed, ret = %d.\n", ret);
>+ return ret;
>+ }
>+
>+ *status = desc.rsp.retStatus;
>+ return 0;
>+}
>+
>+static ssize_t acls_query_store(struct kobject *kobj,
>+ struct kobj_attribute *attr,
>+ const char *buf, size_t count)
>+{
>+ struct hisi_mem_dev *hdev, *tmp;
>+ int status, ret;
>+ u64 paddr;
>+
>+ if (kstrtoull(buf, 16, &paddr))
>+ return -EINVAL;
>+
>+ list_for_each_entry_safe(hdev, tmp, &mras->list, list) {
>+ struct platform_device *pdev = hdev->pdev;
>+
>+ if (paddr < hdev->paddr_min || paddr > hdev->paddr_max)
>+ continue;
>+
>+ ret = hisi_mem_do_acls_query(hdev, &status, paddr);
>+ if (ret)
>+ return ret;
>+
>+ if (status == HISI_MEM_ACLS_OK) {
>+ return count;
>+ } else if (status == HISI_MEM_ACLS_QUERY_NO_RES) {
>+ dev_info(&pdev->dev, "acls query no resource left.\n");
>+ return -ENOSPC;
>+ }
>+
>+ dev_err(&pdev->dev, "acls query error code %d.\n", status);
>+ return -EIO;
>+ }
>+
>+ return -ENODEV;
>+}
>+
>+static struct kobj_attribute acls_query_store_attribute =
>+ __ATTR(acls_query, 0200, NULL, acls_query_store);
0200 for write?
0444 if query is read only.
May be we can use DEVICE_ATTR_RO/ DEVICE_ATTR_RW/ DEVICE_ATTR_WO instead?
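For reference, the hex parsing done by the `kstrtoull(buf, 16, &paddr)` calls in these store callbacks behaves roughly like this userspace approximation (strtoull-based, so edge-case handling may differ slightly from the kernel helper):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace approximation of kstrtoull(buf, 16, &res): the whole string
 * (minus one optional trailing newline, as sysfs writes usually carry)
 * must be a valid hexadecimal number, with or without a "0x" prefix. */
static int parse_hex_u64(const char *buf, uint64_t *res)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(buf, &end, 16);
    if (errno || end == buf)
        return -EINVAL;
    if (*end == '\n')
        end++;            /* tolerate "echo addr > file" */
    if (*end != '\0')
        return -EINVAL;   /* trailing garbage rejected */
    *res = v;
    return 0;
}
```

A user would then drive the attribute with something like `echo 0x2080000000 > /sys/kernel/acls/acls_query`.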
>+
>+int hisi_mem_do_acls_repair(struct hisi_mem_dev *hdev, int *status, u64 paddr)
>+{
static function?
>+ struct hisi_mem_desc desc;
>+ struct platform_device *pdev;
>+ int ret;
>+
>+ memset(&desc, 0, sizeof(desc));
>+ desc.req.paddr = paddr;
>+ desc.req.subcmd = HISI_MEM_RAS_DO_REPAIR;
>+ ret = hisi_mem_pcc_cmd_send(hdev, HISI_MEM_RAS_ACLS, &desc);
>+ if (ret) {
>+ dev_err(&pdev->dev, "do acls repair failed, ret = %d.\n", ret);
>+ return ret;
>+ }
>+
>+ *status = desc.rsp.retStatus;
>+ return 0;
>+}
>+
>+static ssize_t acls_repair_store(struct kobject *kobj,
>+ struct kobj_attribute *attr,
>+ const char *buf, size_t count)
>+{
>+ struct hisi_mem_dev *hdev, *tmp;
>+ u64 paddr, ret;
>+ int status;
>+
>+ if (kstrtoull(buf, 16, &paddr))
>+ return -EINVAL;
>+
>+ list_for_each_entry_safe(hdev, tmp, &mras->list, list) {
>+ struct platform_device *pdev = hdev->pdev;
>+
>+ if (paddr < hdev->paddr_min || paddr > hdev->paddr_max)
>+ continue;
>+
>+ ret = hisi_mem_do_acls_repair(hdev, &status, paddr);
>+ if (ret)
>+ return ret;
>+
>+ if (status == HISI_MEM_ACLS_OK)
>+ return count;
>+
>+ dev_err(&pdev->dev, "acls repair error code %d.\n", status);
>+ return -EIO;
>+ }
>+
>+ return -ENODEV;
>+}
>+static struct kobj_attribute acls_repair_store_attribute =
>+ __ATTR(acls_repair, 0200, NULL, acls_repair_store);
>+
>+static struct attribute *acls_attrs[] = {
>+ &acls_query_store_attribute.attr,
>+ &acls_repair_store_attribute.attr,
>+ NULL,
>+};
>+
>+static struct attribute_group acls_attr_group = {
>+ .attrs = acls_attrs,
>+};
>+
>+static int hisi_mem_ras_probe(struct platform_device *pdev)
>+{
>+ struct hisi_mem_dev *hdev;
>+ int ret = -ENOMEM;
May not require to initialize here.
>+
>+ hdev = kzalloc(sizeof(struct hisi_mem_dev), GFP_KERNEL);
>+ if (!hdev)
>+ return -ENOMEM;
>+
>+ hdev->pdev = pdev;
>+ ret = hisi_mem_get_pcc_chan_id(hdev);
>+ if (ret)
>+ goto free_hdev;
>+
>+ ret = hisi_mem_register_pcc_channel(hdev);
>+ if (ret)
>+ goto free_hdev;
>+
>+ platform_set_drvdata(pdev, hdev);
>+ list_add_tail(&hdev->list, &mras->list);
>+ hdev->verspec_data = &hisi04b2_verspec_data;
>+ return 0;
>+
>+free_hdev:
>+ kfree(hdev);
>+ return ret;
>+}
>+
>+static int hisi_mem_ras_remove(struct platform_device *pdev)
>+{
>+ struct hisi_mem_dev *hdev = platform_get_drvdata(pdev);
>+
>+ platform_set_drvdata(pdev, NULL);
>+ list_del(&hdev->list);
>+
>+ return 0;
>+}
>+
>+static const struct acpi_device_id hisi_mem_acpi_match[] = {
>+ { "HISI0FF1", 0},
>+ { },
>+};
>+MODULE_DEVICE_TABLE(acpi, hisi_mem_acpi_match);
>+
>+static struct platform_driver hisi_mem_ras_driver = {
>+ .probe = hisi_mem_ras_probe,
>+ .remove = hisi_mem_ras_remove,
>+ .driver = {
>+ .name = "hisi_mem_ras",
>+ .acpi_match_table = hisi_mem_acpi_match,
>+ },
>+};
>+
>+static int __init hisi_mem_ras_init(void)
>+{
>+ int ret;
>+
>+ mras = kzalloc(sizeof(struct hisi_mem_ras), GFP_KERNEL);
>+ if (!mras)
>+ return -ENOMEM;
>+
>+ mras->acls_kobj = kobject_create_and_add("acls", kernel_kobj);
>+ if (!mras->acls_kobj) {
>+ kfree(mras);
>+ return -ENOMEM;
>+ }
>+
>+ ret = sysfs_create_group(mras->acls_kobj, &acls_attr_group);
>+ if (ret) {
>+ mras->acls_kobj = NULL;
>+ kobject_put(mras->acls_kobj);
kfree(mras); missing here.
>+ return ret;
>+ }
>+
>+ ret = platform_driver_register(&hisi_mem_ras_driver);
>+ if (ret) {
>+ sysfs_remove_group(mras->acls_kobj, &acls_attr_group);
>+ kobject_put(mras->acls_kobj);
>+ kfree(mras);
>+ return ret;
>+ }
>+
>+ INIT_LIST_HEAD(&mras->list);
>+
>+ return 0;
>+}
>+module_init(hisi_mem_ras_init);
>+
>+static void __exit hisi_mem_ras_exit(void)
>+{
>+ platform_driver_unregister(&hisi_mem_ras_driver);
>+ sysfs_remove_group(mras->acls_kobj, &acls_attr_group);
>+ kobject_put(mras->acls_kobj);
>+ kfree(mras);
>+}
>+module_exit(hisi_mem_ras_exit);
>+
>+MODULE_LICENSE("GPL v2");
>+MODULE_AUTHOR("Xiaofei Tan <tanxiaofei(a)huawei.com>");
>+MODULE_DESCRIPTION("HISILICON Memory RAS driver");
>+MODULE_ALIAS("platform:" DRV_NAME);
>diff --git a/drivers/soc/hisilicon/hisi_mem_ras.h
>b/drivers/soc/hisilicon/hisi_mem_ras.h
>new file mode 100644
>index 000000000000..a9acbb0c1012
>--- /dev/null
>+++ b/drivers/soc/hisilicon/hisi_mem_ras.h
>@@ -0,0 +1,84 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+/*
>+ * Copyright (C) Huawei Technologies Co., Ltd. 2023. All rights reserved.
>+ */
>+
>+#ifndef _HISI_MEM_RAS_H
>+#define _HISI_MEM_RAS_H
>+
>+struct hisi_mem_ras {
>+ struct list_head list;
>+ struct kobject *acls_kobj;
>+};
>+
>+struct hisi_mem_mbox_client_info {
>+ struct mbox_client client;
>+ struct mbox_chan *mbox_chan;
>+ struct pcc_mbox_chan *pcc_chan;
>+ u64 deadline_us;
>+ void __iomem *pcc_comm_addr;
>+ struct completion done;
>+};
>+
>+struct hisi_mem_desc;
>+struct hisi_mem_dev;
>+
>+struct hisi_mem_verspecific_data {
>+ void (*rx_callback)(struct mbox_client *cl, void *mssg);
>+ int (*wait_cmd_complete)(struct hisi_mem_dev *hdev);
>+ void (*fill_pcc_shared_mem)(struct hisi_mem_dev *hdev,
>+ u8 cmd, struct hisi_mem_desc *desc,
>+ void __iomem *comm_space,
>+ u16 space_size);
>+ u16 shared_mem_size;
>+ bool has_txdone_irq;
>+};
>+
>+struct hisi_mem_dev {
>+ struct list_head list;
>+ struct platform_device *pdev;
>+ //struct acpi_device *acpi_dev;
Not used.
>+ const struct hisi_mem_verspecific_data *verspec_data;
>+ /* device capabilities from firmware, like hisi_mem_CAPS_xxx. */
>+ u64 caps;
>+ u8 chan_id;
>+ u64 paddr_min;
>+ u64 paddr_max;
>+ struct mutex lock;
>+ struct hisi_mem_mbox_client_info cl_info;
>+};
>+
>+#define HISI_MEM_PCC_SHARE_MEM_BYTES 128
>+enum hisi_mem_ras_cmd_type {
>+ HISI_MEM_RAS_ACLS = 0x4,
>+ HISI_MEM_RAS_PPR = 0X5,
>+};
>+
>+enum hisi_mem_ras_subcmd_type {
>+ HISI_MEM_RAS_QUERY_CAP = 0x0,
>+ HISI_MEM_RAS_DO_REPAIR = 0x1,
>+};
>+
>+struct hisi_mem_req_desc {
>+ u64 paddr;
>+ u8 subcmd;
>+ u8 rsv[3];
>+};
>+
>+#define HISI_MEM_ACLS_OK 0
>+#define HISI_MEM_ACLS_QUERY_NO_RES 1
>+#define HISI_MEM_ACLS_REPAIR_FAIL 2
>+#define HISI_MEM_ACLS_EINVAL 3
>+struct hisi_mem_rsp_desc {
>+ u8 retStatus; /* 0: success, other: failure */
>+ u8 rsv[7];
>+};
>+
>+struct hisi_mem_desc {
>+ union {
>+ struct hisi_mem_req_desc req;
>+ struct hisi_mem_rsp_desc rsp;
>+ };
>+};
>+
>+#endif
>--
>2.33.0
Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of the XQC address
by Shameerali Kolothum Thodi 13 May '24
13 May '24
> -----Original Message-----
> From: liulongfang <liulongfang(a)huawei.com>
> Sent: Monday, May 13, 2024 9:16 AM
> To: Alex Williamson <alex.williamson(a)redhat.com>; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi(a)huawei.com>
> Cc: jgg(a)nvidia.com; Jonathan Cameron <jonathan.cameron(a)huawei.com>;
> kvm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org;
> linuxarm(a)openeuler.org
> Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of
> the XQC address
>
> On 2024/5/9 22:29, Alex Williamson wrote:
> > On Thu, 9 May 2024 09:37:51 +0000
> > Shameerali Kolothum Thodi <shameerali.kolothum.thodi(a)huawei.com>
> wrote:
> >
> >>> -----Original Message-----
> >>> From: Alex Williamson <alex.williamson(a)redhat.com>
> >>> Sent: Wednesday, May 8, 2024 7:00 PM
> >>> To: liulongfang <liulongfang(a)huawei.com>
> >>> Cc: jgg(a)nvidia.com; Shameerali Kolothum Thodi
> >>> <shameerali.kolothum.thodi(a)huawei.com>; Jonathan Cameron
> >>> <jonathan.cameron(a)huawei.com>; kvm(a)vger.kernel.org; linux-
> >>> kernel(a)vger.kernel.org; linuxarm(a)openeuler.org
> >>> Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register
> location of
> >>> the XQC address
> >>
> >> [...]
> >>
> >>>> HiSilicon accelerator equipment can perform general services after
> >>> completing live migration.
> >>>> This kind of business is executed through the user mode driver and only
> >>> needs to use SQE and CQE.
> >>>>
> >>>> At the same time, this device can also perform kernel-mode services in
> the
> >>> VM through the crypto
> >>>> subsystem. This kind of service requires the use of EQE.
> >>>>
> >>>> Finally, if the device is abnormal, the driver needs to perform a device
> >>> reset, and AEQE needs to
> >>>> be used in this case.
> >>>>
> >>>> Therefore, a complete device live migration function needs to ensure
> that
> >>> device functions are
> >>>> normal in all these scenarios.
> >>>> Therefore, this data still needs to be migrated.
> >>>
> >>> Ok, I had jumped to an in-kernel host driver in reference to "kernel
> >>> mode" rather than a guest kernel. Migrating with bad data only affects
> >>> the current configuration of the device, reloading a guest driver to
> >>> update these registers or a reset of the device would allow proper
> >>> operation of the device, correct?
> >>
> >> Yes, after talking to Longfang, the device RAS will trigger a reset and
> >> would function after reset.
> >>
> >>>
> >>> But I think this still isn't really a complete solution, we know
> >>> there's a bug in the migration data stream, so not only would we fix
> >>> the data stream, but I think we should also take measures to prevent
> >>> loading a known bad data stream. AIUI migration of this device while
> >>> running in kernel mode (ie. a kernel driver within a guest VM) is
> >>> broken. Therefore, the least we can do in a new kernel, knowing that
> >>> there was previously a bug in the migration data stream, is to fail to
> >>> load that migration data because it risks this scenario where the
> >>> device is broken after migration. Shouldn't we then also increment a
> >>> migration version field in the data stream to block migrations that
> >>> risk this breakage, or barring that, change the magic data field to
> >>> prevent the migration? Thanks,
> >>
> >> Ok. We could add a new ACC_DEV_MAGIC_V2 and prevent the migration
> >> in vf_qm_check_match(). The only concern here is that, it will completely
> >> block old kernel to new kernel migration and since we can recover the
> >> device after the reset whether it is too restrictive or not.
> >
> > What's the impact to the running driver, kernel or userspace, if the
> > device is reset? Migration is intended to be effectively transparent
>
> If the device is reset, the user's task needs to be restarted.
> If an exception has been detected, the best way is not to migrate.
>
> > to the driver. If the driver stalls and needs to reset the device,
> > what has the migration driver accomplished versus an offline migration?
> >
> > If there's a way to detect from the migration data if the device is
> > running in kernel mode or user mode then you could potentially accept
> > and send v1 magic conditional that the device is in user mode and
> > require v2 magic for any migration where the device is in kernel mode.
> > This all adds complication though and seems like it has corner cases
> > where we might allow migration to an old kernel that might trap the
> > device there if the use case changes.
> >
>
> The driver has no way to check whether the device is running in
> kernel mode or user mode. Moreover, the device supports running
> user-mode services and kernel-mode services at the same time.
>
> > Essentially it comes down to what should the migration experience be
> > and while restricting old->new and new->old migration is undesirable,
> > it seems old->old migration is effectively already broken anyway. As
> > you consider a v2 magic, perhaps consider how the migration data
> > structure might be improved overall to better handle new features and
> > bugs. Thanks,
> >
>
> We discussed a plan:
> Update ACC_DEV_MAGIC to ACC_DEV_MAGIC_VERSION and configure its
> last byte
> as version information:
>
> /* QM match information, last byte is version number */
> #define ACC_DEV_MAGIC_VERSION 0XACCDEVFEEDCAFE01
Oops.. can't have a V there. But the idea is to replace the magic with one
whose last byte is version info, which can be used in the future for handling
bugs/features etc.
Thanks,
Shameer
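One way the version-in-last-byte scheme could look, sketched with a hypothetical constant (the real value is still being decided in this thread, and as noted above a valid magic must contain only hex digits):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, purely illustrative: the top 7 bytes are the
 * device magic, the low byte is the migration-data version number. */
#define ACC_DEV_MAGIC_MASK 0xFFFFFFFFFFFFFF00ULL
#define ACC_DEV_MAGIC_V1   0xAC0DE5FEEDCAFE01ULL

/* Match only on the magic portion, ignoring the version byte. */
static int magic_matches(uint64_t remote, uint64_t local)
{
    return (remote & ACC_DEV_MAGIC_MASK) == (local & ACC_DEV_MAGIC_MASK);
}

/* Extract the version byte for compatibility checks. */
static unsigned int magic_version(uint64_t magic)
{
    return (unsigned int)(magic & 0xFF);
}
```

With this split, `vf_qm_check_match()` could accept any matching magic and then decide per version whether the incoming migration stream is safe to load.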
Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of the XQC address
by Shameerali Kolothum Thodi 09 May '24
09 May '24
> -----Original Message-----
> From: Alex Williamson <alex.williamson(a)redhat.com>
> Sent: Wednesday, May 8, 2024 7:00 PM
> To: liulongfang <liulongfang(a)huawei.com>
> Cc: jgg(a)nvidia.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi(a)huawei.com>; Jonathan Cameron
> <jonathan.cameron(a)huawei.com>; kvm(a)vger.kernel.org; linux-
> kernel(a)vger.kernel.org; linuxarm(a)openeuler.org
> Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of
> the XQC address
[...]
> > HiSilicon accelerator equipment can perform general services after
> completing live migration.
> > This kind of business is executed through the user mode driver and only
> needs to use SQE and CQE.
> >
> > At the same time, this device can also perform kernel-mode services in the
> VM through the crypto
> > subsystem. This kind of service requires the use of EQE.
> >
> > Finally, if the device is abnormal, the driver needs to perform a device
> reset, and AEQE needs to
> > be used in this case.
> >
> > Therefore, a complete device live migration function needs to ensure that
> device functions are
> > normal in all these scenarios.
> > Therefore, this data still needs to be migrated.
>
> Ok, I had jumped to an in-kernel host driver in reference to "kernel
> mode" rather than a guest kernel. Migrating with bad data only affects
> the current configuration of the device, reloading a guest driver to
> update these registers or a reset of the device would allow proper
> operation of the device, correct?
Yes, after talking to Longfang, the device RAS will trigger a reset and
would function after reset.
>
> But I think this still isn't really a complete solution, we know
> there's a bug in the migration data stream, so not only would we fix
> the data stream, but I think we should also take measures to prevent
> loading a known bad data stream. AIUI migration of this device while
> running in kernel mode (ie. a kernel driver within a guest VM) is
> broken. Therefore, the least we can do in a new kernel, knowing that
> there was previously a bug in the migration data stream, is to fail to
> load that migration data because it risks this scenario where the
> device is broken after migration. Shouldn't we then also increment a
> migration version field in the data stream to block migrations that
> risk this breakage, or barring that, change the magic data field to
> prevent the migration? Thanks,
Ok. We could add a new ACC_DEV_MAGIC_V2 and prevent the migration
in vf_qm_check_match(). The only concern here is that, it will completely
block old kernel to new kernel migration and since we can recover the
device after the reset whether it is too restrictive or not.
Thanks,
Shameer