Kernel

Re: [PATCH openEuler-1.0-LTS V3 09/92] KVM: x86: expose MOVDIRI CPU feature into VM.
by Zenghui Yu 26 Jan '22
Reviewed-by: Zenghui Yu <yuzenghui(a)huawei.com>

Re: [PATCH openEuler-1.0-LTS V3 08/92] KVM: x86: Add requisite includes to hyperv.h
by Zenghui Yu 26 Jan '22
Reviewed-by: Zenghui Yu <yuzenghui(a)huawei.com>

Re: [PATCH openEuler-1.0-LTS V3 07/92] KVM: x86: Add requisite includes to kvm_cache_regs.h
by Zenghui Yu 26 Jan '22
Reviewed-by: Zenghui Yu <yuzenghui(a)huawei.com>

Re: [PATCH openEuler-1.0-LTS V3 06/92] KVM: nVMX: Allocate and configure VM{READ,WRITE} bitmaps iff enable_shadow_vmcs
by Zenghui Yu 26 Jan '22
Reviewed-by: Zenghui Yu <yuzenghui(a)huawei.com>
uniontech inclusion
category: cleanup
CVE: NA
'left' is always 0 here.
If it were non-zero, the preceding if statement would already have jumped to 'goto out;'.
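As a rough illustration (a paraphrased sketch of the surrounding flow, not the exact code in fs/eulerfs/dax.c):

	left = copy_to_user(...);       /* a short copy leaves 'left' != 0 */
	if (left) {
		error = -EFAULT;
		goto out;               /* bail out before the accounting */
	}
	/* reaching here implies left == 0, so (nr - left) == nr */
	copied += nr;
	offset += nr;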
---
Signed-off-by: Gou Hao <gouhao(a)uniontech.com>
---
fs/eulerfs/dax.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/eulerfs/dax.c b/fs/eulerfs/dax.c
index 9ec8ad7..131d08f 100644
--- a/fs/eulerfs/dax.c
+++ b/fs/eulerfs/dax.c
@@ -1172,8 +1172,8 @@ static ssize_t do_mapping_read(struct address_space *mapping,
goto out;
}
- copied += (nr - left);
- offset += (nr - left);
+ copied += nr;
+ offset += nr;
index += offset >> PAGE_SHIFT;
offset &= ~PAGE_MASK;
} while (copied < len);
--
2.20.1
openEuler inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OFDI
CVE: NA
--------------------------------
Chao Liu (2):
change arm64 configs
change x86 configs
arch/arm64/configs/openeuler_defconfig | 265 +++++++-----
arch/x86/configs/openeuler_defconfig | 549 +++++++++----------------
2 files changed, 349 insertions(+), 465 deletions(-)
--
2.27.0

[PATCH openEuler-1.0-LTS 1/3] etmem: add original kernel swap enabled options
by Yang Yingliang 26 Jan '22
From: liubo <liubo254(a)huawei.com>
euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4QVXW
CVE: NA
-------------------------------------------------
etmem, a memory vertical-expansion technology, combines DRAM and
new high-performance storage media to form multi-level memory
storage.
By grading the stored data, etmem migrates data classified as cold
from DRAM to the high-performance storage medium, expanding
effective memory capacity while reducing memory cost.
When the memory expansion function etmem is running, the native
swap function of the kernel needs to be disabled in certain
scenarios to avoid interference from kernel swap.
This patch provides that control.
The /sys/kernel/mm/swap/ directory provides the kernel_swap_enable
sysfs interface to enable or disable the native swap function
of the kernel.
The default value of /sys/kernel/mm/swap/kernel_swap_enable is true,
that is, kernel swap is enabled by default.
Turn on kernel swap:
echo true > /sys/kernel/mm/swap/kernel_swap_enable
Turn off kernel swap:
echo false > /sys/kernel/mm/swap/kernel_swap_enable
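Query the current setting (a usage note; per the show handler below,
the attribute prints "true" or "false"):
cat /sys/kernel/mm/swap/kernel_swap_enable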
Signed-off-by: liubo <liubo254(a)huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
include/linux/swap.h | 1 +
mm/swap_state.c | 29 +++++++++++++++++++++++++++++
mm/vmscan.c | 18 ++++++++++++++++++
3 files changed, 48 insertions(+)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index b7cfad35987a2..23549741336a4 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -476,6 +476,7 @@ extern struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
struct vm_fault *vmf);
extern struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
struct vm_fault *vmf);
+extern bool kernel_swap_enabled(void);
/* linux/mm/swapfile.c */
extern atomic_long_t nr_swap_pages;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 2137e2d571965..1527ac72928b6 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -40,6 +40,7 @@ static const struct address_space_operations swap_aops = {
struct address_space *swapper_spaces[MAX_SWAPFILES] __read_mostly;
static unsigned int nr_swapper_spaces[MAX_SWAPFILES] __read_mostly;
static bool enable_vma_readahead __read_mostly = true;
+static bool enable_kernel_swap __read_mostly = true;
#define SWAP_RA_WIN_SHIFT (PAGE_SHIFT / 2)
#define SWAP_RA_HITS_MASK ((1UL << SWAP_RA_WIN_SHIFT) - 1)
@@ -326,6 +327,11 @@ static inline bool swap_use_vma_readahead(void)
return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
}
+bool kernel_swap_enabled(void)
+{
+ return READ_ONCE(enable_kernel_swap);
+}
+
/*
* Lookup a swap entry in the swap cache. A found page will be returned
* unlocked and with its refcount incremented - we rely on the kernel
@@ -828,8 +834,31 @@ static struct kobj_attribute vma_ra_enabled_attr =
__ATTR(vma_ra_enabled, 0644, vma_ra_enabled_show,
vma_ra_enabled_store);
+static ssize_t kernel_swap_enable_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%s\n", enable_kernel_swap ? "true" : "false");
+}
+static ssize_t kernel_swap_enable_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ if (!strncmp(buf, "true", 4) || !strncmp(buf, "1", 1))
+ WRITE_ONCE(enable_kernel_swap, true);
+ else if (!strncmp(buf, "false", 5) || !strncmp(buf, "0", 1))
+ WRITE_ONCE(enable_kernel_swap, false);
+ else
+ return -EINVAL;
+
+ return count;
+}
+static struct kobj_attribute kernel_swap_enable_attr =
+ __ATTR(kernel_swap_enable, 0644, kernel_swap_enable_show,
+ kernel_swap_enable_store);
+
static struct attribute *swap_attrs[] = {
&vma_ra_enabled_attr.attr,
+ &kernel_swap_enable_attr.attr,
NULL,
};
diff --git a/mm/vmscan.c b/mm/vmscan.c
index e8befe70d2800..2676d6cf2ccac 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3248,6 +3248,16 @@ static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
return false;
}
+/*
+ * Check if original kernel swap is enabled
+ * turn off kernel swap,but leave page cache reclaim on
+ */
+static inline void kernel_swap_check(struct scan_control *sc)
+{
+ if (sc != NULL && !kernel_swap_enabled())
+ sc->may_swap = 0;
+}
+
unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
gfp_t gfp_mask, nodemask_t *nodemask)
{
@@ -3264,6 +3274,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
.may_swap = 1,
};
+ kernel_swap_check(&sc);
/*
* scan_control uses s8 fields for order, priority, and reclaim_idx.
* Confirm they are large enough for max values.
@@ -3548,6 +3559,8 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
count_vm_event(PAGEOUTRUN);
+ kernel_swap_check(&sc);
+
#ifdef CONFIG_SHRINK_PAGECACHE
if (vm_cache_limit_mbytes && page_cache_over_limit())
shrink_page_cache(GFP_KERNEL);
@@ -3963,6 +3976,8 @@ static unsigned long __shrink_page_cache(gfp_t mask)
struct zonelist *zonelist = node_zonelist(numa_node_id(), mask);
+ kernel_swap_check(&sc);
+
return do_try_to_free_pages(zonelist, &sc);
}
@@ -4282,6 +4297,9 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
cond_resched();
fs_reclaim_acquire(sc.gfp_mask);
+
+ kernel_swap_check(&sc);
+
/*
* We need to be able to allocate from the reserves for RECLAIM_UNMAP
* and we also need to be able to write out pages for RECLAIM_WRITE
--
2.25.1
openEuler inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OFDI
CVE: NA
--------------------------------
Chao Liu (2):
change arm64 configs
change x86 configs
arch/arm64/configs/openeuler_defconfig | 260 +++++++-----
arch/x86/configs/openeuler_defconfig | 531 ++++++++-----------------
2 files changed, 337 insertions(+), 454 deletions(-)
--
2.27.0

[PATCH openEuler-1.0-LTS] net: bridge: clear bridge's private skb space on xmit
by Yang Yingliang 24 Jan '22
From: Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
mainline inclusion
from mainline-v5.9-rc1
commit fd65e5a95d08389444e8591a20538b3edece0e15
category: bugfix
bugzilla: 186114
CVE: NA
--------------------------------
We need to clear all of the bridge private skb variables as they can be
stale due to the packet being recirculated through the stack and then
transmitted through the bridge device. Similar memset is already done on
bridge's input. We've seen cases where proxyarp_replied was 1 on routed
multicast packets transmitted through the bridge to ports with neigh
suppress which were getting dropped. Same thing can in theory happen with
the port isolation bit as well.
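For context, the bridge keeps its per-packet state in the skb control
block; a minimal sketch of the pattern involved (abridged from
net/bridge/br_private.h, field list shortened):

	struct br_input_skb_cb {
		struct net_device *brdev;
		u8 proxyarp_replied:1;
		/* further per-packet flags omitted */
	};
	#define BR_INPUT_SKB_CB(__skb)	((struct br_input_skb_cb *)(__skb)->cb)

Since skb->cb survives recirculation through the stack, these bits must
be zeroed again on transmit, which is what the memset below does.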
Fixes: 821f1b21cabb ("bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd flood")
Signed-off-by: Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Huang Guobin <huangguobin4(a)huawei.com>
Reviewed-by: Yue Haibing <yuehaibing(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
net/bridge/br_device.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 7b7784bb1cb99..9475e0443ff92 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -41,6 +41,8 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
const unsigned char *dest;
u16 vid = 0;
+ memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
+
rcu_read_lock();
nf_ops = rcu_dereference(nf_br_ops);
if (nf_ops && nf_ops->br_dev_xmit_hook(skb)) {
--
2.25.1

[PATCH openEuler-5.10 01/38] tcp_comp: implement sendmsg for tcp compression
by Zheng Zengkai 22 Jan '22
From: Wei Yongjun <weiyongjun1(a)huawei.com>
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4PNEK
CVE: NA
-------------------------------------------------
This patch implements software-level compression for
sending TCP messages. All of the TCP payload is
compressed before xmit.
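At a high level the sendmsg path added below proceeds as follows
(an outline of the diff, for orientation):
1. copy the user payload into ctx->tx.plaintext_data (memcopy_from_iter)
2. ZSTD-compress it into ctx->tx.compressed_data (tcp_comp_compress_to_msg)
3. copy the compressed bytes into the sk_msg scatterlist (memcopy_to_msg)
4. transmit the pages with do_tcp_sendpages (tcp_comp_push_msg),
   retrying from tcp_comp_write_space() if the send was partial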
Signed-off-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Signed-off-by: Wang Yufen <wangyufen(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Signed-off-by: Lu Wei <luwei32(a)huawei.com>
Reviewed-by: Wang Yufen <wangyufen(a)huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
net/ipv4/Kconfig | 1 +
net/ipv4/tcp_comp.c | 415 +++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 412 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 59ffbf80f7f2..22c554d3a9ab 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -745,6 +745,7 @@ config TCP_MD5SIG
config TCP_COMP
bool "TCP: Transport Layer Compression support"
+ depends on ZSTD_COMPRESS=y
help
Enable kernel payload compression support for TCP protocol. This allows
payload compression handling of the TCP protocol to be done in-kernel.
diff --git a/net/ipv4/tcp_comp.c b/net/ipv4/tcp_comp.c
index fb90be4d9e9c..e85803da3924 100644
--- a/net/ipv4/tcp_comp.c
+++ b/net/ipv4/tcp_comp.c
@@ -5,7 +5,15 @@
* Copyright(c) 2021 Huawei Technologies Co., Ltd
*/
-#include <net/tcp.h>
+#include <linux/skmsg.h>
+#include <linux/zstd.h>
+
+#define TCP_COMP_MAX_PADDING 64
+#define TCP_COMP_SCRATCH_SIZE 65400
+#define TCP_COMP_MAX_CSIZE (TCP_COMP_SCRATCH_SIZE + TCP_COMP_MAX_PADDING)
+
+#define TCP_COMP_SEND_PENDING 1
+#define ZSTD_COMP_DEFAULT_LEVEL 1
static unsigned long tcp_compression_ports[65536 / 8];
@@ -14,11 +22,37 @@ int sysctl_tcp_compression_local __read_mostly;
static struct proto tcp_prot_override;
+struct tcp_comp_context_tx {
+ ZSTD_CStream *cstream;
+ void *cworkspace;
+ void *plaintext_data;
+ void *compressed_data;
+ struct sk_msg msg;
+ bool in_tcp_sendpages;
+};
+
struct tcp_comp_context {
- struct proto *sk_proto;
struct rcu_head rcu;
+
+ struct proto *sk_proto;
+ void (*sk_write_space)(struct sock *sk);
+
+ struct tcp_comp_context_tx tx;
+
+ unsigned long flags;
};
+static bool tcp_comp_is_write_pending(struct tcp_comp_context *ctx)
+{
+ return test_bit(TCP_COMP_SEND_PENDING, &ctx->flags);
+}
+
+static void tcp_comp_err_abort(struct sock *sk, int err)
+{
+ sk->sk_err = err;
+ sk->sk_error_report(sk);
+}
+
static bool tcp_comp_enabled(__be32 saddr, __be32 daddr, int port)
{
if (!sysctl_tcp_compression_local &&
@@ -55,11 +89,341 @@ static struct tcp_comp_context *comp_get_ctx(const struct sock *sk)
return (__force void *)icsk->icsk_ulp_data;
}
-static int tcp_comp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
+static int tcp_comp_tx_context_init(struct tcp_comp_context *ctx)
+{
+ ZSTD_parameters params;
+ int csize;
+
+ params = ZSTD_getParams(ZSTD_COMP_DEFAULT_LEVEL, PAGE_SIZE, 0);
+ csize = ZSTD_CStreamWorkspaceBound(params.cParams);
+ if (csize <= 0)
+ return -EINVAL;
+
+ ctx->tx.cworkspace = kmalloc(csize, GFP_KERNEL);
+ if (!ctx->tx.cworkspace)
+ return -ENOMEM;
+
+ ctx->tx.cstream = ZSTD_initCStream(params, 0, ctx->tx.cworkspace,
+ csize);
+ if (!ctx->tx.cstream)
+ goto err_cstream;
+
+ ctx->tx.plaintext_data = kvmalloc(TCP_COMP_SCRATCH_SIZE, GFP_KERNEL);
+ if (!ctx->tx.plaintext_data)
+ goto err_cstream;
+
+ ctx->tx.compressed_data = kvmalloc(TCP_COMP_MAX_CSIZE, GFP_KERNEL);
+ if (!ctx->tx.compressed_data)
+ goto err_compressed;
+
+ return 0;
+
+err_compressed:
+ kvfree(ctx->tx.plaintext_data);
+ ctx->tx.plaintext_data = NULL;
+err_cstream:
+ kfree(ctx->tx.cworkspace);
+ ctx->tx.cworkspace = NULL;
+
+ return -ENOMEM;
+}
+
+static void *tcp_comp_get_tx_stream(struct sock *sk)
+{
+ struct tcp_comp_context *ctx = comp_get_ctx(sk);
+
+ if (!ctx->tx.plaintext_data)
+ tcp_comp_tx_context_init(ctx);
+
+ return ctx->tx.plaintext_data;
+}
+
+static int alloc_compressed_msg(struct sock *sk, int len)
+{
+ struct tcp_comp_context *ctx = comp_get_ctx(sk);
+ struct sk_msg *msg = &ctx->tx.msg;
+
+ sk_msg_init(msg);
+
+ return sk_msg_alloc(sk, msg, len, 0);
+}
+
+static int memcopy_from_iter(struct sock *sk, struct iov_iter *from, int copy)
+{
+ void *dest;
+ int rc;
+
+ dest = tcp_comp_get_tx_stream(sk);
+ if (!dest)
+ return -ENOSPC;
+
+ if (sk->sk_route_caps & NETIF_F_NOCACHE_COPY)
+ rc = copy_from_iter_nocache(dest, copy, from);
+ else
+ rc = copy_from_iter(dest, copy, from);
+
+ if (rc != copy)
+ rc = -EFAULT;
+
+ return rc;
+}
+
+static int memcopy_to_msg(struct sock *sk, int bytes)
+{
+ struct tcp_comp_context *ctx = comp_get_ctx(sk);
+ struct sk_msg *msg = &ctx->tx.msg;
+ int i = msg->sg.curr;
+ struct scatterlist *sge;
+ u32 copy, buf_size;
+ void *from, *to;
+
+ from = ctx->tx.compressed_data;
+ do {
+ sge = sk_msg_elem(msg, i);
+ /* This is possible if a trim operation shrunk the buffer */
+ if (msg->sg.copybreak >= sge->length) {
+ msg->sg.copybreak = 0;
+ sk_msg_iter_var_next(i);
+ if (i == msg->sg.end)
+ break;
+ sge = sk_msg_elem(msg, i);
+ }
+ buf_size = sge->length - msg->sg.copybreak;
+ copy = (buf_size > bytes) ? bytes : buf_size;
+ to = sg_virt(sge) + msg->sg.copybreak;
+ msg->sg.copybreak += copy;
+ memcpy(to, from, copy);
+ bytes -= copy;
+ from += copy;
+ if (!bytes)
+ break;
+ msg->sg.copybreak = 0;
+ sk_msg_iter_var_next(i);
+ } while (i != msg->sg.end);
+
+ msg->sg.curr = i;
+ return bytes;
+}
+
+static int tcp_comp_compress_to_msg(struct sock *sk, int bytes)
+{
+ struct tcp_comp_context *ctx = comp_get_ctx(sk);
+ ZSTD_outBuffer outbuf;
+ ZSTD_inBuffer inbuf;
+ size_t ret;
+
+ inbuf.src = ctx->tx.plaintext_data;
+ outbuf.dst = ctx->tx.compressed_data;
+ inbuf.size = bytes;
+ outbuf.size = TCP_COMP_MAX_CSIZE;
+ inbuf.pos = 0;
+ outbuf.pos = 0;
+
+ ret = ZSTD_compressStream(ctx->tx.cstream, &outbuf, &inbuf);
+ if (ZSTD_isError(ret))
+ return -EIO;
+
+ ret = ZSTD_flushStream(ctx->tx.cstream, &outbuf);
+ if (ZSTD_isError(ret))
+ return -EIO;
+
+ if (inbuf.pos != inbuf.size)
+ return -EIO;
+
+ if (memcopy_to_msg(sk, outbuf.pos))
+ return -EIO;
+
+ sk_msg_trim(sk, &ctx->tx.msg, outbuf.pos);
+
+ return 0;
+}
+
+static int tcp_comp_push_msg(struct sock *sk, struct sk_msg *msg, int flags)
+{
+ struct tcp_comp_context *ctx = comp_get_ctx(sk);
+ struct scatterlist *sg;
+ int ret, offset;
+ struct page *p;
+ size_t size;
+
+ ctx->tx.in_tcp_sendpages = true;
+ while (1) {
+ sg = sk_msg_elem(msg, msg->sg.start);
+ offset = sg->offset;
+ size = sg->length;
+ p = sg_page(sg);
+retry:
+ ret = do_tcp_sendpages(sk, p, offset, size, flags);
+ if (ret != size) {
+ if (ret > 0) {
+ sk_mem_uncharge(sk, ret);
+ sg->offset += ret;
+ sg->length -= ret;
+ size -= ret;
+ offset += ret;
+ goto retry;
+ }
+ ctx->tx.in_tcp_sendpages = false;
+ return ret;
+ }
+
+ sk_mem_uncharge(sk, ret);
+ msg->sg.size -= size;
+ put_page(p);
+ sk_msg_iter_next(msg, start);
+ if (msg->sg.start == msg->sg.end)
+ break;
+ }
+
+ clear_bit(TCP_COMP_SEND_PENDING, &ctx->flags);
+ ctx->tx.in_tcp_sendpages = false;
+
+ return 0;
+}
+
+static int tcp_comp_push(struct sock *sk, int bytes, int flags)
{
struct tcp_comp_context *ctx = comp_get_ctx(sk);
+ int ret;
- return ctx->sk_proto->sendmsg(sk, msg, size);
+ ret = tcp_comp_compress_to_msg(sk, bytes);
+ if (ret < 0) {
+ pr_debug("%s: failed to compress sg\n", __func__);
+ return ret;
+ }
+
+ set_bit(TCP_COMP_SEND_PENDING, &ctx->flags);
+
+ ret = tcp_comp_push_msg(sk, &ctx->tx.msg, flags);
+ if (ret) {
+ pr_debug("%s: failed to tcp_comp_push_sg\n", __func__);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int wait_on_pending_writer(struct sock *sk, long *timeo)
+{
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);
+ int ret = 0;
+
+ add_wait_queue(sk_sleep(sk), &wait);
+ while (1) {
+ if (!*timeo) {
+ ret = -EAGAIN;
+ break;
+ }
+
+ if (signal_pending(current)) {
+ ret = sock_intr_errno(*timeo);
+ break;
+ }
+
+ if (sk_wait_event(sk, timeo, !sk->sk_write_pending, &wait))
+ break;
+ }
+ remove_wait_queue(sk_sleep(sk), &wait);
+
+ return ret;
+}
+
+static int tcp_comp_push_pending_msg(struct sock *sk, int flags)
+{
+ struct tcp_comp_context *ctx = comp_get_ctx(sk);
+ struct sk_msg *msg = &ctx->tx.msg;
+
+ if (msg->sg.start == msg->sg.end)
+ return 0;
+
+ return tcp_comp_push_msg(sk, msg, flags);
+}
+
+static int tcp_comp_complete_pending_work(struct sock *sk, int flags,
+ long *timeo)
+{
+ struct tcp_comp_context *ctx = comp_get_ctx(sk);
+ int ret = 0;
+
+ if (unlikely(sk->sk_write_pending))
+ ret = wait_on_pending_writer(sk, timeo);
+
+ if (!ret && tcp_comp_is_write_pending(ctx))
+ ret = tcp_comp_push_pending_msg(sk, flags);
+
+ return ret;
+}
+
+static int tcp_comp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
+{
+ struct tcp_comp_context *ctx = comp_get_ctx(sk);
+ int copied = 0, err = 0;
+ size_t try_to_copy;
+ int required_size;
+ long timeo;
+
+ lock_sock(sk);
+
+ timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
+
+ err = tcp_comp_complete_pending_work(sk, msg->msg_flags, &timeo);
+ if (err)
+ goto out_err;
+
+ while (msg_data_left(msg)) {
+ if (sk->sk_err) {
+ err = -sk->sk_err;
+ goto out_err;
+ }
+
+ try_to_copy = msg_data_left(msg);
+ if (try_to_copy > TCP_COMP_SCRATCH_SIZE)
+ try_to_copy = TCP_COMP_SCRATCH_SIZE;
+ required_size = try_to_copy + TCP_COMP_MAX_PADDING;
+
+ if (!sk_stream_memory_free(sk))
+ goto wait_for_sndbuf;
+
+alloc_compressed:
+ err = alloc_compressed_msg(sk, required_size);
+ if (err) {
+ if (err != -ENOSPC)
+ goto wait_for_memory;
+ goto out_err;
+ }
+
+ err = memcopy_from_iter(sk, &msg->msg_iter, try_to_copy);
+ if (err < 0)
+ goto out_err;
+
+ copied += try_to_copy;
+
+ err = tcp_comp_push(sk, try_to_copy, msg->msg_flags);
+ if (err < 0) {
+ if (err == -ENOMEM)
+ goto wait_for_memory;
+ if (err != -EAGAIN)
+ tcp_comp_err_abort(sk, EBADMSG);
+ goto out_err;
+ }
+
+ continue;
+wait_for_sndbuf:
+ set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
+wait_for_memory:
+ err = sk_stream_wait_memory(sk, &timeo);
+ if (err)
+ goto out_err;
+ if (ctx->tx.msg.sg.size < required_size)
+ goto alloc_compressed;
+ }
+
+out_err:
+ err = sk_stream_error(sk, msg->msg_flags, err);
+
+ release_sock(sk);
+
+ return copied ? copied : err;
}
static int tcp_comp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
@@ -70,10 +434,35 @@ static int tcp_comp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
return ctx->sk_proto->recvmsg(sk, msg, len, nonblock, flags, addr_len);
}
+static void tcp_comp_write_space(struct sock *sk)
+{
+ struct tcp_comp_context *ctx = comp_get_ctx(sk);
+
+ if (ctx->tx.in_tcp_sendpages) {
+ ctx->sk_write_space(sk);
+ return;
+ }
+
+ if (!sk->sk_write_pending && tcp_comp_is_write_pending(ctx)) {
+ gfp_t sk_allocation = sk->sk_allocation;
+ int rc;
+
+ sk->sk_allocation = GFP_ATOMIC;
+ rc = tcp_comp_push_pending_msg(sk, MSG_DONTWAIT | MSG_NOSIGNAL);
+ sk->sk_allocation = sk_allocation;
+
+ if (rc < 0)
+ return;
+ }
+
+ ctx->sk_write_space(sk);
+}
+
void tcp_init_compression(struct sock *sk)
{
struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_comp_context *ctx = NULL;
+ struct sk_msg *msg = NULL;
struct tcp_sock *tp = tcp_sk(sk);
if (!tp->rx_opt.comp_ok)
@@ -83,20 +472,38 @@ void tcp_init_compression(struct sock *sk)
if (!ctx)
return;
+ msg = &ctx->tx.msg;
+ sk_msg_init(msg);
+
+ ctx->sk_write_space = sk->sk_write_space;
ctx->sk_proto = sk->sk_prot;
WRITE_ONCE(sk->sk_prot, &tcp_prot_override);
+ sk->sk_write_space = tcp_comp_write_space;
rcu_assign_pointer(icsk->icsk_ulp_data, ctx);
sock_set_flag(sk, SOCK_COMP);
}
+static void tcp_comp_context_tx_free(struct tcp_comp_context *ctx)
+{
+ kfree(ctx->tx.cworkspace);
+ ctx->tx.cworkspace = NULL;
+
+ kvfree(ctx->tx.plaintext_data);
+ ctx->tx.plaintext_data = NULL;
+
+ kvfree(ctx->tx.compressed_data);
+ ctx->tx.compressed_data = NULL;
+}
+
static void tcp_comp_context_free(struct rcu_head *head)
{
struct tcp_comp_context *ctx;
ctx = container_of(head, struct tcp_comp_context, rcu);
+ tcp_comp_context_tx_free(ctx);
kfree(ctx);
}
--
2.20.1
openEuler inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OFDI
CVE: NA
--------------------------------
Chao Liu (2):
change arm64 configs
change x86 configs
arch/arm64/configs/openeuler_defconfig | 292 ++++++++-----
arch/x86/configs/openeuler_defconfig | 548 +++++++++----------------
2 files changed, 380 insertions(+), 460 deletions(-)
--
2.27.0

22 Jan '22
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O31I
Enable legacy pmem register feature for arm64.
Signed-off-by: Zhuling <zhuling8(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
---
arch/arm64/configs/openeuler_defconfig | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 771eb45..7745715 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -416,6 +416,8 @@ CONFIG_ARM64_CPU_PARK=y
CONFIG_FORCE_MAX_ZONEORDER=11
CONFIG_UNMAP_KERNEL_AT_EL0=y
CONFIG_RODATA_FULL_DEFAULT_ENABLED=y
+CONFIG_ARM64_PMEM_RESERVE=y
+CONFIG_ARM64_PMEM_LEGACY=m
# CONFIG_ARM64_SW_TTBR0_PAN is not set
CONFIG_ARM64_TAGGED_ADDR_ABI=y
CONFIG_ARM64_ILP32=y
@@ -6027,6 +6029,8 @@ CONFIG_USB4=m
# end of Android
CONFIG_LIBNVDIMM=m
+CONFIG_PMEM_LEGACY=m
+CONFIG_PMEM_LEGACY_DEVICE=y
CONFIG_BLK_DEV_PMEM=m
CONFIG_ND_BLK=m
CONFIG_ND_CLAIM=y
--
2.9.5

[PATCH OLK-5.10 v7 1/3] x86: pmem: move persistent memory(legacy) code into nvdimm
by Zhuling 22 Jan '22
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O31I
Move x86's pmem.c into nvdimm, and rename X86_PMEM_LEGACY_DEVICE to
PMEM_LEGACY_DEVICE; also add PMEM_LEGACY to control the build of
nd_e820.o, so the code can be reused by other architectures.
Note: this patch fixes the nd_e820.c build breakage introduced by commit 2499317e408e
("arm64: Revert feature: Add memmap parameter and register pmem").
Signed-off-by: Zhuling <zhuling8(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
---
arch/x86/Kconfig | 6 ++----
arch/x86/kernel/Makefile | 1 -
drivers/nvdimm/Kconfig | 6 ++++++
drivers/nvdimm/Makefile | 2 ++
arch/x86/kernel/pmem.c => drivers/nvdimm/pmem_legacy_device.c | 0
tools/testing/nvdimm/Kbuild | 2 +-
6 files changed, 11 insertions(+), 6 deletions(-)
rename arch/x86/kernel/pmem.c => drivers/nvdimm/pmem_legacy_device.c (100%)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c77ef59..1d3176a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1667,14 +1667,12 @@ config ILLEGAL_POINTER_VALUE
default 0 if X86_32
default 0xdead000000000000 if X86_64
-config X86_PMEM_LEGACY_DEVICE
- bool
-
config X86_PMEM_LEGACY
tristate "Support non-standard NVDIMMs and ADR protected memory"
depends on PHYS_ADDR_T_64BIT
depends on BLK_DEV
- select X86_PMEM_LEGACY_DEVICE
+ select PMEM_LEGACY
+ select PMEM_LEGACY_DEVICE
select NUMA_KEEP_MEMINFO if NUMA
select LIBNVDIMM
help
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index f0606f8..1e12751 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -130,7 +130,6 @@ obj-$(CONFIG_KVM_GUEST) += kvm.o kvmclock.o
obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch.o
obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o
-obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
obj-$(CONFIG_JAILHOUSE_GUEST) += jailhouse.o
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index b7d1eb3..632b6ed 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -19,6 +19,12 @@ menuconfig LIBNVDIMM
if LIBNVDIMM
+config PMEM_LEGACY
+ tristate
+
+config PMEM_LEGACY_DEVICE
+ bool
+
config BLK_DEV_PMEM
tristate "PMEM: Persistent memory block device support"
default LIBNVDIMM
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 0407753..2098221 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -3,6 +3,8 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
+obj-$(CONFIG_PMEM_LEGACY_DEVICE) += pmem_legacy_device.o
+obj-$(CONFIG_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_OF_PMEM) += of_pmem.o
obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
diff --git a/arch/x86/kernel/pmem.c b/drivers/nvdimm/pmem_legacy_device.c
similarity index 100%
rename from arch/x86/kernel/pmem.c
rename to drivers/nvdimm/pmem_legacy_device.c
diff --git a/tools/testing/nvdimm/Kbuild b/tools/testing/nvdimm/Kbuild
index 47f9cc9..77aa117 100644
--- a/tools/testing/nvdimm/Kbuild
+++ b/tools/testing/nvdimm/Kbuild
@@ -28,7 +28,7 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
-obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
+obj-$(CONFIG_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_ACPI_NFIT) += nfit.o
ifeq ($(CONFIG_DAX),m)
obj-$(CONFIG_DAX) += dax.o
--
2.9.5
openEuler inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OFDI
CVE: NA
--------------------------------
Chao Liu (2):
change arm64 configs
change x86 configs
arch/arm64/configs/openeuler_defconfig | 272 ++++++++-----
arch/x86/configs/openeuler_defconfig | 540 +++++++++----------------
2 files changed, 356 insertions(+), 456 deletions(-)
--
2.27.0

[PATCH openEuler-1.0-LTS V2] ipmi_si: Phytium S2500 workaround for MMIO-based IPMI
by Laibin Qiu 22 Jan '22
phytium inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4RK58
CVE: NA
--------------------------------
The system would hang up when the Phytium S2500 communicates with
some BMCs after several rounds of transactions, unless we reset
the controller timeout counter manually by calling firmware through
SMC.
Signed-off-by: Wang Yinfeng <wangyinfeng(a)phytium.com.cn>
Signed-off-by: Chen Baozi <chenbaozi(a)phytium.com.cn> #openEuler_contributor
Signed-off-by: Laibin Qiu <qiulaibin(a)huawei.com>
---
drivers/char/ipmi/ipmi_si_mem_io.c | 72 ++++++++++++++++++++++++++++++
1 file changed, 72 insertions(+)
diff --git a/drivers/char/ipmi/ipmi_si_mem_io.c b/drivers/char/ipmi/ipmi_si_mem_io.c
index 75583612ab10..9403e35a1993 100644
--- a/drivers/char/ipmi/ipmi_si_mem_io.c
+++ b/drivers/char/ipmi/ipmi_si_mem_io.c
@@ -3,9 +3,75 @@
#include <linux/io.h>
#include "ipmi_si.h"
+#ifdef CONFIG_ARM_GIC_PHYTIUM_2500
+#include <linux/arm-smccc.h>
+
+#define CTL_RST_FUNC_ID 0xC2000011
+
+static bool apply_phytium2500_workaround;
+
+struct ipmi_workaround_oem_info {
+ char oem_id[ACPI_OEM_ID_SIZE + 1];
+};
+
+static struct ipmi_workaround_oem_info wa_info[] = {
+ {
+ .oem_id = "KPSVVJ",
+ }
+};
+
+static void ipmi_check_phytium_workaround(void)
+{
+#ifdef CONFIG_ACPI
+ struct acpi_table_header tbl;
+ int i;
+
+ if (ACPI_FAILURE(acpi_get_table_header(ACPI_SIG_DSDT, 0, &tbl)))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(wa_info); i++) {
+ if (strncmp(wa_info[i].oem_id, tbl.oem_id, ACPI_OEM_ID_SIZE))
+ continue;
+
+ apply_phytium2500_workaround = true;
+ break;
+ }
+#endif
+}
+
+static void ctl_smc(unsigned long arg0, unsigned long arg1,
+ unsigned long arg2, unsigned long arg3)
+{
+ struct arm_smccc_res res;
+
+ arm_smccc_smc(arg0, arg1, arg2, arg3, 0, 0, 0, 0, &res);
+ if (res.a0 != 0)
+ pr_err("Error: Firmware call SMC reset Failed: %d, addr: 0x%lx\n",
+ (int)res.a0, arg2);
+}
+
+static void ctl_timeout_reset(void)
+{
+ ctl_smc(CTL_RST_FUNC_ID, 0x1, 0x28100208, 0x1);
+ ctl_smc(CTL_RST_FUNC_ID, 0x1, 0x2810020C, 0x1);
+}
+
+static inline void ipmi_phytium_workaround(void)
+{
+ if (apply_phytium2500_workaround)
+ ctl_timeout_reset();
+}
+
+#else
+static inline void ipmi_check_phytium_workaround(void) {}
+static inline void ipmi_phytium_workaround(void) {}
+#endif
+
static unsigned char intf_mem_inb(const struct si_sm_io *io,
unsigned int offset)
{
+ ipmi_phytium_workaround();
+
return readb((io->addr)+(offset * io->regspacing));
}
@@ -31,6 +97,8 @@ static void intf_mem_outw(const struct si_sm_io *io, unsigned int offset,
static unsigned char intf_mem_inl(const struct si_sm_io *io,
unsigned int offset)
{
+ ipmi_phytium_workaround();
+
return (readl((io->addr)+(offset * io->regspacing)) >> io->regshift)
& 0xff;
}
@@ -44,6 +112,8 @@ static void intf_mem_outl(const struct si_sm_io *io, unsigned int offset,
#ifdef readq
static unsigned char mem_inq(const struct si_sm_io *io, unsigned int offset)
{
+ ipmi_phytium_workaround();
+
return (readq((io->addr)+(offset * io->regspacing)) >> io->regshift)
& 0xff;
}
@@ -81,6 +151,8 @@ int ipmi_si_mem_setup(struct si_sm_io *io)
if (!addr)
return -ENODEV;
+ ipmi_check_phytium_workaround();
+
/*
* Figure out the actual readb/readw/readl/etc routine to use based
* upon the register size.
--
2.22.0

21 Jan '22
On 2022/1/21 18:06, Chen Baozi wrote:
> phytium inclusion
> category: bugfix
> bugzilla: https://gitee.com/openeuler/kernel/issues/I4RK58
> CVE: NA
>
> ---------------------------------
>
> The system would hang up when the Phytium S2500 communicates with
> some BMCs after several rounds of transactions, unless we reset
> the controller timeout counter manually by calling firmware through
> SMC.
>
> Signed-off-by: Wang Yinfeng <wangyinfeng(a)phytium.com.cn>
> Signed-off-by: Chen Baozi <chenbaozi(a)phytium.com.cn>
Looks good to me,
Reviewed-by: Xie XiuQi <xiexiuqi(a)huawei.com>
> ---
> drivers/char/ipmi/ipmi_si_mem_io.c | 74 ++++++++++++++++++++++++++++++
> 1 file changed, 74 insertions(+)
>
> diff --git a/drivers/char/ipmi/ipmi_si_mem_io.c b/drivers/char/ipmi/ipmi_si_mem_io.c
> index 75583612ab10..bb4ed90a1153 100644
> --- a/drivers/char/ipmi/ipmi_si_mem_io.c
> +++ b/drivers/char/ipmi/ipmi_si_mem_io.c
> @@ -3,9 +3,75 @@
> #include <linux/io.h>
> #include "ipmi_si.h"
>
> +#ifdef CONFIG_ARM_GIC_PHYTIUM_2500
> +#include <linux/arm-smccc.h>
> +
> +#define CTL_RST_FUNC_ID 0xC2000011
> +
> +static bool apply_phytium2500_workaround;
> +
> +struct ipmi_workaround_oem_info {
> + char oem_id[ACPI_OEM_ID_SIZE + 1];
> +};
> +
> +static struct ipmi_workaround_oem_info wa_info[] = {
> + {
> + .oem_id = "KPSVVJ",
> + }
> +};
> +
> +static void ipmi_check_phytium_workaround(void)
> +{
> +#ifdef CONFIG_ACPI
> + struct acpi_table_header tbl;
> + int i;
> +
> + if (ACPI_FAILURE(acpi_get_table_header(ACPI_SIG_DSDT, 0, &tbl)))
> + return;
> +
> + for (i = 0; i < ARRAY_SIZE(wa_info); i++) {
> + if (strncmp(wa_info[i].oem_id, tbl.oem_id, ACPI_OEM_ID_SIZE))
> + continue;
> +
> + apply_phytium2500_workaround = true;
> + break;
> + }
> +#endif
> +}
> +
> +static void ctl_smc(unsigned long arg0, unsigned long arg1,
> + unsigned long arg2, unsigned long arg3)
> +{
> + struct arm_smccc_res res;
> +
> + arm_smccc_smc(arg0, arg1, arg2, arg3, 0, 0, 0, 0, &res);
> + if (res.a0 != 0)
> + pr_err("Error: Firmware call SMC reset Failed: %d, addr: 0x%lx\n",
> + (int)res.a0, arg2);
> +}
> +
> +static void ctl_timeout_reset(void)
> +{
> + ctl_smc(CTL_RST_FUNC_ID, 0x1, 0x28100208, 0x1);
> + ctl_smc(CTL_RST_FUNC_ID, 0x1, 0x2810020C, 0x1);
> +}
> +
> +static inline void ipmi_phytium_workaround(void)
> +{
> + if (apply_phytium2500_workaround)
> + ctl_timeout_reset();
> +}
> +
> +#else
> +static inline void ipmi_check_phytium_workaround(void) {}
> +static inline void ipmi_phytium_workaround(void) {}
> +#endif
> +
> static unsigned char intf_mem_inb(const struct si_sm_io *io,
> unsigned int offset)
> {
> + ipmi_phytium_workaround();
> +
> return readb((io->addr)+(offset * io->regspacing));
> }
>
> @@ -18,6 +84,8 @@ static void intf_mem_outb(const struct si_sm_io *io, unsigned int offset,
> static unsigned char intf_mem_inw(const struct si_sm_io *io,
> unsigned int offset)
> {
> + ipmi_phytium_workaround();
> +
> return (readw((io->addr)+(offset * io->regspacing)) >> io->regshift)
> & 0xff;
> }
> @@ -31,6 +99,8 @@ static void intf_mem_outw(const struct si_sm_io *io, unsigned int offset,
> static unsigned char intf_mem_inl(const struct si_sm_io *io,
> unsigned int offset)
> {
> + ipmi_phytium_workaround();
> +
> return (readl((io->addr)+(offset * io->regspacing)) >> io->regshift)
> & 0xff;
> }
> @@ -44,6 +114,8 @@ static void intf_mem_outl(const struct si_sm_io *io, unsigned int offset,
> #ifdef readq
> static unsigned char mem_inq(const struct si_sm_io *io, unsigned int offset)
> {
> + ipmi_phytium_workaround();
> +
> return (readq((io->addr)+(offset * io->regspacing)) >> io->regshift)
> & 0xff;
> }
> @@ -81,6 +153,8 @@ int ipmi_si_mem_setup(struct si_sm_io *io)
> if (!addr)
> return -ENODEV;
>
> + ipmi_check_phytium_workaround();
> +
> /*
> * Figure out the actual readb/readw/readl/etc routine to use based
> * upon the register size.
>

21 Jan '22
Hi,
Thanks for your patch. There are some comments below.
On 2022/1/19 14:57, Chen Baozi wrote:
> The system would hang up when the Phytium S2500 communicates with
> some BMCs after several rounds of transactions, unless we reset
> the controller timeout counter manually by calling firmware through
> SMC.
>
> Signed-off-by: Wang Yinfeng <wangyinfeng(a)phytium.com.cn>
> Signed-off-by: Chen Baozi <chenbaozi(a)phytium.com.cn>
> ---
> drivers/char/ipmi/ipmi_si_mem_io.c | 117 ++++++++++++++++++++++++++++-
> 1 file changed, 113 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/char/ipmi/ipmi_si_mem_io.c b/drivers/char/ipmi/ipmi_si_mem_io.c
> index 75583612ab10..73a3d6979eaa 100644
> --- a/drivers/char/ipmi/ipmi_si_mem_io.c
> +++ b/drivers/char/ipmi/ipmi_si_mem_io.c
> @@ -3,6 +3,105 @@
> #include <linux/io.h>
> #include "ipmi_si.h"
>
> +#ifdef CONFIG_ARM_GIC_PHYTIUM_2500
> +#include <linux/arm-smccc.h>
> +
> +#define CTL_RST_FUNC_ID 0xC2000011
> +
> +static void ctl_smc(unsigned long arg0, unsigned long arg1,
> + unsigned long arg2, unsigned long arg3)
> +{
> + struct arm_smccc_res res;
> +
> + arm_smccc_smc(arg0, arg1, arg2, arg3, 0, 0, 0, 0, &res);
> + if (res.a0 != 0)
> + pr_err("Error: Firmware call SMC reset Failed: %d, addr: 0x%lx\n",
> + (int)res.a0, arg2);
> +}
> +
> +static void ctl_timeout_reset(void)
> +{
> + ctl_smc(CTL_RST_FUNC_ID, 0x1, 0x28100208, 0x1);
> + ctl_smc(CTL_RST_FUNC_ID, 0x1, 0x2810020C, 0x1);
> +}
> +
> +static unsigned char intf_mem_inb_2500(const struct si_sm_io *io,
> + unsigned int offset)
> +{
> + ctl_timeout_reset();
> +
> + return readb((io->addr)+(offset * io->regspacing));
This duplicates the code of intf_mem_inb(); the same applies below.
> +}
> +
> +static unsigned char intf_mem_inw_2500(const struct si_sm_io *io,
> + unsigned int offset)
> +{
> + ctl_timeout_reset();
> +
> + return (readw((io->addr)+(offset * io->regspacing)) >> io->regshift)
> + & 0xff;
> +}
> +
> +static unsigned char intf_mem_inl_2500(const struct si_sm_io *io,
> + unsigned int offset)
> +{
> + ctl_timeout_reset();
> +
> + return (readl((io->addr)+(offset * io->regspacing)) >> io->regshift)
> + & 0xff;
> +}
> +
> +static unsigned char mem_inq_2500(const struct si_sm_io *io,
> + unsigned int offset)
> +{
> + ctl_timeout_reset();
> +
> + return (readq((io->addr)+(offset * io->regspacing)) >> io->regshift)
> + & 0xff;
> +}
> +#else
> +#define intf_mem_inb_2500 intf_mem_inb
> +#define intf_mem_inw_2500 intf_mem_inw
> +#define intf_mem_inl_2500 intf_mem_inl
> +#define mem_inq_2500 mem_inq
> +#endif
> +
> +static bool apply_phytium2500_workaround;
This variable exists globally, but it is really only meaningful on the Phytium 2500 platform.
> +
> +#ifdef CONFIG_ACPI
> +struct ipmi_workaround_oem_info {
> + char oem_id[ACPI_OEM_ID_SIZE + 1];
> +};
> +
> +static struct ipmi_workaround_oem_info wa_info[] = {
> + {
> + .oem_id = "KPSVVJ",
> + }
> +};
> +
> +static void ipmi_check_phytium_workaround(void)
> +{
> + struct acpi_table_header tbl;
> + int i;
> +
> + if (ACPI_FAILURE(acpi_get_table_header(ACPI_SIG_DSDT, 0, &tbl)))
> + return;
> +
> + for (i = 0; i < ARRAY_SIZE(wa_info); i++) {
> + if (strncmp(wa_info[i].oem_id, tbl.oem_id, ACPI_OEM_ID_SIZE))
> + continue;
> +
> + apply_phytium2500_workaround = true;
> + break;
> + }
> +}
This CONFIG_ACPI block has no dependency on PHYTIUM, so these unnecessary checks are still performed even on non-Phytium platforms.
Moreover, on non-ARM64 architectures apply_phytium2500_workaround still exists globally.
Wouldn't something like the following be better:
#ifdef CONFIG_ARM_GIC_PHYTIUM_2500
static bool apply_phytium2500_workaround;
static void ipmi_check_phytium_workaround(void)
{
#ifdef CONFIG_ACPI
...
#endif
}
...
static inline void ipmi_phytium2500_workaround(void)
{
if (apply_phytium2500_workaround)
ctl_timeout_reset();
}
static unsigned char intf_mem_inb(const struct si_sm_io *io,
unsigned int offset)
{
+ ipmi_phytium2500_workaround();
return readb((io->addr)+(offset * io->regspacing));
}
static unsigned char intf_mem_inw(const struct si_sm_io *io,
unsigned int offset)
{
+ ipmi_phytium2500_workaround();
return (readw((io->addr)+(offset * io->regspacing)) >> io->regshift)
& 0xff;
}
...
#else
static inline void ipmi_phytium2500_workaround() {}
static inline void ipmi_check_phytium_workaround(void) {}
#endif
> +#else
> +static void ipmi_check_phytium_workaround(void)
> +{
> + apply_phytium2500_workaround = false;
> +}
> +#endif
> +
> static unsigned char intf_mem_inb(const struct si_sm_io *io,
> unsigned int offset)
> {
> @@ -81,26 +180,36 @@ int ipmi_si_mem_setup(struct si_sm_io *io)
> if (!addr)
> return -ENODEV;
>
> + ipmi_check_phytium_workaround();
> +
> /*
> * Figure out the actual readb/readw/readl/etc routine to use based
> * upon the register size.
> */
> switch (io->regsize) {
> case 1:
> - io->inputb = intf_mem_inb;
> + io->inputb = apply_phytium2500_workaround ?
> + intf_mem_inb_2500 :
> + intf_mem_inb;
> io->outputb = intf_mem_outb;
> break;
> case 2:
> - io->inputb = intf_mem_inw;
> + io->inputb = apply_phytium2500_workaround ?
> + intf_mem_inw_2500 :
> + intf_mem_inw;
> io->outputb = intf_mem_outw;
> break;
> case 4:
> - io->inputb = intf_mem_inl;
> + io->inputb = apply_phytium2500_workaround ?
> + intf_mem_inl_2500 :
> + intf_mem_inl;
> io->outputb = intf_mem_outl;
> break;
> #ifdef readq
> case 8:
> - io->inputb = mem_inq;
> + io->inputb = apply_phytium2500_workaround ?
> + mem_inq_2500 :
> + mem_inq;
> io->outputb = mem_outq;
> break;
> #endif
>

[PATCH OLK-5.10 v6 1/3] x86: pmem: move persistent memory(legacy) code into nvdimm
by Zhuling 21 Jan '22
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O31I
Move x86's pmem.c into nvdimm, and rename X86_PMEM_LEGACY_DEVICE to
PMEM_LEGACY_DEVICE; also add PMEM_LEGACY to control the build of
nd_e820.o, so the code can be reused by other architectures.
Note: this patch fixes the nd_e820.c build breakage introduced by commit 2499317e408e
("arm64: Revert feature: Add memmap parameter and register pmem").
Signed-off-by: Zhuling <zhuling8(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
---
arch/x86/Kconfig | 6 ++----
arch/x86/kernel/Makefile | 1 -
drivers/nvdimm/Kconfig | 6 ++++++
drivers/nvdimm/Makefile | 2 ++
arch/x86/kernel/pmem.c => drivers/nvdimm/pmem_legacy_device.c | 0
tools/testing/nvdimm/Kbuild | 2 +-
6 files changed, 11 insertions(+), 6 deletions(-)
rename arch/x86/kernel/pmem.c => drivers/nvdimm/pmem_legacy_device.c (100%)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c77ef59..1d3176a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1667,14 +1667,12 @@ config ILLEGAL_POINTER_VALUE
default 0 if X86_32
default 0xdead000000000000 if X86_64
-config X86_PMEM_LEGACY_DEVICE
- bool
-
config X86_PMEM_LEGACY
tristate "Support non-standard NVDIMMs and ADR protected memory"
depends on PHYS_ADDR_T_64BIT
depends on BLK_DEV
- select X86_PMEM_LEGACY_DEVICE
+ select PMEM_LEGACY
+ select PMEM_LEGACY_DEVICE
select NUMA_KEEP_MEMINFO if NUMA
select LIBNVDIMM
help
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index f0606f8..1e12751 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -130,7 +130,6 @@ obj-$(CONFIG_KVM_GUEST) += kvm.o kvmclock.o
obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch.o
obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o
-obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
obj-$(CONFIG_JAILHOUSE_GUEST) += jailhouse.o
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index b7d1eb3..632b6ed 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -19,6 +19,12 @@ menuconfig LIBNVDIMM
if LIBNVDIMM
+config PMEM_LEGACY
+ tristate
+
+config PMEM_LEGACY_DEVICE
+ bool
+
config BLK_DEV_PMEM
tristate "PMEM: Persistent memory block device support"
default LIBNVDIMM
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 0407753..2098221 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -3,6 +3,8 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
+obj-$(CONFIG_PMEM_LEGACY_DEVICE) += pmem_legacy_device.o
+obj-$(CONFIG_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_OF_PMEM) += of_pmem.o
obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
diff --git a/arch/x86/kernel/pmem.c b/drivers/nvdimm/pmem_legacy_device.c
similarity index 100%
rename from arch/x86/kernel/pmem.c
rename to drivers/nvdimm/pmem_legacy_device.c
diff --git a/tools/testing/nvdimm/Kbuild b/tools/testing/nvdimm/Kbuild
index 47f9cc9..77aa117 100644
--- a/tools/testing/nvdimm/Kbuild
+++ b/tools/testing/nvdimm/Kbuild
@@ -28,7 +28,7 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
-obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
+obj-$(CONFIG_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_ACPI_NFIT) += nfit.o
ifeq ($(CONFIG_DAX),m)
obj-$(CONFIG_DAX) += dax.o
--
2.9.5

Re: [PATCH OLK-5.10 v5 1/3] x86: pmem: move persistent memory(legacy) code into nvdimm
by zhuling 21 Jan '22
On 2022/1/21 14:25, Kefeng Wang wrote:
> Move x86's pmem.c into nvdimm, and rename it pmem_legacy_device.c,
> also rename X86_PMEM_LEGACY_DEVICE to PMEM_LEGACY_DEVICE.
> Meanwhile, add PMEM_LEGACY to control the build of nd_e820.o, so
> the code can be reused by other architectures.
>
> Note: this patch fixes the nd_e820.c build breakage introduced by commit 2499317e408e
> ("arm64: Revert feature: Add memmap parameter and register pmem").
>
> Please revise the changelog;
OK, I will update it.
>
> On 2022/1/21 11:20, Zhuling wrote:
>> hulk inclusion
>> category: feature
>> bugzilla: https://gitee.com/openeuler/kernel/issues/I4O31I
>>
>> Move x86's pmem.c into nvdimm, and rename X86_PMEM_LEGACY_DEVICE to
>> PMEM_LEGACY_DEVICE; also add PMEM_LEGACY to control the build of
>> nd_e820.o, so the code can be reused by other architectures.
>>
>> Signed-off-by: Zhuling <zhuling8(a)huawei.com>
>> ---
>> arch/x86/Kconfig | 6 ++----
>> arch/x86/kernel/Makefile | 1 -
>> drivers/nvdimm/Kconfig | 6 ++++++
>> drivers/nvdimm/Makefile | 2 ++
>> arch/x86/kernel/pmem.c => drivers/nvdimm/pmem_legacy_device.c | 0
>> tools/testing/nvdimm/Kbuild | 2 +-
>> 6 files changed, 11 insertions(+), 6 deletions(-)
>> rename arch/x86/kernel/pmem.c => drivers/nvdimm/pmem_legacy_device.c (100%)
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index c77ef59..1d3176a 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1667,14 +1667,12 @@ config ILLEGAL_POINTER_VALUE
>> default 0 if X86_32
>> default 0xdead000000000000 if X86_64
>> -config X86_PMEM_LEGACY_DEVICE
>> - bool
>> -
>> config X86_PMEM_LEGACY
>> tristate "Support non-standard NVDIMMs and ADR protected memory"
>> depends on PHYS_ADDR_T_64BIT
>> depends on BLK_DEV
>> - select X86_PMEM_LEGACY_DEVICE
>> + select PMEM_LEGACY
>> + select PMEM_LEGACY_DEVICE
>> select NUMA_KEEP_MEMINFO if NUMA
>> select LIBNVDIMM
>> help
>> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
>> index f0606f8..1e12751 100644
>> --- a/arch/x86/kernel/Makefile
>> +++ b/arch/x86/kernel/Makefile
>> @@ -130,7 +130,6 @@ obj-$(CONFIG_KVM_GUEST) += kvm.o kvmclock.o
>> obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch.o
>> obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
>> obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o
>> -obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
>> obj-$(CONFIG_JAILHOUSE_GUEST) += jailhouse.o
>> diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
>> index b7d1eb3..632b6ed 100644
>> --- a/drivers/nvdimm/Kconfig
>> +++ b/drivers/nvdimm/Kconfig
>> @@ -19,6 +19,12 @@ menuconfig LIBNVDIMM
>> if LIBNVDIMM
>> +config PMEM_LEGACY
>> + tristate
>> +
>> +config PMEM_LEGACY_DEVICE
>> + bool
>> +
>> config BLK_DEV_PMEM
>> tristate "PMEM: Persistent memory block device support"
>> default LIBNVDIMM
>> diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
>> index 0407753..2098221 100644
>> --- a/drivers/nvdimm/Makefile
>> +++ b/drivers/nvdimm/Makefile
>> @@ -3,6 +3,8 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
>> obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
>> obj-$(CONFIG_ND_BTT) += nd_btt.o
>> obj-$(CONFIG_ND_BLK) += nd_blk.o
>> +obj-$(CONFIG_PMEM_LEGACY_DEVICE) += pmem_legacy_device.o
>> +obj-$(CONFIG_PMEM_LEGACY) += nd_e820.o
>> obj-$(CONFIG_OF_PMEM) += of_pmem.o
>> obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
>> diff --git a/arch/x86/kernel/pmem.c b/drivers/nvdimm/pmem_legacy_device.c
>> similarity index 100%
>> rename from arch/x86/kernel/pmem.c
>> rename to drivers/nvdimm/pmem_legacy_device.c
>> diff --git a/tools/testing/nvdimm/Kbuild b/tools/testing/nvdimm/Kbuild
>> index 47f9cc9..77aa117 100644
>> --- a/tools/testing/nvdimm/Kbuild
>> +++ b/tools/testing/nvdimm/Kbuild
>> @@ -28,7 +28,7 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
>> obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
>> obj-$(CONFIG_ND_BTT) += nd_btt.o
>> obj-$(CONFIG_ND_BLK) += nd_blk.o
>> -obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
>> +obj-$(CONFIG_PMEM_LEGACY) += nd_e820.o
>> obj-$(CONFIG_ACPI_NFIT) += nfit.o
>> ifeq ($(CONFIG_DAX),m)
>> obj-$(CONFIG_DAX) += dax.o
> .

[PATCH OLK-5.10 v5 1/3] x86: pmem: move persistent memory(legacy) code into nvdimm
by Zhuling 21 Jan '22
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O31I
Move x86's pmem.c into nvdimm, and rename X86_PMEM_LEGACY_DEVICE to
PMEM_LEGACY_DEVICE; also add PMEM_LEGACY to control the build of
nd_e820.o, so the code can be reused by other architectures.
Signed-off-by: Zhuling <zhuling8(a)huawei.com>
---
arch/x86/Kconfig | 6 ++----
arch/x86/kernel/Makefile | 1 -
drivers/nvdimm/Kconfig | 6 ++++++
drivers/nvdimm/Makefile | 2 ++
arch/x86/kernel/pmem.c => drivers/nvdimm/pmem_legacy_device.c | 0
tools/testing/nvdimm/Kbuild | 2 +-
6 files changed, 11 insertions(+), 6 deletions(-)
rename arch/x86/kernel/pmem.c => drivers/nvdimm/pmem_legacy_device.c (100%)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c77ef59..1d3176a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1667,14 +1667,12 @@ config ILLEGAL_POINTER_VALUE
default 0 if X86_32
default 0xdead000000000000 if X86_64
-config X86_PMEM_LEGACY_DEVICE
- bool
-
config X86_PMEM_LEGACY
tristate "Support non-standard NVDIMMs and ADR protected memory"
depends on PHYS_ADDR_T_64BIT
depends on BLK_DEV
- select X86_PMEM_LEGACY_DEVICE
+ select PMEM_LEGACY
+ select PMEM_LEGACY_DEVICE
select NUMA_KEEP_MEMINFO if NUMA
select LIBNVDIMM
help
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index f0606f8..1e12751 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -130,7 +130,6 @@ obj-$(CONFIG_KVM_GUEST) += kvm.o kvmclock.o
obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch.o
obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o
-obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
obj-$(CONFIG_JAILHOUSE_GUEST) += jailhouse.o
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index b7d1eb3..632b6ed 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -19,6 +19,12 @@ menuconfig LIBNVDIMM
if LIBNVDIMM
+config PMEM_LEGACY
+ tristate
+
+config PMEM_LEGACY_DEVICE
+ bool
+
config BLK_DEV_PMEM
tristate "PMEM: Persistent memory block device support"
default LIBNVDIMM
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 0407753..2098221 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -3,6 +3,8 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
+obj-$(CONFIG_PMEM_LEGACY_DEVICE) += pmem_legacy_device.o
+obj-$(CONFIG_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_OF_PMEM) += of_pmem.o
obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
diff --git a/arch/x86/kernel/pmem.c b/drivers/nvdimm/pmem_legacy_device.c
similarity index 100%
rename from arch/x86/kernel/pmem.c
rename to drivers/nvdimm/pmem_legacy_device.c
diff --git a/tools/testing/nvdimm/Kbuild b/tools/testing/nvdimm/Kbuild
index 47f9cc9..77aa117 100644
--- a/tools/testing/nvdimm/Kbuild
+++ b/tools/testing/nvdimm/Kbuild
@@ -28,7 +28,7 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
-obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
+obj-$(CONFIG_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_ACPI_NFIT) += nfit.o
ifeq ($(CONFIG_DAX),m)
obj-$(CONFIG_DAX) += dax.o
--
2.9.5

[PATCH openEuler-1.0-LTS] ipmi_si: Phytium S2500 workaround for MMIO-based IPMI
by Laibin Qiu 20 Jan '22
phytium inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4RK58
CVE: NA
--------------------------------
The system would hang up when the Phytium S2500 communicates with
some BMCs after several rounds of transactions, unless we reset
the controller timeout counter manually by calling firmware through
SMC.
Signed-off-by: Wang Yinfeng <wangyinfeng(a)phytium.com.cn>
Signed-off-by: Chen Baozi <chenbaozi(a)phytium.com.cn> #openEuler_contributor
Signed-off-by: Laibin Qiu <qiulaibin(a)huawei.com>
---
drivers/char/ipmi/ipmi_si_mem_io.c | 117 ++++++++++++++++++++++++++++-
1 file changed, 113 insertions(+), 4 deletions(-)
diff --git a/drivers/char/ipmi/ipmi_si_mem_io.c b/drivers/char/ipmi/ipmi_si_mem_io.c
index 75583612ab10..73a3d6979eaa 100644
--- a/drivers/char/ipmi/ipmi_si_mem_io.c
+++ b/drivers/char/ipmi/ipmi_si_mem_io.c
@@ -3,6 +3,105 @@
#include <linux/io.h>
#include "ipmi_si.h"
+#ifdef CONFIG_ARM_GIC_PHYTIUM_2500
+#include <linux/arm-smccc.h>
+
+#define CTL_RST_FUNC_ID 0xC2000011
+
+static void ctl_smc(unsigned long arg0, unsigned long arg1,
+ unsigned long arg2, unsigned long arg3)
+{
+ struct arm_smccc_res res;
+
+ arm_smccc_smc(arg0, arg1, arg2, arg3, 0, 0, 0, 0, &res);
+ if (res.a0 != 0)
+ pr_err("Error: Firmware call SMC reset Failed: %d, addr: 0x%lx\n",
+ (int)res.a0, arg2);
+}
+
+static void ctl_timeout_reset(void)
+{
+ ctl_smc(CTL_RST_FUNC_ID, 0x1, 0x28100208, 0x1);
+ ctl_smc(CTL_RST_FUNC_ID, 0x1, 0x2810020C, 0x1);
+}
+
+static unsigned char intf_mem_inb_2500(const struct si_sm_io *io,
+ unsigned int offset)
+{
+ ctl_timeout_reset();
+
+ return readb((io->addr)+(offset * io->regspacing));
+}
+
+static unsigned char intf_mem_inw_2500(const struct si_sm_io *io,
+ unsigned int offset)
+{
+ ctl_timeout_reset();
+
+ return (readw((io->addr)+(offset * io->regspacing)) >> io->regshift)
+ & 0xff;
+}
+
+static unsigned char intf_mem_inl_2500(const struct si_sm_io *io,
+ unsigned int offset)
+{
+ ctl_timeout_reset();
+
+ return (readl((io->addr)+(offset * io->regspacing)) >> io->regshift)
+ & 0xff;
+}
+
+static unsigned char mem_inq_2500(const struct si_sm_io *io,
+ unsigned int offset)
+{
+ ctl_timeout_reset();
+
+ return (readq((io->addr)+(offset * io->regspacing)) >> io->regshift)
+ & 0xff;
+}
+#else
+#define intf_mem_inb_2500 intf_mem_inb
+#define intf_mem_inw_2500 intf_mem_inw
+#define intf_mem_inl_2500 intf_mem_inl
+#define mem_inq_2500 mem_inq
+#endif
+
+static bool apply_phytium2500_workaround;
+
+#ifdef CONFIG_ACPI
+struct ipmi_workaround_oem_info {
+ char oem_id[ACPI_OEM_ID_SIZE + 1];
+};
+
+static struct ipmi_workaround_oem_info wa_info[] = {
+ {
+ .oem_id = "KPSVVJ",
+ }
+};
+
+static void ipmi_check_phytium_workaround(void)
+{
+ struct acpi_table_header tbl;
+ int i;
+
+ if (ACPI_FAILURE(acpi_get_table_header(ACPI_SIG_DSDT, 0, &tbl)))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(wa_info); i++) {
+ if (strncmp(wa_info[i].oem_id, tbl.oem_id, ACPI_OEM_ID_SIZE))
+ continue;
+
+ apply_phytium2500_workaround = true;
+ break;
+ }
+}
+#else
+static void ipmi_check_phytium_workaround(void)
+{
+ apply_phytium2500_workaround = false;
+}
+#endif
+
static unsigned char intf_mem_inb(const struct si_sm_io *io,
unsigned int offset)
{
@@ -81,26 +180,36 @@ int ipmi_si_mem_setup(struct si_sm_io *io)
if (!addr)
return -ENODEV;
+ ipmi_check_phytium_workaround();
+
/*
* Figure out the actual readb/readw/readl/etc routine to use based
* upon the register size.
*/
switch (io->regsize) {
case 1:
- io->inputb = intf_mem_inb;
+ io->inputb = apply_phytium2500_workaround ?
+ intf_mem_inb_2500 :
+ intf_mem_inb;
io->outputb = intf_mem_outb;
break;
case 2:
- io->inputb = intf_mem_inw;
+ io->inputb = apply_phytium2500_workaround ?
+ intf_mem_inw_2500 :
+ intf_mem_inw;
io->outputb = intf_mem_outw;
break;
case 4:
- io->inputb = intf_mem_inl;
+ io->inputb = apply_phytium2500_workaround ?
+ intf_mem_inl_2500 :
+ intf_mem_inl;
io->outputb = intf_mem_outl;
break;
#ifdef readq
case 8:
- io->inputb = mem_inq;
+ io->inputb = apply_phytium2500_workaround ?
+ mem_inq_2500 :
+ mem_inq;
io->outputb = mem_outq;
break;
#endif
--
2.22.0

20 Jan '22
From: Smita Koralahalli <Smita.KoralahalliChannabasappa(a)amd.com>
mainline inclusion
from mainline-v5.13-rc1
commit da66658638c947cab0fb157289f03698453ff8d5
category: feature
feature: milan cpu
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OBEH?from=project-issue
CVE: NA
--------------------------------
Add PMU events for AMD Zen3 processors as documented in the AMD Processor
Programming Reference for Family 19h and Model 01h [1].

Below are the events that are new on Zen3:

PMCx041 ls_mab_alloc.{all_allocations|hardware_prefetcher_allocations|load_store_allocations}
PMCx043 ls_dmnd_fills_from_sys.ext_cache_local
PMCx044 ls_any_fills_from_sys.{mem_io_remote|ext_cache_remote|mem_io_local|ext_cache_local|int_cache|lcl_l2}
PMCx047 ls_misal_loads.{ma4k|ma64}
PMCx059 ls_sw_pf_dc_fills.ext_cache_local
PMCx05a ls_hw_pf_dc_fills.ext_cache_local
PMCx05f ls_alloc_mab_count
PMCx085 bp_l1_tlb_miss_l2_tlb_miss.coalesced_4k
PMCx0ab de_dis_cops_from_decoder.disp_op_type.{any_integer_dispatch|any_fp_dispatch}
PMCx0cc ex_ret_ind_brch_instr
PMCx18e ic_tag_hit_miss.{all_instruction_cache_accesses|instruction_cache_miss|instruction_cache_hit}
PMCx1c7 ex_ret_msprd_brnch_instr_dir_msmtch
PMCx28f op_cache_hit_miss.{all_op_cache_accesses|op_cache_miss|op_cache_hit}

Section 2.1.17.2 "Performance Measurement" of "PPR for AMD Family 19h,
Model 01h, Revision B1 Processors - 55898 Rev 0.35 - Feb 5, 2021." lists
new metrics. Add them.

Preserve the events for Zen3 if they are measurable and non-zero, as taken
from the Zen2 directory, even if the PPR of Zen3 [1] omits them. Those
events are the following:

PMCx000 fpu_pipe_assignment.{total|total0|total1|total2|total3}
PMCx004 fp_num_mov_elim_scal_op.{optimized|opt_potential|sse_mov_ops_elim|sse_mov_ops}
PMCx02D ls_rdtsc
PMCx040 ls_dc_accesses
PMCx046 ls_tablewalker.{iside|ic_type1|ic_type0|dside|dc_type1|dc_type0}
PMCx061 l2_request_g2.{group1|ls_rd_sized|ls_rd_sized_nc|ic_rd_sized|ic_rd_sized_nc|smc_inval|bus_lock_originator|bus_locks_responses}
PMCx062 l2_latency.l2_cycles_waiting_on_fills
PMCx063 l2_wcb_req.{wcb_write|wcb_close|zero_byte_store|cl_zero}
PMCx06d l2_fill_pending.l2_fill_busy
PMCx080 ic_fw32
PMCx081 ic_fw32_miss
PMCx086 bp_snp_re_sync
PMCx087 ic_fetch_stall.{ic_stall_any|ic_stall_dq_empty|ic_stall_back_pressure}
PMCx08a bp_l1_btb_correct
PMCx08c ic_cache_inval.{l2_invalidating_probe|fill_invalidated}
PMCx099 bp_tlb_rel
PMCx0a9 de_dis_uop_queue_empty_di0
PMCx0c7 ex_ret_brn_resync
PMCx28a ic_oc_mode_switch.{oc_ic_mode_switch|ic_oc_mode_switch}
L3PMCx01 l3_request_g1.caching_l3_cache_accesses
L3PMCx06 l3_comb_clstr_state.{other_l3_miss_typs|request_miss}

[1] Processor Programming Reference (PPR) for AMD Family 19h, Model 01h,
Revision B1 Processors - 55898 Rev 0.35 - Feb 5, 2021.

[2] Processor Programming Reference (PPR) for AMD Family 17h Model 71h,
Revision B0 Processors, 56176 Rev 3.06 - Jul 17, 2019.

[3] Processor Programming Reference (PPR) for AMD Family 17h Models
01h,08h, Revision B2 Processors, 54945 Rev 3.03 - Jun 14, 2019.

All of the PPRs can be found at:
https://bugzilla.kernel.org/show_bug.cgi?id=206537
Reviewed-by: Robert Richter <rrichter(a)amd.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa(a)amd.com>
Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Kim Phillips <kim.phillips(a)amd.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Martin Liška <mliska(a)suse.cz>
Cc: Michael Petlan <mpetlan(a)redhat.com>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Vijay Thakkar <vijaythakkar(a)me.com>
Cc: linux-perf-users(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210406215944.113332-5-Smita.KoralahalliChannaba…
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Reviewed-by: Yang Jihong <yangjihong1(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
.../pmu-events/arch/x86/amdzen3/branch.json | 53 +++
.../pmu-events/arch/x86/amdzen3/cache.json | 402 ++++++++++++++++
.../pmu-events/arch/x86/amdzen3/core.json | 137 ++++++
.../arch/x86/amdzen3/data-fabric.json | 98 ++++
.../arch/x86/amdzen3/floating-point.json | 139 ++++++
.../pmu-events/arch/x86/amdzen3/memory.json | 428 ++++++++++++++++++
.../pmu-events/arch/x86/amdzen3/other.json | 103 +++++
.../arch/x86/amdzen3/recommended.json | 214 +++++++++
tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +-
9 files changed, 1575 insertions(+), 1 deletion(-)
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/branch.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/core.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/data-fabric.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/floating-point.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/other.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/recommended.json
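
Once merged, these names resolve through perf's generated event tables, e.g.
'perf stat -e ls_misal_loads.ma64 -- <workload>'. Under the hood each entry
is just an event-select/unit-mask pair on the core PMU. Below is a minimal,
self-contained sketch (not from this patch; the event choice and the raw
encoding - unit mask in bits 8-15, event select in bits 0-7 - are taken as
assumptions from the JSON fields that follow) counting one of the new Zen3
events via perf_event_open():

	#include <linux/perf_event.h>
	#include <sys/ioctl.h>
	#include <sys/syscall.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		struct perf_event_attr attr;
		uint64_t count;
		int fd;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_RAW;
		/* ls_misal_loads.ma64: EventCode 0x47, UMask 0x01 */
		attr.config = (0x01ULL << 8) | 0x47;
		attr.disabled = 1;
		attr.exclude_kernel = 1;

		fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
		if (fd < 0) {
			perror("perf_event_open");
			return 1;
		}

		ioctl(fd, PERF_EVENT_IOC_RESET, 0);
		ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
		/* ... run the workload of interest here ... */
		ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

		if (read(fd, &count, sizeof(count)) == sizeof(count))
			printf("64B-crossing loads: %llu\n",
			       (unsigned long long)count);
		close(fd);
		return 0;
	}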
diff --git a/tools/perf/pmu-events/arch/x86/amdzen3/branch.json b/tools/perf/pmu-events/arch/x86/amdzen3/branch.json
new file mode 100644
index 000000000000..018a7fe94fb9
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/amdzen3/branch.json
@@ -0,0 +1,53 @@
+[
+ {
+ "EventName": "bp_l1_btb_correct",
+ "EventCode": "0x8a",
+ "BriefDescription": "L1 Branch Prediction Overrides Existing Prediction (speculative)."
+ },
+ {
+ "EventName": "bp_l2_btb_correct",
+ "EventCode": "0x8b",
+ "BriefDescription": "L2 Branch Prediction Overrides Existing Prediction (speculative)."
+ },
+ {
+ "EventName": "bp_dyn_ind_pred",
+ "EventCode": "0x8e",
+ "BriefDescription": "Dynamic Indirect Predictions.",
+ "PublicDescription": "The number of times a branch used the indirect predictor to make a prediction."
+ },
+ {
+ "EventName": "bp_de_redirect",
+ "EventCode": "0x91",
+ "BriefDescription": "Decode Redirects",
+ "PublicDescription": "The number of times the instruction decoder overrides the predicted target."
+ },
+ {
+ "EventName": "bp_l1_tlb_fetch_hit",
+ "EventCode": "0x94",
+ "BriefDescription": "The number of instruction fetches that hit in the L1 ITLB.",
+ "UMask": "0xff"
+ },
+ {
+ "EventName": "bp_l1_tlb_fetch_hit.if1g",
+ "EventCode": "0x94",
+ "BriefDescription": "The number of instruction fetches that hit in the L1 ITLB. L1 Instruction TLB hit (1G page size).",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "bp_l1_tlb_fetch_hit.if2m",
+ "EventCode": "0x94",
+ "BriefDescription": "The number of instruction fetches that hit in the L1 ITLB. L1 Instruction TLB hit (2M page size).",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "bp_l1_tlb_fetch_hit.if4k",
+ "EventCode": "0x94",
+ "BriefDescription": "The number of instruction fetches that hit in the L1 ITLB. L1 Instrcution TLB hit (4K or 16K page size).",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "bp_tlb_rel",
+ "EventCode": "0x99",
+ "BriefDescription": "The number of ITLB reload requests."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/x86/amdzen3/cache.json b/tools/perf/pmu-events/arch/x86/amdzen3/cache.json
new file mode 100644
index 000000000000..fa1d7499a2e3
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/amdzen3/cache.json
@@ -0,0 +1,402 @@
+[
+ {
+ "EventName": "l2_request_g1.rd_blk_l",
+ "EventCode": "0x60",
+ "BriefDescription": "All L2 Cache Requests (Breakdown 1 - Common). Data cache reads (including hardware and software prefetch).",
+ "UMask": "0x80"
+ },
+ {
+ "EventName": "l2_request_g1.rd_blk_x",
+ "EventCode": "0x60",
+ "BriefDescription": "All L2 Cache Requests (Breakdown 1 - Common). Data cache stores.",
+ "UMask": "0x40"
+ },
+ {
+ "EventName": "l2_request_g1.ls_rd_blk_c_s",
+ "EventCode": "0x60",
+ "BriefDescription": "All L2 Cache Requests (Breakdown 1 - Common). Data cache shared reads.",
+ "UMask": "0x20"
+ },
+ {
+ "EventName": "l2_request_g1.cacheable_ic_read",
+ "EventCode": "0x60",
+ "BriefDescription": "All L2 Cache Requests (Breakdown 1 - Common). Instruction cache reads.",
+ "UMask": "0x10"
+ },
+ {
+ "EventName": "l2_request_g1.change_to_x",
+ "EventCode": "0x60",
+ "BriefDescription": "All L2 Cache Requests (Breakdown 1 - Common). Data cache state change requests. Request change to writable, check L2 for current state.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "l2_request_g1.prefetch_l2_cmd",
+ "EventCode": "0x60",
+ "BriefDescription": "All L2 Cache Requests (Breakdown 1 - Common). PrefetchL2Cmd.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "l2_request_g1.l2_hw_pf",
+ "EventCode": "0x60",
+ "BriefDescription": "All L2 Cache Requests (Breakdown 1 - Common). L2 Prefetcher. All prefetches accepted by L2 pipeline, hit or miss. Types of PF and L2 hit/miss broken out in a separate perfmon event.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "l2_request_g1.group2",
+ "EventCode": "0x60",
+ "BriefDescription": "Miscellaneous events covered in more detail by l2_request_g2 (PMCx061).",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "l2_request_g1.all_no_prefetch",
+ "EventCode": "0x60",
+ "UMask": "0xf9"
+ },
+ {
+ "EventName": "l2_request_g2.group1",
+ "EventCode": "0x61",
+ "BriefDescription": "Miscellaneous events covered in more detail by l2_request_g1 (PMCx060).",
+ "UMask": "0x80"
+ },
+ {
+ "EventName": "l2_request_g2.ls_rd_sized",
+ "EventCode": "0x61",
+ "BriefDescription": "All L2 Cache Requests (Breakdown 2 - Rare). Data cache read sized.",
+ "UMask": "0x40"
+ },
+ {
+ "EventName": "l2_request_g2.ls_rd_sized_nc",
+ "EventCode": "0x61",
+ "BriefDescription": "All L2 Cache Requests (Breakdown 2 - Rare). Data cache read sized non-cacheable.",
+ "UMask": "0x20"
+ },
+ {
+ "EventName": "l2_request_g2.ic_rd_sized",
+ "EventCode": "0x61",
+ "BriefDescription": "All L2 Cache Requests (Breakdown 2 - Rare). Instruction cache read sized.",
+ "UMask": "0x10"
+ },
+ {
+ "EventName": "l2_request_g2.ic_rd_sized_nc",
+ "EventCode": "0x61",
+ "BriefDescription": "All L2 Cache Requests (Breakdown 2 - Rare). Instruction cache read sized non-cacheable.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "l2_request_g2.smc_inval",
+ "EventCode": "0x61",
+ "BriefDescription": "All L2 Cache Requests (Breakdown 2 - Rare). Self-modifying code invalidates.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "l2_request_g2.bus_locks_originator",
+ "EventCode": "0x61",
+ "BriefDescription": "All L2 Cache Requests (Breakdown 2 - Rare). Bus locks.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "l2_request_g2.bus_locks_responses",
+ "EventCode": "0x61",
+ "BriefDescription": "All L2 Cache Requests (Breakdown 2 - Rare). Bus lock response.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "l2_latency.l2_cycles_waiting_on_fills",
+ "EventCode": "0x62",
+ "BriefDescription": "Total cycles spent waiting for L2 fills to complete from L3 or memory, divided by four. Event counts are for both threads. To calculate average latency, the number of fills from both threads must be used.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "l2_wcb_req.wcb_write",
+ "EventCode": "0x63",
+ "BriefDescription": "LS to L2 WCB write requests. LS (Load/Store unit) to L2 WCB (Write Combining Buffer) write requests.",
+ "UMask": "0x40"
+ },
+ {
+ "EventName": "l2_wcb_req.wcb_close",
+ "EventCode": "0x63",
+ "BriefDescription": "LS to L2 WCB close requests. LS (Load/Store unit) to L2 WCB (Write Combining Buffer) close requests.",
+ "UMask": "0x20"
+ },
+ {
+ "EventName": "l2_wcb_req.zero_byte_store",
+ "EventCode": "0x63",
+ "BriefDescription": "LS to L2 WCB zero byte store requests. LS (Load/Store unit) to L2 WCB (Write Combining Buffer) zero byte store requests.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "l2_wcb_req.cl_zero",
+ "EventCode": "0x63",
+ "BriefDescription": "LS to L2 WCB cache line zeroing requests. LS (Load/Store unit) to L2 WCB (Write Combining Buffer) cache line zeroing requests.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "l2_cache_req_stat.ls_rd_blk_cs",
+ "EventCode": "0x64",
+ "BriefDescription": "Core to L2 cacheable request access status (not including L2 Prefetch). Data cache shared read hit in L2",
+ "UMask": "0x80"
+ },
+ {
+ "EventName": "l2_cache_req_stat.ls_rd_blk_l_hit_x",
+ "EventCode": "0x64",
+ "BriefDescription": "Core to L2 cacheable request access status (not including L2 Prefetch). Data cache read hit in L2. Modifiable.",
+ "UMask": "0x40"
+ },
+ {
+ "EventName": "l2_cache_req_stat.ls_rd_blk_l_hit_s",
+ "EventCode": "0x64",
+ "BriefDescription": "Core to L2 cacheable request access status (not including L2 Prefetch). Data cache read hit non-modifiable line in L2.",
+ "UMask": "0x20"
+ },
+ {
+ "EventName": "l2_cache_req_stat.ls_rd_blk_x",
+ "EventCode": "0x64",
+ "BriefDescription": "Core to L2 cacheable request access status (not including L2 Prefetch). Data cache store or state change hit in L2.",
+ "UMask": "0x10"
+ },
+ {
+ "EventName": "l2_cache_req_stat.ls_rd_blk_c",
+ "EventCode": "0x64",
+ "BriefDescription": "Core to L2 cacheable request access status (not including L2 Prefetch). Data cache request miss in L2 (all types). Use l2_cache_misses_from_dc_misses instead.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "l2_cache_req_stat.ic_fill_hit_x",
+ "EventCode": "0x64",
+ "BriefDescription": "Core to L2 cacheable request access status (not including L2 Prefetch). Instruction cache hit modifiable line in L2.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "l2_cache_req_stat.ic_fill_hit_s",
+ "EventCode": "0x64",
+ "BriefDescription": "Core to L2 cacheable request access status (not including L2 Prefetch). Instruction cache hit non-modifiable line in L2.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "l2_cache_req_stat.ic_fill_miss",
+ "EventCode": "0x64",
+ "BriefDescription": "Core to L2 cacheable request access status (not including L2 Prefetch). Instruction cache request miss in L2. Use l2_cache_misses_from_ic_miss instead.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "l2_cache_req_stat.ic_access_in_l2",
+ "EventCode": "0x64",
+ "BriefDescription": "Core to L2 cacheable request access status (not including L2 Prefetch). Instruction cache requests in L2.",
+ "UMask": "0x07"
+ },
+ {
+ "EventName": "l2_cache_req_stat.ic_dc_miss_in_l2",
+ "EventCode": "0x64",
+ "BriefDescription": "Core to L2 cacheable request access status (not including L2 Prefetch). Instruction cache request miss in L2 and Data cache request miss in L2 (all types).",
+ "UMask": "0x09"
+ },
+ {
+ "EventName": "l2_cache_req_stat.ic_dc_hit_in_l2",
+ "EventCode": "0x64",
+ "BriefDescription": "Core to L2 cacheable request access status (not including L2 Prefetch). Instruction cache request hit in L2 and Data cache request hit in L2 (all types).",
+ "UMask": "0xf6"
+ },
+ {
+ "EventName": "l2_fill_pending.l2_fill_busy",
+ "EventCode": "0x6d",
+ "BriefDescription": "Cycles with fill pending from L2. Total cycles spent with one or more fill requests in flight from L2.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "l2_pf_hit_l2",
+ "EventCode": "0x70",
+ "BriefDescription": "L2 prefetch hit in L2. Use l2_cache_hits_from_l2_hwpf instead.",
+ "UMask": "0xff"
+ },
+ {
+ "EventName": "l2_pf_miss_l2_hit_l3",
+ "EventCode": "0x71",
+ "BriefDescription": "L2 prefetcher hits in L3. Counts all L2 prefetches accepted by the L2 pipeline which miss the L2 cache and hit the L3.",
+ "UMask": "0xff"
+ },
+ {
+ "EventName": "l2_pf_miss_l2_l3",
+ "EventCode": "0x72",
+ "BriefDescription": "L2 prefetcher misses in L3. Counts all L2 prefetches accepted by the L2 pipeline which miss the L2 and the L3 caches.",
+ "UMask": "0xff"
+ },
+ {
+ "EventName": "ic_fw32",
+ "EventCode": "0x80",
+ "BriefDescription": "The number of 32B fetch windows transferred from IC pipe to DE instruction decoder (includes non-cacheable and cacheable fill responses)."
+ },
+ {
+ "EventName": "ic_fw32_miss",
+ "EventCode": "0x81",
+ "BriefDescription": "The number of 32B fetch windows tried to read the L1 IC and missed in the full tag."
+ },
+ {
+ "EventName": "ic_cache_fill_l2",
+ "EventCode": "0x82",
+ "BriefDescription": "Instruction Cache Refills from L2. The number of 64 byte instruction cache line was fulfilled from the L2 cache."
+ },
+ {
+ "EventName": "ic_cache_fill_sys",
+ "EventCode": "0x83",
+ "BriefDescription": "Instruction Cache Refills from System. The number of 64 byte instruction cache line fulfilled from system memory or another cache."
+ },
+ {
+ "EventName": "bp_l1_tlb_miss_l2_tlb_hit",
+ "EventCode": "0x84",
+ "BriefDescription": "L1 ITLB Miss, L2 ITLB Hit. The number of instruction fetches that miss in the L1 ITLB but hit in the L2 ITLB."
+ },
+ {
+ "EventName": "bp_l1_tlb_miss_l2_tlb_miss",
+ "EventCode": "0x85",
+ "BriefDescription": "The number of instruction fetches that miss in both the L1 and L2 TLBs.",
+ "UMask": "0xff"
+ },
+ {
+ "EventName": "bp_l1_tlb_miss_l2_tlb_miss.coalesced_4k",
+ "EventCode": "0x85",
+ "BriefDescription": "The number of valid fills into the ITLB originating from the LS Page-Table Walker. Tablewalk requests are issued for L1-ITLB and L2-ITLB misses. Walk for >4K Coalesced page.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "bp_l1_tlb_miss_l2_tlb_miss.if1g",
+ "EventCode": "0x85",
+ "BriefDescription": "The number of valid fills into the ITLB originating from the LS Page-Table Walker. Tablewalk requests are issued for L1-ITLB and L2-ITLB misses. Walk for 1G page.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "bp_l1_tlb_miss_l2_tlb_miss.if2m",
+ "EventCode": "0x85",
+ "BriefDescription": "The number of valid fills into the ITLB originating from the LS Page-Table Walker. Tablewalk requests are issued for L1-ITLB and L2-ITLB misses. Walk for 2M page.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "bp_l1_tlb_miss_l2_tlb_miss.if4k",
+ "EventCode": "0x85",
+ "BriefDescription": "The number of valid fills into the ITLB originating from the LS Page-Table Walker. Tablewalk requests are issued for L1-ITLB and L2-ITLB misses. Walk to 4K page.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "bp_snp_re_sync",
+ "EventCode": "0x86",
+ "BriefDescription": "The number of pipeline restarts caused by invalidating probes that hit on the instruction stream currently being executed. This would happen if the active instruction stream was being modified by another processor in an MP system - typically a highly unlikely event."
+ },
+ {
+ "EventName": "ic_fetch_stall.ic_stall_any",
+ "EventCode": "0x87",
+ "BriefDescription": "Instruction Pipe Stall. IC pipe was stalled during this clock cycle for any reason (nothing valid in pipe ICM1).",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "ic_fetch_stall.ic_stall_dq_empty",
+ "EventCode": "0x87",
+ "BriefDescription": "Instruction Pipe Stall. IC pipe was stalled during this clock cycle (including IC to OC fetches) due to DQ empty.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ic_fetch_stall.ic_stall_back_pressure",
+ "EventCode": "0x87",
+ "BriefDescription": "Instruction Pipe Stall. IC pipe was stalled during this clock cycle (including IC to OC fetches) due to back-pressure.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ic_cache_inval.l2_invalidating_probe",
+ "EventCode": "0x8c",
+ "BriefDescription": "IC line invalidated due to L2 invalidating probe (external or LS). The number of instruction cache lines invalidated. A non-SMC event is CMC (cross modifying code), either from the other thread of the core or another core.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ic_cache_inval.fill_invalidated",
+ "EventCode": "0x8c",
+ "BriefDescription": "IC line invalidated due to overwriting fill response. The number of instruction cache lines invalidated. A non-SMC event is CMC (cross modifying code), either from the other thread of the core or another core.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ic_tag_hit_miss.all_instruction_cache_accesses",
+ "EventCode": "0x18e",
+ "BriefDescription": "All Instruction Cache Accesses. Counts various IC tag related hit and miss events.",
+ "UMask": "0x1f"
+ },
+ {
+ "EventName": "ic_tag_hit_miss.instruction_cache_miss",
+ "EventCode": "0x18e",
+ "BriefDescription": "Instruction Cache Miss. Counts various IC tag related hit and miss events.",
+ "UMask": "0x18"
+ },
+ {
+ "EventName": "ic_tag_hit_miss.instruction_cache_hit",
+ "EventCode": "0x18e",
+ "BriefDescription": "Instruction Cache Hit. Counts various IC tag related hit and miss events.",
+ "UMask": "0x07"
+ },
+ {
+ "EventName": "ic_oc_mode_switch.oc_ic_mode_switch",
+ "EventCode": "0x28a",
+ "BriefDescription": "OC Mode Switch. OC to IC mode switch.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ic_oc_mode_switch.ic_oc_mode_switch",
+ "EventCode": "0x28a",
+ "BriefDescription": "OC Mode Switch. IC to OC mode switch.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "op_cache_hit_miss.all_op_cache_accesses",
+ "EventCode": "0x28f",
+ "BriefDescription": "All Op Cache accesses. Counts Op Cache micro-tag hit/miss events",
+ "UMask": "0x07"
+ },
+ {
+ "EventName": "op_cache_hit_miss.op_cache_miss",
+ "EventCode": "0x28f",
+ "BriefDescription": "Op Cache Miss. Counts Op Cache micro-tag hit/miss events",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "op_cache_hit_miss.op_cache_hit",
+ "EventCode": "0x28f",
+ "BriefDescription": "Op Cache Hit. Counts Op Cache micro-tag hit/miss events",
+ "UMask": "0x03"
+ },
+ {
+ "EventName": "l3_request_g1.caching_l3_cache_accesses",
+ "EventCode": "0x01",
+ "BriefDescription": "Caching: L3 cache accesses",
+ "UMask": "0x80",
+ "Unit": "L3PMC"
+ },
+ {
+ "EventName": "l3_lookup_state.all_l3_req_typs",
+ "EventCode": "0x04",
+ "BriefDescription": "All L3 Request Types. All L3 cache Requests",
+ "UMask": "0xff",
+ "Unit": "L3PMC"
+ },
+ {
+ "EventName": "l3_comb_clstr_state.other_l3_miss_typs",
+ "EventCode": "0x06",
+ "BriefDescription": "Other L3 Miss Request Types",
+ "UMask": "0xfe",
+ "Unit": "L3PMC"
+ },
+ {
+ "EventName": "l3_comb_clstr_state.request_miss",
+ "EventCode": "0x06",
+ "BriefDescription": "L3 cache misses",
+ "UMask": "0x01",
+ "Unit": "L3PMC"
+ },
+ {
+ "EventName": "xi_sys_fill_latency",
+ "EventCode": "0x90",
+ "BriefDescription": "L3 Cache Miss Latency. Total cycles for all transactions divided by 16. Ignores SliceMask and ThreadMask.",
+ "Unit": "L3PMC"
+ },
+ {
+ "EventName": "xi_ccx_sdp_req1",
+ "EventCode": "0x9a",
+ "BriefDescription": "L3 Misses by Request Type. Ignores SliceID, EnAllSlices, CoreID, EnAllCores and ThreadMask. Requires unit mask 0xFF to engage event for counting.",
+ "UMask": "0xff",
+ "Unit": "L3PMC"
+ }
+]
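
A note on the two xi_* entries above: since xi_sys_fill_latency accumulates
total transaction cycles divided by 16 and xi_ccx_sdp_req1 counts the L3
misses themselves, an average L3 read-miss latency in core clocks can be
estimated (a sketch following those divided-by-16 semantics) as:

	avg_l3_miss_latency ~= 16 * xi_sys_fill_latency / xi_ccx_sdp_req1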
diff --git a/tools/perf/pmu-events/arch/x86/amdzen3/core.json b/tools/perf/pmu-events/arch/x86/amdzen3/core.json
new file mode 100644
index 000000000000..4e27a2be359e
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/amdzen3/core.json
@@ -0,0 +1,137 @@
+[
+ {
+ "EventName": "ex_ret_instr",
+ "EventCode": "0xc0",
+ "BriefDescription": "Retired Instructions."
+ },
+ {
+ "EventName": "ex_ret_ops",
+ "EventCode": "0xc1",
+ "BriefDescription": "Retired Ops. Use macro_ops_retired instead.",
+ "PublicDescription": "The number of macro-ops retired."
+ },
+ {
+ "EventName": "ex_ret_brn",
+ "EventCode": "0xc2",
+ "BriefDescription": "Retired Branch Instructions.",
+ "PublicDescription": "The number of branch instructions retired. This includes all types of architectural control flow changes, including exceptions and interrupts."
+ },
+ {
+ "EventName": "ex_ret_brn_misp",
+ "EventCode": "0xc3",
+ "BriefDescription": "Retired Branch Instructions Mispredicted.",
+ "PublicDescription": "The number of retired branch instructions, that were mispredicted."
+ },
+ {
+ "EventName": "ex_ret_brn_tkn",
+ "EventCode": "0xc4",
+ "BriefDescription": "Retired Taken Branch Instructions.",
+ "PublicDescription": "The number of taken branches that were retired. This includes all types of architectural control flow changes, including exceptions and interrupts."
+ },
+ {
+ "EventName": "ex_ret_brn_tkn_misp",
+ "EventCode": "0xc5",
+ "BriefDescription": "Retired Taken Branch Instructions Mispredicted.",
+ "PublicDescription": "The number of retired taken branch instructions that were mispredicted."
+ },
+ {
+ "EventName": "ex_ret_brn_far",
+ "EventCode": "0xc6",
+ "BriefDescription": "Retired Far Control Transfers.",
+ "PublicDescription": "The number of far control transfers retired including far call/jump/return, IRET, SYSCALL and SYSRET, plus exceptions and interrupts. Far control transfers are not subject to branch prediction."
+ },
+ {
+ "EventName": "ex_ret_brn_resync",
+ "EventCode": "0xc7",
+ "BriefDescription": "Retired Branch Resyncs.",
+ "PublicDescription": "The number of resync branches. These reflect pipeline restarts due to certain microcode assists and events such as writes to the active instruction stream, among other things. Each occurrence reflects a restart penalty similar to a branch mispredict. This is relatively rare."
+ },
+ {
+ "EventName": "ex_ret_near_ret",
+ "EventCode": "0xc8",
+ "BriefDescription": "Retired Near Returns.",
+ "PublicDescription": "The number of near return instructions (RET or RET Iw) retired."
+ },
+ {
+ "EventName": "ex_ret_near_ret_mispred",
+ "EventCode": "0xc9",
+ "BriefDescription": "Retired Near Returns Mispredicted.",
+ "PublicDescription": "The number of near returns retired that were not correctly predicted by the return address predictor. Each such mispredict incurs the same penalty as a mispredicted conditional branch instruction."
+ },
+ {
+ "EventName": "ex_ret_brn_ind_misp",
+ "EventCode": "0xca",
+ "BriefDescription": "Retired Indirect Branch Instructions Mispredicted.",
+ "PublicDescription": "The number of indirect branches retired that were not correctly predicted. Each such mispredict incurs the same penalty as a mispredicted conditional branch instruction. Note that only EX mispredicts are counted."
+ },
+ {
+ "EventName": "ex_ret_mmx_fp_instr.sse_instr",
+ "EventCode": "0xcb",
+ "BriefDescription": "SSE instructions (SSE, SSE2, SSE3, SSSE3, SSE4A, SSE41, SSE42, AVX).",
+ "PublicDescription": "The number of MMX, SSE or x87 instructions retired. The UnitMask allows the selection of the individual classes of instructions as given in the table. Each increment represents one complete instruction. Since this event includes non-numeric instructions it is not suitable for measuring MFLOPS.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "ex_ret_mmx_fp_instr.mmx_instr",
+ "EventCode": "0xcb",
+ "BriefDescription": "MMX instructions.",
+ "PublicDescription": "The number of MMX, SSE or x87 instructions retired. The UnitMask allows the selection of the individual classes of instructions as given in the table. Each increment represents one complete instruction. Since this event includes non-numeric instructions it is not suitable for measuring MFLOPS. MMX instructions.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ex_ret_mmx_fp_instr.x87_instr",
+ "EventCode": "0xcb",
+ "BriefDescription": "x87 instructions.",
+ "PublicDescription": "The number of MMX, SSE or x87 instructions retired. The UnitMask allows the selection of the individual classes of instructions as given in the table. Each increment represents one complete instruction. Since this event includes non-numeric instructions it is not suitable for measuring MFLOPS. x87 instructions.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ex_ret_ind_brch_instr",
+ "EventCode": "0xcc",
+ "BriefDescription": "Retired Indirect Branch Instructions. The number of indirect branches retired."
+ },
+ {
+ "EventName": "ex_ret_cond",
+ "EventCode": "0xd1",
+ "BriefDescription": "Retired Conditional Branch Instructions."
+ },
+ {
+ "EventName": "ex_div_busy",
+ "EventCode": "0xd3",
+ "BriefDescription": "Div Cycles Busy count."
+ },
+ {
+ "EventName": "ex_div_count",
+ "EventCode": "0xd4",
+ "BriefDescription": "Div Op Count."
+ },
+ {
+ "EventName": "ex_ret_msprd_brnch_instr_dir_msmtch",
+ "EventCode": "0x1c7",
+ "BriefDescription": "Retired Mispredicted Branch Instructions due to Direction Mismatch",
+ "PublicDescription": "The number of retired conditional branch instructions that were not correctly predicted because of a branch direction mismatch."
+ },
+ {
+ "EventName": "ex_tagged_ibs_ops.ibs_count_rollover",
+ "EventCode": "0x1cf",
+ "BriefDescription": "Tagged IBS Ops. Number of times an op could not be tagged by IBS because of a previous tagged op that has not retired.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "ex_tagged_ibs_ops.ibs_tagged_ops_ret",
+ "EventCode": "0x1cf",
+ "BriefDescription": "Tagged IBS Ops. Number of Ops tagged by IBS that retired.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ex_tagged_ibs_ops.ibs_tagged_ops",
+ "EventCode": "0x1cf",
+ "BriefDescription": "Tagged IBS Ops. Number of Ops tagged by IBS.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ex_ret_fused_instr",
+ "EventCode": "0x1d0",
+ "BriefDescription": "Counts retired Fused Instructions."
+ }
+]
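
Since these are all retire-based counts, simple derived ratios fall out
directly, e.g. branch misprediction rate ~= ex_ret_brn_misp / ex_ret_brn,
or mispredicts per kilo-instruction ~= 1000 * ex_ret_brn_misp / ex_ret_instr.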
diff --git a/tools/perf/pmu-events/arch/x86/amdzen3/data-fabric.json b/tools/perf/pmu-events/arch/x86/amdzen3/data-fabric.json
new file mode 100644
index 000000000000..40271df40015
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/amdzen3/data-fabric.json
@@ -0,0 +1,98 @@
+[
+ {
+ "EventName": "remote_outbound_data_controller_0",
+ "PublicDescription": "Remote Link Controller Outbound Packet Types: Data (32B): Remote Link Controller 0",
+ "EventCode": "0x7c7",
+ "UMask": "0x02",
+ "PerPkg": "1",
+ "Unit": "DFPMC"
+ },
+ {
+ "EventName": "remote_outbound_data_controller_1",
+ "PublicDescription": "Remote Link Controller Outbound Packet Types: Data (32B): Remote Link Controller 1",
+ "EventCode": "0x807",
+ "UMask": "0x02",
+ "PerPkg": "1",
+ "Unit": "DFPMC"
+ },
+ {
+ "EventName": "remote_outbound_data_controller_2",
+ "PublicDescription": "Remote Link Controller Outbound Packet Types: Data (32B): Remote Link Controller 2",
+ "EventCode": "0x847",
+ "UMask": "0x02",
+ "PerPkg": "1",
+ "Unit": "DFPMC"
+ },
+ {
+ "EventName": "remote_outbound_data_controller_3",
+ "PublicDescription": "Remote Link Controller Outbound Packet Types: Data (32B): Remote Link Controller 3",
+ "EventCode": "0x887",
+ "UMask": "0x02",
+ "PerPkg": "1",
+ "Unit": "DFPMC"
+ },
+ {
+ "EventName": "dram_channel_data_controller_0",
+ "PublicDescription": "DRAM Channel Controller Request Types: Requests with Data (64B): DRAM Channel Controller 0",
+ "EventCode": "0x07",
+ "UMask": "0x38",
+ "PerPkg": "1",
+ "Unit": "DFPMC"
+ },
+ {
+ "EventName": "dram_channel_data_controller_1",
+ "PublicDescription": "DRAM Channel Controller Request Types: Requests with Data (64B): DRAM Channel Controller 0",
+ "EventCode": "0x47",
+ "UMask": "0x38",
+ "PerPkg": "1",
+ "Unit": "DFPMC"
+ },
+ {
+ "EventName": "dram_channel_data_controller_2",
+ "PublicDescription": "DRAM Channel Controller Request Types: Requests with Data (64B): DRAM Channel Controller 0",
+ "EventCode": "0x87",
+ "UMask": "0x38",
+ "PerPkg": "1",
+ "Unit": "DFPMC"
+ },
+ {
+ "EventName": "dram_channel_data_controller_3",
+ "PublicDescription": "DRAM Channel Controller Request Types: Requests with Data (64B): DRAM Channel Controller 0",
+ "EventCode": "0xc7",
+ "UMask": "0x38",
+ "PerPkg": "1",
+ "Unit": "DFPMC"
+ },
+ {
+ "EventName": "dram_channel_data_controller_4",
+ "PublicDescription": "DRAM Channel Controller Request Types: Requests with Data (64B): DRAM Channel Controller 0",
+ "EventCode": "0x107",
+ "UMask": "0x38",
+ "PerPkg": "1",
+ "Unit": "DFPMC"
+ },
+ {
+ "EventName": "dram_channel_data_controller_5",
+ "PublicDescription": "DRAM Channel Controller Request Types: Requests with Data (64B): DRAM Channel Controller 0",
+ "EventCode": "0x147",
+ "UMask": "0x38",
+ "PerPkg": "1",
+ "Unit": "DFPMC"
+ },
+ {
+ "EventName": "dram_channel_data_controller_6",
+ "PublicDescription": "DRAM Channel Controller Request Types: Requests with Data (64B): DRAM Channel Controller 0",
+ "EventCode": "0x187",
+ "UMask": "0x38",
+ "PerPkg": "1",
+ "Unit": "DFPMC"
+ },
+ {
+ "EventName": "dram_channel_data_controller_7",
+ "PublicDescription": "DRAM Channel Controller Request Types: Requests with Data (64B): DRAM Channel Controller 0",
+ "EventCode": "0x1c7",
+ "UMask": "0x38",
+ "PerPkg": "1",
+ "Unit": "DFPMC"
+ }
+]
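
Because each dram_channel_data_controller_N increment represents one 64-byte
DRAM request, a rough per-socket DRAM bandwidth estimate (a sketch; the
number of populated channels varies by SKU, and these DFPMC events are
programmed through the amd_df uncore PMU) is:

	dram_bw_bytes_per_sec ~= 64 * sum(dram_channel_data_controller_0..7) / elapsed_seconds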
diff --git a/tools/perf/pmu-events/arch/x86/amdzen3/floating-point.json b/tools/perf/pmu-events/arch/x86/amdzen3/floating-point.json
new file mode 100644
index 000000000000..98cfcb9c78ec
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/amdzen3/floating-point.json
@@ -0,0 +1,139 @@
+[
+ {
+ "EventName": "fpu_pipe_assignment.total",
+ "EventCode": "0x00",
+ "BriefDescription": "Total number of fp uOps.",
+ "PublicDescription": "Total number of fp uOps. The number of operations (uOps) dispatched to each of the 4 FPU execution pipelines. This event reflects how busy the FPU pipelines are and may be used for workload characterization. This includes all operations performed by x87, MMX, and SSE instructions, including moves. Each increment represents a one- cycle dispatch event. This event is a speculative event. Since this event includes non-numeric operations it is not suitable for measuring MFLOPS.",
+ "UMask": "0x0f"
+ },
+ {
+ "EventName": "fpu_pipe_assignment.total3",
+ "EventCode": "0x00",
+ "BriefDescription": "Total number uOps assigned to pipe 3.",
+ "PublicDescription": "The number of operations (uOps) dispatched to each of the 4 FPU execution pipelines. This event reflects how busy the FPU pipelines are and may be used for workload characterization. This includes all operations performed by x87, MMX, and SSE instructions, including moves. Each increment represents a one-cycle dispatch event. This event is a speculative event. Since this event includes non-numeric operations it is not suitable for measuring MFLOPS. Total number uOps assigned to pipe 3.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "fpu_pipe_assignment.total2",
+ "EventCode": "0x00",
+ "BriefDescription": "Total number uOps assigned to pipe 2.",
+ "PublicDescription": "The number of operations (uOps) dispatched to each of the 4 FPU execution pipelines. This event reflects how busy the FPU pipelines are and may be used for workload characterization. This includes all operations performed by x87, MMX, and SSE instructions, including moves. Each increment represents a one- cycle dispatch event. This event is a speculative event. Since this event includes non-numeric operations it is not suitable for measuring MFLOPS. Total number uOps assigned to pipe 2.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "fpu_pipe_assignment.total1",
+ "EventCode": "0x00",
+ "BriefDescription": "Total number uOps assigned to pipe 1.",
+ "PublicDescription": "The number of operations (uOps) dispatched to each of the 4 FPU execution pipelines. This event reflects how busy the FPU pipelines are and may be used for workload characterization. This includes all operations performed by x87, MMX, and SSE instructions, including moves. Each increment represents a one- cycle dispatch event. This event is a speculative event. Since this event includes non-numeric operations it is not suitable for measuring MFLOPS. Total number uOps assigned to pipe 1.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "fpu_pipe_assignment.total0",
+ "EventCode": "0x00",
+ "BriefDescription": "Total number of fp uOps on pipe 0.",
+ "PublicDescription": "The number of operations (uOps) dispatched to each of the 4 FPU execution pipelines. This event reflects how busy the FPU pipelines are and may be used for workload characterization. This includes all operations performed by x87, MMX, and SSE instructions, including moves. Each increment represents a one- cycle dispatch event. This event is a speculative event. Since this event includes non-numeric operations it is not suitable for measuring MFLOPS. Total number uOps assigned to pipe 0.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "fp_ret_sse_avx_ops.all",
+ "EventCode": "0x03",
+ "BriefDescription": "All FLOPS. This is a retire-based event. The number of retired SSE/AVX FLOPS. The number of events logged per cycle can vary from 0 to 64. This event can count above 15.",
+ "UMask": "0xff"
+ },
+ {
+ "EventName": "fp_ret_sse_avx_ops.mac_flops",
+ "EventCode": "0x03",
+ "BriefDescription": "Multiply-Accumulate FLOPs. Each MAC operation is counted as 2 FLOPS. This is a retire-based event. The number of retired SSE/AVX FLOPs. The number of events logged per cycle can vary from 0 to 64. This event requires the use of the MergeEvent since it can count above 15 events per cycle. See 2.1.17.3 [Large Increment per Cycle Events]. It does not provide a useful count without the use of the MergeEvent.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "fp_ret_sse_avx_ops.div_flops",
+ "EventCode": "0x03",
+ "BriefDescription": "Divide/square root FLOPs. This is a retire-based event. The number of retired SSE/AVX FLOPs. The number of events logged per cycle can vary from 0 to 64. This event requires the use of the MergeEvent since it can count above 15 events per cycle. See 2.1.17.3 [Large Increment per Cycle Events]. It does not provide a useful count without the use of the MergeEvent.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "fp_ret_sse_avx_ops.mult_flops",
+ "EventCode": "0x03",
+ "BriefDescription": "Multiply FLOPs. This is a retire-based event. The number of retired SSE/AVX FLOPs. The number of events logged per cycle can vary from 0 to 64. This event requires the use of the MergeEvent since it can count above 15 events per cycle. See 2.1.17.3 [Large Increment per Cycle Events]. It does not provide a useful count without the use of the MergeEvent.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "fp_ret_sse_avx_ops.add_sub_flops",
+ "EventCode": "0x03",
+ "BriefDescription": "Add/subtract FLOPs. This is a retire-based event. The number of retired SSE/AVX FLOPs. The number of events logged per cycle can vary from 0 to 64. This event requires the use of the MergeEvent since it can count above 15 events per cycle. See 2.1.17.3 [Large Increment per Cycle Events]. It does not provide a useful count without the use of the MergeEvent.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "fp_num_mov_elim_scal_op.optimized",
+ "EventCode": "0x04",
+ "BriefDescription": "Number of Scalar Ops optimized. This is a dispatch based speculative event, and is useful for measuring the effectiveness of the Move elimination and Scalar code optimization schemes.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "fp_num_mov_elim_scal_op.opt_potential",
+ "EventCode": "0x04",
+ "BriefDescription": "Number of Ops that are candidates for optimization (have Z-bit either set or pass). This is a dispatch based speculative event, and is useful for measuring the effectiveness of the Move elimination and Scalar code optimization schemes.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "fp_num_mov_elim_scal_op.sse_mov_ops_elim",
+ "EventCode": "0x04",
+ "BriefDescription": "Number of SSE Move Ops eliminated. This is a dispatch based speculative event, and is useful for measuring the effectiveness of the Move elimination and Scalar code optimization schemes.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "fp_num_mov_elim_scal_op.sse_mov_ops",
+ "EventCode": "0x04",
+ "BriefDescription": "Number of SSE Move Ops. This is a dispatch based speculative event, and is useful for measuring the effectiveness of the Move elimination and Scalar code optimization schemes.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "fp_retired_ser_ops.sse_bot_ret",
+ "EventCode": "0x05",
+ "BriefDescription": "SSE/AVX bottom-executing ops retired. The number of serializing Ops retired.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "fp_retired_ser_ops.sse_ctrl_ret",
+ "EventCode": "0x05",
+ "BriefDescription": "SSE/AVX control word mispredict traps. The number of serializing Ops retired.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "fp_retired_ser_ops.x87_bot_ret",
+ "EventCode": "0x05",
+ "BriefDescription": "x87 bottom-executing ops retired. The number of serializing Ops retired.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "fp_retired_ser_ops.x87_ctrl_ret",
+ "EventCode": "0x05",
+ "BriefDescription": "x87 control word mispredict traps due to mispredictions in RC or PC, or changes in mask bits. The number of serializing Ops retired.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "fp_disp_faults.ymm_spill_fault",
+ "EventCode": "0x0e",
+ "BriefDescription": "Floating Point Dispatch Faults. YMM spill fault.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "fp_disp_faults.ymm_fill_fault",
+ "EventCode": "0x0e",
+ "BriefDescription": "Floating Point Dispatch Faults. YMM fill fault.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "fp_disp_faults.xmm_fill_fault",
+ "EventCode": "0x0e",
+ "BriefDescription": "Floating Point Dispatch Faults. XMM fill fault.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "fp_disp_faults.x87_fill_fault",
+ "EventCode": "0x0e",
+ "BriefDescription": "Floating Point Dispatch Faults. x87 fill fault.",
+ "UMask": "0x01"
+ }
+]
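
Given the descriptions above (each MAC counted as 2 FLOPs, and the MergeEvent
required so per-cycle increments above 15 are not clipped), a rough
retire-based FLOP rate is:

	gflops ~= fp_ret_sse_avx_ops.all / (1e9 * elapsed_seconds)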
diff --git a/tools/perf/pmu-events/arch/x86/amdzen3/memory.json b/tools/perf/pmu-events/arch/x86/amdzen3/memory.json
new file mode 100644
index 000000000000..a2833955dcd2
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/amdzen3/memory.json
@@ -0,0 +1,428 @@
+[
+ {
+ "EventName": "ls_bad_status2.stli_other",
+ "EventCode": "0x24",
+ "BriefDescription": "Non-forwardable conflict; used to reduce STLI's via software. All reasons. Store To Load Interlock (STLI) are loads that were unable to complete because of a possible match with an older store, and the older store could not do STLF for some reason.",
+ "PublicDescription" : "Store-to-load conflicts: A load was unable to complete due to a non-forwardable conflict with an older store. Most commonly, a load's address range partially but not completely overlaps with an uncompleted older store. Software can avoid this problem by using same-size and same-alignment loads and stores when accessing the same data. Vector/SIMD code is particularly susceptible to this problem; software should construct wide vector stores by manipulating vector elements in registers using shuffle/blend/swap instructions prior to storing to memory, instead of using narrow element-by-element stores.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ls_locks.spec_lock_hi_spec",
+ "EventCode": "0x25",
+ "BriefDescription": "Retired lock instructions. High speculative cacheable lock speculation succeeded.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "ls_locks.spec_lock_lo_spec",
+ "EventCode": "0x25",
+ "BriefDescription": "Retired lock instructions. Low speculative cacheable lock speculation succeeded.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "ls_locks.non_spec_lock",
+ "EventCode": "0x25",
+ "BriefDescription": "Retired lock instructions. Non-speculative lock succeeded.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ls_locks.bus_lock",
+ "EventCode": "0x25",
+ "BriefDescription": "Retired lock instructions. Comparable to legacy bus lock.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ls_ret_cl_flush",
+ "EventCode": "0x26",
+ "BriefDescription": "The number of retired CLFLUSH instructions. This is a non-speculative event."
+ },
+ {
+ "EventName": "ls_ret_cpuid",
+ "EventCode": "0x27",
+ "BriefDescription": "The number of CPUID instructions retired."
+ },
+ {
+ "EventName": "ls_dispatch.ld_st_dispatch",
+ "EventCode": "0x29",
+ "BriefDescription": "Load-op-Store Dispatch. Dispatch of a single op that performs a load from and store to the same memory address. Counts the number of operations dispatched to the LS unit. Unit Masks ADDed.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "ls_dispatch.store_dispatch",
+ "EventCode": "0x29",
+ "BriefDescription": "Dispatch of a single op that performs a memory store. Counts the number of operations dispatched to the LS unit. Unit Masks ADDed.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ls_dispatch.ld_dispatch",
+ "EventCode": "0x29",
+ "BriefDescription": "Dispatch of a single op that performs a memory load. Counts the number of operations dispatched to the LS unit. Unit Masks ADDed.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ls_smi_rx",
+ "EventCode": "0x2b",
+ "BriefDescription": "Counts the number of SMIs received."
+ },
+ {
+ "EventName": "ls_int_taken",
+ "EventCode": "0x2c",
+ "BriefDescription": "Counts the number of interrupts taken."
+ },
+ {
+ "EventName": "ls_rdtsc",
+ "EventCode": "0x2d",
+ "BriefDescription": "Number of reads of the TSC (RDTSC instructions). The count is speculative."
+ },
+ {
+ "EventName": "ls_stlf",
+ "EventCode": "0x35",
+ "BriefDescription": "Number of STLF hits."
+ },
+ {
+ "EventName": "ls_st_commit_cancel2.st_commit_cancel_wcb_full",
+ "EventCode": "0x37",
+ "BriefDescription": "A non-cacheable store and the non-cacheable commit buffer is full.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ls_dc_accesses",
+ "EventCode": "0x40",
+ "BriefDescription": "Number of accesses to the dcache for load/store references.",
+ "PublicDescription": "The number of accesses to the data cache for load and store references. This may include certain microcode scratchpad accesses, although these are generally rare. Each increment represents an eight-byte access, although the instruction may only be accessing a portion of that. This event is a speculative event."
+ },
+ {
+ "EventName": "ls_mab_alloc.all_allocations",
+ "EventCode": "0x41",
+ "BriefDescription": "All Allocations. Counts when a LS pipe allocates a MAB entry.",
+ "UMask": "0x7f"
+ },
+ {
+ "EventName": "ls_mab_alloc.hardware_prefetcher_allocations",
+ "EventCode": "0x41",
+ "BriefDescription": "Hardware Prefetcher Allocations. Counts when a LS pipe allocates a MAB entry.",
+ "UMask": "0x40"
+ },
+ {
+ "EventName": "ls_mab_alloc.load_store_allocations",
+ "EventCode": "0x41",
+ "BriefDescription": "Load Store Allocations. Counts when a LS pipe allocates a MAB entry.",
+ "UMask": "0x3f"
+ },
+ {
+ "EventName": "ls_mab_alloc.dc_prefetcher",
+ "EventCode": "0x41",
+ "BriefDescription": "LS MAB Allocates by Type. DC prefetcher.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "ls_mab_alloc.stores",
+ "EventCode": "0x41",
+ "BriefDescription": "LS MAB Allocates by Type. Stores.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ls_mab_alloc.loads",
+ "EventCode": "0x41",
+ "BriefDescription": "LS MAB Allocates by Type. Loads.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ls_dmnd_fills_from_sys.mem_io_remote",
+ "EventCode": "0x43",
+ "BriefDescription": "Demand Data Cache Fills by Data Source. From DRAM or IO connected in different Node.",
+ "UMask": "0x40"
+ },
+ {
+ "EventName": "ls_dmnd_fills_from_sys.ext_cache_remote",
+ "EventCode": "0x43",
+ "BriefDescription": "Demand Data Cache Fills by Data Source. From CCX Cache in different Node.",
+ "UMask": "0x10"
+ },
+ {
+ "EventName": "ls_dmnd_fills_from_sys.mem_io_local",
+ "EventCode": "0x43",
+ "BriefDescription": "Demand Data Cache Fills by Data Source. From DRAM or IO connected in same node.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "ls_dmnd_fills_from_sys.ext_cache_local",
+ "EventCode": "0x43",
+ "BriefDescription": "Demand Data Cache Fills by Data Source. From cache of different CCX in same node.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "ls_dmnd_fills_from_sys.int_cache",
+ "EventCode": "0x43",
+ "BriefDescription": "Demand Data Cache Fills by Data Source. From L3 or different L2 in same CCX.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ls_dmnd_fills_from_sys.lcl_l2",
+ "EventCode": "0x43",
+ "BriefDescription": "Demand Data Cache Fills by Data Source. From Local L2 to the core.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ls_any_fills_from_sys.mem_io_remote",
+ "EventCode": "0x44",
+ "BriefDescription": "Any Data Cache Fills by Data Source. From DRAM or IO connected in different Node.",
+ "UMask": "0x40"
+ },
+ {
+ "EventName": "ls_any_fills_from_sys.ext_cache_remote",
+ "EventCode": "0x44",
+ "BriefDescription": "Any Data Cache Fills by Data Source. From CCX Cache in different Node.",
+ "UMask": "0x10"
+ },
+ {
+ "EventName": "ls_any_fills_from_sys.mem_io_local",
+ "EventCode": "0x44",
+ "BriefDescription": "Any Data Cache Fills by Data Source. From DRAM or IO connected in same node.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "ls_any_fills_from_sys.ext_cache_local",
+ "EventCode": "0x44",
+ "BriefDescription": "Any Data Cache Fills by Data Source. From cache of different CCX in same node.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "ls_any_fills_from_sys.int_cache",
+ "EventCode": "0x44",
+ "BriefDescription": "Any Data Cache Fills by Data Source. From L3 or different L2 in same CCX.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ls_any_fills_from_sys.lcl_l2",
+ "EventCode": "0x44",
+ "BriefDescription": "Any Data Cache Fills by Data Source. From Local L2 to the core.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ls_l1_d_tlb_miss.all",
+ "EventCode": "0x45",
+ "BriefDescription": "All L1 DTLB Misses or Reloads. Use l1_dtlb_misses instead.",
+ "UMask": "0xff"
+ },
+ {
+ "EventName": "ls_l1_d_tlb_miss.tlb_reload_1g_l2_miss",
+ "EventCode": "0x45",
+ "BriefDescription": "L1 DTLB Miss. DTLB reload to a 1G page that also missed in the L2 TLB.",
+ "UMask": "0x80"
+ },
+ {
+ "EventName": "ls_l1_d_tlb_miss.tlb_reload_2m_l2_miss",
+ "EventCode": "0x45",
+ "BriefDescription": "L1 DTLB Miss. DTLB reload to a 2M page that also missed in the L2 TLB.",
+ "UMask": "0x40"
+ },
+ {
+ "EventName": "ls_l1_d_tlb_miss.tlb_reload_coalesced_page_miss",
+ "EventCode": "0x45",
+ "BriefDescription": "L1 DTLB Miss. DTLB reload coalesced page that also missed in the L2 TLB.",
+ "UMask": "0x20"
+ },
+ {
+ "EventName": "ls_l1_d_tlb_miss.tlb_reload_4k_l2_miss",
+ "EventCode": "0x45",
+ "BriefDescription": "L1 DTLB Miss. DTLB reload to a 4K page that missed the L2 TLB.",
+ "UMask": "0x10"
+ },
+ {
+ "EventName": "ls_l1_d_tlb_miss.tlb_reload_1g_l2_hit",
+ "EventCode": "0x45",
+ "BriefDescription": "L1 DTLB Miss. DTLB reload to a 1G page that hit in the L2 TLB.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "ls_l1_d_tlb_miss.tlb_reload_2m_l2_hit",
+ "EventCode": "0x45",
+ "BriefDescription": "L1 DTLB Miss. DTLB reload to a 2M page that hit in the L2 TLB.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "ls_l1_d_tlb_miss.tlb_reload_coalesced_page_hit",
+ "EventCode": "0x45",
+ "BriefDescription": "L1 DTLB Miss. DTLB reload to a coalesced page that hit in the L2 TLB.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ls_l1_d_tlb_miss.tlb_reload_4k_l2_hit",
+ "EventCode": "0x45",
+ "BriefDescription": "L1 DTLB Miss. DTLB reload to a 4K page that hit in the L2 TLB.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ls_tablewalker.iside",
+ "EventCode": "0x46",
+ "BriefDescription": "Total Page Table Walks on I-side.",
+ "UMask": "0x0c"
+ },
+ {
+ "EventName": "ls_tablewalker.ic_type1",
+ "EventCode": "0x46",
+ "BriefDescription": "Total Page Table Walks IC Type 1.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "ls_tablewalker.ic_type0",
+ "EventCode": "0x46",
+ "BriefDescription": "Total Page Table Walks IC Type 0.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "ls_tablewalker.dside",
+ "EventCode": "0x46",
+ "BriefDescription": "Total Page Table Walks on D-side.",
+ "UMask": "0x03"
+ },
+ {
+ "EventName": "ls_tablewalker.dc_type1",
+ "EventCode": "0x46",
+ "BriefDescription": "Total Page Table Walks DC Type 1.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ls_tablewalker.dc_type0",
+ "EventCode": "0x46",
+ "BriefDescription": "Total Page Table Walks DC Type 0.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ls_misal_loads.ma4k",
+ "EventCode": "0x47",
+ "BriefDescription": "The number of 4KB misaligned (i.e., page crossing) loads.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ls_misal_loads.ma64",
+ "EventCode": "0x47",
+ "BriefDescription": "The number of 64B misaligned (i.e., cacheline crossing) loads.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ls_pref_instr_disp",
+ "EventCode": "0x4b",
+ "BriefDescription": "Software Prefetch Instructions Dispatched (Speculative).",
+ "UMask": "0xff"
+ },
+ {
+ "EventName": "ls_pref_instr_disp.prefetch_nta",
+ "EventCode": "0x4b",
+ "BriefDescription": "Software Prefetch Instructions Dispatched (Speculative). PrefetchNTA instruction. See docAPM3 PREFETCHlevel.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "ls_pref_instr_disp.prefetch_w",
+ "EventCode": "0x4b",
+ "BriefDescription": "Software Prefetch Instructions Dispatched (Speculative). PrefetchW instruction. See docAPM3 PREFETCHW.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ls_pref_instr_disp.prefetch",
+ "EventCode": "0x4b",
+ "BriefDescription": "Software Prefetch Instructions Dispatched (Speculative). PrefetchT0, T1 and T2 instructions. See docAPM3 PREFETCHlevel.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ls_inef_sw_pref.mab_mch_cnt",
+ "EventCode": "0x52",
+ "BriefDescription": "The number of software prefetches that did not fetch data outside of the processor core. Software PREFETCH instruction saw a match on an already-allocated miss request buffer.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ls_inef_sw_pref.data_pipe_sw_pf_dc_hit",
+ "EventCode": "0x52",
+ "BriefDescription": "The number of software prefetches that did not fetch data outside of the processor core. Software PREFETCH instruction saw a DC hit.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ls_sw_pf_dc_fills.mem_io_remote",
+ "EventCode": "0x59",
+ "BriefDescription": "Software Prefetch Data Cache Fills by Data Source. From DRAM or IO connected in different Node.",
+ "UMask": "0x40"
+ },
+ {
+ "EventName": "ls_sw_pf_dc_fills.ext_cache_remote",
+ "EventCode": "0x59",
+ "BriefDescription": "Software Prefetch Data Cache Fills by Data Source. From CCX Cache in different Node.",
+ "UMask": "0x10"
+ },
+ {
+ "EventName": "ls_sw_pf_dc_fills.mem_io_local",
+ "EventCode": "0x59",
+ "BriefDescription": "Software Prefetch Data Cache Fills by Data Source. From DRAM or IO connected in same node.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "ls_sw_pf_dc_fills.ext_cache_local",
+ "EventCode": "0x59",
+ "BriefDescription": "Software Prefetch Data Cache Fills by Data Source. From cache of different CCX in same node.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "ls_sw_pf_dc_fills.int_cache",
+ "EventCode": "0x59",
+ "BriefDescription": "Software Prefetch Data Cache Fills by Data Source. From L3 or different L2 in same CCX.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ls_sw_pf_dc_fills.lcl_l2",
+ "EventCode": "0x59",
+ "BriefDescription": "Software Prefetch Data Cache Fills by Data Source. From Local L2 to the core.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ls_hw_pf_dc_fills.mem_io_remote",
+ "EventCode": "0x5a",
+ "BriefDescription": "Hardware Prefetch Data Cache Fills by Data Source. From DRAM or IO connected in different Node.",
+ "UMask": "0x40"
+ },
+ {
+ "EventName": "ls_hw_pf_dc_fills.ext_cache_remote",
+ "EventCode": "0x5a",
+ "BriefDescription": "Hardware Prefetch Data Cache Fills by Data Source. From CCX Cache in different Node.",
+ "UMask": "0x10"
+ },
+ {
+ "EventName": "ls_hw_pf_dc_fills.mem_io_local",
+ "EventCode": "0x5a",
+ "BriefDescription": "Hardware Prefetch Data Cache Fills by Data Source. From DRAM or IO connected in same node.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "ls_hw_pf_dc_fills.ext_cache_local",
+ "EventCode": "0x5a",
+ "BriefDescription": "Hardware Prefetch Data Cache Fills by Data Source. From cache of different CCX in same node.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "ls_hw_pf_dc_fills.int_cache",
+ "EventCode": "0x5a",
+ "BriefDescription": "Hardware Prefetch Data Cache Fills by Data Source. From L3 or different L2 in same CCX.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "ls_hw_pf_dc_fills.lcl_l2",
+ "EventCode": "0x5a",
+ "BriefDescription": "Hardware Prefetch Data Cache Fills by Data Source. From Local L2 to the core.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "ls_alloc_mab_count",
+ "EventCode": "0x5f",
+ "BriefDescription": "Count of Allocated Mabs",
+ "PublicDescription": "This event counts the in-flight L1 data cache misses (allocated Miss Address Buffers) divided by 4 and rounded down each cycle unless used with the MergeEvent functionality. If the MergeEvent is used, it counts the exact number of outstanding L1 data cache misses. See 2.1.17.3 [Large Increment per Cycle Events]."
+ },
+ {
+ "EventName": "ls_not_halted_cyc",
+ "EventCode": "0x76",
+ "BriefDescription": "Cycles not in Halt."
+ },
+ {
+ "EventName": "ls_tlb_flush.all_tlb_flushes",
+ "EventCode": "0x78",
+ "BriefDescription": "All TLB Flushes. Requires unit mask 0xFF to engage event for counting. Use all_tlbs_flushed instead",
+ "UMask": "0xff"
+ }
+]
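
One more derived figure from this file: since ls_alloc_mab_count reports the
in-flight L1D misses divided by 4 each cycle (when the MergeEvent is not
used), the average number of outstanding misses per unhalted cycle is
roughly:

	avg_outstanding_l1d_misses ~= 4 * ls_alloc_mab_count / ls_not_halted_cyc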
diff --git a/tools/perf/pmu-events/arch/x86/amdzen3/other.json b/tools/perf/pmu-events/arch/x86/amdzen3/other.json
new file mode 100644
index 000000000000..7da5d0791ea3
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/amdzen3/other.json
@@ -0,0 +1,103 @@
+[
+ {
+ "EventName": "de_dis_uop_queue_empty_di0",
+ "EventCode": "0xa9",
+ "BriefDescription": "Cycles where the Micro-Op Queue is empty."
+ },
+ {
+ "EventName": "de_dis_cops_from_decoder.disp_op_type.any_integer_dispatch",
+ "EventCode": "0xab",
+ "BriefDescription": "Any Integer dispatch. Types of Oops Dispatched from Decoder.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "de_dis_cops_from_decoder.disp_op_type.any_fp_dispatch",
+ "EventCode": "0xab",
+ "BriefDescription": "Any FP dispatch. Types of Oops Dispatched from Decoder.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "de_dis_dispatch_token_stalls1.fp_flush_recovery_stall",
+ "EventCode": "0xae",
+ "BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a Token Stall. Also counts cycles when the thread is not selected to dispatch but would have been stalled due to a Token Stall. FP Flush recovery stall.",
+ "UMask": "0x80"
+ },
+ {
+ "EventName": "de_dis_dispatch_token_stalls1.fp_sch_rsrc_stall",
+ "EventCode": "0xae",
+ "BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a Token Stall. Also counts cycles when the thread is not selected to dispatch but would have been stalled due to a Token Stall. FP scheduler resource stall. Applies to ops that use the FP scheduler.",
+ "UMask": "0x40"
+ },
+ {
+ "EventName": "de_dis_dispatch_token_stalls1.fp_reg_file_rsrc_stall",
+ "EventCode": "0xae",
+ "BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a Token Stall. Also counts cycles when the thread is not selected to dispatch but would have been stalled due to a Token Stall. Floating point register file resource stall. Applies to all FP ops that have a destination register.",
+ "UMask": "0x20"
+ },
+ {
+ "EventName": "de_dis_dispatch_token_stalls1.taken_brnch_buffer_rsrc",
+ "EventCode": "0xae",
+ "BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a Token Stall. Also counts cycles when the thread is not selected to dispatch but would have been stalled due to a Token Stall. Taken branch buffer resource stall.",
+ "UMask": "0x10"
+ },
+ {
+ "EventName": "de_dis_dispatch_token_stalls1.int_sched_misc_token_stall",
+ "EventCode": "0xae",
+ "BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. Integer Scheduler miscellaneous resource stall.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "de_dis_dispatch_token_stalls1.store_queue_rsrc_stall",
+ "EventCode": "0xae",
+ "BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a Token Stall. Also counts cycles when the thread is not selected to dispatch but would have been stalled due to a Token Stall. Store Queue resource stall. Applies to all ops with store semantics.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "de_dis_dispatch_token_stalls1.load_queue_rsrc_stall",
+ "EventCode": "0xae",
+ "BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a Token Stall. Also counts cycles when the thread is not selected to dispatch but would have been stalled due to a Token Stall. Load Queue resource stall. Applies to all ops with load semantics.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "de_dis_dispatch_token_stalls1.int_phy_reg_file_rsrc_stall",
+ "EventCode": "0xae",
+ "BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a Token Stall. Also counts cycles when the thread is not selected to dispatch but would have been stalled due to a Token Stall. Integer Physical Register File resource stall. Integer Physical Register File, applies to all ops that have an integer destination register.",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "de_dis_dispatch_token_stalls2.retire_token_stall",
+ "EventCode": "0xaf",
+ "BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. Insufficient Retire Queue tokens available.",
+ "UMask": "0x20"
+ },
+ {
+ "EventName": "de_dis_dispatch_token_stalls2.agsq_token_stall",
+ "EventCode": "0xaf",
+ "BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. AGSQ Tokens unavailable.",
+ "UMask": "0x10"
+ },
+ {
+ "EventName": "de_dis_dispatch_token_stalls2.int_sch3_token_stall",
+ "EventCode": "0xaf",
+ "BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. No tokens for Integer Scheduler Queue 3 available.",
+ "UMask": "0x08"
+ },
+ {
+ "EventName": "de_dis_dispatch_token_stalls2.int_sch2_token_stall",
+ "EventCode": "0xaf",
+ "BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. No tokens for Integer Scheduler Queue 2 available.",
+ "UMask": "0x04"
+ },
+ {
+ "EventName": "de_dis_dispatch_token_stalls2.int_sch1_token_stall",
+ "EventCode": "0xaf",
+ "BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. No tokens for Integer Scheduler Queue 1 available.",
+ "UMask": "0x02"
+ },
+ {
+ "EventName": "de_dis_dispatch_token_stalls2.int_sch0_token_stall",
+ "EventCode": "0xaf",
+ "BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. No tokens for Integer Scheduler Queue 0 available.",
+ "UMask": "0x01"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/x86/amdzen3/recommended.json b/tools/perf/pmu-events/arch/x86/amdzen3/recommended.json
new file mode 100644
index 000000000000..988cf68ae825
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/amdzen3/recommended.json
@@ -0,0 +1,214 @@
+[
+ {
+ "MetricName": "branch_misprediction_ratio",
+ "BriefDescription": "Execution-Time Branch Misprediction Ratio (Non-Speculative)",
+ "MetricExpr": "d_ratio(ex_ret_brn_misp, ex_ret_brn)",
+ "MetricGroup": "branch_prediction",
+ "ScaleUnit": "100%"
+ },
+ {
+ "EventName": "all_data_cache_accesses",
+ "EventCode": "0x29",
+ "BriefDescription": "All L1 Data Cache Accesses",
+ "UMask": "0x07"
+ },
+ {
+ "MetricName": "all_l2_cache_accesses",
+ "BriefDescription": "All L2 Cache Accesses",
+ "MetricExpr": "l2_request_g1.all_no_prefetch + l2_pf_hit_l2 + l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "EventName": "l2_cache_accesses_from_ic_misses",
+ "EventCode": "0x60",
+ "BriefDescription": "L2 Cache Accesses from L1 Instruction Cache Misses (including prefetch)",
+ "UMask": "0x10"
+ },
+ {
+ "EventName": "l2_cache_accesses_from_dc_misses",
+ "EventCode": "0x60",
+ "BriefDescription": "L2 Cache Accesses from L1 Data Cache Misses (including prefetch)",
+ "UMask": "0xe8"
+ },
+ {
+ "MetricName": "l2_cache_accesses_from_l2_hwpf",
+ "BriefDescription": "L2 Cache Accesses from L2 HWPF",
+ "MetricExpr": "l2_pf_hit_l2 + l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "all_l2_cache_misses",
+ "BriefDescription": "All L2 Cache Misses",
+ "MetricExpr": "l2_cache_req_stat.ic_dc_miss_in_l2 + l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "EventName": "l2_cache_misses_from_ic_miss",
+ "EventCode": "0x64",
+ "BriefDescription": "L2 Cache Misses from L1 Instruction Cache Misses",
+ "UMask": "0x01"
+ },
+ {
+ "EventName": "l2_cache_misses_from_dc_misses",
+ "EventCode": "0x64",
+ "BriefDescription": "L2 Cache Misses from L1 Data Cache Misses",
+ "UMask": "0x08"
+ },
+ {
+ "MetricName": "l2_cache_misses_from_l2_hwpf",
+ "BriefDescription": "L2 Cache Misses from L2 Cache HWPF",
+ "MetricExpr": "l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "all_l2_cache_hits",
+ "BriefDescription": "All L2 Cache Hits",
+ "MetricExpr": "l2_cache_req_stat.ic_dc_hit_in_l2 + l2_pf_hit_l2",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "EventName": "l2_cache_hits_from_ic_misses",
+ "EventCode": "0x64",
+ "BriefDescription": "L2 Cache Hits from L1 Instruction Cache Misses",
+ "UMask": "0x06"
+ },
+ {
+ "EventName": "l2_cache_hits_from_dc_misses",
+ "EventCode": "0x64",
+ "BriefDescription": "L2 Cache Hits from L1 Data Cache Misses",
+ "UMask": "0xf0"
+ },
+ {
+ "EventName": "l2_cache_hits_from_l2_hwpf",
+ "EventCode": "0x70",
+ "BriefDescription": "L2 Cache Hits from L2 Cache HWPF",
+ "UMask": "0xff"
+ },
+ {
+ "EventName": "l3_cache_accesses",
+ "EventCode": "0x04",
+ "BriefDescription": "L3 Cache Accesses",
+ "UMask": "0xff",
+ "Unit": "L3PMC"
+ },
+ {
+ "EventName": "l3_misses",
+ "EventCode": "0x04",
+ "BriefDescription": "L3 Misses (includes cacheline state change requests)",
+ "UMask": "0x01",
+ "Unit": "L3PMC"
+ },
+ {
+ "MetricName": "l3_read_miss_latency",
+ "BriefDescription": "Average L3 Read Miss Latency (in core clocks)",
+ "MetricExpr": "(xi_sys_fill_latency * 16) / xi_ccx_sdp_req1",
+ "MetricGroup": "l3_cache",
+ "ScaleUnit": "1core clocks"
+ },
+ {
+ "MetricName": "op_cache_fetch_miss_ratio",
+ "BriefDescription": "Op Cache (64B) Fetch Miss Ratio",
+ "MetricExpr": "d_ratio(op_cache_hit_miss.op_cache_miss, op_cache_hit_miss.all_op_cache_accesses)",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "ic_fetch_miss_ratio",
+ "BriefDescription": "Instruction Cache (32B) Fetch Miss Ratio",
+ "MetricExpr": "d_ratio(ic_tag_hit_miss.instruction_cache_miss, ic_tag_hit_miss.all_instruction_cache_accesses)",
+ "MetricGroup": "l2_cache",
+ "ScaleUnit": "100%"
+ },
+ {
+ "EventName": "l1_data_cache_fills_from_memory",
+ "EventCode": "0x44",
+ "BriefDescription": "L1 Data Cache Fills: From Memory",
+ "UMask": "0x48"
+ },
+ {
+ "EventName": "l1_data_cache_fills_from_remote_node",
+ "EventCode": "0x44",
+ "BriefDescription": "L1 Data Cache Fills: From Remote Node",
+ "UMask": "0x50"
+ },
+ {
+ "EventName": "l1_data_cache_fills_from_within_same_ccx",
+ "EventCode": "0x44",
+ "BriefDescription": "L1 Data Cache Fills: From within same CCX",
+ "UMask": "0x03"
+ },
+ {
+ "EventName": "l1_data_cache_fills_from_external_ccx_cache",
+ "EventCode": "0x44",
+ "BriefDescription": "L1 Data Cache Fills: From External CCX Cache",
+ "UMask": "0x14"
+ },
+ {
+ "EventName": "l1_data_cache_fills_all",
+ "EventCode": "0x44",
+ "BriefDescription": "L1 Data Cache Fills: All",
+ "UMask": "0xff"
+ },
+ {
+ "MetricName": "l1_itlb_misses",
+ "BriefDescription": "L1 ITLB Misses",
+ "MetricExpr": "bp_l1_tlb_miss_l2_tlb_hit + bp_l1_tlb_miss_l2_tlb_miss",
+ "MetricGroup": "tlb"
+ },
+ {
+ "EventName": "l2_itlb_misses",
+ "EventCode": "0x85",
+ "BriefDescription": "L2 ITLB Misses & Instruction page walks",
+ "UMask": "0x07"
+ },
+ {
+ "EventName": "l1_dtlb_misses",
+ "EventCode": "0x45",
+ "BriefDescription": "L1 DTLB Misses",
+ "UMask": "0xff"
+ },
+ {
+ "EventName": "l2_dtlb_misses",
+ "EventCode": "0x45",
+ "BriefDescription": "L2 DTLB Misses & Data page walks",
+ "UMask": "0xf0"
+ },
+ {
+ "EventName": "all_tlbs_flushed",
+ "EventCode": "0x78",
+ "BriefDescription": "All TLBs Flushed",
+ "UMask": "0xff"
+ },
+ {
+ "MetricName": "macro_ops_dispatched",
+ "BriefDescription": "Macro-ops Dispatched",
+ "MetricExpr": "de_dis_cops_from_decoder.disp_op_type.any_integer_dispatch + de_dis_cops_from_decoder.disp_op_type.any_fp_dispatch",
+ "MetricGroup": "decoder"
+ },
+ {
+ "EventName": "sse_avx_stalls",
+ "EventCode": "0x0e",
+ "BriefDescription": "Mixed SSE/AVX Stalls",
+ "UMask": "0x0e"
+ },
+ {
+ "EventName": "macro_ops_retired",
+ "EventCode": "0xc1",
+ "BriefDescription": "Macro-ops Retired"
+ },
+ {
+ "MetricName": "all_remote_links_outbound",
+ "BriefDescription": "Approximate: Outbound data bytes for all Remote Links for a node (die)",
+ "MetricExpr": "remote_outbound_data_controller_0 + remote_outbound_data_controller_1 + remote_outbound_data_controller_2 + remote_outbound_data_controller_3",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "3e-5MiB"
+ },
+ {
+ "MetricName": "nps1_die_to_dram",
+ "BriefDescription": "Approximate: Combined DRAM B/bytes of all channels on a NPS1 node (die) (may need --metric-no-group)",
+ "MetricExpr": "dram_channel_data_controller_0 + dram_channel_data_controller_1 + dram_channel_data_controller_2 + dram_channel_data_controller_3 + dram_channel_data_controller_4 + dram_channel_data_controller_5 + dram_channel_data_controller_6 + dram_channel_data_controller_7",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "6.1e-5MiB"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 2f2a209e87e1..9f97b32533a0 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -38,4 +38,4 @@ GenuineIntel-6-7E,v1,icelake,core
GenuineIntel-6-86,v1,tremontx,core
AuthenticAMD-23-([12][0-9A-F]|[0-9A-F]),v2,amdzen1,core
AuthenticAMD-23-[[:xdigit:]]+,v1,amdzen2,core
-AuthenticAMD-25-[[:xdigit:]]+,v1,amdzen2,core
+AuthenticAMD-25-[[:xdigit:]]+,v1,amdzen3,core
--
2.20.1
20 Jan '22
Hello, your patch has been submitted to CI for checking.
You can follow the subsequent merge progress in the issue below:
https://gitee.com/openeuler/kernel/issues/I4RK58
Thanks
BR
Laibin
On 2022/1/20 10:35, Chen Baozi wrote:
> phytium inclusion
> category: bugfix
> bugzilla: NA
> CVE: NA
>
> --------------------------------
>
> The system would hang up when the Phytium S2500 communicates with
> some BMCs after several rounds of transactions, unless we reset
> the controller timeout counter manually by calling firmware through
> SMC.
>
> Signed-off-by: Wang Yinfeng <wangyinfeng(a)phytium.com.cn>
> Signed-off-by: Chen Baozi <chenbaozi(a)phytium.com.cn>
> ---
> drivers/char/ipmi/ipmi_si_mem_io.c | 117 ++++++++++++++++++++++++++++-
> 1 file changed, 113 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/char/ipmi/ipmi_si_mem_io.c b/drivers/char/ipmi/ipmi_si_mem_io.c
> index 75583612ab10..73a3d6979eaa 100644
> --- a/drivers/char/ipmi/ipmi_si_mem_io.c
> +++ b/drivers/char/ipmi/ipmi_si_mem_io.c
> @@ -3,6 +3,105 @@
> #include <linux/io.h>
> #include "ipmi_si.h"
>
> +#ifdef CONFIG_ARM_GIC_PHYTIUM_2500
> +#include <linux/arm-smccc.h>
> +
> +#define CTL_RST_FUNC_ID 0xC2000011
> +
> +static void ctl_smc(unsigned long arg0, unsigned long arg1,
> + unsigned long arg2, unsigned long arg3)
> +{
> + struct arm_smccc_res res;
> +
> + arm_smccc_smc(arg0, arg1, arg2, arg3, 0, 0, 0, 0, &res);
> + if (res.a0 != 0)
> + pr_err("Error: Firmware call SMC reset Failed: %d, addr: 0x%lx\n",
> + (int)res.a0, arg2);
> +}
> +
> +static void ctl_timeout_reset(void)
> +{
> + ctl_smc(CTL_RST_FUNC_ID, 0x1, 0x28100208, 0x1);
> + ctl_smc(CTL_RST_FUNC_ID, 0x1, 0x2810020C, 0x1);
> +}
> +
> +static unsigned char intf_mem_inb_2500(const struct si_sm_io *io,
> + unsigned int offset)
> +{
> + ctl_timeout_reset();
> +
> + return readb((io->addr)+(offset * io->regspacing));
> +}
> +
> +static unsigned char intf_mem_inw_2500(const struct si_sm_io *io,
> + unsigned int offset)
> +{
> + ctl_timeout_reset();
> +
> + return (readw((io->addr)+(offset * io->regspacing)) >> io->regshift)
> + & 0xff;
> +}
> +
> +static unsigned char intf_mem_inl_2500(const struct si_sm_io *io,
> + unsigned int offset)
> +{
> + ctl_timeout_reset();
> +
> + return (readl((io->addr)+(offset * io->regspacing)) >> io->regshift)
> + & 0xff;
> +}
> +
> +static unsigned char mem_inq_2500(const struct si_sm_io *io,
> + unsigned int offset)
> +{
> + ctl_timeout_reset();
> +
> + return (readq((io->addr)+(offset * io->regspacing)) >> io->regshift)
> + & 0xff;
> +}
> +#else
> +#define intf_mem_inb_2500 intf_mem_inb
> +#define intf_mem_inw_2500 intf_mem_inw
> +#define intf_mem_inl_2500 intf_mem_inl
> +#define mem_inq_2500 mem_inq
> +#endif
> +
> +static bool apply_phytium2500_workaround;
> +
> +#ifdef CONFIG_ACPI
> +struct ipmi_workaround_oem_info {
> + char oem_id[ACPI_OEM_ID_SIZE + 1];
> +};
> +
> +static struct ipmi_workaround_oem_info wa_info[] = {
> + {
> + .oem_id = "KPSVVJ",
> + }
> +};
> +
> +static void ipmi_check_phytium_workaround(void)
> +{
> + struct acpi_table_header tbl;
> + int i;
> +
> + if (ACPI_FAILURE(acpi_get_table_header(ACPI_SIG_DSDT, 0, &tbl)))
> + return;
> +
> + for (i = 0; i < ARRAY_SIZE(wa_info); i++) {
> + if (strncmp(wa_info[i].oem_id, tbl.oem_id, ACPI_OEM_ID_SIZE))
> + continue;
> +
> + apply_phytium2500_workaround = true;
> + break;
> + }
> +}
> +#else
> +static void ipmi_check_phytium_workaround(void)
> +{
> + apply_phytium2500_workaround = false;
> +}
> +#endif
> +
> static unsigned char intf_mem_inb(const struct si_sm_io *io,
> unsigned int offset)
> {
> @@ -81,26 +180,36 @@ int ipmi_si_mem_setup(struct si_sm_io *io)
> if (!addr)
> return -ENODEV;
>
> + ipmi_check_phytium_workaround();
> +
> /*
> * Figure out the actual readb/readw/readl/etc routine to use based
> * upon the register size.
> */
> switch (io->regsize) {
> case 1:
> - io->inputb = intf_mem_inb;
> + io->inputb = apply_phytium2500_workaround ?
> + intf_mem_inb_2500 :
> + intf_mem_inb;
> io->outputb = intf_mem_outb;
> break;
> case 2:
> - io->inputb = intf_mem_inw;
> + io->inputb = apply_phytium2500_workaround ?
> + intf_mem_inw_2500 :
> + intf_mem_inw;
> io->outputb = intf_mem_outw;
> break;
> case 4:
> - io->inputb = intf_mem_inl;
> + io->inputb = apply_phytium2500_workaround ?
> + intf_mem_inl_2500 :
> + intf_mem_inl;
> io->outputb = intf_mem_outl;
> break;
> #ifdef readq
> case 8:
> - io->inputb = mem_inq;
> + io->inputb = apply_phytium2500_workaround ?
> + mem_inq_2500 :
> + mem_inq;
> io->outputb = mem_outq;
> break;
> #endif
>
[PATCH openEuler-1.0-LTS] audit: bugfix for infinite loop when flush the hold queue
by Yang Yingliang 20 Jan '22
From: Cui GaoSheng <cuigaosheng1(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 186105, https://gitee.com/openeuler/kernel/issues/I4RGWS?from=project-issue
CVE: NA
-----------------------------------------------------------------
When "audit=1" is added to the cmdline and the audit_hold_queue stays
non-empty, flushing the hold queue falls into an infinite loop. Fix
this by stopping the flush of the hold queue when the netlink socket
is abnormal.
Signed-off-by: Cui GaoSheng <cuigaosheng1(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Reviewed-by: weiyang wang <wangweiyang2(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
kernel/audit.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/audit.c b/kernel/audit.c
index c5e034fe14bbb..3de5ebb945592 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -740,6 +740,8 @@ static int kauditd_send_queue(struct sock *sk, u32 portid,
if (!sk) {
if (err_hook)
(*err_hook)(skb);
+ if (queue == &audit_hold_queue)
+ goto out;
continue;
}
@@ -756,6 +758,8 @@ static int kauditd_send_queue(struct sock *sk, u32 portid,
(*err_hook)(skb);
if (rc == -EAGAIN)
rc = 0;
+ if (queue == &audit_hold_queue)
+ goto out;
/* continue to drain the queue */
continue;
} else
@@ -767,6 +771,7 @@ static int kauditd_send_queue(struct sock *sk, u32 portid,
}
}
+out:
return (rc >= 0 ? 0 : rc);
}
--
2.25.1
[PATCH openEuler-1.0-LTS] blk-throttle: enable hierarchical throttle in cgroup v1
by Yang Yingliang 19 Jan '22
From: Yu Kuai <yukuai3(a)huawei.com>
hulk inclusion
category: feature
bugzilla: 186072, https://gitee.com/openeuler/kernel/issues/I4RH0V
CVE: NA
-----------------------------------------------
The blkio subsystem is not under the default hierarchy in cgroup v1 by
default, which means io throttle configurations are only effective on
the current cgroup.
This patch introduces a new feature that enables the default hierarchy
for io throttle, so that configurations also take effect on child
cgroups.
The feature is disabled by default and can be enabled by adding
"blkcg_global_limit=1" (or "blkcg_global_limit=Y"/"blkcg_global_limit=y")
to the boot cmdline, as in the example below.
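A hypothetical cgroup v1 session (the device number, paths and limit
are illustrative only); with the feature enabled, a limit set on a
parent group also applies to its children:
  # boot with: ... blkcg_global_limit=1
  mkdir -p /sys/fs/cgroup/blkio/parent/child
  echo "8:0 1048576" > /sys/fs/cgroup/blkio/parent/blkio.throttle.write_bps_device
  echo $$ > /sys/fs/cgroup/blkio/parent/child/tasks   # writes now capped at 1MB/s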
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
block/blk-throttle.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index ceeb17104aa82..6b3ced8ae1a5f 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -43,6 +43,19 @@ static struct blkcg_policy blkcg_policy_throtl;
/* A workqueue to queue throttle related work */
static struct workqueue_struct *kthrotld_workqueue;
+/* True if global limit is enabled in cgroup v1 */
+static bool global_limit;
+
+static int __init setup_global_limit(char *str)
+{
+ if (!strcmp(str, "1") || !strcmp(str, "Y") || !strcmp(str, "y"))
+ global_limit = true;
+
+ return 1;
+}
+
+__setup("blkcg_global_limit=", setup_global_limit);
+
/*
* To implement hierarchical throttling, throtl_grps form a tree and bios
* are dispatched upwards level by level until they reach the top and get
@@ -537,7 +550,8 @@ static void throtl_pd_init(struct blkg_policy_data *pd)
* regardless of the position of the group in the hierarchy.
*/
sq->parent_sq = &td->service_queue;
- if (cgroup_subsys_on_dfl(io_cgrp_subsys) && blkg->parent)
+ if ((cgroup_subsys_on_dfl(io_cgrp_subsys) || global_limit) &&
+ blkg->parent)
sq->parent_sq = &blkg_to_tg(blkg->parent)->service_queue;
tg->td = td;
}
--
2.25.1
[PATCH openEuler-1.0-LTS] xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate
by Yang Yingliang 19 Jan '22
From: "Darrick J. Wong" <djwong(a)kernel.org>
mainline inclusion
from mainline-v5.16-rc5
commit 983d8e60f50806f90534cc5373d0ce867e5aaf79
category: bugfix
bugzilla: 186083
CVE: CVE-2021-4155
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
The old ALLOCSP/FREESP ioctls in XFS can be used to preallocate space at
the end of files, just like fallocate and RESVSP. Make the behavior
consistent with the other ioctls.
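For reference, a minimal userspace sketch of how this path is reached
(illustrative only, not part of the fix; it assumes the xfsprogs headers
for xfs_flock64_t and XFS_IOC_ALLOCSP64):

#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <xfs/xfs.h>	/* xfs_flock64_t, XFS_IOC_ALLOCSP64 (xfsprogs) */

/* Preallocate out to new_size; with this fix the blocks past the old
 * EOF are mapped unwritten, the same as fallocate would do. */
static int allocsp(int fd, off_t new_size)
{
	xfs_flock64_t fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_whence = SEEK_SET;	/* l_start is an absolute offset */
	fl.l_start = new_size;	/* also becomes the new file size */
	fl.l_len = 0;
	return ioctl(fd, XFS_IOC_ALLOCSP64, &fl);
}

Before this change, such a call could leave the newly allocated blocks
mapped as written, exposing stale disk contents; fallocate and RESVSP
already mapped them unwritten.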
Reported-by: Kirill Tkhai <ktkhai(a)virtuozzo.com>
Signed-off-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Reviewed-by: Dave Chinner <dchinner(a)redhat.com>
Reviewed-by: Eric Sandeen <sandeen(a)redhat.com>
Signed-off-by: Guo Xuenan <guoxuenan(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/xfs/xfs_ioctl.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 263436977c39e..b6d85da7f3508 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -701,7 +701,8 @@ xfs_ioc_space(
flags |= XFS_PREALLOC_CLEAR;
if (bf->l_start > XFS_ISIZE(ip)) {
error = xfs_alloc_file_space(ip, XFS_ISIZE(ip),
- bf->l_start - XFS_ISIZE(ip), 0);
+ bf->l_start - XFS_ISIZE(ip),
+ XFS_BMAPI_PREALLOC);
if (error)
goto out_unlock;
}
--
2.25.1
Hi Liu
Thanks for your submission. We have two comments in total on the
previously submitted AMD Milan V2 adaptation patches; please confirm
them.
1. Under Essential, 08ed77e414ab (v5.10-rc1, "perf vendor events amd:
Add recommended events") has not been merged.
2. 41cd02c6f7f6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
has two follow-up bugfixes that need confirmation:
new fix: 36fa06f9ff39 ("KVM: x86: Add support for RDPID without RDTSCP")
new fix: 8aec21c04caa ("KVM: VMX: Do not advertise RDPID if
ENABLE_RDTSCP control is unsupported")
BR
Laibin
On 2021/12/16 10:27, Jackie Liu wrote:
> Hi, ALL
>
> Attached are the AMD patches merged into the KYLIN kernel, rebased
> onto the openEuler branch. I haven't figured out the rules for
> creating the header, and this script doesn't seem suitable either,
> so I left it unchanged. Please check.
>
> --
> Jackie Liu
>
> On 2021/12/15 at 11:44 AM, Zheng Zengkai wrote:
>> #!/bin/sh
>> #
>> # Usage:
>> #
>> # prerequisite:
>> # The patches to add header should be patches from linux mainline.
>> #
>> # command:
>> # add-openeuler-patch-header.sh commit1..commit2 gitee_issue_url
>> #
>> # commit1: mainline start commit id;
>> # commit2: mainline end commit id
>> # gitee_issue_url: for example:
>> https://gitee.com/openeuler/kernel/issues/I4MKP4
>>
>> # Example:
>> # add-openeuler-patch-header.sh afd2ff9b7e1b..8bb7eca972a
>> https://gitee.com/openeuler/kernel/issues/I4MKP4
>> #
>>
>> for p in `ls *.patch`
>> do
>> if [ $p == "0000-cover-letter.patch" ]; then
>> continue
>> fi
>>
>> title=$(sed '/^Subject: \[PATCH .*\]/{N;s/\n / /g}' $p | grep
>> -E "^Subject: \[PATCH .*\]" | cut -d ']' -f 2)
>> commit=$(git log --pretty=oneline $1 | grep -F "$title" | cut
>> -d ' ' -f 1)
>> multi=$(echo "$commit" | sed -n '2p') # non-empty if several match
>> if [ -n "$multi" ]; then
>> echo "$p" has multi-commits
>> commit="multi"
>> continue
>> fi
>> if [ -z "$commit" ]; then
>> echo "$p" commit is null
>> commit="null"
>> continue
>> fi
>>
>> # find insert line
>> i=5
>> while true
>> do
>> ln=""$i"p"
>> sub=`cat $p | sed -n "$ln"`
>> i=$(($i+1))
>> if [ -z "$sub" ]; then
>> break
>> fi
>> done
>> i=$(($i-1))
>>
>> version=`git name-rev $commit | awk -F '[v~^]' '{print $2}'`
>>
>> sed ""$i"a mainline inclusion\nfrom
>> mainline-v$version\ncommit $commit\ncategory: feature\nbugzilla:
>> $2\nCVE: NA\n\n--------------------------------\n" -i $p
>> done
>>
>
[PATCH openEuler-1.0-LTS V2 00/90] Enabling AMD Milan series processor support
by Laibin Qiu 19 Jan '22
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MKP4
Babu Moger (1):
KVM: SVM: Clear the CR4 register on reset
David Edmondson (1):
KVM: x86: clflushopt should be treated as a no-op by emulation
Fenghua Yu (2):
x86/cpufeatures: Enumerate MOVDIRI instruction
x86/cpufeatures: Enumerate MOVDIR64B instruction
Haiyan Song (2):
perf vendor events intel: Add Icelake V1.00 event file
perf vendor events intel: Add Tremontx event file v1.02
Isaac Vaughn (1):
EDAC/amd64: Add PCI device IDs for family 17h, model 70h
Jan H. Schönherr (1):
x86/mce: Fix use of uninitialized MCE message string
Jim Mattson (1):
kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID
John Allen (2):
kvm/svm: PKU not currently supported
x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE
Kai Huang (3):
kvm: x86: Move kvm_set_mmio_spte_mask() from x86.c to mmu.c
kvm: x86: Fix reserved bits related calculation errors caused by MKTME
kvm: x86: Fix L1TF mitigation for shadow MMU
Kan Liang (1):
perf vendor events intel: Add uncore_upi JSON support
Kim Phillips (17):
x86/cpu/amd: Call init_amd_zn() om Family 19h processors too
perf/amd/uncore: Prepare L3 thread mask code for Family 19h
perf/amd/uncore: Make L3 thread mask code more readable
perf/amd/uncore: Add support for Family 19h L3 PMU
arch/x86/amd/ibs: Fix re-arming IBS Fetch
perf/x86/amd/ibs: Fix raw sample data accumulation
perf/amd/uncore: Set all slices and threads to restore perf stat -a
behaviour
perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count()
perf/amd/uncore: Prepare to scale for more attributes that vary per
family
perf/amd/uncore: Allow F17h user threadmask and slicemask
specification
perf/amd/uncore: Allow F19h user coreid, threadmask, and sliceid
specification
perf vendor events amd: Add L3 cache events for Family 17h
perf vendor events amd: Remove redundant '['
perf vendor events amd: Enable Family 19h users by matching Zen2
events
perf/x86/amd/ibs: Support 27-bit extended Op/cycle counter
tools/power turbostat: Support AMD Family 19h
perf vendor events amd: Add L2 Prefetch events for zen1
Krish Sadhukhan (1):
KVM: SVM: Replace hard-coded value with #define
Like Xu (1):
perf/x86/amd: Don't touch the AMD64_EVENTSEL_HOSTONLY bit inside the
guest
Liu Jingqi (2):
KVM: x86: expose MOVDIRI CPU feature into VM.
KVM: x86: expose MOVDIR64B CPU feature into VM.
Maciej S. Szmigiero (1):
KVM: mmu: Fix SPTE encoding of MMIO generation upper half
Marcel Bocu (1):
x86/amd_nb: Add PCI device IDs for family 17h, model 70h
Martin Liška (1):
perf vendor events amd: perf PMU events for AMD Family 17h
Nathan Chancellor (1):
perf/amd/uncore: Fix sysfs type mismatch
Paolo Bonzini (3):
KVM: x86: only do L1TF workaround on affected processors
KVM: x86: assign two bits to track SPTE kinds
KVM: x86: fix overlap between SPTE_MMIO_MASK and generation
Rasmus Villemoes (1):
build_bug.h: add wrapper for _Static_assert
Sean Christopherson (13):
KVM: nVMX: Allocate and configure VM{READ,WRITE} bitmaps iff
enable_shadow_vmcs
KVM: x86: Add requisite includes to kvm_cache_regs.h
KVM: x86: Add requisite includes to hyperv.h
KVM: x86: Use a u64 when passing the MMIO gen around
KVM: Explicitly define the "memslot update in-progress" bit
KVM: x86: Refactor the MMIO SPTE generation handling
KVM: x86: Rename access permissions cache member in struct
kvm_vcpu_arch
KVM: x86/mmu: Add explicit access mask for MMIO SPTEs
KVM: x86/mmu: Consolidate "is MMIO SPTE" code
KVM: x86/mmu: Apply max PA check for MMIO sptes to 32-bit KVM
KVM: x86/mmu: Set mmio_value to '0' if reserved #PF can't be generated
KVM: Remove the hack to trigger memslot generation wraparound
KVM: Move the memslot update in-progress flag to bit 63
Sebastian Andrzej Siewior (3):
x86/pkeys: Provide *pkru() helpers
x86/fpu: Only write PKRU if it is different from current
x86/pkeys: Don't check if PKRU is zero before writing it
Tom Lendacky (1):
KVM: SVM: Override default MMIO mask if memory encryption is enabled
Vijay Thakkar (3):
perf vendor events amd: Restrict model detection for zen1 based
processors
perf vendor events amd: Add Zen2 events
perf vendor events amd: Update Zen1 events to V2
Woods, Brian (2):
hwmon/k10temp, x86/amd_nb: Consolidate shared device IDs
x86/amd_nb: Add PCI device IDs for family 17h, model 30h
Yazen Ghannam (24):
EDAC/amd64: Drop some family checks for newer systems
x86/amd_nb: Add Family 19h PCI IDs
EDAC/mce_amd: Always load on SMCA systems
x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types
x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units
x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA
bank types
x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType
EDAC/amd64: Use a macro for iterating over Unified Memory Controllers
EDAC/amd64: Support more than two controllers for chip selects
handling
EDAC/amd64: Initialize DIMM info for systems with more than two
channels
EDAC/amd64: Add Family 17h Model 30h PCI IDs
EDAC/amd64: Support more than two Unified Memory Controllers
EDAC/amd64: Set maximum channel layer size depending on family
EDAC/amd64: Recognize x16 symbol size
EDAC/amd64: Adjust printed chip select sizes when interleaved
EDAC/amd64: Find Chip Select memory size using Address Mask
EDAC/amd64: Cache secondary Chip Select registers
EDAC/amd64: Support asymmetric dual-rank DIMMs
EDAC/amd64: Set grain per DIMM
EDAC/amd64: Make struct amd64_family_type global
EDAC/amd64: Gather hardware information early
EDAC/amd64: Save max number of controllers to family type
EDAC/amd64: Add family ops for Family 19h Models 00h-0Fh
EDAC/amd64: Handle three rank interleaving mode
Documentation/virtual/kvm/mmu.txt | 13 +-
arch/x86/events/amd/ibs.c | 93 +-
arch/x86/events/amd/uncore.c | 179 ++--
arch/x86/events/perf_event.h | 3 +-
arch/x86/include/asm/cpufeatures.h | 4 +-
arch/x86/include/asm/kvm_host.h | 10 +-
arch/x86/include/asm/mce.h | 7 +
arch/x86/include/asm/microcode_amd.h | 2 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/perf_event.h | 16 +-
arch/x86/include/asm/pgtable.h | 2 +-
arch/x86/include/asm/special_insns.h | 19 +-
arch/x86/kernel/amd_nb.c | 15 +-
arch/x86/kernel/cpu/amd.c | 3 +-
arch/x86/kernel/cpu/mce/amd.c | 28 +-
arch/x86/kernel/cpu/mce/core.c | 4 +-
arch/x86/kvm/cpuid.c | 6 +-
arch/x86/kvm/emulate.c | 8 +-
arch/x86/kvm/hyperv.h | 2 +
arch/x86/kvm/kvm_cache_regs.h | 2 +
arch/x86/kvm/mmu.c | 215 +++--
arch/x86/kvm/mmu.h | 2 +-
arch/x86/kvm/svm.c | 53 +-
arch/x86/kvm/vmx.c | 53 +-
arch/x86/kvm/x86.c | 33 +-
arch/x86/kvm/x86.h | 4 +-
arch/x86/mm/pkeys.c | 7 -
drivers/edac/amd64_edac.c | 609 ++++++++----
drivers/edac/amd64_edac.h | 31 +-
drivers/edac/mce_amd.c | 134 ++-
drivers/hwmon/k10temp.c | 9 +-
include/linux/build_bug.h | 19 +
include/linux/kvm_host.h | 21 +
include/linux/pci_ids.h | 5 +
.../pmu-events/arch/x86/amdzen1/branch.json | 23 +
.../pmu-events/arch/x86/amdzen1/cache.json | 312 ++++++
.../pmu-events/arch/x86/amdzen1/core.json | 125 +++
.../arch/x86/amdzen1/floating-point.json | 224 +++++
.../pmu-events/arch/x86/amdzen1/memory.json | 184 ++++
.../pmu-events/arch/x86/amdzen1/other.json | 56 ++
.../pmu-events/arch/x86/amdzen2/branch.json | 52 +
.../pmu-events/arch/x86/amdzen2/cache.json | 338 +++++++
.../pmu-events/arch/x86/amdzen2/core.json | 130 +++
.../arch/x86/amdzen2/floating-point.json | 140 +++
.../pmu-events/arch/x86/amdzen2/memory.json | 341 +++++++
.../pmu-events/arch/x86/amdzen2/other.json | 115 +++
.../pmu-events/arch/x86/icelake/cache.json | 552 +++++++++++
.../arch/x86/icelake/floating-point.json | 102 ++
.../pmu-events/arch/x86/icelake/frontend.json | 424 +++++++++
.../pmu-events/arch/x86/icelake/memory.json | 410 ++++++++
.../pmu-events/arch/x86/icelake/other.json | 121 +++
.../pmu-events/arch/x86/icelake/pipeline.json | 892 ++++++++++++++++++
.../arch/x86/icelake/virtual-memory.json | 236 +++++
tools/perf/pmu-events/arch/x86/mapfile.csv | 6 +
.../pmu-events/arch/x86/tremontx/cache.json | 111 +++
.../arch/x86/tremontx/frontend.json | 26 +
.../pmu-events/arch/x86/tremontx/memory.json | 26 +
.../pmu-events/arch/x86/tremontx/other.json | 26 +
.../arch/x86/tremontx/pipeline.json | 111 +++
.../arch/x86/tremontx/uncore-memory.json | 73 ++
.../arch/x86/tremontx/uncore-other.json | 431 +++++++++
.../arch/x86/tremontx/uncore-power.json | 11 +
.../arch/x86/tremontx/virtual-memory.json | 86 ++
tools/perf/pmu-events/jevents.c | 2 +
tools/power/x86/turbostat/turbostat.c | 34 +-
virt/kvm/kvm_main.c | 36 +-
66 files changed, 6839 insertions(+), 529 deletions(-)
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/branch.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/core.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/floating-point.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/other.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/branch.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/core.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/floating-point.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/other.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/floating-point.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/frontend.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/other.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/pipeline.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/virtual-memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/frontend.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/other.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/pipeline.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-other.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-power.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/virtual-memory.json
--
2.22.0
openEuler inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OFDI
CVE: NA
--------------------------------
Chao Liu (2):
change arm64 configs
change x86 configs
arch/arm64/configs/openeuler_defconfig | 288 ++++++++-----
arch/x86/configs/openeuler_defconfig | 544 +++++++++----------------
2 files changed, 376 insertions(+), 456 deletions(-)
--
2.27.0
openEuler inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OFDI
CVE: NA
--------------------------------
Chao Liu (2):
change arm64 configs
change x86 configs
arch/arm64/configs/openeuler_defconfig | 287 ++++++++-----
arch/x86/configs/openeuler_defconfig | 544 +++++++++----------------
2 files changed, 375 insertions(+), 456 deletions(-)
--
2.27.0
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O31I?from=project-issue
CVE: NA
------------------------------
Register pmem on arm64:
Use memmap (memmap=nn[KMG]!ss[KMG]) to reserve memory for pmem and
register it as persistent memory on arm64. When the kernel restarts
or is updated, the data in PMEM is not lost and can be loaded faster.
This is a generic feature. Currently, only one pmem region can be
registered and used.
To use this feature, do the following:
1. Reserve memory: add memmap to the kernel cmdline in grub.cfg,
   memmap=nn[KMG]!ss[KMG], e.g. memmap=100K!0x1a0000000.
2. Load the module: modprobe nd_e820.
3. Check the pmem device in /dev, e.g. /dev/pmem0, as in the example
   below.
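A possible end-to-end session (all values and the device name are
illustrative only; the device name depends on registration order):
  # in grub.cfg, appended to the kernel cmdline: memmap=1G!0x100000000
  modprobe nd_e820
  mkfs.ext4 /dev/pmem0
  mount -o dax /dev/pmem0 /mnt/pmem   # contents persist across a reboot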
drivers/nvdimm/e820.c:
This file scans "iomem_resource" and takes advantage of the nvdimm
resource discovery mechanism by registering a resource named
"Persistent Memory (legacy)"; this function does not depend on the
architecture.
We will push the feature to the upstream Linux kernel community and
discuss renaming the file there, because people have the mistaken
notion that e820.c depends on x86.
Signed-off-by: Zhuling <zhuling8(a)huawei.com>
---
arch/arm64/Kconfig | 12 ++++++++++++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/pmem.c | 34 ++++++++++++++++++++++++++++++++++
drivers/nvdimm/Kconfig | 7 +++++++
drivers/nvdimm/Makefile | 1 +
5 files changed, 55 insertions(+)
create mode 100644 arch/arm64/kernel/pmem.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 2cab963..105221c 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1325,6 +1325,18 @@ config RODATA_FULL_DEFAULT_ENABLED
This requires the linear region to be mapped down to pages,
which may adversely affect performance in some cases.
+config ARM64_PMEM_LEGACY_DEVICE
+ bool "Create persistent storage"
+ depends on BLK_DEV
+ depends on LIBNVDIMM
+ select ARM64_PMEM_RESERVE
+ help
+ Use reserved memory for persistent storage when the kernel
+ restart or update. the data in PMEM will not be lost and
+ can be loaded faster.
+
+ Say y if unsure.
+
config ARM64_SW_TTBR0_PAN
bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
help
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 169d90f..f615325 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -68,6 +68,7 @@ obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_ARM64_MTE) += mte.o
obj-$(CONFIG_MPAM) += mpam/
+obj-$(CONFIG_ARM64_PMEM_LEGACY_DEVICE) += pmem.o
obj-y += vdso/ probes/
obj-$(CONFIG_COMPAT_VDSO) += vdso32/
diff --git a/arch/arm64/kernel/pmem.c b/arch/arm64/kernel/pmem.c
new file mode 100644
index 0000000..2312ac0
--- /dev/null
+++ b/arch/arm64/kernel/pmem.c
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arm64 implement PMEM.
+ * based on arch/x86/kernel/pmem.c
+ */
+#include <linux/platform_device.h>
+#include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/module.h>
+
+static int found(struct resource *res, void *data)
+{
+ return 1;
+}
+
+static int __init register_e820_pmem(void)
+{
+ struct platform_device *pdev;
+ int rc;
+
+ rc = walk_iomem_res_desc(IORES_DESC_PERSISTENT_MEMORY_LEGACY,
+ IORESOURCE_MEM, 0, -1, NULL, found);
+ if (rc <= 0)
+ return 0;
+
+ /*
+ * See drivers/nvdimm/e820.c for the implementation, this is
+ * simply here to trigger the module to load on demand.
+ */
+ pdev = platform_device_alloc("e820_pmem", -1);
+
+ return platform_device_add(pdev);
+}
+device_initcall(register_e820_pmem);
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index b7d1eb3..ce5d4fd 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -132,3 +132,10 @@ config NVDIMM_TEST_BUILD
infrastructure.
endif
+
+config PMEM_LEGACY
+ tristate "Pmem_legacy"
+ depends on X86 || ARM64
+ select X86_PMEM_LEGACY if X86
+ select ARM64_PMEM_LEGACY_DEVICE if ARM64
+
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 0407753..6f8dc92 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
+obj-$(CONFIG_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_OF_PMEM) += of_pmem.o
obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
--
2.9.5
From: Liu Shixin <liushixin2(a)huawei.com>
hulk inclusion
category: feature
bugzilla: 46904, https://gitee.com/openeuler/kernel/issues/I4QSHG
CVE: NA
--------------------------------
There are several functions that will be used in subsequent patches
for the dynamic hugetlb feature. Declare them.
No functional changes.
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
include/linux/hugetlb.h | 3 +++
include/linux/memcontrol.h | 16 ++++++++++++++++
include/linux/memory_hotplug.h | 6 ++++++
mm/hugetlb.c | 2 +-
mm/internal.h | 3 +++
mm/memcontrol.c | 16 ----------------
mm/memory_hotplug.c | 3 +--
mm/page_alloc.c | 6 +++---
8 files changed, 33 insertions(+), 22 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 61c38e6c6c43..a1135c43719e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -471,6 +471,9 @@ static inline struct hstate *hstate_inode(struct inode *i)
{
return HUGETLBFS_SB(i->i_sb)->hstate;
}
+
+bool prep_compound_gigantic_page(struct page *page, unsigned int order);
+
#else /* !CONFIG_HUGETLBFS */
#define is_file_hugepages(file) false
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e5826f1ff337..2e0a480a8665 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1239,6 +1239,22 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
gfp_t gfp_mask,
unsigned long *total_scanned);
+/*
+ * Test whether @memcg has children, dead or alive. Note that this
+ * function doesn't care whether @memcg has use_hierarchy enabled and
+ * returns %true if there are child csses according to the cgroup
+ * hierarchy. Testing use_hierarchy is the caller's responsibility.
+ */
+static inline bool memcg_has_children(struct mem_cgroup *memcg)
+{
+ bool ret;
+
+ rcu_read_lock();
+ ret = css_next_child(NULL, &memcg->css);
+ rcu_read_unlock();
+ return ret;
+}
+
#else /* CONFIG_MEMCG */
#define MEM_CGROUP_ID_SHIFT 0
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index c60bda5cbb17..b9aeabcce49a 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -284,6 +284,7 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
#ifdef CONFIG_MEMORY_HOTREMOVE
+extern int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
extern void try_offline_node(int nid);
extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
extern int remove_memory(int nid, u64 start, u64 size);
@@ -291,6 +292,11 @@ extern void __remove_memory(int nid, u64 start, u64 size);
extern int offline_and_remove_memory(int nid, u64 start, u64 size);
#else
+static inline int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+ return -ENOSYS;
+}
+
static inline void try_offline_node(int nid) {}
static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 219bf083dc8a..fa3cba3571cc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1633,7 +1633,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
spin_unlock_irq(&hugetlb_lock);
}
-static bool prep_compound_gigantic_page(struct page *page, unsigned int order)
+bool prep_compound_gigantic_page(struct page *page, unsigned int order)
{
int i, j;
int nr_pages = 1 << order;
diff --git a/mm/internal.h b/mm/internal.h
index db9546707695..31517354f3c7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -197,6 +197,9 @@ extern void __free_pages_core(struct page *page, unsigned int order);
extern void prep_compound_page(struct page *page, unsigned int order);
extern void post_alloc_hook(struct page *page, unsigned int order,
gfp_t gfp_flags);
+extern void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
+ unsigned int alloc_flags);
+extern bool free_pages_prepare(struct page *page, unsigned int order, bool check_free);
extern int user_min_free_kbytes;
extern void zone_pcp_update(struct zone *zone);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8acf7ff56294..011aff396af2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3402,22 +3402,6 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
return nr_reclaimed;
}
-/*
- * Test whether @memcg has children, dead or alive. Note that this
- * function doesn't care whether @memcg has use_hierarchy enabled and
- * returns %true if there are child csses according to the cgroup
- * hierarchy. Testing use_hierarchy is the caller's responsibility.
- */
-static inline bool memcg_has_children(struct mem_cgroup *memcg)
-{
- bool ret;
-
- rcu_read_lock();
- ret = css_next_child(NULL, &memcg->css);
- rcu_read_unlock();
- return ret;
-}
-
/*
* Reclaims as many pages from the given memcg as possible.
*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1549b19b36f6..73ea92dae74a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1165,8 +1165,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
return 0;
}
-static int
-do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
{
unsigned long pfn;
struct page *page, *head;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 988051bf6795..0ff4f4e3a538 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1203,7 +1203,7 @@ static void kernel_init_free_pages(struct page *page, int numpages)
kasan_enable_current();
}
-static __always_inline bool free_pages_prepare(struct page *page,
+__always_inline bool free_pages_prepare(struct page *page,
unsigned int order, bool check_free)
{
int bad = 0;
@@ -2283,8 +2283,8 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
set_page_owner(page, order, gfp_flags);
}
-static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
- unsigned int alloc_flags)
+void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
+ unsigned int alloc_flags)
{
post_alloc_hook(page, order, gfp_flags);
--
2.20.1
From: Matteo Croce <mcroce(a)microsoft.com>
mainline inclusion
category: feature
from mainline-v5.11
commit 2c622ed0eaa38b68d7440bedb8c6cdd138b5a860
bugzilla: https://gitee.com/openeuler/kernel/issues/I4BULZ
CVE: NA
--------------------------------
The kernel cmdline reboot= option offers some sort of control on how the
reboot is issued.
We don't always know in advance what type of reboot to perform.
Sometimes a warm reboot is preferred to persist certain memory regions
across the reboot. Other times a cold one is needed to apply a future
system update that makes a memory model change, like changing the base
page size or resizing a persistent memory region.
Or simply we want to enable reboot_force because we noticed that
something bad happened.
Add handles in sysfs to allow setting these reboot options, so they can
be changed when the system is booted, other than at boot time.
The handlers are under <sysfs>/kernel/reboot, can be read to get the
current configuration and written to alter it.
# cd /sys/kernel/reboot/
# grep . *
cpu:0
force:0
mode:cold
type:acpi
# echo 2 >cpu
# echo yes >force
# echo soft >mode
# echo bios >type
# grep . *
cpu:2
force:1
mode:soft
type:bios
Before setting anything, check for CAP_SYS_BOOT capability, so it's
possible to allow an unpriviledged process to change these settings simply
by relaxing the handles permissions, without opening them to the world.
[natechancellor(a)gmail.com: fix variable assignments in type_store]
Link: https://lkml.kernel.org/r/20201112035023.974748-1-natechancellor@gmail.com
Link: https://github.com/ClangBuiltLinux/linux/issues/1197
Link: https://lkml.kernel.org/r/20201110202746.9690-1-mcroce@linux.microsoft.com
Signed-off-by: Matteo Croce <mcroce(a)microsoft.com>
Signed-off-by: Nathan Chancellor <natechancellor(a)gmail.com>
Reviewed-by: Petr Mladek <pmladek(a)suse.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Guenter Roeck <linux(a)roeck-us.net>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Tyler Hicks <tyhicks(a)linux.microsoft.com>
Cc: Nathan Chancellor <natechancellor(a)gmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Li Hongyu <543306408(a)qq.com>
---
Documentation/ABI/testing/sysfs-kernel-reboot | 32 +++
kernel/reboot.c | 206 ++++++++++++++++++
2 files changed, 238 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-kernel-reboot
diff --git a/Documentation/ABI/testing/sysfs-kernel-reboot b/Documentation/ABI/testing/sysfs-kernel-reboot
new file mode 100644
index 000000000000..837330fb2511
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-reboot
@@ -0,0 +1,32 @@
+What: /sys/kernel/reboot
+Date: November 2020
+KernelVersion: 5.11
+Contact: Matteo Croce <mcroce(a)microsoft.com>
+Description: Interface to set the kernel reboot behavior, similarly to
+ what can be done via the reboot= cmdline option.
+ (see Documentation/admin-guide/kernel-parameters.txt)
+
+What: /sys/kernel/reboot/mode
+Date: November 2020
+KernelVersion: 5.11
+Contact: Matteo Croce <mcroce(a)microsoft.com>
+Description: Reboot mode. Valid values are: cold warm hard soft gpio
+
+What: /sys/kernel/reboot/type
+Date: November 2020
+KernelVersion: 5.11
+Contact: Matteo Croce <mcroce(a)microsoft.com>
+Description: Reboot type. Valid values are: bios acpi kbd triple efi pci
+
+What: /sys/kernel/reboot/cpu
+Date: November 2020
+KernelVersion: 5.11
+Contact: Matteo Croce <mcroce(a)microsoft.com>
+Description: CPU number to use to reboot.
+
+What: /sys/kernel/reboot/force
+Date: November 2020
+KernelVersion: 5.11
+Contact: Matteo Croce <mcroce(a)microsoft.com>
+Description: Don't wait for any other CPUs on reboot and
+ avoid anything that could hang.
diff --git a/kernel/reboot.c b/kernel/reboot.c
index aa3bfd6c673b..940cbb784e17 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -600,3 +600,209 @@ static int __init reboot_setup(char *str)
return 1;
}
__setup("reboot=", reboot_setup);
+
+#ifdef CONFIG_SYSFS
+
+#define REBOOT_COLD_STR "cold"
+#define REBOOT_WARM_STR "warm"
+#define REBOOT_HARD_STR "hard"
+#define REBOOT_SOFT_STR "soft"
+#define REBOOT_GPIO_STR "gpio"
+#define REBOOT_UNDEFINED_STR "undefined"
+
+#define BOOT_TRIPLE_STR "triple"
+#define BOOT_KBD_STR "kbd"
+#define BOOT_BIOS_STR "bios"
+#define BOOT_ACPI_STR "acpi"
+#define BOOT_EFI_STR "efi"
+#define BOOT_CF9_FORCE_STR "cf9_force"
+#define BOOT_CF9_SAFE_STR "cf9_safe"
+
+static ssize_t mode_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+ const char *val;
+
+ switch (reboot_mode) {
+ case REBOOT_COLD:
+ val = REBOOT_COLD_STR;
+ break;
+ case REBOOT_WARM:
+ val = REBOOT_WARM_STR;
+ break;
+ case REBOOT_HARD:
+ val = REBOOT_HARD_STR;
+ break;
+ case REBOOT_SOFT:
+ val = REBOOT_SOFT_STR;
+ break;
+ case REBOOT_GPIO:
+ val = REBOOT_GPIO_STR;
+ break;
+ default:
+ val = REBOOT_UNDEFINED_STR;
+ }
+
+ return sprintf(buf, "%s\n", val);
+}
+static ssize_t mode_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ if (!capable(CAP_SYS_BOOT))
+ return -EPERM;
+
+ if (!strncmp(buf, REBOOT_COLD_STR, strlen(REBOOT_COLD_STR)))
+ reboot_mode = REBOOT_COLD;
+ else if (!strncmp(buf, REBOOT_WARM_STR, strlen(REBOOT_WARM_STR)))
+ reboot_mode = REBOOT_WARM;
+ else if (!strncmp(buf, REBOOT_HARD_STR, strlen(REBOOT_HARD_STR)))
+ reboot_mode = REBOOT_HARD;
+ else if (!strncmp(buf, REBOOT_SOFT_STR, strlen(REBOOT_SOFT_STR)))
+ reboot_mode = REBOOT_SOFT;
+ else if (!strncmp(buf, REBOOT_GPIO_STR, strlen(REBOOT_GPIO_STR)))
+ reboot_mode = REBOOT_GPIO;
+ else
+ return -EINVAL;
+
+ return count;
+}
+static struct kobj_attribute reboot_mode_attr = __ATTR_RW(mode);
+
+static ssize_t type_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+ const char *val;
+
+ switch (reboot_type) {
+ case BOOT_TRIPLE:
+ val = BOOT_TRIPLE_STR;
+ break;
+ case BOOT_KBD:
+ val = BOOT_KBD_STR;
+ break;
+ case BOOT_BIOS:
+ val = BOOT_BIOS_STR;
+ break;
+ case BOOT_ACPI:
+ val = BOOT_ACPI_STR;
+ break;
+ case BOOT_EFI:
+ val = BOOT_EFI_STR;
+ break;
+ case BOOT_CF9_FORCE:
+ val = BOOT_CF9_FORCE_STR;
+ break;
+ case BOOT_CF9_SAFE:
+ val = BOOT_CF9_SAFE_STR;
+ break;
+ default:
+ val = REBOOT_UNDEFINED_STR;
+ }
+
+ return sprintf(buf, "%s\n", val);
+}
+static ssize_t type_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ if (!capable(CAP_SYS_BOOT))
+ return -EPERM;
+
+ if (!strncmp(buf, BOOT_TRIPLE_STR, strlen(BOOT_TRIPLE_STR)))
+ reboot_type = BOOT_TRIPLE;
+ else if (!strncmp(buf, BOOT_KBD_STR, strlen(BOOT_KBD_STR)))
+ reboot_type = BOOT_KBD;
+ else if (!strncmp(buf, BOOT_BIOS_STR, strlen(BOOT_BIOS_STR)))
+ reboot_type = BOOT_BIOS;
+ else if (!strncmp(buf, BOOT_ACPI_STR, strlen(BOOT_ACPI_STR)))
+ reboot_type = BOOT_ACPI;
+ else if (!strncmp(buf, BOOT_EFI_STR, strlen(BOOT_EFI_STR)))
+ reboot_type = BOOT_EFI;
+ else if (!strncmp(buf, BOOT_CF9_FORCE_STR, strlen(BOOT_CF9_FORCE_STR)))
+ reboot_type = BOOT_CF9_FORCE;
+ else if (!strncmp(buf, BOOT_CF9_SAFE_STR, strlen(BOOT_CF9_SAFE_STR)))
+ reboot_type = BOOT_CF9_SAFE;
+ else
+ return -EINVAL;
+
+ return count;
+}
+static struct kobj_attribute reboot_type_attr = __ATTR_RW(type);
+
+static ssize_t cpu_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%d\n", reboot_cpu);
+}
+static ssize_t cpu_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ unsigned int cpunum;
+ int rc;
+
+ if (!capable(CAP_SYS_BOOT))
+ return -EPERM;
+
+ rc = kstrtouint(buf, 0, &cpunum);
+
+ if (rc)
+ return rc;
+
+ if (cpunum >= num_possible_cpus())
+ return -ERANGE;
+
+ reboot_cpu = cpunum;
+
+ return count;
+}
+static struct kobj_attribute reboot_cpu_attr = __ATTR_RW(cpu);
+
+static ssize_t force_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%d\n", reboot_force);
+}
+static ssize_t force_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ bool res;
+
+ if (!capable(CAP_SYS_BOOT))
+ return -EPERM;
+
+ if (kstrtobool(buf, &res))
+ return -EINVAL;
+
+ reboot_force = res;
+
+ return count;
+}
+static struct kobj_attribute reboot_force_attr = __ATTR_RW(force);
+
+static struct attribute *reboot_attrs[] = {
+ &reboot_mode_attr.attr,
+ &reboot_type_attr.attr,
+ &reboot_cpu_attr.attr,
+ &reboot_force_attr.attr,
+ NULL,
+};
+
+static const struct attribute_group reboot_attr_group = {
+ .attrs = reboot_attrs,
+};
+
+static int __init reboot_ksysfs_init(void)
+{
+ struct kobject *reboot_kobj;
+ int ret;
+
+ reboot_kobj = kobject_create_and_add("reboot", kernel_kobj);
+ if (!reboot_kobj)
+ return -ENOMEM;
+
+ ret = sysfs_create_group(reboot_kobj, &reboot_attr_group);
+ if (ret) {
+ kobject_put(reboot_kobj);
+ return ret;
+ }
+
+ return 0;
+}
+late_initcall(reboot_ksysfs_init);
+
+#endif
--
2.17.1
mainline inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4FHN2?from=project-issue
CVE: NA
---------------------------
In modern systems it's not unusual to have a system component monitoring
memory conditions of the system and tasked with keeping system memory
pressure under control. One way to accomplish that is to kill
non-essential processes to free up memory for more important ones.
Examples of this are Facebook's OOM killer daemon called oomd and
Android's low memory killer daemon called lmkd.
For such system component it's important to be able to free memory quickly
and efficiently. Unfortunately the time a process takes to free up its
memory after receiving a SIGKILL might vary based on the state of the
process (uninterruptible sleep), its size, and the OPP level of the core
the process is running on. A mechanism to free the resources of the
target process in a more predictable way would improve the system's
ability to control its
memory pressure.
Introduce process_mrelease system call that releases memory of a dying
process from the context of the caller. This way the memory is freed in a
more controllable way with CPU affinity and priority of the caller. The
workload of freeing the memory will also be charged to the caller. The
operation is allowed only on a dying process.
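As a usage sketch (not part of this patch): a userspace supervisor would
typically kill the target through its pidfd and then reap its memory from its
own context. __NR_process_mrelease is 441 per the syscall tables below; the
SYS_pidfd_open and SYS_pidfd_send_signal numbers from a recent glibc are
assumed.

#include <sys/syscall.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef __NR_process_mrelease
#define __NR_process_mrelease 441       /* per the syscall tables below */
#endif

int main(int argc, char **argv)
{
        int pidfd;

        if (argc < 2)
                return 1;
        pidfd = syscall(SYS_pidfd_open, atoi(argv[1]), 0);
        if (pidfd < 0)
                return 1;
        /* the target must be dying, so deliver SIGKILL first */
        if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0))
                return 1;
        /* reap the victim's memory with the caller's CPU affinity/priority */
        if (syscall(__NR_process_mrelease, pidfd, 0))
                perror("process_mrelease");
        close(pidfd);
        return 0;
}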
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Christian Brauner <christian.brauner(a)ubuntu.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Christian Brauner <christian.brauner(a)ubuntu.com>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: Jan Engelhardt <jengelh(a)inai.de>
Cc: Tim Murray <timmurray(a)google.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Wen Zhiwei <wenzhiwei(a)kylinos.cn>
---
arch/arm64/include/asm/unistd32.h | 3 +-
arch/x86/entry/syscalls/syscall_64.tbl | 2 +-
include/linux/syscalls.h | 3 ++
include/uapi/asm-generic/unistd.h | 5 +-
kernel/sys_ni.c | 1 +
mm/oom_kill.c | 72 +++++++++++++++++++++++++
tools/include/uapi/asm-generic/unistd.h | 4 +-
7 files changed, 86 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 107f08e03b9f..e1786b2e8551 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -889,7 +889,8 @@ __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
__SYSCALL(__NR_faccessat2, sys_faccessat2)
#define __NR_process_madvise 440
__SYSCALL(__NR_process_madvise, sys_process_madvise)
-
+#define __NR_process_mrelease 441
+__SYSCALL(__NR_process_mrelease, sys_process_mrelease)
/*
* Please add new compat syscalls above this comment and update
* __NR_compat_syscalls in asm/unistd.h.
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 379819244b91..6eec6496d72c 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -362,7 +362,7 @@
438 common pidfd_getfd sys_pidfd_getfd
439 common faccessat2 sys_faccessat2
440 common process_madvise sys_process_madvise
-
+441 common process_mrelease sys_process_mrelease
#
# Due to a historical design error, certain syscalls are numbered differently
# in x32 as compared to native x86_64. These syscalls have numbers 512-547.
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index aea0ce9f3b74..764bfcdbbcb0 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -905,6 +905,9 @@ asmlinkage long sys_mincore(unsigned long start, size_t len,
asmlinkage long sys_madvise(unsigned long start, size_t len, int behavior);
asmlinkage long sys_process_madvise(int pidfd, const struct iovec __user *vec,
size_t vlen, int behavior, unsigned int flags);
+
+asmlinkage long sys_process_mrelease(int pidfd, unsigned int flags);
+
asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size,
unsigned long prot, unsigned long pgoff,
unsigned long flags);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 2056318988f7..e5f38ea960c4 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -859,9 +859,12 @@ __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
__SYSCALL(__NR_faccessat2, sys_faccessat2)
#define __NR_process_madvise 440
__SYSCALL(__NR_process_madvise, sys_process_madvise)
+#define __NR_process_mrelease 441
+__SYSCALL(__NR_process_mrelease, sys_process_mrelease)
#undef __NR_syscalls
-#define __NR_syscalls 441
+#define __NR_syscalls 442
+
/*
* 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index f27ac94d5fa7..6b8203edf531 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -281,6 +281,7 @@ COND_SYSCALL(munlockall);
COND_SYSCALL(mincore);
COND_SYSCALL(madvise);
COND_SYSCALL(process_madvise);
+COND_SYSCALL(process_mrelease);
COND_SYSCALL(remap_file_pages);
COND_SYSCALL(mbind);
COND_SYSCALL_COMPAT(mbind);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index fb39b0902476..d169eb511518 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -43,6 +43,7 @@
#include <linux/kthread.h>
#include <linux/init.h>
#include <linux/mmu_notifier.h>
+#include <linux/syscalls.h>
#include <asm/tlb.h>
#include "internal.h"
@@ -1186,3 +1187,74 @@ void pagefault_out_of_memory(void)
out_of_memory(&oc);
mutex_unlock(&oom_lock);
}
+
+SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
+{
+#ifdef CONFIG_MMU
+ struct mm_struct *mm = NULL;
+ struct task_struct *task;
+ struct task_struct *p;
+ unsigned int f_flags;
+ bool reap = false;
+ struct pid *pid;
+ long ret = 0;
+
+ if (flags)
+ return -EINVAL;
+
+ pid = pidfd_get_pid(pidfd, &f_flags);
+ if (IS_ERR(pid))
+ return PTR_ERR(pid);
+
+ task = get_pid_task(pid, PIDTYPE_TGID);
+ if (!task) {
+ ret = -ESRCH;
+ goto put_pid;
+ }
+
+ /*
+ * Make sure to choose a thread which still has a reference to mm
+ * during the group exit
+ */
+ p = find_lock_task_mm(task);
+ if (!p) {
+ ret = -ESRCH;
+ goto put_task;
+ }
+
+ if (mmget_not_zero(p->mm)) {
+ mm = p->mm;
+ if (task_will_free_mem(p))
+ reap = true;
+ else {
+ /* Error only if the work has not been done already */
+ if (!test_bit(MMF_OOM_SKIP, &mm->flags))
+ ret = -EINVAL;
+ }
+ }
+ task_unlock(p);
+
+ if (!reap)
+ goto drop_mm;
+
+ if (mmap_read_lock_killable(mm)) {
+ ret = -EINTR;
+ goto drop_mm;
+ }
+ if (!__oom_reap_task_mm(mm))
+ ret = -EAGAIN;
+ mmap_read_unlock(mm);
+
+drop_mm:
+ if (mm)
+ mmput(mm);
+put_task:
+ put_task_struct(task);
+put_pid:
+ put_pid(pid);
+ return ret;
+#else
+ return -ENOSYS;
+#endif /* CONFIG_MMU */
+}
+
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 2056318988f7..c252faa802d3 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -859,9 +859,11 @@ __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
__SYSCALL(__NR_faccessat2, sys_faccessat2)
#define __NR_process_madvise 440
__SYSCALL(__NR_process_madvise, sys_process_madvise)
+#define __NR_process_mrelease 441
+__SYSCALL(__NR_process_mrelease, sys_process_mrelease)
#undef __NR_syscalls
-#define __NR_syscalls 441
+#define __NR_syscalls 442
/*
* 32 bit systems traditionally used different
--
2.30.0
2
1
[PATCH OLK-5.10] crypto:rmd320 rmd128 rmd256-remove RIPE-MD 320 128 256 hash algorithm
by Wen Zhiwei 18 Jan '22
mainline inclusion
category: performance
bugzilla: NA
CVE: NA
-----------------------------------------------------------
RIPE-MD 128, 256 and 320 are never referenced anywhere in the kernel, and are
unlikely to be depended upon by userspace via AF_ALG. So let's remove them.
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Wen Zhiwei <wenzhiwei(a)kylinos.cn>
---
crypto/tcrypt.c | 42 +-----------------------------------------
1 file changed, 1 insertion(+), 41 deletions(-)
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 8609174e036e..c8b0d31631f5 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -71,7 +71,7 @@ static const char *check[] = {
"blowfish", "twofish", "serpent", "sha384", "sha512", "md4", "aes",
"cast6", "arc4", "michael_mic", "deflate", "crc32c", "tea", "xtea",
"khazad", "wp512", "wp384", "wp256", "tnepres", "xeta", "fcrypt",
- "camellia", "seed", "salsa20", "rmd128", "rmd160", "rmd256", "rmd320",
+ "camellia", "seed", "salsa20", "rmd160",
"lzo", "lzo-rle", "cts", "sha3-224", "sha3-256", "sha3-384",
"sha3-512", "streebog256", "streebog512",
NULL
@@ -1860,22 +1860,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
ret += tcrypt_test("cts(cbc(aes))");
break;
- case 39:
- ret += tcrypt_test("rmd128");
- break;
-
case 40:
ret += tcrypt_test("rmd160");
break;
- case 41:
- ret += tcrypt_test("rmd256");
- break;
-
- case 42:
- ret += tcrypt_test("rmd320");
- break;
-
case 43:
ret += tcrypt_test("ecb(seed)");
break;
@@ -1948,10 +1936,6 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
ret += tcrypt_test("xcbc(aes)");
break;
- case 107:
- ret += tcrypt_test("hmac(rmd128)");
- break;
-
case 108:
ret += tcrypt_test("hmac(rmd160)");
break;
@@ -2402,22 +2386,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
test_hash_speed("sha224", sec, generic_hash_speed_template);
if (mode > 300 && mode < 400) break;
fallthrough;
- case 314:
- test_hash_speed("rmd128", sec, generic_hash_speed_template);
- if (mode > 300 && mode < 400) break;
- fallthrough;
case 315:
test_hash_speed("rmd160", sec, generic_hash_speed_template);
if (mode > 300 && mode < 400) break;
fallthrough;
- case 316:
- test_hash_speed("rmd256", sec, generic_hash_speed_template);
- if (mode > 300 && mode < 400) break;
- fallthrough;
- case 317:
- test_hash_speed("rmd320", sec, generic_hash_speed_template);
- if (mode > 300 && mode < 400) break;
- fallthrough;
case 318:
klen = 16;
test_hash_speed("ghash", sec, generic_hash_speed_template);
@@ -2526,22 +2498,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
test_ahash_speed("sha224", sec, generic_hash_speed_template);
if (mode > 400 && mode < 500) break;
fallthrough;
- case 414:
- test_ahash_speed("rmd128", sec, generic_hash_speed_template);
- if (mode > 400 && mode < 500) break;
- fallthrough;
case 415:
test_ahash_speed("rmd160", sec, generic_hash_speed_template);
if (mode > 400 && mode < 500) break;
fallthrough;
- case 416:
- test_ahash_speed("rmd256", sec, generic_hash_speed_template);
- if (mode > 400 && mode < 500) break;
- fallthrough;
- case 417:
- test_ahash_speed("rmd320", sec, generic_hash_speed_template);
- if (mode > 400 && mode < 500) break;
- fallthrough;
case 418:
test_ahash_speed("sha3-224", sec, generic_hash_speed_template);
if (mode > 400 && mode < 500) break;
--
2.30.0
2
1
change arm64 and x86 configs
openEuler inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OFDI
CVE: NA
--------------------------------
Chao Liu (2):
change arm64 configs
change x86 configs
arch/arm64/configs/openeuler_defconfig | 287 ++++++++-----
arch/x86/configs/openeuler_defconfig | 540 +++++++++----------------
2 files changed, 374 insertions(+), 453 deletions(-)
--
2.27.0
1
2
18 Jan '22
From: Willem de Bruijn <willemb(a)google.com>
mainline inclusion
from mainline-v5.14
commit 8a0ed250f911da31a2aef52101bc707846a800ff
category: bugfix
bugzilla: NA
CVE: CVE-2021-39633
-------------------------------------------------
The GRE tunnel device can pull existing outer headers in ipgre_xmit.
This is a rare path, apparently unique to this device. The below
commit ensured that pulling does not move skb->data beyond csum_start.
But it has a false positive if ip_summed is not CHECKSUM_PARTIAL and
thus csum_start is irrelevant.
Refine to exclude this. At the same time simplify and strengthen the
test.
Simplify, by moving the check next to the offending pull, making it
more self-documenting and removing an unnecessary branch from other
code paths.
Strengthen, by also ensuring that the transport header is correct and
therefore the inner headers will be after skb_reset_inner_headers.
The transport header is set to csum_start in skb_partial_csum_set.
Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start")
Reported-by: Ido Schimmel <idosch(a)idosch.org>
Suggested-by: Alexander Duyck <alexander.duyck(a)gmail.com>
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Reviewed-by: Alexander Duyck <alexanderduyck(a)fb.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Huang Guobin <huangguobin4(a)huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
net/ipv4/ip_gre.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index a8a37d1128201..0c431fd4b1200 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -449,8 +449,6 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
static int gre_handle_offloads(struct sk_buff *skb, bool csum)
{
- if (csum && skb_checksum_start(skb) < skb->data)
- return -EINVAL;
return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE);
}
@@ -682,15 +680,20 @@ static netdev_tx_t ipgre_xmit(struct sk_buff *skb,
}
if (dev->header_ops) {
+ const int pull_len = tunnel->hlen + sizeof(struct iphdr);
+
if (skb_cow_head(skb, 0))
goto free_skb;
tnl_params = (const struct iphdr *)skb->data;
+ if (pull_len > skb_transport_offset(skb))
+ goto free_skb;
+
/* Pull skb since ip_tunnel_xmit() needs skb->data pointing
* to gre header.
*/
- skb_pull(skb, tunnel->hlen + sizeof(struct iphdr));
+ skb_pull(skb, pull_len);
skb_reset_mac_header(skb);
} else {
if (skb_cow_head(skb, dev->needed_headroom))
--
2.25.1
1
0
[PATCH openEuler-5.10 01/82] clocksource/arm_arch_timer: Add build-time guards for unhandled register accesses
by Zheng Zengkai 17 Jan '22
From: Marc Zyngier <maz(a)kernel.org>
mainline inclusion
from mainline-v5.16-rc1
commit 4775bc63f880
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4QCBG
CVE: NA
--------------------------
As we are about to change the registers that are used by the driver,
start by adding build-time checks to ensure that we always handle
all registers and access modes.
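The guard works through constant propagation: the accessors are always
inlined and called with compile-time-constant reg/access values, so any
unhandled combination leaves a call to the BUILD_BUG() stub in reachable code
and the build fails. A standalone sketch of the same mechanism (userspace
analogue, not kernel code; the gcc/clang "error" attribute and optimizations
enabled are assumed):

#include <stdio.h>

/* stands in for the kernel's BUILD_BUG(): a call to this that survives
 * dead-code elimination is a compile-time error */
extern void __buildbug(void) __attribute__((error("unhandled timer register")));
#define BUILD_BUG() __buildbug()

enum arch_timer_reg { ARCH_TIMER_REG_CTRL, ARCH_TIMER_REG_TVAL, ARCH_TIMER_REG_CVAL };

static inline __attribute__((always_inline))
unsigned int reg_read(enum arch_timer_reg reg)
{
        switch (reg) {
        case ARCH_TIMER_REG_CTRL: return 1;     /* stands in for an mrc/mrs */
        case ARCH_TIMER_REG_TVAL: return 2;
        default: BUILD_BUG();                   /* caught at build time */
        }
        return 0;
}

int main(void)
{
        printf("%u\n", reg_read(ARCH_TIMER_REG_CTRL));  /* compiles */
        /* reg_read(ARCH_TIMER_REG_CVAL) here would fail the build */
        return 0;
}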
Suggested-by: Mark Rutland <mark.rutland(a)arm.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/20211017124225.3018098-2-maz@kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Reviewed-by: Hanjun Guo <guohanjun(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
arch/arm/include/asm/arch_timer.h | 12 ++++++++++++
arch/arm64/include/asm/arch_timer.h | 13 ++++++++++++-
drivers/clocksource/arm_arch_timer.c | 8 ++++++++
3 files changed, 32 insertions(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h
index 99175812d903..a5d27cff28fa 100644
--- a/arch/arm/include/asm/arch_timer.h
+++ b/arch/arm/include/asm/arch_timer.h
@@ -34,6 +34,8 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
case ARCH_TIMER_REG_TVAL:
asm volatile("mcr p15, 0, %0, c14, c2, 0" : : "r" (val));
break;
+ default:
+ BUILD_BUG();
}
} else if (access == ARCH_TIMER_VIRT_ACCESS) {
switch (reg) {
@@ -43,7 +45,11 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
case ARCH_TIMER_REG_TVAL:
asm volatile("mcr p15, 0, %0, c14, c3, 0" : : "r" (val));
break;
+ default:
+ BUILD_BUG();
}
+ } else {
+ BUILD_BUG();
}
isb();
@@ -62,6 +68,8 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
case ARCH_TIMER_REG_TVAL:
asm volatile("mrc p15, 0, %0, c14, c2, 0" : "=r" (val));
break;
+ default:
+ BUILD_BUG();
}
} else if (access == ARCH_TIMER_VIRT_ACCESS) {
switch (reg) {
@@ -71,7 +79,11 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
case ARCH_TIMER_REG_TVAL:
asm volatile("mrc p15, 0, %0, c14, c3, 0" : "=r" (val));
break;
+ default:
+ BUILD_BUG();
}
+ } else {
+ BUILD_BUG();
}
return val;
diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index ce37c14f9417..7dfb7c6261bb 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -112,6 +112,8 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
case ARCH_TIMER_REG_TVAL:
write_sysreg(val, cntp_tval_el0);
break;
+ default:
+ BUILD_BUG();
}
} else if (access == ARCH_TIMER_VIRT_ACCESS) {
switch (reg) {
@@ -121,7 +123,11 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
case ARCH_TIMER_REG_TVAL:
write_sysreg(val, cntv_tval_el0);
break;
+ default:
+ BUILD_BUG();
}
+ } else {
+ BUILD_BUG();
}
isb();
@@ -136,6 +142,8 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
return read_sysreg(cntp_ctl_el0);
case ARCH_TIMER_REG_TVAL:
return arch_timer_reg_read_stable(cntp_tval_el0);
+ default:
+ BUILD_BUG();
}
} else if (access == ARCH_TIMER_VIRT_ACCESS) {
switch (reg) {
@@ -143,10 +151,13 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
return read_sysreg(cntv_ctl_el0);
case ARCH_TIMER_REG_TVAL:
return arch_timer_reg_read_stable(cntv_tval_el0);
+ default:
+ BUILD_BUG();
}
}
- BUG();
+ BUILD_BUG();
+ unreachable();
}
static inline u32 arch_timer_get_cntfrq(void)
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 3ea1c22314f9..f3c65e5181ec 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -103,6 +103,8 @@ void arch_timer_reg_write(int access, enum arch_timer_reg reg, u32 val,
case ARCH_TIMER_REG_TVAL:
writel_relaxed(val, timer->base + CNTP_TVAL);
break;
+ default:
+ BUILD_BUG();
}
} else if (access == ARCH_TIMER_MEM_VIRT_ACCESS) {
struct arch_timer *timer = to_arch_timer(clk);
@@ -113,6 +115,8 @@ void arch_timer_reg_write(int access, enum arch_timer_reg reg, u32 val,
case ARCH_TIMER_REG_TVAL:
writel_relaxed(val, timer->base + CNTV_TVAL);
break;
+ default:
+ BUILD_BUG();
}
} else {
arch_timer_reg_write_cp15(access, reg, val);
@@ -134,6 +138,8 @@ u32 arch_timer_reg_read(int access, enum arch_timer_reg reg,
case ARCH_TIMER_REG_TVAL:
val = readl_relaxed(timer->base + CNTP_TVAL);
break;
+ default:
+ BUILD_BUG();
}
} else if (access == ARCH_TIMER_MEM_VIRT_ACCESS) {
struct arch_timer *timer = to_arch_timer(clk);
@@ -144,6 +150,8 @@ u32 arch_timer_reg_read(int access, enum arch_timer_reg reg,
case ARCH_TIMER_REG_TVAL:
val = readl_relaxed(timer->base + CNTV_TVAL);
break;
+ default:
+ BUILD_BUG();
}
} else {
val = arch_timer_reg_read_cp15(access, reg);
--
2.20.1
1
81
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O31I?from=project-issue
CVE: NA
Add "!" in memmap parameter, use memmap(memmap=nn[KMG]!ss[KMG]) reserve memory
for pmem which register persistent memory in arm64.
Signed-off-by: Zhuling <zhuling8(a)huawei.com>
---
Documentation/admin-guide/kernel-parameters.txt | 9 +++++++--
arch/arm64/mm/init.c | 8 ++++++++
2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 64be32b..1953570 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2829,10 +2829,15 @@
will be eaten.
memmap=nn[KMG]!ss[KMG]
- [KNL,X86] Mark specific memory as protected.
+ [KNL,X86,ARM64] Mark specific memory as protected.
Region of memory to be used, from ss to ss+nn.
- The memory region may be marked as e820 type 12 (0xc)
+ [X86]The memory region may be marked as e820 type 12 (0xc)
and is NVDIMM or ADR memory.
+ [ARM64]Reserve memory for persistent storage across kernel
+ restarts or updates. The data in PMEM will not be lost and can
+ be loaded faster.
+ Example:
+ memmap=100K!0x1a0000000
memmap=<size>%<offset>-<oldtype>+<newtype>
[KNL,ACPI] Convert memory within the specified region
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 6ebfabd..b9eb312 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -56,6 +56,8 @@
s64 memstart_addr __ro_after_init = -1;
EXPORT_SYMBOL(memstart_addr);
+phys_addr_t start_at, mem_size;
+
#ifdef CONFIG_PIN_MEMORY
struct resource pin_memory_resource = {
.name = "Pin memory",
@@ -111,6 +113,8 @@ static void __init reserve_pin_memory_res(void)
*/
phys_addr_t arm64_dma_phys_limit __ro_after_init;
+static unsigned long long pmem_size, pmem_start;
+
#ifndef CONFIG_KEXEC_CORE
static void __init reserve_crashkernel(void)
{
@@ -442,6 +446,10 @@ static int __init parse_memmap_one(char *p)
start_at = memparse(p + 1, &p);
memblock_reserve(start_at, mem_size);
memblock_mark_memmap(start_at, mem_size);
+ } else if (*p == '!') {
+ start_at = memparse(p+1, &p);
+ pmem_start = start_at;
+ pmem_size = mem_size;
} else
pr_info("Unrecognized memmap option, please check the parameter.\n");
--
2.9.5
3
7
[PATCH openEuler-1.0-LTS 1/2] hugetlbfs: extend the definition of hugepages parameter to support node allocation
by Yang Yingliang 17 Jan '22
From: Zhenguo Yao <yaozhenguo1(a)gmail.com>
mainline inclusion
from mainline-v5.16-rc1
commit b5389086ad7be0453c55e0069a89856d1fbdf605
category: feature
bugzilla: 186043
CVE: NA
--------------------------------
We can specify the number of hugepages to allocate at boot. But the
hugepages is balanced in all nodes at present. In some scenarios, we
only need hugepages in one node. For example: DPDK needs hugepages
which are in the same node as NIC.
If DPDK needs four hugepages of 1G size in node1 and the system has 16 NUMA
nodes, we must reserve 64 hugepages on the kernel cmdline. But only four
hugepages are used. The others should be free after boot. If the
system memory is low(for example: 64G), it will be an impossible task.
So extend the hugepages parameter to support specifying hugepages on a
specific node. For example add following parameter:
hugepagesz=1G hugepages=0:1,1:3
It will allocate 1 hugepage in node0 and 3 hugepages in node1.
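A quick way to verify the per-node split after boot is to read the per-node
counters from sysfs; a minimal sketch (standard sysfs layout, with the 1G page
size and the first two nodes hard-coded for illustration):

#include <stdio.h>

int main(void)
{
        char path[128];
        unsigned long n;
        FILE *f;
        int nid;

        for (nid = 0; nid < 2; nid++) {
                snprintf(path, sizeof(path),
                         "/sys/devices/system/node/node%d/hugepages/"
                         "hugepages-1048576kB/nr_hugepages", nid);
                f = fopen(path, "r");
                if (!f)
                        continue;
                if (fscanf(f, "%lu", &n) == 1)
                        printf("node%d: %lu x 1G hugepages\n", nid, n);
                fclose(f);
        }
        return 0;
}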
Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com
Signed-off-by: Zhenguo Yao <yaozhenguo1(a)gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Zhenguo Yao <yaozhenguo1(a)gmail.com>
Cc: Dan Carpenter <dan.carpenter(a)oracle.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Cc: Paul Mackerras <paulus(a)samba.org>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Conflicts:
Documentation/admin-guide/kernel-parameters.txt
Documentation/admin-guide/mm/hugetlbpage.rst
arch/powerpc/mm/hugetlbpage.c
include/linux/hugetlb.h
mm/hugetlb.c
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Kefeng Wang<wangkefeng.wang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
.../admin-guide/kernel-parameters.txt | 5 +
arch/powerpc/mm/hugetlbpage.c | 9 +-
include/linux/hugetlb.h | 6 +-
mm/hugetlb.c | 142 +++++++++++++++---
4 files changed, 137 insertions(+), 25 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 420f7317aa6af..5623a9d1cf813 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1402,6 +1402,11 @@
registers. Default set by CONFIG_HPET_MMAP_DEFAULT.
hugepages= [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
+ If using node format, the number of pages to allocate
+ per-node can be specified.
+ Format: <integer> or (node format)
+ <node>:<integer>[,<node>:<integer>]
+
hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
On x86-64 and powerpc, this option can be specified
multiple times interleaved with hugepages= to reserve
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index cef0b7ee10246..163b0ef7d156e 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -241,17 +241,22 @@ int __init pseries_alloc_bootmem_huge_page(struct hstate *hstate)
m->hstate = hstate;
return 1;
}
+
+bool __init hugetlb_node_alloc_supported(void)
+{
+ return false;
+}
#endif
-int __init alloc_bootmem_huge_page(struct hstate *h)
+int __init alloc_bootmem_huge_page(struct hstate *h, int nid)
{
#ifdef CONFIG_PPC_BOOK3S_64
if (firmware_has_feature(FW_FEATURE_LPAR) && !radix_enabled())
return pseries_alloc_bootmem_huge_page(h);
#endif
- return __alloc_bootmem_huge_page(h);
+ return __alloc_bootmem_huge_page(h, nid);
}
#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 561e7679fadc4..588f9f2a44fce 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -347,6 +347,7 @@ struct hstate {
unsigned long nr_overcommit_huge_pages;
struct list_head hugepage_activelist;
struct list_head hugepage_freelists[MAX_NUMNODES];
+ unsigned int max_huge_pages_node[MAX_NUMNODES];
unsigned int nr_huge_pages_node[MAX_NUMNODES];
unsigned int free_huge_pages_node[MAX_NUMNODES];
unsigned int surplus_huge_pages_node[MAX_NUMNODES];
@@ -409,8 +410,9 @@ int hugetlb_insert_hugepage(struct vm_area_struct *vma, unsigned long addr,
struct page *hpage, pgprot_t prot);
/* arch callback */
-int __init __alloc_bootmem_huge_page(struct hstate *h);
-int __init alloc_bootmem_huge_page(struct hstate *h);
+int __init __alloc_bootmem_huge_page(struct hstate *h, int nid);
+int __init alloc_bootmem_huge_page(struct hstate *h, int nid);
+bool __init hugetlb_node_alloc_supported(void);
void __init hugetlb_bad_size(void);
void __init hugetlb_add_hstate(unsigned order);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b2ed9174bfd73..8286a6f8cb050 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -56,6 +56,7 @@ static struct hstate * __initdata parsed_hstate;
static unsigned long __initdata default_hstate_max_huge_pages;
static unsigned long __initdata default_hstate_size;
static bool __initdata parsed_valid_hugepagesz = true;
+static unsigned int default_hugepages_in_node[MAX_NUMNODES] __initdata;
int enable_charge_mighp __read_mostly;
/*
@@ -2241,33 +2242,39 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
return ERR_PTR(-ENOSPC);
}
-int alloc_bootmem_huge_page(struct hstate *h)
+int alloc_bootmem_huge_page(struct hstate *h, int nid)
__attribute__ ((weak, alias("__alloc_bootmem_huge_page")));
-int __alloc_bootmem_huge_page(struct hstate *h)
+int __alloc_bootmem_huge_page(struct hstate *h, int nid)
{
- struct huge_bootmem_page *m;
+ struct huge_bootmem_page *m = NULL; /* initialize for clang */
int nr_nodes, node;
+ if (nid >= nr_online_nodes)
+ return 0;
+ /* do node specific alloc */
+ if (nid != NUMA_NO_NODE) {
+ m = memblock_virt_alloc_try_nid_raw(huge_page_size(h), huge_page_size(h),
+ 0, MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+ if (!m)
+ return 0;
+ goto found;
+ }
+ /* allocate from next node when distributing huge pages */
for_each_node_mask_to_alloc(h, nr_nodes, node, &node_states[N_MEMORY]) {
- void *addr;
-
- addr = memblock_virt_alloc_try_nid_raw(
+ m = memblock_virt_alloc_try_nid_raw(
huge_page_size(h), huge_page_size(h),
0, BOOTMEM_ALLOC_ACCESSIBLE, node);
- if (addr) {
- /*
- * Use the beginning of the huge page to store the
- * huge_bootmem_page struct (until gather_bootmem
- * puts them into the mem_map).
- */
- m = addr;
- goto found;
- }
+ /*
+ * Use the beginning of the huge page to store the
+ * huge_bootmem_page struct (until gather_bootmem
+ * puts them into the mem_map).
+ */
+ if (!m)
+ return 0;
+ goto found;
}
- return 0;
found:
- BUG_ON(!IS_ALIGNED(virt_to_phys(m), huge_page_size(h)));
/* Put them into a private list first because mem_map is not up yet */
INIT_LIST_HEAD(&m->list);
list_add(&m->list, &huge_boot_pages);
@@ -2310,12 +2317,55 @@ static void __init gather_bootmem_prealloc(void)
cond_resched();
}
}
+static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
+{
+ unsigned long i;
+ char buf[32];
+
+ for (i = 0; i < h->max_huge_pages_node[nid]; ++i) {
+ if (hstate_is_gigantic(h)) {
+ if (!alloc_bootmem_huge_page(h, nid))
+ break;
+ } else {
+ struct page *page;
+ gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
+
+ page = alloc_fresh_huge_page(h, gfp_mask, nid,
+ &node_states[N_MEMORY], NULL);
+ if (!page)
+ break;
+ put_page(page); /* free it into the hugepage allocator */
+ }
+ cond_resched();
+ }
+ if (i == h->max_huge_pages_node[nid])
+ return;
+
+ string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
+ pr_warn("HugeTLB: allocating %u of page size %s failed node%d. Only allocated %lu hugepages.\n",
+ h->max_huge_pages_node[nid], buf, nid, i);
+ h->max_huge_pages -= (h->max_huge_pages_node[nid] - i);
+ h->max_huge_pages_node[nid] = i;
+}
static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
{
unsigned long i;
nodemask_t *node_alloc_noretry;
+ bool node_specific_alloc = false;
+
+ /* do node specific alloc */
+ for (i = 0; i < nr_online_nodes; i++) {
+ if (h->max_huge_pages_node[i] > 0) {
+ hugetlb_hstate_alloc_pages_onenode(h, i);
+ node_specific_alloc = true;
+ }
+ }
+
+ if (node_specific_alloc)
+ return;
+ /* below will do all node balanced alloc */
if (!hstate_is_gigantic(h)) {
/*
* Bit mask controlling how hard we retry per-node allocations.
@@ -2336,7 +2386,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
for (i = 0; i < h->max_huge_pages; ++i) {
if (hstate_is_gigantic(h)) {
- if (!alloc_bootmem_huge_page(h))
+ if (!alloc_bootmem_huge_page(h, NUMA_NO_NODE))
break;
} else if (!alloc_pool_huge_page(h,
&node_states[N_MEMORY],
@@ -2991,8 +3041,13 @@ static int __init hugetlb_init(void)
}
default_hstate_idx = hstate_index(size_to_hstate(default_hstate_size));
if (default_hstate_max_huge_pages) {
- if (!default_hstate.max_huge_pages)
+ if (!default_hstate.max_huge_pages) {
default_hstate.max_huge_pages = default_hstate_max_huge_pages;
+
+ for (i = 0; i < nr_online_nodes; i++)
+ default_hstate.max_huge_pages_node[i] =
+ default_hugepages_in_node[i];
+ }
}
hugetlb_init_hstates();
@@ -3052,10 +3107,19 @@ void __init hugetlb_add_hstate(unsigned int order)
parsed_hstate = h;
}
+bool __init __weak hugetlb_node_alloc_supported(void)
+{
+ return true;
+}
+
static int __init hugetlb_nrpages_setup(char *s)
{
unsigned long *mhp;
static unsigned long *last_mhp;
+ int node = NUMA_NO_NODE;
+ int count;
+ unsigned long tmp;
+ char *p = s;
if (!parsed_valid_hugepagesz) {
pr_warn("hugepages = %s preceded by "
@@ -3077,8 +3141,40 @@ static int __init hugetlb_nrpages_setup(char *s)
return 1;
}
- if (sscanf(s, "%lu", mhp) <= 0)
- *mhp = 0;
+ while (*p) {
+ count = 0;
+ if (sscanf(p, "%lu%n", &tmp, &count) != 1)
+ goto invalid;
+ /* Parameter is node format */
+ if (p[count] == ':') {
+ if (!hugetlb_node_alloc_supported()) {
+ pr_warn("HugeTLB: architecture can't support node specific alloc, ignoring!\n");
+ return 0;
+ }
+ node = tmp;
+ p += count + 1;
+ if (node < 0 || node >= nr_online_nodes)
+ goto invalid;
+ /* Parse hugepages */
+ if (sscanf(p, "%lu%n", &tmp, &count) != 1)
+ goto invalid;
+ if (!hugetlb_max_hstate)
+ default_hugepages_in_node[node] = tmp;
+ else
+ parsed_hstate->max_huge_pages_node[node] = tmp;
+ *mhp += tmp;
+ /* Go to parse next node*/
+ if (p[count] == ',')
+ p += count + 1;
+ else
+ break;
+ } else {
+ if (p != s)
+ goto invalid;
+ *mhp = tmp;
+ break;
+ }
+ }
/*
* Global state is always initialized later in hugetlb_init.
@@ -3091,6 +3187,10 @@ static int __init hugetlb_nrpages_setup(char *s)
last_mhp = mhp;
return 1;
+
+invalid:
+ pr_warn("HugeTLB: Invalid hugepages parameter %s\n", p);
+ return 0;
}
__setup("hugepages=", hugetlb_nrpages_setup);
--
2.25.1
1
1
Hello, when will the RDMA topic start?
----------
Sent from a mobile device
--------------Original message--------------
From: "tangjiayuan" <tangjiayuan(a)huawei.com>;
Sent: Monday, 17 January 2022, 3:10 PM
To: "kernel(a)openeuler.org" <kernel(a)openeuler.org>; "kernel-discuss(a)openeuler.org" <kernel-discuss(a)openeuler.org>;
Cc: "xie xiuqi" <xiexiuqi(a)huawei.com>; "chengjian (D)" <cj.chengjian(a)huawei.com>; "qiulaibin" <qiulaibin(a)huawei.com>;
Subject: Re: openEuler kernel tech-sharing session 17 & biweekly meeting
-----------------------------------
Kernel SIG has one more topic: we hope to support RDMA communication in kernel mode.
-----Original Message-----
From: openEuler conference [mailto:public@openeuler.org]
Sent: 27 December 2021, 10:20
To: kernel(a)openeuler.org; tc(a)openeuler.org; dev(a)openeuler.org; kernel-discuss(a)openeuler.org
Subject: openEuler kernel tech-sharing session 17 & biweekly meeting
Hello!
Kernel SIG invites you to attend the ZOOM meeting (auto-recorded) to be held at 2021-12-31 14:00.
Meeting subject: openEuler kernel tech-sharing session 17 & biweekly meeting
Agenda:
14:10-15:30 openEuler kernel tech-sharing session 17 - introduction to cgroups
Meeting link: https://us06web.zoom.us/j/83118407147?pwd=elVzMVhGc2YzaDQ1TGorN2tPa1NhZz09
Reminder: you are advised to change your participant name after joining the meeting; you may also use your gitee.com ID.
More information: https://openeuler.org/zh/
Hello!
openEuler Kernel SIG invites you to attend the ZOOM conference(auto recording) will be held at 2021-12-31 14:00,
The subject of the conference is openEuler kernel tech-sharing session 17 & biweekly meeting,
Summary:
14:10-15:30 openEuler kernel tech-sharing session 17 - introduction to cgroups
You can join the meeting at https://us06web.zoom.us/j/83118407147?pwd=elVzMVhGc2YzaDQ1TGorN2tPa1NhZz09.
Note: You are advised to change the participant name after joining the conference or use your ID at gitee.com.
More information: https://openeuler.org/en/
_______________________________________________
Kernel-discuss mailing list -- kernel-discuss(a)openeuler.org To unsubscribe send an email to kernel-discuss-leave(a)openeuler.org
_______________________________________________
Kernel mailing list -- kernel(a)openeuler.org
To unsubscribe send an email to kernel-leave(a)openeuler.org
1
0
[PATCH openEuler-1.0-LTS 1/2] share pool: use rwsem to protect sp group exit
by Yang Yingliang 17 Jan '22
From: Guo Mengqi <guomengqi3(a)huawei.com>
ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ODMN
CVE: NA
-------------------------------------------------
Fix the following situation: the last process in a group exits while a
second process tries to add itself to that group. The second process may
get an invalid spg, yet the group's use_count is still increased by 1,
which causes the first process to fail to free the group when it exits.
The second process then calls sp_group_drop --> free_sp_group, which
requests the rwsem a second time.
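The fix below follows the usual locked/unlocked split: the exit path already
holds sp_group_sem for writing, so the final put must call a *_locked teardown
rather than free_sp_group(), which takes the sem itself. A standalone sketch
of the pattern (userspace analogue with a pthread rwlock; all names
hypothetical):

#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

static pthread_rwlock_t grp_sem = PTHREAD_RWLOCK_INITIALIZER;

struct group { atomic_int use_count; };

/* caller already holds grp_sem for writing */
static void free_group_locked(struct group *g)
{
        /* teardown that must run under the lock (idr_remove in the patch) */
        free(g);
}

static void free_group(struct group *g)         /* caller holds nothing */
{
        pthread_rwlock_wrlock(&grp_sem);
        free_group_locked(g);
        pthread_rwlock_unlock(&grp_sem);
}

/* exit path: walks the group list under the write lock, so the final put
 * must use the _locked variant -- calling free_group() here would deadlock */
static void exit_drop(struct group *g)
{
        pthread_rwlock_wrlock(&grp_sem);
        if (atomic_fetch_sub(&g->use_count, 1) == 1)
                free_group_locked(g);
        pthread_rwlock_unlock(&grp_sem);
}

int main(void)
{
        struct group *g = calloc(1, sizeof(*g));

        if (!g)
                return 1;
        atomic_store(&g->use_count, 2);
        exit_drop(g);                           /* not the last reference */
        if (atomic_fetch_sub(&g->use_count, 1) == 1)
                free_group(g);                  /* last put outside the lock */
        return 0;
}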
Signed-off-by: Guo Mengqi <guomengqi3(a)huawei.com>
Reviewed-by: Weilong Chen <chenweilong(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
mm/share_pool.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/mm/share_pool.c b/mm/share_pool.c
index 428742f5d4044..c79ea4bb06026 100644
--- a/mm/share_pool.c
+++ b/mm/share_pool.c
@@ -711,20 +711,25 @@ static void free_new_spg_id(bool new, int spg_id)
free_sp_group_id(spg_id);
}
-static void free_sp_group(struct sp_group *spg)
+static void free_sp_group_locked(struct sp_group *spg)
{
fput(spg->file);
fput(spg->file_hugetlb);
free_spg_stat(spg->id);
- down_write(&sp_group_sem);
idr_remove(&sp_group_idr, spg->id);
- up_write(&sp_group_sem);
free_sp_group_id((unsigned int)spg->id);
kfree(spg);
system_group_count--;
WARN(system_group_count < 0, "unexpected group count\n");
}
+static void free_sp_group(struct sp_group *spg)
+{
+ down_write(&sp_group_sem);
+ free_sp_group_locked(spg);
+ up_write(&sp_group_sem);
+}
+
static void sp_group_drop(struct sp_group *spg)
{
if (atomic_dec_and_test(&spg->use_count))
@@ -4460,14 +4465,15 @@ void sp_group_post_exit(struct mm_struct *mm)
sp_proc_stat_drop(stat);
}
- /* lockless traverse */
+ down_write(&sp_group_sem);
list_for_each_entry_safe(spg_node, tmp, &master->node_list, group_node) {
spg = spg_node->spg;
/* match with refcount inc in sp_group_add_task */
- sp_group_drop(spg);
+ if (atomic_dec_and_test(&spg->use_count))
+ free_sp_group_locked(spg);
kfree(spg_node);
}
-
+ up_write(&sp_group_sem);
kfree(master);
}
--
2.25.1
1
1
[PATCH openEuler-1.0-LTS 1/9] Revert "svm: Add svm_set_user_mpam_en to enable/disable mpam for smmu"
by Yang Yingliang 17 Jan '22
From: Xingang Wang <wangxingang5(a)huawei.com>
ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA
-------------------------------------------------
This reverts commit c50ad40d63ef0cc9e0b158354af08f1f98419273.
The commit "svm: Add svm_set_user_mpam_en to enable/disable mpam for smmu"
added and exported an interface in the svm module, making mpam depend on
the svm module; revert it to avoid the coupling.
Signed-off-by: Xingang Wang <wangxingang5(a)huawei.com>
Reviewed-by: Weilong Chen <chenweilong(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/char/svm.c | 64 -------------------------------------
include/linux/ascend_smmu.h | 3 --
2 files changed, 67 deletions(-)
diff --git a/drivers/char/svm.c b/drivers/char/svm.c
index c877c75fdcd5f..000b9ad01c350 100644
--- a/drivers/char/svm.c
+++ b/drivers/char/svm.c
@@ -135,14 +135,11 @@ struct meminfo {
struct svm_mpam {
#define SVM_GET_DEV_MPAM (1 << 0)
#define SVM_SET_DEV_MPAM (1 << 1)
-#define SVM_GET_USER_MPAM_EN (1 << 2)
-#define SVM_SET_USER_MPAM_EN (1 << 3)
int flags;
int pasid;
int partid;
int pmg;
int s1mpam;
- int user_mpam_en;
};
struct phymeminfo {
@@ -1505,14 +1502,6 @@ static int svm_get_core_mpam(struct device *dev, void *data)
}
}
- if (mpam->flags & SVM_GET_USER_MPAM_EN) {
- err = arm_smmu_get_dev_user_mpam_en(dev, &mpam->user_mpam_en);
- if (err) {
- dev_err(dev, "set user_mpam_en failed, %d\n", err);
- return err;
- }
- }
-
return err;
}
@@ -1554,14 +1543,6 @@ static int svm_set_core_mpam(struct device *dev, void *data)
}
}
- if (mpam->flags & SVM_SET_USER_MPAM_EN) {
- err = arm_smmu_set_dev_user_mpam_en(dev, mpam->user_mpam_en);
- if (err) {
- dev_err(dev, "set user_mpam_en failed, %d\n", err);
- return err;
- }
- }
-
return 0;
}
@@ -1655,51 +1636,6 @@ int svm_get_mpam(int pasid, int *partid, int *pmg, int *s1mpam)
}
EXPORT_SYMBOL_GPL(svm_get_mpam);
-/**
- * svm_set_user_mpam_en() - set user_mpam_en
- * @user_mpam_en: 0 for smmu mpam, 1 for user mpam
- */
-int svm_set_user_mpam_en(int user_mpam_en)
-{
- int err;
- struct svm_mpam mpam, old_mpam;
-
- old_mpam.flags = SVM_GET_USER_MPAM_EN;
- err = __svm_get_mpam(&old_mpam);
-
- mpam.flags = SVM_SET_USER_MPAM_EN,
- mpam.user_mpam_en = user_mpam_en,
- err = __svm_set_mpam(&mpam);
- if (err)
- goto rollback;
-
- return 0;
-
-rollback:
- __svm_set_mpam(&mpam);
- return err;
-}
-EXPORT_SYMBOL_GPL(svm_set_user_mpam_en);
-
-/**
- * svm_set_user_mpam_en() - set user_mpam_en
- * @user_mpam_en: pointer to user_mpam_en
- */
-int svm_get_user_mpam_en(int *user_mpam_en)
-{
- int err;
- struct svm_mpam mpam;
-
- mpam.flags = SVM_GET_USER_MPAM_EN;
- err = __svm_get_mpam(&mpam);
- if (err)
- return err;
-
- *user_mpam_en = mpam.user_mpam_en;
- return 0;
-}
-EXPORT_SYMBOL_GPL(svm_get_user_mpam_en);
-
static int svm_set_rc(unsigned long __user *arg)
{
unsigned long addr, size, rc;
diff --git a/include/linux/ascend_smmu.h b/include/linux/ascend_smmu.h
index 32241a02a8a89..e570acfc8f6ac 100644
--- a/include/linux/ascend_smmu.h
+++ b/include/linux/ascend_smmu.h
@@ -15,7 +15,4 @@ extern int arm_smmu_get_dev_user_mpam_en(struct device *dev, int *user_mpam_en);
extern int svm_get_mpam(int pasid, int *partid, int *pmg, int *s1mpam);
extern int svm_set_mpam(int pasid, int partid, int pmg, int s1mpam);
-extern int svm_set_user_mpam_en(int user_mpam_en);
-extern int svm_get_user_mpam_en(int *user_mpam_en);
-
#endif /* __LINUX_ASCEND_SMMU_H */
--
2.25.1
1
8
[PATCH openEuler-1.0-LTS 1/3] Revert "svm: Add svm_set_user_mpam_en to enable/disable mpam for smmu"
by Yang Yingliang 17 Jan '22
From: Xingang Wang <wangxingang5(a)huawei.com>
ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA
-------------------------------------------------
This reverts commit c50ad40d63ef0cc9e0b158354af08f1f98419273.
The commit "svm: Add svm_set_user_mpam_en to enable/disable mpam for smmu"
added and exported an interface in the svm module, making mpam depend on
the svm module; revert it to avoid the coupling.
Signed-off-by: Xingang Wang <wangxingang5(a)huawei.com>
Reviewed-by: Weilong Chen <chenweilong(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/char/svm.c | 64 -------------------------------------
include/linux/ascend_smmu.h | 3 --
2 files changed, 67 deletions(-)
diff --git a/drivers/char/svm.c b/drivers/char/svm.c
index c877c75fdcd5f..000b9ad01c350 100644
--- a/drivers/char/svm.c
+++ b/drivers/char/svm.c
@@ -135,14 +135,11 @@ struct meminfo {
struct svm_mpam {
#define SVM_GET_DEV_MPAM (1 << 0)
#define SVM_SET_DEV_MPAM (1 << 1)
-#define SVM_GET_USER_MPAM_EN (1 << 2)
-#define SVM_SET_USER_MPAM_EN (1 << 3)
int flags;
int pasid;
int partid;
int pmg;
int s1mpam;
- int user_mpam_en;
};
struct phymeminfo {
@@ -1505,14 +1502,6 @@ static int svm_get_core_mpam(struct device *dev, void *data)
}
}
- if (mpam->flags & SVM_GET_USER_MPAM_EN) {
- err = arm_smmu_get_dev_user_mpam_en(dev, &mpam->user_mpam_en);
- if (err) {
- dev_err(dev, "set user_mpam_en failed, %d\n", err);
- return err;
- }
- }
-
return err;
}
@@ -1554,14 +1543,6 @@ static int svm_set_core_mpam(struct device *dev, void *data)
}
}
- if (mpam->flags & SVM_SET_USER_MPAM_EN) {
- err = arm_smmu_set_dev_user_mpam_en(dev, mpam->user_mpam_en);
- if (err) {
- dev_err(dev, "set user_mpam_en failed, %d\n", err);
- return err;
- }
- }
-
return 0;
}
@@ -1655,51 +1636,6 @@ int svm_get_mpam(int pasid, int *partid, int *pmg, int *s1mpam)
}
EXPORT_SYMBOL_GPL(svm_get_mpam);
-/**
- * svm_set_user_mpam_en() - set user_mpam_en
- * @user_mpam_en: 0 for smmu mpam, 1 for user mpam
- */
-int svm_set_user_mpam_en(int user_mpam_en)
-{
- int err;
- struct svm_mpam mpam, old_mpam;
-
- old_mpam.flags = SVM_GET_USER_MPAM_EN;
- err = __svm_get_mpam(&old_mpam);
-
- mpam.flags = SVM_SET_USER_MPAM_EN,
- mpam.user_mpam_en = user_mpam_en,
- err = __svm_set_mpam(&mpam);
- if (err)
- goto rollback;
-
- return 0;
-
-rollback:
- __svm_set_mpam(&mpam);
- return err;
-}
-EXPORT_SYMBOL_GPL(svm_set_user_mpam_en);
-
-/**
- * svm_set_user_mpam_en() - set user_mpam_en
- * @user_mpam_en: pointer to user_mpam_en
- */
-int svm_get_user_mpam_en(int *user_mpam_en)
-{
- int err;
- struct svm_mpam mpam;
-
- mpam.flags = SVM_GET_USER_MPAM_EN;
- err = __svm_get_mpam(&mpam);
- if (err)
- return err;
-
- *user_mpam_en = mpam.user_mpam_en;
- return 0;
-}
-EXPORT_SYMBOL_GPL(svm_get_user_mpam_en);
-
static int svm_set_rc(unsigned long __user *arg)
{
unsigned long addr, size, rc;
diff --git a/include/linux/ascend_smmu.h b/include/linux/ascend_smmu.h
index 32241a02a8a89..e570acfc8f6ac 100644
--- a/include/linux/ascend_smmu.h
+++ b/include/linux/ascend_smmu.h
@@ -15,7 +15,4 @@ extern int arm_smmu_get_dev_user_mpam_en(struct device *dev, int *user_mpam_en);
extern int svm_get_mpam(int pasid, int *partid, int *pmg, int *s1mpam);
extern int svm_set_mpam(int pasid, int partid, int pmg, int s1mpam);
-extern int svm_set_user_mpam_en(int user_mpam_en);
-extern int svm_get_user_mpam_en(int *user_mpam_en);
-
#endif /* __LINUX_ASCEND_SMMU_H */
--
2.25.1
1
2
[PATCH openEuler-1.0-LTS 1/7] cgroup: Use open-time credentials for process migraton perm checks
by Yang Yingliang 17 Jan '22
From: Tejun Heo <tj(a)kernel.org>
mainline inclusion
from mainline-v5.16-9
commit 1756d7994ad85c2479af6ae5a9750b92324685af
category: bugfix
bugzilla: NA
CVE: CVE-2021-4197
---------------------------------------------------
cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's credentials which is a
potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.
This patch makes both cgroup2 and cgroup1 process migration interfaces
use the credentials saved at the time of open (file->f_cred) instead of
current's.
Reported-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
Suggested-by: Linus Torvalds <torvalds(a)linuxfoundation.org>
Fixes: 187fe84067bd ("cgroup: require write perm on common ancestor when moving processes on the default hierarchy")
Reviewed-by: Michal Koutný <mkoutny(a)suse.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Conflicts:
kernel/cgroup/cgroup.c
Signed-off-by: Lu Jialin <lujialin4(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
kernel/cgroup/cgroup-v1.c | 7 ++++---
kernel/cgroup/cgroup.c | 19 ++++++++++++++++++-
2 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 055063c9c2a38..7a3ea4d6b54cf 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -535,10 +535,11 @@ static ssize_t __cgroup1_procs_write(struct kernfs_open_file *of,
goto out_unlock;
/*
- * Even if we're attaching all tasks in the thread group, we only
- * need to check permissions on one of them.
+ * Even if we're attaching all tasks in the thread group, we only need
+ * to check permissions on one of them. Check permissions using the
+ * credentials from file open to protect against inherited fd attacks.
*/
- cred = current_cred();
+ cred = of->file->f_cred;
tcred = get_task_cred(task);
if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
!uid_eq(cred->euid, tcred->uid) &&
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 6f4dcdd6f77b2..49e8bf7ee9c73 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4490,6 +4490,7 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
{
struct cgroup *src_cgrp, *dst_cgrp;
struct task_struct *task;
+ const struct cred *saved_cred;
ssize_t ret;
dst_cgrp = cgroup_kn_lock_live(of->kn, false);
@@ -4506,8 +4507,16 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
src_cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
spin_unlock_irq(&css_set_lock);
+ /*
+ * Process and thread migrations follow same delegation rule. Check
+ * permissions using the credentials from file open to protect against
+ * inherited fd attacks.
+ */
+
+ saved_cred = override_creds(of->file->f_cred);
ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp,
of->file->f_path.dentry->d_sb);
+ revert_creds(saved_cred);
if (ret)
goto out_finish;
@@ -4531,6 +4540,7 @@ static ssize_t cgroup_threads_write(struct kernfs_open_file *of,
{
struct cgroup *src_cgrp, *dst_cgrp;
struct task_struct *task;
+ const struct cred *saved_cred;
ssize_t ret;
buf = strstrip(buf);
@@ -4549,9 +4559,16 @@ static ssize_t cgroup_threads_write(struct kernfs_open_file *of,
src_cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
spin_unlock_irq(&css_set_lock);
- /* thread migrations follow the cgroup.procs delegation rule */
+ /*
+ * Process and thread migrations follow same delegation rule. Check
+ * permissions using the credentials from file open to protect against
+ * inherited fd attacks.
+ */
+
+ saved_cred = override_creds(of->file->f_cred);
ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp,
of->file->f_path.dentry->d_sb);
+ revert_creds(saved_cred);
if (ret)
goto out_finish;
--
2.25.1
1
6
[PATCH openEuler-5.10 1/2] x86: hugepage: use nt copy hugepage to AEP in x86
by Zheng Zengkai 15 Jan '22
From: Kemeng Shi <shikemeng(a)huawei.com>
euleros inclusion
category: feature
feature: etmem
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OODH?from=project-issue
CVE: NA
-------------------------------------------------
Add a /proc/sys/vm/hugepage_nocache_copy switch. Set 1 to copy hugepages
with the movnt SSE instruction if the CPU supports it. Set 0 to copy hugepages
as usual.
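The switch is flipped by writing 0 or 1 to /proc/sys/vm/hugepage_nocache_copy.
For the copy itself, the movnti loop added below bypasses the cache on the way
to pmem; a userspace sketch of the same idea using SSE2 intrinsics (16-byte
aligned buffers assumed, names made up):

#include <emmintrin.h>
#include <stddef.h>
#include <stdlib.h>

/* non-temporal copy: _mm_stream_si128 is the intrinsic form of movnt*,
 * and _mm_sfence is the final ordering barrier, as in the asm below */
static void copy_nt(void *dst, const void *src, size_t bytes)
{
        __m128i *d = dst;
        const __m128i *s = src;
        size_t i;

        for (i = 0; i < bytes / sizeof(__m128i); i++)
                _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));
        _mm_sfence();   /* order the NT stores before any later access */
}

int main(void)
{
        void *a = aligned_alloc(16, 4096), *b = aligned_alloc(16, 4096);

        if (a && b)
                copy_nt(b, a, 4096);
        return 0;
}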
Signed-off-by: Kemeng Shi <shikemeng(a)huawei.com>
Reviewed-by: louhongxiang <louhongxiang(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
arch/x86/include/asm/page_64.h | 6 ++
arch/x86/lib/Makefile | 1 +
arch/x86/lib/copy_highpages.c | 105 +++++++++++++++++++++++++++++++++
arch/x86/lib/copy_page_64.S | 73 +++++++++++++++++++++++
include/linux/highmem.h | 14 +++++
mm/migrate.c | 6 +-
6 files changed, 200 insertions(+), 5 deletions(-)
create mode 100644 arch/x86/lib/copy_highpages.c
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 939b1cff4a7b..d9fbc486728a 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -56,6 +56,12 @@ static inline void clear_page(void *page)
void copy_page(void *to, void *from);
+void copy_page_nocache(void *to, void *from);
+void copy_page_nocache_barrir(void);
+
+struct page;
+#define __HAVE_ARCH_COPY_HUGEPAGES 1
+void copy_highpages(struct page *to, struct page *from, int nr_pages);
#endif /* !__ASSEMBLY__ */
#ifdef CONFIG_X86_VSYSCALL_EMULATION
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index bad4dee4f0e4..ab0d91b808be 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -70,4 +70,5 @@ else
lib-y += memmove_64.o memset_64.o
lib-y += copy_user_64.o
lib-y += cmpxchg16b_emu.o
+ lib-y += copy_highpages.o
endif
diff --git a/arch/x86/lib/copy_highpages.c b/arch/x86/lib/copy_highpages.c
new file mode 100644
index 000000000000..ed95b8a7b906
--- /dev/null
+++ b/arch/x86/lib/copy_highpages.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * accelerate copying pages to pmem with non-temporal stores
+ */
+#include <linux/sched.h>
+#include <linux/mmzone.h>
+#include <linux/highmem.h>
+#include <linux/sysctl.h>
+
+DEFINE_STATIC_KEY_FALSE(hugepage_nocache_copy);
+#ifdef CONFIG_SYSCTL
+static void set_hugepage_nocache_copy(bool enabled)
+{
+ if (enabled)
+ static_branch_enable(&hugepage_nocache_copy);
+ else
+ static_branch_disable(&hugepage_nocache_copy);
+}
+
+int sysctl_hugepage_nocache_copy(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct ctl_table t;
+ int err;
+ int state;
+
+ if (write && !capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ state = static_branch_unlikely(&hugepage_nocache_copy);
+ t = *table;
+ t.data = &state;
+ err = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
+ if (err < 0)
+ return err;
+ if (write)
+ set_hugepage_nocache_copy(state);
+ return err;
+}
+
+static struct ctl_table copy_highpages_table[] = {
+ {
+ .procname = "hugepage_nocache_copy",
+ .data = NULL,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0600,
+ .proc_handler = sysctl_hugepage_nocache_copy,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+ {}
+};
+
+static struct ctl_table copy_highpages_root_table[] = {
+ {
+ .procname = "vm",
+ .mode = 0555,
+ .child = copy_highpages_table,
+ },
+ {}
+};
+
+static __init int copy_highpages_init(void)
+{
+ return register_sysctl_table(copy_highpages_root_table) ? 0 : -ENOMEM;
+}
+__initcall(copy_highpages_init);
+#endif
+
+static void copy_highpages_nocache(struct page *to, struct page *from, int nr_pages)
+{
+ char *vfrom, *vto;
+ int i;
+
+ for (i = 0; i < nr_pages; i++) {
+ cond_resched();
+ vfrom = kmap_atomic(from);
+ vto = kmap_atomic(to);
+ copy_page_nocache(vto, vfrom);
+ kunmap_atomic(vto);
+ kunmap_atomic(vfrom);
+ to++;
+ from++;
+ }
+ copy_page_nocache_barrir();
+}
+
+static void copy_highpages_cache(struct page *to, struct page *from, int nr_pages)
+{
+ int i;
+
+ for (i = 0; i < nr_pages; i++) {
+ cond_resched();
+ copy_highpage(to + i, from + i);
+ }
+}
+
+void copy_highpages(struct page *to, struct page *from, int nr_pages)
+{
+ if (static_branch_unlikely(&hugepage_nocache_copy) &&
+ get_node_type(page_to_nid(to)) == NODE_TYPE_PMEM)
+ return copy_highpages_nocache(to, from, nr_pages);
+
+ return copy_highpages_cache(to, from, nr_pages);
+}
diff --git a/arch/x86/lib/copy_page_64.S b/arch/x86/lib/copy_page_64.S
index 2402d4c489d2..16ca196d349b 100644
--- a/arch/x86/lib/copy_page_64.S
+++ b/arch/x86/lib/copy_page_64.S
@@ -87,3 +87,76 @@ SYM_FUNC_START_LOCAL(copy_page_regs)
addq $2*8, %rsp
ret
SYM_FUNC_END(copy_page_regs)
+
+SYM_FUNC_START(copy_page_nocache)
+ ALTERNATIVE "jmp copy_page", "", X86_FEATURE_XMM2
+ subq $2*8, %rsp
+ movq %rbx, (%rsp)
+ movq %r12, 1*8(%rsp)
+
+ movl $(4096/64)-5, %ecx
+ .p2align 4
+.LoopNT64:
+ dec %rcx
+ movq 0x8*0(%rsi), %rax
+ movq 0x8*1(%rsi), %rbx
+ movq 0x8*2(%rsi), %rdx
+ movq 0x8*3(%rsi), %r8
+ movq 0x8*4(%rsi), %r9
+ movq 0x8*5(%rsi), %r10
+ movq 0x8*6(%rsi), %r11
+ movq 0x8*7(%rsi), %r12
+
+ prefetcht0 5*64(%rsi)
+
+ movnti %rax, 0x8*0(%rdi)
+ movnti %rbx, 0x8*1(%rdi)
+ movnti %rdx, 0x8*2(%rdi)
+ movnti %r8, 0x8*3(%rdi)
+ movnti %r9, 0x8*4(%rdi)
+ movnti %r10, 0x8*5(%rdi)
+ movnti %r11, 0x8*6(%rdi)
+ movnti %r12, 0x8*7(%rdi)
+
+ leaq 64 (%rsi), %rsi
+ leaq 64 (%rdi), %rdi
+
+ jnz .LoopNT64
+
+ movl $5, %ecx
+ .p2align 4
+.LoopNT2:
+ decl %ecx
+
+ movq 0x8*0(%rsi), %rax
+ movq 0x8*1(%rsi), %rbx
+ movq 0x8*2(%rsi), %rdx
+ movq 0x8*3(%rsi), %r8
+ movq 0x8*4(%rsi), %r9
+ movq 0x8*5(%rsi), %r10
+ movq 0x8*6(%rsi), %r11
+ movq 0x8*7(%rsi), %r12
+
+ movnti %rax, 0x8*0(%rdi)
+ movnti %rbx, 0x8*1(%rdi)
+ movnti %rdx, 0x8*2(%rdi)
+ movnti %r8, 0x8*3(%rdi)
+ movnti %r9, 0x8*4(%rdi)
+ movnti %r10, 0x8*5(%rdi)
+ movnti %r11, 0x8*6(%rdi)
+ movnti %r12, 0x8*7(%rdi)
+
+ leaq 64(%rdi), %rdi
+ leaq 64(%rsi), %rsi
+ jnz .LoopNT2
+
+ movq (%rsp), %rbx
+ movq 1*8(%rsp), %r12
+ addq $2*8, %rsp
+ ret
+SYM_FUNC_END(copy_page_nocache)
+
+SYM_FUNC_START(copy_page_nocache_barrir)
+ ALTERNATIVE "", "sfence", X86_FEATURE_XMM2
+ ret
+SYM_FUNC_END(copy_page_nocache_barrir)
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 8e21fe82b3a3..6b27af8fe624 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -356,4 +356,18 @@ static inline void copy_highpage(struct page *to, struct page *from)
#endif
+#ifndef __HAVE_ARCH_COPY_HUGEPAGES
+
+static inline void copy_highpages(struct page *to, struct page *from, int nr_pages)
+{
+ int i;
+
+ for (i = 0; i < nr_pages; i++) {
+ cond_resched();
+ copy_highpage(to + i, from + i);
+ }
+}
+
+#endif /* __HAVE_ARCH_COPY_HUGEPAGES */
+
#endif /* _LINUX_HIGHMEM_H */
diff --git a/mm/migrate.c b/mm/migrate.c
index 8a57a09ad665..14c4db23b3ed 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -578,7 +578,6 @@ static void __copy_gigantic_page(struct page *dst, struct page *src,
static void copy_huge_page(struct page *dst, struct page *src)
{
- int i;
int nr_pages;
if (PageHuge(src)) {
@@ -596,10 +595,7 @@ static void copy_huge_page(struct page *dst, struct page *src)
nr_pages = thp_nr_pages(src);
}
- for (i = 0; i < nr_pages; i++) {
- cond_resched();
- copy_highpage(dst + i, src + i);
- }
+ copy_highpages(dst, src, nr_pages);
}
/*
--
2.20.1
1
1
mainline inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CMMB?from=project-issue
CVE: NA
Populate X86_FEATURE_SGX feature from CPUID and tie it to the Kconfig
option with disabled-features.h.
IA32_FEATURE_CONTROL.SGX_ENABLE must be examined in addition to the CPUID
bits to enable full SGX support. The BIOS must both set this bit and lock
IA32_FEATURE_CONTROL for SGX to be supported (Intel SDM section 36.7.1).
The setting or clearing of this bit has no impact on the CPUID bits above,
which is why it needs to be detected separately.
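A userspace probe of the CPUID half of this check is straightforward; a
minimal sketch (recent gcc/clang cpuid.h assumed; the IA32_FEATURE_CONTROL
half needs ring 0 or /dev/cpu/N/msr, so it is only noted in a comment):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
                return 1;
        /* X86_FEATURE_SGX is CPUID.(EAX=7,ECX=0):EBX bit 2 */
        printf("CPUID SGX bit: %s\n", (ebx & (1u << 2)) ? "set" : "clear");
        /* full support additionally needs MSR 0x3A (IA32_FEATURE_CONTROL)
         * bit 18 (SGX_ENABLED) set and bit 0 (LOCKED) set by the BIOS */
        return 0;
}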
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Co-developed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Jethro Beekman <jethro(a)fortanix.com>
Signed-off-by: zhangyue <zhangyue1(a)kylinos.cn>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +++++++-
arch/x86/include/asm/msr-index.h | 1 +
3 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index dad350d42ecf..1181f5c7bbef 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -241,6 +241,7 @@
/* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
#define X86_FEATURE_TSC_ADJUST ( 9*32+ 1) /* TSC adjustment MSR 0x3B */
+#define X86_FEATURE_SGX ( 9*32+ 2) /* Software Guard Extensions */
#define X86_FEATURE_BMI1 ( 9*32+ 3) /* 1st group bit manipulation extensions */
#define X86_FEATURE_HLE ( 9*32+ 4) /* Hardware Lock Elision */
#define X86_FEATURE_AVX2 ( 9*32+ 5) /* AVX2 instructions */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 09db5b8f1444..24d66717c642 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -59,6 +59,12 @@
/* Force disable because it's broken beyond repair */
#define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31))
+#ifdef CONFIG_X86_SGX
+#define DISABLE_SGX 0
+#else
+#define DISABLE_SGX (1 << (X86_FEATURE_SGX & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -71,7 +77,7 @@
#define DISABLED_MASK6 0
#define DISABLED_MASK7 (DISABLE_PTI)
#define DISABLED_MASK8 0
-#define DISABLED_MASK9 (DISABLE_SMAP)
+#define DISABLED_MASK9 (DISABLE_SMAP|DISABLE_SGX)
#define DISABLED_MASK10 0
#define DISABLED_MASK11 0
#define DISABLED_MASK12 0
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 972a34d93505..258d555d22f2 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -609,6 +609,7 @@
#define FEAT_CTL_LOCKED BIT(0)
#define FEAT_CTL_VMX_ENABLED_INSIDE_SMX BIT(1)
#define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX BIT(2)
+#define FEAT_CTL_SGX_ENABLED BIT(18)
#define FEAT_CTL_LMCE_ENABLED BIT(20)
#define MSR_IA32_TSC_ADJUST 0x0000003b
--
2.30.0
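A minimal sketch of how the two conditions described above combine at
detection time, assuming the FEAT_CTL_* bits introduced by this patch;
the helper name and its placement are hypothetical, not taken from the
patch, and the real detection logic lives elsewhere in the SGX series:
/* Illustrative only -- not part of this patch. SGX is usable only
 * when the CPUID bit is present AND the BIOS has both enabled SGX
 * and locked IA32_FEATURE_CONTROL (Intel SDM section 36.7.1). */
#include <asm/cpufeature.h>
#include <asm/msr.h>
static bool __init sgx_bios_enabled(void)	/* hypothetical helper */
{
	u64 feat_ctl;
	if (!boot_cpu_has(X86_FEATURE_SGX))
		return false;
	rdmsrl(MSR_IA32_FEAT_CTL, feat_ctl);
	return (feat_ctl & FEAT_CTL_LOCKED) &&
	       (feat_ctl & FEAT_CTL_SGX_ENABLED);
}
Checking FEAT_CTL_LOCKED as well matters because an unlocked MSR can
still be changed, so the enable bit alone is not a stable indication.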
2
1
From: JiaoFenfang <jiaofenfang(a)uniontech.com>
openEuler inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KVP7?from=project-issue
Signed-off-by: Jiao Fenfang <jiaofenfang(a)uniontech.com>
---
arch/x86/configs/openeuler_defconfig | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
index 7b608301823c..01a902eba3b9 100644
--- a/arch/x86/configs/openeuler_defconfig
+++ b/arch/x86/configs/openeuler_defconfig
@@ -7451,7 +7451,13 @@ CONFIG_XFS_POSIX_ACL=y
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=y
# CONFIG_OCFS2_FS is not set
-# CONFIG_BTRFS_FS is not set
+CONFIG_BTRFS_FS=m
+# CONFIG_BTRFS_FS_POSIX_ACL is not set
+# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
+# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
+# CONFIG_BTRFS_DEBUG is not set
+# CONFIG_BTRFS_ASSERT is not set
+# CONFIG_BTRFS_FS_REF_VERIFY is not set
# CONFIG_NILFS2_FS is not set
# CONFIG_F2FS_FS is not set
# CONFIG_ZONEFS_FS is not set
--
2.20.1
2
1
Backport 5.10.88 LTS patches from upstream.
Alex Bee (1):
arm64: dts: rockchip: fix audio-supply for Rock Pi 4
Alyssa Ross (1):
dmaengine: st_fdma: fix MODULE_ALIAS
Amelie Delaunay (1):
usb: dwc2: fix STM ID/VBUS detection startup delay in
dwc2_driver_probe
Artem Lapkin (1):
arm64: dts: rockchip: remove mmc-hs400-enhanced-strobe from
rk3399-khadas-edge
Baowen Zheng (1):
flow_offload: return EOPNOTSUPP for the unsupported mpls action type
Cyril Novikov (1):
ixgbe: set X550 MDIO speed before talking to PHY
D. Wythe (1):
net/smc: Prevent smc_release() from long blocking
Dan Carpenter (2):
vdpa: check that offsets are within bounds
tee: amdtee: fix an IS_ERR() vs NULL bug
Daniel Borkmann (3):
bpf: Fix signed bounds propagation after mov32
bpf: Make 32->64 bounds propagation slightly more robust
bpf, selftests: Add test case trying to taint map value pointer
Daniele Palmas (1):
USB: serial: option: add Telit FN990 compositions
David Ahern (3):
selftests: Add duplicate config only for MD5 VRF tests
selftests: Fix raw socket bind tests with VRF
selftests: Fix IPv6 address bind tests
Davide Caratti (1):
net/sched: sch_ets: don't remove idle classes from the round-robin
list
Dinh Nguyen (1):
ARM: socfpga: dts: fix qspi node compatible
Eric Dumazet (3):
sch_cake: do not call cake_destroy() from cake_init()
inet_diag: fix kernel-infoleak for UDP sockets
sit: do not call ipip6_dev_free() from sit_init_net()
Fabio Estevam (2):
arm64: dts: imx8mp-evk: Improve the Ethernet PHY description
ARM: dts: imx6ull-pinfunc: Fix CSI_DATA07__ESAI_TX0 pad name
Felix Fietkau (2):
mac80211: fix regression in SSN handling of addba tx
mac80211: send ADDBA requests using the tid/queue of the aggregation
session
Filipe Manana (1):
btrfs: fix double free of anon_dev after failure to create subvolume
Florian Fainelli (1):
net: systemport: Add global locking for descriptor lifecycle
Florian Westphal (1):
mptcp: clear 'kern' flag from fallback sockets
Gal Pressman (1):
net: Fix double 0x prefix print in SKB dump
George Kennedy (4):
libata: if T_LENGTH is zero, dma direction should be DMA_NONE
scsi: scsi_debug: Don't call kcalloc() if size arg is zero
scsi: scsi_debug: Fix type in min_t to avoid stack OOB
scsi: scsi_debug: Sanity check block descriptor length in
resp_mode_select()
Greg Kroah-Hartman (1):
Revert "usb: early: convert to readl_poll_timeout_atomic()"
Hangbin Liu (1):
selftest/net/forwarding: declare NETIFS p9 p10
Hu Weiwen (1):
ceph: fix duplicate increment of opened_inodes metric
Jerome Marchand (1):
recordmcount.pl: look for jgnop instruction as well as bcrl on s390
Ji-Ze Hong (Peter Hong) (1):
serial: 8250_fintek: Fix garbled text for console
Jianglei Nie (1):
btrfs: fix memory leak in __add_inode_ref()
Jiasheng Jiang (2):
drm/ast: potential dereference of null pointer
sfc_ef100: potential dereference of null pointer
Jie Wang (1):
net: hns3: fix use-after-free bug in hclgevf_send_mbx_msg
Jie2x Zhou (1):
selftests: net: Correct ping6 expected rc from 2 to 1
Jimmy Wang (1):
USB: NO_LPM quirk Lenovo USB-C to Ethernet Adapher(RTL8153-04)
Joakim Zhang (1):
arm64: dts: imx8m: correct assigned clocks for FEC
Joe Thornber (1):
dm btree remove: fix use after free in rebalance_children()
Johan Hovold (1):
USB: serial: cp210x: fix CP2105 GPIO registration
Johannes Berg (5):
mac80211: mark TX-during-stop for TX in in_reconfig
mac80211: validate extended element ID is present
mac80211: track only QoS data frames for admission control
mac80211: agg-tx: don't schedule_and_wake_txq() under sta->lock
mac80211: fix lookup when adding AddBA extension element
John Keeping (2):
arm64: dts: rockchip: fix rk3308-roc-cc vcc-sd supply
arm64: dts: rockchip: fix rk3399-leez-p710 vcc3v3-lan supply
Juergen Gross (5):
xen/blkfront: harden blkfront against event channel storms
xen/netfront: harden netfront against event channel storms
xen/console: harden hvc_xen against event channel storms
xen/netback: fix rx queue stall detection
xen/netback: don't queue unlimited number of packages
Karen Sornek (1):
igb: Fix removal of unicast MAC filters of VFs
Lang Yu (1):
drm/amd/pm: fix a potential gpu_metrics_table memory leak
Le Ma (1):
drm/amdgpu: correct register access for RLC_JUMP_TABLE_RESTORE
Letu Ren (1):
igbvf: fix double free in `igbvf_probe`
Magnus Karlsson (2):
xsk: Do not sleep in poll() when need_wakeup set
Revert "xsk: Do not sleep in poll() when need_wakeup set"
Martin KaFai Lau (1):
bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
Mike Tipton (1):
clk: Don't parent clks until the parent is fully registered
Miklos Szeredi (2):
fuse: annotate lock in fuse_reverse_inval_entry()
ovl: fix warning in ovl_create_real()
Naohiro Aota (1):
zonefs: add MODULE_ALIAS_FS
Nathan Chancellor (2):
soc/tegra: fuse: Fix bitwise vs. logical OR warning
Input: touchscreen - avoid bitwise vs logical OR warning
Nehal Bakulchandra Shah (1):
usb: xhci: Extend support for runtime power management for AMD's
Yellow carp.
Paolo Bonzini (1):
KVM: downgrade two BUG_ONs to WARN_ON_ONCE
Paul E. McKenney (1):
rcu: Mark accesses to rcu_state.n_force_qs
Pavel Skripkin (1):
media: mxl111sf: change mutex_init() location
Philipp Rudo (1):
s390/kexec_file: fix error handling when applying relocations
Robert Schlabbach (1):
ixgbe: Document how to enable NBASE-T support
Sasha Neftin (1):
igc: Fix typo in i225 LTR functions
Stefan Roese (1):
PCI/MSI: Mask MSI-X vectors only on success
Stephan Gerhold (1):
soc: imx: Register SoC device only on i.MX boards
Sudeep Holla (1):
firmware: arm_scpi: Fix string overflow in SCPI genpd driver
Tejun Heo (1):
iocost: Fix divide-by-zero on donation from low hweight cgroup
Tetsuo Handa (1):
tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous
Thomas Gleixner (1):
PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
Tony Lindgren (1):
bus: ti-sysc: Fix variable set but not used warning for reinit_modules
Vitaly Kuznetsov (2):
KVM: selftests: Make sure kvm_create_max_vcpus test won't hit
RLIMIT_NOFILE
KVM: x86: Drop guest CPUID check for host initiated writes to
MSR_IA32_PERF_CAPABILITIES
Wei Wang (1):
virtio/vsock: fix the transport to work with VMADDR_CID_ANY
Will Deacon (1):
virtio_ring: Fix querying of maximum DMA mapping size for virtio
device
Willem de Bruijn (1):
net/packet: rx_owner_map depends on pg_vec
Xiubo Li (1):
ceph: initialize pathlen variable in reconnect_caps_cb
.../device_drivers/ethernet/intel/ixgbe.rst | 16 +++
arch/arm/boot/dts/imx6ull-pinfunc.h | 2 +-
.../boot/dts/socfpga_arria10_socdk_qspi.dts | 2 +-
arch/arm/boot/dts/socfpga_arria5_socdk.dts | 2 +-
arch/arm/boot/dts/socfpga_cyclone5_socdk.dts | 2 +-
arch/arm/boot/dts/socfpga_cyclone5_sockit.dts | 2 +-
.../boot/dts/socfpga_cyclone5_socrates.dts | 2 +-
arch/arm/boot/dts/socfpga_cyclone5_sodia.dts | 2 +-
.../boot/dts/socfpga_cyclone5_vining_fpga.dts | 4 +-
arch/arm64/boot/dts/freescale/imx8mm.dtsi | 7 +-
arch/arm64/boot/dts/freescale/imx8mn.dtsi | 7 +-
arch/arm64/boot/dts/freescale/imx8mp-evk.dts | 2 +
arch/arm64/boot/dts/freescale/imx8mp.dtsi | 7 +-
.../arm64/boot/dts/rockchip/rk3308-roc-cc.dts | 2 +-
.../boot/dts/rockchip/rk3399-khadas-edge.dtsi | 1 -
.../boot/dts/rockchip/rk3399-leez-p710.dts | 2 +-
.../boot/dts/rockchip/rk3399-rock-pi-4.dtsi | 2 +-
arch/s390/kernel/machine_kexec_file.c | 7 +-
arch/x86/kvm/x86.c | 2 +-
block/blk-iocost.c | 9 +-
drivers/ata/libata-scsi.c | 15 ++-
drivers/block/xen-blkfront.c | 15 ++-
drivers/bus/ti-sysc.c | 3 +-
drivers/clk/clk.c | 15 ++-
drivers/dma/st_fdma.c | 2 +-
drivers/firmware/scpi_pm_domain.c | 10 +-
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +-
.../gpu/drm/amd/pm/swsmu/smu12/smu_v12_0.c | 3 +
drivers/gpu/drm/ast/ast_mode.c | 5 +-
drivers/input/touchscreen/of_touchscreen.c | 42 +++---
drivers/md/persistent-data/dm-btree-remove.c | 2 +-
drivers/media/usb/dvb-usb-v2/mxl111sf.c | 16 ++-
drivers/net/ethernet/broadcom/bcmsysport.c | 5 +-
drivers/net/ethernet/broadcom/bcmsysport.h | 1 +
.../hisilicon/hns3/hns3vf/hclgevf_mbx.c | 3 +-
drivers/net/ethernet/intel/igb/igb_main.c | 28 ++--
drivers/net/ethernet/intel/igbvf/netdev.c | 1 +
drivers/net/ethernet/intel/igc/igc_i225.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 +
drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 3 +
drivers/net/ethernet/sfc/ef100_nic.c | 3 +
drivers/net/xen-netback/common.h | 1 +
drivers/net/xen-netback/rx.c | 77 +++++++----
drivers/net/xen-netfront.c | 125 +++++++++++++-----
drivers/pci/msi.c | 15 ++-
drivers/scsi/scsi_debug.c | 42 +++---
drivers/soc/imx/soc-imx.c | 4 +
drivers/soc/tegra/fuse/fuse-tegra.c | 2 +-
drivers/soc/tegra/fuse/fuse.h | 2 +-
drivers/tee/amdtee/core.c | 5 +-
drivers/tty/hvc/hvc_xen.c | 30 ++++-
drivers/tty/n_hdlc.c | 23 +++-
drivers/tty/serial/8250/8250_fintek.c | 20 ---
drivers/usb/core/quirks.c | 3 +
drivers/usb/dwc2/platform.c | 3 +
drivers/usb/early/xhci-dbc.c | 15 ++-
drivers/usb/host/xhci-pci.c | 6 +-
drivers/usb/serial/cp210x.c | 6 +-
drivers/usb/serial/option.c | 8 ++
drivers/vhost/vdpa.c | 2 +-
drivers/virtio/virtio_ring.c | 2 +-
fs/btrfs/disk-io.c | 8 ++
fs/btrfs/tree-log.c | 1 +
fs/ceph/caps.c | 16 +--
fs/ceph/mds_client.c | 3 +-
fs/fuse/dir.c | 2 +-
fs/overlayfs/dir.c | 3 +-
fs/overlayfs/overlayfs.h | 1 +
fs/overlayfs/super.c | 12 +-
fs/zonefs/super.c | 1 +
kernel/bpf/verifier.c | 28 ++--
kernel/rcu/tree.c | 10 +-
net/core/skbuff.c | 2 +-
net/ipv4/inet_diag.c | 4 +-
net/ipv6/sit.c | 1 -
net/mac80211/agg-rx.c | 5 +-
net/mac80211/agg-tx.c | 16 ++-
net/mac80211/driver-ops.h | 5 +-
net/mac80211/mlme.c | 13 +-
net/mac80211/sta_info.h | 1 +
net/mac80211/util.c | 7 +-
net/mptcp/protocol.c | 4 +-
net/packet/af_packet.c | 5 +-
net/sched/cls_api.c | 1 +
net/sched/sch_cake.c | 6 +-
net/sched/sch_ets.c | 4 +-
net/smc/af_smc.c | 4 +-
net/vmw_vsock/virtio_transport_common.c | 3 +-
scripts/recordmcount.pl | 2 +-
.../bpf/prog_tests/btf_skc_cls_ingress.c | 16 ++-
.../selftests/bpf/verifier/value_ptr_arith.c | 23 ++++
.../selftests/kvm/kvm_create_max_vcpus.c | 30 +++++
tools/testing/selftests/net/fcnal-test.sh | 49 +++++--
.../net/forwarding/forwarding.config.sample | 2 +
virt/kvm/kvm_main.c | 6 +-
95 files changed, 670 insertions(+), 279 deletions(-)
--
2.20.1
1
92
Backport 5.10.87 LTS patches from upstream.
Adrian Hunter (8):
perf inject: Fix itrace space allowed for new attributes
perf intel-pt: Fix some PGE (packet generation enable/control flow
packets) usage
perf intel-pt: Fix sync state when a PSB (synchronization) packet is
found
perf intel-pt: Fix intel_pt_fup_event() assumptions about setting
state type
perf intel-pt: Fix state setting when receiving overflow (OVF) packet
perf intel-pt: Fix next 'err' value, walking trace
perf intel-pt: Fix missing 'instruction' events with 'q' option
perf intel-pt: Fix error timestamp setting on the decoder error path
Alexander Stein (1):
Revert "tty: serial: fsl_lpuart: drop earlycon entry for i.MX8QXP"
Antoine Tenart (1):
ethtool: do not perform operations on net devices being unregistered
Armin Wolf (1):
hwmon: (dell-smm) Fix warning on /proc/i8k creation error
Bui Quang Minh (1):
bpf: Fix integer overflow in argument calculation for
bpf_map_area_alloc
Chen Jun (1):
tracing: Fix a kmemleak false positive in tracing_map
Erik Ekman (1):
net/mlx4_en: Update reported link modes for 1/10G
Harshit Mogalapalli (1):
net: netlink: af_netlink: Prevent empty skb by adding a check on len.
Helge Deller (1):
parisc/agp: Annotate parisc agp init functions with __init
Ilie Halip (1):
s390/test_unwind: use raw opcode instead of invalid instruction
Kai Vehmanen (2):
ALSA: hda: Add Intel DG2 PCI ID and HDMI codec vid
ALSA: hda/hdmi: fix HDA codec entry table order for ADL-P
Marc Zyngier (1):
KVM: arm64: Save PSTATE early on exit
Mike Rapoport (4):
memblock: free_unused_memmap: use pageblock units instead of MAX_ORDER
memblock: align freed memory map on pageblock boundaries with
SPARSEMEM
arm: extend pfn_valid to take into account freed memory map alignment
arm: ioremap: don't abuse pfn_valid() to check if pfn is in RAM
Miklos Szeredi (1):
fuse: make sure reclaim doesn't write the inode
Mustapha Ghaddar (1):
drm/amd/display: Fix for the no Audio bug with Tiled Displays
Nikita Yushchenko (1):
staging: most: dim2: use device release method
Ondrej Jirman (1):
i2c: rk3x: Handle a spurious start completion interrupt flag
Perry Yuan (1):
drm/amd/display: add connector type check for CRC source set
Philip Chen (1):
drm/msm/dsi: set default num_data_lanes
Sean Christopherson (1):
KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI
req
Tadeusz Struk (1):
nfc: fix segfault in nfc_genl_dump_devices_done
arch/arm/mm/init.c | 37 ++++++---
arch/arm/mm/ioremap.c | 4 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 6 ++
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 7 +-
arch/s390/lib/test_unwind.c | 5 +-
arch/x86/kvm/hyperv.c | 7 +-
drivers/char/agp/parisc-agp.c | 6 +-
.../drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c | 8 ++
.../gpu/drm/amd/display/dc/core/dc_resource.c | 4 +
drivers/gpu/drm/msm/dsi/dsi_host.c | 2 +
drivers/hwmon/dell-smm-hwmon.c | 7 +-
drivers/i2c/busses/i2c-rk3x.c | 4 +-
.../net/ethernet/mellanox/mlx4/en_ethtool.c | 6 +-
drivers/staging/most/dim2/dim2.c | 55 ++++++------
drivers/tty/serial/fsl_lpuart.c | 1 +
fs/fuse/dir.c | 8 ++
fs/fuse/file.c | 15 ++++
fs/fuse/fuse_i.h | 1 +
fs/fuse/inode.c | 3 +
kernel/bpf/devmap.c | 4 +-
kernel/trace/tracing_map.c | 3 +
net/core/sock_map.c | 2 +-
net/ethtool/netlink.h | 3 +
net/netlink/af_netlink.c | 5 ++
net/nfc/netlink.c | 6 +-
sound/pci/hda/hda_intel.c | 12 ++-
sound/pci/hda/patch_hdmi.c | 3 +-
tools/perf/builtin-inject.c | 2 +-
.../util/intel-pt-decoder/intel-pt-decoder.c | 83 ++++++++++++-------
tools/perf/util/intel-pt.c | 1 +
30 files changed, 220 insertions(+), 90 deletions(-)
--
2.20.1
1
32

[PATCH openEuler-5.10 1/5] cgroup: cgroup.{procs,threads} factor out common parts
by Zheng Zengkai 14 Jan '22
From: Michal Koutný <mkoutny(a)suse.com>
mainline inclusion
from mainline-v5.12-rc1
commit da70862efe0065bada33d67a903270cdbbaf07d9
bugzilla: 186050 https://gitee.com/openeuler/kernel/issues/I4DDEL
CVE: CVE-2021-4197
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
The functions cgroup_threads_write and cgroup_procs_write are almost
identical. In order to reduce duplication, factor out the common code in
similar fashion we already do for other threadgroup/task functions. No
functional changes are intended.
Suggested-by: Hao Lee <haolee.swjtu(a)gmail.com>
Signed-off-by: Michal Koutný <mkoutny(a)suse.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Lu Jialin <lujialin4(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
kernel/cgroup/cgroup.c | 55 +++++++++++-------------------------------
1 file changed, 14 insertions(+), 41 deletions(-)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 353369a1bc6b..c76dbb03e887 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4835,8 +4835,8 @@ static int cgroup_attach_permissions(struct cgroup *src_cgrp,
return ret;
}
-static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
- char *buf, size_t nbytes, loff_t off)
+static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
+ bool threadgroup)
{
struct cgroup *src_cgrp, *dst_cgrp;
struct task_struct *task;
@@ -4847,7 +4847,7 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
if (!dst_cgrp)
return -ENODEV;
- task = cgroup_procs_write_start(buf, true, &locked);
+ task = cgroup_procs_write_start(buf, threadgroup, &locked);
ret = PTR_ERR_OR_ZERO(task);
if (ret)
goto out_unlock;
@@ -4857,19 +4857,26 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
src_cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
spin_unlock_irq(&css_set_lock);
+ /* process and thread migrations follow same delegation rule */
ret = cgroup_attach_permissions(src_cgrp, dst_cgrp,
- of->file->f_path.dentry->d_sb, true);
+ of->file->f_path.dentry->d_sb, threadgroup);
if (ret)
goto out_finish;
- ret = cgroup_attach_task(dst_cgrp, task, true);
+ ret = cgroup_attach_task(dst_cgrp, task, threadgroup);
out_finish:
cgroup_procs_write_finish(task, locked);
out_unlock:
cgroup_kn_unlock(of->kn);
- return ret ?: nbytes;
+ return ret;
+}
+
+static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ return __cgroup_procs_write(of, buf, true) ?: nbytes;
}
static void *cgroup_threads_start(struct seq_file *s, loff_t *pos)
@@ -4880,41 +4887,7 @@ static void *cgroup_threads_start(struct seq_file *s, loff_t *pos)
static ssize_t cgroup_threads_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
- struct cgroup *src_cgrp, *dst_cgrp;
- struct task_struct *task;
- ssize_t ret;
- bool locked;
-
- buf = strstrip(buf);
-
- dst_cgrp = cgroup_kn_lock_live(of->kn, false);
- if (!dst_cgrp)
- return -ENODEV;
-
- task = cgroup_procs_write_start(buf, false, &locked);
- ret = PTR_ERR_OR_ZERO(task);
- if (ret)
- goto out_unlock;
-
- /* find the source cgroup */
- spin_lock_irq(&css_set_lock);
- src_cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
- spin_unlock_irq(&css_set_lock);
-
- /* thread migrations follow the cgroup.procs delegation rule */
- ret = cgroup_attach_permissions(src_cgrp, dst_cgrp,
- of->file->f_path.dentry->d_sb, false);
- if (ret)
- goto out_finish;
-
- ret = cgroup_attach_task(dst_cgrp, task, false);
-
-out_finish:
- cgroup_procs_write_finish(task, locked);
-out_unlock:
- cgroup_kn_unlock(of->kn);
-
- return ret ?: nbytes;
+ return __cgroup_procs_write(of, buf, false) ?: nbytes;
}
/* cgroup core interface files for the default hierarchy */
--
2.20.1
1
4

[PATCH openEuler-5.10] netfilter: selftest: conntrack_vrf.sh: fix file permission
by Zheng Zengkai 14 Jan '22
From: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
stable inclusion
from stable-v5.10.86
commit 32414491834c80ab39519467deb3f8d1e4f5bade
bugzilla: 186045 https://gitee.com/openeuler/kernel/issues/I4QVPD
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
When backporting 33b8aad21ac1 ("selftests: netfilter: add a
vrf+conntrack testcase") to this stable branch, the executable bits were
not properly set on the
tools/testing/selftests/netfilter/conntrack_vrf.sh file due to quilt not
honoring them.
Fix this up manually by setting the correct mode.
Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala(a)nokia.com>
Link: https://lore.kernel.org/r/234d7a6a81664610fdf21ac72730f8bd10d3f46f.camel@no…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
tools/testing/selftests/netfilter/conntrack_vrf.sh | 0
1 file changed, 0 insertions(+), 0 deletions(-)
mode change 100644 => 100755 tools/testing/selftests/netfilter/conntrack_vrf.sh
diff --git a/tools/testing/selftests/netfilter/conntrack_vrf.sh b/tools/testing/selftests/netfilter/conntrack_vrf.sh
old mode 100644
new mode 100755
--
2.20.1
1
0
Backport 5.10.85 LTS patches from upstream.
Alan Young (1):
ALSA: ctl: Fix copy of updated id with element read/write
Alexander Stein (1):
dt-bindings: net: Reintroduce PHY no lane swap binding
Alexander Sverdlin (1):
nfsd: Fix nsfd startup race (again)
Alyssa Ross (1):
iio: trigger: stm32-timer: fix MODULE_ALIAS
Andrea Mayer (1):
seg6: fix the iif in the IPv6 socket control block
Arnaldo Carvalho de Melo (1):
tools build: Remove needless libpython-version feature check that
breaks test-all fast path
Bas Nieuwenhuizen (1):
drm/syncobj: Deal with signalled fences in drm_syncobj_find_fence.
Benjamin Tissoires (1):
HID: bigbenff: prevent null pointer dereference
Billy Tsai (1):
irqchip/aspeed-scu: Replace update_bits with write_bits.
Björn Töpel (1):
bpf, x86: Fix "no previous prototype" warning
Brian Silverman (1):
can: m_can: Disable and ignore ELO interrupt
Dan Carpenter (3):
can: sja1000: fix use after free in ems_pcmcia_add_card()
net: altera: set a couple error code in probe()
net/qla3xxx: fix an error code in ql_adapter_up()
Davidlohr Bueso (1):
block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2)
Dmitry Baryshkov (1):
clk: qcom: regmap-mux: fix parent clock lookup
Eric Biggers (5):
wait: add wake_up_pollfree()
binder: use wake_up_pollfree()
signalfd: use wake_up_pollfree()
aio: keep poll requests on waitqueue until completed
aio: fix use-after-free due to missing POLLFREE handling
Eric Dumazet (5):
bonding: make tx_rebalance_counter an atomic
netfilter: conntrack: annotate data-races around ct->timeout
devlink: fix netns refcount leak in devlink_nl_cmd_reload()
net/sched: fq_pie: prevent dismantle issue
net, neigh: clear whole pneigh_entry at alloc time
Evgeny Boger (1):
iio: adc: axp20x_adc: fix charging current reporting on AXP22x
Fabrice Gasnier (1):
iio: adc: stm32: fix a current leak by resetting pcsel before
disabling vdda
Florian Westphal (1):
selftests: netfilter: add a vrf+conntrack testcase
Greg Kroah-Hartman (7):
HID: add hid_is_usb() function to make it simpler for USB detection
HID: add USB_HID dependancy to hid-prodikeys
HID: add USB_HID dependancy to hid-chicony
HID: add USB_HID dependancy on some USB HID drivers
HID: wacom: fix problems when device is not a valid USB device
HID: check for valid USB device for many HID drivers
USB: gadget: zero allocate endpoint 0 buffers
Gwendal Grignou (1):
iio: at91-sama5d2: Fix incorrect sign extension
Hannes Reinecke (1):
libata: add horkage for ASMedia 1092
Hans de Goede (1):
HID: quirks: Add quirk for the Microsoft Surface 3 type-cover
Herve Codina (2):
mtd: rawnand: fsmc: Take instruction delay into account
mtd: rawnand: fsmc: Fix timing computation
Ian Rogers (1):
perf tools: Fix SMT detection fast read path
Igor Pylypiv (1):
scsi: pm80xx: Do not call scsi_remove_host() in pm8001_alloc()
J. Bruce Fields (1):
nfsd: fix use-after-free due to delegation race
James Zhu (3):
drm/amdkfd: separate kfd_iommu_resume from kfd_resume
drm/amdgpu: add amdgpu_amdkfd_resume_iommu
drm/amdgpu: move iommu_resume before ip init/resume
Jesse Brandeburg (1):
ice: ignore dropped packets during init
Jeya R (1):
misc: fastrpc: fix improper packet size calculation
Jianglei Nie (1):
nfp: Fix memory leak in nfp_cpp_area_cache_add()
Jianguo Wu (1):
udp: using datalen to cap max gso segments
Jimmy Assarsson (2):
can: kvaser_usb: get CAN clock frequency from device
can: kvaser_pciefd: kvaser_pciefd_rx_error_frame(): increase correct
stats->{rx,tx}_errors counter
Joakim Zhang (1):
net: fec: only clear interrupt of handling queue in
fec_enet_rx_queue()
Josef Bacik (1):
btrfs: clear extent buffer uptodate when we fail to write it
Kai-Heng Feng (1):
xhci: Remove CONFIG_USB_DEFAULT_PERSIST to prevent xHCI from runtime
suspending
Kailang Yang (1):
ALSA: hda/realtek - Add headset Mic support for Lenovo ALC897 platform
Karen Sornek (1):
i40e: Fix failed opcode appearing if handling messages from VF
Kelly Devilliv (1):
csky: fix typo of fpu config macro
Kister Genesis Jimenez (1):
iio: gyro: adxrs290: fix data signedness
Krzysztof Kozlowski (1):
nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done
Lang Yu (1):
drm/amd/amdkfd: adjust dummy functions' placement
Lars-Peter Clausen (8):
iio: trigger: Fix reference counting
iio: stk3310: Don't return error code in interrupt handler
iio: mma8452: Fix trigger reference couting
iio: ltr501: Don't return error code in trigger handler
iio: kxsd9: Don't return error code in trigger handler
iio: itg3200: Call iio_trigger_notify_done() on error
iio: dln2: Check return value of devm_iio_trigger_register()
iio: ad7768-1: Call iio_trigger_notify_done() on error
Lee Jones (1):
net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero
Louis Amas (1):
net: mvpp2: fix XDP rx queues registering
Lukas Bulwahn (1):
MAINTAINERS: adjust GCC PLUGINS after gcc-plugin.sh removal
Manish Chopra (1):
qede: validate non LSO skb length
Manjong Lee (1):
mm: bdi: initialize bdi_min_ratio when bdi is unregistered
Marek Behún (1):
Revert "PCI: aardvark: Fix support for PCI_ROM_ADDRESS1 on emulated
bridge"
Markus Hochholdinger (1):
md: fix update super 1.0 on rdev size change
Masahiro Yamada (3):
gcc-plugins: simplify GCC plugin-dev capability test
kbuild: simplify GCC_PLUGINS enablement in dummy-tools/gcc
doc: gcc-plugins: update gcc-plugins.rst
Mateusz Palczewski (1):
i40e: Fix pre-set max number of queues for VF
Mathias Nyman (1):
xhci: avoid race between disable slot command and host runtime suspend
Maxim Mikityanskiy (2):
bpf: Fix the off-by-two error in range markings
bpf: Add selftests to cover packet access corner cases
Michal Maloszewski (1):
iavf: Fix reporting when setting descriptor count
Mike Marciniszyn (4):
IB/hfi1: Insure use of smp_processor_id() is preempt disabled
IB/hfi1: Fix early init panic
IB/hfi1: Fix leak of rcvhdrtail_dummy_kvaddr
IB/hfi1: Correct guard on eager buffer deallocation
Miles Chen (1):
clk: imx: use module_platform_driver
Mitch Williams (1):
iavf: restore MSI state on reset
Nicolas Dichtel (1):
vrf: don't run conntrack on vrf with !dflt qdisc
Noralf Trønnes (1):
iio: dln2-adc: Fix lockdep complaint
Norbert Zulinski (1):
i40e: Fix NULL pointer dereference in i40e_dbg_dump_desc
Pali Rohár (2):
irqchip/armada-370-xp: Fix return value of armada_370_xp_msi_alloc()
irqchip/armada-370-xp: Fix support for Multi-MSI interrupts
Pavel Hofman (2):
usb: core: config: fix validation of wMaxPacketValue entries
usb: core: config: using bit mask instead of individual bits
Peilin Ye (1):
selftests/fib_tests: Rework fib_rp_filter_test()
Qu Wenruo (1):
btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error
handling
Rafael J. Wysocki (1):
PM: runtime: Fix pm_runtime_active() kerneldoc comment
Rob Clark (1):
ASoC: rt5682: Fix crash due to out of scope stack vars
Robert Karszniewicz (1):
Documentation/Kbuild: Remove references to gcc-plugin.sh
Roman Bolshakov (1):
scsi: qla2xxx: Format log strings only if needed
Sebastian Andrzej Siewior (1):
Documentation/locking/locktypes: Update migrate_disable() bits.
Shin'ichiro Kawasaki (1):
scsi: scsi_debug: Fix buffer size of REPORT ZONES command
Srinivas Kandagatla (4):
ASoC: qdsp6: q6routing: Fix return value from
msm_routing_put_audio_mixer
ASoC: codecs: wsa881x: fix return values from kcontrol put
ASoC: codecs: wcd934x: handle channel mappping list correctly
ASoC: codecs: wcd934x: return correct value from mixer put
Stefano Brivio (1):
nft_set_pipapo: Fix bucket load in AVX2 lookup routine for six 8-bit
groups
Steven Rostedt (VMware) (2):
tracefs: Have new files inherit the ownership of their parent
tracefs: Set all files to the same group ownership as the mount option
Takashi Iwai (3):
ALSA: pcm: oss: Fix negative period/buffer sizes
ALSA: pcm: oss: Limit the period size to 16MB
ALSA: pcm: oss: Handle missing errors in snd_pcm_oss_change_params*()
Thomas Haemmerle (1):
usb: gadget: uvc: fix multiple opens
Tom Lendacky (1):
x86/sme: Explicitly map new EFI memmap table as encrypted
Valdis Kletnieks (1):
gcc-plugins: fix gcc 11 indigestion with plugins...
Vincent Mailhol (1):
can: pch_can: pch_can_rx_normal: fix use after free
Vitaly Kuznetsov (1):
KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB
flush hypercall
Vladimir Murzin (1):
irqchip: nvic: Fix offset for Interrupt Priority Offsets
Werner Sembach (1):
ALSA: hda/realtek: Fix quirk for TongFang PHxTxX1
Wolfram Sang (1):
mmc: renesas_sdhi: initialize variable properly when tuning
Wudi Wang (1):
irqchip/irq-gic-v3-its.c: Force synchronisation when issuing INVALL
Yang Yingliang (1):
iio: accel: kxcjk-1013: Fix possible memory leak in probe and remove
Yangyang Li (2):
RDMA/hns: Do not halt commands during reset until later
RDMA/hns: Do not destroy QP resources in the hw resetting phase
Yifan Zhang (2):
drm/amdgpu: init iommu after amdkfd device init
drm/amdkfd: fix boot failure when iommu is disabled in Picasso.
xiazhengqiao (1):
HID: google: add eel USB id
.../devicetree/bindings/net/ethernet-phy.yaml | 8 +
Documentation/kbuild/gcc-plugins.rst | 47 +-
Documentation/locking/locktypes.rst | 9 +-
MAINTAINERS | 1 -
arch/csky/kernel/traps.c | 4 +-
arch/x86/Kconfig | 1 +
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/platform/efi/quirks.c | 3 +-
block/ioprio.c | 3 +
drivers/android/binder.c | 21 +-
drivers/ata/libata-core.c | 2 +
drivers/clk/imx/clk-imx8qxp-lpcg.c | 2 +-
drivers/clk/imx/clk-imx8qxp.c | 2 +-
drivers/clk/qcom/clk-regmap-mux.c | 2 +-
drivers/clk/qcom/common.c | 12 +
drivers/clk/qcom/common.h | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 97 +--
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 145 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 15 +-
drivers/gpu/drm/drm_syncobj.c | 11 +-
drivers/hid/Kconfig | 10 +-
drivers/hid/hid-asus.c | 6 +-
drivers/hid/hid-bigbenff.c | 2 +-
drivers/hid/hid-chicony.c | 8 +-
drivers/hid/hid-corsair.c | 7 +-
drivers/hid/hid-elan.c | 2 +-
drivers/hid/hid-elo.c | 3 +
drivers/hid/hid-google-hammer.c | 2 +
drivers/hid/hid-holtek-kbd.c | 9 +-
drivers/hid/hid-holtek-mouse.c | 9 +
drivers/hid/hid-ids.h | 2 +
drivers/hid/hid-lg.c | 10 +-
drivers/hid/hid-logitech-dj.c | 2 +-
drivers/hid/hid-prodikeys.c | 10 +-
drivers/hid/hid-quirks.c | 1 +
drivers/hid/hid-roccat-arvo.c | 3 +
drivers/hid/hid-roccat-isku.c | 3 +
drivers/hid/hid-roccat-kone.c | 3 +
drivers/hid/hid-roccat-koneplus.c | 3 +
drivers/hid/hid-roccat-konepure.c | 3 +
drivers/hid/hid-roccat-kovaplus.c | 3 +
drivers/hid/hid-roccat-lua.c | 3 +
drivers/hid/hid-roccat-pyra.c | 3 +
drivers/hid/hid-roccat-ryos.c | 3 +
drivers/hid/hid-roccat-savu.c | 3 +
drivers/hid/hid-samsung.c | 3 +
drivers/hid/hid-u2fzero.c | 2 +-
drivers/hid/hid-uclogic-core.c | 3 +
drivers/hid/hid-uclogic-params.c | 3 +-
drivers/hid/wacom_sys.c | 19 +-
drivers/iio/accel/kxcjk-1013.c | 5 +-
drivers/iio/accel/kxsd9.c | 6 +-
drivers/iio/accel/mma8452.c | 2 +-
drivers/iio/adc/ad7768-1.c | 2 +-
drivers/iio/adc/at91-sama5d2_adc.c | 3 +-
drivers/iio/adc/axp20x_adc.c | 18 +-
drivers/iio/adc/dln2-adc.c | 21 +-
drivers/iio/adc/stm32-adc.c | 1 +
drivers/iio/gyro/adxrs290.c | 5 +-
drivers/iio/gyro/itg3200_buffer.c | 2 +-
drivers/iio/industrialio-trigger.c | 1 -
drivers/iio/light/ltr501.c | 2 +-
drivers/iio/light/stk3310.c | 6 +-
drivers/iio/trigger/stm32-timer-trigger.c | 2 +-
drivers/infiniband/hw/hfi1/chip.c | 2 +
drivers/infiniband/hw/hfi1/driver.c | 2 +
drivers/infiniband/hw/hfi1/init.c | 40 +-
drivers/infiniband/hw/hfi1/sdma.c | 2 +-
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 14 +-
drivers/irqchip/irq-armada-370-xp.c | 16 +-
drivers/irqchip/irq-aspeed-scu-ic.c | 4 +-
drivers/irqchip/irq-gic-v3-its.c | 2 +-
drivers/irqchip/irq-nvic.c | 2 +-
drivers/md/md.c | 1 +
drivers/misc/fastrpc.c | 10 +-
drivers/mmc/host/renesas_sdhi_core.c | 2 +-
drivers/mtd/nand/raw/fsmc_nand.c | 36 +-
drivers/net/bonding/bond_alb.c | 14 +-
drivers/net/can/kvaser_pciefd.c | 8 +-
drivers/net/can/m_can/m_can.c | 14 +-
drivers/net/can/pch_can.c | 2 +-
drivers/net/can/sja1000/ems_pcmcia.c | 7 +-
.../net/can/usb/kvaser_usb/kvaser_usb_leaf.c | 101 ++-
drivers/net/ethernet/altera/altera_tse_main.c | 9 +-
drivers/net/ethernet/freescale/fec.h | 3 +
drivers/net/ethernet/freescale/fec_main.c | 2 +-
.../net/ethernet/intel/i40e/i40e_debugfs.c | 8 +
.../ethernet/intel/i40e/i40e_virtchnl_pf.c | 75 ++-
.../ethernet/intel/i40e/i40e_virtchnl_pf.h | 2 +
.../net/ethernet/intel/iavf/iavf_ethtool.c | 43 +-
drivers/net/ethernet/intel/iavf/iavf_main.c | 1 +
drivers/net/ethernet/intel/ice/ice_main.c | 3 +
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 4 +-
.../netronome/nfp/nfpcore/nfp_cppcore.c | 4 +-
drivers/net/ethernet/qlogic/qede/qede_fp.c | 7 +
drivers/net/ethernet/qlogic/qla3xxx.c | 19 +-
drivers/net/usb/cdc_ncm.c | 2 +
drivers/net/vrf.c | 8 +-
drivers/pci/controller/pci-aardvark.c | 9 -
drivers/scsi/pm8001/pm8001_init.c | 6 +-
drivers/scsi/qla2xxx/qla_dbg.c | 3 +
drivers/scsi/scsi_debug.c | 2 +-
drivers/usb/core/config.c | 6 +-
drivers/usb/gadget/function/uvc.h | 2 +
drivers/usb/gadget/function/uvc_v4l2.c | 49 +-
drivers/usb/gadget/legacy/dbgp.c | 2 +-
drivers/usb/host/xhci-hub.c | 1 +
drivers/usb/host/xhci-ring.c | 1 -
drivers/usb/host/xhci.c | 26 +-
fs/aio.c | 184 ++++-
fs/btrfs/extent_io.c | 6 +
fs/btrfs/root-tree.c | 3 +-
fs/nfsd/nfs4recover.c | 1 +
fs/nfsd/nfs4state.c | 9 +-
fs/nfsd/nfsctl.c | 14 +-
fs/signalfd.c | 12 +-
fs/tracefs/inode.c | 76 +++
include/linux/bpf.h | 1 +
include/linux/hid.h | 5 +
include/linux/pm_runtime.h | 2 +-
include/linux/wait.h | 26 +
include/net/bond_alb.h | 2 +-
include/net/netfilter/nf_conntrack.h | 6 +-
include/uapi/asm-generic/poll.h | 2 +-
kernel/bpf/verifier.c | 2 +-
kernel/sched/wait.c | 7 +
mm/backing-dev.c | 7 +
net/core/devlink.c | 16 +-
net/core/neighbour.c | 3 +-
net/ipv4/udp.c | 2 +-
net/ipv6/seg6_iptunnel.c | 8 +
net/netfilter/nf_conntrack_core.c | 6 +-
net/netfilter/nf_conntrack_netlink.c | 2 +-
net/netfilter/nf_flow_table_core.c | 4 +-
net/netfilter/nft_set_pipapo_avx2.c | 2 +-
net/nfc/netlink.c | 6 +-
net/sched/sch_fq_pie.c | 1 +
scripts/dummy-tools/gcc | 10 +-
scripts/gcc-plugin.sh | 19 -
scripts/gcc-plugins/Kconfig | 2 +-
scripts/gcc-plugins/Makefile | 4 +-
sound/core/control_compat.c | 3 +
sound/core/oss/pcm_oss.c | 37 +-
sound/pci/hda/patch_realtek.c | 80 ++-
sound/soc/codecs/rt5682.c | 10 +-
sound/soc/codecs/wcd934x.c | 126 +++-
sound/soc/codecs/wsa881x.c | 16 +-
sound/soc/qcom/qdsp6/q6routing.c | 8 +-
tools/build/Makefile.feature | 1 -
tools/build/feature/Makefile | 4 -
tools/build/feature/test-all.c | 5 -
tools/build/feature/test-libpython-version.c | 11 -
tools/perf/Makefile.config | 2 -
tools/perf/util/smt.c | 2 +-
.../bpf/verifier/xdp_direct_packet_access.c | 632 +++++++++++++++++-
tools/testing/selftests/net/fib_tests.sh | 59 +-
tools/testing/selftests/netfilter/Makefile | 3 +-
.../selftests/netfilter/conntrack_vrf.sh | 241 +++++++
159 files changed, 2201 insertions(+), 681 deletions(-)
delete mode 100755 scripts/gcc-plugin.sh
delete mode 100644 tools/build/feature/test-libpython-version.c
create mode 100644 tools/testing/selftests/netfilter/conntrack_vrf.sh
--
2.20.1
1
131

14 Jan '22
From: Hangyu Hua <hbh25y(a)gmail.com>
mainline inclusion
from mainline-v5.16-rc6
commit 5f9562ebe710c307adc5f666bf1a2162ee7977c0
bugzilla: 185989 https://gitee.com/openeuler/kernel/issues/I4DDEL
CVE: CVE-2021-45480
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
__rds_conn_create() did not release conn->c_path when loop_trans != 0 and
trans->t_prefer_loopback != 0 and is_outgoing == 0.
Fixes: aced3ce57cd3 ("RDS tcp loopback connection can hang")
Signed-off-by: Hangyu Hua <hbh25y(a)gmail.com>
Reviewed-by: Sharath Srinivasan <sharath.srinivasan(a)oracle.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Baisong Zhong <zhongbaisong(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
net/rds/connection.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/rds/connection.c b/net/rds/connection.c
index a3bc4b54d491..b4cc699c5fad 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -253,6 +253,7 @@ static struct rds_connection *__rds_conn_create(struct net *net,
* should end up here, but if it
* does, reset/destroy the connection.
*/
+ kfree(conn->c_path);
kmem_cache_free(rds_conn_slab, conn);
conn = ERR_PTR(-EOPNOTSUPP);
goto out;
--
2.20.1
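A condensed, hypothetical illustration of the fixed error handling;
the kcalloc() allocation of conn->c_path follows mainline
__rds_conn_create(), but the function below is a simplified sketch
with made-up parameters, not a verbatim quote:
/* Sketch of the fixed error path -- not verbatim kernel code. */
static struct rds_connection *conn_create_sketch(int npaths, gfp_t gfp,
						 bool prefer_loopback,
						 bool is_outgoing)
{
	struct rds_connection *conn;
	conn = kmem_cache_zalloc(rds_conn_slab, gfp);
	if (!conn)
		return ERR_PTR(-ENOMEM);
	conn->c_path = kcalloc(npaths, sizeof(*conn->c_path), gfp);
	if (!conn->c_path) {
		kmem_cache_free(rds_conn_slab, conn);
		return ERR_PTR(-ENOMEM);
	}
	if (prefer_loopback && !is_outgoing) {
		/* Before the fix, c_path leaked on this path. */
		kfree(conn->c_path);
		kmem_cache_free(rds_conn_slab, conn);
		return ERR_PTR(-EOPNOTSUPP);
	}
	return conn;
}
Note the mixed allocators: conn comes from a slab cache and c_path
from kcalloc(), so each must be released with its matching free
function, as the one-line fix above does.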
1
2
Backport 5.10.84 LTS patches from upstream.
Aaro Koskinen (1):
i2c: cbus-gpio: set atomic transfer callback
Alain Volmat (3):
i2c: stm32f7: flush TX FIFO upon transfer errors
i2c: stm32f7: recover the bus on access timeout
i2c: stm32f7: stop dma transfer in case of NACK
Alexey Kardashevskiy (1):
powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window
for persistent memory"
Andreas Gruenbacher (1):
gfs2: Fix length of holes reported at end-of-file
Arnd Bergmann (1):
siphash: use _unaligned version by default
Badhri Jagan Sridharan (1):
usb: typec: tcpm: Wait in SNK_DEBOUNCED until disconnect
Baokun Li (2):
sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl
sata_fsl: fix warning in remove_proc_entry when rmmod sata_fsl
Benjamin Coddington (1):
NFSv42: Fix pagecache invalidation after COPY/CLONE
Benjamin Poirier (1):
net: mpls: Fix notifications when deleting a device
Bernard Zhao (1):
drm/amd/amdgpu: fix potential memleak
Bob Peterson (1):
gfs2: release iopen glock early in evict
Catalin Marinas (1):
KVM: arm64: Avoid setting the upper 32 bits of TCR_EL2 and CPTR_EL2 to
1
Christophe JAILLET (1):
net: marvell: mvpp2: Fix the computation of shared CPUs
Dan Carpenter (1):
KVM: VMX: Set failure code in prepare_vmcs02()
Dmitry Bogdanov (2):
atlantic: Increase delay for fw transactions
atlantic: Fix statistics logic for production hardware
Dongliang Mu (1):
dpaa2-eth: destroy workqueue at the end of remove function
Douglas Anderson (1):
drm/msm/a6xx: Allocate enough space for GMU registers
Dust Li (1):
net/smc: fix wrong list_del in smc_lgr_cleanup_early
Eiichi Tsukata (2):
rxrpc: Fix rxrpc_peer leak in rxrpc_look_up_bundle()
rxrpc: Fix rxrpc_local leak in rxrpc_lookup_peer()
Eric Dumazet (2):
net: annotate data-races on txq->xmit_lock_owner
ipv4: convert fib_num_tclassid_users to atomic_t
Feng Tang (2):
x86/tsc: Add a timer to make sure TSC_adjust is always checked
x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
Gustavo A. R. Silva (1):
wireguard: ratelimiter: use kvcalloc() instead of kvzalloc()
Helge Deller (3):
parisc: Fix KBUILD_IMAGE for self-extracting kernel
parisc: Fix "make install" on newer debian releases
parisc: Mark cr16 CPU clocksource unstable on all SMP machines
Ian Rogers (2):
perf hist: Fix memory leak of a perf_hpp_fmt
perf report: Fix memory leaks around perf_tip()
Ioanna Alifieraki (1):
ipmi: Move remove_work to dedicated workqueue
Jason A. Donenfeld (6):
wireguard: selftests: increase default dmesg log size
wireguard: allowedips: add missing __rcu annotation to satisfy sparse
wireguard: selftests: actually test for routing loops
wireguard: device: reset peer src endpoint when netns exits
wireguard: receive: use ring buffer for incoming handshakes
wireguard: receive: drop handshakes if queue lock is contended
Jay Dolan (2):
serial: 8250_pci: Fix ACCES entries in pci_serial_quirks array
serial: 8250_pci: rewrite pericom_do_set_divisor()
Jimmy Wang (1):
platform/x86: thinkpad_acpi: Add support for dual fan control
Joerg Roedel (1):
x86/64/mm: Map all kernel memory into trampoline_pgd
Johan Hovold (1):
serial: core: fix transmit-buffer reset and memleak
Jordy Zomer (1):
ipv6: check return value of ipv6_skip_exthdr
Juergen Gross (1):
x86/pv: Switch SWAPGS to ALTERNATIVE
Julian Braha (1):
drm/sun4i: fix unmet dependency on RESET_CONTROLLER for
PHY_SUN6I_MIPI_DPHY
Lai Jiangshan (4):
KVM: X86: Use vcpu->arch.walk_mmu for kvm_mmu_invlpg()
x86/entry: Use the correct fence macro after swapgs in kernel CR3
x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()
Li Zhijian (2):
wireguard: selftests: rename DEBUG_PI_LIST to DEBUG_PLIST
selftests: net: Correct case name
Like Xu (1):
KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register
Lorenzo Bianconi (1):
mt76: mt7915: fix NULL pointer dereference in mt7915_get_phy_mode
Lukas Wunner (1):
serial: 8250: Fix RTS modem control while in rs485 mode
Maciej W. Rozycki (1):
vgacon: Propagate console boot parameters before calling `vc_resize'
Manaf Meethalavalappu Pallikunhi (1):
thermal: core: Reset previous low and high trip during thermal zone
init
Mario Limonciello (2):
ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile
ACPI: Add stubs for wakeup handler functions
Mark Rutland (1):
arm64: ftrace: add missing BTIs
Masami Hiramatsu (1):
kprobes: Limit max data_size of the kretprobe instances
Mathias Nyman (1):
xhci: Fix commad ring abort, write all 64 bits to CRCR register.
Michael Sterritt (1):
x86/sev: Fix SEV-ES INS/OUTS instructions for word, dword, and qword
Mike Christie (1):
scsi: iscsi: Unblock session then wake up error handler
Miklos Szeredi (2):
ovl: simplify file splice
ovl: fix deadlock in splice write
Mordechay Goodstein (1):
iwlwifi: mvm: retry init flow if failed
Nicholas Kazlauskas (1):
drm/amd/display: Allow DSC on supported MST branch devices
Nikita Danilov (2):
atlatnic: enable Nbase-t speeds with base-t
atlantic: Add missing DIDs and fix 115c.
Niklas Schnelle (1):
s390/pci: move pseudo-MMIO to prevent MIO overlap
Ole Ernst (1):
USB: NO_LPM quirk Lenovo Powered USB-C Travel Hub
Paolo Abeni (1):
tcp: fix page frag corruption on page fault
Paolo Bonzini (1):
KVM: x86: Use a stable condition around all VT-d PI paths
Patrik John (1):
serial: tegra: Change lower tolerance baud rate limit for tegra20 and
tegra30
Pierre Gondois (1):
serial: pl011: Add ACPI SBSA UART match id
Pierre-Louis Bossart (1):
ALSA: intel-dsp-config: add quirk for CML devices based on ES8336
codec
Qais Yousef (1):
sched/uclamp: Fix rq->uclamp_max not set on first enqueue
Randy Dunlap (1):
natsemi: xtensa: fix section mismatch warnings
Rob Clark (1):
drm/msm: Do hw_init() before capturing GPU state
Sameer Pujar (9):
ASoC: tegra: Fix wrong value type in ADMAIF
ASoC: tegra: Fix wrong value type in I2S
ASoC: tegra: Fix wrong value type in DMIC
ASoC: tegra: Fix wrong value type in DSPK
ASoC: tegra: Fix kcontrol put callback in ADMAIF
ASoC: tegra: Fix kcontrol put callback in I2S
ASoC: tegra: Fix kcontrol put callback in DMIC
ASoC: tegra: Fix kcontrol put callback in DSPK
ASoC: tegra: Fix kcontrol put callback in AHUB
Sameer Saurabh (3):
atlantic: Fix to display FW bundle version instead of FW mac version.
Remove Half duplex mode speed capabilities.
atlantic: Remove warn trace message.
Sean Christopherson (2):
KVM: Disallow user memslot with size that exceeds "unsigned long"
KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ_TLB_FLUSH_GUEST
Slark Xiao (1):
platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3
deep
Stanislaw Gruszka (1):
rt2x00: do not mark device gone on EPROTO errors during start
Stephen Suryaputra (1):
vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit
Steven Rostedt (VMware) (1):
tracing/histograms: String compares should not care about signed
values
Sven Eckelmann (1):
tty: serial: msm_serial: Deactivate RX DMA for polling support
Sven Schuchmann (1):
net: usb: lan78xx: lan78xx_phy_init(): use PHY_POLL instead of "0" if
no IRQ is available
Teng Qi (2):
ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow
in hns_dsaf_ge_srst_by_port()
net: ethernet: dec: tulip: de4x5: fix possible array overflows in
type3_infoblock()
Tianjia Zhang (1):
net/tls: Fix authentication failure in CCM mode
Tony Lu (1):
net/smc: Keep smc_close_final rc during active close
Vasily Gorbik (1):
s390/setup: avoid using memblock_enforce_memory_limit
Wang Yugui (1):
btrfs: check-integrity: fix a warning on write caching disabled disk
Wei Yongjun (1):
ipmi: msghandler: Make symbol 'remove_work_wq' static
Wen Gu (2):
net/smc: Transfer remaining wait queue entries during fallback
net/smc: Avoid warning of possible recursive locking
William Kucharski (1):
net/rds: correct socket tunable error in rds_tcp_tune()
Xing Song (1):
mac80211: do not access the IV when it was stripped
Zhang Changzhong (1):
can: j1939: j1939_tp_cmd_recv(): check the dst address of TP.CM_BAM
Zhou Qingyang (2):
net: qlogic: qlcnic: Fix a NULL pointer dereference in
qlcnic_83xx_add_rings()
net/mlx4_en: Fix an use-after-free bug in
mlx4_en_try_alloc_resources()
liuguoqiang (1):
net: return correct error code
msizanoen1 (1):
ipv6: fix memory leak in fib6_rule_suppress
shaoyunl (1):
drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered
again
zhangyue (1):
net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be
out of bound
arch/arm64/include/asm/kvm_arm.h | 4 +-
arch/arm64/kernel/entry-ftrace.S | 6 +
arch/parisc/Makefile | 5 +
arch/parisc/install.sh | 1 +
arch/parisc/kernel/time.c | 24 +-
arch/powerpc/platforms/pseries/iommu.c | 9 -
arch/s390/include/asm/pci_io.h | 7 +-
arch/s390/kernel/setup.c | 3 -
arch/x86/entry/entry_64.S | 45 ++-
arch/x86/include/asm/irqflags.h | 20 +-
arch/x86/include/asm/paravirt.h | 20 --
arch/x86/include/asm/paravirt_types.h | 2 -
arch/x86/kernel/asm-offsets_64.c | 1 -
arch/x86/kernel/paravirt.c | 1 -
arch/x86/kernel/paravirt_patch.c | 3 -
arch/x86/kernel/sev-es.c | 57 ++--
arch/x86/kernel/tsc.c | 28 +-
arch/x86/kernel/tsc_sync.c | 41 +++
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/svm/pmu.c | 2 +-
arch/x86/kvm/vmx/nested.c | 4 +-
arch/x86/kvm/vmx/posted_intr.c | 20 +-
arch/x86/kvm/vmx/vmx.c | 23 +-
arch/x86/realmode/init.c | 12 +-
arch/x86/xen/enlighten_pv.c | 3 -
arch/x86/xen/xen-asm.S | 20 ++
drivers/ata/ahci.c | 1 +
drivers/ata/sata_fsl.c | 20 +-
drivers/char/ipmi/ipmi_msghandler.c | 13 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 1 +
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 5 +
.../display/amdgpu_dm/amdgpu_dm_mst_types.c | 20 +-
drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 4 +-
drivers/gpu/drm/msm/msm_debugfs.c | 1 +
drivers/gpu/drm/sun4i/Kconfig | 1 +
drivers/i2c/busses/i2c-cbus-gpio.c | 5 +-
drivers/i2c/busses/i2c-stm32f7.c | 31 +-
.../ethernet/aquantia/atlantic/aq_common.h | 27 +-
.../net/ethernet/aquantia/atlantic/aq_hw.h | 2 +
.../net/ethernet/aquantia/atlantic/aq_nic.c | 10 +-
.../ethernet/aquantia/atlantic/aq_pci_func.c | 7 +-
.../net/ethernet/aquantia/atlantic/aq_vec.c | 3 -
.../aquantia/atlantic/hw_atl/hw_atl_utils.c | 15 +-
.../atlantic/hw_atl/hw_atl_utils_fw2x.c | 3 -
.../aquantia/atlantic/hw_atl2/hw_atl2.c | 22 +-
.../aquantia/atlantic/hw_atl2/hw_atl2.h | 2 +
.../aquantia/atlantic/hw_atl2/hw_atl2_utils.h | 38 ++-
.../atlantic/hw_atl2/hw_atl2_utils_fw.c | 110 +++++--
drivers/net/ethernet/dec/tulip/de4x5.c | 34 +-
.../net/ethernet/freescale/dpaa2/dpaa2-eth.c | 2 +
.../ethernet/hisilicon/hns/hns_dsaf_misc.c | 4 +
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 2 +-
.../net/ethernet/mellanox/mlx4/en_netdev.c | 9 +-
drivers/net/ethernet/natsemi/xtsonic.c | 2 +-
.../ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 10 +-
drivers/net/usb/lan78xx.c | 2 +-
drivers/net/vrf.c | 2 +
drivers/net/wireguard/allowedips.c | 2 +-
drivers/net/wireguard/device.c | 39 +--
drivers/net/wireguard/device.h | 9 +-
drivers/net/wireguard/queueing.c | 6 +-
drivers/net/wireguard/queueing.h | 2 +-
drivers/net/wireguard/ratelimiter.c | 4 +-
drivers/net/wireguard/receive.c | 39 ++-
drivers/net/wireguard/socket.c | 2 +-
drivers/net/wireless/intel/iwlwifi/iwl-drv.c | 22 +-
drivers/net/wireless/intel/iwlwifi/iwl-drv.h | 3 +
.../net/wireless/intel/iwlwifi/mvm/mac80211.c | 24 +-
drivers/net/wireless/intel/iwlwifi/mvm/mvm.h | 3 +
drivers/net/wireless/intel/iwlwifi/mvm/ops.c | 3 +
.../net/wireless/mediatek/mt76/mt7915/mcu.c | 4 +-
.../net/wireless/ralink/rt2x00/rt2x00usb.c | 3 +
drivers/platform/x86/thinkpad_acpi.c | 13 +-
drivers/scsi/scsi_transport_iscsi.c | 6 +-
drivers/thermal/thermal_core.c | 2 +
drivers/tty/serial/8250/8250_pci.c | 39 ++-
drivers/tty/serial/8250/8250_port.c | 7 -
drivers/tty/serial/amba-pl011.c | 1 +
drivers/tty/serial/msm_serial.c | 3 +
drivers/tty/serial/serial-tegra.c | 4 +-
drivers/tty/serial/serial_core.c | 18 +-
drivers/usb/core/quirks.c | 3 +
drivers/usb/host/xhci-ring.c | 21 +-
drivers/usb/typec/tcpm/tcpm.c | 4 -
drivers/video/console/vgacon.c | 14 +-
fs/btrfs/disk-io.c | 14 +-
fs/gfs2/bmap.c | 2 +-
fs/gfs2/super.c | 14 +-
fs/nfs/nfs42proc.c | 5 +-
fs/overlayfs/file.c | 59 ++--
include/linux/acpi.h | 9 +
include/linux/kprobes.h | 2 +
include/linux/netdevice.h | 19 +-
include/linux/siphash.h | 14 +-
include/net/dst_cache.h | 11 +
include/net/fib_rules.h | 4 +-
include/net/ip_fib.h | 2 +-
include/net/netns/ipv4.h | 2 +-
include/net/sock.h | 13 +-
kernel/kprobes.c | 3 +
kernel/sched/core.c | 2 +-
kernel/trace/trace_events_hist.c | 2 +-
lib/siphash.c | 12 +-
net/can/j1939/transport.c | 6 +
net/core/dev.c | 5 +-
net/core/dst_cache.c | 19 ++
net/core/fib_rules.c | 2 +-
net/ipv4/devinet.c | 2 +-
net/ipv4/fib_frontend.c | 2 +-
net/ipv4/fib_rules.c | 5 +-
net/ipv4/fib_semantics.c | 4 +-
net/ipv6/esp6.c | 6 +
net/ipv6/fib6_rules.c | 4 +-
net/mac80211/rx.c | 3 +-
net/mpls/af_mpls.c | 68 +++-
net/rds/tcp.c | 2 +-
net/rxrpc/conn_client.c | 14 +-
net/rxrpc/peer_object.c | 14 +-
net/smc/af_smc.c | 14 +
net/smc/smc_close.c | 8 +-
net/smc/smc_core.c | 7 +-
net/tls/tls_sw.c | 4 +-
sound/hda/intel-dsp-config.c | 10 +
sound/soc/tegra/tegra186_dspk.c | 181 +++++++++--
sound/soc/tegra/tegra210_admaif.c | 140 +++++++--
sound/soc/tegra/tegra210_ahub.c | 11 +-
sound/soc/tegra/tegra210_dmic.c | 184 ++++++++---
sound/soc/tegra/tegra210_i2s.c | 296 +++++++++++++-----
tools/perf/builtin-report.c | 15 +-
tools/perf/ui/hist.c | 28 +-
tools/perf/util/hist.h | 1 -
tools/perf/util/util.c | 14 +-
tools/perf/util/util.h | 2 +-
tools/testing/selftests/net/fcnal-test.sh | 4 +-
tools/testing/selftests/wireguard/netns.sh | 30 +-
.../selftests/wireguard/qemu/debug.config | 2 +-
.../selftests/wireguard/qemu/kernel.config | 1 +
virt/kvm/kvm_main.c | 3 +-
138 files changed, 1694 insertions(+), 677 deletions(-)
--
2.20.1
1
121
Backport 5.10.83 LTS patches from upstream.
Adrian Hunter (1):
mmc: sdhci: Fix ADMA for PAGE_SIZE >= 64KiB
Albert Wang (1):
usb: dwc3: gadget: Fix null pointer exception
Alex Deucher (1):
drm/amdgpu/gfx9: switch to golden tsc registers for renoir+
Alexander Aring (1):
net: ieee802154: handle iftypes as u32
Alexander Mikhalitsyn (1):
shm: extend forced shm destroy to support objects from several IPC
nses
Amit Cohen (1):
mlxsw: spectrum: Protect driver from buggy firmware
Arjun Roy (1):
tcp: correctly handle increased zerocopy args struct size
Christophe Leroy (1):
powerpc/32: Fix hardlockup on vmap stack overflow
Dan Carpenter (4):
usb: chipidea: ci_hdrc_imx: fix potential error pointer dereference in
probe
staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect()
drm/nouveau/acr: fix a couple NULL vs IS_ERR() checks
drm/vc4: fix error code in vc4_create_object()
Daniele Palmas (1):
USB: serial: option: add Telit LE910S1 0x9200 composition
Danielle Ratson (1):
mlxsw: Verify the accessed index doesn't exceed the array length
David Hildenbrand (2):
proc/vmcore: fix clearing user buffer by properly using clear_user()
s390/mm: validate VMA in PGSTE manipulation functions
Davide Caratti (1):
net/sched: sch_ets: don't peek at classes beyond 'nbands'
Diana Wang (1):
nfp: checking parameter process for rx-usecs/tx-usecs is invalid
Dylan Hung (1):
mdio: aspeed: Fix "Link is Down" issue
Eric Dumazet (3):
mptcp: fix delack timer
ipv6: fix typos in __ip6_finish_output()
tcp_cubic: fix spurious Hystart ACK train detections for
not-cwnd-limited flows
Florent Fourcot (2):
netfilter: ctnetlink: fix filtering with CTA_TUPLE_REPLY
netfilter: ctnetlink: do not erase error code with EINVAL
Florian Fainelli (3):
ARM: dts: BCM5301X: Fix I2C controller interrupt
ARM: dts: BCM5301X: Add interrupt properties to GPIO node
ARM: dts: bcm2711: Fix PCIe interrupts
Guo DaXing (1):
net/smc: Fix loop in smc_listen
Hans Verkuil (1):
media: cec: copy sequence field for the reply
Heiner Kallweit (1):
lan743x: fix deadlock in lan743x_phy_link_status_change()
Helge Deller (1):
Revert "parisc: Fix backtrace to always include init funtion names"
Holger Assmann (1):
net: stmmac: retain PTP clock time during SIOCSHWTSTAMP ioctls
Huang Jianan (1):
erofs: fix deadlock when shrink erofs slab
Huang Pei (2):
MIPS: loongson64: fix FTLB configuration
MIPS: use 3-level pgtable for 64KB page size on MIPS_VA_BITS_48
Jakub Kicinski (2):
tls: splice_read: fix record type check
tls: fix replacing proto_ops
Jason Gerecke (1):
HID: wacom: Use "Confidence" flag to prevent reporting invalid
contacts
Jeff Layton (1):
ceph: properly handle statfs on multifs setups
Jesse Brandeburg (1):
igb: fix netpoll exit with traffic
Jiri Olsa (1):
tracing/uprobe: Fix uprobe_perf_open probes iteration
Joakim Zhang (2):
net: stmmac: fix system hang caused by eee_ctrl_timer during
suspend/resume
net: stmmac: platform: fix build warning when with !CONFIG_PM_SLEEP
Joerg Roedel (1):
iommu/amd: Clarify AMD IOMMUv2 initialization messages
Juergen Gross (9):
xen: sync include/xen/interface/io/ring.h with Xen's newest version
xen/blkfront: read response from backend only once
xen/blkfront: don't take local copy of a request from the ring page
xen/blkfront: don't trust the backend response data blindly
xen/netfront: read response from backend only once
xen/netfront: don't read data from request on the ring page
xen/netfront: disentangle tx_skb_freelist
xen/netfront: don't trust the backend response data blindly
tty: hvc: replace BUG_ON() with negative return value
Karsten Graul (1):
net/smc: Fix NULL pointer dereferencing in smc_vlan_by_tcpsk()
Kumar Thangavel (1):
net/ncsi : Add payload to be 32-bit aligned to fix dropped packets
Maciej Fijalkowski (1):
ice: fix vsi->txq_map sizing
Marek Behún (2):
PCI: aardvark: Deduplicate code in advk_pcie_rd_conf()
net: marvell: mvpp2: increase MTU limit when XDP enabled
Mark Rutland (1):
sched/scs: Reset task stack state in bringup_cpu()
Marta Plantykow (1):
ice: avoid bpf_prog refcount underflow
Mathias Nyman (2):
usb: hub: Fix usb enumeration issue due to address0 race
usb: hub: Fix locking issues with address0_mutex
Maurizio Lombardi (1):
nvmet: use IOCB_NOWAIT only if the filesystem supports it
Michael Kelley (1):
firmware: smccc: Fix check for ARCH_SOC_ID not implemented
Mike Christie (1):
scsi: core: sysfs: Fix setting device state to SDEV_RUNNING
Miklos Szeredi (1):
fuse: release pipe buf after last use
Minas Harutyunyan (1):
usb: dwc2: gadget: Fix ISOC flow for elapsed frames
Mingjie Zhang (1):
USB: serial: option: add Fibocom FM101-GL variants
Nathan Chancellor (1):
usb: dwc2: hcd_queue: Fix use of floating point literal
Nicholas Kazlauskas (1):
drm/amd/display: Set plane update flags for all planes in reset
Nicholas Piggin (1):
KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
Nikolay Aleksandrov (3):
net: nexthop: fix null pointer dereference when IPv6 is not enabled
net: ipv6: add fib6_nh_release_dsts stub
net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group
Nitesh B Venkatesh (1):
iavf: Prevent changing static ITR values if adaptive moderation is on
Noralf Trønnes (1):
staging/fbtft: Fix backlight
Ondrej Jirman (1):
usb: typec: fusb302: Fix masking of comparator and bc_lvl interrupts
Pali Rohár (4):
PCI: aardvark: Update comment about disabling link training
PCI: aardvark: Implement re-issuing config requests on CRS response
PCI: aardvark: Simplify initialization of rootcap on virtual bridge
PCI: aardvark: Fix link training
Peng Fan (1):
firmware: arm_scmi: pm: Propagate return value to caller
Pierre-Louis Bossart (1):
ALSA: intel-dsp-config: add quirk for JSL devices based on ES8336
codec
Russell King (Oracle) (2):
net: phylink: Force link down and retrigger resolve on interface
change
net: phylink: Force retrigger in case of latched link-fail indicator
Sakari Ailus (1):
ACPI: Get acpi_device's parent from the parent field
Shin'ichiro Kawasaki (1):
scsi: scsi_debug: Zero clear zones at reset write pointer
Sreekanth Reddy (1):
scsi: mpt3sas: Fix kernel panic during drive powercycle test
Srinivas Kandagatla (3):
ASoC: qdsp6: q6routing: Conditionally reset FrontEnd Mixer
ASoC: qdsp6: q6asm: fix q6asm_dai_prepare error handling
ASoC: codecs: wcd934x: return error code correctly from hw_params
Stefano Garzarella (1):
vhost/vsock: fix incorrect used length reported to the guest
Stefano Stabellini (2):
xen: don't continue xenstore initialization in case of errors
xen: detect uninitialized xenbus in xenbus_init
Steve French (1):
smb3: do not error on fsync when readonly
Steven Rostedt (VMware) (2):
tracing: Fix pid filtering when triggers are attached
tracing: Check pid filtering when creating events
Takashi Iwai (5):
ALSA: ctxfi: Fix out-of-range access
ALSA: hda/realtek: Fix LED on HP ProBook 435 G7
staging: greybus: Add missing rwsem around snd_ctl_remove() calls
ASoC: topology: Add missing rwsem around snd_ctl_remove() calls
ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE
Thinh Nguyen (2):
usb: dwc3: gadget: Ignore NoStream after End Transfer
usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer
Thomas Zeitlhofer (1):
PM: hibernate: use correct mode for swsusp_close()
Tim Harvey (1):
mmc: sdhci-esdhc-imx: disable CMDQ support
Todd Kjos (1):
binder: fix test regression due to sender_euid change
Tony Lu (2):
net/smc: Ensure the active closing peer first closes clcsock
net/smc: Don't call clcsock shutdown twice when smc shutdown
Trond Myklebust (1):
NFSv42: Don't fail clone() unless the OP_CLONE operation failed
Varun Prakash (1):
nvmet-tcp: fix incomplete data digest send
Vladimir Oltean (2):
net: mscc: ocelot: don't downgrade timestamping RX filters in
SIOCSHWTSTAMP
net: mscc: ocelot: correctly report the timestamping RX filters in
ethtool
Volodymyr Mytnyk (1):
net: marvell: prestera: fix double free issue on err path
Weichao Guo (1):
f2fs: set SBI_NEED_FSCK flag when inconsistent node block found
Werner Sembach (1):
ALSA: hda/realtek: Add quirk for ASRock NUC Box 1100
Will Mortensen (1):
netfilter: flowtable: fix IPv6 tunnel addr match
Ziyang Xuan (1):
net: vlan: fix underflow for the real_dev refcnt
yangxingwu (1):
netfilter: ipvs: Fix reuse connection if RS weight is 0
Documentation/networking/ipvs-sysctl.rst | 3 +-
arch/arm/boot/dts/bcm2711.dtsi | 8 +-
arch/arm/boot/dts/bcm5301x.dtsi | 4 +-
arch/arm/mach-socfpga/core.h | 2 +-
arch/arm/mach-socfpga/platsmp.c | 8 +-
arch/mips/Kconfig | 2 +-
arch/mips/kernel/cpu-probe.c | 4 +-
arch/parisc/kernel/vmlinux.lds.S | 3 +-
arch/powerpc/kernel/head_32.h | 6 +-
arch/powerpc/kvm/book3s_hv_builtin.c | 5 +-
arch/s390/mm/pgtable.c | 13 +
drivers/acpi/property.c | 11 +-
drivers/android/binder.c | 2 +-
drivers/block/xen-blkfront.c | 126 +++++---
drivers/firmware/arm_scmi/scmi_pm_domain.c | 4 +-
drivers/firmware/smccc/soc_id.c | 2 +-
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 46 ++-
.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 +-
.../gpu/drm/nouveau/nvkm/subdev/acr/gm200.c | 6 +-
.../gpu/drm/nouveau/nvkm/subdev/acr/gp102.c | 6 +-
drivers/gpu/drm/vc4/vc4_bo.c | 2 +-
drivers/hid/wacom_wac.c | 8 +-
drivers/hid/wacom_wac.h | 1 +
drivers/iommu/amd/iommu_v2.c | 6 +-
drivers/media/cec/core/cec-adap.c | 1 +
drivers/mmc/host/sdhci-esdhc-imx.c | 2 -
drivers/mmc/host/sdhci.c | 21 +-
drivers/mmc/host/sdhci.h | 4 +-
.../net/ethernet/intel/iavf/iavf_ethtool.c | 30 +-
drivers/net/ethernet/intel/ice/ice_lib.c | 9 +-
drivers/net/ethernet/intel/ice/ice_main.c | 18 +-
drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 14 +-
.../marvell/prestera/prestera_switchdev.c | 6 +-
drivers/net/ethernet/mellanox/mlxsw/minimal.c | 4 +
.../net/ethernet/mellanox/mlxsw/spectrum.c | 5 +
.../ethernet/mellanox/mlxsw/spectrum_ptp.c | 3 +
.../ethernet/mellanox/mlxsw/spectrum_router.c | 3 +
.../mellanox/mlxsw/spectrum_switchdev.c | 4 +
drivers/net/ethernet/microchip/lan743x_main.c | 12 +-
drivers/net/ethernet/mscc/ocelot.c | 11 +-
drivers/net/ethernet/netronome/nfp/nfp_net.h | 3 -
.../ethernet/netronome/nfp/nfp_net_ethtool.c | 2 +-
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 1 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 139 +++++----
.../ethernet/stmicro/stmmac/stmmac_platform.c | 44 +++
drivers/net/mdio/mdio-aspeed.c | 7 +
drivers/net/phy/phylink.c | 26 +-
drivers/net/xen-netfront.c | 272 ++++++++++-------
drivers/nvme/target/io-cmd-file.c | 4 +-
drivers/nvme/target/tcp.c | 7 +-
drivers/pci/controller/pci-aardvark.c | 235 +++++++--------
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 +-
drivers/scsi/scsi_debug.c | 5 +
drivers/scsi/scsi_sysfs.c | 2 +-
drivers/staging/fbtft/fb_ssd1351.c | 4 -
drivers/staging/fbtft/fbtft-core.c | 9 +-
drivers/staging/greybus/audio_helper.c | 8 +-
drivers/staging/rtl8192e/rtl8192e/rtl_core.c | 3 +-
drivers/tty/hvc/hvc_xen.c | 17 +-
drivers/usb/chipidea/ci_hdrc_imx.c | 18 +-
drivers/usb/core/hub.c | 24 +-
drivers/usb/dwc2/gadget.c | 17 +-
drivers/usb/dwc2/hcd_queue.c | 2 +-
drivers/usb/dwc3/gadget.c | 39 ++-
drivers/usb/serial/option.c | 5 +
drivers/usb/typec/tcpm/fusb302.c | 6 +-
drivers/vhost/vsock.c | 2 +-
drivers/xen/xenbus/xenbus_probe.c | 27 +-
fs/ceph/super.c | 11 +-
fs/cifs/file.c | 35 ++-
fs/erofs/utils.c | 8 +-
fs/f2fs/node.c | 1 +
fs/fuse/dev.c | 10 +-
fs/nfs/nfs42xdr.c | 3 +-
fs/proc/vmcore.c | 16 +-
include/linux/ipc_namespace.h | 15 +
include/linux/sched/task.h | 2 +-
include/net/ip6_fib.h | 1 +
include/net/ipv6_stubs.h | 1 +
include/net/nl802154.h | 7 +-
include/xen/interface/io/ring.h | 278 ++++++++++--------
ipc/shm.c | 189 +++++++++---
kernel/cpu.c | 7 +
kernel/power/hibernate.c | 6 +-
kernel/sched/core.c | 4 -
kernel/trace/trace.h | 24 +-
kernel/trace/trace_events.c | 10 +
kernel/trace/trace_uprobe.c | 1 +
net/8021q/vlan.c | 3 -
net/8021q/vlan_dev.c | 3 +
net/ipv4/nexthop.c | 35 ++-
net/ipv4/tcp.c | 4 +-
net/ipv4/tcp_cubic.c | 5 +-
net/ipv6/af_inet6.c | 1 +
net/ipv6/ip6_output.c | 2 +-
net/ipv6/route.c | 19 ++
net/mptcp/options.c | 3 +-
net/ncsi/ncsi-cmd.c | 24 +-
net/netfilter/ipvs/ip_vs_core.c | 8 +-
net/netfilter/nf_conntrack_netlink.c | 6 +-
net/netfilter/nf_flow_table_offload.c | 4 +-
net/sched/sch_ets.c | 8 +-
net/smc/af_smc.c | 12 +-
net/smc/smc_close.c | 6 +
net/smc/smc_core.c | 35 +--
net/tls/tls_main.c | 47 ++-
net/tls/tls_sw.c | 23 +-
sound/hda/intel-dsp-config.c | 9 +
sound/pci/ctxfi/ctamixer.c | 14 +-
sound/pci/ctxfi/ctdaio.c | 16 +-
sound/pci/ctxfi/ctresource.c | 7 +-
sound/pci/ctxfi/ctresource.h | 4 +-
sound/pci/ctxfi/ctsrc.c | 7 +-
sound/pci/hda/patch_realtek.c | 28 ++
sound/soc/codecs/wcd934x.c | 3 +-
sound/soc/qcom/qdsp6/q6asm-dai.c | 19 +-
sound/soc/qcom/qdsp6/q6routing.c | 6 +-
sound/soc/soc-topology.c | 3 +
119 files changed, 1539 insertions(+), 815 deletions(-)
--
2.20.1
1
119
[PATCH openEuler-5.10 01/16] USB: gadget: detect too-big endpoint 0 requests
by Zheng Zengkai 14 Jan '22
by Zheng Zengkai 14 Jan '22
14 Jan '22
From: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
stable inclusion
from stable-v5.10.85
commit 7193ad3e50e596ac2192531c58ba83b9e6d2444b
bugzilla: 185937 https://gitee.com/openeuler/kernel/issues/I4DDEL
CVE: CVE-2021-39685
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
Sometimes USB hosts can ask for buffers that are too large from endpoint
0, which should not be allowed. If this happens for OUT requests, stall
the endpoint, but for IN requests, trim the request size to the endpoint
buffer size.
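A minimal sketch of that rule follows (the ep0_clamp_wlength() helper and its buffer-size parameter are illustrative only, not part of the patch; each gadget below open-codes the same check against its own ep0 buffer size):
#include <linux/errno.h>
#include <linux/usb/ch9.h>
static int ep0_clamp_wlength(struct usb_ctrlrequest *ctrl, u16 bufsiz)
{
	u16 w_length = le16_to_cpu(ctrl->wLength);
	if (w_length <= bufsiz)
		return 0;
	/* OUT request too big: report an error so the caller stalls ep0 */
	if (ctrl->bRequestType == USB_DIR_OUT)
		return -EOVERFLOW;
	/* IN request: trim to what the ep0 buffer can hold */
	ctrl->wLength = cpu_to_le16(bufsiz);
	return 0;
}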
Co-developed-by: Szymon Heidrich <szymon.heidrich(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
drivers/usb/gadget/composite.c | 12 ++++++++++++
drivers/usb/gadget/legacy/dbgp.c | 13 +++++++++++++
drivers/usb/gadget/legacy/inode.c | 16 +++++++++++++++-
3 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/composite.c b/drivers/usb/gadget/composite.c
index a8704e6498ab..426132988512 100644
--- a/drivers/usb/gadget/composite.c
+++ b/drivers/usb/gadget/composite.c
@@ -1648,6 +1648,18 @@ composite_setup(struct usb_gadget *gadget, const struct usb_ctrlrequest *ctrl)
struct usb_function *f = NULL;
u8 endp;
+ if (w_length > USB_COMP_EP0_BUFSIZ) {
+ if (ctrl->bRequestType == USB_DIR_OUT) {
+ goto done;
+ } else {
+ /* Cast away the const, we are going to overwrite on purpose. */
+ __le16 *temp = (__le16 *)&ctrl->wLength;
+
+ *temp = cpu_to_le16(USB_COMP_EP0_BUFSIZ);
+ w_length = USB_COMP_EP0_BUFSIZ;
+ }
+ }
+
/* partial re-init of the response message; the function or the
* gadget might need to intercept e.g. a control-OUT completion
* when we delegate to it.
diff --git a/drivers/usb/gadget/legacy/dbgp.c b/drivers/usb/gadget/legacy/dbgp.c
index e1d566c9918a..e567afcb2794 100644
--- a/drivers/usb/gadget/legacy/dbgp.c
+++ b/drivers/usb/gadget/legacy/dbgp.c
@@ -345,6 +345,19 @@ static int dbgp_setup(struct usb_gadget *gadget,
void *data = NULL;
u16 len = 0;
+ if (length > DBGP_REQ_LEN) {
+ if (ctrl->bRequestType == USB_DIR_OUT) {
+ return err;
+ } else {
+ /* Cast away the const, we are going to overwrite on purpose. */
+ __le16 *temp = (__le16 *)&ctrl->wLength;
+
+ *temp = cpu_to_le16(DBGP_REQ_LEN);
+ length = DBGP_REQ_LEN;
+ }
+ }
+
+
if (request == USB_REQ_GET_DESCRIPTOR) {
switch (value>>8) {
case USB_DT_DEVICE:
diff --git a/drivers/usb/gadget/legacy/inode.c b/drivers/usb/gadget/legacy/inode.c
index 71e7d10dd76b..04b9c4f5f129 100644
--- a/drivers/usb/gadget/legacy/inode.c
+++ b/drivers/usb/gadget/legacy/inode.c
@@ -110,6 +110,8 @@ enum ep0_state {
/* enough for the whole queue: most events invalidate others */
#define N_EVENT 5
+#define RBUF_SIZE 256
+
struct dev_data {
spinlock_t lock;
refcount_t count;
@@ -144,7 +146,7 @@ struct dev_data {
struct dentry *dentry;
/* except this scratch i/o buffer for ep0 */
- u8 rbuf [256];
+ u8 rbuf[RBUF_SIZE];
};
static inline void get_dev (struct dev_data *data)
@@ -1333,6 +1335,18 @@ gadgetfs_setup (struct usb_gadget *gadget, const struct usb_ctrlrequest *ctrl)
u16 w_value = le16_to_cpu(ctrl->wValue);
u16 w_length = le16_to_cpu(ctrl->wLength);
+ if (w_length > RBUF_SIZE) {
+ if (ctrl->bRequestType == USB_DIR_OUT) {
+ return value;
+ } else {
+ /* Cast away the const, we are going to overwrite on purpose. */
+ __le16 *temp = (__le16 *)&ctrl->wLength;
+
+ *temp = cpu_to_le16(RBUF_SIZE);
+ w_length = RBUF_SIZE;
+ }
+ }
+
spin_lock (&dev->lock);
dev->setup_abort = 0;
if (dev->state == STATE_DEV_UNCONNECTED) {
--
2.20.1
1
15
Backport LTS 5.10.82 patches from upstream.
Adrian Hunter (2):
scsi: ufs: core: Fix task management completion
scsi: ufs: core: Fix task management completion timeout race
Alex Elder (1):
net: ipa: disable HOLB drop when updating timer
Alexander Antonov (2):
perf/x86/intel/uncore: Fix filter_tid mask for CHA events on Skylake
Server
perf/x86/intel/uncore: Fix IIO event constraints for Skylake Server
Alexander Mikhalitsyn (1):
ipc: WARN if trying to remove ipc object which is absent
Alistair Delva (1):
block: Check ADMIN before NICE for IOPRIO_CLASS_RT
Alvin Lee (1):
drm/amd/display: Update swizzle mode enums
Amit Kumar Mahapatra (1):
arm64: zynqmp: Do not duplicate flash partition label property
Anatolij Gustschin (1):
powerpc/5200: dts: fix memory node unit name
AngeloGioacchino Del Regno (1):
arm64: dts: qcom: msm8998: Fix CPU/L2 idle state latency and residency
Arjun Roy (3):
net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy.
net-zerocopy: Refactor skb frag fast-forward op.
tcp: Fix uninitialized access in skb frags array for Rx 0cp.
Baoquan He (1):
s390/kexec: fix memory leak of ipl report buffer
Bart Van Assche (1):
MIPS: sni: Fix the build
Bjorn Andersson (1):
pinctrl: qcom: sdm845: Enable dual edge errata
Bongsu Jeon (1):
net: nfc: nci: Change the NCI close sequence
Chao Yu (1):
f2fs: fix incorrect return value in f2fs_sanity_check_ckpt()
Chengfeng Ye (1):
ALSA: gus: fix null pointer dereference on pointer block
Christian Lamparter (1):
ARM: BCM53016: Specify switch ports for Meraki MR32
Christophe JAILLET (1):
platform/x86: hp_accel: Fix an error handling path in
'lis3lv02d_probe()'
Christophe Leroy (2):
powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST
powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWX
Colin Ian King (1):
MIPS: generic/yamon-dt: fix uninitialized variable error
David Heidelberg (1):
ARM: dts: qcom: fix memory and mdio nodes naming for RB3011
Dmitry Baryshkov (1):
clk: qcom: gcc-msm8996: Drop (again) gcc_aggre1_pnoc_ahb_clk
Eric Dumazet (1):
net: reduce indentation level in sk_clone_lock()
Eryk Rybak (3):
i40e: Fix correct max_pkt_size on VF RX queue
i40e: Fix changing previously set num_queue_pairs for PFs
i40e: Fix ping is lost after configuring ADq on VF
Ewan D. Milne (1):
scsi: qla2xxx: Fix mailbox direction flags in qla2xxx_get_adapter_id()
Fabio Aiuto (1):
staging: rtl8723bs: remove possible deadlock when disconnect (v2)
Gao Xiang (1):
f2fs: fix up f2fs_lookup tracepoints
Grzegorz Szczurek (2):
iavf: Fix for setting queues to 0
i40e: Fix display error code in dmesg
Guanghui Feng (1):
tty: tty_buffer: Fix the softlockup issue in flush_to_ldisc
Guo Zhi (1):
scsi: advansys: Fix kernel pointer leak
Hans Verkuil (1):
drm/nouveau: hdmigv100.c: fix corrupted HDMI Vendor InfoFrame
Hans de Goede (1):
ASoC: nau8824: Add DMI quirk mechanism for active-high jack-detect
Heiko Carstens (1):
s390/kexec: fix return code handling
Hyeong-Jun Kim (1):
f2fs: compress: disallow disabling compress on non-empty compressed
file
Ian Rogers (1):
perf bpf: Avoid memory leak from perf_env__insert_btf()
Imre Deak (1):
drm/i915/dp: Ensure sink rate values are always valid
Jacob Keller (1):
iavf: prevent accidental free of filter structure
James Clark (1):
perf tests: Remove bash construct from record+zstd_comp_decomp.sh
James Smart (1):
scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq()
Jan Kara (1):
udf: Fix crash after seekdir
Jedrzej Jagielski (1):
i40e: Fix creation of first queue by omitting it if is not power of
two
Jesse Brandeburg (1):
e100: fix device suspend/resume
Joel Stanley (1):
clk/ast2600: Fix soc revision for AHB
Johan Hovold (1):
drm/udl: fix control-message timeout
Jonathan Davies (1):
net: virtio_net_hdr_to_skb: count transport header in UFO
Josef Bacik (2):
fs: export an inode_update_time helper
btrfs: update device path inode time instead of bd_inode
Jérôme Pouiller (1):
staging: wfx: ensure IRQ is ready before enabling it
Karen Sornek (1):
i40e: Fix warning message and call stack during rmmod i40e driver
Keoseong Park (1):
f2fs: fix to use WHINT_MODE
Leon Romanovsky (2):
RDMA/netlink: Add __maybe_unused to static inline in C file
ice: Delete always true check of PF pointer
Li Yang (2):
ARM: dts: ls1021a: move thermal-zones node out of soc/
ARM: dts: ls1021a-tsn: use generic "jedec,spi-nor" compatible for
flash
Like Xu (1):
perf/x86/vlbr: Add c->flags to vlbr event constraints
Lin Ma (3):
NFC: reorganize the functions in nci_request
NFC: reorder the logic in nfc_{un,}register_device
NFC: add NCI_UNREG flag to eliminate the race
Linus Walleij (1):
ARM: dts: ux500: Skomer regulator fixes
Lu Wei (1):
maple: fix wrong return value of maple_bus_init().
Luis Chamberlain (1):
firmware_loader: fix pre-allocated buf built-in firmware use
Maher Sanalla (1):
net/mlx5: Lag, update tracker when state change event received
Masami Hiramatsu (1):
tracing/histogram: Do not copy the fixed-size char array field over
the field size
Mateusz Palczewski (1):
iavf: Fix return of set the new channel count
Matthew Hagan (1):
ARM: dts: NSP: Fix mpcore, mmc node names
Matthias Brugger (1):
arm64: dts: rockchip: Disable CDN DP on Pinebook Pro
Maxim Levitsky (1):
KVM: nVMX: don't use vcpu->arch.efer when checking host state on
nested state load
Maxime Ripard (3):
ARM: dts: sunxi: Fix OPPs node name
arm64: dts: allwinner: h5: Fix GPU thermal zone node name
arm64: dts: allwinner: a100: Fix thermal zone node name
Meng Li (1):
net: stmmac: socfpga: add runtime suspend/resume callback for
stratix10 platform
Michael Ellerman (2):
powerpc/dcr: Use cmplwi instead of 3-argument cmpli
KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()
Michael Walle (2):
arm64: dts: hisilicon: fix arm,sp805 compatible string
arm64: dts: freescale: fix arm,sp805 compatible string
Michal Maloszewski (1):
i40e: Fix NULL ptr dereference on VSI filter sync
Michal Simek (1):
arm64: zynqmp: Fix serial compatible string
Mike Christie (3):
scsi: target: Fix ordered tag handling
scsi: target: Fix alua_tg_pt_gps_count tracking
scsi: core: sysfs: Fix hang when device state is set via sysfs
Mitch Williams (1):
iavf: validate pointers
Nathan Chancellor (2):
hexagon: export raw I/O routines for modules
hexagon: clean up timer-regs.h
Nguyen Dinh Phi (1):
cfg80211: call cfg80211_stop_ap when switch from P2P_GO type
Nicholas Nunley (2):
iavf: check for null in iavf_fix_features
iavf: free q_vectors before queues in iavf_disable_vf
Nick Desaulniers (2):
sh: check return code of request_irq
arm64: vdso32: suppress error message for 'make mrproper'
Nicolas Dichtel (1):
tun: fix bonding active backup with arp monitoring
Nikolay Borisov (1):
btrfs: fix memory ordering between normal and ordered work functions
Ondrej Mosnacek (1):
selinux: fix NULL-pointer dereference when hashtab allocation fails
Paul Cercueil (1):
clk: ingenic: Fix bugs with divided dividers
Pavel Skripkin (2):
net: bnx2x: fix variable dereferenced before check
net: dpaa2-eth: fix use-after-free in dpaa2_eth_remove
Pierre-Louis Bossart (4):
ASoC: SOF: Intel: hda-dai: fix potential locking issue
ALSA: intel-dsp-config: add quirk for APL/GLK/TGL devices based on
ES8336 codec
ALSA: hda: hdac_ext_stream: fix potential locking issues
ALSA: hda: hdac_stream: fix potential locking issue in
snd_hdac_stream_assign()
Piotr Marczak (1):
iavf: Fix failure to exit out from last all-multicast mode
Punit Agrawal (1):
net: stmmac: dwmac-rk: Fix ethernet on rk3399 based devices
Raed Salem (1):
net/mlx5: E-Switch, return error if encap isn't supported
Randy Dunlap (8):
ALSA: ISA: not for M68K
sh: fix kconfig unmet dependency warning for FRAME_POINTER
sh: math-emu: drop unused functions
sh: define __BIG_ENDIAN for math-emu
mips: BCM63XX: ensure that CPU_SUPPORTS_32BIT_KERNEL is set
mips: bcm63xx: add support for clk_get_parent()
mips: lantiq: add support for clk_get_parent()
x86/Kconfig: Fix an unused variable error in dell-smm-hwmon
Roger Quadros (1):
ARM: dts: omap: fix gpmc,mux-add-data type
Roi Dayan (1):
net/mlx5: E-Switch, Change mode lock from mutex to rw semaphore
Rustam Kovhaev (1):
mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
Sasha Levin (1):
Revert "perf: Rework perf_event_exit_event()"
Sean Christopherson (1):
x86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup
fails
Selvin Xavier (1):
RDMA/bnxt_re: Check if the vlan is valid before reporting
Shawn Guo (1):
arm64: dts: qcom: ipq6018: Fix qcom,controlled-remotely property
Sohaib Mohamed (1):
perf bench futex: Fix memory leak of perf_cpu_map__new()
Sriharsha Basavapatna (1):
bnxt_en: reject indirect blk offload when hw-tc-offload is off
Stefan Riedmueller (1):
clk: imx: imx6ul: Move csi_sel mux to correct base register
Steven Rostedt (VMware) (1):
tracing: Add length protection to histogram string copies
Surabhi Boob (1):
iavf: Fix for the false positive ASQ/ARQ errors while issuing VF reset
Sven Peter (1):
usb: typec: tipd: Remove WARN_ON in tps6598x_block_read
Sven Schnelle (1):
parisc/sticon: fix reverse colors
Tadeusz Struk (1):
tipc: check for null after calling kmemdup
Takashi Iwai (1):
ASoC: DAPM: Cover regression by kctl change notification fix
Teng Qi (1):
iio: imu: st_lsm6dsx: Avoid potential array overflow in
st_lsm6dsx_set_odr()
Tetsuo Handa (1):
sock: fix /proc/net/sockstat underflow in sk_clone_lock()
Tony Lindgren (2):
bus: ti-sysc: Add quirk handling for reinit on context lost
bus: ti-sysc: Use context lost quirk for otg
Uwe Kleine-König (1):
usb: max-3421: Use driver data instead of maintaining a list of bound
devices
Valentine Fatiev (1):
net/mlx5e: nullify cq->dbg pointer in mlx5_debug_cq_remove()
Vincent Donnefort (1):
sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
Wen Gu (1):
net/smc: Make sure the link_id is unique
Xin Long (2):
tipc: only accept encrypted MSG_CRYPTO msgs
net: sched: act_mirred: drop dst for the direction from egress to
ingress
Yang Yingliang (2):
usb: musb: tusb6010: check return value after calling
platform_get_resource()
usb: host: ohci-tmio: check return value after calling
platform_get_resource()
hongao (1):
drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works on
vga and dvi connectors
arch/arm/boot/dts/bcm-nsp.dtsi | 4 +-
arch/arm/boot/dts/bcm53016-meraki-mr32.dts | 22 ++
arch/arm/boot/dts/ls1021a-tsn.dts | 2 +-
arch/arm/boot/dts/ls1021a.dtsi | 66 +++---
arch/arm/boot/dts/omap-gpmc-smsc9221.dtsi | 2 +-
.../boot/dts/omap3-overo-tobiduo-common.dtsi | 2 +-
arch/arm/boot/dts/qcom-ipq8064-rb3011.dts | 6 +-
.../arm/boot/dts/ste-ux500-samsung-skomer.dts | 8 +-
arch/arm/boot/dts/sun8i-a33.dtsi | 4 +-
arch/arm/boot/dts/sun8i-a83t.dtsi | 4 +-
arch/arm/boot/dts/sun8i-h3.dtsi | 4 +-
.../arm64/boot/dts/allwinner/sun50i-a100.dtsi | 6 +-
.../dts/allwinner/sun50i-a64-cpu-opp.dtsi | 2 +-
.../boot/dts/allwinner/sun50i-h5-cpu-opp.dtsi | 2 +-
arch/arm64/boot/dts/allwinner/sun50i-h5.dtsi | 2 +-
.../boot/dts/allwinner/sun50i-h6-cpu-opp.dtsi | 2 +-
.../arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 16 +-
.../arm64/boot/dts/freescale/fsl-ls208xa.dtsi | 16 +-
arch/arm64/boot/dts/hisilicon/hi3660.dtsi | 4 +-
arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 2 +-
arch/arm64/boot/dts/qcom/ipq6018.dtsi | 2 +-
arch/arm64/boot/dts/qcom/msm8998.dtsi | 20 +-
.../boot/dts/rockchip/rk3399-pinebook-pro.dts | 4 -
.../dts/xilinx/zynqmp-zc1751-xm016-dc2.dts | 4 +-
arch/arm64/boot/dts/xilinx/zynqmp.dtsi | 4 +-
arch/arm64/kernel/vdso32/Makefile | 3 +-
arch/hexagon/include/asm/timer-regs.h | 26 ---
arch/hexagon/include/asm/timex.h | 3 +-
arch/hexagon/kernel/time.c | 12 +-
arch/hexagon/lib/io.c | 4 +
arch/mips/Kconfig | 3 +
arch/mips/bcm63xx/clk.c | 6 +
arch/mips/generic/yamon-dt.c | 2 +-
arch/mips/lantiq/clk.c | 6 +
arch/mips/sni/time.c | 4 +-
arch/powerpc/boot/dts/charon.dts | 2 +-
arch/powerpc/boot/dts/digsy_mtc.dts | 2 +-
arch/powerpc/boot/dts/lite5200.dts | 2 +-
arch/powerpc/boot/dts/lite5200b.dts | 2 +-
arch/powerpc/boot/dts/media5200.dts | 2 +-
arch/powerpc/boot/dts/mpc5200b.dtsi | 2 +-
arch/powerpc/boot/dts/o2d.dts | 2 +-
arch/powerpc/boot/dts/o2d.dtsi | 2 +-
arch/powerpc/boot/dts/o2dnt2.dts | 2 +-
arch/powerpc/boot/dts/o3dnt.dts | 2 +-
arch/powerpc/boot/dts/pcm032.dts | 2 +-
arch/powerpc/boot/dts/tqm5200.dts | 2 +-
arch/powerpc/kernel/head_8xx.S | 13 +-
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 +-
arch/powerpc/sysdev/dcr-low.S | 2 +-
arch/s390/include/asm/kexec.h | 6 +
arch/s390/kernel/ipl.c | 3 +-
arch/s390/kernel/machine_kexec_file.c | 18 +-
arch/sh/Kconfig.debug | 1 +
arch/sh/include/asm/sfp-machine.h | 8 +
arch/sh/kernel/cpu/sh4a/smp-shx3.c | 5 +-
arch/sh/math-emu/math.c | 103 ----------
arch/x86/Kconfig | 3 +-
arch/x86/events/intel/core.c | 4 +-
arch/x86/events/intel/uncore_snbep.c | 4 +
arch/x86/hyperv/hv_init.c | 3 +
arch/x86/kvm/vmx/nested.c | 22 +-
block/ioprio.c | 9 +-
drivers/base/firmware_loader/main.c | 13 +-
drivers/bus/ti-sysc.c | 110 +++++++++-
drivers/clk/clk-ast2600.c | 12 +-
drivers/clk/imx/clk-imx6ul.c | 2 +-
drivers/clk/ingenic/cgu.c | 6 +-
drivers/clk/qcom/gcc-msm8996.c | 15 --
.../gpu/drm/amd/amdgpu/amdgpu_connectors.c | 1 +
.../drm/amd/display/dc/dcn20/dcn20_resource.c | 4 +-
.../amd/display/dc/dml/display_mode_enums.h | 4 +-
drivers/gpu/drm/i915/display/intel_dp.c | 11 +
.../drm/nouveau/nvkm/engine/disp/hdmigv100.c | 1 -
drivers/gpu/drm/udl/udl_connector.c | 2 +-
drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c | 6 +-
drivers/infiniband/hw/bnxt_re/ib_verbs.c | 12 +-
.../ethernet/broadcom/bnx2x/bnx2x_init_ops.h | 4 +-
drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 2 +-
.../net/ethernet/freescale/dpaa2/dpaa2-eth.c | 4 +-
drivers/net/ethernet/intel/e100.c | 18 +-
drivers/net/ethernet/intel/i40e/i40e.h | 2 +
drivers/net/ethernet/intel/i40e/i40e_main.c | 160 +++++++++------
.../ethernet/intel/i40e/i40e_virtchnl_pf.c | 121 ++++-------
.../net/ethernet/intel/iavf/iavf_ethtool.c | 30 ++-
drivers/net/ethernet/intel/iavf/iavf_main.c | 14 +-
drivers/net/ethernet/intel/ice/ice_main.c | 3 -
drivers/net/ethernet/mellanox/mlx5/core/cq.c | 5 +-
.../net/ethernet/mellanox/mlx5/core/debugfs.c | 4 +-
.../net/ethernet/mellanox/mlx5/core/eswitch.c | 11 +-
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 2 +-
.../mellanox/mlx5/core/eswitch_offloads.c | 28 +--
drivers/net/ethernet/mellanox/mlx5/core/lag.c | 28 ++-
.../net/ethernet/stmicro/stmmac/dwmac-rk.c | 5 +
.../ethernet/stmicro/stmmac/dwmac-socfpga.c | 24 ++-
drivers/net/ipa/ipa_endpoint.c | 2 +
drivers/net/tun.c | 5 +
drivers/pinctrl/qcom/pinctrl-sdm845.c | 1 +
drivers/platform/x86/hp_accel.c | 2 +
drivers/scsi/advansys.c | 4 +-
drivers/scsi/lpfc/lpfc_sli.c | 1 +
drivers/scsi/qla2xxx/qla_mbx.c | 6 +-
drivers/scsi/scsi_sysfs.c | 30 ++-
drivers/scsi/ufs/ufshcd.c | 57 +++---
drivers/scsi/ufs/ufshcd.h | 1 +
drivers/sh/maple/maple.c | 5 +-
drivers/staging/rtl8723bs/core/rtw_mlme_ext.c | 7 +-
drivers/staging/rtl8723bs/core/rtw_recv.c | 10 +-
drivers/staging/rtl8723bs/core/rtw_sta_mgt.c | 22 +-
drivers/staging/rtl8723bs/core/rtw_xmit.c | 16 +-
.../staging/rtl8723bs/hal/rtl8723bs_xmit.c | 2 -
drivers/staging/wfx/bus_sdio.c | 17 +-
drivers/target/target_core_alua.c | 1 -
drivers/target/target_core_device.c | 2 +
drivers/target/target_core_internal.h | 1 +
drivers/target/target_core_transport.c | 76 +++++--
drivers/tty/tty_buffer.c | 3 +
drivers/usb/host/max3421-hcd.c | 25 +--
drivers/usb/host/ohci-tmio.c | 2 +-
drivers/usb/musb/tusb6010.c | 5 +
drivers/usb/typec/tps6598x.c | 2 +-
drivers/video/console/sticon.c | 12 +-
fs/btrfs/async-thread.c | 14 ++
fs/btrfs/volumes.c | 21 +-
fs/f2fs/f2fs.h | 3 +-
fs/f2fs/super.c | 4 +-
fs/inode.c | 7 +-
fs/udf/dir.c | 32 ++-
fs/udf/namei.c | 3 +
fs/udf/super.c | 2 +
include/linux/fs.h | 2 +
include/linux/perf_event.h | 1 -
include/linux/platform_data/ti-sysc.h | 1 +
include/linux/trace_events.h | 2 +-
include/linux/virtio_net.h | 7 +-
include/net/nfc/nci_core.h | 1 +
include/rdma/rdma_netlink.h | 2 +-
include/sound/hdaudio_ext.h | 2 +
include/target/target_core_base.h | 6 +-
include/trace/events/f2fs.h | 12 +-
include/uapi/linux/tcp.h | 2 +
ipc/util.c | 6 +-
kernel/events/core.c | 142 ++++++-------
kernel/sched/core.c | 3 +
kernel/trace/trace_events_hist.c | 14 +-
mm/slab.h | 2 +-
net/core/sock.c | 189 +++++++++---------
net/ipv4/tcp.c | 122 ++++++++---
net/nfc/core.c | 32 +--
net/nfc/nci/core.c | 34 +++-
net/sched/act_mirred.c | 11 +-
net/smc/smc_core.c | 3 +-
net/tipc/crypto.c | 4 +
net/tipc/link.c | 7 +-
net/wireless/util.c | 1 +
security/selinux/ss/hashtab.c | 17 +-
sound/core/Makefile | 2 +
sound/hda/ext/hdac_ext_stream.c | 46 +++--
sound/hda/hdac_stream.c | 4 +-
sound/hda/intel-dsp-config.c | 22 +-
sound/isa/Kconfig | 2 +-
sound/isa/gus/gus_dma.c | 2 +
sound/pci/Kconfig | 1 +
sound/soc/codecs/nau8824.c | 40 ++++
sound/soc/soc-dapm.c | 29 ++-
sound/soc/sof/intel/hda-dai.c | 7 +-
tools/perf/bench/futex-lock-pi.c | 1 +
tools/perf/bench/futex-requeue.c | 1 +
tools/perf/bench/futex-wake-parallel.c | 1 +
tools/perf/bench/futex-wake.c | 1 +
.../tests/shell/record+zstd_comp_decomp.sh | 2 +-
tools/perf/util/bpf-event.c | 6 +-
tools/perf/util/env.c | 5 +-
tools/perf/util/env.h | 2 +-
174 files changed, 1413 insertions(+), 964 deletions(-)
delete mode 100644 arch/hexagon/include/asm/timer-regs.h
--
2.20.1
1
146
[PATCH openEuler-5.10 01/16] drm/i915/guc: Update to use firmware v49.0.1
by Zheng Zengkai 14 Jan '22
by Zheng Zengkai 14 Jan '22
14 Jan '22
From: John Harrison <John.C.Harrison(a)Intel.com>
mainline inclusion
from mainline-5.11-rc1
commit c784e5249e773689e38d2bc1749f08b986621a26
bugzilla: 185850 https://gitee.com/openeuler/kernel/issues/I4DDEL
CVE: CVE-2020-12363, CVE-2020-12364
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
The latest GuC firmware includes a number of interface changes that
require driver updates to match.
* Starting from Gen11, the ID to be provided to GuC needs to contain
the engine class in bits [0..2] and the instance in bits [3..6].
NOTE: this patch breaks pointer dereferences in some existing GuC
functions that use the guc_id to dereference arrays but these functions
are not used for now as we have GuC submission disabled and we will
update these functions in follow up patch which requires new IDs.
* The new GuC requires the additional data structure (ADS) and associated
'private_data' pointer to be setup. This is basically a scratch area
of memory that the GuC owns. The size is read from the CSS header.
* There is now a physical to logical engine mapping table in the ADS
which needs to be configured in order for the firmware to load. For
now, the table is initialised with a 1 to 1 mapping.
* GUC_CTL_CTXINFO has been removed from the initialization params.
* reg_state_buffer is maintained internally by the GuC as part of
the private data.
* The ADS layout has changed significantly. This patch updates the
shared structure and also adds better documentation of the layout.
* While i915 does not use GuC doorbells, the firmware now requires
that some initialisation is done.
* The number of engine classes and instances supported in the ADS has
been increased.
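As an illustration of the new ID layout (a sketch built on the macros this patch introduces; in i915, VIDEO_DECODE_CLASS is 1, so class 1 / instance 2 packs to 0x11):
u32 guc_id = MAKE_GUC_ID(VIDEO_DECODE_CLASS, 2);
/* guc_id == (1 << GUC_ENGINE_CLASS_SHIFT) | (2 << GUC_ENGINE_INSTANCE_SHIFT) == 0x11 */
GEM_BUG_ON(GUC_ID_TO_ENGINE_CLASS(guc_id) != VIDEO_DECODE_CLASS);
GEM_BUG_ON(GUC_ID_TO_ENGINE_INSTANCE(guc_id) != 2);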
Signed-off-by: John Harrison <John.C.Harrison(a)Intel.com>
Signed-off-by: Matthew Brost <matthew.brost(a)intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio(a)intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo(a)intel.com>
Signed-off-by: Michel Thierry <michel.thierry(a)intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko(a)intel.com>
Cc: Michal Winiarski <michal.winiarski(a)intel.com>
Cc: Tomasz Lis <tomasz.lis(a)intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio(a)intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201028145826.2949180-2-John…
Signed-off-by: Chen Jun <chenjun102(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Chen Jun <chenjun102(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 +-
drivers/gpu/drm/i915/gt/uc/intel_guc.c | 18 ---
drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 131 +++++++++++++++----
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 80 +++++------
drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h | 5 +
drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 27 ++--
drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h | 2 +
drivers/gpu/drm/i915/gt/uc/intel_uc_fw_abi.h | 6 +-
8 files changed, 176 insertions(+), 96 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index a19537706ed1..c940ac3aae2f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -305,8 +305,9 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
engine->i915 = i915;
engine->gt = gt;
engine->uncore = gt->uncore;
- engine->hw_id = engine->guc_id = info->hw_id;
engine->mmio_base = __engine_mmio_base(i915, info->mmio_bases);
+ engine->hw_id = info->hw_id;
+ engine->guc_id = MAKE_GUC_ID(info->class, info->instance);
engine->class = info->class;
engine->instance = info->instance;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 942c7c187adb..6909da1e1a73 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -213,23 +213,6 @@ static u32 guc_ctl_feature_flags(struct intel_guc *guc)
return flags;
}
-static u32 guc_ctl_ctxinfo_flags(struct intel_guc *guc)
-{
- u32 flags = 0;
-
- if (intel_guc_submission_is_used(guc)) {
- u32 ctxnum, base;
-
- base = intel_guc_ggtt_offset(guc, guc->stage_desc_pool);
- ctxnum = GUC_MAX_STAGE_DESCRIPTORS / 16;
-
- base >>= PAGE_SHIFT;
- flags |= (base << GUC_CTL_BASE_ADDR_SHIFT) |
- (ctxnum << GUC_CTL_CTXNUM_IN16_SHIFT);
- }
- return flags;
-}
-
static u32 guc_ctl_log_params_flags(struct intel_guc *guc)
{
u32 offset = intel_guc_ggtt_offset(guc, guc->log.vma) >> PAGE_SHIFT;
@@ -291,7 +274,6 @@ static void guc_init_params(struct intel_guc *guc)
BUILD_BUG_ON(sizeof(guc->params) != GUC_CTL_MAX_DWORDS * sizeof(u32));
- params[GUC_CTL_CTXINFO] = guc_ctl_ctxinfo_flags(guc);
params[GUC_CTL_LOG_PARAMS] = guc_ctl_log_params_flags(guc);
params[GUC_CTL_FEATURE] = guc_ctl_feature_flags(guc);
params[GUC_CTL_DEBUG] = guc_ctl_debug_flags(guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index d44061033f23..7950d28beb8c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -10,11 +10,52 @@
/*
* The Additional Data Struct (ADS) has pointers for different buffers used by
- * the GuC. One single gem object contains the ADS struct itself (guc_ads), the
- * scheduling policies (guc_policies), a structure describing a collection of
- * register sets (guc_mmio_reg_state) and some extra pages for the GuC to save
- * its internal state for sleep.
+ * the GuC. One single gem object contains the ADS struct itself (guc_ads) and
+ * all the extra buffers indirectly linked via the ADS struct's entries.
+ *
+ * Layout of the ADS blob allocated for the GuC:
+ *
+ * +---------------------------------------+ <== base
+ * | guc_ads |
+ * +---------------------------------------+
+ * | guc_policies |
+ * +---------------------------------------+
+ * | guc_gt_system_info |
+ * +---------------------------------------+
+ * | guc_clients_info |
+ * +---------------------------------------+
+ * | guc_ct_pool_entry[size] |
+ * +---------------------------------------+
+ * | padding |
+ * +---------------------------------------+ <== 4K aligned
+ * | private data |
+ * +---------------------------------------+
+ * | padding |
+ * +---------------------------------------+ <== 4K aligned
*/
+struct __guc_ads_blob {
+ struct guc_ads ads;
+ struct guc_policies policies;
+ struct guc_gt_system_info system_info;
+ struct guc_clients_info clients_info;
+ struct guc_ct_pool_entry ct_pool[GUC_CT_POOL_SIZE];
+} __packed;
+
+static u32 guc_ads_private_data_size(struct intel_guc *guc)
+{
+ return PAGE_ALIGN(guc->fw.private_data_size);
+}
+
+static u32 guc_ads_private_data_offset(struct intel_guc *guc)
+{
+ return PAGE_ALIGN(sizeof(struct __guc_ads_blob));
+}
+
+static u32 guc_ads_blob_size(struct intel_guc *guc)
+{
+ return guc_ads_private_data_offset(guc) +
+ guc_ads_private_data_size(guc);
+}
static void guc_policy_init(struct guc_policy *policy)
{
@@ -48,26 +89,37 @@ static void guc_ct_pool_entries_init(struct guc_ct_pool_entry *pool, u32 num)
memset(pool, 0, num * sizeof(*pool));
}
+static void guc_mapping_table_init(struct intel_gt *gt,
+ struct guc_gt_system_info *system_info)
+{
+ unsigned int i, j;
+ struct intel_engine_cs *engine;
+ enum intel_engine_id id;
+
+ /* Table must be set to invalid values for entries not used */
+ for (i = 0; i < GUC_MAX_ENGINE_CLASSES; ++i)
+ for (j = 0; j < GUC_MAX_INSTANCES_PER_CLASS; ++j)
+ system_info->mapping_table[i][j] =
+ GUC_MAX_INSTANCES_PER_CLASS;
+
+ for_each_engine(engine, gt, id) {
+ u8 guc_class = engine->class;
+
+ system_info->mapping_table[guc_class][engine->instance] =
+ engine->instance;
+ }
+}
+
/*
* The first 80 dwords of the register state context, containing the
* execlists and ppgtt registers.
*/
#define LR_HW_CONTEXT_SIZE (80 * sizeof(u32))
-/* The ads obj includes the struct itself and buffers passed to GuC */
-struct __guc_ads_blob {
- struct guc_ads ads;
- struct guc_policies policies;
- struct guc_mmio_reg_state reg_state;
- struct guc_gt_system_info system_info;
- struct guc_clients_info clients_info;
- struct guc_ct_pool_entry ct_pool[GUC_CT_POOL_SIZE];
- u8 reg_state_buffer[GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE];
-} __packed;
-
static void __guc_ads_init(struct intel_guc *guc)
{
struct intel_gt *gt = guc_to_gt(guc);
+ struct drm_i915_private *i915 = gt->i915;
struct __guc_ads_blob *blob = guc->ads_blob;
const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
u32 base;
@@ -99,13 +151,25 @@ static void __guc_ads_init(struct intel_guc *guc)
}
/* System info */
- blob->system_info.slice_enabled = hweight8(gt->info.sseu.slice_mask);
- blob->system_info.rcs_enabled = 1;
- blob->system_info.bcs_enabled = 1;
+ blob->system_info.engine_enabled_masks[RENDER_CLASS] = 1;
+ blob->system_info.engine_enabled_masks[COPY_ENGINE_CLASS] = 1;
+ blob->system_info.engine_enabled_masks[VIDEO_DECODE_CLASS] = VDBOX_MASK(gt);
+ blob->system_info.engine_enabled_masks[VIDEO_ENHANCEMENT_CLASS] = VEBOX_MASK(gt);
+
+ blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED] =
+ hweight8(gt->info.sseu.slice_mask);
+ blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_VDBOX_SFC_SUPPORT_MASK] =
+ gt->info.vdbox_sfc_access;
+
+ if (INTEL_GEN(i915) >= 12 && !IS_DGFX(i915)) {
+ u32 distdbreg = intel_uncore_read(gt->uncore,
+ GEN12_DIST_DBS_POPULATED);
+ blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_DOORBELL_COUNT_PER_SQIDI] =
+ ((distdbreg >> GEN12_DOORBELLS_PER_SQIDI_SHIFT) &
+ GEN12_DOORBELLS_PER_SQIDI) + 1;
+ }
- blob->system_info.vdbox_enable_mask = VDBOX_MASK(gt);
- blob->system_info.vebox_enable_mask = VEBOX_MASK(gt);
- blob->system_info.vdbox_sfc_support_mask = gt->info.vdbox_sfc_access;
+ guc_mapping_table_init(guc_to_gt(guc), &blob->system_info);
base = intel_guc_ggtt_offset(guc, guc->ads_vma);
@@ -118,11 +182,12 @@ static void __guc_ads_init(struct intel_guc *guc)
/* ADS */
blob->ads.scheduler_policies = base + ptr_offset(blob, policies);
- blob->ads.reg_state_buffer = base + ptr_offset(blob, reg_state_buffer);
- blob->ads.reg_state_addr = base + ptr_offset(blob, reg_state);
blob->ads.gt_system_info = base + ptr_offset(blob, system_info);
blob->ads.clients_info = base + ptr_offset(blob, clients_info);
+ /* Private Data */
+ blob->ads.private_data = base + guc_ads_private_data_offset(guc);
+
i915_gem_object_flush_map(guc->ads_vma->obj);
}
@@ -135,14 +200,15 @@ static void __guc_ads_init(struct intel_guc *guc)
*/
int intel_guc_ads_create(struct intel_guc *guc)
{
- const u32 size = PAGE_ALIGN(sizeof(struct __guc_ads_blob));
+ u32 size;
int ret;
GEM_BUG_ON(guc->ads_vma);
+ size = guc_ads_blob_size(guc);
+
ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
(void **)&guc->ads_blob);
-
if (ret)
return ret;
@@ -156,6 +222,18 @@ void intel_guc_ads_destroy(struct intel_guc *guc)
i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP);
}
+static void guc_ads_private_data_reset(struct intel_guc *guc)
+{
+ u32 size;
+
+ size = guc_ads_private_data_size(guc);
+ if (!size)
+ return;
+
+ memset((void *)guc->ads_blob + guc_ads_private_data_offset(guc), 0,
+ size);
+}
+
/**
* intel_guc_ads_reset() - prepares GuC Additional Data Struct for reuse
* @guc: intel_guc struct
@@ -168,5 +246,8 @@ void intel_guc_ads_reset(struct intel_guc *guc)
{
if (!guc->ads_vma)
return;
+
__guc_ads_init(guc);
+
+ guc_ads_private_data_reset(guc);
}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index a6b733c146c9..79c560d9c0b6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -26,8 +26,8 @@
#define GUC_VIDEO_ENGINE2 4
#define GUC_MAX_ENGINES_NUM (GUC_VIDEO_ENGINE2 + 1)
-#define GUC_MAX_ENGINE_CLASSES 5
-#define GUC_MAX_INSTANCES_PER_CLASS 16
+#define GUC_MAX_ENGINE_CLASSES 16
+#define GUC_MAX_INSTANCES_PER_CLASS 32
#define GUC_DOORBELL_INVALID 256
@@ -62,12 +62,7 @@
#define GUC_STAGE_DESC_ATTR_PCH BIT(6)
#define GUC_STAGE_DESC_ATTR_TERMINATED BIT(7)
-/* New GuC control data */
-#define GUC_CTL_CTXINFO 0
-#define GUC_CTL_CTXNUM_IN16_SHIFT 0
-#define GUC_CTL_BASE_ADDR_SHIFT 12
-
-#define GUC_CTL_LOG_PARAMS 1
+#define GUC_CTL_LOG_PARAMS 0
#define GUC_LOG_VALID (1 << 0)
#define GUC_LOG_NOTIFY_ON_HALF_FULL (1 << 1)
#define GUC_LOG_ALLOC_IN_MEGABYTE (1 << 3)
@@ -79,11 +74,11 @@
#define GUC_LOG_ISR_MASK (0x7 << GUC_LOG_ISR_SHIFT)
#define GUC_LOG_BUF_ADDR_SHIFT 12
-#define GUC_CTL_WA 2
-#define GUC_CTL_FEATURE 3
+#define GUC_CTL_WA 1
+#define GUC_CTL_FEATURE 2
#define GUC_CTL_DISABLE_SCHEDULER (1 << 14)
-#define GUC_CTL_DEBUG 4
+#define GUC_CTL_DEBUG 3
#define GUC_LOG_VERBOSITY_SHIFT 0
#define GUC_LOG_VERBOSITY_LOW (0 << GUC_LOG_VERBOSITY_SHIFT)
#define GUC_LOG_VERBOSITY_MED (1 << GUC_LOG_VERBOSITY_SHIFT)
@@ -97,12 +92,37 @@
#define GUC_LOG_DISABLED (1 << 6)
#define GUC_PROFILE_ENABLED (1 << 7)
-#define GUC_CTL_ADS 5
+#define GUC_CTL_ADS 4
#define GUC_ADS_ADDR_SHIFT 1
#define GUC_ADS_ADDR_MASK (0xFFFFF << GUC_ADS_ADDR_SHIFT)
#define GUC_CTL_MAX_DWORDS (SOFT_SCRATCH_COUNT - 2) /* [1..14] */
+/* Generic GT SysInfo data types */
+#define GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED 0
+#define GUC_GENERIC_GT_SYSINFO_VDBOX_SFC_SUPPORT_MASK 1
+#define GUC_GENERIC_GT_SYSINFO_DOORBELL_COUNT_PER_SQIDI 2
+#define GUC_GENERIC_GT_SYSINFO_MAX 16
+
+/*
+ * The class goes in bits [0..2] of the GuC ID, the instance in bits [3..6].
+ * Bit 7 can be used for operations that apply to all engine classes&instances.
+ */
+#define GUC_ENGINE_CLASS_SHIFT 0
+#define GUC_ENGINE_CLASS_MASK (0x7 << GUC_ENGINE_CLASS_SHIFT)
+#define GUC_ENGINE_INSTANCE_SHIFT 3
+#define GUC_ENGINE_INSTANCE_MASK (0xf << GUC_ENGINE_INSTANCE_SHIFT)
+#define GUC_ENGINE_ALL_INSTANCES BIT(7)
+
+#define MAKE_GUC_ID(class, instance) \
+ (((class) << GUC_ENGINE_CLASS_SHIFT) | \
+ ((instance) << GUC_ENGINE_INSTANCE_SHIFT))
+
+#define GUC_ID_TO_ENGINE_CLASS(guc_id) \
+ (((guc_id) & GUC_ENGINE_CLASS_MASK) >> GUC_ENGINE_CLASS_SHIFT)
+#define GUC_ID_TO_ENGINE_INSTANCE(guc_id) \
+ (((guc_id) & GUC_ENGINE_INSTANCE_MASK) >> GUC_ENGINE_INSTANCE_SHIFT)
+
/* Work item for submitting workloads into work queue of GuC. */
struct guc_wq_item {
u32 header;
@@ -336,11 +356,6 @@ struct guc_policies {
} __packed;
/* GuC MMIO reg state struct */
-
-
-#define GUC_REGSET_MAX_REGISTERS 64
-#define GUC_S3_SAVE_SPACE_PAGES 10
-
struct guc_mmio_reg {
u32 offset;
u32 value;
@@ -348,28 +363,18 @@ struct guc_mmio_reg {
#define GUC_REGSET_MASKED (1 << 0)
} __packed;
-struct guc_mmio_regset {
- struct guc_mmio_reg registers[GUC_REGSET_MAX_REGISTERS];
- u32 values_valid;
- u32 number_of_registers;
-} __packed;
-
/* GuC register sets */
-struct guc_mmio_reg_state {
- struct guc_mmio_regset engine_reg[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS];
- u32 reserved[98];
+struct guc_mmio_reg_set {
+ u32 address;
+ u16 count;
+ u16 reserved;
} __packed;
/* HW info */
struct guc_gt_system_info {
- u32 slice_enabled;
- u32 rcs_enabled;
- u32 reserved0;
- u32 bcs_enabled;
- u32 vdbox_enable_mask;
- u32 vdbox_sfc_support_mask;
- u32 vebox_enable_mask;
- u32 reserved[9];
+ u8 mapping_table[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS];
+ u32 engine_enabled_masks[GUC_MAX_ENGINE_CLASSES];
+ u32 generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_MAX];
} __packed;
/* Clients info */
@@ -390,15 +395,16 @@ struct guc_clients_info {
/* GuC Additional Data Struct */
struct guc_ads {
- u32 reg_state_addr;
- u32 reg_state_buffer;
+ struct guc_mmio_reg_set reg_state_list[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS];
+ u32 reserved0;
u32 scheduler_policies;
u32 gt_system_info;
u32 clients_info;
u32 control_data;
u32 golden_context_lrca[GUC_MAX_ENGINE_CLASSES];
u32 eng_state_size[GUC_MAX_ENGINE_CLASSES];
- u32 reserved[16];
+ u32 private_data;
+ u32 reserved[15];
} __packed;
/* GuC logging structures */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h
index 1949346e714e..b37fc2ffaef2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h
@@ -118,6 +118,11 @@ struct guc_doorbell_info {
#define GEN8_DRB_VALID (1<<0)
#define GEN8_DRBREGU(x) _MMIO(0x1000 + (x) * 8 + 4)
+#define GEN12_DIST_DBS_POPULATED _MMIO(0xd08)
+#define GEN12_DOORBELLS_PER_SQIDI_SHIFT 16
+#define GEN12_DOORBELLS_PER_SQIDI (0xff)
+#define GEN12_SQIDIS_DOORBELL_EXIST (0xffff)
+
#define DE_GUCRMR _MMIO(0x44054)
#define GUC_BCS_RCS_IER _MMIO(0xC550)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
index 80e8b6c3bc8c..ee4ac3922277 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
@@ -44,23 +44,19 @@ void intel_uc_fw_change_status(struct intel_uc_fw *uc_fw,
* List of required GuC and HuC binaries per-platform.
* Must be ordered based on platform + revid, from newer to older.
*
- * TGL 35.2 is interface-compatible with 33.0 for previous Gens. The deltas
- * between 33.0 and 35.2 are only related to new additions to support new Gen12
- * features.
- *
* Note that RKL uses the same firmware as TGL.
*/
#define INTEL_UC_FIRMWARE_DEFS(fw_def, guc_def, huc_def) \
- fw_def(ROCKETLAKE, 0, guc_def(tgl, 35, 2, 0), huc_def(tgl, 7, 5, 0)) \
- fw_def(TIGERLAKE, 0, guc_def(tgl, 35, 2, 0), huc_def(tgl, 7, 5, 0)) \
- fw_def(ELKHARTLAKE, 0, guc_def(ehl, 33, 0, 4), huc_def(ehl, 9, 0, 0)) \
- fw_def(ICELAKE, 0, guc_def(icl, 33, 0, 0), huc_def(icl, 9, 0, 0)) \
- fw_def(COMETLAKE, 5, guc_def(cml, 33, 0, 0), huc_def(cml, 4, 0, 0)) \
- fw_def(COFFEELAKE, 0, guc_def(kbl, 33, 0, 0), huc_def(kbl, 4, 0, 0)) \
- fw_def(GEMINILAKE, 0, guc_def(glk, 33, 0, 0), huc_def(glk, 4, 0, 0)) \
- fw_def(KABYLAKE, 0, guc_def(kbl, 33, 0, 0), huc_def(kbl, 4, 0, 0)) \
- fw_def(BROXTON, 0, guc_def(bxt, 33, 0, 0), huc_def(bxt, 2, 0, 0)) \
- fw_def(SKYLAKE, 0, guc_def(skl, 33, 0, 0), huc_def(skl, 2, 0, 0))
+ fw_def(ROCKETLAKE, 0, guc_def(tgl, 49, 0, 1), huc_def(tgl, 7, 5, 0)) \
+ fw_def(TIGERLAKE, 0, guc_def(tgl, 49, 0, 1), huc_def(tgl, 7, 5, 0)) \
+ fw_def(ELKHARTLAKE, 0, guc_def(ehl, 49, 0, 1), huc_def(ehl, 9, 0, 0)) \
+ fw_def(ICELAKE, 0, guc_def(icl, 49, 0, 1), huc_def(icl, 9, 0, 0)) \
+ fw_def(COMETLAKE, 5, guc_def(cml, 49, 0, 1), huc_def(cml, 4, 0, 0)) \
+ fw_def(COFFEELAKE, 0, guc_def(kbl, 49, 0, 1), huc_def(kbl, 4, 0, 0)) \
+ fw_def(GEMINILAKE, 0, guc_def(glk, 49, 0, 1), huc_def(glk, 4, 0, 0)) \
+ fw_def(KABYLAKE, 0, guc_def(kbl, 49, 0, 1), huc_def(kbl, 4, 0, 0)) \
+ fw_def(BROXTON, 0, guc_def(bxt, 49, 0, 1), huc_def(bxt, 2, 0, 0)) \
+ fw_def(SKYLAKE, 0, guc_def(skl, 49, 0, 1), huc_def(skl, 2, 0, 0))
#define __MAKE_UC_FW_PATH(prefix_, name_, major_, minor_, patch_) \
"i915/" \
@@ -371,6 +367,9 @@ int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw)
}
}
+ if (uc_fw->type == INTEL_UC_FW_TYPE_GUC)
+ uc_fw->private_data_size = css->private_data_size;
+
obj = i915_gem_object_create_shmem_from_data(i915, fw->data, fw->size);
if (IS_ERR(obj)) {
err = PTR_ERR(obj);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
index 23d3a423ac0f..99bb1fe1af66 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
@@ -88,6 +88,8 @@ struct intel_uc_fw {
u32 rsa_size;
u32 ucode_size;
+
+ u32 private_data_size;
};
#ifdef CONFIG_DRM_I915_DEBUG_GUC
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw_abi.h b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw_abi.h
index 029214cdedd5..e41ffc7a7fbc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw_abi.h
@@ -69,7 +69,11 @@ struct uc_css_header {
#define CSS_SW_VERSION_UC_MAJOR (0xFF << 16)
#define CSS_SW_VERSION_UC_MINOR (0xFF << 8)
#define CSS_SW_VERSION_UC_PATCH (0xFF << 0)
- u32 reserved[14];
+ u32 reserved0[13];
+ union {
+ u32 private_data_size; /* only applies to GuC */
+ u32 reserved1;
+ };
u32 header_info;
} __packed;
static_assert(sizeof(struct uc_css_header) == 128);
--
2.20.1
1
15
[PATCH OLK-5.10] x86/tsc: Make cur->adjusted values in package#1 the same
by LeoLiuoc 14 Jan '22
by LeoLiuoc 14 Jan '22
14 Jan '22
When resuming from S4 on a Zhaoxin 2-package platform that supports
X86_FEATURE_TSC_ADJUST, the following warning messages appear:
[ 327.445302] [Firmware Bug]: TSC ADJUST differs: CPU15 45960750 -->
78394770. Restoring
[ 329.209120] [Firmware Bug]: TSC ADJUST differs: CPU14 45960750 -->
78394770. Restoring
[ 329.209128] [Firmware Bug]: TSC ADJUST differs: CPU13 45960750 -->
78394770. Restoring
[ 329.209138] [Firmware Bug]: TSC ADJUST differs: CPU12 45960750 -->
78394770. Restoring
[ 329.209151] [Firmware Bug]: TSC ADJUST differs: CPU11 45960750 -->
78394770. Restoring
[ 329.209160] [Firmware Bug]: TSC ADJUST differs: CPU10 45960750 -->
78394770. Restoring
[ 329.209169] [Firmware Bug]: TSC ADJUST differs: CPU9 45960750 -->
78394770. Restoring
The reason is:
Step 1: Bring up.
TSC is synchronized after bring-up with the following settings:
                          MSR 0x3b   cur->adjusted
Package#0 CPU 0-7         0          0
Package#1 first CPU       value1     value1
Package#1 non-first CPU   value1     value1
Step 2: Suspend to S4.
The settings from Step 1 are not changed in this step.
Step 3: Bring up caused by an S4 wake-up event.
TSC is synchronized during bring-up with the following settings:
                          MSR 0x3b   cur->adjusted
Package#0 CPU 0-7         0          0
Package#1 first CPU       value2     value2
Package#1 non-first CPU   value2     value2
Step 4: Resume from S4.
When resuming from S4, the current TSC synchronization mechanism
produces the following settings:
                          MSR 0x3b   cur->adjusted
Package#0 CPU 0-7         0          0
Package#1 first CPU       value2     value2
Package#1 non-first CPU   value2     value1
In these steps, value1 != 0 and value2 != value1.
In Step 4, as tsc_store_and_check_tsc_adjust() currently works, when the
value of MSR 0x3b on a non-first online CPU in package#1 is equal to the
value of cur->adjusted on the first online CPU in the same package, the
cur->adjusted value on that non-first online CPU keeps the old value1.
This causes tsc_verify_tsc_adjust() to set MSR 0x3b on the non-first
online CPUs in package#1 back to the old value1 and print the warning
messages shown above.
Fix it by setting the cur->adjusted value on a non-first online CPU in a
package to the value of MSR 0x3b on the same CPU when the two are not
equal.
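To make the scenario concrete, here is an illustrative trace of Step 4 on a package#1 non-first online CPU with the fix applied (annotated from the hunk below, not additional code):
u64 bootval;
rdmsrl(MSR_IA32_TSC_ADJUST, bootval);  /* bootval == value2, written in Step 3 */
/* ref->adjusted == value2 (first CPU of the package) */
/* cur->adjusted == value1 (stale value cached before S4) */
if (bootval != ref->adjusted) {        /* false: value2 == value2 */
	cur->adjusted = ref->adjusted;
	wrmsrl(MSR_IA32_TSC_ADJUST, ref->adjusted);
} else if (cur->adjusted != bootval) { /* true with the fix */
	cur->adjusted = bootval;       /* refresh the stale cache to value2 */
}
/* tsc_verify_tsc_adjust() then no longer rewrites MSR 0x3b back to value1 */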
Signed-off-by: LeoLiu-oc <LeoLiu-oc(a)zhaoxin.com>
---
arch/x86/kernel/tsc_sync.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 3d3c761eb..19a0bed9b 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -190,6 +190,8 @@ bool tsc_store_and_check_tsc_adjust(bool bootcpu)
if (bootval != ref->adjusted) {
cur->adjusted = ref->adjusted;
wrmsrl(MSR_IA32_TSC_ADJUST, ref->adjusted);
+ } else if (cur->adjusted != bootval) {
+ cur->adjusted = bootval;
}
/*
* We have the TSCs forced to be in sync on this package. Skip sync
--
2.20.1
1
0
[PATCH OLK-5.10] Add support for PxSCTL.IPM setting based on actual LPM circumstances
by LeoLiuoc 14 Jan '22
by LeoLiuoc 14 Jan '22
14 Jan '22
The AHCI spec states that the PxSCTL.IPM field of each port must be
programmed to disallow device-initiated LPM requests when the HBA cannot
support transitions to the corresponding LPM state. The current LPM
driver places no restrictions on LPM transitions when enabling
device-initiated LPM.
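For reference, the IPM field occupies bits 11:8 of PxSCTL (SControl), and the values the hunk below ORs in correspond to the SATA-spec restriction bits (the macro names here are illustrative, not part of the patch):
#define SCTL_IPM_NO_RESTRICTIONS  (0x0 << 8)  /* device may request Partial/Slumber */
#define SCTL_IPM_NO_PARTIAL       (0x1 << 8)  /* disallow transitions to Partial */
#define SCTL_IPM_NO_SLUMBER       (0x2 << 8)  /* disallow transitions to Slumber */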
Signed-off-by: LeoLiu-oc <LeoLiu-oc(a)zhaoxin.com>
---
drivers/ata/libata-sata.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
index c16423e44..7e1296b55 100644
--- a/drivers/ata/libata-sata.c
+++ b/drivers/ata/libata-sata.c
@@ -394,9 +394,18 @@ int sata_link_scr_lpm(struct ata_link *link, enum ata_lpm_policy policy,
case ATA_LPM_MED_POWER_WITH_DIPM:
case ATA_LPM_MIN_POWER_WITH_PARTIAL:
case ATA_LPM_MIN_POWER:
- if (ata_link_nr_enabled(link) > 0)
+ if (ata_link_nr_enabled(link) > 0) {
/* no restrictions on LPM transitions */
scontrol &= ~(0x7 << 8);
+ /* if the host does not support Partial, then disallow it;
+ * the same for Slumber
+ */
+ if (!(link->ap->host->flags & ATA_HOST_PART))
+ scontrol |= (0x1 << 8);
+
+ if (!(link->ap->host->flags & ATA_HOST_SSC))
+ scontrol |= (0x2 << 8);
+ }
else {
/* empty port, power off */
scontrol &= ~0xf;
--
2.20.1
1
0
[PATCH OLK-5.10] Add support for PxSCTL.IPM setting based on actual LPM circumstances
by LeoLiuoc 14 Jan '22
by LeoLiuoc 14 Jan '22
14 Jan '22
Per the AHCI spec, the PhyRdy Change Interrupt cannot coexist with Link
Power Management (LPM), yet the LPM driver disables the PhyRdy Change
interrupt without first checking whether the HBA supports LPM at all.
Therefore, if LPM is enabled by default from the LPM driver, it cannot
be disabled from the BIOS without hurting the hot-unplug capability.
Signed-off-by: LeoLiu-oc <LeoLiu-oc(a)zhaoxin.com>
---
drivers/ata/ahci.c | 9 +++++++++
drivers/ata/libata-eh.c | 4 ++++
include/linux/libata.h | 4 ++++
3 files changed, 17 insertions(+)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index f8059eed3..2d1ac1b5b 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1883,6 +1883,15 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
else
dev_info(&pdev->dev, "SSS flag set, parallel bus scan disabled\n");
+ if (hpriv->cap & HOST_CAP_PART)
+ host->flags |= ATA_HOST_PART;
+
+ if (hpriv->cap & HOST_CAP_SSC)
+ host->flags |= ATA_HOST_SSC;
+
+ if (hpriv->cap2 & HOST_CAP2_SDS)
+ host->flags |= ATA_HOST_DEVSLP;
+
if (pi.flags & ATA_FLAG_EM)
ahci_reset_em(host);
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index b6f92050e..ff6e660d6 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -3237,6 +3237,10 @@ static int ata_eh_set_lpm(struct ata_link *link, enum ata_lpm_policy policy,
unsigned int err_mask;
int rc;
+ /* if the host does not support LPM, then set the no-LPM flag */
+ if (!(ap->host->flags & (ATA_HOST_PART | ATA_HOST_SSC | ATA_HOST_DEVSLP)))
+ link->flags |= ATA_LFLAG_NO_LPM;
+
/* if the link or host doesn't do LPM, noop */
if (!IS_ENABLED(CONFIG_SATA_HOST) ||
(link->flags & ATA_LFLAG_NO_LPM) || (ap && !ap->ops->set_lpm))
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 5f550eb27..81124ddc2 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -260,6 +260,10 @@ enum {
ATA_HOST_PARALLEL_SCAN = (1 << 2), /* Ports on this host can be scanned in parallel */
ATA_HOST_IGNORE_ATA = (1 << 3), /* Ignore ATA devices on this host. */
+ ATA_HOST_PART = (1 << 4), /* Host supports Partial. */
+ ATA_HOST_SSC = (1 << 5), /* Host supports Slumber. */
+ ATA_HOST_DEVSLP = (1 << 6), /* Host supports DevSlp. */
+
/* bits 24:31 of host->flags are reserved for LLD specific flags */
/* various lengths of time */
--
2.20.1
1
0
[PATCH OLK-5.10] Fix some bugs like plugin support and SATA link stability when the user enables AHCI RTD3
by LeoLiuoc 14 Jan '22
by LeoLiuoc 14 Jan '22
14 Jan '22
The kernel driver currently disables interrupts when a port is
suspended, so plug-in events are not detected. Enabling the plug-in
interrupt according to the power-management state makes hotplug work
again.
pm_request_resume() is added to resume the port when a plug-in event or
a PME signal wakes up a controller that is in D3.
When the AHCI controller frequently enters and leaves D3, the identify
command may time out while the controller resumes and re-establishes
the connection with the device. Delaying 10ms between controller resume
and port resume is effective, letting the link transition smoothly.
With non-power-management requests and power-management requests
competing with each other in the queue, block IO is often found to hang
for 120s while the system disk is suspending or resuming. It is now
guaranteed that PM requests enter the queue no matter which non-PM
requests are waiting: increase the pm_only counter before checking
whether any non-PM blk_queue_enter() calls are in progress.
Meanwhile, the new blk_pm_request_resume() call needs to happen while a
request is assigned to a queue when the device is suspended.
Signed-off-by: LeoLiu-oc <LeoLiu-oc(a)zhaoxin.com>
---
drivers/ata/ahci.c | 13 +++++++++++++
drivers/ata/libahci.c | 14 ++++++++++++++
2 files changed, 27 insertions(+)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 33192a8f6..617cfcb2b 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -868,6 +868,19 @@ static int ahci_pci_device_runtime_resume(struct device *dev)
if (rc)
return rc;
ahci_pci_init_controller(host);
+
+ /* controller delay for 10ms when being reusmed &&
+ * port resume for Zx platform
+ */
+ if (pdev->vendor == PCI_VENDOR_ID_ZHAOXIN) {
+ ata_msleep(NULL, 10);
+ for (rc = 0; rc < host->n_ports; rc++) {
+ struct ata_port *ap = host->ports[rc];
+
+ pm_request_resume(&ap->tdev);
+ }
+ }
+
return 0;
}
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index fec2e9754..a37eac335 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -826,6 +826,12 @@ static void ahci_power_down(struct ata_port *ap)
void __iomem *port_mmio = ahci_port_base(ap);
u32 cmd, scontrol;
+ /* port suspended enable Plugin intr for Zx platform */
+ struct pci_dev *pdev = to_pci_dev(ap->host->dev);
+
+ if ((pdev->vendor == PCI_VENDOR_ID_ZHAOXIN) &&
(!ap->link.device->sdev))
+ writel(PORT_IRQ_CONNECT, port_mmio + PORT_IRQ_MASK);
+
if (!(hpriv->cap & HOST_CAP_SSS))
return;
@@ -1703,6 +1709,7 @@ static void ahci_error_intr(struct ata_port *ap,
u32 irq_stat)
struct ata_eh_info *active_ehi;
bool fbs_need_dec = false;
u32 serror;
+ struct pci_dev *pdev;
/* determine active link with error */
if (pp->fbs_enabled) {
@@ -1791,6 +1798,13 @@ static void ahci_error_intr(struct ata_port *ap,
u32 irq_stat)
ata_ehi_push_desc(host_ehi, "%s",
irq_stat & PORT_IRQ_CONNECT ?
"connection status changed" : "PHY RDY changed");
+ /* when plugin intr happen,now resume suspended port for Zx
platform */
+ pdev = to_pci_dev(ap->host->dev);
+ if ((pdev->vendor == PCI_VENDOR_ID_ZHAOXIN) &&
+ (ap->pflags & ATA_PFLAG_SUSPENDED)) {
+ pm_request_resume(&ap->tdev);
+ return;
+ }
}
/* okay, let's hand over to EH */
--
2.20.1
[PATCH OLK-5.10] EHCI: Clear wakeup signal latched in S0 state when a device is plugged in
by LeoLiuoc 14 Jan '22
If a LS/FS device is plugged into a USB2 port of EHCI, a wakeup signal
is latched inside the EHCI controller. This is an EHCI bug on some
Zhaoxin projects. If EHCI runtime suspend is enabled and no device is
attached, the PM core lets EHCI go to D3 to save power. However, once
EHCI enters D3, it releases the wakeup signal that was latched when the
device was connected to the port during S0, which generates an SCI
interrupt and brings EHCI back to D0. With no device connected, EHCI
then goes to D3 again, so there is a suspend-resume loop that generates
SCI interrupts continuously.
To fix this issue, clear the wakeup signal latched in EHCI when the
EHCI suspend function is called.
Signed-off-by: LeoLiu-oc <LeoLiu-oc(a)zhaoxin.com>
---
drivers/pci/pci-driver.c | 6 +++++-
drivers/usb/host/ehci-hcd.c | 21 +++++++++++++++++++++
drivers/usb/host/ehci-pci.c | 4 ++++
drivers/usb/host/ehci.h | 1 +
4 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 8b587fc97..c33d7e0a6 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -518,7 +518,11 @@ static int pci_restore_standard_config(struct
pci_dev *pci_dev)
}
pci_restore_state(pci_dev);
- pci_pme_restore(pci_dev);
+ if (!((pci_dev->vendor == PCI_VENDOR_ID_ZHAOXIN) &&
+ (pci_dev->device == 0x3104) &&
+ ((pci_dev->revision & 0xf0) == 0x90)) ||
+ !(pci_dev->class == PCI_CLASS_SERIAL_USB_EHCI))
+ pci_pme_restore(pci_dev);
return 0;
}
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index 8aff19ff8..586e2735d 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -1142,6 +1142,27 @@ int ehci_suspend(struct usb_hcd *hcd, bool do_wakeup)
return -EBUSY;
}
+ /*clear wakeup signal locked in S0 state when device plug in*/
+ if (ehci->zx_wakeup_clear == 1) {
+ u32 __iomem *reg = &ehci->regs->port_status[4];
+ u32 t1 = ehci_readl(ehci, reg);
+
+ t1 &= (u32)~0xf0000;
+ t1 |= PORT_TEST_FORCE;
+ ehci_writel(ehci, t1, reg);
+ t1 = ehci_readl(ehci, reg);
+ usleep_range(1000, 2000);
+ t1 &= (u32)~0xf0000;
+ ehci_writel(ehci, t1, reg);
+ usleep_range(1000, 2000);
+ t1 = ehci_readl(ehci, reg);
+ ehci_writel(ehci, t1 | PORT_CSC, reg);
+ udelay(500);
+ t1 = ehci_readl(ehci, &ehci->regs->status);
+ ehci_writel(ehci, t1 & STS_PCD, &ehci->regs->status);
+ ehci_readl(ehci, &ehci->regs->status);
+ }
+
return 0;
}
EXPORT_SYMBOL_GPL(ehci_suspend);
diff --git a/drivers/usb/host/ehci-pci.c b/drivers/usb/host/ehci-pci.c
index e87cf3a00..a5e27deda 100644
--- a/drivers/usb/host/ehci-pci.c
+++ b/drivers/usb/host/ehci-pci.c
@@ -222,6 +222,10 @@ static int ehci_pci_setup(struct usb_hcd *hcd)
ehci->has_synopsys_hc_bug = 1;
}
break;
+ case PCI_VENDOR_ID_ZHAOXIN:
+ if (pdev->device == 0x3104 && (pdev->revision & 0xf0) == 0x90)
+ ehci->zx_wakeup_clear = 1;
+ break;
}
/* optional debug port, normally in the first BAR */
diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h
index 59fd523c5..8da080a54 100644
--- a/drivers/usb/host/ehci.h
+++ b/drivers/usb/host/ehci.h
@@ -219,6 +219,7 @@ struct ehci_hcd { /* one per controller */
unsigned need_oc_pp_cycle:1; /* MPC834X port power */
unsigned imx28_write_fix:1; /* For Freescale i.MX28 */
unsigned is_aspeed:1;
+ unsigned zx_wakeup_clear:1;
/* required for usb32 quirk */
#define OHCI_CTRL_HCFS (3 << 6)
--
2.25.1
[PATCH OLK-5.10] XHCI: Fix failure to identify some devices when xHCI runtime suspend is enabled
by LeoLiuoc 14 Jan '22
If a device is plugged out of an xHCI controller with runtime suspend
enabled, two things happen. On the one hand, the driver disconnects the
device and sends a Disable Slot command to the xHC. On the other hand,
with no device connected, the PM core calls the xHCI suspend function
to let the controller go to D3 to save power.
However, there is a race to acquire the xhci lock between the Disable
Slot command completion interrupt and xHCI suspend. If the suspend
function takes the lock first, it clears the xHCI command ring, and the
driver then finds an invalid command TRB when it handles the Disable
Slot completion interrupt. This is a serious error for the driver,
which responds by cleaning up the xHC, so any device connected to this
xHCI port afterwards is no longer recognized.
To fix this, let the Disable Slot completion ISR take the xhci lock
first by adding a delay in the xHCI suspend function before it acquires
the lock.
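For reference, the quirk is consumed on the suspend side roughly like
this (a sketch based on the mainline xhci_suspend() in
drivers/usb/host/xhci.c; simplified):
	int xhci_suspend(struct xhci_hcd *xhci, bool do_wakeup)
	{
		...
		/* Give the Disable Slot completion ISR a chance to take
		 * xhci->lock before suspend clears the command ring. */
		if (xhci->quirks & XHCI_SUSPEND_DELAY)
			usleep_range(1000, 1500);

		spin_lock_irq(&xhci->lock);
		...
	}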
Signed-off-by: LeoLiu-oc <LeoLiu-oc(a)zhaoxin.com>
---
drivers/usb/host/xhci-pci.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 583a8e895..83aa8e179 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -272,6 +272,8 @@ static void xhci_pci_quirks(struct device *dev,
struct xhci_hcd *xhci)
xhci->quirks |= XHCI_LPM_SUPPORT;
xhci->quirks |= XHCI_ZHAOXIN_HOST;
}
+ if (pdev->vendor == PCI_VENDOR_ID_ZHAOXIN)
+ xhci->quirks |= XHCI_SUSPEND_DELAY;
/* See https://bugzilla.kernel.org/show_bug.cgi?id=79511 */
if (pdev->vendor == PCI_VENDOR_ID_VIA &&
--
2.20.1
14 Jan '22
When the RTC divider is switched from reset to an operating time base,
the first update cycle should occur 500ms later. On some Zhaoxin SoCs,
however, this first update cycle occurs one second later, so setting
the RTC time on these SoCs introduces a 500ms delay.
Skip the RTC divider setup on these SoCs in mc146818_set_time() to fix it.
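For context, the divider bracketing that this patch makes conditional
is the standard mc146818 sequence around the time-register writes
(taken from the unpatched function):
	save_freq_select = CMOS_READ(RTC_FREQ_SELECT);
	CMOS_WRITE(save_freq_select | RTC_DIV_RESET2, RTC_FREQ_SELECT);
	/* ... write the time registers ... */
	CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT);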
Signed-off-by: LeoLiu-oc <LeoLiu-oc(a)zhaoxin.com>
---
drivers/rtc/rtc-mc146818-lib.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c
index 2ecd8752b..96d9d0219 100644
--- a/drivers/rtc/rtc-mc146818-lib.c
+++ b/drivers/rtc/rtc-mc146818-lib.c
@@ -171,8 +171,17 @@ int mc146818_set_time(struct rtc_time *time)
save_control = CMOS_READ(RTC_CONTROL);
CMOS_WRITE((save_control|RTC_SET), RTC_CONTROL);
+#ifdef CONFIG_X86
+ if (!((boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR ||
+ boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) &&
+ (boot_cpu_data.x86 <= 7 && boot_cpu_data.x86_model <= 59))) {
+ save_freq_select = CMOS_READ(RTC_FREQ_SELECT);
+ CMOS_WRITE((save_freq_select|RTC_DIV_RESET2), RTC_FREQ_SELECT);
+ }
+#else
save_freq_select = CMOS_READ(RTC_FREQ_SELECT);
CMOS_WRITE((save_freq_select|RTC_DIV_RESET2), RTC_FREQ_SELECT);
+#endif
#ifdef CONFIG_MACH_DECSTATION
CMOS_WRITE(real_yrs, RTC_DEC_YEAR);
@@ -190,7 +199,14 @@ int mc146818_set_time(struct rtc_time *time)
#endif
CMOS_WRITE(save_control, RTC_CONTROL);
+#ifdef CONFIG_X86
+ if (!((boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR ||
+ boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) &&
+ (boot_cpu_data.x86 <= 7 && boot_cpu_data.x86_model <= 59)))
+ CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT);
+#else
CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT);
+#endif
spin_unlock_irqrestore(&rtc_lock, flags);
--
2.20.1
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O31I?from=project-issue
CVE: NA
Add "!" in memmap parameter, use memmap(memmap=nn[KMG]!ss[KMG]) reserve memory
for pmem which register persistent memory in arm64.
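For example (values illustrative), reserving 2G of pmem at 0x1a0000000
on the kernel command line:
	memmap=2G!0x1a0000000
With pmem support enabled in the kernel, the reserved range is
registered as persistent memory and shows up as a pmem block device
(e.g. /dev/pmem0).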
Signed-off-by: Zhuling <zhuling8(a)huawei.com>
---
Documentation/admin-guide/kernel-parameters.txt | 9 +++++++--
arch/arm64/mm/init.c | 6 ++++++
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 64be32b..ab42db4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2829,10 +2829,15 @@
will be eaten.
memmap=nn[KMG]!ss[KMG]
- [KNL,X86] Mark specific memory as protected.
+ [KNL,X86,ARM64] Mark specific memory as protected.
Region of memory to be used, from ss to ss+nn.
- The memory region may be marked as e820 type 12 (0xc)
+ [X86]The memory region may be marked as e820 type 12 (0xc)
and is NVDIMM or ADR memory.
+ [ARM64] Reserve memory for persistent storage across kernel
+ restart or update. The data in PMEM will not be lost and can
+ be loaded faster.
+ Example:
+ memmap=100K!0x1a0000000
memmap=<size>%<offset>-<oldtype>+<newtype>
[KNL,ACPI] Convert memory within the specified region
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 6ebfabd..b6512dc 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -56,6 +56,8 @@
s64 memstart_addr __ro_after_init = -1;
EXPORT_SYMBOL(memstart_addr);
+phys_addr_t start_at, mem_size;
+
#ifdef CONFIG_PIN_MEMORY
struct resource pin_memory_resource = {
.name = "Pin memory",
@@ -442,6 +444,10 @@ static int __init parse_memmap_one(char *p)
start_at = memparse(p + 1, &p);
memblock_reserve(start_at, mem_size);
memblock_mark_memmap(start_at, mem_size);
+ } else if (*p == '!') {
+ start_at = memparse(p+1, &p);
+ pmem_start = start_at;
+ pmem_size = mem_size;
} else
pr_info("Unrecognized memmap option, please check the parameter.\n");
--
2.9.5
Hello!
The Kernel SIG invites you to a Zoom meeting (auto-recorded) to be held at 2021-12-31 14:00.
Subject: openEuler kernel tech-sharing session #17 & biweekly meeting
Agenda:
14:10-15:30 openEuler kernel tech-sharing session #17 - introduction to cgroup
Meeting link: https://us06web.zoom.us/j/83118407147?pwd=elVzMVhGc2YzaDQ1TGorN2tPa1NhZz09
Note: You are advised to change your participant name after joining the meeting, or use your gitee.com ID.
More information: https://openeuler.org/en/
Hello!
The openEuler Kernel SIG invites you to a Zoom meeting (auto-recorded) to be held at 2022-01-14 14:00.
Subject: openEuler kernel SIG biweekly meeting
Agenda:
Topic: add a mechanism to eBPF to support non-mainline helpers while maintaining a degree of compatibility
Meeting link: https://us06web.zoom.us/j/86252266487?pwd=VjRNTVJzdkVwaXl6K2V1VUcxTWpEQT09
Note: You are advised to change your participant name after joining the meeting, or use your gitee.com ID.
More information: https://openeuler.org/en/
From: Hannes Reinecke <hare(a)suse.de>
mainline inclusion
from mainline-5.14-rc1
commit ced202f7bd78eb6a79c441a8b217e0f3d38bccfc
category: bugfix
bugzilla: 185813
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
---------------------------
Return the actual error code in __scsi_execute() (which, according to the
documentation, should have happened anyway). And audit all callers to cope
with negative return values from __scsi_execute() and friends.
[mkp: resolve conflict and return bool]
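The resulting caller pattern (illustrative; it mirrors the hunks below)
is to check for a negative return before decoding the SCSI result bytes:
	result = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buffer, len,
				  &sshdr, 30 * HZ, 3, NULL);
	if (result < 0)
		return result;	/* submission failed; no SCSI status to decode */
	if (driver_byte(result) == DRIVER_SENSE && scsi_sense_valid(&sshdr))
		scsi_print_sense_hdr(sdev, NULL, &sshdr);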
Link: https://lore.kernel.org/r/20210427083046.31620-7-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Hannes Reinecke <hare(a)suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
---
drivers/ata/libata-scsi.c | 8 ++++++++
drivers/scsi/ch.c | 3 ++-
drivers/scsi/cxlflash/superpipe.c | 2 +-
drivers/scsi/scsi.c | 2 ++
drivers/scsi/scsi_ioctl.c | 4 +++-
drivers/scsi/scsi_lib.c | 15 +++++++++------
drivers/scsi/scsi_scan.c | 4 ++--
drivers/scsi/scsi_transport_spi.c | 2 +-
drivers/scsi/sd.c | 15 +++++++++------
drivers/scsi/sd_zbc.c | 2 +-
drivers/scsi/sr_ioctl.c | 4 ++++
drivers/scsi/ufs/ufshcd.c | 2 +-
drivers/scsi/virtio_scsi.c | 2 +-
include/scsi/scsi.h | 7 +++++--
14 files changed, 49 insertions(+), 23 deletions(-)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 48b8934970f3..c5129b9e3afd 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -409,6 +409,10 @@ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg)
cmd_result = scsi_execute(scsidev, scsi_cmd, data_dir, argbuf, argsize,
sensebuf, &sshdr, (10*HZ), 5, 0, 0, NULL);
+ if (cmd_result < 0) {
+ rc = cmd_result;
+ goto error;
+ }
if (driver_byte(cmd_result) == DRIVER_SENSE) {/* sense data available */
u8 *desc = sensebuf + 8;
cmd_result &= ~(0xFF<<24); /* DRIVER_SENSE is not an error */
@@ -490,6 +494,10 @@ int ata_task_ioctl(struct scsi_device *scsidev, void __user *arg)
cmd_result = scsi_execute(scsidev, scsi_cmd, DMA_NONE, NULL, 0,
sensebuf, &sshdr, (10*HZ), 5, 0, 0, NULL);
+ if (cmd_result < 0) {
+ rc = cmd_result;
+ goto error;
+ }
if (driver_byte(cmd_result) == DRIVER_SENSE) {/* sense data available */
u8 *desc = sensebuf + 8;
cmd_result &= ~(0xFF<<24); /* DRIVER_SENSE is not an error */
diff --git a/drivers/scsi/ch.c b/drivers/scsi/ch.c
index cb74ab1ae5a4..0e7d1214c3d8 100644
--- a/drivers/scsi/ch.c
+++ b/drivers/scsi/ch.c
@@ -198,7 +198,8 @@ ch_do_scsi(scsi_changer *ch, unsigned char *cmd, int cmd_len,
result = scsi_execute_req(ch->device, cmd, direction, buffer,
buflength, &sshdr, timeout * HZ,
MAX_RETRIES, NULL);
-
+ if (result < 0)
+ return result;
if (driver_byte(result) == DRIVER_SENSE) {
if (debug)
scsi_print_sense_hdr(ch->device, ch->name, &sshdr);
diff --git a/drivers/scsi/cxlflash/superpipe.c b/drivers/scsi/cxlflash/superpipe.c
index 5dddf67dfa24..3ebb7ac86a64 100644
--- a/drivers/scsi/cxlflash/superpipe.c
+++ b/drivers/scsi/cxlflash/superpipe.c
@@ -369,7 +369,7 @@ static int read_cap16(struct scsi_device *sdev, struct llun_info *lli)
goto out;
}
- if (driver_byte(result) == DRIVER_SENSE) {
+ if (result > 0 && driver_byte(result) == DRIVER_SENSE) {
result &= ~(0xFF<<24); /* DRIVER_SENSE is not an error */
if (result & SAM_STAT_CHECK_CONDITION) {
switch (sshdr.sense_key) {
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 6ad834d61d4c..f561caabc771 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -495,6 +495,8 @@ int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer,
result = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buffer, len,
&sshdr, 30 * HZ, 3, NULL);
+ if (result < 0)
+ return result;
if (result && scsi_sense_valid(&sshdr) &&
sshdr.sense_key == ILLEGAL_REQUEST &&
(sshdr.asc == 0x20 || sshdr.asc == 0x24) && sshdr.ascq == 0x00)
diff --git a/drivers/scsi/scsi_ioctl.c b/drivers/scsi/scsi_ioctl.c
index 14872c9dc78c..d34e1b41dc71 100644
--- a/drivers/scsi/scsi_ioctl.c
+++ b/drivers/scsi/scsi_ioctl.c
@@ -101,6 +101,8 @@ static int ioctl_internal_command(struct scsi_device *sdev, char *cmd,
SCSI_LOG_IOCTL(2, sdev_printk(KERN_INFO, sdev,
"Ioctl returned 0x%x\n", result));
+ if (result < 0)
+ goto out;
if (driver_byte(result) == DRIVER_SENSE &&
scsi_sense_valid(&sshdr)) {
switch (sshdr.sense_key) {
@@ -133,7 +135,7 @@ static int ioctl_internal_command(struct scsi_device *sdev, char *cmd,
break;
}
}
-
+out:
SCSI_LOG_IOCTL(2, sdev_printk(KERN_INFO, sdev,
"IOCTL Releasing command\n"));
return result;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0a265a3e3dc6..9352a309fe55 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -245,20 +245,23 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
{
struct request *req;
struct scsi_request *rq;
- int ret = DRIVER_ERROR << 24;
+ int ret;
req = blk_get_request(sdev->request_queue,
data_direction == DMA_TO_DEVICE ?
REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN,
rq_flags & RQF_PM ? BLK_MQ_REQ_PM : 0);
if (IS_ERR(req))
- return ret;
- rq = scsi_req(req);
+ return PTR_ERR(req);
- if (bufflen && blk_rq_map_kern(sdev->request_queue, req,
- buffer, bufflen, GFP_NOIO))
- goto out;
+ rq = scsi_req(req);
+ if (bufflen) {
+ ret = blk_rq_map_kern(sdev->request_queue, req,
+ buffer, bufflen, GFP_NOIO);
+ if (ret)
+ goto out;
+ }
rq->cmd_len = COMMAND_SIZE(cmd[0]);
memcpy(rq->cmd, cmd, rq->cmd_len);
rq->retries = retries;
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 8e474b145249..57002529174f 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -599,7 +599,7 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
"scsi scan: INQUIRY %s with code 0x%x\n",
result ? "failed" : "successful", result));
- if (result) {
+ if (result > 0) {
/*
* not-ready to ready transition [asc/ascq=0x28/0x0]
* or power-on, reset [asc/ascq=0x29/0x0], continue.
@@ -614,7 +614,7 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
(sshdr.ascq == 0))
continue;
}
- } else {
+ } else if (result == 0) {
/*
* if nothing was transferred, we try
* again. It's a workaround for some USB
diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c
index c37dd15d16d2..a9bb7ae2fafd 100644
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -127,7 +127,7 @@ static int spi_execute(struct scsi_device *sdev, const void *cmd,
REQ_FAILFAST_TRANSPORT |
REQ_FAILFAST_DRIVER,
RQF_PM, NULL);
- if (driver_byte(result) != DRIVER_SENSE ||
+ if (result < 0 || driver_byte(result) != DRIVER_SENSE ||
sshdr->sense_key != UNIT_ATTENTION)
break;
}
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 97713f595a29..3fc184e9702f 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1673,7 +1673,7 @@ static unsigned int sd_check_events(struct gendisk *disk, unsigned int clearing)
&sshdr);
/* failed to execute TUR, assume media not present */
- if (host_byte(retval)) {
+ if (retval < 0 || host_byte(retval)) {
set_media_not_present(sdkp);
goto out;
}
@@ -1734,6 +1734,9 @@ static int sd_sync_cache(struct scsi_disk *sdkp, struct scsi_sense_hdr *sshdr)
if (res) {
sd_print_result(sdkp, "Synchronize Cache(10) failed", res);
+ if (res < 0)
+ return res;
+
if (driver_byte(res) == DRIVER_SENSE)
sd_print_sense_hdr(sdkp, sshdr);
@@ -1842,7 +1845,7 @@ static int sd_pr_command(struct block_device *bdev, u8 sa,
result = scsi_execute_req(sdev, cmd, DMA_TO_DEVICE, &data, sizeof(data),
&sshdr, SD_TIMEOUT, sdkp->max_retries, NULL);
- if (driver_byte(result) == DRIVER_SENSE &&
+ if (result > 0 && driver_byte(result) == DRIVER_SENSE &&
scsi_sense_valid(&sshdr)) {
sdev_printk(KERN_INFO, sdev, "PR command failed: %d\n", result);
scsi_print_sense_hdr(sdev, NULL, &sshdr);
@@ -2194,7 +2197,7 @@ sd_spinup_disk(struct scsi_disk *sdkp)
((driver_byte(the_result) == DRIVER_SENSE) &&
sense_valid && sshdr.sense_key == UNIT_ATTENTION)));
- if (driver_byte(the_result) != DRIVER_SENSE) {
+ if (the_result < 0 || driver_byte(the_result) != DRIVER_SENSE) {
/* no sense, TUR either succeeded or failed
* with a status error */
if(!spintime && !scsi_status_is_good(the_result)) {
@@ -2379,7 +2382,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
if (media_not_present(sdkp, &sshdr))
return -ENODEV;
- if (the_result) {
+ if (the_result > 0) {
sense_valid = scsi_sense_valid(&sshdr);
if (sense_valid &&
sshdr.sense_key == ILLEGAL_REQUEST &&
@@ -2464,7 +2467,7 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
if (media_not_present(sdkp, &sshdr))
return -ENODEV;
- if (the_result) {
+ if (the_result > 0) {
sense_valid = scsi_sense_valid(&sshdr);
if (sense_valid &&
sshdr.sense_key == UNIT_ATTENTION &&
@@ -3614,7 +3617,7 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL);
if (res) {
sd_print_result(sdkp, "Start/Stop Unit failed", res);
- if (driver_byte(res) == DRIVER_SENSE)
+ if (res > 0 && driver_byte(res) == DRIVER_SENSE)
sd_print_sense_hdr(sdkp, &sshdr);
if (scsi_sense_valid(&sshdr) &&
/* 0x3a is medium not present */
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 01088f333dbc..6834e029906a 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -116,7 +116,7 @@ static int sd_zbc_do_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
sd_printk(KERN_ERR, sdkp,
"REPORT ZONES start lba %llu failed\n", lba);
sd_print_result(sdkp, "REPORT ZONES", result);
- if (driver_byte(result) == DRIVER_SENSE &&
+ if (result > 0 && driver_byte(result) == DRIVER_SENSE &&
scsi_sense_valid(&sshdr))
sd_print_sense_hdr(sdkp, &sshdr);
return -EIO;
diff --git a/drivers/scsi/sr_ioctl.c b/drivers/scsi/sr_ioctl.c
index ffcf902da390..4c1de11e69fb 100644
--- a/drivers/scsi/sr_ioctl.c
+++ b/drivers/scsi/sr_ioctl.c
@@ -205,6 +205,10 @@ int sr_do_ioctl(Scsi_CD *cd, struct packet_command *cgc)
cgc->timeout, IOCTL_RETRIES, 0, 0, NULL);
/* Minimal error checking. Ignore cases we know about, and report the rest. */
+ if (result < 0) {
+ err = result;
+ goto out;
+ }
if (driver_byte(result) != 0) {
switch (sshdr->sense_key) {
case UNIT_ATTENTION:
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 930f35863cbb..b8462f54cb72 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -8354,7 +8354,7 @@ static int ufshcd_set_dev_pwr_mode(struct ufs_hba *hba,
sdev_printk(KERN_WARNING, sdp,
"START_STOP failed for power mode: %d, result %x\n",
pwr_mode, ret);
- if (driver_byte(ret) == DRIVER_SENSE)
+ if (ret > 0 && driver_byte(ret) == DRIVER_SENSE)
scsi_print_sense_hdr(sdp, NULL, &sshdr);
}
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 6dac58ae6120..fabbad5cda60 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -355,7 +355,7 @@ static void virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
if (result == 0 && inq_result[0] >> 5) {
/* PQ indicates the LUN is not attached */
scsi_remove_device(sdev);
- } else if (host_byte(result) == DID_BAD_TARGET) {
+ } else if (result > 0 && host_byte(result) == DID_BAD_TARGET) {
/*
* If all LUNs of a virtio-scsi device are unplugged
* it will respond with BAD TARGET on any INQUIRY
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 7ac50dbee475..6f227b610681 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -256,10 +256,13 @@ static inline int scsi_is_wlun(u64 lun)
* This returns true for known good conditions that may be treated as
* command completed normally
*/
-static inline int scsi_status_is_good(int status)
+static inline bool scsi_status_is_good(int status)
{
+ if (status < 0)
+ return false;
+
if (host_byte(status) == DID_NO_CONNECT)
- return 0;
+ return false;
/*
* FIXME: bit0 is listed as reserved in SCSI-2, but is
--
2.31.1
Ramaxel inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4QLBP
CVE: NA
Changes:
1. Read lb mode from chip
2. Disable fake vf function
3. Remove unnecessary code
Signed-off-by: Yanling Song <songyl(a)ramaxel.com>
Reviewed-by: Yun Xu <xuyun(a)ramaxel.com>
---
.../ethernet/ramaxel/spnic/hw/sphw_cfg_cmd.h | 3 +-
.../ethernet/ramaxel/spnic/hw/sphw_hw_cfg.c | 3 +
.../ethernet/ramaxel/spnic/hw/sphw_hw_cfg.h | 3 +
drivers/scsi/spfc/hw/spfc_cqm_bat_cla.c | 43 +--
drivers/scsi/spfc/hw/spfc_cqm_bitmap_table.c | 10 +-
drivers/scsi/spfc/hw/spfc_cqm_main.c | 299 +-----------------
drivers/scsi/spfc/hw/spfc_cqm_main.h | 3 -
drivers/scsi/spfc/hw/spfc_cqm_object.c | 21 --
8 files changed, 29 insertions(+), 356 deletions(-)
diff --git a/drivers/net/ethernet/ramaxel/spnic/hw/sphw_cfg_cmd.h b/drivers/net/ethernet/ramaxel/spnic/hw/sphw_cfg_cmd.h
index 23644958e33a..63b89e71c552 100644
--- a/drivers/net/ethernet/ramaxel/spnic/hw/sphw_cfg_cmd.h
+++ b/drivers/net/ethernet/ramaxel/spnic/hw/sphw_cfg_cmd.h
@@ -40,7 +40,8 @@ struct cfg_cmd_dev_cap {
u8 sf_svc_attr;
u8 func_sf_en;
- u16 rsvd_sf;
+ u8 lb_mode;
+ u8 smf_pg;
u32 max_conn_num;
u16 max_stick2cache_num;
diff --git a/drivers/net/ethernet/ramaxel/spnic/hw/sphw_hw_cfg.c b/drivers/net/ethernet/ramaxel/spnic/hw/sphw_hw_cfg.c
index 5d6a53307f0a..4b2674ec66f0 100644
--- a/drivers/net/ethernet/ramaxel/spnic/hw/sphw_hw_cfg.c
+++ b/drivers/net/ethernet/ramaxel/spnic/hw/sphw_hw_cfg.c
@@ -112,6 +112,9 @@ static void parse_pub_res_cap(struct sphw_hwdev *hwdev,
else
cap->sf_en = false;
+ cap->lb_mode = dev_cap->lb_mode;
+ cap->smf_pg = dev_cap->smf_pg;
+
cap->timer_en = (u8)timer_enable; /* timer enable */
cap->host_oq_id_mask_val = dev_cap->host_oq_id_mask_val;
cap->max_connect_num = dev_cap->max_conn_num;
diff --git a/drivers/net/ethernet/ramaxel/spnic/hw/sphw_hw_cfg.h b/drivers/net/ethernet/ramaxel/spnic/hw/sphw_hw_cfg.h
index 1147beeb76ad..1b48e0991563 100644
--- a/drivers/net/ethernet/ramaxel/spnic/hw/sphw_hw_cfg.h
+++ b/drivers/net/ethernet/ramaxel/spnic/hw/sphw_hw_cfg.h
@@ -201,6 +201,9 @@ struct service_cap {
u8 timer_en; /* 0:disable, 1:enable */
u8 bloomfilter_en; /* 0:disable, 1:enable*/
+ u8 lb_mode;
+ u8 smf_pg;
+
/* For test */
u32 test_mode;
u32 test_qpc_num;
diff --git a/drivers/scsi/spfc/hw/spfc_cqm_bat_cla.c b/drivers/scsi/spfc/hw/spfc_cqm_bat_cla.c
index 30fb56a9bfed..0c1d97d9e3e6 100644
--- a/drivers/scsi/spfc/hw/spfc_cqm_bat_cla.c
+++ b/drivers/scsi/spfc/hw/spfc_cqm_bat_cla.c
@@ -53,17 +53,6 @@ cqm_bat_fill_cla_common_gpa(struct cqm_handle *cqm_handle,
gpa.acs_spu_en = 0;
}
- /* In fake mode, fake_vf_en in the GPA address of the BAT
- * must be set to 1.
- */
- if (cqm_handle->func_capability.fake_func_type == CQM_FAKE_FUNC_CHILD) {
- gpa.fake_vf_en = 1;
- func_attr = &cqm_handle->parent_cqm_handle->func_attribute;
- gpa.pf_id = func_attr->func_global_idx;
- } else {
- gpa.fake_vf_en = 0;
- }
-
memcpy(&cla_gpa_h, &gpa, sizeof(u32));
bat_entry_standerd->cla_gpa_h = cla_gpa_h;
@@ -379,13 +368,8 @@ s32 cqm_bat_update(struct cqm_handle *cqm_handle)
CQM_PTR_CHECK_RET(buf_in, CQM_FAIL, CQM_ALLOC_FAIL(buf_in));
buf_in->size = sizeof(struct cqm_cmdq_bat_update);
- /* In non-fake mode, func_id is set to 0xffff, indicating the current func.
- * In fake mode, the value of func_id is specified. This is a fake func_id.
- */
- if (cqm_handle->func_capability.fake_func_type == CQM_FAKE_FUNC_CHILD)
- func_id = cqm_handle->func_attribute.func_global_idx;
- else
- func_id = 0xffff;
+ /* In non-fake mode, func_id is set to 0xffff */
+ func_id = 0xffff;
/* The LB scenario is supported.
* The normal mode is the traditional mode and is configured on SMF0.
@@ -545,19 +529,6 @@ s32 cqm_cla_fill_buf(struct cqm_handle *cqm_handle, struct cqm_buf *cla_base_buf
spu_en = 0;
}
- /* fake enable */
- if (cqm_handle->func_capability.fake_func_type ==
- CQM_FAKE_FUNC_CHILD) {
- fake_en = 1ULL << 62;
- func_attr =
- &cqm_handle->parent_cqm_handle->func_attribute;
- pf_id = func_attr->func_global_idx;
- pf_id = (pf_id & 0x1f) << 57;
- } else {
- fake_en = 0;
- pf_id = 0;
- }
-
*base = (((((cla_sub_buf->buf_list[i].pa & CQM_CHIP_GPA_MASK) |
spu_en) |
fake_en) |
@@ -1248,14 +1219,8 @@ s32 cqm_cla_update(struct cqm_handle *cqm_handle, struct cqm_buf_list *buf_node_
}
}
- /* In non-fake mode, set func_id to 0xffff.
- * Indicates the current func fake mode, set func_id to the
- * specified value, This is a fake func_id.
- */
- if (cqm_handle->func_capability.fake_func_type == CQM_FAKE_FUNC_CHILD)
- cmd.func_id = cqm_handle->func_attribute.func_global_idx;
- else
- cmd.func_id = 0xffff;
+ /* In non-fake mode, set func_id to 0xffff. */
+ cmd.func_id = 0xffff;
/* Mode 0 is hashed to 4 SMF engines (excluding PPF) by func ID. */
if (cqm_handle->func_capability.lb_mode == CQM_LB_MODE_NORMAL ||
diff --git a/drivers/scsi/spfc/hw/spfc_cqm_bitmap_table.c b/drivers/scsi/spfc/hw/spfc_cqm_bitmap_table.c
index 4e482776a14f..21100e8db8f4 100644
--- a/drivers/scsi/spfc/hw/spfc_cqm_bitmap_table.c
+++ b/drivers/scsi/spfc/hw/spfc_cqm_bitmap_table.c
@@ -405,14 +405,8 @@ s32 cqm_cla_cache_invalid(struct cqm_handle *cqm_handle, dma_addr_t gpa, u32 cac
cmd.gpa_h = CQM_ADDR_HI(gpa);
cmd.gpa_l = CQM_ADDR_LW(gpa);
- /* In non-fake mode, set func_id to 0xffff.
- * Indicate the current func fake mode.
- * The value of func_id is a fake func ID.
- */
- if (cqm_handle->func_capability.fake_func_type == CQM_FAKE_FUNC_CHILD)
- cmd.func_id = cqm_handle->func_attribute.func_global_idx;
- else
- cmd.func_id = 0xffff;
+ /* In non-fake mode, set func_id to 0xffff. */
+ cmd.func_id = 0xffff;
/* Mode 0 is hashed to 4 SMF engines (excluding PPF) by func ID. */
if (cqm_handle->func_capability.lb_mode == CQM_LB_MODE_NORMAL ||
diff --git a/drivers/scsi/spfc/hw/spfc_cqm_main.c b/drivers/scsi/spfc/hw/spfc_cqm_main.c
index a4c8e60971b1..52cc2c7838e9 100644
--- a/drivers/scsi/spfc/hw/spfc_cqm_main.c
+++ b/drivers/scsi/spfc/hw/spfc_cqm_main.c
@@ -11,21 +11,8 @@
#include "sphw_crm.h"
#include "sphw_hw.h"
#include "sphw_hw_cfg.h"
-
#include "spfc_cqm_main.h"
-static unsigned char cqm_lb_mode = CQM_LB_MODE_NORMAL;
-module_param(cqm_lb_mode, byte, 0644);
-MODULE_PARM_DESC(cqm_lb_mode, "for cqm lb mode (default=0xff)");
-
-static unsigned char cqm_fake_mode = CQM_FAKE_MODE_DISABLE;
-module_param(cqm_fake_mode, byte, 0644);
-MODULE_PARM_DESC(cqm_fake_mode, "for cqm fake mode (default=0 disable)");
-
-static unsigned char cqm_platform_mode = CQM_FPGA_MODE;
-module_param(cqm_platform_mode, byte, 0644);
-MODULE_PARM_DESC(cqm_platform_mode, "for cqm platform mode (default=0 FPGA)");
-
s32 cqm3_init(void *ex_handle)
{
struct sphw_hwdev *handle = (struct sphw_hwdev *)ex_handle;
@@ -63,19 +50,6 @@ s32 cqm3_init(void *ex_handle)
goto err1;
}
- /* In FAKE mode, only the bitmap of the timer of the function is
- * enabled, and resources are not initialized. Otherwise, the
- * configuration of the fake function is overwritten.
- */
- if (cqm_handle->func_capability.fake_func_type == CQM_FAKE_FUNC_CHILD_CONFLICT) {
- if (sphw_func_tmr_bitmap_set(ex_handle, true) != CQM_SUCCESS)
- cqm_err(handle->dev_hdl, "Timer start: enable timer bitmap failed\n");
-
- handle->cqm_hdl = NULL;
- kfree(cqm_handle);
- return CQM_SUCCESS;
- }
-
/* Initialize memory entries such as BAT, CLA, and bitmap. */
if (cqm_mem_init(ex_handle) != CQM_SUCCESS) {
cqm_err(handle->dev_hdl, CQM_FUNCTION_FAIL(cqm_mem_init));
@@ -287,64 +261,6 @@ void cqm_service_capability_init(struct cqm_handle *cqm_handle,
cqm_service_capability_init_fc(cqm_handle, (void *)service_capability);
}
-s32 cqm_get_fake_func_type(struct cqm_handle *cqm_handle)
-{
- struct cqm_func_capability *func_cap = &cqm_handle->func_capability;
- u32 parent_func, child_func_start, child_func_number, i;
- u32 idx = cqm_handle->func_attribute.func_global_idx;
-
- /* Currently, only one set of fake configurations is implemented.
- * fake_cfg_number = 1
- */
- for (i = 0; i < func_cap->fake_cfg_number; i++) {
- parent_func = func_cap->fake_cfg[i].parent_func;
- child_func_start = func_cap->fake_cfg[i].child_func_start;
- child_func_number = func_cap->fake_cfg[i].child_func_number;
-
- if (idx == parent_func) {
- return CQM_FAKE_FUNC_PARENT;
- } else if ((idx >= child_func_start) &&
- (idx < (child_func_start + child_func_number))) {
- return CQM_FAKE_FUNC_CHILD_CONFLICT;
- }
- }
-
- return CQM_FAKE_FUNC_NORMAL;
-}
-
-s32 cqm_get_child_func_start(struct cqm_handle *cqm_handle)
-{
- struct cqm_func_capability *func_cap = &cqm_handle->func_capability;
- struct sphw_func_attr *func_attr = &cqm_handle->func_attribute;
- u32 i;
-
- /* Currently, only one set of fake configurations is implemented.
- * fake_cfg_number = 1
- */
- for (i = 0; i < func_cap->fake_cfg_number; i++) {
- if (func_attr->func_global_idx ==
- func_cap->fake_cfg[i].parent_func)
- return (s32)(func_cap->fake_cfg[i].child_func_start);
- }
-
- return CQM_FAIL;
-}
-
-s32 cqm_get_child_func_number(struct cqm_handle *cqm_handle)
-{
- struct cqm_func_capability *func_cap = &cqm_handle->func_capability;
- struct sphw_func_attr *func_attr = &cqm_handle->func_attribute;
- u32 i;
-
- for (i = 0; i < func_cap->fake_cfg_number; i++) {
- if (func_attr->func_global_idx ==
- func_cap->fake_cfg[i].parent_func)
- return (s32)(func_cap->fake_cfg[i].child_func_number);
- }
-
- return CQM_FAIL;
-}
-
/* Set func_type in fake_cqm_handle to ppf, pf, or vf. */
void cqm_set_func_type(struct cqm_handle *cqm_handle)
{
@@ -358,41 +274,20 @@ void cqm_set_func_type(struct cqm_handle *cqm_handle)
cqm_handle->func_attribute.func_type = CQM_VF;
}
-void cqm_lb_fake_mode_init(struct cqm_handle *cqm_handle)
+void cqm_lb_fake_mode_init(struct cqm_handle *cqm_handle, struct service_cap *svc_cap)
{
struct cqm_func_capability *func_cap = &cqm_handle->func_capability;
- struct cqm_fake_cfg *cfg = func_cap->fake_cfg;
- func_cap->lb_mode = cqm_lb_mode;
- func_cap->fake_mode = cqm_fake_mode;
+ func_cap->lb_mode = svc_cap->lb_mode;
/* Initializing the LB Mode */
- if (func_cap->lb_mode == CQM_LB_MODE_NORMAL) {
+ if (func_cap->lb_mode == CQM_LB_MODE_NORMAL)
func_cap->smf_pg = 0;
- } else {
- /* The LB mode is tailored on the FPGA.
- * Only SMF0 and SMF2 are instantiated.
- */
- if (cqm_platform_mode == CQM_FPGA_MODE)
- func_cap->smf_pg = 0x5;
- else
- func_cap->smf_pg = 0xF;
- }
+ else
+ func_cap->smf_pg = svc_cap->smf_pg;
- /* Initializing the FAKE Mode */
- if (func_cap->fake_mode == CQM_FAKE_MODE_DISABLE) {
- func_cap->fake_cfg_number = 0;
- func_cap->fake_func_type = CQM_FAKE_FUNC_NORMAL;
- } else {
- func_cap->fake_cfg_number = 1;
-
- /* When configuring fake mode, ensure that the parent function
- * cannot be contained in the child function; otherwise, the
- * system will be initialized repeatedly.
- */
- cfg[0].child_func_start = CQM_FAKE_CFUNC_START;
- func_cap->fake_func_type = cqm_get_fake_func_type(cqm_handle);
- }
+ func_cap->fake_cfg_number = 0;
+ func_cap->fake_func_type = CQM_FAKE_FUNC_NORMAL;
}
s32 cqm_capability_init(void *ex_handle)
@@ -465,10 +360,9 @@ s32 cqm_capability_init(void *ex_handle)
func_cap->gpa_check_enable = true;
- cqm_lb_fake_mode_init(cqm_handle);
+ cqm_lb_fake_mode_init(cqm_handle, service_capability);
cqm_info(handle->dev_hdl, "Cap init: lb_mode=%u\n", func_cap->lb_mode);
cqm_info(handle->dev_hdl, "Cap init: smf_pg=%u\n", func_cap->smf_pg);
- cqm_info(handle->dev_hdl, "Cap init: fake_mode=%u\n", func_cap->fake_mode);
cqm_info(handle->dev_hdl, "Cap init: fake_func_type=%u\n", func_cap->fake_func_type);
cqm_info(handle->dev_hdl, "Cap init: fake_cfg_number=%u\n", func_cap->fake_cfg_number);
@@ -517,153 +411,6 @@ s32 cqm_capability_init(void *ex_handle)
return err;
}
-void cqm_fake_uninit(struct cqm_handle *cqm_handle)
-{
- u32 i;
-
- if (cqm_handle->func_capability.fake_func_type !=
- CQM_FAKE_FUNC_PARENT)
- return;
-
- for (i = 0; i < CQM_FAKE_FUNC_MAX; i++) {
- kfree(cqm_handle->fake_cqm_handle[i]);
- cqm_handle->fake_cqm_handle[i] = NULL;
- }
-}
-
-s32 cqm_fake_init(struct cqm_handle *cqm_handle)
-{
- struct sphw_hwdev *handle = cqm_handle->ex_handle;
- struct cqm_func_capability *func_cap = NULL;
- struct cqm_handle *fake_cqm_handle = NULL;
- struct sphw_func_attr *func_attr = NULL;
- s32 child_func_start, child_func_number;
- u32 i;
-
- func_cap = &cqm_handle->func_capability;
- if (func_cap->fake_func_type != CQM_FAKE_FUNC_PARENT)
- return CQM_SUCCESS;
-
- child_func_start = cqm_get_child_func_start(cqm_handle);
- if (child_func_start == CQM_FAIL) {
- cqm_err(handle->dev_hdl, CQM_WRONG_VALUE(child_func_start));
- return CQM_FAIL;
- }
-
- child_func_number = cqm_get_child_func_number(cqm_handle);
- if (child_func_number == CQM_FAIL) {
- cqm_err(handle->dev_hdl, CQM_WRONG_VALUE(child_func_number));
- return CQM_FAIL;
- }
-
- for (i = 0; i < (u32)child_func_number; i++) {
- fake_cqm_handle = kmalloc(sizeof(*fake_cqm_handle), GFP_KERNEL | __GFP_ZERO);
- if (!fake_cqm_handle) {
- cqm_err(handle->dev_hdl,
- CQM_ALLOC_FAIL(fake_cqm_handle));
- goto err;
- }
-
- /* Copy the attributes of the parent CQM handle to the child CQM
- * handle and modify the values of function.
- */
- memcpy(fake_cqm_handle, cqm_handle, sizeof(struct cqm_handle));
- func_attr = &fake_cqm_handle->func_attribute;
- func_cap = &fake_cqm_handle->func_capability;
- func_attr->func_global_idx = (u16)(child_func_start + i);
- cqm_set_func_type(fake_cqm_handle);
- func_cap->fake_func_type = CQM_FAKE_FUNC_CHILD;
- cqm_info(handle->dev_hdl, "Fake func init: function[%u] type %d(0:PF,1:VF,2:PPF)\n",
- func_attr->func_global_idx, func_attr->func_type);
-
- fake_cqm_handle->parent_cqm_handle = cqm_handle;
- cqm_handle->fake_cqm_handle[i] = fake_cqm_handle;
- }
-
- return CQM_SUCCESS;
-
-err:
- cqm_fake_uninit(cqm_handle);
- return CQM_FAIL;
-}
-
-void cqm_fake_mem_uninit(struct cqm_handle *cqm_handle)
-{
- struct sphw_hwdev *handle = cqm_handle->ex_handle;
- struct cqm_handle *fake_cqm_handle = NULL;
- s32 child_func_number;
- u32 i;
-
- if (cqm_handle->func_capability.fake_func_type != CQM_FAKE_FUNC_PARENT)
- return;
-
- child_func_number = cqm_get_child_func_number(cqm_handle);
- if (child_func_number == CQM_FAIL) {
- cqm_err(handle->dev_hdl, CQM_WRONG_VALUE(child_func_number));
- return;
- }
-
- for (i = 0; i < (u32)child_func_number; i++) {
- fake_cqm_handle = cqm_handle->fake_cqm_handle[i];
- cqm_object_table_uninit(fake_cqm_handle);
- cqm_bitmap_uninit(fake_cqm_handle);
- cqm_cla_uninit(fake_cqm_handle, CQM_BAT_ENTRY_MAX);
- cqm_bat_uninit(fake_cqm_handle);
- }
-}
-
-s32 cqm_fake_mem_init(struct cqm_handle *cqm_handle)
-{
- struct sphw_hwdev *handle = cqm_handle->ex_handle;
- struct cqm_handle *fake_cqm_handle = NULL;
- s32 child_func_number;
- u32 i;
-
- if (cqm_handle->func_capability.fake_func_type !=
- CQM_FAKE_FUNC_PARENT)
- return CQM_SUCCESS;
-
- child_func_number = cqm_get_child_func_number(cqm_handle);
- if (child_func_number == CQM_FAIL) {
- cqm_err(handle->dev_hdl, CQM_WRONG_VALUE(child_func_number));
- return CQM_FAIL;
- }
-
- for (i = 0; i < (u32)child_func_number; i++) {
- fake_cqm_handle = cqm_handle->fake_cqm_handle[i];
-
- if (cqm_bat_init(fake_cqm_handle) != CQM_SUCCESS) {
- cqm_err(handle->dev_hdl,
- CQM_FUNCTION_FAIL(cqm_bat_init));
- goto err;
- }
-
- if (cqm_cla_init(fake_cqm_handle) != CQM_SUCCESS) {
- cqm_err(handle->dev_hdl,
- CQM_FUNCTION_FAIL(cqm_cla_init));
- goto err;
- }
-
- if (cqm_bitmap_init(fake_cqm_handle) != CQM_SUCCESS) {
- cqm_err(handle->dev_hdl,
- CQM_FUNCTION_FAIL(cqm_bitmap_init));
- goto err;
- }
-
- if (cqm_object_table_init(fake_cqm_handle) != CQM_SUCCESS) {
- cqm_err(handle->dev_hdl,
- CQM_FUNCTION_FAIL(cqm_object_table_init));
- goto err;
- }
- }
-
- return CQM_SUCCESS;
-
-err:
- cqm_fake_mem_uninit(cqm_handle);
- return CQM_FAIL;
-}
-
s32 cqm_mem_init(void *ex_handle)
{
struct sphw_hwdev *handle = (struct sphw_hwdev *)ex_handle;
@@ -671,49 +418,35 @@ s32 cqm_mem_init(void *ex_handle)
cqm_handle = (struct cqm_handle *)(handle->cqm_hdl);
- if (cqm_fake_init(cqm_handle) != CQM_SUCCESS) {
- cqm_err(handle->dev_hdl, CQM_FUNCTION_FAIL(cqm_fake_init));
- return CQM_FAIL;
- }
-
- if (cqm_fake_mem_init(cqm_handle) != CQM_SUCCESS) {
- cqm_err(handle->dev_hdl, CQM_FUNCTION_FAIL(cqm_fake_mem_init));
- goto err1;
- }
-
if (cqm_bat_init(cqm_handle) != CQM_SUCCESS) {
cqm_err(handle->dev_hdl, CQM_FUNCTION_FAIL(cqm_bat_init));
- goto err2;
+ return CQM_FAIL;
}
if (cqm_cla_init(cqm_handle) != CQM_SUCCESS) {
cqm_err(handle->dev_hdl, CQM_FUNCTION_FAIL(cqm_cla_init));
- goto err3;
+ goto err1;
}
if (cqm_bitmap_init(cqm_handle) != CQM_SUCCESS) {
cqm_err(handle->dev_hdl, CQM_FUNCTION_FAIL(cqm_bitmap_init));
- goto err4;
+ goto err2;
}
if (cqm_object_table_init(cqm_handle) != CQM_SUCCESS) {
cqm_err(handle->dev_hdl,
CQM_FUNCTION_FAIL(cqm_object_table_init));
- goto err5;
+ goto err3;
}
return CQM_SUCCESS;
-err5:
- cqm_bitmap_uninit(cqm_handle);
-err4:
- cqm_cla_uninit(cqm_handle, CQM_BAT_ENTRY_MAX);
err3:
- cqm_bat_uninit(cqm_handle);
+ cqm_bitmap_uninit(cqm_handle);
err2:
- cqm_fake_mem_uninit(cqm_handle);
+ cqm_cla_uninit(cqm_handle, CQM_BAT_ENTRY_MAX);
err1:
- cqm_fake_uninit(cqm_handle);
+ cqm_bat_uninit(cqm_handle);
return CQM_FAIL;
}
@@ -728,8 +461,6 @@ void cqm_mem_uninit(void *ex_handle)
cqm_bitmap_uninit(cqm_handle);
cqm_cla_uninit(cqm_handle, CQM_BAT_ENTRY_MAX);
cqm_bat_uninit(cqm_handle);
- cqm_fake_mem_uninit(cqm_handle);
- cqm_fake_uninit(cqm_handle);
}
s32 cqm_event_init(void *ex_handle)
diff --git a/drivers/scsi/spfc/hw/spfc_cqm_main.h b/drivers/scsi/spfc/hw/spfc_cqm_main.h
index 1b8cf8bdb3b7..cf10d7f5c339 100644
--- a/drivers/scsi/spfc/hw/spfc_cqm_main.h
+++ b/drivers/scsi/spfc/hw/spfc_cqm_main.h
@@ -328,9 +328,6 @@ void cqm_mem_uninit(void *ex_handle);
s32 cqm_event_init(void *ex_handle);
void cqm_event_uninit(void *ex_handle);
u8 cqm_aeq_callback(void *ex_handle, u8 event, u8 *data);
-s32 cqm_get_fake_func_type(struct cqm_handle *cqm_handle);
-s32 cqm_get_child_func_start(struct cqm_handle *cqm_handle);
-s32 cqm_get_child_func_number(struct cqm_handle *cqm_handle);
s32 cqm3_init(void *ex_handle);
void cqm3_uninit(void *ex_handle);
diff --git a/drivers/scsi/spfc/hw/spfc_cqm_object.c b/drivers/scsi/spfc/hw/spfc_cqm_object.c
index b895d37aebae..165794e9c7e5 100644
--- a/drivers/scsi/spfc/hw/spfc_cqm_object.c
+++ b/drivers/scsi/spfc/hw/spfc_cqm_object.c
@@ -155,8 +155,6 @@ struct cqm_qpc_mpt *cqm3_object_qpc_mpt_create(void *ex_handle, u32 service_type
struct cqm_qpc_mpt_info *qpc_mpt_info = NULL;
struct cqm_handle *cqm_handle = NULL;
s32 ret = CQM_FAIL;
- u32 relative_index;
- u32 fake_func_id;
CQM_PTR_CHECK_RET(ex_handle, NULL, CQM_PTR_NULL(ex_handle));
@@ -180,25 +178,6 @@ struct cqm_qpc_mpt *cqm3_object_qpc_mpt_create(void *ex_handle, u32 service_type
return NULL;
}
- /* fake vf adaption, switch to corresponding VF. */
- if (cqm_handle->func_capability.fake_func_type ==
- CQM_FAKE_FUNC_PARENT) {
- fake_func_id = index / cqm_handle->func_capability.qpc_number;
- relative_index = index % cqm_handle->func_capability.qpc_number;
-
- cqm_info(handle->dev_hdl, "qpc create: fake_func_id=%u, relative_index=%u\n",
- fake_func_id, relative_index);
-
- if ((s32)fake_func_id >=
- cqm_get_child_func_number(cqm_handle)) {
- cqm_err(handle->dev_hdl, CQM_WRONG_VALUE(fake_func_id));
- return NULL;
- }
-
- index = relative_index;
- cqm_handle = cqm_handle->fake_cqm_handle[fake_func_id];
- }
-
qpc_mpt_info = kmalloc(sizeof(*qpc_mpt_info), GFP_ATOMIC | __GFP_ZERO);
CQM_PTR_CHECK_RET(qpc_mpt_info, NULL, CQM_ALLOC_FAIL(qpc_mpt_info));
--
2.32.0
1. Remove unused functions about CEQ
2. Remove unused CLP hardware channels
3. Remove the code for polling mode
4. Remove the code for little-endian/big-endian conversion
Yanling Song (4):
net/spnic: Remove unused functions about ceq
net/spnic: Remove unused clp hardware channels
net/spnic: Remove the code of polling mode
net/spnic: Remove the code about little endian and big endian conversion
.../net/ethernet/ramaxel/spnic/hw/sphw_cmdq.c | 58 +--
.../ethernet/ramaxel/spnic/hw/sphw_common.h | 12 -
.../net/ethernet/ramaxel/spnic/hw/sphw_crm.h | 2 -
.../net/ethernet/ramaxel/spnic/hw/sphw_csr.h | 13 -
.../net/ethernet/ramaxel/spnic/hw/sphw_eqs.c | 104 +---
.../net/ethernet/ramaxel/spnic/hw/sphw_hw.h | 2 -
.../ethernet/ramaxel/spnic/hw/sphw_hwdev.c | 31 --
.../ethernet/ramaxel/spnic/hw/sphw_hwdev.h | 2 -
.../net/ethernet/ramaxel/spnic/hw/sphw_hwif.c | 27 +-
.../net/ethernet/ramaxel/spnic/hw/sphw_mbox.c | 20 +-
.../net/ethernet/ramaxel/spnic/hw/sphw_mbox.h | 2 -
.../net/ethernet/ramaxel/spnic/hw/sphw_mgmt.c | 487 ------------------
.../net/ethernet/ramaxel/spnic/hw/sphw_mgmt.h | 50 --
.../net/ethernet/ramaxel/spnic/hw/sphw_mt.h | 1 -
.../net/ethernet/ramaxel/spnic/spnic_nic_io.h | 6 +-
.../net/ethernet/ramaxel/spnic/spnic_nic_qp.h | 7 +-
drivers/net/ethernet/ramaxel/spnic/spnic_rx.c | 39 +-
drivers/net/ethernet/ramaxel/spnic/spnic_tx.c | 20 +-
18 files changed, 41 insertions(+), 842 deletions(-)
--
2.23.0
[PATCH openEuler-1.0-LTS] ext4: Fix BUG_ON in ext4_bread when writing quota data
by Yang Yingliang 13 Jan '22
From: Ye Bin <yebin10(a)huawei.com>
mainline inclusion
from mainline-v5.17
commit ce85548ab4295234b4f8e63a0eea0c157d2f6b25
category: bugfix
bugzilla: 185930
CVE: NA
-----------------------------------------------
We got the following issue when running syzkaller:
[ 167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user
[ 167.938306] EXT4-fs (loop0): Remounting filesystem read-only
[ 167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0'
[ 167.983601] ------------[ cut here ]------------
[ 167.984245] kernel BUG at fs/ext4/inode.c:847!
[ 167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[ 167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G B 5.16.0-rc5-next-20211217+ #123
[ 167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[ 167.988590] RIP: 0010:ext4_getblk+0x17e/0x504
[ 167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244
[ 167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282
[ 167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000
[ 167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66
[ 167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09
[ 167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8
[ 167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001
[ 167.997158] FS: 00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000
[ 167.998238] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0
[ 167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 168.001899] Call Trace:
[ 168.002235] <TASK>
[ 168.007167] ext4_bread+0xd/0x53
[ 168.007612] ext4_quota_write+0x20c/0x5c0
[ 168.010457] write_blk+0x100/0x220
[ 168.010944] remove_free_dqentry+0x1c6/0x440
[ 168.011525] free_dqentry.isra.0+0x565/0x830
[ 168.012133] remove_tree+0x318/0x6d0
[ 168.014744] remove_tree+0x1eb/0x6d0
[ 168.017346] remove_tree+0x1eb/0x6d0
[ 168.019969] remove_tree+0x1eb/0x6d0
[ 168.022128] qtree_release_dquot+0x291/0x340
[ 168.023297] v2_release_dquot+0xce/0x120
[ 168.023847] dquot_release+0x197/0x3e0
[ 168.024358] ext4_release_dquot+0x22a/0x2d0
[ 168.024932] dqput.part.0+0x1c9/0x900
[ 168.025430] __dquot_drop+0x120/0x190
[ 168.025942] ext4_clear_inode+0x86/0x220
[ 168.026472] ext4_evict_inode+0x9e8/0xa22
[ 168.028200] evict+0x29e/0x4f0
[ 168.028625] dispose_list+0x102/0x1f0
[ 168.029148] evict_inodes+0x2c1/0x3e0
[ 168.030188] generic_shutdown_super+0xa4/0x3b0
[ 168.030817] kill_block_super+0x95/0xd0
[ 168.031360] deactivate_locked_super+0x85/0xd0
[ 168.031977] cleanup_mnt+0x2bc/0x480
[ 168.033062] task_work_run+0xd1/0x170
[ 168.033565] do_exit+0xa4f/0x2b50
[ 168.037155] do_group_exit+0xef/0x2d0
[ 168.037666] __x64_sys_exit_group+0x3a/0x50
[ 168.038237] do_syscall_64+0x3b/0x90
[ 168.038751] entry_SYSCALL_64_after_hwframe+0x44/0xae
In order to reproduce this problem, the following conditions need to be met:
1. Ext4 filesystem with no journal;
2. Filesystem image with incorrect quota data;
3. Filesystem abort forced by the user;
4. Unmount the filesystem.
As in ext4_quota_write:
...
if (EXT4_SB(sb)->s_journal && !handle) {
ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
" cancelled because transaction is not started",
(unsigned long long)off, (unsigned long long)len);
return -EIO;
}
...
We only check whether handle is NULL when the filesystem has a journal.
The handle needs to be checked for NULL even when the filesystem has no
journal.
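For reference, the assertion that fires in ext4_getblk() (quoted from
the trace above) is:
	ASSERT((EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) ||
	       handle != NULL || create == 0);
In no-journal mode journal_current_handle() returns NULL, so the quota
write path reaches ext4_bread() with a NULL handle while create is
non-zero, tripping the assertion.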
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20211223015506.297766-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/ext4/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 587817ac60990..b89f431ec78d3 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6245,7 +6245,7 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
struct buffer_head *bh;
handle_t *handle = journal_current_handle();
- if (EXT4_SB(sb)->s_journal && !handle) {
+ if (!handle) {
ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
" cancelled because transaction is not started",
(unsigned long long)off, (unsigned long long)len);
--
2.25.1
[PATCH openEuler-1.0-LTS] PM: hibernate: use correct mode for swsusp_close()
by Yang Yingliang 13 Jan '22
From: Thomas Zeitlhofer <thomas.zeitlhofer+lkml(a)ze-it.at>
stable inclusion
from linux-v4.19.219
commit 68945e943519df1532e598fafab16ac54488933f
---------------------------------------------------
[ Upstream commit cefcf24b4d351daf70ecd945324e200d3736821e ]
Commit 39fbef4b0f77 ("PM: hibernate: Get block device exclusively in
swsusp_check()") changed the opening mode of the block device to
(FMODE_READ | FMODE_EXCL).
In the corresponding calls to swsusp_close(), the mode is still just
FMODE_READ which triggers the warning in blkdev_flush_mapping() on
resume from hibernate.
So, use the mode (FMODE_READ | FMODE_EXCL) also when closing the
device.
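A minimal sketch of the pairing rule (illustrative, not part of the
patch): a block device opened exclusively must be released with the
same mode, e.g.
	bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_EXCL, holder);
	...
	blkdev_put(bdev, FMODE_READ | FMODE_EXCL);	/* not plain FMODE_READ */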
Fixes: 39fbef4b0f77 ("PM: hibernate: Get block device exclusively in swsusp_check()")
Signed-off-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml(a)ze-it.at>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
kernel/power/hibernate.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 72ef9df2a29cd..fc6466351e62f 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -686,7 +686,7 @@ static int load_image_and_restore(void)
goto Unlock;
error = swsusp_read(&flags);
- swsusp_close(FMODE_READ);
+ swsusp_close(FMODE_READ | FMODE_EXCL);
if (!error)
hibernation_restore(flags & SF_PLATFORM_MODE);
@@ -896,7 +896,7 @@ static int software_resume(void)
/* The snapshot device should not be opened while we're running */
if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
error = -EBUSY;
- swsusp_close(FMODE_READ);
+ swsusp_close(FMODE_READ | FMODE_EXCL);
goto Unlock;
}
@@ -925,7 +925,7 @@ static int software_resume(void)
pm_pr_dbg("Hibernation image not present or could not be loaded.\n");
return error;
Close_Finish:
- swsusp_close(FMODE_READ);
+ swsusp_close(FMODE_READ | FMODE_EXCL);
goto Finish;
}
--
2.25.1
[PATCH openEuler-1.0-LTS] Revert "watchdog: Fix check_preemption_disabled() error"
by Yang Yingliang 13 Jan '22
hulk inclusion
category: bugfix
bugzilla: 173968, https://gitee.com/openeuler/kernel/issues/I3J87Y
CVE: NA
---------------------------
This reverts commit b2e484e966ac2f211f8f8cf56083e9bc4d5d3c6a.
When CONFIG_LOCKDEP and CONFIG_DEBUG_LOCKDEP are enabled, it detects the following error:
[ 10.145007] BUG: sleeping function called from invalid context at mm/slab.h:418
[ 10.145394] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
[ 10.145765] Preemption disabled at:
[ 10.145978] [<ffff000008f8e7b4>] hardlockup_detector_perf_init+0x20/0x100
[ 10.146770] CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.90+ #3
[ 10.148242] Hardware name: linux,dummy-virt (DT)
[ 10.148572] Call trace:
[ 10.148667] dump_backtrace+0x0/0x190
[ 10.148765] show_stack+0x24/0x30
[ 10.148875] dump_stack+0xa4/0xf8
[ 10.148964] ___might_sleep+0x150/0x180
[ 10.149065] __might_sleep+0x58/0x90
[ 10.149199] kmem_cache_alloc_trace+0x244/0x2b0
[ 10.149308] perf_event_alloc+0x74/0x680
[ 10.149402] perf_event_create_kernel_counter+0x2c/0x190
[ 10.149516] arch_probe_cpu_freq+0x84/0x1ac
[ 10.149611] hw_nmi_get_sample_period+0xb8/0x180
[ 10.149713] hardlockup_detector_event_create+0x28/0xfc
[ 10.149827] hardlockup_detector_perf_init+0x24/0x100
[ 10.149943] watchdog_nmi_probe+0x14/0x1c
[ 10.150037] lockup_detector_init+0x58/0x98
[ 10.150173] kernel_init_freeable+0x10c/0x1c4
[ 10.150298] kernel_init+0x18/0x110
[ 10.150422] ret_from_fork+0x10/0x18
In commit b2e484e966ac ("watchdog: Fix check_preemption_disabled()
error"), we tried to fix the check_preemption_disabled() error by
disabling preemption in hardlockup_detector_perf_init(), but missed
that perf_event_create_kernel_counter() may sleep.
The problem that commit wanted to fix does not actually exist, so just
revert it.
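The reverted commit effectively introduced the following pattern, which
is invalid because perf_event_create_kernel_counter() allocates with
GFP_KERNEL and may sleep inside the preempt-disabled section:
	preempt_disable();
	ret = hardlockup_detector_event_create();	/* may sleep -> splat above */
	preempt_enable();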
Fixes: b2e484e966ac ("watchdog: Fix check_preemption_disabled() error")
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
kernel/watchdog_hld.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index a5aff8ffa48ce..43832b1023693 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -508,17 +508,14 @@ void __init hardlockup_detector_perf_restart(void)
*/
int __init hardlockup_detector_perf_init(void)
{
- int ret;
+ int ret = hardlockup_detector_event_create();
- preempt_disable();
- ret = hardlockup_detector_event_create();
if (ret) {
pr_info("Perf NMI watchdog permanently disabled\n");
} else {
perf_event_release_kernel(this_cpu_read(watchdog_ev));
this_cpu_write(watchdog_ev, NULL);
}
- preempt_enable();
return ret;
}
#endif /* CONFIG_HARDLOCKUP_DETECTOR_PERF */
--
2.25.1
cmd.gpa_h = CQM_ADDR_HI(gpa);
cmd.gpa_l = CQM_ADDR_LW(gpa);
- /* In non-fake mode, set func_id to 0xffff.
- * Indicate the current func fake mode.
- * The value of func_id is a fake func ID.
- */
- if (cqm_handle->func_capability.fake_func_type == CQM_FAKE_FUNC_CHILD)
- cmd.func_id = cqm_handle->func_attribute.func_global_idx;
- else
- cmd.func_id = 0xffff;
+ /* In non-fake mode, set func_id to 0xffff. */
+ cmd.func_id = 0xffff;
/* Mode 0 is hashed to 4 SMF engines (excluding PPF) by func ID. */
if (cqm_handle->func_capability.lb_mode == CQM_LB_MODE_NORMAL ||
diff --git a/drivers/scsi/spfc/hw/spfc_cqm_main.c b/drivers/scsi/spfc/hw/spfc_cqm_main.c
index a4c8e60971b1..52cc2c7838e9 100644
--- a/drivers/scsi/spfc/hw/spfc_cqm_main.c
+++ b/drivers/scsi/spfc/hw/spfc_cqm_main.c
@@ -11,21 +11,8 @@
#include "sphw_crm.h"
#include "sphw_hw.h"
#include "sphw_hw_cfg.h"
-
#include "spfc_cqm_main.h"
-static unsigned char cqm_lb_mode = CQM_LB_MODE_NORMAL;
-module_param(cqm_lb_mode, byte, 0644);
-MODULE_PARM_DESC(cqm_lb_mode, "for cqm lb mode (default=0xff)");
-
-static unsigned char cqm_fake_mode = CQM_FAKE_MODE_DISABLE;
-module_param(cqm_fake_mode, byte, 0644);
-MODULE_PARM_DESC(cqm_fake_mode, "for cqm fake mode (default=0 disable)");
-
-static unsigned char cqm_platform_mode = CQM_FPGA_MODE;
-module_param(cqm_platform_mode, byte, 0644);
-MODULE_PARM_DESC(cqm_platform_mode, "for cqm platform mode (default=0 FPGA)");
-
s32 cqm3_init(void *ex_handle)
{
struct sphw_hwdev *handle = (struct sphw_hwdev *)ex_handle;
@@ -63,19 +50,6 @@ s32 cqm3_init(void *ex_handle)
goto err1;
}
- /* In FAKE mode, only the bitmap of the timer of the function is
- * enabled, and resources are not initialized. Otherwise, the
- * configuration of the fake function is overwritten.
- */
- if (cqm_handle->func_capability.fake_func_type == CQM_FAKE_FUNC_CHILD_CONFLICT) {
- if (sphw_func_tmr_bitmap_set(ex_handle, true) != CQM_SUCCESS)
- cqm_err(handle->dev_hdl, "Timer start: enable timer bitmap failed\n");
-
- handle->cqm_hdl = NULL;
- kfree(cqm_handle);
- return CQM_SUCCESS;
- }
-
/* Initialize memory entries such as BAT, CLA, and bitmap. */
if (cqm_mem_init(ex_handle) != CQM_SUCCESS) {
cqm_err(handle->dev_hdl, CQM_FUNCTION_FAIL(cqm_mem_init));
@@ -287,64 +261,6 @@ void cqm_service_capability_init(struct cqm_handle *cqm_handle,
cqm_service_capability_init_fc(cqm_handle, (void *)service_capability);
}
-s32 cqm_get_fake_func_type(struct cqm_handle *cqm_handle)
-{
- struct cqm_func_capability *func_cap = &cqm_handle->func_capability;
- u32 parent_func, child_func_start, child_func_number, i;
- u32 idx = cqm_handle->func_attribute.func_global_idx;
-
- /* Currently, only one set of fake configurations is implemented.
- * fake_cfg_number = 1
- */
- for (i = 0; i < func_cap->fake_cfg_number; i++) {
- parent_func = func_cap->fake_cfg[i].parent_func;
- child_func_start = func_cap->fake_cfg[i].child_func_start;
- child_func_number = func_cap->fake_cfg[i].child_func_number;
-
- if (idx == parent_func) {
- return CQM_FAKE_FUNC_PARENT;
- } else if ((idx >= child_func_start) &&
- (idx < (child_func_start + child_func_number))) {
- return CQM_FAKE_FUNC_CHILD_CONFLICT;
- }
- }
-
- return CQM_FAKE_FUNC_NORMAL;
-}
-
-s32 cqm_get_child_func_start(struct cqm_handle *cqm_handle)
-{
- struct cqm_func_capability *func_cap = &cqm_handle->func_capability;
- struct sphw_func_attr *func_attr = &cqm_handle->func_attribute;
- u32 i;
-
- /* Currently, only one set of fake configurations is implemented.
- * fake_cfg_number = 1
- */
- for (i = 0; i < func_cap->fake_cfg_number; i++) {
- if (func_attr->func_global_idx ==
- func_cap->fake_cfg[i].parent_func)
- return (s32)(func_cap->fake_cfg[i].child_func_start);
- }
-
- return CQM_FAIL;
-}
-
-s32 cqm_get_child_func_number(struct cqm_handle *cqm_handle)
-{
- struct cqm_func_capability *func_cap = &cqm_handle->func_capability;
- struct sphw_func_attr *func_attr = &cqm_handle->func_attribute;
- u32 i;
-
- for (i = 0; i < func_cap->fake_cfg_number; i++) {
- if (func_attr->func_global_idx ==
- func_cap->fake_cfg[i].parent_func)
- return (s32)(func_cap->fake_cfg[i].child_func_number);
- }
-
- return CQM_FAIL;
-}
-
/* Set func_type in fake_cqm_handle to ppf, pf, or vf. */
void cqm_set_func_type(struct cqm_handle *cqm_handle)
{
@@ -358,41 +274,20 @@ void cqm_set_func_type(struct cqm_handle *cqm_handle)
cqm_handle->func_attribute.func_type = CQM_VF;
}
-void cqm_lb_fake_mode_init(struct cqm_handle *cqm_handle)
+void cqm_lb_fake_mode_init(struct cqm_handle *cqm_handle, struct service_cap *svc_cap)
{
struct cqm_func_capability *func_cap = &cqm_handle->func_capability;
- struct cqm_fake_cfg *cfg = func_cap->fake_cfg;
- func_cap->lb_mode = cqm_lb_mode;
- func_cap->fake_mode = cqm_fake_mode;
+ func_cap->lb_mode = svc_cap->lb_mode;
/* Initializing the LB Mode */
- if (func_cap->lb_mode == CQM_LB_MODE_NORMAL) {
+ if (func_cap->lb_mode == CQM_LB_MODE_NORMAL)
func_cap->smf_pg = 0;
- } else {
- /* The LB mode is tailored on the FPGA.
- * Only SMF0 and SMF2 are instantiated.
- */
- if (cqm_platform_mode == CQM_FPGA_MODE)
- func_cap->smf_pg = 0x5;
- else
- func_cap->smf_pg = 0xF;
- }
+ else
+ func_cap->smf_pg = svc_cap->smf_pg;
- /* Initializing the FAKE Mode */
- if (func_cap->fake_mode == CQM_FAKE_MODE_DISABLE) {
- func_cap->fake_cfg_number = 0;
- func_cap->fake_func_type = CQM_FAKE_FUNC_NORMAL;
- } else {
- func_cap->fake_cfg_number = 1;
-
- /* When configuring fake mode, ensure that the parent function
- * cannot be contained in the child function; otherwise, the
- * system will be initialized repeatedly.
- */
- cfg[0].child_func_start = CQM_FAKE_CFUNC_START;
- func_cap->fake_func_type = cqm_get_fake_func_type(cqm_handle);
- }
+ func_cap->fake_cfg_number = 0;
+ func_cap->fake_func_type = CQM_FAKE_FUNC_NORMAL;
}
s32 cqm_capability_init(void *ex_handle)
@@ -465,10 +360,9 @@ s32 cqm_capability_init(void *ex_handle)
func_cap->gpa_check_enable = true;
- cqm_lb_fake_mode_init(cqm_handle);
+ cqm_lb_fake_mode_init(cqm_handle, service_capability);
cqm_info(handle->dev_hdl, "Cap init: lb_mode=%u\n", func_cap->lb_mode);
cqm_info(handle->dev_hdl, "Cap init: smf_pg=%u\n", func_cap->smf_pg);
- cqm_info(handle->dev_hdl, "Cap init: fake_mode=%u\n", func_cap->fake_mode);
cqm_info(handle->dev_hdl, "Cap init: fake_func_type=%u\n", func_cap->fake_func_type);
cqm_info(handle->dev_hdl, "Cap init: fake_cfg_number=%u\n", func_cap->fake_cfg_number);
@@ -517,153 +411,6 @@ s32 cqm_capability_init(void *ex_handle)
return err;
}
-void cqm_fake_uninit(struct cqm_handle *cqm_handle)
-{
- u32 i;
-
- if (cqm_handle->func_capability.fake_func_type !=
- CQM_FAKE_FUNC_PARENT)
- return;
-
- for (i = 0; i < CQM_FAKE_FUNC_MAX; i++) {
- kfree(cqm_handle->fake_cqm_handle[i]);
- cqm_handle->fake_cqm_handle[i] = NULL;
- }
-}
-
-s32 cqm_fake_init(struct cqm_handle *cqm_handle)
-{
- struct sphw_hwdev *handle = cqm_handle->ex_handle;
- struct cqm_func_capability *func_cap = NULL;
- struct cqm_handle *fake_cqm_handle = NULL;
- struct sphw_func_attr *func_attr = NULL;
- s32 child_func_start, child_func_number;
- u32 i;
-
- func_cap = &cqm_handle->func_capability;
- if (func_cap->fake_func_type != CQM_FAKE_FUNC_PARENT)
- return CQM_SUCCESS;
-
- child_func_start = cqm_get_child_func_start(cqm_handle);
- if (child_func_start == CQM_FAIL) {
- cqm_err(handle->dev_hdl, CQM_WRONG_VALUE(child_func_start));
- return CQM_FAIL;
- }
-
- child_func_number = cqm_get_child_func_number(cqm_handle);
- if (child_func_number == CQM_FAIL) {
- cqm_err(handle->dev_hdl, CQM_WRONG_VALUE(child_func_number));
- return CQM_FAIL;
- }
-
- for (i = 0; i < (u32)child_func_number; i++) {
- fake_cqm_handle = kmalloc(sizeof(*fake_cqm_handle), GFP_KERNEL | __GFP_ZERO);
- if (!fake_cqm_handle) {
- cqm_err(handle->dev_hdl,
- CQM_ALLOC_FAIL(fake_cqm_handle));
- goto err;
- }
-
- /* Copy the attributes of the parent CQM handle to the child CQM
- * handle and modify the values of function.
- */
- memcpy(fake_cqm_handle, cqm_handle, sizeof(struct cqm_handle));
- func_attr = &fake_cqm_handle->func_attribute;
- func_cap = &fake_cqm_handle->func_capability;
- func_attr->func_global_idx = (u16)(child_func_start + i);
- cqm_set_func_type(fake_cqm_handle);
- func_cap->fake_func_type = CQM_FAKE_FUNC_CHILD;
- cqm_info(handle->dev_hdl, "Fake func init: function[%u] type %d(0:PF,1:VF,2:PPF)\n",
- func_attr->func_global_idx, func_attr->func_type);
-
- fake_cqm_handle->parent_cqm_handle = cqm_handle;
- cqm_handle->fake_cqm_handle[i] = fake_cqm_handle;
- }
-
- return CQM_SUCCESS;
-
-err:
- cqm_fake_uninit(cqm_handle);
- return CQM_FAIL;
-}
-
-void cqm_fake_mem_uninit(struct cqm_handle *cqm_handle)
-{
- struct sphw_hwdev *handle = cqm_handle->ex_handle;
- struct cqm_handle *fake_cqm_handle = NULL;
- s32 child_func_number;
- u32 i;
-
- if (cqm_handle->func_capability.fake_func_type != CQM_FAKE_FUNC_PARENT)
- return;
-
- child_func_number = cqm_get_child_func_number(cqm_handle);
- if (child_func_number == CQM_FAIL) {
- cqm_err(handle->dev_hdl, CQM_WRONG_VALUE(child_func_number));
- return;
- }
-
- for (i = 0; i < (u32)child_func_number; i++) {
- fake_cqm_handle = cqm_handle->fake_cqm_handle[i];
- cqm_object_table_uninit(fake_cqm_handle);
- cqm_bitmap_uninit(fake_cqm_handle);
- cqm_cla_uninit(fake_cqm_handle, CQM_BAT_ENTRY_MAX);
- cqm_bat_uninit(fake_cqm_handle);
- }
-}
-
-s32 cqm_fake_mem_init(struct cqm_handle *cqm_handle)
-{
- struct sphw_hwdev *handle = cqm_handle->ex_handle;
- struct cqm_handle *fake_cqm_handle = NULL;
- s32 child_func_number;
- u32 i;
-
- if (cqm_handle->func_capability.fake_func_type !=
- CQM_FAKE_FUNC_PARENT)
- return CQM_SUCCESS;
-
- child_func_number = cqm_get_child_func_number(cqm_handle);
- if (child_func_number == CQM_FAIL) {
- cqm_err(handle->dev_hdl, CQM_WRONG_VALUE(child_func_number));
- return CQM_FAIL;
- }
-
- for (i = 0; i < (u32)child_func_number; i++) {
- fake_cqm_handle = cqm_handle->fake_cqm_handle[i];
-
- if (cqm_bat_init(fake_cqm_handle) != CQM_SUCCESS) {
- cqm_err(handle->dev_hdl,
- CQM_FUNCTION_FAIL(cqm_bat_init));
- goto err;
- }
-
- if (cqm_cla_init(fake_cqm_handle) != CQM_SUCCESS) {
- cqm_err(handle->dev_hdl,
- CQM_FUNCTION_FAIL(cqm_cla_init));
- goto err;
- }
-
- if (cqm_bitmap_init(fake_cqm_handle) != CQM_SUCCESS) {
- cqm_err(handle->dev_hdl,
- CQM_FUNCTION_FAIL(cqm_bitmap_init));
- goto err;
- }
-
- if (cqm_object_table_init(fake_cqm_handle) != CQM_SUCCESS) {
- cqm_err(handle->dev_hdl,
- CQM_FUNCTION_FAIL(cqm_object_table_init));
- goto err;
- }
- }
-
- return CQM_SUCCESS;
-
-err:
- cqm_fake_mem_uninit(cqm_handle);
- return CQM_FAIL;
-}
-
s32 cqm_mem_init(void *ex_handle)
{
struct sphw_hwdev *handle = (struct sphw_hwdev *)ex_handle;
@@ -671,49 +418,35 @@ s32 cqm_mem_init(void *ex_handle)
cqm_handle = (struct cqm_handle *)(handle->cqm_hdl);
- if (cqm_fake_init(cqm_handle) != CQM_SUCCESS) {
- cqm_err(handle->dev_hdl, CQM_FUNCTION_FAIL(cqm_fake_init));
- return CQM_FAIL;
- }
-
- if (cqm_fake_mem_init(cqm_handle) != CQM_SUCCESS) {
- cqm_err(handle->dev_hdl, CQM_FUNCTION_FAIL(cqm_fake_mem_init));
- goto err1;
- }
-
if (cqm_bat_init(cqm_handle) != CQM_SUCCESS) {
cqm_err(handle->dev_hdl, CQM_FUNCTION_FAIL(cqm_bat_init));
- goto err2;
+ return CQM_FAIL;
}
if (cqm_cla_init(cqm_handle) != CQM_SUCCESS) {
cqm_err(handle->dev_hdl, CQM_FUNCTION_FAIL(cqm_cla_init));
- goto err3;
+ goto err1;
}
if (cqm_bitmap_init(cqm_handle) != CQM_SUCCESS) {
cqm_err(handle->dev_hdl, CQM_FUNCTION_FAIL(cqm_bitmap_init));
- goto err4;
+ goto err2;
}
if (cqm_object_table_init(cqm_handle) != CQM_SUCCESS) {
cqm_err(handle->dev_hdl,
CQM_FUNCTION_FAIL(cqm_object_table_init));
- goto err5;
+ goto err3;
}
return CQM_SUCCESS;
-err5:
- cqm_bitmap_uninit(cqm_handle);
-err4:
- cqm_cla_uninit(cqm_handle, CQM_BAT_ENTRY_MAX);
err3:
- cqm_bat_uninit(cqm_handle);
+ cqm_bitmap_uninit(cqm_handle);
err2:
- cqm_fake_mem_uninit(cqm_handle);
+ cqm_cla_uninit(cqm_handle, CQM_BAT_ENTRY_MAX);
err1:
- cqm_fake_uninit(cqm_handle);
+ cqm_bat_uninit(cqm_handle);
return CQM_FAIL;
}
@@ -728,8 +461,6 @@ void cqm_mem_uninit(void *ex_handle)
cqm_bitmap_uninit(cqm_handle);
cqm_cla_uninit(cqm_handle, CQM_BAT_ENTRY_MAX);
cqm_bat_uninit(cqm_handle);
- cqm_fake_mem_uninit(cqm_handle);
- cqm_fake_uninit(cqm_handle);
}
s32 cqm_event_init(void *ex_handle)
diff --git a/drivers/scsi/spfc/hw/spfc_cqm_main.h b/drivers/scsi/spfc/hw/spfc_cqm_main.h
index 1b8cf8bdb3b7..cf10d7f5c339 100644
--- a/drivers/scsi/spfc/hw/spfc_cqm_main.h
+++ b/drivers/scsi/spfc/hw/spfc_cqm_main.h
@@ -328,9 +328,6 @@ void cqm_mem_uninit(void *ex_handle);
s32 cqm_event_init(void *ex_handle);
void cqm_event_uninit(void *ex_handle);
u8 cqm_aeq_callback(void *ex_handle, u8 event, u8 *data);
-s32 cqm_get_fake_func_type(struct cqm_handle *cqm_handle);
-s32 cqm_get_child_func_start(struct cqm_handle *cqm_handle);
-s32 cqm_get_child_func_number(struct cqm_handle *cqm_handle);
s32 cqm3_init(void *ex_handle);
void cqm3_uninit(void *ex_handle);
diff --git a/drivers/scsi/spfc/hw/spfc_cqm_object.c b/drivers/scsi/spfc/hw/spfc_cqm_object.c
index b895d37aebae..165794e9c7e5 100644
--- a/drivers/scsi/spfc/hw/spfc_cqm_object.c
+++ b/drivers/scsi/spfc/hw/spfc_cqm_object.c
@@ -155,8 +155,6 @@ struct cqm_qpc_mpt *cqm3_object_qpc_mpt_create(void *ex_handle, u32 service_type
struct cqm_qpc_mpt_info *qpc_mpt_info = NULL;
struct cqm_handle *cqm_handle = NULL;
s32 ret = CQM_FAIL;
- u32 relative_index;
- u32 fake_func_id;
CQM_PTR_CHECK_RET(ex_handle, NULL, CQM_PTR_NULL(ex_handle));
@@ -180,25 +178,6 @@ struct cqm_qpc_mpt *cqm3_object_qpc_mpt_create(void *ex_handle, u32 service_type
return NULL;
}
- /* fake vf adaption, switch to corresponding VF. */
- if (cqm_handle->func_capability.fake_func_type ==
- CQM_FAKE_FUNC_PARENT) {
- fake_func_id = index / cqm_handle->func_capability.qpc_number;
- relative_index = index % cqm_handle->func_capability.qpc_number;
-
- cqm_info(handle->dev_hdl, "qpc create: fake_func_id=%u, relative_index=%u\n",
- fake_func_id, relative_index);
-
- if ((s32)fake_func_id >=
- cqm_get_child_func_number(cqm_handle)) {
- cqm_err(handle->dev_hdl, CQM_WRONG_VALUE(fake_func_id));
- return NULL;
- }
-
- index = relative_index;
- cqm_handle = cqm_handle->fake_cqm_handle[fake_func_id];
- }
-
qpc_mpt_info = kmalloc(sizeof(*qpc_mpt_info), GFP_ATOMIC | __GFP_ZERO);
CQM_PTR_CHECK_RET(qpc_mpt_info, NULL, CQM_ALLOC_FAIL(qpc_mpt_info));
--
2.32.0
1
0
[PATCH openEuler-5.10 01/30] net: hns3: refactor hns3 makefile to support hns3_common module
by Zheng Zengkai 12 Jan '22
From: Jie Wang <wangjie125(a)huawei.com>
mainline inclusion
from mainline-br26_refactor2
commit 5f20be4e90e6
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4Q02P
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
----------------------------------------------------------------------
Currently we plan to refactor the PF and VF cmdq modules. A new folder,
hns3_common, will be created to store the new common APIs used by the PF
and VF cmdq modules. The PF and VF compilation processes will thus both
depend on hns3_common, which may cause parallel-build problems if it is
added as a new makefile build unit.
So this patch merges the PF and VF makefile scripts into the top-level
makefile to support the new hns3_common folder, which will be created in
the next patch.
Signed-off-by: Jie Wang <wangjie125(a)huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2(a)huawei.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Reviewed-by: Jian Shen <shenjian15(a)huawei.com>
Reviewed-by: Yue Haibing <yuehaibing(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/Makefile | 14 +++++++++++---
.../net/ethernet/hisilicon/hns3/hns3pf/Makefile | 12 ------------
.../net/ethernet/hisilicon/hns3/hns3vf/Makefile | 10 ----------
3 files changed, 11 insertions(+), 25 deletions(-)
delete mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/Makefile
delete mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3vf/Makefile
diff --git a/drivers/net/ethernet/hisilicon/hns3/Makefile b/drivers/net/ethernet/hisilicon/hns3/Makefile
index 7aa2fac76c5e..32e24e0945f5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/Makefile
+++ b/drivers/net/ethernet/hisilicon/hns3/Makefile
@@ -4,9 +4,8 @@
#
ccflags-y += -I$(srctree)/$(src)
-
-obj-$(CONFIG_HNS3) += hns3pf/
-obj-$(CONFIG_HNS3) += hns3vf/
+ccflags-y += -I$(srctree)/drivers/net/ethernet/hisilicon/hns3/hns3pf
+ccflags-y += -I$(srctree)/drivers/net/ethernet/hisilicon/hns3/hns3vf
obj-$(CONFIG_HNS3) += hnae3.o
@@ -14,3 +13,12 @@ obj-$(CONFIG_HNS3_ENET) += hns3.o
hns3-objs = hns3_enet.o hns3_ethtool.o hns3_debugfs.o
hns3-$(CONFIG_HNS3_DCB) += hns3_dcbnl.o
+
+obj-$(CONFIG_HNS3_HCLGEVF) += hclgevf.o
+hclgevf-objs = hns3vf/hclgevf_main.o hns3vf/hclgevf_cmd.o hns3vf/hclgevf_mbx.o hns3vf/hclgevf_devlink.o
+
+obj-$(CONFIG_HNS3_HCLGE) += hclge.o
+hclge-objs = hns3pf/hclge_main.o hns3pf/hclge_cmd.o hns3pf/hclge_mdio.o hns3pf/hclge_tm.o \
+ hns3pf/hclge_mbx.o hns3pf/hclge_err.o hns3pf/hclge_debugfs.o hns3pf/hclge_ptp.o hns3pf/hclge_devlink.o
+
+hclge-$(CONFIG_HNS3_DCB) += hns3pf/hclge_dcb.o
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/Makefile b/drivers/net/ethernet/hisilicon/hns3/hns3pf/Makefile
deleted file mode 100644
index d1bf5c4c0abb..000000000000
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/Makefile
+++ /dev/null
@@ -1,12 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0+
-#
-# Makefile for the HISILICON network device drivers.
-#
-
-ccflags-y := -I $(srctree)/drivers/net/ethernet/hisilicon/hns3
-ccflags-y += -I $(srctree)/$(src)
-
-obj-$(CONFIG_HNS3_HCLGE) += hclge.o
-hclge-objs = hclge_main.o hclge_cmd.o hclge_mdio.o hclge_tm.o hclge_mbx.o hclge_err.o hclge_debugfs.o hclge_ptp.o hclge_devlink.o
-
-hclge-$(CONFIG_HNS3_DCB) += hclge_dcb.o
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/Makefile b/drivers/net/ethernet/hisilicon/hns3/hns3vf/Makefile
deleted file mode 100644
index 51ff7d86ee90..000000000000
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/Makefile
+++ /dev/null
@@ -1,10 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0+
-#
-# Makefile for the HISILICON network device drivers.
-#
-
-ccflags-y := -I $(srctree)/drivers/net/ethernet/hisilicon/hns3
-ccflags-y += -I $(srctree)/$(src)
-
-obj-$(CONFIG_HNS3_HCLGEVF) += hclgevf.o
-hclgevf-objs = hclgevf_main.o hclgevf_cmd.o hclgevf_mbx.o hclgevf_devlink.o
--
2.20.1
1
29
This patch set introduces many conflicts while backporting mainline bcache
patches, so revert it temporarily.
Zheng Zengkai (8):
Revert "bcache: always record start time of a sample"
Revert "bcache: do not collect data insert info created by
write_moving"
Revert "bcache: Rewrite patch to delay to invalidate cache data"
Revert "bcache: Add a sample of userspace prefetch client"
Revert "bcache: Delay to invalidate cache data in writearound write"
Revert "bcache: inflight prefetch requests block overlapped normal
requests"
Revert "bcache: provide a switch to bypass all IO requests"
Revert "bcache: add a framework to perform prefetch"
Documentation/admin-guide/bcache.rst | 4 -
drivers/md/bcache/Makefile | 2 +-
drivers/md/bcache/acache.c | 586 ---------------------------
drivers/md/bcache/acache.h | 79 ----
drivers/md/bcache/bcache.h | 3 -
drivers/md/bcache/btree.c | 4 +-
drivers/md/bcache/request.c | 161 +++-----
drivers/md/bcache/request.h | 45 --
drivers/md/bcache/stats.c | 13 -
drivers/md/bcache/stats.h | 3 -
drivers/md/bcache/super.c | 6 -
drivers/md/bcache/sysfs.c | 12 -
include/trace/events/bcache.h | 22 -
samples/acache_client/Makefile | 13 -
samples/acache_client/connect.c | 144 -------
samples/acache_client/connect.h | 74 ----
samples/acache_client/main.c | 133 ------
17 files changed, 59 insertions(+), 1245 deletions(-)
delete mode 100644 drivers/md/bcache/acache.c
delete mode 100644 drivers/md/bcache/acache.h
delete mode 100644 samples/acache_client/Makefile
delete mode 100644 samples/acache_client/connect.c
delete mode 100644 samples/acache_client/connect.h
delete mode 100644 samples/acache_client/main.c
--
2.20.1
2
9
[PATCH openEuler-5.10 01/17] tcp: Add some stub info for KABI consistency
by Zheng Zengkai 11 Jan '22
From: Baisong Zhong <zhongbaisong(a)huawei.com>
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4PY1Q
CVE: NA
--------------------------------
We add stub fields to some structures to maintain KABI consistency.
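A minimal userspace demonstration (field names are made up; only the
layout property matters) of why adding a member inside an existing union,
as the sk_buff hunk below does with ll_node, keeps the structure size and
hence the KABI layout unchanged:

#include <stdio.h>

struct node { struct node *next; };

struct skb_before {
	union {
		struct node rbnode_like;
		struct node list_like;
	};
};

struct skb_after {
	union {
		struct node rbnode_like;
		struct node list_like;
		struct node ll_node_like;	/* new member shares the storage */
	};
};

int main(void)
{
	/* Both print the same size: the union absorbs the new member. */
	printf("before=%zu after=%zu\n",
	       sizeof(struct skb_before), sizeof(struct skb_after));
	return 0;
}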
Signed-off-by: Baisong Zhong <zhongbaisong(a)huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Reviewed-by: Yue Haibing <yuehaibing(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
include/linux/skbuff.h | 2 ++
include/net/ipv6.h | 2 +-
include/net/sock.h | 3 +++
3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9e3a454d2377..d485f17ff33a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -36,6 +36,7 @@
#include <linux/splice.h>
#include <linux/in6.h>
#include <linux/if_packet.h>
+#include <linux/llist.h>
#include <net/flow.h>
#include <net/page_pool.h>
#include <linux/kabi.h>
@@ -732,6 +733,7 @@ struct sk_buff {
};
struct rb_node rbnode; /* used in netem, ip4 defrag, and tcp stack */
struct list_head list;
+ struct llist_node ll_node;
};
union {
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index bd1f396cc9c7..c0273ae50296 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -344,9 +344,9 @@ struct ipcm6_cookie {
struct sockcm_cookie sockc;
__s16 hlimit;
__s16 tclass;
+ __u16 gso_size;
__s8 dontfrag;
struct ipv6_txoptions *opt;
- __u16 gso_size;
};
static inline void ipcm6_init(struct ipcm6_cookie *ipc6)
diff --git a/include/net/sock.h b/include/net/sock.h
index 712bb7b09f96..b3d451878640 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -63,6 +63,7 @@
#include <linux/atomic.h>
#include <linux/refcount.h>
+#include <linux/llist.h>
#include <net/dst.h>
#include <net/checksum.h>
#include <net/tcp_states.h>
@@ -405,6 +406,8 @@ struct sock {
struct sk_buff *head;
struct sk_buff *tail;
} sk_backlog;
+ struct llist_head defer_list;
+
#define sk_rmem_alloc sk_backlog.rmem_alloc
int sk_forward_alloc;
--
2.20.1
2
17
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O31I?from=project-issue
CVE: NA
------------------------------
Register PMEM on arm64:
Use memmap (memmap=nn[KMG]!ss[KMG]) to reserve memory and the
e820 (drivers/nvdimm/e820.c) function to register persistent
memory on arm64. When the kernel restarts or updates, the data
in PMEM will not be lost and can be loaded faster. This is a
general feature.
drivers/nvdimm/e820.c:
This file scans "iomem_resource" and takes advantage of the
nvdimm resource discovery mechanism by registering a resource
named "Persistent Memory (legacy)". This functionality does not
depend on the architecture.
We will push this feature to the Linux kernel community and
discuss renaming the file, because people mistakenly assume that
e820.c depends on x86.
To use this feature, do the following (a worked example of the
reserved range follows the steps):
1. Reserve memory: add memmap to grub.cfg to reserve memory,
memmap=nn[KMG]!ss[KMG], e.g. memmap=100K!0x1a0000000.
2. Load nd_e820.ko: modprobe nd_e820.
3. Check for the pmem device in /dev, e.g. /dev/pmem0.
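As a small worked example (userspace arithmetic only; the base address is
the one from step 1 above), memmap=100K!0x1a0000000 reserves the range
computed below:

#include <stdio.h>

int main(void)
{
	unsigned long long base = 0x1a0000000ULL;
	unsigned long long size = 100ULL << 10;	/* 100K = 0x19000 bytes */

	/* Prints: pmem: [0x1a0000000, 0x1a0019000) (102400 bytes) */
	printf("pmem: [0x%llx, 0x%llx) (%llu bytes)\n",
	       base, base + size, size);
	return 0;
}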
Signed-off-by: Zhuling <zhuling8(a)huawei.com>
---
arch/arm64/Kconfig | 22 ++++++++++++++++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/pmem.c | 36 ++++++++++++++++++++++++++
arch/arm64/kernel/setup.c | 11 ++++++++
arch/arm64/mm/init.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++
drivers/nvdimm/Kconfig | 6 +++++
drivers/nvdimm/Makefile | 1 +
7 files changed, 140 insertions(+)
create mode 100644 arch/arm64/kernel/pmem.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 405e5ce..2231eac 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1322,6 +1322,28 @@ config RODATA_FULL_DEFAULT_ENABLED
This requires the linear region to be mapped down to pages,
which may adversely affect performance in some cases.
+config ARM64_PMEM_RESERVE
+ bool "Reserve memory for persistent storage"
+ default n
+ help
+ Use memmap=nn[KMG]!ss[KMG] (e.g. memmap=100K!0x1a0000000) to
+ reserve memory for persistent storage.
+
+ Say y here to enable this feature.
+
+config ARM64_PMEM_LEGACY_DEVICE
+ bool "Create persistent storage"
+ depends on BLK_DEV
+ depends on LIBNVDIMM
+ select ARM64_PMEM_RESERVE
+ help
+ Use reserved memory for persistent storage when the kernel
+ restart or update. the data in PMEM will not be lost and
+ can be loaded faster.
+
+ Say y if unsure.
+
+
config ARM64_SW_TTBR0_PAN
bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
help
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 169d90f..f615325 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -68,6 +68,7 @@ obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_ARM64_MTE) += mte.o
obj-$(CONFIG_MPAM) += mpam/
+obj-$(CONFIG_ARM64_PMEM_LEGACY_DEVICE) += pmem.o
obj-y += vdso/ probes/
obj-$(CONFIG_COMPAT_VDSO) += vdso32/
diff --git a/arch/arm64/kernel/pmem.c b/arch/arm64/kernel/pmem.c
new file mode 100644
index 0000000..d1efdc8
--- /dev/null
+++ b/arch/arm64/kernel/pmem.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright(c) 2021 Huawei Technologies Co., Ltd
+ *
+ * Derived from x86 and arm64 implement PMEM.
+ */
+#include <linux/platform_device.h>
+#include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/module.h>
+
+static int found(struct resource *res, void *data)
+{
+ return 1;
+}
+
+static int __init register_e820_pmem(void)
+{
+ struct platform_device *pdev;
+ int rc;
+
+ rc = walk_iomem_res_desc(IORES_DESC_PERSISTENT_MEMORY_LEGACY,
+ IORESOURCE_MEM, 0, -1, NULL, found);
+ if (rc <= 0)
+ return 0;
+
+ /*
+ * See drivers/nvdimm/e820.c for the implementation, this is
+ * simply here to trigger the module to load on demand.
+ */
+ pdev = platform_device_alloc("e820_pmem", -1);
+
+ return platform_device_add(pdev);
+}
+device_initcall(register_e820_pmem);
+
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 92d75e3..1a8a4d2 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -70,6 +70,11 @@ static int __init arm64_enable_cpu0_hotplug(char *str)
__setup("arm64_cpu0_hotplug", arm64_enable_cpu0_hotplug);
#endif
+#ifdef CONFIG_ARM64_PMEM_RESERVE
+extern struct resource pmem_res;
+#endif
+
+
phys_addr_t __fdt_pointer __initdata;
/*
@@ -305,6 +310,11 @@ static void __init request_standard_resources(void)
}
+
+#ifdef CONFIG_ARM64_PMEM_RESERVE
+ if (pmem_res.end && pmem_res.start)
+ request_resource(&iomem_resource, &pmem_res);
+#endif
}
static int __init reserve_memblock_reserved_regions(void)
{
u64 i, j;
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 6ebfabd..3b6907f 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -56,6 +56,8 @@
s64 memstart_addr __ro_after_init = -1;
EXPORT_SYMBOL(memstart_addr);
+phys_addr_t start_at, mem_size;
+
#ifdef CONFIG_PIN_MEMORY
struct resource pin_memory_resource = {
.name = "Pin memory",
@@ -111,6 +113,18 @@ static void __init reserve_pin_memory_res(void)
*/
phys_addr_t arm64_dma_phys_limit __ro_after_init;
+static unsigned long long pmem_size, pmem_start;
+
+#ifdef CONFIG_ARM64_PMEM_RESERVE
+struct resource pmem_res = {
+ .name = "Persistent Memory (legacy)",
+ .start = 0,
+ .end = 0,
+ .flags = IORESOURCE_MEM,
+ .desc = IORES_DESC_PERSISTENT_MEMORY_LEGACY
+};
+#endif
+
#ifndef CONFIG_KEXEC_CORE
static void __init reserve_crashkernel(void)
{
@@ -403,6 +417,26 @@ static int __init reserve_park_mem(void)
return -EINVAL;
}
#endif
+static bool __init is_mem_valid(unsigned long long mem_size, unsigned long long mem_start)
+{
+ if (!memblock_is_region_memory(mem_start, mem_size)) {
+ pr_warn("cannot reserve mem: region is not memory!\n");
+ return false;
+ }
+
+ if (memblock_is_region_reserved(mem_start, mem_size)) {
+ pr_warn("cannot reserve mem: region overlaps reserved memory!\n");
+ return false;
+ }
+
+ if (!IS_ALIGNED(mem_start, SZ_2M)) {
+ pr_warn("cannot reserve mem: base address is not 2MB aligned!\n");
+ return false;
+ }
+
+ return true;
+}
+
static int need_remove_real_memblock __initdata;
@@ -442,7 +476,12 @@ static int __init parse_memmap_one(char *p)
start_at = memparse(p + 1, &p);
memblock_reserve(start_at, mem_size);
memblock_mark_memmap(start_at, mem_size);
+ } else if (*p == '!') {
+ start_at = memparse(p + 1, &p);
+
+ pmem_start = start_at;
+ pmem_size = mem_size;
} else
pr_info("Unrecognized memmap option, please check the parameter.\n");
return *p == '\0' ? 0 : -EINVAL;
@@ -464,6 +508,20 @@ static int __init parse_memmap_opt(char *str)
}
early_param("memmap", parse_memmap_opt);
+#ifdef CONFIG_ARM64_PMEM_RESERVE
+static void __init reserve_pmem(void)
+{
+ if (!is_mem_valid(pmem_size, pmem_start))
+ return;
+
+ memblock_remove(pmem_start, pmem_size);
+ pr_info("pmem reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
+ pmem_start, pmem_start + pmem_size, pmem_size >> 20);
+ pmem_res.start = pmem_start;
+ pmem_res.end = pmem_start + pmem_size - 1;
+}
+#endif
+
void __init arm64_memblock_init(void)
{
const s64 linear_region_size = BIT(vabits_actual - 1);
@@ -638,6 +696,11 @@ void __init bootmem_init(void)
reserve_quick_kexec();
#endif
+#ifdef CONFIG_ARM64_PMEM_RESERVE
+ reserve_pmem();
+#endif
+
+
reserve_pin_memory_res();
memblock_dump_all();
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index b7d1eb3..9567cca 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -132,3 +132,9 @@ config NVDIMM_TEST_BUILD
infrastructure.
endif
+
+config PMEM_LEGACY
+ tristate "Pmem_legacy"
+ select X86_PMEM_LEGACY if X86
+ select ARM64_PMEM_LEGACY_DEVICE if ARM64
+
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 0407753..6f8dc92 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
+obj-$(CONFIG_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_OF_PMEM) += of_pmem.o
obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
--
2.9.5
3
3
[PATCH openEuler-5.10] BMA: Fix format string compile warning in arm32 builds
by Zheng Zengkai 10 Jan '22
driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ETXO
CVE: NA
-----------------------------------------
Fix the following build warnings in arm32 builds:
drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c: In function ‘bma_cdev_add_msg’:
drivers/net/ethernet/huawei/bma/edma_drv/bma_pci.h:92:20: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 5 has type ‘size_t {aka unsigned int}’ [-Wformat=]
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c: In function ‘veth_recv_pkt’:
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:74:37: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 7 has type ‘dma_addr_t {aka unsigned int}’ [-Wformat=]
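A small userspace demonstration (assumption: plain printf stands in for
the kernel log helpers) of the portable conversions this fix switches to:
%zu/%zd always match size_t/ssize_t on both arm32 and arm64, and the
kernel-only %pad takes a dma_addr_t by address, so its width stops
mattering:

#include <stdio.h>
#include <sys/types.h>

int main(void)
{
	size_t count = 42;	/* unsigned int on arm32, unsigned long on arm64 */
	ssize_t length = -1;

	/* %zu/%zd are correct for both targets; %ld/%lu are not. */
	printf("count=%zu length=%zd\n", count, length);

	/* Kernel only, shown as a comment since %pad has no userspace
	 * equivalent: dev_dbg(dev, "dma_map=%pad\n", &dma_map); */
	return 0;
}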
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
.../bma/cdev_veth_drv/virtual_cdev_eth_net.c | 14 +++++++-------
.../net/ethernet/huawei/bma/edma_drv/bma_devintf.c | 2 +-
.../net/ethernet/huawei/bma/edma_drv/edma_host.c | 2 +-
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c | 4 ++--
4 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/huawei/bma/cdev_veth_drv/virtual_cdev_eth_net.c b/drivers/net/ethernet/huawei/bma/cdev_veth_drv/virtual_cdev_eth_net.c
index 04ea55f4e0a2..d8c3ab655566 100644
--- a/drivers/net/ethernet/huawei/bma/cdev_veth_drv/virtual_cdev_eth_net.c
+++ b/drivers/net/ethernet/huawei/bma/cdev_veth_drv/virtual_cdev_eth_net.c
@@ -605,7 +605,7 @@ static int edma_veth_cut_tx_packet_send(struct edma_eth_dev_s *eth_dev,
do_queue_rate_limit(eth_dev->ptx_queue);
while (length > 0) {
- LOG(DLOG_DEBUG, "length: %u/%lu", length, len);
+ LOG(DLOG_DEBUG, "length: %u/%zu", length, len);
if (length > BSPPACKET_MTU_MAX) {
/* fragment. */
@@ -1689,7 +1689,7 @@ static ssize_t cdev_copy_packet_to_user(struct edma_eth_dev_s *dev,
start = dev->rx_packet[dev->rx_packet_head].packet + g_read_pos;
LOG(DLOG_DEBUG,
- "User needs %ld bytes, pos: %u, total len: %u, left: %ld.",
+ "User needs %zu bytes, pos: %u, total len: %u, left: %zd.",
count, g_read_pos, dev->rx_packet[dev->rx_packet_head].len, left);
if (left <= 0) {
/* No more data in this message, retry. */
@@ -1721,7 +1721,7 @@ static ssize_t cdev_copy_packet_to_user(struct edma_eth_dev_s *dev,
}
LOG(DLOG_DEBUG,
- "Copied bytes: %ld, pos: %d, buf len: %lu, free_packet: %d.",
+ "Copied bytes: %zd, pos: %d, buf len: %zu, free_packet: %d.",
length, g_read_pos, count, free_packet);
if (packet) {
@@ -1807,11 +1807,11 @@ ssize_t cdev_read(struct file *filp, char __user *data,
if (!data || count >= MAX_PACKET_LEN)
return -EFAULT;
- LOG(DLOG_DEBUG, "read begin, count: %ld, pos: %u.", count, g_read_pos);
+ LOG(DLOG_DEBUG, "read begin, count: %zu, pos: %u.", count, g_read_pos);
length = cdev_copy_packet_to_user(dev, data, count);
- LOG(DLOG_DEBUG, "read done, length: %ld, pos: %u.", length, g_read_pos);
+ LOG(DLOG_DEBUG, "read done, length: %zd, pos: %u.", length, g_read_pos);
return length;
}
@@ -1837,7 +1837,7 @@ ssize_t cdev_write(struct file *filp, const char __user *data,
g_peer_not_ready = 0;
}
- LOG(DLOG_DEBUG, "data length is %lu, pos: %u (%u/%u)",
+ LOG(DLOG_DEBUG, "data length is %zu, pos: %u (%u/%u)",
count, g_read_pos,
pdev->ptx_queue->pshmqhd_v->count,
pdev->ptx_queue->pshmqhd_v->total);
@@ -1859,4 +1859,4 @@ MODULE_DESCRIPTION("HUAWEI CDEV DRIVER");
MODULE_LICENSE("GPL");
module_init(edma_cdev_init);
-module_exit(edma_cdev_exit);
\ No newline at end of file
+module_exit(edma_cdev_exit);
diff --git a/drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c b/drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c
index 7817f58f8635..acf6bbfc50ff 100644
--- a/drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c
+++ b/drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c
@@ -497,7 +497,7 @@ int bma_cdev_add_msg(void *handle, const char __user *msg, size_t msg_len)
hdr->sub_type = priv->user.sub_type;
hdr->user_id = priv->user.user_id;
hdr->datalen = msg_len;
- BMA_LOG(DLOG_DEBUG, "msg_len is %ld\n", msg_len);
+ BMA_LOG(DLOG_DEBUG, "msg_len is %zu\n", msg_len);
if (copy_from_user(hdr->data, msg, msg_len)) {
BMA_LOG(DLOG_ERROR, "copy_from_user error\n");
diff --git a/drivers/net/ethernet/huawei/bma/edma_drv/edma_host.c b/drivers/net/ethernet/huawei/bma/edma_drv/edma_host.c
index 2d5f4ffd79d9..3525d41c865f 100644
--- a/drivers/net/ethernet/huawei/bma/edma_drv/edma_host.c
+++ b/drivers/net/ethernet/huawei/bma/edma_drv/edma_host.c
@@ -789,7 +789,7 @@ static int edma_host_send_msg(struct edma_host_s *edma_host)
if (edma_host->msg_send_write >
HOST_MAX_SEND_MBX_LEN - SIZE_OF_MBX_HDR) {
BMA_LOG(DLOG_ERROR,
- "Length of send message %u is larger than %lu\n",
+ "Length of send message %u is larger than %zu\n",
edma_host->msg_send_write,
HOST_MAX_SEND_MBX_LEN - SIZE_OF_MBX_HDR);
edma_host->msg_send_write = 0;
diff --git a/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c b/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c
index 9681ce3bfc7b..240db31d7178 100644
--- a/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c
+++ b/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c
@@ -1488,8 +1488,8 @@ s32 veth_recv_pkt(struct bspveth_rxtx_q *prx_queue, int queue)
skb->len, skb->protocol);
VETH_LOG(DLOG_DEBUG,
- "dma_p=0x%llx,dma_map=0x%llx,",
- pbd_v->dma_p, dma_map);
+ "dma_p=0x%llx,dma_map=%pad,",
+ pbd_v->dma_p, &dma_map);
VETH_LOG(DLOG_DEBUG,
"skb=%p,skb->data=%p,skb->len=%d,tail=%d,shm_off=%d\n",
--
2.20.1
1
0
08 Jan '22
From: Yixing Liu <liuyixing1(a)huawei.com>
mainline inclusion
from mainline-v5.15-rc1
commit 260f64a40198
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O23M
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
----------------------------------------------------------------------------
The stash feature is enabled by default on HIP09.
Fixes: f93c39bc9547 ("RDMA/hns: Add support for QP stash")
Fixes: bfefae9f108d ("RDMA/hns: Add support for CQ stash")
Link: https://lore.kernel.org/r/1629539607-33217-3-git-send-email-liangwenpeng@hu…
Signed-off-by: Yixing Liu <liuyixing1(a)huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng(a)huawei.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
Signed-off-by: Guofeng Yue <yueguofeng(a)hisilicon.com>
Reviewed-by: Yangyang Li <liyangyang20(a)huawei.com>
Acked-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index aea6a6a7b238..4f8738bda0a2 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -2004,6 +2004,7 @@ static void set_default_caps(struct hns_roce_dev *hr_dev)
caps->gid_table_len[0] = HNS_ROCE_V2_GID_INDEX_NUM;
if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP09) {
+ caps->flags |= HNS_ROCE_CAP_FLAG_STASH;
caps->max_sq_inline = HNS_ROCE_V3_MAX_SQ_INLINE;
} else {
caps->max_sq_inline = HNS_ROCE_V2_MAX_SQ_INLINE;
--
2.20.1
1
20
08 Jan '22
driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ETXO
CVE: NA
-----------------------------------------
Fix the following build warnings in arm32 builds:
drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c: In function ‘bma_cdev_add_msg’:
drivers/net/ethernet/huawei/bma/edma_drv/bma_pci.h:92:20: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 5 has type ‘size_t {aka unsigned int}’ [-Wformat=]
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
.../bma/cdev_veth_drv/virtual_cdev_eth_net.c | 14 +++++++-------
.../net/ethernet/huawei/bma/edma_drv/bma_devintf.c | 2 +-
.../net/ethernet/huawei/bma/edma_drv/edma_host.c | 2 +-
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c | 2 +-
4 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/huawei/bma/cdev_veth_drv/virtual_cdev_eth_net.c b/drivers/net/ethernet/huawei/bma/cdev_veth_drv/virtual_cdev_eth_net.c
index 04ea55f4e0a2..99466848c034 100644
--- a/drivers/net/ethernet/huawei/bma/cdev_veth_drv/virtual_cdev_eth_net.c
+++ b/drivers/net/ethernet/huawei/bma/cdev_veth_drv/virtual_cdev_eth_net.c
@@ -605,7 +605,7 @@ static int edma_veth_cut_tx_packet_send(struct edma_eth_dev_s *eth_dev,
do_queue_rate_limit(eth_dev->ptx_queue);
while (length > 0) {
- LOG(DLOG_DEBUG, "length: %u/%lu", length, len);
+ LOG(DLOG_DEBUG, "length: %u/%u", length, len);
if (length > BSPPACKET_MTU_MAX) {
/* fragment. */
@@ -1689,7 +1689,7 @@ static ssize_t cdev_copy_packet_to_user(struct edma_eth_dev_s *dev,
start = dev->rx_packet[dev->rx_packet_head].packet + g_read_pos;
LOG(DLOG_DEBUG,
- "User needs %ld bytes, pos: %u, total len: %u, left: %ld.",
+ "User needs %d bytes, pos: %u, total len: %u, left: %d.",
count, g_read_pos, dev->rx_packet[dev->rx_packet_head].len, left);
if (left <= 0) {
/* No more data in this message, retry. */
@@ -1721,7 +1721,7 @@ static ssize_t cdev_copy_packet_to_user(struct edma_eth_dev_s *dev,
}
LOG(DLOG_DEBUG,
- "Copied bytes: %ld, pos: %d, buf len: %lu, free_packet: %d.",
+ "Copied bytes: %d, pos: %d, buf len: %u, free_packet: %d.",
length, g_read_pos, count, free_packet);
if (packet) {
@@ -1807,11 +1807,11 @@ ssize_t cdev_read(struct file *filp, char __user *data,
if (!data || count >= MAX_PACKET_LEN)
return -EFAULT;
- LOG(DLOG_DEBUG, "read begin, count: %ld, pos: %u.", count, g_read_pos);
+ LOG(DLOG_DEBUG, "read begin, count: %d, pos: %u.", count, g_read_pos);
length = cdev_copy_packet_to_user(dev, data, count);
- LOG(DLOG_DEBUG, "read done, length: %ld, pos: %u.", length, g_read_pos);
+ LOG(DLOG_DEBUG, "read done, length: %d, pos: %u.", length, g_read_pos);
return length;
}
@@ -1837,7 +1837,7 @@ ssize_t cdev_write(struct file *filp, const char __user *data,
g_peer_not_ready = 0;
}
- LOG(DLOG_DEBUG, "data length is %lu, pos: %u (%u/%u)",
+ LOG(DLOG_DEBUG, "data length is %u, pos: %u (%u/%u)",
count, g_read_pos,
pdev->ptx_queue->pshmqhd_v->count,
pdev->ptx_queue->pshmqhd_v->total);
@@ -1859,4 +1859,4 @@ MODULE_DESCRIPTION("HUAWEI CDEV DRIVER");
MODULE_LICENSE("GPL");
module_init(edma_cdev_init);
-module_exit(edma_cdev_exit);
\ No newline at end of file
+module_exit(edma_cdev_exit);
diff --git a/drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c b/drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c
index 7817f58f8635..2d63e44f0ed2 100644
--- a/drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c
+++ b/drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c
@@ -497,7 +497,7 @@ int bma_cdev_add_msg(void *handle, const char __user *msg, size_t msg_len)
hdr->sub_type = priv->user.sub_type;
hdr->user_id = priv->user.user_id;
hdr->datalen = msg_len;
- BMA_LOG(DLOG_DEBUG, "msg_len is %ld\n", msg_len);
+ BMA_LOG(DLOG_DEBUG, "msg_len is %d\n", msg_len);
if (copy_from_user(hdr->data, msg, msg_len)) {
BMA_LOG(DLOG_ERROR, "copy_from_user error\n");
diff --git a/drivers/net/ethernet/huawei/bma/edma_drv/edma_host.c b/drivers/net/ethernet/huawei/bma/edma_drv/edma_host.c
index 2d5f4ffd79d9..a89a4160dedd 100644
--- a/drivers/net/ethernet/huawei/bma/edma_drv/edma_host.c
+++ b/drivers/net/ethernet/huawei/bma/edma_drv/edma_host.c
@@ -789,7 +789,7 @@ static int edma_host_send_msg(struct edma_host_s *edma_host)
if (edma_host->msg_send_write >
HOST_MAX_SEND_MBX_LEN - SIZE_OF_MBX_HDR) {
BMA_LOG(DLOG_ERROR,
- "Length of send message %u is larger than %lu\n",
+ "Length of send message %u is larger than %u\n",
edma_host->msg_send_write,
HOST_MAX_SEND_MBX_LEN - SIZE_OF_MBX_HDR);
edma_host->msg_send_write = 0;
diff --git a/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c b/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c
index 9681ce3bfc7b..7b511c05c2e9 100644
--- a/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c
+++ b/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c
@@ -1488,7 +1488,7 @@ s32 veth_recv_pkt(struct bspveth_rxtx_q *prx_queue, int queue)
skb->len, skb->protocol);
VETH_LOG(DLOG_DEBUG,
- "dma_p=0x%llx,dma_map=0x%llx,",
+ "dma_p=0x%llx,dma_map=0x%x,",
pbd_v->dma_p, dma_map);
VETH_LOG(DLOG_DEBUG,
--
2.20.1
2
1
[PATCH openEuler-5.10 001/125] kabi: Generalize naming of kabi helper macros
by Zheng Zengkai 07 Jan '22
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4K3S5
--------------------------
Generalize naming of some kabi helper macros.
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
include/linux/kabi.h | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/include/linux/kabi.h b/include/linux/kabi.h
index 713f63cf56cb..0bc7ca2483f4 100644
--- a/include/linux/kabi.h
+++ b/include/linux/kabi.h
@@ -151,7 +151,7 @@
* approaches can (and often are) combined.
*
* To use this for 'struct foo' (the "base structure"), define a new
- * structure called 'struct foo_rh'; this new struct is called "auxiliary
+ * structure called 'struct foo_resvd'; this new struct is called "auxiliary
* structure". Then add KABI_AUX_EMBED or KABI_AUX_PTR to the end
* of the base structure. The argument is the name of the base structure,
* without the 'struct' keyword.
@@ -174,7 +174,7 @@
* end. Note the auxiliary structure cannot be shrunk in size later (i.e.,
* fields cannot be removed, only deprecated). Any code accessing fields
* from the aux struct must guard the access using the KABI_AUX macro.
- * The access itself is then done via a '_rh' field in the base struct.
+ * The access itself is then done via a '_resvd' field in the base struct.
*
* The auxiliary structure is not guaranteed for access by modules unless
* explicitly commented as such in the declaration of the aux struct
@@ -182,7 +182,7 @@
*
* Example:
*
- * struct foo_rh {
+ * struct foo_resvd {
* int newly_added;
* };
*
@@ -194,20 +194,20 @@
* void use(struct foo *f)
* {
* if (KABI_AUX(f, foo, newly_added))
- * f->_rh->newly_added = 123;
+ * f->_resvd->newly_added = 123;
* else
* // the field 'newly_added' is not present in the passed
* // struct, fall back to old behavior
* f->big_hammer = true;
* }
*
- * static struct foo_rh my_foo_rh {
+ * static struct foo_resvd my_foo_resvd {
* .newly_added = 0;
* }
*
* static struct foo my_foo = {
* .big_hammer = false,
- * ._rh = &my_foo_rh,
+ * ._resvd = &my_foo_resvd,
* KABI_AUX_INIT_SIZE(foo)
* };
*
@@ -218,7 +218,7 @@
*
* Example:
*
- * struct foo_rh {
+ * struct foo_resvd {
* };
*
* struct foo {
@@ -385,7 +385,7 @@
_Static_assert(__alignof__(struct{_new;}) <= __alignof__(struct{_orig;}), \
__FILE__ ":" __stringify(__LINE__) ": " __stringify(_orig) " is not aligned the same as " __stringify(_new) KABI_ALIGN_WARNING); \
}
-# define __ABI_CHECK_SIZE(_item, _size) \
+# define __KABI_CHECK_SIZE(_item, _size) \
_Static_assert(sizeof(struct{_item;}) <= _size, \
__FILE__ ":" __stringify(__LINE__) ": " __stringify(_item) " is larger than the reserved size (" __stringify(_size) " bytes)" KABI_ALIGN_WARNING)
#else
@@ -451,14 +451,14 @@
})
#define _KABI_AUX_PTR(_struct) \
- size_t _struct##_size_rh; \
- _KABI_EXCLUDE(struct _struct##_rh *_rh)
+ size_t _struct##_size_resvd; \
+ _KABI_EXCLUDE(struct _struct##_resvd *_resvd)
#define KABI_AUX_PTR(_struct) \
_KABI_AUX_PTR(_struct);
#define _KABI_AUX_EMBED(_struct) \
- size_t _struct##_size_rh; \
- _KABI_EXCLUDE(struct _struct##_rh _rh)
+ size_t _struct##_size_resvd; \
+ _KABI_EXCLUDE(struct _struct##_resvd _resvd)
#define KABI_AUX_EMBED(_struct) \
_KABI_AUX_EMBED(_struct);
@@ -468,7 +468,7 @@
/*
* KABI_AUX_SET_SIZE calculates and sets the size of the extended struct and
- * stores it in the size_rh field for structs that are dynamically allocated.
+ * stores it in the size_resvd field for structs that are dynamically allocated.
* This macro MUST be called when expanding a base struct with
* KABI_SIZE_AND_EXTEND, and it MUST be called from the allocation site
* regardless of being allocated in the kernel or a module.
@@ -476,28 +476,28 @@
* a semicolon is necessary at the end of the line where it is invoked.
*/
#define KABI_AUX_SET_SIZE(_name, _struct) ({ \
- (_name)->_struct##_size_rh = sizeof(struct _struct##_rh); \
+ (_name)->_struct##_size_resvd = sizeof(struct _struct##_resvd); \
})
/*
* KABI_AUX_INIT_SIZE calculates and sets the size of the extended struct and
- * stores it in the size_rh field for structs that are statically allocated.
+ * stores it in the size_resvd field for structs that are statically allocated.
* This macro MUST be called when expanding a base struct with
* KABI_SIZE_AND_EXTEND, and it MUST be called from the declaration site
* regardless of being allocated in the kernel or a module.
*/
#define KABI_AUX_INIT_SIZE(_struct) \
- ._struct##_size_rh = sizeof(struct _struct##_rh),
+ ._struct##_size_resvd = sizeof(struct _struct##_resvd),
/*
* KABI_AUX verifies allocated memory exists. This MUST be called to
- * verify that memory in the _rh struct is valid, and can be called
+ * verify that memory in the _resvd struct is valid, and can be called
* regardless if KABI_SIZE_AND_EXTEND or KABI_SIZE_AND_EXTEND_PTR is
* used.
*/
#define KABI_AUX(_ptr, _struct, _field) ({ \
- size_t __off = offsetof(struct _struct##_rh, _field); \
- (_ptr)->_struct##_size_rh > __off ? true : false; \
+ size_t __off = offsetof(struct _struct##_resvd, _field); \
+ (_ptr)->_struct##_size_resvd > __off ? true : false; \
})
#endif /* _LINUX_KABI_H */
--
2.20.1
1
124
[PATCH openEuler-5.10 1/2] arm64: Add support for memmap kernel parameters
by Zheng Zengkai 07 Jan '22
From: Peng Liu <liupeng256(a)huawei.com>
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NYPZ
CVE: NA
-------------------------------------------------
Add support for memmap kernel parameters for ARM64. The three modes
below are supported:
memmap=exactmap
Enable setting of an exact memory map, as specified by the user.
memmap=nn[KMG]@ss[KMG]
Force usage of a specific region of memory.
memmap=nn[KMG]$ss[KMG]
The region of memory to be reserved is from ss to ss+nn; the region must
be within existing memory, otherwise it will be ignored.
If users set memmap=exactmap before memmap=nn[KMG]@ss[KMG], they will
get the exact memory specified by memmap=nn[KMG]@ss[KMG]. For example,
on a machine with 4GB of memory, "memmap=exactmap memmap=1G@1G" will
make the kernel use the memory from 1GB to 2GB only.
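A userspace sketch (a simplified re-implementation of the kernel's
memparse() suffix handling, not the patch code itself) of how the
"memmap=1G@1G" example above resolves to the 1GB-2GB range:

#include <stdio.h>
#include <stdlib.h>

/* Simplified memparse(): a number plus an optional K/M/G suffix. */
static unsigned long long memparse_sketch(const char *s, char **retptr)
{
	unsigned long long v = strtoull(s, retptr, 0);

	switch (**retptr) {
	case 'G': v <<= 30; (*retptr)++; break;
	case 'M': v <<= 20; (*retptr)++; break;
	case 'K': v <<= 10; (*retptr)++; break;
	}
	return v;
}

int main(void)
{
	char *p = "1G@1G";
	unsigned long long size = memparse_sketch(p, &p);
	unsigned long long base = (*p == '@') ? memparse_sketch(p + 1, &p) : 0;

	/* Prints: memmap=1G@1G -> [0x40000000, 0x80000000) */
	printf("memmap=1G@1G -> [0x%llx, 0x%llx)\n", base, base + size);
	return 0;
}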
Signed-off-by: Peng Liu <liupeng256(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
.../admin-guide/kernel-parameters.txt | 9 ++-
arch/arm64/mm/init.c | 59 +++++++++++++++++++
2 files changed, 65 insertions(+), 3 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index de8f7d447295..64be32ba4373 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2794,8 +2794,8 @@
option.
See Documentation/admin-guide/mm/memory-hotplug.rst.
- memmap=exactmap [KNL,X86] Enable setting of an exact
- E820 memory map, as specified by the user.
+ memmap=exactmap [KNL,X86,ARM64] Enable setting of an exact
+ E820 and ARM64 memory map, as specified by the user.
Such memmap=exactmap lines can be constructed based on
BIOS output or other requirements. See the memmap=nn@ss
option description.
@@ -2806,7 +2806,8 @@
If @ss[KMG] is omitted, it is equivalent to mem=nn[KMG],
which limits max address to nn[KMG].
Multiple different regions can be specified,
- comma delimited.
+ comma delimited; the example below is not supported on
+ ARM64.
Example:
memmap=100M@2G,100M#3G,1G!1024G
@@ -2817,6 +2818,8 @@
memmap=nn[KMG]$ss[KMG]
[KNL,ACPI] Mark specific memory as reserved.
Region of memory to be reserved is from ss to ss+nn.
+ For ARM64, reserved memory must be in the range of
+ existing memory.
Example: Exclude memory from 0x18690000-0x1869ffff
memmap=64K$0x18690000
or
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index e8d446164c76..f59546d3b0de 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -404,6 +404,65 @@ static int __init reserve_park_mem(void)
}
#endif
+static int need_remove_real_memblock __initdata;
+
+static int __init parse_memmap_one(char *p)
+{
+ char *oldp;
+ u64 start_at, mem_size;
+
+ if (!p)
+ return -EINVAL;
+
+ if (!strncmp(p, "exactmap", 8)) {
+ need_remove_real_memblock = 1;
+ return -EINVAL;
+ }
+
+ oldp = p;
+ mem_size = memparse(p, &p);
+ if (p == oldp)
+ return -EINVAL;
+
+ if (!mem_size)
+ return -EINVAL;
+
+ if (*p == '@') {
+ start_at = memparse(p + 1, &p);
+ /*
+ * use the exactmap defined by nn[KMG]@ss[KMG], remove
+ * memblock populated by DT etc.
+ */
+ if (need_remove_real_memblock) {
+ need_remove_real_memblock = 0;
+ memblock_remove(0, ULLONG_MAX);
+ }
+ memblock_add(start_at, mem_size);
+ } else if (*p == '$') {
+ start_at = memparse(p + 1, &p);
+ memblock_reserve(start_at, mem_size);
+ } else
+ pr_info("Unrecognized memmap option, please check the parameter.\n");
+
+ return *p == '\0' ? 0 : -EINVAL;
+}
+
+static int __init parse_memmap_opt(char *str)
+{
+ while (str) {
+ char *k = strchr(str, ',');
+
+ if (k)
+ *k++ = 0;
+
+ parse_memmap_one(str);
+ str = k;
+ }
+
+ return 0;
+}
+early_param("memmap", parse_memmap_opt);
+
void __init arm64_memblock_init(void)
{
const s64 linear_region_size = BIT(vabits_actual - 1);
--
2.20.1
1
1
Ramaxel inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ON8F
CVE: NA
Changes:
1. Split scmd_tmout_nonpt into two parameters:
scmd_tmout_vd/scmd_tmout_rawdisk
2. Return -ETIME instead of -EINVAL when command is timeout.
3. Add one module parameter: max_io_force.
Signed-off-by: Yanling Song <songyl(a)ramaxel.com>
Reviewed-by: Jiang Yu<yujiang(a)ramaxel.com>
---
drivers/scsi/spraid/spraid_main.c | 96 +++++++++++++++----------------
1 file changed, 45 insertions(+), 51 deletions(-)
diff --git a/drivers/scsi/spraid/spraid_main.c b/drivers/scsi/spraid/spraid_main.c
index 519b39f44e91..7069582d741a 100644
--- a/drivers/scsi/spraid/spraid_main.c
+++ b/drivers/scsi/spraid/spraid_main.c
@@ -42,10 +42,17 @@ static u32 admin_tmout = 60;
module_param(admin_tmout, uint, 0644);
MODULE_PARM_DESC(admin_tmout, "admin commands timeout (seconds)");
-static u32 scmd_tmout_nonpt = 180;
-module_param(scmd_tmout_nonpt, uint, 0644);
-MODULE_PARM_DESC(scmd_tmout_nonpt,
- "scsi commands timeout for rawdisk&raid(seconds)");
+static u32 scmd_tmout_rawdisk = 180;
+module_param(scmd_tmout_rawdisk, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_rawdisk, "scsi commands timeout for rawdisk(seconds)");
+
+static u32 scmd_tmout_vd = 180;
+module_param(scmd_tmout_vd, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_vd, "scsi commands timeout for vd(seconds)");
+
+static bool max_io_force;
+module_param(max_io_force, bool, 0644);
+MODULE_PARM_DESC(max_io_force, "force max_hw_sectors_kb = 1024, default false (performance first)");
static int ioq_depth_set(const char *val, const struct kernel_param *kp);
static const struct kernel_param_ops ioq_depth_ops = {
@@ -78,7 +85,7 @@ static unsigned char log_debug_switch;
module_param_cb(log_debug_switch, &log_debug_switch_ops,
&log_debug_switch, 0644);
MODULE_PARM_DESC(log_debug_switch,
- "set log state, default non-zero for switch on");
+ "set log state, default zero for switch off");
static int small_pool_num_set(const char *val, const struct kernel_param *kp)
{
@@ -132,7 +139,6 @@ static struct workqueue_struct *spraid_wq;
#define SPRAID_DRV_VERSION "1.0.0.0"
#define ADMIN_TIMEOUT (admin_tmout * HZ)
-#define ADMIN_ERR_TIMEOUT 32757
#define SPRAID_WAIT_ABNL_CMD_TIMEOUT (3 * 2)
@@ -242,13 +248,6 @@ static int spraid_pci_enable(struct spraid_dev *hdev)
goto disable;
}
- ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
- if (ret < 0) {
- dev_err(hdev->dev,
- "Allocate one IRQ for setup admin channel failed\n");
- goto disable;
- }
-
hdev->cap = lo_hi_readq(hdev->bar + SPRAID_REG_CAP);
hdev->ioq_depth = min_t(u32, SPRAID_CAP_MQES(hdev->cap) + 1,
io_queue_depth);
@@ -261,13 +260,21 @@ static int spraid_pci_enable(struct spraid_dev *hdev)
maskbit);
maskbit = SPRAID_DMA_MSK_BIT_MAX;
}
- if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(maskbit))) {
+ if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(maskbit)) &&
+ dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32))) {
dev_err(hdev->dev, "set dma mask and coherent failed\n");
goto disable;
}
dev_info(hdev->dev, "set dma mask[%llu] success\n", maskbit);
+ ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
+
+ if (ret < 0) {
+ dev_err(hdev->dev, "Allocate one IRQ for setup admin channel failed\n");
+ goto disable;
+ }
+
pci_enable_pcie_error_reporting(pdev);
pci_save_state(pdev);
@@ -840,6 +847,9 @@ static void spraid_map_status(struct spraid_iod *iod, struct scsi_cmnd *scmd,
break;
default:
set_host_byte(scmd, DID_BAD_TARGET);
+ dev_warn(iod->spraidq->hdev->dev, "[%s] cid[%d] qid[%d];"
+ "bad status[0x%x]\n", __func__, cqe->cmd_id,
+ le16_to_cpu(cqe->sq_id), le16_to_cpu(cqe->status));
break;
}
}
@@ -981,32 +991,17 @@ static void spraid_slave_destroy(struct scsi_device *sdev)
static int spraid_slave_configure(struct scsi_device *sdev)
{
- u16 idx;
- unsigned int timeout = scmd_tmout_nonpt * HZ;
+ unsigned int timeout = scmd_tmout_rawdisk * HZ;
struct spraid_dev *hdev = shost_priv(sdev->host);
struct spraid_sdev_hostdata *hostdata = sdev->hostdata;
u32 max_sec = sdev->host->max_sectors;
if (hostdata) {
- idx = hostdata->hdid - 1;
- if (sdev->channel == hdev->devices[idx].channel &&
- sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
- sdev->lun < hdev->devices[idx].lun) {
- if (SPRAID_DEV_INFO_ATTR_PT(hdev->devices[idx].attr))
- timeout = 30 * HZ;
- else
- timeout = scmd_tmout_nonpt * HZ;
- max_sec = le16_to_cpu(hdev->devices[idx].max_io_kb)
- << 1;
- } else {
- dev_err(hdev->dev,
- "[%s] err, sdev->channel:id:lun[%d:%d:%lld];"
- "devices[%d], channel:target:lun[%d:%d:%d]\n",
- __func__, sdev->channel, sdev->id, sdev->lun,
- idx, hdev->devices[idx].channel,
- hdev->devices[idx].target,
- hdev->devices[idx].lun);
- }
+ if (SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ timeout = scmd_tmout_vd * HZ;
+ else if (SPRAID_DEV_INFO_ATTR_RAWDISK(hostdata->attr))
+ timeout = scmd_tmout_rawdisk * HZ;
+ max_sec = hostdata->max_io_kb << 1;
} else {
dev_err(hdev->dev, "[%s] err, sdev->hostdata is null\n",
__func__);
@@ -1018,11 +1013,12 @@ static int spraid_slave_configure(struct scsi_device *sdev)
if ((max_sec == 0) || (max_sec > sdev->host->max_sectors))
max_sec = sdev->host->max_sectors;
- dev_info(hdev->dev,
- "[%s] sdev->channel:id:lun[%d:%d:%lld];"
- " scmd_timeout[%d]s, maxsec[%d]\n",
- __func__, sdev->channel, sdev->id,
- sdev->lun, timeout / HZ, max_sec);
+ if (!max_io_force)
+ blk_queue_max_hw_sectors(sdev->request_queue, max_sec);
+
+ dev_info(hdev->dev, "[%s] sdev->channel:id:lun[%d:%d:%lld];"
+ "scmd_timeout[%d]s, maxsec[%d]\n", __func__, sdev->channel,
+ sdev->id, sdev->lun, timeout / HZ, max_sec);
return 0;
}
@@ -1677,7 +1673,7 @@ static int spraid_submit_admin_sync_cmd(struct spraid_dev *hdev,
cmd->usr_cmd.opcode, cmd->usr_cmd.info_0.subopcode);
WRITE_ONCE(adm_cmd->state, SPRAID_CMD_TIMEOUT);
spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
- return -EINVAL;
+ return -ETIME;
}
if (result0)
@@ -2630,9 +2626,6 @@ static int spraid_user_admin_cmd(struct spraid_dev *hdev, struct bsg_job *job)
return -EBUSY;
}
- dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init\n",
- __func__, cmd->opcode, cmd->info_0.subopcode);
-
memset(&admin_cmd, 0, sizeof(admin_cmd));
admin_cmd.common.opcode = cmd->opcode;
admin_cmd.common.flags = cmd->flags;
@@ -2659,10 +2652,11 @@ static int spraid_user_admin_cmd(struct spraid_dev *hdev, struct bsg_job *job)
memcpy(job->reply, result, sizeof(result));
}
- dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x];"
- " status[0x%x] result0[0x%x] result1[0x%x]\n",
- __func__, cmd->opcode, cmd->info_0.subopcode,
- status, result[0], result[1]);
+ if (status)
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x];"
+ " status[0x%x] result0[0x%x] result1[0x%x]\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode,
+ status, result[0], result[1]);
spraid_bsg_unmap_data(hdev, job);
@@ -2742,7 +2736,7 @@ static int spraid_submit_ioq_sync_cmd(struct spraid_dev *hdev,
(le32_to_cpu(cmd->common.cdw3[0]) & 0xffff));
WRITE_ONCE(pt_cmd->state, SPRAID_CMD_TIMEOUT);
spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
- return -EINVAL;
+ return -ETIME;
}
if (result && reslen) {
@@ -3140,7 +3134,7 @@ static int spraid_abort_handler(struct scsi_cmnd *scmd)
dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, aborting\n", cid, hwq);
ret = spraid_send_abort_cmd(hdev, hostdata->hdid, hwq, cid);
- if (ret != ADMIN_ERR_TIMEOUT) {
+ if (ret != -ETIME) {
ret = spraid_wait_abnl_cmd_done(iod);
if (ret) {
dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed;"
@@ -3623,7 +3617,7 @@ static int spraid_bsg_host_dispatch(struct bsg_job *job)
struct spraid_bsg_request *bsg_req = job->request;
int ret = 0;
- dev_info(hdev->dev, "[%s] msgcode[%d], msglen[%d], timeout[%d];"
+ dev_log_dbg(hdev->dev, "[%s] msgcode[%d], msglen[%d], timeout[%d];"
" req_nsge[%d], req_len[%d]\n",
__func__, bsg_req->msgcode, job->request_len,
rq->timeout, job->request_payload.sg_cnt,
--
2.27.0
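For readers of change 1 above, the timeout-selection policy reduces to a small predicate over the device attribute bits. The sketch below mirrors the SPRAID_DEV_INFO_ATTR_VD/SPRAID_DEV_INFO_ATTR_RAWDISK tests from the driver header; the helper name and the fallback value are illustrative, not part of the driver.

/* Attribute tests, mirroring SPRAID_DEV_INFO_ATTR_VD/_RAWDISK. */
#define ATTR_VD(attr)		(((attr) & 0x02) == 0x0)
#define ATTR_RAWDISK(attr)	((attr) & 0x20)

/* Hypothetical helper: pick a SCSI command timeout by device type. */
static unsigned int pick_scmd_timeout(unsigned char attr,
				      unsigned int tmout_vd,
				      unsigned int tmout_rawdisk)
{
	if (ATTR_VD(attr))
		return tmout_vd;
	if (ATTR_RAWDISK(attr))
		return tmout_rawdisk;
	return tmout_rawdisk;	/* conservative default */
}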
[PATCH openEuler-1.0-LTS 1/2] arm64/mpam: fix mpam probe error for wrong init order
by Yang Yingliang 06 Jan '22
From: Xingang Wang <wangxingang5(a)huawei.com>
ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA
---------------------------------------------------
The mpam init procedure failed when probing with ACPI:
[ 1.148657 ] ACPI MPAM: No CPU has cache with PPTT reference 0x72
[ 1.148658 ] ACPI MPAM: All CPUs must be online to probe mpam.
[ 1.148660 ] ACPI MPAM: discovery failed: -19
This is because mpam needs to be probed after all CPUs are online:
arm_mpam_driver_init must be called after cacheinfo_sysfs_init, so the
device_initcall should be replaced with device_initcall_sync.
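The ordering guarantee relied on here is that *_initcall_sync() functions run after all plain *_initcall() functions at the same level. A minimal sketch, assuming only <linux/init.h>; the probe function is illustrative:

#include <linux/init.h>

/* Runs after every plain device_initcall(), including
 * cacheinfo_sysfs_init(), so the cacheinfo structures are already
 * populated when this probe executes. */
static int __init example_late_probe(void)
{
	return 0;
}
device_initcall_sync(example_late_probe);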
Fixes: b45bdb5a8604 ("arm64/mpam: add device tree support for mpam initialization")
Signed-off-by: Xingang Wang <wangxingang5(a)huawei.com>
Reviewed-by: Wang ShaoBo <bobo.shaobowang(a)huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/arm64/kernel/mpam/mpam_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/mpam/mpam_device.c b/arch/arm64/kernel/mpam/mpam_device.c
index 4910e99348798..c7b5c50d431bd 100644
--- a/arch/arm64/kernel/mpam/mpam_device.c
+++ b/arch/arm64/kernel/mpam/mpam_device.c
@@ -1876,4 +1876,4 @@ static int __init arm_mpam_driver_init(void)
* We want to run after cacheinfo_sysfs_init() has caused the cacheinfo
* structures to be populated. That runs as a device_initcall.
*/
-device_initcall(arm_mpam_driver_init);
+device_initcall_sync(arm_mpam_driver_init);
--
2.25.1
06 Jan '22
Add CONFIG_KABI_RESERVE to control KABI padding reserve and enable it by
default.
Zheng Zengkai (2):
KABI: Add CONFIG_KABI_RESERVE to control KABI padding reserve
openeuler_defconfig: Enable CONFIG_KABI_RESERVE for x86 and arm64
Kconfig | 8 ++++++++
arch/arm64/configs/openeuler_defconfig | 1 +
arch/x86/configs/openeuler_defconfig | 1 +
include/linux/kabi.h | 6 +++++-
4 files changed, 15 insertions(+), 1 deletion(-)
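For illustration, a minimal sketch of how the reserved padding is typically used. The struct is hypothetical, and it assumes KABI_RESERVE(n) expands to an unused placeholder member when CONFIG_KABI_RESERVE=y and to nothing otherwise:

#include <linux/kabi.h>

/* Reserved slots keep the struct layout and size stable, so a later
 * fix can repurpose a slot for a new field without breaking kABI. */
struct example_driver_ops {
	int (*probe)(void *dev);
	void (*remove)(void *dev);
	KABI_RESERVE(1)
	KABI_RESERVE(2)
};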
--
2.20.1
hulk inclusion
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MZU1
CVE: NA
---------------------------
On the x86 platform, with make allmodconfig && make -j64, the
following two build errors are reported.
First error:
- build failed:
- In file included from <command-line>:32:0:
./usr/include/asm/bootparam.h:45:10: fatal error: linux/kabi.h: No such file or directory
#include <linux/kabi.h>
^~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [usr/include/asm/bootparam.hdrtest] Error 1
Second error:
./arch/x86/include/asm/paravirt_types.h:198:2: error: expected specifier-qualifier-list before ‘KABI_RESERVE’
KABI_RESERVE(1)
^~~~~~~~~~~~
./arch/x86/include/asm/paravirt_types.h:286:2: error: expected specifier-qualifier-list before ‘KABI_RESERVE’
KABI_RESERVE(1)
^~~~~~~~~~~~
./arch/x86/include/asm/paravirt_types.h:309:2: error: expected specifier-qualifier-list before ‘KABI_RESERVE’
KABI_RESERVE(1)
^~~~~~~~~~~~
make[1]: *** [scripts/Makefile.build:117: arch/x86/kernel/asm-offsets.s] Error 1
To fix the first error, revert commit 3ba63bacfc ("bootparam: Add kabi_reserve in bootparam").
To fix the second error, include <linux/kabi.h> in arch/x86/include/asm/paravirt_types.h.
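The second error illustrates a general rule: any header that expands KABI_RESERVE() must include <linux/kabi.h> itself, otherwise the preprocessor leaves the bare token and the compiler reports "expected specifier-qualifier-list". A minimal sketch with a hypothetical struct:

#include <linux/kabi.h>	/* must come before any KABI_RESERVE() use */

struct pv_example_ops {
	unsigned long (*save_fl)(void);
	KABI_RESERVE(1)
};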
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
arch/x86/include/asm/paravirt_types.h | 2 ++
arch/x86/include/uapi/asm/bootparam.h | 2 --
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 98eb135b1888..17a2308ef3af 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -2,6 +2,8 @@
#ifndef _ASM_X86_PARAVIRT_TYPES_H
#define _ASM_X86_PARAVIRT_TYPES_H
+#include <linux/kabi.h>
+
/* Bitmask of what can be clobbered: usually at least eax. */
#define CLBR_NONE 0
#define CLBR_EAX (1 << 0)
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index b1dce768d073..600a141c8805 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -42,7 +42,6 @@
#include <linux/types.h>
#include <linux/screen_info.h>
#include <linux/apm_bios.h>
-#include <linux/kabi.h>
#include <linux/edd.h>
#include <asm/ist.h>
#include <video/edid.h>
@@ -103,7 +102,6 @@ struct setup_header {
__u32 init_size;
__u32 handover_offset;
__u32 kernel_info_offset;
- KABI_RESERVE(1)
} __attribute__((packed));
struct sys_desc_table {
--
2.20.1
[PATCH openEuler-1.0-LTS 1/3] bpf: Use dedicated bpf_trace_printk event instead of trace_printk()
by Liu Xinpeng 06 Jan '22
From: Alan Maguire <alan.maguire(a)oracle.com>
mainline inclusion
from mainline-v5.10.78
commit ac5a72ea5c8989871e61f6bb0852e0f91de51ebe
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4PKDW
CVE: NA
--------------------------------
The bpf helper bpf_trace_printk() uses trace_printk() under the hood.
This leads to an alarming warning message originating from trace
buffer allocation which occurs the first time a program using
bpf_trace_printk() is loaded.
We can instead create a trace event for bpf_trace_printk() and enable
it in-kernel when/if we encounter a program using the
bpf_trace_printk() helper. With this approach, trace_printk()
is not used directly and no warning message appears.
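For context, loading any program that calls bpf_trace_printk() takes this path. A minimal libbpf-style sketch (the tracepoint attach point is illustrative):

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tracepoint/syscalls/sys_enter_execve")
int trace_execve(void *ctx)
{
	char fmt[] = "execve called\n";

	/* Loading this program enables the bpf_trace_printk event. */
	bpf_trace_printk(fmt, sizeof(fmt));
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

With this patch applied, the output reaches the trace buffer via the bpf_trace/bpf_trace_printk event rather than through trace_printk() directly.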
This work was started by Steven (see Link) and finished by Alan; added
Steven's Signed-off-by with his permission.
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Alan Maguire <alan.maguire(a)oracle.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Acked-by: Andrii Nakryiko <andriin(a)fb.com>
Link: https://lore.kernel.org/r/20200628194334.6238b933@oasis.local.home
Link: https://lore.kernel.org/bpf/1594641154-18897-2-git-send-email-alan.maguire@…
Signed-off-by: Liu Xinpeng <liuxp11(a)chinatelecom.cn> # openEuler_contributor
Signed-off-by: Ctyun Kernel <ctyuncommiter01(a)chinatelecom.cn> # openEuler_contributor
---
kernel/trace/Makefile | 2 ++
kernel/trace/bpf_trace.c | 42 +++++++++++++++++++++++++++++++++++++-----
kernel/trace/bpf_trace.h | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 73 insertions(+), 5 deletions(-)
create mode 100644 kernel/trace/bpf_trace.h
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index f81dadbc7..846b6b0 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -28,6 +28,8 @@ ifdef CONFIG_GCOV_PROFILE_FTRACE
GCOV_PROFILE := y
endif
+CFLAGS_bpf_trace.o := -I$(src)
+
CFLAGS_trace_benchmark.o := -I$(src)
CFLAGS_trace_events_filter.o := -I$(src)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e0b0e92..f359f79 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -11,12 +11,16 @@
#include <linux/uaccess.h>
#include <linux/ctype.h>
#include <linux/kprobes.h>
+#include <linux/spinlock.h>
#include <linux/syscalls.h>
#include <linux/error-injection.h>
#include "trace_probe.h"
#include "trace.h"
+#define CREATE_TRACE_POINTS
+#include "bpf_trace.h"
+
#ifdef CONFIG_MODULES
struct bpf_trace_module {
struct module *module;
@@ -318,6 +322,30 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
return &bpf_probe_write_user_proto;
}
+static DEFINE_RAW_SPINLOCK(trace_printk_lock);
+
+#define BPF_TRACE_PRINTK_SIZE 1024
+
+static inline __printf(1, 0) int bpf_do_trace_printk(const char *fmt, ...)
+{
+ static char buf[BPF_TRACE_PRINTK_SIZE];
+ unsigned long flags;
+ va_list ap;
+ int ret;
+
+ raw_spin_lock_irqsave(&trace_printk_lock, flags);
+ va_start(ap, fmt);
+ ret = vsnprintf(buf, sizeof(buf), fmt, ap);
+ va_end(ap);
+ /* vsnprintf() will not append null for zero-length strings */
+ if (ret == 0)
+ buf[0] = '\0';
+ trace_bpf_trace_printk(buf);
+ raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
+
+ return ret;
+}
+
/*
* Only limited trace_printk() conversion specifiers allowed:
* %d %i %u %x %ld %li %lu %lx %lld %lli %llu %llx %p %s
@@ -408,8 +436,7 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
*/
#define __BPF_TP_EMIT() __BPF_ARG3_TP()
#define __BPF_TP(...) \
- __trace_printk(0 /* Fake ip */, \
- fmt, ##__VA_ARGS__)
+ bpf_do_trace_printk(fmt, ##__VA_ARGS__)
#define __BPF_ARG1_TP(...) \
((mod[0] == 2 || (mod[0] == 1 && __BITS_PER_LONG == 64)) \
@@ -446,10 +473,15 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
{
/*
- * this program might be calling bpf_trace_printk,
- * so allocate per-cpu printk buffers
+ * This program might be calling bpf_trace_printk,
+ * so enable the associated bpf_trace/bpf_trace_printk event.
+ * Repeat this each time as it is possible a user has
+ * disabled bpf_trace_printk events. By loading a program
+ * calling bpf_trace_printk() however the user has expressed
+ * the intent to see such events.
*/
- trace_printk_init_buffers();
+ if (trace_set_clr_event("bpf_trace", "bpf_trace_printk", 1))
+ pr_warn_ratelimited("could not enable bpf_trace_printk events");
return &bpf_trace_printk_proto;
}
diff --git a/kernel/trace/bpf_trace.h b/kernel/trace/bpf_trace.h
new file mode 100644
index 00000000..9acbc11
--- /dev/null
+++ b/kernel/trace/bpf_trace.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM bpf_trace
+
+#if !defined(_TRACE_BPF_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+
+#define _TRACE_BPF_TRACE_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(bpf_trace_printk,
+
+ TP_PROTO(const char *bpf_string),
+
+ TP_ARGS(bpf_string),
+
+ TP_STRUCT__entry(
+ __string(bpf_string, bpf_string)
+ ),
+
+ TP_fast_assign(
+ __assign_str(bpf_string, bpf_string);
+ ),
+
+ TP_printk("%s", __get_str(bpf_string))
+);
+
+#endif /* _TRACE_BPF_TRACE_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE bpf_trace
+
+#include <trace/define_trace.h>
--
1.8.3.1
06 Jan '22
Ramaxel inclusion
category: features
bugzilla: https://gitee.com/openeuler/kernel/issues/I4JXCG
CVE: NA
Changes:
1. Use bsg module to replace with ioctrl
2. Split scmd_tmout_nonpt into two parameters:
scmd_tmout_vd/scmd_tmout_rawdisk
3. Return -ETIME instead of -EINVAL when a command times out.
4. Add one module parameter: max_io_force.
5. Remove some unnecessary module parameters.
6. Report disks in the order of channel/target id.
7. Add host_reset handler.
8. Use get_unaligned_be24.
Signed-off-by: Yanling Song <songyl(a)ramaxel.com>
Reviewed-by: Jiang Yu<yujiang(a)ramaxel.com>
---
drivers/scsi/spraid/Kconfig | 6 +-
drivers/scsi/spraid/spraid.h | 142 ++-
drivers/scsi/spraid/spraid_main.c | 1633 +++++++++++++++--------------
3 files changed, 968 insertions(+), 813 deletions(-)
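As an aside on change 8, the unaligned helpers replace open-coded shift-and-or byte assembly. A minimal sketch of decoding the 21-bit LBA of a READ(6) CDB, mirroring the hunk in spraid_setup_rw_cmd() below; the wrapper function is illustrative:

#include <linux/types.h>
#include <asm/unaligned.h>

/* READ(6): bytes 1..3 carry a big-endian 21-bit LBA. */
static u32 read6_lba(const u8 *cdb)
{
	return get_unaligned_be24(&cdb[1]) & 0x1FFFFF;
}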
diff --git a/drivers/scsi/spraid/Kconfig b/drivers/scsi/spraid/Kconfig
index 83962efaab07..bfbba3db8db0 100644
--- a/drivers/scsi/spraid/Kconfig
+++ b/drivers/scsi/spraid/Kconfig
@@ -5,7 +5,9 @@
config RAMAXEL_SPRAID
tristate "Ramaxel spraid Adapter"
depends on PCI && SCSI
+ select BLK_DEV_BSGLIB
depends on ARM64 || X86_64
- default m
help
- This driver supports Ramaxel spraid driver.
+ This driver supports the Ramaxel SPRxxx series
+ RAID controllers, which have a PCIe Gen4 host
+ interface and support SAS/SATA HDDs and SSDs.
diff --git a/drivers/scsi/spraid/spraid.h b/drivers/scsi/spraid/spraid.h
index da46d8e1b4b6..983d7af2faa8 100644
--- a/drivers/scsi/spraid/spraid.h
+++ b/drivers/scsi/spraid/spraid.h
@@ -1,4 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
#ifndef __SPRAID_H_
#define __SPRAID_H_
@@ -24,7 +25,7 @@
#define SENSE_SIZE(depth) ((depth) * SCSI_SENSE_BUFFERSIZE)
#define SPRAID_AQ_DEPTH 128
-#define SPRAID_NR_AEN_COMMANDS 1
+#define SPRAID_NR_AEN_COMMANDS 16
#define SPRAID_AQ_BLK_MQ_DEPTH (SPRAID_AQ_DEPTH - SPRAID_NR_AEN_COMMANDS)
#define SPRAID_AQ_MQ_TAG_DEPTH (SPRAID_AQ_BLK_MQ_DEPTH - 1)
@@ -44,7 +45,7 @@
#define SMALL_POOL_SIZE 256
#define MAX_SMALL_POOL_NUM 16
-#define MAX_CMD_PER_DEV 32
+#define MAX_CMD_PER_DEV 64
#define MAX_CDB_LEN 32
#define SPRAID_UP_TO_MULTY4(x) (((x) + 4) & (~0x03))
@@ -53,7 +54,7 @@
#define PCI_VENDOR_ID_RAMAXEL_LOGIC 0x1E81
-#define SPRAID_SERVER_DEVICE_HAB_DID 0x2100
+#define SPRAID_SERVER_DEVICE_HBA_DID 0x2100
#define SPRAID_SERVER_DEVICE_RAID_DID 0x2200
#define IO_6_DEFAULT_TX_LEN 256
@@ -142,11 +143,15 @@ enum {
enum {
SPRAID_AEN_DEV_CHANGED = 0x00,
+ SPRAID_AEN_FW_ACT_START = 0x01,
SPRAID_AEN_HOST_PROBING = 0x10,
};
enum {
- SPRAID_AEN_TIMESYN = 0x07
+ SPRAID_AEN_TIMESYN = 0x00,
+ SPRAID_AEN_FW_ACT_FINISH = 0x02,
+ SPRAID_AEN_EVENT_MIN = 0x80,
+ SPRAID_AEN_EVENT_MAX = 0xff,
};
enum {
@@ -175,6 +180,16 @@ enum spraid_state {
SPRAID_DEAD,
};
+enum {
+ SPRAID_CARD_HBA,
+ SPRAID_CARD_RAID,
+};
+
+enum spraid_cmd_type {
+ SPRAID_CMD_ADM,
+ SPRAID_CMD_IOPT,
+};
+
struct spraid_completion {
__le32 result;
union {
@@ -217,8 +232,6 @@ struct spraid_dev {
struct dma_pool *prp_page_pool;
struct dma_pool *prp_small_pool[MAX_SMALL_POOL_NUM];
mempool_t *iod_mempool;
- struct blk_mq_tag_set admin_tagset;
- struct request_queue *admin_q;
void __iomem *bar;
u32 max_qid;
u32 num_vecs;
@@ -232,23 +245,27 @@ struct spraid_dev {
u32 ctrl_config;
u32 online_queues;
u64 cap;
- struct device ctrl_device;
- struct cdev cdev;
int instance;
struct spraid_ctrl_info *ctrl_info;
struct spraid_dev_info *devices;
- struct spraid_ioq_ptcmd *ioq_ptcmds;
+ struct spraid_cmd *adm_cmds;
+ struct list_head adm_cmd_list;
+ spinlock_t adm_cmd_lock;
+
+ struct spraid_cmd *ioq_ptcmds;
struct list_head ioq_pt_list;
spinlock_t ioq_pt_lock;
- struct work_struct aen_work;
struct work_struct scan_work;
struct work_struct timesyn_work;
struct work_struct reset_work;
+ struct work_struct fw_act_work;
enum spraid_state state;
spinlock_t state_lock;
+
+ struct request_queue *bsg_queue;
};
struct spraid_sgl_desc {
@@ -347,6 +364,35 @@ struct spraid_get_info {
__u32 rsvd12[4];
};
+struct spraid_usr_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ union {
+ struct {
+ __le16 subopcode;
+ __le16 rsvd1;
+ } info_0;
+ __le32 cdw2;
+ };
+ union {
+ struct {
+ __le16 data_len;
+ __le16 param_len;
+ } info_1;
+ __le32 cdw3;
+ };
+ __u64 metadata;
+ union spraid_data_ptr dptr;
+ __le32 cdw10;
+ __le32 cdw11;
+ __le32 cdw12;
+ __le32 cdw13;
+ __le32 cdw14;
+ __le32 cdw15;
+};
+
enum {
SPRAID_CMD_FLAG_SGL_METABUF = (1 << 6),
SPRAID_CMD_FLAG_SGL_METASEG = (1 << 7),
@@ -393,6 +439,7 @@ struct spraid_admin_command {
struct spraid_get_info get_info;
struct spraid_abort_cmd abort;
struct spraid_reset_cmd reset;
+ struct spraid_usr_cmd usr_cmd;
};
};
@@ -456,9 +503,6 @@ struct spraid_ioq_command {
};
};
-#define SPRAID_IOCTL_RESET_CMD _IOWR('N', 0x80, struct spraid_passthru_common_cmd)
-#define SPRAID_IOCTL_ADMIN_CMD _IOWR('N', 0x41, struct spraid_passthru_common_cmd)
-
struct spraid_passthru_common_cmd {
__u8 opcode;
__u8 flags;
@@ -494,8 +538,6 @@ struct spraid_passthru_common_cmd {
__u32 result1;
};
-#define SPRAID_IOCTL_IOQ_CMD _IOWR('N', 0x42, struct spraid_ioq_passthru_cmd)
-
struct spraid_ioq_passthru_cmd {
__u8 opcode;
__u8 flags;
@@ -560,7 +602,21 @@ struct spraid_ioq_passthru_cmd {
__u32 result1;
};
-struct spraid_ioq_ptcmd {
+struct spraid_bsg_request {
+ u32 msgcode;
+ u32 control;
+ union {
+ struct spraid_passthru_common_cmd admcmd;
+ struct spraid_ioq_passthru_cmd ioqcmd;
+ };
+};
+
+enum {
+ SPRAID_BSG_ADM,
+ SPRAID_BSG_IOQ,
+};
+
+struct spraid_cmd {
int qid;
int cid;
u32 result0;
@@ -572,14 +628,6 @@ struct spraid_ioq_ptcmd {
struct list_head list;
};
-struct spraid_admin_request {
- struct spraid_admin_command *cmd;
- u32 result0;
- u32 result1;
- u16 flags;
- u16 status;
-};
-
struct spraid_queue {
struct spraid_dev *hdev;
spinlock_t sq_lock; /* spinlock for lock handling */
@@ -607,7 +655,6 @@ struct spraid_queue {
};
struct spraid_iod {
- struct spraid_admin_request req;
struct spraid_queue *spraidq;
enum spraid_cmd_state state;
int npages;
@@ -623,13 +670,51 @@ struct spraid_iod {
};
#define SPRAID_DEV_INFO_ATTR_BOOT(attr) ((attr) & 0x01)
-#define SPRAID_DEV_INFO_ATTR_HDD(attr) ((attr) & 0x02)
+#define SPRAID_DEV_INFO_ATTR_VD(attr) (((attr) & 0x02) == 0x0)
#define SPRAID_DEV_INFO_ATTR_PT(attr) (((attr) & 0x22) == 0x02)
#define SPRAID_DEV_INFO_ATTR_RAWDISK(attr) ((attr) & 0x20)
#define SPRAID_DEV_INFO_FLAG_VALID(flag) ((flag) & 0x01)
#define SPRAID_DEV_INFO_FLAG_CHANGE(flag) ((flag) & 0x02)
+#define BGTASK_TYPE_REBUILD 4
+#define USR_CMD_READ 0xc2
+#define USR_CMD_RDLEN 0x1000
+#define USR_CMD_VDINFO 0x704
+#define USR_CMD_BGTASK 0x504
+#define VDINFO_PARAM_LEN 0x04
+
+struct spraid_vd_info {
+ __u8 name[32];
+ __le16 id;
+ __u8 rg_id;
+ __u8 rg_level;
+ __u8 sg_num;
+ __u8 sg_disk_num;
+ __u8 vd_status;
+ __u8 vd_type;
+ __u8 rsvd1[4056];
+};
+
+#define MAX_REALTIME_BGTASK_NUM 32
+
+struct bgtask_info {
+ __u8 type;
+ __u8 progress;
+ __u8 rate;
+ __u8 rsvd0;
+ __le16 vd_id;
+ __le16 time_left;
+ __u8 rsvd1[4];
+};
+
+struct spraid_bgtask {
+ __u8 sw;
+ __u8 task_num;
+ __u8 rsvd[6];
+ struct bgtask_info bgtask[MAX_REALTIME_BGTASK_NUM];
+};
+
struct spraid_dev_info {
__le32 hdid;
__le16 target;
@@ -649,6 +734,11 @@ struct spraid_dev_list {
struct spraid_sdev_hostdata {
u32 hdid;
+ u16 max_io_kb;
+ u8 attr;
+ u8 flag;
+ u8 rg_id;
+ u8 rsvd[3];
};
#endif
diff --git a/drivers/scsi/spraid/spraid_main.c b/drivers/scsi/spraid/spraid_main.c
index a0a75ecb0027..c6d2a0b8e35e 100644
--- a/drivers/scsi/spraid/spraid_main.c
+++ b/drivers/scsi/spraid/spraid_main.c
@@ -1,8 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
-/*
- * Linux spraid device driver
- * Copyright(c) 2021 Ramaxel Memory Technology, Ltd
- */
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
+
+/* Ramaxel Raid SPXXX Series Linux Driver */
+
#define pr_fmt(fmt) "spraid: " fmt
#include <linux/sched/signal.h>
@@ -23,6 +23,9 @@
#include <linux/debugfs.h>
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/blkdev.h>
+#include <linux/bsg-lib.h>
+#include <asm/unaligned.h>
+#include <linux/sort.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
@@ -31,27 +34,24 @@
#include <scsi/scsi_transport.h>
#include <scsi/scsi_dbg.h>
+
#include "spraid.h"
static u32 admin_tmout = 60;
module_param(admin_tmout, uint, 0644);
MODULE_PARM_DESC(admin_tmout, "admin commands timeout (seconds)");
-static u32 scmd_tmout_pt = 30;
-module_param(scmd_tmout_pt, uint, 0644);
-MODULE_PARM_DESC(scmd_tmout_pt, "scsi commands timeout for passthrough(seconds)");
+static u32 scmd_tmout_rawdisk = 180;
+module_param(scmd_tmout_rawdisk, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_rawdisk, "scsi commands timeout for rawdisk(seconds)");
-static u32 scmd_tmout_nonpt = 180;
-module_param(scmd_tmout_nonpt, uint, 0644);
-MODULE_PARM_DESC(scmd_tmout_nonpt, "scsi commands timeout for rawdisk&raid(seconds)");
+static u32 scmd_tmout_vd = 180;
+module_param(scmd_tmout_vd, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_vd, "scsi commands timeout for vd(seconds)");
-static u32 wait_abl_tmout = 3;
-module_param(wait_abl_tmout, uint, 0644);
-MODULE_PARM_DESC(wait_abl_tmout, "wait abnormal io timeout(seconds)");
-
-static bool use_sgl_force;
-module_param(use_sgl_force, bool, 0644);
-MODULE_PARM_DESC(use_sgl_force, "force IO use sgl format, default false");
+static bool max_io_force;
+module_param(max_io_force, bool, 0644);
+MODULE_PARM_DESC(max_io_force, "force max_hw_sectors_kb = 1024, default false (performance first)");
static int ioq_depth_set(const char *val, const struct kernel_param *kp);
static const struct kernel_param_ops ioq_depth_ops = {
@@ -106,16 +106,20 @@ static const struct kernel_param_ops small_pool_num_ops = {
.get = param_get_byte,
};
+/* It was found that the spinlock of a single pool is heavily
+ * contended across multiple CPUs, so multiple pools are
+ * introduced to reduce the contention.
+ */
static unsigned char small_pool_num = 4;
module_param_cb(small_pool_num, &small_pool_num_ops, &small_pool_num, 0644);
MODULE_PARM_DESC(small_pool_num, "set prp small pool num, default 4, MAX 16");
static void spraid_free_queue(struct spraid_queue *spraidq);
static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result);
-static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result);
+static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result, u32 result1);
static DEFINE_IDA(spraid_instance_ida);
-static dev_t spraid_chr_devt;
+
static struct class *spraid_class;
#define SPRAID_CAP_TIMEOUT_UNIT_MS (HZ / 2)
@@ -131,9 +135,8 @@ static struct workqueue_struct *spraid_wq;
#define SPRAID_DRV_VERSION "1.0.0.0"
#define ADMIN_TIMEOUT (admin_tmout * HZ)
-#define ADMIN_ERR_TIMEOUT 32757
-#define SPRAID_WAIT_ABNL_CMD_TIMEOUT (wait_abl_tmout * 2)
+#define SPRAID_WAIT_ABNL_CMD_TIMEOUT (3 * 2)
#define SPRAID_DMA_MSK_BIT_MAX 64
@@ -147,6 +150,13 @@ enum FW_STAT_CODE {
FW_STAT_NEED_RETRY
};
+static const char * const raid_levels[] = {"0", "1", "5", "6", "10", "50", "60", "NA"};
+
+static const char * const raid_states[] = {
+ "NA", "NORMAL", "FAULT", "DEGRADE", "NOT_FORMATTED", "FORMATTING", "SANITIZING",
+ "INITIALIZING", "INITIALIZE_FAIL", "DELETING", "DELETE_FAIL", "WRITE_PROTECT"
+};
+
static int ioq_depth_set(const char *val, const struct kernel_param *kp)
{
int n = 0;
@@ -231,12 +241,6 @@ static int spraid_pci_enable(struct spraid_dev *hdev)
goto disable;
}
- ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
- if (ret < 0) {
- dev_err(hdev->dev, "Allocate one IRQ for setup admin channel failed\n");
- goto disable;
- }
-
hdev->cap = lo_hi_readq(hdev->bar + SPRAID_REG_CAP);
hdev->ioq_depth = min_t(u32, SPRAID_CAP_MQES(hdev->cap) + 1, io_queue_depth);
hdev->db_stride = 1 << SPRAID_CAP_STRIDE(hdev->cap);
@@ -246,13 +250,20 @@ static int spraid_pci_enable(struct spraid_dev *hdev)
dev_err(hdev->dev, "err, dma mask invalid[%llu], set to default\n", maskbit);
maskbit = SPRAID_DMA_MSK_BIT_MAX;
}
- if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(maskbit))) {
+ if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(maskbit)) &&
+ dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32))) {
dev_err(hdev->dev, "set dma mask and coherent failed\n");
goto disable;
}
dev_info(hdev->dev, "set dma mask[%llu] success\n", maskbit);
+ ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
+ if (ret < 0) {
+ dev_err(hdev->dev, "Allocate one IRQ for setup admin channel failed\n");
+ goto disable;
+ }
+
pci_enable_pcie_error_reporting(pdev);
pci_save_state(pdev);
@@ -263,12 +274,6 @@ static int spraid_pci_enable(struct spraid_dev *hdev)
return ret;
}
-static inline
-struct spraid_admin_request *spraid_admin_req(struct request *req)
-{
- return blk_mq_rq_to_pdu(req);
-}
-
static int spraid_npages_prp(u32 size, struct spraid_dev *hdev)
{
u32 nprps = DIV_ROUND_UP(size + hdev->page_size, hdev->page_size);
@@ -419,7 +424,7 @@ static void spraid_submit_cmd(struct spraid_queue *spraidq, const void *cmd)
writel(spraidq->sq_tail, spraidq->q_db);
spin_unlock_irqrestore(&spraidq->sq_lock, flags);
- dev_log_dbg(spraidq->hdev->dev, "cid[%d], qid[%d], opcode[0x%x], flags[0x%x], hdid[%u]\n",
+ dev_log_dbg(spraidq->hdev->dev, "cid[%d] qid[%d], opcode[0x%x], flags[0x%x], hdid[%u]\n",
acd->command_id, spraidq->qid, acd->opcode, acd->flags, le32_to_cpu(acd->hdid));
}
@@ -605,18 +610,15 @@ static void spraid_setup_rw_cmd(struct spraid_dev *hdev,
if (scmd->cmd_len == 6) {
datalength = (u32)(scmd->cmnd[4] == 0 ?
IO_6_DEFAULT_TX_LEN : scmd->cmnd[4]);
- start_lba_lo = ((u32)scmd->cmnd[1] << 16) |
- ((u32)scmd->cmnd[2] << 8) | (u32)scmd->cmnd[3];
+ start_lba_lo = (u32)get_unaligned_be24(&scmd->cmnd[1]);
start_lba_lo &= 0x1FFFFF;
}
/* 10-byte READ(0x28) or WRITE(0x2A) cdb */
else if (scmd->cmd_len == 10) {
- datalength = (u32)scmd->cmnd[8] | ((u32)scmd->cmnd[7] << 8);
- start_lba_lo = ((u32)scmd->cmnd[2] << 24) |
- ((u32)scmd->cmnd[3] << 16) |
- ((u32)scmd->cmnd[4] << 8) | (u32)scmd->cmnd[5];
+ datalength = (u32)get_unaligned_be16(&scmd->cmnd[7]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
if (scmd->cmnd[1] & FUA_MASK)
control |= SPRAID_RW_FUA;
@@ -624,42 +626,26 @@ static void spraid_setup_rw_cmd(struct spraid_dev *hdev,
/* 12-byte READ(0xA8) or WRITE(0xAA) cdb */
else if (scmd->cmd_len == 12) {
- datalength = ((u32)scmd->cmnd[6] << 24) |
- ((u32)scmd->cmnd[7] << 16) |
- ((u32)scmd->cmnd[8] << 8) | (u32)scmd->cmnd[9];
- start_lba_lo = ((u32)scmd->cmnd[2] << 24) |
- ((u32)scmd->cmnd[3] << 16) |
- ((u32)scmd->cmnd[4] << 8) | (u32)scmd->cmnd[5];
+ datalength = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
if (scmd->cmnd[1] & FUA_MASK)
control |= SPRAID_RW_FUA;
}
/* 16-byte READ(0x88) or WRITE(0x8A) cdb */
else if (scmd->cmd_len == 16) {
- datalength = ((u32)scmd->cmnd[10] << 24) |
- ((u32)scmd->cmnd[11] << 16) |
- ((u32)scmd->cmnd[12] << 8) | (u32)scmd->cmnd[13];
- start_lba_lo = ((u32)scmd->cmnd[6] << 24) |
- ((u32)scmd->cmnd[7] << 16) |
- ((u32)scmd->cmnd[8] << 8) | (u32)scmd->cmnd[9];
- start_lba_hi = ((u32)scmd->cmnd[2] << 24) |
- ((u32)scmd->cmnd[3] << 16) |
- ((u32)scmd->cmnd[4] << 8) | (u32)scmd->cmnd[5];
+ datalength = get_unaligned_be32(&scmd->cmnd[10]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[2]);
if (scmd->cmnd[1] & FUA_MASK)
control |= SPRAID_RW_FUA;
}
/* 32-byte READ(0x88) or WRITE(0x8A) cdb */
else if (scmd->cmd_len == 32) {
- datalength = ((u32)scmd->cmnd[28] << 24) |
- ((u32)scmd->cmnd[29] << 16) |
- ((u32)scmd->cmnd[30] << 8) | (u32)scmd->cmnd[31];
- start_lba_lo = ((u32)scmd->cmnd[16] << 24) |
- ((u32)scmd->cmnd[17] << 16) |
- ((u32)scmd->cmnd[18] << 8) | (u32)scmd->cmnd[19];
- start_lba_hi = ((u32)scmd->cmnd[12] << 24) |
- ((u32)scmd->cmnd[13] << 16) |
- ((u32)scmd->cmnd[14] << 8) | (u32)scmd->cmnd[15];
+ datalength = get_unaligned_be32(&scmd->cmnd[28]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[16]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[12]);
if (scmd->cmnd[10] & FUA_MASK)
control |= SPRAID_RW_FUA;
@@ -814,7 +800,7 @@ static void spraid_map_status(struct spraid_iod *iod, struct scsi_cmnd *scmd,
if (scmd->result & SAM_STAT_CHECK_CONDITION) {
memset(scmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
memcpy(scmd->sense_buffer, iod->sense, SCSI_SENSE_BUFFERSIZE);
- set_driver_byte(scmd, DRIVER_SENSE);
+ scmd->result = (scmd->result & 0x00ffffff) | (DRIVER_SENSE << 24);
}
break;
case FW_STAT_ABORTED:
@@ -825,6 +811,8 @@ static void spraid_map_status(struct spraid_iod *iod, struct scsi_cmnd *scmd,
break;
default:
set_host_byte(scmd, DID_BAD_TARGET);
+ dev_warn(iod->spraidq->hdev->dev, "[%s] cid[%d] qid[%d] bad status[0x%x]\n",
+ __func__, cqe->cmd_id, le16_to_cpu(cqe->sq_id), le16_to_cpu(cqe->status));
break;
}
}
@@ -850,14 +838,13 @@ static int spraid_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
int ret;
if (unlikely(!scmd)) {
- dev_err(hdev->dev, "err, scmd is null, return 0\n");
+ dev_err(hdev->dev, "err, scmd is null\n");
return 0;
}
if (unlikely(hdev->state != SPRAID_LIVE)) {
set_host_byte(scmd, DID_NO_CONNECT);
scmd->scsi_done(scmd);
- dev_err(hdev->dev, "[%s] err, hdev state is not live\n", __func__);
return 0;
}
@@ -894,7 +881,7 @@ static int spraid_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
WRITE_ONCE(iod->state, SPRAID_CMD_IN_FLIGHT);
spraid_submit_cmd(ioq, &ioq_cmd);
elapsed = jiffies - scmd->jiffies_at_alloc;
- dev_log_dbg(hdev->dev, "cid[%d], qid[%d] submit IO cost %3ld.%3ld seconds\n",
+ dev_log_dbg(hdev->dev, "cid[%d] qid[%d] submit IO cost %3ld.%3ld seconds\n",
cid, hwq, elapsed / HZ, elapsed % HZ);
return 0;
@@ -945,6 +932,10 @@ static int spraid_slave_alloc(struct scsi_device *sdev)
scan_host:
hostdata->hdid = le32_to_cpu(hdev->devices[idx].hdid);
+ hostdata->max_io_kb = le16_to_cpu(hdev->devices[idx].max_io_kb);
+ hostdata->attr = hdev->devices[idx].attr;
+ hostdata->flag = hdev->devices[idx].flag;
+ hostdata->rg_id = 0xff;
sdev->hostdata = hostdata;
up_read(&hdev->devices_rwsem);
return 0;
@@ -958,30 +949,17 @@ static void spraid_slave_destroy(struct scsi_device *sdev)
static int spraid_slave_configure(struct scsi_device *sdev)
{
- u16 idx;
- unsigned int timeout = scmd_tmout_nonpt * HZ;
+ unsigned int timeout = scmd_tmout_rawdisk * HZ;
struct spraid_dev *hdev = shost_priv(sdev->host);
struct spraid_sdev_hostdata *hostdata = sdev->hostdata;
u32 max_sec = sdev->host->max_sectors;
- if (!hostdata) {
- idx = hostdata->hdid - 1;
- if (sdev->channel == hdev->devices[idx].channel &&
- sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
- sdev->lun < hdev->devices[idx].lun) {
- if (SPRAID_DEV_INFO_ATTR_PT(hdev->devices[idx].attr))
- timeout = scmd_tmout_pt * HZ;
- else
- timeout = scmd_tmout_nonpt * HZ;
- max_sec = le16_to_cpu(hdev->devices[idx].max_io_kb) << 1;
- } else {
- dev_err(hdev->dev, "[%s] err, sdev->channel:id:lun[%d:%d:%lld];"
- "devices[%d], channel:target:lun[%d:%d:%d]\n",
- __func__, sdev->channel, sdev->id, sdev->lun,
- idx, hdev->devices[idx].channel,
- hdev->devices[idx].target,
- hdev->devices[idx].lun);
- }
+ if (hostdata) {
+ if (SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ timeout = scmd_tmout_vd * HZ;
+ else if (SPRAID_DEV_INFO_ATTR_RAWDISK(hostdata->attr))
+ timeout = scmd_tmout_rawdisk * HZ;
+ max_sec = hostdata->max_io_kb << 1;
} else {
dev_err(hdev->dev, "[%s] err, sdev->hostdata is null\n", __func__);
}
@@ -991,7 +969,9 @@ static int spraid_slave_configure(struct scsi_device *sdev)
if ((max_sec == 0) || (max_sec > sdev->host->max_sectors))
max_sec = sdev->host->max_sectors;
- blk_queue_max_hw_sectors(sdev->request_queue, max_sec);
+
+ if (!max_io_force)
+ blk_queue_max_hw_sectors(sdev->request_queue, max_sec);
dev_info(hdev->dev, "[%s] sdev->channel:id:lun[%d:%d:%lld], scmd_timeout[%d]s, maxsec[%d]\n",
__func__, sdev->channel, sdev->id, sdev->lun, timeout / HZ, max_sec);
@@ -1176,6 +1156,75 @@ static inline bool spraid_cqe_pending(struct spraid_queue *spraidq)
spraidq->cq_phase;
}
+static void spraid_sata_report_zone_handle(struct scsi_cmnd *scmd, struct spraid_iod *iod)
+{
+ int i = 0;
+ unsigned int bytes = 0;
+ struct scatterlist *sg = scsi_sglist(scmd);
+
+ scsi_for_each_sg(scmd, sg, iod->nsge, i) {
+ unsigned int offset = 0;
+
+ if (bytes == 0) {
+ char *hdr;
+ u32 list_length;
+ u64 max_lba, opt_lba;
+ u16 same;
+
+ hdr = sg_virt(sg);
+
+ list_length = get_unaligned_le32(&hdr[0]);
+ same = get_unaligned_le16(&hdr[4]);
+ max_lba = get_unaligned_le64(&hdr[8]);
+ opt_lba = get_unaligned_le64(&hdr[16]);
+ put_unaligned_be32(list_length, &hdr[0]);
+ hdr[4] = same & 0xf;
+ put_unaligned_be64(max_lba, &hdr[8]);
+ put_unaligned_be64(opt_lba, &hdr[16]);
+ offset += 64;
+ bytes += 64;
+ }
+ while (offset < sg_dma_len(sg)) {
+ char *rec;
+ u8 cond, type, non_seq, reset;
+ u64 size, start, wp;
+
+ rec = sg_virt(sg) + offset;
+ type = rec[0] & 0xf;
+ cond = (rec[1] >> 4) & 0xf;
+ non_seq = (rec[1] & 2);
+ reset = (rec[1] & 1);
+ size = get_unaligned_le64(&rec[8]);
+ start = get_unaligned_le64(&rec[16]);
+ wp = get_unaligned_le64(&rec[24]);
+ rec[0] = type;
+ rec[1] = (cond << 4) | non_seq | reset;
+ put_unaligned_be64(size, &rec[8]);
+ put_unaligned_be64(start, &rec[16]);
+ put_unaligned_be64(wp, &rec[24]);
+ WARN_ON(offset + 64 > sg_dma_len(sg));
+ offset += 64;
+ bytes += 64;
+ }
+ }
+}
+
+static inline void spraid_handle_ata_cmd(struct spraid_dev *hdev, struct scsi_cmnd *scmd,
+ struct spraid_iod *iod)
+{
+ if (hdev->ctrl_info->card_type != SPRAID_CARD_HBA)
+ return;
+
+ switch (scmd->cmnd[0]) {
+ case ZBC_IN:
+ dev_info(hdev->dev, "[%s] process report zone\n", __func__);
+ spraid_sata_report_zone_handle(scmd, iod);
+ break;
+ default:
+ break;
+ }
+}
+
static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq, struct spraid_completion *cqe)
{
struct spraid_dev *hdev = ioq->hdev;
@@ -1197,12 +1246,12 @@ static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq, struct spraid_com
iod = scsi_cmd_priv(scmd);
elapsed = jiffies - scmd->jiffies_at_alloc;
- dev_log_dbg(hdev->dev, "cid[%d], qid[%d] finish IO cost %3ld.%3ld seconds\n",
+ dev_log_dbg(hdev->dev, "cid[%d] qid[%d] finish IO cost %3ld.%3ld seconds\n",
cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
if (cmpxchg(&iod->state, SPRAID_CMD_IN_FLIGHT, SPRAID_CMD_COMPLETE) !=
SPRAID_CMD_IN_FLIGHT) {
- dev_warn(hdev->dev, "cid[%d], qid[%d] enters abnormal handler, cost %3ld.%3ld seconds\n",
+ dev_warn(hdev->dev, "cid[%d] qid[%d] enters abnormal handler, cost %3ld.%3ld seconds\n",
cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
WRITE_ONCE(iod->state, SPRAID_CMD_TMO_COMPLETE);
@@ -1215,6 +1264,8 @@ static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq, struct spraid_com
return;
}
+ spraid_handle_ata_cmd(hdev, scmd, iod);
+
spraid_map_status(iod, scmd, cqe);
if (iod->nsge) {
iod->nsge = 0;
@@ -1224,38 +1275,36 @@ static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq, struct spraid_com
scmd->scsi_done(scmd);
}
-static inline void spraid_end_admin_request(struct request *req, __le16 status,
- __le32 result0, __le32 result1)
-{
- struct spraid_admin_request *rq = spraid_admin_req(req);
-
- rq->status = le16_to_cpu(status) >> 1;
- rq->result0 = le32_to_cpu(result0);
- rq->result1 = le32_to_cpu(result1);
- blk_mq_complete_request(req);
-}
-
static void spraid_complete_adminq_cmnd(struct spraid_queue *adminq, struct spraid_completion *cqe)
{
- struct blk_mq_tags *tags = adminq->hdev->admin_tagset.tags[0];
- struct request *req;
+ struct spraid_dev *hdev = adminq->hdev;
+ struct spraid_cmd *adm_cmd;
- req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
- if (unlikely(!req)) {
+ adm_cmd = hdev->adm_cmds + cqe->cmd_id;
+ if (unlikely(adm_cmd->state == SPRAID_CMD_IDLE)) {
dev_warn(adminq->hdev->dev, "Invalid id %d completed on queue %d\n",
cqe->cmd_id, le16_to_cpu(cqe->sq_id));
return;
}
- spraid_end_admin_request(req, cqe->status, cqe->result, cqe->result1);
+
+ adm_cmd->status = le16_to_cpu(cqe->status) >> 1;
+ adm_cmd->result0 = le32_to_cpu(cqe->result);
+ adm_cmd->result1 = le32_to_cpu(cqe->result1);
+
+ complete(&adm_cmd->cmd_done);
}
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid);
+
static void spraid_complete_aen(struct spraid_queue *spraidq, struct spraid_completion *cqe)
{
struct spraid_dev *hdev = spraidq->hdev;
u32 result = le32_to_cpu(cqe->result);
- dev_info(hdev->dev, "rcv aen, status[%x], result[%x]\n",
- le16_to_cpu(cqe->status) >> 1, result);
+ dev_info(hdev->dev, "rcv aen, cid[%d], status[0x%x], result[0x%x]\n",
+ cqe->cmd_id, le16_to_cpu(cqe->status) >> 1, result);
+
+ spraid_send_aen(hdev, cqe->cmd_id);
if ((le16_to_cpu(cqe->status) >> 1) != SPRAID_SC_SUCCESS)
return;
@@ -1264,22 +1313,19 @@ static void spraid_complete_aen(struct spraid_queue *spraidq, struct spraid_comp
spraid_handle_aen_notice(hdev, result);
break;
case SPRAID_AEN_VS:
- spraid_handle_aen_vs(hdev, result);
+ spraid_handle_aen_vs(hdev, result, le32_to_cpu(cqe->result1));
break;
default:
dev_warn(hdev->dev, "Unsupported async event type: %u\n",
result & 0x7);
break;
}
- queue_work(spraid_wq, &hdev->aen_work);
}
-static void spraid_put_ioq_ptcmd(struct spraid_dev *hdev, struct spraid_ioq_ptcmd *cmd);
-
static void spraid_complete_ioq_sync_cmnd(struct spraid_queue *ioq, struct spraid_completion *cqe)
{
struct spraid_dev *hdev = ioq->hdev;
- struct spraid_ioq_ptcmd *ptcmd;
+ struct spraid_cmd *ptcmd;
ptcmd = hdev->ioq_ptcmds + (ioq->qid - 1) * SPRAID_PTCMDS_PERQ +
cqe->cmd_id - SPRAID_IO_BLK_MQ_DEPTH;
@@ -1289,8 +1335,6 @@ static void spraid_complete_ioq_sync_cmnd(struct spraid_queue *ioq, struct sprai
ptcmd->result1 = le32_to_cpu(cqe->result1);
complete(&ptcmd->cmd_done);
-
- spraid_put_ioq_ptcmd(hdev, ptcmd);
}
static inline void spraid_handle_cqe(struct spraid_queue *spraidq, u16 idx)
@@ -1304,7 +1348,7 @@ static inline void spraid_handle_cqe(struct spraid_queue *spraidq, u16 idx)
return;
}
- dev_log_dbg(hdev->dev, "cid[%d], qid[%d], result[0x%x], sq_id[%d], status[0x%x]\n",
+ dev_log_dbg(hdev->dev, "cid[%d] qid[%d], result[0x%x], sq_id[%d], status[0x%x]\n",
cqe->cmd_id, spraidq->qid, le32_to_cpu(cqe->result),
le16_to_cpu(cqe->sq_id), le16_to_cpu(cqe->status));
@@ -1452,62 +1496,119 @@ static u32 spraid_bar_size(struct spraid_dev *hdev, u32 nr_ioqs)
return (SPRAID_REG_DBS + ((nr_ioqs + 1) * 8 * hdev->db_stride));
}
-static inline void spraid_clear_spraid_request(struct request *req)
+static int spraid_alloc_admin_cmds(struct spraid_dev *hdev)
{
- if (!(req->rq_flags & RQF_DONTPREP)) {
- spraid_admin_req(req)->flags = 0;
- req->rq_flags |= RQF_DONTPREP;
+ int i;
+
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+ spin_lock_init(&hdev->adm_cmd_lock);
+
+ hdev->adm_cmds = kcalloc_node(SPRAID_AQ_BLK_MQ_DEPTH, sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
+
+ if (!hdev->adm_cmds) {
+ dev_err(hdev->dev, "Alloc admin cmds failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < SPRAID_AQ_BLK_MQ_DEPTH; i++) {
+ hdev->adm_cmds[i].qid = 0;
+ hdev->adm_cmds[i].cid = i;
+ list_add_tail(&(hdev->adm_cmds[i].list), &hdev->adm_cmd_list);
}
+
+ dev_info(hdev->dev, "Alloc admin cmds success, num[%d]\n", SPRAID_AQ_BLK_MQ_DEPTH);
+
+ return 0;
}
-static struct request *spraid_alloc_admin_request(struct request_queue *q,
- struct spraid_admin_command *cmd,
- blk_mq_req_flags_t flags)
+static void spraid_free_admin_cmds(struct spraid_dev *hdev)
{
- u32 op = COMMAND_IS_WRITE(cmd) ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
- struct request *req;
+ kfree(hdev->adm_cmds);
+ hdev->adm_cmds = NULL;
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+}
+
+static struct spraid_cmd *spraid_get_cmd(struct spraid_dev *hdev, enum spraid_cmd_type type)
+{
+ struct spraid_cmd *cmd = NULL;
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
+
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
+
+ spin_lock_irqsave(slock, flags);
+ if (list_empty(head)) {
+ spin_unlock_irqrestore(slock, flags);
+ dev_err(hdev->dev, "err, cmd[%d] list empty\n", type);
+ return NULL;
+ }
+ cmd = list_entry(head->next, struct spraid_cmd, list);
+ list_del_init(&cmd->list);
+ spin_unlock_irqrestore(slock, flags);
- req = blk_mq_alloc_request(q, op, flags);
- if (IS_ERR(req))
- return req;
- req->cmd_flags |= REQ_FAILFAST_DRIVER;
- spraid_clear_spraid_request(req);
- spraid_admin_req(req)->cmd = cmd;
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IN_FLIGHT);
- return req;
+ return cmd;
}
-static int spraid_submit_admin_sync_cmd(struct request_queue *q,
- struct spraid_admin_command *cmd,
- u32 *result, void *buffer,
- u32 bufflen, u32 timeout, int at_head, blk_mq_req_flags_t flags)
+static void spraid_put_cmd(struct spraid_dev *hdev, struct spraid_cmd *cmd,
+ enum spraid_cmd_type type)
{
- struct request *req;
- int ret;
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
- req = spraid_alloc_admin_request(q, cmd, flags);
- if (IS_ERR(req))
- return PTR_ERR(req);
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
- req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
- if (buffer && bufflen) {
- ret = blk_rq_map_kern(q, req, buffer, bufflen, GFP_KERNEL);
- if (ret)
- goto out;
+ spin_lock_irqsave(slock, flags);
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IDLE);
+ list_add_tail(&cmd->list, head);
+ spin_unlock_irqrestore(slock, flags);
+}
+
+
+static int spraid_submit_admin_sync_cmd(struct spraid_dev *hdev, struct spraid_admin_command *cmd,
+ u32 *result0, u32 *result1, u32 timeout)
+{
+ struct spraid_cmd *adm_cmd = spraid_get_cmd(hdev, SPRAID_CMD_ADM);
+
+ if (!adm_cmd) {
+ dev_err(hdev->dev, "err, get admin cmd failed\n");
+ return -EFAULT;
}
- blk_execute_rq(req->q, NULL, req, at_head);
- if (result)
- *result = spraid_admin_req(req)->result0;
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
- if (spraid_admin_req(req)->flags & SPRAID_REQ_CANCELLED)
- ret = -EINTR;
- else
- ret = spraid_admin_req(req)->status;
+ init_completion(&adm_cmd->cmd_done);
-out:
- blk_mq_free_request(req);
- return ret;
+ cmd->common.command_id = adm_cmd->cid;
+ spraid_submit_cmd(&hdev->queues[0], cmd);
+
+ if (!wait_for_completion_timeout(&adm_cmd->cmd_done, timeout)) {
+ dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout, opcode[0x%x] subopcode[0x%x]\n",
+ __func__, adm_cmd->cid, adm_cmd->qid, cmd->usr_cmd.opcode,
+ cmd->usr_cmd.info_0.subopcode);
+ WRITE_ONCE(adm_cmd->state, SPRAID_CMD_TIMEOUT);
+ spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+ return -ETIME;
+ }
+
+ if (result0)
+ *result0 = adm_cmd->result0;
+ if (result1)
+ *result1 = adm_cmd->result1;
+
+ spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+
+ return adm_cmd->status;
}
static int spraid_create_cq(struct spraid_dev *hdev, u16 qid,
@@ -1524,8 +1625,7 @@ static int spraid_create_cq(struct spraid_dev *hdev, u16 qid,
admin_cmd.create_cq.cq_flags = cpu_to_le16(flags);
admin_cmd.create_cq.irq_vector = cpu_to_le16(cq_vector);
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
}
static int spraid_create_sq(struct spraid_dev *hdev, u16 qid,
@@ -1542,8 +1642,7 @@ static int spraid_create_sq(struct spraid_dev *hdev, u16 qid,
admin_cmd.create_sq.sq_flags = cpu_to_le16(flags);
admin_cmd.create_sq.cqid = cpu_to_le16(qid);
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
}
static void spraid_free_queue(struct spraid_queue *spraidq)
@@ -1581,8 +1680,7 @@ static int spraid_delete_queue(struct spraid_dev *hdev, u8 op, u16 id)
admin_cmd.delete_queue.opcode = op;
admin_cmd.delete_queue.qid = cpu_to_le16(id);
- ret = spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
if (ret)
dev_err(hdev->dev, "Delete %s:[%d] failed\n",
@@ -1663,19 +1761,28 @@ static int spraid_set_features(struct spraid_dev *hdev, u32 fid, u32 dword11, vo
size_t buflen, u32 *result)
{
struct spraid_admin_command admin_cmd;
- u32 res;
int ret;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+
+ if (buffer && buflen) {
+ data_ptr = dma_alloc_coherent(hdev->dev, buflen, &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memcpy(data_ptr, buffer, buflen);
+ }
memset(&admin_cmd, 0, sizeof(admin_cmd));
admin_cmd.features.opcode = SPRAID_ADMIN_SET_FEATURES;
admin_cmd.features.fid = cpu_to_le32(fid);
admin_cmd.features.dword11 = cpu_to_le32(dword11);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
- ret = spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, &res,
- buffer, buflen, 0, 0, 0);
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, result, NULL, 0);
- if (!ret && result)
- *result = res;
+ if (data_ptr)
+ dma_free_coherent(hdev->dev, buflen, data_ptr, data_dma);
return ret;
}
@@ -1764,8 +1871,7 @@ static int spraid_setup_io_queues(struct spraid_dev *hdev)
break;
}
dev_info(hdev->dev, "[%s] max_qid: %d, queue_count: %d, online_queue: %d, ioq_depth: %d\n",
- __func__, hdev->max_qid, hdev->queue_count,
- hdev->online_queues, hdev->ioq_depth);
+ __func__, hdev->max_qid, hdev->queue_count, hdev->online_queues, hdev->ioq_depth);
return spraid_create_io_queues(hdev);
}
@@ -1889,10 +1995,11 @@ static int spraid_get_dev_list(struct spraid_dev *hdev, struct spraid_dev_info *
u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
struct spraid_admin_command admin_cmd;
struct spraid_dev_list *list_buf;
+ dma_addr_t data_dma = 0;
u32 i, idx, hdid, ndev;
int ret = 0;
- list_buf = kmalloc(sizeof(*list_buf), GFP_KERNEL);
+ list_buf = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
if (!list_buf)
return -ENOMEM;
@@ -1901,9 +2008,9 @@ static int spraid_get_dev_list(struct spraid_dev *hdev, struct spraid_dev_info *
admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
admin_cmd.get_info.type = SPRAID_GET_INFO_DEV_LIST;
admin_cmd.get_info.cdw11 = cpu_to_le32(idx);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
- ret = spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL, list_buf,
- sizeof(*list_buf), 0, 0, 0);
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
if (ret) {
dev_err(hdev->dev, "Get device list failed, nd: %u, idx: %u, ret: %d\n",
@@ -1916,12 +2023,11 @@ static int spraid_get_dev_list(struct spraid_dev *hdev, struct spraid_dev_info *
for (i = 0; i < ndev; i++) {
hdid = le32_to_cpu(list_buf->devices[i].hdid);
- dev_info(hdev->dev, "list_buf->devices[%d], hdid: %u target: %d, channel: %d, lun: %d, attr[%x]\n",
- i, hdid,
- le16_to_cpu(list_buf->devices[i].target),
- list_buf->devices[i].channel,
- list_buf->devices[i].lun,
- list_buf->devices[i].attr);
+ dev_info(hdev->dev, "list_buf->devices[%d], hdid: %u target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ i, hdid, le16_to_cpu(list_buf->devices[i].target),
+ list_buf->devices[i].channel,
+ list_buf->devices[i].lun,
+ list_buf->devices[i].attr);
if (hdid > nd || hdid == 0) {
dev_err(hdev->dev, "err, hdid[%d] invalid\n", hdid);
continue;
@@ -1936,21 +2042,29 @@ static int spraid_get_dev_list(struct spraid_dev *hdev, struct spraid_dev_info *
}
out:
- kfree(list_buf);
+ dma_free_coherent(hdev->dev, PAGE_SIZE, list_buf, data_dma);
return ret;
}
-static void spraid_send_aen(struct spraid_dev *hdev)
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid)
{
struct spraid_queue *adminq = &hdev->queues[0];
struct spraid_admin_command admin_cmd;
memset(&admin_cmd, 0, sizeof(admin_cmd));
admin_cmd.common.opcode = SPRAID_ADMIN_ASYNC_EVENT;
- admin_cmd.common.command_id = SPRAID_AQ_BLK_MQ_DEPTH;
+ admin_cmd.common.command_id = cid;
spraid_submit_cmd(adminq, &admin_cmd);
- dev_info(hdev->dev, "send aen, cid[%d]\n", SPRAID_AQ_BLK_MQ_DEPTH);
+ dev_info(hdev->dev, "send aen, cid[%d]\n", cid);
+}
+
+static inline void spraid_send_all_aen(struct spraid_dev *hdev)
+{
+ u16 i;
+
+ for (i = 0; i < hdev->ctrl_info->aerl; i++)
+ spraid_send_aen(hdev, i + SPRAID_AQ_BLK_MQ_DEPTH);
}
static int spraid_add_device(struct spraid_dev *hdev, struct spraid_dev_info *device)
@@ -1958,6 +2072,10 @@ static int spraid_add_device(struct spraid_dev *hdev, struct spraid_dev_info *de
struct Scsi_Host *shost = hdev->shost;
struct scsi_device *sdev;
+ dev_info(hdev->dev, "add device, hdid: %u target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ le32_to_cpu(device->hdid), le16_to_cpu(device->target),
+ device->channel, device->lun, device->attr);
+
sdev = scsi_device_lookup(shost, device->channel, le16_to_cpu(device->target), 0);
if (sdev) {
dev_warn(hdev->dev, "Device is already exist, channel: %d, target_id: %d, lun: %d\n",
@@ -1974,9 +2092,13 @@ static int spraid_rescan_device(struct spraid_dev *hdev, struct spraid_dev_info
struct Scsi_Host *shost = hdev->shost;
struct scsi_device *sdev;
+ dev_info(hdev->dev, "rescan device, hdid: %u target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ le32_to_cpu(device->hdid), le16_to_cpu(device->target),
+ device->channel, device->lun, device->attr);
+
sdev = scsi_device_lookup(shost, device->channel, le16_to_cpu(device->target), 0);
if (!sdev) {
- dev_warn(hdev->dev, "Device is not exit, channel: %d, target_id: %d, lun: %d\n",
+ dev_warn(hdev->dev, "device is not exit rescan it, channel: %d, target_id: %d, lun: %d\n",
device->channel, le16_to_cpu(device->target), 0);
return -ENODEV;
}
@@ -1991,9 +2113,13 @@ static int spraid_remove_device(struct spraid_dev *hdev, struct spraid_dev_info
struct Scsi_Host *shost = hdev->shost;
struct scsi_device *sdev;
+ dev_info(hdev->dev, "remove device, hdid: %u target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ le32_to_cpu(org_device->hdid), le16_to_cpu(org_device->target),
+ org_device->channel, org_device->lun, org_device->attr);
+
sdev = scsi_device_lookup(shost, org_device->channel, le16_to_cpu(org_device->target), 0);
if (!sdev) {
- dev_warn(hdev->dev, "Device is not exit, channel: %d, target_id: %d, lun: %d\n",
+ dev_warn(hdev->dev, "device is not exit remove it, channel: %d, target_id: %d, lun: %d\n",
org_device->channel, le16_to_cpu(org_device->target), 0);
return -ENODEV;
}
@@ -2029,36 +2155,54 @@ static int spraid_dev_list_init(struct spraid_dev *hdev)
return 0;
}
+static int luntarget_cmp_func(const void *l, const void *r)
+{
+ const struct spraid_dev_info *ln = l;
+ const struct spraid_dev_info *rn = r;
+
+ if (ln->channel == rn->channel)
+ return le16_to_cpu(ln->target) - le16_to_cpu(rn->target);
+
+ return ln->channel - rn->channel;
+}
+
static void spraid_scan_work(struct work_struct *work)
{
struct spraid_dev *hdev =
container_of(work, struct spraid_dev, scan_work);
struct spraid_dev_info *devices, *org_devices;
+ struct spraid_dev_info *sortdevice;
u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
u8 flag, org_flag;
int i, ret;
+ int count = 0;
devices = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
if (!devices)
return;
+
+ sortdevice = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
+ if (!sortdevice)
+ goto free_list;
+
ret = spraid_get_dev_list(hdev, devices);
if (ret)
- goto free_list;
+ goto free_all;
org_devices = hdev->devices;
for (i = 0; i < nd; i++) {
org_flag = org_devices[i].flag;
flag = devices[i].flag;
- dev_log_dbg(hdev->dev, "i: %d, org_flag: 0x%x, flag: 0x%x\n",
- i, org_flag, flag);
+ dev_log_dbg(hdev->dev, "i: %d, org_flag: 0x%x, flag: 0x%x\n", i, org_flag, flag);
if (SPRAID_DEV_INFO_FLAG_VALID(flag)) {
if (!SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
down_write(&hdev->devices_rwsem);
memcpy(&org_devices[i], &devices[i],
- sizeof(struct spraid_dev_info));
+ sizeof(struct spraid_dev_info));
+ memcpy(&sortdevice[count++], &devices[i],
+ sizeof(struct spraid_dev_info));
up_write(&hdev->devices_rwsem);
- spraid_add_device(hdev, &devices[i]);
} else if (SPRAID_DEV_INFO_FLAG_CHANGE(flag)) {
spraid_rescan_device(hdev, &devices[i]);
}
@@ -2071,6 +2215,16 @@ static void spraid_scan_work(struct work_struct *work)
}
}
}
+
+ dev_info(hdev->dev, "scan work add device count = %d\n", count);
+
+ sort(sortdevice, count, sizeof(sortdevice[0]), luntarget_cmp_func, NULL);
+
+ for (i = 0; i < count; i++)
+ spraid_add_device(hdev, &sortdevice[i]);
+
+free_all:
+ kfree(sortdevice);
free_list:
kfree(devices);
}
@@ -2083,6 +2237,15 @@ static void spraid_timesyn_work(struct work_struct *work)
spraid_configure_timestamp(hdev);
}
+static int spraid_init_ctrl_info(struct spraid_dev *hdev);
+static void spraid_fw_act_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev = container_of(work, struct spraid_dev, fw_act_work);
+
+ if (spraid_init_ctrl_info(hdev))
+ dev_err(hdev->dev, "get ctrl info failed after fw act\n");
+}
+
static void spraid_queue_scan(struct spraid_dev *hdev)
{
queue_work(spraid_wq, &hdev->scan_work);
@@ -2094,6 +2257,9 @@ static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result)
case SPRAID_AEN_DEV_CHANGED:
spraid_queue_scan(hdev);
break;
+ case SPRAID_AEN_FW_ACT_START:
+ dev_info(hdev->dev, "fw activation starting\n");
+ break;
case SPRAID_AEN_HOST_PROBING:
break;
default:
@@ -2101,25 +2267,25 @@ static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result)
}
}
-static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result)
+static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result, u32 result1)
{
- switch (result) {
+ switch ((result & 0xff00) >> 8) {
case SPRAID_AEN_TIMESYN:
queue_work(spraid_wq, &hdev->timesyn_work);
break;
+ case SPRAID_AEN_FW_ACT_FINISH:
+ dev_info(hdev->dev, "fw activation finish\n");
+ queue_work(spraid_wq, &hdev->fw_act_work);
+ break;
+ case SPRAID_AEN_EVENT_MIN ... SPRAID_AEN_EVENT_MAX:
+ dev_info(hdev->dev, "rcv card event[%d], param1[0x%x] param2[0x%x]\n",
+ (result & 0xff00) >> 8, result, result1);
+ break;
default:
- dev_warn(hdev->dev, "async event result: %x\n", result);
+ dev_warn(hdev->dev, "async event result: 0x%x\n", result);
}
}
-static void spraid_async_event_work(struct work_struct *work)
-{
- struct spraid_dev *hdev =
- container_of(work, struct spraid_dev, aen_work);
-
- spraid_send_aen(hdev);
-}
-
static int spraid_alloc_resources(struct spraid_dev *hdev)
{
int ret, nqueue;
@@ -2149,10 +2315,16 @@ static int spraid_alloc_resources(struct spraid_dev *hdev)
goto destroy_dma_pools;
}
+ ret = spraid_alloc_admin_cmds(hdev);
+ if (ret)
+ goto free_queues;
+
dev_info(hdev->dev, "[%s] queues num: %d\n", __func__, nqueue);
return 0;
+free_queues:
+ kfree(hdev->queues);
destroy_dma_pools:
spraid_destroy_dma_pools(hdev);
free_ctrl_info:
@@ -2164,50 +2336,18 @@ static int spraid_alloc_resources(struct spraid_dev *hdev)
static void spraid_free_resources(struct spraid_dev *hdev)
{
+ spraid_free_admin_cmds(hdev);
kfree(hdev->queues);
spraid_destroy_dma_pools(hdev);
kfree(hdev->ctrl_info);
ida_free(&spraid_instance_ida, hdev->instance);
}
-static void spraid_setup_passthrough(struct request *req, struct spraid_admin_command *cmd)
-{
- memcpy(cmd, spraid_admin_req(req)->cmd, sizeof(*cmd));
- cmd->common.flags &= ~SPRAID_CMD_FLAG_SGL_ALL;
-}
-
-static inline void spraid_clear_hreq(struct request *req)
-{
- if (!(req->rq_flags & RQF_DONTPREP)) {
- spraid_admin_req(req)->flags = 0;
- req->rq_flags |= RQF_DONTPREP;
- }
-}
-
-static blk_status_t spraid_setup_admin_cmd(struct request *req, struct spraid_admin_command *cmd)
+static void spraid_bsg_unmap_data(struct spraid_dev *hdev, struct bsg_job *job)
{
- spraid_clear_hreq(req);
-
- memset(cmd, 0, sizeof(*cmd));
- switch (req_op(req)) {
- case REQ_OP_DRV_IN:
- case REQ_OP_DRV_OUT:
- spraid_setup_passthrough(req, cmd);
- break;
- default:
- WARN_ON_ONCE(1);
- return BLK_STS_IOERR;
- }
-
- cmd->common.command_id = req->tag;
- return BLK_STS_OK;
-}
-
-static void spraid_unmap_data(struct spraid_dev *hdev, struct request *req)
-{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- enum dma_data_direction dma_dir = rq_data_dir(req) ?
- DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir = rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
if (iod->nsge)
dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
@@ -2215,36 +2355,36 @@ static void spraid_unmap_data(struct spraid_dev *hdev, struct request *req)
spraid_free_iod_res(hdev, iod);
}
-static blk_status_t spraid_admin_map_data(struct spraid_dev *hdev, struct request *req,
- struct spraid_admin_command *cmd)
+static int spraid_bsg_map_data(struct spraid_dev *hdev, struct bsg_job *job,
+ struct spraid_admin_command *cmd)
{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct request_queue *admin_q = req->q;
- enum dma_data_direction dma_dir = rq_data_dir(req) ?
- DMA_TO_DEVICE : DMA_FROM_DEVICE;
- blk_status_t ret = BLK_STS_IOERR;
- int nr_mapped;
- int res;
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir = rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ int ret = 0;
+
+ iod->sg = job->request_payload.sg_list;
+ iod->nsge = job->request_payload.sg_cnt;
+ iod->length = job->request_payload.payload_len;
+ iod->use_sgl = false;
+ iod->npages = -1;
+ iod->sg_drv_mgmt = false;
- sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
- iod->nsge = blk_rq_map_sg(admin_q, req, iod->sg);
if (!iod->nsge)
goto out;
- dev_info(hdev->dev, "nseg: %u, nsge: %u\n",
- blk_rq_nr_phys_segments(req), iod->nsge);
-
- ret = BLK_STS_RESOURCE;
- nr_mapped = dma_map_sg_attrs(hdev->dev, iod->sg, iod->nsge, dma_dir, DMA_ATTR_NO_WARN);
- if (!nr_mapped)
+ ret = dma_map_sg_attrs(hdev->dev, iod->sg, iod->nsge, dma_dir, DMA_ATTR_NO_WARN);
+ if (!ret)
goto out;
- res = spraid_setup_prps(hdev, iod);
- if (res)
+ ret = spraid_setup_prps(hdev, iod);
+ if (ret)
goto unmap;
+
cmd->common.dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
- return BLK_STS_OK;
+
+ return 0;
unmap:
dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
@@ -2252,137 +2392,29 @@ static blk_status_t spraid_admin_map_data(struct spraid_dev *hdev, struct reques
return ret;
}
-static blk_status_t spraid_init_admin_iod(struct request *rq, struct spraid_dev *hdev)
-{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(rq);
- int nents = blk_rq_nr_phys_segments(rq);
- unsigned int size = blk_rq_payload_bytes(rq);
-
- if (nents > SPRAID_INT_PAGES || size > SPRAID_INT_BYTES(hdev)) {
- iod->sg = mempool_alloc(hdev->iod_mempool, GFP_ATOMIC);
- if (!iod->sg)
- return BLK_STS_RESOURCE;
- } else {
- iod->sg = iod->inline_sg;
- }
-
- iod->nsge = 0;
- iod->use_sgl = false;
- iod->npages = -1;
- iod->length = size;
- iod->sg_drv_mgmt = true;
-
- return BLK_STS_OK;
-}
-
-static blk_status_t spraid_queue_admin_rq(struct blk_mq_hw_ctx *hctx,
- const struct blk_mq_queue_data *bd)
-{
- struct spraid_queue *adminq = hctx->driver_data;
- struct spraid_dev *hdev = adminq->hdev;
- struct request *req = bd->rq;
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct spraid_admin_command cmd;
- blk_status_t ret;
-
- ret = spraid_setup_admin_cmd(req, &cmd);
- if (ret)
- goto out;
-
- ret = spraid_init_admin_iod(req, hdev);
- if (ret)
- goto out;
-
- if (blk_rq_nr_phys_segments(req)) {
- ret = spraid_admin_map_data(hdev, req, &cmd);
- if (ret)
- goto cleanup_iod;
- }
-
- blk_mq_start_request(req);
- spraid_submit_cmd(adminq, &cmd);
- return BLK_STS_OK;
-
-cleanup_iod:
- spraid_free_iod_res(hdev, iod);
-out:
- return ret;
-}
-
-static blk_status_t spraid_error_status(struct request *req)
-{
- switch (spraid_admin_req(req)->status & 0x7ff) {
- case SPRAID_SC_SUCCESS:
- return BLK_STS_OK;
- default:
- return BLK_STS_IOERR;
- }
-}
-
-static void spraid_complete_admin_rq(struct request *req)
-{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct spraid_dev *hdev = iod->spraidq->hdev;
-
- if (blk_rq_nr_phys_segments(req))
- spraid_unmap_data(hdev, req);
- blk_mq_end_request(req, spraid_error_status(req));
-}
-
-static int spraid_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, unsigned int hctx_idx)
-{
- struct spraid_dev *hdev = data;
- struct spraid_queue *adminq = &hdev->queues[0];
-
- WARN_ON(hctx_idx != 0);
- WARN_ON(hdev->admin_tagset.tags[0] != hctx->tags);
-
- hctx->driver_data = adminq;
- return 0;
-}
-
-static int spraid_admin_init_request(struct blk_mq_tag_set *set, struct request *req,
- unsigned int hctx_idx, unsigned int numa_node)
-{
- struct spraid_dev *hdev = set->driver_data;
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct spraid_queue *adminq = &hdev->queues[0];
-
- WARN_ON(!adminq);
- iod->spraidq = adminq;
- return 0;
-}
-
-static enum blk_eh_timer_return
-spraid_admin_timeout(struct request *req, bool reserved)
-{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct spraid_queue *spraidq = iod->spraidq;
- struct spraid_dev *hdev = spraidq->hdev;
-
- dev_err(hdev->dev, "Admin cid[%d] qid[%d] timeout\n",
- req->tag, spraidq->qid);
-
- if (spraid_poll_cq(spraidq, req->tag)) {
- dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, completion polled\n",
- req->tag, spraidq->qid);
- return BLK_EH_DONE;
- }
-
- spraid_end_admin_request(req, cpu_to_le16(-EINVAL), 0, 0);
- return BLK_EH_DONE;
-}
-
static int spraid_get_ctrl_info(struct spraid_dev *hdev, struct spraid_ctrl_info *ctrl_info)
{
struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
memset(&admin_cmd, 0, sizeof(admin_cmd));
admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
admin_cmd.get_info.type = SPRAID_GET_INFO_CTRL;
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(ctrl_info, data_ptr, sizeof(struct spraid_ctrl_info));
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- ctrl_info, sizeof(struct spraid_ctrl_info), 0, 0, 0);
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
}
static int spraid_init_ctrl_info(struct spraid_dev *hdev)
@@ -2416,6 +2448,11 @@ static int spraid_init_ctrl_info(struct spraid_dev *hdev)
dev_info(hdev->dev, "[%s]sn = %s\n", __func__, hdev->ctrl_info->sn);
dev_info(hdev->dev, "[%s]fr = %s\n", __func__, hdev->ctrl_info->fr);
+ if (!hdev->ctrl_info->aerl)
+ hdev->ctrl_info->aerl = 1;
+ if (hdev->ctrl_info->aerl > SPRAID_NR_AEN_COMMANDS)
+ hdev->ctrl_info->aerl = SPRAID_NR_AEN_COMMANDS;
+
return 0;
}
@@ -2444,98 +2481,51 @@ static void spraid_free_iod_ext_mem_pool(struct spraid_dev *hdev)
mempool_destroy(hdev->iod_mempool);
}
-static int spraid_submit_user_cmd(struct request_queue *q, struct spraid_admin_command *cmd,
- void __user *ubuffer, unsigned int bufflen, u32 *result,
- unsigned int timeout)
+static int spraid_user_admin_cmd(struct spraid_dev *hdev, struct bsg_job *job)
{
- struct request *req;
- struct bio *bio = NULL;
- int ret;
-
- req = spraid_alloc_admin_request(q, cmd, 0);
- if (IS_ERR(req))
- return PTR_ERR(req);
+ struct spraid_bsg_request *bsg_req = job->request;
+ struct spraid_passthru_common_cmd *cmd = &(bsg_req->admcmd);
+ struct spraid_admin_command admin_cmd;
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
+ u32 result[2] = {0};
+ int status;
- req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
- spraid_admin_req(req)->flags |= SPRAID_REQ_USERCMD;
+ if (hdev->state >= SPRAID_RESETTING) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not right\n",
+ __func__, hdev->state);
+ return -EBUSY;
+ }
- if (ubuffer && bufflen) {
- ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen, GFP_KERNEL);
- if (ret)
- goto out;
- bio = req->bio;
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.common.opcode = cmd->opcode;
+ admin_cmd.common.flags = cmd->flags;
+ admin_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ admin_cmd.common.cdw2[0] = cpu_to_le32(cmd->cdw2);
+ admin_cmd.common.cdw2[1] = cpu_to_le32(cmd->cdw3);
+ admin_cmd.common.cdw10 = cpu_to_le32(cmd->cdw10);
+ admin_cmd.common.cdw11 = cpu_to_le32(cmd->cdw11);
+ admin_cmd.common.cdw12 = cpu_to_le32(cmd->cdw12);
+ admin_cmd.common.cdw13 = cpu_to_le32(cmd->cdw13);
+ admin_cmd.common.cdw14 = cpu_to_le32(cmd->cdw14);
+ admin_cmd.common.cdw15 = cpu_to_le32(cmd->cdw15);
+
+ status = spraid_bsg_map_data(hdev, job, &admin_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
}
- blk_execute_rq(req->q, NULL, req, 0);
- if (spraid_admin_req(req)->flags & SPRAID_REQ_CANCELLED)
- ret = -EINTR;
- else
- ret = spraid_admin_req(req)->status;
- if (result) {
- result[0] = spraid_admin_req(req)->result0;
- result[1] = spraid_admin_req(req)->result1;
- }
- if (bio)
- blk_rq_unmap_user(bio);
-out:
- blk_mq_free_request(req);
- return ret;
-}
-static int spraid_user_admin_cmd(struct spraid_dev *hdev,
- struct spraid_passthru_common_cmd __user *ucmd)
-{
- struct spraid_passthru_common_cmd cmd;
- struct spraid_admin_command admin_cmd;
- u32 timeout = 0;
- int status;
-
- if (!capable(CAP_SYS_ADMIN)) {
- dev_err(hdev->dev, "Current user hasn't administrator right, reject service\n");
- return -EACCES;
- }
-
- if (copy_from_user(&cmd, ucmd, sizeof(cmd))) {
- dev_err(hdev->dev, "Copy command from user space to kernel space failed\n");
- return -EFAULT;
- }
-
- if (cmd.flags) {
- dev_err(hdev->dev, "Invalid flags in user command\n");
- return -EINVAL;
+ status = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, &result[0], &result[1], timeout);
+ if (status >= 0) {
+ job->reply_len = sizeof(result);
+ memcpy(job->reply, result, sizeof(result));
}
- dev_info(hdev->dev, "user_admin_cmd opcode: 0x%x, subopcode: 0x%x\n",
- cmd.opcode, cmd.cdw2 & 0x7ff);
+ if (status)
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x], status[0x%x] result0[0x%x] result1[0x%x]\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode, status, result[0], result[1]);
- memset(&admin_cmd, 0, sizeof(admin_cmd));
- admin_cmd.common.opcode = cmd.opcode;
- admin_cmd.common.flags = cmd.flags;
- admin_cmd.common.hdid = cpu_to_le32(cmd.nsid);
- admin_cmd.common.cdw2[0] = cpu_to_le32(cmd.cdw2);
- admin_cmd.common.cdw2[1] = cpu_to_le32(cmd.cdw3);
- admin_cmd.common.cdw10 = cpu_to_le32(cmd.cdw10);
- admin_cmd.common.cdw11 = cpu_to_le32(cmd.cdw11);
- admin_cmd.common.cdw12 = cpu_to_le32(cmd.cdw12);
- admin_cmd.common.cdw13 = cpu_to_le32(cmd.cdw13);
- admin_cmd.common.cdw14 = cpu_to_le32(cmd.cdw14);
- admin_cmd.common.cdw15 = cpu_to_le32(cmd.cdw15);
-
- if (cmd.timeout_ms)
- timeout = msecs_to_jiffies(cmd.timeout_ms);
-
- status = spraid_submit_user_cmd(hdev->admin_q, &admin_cmd,
- (void __user *)(uintptr_t)cmd.addr, cmd.info_1.data_len,
- &cmd.result0, timeout);
-
- dev_info(hdev->dev, "user_admin_cmd status: 0x%x, result0: 0x%x, result1: 0x%x\n",
- status, cmd.result0, cmd.result1);
-
- if (status >= 0) {
- if (put_user(cmd.result0, &ucmd->result0))
- return -EFAULT;
- if (put_user(cmd.result1, &ucmd->result1))
- return -EFAULT;
- }
+ spraid_bsg_unmap_data(hdev, job);
return status;
}
@@ -2548,8 +2538,8 @@ static int spraid_alloc_ioq_ptcmds(struct spraid_dev *hdev)
INIT_LIST_HEAD(&hdev->ioq_pt_list);
spin_lock_init(&hdev->ioq_pt_lock);
- hdev->ioq_ptcmds = kcalloc_node(ptnum, sizeof(struct spraid_ioq_ptcmd),
- GFP_KERNEL, hdev->numa_node);
+ hdev->ioq_ptcmds = kcalloc_node(ptnum, sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
if (!hdev->ioq_ptcmds) {
dev_err(hdev->dev, "Alloc ioq_ptcmds failed\n");
@@ -2567,55 +2557,35 @@ static int spraid_alloc_ioq_ptcmds(struct spraid_dev *hdev)
return 0;
}
-static struct spraid_ioq_ptcmd *spraid_get_ioq_ptcmd(struct spraid_dev *hdev)
-{
- struct spraid_ioq_ptcmd *cmd = NULL;
- unsigned long flags;
-
- spin_lock_irqsave(&hdev->ioq_pt_lock, flags);
- if (list_empty(&hdev->ioq_pt_list)) {
- spin_unlock_irqrestore(&hdev->ioq_pt_lock, flags);
- dev_err(hdev->dev, "err, ioq ptcmd list empty\n");
- return NULL;
- }
- cmd = list_entry((&hdev->ioq_pt_list)->next, struct spraid_ioq_ptcmd, list);
- list_del_init(&cmd->list);
- spin_unlock_irqrestore(&hdev->ioq_pt_lock, flags);
-
- WRITE_ONCE(cmd->state, SPRAID_CMD_IDLE);
-
- return cmd;
-}
-
-static void spraid_put_ioq_ptcmd(struct spraid_dev *hdev, struct spraid_ioq_ptcmd *cmd)
+static void spraid_free_ioq_ptcmds(struct spraid_dev *hdev)
{
- unsigned long flags;
+ kfree(hdev->ioq_ptcmds);
+ hdev->ioq_ptcmds = NULL;
- spin_lock_irqsave(&hdev->ioq_pt_lock, flags);
- list_add(&cmd->list, (&hdev->ioq_pt_list)->next);
- spin_unlock_irqrestore(&hdev->ioq_pt_lock, flags);
+ INIT_LIST_HEAD(&hdev->ioq_pt_list);
}
static int spraid_submit_ioq_sync_cmd(struct spraid_dev *hdev, struct spraid_ioq_command *cmd,
- u32 *result, void **sense, u32 timeout)
+ u32 *result, u32 *reslen, u32 timeout)
{
- struct spraid_queue *ioq;
int ret;
dma_addr_t sense_dma;
- struct spraid_ioq_ptcmd *pt_cmd = spraid_get_ioq_ptcmd(hdev);
-
- *sense = NULL;
+ struct spraid_queue *ioq;
+ void *sense_addr = NULL;
+ struct spraid_cmd *pt_cmd = spraid_get_cmd(hdev, SPRAID_CMD_IOPT);
- if (!pt_cmd)
+ if (!pt_cmd) {
+ dev_err(hdev->dev, "err, get ioq cmd failed\n");
return -EFAULT;
+ }
- dev_info(hdev->dev, "[%s] ptcmd, cid[%d], qid[%d]\n", __func__, pt_cmd->cid, pt_cmd->qid);
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
init_completion(&pt_cmd->cmd_done);
ioq = &hdev->queues[pt_cmd->qid];
ret = pt_cmd->cid * SCSI_SENSE_BUFFERSIZE;
- pt_cmd->priv = ioq->sense + ret;
+ sense_addr = ioq->sense + ret;
sense_dma = ioq->sense_dma_addr + ret;
cmd->common.sense_addr = cpu_to_le64(sense_dma);
@@ -2625,260 +2595,87 @@ static int spraid_submit_ioq_sync_cmd(struct spraid_dev *hdev, struct spraid_ioq
spraid_submit_cmd(ioq, cmd);
if (!wait_for_completion_timeout(&pt_cmd->cmd_done, timeout)) {
- dev_err(hdev->dev, "[%s] cid[%d], qid[%d] timeout\n",
- __func__, pt_cmd->cid, pt_cmd->qid);
+ dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout, opcode[0x%x] subopcode[0x%x]\n",
+ __func__, pt_cmd->cid, pt_cmd->qid, cmd->common.opcode,
+ (le32_to_cpu(cmd->common.cdw3[0]) & 0xffff));
WRITE_ONCE(pt_cmd->state, SPRAID_CMD_TIMEOUT);
- return -EINVAL;
+ spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
+ return -ETIME;
}
- if (result) {
- result[0] = pt_cmd->result0;
- result[1] = pt_cmd->result1;
+ if (result && reslen) {
+ if ((pt_cmd->status & 0x17f) == 0x101) {
+ memcpy(result, sense_addr, SCSI_SENSE_BUFFERSIZE);
+ *reslen = SCSI_SENSE_BUFFERSIZE;
+ }
}
- if ((pt_cmd->status & 0x17f) == 0x101)
- *sense = pt_cmd->priv;
+ spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
return pt_cmd->status;
}
-static int spraid_user_ioq_cmd(struct spraid_dev *hdev,
- struct spraid_ioq_passthru_cmd __user *ucmd)
+static int spraid_user_ioq_cmd(struct spraid_dev *hdev, struct bsg_job *job)
{
- struct spraid_ioq_passthru_cmd cmd;
+ struct spraid_bsg_request *bsg_req = (struct spraid_bsg_request *)(job->request);
+ struct spraid_ioq_passthru_cmd *cmd = &(bsg_req->ioqcmd);
struct spraid_ioq_command ioq_cmd;
- u32 timeout = 0;
int status = 0;
- u8 *data_ptr = NULL;
- dma_addr_t data_dma;
- enum dma_data_direction dma_dir = DMA_NONE;
- void *sense = NULL;
-
- if (!capable(CAP_SYS_ADMIN)) {
- dev_err(hdev->dev, "Current user hasn't administrator right, reject service\n");
- return -EACCES;
- }
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
- if (copy_from_user(&cmd, ucmd, sizeof(cmd))) {
- dev_err(hdev->dev, "Copy command from user space to kernel space failed\n");
- return -EFAULT;
- }
-
- if (cmd.data_len > PAGE_SIZE) {
+ if (cmd->data_len > PAGE_SIZE) {
dev_err(hdev->dev, "[%s] data len bigger than 4k\n", __func__);
return -EFAULT;
}
- dev_info(hdev->dev, "[%s] opcode: 0x%x, subopcode: 0x%x, datalen: %d\n",
- __func__, cmd.opcode, cmd.info_1.subopcode, cmd.data_len);
-
- if (cmd.addr && cmd.data_len) {
- data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
- if (!data_ptr)
- return -ENOMEM;
-
- dma_dir = (cmd.opcode & 1) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ if (hdev->state != SPRAID_LIVE) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not live\n",
+ __func__, hdev->state);
+ return -EBUSY;
}
- if (dma_dir == DMA_TO_DEVICE) {
- if (copy_from_user(data_ptr, (void __user *)(uintptr_t)cmd.addr, cmd.data_len)) {
- dev_err(hdev->dev, "[%s] copy user data failed\n", __func__);
- status = -EFAULT;
- goto free_dma_mem;
- }
- }
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init, datalen[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode, cmd->data_len);
memset(&ioq_cmd, 0, sizeof(ioq_cmd));
- ioq_cmd.common.opcode = cmd.opcode;
- ioq_cmd.common.flags = cmd.flags;
- ioq_cmd.common.hdid = cpu_to_le32(cmd.nsid);
- ioq_cmd.common.sense_len = cpu_to_le16(cmd.info_0.res_sense_len);
- ioq_cmd.common.cdb_len = cmd.info_0.cdb_len;
- ioq_cmd.common.rsvd2 = cmd.info_0.rsvd0;
- ioq_cmd.common.cdw3[0] = cpu_to_le32(cmd.cdw3);
- ioq_cmd.common.cdw3[1] = cpu_to_le32(cmd.cdw4);
- ioq_cmd.common.cdw3[2] = cpu_to_le32(cmd.cdw5);
- ioq_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
-
- ioq_cmd.common.cdw10[0] = cpu_to_le32(cmd.cdw10);
- ioq_cmd.common.cdw10[1] = cpu_to_le32(cmd.cdw11);
- ioq_cmd.common.cdw10[2] = cpu_to_le32(cmd.cdw12);
- ioq_cmd.common.cdw10[3] = cpu_to_le32(cmd.cdw13);
- ioq_cmd.common.cdw10[4] = cpu_to_le32(cmd.cdw14);
- ioq_cmd.common.cdw10[5] = cpu_to_le32(cmd.data_len);
-
- memcpy(ioq_cmd.common.cdb, &cmd.cdw16, cmd.info_0.cdb_len);
-
- ioq_cmd.common.cdw26[0] = cpu_to_le32(cmd.cdw26[0]);
- ioq_cmd.common.cdw26[1] = cpu_to_le32(cmd.cdw26[1]);
- ioq_cmd.common.cdw26[2] = cpu_to_le32(cmd.cdw26[2]);
- ioq_cmd.common.cdw26[3] = cpu_to_le32(cmd.cdw26[3]);
-
- if (cmd.timeout_ms)
- timeout = msecs_to_jiffies(cmd.timeout_ms);
- timeout = timeout ? timeout : ADMIN_TIMEOUT;
-
- status = spraid_submit_ioq_sync_cmd(hdev, &ioq_cmd, &cmd.result0, &sense, timeout);
-
- if (status >= 0) {
- if (put_user(cmd.result0, &ucmd->result0)) {
- status = -EFAULT;
- goto free_dma_mem;
- }
- if (put_user(cmd.result1, &ucmd->result1)) {
- status = -EFAULT;
- goto free_dma_mem;
- }
- if (dma_dir == DMA_FROM_DEVICE &&
- copy_to_user((void __user *)(uintptr_t)cmd.addr, data_ptr, cmd.data_len)) {
- status = -EFAULT;
- goto free_dma_mem;
- }
- }
-
- if (sense) {
- if (copy_to_user((void *__user *)(uintptr_t)cmd.sense_addr,
- sense, cmd.info_0.res_sense_len)) {
- status = -EFAULT;
- goto free_dma_mem;
- }
- }
-
-free_dma_mem:
- if (data_ptr)
- dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
-
- return status;
-
-}
-
-static int spraid_reset_work_sync(struct spraid_dev *hdev);
-
-static int spraid_user_reset_cmd(struct spraid_dev *hdev)
-{
- int ret;
-
- dev_info(hdev->dev, "[%s] start user reset cmd\n", __func__);
- ret = spraid_reset_work_sync(hdev);
- dev_info(hdev->dev, "[%s] stop user reset cmd[%d]\n", __func__, ret);
-
- return ret;
-}
-
-static int hdev_open(struct inode *inode, struct file *file)
-{
- struct spraid_dev *hdev =
- container_of(inode->i_cdev, struct spraid_dev, cdev);
- file->private_data = hdev;
- return 0;
-}
-
-static long hdev_ioctl(struct file *file, u32 cmd, unsigned long arg)
-{
- struct spraid_dev *hdev = file->private_data;
- void __user *argp = (void __user *)arg;
-
- switch (cmd) {
- case SPRAID_IOCTL_ADMIN_CMD:
- return spraid_user_admin_cmd(hdev, argp);
- case SPRAID_IOCTL_IOQ_CMD:
- return spraid_user_ioq_cmd(hdev, argp);
- case SPRAID_IOCTL_RESET_CMD:
- return spraid_user_reset_cmd(hdev);
- default:
- return -ENOTTY;
- }
-}
-
-static const struct file_operations spraid_dev_fops = {
- .owner = THIS_MODULE,
- .open = hdev_open,
- .unlocked_ioctl = hdev_ioctl,
- .compat_ioctl = hdev_ioctl,
-};
-
-static int spraid_create_cdev(struct spraid_dev *hdev)
-{
- int ret;
-
- device_initialize(&hdev->ctrl_device);
- hdev->ctrl_device.devt = MKDEV(MAJOR(spraid_chr_devt), hdev->instance);
- hdev->ctrl_device.class = spraid_class;
- hdev->ctrl_device.parent = hdev->dev;
- dev_set_drvdata(&hdev->ctrl_device, hdev);
- ret = dev_set_name(&hdev->ctrl_device, "spraid%d", hdev->instance);
- if (ret)
- return ret;
- cdev_init(&hdev->cdev, &spraid_dev_fops);
- hdev->cdev.owner = THIS_MODULE;
- ret = cdev_device_add(&hdev->cdev, &hdev->ctrl_device);
- if (ret) {
- dev_err(hdev->dev, "Add cdev failed, ret: %d", ret);
- put_device(&hdev->ctrl_device);
- kfree_const(hdev->ctrl_device.kobj.name);
- return ret;
- }
-
- return 0;
-}
-
-static inline void spraid_remove_cdev(struct spraid_dev *hdev)
-{
- cdev_device_del(&hdev->cdev, &hdev->ctrl_device);
-}
-
-static const struct blk_mq_ops spraid_admin_mq_ops = {
- .queue_rq = spraid_queue_admin_rq,
- .complete = spraid_complete_admin_rq,
- .init_hctx = spraid_admin_init_hctx,
- .init_request = spraid_admin_init_request,
- .timeout = spraid_admin_timeout,
-};
-
-static void spraid_remove_admin_tagset(struct spraid_dev *hdev)
-{
- if (hdev->admin_q && !blk_queue_dying(hdev->admin_q)) {
- blk_mq_unquiesce_queue(hdev->admin_q);
- blk_cleanup_queue(hdev->admin_q);
- blk_mq_free_tag_set(&hdev->admin_tagset);
+ ioq_cmd.common.opcode = cmd->opcode;
+ ioq_cmd.common.flags = cmd->flags;
+ ioq_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ ioq_cmd.common.sense_len = cpu_to_le16(cmd->info_0.res_sense_len);
+ ioq_cmd.common.cdb_len = cmd->info_0.cdb_len;
+ ioq_cmd.common.rsvd2 = cmd->info_0.rsvd0;
+ ioq_cmd.common.cdw3[0] = cpu_to_le32(cmd->cdw3);
+ ioq_cmd.common.cdw3[1] = cpu_to_le32(cmd->cdw4);
+ ioq_cmd.common.cdw3[2] = cpu_to_le32(cmd->cdw5);
+
+ ioq_cmd.common.cdw10[0] = cpu_to_le32(cmd->cdw10);
+ ioq_cmd.common.cdw10[1] = cpu_to_le32(cmd->cdw11);
+ ioq_cmd.common.cdw10[2] = cpu_to_le32(cmd->cdw12);
+ ioq_cmd.common.cdw10[3] = cpu_to_le32(cmd->cdw13);
+ ioq_cmd.common.cdw10[4] = cpu_to_le32(cmd->cdw14);
+ ioq_cmd.common.cdw10[5] = cpu_to_le32(cmd->data_len);
+
+ memcpy(ioq_cmd.common.cdb, &cmd->cdw16, cmd->info_0.cdb_len);
+
+ ioq_cmd.common.cdw26[0] = cpu_to_le32(cmd->cdw26[0]);
+ ioq_cmd.common.cdw26[1] = cpu_to_le32(cmd->cdw26[1]);
+ ioq_cmd.common.cdw26[2] = cpu_to_le32(cmd->cdw26[2]);
+ ioq_cmd.common.cdw26[3] = cpu_to_le32(cmd->cdw26[3]);
+
+ status = spraid_bsg_map_data(hdev, job, (struct spraid_admin_command *)&ioq_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
}
-}
-static int spraid_alloc_admin_tags(struct spraid_dev *hdev)
-{
- if (!hdev->admin_q) {
- hdev->admin_tagset.ops = &spraid_admin_mq_ops;
- hdev->admin_tagset.nr_hw_queues = 1;
+ status = spraid_submit_ioq_sync_cmd(hdev, &ioq_cmd, job->reply, &job->reply_len, timeout);
- hdev->admin_tagset.queue_depth = SPRAID_AQ_MQ_TAG_DEPTH;
- hdev->admin_tagset.timeout = ADMIN_TIMEOUT;
- hdev->admin_tagset.numa_node = hdev->numa_node;
- hdev->admin_tagset.cmd_size =
- spraid_cmd_size(hdev, true, false);
- hdev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
- hdev->admin_tagset.driver_data = hdev;
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x], status[0x%x], reply_len[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode, status, job->reply_len);
- if (blk_mq_alloc_tag_set(&hdev->admin_tagset)) {
- dev_err(hdev->dev, "Allocate admin tagset failed\n");
- return -ENOMEM;
- }
+ spraid_bsg_unmap_data(hdev, job);
- hdev->admin_q = blk_mq_init_queue(&hdev->admin_tagset);
- if (IS_ERR(hdev->admin_q)) {
- dev_err(hdev->dev, "Initialize admin request queue failed\n");
- blk_mq_free_tag_set(&hdev->admin_tagset);
- return -ENOMEM;
- }
- if (!blk_get_queue(hdev->admin_q)) {
- dev_err(hdev->dev, "Get admin request queue failed\n");
- spraid_remove_admin_tagset(hdev);
- hdev->admin_q = NULL;
- return -ENODEV;
- }
- } else {
- blk_mq_unquiesce_queue(hdev->admin_q);
- }
- return 0;
+ return status;
}
static bool spraid_check_scmd_completed(struct scsi_cmnd *scmd)
@@ -2891,7 +2688,7 @@ static bool spraid_check_scmd_completed(struct scsi_cmnd *scmd)
spraid_get_tag_from_scmd(scmd, &hwq, &cid);
spraidq = &hdev->queues[hwq];
if (READ_ONCE(iod->state) == SPRAID_CMD_COMPLETE || spraid_poll_cq(spraidq, cid)) {
- dev_warn(hdev->dev, "cid[%d], qid[%d] has been completed\n",
+ dev_warn(hdev->dev, "cid[%d] qid[%d] has been completed\n",
cid, spraidq->qid);
return true;
}
@@ -2927,8 +2724,7 @@ static int spraid_send_abort_cmd(struct spraid_dev *hdev, u32 hdid, u16 qid, u16
admin_cmd.abort.sqid = cpu_to_le16(qid);
admin_cmd.abort.cid = cpu_to_le16(cid);
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
}
/* send reset command by admin quueue temporary */
@@ -2941,8 +2737,7 @@ static int spraid_send_reset_cmd(struct spraid_dev *hdev, int type, u32 hdid)
admin_cmd.reset.hdid = cpu_to_le32(hdid);
admin_cmd.reset.type = type;
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
}
static bool spraid_change_host_state(struct spraid_dev *hdev, enum spraid_state newstate)
@@ -3022,7 +2817,7 @@ static void spraid_back_fault_cqe(struct spraid_queue *ioq, struct spraid_comple
scsi_dma_unmap(scmd);
spraid_free_iod_res(hdev, iod);
scmd->scsi_done(scmd);
- dev_warn(hdev->dev, "Back fault CQE, cid[%d], qid[%d]\n",
+ dev_warn(hdev->dev, "Back fault CQE, cid[%d] qid[%d]\n",
cqe->cmd_id, ioq->qid);
}
@@ -3032,6 +2827,8 @@ static void spraid_back_all_io(struct spraid_dev *hdev)
struct spraid_queue *ioq;
struct spraid_completion cqe = { 0 };
+ scsi_block_requests(hdev->shost);
+
for (i = 1; i <= hdev->shost->nr_hw_queues; i++) {
ioq = &hdev->queues[i];
for (j = 0; j < hdev->shost->can_queue; j++) {
@@ -3039,6 +2836,8 @@ static void spraid_back_all_io(struct spraid_dev *hdev)
spraid_back_fault_cqe(ioq, &cqe);
}
}
+
+ scsi_unblock_requests(hdev->shost);
}
static void spraid_dev_disable(struct spraid_dev *hdev, bool shutdown)
@@ -3106,17 +2905,13 @@ static void spraid_reset_work(struct work_struct *work)
if (ret)
goto pci_disable;
- ret = spraid_alloc_admin_tags(hdev);
- if (ret)
- goto pci_disable;
-
ret = spraid_setup_io_queues(hdev);
if (ret || hdev->online_queues <= hdev->shost->nr_hw_queues)
goto pci_disable;
spraid_change_host_state(hdev, SPRAID_LIVE);
- spraid_send_aen(hdev);
+ spraid_send_all_aen(hdev);
return;
@@ -3174,8 +2969,8 @@ static int spraid_abort_handler(struct scsi_cmnd *scmd)
scsi_print_command(scmd);
- if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd) ||
- hdev->state != SPRAID_LIVE)
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
return SUCCESS;
hostdata = scmd->device->hostdata;
@@ -3183,7 +2978,7 @@ static int spraid_abort_handler(struct scsi_cmnd *scmd)
dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, aborting\n", cid, hwq);
ret = spraid_send_abort_cmd(hdev, hostdata->hdid, hwq, cid);
- if (ret != ADMIN_ERR_TIMEOUT) {
+ if (ret != -ETIME) {
ret = spraid_wait_abnl_cmd_done(iod);
if (ret) {
dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed, not found\n", cid, hwq);
@@ -3206,8 +3001,8 @@ static int spraid_tgt_reset_handler(struct scsi_cmnd *scmd)
scsi_print_command(scmd);
- if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd) ||
- hdev->state != SPRAID_LIVE)
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
return SUCCESS;
hostdata = scmd->device->hostdata;
@@ -3241,8 +3036,8 @@ static int spraid_bus_reset_handler(struct scsi_cmnd *scmd)
scsi_print_command(scmd);
- if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd) ||
- hdev->state != SPRAID_LIVE)
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
return SUCCESS;
hostdata = scmd->device->hostdata;
@@ -3272,7 +3067,7 @@ static int spraid_shost_reset_handler(struct scsi_cmnd *scmd)
struct spraid_dev *hdev = shost_priv(scmd->device->host);
scsi_print_command(scmd);
- if (spraid_check_scmd_completed(scmd) || hdev->state != SPRAID_LIVE)
+ if (hdev->state != SPRAID_LIVE || spraid_check_scmd_completed(scmd))
return SUCCESS;
spraid_get_tag_from_scmd(scmd, &hwq, &cid);
@@ -3288,6 +3083,62 @@ static int spraid_shost_reset_handler(struct scsi_cmnd *scmd)
return SUCCESS;
}
+static pci_ers_result_t spraid_pci_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "enter pci error detect, state:%d\n", state);
+
+ switch (state) {
+ case pci_channel_io_normal:
+ dev_warn(hdev->dev, "channel is normal, do nothing\n");
+
+ return PCI_ERS_RESULT_CAN_RECOVER;
+ case pci_channel_io_frozen:
+ dev_warn(hdev->dev, "channel io frozen, need reset controller\n");
+
+ scsi_block_requests(hdev->shost);
+
+ spraid_change_host_state(hdev, SPRAID_RESETTING);
+
+ return PCI_ERS_RESULT_NEED_RESET;
+ case pci_channel_io_perm_failure:
+ dev_warn(hdev->dev, "channel io failure, request disconnect\n");
+
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+
+ return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t spraid_pci_slot_reset(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "restart after slot reset\n");
+
+ pci_restore_state(pdev);
+
+ if (!queue_work(spraid_wq, &hdev->reset_work)) {
+ dev_err(hdev->dev, "[%s] err, the device is resetting state\n", __func__);
+ return PCI_ERS_RESULT_NONE;
+ }
+
+ flush_work(&hdev->reset_work);
+
+ scsi_unblock_requests(hdev->shost);
+
+ return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void spraid_reset_done(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "enter spraid reset done\n");
+}
+
static ssize_t csts_pp_show(struct device *cdev, struct device_attribute *attr, char *buf)
{
struct Scsi_Host *shost = class_to_shost(cdev);
@@ -3347,7 +3198,7 @@ static ssize_t fw_version_show(struct device *cdev, struct device_attribute *att
struct Scsi_Host *shost = class_to_shost(cdev);
struct spraid_dev *hdev = shost_priv(shost);
- return snprintf(buf, sizeof(hdev->ctrl_info->fr), "%s\n", hdev->ctrl_info->fr);
+ return snprintf(buf, PAGE_SIZE, "%s\n", hdev->ctrl_info->fr);
}
static DEVICE_ATTR_RO(csts_pp);
@@ -3365,6 +3216,185 @@ static struct device_attribute *spraid_host_attrs[] = {
NULL,
};
+static int spraid_get_vd_info(struct spraid_dev *hdev, struct spraid_vd_info *vd_info, u16 vid)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.usr_cmd.opcode = USR_CMD_READ;
+ admin_cmd.usr_cmd.info_0.subopcode = cpu_to_le16(USR_CMD_VDINFO);
+ admin_cmd.usr_cmd.info_1.data_len = cpu_to_le16(USR_CMD_RDLEN);
+ admin_cmd.usr_cmd.info_1.param_len = cpu_to_le16(VDINFO_PARAM_LEN);
+ admin_cmd.usr_cmd.cdw10 = cpu_to_le32(vid);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(vd_info, data_ptr, sizeof(struct spraid_vd_info));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_get_bgtask(struct spraid_dev *hdev, struct spraid_bgtask *bgtask)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.usr_cmd.opcode = USR_CMD_READ;
+ admin_cmd.usr_cmd.info_0.subopcode = cpu_to_le16(USR_CMD_BGTASK);
+ admin_cmd.usr_cmd.info_1.data_len = cpu_to_le16(USR_CMD_RDLEN);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(bgtask, data_ptr, sizeof(struct spraid_bgtask));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static ssize_t raid_level_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_sdev_hostdata *hostdata;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret)
+ vd_info->rg_level = ARRAY_SIZE(raid_levels) - 1;
+
+ ret = (vd_info->rg_level < ARRAY_SIZE(raid_levels)) ?
+ vd_info->rg_level : (ARRAY_SIZE(raid_levels) - 1);
+
+ kfree(vd_info);
+
+ return snprintf(buf, PAGE_SIZE, "RAID-%s\n", raid_levels[ret]);
+}
+
+static ssize_t raid_state_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_sdev_hostdata *hostdata;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret) {
+ vd_info->vd_status = 0;
+ vd_info->rg_id = 0xff;
+ }
+
+ ret = (vd_info->vd_status < ARRAY_SIZE(raid_states)) ? vd_info->vd_status : 0;
+
+ kfree(vd_info);
+
+ return snprintf(buf, PAGE_SIZE, "%s\n", raid_states[ret]);
+}
+
+static ssize_t raid_resync_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_bgtask *bgtask;
+ struct spraid_sdev_hostdata *hostdata;
+ u8 rg_id, i, progress = 0;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret)
+ goto out;
+
+ rg_id = vd_info->rg_id;
+
+ bgtask = (struct spraid_bgtask *)vd_info;
+ ret = spraid_get_bgtask(hdev, bgtask);
+ if (ret)
+ goto out;
+ for (i = 0; i < bgtask->task_num; i++) {
+ if ((bgtask->bgtask[i].type == BGTASK_TYPE_REBUILD) &&
+ (le16_to_cpu(bgtask->bgtask[i].vd_id) == rg_id))
+ progress = bgtask->bgtask[i].progress;
+ }
+
+out:
+ kfree(vd_info);
+ return snprintf(buf, PAGE_SIZE, "%d\n", progress);
+}
+
+static DEVICE_ATTR_RO(raid_level);
+static DEVICE_ATTR_RO(raid_state);
+static DEVICE_ATTR_RO(raid_resync);
+
+static struct device_attribute *spraid_dev_attrs[] = {
+ &dev_attr_raid_level,
+ &dev_attr_raid_state,
+ &dev_attr_raid_resync,
+ NULL,
+};
+
+static struct pci_error_handlers spraid_err_handler = {
+ .error_detected = spraid_pci_error_detected,
+ .slot_reset = spraid_pci_slot_reset,
+ .reset_done = spraid_reset_done,
+};
+
+static int spraid_sysfs_host_reset(struct Scsi_Host *shost, int reset_type)
+{
+ int ret;
+ struct spraid_dev *hdev = shost_priv(shost);
+
+ dev_info(hdev->dev, "[%s] start sysfs host reset cmd\n", __func__);
+ ret = spraid_reset_work_sync(hdev);
+ dev_info(hdev->dev, "[%s] stop sysfs host reset cmd[%d]\n", __func__, ret);
+
+ return ret;
+}
+
static struct scsi_host_template spraid_driver_template = {
.module = THIS_MODULE,
.name = "Ramaxel Logic spraid driver",
@@ -3379,9 +3409,11 @@ static struct scsi_host_template spraid_driver_template = {
.eh_bus_reset_handler = spraid_bus_reset_handler,
.eh_host_reset_handler = spraid_shost_reset_handler,
.change_queue_depth = scsi_change_queue_depth,
- .host_tagset = 1,
+ .host_tagset = 0,
.this_id = -1,
.shost_attrs = spraid_host_attrs,
+ .sdev_attrs = spraid_dev_attrs,
+ .host_reset = spraid_sysfs_host_reset,
};
static void spraid_shutdown(struct pci_dev *pdev)
@@ -3392,11 +3424,53 @@ static void spraid_shutdown(struct pci_dev *pdev)
spraid_disable_admin_queue(hdev, true);
}
+/* bsg dispatch user command */
+static int spraid_bsg_host_dispatch(struct bsg_job *job)
+{
+ struct Scsi_Host *shost = dev_to_shost(job->dev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_bsg_request *bsg_req = job->request;
+ int ret = 0;
+
+ dev_log_dbg(hdev->dev, "[%s] msgcode[%d], msglen[%d], timeout[%d], req_nsge[%d], req_len[%d]\n",
+ __func__, bsg_req->msgcode, job->request_len, rq->timeout,
+ job->request_payload.sg_cnt, job->request_payload.payload_len);
+
+ job->reply_len = 0;
+
+ switch (bsg_req->msgcode) {
+ case SPRAID_BSG_ADM:
+ ret = spraid_user_admin_cmd(hdev, job);
+ break;
+ case SPRAID_BSG_IOQ:
+ ret = spraid_user_ioq_cmd(hdev, job);
+ break;
+ default:
+ dev_info(hdev->dev, "[%s] unsupport msgcode[%d]\n", __func__, bsg_req->msgcode);
+ break;
+ }
+
+ if (ret > 0)
+ ret = ret | (ret << 8);
+
+ bsg_job_done(job, ret, 0);
+ return 0;
+}
+
+static inline void spraid_remove_bsg(struct spraid_dev *hdev)
+{
+ if (hdev->bsg_queue) {
+ bsg_unregister_queue(hdev->bsg_queue);
+ blk_cleanup_queue(hdev->bsg_queue);
+ }
+}
static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct spraid_dev *hdev;
struct Scsi_Host *shost;
int node, ret;
+ char bsg_name[15];
shost = scsi_host_alloc(&spraid_driver_template, sizeof(*hdev));
if (!shost) {
@@ -3421,10 +3495,10 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto put_dev;
init_rwsem(&hdev->devices_rwsem);
- INIT_WORK(&hdev->aen_work, spraid_async_event_work);
INIT_WORK(&hdev->scan_work, spraid_scan_work);
INIT_WORK(&hdev->timesyn_work, spraid_timesyn_work);
INIT_WORK(&hdev->reset_work, spraid_reset_work);
+ INIT_WORK(&hdev->fw_act_work, spraid_fw_act_work);
spin_lock_init(&hdev->state_lock);
ret = spraid_alloc_resources(hdev);
@@ -3439,17 +3513,13 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (ret)
goto pci_disable;
- ret = spraid_alloc_admin_tags(hdev);
- if (ret)
- goto disable_admin_q;
-
ret = spraid_init_ctrl_info(hdev);
if (ret)
- goto free_admin_tagset;
+ goto disable_admin_q;
ret = spraid_alloc_iod_ext_mem_pool(hdev);
if (ret)
- goto free_admin_tagset;
+ goto disable_admin_q;
ret = spraid_setup_io_queues(hdev);
if (ret)
@@ -3464,9 +3534,15 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto remove_io_queues;
}
- ret = spraid_create_cdev(hdev);
- if (ret)
+ snprintf(bsg_name, sizeof(bsg_name), "spraid%d", shost->host_no);
+ hdev->bsg_queue = bsg_setup_queue(&shost->shost_gendev, bsg_name,
+ spraid_bsg_host_dispatch, NULL,
+ spraid_cmd_size(hdev, true, false));
+ if (IS_ERR(hdev->bsg_queue)) {
+ dev_err(hdev->dev, "err, setup bsg failed\n");
+ hdev->bsg_queue = NULL;
goto remove_io_queues;
+ }
if (hdev->online_queues == SPRAID_ADMIN_QUEUE_NUM) {
dev_warn(hdev->dev, "warn only admin queue can be used\n");
@@ -3475,11 +3551,11 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
hdev->state = SPRAID_LIVE;
- spraid_send_aen(hdev);
+ spraid_send_all_aen(hdev);
ret = spraid_dev_list_init(hdev);
if (ret)
- goto remove_cdev;
+ goto remove_bsg;
ret = spraid_configure_timestamp(hdev);
if (ret)
@@ -3487,20 +3563,18 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
ret = spraid_alloc_ioq_ptcmds(hdev);
if (ret)
- goto remove_cdev;
+ goto remove_bsg;
scsi_scan_host(hdev->shost);
return 0;
-remove_cdev:
- spraid_remove_cdev(hdev);
+remove_bsg:
+ spraid_remove_bsg(hdev);
remove_io_queues:
spraid_remove_io_queues(hdev);
free_iod_mempool:
spraid_free_iod_ext_mem_pool(hdev);
-free_admin_tagset:
- spraid_remove_admin_tagset(hdev);
disable_admin_q:
spraid_disable_admin_queue(hdev, false);
pci_disable:
@@ -3524,22 +3598,17 @@ static void spraid_remove(struct pci_dev *pdev)
dev_info(hdev->dev, "enter spraid remove\n");
spraid_change_host_state(hdev, SPRAID_DELETING);
+ flush_work(&hdev->reset_work);
- if (!pci_device_is_present(pdev)) {
- scsi_block_requests(shost);
+ if (!pci_device_is_present(pdev))
spraid_back_all_io(hdev);
- scsi_unblock_requests(shost);
- }
- flush_work(&hdev->reset_work);
+ spraid_remove_bsg(hdev);
scsi_remove_host(shost);
-
- kfree(hdev->ioq_ptcmds);
+ spraid_free_ioq_ptcmds(hdev);
kfree(hdev->devices);
- spraid_remove_cdev(hdev);
spraid_remove_io_queues(hdev);
spraid_free_iod_ext_mem_pool(hdev);
- spraid_remove_admin_tagset(hdev);
spraid_disable_admin_queue(hdev, false);
spraid_pci_disable(hdev);
spraid_free_resources(hdev);
@@ -3551,7 +3620,7 @@ static void spraid_remove(struct pci_dev *pdev)
}
static const struct pci_device_id spraid_id_table[] = {
- { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC, SPRAID_SERVER_DEVICE_HAB_DID) },
+ { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC, SPRAID_SERVER_DEVICE_HBA_DID) },
{ PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC, SPRAID_SERVER_DEVICE_RAID_DID) },
{ 0, }
};
@@ -3563,6 +3632,7 @@ static struct pci_driver spraid_driver = {
.probe = spraid_probe,
.remove = spraid_remove,
.shutdown = spraid_shutdown,
+ .err_handler = &spraid_err_handler,
};
static int __init spraid_init(void)
@@ -3573,14 +3643,10 @@ static int __init spraid_init(void)
if (!spraid_wq)
return -ENOMEM;
- ret = alloc_chrdev_region(&spraid_chr_devt, 0, SPRAID_MINORS, "spraid");
- if (ret < 0)
- goto destroy_wq;
-
spraid_class = class_create(THIS_MODULE, "spraid");
if (IS_ERR(spraid_class)) {
ret = PTR_ERR(spraid_class);
- goto unregister_chrdev;
+ goto destroy_wq;
}
ret = pci_register_driver(&spraid_driver);
@@ -3591,8 +3657,6 @@ static int __init spraid_init(void)
destroy_class:
class_destroy(spraid_class);
-unregister_chrdev:
- unregister_chrdev_region(spraid_chr_devt, SPRAID_MINORS);
destroy_wq:
destroy_workqueue(spraid_wq);
@@ -3603,12 +3667,11 @@ static void __exit spraid_exit(void)
{
pci_unregister_driver(&spraid_driver);
class_destroy(spraid_class);
- unregister_chrdev_region(spraid_chr_devt, SPRAID_MINORS);
destroy_workqueue(spraid_wq);
ida_destroy(&spraid_instance_ida);
}
-MODULE_AUTHOR("Ramaxel Memory Technology");
+MODULE_AUTHOR("songyl(a)ramaxel.com");
MODULE_DESCRIPTION("Ramaxel Memory Technology SPraid Driver");
MODULE_LICENSE("GPL");
MODULE_VERSION(SPRAID_DRV_VERSION);
--
2.32.0
Ramaxel inclusion
category: features
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ON8F
CVE: NA
Changes:
1. Split scmd_tmout_nonpt into two parameters:
scmd_tmout_vd/scmd_tmout_rawdisk
2. Return -ETIME instead of -EINVAL when a command times out.
3. Add one module parameter: max_io_force.
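For illustration only (not part of the patch), a minimal sketch of how a
caller can act on the new return value; the abort-handler hunks below key
off exactly this distinction:

	/* Hypothetical caller: -ETIME now unambiguously means the command
	 * itself timed out in firmware, rather than a bad argument. */
	ret = spraid_send_abort_cmd(hdev, hostdata->hdid, hwq, cid);
	if (ret == -ETIME) {
		/* abort timed out: skip waiting and escalate the reset */
	} else if (ret) {
		/* some other submission or firmware error */
	}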
Signed-off-by: Yanling Song <songyl(a)ramaxel.com>
Reviewed-by: Jiang Yu <yujiang(a)ramaxel.com>
---
drivers/scsi/spraid/spraid_main.c | 96 +++++++++++++++----------------
1 file changed, 45 insertions(+), 51 deletions(-)
diff --git a/drivers/scsi/spraid/spraid_main.c b/drivers/scsi/spraid/spraid_main.c
index 519b39f44e91..7069582d741a 100644
--- a/drivers/scsi/spraid/spraid_main.c
+++ b/drivers/scsi/spraid/spraid_main.c
@@ -42,10 +42,17 @@ static u32 admin_tmout = 60;
module_param(admin_tmout, uint, 0644);
MODULE_PARM_DESC(admin_tmout, "admin commands timeout (seconds)");
-static u32 scmd_tmout_nonpt = 180;
-module_param(scmd_tmout_nonpt, uint, 0644);
-MODULE_PARM_DESC(scmd_tmout_nonpt,
- "scsi commands timeout for rawdisk&raid(seconds)");
+static u32 scmd_tmout_rawdisk = 180;
+module_param(scmd_tmout_rawdisk, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_rawdisk, "scsi commands timeout for rawdisk(seconds)");
+
+static u32 scmd_tmout_vd = 180;
+module_param(scmd_tmout_vd, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_vd, "scsi commands timeout for vd(seconds)");
+
+static bool max_io_force;
+module_param(max_io_force, bool, 0644);
+MODULE_PARM_DESC(max_io_force, "force max_hw_sectors_kb = 1024, default false(performance first)");
static int ioq_depth_set(const char *val, const struct kernel_param *kp);
static const struct kernel_param_ops ioq_depth_ops = {
@@ -78,7 +85,7 @@ static unsigned char log_debug_switch;
module_param_cb(log_debug_switch, &log_debug_switch_ops,
&log_debug_switch, 0644);
MODULE_PARM_DESC(log_debug_switch,
- "set log state, default non-zero for switch on");
+ "set log state, default zero for switch off");
static int small_pool_num_set(const char *val, const struct kernel_param *kp)
{
@@ -132,7 +139,6 @@ static struct workqueue_struct *spraid_wq;
#define SPRAID_DRV_VERSION "1.0.0.0"
#define ADMIN_TIMEOUT (admin_tmout * HZ)
-#define ADMIN_ERR_TIMEOUT 32757
#define SPRAID_WAIT_ABNL_CMD_TIMEOUT (3 * 2)
@@ -242,13 +248,6 @@ static int spraid_pci_enable(struct spraid_dev *hdev)
goto disable;
}
- ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
- if (ret < 0) {
- dev_err(hdev->dev,
- "Allocate one IRQ for setup admin channel failed\n");
- goto disable;
- }
-
hdev->cap = lo_hi_readq(hdev->bar + SPRAID_REG_CAP);
hdev->ioq_depth = min_t(u32, SPRAID_CAP_MQES(hdev->cap) + 1,
io_queue_depth);
@@ -261,13 +260,21 @@ static int spraid_pci_enable(struct spraid_dev *hdev)
maskbit);
maskbit = SPRAID_DMA_MSK_BIT_MAX;
}
- if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(maskbit))) {
+ if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(maskbit)) &&
+ dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32))) {
dev_err(hdev->dev, "set dma mask and coherent failed\n");
goto disable;
}
dev_info(hdev->dev, "set dma mask[%llu] success\n", maskbit);
+ ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
+
+ if (ret < 0) {
+ dev_err(hdev->dev, "Allocate one IRQ for setup admin channel failed\n");
+ goto disable;
+ }
+
pci_enable_pcie_error_reporting(pdev);
pci_save_state(pdev);
@@ -840,6 +847,9 @@ static void spraid_map_status(struct spraid_iod *iod, struct scsi_cmnd *scmd,
break;
default:
set_host_byte(scmd, DID_BAD_TARGET);
+ dev_warn(iod->spraidq->hdev->dev, "[%s] cid[%d] qid[%d];"
+ "bad status[0x%x]\n", __func__, cqe->cmd_id,
+ le16_to_cpu(cqe->sq_id), le16_to_cpu(cqe->status));
break;
}
}
@@ -981,32 +991,17 @@ static void spraid_slave_destroy(struct scsi_device *sdev)
static int spraid_slave_configure(struct scsi_device *sdev)
{
- u16 idx;
- unsigned int timeout = scmd_tmout_nonpt * HZ;
+ unsigned int timeout = scmd_tmout_rawdisk * HZ;
struct spraid_dev *hdev = shost_priv(sdev->host);
struct spraid_sdev_hostdata *hostdata = sdev->hostdata;
u32 max_sec = sdev->host->max_sectors;
if (hostdata) {
- idx = hostdata->hdid - 1;
- if (sdev->channel == hdev->devices[idx].channel &&
- sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
- sdev->lun < hdev->devices[idx].lun) {
- if (SPRAID_DEV_INFO_ATTR_PT(hdev->devices[idx].attr))
- timeout = 30 * HZ;
- else
- timeout = scmd_tmout_nonpt * HZ;
- max_sec = le16_to_cpu(hdev->devices[idx].max_io_kb)
- << 1;
- } else {
- dev_err(hdev->dev,
- "[%s] err, sdev->channel:id:lun[%d:%d:%lld];"
- "devices[%d], channel:target:lun[%d:%d:%d]\n",
- __func__, sdev->channel, sdev->id, sdev->lun,
- idx, hdev->devices[idx].channel,
- hdev->devices[idx].target,
- hdev->devices[idx].lun);
- }
+ if (SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ timeout = scmd_tmout_vd * HZ;
+ else if (SPRAID_DEV_INFO_ATTR_RAWDISK(hostdata->attr))
+ timeout = scmd_tmout_rawdisk * HZ;
+ max_sec = hostdata->max_io_kb << 1;
} else {
dev_err(hdev->dev, "[%s] err, sdev->hostdata is null\n",
__func__);
@@ -1018,11 +1013,12 @@ static int spraid_slave_configure(struct scsi_device *sdev)
if ((max_sec == 0) || (max_sec > sdev->host->max_sectors))
max_sec = sdev->host->max_sectors;
- dev_info(hdev->dev,
- "[%s] sdev->channel:id:lun[%d:%d:%lld];"
- " scmd_timeout[%d]s, maxsec[%d]\n",
- __func__, sdev->channel, sdev->id,
- sdev->lun, timeout / HZ, max_sec);
+ if (!max_io_force)
+ blk_queue_max_hw_sectors(sdev->request_queue, max_sec);
+
+ dev_info(hdev->dev, "[%s] sdev->channel:id:lun[%d:%d:%lld];"
+ "scmd_timeout[%d]s, maxsec[%d]\n", __func__, sdev->channel,
+ sdev->id, sdev->lun, timeout / HZ, max_sec);
return 0;
}
@@ -1677,7 +1673,7 @@ static int spraid_submit_admin_sync_cmd(struct spraid_dev *hdev,
cmd->usr_cmd.opcode, cmd->usr_cmd.info_0.subopcode);
WRITE_ONCE(adm_cmd->state, SPRAID_CMD_TIMEOUT);
spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
- return -EINVAL;
+ return -ETIME;
}
if (result0)
@@ -2630,9 +2626,6 @@ static int spraid_user_admin_cmd(struct spraid_dev *hdev, struct bsg_job *job)
return -EBUSY;
}
- dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init\n",
- __func__, cmd->opcode, cmd->info_0.subopcode);
-
memset(&admin_cmd, 0, sizeof(admin_cmd));
admin_cmd.common.opcode = cmd->opcode;
admin_cmd.common.flags = cmd->flags;
@@ -2659,10 +2652,11 @@ static int spraid_user_admin_cmd(struct spraid_dev *hdev, struct bsg_job *job)
memcpy(job->reply, result, sizeof(result));
}
- dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x];"
- " status[0x%x] result0[0x%x] result1[0x%x]\n",
- __func__, cmd->opcode, cmd->info_0.subopcode,
- status, result[0], result[1]);
+ if (status)
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x];"
+ " status[0x%x] result0[0x%x] result1[0x%x]\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode,
+ status, result[0], result[1]);
spraid_bsg_unmap_data(hdev, job);
@@ -2742,7 +2736,7 @@ static int spraid_submit_ioq_sync_cmd(struct spraid_dev *hdev,
(le32_to_cpu(cmd->common.cdw3[0]) & 0xffff));
WRITE_ONCE(pt_cmd->state, SPRAID_CMD_TIMEOUT);
spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
- return -EINVAL;
+ return -ETIME;
}
if (result && reslen) {
@@ -3140,7 +3134,7 @@ static int spraid_abort_handler(struct scsi_cmnd *scmd)
dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, aborting\n", cid, hwq);
ret = spraid_send_abort_cmd(hdev, hostdata->hdid, hwq, cid);
- if (ret != ADMIN_ERR_TIMEOUT) {
+ if (ret != -ETIME) {
ret = spraid_wait_abnl_cmd_done(iod);
if (ret) {
dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed;"
@@ -3623,7 +3617,7 @@ static int spraid_bsg_host_dispatch(struct bsg_job *job)
struct spraid_bsg_request *bsg_req = job->request;
int ret = 0;
- dev_info(hdev->dev, "[%s] msgcode[%d], msglen[%d], timeout[%d];"
+ dev_log_dbg(hdev->dev, "[%s] msgcode[%d], msglen[%d], timeout[%d];"
" req_nsge[%d], req_len[%d]\n",
__func__, bsg_req->msgcode, job->request_len,
rq->timeout, job->request_payload.sg_cnt,
--
2.27.0
[PATCH openEuler-5.10 1/2] kabi: reserve space for arm64 cpufeature related structure
by Zheng Zengkai 04 Jan '22
04 Jan '22
From: Jialin Zhang <zhangjialin11(a)huawei.com>
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4JBL0
CVE: NA
-------------------------------
Reserve space for the structures cpu_hwcap_keys and cpu_hwcaps.
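For context, a sketch of why bumping ARM64_NCAPS alone is enough; the
real declarations live in arch/arm64/kernel/cpufeature.c and are assumed
to look roughly like this:

	/* Both per-capability objects are sized by ARM64_NCAPS, so raising
	 * it from 64 to 80 leaves 16 spare slots while existing capability
	 * indices, and thus the kABI layout, stay unchanged. */
	DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
	DEFINE_STATIC_KEY_ARRAY_FALSE(cpu_hwcap_keys, ARM64_NCAPS);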
Signed-off-by: Jialin Zhang <zhangjialin11(a)huawei.com>
Reviewed-by: wangxiongfeng <wangxiongfeng2(a)huawei.com>
Reviewed-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Hanjun Guo <guohanjun(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
arch/arm64/include/asm/cpucaps.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 93bae3795165..c58a1919bfc9 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -72,6 +72,6 @@
#define ARM64_HAS_TWED 62
#define ARM64_WORKAROUND_HISILICON_1980005 63
-#define ARM64_NCAPS 64
+#define ARM64_NCAPS 80
#endif /* __ASM_CPUCAPS_H */
--
2.20.1
04 Jan '22
From: Linus Walleij <linus.walleij(a)linaro.org>
mainline inclusion
from mainline-v5.3-rc1
commit b3198c38f02d54a5e964258a2180d502abe6eaf0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4P7YB
CVE: N/A
---
The 50 ms default timeout waiting for vblanks is too small
for the first vblank from the ST-Ericsson MCDE display
controller over DSI. Presumably this is because the DSI
display is command-mode only and the state machine will
take some time setting up its state for the first display
update, and we hit a timeout. 100 ms makes it pass without
problems.
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20190430093746.26485-1-linus.…
Signed-off-by: Gou Hao <gouhao(a)uniontech.com>
---
drivers/gpu/drm/drm_atomic_helper.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 86efd2da37f9..553415fe8ede 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1424,7 +1424,7 @@ drm_atomic_helper_wait_for_vblanks(struct drm_device *dev,
ret = wait_event_timeout(dev->vblank[i].queue,
old_state->crtcs[i].last_vblank_count !=
drm_crtc_vblank_count(crtc),
- msecs_to_jiffies(50));
+ msecs_to_jiffies(100));
WARN(!ret, "[CRTC:%d:%s] vblank wait timed out\n",
crtc->base.id, crtc->name);
--
2.20.1
From: Zhang Jian <zhangjian210(a)huawei.com>
ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OXH9
CVE: NA
-------------------------------------------------
Collect the processes that have the page mapped via collect_procs().
@page: if the page is part of a hugepage/compound page, we must
use compound_head() to find its head page to prevent a kernel panic,
and the page must be locked.
@to_kill: the function returns a linked list; once we are done with
this list, we must kfree() its entries.
@force_early: if we want to find all processes, we must set it to true; if
it is false, the function only returns processes that have the PF_MCE_PROCESS
or PF_MCE_EARLY mark.
Limits: if force_early is true, sysctl_memory_failure_early_kill has no effect.
If force_early is false, no process has the PF_MCE_PROCESS or PF_MCE_EARLY flag,
and sysctl_memory_failure_early_kill is enabled, the function returns all tasks
regardless of whether they have the PF_MCE_PROCESS or PF_MCE_EARLY flag.
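A minimal usage sketch of the exported interface (hypothetical caller; the
to_kill entry type is private to mm/memory-failure.c, so consuming and
freeing the entries is only outlined):

#include <linux/mm.h>
#include <linux/pagemap.h>

/* Hypothetical module code; error handling elided. */
static void example_collect(struct page *page)
{
	struct page *head = compound_head(page); /* cope with compound pages */
	LIST_HEAD(to_kill);

	lock_page(head);
	collect_procs(head, &to_kill, true); /* force_early: find all mappers */
	unlock_page(head);

	/* consume the list here, then kfree() each entry as described above */
}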
Signed-off-by: Zhang Jian <zhangjian210(a)huawei.com>
Reviewed-by: Weilong Chen <chenweilong(a)huawei.com>
Reviewed-by: Kefeng Wang<wangkefeng.wang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
include/linux/mm.h | 2 ++
mm/memory-failure.c | 3 ++-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f801fa5e60289..7b724d39e6ee0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2866,6 +2866,8 @@ extern int sysctl_memory_failure_recovery;
extern void shake_page(struct page *p, int access);
extern atomic_long_t num_poisoned_pages __read_mostly;
extern int soft_offline_page(struct page *page, int flags);
+extern void collect_procs(struct page *page, struct list_head *tokill,
+ int force_early);
/*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 52619a9bc3b0e..72e1746e5386f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -512,7 +512,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
* First preallocate one tokill structure outside the spin locks,
* so that we can kill at least one process reasonably reliable.
*/
-static void collect_procs(struct page *page, struct list_head *tokill,
+void collect_procs(struct page *page, struct list_head *tokill,
int force_early)
{
struct to_kill *tk;
@@ -529,6 +529,7 @@ static void collect_procs(struct page *page, struct list_head *tokill,
collect_procs_file(page, tokill, &tk, force_early);
kfree(tk);
}
+EXPORT_SYMBOL_GPL(collect_procs);
static const char *action_name[] = {
[MF_IGNORED] = "Ignored",
--
2.25.1
1
0

31 Dec '21
This initial commit contains Ramaxel's spraid module.
Signed-off-by: Yanling Song <songyl(a)ramaxel.com>
Changes from V3:
1. Use a macro to replace the scmd_tmout_nonpt module parameter.
Changes from V2:
1. Change "depends BLK_DEV_BSGLIB" to "select BLK_DEV_BSGLIB".
2. Add more descriptions in help.
3. Use pr_debug() instead of introducing dev_log_dbg().
4. Use get_unaligned_be*() in spraid_setup_rw_cmd() (see the sketch after this changelog);
5. Remove some unnecessary module parameters:
scmd_tmout_pt, wait_abl_tmout, use_sgl_force
6. Some module parameters will be changed in the next version:
admin_tmout, scmd_tmout_nonpt
Changes from V1:
1. Use the BSG module to replace ioctl
2. Use author's email as MODULE_AUTHOR
3. Remove "default=m" in drivers/scsi/spraid/Kconfig
4. To be changed in the next version:
a. Use get_unaligned_be*() in spraid_setup_rw_cmd();
b. Use pr_debug() instead of introducing dev_log_dbg().
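For reference, a short sketch of the get_unaligned_be*() helpers named in
item 4 above (declared in <asm/unaligned.h>); spraid_setup_rw_cmd() below
uses them to pull big-endian LBA/length fields from unaligned CDB offsets.
The helper here is illustrative only, not part of the driver:

#include <linux/types.h>
#include <asm/unaligned.h>

/* Decode the LBA and transfer length of a 10-byte READ(10)/WRITE(10) cdb. */
static void example_decode_rw10(const u8 *cdb, u32 *lba, u16 *nblocks)
{
	*lba = get_unaligned_be32(&cdb[2]);     /* bytes 2..5, big-endian */
	*nblocks = get_unaligned_be16(&cdb[7]); /* bytes 7..8, big-endian */
}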
---
Documentation/scsi/spraid.rst | 16 +
MAINTAINERS | 7 +
drivers/scsi/Kconfig | 1 +
drivers/scsi/Makefile | 1 +
drivers/scsi/spraid/Kconfig | 13 +
drivers/scsi/spraid/Makefile | 7 +
drivers/scsi/spraid/spraid.h | 695 ++++++
drivers/scsi/spraid/spraid_main.c | 3266 +++++++++++++++++++++++++++++
8 files changed, 4006 insertions(+)
create mode 100644 Documentation/scsi/spraid.rst
create mode 100644 drivers/scsi/spraid/Kconfig
create mode 100644 drivers/scsi/spraid/Makefile
create mode 100644 drivers/scsi/spraid/spraid.h
create mode 100644 drivers/scsi/spraid/spraid_main.c
diff --git a/Documentation/scsi/spraid.rst b/Documentation/scsi/spraid.rst
new file mode 100644
index 000000000000..f87b02a6907b
--- /dev/null
+++ b/Documentation/scsi/spraid.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================================
+SPRAID - Ramaxel SCSI Raid Controller driver
+==============================================
+
+This file describes the spraid SCSI driver for Ramaxel
+raid controllers. The spraid driver is the first generation raid driver for
+Ramaxel Corp.
+
+For Ramaxel spraid controller support, enable the spraid driver
+when configuring the kernel.
+
+Supported devices
+=================
+<Controller names to be added as they become publicly available.>
diff --git a/MAINTAINERS b/MAINTAINERS
index 57f7656fe513..68560008b259 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17973,6 +17973,13 @@ F: include/dt-bindings/spmi/spmi.h
F: include/linux/spmi.h
F: include/trace/events/spmi.h
+SPRAID SCSI/Raid DRIVERS
+M: Yanling Song <yanling.song(a)ramaxel.com>
+L: linux-scsi(a)vger.kernel.org
+S: Maintained
+F: Documentation/scsi/spraid.rst
+F: drivers/scsi/spraid/
+
SPU FILE SYSTEM
M: Jeremy Kerr <jk(a)ozlabs.org>
L: linuxppc-dev(a)lists.ozlabs.org
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 6e3a04107bb6..3da5d26e1e11 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -501,6 +501,7 @@ source "drivers/scsi/mpt3sas/Kconfig"
source "drivers/scsi/mpi3mr/Kconfig"
source "drivers/scsi/smartpqi/Kconfig"
source "drivers/scsi/ufs/Kconfig"
+source "drivers/scsi/spraid/Kconfig"
config SCSI_HPTIOP
tristate "HighPoint RocketRAID 3xxx/4xxx Controller support"
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 19814c26c908..192f8fb10a19 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -96,6 +96,7 @@ obj-$(CONFIG_SCSI_ZALON) += zalon7xx.o
obj-$(CONFIG_SCSI_DC395x) += dc395x.o
obj-$(CONFIG_SCSI_AM53C974) += esp_scsi.o am53c974.o
obj-$(CONFIG_CXLFLASH) += cxlflash/
+obj-$(CONFIG_RAMAXEL_SPRAID) += spraid/
obj-$(CONFIG_MEGARAID_LEGACY) += megaraid.o
obj-$(CONFIG_MEGARAID_NEWGEN) += megaraid/
obj-$(CONFIG_MEGARAID_SAS) += megaraid/
diff --git a/drivers/scsi/spraid/Kconfig b/drivers/scsi/spraid/Kconfig
new file mode 100644
index 000000000000..bfbba3db8db0
--- /dev/null
+++ b/drivers/scsi/spraid/Kconfig
@@ -0,0 +1,13 @@
+#
+# Ramaxel driver configuration
+#
+
+config RAMAXEL_SPRAID
+ tristate "Ramaxel spraid Adapter"
+ depends on PCI && SCSI
+ select BLK_DEV_BSGLIB
+ depends on ARM64 || X86_64
+ help
+ This driver supports the Ramaxel SPRxxx series
+ RAID controllers, which have a PCIe Gen4 host
+ interface and support SAS/SATA HDDs/SSDs.
diff --git a/drivers/scsi/spraid/Makefile b/drivers/scsi/spraid/Makefile
new file mode 100644
index 000000000000..ad2c2a84ddaf
--- /dev/null
+++ b/drivers/scsi/spraid/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the Ramaxel device drivers.
+#
+
+obj-$(CONFIG_RAMAXEL_SPRAID) += spraid.o
+
+spraid-objs := spraid_main.o
diff --git a/drivers/scsi/spraid/spraid.h b/drivers/scsi/spraid/spraid.h
new file mode 100644
index 000000000000..617f5c2cdb82
--- /dev/null
+++ b/drivers/scsi/spraid/spraid.h
@@ -0,0 +1,695 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
+
+#ifndef __SPRAID_H_
+#define __SPRAID_H_
+
+#define SPRAID_CAP_MQES(cap) ((cap) & 0xffff)
+#define SPRAID_CAP_STRIDE(cap) (((cap) >> 32) & 0xf)
+#define SPRAID_CAP_MPSMIN(cap) (((cap) >> 48) & 0xf)
+#define SPRAID_CAP_MPSMAX(cap) (((cap) >> 52) & 0xf)
+#define SPRAID_CAP_TIMEOUT(cap) (((cap) >> 24) & 0xff)
+#define SPRAID_CAP_DMAMASK(cap) (((cap) >> 37) & 0xff)
+
+#define SPRAID_DEFAULT_MAX_CHANNEL 4
+#define SPRAID_DEFAULT_MAX_ID 240
+#define SPRAID_DEFAULT_MAX_LUN_PER_HOST 8
+#define MAX_SECTORS 2048
+
+#define IO_SQE_SIZE sizeof(struct spraid_ioq_command)
+#define ADMIN_SQE_SIZE sizeof(struct spraid_admin_command)
+#define SQE_SIZE(qid) (((qid) > 0) ? IO_SQE_SIZE : ADMIN_SQE_SIZE)
+#define CQ_SIZE(depth) ((depth) * sizeof(struct spraid_completion))
+#define SQ_SIZE(qid, depth) ((depth) * SQE_SIZE(qid))
+
+#define SENSE_SIZE(depth) ((depth) * SCSI_SENSE_BUFFERSIZE)
+
+#define SPRAID_AQ_DEPTH 128
+#define SPRAID_NR_AEN_COMMANDS 16
+#define SPRAID_AQ_BLK_MQ_DEPTH (SPRAID_AQ_DEPTH - SPRAID_NR_AEN_COMMANDS)
+#define SPRAID_AQ_MQ_TAG_DEPTH (SPRAID_AQ_BLK_MQ_DEPTH - 1)
+
+#define SPRAID_ADMIN_QUEUE_NUM 1
+#define SPRAID_PTCMDS_PERQ 1
+#define SPRAID_IO_BLK_MQ_DEPTH (hdev->shost->can_queue)
+#define SPRAID_NR_IOQ_PTCMDS (SPRAID_PTCMDS_PERQ * hdev->shost->nr_hw_queues)
+
+#define FUA_MASK 0x08
+#define SPRAID_MINORS BIT(MINORBITS)
+
+#define COMMAND_IS_WRITE(cmd) ((cmd)->common.opcode & 1)
+
+#define SPRAID_IO_IOSQES 7
+#define SPRAID_IO_IOCQES 4
+#define PRP_ENTRY_SIZE 8
+
+#define SMALL_POOL_SIZE 256
+#define MAX_SMALL_POOL_NUM 16
+#define MAX_CMD_PER_DEV 64
+#define MAX_CDB_LEN 32
+
+#define SPRAID_UP_TO_MULTY4(x) (((x) + 4) & (~0x03))
+
+#define CQE_STATUS_SUCCESS (0x0)
+
+#define PCI_VENDOR_ID_RAMAXEL_LOGIC 0x1E81
+
+#define SPRAID_SERVER_DEVICE_HBA_DID 0x2100
+#define SPRAID_SERVER_DEVICE_RAID_DID 0x2200
+
+#define IO_6_DEFAULT_TX_LEN 256
+
+#define SPRAID_INT_PAGES 2
+#define SPRAID_INT_BYTES(hdev) (SPRAID_INT_PAGES * (hdev)->page_size)
+
+enum {
+ SPRAID_REQ_CANCELLED = (1 << 0),
+ SPRAID_REQ_USERCMD = (1 << 1),
+};
+
+enum {
+ SPRAID_SC_SUCCESS = 0x0,
+ SPRAID_SC_INVALID_OPCODE = 0x1,
+ SPRAID_SC_INVALID_FIELD = 0x2,
+
+ SPRAID_SC_ABORT_LIMIT = 0x103,
+ SPRAID_SC_ABORT_MISSING = 0x104,
+ SPRAID_SC_ASYNC_LIMIT = 0x105,
+
+ SPRAID_SC_DNR = 0x4000,
+};
+
+enum {
+ SPRAID_REG_CAP = 0x0000,
+ SPRAID_REG_CC = 0x0014,
+ SPRAID_REG_CSTS = 0x001c,
+ SPRAID_REG_AQA = 0x0024,
+ SPRAID_REG_ASQ = 0x0028,
+ SPRAID_REG_ACQ = 0x0030,
+ SPRAID_REG_DBS = 0x1000,
+};
+
+enum {
+ SPRAID_CC_ENABLE = 1 << 0,
+ SPRAID_CC_CSS_NVM = 0 << 4,
+ SPRAID_CC_MPS_SHIFT = 7,
+ SPRAID_CC_AMS_SHIFT = 11,
+ SPRAID_CC_SHN_SHIFT = 14,
+ SPRAID_CC_IOSQES_SHIFT = 16,
+ SPRAID_CC_IOCQES_SHIFT = 20,
+ SPRAID_CC_AMS_RR = 0 << SPRAID_CC_AMS_SHIFT,
+ SPRAID_CC_SHN_NONE = 0 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CC_IOSQES = SPRAID_IO_IOSQES << SPRAID_CC_IOSQES_SHIFT,
+ SPRAID_CC_IOCQES = SPRAID_IO_IOCQES << SPRAID_CC_IOCQES_SHIFT,
+ SPRAID_CC_SHN_NORMAL = 1 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CC_SHN_MASK = 3 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CSTS_CFS_SHIFT = 1,
+ SPRAID_CSTS_SHST_SHIFT = 2,
+ SPRAID_CSTS_PP_SHIFT = 5,
+ SPRAID_CSTS_RDY = 1 << 0,
+ SPRAID_CSTS_SHST_CMPLT = 2 << 2,
+ SPRAID_CSTS_SHST_MASK = 3 << 2,
+ SPRAID_CSTS_CFS_MASK = 1 << SPRAID_CSTS_CFS_SHIFT,
+ SPRAID_CSTS_PP_MASK = 1 << SPRAID_CSTS_PP_SHIFT,
+};
+
+enum {
+ SPRAID_ADMIN_DELETE_SQ = 0x00,
+ SPRAID_ADMIN_CREATE_SQ = 0x01,
+ SPRAID_ADMIN_DELETE_CQ = 0x04,
+ SPRAID_ADMIN_CREATE_CQ = 0x05,
+ SPRAID_ADMIN_ABORT_CMD = 0x08,
+ SPRAID_ADMIN_SET_FEATURES = 0x09,
+ SPRAID_ADMIN_ASYNC_EVENT = 0x0c,
+ SPRAID_ADMIN_GET_INFO = 0xc6,
+ SPRAID_ADMIN_RESET = 0xc8,
+};
+
+enum {
+ SPRAID_GET_INFO_CTRL = 0,
+ SPRAID_GET_INFO_DEV_LIST = 1,
+};
+
+enum {
+ SPRAID_RESET_TARGET = 0,
+ SPRAID_RESET_BUS = 1,
+};
+
+enum {
+ SPRAID_AEN_ERROR = 0,
+ SPRAID_AEN_NOTICE = 2,
+ SPRAID_AEN_VS = 7,
+};
+
+enum {
+ SPRAID_AEN_DEV_CHANGED = 0x00,
+ SPRAID_AEN_HOST_PROBING = 0x10,
+};
+
+enum {
+ SPRAID_AEN_TIMESYN = 0x07
+};
+
+enum {
+ SPRAID_CMD_WRITE = 0x01,
+ SPRAID_CMD_READ = 0x02,
+
+ SPRAID_CMD_NONIO_NONE = 0x80,
+ SPRAID_CMD_NONIO_TODEV = 0x81,
+ SPRAID_CMD_NONIO_FROMDEV = 0x82,
+};
+
+enum {
+ SPRAID_QUEUE_PHYS_CONTIG = (1 << 0),
+ SPRAID_CQ_IRQ_ENABLED = (1 << 1),
+
+ SPRAID_FEAT_NUM_QUEUES = 0x07,
+ SPRAID_FEAT_ASYNC_EVENT = 0x0b,
+ SPRAID_FEAT_TIMESTAMP = 0x0e,
+};
+
+enum spraid_state {
+ SPRAID_NEW,
+ SPRAID_LIVE,
+ SPRAID_RESETTING,
+ SPRAID_DELETING,
+ SPRAID_DEAD,
+};
+
+enum {
+ SPRAID_CARD_HBA,
+ SPRAID_CARD_RAID,
+};
+
+enum spraid_cmd_type {
+ SPRAID_CMD_ADM,
+ SPRAID_CMD_IOPT,
+};
+
+struct spraid_completion {
+ __le32 result;
+ union {
+ struct {
+ __u8 sense_len;
+ __u8 resv[3];
+ };
+ __le32 result1;
+ };
+ __le16 sq_head;
+ __le16 sq_id;
+ __u16 cmd_id;
+ __le16 status;
+};
+
+struct spraid_ctrl_info {
+ __le32 nd;
+ __le16 max_cmds;
+ __le16 max_channel;
+ __le32 max_tgt_id;
+ __le16 max_lun;
+ __le16 max_num_sge;
+ __le16 lun_num_in_boot;
+ __u8 mdts;
+ __u8 acl;
+ __u8 aerl;
+ __u8 card_type;
+ __u16 rsvd;
+ __u32 rtd3e;
+ __u8 sn[32];
+ __u8 fr[16];
+ __u8 rsvd1[4020];
+};
+
+struct spraid_dev {
+ struct pci_dev *pdev;
+ struct device *dev;
+ struct Scsi_Host *shost;
+ struct spraid_queue *queues;
+ struct dma_pool *prp_page_pool;
+ struct dma_pool *prp_small_pool[MAX_SMALL_POOL_NUM];
+ mempool_t *iod_mempool;
+ void __iomem *bar;
+ u32 max_qid;
+ u32 num_vecs;
+ u32 queue_count;
+ u32 ioq_depth;
+ int db_stride;
+ u32 __iomem *dbs;
+ struct rw_semaphore devices_rwsem;
+ int numa_node;
+ u32 page_size;
+ u32 ctrl_config;
+ u32 online_queues;
+ u64 cap;
+ int instance;
+ struct spraid_ctrl_info *ctrl_info;
+ struct spraid_dev_info *devices;
+
+ struct spraid_cmd *adm_cmds;
+ struct list_head adm_cmd_list;
+ spinlock_t adm_cmd_lock;
+
+ struct spraid_cmd *ioq_ptcmds;
+ struct list_head ioq_pt_list;
+ spinlock_t ioq_pt_lock;
+
+ struct work_struct scan_work;
+ struct work_struct timesyn_work;
+ struct work_struct reset_work;
+
+ enum spraid_state state;
+ spinlock_t state_lock;
+ struct request_queue *bsg_queue;
+};
+
+struct spraid_sgl_desc {
+ __le64 addr;
+ __le32 length;
+ __u8 rsvd[3];
+ __u8 type;
+};
+
+union spraid_data_ptr {
+ struct {
+ __le64 prp1;
+ __le64 prp2;
+ };
+ struct spraid_sgl_desc sgl;
+};
+
+struct spraid_admin_common_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le32 cdw2[4];
+ union spraid_data_ptr dptr;
+ __le32 cdw10;
+ __le32 cdw11;
+ __le32 cdw12;
+ __le32 cdw13;
+ __le32 cdw14;
+ __le32 cdw15;
+};
+
+struct spraid_features {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[2];
+ union spraid_data_ptr dptr;
+ __le32 fid;
+ __le32 dword11;
+ __le32 dword12;
+ __le32 dword13;
+ __le32 dword14;
+ __le32 dword15;
+};
+
+struct spraid_create_cq {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[5];
+ __le64 prp1;
+ __u64 rsvd8;
+ __le16 cqid;
+ __le16 qsize;
+ __le16 cq_flags;
+ __le16 irq_vector;
+ __u32 rsvd12[4];
+};
+
+struct spraid_create_sq {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[5];
+ __le64 prp1;
+ __u64 rsvd8;
+ __le16 sqid;
+ __le16 qsize;
+ __le16 sq_flags;
+ __le16 cqid;
+ __u32 rsvd12[4];
+};
+
+struct spraid_delete_queue {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[9];
+ __le16 qid;
+ __u16 rsvd10;
+ __u32 rsvd11[5];
+};
+
+struct spraid_get_info {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u32 rsvd2[4];
+ union spraid_data_ptr dptr;
+ __u8 type;
+ __u8 rsvd10[3];
+ __le32 cdw11;
+ __u32 rsvd12[4];
+};
+
+struct spraid_usr_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ union {
+ struct {
+ __le16 subopcode;
+ __le16 rsvd1;
+ } info_0;
+ __le32 cdw2;
+ };
+ union {
+ struct {
+ __le16 data_len;
+ __le16 param_len;
+ } info_1;
+ __le32 cdw3;
+ };
+ __u64 metadata;
+ union spraid_data_ptr dptr;
+ __le32 cdw10;
+ __le32 cdw11;
+ __le32 cdw12;
+ __le32 cdw13;
+ __le32 cdw14;
+ __le32 cdw15;
+};
+
+enum {
+ SPRAID_CMD_FLAG_SGL_METABUF = (1 << 6),
+ SPRAID_CMD_FLAG_SGL_METASEG = (1 << 7),
+ SPRAID_CMD_FLAG_SGL_ALL = SPRAID_CMD_FLAG_SGL_METABUF | SPRAID_CMD_FLAG_SGL_METASEG,
+};
+
+enum spraid_cmd_state {
+ SPRAID_CMD_IDLE = 0,
+ SPRAID_CMD_IN_FLIGHT = 1,
+ SPRAID_CMD_COMPLETE = 2,
+ SPRAID_CMD_TIMEOUT = 3,
+ SPRAID_CMD_TMO_COMPLETE = 4,
+};
+
+struct spraid_abort_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[4];
+ __le16 sqid;
+ __le16 cid;
+ __u32 rsvd11[5];
+};
+
+struct spraid_reset_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[4];
+ __u8 type;
+ __u8 rsvd10[3];
+ __u32 rsvd11[5];
+};
+
+struct spraid_admin_command {
+ union {
+ struct spraid_admin_common_command common;
+ struct spraid_features features;
+ struct spraid_create_cq create_cq;
+ struct spraid_create_sq create_sq;
+ struct spraid_delete_queue delete_queue;
+ struct spraid_get_info get_info;
+ struct spraid_abort_cmd abort;
+ struct spraid_reset_cmd reset;
+ struct spraid_usr_cmd usr_cmd;
+ };
+};
+
+struct spraid_ioq_common_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_len;
+ __u8 rsvd2;
+ __le32 cdw3[3];
+ union spraid_data_ptr dptr;
+ __le32 cdw10[6];
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __le32 cdw26[6];
+};
+
+struct spraid_rw_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_len;
+ __u8 rsvd2;
+ __u32 rsvd3[3];
+ union spraid_data_ptr dptr;
+ __le64 slba;
+ __le16 nlb;
+ __le16 control;
+ __u32 rsvd13[3];
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __u32 rsvd26[6];
+};
+
+struct spraid_scsi_nonio {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_length;
+ __u8 rsvd2;
+ __u32 rsvd3[3];
+ union spraid_data_ptr dptr;
+ __u32 rsvd10[5];
+ __le32 buffer_len;
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __u32 rsvd26[6];
+};
+
+struct spraid_ioq_command {
+ union {
+ struct spraid_ioq_common_command common;
+ struct spraid_rw_command rw;
+ struct spraid_scsi_nonio scsi_nonio;
+ };
+};
+
+struct spraid_passthru_common_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 rsvd0;
+ __u32 nsid;
+ union {
+ struct {
+ __u16 subopcode;
+ __u16 rsvd1;
+ } info_0;
+ __u32 cdw2;
+ };
+ union {
+ struct {
+ __u16 data_len;
+ __u16 param_len;
+ } info_1;
+ __u32 cdw3;
+ };
+ __u64 metadata;
+
+ __u64 addr;
+ __u64 prp2;
+
+ __u32 cdw10;
+ __u32 cdw11;
+ __u32 cdw12;
+ __u32 cdw13;
+ __u32 cdw14;
+ __u32 cdw15;
+ __u32 timeout_ms;
+ __u32 result0;
+ __u32 result1;
+};
+
+struct spraid_ioq_passthru_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 rsvd0;
+ __u32 nsid;
+ union {
+ struct {
+ __u16 res_sense_len;
+ __u8 cdb_len;
+ __u8 rsvd0;
+ } info_0;
+ __u32 cdw2;
+ };
+ union {
+ struct {
+ __u16 subopcode;
+ __u16 rsvd1;
+ } info_1;
+ __u32 cdw3;
+ };
+ union {
+ struct {
+ __u16 rsvd;
+ __u16 param_len;
+ } info_2;
+ __u32 cdw4;
+ };
+ __u32 cdw5;
+ __u64 addr;
+ __u64 prp2;
+ union {
+ struct {
+ __u16 eid;
+ __u16 sid;
+ } info_3;
+ __u32 cdw10;
+ };
+ union {
+ struct {
+ __u16 did;
+ __u8 did_flag;
+ __u8 rsvd2;
+ } info_4;
+ __u32 cdw11;
+ };
+ __u32 cdw12;
+ __u32 cdw13;
+ __u32 cdw14;
+ __u32 data_len;
+ __u32 cdw16;
+ __u32 cdw17;
+ __u32 cdw18;
+ __u32 cdw19;
+ __u32 cdw20;
+ __u32 cdw21;
+ __u32 cdw22;
+ __u32 cdw23;
+ __u64 sense_addr;
+ __u32 cdw26[4];
+ __u32 timeout_ms;
+ __u32 result0;
+ __u32 result1;
+};
+
+struct spraid_bsg_request {
+ u32 msgcode;
+ u32 control;
+ union {
+ struct spraid_passthru_common_cmd admcmd;
+ struct spraid_ioq_passthru_cmd ioqcmd;
+ };
+};
+
+enum {
+ SPRAID_BSG_ADM,
+ SPRAID_BSG_IOQ,
+};
+
+struct spraid_cmd {
+ int qid;
+ int cid;
+ u32 result0;
+ u32 result1;
+ u16 status;
+ void *priv;
+ enum spraid_cmd_state state;
+ struct completion cmd_done;
+ struct list_head list;
+};
+
+struct spraid_queue {
+ struct spraid_dev *hdev;
+ spinlock_t sq_lock; /* spinlock for lock handling */
+
+ spinlock_t cq_lock ____cacheline_aligned_in_smp; /* spinlock for lock handling */
+
+ void *sq_cmds;
+
+ struct spraid_completion *cqes;
+
+ dma_addr_t sq_dma_addr;
+ dma_addr_t cq_dma_addr;
+ u32 __iomem *q_db;
+ u8 cq_phase;
+ u8 sqes;
+ u16 qid;
+ u16 sq_tail;
+ u16 cq_head;
+ u16 last_cq_head;
+ u16 q_depth;
+ s16 cq_vector;
+ void *sense;
+ dma_addr_t sense_dma_addr;
+ struct dma_pool *prp_small_pool;
+};
+
+struct spraid_iod {
+ struct spraid_queue *spraidq;
+ enum spraid_cmd_state state;
+ int npages;
+ u32 nsge;
+ u32 length;
+ bool use_sgl;
+ bool sg_drv_mgmt;
+ dma_addr_t first_dma;
+ void *sense;
+ dma_addr_t sense_dma;
+ struct scatterlist *sg;
+ struct scatterlist inline_sg[0];
+};
+
+#define SPRAID_DEV_INFO_ATTR_BOOT(attr) ((attr) & 0x01)
+#define SPRAID_DEV_INFO_ATTR_VD(attr) (((attr) & 0x02) == 0x0)
+#define SPRAID_DEV_INFO_ATTR_PT(attr) (((attr) & 0x22) == 0x02)
+#define SPRAID_DEV_INFO_ATTR_RAWDISK(attr) ((attr) & 0x20)
+
+#define SPRAID_DEV_INFO_FLAG_VALID(flag) ((flag) & 0x01)
+#define SPRAID_DEV_INFO_FLAG_CHANGE(flag) ((flag) & 0x02)
+
+struct spraid_dev_info {
+ __le32 hdid;
+ __le16 target;
+ __u8 channel;
+ __u8 lun;
+ __u8 attr;
+ __u8 flag;
+ __le16 max_io_kb;
+};
+
+#define MAX_DEV_ENTRY_PER_PAGE_4K 340
+struct spraid_dev_list {
+ __le32 dev_num;
+ __u32 rsvd0[3];
+ struct spraid_dev_info devices[MAX_DEV_ENTRY_PER_PAGE_4K];
+};
+
+struct spraid_sdev_hostdata {
+ u32 hdid;
+};
+
+#endif
diff --git a/drivers/scsi/spraid/spraid_main.c b/drivers/scsi/spraid/spraid_main.c
new file mode 100644
index 000000000000..7edce06b62a4
--- /dev/null
+++ b/drivers/scsi/spraid/spraid_main.c
@@ -0,0 +1,3266 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
+
+/* Ramaxel Raid SPXXX Series Linux Driver */
+
+#define pr_fmt(fmt) "spraid: " fmt
+
+#include <linux/sched/signal.h>
+#include <linux/pci.h>
+#include <linux/aer.h>
+#include <linux/module.h>
+#include <linux/ioport.h>
+#include <linux/device.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/cdev.h>
+#include <linux/sysfs.h>
+#include <linux/gfp.h>
+#include <linux/types.h>
+#include <linux/ratelimit.h>
+#include <linux/once.h>
+#include <linux/debugfs.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/blkdev.h>
+#include <linux/bsg-lib.h>
+#include <asm/unaligned.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_transport.h>
+#include <scsi/scsi_dbg.h>
+#include <scsi/sg.h>
+
+
+#include "spraid.h"
+
+#define SCMD_TMOUT_RAWDISK 180
+
+#define SCMD_TMOUT_VD 180
+
+static int ioq_depth_set(const char *val, const struct kernel_param *kp);
+static const struct kernel_param_ops ioq_depth_ops = {
+ .set = ioq_depth_set,
+ .get = param_get_uint,
+};
+
+static u32 io_queue_depth = 1024;
+module_param_cb(io_queue_depth, &ioq_depth_ops, &io_queue_depth, 0644);
+MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
+
+static int small_pool_num_set(const char *val, const struct kernel_param *kp)
+{
+ u8 n = 0;
+ int ret;
+
+ ret = kstrtou8(val, 10, &n);
+ if (ret != 0)
+ return -EINVAL;
+ if (n > MAX_SMALL_POOL_NUM)
+ n = MAX_SMALL_POOL_NUM;
+ if (n < 1)
+ n = 1;
+ *((u8 *)kp->arg) = n;
+
+ return 0;
+}
+
+static const struct kernel_param_ops small_pool_num_ops = {
+ .set = small_pool_num_set,
+ .get = param_get_byte,
+};
+
+/* It was found that the spinlock of a single pool is contended
+ * heavily by multiple CPUs. So multiple pools are introduced
+ * to reduce the contention.
+ */
+static unsigned char small_pool_num = 4;
+module_param_cb(small_pool_num, &small_pool_num_ops, &small_pool_num, 0644);
+MODULE_PARM_DESC(small_pool_num, "set prp small pool num, default 4, MAX 16");
+
+static void spraid_free_queue(struct spraid_queue *spraidq);
+static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result);
+static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result, u32 result1);
+
+static DEFINE_IDA(spraid_instance_ida);
+
+static struct class *spraid_class;
+
+#define SPRAID_CAP_TIMEOUT_UNIT_MS (HZ / 2)
+
+static struct workqueue_struct *spraid_wq;
+
+#define SPRAID_DRV_VERSION "1.0.0.0"
+
+#define ADMIN_TIMEOUT (60 * HZ)
+#define ADMIN_ERR_TIMEOUT 32757
+
+#define SPRAID_WAIT_ABNL_CMD_TIMEOUT (3 * 2)
+
+#define SPRAID_DMA_MSK_BIT_MAX 64
+
+enum FW_STAT_CODE {
+ FW_STAT_OK = 0,
+ FW_STAT_NEED_CHECK,
+ FW_STAT_ERROR,
+ FW_STAT_EP_PCIE_ERROR,
+ FW_STAT_NAC_DMA_ERROR,
+ FW_STAT_ABORTED,
+ FW_STAT_NEED_RETRY
+};
+
+static int ioq_depth_set(const char *val, const struct kernel_param *kp)
+{
+ int n = 0;
+ int ret;
+
+ ret = kstrtoint(val, 10, &n);
+ if (ret != 0 || n < 2)
+ return -EINVAL;
+
+ return param_set_int(val, kp);
+}
+
+static int spraid_remap_bar(struct spraid_dev *hdev, u32 size)
+{
+ struct pci_dev *pdev = hdev->pdev;
+
+ if (size > pci_resource_len(pdev, 0)) {
+ dev_err(hdev->dev, "Input size[%u] exceed bar0 length[%llu]\n",
+ size, pci_resource_len(pdev, 0));
+ return -ENOMEM;
+ }
+
+ if (hdev->bar)
+ iounmap(hdev->bar);
+
+ hdev->bar = ioremap(pci_resource_start(pdev, 0), size);
+ if (!hdev->bar) {
+ dev_err(hdev->dev, "ioremap for bar0 failed\n");
+ return -ENOMEM;
+ }
+ hdev->dbs = hdev->bar + SPRAID_REG_DBS;
+
+ return 0;
+}
+
+static int spraid_dev_map(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ int ret;
+
+ ret = pci_request_mem_regions(pdev, "spraid");
+ if (ret) {
+ dev_err(hdev->dev, "fail to request memory regions\n");
+ return ret;
+ }
+
+ ret = spraid_remap_bar(hdev, SPRAID_REG_DBS + 4096);
+ if (ret) {
+ pci_release_mem_regions(pdev);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void spraid_dev_unmap(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+
+ if (hdev->bar) {
+ iounmap(hdev->bar);
+ hdev->bar = NULL;
+ }
+ pci_release_mem_regions(pdev);
+}
+
+static int spraid_pci_enable(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ int ret = -ENOMEM;
+ u64 maskbit = SPRAID_DMA_MSK_BIT_MAX;
+
+ if (pci_enable_device_mem(pdev)) {
+ dev_err(hdev->dev, "Enable pci device memory resources failed\n");
+ return ret;
+ }
+ pci_set_master(pdev);
+
+ if (readl(hdev->bar + SPRAID_REG_CSTS) == U32_MAX) {
+ ret = -ENODEV;
+ dev_err(hdev->dev, "Read csts register failed\n");
+ goto disable;
+ }
+
+ ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
+ if (ret < 0) {
+ dev_err(hdev->dev, "Allocate one IRQ for setup admin channel failed\n");
+ goto disable;
+ }
+
+ hdev->cap = lo_hi_readq(hdev->bar + SPRAID_REG_CAP);
+ hdev->ioq_depth = min_t(u32, SPRAID_CAP_MQES(hdev->cap) + 1, io_queue_depth);
+ hdev->db_stride = 1 << SPRAID_CAP_STRIDE(hdev->cap);
+
+ maskbit = SPRAID_CAP_DMAMASK(hdev->cap);
+ if (maskbit < 32 || maskbit > SPRAID_DMA_MSK_BIT_MAX) {
+ dev_err(hdev->dev, "err, dma mask invalid[%llu], set to default\n", maskbit);
+ maskbit = SPRAID_DMA_MSK_BIT_MAX;
+ }
+ if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(maskbit))) {
+ dev_err(hdev->dev, "set dma mask and coherent failed\n");
+ goto disable;
+ }
+
+ dev_info(hdev->dev, "set dma mask[%llu] success\n", maskbit);
+
+ pci_enable_pcie_error_reporting(pdev);
+ pci_save_state(pdev);
+
+ return 0;
+
+disable:
+ pci_disable_device(pdev);
+ return ret;
+}
+
+static int spraid_npages_prp(u32 size, struct spraid_dev *hdev)
+{
+ u32 nprps = DIV_ROUND_UP(size + hdev->page_size, hdev->page_size);
+
+ return DIV_ROUND_UP(PRP_ENTRY_SIZE * nprps, PAGE_SIZE - PRP_ENTRY_SIZE);
+}
+
+static int spraid_npages_sgl(u32 nseg)
+{
+ return DIV_ROUND_UP(nseg * sizeof(struct spraid_sgl_desc), PAGE_SIZE);
+}
+
+static void **spraid_iod_list(struct spraid_iod *iod)
+{
+ return (void **)(iod->inline_sg + (iod->sg_drv_mgmt ? iod->nsge : 0));
+}
+
+static u32 spraid_iod_ext_size(struct spraid_dev *hdev, u32 size, u32 nsge,
+ bool sg_drv_mgmt, bool use_sgl)
+{
+ size_t alloc_size, sg_size;
+
+ if (use_sgl)
+ alloc_size = sizeof(__le64 *) * spraid_npages_sgl(nsge);
+ else
+ alloc_size = sizeof(__le64 *) * spraid_npages_prp(size, hdev);
+
+ sg_size = sg_drv_mgmt ? (sizeof(struct scatterlist) * nsge) : 0;
+ return sg_size + alloc_size;
+}
+
+static u32 spraid_cmd_size(struct spraid_dev *hdev, bool sg_drv_mgmt, bool use_sgl)
+{
+ u32 alloc_size = spraid_iod_ext_size(hdev, SPRAID_INT_BYTES(hdev),
+ SPRAID_INT_PAGES, sg_drv_mgmt, use_sgl);
+
+ dev_info(hdev->dev, "sg_drv_mgmt: %s, use_sgl: %s, iod size: %lu, alloc_size: %u\n",
+ sg_drv_mgmt ? "true" : "false", use_sgl ? "true" : "false",
+ sizeof(struct spraid_iod), alloc_size);
+
+ return sizeof(struct spraid_iod) + alloc_size;
+}
+
+static int spraid_setup_prps(struct spraid_dev *hdev, struct spraid_iod *iod)
+{
+ struct scatterlist *sg = iod->sg;
+ u64 dma_addr = sg_dma_address(sg);
+ int dma_len = sg_dma_len(sg);
+ __le64 *prp_list, *old_prp_list;
+ u32 page_size = hdev->page_size;
+ int offset = dma_addr & (page_size - 1);
+ void **list = spraid_iod_list(iod);
+ int length = iod->length;
+ struct dma_pool *pool;
+ dma_addr_t prp_dma;
+ int nprps, i;
+
+ length -= (page_size - offset);
+ if (length <= 0) {
+ iod->first_dma = 0;
+ return 0;
+ }
+
+ dma_len -= (page_size - offset);
+ if (dma_len) {
+ dma_addr += (page_size - offset);
+ } else {
+ sg = sg_next(sg);
+ dma_addr = sg_dma_address(sg);
+ dma_len = sg_dma_len(sg);
+ }
+
+ if (length <= page_size) {
+ iod->first_dma = dma_addr;
+ return 0;
+ }
+
+ nprps = DIV_ROUND_UP(length, page_size);
+ if (nprps <= (SMALL_POOL_SIZE / PRP_ENTRY_SIZE)) {
+ pool = iod->spraidq->prp_small_pool;
+ iod->npages = 0;
+ } else {
+ pool = hdev->prp_page_pool;
+ iod->npages = 1;
+ }
+
+ prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+ if (!prp_list) {
+ dev_err_ratelimited(hdev->dev, "Allocate first prp_list memory failed\n");
+ iod->first_dma = dma_addr;
+ iod->npages = -1;
+ return -ENOMEM;
+ }
+ list[0] = prp_list;
+ iod->first_dma = prp_dma;
+ i = 0;
+ for (;;) {
+ if (i == page_size / PRP_ENTRY_SIZE) {
+ old_prp_list = prp_list;
+
+ prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+ if (!prp_list) {
+ dev_err_ratelimited(hdev->dev, "Allocate %dth prp_list memory failed\n",
+ iod->npages + 1);
+ return -ENOMEM;
+ }
+ list[iod->npages++] = prp_list;
+ prp_list[0] = old_prp_list[i - 1];
+ old_prp_list[i - 1] = cpu_to_le64(prp_dma);
+ i = 1;
+ }
+ prp_list[i++] = cpu_to_le64(dma_addr);
+ dma_len -= page_size;
+ dma_addr += page_size;
+ length -= page_size;
+ if (length <= 0)
+ break;
+ if (dma_len > 0)
+ continue;
+ if (unlikely(dma_len < 0))
+ goto bad_sgl;
+ sg = sg_next(sg);
+ dma_addr = sg_dma_address(sg);
+ dma_len = sg_dma_len(sg);
+ }
+
+ return 0;
+
+bad_sgl:
+ dev_err(hdev->dev, "Setup prps, invalid SGL for payload: %d nents: %d\n",
+ iod->length, iod->nsge);
+ return -EIO;
+}
+
+#define SGES_PER_PAGE (PAGE_SIZE / sizeof(struct spraid_sgl_desc))
+
+static void spraid_submit_cmd(struct spraid_queue *spraidq, const void *cmd)
+{
+ u32 sqes = SQE_SIZE(spraidq->qid);
+ unsigned long flags;
+ struct spraid_admin_common_command *acd = (struct spraid_admin_common_command *)cmd;
+
+ spin_lock_irqsave(&spraidq->sq_lock, flags);
+ memcpy((spraidq->sq_cmds + sqes * spraidq->sq_tail), cmd, sqes);
+ if (++spraidq->sq_tail == spraidq->q_depth)
+ spraidq->sq_tail = 0;
+
+ writel(spraidq->sq_tail, spraidq->q_db);
+ spin_unlock_irqrestore(&spraidq->sq_lock, flags);
+
+ pr_debug("cid[%d], qid[%d], opcode[0x%x], flags[0x%x], hdid[%u]\n",
+ acd->command_id, spraidq->qid, acd->opcode, acd->flags, le32_to_cpu(acd->hdid));
+}
+
+static u32 spraid_mod64(u64 dividend, u32 divisor)
+{
+ u64 d;
+ u32 remainder;
+
+ if (!divisor)
+ pr_err("DIVISOR is zero, in div fn\n");
+
+ d = dividend;
+ remainder = do_div(d, divisor);
+ return remainder;
+}
+
+static inline bool spraid_is_rw_scmd(struct scsi_cmnd *scmd)
+{
+ switch (scmd->cmnd[0]) {
+ case READ_6:
+ case READ_10:
+ case READ_12:
+ case READ_16:
+ case READ_32:
+ case WRITE_6:
+ case WRITE_10:
+ case WRITE_12:
+ case WRITE_16:
+ case WRITE_32:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static bool spraid_is_prp(struct spraid_dev *hdev, struct scsi_cmnd *scmd, u32 nsge)
+{
+ struct scatterlist *sg = scsi_sglist(scmd);
+ u32 page_size = hdev->page_size;
+ bool is_prp = true;
+ int i = 0;
+
+ scsi_for_each_sg(scmd, sg, nsge, i) {
+ if (i != 0 && i != nsge - 1) {
+ if (spraid_mod64(sg_dma_len(sg), page_size) ||
+ spraid_mod64(sg_dma_address(sg), page_size)) {
+ is_prp = false;
+ break;
+ }
+ }
+
+ if (nsge > 1 && i == 0) {
+ if ((spraid_mod64((sg_dma_address(sg) + sg_dma_len(sg)), page_size))) {
+ is_prp = false;
+ break;
+ }
+ }
+
+ if (nsge > 1 && i == (nsge - 1)) {
+ if (spraid_mod64(sg_dma_address(sg), page_size)) {
+ is_prp = false;
+ break;
+ }
+ }
+ }
+
+ return is_prp;
+}
+
+enum {
+ SPRAID_SGL_FMT_DATA_DESC = 0x00,
+ SPRAID_SGL_FMT_SEG_DESC = 0x02,
+ SPRAID_SGL_FMT_LAST_SEG_DESC = 0x03,
+ SPRAID_KEY_SGL_FMT_DATA_DESC = 0x04,
+ SPRAID_TRANSPORT_SGL_DATA_DESC = 0x05
+};
+
+static void spraid_sgl_set_data(struct spraid_sgl_desc *sge, struct scatterlist *sg)
+{
+ sge->addr = cpu_to_le64(sg_dma_address(sg));
+ sge->length = cpu_to_le32(sg_dma_len(sg));
+ sge->type = SPRAID_SGL_FMT_DATA_DESC << 4;
+}
+
+static void spraid_sgl_set_seg(struct spraid_sgl_desc *sge, dma_addr_t dma_addr, int entries)
+{
+ sge->addr = cpu_to_le64(dma_addr);
+ if (entries <= SGES_PER_PAGE) {
+ sge->length = cpu_to_le32(entries * sizeof(*sge));
+ sge->type = SPRAID_SGL_FMT_LAST_SEG_DESC << 4;
+ } else {
+ sge->length = cpu_to_le32(PAGE_SIZE);
+ sge->type = SPRAID_SGL_FMT_SEG_DESC << 4;
+ }
+}
+
+static int spraid_setup_ioq_cmd_sgl(struct spraid_dev *hdev,
+ struct scsi_cmnd *scmd, struct spraid_ioq_command *ioq_cmd,
+ struct spraid_iod *iod)
+{
+ struct spraid_sgl_desc *sg_list, *link, *old_sg_list;
+ struct scatterlist *sg = scsi_sglist(scmd);
+ void **list = spraid_iod_list(iod);
+ struct dma_pool *pool;
+ int nsge = iod->nsge;
+ dma_addr_t sgl_dma;
+ int i = 0;
+
+ ioq_cmd->common.flags |= SPRAID_CMD_FLAG_SGL_METABUF;
+
+ if (nsge == 1) {
+ spraid_sgl_set_data(&ioq_cmd->common.dptr.sgl, sg);
+ return 0;
+ }
+
+ if (nsge <= (SMALL_POOL_SIZE / sizeof(struct spraid_sgl_desc))) {
+ pool = iod->spraidq->prp_small_pool;
+ iod->npages = 0;
+ } else {
+ pool = hdev->prp_page_pool;
+ iod->npages = 1;
+ }
+
+ sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
+ if (!sg_list) {
+ dev_err_ratelimited(hdev->dev, "Allocate first sgl_list failed\n");
+ iod->npages = -1;
+ return -ENOMEM;
+ }
+
+ list[0] = sg_list;
+ iod->first_dma = sgl_dma;
+ spraid_sgl_set_seg(&ioq_cmd->common.dptr.sgl, sgl_dma, nsge);
+ do {
+ if (i == SGES_PER_PAGE) {
+ old_sg_list = sg_list;
+ link = &old_sg_list[SGES_PER_PAGE - 1];
+
+ sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
+ if (!sg_list) {
+ dev_err_ratelimited(hdev->dev, "Allocate %dth sgl_list failed\n",
+ iod->npages + 1);
+ return -ENOMEM;
+ }
+ list[iod->npages++] = sg_list;
+
+ i = 0;
+ memcpy(&sg_list[i++], link, sizeof(*link));
+ spraid_sgl_set_seg(link, sgl_dma, nsge);
+ }
+
+ spraid_sgl_set_data(&sg_list[i++], sg);
+ sg = sg_next(sg);
+ } while (--nsge > 0);
+
+ return 0;
+}
+
+#define SPRAID_RW_FUA BIT(14)
+
+static void spraid_setup_rw_cmd(struct spraid_dev *hdev,
+ struct spraid_rw_command *rw,
+ struct scsi_cmnd *scmd)
+{
+ u32 start_lba_lo, start_lba_hi;
+ u32 datalength = 0;
+ u16 control = 0;
+
+ start_lba_lo = 0;
+ start_lba_hi = 0;
+
+ if (scmd->sc_data_direction == DMA_TO_DEVICE) {
+ rw->opcode = SPRAID_CMD_WRITE;
+ } else if (scmd->sc_data_direction == DMA_FROM_DEVICE) {
+ rw->opcode = SPRAID_CMD_READ;
+ } else {
+ dev_err(hdev->dev, "Invalid IO for unsupported data direction: %d\n",
+ scmd->sc_data_direction);
+ WARN_ON(1);
+ }
+
+ /* 6-byte READ(0x08) or WRITE(0x0A) cdb */
+ if (scmd->cmd_len == 6) {
+ datalength = (u32)(scmd->cmnd[4] == 0 ?
+ IO_6_DEFAULT_TX_LEN : scmd->cmnd[4]);
+ start_lba_lo = (u32)get_unaligned_be24(&scmd->cmnd[1]);
+
+ start_lba_lo &= 0x1FFFFF;
+ }
+
+ /* 10-byte READ(0x28) or WRITE(0x2A) cdb */
+ else if (scmd->cmd_len == 10) {
+ datalength = (u32)get_unaligned_be16(&scmd->cmnd[7]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+
+ /* 12-byte READ(0xA8) or WRITE(0xAA) cdb */
+ else if (scmd->cmd_len == 12) {
+ datalength = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+ /* 16-byte READ(0x88) or WRITE(0x8A) cdb */
+ else if (scmd->cmd_len == 16) {
+ datalength = get_unaligned_be32(&scmd->cmnd[10]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+ /* 32-byte READ(32) or WRITE(32) cdb */
+ else if (scmd->cmd_len == 32) {
+ datalength = get_unaligned_be32(&scmd->cmnd[28]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[16]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[12]);
+
+ if (scmd->cmnd[10] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+
+ if (unlikely(datalength > U16_MAX || datalength == 0)) {
+ dev_err(hdev->dev, "Invalid IO for illegal transfer data length: %u\n",
+ datalength);
+ WARN_ON(1);
+ }
+
+ rw->slba = cpu_to_le64(((u64)start_lba_hi << 32) | start_lba_lo);
+ /* 0base for nlb */
+ rw->nlb = cpu_to_le16((u16)(datalength - 1));
+ rw->control = cpu_to_le16(control);
+}
+
+static void spraid_setup_nonio_cmd(struct spraid_dev *hdev,
+ struct spraid_scsi_nonio *scsi_nonio, struct scsi_cmnd *scmd)
+{
+ scsi_nonio->buffer_len = cpu_to_le32(scsi_bufflen(scmd));
+
+ switch (scmd->sc_data_direction) {
+ case DMA_NONE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_NONE;
+ break;
+ case DMA_TO_DEVICE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_TODEV;
+ break;
+ case DMA_FROM_DEVICE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_FROMDEV;
+ break;
+ default:
+ dev_err(hdev->dev, "Invalid IO for unsupported data direction: %d\n",
+ scmd->sc_data_direction);
+ WARN_ON(1);
+ }
+}
+
+static void spraid_setup_ioq_cmd(struct spraid_dev *hdev,
+ struct spraid_ioq_command *ioq_cmd, struct scsi_cmnd *scmd)
+{
+ memcpy(ioq_cmd->common.cdb, scmd->cmnd, scmd->cmd_len);
+ ioq_cmd->common.cdb_len = scmd->cmd_len;
+
+ if (spraid_is_rw_scmd(scmd))
+ spraid_setup_rw_cmd(hdev, &ioq_cmd->rw, scmd);
+ else
+ spraid_setup_nonio_cmd(hdev, &ioq_cmd->scsi_nonio, scmd);
+}
+
+static int spraid_init_iod(struct spraid_dev *hdev,
+ struct spraid_iod *iod, struct spraid_ioq_command *ioq_cmd,
+ struct scsi_cmnd *scmd)
+{
+ if (unlikely(!iod->sense)) {
+ dev_err(hdev->dev, "Allocate sense data buffer failed\n");
+ return -ENOMEM;
+ }
+ ioq_cmd->common.sense_addr = cpu_to_le64(iod->sense_dma);
+ ioq_cmd->common.sense_len = cpu_to_le16(SCSI_SENSE_BUFFERSIZE);
+
+ iod->nsge = 0;
+ iod->npages = -1;
+ iod->use_sgl = 0;
+ iod->sg_drv_mgmt = false;
+ WRITE_ONCE(iod->state, SPRAID_CMD_IDLE);
+
+ return 0;
+}
+
+static void spraid_free_iod_res(struct spraid_dev *hdev, struct spraid_iod *iod)
+{
+ const int last_prp = hdev->page_size / sizeof(__le64) - 1;
+ dma_addr_t dma_addr, next_dma_addr;
+ struct spraid_sgl_desc *sg_list;
+ __le64 *prp_list;
+ void *addr;
+ int i;
+
+ dma_addr = iod->first_dma;
+ if (iod->npages == 0)
+ dma_pool_free(iod->spraidq->prp_small_pool, spraid_iod_list(iod)[0], dma_addr);
+
+ for (i = 0; i < iod->npages; i++) {
+ addr = spraid_iod_list(iod)[i];
+
+ if (iod->use_sgl) {
+ sg_list = addr;
+ next_dma_addr =
+ le64_to_cpu((sg_list[SGES_PER_PAGE - 1]).addr);
+ } else {
+ prp_list = addr;
+ next_dma_addr = le64_to_cpu(prp_list[last_prp]);
+ }
+
+ dma_pool_free(hdev->prp_page_pool, addr, dma_addr);
+ dma_addr = next_dma_addr;
+ }
+
+ if (iod->sg_drv_mgmt && iod->sg != iod->inline_sg) {
+ iod->sg_drv_mgmt = false;
+ mempool_free(iod->sg, hdev->iod_mempool);
+ }
+
+ iod->sense = NULL;
+ iod->npages = -1;
+}
+
+static int spraid_io_map_data(struct spraid_dev *hdev, struct spraid_iod *iod,
+ struct scsi_cmnd *scmd, struct spraid_ioq_command *ioq_cmd)
+{
+ int ret;
+
+ iod->nsge = scsi_dma_map(scmd);
+
+ /* No data to DMA; it may be a non-rw SCSI command */
+ if (unlikely(iod->nsge == 0))
+ return 0;
+
+ iod->length = scsi_bufflen(scmd);
+ iod->sg = scsi_sglist(scmd);
+ iod->use_sgl = !spraid_is_prp(hdev, scmd, iod->nsge);
+
+ if (iod->use_sgl) {
+ ret = spraid_setup_ioq_cmd_sgl(hdev, scmd, ioq_cmd, iod);
+ } else {
+ ret = spraid_setup_prps(hdev, iod);
+ ioq_cmd->common.dptr.prp1 =
+ cpu_to_le64(sg_dma_address(iod->sg));
+ ioq_cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
+ }
+
+ if (ret)
+ scsi_dma_unmap(scmd);
+
+ return ret;
+}
+
+static void spraid_map_status(struct spraid_iod *iod, struct scsi_cmnd *scmd,
+ struct spraid_completion *cqe)
+{
+ scsi_set_resid(scmd, 0);
+
+ switch ((le16_to_cpu(cqe->status) >> 1) & 0x7f) {
+ case FW_STAT_OK:
+ set_host_byte(scmd, DID_OK);
+ break;
+ case FW_STAT_NEED_CHECK:
+ set_host_byte(scmd, DID_OK);
+ scmd->result |= le16_to_cpu(cqe->status) >> 8;
+ if (scmd->result & SAM_STAT_CHECK_CONDITION) {
+ memset(scmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
+ memcpy(scmd->sense_buffer, iod->sense, SCSI_SENSE_BUFFERSIZE);
+ scmd->result = (scmd->result & 0x00ffffff) | (DRIVER_SENSE << 24);
+ }
+ break;
+ case FW_STAT_ABORTED:
+ set_host_byte(scmd, DID_ABORT);
+ break;
+ case FW_STAT_NEED_RETRY:
+ set_host_byte(scmd, DID_REQUEUE);
+ break;
+ default:
+ set_host_byte(scmd, DID_BAD_TARGET);
+ break;
+ }
+}
+
+static inline void spraid_get_tag_from_scmd(struct scsi_cmnd *scmd, u16 *qid, u16 *cid)
+{
+ u32 tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd));
+
+ *qid = blk_mq_unique_tag_to_hwq(tag) + 1;
+ *cid = blk_mq_unique_tag_to_tag(tag);
+}
+
+static int spraid_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
+{
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_dev *hdev = shost_priv(shost);
+ struct scsi_device *sdev = scmd->device;
+ struct spraid_sdev_hostdata *hostdata;
+ struct spraid_ioq_command ioq_cmd;
+ struct spraid_queue *ioq;
+ unsigned long elapsed;
+ u16 hwq, cid;
+ int ret;
+
+ if (unlikely(!scmd)) {
+ dev_err(hdev->dev, "err, scmd is null, return 0\n");
+ return 0;
+ }
+
+ if (unlikely(hdev->state != SPRAID_LIVE)) {
+ set_host_byte(scmd, DID_NO_CONNECT);
+ scsi_done(scmd);
+ dev_err(hdev->dev, "[%s] err, hdev state is not live\n", __func__);
+ return 0;
+ }
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ hostdata = sdev->hostdata;
+ ioq = &hdev->queues[hwq];
+ memset(&ioq_cmd, 0, sizeof(ioq_cmd));
+ ioq_cmd.rw.hdid = cpu_to_le32(hostdata->hdid);
+ ioq_cmd.rw.command_id = cid;
+
+ spraid_setup_ioq_cmd(hdev, &ioq_cmd, scmd);
+
+ ret = cid * SCSI_SENSE_BUFFERSIZE;
+ iod->sense = ioq->sense + ret;
+ iod->sense_dma = ioq->sense_dma_addr + ret;
+
+ ret = spraid_init_iod(hdev, iod, &ioq_cmd, scmd);
+ if (unlikely(ret))
+ return SCSI_MLQUEUE_HOST_BUSY;
+
+ iod->spraidq = ioq;
+ ret = spraid_io_map_data(hdev, iod, scmd, &ioq_cmd);
+ if (unlikely(ret)) {
+ dev_err(hdev->dev, "spraid_io_map_data Err.\n");
+ set_host_byte(scmd, DID_ERROR);
+ scsi_done(scmd);
+ ret = 0;
+ goto deinit_iod;
+ }
+
+ WRITE_ONCE(iod->state, SPRAID_CMD_IN_FLIGHT);
+ spraid_submit_cmd(ioq, &ioq_cmd);
+ elapsed = jiffies - scmd->jiffies_at_alloc;
+ pr_debug("cid[%d], qid[%d] submit IO cost %3ld.%3ld seconds\n",
+ cid, hwq, elapsed / HZ, elapsed % HZ);
+ return 0;
+
+deinit_iod:
+ spraid_free_iod_res(hdev, iod);
+ return ret;
+}
+
+static int spraid_match_dev(struct spraid_dev *hdev, u16 idx, struct scsi_device *sdev)
+{
+ if (SPRAID_DEV_INFO_FLAG_VALID(hdev->devices[idx].flag)) {
+ if (sdev->channel == hdev->devices[idx].channel &&
+ sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
+ sdev->lun < hdev->devices[idx].lun) {
+ dev_info(hdev->dev, "Match device success, channel:target:lun[%d:%d:%d]\n",
+ hdev->devices[idx].channel,
+ hdev->devices[idx].target,
+ hdev->devices[idx].lun);
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+static int spraid_slave_alloc(struct scsi_device *sdev)
+{
+ struct spraid_sdev_hostdata *hostdata;
+ struct spraid_dev *hdev;
+ u16 idx;
+
+ hdev = shost_priv(sdev->host);
+ hostdata = kzalloc(sizeof(*hostdata), GFP_KERNEL);
+ if (!hostdata) {
+ dev_err(hdev->dev, "Alloc scsi host data memory failed\n");
+ return -ENOMEM;
+ }
+
+ down_read(&hdev->devices_rwsem);
+ for (idx = 0; idx < le32_to_cpu(hdev->ctrl_info->nd); idx++) {
+ if (spraid_match_dev(hdev, idx, sdev))
+ goto scan_host;
+ }
+ up_read(&hdev->devices_rwsem);
+
+ kfree(hostdata);
+ return -ENXIO;
+
+scan_host:
+ hostdata->hdid = le32_to_cpu(hdev->devices[idx].hdid);
+ sdev->hostdata = hostdata;
+ up_read(&hdev->devices_rwsem);
+ return 0;
+}
+
+static void spraid_slave_destroy(struct scsi_device *sdev)
+{
+ kfree(sdev->hostdata);
+ sdev->hostdata = NULL;
+}
+
+static int spraid_slave_configure(struct scsi_device *sdev)
+{
+ u16 idx;
+ unsigned int timeout = SCMD_TMOUT_RAWDISK * HZ;
+ struct spraid_dev *hdev = shost_priv(sdev->host);
+ struct spraid_sdev_hostdata *hostdata = sdev->hostdata;
+ u32 max_sec = sdev->host->max_sectors;
+
+ if (hostdata) {
+ idx = hostdata->hdid - 1;
+ if (sdev->channel == hdev->devices[idx].channel &&
+ sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
+ sdev->lun < hdev->devices[idx].lun) {
+ if (SPRAID_DEV_INFO_ATTR_VD(hdev->devices[idx].attr))
+ timeout = SCMD_TMOUT_VD * HZ;
+ else if (SPRAID_DEV_INFO_ATTR_RAWDISK(hdev->devices[idx].attr))
+ timeout = SCMD_TMOUT_RAWDISK * HZ;
+ max_sec = le16_to_cpu(hdev->devices[idx].max_io_kb) << 1;
+ } else {
+ dev_err(hdev->dev, "[%s] err, sdev->channel:id:lun[%d:%d:%lld];"
+ "devices[%d], channel:target:lun[%d:%d:%d]\n",
+ __func__, sdev->channel, sdev->id, sdev->lun,
+ idx, hdev->devices[idx].channel,
+ hdev->devices[idx].target,
+ hdev->devices[idx].lun);
+ }
+ } else {
+ dev_err(hdev->dev, "[%s] err, sdev->hostdata is null\n", __func__);
+ }
+
+ blk_queue_rq_timeout(sdev->request_queue, timeout);
+ sdev->eh_timeout = timeout;
+
+ if ((max_sec == 0) || (max_sec > sdev->host->max_sectors))
+ max_sec = sdev->host->max_sectors;
+ blk_queue_max_hw_sectors(sdev->request_queue, max_sec);
+
+ dev_info(hdev->dev, "[%s] sdev->channel:id:lun[%d:%d:%lld], scmd_timeout[%d]s, maxsec[%d]\n",
+ __func__, sdev->channel, sdev->id, sdev->lun, timeout / HZ, max_sec);
+
+ return 0;
+}
+
+static void spraid_shost_init(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ u8 domain, bus;
+ u32 dev_func;
+
+ domain = pci_domain_nr(pdev->bus);
+ bus = pdev->bus->number;
+ dev_func = pdev->devfn;
+
+ hdev->shost->nr_hw_queues = hdev->online_queues - 1;
+ hdev->shost->can_queue = (hdev->ioq_depth - SPRAID_PTCMDS_PERQ);
+
+ hdev->shost->sg_tablesize = le16_to_cpu(hdev->ctrl_info->max_num_sge);
+ /* 512B per sector */
+ hdev->shost->max_sectors = (1U << ((hdev->ctrl_info->mdts) * 1U) << 12) / 512;
+ hdev->shost->cmd_per_lun = MAX_CMD_PER_DEV;
+ hdev->shost->max_channel = le16_to_cpu(hdev->ctrl_info->max_channel) - 1;
+ hdev->shost->max_id = le32_to_cpu(hdev->ctrl_info->max_tgt_id);
+ hdev->shost->max_lun = le16_to_cpu(hdev->ctrl_info->max_lun);
+
+ hdev->shost->this_id = -1;
+ hdev->shost->unique_id = (domain << 16) | (bus << 8) | dev_func;
+ hdev->shost->max_cmd_len = MAX_CDB_LEN;
+ hdev->shost->hostt->cmd_size = max(spraid_cmd_size(hdev, false, true),
+ spraid_cmd_size(hdev, false, false));
+}
+
+static inline void spraid_host_deinit(struct spraid_dev *hdev)
+{
+ ida_free(&spraid_instance_ida, hdev->instance);
+}
+
+static int spraid_alloc_queue(struct spraid_dev *hdev, u16 qid, u16 depth)
+{
+ struct spraid_queue *spraidq = &hdev->queues[qid];
+ int ret = 0;
+
+ if (hdev->queue_count > qid) {
+ dev_info(hdev->dev, "[%s] warn: queue[%d] is exist\n", __func__, qid);
+ return 0;
+ }
+
+ spraidq->cqes = dma_alloc_coherent(hdev->dev, CQ_SIZE(depth),
+ &spraidq->cq_dma_addr, GFP_KERNEL | __GFP_ZERO);
+ if (!spraidq->cqes)
+ return -ENOMEM;
+
+ spraidq->sq_cmds = dma_alloc_coherent(hdev->dev, SQ_SIZE(qid, depth),
+ &spraidq->sq_dma_addr, GFP_KERNEL);
+ if (!spraidq->sq_cmds) {
+ ret = -ENOMEM;
+ goto free_cqes;
+ }
+
+ spin_lock_init(&spraidq->sq_lock);
+ spin_lock_init(&spraidq->cq_lock);
+ spraidq->hdev = hdev;
+ spraidq->q_depth = depth;
+ spraidq->qid = qid;
+ spraidq->cq_vector = -1;
+ hdev->queue_count++;
+
+ /* alloc sense buffer */
+ spraidq->sense = dma_alloc_coherent(hdev->dev, SENSE_SIZE(depth),
+ &spraidq->sense_dma_addr, GFP_KERNEL | __GFP_ZERO);
+ if (!spraidq->sense) {
+ ret = -ENOMEM;
+ goto free_sq_cmds;
+ }
+
+ return 0;
+
+free_sq_cmds:
+ dma_free_coherent(hdev->dev, SQ_SIZE(qid, depth), (void *)spraidq->sq_cmds,
+ spraidq->sq_dma_addr);
+free_cqes:
+ dma_free_coherent(hdev->dev, CQ_SIZE(depth), (void *)spraidq->cqes,
+ spraidq->cq_dma_addr);
+ return ret;
+}
+
+static int spraid_wait_ready(struct spraid_dev *hdev, u64 cap, bool enabled)
+{
+ unsigned long timeout =
+ ((SPRAID_CAP_TIMEOUT(cap) + 1) * SPRAID_CAP_TIMEOUT_UNIT_MS) + jiffies;
+ u32 bit = enabled ? SPRAID_CSTS_RDY : 0;
+
+ while ((readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_RDY) != bit) {
+ usleep_range(1000, 2000);
+ if (fatal_signal_pending(current))
+ return -EINTR;
+
+ if (time_after(jiffies, timeout)) {
+ dev_err(hdev->dev, "Device not ready; aborting %s\n",
+ enabled ? "initialisation" : "reset");
+ return -ENODEV;
+ }
+ }
+ return 0;
+}
+
+static int spraid_shutdown_ctrl(struct spraid_dev *hdev)
+{
+ unsigned long timeout = hdev->ctrl_info->rtd3e + jiffies;
+
+ hdev->ctrl_config &= ~SPRAID_CC_SHN_MASK;
+ hdev->ctrl_config |= SPRAID_CC_SHN_NORMAL;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ while ((readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_SHST_MASK) !=
+ SPRAID_CSTS_SHST_CMPLT) {
+ msleep(100);
+ if (fatal_signal_pending(current))
+ return -EINTR;
+ if (time_after(jiffies, timeout)) {
+ dev_err(hdev->dev, "Device shutdown incomplete; abort shutdown\n");
+ return -ENODEV;
+ }
+ }
+ return 0;
+}
+
+static int spraid_disable_ctrl(struct spraid_dev *hdev)
+{
+ hdev->ctrl_config &= ~SPRAID_CC_SHN_MASK;
+ hdev->ctrl_config &= ~SPRAID_CC_ENABLE;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ return spraid_wait_ready(hdev, hdev->cap, false);
+}
+
+static int spraid_enable_ctrl(struct spraid_dev *hdev)
+{
+ u64 cap = hdev->cap;
+ u32 dev_page_min = SPRAID_CAP_MPSMIN(cap) + 12;
+ u32 page_shift = PAGE_SHIFT;
+
+ if (page_shift < dev_page_min) {
+ dev_err(hdev->dev, "Minimum device page size[%u], too large for host[%u]\n",
+ 1U << dev_page_min, 1U << page_shift);
+ return -ENODEV;
+ }
+
+ page_shift = min_t(unsigned int, SPRAID_CAP_MPSMAX(cap) + 12, PAGE_SHIFT);
+ hdev->page_size = 1U << page_shift;
+
+ hdev->ctrl_config = SPRAID_CC_CSS_NVM;
+ hdev->ctrl_config |= (page_shift - 12) << SPRAID_CC_MPS_SHIFT;
+ hdev->ctrl_config |= SPRAID_CC_AMS_RR | SPRAID_CC_SHN_NONE;
+ hdev->ctrl_config |= SPRAID_CC_IOSQES | SPRAID_CC_IOCQES;
+ hdev->ctrl_config |= SPRAID_CC_ENABLE;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ return spraid_wait_ready(hdev, cap, true);
+}
+
+static void spraid_init_queue(struct spraid_queue *spraidq, u16 qid)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ memset((void *)spraidq->cqes, 0, CQ_SIZE(spraidq->q_depth));
+
+ spraidq->sq_tail = 0;
+ spraidq->cq_head = 0;
+ spraidq->cq_phase = 1;
+ spraidq->q_db = &hdev->dbs[qid * 2 * hdev->db_stride];
+ spraidq->prp_small_pool = hdev->prp_small_pool[qid % small_pool_num];
+ hdev->online_queues++;
+}
+
+static inline bool spraid_cqe_pending(struct spraid_queue *spraidq)
+{
+ return (le16_to_cpu(spraidq->cqes[spraidq->cq_head].status) & 1) ==
+ spraidq->cq_phase;
+}
+
+static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq, struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct blk_mq_tags *tags;
+ struct scsi_cmnd *scmd;
+ struct spraid_iod *iod;
+ struct request *req;
+ unsigned long elapsed;
+
+ tags = hdev->shost->tag_set.tags[ioq->qid - 1];
+ req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
+ if (unlikely(!req || !blk_mq_request_started(req))) {
+ dev_warn(hdev->dev, "Invalid id %d completed on queue %d\n",
+ cqe->cmd_id, ioq->qid);
+ return;
+ }
+
+ scmd = blk_mq_rq_to_pdu(req);
+ iod = scsi_cmd_priv(scmd);
+
+ elapsed = jiffies - scmd->jiffies_at_alloc;
+ pr_debug("cid[%d], qid[%d] finish IO cost %3ld.%3ld seconds\n",
+ cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
+
+ if (cmpxchg(&iod->state, SPRAID_CMD_IN_FLIGHT, SPRAID_CMD_COMPLETE) !=
+ SPRAID_CMD_IN_FLIGHT) {
+ dev_warn(hdev->dev, "cid[%d], qid[%d] enters abnormal handler, cost %3ld.%3ld seconds\n",
+ cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
+ WRITE_ONCE(iod->state, SPRAID_CMD_TMO_COMPLETE);
+
+ if (iod->nsge) {
+ iod->nsge = 0;
+ scsi_dma_unmap(scmd);
+ }
+ spraid_free_iod_res(hdev, iod);
+
+ return;
+ }
+
+ spraid_map_status(iod, scmd, cqe);
+ if (iod->nsge) {
+ iod->nsge = 0;
+ scsi_dma_unmap(scmd);
+ }
+ spraid_free_iod_res(hdev, iod);
+ scsi_done(scmd);
+}
+
+
+static void spraid_complete_adminq_cmnd(struct spraid_queue *adminq, struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = adminq->hdev;
+ struct spraid_cmd *adm_cmd;
+
+ adm_cmd = hdev->adm_cmds + cqe->cmd_id;
+ if (unlikely(adm_cmd->state == SPRAID_CMD_IDLE)) {
+ dev_warn(adminq->hdev->dev, "Invalid id %d completed on queue %d\n",
+ cqe->cmd_id, le16_to_cpu(cqe->sq_id));
+ return;
+ }
+
+ adm_cmd->status = le16_to_cpu(cqe->status) >> 1;
+ adm_cmd->result0 = le32_to_cpu(cqe->result);
+ adm_cmd->result1 = le32_to_cpu(cqe->result1);
+
+ complete(&adm_cmd->cmd_done);
+}
+
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid);
+
+static void spraid_complete_aen(struct spraid_queue *spraidq, struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+ u32 result = le32_to_cpu(cqe->result);
+
+ dev_info(hdev->dev, "rcv aen, status[%x], result[%x]\n",
+ le16_to_cpu(cqe->status) >> 1, result);
+
+ spraid_send_aen(hdev, cqe->cmd_id);
+
+ if ((le16_to_cpu(cqe->status) >> 1) != SPRAID_SC_SUCCESS)
+ return;
+ switch (result & 0x7) {
+ case SPRAID_AEN_NOTICE:
+ spraid_handle_aen_notice(hdev, result);
+ break;
+ case SPRAID_AEN_VS:
+ spraid_handle_aen_vs(hdev, result, le32_to_cpu(cqe->result1));
+ break;
+ default:
+ dev_warn(hdev->dev, "Unsupported async event type: %u\n",
+ result & 0x7);
+ break;
+ }
+}
+
+static void spraid_complete_ioq_sync_cmnd(struct spraid_queue *ioq, struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct spraid_cmd *ptcmd;
+
+ ptcmd = hdev->ioq_ptcmds + (ioq->qid - 1) * SPRAID_PTCMDS_PERQ +
+ cqe->cmd_id - SPRAID_IO_BLK_MQ_DEPTH;
+
+ ptcmd->status = le16_to_cpu(cqe->status) >> 1;
+ ptcmd->result0 = le32_to_cpu(cqe->result);
+ ptcmd->result1 = le32_to_cpu(cqe->result1);
+
+ complete(&ptcmd->cmd_done);
+}
+
+static inline void spraid_handle_cqe(struct spraid_queue *spraidq, u16 idx)
+{
+ struct spraid_completion *cqe = &spraidq->cqes[idx];
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ if (unlikely(cqe->cmd_id >= spraidq->q_depth)) {
+ dev_err(hdev->dev, "Invalid command id[%d] completed on queue %d\n",
+ cqe->cmd_id, cqe->sq_id);
+ return;
+ }
+
+ pr_debug("cid[%d], qid[%d], result[0x%x], sq_id[%d], status[0x%x]\n",
+ cqe->cmd_id, spraidq->qid, le32_to_cpu(cqe->result),
+ le16_to_cpu(cqe->sq_id), le16_to_cpu(cqe->status));
+
+ if (unlikely(spraidq->qid == 0 && cqe->cmd_id >= SPRAID_AQ_BLK_MQ_DEPTH)) {
+ spraid_complete_aen(spraidq, cqe);
+ return;
+ }
+
+ if (unlikely(spraidq->qid && cqe->cmd_id >= SPRAID_IO_BLK_MQ_DEPTH)) {
+ spraid_complete_ioq_sync_cmnd(spraidq, cqe);
+ return;
+ }
+
+ if (spraidq->qid)
+ spraid_complete_ioq_cmnd(spraidq, cqe);
+ else
+ spraid_complete_adminq_cmnd(spraidq, cqe);
+}
+
+static void spraid_complete_cqes(struct spraid_queue *spraidq, u16 start, u16 end)
+{
+ while (start != end) {
+ spraid_handle_cqe(spraidq, start);
+ if (++start == spraidq->q_depth)
+ start = 0;
+ }
+}
+
+static inline void spraid_update_cq_head(struct spraid_queue *spraidq)
+{
+ if (++spraidq->cq_head == spraidq->q_depth) {
+ spraidq->cq_head = 0;
+ spraidq->cq_phase = !spraidq->cq_phase;
+ }
+}
+
+static inline bool spraid_process_cq(struct spraid_queue *spraidq, u16 *start, u16 *end, int tag)
+{
+ bool found = false;
+
+ *start = spraidq->cq_head;
+ while (!found && spraid_cqe_pending(spraidq)) {
+ if (spraidq->cqes[spraidq->cq_head].cmd_id == tag)
+ found = true;
+ spraid_update_cq_head(spraidq);
+ }
+ *end = spraidq->cq_head;
+
+ if (*start != *end)
+ writel(spraidq->cq_head, spraidq->q_db + spraidq->hdev->db_stride);
+
+ return found;
+}
+
+static bool spraid_poll_cq(struct spraid_queue *spraidq, int cid)
+{
+ u16 start, end;
+ bool found;
+
+ if (!spraid_cqe_pending(spraidq))
+ return 0;
+
+ spin_lock_irq(&spraidq->cq_lock);
+ found = spraid_process_cq(spraidq, &start, &end, cid);
+ spin_unlock_irq(&spraidq->cq_lock);
+
+ spraid_complete_cqes(spraidq, start, end);
+ return found;
+}
+
+static irqreturn_t spraid_irq(int irq, void *data)
+{
+ struct spraid_queue *spraidq = data;
+ irqreturn_t ret = IRQ_NONE;
+ u16 start, end;
+
+ spin_lock(&spraidq->cq_lock);
+ if (spraidq->cq_head != spraidq->last_cq_head)
+ ret = IRQ_HANDLED;
+
+ spraid_process_cq(spraidq, &start, &end, -1);
+ spraidq->last_cq_head = spraidq->cq_head;
+ spin_unlock(&spraidq->cq_lock);
+
+ if (start != end) {
+ spraid_complete_cqes(spraidq, start, end);
+ ret = IRQ_HANDLED;
+ }
+ return ret;
+}
+
+static int spraid_setup_admin_queue(struct spraid_dev *hdev)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u32 aqa;
+ int ret;
+
+ dev_info(hdev->dev, "[%s] start disable ctrl\n", __func__);
+
+ ret = spraid_disable_ctrl(hdev);
+ if (ret)
+ return ret;
+
+ ret = spraid_alloc_queue(hdev, 0, SPRAID_AQ_DEPTH);
+ if (ret)
+ return ret;
+
+ aqa = adminq->q_depth - 1;
+ aqa |= aqa << 16;
+ writel(aqa, hdev->bar + SPRAID_REG_AQA);
+ lo_hi_writeq(adminq->sq_dma_addr, hdev->bar + SPRAID_REG_ASQ);
+ lo_hi_writeq(adminq->cq_dma_addr, hdev->bar + SPRAID_REG_ACQ);
+
+ dev_info(hdev->dev, "[%s] start enable ctrl\n", __func__);
+
+ ret = spraid_enable_ctrl(hdev);
+ if (ret) {
+ ret = -ENODEV;
+ goto free_queue;
+ }
+
+ adminq->cq_vector = 0;
+ spraid_init_queue(adminq, 0);
+ ret = pci_request_irq(hdev->pdev, adminq->cq_vector, spraid_irq, NULL,
+ adminq, "spraid%d_q%d", hdev->instance, adminq->qid);
+
+ if (ret) {
+ adminq->cq_vector = -1;
+ hdev->online_queues--;
+ goto free_queue;
+ }
+
+ dev_info(hdev->dev, "[%s] success, queuecount:[%d], onlinequeue:[%d]\n",
+ __func__, hdev->queue_count, hdev->online_queues);
+
+ return 0;
+
+free_queue:
+ spraid_free_queue(adminq);
+ return ret;
+}
+
+static u32 spraid_bar_size(struct spraid_dev *hdev, u32 nr_ioqs)
+{
+ return (SPRAID_REG_DBS + ((nr_ioqs + 1) * 8 * hdev->db_stride));
+}
+
+static int spraid_alloc_admin_cmds(struct spraid_dev *hdev)
+{
+ int i;
+
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+ spin_lock_init(&hdev->adm_cmd_lock);
+
+ hdev->adm_cmds = kcalloc_node(SPRAID_AQ_BLK_MQ_DEPTH, sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
+
+ if (!hdev->adm_cmds) {
+ dev_err(hdev->dev, "Alloc admin cmds failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < SPRAID_AQ_BLK_MQ_DEPTH; i++) {
+ hdev->adm_cmds[i].qid = 0;
+ hdev->adm_cmds[i].cid = i;
+ list_add_tail(&(hdev->adm_cmds[i].list), &hdev->adm_cmd_list);
+ }
+
+ dev_info(hdev->dev, "Alloc admin cmds success, num[%d]\n", SPRAID_AQ_BLK_MQ_DEPTH);
+
+ return 0;
+}
+
+static void spraid_free_admin_cmds(struct spraid_dev *hdev)
+{
+ kfree(hdev->adm_cmds);
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+}
+
+static struct spraid_cmd *spraid_get_cmd(struct spraid_dev *hdev, enum spraid_cmd_type type)
+{
+ struct spraid_cmd *cmd = NULL;
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
+
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
+
+ spin_lock_irqsave(slock, flags);
+ if (list_empty(head)) {
+ spin_unlock_irqrestore(slock, flags);
+ dev_err(hdev->dev, "err, cmd[%d] list empty\n", type);
+ return NULL;
+ }
+ cmd = list_entry(head->next, struct spraid_cmd, list);
+ list_del_init(&cmd->list);
+ spin_unlock_irqrestore(slock, flags);
+
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IN_FLIGHT);
+
+ return cmd;
+}
+
+static void spraid_put_cmd(struct spraid_dev *hdev, struct spraid_cmd *cmd,
+ enum spraid_cmd_type type)
+{
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
+
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
+
+ spin_lock_irqsave(slock, flags);
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IDLE);
+ list_add_tail(&cmd->list, head);
+ spin_unlock_irqrestore(slock, flags);
+}
+
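+/*
+ * Issue an admin command and wait for its completion. On timeout the slot
+ * is marked SPRAID_CMD_TIMEOUT and is not returned to the free list, which
+ * keeps a late completion from racing with slot reuse.
+ */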
+static int spraid_submit_admin_sync_cmd(struct spraid_dev *hdev, struct spraid_admin_command *cmd,
+ u32 *result0, u32 *result1, u32 timeout)
+{
+	struct spraid_cmd *adm_cmd = spraid_get_cmd(hdev, SPRAID_CMD_ADM);
+	int status;
+
+ if (!adm_cmd) {
+ dev_err(hdev->dev, "err, get admin cmd failed\n");
+ return -EFAULT;
+ }
+
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
+
+ init_completion(&adm_cmd->cmd_done);
+
+ cmd->common.command_id = adm_cmd->cid;
+ spraid_submit_cmd(&hdev->queues[0], cmd);
+
+ if (!wait_for_completion_timeout(&adm_cmd->cmd_done, timeout)) {
+ dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout, opcode[0x%x] subopcode[0x%x]\n",
+ __func__, adm_cmd->cid, adm_cmd->qid, cmd->usr_cmd.opcode,
+ cmd->usr_cmd.info_0.subopcode);
+ WRITE_ONCE(adm_cmd->state, SPRAID_CMD_TIMEOUT);
+ return -EINVAL;
+ }
+
+ if (result0)
+ *result0 = adm_cmd->result0;
+ if (result1)
+ *result1 = adm_cmd->result1;
+
+	/* snapshot the status before the slot goes back to the free list */
+	status = adm_cmd->status;
+	spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+
+	return status;
+}
+
+static int spraid_create_cq(struct spraid_dev *hdev, u16 qid,
+ struct spraid_queue *spraidq, u16 cq_vector)
+{
+ struct spraid_admin_command admin_cmd;
+ int flags = SPRAID_QUEUE_PHYS_CONTIG | SPRAID_CQ_IRQ_ENABLED;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.create_cq.opcode = SPRAID_ADMIN_CREATE_CQ;
+ admin_cmd.create_cq.prp1 = cpu_to_le64(spraidq->cq_dma_addr);
+ admin_cmd.create_cq.cqid = cpu_to_le16(qid);
+ admin_cmd.create_cq.qsize = cpu_to_le16(spraidq->q_depth - 1);
+ admin_cmd.create_cq.cq_flags = cpu_to_le16(flags);
+ admin_cmd.create_cq.irq_vector = cpu_to_le16(cq_vector);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+static int spraid_create_sq(struct spraid_dev *hdev, u16 qid,
+ struct spraid_queue *spraidq)
+{
+ struct spraid_admin_command admin_cmd;
+ int flags = SPRAID_QUEUE_PHYS_CONTIG;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.create_sq.opcode = SPRAID_ADMIN_CREATE_SQ;
+ admin_cmd.create_sq.prp1 = cpu_to_le64(spraidq->sq_dma_addr);
+ admin_cmd.create_sq.sqid = cpu_to_le16(qid);
+ admin_cmd.create_sq.qsize = cpu_to_le16(spraidq->q_depth - 1);
+ admin_cmd.create_sq.sq_flags = cpu_to_le16(flags);
+ admin_cmd.create_sq.cqid = cpu_to_le16(qid);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+static void spraid_free_queue(struct spraid_queue *spraidq)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ hdev->queue_count--;
+ dma_free_coherent(hdev->dev, CQ_SIZE(spraidq->q_depth),
+ (void *)spraidq->cqes, spraidq->cq_dma_addr);
+ dma_free_coherent(hdev->dev, SQ_SIZE(spraidq->qid, spraidq->q_depth),
+ spraidq->sq_cmds, spraidq->sq_dma_addr);
+ dma_free_coherent(hdev->dev, SENSE_SIZE(spraidq->q_depth),
+ spraidq->sense, spraidq->sense_dma_addr);
+}
+
+static void spraid_free_admin_queue(struct spraid_dev *hdev)
+{
+ spraid_free_queue(&hdev->queues[0]);
+}
+
+static void spraid_free_io_queues(struct spraid_dev *hdev)
+{
+ int i;
+
+ for (i = hdev->queue_count - 1; i >= 1; i--)
+ spraid_free_queue(&hdev->queues[i]);
+}
+
+static int spraid_delete_queue(struct spraid_dev *hdev, u8 op, u16 id)
+{
+ struct spraid_admin_command admin_cmd;
+ int ret;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.delete_queue.opcode = op;
+ admin_cmd.delete_queue.qid = cpu_to_le16(id);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+
+ if (ret)
+ dev_err(hdev->dev, "Delete %s:[%d] failed\n",
+ (op == SPRAID_ADMIN_DELETE_CQ) ? "cq" : "sq", id);
+
+ return ret;
+}
+
+static int spraid_delete_cq(struct spraid_dev *hdev, u16 cqid)
+{
+ return spraid_delete_queue(hdev, SPRAID_ADMIN_DELETE_CQ, cqid);
+}
+
+static int spraid_delete_sq(struct spraid_dev *hdev, u16 sqid)
+{
+ return spraid_delete_queue(hdev, SPRAID_ADMIN_DELETE_SQ, sqid);
+}
+
+static int spraid_create_queue(struct spraid_queue *spraidq, u16 qid)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+ u16 cq_vector;
+ int ret;
+
+ cq_vector = (hdev->num_vecs == 1) ? 0 : qid;
+ ret = spraid_create_cq(hdev, qid, spraidq, cq_vector);
+ if (ret)
+ return ret;
+
+ ret = spraid_create_sq(hdev, qid, spraidq);
+ if (ret)
+ goto delete_cq;
+
+ spraid_init_queue(spraidq, qid);
+ spraidq->cq_vector = cq_vector;
+
+ ret = pci_request_irq(hdev->pdev, cq_vector, spraid_irq, NULL,
+ spraidq, "spraid%d_q%d", hdev->instance, qid);
+
+ if (ret) {
+ dev_err(hdev->dev, "Request queue[%d] irq failed\n", qid);
+ goto delete_sq;
+ }
+
+ return 0;
+
+delete_sq:
+ spraidq->cq_vector = -1;
+ hdev->online_queues--;
+ spraid_delete_sq(hdev, qid);
+delete_cq:
+ spraid_delete_cq(hdev, qid);
+
+ return ret;
+}
+
+static int spraid_create_io_queues(struct spraid_dev *hdev)
+{
+ u32 i, max;
+ int ret = 0;
+
+ max = min(hdev->max_qid, hdev->queue_count - 1);
+ for (i = hdev->online_queues; i <= max; i++) {
+ ret = spraid_create_queue(&hdev->queues[i], i);
+ if (ret) {
+ dev_err(hdev->dev, "Create queue[%d] failed\n", i);
+ break;
+ }
+ }
+
+ dev_info(hdev->dev, "[%s] queue_count[%d], online_queue[%d]",
+ __func__, hdev->queue_count, hdev->online_queues);
+
+ return ret >= 0 ? 0 : ret;
+}
+
+static int spraid_set_features(struct spraid_dev *hdev, u32 fid, u32 dword11, void *buffer,
+ size_t buflen, u32 *result)
+{
+ struct spraid_admin_command admin_cmd;
+ int ret;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+
+ if (buffer && buflen) {
+ data_ptr = dma_alloc_coherent(hdev->dev, buflen, &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memcpy(data_ptr, buffer, buflen);
+ }
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.features.opcode = SPRAID_ADMIN_SET_FEATURES;
+ admin_cmd.features.fid = cpu_to_le32(fid);
+ admin_cmd.features.dword11 = cpu_to_le32(dword11);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, result, NULL, 0);
+
+ if (data_ptr)
+ dma_free_coherent(hdev->dev, buflen, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_configure_timestamp(struct spraid_dev *hdev)
+{
+ __le64 ts;
+ int ret;
+
+ ts = cpu_to_le64(ktime_to_ms(ktime_get_real()));
+ ret = spraid_set_features(hdev, SPRAID_FEAT_TIMESTAMP, 0, &ts, sizeof(ts), NULL);
+
+ if (ret)
+ dev_err(hdev->dev, "set timestamp failed: %d\n", ret);
+ return ret;
+}
+
+static int spraid_set_queue_cnt(struct spraid_dev *hdev, u32 *cnt)
+{
+ u32 q_cnt = (*cnt - 1) | ((*cnt - 1) << 16);
+ u32 nr_ioqs, result;
+ int status;
+
+ status = spraid_set_features(hdev, SPRAID_FEAT_NUM_QUEUES, q_cnt, NULL, 0, &result);
+ if (status) {
+ dev_err(hdev->dev, "Set queue count failed, status: %d\n",
+ status);
+ return -EIO;
+ }
+
+ nr_ioqs = min(result & 0xffff, result >> 16) + 1;
+ *cnt = min(*cnt, nr_ioqs);
+ if (*cnt == 0) {
+ dev_err(hdev->dev, "Illegal queue count: zero\n");
+ return -EIO;
+ }
+ return 0;
+}
+
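+/*
+ * Negotiate the I/O queue count with the firmware, remap the BAR so the
+ * doorbell region covers every queue, then reallocate MSI-X vectors with
+ * IRQ affinity (one pre-vector stays reserved for the admin queue).
+ */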
+static int spraid_setup_io_queues(struct spraid_dev *hdev)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ struct pci_dev *pdev = hdev->pdev;
+ u32 nr_ioqs = num_online_cpus();
+ u32 i, size;
+ int ret;
+
+ struct irq_affinity affd = {
+ .pre_vectors = 1
+ };
+
+ ret = spraid_set_queue_cnt(hdev, &nr_ioqs);
+ if (ret < 0)
+ return ret;
+
+ size = spraid_bar_size(hdev, nr_ioqs);
+ ret = spraid_remap_bar(hdev, size);
+ if (ret)
+ return -ENOMEM;
+
+ adminq->q_db = hdev->dbs;
+
+ pci_free_irq(pdev, 0, adminq);
+ pci_free_irq_vectors(pdev);
+
+ ret = pci_alloc_irq_vectors_affinity(pdev, 1, (nr_ioqs + 1),
+ PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
+ if (ret <= 0)
+ return -EIO;
+
+ hdev->num_vecs = ret;
+
+ hdev->max_qid = max(ret - 1, 1);
+
+ ret = pci_request_irq(pdev, adminq->cq_vector, spraid_irq, NULL,
+ adminq, "spraid%d_q%d", hdev->instance, adminq->qid);
+ if (ret) {
+ dev_err(hdev->dev, "Request admin irq failed\n");
+ adminq->cq_vector = -1;
+ return ret;
+ }
+
+ for (i = hdev->queue_count; i <= hdev->max_qid; i++) {
+ ret = spraid_alloc_queue(hdev, i, hdev->ioq_depth);
+ if (ret)
+ break;
+ }
+ dev_info(hdev->dev, "[%s] max_qid: %d, queue_count: %d, online_queue: %d, ioq_depth: %d\n",
+ __func__, hdev->max_qid, hdev->queue_count, hdev->online_queues, hdev->ioq_depth);
+
+ return spraid_create_io_queues(hdev);
+}
+
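+/*
+ * Tear down the I/O queues in two passes: delete every submission queue
+ * first, then the completion queues.
+ */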
+static void spraid_delete_io_queues(struct spraid_dev *hdev)
+{
+ u16 queues = hdev->online_queues - 1;
+ u8 opcode = SPRAID_ADMIN_DELETE_SQ;
+ u16 i, pass;
+
+ if (!pci_device_is_present(hdev->pdev)) {
+		dev_err(hdev->dev, "PCI device is not present, skip disabling io queues\n");
+ return;
+ }
+
+ if (hdev->online_queues < 2) {
+		dev_err(hdev->dev, "[%s] err, io queues have already been deleted\n", __func__);
+ return;
+ }
+
+ for (pass = 0; pass < 2; pass++) {
+ for (i = queues; i > 0; i--)
+ if (spraid_delete_queue(hdev, opcode, i))
+ break;
+
+ opcode = SPRAID_ADMIN_DELETE_CQ;
+ }
+}
+
+static void spraid_remove_io_queues(struct spraid_dev *hdev)
+{
+ spraid_delete_io_queues(hdev);
+ spraid_free_io_queues(hdev);
+}
+
+static void spraid_pci_disable(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ u32 i;
+
+ for (i = 0; i < hdev->online_queues; i++)
+ pci_free_irq(pdev, hdev->queues[i].cq_vector, &hdev->queues[i]);
+ pci_free_irq_vectors(pdev);
+ if (pci_is_enabled(pdev)) {
+ pci_disable_pcie_error_reporting(pdev);
+ pci_disable_device(pdev);
+ }
+ hdev->online_queues = 0;
+}
+
+static void spraid_disable_admin_queue(struct spraid_dev *hdev, bool shutdown)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u16 start, end;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ if (shutdown)
+ spraid_shutdown_ctrl(hdev);
+ else
+ spraid_disable_ctrl(hdev);
+ }
+
+ if (hdev->queue_count == 0) {
+		dev_err(hdev->dev, "[%s] err, admin queue has already been deleted\n", __func__);
+ return;
+ }
+
+ spin_lock_irq(&adminq->cq_lock);
+ spraid_process_cq(adminq, &start, &end, -1);
+ spin_unlock_irq(&adminq->cq_lock);
+
+ spraid_complete_cqes(adminq, start, end);
+ spraid_free_admin_queue(hdev);
+}
+
+static int spraid_create_dma_pools(struct spraid_dev *hdev)
+{
+ int i;
+ char poolname[20] = { 0 };
+
+ hdev->prp_page_pool = dma_pool_create("prp list page", hdev->dev,
+ PAGE_SIZE, PAGE_SIZE, 0);
+
+ if (!hdev->prp_page_pool) {
+ dev_err(hdev->dev, "create prp_page_pool failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < small_pool_num; i++) {
+ sprintf(poolname, "prp_list_256_%d", i);
+ hdev->prp_small_pool[i] = dma_pool_create(poolname, hdev->dev, SMALL_POOL_SIZE,
+ SMALL_POOL_SIZE, 0);
+
+ if (!hdev->prp_small_pool[i]) {
+ dev_err(hdev->dev, "create prp_small_pool %d failed\n", i);
+ goto destroy_prp_small_pool;
+ }
+ }
+
+ return 0;
+
+destroy_prp_small_pool:
+ while (i > 0)
+ dma_pool_destroy(hdev->prp_small_pool[--i]);
+ dma_pool_destroy(hdev->prp_page_pool);
+
+ return -ENOMEM;
+}
+
+static void spraid_destroy_dma_pools(struct spraid_dev *hdev)
+{
+ int i;
+
+ for (i = 0; i < small_pool_num; i++)
+ dma_pool_destroy(hdev->prp_small_pool[i]);
+ dma_pool_destroy(hdev->prp_page_pool);
+}
+
+static int spraid_get_dev_list(struct spraid_dev *hdev, struct spraid_dev_info *devices)
+{
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ struct spraid_admin_command admin_cmd;
+ struct spraid_dev_list *list_buf;
+ dma_addr_t data_dma = 0;
+ u32 i, idx, hdid, ndev;
+ int ret = 0;
+
+ list_buf = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
+ if (!list_buf)
+ return -ENOMEM;
+
+ for (idx = 0; idx < nd;) {
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
+ admin_cmd.get_info.type = SPRAID_GET_INFO_DEV_LIST;
+ admin_cmd.get_info.cdw11 = cpu_to_le32(idx);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+
+ if (ret) {
+ dev_err(hdev->dev, "Get device list failed, nd: %u, idx: %u, ret: %d\n",
+ nd, idx, ret);
+ goto out;
+ }
+ ndev = le32_to_cpu(list_buf->dev_num);
+
+ dev_info(hdev->dev, "ndev numbers: %u\n", ndev);
+
+ for (i = 0; i < ndev; i++) {
+ hdid = le32_to_cpu(list_buf->devices[i].hdid);
+ dev_info(hdev->dev, "list_buf->devices[%d], hdid: %u target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ i, hdid, le16_to_cpu(list_buf->devices[i].target),
+ list_buf->devices[i].channel,
+ list_buf->devices[i].lun,
+ list_buf->devices[i].attr);
+ if (hdid > nd || hdid == 0) {
+ dev_err(hdev->dev, "err, hdid[%d] invalid\n", hdid);
+ continue;
+ }
+ memcpy(&devices[hdid - 1], &list_buf->devices[i],
+ sizeof(struct spraid_dev_info));
+ }
+ idx += ndev;
+
+ if (idx < MAX_DEV_ENTRY_PER_PAGE_4K)
+ break;
+ }
+
+out:
+ dma_free_coherent(hdev->dev, PAGE_SIZE, list_buf, data_dma);
+ return ret;
+}
+
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.common.opcode = SPRAID_ADMIN_ASYNC_EVENT;
+ admin_cmd.common.command_id = cid;
+
+ spraid_submit_cmd(adminq, &admin_cmd);
+ dev_info(hdev->dev, "send aen, cid[%d]\n", cid);
+}
+
+static inline void spraid_send_all_aen(struct spraid_dev *hdev)
+{
+ u16 i;
+
+ for (i = 0; i < hdev->ctrl_info->aerl; i++)
+ spraid_send_aen(hdev, i + SPRAID_AQ_BLK_MQ_DEPTH);
+}
+
+static int spraid_add_device(struct spraid_dev *hdev, struct spraid_dev_info *device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ sdev = scsi_device_lookup(shost, device->channel, le16_to_cpu(device->target), 0);
+ if (sdev) {
+		dev_warn(hdev->dev, "Device already exists, channel: %d, target_id: %d, lun: %d\n",
+ device->channel, le16_to_cpu(device->target), 0);
+ scsi_device_put(sdev);
+ return -EEXIST;
+ }
+ scsi_add_device(shost, device->channel, le16_to_cpu(device->target), 0);
+ return 0;
+}
+
+static int spraid_rescan_device(struct spraid_dev *hdev, struct spraid_dev_info *device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ sdev = scsi_device_lookup(shost, device->channel, le16_to_cpu(device->target), 0);
+ if (!sdev) {
+		dev_warn(hdev->dev, "Device does not exist, channel: %d, target_id: %d, lun: %d\n",
+ device->channel, le16_to_cpu(device->target), 0);
+ return -ENODEV;
+ }
+
+ scsi_rescan_device(&sdev->sdev_gendev);
+ scsi_device_put(sdev);
+ return 0;
+}
+
+static int spraid_remove_device(struct spraid_dev *hdev, struct spraid_dev_info *org_device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ sdev = scsi_device_lookup(shost, org_device->channel, le16_to_cpu(org_device->target), 0);
+ if (!sdev) {
+		dev_warn(hdev->dev, "Device does not exist, channel: %d, target_id: %d, lun: %d\n",
+ org_device->channel, le16_to_cpu(org_device->target), 0);
+ return -ENODEV;
+ }
+
+ scsi_remove_device(sdev);
+ scsi_device_put(sdev);
+ return 0;
+}
+
+static int spraid_dev_list_init(struct spraid_dev *hdev)
+{
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ int i, ret;
+
+ hdev->devices = kzalloc_node(nd * sizeof(struct spraid_dev_info),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->devices)
+ return -ENOMEM;
+
+ ret = spraid_get_dev_list(hdev, hdev->devices);
+ if (ret) {
+		dev_err(hdev->dev, "Ignoring failure to get device list during initialization\n");
+ return 0;
+ }
+
+ for (i = 0; i < nd; i++) {
+ if (SPRAID_DEV_INFO_FLAG_VALID(hdev->devices[i].flag) &&
+ SPRAID_DEV_INFO_ATTR_BOOT(hdev->devices[i].attr)) {
+ spraid_add_device(hdev, &hdev->devices[i]);
+ break;
+ }
+ }
+ return 0;
+}
+
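+/*
+ * AEN-driven rescan: fetch a fresh device list from the firmware and diff
+ * it against the cached copy, adding, rescanning or removing SCSI devices
+ * according to each slot's VALID/CHANGE flags.
+ */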
+static void spraid_scan_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, scan_work);
+ struct spraid_dev_info *devices, *org_devices;
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ u8 flag, org_flag;
+ int i, ret;
+
+ devices = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
+ if (!devices)
+ return;
+ ret = spraid_get_dev_list(hdev, devices);
+ if (ret)
+ goto free_list;
+ org_devices = hdev->devices;
+ for (i = 0; i < nd; i++) {
+ org_flag = org_devices[i].flag;
+ flag = devices[i].flag;
+
+ pr_debug("i: %d, org_flag: 0x%x, flag: 0x%x\n",
+ i, org_flag, flag);
+
+ if (SPRAID_DEV_INFO_FLAG_VALID(flag)) {
+ if (!SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
+ down_write(&hdev->devices_rwsem);
+ memcpy(&org_devices[i], &devices[i],
+ sizeof(struct spraid_dev_info));
+ up_write(&hdev->devices_rwsem);
+ spraid_add_device(hdev, &devices[i]);
+ } else if (SPRAID_DEV_INFO_FLAG_CHANGE(flag)) {
+ spraid_rescan_device(hdev, &devices[i]);
+ }
+ } else {
+ if (SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
+ down_write(&hdev->devices_rwsem);
+ org_devices[i].flag &= 0xfe;
+ up_write(&hdev->devices_rwsem);
+ spraid_remove_device(hdev, &org_devices[i]);
+ }
+ }
+ }
+free_list:
+ kfree(devices);
+}
+
+static void spraid_timesyn_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, timesyn_work);
+
+ spraid_configure_timestamp(hdev);
+}
+
+static void spraid_queue_scan(struct spraid_dev *hdev)
+{
+ queue_work(spraid_wq, &hdev->scan_work);
+}
+
+static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result)
+{
+ switch ((result & 0xff00) >> 8) {
+ case SPRAID_AEN_DEV_CHANGED:
+ spraid_queue_scan(hdev);
+ break;
+ case SPRAID_AEN_HOST_PROBING:
+ break;
+ default:
+ dev_warn(hdev->dev, "async event result %08x\n", result);
+ }
+}
+
+static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result, u32 result1)
+{
+ switch ((result & 0xff00) >> 8) {
+ case SPRAID_AEN_TIMESYN:
+ queue_work(spraid_wq, &hdev->timesyn_work);
+ break;
+ default:
+ dev_warn(hdev->dev, "async event result: 0x%x\n", result);
+ }
+}
+
+static int spraid_alloc_resources(struct spraid_dev *hdev)
+{
+ int ret, nqueue;
+
+ ret = ida_alloc(&spraid_instance_ida, GFP_KERNEL);
+ if (ret < 0) {
+ dev_err(hdev->dev, "Get instance id failed\n");
+ return ret;
+ }
+ hdev->instance = ret;
+
+ hdev->ctrl_info = kzalloc_node(sizeof(*hdev->ctrl_info),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->ctrl_info) {
+ ret = -ENOMEM;
+ goto release_instance;
+ }
+
+ ret = spraid_create_dma_pools(hdev);
+ if (ret)
+ goto free_ctrl_info;
+ nqueue = num_possible_cpus() + 1;
+ hdev->queues = kcalloc_node(nqueue, sizeof(struct spraid_queue),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->queues) {
+ ret = -ENOMEM;
+ goto destroy_dma_pools;
+ }
+
+ ret = spraid_alloc_admin_cmds(hdev);
+ if (ret)
+ goto free_queues;
+
+ dev_info(hdev->dev, "[%s] queues num: %d\n", __func__, nqueue);
+
+ return 0;
+
+free_queues:
+ kfree(hdev->queues);
+destroy_dma_pools:
+ spraid_destroy_dma_pools(hdev);
+free_ctrl_info:
+ kfree(hdev->ctrl_info);
+release_instance:
+ ida_free(&spraid_instance_ida, hdev->instance);
+ return ret;
+}
+
+static void spraid_free_resources(struct spraid_dev *hdev)
+{
+ spraid_free_admin_cmds(hdev);
+ kfree(hdev->queues);
+ spraid_destroy_dma_pools(hdev);
+ kfree(hdev->ctrl_info);
+ ida_free(&spraid_instance_ida, hdev->instance);
+}
+
+static void spraid_bsg_unmap_data(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir = rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+
+ if (iod->nsge)
+ dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
+
+ spraid_free_iod_res(hdev, iod);
+}
+
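+/*
+ * Map the bsg payload scatterlist for DMA and build the PRP entries the
+ * passthrough command will reference.
+ */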
+static int spraid_bsg_map_data(struct spraid_dev *hdev, struct bsg_job *job,
+ struct spraid_admin_command *cmd)
+{
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir = rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ int ret = 0;
+
+ iod->sg = job->request_payload.sg_list;
+ iod->nsge = job->request_payload.sg_cnt;
+ iod->length = job->request_payload.payload_len;
+ iod->use_sgl = false;
+ iod->npages = -1;
+ iod->sg_drv_mgmt = false;
+
+ if (!iod->nsge)
+ goto out;
+
+ ret = dma_map_sg_attrs(hdev->dev, iod->sg, iod->nsge, dma_dir, DMA_ATTR_NO_WARN);
+ if (!ret)
+ goto out;
+
+ ret = spraid_setup_prps(hdev, iod);
+ if (ret)
+ goto unmap;
+
+ cmd->common.dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
+ cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
+
+ return 0;
+
+unmap:
+ dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
+out:
+ return ret;
+}
+
+static int spraid_get_ctrl_info(struct spraid_dev *hdev, struct spraid_ctrl_info *ctrl_info)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
+ admin_cmd.get_info.type = SPRAID_GET_INFO_CTRL;
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(ctrl_info, data_ptr, sizeof(struct spraid_ctrl_info));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_init_ctrl_info(struct spraid_dev *hdev)
+{
+ int ret;
+
+ hdev->ctrl_info->nd = cpu_to_le32(240);
+ hdev->ctrl_info->mdts = 8;
+ hdev->ctrl_info->max_cmds = cpu_to_le16(4096);
+ hdev->ctrl_info->max_num_sge = cpu_to_le16(128);
+ hdev->ctrl_info->max_channel = cpu_to_le16(4);
+ hdev->ctrl_info->max_tgt_id = cpu_to_le32(3239);
+ hdev->ctrl_info->max_lun = cpu_to_le16(2);
+
+ ret = spraid_get_ctrl_info(hdev, hdev->ctrl_info);
+ if (ret)
+ dev_err(hdev->dev, "get controller info failed: %d\n", ret);
+
+ dev_info(hdev->dev, "[%s]nd = %d\n", __func__, hdev->ctrl_info->nd);
+ dev_info(hdev->dev, "[%s]max_cmd = %d\n", __func__, hdev->ctrl_info->max_cmds);
+ dev_info(hdev->dev, "[%s]max_channel = %d\n", __func__, hdev->ctrl_info->max_channel);
+ dev_info(hdev->dev, "[%s]max_tgt_id = %d\n", __func__, hdev->ctrl_info->max_tgt_id);
+ dev_info(hdev->dev, "[%s]max_lun = %d\n", __func__, hdev->ctrl_info->max_lun);
+ dev_info(hdev->dev, "[%s]max_num_sge = %d\n", __func__, hdev->ctrl_info->max_num_sge);
+ dev_info(hdev->dev, "[%s]lun_num_boot = %d\n", __func__, hdev->ctrl_info->lun_num_in_boot);
+ dev_info(hdev->dev, "[%s]mdts = %d\n", __func__, hdev->ctrl_info->mdts);
+ dev_info(hdev->dev, "[%s]acl = %d\n", __func__, hdev->ctrl_info->acl);
+	dev_info(hdev->dev, "[%s]aerl = %d\n", __func__, hdev->ctrl_info->aerl);
+ dev_info(hdev->dev, "[%s]card_type = %d\n", __func__, hdev->ctrl_info->card_type);
+ dev_info(hdev->dev, "[%s]rtd3e = %d\n", __func__, hdev->ctrl_info->rtd3e);
+ dev_info(hdev->dev, "[%s]sn = %s\n", __func__, hdev->ctrl_info->sn);
+ dev_info(hdev->dev, "[%s]fr = %s\n", __func__, hdev->ctrl_info->fr);
+
+ return 0;
+}
+
+#define SPRAID_MAX_ADMIN_PAYLOAD_SIZE BIT(16)
+static int spraid_alloc_iod_ext_mem_pool(struct spraid_dev *hdev)
+{
+ u16 max_sge = le16_to_cpu(hdev->ctrl_info->max_num_sge);
+ size_t alloc_size;
+
+ alloc_size = spraid_iod_ext_size(hdev, SPRAID_MAX_ADMIN_PAYLOAD_SIZE,
+ max_sge, true, false);
+ if (alloc_size > PAGE_SIZE)
+		dev_warn(hdev->dev, "sg allocation larger than one page is unexpected\n");
+ hdev->iod_mempool = mempool_create_node(1, mempool_kmalloc, mempool_kfree,
+ (void *)alloc_size, GFP_KERNEL, hdev->numa_node);
+ if (!hdev->iod_mempool) {
+ dev_err(hdev->dev, "Create iod extension memory pool failed\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void spraid_free_iod_ext_mem_pool(struct spraid_dev *hdev)
+{
+ mempool_destroy(hdev->iod_mempool);
+}
+
+static int spraid_user_admin_cmd(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct spraid_bsg_request *bsg_req = (struct spraid_bsg_request *)(job->request);
+ struct spraid_passthru_common_cmd *cmd = &(bsg_req->admcmd);
+ struct spraid_admin_command admin_cmd;
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
+ u32 result[2] = {0};
+ int status;
+
+ if (hdev->state >= SPRAID_RESETTING) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not right\n",
+ __func__, hdev->state);
+ return -EBUSY;
+ }
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode);
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.common.opcode = cmd->opcode;
+ admin_cmd.common.flags = cmd->flags;
+ admin_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ admin_cmd.common.cdw2[0] = cpu_to_le32(cmd->cdw2);
+ admin_cmd.common.cdw2[1] = cpu_to_le32(cmd->cdw3);
+ admin_cmd.common.cdw10 = cpu_to_le32(cmd->cdw10);
+ admin_cmd.common.cdw11 = cpu_to_le32(cmd->cdw11);
+ admin_cmd.common.cdw12 = cpu_to_le32(cmd->cdw12);
+ admin_cmd.common.cdw13 = cpu_to_le32(cmd->cdw13);
+ admin_cmd.common.cdw14 = cpu_to_le32(cmd->cdw14);
+ admin_cmd.common.cdw15 = cpu_to_le32(cmd->cdw15);
+
+ status = spraid_bsg_map_data(hdev, job, &admin_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
+ }
+
+ status = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, &result[0], &result[1], timeout);
+ if (status >= 0) {
+ job->reply_len = sizeof(result);
+ memcpy(job->reply, result, sizeof(result));
+ }
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x], status[0x%x] result0[0x%x] result1[0x%x]\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode, status, result[0], result[1]);
+
+ spraid_bsg_unmap_data(hdev, job);
+
+ return status;
+}
+
+static int spraid_alloc_ioq_ptcmds(struct spraid_dev *hdev)
+{
+ int i;
+ int ptnum = SPRAID_NR_IOQ_PTCMDS;
+
+ INIT_LIST_HEAD(&hdev->ioq_pt_list);
+ spin_lock_init(&hdev->ioq_pt_lock);
+
+ hdev->ioq_ptcmds = kcalloc_node(ptnum, sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
+
+ if (!hdev->ioq_ptcmds) {
+ dev_err(hdev->dev, "Alloc ioq_ptcmds failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < ptnum; i++) {
+ hdev->ioq_ptcmds[i].qid = i / SPRAID_PTCMDS_PERQ + 1;
+ hdev->ioq_ptcmds[i].cid = i % SPRAID_PTCMDS_PERQ + SPRAID_IO_BLK_MQ_DEPTH;
+ list_add_tail(&(hdev->ioq_ptcmds[i].list), &hdev->ioq_pt_list);
+ }
+
+ dev_info(hdev->dev, "Alloc ioq_ptcmds success, ptnum[%d]\n", ptnum);
+
+ return 0;
+}
+
+static void spraid_free_ioq_ptcmds(struct spraid_dev *hdev)
+{
+ kfree(hdev->ioq_ptcmds);
+ INIT_LIST_HEAD(&hdev->ioq_pt_list);
+}
+
+static int spraid_submit_ioq_sync_cmd(struct spraid_dev *hdev, struct spraid_ioq_command *cmd,
+ u32 *result, u32 *reslen, u32 timeout)
+{
+ int ret;
+ dma_addr_t sense_dma;
+ struct spraid_queue *ioq;
+ void *sense_addr = NULL;
+ struct spraid_cmd *pt_cmd = spraid_get_cmd(hdev, SPRAID_CMD_IOPT);
+
+ if (!pt_cmd) {
+ dev_err(hdev->dev, "err, get ioq cmd failed\n");
+ return -EFAULT;
+ }
+
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
+
+ init_completion(&pt_cmd->cmd_done);
+
+ ioq = &hdev->queues[pt_cmd->qid];
+ ret = pt_cmd->cid * SCSI_SENSE_BUFFERSIZE;
+ sense_addr = ioq->sense + ret;
+ sense_dma = ioq->sense_dma_addr + ret;
+
+ cmd->common.sense_addr = cpu_to_le64(sense_dma);
+ cmd->common.sense_len = cpu_to_le16(SCSI_SENSE_BUFFERSIZE);
+ cmd->common.command_id = pt_cmd->cid;
+
+ spraid_submit_cmd(ioq, cmd);
+
+ if (!wait_for_completion_timeout(&pt_cmd->cmd_done, timeout)) {
+ dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout, opcode[0x%x] subopcode[0x%x]\n",
+ __func__, pt_cmd->cid, pt_cmd->qid, cmd->common.opcode,
+ (le32_to_cpu(cmd->common.cdw3[0]) & 0xffff));
+ WRITE_ONCE(pt_cmd->state, SPRAID_CMD_TIMEOUT);
+ return -EINVAL;
+ }
+
+ if (result && reslen) {
+ if ((pt_cmd->status & 0x17f) == 0x101) {
+ memcpy(result, sense_addr, SCSI_SENSE_BUFFERSIZE);
+ *reslen = SCSI_SENSE_BUFFERSIZE;
+ }
+ }
+
+	/* snapshot the status before the slot goes back to the free list */
+	ret = pt_cmd->status;
+	spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
+
+	return ret;
+}
+
+static int spraid_user_ioq_cmd(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct spraid_bsg_request *bsg_req = (struct spraid_bsg_request *)(job->request);
+ struct spraid_ioq_passthru_cmd *cmd = &(bsg_req->ioqcmd);
+ struct spraid_ioq_command ioq_cmd;
+ int status = 0;
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
+
+ if (cmd->data_len > PAGE_SIZE) {
+ dev_err(hdev->dev, "[%s] data len bigger than 4k\n", __func__);
+ return -EFAULT;
+ }
+
+ if (hdev->state != SPRAID_LIVE) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not live\n",
+ __func__, hdev->state);
+ return -EBUSY;
+ }
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init, datalen[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode, cmd->data_len);
+
+ memset(&ioq_cmd, 0, sizeof(ioq_cmd));
+ ioq_cmd.common.opcode = cmd->opcode;
+ ioq_cmd.common.flags = cmd->flags;
+ ioq_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ ioq_cmd.common.sense_len = cpu_to_le16(cmd->info_0.res_sense_len);
+ ioq_cmd.common.cdb_len = cmd->info_0.cdb_len;
+ ioq_cmd.common.rsvd2 = cmd->info_0.rsvd0;
+ ioq_cmd.common.cdw3[0] = cpu_to_le32(cmd->cdw3);
+ ioq_cmd.common.cdw3[1] = cpu_to_le32(cmd->cdw4);
+ ioq_cmd.common.cdw3[2] = cpu_to_le32(cmd->cdw5);
+
+ ioq_cmd.common.cdw10[0] = cpu_to_le32(cmd->cdw10);
+ ioq_cmd.common.cdw10[1] = cpu_to_le32(cmd->cdw11);
+ ioq_cmd.common.cdw10[2] = cpu_to_le32(cmd->cdw12);
+ ioq_cmd.common.cdw10[3] = cpu_to_le32(cmd->cdw13);
+ ioq_cmd.common.cdw10[4] = cpu_to_le32(cmd->cdw14);
+ ioq_cmd.common.cdw10[5] = cpu_to_le32(cmd->data_len);
+
+ memcpy(ioq_cmd.common.cdb, &cmd->cdw16, cmd->info_0.cdb_len);
+
+ ioq_cmd.common.cdw26[0] = cpu_to_le32(cmd->cdw26[0]);
+ ioq_cmd.common.cdw26[1] = cpu_to_le32(cmd->cdw26[1]);
+ ioq_cmd.common.cdw26[2] = cpu_to_le32(cmd->cdw26[2]);
+ ioq_cmd.common.cdw26[3] = cpu_to_le32(cmd->cdw26[3]);
+
+ status = spraid_bsg_map_data(hdev, job, (struct spraid_admin_command *)&ioq_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
+ }
+
+ status = spraid_submit_ioq_sync_cmd(hdev, &ioq_cmd, job->reply, &job->reply_len, timeout);
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x], status[0x%x] reply_len[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode, status, job->reply_len);
+
+ spraid_bsg_unmap_data(hdev, job);
+
+ return status;
+}
+
+static int spraid_reset_work_sync(struct spraid_dev *hdev);
+
+static bool spraid_check_scmd_completed(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_queue *spraidq;
+ u16 hwq, cid;
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ spraidq = &hdev->queues[hwq];
+ if (READ_ONCE(iod->state) == SPRAID_CMD_COMPLETE || spraid_poll_cq(spraidq, cid)) {
+ dev_warn(hdev->dev, "cid[%d], qid[%d] has been completed\n",
+ cid, spraidq->qid);
+ return true;
+ }
+ return false;
+}
+
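+/*
+ * blk-mq timeout hook: if the command really is overdue, flip its state to
+ * SPRAID_CMD_TIMEOUT and return BLK_EH_DONE so SCSI error handling (the
+ * abort/reset handlers below) takes over; otherwise rearm the timer.
+ */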
+static enum blk_eh_timer_return spraid_scmd_timeout(struct scsi_cmnd *scmd)
+{
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ unsigned int timeout = scmd->device->request_queue->rq_timeout;
+
+ if (spraid_check_scmd_completed(scmd))
+ goto out;
+
+ if (time_after(jiffies, scmd->jiffies_at_alloc + timeout)) {
+ if (cmpxchg(&iod->state, SPRAID_CMD_IN_FLIGHT, SPRAID_CMD_TIMEOUT) ==
+ SPRAID_CMD_IN_FLIGHT) {
+ return BLK_EH_DONE;
+ }
+ }
+out:
+ return BLK_EH_RESET_TIMER;
+}
+
+/* Send the abort command via the admin queue (temporary implementation) */
+static int spraid_send_abort_cmd(struct spraid_dev *hdev, u32 hdid, u16 qid, u16 cid)
+{
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.abort.opcode = SPRAID_ADMIN_ABORT_CMD;
+ admin_cmd.abort.hdid = cpu_to_le32(hdid);
+ admin_cmd.abort.sqid = cpu_to_le16(qid);
+ admin_cmd.abort.cid = cpu_to_le16(cid);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+/* Send the reset command via the admin queue (temporary implementation) */
+static int spraid_send_reset_cmd(struct spraid_dev *hdev, int type, u32 hdid)
+{
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.reset.opcode = SPRAID_ADMIN_RESET;
+ admin_cmd.reset.hdid = cpu_to_le32(hdid);
+ admin_cmd.reset.type = type;
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
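+/*
+ * Controller state machine: NEW/RESETTING may become LIVE, only LIVE may
+ * enter RESETTING, DEAD is reachable from NEW/LIVE/RESETTING, and DELETING
+ * from any other state; invalid transitions are rejected.
+ */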
+static bool spraid_change_host_state(struct spraid_dev *hdev, enum spraid_state newstate)
+{
+ unsigned long flags;
+ enum spraid_state oldstate;
+ bool change = false;
+
+ spin_lock_irqsave(&hdev->state_lock, flags);
+
+ oldstate = hdev->state;
+ switch (newstate) {
+ case SPRAID_LIVE:
+ switch (oldstate) {
+ case SPRAID_NEW:
+ case SPRAID_RESETTING:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ case SPRAID_RESETTING:
+ switch (oldstate) {
+ case SPRAID_LIVE:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ case SPRAID_DELETING:
+ if (oldstate != SPRAID_DELETING)
+ change = true;
+ break;
+ case SPRAID_DEAD:
+ switch (oldstate) {
+ case SPRAID_NEW:
+ case SPRAID_LIVE:
+ case SPRAID_RESETTING:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ default:
+ break;
+ }
+ if (change)
+ hdev->state = newstate;
+ spin_unlock_irqrestore(&hdev->state_lock, flags);
+
+ dev_info(hdev->dev, "[%s][%d]->[%d], change[%d]\n", __func__, oldstate, newstate, change);
+
+ return change;
+}
+
+static void spraid_back_fault_cqe(struct spraid_queue *ioq, struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct blk_mq_tags *tags;
+ struct scsi_cmnd *scmd;
+ struct spraid_iod *iod;
+ struct request *req;
+
+ tags = hdev->shost->tag_set.tags[ioq->qid - 1];
+ req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
+ if (unlikely(!req || !blk_mq_request_started(req)))
+ return;
+
+ scmd = blk_mq_rq_to_pdu(req);
+ iod = scsi_cmd_priv(scmd);
+
+ set_host_byte(scmd, DID_NO_CONNECT);
+ if (iod->nsge)
+ scsi_dma_unmap(scmd);
+ spraid_free_iod_res(hdev, iod);
+ scsi_done(scmd);
+ dev_warn(hdev->dev, "Back fault CQE, cid[%d], qid[%d]\n",
+ cqe->cmd_id, ioq->qid);
+}
+
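+/*
+ * Fail every possibly outstanding request on all hardware queues by
+ * synthesizing a fault CQE per tag; started requests complete with
+ * DID_NO_CONNECT.
+ */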
+static void spraid_back_all_io(struct spraid_dev *hdev)
+{
+ int i, j;
+ struct spraid_queue *ioq;
+ struct spraid_completion cqe = { 0 };
+
+ for (i = 1; i <= hdev->shost->nr_hw_queues; i++) {
+ ioq = &hdev->queues[i];
+ for (j = 0; j < hdev->shost->can_queue; j++) {
+ cqe.cmd_id = j;
+ spraid_back_fault_cqe(ioq, &cqe);
+ }
+ }
+}
+
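+/*
+ * Quiesce the controller: disable it (or shut it down), poll readiness for
+ * up to 600 seconds, drain the admin CQ, release PCI resources and fail
+ * all outstanding I/O.
+ */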
+static void spraid_dev_disable(struct spraid_dev *hdev, bool shutdown)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u16 start, end;
+ unsigned long timeout = jiffies + 600 * HZ;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ if (shutdown)
+ spraid_shutdown_ctrl(hdev);
+ else
+ spraid_disable_ctrl(hdev);
+ }
+
+ while (!time_after(jiffies, timeout)) {
+ if (!pci_device_is_present(hdev->pdev)) {
+			dev_info(hdev->dev, "[%s] PCI device not present, skip waiting\n", __func__);
+ break;
+ }
+ if (!spraid_wait_ready(hdev, hdev->cap, false)) {
+ dev_info(hdev->dev, "[%s] wait ready success after reset\n", __func__);
+ break;
+ }
+ dev_info(hdev->dev, "[%s] waiting csts_rdy ready\n", __func__);
+ }
+
+ if (hdev->queue_count == 0) {
+		dev_err(hdev->dev, "[%s] warn, queues have already been deleted\n", __func__);
+ return;
+ }
+
+ spin_lock_irq(&adminq->cq_lock);
+ spraid_process_cq(adminq, &start, &end, -1);
+ spin_unlock_irq(&adminq->cq_lock);
+ spraid_complete_cqes(adminq, start, end);
+
+ spraid_pci_disable(hdev);
+
+ spraid_back_all_io(hdev);
+}
+
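+/*
+ * Full controller recovery: disable the wedged controller, re-enable PCI
+ * resources and rebuild the admin and I/O queues; any failure leaves the
+ * host in the DEAD state.
+ */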
+static void spraid_reset_work(struct work_struct *work)
+{
+ int ret;
+ struct spraid_dev *hdev = container_of(work, struct spraid_dev, reset_work);
+
+ if (hdev->state != SPRAID_RESETTING) {
+ dev_err(hdev->dev, "[%s] err, host is not reset state\n", __func__);
+ return;
+ }
+
+ dev_info(hdev->dev, "[%s] enter host reset\n", __func__);
+
+ if (hdev->ctrl_config & SPRAID_CC_ENABLE) {
+ dev_info(hdev->dev, "[%s] start dev_disable\n", __func__);
+ spraid_dev_disable(hdev, false);
+ }
+
+ ret = spraid_pci_enable(hdev);
+ if (ret)
+ goto out;
+
+ ret = spraid_setup_admin_queue(hdev);
+ if (ret)
+ goto pci_disable;
+
+ ret = spraid_setup_io_queues(hdev);
+ if (ret || hdev->online_queues <= hdev->shost->nr_hw_queues)
+ goto pci_disable;
+
+ spraid_change_host_state(hdev, SPRAID_LIVE);
+
+ spraid_send_all_aen(hdev);
+
+ return;
+
+pci_disable:
+ spraid_pci_disable(hdev);
+out:
+ spraid_change_host_state(hdev, SPRAID_DEAD);
+ dev_err(hdev->dev, "[%s] err, host reset failed\n", __func__);
+}
+
+static int spraid_reset_work_sync(struct spraid_dev *hdev)
+{
+ if (!spraid_change_host_state(hdev, SPRAID_RESETTING)) {
+ dev_info(hdev->dev, "[%s] can't change to reset state\n", __func__);
+ return -EBUSY;
+ }
+
+ if (!queue_work(spraid_wq, &hdev->reset_work)) {
+ dev_err(hdev->dev, "[%s] err, host is already in reset state\n", __func__);
+ return -EBUSY;
+ }
+
+ flush_work(&hdev->reset_work);
+ if (hdev->state != SPRAID_LIVE)
+ return -ENODEV;
+
+ return 0;
+}
+
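+/*
+ * Poll in 500ms steps, bounded by SPRAID_WAIT_ABNL_CMD_TIMEOUT, for a
+ * timed-out command to reach SPRAID_CMD_TMO_COMPLETE after an abort or
+ * reset was issued.
+ */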
+static int spraid_wait_abnl_cmd_done(struct spraid_iod *iod)
+{
+ u16 times = 0;
+
+ do {
+ if (READ_ONCE(iod->state) == SPRAID_CMD_TMO_COMPLETE)
+ break;
+ msleep(500);
+ times++;
+ } while (times <= SPRAID_WAIT_ABNL_CMD_TIMEOUT);
+
+	/* the command is still outstanding after a successful abort/reset */
+ if (times >= SPRAID_WAIT_ABNL_CMD_TIMEOUT)
+ return -ETIMEDOUT;
+
+ return 0;
+}
+
+static int spraid_abort_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd) ||
+ hdev->state != SPRAID_LIVE)
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, aborting\n", cid, hwq);
+ ret = spraid_send_abort_cmd(hdev, hostdata->hdid, hwq, cid);
+ if (ret != ADMIN_ERR_TIMEOUT) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed, not found\n", cid, hwq);
+ return FAILED;
+ }
+		dev_warn(hdev->dev, "cid[%d] qid[%d] abort success\n", cid, hwq);
+ return SUCCESS;
+ }
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed, timeout\n", cid, hwq);
+ return FAILED;
+}
+
+static int spraid_tgt_reset_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd) ||
+ hdev->state != SPRAID_LIVE)
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, target reset\n", cid, hwq);
+ ret = spraid_send_reset_cmd(hdev, SPRAID_RESET_TARGET, hostdata->hdid);
+ if (ret == 0) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+			dev_warn(hdev->dev, "cid[%d] qid[%d] target reset failed, not found\n",
+ cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] target reset success\n", cid, hwq);
+ return SUCCESS;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] ret[%d] target reset failed\n", cid, hwq, ret);
+ return FAILED;
+}
+
+static int spraid_bus_reset_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd) ||
+ hdev->state != SPRAID_LIVE)
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, bus reset\n", cid, hwq);
+ ret = spraid_send_reset_cmd(hdev, SPRAID_RESET_BUS, hostdata->hdid);
+ if (ret == 0) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] bus reset failed, not found\n",
+ cid, hwq);
+ return FAILED;
+ }
+
+		dev_warn(hdev->dev, "cid[%d] qid[%d] bus reset success\n", cid, hwq);
+ return SUCCESS;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] ret[%d] bus reset failed\n", cid, hwq, ret);
+ return FAILED;
+}
+
+static int spraid_shost_reset_handler(struct scsi_cmnd *scmd)
+{
+ u16 hwq, cid;
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+
+ scsi_print_command(scmd);
+ if (spraid_check_scmd_completed(scmd) || hdev->state != SPRAID_LIVE)
+ return SUCCESS;
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset\n", cid, hwq);
+
+ if (spraid_reset_work_sync(hdev)) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset failed\n", cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset success\n", cid, hwq);
+
+ return SUCCESS;
+}
+
+static ssize_t csts_pp_show(struct device *cdev, struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_PP_MASK);
+ ret >>= SPRAID_CSTS_PP_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_shst_show(struct device *cdev, struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_SHST_MASK);
+ ret >>= SPRAID_CSTS_SHST_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_cfs_show(struct device *cdev, struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_CFS_MASK);
+ ret >>= SPRAID_CSTS_CFS_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_rdy_show(struct device *cdev, struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev))
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_RDY);
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t fw_version_show(struct device *cdev, struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+
+ return snprintf(buf, PAGE_SIZE, "%s\n", hdev->ctrl_info->fr);
+}
+
+static DEVICE_ATTR_RO(csts_pp);
+static DEVICE_ATTR_RO(csts_shst);
+static DEVICE_ATTR_RO(csts_cfs);
+static DEVICE_ATTR_RO(csts_rdy);
+static DEVICE_ATTR_RO(fw_version);
+
+static struct attribute *spraid_host_attrs[] = {
+ &dev_attr_csts_pp.attr,
+ &dev_attr_csts_shst.attr,
+ &dev_attr_csts_cfs.attr,
+ &dev_attr_csts_rdy.attr,
+ &dev_attr_fw_version.attr,
+ NULL,
+};
+
+static const struct attribute_group spraid_host_attr_group = {
+ .attrs = spraid_host_attrs
+};
+
+static const struct attribute_group *spraid_host_attr_groups[] = {
+ &spraid_host_attr_group,
+ NULL,
+};
+
+static struct scsi_host_template spraid_driver_template = {
+ .module = THIS_MODULE,
+ .name = "Ramaxel Logic spraid driver",
+ .proc_name = "spraid",
+ .queuecommand = spraid_queue_command,
+ .slave_alloc = spraid_slave_alloc,
+ .slave_destroy = spraid_slave_destroy,
+ .slave_configure = spraid_slave_configure,
+ .eh_timed_out = spraid_scmd_timeout,
+ .eh_abort_handler = spraid_abort_handler,
+ .eh_target_reset_handler = spraid_tgt_reset_handler,
+ .eh_bus_reset_handler = spraid_bus_reset_handler,
+ .eh_host_reset_handler = spraid_shost_reset_handler,
+ .change_queue_depth = scsi_change_queue_depth,
+ .host_tagset = 0,
+ .this_id = -1,
+ .shost_groups = spraid_host_attr_groups,
+};
+
+static void spraid_shutdown(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ spraid_remove_io_queues(hdev);
+ spraid_disable_admin_queue(hdev, true);
+}
+
+/* bsg dispatch user command */
+static int spraid_bsg_host_dispatch(struct bsg_job *job)
+{
+ struct Scsi_Host *shost = dev_to_shost(job->dev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_bsg_request *bsg_req = job->request;
+ int ret = 0;
+
+ dev_info(hdev->dev, "[%s]msgcode[%d], msglen[%d], timeout[%d], req_nsge[%d], req_len[%d]\n",
+ __func__, bsg_req->msgcode, job->request_len, rq->timeout,
+ job->request_payload.sg_cnt, job->request_payload.payload_len);
+
+ job->reply_len = 0;
+
+ switch (bsg_req->msgcode) {
+ case SPRAID_BSG_ADM:
+ ret = spraid_user_admin_cmd(hdev, job);
+ break;
+ case SPRAID_BSG_IOQ:
+ ret = spraid_user_ioq_cmd(hdev, job);
+ break;
+ default:
+		dev_info(hdev->dev, "[%s] unsupported msgcode[%d]\n", __func__, bsg_req->msgcode);
+ break;
+ }
+
+ bsg_job_done(job, ret, 0);
+ return 0;
+}
+
+static inline void spraid_remove_bsg(struct spraid_dev *hdev)
+{
+ struct request_queue *q = hdev->bsg_queue;
+
+ if (q)
+ bsg_remove_queue(q);
+}
+
+static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct spraid_dev *hdev;
+ struct Scsi_Host *shost;
+ int node, ret;
+ char bsg_name[15];
+
+ shost = scsi_host_alloc(&spraid_driver_template, sizeof(*hdev));
+ if (!shost) {
+ dev_err(&pdev->dev, "Failed to allocate scsi host\n");
+ return -ENOMEM;
+ }
+ hdev = shost_priv(shost);
+ hdev->pdev = pdev;
+ hdev->dev = get_device(&pdev->dev);
+
+ node = dev_to_node(hdev->dev);
+ if (node == NUMA_NO_NODE) {
+ node = first_memory_node;
+ set_dev_node(hdev->dev, node);
+ }
+ hdev->numa_node = node;
+ hdev->shost = shost;
+ pci_set_drvdata(pdev, hdev);
+
+ ret = spraid_dev_map(hdev);
+ if (ret)
+ goto put_dev;
+
+ init_rwsem(&hdev->devices_rwsem);
+ INIT_WORK(&hdev->scan_work, spraid_scan_work);
+ INIT_WORK(&hdev->timesyn_work, spraid_timesyn_work);
+ INIT_WORK(&hdev->reset_work, spraid_reset_work);
+ spin_lock_init(&hdev->state_lock);
+
+ ret = spraid_alloc_resources(hdev);
+ if (ret)
+ goto dev_unmap;
+
+ ret = spraid_pci_enable(hdev);
+ if (ret)
+ goto resources_free;
+
+ ret = spraid_setup_admin_queue(hdev);
+ if (ret)
+ goto pci_disable;
+
+ ret = spraid_init_ctrl_info(hdev);
+ if (ret)
+ goto disable_admin_q;
+
+ ret = spraid_alloc_iod_ext_mem_pool(hdev);
+ if (ret)
+ goto disable_admin_q;
+
+ ret = spraid_setup_io_queues(hdev);
+ if (ret)
+ goto free_iod_mempool;
+
+ spraid_shost_init(hdev);
+
+ ret = scsi_add_host(hdev->shost, hdev->dev);
+ if (ret) {
+ dev_err(hdev->dev, "Add shost to system failed, ret: %d\n",
+ ret);
+ goto remove_io_queues;
+ }
+
+ snprintf(bsg_name, sizeof(bsg_name), "spraid%d", shost->host_no);
+ hdev->bsg_queue = bsg_setup_queue(&shost->shost_gendev, bsg_name,
+ spraid_bsg_host_dispatch, NULL,
+ spraid_cmd_size(hdev, true, false));
+ if (IS_ERR(hdev->bsg_queue)) {
+ dev_err(hdev->dev, "err, setup bsg failed\n");
+ hdev->bsg_queue = NULL;
+ goto remove_io_queues;
+ }
+
+ if (hdev->online_queues == SPRAID_ADMIN_QUEUE_NUM) {
+		dev_warn(hdev->dev, "warn: only the admin queue can be used\n");
+ return 0;
+ }
+
+ hdev->state = SPRAID_LIVE;
+
+ spraid_send_all_aen(hdev);
+
+ ret = spraid_dev_list_init(hdev);
+ if (ret)
+ goto remove_bsg;
+
+ ret = spraid_configure_timestamp(hdev);
+ if (ret)
+ dev_warn(hdev->dev, "init set timestamp failed\n");
+
+ ret = spraid_alloc_ioq_ptcmds(hdev);
+ if (ret)
+ goto remove_bsg;
+
+ scsi_scan_host(hdev->shost);
+
+ return 0;
+
+remove_bsg:
+ spraid_remove_bsg(hdev);
+remove_io_queues:
+ spraid_remove_io_queues(hdev);
+free_iod_mempool:
+ spraid_free_iod_ext_mem_pool(hdev);
+disable_admin_q:
+ spraid_disable_admin_queue(hdev, false);
+pci_disable:
+ spraid_pci_disable(hdev);
+resources_free:
+ spraid_free_resources(hdev);
+dev_unmap:
+ spraid_dev_unmap(hdev);
+put_dev:
+ put_device(hdev->dev);
+ scsi_host_put(shost);
+
+ return -ENODEV;
+}
+
+static void spraid_remove(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+ struct Scsi_Host *shost = hdev->shost;
+
+ dev_info(hdev->dev, "enter spraid remove\n");
+
+ spraid_change_host_state(hdev, SPRAID_DELETING);
+
+ if (!pci_device_is_present(pdev)) {
+ scsi_block_requests(shost);
+ spraid_back_all_io(hdev);
+ scsi_unblock_requests(shost);
+ }
+
+ flush_work(&hdev->reset_work);
+ spraid_remove_bsg(hdev);
+ scsi_remove_host(shost);
+ spraid_free_ioq_ptcmds(hdev);
+ kfree(hdev->devices);
+ spraid_remove_io_queues(hdev);
+ spraid_free_iod_ext_mem_pool(hdev);
+ spraid_disable_admin_queue(hdev, false);
+ spraid_pci_disable(hdev);
+ spraid_free_resources(hdev);
+ spraid_dev_unmap(hdev);
+ put_device(hdev->dev);
+ scsi_host_put(shost);
+
+ dev_info(hdev->dev, "exit spraid remove\n");
+}
+
+static const struct pci_device_id spraid_id_table[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC, SPRAID_SERVER_DEVICE_HBA_DID) },
+ { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC, SPRAID_SERVER_DEVICE_RAID_DID) },
+ { 0, }
+};
+MODULE_DEVICE_TABLE(pci, spraid_id_table);
+
+static struct pci_driver spraid_driver = {
+ .name = "spraid",
+ .id_table = spraid_id_table,
+ .probe = spraid_probe,
+ .remove = spraid_remove,
+ .shutdown = spraid_shutdown,
+};
+
+static int __init spraid_init(void)
+{
+ int ret;
+
+ spraid_wq = alloc_workqueue("spraid-wq", WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
+ if (!spraid_wq)
+ return -ENOMEM;
+
+ spraid_class = class_create(THIS_MODULE, "spraid");
+ if (IS_ERR(spraid_class)) {
+ ret = PTR_ERR(spraid_class);
+ goto destroy_wq;
+ }
+
+ ret = pci_register_driver(&spraid_driver);
+ if (ret < 0)
+ goto destroy_class;
+
+ return 0;
+
+destroy_class:
+ class_destroy(spraid_class);
+destroy_wq:
+ destroy_workqueue(spraid_wq);
+
+ return ret;
+}
+
+static void __exit spraid_exit(void)
+{
+ pci_unregister_driver(&spraid_driver);
+ class_destroy(spraid_class);
+ destroy_workqueue(spraid_wq);
+ ida_destroy(&spraid_instance_ida);
+}
+
+MODULE_AUTHOR("songyl(a)ramaxel.com");
+MODULE_DESCRIPTION("Ramaxel Memory Technology SPraid Driver");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(SPRAID_DRV_VERSION);
+module_init(spraid_init);
+module_exit(spraid_exit);
--
2.27.0

31 Dec '21
KABI: Add KABI paddings for base structures
Alexey Gladkov (1):
Increase size of ucounts to atomic_long_t
Cui GaoSheng (1):
kabi: reserve space for net_namespace
GONG, Ruiqi (1):
kabi: reserve space for cred and user_namespace
Guan Jing (1):
KABI:reserve space for sched structures
Guo Zihua (1):
KABI: reserve space for IMA IPE
Gustavo A. R. Silva (1):
UAPI: nfsfh.h: Replace one-element array with flexible-array member
Jialin Zhang (2):
kabi: reserve space for posix clock related structure
kabi: reserve space for power management related structure
Lin Ruizhe (6):
bootparam: Add kabi_reserve in bootparam
interrupt: Add kabi_reserve in interrupt.h
irq: Add kabi_reserve in irq
irq_desc: Add kabi_reserve in irq_desc
irqdomain: Add kabi_reserve in irqdomain
msi: Add kabi_reserve in msi.h
Lu Jialin (4):
kabi: reserve space for cgroup framework related structures
kabi: reserve space for memcg related structures
kabi: reserve space for cpu cgroup and cpuset cgroup related
structures
kabi: reserve space for cgroup bpf structures
Tan Xiaojun (1):
kabi: reserve space for pci subsystem related structure
Wang Hai (6):
kabi: net: reserve space for net base subsystem related structure
kabi: net: reserve space for net can subsystem related structure
kabi: net: reserve space for net sunrpc subsystem related structure
kabi: net: reserve space for net rdma subsystem related structure
kabi: net: reserve space for net bpf subsystem related structure
kabi: net: reserve space for net netfilter subsystem related structure
Wang ShaoBo (8):
kabi: reserve space for io subsystem related structures
kabi: reserve space for kobject related structures
kabi: reserve space for struct module
kabi: reserve space for struct ptp_clock
kabi: reserve space for struct ptp_clock_info
kabi: reserve space for ptp_clock.h
kabi: reserve space for iommu.h
kabi: reserve space for fwnode.h
Xie XiuQi (6):
kabi: add kabi helper macros
kabi: add KABI_SIZE_ALIGN_CHECKS for more stringent kabi checks
kabi: enables more stringent kabi checks
kabi: add script tools to check kabi symbol
kabi: add a tool to generate the kabi reference relationship
kabi: add kABI reference checking tool
Yang Jihong (1):
kabi: reserve space for perf subsystem related structures
Yang Yingliang (2):
kabi: reserve space for struct cpu_stop_work
kabi: reserve space for struct dma_map_ops
Yongqiang Liu (1):
kabi: mm: reserve space for memory subsystem related
Yu Liao (3):
kabi: reserve space for struct worker
kabi: reserve space for time and workqueue subsystem related structure
kabi: reserve space for hrtimer related structures
Zheng Zengkai (2):
KABI: add KABI padding to cpuidle structures
KABI: add KABI padding to x86/paravirt ops structures
Zhihao Cheng (1):
kabi: Add kabi reservation for storage module
Kconfig | 7 +
arch/arm64/configs/openeuler_defconfig | 1 +
arch/x86/configs/openeuler_defconfig | 1 +
arch/x86/include/asm/paravirt_types.h | 9 +
arch/x86/include/uapi/asm/bootparam.h | 2 +
block/blk-mq-tag.h | 7 +
drivers/nvme/host/nvme.h | 6 +
drivers/pci/pci.h | 9 +
drivers/ptp/ptp_private.h | 4 +
include/linux/backing-dev-defs.h | 11 +
include/linux/bio.h | 10 +
include/linux/blk-cgroup.h | 11 +
include/linux/blk-mq.h | 30 ++
include/linux/blk_types.h | 12 +
include/linux/blkdev.h | 16 +
include/linux/bpf-cgroup.h | 18 +
include/linux/can/core.h | 4 +
include/linux/can/dev.h | 6 +
include/linux/can/rx-offload.h | 4 +
include/linux/can/skb.h | 1 +
include/linux/cgroup-defs.h | 29 ++
include/linux/cpuidle.h | 15 +
include/linux/cred.h | 12 +
include/linux/dcache.h | 9 +
include/linux/delayacct.h | 4 +
include/linux/device.h | 19 +
include/linux/device/class.h | 7 +
include/linux/device/driver.h | 6 +
include/linux/dma-map-ops.h | 7 +
include/linux/elevator.h | 15 +
include/linux/ethtool.h | 6 +
include/linux/exportfs.h | 4 +
include/linux/fs.h | 48 +++
include/linux/fsnotify_backend.h | 3 +
include/linux/fwnode.h | 7 +
include/linux/genhd.h | 14 +
include/linux/hrtimer.h | 10 +
include/linux/interrupt.h | 3 +
include/linux/iomap.h | 6 +
include/linux/iommu.h | 12 +
include/linux/ioport.h | 6 +
include/linux/ipv6.h | 6 +
include/linux/irq.h | 6 +
include/linux/irqdesc.h | 3 +-
include/linux/irqdomain.h | 2 +
include/linux/jbd2.h | 6 +
include/linux/kabi.h | 494 ++++++++++++++++++++++++-
include/linux/kernfs.h | 9 +
include/linux/kobject.h | 13 +
include/linux/lsm_hooks.h | 2 +
include/linux/memcontrol.h | 20 +
include/linux/mm.h | 6 +
include/linux/mm_types.h | 15 +
include/linux/mmu_notifier.h | 9 +
include/linux/mmzone.h | 9 +
include/linux/module.h | 4 +
include/linux/mount.h | 3 +
include/linux/msi.h | 3 +
include/linux/net.h | 6 +
include/linux/netdevice.h | 47 +++
include/linux/netfilter.h | 9 +
include/linux/netfilter/ipset/ip_set.h | 7 +
include/linux/netfilter/nfnetlink.h | 5 +
include/linux/netfilter_ipv6.h | 3 +
include/linux/ns_common.h | 4 +
include/linux/pci.h | 34 ++
include/linux/pci_hotplug.h | 18 +
include/linux/perf_event.h | 8 +
include/linux/pm.h | 14 +
include/linux/pm_domain.h | 9 +
include/linux/pm_qos.h | 7 +
include/linux/pm_wakeup.h | 4 +
include/linux/posix-clock.h | 17 +
include/linux/ptp_clock_kernel.h | 4 +
include/linux/quota.h | 7 +
include/linux/sbitmap.h | 3 +
include/linux/sched.h | 30 ++
include/linux/sched/signal.h | 6 +
include/linux/sched/topology.h | 4 +
include/linux/sched/user.h | 4 +
include/linux/skbuff.h | 6 +
include/linux/skmsg.h | 11 +
include/linux/stop_machine.h | 2 +
include/linux/sunrpc/cache.h | 4 +
include/linux/sunrpc/clnt.h | 10 +
include/linux/sunrpc/sched.h | 14 +
include/linux/sunrpc/stats.h | 5 +
include/linux/sunrpc/xprt.h | 17 +
include/linux/swap.h | 4 +
include/linux/sysfs.h | 3 +
include/linux/timer.h | 6 +
include/linux/user_namespace.h | 36 +-
include/linux/workqueue.h | 11 +
include/linux/writeback.h | 4 +
include/linux/xattr.h | 3 +
include/net/dcbnl.h | 9 +
include/net/dst.h | 10 +
include/net/dst_ops.h | 10 +
include/net/fib_rules.h | 10 +
include/net/flow.h | 10 +
include/net/genetlink.h | 10 +
include/net/inet_connection_sock.h | 5 +
include/net/ip6_fib.h | 9 +
include/net/l3mdev.h | 6 +
include/net/lwtunnel.h | 7 +
include/net/neighbour.h | 8 +
include/net/net_namespace.h | 6 +
include/net/netfilter/nf_conntrack.h | 4 +
include/net/netlink.h | 5 +
include/net/netns/can.h | 3 +
include/net/netns/ipv4.h | 8 +
include/net/netns/ipv6.h | 3 +
include/net/netns/netfilter.h | 3 +
include/net/netns/nftables.h | 3 +
include/net/netns/xfrm.h | 3 +
include/net/page_pool.h | 3 +
include/net/rtnetlink.h | 10 +
include/net/sch_generic.h | 7 +
include/net/sock.h | 19 +
include/net/switchdev.h | 6 +
include/net/tcp.h | 6 +
include/net/tls.h | 5 +
include/net/xdp.h | 6 +
include/net/xfrm.h | 6 +
include/rdma/ib_addr.h | 3 +
include/rdma/rdma_cm.h | 8 +
include/rdma/rdma_counter.h | 3 +
include/rdma/rdma_netlink.h | 4 +
include/rdma/rdma_vt.h | 6 +
include/rdma/rdmavt_qp.h | 5 +
include/scsi/scsi_cmnd.h | 6 +
include/scsi/scsi_device.h | 15 +
include/scsi/scsi_host.h | 13 +
include/scsi/scsi_transport_fc.h | 25 ++
include/target/target_core_base.h | 7 +
include/uapi/linux/nfsd/nfsfh.h | 27 +-
include/uapi/linux/ptp_clock.h | 4 +
kernel/cgroup/cpuset.c | 6 +
kernel/sched/sched.h | 23 ++
kernel/ucount.c | 32 +-
kernel/workqueue_internal.h | 6 +
scripts/check-kabi | 147 ++++++++
scripts/kabideps | 161 ++++++++
scripts/kabisyms | 141 +++++++
144 files changed, 2220 insertions(+), 29 deletions(-)
create mode 100755 scripts/check-kabi
create mode 100755 scripts/kabideps
create mode 100755 scripts/kabisyms
--
2.20.1
[PATCH openEuler-5.10 01/71] perf mem: Search event name with more flexible path
by Zheng Zengkai 31 Dec '21
From: Leo Yan <leo.yan(a)linaro.org>
mainline inclusion
from mainline-v5.11-rc1
commit f9f16dfbe76e
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NGPV
CVE: NA
-------------------------------------------------
The perf tool searches for a memory event name under the folder
'/sys/devices/cpu/events/', which limits memory profiling to events
located under that folder. Thus it's impossible to use any other event
as a memory event if it is not under this specific folder, e.g. the
Arm SPE hardware event is not located in '/sys/devices/cpu/events/'
so it cannot be enabled for memory profiling.
This patch changes the search folder from '/sys/devices/cpu/events/'
to '/sys/devices', giving the flexibility to find events which can be
used for memory profiling.
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
Acked-by: Jiri Olsa <jolsa(a)redhat.com>
Link: https://lore.kernel.org/r/20201106094853.21082-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Signed-off-by: Wei Li <liwei391(a)huawei.com>
Reviewed-by: Yang Jihong <yangjihong1(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
tools/perf/util/mem-events.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index ea0af0bc4314..35c8d175a9d2 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -18,8 +18,8 @@ unsigned int perf_mem_events__loads_ldlat = 30;
#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
- E("ldlat-loads", "cpu/mem-loads,ldlat=%u/P", "mem-loads"),
- E("ldlat-stores", "cpu/mem-stores/P", "mem-stores"),
+ E("ldlat-loads", "cpu/mem-loads,ldlat=%u/P", "cpu/events/mem-loads"),
+ E("ldlat-stores", "cpu/mem-stores/P", "cpu/events/mem-stores"),
};
#undef E
@@ -93,7 +93,7 @@ int perf_mem_events__init(void)
struct perf_mem_event *e = &perf_mem_events[j];
struct stat st;
- scnprintf(path, PATH_MAX, "%s/devices/cpu/events/%s",
+ scnprintf(path, PATH_MAX, "%s/devices/%s",
mnt, e->sysfs_name);
if (!stat(path, &st))
--
2.20.1
[PATCH openEuler-1.0-LTS 1/4] net: hns3: fix the concurrency between functions reading debugfs
by Yang Yingliang 31 Dec '21
From: Yufeng Mo <moyufeng(a)huawei.com>
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OSRU
CVE: NA
----------------------------
[1298504.847848] Call trace:
[1298504.847859] [<ffff000008089e14>] dump_backtrace+0x0/0x23c
[1298504.847865] [<ffff00000808a074>] show_stack+0x24/0x2c
[1298504.847870] [<ffff0000088568a8>] dump_stack+0x84/0xa8
[1298504.847878] [<ffff0000082122fc>] bad_page+0xec/0x14c
[1298504.847883] [<ffff000008219384>] free_pages_check_bad+0x90/0x9c
[1298504.847888] [<ffff00000821307c>] __free_pages_ok+0x2b8/0x2ec
[1298504.847894] [<ffff0000082153ec>] __free_pages+0x44/0x64
[1298504.847900] [<ffff000008288788>] kfree+0x198/0x1a0
[1298504.847905] [<ffff00000823432c>] kvfree+0x3c/0x58
[1298504.847937] [<ffff0000014fabf4>] hns3_dbg_read+0xf4/0x278 [hns3]
[1298504.847944] [<ffff000008359550>] full_proxy_read+0x60/0x90
[1298504.847949] [<ffff0000082b22a4>] __vfs_read+0x58/0x178
[1298504.847952] [<ffff0000082b2454>] vfs_read+0x90/0x14c
[1298504.847956] [<ffff0000082b2b70>] SyS_read+0x60/0xc0
When different functions read the same debugfs node, a double-free
can occur because the functions share the same node buffer.
This patch gives each function its own buffer to fix the problem.
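As a rough userspace analogy of the fix (a sketch with invented names, not the driver code): move the buffer table from storage shared by all users into each handle, so a reader frees only the buffers it allocated itself.
#include <stdlib.h>
#define NUM_CMDS 4
/* Per-handle buffer table, mirroring the dbgfs_buf array the patch adds
 * to struct hnae3_handle: two handles reading the same command index now
 * use distinct pointers, so they can no longer double-free a shared one.
 */
struct handle {
        char *bufs[NUM_CMDS];
};
static char *read_cmd(struct handle *h, int idx)
{
        if (!h->bufs[idx])
                h->bufs[idx] = calloc(1, 4096); /* filled on first read */
        return h->bufs[idx];
}
static void handle_uninit(struct handle *h)
{
        for (int i = 0; i < NUM_CMDS; i++) {
                free(h->bufs[i]);
                h->bufs[i] = NULL;
        }
}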
Fixes: 319ba0a4d154 ("net: hns3: fix race condition in debugfs")
Fixes: c91910efc03a ("net: hns3: refactor the debugfs process")
Signed-off-by: Yufeng Mo <moyufeng(a)huawei.com>
Signed-off-by: Yonglong Liu <liuyonglong(a)huawei.com>
Reviewed-by: Jian Shen <shenjian15(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hnae3.h | 1 +
.../net/ethernet/hisilicon/hns3/hns3_debugfs.c | 15 +++++++++++----
.../net/ethernet/hisilicon/hns3/hns3_debugfs.h | 1 -
3 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 0d849c13f22f5..61b92430e56db 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -765,6 +765,7 @@ struct hnae3_handle {
u8 netdev_flags;
struct dentry *hnae3_dbgfs;
struct mutex dbgfs_lock;
+ char **dbgfs_buf;
/* Network interface message level enabled bits */
u32 msg_enable;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
index c68e5f3d0ba52..8dbbf597d8a2c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
@@ -808,7 +808,7 @@ static ssize_t hns3_dbg_read(struct file *filp, char __user *buffer,
return ret;
mutex_lock(&handle->dbgfs_lock);
- save_buf = &hns3_dbg_cmd[index].buf;
+ save_buf = &handle->dbgfs_buf[index];
if (!test_bit(HNS3_NIC_STATE_INITED, &priv->state) ||
test_bit(HNS3_NIC_STATE_RESETTING, &priv->state)) {
@@ -911,6 +911,13 @@ int hns3_dbg_init(struct hnae3_handle *handle)
int ret;
u32 i;
+ handle->dbgfs_buf = devm_kcalloc(&handle->pdev->dev,
+ ARRAY_SIZE(hns3_dbg_cmd),
+ sizeof(*handle->dbgfs_buf),
+ GFP_KERNEL);
+ if (!handle->dbgfs_buf)
+ return -ENOMEM;
+
hns3_dbg_dentry[HNS3_DBG_DENTRY_COMMON].dentry =
debugfs_create_dir(name, hns3_dbgfs_root);
handle->hnae3_dbgfs = hns3_dbg_dentry[HNS3_DBG_DENTRY_COMMON].dentry;
@@ -952,9 +959,9 @@ void hns3_dbg_uninit(struct hnae3_handle *handle)
u32 i;
for (i = 0; i < ARRAY_SIZE(hns3_dbg_cmd); i++)
- if (hns3_dbg_cmd[i].buf) {
- kvfree(hns3_dbg_cmd[i].buf);
- hns3_dbg_cmd[i].buf = NULL;
+ if (handle->dbgfs_buf[i]) {
+ kvfree(handle->dbgfs_buf[i]);
+ handle->dbgfs_buf[i] = NULL;
}
mutex_destroy(&handle->dbgfs_lock);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.h b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.h
index ea65e5174ea98..902e16d99fb7c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.h
@@ -49,7 +49,6 @@ struct hns3_dbg_cmd_info {
enum hnae3_dbg_cmd cmd;
enum hns3_dbg_dentry_type dentry;
u32 buf_len;
- char *buf;
int (*init)(struct hnae3_handle *handle, unsigned int cmd);
};
--
2.25.1
[PATCH openEuler-1.0-LTS v2] f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr()
by Yang Yingliang 31 Dec '21
From: Chao Yu <chao(a)kernel.org>
mainline inclusion
from mainline-v5.17
commit 5598b24efaf4892741c798b425d543e4bed357a1
category: bugfix
CVE: CVE-2021-45469
--------------------------------
As Wenqing Liu reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=215235
- Overview
page fault in f2fs_setxattr() when mount and operate on corrupted image
- Reproduce
tested on kernel 5.16-rc3, 5.15.X under root
1. unzip tmp7.zip
2. ./single.sh f2fs 7
Sometimes need to run the script several times
- Kernel dump
loop0: detected capacity change from 0 to 131072
F2FS-fs (loop0): Found nat_bits in checkpoint
F2FS-fs (loop0): Mounted with checkpoint version = 7548c2ee
BUG: unable to handle page fault for address: ffffe47bc7123f48
RIP: 0010:kfree+0x66/0x320
Call Trace:
__f2fs_setxattr+0x2aa/0xc00 [f2fs]
f2fs_setxattr+0xfa/0x480 [f2fs]
__f2fs_set_acl+0x19b/0x330 [f2fs]
__vfs_removexattr+0x52/0x70
__vfs_removexattr_locked+0xb1/0x140
vfs_removexattr+0x56/0x100
removexattr+0x57/0x80
path_removexattr+0xa3/0xc0
__x64_sys_removexattr+0x17/0x20
do_syscall_64+0x37/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
The root cause is that __f2fs_setxattr() missed a sanity check on the
last xattr entry, resulting in an out-of-bounds memory access while
updating inconsistent xattr data of the target inode.
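The shape of the fix is a bounds check while walking variable-length on-disk entries. A simplified sketch of the idea (the struct and macros are illustrative, not the real f2fs layout):
#include <stdint.h>
struct entry {
        uint8_t  name_len;              /* 0 marks the last entry */
        uint16_t value_size;
        char     name[];
};
#define ENTRY_SIZE(e) \
        (sizeof(struct entry) + (e)->name_len + (e)->value_size)
#define NEXT_ENTRY(e) \
        ((struct entry *)((char *)(e) + ENTRY_SIZE(e)))
/* Before dereferencing an entry or stepping past it, verify it fits in
 * the buffer; corrupted sizes then produce an error instead of an
 * out-of-bounds access.
 */
static int walk(struct entry *e, const void *end)
{
        for (;;) {
                if ((const char *)e + sizeof(*e) > (const char *)end)
                        return -1;      /* entry header overflows buffer */
                if (e->name_len == 0)
                        return 0;       /* reached the last entry safely */
                if ((const char *)NEXT_ENTRY(e) > (const char *)end)
                        return -1;      /* corrupted entry size */
                e = NEXT_ENTRY(e);
        }
}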
After the fix, it can detect such xattr inconsistency as below:
F2FS-fs (loop11): inode (7) has invalid last xattr entry, entry_size: 60676
F2FS-fs (loop11): inode (8) has corrupted xattr
F2FS-fs (loop11): inode (8) has corrupted xattr
F2FS-fs (loop11): inode (8) has invalid last xattr entry, entry_size: 47736
Cc: stable(a)vger.kernel.org
Reported-by: Wenqing Liu <wenqingliu0120(a)gmail.com>
Signed-off-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Conflicts:
fs/f2fs/xattr.c
[yyl: replace f2fs_err() with f2fs_msg(KERN_ERR)]
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Reviewed-by: fang wei <fangwei1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/f2fs/xattr.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
index 90d0d305f6a36..1858c66d871d1 100644
--- a/fs/f2fs/xattr.c
+++ b/fs/f2fs/xattr.c
@@ -657,8 +657,17 @@ static int __f2fs_setxattr(struct inode *inode, int index,
}
last = here;
- while (!IS_XATTR_LAST_ENTRY(last))
+ while (!IS_XATTR_LAST_ENTRY(last)) {
+ if ((void *)(last) + sizeof(__u32) > last_base_addr ||
+ (void *)XATTR_NEXT_ENTRY(last) > last_base_addr) {
+ f2fs_msg(F2FS_I_SB(inode)->sb, KERN_ERR, "inode (%lu) has invalid last xattr entry, entry_size: %zu",
+ inode->i_ino, ENTRY_SIZE(last));
+ set_sbi_flag(F2FS_I_SB(inode), SBI_NEED_FSCK);
+ error = -EFSCORRUPTED;
+ goto exit;
+ }
last = XATTR_NEXT_ENTRY(last);
+ }
newsize = XATTR_ALIGN(sizeof(struct f2fs_xattr_entry) + len + size);
--
2.25.1
patch #1 ~ #2 prepare for backporting the fix patch
patch #3 fixes CVE-2021-44733
Jens Wiklander (3):
tee: remove linked list of struct tee_shm
tee: don't assign shm id for private shms
tee: handle lookup of shm with reference count 0
drivers/tee/tee_core.c | 1 -
drivers/tee/tee_private.h | 3 +-
drivers/tee/tee_shm.c | 196 ++++++++++++++------------------------
include/linux/tee_drv.h | 7 +-
4 files changed, 77 insertions(+), 130 deletions(-)
--
2.25.1
[PATCH] f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr()
by Yang Yingliang 30 Dec '21
From: Chao Yu <chao(a)kernel.org>
mainline inclusion
from mainline-v5.17
commit 5598b24efaf4892741c798b425d543e4bed357a1
category: bugfix
CVE: CVE-2021-45469
--------------------------------
As Wenqing Liu reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=215235
- Overview
page fault in f2fs_setxattr() when mount and operate on corrupted image
- Reproduce
tested on kernel 5.16-rc3, 5.15.X under root
1. unzip tmp7.zip
2. ./single.sh f2fs 7
Sometimes need to run the script several times
- Kernel dump
loop0: detected capacity change from 0 to 131072
F2FS-fs (loop0): Found nat_bits in checkpoint
F2FS-fs (loop0): Mounted with checkpoint version = 7548c2ee
BUG: unable to handle page fault for address: ffffe47bc7123f48
RIP: 0010:kfree+0x66/0x320
Call Trace:
__f2fs_setxattr+0x2aa/0xc00 [f2fs]
f2fs_setxattr+0xfa/0x480 [f2fs]
__f2fs_set_acl+0x19b/0x330 [f2fs]
__vfs_removexattr+0x52/0x70
__vfs_removexattr_locked+0xb1/0x140
vfs_removexattr+0x56/0x100
removexattr+0x57/0x80
path_removexattr+0xa3/0xc0
__x64_sys_removexattr+0x17/0x20
do_syscall_64+0x37/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
The root cause is that __f2fs_setxattr() missed a sanity check on the
last xattr entry, resulting in an out-of-bounds memory access while
updating inconsistent xattr data of the target inode.
After the fix, it can detect such xattr inconsistency as below:
F2FS-fs (loop11): inode (7) has invalid last xattr entry, entry_size: 60676
F2FS-fs (loop11): inode (8) has corrupted xattr
F2FS-fs (loop11): inode (8) has corrupted xattr
F2FS-fs (loop11): inode (8) has invalid last xattr entry, entry_size: 47736
Cc: stable(a)vger.kernel.org
Reported-by: Wenqing Liu <wenqingliu0120(a)gmail.com>
Signed-off-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/f2fs/xattr.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
index 90d0d305f6a36..b55d0920e777c 100644
--- a/fs/f2fs/xattr.c
+++ b/fs/f2fs/xattr.c
@@ -657,8 +657,17 @@ static int __f2fs_setxattr(struct inode *inode, int index,
}
last = here;
- while (!IS_XATTR_LAST_ENTRY(last))
+ while (!IS_XATTR_LAST_ENTRY(last)) {
+ if ((void *)(last) + sizeof(__u32) > last_base_addr ||
+ (void *)XATTR_NEXT_ENTRY(last) > last_base_addr) {
+ f2fs_err(F2FS_I_SB(inode), "inode (%lu) has invalid last xattr entry, entry_size: %zu",
+ inode->i_ino, ENTRY_SIZE(last));
+ set_sbi_flag(F2FS_I_SB(inode), SBI_NEED_FSCK);
+ error = -EFSCORRUPTED;
+ goto exit;
+ }
last = XATTR_NEXT_ENTRY(last);
+ }
newsize = XATTR_ALIGN(sizeof(struct f2fs_xattr_entry) + len + size);
--
2.25.1
From: Zekun Shen <bruceshenzk(a)gmail.com>
mainline inclusion
from mainline-v5.17
commit 04d80663f67ccef893061b49ec8a42ff7045ae84
category: bugfix
CVE: CVE-2021-43976
--------------------------------
Currently, with an unknown recv_type, mwifiex_usb_recv
just returns -1 without restoring the skb. The next time
mwifiex_usb_rx_complete is invoked with the same skb,
calling skb_put causes skb_over_panic.
The bug is triggerable with a compromised/malfunctioning
usb device. After applying the patch, skb_over_panic
no longer shows up with the same input.
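The fix follows a general rule for parsers that consume a header before dispatching on it: on an unrecognized type, undo the consumption before returning the error, so the buffer is left exactly as it was received. A standalone sketch (the buf API is a stand-in for skb_pull/skb_push, not the real skb interface):
#include <stdint.h>
#include <string.h>
struct buf {
        uint8_t *data;
        unsigned int len;
};
static void buf_pull(struct buf *b, unsigned int n) { b->data += n; b->len -= n; }
static void buf_push(struct buf *b, unsigned int n) { b->data -= n; b->len += n; }
static int recv_one(struct buf *b)
{
        uint32_t type;
        int ret = 0;
        memcpy(&type, b->data, sizeof(type));
        buf_pull(b, sizeof(type));      /* consume the header */
        switch (type) {
        case 1:                         /* known type: handle payload */
                break;
        default:                        /* unknown type: restore header */
                ret = -1;
                goto exit_restore;
        }
        return 0;
exit_restore:
        buf_push(b, sizeof(type));      /* buffer is safely reusable */
        return ret;
}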
Attached is the panic report from fuzzing.
skbuff: skb_over_panic: text:000000003bf1b5fa
len:2048 put:4 head:00000000dd6a115b data:000000000a9445d8
tail:0x844 end:0x840 dev:<NULL>
kernel BUG at net/core/skbuff.c:109!
invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 0 PID: 198 Comm: in:imklog Not tainted 5.6.0 #60
RIP: 0010:skb_panic+0x15f/0x161
Call Trace:
<IRQ>
? mwifiex_usb_rx_complete+0x26b/0xfcd [mwifiex_usb]
skb_put.cold+0x24/0x24
mwifiex_usb_rx_complete+0x26b/0xfcd [mwifiex_usb]
__usb_hcd_giveback_urb+0x1e4/0x380
usb_giveback_urb_bh+0x241/0x4f0
? __hrtimer_run_queues+0x316/0x740
? __usb_hcd_giveback_urb+0x380/0x380
tasklet_action_common.isra.0+0x135/0x330
__do_softirq+0x18c/0x634
irq_exit+0x114/0x140
smp_apic_timer_interrupt+0xde/0x380
apic_timer_interrupt+0xf/0x20
</IRQ>
Reported-by: Brendan Dolan-Gavitt <brendandg(a)nyu.edu>
Signed-off-by: Zekun Shen <bruceshenzk(a)gmail.com>
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
Link: https://lore.kernel.org/r/YX4CqjfRcTa6bVL+@Zekuns-MBP-16.fios-router.home
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/net/wireless/marvell/mwifiex/usb.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/marvell/mwifiex/usb.c b/drivers/net/wireless/marvell/mwifiex/usb.c
index d445acc4786b7..22a42eb0cc0ac 100644
--- a/drivers/net/wireless/marvell/mwifiex/usb.c
+++ b/drivers/net/wireless/marvell/mwifiex/usb.c
@@ -130,7 +130,8 @@ static int mwifiex_usb_recv(struct mwifiex_adapter *adapter,
default:
mwifiex_dbg(adapter, ERROR,
"unknown recv_type %#x\n", recv_type);
- return -1;
+ ret = -1;
+ goto exit_restore_skb;
}
break;
case MWIFIEX_USB_EP_DATA:
--
2.25.1
30 Dec '21
From: xulei <stone.xulei(a)huawei.com>
virt inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
-------------------
Fix the problem of losing SMIs: if the vCPU is already in SMM when a
new SMI is injected, clear the stale SMM state so the new SMI request
is delivered rather than dropped.
Signed-off-by: xulei <stone.xulei(a)huawei.com>
Signed-off-by: Jingyi Wang <wangjingyi11(a)huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui(a)huawei.com>
Reviewed-by: Wei Li <liwei391(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
arch/x86/kvm/x86.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 12db47c8bd3f..e33414f36dba 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4227,6 +4227,11 @@ static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu)
static int kvm_vcpu_ioctl_smi(struct kvm_vcpu *vcpu)
{
+ if (is_smm(vcpu)) {
+ vcpu->arch.hflags &= ~HF_SMM_MASK;
+ vcpu->arch.smi_pending = 0;
+ }
+
kvm_make_request(KVM_REQ_SMI, vcpu);
return 0;
--
2.20.1
[PATCH openEuler-1.0-LTS] mm/page_alloc: Use cmdline to disable "place pages to tail"
by Yang Yingliang 30 Dec '21
From: Peng Liu <liupeng256(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4EG0R
CVE: NA
-----------------------------------------------
Add a cmdline option to disable "place pages to tail" when onlining memory.
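For example (usage sketch based on the documentation added below), booting with "fpi_to_tail=off" on the kernel command line makes __free_pages_core() free pages through the normal refcounted path instead of passing FPI_TO_TAIL; omitting the option keeps the default place-to-tail behaviour.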
Signed-off-by: Peng Liu <liupeng256(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
.../admin-guide/kernel-parameters.txt | 9 +++++++
mm/page_alloc.c | 27 ++++++++++++++-----
2 files changed, 30 insertions(+), 6 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 425015359b5ce..420f7317aa6af 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1254,6 +1254,15 @@
Warning: use of this parameter will taint the kernel
and may cause unknown problems.
+ fpi_to_tail=off
+ [KNL] Place pages to front in __free_pages_core().
+ This kernel start-up parameter can only be used as
+ "fpi_to_tail=off", which means memory is onlined to
+ the front of the freelist. Dynamic enabling is not
+ supported; in other words, "fpi_to_tail=on" will be
+ ignored by the kernel. This is a workaround for some
+ latent bugs revealed by "place pages to tail".
+
ftrace=[tracer]
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 935435e03cdd4..4cad86f1e3a91 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -102,6 +102,17 @@ typedef int __bitwise fpi_t;
*/
#define FPI_TO_TAIL ((__force fpi_t)BIT(1))
+static bool fpi_to_tail = true;
+
+static int __init early_fpi_to_tail(char *str)
+{
+ if (str && !strcmp(str, "off"))
+ fpi_to_tail = false;
+
+ return 0;
+}
+early_param("fpi_to_tail", early_fpi_to_tail);
+
/* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
static DEFINE_MUTEX(pcp_batch_high_lock);
#define MIN_PERCPU_PAGELIST_FRACTION (8)
@@ -1342,12 +1353,16 @@ void __free_pages_core(struct page *page, unsigned int order)
}
__ClearPageReserved(p);
set_page_count(p, 0);
-
- /*
- * Bypass PCP and place fresh pages right to the tail, primarily
- * relevant for memory onlining.
- */
- __free_pages_ok(page, order, FPI_TO_TAIL);
+ if (fpi_to_tail) {
+ /*
+ * Bypass PCP and place fresh pages right to the tail, primarily
+ * relevant for memory onlining.
+ */
+ __free_pages_ok(page, order, FPI_TO_TAIL);
+ } else {
+ set_page_refcounted(page);
+ __free_pages(page, order);
+ }
}
#if defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) || \
--
2.25.1
Add three modifications:
1. RSS initialization process optimization.
2. Attribute negotiation optimization.
3. Reset command flags modification.
Yanling Song (3):
net/spnic:RSS initialization process optimization
net/spnic:Attribute negotiation and optimization.
net/spnic:The reset command flags modification.
.../net/ethernet/ramaxel/spnic/hw/sphw_hw.h | 28 ++++-----
.../ethernet/ramaxel/spnic/hw/sphw_hwdev.c | 9 ++-
.../ethernet/ramaxel/spnic/hw/sphw_hwdev.h | 2 +
.../net/ethernet/ramaxel/spnic/spnic_main.c | 22 +++----
.../ramaxel/spnic/spnic_mgmt_interface.h | 12 ----
.../ethernet/ramaxel/spnic/spnic_nic_cfg.c | 57 ++++++++++++-------
.../ethernet/ramaxel/spnic/spnic_nic_cfg.h | 19 +------
.../ethernet/ramaxel/spnic/spnic_nic_dev.h | 2 +
.../net/ethernet/ramaxel/spnic/spnic_rss.c | 16 +-----
.../ethernet/ramaxel/spnic/spnic_rss_cfg.c | 57 -------------------
10 files changed, 70 insertions(+), 154 deletions(-)
--
2.23.0
[PATCH openEuler-1.0-LTS] scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback
by Yang Yingliang 30 Dec '21
From: Can Guo <cang(a)codeaurora.org>
mainline inclusion
from mainline-v5.11-rc4
commit 35fc4cd34426c242ab015ef280853b7bff101f48
category: bugfix
bugzilla: NA
CVE: CVE-2021-39657
-----------------------------------------------
Users can initiate resets to specific SCSI device/target/host through
IOCTL. When this happens, the SCSI cmd passed to eh_device/target/host
_reset_handler() callbacks is initialized with a request whose tag is -1.
In this case it is not right for eh_device_reset_handler() callback to
count on the LUN get from hba->lrb[-1]. Fix it by getting LUN from the SCSI
device associated with the SCSI cmd.
Link: https://lore.kernel.org/r/1609157080-26283-1-git-send-email-cang@codeaurora…
Reviewed-by: Avri Altman <avri.altman(a)wdc.com>
Reviewed-by: Stanley Chu <stanley.chu(a)mediatek.com>
Signed-off-by: Can Guo <cang(a)codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/scsi/ufs/ufshcd.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 3601e770da162..3d9bfa5207689 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -5752,19 +5752,16 @@ static int ufshcd_eh_device_reset_handler(struct scsi_cmnd *cmd)
{
struct Scsi_Host *host;
struct ufs_hba *hba;
- unsigned int tag;
u32 pos;
int err;
- u8 resp = 0xF;
- struct ufshcd_lrb *lrbp;
+ u8 resp = 0xF, lun;
unsigned long flags;
host = cmd->device->host;
hba = shost_priv(host);
- tag = cmd->request->tag;
- lrbp = &hba->lrb[tag];
- err = ufshcd_issue_tm_cmd(hba, lrbp->lun, 0, UFS_LOGICAL_RESET, &resp);
+ lun = ufshcd_scsi_to_upiu_lun(cmd->device->lun);
+ err = ufshcd_issue_tm_cmd(hba, lun, 0, UFS_LOGICAL_RESET, &resp);
if (err || resp != UPIU_TASK_MANAGEMENT_FUNC_COMPL) {
if (!err)
err = resp;
@@ -5773,7 +5770,7 @@ static int ufshcd_eh_device_reset_handler(struct scsi_cmnd *cmd)
/* clear the commands that were pending for corresponding LUN */
for_each_set_bit(pos, &hba->outstanding_reqs, hba->nutrs) {
- if (hba->lrb[pos].lun == lrbp->lun) {
+ if (hba->lrb[pos].lun == lun) {
err = ufshcd_clear_cmd(hba, pos);
if (err)
break;
--
2.25.1
[PATCH openEuler-1.0-LTS] ext4: fix an use-after-free issue about data=journal writeback mode
by Yang Yingliang 30 Dec '21
From: Zhang Yi <yi.zhang(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 185944, https://gitee.com/openeuler/kernel/issues/I4OP92
CVE: NA
---------------------------
Our syzkaller reported a use-after-free issue when accessing a freed
buffer_head on the writeback page in __ext4_journalled_writepage(). The
problem is that if a truncate races with the data=journalled writeback
procedure, the writeback length can become zero and bget_one() refuses
to take the buffer_head's refcount; the truncate procedure then
releases the buffer once we drop the page lock, and finally the last
ext4_walk_page_buffers() triggers the use-after-free.
sync truncate
ext4_sync_file()
file_write_and_wait_range()
ext4_setattr(0)
inode->i_size = 0
ext4_writepage()
len = 0
__ext4_journalled_writepage()
page_bufs = page_buffers(page)
ext4_walk_page_buffers(bget_one) <- does not get refcount
do_invalidatepage()
free_buffer_head()
ext4_walk_page_buffers(page_bufs) <- trigger use-after-free
After commit bdf96838aea6 ("ext4: fix race between truncate and
__ext4_journalled_writepage()"), we have already handled the racing
case, so bget_one() and bput_one() are no longer needed. This patch
simply removes these hunks and rechecks i_size to make it safe.
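The resulting pattern is the usual drop-lock/retake/revalidate discipline; a generic sketch of it (pthread-based illustration, not the ext4 code):
#include <pthread.h>
struct object {
        pthread_mutex_t lock;
        long size;              /* shrinks under a concurrent truncate */
        void *owner;            /* changes if the object is recycled */
};
/* After re-taking a lock that had to be dropped for a blocking
 * operation, revalidate everything read before the drop; the patch
 * does this by rechecking page->mapping and re-reading i_size before
 * walking the page's buffers.
 */
static int finish_write(struct object *o, void *expected_owner, long len)
{
        pthread_mutex_lock(&o->lock);
        if (o->owner != expected_owner) {       /* gone under us: bail out */
                pthread_mutex_unlock(&o->lock);
                return 0;
        }
        if (len > o->size)                      /* clamp to current size */
                len = o->size;
        /* ... now safe to operate on len bytes ... */
        pthread_mutex_unlock(&o->lock);
        return 0;
}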
Fixes: bdf96838aea6 ("ext4: fix race between truncate and __ext4_journalled_writepage()")
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: yangerkun <yangerkun(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/ext4/inode.c | 35 ++++++++++-------------------------
1 file changed, 10 insertions(+), 25 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6213816056cd6..e106a9317b429 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1962,28 +1962,16 @@ int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
return 0;
}
-static int bget_one(handle_t *handle, struct buffer_head *bh)
-{
- get_bh(bh);
- return 0;
-}
-
-static int bput_one(handle_t *handle, struct buffer_head *bh)
-{
- put_bh(bh);
- return 0;
-}
-
static int __ext4_journalled_writepage(struct page *page,
unsigned int len)
{
struct address_space *mapping = page->mapping;
struct inode *inode = mapping->host;
- struct buffer_head *page_bufs = NULL;
handle_t *handle = NULL;
int ret = 0, err = 0;
int inline_data = ext4_has_inline_data(inode);
struct buffer_head *inode_bh = NULL;
+ loff_t size;
ClearPageChecked(page);
@@ -1993,14 +1981,6 @@ static int __ext4_journalled_writepage(struct page *page,
inode_bh = ext4_journalled_write_inline_data(inode, len, page);
if (inode_bh == NULL)
goto out;
- } else {
- page_bufs = page_buffers(page);
- if (!page_bufs) {
- BUG();
- goto out;
- }
- ext4_walk_page_buffers(handle, page_bufs, 0, len,
- NULL, bget_one);
}
/*
* We need to release the page lock before we start the
@@ -2021,7 +2001,8 @@ static int __ext4_journalled_writepage(struct page *page,
lock_page(page);
put_page(page);
- if (page->mapping != mapping) {
+ size = i_size_read(inode);
+ if (page->mapping != mapping || page_offset(page) > size) {
/* The page got truncated from under us */
ext4_journal_stop(handle);
ret = 0;
@@ -2031,6 +2012,13 @@ static int __ext4_journalled_writepage(struct page *page,
if (inline_data) {
ret = ext4_mark_inode_dirty(handle, inode);
} else {
+ struct buffer_head *page_bufs = page_buffers(page);
+
+ if (page->index == size >> PAGE_SHIFT)
+ len = size & ~PAGE_MASK;
+ else
+ len = PAGE_SIZE;
+
ret = ext4_walk_page_buffers(handle, page_bufs, 0, len, NULL,
do_journal_get_write_access);
@@ -2048,9 +2036,6 @@ static int __ext4_journalled_writepage(struct page *page,
out:
unlock_page(page);
out_no_pagelock:
- if (!inline_data && page_bufs)
- ext4_walk_page_buffers(NULL, page_bufs, 0, len,
- NULL, bput_one);
brelse(inode_bh);
return ret;
}
--
2.25.1
[PATCH openEuler-1.0-LTS] netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc
by Yang Yingliang 30 Dec '21
From: Haimin Zhang <tcs.kernel(a)gmail.com>
mainline inclusion
from mainline-5.16-rc6
commit 481221775d53d6215a6e5e9ce1cce6d2b4ab9a46
category: bugfix
CVE: CVE-2021-4135
--------------------------------
Zero-initialize the memory for a new map's value in nsim_bpf_map_alloc(),
since otherwise it may cause a kernel information leak, as follows:
1. nsim_bpf_map_alloc calls nsim_map_alloc_elem to allocate elements for
a new map.
2. nsim_map_alloc_elem uses kmalloc to allocate map's value, but doesn't
zero it.
3. A user application can use IOCTL BPF_MAP_LOOKUP_ELEM to get specific
element's information in the map.
4. The kernel function map_lookup_elem will call bpf_map_copy_value to get
the information allocated at step-2, then use copy_to_user to copy to the
user buffer.
This can only leak information for an array map.
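A minimal userspace illustration of this leak class (illustrative only, not the netdevsim code): memory handed out by the allocator may still hold stale bytes, so zero it before it becomes readable.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
        size_t value_size = 64;
        unsigned char *value;
        /* Analogous to the buggy path: kmalloc'd (here, malloc'd)
         * memory is not zeroed, so a later lookup could read back
         * whatever was previously stored there.
         */
        value = malloc(value_size);
        if (!value)
                return 1;
        /* The fix: clear the buffer before it becomes visible
         * (in the kernel, the added memset, or kzalloc upfront).
         */
        memset(value, 0, value_size);
        for (size_t i = 0; i < value_size; i++)
                printf("%02x", value[i]);       /* all zeros now */
        putchar('\n');
        free(value);
        return 0;
}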
Fixes: 395cacb5f1a0 ("netdevsim: bpf: support fake map offload")
Suggested-by: Jakub Kicinski <kuba(a)kernel.org>
Acked-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Haimin Zhang <tcs.kernel(a)gmail.com>
Link: https://lore.kernel.org/r/20211215111530.72103-1-tcs.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Reviewed-by: Yue Haibing <yuehaibing(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/net/netdevsim/bpf.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 81444208b2162..12f100392ed11 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -493,6 +493,7 @@ nsim_bpf_map_alloc(struct netdevsim *ns, struct bpf_offloaded_map *offmap)
goto err_free;
key = nmap->entry[i].key;
*key = i;
+ memset(nmap->entry[i].value, 0, offmap->map.value_size);
}
}
--
2.25.1
[PATCH openEuler-1.0-LTS 1/7] bpf: Fix up register-based shifts in interpreter to silence KUBSAN
by Yang Yingliang 30 Dec '21
From: Daniel Borkmann <daniel(a)iogearbox.net>
mainline inclusion
from mainline-v5.12
commit 28131e9d933339a92f78e7ab6429f4aaaa07061c
category: bugfix
bugzilla: NA
CVE: NA
-----------------------------------------------------------------------
syzbot reported a shift-out-of-bounds that KUBSAN observed in the
interpreter:
[...]
UBSAN: shift-out-of-bounds in kernel/bpf/core.c:1420:2
shift exponent 255 is too large for 64-bit type 'long long unsigned int'
CPU: 1 PID: 11097 Comm: syz-executor.4 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
__ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327
___bpf_prog_run.cold+0x19/0x56c kernel/bpf/core.c:1420
__bpf_prog_run32+0x8f/0xd0 kernel/bpf/core.c:1735
bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline]
bpf_prog_run_pin_on_cpu include/linux/filter.h:624 [inline]
bpf_prog_run_clear_cb include/linux/filter.h:755 [inline]
run_filter+0x1a1/0x470 net/packet/af_packet.c:2031
packet_rcv+0x313/0x13e0 net/packet/af_packet.c:2104
dev_queue_xmit_nit+0x7c2/0xa90 net/core/dev.c:2387
xmit_one net/core/dev.c:3588 [inline]
dev_hard_start_xmit+0xad/0x920 net/core/dev.c:3609
__dev_queue_xmit+0x2121/0x2e00 net/core/dev.c:4182
__bpf_tx_skb net/core/filter.c:2116 [inline]
__bpf_redirect_no_mac net/core/filter.c:2141 [inline]
__bpf_redirect+0x548/0xc80 net/core/filter.c:2164
____bpf_clone_redirect net/core/filter.c:2448 [inline]
bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2420
___bpf_prog_run+0x34e1/0x77d0 kernel/bpf/core.c:1523
__bpf_prog_run512+0x99/0xe0 kernel/bpf/core.c:1737
bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline]
bpf_test_run+0x3ed/0xc50 net/bpf/test_run.c:50
bpf_prog_test_run_skb+0xabc/0x1c50 net/bpf/test_run.c:582
bpf_prog_test_run kernel/bpf/syscall.c:3127 [inline]
__do_sys_bpf+0x1ea9/0x4f00 kernel/bpf/syscall.c:4406
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
[...]
Generally speaking, KUBSAN reports from the kernel should be fixed.
However, in case of BPF, this particular report caused concerns since
the large shift is not wrong from BPF point of view, just undefined.
In the verifier, K-based shifts that are >= {64,32} (depending on the
bitwidth of the instruction) are already rejected. The register-based
cases were not given their content might not be known at verification
time. Ideas such as verifier instruction rewrite with an additional
AND instruction for the source register were brought up, but regularly
rejected due to the additional runtime overhead they incur.
As Edward Cree rightly put it:
Shifts by more than insn bitness are legal in the BPF ISA; they are
implementation-defined behaviour [of the underlying architecture],
rather than UB, and have been made legal for performance reasons.
Each of the JIT backends compiles the BPF shift operations to machine
instructions which produce implementation-defined results in such a
case; the resulting contents of the register may be arbitrary but
program behaviour as a whole remains defined.
Guard checks in the fast path (i.e. affecting JITted code) will thus
not be accepted.
The case of division by zero is not truly analogous here, as division
instructions on many of the JIT-targeted architectures will raise a
machine exception / fault on division by zero, whereas (to the best
of my knowledge) none will do so on an out-of-bounds shift.
Given the KUBSAN report only affects the BPF interpreter, but not JITs,
one solution is to add the ANDs with 63 or 31 into ___bpf_prog_run().
That would make the shifts defined, and thus shuts up KUBSAN, and the
compiler would optimize out the AND on any CPU that interprets the shift
amounts modulo the width anyway (e.g., confirmed from disassembly that
on x86-64 and arm64 the generated interpreter code is the same before
and after this fix).
The BPF interpreter is slow path, and most likely compiled out anyway
as distros select BPF_JIT_ALWAYS_ON to avoid speculative execution of
BPF instructions by the interpreter. Given the main argument was to
avoid sacrificing performance, the fact that the AND is optimized away
from compiler for mainstream archs helps as well as a solution moving
forward. Also add a comment on LSH/RSH/ARSH translation for JIT authors
to provide guidance when they see the ___bpf_prog_run() interpreter
code and use it as a model for a new JIT backend.
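The effect of the masking can be demonstrated in plain C (a standalone sketch mirroring the interpreter change below, not the kernel source): the AND keeps the shift defined for any source value, and compilers for architectures that already reduce the amount modulo the width optimize it away.
#include <stdint.h>
#include <stdio.h>
/* DST = DST << SRC with SRC >= 64 is undefined behaviour in C;
 * masking with 63 (or 31 for the 32-bit ops) makes it defined and
 * matches what x86-64/arm64 shift instructions do with the amount.
 */
static uint64_t lsh64(uint64_t dst, uint64_t src)
{
        return dst << (src & 63);
}
static uint32_t lsh32(uint32_t dst, uint32_t src)
{
        return dst << (src & 31);
}
int main(void)
{
        printf("%llu\n", (unsigned long long)lsh64(1, 255)); /* 255 & 63 == 63 */
        printf("%u\n", lsh32(1, 33));                        /* 33 & 31 == 1 */
        return 0;
}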
Reported-by: syzbot+bed360704c521841c85d(a)syzkaller.appspotmail.com
Reported-by: Kurt Manucredo <fuzzybritches0(a)gmail.com>
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
Co-developed-by: Eric Biggers <ebiggers(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Alexei Starovoitov <ast(a)kernel.org>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Tested-by: syzbot+bed360704c521841c85d(a)syzkaller.appspotmail.com
Cc: Edward Cree <ecree.xilinx(a)gmail.com>
Link: https://lore.kernel.org/bpf/0000000000008f912605bd30d5d7@google.com
Link: https://lore.kernel.org/bpf/bac16d8d-c174-bdc4-91bd-bfa62b410190@gmail.com
conflicts:
kernel/bpf/core.c
Signed-off-by: He Fengqing <hefengqing(a)huawei.com>
Reviewed-by: Kuohai Xu <xukuohai(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
kernel/bpf/core.c | 59 +++++++++++++++++++++++++++++++++--------------
1 file changed, 42 insertions(+), 17 deletions(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 641622b65de44..625edbc8360de 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1100,29 +1100,54 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
select_insn:
goto *jumptable[insn->code];
- /* ALU */
-#define ALU(OPCODE, OP) \
- ALU64_##OPCODE##_X: \
- DST = DST OP SRC; \
- CONT; \
- ALU_##OPCODE##_X: \
- DST = (u32) DST OP (u32) SRC; \
- CONT; \
- ALU64_##OPCODE##_K: \
- DST = DST OP IMM; \
- CONT; \
- ALU_##OPCODE##_K: \
- DST = (u32) DST OP (u32) IMM; \
+ /* Explicitly mask the register-based shift amounts with 63 or 31
+ * to avoid undefined behavior. Normally this won't affect the
+ * generated code, for example, in case of native 64 bit archs such
+ * as x86-64 or arm64, the compiler is optimizing the AND away for
+ * the interpreter. In case of JITs, each of the JIT backends compiles
+ * the BPF shift operations to machine instructions which produce
+ * implementation-defined results in such a case; the resulting
+ * contents of the register may be arbitrary, but program behaviour
+ * as a whole remains defined. In other words, in case of JIT backends,
+ * the AND must /not/ be added to the emitted LSH/RSH/ARSH translation.
+ */
+ /* ALU (shifts) */
+#define SHT(OPCODE, OP) \
+ ALU64_##OPCODE##_X: \
+ DST = DST OP (SRC & 63); \
+ CONT; \
+ ALU_##OPCODE##_X: \
+ DST = (u32) DST OP ((u32) SRC & 31); \
+ CONT; \
+ ALU64_##OPCODE##_K: \
+ DST = DST OP IMM; \
+ CONT; \
+ ALU_##OPCODE##_K: \
+ DST = (u32) DST OP (u32) IMM; \
+ CONT;
+ /* ALU (rest) */
+#define ALU(OPCODE, OP) \
+ ALU64_##OPCODE##_X: \
+ DST = DST OP SRC; \
+ CONT; \
+ ALU_##OPCODE##_X: \
+ DST = (u32) DST OP (u32) SRC; \
+ CONT; \
+ ALU64_##OPCODE##_K: \
+ DST = DST OP IMM; \
+ CONT; \
+ ALU_##OPCODE##_K: \
+ DST = (u32) DST OP (u32) IMM; \
CONT;
-
ALU(ADD, +)
ALU(SUB, -)
ALU(AND, &)
ALU(OR, |)
- ALU(LSH, <<)
- ALU(RSH, >>)
ALU(XOR, ^)
ALU(MUL, *)
+ SHT(LSH, <<)
+ SHT(RSH, >>)
+#undef SHT
#undef ALU
ALU_NEG:
DST = (u32) -DST;
@@ -1147,7 +1172,7 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
insn++;
CONT;
ALU64_ARSH_X:
- (*(s64 *) &DST) >>= SRC;
+ (*(s64 *) &DST) >>= (SRC & 63);
CONT;
ALU64_ARSH_K:
(*(s64 *) &DST) >>= IMM;
--
2.25.1
patch #1 ~ #9 prepare for fixing CVEs
patch #10 ~ #14 fix:
CVE-2021-28711
CVE-2021-28712
CVE-2021-28713
CVE-2021-28714
CVE-2021-28715
Juergen Gross (14):
xen/netback: avoid race in xenvif_rx_ring_slots_available()
xen: sync include/xen/interface/io/ring.h with Xen's newest version
xen/blkfront: read response from backend only once
xen/blkfront: don't take local copy of a request from the ring page
xen/blkfront: don't trust the backend response data blindly
xen/netfront: read response from backend only once
xen/netfront: don't read data from request on the ring page
xen/netfront: disentangle tx_skb_freelist
xen/netfront: don't trust the backend response data blindly
xen/blkfront: harden blkfront against event channel storms
xen/netfront: harden netfront against event channel storms
xen/console: harden hvc_xen against event channel storms
xen/netback: fix rx queue stall detection
xen/netback: don't queue unlimited number of packages
drivers/block/xen-blkfront.c | 141 ++++++++----
drivers/net/xen-netback/common.h | 1 +
drivers/net/xen-netback/rx.c | 70 ++++--
drivers/net/xen-netfront.c | 372 ++++++++++++++++++++-----------
drivers/tty/hvc/hvc_xen.c | 30 ++-
include/xen/interface/io/ring.h | 293 +++++++++++++-----------
6 files changed, 575 insertions(+), 332 deletions(-)
--
2.25.1
Ramaxel inclusion
category: features
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ON8F
CVE: NA
Changes:
1. Split scmd_tmout_nonpt into two parameters:
scmd_tmout_vd/scmd_tmout_rawdisk
2. Return -ETIME instead of -EINVAL when a command times out.
3. Add one module parameter: max_io_force.
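As a usage sketch (parameter names are from the diff below; the values are purely illustrative), the new knobs would be set at module load time, e.g. "modprobe spraid scmd_tmout_vd=60 scmd_tmout_rawdisk=180 max_io_force=1".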
Signed-off-by: Yanling Song <songyl(a)ramaxel.com>
Reviewed-by: Jiang Yu<yujiang(a)ramaxel.com>
---
drivers/scsi/spraid/spraid_main.c | 55 ++++++++++++++-----------------
1 file changed, 24 insertions(+), 31 deletions(-)
diff --git a/drivers/scsi/spraid/spraid_main.c b/drivers/scsi/spraid/spraid_main.c
index 519b39f44e91..df15b1cea426 100644
--- a/drivers/scsi/spraid/spraid_main.c
+++ b/drivers/scsi/spraid/spraid_main.c
@@ -42,10 +42,17 @@ static u32 admin_tmout = 60;
module_param(admin_tmout, uint, 0644);
MODULE_PARM_DESC(admin_tmout, "admin commands timeout (seconds)");
-static u32 scmd_tmout_nonpt = 180;
-module_param(scmd_tmout_nonpt, uint, 0644);
-MODULE_PARM_DESC(scmd_tmout_nonpt,
- "scsi commands timeout for rawdisk&raid(seconds)");
+static u32 scmd_tmout_rawdisk = 180;
+module_param(scmd_tmout_rawdisk, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_rawdisk, "scsi commands timeout for rawdisk(seconds)");
+
+static u32 scmd_tmout_vd = 180;
+module_param(scmd_tmout_vd, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_vd, "scsi commands timeout for vd(seconds)");
+
+static bool max_io_force;
+module_param(max_io_force, bool, 0644);
+MODULE_PARM_DESC(max_io_force, "force max_hw_sectors_kb = 1024, default false(performance first)");
static int ioq_depth_set(const char *val, const struct kernel_param *kp);
static const struct kernel_param_ops ioq_depth_ops = {
@@ -78,7 +85,7 @@ static unsigned char log_debug_switch;
module_param_cb(log_debug_switch, &log_debug_switch_ops,
&log_debug_switch, 0644);
MODULE_PARM_DESC(log_debug_switch,
- "set log state, default non-zero for switch on");
+ "set log state, default zero for switch off");
static int small_pool_num_set(const char *val, const struct kernel_param *kp)
{
@@ -981,32 +988,17 @@ static void spraid_slave_destroy(struct scsi_device *sdev)
static int spraid_slave_configure(struct scsi_device *sdev)
{
- u16 idx;
- unsigned int timeout = scmd_tmout_nonpt * HZ;
+ unsigned int timeout = scmd_tmout_rawdisk * HZ;
struct spraid_dev *hdev = shost_priv(sdev->host);
struct spraid_sdev_hostdata *hostdata = sdev->hostdata;
u32 max_sec = sdev->host->max_sectors;
if (hostdata) {
- idx = hostdata->hdid - 1;
- if (sdev->channel == hdev->devices[idx].channel &&
- sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
- sdev->lun < hdev->devices[idx].lun) {
- if (SPRAID_DEV_INFO_ATTR_PT(hdev->devices[idx].attr))
- timeout = 30 * HZ;
- else
- timeout = scmd_tmout_nonpt * HZ;
- max_sec = le16_to_cpu(hdev->devices[idx].max_io_kb)
- << 1;
- } else {
- dev_err(hdev->dev,
- "[%s] err, sdev->channel:id:lun[%d:%d:%lld];"
- "devices[%d], channel:target:lun[%d:%d:%d]\n",
- __func__, sdev->channel, sdev->id, sdev->lun,
- idx, hdev->devices[idx].channel,
- hdev->devices[idx].target,
- hdev->devices[idx].lun);
- }
+ if (SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ timeout = scmd_tmout_vd * HZ;
+ else if (SPRAID_DEV_INFO_ATTR_RAWDISK(hostdata->attr))
+ timeout = scmd_tmout_rawdisk * HZ;
+ max_sec = hostdata->max_io_kb << 1;
} else {
dev_err(hdev->dev, "[%s] err, sdev->hostdata is null\n",
__func__);
@@ -1018,11 +1010,12 @@ static int spraid_slave_configure(struct scsi_device *sdev)
if ((max_sec == 0) || (max_sec > sdev->host->max_sectors))
max_sec = sdev->host->max_sectors;
- dev_info(hdev->dev,
- "[%s] sdev->channel:id:lun[%d:%d:%lld];"
- " scmd_timeout[%d]s, maxsec[%d]\n",
- __func__, sdev->channel, sdev->id,
- sdev->lun, timeout / HZ, max_sec);
+ if (!max_io_force)
+ blk_queue_max_hw_sectors(sdev->request_queue, max_sec);
+
+ dev_info(hdev->dev, "[%s] sdev->channel:id:lun[%d:%d:%lld];"
+ "scmd_timeout[%d]s, maxsec[%d]\n", __func__, sdev->channel,
+ sdev->id, sdev->lun, timeout / HZ, max_sec);
return 0;
}
--
2.27.0
[PATCH openEuler-1.0-LTS 1/2] sctp: account stream padding length for reconf chunk
by Yang Yingliang 29 Dec '21
From: Eiichi Tsukata <eiichi.tsukata(a)nutanix.com>
stable inclusion
from linux-4.19.213
commit c57fdeff69b152185fafabd37e6bfecfce51efda
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OJ3D
CVE: NA
--------------------------------
commit a2d859e3fc97e79d907761550dbc03ff1b36479c upstream.
sctp_make_strreset_req() makes repeated calls to sctp_addto_chunk()
which will automatically account for padding on each call. inreq and
outreq are already 4 bytes aligned, but the payload is not and doing
SCTP_PAD4(a + b) (which _sctp_make_chunk() did implicitly here) is
different from SCTP_PAD4(a) + SCTP_PAD4(b) and not enough. It led to
possible attempt to use more buffer than it was allocated and triggered
a BUG_ON.
Cc: Vlad Yasevich <vyasevich(a)gmail.com>
Cc: Neil Horman <nhorman(a)tuxdriver.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Fixes: cc16f00f6529 ("sctp: add support for generating stream reconf ssn reset request chunk")
Reported-by: Eiichi Tsukata <eiichi.tsukata(a)nutanix.com>
Signed-off-by: Eiichi Tsukata <eiichi.tsukata(a)nutanix.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner(a)redhat.com>
Reviewed-by: Xin Long <lucien.xin(a)gmail.com>
Link: https://lore.kernel.org/r/b97c1f8b0c7ff79ac4ed206fc2c49d3612e0850c.16341568…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Wang Hai <wanghai38(a)huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Reviewed-by: Yue Haibing <yuehaibing(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
net/sctp/sm_make_chunk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 0789109c2d093..35e1fb708f4a7 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3673,7 +3673,7 @@ struct sctp_chunk *sctp_make_strreset_req(
outlen = (sizeof(outreq) + stream_len) * out;
inlen = (sizeof(inreq) + stream_len) * in;
- retval = sctp_make_reconf(asoc, outlen + inlen);
+ retval = sctp_make_reconf(asoc, SCTP_PAD4(outlen) + SCTP_PAD4(inlen));
if (!retval)
return NULL;
--
2.25.1
[PATCH openEuler-5.10 01/64] arm64: Revert feature: Add memmap parameter and register pmem
by Zheng Zengkai 29 Dec '21
From: Zhuling <zhuling8(a)huawei.com>
euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O31I?from=project-issue
CVE: NA
--------------------------------
The reserved memory of PMEM conflicts with
"add memmap interface to reserved memory for mremap syscall usage";
this needs to be rolled back and resubmitted after adaptation.
Feature related commit:
1.PMEM function commit: 94dc364f5eda10f49449ba573dc3322e1ea92280
2.PMEM feature config commit: 36d7a831e15ceb84e937122c87d01c14242dc377
Signed-off-by: Zhuling <zhuling8(a)huawei.com>
Reviewed-by: Sang Yan <sangyan(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
arch/arm64/Kconfig | 21 ------
arch/arm64/configs/openeuler_defconfig | 3 -
arch/arm64/kernel/Makefile | 1 -
arch/arm64/kernel/pmem.c | 35 ----------
arch/arm64/kernel/setup.c | 10 ---
arch/arm64/mm/init.c | 94 --------------------------
drivers/nvdimm/Kconfig | 5 --
drivers/nvdimm/Makefile | 1 -
8 files changed, 170 deletions(-)
delete mode 100644 arch/arm64/kernel/pmem.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index df90a6e05ad2..2df4b310eb23 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1321,27 +1321,6 @@ config RODATA_FULL_DEFAULT_ENABLED
This requires the linear region to be mapped down to pages,
which may adversely affect performance in some cases.
-config ARM64_PMEM_RESERVE
- bool "Reserve memory for persistent storage"
- default n
- help
- Use memmap=nn[KMG]!ss[KMG](memmap=100K!0x1a0000000) reserve
- memory for persistent storage.
-
- Say y here to enable this feature.
-
-config ARM64_PMEM_LEGACY_DEVICE
- bool "Create persistent storage"
- depends on BLK_DEV
- depends on LIBNVDIMM
- select ARM64_PMEM_RESERVE
- help
- Use reserved memory for persistent storage when the kernel
- restart or update. the data in PMEM will not be lost and
- can be loaded faster.
-
- Say y if unsure.
-
config ARM64_SW_TTBR0_PAN
bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
help
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 17bc8750bba7..b5fc851f1949 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -416,8 +416,6 @@ CONFIG_ARM64_CPU_PARK=y
CONFIG_FORCE_MAX_ZONEORDER=11
CONFIG_UNMAP_KERNEL_AT_EL0=y
CONFIG_RODATA_FULL_DEFAULT_ENABLED=y
-CONFIG_ARM64_PMEM_RESERVE=y
-CONFIG_ARM64_PMEM_LEGACY_DEVICE=y
# CONFIG_ARM64_SW_TTBR0_PAN is not set
CONFIG_ARM64_TAGGED_ADDR_ABI=y
CONFIG_ARM64_ILP32=y
@@ -6026,7 +6024,6 @@ CONFIG_ND_BTT=m
CONFIG_BTT=y
CONFIG_OF_PMEM=m
CONFIG_NVDIMM_KEYS=y
-CONFIG_PMEM_LEGACY=m
CONFIG_DAX_DRIVER=y
CONFIG_DAX=y
CONFIG_DEV_DAX=m
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index f6153250b631..169d90f11cf5 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -68,7 +68,6 @@ obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_ARM64_MTE) += mte.o
obj-$(CONFIG_MPAM) += mpam/
-obj-$(CONFIG_ARM64_PMEM_LEGACY_DEVICE) += pmem.o
obj-y += vdso/ probes/
obj-$(CONFIG_COMPAT_VDSO) += vdso32/
diff --git a/arch/arm64/kernel/pmem.c b/arch/arm64/kernel/pmem.c
deleted file mode 100644
index 16eaf706f671..000000000000
--- a/arch/arm64/kernel/pmem.c
+++ /dev/null
@@ -1,35 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Copyright(c) 2021 Huawei Technologies Co., Ltd
- *
- * Derived from x86 and arm64 implement PMEM.
- */
-#include <linux/platform_device.h>
-#include <linux/init.h>
-#include <linux/ioport.h>
-#include <linux/module.h>
-
-static int found(struct resource *res, void *data)
-{
- return 1;
-}
-
-static int __init register_e820_pmem(void)
-{
- struct platform_device *pdev;
- int rc;
-
- rc = walk_iomem_res_desc(IORES_DESC_PERSISTENT_MEMORY_LEGACY,
- IORESOURCE_MEM, 0, -1, NULL, found);
- if (rc <= 0)
- return 0;
-
- /*
- * See drivers/nvdimm/e820.c for the implementation, this is
- * simply here to trigger the module to load on demand.
- */
- pdev = platform_device_alloc("e820_pmem", -1);
-
- return platform_device_add(pdev);
-}
-device_initcall(register_e820_pmem);
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 88ec49504001..7cd042536d3b 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -70,10 +70,6 @@ static int __init arm64_enable_cpu0_hotplug(char *str)
__setup("arm64_cpu0_hotplug", arm64_enable_cpu0_hotplug);
#endif
-#ifdef CONFIG_ARM64_PMEM_RESERVE
-extern struct resource pmem_res;
-#endif
-
phys_addr_t __fdt_pointer __initdata;
/*
@@ -288,12 +284,6 @@ static void __init request_standard_resources(void)
request_resource(res, &pin_memory_resource);
#endif
}
-
-#ifdef CONFIG_ARM64_PMEM_RESERVE
- if (pmem_res.end && pmem_res.start)
- request_resource(&iomem_resource, &pmem_res);
-#endif
-
}
static int __init reserve_memblock_reserved_regions(void)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 3b9401ee9c58..e8d446164c76 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -55,7 +55,6 @@
*/
s64 memstart_addr __ro_after_init = -1;
EXPORT_SYMBOL(memstart_addr);
-phys_addr_t start_at, mem_size;
#ifdef CONFIG_PIN_MEMORY
struct resource pin_memory_resource = {
@@ -112,18 +111,6 @@ static void __init reserve_pin_memory_res(void)
*/
phys_addr_t arm64_dma_phys_limit __ro_after_init;
-static unsigned long long pmem_size, pmem_start;
-
-#ifdef CONFIG_ARM64_PMEM_RESERVE
-struct resource pmem_res = {
- .name = "Persistent Memory (legacy)",
- .start = 0,
- .end = 0,
- .flags = IORESOURCE_MEM,
- .desc = IORES_DESC_PERSISTENT_MEMORY_LEGACY
-};
-#endif
-
#ifndef CONFIG_KEXEC_CORE
static void __init reserve_crashkernel(void)
{
@@ -417,83 +404,6 @@ static int __init reserve_park_mem(void)
}
#endif
-static bool __init is_mem_valid(unsigned long long mem_size, unsigned long long mem_start)
-{
- if (!memblock_is_region_memory(mem_start, mem_size)) {
- pr_warn("cannot reserve mem: region is not memory!\n");
- return false;
- }
-
- if (memblock_is_region_reserved(mem_start, mem_size)) {
- pr_warn("cannot reserve mem: region overlaps reserved memory!\n");
- return false;
- }
-
- if (!IS_ALIGNED(mem_start, SZ_2M)) {
- pr_warn("cannot reserve mem: base address is not 2MB aligned!\n");
- return false;
- }
-
- return true;
-}
-
-static int __init parse_memmap_one(char *p)
-{
- char *oldp;
-
- if (!p)
- return -EINVAL;
-
- oldp = p;
- mem_size = memparse(p, &p);
- if (p == oldp)
- return -EINVAL;
-
- if (!mem_size)
- return -EINVAL;
-
- mem_size = PAGE_ALIGN(mem_size);
-
- if (*p == '!') {
- start_at = memparse(p+1, &p);
-
- pmem_start = start_at;
- pmem_size = mem_size;
- } else
- pr_info("Unrecognized memmap option, please check the parameter.\n");
-
- return *p == '\0' ? 0 : -EINVAL;
-}
-
-static int __init parse_memmap_opt(char *str)
-{
- while (str) {
- char *k = strchr(str, ',');
-
- if (k)
- *k++ = 0;
- parse_memmap_one(str);
- str = k;
- }
-
- return 0;
-}
-early_param("memmap", parse_memmap_opt);
-
-#ifdef CONFIG_ARM64_PMEM_RESERVE
-static void __init reserve_pmem(void)
-{
- if (!is_mem_valid(mem_size, start_at))
- return;
-
- memblock_remove(pmem_start, pmem_size);
- pr_info("pmem reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
- pmem_start, pmem_start + pmem_size, pmem_size >> 20);
- pmem_res.start = pmem_start;
- pmem_res.end = pmem_start + pmem_size - 1;
-}
-#endif
-
void __init arm64_memblock_init(void)
{
const s64 linear_region_size = BIT(vabits_actual - 1);
@@ -668,10 +578,6 @@ void __init bootmem_init(void)
reserve_quick_kexec();
#endif
-#ifdef CONFIG_ARM64_PMEM_RESERVE
- reserve_pmem();
-#endif
-
reserve_pin_memory_res();
memblock_dump_all();
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index ce4de75262b9..b7d1eb38b27d 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -132,8 +132,3 @@ config NVDIMM_TEST_BUILD
infrastructure.
endif
-
-config PMEM_LEGACY
- tristate "Pmem_legacy"
- select X86_PMEM_LEGACY if X86
- select ARM64_PMEM_LEGACY_DEVICE if ARM64
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 6f8dc9242a81..04077532f7ed 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -3,7 +3,6 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
-obj-$(CONFIG_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_OF_PMEM) += of_pmem.o
obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
--
2.20.1

[PATCH openEuler-1.0-LTS 00/93] Enabling AMD Milan series processor support
by Laibin Qiu 28 Dec '21
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MKP4
Babu Moger (1):
KVM: SVM: Clear the CR4 register on reset
David Edmondson (1):
KVM: x86: clflushopt should be treated as a no-op by emulation
Fenghua Yu (2):
x86/cpufeatures: Enumerate MOVDIRI instruction
x86/cpufeatures: Enumerate MOVDIR64B instruction
Haiyan Song (2):
perf vendor events intel: Add Icelake V1.00 event file
perf vendor events intel: Add Tremontx event file v1.02
Isaac Vaughn (1):
EDAC/amd64: Add PCI device IDs for family 17h, model 70h
Jan H. Schönherr (1):
x86/mce: Fix use of uninitialized MCE message string
Jim Mattson (1):
kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID
John Allen (2):
kvm/svm: PKU not currently supported
x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE
John Garry (4):
perf jevents: Add support for Hisi hip08 DDRC PMU aliasing
perf jevents: Add support for Hisi hip08 HHA PMU aliasing
perf jevents: Add support for Hisi hip08 L3C PMU aliasing
perf vendor events arm64: Fix Hisi hip08 DDRC PMU eventname
Kai Huang (3):
kvm: x86: Move kvm_set_mmio_spte_mask() from x86.c to mmu.c
kvm: x86: Fix reserved bits related calculation errors caused by MKTME
kvm: x86: Fix L1TF mitigation for shadow MMU
Kan Liang (1):
perf vendor events intel: Add uncore_upi JSON support
Kim Phillips (17):
perf/amd/uncore: Prepare L3 thread mask code for Family 19h
perf/amd/uncore: Make L3 thread mask code more readable
perf/amd/uncore: Add support for Family 19h L3 PMU
arch/x86/amd/ibs: Fix re-arming IBS Fetch
perf/x86/amd/ibs: Fix raw sample data accumulation
perf/amd/uncore: Set all slices and threads to restore perf stat -a
behaviour
perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count()
perf/amd/uncore: Prepare to scale for more attributes that vary per
family
perf/amd/uncore: Allow F17h user threadmask and slicemask
specification
perf/amd/uncore: Allow F19h user coreid, threadmask, and sliceid
specification
perf vendor events amd: Add L3 cache events for Family 17h
perf vendor events amd: Remove redundant '['
perf vendor events amd: Enable Family 19h users by matching Zen2
events
x86/cpu/amd: Call init_amd_zn() om Family 19h processors too
perf/x86/amd/ibs: Support 27-bit extended Op/cycle counter
tools/power turbostat: Support AMD Family 19h
perf vendor events amd: Add L2 Prefetch events for zen1
Krish Sadhukhan (1):
KVM: SVM: Replace hard-coded value with #define
Like Xu (1):
perf/x86/amd: Don't touch the AMD64_EVENTSEL_HOSTONLY bit inside the
guest
Liu Jingqi (2):
KVM: x86: expose MOVDIRI CPU feature into VM.
KVM: x86: expose MOVDIR64B CPU feature into VM.
Maciej S. Szmigiero (1):
KVM: mmu: Fix SPTE encoding of MMIO generation upper half
Marcel Bocu (1):
x86/amd_nb: Add PCI device IDs for family 17h, model 70h
Martin Liška (1):
perf vendor events amd: perf PMU events for AMD Family 17h
Nathan Chancellor (2):
crypto: ccp - Remove forward declaration
perf/amd/uncore: Fix sysfs type mismatch
Paolo Bonzini (3):
KVM: x86: only do L1TF workaround on affected processors
KVM: x86: assign two bits to track SPTE kinds
KVM: x86: fix overlap between SPTE_MMIO_MASK and generation
Rasmus Villemoes (1):
build_bug.h: add wrapper for _Static_assert
Sean Christopherson (13):
KVM: nVMX: Allocate and configure VM{READ,WRITE} bitmaps iff
enable_shadow_vmcs
KVM: x86: Add requisite includes to kvm_cache_regs.h
KVM: x86: Add requisite includes to hyperv.h
KVM: x86: Use a u64 when passing the MMIO gen around
KVM: Explicitly define the "memslot update in-progress" bit
KVM: x86: Refactor the MMIO SPTE generation handling
KVM: x86: Rename access permissions cache member in struct
kvm_vcpu_arch
KVM: x86/mmu: Add explicit access mask for MMIO SPTEs
KVM: x86/mmu: Consolidate "is MMIO SPTE" code
KVM: x86/mmu: Apply max PA check for MMIO sptes to 32-bit KVM
KVM: x86/mmu: Set mmio_value to '0' if reserved #PF can't be generated
KVM: Remove the hack to trigger memslot generation wraparound
KVM: Move the memslot update in-progress flag to bit 63
Sebastian Andrzej Siewior (1):
x86/pkeys: Don't check if PKRU is zero before writing it
Tom Lendacky (1):
KVM: SVM: Override default MMIO mask if memory encryption is enabled
Vijay Thakkar (3):
perf vendor events amd: Restrict model detection for zen1 based
processors
perf vendor events amd: Add Zen2 events
perf vendor events amd: Update Zen1 events to V2
Woods, Brian (2):
hwmon/k10temp, x86/amd_nb: Consolidate shared device IDs
x86/amd_nb: Add PCI device IDs for family 17h, model 30h
Yazen Ghannam (24):
EDAC/amd64: Drop some family checks for newer systems
x86/amd_nb: Add Family 19h PCI IDs
EDAC/mce_amd: Always load on SMCA systems
x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types
x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units
x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA
bank types
x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType
EDAC/amd64: Use a macro for iterating over Unified Memory Controllers
EDAC/amd64: Support more than two controllers for chip selects
handling
EDAC/amd64: Initialize DIMM info for systems with more than two
channels
EDAC/amd64: Add Family 17h Model 30h PCI IDs
EDAC/amd64: Support more than two Unified Memory Controllers
EDAC/amd64: Set maximum channel layer size depending on family
EDAC/amd64: Recognize x16 symbol size
EDAC/amd64: Adjust printed chip select sizes when interleaved
EDAC/amd64: Find Chip Select memory size using Address Mask
EDAC/amd64: Cache secondary Chip Select registers
EDAC/amd64: Support asymmetric dual-rank DIMMs
EDAC/amd64: Set grain per DIMM
EDAC/amd64: Make struct amd64_family_type global
EDAC/amd64: Gather hardware information early
EDAC/amd64: Save max number of controllers to family type
EDAC/amd64: Add family ops for Family 19h Models 00h-0Fh
EDAC/amd64: Handle three rank interleaving mode
Documentation/virtual/kvm/mmu.txt | 13 +-
arch/x86/events/amd/ibs.c | 93 +-
arch/x86/events/amd/uncore.c | 179 ++--
arch/x86/events/perf_event.h | 3 +-
arch/x86/include/asm/cpufeatures.h | 4 +-
arch/x86/include/asm/kvm_host.h | 10 +-
arch/x86/include/asm/mce.h | 7 +
arch/x86/include/asm/microcode_amd.h | 2 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/perf_event.h | 16 +-
arch/x86/kernel/amd_nb.c | 15 +-
arch/x86/kernel/cpu/amd.c | 3 +-
arch/x86/kernel/cpu/mce/amd.c | 28 +-
arch/x86/kernel/cpu/mce/core.c | 4 +-
arch/x86/kvm/cpuid.c | 6 +-
arch/x86/kvm/emulate.c | 8 +-
arch/x86/kvm/hyperv.h | 2 +
arch/x86/kvm/kvm_cache_regs.h | 2 +
arch/x86/kvm/mmu.c | 215 +++--
arch/x86/kvm/mmu.h | 2 +-
arch/x86/kvm/svm.c | 53 +-
arch/x86/kvm/vmx.c | 51 +-
arch/x86/kvm/x86.c | 33 +-
arch/x86/kvm/x86.h | 4 +-
arch/x86/mm/pkeys.c | 7 -
drivers/crypto/ccp/sp-platform.c | 53 +-
drivers/edac/amd64_edac.c | 609 ++++++++----
drivers/edac/amd64_edac.h | 31 +-
drivers/edac/mce_amd.c | 134 ++-
drivers/hwmon/k10temp.c | 9 +-
include/linux/build_bug.h | 19 +
include/linux/kvm_host.h | 21 +
include/linux/pci_ids.h | 5 +
.../arm64/hisilicon/hip08/uncore-ddrc.json | 44 +
.../arm64/hisilicon/hip08/uncore-hha.json | 51 +
.../arm64/hisilicon/hip08/uncore-l3c.json | 37 +
.../pmu-events/arch/x86/amdzen1/branch.json | 23 +
.../pmu-events/arch/x86/amdzen1/cache.json | 312 ++++++
.../pmu-events/arch/x86/amdzen1/core.json | 125 +++
.../arch/x86/amdzen1/floating-point.json | 224 +++++
.../pmu-events/arch/x86/amdzen1/memory.json | 184 ++++
.../pmu-events/arch/x86/amdzen1/other.json | 56 ++
.../pmu-events/arch/x86/amdzen2/branch.json | 52 +
.../pmu-events/arch/x86/amdzen2/cache.json | 338 +++++++
.../pmu-events/arch/x86/amdzen2/core.json | 130 +++
.../arch/x86/amdzen2/floating-point.json | 140 +++
.../pmu-events/arch/x86/amdzen2/memory.json | 341 +++++++
.../pmu-events/arch/x86/amdzen2/other.json | 115 +++
.../pmu-events/arch/x86/icelake/cache.json | 552 +++++++++++
.../arch/x86/icelake/floating-point.json | 102 ++
.../pmu-events/arch/x86/icelake/frontend.json | 424 +++++++++
.../pmu-events/arch/x86/icelake/memory.json | 410 ++++++++
.../pmu-events/arch/x86/icelake/other.json | 121 +++
.../pmu-events/arch/x86/icelake/pipeline.json | 892 ++++++++++++++++++
.../arch/x86/icelake/virtual-memory.json | 236 +++++
tools/perf/pmu-events/arch/x86/mapfile.csv | 6 +
.../pmu-events/arch/x86/tremontx/cache.json | 111 +++
.../arch/x86/tremontx/frontend.json | 26 +
.../pmu-events/arch/x86/tremontx/memory.json | 26 +
.../pmu-events/arch/x86/tremontx/other.json | 26 +
.../arch/x86/tremontx/pipeline.json | 111 +++
.../arch/x86/tremontx/uncore-memory.json | 73 ++
.../arch/x86/tremontx/uncore-other.json | 431 +++++++++
.../arch/x86/tremontx/uncore-power.json | 11 +
.../arch/x86/tremontx/virtual-memory.json | 86 ++
tools/perf/pmu-events/jevents.c | 5 +
tools/power/x86/turbostat/turbostat.c | 34 +-
virt/kvm/kvm_main.c | 36 +-
68 files changed, 6981 insertions(+), 552 deletions(-)
create mode 100644 tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-ddrc.json
create mode 100644 tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-hha.json
create mode 100644 tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-l3c.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/branch.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/core.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/floating-point.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen1/other.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/branch.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/core.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/floating-point.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen2/other.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/floating-point.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/frontend.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/other.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/pipeline.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/virtual-memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/frontend.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/other.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/pipeline.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-other.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-power.json
create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/virtual-memory.json
--
2.22.0

[PATCH openEuler-1.0-LTS] netfilter: fix regression in looped (broad|multi)cast's MAC handling
by Yang Yingliang 28 Dec '21
From: Ignacy Gawędzki <ignacy.gawedzki(a)green-communications.fr>
mainline inclusion
from mainline-v5.16-rc7
commit ebb966d3bdfed581ecccbb4a7432341baf7619b4
category: bugfix
bugzilla: NA
CVE: NA
--------------------------------
In commit 5648b5e1169f ("netfilter: nfnetlink_queue: fix OOB when mac
header was cleared"), the test for non-empty MAC header introduced in
commit 2c38de4c1f8da7 ("netfilter: fix looped (broad|multi)cast's MAC
handling") has been replaced with a test for a set MAC header.
This breaks the case when the MAC header has been reset (using
skb_reset_mac_header), as is the case with looped-back multicast
packets. As a result, the packets ending up in NFQUEUE get a bogus
hwaddr interpreted from the first bytes of the IP header.
This patch adds a test for a non-empty MAC header in addition to the
test for a set MAC header. The same two tests are also implemented in
nfnetlink_log.c, where the initial code of commit 2c38de4c1f8da7
("netfilter: fix looped (broad|multi)cast's MAC handling") has not been
touched, but where supposedly the same situation may happen.
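As a minimal illustration of the combined test (the wrapper name is
hypothetical; skb_mac_header_was_set() and skb_mac_header_len() are
the helpers the patch relies on):

#include <linux/skbuff.h>

/* Hypothetical helper: true only when the MAC header is both set and
 * non-empty, i.e. it is safe to copy a hardware address from it.
 */
static inline bool skb_mac_header_usable(const struct sk_buff *skb)
{
	return skb_mac_header_was_set(skb) && skb_mac_header_len(skb) != 0;
}

Both call sites in the diff below open-code exactly this pair of
checks.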
Fixes: 5648b5e1169f ("netfilter: nfnetlink_queue: fix OOB when mac header was cleared")
Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki(a)green-communications.fr>
Reviewed-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Yue Haibing <yuehaibing(a)huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
net/netfilter/nfnetlink_log.c | 3 ++-
net/netfilter/nfnetlink_queue.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 25298b3eb8546..17ca9a681d47b 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -509,7 +509,8 @@ __build_packet_message(struct nfnl_log_net *log,
goto nla_put_failure;
if (indev && skb->dev &&
- skb->mac_header != skb->network_header) {
+ skb_mac_header_was_set(skb) &&
+ skb_mac_header_len(skb) != 0) {
struct nfulnl_msg_packet_hw phw;
int len;
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index eb5a052d3b252..8955431f2ab26 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -566,7 +566,8 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
goto nla_put_failure;
if (indev && entskb->dev &&
- skb_mac_header_was_set(entskb)) {
+ skb_mac_header_was_set(entskb) &&
+ skb_mac_header_len(entskb) != 0) {
struct nfqnl_msg_packet_hw phw;
int len;
--
2.25.1
euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OBDE
CVE: NA
-------------------------------------------------
This config enables dangerous old drivers. It is selected by
CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT, for which the upstream community
suggests that "modern distros should consider turning it off".
CONFIG_DRM_VM was selected by CONFIG_DRM_LEGACY and can be turned
off now.
Other distros have disabled this config:
https://src.fedoraproject.org/rpms/kernel/blob/rawhide/f/kernel-aarch64-rhe…
Signed-off-by: Liu Zixian <liuzixian4(a)huawei.com>
---
arch/arm64/configs/openeuler_defconfig | 10 ++--------
arch/x86/configs/openeuler_defconfig | 10 ++--------
2 files changed, 4 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 17bc8750b..6949824e8 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -4673,7 +4673,6 @@ CONFIG_DRM_TTM_DMA_PAGE_POOL=y
CONFIG_DRM_VRAM_HELPER=y
CONFIG_DRM_TTM_HELPER=y
CONFIG_DRM_GEM_SHMEM_HELPER=y
-CONFIG_DRM_VM=y
CONFIG_DRM_SCHED=m
#
@@ -4719,7 +4718,7 @@ CONFIG_DRM_AMD_DC_DCN=y
# CONFIG_HSA_AMD is not set
CONFIG_DRM_NOUVEAU=m
-CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT=y
+# CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT is not set
CONFIG_NOUVEAU_DEBUG=5
CONFIG_NOUVEAU_DEBUG_DEFAULT=3
# CONFIG_NOUVEAU_DEBUG_MMU is not set
@@ -4817,12 +4816,7 @@ CONFIG_DRM_CIRRUS_QEMU=m
# CONFIG_DRM_LIMA is not set
# CONFIG_DRM_PANFROST is not set
# CONFIG_DRM_TIDSS is not set
-CONFIG_DRM_LEGACY=y
-# CONFIG_DRM_TDFX is not set
-# CONFIG_DRM_R128 is not set
-# CONFIG_DRM_MGA is not set
-# CONFIG_DRM_VIA is not set
-# CONFIG_DRM_SAVAGE is not set
+# CONFIG_DRM_LEGACY is not set
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y
#
diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
index 7b6083018..dbae04231 100644
--- a/arch/x86/configs/openeuler_defconfig
+++ b/arch/x86/configs/openeuler_defconfig
@@ -5040,7 +5040,6 @@ CONFIG_DRM_TTM_DMA_PAGE_POOL=y
CONFIG_DRM_VRAM_HELPER=m
CONFIG_DRM_TTM_HELPER=m
CONFIG_DRM_GEM_SHMEM_HELPER=y
-CONFIG_DRM_VM=y
CONFIG_DRM_SCHED=m
#
@@ -5085,7 +5084,7 @@ CONFIG_DRM_AMD_DC_DCN=y
# CONFIG_HSA_AMD is not set
CONFIG_DRM_NOUVEAU=m
-CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT=y
+# CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT is not set
CONFIG_NOUVEAU_DEBUG=5
CONFIG_NOUVEAU_DEBUG_DEFAULT=3
# CONFIG_NOUVEAU_DEBUG_MMU is not set
@@ -5226,12 +5225,7 @@ CONFIG_DRM_CIRRUS_QEMU=m
# CONFIG_TINYDRM_ST7735R is not set
# CONFIG_DRM_XEN is not set
# CONFIG_DRM_VBOXVIDEO is not set
-CONFIG_DRM_LEGACY=y
-# CONFIG_DRM_TDFX is not set
-# CONFIG_DRM_R128 is not set
-# CONFIG_DRM_MGA is not set
-# CONFIG_DRM_VIA is not set
-# CONFIG_DRM_SAVAGE is not set
+# CONFIG_DRM_LEGACY is not set
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y
#
--
2.27.0
Dear Madam/Sir,
Hello!
Thank you for choosing to cooperate with Huawei and for your support
throughout our cooperation!
To better achieve mutual growth and development, we are currently
conducting the "2021 Huawei Ecosystem Health Survey". We have
commissioned an independent third-party research agency, Guangzhou
Nielsen Market Research Co., Ltd. (hereinafter "Nielsen"), to carry
out the survey, so as to understand your evaluation of and suggestions
for the Huawei ecosystem and to analyze improvements. The survey
content will be used for research purposes only and will be kept
strictly confidential.
The questionnaire takes about 5-10 minutes. To participate, please
click the survey link:
https://csurveys.nielseniq.cn/wix/p1611532.aspx
If you have any questions about this survey or need any help, you can
contact Ecosurvey(a)huawei.com and Huawei staff will answer your
questions.
Thank you for your participation!
2021 Huawei Ecosystem Health Survey Project Team
From: sdlzx <hdu_sdlzx(a)163.com>
redhat inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OBDE
CVE: NA
-------------------------------------------------
This config enables dangerous old drivers. It is selected by
CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT, for which the upstream community
suggests that "modern distros should consider turning it off".
CONFIG_DRM_VM was selected by CONFIG_DRM_LEGACY and can be turned
off now.
Other distros have disabled this config:
https://src.fedoraproject.org/rpms/kernel/blob/rawhide/f/kernel-aarch64-rhe…
Signed-off-by: sdlzx <hdu_sdlzx(a)163.com>
---
arch/arm64/configs/openeuler_defconfig | 10 ++--------
arch/x86/configs/openeuler_defconfig | 10 ++--------
2 files changed, 4 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 17bc8750b..6949824e8 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -4673,7 +4673,6 @@ CONFIG_DRM_TTM_DMA_PAGE_POOL=y
CONFIG_DRM_VRAM_HELPER=y
CONFIG_DRM_TTM_HELPER=y
CONFIG_DRM_GEM_SHMEM_HELPER=y
-CONFIG_DRM_VM=y
CONFIG_DRM_SCHED=m
#
@@ -4719,7 +4718,7 @@ CONFIG_DRM_AMD_DC_DCN=y
# CONFIG_HSA_AMD is not set
CONFIG_DRM_NOUVEAU=m
-CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT=y
+# CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT is not set
CONFIG_NOUVEAU_DEBUG=5
CONFIG_NOUVEAU_DEBUG_DEFAULT=3
# CONFIG_NOUVEAU_DEBUG_MMU is not set
@@ -4817,12 +4816,7 @@ CONFIG_DRM_CIRRUS_QEMU=m
# CONFIG_DRM_LIMA is not set
# CONFIG_DRM_PANFROST is not set
# CONFIG_DRM_TIDSS is not set
-CONFIG_DRM_LEGACY=y
-# CONFIG_DRM_TDFX is not set
-# CONFIG_DRM_R128 is not set
-# CONFIG_DRM_MGA is not set
-# CONFIG_DRM_VIA is not set
-# CONFIG_DRM_SAVAGE is not set
+# CONFIG_DRM_LEGACY is not set
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y
#
diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
index 7b6083018..dbae04231 100644
--- a/arch/x86/configs/openeuler_defconfig
+++ b/arch/x86/configs/openeuler_defconfig
@@ -5040,7 +5040,6 @@ CONFIG_DRM_TTM_DMA_PAGE_POOL=y
CONFIG_DRM_VRAM_HELPER=m
CONFIG_DRM_TTM_HELPER=m
CONFIG_DRM_GEM_SHMEM_HELPER=y
-CONFIG_DRM_VM=y
CONFIG_DRM_SCHED=m
#
@@ -5085,7 +5084,7 @@ CONFIG_DRM_AMD_DC_DCN=y
# CONFIG_HSA_AMD is not set
CONFIG_DRM_NOUVEAU=m
-CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT=y
+# CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT is not set
CONFIG_NOUVEAU_DEBUG=5
CONFIG_NOUVEAU_DEBUG_DEFAULT=3
# CONFIG_NOUVEAU_DEBUG_MMU is not set
@@ -5226,12 +5225,7 @@ CONFIG_DRM_CIRRUS_QEMU=m
# CONFIG_TINYDRM_ST7735R is not set
# CONFIG_DRM_XEN is not set
# CONFIG_DRM_VBOXVIDEO is not set
-CONFIG_DRM_LEGACY=y
-# CONFIG_DRM_TDFX is not set
-# CONFIG_DRM_R128 is not set
-# CONFIG_DRM_MGA is not set
-# CONFIG_DRM_VIA is not set
-# CONFIG_DRM_SAVAGE is not set
+# CONFIG_DRM_LEGACY is not set
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y
#
--
2.27.0

[PATCH OLK-5.10 00/13] Introduce AESNI/AVX and AESNI/AVX2 accelerated implementation for SM4 algorithm
by shenzijun 28 Dec '21
From: 沈子俊 <shenzijun(a)kylinos.cn>
This patchset adds support for testing the GCM/CCM modes of SM4. The
GCM/CCM modes of SM4 are defined in the RFC 8998 specification:
https://datatracker.ietf.org/doc/html/rfc8998
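As a minimal usage sketch (assuming the generic gcm()/ccm() templates
resolve over the sm4 cipher once this series is applied; the function
name is illustrative only), a kernel user would allocate the AEAD
transform by name:

#include <crypto/aead.h>
#include <linux/err.h>

/* Probe for the RFC 8998 SM4-GCM AEAD by template name; "ccm(sm4)"
 * is requested the same way. A real user would go on to set the key
 * and queue requests; here we only check availability.
 */
static int sm4_gcm_probe(void)
{
	struct crypto_aead *tfm = crypto_alloc_aead("gcm(sm4)", 0, 0);

	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	crypto_free_aead(tfm);
	return 0;
}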
This patchset extracts the common SM4 algorithm into a separate
library; at the same time, the arm64 SM4 acceleration is adjusted to
build on this library. It then introduces AESNI/AVX and AESNI/AVX2
accelerated implementations on x86_64. The AESNI/AVX2 implementation
reuses the functions of the AESNI/AVX one.
The optimization supports four SM4 modes: ECB, CBC, CFB, and CTR.
Since CBC and CFB do not allow parallel encryption of multiple
blocks, the optimization effect for them is limited. All selftests
pass.
The main algorithm implementation comes from SM4 AES-NI work by
libgcrypt and Markku-Juhani O. Saarinen at:
https://github.com/mjosaarinen/sm4ni
Finally, it adds the configuration options for accelerated SM4 and
introduces some bugfixes for the AESNI/AVX2 implementation.
Benchmarked on an Intel i5-6200U @ 2.30GHz. Performance data for the
three implementations (pure software sm4-generic, AESNI/AVX
acceleration, and AESNI/AVX2 acceleration) comes from tcrypt modes
218 and 518. The columns are block sizes in bytes; the unit is Mb/s:
block-size | 16 64 128 256 1024 1420 4096
sm4-generic
ECB enc | 60.94 70.41 72.27 73.02 73.87 73.58 73.59
ECB dec | 61.87 70.53 72.15 73.09 73.89 73.92 73.86
CBC enc | 56.71 66.31 68.05 69.84 70.02 70.12 70.24
CBC dec | 54.54 65.91 68.22 69.51 70.63 70.79 70.82
CFB enc | 57.21 67.24 69.10 70.25 70.73 70.52 71.42
CFB dec | 57.22 64.74 66.31 67.24 67.40 67.64 67.58
CTR enc | 59.47 68.64 69.91 71.02 71.86 71.61 71.95
CTR dec | 59.94 68.77 69.95 71.00 71.84 71.55 71.95
sm4-aesni-avx
ECB enc | 44.95 177.35 292.06 316.98 339.48 322.27 330.59
ECB dec | 45.28 178.66 292.31 317.52 339.59 322.52 331.16
CBC enc | 57.75 67.68 69.72 70.60 71.48 71.63 71.74
CBC dec | 44.32 176.83 284.32 307.24 328.61 312.61 325.82
CFB enc | 57.81 67.64 69.63 70.55 71.40 71.35 71.70
CFB dec | 43.14 167.78 282.03 307.20 328.35 318.24 325.95
CTR enc | 42.35 163.32 279.11 302.93 320.86 310.56 317.93
CTR dec | 42.39 162.81 278.49 302.37 321.11 310.33 318.37
sm4-aesni-avx2
ECB enc | 45.19 177.41 292.42 316.12 339.90 322.53 330.54
ECB dec | 44.83 178.90 291.45 317.31 339.85 322.55 331.07
CBC enc | 57.66 67.62 69.73 70.55 71.58 71.66 71.77
CBC dec | 44.34 176.86 286.10 501.68 559.58 483.87 527.46
CFB enc | 57.43 67.60 69.61 70.52 71.43 71.28 71.65
CFB dec | 43.12 167.75 268.09 499.33 558.35 490.36 524.73
CTR enc | 42.42 163.39 256.17 493.95 552.45 481.58 517.19
CTR dec | 42.49 163.11 256.36 493.34 552.62 481.49 516.83
From the benchmark data, it can be seen that at a block size of 1024,
the AVX2 implementation is about 70% faster than the AVX one (e.g.
CBC dec: 559.58 vs 328.61 Mb/s), and about 7.7 times as fast as the
pure software sm4-generic implementation.
沈子俊 (13):
crypto: tcrypt - Fix missing return value check
crypto: testmgr - Add GCM/CCM mode test of SM4 algorithm
crypto: tcrypt - add GCM/CCM mode test for SM4 algorithm
crypto: sm4 - create SM4 library based on sm4 generic code
crypto: arm64/sm4-ce - Make dependent on sm4 library instead of
sm4-generic
crypto: x86/sm4 - add AES-NI/AVX/x86_64 implementation
crypto: tcrypt - add the asynchronous speed test for SM4
crypto: x86/sm4 - export reusable AESNI/AVX functions
crypto: x86/sm4 - add AES-NI/AVX2/x86_64 implementation
Add the configuration for accelerated of SM4
crypto: x86/sm4 - Fix frame pointer stack corruption
crypto: sm4 - Do not change section of ck and sbox
crypto: x86/sm4 - Fix invalid section entry size
arch/arm64/crypto/Kconfig | 2 +-
arch/arm64/crypto/sm4-ce-glue.c | 20 +-
arch/x86/configs/openeuler_defconfig | 2 +
arch/x86/crypto/Makefile | 6 +
arch/x86/crypto/sm4-aesni-avx-asm_64.S | 594 ++++++++++++++++++++++++
arch/x86/crypto/sm4-aesni-avx2-asm_64.S | 501 ++++++++++++++++++++
arch/x86/crypto/sm4-avx.h | 24 +
arch/x86/crypto/sm4_aesni_avx2_glue.c | 169 +++++++
arch/x86/crypto/sm4_aesni_avx_glue.c | 487 +++++++++++++++++++
crypto/Kconfig | 44 ++
crypto/sm4_generic.c | 180 +------
crypto/tcrypt.c | 99 +++-
crypto/testmgr.c | 29 ++
crypto/testmgr.h | 148 ++++++
include/crypto/sm4.h | 25 +-
lib/crypto/Kconfig | 3 +
lib/crypto/Makefile | 3 +
lib/crypto/sm4.c | 176 +++++++
18 files changed, 2325 insertions(+), 187 deletions(-)
create mode 100644 arch/x86/crypto/sm4-aesni-avx-asm_64.S
create mode 100644 arch/x86/crypto/sm4-aesni-avx2-asm_64.S
create mode 100644 arch/x86/crypto/sm4-avx.h
create mode 100644 arch/x86/crypto/sm4_aesni_avx2_glue.c
create mode 100644 arch/x86/crypto/sm4_aesni_avx_glue.c
create mode 100644 lib/crypto/sm4.c
--
2.30.0

28 Dec '21
Ramaxel inclusion
category: features
bugzilla: https://gitee.com/openeuler/kernel/issues/I4JXCG
CVE: NA
Changes from v2:
1. Split scmd_tmout_nonpt into two parameters:
scmd_tmout_vd/scmd_tmout_rawdisk
2. Return -ETIME instead of -EINVAL when a command times out.
3. Add one module parameter: max_io_force.
Changes from v1:
1. Add more debug info.
2. Remove some unnecessary module parameters.
3. Report disks in the order of channel/target id.
4. Add host_reset handler.
5. Use get_unaligned_be24 (see the sketch after this list).
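A minimal sketch of what item 5 above amounts to for the 6-byte CDB
case (field offsets as in the patch; the helper name is illustrative):

#include <linux/types.h>
#include <asm/unaligned.h>

/* Extract the 21-bit LBA from a READ(6)/WRITE(6) CDB.
 * Open-coded before:
 *   ((u32)cdb[1] << 16) | ((u32)cdb[2] << 8) | (u32)cdb[3]
 * With the unaligned helper:
 */
static inline u32 read6_lba(const u8 *cdb)
{
	return get_unaligned_be24(&cdb[1]) & 0x1FFFFF;
}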
Signed-off-by: Yanling Song <songyl(a)ramaxel.com>
Reviewed-by: Jiang Yu <yujiang(a)ramaxel.com>
---
drivers/scsi/spraid/Kconfig | 6 +-
drivers/scsi/spraid/spraid.h | 142 ++-
drivers/scsi/spraid/spraid_main.c | 1633 +++++++++++++++--------------
3 files changed, 968 insertions(+), 813 deletions(-)
diff --git a/drivers/scsi/spraid/Kconfig b/drivers/scsi/spraid/Kconfig
index 83962efaab07..bfbba3db8db0 100644
--- a/drivers/scsi/spraid/Kconfig
+++ b/drivers/scsi/spraid/Kconfig
@@ -5,7 +5,9 @@
config RAMAXEL_SPRAID
tristate "Ramaxel spraid Adapter"
depends on PCI && SCSI
+ select BLK_DEV_BSGLIB
depends on ARM64 || X86_64
- default m
help
- This driver supports Ramaxel spraid driver.
+ This driver supports the Ramaxel SPRxxx series
+ RAID controllers, which have a PCIe Gen4 host
+ interface and support SAS/SATA HDDs and SSDs.
diff --git a/drivers/scsi/spraid/spraid.h b/drivers/scsi/spraid/spraid.h
index da46d8e1b4b6..983d7af2faa8 100644
--- a/drivers/scsi/spraid/spraid.h
+++ b/drivers/scsi/spraid/spraid.h
@@ -1,4 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
#ifndef __SPRAID_H_
#define __SPRAID_H_
@@ -24,7 +25,7 @@
#define SENSE_SIZE(depth) ((depth) * SCSI_SENSE_BUFFERSIZE)
#define SPRAID_AQ_DEPTH 128
-#define SPRAID_NR_AEN_COMMANDS 1
+#define SPRAID_NR_AEN_COMMANDS 16
#define SPRAID_AQ_BLK_MQ_DEPTH (SPRAID_AQ_DEPTH - SPRAID_NR_AEN_COMMANDS)
#define SPRAID_AQ_MQ_TAG_DEPTH (SPRAID_AQ_BLK_MQ_DEPTH - 1)
@@ -44,7 +45,7 @@
#define SMALL_POOL_SIZE 256
#define MAX_SMALL_POOL_NUM 16
-#define MAX_CMD_PER_DEV 32
+#define MAX_CMD_PER_DEV 64
#define MAX_CDB_LEN 32
#define SPRAID_UP_TO_MULTY4(x) (((x) + 4) & (~0x03))
@@ -53,7 +54,7 @@
#define PCI_VENDOR_ID_RAMAXEL_LOGIC 0x1E81
-#define SPRAID_SERVER_DEVICE_HAB_DID 0x2100
+#define SPRAID_SERVER_DEVICE_HBA_DID 0x2100
#define SPRAID_SERVER_DEVICE_RAID_DID 0x2200
#define IO_6_DEFAULT_TX_LEN 256
@@ -142,11 +143,15 @@ enum {
enum {
SPRAID_AEN_DEV_CHANGED = 0x00,
+ SPRAID_AEN_FW_ACT_START = 0x01,
SPRAID_AEN_HOST_PROBING = 0x10,
};
enum {
- SPRAID_AEN_TIMESYN = 0x07
+ SPRAID_AEN_TIMESYN = 0x00,
+ SPRAID_AEN_FW_ACT_FINISH = 0x02,
+ SPRAID_AEN_EVENT_MIN = 0x80,
+ SPRAID_AEN_EVENT_MAX = 0xff,
};
enum {
@@ -175,6 +180,16 @@ enum spraid_state {
SPRAID_DEAD,
};
+enum {
+ SPRAID_CARD_HBA,
+ SPRAID_CARD_RAID,
+};
+
+enum spraid_cmd_type {
+ SPRAID_CMD_ADM,
+ SPRAID_CMD_IOPT,
+};
+
struct spraid_completion {
__le32 result;
union {
@@ -217,8 +232,6 @@ struct spraid_dev {
struct dma_pool *prp_page_pool;
struct dma_pool *prp_small_pool[MAX_SMALL_POOL_NUM];
mempool_t *iod_mempool;
- struct blk_mq_tag_set admin_tagset;
- struct request_queue *admin_q;
void __iomem *bar;
u32 max_qid;
u32 num_vecs;
@@ -232,23 +245,27 @@ struct spraid_dev {
u32 ctrl_config;
u32 online_queues;
u64 cap;
- struct device ctrl_device;
- struct cdev cdev;
int instance;
struct spraid_ctrl_info *ctrl_info;
struct spraid_dev_info *devices;
- struct spraid_ioq_ptcmd *ioq_ptcmds;
+ struct spraid_cmd *adm_cmds;
+ struct list_head adm_cmd_list;
+ spinlock_t adm_cmd_lock;
+
+ struct spraid_cmd *ioq_ptcmds;
struct list_head ioq_pt_list;
spinlock_t ioq_pt_lock;
- struct work_struct aen_work;
struct work_struct scan_work;
struct work_struct timesyn_work;
struct work_struct reset_work;
+ struct work_struct fw_act_work;
enum spraid_state state;
spinlock_t state_lock;
+
+ struct request_queue *bsg_queue;
};
struct spraid_sgl_desc {
@@ -347,6 +364,35 @@ struct spraid_get_info {
__u32 rsvd12[4];
};
+struct spraid_usr_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ union {
+ struct {
+ __le16 subopcode;
+ __le16 rsvd1;
+ } info_0;
+ __le32 cdw2;
+ };
+ union {
+ struct {
+ __le16 data_len;
+ __le16 param_len;
+ } info_1;
+ __le32 cdw3;
+ };
+ __u64 metadata;
+ union spraid_data_ptr dptr;
+ __le32 cdw10;
+ __le32 cdw11;
+ __le32 cdw12;
+ __le32 cdw13;
+ __le32 cdw14;
+ __le32 cdw15;
+};
+
enum {
SPRAID_CMD_FLAG_SGL_METABUF = (1 << 6),
SPRAID_CMD_FLAG_SGL_METASEG = (1 << 7),
@@ -393,6 +439,7 @@ struct spraid_admin_command {
struct spraid_get_info get_info;
struct spraid_abort_cmd abort;
struct spraid_reset_cmd reset;
+ struct spraid_usr_cmd usr_cmd;
};
};
@@ -456,9 +503,6 @@ struct spraid_ioq_command {
};
};
-#define SPRAID_IOCTL_RESET_CMD _IOWR('N', 0x80, struct spraid_passthru_common_cmd)
-#define SPRAID_IOCTL_ADMIN_CMD _IOWR('N', 0x41, struct spraid_passthru_common_cmd)
-
struct spraid_passthru_common_cmd {
__u8 opcode;
__u8 flags;
@@ -494,8 +538,6 @@ struct spraid_passthru_common_cmd {
__u32 result1;
};
-#define SPRAID_IOCTL_IOQ_CMD _IOWR('N', 0x42, struct spraid_ioq_passthru_cmd)
-
struct spraid_ioq_passthru_cmd {
__u8 opcode;
__u8 flags;
@@ -560,7 +602,21 @@ struct spraid_ioq_passthru_cmd {
__u32 result1;
};
-struct spraid_ioq_ptcmd {
+struct spraid_bsg_request {
+ u32 msgcode;
+ u32 control;
+ union {
+ struct spraid_passthru_common_cmd admcmd;
+ struct spraid_ioq_passthru_cmd ioqcmd;
+ };
+};
+
+enum {
+ SPRAID_BSG_ADM,
+ SPRAID_BSG_IOQ,
+};
+
+struct spraid_cmd {
int qid;
int cid;
u32 result0;
@@ -572,14 +628,6 @@ struct spraid_ioq_ptcmd {
struct list_head list;
};
-struct spraid_admin_request {
- struct spraid_admin_command *cmd;
- u32 result0;
- u32 result1;
- u16 flags;
- u16 status;
-};
-
struct spraid_queue {
struct spraid_dev *hdev;
spinlock_t sq_lock; /* spinlock for lock handling */
@@ -607,7 +655,6 @@ struct spraid_queue {
};
struct spraid_iod {
- struct spraid_admin_request req;
struct spraid_queue *spraidq;
enum spraid_cmd_state state;
int npages;
@@ -623,13 +670,51 @@ struct spraid_iod {
};
#define SPRAID_DEV_INFO_ATTR_BOOT(attr) ((attr) & 0x01)
-#define SPRAID_DEV_INFO_ATTR_HDD(attr) ((attr) & 0x02)
+#define SPRAID_DEV_INFO_ATTR_VD(attr) (((attr) & 0x02) == 0x0)
#define SPRAID_DEV_INFO_ATTR_PT(attr) (((attr) & 0x22) == 0x02)
#define SPRAID_DEV_INFO_ATTR_RAWDISK(attr) ((attr) & 0x20)
#define SPRAID_DEV_INFO_FLAG_VALID(flag) ((flag) & 0x01)
#define SPRAID_DEV_INFO_FLAG_CHANGE(flag) ((flag) & 0x02)
+#define BGTASK_TYPE_REBUILD 4
+#define USR_CMD_READ 0xc2
+#define USR_CMD_RDLEN 0x1000
+#define USR_CMD_VDINFO 0x704
+#define USR_CMD_BGTASK 0x504
+#define VDINFO_PARAM_LEN 0x04
+
+struct spraid_vd_info {
+ __u8 name[32];
+ __le16 id;
+ __u8 rg_id;
+ __u8 rg_level;
+ __u8 sg_num;
+ __u8 sg_disk_num;
+ __u8 vd_status;
+ __u8 vd_type;
+ __u8 rsvd1[4056];
+};
+
+#define MAX_REALTIME_BGTASK_NUM 32
+
+struct bgtask_info {
+ __u8 type;
+ __u8 progress;
+ __u8 rate;
+ __u8 rsvd0;
+ __le16 vd_id;
+ __le16 time_left;
+ __u8 rsvd1[4];
+};
+
+struct spraid_bgtask {
+ __u8 sw;
+ __u8 task_num;
+ __u8 rsvd[6];
+ struct bgtask_info bgtask[MAX_REALTIME_BGTASK_NUM];
+};
+
struct spraid_dev_info {
__le32 hdid;
__le16 target;
@@ -649,6 +734,11 @@ struct spraid_dev_list {
struct spraid_sdev_hostdata {
u32 hdid;
+ u16 max_io_kb;
+ u8 attr;
+ u8 flag;
+ u8 rg_id;
+ u8 rsvd[3];
};
#endif
diff --git a/drivers/scsi/spraid/spraid_main.c b/drivers/scsi/spraid/spraid_main.c
index a0a75ecb0027..c6d2a0b8e35e 100644
--- a/drivers/scsi/spraid/spraid_main.c
+++ b/drivers/scsi/spraid/spraid_main.c
@@ -1,8 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
-/*
- * Linux spraid device driver
- * Copyright(c) 2021 Ramaxel Memory Technology, Ltd
- */
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
+
+/* Ramaxel Raid SPXXX Series Linux Driver */
+
#define pr_fmt(fmt) "spraid: " fmt
#include <linux/sched/signal.h>
@@ -23,6 +23,9 @@
#include <linux/debugfs.h>
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/blkdev.h>
+#include <linux/bsg-lib.h>
+#include <asm/unaligned.h>
+#include <linux/sort.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
@@ -31,27 +34,24 @@
#include <scsi/scsi_transport.h>
#include <scsi/scsi_dbg.h>
+
#include "spraid.h"
static u32 admin_tmout = 60;
module_param(admin_tmout, uint, 0644);
MODULE_PARM_DESC(admin_tmout, "admin commands timeout (seconds)");
-static u32 scmd_tmout_pt = 30;
-module_param(scmd_tmout_pt, uint, 0644);
-MODULE_PARM_DESC(scmd_tmout_pt, "scsi commands timeout for passthrough(seconds)");
+static u32 scmd_tmout_rawdisk = 180;
+module_param(scmd_tmout_rawdisk, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_rawdisk, "scsi commands timeout for rawdisk(seconds)");
-static u32 scmd_tmout_nonpt = 180;
-module_param(scmd_tmout_nonpt, uint, 0644);
-MODULE_PARM_DESC(scmd_tmout_nonpt, "scsi commands timeout for rawdisk&raid(seconds)");
+static u32 scmd_tmout_vd = 180;
+module_param(scmd_tmout_vd, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_vd, "scsi commands timeout for vd(seconds)");
-static u32 wait_abl_tmout = 3;
-module_param(wait_abl_tmout, uint, 0644);
-MODULE_PARM_DESC(wait_abl_tmout, "wait abnormal io timeout(seconds)");
-
-static bool use_sgl_force;
-module_param(use_sgl_force, bool, 0644);
-MODULE_PARM_DESC(use_sgl_force, "force IO use sgl format, default false");
+static bool max_io_force;
+module_param(max_io_force, bool, 0644);
+MODULE_PARM_DESC(max_io_force, "force max_hw_sectors_kb = 1024, default false(performance first)");
static int ioq_depth_set(const char *val, const struct kernel_param *kp);
static const struct kernel_param_ops ioq_depth_ops = {
@@ -106,16 +106,20 @@ static const struct kernel_param_ops small_pool_num_ops = {
.get = param_get_byte,
};
+/* It was found that the spinlock of a single pool is contended
+ * heavily when multiple CPUs submit I/O, so multiple pools are
+ * introduced to reduce the contention.
+ */
static unsigned char small_pool_num = 4;
module_param_cb(small_pool_num, &small_pool_num_ops, &small_pool_num, 0644);
MODULE_PARM_DESC(small_pool_num, "set prp small pool num, default 4, MAX 16");
static void spraid_free_queue(struct spraid_queue *spraidq);
static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result);
-static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result);
+static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result, u32 result1);
static DEFINE_IDA(spraid_instance_ida);
-static dev_t spraid_chr_devt;
+
static struct class *spraid_class;
#define SPRAID_CAP_TIMEOUT_UNIT_MS (HZ / 2)
@@ -131,9 +135,8 @@ static struct workqueue_struct *spraid_wq;
#define SPRAID_DRV_VERSION "1.0.0.0"
#define ADMIN_TIMEOUT (admin_tmout * HZ)
-#define ADMIN_ERR_TIMEOUT 32757
-#define SPRAID_WAIT_ABNL_CMD_TIMEOUT (wait_abl_tmout * 2)
+#define SPRAID_WAIT_ABNL_CMD_TIMEOUT (3 * 2)
#define SPRAID_DMA_MSK_BIT_MAX 64
@@ -147,6 +150,13 @@ enum FW_STAT_CODE {
FW_STAT_NEED_RETRY
};
+static const char * const raid_levels[] = {"0", "1", "5", "6", "10", "50", "60", "NA"};
+
+static const char * const raid_states[] = {
+ "NA", "NORMAL", "FAULT", "DEGRADE", "NOT_FORMATTED", "FORMATTING", "SANITIZING",
+ "INITIALIZING", "INITIALIZE_FAIL", "DELETING", "DELETE_FAIL", "WRITE_PROTECT"
+};
+
static int ioq_depth_set(const char *val, const struct kernel_param *kp)
{
int n = 0;
@@ -231,12 +241,6 @@ static int spraid_pci_enable(struct spraid_dev *hdev)
goto disable;
}
- ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
- if (ret < 0) {
- dev_err(hdev->dev, "Allocate one IRQ for setup admin channel failed\n");
- goto disable;
- }
-
hdev->cap = lo_hi_readq(hdev->bar + SPRAID_REG_CAP);
hdev->ioq_depth = min_t(u32, SPRAID_CAP_MQES(hdev->cap) + 1, io_queue_depth);
hdev->db_stride = 1 << SPRAID_CAP_STRIDE(hdev->cap);
@@ -246,13 +250,20 @@ static int spraid_pci_enable(struct spraid_dev *hdev)
dev_err(hdev->dev, "err, dma mask invalid[%llu], set to default\n", maskbit);
maskbit = SPRAID_DMA_MSK_BIT_MAX;
}
- if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(maskbit))) {
+ if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(maskbit)) &&
+ dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32))) {
dev_err(hdev->dev, "set dma mask and coherent failed\n");
goto disable;
}
dev_info(hdev->dev, "set dma mask[%llu] success\n", maskbit);
+ ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
+ if (ret < 0) {
+ dev_err(hdev->dev, "Allocate one IRQ for setup admin channel failed\n");
+ goto disable;
+ }
+
pci_enable_pcie_error_reporting(pdev);
pci_save_state(pdev);
@@ -263,12 +274,6 @@ static int spraid_pci_enable(struct spraid_dev *hdev)
return ret;
}
-static inline
-struct spraid_admin_request *spraid_admin_req(struct request *req)
-{
- return blk_mq_rq_to_pdu(req);
-}
-
static int spraid_npages_prp(u32 size, struct spraid_dev *hdev)
{
u32 nprps = DIV_ROUND_UP(size + hdev->page_size, hdev->page_size);
@@ -419,7 +424,7 @@ static void spraid_submit_cmd(struct spraid_queue *spraidq, const void *cmd)
writel(spraidq->sq_tail, spraidq->q_db);
spin_unlock_irqrestore(&spraidq->sq_lock, flags);
- dev_log_dbg(spraidq->hdev->dev, "cid[%d], qid[%d], opcode[0x%x], flags[0x%x], hdid[%u]\n",
+ dev_log_dbg(spraidq->hdev->dev, "cid[%d] qid[%d], opcode[0x%x], flags[0x%x], hdid[%u]\n",
acd->command_id, spraidq->qid, acd->opcode, acd->flags, le32_to_cpu(acd->hdid));
}
@@ -605,18 +610,15 @@ static void spraid_setup_rw_cmd(struct spraid_dev *hdev,
if (scmd->cmd_len == 6) {
datalength = (u32)(scmd->cmnd[4] == 0 ?
IO_6_DEFAULT_TX_LEN : scmd->cmnd[4]);
- start_lba_lo = ((u32)scmd->cmnd[1] << 16) |
- ((u32)scmd->cmnd[2] << 8) | (u32)scmd->cmnd[3];
+ start_lba_lo = (u32)get_unaligned_be24(&scmd->cmnd[1]);
start_lba_lo &= 0x1FFFFF;
}
/* 10-byte READ(0x28) or WRITE(0x2A) cdb */
else if (scmd->cmd_len == 10) {
- datalength = (u32)scmd->cmnd[8] | ((u32)scmd->cmnd[7] << 8);
- start_lba_lo = ((u32)scmd->cmnd[2] << 24) |
- ((u32)scmd->cmnd[3] << 16) |
- ((u32)scmd->cmnd[4] << 8) | (u32)scmd->cmnd[5];
+ datalength = (u32)get_unaligned_be16(&scmd->cmnd[7]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
if (scmd->cmnd[1] & FUA_MASK)
control |= SPRAID_RW_FUA;
@@ -624,42 +626,26 @@ static void spraid_setup_rw_cmd(struct spraid_dev *hdev,
/* 12-byte READ(0xA8) or WRITE(0xAA) cdb */
else if (scmd->cmd_len == 12) {
- datalength = ((u32)scmd->cmnd[6] << 24) |
- ((u32)scmd->cmnd[7] << 16) |
- ((u32)scmd->cmnd[8] << 8) | (u32)scmd->cmnd[9];
- start_lba_lo = ((u32)scmd->cmnd[2] << 24) |
- ((u32)scmd->cmnd[3] << 16) |
- ((u32)scmd->cmnd[4] << 8) | (u32)scmd->cmnd[5];
+ datalength = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
if (scmd->cmnd[1] & FUA_MASK)
control |= SPRAID_RW_FUA;
}
/* 16-byte READ(0x88) or WRITE(0x8A) cdb */
else if (scmd->cmd_len == 16) {
- datalength = ((u32)scmd->cmnd[10] << 24) |
- ((u32)scmd->cmnd[11] << 16) |
- ((u32)scmd->cmnd[12] << 8) | (u32)scmd->cmnd[13];
- start_lba_lo = ((u32)scmd->cmnd[6] << 24) |
- ((u32)scmd->cmnd[7] << 16) |
- ((u32)scmd->cmnd[8] << 8) | (u32)scmd->cmnd[9];
- start_lba_hi = ((u32)scmd->cmnd[2] << 24) |
- ((u32)scmd->cmnd[3] << 16) |
- ((u32)scmd->cmnd[4] << 8) | (u32)scmd->cmnd[5];
+ datalength = get_unaligned_be32(&scmd->cmnd[10]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[2]);
if (scmd->cmnd[1] & FUA_MASK)
control |= SPRAID_RW_FUA;
}
/* 32-byte READ(0x88) or WRITE(0x8A) cdb */
else if (scmd->cmd_len == 32) {
- datalength = ((u32)scmd->cmnd[28] << 24) |
- ((u32)scmd->cmnd[29] << 16) |
- ((u32)scmd->cmnd[30] << 8) | (u32)scmd->cmnd[31];
- start_lba_lo = ((u32)scmd->cmnd[16] << 24) |
- ((u32)scmd->cmnd[17] << 16) |
- ((u32)scmd->cmnd[18] << 8) | (u32)scmd->cmnd[19];
- start_lba_hi = ((u32)scmd->cmnd[12] << 24) |
- ((u32)scmd->cmnd[13] << 16) |
- ((u32)scmd->cmnd[14] << 8) | (u32)scmd->cmnd[15];
+ datalength = get_unaligned_be32(&scmd->cmnd[28]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[16]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[12]);
if (scmd->cmnd[10] & FUA_MASK)
control |= SPRAID_RW_FUA;
@@ -814,7 +800,7 @@ static void spraid_map_status(struct spraid_iod *iod, struct scsi_cmnd *scmd,
if (scmd->result & SAM_STAT_CHECK_CONDITION) {
memset(scmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
memcpy(scmd->sense_buffer, iod->sense, SCSI_SENSE_BUFFERSIZE);
- set_driver_byte(scmd, DRIVER_SENSE);
+ scmd->result = (scmd->result & 0x00ffffff) | (DRIVER_SENSE << 24);
}
break;
case FW_STAT_ABORTED:
@@ -825,6 +811,8 @@ static void spraid_map_status(struct spraid_iod *iod, struct scsi_cmnd *scmd,
break;
default:
set_host_byte(scmd, DID_BAD_TARGET);
+ dev_warn(iod->spraidq->hdev->dev, "[%s] cid[%d] qid[%d] bad status[0x%x]\n",
+ __func__, cqe->cmd_id, le16_to_cpu(cqe->sq_id), le16_to_cpu(cqe->status));
break;
}
}
@@ -850,14 +838,13 @@ static int spraid_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
int ret;
if (unlikely(!scmd)) {
- dev_err(hdev->dev, "err, scmd is null, return 0\n");
+ dev_err(hdev->dev, "err, scmd is null\n");
return 0;
}
if (unlikely(hdev->state != SPRAID_LIVE)) {
set_host_byte(scmd, DID_NO_CONNECT);
scmd->scsi_done(scmd);
- dev_err(hdev->dev, "[%s] err, hdev state is not live\n", __func__);
return 0;
}
@@ -894,7 +881,7 @@ static int spraid_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
WRITE_ONCE(iod->state, SPRAID_CMD_IN_FLIGHT);
spraid_submit_cmd(ioq, &ioq_cmd);
elapsed = jiffies - scmd->jiffies_at_alloc;
- dev_log_dbg(hdev->dev, "cid[%d], qid[%d] submit IO cost %3ld.%3ld seconds\n",
+ dev_log_dbg(hdev->dev, "cid[%d] qid[%d] submit IO cost %3ld.%3ld seconds\n",
cid, hwq, elapsed / HZ, elapsed % HZ);
return 0;
@@ -945,6 +932,10 @@ static int spraid_slave_alloc(struct scsi_device *sdev)
scan_host:
hostdata->hdid = le32_to_cpu(hdev->devices[idx].hdid);
+ hostdata->max_io_kb = le16_to_cpu(hdev->devices[idx].max_io_kb);
+ hostdata->attr = hdev->devices[idx].attr;
+ hostdata->flag = hdev->devices[idx].flag;
+ hostdata->rg_id = 0xff;
sdev->hostdata = hostdata;
up_read(&hdev->devices_rwsem);
return 0;
@@ -958,30 +949,17 @@ static void spraid_slave_destroy(struct scsi_device *sdev)
static int spraid_slave_configure(struct scsi_device *sdev)
{
- u16 idx;
- unsigned int timeout = scmd_tmout_nonpt * HZ;
+ unsigned int timeout = scmd_tmout_rawdisk * HZ;
struct spraid_dev *hdev = shost_priv(sdev->host);
struct spraid_sdev_hostdata *hostdata = sdev->hostdata;
u32 max_sec = sdev->host->max_sectors;
- if (!hostdata) {
- idx = hostdata->hdid - 1;
- if (sdev->channel == hdev->devices[idx].channel &&
- sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
- sdev->lun < hdev->devices[idx].lun) {
- if (SPRAID_DEV_INFO_ATTR_PT(hdev->devices[idx].attr))
- timeout = scmd_tmout_pt * HZ;
- else
- timeout = scmd_tmout_nonpt * HZ;
- max_sec = le16_to_cpu(hdev->devices[idx].max_io_kb) << 1;
- } else {
- dev_err(hdev->dev, "[%s] err, sdev->channel:id:lun[%d:%d:%lld];"
- "devices[%d], channel:target:lun[%d:%d:%d]\n",
- __func__, sdev->channel, sdev->id, sdev->lun,
- idx, hdev->devices[idx].channel,
- hdev->devices[idx].target,
- hdev->devices[idx].lun);
- }
+ if (hostdata) {
+ if (SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ timeout = scmd_tmout_vd * HZ;
+ else if (SPRAID_DEV_INFO_ATTR_RAWDISK(hostdata->attr))
+ timeout = scmd_tmout_rawdisk * HZ;
+ max_sec = hostdata->max_io_kb << 1;
} else {
dev_err(hdev->dev, "[%s] err, sdev->hostdata is null\n", __func__);
}
@@ -991,7 +969,9 @@ static int spraid_slave_configure(struct scsi_device *sdev)
if ((max_sec == 0) || (max_sec > sdev->host->max_sectors))
max_sec = sdev->host->max_sectors;
- blk_queue_max_hw_sectors(sdev->request_queue, max_sec);
+
+ if (!max_io_force)
+ blk_queue_max_hw_sectors(sdev->request_queue, max_sec);
dev_info(hdev->dev, "[%s] sdev->channel:id:lun[%d:%d:%lld], scmd_timeout[%d]s, maxsec[%d]\n",
__func__, sdev->channel, sdev->id, sdev->lun, timeout / HZ, max_sec);
@@ -1176,6 +1156,75 @@ static inline bool spraid_cqe_pending(struct spraid_queue *spraidq)
spraidq->cq_phase;
}
+static void spraid_sata_report_zone_handle(struct scsi_cmnd *scmd, struct spraid_iod *iod)
+{
+ int i = 0;
+ unsigned int bytes = 0;
+ struct scatterlist *sg = scsi_sglist(scmd);
+
+ scsi_for_each_sg(scmd, sg, iod->nsge, i) {
+ unsigned int offset = 0;
+
+ if (bytes == 0) {
+ char *hdr;
+ u32 list_length;
+ u64 max_lba, opt_lba;
+ u16 same;
+
+ hdr = sg_virt(sg);
+
+ list_length = get_unaligned_le32(&hdr[0]);
+ same = get_unaligned_le16(&hdr[4]);
+ max_lba = get_unaligned_le64(&hdr[8]);
+ opt_lba = get_unaligned_le64(&hdr[16]);
+ put_unaligned_be32(list_length, &hdr[0]);
+ hdr[4] = same & 0xf;
+ put_unaligned_be64(max_lba, &hdr[8]);
+ put_unaligned_be64(opt_lba, &hdr[16]);
+ offset += 64;
+ bytes += 64;
+ }
+ while (offset < sg_dma_len(sg)) {
+ char *rec;
+ u8 cond, type, non_seq, reset;
+ u64 size, start, wp;
+
+ rec = sg_virt(sg) + offset;
+ type = rec[0] & 0xf;
+ cond = (rec[1] >> 4) & 0xf;
+ non_seq = (rec[1] & 2);
+ reset = (rec[1] & 1);
+ size = get_unaligned_le64(&rec[8]);
+ start = get_unaligned_le64(&rec[16]);
+ wp = get_unaligned_le64(&rec[24]);
+ rec[0] = type;
+ rec[1] = (cond << 4) | non_seq | reset;
+ put_unaligned_be64(size, &rec[8]);
+ put_unaligned_be64(start, &rec[16]);
+ put_unaligned_be64(wp, &rec[24]);
+ WARN_ON(offset + 64 > sg_dma_len(sg));
+ offset += 64;
+ bytes += 64;
+ }
+ }
+}
+
+static inline void spraid_handle_ata_cmd(struct spraid_dev *hdev, struct scsi_cmnd *scmd,
+ struct spraid_iod *iod)
+{
+ if (hdev->ctrl_info->card_type != SPRAID_CARD_HBA)
+ return;
+
+ switch (scmd->cmnd[0]) {
+ case ZBC_IN:
+ dev_info(hdev->dev, "[%s] process report zone\n", __func__);
+ spraid_sata_report_zone_handle(scmd, iod);
+ break;
+ default:
+ break;
+ }
+}
+
static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq, struct spraid_completion *cqe)
{
struct spraid_dev *hdev = ioq->hdev;
@@ -1197,12 +1246,12 @@ static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq, struct spraid_com
iod = scsi_cmd_priv(scmd);
elapsed = jiffies - scmd->jiffies_at_alloc;
- dev_log_dbg(hdev->dev, "cid[%d], qid[%d] finish IO cost %3ld.%3ld seconds\n",
+ dev_log_dbg(hdev->dev, "cid[%d] qid[%d] finish IO cost %3ld.%3ld seconds\n",
cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
if (cmpxchg(&iod->state, SPRAID_CMD_IN_FLIGHT, SPRAID_CMD_COMPLETE) !=
SPRAID_CMD_IN_FLIGHT) {
- dev_warn(hdev->dev, "cid[%d], qid[%d] enters abnormal handler, cost %3ld.%3ld seconds\n",
+ dev_warn(hdev->dev, "cid[%d] qid[%d] enters abnormal handler, cost %3ld.%3ld seconds\n",
cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
WRITE_ONCE(iod->state, SPRAID_CMD_TMO_COMPLETE);
@@ -1215,6 +1264,8 @@ static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq, struct spraid_com
return;
}
+ spraid_handle_ata_cmd(hdev, scmd, iod);
+
spraid_map_status(iod, scmd, cqe);
if (iod->nsge) {
iod->nsge = 0;
@@ -1224,38 +1275,36 @@ static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq, struct spraid_com
scmd->scsi_done(scmd);
}
-static inline void spraid_end_admin_request(struct request *req, __le16 status,
- __le32 result0, __le32 result1)
-{
- struct spraid_admin_request *rq = spraid_admin_req(req);
-
- rq->status = le16_to_cpu(status) >> 1;
- rq->result0 = le32_to_cpu(result0);
- rq->result1 = le32_to_cpu(result1);
- blk_mq_complete_request(req);
-}
-
static void spraid_complete_adminq_cmnd(struct spraid_queue *adminq, struct spraid_completion *cqe)
{
- struct blk_mq_tags *tags = adminq->hdev->admin_tagset.tags[0];
- struct request *req;
+ struct spraid_dev *hdev = adminq->hdev;
+ struct spraid_cmd *adm_cmd;
- req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
- if (unlikely(!req)) {
+ adm_cmd = hdev->adm_cmds + cqe->cmd_id;
+ if (unlikely(adm_cmd->state == SPRAID_CMD_IDLE)) {
dev_warn(adminq->hdev->dev, "Invalid id %d completed on queue %d\n",
cqe->cmd_id, le16_to_cpu(cqe->sq_id));
return;
}
- spraid_end_admin_request(req, cqe->status, cqe->result, cqe->result1);
+
+ adm_cmd->status = le16_to_cpu(cqe->status) >> 1;
+ adm_cmd->result0 = le32_to_cpu(cqe->result);
+ adm_cmd->result1 = le32_to_cpu(cqe->result1);
+
+ complete(&adm_cmd->cmd_done);
}
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid);
+
static void spraid_complete_aen(struct spraid_queue *spraidq, struct spraid_completion *cqe)
{
struct spraid_dev *hdev = spraidq->hdev;
u32 result = le32_to_cpu(cqe->result);
- dev_info(hdev->dev, "rcv aen, status[%x], result[%x]\n",
- le16_to_cpu(cqe->status) >> 1, result);
+ dev_info(hdev->dev, "rcv aen, cid[%d], status[0x%x], result[0x%x]\n",
+ cqe->cmd_id, le16_to_cpu(cqe->status) >> 1, result);
+
+ spraid_send_aen(hdev, cqe->cmd_id);
if ((le16_to_cpu(cqe->status) >> 1) != SPRAID_SC_SUCCESS)
return;
@@ -1264,22 +1313,19 @@ static void spraid_complete_aen(struct spraid_queue *spraidq, struct spraid_comp
spraid_handle_aen_notice(hdev, result);
break;
case SPRAID_AEN_VS:
- spraid_handle_aen_vs(hdev, result);
+ spraid_handle_aen_vs(hdev, result, le32_to_cpu(cqe->result1));
break;
default:
dev_warn(hdev->dev, "Unsupported async event type: %u\n",
result & 0x7);
break;
}
- queue_work(spraid_wq, &hdev->aen_work);
}
-static void spraid_put_ioq_ptcmd(struct spraid_dev *hdev, struct spraid_ioq_ptcmd *cmd);
-
static void spraid_complete_ioq_sync_cmnd(struct spraid_queue *ioq, struct spraid_completion *cqe)
{
struct spraid_dev *hdev = ioq->hdev;
- struct spraid_ioq_ptcmd *ptcmd;
+ struct spraid_cmd *ptcmd;
ptcmd = hdev->ioq_ptcmds + (ioq->qid - 1) * SPRAID_PTCMDS_PERQ +
cqe->cmd_id - SPRAID_IO_BLK_MQ_DEPTH;
@@ -1289,8 +1335,6 @@ static void spraid_complete_ioq_sync_cmnd(struct spraid_queue *ioq, struct sprai
ptcmd->result1 = le32_to_cpu(cqe->result1);
complete(&ptcmd->cmd_done);
-
- spraid_put_ioq_ptcmd(hdev, ptcmd);
}
static inline void spraid_handle_cqe(struct spraid_queue *spraidq, u16 idx)
@@ -1304,7 +1348,7 @@ static inline void spraid_handle_cqe(struct spraid_queue *spraidq, u16 idx)
return;
}
- dev_log_dbg(hdev->dev, "cid[%d], qid[%d], result[0x%x], sq_id[%d], status[0x%x]\n",
+ dev_log_dbg(hdev->dev, "cid[%d] qid[%d], result[0x%x], sq_id[%d], status[0x%x]\n",
cqe->cmd_id, spraidq->qid, le32_to_cpu(cqe->result),
le16_to_cpu(cqe->sq_id), le16_to_cpu(cqe->status));
@@ -1452,62 +1496,119 @@ static u32 spraid_bar_size(struct spraid_dev *hdev, u32 nr_ioqs)
return (SPRAID_REG_DBS + ((nr_ioqs + 1) * 8 * hdev->db_stride));
}
-static inline void spraid_clear_spraid_request(struct request *req)
+static int spraid_alloc_admin_cmds(struct spraid_dev *hdev)
{
- if (!(req->rq_flags & RQF_DONTPREP)) {
- spraid_admin_req(req)->flags = 0;
- req->rq_flags |= RQF_DONTPREP;
+ int i;
+
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+ spin_lock_init(&hdev->adm_cmd_lock);
+
+ hdev->adm_cmds = kcalloc_node(SPRAID_AQ_BLK_MQ_DEPTH, sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
+
+ if (!hdev->adm_cmds) {
+ dev_err(hdev->dev, "Alloc admin cmds failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < SPRAID_AQ_BLK_MQ_DEPTH; i++) {
+ hdev->adm_cmds[i].qid = 0;
+ hdev->adm_cmds[i].cid = i;
+ list_add_tail(&(hdev->adm_cmds[i].list), &hdev->adm_cmd_list);
}
+
+ dev_info(hdev->dev, "Alloc admin cmds success, num[%d]\n", SPRAID_AQ_BLK_MQ_DEPTH);
+
+ return 0;
}
-static struct request *spraid_alloc_admin_request(struct request_queue *q,
- struct spraid_admin_command *cmd,
- blk_mq_req_flags_t flags)
+static void spraid_free_admin_cmds(struct spraid_dev *hdev)
{
- u32 op = COMMAND_IS_WRITE(cmd) ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
- struct request *req;
+ kfree(hdev->adm_cmds);
+ hdev->adm_cmds = NULL;
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+}
+
+static struct spraid_cmd *spraid_get_cmd(struct spraid_dev *hdev, enum spraid_cmd_type type)
+{
+ struct spraid_cmd *cmd = NULL;
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
+
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
+
+ spin_lock_irqsave(slock, flags);
+ if (list_empty(head)) {
+ spin_unlock_irqrestore(slock, flags);
+ dev_err(hdev->dev, "err, cmd[%d] list empty\n", type);
+ return NULL;
+ }
+ cmd = list_entry(head->next, struct spraid_cmd, list);
+ list_del_init(&cmd->list);
+ spin_unlock_irqrestore(slock, flags);
- req = blk_mq_alloc_request(q, op, flags);
- if (IS_ERR(req))
- return req;
- req->cmd_flags |= REQ_FAILFAST_DRIVER;
- spraid_clear_spraid_request(req);
- spraid_admin_req(req)->cmd = cmd;
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IN_FLIGHT);
- return req;
+ return cmd;
}
-static int spraid_submit_admin_sync_cmd(struct request_queue *q,
- struct spraid_admin_command *cmd,
- u32 *result, void *buffer,
- u32 bufflen, u32 timeout, int at_head, blk_mq_req_flags_t flags)
+static void spraid_put_cmd(struct spraid_dev *hdev, struct spraid_cmd *cmd,
+ enum spraid_cmd_type type)
{
- struct request *req;
- int ret;
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
- req = spraid_alloc_admin_request(q, cmd, flags);
- if (IS_ERR(req))
- return PTR_ERR(req);
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
- req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
- if (buffer && bufflen) {
- ret = blk_rq_map_kern(q, req, buffer, bufflen, GFP_KERNEL);
- if (ret)
- goto out;
+ spin_lock_irqsave(slock, flags);
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IDLE);
+ list_add_tail(&cmd->list, head);
+ spin_unlock_irqrestore(slock, flags);
+}
+
+
+static int spraid_submit_admin_sync_cmd(struct spraid_dev *hdev, struct spraid_admin_command *cmd,
+ u32 *result0, u32 *result1, u32 timeout)
+{
+ struct spraid_cmd *adm_cmd = spraid_get_cmd(hdev, SPRAID_CMD_ADM);
+
+ if (!adm_cmd) {
+ dev_err(hdev->dev, "err, get admin cmd failed\n");
+ return -EFAULT;
}
- blk_execute_rq(req->q, NULL, req, at_head);
- if (result)
- *result = spraid_admin_req(req)->result0;
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
- if (spraid_admin_req(req)->flags & SPRAID_REQ_CANCELLED)
- ret = -EINTR;
- else
- ret = spraid_admin_req(req)->status;
+ init_completion(&adm_cmd->cmd_done);
-out:
- blk_mq_free_request(req);
- return ret;
+ cmd->common.command_id = adm_cmd->cid;
+ spraid_submit_cmd(&hdev->queues[0], cmd);
+
+ if (!wait_for_completion_timeout(&adm_cmd->cmd_done, timeout)) {
+ dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout, opcode[0x%x] subopcode[0x%x]\n",
+ __func__, adm_cmd->cid, adm_cmd->qid, cmd->usr_cmd.opcode,
+ cmd->usr_cmd.info_0.subopcode);
+ WRITE_ONCE(adm_cmd->state, SPRAID_CMD_TIMEOUT);
+ spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+ return -ETIME;
+ }
+
+ if (result0)
+ *result0 = adm_cmd->result0;
+ if (result1)
+ *result1 = adm_cmd->result1;
+
+ spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+
+ return adm_cmd->status;
}
static int spraid_create_cq(struct spraid_dev *hdev, u16 qid,
@@ -1524,8 +1625,7 @@ static int spraid_create_cq(struct spraid_dev *hdev, u16 qid,
admin_cmd.create_cq.cq_flags = cpu_to_le16(flags);
admin_cmd.create_cq.irq_vector = cpu_to_le16(cq_vector);
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
}
static int spraid_create_sq(struct spraid_dev *hdev, u16 qid,
@@ -1542,8 +1642,7 @@ static int spraid_create_sq(struct spraid_dev *hdev, u16 qid,
admin_cmd.create_sq.sq_flags = cpu_to_le16(flags);
admin_cmd.create_sq.cqid = cpu_to_le16(qid);
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
}
static void spraid_free_queue(struct spraid_queue *spraidq)
@@ -1581,8 +1680,7 @@ static int spraid_delete_queue(struct spraid_dev *hdev, u8 op, u16 id)
admin_cmd.delete_queue.opcode = op;
admin_cmd.delete_queue.qid = cpu_to_le16(id);
- ret = spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
if (ret)
dev_err(hdev->dev, "Delete %s:[%d] failed\n",
@@ -1663,19 +1761,28 @@ static int spraid_set_features(struct spraid_dev *hdev, u32 fid, u32 dword11, vo
size_t buflen, u32 *result)
{
struct spraid_admin_command admin_cmd;
- u32 res;
int ret;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+
+ if (buffer && buflen) {
+ data_ptr = dma_alloc_coherent(hdev->dev, buflen, &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memcpy(data_ptr, buffer, buflen);
+ }
memset(&admin_cmd, 0, sizeof(admin_cmd));
admin_cmd.features.opcode = SPRAID_ADMIN_SET_FEATURES;
admin_cmd.features.fid = cpu_to_le32(fid);
admin_cmd.features.dword11 = cpu_to_le32(dword11);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
- ret = spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, &res,
- buffer, buflen, 0, 0, 0);
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, result, NULL, 0);
- if (!ret && result)
- *result = res;
+ if (data_ptr)
+ dma_free_coherent(hdev->dev, buflen, data_ptr, data_dma);
return ret;
}
@@ -1764,8 +1871,7 @@ static int spraid_setup_io_queues(struct spraid_dev *hdev)
break;
}
dev_info(hdev->dev, "[%s] max_qid: %d, queue_count: %d, online_queue: %d, ioq_depth: %d\n",
- __func__, hdev->max_qid, hdev->queue_count,
- hdev->online_queues, hdev->ioq_depth);
+ __func__, hdev->max_qid, hdev->queue_count, hdev->online_queues, hdev->ioq_depth);
return spraid_create_io_queues(hdev);
}
@@ -1889,10 +1995,11 @@ static int spraid_get_dev_list(struct spraid_dev *hdev, struct spraid_dev_info *
u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
struct spraid_admin_command admin_cmd;
struct spraid_dev_list *list_buf;
+ dma_addr_t data_dma = 0;
u32 i, idx, hdid, ndev;
int ret = 0;
- list_buf = kmalloc(sizeof(*list_buf), GFP_KERNEL);
+ list_buf = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
if (!list_buf)
return -ENOMEM;
@@ -1901,9 +2008,9 @@ static int spraid_get_dev_list(struct spraid_dev *hdev, struct spraid_dev_info *
admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
admin_cmd.get_info.type = SPRAID_GET_INFO_DEV_LIST;
admin_cmd.get_info.cdw11 = cpu_to_le32(idx);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
- ret = spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL, list_buf,
- sizeof(*list_buf), 0, 0, 0);
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
if (ret) {
dev_err(hdev->dev, "Get device list failed, nd: %u, idx: %u, ret: %d\n",
@@ -1916,12 +2023,11 @@ static int spraid_get_dev_list(struct spraid_dev *hdev, struct spraid_dev_info *
for (i = 0; i < ndev; i++) {
hdid = le32_to_cpu(list_buf->devices[i].hdid);
- dev_info(hdev->dev, "list_buf->devices[%d], hdid: %u target: %d, channel: %d, lun: %d, attr[%x]\n",
- i, hdid,
- le16_to_cpu(list_buf->devices[i].target),
- list_buf->devices[i].channel,
- list_buf->devices[i].lun,
- list_buf->devices[i].attr);
+ dev_info(hdev->dev, "list_buf->devices[%d], hdid: %u target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ i, hdid, le16_to_cpu(list_buf->devices[i].target),
+ list_buf->devices[i].channel,
+ list_buf->devices[i].lun,
+ list_buf->devices[i].attr);
if (hdid > nd || hdid == 0) {
dev_err(hdev->dev, "err, hdid[%d] invalid\n", hdid);
continue;
@@ -1936,21 +2042,29 @@ static int spraid_get_dev_list(struct spraid_dev *hdev, struct spraid_dev_info *
}
out:
- kfree(list_buf);
+ dma_free_coherent(hdev->dev, PAGE_SIZE, list_buf, data_dma);
return ret;
}
-static void spraid_send_aen(struct spraid_dev *hdev)
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid)
{
struct spraid_queue *adminq = &hdev->queues[0];
struct spraid_admin_command admin_cmd;
memset(&admin_cmd, 0, sizeof(admin_cmd));
admin_cmd.common.opcode = SPRAID_ADMIN_ASYNC_EVENT;
- admin_cmd.common.command_id = SPRAID_AQ_BLK_MQ_DEPTH;
+ admin_cmd.common.command_id = cid;
spraid_submit_cmd(adminq, &admin_cmd);
- dev_info(hdev->dev, "send aen, cid[%d]\n", SPRAID_AQ_BLK_MQ_DEPTH);
+ dev_info(hdev->dev, "send aen, cid[%d]\n", cid);
+}
+
+static inline void spraid_send_all_aen(struct spraid_dev *hdev)
+{
+ u16 i;
+
+ for (i = 0; i < hdev->ctrl_info->aerl; i++)
+ spraid_send_aen(hdev, i + SPRAID_AQ_BLK_MQ_DEPTH);
}
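/*
 * Note: the command ids used above, SPRAID_AQ_BLK_MQ_DEPTH up to
 * SPRAID_AQ_BLK_MQ_DEPTH + aerl - 1, sit past the ids used by regular
 * admin commands, so AEN completions can be told apart; aerl itself is
 * clamped to SPRAID_NR_AEN_COMMANDS in spraid_init_ctrl_info() below.
 */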
static int spraid_add_device(struct spraid_dev *hdev, struct spraid_dev_info *device)
@@ -1958,6 +2072,10 @@ static int spraid_add_device(struct spraid_dev *hdev, struct spraid_dev_info *de
struct Scsi_Host *shost = hdev->shost;
struct scsi_device *sdev;
+ dev_info(hdev->dev, "add device, hdid: %u target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ le32_to_cpu(device->hdid), le16_to_cpu(device->target),
+ device->channel, device->lun, device->attr);
+
sdev = scsi_device_lookup(shost, device->channel, le16_to_cpu(device->target), 0);
if (sdev) {
dev_warn(hdev->dev, "Device is already exist, channel: %d, target_id: %d, lun: %d\n",
@@ -1974,9 +2092,13 @@ static int spraid_rescan_device(struct spraid_dev *hdev, struct spraid_dev_info
struct Scsi_Host *shost = hdev->shost;
struct scsi_device *sdev;
+ dev_info(hdev->dev, "rescan device, hdid: %u target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ le32_to_cpu(device->hdid), le16_to_cpu(device->target),
+ device->channel, device->lun, device->attr);
+
sdev = scsi_device_lookup(shost, device->channel, le16_to_cpu(device->target), 0);
if (!sdev) {
- dev_warn(hdev->dev, "Device is not exit, channel: %d, target_id: %d, lun: %d\n",
+ dev_warn(hdev->dev, "device is not exit rescan it, channel: %d, target_id: %d, lun: %d\n",
device->channel, le16_to_cpu(device->target), 0);
return -ENODEV;
}
@@ -1991,9 +2113,13 @@ static int spraid_remove_device(struct spraid_dev *hdev, struct spraid_dev_info
struct Scsi_Host *shost = hdev->shost;
struct scsi_device *sdev;
+ dev_info(hdev->dev, "remove device, hdid: %u target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ le32_to_cpu(org_device->hdid), le16_to_cpu(org_device->target),
+ org_device->channel, org_device->lun, org_device->attr);
+
sdev = scsi_device_lookup(shost, org_device->channel, le16_to_cpu(org_device->target), 0);
if (!sdev) {
- dev_warn(hdev->dev, "Device is not exit, channel: %d, target_id: %d, lun: %d\n",
+ dev_warn(hdev->dev, "device is not exit remove it, channel: %d, target_id: %d, lun: %d\n",
org_device->channel, le16_to_cpu(org_device->target), 0);
return -ENODEV;
}
@@ -2029,36 +2155,54 @@ static int spraid_dev_list_init(struct spraid_dev *hdev)
return 0;
}
+static int luntarget_cmp_func(const void *l, const void *r)
+{
+ const struct spraid_dev_info *ln = l;
+ const struct spraid_dev_info *rn = r;
+
+ if (ln->channel == rn->channel)
+ return le16_to_cpu(ln->target) - le16_to_cpu(rn->target);
+
+ return ln->channel - rn->channel;
+}
+
static void spraid_scan_work(struct work_struct *work)
{
struct spraid_dev *hdev =
container_of(work, struct spraid_dev, scan_work);
struct spraid_dev_info *devices, *org_devices;
+ struct spraid_dev_info *sortdevice;
u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
u8 flag, org_flag;
int i, ret;
+ int count = 0;
devices = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
if (!devices)
return;
+
+ sortdevice = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
+ if (!sortdevice)
+ goto free_list;
+
ret = spraid_get_dev_list(hdev, devices);
if (ret)
- goto free_list;
+ goto free_all;
org_devices = hdev->devices;
for (i = 0; i < nd; i++) {
org_flag = org_devices[i].flag;
flag = devices[i].flag;
- dev_log_dbg(hdev->dev, "i: %d, org_flag: 0x%x, flag: 0x%x\n",
- i, org_flag, flag);
+ dev_log_dbg(hdev->dev, "i: %d, org_flag: 0x%x, flag: 0x%x\n", i, org_flag, flag);
if (SPRAID_DEV_INFO_FLAG_VALID(flag)) {
if (!SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
down_write(&hdev->devices_rwsem);
memcpy(&org_devices[i], &devices[i],
- sizeof(struct spraid_dev_info));
+ sizeof(struct spraid_dev_info));
+ memcpy(&sortdevice[count++], &devices[i],
+ sizeof(struct spraid_dev_info));
up_write(&hdev->devices_rwsem);
- spraid_add_device(hdev, &devices[i]);
} else if (SPRAID_DEV_INFO_FLAG_CHANGE(flag)) {
spraid_rescan_device(hdev, &devices[i]);
}
@@ -2071,6 +2215,16 @@ static void spraid_scan_work(struct work_struct *work)
}
}
}
+
+ dev_info(hdev->dev, "scan work add device count = %d\n", count);
+
+ sort(sortdevice, count, sizeof(sortdevice[0]), luntarget_cmp_func, NULL);
+
+ for (i = 0; i < count; i++)
+ spraid_add_device(hdev, &sortdevice[i]);
+
+free_all:
+ kfree(sortdevice);
free_list:
kfree(devices);
}
@@ -2083,6 +2237,15 @@ static void spraid_timesyn_work(struct work_struct *work)
spraid_configure_timestamp(hdev);
}
+static int spraid_init_ctrl_info(struct spraid_dev *hdev);
+static void spraid_fw_act_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev = container_of(work, struct spraid_dev, fw_act_work);
+
+ if (spraid_init_ctrl_info(hdev))
+ dev_err(hdev->dev, "get ctrl info failed after fw act\n");
+}
+
static void spraid_queue_scan(struct spraid_dev *hdev)
{
queue_work(spraid_wq, &hdev->scan_work);
@@ -2094,6 +2257,9 @@ static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result)
case SPRAID_AEN_DEV_CHANGED:
spraid_queue_scan(hdev);
break;
+ case SPRAID_AEN_FW_ACT_START:
+ dev_info(hdev->dev, "fw activation starting\n");
+ break;
case SPRAID_AEN_HOST_PROBING:
break;
default:
@@ -2101,25 +2267,25 @@ static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result)
}
}
-static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result)
+static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result, u32 result1)
{
- switch (result) {
+ switch ((result & 0xff00) >> 8) {
case SPRAID_AEN_TIMESYN:
queue_work(spraid_wq, &hdev->timesyn_work);
break;
+ case SPRAID_AEN_FW_ACT_FINISH:
+ dev_info(hdev->dev, "fw activation finish\n");
+ queue_work(spraid_wq, &hdev->fw_act_work);
+ break;
+ case SPRAID_AEN_EVENT_MIN ... SPRAID_AEN_EVENT_MAX:
+ dev_info(hdev->dev, "rcv card event[%d], param1[0x%x] param2[0x%x]\n",
+ (result & 0xff00) >> 8, result, result1);
+ break;
default:
- dev_warn(hdev->dev, "async event result: %x\n", result);
+ dev_warn(hdev->dev, "async event result: 0x%x\n", result);
}
}
-static void spraid_async_event_work(struct work_struct *work)
-{
- struct spraid_dev *hdev =
- container_of(work, struct spraid_dev, aen_work);
-
- spraid_send_aen(hdev);
-}
-
static int spraid_alloc_resources(struct spraid_dev *hdev)
{
int ret, nqueue;
@@ -2149,10 +2315,16 @@ static int spraid_alloc_resources(struct spraid_dev *hdev)
goto destroy_dma_pools;
}
+ ret = spraid_alloc_admin_cmds(hdev);
+ if (ret)
+ goto free_queues;
+
dev_info(hdev->dev, "[%s] queues num: %d\n", __func__, nqueue);
return 0;
+free_queues:
+ kfree(hdev->queues);
destroy_dma_pools:
spraid_destroy_dma_pools(hdev);
free_ctrl_info:
@@ -2164,50 +2336,18 @@ static int spraid_alloc_resources(struct spraid_dev *hdev)
static void spraid_free_resources(struct spraid_dev *hdev)
{
+ spraid_free_admin_cmds(hdev);
kfree(hdev->queues);
spraid_destroy_dma_pools(hdev);
kfree(hdev->ctrl_info);
ida_free(&spraid_instance_ida, hdev->instance);
}
-static void spraid_setup_passthrough(struct request *req, struct spraid_admin_command *cmd)
-{
- memcpy(cmd, spraid_admin_req(req)->cmd, sizeof(*cmd));
- cmd->common.flags &= ~SPRAID_CMD_FLAG_SGL_ALL;
-}
-
-static inline void spraid_clear_hreq(struct request *req)
-{
- if (!(req->rq_flags & RQF_DONTPREP)) {
- spraid_admin_req(req)->flags = 0;
- req->rq_flags |= RQF_DONTPREP;
- }
-}
-
-static blk_status_t spraid_setup_admin_cmd(struct request *req, struct spraid_admin_command *cmd)
+static void spraid_bsg_unmap_data(struct spraid_dev *hdev, struct bsg_job *job)
{
- spraid_clear_hreq(req);
-
- memset(cmd, 0, sizeof(*cmd));
- switch (req_op(req)) {
- case REQ_OP_DRV_IN:
- case REQ_OP_DRV_OUT:
- spraid_setup_passthrough(req, cmd);
- break;
- default:
- WARN_ON_ONCE(1);
- return BLK_STS_IOERR;
- }
-
- cmd->common.command_id = req->tag;
- return BLK_STS_OK;
-}
-
-static void spraid_unmap_data(struct spraid_dev *hdev, struct request *req)
-{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- enum dma_data_direction dma_dir = rq_data_dir(req) ?
- DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir = rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
if (iod->nsge)
dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
@@ -2215,36 +2355,36 @@ static void spraid_unmap_data(struct spraid_dev *hdev, struct request *req)
spraid_free_iod_res(hdev, iod);
}
-static blk_status_t spraid_admin_map_data(struct spraid_dev *hdev, struct request *req,
- struct spraid_admin_command *cmd)
+static int spraid_bsg_map_data(struct spraid_dev *hdev, struct bsg_job *job,
+ struct spraid_admin_command *cmd)
{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct request_queue *admin_q = req->q;
- enum dma_data_direction dma_dir = rq_data_dir(req) ?
- DMA_TO_DEVICE : DMA_FROM_DEVICE;
- blk_status_t ret = BLK_STS_IOERR;
- int nr_mapped;
- int res;
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir = rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ int ret = 0;
+
+ iod->sg = job->request_payload.sg_list;
+ iod->nsge = job->request_payload.sg_cnt;
+ iod->length = job->request_payload.payload_len;
+ iod->use_sgl = false;
+ iod->npages = -1;
+ iod->sg_drv_mgmt = false;
- sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
- iod->nsge = blk_rq_map_sg(admin_q, req, iod->sg);
if (!iod->nsge)
goto out;
- dev_info(hdev->dev, "nseg: %u, nsge: %u\n",
- blk_rq_nr_phys_segments(req), iod->nsge);
-
- ret = BLK_STS_RESOURCE;
- nr_mapped = dma_map_sg_attrs(hdev->dev, iod->sg, iod->nsge, dma_dir, DMA_ATTR_NO_WARN);
- if (!nr_mapped)
+ ret = dma_map_sg_attrs(hdev->dev, iod->sg, iod->nsge, dma_dir, DMA_ATTR_NO_WARN);
+ if (!ret)
goto out;
- res = spraid_setup_prps(hdev, iod);
- if (res)
+ ret = spraid_setup_prps(hdev, iod);
+ if (ret)
goto unmap;
+
cmd->common.dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
- return BLK_STS_OK;
+
+ return 0;
unmap:
dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
@@ -2252,137 +2392,29 @@ static blk_status_t spraid_admin_map_data(struct spraid_dev *hdev, struct reques
return ret;
}
-static blk_status_t spraid_init_admin_iod(struct request *rq, struct spraid_dev *hdev)
-{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(rq);
- int nents = blk_rq_nr_phys_segments(rq);
- unsigned int size = blk_rq_payload_bytes(rq);
-
- if (nents > SPRAID_INT_PAGES || size > SPRAID_INT_BYTES(hdev)) {
- iod->sg = mempool_alloc(hdev->iod_mempool, GFP_ATOMIC);
- if (!iod->sg)
- return BLK_STS_RESOURCE;
- } else {
- iod->sg = iod->inline_sg;
- }
-
- iod->nsge = 0;
- iod->use_sgl = false;
- iod->npages = -1;
- iod->length = size;
- iod->sg_drv_mgmt = true;
-
- return BLK_STS_OK;
-}
-
-static blk_status_t spraid_queue_admin_rq(struct blk_mq_hw_ctx *hctx,
- const struct blk_mq_queue_data *bd)
-{
- struct spraid_queue *adminq = hctx->driver_data;
- struct spraid_dev *hdev = adminq->hdev;
- struct request *req = bd->rq;
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct spraid_admin_command cmd;
- blk_status_t ret;
-
- ret = spraid_setup_admin_cmd(req, &cmd);
- if (ret)
- goto out;
-
- ret = spraid_init_admin_iod(req, hdev);
- if (ret)
- goto out;
-
- if (blk_rq_nr_phys_segments(req)) {
- ret = spraid_admin_map_data(hdev, req, &cmd);
- if (ret)
- goto cleanup_iod;
- }
-
- blk_mq_start_request(req);
- spraid_submit_cmd(adminq, &cmd);
- return BLK_STS_OK;
-
-cleanup_iod:
- spraid_free_iod_res(hdev, iod);
-out:
- return ret;
-}
-
-static blk_status_t spraid_error_status(struct request *req)
-{
- switch (spraid_admin_req(req)->status & 0x7ff) {
- case SPRAID_SC_SUCCESS:
- return BLK_STS_OK;
- default:
- return BLK_STS_IOERR;
- }
-}
-
-static void spraid_complete_admin_rq(struct request *req)
-{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct spraid_dev *hdev = iod->spraidq->hdev;
-
- if (blk_rq_nr_phys_segments(req))
- spraid_unmap_data(hdev, req);
- blk_mq_end_request(req, spraid_error_status(req));
-}
-
-static int spraid_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, unsigned int hctx_idx)
-{
- struct spraid_dev *hdev = data;
- struct spraid_queue *adminq = &hdev->queues[0];
-
- WARN_ON(hctx_idx != 0);
- WARN_ON(hdev->admin_tagset.tags[0] != hctx->tags);
-
- hctx->driver_data = adminq;
- return 0;
-}
-
-static int spraid_admin_init_request(struct blk_mq_tag_set *set, struct request *req,
- unsigned int hctx_idx, unsigned int numa_node)
-{
- struct spraid_dev *hdev = set->driver_data;
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct spraid_queue *adminq = &hdev->queues[0];
-
- WARN_ON(!adminq);
- iod->spraidq = adminq;
- return 0;
-}
-
-static enum blk_eh_timer_return
-spraid_admin_timeout(struct request *req, bool reserved)
-{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct spraid_queue *spraidq = iod->spraidq;
- struct spraid_dev *hdev = spraidq->hdev;
-
- dev_err(hdev->dev, "Admin cid[%d] qid[%d] timeout\n",
- req->tag, spraidq->qid);
-
- if (spraid_poll_cq(spraidq, req->tag)) {
- dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, completion polled\n",
- req->tag, spraidq->qid);
- return BLK_EH_DONE;
- }
-
- spraid_end_admin_request(req, cpu_to_le16(-EINVAL), 0, 0);
- return BLK_EH_DONE;
-}
-
static int spraid_get_ctrl_info(struct spraid_dev *hdev, struct spraid_ctrl_info *ctrl_info)
{
struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
memset(&admin_cmd, 0, sizeof(admin_cmd));
admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
admin_cmd.get_info.type = SPRAID_GET_INFO_CTRL;
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(ctrl_info, data_ptr, sizeof(struct spraid_ctrl_info));
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- ctrl_info, sizeof(struct spraid_ctrl_info), 0, 0, 0);
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
}
static int spraid_init_ctrl_info(struct spraid_dev *hdev)
@@ -2416,6 +2448,11 @@ static int spraid_init_ctrl_info(struct spraid_dev *hdev)
dev_info(hdev->dev, "[%s]sn = %s\n", __func__, hdev->ctrl_info->sn);
dev_info(hdev->dev, "[%s]fr = %s\n", __func__, hdev->ctrl_info->fr);
+ if (!hdev->ctrl_info->aerl)
+ hdev->ctrl_info->aerl = 1;
+ if (hdev->ctrl_info->aerl > SPRAID_NR_AEN_COMMANDS)
+ hdev->ctrl_info->aerl = SPRAID_NR_AEN_COMMANDS;
+
return 0;
}
@@ -2444,98 +2481,51 @@ static void spraid_free_iod_ext_mem_pool(struct spraid_dev *hdev)
mempool_destroy(hdev->iod_mempool);
}
-static int spraid_submit_user_cmd(struct request_queue *q, struct spraid_admin_command *cmd,
- void __user *ubuffer, unsigned int bufflen, u32 *result,
- unsigned int timeout)
+static int spraid_user_admin_cmd(struct spraid_dev *hdev, struct bsg_job *job)
{
- struct request *req;
- struct bio *bio = NULL;
- int ret;
-
- req = spraid_alloc_admin_request(q, cmd, 0);
- if (IS_ERR(req))
- return PTR_ERR(req);
+ struct spraid_bsg_request *bsg_req = job->request;
+ struct spraid_passthru_common_cmd *cmd = &(bsg_req->admcmd);
+ struct spraid_admin_command admin_cmd;
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
+ u32 result[2] = {0};
+ int status;
- req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
- spraid_admin_req(req)->flags |= SPRAID_REQ_USERCMD;
+ if (hdev->state >= SPRAID_RESETTING) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not right\n",
+ __func__, hdev->state);
+ return -EBUSY;
+ }
- if (ubuffer && bufflen) {
- ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen, GFP_KERNEL);
- if (ret)
- goto out;
- bio = req->bio;
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.common.opcode = cmd->opcode;
+ admin_cmd.common.flags = cmd->flags;
+ admin_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ admin_cmd.common.cdw2[0] = cpu_to_le32(cmd->cdw2);
+ admin_cmd.common.cdw2[1] = cpu_to_le32(cmd->cdw3);
+ admin_cmd.common.cdw10 = cpu_to_le32(cmd->cdw10);
+ admin_cmd.common.cdw11 = cpu_to_le32(cmd->cdw11);
+ admin_cmd.common.cdw12 = cpu_to_le32(cmd->cdw12);
+ admin_cmd.common.cdw13 = cpu_to_le32(cmd->cdw13);
+ admin_cmd.common.cdw14 = cpu_to_le32(cmd->cdw14);
+ admin_cmd.common.cdw15 = cpu_to_le32(cmd->cdw15);
+
+ status = spraid_bsg_map_data(hdev, job, &admin_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
}
- blk_execute_rq(req->q, NULL, req, 0);
- if (spraid_admin_req(req)->flags & SPRAID_REQ_CANCELLED)
- ret = -EINTR;
- else
- ret = spraid_admin_req(req)->status;
- if (result) {
- result[0] = spraid_admin_req(req)->result0;
- result[1] = spraid_admin_req(req)->result1;
- }
- if (bio)
- blk_rq_unmap_user(bio);
-out:
- blk_mq_free_request(req);
- return ret;
-}
-static int spraid_user_admin_cmd(struct spraid_dev *hdev,
- struct spraid_passthru_common_cmd __user *ucmd)
-{
- struct spraid_passthru_common_cmd cmd;
- struct spraid_admin_command admin_cmd;
- u32 timeout = 0;
- int status;
-
- if (!capable(CAP_SYS_ADMIN)) {
- dev_err(hdev->dev, "Current user hasn't administrator right, reject service\n");
- return -EACCES;
- }
-
- if (copy_from_user(&cmd, ucmd, sizeof(cmd))) {
- dev_err(hdev->dev, "Copy command from user space to kernel space failed\n");
- return -EFAULT;
- }
-
- if (cmd.flags) {
- dev_err(hdev->dev, "Invalid flags in user command\n");
- return -EINVAL;
+ status = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, &result[0], &result[1], timeout);
+ if (status >= 0) {
+ job->reply_len = sizeof(result);
+ memcpy(job->reply, result, sizeof(result));
}
- dev_info(hdev->dev, "user_admin_cmd opcode: 0x%x, subopcode: 0x%x\n",
- cmd.opcode, cmd.cdw2 & 0x7ff);
+ if (status)
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x], status[0x%x] result0[0x%x] result1[0x%x]\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode, status, result[0], result[1]);
- memset(&admin_cmd, 0, sizeof(admin_cmd));
- admin_cmd.common.opcode = cmd.opcode;
- admin_cmd.common.flags = cmd.flags;
- admin_cmd.common.hdid = cpu_to_le32(cmd.nsid);
- admin_cmd.common.cdw2[0] = cpu_to_le32(cmd.cdw2);
- admin_cmd.common.cdw2[1] = cpu_to_le32(cmd.cdw3);
- admin_cmd.common.cdw10 = cpu_to_le32(cmd.cdw10);
- admin_cmd.common.cdw11 = cpu_to_le32(cmd.cdw11);
- admin_cmd.common.cdw12 = cpu_to_le32(cmd.cdw12);
- admin_cmd.common.cdw13 = cpu_to_le32(cmd.cdw13);
- admin_cmd.common.cdw14 = cpu_to_le32(cmd.cdw14);
- admin_cmd.common.cdw15 = cpu_to_le32(cmd.cdw15);
-
- if (cmd.timeout_ms)
- timeout = msecs_to_jiffies(cmd.timeout_ms);
-
- status = spraid_submit_user_cmd(hdev->admin_q, &admin_cmd,
- (void __user *)(uintptr_t)cmd.addr, cmd.info_1.data_len,
- &cmd.result0, timeout);
-
- dev_info(hdev->dev, "user_admin_cmd status: 0x%x, result0: 0x%x, result1: 0x%x\n",
- status, cmd.result0, cmd.result1);
-
- if (status >= 0) {
- if (put_user(cmd.result0, &ucmd->result0))
- return -EFAULT;
- if (put_user(cmd.result1, &ucmd->result1))
- return -EFAULT;
- }
+ spraid_bsg_unmap_data(hdev, job);
return status;
}
@@ -2548,8 +2538,8 @@ static int spraid_alloc_ioq_ptcmds(struct spraid_dev *hdev)
INIT_LIST_HEAD(&hdev->ioq_pt_list);
spin_lock_init(&hdev->ioq_pt_lock);
- hdev->ioq_ptcmds = kcalloc_node(ptnum, sizeof(struct spraid_ioq_ptcmd),
- GFP_KERNEL, hdev->numa_node);
+ hdev->ioq_ptcmds = kcalloc_node(ptnum, sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
if (!hdev->ioq_ptcmds) {
dev_err(hdev->dev, "Alloc ioq_ptcmds failed\n");
@@ -2567,55 +2557,35 @@ static int spraid_alloc_ioq_ptcmds(struct spraid_dev *hdev)
return 0;
}
-static struct spraid_ioq_ptcmd *spraid_get_ioq_ptcmd(struct spraid_dev *hdev)
-{
- struct spraid_ioq_ptcmd *cmd = NULL;
- unsigned long flags;
-
- spin_lock_irqsave(&hdev->ioq_pt_lock, flags);
- if (list_empty(&hdev->ioq_pt_list)) {
- spin_unlock_irqrestore(&hdev->ioq_pt_lock, flags);
- dev_err(hdev->dev, "err, ioq ptcmd list empty\n");
- return NULL;
- }
- cmd = list_entry((&hdev->ioq_pt_list)->next, struct spraid_ioq_ptcmd, list);
- list_del_init(&cmd->list);
- spin_unlock_irqrestore(&hdev->ioq_pt_lock, flags);
-
- WRITE_ONCE(cmd->state, SPRAID_CMD_IDLE);
-
- return cmd;
-}
-
-static void spraid_put_ioq_ptcmd(struct spraid_dev *hdev, struct spraid_ioq_ptcmd *cmd)
+static void spraid_free_ioq_ptcmds(struct spraid_dev *hdev)
{
- unsigned long flags;
+ kfree(hdev->ioq_ptcmds);
+ hdev->ioq_ptcmds = NULL;
- spin_lock_irqsave(&hdev->ioq_pt_lock, flags);
- list_add(&cmd->list, (&hdev->ioq_pt_list)->next);
- spin_unlock_irqrestore(&hdev->ioq_pt_lock, flags);
+ INIT_LIST_HEAD(&hdev->ioq_pt_list);
}
static int spraid_submit_ioq_sync_cmd(struct spraid_dev *hdev, struct spraid_ioq_command *cmd,
- u32 *result, void **sense, u32 timeout)
+ u32 *result, u32 *reslen, u32 timeout)
{
- struct spraid_queue *ioq;
int ret;
dma_addr_t sense_dma;
- struct spraid_ioq_ptcmd *pt_cmd = spraid_get_ioq_ptcmd(hdev);
-
- *sense = NULL;
+ struct spraid_queue *ioq;
+ void *sense_addr = NULL;
+ struct spraid_cmd *pt_cmd = spraid_get_cmd(hdev, SPRAID_CMD_IOPT);
- if (!pt_cmd)
+ if (!pt_cmd) {
+ dev_err(hdev->dev, "err, get ioq cmd failed\n");
return -EFAULT;
+ }
- dev_info(hdev->dev, "[%s] ptcmd, cid[%d], qid[%d]\n", __func__, pt_cmd->cid, pt_cmd->qid);
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
init_completion(&pt_cmd->cmd_done);
ioq = &hdev->queues[pt_cmd->qid];
ret = pt_cmd->cid * SCSI_SENSE_BUFFERSIZE;
- pt_cmd->priv = ioq->sense + ret;
+ sense_addr = ioq->sense + ret;
sense_dma = ioq->sense_dma_addr + ret;
cmd->common.sense_addr = cpu_to_le64(sense_dma);
@@ -2625,260 +2595,87 @@ static int spraid_submit_ioq_sync_cmd(struct spraid_dev *hdev, struct spraid_ioq
spraid_submit_cmd(ioq, cmd);
if (!wait_for_completion_timeout(&pt_cmd->cmd_done, timeout)) {
- dev_err(hdev->dev, "[%s] cid[%d], qid[%d] timeout\n",
- __func__, pt_cmd->cid, pt_cmd->qid);
+ dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout, opcode[0x%x] subopcode[0x%x]\n",
+ __func__, pt_cmd->cid, pt_cmd->qid, cmd->common.opcode,
+ (le32_to_cpu(cmd->common.cdw3[0]) & 0xffff));
WRITE_ONCE(pt_cmd->state, SPRAID_CMD_TIMEOUT);
- return -EINVAL;
+ spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
+ return -ETIME;
}
- if (result) {
- result[0] = pt_cmd->result0;
- result[1] = pt_cmd->result1;
+ if (result && reslen) {
+ if ((pt_cmd->status & 0x17f) == 0x101) {
+ memcpy(result, sense_addr, SCSI_SENSE_BUFFERSIZE);
+ *reslen = SCSI_SENSE_BUFFERSIZE;
+ }
}
- if ((pt_cmd->status & 0x17f) == 0x101)
- *sense = pt_cmd->priv;
+ spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
return pt_cmd->status;
}
-static int spraid_user_ioq_cmd(struct spraid_dev *hdev,
- struct spraid_ioq_passthru_cmd __user *ucmd)
+static int spraid_user_ioq_cmd(struct spraid_dev *hdev, struct bsg_job *job)
{
- struct spraid_ioq_passthru_cmd cmd;
+ struct spraid_bsg_request *bsg_req = (struct spraid_bsg_request *)(job->request);
+ struct spraid_ioq_passthru_cmd *cmd = &(bsg_req->ioqcmd);
struct spraid_ioq_command ioq_cmd;
- u32 timeout = 0;
int status = 0;
- u8 *data_ptr = NULL;
- dma_addr_t data_dma;
- enum dma_data_direction dma_dir = DMA_NONE;
- void *sense = NULL;
-
- if (!capable(CAP_SYS_ADMIN)) {
- dev_err(hdev->dev, "Current user hasn't administrator right, reject service\n");
- return -EACCES;
- }
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
- if (copy_from_user(&cmd, ucmd, sizeof(cmd))) {
- dev_err(hdev->dev, "Copy command from user space to kernel space failed\n");
- return -EFAULT;
- }
-
- if (cmd.data_len > PAGE_SIZE) {
+ if (cmd->data_len > PAGE_SIZE) {
dev_err(hdev->dev, "[%s] data len bigger than 4k\n", __func__);
return -EFAULT;
}
- dev_info(hdev->dev, "[%s] opcode: 0x%x, subopcode: 0x%x, datalen: %d\n",
- __func__, cmd.opcode, cmd.info_1.subopcode, cmd.data_len);
-
- if (cmd.addr && cmd.data_len) {
- data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
- if (!data_ptr)
- return -ENOMEM;
-
- dma_dir = (cmd.opcode & 1) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ if (hdev->state != SPRAID_LIVE) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not live\n",
+ __func__, hdev->state);
+ return -EBUSY;
}
- if (dma_dir == DMA_TO_DEVICE) {
- if (copy_from_user(data_ptr, (void __user *)(uintptr_t)cmd.addr, cmd.data_len)) {
- dev_err(hdev->dev, "[%s] copy user data failed\n", __func__);
- status = -EFAULT;
- goto free_dma_mem;
- }
- }
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init, datalen[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode, cmd->data_len);
memset(&ioq_cmd, 0, sizeof(ioq_cmd));
- ioq_cmd.common.opcode = cmd.opcode;
- ioq_cmd.common.flags = cmd.flags;
- ioq_cmd.common.hdid = cpu_to_le32(cmd.nsid);
- ioq_cmd.common.sense_len = cpu_to_le16(cmd.info_0.res_sense_len);
- ioq_cmd.common.cdb_len = cmd.info_0.cdb_len;
- ioq_cmd.common.rsvd2 = cmd.info_0.rsvd0;
- ioq_cmd.common.cdw3[0] = cpu_to_le32(cmd.cdw3);
- ioq_cmd.common.cdw3[1] = cpu_to_le32(cmd.cdw4);
- ioq_cmd.common.cdw3[2] = cpu_to_le32(cmd.cdw5);
- ioq_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
-
- ioq_cmd.common.cdw10[0] = cpu_to_le32(cmd.cdw10);
- ioq_cmd.common.cdw10[1] = cpu_to_le32(cmd.cdw11);
- ioq_cmd.common.cdw10[2] = cpu_to_le32(cmd.cdw12);
- ioq_cmd.common.cdw10[3] = cpu_to_le32(cmd.cdw13);
- ioq_cmd.common.cdw10[4] = cpu_to_le32(cmd.cdw14);
- ioq_cmd.common.cdw10[5] = cpu_to_le32(cmd.data_len);
-
- memcpy(ioq_cmd.common.cdb, &cmd.cdw16, cmd.info_0.cdb_len);
-
- ioq_cmd.common.cdw26[0] = cpu_to_le32(cmd.cdw26[0]);
- ioq_cmd.common.cdw26[1] = cpu_to_le32(cmd.cdw26[1]);
- ioq_cmd.common.cdw26[2] = cpu_to_le32(cmd.cdw26[2]);
- ioq_cmd.common.cdw26[3] = cpu_to_le32(cmd.cdw26[3]);
-
- if (cmd.timeout_ms)
- timeout = msecs_to_jiffies(cmd.timeout_ms);
- timeout = timeout ? timeout : ADMIN_TIMEOUT;
-
- status = spraid_submit_ioq_sync_cmd(hdev, &ioq_cmd, &cmd.result0, &sense, timeout);
-
- if (status >= 0) {
- if (put_user(cmd.result0, &ucmd->result0)) {
- status = -EFAULT;
- goto free_dma_mem;
- }
- if (put_user(cmd.result1, &ucmd->result1)) {
- status = -EFAULT;
- goto free_dma_mem;
- }
- if (dma_dir == DMA_FROM_DEVICE &&
- copy_to_user((void __user *)(uintptr_t)cmd.addr, data_ptr, cmd.data_len)) {
- status = -EFAULT;
- goto free_dma_mem;
- }
- }
-
- if (sense) {
- if (copy_to_user((void *__user *)(uintptr_t)cmd.sense_addr,
- sense, cmd.info_0.res_sense_len)) {
- status = -EFAULT;
- goto free_dma_mem;
- }
- }
-
-free_dma_mem:
- if (data_ptr)
- dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
-
- return status;
-
-}
-
-static int spraid_reset_work_sync(struct spraid_dev *hdev);
-
-static int spraid_user_reset_cmd(struct spraid_dev *hdev)
-{
- int ret;
-
- dev_info(hdev->dev, "[%s] start user reset cmd\n", __func__);
- ret = spraid_reset_work_sync(hdev);
- dev_info(hdev->dev, "[%s] stop user reset cmd[%d]\n", __func__, ret);
-
- return ret;
-}
-
-static int hdev_open(struct inode *inode, struct file *file)
-{
- struct spraid_dev *hdev =
- container_of(inode->i_cdev, struct spraid_dev, cdev);
- file->private_data = hdev;
- return 0;
-}
-
-static long hdev_ioctl(struct file *file, u32 cmd, unsigned long arg)
-{
- struct spraid_dev *hdev = file->private_data;
- void __user *argp = (void __user *)arg;
-
- switch (cmd) {
- case SPRAID_IOCTL_ADMIN_CMD:
- return spraid_user_admin_cmd(hdev, argp);
- case SPRAID_IOCTL_IOQ_CMD:
- return spraid_user_ioq_cmd(hdev, argp);
- case SPRAID_IOCTL_RESET_CMD:
- return spraid_user_reset_cmd(hdev);
- default:
- return -ENOTTY;
- }
-}
-
-static const struct file_operations spraid_dev_fops = {
- .owner = THIS_MODULE,
- .open = hdev_open,
- .unlocked_ioctl = hdev_ioctl,
- .compat_ioctl = hdev_ioctl,
-};
-
-static int spraid_create_cdev(struct spraid_dev *hdev)
-{
- int ret;
-
- device_initialize(&hdev->ctrl_device);
- hdev->ctrl_device.devt = MKDEV(MAJOR(spraid_chr_devt), hdev->instance);
- hdev->ctrl_device.class = spraid_class;
- hdev->ctrl_device.parent = hdev->dev;
- dev_set_drvdata(&hdev->ctrl_device, hdev);
- ret = dev_set_name(&hdev->ctrl_device, "spraid%d", hdev->instance);
- if (ret)
- return ret;
- cdev_init(&hdev->cdev, &spraid_dev_fops);
- hdev->cdev.owner = THIS_MODULE;
- ret = cdev_device_add(&hdev->cdev, &hdev->ctrl_device);
- if (ret) {
- dev_err(hdev->dev, "Add cdev failed, ret: %d", ret);
- put_device(&hdev->ctrl_device);
- kfree_const(hdev->ctrl_device.kobj.name);
- return ret;
- }
-
- return 0;
-}
-
-static inline void spraid_remove_cdev(struct spraid_dev *hdev)
-{
- cdev_device_del(&hdev->cdev, &hdev->ctrl_device);
-}
-
-static const struct blk_mq_ops spraid_admin_mq_ops = {
- .queue_rq = spraid_queue_admin_rq,
- .complete = spraid_complete_admin_rq,
- .init_hctx = spraid_admin_init_hctx,
- .init_request = spraid_admin_init_request,
- .timeout = spraid_admin_timeout,
-};
-
-static void spraid_remove_admin_tagset(struct spraid_dev *hdev)
-{
- if (hdev->admin_q && !blk_queue_dying(hdev->admin_q)) {
- blk_mq_unquiesce_queue(hdev->admin_q);
- blk_cleanup_queue(hdev->admin_q);
- blk_mq_free_tag_set(&hdev->admin_tagset);
+ ioq_cmd.common.opcode = cmd->opcode;
+ ioq_cmd.common.flags = cmd->flags;
+ ioq_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ ioq_cmd.common.sense_len = cpu_to_le16(cmd->info_0.res_sense_len);
+ ioq_cmd.common.cdb_len = cmd->info_0.cdb_len;
+ ioq_cmd.common.rsvd2 = cmd->info_0.rsvd0;
+ ioq_cmd.common.cdw3[0] = cpu_to_le32(cmd->cdw3);
+ ioq_cmd.common.cdw3[1] = cpu_to_le32(cmd->cdw4);
+ ioq_cmd.common.cdw3[2] = cpu_to_le32(cmd->cdw5);
+
+ ioq_cmd.common.cdw10[0] = cpu_to_le32(cmd->cdw10);
+ ioq_cmd.common.cdw10[1] = cpu_to_le32(cmd->cdw11);
+ ioq_cmd.common.cdw10[2] = cpu_to_le32(cmd->cdw12);
+ ioq_cmd.common.cdw10[3] = cpu_to_le32(cmd->cdw13);
+ ioq_cmd.common.cdw10[4] = cpu_to_le32(cmd->cdw14);
+ ioq_cmd.common.cdw10[5] = cpu_to_le32(cmd->data_len);
+
+ memcpy(ioq_cmd.common.cdb, &cmd->cdw16, cmd->info_0.cdb_len);
+
+ ioq_cmd.common.cdw26[0] = cpu_to_le32(cmd->cdw26[0]);
+ ioq_cmd.common.cdw26[1] = cpu_to_le32(cmd->cdw26[1]);
+ ioq_cmd.common.cdw26[2] = cpu_to_le32(cmd->cdw26[2]);
+ ioq_cmd.common.cdw26[3] = cpu_to_le32(cmd->cdw26[3]);
+
+ status = spraid_bsg_map_data(hdev, job, (struct spraid_admin_command *)&ioq_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
}
-}
-static int spraid_alloc_admin_tags(struct spraid_dev *hdev)
-{
- if (!hdev->admin_q) {
- hdev->admin_tagset.ops = &spraid_admin_mq_ops;
- hdev->admin_tagset.nr_hw_queues = 1;
+ status = spraid_submit_ioq_sync_cmd(hdev, &ioq_cmd, job->reply, &job->reply_len, timeout);
- hdev->admin_tagset.queue_depth = SPRAID_AQ_MQ_TAG_DEPTH;
- hdev->admin_tagset.timeout = ADMIN_TIMEOUT;
- hdev->admin_tagset.numa_node = hdev->numa_node;
- hdev->admin_tagset.cmd_size =
- spraid_cmd_size(hdev, true, false);
- hdev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
- hdev->admin_tagset.driver_data = hdev;
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x], status[0x%x], reply_len[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode, status, job->reply_len);
- if (blk_mq_alloc_tag_set(&hdev->admin_tagset)) {
- dev_err(hdev->dev, "Allocate admin tagset failed\n");
- return -ENOMEM;
- }
+ spraid_bsg_unmap_data(hdev, job);
- hdev->admin_q = blk_mq_init_queue(&hdev->admin_tagset);
- if (IS_ERR(hdev->admin_q)) {
- dev_err(hdev->dev, "Initialize admin request queue failed\n");
- blk_mq_free_tag_set(&hdev->admin_tagset);
- return -ENOMEM;
- }
- if (!blk_get_queue(hdev->admin_q)) {
- dev_err(hdev->dev, "Get admin request queue failed\n");
- spraid_remove_admin_tagset(hdev);
- hdev->admin_q = NULL;
- return -ENODEV;
- }
- } else {
- blk_mq_unquiesce_queue(hdev->admin_q);
- }
- return 0;
+ return status;
}
static bool spraid_check_scmd_completed(struct scsi_cmnd *scmd)
@@ -2891,7 +2688,7 @@ static bool spraid_check_scmd_completed(struct scsi_cmnd *scmd)
spraid_get_tag_from_scmd(scmd, &hwq, &cid);
spraidq = &hdev->queues[hwq];
if (READ_ONCE(iod->state) == SPRAID_CMD_COMPLETE || spraid_poll_cq(spraidq, cid)) {
- dev_warn(hdev->dev, "cid[%d], qid[%d] has been completed\n",
+ dev_warn(hdev->dev, "cid[%d] qid[%d] has been completed\n",
cid, spraidq->qid);
return true;
}
@@ -2927,8 +2724,7 @@ static int spraid_send_abort_cmd(struct spraid_dev *hdev, u32 hdid, u16 qid, u16
admin_cmd.abort.sqid = cpu_to_le16(qid);
admin_cmd.abort.cid = cpu_to_le16(cid);
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
}
/* send reset command by admin quueue temporary */
@@ -2941,8 +2737,7 @@ static int spraid_send_reset_cmd(struct spraid_dev *hdev, int type, u32 hdid)
admin_cmd.reset.hdid = cpu_to_le32(hdid);
admin_cmd.reset.type = type;
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
}
static bool spraid_change_host_state(struct spraid_dev *hdev, enum spraid_state newstate)
@@ -3022,7 +2817,7 @@ static void spraid_back_fault_cqe(struct spraid_queue *ioq, struct spraid_comple
scsi_dma_unmap(scmd);
spraid_free_iod_res(hdev, iod);
scmd->scsi_done(scmd);
- dev_warn(hdev->dev, "Back fault CQE, cid[%d], qid[%d]\n",
+ dev_warn(hdev->dev, "Back fault CQE, cid[%d] qid[%d]\n",
cqe->cmd_id, ioq->qid);
}
@@ -3032,6 +2827,8 @@ static void spraid_back_all_io(struct spraid_dev *hdev)
struct spraid_queue *ioq;
struct spraid_completion cqe = { 0 };
+ scsi_block_requests(hdev->shost);
+
for (i = 1; i <= hdev->shost->nr_hw_queues; i++) {
ioq = &hdev->queues[i];
for (j = 0; j < hdev->shost->can_queue; j++) {
@@ -3039,6 +2836,8 @@ static void spraid_back_all_io(struct spraid_dev *hdev)
spraid_back_fault_cqe(ioq, &cqe);
}
}
+
+ scsi_unblock_requests(hdev->shost);
}
static void spraid_dev_disable(struct spraid_dev *hdev, bool shutdown)
@@ -3106,17 +2905,13 @@ static void spraid_reset_work(struct work_struct *work)
if (ret)
goto pci_disable;
- ret = spraid_alloc_admin_tags(hdev);
- if (ret)
- goto pci_disable;
-
ret = spraid_setup_io_queues(hdev);
if (ret || hdev->online_queues <= hdev->shost->nr_hw_queues)
goto pci_disable;
spraid_change_host_state(hdev, SPRAID_LIVE);
- spraid_send_aen(hdev);
+ spraid_send_all_aen(hdev);
return;
@@ -3174,8 +2969,8 @@ static int spraid_abort_handler(struct scsi_cmnd *scmd)
scsi_print_command(scmd);
- if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd) ||
- hdev->state != SPRAID_LIVE)
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
return SUCCESS;
hostdata = scmd->device->hostdata;
@@ -3183,7 +2978,7 @@ static int spraid_abort_handler(struct scsi_cmnd *scmd)
dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, aborting\n", cid, hwq);
ret = spraid_send_abort_cmd(hdev, hostdata->hdid, hwq, cid);
- if (ret != ADMIN_ERR_TIMEOUT) {
+ if (ret != -ETIME) {
ret = spraid_wait_abnl_cmd_done(iod);
if (ret) {
dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed, not found\n", cid, hwq);
@@ -3206,8 +3001,8 @@ static int spraid_tgt_reset_handler(struct scsi_cmnd *scmd)
scsi_print_command(scmd);
- if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd) ||
- hdev->state != SPRAID_LIVE)
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
return SUCCESS;
hostdata = scmd->device->hostdata;
@@ -3241,8 +3036,8 @@ static int spraid_bus_reset_handler(struct scsi_cmnd *scmd)
scsi_print_command(scmd);
- if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd) ||
- hdev->state != SPRAID_LIVE)
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
return SUCCESS;
hostdata = scmd->device->hostdata;
@@ -3272,7 +3067,7 @@ static int spraid_shost_reset_handler(struct scsi_cmnd *scmd)
struct spraid_dev *hdev = shost_priv(scmd->device->host);
scsi_print_command(scmd);
- if (spraid_check_scmd_completed(scmd) || hdev->state != SPRAID_LIVE)
+ if (hdev->state != SPRAID_LIVE || spraid_check_scmd_completed(scmd))
return SUCCESS;
spraid_get_tag_from_scmd(scmd, &hwq, &cid);
@@ -3288,6 +3083,62 @@ static int spraid_shost_reset_handler(struct scsi_cmnd *scmd)
return SUCCESS;
}
+static pci_ers_result_t spraid_pci_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "enter pci error detect, state:%d\n", state);
+
+ switch (state) {
+ case pci_channel_io_normal:
+ dev_warn(hdev->dev, "channel is normal, do nothing\n");
+
+ return PCI_ERS_RESULT_CAN_RECOVER;
+ case pci_channel_io_frozen:
+ dev_warn(hdev->dev, "channel io frozen, need reset controller\n");
+
+ scsi_block_requests(hdev->shost);
+
+ spraid_change_host_state(hdev, SPRAID_RESETTING);
+
+ return PCI_ERS_RESULT_NEED_RESET;
+ case pci_channel_io_perm_failure:
+ dev_warn(hdev->dev, "channel io failure, request disconnect\n");
+
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+
+ return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t spraid_pci_slot_reset(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "restart after slot reset\n");
+
+ pci_restore_state(pdev);
+
+ if (!queue_work(spraid_wq, &hdev->reset_work)) {
+ dev_err(hdev->dev, "[%s] err, the device is resetting state\n", __func__);
+ return PCI_ERS_RESULT_NONE;
+ }
+
+ flush_work(&hdev->reset_work);
+
+ scsi_unblock_requests(hdev->shost);
+
+ return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void spraid_reset_done(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "enter spraid reset done\n");
+}
+
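/*
 * Together these three callbacks implement the standard PCI AER recovery
 * flow for the driver: a frozen channel blocks the host and moves it to
 * SPRAID_RESETTING, the slot reset restores saved PCI state and runs
 * reset_work to completion, and reset_done only logs. They are wired up
 * via spraid_err_handler further below.
 */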
static ssize_t csts_pp_show(struct device *cdev, struct device_attribute *attr, char *buf)
{
struct Scsi_Host *shost = class_to_shost(cdev);
@@ -3347,7 +3198,7 @@ static ssize_t fw_version_show(struct device *cdev, struct device_attribute *att
struct Scsi_Host *shost = class_to_shost(cdev);
struct spraid_dev *hdev = shost_priv(shost);
- return snprintf(buf, sizeof(hdev->ctrl_info->fr), "%s\n", hdev->ctrl_info->fr);
+ return snprintf(buf, PAGE_SIZE, "%s\n", hdev->ctrl_info->fr);
}
static DEVICE_ATTR_RO(csts_pp);
@@ -3365,6 +3216,185 @@ static struct device_attribute *spraid_host_attrs[] = {
NULL,
};
+static int spraid_get_vd_info(struct spraid_dev *hdev, struct spraid_vd_info *vd_info, u16 vid)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.usr_cmd.opcode = USR_CMD_READ;
+ admin_cmd.usr_cmd.info_0.subopcode = cpu_to_le16(USR_CMD_VDINFO);
+ admin_cmd.usr_cmd.info_1.data_len = cpu_to_le16(USR_CMD_RDLEN);
+ admin_cmd.usr_cmd.info_1.param_len = cpu_to_le16(VDINFO_PARAM_LEN);
+ admin_cmd.usr_cmd.cdw10 = cpu_to_le32(vid);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(vd_info, data_ptr, sizeof(struct spraid_vd_info));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_get_bgtask(struct spraid_dev *hdev, struct spraid_bgtask *bgtask)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.usr_cmd.opcode = USR_CMD_READ;
+ admin_cmd.usr_cmd.info_0.subopcode = cpu_to_le16(USR_CMD_BGTASK);
+ admin_cmd.usr_cmd.info_1.data_len = cpu_to_le16(USR_CMD_RDLEN);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(bgtask, data_ptr, sizeof(struct spraid_bgtask));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static ssize_t raid_level_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_sdev_hostdata *hostdata;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret)
+ vd_info->rg_level = ARRAY_SIZE(raid_levels) - 1;
+
+ ret = (vd_info->rg_level < ARRAY_SIZE(raid_levels)) ?
+ vd_info->rg_level : (ARRAY_SIZE(raid_levels) - 1);
+
+ kfree(vd_info);
+
+ return snprintf(buf, PAGE_SIZE, "RAID-%s\n", raid_levels[ret]);
+}
+
+static ssize_t raid_state_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_sdev_hostdata *hostdata;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret) {
+ vd_info->vd_status = 0;
+ vd_info->rg_id = 0xff;
+ }
+
+ ret = (vd_info->vd_status < ARRAY_SIZE(raid_states)) ? vd_info->vd_status : 0;
+
+ kfree(vd_info);
+
+ return snprintf(buf, PAGE_SIZE, "%s\n", raid_states[ret]);
+}
+
+static ssize_t raid_resync_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_bgtask *bgtask;
+ struct spraid_sdev_hostdata *hostdata;
+ u8 rg_id, i, progress = 0;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret)
+ goto out;
+
+ rg_id = vd_info->rg_id;
+
+ bgtask = (struct spraid_bgtask *)vd_info;
+ ret = spraid_get_bgtask(hdev, bgtask);
+ if (ret)
+ goto out;
+ for (i = 0; i < bgtask->task_num; i++) {
+ if ((bgtask->bgtask[i].type == BGTASK_TYPE_REBUILD) &&
+ (le16_to_cpu(bgtask->bgtask[i].vd_id) == rg_id))
+ progress = bgtask->bgtask[i].progress;
+ }
+
+out:
+ kfree(vd_info);
+ return snprintf(buf, PAGE_SIZE, "%d\n", progress);
+}
+
+static DEVICE_ATTR_RO(raid_level);
+static DEVICE_ATTR_RO(raid_state);
+static DEVICE_ATTR_RO(raid_resync);
+
+static struct device_attribute *spraid_dev_attrs[] = {
+ &dev_attr_raid_level,
+ &dev_attr_raid_state,
+ &dev_attr_raid_resync,
+ NULL,
+};
+
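/*
 * The attributes above are exposed per SCSI device through sysfs; with
 * the standard scsi_device layout that would be, for example (the exact
 * path is an assumption, not spelled out by this patch):
 *   /sys/class/scsi_device/<h:c:t:l>/device/raid_level
 * and likewise raid_state and raid_resync.
 */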
+static struct pci_error_handlers spraid_err_handler = {
+ .error_detected = spraid_pci_error_detected,
+ .slot_reset = spraid_pci_slot_reset,
+ .reset_done = spraid_reset_done,
+};
+
+static int spraid_sysfs_host_reset(struct Scsi_Host *shost, int reset_type)
+{
+ int ret;
+ struct spraid_dev *hdev = shost_priv(shost);
+
+ dev_info(hdev->dev, "[%s] start sysfs host reset cmd\n", __func__);
+ ret = spraid_reset_work_sync(hdev);
+ dev_info(hdev->dev, "[%s] stop sysfs host reset cmd[%d]\n", __func__, ret);
+
+ return ret;
+}
+
static struct scsi_host_template spraid_driver_template = {
.module = THIS_MODULE,
.name = "Ramaxel Logic spraid driver",
@@ -3379,9 +3409,11 @@ static struct scsi_host_template spraid_driver_template = {
.eh_bus_reset_handler = spraid_bus_reset_handler,
.eh_host_reset_handler = spraid_shost_reset_handler,
.change_queue_depth = scsi_change_queue_depth,
- .host_tagset = 1,
+ .host_tagset = 0,
.this_id = -1,
.shost_attrs = spraid_host_attrs,
+ .sdev_attrs = spraid_dev_attrs,
+ .host_reset = spraid_sysfs_host_reset,
};
static void spraid_shutdown(struct pci_dev *pdev)
@@ -3392,11 +3424,53 @@ static void spraid_shutdown(struct pci_dev *pdev)
spraid_disable_admin_queue(hdev, true);
}
+/* bsg dispatch user command */
+static int spraid_bsg_host_dispatch(struct bsg_job *job)
+{
+ struct Scsi_Host *shost = dev_to_shost(job->dev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_bsg_request *bsg_req = job->request;
+ int ret = 0;
+
+ dev_log_dbg(hdev->dev, "[%s] msgcode[%d], msglen[%d], timeout[%d], req_nsge[%d], req_len[%d]\n",
+ __func__, bsg_req->msgcode, job->request_len, rq->timeout,
+ job->request_payload.sg_cnt, job->request_payload.payload_len);
+
+ job->reply_len = 0;
+
+ switch (bsg_req->msgcode) {
+ case SPRAID_BSG_ADM:
+ ret = spraid_user_admin_cmd(hdev, job);
+ break;
+ case SPRAID_BSG_IOQ:
+ ret = spraid_user_ioq_cmd(hdev, job);
+ break;
+ default:
+ dev_info(hdev->dev, "[%s] unsupport msgcode[%d]\n", __func__, bsg_req->msgcode);
+ break;
+ }
+
+ if (ret > 0)
+ ret = ret | (ret << 8);
+
+ bsg_job_done(job, ret, 0);
+ return 0;
+}
+
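/*
 * For illustration only: a hypothetical user-space caller of the bsg node
 * registered below (created as "spraid<N>", typically /dev/bsg/spraid<N>).
 * The layout of struct spraid_bsg_request and the transfer direction are
 * assumptions based on this patch, not a documented ABI.
 */
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>
#include <linux/bsg.h>

static int spraid_bsg_example(int fd, void *req, size_t req_len,
			      void *buf, size_t buf_len)
{
	struct sg_io_v4 hdr = { 0 };

	hdr.guard = 'Q';
	hdr.protocol = BSG_PROTOCOL_SCSI;
	hdr.subprotocol = BSG_SUB_PROTOCOL_SCSI_TRANSPORT;
	hdr.request = (uint64_t)(uintptr_t)req;		/* spraid_bsg_request */
	hdr.request_len = req_len;
	hdr.din_xferp = (uint64_t)(uintptr_t)buf;	/* data-in buffer */
	hdr.din_xfer_len = buf_len;
	hdr.timeout = 60 * 1000;			/* milliseconds */

	/* fd comes from open("/dev/bsg/spraid0", O_RDWR) or similar. */
	return ioctl(fd, SG_IO, &hdr);
}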
+static inline void spraid_remove_bsg(struct spraid_dev *hdev)
+{
+ if (hdev->bsg_queue) {
+ bsg_unregister_queue(hdev->bsg_queue);
+ blk_cleanup_queue(hdev->bsg_queue);
+ }
+}
static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct spraid_dev *hdev;
struct Scsi_Host *shost;
int node, ret;
+ char bsg_name[15];
shost = scsi_host_alloc(&spraid_driver_template, sizeof(*hdev));
if (!shost) {
@@ -3421,10 +3495,10 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto put_dev;
init_rwsem(&hdev->devices_rwsem);
- INIT_WORK(&hdev->aen_work, spraid_async_event_work);
INIT_WORK(&hdev->scan_work, spraid_scan_work);
INIT_WORK(&hdev->timesyn_work, spraid_timesyn_work);
INIT_WORK(&hdev->reset_work, spraid_reset_work);
+ INIT_WORK(&hdev->fw_act_work, spraid_fw_act_work);
spin_lock_init(&hdev->state_lock);
ret = spraid_alloc_resources(hdev);
@@ -3439,17 +3513,13 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (ret)
goto pci_disable;
- ret = spraid_alloc_admin_tags(hdev);
- if (ret)
- goto disable_admin_q;
-
ret = spraid_init_ctrl_info(hdev);
if (ret)
- goto free_admin_tagset;
+ goto disable_admin_q;
ret = spraid_alloc_iod_ext_mem_pool(hdev);
if (ret)
- goto free_admin_tagset;
+ goto disable_admin_q;
ret = spraid_setup_io_queues(hdev);
if (ret)
@@ -3464,9 +3534,15 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto remove_io_queues;
}
- ret = spraid_create_cdev(hdev);
- if (ret)
+ snprintf(bsg_name, sizeof(bsg_name), "spraid%d", shost->host_no);
+ hdev->bsg_queue = bsg_setup_queue(&shost->shost_gendev, bsg_name,
+ spraid_bsg_host_dispatch, NULL,
+ spraid_cmd_size(hdev, true, false));
+ if (IS_ERR(hdev->bsg_queue)) {
+ dev_err(hdev->dev, "err, setup bsg failed\n");
+ hdev->bsg_queue = NULL;
goto remove_io_queues;
+ }
if (hdev->online_queues == SPRAID_ADMIN_QUEUE_NUM) {
dev_warn(hdev->dev, "warn only admin queue can be used\n");
@@ -3475,11 +3551,11 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
hdev->state = SPRAID_LIVE;
- spraid_send_aen(hdev);
+ spraid_send_all_aen(hdev);
ret = spraid_dev_list_init(hdev);
if (ret)
- goto remove_cdev;
+ goto remove_bsg;
ret = spraid_configure_timestamp(hdev);
if (ret)
@@ -3487,20 +3563,18 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
ret = spraid_alloc_ioq_ptcmds(hdev);
if (ret)
- goto remove_cdev;
+ goto remove_bsg;
scsi_scan_host(hdev->shost);
return 0;
-remove_cdev:
- spraid_remove_cdev(hdev);
+remove_bsg:
+ spraid_remove_bsg(hdev);
remove_io_queues:
spraid_remove_io_queues(hdev);
free_iod_mempool:
spraid_free_iod_ext_mem_pool(hdev);
-free_admin_tagset:
- spraid_remove_admin_tagset(hdev);
disable_admin_q:
spraid_disable_admin_queue(hdev, false);
pci_disable:
@@ -3524,22 +3598,17 @@ static void spraid_remove(struct pci_dev *pdev)
dev_info(hdev->dev, "enter spraid remove\n");
spraid_change_host_state(hdev, SPRAID_DELETING);
+ flush_work(&hdev->reset_work);
- if (!pci_device_is_present(pdev)) {
- scsi_block_requests(shost);
+ if (!pci_device_is_present(pdev))
spraid_back_all_io(hdev);
- scsi_unblock_requests(shost);
- }
- flush_work(&hdev->reset_work);
+ spraid_remove_bsg(hdev);
scsi_remove_host(shost);
-
- kfree(hdev->ioq_ptcmds);
+ spraid_free_ioq_ptcmds(hdev);
kfree(hdev->devices);
- spraid_remove_cdev(hdev);
spraid_remove_io_queues(hdev);
spraid_free_iod_ext_mem_pool(hdev);
- spraid_remove_admin_tagset(hdev);
spraid_disable_admin_queue(hdev, false);
spraid_pci_disable(hdev);
spraid_free_resources(hdev);
@@ -3551,7 +3620,7 @@ static void spraid_remove(struct pci_dev *pdev)
}
static const struct pci_device_id spraid_id_table[] = {
- { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC, SPRAID_SERVER_DEVICE_HAB_DID) },
+ { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC, SPRAID_SERVER_DEVICE_HBA_DID) },
{ PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC, SPRAID_SERVER_DEVICE_RAID_DID) },
{ 0, }
};
@@ -3563,6 +3632,7 @@ static struct pci_driver spraid_driver = {
.probe = spraid_probe,
.remove = spraid_remove,
.shutdown = spraid_shutdown,
+ .err_handler = &spraid_err_handler,
};
static int __init spraid_init(void)
@@ -3573,14 +3643,10 @@ static int __init spraid_init(void)
if (!spraid_wq)
return -ENOMEM;
- ret = alloc_chrdev_region(&spraid_chr_devt, 0, SPRAID_MINORS, "spraid");
- if (ret < 0)
- goto destroy_wq;
-
spraid_class = class_create(THIS_MODULE, "spraid");
if (IS_ERR(spraid_class)) {
ret = PTR_ERR(spraid_class);
- goto unregister_chrdev;
+ goto destroy_wq;
}
ret = pci_register_driver(&spraid_driver);
@@ -3591,8 +3657,6 @@ static int __init spraid_init(void)
destroy_class:
class_destroy(spraid_class);
-unregister_chrdev:
- unregister_chrdev_region(spraid_chr_devt, SPRAID_MINORS);
destroy_wq:
destroy_workqueue(spraid_wq);
@@ -3603,12 +3667,11 @@ static void __exit spraid_exit(void)
{
pci_unregister_driver(&spraid_driver);
class_destroy(spraid_class);
- unregister_chrdev_region(spraid_chr_devt, SPRAID_MINORS);
destroy_workqueue(spraid_wq);
ida_destroy(&spraid_instance_ida);
}
-MODULE_AUTHOR("Ramaxel Memory Technology");
+MODULE_AUTHOR("songyl(a)ramaxel.com");
MODULE_DESCRIPTION("Ramaxel Memory Technology SPraid Driver");
MODULE_LICENSE("GPL");
MODULE_VERSION(SPRAID_DRV_VERSION);
--
2.32.0
From: sdlzx <hdu_sdlzx(a)163.com>
Subject: [PATCH openEuler-5.10] Disable legacy DRM drivers
euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OBDE
CVE: NA
-------------------------------------------------
This config enables dangerous old drivers. It is also selected by
CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT, which the upstream community
suggests "modern distros should consider turning off".
CONFIG_DRM_VM was selected by CONFIG_DRM_LEGACY and can be turned
off now.
Signed-off-by: sdlzx <hdu_sdlzx(a)163.com>
---
arch/arm64/configs/openeuler_defconfig | 10 ++--------
arch/x86/configs/openeuler_defconfig | 10 ++--------
2 files changed, 4 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 17bc8750b..6949824e8 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -4673,7 +4673,6 @@ CONFIG_DRM_TTM_DMA_PAGE_POOL=y
CONFIG_DRM_VRAM_HELPER=y
CONFIG_DRM_TTM_HELPER=y
CONFIG_DRM_GEM_SHMEM_HELPER=y
-CONFIG_DRM_VM=y
CONFIG_DRM_SCHED=m
#
@@ -4719,7 +4718,7 @@ CONFIG_DRM_AMD_DC_DCN=y
# CONFIG_HSA_AMD is not set
CONFIG_DRM_NOUVEAU=m
-CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT=y
+# CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT is not set
CONFIG_NOUVEAU_DEBUG=5
CONFIG_NOUVEAU_DEBUG_DEFAULT=3
# CONFIG_NOUVEAU_DEBUG_MMU is not set
@@ -4817,12 +4816,7 @@ CONFIG_DRM_CIRRUS_QEMU=m
# CONFIG_DRM_LIMA is not set
# CONFIG_DRM_PANFROST is not set
# CONFIG_DRM_TIDSS is not set
-CONFIG_DRM_LEGACY=y
-# CONFIG_DRM_TDFX is not set
-# CONFIG_DRM_R128 is not set
-# CONFIG_DRM_MGA is not set
-# CONFIG_DRM_VIA is not set
-# CONFIG_DRM_SAVAGE is not set
+# CONFIG_DRM_LEGACY is not set
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y
#
diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
index 7b6083018..dbae04231 100644
--- a/arch/x86/configs/openeuler_defconfig
+++ b/arch/x86/configs/openeuler_defconfig
@@ -5040,7 +5040,6 @@ CONFIG_DRM_TTM_DMA_PAGE_POOL=y
CONFIG_DRM_VRAM_HELPER=m
CONFIG_DRM_TTM_HELPER=m
CONFIG_DRM_GEM_SHMEM_HELPER=y
-CONFIG_DRM_VM=y
CONFIG_DRM_SCHED=m
#
@@ -5085,7 +5084,7 @@ CONFIG_DRM_AMD_DC_DCN=y
# CONFIG_HSA_AMD is not set
CONFIG_DRM_NOUVEAU=m
-CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT=y
+# CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT is not set
CONFIG_NOUVEAU_DEBUG=5
CONFIG_NOUVEAU_DEBUG_DEFAULT=3
# CONFIG_NOUVEAU_DEBUG_MMU is not set
@@ -5226,12 +5225,7 @@ CONFIG_DRM_CIRRUS_QEMU=m
# CONFIG_TINYDRM_ST7735R is not set
# CONFIG_DRM_XEN is not set
# CONFIG_DRM_VBOXVIDEO is not set
-CONFIG_DRM_LEGACY=y
-# CONFIG_DRM_TDFX is not set
-# CONFIG_DRM_R128 is not set
-# CONFIG_DRM_MGA is not set
-# CONFIG_DRM_VIA is not set
-# CONFIG_DRM_SAVAGE is not set
+# CONFIG_DRM_LEGACY is not set
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y
#
--
2.27.0
From: Keefe LIU <liuqifa(a)huawei.com>
hulk inclusion
category: feature
bugzilla: 9511, https://gitee.com/openeuler/kernel/issues/I4IHL1
CVE: NA
-------------------------------------------------
In a typical IPvlan L2 setup the master is in the default-ns and
each slave is in a different (slave) ns. In this setup, if the master
and the slaves are in different nets, egress traffic originating from
a slave-ns cannot be forwarded to the master or to other machines
whose IP is in the same net as the master, nor can it be forwarded to
other interfaces in the default-ns.
This patch introduces a new mode, l2e, for ipvlan to realize the above
goals; it does not affect the original l2, l3 and l3s modes.
As the ip tool doesn't support the l2e mode, we use the module param
"ipvlan_default_mode" to set the default work mode: 0 for l2,
1 for l3, 2 for l2e, 3 for l3s; other values are invalid for now.
Note that when we create ipvlan devices with the "ip" command and
assign a mode explicitly, ipvlan will work in the assigned mode rather
than in "ipvlan_default_mode".
Signed-off-by: Keefe LIU <liuqifa(a)huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Modified:
when insmoding the ipvlan module, the original version (4.19) uses
0 for l2, 1 for l3, 2 for l2e and 3 for l3s, while the 5.10 version
uses 0 for l2, 1 for l3, 2 for l3s and 3 for l2e.
Signed-off-by: Lu Wei <luwei32(a)huawei.com>
Reviewed-by: Yue Haibing <yuehaibing(a)huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
drivers/net/ipvlan/ipvlan_core.c | 190 +++++++++++++++++++++++++++++++
drivers/net/ipvlan/ipvlan_main.c | 6 +-
include/uapi/linux/if_link.h | 1 +
3 files changed, 196 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 8801d093135c..5b695ec5c650 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -532,6 +532,122 @@ static int ipvlan_process_outbound(struct sk_buff *skb)
return ret;
}
+static int ipvlan_process_v4_forward(struct sk_buff *skb)
+{
+ const struct iphdr *ip4h = ip_hdr(skb);
+ struct net_device *dev = skb->dev;
+ struct net *net = dev_net(dev);
+ struct rtable *rt;
+ int err, ret = NET_XMIT_DROP;
+ struct flowi4 fl4 = {
+ .flowi4_tos = RT_TOS(ip4h->tos),
+ .flowi4_flags = FLOWI_FLAG_ANYSRC,
+ .flowi4_mark = skb->mark,
+ .daddr = ip4h->daddr,
+ .saddr = ip4h->saddr,
+ };
+
+ rt = ip_route_output_flow(net, &fl4, NULL);
+ if (IS_ERR(rt))
+ goto err;
+
+ if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
+ ip_rt_put(rt);
+ goto err;
+ }
+ skb_dst_set(skb, &rt->dst);
+ err = ip_local_out(net, skb->sk, skb);
+ if (unlikely(net_xmit_eval(err)))
+ dev->stats.tx_errors++;
+ else
+ ret = NET_XMIT_SUCCESS;
+ goto out;
+err:
+ dev->stats.tx_errors++;
+ kfree_skb(skb);
+out:
+ return ret;
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+static int ipvlan_process_v6_forward(struct sk_buff *skb)
+{
+ const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+ struct net_device *dev = skb->dev;
+ struct net *net = dev_net(dev);
+ struct dst_entry *dst;
+ int err, ret = NET_XMIT_DROP;
+ struct flowi6 fl6 = {
+ .daddr = ip6h->daddr,
+ .saddr = ip6h->saddr,
+ .flowi6_flags = FLOWI_FLAG_ANYSRC,
+ .flowlabel = ip6_flowinfo(ip6h),
+ .flowi6_mark = skb->mark,
+ .flowi6_proto = ip6h->nexthdr,
+ };
+
+ dst = ip6_route_output(net, NULL, &fl6);
+ if (dst->error) {
+ ret = dst->error;
+ dst_release(dst);
+ goto err;
+ }
+ skb_dst_set(skb, dst);
+ err = ip6_local_out(net, skb->sk, skb);
+ if (unlikely(net_xmit_eval(err)))
+ dev->stats.tx_errors++;
+ else
+ ret = NET_XMIT_SUCCESS;
+ goto out;
+err:
+ dev->stats.tx_errors++;
+ kfree_skb(skb);
+out:
+ return ret;
+}
+#else
+static int ipvlan_process_v6_forward(struct sk_buff *skb)
+{
+ return NET_XMIT_DROP;
+}
+#endif
+
+static int ipvlan_process_forward(struct sk_buff *skb)
+{
+ struct ethhdr *ethh = eth_hdr(skb);
+ int ret = NET_XMIT_DROP;
+
+ /* In this mode we dont care about multicast and broadcast traffic */
+ if (is_multicast_ether_addr(ethh->h_dest)) {
+ pr_debug_ratelimited("Dropped {multi|broad}cast of type=[%x]\n",
+ ntohs(skb->protocol));
+ kfree_skb(skb);
+ goto out;
+ }
+
+ /* The ipvlan is a pseudo-L2 device, so the packets that we receive
+ * will have L2; which need to discarded and processed further
+ * in the net-ns of the main-device.
+ */
+ if (skb_mac_header_was_set(skb)) {
+ skb_pull(skb, sizeof(*ethh));
+ skb->mac_header = (typeof(skb->mac_header))~0U;
+ skb_reset_network_header(skb);
+ }
+
+ if (skb->protocol == htons(ETH_P_IPV6)) {
+ ret = ipvlan_process_v6_forward(skb);
+ } else if (skb->protocol == htons(ETH_P_IP)) {
+ ret = ipvlan_process_v4_forward(skb);
+ } else {
+ pr_warn_ratelimited("Dropped outbound packet type=%x\n",
+ ntohs(skb->protocol));
+ kfree_skb(skb);
+ }
+out:
+ return ret;
+}
+
static void ipvlan_multicast_enqueue(struct ipvl_port *port,
struct sk_buff *skb, bool tx_pkt)
{
@@ -629,6 +745,46 @@ static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
return dev_queue_xmit(skb);
}
+static int ipvlan_xmit_mode_l2e(struct sk_buff *skb, struct net_device *dev)
+{
+ const struct ipvl_dev *ipvlan = netdev_priv(dev);
+ struct ethhdr *eth = eth_hdr(skb);
+ struct ipvl_addr *addr;
+ void *lyr3h;
+ int addr_type;
+
+ if (!ipvlan_is_vepa(ipvlan->port) &&
+ ether_addr_equal(eth->h_dest, eth->h_source)) {
+ lyr3h = ipvlan_get_L3_hdr(ipvlan->port, skb, &addr_type);
+ if (lyr3h) {
+ addr = ipvlan_addr_lookup(ipvlan->port, lyr3h,
+ addr_type, true);
+ if (addr) {
+ if (ipvlan_is_private(ipvlan->port)) {
+ consume_skb(skb);
+ return NET_XMIT_DROP;
+ }
+ return ipvlan_rcv_frame(addr, &skb, true);
+ }
+ }
+ skb = skb_share_check(skb, GFP_ATOMIC);
+ if (!skb)
+ return NET_XMIT_DROP;
+
+ /* maybe the packet need been forward */
+ skb->dev = ipvlan->phy_dev;
+ ipvlan_skb_crossing_ns(skb, ipvlan->phy_dev);
+ return ipvlan_process_forward(skb);
+ } else if (is_multicast_ether_addr(eth->h_dest)) {
+ ipvlan_skb_crossing_ns(skb, NULL);
+ ipvlan_multicast_enqueue(ipvlan->port, skb, true);
+ return NET_XMIT_SUCCESS;
+ }
+
+ skb->dev = ipvlan->phy_dev;
+ return dev_queue_xmit(skb);
+}
+
int ipvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct ipvl_dev *ipvlan = netdev_priv(dev);
@@ -648,6 +804,8 @@ int ipvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
case IPVLAN_MODE_L3S:
#endif
return ipvlan_xmit_mode_l3(skb, dev);
+ case IPVLAN_MODE_L2E:
+ return ipvlan_xmit_mode_l2e(skb, dev);
}
/* Should not reach here */
@@ -729,6 +887,36 @@ static rx_handler_result_t ipvlan_handle_mode_l2(struct sk_buff **pskb,
return ret;
}
+static rx_handler_result_t ipvlan_handle_mode_l2e(struct sk_buff **pskb,
+ struct ipvl_port *port)
+{
+ struct sk_buff *skb = *pskb;
+ struct ethhdr *eth = eth_hdr(skb);
+ rx_handler_result_t ret = RX_HANDLER_PASS;
+
+ if (is_multicast_ether_addr(eth->h_dest)) {
+ if (ipvlan_external_frame(skb, port)) {
+ struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
+
+ /* External frames are queued for device local
+ * distribution, but a copy is given to master
+ * straight away to avoid sending duplicates later
+ * when work-queue processes this frame. This is
+ * achieved by returning RX_HANDLER_PASS.
+ */
+ if (nskb) {
+ ipvlan_skb_crossing_ns(nskb, NULL);
+ ipvlan_multicast_enqueue(port, nskb, false);
+ }
+ }
+ } else {
+ /* Perform like l3 mode for non-multicast packet */
+ ret = ipvlan_handle_mode_l3(pskb, port);
+ }
+
+ return ret;
+}
+
rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb)
{
struct sk_buff *skb = *pskb;
@@ -746,6 +934,8 @@ rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb)
case IPVLAN_MODE_L3S:
return RX_HANDLER_PASS;
#endif
+ case IPVLAN_MODE_L2E:
+ return ipvlan_handle_mode_l2e(pskb, port);
}
/* Should not reach here */
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 60b7d93bb834..efd1452cd929 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -4,6 +4,10 @@
#include "ipvlan.h"
+static int ipvlan_default_mode = IPVLAN_MODE_L3;
+module_param(ipvlan_default_mode, int, 0400);
+MODULE_PARM_DESC(ipvlan_default_mode, "set ipvlan default mode: 0 for l2, 1 for l3, 2 for l3s, 3 for l2e, others invalid now");
+
static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval,
struct netlink_ext_ack *extack)
{
@@ -535,7 +539,7 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
struct ipvl_port *port;
struct net_device *phy_dev;
int err;
- u16 mode = IPVLAN_MODE_L3;
+ u16 mode = ipvlan_default_mode;
if (!tb[IFLA_LINK])
return -EINVAL;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index c4b23f06f69e..50d4705e1cbc 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -690,6 +690,7 @@ enum ipvlan_mode {
IPVLAN_MODE_L2 = 0,
IPVLAN_MODE_L3,
IPVLAN_MODE_L3S,
+ IPVLAN_MODE_L2E,
IPVLAN_MODE_MAX
};
--
2.20.1
[PATCH OLK-5.10 v2] arm64: Revert feature: Add memmap parameter and register pmem
by Zhuling 27 Dec '21
euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O31I?from=project-issue
CVE: NA
The reserved memory of PMEM conflicts with
"add memmap interface to reserved memory for mremap syscall usage",
so the feature needs to be rolled back and resubmitted after adaptation.
Related feature commits:
1. PMEM function commit: 94dc364f5eda10f49449ba573dc3322e1ea92280
2. PMEM feature config commit: 36d7a831e15ceb84e937122c87d01c14242dc377
Signed-off-by: Zhuling <zhuling8(a)huawei.com>
---
arch/arm64/Kconfig | 21 --------
arch/arm64/configs/openeuler_defconfig | 3 --
arch/arm64/kernel/Makefile | 1 -
arch/arm64/kernel/pmem.c | 35 -------------
arch/arm64/kernel/setup.c | 10 ----
arch/arm64/mm/init.c | 94 ----------------------------------
drivers/nvdimm/Kconfig | 5 --
drivers/nvdimm/Makefile | 1 -
8 files changed, 170 deletions(-)
delete mode 100644 arch/arm64/kernel/pmem.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index df90a6e..2df4b31 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1321,27 +1321,6 @@ config RODATA_FULL_DEFAULT_ENABLED
This requires the linear region to be mapped down to pages,
which may adversely affect performance in some cases.
-config ARM64_PMEM_RESERVE
- bool "Reserve memory for persistent storage"
- default n
- help
- Use memmap=nn[KMG]!ss[KMG](memmap=100K!0x1a0000000) reserve
- memory for persistent storage.
-
- Say y here to enable this feature.
-
-config ARM64_PMEM_LEGACY_DEVICE
- bool "Create persistent storage"
- depends on BLK_DEV
- depends on LIBNVDIMM
- select ARM64_PMEM_RESERVE
- help
- Use reserved memory for persistent storage when the kernel
- restart or update. the data in PMEM will not be lost and
- can be loaded faster.
-
- Say y if unsure.
-
config ARM64_SW_TTBR0_PAN
bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
help
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 17bc875..b5fc851 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -416,8 +416,6 @@ CONFIG_ARM64_CPU_PARK=y
CONFIG_FORCE_MAX_ZONEORDER=11
CONFIG_UNMAP_KERNEL_AT_EL0=y
CONFIG_RODATA_FULL_DEFAULT_ENABLED=y
-CONFIG_ARM64_PMEM_RESERVE=y
-CONFIG_ARM64_PMEM_LEGACY_DEVICE=y
# CONFIG_ARM64_SW_TTBR0_PAN is not set
CONFIG_ARM64_TAGGED_ADDR_ABI=y
CONFIG_ARM64_ILP32=y
@@ -6026,7 +6024,6 @@ CONFIG_ND_BTT=m
CONFIG_BTT=y
CONFIG_OF_PMEM=m
CONFIG_NVDIMM_KEYS=y
-CONFIG_PMEM_LEGACY=m
CONFIG_DAX_DRIVER=y
CONFIG_DAX=y
CONFIG_DEV_DAX=m
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index f615325..169d90f 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -68,7 +68,6 @@ obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_ARM64_MTE) += mte.o
obj-$(CONFIG_MPAM) += mpam/
-obj-$(CONFIG_ARM64_PMEM_LEGACY_DEVICE) += pmem.o
obj-y += vdso/ probes/
obj-$(CONFIG_COMPAT_VDSO) += vdso32/
diff --git a/arch/arm64/kernel/pmem.c b/arch/arm64/kernel/pmem.c
deleted file mode 100644
index 16eaf70..0000000
--- a/arch/arm64/kernel/pmem.c
+++ /dev/null
@@ -1,35 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Copyright(c) 2021 Huawei Technologies Co., Ltd
- *
- * Derived from x86 and arm64 implement PMEM.
- */
-#include <linux/platform_device.h>
-#include <linux/init.h>
-#include <linux/ioport.h>
-#include <linux/module.h>
-
-static int found(struct resource *res, void *data)
-{
- return 1;
-}
-
-static int __init register_e820_pmem(void)
-{
- struct platform_device *pdev;
- int rc;
-
- rc = walk_iomem_res_desc(IORES_DESC_PERSISTENT_MEMORY_LEGACY,
- IORESOURCE_MEM, 0, -1, NULL, found);
- if (rc <= 0)
- return 0;
-
- /*
- * See drivers/nvdimm/e820.c for the implementation, this is
- * simply here to trigger the module to load on demand.
- */
- pdev = platform_device_alloc("e820_pmem", -1);
-
- return platform_device_add(pdev);
-}
-device_initcall(register_e820_pmem);
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 88ec495..7cd0425 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -70,10 +70,6 @@ static int __init arm64_enable_cpu0_hotplug(char *str)
__setup("arm64_cpu0_hotplug", arm64_enable_cpu0_hotplug);
#endif
-#ifdef CONFIG_ARM64_PMEM_RESERVE
-extern struct resource pmem_res;
-#endif
-
phys_addr_t __fdt_pointer __initdata;
/*
@@ -288,12 +284,6 @@ static void __init request_standard_resources(void)
request_resource(res, &pin_memory_resource);
#endif
}
-
-#ifdef CONFIG_ARM64_PMEM_RESERVE
- if (pmem_res.end && pmem_res.start)
- request_resource(&iomem_resource, &pmem_res);
-#endif
-
}
static int __init reserve_memblock_reserved_regions(void)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 3b9401e..e8d4461 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -55,7 +55,6 @@
*/
s64 memstart_addr __ro_after_init = -1;
EXPORT_SYMBOL(memstart_addr);
-phys_addr_t start_at, mem_size;
#ifdef CONFIG_PIN_MEMORY
struct resource pin_memory_resource = {
@@ -112,18 +111,6 @@ static void __init reserve_pin_memory_res(void)
*/
phys_addr_t arm64_dma_phys_limit __ro_after_init;
-static unsigned long long pmem_size, pmem_start;
-
-#ifdef CONFIG_ARM64_PMEM_RESERVE
-struct resource pmem_res = {
- .name = "Persistent Memory (legacy)",
- .start = 0,
- .end = 0,
- .flags = IORESOURCE_MEM,
- .desc = IORES_DESC_PERSISTENT_MEMORY_LEGACY
-};
-#endif
-
#ifndef CONFIG_KEXEC_CORE
static void __init reserve_crashkernel(void)
{
@@ -417,83 +404,6 @@ static int __init reserve_park_mem(void)
}
#endif
-static bool __init is_mem_valid(unsigned long long mem_size, unsigned long long mem_start)
-{
- if (!memblock_is_region_memory(mem_start, mem_size)) {
- pr_warn("cannot reserve mem: region is not memory!\n");
- return false;
- }
-
- if (memblock_is_region_reserved(mem_start, mem_size)) {
- pr_warn("cannot reserve mem: region overlaps reserved memory!\n");
- return false;
- }
-
- if (!IS_ALIGNED(mem_start, SZ_2M)) {
- pr_warn("cannot reserve mem: base address is not 2MB aligned!\n");
- return false;
- }
-
- return true;
-}
-
-static int __init parse_memmap_one(char *p)
-{
- char *oldp;
-
- if (!p)
- return -EINVAL;
-
- oldp = p;
- mem_size = memparse(p, &p);
- if (p == oldp)
- return -EINVAL;
-
- if (!mem_size)
- return -EINVAL;
-
- mem_size = PAGE_ALIGN(mem_size);
-
- if (*p == '!') {
- start_at = memparse(p+1, &p);
-
- pmem_start = start_at;
- pmem_size = mem_size;
- } else
- pr_info("Unrecognized memmap option, please check the parameter.\n");
-
- return *p == '\0' ? 0 : -EINVAL;
-}
-
-static int __init parse_memmap_opt(char *str)
-{
- while (str) {
- char *k = strchr(str, ',');
-
- if (k)
- *k++ = 0;
- parse_memmap_one(str);
- str = k;
- }
-
- return 0;
-}
-early_param("memmap", parse_memmap_opt);
-
-#ifdef CONFIG_ARM64_PMEM_RESERVE
-static void __init reserve_pmem(void)
-{
- if (!is_mem_valid(mem_size, start_at))
- return;
-
- memblock_remove(pmem_start, pmem_size);
- pr_info("pmem reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
- pmem_start, pmem_start + pmem_size, pmem_size >> 20);
- pmem_res.start = pmem_start;
- pmem_res.end = pmem_start + pmem_size - 1;
-}
-#endif
-
void __init arm64_memblock_init(void)
{
const s64 linear_region_size = BIT(vabits_actual - 1);
@@ -668,10 +578,6 @@ void __init bootmem_init(void)
reserve_quick_kexec();
#endif
-#ifdef CONFIG_ARM64_PMEM_RESERVE
- reserve_pmem();
-#endif
-
reserve_pin_memory_res();
memblock_dump_all();
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index ce4de75..b7d1eb3 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -132,8 +132,3 @@ config NVDIMM_TEST_BUILD
infrastructure.
endif
-
-config PMEM_LEGACY
- tristate "Pmem_legacy"
- select X86_PMEM_LEGACY if X86
- select ARM64_PMEM_LEGACY_DEVICE if ARM64
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 6f8dc92..0407753 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -3,7 +3,6 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
-obj-$(CONFIG_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_OF_PMEM) += of_pmem.o
obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
--
2.9.5
[PATCH openEuler-1.0-LTS 01/36] net: Prevent infinite while loop in skb_tx_hash()
by Yang Yingliang 27 Dec '21
From: Michael Chan <michael.chan(a)broadcom.com>
stable inclusion
from linux-4.19.215
commit 02302cbd52264337630a32848ac03648648e9685
--------------------------------
commit 0c57eeecc559ca6bc18b8c4e2808bc78dbe769b0 upstream.
Drivers call netdev_set_num_tc() and then netdev_set_tc_queue()
to set the queue count and offset for each TC. So the queue count
and offset for the TCs may be zero for a short period after dev->num_tc
has been set. If a TX packet is being transmitted at this time in the
code path netdev_pick_tx() -> skb_tx_hash(), skb_tx_hash() may see
nonzero dev->num_tc but zero qcount for the TC. The while loop that
keeps looping while hash >= qcount will not end.
Fix it by checking the TC's qcount to be nonzero before using it.
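For reference, a condensed sketch of the loop in question (simplified
from skb_tx_hash() in net/core/dev.c; not the complete function):
	/* Condensed from skb_tx_hash(): if the TC's count is still 0, the
	 * subtraction loop never terminates, since the unsigned hash can
	 * never drop below zero. */
	u16 qoffset = sb_dev->tc_to_txq[tc].offset;
	u16 qcount  = sb_dev->tc_to_txq[tc].count;	/* may still be 0 */
	u32 hash    = skb_get_rx_queue(skb);
	while (unlikely(hash >= qcount))		/* spins when qcount == 0 */
		hash -= qcount;
	return hash + qoffset;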
Fixes: eadec877ce9c ("net: Add support for subordinate traffic classes to netdev_pick_tx")
Reviewed-by: Andy Gospodarek <gospo(a)broadcom.com>
Signed-off-by: Michael Chan <michael.chan(a)broadcom.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
net/core/dev.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index 5d9800804d4a4..6ed7810ed7bd4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2846,6 +2846,12 @@ static u16 skb_tx_hash(const struct net_device *dev,
qoffset = sb_dev->tc_to_txq[tc].offset;
qcount = sb_dev->tc_to_txq[tc].count;
+ if (unlikely(!qcount)) {
+ net_warn_ratelimited("%s: invalid qcount, qoffset %u for tc %u\n",
+ sb_dev->name, qoffset, tc);
+ qoffset = 0;
+ qcount = dev->real_num_tx_queues;
+ }
}
if (skb_rx_queue_recorded(skb)) {
--
2.25.1
[PATCH openEuler-1.0-LTS] watchdog: Fix check_preemption_disabled() error
by Yang Yingliang 27 Dec '21
From: Wei Li <liwei391(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 173968, https://gitee.com/openeuler/kernel/issues/I3J87Y
CVE: NA
-------------------------------------------------
When CONFIG_DEBUG_PREEMPT and CONFIG_PREEMPT are enabled, a 'BUG' is
triggered in the pmu based nmi_watchdog initialization:
[ 3.341853] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
[ 3.344392] caller is debug_smp_processor_id+0x17/0x20
[ 3.344395] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #398
[ 3.344397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[ 3.344399] Call Trace:
[ 3.344410] dump_stack+0x60/0x76
[ 3.344412] check_preemption_disabled+0xba/0xc0
[ 3.344415] debug_smp_processor_id+0x17/0x20
[ 3.344422] hardlockup_detector_event_create+0xf/0x60
[ 3.344427] hardlockup_detector_perf_init+0xf/0x41
[ 3.344430] watchdog_nmi_probe+0xe/0x10
[ 3.344432] lockup_detector_init+0x22/0x5b
[ 3.344437] kernel_init_freeable+0x20c/0x245
[ 3.344439] ? rest_init+0xd0/0xd0
[ 3.344441] kernel_init+0xe/0x110
[ 3.344446] ret_from_fork+0x22/0x30
This issue was introduced by commit 141482cb4b01, which moved
lockup_detector_init() down after do_basic_setup(), and therefore after
sched_init_smp() too.
hardlockup_detector_event_create
  |- hardlockup_detector_perf_init (unsafe)
  |    |- watchdog_nmi_probe
  |         |- lockup_detector_init
  |- hardlockup_detector_perf_enable
       |- watchdog_nmi_enable
            |- watchdog_enable
                 |- lockup_detector_online_cpu
                 |- softlockup_start_fn
                      |- softlockup_start_all
                           |- lockup_detector_reconfigure
                                |- lockup_detector_setup
                                     |- lockup_detector_init
After analysing the calling context, smp_processor_id() is only unsafe
in hardlockup_detector_perf_init(), because the 'kernel_init' thread is
preemptible after sched_init_smp().
Since this call is merely a test of whether the pmu based nmi_watchdog
can be enabled, and the real enabling happens later in
softlockup_start_fn(), which ensures that watchdog_enable() is called
on all cores, it is safe to disable preemption here to fix this 'BUG'.
Fixes: 141482cb4b01 ("lockup_detector: init lockup detector after all the init_calls")
Signed-off-by: Wei Li <liwei391(a)huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
kernel/watchdog_hld.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 43832b1023693..a5aff8ffa48ce 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -508,14 +508,17 @@ void __init hardlockup_detector_perf_restart(void)
*/
int __init hardlockup_detector_perf_init(void)
{
- int ret = hardlockup_detector_event_create();
+ int ret;
+ preempt_disable();
+ ret = hardlockup_detector_event_create();
if (ret) {
pr_info("Perf NMI watchdog permanently disabled\n");
} else {
perf_event_release_kernel(this_cpu_read(watchdog_ev));
this_cpu_write(watchdog_ev, NULL);
}
+ preempt_enable();
return ret;
}
#endif /* CONFIG_HARDLOCKUP_DETECTOR_PERF */
--
2.25.1
[PATCH openEuler-1.0-LTS] btrfs: unlock newly allocated extent buffer after error
by Yang Yingliang 27 Dec '21
From: Qu Wenruo <wqu(a)suse.com>
mainline inclusion
from mainline-v5.15-rc6
commit 19ea40dddf1833db868533958ca066f368862211
category: bugfix
bugzilla: NA
CVE: CVE-2021-4149
--------------------------------
[BUG]
There is a bug report that an injected ENOMEM error could leave a tree
block locked when we return to user-space:
BTRFS info (device loop0): enabling ssd optimizations
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
CPU: 0 PID: 7579 Comm: syz-executor Not tainted 5.15.0-rc1 #16
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106
fail_dump lib/fault-inject.c:52 [inline]
should_fail+0x13c/0x160 lib/fault-inject.c:146
should_failslab+0x5/0x10 mm/slab_common.c:1328
slab_pre_alloc_hook.constprop.99+0x4e/0xc0 mm/slab.h:494
slab_alloc_node mm/slub.c:3120 [inline]
slab_alloc mm/slub.c:3214 [inline]
kmem_cache_alloc+0x44/0x280 mm/slub.c:3219
btrfs_alloc_delayed_extent_op fs/btrfs/delayed-ref.h:299 [inline]
btrfs_alloc_tree_block+0x38c/0x670 fs/btrfs/extent-tree.c:4833
__btrfs_cow_block+0x16f/0x7d0 fs/btrfs/ctree.c:415
btrfs_cow_block+0x12a/0x300 fs/btrfs/ctree.c:570
btrfs_search_slot+0x6b0/0xee0 fs/btrfs/ctree.c:1768
btrfs_insert_empty_items+0x80/0xf0 fs/btrfs/ctree.c:3905
btrfs_new_inode+0x311/0xa60 fs/btrfs/inode.c:6530
btrfs_create+0x12b/0x270 fs/btrfs/inode.c:6783
lookup_open+0x660/0x780 fs/namei.c:3282
open_last_lookups fs/namei.c:3352 [inline]
path_openat+0x465/0xe20 fs/namei.c:3557
do_filp_open+0xe3/0x170 fs/namei.c:3588
do_sys_openat2+0x357/0x4a0 fs/open.c:1200
do_sys_open+0x87/0xd0 fs/open.c:1216
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x46ae99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f46711b9c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
RAX: ffffffffffffffda RBX: 000000000078c0a0 RCX: 000000000046ae99
RDX: 0000000000000000 RSI: 00000000000000a1 RDI: 0000000020005800
RBP: 00007f46711b9c80 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000017
R13: 0000000000000000 R14: 000000000078c0a0 R15: 00007ffc129da6e0
================================================
WARNING: lock held when returning to user space!
5.15.0-rc1 #16 Not tainted
------------------------------------------------
syz-executor/7579 is leaving the kernel with locks still held!
1 lock held by syz-executor/7579:
#0: ffff888104b73da8 (btrfs-tree-01/1){+.+.}-{3:3}, at:
__btrfs_tree_lock+0x2e/0x1a0 fs/btrfs/locking.c:112
[CAUSE]
In btrfs_alloc_tree_block(), after btrfs_init_new_buffer(), the new
extent buffer @buf is locked, but if later operations like adding the
delayed tree ref fail, we just free @buf without unlocking it,
resulting in the above warning.
[FIX]
Unlock @buf in out_free_buf: label.
Reported-by: Hao Sun <sunhao.th(a)gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CACkBjsZ9O6Zr0KK1yGn=1rQi6Crh1yeCRdTSBx…
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/btrfs/extent-tree.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4bf93184362ec..dcbb76b62a0b0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8324,6 +8324,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
out_free_delayed:
btrfs_free_delayed_extent_op(extent_op);
out_free_buf:
+ btrfs_tree_unlock(buf);
free_extent_buffer(buf);
out_free_reserved:
btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 0);
--
2.25.1
[PATCH openEuler-1.0-LTS] ext4: Fix null-ptr-deref in '__ext4_journal_ensure_credits'
by Yang Yingliang 25 Dec '21
From: Ye Bin <yebin10(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 185945, https://gitee.com/openeuler/kernel/issues/I4O33F
CVE: NA
-----------------------------------------------
We got the following issue when running a syzkaller test:
[ 1901.130043] EXT4-fs error (device vda): ext4_remount:5624: comm syz-executor.5: Abort forced by user
[ 1901.130901] Aborting journal on device vda-8.
[ 1901.131437] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.16: Detected aborted journal
[ 1901.131566] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.11: Detected aborted journal
[ 1901.132586] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.18: Detected aborted journal
[ 1901.132751] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.9: Detected aborted journal
[ 1901.136149] EXT4-fs error (device vda) in ext4_reserve_inode_write:6035: Journal has aborted
[ 1901.136837] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-fuzzer: Detected aborted journal
[ 1901.136915] ==================================================================
[ 1901.138175] BUG: KASAN: null-ptr-deref in __ext4_journal_ensure_credits+0x74/0x140 [ext4]
[ 1901.138343] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.13: Detected aborted journal
[ 1901.138398] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.1: Detected aborted journal
[ 1901.138808] Read of size 8 at addr 0000000000000000 by task syz-executor.17/968
[ 1901.138817]
[ 1901.138852] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.30: Detected aborted journal
[ 1901.144779] CPU: 1 PID: 968 Comm: syz-executor.17 Not tainted 4.19.90-vhulk2111.1.0.h893.eulerosv2r10.aarch64+ #1
[ 1901.146479] Hardware name: linux,dummy-virt (DT)
[ 1901.147317] Call trace:
[ 1901.147552] dump_backtrace+0x0/0x2d8
[ 1901.147898] show_stack+0x28/0x38
[ 1901.148215] dump_stack+0xec/0x15c
[ 1901.148746] kasan_report+0x108/0x338
[ 1901.149207] __asan_load8+0x58/0xb0
[ 1901.149753] __ext4_journal_ensure_credits+0x74/0x140 [ext4]
[ 1901.150579] ext4_xattr_delete_inode+0xe4/0x700 [ext4]
[ 1901.151316] ext4_evict_inode+0x524/0xba8 [ext4]
[ 1901.151985] evict+0x1a4/0x378
[ 1901.152353] iput+0x310/0x428
[ 1901.152733] do_unlinkat+0x260/0x428
[ 1901.153056] __arm64_sys_unlinkat+0x6c/0xc0
[ 1901.153455] el0_svc_common+0xc8/0x320
[ 1901.153799] el0_svc_handler+0xf8/0x160
[ 1901.154265] el0_svc+0x10/0x218
[ 1901.154682] ==================================================================
This issue may happen like this:
Process1                                     Process2
ext4_evict_inode
  ext4_journal_start
  ext4_truncate
    ext4_ind_truncate
      ext4_free_branches
        ext4_ind_truncate_ensure_credits
          ext4_journal_ensure_credits_fn
            ext4_journal_restart
              handle->h_transaction = NULL;
                                             mount -o remount,abort /mnt
                                             -> trigger JBD abort
              start_this_handle -> will return failed
  ext4_xattr_delete_inode
    ext4_journal_ensure_credits
      ext4_journal_ensure_credits_fn
        __ext4_journal_ensure_credits
          jbd2_handle_buffer_credits
            journal = handle->h_transaction->t_journal; -> null-ptr-deref
Currently, the indirect truncate path does not handle this error.
Simply adding a check for an aborted handle in
'__ext4_journal_ensure_credits' is enough to solve this issue, and the
check is necessary in any case.
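For reference, a condensed sketch of the dereference site (simplified
from jbd2_handle_buffer_credits(); field names follow mainline and may
differ slightly in this backport, and the credit arithmetic is elided):
	/* Once ext4_journal_restart() has failed and left
	 * handle->h_transaction == NULL, this dereference oopses. */
	static inline int jbd2_handle_buffer_credits(handle_t *handle)
	{
		journal_t *journal;
		if (!handle->h_reserved)
			journal = handle->h_transaction->t_journal; /* NULL deref */
		else
			journal = handle->h_journal;
		/* ... credit arithmetic using journal elided ... */
	}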
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/ext4/ext4_jbd2.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 022ab8afd9499..2c6dc99292fdc 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -140,6 +140,8 @@ int __ext4_journal_ensure_credits(handle_t *handle, int check_cred,
{
if (!ext4_handle_valid(handle))
return 0;
+ if (is_handle_aborted(handle))
+ return -EROFS;
if (jbd2_handle_buffer_credits(handle) >= check_cred &&
handle->h_revoke_credits >= revoke_cred)
return 0;
--
2.25.1
[PATCH openEuler-1.0-LTS] net/hinic: Fix call trace when the rx_buff module parameter is grater than 2
by Yang Yingliang 25 Dec '21
From: Chiqijun <chiqijun(a)huawei.com>
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O2ZZ
-----------------------------------------------------------------------
When rx_buff is greater than 2, the driver allocates more than one page
of memory per network rx buffer, but the __GFP_COMP gfp flag is not
set, resulting in the following call trace:
CPU: 3 PID: 494041 Comm: ping Kdump: loaded Tainted: G W OE 4.19.90-2106.3.0.0095.oe1.x86_64 #1
Hardware name: Huawei Technologies Co., Ltd. RH2288H V3/BC11HGSA0, BIOS 5.15 05/21/2019
RIP: 0010:copy_page_to_iter+0x154/0x310
Code: 31 b8 00 10 00 00 f7 c6 00 80 00 00 74 07 0f b6 49 51 48 d3 e0 48 39 c2 0f 86 ed fe ff ff 48 c7 c7 30
RSP: 0018:ffffbd6907d03bd8 EFLAGS: 00010286
RAX: 0000000000000024 RBX: ffffe0ffee5b3000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9edbbfcd6858 RDI: ffff9edbbfcd6858
RBP: 0000000000000001 R08: 000000000001574a R09: 0000000000000004
R10: 000000000000004e R11: 0000000000000001 R12: ffffbd6907d03ed0
R13: 0000000000002100 R14: 0000000000000030 R15: 0000000000000000
FS: 00007f9d37244dc0(0000) GS:ffff9edbbfcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe0e715f80 CR3: 000000203c018005 CR4: 00000000001606e0
Call Trace:
skb_copy_datagram_iter+0x16c/0x2a0
raw_recvmsg+0xd0/0x1f0
inet_recvmsg+0x5b/0xd0
____sys_recvmsg+0x95/0x160
? import_iovec+0x37/0xd0
? copy_msghdr_from_user+0x5c/0x90
___sys_recvmsg+0x8c/0xd0
? __audit_syscall_exit+0x228/0x290
? kretprobe_trampoline+0x25/0x50
? __sys_recvmsg+0x5b/0xa0
__sys_recvmsg+0x5b/0xa0
do_syscall_64+0x5f/0x240
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Use 'dev_alloc_pages' instead of calling 'alloc_pages_node' directly.
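For context, a condensed sketch of why dev_alloc_pages() helps
(simplified from the mainline __dev_alloc_pages() helper; exact flags
can vary by kernel version):
#include <linux/gfp.h>
#include <linux/skbuff.h>
/* Simplified __dev_alloc_pages(): adding __GFP_COMP makes an order > 0
 * rx buffer a compound page, which the skb copy paths such as
 * copy_page_to_iter() handle correctly. */
static inline struct page *dev_alloc_pages_sketch(unsigned int order)
{
	gfp_t gfp_mask = GFP_ATOMIC | __GFP_NOWARN;
	gfp_mask |= __GFP_COMP | __GFP_MEMALLOC;
	return alloc_pages_node(NUMA_NO_NODE, gfp_mask, order);
}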
Signed-off-by: Chiqijun <chiqijun(a)huawei.com>
Reviewed-by: Wangxiaoyun <cloud.wangxiaoyun(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/net/ethernet/huawei/hinic/hinic_rx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_rx.c b/drivers/net/ethernet/huawei/hinic/hinic_rx.c
index a9047432f3053..019cd439ce696 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_rx.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_rx.c
@@ -67,7 +67,7 @@ static bool rx_alloc_mapped_page(struct hinic_rxq *rxq,
return true;
/* alloc new page for storage */
- page = alloc_pages_node(NUMA_NO_NODE, GFP_ATOMIC, nic_dev->page_order);
+ page = dev_alloc_pages(nic_dev->page_order);
if (unlikely(!page)) {
RXQ_STATS_INC(rxq, alloc_rx_buf_err);
return false;
--
2.25.1
[PATCH openEuler-1.0-LTS 1/3] arm64/mpam: remove __init macro to support driver probe
by Yang Yingliang 24 Dec '21
From: Xingang Wang <wangxingang5(a)huawei.com>
ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA
---------------------------------------------------
To support device tree boot for arm64 mpam, the dts driver may need to
call these functions after the init phase, when __init code has already
been freed. Remove the __init macro from the related functions
accordingly.
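For background, a minimal illustration (made-up names, not from this
patch) of why __init is unsafe for code a driver may call after boot:
__init functions live in .init.text, which the kernel frees once boot
completes.
#include <linux/init.h>
static int __init boot_only_helper(void)	/* placed in .init.text */
{
	return 0;
}
static int dts_probe_sketch(void)
{
	/* If this probe runs after free_initmem(), the call below jumps
	 * into memory that has already been released. */
	return boot_only_helper();
}
Modpost's section-mismatch checks warn about exactly this pattern at
build time, which is why the annotations are dropped rather than worked
around.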
Signed-off-by: Xingang Wang <wangxingang5(a)huawei.com>
Reviewed-by: Wang ShaoBo <bobo.shaobowang(a)huawei.com>
Acked-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/arm64/include/asm/resctrl.h | 2 +-
arch/arm64/kernel/mpam/mpam_device.c | 14 +++++++-------
arch/arm64/kernel/mpam/mpam_internal.h | 2 +-
arch/arm64/kernel/mpam/mpam_resctrl.c | 2 +-
fs/resctrlfs.c | 2 +-
include/linux/arm_mpam.h | 14 +++++++-------
6 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/arch/arm64/include/asm/resctrl.h b/arch/arm64/include/asm/resctrl.h
index ff4ce56430d1f..3fceaa482e5a3 100644
--- a/arch/arm64/include/asm/resctrl.h
+++ b/arch/arm64/include/asm/resctrl.h
@@ -424,7 +424,7 @@ int resctrl_update_groups_config(struct rdtgroup *rdtgrp);
#define RESCTRL_MAX_CLOSID 32
-int __init resctrl_group_init(void);
+int resctrl_group_init(void);
void post_resctrl_mount(void);
diff --git a/arch/arm64/kernel/mpam/mpam_device.c b/arch/arm64/kernel/mpam/mpam_device.c
index c0615f6947a1f..4139b4476a836 100644
--- a/arch/arm64/kernel/mpam/mpam_device.c
+++ b/arch/arm64/kernel/mpam/mpam_device.c
@@ -534,7 +534,7 @@ static void mpam_disable_irqs(void)
* Scheduled by mpam_discovery_complete() once all devices have been created.
* Also scheduled when new devices are probed when new CPUs come online.
*/
-static void __init mpam_enable(struct work_struct *work)
+static void mpam_enable(struct work_struct *work)
{
int err;
unsigned long flags;
@@ -761,7 +761,7 @@ static struct mpam_class * __init mpam_class_get(u8 level_idx,
* class/component structures may be allocated.
* Returns the new device, or an ERR_PTR().
*/
-struct mpam_device * __init
+struct mpam_device *
__mpam_device_create(u8 level_idx, enum mpam_class_types type,
int component_id, const struct cpumask *fw_affinity,
phys_addr_t hwpage_address)
@@ -810,7 +810,7 @@ __mpam_device_create(u8 level_idx, enum mpam_class_types type,
return dev;
}
-void __init mpam_device_set_error_irq(struct mpam_device *dev, u32 irq,
+void mpam_device_set_error_irq(struct mpam_device *dev, u32 irq,
u32 flags)
{
unsigned long irq_save_flags;
@@ -821,7 +821,7 @@ void __init mpam_device_set_error_irq(struct mpam_device *dev, u32 irq,
spin_unlock_irqrestore(&dev->lock, irq_save_flags);
}
-void __init mpam_device_set_overflow_irq(struct mpam_device *dev, u32 irq,
+void mpam_device_set_overflow_irq(struct mpam_device *dev, u32 irq,
u32 flags)
{
unsigned long irq_save_flags;
@@ -864,7 +864,7 @@ static inline u16 mpam_cpu_max_pmg(void)
/*
* prepare for initializing devices.
*/
-int __init mpam_discovery_start(void)
+int mpam_discovery_start(void)
{
if (!mpam_cpus_have_feature())
return -EOPNOTSUPP;
@@ -1104,7 +1104,7 @@ static int mpam_cpu_offline(unsigned int cpu)
return 0;
}
-int __init mpam_discovery_complete(void)
+int mpam_discovery_complete(void)
{
int ret = 0;
@@ -1121,7 +1121,7 @@ int __init mpam_discovery_complete(void)
return ret;
}
-void __init mpam_discovery_failed(void)
+void mpam_discovery_failed(void)
{
struct mpam_class *class, *tmp;
diff --git a/arch/arm64/kernel/mpam/mpam_internal.h b/arch/arm64/kernel/mpam/mpam_internal.h
index cfaef82428aa5..7b84ea54975aa 100644
--- a/arch/arm64/kernel/mpam/mpam_internal.h
+++ b/arch/arm64/kernel/mpam/mpam_internal.h
@@ -329,7 +329,7 @@ int mpam_resctrl_setup(void);
struct raw_resctrl_resource *
mpam_get_raw_resctrl_resource(u32 level);
-int __init mpam_resctrl_init(void);
+int mpam_resctrl_init(void);
int mpam_resctrl_set_default_cpu(unsigned int cpu);
void mpam_resctrl_clear_default_cpu(unsigned int cpu);
diff --git a/arch/arm64/kernel/mpam/mpam_resctrl.c b/arch/arm64/kernel/mpam/mpam_resctrl.c
index 51cdefebaeba8..1ad4cd49762a3 100644
--- a/arch/arm64/kernel/mpam/mpam_resctrl.c
+++ b/arch/arm64/kernel/mpam/mpam_resctrl.c
@@ -2209,7 +2209,7 @@ static int __init mpam_setup(char *str)
}
__setup("mpam", mpam_setup);
-int __init mpam_resctrl_init(void)
+int mpam_resctrl_init(void)
{
mpam_init_padding();
diff --git a/fs/resctrlfs.c b/fs/resctrlfs.c
index 11741c87eb0af..ea9df7d77b95a 100644
--- a/fs/resctrlfs.c
+++ b/fs/resctrlfs.c
@@ -1029,7 +1029,7 @@ static int __init resctrl_group_setup_root(void)
*
* Return: 0 on success or -errno
*/
-int __init resctrl_group_init(void)
+int resctrl_group_init(void)
{
int ret = 0;
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 44d0690ae8c4b..a242d4fdff8b3 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -16,7 +16,7 @@ enum mpam_class_types {
MPAM_CLASS_UNKNOWN, /* Everything else, e.g. TLBs etc */
};
-struct mpam_device * __init
+struct mpam_device *
__mpam_device_create(u8 level_idx, enum mpam_class_types type,
int component_id, const struct cpumask *fw_affinity,
phys_addr_t hwpage_address);
@@ -54,9 +54,9 @@ mpam_device_create_memory(int nid, phys_addr_t hwpage_address)
return __mpam_device_create(~0, MPAM_CLASS_MEMORY, nid,
&dev_affinity, hwpage_address);
}
-int __init mpam_discovery_start(void);
-int __init mpam_discovery_complete(void);
-void __init mpam_discovery_failed(void);
+int mpam_discovery_start(void);
+int mpam_discovery_complete(void);
+void mpam_discovery_failed(void);
enum mpam_enable_type {
MPAM_ENABLE_DENIED = 0,
@@ -71,12 +71,12 @@ extern enum mpam_enable_type mpam_enabled;
#define mpam_irq_flags_to_acpi(x) ((x & MPAM_IRQ_MODE_LEVEL) ? \
ACPI_LEVEL_SENSITIVE : ACPI_EDGE_SENSITIVE)
-void __init mpam_device_set_error_irq(struct mpam_device *dev, u32 irq,
+void mpam_device_set_error_irq(struct mpam_device *dev, u32 irq,
u32 flags);
-void __init mpam_device_set_overflow_irq(struct mpam_device *dev, u32 irq,
+void mpam_device_set_overflow_irq(struct mpam_device *dev, u32 irq,
u32 flags);
-static inline int __init mpam_register_device_irq(struct mpam_device *dev,
+static inline int mpam_register_device_irq(struct mpam_device *dev,
u32 overflow_interrupt, u32 overflow_flags,
u32 error_interrupt, u32 error_flags)
{
--
2.25.1
[openEuler-5.10 01/19] drm/hisilicon: Support i2c driver algorithms for bit-shift adapters
by Zheng Zengkai 23 Dec '21
From: Ma Hai <mahai1(a)huawei.com>
mainline inclusion
from mainline-v5.12-rc2
commit 4eb4d99dfe3018d86f4529112aa7082f43b6996a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4M6CD?from=project-issue
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/d…
----------------------------------------------------------------------
Add a driver implementation to support i2c driver algorithms for
bit-shift adapters, so hibmc will use the interface provided by
DRM to read the EDID.
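As a hedged illustration only (not part of this patch), a connector
.get_modes() hook can consume the new adapter through the core DRM EDID
helpers; hibmc_connector and to_hibmc_connector are the structures this
patch adds:

static int hibmc_connector_get_modes(struct drm_connector *connector)
{
	struct hibmc_connector *hibmc_connector = to_hibmc_connector(connector);
	struct edid *edid;
	int count = 0;

	/* bit-banged DDC transfer over the GPIO pins wired up below */
	edid = drm_get_edid(connector, &hibmc_connector->adapter);
	if (edid) {
		drm_connector_update_edid_property(connector, edid);
		count = drm_add_edid_modes(connector, edid);
		kfree(edid);
	}
	return count;
}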
Signed-off-by: Ma Hai <mahai1(a)huawei.com>
Signed-off-by: Tian Tao <tiantao6(a)hisilicon.com>
Reviewed-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Reviewed-by: Li Dongming <lidongming5(a)huawei.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1600778670-60370-2-git-send-e…
Acked-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
drivers/gpu/drm/hisilicon/hibmc/Makefile | 2 +-
.../gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h | 25 ++++-
.../gpu/drm/hisilicon/hibmc/hibmc_drm_i2c.c | 99 +++++++++++++++++++
.../gpu/drm/hisilicon/hibmc/hibmc_drm_vdac.c | 2 +-
4 files changed, 125 insertions(+), 3 deletions(-)
create mode 100644 drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_i2c.c
diff --git a/drivers/gpu/drm/hisilicon/hibmc/Makefile b/drivers/gpu/drm/hisilicon/hibmc/Makefile
index f99132715597..684ef794eb7c 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/Makefile
+++ b/drivers/gpu/drm/hisilicon/hibmc/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
-hibmc-drm-y := hibmc_drm_drv.o hibmc_drm_de.o hibmc_drm_vdac.o hibmc_ttm.o
+hibmc-drm-y := hibmc_drm_drv.o hibmc_drm_de.o hibmc_drm_vdac.o hibmc_ttm.o hibmc_drm_i2c.o
obj-$(CONFIG_DRM_HISI_HIBMC) += hibmc-drm.o
diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h
index 197485e2fe0b..87d2aad0bb5e 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h
@@ -14,11 +14,23 @@
#ifndef HIBMC_DRM_DRV_H
#define HIBMC_DRM_DRV_H
+#include <linux/gpio/consumer.h>
+#include <linux/i2c-algo-bit.h>
+#include <linux/i2c.h>
+
+#include <drm/drm_edid.h>
#include <drm/drm_fb_helper.h>
#include <drm/drm_framebuffer.h>
struct drm_device;
+struct hibmc_connector {
+ struct drm_connector base;
+
+ struct i2c_adapter adapter;
+ struct i2c_algo_bit_data bit_data;
+};
+
struct hibmc_drm_private {
/* hw */
void __iomem *mmio;
@@ -31,10 +43,20 @@ struct hibmc_drm_private {
struct drm_plane primary_plane;
struct drm_crtc crtc;
struct drm_encoder encoder;
- struct drm_connector connector;
+ struct hibmc_connector connector;
bool mode_config_initialized;
};
+static inline struct hibmc_connector *to_hibmc_connector(struct drm_connector *connector)
+{
+ return container_of(connector, struct hibmc_connector, base);
+}
+
+static inline struct hibmc_drm_private *to_hibmc_drm_private(struct drm_device *dev)
+{
+ return dev->dev_private;
+}
+
void hibmc_set_power_mode(struct hibmc_drm_private *priv,
unsigned int power_mode);
void hibmc_set_current_gate(struct hibmc_drm_private *priv,
@@ -47,6 +69,7 @@ int hibmc_mm_init(struct hibmc_drm_private *hibmc);
void hibmc_mm_fini(struct hibmc_drm_private *hibmc);
int hibmc_dumb_create(struct drm_file *file, struct drm_device *dev,
struct drm_mode_create_dumb *args);
+int hibmc_ddc_create(struct drm_device *drm_dev, struct hibmc_connector *connector);
extern const struct drm_mode_config_funcs hibmc_mode_funcs;
diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_i2c.c b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_i2c.c
new file mode 100644
index 000000000000..86d712090d87
--- /dev/null
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_i2c.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Hisilicon Hibmc SoC drm driver
+ *
+ * Based on the bochs drm driver.
+ *
+ * Copyright (c) 2016 Huawei Limited.
+ *
+ * Author:
+ * Tian Tao <tiantao6(a)hisilicon.com>
+ */
+
+#include <linux/delay.h>
+#include <linux/pci.h>
+
+#include <drm/drm_atomic_helper.h>
+#include <drm/drm_probe_helper.h>
+
+#include "hibmc_drm_drv.h"
+
+#define GPIO_DATA 0x0802A0
+#define GPIO_DATA_DIRECTION 0x0802A4
+
+#define I2C_SCL_MASK BIT(0)
+#define I2C_SDA_MASK BIT(1)
+
+static void hibmc_set_i2c_signal(void *data, u32 mask, int value)
+{
+ struct hibmc_connector *hibmc_connector = data;
+ struct hibmc_drm_private *priv = to_hibmc_drm_private(hibmc_connector->base.dev);
+ u32 tmp_dir = readl(priv->mmio + GPIO_DATA_DIRECTION);
+
+ if (value) {
+ tmp_dir &= ~mask;
+ writel(tmp_dir, priv->mmio + GPIO_DATA_DIRECTION);
+ } else {
+ u32 tmp_data = readl(priv->mmio + GPIO_DATA);
+
+ tmp_data &= ~mask;
+ writel(tmp_data, priv->mmio + GPIO_DATA);
+
+ tmp_dir |= mask;
+ writel(tmp_dir, priv->mmio + GPIO_DATA_DIRECTION);
+ }
+}
+
+static int hibmc_get_i2c_signal(void *data, u32 mask)
+{
+ struct hibmc_connector *hibmc_connector = data;
+ struct hibmc_drm_private *priv = to_hibmc_drm_private(hibmc_connector->base.dev);
+ u32 tmp_dir = readl(priv->mmio + GPIO_DATA_DIRECTION);
+
+ if ((tmp_dir & mask) != mask) {
+ tmp_dir &= ~mask;
+ writel(tmp_dir, priv->mmio + GPIO_DATA_DIRECTION);
+ }
+
+ return (readl(priv->mmio + GPIO_DATA) & mask) ? 1 : 0;
+}
+
+static void hibmc_ddc_setsda(void *data, int state)
+{
+ hibmc_set_i2c_signal(data, I2C_SDA_MASK, state);
+}
+
+static void hibmc_ddc_setscl(void *data, int state)
+{
+ hibmc_set_i2c_signal(data, I2C_SCL_MASK, state);
+}
+
+static int hibmc_ddc_getsda(void *data)
+{
+ return hibmc_get_i2c_signal(data, I2C_SDA_MASK);
+}
+
+static int hibmc_ddc_getscl(void *data)
+{
+ return hibmc_get_i2c_signal(data, I2C_SCL_MASK);
+}
+
+int hibmc_ddc_create(struct drm_device *drm_dev,
+ struct hibmc_connector *connector)
+{
+ connector->adapter.owner = THIS_MODULE;
+ connector->adapter.class = I2C_CLASS_DDC;
+ snprintf(connector->adapter.name, I2C_NAME_SIZE, "HIS i2c bit bus");
+ connector->adapter.dev.parent = &drm_dev->pdev->dev;
+ i2c_set_adapdata(&connector->adapter, connector);
+ connector->adapter.algo_data = &connector->bit_data;
+
+ connector->bit_data.udelay = 20;
+ connector->bit_data.timeout = usecs_to_jiffies(2000);
+ connector->bit_data.data = connector;
+ connector->bit_data.setsda = hibmc_ddc_setsda;
+ connector->bit_data.setscl = hibmc_ddc_setscl;
+ connector->bit_data.getsda = hibmc_ddc_getsda;
+ connector->bit_data.getscl = hibmc_ddc_getscl;
+
+ return i2c_bit_add_bus(&connector->adapter);
+}
diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_vdac.c b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_vdac.c
index 376a05ddbc2f..c8b14afbcbed 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_vdac.c
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_vdac.c
@@ -78,7 +78,7 @@ int hibmc_vdac_init(struct hibmc_drm_private *priv)
{
struct drm_device *dev = priv->dev;
struct drm_encoder *encoder = &priv->encoder;
- struct drm_connector *connector = &priv->connector;
+ struct drm_connector *connector = &priv->connector.base;
int ret;
encoder->possible_crtcs = 0x1;
--
2.20.1
[PATCH openEuler-1.0-LTS 1/3] arm64/mpam: Fix mpam corrupt when cpu online
by Yang Yingliang 23 Dec '21
From: Wang ShaoBo <bobo.shaobowang(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAI3
CVE: NA
-------------------------------------------------
The following error occurred occasionally on a machine that supports MPAM:
[ 13.321386][ T658] Unable to handle kernel paging request at virtual address ffff80001115816c
[ 13.326013][ T684] hid-generic 0003:12D1:0003.0002: input,hidraw1: USB HID v1.10 Mouse [Keyboard/Mouse KVM 1.1.0] on usb-0000:7a:01.0-1.1/input1
[ 13.340558][ T658] Mem abort info:
[ 13.340563][ T658] ESR = 0x86000007
[ 13.352567][ T5] hub 6-1:1.0: USB hub found
[ 13.364750][ T658] EC = 0x21: IABT (current EL), IL = 32 bits
[ 13.369891][ T5] hub 6-1:1.0: 4 ports detected
[ 13.373871][ T658] SET = 0, FnV = 0
[ 13.396107][ T658] EA = 0, S1PTW = 0
[ 13.400599][ T658] swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000029540000
[ 13.408726][ T658] [ffff80001115816c] pgd=0000205fffff0003, p4d=0000205fffff0003, pud=0000205fffff0003, pmd=0000205ffffe0003, pte=0000000000000000
[ 13.423346][ T658] Internal error: Oops: 86000007 [#1] SMP
[ 13.429720][ T658] Modules linked in:
[ 13.434243][ T658] CPU: 72 PID: 658 Comm: kworker/72:1 Not tainted 5.10.0-4.17.0.28.oe1.aarch64 #1
[ 13.443966][ T658] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 1.70 01/07/2021
[ 13.453683][ T658] Workqueue: events mpam_enable
[ 13.459206][ T658] pstate: 20c00009 (nzCv daif +PAN +UAO -TCO BTYPE=--)
[ 13.466625][ T658] pc : mpam_enable+0x194/0x1d8
[ 13.472019][ T658] lr : mpam_enable+0x194/0x1d8
[ 13.477301][ T658] sp : ffff80004664fd70
[ 13.481937][ T658] x29: ffff80004664fd70 x28: 0000000000000000
[ 13.488578][ T658] x27: ffff00400484a648 x26: ffff800011b71080
[ 13.495306][ T658] x25: 0000000000000000 x24: ffff800011b6cda0
[ 13.502001][ T658] x23: ffff800011646f18 x22: ffff800011b6cd80
[ 13.508684][ T658] x21: ffff800011b6c000 x20: ffff800011646f08
[ 13.515425][ T658] x19: ffff800011646f70 x18: 0000000000000020
[ 13.522075][ T658] x17: 000000001790b332 x16: 0000000000000001
[ 13.528785][ T658] x15: ffffffffffffffff x14: ff00000000000000
[ 13.535464][ T658] x13: ffffffffffffffff x12: 0000000000000006
[ 13.542045][ T658] x11: 00000091cea718e2 x10: 0000000000000b90
[ 13.548735][ T658] x9 : ffff80001009ebac x8 : ffff2040061aabf0
[ 13.555383][ T658] x7 : ffffa05f8dca0000 x6 : 000000000000000f
[ 13.561924][ T658] x5 : 0000000000000000 x4 : ffff2040061aa000
[ 13.568613][ T658] x3 : ffff80001164dfa0 x2 : 00000000ffffffff
[ 13.575267][ T658] x1 : ffffa05f8dca0000 x0 : 00000000000000c1
[ 13.581813][ T658] Call trace:
[ 13.585600][ T658] mpam_enable+0x194/0x1d8
[ 13.590450][ T658] process_one_work+0x1cc/0x390
[ 13.595654][ T658] worker_thread+0x70/0x2f0
[ 13.600499][ T658] kthread+0x118/0x120
[ 13.604935][ T658] ret_from_fork+0x10/0x18
[ 13.609717][ T658] Code: bad PC value
[ 13.613944][ T658] ---[ end trace f1e305d2c339f67f ]---
[ 13.753818][ T658] Kernel panic - not syncing: Oops: Fatal exception
[ 13.760885][ T658] SMP: stopping secondary CPUs
[ 13.765933][ T658] Kernel Offset: disabled
[ 13.770516][ T658] CPU features: 0x8040002,22208a38
[ 13.775862][ T658] Memory Limit: none
[ 13.913929][ T658] ---[ end Kernel panic - not syncing:
The MPAM device initialization process is as follows:
mpam_discovery_start()
... // discover devices
mpam_discovery_complete() // hang up the mpam_online/offline_cpu callbacks
-=> mpam_cpu_online() // probe all devices
-=> mpam_enable() // prepare for resctrl
(1) -=> cpuhp_remove_state() // clean resctrl internal structure
(2) -=> cpuhp_setup_state() // rehang mpam_online/offline_cpu callbacks
-=> mpam_cpu_online() // it does not call mpam_enable again
-=> mpam_resctrl_cpu_online() // pull up resctrl
The re-hang of the mpam_cpu_online/offline callbacks should not be
disturbed by irqs, to ensure that the CPU context is reliable before
mpam_cpu_online() is re-entered; the vulnerable window always lies
between (1) and (2).
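Schematic timeline of the window and the fix (a sketch, not literal
driver code):

	mpam_enable()
	    cpuhp_remove_state(...);    /* (1) callbacks removed */
	    /* an irq landing here leaves the CPU context unreliable
	     * for the re-hang below
	     */
	    local_irq_disable();        /* fix: close the window */
	    cpuhp_setup_state(...);     /* (2) callbacks re-registered */
	    local_irq_enable();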
Fixes: 2ab89c893faf ("arm64/mpam: resctrl: Re-synchronise resctrl's view of online CPUs")
Signed-off-by: Wang ShaoBo <bobo.shaobowang(a)huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/arm64/kernel/mpam/mpam_device.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/kernel/mpam/mpam_device.c b/arch/arm64/kernel/mpam/mpam_device.c
index f8840274b902f..c0615f6947a1f 100644
--- a/arch/arm64/kernel/mpam/mpam_device.c
+++ b/arch/arm64/kernel/mpam/mpam_device.c
@@ -593,9 +593,11 @@ static void __init mpam_enable(struct work_struct *work)
pr_err("Failed to setup/init resctrl\n");
mutex_unlock(&mpam_devices_lock);
+ local_irq_disable();
mpam_cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
"mpam:online", mpam_cpu_online,
mpam_cpu_offline);
+ local_irq_enable();
if (mpam_cpuhp_state <= 0)
pr_err("Failed to re-register 'dyn' cpuhp callbacks");
mutex_unlock(&mpam_cpuhp_lock);
--
2.25.1
[PATCH openEuler-1.0-LTS 1/2] kprobes: Set unoptimized flag after unoptimizing code
by Yang Yingliang 22 Dec '21
From: Masami Hiramatsu <mhiramat(a)kernel.org>
stable inclusion
from stable-v4.19.108
commit 39af044d1ccb207da43af5c060bb0fb0dd548b3e
category: bugfix
bugzilla: NA
CVE: NA
-------------------------------------------------
commit f66c0447cca1281116224d474cdb37d6a18e4b5b upstream.
Set the unoptimized flag after confirming the code is completely
unoptimized. Without this fix, when a kprobe hits the intermediate
modified instruction (the first byte is replaced by an INT3, but
later bytes can still be a jump address operand) while unoptimizing,
it can return to the middle byte of the modified code, which causes
an invalid instruction exception in the kernel.
Usually, this is a rare case, but if we put a probe on the function
call while text patching, it always causes a kernel panic as below:
# echo p text_poke+5 > kprobe_events
# echo 1 > events/kprobes/enable
# echo 0 > events/kprobes/enable
invalid opcode: 0000 [#1] PREEMPT SMP PTI
RIP: 0010:text_poke+0x9/0x50
Call Trace:
arch_unoptimize_kprobe+0x22/0x28
arch_unoptimize_kprobes+0x39/0x87
kprobe_optimizer+0x6e/0x290
process_one_work+0x2a0/0x610
worker_thread+0x28/0x3d0
? process_one_work+0x610/0x610
kthread+0x10d/0x130
? kthread_park+0x80/0x80
ret_from_fork+0x3a/0x50
text_poke() is used for patching the code in optprobes.
This can happen even if we blacklist text_poke() and other functions,
because there is a small time window during which we show the intermediate
code to other CPUs.
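Schematically, the corrected ordering (a sketch mirroring the diff
below) restores the original instruction first and only then clears the
flag:

	arch_unoptimize_kprobe(op);             /* original text fully restored */
	op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED; /* only now report it unoptimized */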
[ mingo: Edited the changelog. ]
Tested-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: bristot(a)redhat.com
Fixes: 6274de4984a6 ("kprobes: Support delayed unoptimizing")
Link: https://lkml.kernel.org/r/157483422375.25881.13508326028469515760.stgit@dev…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yang Jihong <yangjihong1(a)huawei.com>
Reviewed-by: Kuohai Xu <xukuohai(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
kernel/kprobes.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index f6e6edaf964c1..666243fff573d 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -523,6 +523,8 @@ static void do_unoptimize_kprobes(void)
arch_unoptimize_kprobes(&unoptimizing_list, &freeing_list);
/* Loop free_list for disarming */
list_for_each_entry_safe(op, tmp, &freeing_list, list) {
+ /* Switching from detour code to origin */
+ op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
/* Disarm probes if marked disabled */
if (kprobe_disabled(&op->kp))
arch_disarm_kprobe(&op->kp);
@@ -662,6 +664,7 @@ static void force_unoptimize_kprobe(struct optimized_kprobe *op)
{
lockdep_assert_cpus_held();
arch_unoptimize_kprobe(op);
+ op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
if (kprobe_disabled(&op->kp))
arch_disarm_kprobe(&op->kp);
}
@@ -689,7 +692,6 @@ static void unoptimize_kprobe(struct kprobe *p, bool force)
return;
}
- op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
if (!list_empty(&op->list)) {
/* Dequeue from the optimization queue */
list_del_init(&op->list);
--
2.25.1
[PATCH openEuler-1.0-LTS] cpufreq: schedutil: Destroy mutex before kobject_put() frees the memory
by Yang Yingliang 22 Dec '21
From: James Morse <james.morse(a)arm.com>
stable inclusion
from linux-4.19.209
commit 3cb9595e23d879824ebcefeabdf5156bc288000c
--------------------------------
[ Upstream commit cdef1196608892b9a46caa5f2b64095a7f0be60c ]
Since commit e5c6b312ce3c ("cpufreq: schedutil: Use kobject release()
method to free sugov_tunables") kobject_put() has kfree()d the
attr_set before gov_attr_set_put() returns.
kobject_put() isn't the last user of attr_set in gov_attr_set_put();
the subsequent mutex_destroy() triggers a use-after-free:
| BUG: KASAN: use-after-free in mutex_is_locked+0x20/0x60
| Read of size 8 at addr ffff000800ca4250 by task cpuhp/2/20
|
| CPU: 2 PID: 20 Comm: cpuhp/2 Not tainted 5.15.0-rc1 #12369
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development
| Platform, BIOS EDK II Jul 30 2018
| Call trace:
| dump_backtrace+0x0/0x380
| show_stack+0x1c/0x30
| dump_stack_lvl+0x8c/0xb8
| print_address_description.constprop.0+0x74/0x2b8
| kasan_report+0x1f4/0x210
| kasan_check_range+0xfc/0x1a4
| __kasan_check_read+0x38/0x60
| mutex_is_locked+0x20/0x60
| mutex_destroy+0x80/0x100
| gov_attr_set_put+0xfc/0x150
| sugov_exit+0x78/0x190
| cpufreq_offline.isra.0+0x2c0/0x660
| cpuhp_cpufreq_offline+0x14/0x24
| cpuhp_invoke_callback+0x430/0x6d0
| cpuhp_thread_fun+0x1b0/0x624
| smpboot_thread_fn+0x5e0/0xa6c
| kthread+0x3a0/0x450
| ret_from_fork+0x10/0x20
Swap the order of the calls.
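A minimal sketch of the corrected teardown (matching the hunk below):
destroy the mutex while attr_set is still alive, then drop the
reference, which may kfree() attr_set via the kobject release() method.

	mutex_destroy(&attr_set->update_lock);  /* attr_set still valid here */
	kobject_put(&attr_set->kobj);           /* may free attr_set */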
Fixes: e5c6b312ce3c ("cpufreq: schedutil: Use kobject release() method to free sugov_tunables")
Cc: 4.7+ <stable(a)vger.kernel.org> # 4.7+
Signed-off-by: James Morse <james.morse(a)arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/cpufreq/cpufreq_governor_attr_set.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq_governor_attr_set.c b/drivers/cpufreq/cpufreq_governor_attr_set.c
index 52841f807a7eb..45fdf30cade39 100644
--- a/drivers/cpufreq/cpufreq_governor_attr_set.c
+++ b/drivers/cpufreq/cpufreq_governor_attr_set.c
@@ -77,8 +77,8 @@ unsigned int gov_attr_set_put(struct gov_attr_set *attr_set, struct list_head *l
if (count)
return count;
- kobject_put(&attr_set->kobj);
mutex_destroy(&attr_set->update_lock);
+ kobject_put(&attr_set->kobj);
return 0;
}
EXPORT_SYMBOL_GPL(gov_attr_set_put);
--
2.25.1
[PATCH openEuler-1.0-LTS v3] scsi:spraid: support Ramaxel's spraid driver
by Yanling Song 22 Dec '21
Ramaxel inclusion
category: features
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NJIB
CVE: NA
v2->v3:
1. Add defconfig for hulk in arm64 and x86
v1->v2:
1. Add defconfig for arm64 and x86
Support Ramaxel's SPRxxx Raid controller
Signed-off-by: Yanling Song <songyl(a)ramaxel.com>
Reviewed-by: Jiang Yu<yujiang(a)ramaxel.com>
---
arch/arm64/configs/hulk_defconfig | 1 +
arch/arm64/configs/openeuler_defconfig | 1 +
arch/x86/configs/hulk_defconfig | 3 +-
arch/x86/configs/openeuler_defconfig | 1 +
drivers/scsi/Kconfig | 1 +
drivers/scsi/Makefile | 1 +
drivers/scsi/spraid/Kconfig | 13 +
drivers/scsi/spraid/Makefile | 7 +
drivers/scsi/spraid/spraid.h | 746 +++++
drivers/scsi/spraid/spraid_main.c | 3875 ++++++++++++++++++++++++
10 files changed, 4648 insertions(+), 1 deletion(-)
create mode 100644 drivers/scsi/spraid/Kconfig
create mode 100644 drivers/scsi/spraid/Makefile
create mode 100644 drivers/scsi/spraid/spraid.h
create mode 100644 drivers/scsi/spraid/spraid_main.c
diff --git a/arch/arm64/configs/hulk_defconfig b/arch/arm64/configs/hulk_defconfig
index bc350c41b333..71517c091c43 100644
--- a/arch/arm64/configs/hulk_defconfig
+++ b/arch/arm64/configs/hulk_defconfig
@@ -2149,6 +2149,7 @@ CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT3SAS_MAX_SGE=128
CONFIG_SCSI_MPT2SAS=m
CONFIG_SCSI_SMARTPQI=m
+CONFIG_RAMAXEL_SPRAID=m
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
CONFIG_LIBFC=m
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index d2167c80757f..47b0931b91bf 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -2143,6 +2143,7 @@ CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT3SAS_MAX_SGE=128
CONFIG_SCSI_MPT2SAS=m
CONFIG_SCSI_SMARTPQI=m
+CONFIG_RAMAXEL_SPRAID=m
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
CONFIG_LIBFC=m
diff --git a/arch/x86/configs/hulk_defconfig b/arch/x86/configs/hulk_defconfig
index 983d52ccad82..536ffb38c00c 100644
--- a/arch/x86/configs/hulk_defconfig
+++ b/arch/x86/configs/hulk_defconfig
@@ -1,4 +1,4 @@
-#
+
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.19.21 Kernel Configuration
#
@@ -2177,6 +2177,7 @@ CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT3SAS_MAX_SGE=128
CONFIG_SCSI_MPT2SAS=m
CONFIG_SCSI_SMARTPQI=m
+CONFIG_RAMAXEL_SPRAID=m
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
index 7043426d9780..786186288fea 100644
--- a/arch/x86/configs/openeuler_defconfig
+++ b/arch/x86/configs/openeuler_defconfig
@@ -2182,6 +2182,7 @@ CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT3SAS_MAX_SGE=128
CONFIG_SCSI_MPT2SAS=m
CONFIG_SCSI_SMARTPQI=m
+CONFIG_RAMAXEL_SPRAID=m
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 8450484184e3..63d2aaa22834 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -522,6 +522,7 @@ source "drivers/scsi/megaraid/Kconfig.megaraid"
source "drivers/scsi/mpt3sas/Kconfig"
source "drivers/scsi/smartpqi/Kconfig"
source "drivers/scsi/ufs/Kconfig"
+source "drivers/scsi/spraid/Kconfig"
config SCSI_HPTIOP
tristate "HighPoint RocketRAID 3xxx/4xxx Controller support"
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 2973693f6dcc..4056cf26e09e 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -94,6 +94,7 @@ obj-$(CONFIG_SCSI_ZALON) += zalon7xx.o
obj-$(CONFIG_SCSI_DC395x) += dc395x.o
obj-$(CONFIG_SCSI_AM53C974) += esp_scsi.o am53c974.o
obj-$(CONFIG_CXLFLASH) += cxlflash/
+obj-$(CONFIG_RAMAXEL_SPRAID) += spraid/
obj-$(CONFIG_MEGARAID_LEGACY) += megaraid.o
obj-$(CONFIG_MEGARAID_NEWGEN) += megaraid/
obj-$(CONFIG_MEGARAID_SAS) += megaraid/
diff --git a/drivers/scsi/spraid/Kconfig b/drivers/scsi/spraid/Kconfig
new file mode 100644
index 000000000000..bfbba3db8db0
--- /dev/null
+++ b/drivers/scsi/spraid/Kconfig
@@ -0,0 +1,13 @@
+#
+# Ramaxel driver configuration
+#
+
+config RAMAXEL_SPRAID
+ tristate "Ramaxel spraid Adapter"
+ depends on PCI && SCSI
+ select BLK_DEV_BSGLIB
+ depends on ARM64 || X86_64
+ help
+ This driver supports the Ramaxel SPRxxx series
+ RAID controllers, which have a PCIe Gen4 host
+ interface and support SAS/SATA HDDs/SSDs.
diff --git a/drivers/scsi/spraid/Makefile b/drivers/scsi/spraid/Makefile
new file mode 100644
index 000000000000..aadc2ffd37eb
--- /dev/null
+++ b/drivers/scsi/spraid/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the Ramaxel device drivers.
+#
+
+obj-$(CONFIG_RAMAXEL_SPRAID) += spraid.o
+
+spraid-objs := spraid_main.o
\ No newline at end of file
diff --git a/drivers/scsi/spraid/spraid.h b/drivers/scsi/spraid/spraid.h
new file mode 100644
index 000000000000..c1e4980e18e5
--- /dev/null
+++ b/drivers/scsi/spraid/spraid.h
@@ -0,0 +1,746 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
+
+#ifndef __SPRAID_H_
+#define __SPRAID_H_
+
+#define SPRAID_CAP_MQES(cap) ((cap) & 0xffff)
+#define SPRAID_CAP_STRIDE(cap) (((cap) >> 32) & 0xf)
+#define SPRAID_CAP_MPSMIN(cap) (((cap) >> 48) & 0xf)
+#define SPRAID_CAP_MPSMAX(cap) (((cap) >> 52) & 0xf)
+#define SPRAID_CAP_TIMEOUT(cap) (((cap) >> 24) & 0xff)
+#define SPRAID_CAP_DMAMASK(cap) (((cap) >> 37) & 0xff)
+
+#define SPRAID_DEFAULT_MAX_CHANNEL 4
+#define SPRAID_DEFAULT_MAX_ID 240
+#define SPRAID_DEFAULT_MAX_LUN_PER_HOST 8
+#define MAX_SECTORS 2048
+
+#define IO_SQE_SIZE sizeof(struct spraid_ioq_command)
+#define ADMIN_SQE_SIZE sizeof(struct spraid_admin_command)
+#define SQE_SIZE(qid) (((qid) > 0) ? IO_SQE_SIZE : ADMIN_SQE_SIZE)
+#define CQ_SIZE(depth) ((depth) * sizeof(struct spraid_completion))
+#define SQ_SIZE(qid, depth) ((depth) * SQE_SIZE(qid))
+
+#define SENSE_SIZE(depth) ((depth) * SCSI_SENSE_BUFFERSIZE)
+
+#define SPRAID_AQ_DEPTH 128
+#define SPRAID_NR_AEN_COMMANDS 16
+#define SPRAID_AQ_BLK_MQ_DEPTH (SPRAID_AQ_DEPTH - SPRAID_NR_AEN_COMMANDS)
+#define SPRAID_AQ_MQ_TAG_DEPTH (SPRAID_AQ_BLK_MQ_DEPTH - 1)
+
+#define SPRAID_ADMIN_QUEUE_NUM 1
+#define SPRAID_PTCMDS_PERQ 1
+#define SPRAID_IO_BLK_MQ_DEPTH (hdev->shost->can_queue)
+#define SPRAID_NR_IOQ_PTCMDS (SPRAID_PTCMDS_PERQ * hdev->shost->nr_hw_queues)
+
+#define FUA_MASK 0x08
+#define SPRAID_MINORS BIT(MINORBITS)
+
+#define COMMAND_IS_WRITE(cmd) ((cmd)->common.opcode & 1)
+
+#define SPRAID_IO_IOSQES 7
+#define SPRAID_IO_IOCQES 4
+#define PRP_ENTRY_SIZE 8
+
+#define SMALL_POOL_SIZE 256
+#define MAX_SMALL_POOL_NUM 16
+#define MAX_CMD_PER_DEV 64
+#define MAX_CDB_LEN 32
+
+#define SPRAID_UP_TO_MULTY4(x) (((x) + 4) & (~0x03))
+
+#define CQE_STATUS_SUCCESS (0x0)
+
+#define PCI_VENDOR_ID_RAMAXEL_LOGIC 0x1E81
+
+#define SPRAID_SERVER_DEVICE_HBA_DID 0x2100
+#define SPRAID_SERVER_DEVICE_RAID_DID 0x2200
+
+#define IO_6_DEFAULT_TX_LEN 256
+
+#define SPRAID_INT_PAGES 2
+#define SPRAID_INT_BYTES(hdev) (SPRAID_INT_PAGES * (hdev)->page_size)
+
+enum {
+ SPRAID_REQ_CANCELLED = (1 << 0),
+ SPRAID_REQ_USERCMD = (1 << 1),
+};
+
+enum {
+ SPRAID_SC_SUCCESS = 0x0,
+ SPRAID_SC_INVALID_OPCODE = 0x1,
+ SPRAID_SC_INVALID_FIELD = 0x2,
+
+ SPRAID_SC_ABORT_LIMIT = 0x103,
+ SPRAID_SC_ABORT_MISSING = 0x104,
+ SPRAID_SC_ASYNC_LIMIT = 0x105,
+
+ SPRAID_SC_DNR = 0x4000,
+};
+
+enum {
+ SPRAID_REG_CAP = 0x0000,
+ SPRAID_REG_CC = 0x0014,
+ SPRAID_REG_CSTS = 0x001c,
+ SPRAID_REG_AQA = 0x0024,
+ SPRAID_REG_ASQ = 0x0028,
+ SPRAID_REG_ACQ = 0x0030,
+ SPRAID_REG_DBS = 0x1000,
+};
+
+enum {
+ SPRAID_CC_ENABLE = 1 << 0,
+ SPRAID_CC_CSS_NVM = 0 << 4,
+ SPRAID_CC_MPS_SHIFT = 7,
+ SPRAID_CC_AMS_SHIFT = 11,
+ SPRAID_CC_SHN_SHIFT = 14,
+ SPRAID_CC_IOSQES_SHIFT = 16,
+ SPRAID_CC_IOCQES_SHIFT = 20,
+ SPRAID_CC_AMS_RR = 0 << SPRAID_CC_AMS_SHIFT,
+ SPRAID_CC_SHN_NONE = 0 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CC_IOSQES = SPRAID_IO_IOSQES << SPRAID_CC_IOSQES_SHIFT,
+ SPRAID_CC_IOCQES = SPRAID_IO_IOCQES << SPRAID_CC_IOCQES_SHIFT,
+ SPRAID_CC_SHN_NORMAL = 1 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CC_SHN_MASK = 3 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CSTS_CFS_SHIFT = 1,
+ SPRAID_CSTS_SHST_SHIFT = 2,
+ SPRAID_CSTS_PP_SHIFT = 5,
+ SPRAID_CSTS_RDY = 1 << 0,
+ SPRAID_CSTS_SHST_CMPLT = 2 << 2,
+ SPRAID_CSTS_SHST_MASK = 3 << 2,
+ SPRAID_CSTS_CFS_MASK = 1 << SPRAID_CSTS_CFS_SHIFT,
+ SPRAID_CSTS_PP_MASK = 1 << SPRAID_CSTS_PP_SHIFT,
+};
+
+enum {
+ SPRAID_ADMIN_DELETE_SQ = 0x00,
+ SPRAID_ADMIN_CREATE_SQ = 0x01,
+ SPRAID_ADMIN_DELETE_CQ = 0x04,
+ SPRAID_ADMIN_CREATE_CQ = 0x05,
+ SPRAID_ADMIN_ABORT_CMD = 0x08,
+ SPRAID_ADMIN_SET_FEATURES = 0x09,
+ SPRAID_ADMIN_ASYNC_EVENT = 0x0c,
+ SPRAID_ADMIN_GET_INFO = 0xc6,
+ SPRAID_ADMIN_RESET = 0xc8,
+};
+
+enum {
+ SPRAID_GET_INFO_CTRL = 0,
+ SPRAID_GET_INFO_DEV_LIST = 1,
+};
+
+enum {
+ SPRAID_RESET_TARGET = 0,
+ SPRAID_RESET_BUS = 1,
+};
+
+enum {
+ SPRAID_AEN_ERROR = 0,
+ SPRAID_AEN_NOTICE = 2,
+ SPRAID_AEN_VS = 7,
+};
+
+enum {
+ SPRAID_AEN_DEV_CHANGED = 0x00,
+ SPRAID_AEN_FW_ACT_START = 0x01,
+ SPRAID_AEN_HOST_PROBING = 0x10,
+};
+
+enum {
+ SPRAID_AEN_TIMESYN = 0x00,
+ SPRAID_AEN_FW_ACT_FINISH = 0x02,
+ SPRAID_AEN_EVENT_MIN = 0x80,
+ SPRAID_AEN_EVENT_MAX = 0xff,
+};
+
+enum {
+ SPRAID_CMD_WRITE = 0x01,
+ SPRAID_CMD_READ = 0x02,
+
+ SPRAID_CMD_NONIO_NONE = 0x80,
+ SPRAID_CMD_NONIO_TODEV = 0x81,
+ SPRAID_CMD_NONIO_FROMDEV = 0x82,
+};
+
+enum {
+ SPRAID_QUEUE_PHYS_CONTIG = (1 << 0),
+ SPRAID_CQ_IRQ_ENABLED = (1 << 1),
+
+ SPRAID_FEAT_NUM_QUEUES = 0x07,
+ SPRAID_FEAT_ASYNC_EVENT = 0x0b,
+ SPRAID_FEAT_TIMESTAMP = 0x0e,
+};
+
+enum spraid_state {
+ SPRAID_NEW,
+ SPRAID_LIVE,
+ SPRAID_RESETTING,
+ SPRAID_DELETING,
+ SPRAID_DEAD,
+};
+
+enum {
+ SPRAID_CARD_HBA,
+ SPRAID_CARD_RAID,
+};
+
+enum spraid_cmd_type {
+ SPRAID_CMD_ADM,
+ SPRAID_CMD_IOPT,
+};
+
+struct spraid_completion {
+ __le32 result;
+ union {
+ struct {
+ __u8 sense_len;
+ __u8 resv[3];
+ };
+ __le32 result1;
+ };
+ __le16 sq_head;
+ __le16 sq_id;
+ __u16 cmd_id;
+ __le16 status;
+};
+
+struct spraid_ctrl_info {
+ __le32 nd;
+ __le16 max_cmds;
+ __le16 max_channel;
+ __le32 max_tgt_id;
+ __le16 max_lun;
+ __le16 max_num_sge;
+ __le16 lun_num_in_boot;
+ __u8 mdts;
+ __u8 acl;
+ __u8 aerl;
+ __u8 card_type;
+ __u16 rsvd;
+ __u32 rtd3e;
+ __u8 sn[32];
+ __u8 fr[16];
+ __u8 rsvd1[4020];
+};
+
+struct spraid_dev {
+ struct pci_dev *pdev;
+ struct device *dev;
+ struct Scsi_Host *shost;
+ struct spraid_queue *queues;
+ struct dma_pool *prp_page_pool;
+ struct dma_pool *prp_small_pool[MAX_SMALL_POOL_NUM];
+ mempool_t *iod_mempool;
+ void __iomem *bar;
+ u32 max_qid;
+ u32 num_vecs;
+ u32 queue_count;
+ u32 ioq_depth;
+ int db_stride;
+ u32 __iomem *dbs;
+ struct rw_semaphore devices_rwsem;
+ int numa_node;
+ u32 page_size;
+ u32 ctrl_config;
+ u32 online_queues;
+ u64 cap;
+ int instance;
+ struct spraid_ctrl_info *ctrl_info;
+ struct spraid_dev_info *devices;
+
+ struct spraid_cmd *adm_cmds;
+ struct list_head adm_cmd_list;
+ spinlock_t adm_cmd_lock;
+
+ struct spraid_cmd *ioq_ptcmds;
+ struct list_head ioq_pt_list;
+ spinlock_t ioq_pt_lock;
+
+ struct work_struct scan_work;
+ struct work_struct timesyn_work;
+ struct work_struct reset_work;
+ struct work_struct fw_act_work;
+
+ enum spraid_state state;
+ spinlock_t state_lock;
+
+ struct request_queue *bsg_queue;
+};
+
+struct spraid_sgl_desc {
+ __le64 addr;
+ __le32 length;
+ __u8 rsvd[3];
+ __u8 type;
+};
+
+union spraid_data_ptr {
+ struct {
+ __le64 prp1;
+ __le64 prp2;
+ };
+ struct spraid_sgl_desc sgl;
+};
+
+struct spraid_admin_common_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le32 cdw2[4];
+ union spraid_data_ptr dptr;
+ __le32 cdw10;
+ __le32 cdw11;
+ __le32 cdw12;
+ __le32 cdw13;
+ __le32 cdw14;
+ __le32 cdw15;
+};
+
+struct spraid_features {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[2];
+ union spraid_data_ptr dptr;
+ __le32 fid;
+ __le32 dword11;
+ __le32 dword12;
+ __le32 dword13;
+ __le32 dword14;
+ __le32 dword15;
+};
+
+struct spraid_create_cq {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[5];
+ __le64 prp1;
+ __u64 rsvd8;
+ __le16 cqid;
+ __le16 qsize;
+ __le16 cq_flags;
+ __le16 irq_vector;
+ __u32 rsvd12[4];
+};
+
+struct spraid_create_sq {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[5];
+ __le64 prp1;
+ __u64 rsvd8;
+ __le16 sqid;
+ __le16 qsize;
+ __le16 sq_flags;
+ __le16 cqid;
+ __u32 rsvd12[4];
+};
+
+struct spraid_delete_queue {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[9];
+ __le16 qid;
+ __u16 rsvd10;
+ __u32 rsvd11[5];
+};
+
+struct spraid_get_info {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u32 rsvd2[4];
+ union spraid_data_ptr dptr;
+ __u8 type;
+ __u8 rsvd10[3];
+ __le32 cdw11;
+ __u32 rsvd12[4];
+};
+
+struct spraid_usr_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ union {
+ struct {
+ __le16 subopcode;
+ __le16 rsvd1;
+ } info_0;
+ __le32 cdw2;
+ };
+ union {
+ struct {
+ __le16 data_len;
+ __le16 param_len;
+ } info_1;
+ __le32 cdw3;
+ };
+ __u64 metadata;
+ union spraid_data_ptr dptr;
+ __le32 cdw10;
+ __le32 cdw11;
+ __le32 cdw12;
+ __le32 cdw13;
+ __le32 cdw14;
+ __le32 cdw15;
+};
+
+enum {
+ SPRAID_CMD_FLAG_SGL_METABUF = (1 << 6),
+ SPRAID_CMD_FLAG_SGL_METASEG = (1 << 7),
+ SPRAID_CMD_FLAG_SGL_ALL = SPRAID_CMD_FLAG_SGL_METABUF |
+ SPRAID_CMD_FLAG_SGL_METASEG,
+};
+
+enum spraid_cmd_state {
+ SPRAID_CMD_IDLE = 0,
+ SPRAID_CMD_IN_FLIGHT = 1,
+ SPRAID_CMD_COMPLETE = 2,
+ SPRAID_CMD_TIMEOUT = 3,
+ SPRAID_CMD_TMO_COMPLETE = 4,
+};
+
+struct spraid_abort_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[4];
+ __le16 sqid;
+ __le16 cid;
+ __u32 rsvd11[5];
+};
+
+struct spraid_reset_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[4];
+ __u8 type;
+ __u8 rsvd10[3];
+ __u32 rsvd11[5];
+};
+
+struct spraid_admin_command {
+ union {
+ struct spraid_admin_common_command common;
+ struct spraid_features features;
+ struct spraid_create_cq create_cq;
+ struct spraid_create_sq create_sq;
+ struct spraid_delete_queue delete_queue;
+ struct spraid_get_info get_info;
+ struct spraid_abort_cmd abort;
+ struct spraid_reset_cmd reset;
+ struct spraid_usr_cmd usr_cmd;
+ };
+};
+
+struct spraid_ioq_common_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_len;
+ __u8 rsvd2;
+ __le32 cdw3[3];
+ union spraid_data_ptr dptr;
+ __le32 cdw10[6];
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __le32 cdw26[6];
+};
+
+struct spraid_rw_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_len;
+ __u8 rsvd2;
+ __u32 rsvd3[3];
+ union spraid_data_ptr dptr;
+ __le64 slba;
+ __le16 nlb;
+ __le16 control;
+ __u32 rsvd13[3];
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __u32 rsvd26[6];
+};
+
+struct spraid_scsi_nonio {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_length;
+ __u8 rsvd2;
+ __u32 rsvd3[3];
+ union spraid_data_ptr dptr;
+ __u32 rsvd10[5];
+ __le32 buffer_len;
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __u32 rsvd26[6];
+};
+
+struct spraid_ioq_command {
+ union {
+ struct spraid_ioq_common_command common;
+ struct spraid_rw_command rw;
+ struct spraid_scsi_nonio scsi_nonio;
+ };
+};
+
+struct spraid_passthru_common_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 rsvd0;
+ __u32 nsid;
+ union {
+ struct {
+ __u16 subopcode;
+ __u16 rsvd1;
+ } info_0;
+ __u32 cdw2;
+ };
+ union {
+ struct {
+ __u16 data_len;
+ __u16 param_len;
+ } info_1;
+ __u32 cdw3;
+ };
+ __u64 metadata;
+
+ __u64 addr;
+ __u64 prp2;
+
+ __u32 cdw10;
+ __u32 cdw11;
+ __u32 cdw12;
+ __u32 cdw13;
+ __u32 cdw14;
+ __u32 cdw15;
+ __u32 timeout_ms;
+ __u32 result0;
+ __u32 result1;
+};
+
+struct spraid_ioq_passthru_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 rsvd0;
+ __u32 nsid;
+ union {
+ struct {
+ __u16 res_sense_len;
+ __u8 cdb_len;
+ __u8 rsvd0;
+ } info_0;
+ __u32 cdw2;
+ };
+ union {
+ struct {
+ __u16 subopcode;
+ __u16 rsvd1;
+ } info_1;
+ __u32 cdw3;
+ };
+ union {
+ struct {
+ __u16 rsvd;
+ __u16 param_len;
+ } info_2;
+ __u32 cdw4;
+ };
+ __u32 cdw5;
+ __u64 addr;
+ __u64 prp2;
+ union {
+ struct {
+ __u16 eid;
+ __u16 sid;
+ } info_3;
+ __u32 cdw10;
+ };
+ union {
+ struct {
+ __u16 did;
+ __u8 did_flag;
+ __u8 rsvd2;
+ } info_4;
+ __u32 cdw11;
+ };
+ __u32 cdw12;
+ __u32 cdw13;
+ __u32 cdw14;
+ __u32 data_len;
+ __u32 cdw16;
+ __u32 cdw17;
+ __u32 cdw18;
+ __u32 cdw19;
+ __u32 cdw20;
+ __u32 cdw21;
+ __u32 cdw22;
+ __u32 cdw23;
+ __u64 sense_addr;
+ __u32 cdw26[4];
+ __u32 timeout_ms;
+ __u32 result0;
+ __u32 result1;
+};
+
+struct spraid_bsg_request {
+ u32 msgcode;
+ u32 control;
+ union {
+ struct spraid_passthru_common_cmd admcmd;
+ struct spraid_ioq_passthru_cmd ioqcmd;
+ };
+};
+
+enum {
+ SPRAID_BSG_ADM,
+ SPRAID_BSG_IOQ,
+};
+
+struct spraid_cmd {
+ int qid;
+ int cid;
+ u32 result0;
+ u32 result1;
+ u16 status;
+ void *priv;
+ enum spraid_cmd_state state;
+ struct completion cmd_done;
+ struct list_head list;
+};
+
+struct spraid_queue {
+ struct spraid_dev *hdev;
+ spinlock_t sq_lock;
+
+ spinlock_t cq_lock ____cacheline_aligned_in_smp;
+
+ void *sq_cmds;
+
+ struct spraid_completion *cqes;
+
+ dma_addr_t sq_dma_addr;
+ dma_addr_t cq_dma_addr;
+ u32 __iomem *q_db;
+ u8 cq_phase;
+ u8 sqes;
+ u16 qid;
+ u16 sq_tail;
+ u16 cq_head;
+ u16 last_cq_head;
+ u16 q_depth;
+ s16 cq_vector;
+ void *sense;
+ dma_addr_t sense_dma_addr;
+ struct dma_pool *prp_small_pool;
+};
+
+struct spraid_iod {
+ struct spraid_queue *spraidq;
+ enum spraid_cmd_state state;
+ int npages;
+ u32 nsge;
+ u32 length;
+ bool use_sgl;
+ bool sg_drv_mgmt;
+ dma_addr_t first_dma;
+ void *sense;
+ dma_addr_t sense_dma;
+ struct scatterlist *sg;
+ struct scatterlist inline_sg[0];
+};
+
+#define SPRAID_DEV_INFO_ATTR_BOOT(attr) ((attr) & 0x01)
+#define SPRAID_DEV_INFO_ATTR_VD(attr) (((attr) & 0x02) == 0x0)
+#define SPRAID_DEV_INFO_ATTR_PT(attr) (((attr) & 0x22) == 0x02)
+#define SPRAID_DEV_INFO_ATTR_RAWDISK(attr) ((attr) & 0x20)
+
+#define SPRAID_DEV_INFO_FLAG_VALID(flag) ((flag) & 0x01)
+#define SPRAID_DEV_INFO_FLAG_CHANGE(flag) ((flag) & 0x02)
+
+#define BGTASK_TYPE_REBUILD 4
+#define USR_CMD_READ 0xc2
+#define USR_CMD_RDLEN 0x1000
+#define USR_CMD_VDINFO 0x704
+#define USR_CMD_BGTASK 0x504
+#define VDINFO_PARAM_LEN 0x04
+
+struct spraid_vd_info {
+ __u8 name[32];
+ __le16 id;
+ __u8 rg_id;
+ __u8 rg_level;
+ __u8 sg_num;
+ __u8 sg_disk_num;
+ __u8 vd_status;
+ __u8 vd_type;
+ __u8 rsvd1[4056];
+};
+
+#define MAX_REALTIME_BGTASK_NUM 32
+
+struct bgtask_info {
+ __u8 type;
+ __u8 progress;
+ __u8 rate;
+ __u8 rsvd0;
+ __le16 vd_id;
+ __le16 time_left;
+ __u8 rsvd1[4];
+};
+
+struct spraid_bgtask {
+ __u8 sw;
+ __u8 task_num;
+ __u8 rsvd[6];
+ struct bgtask_info bgtask[MAX_REALTIME_BGTASK_NUM];
+};
+
+struct spraid_dev_info {
+ __le32 hdid;
+ __le16 target;
+ __u8 channel;
+ __u8 lun;
+ __u8 attr;
+ __u8 flag;
+ __le16 max_io_kb;
+};
+
+#define MAX_DEV_ENTRY_PER_PAGE_4K 340
+struct spraid_dev_list {
+ __le32 dev_num;
+ __u32 rsvd0[3];
+ struct spraid_dev_info devices[MAX_DEV_ENTRY_PER_PAGE_4K];
+};
+
+struct spraid_sdev_hostdata {
+ u32 hdid;
+ u16 max_io_kb;
+ u8 attr;
+ u8 flag;
+ u8 rg_id;
+ u8 rsvd[3];
+};
+
+#endif
+
diff --git a/drivers/scsi/spraid/spraid_main.c b/drivers/scsi/spraid/spraid_main.c
new file mode 100644
index 000000000000..519b39f44e91
--- /dev/null
+++ b/drivers/scsi/spraid/spraid_main.c
@@ -0,0 +1,3875 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
+
+/* Ramaxel Raid SPXXX Series Linux Driver */
+
+#define pr_fmt(fmt) "spraid: " fmt
+
+#include <linux/sched/signal.h>
+#include <linux/version.h>
+#include <linux/pci.h>
+#include <linux/aer.h>
+#include <linux/module.h>
+#include <linux/ioport.h>
+#include <linux/device.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/cdev.h>
+#include <linux/sysfs.h>
+#include <linux/gfp.h>
+#include <linux/types.h>
+#include <linux/ratelimit.h>
+#include <linux/once.h>
+#include <linux/debugfs.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/blkdev.h>
+#include <linux/bsg-lib.h>
+#include <asm/unaligned.h>
+#include <linux/sort.h>
+#include <target/target_core_backend.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_transport.h>
+#include <scsi/scsi_dbg.h>
+
+
+#include "spraid.h"
+
+static u32 admin_tmout = 60;
+module_param(admin_tmout, uint, 0644);
+MODULE_PARM_DESC(admin_tmout, "admin commands timeout (seconds)");
+
+static u32 scmd_tmout_nonpt = 180;
+module_param(scmd_tmout_nonpt, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_nonpt,
+ "scsi commands timeout for rawdisk&raid(seconds)");
+
+static int ioq_depth_set(const char *val, const struct kernel_param *kp);
+static const struct kernel_param_ops ioq_depth_ops = {
+ .set = ioq_depth_set,
+ .get = param_get_uint,
+};
+
+static u32 io_queue_depth = 1024;
+module_param_cb(io_queue_depth, &ioq_depth_ops, &io_queue_depth, 0644);
+MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
+
+static int log_debug_switch_set(const char *val, const struct kernel_param *kp)
+{
+ u8 n = 0;
+ int ret;
+
+ ret = kstrtou8(val, 10, &n);
+ if (ret != 0)
+ return -EINVAL;
+
+ return param_set_byte(val, kp);
+}
+
+static const struct kernel_param_ops log_debug_switch_ops = {
+ .set = log_debug_switch_set,
+ .get = param_get_byte,
+};
+
+static unsigned char log_debug_switch;
+module_param_cb(log_debug_switch, &log_debug_switch_ops,
+ &log_debug_switch, 0644);
+MODULE_PARM_DESC(log_debug_switch,
+ "set log state, default non-zero for switch on");
+
+static int small_pool_num_set(const char *val, const struct kernel_param *kp)
+{
+ u8 n = 0;
+ int ret;
+
+ ret = kstrtou8(val, 10, &n);
+ if (ret != 0)
+ return -EINVAL;
+ if (n > MAX_SMALL_POOL_NUM)
+ n = MAX_SMALL_POOL_NUM;
+ if (n < 1)
+ n = 1;
+ *((u8 *)kp->arg) = n;
+
+ return 0;
+}
+
+static const struct kernel_param_ops small_pool_num_ops = {
+ .set = small_pool_num_set,
+ .get = param_get_byte,
+};
+
+/* It was found that the spinlock of a single pool is heavily
+ * contended by multiple CPUs, so multiple pools are introduced
+ * to reduce the contention.
+ */
+static unsigned char small_pool_num = 4;
+module_param_cb(small_pool_num, &small_pool_num_ops, &small_pool_num, 0644);
+MODULE_PARM_DESC(small_pool_num, "set prp small pool num, default 4, MAX 16");
+
+static void spraid_free_queue(struct spraid_queue *spraidq);
+static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result);
+static void spraid_handle_aen_vs(struct spraid_dev *hdev,
+ u32 result, u32 result1);
+
+static DEFINE_IDA(spraid_instance_ida);
+
+static struct class *spraid_class;
+
+#define SPRAID_CAP_TIMEOUT_UNIT_MS (HZ / 2)
+
+static struct workqueue_struct *spraid_wq;
+
+#define dev_log_dbg(dev, fmt, ...) do { \
+ if (unlikely(log_debug_switch)) \
+ dev_info(dev, "[%s] [%d] " fmt, \
+ __func__, __LINE__, ##__VA_ARGS__); \
+} while (0)
+
+#define SPRAID_DRV_VERSION "1.0.0.0"
+
+#define ADMIN_TIMEOUT (admin_tmout * HZ)
+#define ADMIN_ERR_TIMEOUT 32757
+
+#define SPRAID_WAIT_ABNL_CMD_TIMEOUT (3 * 2)
+
+#define SPRAID_DMA_MSK_BIT_MAX 64
+
+enum FW_STAT_CODE {
+ FW_STAT_OK = 0,
+ FW_STAT_NEED_CHECK,
+ FW_STAT_ERROR,
+ FW_STAT_EP_PCIE_ERROR,
+ FW_STAT_NAC_DMA_ERROR,
+ FW_STAT_ABORTED,
+ FW_STAT_NEED_RETRY
+};
+
+static const char * const raid_levels[] = {"0", "1", "5", "6", "10", "50", "60",
+ "NA"};
+
+static const char * const raid_states[] = {
+ "NA", "NORMAL", "FAULT", "DEGRADE", "NOT_FORMATTED",
+ "FORMATTING", "SANITIZING", "INITIALIZING", "INITIALIZE_FAIL",
+ "DELETING", "DELETE_FAIL", "WRITE_PROTECT"
+};
+
+static int ioq_depth_set(const char *val, const struct kernel_param *kp)
+{
+ int n = 0;
+ int ret;
+
+ ret = kstrtoint(val, 10, &n);
+ if (ret != 0 || n < 2)
+ return -EINVAL;
+
+ return param_set_int(val, kp);
+}
+
+static int spraid_remap_bar(struct spraid_dev *hdev, u32 size)
+{
+ struct pci_dev *pdev = hdev->pdev;
+
+ if (size > pci_resource_len(pdev, 0)) {
+ dev_err(hdev->dev, "Input size[%u] exceed bar0 length[%llu]\n",
+ size, pci_resource_len(pdev, 0));
+ return -ENOMEM;
+ }
+
+ if (hdev->bar)
+ iounmap(hdev->bar);
+
+ hdev->bar = ioremap(pci_resource_start(pdev, 0), size);
+ if (!hdev->bar) {
+ dev_err(hdev->dev, "ioremap for bar0 failed\n");
+ return -ENOMEM;
+ }
+ hdev->dbs = hdev->bar + SPRAID_REG_DBS;
+
+ return 0;
+}
+
+static int spraid_dev_map(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ int ret;
+
+ ret = pci_request_mem_regions(pdev, "spraid");
+ if (ret) {
+ dev_err(hdev->dev, "fail to request memory regions\n");
+ return ret;
+ }
+
+ ret = spraid_remap_bar(hdev, SPRAID_REG_DBS + 4096);
+ if (ret) {
+ pci_release_mem_regions(pdev);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void spraid_dev_unmap(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+
+ if (hdev->bar) {
+ iounmap(hdev->bar);
+ hdev->bar = NULL;
+ }
+ pci_release_mem_regions(pdev);
+}
+
+static int spraid_pci_enable(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ int ret = -ENOMEM;
+ u64 maskbit = SPRAID_DMA_MSK_BIT_MAX;
+
+ if (pci_enable_device_mem(pdev)) {
+ dev_err(hdev->dev,
+ "Enable pci device memory resources failed\n");
+ return ret;
+ }
+ pci_set_master(pdev);
+
+ if (readl(hdev->bar + SPRAID_REG_CSTS) == U32_MAX) {
+ ret = -ENODEV;
+ dev_err(hdev->dev, "Read csts register failed\n");
+ goto disable;
+ }
+
+ ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
+ if (ret < 0) {
+ dev_err(hdev->dev,
+ "Allocate one IRQ for setup admin channel failed\n");
+ goto disable;
+ }
+
+ hdev->cap = lo_hi_readq(hdev->bar + SPRAID_REG_CAP);
+ hdev->ioq_depth = min_t(u32, SPRAID_CAP_MQES(hdev->cap) + 1,
+ io_queue_depth);
+ hdev->db_stride = 1 << SPRAID_CAP_STRIDE(hdev->cap);
+
+ maskbit = SPRAID_CAP_DMAMASK(hdev->cap);
+ if (maskbit < 32 || maskbit > SPRAID_DMA_MSK_BIT_MAX) {
+ dev_err(hdev->dev,
+ "err, dma mask invalid[%llu], set to default\n",
+ maskbit);
+ maskbit = SPRAID_DMA_MSK_BIT_MAX;
+ }
+ if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(maskbit))) {
+ dev_err(hdev->dev, "set dma mask and coherent failed\n");
+ goto disable;
+ }
+
+ dev_info(hdev->dev, "set dma mask[%llu] success\n", maskbit);
+
+ pci_enable_pcie_error_reporting(pdev);
+ pci_save_state(pdev);
+
+ return 0;
+
+disable:
+ pci_disable_device(pdev);
+ return ret;
+}
+
+static int spraid_npages_prp(u32 size, struct spraid_dev *hdev)
+{
+ u32 nprps = DIV_ROUND_UP(size + hdev->page_size, hdev->page_size);
+
+ return DIV_ROUND_UP(PRP_ENTRY_SIZE * nprps, PAGE_SIZE - PRP_ENTRY_SIZE);
+}
+
+static int spraid_npages_sgl(u32 nseg)
+{
+ return DIV_ROUND_UP(nseg * sizeof(struct spraid_sgl_desc), PAGE_SIZE);
+}
+
+static void **spraid_iod_list(struct spraid_iod *iod)
+{
+ return (void **)(iod->inline_sg + (iod->sg_drv_mgmt ? iod->nsge : 0));
+}
+
+static u32 spraid_iod_ext_size(struct spraid_dev *hdev, u32 size, u32 nsge,
+ bool sg_drv_mgmt, bool use_sgl)
+{
+ size_t alloc_size, sg_size;
+
+ if (use_sgl)
+ alloc_size = sizeof(__le64 *) * spraid_npages_sgl(nsge);
+ else
+ alloc_size = sizeof(__le64 *) * spraid_npages_prp(size, hdev);
+
+ sg_size = sg_drv_mgmt ? (sizeof(struct scatterlist) * nsge) : 0;
+ return sg_size + alloc_size;
+}
+
+static u32 spraid_cmd_size(struct spraid_dev *hdev,
+ bool sg_drv_mgmt, bool use_sgl)
+{
+ u32 alloc_size = spraid_iod_ext_size(hdev, SPRAID_INT_BYTES(hdev),
+ SPRAID_INT_PAGES, sg_drv_mgmt, use_sgl);
+
+ dev_info(hdev->dev, "sg_drv_mgmt: %s, use_sgl: %s, iod size: %lu;"
+ " alloc_size: %u\n", sg_drv_mgmt ? "true" : "false",
+ use_sgl ? "true" : "false",
+ sizeof(struct spraid_iod), alloc_size);
+
+ return sizeof(struct spraid_iod) + alloc_size;
+}
+
+static int spraid_setup_prps(struct spraid_dev *hdev, struct spraid_iod *iod)
+{
+ struct scatterlist *sg = iod->sg;
+ u64 dma_addr = sg_dma_address(sg);
+ int dma_len = sg_dma_len(sg);
+ __le64 *prp_list, *old_prp_list;
+ u32 page_size = hdev->page_size;
+ int offset = dma_addr & (page_size - 1);
+ void **list = spraid_iod_list(iod);
+ int length = iod->length;
+ struct dma_pool *pool;
+ dma_addr_t prp_dma;
+ int nprps, i;
+
+ length -= (page_size - offset);
+ if (length <= 0) {
+ iod->first_dma = 0;
+ return 0;
+ }
+
+ dma_len -= (page_size - offset);
+ if (dma_len) {
+ dma_addr += (page_size - offset);
+ } else {
+ sg = sg_next(sg);
+ dma_addr = sg_dma_address(sg);
+ dma_len = sg_dma_len(sg);
+ }
+
+ if (length <= page_size) {
+ iod->first_dma = dma_addr;
+ return 0;
+ }
+
+ nprps = DIV_ROUND_UP(length, page_size);
+ if (nprps <= (SMALL_POOL_SIZE / PRP_ENTRY_SIZE)) {
+ pool = iod->spraidq->prp_small_pool;
+ iod->npages = 0;
+ } else {
+ pool = hdev->prp_page_pool;
+ iod->npages = 1;
+ }
+
+ prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+ if (!prp_list) {
+ dev_err_ratelimited(hdev->dev,
+ "Allocate first prp_list memory failed\n");
+ iod->first_dma = dma_addr;
+ iod->npages = -1;
+ return -ENOMEM;
+ }
+ list[0] = prp_list;
+ iod->first_dma = prp_dma;
+ i = 0;
+ for (;;) {
+ if (i == page_size / PRP_ENTRY_SIZE) {
+ old_prp_list = prp_list;
+
+ prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+ if (!prp_list) {
+ dev_err_ratelimited(hdev->dev, "Allocate %dth"
+ " prp_list memory failed\n",
+ iod->npages + 1);
+ return -ENOMEM;
+ }
+ list[iod->npages++] = prp_list;
+ prp_list[0] = old_prp_list[i - 1];
+ old_prp_list[i - 1] = cpu_to_le64(prp_dma);
+ i = 1;
+ }
+ prp_list[i++] = cpu_to_le64(dma_addr);
+ dma_len -= page_size;
+ dma_addr += page_size;
+ length -= page_size;
+ if (length <= 0)
+ break;
+ if (dma_len > 0)
+ continue;
+ if (unlikely(dma_len < 0))
+ goto bad_sgl;
+ sg = sg_next(sg);
+ dma_addr = sg_dma_address(sg);
+ dma_len = sg_dma_len(sg);
+ }
+
+ return 0;
+
+bad_sgl:
+ dev_err(hdev->dev,
+ "Setup prps, invalid SGL for payload: %d nents: %d\n",
+ iod->length, iod->nsge);
+ return -EIO;
+}
+
+#define SGES_PER_PAGE (PAGE_SIZE / sizeof(struct spraid_sgl_desc))
+
+static void spraid_submit_cmd(struct spraid_queue *spraidq, const void *cmd)
+{
+ u32 sqes = SQE_SIZE(spraidq->qid);
+ unsigned long flags;
+ struct spraid_admin_common_command *acd =
+ (struct spraid_admin_common_command *)cmd;
+
+ spin_lock_irqsave(&spraidq->sq_lock, flags);
+ memcpy((spraidq->sq_cmds + sqes * spraidq->sq_tail), cmd, sqes);
+ if (++spraidq->sq_tail == spraidq->q_depth)
+ spraidq->sq_tail = 0;
+
+ writel(spraidq->sq_tail, spraidq->q_db);
+ spin_unlock_irqrestore(&spraidq->sq_lock, flags);
+
+ dev_log_dbg(spraidq->hdev->dev,
+ "cid[%d] qid[%d], opcode[0x%x], flags[0x%x], hdid[%u]\n",
+ acd->command_id, spraidq->qid, acd->opcode,
+ acd->flags, le32_to_cpu(acd->hdid));
+}
+
+static u32 spraid_mod64(u64 dividend, u32 divisor)
+{
+ u64 d;
+ u32 remainder;
+
+ if (!divisor)
+ pr_err("DIVISOR is zero, in div fn\n");
+
+ d = dividend;
+ remainder = do_div(d, divisor);
+ return remainder;
+}
+
+static inline bool spraid_is_rw_scmd(struct scsi_cmnd *scmd)
+{
+ switch (scmd->cmnd[0]) {
+ case READ_6:
+ case READ_10:
+ case READ_12:
+ case READ_16:
+ case READ_32:
+ case WRITE_6:
+ case WRITE_10:
+ case WRITE_12:
+ case WRITE_16:
+ case WRITE_32:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static bool spraid_is_prp(struct spraid_dev *hdev,
+ struct scsi_cmnd *scmd, u32 nsge)
+{
+ struct scatterlist *sg = scsi_sglist(scmd);
+ u32 page_size = hdev->page_size;
+ bool is_prp = true;
+ int i = 0;
+
+ scsi_for_each_sg(scmd, sg, nsge, i) {
+ if (i != 0 && i != nsge - 1) {
+ if (spraid_mod64(sg_dma_len(sg), page_size) ||
+ spraid_mod64(sg_dma_address(sg), page_size)) {
+ is_prp = false;
+ break;
+ }
+ }
+
+ if (nsge > 1 && i == 0) {
+ if ((spraid_mod64((sg_dma_address(sg) + sg_dma_len(sg)),
+ page_size))) {
+ is_prp = false;
+ break;
+ }
+ }
+
+ if (nsge > 1 && i == (nsge - 1)) {
+ if (spraid_mod64(sg_dma_address(sg), page_size)) {
+ is_prp = false;
+ break;
+ }
+ }
+ }
+
+ return is_prp;
+}
+
+enum {
+ SPRAID_SGL_FMT_DATA_DESC = 0x00,
+ SPRAID_SGL_FMT_SEG_DESC = 0x02,
+ SPRAID_SGL_FMT_LAST_SEG_DESC = 0x03,
+ SPRAID_KEY_SGL_FMT_DATA_DESC = 0x04,
+ SPRAID_TRANSPORT_SGL_DATA_DESC = 0x05
+};
+
+static void spraid_sgl_set_data(struct spraid_sgl_desc *sge,
+ struct scatterlist *sg)
+{
+ sge->addr = cpu_to_le64(sg_dma_address(sg));
+ sge->length = cpu_to_le32(sg_dma_len(sg));
+ sge->type = SPRAID_SGL_FMT_DATA_DESC << 4;
+}
+
+static void spraid_sgl_set_seg(struct spraid_sgl_desc *sge,
+ dma_addr_t dma_addr, int entries)
+{
+ sge->addr = cpu_to_le64(dma_addr);
+ if (entries <= SGES_PER_PAGE) {
+ sge->length = cpu_to_le32(entries * sizeof(*sge));
+ sge->type = SPRAID_SGL_FMT_LAST_SEG_DESC << 4;
+ } else {
+ sge->length = cpu_to_le32(PAGE_SIZE);
+ sge->type = SPRAID_SGL_FMT_SEG_DESC << 4;
+ }
+}
+
+static int spraid_setup_ioq_cmd_sgl(struct spraid_dev *hdev,
+ struct scsi_cmnd *scmd,
+ struct spraid_ioq_command *ioq_cmd,
+ struct spraid_iod *iod)
+{
+ struct spraid_sgl_desc *sg_list, *link, *old_sg_list;
+ struct scatterlist *sg = scsi_sglist(scmd);
+ void **list = spraid_iod_list(iod);
+ struct dma_pool *pool;
+ int nsge = iod->nsge;
+ dma_addr_t sgl_dma;
+ int i = 0;
+
+ ioq_cmd->common.flags |= SPRAID_CMD_FLAG_SGL_METABUF;
+
+ if (nsge == 1) {
+ spraid_sgl_set_data(&ioq_cmd->common.dptr.sgl, sg);
+ return 0;
+ }
+
+ if (nsge <= (SMALL_POOL_SIZE / sizeof(struct spraid_sgl_desc))) {
+ pool = iod->spraidq->prp_small_pool;
+ iod->npages = 0;
+ } else {
+ pool = hdev->prp_page_pool;
+ iod->npages = 1;
+ }
+
+ sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
+ if (!sg_list) {
+ dev_err_ratelimited(hdev->dev,
+ "Allocate first sgl_list failed\n");
+ iod->npages = -1;
+ return -ENOMEM;
+ }
+
+ list[0] = sg_list;
+ iod->first_dma = sgl_dma;
+ spraid_sgl_set_seg(&ioq_cmd->common.dptr.sgl, sgl_dma, nsge);
+ do {
+ if (i == SGES_PER_PAGE) {
+ old_sg_list = sg_list;
+ link = &old_sg_list[SGES_PER_PAGE - 1];
+
+ sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
+ if (!sg_list) {
+ dev_err_ratelimited(hdev->dev,
+ "Allocate %dth sgl_list;"
+ " failed\n",
+ iod->npages + 1);
+ return -ENOMEM;
+ }
+ list[iod->npages++] = sg_list;
+
+ i = 0;
+ memcpy(&sg_list[i++], link, sizeof(*link));
+ spraid_sgl_set_seg(link, sgl_dma, nsge);
+ }
+
+ spraid_sgl_set_data(&sg_list[i++], sg);
+ sg = sg_next(sg);
+ } while (--nsge > 0);
+
+ return 0;
+}
+
+#define SPRAID_RW_FUA BIT(14)
+
+static void spraid_setup_rw_cmd(struct spraid_dev *hdev,
+ struct spraid_rw_command *rw,
+ struct scsi_cmnd *scmd)
+{
+ u32 start_lba_lo, start_lba_hi;
+ u32 datalength = 0;
+ u16 control = 0;
+
+ start_lba_lo = 0;
+ start_lba_hi = 0;
+
+ if (scmd->sc_data_direction == DMA_TO_DEVICE) {
+ rw->opcode = SPRAID_CMD_WRITE;
+ } else if (scmd->sc_data_direction == DMA_FROM_DEVICE) {
+ rw->opcode = SPRAID_CMD_READ;
+ } else {
+ dev_err(hdev->dev,
+ "Invalid IO for unsupported data direction: %d\n",
+ scmd->sc_data_direction);
+ WARN_ON(1);
+ }
+
+ /* 6-byte READ(0x08) or WRITE(0x0A) cdb */
+ if (scmd->cmd_len == 6) {
+ datalength = (u32)(scmd->cmnd[4] == 0 ?
+ IO_6_DEFAULT_TX_LEN : scmd->cmnd[4]);
+ start_lba_lo = (u32)get_unaligned_be24(&scmd->cmnd[1]);
+
+ start_lba_lo &= 0x1FFFFF;
+ }
+
+ /* 10-byte READ(0x28) or WRITE(0x2A) cdb */
+ else if (scmd->cmd_len == 10) {
+ datalength = (u32)get_unaligned_be16(&scmd->cmnd[7]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+
+ /* 12-byte READ(0xA8) or WRITE(0xAA) cdb */
+ else if (scmd->cmd_len == 12) {
+ datalength = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+ /* 16-byte READ(0x88) or WRITE(0x8A) cdb */
+ else if (scmd->cmd_len == 16) {
+ datalength = get_unaligned_be32(&scmd->cmnd[10]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+ /* 32-byte READ(0x88) or WRITE(0x8A) cdb */
+ else if (scmd->cmd_len == 32) {
+ datalength = get_unaligned_be32(&scmd->cmnd[28]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[16]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[12]);
+
+ if (scmd->cmnd[10] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+
+ if (unlikely(datalength > U16_MAX || datalength == 0)) {
+ dev_err(hdev->dev,
+ "Invalid IO for illegal transfer data length: %u\n",
+ datalength);
+ WARN_ON(1);
+ }
+
+ rw->slba = cpu_to_le64(((u64)start_lba_hi << 32) | start_lba_lo);
+ /* 0base for nlb */
+ rw->nlb = cpu_to_le16((u16)(datalength - 1));
+ rw->control = cpu_to_le16(control);
+}
+
+static void spraid_setup_nonio_cmd(struct spraid_dev *hdev,
+ struct spraid_scsi_nonio *scsi_nonio,
+ struct scsi_cmnd *scmd)
+{
+ scsi_nonio->buffer_len = cpu_to_le32(scsi_bufflen(scmd));
+
+ switch (scmd->sc_data_direction) {
+ case DMA_NONE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_NONE;
+ break;
+ case DMA_TO_DEVICE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_TODEV;
+ break;
+ case DMA_FROM_DEVICE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_FROMDEV;
+ break;
+ default:
+ dev_err(hdev->dev,
+ "Invalid IO for unsupported data direction: %d\n",
+ scmd->sc_data_direction);
+ WARN_ON(1);
+ }
+}
+
+static void spraid_setup_ioq_cmd(struct spraid_dev *hdev,
+ struct spraid_ioq_command *ioq_cmd,
+ struct scsi_cmnd *scmd)
+{
+ memcpy(ioq_cmd->common.cdb, scmd->cmnd, scmd->cmd_len);
+ ioq_cmd->common.cdb_len = scmd->cmd_len;
+
+ if (spraid_is_rw_scmd(scmd))
+ spraid_setup_rw_cmd(hdev, &ioq_cmd->rw, scmd);
+ else
+ spraid_setup_nonio_cmd(hdev, &ioq_cmd->scsi_nonio, scmd);
+}
+
+static int spraid_init_iod(struct spraid_dev *hdev, struct spraid_iod *iod,
+ struct spraid_ioq_command *ioq_cmd,
+ struct scsi_cmnd *scmd)
+{
+ if (unlikely(!iod->sense)) {
+ dev_err(hdev->dev, "Allocate sense data buffer failed\n");
+ return -ENOMEM;
+ }
+ ioq_cmd->common.sense_addr = cpu_to_le64(iod->sense_dma);
+ ioq_cmd->common.sense_len = cpu_to_le16(SCSI_SENSE_BUFFERSIZE);
+
+ iod->nsge = 0;
+ iod->npages = -1;
+ iod->use_sgl = 0;
+ iod->sg_drv_mgmt = false;
+ WRITE_ONCE(iod->state, SPRAID_CMD_IDLE);
+
+ return 0;
+}
+
+static void spraid_free_iod_res(struct spraid_dev *hdev, struct spraid_iod *iod)
+{
+ const int last_prp = hdev->page_size / sizeof(__le64) - 1;
+ dma_addr_t dma_addr, next_dma_addr;
+ struct spraid_sgl_desc *sg_list;
+ __le64 *prp_list;
+ void *addr;
+ int i;
+
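+	/*
+	 * Walk the chained PRP/SGL pages: the last entry of each page
+	 * holds the DMA address of the next page in the chain.
+	 */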
+ dma_addr = iod->first_dma;
+ if (iod->npages == 0)
+ dma_pool_free(iod->spraidq->prp_small_pool,
+ spraid_iod_list(iod)[0], dma_addr);
+
+ for (i = 0; i < iod->npages; i++) {
+ addr = spraid_iod_list(iod)[i];
+
+ if (iod->use_sgl) {
+ sg_list = addr;
+ next_dma_addr =
+ le64_to_cpu((sg_list[SGES_PER_PAGE - 1]).addr);
+ } else {
+ prp_list = addr;
+ next_dma_addr = le64_to_cpu(prp_list[last_prp]);
+ }
+
+ dma_pool_free(hdev->prp_page_pool, addr, dma_addr);
+ dma_addr = next_dma_addr;
+ }
+
+ if (iod->sg_drv_mgmt && iod->sg != iod->inline_sg) {
+ iod->sg_drv_mgmt = false;
+ mempool_free(iod->sg, hdev->iod_mempool);
+ }
+
+ iod->sense = NULL;
+ iod->npages = -1;
+}
+
+static int spraid_io_map_data(struct spraid_dev *hdev, struct spraid_iod *iod,
+ struct scsi_cmnd *scmd,
+ struct spraid_ioq_command *ioq_cmd)
+{
+ int ret;
+
+ iod->nsge = scsi_dma_map(scmd);
+
+ /* No data to DMA, it may be scsi no-rw command */
+ if (unlikely(iod->nsge == 0))
+ return 0;
+
+ iod->length = scsi_bufflen(scmd);
+ iod->sg = scsi_sglist(scmd);
+ iod->use_sgl = !spraid_is_prp(hdev, scmd, iod->nsge);
+
+ if (iod->use_sgl) {
+ ret = spraid_setup_ioq_cmd_sgl(hdev, scmd, ioq_cmd, iod);
+ } else {
+ ret = spraid_setup_prps(hdev, iod);
+ ioq_cmd->common.dptr.prp1 =
+ cpu_to_le64(sg_dma_address(iod->sg));
+ ioq_cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
+ }
+
+ if (ret)
+ scsi_dma_unmap(scmd);
+
+ return ret;
+}
+
+static void spraid_map_status(struct spraid_iod *iod, struct scsi_cmnd *scmd,
+ struct spraid_completion *cqe)
+{
+ scsi_set_resid(scmd, 0);
+
+ switch ((le16_to_cpu(cqe->status) >> 1) & 0x7f) {
+ case FW_STAT_OK:
+ set_host_byte(scmd, DID_OK);
+ break;
+ case FW_STAT_NEED_CHECK:
+ set_host_byte(scmd, DID_OK);
+ scmd->result |= le16_to_cpu(cqe->status) >> 8;
+ if (scmd->result & SAM_STAT_CHECK_CONDITION) {
+ memset(scmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
+ memcpy(scmd->sense_buffer, iod->sense,
+ SCSI_SENSE_BUFFERSIZE);
+ scmd->result =
+ (scmd->result & 0x00ffffff) | (DRIVER_SENSE << 24);
+ }
+ break;
+ case FW_STAT_ABORTED:
+ set_host_byte(scmd, DID_ABORT);
+ break;
+ case FW_STAT_NEED_RETRY:
+ set_host_byte(scmd, DID_REQUEUE);
+ break;
+ default:
+ set_host_byte(scmd, DID_BAD_TARGET);
+ break;
+ }
+}
+
+static inline void spraid_get_tag_from_scmd(struct scsi_cmnd *scmd,
+ u16 *qid, u16 *cid)
+{
+ u32 tag = blk_mq_unique_tag(scmd->request);
+
+ *qid = blk_mq_unique_tag_to_hwq(tag) + 1;
+ *cid = blk_mq_unique_tag_to_tag(tag);
+}
+
+static int spraid_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
+{
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_dev *hdev = shost_priv(shost);
+ struct scsi_device *sdev = scmd->device;
+ struct spraid_sdev_hostdata *hostdata;
+ struct spraid_ioq_command ioq_cmd;
+ struct spraid_queue *ioq;
+ unsigned long elapsed;
+ u16 hwq, cid;
+ int ret;
+
+ if (unlikely(!scmd)) {
+ dev_err(hdev->dev, "err, scmd is null\n");
+ return 0;
+ }
+
+ if (unlikely(hdev->state != SPRAID_LIVE)) {
+ set_host_byte(scmd, DID_NO_CONNECT);
+ scmd->scsi_done(scmd);
+ return 0;
+ }
+
+ if (log_debug_switch)
+ scsi_print_command(scmd);
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ hostdata = sdev->hostdata;
+ ioq = &hdev->queues[hwq];
+ memset(&ioq_cmd, 0, sizeof(ioq_cmd));
+ ioq_cmd.rw.hdid = cpu_to_le32(hostdata->hdid);
+ ioq_cmd.rw.command_id = cid;
+
+ spraid_setup_ioq_cmd(hdev, &ioq_cmd, scmd);
+
+ ret = cid * SCSI_SENSE_BUFFERSIZE;
+ iod->sense = ioq->sense + ret;
+ iod->sense_dma = ioq->sense_dma_addr + ret;
+
+ ret = spraid_init_iod(hdev, iod, &ioq_cmd, scmd);
+ if (unlikely(ret))
+ return SCSI_MLQUEUE_HOST_BUSY;
+
+ iod->spraidq = ioq;
+ ret = spraid_io_map_data(hdev, iod, scmd, &ioq_cmd);
+ if (unlikely(ret)) {
+		dev_err(hdev->dev, "spraid_io_map_data failed\n");
+ set_host_byte(scmd, DID_ERROR);
+ scmd->scsi_done(scmd);
+ ret = 0;
+ goto deinit_iod;
+ }
+
+ WRITE_ONCE(iod->state, SPRAID_CMD_IN_FLIGHT);
+ spraid_submit_cmd(ioq, &ioq_cmd);
+ elapsed = jiffies - scmd->jiffies_at_alloc;
+ dev_log_dbg(hdev->dev,
+ "cid[%d] qid[%d] submit IO cost %3ld.%3ld seconds\n",
+ cid, hwq, elapsed / HZ, elapsed % HZ);
+ return 0;
+
+deinit_iod:
+ spraid_free_iod_res(hdev, iod);
+ return ret;
+}
+
+static int spraid_match_dev(struct spraid_dev *hdev, u16 idx,
+ struct scsi_device *sdev)
+{
+ if (SPRAID_DEV_INFO_FLAG_VALID(hdev->devices[idx].flag)) {
+ if (sdev->channel == hdev->devices[idx].channel &&
+ sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
+ sdev->lun < hdev->devices[idx].lun) {
+ dev_info(hdev->dev,
+				 "Match device success, channel:target:lun[%d:%d:%d]\n",
+ hdev->devices[idx].channel,
+ hdev->devices[idx].target,
+ hdev->devices[idx].lun);
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+static int spraid_slave_alloc(struct scsi_device *sdev)
+{
+ struct spraid_sdev_hostdata *hostdata;
+ struct spraid_dev *hdev;
+ u16 idx;
+
+ hdev = shost_priv(sdev->host);
+ hostdata = kzalloc(sizeof(*hostdata), GFP_KERNEL);
+ if (!hostdata) {
+ dev_err(hdev->dev, "Alloc scsi host data memory failed\n");
+ return -ENOMEM;
+ }
+
+ down_read(&hdev->devices_rwsem);
+ for (idx = 0; idx < le32_to_cpu(hdev->ctrl_info->nd); idx++) {
+ if (spraid_match_dev(hdev, idx, sdev))
+ goto scan_host;
+ }
+ up_read(&hdev->devices_rwsem);
+
+ kfree(hostdata);
+ return -ENXIO;
+
+scan_host:
+ hostdata->hdid = le32_to_cpu(hdev->devices[idx].hdid);
+ hostdata->max_io_kb = le16_to_cpu(hdev->devices[idx].max_io_kb);
+ hostdata->attr = hdev->devices[idx].attr;
+ hostdata->flag = hdev->devices[idx].flag;
+ hostdata->rg_id = 0xff;
+ sdev->hostdata = hostdata;
+ up_read(&hdev->devices_rwsem);
+ return 0;
+}
+
+static void spraid_slave_destroy(struct scsi_device *sdev)
+{
+ kfree(sdev->hostdata);
+ sdev->hostdata = NULL;
+}
+
+static int spraid_slave_configure(struct scsi_device *sdev)
+{
+ u16 idx;
+ unsigned int timeout = scmd_tmout_nonpt * HZ;
+ struct spraid_dev *hdev = shost_priv(sdev->host);
+ struct spraid_sdev_hostdata *hostdata = sdev->hostdata;
+ u32 max_sec = sdev->host->max_sectors;
+
+ if (hostdata) {
+ idx = hostdata->hdid - 1;
+ if (sdev->channel == hdev->devices[idx].channel &&
+ sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
+ sdev->lun < hdev->devices[idx].lun) {
+ if (SPRAID_DEV_INFO_ATTR_PT(hdev->devices[idx].attr))
+ timeout = 30 * HZ;
+ else
+ timeout = scmd_tmout_nonpt * HZ;
+ max_sec = le16_to_cpu(hdev->devices[idx].max_io_kb)
+ << 1;
+ } else {
+ dev_err(hdev->dev,
+				"[%s] err, sdev->channel:id:lun[%d:%d:%lld], "
+				"devices[%d] channel:target:lun[%d:%d:%d]\n",
+ __func__, sdev->channel, sdev->id, sdev->lun,
+ idx, hdev->devices[idx].channel,
+ hdev->devices[idx].target,
+ hdev->devices[idx].lun);
+ }
+ } else {
+ dev_err(hdev->dev, "[%s] err, sdev->hostdata is null\n",
+ __func__);
+ }
+
+ blk_queue_rq_timeout(sdev->request_queue, timeout);
+ sdev->eh_timeout = timeout;
+
+ if ((max_sec == 0) || (max_sec > sdev->host->max_sectors))
+ max_sec = sdev->host->max_sectors;
+
+ dev_info(hdev->dev,
+		 "[%s] sdev->channel:id:lun[%d:%d:%lld], "
+		 "scmd_timeout[%d]s, maxsec[%d]\n",
+ __func__, sdev->channel, sdev->id,
+ sdev->lun, timeout / HZ, max_sec);
+
+ return 0;
+}
+
+static void spraid_shost_init(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ u8 domain, bus;
+ u32 dev_func;
+
+ domain = pci_domain_nr(pdev->bus);
+ bus = pdev->bus->number;
+ dev_func = pdev->devfn;
+
+ hdev->shost->nr_hw_queues = hdev->online_queues - 1;
+ hdev->shost->can_queue = (hdev->ioq_depth - SPRAID_PTCMDS_PERQ);
+
+ hdev->shost->sg_tablesize = le16_to_cpu(hdev->ctrl_info->max_num_sge);
+ /* 512B per sector */
+ hdev->shost->max_sectors =
+ (1U << ((hdev->ctrl_info->mdts) * 1U) << 12) / 512;
+ hdev->shost->cmd_per_lun = MAX_CMD_PER_DEV;
+ hdev->shost->max_channel =
+ le16_to_cpu(hdev->ctrl_info->max_channel) - 1;
+ hdev->shost->max_id = le32_to_cpu(hdev->ctrl_info->max_tgt_id);
+ hdev->shost->max_lun = le16_to_cpu(hdev->ctrl_info->max_lun);
+
+ hdev->shost->this_id = -1;
+ hdev->shost->unique_id = (domain << 16) | (bus << 8) | dev_func;
+ hdev->shost->max_cmd_len = MAX_CDB_LEN;
+ hdev->shost->hostt->cmd_size = max(spraid_cmd_size(hdev, false, true),
+ spraid_cmd_size(hdev, false, false));
+}
+
+static inline void spraid_host_deinit(struct spraid_dev *hdev)
+{
+ ida_free(&spraid_instance_ida, hdev->instance);
+}
+
+static int spraid_alloc_queue(struct spraid_dev *hdev, u16 qid, u16 depth)
+{
+ struct spraid_queue *spraidq = &hdev->queues[qid];
+ int ret = 0;
+
+ if (hdev->queue_count > qid) {
+		dev_info(hdev->dev, "[%s] warn: queue[%d] already exists\n",
+ __func__, qid);
+ return 0;
+ }
+
+ spraidq->cqes = dma_alloc_coherent(hdev->dev, CQ_SIZE(depth),
+ &spraidq->cq_dma_addr,
+ GFP_KERNEL | __GFP_ZERO);
+ if (!spraidq->cqes)
+ return -ENOMEM;
+
+ spraidq->sq_cmds = dma_alloc_coherent(hdev->dev, SQ_SIZE(qid, depth),
+ &spraidq->sq_dma_addr,
+ GFP_KERNEL);
+ if (!spraidq->sq_cmds) {
+ ret = -ENOMEM;
+ goto free_cqes;
+ }
+
+ spin_lock_init(&spraidq->sq_lock);
+ spin_lock_init(&spraidq->cq_lock);
+ spraidq->hdev = hdev;
+ spraidq->q_depth = depth;
+ spraidq->qid = qid;
+ spraidq->cq_vector = -1;
+ hdev->queue_count++;
+
+ /* alloc sense buffer */
+ spraidq->sense = dma_alloc_coherent(hdev->dev, SENSE_SIZE(depth),
+ &spraidq->sense_dma_addr,
+ GFP_KERNEL | __GFP_ZERO);
+ if (!spraidq->sense) {
+ ret = -ENOMEM;
+ goto free_sq_cmds;
+ }
+
+ return 0;
+
+free_sq_cmds:
+ dma_free_coherent(hdev->dev, SQ_SIZE(qid, depth),
+ (void *)spraidq->sq_cmds, spraidq->sq_dma_addr);
+free_cqes:
+ dma_free_coherent(hdev->dev, CQ_SIZE(depth), (void *)spraidq->cqes,
+ spraidq->cq_dma_addr);
+ return ret;
+}
+
+static int spraid_wait_ready(struct spraid_dev *hdev, u64 cap, bool enabled)
+{
+ unsigned long timeout =
+ ((SPRAID_CAP_TIMEOUT(cap) + 1) * SPRAID_CAP_TIMEOUT_UNIT_MS) + jiffies;
+ u32 bit = enabled ? SPRAID_CSTS_RDY : 0;
+
+ while ((readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_RDY) != bit) {
+ usleep_range(1000, 2000);
+ if (fatal_signal_pending(current))
+ return -EINTR;
+
+ if (time_after(jiffies, timeout)) {
+ dev_err(hdev->dev, "Device not ready; aborting %s\n",
+ enabled ? "initialisation" : "reset");
+ return -ENODEV;
+ }
+ }
+ return 0;
+}
+
+static int spraid_shutdown_ctrl(struct spraid_dev *hdev)
+{
+ unsigned long timeout = hdev->ctrl_info->rtd3e + jiffies;
+
+ hdev->ctrl_config &= ~SPRAID_CC_SHN_MASK;
+ hdev->ctrl_config |= SPRAID_CC_SHN_NORMAL;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ while ((readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_SHST_MASK) !=
+ SPRAID_CSTS_SHST_CMPLT) {
+ msleep(100);
+ if (fatal_signal_pending(current))
+ return -EINTR;
+ if (time_after(jiffies, timeout)) {
+ dev_err(hdev->dev,
+ "Device shutdown incomplete; abort shutdown\n");
+ return -ENODEV;
+ }
+ }
+ return 0;
+}
+
+static int spraid_disable_ctrl(struct spraid_dev *hdev)
+{
+ hdev->ctrl_config &= ~SPRAID_CC_SHN_MASK;
+ hdev->ctrl_config &= ~SPRAID_CC_ENABLE;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ return spraid_wait_ready(hdev, hdev->cap, false);
+}
+
+static int spraid_enable_ctrl(struct spraid_dev *hdev)
+{
+ u64 cap = hdev->cap;
+ u32 dev_page_min = SPRAID_CAP_MPSMIN(cap) + 12;
+ u32 page_shift = PAGE_SHIFT;
+
+ if (page_shift < dev_page_min) {
+ dev_err(hdev->dev,
+ "Minimum device page size[%u], too large for host[%u]\n",
+ 1U << dev_page_min, 1U << page_shift);
+ return -ENODEV;
+ }
+
+ page_shift = min_t(unsigned int, SPRAID_CAP_MPSMAX(cap) + 12,
+ PAGE_SHIFT);
+ hdev->page_size = 1U << page_shift;
+
+ hdev->ctrl_config = SPRAID_CC_CSS_NVM;
+ hdev->ctrl_config |= (page_shift - 12) << SPRAID_CC_MPS_SHIFT;
+ hdev->ctrl_config |= SPRAID_CC_AMS_RR | SPRAID_CC_SHN_NONE;
+ hdev->ctrl_config |= SPRAID_CC_IOSQES | SPRAID_CC_IOCQES;
+ hdev->ctrl_config |= SPRAID_CC_ENABLE;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ return spraid_wait_ready(hdev, cap, true);
+}
+
+static void spraid_init_queue(struct spraid_queue *spraidq, u16 qid)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ memset((void *)spraidq->cqes, 0, CQ_SIZE(spraidq->q_depth));
+
+ spraidq->sq_tail = 0;
+ spraidq->cq_head = 0;
+ spraidq->cq_phase = 1;
+ spraidq->q_db = &hdev->dbs[qid * 2 * hdev->db_stride];
+ spraidq->prp_small_pool = hdev->prp_small_pool[qid % small_pool_num];
+ hdev->online_queues++;
+}
+
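+/*
+ * A CQE is new when its phase bit matches the queue's current cq_phase;
+ * the phase flips each time the completion ring wraps around.
+ */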
+static inline bool spraid_cqe_pending(struct spraid_queue *spraidq)
+{
+ return (le16_to_cpu(spraidq->cqes[spraidq->cq_head].status) & 1) ==
+ spraidq->cq_phase;
+}
+
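+/*
+ * REPORT ZONES data from SATA drives arrives in little-endian ATA layout;
+ * rewrite the header and each 64-byte zone descriptor in place into the
+ * big-endian SCSI representation.
+ */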
+static void spraid_sata_report_zone_handle(struct scsi_cmnd *scmd,
+ struct spraid_iod *iod)
+{
+ int i = 0;
+ unsigned int bytes = 0;
+ struct scatterlist *sg = scsi_sglist(scmd);
+
+ scsi_for_each_sg(scmd, sg, iod->nsge, i) {
+ unsigned int offset = 0;
+
+ if (bytes == 0) {
+ char *hdr;
+ u32 list_length;
+ u64 max_lba, opt_lba;
+ u16 same;
+
+ hdr = sg_virt(sg);
+
+ list_length = get_unaligned_le32(&hdr[0]);
+ same = get_unaligned_le16(&hdr[4]);
+ max_lba = get_unaligned_le64(&hdr[8]);
+ opt_lba = get_unaligned_le64(&hdr[16]);
+ put_unaligned_be32(list_length, &hdr[0]);
+ hdr[4] = same & 0xf;
+ put_unaligned_be64(max_lba, &hdr[8]);
+ put_unaligned_be64(opt_lba, &hdr[16]);
+ offset += 64;
+ bytes += 64;
+ }
+ while (offset < sg_dma_len(sg)) {
+ char *rec;
+ u8 cond, type, non_seq, reset;
+ u64 size, start, wp;
+
+ rec = sg_virt(sg) + offset;
+ type = rec[0] & 0xf;
+ cond = (rec[1] >> 4) & 0xf;
+ non_seq = (rec[1] & 2);
+ reset = (rec[1] & 1);
+ size = get_unaligned_le64(&rec[8]);
+ start = get_unaligned_le64(&rec[16]);
+ wp = get_unaligned_le64(&rec[24]);
+ rec[0] = type;
+ rec[1] = (cond << 4) | non_seq | reset;
+ put_unaligned_be64(size, &rec[8]);
+ put_unaligned_be64(start, &rec[16]);
+ put_unaligned_be64(wp, &rec[24]);
+ WARN_ON(offset + 64 > sg_dma_len(sg));
+ offset += 64;
+ bytes += 64;
+ }
+ }
+}
+
+static inline void spraid_handle_ata_cmd(struct spraid_dev *hdev,
+ struct scsi_cmnd *scmd,
+ struct spraid_iod *iod)
+{
+ if (hdev->ctrl_info->card_type != SPRAID_CARD_HBA)
+ return;
+
+ switch (scmd->cmnd[0]) {
+ case ZBC_IN:
+ dev_info(hdev->dev, "[%s] process report zone\n", __func__);
+ spraid_sata_report_zone_handle(scmd, iod);
+ break;
+ default:
+ break;
+ }
+}
+
+static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct blk_mq_tags *tags;
+ struct scsi_cmnd *scmd;
+ struct spraid_iod *iod;
+ struct request *req;
+ unsigned long elapsed;
+
+ tags = hdev->shost->tag_set.tags[ioq->qid - 1];
+ req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
+ if (unlikely(!req || !blk_mq_request_started(req))) {
+ dev_warn(hdev->dev, "Invalid id %d completed on queue %d\n",
+ cqe->cmd_id, ioq->qid);
+ return;
+ }
+
+ scmd = blk_mq_rq_to_pdu(req);
+ iod = scsi_cmd_priv(scmd);
+
+ elapsed = jiffies - scmd->jiffies_at_alloc;
+ dev_log_dbg(hdev->dev,
+ "cid[%d] qid[%d] finish IO cost %3ld.%3ld seconds\n",
+ cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
+
+ if (cmpxchg(&iod->state, SPRAID_CMD_IN_FLIGHT, SPRAID_CMD_COMPLETE) !=
+ SPRAID_CMD_IN_FLIGHT) {
+ dev_warn(hdev->dev,
+			 "cid[%d] qid[%d] enters abnormal handler, "
+			 "cost %3ld.%3ld seconds\n",
+ cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
+ WRITE_ONCE(iod->state, SPRAID_CMD_TMO_COMPLETE);
+
+ if (iod->nsge) {
+ iod->nsge = 0;
+ scsi_dma_unmap(scmd);
+ }
+ spraid_free_iod_res(hdev, iod);
+
+ return;
+ }
+
+ spraid_handle_ata_cmd(hdev, scmd, iod);
+
+ spraid_map_status(iod, scmd, cqe);
+ if (iod->nsge) {
+ iod->nsge = 0;
+ scsi_dma_unmap(scmd);
+ }
+ spraid_free_iod_res(hdev, iod);
+ scmd->scsi_done(scmd);
+}
+
+static void spraid_complete_adminq_cmnd(struct spraid_queue *adminq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = adminq->hdev;
+ struct spraid_cmd *adm_cmd;
+
+ adm_cmd = hdev->adm_cmds + cqe->cmd_id;
+ if (unlikely(adm_cmd->state == SPRAID_CMD_IDLE)) {
+ dev_warn(adminq->hdev->dev,
+ "Invalid id %d completed on queue %d\n",
+ cqe->cmd_id, le16_to_cpu(cqe->sq_id));
+ return;
+ }
+
+ adm_cmd->status = le16_to_cpu(cqe->status) >> 1;
+ adm_cmd->result0 = le32_to_cpu(cqe->result);
+ adm_cmd->result1 = le32_to_cpu(cqe->result1);
+
+ complete(&adm_cmd->cmd_done);
+}
+
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid);
+
+static void spraid_complete_aen(struct spraid_queue *spraidq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+ u32 result = le32_to_cpu(cqe->result);
+
+ dev_info(hdev->dev, "rcv aen, cid[%d], status[0x%x], result[0x%x]\n",
+ cqe->cmd_id, le16_to_cpu(cqe->status) >> 1, result);
+
+ spraid_send_aen(hdev, cqe->cmd_id);
+
+ if ((le16_to_cpu(cqe->status) >> 1) != SPRAID_SC_SUCCESS)
+ return;
+ switch (result & 0x7) {
+ case SPRAID_AEN_NOTICE:
+ spraid_handle_aen_notice(hdev, result);
+ break;
+ case SPRAID_AEN_VS:
+ spraid_handle_aen_vs(hdev, result, le32_to_cpu(cqe->result1));
+ break;
+ default:
+ dev_warn(hdev->dev, "Unsupported async event type: %u\n",
+ result & 0x7);
+ break;
+ }
+}
+
+static void spraid_complete_ioq_sync_cmnd(struct spraid_queue *ioq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct spraid_cmd *ptcmd;
+
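+	/*
+	 * Reverse the qid/cid encoding used in spraid_alloc_ioq_ptcmds()
+	 * to locate the per-queue passthrough command slot.
+	 */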
+ ptcmd = hdev->ioq_ptcmds + (ioq->qid - 1) * SPRAID_PTCMDS_PERQ +
+ cqe->cmd_id - SPRAID_IO_BLK_MQ_DEPTH;
+
+ ptcmd->status = le16_to_cpu(cqe->status) >> 1;
+ ptcmd->result0 = le32_to_cpu(cqe->result);
+ ptcmd->result1 = le32_to_cpu(cqe->result1);
+
+ complete(&ptcmd->cmd_done);
+}
+
+static inline void spraid_handle_cqe(struct spraid_queue *spraidq, u16 idx)
+{
+ struct spraid_completion *cqe = &spraidq->cqes[idx];
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ if (unlikely(cqe->cmd_id >= spraidq->q_depth)) {
+ dev_err(hdev->dev,
+ "Invalid command id[%d] completed on queue %d\n",
+ cqe->cmd_id, cqe->sq_id);
+ return;
+ }
+
+	dev_log_dbg(hdev->dev,
+		    "cid[%d] qid[%d], result[0x%x], sq_id[%d], status[0x%x]\n",
+ cqe->cmd_id, spraidq->qid, le32_to_cpu(cqe->result),
+ le16_to_cpu(cqe->sq_id), le16_to_cpu(cqe->status));
+
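+	/*
+	 * cmd_id ranges demultiplex the completion: admin ids at or above
+	 * SPRAID_AQ_BLK_MQ_DEPTH are async events, and io-queue ids at or
+	 * above SPRAID_IO_BLK_MQ_DEPTH are internal passthrough commands.
+	 */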
+	if (unlikely(spraidq->qid == 0 &&
+		     cqe->cmd_id >= SPRAID_AQ_BLK_MQ_DEPTH)) {
+ spraid_complete_aen(spraidq, cqe);
+ return;
+ }
+
+ if (unlikely(spraidq->qid && cqe->cmd_id >= SPRAID_IO_BLK_MQ_DEPTH)) {
+ spraid_complete_ioq_sync_cmnd(spraidq, cqe);
+ return;
+ }
+
+ if (spraidq->qid)
+ spraid_complete_ioq_cmnd(spraidq, cqe);
+ else
+ spraid_complete_adminq_cmnd(spraidq, cqe);
+}
+
+static void spraid_complete_cqes(struct spraid_queue *spraidq,
+ u16 start, u16 end)
+{
+ while (start != end) {
+ spraid_handle_cqe(spraidq, start);
+ if (++start == spraidq->q_depth)
+ start = 0;
+ }
+}
+
+static inline void spraid_update_cq_head(struct spraid_queue *spraidq)
+{
+ if (++spraidq->cq_head == spraidq->q_depth) {
+ spraidq->cq_head = 0;
+ spraidq->cq_phase = !spraidq->cq_phase;
+ }
+}
+
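+/*
+ * Reap pending CQEs starting at cq_head; stop once the CQE whose cmd_id
+ * matches @tag has been consumed (tag == -1 never matches, so all pending
+ * entries are reaped). Returns whether the tag was found.
+ */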
+static inline bool spraid_process_cq(struct spraid_queue *spraidq,
+ u16 *start, u16 *end, int tag)
+{
+ bool found = false;
+
+ *start = spraidq->cq_head;
+ while (!found && spraid_cqe_pending(spraidq)) {
+ if (spraidq->cqes[spraidq->cq_head].cmd_id == tag)
+ found = true;
+ spraid_update_cq_head(spraidq);
+ }
+ *end = spraidq->cq_head;
+
+ if (*start != *end)
+ writel(spraidq->cq_head,
+ spraidq->q_db + spraidq->hdev->db_stride);
+
+ return found;
+}
+
+static bool spraid_poll_cq(struct spraid_queue *spraidq, int cid)
+{
+ u16 start, end;
+ bool found;
+
+ if (!spraid_cqe_pending(spraidq))
+		return false;
+
+ spin_lock_irq(&spraidq->cq_lock);
+ found = spraid_process_cq(spraidq, &start, &end, cid);
+ spin_unlock_irq(&spraidq->cq_lock);
+
+ spraid_complete_cqes(spraidq, start, end);
+ return found;
+}
+
+static irqreturn_t spraid_irq(int irq, void *data)
+{
+ struct spraid_queue *spraidq = data;
+ irqreturn_t ret = IRQ_NONE;
+ u16 start, end;
+
+ spin_lock(&spraidq->cq_lock);
+ if (spraidq->cq_head != spraidq->last_cq_head)
+ ret = IRQ_HANDLED;
+
+ spraid_process_cq(spraidq, &start, &end, -1);
+ spraidq->last_cq_head = spraidq->cq_head;
+ spin_unlock(&spraidq->cq_lock);
+
+ if (start != end) {
+ spraid_complete_cqes(spraidq, start, end);
+ ret = IRQ_HANDLED;
+ }
+ return ret;
+}
+
+static int spraid_setup_admin_queue(struct spraid_dev *hdev)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u32 aqa;
+ int ret;
+
+ dev_info(hdev->dev, "[%s] start disable ctrl\n", __func__);
+
+ ret = spraid_disable_ctrl(hdev);
+ if (ret)
+ return ret;
+
+ ret = spraid_alloc_queue(hdev, 0, SPRAID_AQ_DEPTH);
+ if (ret)
+ return ret;
+
+ aqa = adminq->q_depth - 1;
+ aqa |= aqa << 16;
+ writel(aqa, hdev->bar + SPRAID_REG_AQA);
+ lo_hi_writeq(adminq->sq_dma_addr, hdev->bar + SPRAID_REG_ASQ);
+ lo_hi_writeq(adminq->cq_dma_addr, hdev->bar + SPRAID_REG_ACQ);
+
+ dev_info(hdev->dev, "[%s] start enable ctrl\n", __func__);
+
+ ret = spraid_enable_ctrl(hdev);
+ if (ret) {
+ ret = -ENODEV;
+ goto free_queue;
+ }
+
+ adminq->cq_vector = 0;
+ spraid_init_queue(adminq, 0);
+ ret = pci_request_irq(hdev->pdev, adminq->cq_vector, spraid_irq, NULL,
+ adminq, "spraid%d_q%d",
+ hdev->instance, adminq->qid);
+
+ if (ret) {
+ adminq->cq_vector = -1;
+ hdev->online_queues--;
+ goto free_queue;
+ }
+
+ dev_info(hdev->dev, "[%s] success, queuecount:[%d], onlinequeue:[%d]\n",
+ __func__, hdev->queue_count, hdev->online_queues);
+
+ return 0;
+
+free_queue:
+ spraid_free_queue(adminq);
+ return ret;
+}
+
+static u32 spraid_bar_size(struct spraid_dev *hdev, u32 nr_ioqs)
+{
+ return (SPRAID_REG_DBS + ((nr_ioqs + 1) * 8 * hdev->db_stride));
+}
+
+static int spraid_alloc_admin_cmds(struct spraid_dev *hdev)
+{
+ int i;
+
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+ spin_lock_init(&hdev->adm_cmd_lock);
+
+ hdev->adm_cmds = kcalloc_node(SPRAID_AQ_BLK_MQ_DEPTH,
+ sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
+
+ if (!hdev->adm_cmds) {
+ dev_err(hdev->dev, "Alloc admin cmds failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < SPRAID_AQ_BLK_MQ_DEPTH; i++) {
+ hdev->adm_cmds[i].qid = 0;
+ hdev->adm_cmds[i].cid = i;
+ list_add_tail(&(hdev->adm_cmds[i].list), &hdev->adm_cmd_list);
+ }
+
+ dev_info(hdev->dev, "Alloc admin cmds success, num[%d]\n",
+ SPRAID_AQ_BLK_MQ_DEPTH);
+
+ return 0;
+}
+
+static void spraid_free_admin_cmds(struct spraid_dev *hdev)
+{
+ kfree(hdev->adm_cmds);
+ hdev->adm_cmds = NULL;
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+}
+
+static struct spraid_cmd *spraid_get_cmd(struct spraid_dev *hdev,
+ enum spraid_cmd_type type)
+{
+ struct spraid_cmd *cmd = NULL;
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
+
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
+
+ spin_lock_irqsave(slock, flags);
+ if (list_empty(head)) {
+ spin_unlock_irqrestore(slock, flags);
+ dev_err(hdev->dev, "err, cmd[%d] list empty\n", type);
+ return NULL;
+ }
+ cmd = list_entry(head->next, struct spraid_cmd, list);
+ list_del_init(&cmd->list);
+ spin_unlock_irqrestore(slock, flags);
+
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IN_FLIGHT);
+
+ return cmd;
+}
+
+static void spraid_put_cmd(struct spraid_dev *hdev, struct spraid_cmd *cmd,
+ enum spraid_cmd_type type)
+{
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
+
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
+
+ spin_lock_irqsave(slock, flags);
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IDLE);
+ list_add_tail(&cmd->list, head);
+ spin_unlock_irqrestore(slock, flags);
+}
+
+
+static int spraid_submit_admin_sync_cmd(struct spraid_dev *hdev,
+ struct spraid_admin_command *cmd,
+ u32 *result0, u32 *result1, u32 timeout)
+{
+ struct spraid_cmd *adm_cmd = spraid_get_cmd(hdev, SPRAID_CMD_ADM);
+
+ if (!adm_cmd) {
+ dev_err(hdev->dev, "err, get admin cmd failed\n");
+ return -EFAULT;
+ }
+
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
+
+ init_completion(&adm_cmd->cmd_done);
+
+ cmd->common.command_id = adm_cmd->cid;
+ spraid_submit_cmd(&hdev->queues[0], cmd);
+
+ if (!wait_for_completion_timeout(&adm_cmd->cmd_done, timeout)) {
+		dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout, "
+			"opcode[0x%x] subopcode[0x%x]\n",
+ __func__, adm_cmd->cid, adm_cmd->qid,
+ cmd->usr_cmd.opcode, cmd->usr_cmd.info_0.subopcode);
+ WRITE_ONCE(adm_cmd->state, SPRAID_CMD_TIMEOUT);
+ spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+ return -EINVAL;
+ }
+
+ if (result0)
+ *result0 = adm_cmd->result0;
+ if (result1)
+ *result1 = adm_cmd->result1;
+
+ spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+
+ return adm_cmd->status;
+}
+
+static int spraid_create_cq(struct spraid_dev *hdev, u16 qid,
+ struct spraid_queue *spraidq, u16 cq_vector)
+{
+ struct spraid_admin_command admin_cmd;
+ int flags = SPRAID_QUEUE_PHYS_CONTIG | SPRAID_CQ_IRQ_ENABLED;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.create_cq.opcode = SPRAID_ADMIN_CREATE_CQ;
+ admin_cmd.create_cq.prp1 = cpu_to_le64(spraidq->cq_dma_addr);
+ admin_cmd.create_cq.cqid = cpu_to_le16(qid);
+ admin_cmd.create_cq.qsize = cpu_to_le16(spraidq->q_depth - 1);
+ admin_cmd.create_cq.cq_flags = cpu_to_le16(flags);
+ admin_cmd.create_cq.irq_vector = cpu_to_le16(cq_vector);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+static int spraid_create_sq(struct spraid_dev *hdev, u16 qid,
+ struct spraid_queue *spraidq)
+{
+ struct spraid_admin_command admin_cmd;
+ int flags = SPRAID_QUEUE_PHYS_CONTIG;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.create_sq.opcode = SPRAID_ADMIN_CREATE_SQ;
+ admin_cmd.create_sq.prp1 = cpu_to_le64(spraidq->sq_dma_addr);
+ admin_cmd.create_sq.sqid = cpu_to_le16(qid);
+ admin_cmd.create_sq.qsize = cpu_to_le16(spraidq->q_depth - 1);
+ admin_cmd.create_sq.sq_flags = cpu_to_le16(flags);
+ admin_cmd.create_sq.cqid = cpu_to_le16(qid);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+static void spraid_free_queue(struct spraid_queue *spraidq)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ hdev->queue_count--;
+ dma_free_coherent(hdev->dev, CQ_SIZE(spraidq->q_depth),
+ (void *)spraidq->cqes, spraidq->cq_dma_addr);
+ dma_free_coherent(hdev->dev, SQ_SIZE(spraidq->qid, spraidq->q_depth),
+ spraidq->sq_cmds, spraidq->sq_dma_addr);
+ dma_free_coherent(hdev->dev, SENSE_SIZE(spraidq->q_depth),
+ spraidq->sense, spraidq->sense_dma_addr);
+}
+
+static void spraid_free_admin_queue(struct spraid_dev *hdev)
+{
+ spraid_free_queue(&hdev->queues[0]);
+}
+
+static void spraid_free_io_queues(struct spraid_dev *hdev)
+{
+ int i;
+
+ for (i = hdev->queue_count - 1; i >= 1; i--)
+ spraid_free_queue(&hdev->queues[i]);
+}
+
+static int spraid_delete_queue(struct spraid_dev *hdev, u8 op, u16 id)
+{
+ struct spraid_admin_command admin_cmd;
+ int ret;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.delete_queue.opcode = op;
+ admin_cmd.delete_queue.qid = cpu_to_le16(id);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+
+ if (ret)
+ dev_err(hdev->dev, "Delete %s:[%d] failed\n",
+ (op == SPRAID_ADMIN_DELETE_CQ) ? "cq" : "sq", id);
+
+ return ret;
+}
+
+static int spraid_delete_cq(struct spraid_dev *hdev, u16 cqid)
+{
+ return spraid_delete_queue(hdev, SPRAID_ADMIN_DELETE_CQ, cqid);
+}
+
+static int spraid_delete_sq(struct spraid_dev *hdev, u16 sqid)
+{
+ return spraid_delete_queue(hdev, SPRAID_ADMIN_DELETE_SQ, sqid);
+}
+
+static int spraid_create_queue(struct spraid_queue *spraidq, u16 qid)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+ u16 cq_vector;
+ int ret;
+
+ cq_vector = (hdev->num_vecs == 1) ? 0 : qid;
+ ret = spraid_create_cq(hdev, qid, spraidq, cq_vector);
+ if (ret)
+ return ret;
+
+ ret = spraid_create_sq(hdev, qid, spraidq);
+ if (ret)
+ goto delete_cq;
+
+ spraid_init_queue(spraidq, qid);
+ spraidq->cq_vector = cq_vector;
+
+ ret = pci_request_irq(hdev->pdev, cq_vector, spraid_irq, NULL,
+ spraidq, "spraid%d_q%d", hdev->instance, qid);
+
+ if (ret) {
+ dev_err(hdev->dev, "Request queue[%d] irq failed\n", qid);
+ goto delete_sq;
+ }
+
+ return 0;
+
+delete_sq:
+ spraidq->cq_vector = -1;
+ hdev->online_queues--;
+ spraid_delete_sq(hdev, qid);
+delete_cq:
+ spraid_delete_cq(hdev, qid);
+
+ return ret;
+}
+
+static int spraid_create_io_queues(struct spraid_dev *hdev)
+{
+ u32 i, max;
+ int ret = 0;
+
+ max = min(hdev->max_qid, hdev->queue_count - 1);
+ for (i = hdev->online_queues; i <= max; i++) {
+ ret = spraid_create_queue(&hdev->queues[i], i);
+ if (ret) {
+ dev_err(hdev->dev, "Create queue[%d] failed\n", i);
+ break;
+ }
+ }
+
+ dev_info(hdev->dev, "[%s] queue_count[%d], online_queue[%d]",
+ __func__, hdev->queue_count, hdev->online_queues);
+
+ return ret >= 0 ? 0 : ret;
+}
+
+static int spraid_set_features(struct spraid_dev *hdev, u32 fid,
+ u32 dword11, void *buffer,
+ size_t buflen, u32 *result)
+{
+ struct spraid_admin_command admin_cmd;
+ int ret;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+
+ if (buffer && buflen) {
+ data_ptr = dma_alloc_coherent(hdev->dev, buflen,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memcpy(data_ptr, buffer, buflen);
+ }
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.features.opcode = SPRAID_ADMIN_SET_FEATURES;
+ admin_cmd.features.fid = cpu_to_le32(fid);
+ admin_cmd.features.dword11 = cpu_to_le32(dword11);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, result, NULL, 0);
+
+ if (data_ptr)
+ dma_free_coherent(hdev->dev, buflen, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_configure_timestamp(struct spraid_dev *hdev)
+{
+ __le64 ts;
+ int ret;
+
+ ts = cpu_to_le64(ktime_to_ms(ktime_get_real()));
+ ret = spraid_set_features(hdev, SPRAID_FEAT_TIMESTAMP,
+ 0, &ts, sizeof(ts), NULL);
+
+ if (ret)
+ dev_err(hdev->dev, "set timestamp failed: %d\n", ret);
+ return ret;
+}
+
+static int spraid_set_queue_cnt(struct spraid_dev *hdev, u32 *cnt)
+{
+ u32 q_cnt = (*cnt - 1) | ((*cnt - 1) << 16);
+ u32 nr_ioqs, result;
+ int status;
+
+ status = spraid_set_features(hdev, SPRAID_FEAT_NUM_QUEUES,
+ q_cnt, NULL, 0, &result);
+ if (status) {
+ dev_err(hdev->dev, "Set queue count failed, status: %d\n",
+ status);
+ return -EIO;
+ }
+
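+	/* The low and high 16-bit halves of result hold two zero-based
+	 * granted queue counts; use the smaller of the two.
+	 */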
+ nr_ioqs = min(result & 0xffff, result >> 16) + 1;
+ *cnt = min(*cnt, nr_ioqs);
+ if (*cnt == 0) {
+ dev_err(hdev->dev, "Illegal queue count: zero\n");
+ return -EIO;
+ }
+ return 0;
+}
+
+static int spraid_setup_io_queues(struct spraid_dev *hdev)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ struct pci_dev *pdev = hdev->pdev;
+ u32 nr_ioqs = num_online_cpus();
+ u32 i, size;
+ int ret;
+
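+	/* Reserve vector 0 for the admin queue when spreading IRQ affinity */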
+ struct irq_affinity affd = {
+ .pre_vectors = 1
+ };
+
+ ret = spraid_set_queue_cnt(hdev, &nr_ioqs);
+ if (ret < 0)
+ return ret;
+
+ size = spraid_bar_size(hdev, nr_ioqs);
+ ret = spraid_remap_bar(hdev, size);
+ if (ret)
+ return -ENOMEM;
+
+ adminq->q_db = hdev->dbs;
+
+ pci_free_irq(pdev, 0, adminq);
+ pci_free_irq_vectors(pdev);
+
+ ret = pci_alloc_irq_vectors_affinity(pdev, 1, (nr_ioqs + 1),
+ PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
+ if (ret <= 0)
+ return -EIO;
+
+ hdev->num_vecs = ret;
+
+ hdev->max_qid = max(ret - 1, 1);
+
+ ret = pci_request_irq(pdev, adminq->cq_vector, spraid_irq, NULL,
+ adminq, "spraid%d_q%d",
+ hdev->instance, adminq->qid);
+ if (ret) {
+ dev_err(hdev->dev, "Request admin irq failed\n");
+ adminq->cq_vector = -1;
+ return ret;
+ }
+
+ for (i = hdev->queue_count; i <= hdev->max_qid; i++) {
+ ret = spraid_alloc_queue(hdev, i, hdev->ioq_depth);
+ if (ret)
+ break;
+ }
+	dev_info(hdev->dev, "[%s] max_qid: %d, queue_count: %d, "
+		 "online_queue: %d, ioq_depth: %d\n",
+ __func__, hdev->max_qid, hdev->queue_count,
+ hdev->online_queues, hdev->ioq_depth);
+
+ return spraid_create_io_queues(hdev);
+}
+
+static void spraid_delete_io_queues(struct spraid_dev *hdev)
+{
+ u16 queues = hdev->online_queues - 1;
+ u8 opcode = SPRAID_ADMIN_DELETE_SQ;
+ u16 i, pass;
+
+ if (!pci_device_is_present(hdev->pdev)) {
+ dev_err(hdev->dev,
+ "pci_device is not present, skip disable io queues\n");
+ return;
+ }
+
+ if (hdev->online_queues < 2) {
+		dev_err(hdev->dev, "[%s] err, io queues have been deleted\n",
+ __func__);
+ return;
+ }
+
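+	/*
+	 * Two passes: delete every SQ first, then every CQ, since a CQ
+	 * must outlive the SQ that posts completions to it.
+	 */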
+ for (pass = 0; pass < 2; pass++) {
+ for (i = queues; i > 0; i--)
+ if (spraid_delete_queue(hdev, opcode, i))
+ break;
+
+ opcode = SPRAID_ADMIN_DELETE_CQ;
+ }
+}
+
+static void spraid_remove_io_queues(struct spraid_dev *hdev)
+{
+ spraid_delete_io_queues(hdev);
+ spraid_free_io_queues(hdev);
+}
+
+static void spraid_pci_disable(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ u32 i;
+
+ for (i = 0; i < hdev->online_queues; i++)
+ pci_free_irq(pdev, hdev->queues[i].cq_vector, &hdev->queues[i]);
+ pci_free_irq_vectors(pdev);
+ if (pci_is_enabled(pdev)) {
+ pci_disable_pcie_error_reporting(pdev);
+ pci_disable_device(pdev);
+ }
+ hdev->online_queues = 0;
+}
+
+static void spraid_disable_admin_queue(struct spraid_dev *hdev, bool shutdown)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u16 start, end;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ if (shutdown)
+ spraid_shutdown_ctrl(hdev);
+ else
+ spraid_disable_ctrl(hdev);
+ }
+
+ if (hdev->queue_count == 0) {
+		dev_err(hdev->dev, "[%s] err, admin queue has been deleted\n",
+ __func__);
+ return;
+ }
+
+ spin_lock_irq(&adminq->cq_lock);
+ spraid_process_cq(adminq, &start, &end, -1);
+ spin_unlock_irq(&adminq->cq_lock);
+
+ spraid_complete_cqes(adminq, start, end);
+ spraid_free_admin_queue(hdev);
+}
+
+static int spraid_create_dma_pools(struct spraid_dev *hdev)
+{
+ int i;
+ char poolname[20] = { 0 };
+
+ hdev->prp_page_pool = dma_pool_create("prp list page", hdev->dev,
+ PAGE_SIZE, PAGE_SIZE, 0);
+
+ if (!hdev->prp_page_pool) {
+ dev_err(hdev->dev, "create prp_page_pool failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < small_pool_num; i++) {
+ sprintf(poolname, "prp_list_256_%d", i);
+ hdev->prp_small_pool[i] =
+ dma_pool_create(poolname, hdev->dev, SMALL_POOL_SIZE,
+ SMALL_POOL_SIZE, 0);
+
+ if (!hdev->prp_small_pool[i]) {
+ dev_err(hdev->dev, "create prp_small_pool %d failed\n",
+ i);
+ goto destroy_prp_small_pool;
+ }
+ }
+
+ return 0;
+
+destroy_prp_small_pool:
+ while (i > 0)
+ dma_pool_destroy(hdev->prp_small_pool[--i]);
+ dma_pool_destroy(hdev->prp_page_pool);
+
+ return -ENOMEM;
+}
+
+static void spraid_destroy_dma_pools(struct spraid_dev *hdev)
+{
+ int i;
+
+ for (i = 0; i < small_pool_num; i++)
+ dma_pool_destroy(hdev->prp_small_pool[i]);
+ dma_pool_destroy(hdev->prp_page_pool);
+}
+
+static int spraid_get_dev_list(struct spraid_dev *hdev,
+ struct spraid_dev_info *devices)
+{
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ struct spraid_admin_command admin_cmd;
+ struct spraid_dev_list *list_buf;
+ dma_addr_t data_dma = 0;
+ u32 i, idx, hdid, ndev;
+ int ret = 0;
+
+ list_buf = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!list_buf)
+ return -ENOMEM;
+
+ for (idx = 0; idx < nd;) {
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
+ admin_cmd.get_info.type = SPRAID_GET_INFO_DEV_LIST;
+ admin_cmd.get_info.cdw11 = cpu_to_le32(idx);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd,
+ NULL, NULL, 0);
+
+ if (ret) {
+			dev_err(hdev->dev, "Get device list failed, nd: %u, "
+				"idx: %u, ret: %d\n",
+ nd, idx, ret);
+ goto out;
+ }
+ ndev = le32_to_cpu(list_buf->dev_num);
+
+ dev_info(hdev->dev, "ndev numbers: %u\n", ndev);
+
+ for (i = 0; i < ndev; i++) {
+ hdid = le32_to_cpu(list_buf->devices[i].hdid);
+			dev_info(hdev->dev, "list_buf->devices[%d], hdid: %u, "
+				 "target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ i, hdid,
+ le16_to_cpu(list_buf->devices[i].target),
+ list_buf->devices[i].channel,
+ list_buf->devices[i].lun,
+ list_buf->devices[i].attr);
+ if (hdid > nd || hdid == 0) {
+ dev_err(hdev->dev, "err, hdid[%d] invalid\n",
+ hdid);
+ continue;
+ }
+ memcpy(&devices[hdid - 1], &list_buf->devices[i],
+ sizeof(struct spraid_dev_info));
+ }
+ idx += ndev;
+
+ if (idx < MAX_DEV_ENTRY_PER_PAGE_4K)
+ break;
+ }
+
+out:
+ dma_free_coherent(hdev->dev, PAGE_SIZE, list_buf, data_dma);
+ return ret;
+}
+
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.common.opcode = SPRAID_ADMIN_ASYNC_EVENT;
+ admin_cmd.common.command_id = cid;
+
+ spraid_submit_cmd(adminq, &admin_cmd);
+ dev_info(hdev->dev, "send aen, cid[%d]\n", cid);
+}
+
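+/*
+ * AEN commands use ids SPRAID_AQ_BLK_MQ_DEPTH .. SPRAID_AQ_BLK_MQ_DEPTH +
+ * aerl - 1, above the admin blk-mq depth, so their completions are routed
+ * to spraid_complete_aen() instead of the normal admin path.
+ */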
+static inline void spraid_send_all_aen(struct spraid_dev *hdev)
+{
+ u16 i;
+
+ for (i = 0; i < hdev->ctrl_info->aerl; i++)
+ spraid_send_aen(hdev, i + SPRAID_AQ_BLK_MQ_DEPTH);
+}
+
+static int spraid_add_device(struct spraid_dev *hdev,
+ struct spraid_dev_info *device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+	dev_info(hdev->dev, "add device, hdid: %u, target: %d, channel: %d, "
+		 "lun: %d, attr[0x%x]\n",
+ le32_to_cpu(device->hdid), le16_to_cpu(device->target),
+ device->channel, device->lun, device->attr);
+
+ sdev = scsi_device_lookup(shost, device->channel,
+ le16_to_cpu(device->target), 0);
+ if (sdev) {
+		dev_warn(hdev->dev, "Device already exists, channel: %d, "
+			 "target_id: %d, lun: %d\n",
+ device->channel, le16_to_cpu(device->target), 0);
+ scsi_device_put(sdev);
+ return -EEXIST;
+ }
+ scsi_add_device(shost, device->channel, le16_to_cpu(device->target), 0);
+ return 0;
+}
+
+static int spraid_rescan_device(struct spraid_dev *hdev,
+ struct spraid_dev_info *device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+	dev_info(hdev->dev, "rescan device, hdid: %u, target: %d, channel: %d, "
+		 "lun: %d, attr[0x%x]\n",
+ le32_to_cpu(device->hdid), le16_to_cpu(device->target),
+ device->channel, device->lun, device->attr);
+
+ sdev = scsi_device_lookup(shost, device->channel,
+ le16_to_cpu(device->target), 0);
+ if (!sdev) {
+		dev_warn(hdev->dev, "device does not exist, skip rescan, "
+			 "channel: %d, target_id: %d, lun: %d\n",
+ device->channel, le16_to_cpu(device->target), 0);
+ return -ENODEV;
+ }
+
+ scsi_rescan_device(&sdev->sdev_gendev);
+ scsi_device_put(sdev);
+ return 0;
+}
+
+static int spraid_remove_device(struct spraid_dev *hdev,
+ struct spraid_dev_info *org_device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+	dev_info(hdev->dev, "remove device, hdid: %u, target: %d, channel: %d, "
+		 "lun: %d, attr[0x%x]\n",
+ le32_to_cpu(org_device->hdid), le16_to_cpu(org_device->target),
+ org_device->channel, org_device->lun, org_device->attr);
+
+ sdev = scsi_device_lookup(shost, org_device->channel,
+ le16_to_cpu(org_device->target), 0);
+ if (!sdev) {
+		dev_warn(hdev->dev, "device does not exist, skip remove, "
+			 "channel: %d, target_id: %d, lun: %d\n",
+ org_device->channel,
+ le16_to_cpu(org_device->target), 0);
+ return -ENODEV;
+ }
+
+ scsi_remove_device(sdev);
+ scsi_device_put(sdev);
+ return 0;
+}
+
+static int spraid_dev_list_init(struct spraid_dev *hdev)
+{
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ int i, ret;
+
+ hdev->devices = kzalloc_node(nd * sizeof(struct spraid_dev_info),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->devices)
+ return -ENOMEM;
+
+ ret = spraid_get_dev_list(hdev, hdev->devices);
+ if (ret) {
+ dev_err(hdev->dev,
+			"Ignore failure of getting device list within initialization\n");
+ return 0;
+ }
+
+ for (i = 0; i < nd; i++) {
+ if (SPRAID_DEV_INFO_FLAG_VALID(hdev->devices[i].flag) &&
+ SPRAID_DEV_INFO_ATTR_BOOT(hdev->devices[i].attr)) {
+ spraid_add_device(hdev, &hdev->devices[i]);
+ break;
+ }
+ }
+ return 0;
+}
+
+static int luntarget_cmp_func(const void *l, const void *r)
+{
+ const struct spraid_dev_info *ln = l;
+ const struct spraid_dev_info *rn = r;
+
+ if (ln->channel == rn->channel)
+ return le16_to_cpu(ln->target) - le16_to_cpu(rn->target);
+
+ return ln->channel - rn->channel;
+}
+
+static void spraid_scan_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, scan_work);
+ struct spraid_dev_info *devices, *org_devices;
+ struct spraid_dev_info *sortdevice;
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ u8 flag, org_flag;
+ int i, ret;
+ int count = 0;
+
+ devices = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
+ if (!devices)
+ return;
+
+ sortdevice = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
+ if (!sortdevice)
+ goto free_list;
+
+ ret = spraid_get_dev_list(hdev, devices);
+ if (ret)
+ goto free_all;
+ org_devices = hdev->devices;
+ for (i = 0; i < nd; i++) {
+ org_flag = org_devices[i].flag;
+ flag = devices[i].flag;
+
+ dev_log_dbg(hdev->dev, "i: %d, org_flag: 0x%x, flag: 0x%x\n",
+ i, org_flag, flag);
+
+ if (SPRAID_DEV_INFO_FLAG_VALID(flag)) {
+ if (!SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
+ down_write(&hdev->devices_rwsem);
+ memcpy(&org_devices[i], &devices[i],
+ sizeof(struct spraid_dev_info));
+ memcpy(&sortdevice[count++], &devices[i],
+ sizeof(struct spraid_dev_info));
+ up_write(&hdev->devices_rwsem);
+ } else if (SPRAID_DEV_INFO_FLAG_CHANGE(flag)) {
+ spraid_rescan_device(hdev, &devices[i]);
+ }
+ } else {
+ if (SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
+ down_write(&hdev->devices_rwsem);
+ org_devices[i].flag &= 0xfe;
+ up_write(&hdev->devices_rwsem);
+ spraid_remove_device(hdev, &org_devices[i]);
+ }
+ }
+ }
+
+ dev_info(hdev->dev, "scan work add device count = %d\n", count);
+
+ sort(sortdevice, count, sizeof(sortdevice[0]),
+ luntarget_cmp_func, NULL);
+
+ for (i = 0; i < count; i++)
+ spraid_add_device(hdev, &sortdevice[i]);
+
+free_all:
+ kfree(sortdevice);
+free_list:
+ kfree(devices);
+}
+
+static void spraid_timesyn_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, timesyn_work);
+
+ spraid_configure_timestamp(hdev);
+}
+
+static int spraid_init_ctrl_info(struct spraid_dev *hdev);
+static void spraid_fw_act_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, fw_act_work);
+
+ if (spraid_init_ctrl_info(hdev))
+ dev_err(hdev->dev, "get ctrl info failed after fw act\n");
+}
+
+static void spraid_queue_scan(struct spraid_dev *hdev)
+{
+ queue_work(spraid_wq, &hdev->scan_work);
+}
+
+static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result)
+{
+ switch ((result & 0xff00) >> 8) {
+ case SPRAID_AEN_DEV_CHANGED:
+ spraid_queue_scan(hdev);
+ break;
+ case SPRAID_AEN_FW_ACT_START:
+ dev_info(hdev->dev, "fw activation starting\n");
+ break;
+ case SPRAID_AEN_HOST_PROBING:
+ break;
+ default:
+ dev_warn(hdev->dev, "async event result %08x\n", result);
+ }
+}
+
+static void spraid_handle_aen_vs(struct spraid_dev *hdev,
+ u32 result, u32 result1)
+{
+ switch ((result & 0xff00) >> 8) {
+ case SPRAID_AEN_TIMESYN:
+ queue_work(spraid_wq, &hdev->timesyn_work);
+ break;
+ case SPRAID_AEN_FW_ACT_FINISH:
+ dev_info(hdev->dev, "fw activation finish\n");
+ queue_work(spraid_wq, &hdev->fw_act_work);
+ break;
+ case SPRAID_AEN_EVENT_MIN ... SPRAID_AEN_EVENT_MAX:
+		dev_info(hdev->dev, "rcv card event[%d], "
+			 "param1[0x%x] param2[0x%x]\n",
+ (result & 0xff00) >> 8, result, result1);
+ break;
+ default:
+ dev_warn(hdev->dev, "async event result: 0x%x\n", result);
+ }
+}
+
+static int spraid_alloc_resources(struct spraid_dev *hdev)
+{
+ int ret, nqueue;
+
+ ret = ida_alloc(&spraid_instance_ida, GFP_KERNEL);
+ if (ret < 0) {
+ dev_err(hdev->dev, "Get instance id failed\n");
+ return ret;
+ }
+ hdev->instance = ret;
+
+ hdev->ctrl_info = kzalloc_node(sizeof(*hdev->ctrl_info),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->ctrl_info) {
+ ret = -ENOMEM;
+ goto release_instance;
+ }
+
+ ret = spraid_create_dma_pools(hdev);
+ if (ret)
+ goto free_ctrl_info;
+ nqueue = num_possible_cpus() + 1;
+ hdev->queues = kcalloc_node(nqueue, sizeof(struct spraid_queue),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->queues) {
+ ret = -ENOMEM;
+ goto destroy_dma_pools;
+ }
+
+ ret = spraid_alloc_admin_cmds(hdev);
+ if (ret)
+ goto free_queues;
+
+ dev_info(hdev->dev, "[%s] queues num: %d\n", __func__, nqueue);
+
+ return 0;
+
+free_queues:
+ kfree(hdev->queues);
+destroy_dma_pools:
+ spraid_destroy_dma_pools(hdev);
+free_ctrl_info:
+ kfree(hdev->ctrl_info);
+release_instance:
+ ida_free(&spraid_instance_ida, hdev->instance);
+ return ret;
+}
+
+static void spraid_free_resources(struct spraid_dev *hdev)
+{
+ spraid_free_admin_cmds(hdev);
+ kfree(hdev->queues);
+ spraid_destroy_dma_pools(hdev);
+ kfree(hdev->ctrl_info);
+ ida_free(&spraid_instance_ida, hdev->instance);
+}
+
+static void spraid_bsg_unmap_data(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir =
+ rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+
+ if (iod->nsge)
+ dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
+
+ spraid_free_iod_res(hdev, iod);
+}
+
+static int spraid_bsg_map_data(struct spraid_dev *hdev, struct bsg_job *job,
+ struct spraid_admin_command *cmd)
+{
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir =
+ rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ int ret = 0;
+
+ iod->sg = job->request_payload.sg_list;
+ iod->nsge = job->request_payload.sg_cnt;
+ iod->length = job->request_payload.payload_len;
+ iod->use_sgl = false;
+ iod->npages = -1;
+ iod->sg_drv_mgmt = false;
+
+ if (!iod->nsge)
+ goto out;
+
+ ret = dma_map_sg_attrs(hdev->dev, iod->sg, iod->nsge,
+ dma_dir, DMA_ATTR_NO_WARN);
+ if (!ret)
+ goto out;
+
+ ret = spraid_setup_prps(hdev, iod);
+ if (ret)
+ goto unmap;
+
+ cmd->common.dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
+ cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
+
+ return 0;
+
+unmap:
+ dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
+out:
+ return ret;
+}
+
+static int spraid_get_ctrl_info(struct spraid_dev *hdev,
+ struct spraid_ctrl_info *ctrl_info)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
+ admin_cmd.get_info.type = SPRAID_GET_INFO_CTRL;
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(ctrl_info, data_ptr, sizeof(struct spraid_ctrl_info));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_init_ctrl_info(struct spraid_dev *hdev)
+{
+ int ret;
+
+ hdev->ctrl_info->nd = cpu_to_le32(240);
+ hdev->ctrl_info->mdts = 8;
+ hdev->ctrl_info->max_cmds = cpu_to_le16(4096);
+ hdev->ctrl_info->max_num_sge = cpu_to_le16(128);
+ hdev->ctrl_info->max_channel = cpu_to_le16(4);
+ hdev->ctrl_info->max_tgt_id = cpu_to_le32(3239);
+ hdev->ctrl_info->max_lun = cpu_to_le16(2);
+
+ ret = spraid_get_ctrl_info(hdev, hdev->ctrl_info);
+ if (ret)
+ dev_err(hdev->dev, "get controller info failed: %d\n", ret);
+
+ dev_info(hdev->dev, "[%s]nd = %d\n", __func__, hdev->ctrl_info->nd);
+ dev_info(hdev->dev, "[%s]max_cmd = %d\n",
+ __func__, hdev->ctrl_info->max_cmds);
+ dev_info(hdev->dev, "[%s]max_channel = %d\n",
+ __func__, hdev->ctrl_info->max_channel);
+ dev_info(hdev->dev, "[%s]max_tgt_id = %d\n",
+ __func__, hdev->ctrl_info->max_tgt_id);
+ dev_info(hdev->dev, "[%s]max_lun = %d\n",
+ __func__, hdev->ctrl_info->max_lun);
+ dev_info(hdev->dev, "[%s]max_num_sge = %d\n",
+ __func__, hdev->ctrl_info->max_num_sge);
+ dev_info(hdev->dev, "[%s]lun_num_boot = %d\n",
+ __func__, hdev->ctrl_info->lun_num_in_boot);
+ dev_info(hdev->dev, "[%s]mdts = %d\n", __func__, hdev->ctrl_info->mdts);
+ dev_info(hdev->dev, "[%s]acl = %d\n", __func__, hdev->ctrl_info->acl);
+	dev_info(hdev->dev, "[%s]aerl = %d\n", __func__, hdev->ctrl_info->aerl);
+ dev_info(hdev->dev, "[%s]card_type = %d\n",
+ __func__, hdev->ctrl_info->card_type);
+ dev_info(hdev->dev, "[%s]rtd3e = %d\n",
+ __func__, hdev->ctrl_info->rtd3e);
+ dev_info(hdev->dev, "[%s]sn = %s\n", __func__, hdev->ctrl_info->sn);
+ dev_info(hdev->dev, "[%s]fr = %s\n", __func__, hdev->ctrl_info->fr);
+
+ if (!hdev->ctrl_info->aerl)
+ hdev->ctrl_info->aerl = 1;
+ if (hdev->ctrl_info->aerl > SPRAID_NR_AEN_COMMANDS)
+ hdev->ctrl_info->aerl = SPRAID_NR_AEN_COMMANDS;
+
+ return 0;
+}
+
+#define SPRAID_MAX_ADMIN_PAYLOAD_SIZE BIT(16)
+static int spraid_alloc_iod_ext_mem_pool(struct spraid_dev *hdev)
+{
+ u16 max_sge = le16_to_cpu(hdev->ctrl_info->max_num_sge);
+ size_t alloc_size;
+
+ alloc_size = spraid_iod_ext_size(hdev, SPRAID_MAX_ADMIN_PAYLOAD_SIZE,
+ max_sge, true, false);
+ if (alloc_size > PAGE_SIZE)
+		dev_warn(hdev->dev,
+			 "sg allocation larger than one page is unreasonable\n");
+ hdev->iod_mempool = mempool_create_node(1, mempool_kmalloc,
+ mempool_kfree,
+ (void *)alloc_size, GFP_KERNEL,
+ hdev->numa_node);
+ if (!hdev->iod_mempool) {
+ dev_err(hdev->dev, "Create iod extension memory pool failed\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void spraid_free_iod_ext_mem_pool(struct spraid_dev *hdev)
+{
+ mempool_destroy(hdev->iod_mempool);
+}
+
+static int spraid_user_admin_cmd(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct spraid_bsg_request *bsg_req = job->request;
+ struct spraid_passthru_common_cmd *cmd = &(bsg_req->admcmd);
+ struct spraid_admin_command admin_cmd;
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
+ u32 result[2] = {0};
+ int status;
+
+ if (hdev->state >= SPRAID_RESETTING) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not right\n",
+ __func__, hdev->state);
+ return -EBUSY;
+ }
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode);
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.common.opcode = cmd->opcode;
+ admin_cmd.common.flags = cmd->flags;
+ admin_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ admin_cmd.common.cdw2[0] = cpu_to_le32(cmd->cdw2);
+ admin_cmd.common.cdw2[1] = cpu_to_le32(cmd->cdw3);
+ admin_cmd.common.cdw10 = cpu_to_le32(cmd->cdw10);
+ admin_cmd.common.cdw11 = cpu_to_le32(cmd->cdw11);
+ admin_cmd.common.cdw12 = cpu_to_le32(cmd->cdw12);
+ admin_cmd.common.cdw13 = cpu_to_le32(cmd->cdw13);
+ admin_cmd.common.cdw14 = cpu_to_le32(cmd->cdw14);
+ admin_cmd.common.cdw15 = cpu_to_le32(cmd->cdw15);
+
+ status = spraid_bsg_map_data(hdev, job, &admin_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
+ }
+
+ status = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, &result[0],
+ &result[1], timeout);
+ if (status >= 0) {
+ job->reply_len = sizeof(result);
+ memcpy(job->reply, result, sizeof(result));
+ }
+
+	dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x], "
+		 "status[0x%x] result0[0x%x] result1[0x%x]\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode,
+ status, result[0], result[1]);
+
+ spraid_bsg_unmap_data(hdev, job);
+
+ return status;
+}
+
+static int spraid_alloc_ioq_ptcmds(struct spraid_dev *hdev)
+{
+ int i;
+ int ptnum = SPRAID_NR_IOQ_PTCMDS;
+
+ INIT_LIST_HEAD(&hdev->ioq_pt_list);
+ spin_lock_init(&hdev->ioq_pt_lock);
+
+ hdev->ioq_ptcmds = kcalloc_node(ptnum, sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
+
+ if (!hdev->ioq_ptcmds) {
+ dev_err(hdev->dev, "Alloc ioq_ptcmds failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < ptnum; i++) {
+ hdev->ioq_ptcmds[i].qid = i / SPRAID_PTCMDS_PERQ + 1;
+ hdev->ioq_ptcmds[i].cid = i % SPRAID_PTCMDS_PERQ
+ + SPRAID_IO_BLK_MQ_DEPTH;
+ list_add_tail(&(hdev->ioq_ptcmds[i].list), &hdev->ioq_pt_list);
+ }
+
+ dev_info(hdev->dev, "Alloc ioq_ptcmds success, ptnum[%d]\n", ptnum);
+
+ return 0;
+}
+
+static void spraid_free_ioq_ptcmds(struct spraid_dev *hdev)
+{
+ kfree(hdev->ioq_ptcmds);
+ hdev->ioq_ptcmds = NULL;
+
+ INIT_LIST_HEAD(&hdev->ioq_pt_list);
+}
+
+static int spraid_submit_ioq_sync_cmd(struct spraid_dev *hdev,
+ struct spraid_ioq_command *cmd,
+ u32 *result, u32 *reslen, u32 timeout)
+{
+ int ret;
+ dma_addr_t sense_dma;
+ struct spraid_queue *ioq;
+ void *sense_addr = NULL;
+ struct spraid_cmd *pt_cmd = spraid_get_cmd(hdev, SPRAID_CMD_IOPT);
+
+ if (!pt_cmd) {
+ dev_err(hdev->dev, "err, get ioq cmd failed\n");
+ return -EFAULT;
+ }
+
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
+
+ init_completion(&pt_cmd->cmd_done);
+
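+	/* Each cid owns a fixed SCSI_SENSE_BUFFERSIZE slice of the
+	 * queue's sense buffer.
+	 */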
+ ioq = &hdev->queues[pt_cmd->qid];
+ ret = pt_cmd->cid * SCSI_SENSE_BUFFERSIZE;
+ sense_addr = ioq->sense + ret;
+ sense_dma = ioq->sense_dma_addr + ret;
+
+ cmd->common.sense_addr = cpu_to_le64(sense_dma);
+ cmd->common.sense_len = cpu_to_le16(SCSI_SENSE_BUFFERSIZE);
+ cmd->common.command_id = pt_cmd->cid;
+
+ spraid_submit_cmd(ioq, cmd);
+
+ if (!wait_for_completion_timeout(&pt_cmd->cmd_done, timeout)) {
+		dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout, "
+			"opcode[0x%x] subopcode[0x%x]\n",
+ __func__, pt_cmd->cid, pt_cmd->qid, cmd->common.opcode,
+ (le32_to_cpu(cmd->common.cdw3[0]) & 0xffff));
+ WRITE_ONCE(pt_cmd->state, SPRAID_CMD_TIMEOUT);
+ spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
+ return -EINVAL;
+ }
+
+ if (result && reslen) {
+ if ((pt_cmd->status & 0x17f) == 0x101) {
+ memcpy(result, sense_addr, SCSI_SENSE_BUFFERSIZE);
+ *reslen = SCSI_SENSE_BUFFERSIZE;
+ }
+ }
+
+ spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
+
+ return pt_cmd->status;
+}
+
+static int spraid_user_ioq_cmd(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct spraid_bsg_request *bsg_req =
+ (struct spraid_bsg_request *)(job->request);
+ struct spraid_ioq_passthru_cmd *cmd = &(bsg_req->ioqcmd);
+ struct spraid_ioq_command ioq_cmd;
+ int status = 0;
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
+
+ if (cmd->data_len > PAGE_SIZE) {
+		dev_err(hdev->dev, "[%s] data len larger than 4k\n", __func__);
+ return -EFAULT;
+ }
+
+ if (hdev->state != SPRAID_LIVE) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not live\n",
+ __func__, hdev->state);
+ return -EBUSY;
+ }
+
+	dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init, "
+		 "datalen[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode, cmd->data_len);
+
+ memset(&ioq_cmd, 0, sizeof(ioq_cmd));
+ ioq_cmd.common.opcode = cmd->opcode;
+ ioq_cmd.common.flags = cmd->flags;
+ ioq_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ ioq_cmd.common.sense_len = cpu_to_le16(cmd->info_0.res_sense_len);
+ ioq_cmd.common.cdb_len = cmd->info_0.cdb_len;
+ ioq_cmd.common.rsvd2 = cmd->info_0.rsvd0;
+ ioq_cmd.common.cdw3[0] = cpu_to_le32(cmd->cdw3);
+ ioq_cmd.common.cdw3[1] = cpu_to_le32(cmd->cdw4);
+ ioq_cmd.common.cdw3[2] = cpu_to_le32(cmd->cdw5);
+
+ ioq_cmd.common.cdw10[0] = cpu_to_le32(cmd->cdw10);
+ ioq_cmd.common.cdw10[1] = cpu_to_le32(cmd->cdw11);
+ ioq_cmd.common.cdw10[2] = cpu_to_le32(cmd->cdw12);
+ ioq_cmd.common.cdw10[3] = cpu_to_le32(cmd->cdw13);
+ ioq_cmd.common.cdw10[4] = cpu_to_le32(cmd->cdw14);
+ ioq_cmd.common.cdw10[5] = cpu_to_le32(cmd->data_len);
+
+ memcpy(ioq_cmd.common.cdb, &cmd->cdw16, cmd->info_0.cdb_len);
+
+ ioq_cmd.common.cdw26[0] = cpu_to_le32(cmd->cdw26[0]);
+ ioq_cmd.common.cdw26[1] = cpu_to_le32(cmd->cdw26[1]);
+ ioq_cmd.common.cdw26[2] = cpu_to_le32(cmd->cdw26[2]);
+ ioq_cmd.common.cdw26[3] = cpu_to_le32(cmd->cdw26[3]);
+
+ status = spraid_bsg_map_data(hdev, job,
+ (struct spraid_admin_command *)&ioq_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
+ }
+
+ status = spraid_submit_ioq_sync_cmd(hdev, &ioq_cmd, job->reply,
+ &job->reply_len, timeout);
+
+	dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x], status[0x%x], "
+		 "reply_len[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode,
+ status, job->reply_len);
+
+ spraid_bsg_unmap_data(hdev, job);
+
+ return status;
+}
+
+static bool spraid_check_scmd_completed(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_queue *spraidq;
+ u16 hwq, cid;
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ spraidq = &hdev->queues[hwq];
+ if (READ_ONCE(iod->state) == SPRAID_CMD_COMPLETE
+ || spraid_poll_cq(spraidq, cid)) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] has been completed\n",
+ cid, spraidq->qid);
+ return true;
+ }
+ return false;
+}
+
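+/*
+ * eh_timed_out handler: if the command is still in flight past its
+ * deadline, mark it SPRAID_CMD_TIMEOUT and hand it to the error
+ * handler (BLK_EH_DONE); otherwise keep the block-layer timer running.
+ */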
+static enum blk_eh_timer_return spraid_scmd_timeout(struct scsi_cmnd *scmd)
+{
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ unsigned int timeout = scmd->device->request_queue->rq_timeout;
+
+ if (spraid_check_scmd_completed(scmd))
+ goto out;
+
+ if (time_after(jiffies, scmd->jiffies_at_alloc + timeout)) {
+ if (cmpxchg(&iod->state,
+ SPRAID_CMD_IN_FLIGHT,
+ SPRAID_CMD_TIMEOUT) == SPRAID_CMD_IN_FLIGHT) {
+ return BLK_EH_DONE;
+ }
+ }
+out:
+ return BLK_EH_RESET_TIMER;
+}
+
+/* send abort command through the admin queue for now */
+static int spraid_send_abort_cmd(struct spraid_dev *hdev,
+ u32 hdid, u16 qid, u16 cid)
+{
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.abort.opcode = SPRAID_ADMIN_ABORT_CMD;
+ admin_cmd.abort.hdid = cpu_to_le32(hdid);
+ admin_cmd.abort.sqid = cpu_to_le16(qid);
+ admin_cmd.abort.cid = cpu_to_le16(cid);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+/* send reset command through the admin queue for now */
+static int spraid_send_reset_cmd(struct spraid_dev *hdev, int type, u32 hdid)
+{
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.reset.opcode = SPRAID_ADMIN_RESET;
+ admin_cmd.reset.hdid = cpu_to_le32(hdid);
+ admin_cmd.reset.type = type;
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
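+/*
+ * Controller state machine: only the transitions enumerated below
+ * (e.g. NEW/RESETTING -> LIVE, LIVE -> RESETTING) are accepted;
+ * returns true when the state was actually changed.
+ */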
+static bool spraid_change_host_state(struct spraid_dev *hdev,
+ enum spraid_state newstate)
+{
+ unsigned long flags;
+ enum spraid_state oldstate;
+ bool change = false;
+
+ spin_lock_irqsave(&hdev->state_lock, flags);
+
+ oldstate = hdev->state;
+ switch (newstate) {
+ case SPRAID_LIVE:
+ switch (oldstate) {
+ case SPRAID_NEW:
+ case SPRAID_RESETTING:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ case SPRAID_RESETTING:
+ switch (oldstate) {
+ case SPRAID_LIVE:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ case SPRAID_DELETING:
+ if (oldstate != SPRAID_DELETING)
+ change = true;
+ break;
+ case SPRAID_DEAD:
+ switch (oldstate) {
+ case SPRAID_NEW:
+ case SPRAID_LIVE:
+ case SPRAID_RESETTING:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ default:
+ break;
+ }
+ if (change)
+ hdev->state = newstate;
+ spin_unlock_irqrestore(&hdev->state_lock, flags);
+
+ dev_info(hdev->dev, "[%s][%d]->[%d], change[%d]\n",
+ __func__, oldstate, newstate, change);
+
+ return change;
+}
+
+static void spraid_back_fault_cqe(struct spraid_queue *ioq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct blk_mq_tags *tags;
+ struct scsi_cmnd *scmd;
+ struct spraid_iod *iod;
+ struct request *req;
+
+ tags = hdev->shost->tag_set.tags[ioq->qid - 1];
+ req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
+ if (unlikely(!req || !blk_mq_request_started(req)))
+ return;
+
+ scmd = blk_mq_rq_to_pdu(req);
+ iod = scsi_cmd_priv(scmd);
+
+ set_host_byte(scmd, DID_NO_CONNECT);
+ if (iod->nsge)
+ scsi_dma_unmap(scmd);
+ spraid_free_iod_res(hdev, iod);
+ scmd->scsi_done(scmd);
+ dev_warn(hdev->dev, "Back fault CQE, cid[%d] qid[%d]\n",
+ cqe->cmd_id, ioq->qid);
+}
+
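+/*
+ * Fail back all potentially outstanding I/O: fake a completion entry
+ * for every tag on every hw queue so the requests are finished with
+ * DID_NO_CONNECT.
+ */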
+static void spraid_back_all_io(struct spraid_dev *hdev)
+{
+ int i, j;
+ struct spraid_queue *ioq;
+ struct spraid_completion cqe = { 0 };
+
+ scsi_block_requests(hdev->shost);
+
+ for (i = 1; i <= hdev->shost->nr_hw_queues; i++) {
+ ioq = &hdev->queues[i];
+ for (j = 0; j < hdev->shost->can_queue; j++) {
+ cqe.cmd_id = j;
+ spraid_back_fault_cqe(ioq, &cqe);
+ }
+ }
+
+ scsi_unblock_requests(hdev->shost);
+}
+
+static void spraid_dev_disable(struct spraid_dev *hdev, bool shutdown)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u16 start, end;
+ unsigned long timeout = jiffies + 600 * HZ;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ if (shutdown)
+ spraid_shutdown_ctrl(hdev);
+ else
+ spraid_disable_ctrl(hdev);
+ }
+
+ while (!time_after(jiffies, timeout)) {
+ if (!pci_device_is_present(hdev->pdev)) {
+ dev_info(hdev->dev, "[%s] pci_device not present;"
+ " skip wait\n", __func__);
+ break;
+ }
+ if (!spraid_wait_ready(hdev, hdev->cap, false)) {
+ dev_info(hdev->dev,
+ "[%s] wait ready success after reset\n",
+ __func__);
+ break;
+ }
+ dev_info(hdev->dev, "[%s] waiting csts_rdy ready\n", __func__);
+ }
+
+ if (hdev->queue_count == 0) {
+ dev_err(hdev->dev, "[%s] warn, queue has been delete\n",
+ __func__);
+ return;
+ }
+
+ spin_lock_irq(&adminq->cq_lock);
+ spraid_process_cq(adminq, &start, &end, -1);
+ spin_unlock_irq(&adminq->cq_lock);
+ spraid_complete_cqes(adminq, start, end);
+
+ spraid_pci_disable(hdev);
+
+ spraid_back_all_io(hdev);
+}
+
+static void spraid_reset_work(struct work_struct *work)
+{
+ int ret;
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, reset_work);
+
+ if (hdev->state != SPRAID_RESETTING) {
+ dev_err(hdev->dev, "[%s] err, host is not reset state\n",
+ __func__);
+ return;
+ }
+
+ dev_info(hdev->dev, "[%s] enter host reset\n", __func__);
+
+ if (hdev->ctrl_config & SPRAID_CC_ENABLE) {
+ dev_info(hdev->dev, "[%s] start dev_disable\n", __func__);
+ spraid_dev_disable(hdev, false);
+ }
+
+ ret = spraid_pci_enable(hdev);
+ if (ret)
+ goto out;
+
+ ret = spraid_setup_admin_queue(hdev);
+ if (ret)
+ goto pci_disable;
+
+ ret = spraid_setup_io_queues(hdev);
+ if (ret || hdev->online_queues <= hdev->shost->nr_hw_queues)
+ goto pci_disable;
+
+ spraid_change_host_state(hdev, SPRAID_LIVE);
+
+ spraid_send_all_aen(hdev);
+
+ return;
+
+pci_disable:
+ spraid_pci_disable(hdev);
+out:
+ spraid_change_host_state(hdev, SPRAID_DEAD);
+ dev_err(hdev->dev, "[%s] err, host reset failed\n", __func__);
+}
+
+static int spraid_reset_work_sync(struct spraid_dev *hdev)
+{
+ if (!spraid_change_host_state(hdev, SPRAID_RESETTING)) {
+ dev_info(hdev->dev, "[%s] can't change to reset state\n",
+ __func__);
+ return -EBUSY;
+ }
+
+ if (!queue_work(spraid_wq, &hdev->reset_work)) {
+ dev_err(hdev->dev, "[%s] err, host is already in reset state\n",
+ __func__);
+ return -EBUSY;
+ }
+
+ flush_work(&hdev->reset_work);
+ if (hdev->state != SPRAID_LIVE)
+ return -ENODEV;
+
+ return 0;
+}
+
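+/*
+ * Poll (500ms per step, ~3s in total) for a timed-out command to be
+ * completed by the firmware after an abort/reset has been issued.
+ */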
+static int spraid_wait_abnl_cmd_done(struct spraid_iod *iod)
+{
+ u16 times = 0;
+
+ do {
+ if (READ_ONCE(iod->state) == SPRAID_CMD_TMO_COMPLETE)
+ break;
+ msleep(500);
+ times++;
+ } while (times <= SPRAID_WAIT_ABNL_CMD_TIMEOUT);
+
+ /* the command never completed after a successful abort/reset */
+ if (times >= SPRAID_WAIT_ABNL_CMD_TIMEOUT)
+ return -ETIMEDOUT;
+
+ return 0;
+}
+
+static int spraid_abort_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, aborting\n", cid, hwq);
+ ret = spraid_send_abort_cmd(hdev, hostdata->hdid, hwq, cid);
+ if (ret != ADMIN_ERR_TIMEOUT) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed;"
+ " not found\n", cid, hwq);
+ return FAILED;
+ }
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort succ\n", cid, hwq);
+ return SUCCESS;
+ }
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed, timeout\n",
+ cid, hwq);
+ return FAILED;
+}
+
+static int spraid_tgt_reset_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, target reset\n",
+ cid, hwq);
+ ret = spraid_send_reset_cmd(hdev, SPRAID_RESET_TARGET, hostdata->hdid);
+ if (ret == 0) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev,
+ "cid[%d] qid[%d]target reset failed;"
+ " not found\n", cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] target reset success\n",
+ cid, hwq);
+ return SUCCESS;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] ret[%d] target reset failed\n",
+ cid, hwq, ret);
+ return FAILED;
+}
+
+static int spraid_bus_reset_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, bus reset\n", cid, hwq);
+ ret = spraid_send_reset_cmd(hdev, SPRAID_RESET_BUS, hostdata->hdid);
+ if (ret == 0) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev,
+ "cid[%d] qid[%d] bus reset failed;"
+ " not found\n", cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] bus reset succ\n",
+ cid, hwq);
+ return SUCCESS;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] ret[%d] bus reset failed\n",
+ cid, hwq, ret);
+ return FAILED;
+}
+
+static int spraid_shost_reset_handler(struct scsi_cmnd *scmd)
+{
+ u16 hwq, cid;
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+
+ scsi_print_command(scmd);
+ if (hdev->state != SPRAID_LIVE || spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset\n", cid, hwq);
+
+ if (spraid_reset_work_sync(hdev)) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset failed\n",
+ cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset success\n", cid, hwq);
+
+ return SUCCESS;
+}
+
+static pci_ers_result_t spraid_pci_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "enter pci error detect, state:%d\n", state);
+
+ switch (state) {
+ case pci_channel_io_normal:
+ dev_warn(hdev->dev, "channel is normal, do nothing\n");
+
+ return PCI_ERS_RESULT_CAN_RECOVER;
+ case pci_channel_io_frozen:
+ dev_warn(hdev->dev,
+ "channel io frozen, need reset controller\n");
+
+ scsi_block_requests(hdev->shost);
+
+ spraid_change_host_state(hdev, SPRAID_RESETTING);
+
+ return PCI_ERS_RESULT_NEED_RESET;
+ case pci_channel_io_perm_failure:
+ dev_warn(hdev->dev, "channel io failure, request disconnect\n");
+
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+
+ return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t spraid_pci_slot_reset(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "restart after slot reset\n");
+
+ pci_restore_state(pdev);
+
+ if (!queue_work(spraid_wq, &hdev->reset_work)) {
+ dev_err(hdev->dev, "[%s] err, the device is resetting state\n",
+ __func__);
+ return PCI_ERS_RESULT_NONE;
+ }
+
+ flush_work(&hdev->reset_work);
+
+ scsi_unblock_requests(hdev->shost);
+
+ return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void spraid_reset_done(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "enter spraid reset done\n");
+}
+
+static ssize_t csts_pp_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS)
+ & SPRAID_CSTS_PP_MASK);
+ ret >>= SPRAID_CSTS_PP_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_shst_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS)
+ & SPRAID_CSTS_SHST_MASK);
+ ret >>= SPRAID_CSTS_SHST_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_cfs_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS)
+ & SPRAID_CSTS_CFS_MASK);
+ ret >>= SPRAID_CSTS_CFS_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_rdy_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev))
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_RDY);
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t fw_version_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+
+ return snprintf(buf, PAGE_SIZE, "%s\n", hdev->ctrl_info->fr);
+}
+
+static DEVICE_ATTR_RO(csts_pp);
+static DEVICE_ATTR_RO(csts_shst);
+static DEVICE_ATTR_RO(csts_cfs);
+static DEVICE_ATTR_RO(csts_rdy);
+static DEVICE_ATTR_RO(fw_version);
+
+static struct device_attribute *spraid_host_attrs[] = {
+ &dev_attr_csts_pp,
+ &dev_attr_csts_shst,
+ &dev_attr_csts_cfs,
+ &dev_attr_csts_rdy,
+ &dev_attr_fw_version,
+ NULL,
+};
+
+static int spraid_get_vd_info(struct spraid_dev *hdev,
+ struct spraid_vd_info *vd_info, u16 vid)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.usr_cmd.opcode = USR_CMD_READ;
+ admin_cmd.usr_cmd.info_0.subopcode = cpu_to_le16(USR_CMD_VDINFO);
+ admin_cmd.usr_cmd.info_1.data_len = cpu_to_le16(USR_CMD_RDLEN);
+ admin_cmd.usr_cmd.info_1.param_len = cpu_to_le16(VDINFO_PARAM_LEN);
+ admin_cmd.usr_cmd.cdw10 = cpu_to_le32(vid);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(vd_info, data_ptr, sizeof(struct spraid_vd_info));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_get_bgtask(struct spraid_dev *hdev,
+ struct spraid_bgtask *bgtask)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.usr_cmd.opcode = USR_CMD_READ;
+ admin_cmd.usr_cmd.info_0.subopcode = cpu_to_le16(USR_CMD_BGTASK);
+ admin_cmd.usr_cmd.info_1.data_len = cpu_to_le16(USR_CMD_RDLEN);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(bgtask, data_ptr, sizeof(struct spraid_bgtask));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static ssize_t raid_level_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_sdev_hostdata *hostdata;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret)
+ vd_info->rg_level = ARRAY_SIZE(raid_levels) - 1;
+
+ ret = (vd_info->rg_level < ARRAY_SIZE(raid_levels)) ?
+ vd_info->rg_level : (ARRAY_SIZE(raid_levels) - 1);
+
+ kfree(vd_info);
+
+ return snprintf(buf, PAGE_SIZE, "RAID-%s\n", raid_levels[ret]);
+}
+
+static ssize_t raid_state_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_sdev_hostdata *hostdata;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret) {
+ vd_info->vd_status = 0;
+ vd_info->rg_id = 0xff;
+ }
+
+ ret = (vd_info->vd_status < ARRAY_SIZE(raid_states)) ?
+ vd_info->vd_status : 0;
+
+ kfree(vd_info);
+
+ return snprintf(buf, PAGE_SIZE, "%s\n", raid_states[ret]);
+}
+
+static ssize_t raid_resync_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_bgtask *bgtask;
+ struct spraid_sdev_hostdata *hostdata;
+ u8 rg_id, i, progress = 0;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret)
+ goto out;
+
+ rg_id = vd_info->rg_id;
+
+ bgtask = (struct spraid_bgtask *)vd_info;
+ ret = spraid_get_bgtask(hdev, bgtask);
+ if (ret)
+ goto out;
+ for (i = 0; i < bgtask->task_num; i++) {
+ if ((bgtask->bgtask[i].type == BGTASK_TYPE_REBUILD) &&
+ (le16_to_cpu(bgtask->bgtask[i].vd_id) == rg_id))
+ progress = bgtask->bgtask[i].progress;
+ }
+
+out:
+ kfree(vd_info);
+ return snprintf(buf, PAGE_SIZE, "%d\n", progress);
+}
+
+static DEVICE_ATTR_RO(raid_level);
+static DEVICE_ATTR_RO(raid_state);
+static DEVICE_ATTR_RO(raid_resync);
+
+static struct device_attribute *spraid_dev_attrs[] = {
+ &dev_attr_raid_level,
+ &dev_attr_raid_state,
+ &dev_attr_raid_resync,
+ NULL,
+};
+
+static struct pci_error_handlers spraid_err_handler = {
+ .error_detected = spraid_pci_error_detected,
+ .slot_reset = spraid_pci_slot_reset,
+ .reset_done = spraid_reset_done,
+};
+
+static int spraid_sysfs_host_reset(struct Scsi_Host *shost, int reset_type)
+{
+ int ret;
+ struct spraid_dev *hdev = shost_priv(shost);
+
+ dev_info(hdev->dev, "[%s] start sysfs host reset cmd\n", __func__);
+ ret = spraid_reset_work_sync(hdev);
+ dev_info(hdev->dev, "[%s] stop sysfs host reset cmd[%d]\n",
+ __func__, ret);
+
+ return ret;
+}
+
+static struct scsi_host_template spraid_driver_template = {
+ .module = THIS_MODULE,
+ .name = "Ramaxel Logic spraid driver",
+ .proc_name = "spraid",
+ .queuecommand = spraid_queue_command,
+ .slave_alloc = spraid_slave_alloc,
+ .slave_destroy = spraid_slave_destroy,
+ .slave_configure = spraid_slave_configure,
+ .eh_timed_out = spraid_scmd_timeout,
+ .eh_abort_handler = spraid_abort_handler,
+ .eh_target_reset_handler = spraid_tgt_reset_handler,
+ .eh_bus_reset_handler = spraid_bus_reset_handler,
+ .eh_host_reset_handler = spraid_shost_reset_handler,
+ .change_queue_depth = scsi_change_queue_depth,
+ .this_id = -1,
+ .shost_attrs = spraid_host_attrs,
+ .sdev_attrs = spraid_dev_attrs,
+ .host_reset = spraid_sysfs_host_reset,
+};
+
+static void spraid_shutdown(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ spraid_remove_io_queues(hdev);
+ spraid_disable_admin_queue(hdev, true);
+}
+
+/* bsg dispatch user command */
+static int spraid_bsg_host_dispatch(struct bsg_job *job)
+{
+ struct Scsi_Host *shost = dev_to_shost(job->dev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_bsg_request *bsg_req = job->request;
+ int ret = 0;
+
+ dev_info(hdev->dev, "[%s] msgcode[%d], msglen[%d], timeout[%d];"
+ " req_nsge[%d], req_len[%d]\n",
+ __func__, bsg_req->msgcode, job->request_len,
+ rq->timeout, job->request_payload.sg_cnt,
+ job->request_payload.payload_len);
+
+ job->reply_len = 0;
+
+ switch (bsg_req->msgcode) {
+ case SPRAID_BSG_ADM:
+ ret = spraid_user_admin_cmd(hdev, job);
+ break;
+ case SPRAID_BSG_IOQ:
+ ret = spraid_user_ioq_cmd(hdev, job);
+ break;
+ default:
+ dev_info(hdev->dev, "[%s] unsupport msgcode[%d]\n",
+ __func__, bsg_req->msgcode);
+ break;
+ }
+
+ if (ret > 0)
+ ret = ret | (ret << 8);
+
+ bsg_job_done(job, ret, 0);
+ return 0;
+}
+
+static inline void spraid_remove_bsg(struct spraid_dev *hdev)
+{
+ if (hdev->bsg_queue) {
+ bsg_unregister_queue(hdev->bsg_queue);
+ blk_cleanup_queue(hdev->bsg_queue);
+ }
+}
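+
+/*
+ * PCI probe: map the BAR, bring up the admin queue, query controller
+ * info, create the I/O queues, then register the Scsi_Host and its
+ * bsg node and scan for devices.
+ */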
+static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct spraid_dev *hdev;
+ struct Scsi_Host *shost;
+ int node, ret;
+ char bsg_name[15];
+
+ shost = scsi_host_alloc(&spraid_driver_template, sizeof(*hdev));
+ if (!shost) {
+ dev_err(&pdev->dev, "Failed to allocate scsi host\n");
+ return -ENOMEM;
+ }
+ hdev = shost_priv(shost);
+ hdev->pdev = pdev;
+ hdev->dev = get_device(&pdev->dev);
+
+ node = dev_to_node(hdev->dev);
+ if (node == NUMA_NO_NODE) {
+ node = first_memory_node;
+ set_dev_node(hdev->dev, node);
+ }
+ hdev->numa_node = node;
+ hdev->shost = shost;
+ pci_set_drvdata(pdev, hdev);
+
+ ret = spraid_dev_map(hdev);
+ if (ret)
+ goto put_dev;
+
+ init_rwsem(&hdev->devices_rwsem);
+ INIT_WORK(&hdev->scan_work, spraid_scan_work);
+ INIT_WORK(&hdev->timesyn_work, spraid_timesyn_work);
+ INIT_WORK(&hdev->reset_work, spraid_reset_work);
+ INIT_WORK(&hdev->fw_act_work, spraid_fw_act_work);
+ spin_lock_init(&hdev->state_lock);
+
+ ret = spraid_alloc_resources(hdev);
+ if (ret)
+ goto dev_unmap;
+
+ ret = spraid_pci_enable(hdev);
+ if (ret)
+ goto resources_free;
+
+ ret = spraid_setup_admin_queue(hdev);
+ if (ret)
+ goto pci_disable;
+
+ ret = spraid_init_ctrl_info(hdev);
+ if (ret)
+ goto disable_admin_q;
+
+ ret = spraid_alloc_iod_ext_mem_pool(hdev);
+ if (ret)
+ goto disable_admin_q;
+
+ ret = spraid_setup_io_queues(hdev);
+ if (ret)
+ goto free_iod_mempool;
+
+ spraid_shost_init(hdev);
+
+ ret = scsi_add_host(hdev->shost, hdev->dev);
+ if (ret) {
+ dev_err(hdev->dev, "Add shost to system failed, ret: %d\n",
+ ret);
+ goto remove_io_queues;
+ }
+
+ snprintf(bsg_name, sizeof(bsg_name), "spraid%d", shost->host_no);
+ hdev->bsg_queue = bsg_setup_queue(&shost->shost_gendev, bsg_name,
+ spraid_bsg_host_dispatch,
+ spraid_cmd_size(hdev, true, false));
+ if (IS_ERR(hdev->bsg_queue)) {
+ dev_err(hdev->dev, "err, setup bsg failed\n");
+ hdev->bsg_queue = NULL;
+ goto remove_io_queues;
+ }
+
+ if (hdev->online_queues == SPRAID_ADMIN_QUEUE_NUM) {
+ dev_warn(hdev->dev, "warn only admin queue can be used\n");
+ return 0;
+ }
+
+ hdev->state = SPRAID_LIVE;
+
+ spraid_send_all_aen(hdev);
+
+ ret = spraid_dev_list_init(hdev);
+ if (ret)
+ goto remove_bsg;
+
+ ret = spraid_configure_timestamp(hdev);
+ if (ret)
+ dev_warn(hdev->dev, "init set timestamp failed\n");
+
+ ret = spraid_alloc_ioq_ptcmds(hdev);
+ if (ret)
+ goto remove_bsg;
+
+ scsi_scan_host(hdev->shost);
+
+ return 0;
+
+remove_bsg:
+ spraid_remove_bsg(hdev);
+remove_io_queues:
+ spraid_remove_io_queues(hdev);
+free_iod_mempool:
+ spraid_free_iod_ext_mem_pool(hdev);
+disable_admin_q:
+ spraid_disable_admin_queue(hdev, false);
+pci_disable:
+ spraid_pci_disable(hdev);
+resources_free:
+ spraid_free_resources(hdev);
+dev_unmap:
+ spraid_dev_unmap(hdev);
+put_dev:
+ put_device(hdev->dev);
+ scsi_host_put(shost);
+
+ return -ENODEV;
+}
+
+static void spraid_remove(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+ struct Scsi_Host *shost = hdev->shost;
+
+ dev_info(hdev->dev, "enter spraid remove\n");
+
+ spraid_change_host_state(hdev, SPRAID_DELETING);
+ flush_work(&hdev->reset_work);
+
+ if (!pci_device_is_present(pdev))
+ spraid_back_all_io(hdev);
+
+ spraid_remove_bsg(hdev);
+ scsi_remove_host(shost);
+ spraid_free_ioq_ptcmds(hdev);
+ kfree(hdev->devices);
+ spraid_remove_io_queues(hdev);
+ spraid_free_iod_ext_mem_pool(hdev);
+ spraid_disable_admin_queue(hdev, false);
+ spraid_pci_disable(hdev);
+ spraid_free_resources(hdev);
+ spraid_dev_unmap(hdev);
+ put_device(hdev->dev);
+ scsi_host_put(shost);
+
+ dev_info(hdev->dev, "exit spraid remove\n");
+}
+
+static const struct pci_device_id spraid_id_table[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC,
+ SPRAID_SERVER_DEVICE_HBA_DID) },
+ { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC,
+ SPRAID_SERVER_DEVICE_RAID_DID) },
+ { 0, }
+};
+MODULE_DEVICE_TABLE(pci, spraid_id_table);
+
+static struct pci_driver spraid_driver = {
+ .name = "spraid",
+ .id_table = spraid_id_table,
+ .probe = spraid_probe,
+ .remove = spraid_remove,
+ .shutdown = spraid_shutdown,
+ .err_handler = &spraid_err_handler,
+};
+
+static int __init spraid_init(void)
+{
+ int ret;
+
+ spraid_wq = alloc_workqueue("spraid-wq", WQ_UNBOUND | WQ_MEM_RECLAIM |
+ WQ_SYSFS, 0);
+ if (!spraid_wq)
+ return -ENOMEM;
+
+ spraid_class = class_create(THIS_MODULE, "spraid");
+ if (IS_ERR(spraid_class)) {
+ ret = PTR_ERR(spraid_class);
+ goto destroy_wq;
+ }
+
+ ret = pci_register_driver(&spraid_driver);
+ if (ret < 0)
+ goto destroy_class;
+
+ return 0;
+
+destroy_class:
+ class_destroy(spraid_class);
+destroy_wq:
+ destroy_workqueue(spraid_wq);
+
+ return ret;
+}
+
+static void __exit spraid_exit(void)
+{
+ pci_unregister_driver(&spraid_driver);
+ class_destroy(spraid_class);
+ destroy_workqueue(spraid_wq);
+ ida_destroy(&spraid_instance_ida);
+}
+
+MODULE_AUTHOR("songyl(a)ramaxel.com");
+MODULE_DESCRIPTION("Ramaxel Memory Technology SPraid Driver");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(SPRAID_DRV_VERSION);
+module_init(spraid_init);
+module_exit(spraid_exit);
--
2.27.0
[PATCH openEuler-1.0-LTS] USB: gadget: bRequestType is a bitfield, not a enum
by Yang Yingliang 22 Dec '21
From: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
mainline inclusion
from mainline-v5.16-rc6
commit f08adf5add9a071160c68bb2a61d697f39ab0758
category: bugfix
bugzilla: NA
CVE: CVE-2021-39685
--------------------------------
Szymon rightly pointed out that the previous check for the endpoint
direction in bRequestType was not looking at only the bit involved, but
rather the whole value. Normally this is ok, but for some request
types, bits other than bit 8 could be set and the check for the endpoint
length could not stall correctly.
Fix that up by only checking the single bit.
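A minimal sketch of the difference (illustration only, not part of the
patch; in <linux/usb/ch9.h>, USB_DIR_OUT is 0x00 and USB_DIR_IN is 0x80):
	u8 brt = 0x21;		/* class request to an interface, direction OUT */
	brt == USB_DIR_OUT	/* old check: false, because other bits are set */
	!(brt & USB_DIR_IN)	/* new check: true, it tests bit 7 alone */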
Fixes: 153a2d7e3350 ("USB: gadget: detect too-big endpoint 0 requests")
Cc: Felipe Balbi <balbi(a)kernel.org>
Reported-by: Szymon Heidrich <szymon.heidrich(a)gmail.com>
Link: https://lore.kernel.org/r/20211214184621.385828-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/usb/gadget/composite.c | 6 +++---
drivers/usb/gadget/legacy/dbgp.c | 6 +++---
drivers/usb/gadget/legacy/inode.c | 6 +++---
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/usb/gadget/composite.c b/drivers/usb/gadget/composite.c
index 3813249387b3d..d39ebf5bdbdd5 100644
--- a/drivers/usb/gadget/composite.c
+++ b/drivers/usb/gadget/composite.c
@@ -1570,14 +1570,14 @@ composite_setup(struct usb_gadget *gadget, const struct usb_ctrlrequest *ctrl)
u8 endp;
if (w_length > USB_COMP_EP0_BUFSIZ) {
- if (ctrl->bRequestType == USB_DIR_OUT) {
- goto done;
- } else {
+ if (ctrl->bRequestType & USB_DIR_IN) {
/* Cast away the const, we are going to overwrite on purpose. */
__le16 *temp = (__le16 *)&ctrl->wLength;
*temp = cpu_to_le16(USB_COMP_EP0_BUFSIZ);
w_length = USB_COMP_EP0_BUFSIZ;
+ } else {
+ goto done;
}
}
diff --git a/drivers/usb/gadget/legacy/dbgp.c b/drivers/usb/gadget/legacy/dbgp.c
index e567afcb2794c..ffe58d44b2cb1 100644
--- a/drivers/usb/gadget/legacy/dbgp.c
+++ b/drivers/usb/gadget/legacy/dbgp.c
@@ -346,14 +346,14 @@ static int dbgp_setup(struct usb_gadget *gadget,
u16 len = 0;
if (length > DBGP_REQ_LEN) {
- if (ctrl->bRequestType == USB_DIR_OUT) {
- return err;
- } else {
+ if (ctrl->bRequestType & USB_DIR_IN) {
/* Cast away the const, we are going to overwrite on purpose. */
__le16 *temp = (__le16 *)&ctrl->wLength;
*temp = cpu_to_le16(DBGP_REQ_LEN);
length = DBGP_REQ_LEN;
+ } else {
+ return err;
}
}
diff --git a/drivers/usb/gadget/legacy/inode.c b/drivers/usb/gadget/legacy/inode.c
index 83fd1b55d497a..f91d403da3141 100644
--- a/drivers/usb/gadget/legacy/inode.c
+++ b/drivers/usb/gadget/legacy/inode.c
@@ -1335,14 +1335,14 @@ gadgetfs_setup (struct usb_gadget *gadget, const struct usb_ctrlrequest *ctrl)
u16 w_length = le16_to_cpu(ctrl->wLength);
if (w_length > RBUF_SIZE) {
- if (ctrl->bRequestType == USB_DIR_OUT) {
- return value;
- } else {
+ if (ctrl->bRequestType & USB_DIR_IN) {
/* Cast away the const, we are going to overwrite on purpose. */
__le16 *temp = (__le16 *)&ctrl->wLength;
*temp = cpu_to_le16(RBUF_SIZE);
w_length = RBUF_SIZE;
+ } else {
+ return value;
}
}
--
2.25.1
[PATCH openEuler-1.0-LTS 1/2] scsi:spraid: support Ramaxel's spraid driver
by Yang Yingliang 22 Dec '21
From: Yanling Song <songyl(a)ramaxel.com>
Ramaxel inclusion
category: features
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NJIB
CVE: NA
Support Ramaxel's SPRxxx Raid controller
Signed-off-by: Yanling Song <songyl(a)ramaxel.com>
Reviewed-by: Jiang Yu<yujiang(a)ramaxel.com>
Acked-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/scsi/Kconfig | 1 +
drivers/scsi/Makefile | 1 +
drivers/scsi/spraid/Kconfig | 13 +
drivers/scsi/spraid/Makefile | 7 +
drivers/scsi/spraid/spraid.h | 746 ++++++
drivers/scsi/spraid/spraid_main.c | 3875 +++++++++++++++++++++++++++++
6 files changed, 4643 insertions(+)
create mode 100644 drivers/scsi/spraid/Kconfig
create mode 100644 drivers/scsi/spraid/Makefile
create mode 100644 drivers/scsi/spraid/spraid.h
create mode 100644 drivers/scsi/spraid/spraid_main.c
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 8450484184e36..63d2aaa228349 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -522,6 +522,7 @@ source "drivers/scsi/megaraid/Kconfig.megaraid"
source "drivers/scsi/mpt3sas/Kconfig"
source "drivers/scsi/smartpqi/Kconfig"
source "drivers/scsi/ufs/Kconfig"
+source "drivers/scsi/spraid/Kconfig"
config SCSI_HPTIOP
tristate "HighPoint RocketRAID 3xxx/4xxx Controller support"
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 2973693f6dcc7..4056cf26e09e6 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -94,6 +94,7 @@ obj-$(CONFIG_SCSI_ZALON) += zalon7xx.o
obj-$(CONFIG_SCSI_DC395x) += dc395x.o
obj-$(CONFIG_SCSI_AM53C974) += esp_scsi.o am53c974.o
obj-$(CONFIG_CXLFLASH) += cxlflash/
+obj-$(CONFIG_RAMAXEL_SPRAID) += spraid/
obj-$(CONFIG_MEGARAID_LEGACY) += megaraid.o
obj-$(CONFIG_MEGARAID_NEWGEN) += megaraid/
obj-$(CONFIG_MEGARAID_SAS) += megaraid/
diff --git a/drivers/scsi/spraid/Kconfig b/drivers/scsi/spraid/Kconfig
new file mode 100644
index 0000000000000..bfbba3db8db03
--- /dev/null
+++ b/drivers/scsi/spraid/Kconfig
@@ -0,0 +1,13 @@
+#
+# Ramaxel driver configuration
+#
+
+config RAMAXEL_SPRAID
+ tristate "Ramaxel spraid Adapter"
+ depends on PCI && SCSI
+ select BLK_DEV_BSGLIB
+ depends on ARM64 || X86_64
+ help
+ This driver supports the Ramaxel SPRxxx series
+ RAID controllers, which attach over a PCIe Gen4
+ host interface and support SAS/SATA HDDs/SSDs.
diff --git a/drivers/scsi/spraid/Makefile b/drivers/scsi/spraid/Makefile
new file mode 100644
index 0000000000000..aadc2ffd37ebd
--- /dev/null
+++ b/drivers/scsi/spraid/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the Ramaxel device drivers.
+#
+
+obj-$(CONFIG_RAMAXEL_SPRAID) += spraid.o
+
+spraid-objs := spraid_main.o
\ No newline at end of file
diff --git a/drivers/scsi/spraid/spraid.h b/drivers/scsi/spraid/spraid.h
new file mode 100644
index 0000000000000..c1e4980e18e5d
--- /dev/null
+++ b/drivers/scsi/spraid/spraid.h
@@ -0,0 +1,746 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
+
+#ifndef __SPRAID_H_
+#define __SPRAID_H_
+
+#define SPRAID_CAP_MQES(cap) ((cap) & 0xffff)
+#define SPRAID_CAP_STRIDE(cap) (((cap) >> 32) & 0xf)
+#define SPRAID_CAP_MPSMIN(cap) (((cap) >> 48) & 0xf)
+#define SPRAID_CAP_MPSMAX(cap) (((cap) >> 52) & 0xf)
+#define SPRAID_CAP_TIMEOUT(cap) (((cap) >> 24) & 0xff)
+#define SPRAID_CAP_DMAMASK(cap) (((cap) >> 37) & 0xff)
+
+#define SPRAID_DEFAULT_MAX_CHANNEL 4
+#define SPRAID_DEFAULT_MAX_ID 240
+#define SPRAID_DEFAULT_MAX_LUN_PER_HOST 8
+#define MAX_SECTORS 2048
+
+#define IO_SQE_SIZE sizeof(struct spraid_ioq_command)
+#define ADMIN_SQE_SIZE sizeof(struct spraid_admin_command)
+#define SQE_SIZE(qid) (((qid) > 0) ? IO_SQE_SIZE : ADMIN_SQE_SIZE)
+#define CQ_SIZE(depth) ((depth) * sizeof(struct spraid_completion))
+#define SQ_SIZE(qid, depth) ((depth) * SQE_SIZE(qid))
+
+#define SENSE_SIZE(depth) ((depth) * SCSI_SENSE_BUFFERSIZE)
+
+#define SPRAID_AQ_DEPTH 128
+#define SPRAID_NR_AEN_COMMANDS 16
+#define SPRAID_AQ_BLK_MQ_DEPTH (SPRAID_AQ_DEPTH - SPRAID_NR_AEN_COMMANDS)
+#define SPRAID_AQ_MQ_TAG_DEPTH (SPRAID_AQ_BLK_MQ_DEPTH - 1)
+
+#define SPRAID_ADMIN_QUEUE_NUM 1
+#define SPRAID_PTCMDS_PERQ 1
+#define SPRAID_IO_BLK_MQ_DEPTH (hdev->shost->can_queue)
+#define SPRAID_NR_IOQ_PTCMDS (SPRAID_PTCMDS_PERQ * hdev->shost->nr_hw_queues)
+
+#define FUA_MASK 0x08
+#define SPRAID_MINORS BIT(MINORBITS)
+
+#define COMMAND_IS_WRITE(cmd) ((cmd)->common.opcode & 1)
+
+#define SPRAID_IO_IOSQES 7
+#define SPRAID_IO_IOCQES 4
+#define PRP_ENTRY_SIZE 8
+
+#define SMALL_POOL_SIZE 256
+#define MAX_SMALL_POOL_NUM 16
+#define MAX_CMD_PER_DEV 64
+#define MAX_CDB_LEN 32
+
+#define SPRAID_UP_TO_MULTY4(x) (((x) + 4) & (~0x03))
+
+#define CQE_STATUS_SUCCESS (0x0)
+
+#define PCI_VENDOR_ID_RAMAXEL_LOGIC 0x1E81
+
+#define SPRAID_SERVER_DEVICE_HBA_DID 0x2100
+#define SPRAID_SERVER_DEVICE_RAID_DID 0x2200
+
+#define IO_6_DEFAULT_TX_LEN 256
+
+#define SPRAID_INT_PAGES 2
+#define SPRAID_INT_BYTES(hdev) (SPRAID_INT_PAGES * (hdev)->page_size)
+
+enum {
+ SPRAID_REQ_CANCELLED = (1 << 0),
+ SPRAID_REQ_USERCMD = (1 << 1),
+};
+
+enum {
+ SPRAID_SC_SUCCESS = 0x0,
+ SPRAID_SC_INVALID_OPCODE = 0x1,
+ SPRAID_SC_INVALID_FIELD = 0x2,
+
+ SPRAID_SC_ABORT_LIMIT = 0x103,
+ SPRAID_SC_ABORT_MISSING = 0x104,
+ SPRAID_SC_ASYNC_LIMIT = 0x105,
+
+ SPRAID_SC_DNR = 0x4000,
+};
+
+enum {
+ SPRAID_REG_CAP = 0x0000,
+ SPRAID_REG_CC = 0x0014,
+ SPRAID_REG_CSTS = 0x001c,
+ SPRAID_REG_AQA = 0x0024,
+ SPRAID_REG_ASQ = 0x0028,
+ SPRAID_REG_ACQ = 0x0030,
+ SPRAID_REG_DBS = 0x1000,
+};
+
+enum {
+ SPRAID_CC_ENABLE = 1 << 0,
+ SPRAID_CC_CSS_NVM = 0 << 4,
+ SPRAID_CC_MPS_SHIFT = 7,
+ SPRAID_CC_AMS_SHIFT = 11,
+ SPRAID_CC_SHN_SHIFT = 14,
+ SPRAID_CC_IOSQES_SHIFT = 16,
+ SPRAID_CC_IOCQES_SHIFT = 20,
+ SPRAID_CC_AMS_RR = 0 << SPRAID_CC_AMS_SHIFT,
+ SPRAID_CC_SHN_NONE = 0 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CC_IOSQES = SPRAID_IO_IOSQES << SPRAID_CC_IOSQES_SHIFT,
+ SPRAID_CC_IOCQES = SPRAID_IO_IOCQES << SPRAID_CC_IOCQES_SHIFT,
+ SPRAID_CC_SHN_NORMAL = 1 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CC_SHN_MASK = 3 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CSTS_CFS_SHIFT = 1,
+ SPRAID_CSTS_SHST_SHIFT = 2,
+ SPRAID_CSTS_PP_SHIFT = 5,
+ SPRAID_CSTS_RDY = 1 << 0,
+ SPRAID_CSTS_SHST_CMPLT = 2 << 2,
+ SPRAID_CSTS_SHST_MASK = 3 << 2,
+ SPRAID_CSTS_CFS_MASK = 1 << SPRAID_CSTS_CFS_SHIFT,
+ SPRAID_CSTS_PP_MASK = 1 << SPRAID_CSTS_PP_SHIFT,
+};
+
+enum {
+ SPRAID_ADMIN_DELETE_SQ = 0x00,
+ SPRAID_ADMIN_CREATE_SQ = 0x01,
+ SPRAID_ADMIN_DELETE_CQ = 0x04,
+ SPRAID_ADMIN_CREATE_CQ = 0x05,
+ SPRAID_ADMIN_ABORT_CMD = 0x08,
+ SPRAID_ADMIN_SET_FEATURES = 0x09,
+ SPRAID_ADMIN_ASYNC_EVENT = 0x0c,
+ SPRAID_ADMIN_GET_INFO = 0xc6,
+ SPRAID_ADMIN_RESET = 0xc8,
+};
+
+enum {
+ SPRAID_GET_INFO_CTRL = 0,
+ SPRAID_GET_INFO_DEV_LIST = 1,
+};
+
+enum {
+ SPRAID_RESET_TARGET = 0,
+ SPRAID_RESET_BUS = 1,
+};
+
+enum {
+ SPRAID_AEN_ERROR = 0,
+ SPRAID_AEN_NOTICE = 2,
+ SPRAID_AEN_VS = 7,
+};
+
+enum {
+ SPRAID_AEN_DEV_CHANGED = 0x00,
+ SPRAID_AEN_FW_ACT_START = 0x01,
+ SPRAID_AEN_HOST_PROBING = 0x10,
+};
+
+enum {
+ SPRAID_AEN_TIMESYN = 0x00,
+ SPRAID_AEN_FW_ACT_FINISH = 0x02,
+ SPRAID_AEN_EVENT_MIN = 0x80,
+ SPRAID_AEN_EVENT_MAX = 0xff,
+};
+
+enum {
+ SPRAID_CMD_WRITE = 0x01,
+ SPRAID_CMD_READ = 0x02,
+
+ SPRAID_CMD_NONIO_NONE = 0x80,
+ SPRAID_CMD_NONIO_TODEV = 0x81,
+ SPRAID_CMD_NONIO_FROMDEV = 0x82,
+};
+
+enum {
+ SPRAID_QUEUE_PHYS_CONTIG = (1 << 0),
+ SPRAID_CQ_IRQ_ENABLED = (1 << 1),
+
+ SPRAID_FEAT_NUM_QUEUES = 0x07,
+ SPRAID_FEAT_ASYNC_EVENT = 0x0b,
+ SPRAID_FEAT_TIMESTAMP = 0x0e,
+};
+
+enum spraid_state {
+ SPRAID_NEW,
+ SPRAID_LIVE,
+ SPRAID_RESETTING,
+ SPRAID_DELETING,
+ SPRAID_DEAD,
+};
+
+enum {
+ SPRAID_CARD_HBA,
+ SPRAID_CARD_RAID,
+};
+
+enum spraid_cmd_type {
+ SPRAID_CMD_ADM,
+ SPRAID_CMD_IOPT,
+};
+
+struct spraid_completion {
+ __le32 result;
+ union {
+ struct {
+ __u8 sense_len;
+ __u8 resv[3];
+ };
+ __le32 result1;
+ };
+ __le16 sq_head;
+ __le16 sq_id;
+ __u16 cmd_id;
+ __le16 status;
+};
+
+struct spraid_ctrl_info {
+ __le32 nd;
+ __le16 max_cmds;
+ __le16 max_channel;
+ __le32 max_tgt_id;
+ __le16 max_lun;
+ __le16 max_num_sge;
+ __le16 lun_num_in_boot;
+ __u8 mdts;
+ __u8 acl;
+ __u8 aerl;
+ __u8 card_type;
+ __u16 rsvd;
+ __u32 rtd3e;
+ __u8 sn[32];
+ __u8 fr[16];
+ __u8 rsvd1[4020];
+};
+
+struct spraid_dev {
+ struct pci_dev *pdev;
+ struct device *dev;
+ struct Scsi_Host *shost;
+ struct spraid_queue *queues;
+ struct dma_pool *prp_page_pool;
+ struct dma_pool *prp_small_pool[MAX_SMALL_POOL_NUM];
+ mempool_t *iod_mempool;
+ void __iomem *bar;
+ u32 max_qid;
+ u32 num_vecs;
+ u32 queue_count;
+ u32 ioq_depth;
+ int db_stride;
+ u32 __iomem *dbs;
+ struct rw_semaphore devices_rwsem;
+ int numa_node;
+ u32 page_size;
+ u32 ctrl_config;
+ u32 online_queues;
+ u64 cap;
+ int instance;
+ struct spraid_ctrl_info *ctrl_info;
+ struct spraid_dev_info *devices;
+
+ struct spraid_cmd *adm_cmds;
+ struct list_head adm_cmd_list;
+ spinlock_t adm_cmd_lock;
+
+ struct spraid_cmd *ioq_ptcmds;
+ struct list_head ioq_pt_list;
+ spinlock_t ioq_pt_lock;
+
+ struct work_struct scan_work;
+ struct work_struct timesyn_work;
+ struct work_struct reset_work;
+ struct work_struct fw_act_work;
+
+ enum spraid_state state;
+ spinlock_t state_lock;
+
+ struct request_queue *bsg_queue;
+};
+
+struct spraid_sgl_desc {
+ __le64 addr;
+ __le32 length;
+ __u8 rsvd[3];
+ __u8 type;
+};
+
+union spraid_data_ptr {
+ struct {
+ __le64 prp1;
+ __le64 prp2;
+ };
+ struct spraid_sgl_desc sgl;
+};
+
+struct spraid_admin_common_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le32 cdw2[4];
+ union spraid_data_ptr dptr;
+ __le32 cdw10;
+ __le32 cdw11;
+ __le32 cdw12;
+ __le32 cdw13;
+ __le32 cdw14;
+ __le32 cdw15;
+};
+
+struct spraid_features {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[2];
+ union spraid_data_ptr dptr;
+ __le32 fid;
+ __le32 dword11;
+ __le32 dword12;
+ __le32 dword13;
+ __le32 dword14;
+ __le32 dword15;
+};
+
+struct spraid_create_cq {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[5];
+ __le64 prp1;
+ __u64 rsvd8;
+ __le16 cqid;
+ __le16 qsize;
+ __le16 cq_flags;
+ __le16 irq_vector;
+ __u32 rsvd12[4];
+};
+
+struct spraid_create_sq {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[5];
+ __le64 prp1;
+ __u64 rsvd8;
+ __le16 sqid;
+ __le16 qsize;
+ __le16 sq_flags;
+ __le16 cqid;
+ __u32 rsvd12[4];
+};
+
+struct spraid_delete_queue {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[9];
+ __le16 qid;
+ __u16 rsvd10;
+ __u32 rsvd11[5];
+};
+
+struct spraid_get_info {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u32 rsvd2[4];
+ union spraid_data_ptr dptr;
+ __u8 type;
+ __u8 rsvd10[3];
+ __le32 cdw11;
+ __u32 rsvd12[4];
+};
+
+struct spraid_usr_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ union {
+ struct {
+ __le16 subopcode;
+ __le16 rsvd1;
+ } info_0;
+ __le32 cdw2;
+ };
+ union {
+ struct {
+ __le16 data_len;
+ __le16 param_len;
+ } info_1;
+ __le32 cdw3;
+ };
+ __u64 metadata;
+ union spraid_data_ptr dptr;
+ __le32 cdw10;
+ __le32 cdw11;
+ __le32 cdw12;
+ __le32 cdw13;
+ __le32 cdw14;
+ __le32 cdw15;
+};
+
+enum {
+ SPRAID_CMD_FLAG_SGL_METABUF = (1 << 6),
+ SPRAID_CMD_FLAG_SGL_METASEG = (1 << 7),
+ SPRAID_CMD_FLAG_SGL_ALL = SPRAID_CMD_FLAG_SGL_METABUF |
+ SPRAID_CMD_FLAG_SGL_METASEG,
+};
+
+enum spraid_cmd_state {
+ SPRAID_CMD_IDLE = 0,
+ SPRAID_CMD_IN_FLIGHT = 1,
+ SPRAID_CMD_COMPLETE = 2,
+ SPRAID_CMD_TIMEOUT = 3,
+ SPRAID_CMD_TMO_COMPLETE = 4,
+};
+
+struct spraid_abort_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[4];
+ __le16 sqid;
+ __le16 cid;
+ __u32 rsvd11[5];
+};
+
+struct spraid_reset_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[4];
+ __u8 type;
+ __u8 rsvd10[3];
+ __u32 rsvd11[5];
+};
+
+struct spraid_admin_command {
+ union {
+ struct spraid_admin_common_command common;
+ struct spraid_features features;
+ struct spraid_create_cq create_cq;
+ struct spraid_create_sq create_sq;
+ struct spraid_delete_queue delete_queue;
+ struct spraid_get_info get_info;
+ struct spraid_abort_cmd abort;
+ struct spraid_reset_cmd reset;
+ struct spraid_usr_cmd usr_cmd;
+ };
+};
+
+struct spraid_ioq_common_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_len;
+ __u8 rsvd2;
+ __le32 cdw3[3];
+ union spraid_data_ptr dptr;
+ __le32 cdw10[6];
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __le32 cdw26[6];
+};
+
+struct spraid_rw_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_len;
+ __u8 rsvd2;
+ __u32 rsvd3[3];
+ union spraid_data_ptr dptr;
+ __le64 slba;
+ __le16 nlb;
+ __le16 control;
+ __u32 rsvd13[3];
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __u32 rsvd26[6];
+};
+
+struct spraid_scsi_nonio {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_length;
+ __u8 rsvd2;
+ __u32 rsvd3[3];
+ union spraid_data_ptr dptr;
+ __u32 rsvd10[5];
+ __le32 buffer_len;
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __u32 rsvd26[6];
+};
+
+struct spraid_ioq_command {
+ union {
+ struct spraid_ioq_common_command common;
+ struct spraid_rw_command rw;
+ struct spraid_scsi_nonio scsi_nonio;
+ };
+};
+
+struct spraid_passthru_common_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 rsvd0;
+ __u32 nsid;
+ union {
+ struct {
+ __u16 subopcode;
+ __u16 rsvd1;
+ } info_0;
+ __u32 cdw2;
+ };
+ union {
+ struct {
+ __u16 data_len;
+ __u16 param_len;
+ } info_1;
+ __u32 cdw3;
+ };
+ __u64 metadata;
+
+ __u64 addr;
+ __u64 prp2;
+
+ __u32 cdw10;
+ __u32 cdw11;
+ __u32 cdw12;
+ __u32 cdw13;
+ __u32 cdw14;
+ __u32 cdw15;
+ __u32 timeout_ms;
+ __u32 result0;
+ __u32 result1;
+};
+
+struct spraid_ioq_passthru_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 rsvd0;
+ __u32 nsid;
+ union {
+ struct {
+ __u16 res_sense_len;
+ __u8 cdb_len;
+ __u8 rsvd0;
+ } info_0;
+ __u32 cdw2;
+ };
+ union {
+ struct {
+ __u16 subopcode;
+ __u16 rsvd1;
+ } info_1;
+ __u32 cdw3;
+ };
+ union {
+ struct {
+ __u16 rsvd;
+ __u16 param_len;
+ } info_2;
+ __u32 cdw4;
+ };
+ __u32 cdw5;
+ __u64 addr;
+ __u64 prp2;
+ union {
+ struct {
+ __u16 eid;
+ __u16 sid;
+ } info_3;
+ __u32 cdw10;
+ };
+ union {
+ struct {
+ __u16 did;
+ __u8 did_flag;
+ __u8 rsvd2;
+ } info_4;
+ __u32 cdw11;
+ };
+ __u32 cdw12;
+ __u32 cdw13;
+ __u32 cdw14;
+ __u32 data_len;
+ __u32 cdw16;
+ __u32 cdw17;
+ __u32 cdw18;
+ __u32 cdw19;
+ __u32 cdw20;
+ __u32 cdw21;
+ __u32 cdw22;
+ __u32 cdw23;
+ __u64 sense_addr;
+ __u32 cdw26[4];
+ __u32 timeout_ms;
+ __u32 result0;
+ __u32 result1;
+};
+
+struct spraid_bsg_request {
+ u32 msgcode;
+ u32 control;
+ union {
+ struct spraid_passthru_common_cmd admcmd;
+ struct spraid_ioq_passthru_cmd ioqcmd;
+ };
+};
+
+enum {
+ SPRAID_BSG_ADM,
+ SPRAID_BSG_IOQ,
+};
+
+struct spraid_cmd {
+ int qid;
+ int cid;
+ u32 result0;
+ u32 result1;
+ u16 status;
+ void *priv;
+ enum spraid_cmd_state state;
+ struct completion cmd_done;
+ struct list_head list;
+};
+
+struct spraid_queue {
+ struct spraid_dev *hdev;
+ spinlock_t sq_lock;
+
+ spinlock_t cq_lock ____cacheline_aligned_in_smp;
+
+ void *sq_cmds;
+
+ struct spraid_completion *cqes;
+
+ dma_addr_t sq_dma_addr;
+ dma_addr_t cq_dma_addr;
+ u32 __iomem *q_db;
+ u8 cq_phase;
+ u8 sqes;
+ u16 qid;
+ u16 sq_tail;
+ u16 cq_head;
+ u16 last_cq_head;
+ u16 q_depth;
+ s16 cq_vector;
+ void *sense;
+ dma_addr_t sense_dma_addr;
+ struct dma_pool *prp_small_pool;
+};
+
+struct spraid_iod {
+ struct spraid_queue *spraidq;
+ enum spraid_cmd_state state;
+ int npages;
+ u32 nsge;
+ u32 length;
+ bool use_sgl;
+ bool sg_drv_mgmt;
+ dma_addr_t first_dma;
+ void *sense;
+ dma_addr_t sense_dma;
+ struct scatterlist *sg;
+ struct scatterlist inline_sg[0];
+};
+
+#define SPRAID_DEV_INFO_ATTR_BOOT(attr) ((attr) & 0x01)
+#define SPRAID_DEV_INFO_ATTR_VD(attr) (((attr) & 0x02) == 0x0)
+#define SPRAID_DEV_INFO_ATTR_PT(attr) (((attr) & 0x22) == 0x02)
+#define SPRAID_DEV_INFO_ATTR_RAWDISK(attr) ((attr) & 0x20)
+
+#define SPRAID_DEV_INFO_FLAG_VALID(flag) ((flag) & 0x01)
+#define SPRAID_DEV_INFO_FLAG_CHANGE(flag) ((flag) & 0x02)
+
+#define BGTASK_TYPE_REBUILD 4
+#define USR_CMD_READ 0xc2
+#define USR_CMD_RDLEN 0x1000
+#define USR_CMD_VDINFO 0x704
+#define USR_CMD_BGTASK 0x504
+#define VDINFO_PARAM_LEN 0x04
+
+struct spraid_vd_info {
+ __u8 name[32];
+ __le16 id;
+ __u8 rg_id;
+ __u8 rg_level;
+ __u8 sg_num;
+ __u8 sg_disk_num;
+ __u8 vd_status;
+ __u8 vd_type;
+ __u8 rsvd1[4056];
+};
+
+#define MAX_REALTIME_BGTASK_NUM 32
+
+struct bgtask_info {
+ __u8 type;
+ __u8 progress;
+ __u8 rate;
+ __u8 rsvd0;
+ __le16 vd_id;
+ __le16 time_left;
+ __u8 rsvd1[4];
+};
+
+struct spraid_bgtask {
+ __u8 sw;
+ __u8 task_num;
+ __u8 rsvd[6];
+ struct bgtask_info bgtask[MAX_REALTIME_BGTASK_NUM];
+};
+
+struct spraid_dev_info {
+ __le32 hdid;
+ __le16 target;
+ __u8 channel;
+ __u8 lun;
+ __u8 attr;
+ __u8 flag;
+ __le16 max_io_kb;
+};
+
+#define MAX_DEV_ENTRY_PER_PAGE_4K 340
+struct spraid_dev_list {
+ __le32 dev_num;
+ __u32 rsvd0[3];
+ struct spraid_dev_info devices[MAX_DEV_ENTRY_PER_PAGE_4K];
+};
+
+struct spraid_sdev_hostdata {
+ u32 hdid;
+ u16 max_io_kb;
+ u8 attr;
+ u8 flag;
+ u8 rg_id;
+ u8 rsvd[3];
+};
+
+#endif
+
diff --git a/drivers/scsi/spraid/spraid_main.c b/drivers/scsi/spraid/spraid_main.c
new file mode 100644
index 0000000000000..519b39f44e914
--- /dev/null
+++ b/drivers/scsi/spraid/spraid_main.c
@@ -0,0 +1,3875 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
+
+/* Ramaxel Raid SPXXX Series Linux Driver */
+
+#define pr_fmt(fmt) "spraid: " fmt
+
+#include <linux/sched/signal.h>
+#include <linux/version.h>
+#include <linux/pci.h>
+#include <linux/aer.h>
+#include <linux/module.h>
+#include <linux/ioport.h>
+#include <linux/device.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/cdev.h>
+#include <linux/sysfs.h>
+#include <linux/gfp.h>
+#include <linux/types.h>
+#include <linux/ratelimit.h>
+#include <linux/once.h>
+#include <linux/debugfs.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/blkdev.h>
+#include <linux/bsg-lib.h>
+#include <asm/unaligned.h>
+#include <linux/sort.h>
+#include <target/target_core_backend.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_transport.h>
+#include <scsi/scsi_dbg.h>
+
+
+#include "spraid.h"
+
+static u32 admin_tmout = 60;
+module_param(admin_tmout, uint, 0644);
+MODULE_PARM_DESC(admin_tmout, "admin commands timeout (seconds)");
+
+static u32 scmd_tmout_nonpt = 180;
+module_param(scmd_tmout_nonpt, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_nonpt,
+ "scsi commands timeout for rawdisk&raid(seconds)");
+
+static int ioq_depth_set(const char *val, const struct kernel_param *kp);
+static const struct kernel_param_ops ioq_depth_ops = {
+ .set = ioq_depth_set,
+ .get = param_get_uint,
+};
+
+static u32 io_queue_depth = 1024;
+module_param_cb(io_queue_depth, &ioq_depth_ops, &io_queue_depth, 0644);
+MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
+
+static int log_debug_switch_set(const char *val, const struct kernel_param *kp)
+{
+ u8 n = 0;
+ int ret;
+
+ ret = kstrtou8(val, 10, &n);
+ if (ret != 0)
+ return -EINVAL;
+
+ return param_set_byte(val, kp);
+}
+
+static const struct kernel_param_ops log_debug_switch_ops = {
+ .set = log_debug_switch_set,
+ .get = param_get_byte,
+};
+
+static unsigned char log_debug_switch;
+module_param_cb(log_debug_switch, &log_debug_switch_ops,
+ &log_debug_switch, 0644);
+MODULE_PARM_DESC(log_debug_switch,
+ "set log state, default non-zero for switch on");
+
+static int small_pool_num_set(const char *val, const struct kernel_param *kp)
+{
+ u8 n = 0;
+ int ret;
+
+ ret = kstrtou8(val, 10, &n);
+ if (ret != 0)
+ return -EINVAL;
+ if (n > MAX_SMALL_POOL_NUM)
+ n = MAX_SMALL_POOL_NUM;
+ if (n < 1)
+ n = 1;
+ *((u8 *)kp->arg) = n;
+
+ return 0;
+}
+
+static const struct kernel_param_ops small_pool_num_ops = {
+ .set = small_pool_num_set,
+ .get = param_get_byte,
+};
+
+/* A single pool's spinlock was found to be heavily contended
+ * across multiple CPUs, so several pools are used to reduce
+ * the contention.
+ */
+static unsigned char small_pool_num = 4;
+module_param_cb(small_pool_num, &small_pool_num_ops, &small_pool_num, 0644);
+MODULE_PARM_DESC(small_pool_num, "set prp small pool num, default 4, MAX 16");
+
+static void spraid_free_queue(struct spraid_queue *spraidq);
+static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result);
+static void spraid_handle_aen_vs(struct spraid_dev *hdev,
+ u32 result, u32 result1);
+
+static DEFINE_IDA(spraid_instance_ida);
+
+static struct class *spraid_class;
+
+#define SPRAID_CAP_TIMEOUT_UNIT_MS (HZ / 2)
+
+static struct workqueue_struct *spraid_wq;
+
+#define dev_log_dbg(dev, fmt, ...) do { \
+ if (unlikely(log_debug_switch)) \
+ dev_info(dev, "[%s] [%d] " fmt, \
+ __func__, __LINE__, ##__VA_ARGS__); \
+} while (0)
+
+#define SPRAID_DRV_VERSION "1.0.0.0"
+
+#define ADMIN_TIMEOUT (admin_tmout * HZ)
+#define ADMIN_ERR_TIMEOUT 32757
+
+#define SPRAID_WAIT_ABNL_CMD_TIMEOUT (3 * 2)
+
+#define SPRAID_DMA_MSK_BIT_MAX 64
+
+enum FW_STAT_CODE {
+ FW_STAT_OK = 0,
+ FW_STAT_NEED_CHECK,
+ FW_STAT_ERROR,
+ FW_STAT_EP_PCIE_ERROR,
+ FW_STAT_NAC_DMA_ERROR,
+ FW_STAT_ABORTED,
+ FW_STAT_NEED_RETRY
+};
+
+static const char * const raid_levels[] = {"0", "1", "5", "6", "10", "50", "60",
+ "NA"};
+
+static const char * const raid_states[] = {
+ "NA", "NORMAL", "FAULT", "DEGRADE", "NOT_FORMATTED",
+ "FORMATTING", "SANITIZING", "INITIALIZING", "INITIALIZE_FAIL",
+ "DELETING", "DELETE_FAIL", "WRITE_PROTECT"
+};
+
+static int ioq_depth_set(const char *val, const struct kernel_param *kp)
+{
+ int n = 0;
+ int ret;
+
+ ret = kstrtoint(val, 10, &n);
+ if (ret != 0 || n < 2)
+ return -EINVAL;
+
+ return param_set_int(val, kp);
+}
+
+static int spraid_remap_bar(struct spraid_dev *hdev, u32 size)
+{
+ struct pci_dev *pdev = hdev->pdev;
+
+ if (size > pci_resource_len(pdev, 0)) {
+ dev_err(hdev->dev, "Input size[%u] exceed bar0 length[%llu]\n",
+ size, pci_resource_len(pdev, 0));
+ return -ENOMEM;
+ }
+
+ if (hdev->bar)
+ iounmap(hdev->bar);
+
+ hdev->bar = ioremap(pci_resource_start(pdev, 0), size);
+ if (!hdev->bar) {
+ dev_err(hdev->dev, "ioremap for bar0 failed\n");
+ return -ENOMEM;
+ }
+ hdev->dbs = hdev->bar + SPRAID_REG_DBS;
+
+ return 0;
+}
+
+static int spraid_dev_map(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ int ret;
+
+ ret = pci_request_mem_regions(pdev, "spraid");
+ if (ret) {
+ dev_err(hdev->dev, "fail to request memory regions\n");
+ return ret;
+ }
+
+ ret = spraid_remap_bar(hdev, SPRAID_REG_DBS + 4096);
+ if (ret) {
+ pci_release_mem_regions(pdev);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void spraid_dev_unmap(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+
+ if (hdev->bar) {
+ iounmap(hdev->bar);
+ hdev->bar = NULL;
+ }
+ pci_release_mem_regions(pdev);
+}
+
+static int spraid_pci_enable(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ int ret = -ENOMEM;
+ u64 maskbit = SPRAID_DMA_MSK_BIT_MAX;
+
+ if (pci_enable_device_mem(pdev)) {
+ dev_err(hdev->dev,
+ "Enable pci device memory resources failed\n");
+ return ret;
+ }
+ pci_set_master(pdev);
+
+ if (readl(hdev->bar + SPRAID_REG_CSTS) == U32_MAX) {
+ ret = -ENODEV;
+ dev_err(hdev->dev, "Read csts register failed\n");
+ goto disable;
+ }
+
+ ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
+ if (ret < 0) {
+ dev_err(hdev->dev,
+ "Allocate one IRQ for setup admin channel failed\n");
+ goto disable;
+ }
+
+ hdev->cap = lo_hi_readq(hdev->bar + SPRAID_REG_CAP);
+ hdev->ioq_depth = min_t(u32, SPRAID_CAP_MQES(hdev->cap) + 1,
+ io_queue_depth);
+ hdev->db_stride = 1 << SPRAID_CAP_STRIDE(hdev->cap);
+
+ maskbit = SPRAID_CAP_DMAMASK(hdev->cap);
+ if (maskbit < 32 || maskbit > SPRAID_DMA_MSK_BIT_MAX) {
+ dev_err(hdev->dev,
+ "err, dma mask invalid[%llu], set to default\n",
+ maskbit);
+ maskbit = SPRAID_DMA_MSK_BIT_MAX;
+ }
+ if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(maskbit))) {
+ dev_err(hdev->dev, "set dma mask and coherent failed\n");
+ goto disable;
+ }
+
+ dev_info(hdev->dev, "set dma mask[%llu] success\n", maskbit);
+
+ pci_enable_pcie_error_reporting(pdev);
+ pci_save_state(pdev);
+
+ return 0;
+
+disable:
+ pci_disable_device(pdev);
+ return ret;
+}
+
+static int spraid_npages_prp(u32 size, struct spraid_dev *hdev)
+{
+ u32 nprps = DIV_ROUND_UP(size + hdev->page_size, hdev->page_size);
+
+ return DIV_ROUND_UP(PRP_ENTRY_SIZE * nprps, PAGE_SIZE - PRP_ENTRY_SIZE);
+}
+
+static int spraid_npages_sgl(u32 nseg)
+{
+ return DIV_ROUND_UP(nseg * sizeof(struct spraid_sgl_desc), PAGE_SIZE);
+}
+
+static void **spraid_iod_list(struct spraid_iod *iod)
+{
+ return (void **)(iod->inline_sg + (iod->sg_drv_mgmt ? iod->nsge : 0));
+}
+
+static u32 spraid_iod_ext_size(struct spraid_dev *hdev, u32 size, u32 nsge,
+ bool sg_drv_mgmt, bool use_sgl)
+{
+ size_t alloc_size, sg_size;
+
+ if (use_sgl)
+ alloc_size = sizeof(__le64 *) * spraid_npages_sgl(nsge);
+ else
+ alloc_size = sizeof(__le64 *) * spraid_npages_prp(size, hdev);
+
+ sg_size = sg_drv_mgmt ? (sizeof(struct scatterlist) * nsge) : 0;
+ return sg_size + alloc_size;
+}
+
+static u32 spraid_cmd_size(struct spraid_dev *hdev,
+ bool sg_drv_mgmt, bool use_sgl)
+{
+ u32 alloc_size = spraid_iod_ext_size(hdev, SPRAID_INT_BYTES(hdev),
+ SPRAID_INT_PAGES, sg_drv_mgmt, use_sgl);
+
+ dev_info(hdev->dev, "sg_drv_mgmt: %s, use_sgl: %s, iod size: %lu;"
+ " alloc_size: %u\n", sg_drv_mgmt ? "true" : "false",
+ use_sgl ? "true" : "false",
+ sizeof(struct spraid_iod), alloc_size);
+
+ return sizeof(struct spraid_iod) + alloc_size;
+}
+
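+/*
+ * Build the chained PRP list for an I/O spanning multiple pages: list
+ * pages come from a dma_pool, and when one fills up its last entry is
+ * replaced with the DMA address of the next list page.
+ */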
+static int spraid_setup_prps(struct spraid_dev *hdev, struct spraid_iod *iod)
+{
+ struct scatterlist *sg = iod->sg;
+ u64 dma_addr = sg_dma_address(sg);
+ int dma_len = sg_dma_len(sg);
+ __le64 *prp_list, *old_prp_list;
+ u32 page_size = hdev->page_size;
+ int offset = dma_addr & (page_size - 1);
+ void **list = spraid_iod_list(iod);
+ int length = iod->length;
+ struct dma_pool *pool;
+ dma_addr_t prp_dma;
+ int nprps, i;
+
+ length -= (page_size - offset);
+ if (length <= 0) {
+ iod->first_dma = 0;
+ return 0;
+ }
+
+ dma_len -= (page_size - offset);
+ if (dma_len) {
+ dma_addr += (page_size - offset);
+ } else {
+ sg = sg_next(sg);
+ dma_addr = sg_dma_address(sg);
+ dma_len = sg_dma_len(sg);
+ }
+
+ if (length <= page_size) {
+ iod->first_dma = dma_addr;
+ return 0;
+ }
+
+ nprps = DIV_ROUND_UP(length, page_size);
+ if (nprps <= (SMALL_POOL_SIZE / PRP_ENTRY_SIZE)) {
+ pool = iod->spraidq->prp_small_pool;
+ iod->npages = 0;
+ } else {
+ pool = hdev->prp_page_pool;
+ iod->npages = 1;
+ }
+
+ prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+ if (!prp_list) {
+ dev_err_ratelimited(hdev->dev,
+ "Allocate first prp_list memory failed\n");
+ iod->first_dma = dma_addr;
+ iod->npages = -1;
+ return -ENOMEM;
+ }
+ list[0] = prp_list;
+ iod->first_dma = prp_dma;
+ i = 0;
+ for (;;) {
+ if (i == page_size / PRP_ENTRY_SIZE) {
+ old_prp_list = prp_list;
+
+ prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+ if (!prp_list) {
+ dev_err_ratelimited(hdev->dev, "Allocate %dth;"
+ " prp_list memory failed\n",
+ iod->npages + 1);
+ return -ENOMEM;
+ }
+ list[iod->npages++] = prp_list;
+ prp_list[0] = old_prp_list[i - 1];
+ old_prp_list[i - 1] = cpu_to_le64(prp_dma);
+ i = 1;
+ }
+ prp_list[i++] = cpu_to_le64(dma_addr);
+ dma_len -= page_size;
+ dma_addr += page_size;
+ length -= page_size;
+ if (length <= 0)
+ break;
+ if (dma_len > 0)
+ continue;
+ if (unlikely(dma_len < 0))
+ goto bad_sgl;
+ sg = sg_next(sg);
+ dma_addr = sg_dma_address(sg);
+ dma_len = sg_dma_len(sg);
+ }
+
+ return 0;
+
+bad_sgl:
+ dev_err(hdev->dev,
+ "Setup prps, invalid SGL for payload: %d nents: %d\n",
+ iod->length, iod->nsge);
+ return -EIO;
+}
+
+#define SGES_PER_PAGE (PAGE_SIZE / sizeof(struct spraid_sgl_desc))
+
+static void spraid_submit_cmd(struct spraid_queue *spraidq, const void *cmd)
+{
+ u32 sqes = SQE_SIZE(spraidq->qid);
+ unsigned long flags;
+ struct spraid_admin_common_command *acd =
+ (struct spraid_admin_common_command *)cmd;
+
+ spin_lock_irqsave(&spraidq->sq_lock, flags);
+ memcpy((spraidq->sq_cmds + sqes * spraidq->sq_tail), cmd, sqes);
+ if (++spraidq->sq_tail == spraidq->q_depth)
+ spraidq->sq_tail = 0;
+
+ writel(spraidq->sq_tail, spraidq->q_db);
+ spin_unlock_irqrestore(&spraidq->sq_lock, flags);
+
+ dev_log_dbg(spraidq->hdev->dev,
+ "cid[%d] qid[%d], opcode[0x%x], flags[0x%x], hdid[%u]\n",
+ acd->command_id, spraidq->qid, acd->opcode,
+ acd->flags, le32_to_cpu(acd->hdid));
+}
+
+static u32 spraid_mod64(u64 dividend, u32 divisor)
+{
+ u64 d;
+ u32 remainder;
+
+ if (!divisor)
+ pr_err("DIVISOR is zero, in div fn\n");
+
+ d = dividend;
+ remainder = do_div(d, divisor);
+ return remainder;
+}
+
+static inline bool spraid_is_rw_scmd(struct scsi_cmnd *scmd)
+{
+ switch (scmd->cmnd[0]) {
+ case READ_6:
+ case READ_10:
+ case READ_12:
+ case READ_16:
+ case READ_32:
+ case WRITE_6:
+ case WRITE_10:
+ case WRITE_12:
+ case WRITE_16:
+ case WRITE_32:
+ return true;
+ default:
+ return false;
+ }
+}
+
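+/*
+ * PRP addressing only works if every middle element is page aligned and
+ * a whole multiple of the page size, the first element ends on a page
+ * boundary and the last one starts on one; anything else falls back to
+ * SGL descriptors.
+ */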
+static bool spraid_is_prp(struct spraid_dev *hdev,
+ struct scsi_cmnd *scmd, u32 nsge)
+{
+ struct scatterlist *sg = scsi_sglist(scmd);
+ u32 page_size = hdev->page_size;
+ bool is_prp = true;
+ int i = 0;
+
+ scsi_for_each_sg(scmd, sg, nsge, i) {
+ if (i != 0 && i != nsge - 1) {
+ if (spraid_mod64(sg_dma_len(sg), page_size) ||
+ spraid_mod64(sg_dma_address(sg), page_size)) {
+ is_prp = false;
+ break;
+ }
+ }
+
+ if (nsge > 1 && i == 0) {
+ if ((spraid_mod64((sg_dma_address(sg) + sg_dma_len(sg)),
+ page_size))) {
+ is_prp = false;
+ break;
+ }
+ }
+
+ if (nsge > 1 && i == (nsge - 1)) {
+ if (spraid_mod64(sg_dma_address(sg), page_size)) {
+ is_prp = false;
+ break;
+ }
+ }
+ }
+
+ return is_prp;
+}
+
+enum {
+ SPRAID_SGL_FMT_DATA_DESC = 0x00,
+ SPRAID_SGL_FMT_SEG_DESC = 0x02,
+ SPRAID_SGL_FMT_LAST_SEG_DESC = 0x03,
+ SPRAID_KEY_SGL_FMT_DATA_DESC = 0x04,
+ SPRAID_TRANSPORT_SGL_DATA_DESC = 0x05
+};
+
+static void spraid_sgl_set_data(struct spraid_sgl_desc *sge,
+ struct scatterlist *sg)
+{
+ sge->addr = cpu_to_le64(sg_dma_address(sg));
+ sge->length = cpu_to_le32(sg_dma_len(sg));
+ sge->type = SPRAID_SGL_FMT_DATA_DESC << 4;
+}
+
+static void spraid_sgl_set_seg(struct spraid_sgl_desc *sge,
+ dma_addr_t dma_addr, int entries)
+{
+ sge->addr = cpu_to_le64(dma_addr);
+ if (entries <= SGES_PER_PAGE) {
+ sge->length = cpu_to_le32(entries * sizeof(*sge));
+ sge->type = SPRAID_SGL_FMT_LAST_SEG_DESC << 4;
+ } else {
+ sge->length = cpu_to_le32(PAGE_SIZE);
+ sge->type = SPRAID_SGL_FMT_SEG_DESC << 4;
+ }
+}
+
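+/*
+ * Describe the scatterlist with hardware SGL descriptors: a single
+ * element is embedded in the command itself, longer lists live in DMA
+ * pool pages chained via SEG/LAST_SEG segment descriptors.
+ */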
+static int spraid_setup_ioq_cmd_sgl(struct spraid_dev *hdev,
+ struct scsi_cmnd *scmd,
+ struct spraid_ioq_command *ioq_cmd,
+ struct spraid_iod *iod)
+{
+ struct spraid_sgl_desc *sg_list, *link, *old_sg_list;
+ struct scatterlist *sg = scsi_sglist(scmd);
+ void **list = spraid_iod_list(iod);
+ struct dma_pool *pool;
+ int nsge = iod->nsge;
+ dma_addr_t sgl_dma;
+ int i = 0;
+
+ ioq_cmd->common.flags |= SPRAID_CMD_FLAG_SGL_METABUF;
+
+ if (nsge == 1) {
+ spraid_sgl_set_data(&ioq_cmd->common.dptr.sgl, sg);
+ return 0;
+ }
+
+ if (nsge <= (SMALL_POOL_SIZE / sizeof(struct spraid_sgl_desc))) {
+ pool = iod->spraidq->prp_small_pool;
+ iod->npages = 0;
+ } else {
+ pool = hdev->prp_page_pool;
+ iod->npages = 1;
+ }
+
+ sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
+ if (!sg_list) {
+ dev_err_ratelimited(hdev->dev,
+ "Allocate first sgl_list failed\n");
+ iod->npages = -1;
+ return -ENOMEM;
+ }
+
+ list[0] = sg_list;
+ iod->first_dma = sgl_dma;
+ spraid_sgl_set_seg(&ioq_cmd->common.dptr.sgl, sgl_dma, nsge);
+ do {
+ if (i == SGES_PER_PAGE) {
+ old_sg_list = sg_list;
+ link = &old_sg_list[SGES_PER_PAGE - 1];
+
+ sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
+ if (!sg_list) {
+				dev_err_ratelimited(hdev->dev,
+						    "Allocate %dth sgl_list failed\n",
+						    iod->npages + 1);
+ return -ENOMEM;
+ }
+ list[iod->npages++] = sg_list;
+
+ i = 0;
+ memcpy(&sg_list[i++], link, sizeof(*link));
+ spraid_sgl_set_seg(link, sgl_dma, nsge);
+ }
+
+ spraid_sgl_set_data(&sg_list[i++], sg);
+ sg = sg_next(sg);
+ } while (--nsge > 0);
+
+ return 0;
+}
+
+#define SPRAID_RW_FUA BIT(14)
+
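+/*
+ * Translate a SCSI READ/WRITE CDB (6/10/12/16/32 byte variants) into
+ * the firmware rw command: extract the LBA, the transfer length and the
+ * FUA bit. A zero length in a 6-byte CDB means 256 blocks per SBC.
+ */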
+static void spraid_setup_rw_cmd(struct spraid_dev *hdev,
+ struct spraid_rw_command *rw,
+ struct scsi_cmnd *scmd)
+{
+ u32 start_lba_lo, start_lba_hi;
+ u32 datalength = 0;
+ u16 control = 0;
+
+ start_lba_lo = 0;
+ start_lba_hi = 0;
+
+ if (scmd->sc_data_direction == DMA_TO_DEVICE) {
+ rw->opcode = SPRAID_CMD_WRITE;
+ } else if (scmd->sc_data_direction == DMA_FROM_DEVICE) {
+ rw->opcode = SPRAID_CMD_READ;
+ } else {
+ dev_err(hdev->dev,
+ "Invalid IO for unsupported data direction: %d\n",
+ scmd->sc_data_direction);
+ WARN_ON(1);
+ }
+
+ /* 6-byte READ(0x08) or WRITE(0x0A) cdb */
+ if (scmd->cmd_len == 6) {
+ datalength = (u32)(scmd->cmnd[4] == 0 ?
+ IO_6_DEFAULT_TX_LEN : scmd->cmnd[4]);
+ start_lba_lo = (u32)get_unaligned_be24(&scmd->cmnd[1]);
+
+ start_lba_lo &= 0x1FFFFF;
+ }
+
+ /* 10-byte READ(0x28) or WRITE(0x2A) cdb */
+ else if (scmd->cmd_len == 10) {
+ datalength = (u32)get_unaligned_be16(&scmd->cmnd[7]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+
+ /* 12-byte READ(0xA8) or WRITE(0xAA) cdb */
+ else if (scmd->cmd_len == 12) {
+ datalength = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+ /* 16-byte READ(0x88) or WRITE(0x8A) cdb */
+ else if (scmd->cmd_len == 16) {
+ datalength = get_unaligned_be32(&scmd->cmnd[10]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+	/* 32-byte READ(32) or WRITE(32) cdb (variable-length opcode 0x7F) */
+ else if (scmd->cmd_len == 32) {
+ datalength = get_unaligned_be32(&scmd->cmnd[28]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[16]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[12]);
+
+ if (scmd->cmnd[10] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+
+ if (unlikely(datalength > U16_MAX || datalength == 0)) {
+ dev_err(hdev->dev,
+ "Invalid IO for illegal transfer data length: %u\n",
+ datalength);
+ WARN_ON(1);
+ }
+
+ rw->slba = cpu_to_le64(((u64)start_lba_hi << 32) | start_lba_lo);
+ /* 0base for nlb */
+ rw->nlb = cpu_to_le16((u16)(datalength - 1));
+ rw->control = cpu_to_le16(control);
+}
+
+static void spraid_setup_nonio_cmd(struct spraid_dev *hdev,
+ struct spraid_scsi_nonio *scsi_nonio,
+ struct scsi_cmnd *scmd)
+{
+ scsi_nonio->buffer_len = cpu_to_le32(scsi_bufflen(scmd));
+
+ switch (scmd->sc_data_direction) {
+ case DMA_NONE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_NONE;
+ break;
+ case DMA_TO_DEVICE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_TODEV;
+ break;
+ case DMA_FROM_DEVICE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_FROMDEV;
+ break;
+ default:
+ dev_err(hdev->dev,
+ "Invalid IO for unsupported data direction: %d\n",
+ scmd->sc_data_direction);
+ WARN_ON(1);
+ }
+}
+
+static void spraid_setup_ioq_cmd(struct spraid_dev *hdev,
+ struct spraid_ioq_command *ioq_cmd,
+ struct scsi_cmnd *scmd)
+{
+ memcpy(ioq_cmd->common.cdb, scmd->cmnd, scmd->cmd_len);
+ ioq_cmd->common.cdb_len = scmd->cmd_len;
+
+ if (spraid_is_rw_scmd(scmd))
+ spraid_setup_rw_cmd(hdev, &ioq_cmd->rw, scmd);
+ else
+ spraid_setup_nonio_cmd(hdev, &ioq_cmd->scsi_nonio, scmd);
+}
+
+static int spraid_init_iod(struct spraid_dev *hdev, struct spraid_iod *iod,
+ struct spraid_ioq_command *ioq_cmd,
+ struct scsi_cmnd *scmd)
+{
+ if (unlikely(!iod->sense)) {
+ dev_err(hdev->dev, "Allocate sense data buffer failed\n");
+ return -ENOMEM;
+ }
+ ioq_cmd->common.sense_addr = cpu_to_le64(iod->sense_dma);
+ ioq_cmd->common.sense_len = cpu_to_le16(SCSI_SENSE_BUFFERSIZE);
+
+ iod->nsge = 0;
+ iod->npages = -1;
+ iod->use_sgl = 0;
+ iod->sg_drv_mgmt = false;
+ WRITE_ONCE(iod->state, SPRAID_CMD_IDLE);
+
+ return 0;
+}
+
+static void spraid_free_iod_res(struct spraid_dev *hdev, struct spraid_iod *iod)
+{
+ const int last_prp = hdev->page_size / sizeof(__le64) - 1;
+ dma_addr_t dma_addr, next_dma_addr;
+ struct spraid_sgl_desc *sg_list;
+ __le64 *prp_list;
+ void *addr;
+ int i;
+
+ dma_addr = iod->first_dma;
+ if (iod->npages == 0)
+ dma_pool_free(iod->spraidq->prp_small_pool,
+ spraid_iod_list(iod)[0], dma_addr);
+
+ for (i = 0; i < iod->npages; i++) {
+ addr = spraid_iod_list(iod)[i];
+
+ if (iod->use_sgl) {
+ sg_list = addr;
+ next_dma_addr =
+ le64_to_cpu((sg_list[SGES_PER_PAGE - 1]).addr);
+ } else {
+ prp_list = addr;
+ next_dma_addr = le64_to_cpu(prp_list[last_prp]);
+ }
+
+ dma_pool_free(hdev->prp_page_pool, addr, dma_addr);
+ dma_addr = next_dma_addr;
+ }
+
+ if (iod->sg_drv_mgmt && iod->sg != iod->inline_sg) {
+ iod->sg_drv_mgmt = false;
+ mempool_free(iod->sg, hdev->iod_mempool);
+ }
+
+ iod->sense = NULL;
+ iod->npages = -1;
+}
+
+static int spraid_io_map_data(struct spraid_dev *hdev, struct spraid_iod *iod,
+ struct scsi_cmnd *scmd,
+ struct spraid_ioq_command *ioq_cmd)
+{
+ int ret;
+
+ iod->nsge = scsi_dma_map(scmd);
+
+ /* No data to DMA, it may be scsi no-rw command */
+ if (unlikely(iod->nsge == 0))
+ return 0;
+
+ iod->length = scsi_bufflen(scmd);
+ iod->sg = scsi_sglist(scmd);
+ iod->use_sgl = !spraid_is_prp(hdev, scmd, iod->nsge);
+
+ if (iod->use_sgl) {
+ ret = spraid_setup_ioq_cmd_sgl(hdev, scmd, ioq_cmd, iod);
+ } else {
+ ret = spraid_setup_prps(hdev, iod);
+ ioq_cmd->common.dptr.prp1 =
+ cpu_to_le64(sg_dma_address(iod->sg));
+ ioq_cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
+ }
+
+ if (ret)
+ scsi_dma_unmap(scmd);
+
+ return ret;
+}
+
+static void spraid_map_status(struct spraid_iod *iod, struct scsi_cmnd *scmd,
+ struct spraid_completion *cqe)
+{
+ scsi_set_resid(scmd, 0);
+
+ switch ((le16_to_cpu(cqe->status) >> 1) & 0x7f) {
+ case FW_STAT_OK:
+ set_host_byte(scmd, DID_OK);
+ break;
+ case FW_STAT_NEED_CHECK:
+ set_host_byte(scmd, DID_OK);
+ scmd->result |= le16_to_cpu(cqe->status) >> 8;
+ if (scmd->result & SAM_STAT_CHECK_CONDITION) {
+ memset(scmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
+ memcpy(scmd->sense_buffer, iod->sense,
+ SCSI_SENSE_BUFFERSIZE);
+ scmd->result =
+ (scmd->result & 0x00ffffff) | (DRIVER_SENSE << 24);
+ }
+ break;
+ case FW_STAT_ABORTED:
+ set_host_byte(scmd, DID_ABORT);
+ break;
+ case FW_STAT_NEED_RETRY:
+ set_host_byte(scmd, DID_REQUEUE);
+ break;
+ default:
+ set_host_byte(scmd, DID_BAD_TARGET);
+ break;
+ }
+}
+
+static inline void spraid_get_tag_from_scmd(struct scsi_cmnd *scmd,
+ u16 *qid, u16 *cid)
+{
+ u32 tag = blk_mq_unique_tag(scmd->request);
+
+ *qid = blk_mq_unique_tag_to_hwq(tag) + 1;
+ *cid = blk_mq_unique_tag_to_tag(tag);
+}
+
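+/*
+ * .queuecommand entry point: derive the hardware queue and command id
+ * from the block layer tag, carve out this command's sense buffer
+ * slice, build the ioq command, map the data and post it to the
+ * submission queue.
+ */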
+static int spraid_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
+{
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_dev *hdev = shost_priv(shost);
+ struct scsi_device *sdev = scmd->device;
+ struct spraid_sdev_hostdata *hostdata;
+ struct spraid_ioq_command ioq_cmd;
+ struct spraid_queue *ioq;
+ unsigned long elapsed;
+ u16 hwq, cid;
+ int ret;
+
+ if (unlikely(!scmd)) {
+ dev_err(hdev->dev, "err, scmd is null\n");
+ return 0;
+ }
+
+ if (unlikely(hdev->state != SPRAID_LIVE)) {
+ set_host_byte(scmd, DID_NO_CONNECT);
+ scmd->scsi_done(scmd);
+ return 0;
+ }
+
+ if (log_debug_switch)
+ scsi_print_command(scmd);
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ hostdata = sdev->hostdata;
+ ioq = &hdev->queues[hwq];
+ memset(&ioq_cmd, 0, sizeof(ioq_cmd));
+ ioq_cmd.rw.hdid = cpu_to_le32(hostdata->hdid);
+ ioq_cmd.rw.command_id = cid;
+
+ spraid_setup_ioq_cmd(hdev, &ioq_cmd, scmd);
+
+ ret = cid * SCSI_SENSE_BUFFERSIZE;
+ iod->sense = ioq->sense + ret;
+ iod->sense_dma = ioq->sense_dma_addr + ret;
+
+ ret = spraid_init_iod(hdev, iod, &ioq_cmd, scmd);
+ if (unlikely(ret))
+ return SCSI_MLQUEUE_HOST_BUSY;
+
+ iod->spraidq = ioq;
+ ret = spraid_io_map_data(hdev, iod, scmd, &ioq_cmd);
+ if (unlikely(ret)) {
+ dev_err(hdev->dev, "spraid_io_map_data Err.\n");
+ set_host_byte(scmd, DID_ERROR);
+ scmd->scsi_done(scmd);
+ ret = 0;
+ goto deinit_iod;
+ }
+
+ WRITE_ONCE(iod->state, SPRAID_CMD_IN_FLIGHT);
+ spraid_submit_cmd(ioq, &ioq_cmd);
+ elapsed = jiffies - scmd->jiffies_at_alloc;
+ dev_log_dbg(hdev->dev,
+ "cid[%d] qid[%d] submit IO cost %3ld.%3ld seconds\n",
+ cid, hwq, elapsed / HZ, elapsed % HZ);
+ return 0;
+
+deinit_iod:
+ spraid_free_iod_res(hdev, iod);
+ return ret;
+}
+
+static int spraid_match_dev(struct spraid_dev *hdev, u16 idx,
+ struct scsi_device *sdev)
+{
+ if (SPRAID_DEV_INFO_FLAG_VALID(hdev->devices[idx].flag)) {
+ if (sdev->channel == hdev->devices[idx].channel &&
+ sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
+ sdev->lun < hdev->devices[idx].lun) {
+ dev_info(hdev->dev,
+ "Match device success, channel;"
+ "target:lun[%d:%d:%d]\n",
+ hdev->devices[idx].channel,
+ hdev->devices[idx].target,
+ hdev->devices[idx].lun);
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+static int spraid_slave_alloc(struct scsi_device *sdev)
+{
+ struct spraid_sdev_hostdata *hostdata;
+ struct spraid_dev *hdev;
+ u16 idx;
+
+ hdev = shost_priv(sdev->host);
+ hostdata = kzalloc(sizeof(*hostdata), GFP_KERNEL);
+ if (!hostdata) {
+ dev_err(hdev->dev, "Alloc scsi host data memory failed\n");
+ return -ENOMEM;
+ }
+
+ down_read(&hdev->devices_rwsem);
+ for (idx = 0; idx < le32_to_cpu(hdev->ctrl_info->nd); idx++) {
+ if (spraid_match_dev(hdev, idx, sdev))
+ goto scan_host;
+ }
+ up_read(&hdev->devices_rwsem);
+
+ kfree(hostdata);
+ return -ENXIO;
+
+scan_host:
+ hostdata->hdid = le32_to_cpu(hdev->devices[idx].hdid);
+ hostdata->max_io_kb = le16_to_cpu(hdev->devices[idx].max_io_kb);
+ hostdata->attr = hdev->devices[idx].attr;
+ hostdata->flag = hdev->devices[idx].flag;
+ hostdata->rg_id = 0xff;
+ sdev->hostdata = hostdata;
+ up_read(&hdev->devices_rwsem);
+ return 0;
+}
+
+static void spraid_slave_destroy(struct scsi_device *sdev)
+{
+ kfree(sdev->hostdata);
+ sdev->hostdata = NULL;
+}
+
+static int spraid_slave_configure(struct scsi_device *sdev)
+{
+ u16 idx;
+ unsigned int timeout = scmd_tmout_nonpt * HZ;
+ struct spraid_dev *hdev = shost_priv(sdev->host);
+ struct spraid_sdev_hostdata *hostdata = sdev->hostdata;
+ u32 max_sec = sdev->host->max_sectors;
+
+ if (hostdata) {
+ idx = hostdata->hdid - 1;
+ if (sdev->channel == hdev->devices[idx].channel &&
+ sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
+ sdev->lun < hdev->devices[idx].lun) {
+ if (SPRAID_DEV_INFO_ATTR_PT(hdev->devices[idx].attr))
+ timeout = 30 * HZ;
+ else
+ timeout = scmd_tmout_nonpt * HZ;
+ max_sec = le16_to_cpu(hdev->devices[idx].max_io_kb)
+ << 1;
+ } else {
+ dev_err(hdev->dev,
+ "[%s] err, sdev->channel:id:lun[%d:%d:%lld];"
+ "devices[%d], channel:target:lun[%d:%d:%d]\n",
+ __func__, sdev->channel, sdev->id, sdev->lun,
+ idx, hdev->devices[idx].channel,
+ hdev->devices[idx].target,
+ hdev->devices[idx].lun);
+ }
+ } else {
+ dev_err(hdev->dev, "[%s] err, sdev->hostdata is null\n",
+ __func__);
+ }
+
+ blk_queue_rq_timeout(sdev->request_queue, timeout);
+ sdev->eh_timeout = timeout;
+
+ if ((max_sec == 0) || (max_sec > sdev->host->max_sectors))
+ max_sec = sdev->host->max_sectors;
+
+ dev_info(hdev->dev,
+ "[%s] sdev->channel:id:lun[%d:%d:%lld];"
+ " scmd_timeout[%d]s, maxsec[%d]\n",
+ __func__, sdev->channel, sdev->id,
+ sdev->lun, timeout / HZ, max_sec);
+
+ return 0;
+}
+
+static void spraid_shost_init(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ u8 domain, bus;
+ u32 dev_func;
+
+ domain = pci_domain_nr(pdev->bus);
+ bus = pdev->bus->number;
+ dev_func = pdev->devfn;
+
+ hdev->shost->nr_hw_queues = hdev->online_queues - 1;
+ hdev->shost->can_queue = (hdev->ioq_depth - SPRAID_PTCMDS_PERQ);
+
+ hdev->shost->sg_tablesize = le16_to_cpu(hdev->ctrl_info->max_num_sge);
+ /* 512B per sector */
+ hdev->shost->max_sectors =
+ (1U << ((hdev->ctrl_info->mdts) * 1U) << 12) / 512;
+ hdev->shost->cmd_per_lun = MAX_CMD_PER_DEV;
+ hdev->shost->max_channel =
+ le16_to_cpu(hdev->ctrl_info->max_channel) - 1;
+ hdev->shost->max_id = le32_to_cpu(hdev->ctrl_info->max_tgt_id);
+ hdev->shost->max_lun = le16_to_cpu(hdev->ctrl_info->max_lun);
+
+ hdev->shost->this_id = -1;
+ hdev->shost->unique_id = (domain << 16) | (bus << 8) | dev_func;
+ hdev->shost->max_cmd_len = MAX_CDB_LEN;
+ hdev->shost->hostt->cmd_size = max(spraid_cmd_size(hdev, false, true),
+ spraid_cmd_size(hdev, false, false));
+}
+
+static inline void spraid_host_deinit(struct spraid_dev *hdev)
+{
+ ida_free(&spraid_instance_ida, hdev->instance);
+}
+
+static int spraid_alloc_queue(struct spraid_dev *hdev, u16 qid, u16 depth)
+{
+ struct spraid_queue *spraidq = &hdev->queues[qid];
+ int ret = 0;
+
+ if (hdev->queue_count > qid) {
+ dev_info(hdev->dev, "[%s] warn: queue[%d] is exist\n",
+ __func__, qid);
+ return 0;
+ }
+
+ spraidq->cqes = dma_alloc_coherent(hdev->dev, CQ_SIZE(depth),
+ &spraidq->cq_dma_addr,
+ GFP_KERNEL | __GFP_ZERO);
+ if (!spraidq->cqes)
+ return -ENOMEM;
+
+ spraidq->sq_cmds = dma_alloc_coherent(hdev->dev, SQ_SIZE(qid, depth),
+ &spraidq->sq_dma_addr,
+ GFP_KERNEL);
+ if (!spraidq->sq_cmds) {
+ ret = -ENOMEM;
+ goto free_cqes;
+ }
+
+ spin_lock_init(&spraidq->sq_lock);
+ spin_lock_init(&spraidq->cq_lock);
+ spraidq->hdev = hdev;
+ spraidq->q_depth = depth;
+ spraidq->qid = qid;
+ spraidq->cq_vector = -1;
+ hdev->queue_count++;
+
+ /* alloc sense buffer */
+ spraidq->sense = dma_alloc_coherent(hdev->dev, SENSE_SIZE(depth),
+ &spraidq->sense_dma_addr,
+ GFP_KERNEL | __GFP_ZERO);
+ if (!spraidq->sense) {
+ ret = -ENOMEM;
+ goto free_sq_cmds;
+ }
+
+ return 0;
+
+free_sq_cmds:
+ dma_free_coherent(hdev->dev, SQ_SIZE(qid, depth),
+ (void *)spraidq->sq_cmds, spraidq->sq_dma_addr);
+free_cqes:
+ dma_free_coherent(hdev->dev, CQ_SIZE(depth), (void *)spraidq->cqes,
+ spraidq->cq_dma_addr);
+ return ret;
+}
+
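+/*
+ * Poll CSTS.RDY until it matches the requested enable state, bounded by
+ * the timeout the controller advertises in CAP.TO.
+ */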
+static int spraid_wait_ready(struct spraid_dev *hdev, u64 cap, bool enabled)
+{
+ unsigned long timeout =
+ ((SPRAID_CAP_TIMEOUT(cap) + 1) * SPRAID_CAP_TIMEOUT_UNIT_MS) + jiffies;
+ u32 bit = enabled ? SPRAID_CSTS_RDY : 0;
+
+ while ((readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_RDY) != bit) {
+ usleep_range(1000, 2000);
+ if (fatal_signal_pending(current))
+ return -EINTR;
+
+ if (time_after(jiffies, timeout)) {
+ dev_err(hdev->dev, "Device not ready; aborting %s\n",
+ enabled ? "initialisation" : "reset");
+ return -ENODEV;
+ }
+ }
+ return 0;
+}
+
+static int spraid_shutdown_ctrl(struct spraid_dev *hdev)
+{
+ unsigned long timeout = hdev->ctrl_info->rtd3e + jiffies;
+
+ hdev->ctrl_config &= ~SPRAID_CC_SHN_MASK;
+ hdev->ctrl_config |= SPRAID_CC_SHN_NORMAL;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ while ((readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_SHST_MASK) !=
+ SPRAID_CSTS_SHST_CMPLT) {
+ msleep(100);
+ if (fatal_signal_pending(current))
+ return -EINTR;
+ if (time_after(jiffies, timeout)) {
+ dev_err(hdev->dev,
+ "Device shutdown incomplete; abort shutdown\n");
+ return -ENODEV;
+ }
+ }
+ return 0;
+}
+
+static int spraid_disable_ctrl(struct spraid_dev *hdev)
+{
+ hdev->ctrl_config &= ~SPRAID_CC_SHN_MASK;
+ hdev->ctrl_config &= ~SPRAID_CC_ENABLE;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ return spraid_wait_ready(hdev, hdev->cap, false);
+}
+
+static int spraid_enable_ctrl(struct spraid_dev *hdev)
+{
+ u64 cap = hdev->cap;
+ u32 dev_page_min = SPRAID_CAP_MPSMIN(cap) + 12;
+ u32 page_shift = PAGE_SHIFT;
+
+ if (page_shift < dev_page_min) {
+ dev_err(hdev->dev,
+ "Minimum device page size[%u], too large for host[%u]\n",
+ 1U << dev_page_min, 1U << page_shift);
+ return -ENODEV;
+ }
+
+ page_shift = min_t(unsigned int, SPRAID_CAP_MPSMAX(cap) + 12,
+ PAGE_SHIFT);
+ hdev->page_size = 1U << page_shift;
+
+ hdev->ctrl_config = SPRAID_CC_CSS_NVM;
+ hdev->ctrl_config |= (page_shift - 12) << SPRAID_CC_MPS_SHIFT;
+ hdev->ctrl_config |= SPRAID_CC_AMS_RR | SPRAID_CC_SHN_NONE;
+ hdev->ctrl_config |= SPRAID_CC_IOSQES | SPRAID_CC_IOCQES;
+ hdev->ctrl_config |= SPRAID_CC_ENABLE;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ return spraid_wait_ready(hdev, cap, true);
+}
+
+static void spraid_init_queue(struct spraid_queue *spraidq, u16 qid)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ memset((void *)spraidq->cqes, 0, CQ_SIZE(spraidq->q_depth));
+
+ spraidq->sq_tail = 0;
+ spraidq->cq_head = 0;
+ spraidq->cq_phase = 1;
+ spraidq->q_db = &hdev->dbs[qid * 2 * hdev->db_stride];
+ spraidq->prp_small_pool = hdev->prp_small_pool[qid % small_pool_num];
+ hdev->online_queues++;
+}
+
+static inline bool spraid_cqe_pending(struct spraid_queue *spraidq)
+{
+ return (le16_to_cpu(spraidq->cqes[spraidq->cq_head].status) & 1) ==
+ spraidq->cq_phase;
+}
+
+static void spraid_sata_report_zone_handle(struct scsi_cmnd *scmd,
+ struct spraid_iod *iod)
+{
+ int i = 0;
+ unsigned int bytes = 0;
+ struct scatterlist *sg = scsi_sglist(scmd);
+
+ scsi_for_each_sg(scmd, sg, iod->nsge, i) {
+ unsigned int offset = 0;
+
+ if (bytes == 0) {
+ char *hdr;
+ u32 list_length;
+ u64 max_lba, opt_lba;
+ u16 same;
+
+ hdr = sg_virt(sg);
+
+ list_length = get_unaligned_le32(&hdr[0]);
+ same = get_unaligned_le16(&hdr[4]);
+ max_lba = get_unaligned_le64(&hdr[8]);
+ opt_lba = get_unaligned_le64(&hdr[16]);
+ put_unaligned_be32(list_length, &hdr[0]);
+ hdr[4] = same & 0xf;
+ put_unaligned_be64(max_lba, &hdr[8]);
+ put_unaligned_be64(opt_lba, &hdr[16]);
+ offset += 64;
+ bytes += 64;
+ }
+ while (offset < sg_dma_len(sg)) {
+ char *rec;
+ u8 cond, type, non_seq, reset;
+ u64 size, start, wp;
+
+ rec = sg_virt(sg) + offset;
+ type = rec[0] & 0xf;
+ cond = (rec[1] >> 4) & 0xf;
+ non_seq = (rec[1] & 2);
+ reset = (rec[1] & 1);
+ size = get_unaligned_le64(&rec[8]);
+ start = get_unaligned_le64(&rec[16]);
+ wp = get_unaligned_le64(&rec[24]);
+ rec[0] = type;
+ rec[1] = (cond << 4) | non_seq | reset;
+ put_unaligned_be64(size, &rec[8]);
+ put_unaligned_be64(start, &rec[16]);
+ put_unaligned_be64(wp, &rec[24]);
+ WARN_ON(offset + 64 > sg_dma_len(sg));
+ offset += 64;
+ bytes += 64;
+ }
+ }
+}
+
+static inline void spraid_handle_ata_cmd(struct spraid_dev *hdev,
+ struct scsi_cmnd *scmd,
+ struct spraid_iod *iod)
+{
+ if (hdev->ctrl_info->card_type != SPRAID_CARD_HBA)
+ return;
+
+ switch (scmd->cmnd[0]) {
+ case ZBC_IN:
+ dev_info(hdev->dev, "[%s] process report zone\n", __func__);
+ spraid_sata_report_zone_handle(scmd, iod);
+ break;
+ default:
+ break;
+ }
+}
+
+static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct blk_mq_tags *tags;
+ struct scsi_cmnd *scmd;
+ struct spraid_iod *iod;
+ struct request *req;
+ unsigned long elapsed;
+
+ tags = hdev->shost->tag_set.tags[ioq->qid - 1];
+ req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
+ if (unlikely(!req || !blk_mq_request_started(req))) {
+ dev_warn(hdev->dev, "Invalid id %d completed on queue %d\n",
+ cqe->cmd_id, ioq->qid);
+ return;
+ }
+
+ scmd = blk_mq_rq_to_pdu(req);
+ iod = scsi_cmd_priv(scmd);
+
+ elapsed = jiffies - scmd->jiffies_at_alloc;
+ dev_log_dbg(hdev->dev,
+ "cid[%d] qid[%d] finish IO cost %3ld.%3ld seconds\n",
+ cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
+
+ if (cmpxchg(&iod->state, SPRAID_CMD_IN_FLIGHT, SPRAID_CMD_COMPLETE) !=
+ SPRAID_CMD_IN_FLIGHT) {
+ dev_warn(hdev->dev,
+ "cid[%d] qid[%d] enters abnormal handler;"
+ " cost %3ld.%3ld seconds\n",
+ cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
+ WRITE_ONCE(iod->state, SPRAID_CMD_TMO_COMPLETE);
+
+ if (iod->nsge) {
+ iod->nsge = 0;
+ scsi_dma_unmap(scmd);
+ }
+ spraid_free_iod_res(hdev, iod);
+
+ return;
+ }
+
+ spraid_handle_ata_cmd(hdev, scmd, iod);
+
+ spraid_map_status(iod, scmd, cqe);
+ if (iod->nsge) {
+ iod->nsge = 0;
+ scsi_dma_unmap(scmd);
+ }
+ spraid_free_iod_res(hdev, iod);
+ scmd->scsi_done(scmd);
+}
+
+static void spraid_complete_adminq_cmnd(struct spraid_queue *adminq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = adminq->hdev;
+ struct spraid_cmd *adm_cmd;
+
+ adm_cmd = hdev->adm_cmds + cqe->cmd_id;
+ if (unlikely(adm_cmd->state == SPRAID_CMD_IDLE)) {
+ dev_warn(adminq->hdev->dev,
+ "Invalid id %d completed on queue %d\n",
+ cqe->cmd_id, le16_to_cpu(cqe->sq_id));
+ return;
+ }
+
+ adm_cmd->status = le16_to_cpu(cqe->status) >> 1;
+ adm_cmd->result0 = le32_to_cpu(cqe->result);
+ adm_cmd->result1 = le32_to_cpu(cqe->result1);
+
+ complete(&adm_cmd->cmd_done);
+}
+
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid);
+
+static void spraid_complete_aen(struct spraid_queue *spraidq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+ u32 result = le32_to_cpu(cqe->result);
+
+ dev_info(hdev->dev, "rcv aen, cid[%d], status[0x%x], result[0x%x]\n",
+ cqe->cmd_id, le16_to_cpu(cqe->status) >> 1, result);
+
+ spraid_send_aen(hdev, cqe->cmd_id);
+
+ if ((le16_to_cpu(cqe->status) >> 1) != SPRAID_SC_SUCCESS)
+ return;
+ switch (result & 0x7) {
+ case SPRAID_AEN_NOTICE:
+ spraid_handle_aen_notice(hdev, result);
+ break;
+ case SPRAID_AEN_VS:
+ spraid_handle_aen_vs(hdev, result, le32_to_cpu(cqe->result1));
+ break;
+ default:
+ dev_warn(hdev->dev, "Unsupported async event type: %u\n",
+ result & 0x7);
+ break;
+ }
+}
+
+static void spraid_complete_ioq_sync_cmnd(struct spraid_queue *ioq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct spraid_cmd *ptcmd;
+
+ ptcmd = hdev->ioq_ptcmds + (ioq->qid - 1) * SPRAID_PTCMDS_PERQ +
+ cqe->cmd_id - SPRAID_IO_BLK_MQ_DEPTH;
+
+ ptcmd->status = le16_to_cpu(cqe->status) >> 1;
+ ptcmd->result0 = le32_to_cpu(cqe->result);
+ ptcmd->result1 = le32_to_cpu(cqe->result1);
+
+ complete(&ptcmd->cmd_done);
+}
+
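+/*
+ * Demultiplex one completion by command id: ids at or above the blk-mq
+ * depth on the admin queue belong to AEN commands, ids at or above the
+ * io depth on an io queue belong to internal passthrough commands, and
+ * everything else is a regular admin or scsi command.
+ */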
+static inline void spraid_handle_cqe(struct spraid_queue *spraidq, u16 idx)
+{
+ struct spraid_completion *cqe = &spraidq->cqes[idx];
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ if (unlikely(cqe->cmd_id >= spraidq->q_depth)) {
+ dev_err(hdev->dev,
+ "Invalid command id[%d] completed on queue %d\n",
+			cqe->cmd_id, le16_to_cpu(cqe->sq_id));
+ return;
+ }
+
+ dev_log_dbg(hdev->dev, "cid[%d] qid[%d];"
+ " result[0x%x], sq_id[%d], status[0x%x]\n",
+ cqe->cmd_id, spraidq->qid, le32_to_cpu(cqe->result),
+ le16_to_cpu(cqe->sq_id), le16_to_cpu(cqe->status));
+
+	if (unlikely(spraidq->qid == 0 &&
+		     cqe->cmd_id >= SPRAID_AQ_BLK_MQ_DEPTH)) {
+ spraid_complete_aen(spraidq, cqe);
+ return;
+ }
+
+ if (unlikely(spraidq->qid && cqe->cmd_id >= SPRAID_IO_BLK_MQ_DEPTH)) {
+ spraid_complete_ioq_sync_cmnd(spraidq, cqe);
+ return;
+ }
+
+ if (spraidq->qid)
+ spraid_complete_ioq_cmnd(spraidq, cqe);
+ else
+ spraid_complete_adminq_cmnd(spraidq, cqe);
+}
+
+static void spraid_complete_cqes(struct spraid_queue *spraidq,
+ u16 start, u16 end)
+{
+ while (start != end) {
+ spraid_handle_cqe(spraidq, start);
+ if (++start == spraidq->q_depth)
+ start = 0;
+ }
+}
+
+static inline void spraid_update_cq_head(struct spraid_queue *spraidq)
+{
+ if (++spraidq->cq_head == spraidq->q_depth) {
+ spraidq->cq_head = 0;
+ spraidq->cq_phase = !spraidq->cq_phase;
+ }
+}
+
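+/*
+ * Consume completion entries whose phase bit matches the queue's
+ * current phase, stopping early once @tag is seen, and ring the CQ head
+ * doorbell once for the whole batch.
+ */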
+static inline bool spraid_process_cq(struct spraid_queue *spraidq,
+ u16 *start, u16 *end, int tag)
+{
+ bool found = false;
+
+ *start = spraidq->cq_head;
+ while (!found && spraid_cqe_pending(spraidq)) {
+ if (spraidq->cqes[spraidq->cq_head].cmd_id == tag)
+ found = true;
+ spraid_update_cq_head(spraidq);
+ }
+ *end = spraidq->cq_head;
+
+ if (*start != *end)
+ writel(spraidq->cq_head,
+ spraidq->q_db + spraidq->hdev->db_stride);
+
+ return found;
+}
+
+static bool spraid_poll_cq(struct spraid_queue *spraidq, int cid)
+{
+ u16 start, end;
+ bool found;
+
+ if (!spraid_cqe_pending(spraidq))
+		return false;
+
+ spin_lock_irq(&spraidq->cq_lock);
+ found = spraid_process_cq(spraidq, &start, &end, cid);
+ spin_unlock_irq(&spraidq->cq_lock);
+
+ spraid_complete_cqes(spraidq, start, end);
+ return found;
+}
+
+static irqreturn_t spraid_irq(int irq, void *data)
+{
+ struct spraid_queue *spraidq = data;
+ irqreturn_t ret = IRQ_NONE;
+ u16 start, end;
+
+ spin_lock(&spraidq->cq_lock);
+ if (spraidq->cq_head != spraidq->last_cq_head)
+ ret = IRQ_HANDLED;
+
+ spraid_process_cq(spraidq, &start, &end, -1);
+ spraidq->last_cq_head = spraidq->cq_head;
+ spin_unlock(&spraidq->cq_lock);
+
+ if (start != end) {
+ spraid_complete_cqes(spraidq, start, end);
+ ret = IRQ_HANDLED;
+ }
+ return ret;
+}
+
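+/*
+ * Bring up the admin queue: disable the controller, program AQA/ASQ/ACQ
+ * with the queue depth and DMA addresses, re-enable the controller and
+ * request the admin interrupt.
+ */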
+static int spraid_setup_admin_queue(struct spraid_dev *hdev)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u32 aqa;
+ int ret;
+
+ dev_info(hdev->dev, "[%s] start disable ctrl\n", __func__);
+
+ ret = spraid_disable_ctrl(hdev);
+ if (ret)
+ return ret;
+
+ ret = spraid_alloc_queue(hdev, 0, SPRAID_AQ_DEPTH);
+ if (ret)
+ return ret;
+
+ aqa = adminq->q_depth - 1;
+ aqa |= aqa << 16;
+ writel(aqa, hdev->bar + SPRAID_REG_AQA);
+ lo_hi_writeq(adminq->sq_dma_addr, hdev->bar + SPRAID_REG_ASQ);
+ lo_hi_writeq(adminq->cq_dma_addr, hdev->bar + SPRAID_REG_ACQ);
+
+ dev_info(hdev->dev, "[%s] start enable ctrl\n", __func__);
+
+ ret = spraid_enable_ctrl(hdev);
+ if (ret) {
+ ret = -ENODEV;
+ goto free_queue;
+ }
+
+ adminq->cq_vector = 0;
+ spraid_init_queue(adminq, 0);
+ ret = pci_request_irq(hdev->pdev, adminq->cq_vector, spraid_irq, NULL,
+ adminq, "spraid%d_q%d",
+ hdev->instance, adminq->qid);
+
+ if (ret) {
+ adminq->cq_vector = -1;
+ hdev->online_queues--;
+ goto free_queue;
+ }
+
+ dev_info(hdev->dev, "[%s] success, queuecount:[%d], onlinequeue:[%d]\n",
+ __func__, hdev->queue_count, hdev->online_queues);
+
+ return 0;
+
+free_queue:
+ spraid_free_queue(adminq);
+ return ret;
+}
+
+static u32 spraid_bar_size(struct spraid_dev *hdev, u32 nr_ioqs)
+{
+ return (SPRAID_REG_DBS + ((nr_ioqs + 1) * 8 * hdev->db_stride));
+}
+
+static int spraid_alloc_admin_cmds(struct spraid_dev *hdev)
+{
+ int i;
+
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+ spin_lock_init(&hdev->adm_cmd_lock);
+
+ hdev->adm_cmds = kcalloc_node(SPRAID_AQ_BLK_MQ_DEPTH,
+ sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
+
+ if (!hdev->adm_cmds) {
+ dev_err(hdev->dev, "Alloc admin cmds failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < SPRAID_AQ_BLK_MQ_DEPTH; i++) {
+ hdev->adm_cmds[i].qid = 0;
+ hdev->adm_cmds[i].cid = i;
+ list_add_tail(&(hdev->adm_cmds[i].list), &hdev->adm_cmd_list);
+ }
+
+ dev_info(hdev->dev, "Alloc admin cmds success, num[%d]\n",
+ SPRAID_AQ_BLK_MQ_DEPTH);
+
+ return 0;
+}
+
+static void spraid_free_admin_cmds(struct spraid_dev *hdev)
+{
+ kfree(hdev->adm_cmds);
+ hdev->adm_cmds = NULL;
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+}
+
+static struct spraid_cmd *spraid_get_cmd(struct spraid_dev *hdev,
+ enum spraid_cmd_type type)
+{
+ struct spraid_cmd *cmd = NULL;
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
+
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
+
+ spin_lock_irqsave(slock, flags);
+ if (list_empty(head)) {
+ spin_unlock_irqrestore(slock, flags);
+ dev_err(hdev->dev, "err, cmd[%d] list empty\n", type);
+ return NULL;
+ }
+ cmd = list_entry(head->next, struct spraid_cmd, list);
+ list_del_init(&cmd->list);
+ spin_unlock_irqrestore(slock, flags);
+
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IN_FLIGHT);
+
+ return cmd;
+}
+
+static void spraid_put_cmd(struct spraid_dev *hdev, struct spraid_cmd *cmd,
+ enum spraid_cmd_type type)
+{
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
+
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
+
+ spin_lock_irqsave(slock, flags);
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IDLE);
+ list_add_tail(&cmd->list, head);
+ spin_unlock_irqrestore(slock, flags);
+}
+
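+/*
+ * Synchronous admin command: take a preallocated command slot, post it
+ * on queue 0 and sleep on its completion, ADMIN_TIMEOUT by default.
+ */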
+static int spraid_submit_admin_sync_cmd(struct spraid_dev *hdev,
+ struct spraid_admin_command *cmd,
+ u32 *result0, u32 *result1, u32 timeout)
+{
+ struct spraid_cmd *adm_cmd = spraid_get_cmd(hdev, SPRAID_CMD_ADM);
+
+ if (!adm_cmd) {
+ dev_err(hdev->dev, "err, get admin cmd failed\n");
+ return -EFAULT;
+ }
+
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
+
+ init_completion(&adm_cmd->cmd_done);
+
+ cmd->common.command_id = adm_cmd->cid;
+ spraid_submit_cmd(&hdev->queues[0], cmd);
+
+ if (!wait_for_completion_timeout(&adm_cmd->cmd_done, timeout)) {
+ dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout;"
+ " opcode[0x%x] subopcode[0x%x]\n",
+ __func__, adm_cmd->cid, adm_cmd->qid,
+ cmd->usr_cmd.opcode, cmd->usr_cmd.info_0.subopcode);
+ WRITE_ONCE(adm_cmd->state, SPRAID_CMD_TIMEOUT);
+ spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+ return -EINVAL;
+ }
+
+ if (result0)
+ *result0 = adm_cmd->result0;
+ if (result1)
+ *result1 = adm_cmd->result1;
+
+ spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+
+ return adm_cmd->status;
+}
+
+static int spraid_create_cq(struct spraid_dev *hdev, u16 qid,
+ struct spraid_queue *spraidq, u16 cq_vector)
+{
+ struct spraid_admin_command admin_cmd;
+ int flags = SPRAID_QUEUE_PHYS_CONTIG | SPRAID_CQ_IRQ_ENABLED;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.create_cq.opcode = SPRAID_ADMIN_CREATE_CQ;
+ admin_cmd.create_cq.prp1 = cpu_to_le64(spraidq->cq_dma_addr);
+ admin_cmd.create_cq.cqid = cpu_to_le16(qid);
+ admin_cmd.create_cq.qsize = cpu_to_le16(spraidq->q_depth - 1);
+ admin_cmd.create_cq.cq_flags = cpu_to_le16(flags);
+ admin_cmd.create_cq.irq_vector = cpu_to_le16(cq_vector);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+static int spraid_create_sq(struct spraid_dev *hdev, u16 qid,
+ struct spraid_queue *spraidq)
+{
+ struct spraid_admin_command admin_cmd;
+ int flags = SPRAID_QUEUE_PHYS_CONTIG;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.create_sq.opcode = SPRAID_ADMIN_CREATE_SQ;
+ admin_cmd.create_sq.prp1 = cpu_to_le64(spraidq->sq_dma_addr);
+ admin_cmd.create_sq.sqid = cpu_to_le16(qid);
+ admin_cmd.create_sq.qsize = cpu_to_le16(spraidq->q_depth - 1);
+ admin_cmd.create_sq.sq_flags = cpu_to_le16(flags);
+ admin_cmd.create_sq.cqid = cpu_to_le16(qid);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+static void spraid_free_queue(struct spraid_queue *spraidq)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ hdev->queue_count--;
+ dma_free_coherent(hdev->dev, CQ_SIZE(spraidq->q_depth),
+ (void *)spraidq->cqes, spraidq->cq_dma_addr);
+ dma_free_coherent(hdev->dev, SQ_SIZE(spraidq->qid, spraidq->q_depth),
+ spraidq->sq_cmds, spraidq->sq_dma_addr);
+ dma_free_coherent(hdev->dev, SENSE_SIZE(spraidq->q_depth),
+ spraidq->sense, spraidq->sense_dma_addr);
+}
+
+static void spraid_free_admin_queue(struct spraid_dev *hdev)
+{
+ spraid_free_queue(&hdev->queues[0]);
+}
+
+static void spraid_free_io_queues(struct spraid_dev *hdev)
+{
+ int i;
+
+ for (i = hdev->queue_count - 1; i >= 1; i--)
+ spraid_free_queue(&hdev->queues[i]);
+}
+
+static int spraid_delete_queue(struct spraid_dev *hdev, u8 op, u16 id)
+{
+ struct spraid_admin_command admin_cmd;
+ int ret;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.delete_queue.opcode = op;
+ admin_cmd.delete_queue.qid = cpu_to_le16(id);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+
+ if (ret)
+ dev_err(hdev->dev, "Delete %s:[%d] failed\n",
+ (op == SPRAID_ADMIN_DELETE_CQ) ? "cq" : "sq", id);
+
+ return ret;
+}
+
+static int spraid_delete_cq(struct spraid_dev *hdev, u16 cqid)
+{
+ return spraid_delete_queue(hdev, SPRAID_ADMIN_DELETE_CQ, cqid);
+}
+
+static int spraid_delete_sq(struct spraid_dev *hdev, u16 sqid)
+{
+ return spraid_delete_queue(hdev, SPRAID_ADMIN_DELETE_SQ, sqid);
+}
+
+static int spraid_create_queue(struct spraid_queue *spraidq, u16 qid)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+ u16 cq_vector;
+ int ret;
+
+ cq_vector = (hdev->num_vecs == 1) ? 0 : qid;
+ ret = spraid_create_cq(hdev, qid, spraidq, cq_vector);
+ if (ret)
+ return ret;
+
+ ret = spraid_create_sq(hdev, qid, spraidq);
+ if (ret)
+ goto delete_cq;
+
+ spraid_init_queue(spraidq, qid);
+ spraidq->cq_vector = cq_vector;
+
+ ret = pci_request_irq(hdev->pdev, cq_vector, spraid_irq, NULL,
+ spraidq, "spraid%d_q%d", hdev->instance, qid);
+
+ if (ret) {
+ dev_err(hdev->dev, "Request queue[%d] irq failed\n", qid);
+ goto delete_sq;
+ }
+
+ return 0;
+
+delete_sq:
+ spraidq->cq_vector = -1;
+ hdev->online_queues--;
+ spraid_delete_sq(hdev, qid);
+delete_cq:
+ spraid_delete_cq(hdev, qid);
+
+ return ret;
+}
+
+static int spraid_create_io_queues(struct spraid_dev *hdev)
+{
+ u32 i, max;
+ int ret = 0;
+
+ max = min(hdev->max_qid, hdev->queue_count - 1);
+ for (i = hdev->online_queues; i <= max; i++) {
+ ret = spraid_create_queue(&hdev->queues[i], i);
+ if (ret) {
+ dev_err(hdev->dev, "Create queue[%d] failed\n", i);
+ break;
+ }
+ }
+
+ dev_info(hdev->dev, "[%s] queue_count[%d], online_queue[%d]",
+ __func__, hdev->queue_count, hdev->online_queues);
+
+ return ret >= 0 ? 0 : ret;
+}
+
+static int spraid_set_features(struct spraid_dev *hdev, u32 fid,
+ u32 dword11, void *buffer,
+ size_t buflen, u32 *result)
+{
+ struct spraid_admin_command admin_cmd;
+ int ret;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+
+ if (buffer && buflen) {
+ data_ptr = dma_alloc_coherent(hdev->dev, buflen,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memcpy(data_ptr, buffer, buflen);
+ }
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.features.opcode = SPRAID_ADMIN_SET_FEATURES;
+ admin_cmd.features.fid = cpu_to_le32(fid);
+ admin_cmd.features.dword11 = cpu_to_le32(dword11);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, result, NULL, 0);
+
+ if (data_ptr)
+ dma_free_coherent(hdev->dev, buflen, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_configure_timestamp(struct spraid_dev *hdev)
+{
+ __le64 ts;
+ int ret;
+
+ ts = cpu_to_le64(ktime_to_ms(ktime_get_real()));
+ ret = spraid_set_features(hdev, SPRAID_FEAT_TIMESTAMP,
+ 0, &ts, sizeof(ts), NULL);
+
+ if (ret)
+ dev_err(hdev->dev, "set timestamp failed: %d\n", ret);
+ return ret;
+}
+
+static int spraid_set_queue_cnt(struct spraid_dev *hdev, u32 *cnt)
+{
+ u32 q_cnt = (*cnt - 1) | ((*cnt - 1) << 16);
+ u32 nr_ioqs, result;
+ int status;
+
+ status = spraid_set_features(hdev, SPRAID_FEAT_NUM_QUEUES,
+ q_cnt, NULL, 0, &result);
+ if (status) {
+ dev_err(hdev->dev, "Set queue count failed, status: %d\n",
+ status);
+ return -EIO;
+ }
+
+ nr_ioqs = min(result & 0xffff, result >> 16) + 1;
+ *cnt = min(*cnt, nr_ioqs);
+ if (*cnt == 0) {
+ dev_err(hdev->dev, "Illegal queue count: zero\n");
+ return -EIO;
+ }
+ return 0;
+}
+
+static int spraid_setup_io_queues(struct spraid_dev *hdev)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ struct pci_dev *pdev = hdev->pdev;
+ u32 nr_ioqs = num_online_cpus();
+ u32 i, size;
+ int ret;
+
+ struct irq_affinity affd = {
+ .pre_vectors = 1
+ };
+
+ ret = spraid_set_queue_cnt(hdev, &nr_ioqs);
+ if (ret < 0)
+ return ret;
+
+ size = spraid_bar_size(hdev, nr_ioqs);
+ ret = spraid_remap_bar(hdev, size);
+ if (ret)
+ return -ENOMEM;
+
+ adminq->q_db = hdev->dbs;
+
+ pci_free_irq(pdev, 0, adminq);
+ pci_free_irq_vectors(pdev);
+
+ ret = pci_alloc_irq_vectors_affinity(pdev, 1, (nr_ioqs + 1),
+ PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
+ if (ret <= 0)
+ return -EIO;
+
+ hdev->num_vecs = ret;
+
+ hdev->max_qid = max(ret - 1, 1);
+
+ ret = pci_request_irq(pdev, adminq->cq_vector, spraid_irq, NULL,
+ adminq, "spraid%d_q%d",
+ hdev->instance, adminq->qid);
+ if (ret) {
+ dev_err(hdev->dev, "Request admin irq failed\n");
+ adminq->cq_vector = -1;
+ return ret;
+ }
+
+ for (i = hdev->queue_count; i <= hdev->max_qid; i++) {
+ ret = spraid_alloc_queue(hdev, i, hdev->ioq_depth);
+ if (ret)
+ break;
+ }
+ dev_info(hdev->dev, "[%s] max_qid: %d, queue_count: %d;"
+ " online_queue: %d, ioq_depth: %d\n",
+ __func__, hdev->max_qid, hdev->queue_count,
+ hdev->online_queues, hdev->ioq_depth);
+
+ return spraid_create_io_queues(hdev);
+}
+
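+/*
+ * Tear down io queues in two passes, deleting every submission queue
+ * before any completion queue, as the queue pair protocol requires.
+ */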
+static void spraid_delete_io_queues(struct spraid_dev *hdev)
+{
+ u16 queues = hdev->online_queues - 1;
+ u8 opcode = SPRAID_ADMIN_DELETE_SQ;
+ u16 i, pass;
+
+ if (!pci_device_is_present(hdev->pdev)) {
+ dev_err(hdev->dev,
+ "pci_device is not present, skip disable io queues\n");
+ return;
+ }
+
+ if (hdev->online_queues < 2) {
+ dev_err(hdev->dev, "[%s] err, io queue has been delete\n",
+ __func__);
+ return;
+ }
+
+ for (pass = 0; pass < 2; pass++) {
+ for (i = queues; i > 0; i--)
+ if (spraid_delete_queue(hdev, opcode, i))
+ break;
+
+ opcode = SPRAID_ADMIN_DELETE_CQ;
+ }
+}
+
+static void spraid_remove_io_queues(struct spraid_dev *hdev)
+{
+ spraid_delete_io_queues(hdev);
+ spraid_free_io_queues(hdev);
+}
+
+static void spraid_pci_disable(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ u32 i;
+
+ for (i = 0; i < hdev->online_queues; i++)
+ pci_free_irq(pdev, hdev->queues[i].cq_vector, &hdev->queues[i]);
+ pci_free_irq_vectors(pdev);
+ if (pci_is_enabled(pdev)) {
+ pci_disable_pcie_error_reporting(pdev);
+ pci_disable_device(pdev);
+ }
+ hdev->online_queues = 0;
+}
+
+static void spraid_disable_admin_queue(struct spraid_dev *hdev, bool shutdown)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u16 start, end;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ if (shutdown)
+ spraid_shutdown_ctrl(hdev);
+ else
+ spraid_disable_ctrl(hdev);
+ }
+
+ if (hdev->queue_count == 0) {
+ dev_err(hdev->dev, "[%s] err, admin queue has been delete\n",
+ __func__);
+ return;
+ }
+
+ spin_lock_irq(&adminq->cq_lock);
+ spraid_process_cq(adminq, &start, &end, -1);
+ spin_unlock_irq(&adminq->cq_lock);
+
+ spraid_complete_cqes(adminq, start, end);
+ spraid_free_admin_queue(hdev);
+}
+
+static int spraid_create_dma_pools(struct spraid_dev *hdev)
+{
+ int i;
+ char poolname[20] = { 0 };
+
+ hdev->prp_page_pool = dma_pool_create("prp list page", hdev->dev,
+ PAGE_SIZE, PAGE_SIZE, 0);
+
+ if (!hdev->prp_page_pool) {
+ dev_err(hdev->dev, "create prp_page_pool failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < small_pool_num; i++) {
+		snprintf(poolname, sizeof(poolname), "prp_list_256_%d", i);
+ hdev->prp_small_pool[i] =
+ dma_pool_create(poolname, hdev->dev, SMALL_POOL_SIZE,
+ SMALL_POOL_SIZE, 0);
+
+ if (!hdev->prp_small_pool[i]) {
+ dev_err(hdev->dev, "create prp_small_pool %d failed\n",
+ i);
+ goto destroy_prp_small_pool;
+ }
+ }
+
+ return 0;
+
+destroy_prp_small_pool:
+ while (i > 0)
+ dma_pool_destroy(hdev->prp_small_pool[--i]);
+ dma_pool_destroy(hdev->prp_page_pool);
+
+ return -ENOMEM;
+}
+
+static void spraid_destroy_dma_pools(struct spraid_dev *hdev)
+{
+ int i;
+
+ for (i = 0; i < small_pool_num; i++)
+ dma_pool_destroy(hdev->prp_small_pool[i]);
+ dma_pool_destroy(hdev->prp_page_pool);
+}
+
+static int spraid_get_dev_list(struct spraid_dev *hdev,
+ struct spraid_dev_info *devices)
+{
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ struct spraid_admin_command admin_cmd;
+ struct spraid_dev_list *list_buf;
+ dma_addr_t data_dma = 0;
+ u32 i, idx, hdid, ndev;
+ int ret = 0;
+
+ list_buf = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!list_buf)
+ return -ENOMEM;
+
+ for (idx = 0; idx < nd;) {
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
+ admin_cmd.get_info.type = SPRAID_GET_INFO_DEV_LIST;
+ admin_cmd.get_info.cdw11 = cpu_to_le32(idx);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd,
+ NULL, NULL, 0);
+
+ if (ret) {
+ dev_err(hdev->dev, "Get device list failed, nd: %u;"
+ "idx: %u, ret: %d\n",
+ nd, idx, ret);
+ goto out;
+ }
+ ndev = le32_to_cpu(list_buf->dev_num);
+
+ dev_info(hdev->dev, "ndev numbers: %u\n", ndev);
+
+ for (i = 0; i < ndev; i++) {
+ hdid = le32_to_cpu(list_buf->devices[i].hdid);
+ dev_info(hdev->dev, "list_buf->devices[%d], hdid: %u;"
+ "target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ i, hdid,
+ le16_to_cpu(list_buf->devices[i].target),
+ list_buf->devices[i].channel,
+ list_buf->devices[i].lun,
+ list_buf->devices[i].attr);
+ if (hdid > nd || hdid == 0) {
+ dev_err(hdev->dev, "err, hdid[%d] invalid\n",
+ hdid);
+ continue;
+ }
+ memcpy(&devices[hdid - 1], &list_buf->devices[i],
+ sizeof(struct spraid_dev_info));
+ }
+ idx += ndev;
+
+ if (idx < MAX_DEV_ENTRY_PER_PAGE_4K)
+ break;
+ }
+
+out:
+ dma_free_coherent(hdev->dev, PAGE_SIZE, list_buf, data_dma);
+ return ret;
+}
+
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.common.opcode = SPRAID_ADMIN_ASYNC_EVENT;
+ admin_cmd.common.command_id = cid;
+
+ spraid_submit_cmd(adminq, &admin_cmd);
+ dev_info(hdev->dev, "send aen, cid[%d]\n", cid);
+}
+
+static inline void spraid_send_all_aen(struct spraid_dev *hdev)
+{
+ u16 i;
+
+ for (i = 0; i < hdev->ctrl_info->aerl; i++)
+ spraid_send_aen(hdev, i + SPRAID_AQ_BLK_MQ_DEPTH);
+}
+
+static int spraid_add_device(struct spraid_dev *hdev,
+ struct spraid_dev_info *device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ dev_info(hdev->dev, "add device, hdid: %u target: %d, channel: %d;"
+ " lun: %d, attr[0x%x]\n",
+ le32_to_cpu(device->hdid), le16_to_cpu(device->target),
+ device->channel, device->lun, device->attr);
+
+ sdev = scsi_device_lookup(shost, device->channel,
+ le16_to_cpu(device->target), 0);
+ if (sdev) {
+ dev_warn(hdev->dev, "Device is already exist, channel: %d;"
+ " target_id: %d, lun: %d\n",
+ device->channel, le16_to_cpu(device->target), 0);
+ scsi_device_put(sdev);
+ return -EEXIST;
+ }
+ scsi_add_device(shost, device->channel, le16_to_cpu(device->target), 0);
+ return 0;
+}
+
+static int spraid_rescan_device(struct spraid_dev *hdev,
+ struct spraid_dev_info *device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ dev_info(hdev->dev, "rescan device, hdid: %u target: %d, channel: %d;"
+ " lun: %d, attr[0x%x]\n",
+ le32_to_cpu(device->hdid), le16_to_cpu(device->target),
+ device->channel, device->lun, device->attr);
+
+ sdev = scsi_device_lookup(shost, device->channel,
+ le16_to_cpu(device->target), 0);
+ if (!sdev) {
+ dev_warn(hdev->dev, "device is not exit rescan it, channel: %d;"
+ " target_id: %d, lun: %d\n",
+ device->channel, le16_to_cpu(device->target), 0);
+ return -ENODEV;
+ }
+
+ scsi_rescan_device(&sdev->sdev_gendev);
+ scsi_device_put(sdev);
+ return 0;
+}
+
+static int spraid_remove_device(struct spraid_dev *hdev,
+ struct spraid_dev_info *org_device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ dev_info(hdev->dev, "remove device, hdid: %u target: %d, channel: %d;"
+ " lun: %d, attr[0x%x]\n",
+ le32_to_cpu(org_device->hdid), le16_to_cpu(org_device->target),
+ org_device->channel, org_device->lun, org_device->attr);
+
+ sdev = scsi_device_lookup(shost, org_device->channel,
+ le16_to_cpu(org_device->target), 0);
+ if (!sdev) {
+ dev_warn(hdev->dev, "device is not exit remove it, channel: %d;"
+ " target_id: %d, lun: %d\n",
+ org_device->channel,
+ le16_to_cpu(org_device->target), 0);
+ return -ENODEV;
+ }
+
+ scsi_remove_device(sdev);
+ scsi_device_put(sdev);
+ return 0;
+}
+
+static int spraid_dev_list_init(struct spraid_dev *hdev)
+{
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ int i, ret;
+
+ hdev->devices = kzalloc_node(nd * sizeof(struct spraid_dev_info),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->devices)
+ return -ENOMEM;
+
+ ret = spraid_get_dev_list(hdev, hdev->devices);
+ if (ret) {
+ dev_err(hdev->dev,
+ "Ignore failure of getting device list;"
+ " within initialization\n");
+ return 0;
+ }
+
+ for (i = 0; i < nd; i++) {
+ if (SPRAID_DEV_INFO_FLAG_VALID(hdev->devices[i].flag) &&
+ SPRAID_DEV_INFO_ATTR_BOOT(hdev->devices[i].attr)) {
+ spraid_add_device(hdev, &hdev->devices[i]);
+ break;
+ }
+ }
+ return 0;
+}
+
+static int luntarget_cmp_func(const void *l, const void *r)
+{
+ const struct spraid_dev_info *ln = l;
+ const struct spraid_dev_info *rn = r;
+
+ if (ln->channel == rn->channel)
+ return le16_to_cpu(ln->target) - le16_to_cpu(rn->target);
+
+ return ln->channel - rn->channel;
+}
+
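+/*
+ * Hot-plug scan worker: diff the freshly fetched device list against
+ * the cached one, remove vanished devices, rescan changed ones and add
+ * new ones sorted by channel/target so device names stay stable.
+ */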
+static void spraid_scan_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, scan_work);
+ struct spraid_dev_info *devices, *org_devices;
+ struct spraid_dev_info *sortdevice;
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ u8 flag, org_flag;
+ int i, ret;
+ int count = 0;
+
+ devices = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
+ if (!devices)
+ return;
+
+ sortdevice = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
+ if (!sortdevice)
+ goto free_list;
+
+ ret = spraid_get_dev_list(hdev, devices);
+ if (ret)
+ goto free_all;
+ org_devices = hdev->devices;
+ for (i = 0; i < nd; i++) {
+ org_flag = org_devices[i].flag;
+ flag = devices[i].flag;
+
+ dev_log_dbg(hdev->dev, "i: %d, org_flag: 0x%x, flag: 0x%x\n",
+ i, org_flag, flag);
+
+ if (SPRAID_DEV_INFO_FLAG_VALID(flag)) {
+ if (!SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
+ down_write(&hdev->devices_rwsem);
+ memcpy(&org_devices[i], &devices[i],
+ sizeof(struct spraid_dev_info));
+ memcpy(&sortdevice[count++], &devices[i],
+ sizeof(struct spraid_dev_info));
+ up_write(&hdev->devices_rwsem);
+ } else if (SPRAID_DEV_INFO_FLAG_CHANGE(flag)) {
+ spraid_rescan_device(hdev, &devices[i]);
+ }
+ } else {
+ if (SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
+ down_write(&hdev->devices_rwsem);
+ org_devices[i].flag &= 0xfe;
+ up_write(&hdev->devices_rwsem);
+ spraid_remove_device(hdev, &org_devices[i]);
+ }
+ }
+ }
+
+ dev_info(hdev->dev, "scan work add device count = %d\n", count);
+
+ sort(sortdevice, count, sizeof(sortdevice[0]),
+ luntarget_cmp_func, NULL);
+
+ for (i = 0; i < count; i++)
+ spraid_add_device(hdev, &sortdevice[i]);
+
+free_all:
+ kfree(sortdevice);
+free_list:
+ kfree(devices);
+}
+
+static void spraid_timesyn_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, timesyn_work);
+
+ spraid_configure_timestamp(hdev);
+}
+
+static int spraid_init_ctrl_info(struct spraid_dev *hdev);
+static void spraid_fw_act_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, fw_act_work);
+
+ if (spraid_init_ctrl_info(hdev))
+ dev_err(hdev->dev, "get ctrl info failed after fw act\n");
+}
+
+static void spraid_queue_scan(struct spraid_dev *hdev)
+{
+ queue_work(spraid_wq, &hdev->scan_work);
+}
+
+static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result)
+{
+ switch ((result & 0xff00) >> 8) {
+ case SPRAID_AEN_DEV_CHANGED:
+ spraid_queue_scan(hdev);
+ break;
+ case SPRAID_AEN_FW_ACT_START:
+ dev_info(hdev->dev, "fw activation starting\n");
+ break;
+ case SPRAID_AEN_HOST_PROBING:
+ break;
+ default:
+ dev_warn(hdev->dev, "async event result %08x\n", result);
+ }
+}
+
+static void spraid_handle_aen_vs(struct spraid_dev *hdev,
+ u32 result, u32 result1)
+{
+ switch ((result & 0xff00) >> 8) {
+ case SPRAID_AEN_TIMESYN:
+ queue_work(spraid_wq, &hdev->timesyn_work);
+ break;
+ case SPRAID_AEN_FW_ACT_FINISH:
+ dev_info(hdev->dev, "fw activation finish\n");
+ queue_work(spraid_wq, &hdev->fw_act_work);
+ break;
+ case SPRAID_AEN_EVENT_MIN ... SPRAID_AEN_EVENT_MAX:
+ dev_info(hdev->dev, "rcv card event[%d];"
+ " param1[0x%x] param2[0x%x]\n",
+ (result & 0xff00) >> 8, result, result1);
+ break;
+ default:
+ dev_warn(hdev->dev, "async event result: 0x%x\n", result);
+ }
+}
+
+static int spraid_alloc_resources(struct spraid_dev *hdev)
+{
+ int ret, nqueue;
+
+ ret = ida_alloc(&spraid_instance_ida, GFP_KERNEL);
+ if (ret < 0) {
+ dev_err(hdev->dev, "Get instance id failed\n");
+ return ret;
+ }
+ hdev->instance = ret;
+
+ hdev->ctrl_info = kzalloc_node(sizeof(*hdev->ctrl_info),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->ctrl_info) {
+ ret = -ENOMEM;
+ goto release_instance;
+ }
+
+ ret = spraid_create_dma_pools(hdev);
+ if (ret)
+ goto free_ctrl_info;
+ nqueue = num_possible_cpus() + 1;
+ hdev->queues = kcalloc_node(nqueue, sizeof(struct spraid_queue),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->queues) {
+ ret = -ENOMEM;
+ goto destroy_dma_pools;
+ }
+
+ ret = spraid_alloc_admin_cmds(hdev);
+ if (ret)
+ goto free_queues;
+
+ dev_info(hdev->dev, "[%s] queues num: %d\n", __func__, nqueue);
+
+ return 0;
+
+free_queues:
+ kfree(hdev->queues);
+destroy_dma_pools:
+ spraid_destroy_dma_pools(hdev);
+free_ctrl_info:
+ kfree(hdev->ctrl_info);
+release_instance:
+ ida_free(&spraid_instance_ida, hdev->instance);
+ return ret;
+}
+
+static void spraid_free_resources(struct spraid_dev *hdev)
+{
+ spraid_free_admin_cmds(hdev);
+ kfree(hdev->queues);
+ spraid_destroy_dma_pools(hdev);
+ kfree(hdev->ctrl_info);
+ ida_free(&spraid_instance_ida, hdev->instance);
+}
+
+static void spraid_bsg_unmap_data(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir =
+ rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+
+ if (iod->nsge)
+ dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
+
+ spraid_free_iod_res(hdev, iod);
+}
+
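+/*
+ * Map the bsg payload scatterlist for DMA and describe it with PRPs in
+ * the passthrough command.
+ */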
+static int spraid_bsg_map_data(struct spraid_dev *hdev, struct bsg_job *job,
+ struct spraid_admin_command *cmd)
+{
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir =
+ rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ int ret = 0;
+
+ iod->sg = job->request_payload.sg_list;
+ iod->nsge = job->request_payload.sg_cnt;
+ iod->length = job->request_payload.payload_len;
+ iod->use_sgl = false;
+ iod->npages = -1;
+ iod->sg_drv_mgmt = false;
+
+ if (!iod->nsge)
+ goto out;
+
+ ret = dma_map_sg_attrs(hdev->dev, iod->sg, iod->nsge,
+ dma_dir, DMA_ATTR_NO_WARN);
+ if (!ret)
+ goto out;
+
+ ret = spraid_setup_prps(hdev, iod);
+ if (ret)
+ goto unmap;
+
+ cmd->common.dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
+ cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
+
+ return 0;
+
+unmap:
+ dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
+out:
+ return ret;
+}
+
+static int spraid_get_ctrl_info(struct spraid_dev *hdev,
+ struct spraid_ctrl_info *ctrl_info)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
+ admin_cmd.get_info.type = SPRAID_GET_INFO_CTRL;
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(ctrl_info, data_ptr, sizeof(struct spraid_ctrl_info));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_init_ctrl_info(struct spraid_dev *hdev)
+{
+ int ret;
+
+ hdev->ctrl_info->nd = cpu_to_le32(240);
+ hdev->ctrl_info->mdts = 8;
+ hdev->ctrl_info->max_cmds = cpu_to_le16(4096);
+ hdev->ctrl_info->max_num_sge = cpu_to_le16(128);
+ hdev->ctrl_info->max_channel = cpu_to_le16(4);
+ hdev->ctrl_info->max_tgt_id = cpu_to_le32(3239);
+ hdev->ctrl_info->max_lun = cpu_to_le16(2);
+
+ ret = spraid_get_ctrl_info(hdev, hdev->ctrl_info);
+ if (ret)
+ dev_err(hdev->dev, "get controller info failed: %d\n", ret);
+
+ dev_info(hdev->dev, "[%s]nd = %d\n", __func__, hdev->ctrl_info->nd);
+ dev_info(hdev->dev, "[%s]max_cmd = %d\n",
+ __func__, hdev->ctrl_info->max_cmds);
+ dev_info(hdev->dev, "[%s]max_channel = %d\n",
+ __func__, hdev->ctrl_info->max_channel);
+ dev_info(hdev->dev, "[%s]max_tgt_id = %d\n",
+ __func__, hdev->ctrl_info->max_tgt_id);
+ dev_info(hdev->dev, "[%s]max_lun = %d\n",
+ __func__, hdev->ctrl_info->max_lun);
+ dev_info(hdev->dev, "[%s]max_num_sge = %d\n",
+ __func__, hdev->ctrl_info->max_num_sge);
+ dev_info(hdev->dev, "[%s]lun_num_boot = %d\n",
+ __func__, hdev->ctrl_info->lun_num_in_boot);
+ dev_info(hdev->dev, "[%s]mdts = %d\n", __func__, hdev->ctrl_info->mdts);
+ dev_info(hdev->dev, "[%s]acl = %d\n", __func__, hdev->ctrl_info->acl);
+ dev_info(hdev->dev, "[%s]aer1 = %d\n", __func__, hdev->ctrl_info->aerl);
+ dev_info(hdev->dev, "[%s]card_type = %d\n",
+ __func__, hdev->ctrl_info->card_type);
+ dev_info(hdev->dev, "[%s]rtd3e = %d\n",
+ __func__, hdev->ctrl_info->rtd3e);
+ dev_info(hdev->dev, "[%s]sn = %s\n", __func__, hdev->ctrl_info->sn);
+ dev_info(hdev->dev, "[%s]fr = %s\n", __func__, hdev->ctrl_info->fr);
+
+ if (!hdev->ctrl_info->aerl)
+ hdev->ctrl_info->aerl = 1;
+ if (hdev->ctrl_info->aerl > SPRAID_NR_AEN_COMMANDS)
+ hdev->ctrl_info->aerl = SPRAID_NR_AEN_COMMANDS;
+
+ return 0;
+}
+
+#define SPRAID_MAX_ADMIN_PAYLOAD_SIZE BIT(16)
+static int spraid_alloc_iod_ext_mem_pool(struct spraid_dev *hdev)
+{
+ u16 max_sge = le16_to_cpu(hdev->ctrl_info->max_num_sge);
+ size_t alloc_size;
+
+ alloc_size = spraid_iod_ext_size(hdev, SPRAID_MAX_ADMIN_PAYLOAD_SIZE,
+ max_sge, true, false);
+ if (alloc_size > PAGE_SIZE)
+ dev_warn(hdev->dev, "It is unreasonable ;"
+ " sg allocation more than one page\n");
+ hdev->iod_mempool = mempool_create_node(1, mempool_kmalloc,
+ mempool_kfree,
+ (void *)alloc_size, GFP_KERNEL,
+ hdev->numa_node);
+ if (!hdev->iod_mempool) {
+ dev_err(hdev->dev, "Create iod extension memory pool failed\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void spraid_free_iod_ext_mem_pool(struct spraid_dev *hdev)
+{
+ mempool_destroy(hdev->iod_mempool);
+}
+
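+/* handle a SPRAID_BSG_ADM request as a synchronous admin command */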
+static int spraid_user_admin_cmd(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct spraid_bsg_request *bsg_req = job->request;
+ struct spraid_passthru_common_cmd *cmd = &(bsg_req->admcmd);
+ struct spraid_admin_command admin_cmd;
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
+ u32 result[2] = {0};
+ int status;
+
+ if (hdev->state >= SPRAID_RESETTING) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not right\n",
+ __func__, hdev->state);
+ return -EBUSY;
+ }
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode);
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.common.opcode = cmd->opcode;
+ admin_cmd.common.flags = cmd->flags;
+ admin_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ admin_cmd.common.cdw2[0] = cpu_to_le32(cmd->cdw2);
+ admin_cmd.common.cdw2[1] = cpu_to_le32(cmd->cdw3);
+ admin_cmd.common.cdw10 = cpu_to_le32(cmd->cdw10);
+ admin_cmd.common.cdw11 = cpu_to_le32(cmd->cdw11);
+ admin_cmd.common.cdw12 = cpu_to_le32(cmd->cdw12);
+ admin_cmd.common.cdw13 = cpu_to_le32(cmd->cdw13);
+ admin_cmd.common.cdw14 = cpu_to_le32(cmd->cdw14);
+ admin_cmd.common.cdw15 = cpu_to_le32(cmd->cdw15);
+
+ status = spraid_bsg_map_data(hdev, job, &admin_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
+ }
+
+ status = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, &result[0],
+ &result[1], timeout);
+ if (status >= 0) {
+ job->reply_len = sizeof(result);
+ memcpy(job->reply, result, sizeof(result));
+ }
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x];"
+ " status[0x%x] result0[0x%x] result1[0x%x]\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode,
+ status, result[0], result[1]);
+
+ spraid_bsg_unmap_data(hdev, job);
+
+ return status;
+}
+
+static int spraid_alloc_ioq_ptcmds(struct spraid_dev *hdev)
+{
+ int i;
+ int ptnum = SPRAID_NR_IOQ_PTCMDS;
+
+ INIT_LIST_HEAD(&hdev->ioq_pt_list);
+ spin_lock_init(&hdev->ioq_pt_lock);
+
+ hdev->ioq_ptcmds = kcalloc_node(ptnum, sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
+
+ if (!hdev->ioq_ptcmds) {
+ dev_err(hdev->dev, "Alloc ioq_ptcmds failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < ptnum; i++) {
+ hdev->ioq_ptcmds[i].qid = i / SPRAID_PTCMDS_PERQ + 1;
+ hdev->ioq_ptcmds[i].cid = i % SPRAID_PTCMDS_PERQ
+ + SPRAID_IO_BLK_MQ_DEPTH;
+ list_add_tail(&(hdev->ioq_ptcmds[i].list), &hdev->ioq_pt_list);
+ }
+
+ dev_info(hdev->dev, "Alloc ioq_ptcmds success, ptnum[%d]\n", ptnum);
+
+ return 0;
+}
+
+static void spraid_free_ioq_ptcmds(struct spraid_dev *hdev)
+{
+ kfree(hdev->ioq_ptcmds);
+ hdev->ioq_ptcmds = NULL;
+
+ INIT_LIST_HEAD(&hdev->ioq_pt_list);
+}
+
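+/*
+ * Submit a passthrough command on its reserved I/O queue slot and wait
+ * for completion; sense data, when returned, is copied back via 'result'.
+ */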
+static int spraid_submit_ioq_sync_cmd(struct spraid_dev *hdev,
+ struct spraid_ioq_command *cmd,
+ u32 *result, u32 *reslen, u32 timeout)
+{
+ int ret;
+ dma_addr_t sense_dma;
+ struct spraid_queue *ioq;
+ void *sense_addr = NULL;
+ struct spraid_cmd *pt_cmd = spraid_get_cmd(hdev, SPRAID_CMD_IOPT);
+
+ if (!pt_cmd) {
+ dev_err(hdev->dev, "err, get ioq cmd failed\n");
+ return -EFAULT;
+ }
+
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
+
+ init_completion(&pt_cmd->cmd_done);
+
+ ioq = &hdev->queues[pt_cmd->qid];
+ ret = pt_cmd->cid * SCSI_SENSE_BUFFERSIZE;
+ sense_addr = ioq->sense + ret;
+ sense_dma = ioq->sense_dma_addr + ret;
+
+ cmd->common.sense_addr = cpu_to_le64(sense_dma);
+ cmd->common.sense_len = cpu_to_le16(SCSI_SENSE_BUFFERSIZE);
+ cmd->common.command_id = pt_cmd->cid;
+
+ spraid_submit_cmd(ioq, cmd);
+
+ if (!wait_for_completion_timeout(&pt_cmd->cmd_done, timeout)) {
+ dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout;"
+ " opcode[0x%x] subopcode[0x%x]\n",
+ __func__, pt_cmd->cid, pt_cmd->qid, cmd->common.opcode,
+ (le32_to_cpu(cmd->common.cdw3[0]) & 0xffff));
+ WRITE_ONCE(pt_cmd->state, SPRAID_CMD_TIMEOUT);
+ spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
+ return -EINVAL;
+ }
+
+ if (result && reslen) {
+ if ((pt_cmd->status & 0x17f) == 0x101) {
+ memcpy(result, sense_addr, SCSI_SENSE_BUFFERSIZE);
+ *reslen = SCSI_SENSE_BUFFERSIZE;
+ }
+ }
+
+ spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
+
+ return pt_cmd->status;
+}
+
+static int spraid_user_ioq_cmd(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct spraid_bsg_request *bsg_req =
+ (struct spraid_bsg_request *)(job->request);
+ struct spraid_ioq_passthru_cmd *cmd = &(bsg_req->ioqcmd);
+ struct spraid_ioq_command ioq_cmd;
+ int status = 0;
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
+
+ if (cmd->data_len > PAGE_SIZE) {
+ dev_err(hdev->dev, "[%s] data len bigger than 4k\n", __func__);
+ return -EFAULT;
+ }
+
+ if (hdev->state != SPRAID_LIVE) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not live\n",
+ __func__, hdev->state);
+ return -EBUSY;
+ }
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init;"
+ " datalen[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode, cmd->data_len);
+
+ memset(&ioq_cmd, 0, sizeof(ioq_cmd));
+ ioq_cmd.common.opcode = cmd->opcode;
+ ioq_cmd.common.flags = cmd->flags;
+ ioq_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ ioq_cmd.common.sense_len = cpu_to_le16(cmd->info_0.res_sense_len);
+ ioq_cmd.common.cdb_len = cmd->info_0.cdb_len;
+ ioq_cmd.common.rsvd2 = cmd->info_0.rsvd0;
+ ioq_cmd.common.cdw3[0] = cpu_to_le32(cmd->cdw3);
+ ioq_cmd.common.cdw3[1] = cpu_to_le32(cmd->cdw4);
+ ioq_cmd.common.cdw3[2] = cpu_to_le32(cmd->cdw5);
+
+ ioq_cmd.common.cdw10[0] = cpu_to_le32(cmd->cdw10);
+ ioq_cmd.common.cdw10[1] = cpu_to_le32(cmd->cdw11);
+ ioq_cmd.common.cdw10[2] = cpu_to_le32(cmd->cdw12);
+ ioq_cmd.common.cdw10[3] = cpu_to_le32(cmd->cdw13);
+ ioq_cmd.common.cdw10[4] = cpu_to_le32(cmd->cdw14);
+ ioq_cmd.common.cdw10[5] = cpu_to_le32(cmd->data_len);
+
+ memcpy(ioq_cmd.common.cdb, &cmd->cdw16, cmd->info_0.cdb_len);
+
+ ioq_cmd.common.cdw26[0] = cpu_to_le32(cmd->cdw26[0]);
+ ioq_cmd.common.cdw26[1] = cpu_to_le32(cmd->cdw26[1]);
+ ioq_cmd.common.cdw26[2] = cpu_to_le32(cmd->cdw26[2]);
+ ioq_cmd.common.cdw26[3] = cpu_to_le32(cmd->cdw26[3]);
+
+ status = spraid_bsg_map_data(hdev, job,
+ (struct spraid_admin_command *)&ioq_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
+ }
+
+ status = spraid_submit_ioq_sync_cmd(hdev, &ioq_cmd, job->reply,
+ &job->reply_len, timeout);
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x], status[0x%x];"
+ " reply_len[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode,
+ status, job->reply_len);
+
+ spraid_bsg_unmap_data(hdev, job);
+
+ return status;
+}
+
+static bool spraid_check_scmd_completed(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_queue *spraidq;
+ u16 hwq, cid;
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ spraidq = &hdev->queues[hwq];
+ if (READ_ONCE(iod->state) == SPRAID_CMD_COMPLETE
+ || spraid_poll_cq(spraidq, cid)) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] has been completed\n",
+ cid, spraidq->qid);
+ return true;
+ }
+ return false;
+}
+
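+/*
+ * Block layer timeout callback: if the command is still in flight past
+ * its deadline, mark it timed out so SCSI error handling takes over;
+ * otherwise keep the timer running.
+ */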
+static enum blk_eh_timer_return spraid_scmd_timeout(struct scsi_cmnd *scmd)
+{
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ unsigned int timeout = scmd->device->request_queue->rq_timeout;
+
+ if (spraid_check_scmd_completed(scmd))
+ goto out;
+
+ if (time_after(jiffies, scmd->jiffies_at_alloc + timeout)) {
+ if (cmpxchg(&iod->state,
+ SPRAID_CMD_IN_FLIGHT,
+ SPRAID_CMD_TIMEOUT) == SPRAID_CMD_IN_FLIGHT) {
+ return BLK_EH_DONE;
+ }
+ }
+out:
+ return BLK_EH_RESET_TIMER;
+}
+
+/* for now, the abort command is sent via the admin queue */
+static int spraid_send_abort_cmd(struct spraid_dev *hdev,
+ u32 hdid, u16 qid, u16 cid)
+{
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.abort.opcode = SPRAID_ADMIN_ABORT_CMD;
+ admin_cmd.abort.hdid = cpu_to_le32(hdid);
+ admin_cmd.abort.sqid = cpu_to_le16(qid);
+ admin_cmd.abort.cid = cpu_to_le16(cid);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+/* for now, the reset command is sent via the admin queue */
+static int spraid_send_reset_cmd(struct spraid_dev *hdev, int type, u32 hdid)
+{
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.reset.opcode = SPRAID_ADMIN_RESET;
+ admin_cmd.reset.hdid = cpu_to_le32(hdid);
+ admin_cmd.reset.type = type;
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
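+/*
+ * Host state machine: NEW/RESETTING -> LIVE, LIVE -> RESETTING,
+ * anything but DELETING -> DELETING, NEW/LIVE/RESETTING -> DEAD.
+ * Returns true if the transition was legal and applied.
+ */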
+static bool spraid_change_host_state(struct spraid_dev *hdev,
+ enum spraid_state newstate)
+{
+ unsigned long flags;
+ enum spraid_state oldstate;
+ bool change = false;
+
+ spin_lock_irqsave(&hdev->state_lock, flags);
+
+ oldstate = hdev->state;
+ switch (newstate) {
+ case SPRAID_LIVE:
+ switch (oldstate) {
+ case SPRAID_NEW:
+ case SPRAID_RESETTING:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ case SPRAID_RESETTING:
+ switch (oldstate) {
+ case SPRAID_LIVE:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ case SPRAID_DELETING:
+ if (oldstate != SPRAID_DELETING)
+ change = true;
+ break;
+ case SPRAID_DEAD:
+ switch (oldstate) {
+ case SPRAID_NEW:
+ case SPRAID_LIVE:
+ case SPRAID_RESETTING:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ default:
+ break;
+ }
+ if (change)
+ hdev->state = newstate;
+ spin_unlock_irqrestore(&hdev->state_lock, flags);
+
+ dev_info(hdev->dev, "[%s][%d]->[%d], change[%d]\n",
+ __func__, oldstate, newstate, change);
+
+ return change;
+}
+
+static void spraid_back_fault_cqe(struct spraid_queue *ioq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct blk_mq_tags *tags;
+ struct scsi_cmnd *scmd;
+ struct spraid_iod *iod;
+ struct request *req;
+
+ tags = hdev->shost->tag_set.tags[ioq->qid - 1];
+ req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
+ if (unlikely(!req || !blk_mq_request_started(req)))
+ return;
+
+ scmd = blk_mq_rq_to_pdu(req);
+ iod = scsi_cmd_priv(scmd);
+
+ set_host_byte(scmd, DID_NO_CONNECT);
+ if (iod->nsge)
+ scsi_dma_unmap(scmd);
+ spraid_free_iod_res(hdev, iod);
+ scmd->scsi_done(scmd);
+ dev_warn(hdev->dev, "Back fault CQE, cid[%d] qid[%d]\n",
+ cqe->cmd_id, ioq->qid);
+}
+
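+/*
+ * Complete every started command on all I/O queues with DID_NO_CONNECT,
+ * blocking new requests while the outstanding ones are failed back.
+ */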
+static void spraid_back_all_io(struct spraid_dev *hdev)
+{
+ int i, j;
+ struct spraid_queue *ioq;
+ struct spraid_completion cqe = { 0 };
+
+ scsi_block_requests(hdev->shost);
+
+ for (i = 1; i <= hdev->shost->nr_hw_queues; i++) {
+ ioq = &hdev->queues[i];
+ for (j = 0; j < hdev->shost->can_queue; j++) {
+ cqe.cmd_id = j;
+ spraid_back_fault_cqe(ioq, &cqe);
+ }
+ }
+
+ scsi_unblock_requests(hdev->shost);
+}
+
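+/*
+ * Quiesce the controller (shutdown or plain disable), wait up to 600s
+ * for it to leave the ready state, reap the admin completion queue,
+ * then tear down PCI resources and fail back all outstanding I/O.
+ */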
+static void spraid_dev_disable(struct spraid_dev *hdev, bool shutdown)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u16 start, end;
+ unsigned long timeout = jiffies + 600 * HZ;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ if (shutdown)
+ spraid_shutdown_ctrl(hdev);
+ else
+ spraid_disable_ctrl(hdev);
+ }
+
+ while (!time_after(jiffies, timeout)) {
+ if (!pci_device_is_present(hdev->pdev)) {
+ dev_info(hdev->dev, "[%s] pci_device not present;"
+ " skip wait\n", __func__);
+ break;
+ }
+ if (!spraid_wait_ready(hdev, hdev->cap, false)) {
+ dev_info(hdev->dev,
+ "[%s] wait ready success after reset\n",
+ __func__);
+ break;
+ }
+ dev_info(hdev->dev, "[%s] waiting csts_rdy ready\n", __func__);
+ }
+
+ if (hdev->queue_count == 0) {
+ dev_err(hdev->dev, "[%s] warn, queue has been delete\n",
+ __func__);
+ return;
+ }
+
+ spin_lock_irq(&adminq->cq_lock);
+ spraid_process_cq(adminq, &start, &end, -1);
+ spin_unlock_irq(&adminq->cq_lock);
+ spraid_complete_cqes(adminq, start, end);
+
+ spraid_pci_disable(hdev);
+
+ spraid_back_all_io(hdev);
+}
+
+static void spraid_reset_work(struct work_struct *work)
+{
+ int ret;
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, reset_work);
+
+ if (hdev->state != SPRAID_RESETTING) {
+ dev_err(hdev->dev, "[%s] err, host is not reset state\n",
+ __func__);
+ return;
+ }
+
+ dev_info(hdev->dev, "[%s] enter host reset\n", __func__);
+
+ if (hdev->ctrl_config & SPRAID_CC_ENABLE) {
+ dev_info(hdev->dev, "[%s] start dev_disable\n", __func__);
+ spraid_dev_disable(hdev, false);
+ }
+
+ ret = spraid_pci_enable(hdev);
+ if (ret)
+ goto out;
+
+ ret = spraid_setup_admin_queue(hdev);
+ if (ret)
+ goto pci_disable;
+
+ ret = spraid_setup_io_queues(hdev);
+ if (ret || hdev->online_queues <= hdev->shost->nr_hw_queues)
+ goto pci_disable;
+
+ spraid_change_host_state(hdev, SPRAID_LIVE);
+
+ spraid_send_all_aen(hdev);
+
+ return;
+
+pci_disable:
+ spraid_pci_disable(hdev);
+out:
+ spraid_change_host_state(hdev, SPRAID_DEAD);
+ dev_err(hdev->dev, "[%s] err, host reset failed\n", __func__);
+}
+
+static int spraid_reset_work_sync(struct spraid_dev *hdev)
+{
+ if (!spraid_change_host_state(hdev, SPRAID_RESETTING)) {
+ dev_info(hdev->dev, "[%s] can't change to reset state\n",
+ __func__);
+ return -EBUSY;
+ }
+
+ if (!queue_work(spraid_wq, &hdev->reset_work)) {
+ dev_err(hdev->dev, "[%s] err, host is already in reset state\n",
+ __func__);
+ return -EBUSY;
+ }
+
+ flush_work(&hdev->reset_work);
+ if (hdev->state != SPRAID_LIVE)
+ return -ENODEV;
+
+ return 0;
+}
+
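+/* poll (500ms per round) for an aborted/reset command to be completed */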
+static int spraid_wait_abnl_cmd_done(struct spraid_iod *iod)
+{
+ u16 times = 0;
+
+ do {
+ if (READ_ONCE(iod->state) == SPRAID_CMD_TMO_COMPLETE)
+ break;
+ msleep(500);
+ times++;
+ } while (times <= SPRAID_WAIT_ABNL_CMD_TIMEOUT);
+
+ /* command still not completed after a successful abort/reset */
+ if (times >= SPRAID_WAIT_ABNL_CMD_TIMEOUT)
+ return -ETIMEDOUT;
+
+ return 0;
+}
+
+static int spraid_abort_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, aborting\n", cid, hwq);
+ ret = spraid_send_abort_cmd(hdev, hostdata->hdid, hwq, cid);
+ if (ret != ADMIN_ERR_TIMEOUT) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed;"
+ " not found\n", cid, hwq);
+ return FAILED;
+ }
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort succ\n", cid, hwq);
+ return SUCCESS;
+ }
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed, timeout\n",
+ cid, hwq);
+ return FAILED;
+}
+
+static int spraid_tgt_reset_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, target reset\n",
+ cid, hwq);
+ ret = spraid_send_reset_cmd(hdev, SPRAID_RESET_TARGET, hostdata->hdid);
+ if (ret == 0) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev,
+ "cid[%d] qid[%d]target reset failed;"
+ " not found\n", cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] target reset success\n",
+ cid, hwq);
+ return SUCCESS;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] ret[%d] target reset failed\n",
+ cid, hwq, ret);
+ return FAILED;
+}
+
+static int spraid_bus_reset_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, bus reset\n", cid, hwq);
+ ret = spraid_send_reset_cmd(hdev, SPRAID_RESET_BUS, hostdata->hdid);
+ if (ret == 0) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev,
+ "cid[%d] qid[%d] bus reset failed;"
+ " not found\n", cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] bus reset succ\n",
+ cid, hwq);
+ return SUCCESS;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] ret[%d] bus reset failed\n",
+ cid, hwq, ret);
+ return FAILED;
+}
+
+static int spraid_shost_reset_handler(struct scsi_cmnd *scmd)
+{
+ u16 hwq, cid;
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+
+ scsi_print_command(scmd);
+ if (hdev->state != SPRAID_LIVE || spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset\n", cid, hwq);
+
+ if (spraid_reset_work_sync(hdev)) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset failed\n",
+ cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset success\n", cid, hwq);
+
+ return SUCCESS;
+}
+
+static pci_ers_result_t spraid_pci_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "enter pci error detect, state:%d\n", state);
+
+ switch (state) {
+ case pci_channel_io_normal:
+ dev_warn(hdev->dev, "channel is normal, do nothing\n");
+
+ return PCI_ERS_RESULT_CAN_RECOVER;
+ case pci_channel_io_frozen:
+ dev_warn(hdev->dev,
+ "channel io frozen, need reset controller\n");
+
+ scsi_block_requests(hdev->shost);
+
+ spraid_change_host_state(hdev, SPRAID_RESETTING);
+
+ return PCI_ERS_RESULT_NEED_RESET;
+ case pci_channel_io_perm_failure:
+ dev_warn(hdev->dev, "channel io failure, request disconnect\n");
+
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+
+ return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t spraid_pci_slot_reset(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "restart after slot reset\n");
+
+ pci_restore_state(pdev);
+
+ if (!queue_work(spraid_wq, &hdev->reset_work)) {
+ dev_err(hdev->dev, "[%s] err, the device is resetting state\n",
+ __func__);
+ return PCI_ERS_RESULT_NONE;
+ }
+
+ flush_work(&hdev->reset_work);
+
+ scsi_unblock_requests(hdev->shost);
+
+ return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void spraid_reset_done(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "enter spraid reset done\n");
+}
+
+static ssize_t csts_pp_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS)
+ & SPRAID_CSTS_PP_MASK);
+ ret >>= SPRAID_CSTS_PP_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_shst_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS)
+ & SPRAID_CSTS_SHST_MASK);
+ ret >>= SPRAID_CSTS_SHST_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_cfs_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS)
+ & SPRAID_CSTS_CFS_MASK);
+ ret >>= SPRAID_CSTS_CFS_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_rdy_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev))
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_RDY);
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t fw_version_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+
+ return snprintf(buf, PAGE_SIZE, "%s\n", hdev->ctrl_info->fr);
+}
+
+static DEVICE_ATTR_RO(csts_pp);
+static DEVICE_ATTR_RO(csts_shst);
+static DEVICE_ATTR_RO(csts_cfs);
+static DEVICE_ATTR_RO(csts_rdy);
+static DEVICE_ATTR_RO(fw_version);
+
+static struct device_attribute *spraid_host_attrs[] = {
+ &dev_attr_csts_pp,
+ &dev_attr_csts_shst,
+ &dev_attr_csts_cfs,
+ &dev_attr_csts_rdy,
+ &dev_attr_fw_version,
+ NULL,
+};
+
+static int spraid_get_vd_info(struct spraid_dev *hdev,
+ struct spraid_vd_info *vd_info, u16 vid)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.usr_cmd.opcode = USR_CMD_READ;
+ admin_cmd.usr_cmd.info_0.subopcode = cpu_to_le16(USR_CMD_VDINFO);
+ admin_cmd.usr_cmd.info_1.data_len = cpu_to_le16(USR_CMD_RDLEN);
+ admin_cmd.usr_cmd.info_1.param_len = cpu_to_le16(VDINFO_PARAM_LEN);
+ admin_cmd.usr_cmd.cdw10 = cpu_to_le32(vid);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(vd_info, data_ptr, sizeof(struct spraid_vd_info));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_get_bgtask(struct spraid_dev *hdev,
+ struct spraid_bgtask *bgtask)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.usr_cmd.opcode = USR_CMD_READ;
+ admin_cmd.usr_cmd.info_0.subopcode = cpu_to_le16(USR_CMD_BGTASK);
+ admin_cmd.usr_cmd.info_1.data_len = cpu_to_le16(USR_CMD_RDLEN);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(bgtask, data_ptr, sizeof(struct spraid_bgtask));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static ssize_t raid_level_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_sdev_hostdata *hostdata;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret)
+ vd_info->rg_level = ARRAY_SIZE(raid_levels) - 1;
+
+ ret = (vd_info->rg_level < ARRAY_SIZE(raid_levels)) ?
+ vd_info->rg_level : (ARRAY_SIZE(raid_levels) - 1);
+
+ kfree(vd_info);
+
+ return snprintf(buf, PAGE_SIZE, "RAID-%s\n", raid_levels[ret]);
+}
+
+static ssize_t raid_state_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_sdev_hostdata *hostdata;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret) {
+ vd_info->vd_status = 0;
+ vd_info->rg_id = 0xff;
+ }
+
+ ret = (vd_info->vd_status < ARRAY_SIZE(raid_states)) ?
+ vd_info->vd_status : 0;
+
+ kfree(vd_info);
+
+ return snprintf(buf, PAGE_SIZE, "%s\n", raid_states[ret]);
+}
+
+static ssize_t raid_resync_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_bgtask *bgtask;
+ struct spraid_sdev_hostdata *hostdata;
+ u8 rg_id, i, progress = 0;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret)
+ goto out;
+
+ rg_id = vd_info->rg_id;
+
+ bgtask = (struct spraid_bgtask *)vd_info;
+ ret = spraid_get_bgtask(hdev, bgtask);
+ if (ret)
+ goto out;
+ for (i = 0; i < bgtask->task_num; i++) {
+ if ((bgtask->bgtask[i].type == BGTASK_TYPE_REBUILD) &&
+ (le16_to_cpu(bgtask->bgtask[i].vd_id) == rg_id))
+ progress = bgtask->bgtask[i].progress;
+ }
+
+out:
+ kfree(vd_info);
+ return snprintf(buf, PAGE_SIZE, "%d\n", progress);
+}
+
+static DEVICE_ATTR_RO(raid_level);
+static DEVICE_ATTR_RO(raid_state);
+static DEVICE_ATTR_RO(raid_resync);
+
+static struct device_attribute *spraid_dev_attrs[] = {
+ &dev_attr_raid_level,
+ &dev_attr_raid_state,
+ &dev_attr_raid_resync,
+ NULL,
+};
+
+static struct pci_error_handlers spraid_err_handler = {
+ .error_detected = spraid_pci_error_detected,
+ .slot_reset = spraid_pci_slot_reset,
+ .reset_done = spraid_reset_done,
+};
+
+static int spraid_sysfs_host_reset(struct Scsi_Host *shost, int reset_type)
+{
+ int ret;
+ struct spraid_dev *hdev = shost_priv(shost);
+
+ dev_info(hdev->dev, "[%s] start sysfs host reset cmd\n", __func__);
+ ret = spraid_reset_work_sync(hdev);
+ dev_info(hdev->dev, "[%s] stop sysfs host reset cmd[%d]\n",
+ __func__, ret);
+
+ return ret;
+}
+
+static struct scsi_host_template spraid_driver_template = {
+ .module = THIS_MODULE,
+ .name = "Ramaxel Logic spraid driver",
+ .proc_name = "spraid",
+ .queuecommand = spraid_queue_command,
+ .slave_alloc = spraid_slave_alloc,
+ .slave_destroy = spraid_slave_destroy,
+ .slave_configure = spraid_slave_configure,
+ .eh_timed_out = spraid_scmd_timeout,
+ .eh_abort_handler = spraid_abort_handler,
+ .eh_target_reset_handler = spraid_tgt_reset_handler,
+ .eh_bus_reset_handler = spraid_bus_reset_handler,
+ .eh_host_reset_handler = spraid_shost_reset_handler,
+ .change_queue_depth = scsi_change_queue_depth,
+ .this_id = -1,
+ .shost_attrs = spraid_host_attrs,
+ .sdev_attrs = spraid_dev_attrs,
+ .host_reset = spraid_sysfs_host_reset,
+};
+
+static void spraid_shutdown(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ spraid_remove_io_queues(hdev);
+ spraid_disable_admin_queue(hdev, true);
+}
+
+/* bsg dispatch user command */
+static int spraid_bsg_host_dispatch(struct bsg_job *job)
+{
+ struct Scsi_Host *shost = dev_to_shost(job->dev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_bsg_request *bsg_req = job->request;
+ int ret = 0;
+
+ dev_info(hdev->dev, "[%s] msgcode[%d], msglen[%d], timeout[%d];"
+ " req_nsge[%d], req_len[%d]\n",
+ __func__, bsg_req->msgcode, job->request_len,
+ rq->timeout, job->request_payload.sg_cnt,
+ job->request_payload.payload_len);
+
+ job->reply_len = 0;
+
+ switch (bsg_req->msgcode) {
+ case SPRAID_BSG_ADM:
+ ret = spraid_user_admin_cmd(hdev, job);
+ break;
+ case SPRAID_BSG_IOQ:
+ ret = spraid_user_ioq_cmd(hdev, job);
+ break;
+ default:
+ dev_info(hdev->dev, "[%s] unsupport msgcode[%d]\n",
+ __func__, bsg_req->msgcode);
+ break;
+ }
+
+ if (ret > 0)
+ ret = ret | (ret << 8);
+
+ bsg_job_done(job, ret, 0);
+ return 0;
+}
+
+static inline void spraid_remove_bsg(struct spraid_dev *hdev)
+{
+ if (hdev->bsg_queue) {
+ bsg_unregister_queue(hdev->bsg_queue);
+ blk_cleanup_queue(hdev->bsg_queue);
+ }
+}
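+
+/*
+ * PCI probe: map the device, bring up the admin and I/O queues, then
+ * register the SCSI host and its bsg node.
+ */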
+static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct spraid_dev *hdev;
+ struct Scsi_Host *shost;
+ int node, ret;
+ char bsg_name[15];
+
+ shost = scsi_host_alloc(&spraid_driver_template, sizeof(*hdev));
+ if (!shost) {
+ dev_err(&pdev->dev, "Failed to allocate scsi host\n");
+ return -ENOMEM;
+ }
+ hdev = shost_priv(shost);
+ hdev->pdev = pdev;
+ hdev->dev = get_device(&pdev->dev);
+
+ node = dev_to_node(hdev->dev);
+ if (node == NUMA_NO_NODE) {
+ node = first_memory_node;
+ set_dev_node(hdev->dev, node);
+ }
+ hdev->numa_node = node;
+ hdev->shost = shost;
+ pci_set_drvdata(pdev, hdev);
+
+ ret = spraid_dev_map(hdev);
+ if (ret)
+ goto put_dev;
+
+ init_rwsem(&hdev->devices_rwsem);
+ INIT_WORK(&hdev->scan_work, spraid_scan_work);
+ INIT_WORK(&hdev->timesyn_work, spraid_timesyn_work);
+ INIT_WORK(&hdev->reset_work, spraid_reset_work);
+ INIT_WORK(&hdev->fw_act_work, spraid_fw_act_work);
+ spin_lock_init(&hdev->state_lock);
+
+ ret = spraid_alloc_resources(hdev);
+ if (ret)
+ goto dev_unmap;
+
+ ret = spraid_pci_enable(hdev);
+ if (ret)
+ goto resources_free;
+
+ ret = spraid_setup_admin_queue(hdev);
+ if (ret)
+ goto pci_disable;
+
+ ret = spraid_init_ctrl_info(hdev);
+ if (ret)
+ goto disable_admin_q;
+
+ ret = spraid_alloc_iod_ext_mem_pool(hdev);
+ if (ret)
+ goto disable_admin_q;
+
+ ret = spraid_setup_io_queues(hdev);
+ if (ret)
+ goto free_iod_mempool;
+
+ spraid_shost_init(hdev);
+
+ ret = scsi_add_host(hdev->shost, hdev->dev);
+ if (ret) {
+ dev_err(hdev->dev, "Add shost to system failed, ret: %d\n",
+ ret);
+ goto remove_io_queues;
+ }
+
+ snprintf(bsg_name, sizeof(bsg_name), "spraid%d", shost->host_no);
+ hdev->bsg_queue = bsg_setup_queue(&shost->shost_gendev, bsg_name,
+ spraid_bsg_host_dispatch,
+ spraid_cmd_size(hdev, true, false));
+ if (IS_ERR(hdev->bsg_queue)) {
+ dev_err(hdev->dev, "err, setup bsg failed\n");
+ hdev->bsg_queue = NULL;
+ goto remove_io_queues;
+ }
+
+ if (hdev->online_queues == SPRAID_ADMIN_QUEUE_NUM) {
+ dev_warn(hdev->dev, "warn only admin queue can be used\n");
+ return 0;
+ }
+
+ hdev->state = SPRAID_LIVE;
+
+ spraid_send_all_aen(hdev);
+
+ ret = spraid_dev_list_init(hdev);
+ if (ret)
+ goto remove_bsg;
+
+ ret = spraid_configure_timestamp(hdev);
+ if (ret)
+ dev_warn(hdev->dev, "init set timestamp failed\n");
+
+ ret = spraid_alloc_ioq_ptcmds(hdev);
+ if (ret)
+ goto remove_bsg;
+
+ scsi_scan_host(hdev->shost);
+
+ return 0;
+
+remove_bsg:
+ spraid_remove_bsg(hdev);
+remove_io_queues:
+ spraid_remove_io_queues(hdev);
+free_iod_mempool:
+ spraid_free_iod_ext_mem_pool(hdev);
+disable_admin_q:
+ spraid_disable_admin_queue(hdev, false);
+pci_disable:
+ spraid_pci_disable(hdev);
+resources_free:
+ spraid_free_resources(hdev);
+dev_unmap:
+ spraid_dev_unmap(hdev);
+put_dev:
+ put_device(hdev->dev);
+ scsi_host_put(shost);
+
+ return -ENODEV;
+}
+
+static void spraid_remove(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+ struct Scsi_Host *shost = hdev->shost;
+
+ dev_info(hdev->dev, "enter spraid remove\n");
+
+ spraid_change_host_state(hdev, SPRAID_DELETING);
+ flush_work(&hdev->reset_work);
+
+ if (!pci_device_is_present(pdev))
+ spraid_back_all_io(hdev);
+
+ spraid_remove_bsg(hdev);
+ scsi_remove_host(shost);
+ spraid_free_ioq_ptcmds(hdev);
+ kfree(hdev->devices);
+ spraid_remove_io_queues(hdev);
+ spraid_free_iod_ext_mem_pool(hdev);
+ spraid_disable_admin_queue(hdev, false);
+ spraid_pci_disable(hdev);
+ spraid_free_resources(hdev);
+ spraid_dev_unmap(hdev);
+ put_device(hdev->dev);
+ scsi_host_put(shost);
+
+ dev_info(hdev->dev, "exit spraid remove\n");
+}
+
+static const struct pci_device_id spraid_id_table[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC,
+ SPRAID_SERVER_DEVICE_HBA_DID) },
+ { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC,
+ SPRAID_SERVER_DEVICE_RAID_DID) },
+ { 0, }
+};
+MODULE_DEVICE_TABLE(pci, spraid_id_table);
+
+static struct pci_driver spraid_driver = {
+ .name = "spraid",
+ .id_table = spraid_id_table,
+ .probe = spraid_probe,
+ .remove = spraid_remove,
+ .shutdown = spraid_shutdown,
+ .err_handler = &spraid_err_handler,
+};
+
+static int __init spraid_init(void)
+{
+ int ret;
+
+ spraid_wq = alloc_workqueue("spraid-wq", WQ_UNBOUND | WQ_MEM_RECLAIM |
+ WQ_SYSFS, 0);
+ if (!spraid_wq)
+ return -ENOMEM;
+
+ spraid_class = class_create(THIS_MODULE, "spraid");
+ if (IS_ERR(spraid_class)) {
+ ret = PTR_ERR(spraid_class);
+ goto destroy_wq;
+ }
+
+ ret = pci_register_driver(&spraid_driver);
+ if (ret < 0)
+ goto destroy_class;
+
+ return 0;
+
+destroy_class:
+ class_destroy(spraid_class);
+destroy_wq:
+ destroy_workqueue(spraid_wq);
+
+ return ret;
+}
+
+static void __exit spraid_exit(void)
+{
+ pci_unregister_driver(&spraid_driver);
+ class_destroy(spraid_class);
+ destroy_workqueue(spraid_wq);
+ ida_destroy(&spraid_instance_ida);
+}
+
+MODULE_AUTHOR("songyl(a)ramaxel.com");
+MODULE_DESCRIPTION("Ramaxel Memory Technology SPraid Driver");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(SPRAID_DRV_VERSION);
+module_init(spraid_init);
+module_exit(spraid_exit);
--
2.25.1
1
1
From: Matteo Croce <mcroce(a)microsoft.com>
maillist inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4BULZ
CVE: NA
Reference: https://lore.kernel.org/all/20201110202746.9690-1-mcroce@linux.microsoft.co…
----------------------------------------------------------------------
The kernel cmdline reboot= option offers some control over
how the reboot is issued.
Add handles in sysfs to allow setting these reboot options, so they
can be changed after the system has booted, not only at boot time.
The handles live under <sysfs>/kernel/reboot; they can be read to
get the current configuration and written to alter it.
# cd /sys/kernel/reboot/
# grep . *
cpu:0
force:0
mode:cold
type:acpi
# echo 2 >cpu
# echo yes >force
# echo soft >mode
# echo bios >type
# grep . *
cpu:2
force:1
mode:soft
type:bios
Before setting anything, the store handlers check for the CAP_SYS_BOOT
capability, so it's possible to allow an unprivileged process to change
these settings simply by relaxing the handles' permissions, without
opening them to the world.
Signed-off-by: Matteo Croce <mcroce(a)microsoft.com>
Signed-off-by: Hongyu Li <543306408(a)qq.com>
---
Documentation/ABI/testing/sysfs-kernel-reboot | 32 +++
kernel/reboot.c | 206 ++++++++++++++++++
2 files changed, 238 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-kernel-reboot
diff --git a/Documentation/ABI/testing/sysfs-kernel-reboot b/Documentation/ABI/testing/sysfs-kernel-reboot
new file mode 100644
index 000000000000..837330fb2511
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-reboot
@@ -0,0 +1,32 @@
+What: /sys/kernel/reboot
+Date: November 2020
+KernelVersion: 5.11
+Contact: Matteo Croce <mcroce(a)microsoft.com>
+Description: Interface to set the kernel reboot behavior, similarly to
+ what can be done via the reboot= cmdline option.
+ (see Documentation/admin-guide/kernel-parameters.txt)
+
+What: /sys/kernel/reboot/mode
+Date: November 2020
+KernelVersion: 5.11
+Contact: Matteo Croce <mcroce(a)microsoft.com>
+Description: Reboot mode. Valid values are: cold warm hard soft gpio
+
+What: /sys/kernel/reboot/type
+Date: November 2020
+KernelVersion: 5.11
+Contact: Matteo Croce <mcroce(a)microsoft.com>
+Description: Reboot type. Valid values are: bios acpi kbd triple efi pci
+
+What: /sys/kernel/reboot/cpu
+Date: November 2020
+KernelVersion: 5.11
+Contact: Matteo Croce <mcroce(a)microsoft.com>
+Description: CPU number to use to reboot.
+
+What: /sys/kernel/reboot/force
+Date: November 2020
+KernelVersion: 5.11
+Contact: Matteo Croce <mcroce(a)microsoft.com>
+Description: Don't wait for any other CPUs on reboot and
+ avoid anything that could hang.
diff --git a/kernel/reboot.c b/kernel/reboot.c
index af6f23d8bea1..778979d92dc7 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -594,3 +594,209 @@ static int __init reboot_setup(char *str)
return 1;
}
__setup("reboot=", reboot_setup);
+
+#ifdef CONFIG_SYSFS
+
+#define REBOOT_COLD_STR "cold"
+#define REBOOT_WARM_STR "warm"
+#define REBOOT_HARD_STR "hard"
+#define REBOOT_SOFT_STR "soft"
+#define REBOOT_GPIO_STR "gpio"
+#define REBOOT_UNDEFINED_STR "undefined"
+
+#define BOOT_TRIPLE_STR "triple"
+#define BOOT_KBD_STR "kbd"
+#define BOOT_BIOS_STR "bios"
+#define BOOT_ACPI_STR "acpi"
+#define BOOT_EFI_STR "efi"
+#define BOOT_CF9_FORCE_STR "cf9_force"
+#define BOOT_CF9_SAFE_STR "cf9_safe"
+
+static ssize_t mode_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+ const char *val;
+
+ switch (reboot_mode) {
+ case REBOOT_COLD:
+ val = REBOOT_COLD_STR;
+ break;
+ case REBOOT_WARM:
+ val = REBOOT_WARM_STR;
+ break;
+ case REBOOT_HARD:
+ val = REBOOT_HARD_STR;
+ break;
+ case REBOOT_SOFT:
+ val = REBOOT_SOFT_STR;
+ break;
+ case REBOOT_GPIO:
+ val = REBOOT_GPIO_STR;
+ break;
+ default:
+ val = REBOOT_UNDEFINED_STR;
+ }
+
+ return sprintf(buf, "%s\n", val);
+}
+static ssize_t mode_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ if (!capable(CAP_SYS_BOOT))
+ return -EPERM;
+
+ if (!strncmp(buf, REBOOT_COLD_STR, strlen(REBOOT_COLD_STR)))
+ reboot_mode = REBOOT_COLD;
+ else if (!strncmp(buf, REBOOT_WARM_STR, strlen(REBOOT_WARM_STR)))
+ reboot_mode = REBOOT_WARM;
+ else if (!strncmp(buf, REBOOT_HARD_STR, strlen(REBOOT_HARD_STR)))
+ reboot_mode = REBOOT_HARD;
+ else if (!strncmp(buf, REBOOT_SOFT_STR, strlen(REBOOT_SOFT_STR)))
+ reboot_mode = REBOOT_SOFT;
+ else if (!strncmp(buf, REBOOT_GPIO_STR, strlen(REBOOT_GPIO_STR)))
+ reboot_mode = REBOOT_GPIO;
+ else
+ return -EINVAL;
+
+ return count;
+}
+static struct kobj_attribute reboot_mode_attr = __ATTR_RW(mode);
+
+static ssize_t type_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+ const char *val;
+
+ switch (reboot_type) {
+ case BOOT_TRIPLE:
+ val = BOOT_TRIPLE_STR;
+ break;
+ case BOOT_KBD:
+ val = BOOT_KBD_STR;
+ break;
+ case BOOT_BIOS:
+ val = BOOT_BIOS_STR;
+ break;
+ case BOOT_ACPI:
+ val = BOOT_ACPI_STR;
+ break;
+ case BOOT_EFI:
+ val = BOOT_EFI_STR;
+ break;
+ case BOOT_CF9_FORCE:
+ val = BOOT_CF9_FORCE_STR;
+ break;
+ case BOOT_CF9_SAFE:
+ val = BOOT_CF9_SAFE_STR;
+ break;
+ default:
+ val = REBOOT_UNDEFINED_STR;
+ }
+
+ return sprintf(buf, "%s\n", val);
+}
+static ssize_t type_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ if (!capable(CAP_SYS_BOOT))
+ return -EPERM;
+
+ if (!strncmp(buf, BOOT_TRIPLE_STR, strlen(BOOT_TRIPLE_STR)))
+ reboot_mode = BOOT_TRIPLE;
+ else if (!strncmp(buf, BOOT_KBD_STR, strlen(BOOT_KBD_STR)))
+ reboot_mode = BOOT_KBD;
+ else if (!strncmp(buf, BOOT_BIOS_STR, strlen(BOOT_BIOS_STR)))
+ reboot_mode = BOOT_BIOS;
+ else if (!strncmp(buf, BOOT_ACPI_STR, strlen(BOOT_ACPI_STR)))
+ reboot_mode = BOOT_ACPI;
+ else if (!strncmp(buf, BOOT_EFI_STR, strlen(BOOT_EFI_STR)))
+ reboot_mode = BOOT_EFI;
+ else if (!strncmp(buf, BOOT_CF9_FORCE_STR, strlen(BOOT_CF9_FORCE_STR)))
+ reboot_mode = BOOT_CF9_FORCE;
+ else if (!strncmp(buf, BOOT_CF9_SAFE_STR, strlen(BOOT_CF9_SAFE_STR)))
+ reboot_mode = BOOT_CF9_SAFE;
+ else
+ return -EINVAL;
+
+ return count;
+}
+static struct kobj_attribute reboot_type_attr = __ATTR_RW(type);
+
+static ssize_t cpu_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%d\n", reboot_cpu);
+}
+static ssize_t cpu_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ unsigned int cpunum;
+ int rc;
+
+ if (!capable(CAP_SYS_BOOT))
+ return -EPERM;
+
+ rc = kstrtouint(buf, 0, &cpunum);
+
+ if (rc)
+ return rc;
+
+ if (cpunum >= num_possible_cpus())
+ return -ERANGE;
+
+ reboot_cpu = cpunum;
+
+ return count;
+}
+static struct kobj_attribute reboot_cpu_attr = __ATTR_RW(cpu);
+
+static ssize_t force_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%d\n", reboot_force);
+}
+static ssize_t force_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ bool res;
+
+ if (!capable(CAP_SYS_BOOT))
+ return -EPERM;
+
+ if (kstrtobool(buf, &res))
+ return -EINVAL;
+
+ reboot_force = res;
+
+ return count;
+}
+static struct kobj_attribute reboot_force_attr = __ATTR_RW(force);
+
+static struct attribute *reboot_attrs[] = {
+ &reboot_mode_attr.attr,
+ &reboot_type_attr.attr,
+ &reboot_cpu_attr.attr,
+ &reboot_force_attr.attr,
+ NULL,
+};
+
+static const struct attribute_group reboot_attr_group = {
+ .attrs = reboot_attrs,
+};
+
+static int __init reboot_ksysfs_init(void)
+{
+ struct kobject *reboot_kobj;
+ int ret;
+
+ reboot_kobj = kobject_create_and_add("reboot", kernel_kobj);
+ if (!reboot_kobj)
+ return -ENOMEM;
+
+ ret = sysfs_create_group(reboot_kobj, &reboot_attr_group);
+ if (ret) {
+ kobject_put(reboot_kobj);
+ return ret;
+ }
+
+ return 0;
+}
+late_initcall(reboot_ksysfs_init);
+
+#endif
--
2.17.1
3
2
From: Nathan Chancellor <natechancellor(a)gmail.com>
maillist inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4BULZ
CVE: NA
Reference: https://lore.kernel.org/all/20201112035023.974748-1-natechancellor@gmail.co…
----------------------------------------------------------------------
Clang warns:
kernel/reboot.c:707:17: warning: implicit conversion from enumeration
type 'enum reboot_type' to different enumeration type 'enum reboot_mode'
[-Wenum-conversion]
reboot_mode = BOOT_TRIPLE;
~ ^~~~~~~~~~~
kernel/reboot.c:709:17: warning: implicit conversion from enumeration
type 'enum reboot_type' to different enumeration type 'enum reboot_mode'
[-Wenum-conversion]
reboot_mode = BOOT_KBD;
~ ^~~~~~~~
kernel/reboot.c:711:17: warning: implicit conversion from enumeration
type 'enum reboot_type' to different enumeration type 'enum reboot_mode'
[-Wenum-conversion]
reboot_mode = BOOT_BIOS;
~ ^~~~~~~~~
kernel/reboot.c:713:17: warning: implicit conversion from enumeration
type 'enum reboot_type' to different enumeration type 'enum reboot_mode'
[-Wenum-conversion]
reboot_mode = BOOT_ACPI;
~ ^~~~~~~~~
kernel/reboot.c:715:17: warning: implicit conversion from enumeration
type 'enum reboot_type' to different enumeration type 'enum reboot_mode'
[-Wenum-conversion]
reboot_mode = BOOT_EFI;
~ ^~~~~~~~
kernel/reboot.c:717:17: warning: implicit conversion from enumeration
type 'enum reboot_type' to different enumeration type 'enum reboot_mode'
[-Wenum-conversion]
reboot_mode = BOOT_CF9_FORCE;
~ ^~~~~~~~~~~~~~
kernel/reboot.c:719:17: warning: implicit conversion from enumeration
type 'enum reboot_type' to different enumeration type 'enum reboot_mode'
[-Wenum-conversion]
reboot_mode = BOOT_CF9_SAFE;
~ ^~~~~~~~~~~~~
7 warnings generated.
It seems that these assignments should be to reboot_type, not
reboot_mode. Fix it so there are no more warnings and the code works
properly.
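As a minimal sketch of the pattern the warning flags (reduced,
hypothetical enum definitions, not the kernel's real ones):

enum reboot_mode { REBOOT_COLD, REBOOT_WARM };
enum reboot_type { BOOT_ACPI, BOOT_KBD };

static enum reboot_mode reboot_mode;
static enum reboot_type reboot_type;

static void set_type(void)
{
	reboot_mode = BOOT_ACPI; /* -Wenum-conversion: wrong variable */
	reboot_type = BOOT_ACPI; /* what the code intended */
}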
Fixes: eab8da48579d ("reboot: allow to specify reboot mode via sysfs")
Link: https://github.com/ClangBuiltLinux/linux/issues/1197
Signed-off-by: Nathan Chancellor <natechancellor(a)gmail.com>
Reviewed-and-tested-by: Matteo Croce <mcroce(a)microsoft.com>
Reviewed-by: Petr Mladek <pmladek(a)suse.com>
Signed-off-by: Hongyu Li <543306408(a)qq.com>
---
kernel/reboot.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 778979d92dc7..7a445c710f9a 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -700,19 +700,19 @@ static ssize_t type_store(struct kobject *kobj, struct kobj_attribute *attr,
return -EPERM;
if (!strncmp(buf, BOOT_TRIPLE_STR, strlen(BOOT_TRIPLE_STR)))
- reboot_mode = BOOT_TRIPLE;
+ reboot_type = BOOT_TRIPLE;
else if (!strncmp(buf, BOOT_KBD_STR, strlen(BOOT_KBD_STR)))
- reboot_mode = BOOT_KBD;
+ reboot_type = BOOT_KBD;
else if (!strncmp(buf, BOOT_BIOS_STR, strlen(BOOT_BIOS_STR)))
- reboot_mode = BOOT_BIOS;
+ reboot_type = BOOT_BIOS;
else if (!strncmp(buf, BOOT_ACPI_STR, strlen(BOOT_ACPI_STR)))
- reboot_mode = BOOT_ACPI;
+ reboot_type = BOOT_ACPI;
else if (!strncmp(buf, BOOT_EFI_STR, strlen(BOOT_EFI_STR)))
- reboot_mode = BOOT_EFI;
+ reboot_type = BOOT_EFI;
else if (!strncmp(buf, BOOT_CF9_FORCE_STR, strlen(BOOT_CF9_FORCE_STR)))
- reboot_mode = BOOT_CF9_FORCE;
+ reboot_type = BOOT_CF9_FORCE;
else if (!strncmp(buf, BOOT_CF9_SAFE_STR, strlen(BOOT_CF9_SAFE_STR)))
- reboot_mode = BOOT_CF9_SAFE;
+ reboot_type = BOOT_CF9_SAFE;
else
return -EINVAL;
--
2.17.1
2
1
[PATCH openEuler-1.0-LTS v2] scsi:spraid: support Ramaxel's spraid driver
by Yanling Song 22 Dec '21
22 Dec '21
Ramaxel inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NJIB
CVE: NA
v1->v2:
1. Add defconfig entries for arm64 and x86
Support Ramaxel's SPRxxx RAID controller
Signed-off-by: Yanling Song <songyl(a)ramaxel.com>
Reviewed-by: Jiang Yu <yujiang(a)ramaxel.com>
---
arch/arm64/configs/openeuler_defconfig | 1 +
arch/x86/configs/openeuler_defconfig | 1 +
drivers/scsi/Kconfig | 1 +
drivers/scsi/Makefile | 1 +
drivers/scsi/spraid/Kconfig | 13 +
drivers/scsi/spraid/Makefile | 7 +
drivers/scsi/spraid/spraid.h | 746 +++++
drivers/scsi/spraid/spraid_main.c | 3875 ++++++++++++++++++++++++
8 files changed, 4645 insertions(+)
create mode 100644 drivers/scsi/spraid/Kconfig
create mode 100644 drivers/scsi/spraid/Makefile
create mode 100644 drivers/scsi/spraid/spraid.h
create mode 100644 drivers/scsi/spraid/spraid_main.c
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index d2167c80757f..47b0931b91bf 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -2143,6 +2143,7 @@ CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT3SAS_MAX_SGE=128
CONFIG_SCSI_MPT2SAS=m
CONFIG_SCSI_SMARTPQI=m
+CONFIG_RAMAXEL_SPRAID=m
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
CONFIG_LIBFC=m
diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
index 7043426d9780..786186288fea 100644
--- a/arch/x86/configs/openeuler_defconfig
+++ b/arch/x86/configs/openeuler_defconfig
@@ -2182,6 +2182,7 @@ CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT3SAS_MAX_SGE=128
CONFIG_SCSI_MPT2SAS=m
CONFIG_SCSI_SMARTPQI=m
+CONFIG_RAMAXEL_SPRAID=m
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 8450484184e3..63d2aaa22834 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -522,6 +522,7 @@ source "drivers/scsi/megaraid/Kconfig.megaraid"
source "drivers/scsi/mpt3sas/Kconfig"
source "drivers/scsi/smartpqi/Kconfig"
source "drivers/scsi/ufs/Kconfig"
+source "drivers/scsi/spraid/Kconfig"
config SCSI_HPTIOP
tristate "HighPoint RocketRAID 3xxx/4xxx Controller support"
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 2973693f6dcc..4056cf26e09e 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -94,6 +94,7 @@ obj-$(CONFIG_SCSI_ZALON) += zalon7xx.o
obj-$(CONFIG_SCSI_DC395x) += dc395x.o
obj-$(CONFIG_SCSI_AM53C974) += esp_scsi.o am53c974.o
obj-$(CONFIG_CXLFLASH) += cxlflash/
+obj-$(CONFIG_RAMAXEL_SPRAID) += spraid/
obj-$(CONFIG_MEGARAID_LEGACY) += megaraid.o
obj-$(CONFIG_MEGARAID_NEWGEN) += megaraid/
obj-$(CONFIG_MEGARAID_SAS) += megaraid/
diff --git a/drivers/scsi/spraid/Kconfig b/drivers/scsi/spraid/Kconfig
new file mode 100644
index 000000000000..bfbba3db8db0
--- /dev/null
+++ b/drivers/scsi/spraid/Kconfig
@@ -0,0 +1,13 @@
+#
+# Ramaxel driver configuration
+#
+
+config RAMAXEL_SPRAID
+ tristate "Ramaxel spraid Adapter"
+ depends on PCI && SCSI
+ select BLK_DEV_BSGLIB
+ depends on ARM64 || X86_64
+ help
+ This driver supports the Ramaxel SPRxxx series of
+ RAID controllers, which have a PCIe Gen4 host
+ interface and support SAS/SATA HDDs and SSDs.
diff --git a/drivers/scsi/spraid/Makefile b/drivers/scsi/spraid/Makefile
new file mode 100644
index 000000000000..aadc2ffd37eb
--- /dev/null
+++ b/drivers/scsi/spraid/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the Ramaxel device drivers.
+#
+
+obj-$(CONFIG_RAMAXEL_SPRAID) += spraid.o
+
+spraid-objs := spraid_main.o
\ No newline at end of file
diff --git a/drivers/scsi/spraid/spraid.h b/drivers/scsi/spraid/spraid.h
new file mode 100644
index 000000000000..c1e4980e18e5
--- /dev/null
+++ b/drivers/scsi/spraid/spraid.h
@@ -0,0 +1,746 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
+
+#ifndef __SPRAID_H_
+#define __SPRAID_H_
+
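+/* field extractors for the 64-bit controller CAP register */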
+#define SPRAID_CAP_MQES(cap) ((cap) & 0xffff)
+#define SPRAID_CAP_STRIDE(cap) (((cap) >> 32) & 0xf)
+#define SPRAID_CAP_MPSMIN(cap) (((cap) >> 48) & 0xf)
+#define SPRAID_CAP_MPSMAX(cap) (((cap) >> 52) & 0xf)
+#define SPRAID_CAP_TIMEOUT(cap) (((cap) >> 24) & 0xff)
+#define SPRAID_CAP_DMAMASK(cap) (((cap) >> 37) & 0xff)
+
+#define SPRAID_DEFAULT_MAX_CHANNEL 4
+#define SPRAID_DEFAULT_MAX_ID 240
+#define SPRAID_DEFAULT_MAX_LUN_PER_HOST 8
+#define MAX_SECTORS 2048
+
+#define IO_SQE_SIZE sizeof(struct spraid_ioq_command)
+#define ADMIN_SQE_SIZE sizeof(struct spraid_admin_command)
+#define SQE_SIZE(qid) (((qid) > 0) ? IO_SQE_SIZE : ADMIN_SQE_SIZE)
+#define CQ_SIZE(depth) ((depth) * sizeof(struct spraid_completion))
+#define SQ_SIZE(qid, depth) ((depth) * SQE_SIZE(qid))
+
+#define SENSE_SIZE(depth) ((depth) * SCSI_SENSE_BUFFERSIZE)
+
+#define SPRAID_AQ_DEPTH 128
+#define SPRAID_NR_AEN_COMMANDS 16
+#define SPRAID_AQ_BLK_MQ_DEPTH (SPRAID_AQ_DEPTH - SPRAID_NR_AEN_COMMANDS)
+#define SPRAID_AQ_MQ_TAG_DEPTH (SPRAID_AQ_BLK_MQ_DEPTH - 1)
+
+#define SPRAID_ADMIN_QUEUE_NUM 1
+#define SPRAID_PTCMDS_PERQ 1
+#define SPRAID_IO_BLK_MQ_DEPTH (hdev->shost->can_queue)
+#define SPRAID_NR_IOQ_PTCMDS (SPRAID_PTCMDS_PERQ * hdev->shost->nr_hw_queues)
+
+#define FUA_MASK 0x08
+#define SPRAID_MINORS BIT(MINORBITS)
+
+#define COMMAND_IS_WRITE(cmd) ((cmd)->common.opcode & 1)
+
+#define SPRAID_IO_IOSQES 7
+#define SPRAID_IO_IOCQES 4
+#define PRP_ENTRY_SIZE 8
+
+#define SMALL_POOL_SIZE 256
+#define MAX_SMALL_POOL_NUM 16
+#define MAX_CMD_PER_DEV 64
+#define MAX_CDB_LEN 32
+
+#define SPRAID_UP_TO_MULTY4(x) (((x) + 4) & (~0x03))
+
+#define CQE_STATUS_SUCCESS (0x0)
+
+#define PCI_VENDOR_ID_RAMAXEL_LOGIC 0x1E81
+
+#define SPRAID_SERVER_DEVICE_HBA_DID 0x2100
+#define SPRAID_SERVER_DEVICE_RAID_DID 0x2200
+
+#define IO_6_DEFAULT_TX_LEN 256
+
+#define SPRAID_INT_PAGES 2
+#define SPRAID_INT_BYTES(hdev) (SPRAID_INT_PAGES * (hdev)->page_size)
+
+enum {
+ SPRAID_REQ_CANCELLED = (1 << 0),
+ SPRAID_REQ_USERCMD = (1 << 1),
+};
+
+enum {
+ SPRAID_SC_SUCCESS = 0x0,
+ SPRAID_SC_INVALID_OPCODE = 0x1,
+ SPRAID_SC_INVALID_FIELD = 0x2,
+
+ SPRAID_SC_ABORT_LIMIT = 0x103,
+ SPRAID_SC_ABORT_MISSING = 0x104,
+ SPRAID_SC_ASYNC_LIMIT = 0x105,
+
+ SPRAID_SC_DNR = 0x4000,
+};
+
+enum {
+ SPRAID_REG_CAP = 0x0000,
+ SPRAID_REG_CC = 0x0014,
+ SPRAID_REG_CSTS = 0x001c,
+ SPRAID_REG_AQA = 0x0024,
+ SPRAID_REG_ASQ = 0x0028,
+ SPRAID_REG_ACQ = 0x0030,
+ SPRAID_REG_DBS = 0x1000,
+};
+
+enum {
+ SPRAID_CC_ENABLE = 1 << 0,
+ SPRAID_CC_CSS_NVM = 0 << 4,
+ SPRAID_CC_MPS_SHIFT = 7,
+ SPRAID_CC_AMS_SHIFT = 11,
+ SPRAID_CC_SHN_SHIFT = 14,
+ SPRAID_CC_IOSQES_SHIFT = 16,
+ SPRAID_CC_IOCQES_SHIFT = 20,
+ SPRAID_CC_AMS_RR = 0 << SPRAID_CC_AMS_SHIFT,
+ SPRAID_CC_SHN_NONE = 0 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CC_IOSQES = SPRAID_IO_IOSQES << SPRAID_CC_IOSQES_SHIFT,
+ SPRAID_CC_IOCQES = SPRAID_IO_IOCQES << SPRAID_CC_IOCQES_SHIFT,
+ SPRAID_CC_SHN_NORMAL = 1 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CC_SHN_MASK = 3 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CSTS_CFS_SHIFT = 1,
+ SPRAID_CSTS_SHST_SHIFT = 2,
+ SPRAID_CSTS_PP_SHIFT = 5,
+ SPRAID_CSTS_RDY = 1 << 0,
+ SPRAID_CSTS_SHST_CMPLT = 2 << 2,
+ SPRAID_CSTS_SHST_MASK = 3 << 2,
+ SPRAID_CSTS_CFS_MASK = 1 << SPRAID_CSTS_CFS_SHIFT,
+ SPRAID_CSTS_PP_MASK = 1 << SPRAID_CSTS_PP_SHIFT,
+};
+
+enum {
+ SPRAID_ADMIN_DELETE_SQ = 0x00,
+ SPRAID_ADMIN_CREATE_SQ = 0x01,
+ SPRAID_ADMIN_DELETE_CQ = 0x04,
+ SPRAID_ADMIN_CREATE_CQ = 0x05,
+ SPRAID_ADMIN_ABORT_CMD = 0x08,
+ SPRAID_ADMIN_SET_FEATURES = 0x09,
+ SPRAID_ADMIN_ASYNC_EVENT = 0x0c,
+ SPRAID_ADMIN_GET_INFO = 0xc6,
+ SPRAID_ADMIN_RESET = 0xc8,
+};
+
+enum {
+ SPRAID_GET_INFO_CTRL = 0,
+ SPRAID_GET_INFO_DEV_LIST = 1,
+};
+
+enum {
+ SPRAID_RESET_TARGET = 0,
+ SPRAID_RESET_BUS = 1,
+};
+
+enum {
+ SPRAID_AEN_ERROR = 0,
+ SPRAID_AEN_NOTICE = 2,
+ SPRAID_AEN_VS = 7,
+};
+
+enum {
+ SPRAID_AEN_DEV_CHANGED = 0x00,
+ SPRAID_AEN_FW_ACT_START = 0x01,
+ SPRAID_AEN_HOST_PROBING = 0x10,
+};
+
+enum {
+ SPRAID_AEN_TIMESYN = 0x00,
+ SPRAID_AEN_FW_ACT_FINISH = 0x02,
+ SPRAID_AEN_EVENT_MIN = 0x80,
+ SPRAID_AEN_EVENT_MAX = 0xff,
+};
+
+enum {
+ SPRAID_CMD_WRITE = 0x01,
+ SPRAID_CMD_READ = 0x02,
+
+ SPRAID_CMD_NONIO_NONE = 0x80,
+ SPRAID_CMD_NONIO_TODEV = 0x81,
+ SPRAID_CMD_NONIO_FROMDEV = 0x82,
+};
+
+enum {
+ SPRAID_QUEUE_PHYS_CONTIG = (1 << 0),
+ SPRAID_CQ_IRQ_ENABLED = (1 << 1),
+
+ SPRAID_FEAT_NUM_QUEUES = 0x07,
+ SPRAID_FEAT_ASYNC_EVENT = 0x0b,
+ SPRAID_FEAT_TIMESTAMP = 0x0e,
+};
+
+enum spraid_state {
+ SPRAID_NEW,
+ SPRAID_LIVE,
+ SPRAID_RESETTING,
+ SPRAID_DELETING,
+ SPRAID_DEAD,
+};
+
+enum {
+ SPRAID_CARD_HBA,
+ SPRAID_CARD_RAID,
+};
+
+enum spraid_cmd_type {
+ SPRAID_CMD_ADM,
+ SPRAID_CMD_IOPT,
+};
+
+struct spraid_completion {
+ __le32 result;
+ union {
+ struct {
+ __u8 sense_len;
+ __u8 resv[3];
+ };
+ __le32 result1;
+ };
+ __le16 sq_head;
+ __le16 sq_id;
+ __u16 cmd_id;
+ __le16 status;
+};
+
+struct spraid_ctrl_info {
+ __le32 nd;
+ __le16 max_cmds;
+ __le16 max_channel;
+ __le32 max_tgt_id;
+ __le16 max_lun;
+ __le16 max_num_sge;
+ __le16 lun_num_in_boot;
+ __u8 mdts;
+ __u8 acl;
+ __u8 aerl;
+ __u8 card_type;
+ __u16 rsvd;
+ __u32 rtd3e;
+ __u8 sn[32];
+ __u8 fr[16];
+ __u8 rsvd1[4020];
+};
+
+struct spraid_dev {
+ struct pci_dev *pdev;
+ struct device *dev;
+ struct Scsi_Host *shost;
+ struct spraid_queue *queues;
+ struct dma_pool *prp_page_pool;
+ struct dma_pool *prp_small_pool[MAX_SMALL_POOL_NUM];
+ mempool_t *iod_mempool;
+ void __iomem *bar;
+ u32 max_qid;
+ u32 num_vecs;
+ u32 queue_count;
+ u32 ioq_depth;
+ int db_stride;
+ u32 __iomem *dbs;
+ struct rw_semaphore devices_rwsem;
+ int numa_node;
+ u32 page_size;
+ u32 ctrl_config;
+ u32 online_queues;
+ u64 cap;
+ int instance;
+ struct spraid_ctrl_info *ctrl_info;
+ struct spraid_dev_info *devices;
+
+ struct spraid_cmd *adm_cmds;
+ struct list_head adm_cmd_list;
+ spinlock_t adm_cmd_lock;
+
+ struct spraid_cmd *ioq_ptcmds;
+ struct list_head ioq_pt_list;
+ spinlock_t ioq_pt_lock;
+
+ struct work_struct scan_work;
+ struct work_struct timesyn_work;
+ struct work_struct reset_work;
+ struct work_struct fw_act_work;
+
+ enum spraid_state state;
+ spinlock_t state_lock;
+
+ struct request_queue *bsg_queue;
+};
+
+struct spraid_sgl_desc {
+ __le64 addr;
+ __le32 length;
+ __u8 rsvd[3];
+ __u8 type;
+};
+
+union spraid_data_ptr {
+ struct {
+ __le64 prp1;
+ __le64 prp2;
+ };
+ struct spraid_sgl_desc sgl;
+};
+
+struct spraid_admin_common_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le32 cdw2[4];
+ union spraid_data_ptr dptr;
+ __le32 cdw10;
+ __le32 cdw11;
+ __le32 cdw12;
+ __le32 cdw13;
+ __le32 cdw14;
+ __le32 cdw15;
+};
+
+struct spraid_features {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[2];
+ union spraid_data_ptr dptr;
+ __le32 fid;
+ __le32 dword11;
+ __le32 dword12;
+ __le32 dword13;
+ __le32 dword14;
+ __le32 dword15;
+};
+
+struct spraid_create_cq {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[5];
+ __le64 prp1;
+ __u64 rsvd8;
+ __le16 cqid;
+ __le16 qsize;
+ __le16 cq_flags;
+ __le16 irq_vector;
+ __u32 rsvd12[4];
+};
+
+struct spraid_create_sq {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[5];
+ __le64 prp1;
+ __u64 rsvd8;
+ __le16 sqid;
+ __le16 qsize;
+ __le16 sq_flags;
+ __le16 cqid;
+ __u32 rsvd12[4];
+};
+
+struct spraid_delete_queue {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[9];
+ __le16 qid;
+ __u16 rsvd10;
+ __u32 rsvd11[5];
+};
+
+struct spraid_get_info {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u32 rsvd2[4];
+ union spraid_data_ptr dptr;
+ __u8 type;
+ __u8 rsvd10[3];
+ __le32 cdw11;
+ __u32 rsvd12[4];
+};
+
+struct spraid_usr_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ union {
+ struct {
+ __le16 subopcode;
+ __le16 rsvd1;
+ } info_0;
+ __le32 cdw2;
+ };
+ union {
+ struct {
+ __le16 data_len;
+ __le16 param_len;
+ } info_1;
+ __le32 cdw3;
+ };
+ __u64 metadata;
+ union spraid_data_ptr dptr;
+ __le32 cdw10;
+ __le32 cdw11;
+ __le32 cdw12;
+ __le32 cdw13;
+ __le32 cdw14;
+ __le32 cdw15;
+};
+
+enum {
+ SPRAID_CMD_FLAG_SGL_METABUF = (1 << 6),
+ SPRAID_CMD_FLAG_SGL_METASEG = (1 << 7),
+ SPRAID_CMD_FLAG_SGL_ALL = SPRAID_CMD_FLAG_SGL_METABUF |
+ SPRAID_CMD_FLAG_SGL_METASEG,
+};
+
+enum spraid_cmd_state {
+ SPRAID_CMD_IDLE = 0,
+ SPRAID_CMD_IN_FLIGHT = 1,
+ SPRAID_CMD_COMPLETE = 2,
+ SPRAID_CMD_TIMEOUT = 3,
+ SPRAID_CMD_TMO_COMPLETE = 4,
+};
+
+struct spraid_abort_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[4];
+ __le16 sqid;
+ __le16 cid;
+ __u32 rsvd11[5];
+};
+
+struct spraid_reset_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[4];
+ __u8 type;
+ __u8 rsvd10[3];
+ __u32 rsvd11[5];
+};
+
+struct spraid_admin_command {
+ union {
+ struct spraid_admin_common_command common;
+ struct spraid_features features;
+ struct spraid_create_cq create_cq;
+ struct spraid_create_sq create_sq;
+ struct spraid_delete_queue delete_queue;
+ struct spraid_get_info get_info;
+ struct spraid_abort_cmd abort;
+ struct spraid_reset_cmd reset;
+ struct spraid_usr_cmd usr_cmd;
+ };
+};
+
+struct spraid_ioq_common_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_len;
+ __u8 rsvd2;
+ __le32 cdw3[3];
+ union spraid_data_ptr dptr;
+ __le32 cdw10[6];
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __le32 cdw26[6];
+};
+
+struct spraid_rw_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_len;
+ __u8 rsvd2;
+ __u32 rsvd3[3];
+ union spraid_data_ptr dptr;
+ __le64 slba;
+ __le16 nlb;
+ __le16 control;
+ __u32 rsvd13[3];
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __u32 rsvd26[6];
+};
+
+struct spraid_scsi_nonio {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_length;
+ __u8 rsvd2;
+ __u32 rsvd3[3];
+ union spraid_data_ptr dptr;
+ __u32 rsvd10[5];
+ __le32 buffer_len;
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __u32 rsvd26[6];
+};
+
+struct spraid_ioq_command {
+ union {
+ struct spraid_ioq_common_command common;
+ struct spraid_rw_command rw;
+ struct spraid_scsi_nonio scsi_nonio;
+ };
+};
+
+struct spraid_passthru_common_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 rsvd0;
+ __u32 nsid;
+ union {
+ struct {
+ __u16 subopcode;
+ __u16 rsvd1;
+ } info_0;
+ __u32 cdw2;
+ };
+ union {
+ struct {
+ __u16 data_len;
+ __u16 param_len;
+ } info_1;
+ __u32 cdw3;
+ };
+ __u64 metadata;
+
+ __u64 addr;
+ __u64 prp2;
+
+ __u32 cdw10;
+ __u32 cdw11;
+ __u32 cdw12;
+ __u32 cdw13;
+ __u32 cdw14;
+ __u32 cdw15;
+ __u32 timeout_ms;
+ __u32 result0;
+ __u32 result1;
+};
+
+struct spraid_ioq_passthru_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 rsvd0;
+ __u32 nsid;
+ union {
+ struct {
+ __u16 res_sense_len;
+ __u8 cdb_len;
+ __u8 rsvd0;
+ } info_0;
+ __u32 cdw2;
+ };
+ union {
+ struct {
+ __u16 subopcode;
+ __u16 rsvd1;
+ } info_1;
+ __u32 cdw3;
+ };
+ union {
+ struct {
+ __u16 rsvd;
+ __u16 param_len;
+ } info_2;
+ __u32 cdw4;
+ };
+ __u32 cdw5;
+ __u64 addr;
+ __u64 prp2;
+ union {
+ struct {
+ __u16 eid;
+ __u16 sid;
+ } info_3;
+ __u32 cdw10;
+ };
+ union {
+ struct {
+ __u16 did;
+ __u8 did_flag;
+ __u8 rsvd2;
+ } info_4;
+ __u32 cdw11;
+ };
+ __u32 cdw12;
+ __u32 cdw13;
+ __u32 cdw14;
+ __u32 data_len;
+ __u32 cdw16;
+ __u32 cdw17;
+ __u32 cdw18;
+ __u32 cdw19;
+ __u32 cdw20;
+ __u32 cdw21;
+ __u32 cdw22;
+ __u32 cdw23;
+ __u64 sense_addr;
+ __u32 cdw26[4];
+ __u32 timeout_ms;
+ __u32 result0;
+ __u32 result1;
+};
+
+struct spraid_bsg_request {
+ u32 msgcode;
+ u32 control;
+ union {
+ struct spraid_passthru_common_cmd admcmd;
+ struct spraid_ioq_passthru_cmd ioqcmd;
+ };
+};
+
+enum {
+ SPRAID_BSG_ADM,
+ SPRAID_BSG_IOQ,
+};
+
+struct spraid_cmd {
+ int qid;
+ int cid;
+ u32 result0;
+ u32 result1;
+ u16 status;
+ void *priv;
+ enum spraid_cmd_state state;
+ struct completion cmd_done;
+ struct list_head list;
+};
+
+struct spraid_queue {
+ struct spraid_dev *hdev;
+ spinlock_t sq_lock;
+
+ spinlock_t cq_lock ____cacheline_aligned_in_smp;
+
+ void *sq_cmds;
+
+ struct spraid_completion *cqes;
+
+ dma_addr_t sq_dma_addr;
+ dma_addr_t cq_dma_addr;
+ u32 __iomem *q_db;
+ u8 cq_phase;
+ u8 sqes;
+ u16 qid;
+ u16 sq_tail;
+ u16 cq_head;
+ u16 last_cq_head;
+ u16 q_depth;
+ s16 cq_vector;
+ void *sense;
+ dma_addr_t sense_dma_addr;
+ struct dma_pool *prp_small_pool;
+};
+
+struct spraid_iod {
+ struct spraid_queue *spraidq;
+ enum spraid_cmd_state state;
+ int npages;
+ u32 nsge;
+ u32 length;
+ bool use_sgl;
+ bool sg_drv_mgmt;
+ dma_addr_t first_dma;
+ void *sense;
+ dma_addr_t sense_dma;
+ struct scatterlist *sg;
+ struct scatterlist inline_sg[0];
+};
+
+#define SPRAID_DEV_INFO_ATTR_BOOT(attr) ((attr) & 0x01)
+#define SPRAID_DEV_INFO_ATTR_VD(attr) (((attr) & 0x02) == 0x0)
+#define SPRAID_DEV_INFO_ATTR_PT(attr) (((attr) & 0x22) == 0x02)
+#define SPRAID_DEV_INFO_ATTR_RAWDISK(attr) ((attr) & 0x20)
+
+#define SPRAID_DEV_INFO_FLAG_VALID(flag) ((flag) & 0x01)
+#define SPRAID_DEV_INFO_FLAG_CHANGE(flag) ((flag) & 0x02)
+
+#define BGTASK_TYPE_REBUILD 4
+#define USR_CMD_READ 0xc2
+#define USR_CMD_RDLEN 0x1000
+#define USR_CMD_VDINFO 0x704
+#define USR_CMD_BGTASK 0x504
+#define VDINFO_PARAM_LEN 0x04
+
+struct spraid_vd_info {
+ __u8 name[32];
+ __le16 id;
+ __u8 rg_id;
+ __u8 rg_level;
+ __u8 sg_num;
+ __u8 sg_disk_num;
+ __u8 vd_status;
+ __u8 vd_type;
+ __u8 rsvd1[4056];
+};
+
+#define MAX_REALTIME_BGTASK_NUM 32
+
+struct bgtask_info {
+ __u8 type;
+ __u8 progress;
+ __u8 rate;
+ __u8 rsvd0;
+ __le16 vd_id;
+ __le16 time_left;
+ __u8 rsvd1[4];
+};
+
+struct spraid_bgtask {
+ __u8 sw;
+ __u8 task_num;
+ __u8 rsvd[6];
+ struct bgtask_info bgtask[MAX_REALTIME_BGTASK_NUM];
+};
+
+struct spraid_dev_info {
+ __le32 hdid;
+ __le16 target;
+ __u8 channel;
+ __u8 lun;
+ __u8 attr;
+ __u8 flag;
+ __le16 max_io_kb;
+};
+
+#define MAX_DEV_ENTRY_PER_PAGE_4K 340
+struct spraid_dev_list {
+ __le32 dev_num;
+ __u32 rsvd0[3];
+ struct spraid_dev_info devices[MAX_DEV_ENTRY_PER_PAGE_4K];
+};
+
+struct spraid_sdev_hostdata {
+ u32 hdid;
+ u16 max_io_kb;
+ u8 attr;
+ u8 flag;
+ u8 rg_id;
+ u8 rsvd[3];
+};
+
+#endif
diff --git a/drivers/scsi/spraid/spraid_main.c b/drivers/scsi/spraid/spraid_main.c
new file mode 100644
index 000000000000..519b39f44e91
--- /dev/null
+++ b/drivers/scsi/spraid/spraid_main.c
@@ -0,0 +1,3875 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
+
+/* Ramaxel Raid SPXXX Series Linux Driver */
+
+#define pr_fmt(fmt) "spraid: " fmt
+
+#include <linux/sched/signal.h>
+#include <linux/version.h>
+#include <linux/pci.h>
+#include <linux/aer.h>
+#include <linux/module.h>
+#include <linux/ioport.h>
+#include <linux/device.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/cdev.h>
+#include <linux/sysfs.h>
+#include <linux/gfp.h>
+#include <linux/types.h>
+#include <linux/ratelimit.h>
+#include <linux/once.h>
+#include <linux/debugfs.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/blkdev.h>
+#include <linux/bsg-lib.h>
+#include <asm/unaligned.h>
+#include <linux/sort.h>
+#include <target/target_core_backend.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_transport.h>
+#include <scsi/scsi_dbg.h>
+
+
+#include "spraid.h"
+
+static u32 admin_tmout = 60;
+module_param(admin_tmout, uint, 0644);
+MODULE_PARM_DESC(admin_tmout, "admin commands timeout (seconds)");
+
+static u32 scmd_tmout_nonpt = 180;
+module_param(scmd_tmout_nonpt, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_nonpt,
+ "scsi commands timeout for rawdisk and raid devices (seconds)");
+
+static int ioq_depth_set(const char *val, const struct kernel_param *kp);
+static const struct kernel_param_ops ioq_depth_ops = {
+ .set = ioq_depth_set,
+ .get = param_get_uint,
+};
+
+static u32 io_queue_depth = 1024;
+module_param_cb(io_queue_depth, &ioq_depth_ops, &io_queue_depth, 0644);
+MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
+
+static int log_debug_switch_set(const char *val, const struct kernel_param *kp)
+{
+ u8 n = 0;
+ int ret;
+
+ ret = kstrtou8(val, 10, &n);
+ if (ret != 0)
+ return -EINVAL;
+
+ return param_set_byte(val, kp);
+}
+
+static const struct kernel_param_ops log_debug_switch_ops = {
+ .set = log_debug_switch_set,
+ .get = param_get_byte,
+};
+
+static unsigned char log_debug_switch;
+module_param_cb(log_debug_switch, &log_debug_switch_ops,
+ &log_debug_switch, 0644);
+MODULE_PARM_DESC(log_debug_switch,
+ "set debug log state, 0 for off (default), non-zero for on");
+
+static int small_pool_num_set(const char *val, const struct kernel_param *kp)
+{
+ u8 n = 0;
+ int ret;
+
+ ret = kstrtou8(val, 10, &n);
+ if (ret != 0)
+ return -EINVAL;
+ if (n > MAX_SMALL_POOL_NUM)
+ n = MAX_SMALL_POOL_NUM;
+ if (n < 1)
+ n = 1;
+ *((u8 *)kp->arg) = n;
+
+ return 0;
+}
+
+static const struct kernel_param_ops small_pool_num_ops = {
+ .set = small_pool_num_set,
+ .get = param_get_byte,
+};
+
+/* It was found that the spinlock of a single pool conflicts
+ * a lot with multiple CPUs, so multiple pools are introduced
+ * to reduce the contention.
+ */
+static unsigned char small_pool_num = 4;
+module_param_cb(small_pool_num, &small_pool_num_ops, &small_pool_num, 0644);
+MODULE_PARM_DESC(small_pool_num, "set prp small pool num, default 4, MAX 16");
+
+static void spraid_free_queue(struct spraid_queue *spraidq);
+static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result);
+static void spraid_handle_aen_vs(struct spraid_dev *hdev,
+ u32 result, u32 result1);
+
+static DEFINE_IDA(spraid_instance_ida);
+
+static struct class *spraid_class;
+
+#define SPRAID_CAP_TIMEOUT_UNIT_MS (HZ / 2)
+
+static struct workqueue_struct *spraid_wq;
+
+#define dev_log_dbg(dev, fmt, ...) do { \
+ if (unlikely(log_debug_switch)) \
+ dev_info(dev, "[%s] [%d] " fmt, \
+ __func__, __LINE__, ##__VA_ARGS__); \
+} while (0)
+
+#define SPRAID_DRV_VERSION "1.0.0.0"
+
+#define ADMIN_TIMEOUT (admin_tmout * HZ)
+#define ADMIN_ERR_TIMEOUT 32757
+
+#define SPRAID_WAIT_ABNL_CMD_TIMEOUT (3 * 2)
+
+#define SPRAID_DMA_MSK_BIT_MAX 64
+
+enum FW_STAT_CODE {
+ FW_STAT_OK = 0,
+ FW_STAT_NEED_CHECK,
+ FW_STAT_ERROR,
+ FW_STAT_EP_PCIE_ERROR,
+ FW_STAT_NAC_DMA_ERROR,
+ FW_STAT_ABORTED,
+ FW_STAT_NEED_RETRY
+};
+
+static const char * const raid_levels[] = {"0", "1", "5", "6", "10", "50", "60",
+ "NA"};
+
+static const char * const raid_states[] = {
+ "NA", "NORMAL", "FAULT", "DEGRADE", "NOT_FORMATTED",
+ "FORMATTING", "SANITIZING", "INITIALIZING", "INITIALIZE_FAIL",
+ "DELETING", "DELETE_FAIL", "WRITE_PROTECT"
+};
+
+static int ioq_depth_set(const char *val, const struct kernel_param *kp)
+{
+ int n = 0;
+ int ret;
+
+ ret = kstrtoint(val, 10, &n);
+ if (ret != 0 || n < 2)
+ return -EINVAL;
+
+ return param_set_int(val, kp);
+}
+
+static int spraid_remap_bar(struct spraid_dev *hdev, u32 size)
+{
+ struct pci_dev *pdev = hdev->pdev;
+
+ if (size > pci_resource_len(pdev, 0)) {
+ dev_err(hdev->dev, "Input size[%u] exceeds bar0 length[%llu]\n",
+ size, pci_resource_len(pdev, 0));
+ return -ENOMEM;
+ }
+
+ if (hdev->bar)
+ iounmap(hdev->bar);
+
+ hdev->bar = ioremap(pci_resource_start(pdev, 0), size);
+ if (!hdev->bar) {
+ dev_err(hdev->dev, "ioremap for bar0 failed\n");
+ return -ENOMEM;
+ }
+ hdev->dbs = hdev->bar + SPRAID_REG_DBS;
+
+ return 0;
+}
+
+static int spraid_dev_map(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ int ret;
+
+ ret = pci_request_mem_regions(pdev, "spraid");
+ if (ret) {
+ dev_err(hdev->dev, "fail to request memory regions\n");
+ return ret;
+ }
+
+ ret = spraid_remap_bar(hdev, SPRAID_REG_DBS + 4096);
+ if (ret) {
+ pci_release_mem_regions(pdev);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void spraid_dev_unmap(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+
+ if (hdev->bar) {
+ iounmap(hdev->bar);
+ hdev->bar = NULL;
+ }
+ pci_release_mem_regions(pdev);
+}
+
+static int spraid_pci_enable(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ int ret = -ENOMEM;
+ u64 maskbit = SPRAID_DMA_MSK_BIT_MAX;
+
+ if (pci_enable_device_mem(pdev)) {
+ dev_err(hdev->dev,
+ "Enable pci device memory resources failed\n");
+ return ret;
+ }
+ pci_set_master(pdev);
+
+ if (readl(hdev->bar + SPRAID_REG_CSTS) == U32_MAX) {
+ ret = -ENODEV;
+ dev_err(hdev->dev, "Read csts register failed\n");
+ goto disable;
+ }
+
+ ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
+ if (ret < 0) {
+ dev_err(hdev->dev,
+ "Allocate one IRQ for setup admin channel failed\n");
+ goto disable;
+ }
+
+ hdev->cap = lo_hi_readq(hdev->bar + SPRAID_REG_CAP);
+ hdev->ioq_depth = min_t(u32, SPRAID_CAP_MQES(hdev->cap) + 1,
+ io_queue_depth);
+ hdev->db_stride = 1 << SPRAID_CAP_STRIDE(hdev->cap);
+
+ maskbit = SPRAID_CAP_DMAMASK(hdev->cap);
+ if (maskbit < 32 || maskbit > SPRAID_DMA_MSK_BIT_MAX) {
+ dev_err(hdev->dev,
+ "err, dma mask invalid[%llu], set to default\n",
+ maskbit);
+ maskbit = SPRAID_DMA_MSK_BIT_MAX;
+ }
+ if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(maskbit))) {
+ dev_err(hdev->dev, "set dma mask and coherent failed\n");
+ goto disable;
+ }
+
+ dev_info(hdev->dev, "set dma mask[%llu] success\n", maskbit);
+
+ pci_enable_pcie_error_reporting(pdev);
+ pci_save_state(pdev);
+
+ return 0;
+
+disable:
+ pci_disable_device(pdev);
+ return ret;
+}
+
+static int spraid_npages_prp(u32 size, struct spraid_dev *hdev)
+{
+ u32 nprps = DIV_ROUND_UP(size + hdev->page_size, hdev->page_size);
+
+ return DIV_ROUND_UP(PRP_ENTRY_SIZE * nprps, PAGE_SIZE - PRP_ENTRY_SIZE);
+}
+
+static int spraid_npages_sgl(u32 nseg)
+{
+ return DIV_ROUND_UP(nseg * sizeof(struct spraid_sgl_desc), PAGE_SIZE);
+}
+
+static void **spraid_iod_list(struct spraid_iod *iod)
+{
+ return (void **)(iod->inline_sg + (iod->sg_drv_mgmt ? iod->nsge : 0));
+}
+
+static u32 spraid_iod_ext_size(struct spraid_dev *hdev, u32 size, u32 nsge,
+ bool sg_drv_mgmt, bool use_sgl)
+{
+ size_t alloc_size, sg_size;
+
+ if (use_sgl)
+ alloc_size = sizeof(__le64 *) * spraid_npages_sgl(nsge);
+ else
+ alloc_size = sizeof(__le64 *) * spraid_npages_prp(size, hdev);
+
+ sg_size = sg_drv_mgmt ? (sizeof(struct scatterlist) * nsge) : 0;
+ return sg_size + alloc_size;
+}
+
+static u32 spraid_cmd_size(struct spraid_dev *hdev,
+ bool sg_drv_mgmt, bool use_sgl)
+{
+ u32 alloc_size = spraid_iod_ext_size(hdev, SPRAID_INT_BYTES(hdev),
+ SPRAID_INT_PAGES, sg_drv_mgmt, use_sgl);
+
+ dev_info(hdev->dev, "sg_drv_mgmt: %s, use_sgl: %s, iod size: %lu,"
+ " alloc_size: %u\n", sg_drv_mgmt ? "true" : "false",
+ use_sgl ? "true" : "false",
+ sizeof(struct spraid_iod), alloc_size);
+
+ return sizeof(struct spraid_iod) + alloc_size;
+}
+
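+/*
+ * Build the PRP entries for the command's scatterlist. Data that fits in
+ * prp1 (plus at most one extra page in prp2) needs no list; otherwise a
+ * chain of PRP list pages is allocated, where the last slot of a full
+ * page is re-used as a link to the next page.
+ */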
+static int spraid_setup_prps(struct spraid_dev *hdev, struct spraid_iod *iod)
+{
+ struct scatterlist *sg = iod->sg;
+ u64 dma_addr = sg_dma_address(sg);
+ int dma_len = sg_dma_len(sg);
+ __le64 *prp_list, *old_prp_list;
+ u32 page_size = hdev->page_size;
+ int offset = dma_addr & (page_size - 1);
+ void **list = spraid_iod_list(iod);
+ int length = iod->length;
+ struct dma_pool *pool;
+ dma_addr_t prp_dma;
+ int nprps, i;
+
+ length -= (page_size - offset);
+ if (length <= 0) {
+ iod->first_dma = 0;
+ return 0;
+ }
+
+ dma_len -= (page_size - offset);
+ if (dma_len) {
+ dma_addr += (page_size - offset);
+ } else {
+ sg = sg_next(sg);
+ dma_addr = sg_dma_address(sg);
+ dma_len = sg_dma_len(sg);
+ }
+
+ if (length <= page_size) {
+ iod->first_dma = dma_addr;
+ return 0;
+ }
+
+ nprps = DIV_ROUND_UP(length, page_size);
+ if (nprps <= (SMALL_POOL_SIZE / PRP_ENTRY_SIZE)) {
+ pool = iod->spraidq->prp_small_pool;
+ iod->npages = 0;
+ } else {
+ pool = hdev->prp_page_pool;
+ iod->npages = 1;
+ }
+
+ prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+ if (!prp_list) {
+ dev_err_ratelimited(hdev->dev,
+ "Allocate first prp_list memory failed\n");
+ iod->first_dma = dma_addr;
+ iod->npages = -1;
+ return -ENOMEM;
+ }
+ list[0] = prp_list;
+ iod->first_dma = prp_dma;
+ i = 0;
+ for (;;) {
+ if (i == page_size / PRP_ENTRY_SIZE) {
+ old_prp_list = prp_list;
+
+ prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+ if (!prp_list) {
+ dev_err_ratelimited(hdev->dev, "Allocate %dth"
+ " prp_list memory failed\n",
+ iod->npages + 1);
+ return -ENOMEM;
+ }
+ list[iod->npages++] = prp_list;
+ prp_list[0] = old_prp_list[i - 1];
+ old_prp_list[i - 1] = cpu_to_le64(prp_dma);
+ i = 1;
+ }
+ prp_list[i++] = cpu_to_le64(dma_addr);
+ dma_len -= page_size;
+ dma_addr += page_size;
+ length -= page_size;
+ if (length <= 0)
+ break;
+ if (dma_len > 0)
+ continue;
+ if (unlikely(dma_len < 0))
+ goto bad_sgl;
+ sg = sg_next(sg);
+ dma_addr = sg_dma_address(sg);
+ dma_len = sg_dma_len(sg);
+ }
+
+ return 0;
+
+bad_sgl:
+ dev_err(hdev->dev,
+ "Setup prps, invalid SGL for payload: %d nents: %d\n",
+ iod->length, iod->nsge);
+ return -EIO;
+}
+
+#define SGES_PER_PAGE (PAGE_SIZE / sizeof(struct spraid_sgl_desc))
+
+static void spraid_submit_cmd(struct spraid_queue *spraidq, const void *cmd)
+{
+ u32 sqes = SQE_SIZE(spraidq->qid);
+ unsigned long flags;
+ struct spraid_admin_common_command *acd =
+ (struct spraid_admin_common_command *)cmd;
+
+ spin_lock_irqsave(&spraidq->sq_lock, flags);
+ memcpy((spraidq->sq_cmds + sqes * spraidq->sq_tail), cmd, sqes);
+ if (++spraidq->sq_tail == spraidq->q_depth)
+ spraidq->sq_tail = 0;
+
+ writel(spraidq->sq_tail, spraidq->q_db);
+ spin_unlock_irqrestore(&spraidq->sq_lock, flags);
+
+ dev_log_dbg(spraidq->hdev->dev,
+ "cid[%d] qid[%d], opcode[0x%x], flags[0x%x], hdid[%u]\n",
+ acd->command_id, spraidq->qid, acd->opcode,
+ acd->flags, le32_to_cpu(acd->hdid));
+}
+
+static u32 spraid_mod64(u64 dividend, u32 divisor)
+{
+ u64 d;
+ u32 remainder;
+
+ if (!divisor) {
+ pr_err("spraid_mod64: divisor is zero\n");
+ return 0;
+ }
+
+ d = dividend;
+ remainder = do_div(d, divisor);
+ return remainder;
+}
+
+static inline bool spraid_is_rw_scmd(struct scsi_cmnd *scmd)
+{
+ switch (scmd->cmnd[0]) {
+ case READ_6:
+ case READ_10:
+ case READ_12:
+ case READ_16:
+ case READ_32:
+ case WRITE_6:
+ case WRITE_10:
+ case WRITE_12:
+ case WRITE_16:
+ case WRITE_32:
+ return true;
+ default:
+ return false;
+ }
+}
+
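+/*
+ * A transfer can use PRPs only if every middle SG element starts on a
+ * page boundary and is a whole number of pages, the first element ends
+ * on a page boundary, and the last element starts on one; anything else
+ * must fall back to SGL mode.
+ */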
+static bool spraid_is_prp(struct spraid_dev *hdev,
+ struct scsi_cmnd *scmd, u32 nsge)
+{
+ struct scatterlist *sg = scsi_sglist(scmd);
+ u32 page_size = hdev->page_size;
+ bool is_prp = true;
+ int i = 0;
+
+ scsi_for_each_sg(scmd, sg, nsge, i) {
+ if (i != 0 && i != nsge - 1) {
+ if (spraid_mod64(sg_dma_len(sg), page_size) ||
+ spraid_mod64(sg_dma_address(sg), page_size)) {
+ is_prp = false;
+ break;
+ }
+ }
+
+ if (nsge > 1 && i == 0) {
+ if ((spraid_mod64((sg_dma_address(sg) + sg_dma_len(sg)),
+ page_size))) {
+ is_prp = false;
+ break;
+ }
+ }
+
+ if (nsge > 1 && i == (nsge - 1)) {
+ if (spraid_mod64(sg_dma_address(sg), page_size)) {
+ is_prp = false;
+ break;
+ }
+ }
+ }
+
+ return is_prp;
+}
+
+enum {
+ SPRAID_SGL_FMT_DATA_DESC = 0x00,
+ SPRAID_SGL_FMT_SEG_DESC = 0x02,
+ SPRAID_SGL_FMT_LAST_SEG_DESC = 0x03,
+ SPRAID_KEY_SGL_FMT_DATA_DESC = 0x04,
+ SPRAID_TRANSPORT_SGL_DATA_DESC = 0x05
+};
+
+static void spraid_sgl_set_data(struct spraid_sgl_desc *sge,
+ struct scatterlist *sg)
+{
+ sge->addr = cpu_to_le64(sg_dma_address(sg));
+ sge->length = cpu_to_le32(sg_dma_len(sg));
+ sge->type = SPRAID_SGL_FMT_DATA_DESC << 4;
+}
+
+static void spraid_sgl_set_seg(struct spraid_sgl_desc *sge,
+ dma_addr_t dma_addr, int entries)
+{
+ sge->addr = cpu_to_le64(dma_addr);
+ if (entries <= SGES_PER_PAGE) {
+ sge->length = cpu_to_le32(entries * sizeof(*sge));
+ sge->type = SPRAID_SGL_FMT_LAST_SEG_DESC << 4;
+ } else {
+ sge->length = cpu_to_le32(PAGE_SIZE);
+ sge->type = SPRAID_SGL_FMT_SEG_DESC << 4;
+ }
+}
+
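+/*
+ * Build a chained SGL for the command: a single element is embedded in
+ * the command itself; longer lists are allocated from the DMA pools, and
+ * the last descriptor of a full page is converted into a segment
+ * descriptor pointing at the next page.
+ */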
+static int spraid_setup_ioq_cmd_sgl(struct spraid_dev *hdev,
+ struct scsi_cmnd *scmd,
+ struct spraid_ioq_command *ioq_cmd,
+ struct spraid_iod *iod)
+{
+ struct spraid_sgl_desc *sg_list, *link, *old_sg_list;
+ struct scatterlist *sg = scsi_sglist(scmd);
+ void **list = spraid_iod_list(iod);
+ struct dma_pool *pool;
+ int nsge = iod->nsge;
+ dma_addr_t sgl_dma;
+ int i = 0;
+
+ ioq_cmd->common.flags |= SPRAID_CMD_FLAG_SGL_METABUF;
+
+ if (nsge == 1) {
+ spraid_sgl_set_data(&ioq_cmd->common.dptr.sgl, sg);
+ return 0;
+ }
+
+ if (nsge <= (SMALL_POOL_SIZE / sizeof(struct spraid_sgl_desc))) {
+ pool = iod->spraidq->prp_small_pool;
+ iod->npages = 0;
+ } else {
+ pool = hdev->prp_page_pool;
+ iod->npages = 1;
+ }
+
+ sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
+ if (!sg_list) {
+ dev_err_ratelimited(hdev->dev,
+ "Allocate first sgl_list failed\n");
+ iod->npages = -1;
+ return -ENOMEM;
+ }
+
+ list[0] = sg_list;
+ iod->first_dma = sgl_dma;
+ spraid_sgl_set_seg(&ioq_cmd->common.dptr.sgl, sgl_dma, nsge);
+ do {
+ if (i == SGES_PER_PAGE) {
+ old_sg_list = sg_list;
+ link = &old_sg_list[SGES_PER_PAGE - 1];
+
+ sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
+ if (!sg_list) {
+ dev_err_ratelimited(hdev->dev,
+ "Allocate %dth sgl_list"
+ " failed\n",
+ iod->npages + 1);
+ return -ENOMEM;
+ }
+ list[iod->npages++] = sg_list;
+
+ i = 0;
+ memcpy(&sg_list[i++], link, sizeof(*link));
+ spraid_sgl_set_seg(link, sgl_dma, nsge);
+ }
+
+ spraid_sgl_set_data(&sg_list[i++], sg);
+ sg = sg_next(sg);
+ } while (--nsge > 0);
+
+ return 0;
+}
+
+#define SPRAID_RW_FUA BIT(14)
+
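+/*
+ * Translate the LBA, transfer length and FUA bit out of a 6/10/12/16/32
+ * byte READ or WRITE CDB into the firmware's rw command layout.
+ */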
+static void spraid_setup_rw_cmd(struct spraid_dev *hdev,
+ struct spraid_rw_command *rw,
+ struct scsi_cmnd *scmd)
+{
+ u32 start_lba_lo, start_lba_hi;
+ u32 datalength = 0;
+ u16 control = 0;
+
+ start_lba_lo = 0;
+ start_lba_hi = 0;
+
+ if (scmd->sc_data_direction == DMA_TO_DEVICE) {
+ rw->opcode = SPRAID_CMD_WRITE;
+ } else if (scmd->sc_data_direction == DMA_FROM_DEVICE) {
+ rw->opcode = SPRAID_CMD_READ;
+ } else {
+ dev_err(hdev->dev,
+ "Invalid IO for unsupported data direction: %d\n",
+ scmd->sc_data_direction);
+ WARN_ON(1);
+ }
+
+ /* 6-byte READ(0x08) or WRITE(0x0A) cdb */
+ if (scmd->cmd_len == 6) {
+ datalength = (u32)(scmd->cmnd[4] == 0 ?
+ IO_6_DEFAULT_TX_LEN : scmd->cmnd[4]);
+ start_lba_lo = (u32)get_unaligned_be24(&scmd->cmnd[1]);
+
+ start_lba_lo &= 0x1FFFFF;
+ }
+
+ /* 10-byte READ(0x28) or WRITE(0x2A) cdb */
+ else if (scmd->cmd_len == 10) {
+ datalength = (u32)get_unaligned_be16(&scmd->cmnd[7]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+
+ /* 12-byte READ(0xA8) or WRITE(0xAA) cdb */
+ else if (scmd->cmd_len == 12) {
+ datalength = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+ /* 16-byte READ(0x88) or WRITE(0x8A) cdb */
+ else if (scmd->cmd_len == 16) {
+ datalength = get_unaligned_be32(&scmd->cmnd[10]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+ /* 32-byte variable-length READ(32) or WRITE(32) cdb, opcode 0x7F */
+ else if (scmd->cmd_len == 32) {
+ datalength = get_unaligned_be32(&scmd->cmnd[28]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[16]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[12]);
+
+ if (scmd->cmnd[10] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+
+ if (unlikely(datalength > U16_MAX || datalength == 0)) {
+ dev_err(hdev->dev,
+ "Invalid IO for illegal transfer data length: %u\n",
+ datalength);
+ WARN_ON(1);
+ }
+
+ rw->slba = cpu_to_le64(((u64)start_lba_hi << 32) | start_lba_lo);
+ /* nlb is zero-based: stores (transfer length - 1) */
+ rw->nlb = cpu_to_le16((u16)(datalength - 1));
+ rw->control = cpu_to_le16(control);
+}
+
+static void spraid_setup_nonio_cmd(struct spraid_dev *hdev,
+ struct spraid_scsi_nonio *scsi_nonio,
+ struct scsi_cmnd *scmd)
+{
+ scsi_nonio->buffer_len = cpu_to_le32(scsi_bufflen(scmd));
+
+ switch (scmd->sc_data_direction) {
+ case DMA_NONE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_NONE;
+ break;
+ case DMA_TO_DEVICE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_TODEV;
+ break;
+ case DMA_FROM_DEVICE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_FROMDEV;
+ break;
+ default:
+ dev_err(hdev->dev,
+ "Invalid IO for unsupported data direction: %d\n",
+ scmd->sc_data_direction);
+ WARN_ON(1);
+ }
+}
+
+static void spraid_setup_ioq_cmd(struct spraid_dev *hdev,
+ struct spraid_ioq_command *ioq_cmd,
+ struct scsi_cmnd *scmd)
+{
+ memcpy(ioq_cmd->common.cdb, scmd->cmnd, scmd->cmd_len);
+ ioq_cmd->common.cdb_len = scmd->cmd_len;
+
+ if (spraid_is_rw_scmd(scmd))
+ spraid_setup_rw_cmd(hdev, &ioq_cmd->rw, scmd);
+ else
+ spraid_setup_nonio_cmd(hdev, &ioq_cmd->scsi_nonio, scmd);
+}
+
+static int spraid_init_iod(struct spraid_dev *hdev, struct spraid_iod *iod,
+ struct spraid_ioq_command *ioq_cmd,
+ struct scsi_cmnd *scmd)
+{
+ if (unlikely(!iod->sense)) {
+ dev_err(hdev->dev, "Allocate sense data buffer failed\n");
+ return -ENOMEM;
+ }
+ ioq_cmd->common.sense_addr = cpu_to_le64(iod->sense_dma);
+ ioq_cmd->common.sense_len = cpu_to_le16(SCSI_SENSE_BUFFERSIZE);
+
+ iod->nsge = 0;
+ iod->npages = -1;
+ iod->use_sgl = 0;
+ iod->sg_drv_mgmt = false;
+ WRITE_ONCE(iod->state, SPRAID_CMD_IDLE);
+
+ return 0;
+}
+
+static void spraid_free_iod_res(struct spraid_dev *hdev, struct spraid_iod *iod)
+{
+ const int last_prp = hdev->page_size / sizeof(__le64) - 1;
+ dma_addr_t dma_addr, next_dma_addr;
+ struct spraid_sgl_desc *sg_list;
+ __le64 *prp_list;
+ void *addr;
+ int i;
+
+ dma_addr = iod->first_dma;
+ if (iod->npages == 0)
+ dma_pool_free(iod->spraidq->prp_small_pool,
+ spraid_iod_list(iod)[0], dma_addr);
+
+ for (i = 0; i < iod->npages; i++) {
+ addr = spraid_iod_list(iod)[i];
+
+ if (iod->use_sgl) {
+ sg_list = addr;
+ next_dma_addr =
+ le64_to_cpu((sg_list[SGES_PER_PAGE - 1]).addr);
+ } else {
+ prp_list = addr;
+ next_dma_addr = le64_to_cpu(prp_list[last_prp]);
+ }
+
+ dma_pool_free(hdev->prp_page_pool, addr, dma_addr);
+ dma_addr = next_dma_addr;
+ }
+
+ if (iod->sg_drv_mgmt && iod->sg != iod->inline_sg) {
+ iod->sg_drv_mgmt = false;
+ mempool_free(iod->sg, hdev->iod_mempool);
+ }
+
+ iod->sense = NULL;
+ iod->npages = -1;
+}
+
+static int spraid_io_map_data(struct spraid_dev *hdev, struct spraid_iod *iod,
+ struct scsi_cmnd *scmd,
+ struct spraid_ioq_command *ioq_cmd)
+{
+ int ret;
+
+ iod->nsge = scsi_dma_map(scmd);
+
+ /* No data to DMA, it may be scsi no-rw command */
+ if (unlikely(iod->nsge == 0))
+ return 0;
+
+ iod->length = scsi_bufflen(scmd);
+ iod->sg = scsi_sglist(scmd);
+ iod->use_sgl = !spraid_is_prp(hdev, scmd, iod->nsge);
+
+ if (iod->use_sgl) {
+ ret = spraid_setup_ioq_cmd_sgl(hdev, scmd, ioq_cmd, iod);
+ } else {
+ ret = spraid_setup_prps(hdev, iod);
+ ioq_cmd->common.dptr.prp1 =
+ cpu_to_le64(sg_dma_address(iod->sg));
+ ioq_cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
+ }
+
+ if (ret)
+ scsi_dma_unmap(scmd);
+
+ return ret;
+}
+
+static void spraid_map_status(struct spraid_iod *iod, struct scsi_cmnd *scmd,
+ struct spraid_completion *cqe)
+{
+ scsi_set_resid(scmd, 0);
+
+ switch ((le16_to_cpu(cqe->status) >> 1) & 0x7f) {
+ case FW_STAT_OK:
+ set_host_byte(scmd, DID_OK);
+ break;
+ case FW_STAT_NEED_CHECK:
+ set_host_byte(scmd, DID_OK);
+ scmd->result |= le16_to_cpu(cqe->status) >> 8;
+ if (scmd->result & SAM_STAT_CHECK_CONDITION) {
+ memset(scmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
+ memcpy(scmd->sense_buffer, iod->sense,
+ SCSI_SENSE_BUFFERSIZE);
+ scmd->result =
+ (scmd->result & 0x00ffffff) | (DRIVER_SENSE << 24);
+ }
+ break;
+ case FW_STAT_ABORTED:
+ set_host_byte(scmd, DID_ABORT);
+ break;
+ case FW_STAT_NEED_RETRY:
+ set_host_byte(scmd, DID_REQUEUE);
+ break;
+ default:
+ set_host_byte(scmd, DID_BAD_TARGET);
+ break;
+ }
+}
+
+static inline void spraid_get_tag_from_scmd(struct scsi_cmnd *scmd,
+ u16 *qid, u16 *cid)
+{
+ u32 tag = blk_mq_unique_tag(scmd->request);
+
+ *qid = blk_mq_unique_tag_to_hwq(tag) + 1;
+ *cid = blk_mq_unique_tag_to_tag(tag);
+}
+
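+/*
+ * .queuecommand entry: translate the scmd into an ioq command, carve the
+ * per-command sense buffer out of the queue's sense area, map the data
+ * buffer (PRP or SGL) and ring the submission queue doorbell.
+ */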
+static int spraid_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
+{
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_dev *hdev = shost_priv(shost);
+ struct scsi_device *sdev = scmd->device;
+ struct spraid_sdev_hostdata *hostdata;
+ struct spraid_ioq_command ioq_cmd;
+ struct spraid_queue *ioq;
+ unsigned long elapsed;
+ u16 hwq, cid;
+ int ret;
+
+ if (unlikely(!scmd)) {
+ dev_err(hdev->dev, "err, scmd is null\n");
+ return 0;
+ }
+
+ if (unlikely(hdev->state != SPRAID_LIVE)) {
+ set_host_byte(scmd, DID_NO_CONNECT);
+ scmd->scsi_done(scmd);
+ return 0;
+ }
+
+ if (log_debug_switch)
+ scsi_print_command(scmd);
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ hostdata = sdev->hostdata;
+ ioq = &hdev->queues[hwq];
+ memset(&ioq_cmd, 0, sizeof(ioq_cmd));
+ ioq_cmd.rw.hdid = cpu_to_le32(hostdata->hdid);
+ ioq_cmd.rw.command_id = cid;
+
+ spraid_setup_ioq_cmd(hdev, &ioq_cmd, scmd);
+
+ iod->sense = ioq->sense + cid * SCSI_SENSE_BUFFERSIZE;
+ iod->sense_dma = ioq->sense_dma_addr + cid * SCSI_SENSE_BUFFERSIZE;
+
+ ret = spraid_init_iod(hdev, iod, &ioq_cmd, scmd);
+ if (unlikely(ret))
+ return SCSI_MLQUEUE_HOST_BUSY;
+
+ iod->spraidq = ioq;
+ ret = spraid_io_map_data(hdev, iod, scmd, &ioq_cmd);
+ if (unlikely(ret)) {
+ dev_err(hdev->dev, "spraid_io_map_data Err.\n");
+ set_host_byte(scmd, DID_ERROR);
+ scmd->scsi_done(scmd);
+ ret = 0;
+ goto deinit_iod;
+ }
+
+ WRITE_ONCE(iod->state, SPRAID_CMD_IN_FLIGHT);
+ spraid_submit_cmd(ioq, &ioq_cmd);
+ elapsed = jiffies - scmd->jiffies_at_alloc;
+ dev_log_dbg(hdev->dev,
+ "cid[%d] qid[%d] submit IO cost %3ld.%3ld seconds\n",
+ cid, hwq, elapsed / HZ, elapsed % HZ);
+ return 0;
+
+deinit_iod:
+ spraid_free_iod_res(hdev, iod);
+ return ret;
+}
+
+static int spraid_match_dev(struct spraid_dev *hdev, u16 idx,
+ struct scsi_device *sdev)
+{
+ if (SPRAID_DEV_INFO_FLAG_VALID(hdev->devices[idx].flag)) {
+ if (sdev->channel == hdev->devices[idx].channel &&
+ sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
+ sdev->lun < hdev->devices[idx].lun) {
+ dev_info(hdev->dev,
+ "Match device success, channel:"
+ "target:lun[%d:%d:%d]\n",
+ hdev->devices[idx].channel,
+ le16_to_cpu(hdev->devices[idx].target),
+ hdev->devices[idx].lun);
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+static int spraid_slave_alloc(struct scsi_device *sdev)
+{
+ struct spraid_sdev_hostdata *hostdata;
+ struct spraid_dev *hdev;
+ u16 idx;
+
+ hdev = shost_priv(sdev->host);
+ hostdata = kzalloc(sizeof(*hostdata), GFP_KERNEL);
+ if (!hostdata) {
+ dev_err(hdev->dev, "Alloc scsi host data memory failed\n");
+ return -ENOMEM;
+ }
+
+ down_read(&hdev->devices_rwsem);
+ for (idx = 0; idx < le32_to_cpu(hdev->ctrl_info->nd); idx++) {
+ if (spraid_match_dev(hdev, idx, sdev))
+ goto scan_host;
+ }
+ up_read(&hdev->devices_rwsem);
+
+ kfree(hostdata);
+ return -ENXIO;
+
+scan_host:
+ hostdata->hdid = le32_to_cpu(hdev->devices[idx].hdid);
+ hostdata->max_io_kb = le16_to_cpu(hdev->devices[idx].max_io_kb);
+ hostdata->attr = hdev->devices[idx].attr;
+ hostdata->flag = hdev->devices[idx].flag;
+ hostdata->rg_id = 0xff;
+ sdev->hostdata = hostdata;
+ up_read(&hdev->devices_rwsem);
+ return 0;
+}
+
+static void spraid_slave_destroy(struct scsi_device *sdev)
+{
+ kfree(sdev->hostdata);
+ sdev->hostdata = NULL;
+}
+
+static int spraid_slave_configure(struct scsi_device *sdev)
+{
+ u16 idx;
+ unsigned int timeout = scmd_tmout_nonpt * HZ;
+ struct spraid_dev *hdev = shost_priv(sdev->host);
+ struct spraid_sdev_hostdata *hostdata = sdev->hostdata;
+ u32 max_sec = sdev->host->max_sectors;
+
+ if (hostdata) {
+ idx = hostdata->hdid - 1;
+ if (sdev->channel == hdev->devices[idx].channel &&
+ sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
+ sdev->lun < hdev->devices[idx].lun) {
+ if (SPRAID_DEV_INFO_ATTR_PT(hdev->devices[idx].attr))
+ timeout = 30 * HZ;
+ else
+ timeout = scmd_tmout_nonpt * HZ;
+ max_sec = le16_to_cpu(hdev->devices[idx].max_io_kb)
+ << 1;
+ } else {
+ dev_err(hdev->dev,
+ "[%s] err, sdev->channel:id:lun[%d:%d:%lld],"
+ " devices[%d], channel:target:lun[%d:%d:%d]\n",
+ __func__, sdev->channel, sdev->id, sdev->lun,
+ idx, hdev->devices[idx].channel,
+ le16_to_cpu(hdev->devices[idx].target),
+ hdev->devices[idx].lun);
+ }
+ } else {
+ dev_err(hdev->dev, "[%s] err, sdev->hostdata is null\n",
+ __func__);
+ }
+
+ blk_queue_rq_timeout(sdev->request_queue, timeout);
+ sdev->eh_timeout = timeout;
+
+ if ((max_sec == 0) || (max_sec > sdev->host->max_sectors))
+ max_sec = sdev->host->max_sectors;
+
+ dev_info(hdev->dev,
+ "[%s] sdev->channel:id:lun[%d:%d:%lld],"
+ " scmd_timeout[%d]s, maxsec[%d]\n",
+ __func__, sdev->channel, sdev->id,
+ sdev->lun, timeout / HZ, max_sec);
+
+ return 0;
+}
+
+static void spraid_shost_init(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ u8 domain, bus;
+ u32 dev_func;
+
+ domain = pci_domain_nr(pdev->bus);
+ bus = pdev->bus->number;
+ dev_func = pdev->devfn;
+
+ hdev->shost->nr_hw_queues = hdev->online_queues - 1;
+ hdev->shost->can_queue = (hdev->ioq_depth - SPRAID_PTCMDS_PERQ);
+
+ hdev->shost->sg_tablesize = le16_to_cpu(hdev->ctrl_info->max_num_sge);
+ /* 512B per sector */
+ hdev->shost->max_sectors =
+ (1U << ((hdev->ctrl_info->mdts) * 1U) << 12) / 512;
+ hdev->shost->cmd_per_lun = MAX_CMD_PER_DEV;
+ hdev->shost->max_channel =
+ le16_to_cpu(hdev->ctrl_info->max_channel) - 1;
+ hdev->shost->max_id = le32_to_cpu(hdev->ctrl_info->max_tgt_id);
+ hdev->shost->max_lun = le16_to_cpu(hdev->ctrl_info->max_lun);
+
+ hdev->shost->this_id = -1;
+ hdev->shost->unique_id = (domain << 16) | (bus << 8) | dev_func;
+ hdev->shost->max_cmd_len = MAX_CDB_LEN;
+ hdev->shost->hostt->cmd_size = max(spraid_cmd_size(hdev, false, true),
+ spraid_cmd_size(hdev, false, false));
+}
+
+static inline void spraid_host_deinit(struct spraid_dev *hdev)
+{
+ ida_free(&spraid_instance_ida, hdev->instance);
+}
+
+static int spraid_alloc_queue(struct spraid_dev *hdev, u16 qid, u16 depth)
+{
+ struct spraid_queue *spraidq = &hdev->queues[qid];
+ int ret = 0;
+
+ if (hdev->queue_count > qid) {
+ dev_info(hdev->dev, "[%s] warn: queue[%d] is exist\n",
+ __func__, qid);
+ return 0;
+ }
+
+ spraidq->cqes = dma_alloc_coherent(hdev->dev, CQ_SIZE(depth),
+ &spraidq->cq_dma_addr,
+ GFP_KERNEL | __GFP_ZERO);
+ if (!spraidq->cqes)
+ return -ENOMEM;
+
+ spraidq->sq_cmds = dma_alloc_coherent(hdev->dev, SQ_SIZE(qid, depth),
+ &spraidq->sq_dma_addr,
+ GFP_KERNEL);
+ if (!spraidq->sq_cmds) {
+ ret = -ENOMEM;
+ goto free_cqes;
+ }
+
+ spin_lock_init(&spraidq->sq_lock);
+ spin_lock_init(&spraidq->cq_lock);
+ spraidq->hdev = hdev;
+ spraidq->q_depth = depth;
+ spraidq->qid = qid;
+ spraidq->cq_vector = -1;
+ hdev->queue_count++;
+
+ /* alloc sense buffer */
+ spraidq->sense = dma_alloc_coherent(hdev->dev, SENSE_SIZE(depth),
+ &spraidq->sense_dma_addr,
+ GFP_KERNEL | __GFP_ZERO);
+ if (!spraidq->sense) {
+ ret = -ENOMEM;
+ goto free_sq_cmds;
+ }
+
+ return 0;
+
+free_sq_cmds:
+ dma_free_coherent(hdev->dev, SQ_SIZE(qid, depth),
+ (void *)spraidq->sq_cmds, spraidq->sq_dma_addr);
+free_cqes:
+ dma_free_coherent(hdev->dev, CQ_SIZE(depth), (void *)spraidq->cqes,
+ spraidq->cq_dma_addr);
+ return ret;
+}
+
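+/*
+ * Poll CSTS.RDY until it matches the requested enable state, bounded by
+ * the timeout advertised in the CAP register.
+ */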
+static int spraid_wait_ready(struct spraid_dev *hdev, u64 cap, bool enabled)
+{
+ unsigned long timeout =
+ ((SPRAID_CAP_TIMEOUT(cap) + 1) * SPRAID_CAP_TIMEOUT_UNIT_MS) + jiffies;
+ u32 bit = enabled ? SPRAID_CSTS_RDY : 0;
+
+ while ((readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_RDY) != bit) {
+ usleep_range(1000, 2000);
+ if (fatal_signal_pending(current))
+ return -EINTR;
+
+ if (time_after(jiffies, timeout)) {
+ dev_err(hdev->dev, "Device not ready; aborting %s\n",
+ enabled ? "initialisation" : "reset");
+ return -ENODEV;
+ }
+ }
+ return 0;
+}
+
+static int spraid_shutdown_ctrl(struct spraid_dev *hdev)
+{
+ unsigned long timeout = hdev->ctrl_info->rtd3e + jiffies;
+
+ hdev->ctrl_config &= ~SPRAID_CC_SHN_MASK;
+ hdev->ctrl_config |= SPRAID_CC_SHN_NORMAL;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ while ((readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_SHST_MASK) !=
+ SPRAID_CSTS_SHST_CMPLT) {
+ msleep(100);
+ if (fatal_signal_pending(current))
+ return -EINTR;
+ if (time_after(jiffies, timeout)) {
+ dev_err(hdev->dev,
+ "Device shutdown incomplete; abort shutdown\n");
+ return -ENODEV;
+ }
+ }
+ return 0;
+}
+
+static int spraid_disable_ctrl(struct spraid_dev *hdev)
+{
+ hdev->ctrl_config &= ~SPRAID_CC_SHN_MASK;
+ hdev->ctrl_config &= ~SPRAID_CC_ENABLE;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ return spraid_wait_ready(hdev, hdev->cap, false);
+}
+
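+/*
+ * Negotiate the memory page size against CAP.MPSMIN/MPSMAX, program CC
+ * and wait for the controller to report ready.
+ */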
+static int spraid_enable_ctrl(struct spraid_dev *hdev)
+{
+ u64 cap = hdev->cap;
+ u32 dev_page_min = SPRAID_CAP_MPSMIN(cap) + 12;
+ u32 page_shift = PAGE_SHIFT;
+
+ if (page_shift < dev_page_min) {
+ dev_err(hdev->dev,
+ "Minimum device page size[%u], too large for host[%u]\n",
+ 1U << dev_page_min, 1U << page_shift);
+ return -ENODEV;
+ }
+
+ page_shift = min_t(unsigned int, SPRAID_CAP_MPSMAX(cap) + 12,
+ PAGE_SHIFT);
+ hdev->page_size = 1U << page_shift;
+
+ hdev->ctrl_config = SPRAID_CC_CSS_NVM;
+ hdev->ctrl_config |= (page_shift - 12) << SPRAID_CC_MPS_SHIFT;
+ hdev->ctrl_config |= SPRAID_CC_AMS_RR | SPRAID_CC_SHN_NONE;
+ hdev->ctrl_config |= SPRAID_CC_IOSQES | SPRAID_CC_IOCQES;
+ hdev->ctrl_config |= SPRAID_CC_ENABLE;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ return spraid_wait_ready(hdev, cap, true);
+}
+
+static void spraid_init_queue(struct spraid_queue *spraidq, u16 qid)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ memset((void *)spraidq->cqes, 0, CQ_SIZE(spraidq->q_depth));
+
+ spraidq->sq_tail = 0;
+ spraidq->cq_head = 0;
+ spraidq->cq_phase = 1;
+ spraidq->q_db = &hdev->dbs[qid * 2 * hdev->db_stride];
+ spraidq->prp_small_pool = hdev->prp_small_pool[qid % small_pool_num];
+ hdev->online_queues++;
+}
+
+static inline bool spraid_cqe_pending(struct spraid_queue *spraidq)
+{
+ return (le16_to_cpu(spraidq->cqes[spraidq->cq_head].status) & 1) ==
+ spraidq->cq_phase;
+}
+
+static void spraid_sata_report_zone_handle(struct scsi_cmnd *scmd,
+ struct spraid_iod *iod)
+{
+ int i = 0;
+ unsigned int bytes = 0;
+ struct scatterlist *sg = scsi_sglist(scmd);
+
+ scsi_for_each_sg(scmd, sg, iod->nsge, i) {
+ unsigned int offset = 0;
+
+ if (bytes == 0) {
+ char *hdr;
+ u32 list_length;
+ u64 max_lba, opt_lba;
+ u16 same;
+
+ hdr = sg_virt(sg);
+
+ list_length = get_unaligned_le32(&hdr[0]);
+ same = get_unaligned_le16(&hdr[4]);
+ max_lba = get_unaligned_le64(&hdr[8]);
+ opt_lba = get_unaligned_le64(&hdr[16]);
+ put_unaligned_be32(list_length, &hdr[0]);
+ hdr[4] = same & 0xf;
+ put_unaligned_be64(max_lba, &hdr[8]);
+ put_unaligned_be64(opt_lba, &hdr[16]);
+ offset += 64;
+ bytes += 64;
+ }
+ while (offset < sg_dma_len(sg)) {
+ char *rec;
+ u8 cond, type, non_seq, reset;
+ u64 size, start, wp;
+
+ rec = sg_virt(sg) + offset;
+ type = rec[0] & 0xf;
+ cond = (rec[1] >> 4) & 0xf;
+ non_seq = (rec[1] & 2);
+ reset = (rec[1] & 1);
+ size = get_unaligned_le64(&rec[8]);
+ start = get_unaligned_le64(&rec[16]);
+ wp = get_unaligned_le64(&rec[24]);
+ rec[0] = type;
+ rec[1] = (cond << 4) | non_seq | reset;
+ put_unaligned_be64(size, &rec[8]);
+ put_unaligned_be64(start, &rec[16]);
+ put_unaligned_be64(wp, &rec[24]);
+ WARN_ON(offset + 64 > sg_dma_len(sg));
+ offset += 64;
+ bytes += 64;
+ }
+ }
+}
+
+static inline void spraid_handle_ata_cmd(struct spraid_dev *hdev,
+ struct scsi_cmnd *scmd,
+ struct spraid_iod *iod)
+{
+ if (hdev->ctrl_info->card_type != SPRAID_CARD_HBA)
+ return;
+
+ switch (scmd->cmnd[0]) {
+ case ZBC_IN:
+ dev_info(hdev->dev, "[%s] process report zone\n", __func__);
+ spraid_sata_report_zone_handle(scmd, iod);
+ break;
+ default:
+ break;
+ }
+}
+
+static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct blk_mq_tags *tags;
+ struct scsi_cmnd *scmd;
+ struct spraid_iod *iod;
+ struct request *req;
+ unsigned long elapsed;
+
+ tags = hdev->shost->tag_set.tags[ioq->qid - 1];
+ req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
+ if (unlikely(!req || !blk_mq_request_started(req))) {
+ dev_warn(hdev->dev, "Invalid id %d completed on queue %d\n",
+ cqe->cmd_id, ioq->qid);
+ return;
+ }
+
+ scmd = blk_mq_rq_to_pdu(req);
+ iod = scsi_cmd_priv(scmd);
+
+ elapsed = jiffies - scmd->jiffies_at_alloc;
+ dev_log_dbg(hdev->dev,
+ "cid[%d] qid[%d] finish IO cost %3ld.%3ld seconds\n",
+ cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
+
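+ /*
+ * Losing this cmpxchg means the timeout handler already claimed
+ * the command; just drop the DMA mapping and iod resources and
+ * let the abnormal path complete the scmd.
+ */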
+ if (cmpxchg(&iod->state, SPRAID_CMD_IN_FLIGHT, SPRAID_CMD_COMPLETE) !=
+ SPRAID_CMD_IN_FLIGHT) {
+ dev_warn(hdev->dev,
+ "cid[%d] qid[%d] enters abnormal handler,"
+ " cost %3ld.%3ld seconds\n",
+ cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
+ WRITE_ONCE(iod->state, SPRAID_CMD_TMO_COMPLETE);
+
+ if (iod->nsge) {
+ iod->nsge = 0;
+ scsi_dma_unmap(scmd);
+ }
+ spraid_free_iod_res(hdev, iod);
+
+ return;
+ }
+
+ spraid_handle_ata_cmd(hdev, scmd, iod);
+
+ spraid_map_status(iod, scmd, cqe);
+ if (iod->nsge) {
+ iod->nsge = 0;
+ scsi_dma_unmap(scmd);
+ }
+ spraid_free_iod_res(hdev, iod);
+ scmd->scsi_done(scmd);
+}
+
+static void spraid_complete_adminq_cmnd(struct spraid_queue *adminq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = adminq->hdev;
+ struct spraid_cmd *adm_cmd;
+
+ adm_cmd = hdev->adm_cmds + cqe->cmd_id;
+ if (unlikely(adm_cmd->state == SPRAID_CMD_IDLE)) {
+ dev_warn(adminq->hdev->dev,
+ "Invalid id %d completed on queue %d\n",
+ cqe->cmd_id, le16_to_cpu(cqe->sq_id));
+ return;
+ }
+
+ adm_cmd->status = le16_to_cpu(cqe->status) >> 1;
+ adm_cmd->result0 = le32_to_cpu(cqe->result);
+ adm_cmd->result1 = le32_to_cpu(cqe->result1);
+
+ complete(&adm_cmd->cmd_done);
+}
+
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid);
+
+static void spraid_complete_aen(struct spraid_queue *spraidq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+ u32 result = le32_to_cpu(cqe->result);
+
+ dev_info(hdev->dev, "rcv aen, cid[%d], status[0x%x], result[0x%x]\n",
+ cqe->cmd_id, le16_to_cpu(cqe->status) >> 1, result);
+
+ spraid_send_aen(hdev, cqe->cmd_id);
+
+ if ((le16_to_cpu(cqe->status) >> 1) != SPRAID_SC_SUCCESS)
+ return;
+ switch (result & 0x7) {
+ case SPRAID_AEN_NOTICE:
+ spraid_handle_aen_notice(hdev, result);
+ break;
+ case SPRAID_AEN_VS:
+ spraid_handle_aen_vs(hdev, result, le32_to_cpu(cqe->result1));
+ break;
+ default:
+ dev_warn(hdev->dev, "Unsupported async event type: %u\n",
+ result & 0x7);
+ break;
+ }
+}
+
+static void spraid_complete_ioq_sync_cmnd(struct spraid_queue *ioq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct spraid_cmd *ptcmd;
+
+ ptcmd = hdev->ioq_ptcmds + (ioq->qid - 1) * SPRAID_PTCMDS_PERQ +
+ cqe->cmd_id - SPRAID_IO_BLK_MQ_DEPTH;
+
+ ptcmd->status = le16_to_cpu(cqe->status) >> 1;
+ ptcmd->result0 = le32_to_cpu(cqe->result);
+ ptcmd->result1 = le32_to_cpu(cqe->result1);
+
+ complete(&ptcmd->cmd_done);
+}
+
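+/*
+ * Dispatch one CQE by command id: ids at or above the blk-mq queue depth
+ * belong to driver-reserved slots (AEN events on the admin queue, sync
+ * pass-through commands on IO queues); the rest are normal admin or
+ * scsi-mq completions.
+ */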
+static inline void spraid_handle_cqe(struct spraid_queue *spraidq, u16 idx)
+{
+ struct spraid_completion *cqe = &spraidq->cqes[idx];
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ if (unlikely(cqe->cmd_id >= spraidq->q_depth)) {
+ dev_err(hdev->dev,
+ "Invalid command id[%d] completed on queue %d\n",
+ cqe->cmd_id, le16_to_cpu(cqe->sq_id));
+ return;
+ }
+
+ dev_log_dbg(hdev->dev, "cid[%d] qid[%d],"
+ " result[0x%x], sq_id[%d], status[0x%x]\n",
+ cqe->cmd_id, spraidq->qid, le32_to_cpu(cqe->result),
+ le16_to_cpu(cqe->sq_id), le16_to_cpu(cqe->status));
+
+ if (unlikely(spraidq->qid == 0 &&
+ cqe->cmd_id >= SPRAID_AQ_BLK_MQ_DEPTH)) {
+ spraid_complete_aen(spraidq, cqe);
+ return;
+ }
+
+ if (unlikely(spraidq->qid && cqe->cmd_id >= SPRAID_IO_BLK_MQ_DEPTH)) {
+ spraid_complete_ioq_sync_cmnd(spraidq, cqe);
+ return;
+ }
+
+ if (spraidq->qid)
+ spraid_complete_ioq_cmnd(spraidq, cqe);
+ else
+ spraid_complete_adminq_cmnd(spraidq, cqe);
+}
+
+static void spraid_complete_cqes(struct spraid_queue *spraidq,
+ u16 start, u16 end)
+{
+ while (start != end) {
+ spraid_handle_cqe(spraidq, start);
+ if (++start == spraidq->q_depth)
+ start = 0;
+ }
+}
+
+static inline void spraid_update_cq_head(struct spraid_queue *spraidq)
+{
+ if (++spraidq->cq_head == spraidq->q_depth) {
+ spraidq->cq_head = 0;
+ spraidq->cq_phase = !spraidq->cq_phase;
+ }
+}
+
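+/*
+ * Reap CQEs while the phase bit matches the queue's current phase,
+ * stopping once @tag is found, and write the new head back to the
+ * completion queue doorbell.
+ */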
+static inline bool spraid_process_cq(struct spraid_queue *spraidq,
+ u16 *start, u16 *end, int tag)
+{
+ bool found = false;
+
+ *start = spraidq->cq_head;
+ while (!found && spraid_cqe_pending(spraidq)) {
+ if (spraidq->cqes[spraidq->cq_head].cmd_id == tag)
+ found = true;
+ spraid_update_cq_head(spraidq);
+ }
+ *end = spraidq->cq_head;
+
+ if (*start != *end)
+ writel(spraidq->cq_head,
+ spraidq->q_db + spraidq->hdev->db_stride);
+
+ return found;
+}
+
+static bool spraid_poll_cq(struct spraid_queue *spraidq, int cid)
+{
+ u16 start, end;
+ bool found;
+
+ if (!spraid_cqe_pending(spraidq))
+ return 0;
+
+ spin_lock_irq(&spraidq->cq_lock);
+ found = spraid_process_cq(spraidq, &start, &end, cid);
+ spin_unlock_irq(&spraidq->cq_lock);
+
+ spraid_complete_cqes(spraidq, start, end);
+ return found;
+}
+
+static irqreturn_t spraid_irq(int irq, void *data)
+{
+ struct spraid_queue *spraidq = data;
+ irqreturn_t ret = IRQ_NONE;
+ u16 start, end;
+
+ spin_lock(&spraidq->cq_lock);
+ if (spraidq->cq_head != spraidq->last_cq_head)
+ ret = IRQ_HANDLED;
+
+ spraid_process_cq(spraidq, &start, &end, -1);
+ spraidq->last_cq_head = spraidq->cq_head;
+ spin_unlock(&spraidq->cq_lock);
+
+ if (start != end) {
+ spraid_complete_cqes(spraidq, start, end);
+ ret = IRQ_HANDLED;
+ }
+ return ret;
+}
+
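+/*
+ * Bring up the admin queue: disable the controller, allocate the queue
+ * pair, program AQA/ASQ/ACQ, re-enable the controller and hook up the
+ * admin interrupt.
+ */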
+static int spraid_setup_admin_queue(struct spraid_dev *hdev)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u32 aqa;
+ int ret;
+
+ dev_info(hdev->dev, "[%s] start disable ctrl\n", __func__);
+
+ ret = spraid_disable_ctrl(hdev);
+ if (ret)
+ return ret;
+
+ ret = spraid_alloc_queue(hdev, 0, SPRAID_AQ_DEPTH);
+ if (ret)
+ return ret;
+
+ aqa = adminq->q_depth - 1;
+ aqa |= aqa << 16;
+ writel(aqa, hdev->bar + SPRAID_REG_AQA);
+ lo_hi_writeq(adminq->sq_dma_addr, hdev->bar + SPRAID_REG_ASQ);
+ lo_hi_writeq(adminq->cq_dma_addr, hdev->bar + SPRAID_REG_ACQ);
+
+ dev_info(hdev->dev, "[%s] start enable ctrl\n", __func__);
+
+ ret = spraid_enable_ctrl(hdev);
+ if (ret) {
+ ret = -ENODEV;
+ goto free_queue;
+ }
+
+ adminq->cq_vector = 0;
+ spraid_init_queue(adminq, 0);
+ ret = pci_request_irq(hdev->pdev, adminq->cq_vector, spraid_irq, NULL,
+ adminq, "spraid%d_q%d",
+ hdev->instance, adminq->qid);
+
+ if (ret) {
+ adminq->cq_vector = -1;
+ hdev->online_queues--;
+ goto free_queue;
+ }
+
+ dev_info(hdev->dev, "[%s] success, queuecount:[%d], onlinequeue:[%d]\n",
+ __func__, hdev->queue_count, hdev->online_queues);
+
+ return 0;
+
+free_queue:
+ spraid_free_queue(adminq);
+ return ret;
+}
+
+static u32 spraid_bar_size(struct spraid_dev *hdev, u32 nr_ioqs)
+{
+ return (SPRAID_REG_DBS + ((nr_ioqs + 1) * 8 * hdev->db_stride));
+}
+
+static int spraid_alloc_admin_cmds(struct spraid_dev *hdev)
+{
+ int i;
+
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+ spin_lock_init(&hdev->adm_cmd_lock);
+
+ hdev->adm_cmds = kcalloc_node(SPRAID_AQ_BLK_MQ_DEPTH,
+ sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
+
+ if (!hdev->adm_cmds) {
+ dev_err(hdev->dev, "Alloc admin cmds failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < SPRAID_AQ_BLK_MQ_DEPTH; i++) {
+ hdev->adm_cmds[i].qid = 0;
+ hdev->adm_cmds[i].cid = i;
+ list_add_tail(&(hdev->adm_cmds[i].list), &hdev->adm_cmd_list);
+ }
+
+ dev_info(hdev->dev, "Alloc admin cmds success, num[%d]\n",
+ SPRAID_AQ_BLK_MQ_DEPTH);
+
+ return 0;
+}
+
+static void spraid_free_admin_cmds(struct spraid_dev *hdev)
+{
+ kfree(hdev->adm_cmds);
+ hdev->adm_cmds = NULL;
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+}
+
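+/*
+ * Reserved admin / IO pass-through commands live on free lists; taking
+ * one marks it in-flight, and spraid_put_cmd() below returns it.
+ */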
+static struct spraid_cmd *spraid_get_cmd(struct spraid_dev *hdev,
+ enum spraid_cmd_type type)
+{
+ struct spraid_cmd *cmd = NULL;
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
+
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
+
+ spin_lock_irqsave(slock, flags);
+ if (list_empty(head)) {
+ spin_unlock_irqrestore(slock, flags);
+ dev_err(hdev->dev, "err, cmd[%d] list empty\n", type);
+ return NULL;
+ }
+ cmd = list_entry(head->next, struct spraid_cmd, list);
+ list_del_init(&cmd->list);
+ spin_unlock_irqrestore(slock, flags);
+
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IN_FLIGHT);
+
+ return cmd;
+}
+
+static void spraid_put_cmd(struct spraid_dev *hdev, struct spraid_cmd *cmd,
+ enum spraid_cmd_type type)
+{
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
+
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
+
+ spin_lock_irqsave(slock, flags);
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IDLE);
+ list_add_tail(&cmd->list, head);
+ spin_unlock_irqrestore(slock, flags);
+}
+
+
+static int spraid_submit_admin_sync_cmd(struct spraid_dev *hdev,
+ struct spraid_admin_command *cmd,
+ u32 *result0, u32 *result1, u32 timeout)
+{
+ struct spraid_cmd *adm_cmd = spraid_get_cmd(hdev, SPRAID_CMD_ADM);
+
+ if (!adm_cmd) {
+ dev_err(hdev->dev, "err, get admin cmd failed\n");
+ return -EFAULT;
+ }
+
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
+
+ init_completion(&adm_cmd->cmd_done);
+
+ cmd->common.command_id = adm_cmd->cid;
+ spraid_submit_cmd(&hdev->queues[0], cmd);
+
+ if (!wait_for_completion_timeout(&adm_cmd->cmd_done, timeout)) {
+ dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout,"
+ " opcode[0x%x] subopcode[0x%x]\n",
+ __func__, adm_cmd->cid, adm_cmd->qid,
+ cmd->usr_cmd.opcode, cmd->usr_cmd.info_0.subopcode);
+ WRITE_ONCE(adm_cmd->state, SPRAID_CMD_TIMEOUT);
+ spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+ return -EINVAL;
+ }
+
+ if (result0)
+ *result0 = adm_cmd->result0;
+ if (result1)
+ *result1 = adm_cmd->result1;
+
+ spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+
+ return adm_cmd->status;
+}
+
+static int spraid_create_cq(struct spraid_dev *hdev, u16 qid,
+ struct spraid_queue *spraidq, u16 cq_vector)
+{
+ struct spraid_admin_command admin_cmd;
+ int flags = SPRAID_QUEUE_PHYS_CONTIG | SPRAID_CQ_IRQ_ENABLED;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.create_cq.opcode = SPRAID_ADMIN_CREATE_CQ;
+ admin_cmd.create_cq.prp1 = cpu_to_le64(spraidq->cq_dma_addr);
+ admin_cmd.create_cq.cqid = cpu_to_le16(qid);
+ admin_cmd.create_cq.qsize = cpu_to_le16(spraidq->q_depth - 1);
+ admin_cmd.create_cq.cq_flags = cpu_to_le16(flags);
+ admin_cmd.create_cq.irq_vector = cpu_to_le16(cq_vector);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+static int spraid_create_sq(struct spraid_dev *hdev, u16 qid,
+ struct spraid_queue *spraidq)
+{
+ struct spraid_admin_command admin_cmd;
+ int flags = SPRAID_QUEUE_PHYS_CONTIG;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.create_sq.opcode = SPRAID_ADMIN_CREATE_SQ;
+ admin_cmd.create_sq.prp1 = cpu_to_le64(spraidq->sq_dma_addr);
+ admin_cmd.create_sq.sqid = cpu_to_le16(qid);
+ admin_cmd.create_sq.qsize = cpu_to_le16(spraidq->q_depth - 1);
+ admin_cmd.create_sq.sq_flags = cpu_to_le16(flags);
+ admin_cmd.create_sq.cqid = cpu_to_le16(qid);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+static void spraid_free_queue(struct spraid_queue *spraidq)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ hdev->queue_count--;
+ dma_free_coherent(hdev->dev, CQ_SIZE(spraidq->q_depth),
+ (void *)spraidq->cqes, spraidq->cq_dma_addr);
+ dma_free_coherent(hdev->dev, SQ_SIZE(spraidq->qid, spraidq->q_depth),
+ spraidq->sq_cmds, spraidq->sq_dma_addr);
+ dma_free_coherent(hdev->dev, SENSE_SIZE(spraidq->q_depth),
+ spraidq->sense, spraidq->sense_dma_addr);
+}
+
+static void spraid_free_admin_queue(struct spraid_dev *hdev)
+{
+ spraid_free_queue(&hdev->queues[0]);
+}
+
+static void spraid_free_io_queues(struct spraid_dev *hdev)
+{
+ int i;
+
+ for (i = hdev->queue_count - 1; i >= 1; i--)
+ spraid_free_queue(&hdev->queues[i]);
+}
+
+static int spraid_delete_queue(struct spraid_dev *hdev, u8 op, u16 id)
+{
+ struct spraid_admin_command admin_cmd;
+ int ret;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.delete_queue.opcode = op;
+ admin_cmd.delete_queue.qid = cpu_to_le16(id);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+
+ if (ret)
+ dev_err(hdev->dev, "Delete %s:[%d] failed\n",
+ (op == SPRAID_ADMIN_DELETE_CQ) ? "cq" : "sq", id);
+
+ return ret;
+}
+
+static int spraid_delete_cq(struct spraid_dev *hdev, u16 cqid)
+{
+ return spraid_delete_queue(hdev, SPRAID_ADMIN_DELETE_CQ, cqid);
+}
+
+static int spraid_delete_sq(struct spraid_dev *hdev, u16 sqid)
+{
+ return spraid_delete_queue(hdev, SPRAID_ADMIN_DELETE_SQ, sqid);
+}
+
+static int spraid_create_queue(struct spraid_queue *spraidq, u16 qid)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+ u16 cq_vector;
+ int ret;
+
+ cq_vector = (hdev->num_vecs == 1) ? 0 : qid;
+ ret = spraid_create_cq(hdev, qid, spraidq, cq_vector);
+ if (ret)
+ return ret;
+
+ ret = spraid_create_sq(hdev, qid, spraidq);
+ if (ret)
+ goto delete_cq;
+
+ spraid_init_queue(spraidq, qid);
+ spraidq->cq_vector = cq_vector;
+
+ ret = pci_request_irq(hdev->pdev, cq_vector, spraid_irq, NULL,
+ spraidq, "spraid%d_q%d", hdev->instance, qid);
+
+ if (ret) {
+ dev_err(hdev->dev, "Request queue[%d] irq failed\n", qid);
+ goto delete_sq;
+ }
+
+ return 0;
+
+delete_sq:
+ spraidq->cq_vector = -1;
+ hdev->online_queues--;
+ spraid_delete_sq(hdev, qid);
+delete_cq:
+ spraid_delete_cq(hdev, qid);
+
+ return ret;
+}
+
+static int spraid_create_io_queues(struct spraid_dev *hdev)
+{
+ u32 i, max;
+ int ret = 0;
+
+ max = min(hdev->max_qid, hdev->queue_count - 1);
+ for (i = hdev->online_queues; i <= max; i++) {
+ ret = spraid_create_queue(&hdev->queues[i], i);
+ if (ret) {
+ dev_err(hdev->dev, "Create queue[%d] failed\n", i);
+ break;
+ }
+ }
+
+ dev_info(hdev->dev, "[%s] queue_count[%d], online_queue[%d]\n",
+ __func__, hdev->queue_count, hdev->online_queues);
+
+ return ret >= 0 ? 0 : ret;
+}
+
+static int spraid_set_features(struct spraid_dev *hdev, u32 fid,
+ u32 dword11, void *buffer,
+ size_t buflen, u32 *result)
+{
+ struct spraid_admin_command admin_cmd;
+ int ret;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+
+ if (buffer && buflen) {
+ data_ptr = dma_alloc_coherent(hdev->dev, buflen,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memcpy(data_ptr, buffer, buflen);
+ }
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.features.opcode = SPRAID_ADMIN_SET_FEATURES;
+ admin_cmd.features.fid = cpu_to_le32(fid);
+ admin_cmd.features.dword11 = cpu_to_le32(dword11);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, result, NULL, 0);
+
+ if (data_ptr)
+ dma_free_coherent(hdev->dev, buflen, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_configure_timestamp(struct spraid_dev *hdev)
+{
+ __le64 ts;
+ int ret;
+
+ ts = cpu_to_le64(ktime_to_ms(ktime_get_real()));
+ ret = spraid_set_features(hdev, SPRAID_FEAT_TIMESTAMP,
+ 0, &ts, sizeof(ts), NULL);
+
+ if (ret)
+ dev_err(hdev->dev, "set timestamp failed: %d\n", ret);
+ return ret;
+}
+
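+/*
+ * Ask the firmware for *cnt IO queue pairs via the Number of Queues
+ * feature (zero-based SQ/CQ counts packed into dword11) and clamp *cnt
+ * to what was actually granted.
+ */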
+static int spraid_set_queue_cnt(struct spraid_dev *hdev, u32 *cnt)
+{
+ u32 q_cnt = (*cnt - 1) | ((*cnt - 1) << 16);
+ u32 nr_ioqs, result;
+ int status;
+
+ status = spraid_set_features(hdev, SPRAID_FEAT_NUM_QUEUES,
+ q_cnt, NULL, 0, &result);
+ if (status) {
+ dev_err(hdev->dev, "Set queue count failed, status: %d\n",
+ status);
+ return -EIO;
+ }
+
+ nr_ioqs = min(result & 0xffff, result >> 16) + 1;
+ *cnt = min(*cnt, nr_ioqs);
+ if (*cnt == 0) {
+ dev_err(hdev->dev, "Illegal queue count: zero\n");
+ return -EIO;
+ }
+ return 0;
+}
+
+static int spraid_setup_io_queues(struct spraid_dev *hdev)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ struct pci_dev *pdev = hdev->pdev;
+ u32 nr_ioqs = num_online_cpus();
+ u32 i, size;
+ int ret;
+
+ struct irq_affinity affd = {
+ .pre_vectors = 1
+ };
+
+ ret = spraid_set_queue_cnt(hdev, &nr_ioqs);
+ if (ret < 0)
+ return ret;
+
+ size = spraid_bar_size(hdev, nr_ioqs);
+ ret = spraid_remap_bar(hdev, size);
+ if (ret)
+ return -ENOMEM;
+
+ adminq->q_db = hdev->dbs;
+
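+ /* release the admin vector so the full MSI-X range can be re-allocated with affinity */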
+ pci_free_irq(pdev, 0, adminq);
+ pci_free_irq_vectors(pdev);
+
+ ret = pci_alloc_irq_vectors_affinity(pdev, 1, (nr_ioqs + 1),
+ PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
+ if (ret <= 0)
+ return -EIO;
+
+ hdev->num_vecs = ret;
+
+ hdev->max_qid = max(ret - 1, 1);
+
+ ret = pci_request_irq(pdev, adminq->cq_vector, spraid_irq, NULL,
+ adminq, "spraid%d_q%d",
+ hdev->instance, adminq->qid);
+ if (ret) {
+ dev_err(hdev->dev, "Request admin irq failed\n");
+ adminq->cq_vector = -1;
+ return ret;
+ }
+
+ for (i = hdev->queue_count; i <= hdev->max_qid; i++) {
+ ret = spraid_alloc_queue(hdev, i, hdev->ioq_depth);
+ if (ret)
+ break;
+ }
+ dev_info(hdev->dev, "[%s] max_qid: %d, queue_count: %d;"
+ " online_queue: %d, ioq_depth: %d\n",
+ __func__, hdev->max_qid, hdev->queue_count,
+ hdev->online_queues, hdev->ioq_depth);
+
+ return spraid_create_io_queues(hdev);
+}
+
+static void spraid_delete_io_queues(struct spraid_dev *hdev)
+{
+ u16 queues = hdev->online_queues - 1;
+ u8 opcode = SPRAID_ADMIN_DELETE_SQ;
+ u16 i, pass;
+
+ if (!pci_device_is_present(hdev->pdev)) {
+ dev_err(hdev->dev,
+ "pci_device is not present, skip disable io queues\n");
+ return;
+ }
+
+ if (hdev->online_queues < 2) {
+ dev_err(hdev->dev, "[%s] err, io queue has been delete\n",
+ __func__);
+ return;
+ }
+
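+ /* first pass deletes every SQ, second pass every CQ */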
+ for (pass = 0; pass < 2; pass++) {
+ for (i = queues; i > 0; i--)
+ if (spraid_delete_queue(hdev, opcode, i))
+ break;
+
+ opcode = SPRAID_ADMIN_DELETE_CQ;
+ }
+}
+
+static void spraid_remove_io_queues(struct spraid_dev *hdev)
+{
+ spraid_delete_io_queues(hdev);
+ spraid_free_io_queues(hdev);
+}
+
+static void spraid_pci_disable(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ u32 i;
+
+ for (i = 0; i < hdev->online_queues; i++)
+ pci_free_irq(pdev, hdev->queues[i].cq_vector, &hdev->queues[i]);
+ pci_free_irq_vectors(pdev);
+ if (pci_is_enabled(pdev)) {
+ pci_disable_pcie_error_reporting(pdev);
+ pci_disable_device(pdev);
+ }
+ hdev->online_queues = 0;
+}
+
+static void spraid_disable_admin_queue(struct spraid_dev *hdev, bool shutdown)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u16 start, end;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ if (shutdown)
+ spraid_shutdown_ctrl(hdev);
+ else
+ spraid_disable_ctrl(hdev);
+ }
+
+ if (hdev->queue_count == 0) {
+ dev_err(hdev->dev, "[%s] err, admin queue has been delete\n",
+ __func__);
+ return;
+ }
+
+ spin_lock_irq(&adminq->cq_lock);
+ spraid_process_cq(adminq, &start, &end, -1);
+ spin_unlock_irq(&adminq->cq_lock);
+
+ spraid_complete_cqes(adminq, start, end);
+ spraid_free_admin_queue(hdev);
+}
+
+static int spraid_create_dma_pools(struct spraid_dev *hdev)
+{
+ int i;
+ char poolname[20] = { 0 };
+
+ hdev->prp_page_pool = dma_pool_create("prp list page", hdev->dev,
+ PAGE_SIZE, PAGE_SIZE, 0);
+
+ if (!hdev->prp_page_pool) {
+ dev_err(hdev->dev, "create prp_page_pool failed\n");
+ return -ENOMEM;
+ }
+
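+ /* several 256-byte pools, presumably to spread allocator contention for short PRP lists */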
+ for (i = 0; i < small_pool_num; i++) {
+ sprintf(poolname, "prp_list_256_%d", i);
+ hdev->prp_small_pool[i] =
+ dma_pool_create(poolname, hdev->dev, SMALL_POOL_SIZE,
+ SMALL_POOL_SIZE, 0);
+
+ if (!hdev->prp_small_pool[i]) {
+ dev_err(hdev->dev, "create prp_small_pool %d failed\n",
+ i);
+ goto destroy_prp_small_pool;
+ }
+ }
+
+ return 0;
+
+destroy_prp_small_pool:
+ while (i > 0)
+ dma_pool_destroy(hdev->prp_small_pool[--i]);
+ dma_pool_destroy(hdev->prp_page_pool);
+
+ return -ENOMEM;
+}
+
+static void spraid_destroy_dma_pools(struct spraid_dev *hdev)
+{
+ int i;
+
+ for (i = 0; i < small_pool_num; i++)
+ dma_pool_destroy(hdev->prp_small_pool[i]);
+ dma_pool_destroy(hdev->prp_page_pool);
+}
+
+static int spraid_get_dev_list(struct spraid_dev *hdev,
+ struct spraid_dev_info *devices)
+{
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ struct spraid_admin_command admin_cmd;
+ struct spraid_dev_list *list_buf;
+ dma_addr_t data_dma = 0;
+ u32 i, idx, hdid, ndev;
+ int ret = 0;
+
+ list_buf = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!list_buf)
+ return -ENOMEM;
+
+ for (idx = 0; idx < nd;) {
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
+ admin_cmd.get_info.type = SPRAID_GET_INFO_DEV_LIST;
+ admin_cmd.get_info.cdw11 = cpu_to_le32(idx);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd,
+ NULL, NULL, 0);
+
+ if (ret) {
+ dev_err(hdev->dev, "Get device list failed, nd: %u;"
+ "idx: %u, ret: %d\n",
+ nd, idx, ret);
+ goto out;
+ }
+ ndev = le32_to_cpu(list_buf->dev_num);
+
+ dev_info(hdev->dev, "ndev numbers: %u\n", ndev);
+
+ for (i = 0; i < ndev; i++) {
+ hdid = le32_to_cpu(list_buf->devices[i].hdid);
+ dev_info(hdev->dev, "list_buf->devices[%d], hdid: %u;"
+ "target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ i, hdid,
+ le16_to_cpu(list_buf->devices[i].target),
+ list_buf->devices[i].channel,
+ list_buf->devices[i].lun,
+ list_buf->devices[i].attr);
+ if (hdid > nd || hdid == 0) {
+ dev_err(hdev->dev, "err, hdid[%d] invalid\n",
+ hdid);
+ continue;
+ }
+ memcpy(&devices[hdid - 1], &list_buf->devices[i],
+ sizeof(struct spraid_dev_info));
+ }
+ idx += ndev;
+
+ /* a partially filled page means the controller returned the last batch */
+ if (ndev < MAX_DEV_ENTRY_PER_PAGE_4K)
+ break;
+ }
+
+out:
+ dma_free_coherent(hdev->dev, PAGE_SIZE, list_buf, data_dma);
+ return ret;
+}
+
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.common.opcode = SPRAID_ADMIN_ASYNC_EVENT;
+ admin_cmd.common.command_id = cid;
+
+ spraid_submit_cmd(adminq, &admin_cmd);
+ dev_info(hdev->dev, "send aen, cid[%d]\n", cid);
+}
+
+static inline void spraid_send_all_aen(struct spraid_dev *hdev)
+{
+ u16 i;
+
+ for (i = 0; i < hdev->ctrl_info->aerl; i++)
+ spraid_send_aen(hdev, i + SPRAID_AQ_BLK_MQ_DEPTH);
+}
+
+static int spraid_add_device(struct spraid_dev *hdev,
+ struct spraid_dev_info *device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ dev_info(hdev->dev, "add device, hdid: %u target: %d, channel: %d;"
+ " lun: %d, attr[0x%x]\n",
+ le32_to_cpu(device->hdid), le16_to_cpu(device->target),
+ device->channel, device->lun, device->attr);
+
+ sdev = scsi_device_lookup(shost, device->channel,
+ le16_to_cpu(device->target), 0);
+ if (sdev) {
+ dev_warn(hdev->dev, "Device is already exist, channel: %d;"
+ " target_id: %d, lun: %d\n",
+ device->channel, le16_to_cpu(device->target), 0);
+ scsi_device_put(sdev);
+ return -EEXIST;
+ }
+ scsi_add_device(shost, device->channel, le16_to_cpu(device->target), 0);
+ return 0;
+}
+
+static int spraid_rescan_device(struct spraid_dev *hdev,
+ struct spraid_dev_info *device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ dev_info(hdev->dev, "rescan device, hdid: %u target: %d, channel: %d;"
+ " lun: %d, attr[0x%x]\n",
+ le32_to_cpu(device->hdid), le16_to_cpu(device->target),
+ device->channel, device->lun, device->attr);
+
+ sdev = scsi_device_lookup(shost, device->channel,
+ le16_to_cpu(device->target), 0);
+ if (!sdev) {
+ dev_warn(hdev->dev, "device is not exit rescan it, channel: %d;"
+ " target_id: %d, lun: %d\n",
+ device->channel, le16_to_cpu(device->target), 0);
+ return -ENODEV;
+ }
+
+ scsi_rescan_device(&sdev->sdev_gendev);
+ scsi_device_put(sdev);
+ return 0;
+}
+
+static int spraid_remove_device(struct spraid_dev *hdev,
+ struct spraid_dev_info *org_device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ dev_info(hdev->dev, "remove device, hdid: %u target: %d, channel: %d;"
+ " lun: %d, attr[0x%x]\n",
+ le32_to_cpu(org_device->hdid), le16_to_cpu(org_device->target),
+ org_device->channel, org_device->lun, org_device->attr);
+
+ sdev = scsi_device_lookup(shost, org_device->channel,
+ le16_to_cpu(org_device->target), 0);
+ if (!sdev) {
+ dev_warn(hdev->dev, "device is not exit remove it, channel: %d;"
+ " target_id: %d, lun: %d\n",
+ org_device->channel,
+ le16_to_cpu(org_device->target), 0);
+ return -ENODEV;
+ }
+
+ scsi_remove_device(sdev);
+ scsi_device_put(sdev);
+ return 0;
+}
+
+static int spraid_dev_list_init(struct spraid_dev *hdev)
+{
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ int i, ret;
+
+ hdev->devices = kzalloc_node(nd * sizeof(struct spraid_dev_info),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->devices)
+ return -ENOMEM;
+
+ ret = spraid_get_dev_list(hdev, hdev->devices);
+ if (ret) {
+ dev_err(hdev->dev,
+ "Ignore failure of getting device list;"
+ " within initialization\n");
+ return 0;
+ }
+
+ for (i = 0; i < nd; i++) {
+ if (SPRAID_DEV_INFO_FLAG_VALID(hdev->devices[i].flag) &&
+ SPRAID_DEV_INFO_ATTR_BOOT(hdev->devices[i].attr)) {
+ spraid_add_device(hdev, &hdev->devices[i]);
+ break;
+ }
+ }
+ return 0;
+}
+
+static int luntarget_cmp_func(const void *l, const void *r)
+{
+ const struct spraid_dev_info *ln = l;
+ const struct spraid_dev_info *rn = r;
+
+ if (ln->channel == rn->channel)
+ return le16_to_cpu(ln->target) - le16_to_cpu(rn->target);
+
+ return ln->channel - rn->channel;
+}
+
+static void spraid_scan_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, scan_work);
+ struct spraid_dev_info *devices, *org_devices;
+ struct spraid_dev_info *sortdevice;
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ u8 flag, org_flag;
+ int i, ret;
+ int count = 0;
+
+ devices = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
+ if (!devices)
+ return;
+
+ sortdevice = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
+ if (!sortdevice)
+ goto free_list;
+
+ ret = spraid_get_dev_list(hdev, devices);
+ if (ret)
+ goto free_all;
+ org_devices = hdev->devices;
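+ /* diff the fresh snapshot against the cached list: newly valid entries are
+ * added, changed ones rescanned, entries that lost the valid bit removed
+ */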
+ for (i = 0; i < nd; i++) {
+ org_flag = org_devices[i].flag;
+ flag = devices[i].flag;
+
+ dev_log_dbg(hdev->dev, "i: %d, org_flag: 0x%x, flag: 0x%x\n",
+ i, org_flag, flag);
+
+ if (SPRAID_DEV_INFO_FLAG_VALID(flag)) {
+ if (!SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
+ down_write(&hdev->devices_rwsem);
+ memcpy(&org_devices[i], &devices[i],
+ sizeof(struct spraid_dev_info));
+ memcpy(&sortdevice[count++], &devices[i],
+ sizeof(struct spraid_dev_info));
+ up_write(&hdev->devices_rwsem);
+ } else if (SPRAID_DEV_INFO_FLAG_CHANGE(flag)) {
+ spraid_rescan_device(hdev, &devices[i]);
+ }
+ } else {
+ if (SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
+ down_write(&hdev->devices_rwsem);
+ org_devices[i].flag &= 0xfe;
+ up_write(&hdev->devices_rwsem);
+ spraid_remove_device(hdev, &org_devices[i]);
+ }
+ }
+ }
+
+ dev_info(hdev->dev, "scan work add device count = %d\n", count);
+
+ sort(sortdevice, count, sizeof(sortdevice[0]),
+ luntarget_cmp_func, NULL);
+
+ for (i = 0; i < count; i++)
+ spraid_add_device(hdev, &sortdevice[i]);
+
+free_all:
+ kfree(sortdevice);
+free_list:
+ kfree(devices);
+}
+
+static void spraid_timesyn_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, timesyn_work);
+
+ spraid_configure_timestamp(hdev);
+}
+
+static int spraid_init_ctrl_info(struct spraid_dev *hdev);
+static void spraid_fw_act_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, fw_act_work);
+
+ if (spraid_init_ctrl_info(hdev))
+ dev_err(hdev->dev, "get ctrl info failed after fw act\n");
+}
+
+static void spraid_queue_scan(struct spraid_dev *hdev)
+{
+ queue_work(spraid_wq, &hdev->scan_work);
+}
+
+static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result)
+{
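+ /* byte 1 of the AEN result selects the notice type */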
+ switch ((result & 0xff00) >> 8) {
+ case SPRAID_AEN_DEV_CHANGED:
+ spraid_queue_scan(hdev);
+ break;
+ case SPRAID_AEN_FW_ACT_START:
+ dev_info(hdev->dev, "fw activation starting\n");
+ break;
+ case SPRAID_AEN_HOST_PROBING:
+ break;
+ default:
+ dev_warn(hdev->dev, "async event result %08x\n", result);
+ }
+}
+
+static void spraid_handle_aen_vs(struct spraid_dev *hdev,
+ u32 result, u32 result1)
+{
+ switch ((result & 0xff00) >> 8) {
+ case SPRAID_AEN_TIMESYN:
+ queue_work(spraid_wq, &hdev->timesyn_work);
+ break;
+ case SPRAID_AEN_FW_ACT_FINISH:
+ dev_info(hdev->dev, "fw activation finish\n");
+ queue_work(spraid_wq, &hdev->fw_act_work);
+ break;
+ case SPRAID_AEN_EVENT_MIN ... SPRAID_AEN_EVENT_MAX:
+ dev_info(hdev->dev, "rcv card event[%d];"
+ " param1[0x%x] param2[0x%x]\n",
+ (result & 0xff00) >> 8, result, result1);
+ break;
+ default:
+ dev_warn(hdev->dev, "async event result: 0x%x\n", result);
+ }
+}
+
+static int spraid_alloc_resources(struct spraid_dev *hdev)
+{
+ int ret, nqueue;
+
+ ret = ida_alloc(&spraid_instance_ida, GFP_KERNEL);
+ if (ret < 0) {
+ dev_err(hdev->dev, "Get instance id failed\n");
+ return ret;
+ }
+ hdev->instance = ret;
+
+ hdev->ctrl_info = kzalloc_node(sizeof(*hdev->ctrl_info),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->ctrl_info) {
+ ret = -ENOMEM;
+ goto release_instance;
+ }
+
+ ret = spraid_create_dma_pools(hdev);
+ if (ret)
+ goto free_ctrl_info;
+ nqueue = num_possible_cpus() + 1;
+ hdev->queues = kcalloc_node(nqueue, sizeof(struct spraid_queue),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->queues) {
+ ret = -ENOMEM;
+ goto destroy_dma_pools;
+ }
+
+ ret = spraid_alloc_admin_cmds(hdev);
+ if (ret)
+ goto free_queues;
+
+ dev_info(hdev->dev, "[%s] queues num: %d\n", __func__, nqueue);
+
+ return 0;
+
+free_queues:
+ kfree(hdev->queues);
+destroy_dma_pools:
+ spraid_destroy_dma_pools(hdev);
+free_ctrl_info:
+ kfree(hdev->ctrl_info);
+release_instance:
+ ida_free(&spraid_instance_ida, hdev->instance);
+ return ret;
+}
+
+static void spraid_free_resources(struct spraid_dev *hdev)
+{
+ spraid_free_admin_cmds(hdev);
+ kfree(hdev->queues);
+ spraid_destroy_dma_pools(hdev);
+ kfree(hdev->ctrl_info);
+ ida_free(&spraid_instance_ida, hdev->instance);
+}
+
+static void spraid_bsg_unmap_data(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir =
+ rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+
+ if (iod->nsge)
+ dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
+
+ spraid_free_iod_res(hdev, iod);
+}
+
+static int spraid_bsg_map_data(struct spraid_dev *hdev, struct bsg_job *job,
+ struct spraid_admin_command *cmd)
+{
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir =
+ rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ int ret = 0;
+
+ iod->sg = job->request_payload.sg_list;
+ iod->nsge = job->request_payload.sg_cnt;
+ iod->length = job->request_payload.payload_len;
+ iod->use_sgl = false;
+ iod->npages = -1;
+ iod->sg_drv_mgmt = false;
+
+ if (!iod->nsge)
+ goto out;
+
+ ret = dma_map_sg_attrs(hdev->dev, iod->sg, iod->nsge,
+ dma_dir, DMA_ATTR_NO_WARN);
+ if (!ret) {
+ /* dma_map_sg_attrs() returns 0 on failure; don't report it as success */
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = spraid_setup_prps(hdev, iod);
+ if (ret)
+ goto unmap;
+
+ cmd->common.dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
+ cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
+
+ return 0;
+
+unmap:
+ dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
+out:
+ return ret;
+}
+
+static int spraid_get_ctrl_info(struct spraid_dev *hdev,
+ struct spraid_ctrl_info *ctrl_info)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
+ admin_cmd.get_info.type = SPRAID_GET_INFO_CTRL;
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(ctrl_info, data_ptr, sizeof(struct spraid_ctrl_info));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_init_ctrl_info(struct spraid_dev *hdev)
+{
+ int ret;
+
+ hdev->ctrl_info->nd = cpu_to_le32(240);
+ hdev->ctrl_info->mdts = 8;
+ hdev->ctrl_info->max_cmds = cpu_to_le16(4096);
+ hdev->ctrl_info->max_num_sge = cpu_to_le16(128);
+ hdev->ctrl_info->max_channel = cpu_to_le16(4);
+ hdev->ctrl_info->max_tgt_id = cpu_to_le32(3239);
+ hdev->ctrl_info->max_lun = cpu_to_le16(2);
+
+ ret = spraid_get_ctrl_info(hdev, hdev->ctrl_info);
+ if (ret)
+ dev_err(hdev->dev, "get controller info failed: %d\n", ret);
+
+ dev_info(hdev->dev, "[%s]nd = %d\n", __func__, hdev->ctrl_info->nd);
+ dev_info(hdev->dev, "[%s]max_cmd = %d\n",
+ __func__, hdev->ctrl_info->max_cmds);
+ dev_info(hdev->dev, "[%s]max_channel = %d\n",
+ __func__, hdev->ctrl_info->max_channel);
+ dev_info(hdev->dev, "[%s]max_tgt_id = %d\n",
+ __func__, hdev->ctrl_info->max_tgt_id);
+ dev_info(hdev->dev, "[%s]max_lun = %d\n",
+ __func__, hdev->ctrl_info->max_lun);
+ dev_info(hdev->dev, "[%s]max_num_sge = %d\n",
+ __func__, hdev->ctrl_info->max_num_sge);
+ dev_info(hdev->dev, "[%s]lun_num_boot = %d\n",
+ __func__, hdev->ctrl_info->lun_num_in_boot);
+ dev_info(hdev->dev, "[%s]mdts = %d\n", __func__, hdev->ctrl_info->mdts);
+ dev_info(hdev->dev, "[%s]acl = %d\n", __func__, hdev->ctrl_info->acl);
+ dev_info(hdev->dev, "[%s]aer1 = %d\n", __func__, hdev->ctrl_info->aerl);
+ dev_info(hdev->dev, "[%s]card_type = %d\n",
+ __func__, hdev->ctrl_info->card_type);
+ dev_info(hdev->dev, "[%s]rtd3e = %d\n",
+ __func__, hdev->ctrl_info->rtd3e);
+ dev_info(hdev->dev, "[%s]sn = %s\n", __func__, hdev->ctrl_info->sn);
+ dev_info(hdev->dev, "[%s]fr = %s\n", __func__, hdev->ctrl_info->fr);
+
+ if (!hdev->ctrl_info->aerl)
+ hdev->ctrl_info->aerl = 1;
+ if (hdev->ctrl_info->aerl > SPRAID_NR_AEN_COMMANDS)
+ hdev->ctrl_info->aerl = SPRAID_NR_AEN_COMMANDS;
+
+ return 0;
+}
+
+#define SPRAID_MAX_ADMIN_PAYLOAD_SIZE BIT(16)
+static int spraid_alloc_iod_ext_mem_pool(struct spraid_dev *hdev)
+{
+ u16 max_sge = le16_to_cpu(hdev->ctrl_info->max_num_sge);
+ size_t alloc_size;
+
+ alloc_size = spraid_iod_ext_size(hdev, SPRAID_MAX_ADMIN_PAYLOAD_SIZE,
+ max_sge, true, false);
+ if (alloc_size > PAGE_SIZE)
+ dev_warn(hdev->dev, "It is unreasonable ;"
+ " sg allocation more than one page\n");
+ hdev->iod_mempool = mempool_create_node(1, mempool_kmalloc,
+ mempool_kfree,
+ (void *)alloc_size, GFP_KERNEL,
+ hdev->numa_node);
+ if (!hdev->iod_mempool) {
+ dev_err(hdev->dev, "Create iod extension memory pool failed\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void spraid_free_iod_ext_mem_pool(struct spraid_dev *hdev)
+{
+ mempool_destroy(hdev->iod_mempool);
+}
+
+static int spraid_user_admin_cmd(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct spraid_bsg_request *bsg_req = job->request;
+ struct spraid_passthru_common_cmd *cmd = &(bsg_req->admcmd);
+ struct spraid_admin_command admin_cmd;
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
+ u32 result[2] = {0};
+ int status;
+
+ if (hdev->state >= SPRAID_RESETTING) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not right\n",
+ __func__, hdev->state);
+ return -EBUSY;
+ }
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode);
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.common.opcode = cmd->opcode;
+ admin_cmd.common.flags = cmd->flags;
+ admin_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ admin_cmd.common.cdw2[0] = cpu_to_le32(cmd->cdw2);
+ admin_cmd.common.cdw2[1] = cpu_to_le32(cmd->cdw3);
+ admin_cmd.common.cdw10 = cpu_to_le32(cmd->cdw10);
+ admin_cmd.common.cdw11 = cpu_to_le32(cmd->cdw11);
+ admin_cmd.common.cdw12 = cpu_to_le32(cmd->cdw12);
+ admin_cmd.common.cdw13 = cpu_to_le32(cmd->cdw13);
+ admin_cmd.common.cdw14 = cpu_to_le32(cmd->cdw14);
+ admin_cmd.common.cdw15 = cpu_to_le32(cmd->cdw15);
+
+ status = spraid_bsg_map_data(hdev, job, &admin_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
+ }
+
+ status = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, &result[0],
+ &result[1], timeout);
+ if (status >= 0) {
+ job->reply_len = sizeof(result);
+ memcpy(job->reply, result, sizeof(result));
+ }
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x];"
+ " status[0x%x] result0[0x%x] result1[0x%x]\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode,
+ status, result[0], result[1]);
+
+ spraid_bsg_unmap_data(hdev, job);
+
+ return status;
+}
+
+static int spraid_alloc_ioq_ptcmds(struct spraid_dev *hdev)
+{
+ int i;
+ int ptnum = SPRAID_NR_IOQ_PTCMDS;
+
+ INIT_LIST_HEAD(&hdev->ioq_pt_list);
+ spin_lock_init(&hdev->ioq_pt_lock);
+
+ hdev->ioq_ptcmds = kcalloc_node(ptnum, sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
+
+ if (!hdev->ioq_ptcmds) {
+ dev_err(hdev->dev, "Alloc ioq_ptcmds failed\n");
+ return -ENOMEM;
+ }
+
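+ /* passthrough cids start above SPRAID_IO_BLK_MQ_DEPTH so they never collide
+ * with blk-mq tags used by regular I/O on the same queue
+ */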
+ for (i = 0; i < ptnum; i++) {
+ hdev->ioq_ptcmds[i].qid = i / SPRAID_PTCMDS_PERQ + 1;
+ hdev->ioq_ptcmds[i].cid = i % SPRAID_PTCMDS_PERQ
+ + SPRAID_IO_BLK_MQ_DEPTH;
+ list_add_tail(&(hdev->ioq_ptcmds[i].list), &hdev->ioq_pt_list);
+ }
+
+ dev_info(hdev->dev, "Alloc ioq_ptcmds success, ptnum[%d]\n", ptnum);
+
+ return 0;
+}
+
+static void spraid_free_ioq_ptcmds(struct spraid_dev *hdev)
+{
+ kfree(hdev->ioq_ptcmds);
+ hdev->ioq_ptcmds = NULL;
+
+ INIT_LIST_HEAD(&hdev->ioq_pt_list);
+}
+
+static int spraid_submit_ioq_sync_cmd(struct spraid_dev *hdev,
+ struct spraid_ioq_command *cmd,
+ u32 *result, u32 *reslen, u32 timeout)
+{
+ int ret;
+ dma_addr_t sense_dma;
+ struct spraid_queue *ioq;
+ void *sense_addr = NULL;
+ struct spraid_cmd *pt_cmd = spraid_get_cmd(hdev, SPRAID_CMD_IOPT);
+
+ if (!pt_cmd) {
+ dev_err(hdev->dev, "err, get ioq cmd failed\n");
+ return -EFAULT;
+ }
+
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
+
+ init_completion(&pt_cmd->cmd_done);
+
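+ /* each cid owns a fixed SCSI_SENSE_BUFFERSIZE slice of the queue's sense buffer */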
+ ioq = &hdev->queues[pt_cmd->qid];
+ ret = pt_cmd->cid * SCSI_SENSE_BUFFERSIZE;
+ sense_addr = ioq->sense + ret;
+ sense_dma = ioq->sense_dma_addr + ret;
+
+ cmd->common.sense_addr = cpu_to_le64(sense_dma);
+ cmd->common.sense_len = cpu_to_le16(SCSI_SENSE_BUFFERSIZE);
+ cmd->common.command_id = pt_cmd->cid;
+
+ spraid_submit_cmd(ioq, cmd);
+
+ if (!wait_for_completion_timeout(&pt_cmd->cmd_done, timeout)) {
+ dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout;"
+ " opcode[0x%x] subopcode[0x%x]\n",
+ __func__, pt_cmd->cid, pt_cmd->qid, cmd->common.opcode,
+ (le32_to_cpu(cmd->common.cdw3[0]) & 0xffff));
+ WRITE_ONCE(pt_cmd->state, SPRAID_CMD_TIMEOUT);
+ spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
+ return -EINVAL;
+ }
+
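+ /* the masked status 0x101 appears to mark valid sense data; inferred from usage */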
+ if (result && reslen) {
+ if ((pt_cmd->status & 0x17f) == 0x101) {
+ memcpy(result, sense_addr, SCSI_SENSE_BUFFERSIZE);
+ *reslen = SCSI_SENSE_BUFFERSIZE;
+ }
+ }
+
+ /* snapshot the status before the command slot is recycled */
+ ret = pt_cmd->status;
+ spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
+
+ return ret;
+}
+
+static int spraid_user_ioq_cmd(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct spraid_bsg_request *bsg_req =
+ (struct spraid_bsg_request *)(job->request);
+ struct spraid_ioq_passthru_cmd *cmd = &(bsg_req->ioqcmd);
+ struct spraid_ioq_command ioq_cmd;
+ int status = 0;
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
+
+ if (cmd->data_len > PAGE_SIZE) {
+ dev_err(hdev->dev, "[%s] data len bigger than 4k\n", __func__);
+ return -EFAULT;
+ }
+
+ if (hdev->state != SPRAID_LIVE) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not live\n",
+ __func__, hdev->state);
+ return -EBUSY;
+ }
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init;"
+ " datalen[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode, cmd->data_len);
+
+ memset(&ioq_cmd, 0, sizeof(ioq_cmd));
+ ioq_cmd.common.opcode = cmd->opcode;
+ ioq_cmd.common.flags = cmd->flags;
+ ioq_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ ioq_cmd.common.sense_len = cpu_to_le16(cmd->info_0.res_sense_len);
+ ioq_cmd.common.cdb_len = cmd->info_0.cdb_len;
+ ioq_cmd.common.rsvd2 = cmd->info_0.rsvd0;
+ ioq_cmd.common.cdw3[0] = cpu_to_le32(cmd->cdw3);
+ ioq_cmd.common.cdw3[1] = cpu_to_le32(cmd->cdw4);
+ ioq_cmd.common.cdw3[2] = cpu_to_le32(cmd->cdw5);
+
+ ioq_cmd.common.cdw10[0] = cpu_to_le32(cmd->cdw10);
+ ioq_cmd.common.cdw10[1] = cpu_to_le32(cmd->cdw11);
+ ioq_cmd.common.cdw10[2] = cpu_to_le32(cmd->cdw12);
+ ioq_cmd.common.cdw10[3] = cpu_to_le32(cmd->cdw13);
+ ioq_cmd.common.cdw10[4] = cpu_to_le32(cmd->cdw14);
+ ioq_cmd.common.cdw10[5] = cpu_to_le32(cmd->data_len);
+
+ memcpy(ioq_cmd.common.cdb, &cmd->cdw16, cmd->info_0.cdb_len);
+
+ ioq_cmd.common.cdw26[0] = cpu_to_le32(cmd->cdw26[0]);
+ ioq_cmd.common.cdw26[1] = cpu_to_le32(cmd->cdw26[1]);
+ ioq_cmd.common.cdw26[2] = cpu_to_le32(cmd->cdw26[2]);
+ ioq_cmd.common.cdw26[3] = cpu_to_le32(cmd->cdw26[3]);
+
+ status = spraid_bsg_map_data(hdev, job,
+ (struct spraid_admin_command *)&ioq_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
+ }
+
+ status = spraid_submit_ioq_sync_cmd(hdev, &ioq_cmd, job->reply,
+ &job->reply_len, timeout);
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x], status[0x%x];"
+ " reply_len[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode,
+ status, job->reply_len);
+
+ spraid_bsg_unmap_data(hdev, job);
+
+ return status;
+}
+
+static bool spraid_check_scmd_completed(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_queue *spraidq;
+ u16 hwq, cid;
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ spraidq = &hdev->queues[hwq];
+ if (READ_ONCE(iod->state) == SPRAID_CMD_COMPLETE
+ || spraid_poll_cq(spraidq, cid)) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] has been completed\n",
+ cid, spraidq->qid);
+ return true;
+ }
+ return false;
+}
+
+static enum blk_eh_timer_return spraid_scmd_timeout(struct scsi_cmnd *scmd)
+{
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ unsigned int timeout = scmd->device->request_queue->rq_timeout;
+
+ if (spraid_check_scmd_completed(scmd))
+ goto out;
+
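+ /* hand the command to the error handler exactly once via the state cmpxchg */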
+ if (time_after(jiffies, scmd->jiffies_at_alloc + timeout)) {
+ if (cmpxchg(&iod->state,
+ SPRAID_CMD_IN_FLIGHT,
+ SPRAID_CMD_TIMEOUT) == SPRAID_CMD_IN_FLIGHT) {
+ return BLK_EH_DONE;
+ }
+ }
+out:
+ return BLK_EH_RESET_TIMER;
+}
+
+/* abort commands are sent via the admin queue for now */
+static int spraid_send_abort_cmd(struct spraid_dev *hdev,
+ u32 hdid, u16 qid, u16 cid)
+{
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.abort.opcode = SPRAID_ADMIN_ABORT_CMD;
+ admin_cmd.abort.hdid = cpu_to_le32(hdid);
+ admin_cmd.abort.sqid = cpu_to_le16(qid);
+ admin_cmd.abort.cid = cpu_to_le16(cid);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+/* reset commands are sent via the admin queue for now */
+static int spraid_send_reset_cmd(struct spraid_dev *hdev, int type, u32 hdid)
+{
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.reset.opcode = SPRAID_ADMIN_RESET;
+ admin_cmd.reset.hdid = cpu_to_le32(hdid);
+ admin_cmd.reset.type = type;
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+static bool spraid_change_host_state(struct spraid_dev *hdev,
+ enum spraid_state newstate)
+{
+ unsigned long flags;
+ enum spraid_state oldstate;
+ bool change = false;
+
+ spin_lock_irqsave(&hdev->state_lock, flags);
+
+ oldstate = hdev->state;
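+ /* only a fixed set of transitions is legal, e.g. RESETTING is reachable
+ * only from LIVE, while DEAD can be entered from any operational state
+ */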
+ switch (newstate) {
+ case SPRAID_LIVE:
+ switch (oldstate) {
+ case SPRAID_NEW:
+ case SPRAID_RESETTING:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ case SPRAID_RESETTING:
+ switch (oldstate) {
+ case SPRAID_LIVE:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ case SPRAID_DELETING:
+ if (oldstate != SPRAID_DELETING)
+ change = true;
+ break;
+ case SPRAID_DEAD:
+ switch (oldstate) {
+ case SPRAID_NEW:
+ case SPRAID_LIVE:
+ case SPRAID_RESETTING:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ default:
+ break;
+ }
+ if (change)
+ hdev->state = newstate;
+ spin_unlock_irqrestore(&hdev->state_lock, flags);
+
+ dev_info(hdev->dev, "[%s][%d]->[%d], change[%d]\n",
+ __func__, oldstate, newstate, change);
+
+ return change;
+}
+
+static void spraid_back_fault_cqe(struct spraid_queue *ioq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct blk_mq_tags *tags;
+ struct scsi_cmnd *scmd;
+ struct spraid_iod *iod;
+ struct request *req;
+
+ tags = hdev->shost->tag_set.tags[ioq->qid - 1];
+ req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
+ if (unlikely(!req || !blk_mq_request_started(req)))
+ return;
+
+ scmd = blk_mq_rq_to_pdu(req);
+ iod = scsi_cmd_priv(scmd);
+
+ set_host_byte(scmd, DID_NO_CONNECT);
+ if (iod->nsge)
+ scsi_dma_unmap(scmd);
+ spraid_free_iod_res(hdev, iod);
+ scmd->scsi_done(scmd);
+ dev_warn(hdev->dev, "Back fault CQE, cid[%d] qid[%d]\n",
+ cqe->cmd_id, ioq->qid);
+}
+
+static void spraid_back_all_io(struct spraid_dev *hdev)
+{
+ int i, j;
+ struct spraid_queue *ioq;
+ struct spraid_completion cqe = { 0 };
+
+ scsi_block_requests(hdev->shost);
+
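+ /* fail every possible outstanding tag back to the midlayer as DID_NO_CONNECT */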
+ for (i = 1; i <= hdev->shost->nr_hw_queues; i++) {
+ ioq = &hdev->queues[i];
+ for (j = 0; j < hdev->shost->can_queue; j++) {
+ cqe.cmd_id = j;
+ spraid_back_fault_cqe(ioq, &cqe);
+ }
+ }
+
+ scsi_unblock_requests(hdev->shost);
+}
+
+static void spraid_dev_disable(struct spraid_dev *hdev, bool shutdown)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u16 start, end;
+ unsigned long timeout = jiffies + 600 * HZ;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ if (shutdown)
+ spraid_shutdown_ctrl(hdev);
+ else
+ spraid_disable_ctrl(hdev);
+ }
+
+ while (!time_after(jiffies, timeout)) {
+ if (!pci_device_is_present(hdev->pdev)) {
+ dev_info(hdev->dev, "[%s] pci_device not present;"
+ " skip wait\n", __func__);
+ break;
+ }
+ if (!spraid_wait_ready(hdev, hdev->cap, false)) {
+ dev_info(hdev->dev,
+ "[%s] wait ready success after reset\n",
+ __func__);
+ break;
+ }
+ dev_info(hdev->dev, "[%s] waiting csts_rdy ready\n", __func__);
+ }
+
+ if (hdev->queue_count == 0) {
+ dev_err(hdev->dev, "[%s] warn, queue has been delete\n",
+ __func__);
+ return;
+ }
+
+ spin_lock_irq(&adminq->cq_lock);
+ spraid_process_cq(adminq, &start, &end, -1);
+ spin_unlock_irq(&adminq->cq_lock);
+ spraid_complete_cqes(adminq, start, end);
+
+ spraid_pci_disable(hdev);
+
+ spraid_back_all_io(hdev);
+}
+
+static void spraid_reset_work(struct work_struct *work)
+{
+ int ret;
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, reset_work);
+
+ if (hdev->state != SPRAID_RESETTING) {
+ dev_err(hdev->dev, "[%s] err, host is not reset state\n",
+ __func__);
+ return;
+ }
+
+ dev_info(hdev->dev, "[%s] enter host reset\n", __func__);
+
+ if (hdev->ctrl_config & SPRAID_CC_ENABLE) {
+ dev_info(hdev->dev, "[%s] start dev_disable\n", __func__);
+ spraid_dev_disable(hdev, false);
+ }
+
+ ret = spraid_pci_enable(hdev);
+ if (ret)
+ goto out;
+
+ ret = spraid_setup_admin_queue(hdev);
+ if (ret)
+ goto pci_disable;
+
+ ret = spraid_setup_io_queues(hdev);
+ if (ret || hdev->online_queues <= hdev->shost->nr_hw_queues)
+ goto pci_disable;
+
+ spraid_change_host_state(hdev, SPRAID_LIVE);
+
+ spraid_send_all_aen(hdev);
+
+ return;
+
+pci_disable:
+ spraid_pci_disable(hdev);
+out:
+ spraid_change_host_state(hdev, SPRAID_DEAD);
+ dev_err(hdev->dev, "[%s] err, host reset failed\n", __func__);
+}
+
+static int spraid_reset_work_sync(struct spraid_dev *hdev)
+{
+ if (!spraid_change_host_state(hdev, SPRAID_RESETTING)) {
+ dev_info(hdev->dev, "[%s] can't change to reset state\n",
+ __func__);
+ return -EBUSY;
+ }
+
+ if (!queue_work(spraid_wq, &hdev->reset_work)) {
+ dev_err(hdev->dev, "[%s] err, host is already in reset state\n",
+ __func__);
+ return -EBUSY;
+ }
+
+ flush_work(&hdev->reset_work);
+ if (hdev->state != SPRAID_LIVE)
+ return -ENODEV;
+
+ return 0;
+}
+
+static int spraid_wait_abnl_cmd_done(struct spraid_iod *iod)
+{
+ u16 times = 0;
+
+ do {
+ if (READ_ONCE(iod->state) == SPRAID_CMD_TMO_COMPLETE)
+ return 0;
+ msleep(500);
+ times++;
+ } while (times <= SPRAID_WAIT_ABNL_CMD_TIMEOUT);
+
+ /* the command never completed after the abort/reset was issued */
+ return -ETIMEDOUT;
+}
+
+static int spraid_abort_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, aborting\n", cid, hwq);
+ ret = spraid_send_abort_cmd(hdev, hostdata->hdid, hwq, cid);
+ if (ret != ADMIN_ERR_TIMEOUT) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed;"
+ " not found\n", cid, hwq);
+ return FAILED;
+ }
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort succ\n", cid, hwq);
+ return SUCCESS;
+ }
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed, timeout\n",
+ cid, hwq);
+ return FAILED;
+}
+
+static int spraid_tgt_reset_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, target reset\n",
+ cid, hwq);
+ ret = spraid_send_reset_cmd(hdev, SPRAID_RESET_TARGET, hostdata->hdid);
+ if (ret == 0) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev,
+ "cid[%d] qid[%d]target reset failed;"
+ " not found\n", cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] target reset success\n",
+ cid, hwq);
+ return SUCCESS;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] ret[%d] target reset failed\n",
+ cid, hwq, ret);
+ return FAILED;
+}
+
+static int spraid_bus_reset_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, bus reset\n", cid, hwq);
+ ret = spraid_send_reset_cmd(hdev, SPRAID_RESET_BUS, hostdata->hdid);
+ if (ret == 0) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev,
+ "cid[%d] qid[%d] bus reset failed;"
+ " not found\n", cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] bus reset succ\n",
+ cid, hwq);
+ return SUCCESS;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] ret[%d] bus reset failed\n",
+ cid, hwq, ret);
+ return FAILED;
+}
+
+static int spraid_shost_reset_handler(struct scsi_cmnd *scmd)
+{
+ u16 hwq, cid;
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+
+ scsi_print_command(scmd);
+ if (hdev->state != SPRAID_LIVE || spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset\n", cid, hwq);
+
+ if (spraid_reset_work_sync(hdev)) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset failed\n",
+ cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset success\n", cid, hwq);
+
+ return SUCCESS;
+}
+
+static pci_ers_result_t spraid_pci_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "enter pci error detect, state:%d\n", state);
+
+ switch (state) {
+ case pci_channel_io_normal:
+ dev_warn(hdev->dev, "channel is normal, do nothing\n");
+
+ return PCI_ERS_RESULT_CAN_RECOVER;
+ case pci_channel_io_frozen:
+ dev_warn(hdev->dev,
+ "channel io frozen, need reset controller\n");
+
+ scsi_block_requests(hdev->shost);
+
+ spraid_change_host_state(hdev, SPRAID_RESETTING);
+
+ return PCI_ERS_RESULT_NEED_RESET;
+ case pci_channel_io_perm_failure:
+ dev_warn(hdev->dev, "channel io failure, request disconnect\n");
+
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+
+ return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t spraid_pci_slot_reset(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "restart after slot reset\n");
+
+ pci_restore_state(pdev);
+
+ if (!queue_work(spraid_wq, &hdev->reset_work)) {
+ dev_err(hdev->dev, "[%s] err, the device is resetting state\n",
+ __func__);
+ return PCI_ERS_RESULT_NONE;
+ }
+
+ flush_work(&hdev->reset_work);
+
+ scsi_unblock_requests(hdev->shost);
+
+ return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void spraid_reset_done(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "enter spraid reset done\n");
+}
+
+static ssize_t csts_pp_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS)
+ & SPRAID_CSTS_PP_MASK);
+ ret >>= SPRAID_CSTS_PP_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_shst_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS)
+ & SPRAID_CSTS_SHST_MASK);
+ ret >>= SPRAID_CSTS_SHST_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_cfs_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS)
+ & SPRAID_CSTS_CFS_MASK);
+ ret >>= SPRAID_CSTS_CFS_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_rdy_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev))
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_RDY);
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t fw_version_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+
+ return snprintf(buf, PAGE_SIZE, "%s\n", hdev->ctrl_info->fr);
+}
+
+static DEVICE_ATTR_RO(csts_pp);
+static DEVICE_ATTR_RO(csts_shst);
+static DEVICE_ATTR_RO(csts_cfs);
+static DEVICE_ATTR_RO(csts_rdy);
+static DEVICE_ATTR_RO(fw_version);
+
+static struct device_attribute *spraid_host_attrs[] = {
+ &dev_attr_csts_pp,
+ &dev_attr_csts_shst,
+ &dev_attr_csts_cfs,
+ &dev_attr_csts_rdy,
+ &dev_attr_fw_version,
+ NULL,
+};
+
+static int spraid_get_vd_info(struct spraid_dev *hdev,
+ struct spraid_vd_info *vd_info, u16 vid)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.usr_cmd.opcode = USR_CMD_READ;
+ admin_cmd.usr_cmd.info_0.subopcode = cpu_to_le16(USR_CMD_VDINFO);
+ admin_cmd.usr_cmd.info_1.data_len = cpu_to_le16(USR_CMD_RDLEN);
+ admin_cmd.usr_cmd.info_1.param_len = cpu_to_le16(VDINFO_PARAM_LEN);
+ admin_cmd.usr_cmd.cdw10 = cpu_to_le32(vid);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(vd_info, data_ptr, sizeof(struct spraid_vd_info));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_get_bgtask(struct spraid_dev *hdev,
+ struct spraid_bgtask *bgtask)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.usr_cmd.opcode = USR_CMD_READ;
+ admin_cmd.usr_cmd.info_0.subopcode = cpu_to_le16(USR_CMD_BGTASK);
+ admin_cmd.usr_cmd.info_1.data_len = cpu_to_le16(USR_CMD_RDLEN);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(bgtask, data_ptr, sizeof(struct spraid_bgtask));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static ssize_t raid_level_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_sdev_hostdata *hostdata;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret)
+ vd_info->rg_level = ARRAY_SIZE(raid_levels) - 1;
+
+ ret = (vd_info->rg_level < ARRAY_SIZE(raid_levels)) ?
+ vd_info->rg_level : (ARRAY_SIZE(raid_levels) - 1);
+
+ kfree(vd_info);
+
+ return snprintf(buf, PAGE_SIZE, "RAID-%s\n", raid_levels[ret]);
+}
+
+static ssize_t raid_state_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_sdev_hostdata *hostdata;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret) {
+ vd_info->vd_status = 0;
+ vd_info->rg_id = 0xff;
+ }
+
+ ret = (vd_info->vd_status < ARRAY_SIZE(raid_states)) ?
+ vd_info->vd_status : 0;
+
+ kfree(vd_info);
+
+ return snprintf(buf, PAGE_SIZE, "%s\n", raid_states[ret]);
+}
+
+static ssize_t raid_resync_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_bgtask *bgtask;
+ struct spraid_sdev_hostdata *hostdata;
+ u8 rg_id, i, progress = 0;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret)
+ goto out;
+
+ rg_id = vd_info->rg_id;
+
+ bgtask = (struct spraid_bgtask *)vd_info;
+ ret = spraid_get_bgtask(hdev, bgtask);
+ if (ret)
+ goto out;
+ for (i = 0; i < bgtask->task_num; i++) {
+ if ((bgtask->bgtask[i].type == BGTASK_TYPE_REBUILD) &&
+ (le16_to_cpu(bgtask->bgtask[i].vd_id) == rg_id))
+ progress = bgtask->bgtask[i].progress;
+ }
+
+out:
+ kfree(vd_info);
+ return snprintf(buf, PAGE_SIZE, "%d\n", progress);
+}
+
+static DEVICE_ATTR_RO(raid_level);
+static DEVICE_ATTR_RO(raid_state);
+static DEVICE_ATTR_RO(raid_resync);
+
+static struct device_attribute *spraid_dev_attrs[] = {
+ &dev_attr_raid_level,
+ &dev_attr_raid_state,
+ &dev_attr_raid_resync,
+ NULL,
+};
+
+static struct pci_error_handlers spraid_err_handler = {
+ .error_detected = spraid_pci_error_detected,
+ .slot_reset = spraid_pci_slot_reset,
+ .reset_done = spraid_reset_done,
+};
+
+static int spraid_sysfs_host_reset(struct Scsi_Host *shost, int reset_type)
+{
+ int ret;
+ struct spraid_dev *hdev = shost_priv(shost);
+
+ dev_info(hdev->dev, "[%s] start sysfs host reset cmd\n", __func__);
+ ret = spraid_reset_work_sync(hdev);
+ dev_info(hdev->dev, "[%s] stop sysfs host reset cmd[%d]\n",
+ __func__, ret);
+
+ return ret;
+}
+
+static struct scsi_host_template spraid_driver_template = {
+ .module = THIS_MODULE,
+ .name = "Ramaxel Logic spraid driver",
+ .proc_name = "spraid",
+ .queuecommand = spraid_queue_command,
+ .slave_alloc = spraid_slave_alloc,
+ .slave_destroy = spraid_slave_destroy,
+ .slave_configure = spraid_slave_configure,
+ .eh_timed_out = spraid_scmd_timeout,
+ .eh_abort_handler = spraid_abort_handler,
+ .eh_target_reset_handler = spraid_tgt_reset_handler,
+ .eh_bus_reset_handler = spraid_bus_reset_handler,
+ .eh_host_reset_handler = spraid_shost_reset_handler,
+ .change_queue_depth = scsi_change_queue_depth,
+ .this_id = -1,
+ .shost_attrs = spraid_host_attrs,
+ .sdev_attrs = spraid_dev_attrs,
+ .host_reset = spraid_sysfs_host_reset,
+};
+
+static void spraid_shutdown(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ spraid_remove_io_queues(hdev);
+ spraid_disable_admin_queue(hdev, true);
+}
+
+/* bsg dispatch user command */
+static int spraid_bsg_host_dispatch(struct bsg_job *job)
+{
+ struct Scsi_Host *shost = dev_to_shost(job->dev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_bsg_request *bsg_req = job->request;
+ int ret = 0;
+
+ dev_info(hdev->dev, "[%s] msgcode[%d], msglen[%d], timeout[%d];"
+ " req_nsge[%d], req_len[%d]\n",
+ __func__, bsg_req->msgcode, job->request_len,
+ rq->timeout, job->request_payload.sg_cnt,
+ job->request_payload.payload_len);
+
+ job->reply_len = 0;
+
+ switch (bsg_req->msgcode) {
+ case SPRAID_BSG_ADM:
+ ret = spraid_user_admin_cmd(hdev, job);
+ break;
+ case SPRAID_BSG_IOQ:
+ ret = spraid_user_ioq_cmd(hdev, job);
+ break;
+ default:
+ dev_info(hdev->dev, "[%s] unsupport msgcode[%d]\n",
+ __func__, bsg_req->msgcode);
+ break;
+ }
+
+ if (ret > 0)
+ ret = ret | (ret << 8);
+
+ bsg_job_done(job, ret, 0);
+ return 0;
+}
+
+static inline void spraid_remove_bsg(struct spraid_dev *hdev)
+{
+ if (hdev->bsg_queue) {
+ bsg_unregister_queue(hdev->bsg_queue);
+ blk_cleanup_queue(hdev->bsg_queue);
+ }
+}
+static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct spraid_dev *hdev;
+ struct Scsi_Host *shost;
+ int node, ret;
+ char bsg_name[15];
+
+ shost = scsi_host_alloc(&spraid_driver_template, sizeof(*hdev));
+ if (!shost) {
+ dev_err(&pdev->dev, "Failed to allocate scsi host\n");
+ return -ENOMEM;
+ }
+ hdev = shost_priv(shost);
+ hdev->pdev = pdev;
+ hdev->dev = get_device(&pdev->dev);
+
+ node = dev_to_node(hdev->dev);
+ if (node == NUMA_NO_NODE) {
+ node = first_memory_node;
+ set_dev_node(hdev->dev, node);
+ }
+ hdev->numa_node = node;
+ hdev->shost = shost;
+ pci_set_drvdata(pdev, hdev);
+
+ ret = spraid_dev_map(hdev);
+ if (ret)
+ goto put_dev;
+
+ init_rwsem(&hdev->devices_rwsem);
+ INIT_WORK(&hdev->scan_work, spraid_scan_work);
+ INIT_WORK(&hdev->timesyn_work, spraid_timesyn_work);
+ INIT_WORK(&hdev->reset_work, spraid_reset_work);
+ INIT_WORK(&hdev->fw_act_work, spraid_fw_act_work);
+ spin_lock_init(&hdev->state_lock);
+
+ ret = spraid_alloc_resources(hdev);
+ if (ret)
+ goto dev_unmap;
+
+ ret = spraid_pci_enable(hdev);
+ if (ret)
+ goto resources_free;
+
+ ret = spraid_setup_admin_queue(hdev);
+ if (ret)
+ goto pci_disable;
+
+ ret = spraid_init_ctrl_info(hdev);
+ if (ret)
+ goto disable_admin_q;
+
+ ret = spraid_alloc_iod_ext_mem_pool(hdev);
+ if (ret)
+ goto disable_admin_q;
+
+ ret = spraid_setup_io_queues(hdev);
+ if (ret)
+ goto free_iod_mempool;
+
+ spraid_shost_init(hdev);
+
+ ret = scsi_add_host(hdev->shost, hdev->dev);
+ if (ret) {
+ dev_err(hdev->dev, "Add shost to system failed, ret: %d\n",
+ ret);
+ goto remove_io_queues;
+ }
+
+ snprintf(bsg_name, sizeof(bsg_name), "spraid%d", shost->host_no);
+ hdev->bsg_queue = bsg_setup_queue(&shost->shost_gendev, bsg_name,
+ spraid_bsg_host_dispatch,
+ spraid_cmd_size(hdev, true, false));
+ if (IS_ERR(hdev->bsg_queue)) {
+ dev_err(hdev->dev, "err, setup bsg failed\n");
+ hdev->bsg_queue = NULL;
+ goto remove_io_queues;
+ }
+
+ if (hdev->online_queues == SPRAID_ADMIN_QUEUE_NUM) {
+ dev_warn(hdev->dev, "warn only admin queue can be used\n");
+ return 0;
+ }
+
+ hdev->state = SPRAID_LIVE;
+
+ spraid_send_all_aen(hdev);
+
+ ret = spraid_dev_list_init(hdev);
+ if (ret)
+ goto remove_bsg;
+
+ ret = spraid_configure_timestamp(hdev);
+ if (ret)
+ dev_warn(hdev->dev, "init set timestamp failed\n");
+
+ ret = spraid_alloc_ioq_ptcmds(hdev);
+ if (ret)
+ goto remove_bsg;
+
+ scsi_scan_host(hdev->shost);
+
+ return 0;
+
+remove_bsg:
+ spraid_remove_bsg(hdev);
+remove_io_queues:
+ spraid_remove_io_queues(hdev);
+free_iod_mempool:
+ spraid_free_iod_ext_mem_pool(hdev);
+disable_admin_q:
+ spraid_disable_admin_queue(hdev, false);
+pci_disable:
+ spraid_pci_disable(hdev);
+resources_free:
+ spraid_free_resources(hdev);
+dev_unmap:
+ spraid_dev_unmap(hdev);
+put_dev:
+ put_device(hdev->dev);
+ scsi_host_put(shost);
+
+ return -ENODEV;
+}
+
+static void spraid_remove(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+ struct Scsi_Host *shost = hdev->shost;
+
+ dev_info(hdev->dev, "enter spraid remove\n");
+
+ spraid_change_host_state(hdev, SPRAID_DELETING);
+ flush_work(&hdev->reset_work);
+
+ if (!pci_device_is_present(pdev))
+ spraid_back_all_io(hdev);
+
+ spraid_remove_bsg(hdev);
+ scsi_remove_host(shost);
+ spraid_free_ioq_ptcmds(hdev);
+ kfree(hdev->devices);
+ spraid_remove_io_queues(hdev);
+ spraid_free_iod_ext_mem_pool(hdev);
+ spraid_disable_admin_queue(hdev, false);
+ spraid_pci_disable(hdev);
+ spraid_free_resources(hdev);
+ spraid_dev_unmap(hdev);
+ put_device(hdev->dev);
+ scsi_host_put(shost);
+
+ dev_info(hdev->dev, "exit spraid remove\n");
+}
+
+static const struct pci_device_id spraid_id_table[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC,
+ SPRAID_SERVER_DEVICE_HBA_DID) },
+ { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC,
+ SPRAID_SERVER_DEVICE_RAID_DID) },
+ { 0, }
+};
+MODULE_DEVICE_TABLE(pci, spraid_id_table);
+
+static struct pci_driver spraid_driver = {
+ .name = "spraid",
+ .id_table = spraid_id_table,
+ .probe = spraid_probe,
+ .remove = spraid_remove,
+ .shutdown = spraid_shutdown,
+ .err_handler = &spraid_err_handler,
+};
+
+static int __init spraid_init(void)
+{
+ int ret;
+
+ spraid_wq = alloc_workqueue("spraid-wq", WQ_UNBOUND | WQ_MEM_RECLAIM |
+ WQ_SYSFS, 0);
+ if (!spraid_wq)
+ return -ENOMEM;
+
+ spraid_class = class_create(THIS_MODULE, "spraid");
+ if (IS_ERR(spraid_class)) {
+ ret = PTR_ERR(spraid_class);
+ goto destroy_wq;
+ }
+
+ ret = pci_register_driver(&spraid_driver);
+ if (ret < 0)
+ goto destroy_class;
+
+ return 0;
+
+destroy_class:
+ class_destroy(spraid_class);
+destroy_wq:
+ destroy_workqueue(spraid_wq);
+
+ return ret;
+}
+
+static void __exit spraid_exit(void)
+{
+ pci_unregister_driver(&spraid_driver);
+ class_destroy(spraid_class);
+ destroy_workqueue(spraid_wq);
+ ida_destroy(&spraid_instance_ida);
+}
+
+MODULE_AUTHOR("songyl(a)ramaxel.com");
+MODULE_DESCRIPTION("Ramaxel Memory Technology SPraid Driver");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(SPRAID_DRV_VERSION);
+module_init(spraid_init);
+module_exit(spraid_exit);
--
2.27.0

[PATCH openEuler-1.0-LTS 1/2] USB: gadget: detect too-big endpoint 0 requests
by Yang Yingliang 22 Dec '21
From: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
mainline inclusion
from mainline-v5.16-rc5
commit 153a2d7e3350cc89d406ba2d35be8793a64c2038
category: bugfix
bugzilla: NA
CVE: CVE-2021-39685
--------------------------------
Sometimes USB hosts can ask for buffers that are too large from endpoint
0, which should not be allowed. If this happens for OUT requests, stall
the endpoint, but for IN requests, trim the request size to the endpoint
buffer size.
Co-developed-by: Szymon Heidrich <szymon.heidrich(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/usb/gadget/composite.c | 12 ++++++++++++
drivers/usb/gadget/legacy/dbgp.c | 13 +++++++++++++
drivers/usb/gadget/legacy/inode.c | 16 +++++++++++++++-
3 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/composite.c b/drivers/usb/gadget/composite.c
index af0702cfa92ff..3813249387b3d 100644
--- a/drivers/usb/gadget/composite.c
+++ b/drivers/usb/gadget/composite.c
@@ -1569,6 +1569,18 @@ composite_setup(struct usb_gadget *gadget, const struct usb_ctrlrequest *ctrl)
struct usb_function *f = NULL;
u8 endp;
+ if (w_length > USB_COMP_EP0_BUFSIZ) {
+ if (ctrl->bRequestType == USB_DIR_OUT) {
+ goto done;
+ } else {
+ /* Cast away the const, we are going to overwrite on purpose. */
+ __le16 *temp = (__le16 *)&ctrl->wLength;
+
+ *temp = cpu_to_le16(USB_COMP_EP0_BUFSIZ);
+ w_length = USB_COMP_EP0_BUFSIZ;
+ }
+ }
+
/* partial re-init of the response message; the function or the
* gadget might need to intercept e.g. a control-OUT completion
* when we delegate to it.
diff --git a/drivers/usb/gadget/legacy/dbgp.c b/drivers/usb/gadget/legacy/dbgp.c
index e1d566c9918ae..e567afcb2794c 100644
--- a/drivers/usb/gadget/legacy/dbgp.c
+++ b/drivers/usb/gadget/legacy/dbgp.c
@@ -345,6 +345,19 @@ static int dbgp_setup(struct usb_gadget *gadget,
void *data = NULL;
u16 len = 0;
+ if (length > DBGP_REQ_LEN) {
+ if (ctrl->bRequestType == USB_DIR_OUT) {
+ return err;
+ } else {
+ /* Cast away the const, we are going to overwrite on purpose. */
+ __le16 *temp = (__le16 *)&ctrl->wLength;
+
+ *temp = cpu_to_le16(DBGP_REQ_LEN);
+ length = DBGP_REQ_LEN;
+ }
+ }
+
+
if (request == USB_REQ_GET_DESCRIPTOR) {
switch (value>>8) {
case USB_DT_DEVICE:
diff --git a/drivers/usb/gadget/legacy/inode.c b/drivers/usb/gadget/legacy/inode.c
index 37ca0e669bd85..83fd1b55d497a 100644
--- a/drivers/usb/gadget/legacy/inode.c
+++ b/drivers/usb/gadget/legacy/inode.c
@@ -109,6 +109,8 @@ enum ep0_state {
/* enough for the whole queue: most events invalidate others */
#define N_EVENT 5
+#define RBUF_SIZE 256
+
struct dev_data {
spinlock_t lock;
refcount_t count;
@@ -143,7 +145,7 @@ struct dev_data {
struct dentry *dentry;
/* except this scratch i/o buffer for ep0 */
- u8 rbuf [256];
+ u8 rbuf[RBUF_SIZE];
};
static inline void get_dev (struct dev_data *data)
@@ -1332,6 +1334,18 @@ gadgetfs_setup (struct usb_gadget *gadget, const struct usb_ctrlrequest *ctrl)
u16 w_value = le16_to_cpu(ctrl->wValue);
u16 w_length = le16_to_cpu(ctrl->wLength);
+ if (w_length > RBUF_SIZE) {
+ if (ctrl->bRequestType == USB_DIR_OUT) {
+ return value;
+ } else {
+ /* Cast away the const, we are going to overwrite on purpose. */
+ __le16 *temp = (__le16 *)&ctrl->wLength;
+
+ *temp = cpu_to_le16(RBUF_SIZE);
+ w_length = RBUF_SIZE;
+ }
+ }
+
spin_lock (&dev->lock);
dev->setup_abort = 0;
if (dev->state == STATE_DEV_UNCONNECTED) {
--
2.25.1

21 Dec '21
Ramaxel inclusion
category: features
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NJIB
CVE: NA
Support Ramaxel's SPRxxx Raid controller
Signed-off-by: Yanling Song <songyl(a)ramaxel.com>
Reviewed-by: Jiang Yu<yujiang(a)ramaxel.com>
---
drivers/scsi/Kconfig | 1 +
drivers/scsi/Makefile | 1 +
drivers/scsi/spraid/Kconfig | 13 +
drivers/scsi/spraid/Makefile | 7 +
drivers/scsi/spraid/spraid.h | 746 ++++++
drivers/scsi/spraid/spraid_main.c | 3875 +++++++++++++++++++++++++++++
6 files changed, 4643 insertions(+)
create mode 100644 drivers/scsi/spraid/Kconfig
create mode 100644 drivers/scsi/spraid/Makefile
create mode 100644 drivers/scsi/spraid/spraid.h
create mode 100644 drivers/scsi/spraid/spraid_main.c
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 8450484184e3..63d2aaa22834 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -522,6 +522,7 @@ source "drivers/scsi/megaraid/Kconfig.megaraid"
source "drivers/scsi/mpt3sas/Kconfig"
source "drivers/scsi/smartpqi/Kconfig"
source "drivers/scsi/ufs/Kconfig"
+source "drivers/scsi/spraid/Kconfig"
config SCSI_HPTIOP
tristate "HighPoint RocketRAID 3xxx/4xxx Controller support"
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 2973693f6dcc..4056cf26e09e 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -94,6 +94,7 @@ obj-$(CONFIG_SCSI_ZALON) += zalon7xx.o
obj-$(CONFIG_SCSI_DC395x) += dc395x.o
obj-$(CONFIG_SCSI_AM53C974) += esp_scsi.o am53c974.o
obj-$(CONFIG_CXLFLASH) += cxlflash/
+obj-$(CONFIG_RAMAXEL_SPRAID) += spraid/
obj-$(CONFIG_MEGARAID_LEGACY) += megaraid.o
obj-$(CONFIG_MEGARAID_NEWGEN) += megaraid/
obj-$(CONFIG_MEGARAID_SAS) += megaraid/
diff --git a/drivers/scsi/spraid/Kconfig b/drivers/scsi/spraid/Kconfig
new file mode 100644
index 000000000000..bfbba3db8db0
--- /dev/null
+++ b/drivers/scsi/spraid/Kconfig
@@ -0,0 +1,13 @@
+#
+# Ramaxel driver configuration
+#
+
+config RAMAXEL_SPRAID
+ tristate "Ramaxel spraid Adapter"
+ depends on PCI && SCSI
+ select BLK_DEV_BSGLIB
+ depends on ARM64 || X86_64
+ help
+ This driver supports the Ramaxel SPRxxx series
+ RAID controllers, which have a PCIe Gen4 host
+ interface and support SAS/SATA HDDs and SSDs.
diff --git a/drivers/scsi/spraid/Makefile b/drivers/scsi/spraid/Makefile
new file mode 100644
index 000000000000..aadc2ffd37eb
--- /dev/null
+++ b/drivers/scsi/spraid/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the Ramaxel device drivers.
+#
+
+obj-$(CONFIG_RAMAXEL_SPRAID) += spraid.o
+
+spraid-objs := spraid_main.o
\ No newline at end of file
diff --git a/drivers/scsi/spraid/spraid.h b/drivers/scsi/spraid/spraid.h
new file mode 100644
index 000000000000..c1e4980e18e5
--- /dev/null
+++ b/drivers/scsi/spraid/spraid.h
@@ -0,0 +1,746 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
+
+#ifndef __SPRAID_H_
+#define __SPRAID_H_
+
+#define SPRAID_CAP_MQES(cap) ((cap) & 0xffff)
+#define SPRAID_CAP_STRIDE(cap) (((cap) >> 32) & 0xf)
+#define SPRAID_CAP_MPSMIN(cap) (((cap) >> 48) & 0xf)
+#define SPRAID_CAP_MPSMAX(cap) (((cap) >> 52) & 0xf)
+#define SPRAID_CAP_TIMEOUT(cap) (((cap) >> 24) & 0xff)
+#define SPRAID_CAP_DMAMASK(cap) (((cap) >> 37) & 0xff)
+
+#define SPRAID_DEFAULT_MAX_CHANNEL 4
+#define SPRAID_DEFAULT_MAX_ID 240
+#define SPRAID_DEFAULT_MAX_LUN_PER_HOST 8
+#define MAX_SECTORS 2048
+
+#define IO_SQE_SIZE sizeof(struct spraid_ioq_command)
+#define ADMIN_SQE_SIZE sizeof(struct spraid_admin_command)
+#define SQE_SIZE(qid) (((qid) > 0) ? IO_SQE_SIZE : ADMIN_SQE_SIZE)
+#define CQ_SIZE(depth) ((depth) * sizeof(struct spraid_completion))
+#define SQ_SIZE(qid, depth) ((depth) * SQE_SIZE(qid))
+
+#define SENSE_SIZE(depth) ((depth) * SCSI_SENSE_BUFFERSIZE)
+
+#define SPRAID_AQ_DEPTH 128
+#define SPRAID_NR_AEN_COMMANDS 16
+#define SPRAID_AQ_BLK_MQ_DEPTH (SPRAID_AQ_DEPTH - SPRAID_NR_AEN_COMMANDS)
+#define SPRAID_AQ_MQ_TAG_DEPTH (SPRAID_AQ_BLK_MQ_DEPTH - 1)
+
+#define SPRAID_ADMIN_QUEUE_NUM 1
+#define SPRAID_PTCMDS_PERQ 1
+#define SPRAID_IO_BLK_MQ_DEPTH (hdev->shost->can_queue)
+#define SPRAID_NR_IOQ_PTCMDS (SPRAID_PTCMDS_PERQ * hdev->shost->nr_hw_queues)
+
+#define FUA_MASK 0x08
+#define SPRAID_MINORS BIT(MINORBITS)
+
+#define COMMAND_IS_WRITE(cmd) ((cmd)->common.opcode & 1)
+
+#define SPRAID_IO_IOSQES 7
+#define SPRAID_IO_IOCQES 4
+#define PRP_ENTRY_SIZE 8
+
+#define SMALL_POOL_SIZE 256
+#define MAX_SMALL_POOL_NUM 16
+#define MAX_CMD_PER_DEV 64
+#define MAX_CDB_LEN 32
+
+#define SPRAID_UP_TO_MULTY4(x) (((x) + 4) & (~0x03))
+
+#define CQE_STATUS_SUCCESS (0x0)
+
+#define PCI_VENDOR_ID_RAMAXEL_LOGIC 0x1E81
+
+#define SPRAID_SERVER_DEVICE_HBA_DID 0x2100
+#define SPRAID_SERVER_DEVICE_RAID_DID 0x2200
+
+#define IO_6_DEFAULT_TX_LEN 256
+
+#define SPRAID_INT_PAGES 2
+#define SPRAID_INT_BYTES(hdev) (SPRAID_INT_PAGES * (hdev)->page_size)
+
+enum {
+ SPRAID_REQ_CANCELLED = (1 << 0),
+ SPRAID_REQ_USERCMD = (1 << 1),
+};
+
+enum {
+ SPRAID_SC_SUCCESS = 0x0,
+ SPRAID_SC_INVALID_OPCODE = 0x1,
+ SPRAID_SC_INVALID_FIELD = 0x2,
+
+ SPRAID_SC_ABORT_LIMIT = 0x103,
+ SPRAID_SC_ABORT_MISSING = 0x104,
+ SPRAID_SC_ASYNC_LIMIT = 0x105,
+
+ SPRAID_SC_DNR = 0x4000,
+};
+
+enum {
+ SPRAID_REG_CAP = 0x0000,
+ SPRAID_REG_CC = 0x0014,
+ SPRAID_REG_CSTS = 0x001c,
+ SPRAID_REG_AQA = 0x0024,
+ SPRAID_REG_ASQ = 0x0028,
+ SPRAID_REG_ACQ = 0x0030,
+ SPRAID_REG_DBS = 0x1000,
+};
+
+enum {
+ SPRAID_CC_ENABLE = 1 << 0,
+ SPRAID_CC_CSS_NVM = 0 << 4,
+ SPRAID_CC_MPS_SHIFT = 7,
+ SPRAID_CC_AMS_SHIFT = 11,
+ SPRAID_CC_SHN_SHIFT = 14,
+ SPRAID_CC_IOSQES_SHIFT = 16,
+ SPRAID_CC_IOCQES_SHIFT = 20,
+ SPRAID_CC_AMS_RR = 0 << SPRAID_CC_AMS_SHIFT,
+ SPRAID_CC_SHN_NONE = 0 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CC_IOSQES = SPRAID_IO_IOSQES << SPRAID_CC_IOSQES_SHIFT,
+ SPRAID_CC_IOCQES = SPRAID_IO_IOCQES << SPRAID_CC_IOCQES_SHIFT,
+ SPRAID_CC_SHN_NORMAL = 1 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CC_SHN_MASK = 3 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CSTS_CFS_SHIFT = 1,
+ SPRAID_CSTS_SHST_SHIFT = 2,
+ SPRAID_CSTS_PP_SHIFT = 5,
+ SPRAID_CSTS_RDY = 1 << 0,
+ SPRAID_CSTS_SHST_CMPLT = 2 << 2,
+ SPRAID_CSTS_SHST_MASK = 3 << 2,
+ SPRAID_CSTS_CFS_MASK = 1 << SPRAID_CSTS_CFS_SHIFT,
+ SPRAID_CSTS_PP_MASK = 1 << SPRAID_CSTS_PP_SHIFT,
+};
+
+enum {
+ SPRAID_ADMIN_DELETE_SQ = 0x00,
+ SPRAID_ADMIN_CREATE_SQ = 0x01,
+ SPRAID_ADMIN_DELETE_CQ = 0x04,
+ SPRAID_ADMIN_CREATE_CQ = 0x05,
+ SPRAID_ADMIN_ABORT_CMD = 0x08,
+ SPRAID_ADMIN_SET_FEATURES = 0x09,
+ SPRAID_ADMIN_ASYNC_EVENT = 0x0c,
+ SPRAID_ADMIN_GET_INFO = 0xc6,
+ SPRAID_ADMIN_RESET = 0xc8,
+};
+
+enum {
+ SPRAID_GET_INFO_CTRL = 0,
+ SPRAID_GET_INFO_DEV_LIST = 1,
+};
+
+enum {
+ SPRAID_RESET_TARGET = 0,
+ SPRAID_RESET_BUS = 1,
+};
+
+enum {
+ SPRAID_AEN_ERROR = 0,
+ SPRAID_AEN_NOTICE = 2,
+ SPRAID_AEN_VS = 7,
+};
+
+enum {
+ SPRAID_AEN_DEV_CHANGED = 0x00,
+ SPRAID_AEN_FW_ACT_START = 0x01,
+ SPRAID_AEN_HOST_PROBING = 0x10,
+};
+
+enum {
+ SPRAID_AEN_TIMESYN = 0x00,
+ SPRAID_AEN_FW_ACT_FINISH = 0x02,
+ SPRAID_AEN_EVENT_MIN = 0x80,
+ SPRAID_AEN_EVENT_MAX = 0xff,
+};
+
+enum {
+ SPRAID_CMD_WRITE = 0x01,
+ SPRAID_CMD_READ = 0x02,
+
+ SPRAID_CMD_NONIO_NONE = 0x80,
+ SPRAID_CMD_NONIO_TODEV = 0x81,
+ SPRAID_CMD_NONIO_FROMDEV = 0x82,
+};
+
+enum {
+ SPRAID_QUEUE_PHYS_CONTIG = (1 << 0),
+ SPRAID_CQ_IRQ_ENABLED = (1 << 1),
+
+ SPRAID_FEAT_NUM_QUEUES = 0x07,
+ SPRAID_FEAT_ASYNC_EVENT = 0x0b,
+ SPRAID_FEAT_TIMESTAMP = 0x0e,
+};
+
+enum spraid_state {
+ SPRAID_NEW,
+ SPRAID_LIVE,
+ SPRAID_RESETTING,
+ SPRAID_DELETING,
+ SPRAID_DEAD,
+};
+
+enum {
+ SPRAID_CARD_HBA,
+ SPRAID_CARD_RAID,
+};
+
+enum spraid_cmd_type {
+ SPRAID_CMD_ADM,
+ SPRAID_CMD_IOPT,
+};
+
+struct spraid_completion {
+ __le32 result;
+ union {
+ struct {
+ __u8 sense_len;
+ __u8 resv[3];
+ };
+ __le32 result1;
+ };
+ __le16 sq_head;
+ __le16 sq_id;
+ __u16 cmd_id;
+ __le16 status;
+};
+
+struct spraid_ctrl_info {
+ __le32 nd;
+ __le16 max_cmds;
+ __le16 max_channel;
+ __le32 max_tgt_id;
+ __le16 max_lun;
+ __le16 max_num_sge;
+ __le16 lun_num_in_boot;
+ __u8 mdts;
+ __u8 acl;
+ __u8 aerl;
+ __u8 card_type;
+ __u16 rsvd;
+ __u32 rtd3e;
+ __u8 sn[32];
+ __u8 fr[16];
+ __u8 rsvd1[4020];
+};
+
+struct spraid_dev {
+ struct pci_dev *pdev;
+ struct device *dev;
+ struct Scsi_Host *shost;
+ struct spraid_queue *queues;
+ struct dma_pool *prp_page_pool;
+ struct dma_pool *prp_small_pool[MAX_SMALL_POOL_NUM];
+ mempool_t *iod_mempool;
+ void __iomem *bar;
+ u32 max_qid;
+ u32 num_vecs;
+ u32 queue_count;
+ u32 ioq_depth;
+ int db_stride;
+ u32 __iomem *dbs;
+ struct rw_semaphore devices_rwsem;
+ int numa_node;
+ u32 page_size;
+ u32 ctrl_config;
+ u32 online_queues;
+ u64 cap;
+ int instance;
+ struct spraid_ctrl_info *ctrl_info;
+ struct spraid_dev_info *devices;
+
+ struct spraid_cmd *adm_cmds;
+ struct list_head adm_cmd_list;
+ spinlock_t adm_cmd_lock;
+
+ struct spraid_cmd *ioq_ptcmds;
+ struct list_head ioq_pt_list;
+ spinlock_t ioq_pt_lock;
+
+ struct work_struct scan_work;
+ struct work_struct timesyn_work;
+ struct work_struct reset_work;
+ struct work_struct fw_act_work;
+
+ enum spraid_state state;
+ spinlock_t state_lock;
+
+ struct request_queue *bsg_queue;
+};
+
+struct spraid_sgl_desc {
+ __le64 addr;
+ __le32 length;
+ __u8 rsvd[3];
+ __u8 type;
+};
+
+union spraid_data_ptr {
+ struct {
+ __le64 prp1;
+ __le64 prp2;
+ };
+ struct spraid_sgl_desc sgl;
+};
+
+struct spraid_admin_common_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le32 cdw2[4];
+ union spraid_data_ptr dptr;
+ __le32 cdw10;
+ __le32 cdw11;
+ __le32 cdw12;
+ __le32 cdw13;
+ __le32 cdw14;
+ __le32 cdw15;
+};
+
+struct spraid_features {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[2];
+ union spraid_data_ptr dptr;
+ __le32 fid;
+ __le32 dword11;
+ __le32 dword12;
+ __le32 dword13;
+ __le32 dword14;
+ __le32 dword15;
+};
+
+struct spraid_create_cq {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[5];
+ __le64 prp1;
+ __u64 rsvd8;
+ __le16 cqid;
+ __le16 qsize;
+ __le16 cq_flags;
+ __le16 irq_vector;
+ __u32 rsvd12[4];
+};
+
+struct spraid_create_sq {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[5];
+ __le64 prp1;
+ __u64 rsvd8;
+ __le16 sqid;
+ __le16 qsize;
+ __le16 sq_flags;
+ __le16 cqid;
+ __u32 rsvd12[4];
+};
+
+struct spraid_delete_queue {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[9];
+ __le16 qid;
+ __u16 rsvd10;
+ __u32 rsvd11[5];
+};
+
+struct spraid_get_info {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u32 rsvd2[4];
+ union spraid_data_ptr dptr;
+ __u8 type;
+ __u8 rsvd10[3];
+ __le32 cdw11;
+ __u32 rsvd12[4];
+};
+
+struct spraid_usr_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ union {
+ struct {
+ __le16 subopcode;
+ __le16 rsvd1;
+ } info_0;
+ __le32 cdw2;
+ };
+ union {
+ struct {
+ __le16 data_len;
+ __le16 param_len;
+ } info_1;
+ __le32 cdw3;
+ };
+ __u64 metadata;
+ union spraid_data_ptr dptr;
+ __le32 cdw10;
+ __le32 cdw11;
+ __le32 cdw12;
+ __le32 cdw13;
+ __le32 cdw14;
+ __le32 cdw15;
+};
+
+enum {
+ SPRAID_CMD_FLAG_SGL_METABUF = (1 << 6),
+ SPRAID_CMD_FLAG_SGL_METASEG = (1 << 7),
+ SPRAID_CMD_FLAG_SGL_ALL = SPRAID_CMD_FLAG_SGL_METABUF |
+ SPRAID_CMD_FLAG_SGL_METASEG,
+};
+
+enum spraid_cmd_state {
+ SPRAID_CMD_IDLE = 0,
+ SPRAID_CMD_IN_FLIGHT = 1,
+ SPRAID_CMD_COMPLETE = 2,
+ SPRAID_CMD_TIMEOUT = 3,
+ SPRAID_CMD_TMO_COMPLETE = 4,
+};
+
+struct spraid_abort_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[4];
+ __le16 sqid;
+ __le16 cid;
+ __u32 rsvd11[5];
+};
+
+struct spraid_reset_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[4];
+ __u8 type;
+ __u8 rsvd10[3];
+ __u32 rsvd11[5];
+};
+
+struct spraid_admin_command {
+ union {
+ struct spraid_admin_common_command common;
+ struct spraid_features features;
+ struct spraid_create_cq create_cq;
+ struct spraid_create_sq create_sq;
+ struct spraid_delete_queue delete_queue;
+ struct spraid_get_info get_info;
+ struct spraid_abort_cmd abort;
+ struct spraid_reset_cmd reset;
+ struct spraid_usr_cmd usr_cmd;
+ };
+};
+
+struct spraid_ioq_common_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_len;
+ __u8 rsvd2;
+ __le32 cdw3[3];
+ union spraid_data_ptr dptr;
+ __le32 cdw10[6];
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __le32 cdw26[6];
+};
+
+struct spraid_rw_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_len;
+ __u8 rsvd2;
+ __u32 rsvd3[3];
+ union spraid_data_ptr dptr;
+ __le64 slba;
+ __le16 nlb;
+ __le16 control;
+ __u32 rsvd13[3];
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __u32 rsvd26[6];
+};
+
+struct spraid_scsi_nonio {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_length;
+ __u8 rsvd2;
+ __u32 rsvd3[3];
+ union spraid_data_ptr dptr;
+ __u32 rsvd10[5];
+ __le32 buffer_len;
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __u32 rsvd26[6];
+};
+
+struct spraid_ioq_command {
+ union {
+ struct spraid_ioq_common_command common;
+ struct spraid_rw_command rw;
+ struct spraid_scsi_nonio scsi_nonio;
+ };
+};
+
+struct spraid_passthru_common_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 rsvd0;
+ __u32 nsid;
+ union {
+ struct {
+ __u16 subopcode;
+ __u16 rsvd1;
+ } info_0;
+ __u32 cdw2;
+ };
+ union {
+ struct {
+ __u16 data_len;
+ __u16 param_len;
+ } info_1;
+ __u32 cdw3;
+ };
+ __u64 metadata;
+
+ __u64 addr;
+ __u64 prp2;
+
+ __u32 cdw10;
+ __u32 cdw11;
+ __u32 cdw12;
+ __u32 cdw13;
+ __u32 cdw14;
+ __u32 cdw15;
+ __u32 timeout_ms;
+ __u32 result0;
+ __u32 result1;
+};
+
+struct spraid_ioq_passthru_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 rsvd0;
+ __u32 nsid;
+ union {
+ struct {
+ __u16 res_sense_len;
+ __u8 cdb_len;
+ __u8 rsvd0;
+ } info_0;
+ __u32 cdw2;
+ };
+ union {
+ struct {
+ __u16 subopcode;
+ __u16 rsvd1;
+ } info_1;
+ __u32 cdw3;
+ };
+ union {
+ struct {
+ __u16 rsvd;
+ __u16 param_len;
+ } info_2;
+ __u32 cdw4;
+ };
+ __u32 cdw5;
+ __u64 addr;
+ __u64 prp2;
+ union {
+ struct {
+ __u16 eid;
+ __u16 sid;
+ } info_3;
+ __u32 cdw10;
+ };
+ union {
+ struct {
+ __u16 did;
+ __u8 did_flag;
+ __u8 rsvd2;
+ } info_4;
+ __u32 cdw11;
+ };
+ __u32 cdw12;
+ __u32 cdw13;
+ __u32 cdw14;
+ __u32 data_len;
+ __u32 cdw16;
+ __u32 cdw17;
+ __u32 cdw18;
+ __u32 cdw19;
+ __u32 cdw20;
+ __u32 cdw21;
+ __u32 cdw22;
+ __u32 cdw23;
+ __u64 sense_addr;
+ __u32 cdw26[4];
+ __u32 timeout_ms;
+ __u32 result0;
+ __u32 result1;
+};
+
+struct spraid_bsg_request {
+ u32 msgcode;
+ u32 control;
+ union {
+ struct spraid_passthru_common_cmd admcmd;
+ struct spraid_ioq_passthru_cmd ioqcmd;
+ };
+};
+
+enum {
+ SPRAID_BSG_ADM,
+ SPRAID_BSG_IOQ,
+};
+
+struct spraid_cmd {
+ int qid;
+ int cid;
+ u32 result0;
+ u32 result1;
+ u16 status;
+ void *priv;
+ enum spraid_cmd_state state;
+ struct completion cmd_done;
+ struct list_head list;
+};
+
+struct spraid_queue {
+ struct spraid_dev *hdev;
+ spinlock_t sq_lock;
+
+ spinlock_t cq_lock ____cacheline_aligned_in_smp;
+
+ void *sq_cmds;
+
+ struct spraid_completion *cqes;
+
+ dma_addr_t sq_dma_addr;
+ dma_addr_t cq_dma_addr;
+ u32 __iomem *q_db;
+ u8 cq_phase;
+ u8 sqes;
+ u16 qid;
+ u16 sq_tail;
+ u16 cq_head;
+ u16 last_cq_head;
+ u16 q_depth;
+ s16 cq_vector;
+ void *sense;
+ dma_addr_t sense_dma_addr;
+ struct dma_pool *prp_small_pool;
+};
+
+struct spraid_iod {
+ struct spraid_queue *spraidq;
+ enum spraid_cmd_state state;
+ int npages;
+ u32 nsge;
+ u32 length;
+ bool use_sgl;
+ bool sg_drv_mgmt;
+ dma_addr_t first_dma;
+ void *sense;
+ dma_addr_t sense_dma;
+ struct scatterlist *sg;
+ struct scatterlist inline_sg[0];
+};
+
+#define SPRAID_DEV_INFO_ATTR_BOOT(attr) ((attr) & 0x01)
+#define SPRAID_DEV_INFO_ATTR_VD(attr) (((attr) & 0x02) == 0x0)
+#define SPRAID_DEV_INFO_ATTR_PT(attr) (((attr) & 0x22) == 0x02)
+#define SPRAID_DEV_INFO_ATTR_RAWDISK(attr) ((attr) & 0x20)
+
+#define SPRAID_DEV_INFO_FLAG_VALID(flag) ((flag) & 0x01)
+#define SPRAID_DEV_INFO_FLAG_CHANGE(flag) ((flag) & 0x02)
+
+#define BGTASK_TYPE_REBUILD 4
+#define USR_CMD_READ 0xc2
+#define USR_CMD_RDLEN 0x1000
+#define USR_CMD_VDINFO 0x704
+#define USR_CMD_BGTASK 0x504
+#define VDINFO_PARAM_LEN 0x04
+
+struct spraid_vd_info {
+ __u8 name[32];
+ __le16 id;
+ __u8 rg_id;
+ __u8 rg_level;
+ __u8 sg_num;
+ __u8 sg_disk_num;
+ __u8 vd_status;
+ __u8 vd_type;
+ __u8 rsvd1[4056];
+};
+
+#define MAX_REALTIME_BGTASK_NUM 32
+
+struct bgtask_info {
+ __u8 type;
+ __u8 progress;
+ __u8 rate;
+ __u8 rsvd0;
+ __le16 vd_id;
+ __le16 time_left;
+ __u8 rsvd1[4];
+};
+
+struct spraid_bgtask {
+ __u8 sw;
+ __u8 task_num;
+ __u8 rsvd[6];
+ struct bgtask_info bgtask[MAX_REALTIME_BGTASK_NUM];
+};
+
+struct spraid_dev_info {
+ __le32 hdid;
+ __le16 target;
+ __u8 channel;
+ __u8 lun;
+ __u8 attr;
+ __u8 flag;
+ __le16 max_io_kb;
+};
+
+#define MAX_DEV_ENTRY_PER_PAGE_4K 340
+struct spraid_dev_list {
+ __le32 dev_num;
+ __u32 rsvd0[3];
+ struct spraid_dev_info devices[MAX_DEV_ENTRY_PER_PAGE_4K];
+};
+
+struct spraid_sdev_hostdata {
+ u32 hdid;
+ u16 max_io_kb;
+ u8 attr;
+ u8 flag;
+ u8 rg_id;
+ u8 rsvd[3];
+};
+
+#endif
+
diff --git a/drivers/scsi/spraid/spraid_main.c b/drivers/scsi/spraid/spraid_main.c
new file mode 100644
index 000000000000..519b39f44e91
--- /dev/null
+++ b/drivers/scsi/spraid/spraid_main.c
@@ -0,0 +1,3875 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
+
+/* Ramaxel Raid SPXXX Series Linux Driver */
+
+#define pr_fmt(fmt) "spraid: " fmt
+
+#include <linux/sched/signal.h>
+#include <linux/version.h>
+#include <linux/pci.h>
+#include <linux/aer.h>
+#include <linux/module.h>
+#include <linux/ioport.h>
+#include <linux/device.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/cdev.h>
+#include <linux/sysfs.h>
+#include <linux/gfp.h>
+#include <linux/types.h>
+#include <linux/ratelimit.h>
+#include <linux/once.h>
+#include <linux/debugfs.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/blkdev.h>
+#include <linux/bsg-lib.h>
+#include <asm/unaligned.h>
+#include <linux/sort.h>
+#include <target/target_core_backend.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_transport.h>
+#include <scsi/scsi_dbg.h>
+
+
+#include "spraid.h"
+
+static u32 admin_tmout = 60;
+module_param(admin_tmout, uint, 0644);
+MODULE_PARM_DESC(admin_tmout, "admin commands timeout (seconds)");
+
+static u32 scmd_tmout_nonpt = 180;
+module_param(scmd_tmout_nonpt, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_nonpt,
+ "scsi commands timeout for rawdisk&raid(seconds)");
+
+static int ioq_depth_set(const char *val, const struct kernel_param *kp);
+static const struct kernel_param_ops ioq_depth_ops = {
+ .set = ioq_depth_set,
+ .get = param_get_uint,
+};
+
+static u32 io_queue_depth = 1024;
+module_param_cb(io_queue_depth, &ioq_depth_ops, &io_queue_depth, 0644);
MODULE_PARM_DESC(io_queue_depth, "set io queue depth, must be >= 2");
+
+static int log_debug_switch_set(const char *val, const struct kernel_param *kp)
+{
+ u8 n = 0;
+ int ret;
+
+ ret = kstrtou8(val, 10, &n);
+ if (ret != 0)
+ return -EINVAL;
+
+ return param_set_byte(val, kp);
+}
+
+static const struct kernel_param_ops log_debug_switch_ops = {
+ .set = log_debug_switch_set,
+ .get = param_get_byte,
+};
+
+static unsigned char log_debug_switch;
+module_param_cb(log_debug_switch, &log_debug_switch_ops,
+ &log_debug_switch, 0644);
+MODULE_PARM_DESC(log_debug_switch,
+ "set log state, default non-zero for switch on");
+
+static int small_pool_num_set(const char *val, const struct kernel_param *kp)
+{
+ u8 n = 0;
+ int ret;
+
+ ret = kstrtou8(val, 10, &n);
+ if (ret != 0)
+ return -EINVAL;
+ if (n > MAX_SMALL_POOL_NUM)
+ n = MAX_SMALL_POOL_NUM;
+ if (n < 1)
+ n = 1;
+ *((u8 *)kp->arg) = n;
+
+ return 0;
+}
+
+static const struct kernel_param_ops small_pool_num_ops = {
+ .set = small_pool_num_set,
+ .get = param_get_byte,
+};
+
+/* It was found that the spinlock of a single pool is heavily
+ * contended by multiple CPUs, so multiple pools are introduced
+ * to reduce the contention.
+ */
+static unsigned char small_pool_num = 4;
+module_param_cb(small_pool_num, &small_pool_num_ops, &small_pool_num, 0644);
+MODULE_PARM_DESC(small_pool_num, "set prp small pool num, default 4, MAX 16");
+
+static void spraid_free_queue(struct spraid_queue *spraidq);
+static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result);
+static void spraid_handle_aen_vs(struct spraid_dev *hdev,
+ u32 result, u32 result1);
+
+static DEFINE_IDA(spraid_instance_ida);
+
+static struct class *spraid_class;
+
+#define SPRAID_CAP_TIMEOUT_UNIT_MS (HZ / 2)
+
+static struct workqueue_struct *spraid_wq;
+
+#define dev_log_dbg(dev, fmt, ...) do { \
+ if (unlikely(log_debug_switch)) \
+ dev_info(dev, "[%s] [%d] " fmt, \
+ __func__, __LINE__, ##__VA_ARGS__); \
+} while (0)
+
+#define SPRAID_DRV_VERSION "1.0.0.0"
+
+#define ADMIN_TIMEOUT (admin_tmout * HZ)
+#define ADMIN_ERR_TIMEOUT 32757
+
+#define SPRAID_WAIT_ABNL_CMD_TIMEOUT (3 * 2)
+
+#define SPRAID_DMA_MSK_BIT_MAX 64
+
+enum FW_STAT_CODE {
+ FW_STAT_OK = 0,
+ FW_STAT_NEED_CHECK,
+ FW_STAT_ERROR,
+ FW_STAT_EP_PCIE_ERROR,
+ FW_STAT_NAC_DMA_ERROR,
+ FW_STAT_ABORTED,
+ FW_STAT_NEED_RETRY
+};
+
+static const char * const raid_levels[] = {"0", "1", "5", "6", "10", "50", "60",
+ "NA"};
+
+static const char * const raid_states[] = {
+ "NA", "NORMAL", "FAULT", "DEGRADE", "NOT_FORMATTED",
+ "FORMATTING", "SANITIZING", "INITIALIZING", "INITIALIZE_FAIL",
+ "DELETING", "DELETE_FAIL", "WRITE_PROTECT"
+};
+
+static int ioq_depth_set(const char *val, const struct kernel_param *kp)
+{
+ int n = 0;
+ int ret;
+
+ ret = kstrtoint(val, 10, &n);
+ if (ret != 0 || n < 2)
+ return -EINVAL;
+
+ return param_set_int(val, kp);
+}
+
+static int spraid_remap_bar(struct spraid_dev *hdev, u32 size)
+{
+ struct pci_dev *pdev = hdev->pdev;
+
+ if (size > pci_resource_len(pdev, 0)) {
+ dev_err(hdev->dev, "Input size[%u] exceed bar0 length[%llu]\n",
+ size, pci_resource_len(pdev, 0));
+ return -ENOMEM;
+ }
+
+ if (hdev->bar)
+ iounmap(hdev->bar);
+
+ hdev->bar = ioremap(pci_resource_start(pdev, 0), size);
+ if (!hdev->bar) {
+ dev_err(hdev->dev, "ioremap for bar0 failed\n");
+ return -ENOMEM;
+ }
+ hdev->dbs = hdev->bar + SPRAID_REG_DBS;
+
+ return 0;
+}
+
+static int spraid_dev_map(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ int ret;
+
+ ret = pci_request_mem_regions(pdev, "spraid");
+ if (ret) {
+ dev_err(hdev->dev, "fail to request memory regions\n");
+ return ret;
+ }
+
+ ret = spraid_remap_bar(hdev, SPRAID_REG_DBS + 4096);
+ if (ret) {
+ pci_release_mem_regions(pdev);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void spraid_dev_unmap(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+
+ if (hdev->bar) {
+ iounmap(hdev->bar);
+ hdev->bar = NULL;
+ }
+ pci_release_mem_regions(pdev);
+}
+
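+/*
+ * Enable the PCI device and read the CAP register to size the IO
+ * queue depth, doorbell stride and DMA mask; a single IRQ vector is
+ * allocated here so the admin queue can be brought up first.
+ */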
+static int spraid_pci_enable(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ int ret = -ENOMEM;
+ u64 maskbit = SPRAID_DMA_MSK_BIT_MAX;
+
+ if (pci_enable_device_mem(pdev)) {
+ dev_err(hdev->dev,
+ "Enable pci device memory resources failed\n");
+ return ret;
+ }
+ pci_set_master(pdev);
+
+ if (readl(hdev->bar + SPRAID_REG_CSTS) == U32_MAX) {
+ ret = -ENODEV;
+ dev_err(hdev->dev, "Read csts register failed\n");
+ goto disable;
+ }
+
+ ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
+ if (ret < 0) {
+ dev_err(hdev->dev,
+ "Allocate one IRQ for setup admin channel failed\n");
+ goto disable;
+ }
+
+ hdev->cap = lo_hi_readq(hdev->bar + SPRAID_REG_CAP);
+ hdev->ioq_depth = min_t(u32, SPRAID_CAP_MQES(hdev->cap) + 1,
+ io_queue_depth);
+ hdev->db_stride = 1 << SPRAID_CAP_STRIDE(hdev->cap);
+
+ maskbit = SPRAID_CAP_DMAMASK(hdev->cap);
+ if (maskbit < 32 || maskbit > SPRAID_DMA_MSK_BIT_MAX) {
+ dev_err(hdev->dev,
+ "err, dma mask invalid[%llu], set to default\n",
+ maskbit);
+ maskbit = SPRAID_DMA_MSK_BIT_MAX;
+ }
+ if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(maskbit))) {
+ dev_err(hdev->dev, "set dma mask and coherent failed\n");
+ goto disable;
+ }
+
+ dev_info(hdev->dev, "set dma mask[%llu] success\n", maskbit);
+
+ pci_enable_pcie_error_reporting(pdev);
+ pci_save_state(pdev);
+
+ return 0;
+
+disable:
+ pci_disable_device(pdev);
+ return ret;
+}
+
+static int spraid_npages_prp(u32 size, struct spraid_dev *hdev)
+{
+ u32 nprps = DIV_ROUND_UP(size + hdev->page_size, hdev->page_size);
+
+ return DIV_ROUND_UP(PRP_ENTRY_SIZE * nprps, PAGE_SIZE - PRP_ENTRY_SIZE);
+}
+
+static int spraid_npages_sgl(u32 nseg)
+{
+ return DIV_ROUND_UP(nseg * sizeof(struct spraid_sgl_desc), PAGE_SIZE);
+}
+
+static void **spraid_iod_list(struct spraid_iod *iod)
+{
+ return (void **)(iod->inline_sg + (iod->sg_drv_mgmt ? iod->nsge : 0));
+}
+
+static u32 spraid_iod_ext_size(struct spraid_dev *hdev, u32 size, u32 nsge,
+ bool sg_drv_mgmt, bool use_sgl)
+{
+ size_t alloc_size, sg_size;
+
+ if (use_sgl)
+ alloc_size = sizeof(__le64 *) * spraid_npages_sgl(nsge);
+ else
+ alloc_size = sizeof(__le64 *) * spraid_npages_prp(size, hdev);
+
+ sg_size = sg_drv_mgmt ? (sizeof(struct scatterlist) * nsge) : 0;
+ return sg_size + alloc_size;
+}
+
+static u32 spraid_cmd_size(struct spraid_dev *hdev,
+ bool sg_drv_mgmt, bool use_sgl)
+{
+ u32 alloc_size = spraid_iod_ext_size(hdev, SPRAID_INT_BYTES(hdev),
+ SPRAID_INT_PAGES, sg_drv_mgmt, use_sgl);
+
+ dev_info(hdev->dev, "sg_drv_mgmt: %s, use_sgl: %s, iod size: %lu;"
+ " alloc_size: %u\n", sg_drv_mgmt ? "true" : "false",
+ use_sgl ? "true" : "false",
+ sizeof(struct spraid_iod), alloc_size);
+
+ return sizeof(struct spraid_iod) + alloc_size;
+}
+
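+/*
+ * Build the PRP list for a mapped scatterlist. The caller places the
+ * first page fragment in prp1; if the rest of the payload spans more
+ * than one extra page, PRP pages are allocated from a dma pool and
+ * chained through their last entry, with prp2 pointing at the first.
+ */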
+static int spraid_setup_prps(struct spraid_dev *hdev, struct spraid_iod *iod)
+{
+ struct scatterlist *sg = iod->sg;
+ u64 dma_addr = sg_dma_address(sg);
+ int dma_len = sg_dma_len(sg);
+ __le64 *prp_list, *old_prp_list;
+ u32 page_size = hdev->page_size;
+ int offset = dma_addr & (page_size - 1);
+ void **list = spraid_iod_list(iod);
+ int length = iod->length;
+ struct dma_pool *pool;
+ dma_addr_t prp_dma;
+ int nprps, i;
+
+ length -= (page_size - offset);
+ if (length <= 0) {
+ iod->first_dma = 0;
+ return 0;
+ }
+
+ dma_len -= (page_size - offset);
+ if (dma_len) {
+ dma_addr += (page_size - offset);
+ } else {
+ sg = sg_next(sg);
+ dma_addr = sg_dma_address(sg);
+ dma_len = sg_dma_len(sg);
+ }
+
+ if (length <= page_size) {
+ iod->first_dma = dma_addr;
+ return 0;
+ }
+
+ nprps = DIV_ROUND_UP(length, page_size);
+ if (nprps <= (SMALL_POOL_SIZE / PRP_ENTRY_SIZE)) {
+ pool = iod->spraidq->prp_small_pool;
+ iod->npages = 0;
+ } else {
+ pool = hdev->prp_page_pool;
+ iod->npages = 1;
+ }
+
+ prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+ if (!prp_list) {
+ dev_err_ratelimited(hdev->dev,
+ "Allocate first prp_list memory failed\n");
+ iod->first_dma = dma_addr;
+ iod->npages = -1;
+ return -ENOMEM;
+ }
+ list[0] = prp_list;
+ iod->first_dma = prp_dma;
+ i = 0;
+ for (;;) {
+ if (i == page_size / PRP_ENTRY_SIZE) {
+ old_prp_list = prp_list;
+
+ prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+ if (!prp_list) {
+ dev_err_ratelimited(hdev->dev, "Allocate %dth;"
+ " prp_list memory failed\n",
+ iod->npages + 1);
+ return -ENOMEM;
+ }
+ list[iod->npages++] = prp_list;
+ prp_list[0] = old_prp_list[i - 1];
+ old_prp_list[i - 1] = cpu_to_le64(prp_dma);
+ i = 1;
+ }
+ prp_list[i++] = cpu_to_le64(dma_addr);
+ dma_len -= page_size;
+ dma_addr += page_size;
+ length -= page_size;
+ if (length <= 0)
+ break;
+ if (dma_len > 0)
+ continue;
+ if (unlikely(dma_len < 0))
+ goto bad_sgl;
+ sg = sg_next(sg);
+ dma_addr = sg_dma_address(sg);
+ dma_len = sg_dma_len(sg);
+ }
+
+ return 0;
+
+bad_sgl:
+ dev_err(hdev->dev,
+ "Setup prps, invalid SGL for payload: %d nents: %d\n",
+ iod->length, iod->nsge);
+ return -EIO;
+}
+
+#define SGES_PER_PAGE (PAGE_SIZE / sizeof(struct spraid_sgl_desc))
+
+static void spraid_submit_cmd(struct spraid_queue *spraidq, const void *cmd)
+{
+ u32 sqes = SQE_SIZE(spraidq->qid);
+ unsigned long flags;
+ struct spraid_admin_common_command *acd =
+ (struct spraid_admin_common_command *)cmd;
+
+ spin_lock_irqsave(&spraidq->sq_lock, flags);
+ memcpy((spraidq->sq_cmds + sqes * spraidq->sq_tail), cmd, sqes);
+ if (++spraidq->sq_tail == spraidq->q_depth)
+ spraidq->sq_tail = 0;
+
+ writel(spraidq->sq_tail, spraidq->q_db);
+ spin_unlock_irqrestore(&spraidq->sq_lock, flags);
+
+ dev_log_dbg(spraidq->hdev->dev,
+ "cid[%d] qid[%d], opcode[0x%x], flags[0x%x], hdid[%u]\n",
+ acd->command_id, spraidq->qid, acd->opcode,
+ acd->flags, le32_to_cpu(acd->hdid));
+}
+
+static u32 spraid_mod64(u64 dividend, u32 divisor)
+{
+ u64 d;
+ u32 remainder;
+
+ /* avoid calling do_div() with a zero divisor, which would fault */
+ if (!divisor) {
+ pr_err("divisor is zero in %s\n", __func__);
+ return 0;
+ }
+
+ d = dividend;
+ remainder = do_div(d, divisor);
+ return remainder;
+}
+
+static inline bool spraid_is_rw_scmd(struct scsi_cmnd *scmd)
+{
+ switch (scmd->cmnd[0]) {
+ case READ_6:
+ case READ_10:
+ case READ_12:
+ case READ_16:
+ case READ_32:
+ case WRITE_6:
+ case WRITE_10:
+ case WRITE_12:
+ case WRITE_16:
+ case WRITE_32:
+ return true;
+ default:
+ return false;
+ }
+}
+
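+/*
+ * Data can be described with PRPs only if every intermediate SGE is
+ * page aligned and a whole multiple of the page size, the first SGE
+ * ends on a page boundary and the last SGE starts on one; otherwise
+ * the command falls back to SGL descriptors.
+ */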
+static bool spraid_is_prp(struct spraid_dev *hdev,
+ struct scsi_cmnd *scmd, u32 nsge)
+{
+ struct scatterlist *sg = scsi_sglist(scmd);
+ u32 page_size = hdev->page_size;
+ bool is_prp = true;
+ int i = 0;
+
+ scsi_for_each_sg(scmd, sg, nsge, i) {
+ if (i != 0 && i != nsge - 1) {
+ if (spraid_mod64(sg_dma_len(sg), page_size) ||
+ spraid_mod64(sg_dma_address(sg), page_size)) {
+ is_prp = false;
+ break;
+ }
+ }
+
+ if (nsge > 1 && i == 0) {
+ if ((spraid_mod64((sg_dma_address(sg) + sg_dma_len(sg)),
+ page_size))) {
+ is_prp = false;
+ break;
+ }
+ }
+
+ if (nsge > 1 && i == (nsge - 1)) {
+ if (spraid_mod64(sg_dma_address(sg), page_size)) {
+ is_prp = false;
+ break;
+ }
+ }
+ }
+
+ return is_prp;
+}
+
+enum {
+ SPRAID_SGL_FMT_DATA_DESC = 0x00,
+ SPRAID_SGL_FMT_SEG_DESC = 0x02,
+ SPRAID_SGL_FMT_LAST_SEG_DESC = 0x03,
+ SPRAID_KEY_SGL_FMT_DATA_DESC = 0x04,
+ SPRAID_TRANSPORT_SGL_DATA_DESC = 0x05
+};
+
+static void spraid_sgl_set_data(struct spraid_sgl_desc *sge,
+ struct scatterlist *sg)
+{
+ sge->addr = cpu_to_le64(sg_dma_address(sg));
+ sge->length = cpu_to_le32(sg_dma_len(sg));
+ sge->type = SPRAID_SGL_FMT_DATA_DESC << 4;
+}
+
+static void spraid_sgl_set_seg(struct spraid_sgl_desc *sge,
+ dma_addr_t dma_addr, int entries)
+{
+ sge->addr = cpu_to_le64(dma_addr);
+ if (entries <= SGES_PER_PAGE) {
+ sge->length = cpu_to_le32(entries * sizeof(*sge));
+ sge->type = SPRAID_SGL_FMT_LAST_SEG_DESC << 4;
+ } else {
+ sge->length = cpu_to_le32(PAGE_SIZE);
+ sge->type = SPRAID_SGL_FMT_SEG_DESC << 4;
+ }
+}
+
+static int spraid_setup_ioq_cmd_sgl(struct spraid_dev *hdev,
+ struct scsi_cmnd *scmd,
+ struct spraid_ioq_command *ioq_cmd,
+ struct spraid_iod *iod)
+{
+ struct spraid_sgl_desc *sg_list, *link, *old_sg_list;
+ struct scatterlist *sg = scsi_sglist(scmd);
+ void **list = spraid_iod_list(iod);
+ struct dma_pool *pool;
+ int nsge = iod->nsge;
+ dma_addr_t sgl_dma;
+ int i = 0;
+
+ ioq_cmd->common.flags |= SPRAID_CMD_FLAG_SGL_METABUF;
+
+ if (nsge == 1) {
+ spraid_sgl_set_data(&ioq_cmd->common.dptr.sgl, sg);
+ return 0;
+ }
+
+ if (nsge <= (SMALL_POOL_SIZE / sizeof(struct spraid_sgl_desc))) {
+ pool = iod->spraidq->prp_small_pool;
+ iod->npages = 0;
+ } else {
+ pool = hdev->prp_page_pool;
+ iod->npages = 1;
+ }
+
+ sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
+ if (!sg_list) {
+ dev_err_ratelimited(hdev->dev,
+ "Allocate first sgl_list failed\n");
+ iod->npages = -1;
+ return -ENOMEM;
+ }
+
+ list[0] = sg_list;
+ iod->first_dma = sgl_dma;
+ spraid_sgl_set_seg(&ioq_cmd->common.dptr.sgl, sgl_dma, nsge);
+ do {
+ if (i == SGES_PER_PAGE) {
+ old_sg_list = sg_list;
+ link = &old_sg_list[SGES_PER_PAGE - 1];
+
+ sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
+ if (!sg_list) {
+ dev_err_ratelimited(hdev->dev,
+ "Allocate %dth sgl_list;"
+ " failed\n",
+ iod->npages + 1);
+ return -ENOMEM;
+ }
+ list[iod->npages++] = sg_list;
+
+ i = 0;
+ memcpy(&sg_list[i++], link, sizeof(*link));
+ spraid_sgl_set_seg(link, sgl_dma, nsge);
+ }
+
+ spraid_sgl_set_data(&sg_list[i++], sg);
+ sg = sg_next(sg);
+ } while (--nsge > 0);
+
+ return 0;
+}
+
+#define SPRAID_RW_FUA BIT(14)
+
+static void spraid_setup_rw_cmd(struct spraid_dev *hdev,
+ struct spraid_rw_command *rw,
+ struct scsi_cmnd *scmd)
+{
+ u32 start_lba_lo, start_lba_hi;
+ u32 datalength = 0;
+ u16 control = 0;
+
+ start_lba_lo = 0;
+ start_lba_hi = 0;
+
+ if (scmd->sc_data_direction == DMA_TO_DEVICE) {
+ rw->opcode = SPRAID_CMD_WRITE;
+ } else if (scmd->sc_data_direction == DMA_FROM_DEVICE) {
+ rw->opcode = SPRAID_CMD_READ;
+ } else {
+ dev_err(hdev->dev,
+ "Invalid IO for unsupported data direction: %d\n",
+ scmd->sc_data_direction);
+ WARN_ON(1);
+ }
+
+ /* 6-byte READ(0x08) or WRITE(0x0A) cdb */
+ if (scmd->cmd_len == 6) {
+ datalength = (u32)(scmd->cmnd[4] == 0 ?
+ IO_6_DEFAULT_TX_LEN : scmd->cmnd[4]);
+ start_lba_lo = (u32)get_unaligned_be24(&scmd->cmnd[1]);
+
+ start_lba_lo &= 0x1FFFFF;
+ }
+
+ /* 10-byte READ(0x28) or WRITE(0x2A) cdb */
+ else if (scmd->cmd_len == 10) {
+ datalength = (u32)get_unaligned_be16(&scmd->cmnd[7]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+
+ /* 12-byte READ(0xA8) or WRITE(0xAA) cdb */
+ else if (scmd->cmd_len == 12) {
+ datalength = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+ /* 16-byte READ(0x88) or WRITE(0x8A) cdb */
+ else if (scmd->cmd_len == 16) {
+ datalength = get_unaligned_be32(&scmd->cmnd[10]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[2]);
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+ /* 32-byte READ(32) or WRITE(32) cdb, variable-length opcode 0x7F */
+ else if (scmd->cmd_len == 32) {
+ datalength = get_unaligned_be32(&scmd->cmnd[28]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[16]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[12]);
+
+ if (scmd->cmnd[10] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+
+ if (unlikely(datalength > U16_MAX || datalength == 0)) {
+ dev_err(hdev->dev,
+ "Invalid IO for illegal transfer data length: %u\n",
+ datalength);
+ WARN_ON(1);
+ }
+
+ rw->slba = cpu_to_le64(((u64)start_lba_hi << 32) | start_lba_lo);
+ /* 0base for nlb */
+ rw->nlb = cpu_to_le16((u16)(datalength - 1));
+ rw->control = cpu_to_le16(control);
+}
+
+static void spraid_setup_nonio_cmd(struct spraid_dev *hdev,
+ struct spraid_scsi_nonio *scsi_nonio,
+ struct scsi_cmnd *scmd)
+{
+ scsi_nonio->buffer_len = cpu_to_le32(scsi_bufflen(scmd));
+
+ switch (scmd->sc_data_direction) {
+ case DMA_NONE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_NONE;
+ break;
+ case DMA_TO_DEVICE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_TODEV;
+ break;
+ case DMA_FROM_DEVICE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_FROMDEV;
+ break;
+ default:
+ dev_err(hdev->dev,
+ "Invalid IO for unsupported data direction: %d\n",
+ scmd->sc_data_direction);
+ WARN_ON(1);
+ }
+}
+
+static void spraid_setup_ioq_cmd(struct spraid_dev *hdev,
+ struct spraid_ioq_command *ioq_cmd,
+ struct scsi_cmnd *scmd)
+{
+ memcpy(ioq_cmd->common.cdb, scmd->cmnd, scmd->cmd_len);
+ ioq_cmd->common.cdb_len = scmd->cmd_len;
+
+ if (spraid_is_rw_scmd(scmd))
+ spraid_setup_rw_cmd(hdev, &ioq_cmd->rw, scmd);
+ else
+ spraid_setup_nonio_cmd(hdev, &ioq_cmd->scsi_nonio, scmd);
+}
+
+static int spraid_init_iod(struct spraid_dev *hdev, struct spraid_iod *iod,
+ struct spraid_ioq_command *ioq_cmd,
+ struct scsi_cmnd *scmd)
+{
+ if (unlikely(!iod->sense)) {
+ dev_err(hdev->dev, "Allocate sense data buffer failed\n");
+ return -ENOMEM;
+ }
+ ioq_cmd->common.sense_addr = cpu_to_le64(iod->sense_dma);
+ ioq_cmd->common.sense_len = cpu_to_le16(SCSI_SENSE_BUFFERSIZE);
+
+ iod->nsge = 0;
+ iod->npages = -1;
+ iod->use_sgl = 0;
+ iod->sg_drv_mgmt = false;
+ WRITE_ONCE(iod->state, SPRAID_CMD_IDLE);
+
+ return 0;
+}
+
+static void spraid_free_iod_res(struct spraid_dev *hdev, struct spraid_iod *iod)
+{
+ const int last_prp = hdev->page_size / sizeof(__le64) - 1;
+ dma_addr_t dma_addr, next_dma_addr;
+ struct spraid_sgl_desc *sg_list;
+ __le64 *prp_list;
+ void *addr;
+ int i;
+
+ dma_addr = iod->first_dma;
+ if (iod->npages == 0)
+ dma_pool_free(iod->spraidq->prp_small_pool,
+ spraid_iod_list(iod)[0], dma_addr);
+
+ for (i = 0; i < iod->npages; i++) {
+ addr = spraid_iod_list(iod)[i];
+
+ if (iod->use_sgl) {
+ sg_list = addr;
+ next_dma_addr =
+ le64_to_cpu((sg_list[SGES_PER_PAGE - 1]).addr);
+ } else {
+ prp_list = addr;
+ next_dma_addr = le64_to_cpu(prp_list[last_prp]);
+ }
+
+ dma_pool_free(hdev->prp_page_pool, addr, dma_addr);
+ dma_addr = next_dma_addr;
+ }
+
+ if (iod->sg_drv_mgmt && iod->sg != iod->inline_sg) {
+ iod->sg_drv_mgmt = false;
+ mempool_free(iod->sg, hdev->iod_mempool);
+ }
+
+ iod->sense = NULL;
+ iod->npages = -1;
+}
+
+static int spraid_io_map_data(struct spraid_dev *hdev, struct spraid_iod *iod,
+ struct scsi_cmnd *scmd,
+ struct spraid_ioq_command *ioq_cmd)
+{
+ int ret;
+
+ iod->nsge = scsi_dma_map(scmd);
+
+ /* No data to DMA, it may be scsi no-rw command */
+ if (unlikely(iod->nsge == 0))
+ return 0;
+
+ iod->length = scsi_bufflen(scmd);
+ iod->sg = scsi_sglist(scmd);
+ iod->use_sgl = !spraid_is_prp(hdev, scmd, iod->nsge);
+
+ if (iod->use_sgl) {
+ ret = spraid_setup_ioq_cmd_sgl(hdev, scmd, ioq_cmd, iod);
+ } else {
+ ret = spraid_setup_prps(hdev, iod);
+ ioq_cmd->common.dptr.prp1 =
+ cpu_to_le64(sg_dma_address(iod->sg));
+ ioq_cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
+ }
+
+ if (ret)
+ scsi_dma_unmap(scmd);
+
+ return ret;
+}
+
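+/*
+ * Translate the firmware completion status into a SCSI host byte and,
+ * on a check condition, copy the returned sense data back into the
+ * midlayer's sense buffer.
+ */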
+static void spraid_map_status(struct spraid_iod *iod, struct scsi_cmnd *scmd,
+ struct spraid_completion *cqe)
+{
+ scsi_set_resid(scmd, 0);
+
+ switch ((le16_to_cpu(cqe->status) >> 1) & 0x7f) {
+ case FW_STAT_OK:
+ set_host_byte(scmd, DID_OK);
+ break;
+ case FW_STAT_NEED_CHECK:
+ set_host_byte(scmd, DID_OK);
+ scmd->result |= le16_to_cpu(cqe->status) >> 8;
+ if (scmd->result & SAM_STAT_CHECK_CONDITION) {
+ memset(scmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
+ memcpy(scmd->sense_buffer, iod->sense,
+ SCSI_SENSE_BUFFERSIZE);
+ scmd->result =
+ (scmd->result & 0x00ffffff) | (DRIVER_SENSE << 24);
+ }
+ break;
+ case FW_STAT_ABORTED:
+ set_host_byte(scmd, DID_ABORT);
+ break;
+ case FW_STAT_NEED_RETRY:
+ set_host_byte(scmd, DID_REQUEUE);
+ break;
+ default:
+ set_host_byte(scmd, DID_BAD_TARGET);
+ break;
+ }
+}
+
+static inline void spraid_get_tag_from_scmd(struct scsi_cmnd *scmd,
+ u16 *qid, u16 *cid)
+{
+ u32 tag = blk_mq_unique_tag(scmd->request);
+
+ *qid = blk_mq_unique_tag_to_hwq(tag) + 1;
+ *cid = blk_mq_unique_tag_to_tag(tag);
+}
+
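+/*
+ * Main .queuecommand entry: build an ioq command from the scsi cmnd,
+ * attach the per-command slice of the queue's sense buffer, map the
+ * data as PRPs or SGLs and submit it on the hardware queue that
+ * matches the blk-mq hctx of the request.
+ */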
+static int spraid_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
+{
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_dev *hdev = shost_priv(shost);
+ struct scsi_device *sdev = scmd->device;
+ struct spraid_sdev_hostdata *hostdata;
+ struct spraid_ioq_command ioq_cmd;
+ struct spraid_queue *ioq;
+ unsigned long elapsed;
+ u16 hwq, cid;
+ int ret;
+
+ if (unlikely(!scmd)) {
+ dev_err(hdev->dev, "err, scmd is null\n");
+ return 0;
+ }
+
+ if (unlikely(hdev->state != SPRAID_LIVE)) {
+ set_host_byte(scmd, DID_NO_CONNECT);
+ scmd->scsi_done(scmd);
+ return 0;
+ }
+
+ if (log_debug_switch)
+ scsi_print_command(scmd);
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ hostdata = sdev->hostdata;
+ ioq = &hdev->queues[hwq];
+ memset(&ioq_cmd, 0, sizeof(ioq_cmd));
+ ioq_cmd.rw.hdid = cpu_to_le32(hostdata->hdid);
+ ioq_cmd.rw.command_id = cid;
+
+ spraid_setup_ioq_cmd(hdev, &ioq_cmd, scmd);
+
+ ret = cid * SCSI_SENSE_BUFFERSIZE;
+ iod->sense = ioq->sense + ret;
+ iod->sense_dma = ioq->sense_dma_addr + ret;
+
+ ret = spraid_init_iod(hdev, iod, &ioq_cmd, scmd);
+ if (unlikely(ret))
+ return SCSI_MLQUEUE_HOST_BUSY;
+
+ iod->spraidq = ioq;
+ ret = spraid_io_map_data(hdev, iod, scmd, &ioq_cmd);
+ if (unlikely(ret)) {
+ dev_err(hdev->dev, "spraid_io_map_data Err.\n");
+ set_host_byte(scmd, DID_ERROR);
+ scmd->scsi_done(scmd);
+ ret = 0;
+ goto deinit_iod;
+ }
+
+ WRITE_ONCE(iod->state, SPRAID_CMD_IN_FLIGHT);
+ spraid_submit_cmd(ioq, &ioq_cmd);
+ elapsed = jiffies - scmd->jiffies_at_alloc;
+ dev_log_dbg(hdev->dev,
+ "cid[%d] qid[%d] submit IO cost %3ld.%3ld seconds\n",
+ cid, hwq, elapsed / HZ, elapsed % HZ);
+ return 0;
+
+deinit_iod:
+ spraid_free_iod_res(hdev, iod);
+ return ret;
+}
+
+static int spraid_match_dev(struct spraid_dev *hdev, u16 idx,
+ struct scsi_device *sdev)
+{
+ if (SPRAID_DEV_INFO_FLAG_VALID(hdev->devices[idx].flag)) {
+ if (sdev->channel == hdev->devices[idx].channel &&
+ sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
+ sdev->lun < hdev->devices[idx].lun) {
+ dev_info(hdev->dev,
+ "Match device success, channel;"
+ "target:lun[%d:%d:%d]\n",
+ hdev->devices[idx].channel,
+ hdev->devices[idx].target,
+ hdev->devices[idx].lun);
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+static int spraid_slave_alloc(struct scsi_device *sdev)
+{
+ struct spraid_sdev_hostdata *hostdata;
+ struct spraid_dev *hdev;
+ u16 idx;
+
+ hdev = shost_priv(sdev->host);
+ hostdata = kzalloc(sizeof(*hostdata), GFP_KERNEL);
+ if (!hostdata) {
+ dev_err(hdev->dev, "Alloc scsi host data memory failed\n");
+ return -ENOMEM;
+ }
+
+ down_read(&hdev->devices_rwsem);
+ for (idx = 0; idx < le32_to_cpu(hdev->ctrl_info->nd); idx++) {
+ if (spraid_match_dev(hdev, idx, sdev))
+ goto scan_host;
+ }
+ up_read(&hdev->devices_rwsem);
+
+ kfree(hostdata);
+ return -ENXIO;
+
+scan_host:
+ hostdata->hdid = le32_to_cpu(hdev->devices[idx].hdid);
+ hostdata->max_io_kb = le16_to_cpu(hdev->devices[idx].max_io_kb);
+ hostdata->attr = hdev->devices[idx].attr;
+ hostdata->flag = hdev->devices[idx].flag;
+ hostdata->rg_id = 0xff;
+ sdev->hostdata = hostdata;
+ up_read(&hdev->devices_rwsem);
+ return 0;
+}
+
+static void spraid_slave_destroy(struct scsi_device *sdev)
+{
+ kfree(sdev->hostdata);
+ sdev->hostdata = NULL;
+}
+
+static int spraid_slave_configure(struct scsi_device *sdev)
+{
+ u16 idx;
+ unsigned int timeout = scmd_tmout_nonpt * HZ;
+ struct spraid_dev *hdev = shost_priv(sdev->host);
+ struct spraid_sdev_hostdata *hostdata = sdev->hostdata;
+ u32 max_sec = sdev->host->max_sectors;
+
+ if (hostdata) {
+ idx = hostdata->hdid - 1;
+ if (sdev->channel == hdev->devices[idx].channel &&
+ sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
+ sdev->lun < hdev->devices[idx].lun) {
+ if (SPRAID_DEV_INFO_ATTR_PT(hdev->devices[idx].attr))
+ timeout = 30 * HZ;
+ else
+ timeout = scmd_tmout_nonpt * HZ;
+ max_sec = le16_to_cpu(hdev->devices[idx].max_io_kb)
+ << 1;
+ } else {
+ dev_err(hdev->dev,
+ "[%s] err, sdev->channel:id:lun[%d:%d:%lld];"
+ "devices[%d], channel:target:lun[%d:%d:%d]\n",
+ __func__, sdev->channel, sdev->id, sdev->lun,
+ idx, hdev->devices[idx].channel,
+ hdev->devices[idx].target,
+ hdev->devices[idx].lun);
+ }
+ } else {
+ dev_err(hdev->dev, "[%s] err, sdev->hostdata is null\n",
+ __func__);
+ }
+
+ blk_queue_rq_timeout(sdev->request_queue, timeout);
+ sdev->eh_timeout = timeout;
+
+ if ((max_sec == 0) || (max_sec > sdev->host->max_sectors))
+ max_sec = sdev->host->max_sectors;
+
+ dev_info(hdev->dev,
+ "[%s] sdev->channel:id:lun[%d:%d:%lld];"
+ " scmd_timeout[%d]s, maxsec[%d]\n",
+ __func__, sdev->channel, sdev->id,
+ sdev->lun, timeout / HZ, max_sec);
+
+ return 0;
+}
+
+static void spraid_shost_init(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ u8 domain, bus;
+ u32 dev_func;
+
+ domain = pci_domain_nr(pdev->bus);
+ bus = pdev->bus->number;
+ dev_func = pdev->devfn;
+
+ hdev->shost->nr_hw_queues = hdev->online_queues - 1;
+ hdev->shost->can_queue = (hdev->ioq_depth - SPRAID_PTCMDS_PERQ);
+
+ hdev->shost->sg_tablesize = le16_to_cpu(hdev->ctrl_info->max_num_sge);
+ /* 512B per sector */
+ hdev->shost->max_sectors =
+ (1U << ((hdev->ctrl_info->mdts) * 1U) << 12) / 512;
+ hdev->shost->cmd_per_lun = MAX_CMD_PER_DEV;
+ hdev->shost->max_channel =
+ le16_to_cpu(hdev->ctrl_info->max_channel) - 1;
+ hdev->shost->max_id = le32_to_cpu(hdev->ctrl_info->max_tgt_id);
+ hdev->shost->max_lun = le16_to_cpu(hdev->ctrl_info->max_lun);
+
+ hdev->shost->this_id = -1;
+ hdev->shost->unique_id = (domain << 16) | (bus << 8) | dev_func;
+ hdev->shost->max_cmd_len = MAX_CDB_LEN;
+ hdev->shost->hostt->cmd_size = max(spraid_cmd_size(hdev, false, true),
+ spraid_cmd_size(hdev, false, false));
+}
+
+static inline void spraid_host_deinit(struct spraid_dev *hdev)
+{
+ ida_free(&spraid_instance_ida, hdev->instance);
+}
+
+static int spraid_alloc_queue(struct spraid_dev *hdev, u16 qid, u16 depth)
+{
+ struct spraid_queue *spraidq = &hdev->queues[qid];
+ int ret = 0;
+
+ if (hdev->queue_count > qid) {
+ dev_info(hdev->dev, "[%s] warn: queue[%d] is exist\n",
+ __func__, qid);
+ return 0;
+ }
+
+ spraidq->cqes = dma_alloc_coherent(hdev->dev, CQ_SIZE(depth),
+ &spraidq->cq_dma_addr,
+ GFP_KERNEL | __GFP_ZERO);
+ if (!spraidq->cqes)
+ return -ENOMEM;
+
+ spraidq->sq_cmds = dma_alloc_coherent(hdev->dev, SQ_SIZE(qid, depth),
+ &spraidq->sq_dma_addr,
+ GFP_KERNEL);
+ if (!spraidq->sq_cmds) {
+ ret = -ENOMEM;
+ goto free_cqes;
+ }
+
+ spin_lock_init(&spraidq->sq_lock);
+ spin_lock_init(&spraidq->cq_lock);
+ spraidq->hdev = hdev;
+ spraidq->q_depth = depth;
+ spraidq->qid = qid;
+ spraidq->cq_vector = -1;
+ hdev->queue_count++;
+
+ /* alloc sense buffer */
+ spraidq->sense = dma_alloc_coherent(hdev->dev, SENSE_SIZE(depth),
+ &spraidq->sense_dma_addr,
+ GFP_KERNEL | __GFP_ZERO);
+ if (!spraidq->sense) {
+ ret = -ENOMEM;
+ goto free_sq_cmds;
+ }
+
+ return 0;
+
+free_sq_cmds:
+ dma_free_coherent(hdev->dev, SQ_SIZE(qid, depth),
+ (void *)spraidq->sq_cmds, spraidq->sq_dma_addr);
+free_cqes:
+ dma_free_coherent(hdev->dev, CQ_SIZE(depth), (void *)spraidq->cqes,
+ spraidq->cq_dma_addr);
+ return ret;
+}
+
+static int spraid_wait_ready(struct spraid_dev *hdev, u64 cap, bool enabled)
+{
+ unsigned long timeout =
+ ((SPRAID_CAP_TIMEOUT(cap) + 1) * SPRAID_CAP_TIMEOUT_UNIT_MS) + jiffies;
+ u32 bit = enabled ? SPRAID_CSTS_RDY : 0;
+
+ while ((readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_RDY) != bit) {
+ usleep_range(1000, 2000);
+ if (fatal_signal_pending(current))
+ return -EINTR;
+
+ if (time_after(jiffies, timeout)) {
+ dev_err(hdev->dev, "Device not ready; aborting %s\n",
+ enabled ? "initialisation" : "reset");
+ return -ENODEV;
+ }
+ }
+ return 0;
+}
+
+static int spraid_shutdown_ctrl(struct spraid_dev *hdev)
+{
+ unsigned long timeout = hdev->ctrl_info->rtd3e + jiffies;
+
+ hdev->ctrl_config &= ~SPRAID_CC_SHN_MASK;
+ hdev->ctrl_config |= SPRAID_CC_SHN_NORMAL;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ while ((readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_SHST_MASK) !=
+ SPRAID_CSTS_SHST_CMPLT) {
+ msleep(100);
+ if (fatal_signal_pending(current))
+ return -EINTR;
+ if (time_after(jiffies, timeout)) {
+ dev_err(hdev->dev,
+ "Device shutdown incomplete; abort shutdown\n");
+ return -ENODEV;
+ }
+ }
+ return 0;
+}
+
+static int spraid_disable_ctrl(struct spraid_dev *hdev)
+{
+ hdev->ctrl_config &= ~SPRAID_CC_SHN_MASK;
+ hdev->ctrl_config &= ~SPRAID_CC_ENABLE;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ return spraid_wait_ready(hdev, hdev->cap, false);
+}
+
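+/*
+ * Program the CC register (page size, arbitration, queue entry sizes),
+ * set the enable bit and wait for CSTS.RDY; fails if the host page
+ * size is below the minimum the device advertises in CAP.
+ */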
+static int spraid_enable_ctrl(struct spraid_dev *hdev)
+{
+ u64 cap = hdev->cap;
+ u32 dev_page_min = SPRAID_CAP_MPSMIN(cap) + 12;
+ u32 page_shift = PAGE_SHIFT;
+
+ if (page_shift < dev_page_min) {
+ dev_err(hdev->dev,
+ "Minimum device page size[%u], too large for host[%u]\n",
+ 1U << dev_page_min, 1U << page_shift);
+ return -ENODEV;
+ }
+
+ page_shift = min_t(unsigned int, SPRAID_CAP_MPSMAX(cap) + 12,
+ PAGE_SHIFT);
+ hdev->page_size = 1U << page_shift;
+
+ hdev->ctrl_config = SPRAID_CC_CSS_NVM;
+ hdev->ctrl_config |= (page_shift - 12) << SPRAID_CC_MPS_SHIFT;
+ hdev->ctrl_config |= SPRAID_CC_AMS_RR | SPRAID_CC_SHN_NONE;
+ hdev->ctrl_config |= SPRAID_CC_IOSQES | SPRAID_CC_IOCQES;
+ hdev->ctrl_config |= SPRAID_CC_ENABLE;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ return spraid_wait_ready(hdev, cap, true);
+}
+
+static void spraid_init_queue(struct spraid_queue *spraidq, u16 qid)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ memset((void *)spraidq->cqes, 0, CQ_SIZE(spraidq->q_depth));
+
+ spraidq->sq_tail = 0;
+ spraidq->cq_head = 0;
+ spraidq->cq_phase = 1;
+ spraidq->q_db = &hdev->dbs[qid * 2 * hdev->db_stride];
+ spraidq->prp_small_pool = hdev->prp_small_pool[qid % small_pool_num];
+ hdev->online_queues++;
+}
+
+static inline bool spraid_cqe_pending(struct spraid_queue *spraidq)
+{
+ return (le16_to_cpu(spraidq->cqes[spraidq->cq_head].status) & 1) ==
+ spraidq->cq_phase;
+}
+
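+/*
+ * For HBA cards a REPORT ZONES reply comes back in the SATA drive's
+ * little-endian layout; rewrite the header and each 64-byte zone
+ * descriptor in place into the big-endian SCSI ZBC format.
+ */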
+static void spraid_sata_report_zone_handle(struct scsi_cmnd *scmd,
+ struct spraid_iod *iod)
+{
+ int i = 0;
+ unsigned int bytes = 0;
+ struct scatterlist *sg = scsi_sglist(scmd);
+
+ scsi_for_each_sg(scmd, sg, iod->nsge, i) {
+ unsigned int offset = 0;
+
+ if (bytes == 0) {
+ char *hdr;
+ u32 list_length;
+ u64 max_lba, opt_lba;
+ u16 same;
+
+ hdr = sg_virt(sg);
+
+ list_length = get_unaligned_le32(&hdr[0]);
+ same = get_unaligned_le16(&hdr[4]);
+ max_lba = get_unaligned_le64(&hdr[8]);
+ opt_lba = get_unaligned_le64(&hdr[16]);
+ put_unaligned_be32(list_length, &hdr[0]);
+ hdr[4] = same & 0xf;
+ put_unaligned_be64(max_lba, &hdr[8]);
+ put_unaligned_be64(opt_lba, &hdr[16]);
+ offset += 64;
+ bytes += 64;
+ }
+ while (offset < sg_dma_len(sg)) {
+ char *rec;
+ u8 cond, type, non_seq, reset;
+ u64 size, start, wp;
+
+ rec = sg_virt(sg) + offset;
+ type = rec[0] & 0xf;
+ cond = (rec[1] >> 4) & 0xf;
+ non_seq = (rec[1] & 2);
+ reset = (rec[1] & 1);
+ size = get_unaligned_le64(&rec[8]);
+ start = get_unaligned_le64(&rec[16]);
+ wp = get_unaligned_le64(&rec[24]);
+ rec[0] = type;
+ rec[1] = (cond << 4) | non_seq | reset;
+ put_unaligned_be64(size, &rec[8]);
+ put_unaligned_be64(start, &rec[16]);
+ put_unaligned_be64(wp, &rec[24]);
+ WARN_ON(offset + 64 > sg_dma_len(sg));
+ offset += 64;
+ bytes += 64;
+ }
+ }
+}
+
+static inline void spraid_handle_ata_cmd(struct spraid_dev *hdev,
+ struct scsi_cmnd *scmd,
+ struct spraid_iod *iod)
+{
+ if (hdev->ctrl_info->card_type != SPRAID_CARD_HBA)
+ return;
+
+ switch (scmd->cmnd[0]) {
+ case ZBC_IN:
+ dev_info(hdev->dev, "[%s] process report zone\n", __func__);
+ spraid_sata_report_zone_handle(scmd, iod);
+ break;
+ default:
+ break;
+ }
+}
+
+static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct blk_mq_tags *tags;
+ struct scsi_cmnd *scmd;
+ struct spraid_iod *iod;
+ struct request *req;
+ unsigned long elapsed;
+
+ tags = hdev->shost->tag_set.tags[ioq->qid - 1];
+ req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
+ if (unlikely(!req || !blk_mq_request_started(req))) {
+ dev_warn(hdev->dev, "Invalid id %d completed on queue %d\n",
+ cqe->cmd_id, ioq->qid);
+ return;
+ }
+
+ scmd = blk_mq_rq_to_pdu(req);
+ iod = scsi_cmd_priv(scmd);
+
+ elapsed = jiffies - scmd->jiffies_at_alloc;
+ dev_log_dbg(hdev->dev,
+ "cid[%d] qid[%d] finish IO cost %3ld.%3ld seconds\n",
+ cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
+
+ if (cmpxchg(&iod->state, SPRAID_CMD_IN_FLIGHT, SPRAID_CMD_COMPLETE) !=
+ SPRAID_CMD_IN_FLIGHT) {
+ dev_warn(hdev->dev,
+ "cid[%d] qid[%d] enters abnormal handler;"
+ " cost %3ld.%3ld seconds\n",
+ cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
+ WRITE_ONCE(iod->state, SPRAID_CMD_TMO_COMPLETE);
+
+ if (iod->nsge) {
+ iod->nsge = 0;
+ scsi_dma_unmap(scmd);
+ }
+ spraid_free_iod_res(hdev, iod);
+
+ return;
+ }
+
+ spraid_handle_ata_cmd(hdev, scmd, iod);
+
+ spraid_map_status(iod, scmd, cqe);
+ if (iod->nsge) {
+ iod->nsge = 0;
+ scsi_dma_unmap(scmd);
+ }
+ spraid_free_iod_res(hdev, iod);
+ scmd->scsi_done(scmd);
+}
+
+static void spraid_complete_adminq_cmnd(struct spraid_queue *adminq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = adminq->hdev;
+ struct spraid_cmd *adm_cmd;
+
+ adm_cmd = hdev->adm_cmds + cqe->cmd_id;
+ if (unlikely(adm_cmd->state == SPRAID_CMD_IDLE)) {
+ dev_warn(adminq->hdev->dev,
+ "Invalid id %d completed on queue %d\n",
+ cqe->cmd_id, le16_to_cpu(cqe->sq_id));
+ return;
+ }
+
+ adm_cmd->status = le16_to_cpu(cqe->status) >> 1;
+ adm_cmd->result0 = le32_to_cpu(cqe->result);
+ adm_cmd->result1 = le32_to_cpu(cqe->result1);
+
+ complete(&adm_cmd->cmd_done);
+}
+
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid);
+
+static void spraid_complete_aen(struct spraid_queue *spraidq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+ u32 result = le32_to_cpu(cqe->result);
+
+ dev_info(hdev->dev, "rcv aen, cid[%d], status[0x%x], result[0x%x]\n",
+ cqe->cmd_id, le16_to_cpu(cqe->status) >> 1, result);
+
+ spraid_send_aen(hdev, cqe->cmd_id);
+
+ if ((le16_to_cpu(cqe->status) >> 1) != SPRAID_SC_SUCCESS)
+ return;
+ switch (result & 0x7) {
+ case SPRAID_AEN_NOTICE:
+ spraid_handle_aen_notice(hdev, result);
+ break;
+ case SPRAID_AEN_VS:
+ spraid_handle_aen_vs(hdev, result, le32_to_cpu(cqe->result1));
+ break;
+ default:
+ dev_warn(hdev->dev, "Unsupported async event type: %u\n",
+ result & 0x7);
+ break;
+ }
+}
+
+static void spraid_complete_ioq_sync_cmnd(struct spraid_queue *ioq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct spraid_cmd *ptcmd;
+
+ ptcmd = hdev->ioq_ptcmds + (ioq->qid - 1) * SPRAID_PTCMDS_PERQ +
+ cqe->cmd_id - SPRAID_IO_BLK_MQ_DEPTH;
+
+ ptcmd->status = le16_to_cpu(cqe->status) >> 1;
+ ptcmd->result0 = le32_to_cpu(cqe->result);
+ ptcmd->result1 = le32_to_cpu(cqe->result1);
+
+ complete(&ptcmd->cmd_done);
+}
+
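+/*
+ * Demultiplex a completion by command id: ids above the blk-mq depth
+ * on the admin queue are AEN completions, ids above the blk-mq depth
+ * on an IO queue are reserved passthrough slots, anything else is a
+ * regular admin or IO command.
+ */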
+static inline void spraid_handle_cqe(struct spraid_queue *spraidq, u16 idx)
+{
+ struct spraid_completion *cqe = &spraidq->cqes[idx];
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ if (unlikely(cqe->cmd_id >= spraidq->q_depth)) {
+ dev_err(hdev->dev,
+ "Invalid command id[%d] completed on queue %d\n",
+ cqe->cmd_id, cqe->sq_id);
+ return;
+ }
+
+ dev_log_dbg(hdev->dev, "cid[%d] qid[%d];"
+ " result[0x%x], sq_id[%d], status[0x%x]\n",
+ cqe->cmd_id, spraidq->qid, le32_to_cpu(cqe->result),
+ le16_to_cpu(cqe->sq_id), le16_to_cpu(cqe->status));
+
+ if (unlikely(spraidq->qid == 0
+ && cqe->cmd_id >= SPRAID_AQ_BLK_MQ_DEPTH)) {
+ spraid_complete_aen(spraidq, cqe);
+ return;
+ }
+
+ if (unlikely(spraidq->qid && cqe->cmd_id >= SPRAID_IO_BLK_MQ_DEPTH)) {
+ spraid_complete_ioq_sync_cmnd(spraidq, cqe);
+ return;
+ }
+
+ if (spraidq->qid)
+ spraid_complete_ioq_cmnd(spraidq, cqe);
+ else
+ spraid_complete_adminq_cmnd(spraidq, cqe);
+}
+
+static void spraid_complete_cqes(struct spraid_queue *spraidq,
+ u16 start, u16 end)
+{
+ while (start != end) {
+ spraid_handle_cqe(spraidq, start);
+ if (++start == spraidq->q_depth)
+ start = 0;
+ }
+}
+
+static inline void spraid_update_cq_head(struct spraid_queue *spraidq)
+{
+ if (++spraidq->cq_head == spraidq->q_depth) {
+ spraidq->cq_head = 0;
+ spraidq->cq_phase = !spraidq->cq_phase;
+ }
+}
+
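+/*
+ * Walk the completion ring while the phase bit matches, optionally
+ * watching for a specific tag, then ring the CQ head doorbell; the
+ * caller completes the reaped entries outside the cq_lock.
+ */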
+static inline bool spraid_process_cq(struct spraid_queue *spraidq,
+ u16 *start, u16 *end, int tag)
+{
+ bool found = false;
+
+ *start = spraidq->cq_head;
+ while (!found && spraid_cqe_pending(spraidq)) {
+ if (spraidq->cqes[spraidq->cq_head].cmd_id == tag)
+ found = true;
+ spraid_update_cq_head(spraidq);
+ }
+ *end = spraidq->cq_head;
+
+ if (*start != *end)
+ writel(spraidq->cq_head,
+ spraidq->q_db + spraidq->hdev->db_stride);
+
+ return found;
+}
+
+static bool spraid_poll_cq(struct spraid_queue *spraidq, int cid)
+{
+ u16 start, end;
+ bool found;
+
+ if (!spraid_cqe_pending(spraidq))
+ return 0;
+
+ spin_lock_irq(&spraidq->cq_lock);
+ found = spraid_process_cq(spraidq, &start, &end, cid);
+ spin_unlock_irq(&spraidq->cq_lock);
+
+ spraid_complete_cqes(spraidq, start, end);
+ return found;
+}
+
+static irqreturn_t spraid_irq(int irq, void *data)
+{
+ struct spraid_queue *spraidq = data;
+ irqreturn_t ret = IRQ_NONE;
+ u16 start, end;
+
+ spin_lock(&spraidq->cq_lock);
+ if (spraidq->cq_head != spraidq->last_cq_head)
+ ret = IRQ_HANDLED;
+
+ spraid_process_cq(spraidq, &start, &end, -1);
+ spraidq->last_cq_head = spraidq->cq_head;
+ spin_unlock(&spraidq->cq_lock);
+
+ if (start != end) {
+ spraid_complete_cqes(spraidq, start, end);
+ ret = IRQ_HANDLED;
+ }
+ return ret;
+}
+
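+/*
+ * Bring up the admin queue: disable the controller, allocate the
+ * admin SQ/CQ, program AQA/ASQ/ACQ, re-enable the controller and
+ * register the admin IRQ on vector 0.
+ */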
+static int spraid_setup_admin_queue(struct spraid_dev *hdev)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u32 aqa;
+ int ret;
+
+ dev_info(hdev->dev, "[%s] start disable ctrl\n", __func__);
+
+ ret = spraid_disable_ctrl(hdev);
+ if (ret)
+ return ret;
+
+ ret = spraid_alloc_queue(hdev, 0, SPRAID_AQ_DEPTH);
+ if (ret)
+ return ret;
+
+ aqa = adminq->q_depth - 1;
+ aqa |= aqa << 16;
+ writel(aqa, hdev->bar + SPRAID_REG_AQA);
+ lo_hi_writeq(adminq->sq_dma_addr, hdev->bar + SPRAID_REG_ASQ);
+ lo_hi_writeq(adminq->cq_dma_addr, hdev->bar + SPRAID_REG_ACQ);
+
+ dev_info(hdev->dev, "[%s] start enable ctrl\n", __func__);
+
+ ret = spraid_enable_ctrl(hdev);
+ if (ret) {
+ ret = -ENODEV;
+ goto free_queue;
+ }
+
+ adminq->cq_vector = 0;
+ spraid_init_queue(adminq, 0);
+ ret = pci_request_irq(hdev->pdev, adminq->cq_vector, spraid_irq, NULL,
+ adminq, "spraid%d_q%d",
+ hdev->instance, adminq->qid);
+
+ if (ret) {
+ adminq->cq_vector = -1;
+ hdev->online_queues--;
+ goto free_queue;
+ }
+
+ dev_info(hdev->dev, "[%s] success, queuecount:[%d], onlinequeue:[%d]\n",
+ __func__, hdev->queue_count, hdev->online_queues);
+
+ return 0;
+
+free_queue:
+ spraid_free_queue(adminq);
+ return ret;
+}
+
+static u32 spraid_bar_size(struct spraid_dev *hdev, u32 nr_ioqs)
+{
+ return (SPRAID_REG_DBS + ((nr_ioqs + 1) * 8 * hdev->db_stride));
+}
+
+static int spraid_alloc_admin_cmds(struct spraid_dev *hdev)
+{
+ int i;
+
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+ spin_lock_init(&hdev->adm_cmd_lock);
+
+ hdev->adm_cmds = kcalloc_node(SPRAID_AQ_BLK_MQ_DEPTH,
+ sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
+
+ if (!hdev->adm_cmds) {
+ dev_err(hdev->dev, "Alloc admin cmds failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < SPRAID_AQ_BLK_MQ_DEPTH; i++) {
+ hdev->adm_cmds[i].qid = 0;
+ hdev->adm_cmds[i].cid = i;
+ list_add_tail(&(hdev->adm_cmds[i].list), &hdev->adm_cmd_list);
+ }
+
+ dev_info(hdev->dev, "Alloc admin cmds success, num[%d]\n",
+ SPRAID_AQ_BLK_MQ_DEPTH);
+
+ return 0;
+}
+
+static void spraid_free_admin_cmds(struct spraid_dev *hdev)
+{
+ kfree(hdev->adm_cmds);
+ hdev->adm_cmds = NULL;
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+}
+
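+/*
+ * Internal commands are preallocated with fixed cids and recycled
+ * through spinlock-protected free lists: adm_cmd_list for admin
+ * commands, ioq_pt_list for I/O-queue passthrough commands.
+ */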
+static struct spraid_cmd *spraid_get_cmd(struct spraid_dev *hdev,
+ enum spraid_cmd_type type)
+{
+ struct spraid_cmd *cmd = NULL;
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
+
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
+
+ spin_lock_irqsave(slock, flags);
+ if (list_empty(head)) {
+ spin_unlock_irqrestore(slock, flags);
+ dev_err(hdev->dev, "err, cmd[%d] list empty\n", type);
+ return NULL;
+ }
+ cmd = list_entry(head->next, struct spraid_cmd, list);
+ list_del_init(&cmd->list);
+ spin_unlock_irqrestore(slock, flags);
+
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IN_FLIGHT);
+
+ return cmd;
+}
+
+static void spraid_put_cmd(struct spraid_dev *hdev, struct spraid_cmd *cmd,
+ enum spraid_cmd_type type)
+{
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
+
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
+
+ spin_lock_irqsave(slock, flags);
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IDLE);
+ list_add_tail(&cmd->list, head);
+ spin_unlock_irqrestore(slock, flags);
+}
+
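+/*
+ * Synchronous admin submission: borrow a preallocated slot, post the
+ * command to the admin SQ and sleep on its completion (ADMIN_TIMEOUT
+ * by default). On timeout the slot state is flipped to
+ * SPRAID_CMD_TIMEOUT and -EINVAL is returned.
+ */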
+static int spraid_submit_admin_sync_cmd(struct spraid_dev *hdev,
+ struct spraid_admin_command *cmd,
+ u32 *result0, u32 *result1, u32 timeout)
+{
+ struct spraid_cmd *adm_cmd = spraid_get_cmd(hdev, SPRAID_CMD_ADM);
+
+ if (!adm_cmd) {
+ dev_err(hdev->dev, "err, get admin cmd failed\n");
+ return -EFAULT;
+ }
+
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
+
+ init_completion(&adm_cmd->cmd_done);
+
+ cmd->common.command_id = adm_cmd->cid;
+ spraid_submit_cmd(&hdev->queues[0], cmd);
+
+ if (!wait_for_completion_timeout(&adm_cmd->cmd_done, timeout)) {
+ dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout;"
+ " opcode[0x%x] subopcode[0x%x]\n",
+ __func__, adm_cmd->cid, adm_cmd->qid,
+ cmd->usr_cmd.opcode, cmd->usr_cmd.info_0.subopcode);
+ WRITE_ONCE(adm_cmd->state, SPRAID_CMD_TIMEOUT);
+ spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+ return -EINVAL;
+ }
+
+ if (result0)
+ *result0 = adm_cmd->result0;
+ if (result1)
+ *result1 = adm_cmd->result1;
+
+ spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+
+ return adm_cmd->status;
+}
+
+static int spraid_create_cq(struct spraid_dev *hdev, u16 qid,
+ struct spraid_queue *spraidq, u16 cq_vector)
+{
+ struct spraid_admin_command admin_cmd;
+ int flags = SPRAID_QUEUE_PHYS_CONTIG | SPRAID_CQ_IRQ_ENABLED;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.create_cq.opcode = SPRAID_ADMIN_CREATE_CQ;
+ admin_cmd.create_cq.prp1 = cpu_to_le64(spraidq->cq_dma_addr);
+ admin_cmd.create_cq.cqid = cpu_to_le16(qid);
+ admin_cmd.create_cq.qsize = cpu_to_le16(spraidq->q_depth - 1);
+ admin_cmd.create_cq.cq_flags = cpu_to_le16(flags);
+ admin_cmd.create_cq.irq_vector = cpu_to_le16(cq_vector);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+static int spraid_create_sq(struct spraid_dev *hdev, u16 qid,
+ struct spraid_queue *spraidq)
+{
+ struct spraid_admin_command admin_cmd;
+ int flags = SPRAID_QUEUE_PHYS_CONTIG;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.create_sq.opcode = SPRAID_ADMIN_CREATE_SQ;
+ admin_cmd.create_sq.prp1 = cpu_to_le64(spraidq->sq_dma_addr);
+ admin_cmd.create_sq.sqid = cpu_to_le16(qid);
+ admin_cmd.create_sq.qsize = cpu_to_le16(spraidq->q_depth - 1);
+ admin_cmd.create_sq.sq_flags = cpu_to_le16(flags);
+ admin_cmd.create_sq.cqid = cpu_to_le16(qid);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+static void spraid_free_queue(struct spraid_queue *spraidq)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ hdev->queue_count--;
+ dma_free_coherent(hdev->dev, CQ_SIZE(spraidq->q_depth),
+ (void *)spraidq->cqes, spraidq->cq_dma_addr);
+ dma_free_coherent(hdev->dev, SQ_SIZE(spraidq->qid, spraidq->q_depth),
+ spraidq->sq_cmds, spraidq->sq_dma_addr);
+ dma_free_coherent(hdev->dev, SENSE_SIZE(spraidq->q_depth),
+ spraidq->sense, spraidq->sense_dma_addr);
+}
+
+static void spraid_free_admin_queue(struct spraid_dev *hdev)
+{
+ spraid_free_queue(&hdev->queues[0]);
+}
+
+static void spraid_free_io_queues(struct spraid_dev *hdev)
+{
+ int i;
+
+ for (i = hdev->queue_count - 1; i >= 1; i--)
+ spraid_free_queue(&hdev->queues[i]);
+}
+
+static int spraid_delete_queue(struct spraid_dev *hdev, u8 op, u16 id)
+{
+ struct spraid_admin_command admin_cmd;
+ int ret;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.delete_queue.opcode = op;
+ admin_cmd.delete_queue.qid = cpu_to_le16(id);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+
+ if (ret)
+ dev_err(hdev->dev, "Delete %s:[%d] failed\n",
+ (op == SPRAID_ADMIN_DELETE_CQ) ? "cq" : "sq", id);
+
+ return ret;
+}
+
+static int spraid_delete_cq(struct spraid_dev *hdev, u16 cqid)
+{
+ return spraid_delete_queue(hdev, SPRAID_ADMIN_DELETE_CQ, cqid);
+}
+
+static int spraid_delete_sq(struct spraid_dev *hdev, u16 sqid)
+{
+ return spraid_delete_queue(hdev, SPRAID_ADMIN_DELETE_SQ, sqid);
+}
+
+static int spraid_create_queue(struct spraid_queue *spraidq, u16 qid)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+ u16 cq_vector;
+ int ret;
+
+ cq_vector = (hdev->num_vecs == 1) ? 0 : qid;
+ ret = spraid_create_cq(hdev, qid, spraidq, cq_vector);
+ if (ret)
+ return ret;
+
+ ret = spraid_create_sq(hdev, qid, spraidq);
+ if (ret)
+ goto delete_cq;
+
+ spraid_init_queue(spraidq, qid);
+ spraidq->cq_vector = cq_vector;
+
+ ret = pci_request_irq(hdev->pdev, cq_vector, spraid_irq, NULL,
+ spraidq, "spraid%d_q%d", hdev->instance, qid);
+
+ if (ret) {
+ dev_err(hdev->dev, "Request queue[%d] irq failed\n", qid);
+ goto delete_sq;
+ }
+
+ return 0;
+
+delete_sq:
+ spraidq->cq_vector = -1;
+ hdev->online_queues--;
+ spraid_delete_sq(hdev, qid);
+delete_cq:
+ spraid_delete_cq(hdev, qid);
+
+ return ret;
+}
+
+static int spraid_create_io_queues(struct spraid_dev *hdev)
+{
+ u32 i, max;
+ int ret = 0;
+
+ max = min(hdev->max_qid, hdev->queue_count - 1);
+ for (i = hdev->online_queues; i <= max; i++) {
+ ret = spraid_create_queue(&hdev->queues[i], i);
+ if (ret) {
+ dev_err(hdev->dev, "Create queue[%d] failed\n", i);
+ break;
+ }
+ }
+
+ dev_info(hdev->dev, "[%s] queue_count[%d], online_queue[%d]",
+ __func__, hdev->queue_count, hdev->online_queues);
+
+ return ret >= 0 ? 0 : ret;
+}
+
+static int spraid_set_features(struct spraid_dev *hdev, u32 fid,
+ u32 dword11, void *buffer,
+ size_t buflen, u32 *result)
+{
+ struct spraid_admin_command admin_cmd;
+ int ret;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+
+ if (buffer && buflen) {
+ data_ptr = dma_alloc_coherent(hdev->dev, buflen,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memcpy(data_ptr, buffer, buflen);
+ }
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.features.opcode = SPRAID_ADMIN_SET_FEATURES;
+ admin_cmd.features.fid = cpu_to_le32(fid);
+ admin_cmd.features.dword11 = cpu_to_le32(dword11);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, result, NULL, 0);
+
+ if (data_ptr)
+ dma_free_coherent(hdev->dev, buflen, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_configure_timestamp(struct spraid_dev *hdev)
+{
+ __le64 ts;
+ int ret;
+
+ ts = cpu_to_le64(ktime_to_ms(ktime_get_real()));
+ ret = spraid_set_features(hdev, SPRAID_FEAT_TIMESTAMP,
+ 0, &ts, sizeof(ts), NULL);
+
+ if (ret)
+ dev_err(hdev->dev, "set timestamp failed: %d\n", ret);
+ return ret;
+}
+
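+/*
+ * SPRAID_FEAT_NUM_QUEUES carries zero-based queue counts packed into
+ * both 16-bit halves of dword11; the controller grants SQ/CQ counts
+ * the same way, so the usable count is the minimum of the two halves.
+ */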
+static int spraid_set_queue_cnt(struct spraid_dev *hdev, u32 *cnt)
+{
+ u32 q_cnt = (*cnt - 1) | ((*cnt - 1) << 16);
+ u32 nr_ioqs, result;
+ int status;
+
+ status = spraid_set_features(hdev, SPRAID_FEAT_NUM_QUEUES,
+ q_cnt, NULL, 0, &result);
+ if (status) {
+ dev_err(hdev->dev, "Set queue count failed, status: %d\n",
+ status);
+ return -EIO;
+ }
+
+ nr_ioqs = min(result & 0xffff, result >> 16) + 1;
+ *cnt = min(*cnt, nr_ioqs);
+ if (*cnt == 0) {
+ dev_err(hdev->dev, "Illegal queue count: zero\n");
+ return -EIO;
+ }
+ return 0;
+}
+
+static int spraid_setup_io_queues(struct spraid_dev *hdev)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ struct pci_dev *pdev = hdev->pdev;
+ u32 nr_ioqs = num_online_cpus();
+ u32 i, size;
+ int ret;
+
+ struct irq_affinity affd = {
+ .pre_vectors = 1
+ };
+
+ ret = spraid_set_queue_cnt(hdev, &nr_ioqs);
+ if (ret < 0)
+ return ret;
+
+ size = spraid_bar_size(hdev, nr_ioqs);
+ ret = spraid_remap_bar(hdev, size);
+ if (ret)
+ return -ENOMEM;
+
+ adminq->q_db = hdev->dbs;
+
+ pci_free_irq(pdev, 0, adminq);
+ pci_free_irq_vectors(pdev);
+
+ ret = pci_alloc_irq_vectors_affinity(pdev, 1, (nr_ioqs + 1),
+ PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
+ if (ret <= 0)
+ return -EIO;
+
+ hdev->num_vecs = ret;
+
+ hdev->max_qid = max(ret - 1, 1);
+
+ ret = pci_request_irq(pdev, adminq->cq_vector, spraid_irq, NULL,
+ adminq, "spraid%d_q%d",
+ hdev->instance, adminq->qid);
+ if (ret) {
+ dev_err(hdev->dev, "Request admin irq failed\n");
+ adminq->cq_vector = -1;
+ return ret;
+ }
+
+ for (i = hdev->queue_count; i <= hdev->max_qid; i++) {
+ ret = spraid_alloc_queue(hdev, i, hdev->ioq_depth);
+ if (ret)
+ break;
+ }
+ dev_info(hdev->dev, "[%s] max_qid: %d, queue_count: %d;"
+ " online_queue: %d, ioq_depth: %d\n",
+ __func__, hdev->max_qid, hdev->queue_count,
+ hdev->online_queues, hdev->ioq_depth);
+
+ return spraid_create_io_queues(hdev);
+}
+
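+/*
+ * Tear down I/O queues in two passes, all SQs first and then all CQs,
+ * mirroring the bring-up order in which each CQ is created before its
+ * SQ.
+ */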
+static void spraid_delete_io_queues(struct spraid_dev *hdev)
+{
+ u16 queues = hdev->online_queues - 1;
+ u8 opcode = SPRAID_ADMIN_DELETE_SQ;
+ u16 i, pass;
+
+ if (!pci_device_is_present(hdev->pdev)) {
+ dev_err(hdev->dev,
+ "pci_device is not present, skip disable io queues\n");
+ return;
+ }
+
+ if (hdev->online_queues < 2) {
+ dev_err(hdev->dev, "[%s] err, io queue has been delete\n",
+ __func__);
+ return;
+ }
+
+ for (pass = 0; pass < 2; pass++) {
+ for (i = queues; i > 0; i--)
+ if (spraid_delete_queue(hdev, opcode, i))
+ break;
+
+ opcode = SPRAID_ADMIN_DELETE_CQ;
+ }
+}
+
+static void spraid_remove_io_queues(struct spraid_dev *hdev)
+{
+ spraid_delete_io_queues(hdev);
+ spraid_free_io_queues(hdev);
+}
+
+static void spraid_pci_disable(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ u32 i;
+
+ for (i = 0; i < hdev->online_queues; i++)
+ pci_free_irq(pdev, hdev->queues[i].cq_vector, &hdev->queues[i]);
+ pci_free_irq_vectors(pdev);
+ if (pci_is_enabled(pdev)) {
+ pci_disable_pcie_error_reporting(pdev);
+ pci_disable_device(pdev);
+ }
+ hdev->online_queues = 0;
+}
+
+static void spraid_disable_admin_queue(struct spraid_dev *hdev, bool shutdown)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u16 start, end;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ if (shutdown)
+ spraid_shutdown_ctrl(hdev);
+ else
+ spraid_disable_ctrl(hdev);
+ }
+
+ if (hdev->queue_count == 0) {
+ dev_err(hdev->dev, "[%s] err, admin queue has been delete\n",
+ __func__);
+ return;
+ }
+
+ spin_lock_irq(&adminq->cq_lock);
+ spraid_process_cq(adminq, &start, &end, -1);
+ spin_unlock_irq(&adminq->cq_lock);
+
+ spraid_complete_cqes(adminq, start, end);
+ spraid_free_admin_queue(hdev);
+}
+
+static int spraid_create_dma_pools(struct spraid_dev *hdev)
+{
+ int i;
+ char poolname[20] = { 0 };
+
+ hdev->prp_page_pool = dma_pool_create("prp list page", hdev->dev,
+ PAGE_SIZE, PAGE_SIZE, 0);
+
+ if (!hdev->prp_page_pool) {
+ dev_err(hdev->dev, "create prp_page_pool failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < small_pool_num; i++) {
+ sprintf(poolname, "prp_list_256_%d", i);
+ hdev->prp_small_pool[i] =
+ dma_pool_create(poolname, hdev->dev, SMALL_POOL_SIZE,
+ SMALL_POOL_SIZE, 0);
+
+ if (!hdev->prp_small_pool[i]) {
+ dev_err(hdev->dev, "create prp_small_pool %d failed\n",
+ i);
+ goto destroy_prp_small_pool;
+ }
+ }
+
+ return 0;
+
+destroy_prp_small_pool:
+ while (i > 0)
+ dma_pool_destroy(hdev->prp_small_pool[--i]);
+ dma_pool_destroy(hdev->prp_page_pool);
+
+ return -ENOMEM;
+}
+
+static void spraid_destroy_dma_pools(struct spraid_dev *hdev)
+{
+ int i;
+
+ for (i = 0; i < small_pool_num; i++)
+ dma_pool_destroy(hdev->prp_small_pool[i]);
+ dma_pool_destroy(hdev->prp_page_pool);
+}
+
+static int spraid_get_dev_list(struct spraid_dev *hdev,
+ struct spraid_dev_info *devices)
+{
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ struct spraid_admin_command admin_cmd;
+ struct spraid_dev_list *list_buf;
+ dma_addr_t data_dma = 0;
+ u32 i, idx, hdid, ndev;
+ int ret = 0;
+
+ list_buf = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!list_buf)
+ return -ENOMEM;
+
+ for (idx = 0; idx < nd;) {
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
+ admin_cmd.get_info.type = SPRAID_GET_INFO_DEV_LIST;
+ admin_cmd.get_info.cdw11 = cpu_to_le32(idx);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd,
+ NULL, NULL, 0);
+
+ if (ret) {
+ dev_err(hdev->dev, "Get device list failed, nd: %u;"
+ "idx: %u, ret: %d\n",
+ nd, idx, ret);
+ goto out;
+ }
+ ndev = le32_to_cpu(list_buf->dev_num);
+
+ dev_info(hdev->dev, "ndev numbers: %u\n", ndev);
+
+ for (i = 0; i < ndev; i++) {
+ hdid = le32_to_cpu(list_buf->devices[i].hdid);
+ dev_info(hdev->dev, "list_buf->devices[%d], hdid: %u;"
+ "target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ i, hdid,
+ le16_to_cpu(list_buf->devices[i].target),
+ list_buf->devices[i].channel,
+ list_buf->devices[i].lun,
+ list_buf->devices[i].attr);
+ if (hdid > nd || hdid == 0) {
+ dev_err(hdev->dev, "err, hdid[%d] invalid\n",
+ hdid);
+ continue;
+ }
+ memcpy(&devices[hdid - 1], &list_buf->devices[i],
+ sizeof(struct spraid_dev_info));
+ }
+ idx += ndev;
+
+ if (idx < MAX_DEV_ENTRY_PER_PAGE_4K)
+ break;
+ }
+
+out:
+ dma_free_coherent(hdev->dev, PAGE_SIZE, list_buf, data_dma);
+ return ret;
+}
+
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.common.opcode = SPRAID_ADMIN_ASYNC_EVENT;
+ admin_cmd.common.command_id = cid;
+
+ spraid_submit_cmd(adminq, &admin_cmd);
+ dev_info(hdev->dev, "send aen, cid[%d]\n", cid);
+}
+
+static inline void spraid_send_all_aen(struct spraid_dev *hdev)
+{
+ u16 i;
+
+ for (i = 0; i < hdev->ctrl_info->aerl; i++)
+ spraid_send_aen(hdev, i + SPRAID_AQ_BLK_MQ_DEPTH);
+}
+
+static int spraid_add_device(struct spraid_dev *hdev,
+ struct spraid_dev_info *device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ dev_info(hdev->dev, "add device, hdid: %u target: %d, channel: %d;"
+ " lun: %d, attr[0x%x]\n",
+ le32_to_cpu(device->hdid), le16_to_cpu(device->target),
+ device->channel, device->lun, device->attr);
+
+ sdev = scsi_device_lookup(shost, device->channel,
+ le16_to_cpu(device->target), 0);
+ if (sdev) {
+ dev_warn(hdev->dev, "Device is already exist, channel: %d;"
+ " target_id: %d, lun: %d\n",
+ device->channel, le16_to_cpu(device->target), 0);
+ scsi_device_put(sdev);
+ return -EEXIST;
+ }
+ scsi_add_device(shost, device->channel, le16_to_cpu(device->target), 0);
+ return 0;
+}
+
+static int spraid_rescan_device(struct spraid_dev *hdev,
+ struct spraid_dev_info *device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ dev_info(hdev->dev, "rescan device, hdid: %u target: %d, channel: %d;"
+ " lun: %d, attr[0x%x]\n",
+ le32_to_cpu(device->hdid), le16_to_cpu(device->target),
+ device->channel, device->lun, device->attr);
+
+ sdev = scsi_device_lookup(shost, device->channel,
+ le16_to_cpu(device->target), 0);
+ if (!sdev) {
+ dev_warn(hdev->dev, "device is not exit rescan it, channel: %d;"
+ " target_id: %d, lun: %d\n",
+ device->channel, le16_to_cpu(device->target), 0);
+ return -ENODEV;
+ }
+
+ scsi_rescan_device(&sdev->sdev_gendev);
+ scsi_device_put(sdev);
+ return 0;
+}
+
+static int spraid_remove_device(struct spraid_dev *hdev,
+ struct spraid_dev_info *org_device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ dev_info(hdev->dev, "remove device, hdid: %u target: %d, channel: %d;"
+ " lun: %d, attr[0x%x]\n",
+ le32_to_cpu(org_device->hdid), le16_to_cpu(org_device->target),
+ org_device->channel, org_device->lun, org_device->attr);
+
+ sdev = scsi_device_lookup(shost, org_device->channel,
+ le16_to_cpu(org_device->target), 0);
+ if (!sdev) {
+ dev_warn(hdev->dev, "device is not exit remove it, channel: %d;"
+ " target_id: %d, lun: %d\n",
+ org_device->channel,
+ le16_to_cpu(org_device->target), 0);
+ return -ENODEV;
+ }
+
+ scsi_remove_device(sdev);
+ scsi_device_put(sdev);
+ return 0;
+}
+
+static int spraid_dev_list_init(struct spraid_dev *hdev)
+{
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ int i, ret;
+
+ hdev->devices = kzalloc_node(nd * sizeof(struct spraid_dev_info),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->devices)
+ return -ENOMEM;
+
+ ret = spraid_get_dev_list(hdev, hdev->devices);
+ if (ret) {
+ dev_err(hdev->dev,
+ "Ignore failure of getting device list;"
+ " within initialization\n");
+ return 0;
+ }
+
+ for (i = 0; i < nd; i++) {
+ if (SPRAID_DEV_INFO_FLAG_VALID(hdev->devices[i].flag) &&
+ SPRAID_DEV_INFO_ATTR_BOOT(hdev->devices[i].attr)) {
+ spraid_add_device(hdev, &hdev->devices[i]);
+ break;
+ }
+ }
+ return 0;
+}
+
+static int luntarget_cmp_func(const void *l, const void *r)
+{
+ const struct spraid_dev_info *ln = l;
+ const struct spraid_dev_info *rn = r;
+
+ if (ln->channel == rn->channel)
+ return le16_to_cpu(ln->target) - le16_to_cpu(rn->target);
+
+ return ln->channel - rn->channel;
+}
+
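+/*
+ * AEN-driven rescan: fetch a fresh device list and diff it against the
+ * cached copy. Newly valid entries are collected, sorted by (channel,
+ * target) and added; changed entries are rescanned; entries that went
+ * invalid have their valid bit cleared and are removed from the host.
+ */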
+static void spraid_scan_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, scan_work);
+ struct spraid_dev_info *devices, *org_devices;
+ struct spraid_dev_info *sortdevice;
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ u8 flag, org_flag;
+ int i, ret;
+ int count = 0;
+
+ devices = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
+ if (!devices)
+ return;
+
+ sortdevice = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
+ if (!sortdevice)
+ goto free_list;
+
+ ret = spraid_get_dev_list(hdev, devices);
+ if (ret)
+ goto free_all;
+ org_devices = hdev->devices;
+ for (i = 0; i < nd; i++) {
+ org_flag = org_devices[i].flag;
+ flag = devices[i].flag;
+
+ dev_log_dbg(hdev->dev, "i: %d, org_flag: 0x%x, flag: 0x%x\n",
+ i, org_flag, flag);
+
+ if (SPRAID_DEV_INFO_FLAG_VALID(flag)) {
+ if (!SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
+ down_write(&hdev->devices_rwsem);
+ memcpy(&org_devices[i], &devices[i],
+ sizeof(struct spraid_dev_info));
+ memcpy(&sortdevice[count++], &devices[i],
+ sizeof(struct spraid_dev_info));
+ up_write(&hdev->devices_rwsem);
+ } else if (SPRAID_DEV_INFO_FLAG_CHANGE(flag)) {
+ spraid_rescan_device(hdev, &devices[i]);
+ }
+ } else {
+ if (SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
+ down_write(&hdev->devices_rwsem);
+ org_devices[i].flag &= 0xfe;
+ up_write(&hdev->devices_rwsem);
+ spraid_remove_device(hdev, &org_devices[i]);
+ }
+ }
+ }
+
+ dev_info(hdev->dev, "scan work add device count = %d\n", count);
+
+ sort(sortdevice, count, sizeof(sortdevice[0]),
+ luntarget_cmp_func, NULL);
+
+ for (i = 0; i < count; i++)
+ spraid_add_device(hdev, &sortdevice[i]);
+
+free_all:
+ kfree(sortdevice);
+free_list:
+ kfree(devices);
+}
+
+static void spraid_timesyn_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, timesyn_work);
+
+ spraid_configure_timestamp(hdev);
+}
+
+static int spraid_init_ctrl_info(struct spraid_dev *hdev);
+static void spraid_fw_act_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, fw_act_work);
+
+ if (spraid_init_ctrl_info(hdev))
+ dev_err(hdev->dev, "get ctrl info failed after fw act\n");
+}
+
+static void spraid_queue_scan(struct spraid_dev *hdev)
+{
+ queue_work(spraid_wq, &hdev->scan_work);
+}
+
+static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result)
+{
+ switch ((result & 0xff00) >> 8) {
+ case SPRAID_AEN_DEV_CHANGED:
+ spraid_queue_scan(hdev);
+ break;
+ case SPRAID_AEN_FW_ACT_START:
+ dev_info(hdev->dev, "fw activation starting\n");
+ break;
+ case SPRAID_AEN_HOST_PROBING:
+ break;
+ default:
+ dev_warn(hdev->dev, "async event result %08x\n", result);
+ }
+}
+
+static void spraid_handle_aen_vs(struct spraid_dev *hdev,
+ u32 result, u32 result1)
+{
+ switch ((result & 0xff00) >> 8) {
+ case SPRAID_AEN_TIMESYN:
+ queue_work(spraid_wq, &hdev->timesyn_work);
+ break;
+ case SPRAID_AEN_FW_ACT_FINISH:
+ dev_info(hdev->dev, "fw activation finish\n");
+ queue_work(spraid_wq, &hdev->fw_act_work);
+ break;
+ case SPRAID_AEN_EVENT_MIN ... SPRAID_AEN_EVENT_MAX:
+ dev_info(hdev->dev, "rcv card event[%d];"
+ " param1[0x%x] param2[0x%x]\n",
+ (result & 0xff00) >> 8, result, result1);
+ break;
+ default:
+ dev_warn(hdev->dev, "async event result: 0x%x\n", result);
+ }
+}
+
+static int spraid_alloc_resources(struct spraid_dev *hdev)
+{
+ int ret, nqueue;
+
+ ret = ida_alloc(&spraid_instance_ida, GFP_KERNEL);
+ if (ret < 0) {
+ dev_err(hdev->dev, "Get instance id failed\n");
+ return ret;
+ }
+ hdev->instance = ret;
+
+ hdev->ctrl_info = kzalloc_node(sizeof(*hdev->ctrl_info),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->ctrl_info) {
+ ret = -ENOMEM;
+ goto release_instance;
+ }
+
+ ret = spraid_create_dma_pools(hdev);
+ if (ret)
+ goto free_ctrl_info;
+ nqueue = num_possible_cpus() + 1;
+ hdev->queues = kcalloc_node(nqueue, sizeof(struct spraid_queue),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->queues) {
+ ret = -ENOMEM;
+ goto destroy_dma_pools;
+ }
+
+ ret = spraid_alloc_admin_cmds(hdev);
+ if (ret)
+ goto free_queues;
+
+ dev_info(hdev->dev, "[%s] queues num: %d\n", __func__, nqueue);
+
+ return 0;
+
+free_queues:
+ kfree(hdev->queues);
+destroy_dma_pools:
+ spraid_destroy_dma_pools(hdev);
+free_ctrl_info:
+ kfree(hdev->ctrl_info);
+release_instance:
+ ida_free(&spraid_instance_ida, hdev->instance);
+ return ret;
+}
+
+static void spraid_free_resources(struct spraid_dev *hdev)
+{
+ spraid_free_admin_cmds(hdev);
+ kfree(hdev->queues);
+ spraid_destroy_dma_pools(hdev);
+ kfree(hdev->ctrl_info);
+ ida_free(&spraid_instance_ida, hdev->instance);
+}
+
+static void spraid_bsg_unmap_data(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir =
+ rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+
+ if (iod->nsge)
+ dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
+
+ spraid_free_iod_res(hdev, iod);
+}
+
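+/*
+ * DMA-map the bsg payload sg list and build the PRP chain, filling in
+ * the command's prp1/prp2 data pointers.
+ */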
+static int spraid_bsg_map_data(struct spraid_dev *hdev, struct bsg_job *job,
+ struct spraid_admin_command *cmd)
+{
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir =
+ rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ int ret = 0;
+
+ iod->sg = job->request_payload.sg_list;
+ iod->nsge = job->request_payload.sg_cnt;
+ iod->length = job->request_payload.payload_len;
+ iod->use_sgl = false;
+ iod->npages = -1;
+ iod->sg_drv_mgmt = false;
+
+ if (!iod->nsge)
+ goto out;
+
+ ret = dma_map_sg_attrs(hdev->dev, iod->sg, iod->nsge,
+ dma_dir, DMA_ATTR_NO_WARN);
+ if (!ret) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = spraid_setup_prps(hdev, iod);
+ if (ret)
+ goto unmap;
+
+ cmd->common.dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
+ cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
+
+ return 0;
+
+unmap:
+ dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
+out:
+ return ret;
+}
+
+static int spraid_get_ctrl_info(struct spraid_dev *hdev,
+ struct spraid_ctrl_info *ctrl_info)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
+ admin_cmd.get_info.type = SPRAID_GET_INFO_CTRL;
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(ctrl_info, data_ptr, sizeof(struct spraid_ctrl_info));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_init_ctrl_info(struct spraid_dev *hdev)
+{
+ int ret;
+
+ hdev->ctrl_info->nd = cpu_to_le32(240);
+ hdev->ctrl_info->mdts = 8;
+ hdev->ctrl_info->max_cmds = cpu_to_le16(4096);
+ hdev->ctrl_info->max_num_sge = cpu_to_le16(128);
+ hdev->ctrl_info->max_channel = cpu_to_le16(4);
+ hdev->ctrl_info->max_tgt_id = cpu_to_le32(3239);
+ hdev->ctrl_info->max_lun = cpu_to_le16(2);
+
+ ret = spraid_get_ctrl_info(hdev, hdev->ctrl_info);
+ if (ret)
+ dev_err(hdev->dev, "get controller info failed: %d\n", ret);
+
+ dev_info(hdev->dev, "[%s]nd = %d\n", __func__, hdev->ctrl_info->nd);
+ dev_info(hdev->dev, "[%s]max_cmd = %d\n",
+ __func__, hdev->ctrl_info->max_cmds);
+ dev_info(hdev->dev, "[%s]max_channel = %d\n",
+ __func__, hdev->ctrl_info->max_channel);
+ dev_info(hdev->dev, "[%s]max_tgt_id = %d\n",
+ __func__, hdev->ctrl_info->max_tgt_id);
+ dev_info(hdev->dev, "[%s]max_lun = %d\n",
+ __func__, hdev->ctrl_info->max_lun);
+ dev_info(hdev->dev, "[%s]max_num_sge = %d\n",
+ __func__, hdev->ctrl_info->max_num_sge);
+ dev_info(hdev->dev, "[%s]lun_num_boot = %d\n",
+ __func__, hdev->ctrl_info->lun_num_in_boot);
+ dev_info(hdev->dev, "[%s]mdts = %d\n", __func__, hdev->ctrl_info->mdts);
+ dev_info(hdev->dev, "[%s]acl = %d\n", __func__, hdev->ctrl_info->acl);
+ dev_info(hdev->dev, "[%s]aer1 = %d\n", __func__, hdev->ctrl_info->aerl);
+ dev_info(hdev->dev, "[%s]card_type = %d\n",
+ __func__, hdev->ctrl_info->card_type);
+ dev_info(hdev->dev, "[%s]rtd3e = %d\n",
+ __func__, hdev->ctrl_info->rtd3e);
+ dev_info(hdev->dev, "[%s]sn = %s\n", __func__, hdev->ctrl_info->sn);
+ dev_info(hdev->dev, "[%s]fr = %s\n", __func__, hdev->ctrl_info->fr);
+
+ if (!hdev->ctrl_info->aerl)
+ hdev->ctrl_info->aerl = 1;
+ if (hdev->ctrl_info->aerl > SPRAID_NR_AEN_COMMANDS)
+ hdev->ctrl_info->aerl = SPRAID_NR_AEN_COMMANDS;
+
+ return 0;
+}
+
+#define SPRAID_MAX_ADMIN_PAYLOAD_SIZE BIT(16)
+static int spraid_alloc_iod_ext_mem_pool(struct spraid_dev *hdev)
+{
+ u16 max_sge = le16_to_cpu(hdev->ctrl_info->max_num_sge);
+ size_t alloc_size;
+
+ alloc_size = spraid_iod_ext_size(hdev, SPRAID_MAX_ADMIN_PAYLOAD_SIZE,
+ max_sge, true, false);
+ if (alloc_size > PAGE_SIZE)
+ dev_warn(hdev->dev, "It is unreasonable ;"
+ " sg allocation more than one page\n");
+ hdev->iod_mempool = mempool_create_node(1, mempool_kmalloc,
+ mempool_kfree,
+ (void *)alloc_size, GFP_KERNEL,
+ hdev->numa_node);
+ if (!hdev->iod_mempool) {
+ dev_err(hdev->dev, "Create iod extension memory pool failed\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void spraid_free_iod_ext_mem_pool(struct spraid_dev *hdev)
+{
+ mempool_destroy(hdev->iod_mempool);
+}
+
+static int spraid_user_admin_cmd(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct spraid_bsg_request *bsg_req = job->request;
+ struct spraid_passthru_common_cmd *cmd = &(bsg_req->admcmd);
+ struct spraid_admin_command admin_cmd;
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
+ u32 result[2] = {0};
+ int status;
+
+ if (hdev->state >= SPRAID_RESETTING) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not right\n",
+ __func__, hdev->state);
+ return -EBUSY;
+ }
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode);
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.common.opcode = cmd->opcode;
+ admin_cmd.common.flags = cmd->flags;
+ admin_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ admin_cmd.common.cdw2[0] = cpu_to_le32(cmd->cdw2);
+ admin_cmd.common.cdw2[1] = cpu_to_le32(cmd->cdw3);
+ admin_cmd.common.cdw10 = cpu_to_le32(cmd->cdw10);
+ admin_cmd.common.cdw11 = cpu_to_le32(cmd->cdw11);
+ admin_cmd.common.cdw12 = cpu_to_le32(cmd->cdw12);
+ admin_cmd.common.cdw13 = cpu_to_le32(cmd->cdw13);
+ admin_cmd.common.cdw14 = cpu_to_le32(cmd->cdw14);
+ admin_cmd.common.cdw15 = cpu_to_le32(cmd->cdw15);
+
+ status = spraid_bsg_map_data(hdev, job, &admin_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
+ }
+
+ status = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, &result[0],
+ &result[1], timeout);
+ if (status >= 0) {
+ job->reply_len = sizeof(result);
+ memcpy(job->reply, result, sizeof(result));
+ }
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x];"
+ " status[0x%x] result0[0x%x] result1[0x%x]\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode,
+ status, result[0], result[1]);
+
+ spraid_bsg_unmap_data(hdev, job);
+
+ return status;
+}
+
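+/*
+ * Passthrough commands use cids at or above SPRAID_IO_BLK_MQ_DEPTH so
+ * their completions can be told apart from blk-mq driven commands in
+ * the CQE dispatch path; SPRAID_PTCMDS_PERQ cids are reserved per
+ * I/O queue.
+ */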
+static int spraid_alloc_ioq_ptcmds(struct spraid_dev *hdev)
+{
+ int i;
+ int ptnum = SPRAID_NR_IOQ_PTCMDS;
+
+ INIT_LIST_HEAD(&hdev->ioq_pt_list);
+ spin_lock_init(&hdev->ioq_pt_lock);
+
+ hdev->ioq_ptcmds = kcalloc_node(ptnum, sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
+
+ if (!hdev->ioq_ptcmds) {
+ dev_err(hdev->dev, "Alloc ioq_ptcmds failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < ptnum; i++) {
+ hdev->ioq_ptcmds[i].qid = i / SPRAID_PTCMDS_PERQ + 1;
+ hdev->ioq_ptcmds[i].cid = i % SPRAID_PTCMDS_PERQ
+ + SPRAID_IO_BLK_MQ_DEPTH;
+ list_add_tail(&(hdev->ioq_ptcmds[i].list), &hdev->ioq_pt_list);
+ }
+
+ dev_info(hdev->dev, "Alloc ioq_ptcmds success, ptnum[%d]\n", ptnum);
+
+ return 0;
+}
+
+static void spraid_free_ioq_ptcmds(struct spraid_dev *hdev)
+{
+ kfree(hdev->ioq_ptcmds);
+ hdev->ioq_ptcmds = NULL;
+
+ INIT_LIST_HEAD(&hdev->ioq_pt_list);
+}
+
+static int spraid_submit_ioq_sync_cmd(struct spraid_dev *hdev,
+ struct spraid_ioq_command *cmd,
+ u32 *result, u32 *reslen, u32 timeout)
+{
+ int ret;
+ dma_addr_t sense_dma;
+ struct spraid_queue *ioq;
+ void *sense_addr = NULL;
+ struct spraid_cmd *pt_cmd = spraid_get_cmd(hdev, SPRAID_CMD_IOPT);
+
+ if (!pt_cmd) {
+ dev_err(hdev->dev, "err, get ioq cmd failed\n");
+ return -EFAULT;
+ }
+
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
+
+ init_completion(&pt_cmd->cmd_done);
+
+ ioq = &hdev->queues[pt_cmd->qid];
+ ret = pt_cmd->cid * SCSI_SENSE_BUFFERSIZE;
+ sense_addr = ioq->sense + ret;
+ sense_dma = ioq->sense_dma_addr + ret;
+
+ cmd->common.sense_addr = cpu_to_le64(sense_dma);
+ cmd->common.sense_len = cpu_to_le16(SCSI_SENSE_BUFFERSIZE);
+ cmd->common.command_id = pt_cmd->cid;
+
+ spraid_submit_cmd(ioq, cmd);
+
+ if (!wait_for_completion_timeout(&pt_cmd->cmd_done, timeout)) {
+ dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout;"
+ " opcode[0x%x] subopcode[0x%x]\n",
+ __func__, pt_cmd->cid, pt_cmd->qid, cmd->common.opcode,
+ (le32_to_cpu(cmd->common.cdw3[0]) & 0xffff));
+ WRITE_ONCE(pt_cmd->state, SPRAID_CMD_TIMEOUT);
+ spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
+ return -EINVAL;
+ }
+
+ if (result && reslen) {
+ if ((pt_cmd->status & 0x17f) == 0x101) {
+ memcpy(result, sense_addr, SCSI_SENSE_BUFFERSIZE);
+ *reslen = SCSI_SENSE_BUFFERSIZE;
+ }
+ }
+
+ spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
+
+ return pt_cmd->status;
+}
+
+static int spraid_user_ioq_cmd(struct spraid_dev *hdev, struct bsg_job *job)
+{
+ struct spraid_bsg_request *bsg_req =
+ (struct spraid_bsg_request *)(job->request);
+ struct spraid_ioq_passthru_cmd *cmd = &(bsg_req->ioqcmd);
+ struct spraid_ioq_command ioq_cmd;
+ int status = 0;
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
+
+ if (cmd->data_len > PAGE_SIZE) {
+ dev_err(hdev->dev, "[%s] data len bigger than 4k\n", __func__);
+ return -EFAULT;
+ }
+
+ if (hdev->state != SPRAID_LIVE) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not live\n",
+ __func__, hdev->state);
+ return -EBUSY;
+ }
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init;"
+ " datalen[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode, cmd->data_len);
+
+ memset(&ioq_cmd, 0, sizeof(ioq_cmd));
+ ioq_cmd.common.opcode = cmd->opcode;
+ ioq_cmd.common.flags = cmd->flags;
+ ioq_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ ioq_cmd.common.sense_len = cpu_to_le16(cmd->info_0.res_sense_len);
+ ioq_cmd.common.cdb_len = cmd->info_0.cdb_len;
+ ioq_cmd.common.rsvd2 = cmd->info_0.rsvd0;
+ ioq_cmd.common.cdw3[0] = cpu_to_le32(cmd->cdw3);
+ ioq_cmd.common.cdw3[1] = cpu_to_le32(cmd->cdw4);
+ ioq_cmd.common.cdw3[2] = cpu_to_le32(cmd->cdw5);
+
+ ioq_cmd.common.cdw10[0] = cpu_to_le32(cmd->cdw10);
+ ioq_cmd.common.cdw10[1] = cpu_to_le32(cmd->cdw11);
+ ioq_cmd.common.cdw10[2] = cpu_to_le32(cmd->cdw12);
+ ioq_cmd.common.cdw10[3] = cpu_to_le32(cmd->cdw13);
+ ioq_cmd.common.cdw10[4] = cpu_to_le32(cmd->cdw14);
+ ioq_cmd.common.cdw10[5] = cpu_to_le32(cmd->data_len);
+
+ memcpy(ioq_cmd.common.cdb, &cmd->cdw16, cmd->info_0.cdb_len);
+
+ ioq_cmd.common.cdw26[0] = cpu_to_le32(cmd->cdw26[0]);
+ ioq_cmd.common.cdw26[1] = cpu_to_le32(cmd->cdw26[1]);
+ ioq_cmd.common.cdw26[2] = cpu_to_le32(cmd->cdw26[2]);
+ ioq_cmd.common.cdw26[3] = cpu_to_le32(cmd->cdw26[3]);
+
+ status = spraid_bsg_map_data(hdev, job,
+ (struct spraid_admin_command *)&ioq_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
+ }
+
+ status = spraid_submit_ioq_sync_cmd(hdev, &ioq_cmd, job->reply,
+ &job->reply_len, timeout);
+
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x], status[0x%x];"
+ " reply_len[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode,
+ status, job->reply_len);
+
+ spraid_bsg_unmap_data(hdev, job);
+
+ return status;
+}
+
+static bool spraid_check_scmd_completed(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_queue *spraidq;
+ u16 hwq, cid;
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ spraidq = &hdev->queues[hwq];
+ if (READ_ONCE(iod->state) == SPRAID_CMD_COMPLETE
+ || spraid_poll_cq(spraidq, cid)) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] has been completed\n",
+ cid, spraidq->qid);
+ return true;
+ }
+ return false;
+}
+
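+/*
+ * blk-mq timeout hook: if the command is still in flight, atomically
+ * claim it as timed out and return BLK_EH_DONE so SCSI error handling
+ * (the abort/reset escalation below) takes over; otherwise restart the
+ * timer.
+ */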
+static enum blk_eh_timer_return spraid_scmd_timeout(struct scsi_cmnd *scmd)
+{
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ unsigned int timeout = scmd->device->request_queue->rq_timeout;
+
+ if (spraid_check_scmd_completed(scmd))
+ goto out;
+
+ if (time_after(jiffies, scmd->jiffies_at_alloc + timeout)) {
+ if (cmpxchg(&iod->state,
+ SPRAID_CMD_IN_FLIGHT,
+ SPRAID_CMD_TIMEOUT) == SPRAID_CMD_IN_FLIGHT) {
+ return BLK_EH_DONE;
+ }
+ }
+out:
+ return BLK_EH_RESET_TIMER;
+}
+
+/* send abort command via the admin queue for now */
+static int spraid_send_abort_cmd(struct spraid_dev *hdev,
+ u32 hdid, u16 qid, u16 cid)
+{
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.abort.opcode = SPRAID_ADMIN_ABORT_CMD;
+ admin_cmd.abort.hdid = cpu_to_le32(hdid);
+ admin_cmd.abort.sqid = cpu_to_le16(qid);
+ admin_cmd.abort.cid = cpu_to_le16(cid);
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
+/* send reset command via the admin queue for now */
+static int spraid_send_reset_cmd(struct spraid_dev *hdev, int type, u32 hdid)
+{
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.reset.opcode = SPRAID_ADMIN_RESET;
+ admin_cmd.reset.hdid = cpu_to_le32(hdid);
+ admin_cmd.reset.type = type;
+
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+}
+
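+/*
+ * Host state machine: only NEW/RESETTING->LIVE, LIVE->RESETTING,
+ * *->DELETING and NEW/LIVE/RESETTING->DEAD are legal transitions.
+ * Returns true if the transition was applied under state_lock.
+ */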
+static bool spraid_change_host_state(struct spraid_dev *hdev,
+ enum spraid_state newstate)
+{
+ unsigned long flags;
+ enum spraid_state oldstate;
+ bool change = false;
+
+ spin_lock_irqsave(&hdev->state_lock, flags);
+
+ oldstate = hdev->state;
+ switch (newstate) {
+ case SPRAID_LIVE:
+ switch (oldstate) {
+ case SPRAID_NEW:
+ case SPRAID_RESETTING:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ case SPRAID_RESETTING:
+ switch (oldstate) {
+ case SPRAID_LIVE:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ case SPRAID_DELETING:
+ if (oldstate != SPRAID_DELETING)
+ change = true;
+ break;
+ case SPRAID_DEAD:
+ switch (oldstate) {
+ case SPRAID_NEW:
+ case SPRAID_LIVE:
+ case SPRAID_RESETTING:
+ change = true;
+ break;
+ default:
+ break;
+ }
+ break;
+ default:
+ break;
+ }
+ if (change)
+ hdev->state = newstate;
+ spin_unlock_irqrestore(&hdev->state_lock, flags);
+
+ dev_info(hdev->dev, "[%s][%d]->[%d], change[%d]\n",
+ __func__, oldstate, newstate, change);
+
+ return change;
+}
+
+static void spraid_back_fault_cqe(struct spraid_queue *ioq,
+ struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct blk_mq_tags *tags;
+ struct scsi_cmnd *scmd;
+ struct spraid_iod *iod;
+ struct request *req;
+
+ tags = hdev->shost->tag_set.tags[ioq->qid - 1];
+ req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
+ if (unlikely(!req || !blk_mq_request_started(req)))
+ return;
+
+ scmd = blk_mq_rq_to_pdu(req);
+ iod = scsi_cmd_priv(scmd);
+
+ set_host_byte(scmd, DID_NO_CONNECT);
+ if (iod->nsge)
+ scsi_dma_unmap(scmd);
+ spraid_free_iod_res(hdev, iod);
+ scmd->scsi_done(scmd);
+ dev_warn(hdev->dev, "Back fault CQE, cid[%d] qid[%d]\n",
+ cqe->cmd_id, ioq->qid);
+}
+
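+/*
+ * Fail back every possibly outstanding request on all hardware queues
+ * with DID_NO_CONNECT so the SCSI midlayer gets its commands back once
+ * the controller is unreachable.
+ */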
+static void spraid_back_all_io(struct spraid_dev *hdev)
+{
+ int i, j;
+ struct spraid_queue *ioq;
+ struct spraid_completion cqe = { 0 };
+
+ scsi_block_requests(hdev->shost);
+
+ for (i = 1; i <= hdev->shost->nr_hw_queues; i++) {
+ ioq = &hdev->queues[i];
+ for (j = 0; j < hdev->shost->can_queue; j++) {
+ cqe.cmd_id = j;
+ spraid_back_fault_cqe(ioq, &cqe);
+ }
+ }
+
+ scsi_unblock_requests(hdev->shost);
+}
+
+static void spraid_dev_disable(struct spraid_dev *hdev, bool shutdown)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u16 start, end;
+ unsigned long timeout = jiffies + 600 * HZ;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ if (shutdown)
+ spraid_shutdown_ctrl(hdev);
+ else
+ spraid_disable_ctrl(hdev);
+ }
+
+ while (!time_after(jiffies, timeout)) {
+ if (!pci_device_is_present(hdev->pdev)) {
+ dev_info(hdev->dev, "[%s] pci_device not present;"
+ " skip wait\n", __func__);
+ break;
+ }
+ if (!spraid_wait_ready(hdev, hdev->cap, false)) {
+ dev_info(hdev->dev,
+ "[%s] wait ready success after reset\n",
+ __func__);
+ break;
+ }
+ dev_info(hdev->dev, "[%s] waiting csts_rdy ready\n", __func__);
+ }
+
+ if (hdev->queue_count == 0) {
+ dev_err(hdev->dev, "[%s] warn, queue has been delete\n",
+ __func__);
+ return;
+ }
+
+ spin_lock_irq(&adminq->cq_lock);
+ spraid_process_cq(adminq, &start, &end, -1);
+ spin_unlock_irq(&adminq->cq_lock);
+ spraid_complete_cqes(adminq, start, end);
+
+ spraid_pci_disable(hdev);
+
+ spraid_back_all_io(hdev);
+}
+
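+/*
+ * Controller reset: disable the device (draining the admin CQ and
+ * failing back outstanding I/O), then re-enable PCI, rebuild the admin
+ * and I/O queues and re-arm AENs; any failure leaves the host in
+ * SPRAID_DEAD.
+ */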
+static void spraid_reset_work(struct work_struct *work)
+{
+ int ret;
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, reset_work);
+
+ if (hdev->state != SPRAID_RESETTING) {
+ dev_err(hdev->dev, "[%s] err, host is not reset state\n",
+ __func__);
+ return;
+ }
+
+ dev_info(hdev->dev, "[%s] enter host reset\n", __func__);
+
+ if (hdev->ctrl_config & SPRAID_CC_ENABLE) {
+ dev_info(hdev->dev, "[%s] start dev_disable\n", __func__);
+ spraid_dev_disable(hdev, false);
+ }
+
+ ret = spraid_pci_enable(hdev);
+ if (ret)
+ goto out;
+
+ ret = spraid_setup_admin_queue(hdev);
+ if (ret)
+ goto pci_disable;
+
+ ret = spraid_setup_io_queues(hdev);
+ if (ret || hdev->online_queues <= hdev->shost->nr_hw_queues)
+ goto pci_disable;
+
+ spraid_change_host_state(hdev, SPRAID_LIVE);
+
+ spraid_send_all_aen(hdev);
+
+ return;
+
+pci_disable:
+ spraid_pci_disable(hdev);
+out:
+ spraid_change_host_state(hdev, SPRAID_DEAD);
+ dev_err(hdev->dev, "[%s] err, host reset failed\n", __func__);
+}
+
+static int spraid_reset_work_sync(struct spraid_dev *hdev)
+{
+ if (!spraid_change_host_state(hdev, SPRAID_RESETTING)) {
+ dev_info(hdev->dev, "[%s] can't change to reset state\n",
+ __func__);
+ return -EBUSY;
+ }
+
+ if (!queue_work(spraid_wq, &hdev->reset_work)) {
+ dev_err(hdev->dev, "[%s] err, host is already in reset state\n",
+ __func__);
+ return -EBUSY;
+ }
+
+ flush_work(&hdev->reset_work);
+ if (hdev->state != SPRAID_LIVE)
+ return -ENODEV;
+
+ return 0;
+}
+
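+/*
+ * Poll in 500 ms steps, bounded by SPRAID_WAIT_ABNL_CMD_TIMEOUT, for a
+ * timed-out command to be completed by the firmware after an abort or
+ * reset has been issued for it.
+ */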
+static int spraid_wait_abnl_cmd_done(struct spraid_iod *iod)
+{
+ u16 times = 0;
+
+ do {
+ if (READ_ONCE(iod->state) == SPRAID_CMD_TMO_COMPLETE)
+ break;
+ msleep(500);
+ times++;
+ } while (times <= SPRAID_WAIT_ABNL_CMD_TIMEOUT);
+
+ /* timed out waiting for the aborted/reset command to complete */
+ if (times > SPRAID_WAIT_ABNL_CMD_TIMEOUT)
+ return -ETIMEDOUT;
+
+ return 0;
+}
+
+static int spraid_abort_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, aborting\n", cid, hwq);
+ ret = spraid_send_abort_cmd(hdev, hostdata->hdid, hwq, cid);
+ if (ret != ADMIN_ERR_TIMEOUT) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed;"
+ " not found\n", cid, hwq);
+ return FAILED;
+ }
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort succ\n", cid, hwq);
+ return SUCCESS;
+ }
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed, timeout\n",
+ cid, hwq);
+ return FAILED;
+}
+
+static int spraid_tgt_reset_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, target reset\n",
+ cid, hwq);
+ ret = spraid_send_reset_cmd(hdev, SPRAID_RESET_TARGET, hostdata->hdid);
+ if (ret == 0) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev,
+ "cid[%d] qid[%d]target reset failed;"
+ " not found\n", cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] target reset success\n",
+ cid, hwq);
+ return SUCCESS;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] ret[%d] target reset failed\n",
+ cid, hwq, ret);
+ return FAILED;
+}
+
+static int spraid_bus_reset_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, bus reset\n", cid, hwq);
+ ret = spraid_send_reset_cmd(hdev, SPRAID_RESET_BUS, hostdata->hdid);
+ if (ret == 0) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev,
+ "cid[%d] qid[%d] bus reset failed;"
+ " not found\n", cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] bus reset succ\n",
+ cid, hwq);
+ return SUCCESS;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] ret[%d] bus reset failed\n",
+ cid, hwq, ret);
+ return FAILED;
+}
+
+static int spraid_shost_reset_handler(struct scsi_cmnd *scmd)
+{
+ u16 hwq, cid;
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+
+ scsi_print_command(scmd);
+ if (hdev->state != SPRAID_LIVE || spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset\n", cid, hwq);
+
+ if (spraid_reset_work_sync(hdev)) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset failed\n",
+ cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset success\n", cid, hwq);
+
+ return SUCCESS;
+}
+
+static pci_ers_result_t spraid_pci_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "enter pci error detect, state:%d\n", state);
+
+ switch (state) {
+ case pci_channel_io_normal:
+ dev_warn(hdev->dev, "channel is normal, do nothing\n");
+
+ return PCI_ERS_RESULT_CAN_RECOVER;
+ case pci_channel_io_frozen:
+ dev_warn(hdev->dev,
+ "channel io frozen, need reset controller\n");
+
+ scsi_block_requests(hdev->shost);
+
+ spraid_change_host_state(hdev, SPRAID_RESETTING);
+
+ return PCI_ERS_RESULT_NEED_RESET;
+ case pci_channel_io_perm_failure:
+ dev_warn(hdev->dev, "channel io failure, request disconnect\n");
+
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+
+ return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t spraid_pci_slot_reset(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "restart after slot reset\n");
+
+ pci_restore_state(pdev);
+
+ if (!queue_work(spraid_wq, &hdev->reset_work)) {
+ dev_err(hdev->dev, "[%s] err, the device is resetting state\n",
+ __func__);
+ return PCI_ERS_RESULT_NONE;
+ }
+
+ flush_work(&hdev->reset_work);
+
+ scsi_unblock_requests(hdev->shost);
+
+ return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void spraid_reset_done(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "enter spraid reset done\n");
+}
+
+static ssize_t csts_pp_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS)
+ & SPRAID_CSTS_PP_MASK);
+ ret >>= SPRAID_CSTS_PP_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_shst_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS)
+ & SPRAID_CSTS_SHST_MASK);
+ ret >>= SPRAID_CSTS_SHST_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_cfs_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev)) {
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS)
+ & SPRAID_CSTS_CFS_MASK);
+ ret >>= SPRAID_CSTS_CFS_SHIFT;
+ }
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t csts_rdy_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ int ret = -1;
+
+ if (pci_device_is_present(hdev->pdev))
+ ret = (readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_RDY);
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t fw_version_show(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(cdev);
+ struct spraid_dev *hdev = shost_priv(shost);
+
+ return snprintf(buf, PAGE_SIZE, "%s\n", hdev->ctrl_info->fr);
+}
+
+static DEVICE_ATTR_RO(csts_pp);
+static DEVICE_ATTR_RO(csts_shst);
+static DEVICE_ATTR_RO(csts_cfs);
+static DEVICE_ATTR_RO(csts_rdy);
+static DEVICE_ATTR_RO(fw_version);
+
+static struct device_attribute *spraid_host_attrs[] = {
+ &dev_attr_csts_pp,
+ &dev_attr_csts_shst,
+ &dev_attr_csts_cfs,
+ &dev_attr_csts_rdy,
+ &dev_attr_fw_version,
+ NULL,
+};
+
+static int spraid_get_vd_info(struct spraid_dev *hdev,
+ struct spraid_vd_info *vd_info, u16 vid)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.usr_cmd.opcode = USR_CMD_READ;
+ admin_cmd.usr_cmd.info_0.subopcode = cpu_to_le16(USR_CMD_VDINFO);
+ admin_cmd.usr_cmd.info_1.data_len = cpu_to_le16(USR_CMD_RDLEN);
+ admin_cmd.usr_cmd.info_1.param_len = cpu_to_le16(VDINFO_PARAM_LEN);
+ admin_cmd.usr_cmd.cdw10 = cpu_to_le32(vid);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(vd_info, data_ptr, sizeof(struct spraid_vd_info));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_get_bgtask(struct spraid_dev *hdev,
+ struct spraid_bgtask *bgtask)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE,
+ &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.usr_cmd.opcode = USR_CMD_READ;
+ admin_cmd.usr_cmd.info_0.subopcode = cpu_to_le16(USR_CMD_BGTASK);
+ admin_cmd.usr_cmd.info_1.data_len = cpu_to_le16(USR_CMD_RDLEN);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(bgtask, data_ptr, sizeof(struct spraid_bgtask));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static ssize_t raid_level_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_sdev_hostdata *hostdata;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr)) {
+ kfree(vd_info);
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+ }
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret)
+ vd_info->rg_level = ARRAY_SIZE(raid_levels) - 1;
+
+ ret = (vd_info->rg_level < ARRAY_SIZE(raid_levels)) ?
+ vd_info->rg_level : (ARRAY_SIZE(raid_levels) - 1);
+
+ kfree(vd_info);
+
+ return snprintf(buf, PAGE_SIZE, "RAID-%s\n", raid_levels[ret]);
+}
+
+static ssize_t raid_state_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_sdev_hostdata *hostdata;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr)) {
+ kfree(vd_info);
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+ }
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret) {
+ vd_info->vd_status = 0;
+ vd_info->rg_id = 0xff;
+ }
+
+ ret = (vd_info->vd_status < ARRAY_SIZE(raid_states)) ?
+ vd_info->vd_status : 0;
+
+ kfree(vd_info);
+
+ return snprintf(buf, PAGE_SIZE, "%s\n", raid_states[ret]);
+}
+
+static ssize_t raid_resync_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_bgtask *bgtask;
+ struct spraid_sdev_hostdata *hostdata;
+ u8 rg_id, i, progress = 0;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr)) {
+ kfree(vd_info);
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+ }
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret)
+ goto out;
+
+ rg_id = vd_info->rg_id;
+
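+ /*
+ * Reuse the vd_info buffer for the background-task query; this
+ * assumes struct spraid_bgtask does not exceed the size of
+ * struct spraid_vd_info.
+ */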
+ bgtask = (struct spraid_bgtask *)vd_info;
+ ret = spraid_get_bgtask(hdev, bgtask);
+ if (ret)
+ goto out;
+ for (i = 0; i < bgtask->task_num; i++) {
+ if ((bgtask->bgtask[i].type == BGTASK_TYPE_REBUILD) &&
+ (le16_to_cpu(bgtask->bgtask[i].vd_id) == rg_id))
+ progress = bgtask->bgtask[i].progress;
+ }
+
+out:
+ kfree(vd_info);
+ return snprintf(buf, PAGE_SIZE, "%d\n", progress);
+}
+
+static DEVICE_ATTR_RO(raid_level);
+static DEVICE_ATTR_RO(raid_state);
+static DEVICE_ATTR_RO(raid_resync);
+
+static struct device_attribute *spraid_dev_attrs[] = {
+ &dev_attr_raid_level,
+ &dev_attr_raid_state,
+ &dev_attr_raid_resync,
+ NULL,
+};
+
+static struct pci_error_handlers spraid_err_handler = {
+ .error_detected = spraid_pci_error_detected,
+ .slot_reset = spraid_pci_slot_reset,
+ .reset_done = spraid_reset_done,
+};
+
+static int spraid_sysfs_host_reset(struct Scsi_Host *shost, int reset_type)
+{
+ int ret;
+ struct spraid_dev *hdev = shost_priv(shost);
+
+ dev_info(hdev->dev, "[%s] start sysfs host reset cmd\n", __func__);
+ ret = spraid_reset_work_sync(hdev);
+ dev_info(hdev->dev, "[%s] stop sysfs host reset cmd[%d]\n",
+ __func__, ret);
+
+ return ret;
+}
+
+static struct scsi_host_template spraid_driver_template = {
+ .module = THIS_MODULE,
+ .name = "Ramaxel Logic spraid driver",
+ .proc_name = "spraid",
+ .queuecommand = spraid_queue_command,
+ .slave_alloc = spraid_slave_alloc,
+ .slave_destroy = spraid_slave_destroy,
+ .slave_configure = spraid_slave_configure,
+ .eh_timed_out = spraid_scmd_timeout,
+ .eh_abort_handler = spraid_abort_handler,
+ .eh_target_reset_handler = spraid_tgt_reset_handler,
+ .eh_bus_reset_handler = spraid_bus_reset_handler,
+ .eh_host_reset_handler = spraid_shost_reset_handler,
+ .change_queue_depth = scsi_change_queue_depth,
+ .this_id = -1,
+ .shost_attrs = spraid_host_attrs,
+ .sdev_attrs = spraid_dev_attrs,
+ .host_reset = spraid_sysfs_host_reset,
+};
+
+static void spraid_shutdown(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ spraid_remove_io_queues(hdev);
+ spraid_disable_admin_queue(hdev, true);
+}
+
+/* bsg dispatch user command */
+static int spraid_bsg_host_dispatch(struct bsg_job *job)
+{
+ struct Scsi_Host *shost = dev_to_shost(job->dev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_bsg_request *bsg_req = job->request;
+ int ret = 0;
+
+ dev_info(hdev->dev, "[%s] msgcode[%d], msglen[%d], timeout[%d];"
+ " req_nsge[%d], req_len[%d]\n",
+ __func__, bsg_req->msgcode, job->request_len,
+ rq->timeout, job->request_payload.sg_cnt,
+ job->request_payload.payload_len);
+
+ job->reply_len = 0;
+
+ switch (bsg_req->msgcode) {
+ case SPRAID_BSG_ADM:
+ ret = spraid_user_admin_cmd(hdev, job);
+ break;
+ case SPRAID_BSG_IOQ:
+ ret = spraid_user_ioq_cmd(hdev, job);
+ break;
+ default:
+ dev_info(hdev->dev, "[%s] unsupport msgcode[%d]\n",
+ __func__, bsg_req->msgcode);
+ break;
+ }
+
+ if (ret > 0)
+ ret = ret | (ret << 8);
+
+ bsg_job_done(job, ret, 0);
+ return 0;
+}
+
+static inline void spraid_remove_bsg(struct spraid_dev *hdev)
+{
+ if (hdev->bsg_queue) {
+ bsg_unregister_queue(hdev->bsg_queue);
+ blk_cleanup_queue(hdev->bsg_queue);
+ }
+}
+
+static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct spraid_dev *hdev;
+ struct Scsi_Host *shost;
+ int node, ret;
+ char bsg_name[15];
+
+ shost = scsi_host_alloc(&spraid_driver_template, sizeof(*hdev));
+ if (!shost) {
+ dev_err(&pdev->dev, "Failed to allocate scsi host\n");
+ return -ENOMEM;
+ }
+ hdev = shost_priv(shost);
+ hdev->pdev = pdev;
+ hdev->dev = get_device(&pdev->dev);
+
+ node = dev_to_node(hdev->dev);
+ if (node == NUMA_NO_NODE) {
+ node = first_memory_node;
+ set_dev_node(hdev->dev, node);
+ }
+ hdev->numa_node = node;
+ hdev->shost = shost;
+ pci_set_drvdata(pdev, hdev);
+
+ ret = spraid_dev_map(hdev);
+ if (ret)
+ goto put_dev;
+
+ init_rwsem(&hdev->devices_rwsem);
+ INIT_WORK(&hdev->scan_work, spraid_scan_work);
+ INIT_WORK(&hdev->timesyn_work, spraid_timesyn_work);
+ INIT_WORK(&hdev->reset_work, spraid_reset_work);
+ INIT_WORK(&hdev->fw_act_work, spraid_fw_act_work);
+ spin_lock_init(&hdev->state_lock);
+
+ ret = spraid_alloc_resources(hdev);
+ if (ret)
+ goto dev_unmap;
+
+ ret = spraid_pci_enable(hdev);
+ if (ret)
+ goto resources_free;
+
+ ret = spraid_setup_admin_queue(hdev);
+ if (ret)
+ goto pci_disable;
+
+ ret = spraid_init_ctrl_info(hdev);
+ if (ret)
+ goto disable_admin_q;
+
+ ret = spraid_alloc_iod_ext_mem_pool(hdev);
+ if (ret)
+ goto disable_admin_q;
+
+ ret = spraid_setup_io_queues(hdev);
+ if (ret)
+ goto free_iod_mempool;
+
+ spraid_shost_init(hdev);
+
+ ret = scsi_add_host(hdev->shost, hdev->dev);
+ if (ret) {
+ dev_err(hdev->dev, "Add shost to system failed, ret: %d\n",
+ ret);
+ goto remove_io_queues;
+ }
+
+ snprintf(bsg_name, sizeof(bsg_name), "spraid%d", shost->host_no);
+ hdev->bsg_queue = bsg_setup_queue(&shost->shost_gendev, bsg_name,
+ spraid_bsg_host_dispatch,
+ spraid_cmd_size(hdev, true, false));
+ if (IS_ERR(hdev->bsg_queue)) {
+ dev_err(hdev->dev, "err, setup bsg failed\n");
+ hdev->bsg_queue = NULL;
+ goto remove_io_queues;
+ }
+
+ if (hdev->online_queues == SPRAID_ADMIN_QUEUE_NUM) {
+ dev_warn(hdev->dev, "warn only admin queue can be used\n");
+ return 0;
+ }
+
+ hdev->state = SPRAID_LIVE;
+
+ spraid_send_all_aen(hdev);
+
+ ret = spraid_dev_list_init(hdev);
+ if (ret)
+ goto remove_bsg;
+
+ ret = spraid_configure_timestamp(hdev);
+ if (ret)
+ dev_warn(hdev->dev, "init set timestamp failed\n");
+
+ ret = spraid_alloc_ioq_ptcmds(hdev);
+ if (ret)
+ goto remove_bsg;
+
+ scsi_scan_host(hdev->shost);
+
+ return 0;
+
+remove_bsg:
+ spraid_remove_bsg(hdev);
+remove_io_queues:
+ spraid_remove_io_queues(hdev);
+free_iod_mempool:
+ spraid_free_iod_ext_mem_pool(hdev);
+disable_admin_q:
+ spraid_disable_admin_queue(hdev, false);
+pci_disable:
+ spraid_pci_disable(hdev);
+resources_free:
+ spraid_free_resources(hdev);
+dev_unmap:
+ spraid_dev_unmap(hdev);
+put_dev:
+ put_device(hdev->dev);
+ scsi_host_put(shost);
+
+ return -ENODEV;
+}
+
+static void spraid_remove(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+ struct Scsi_Host *shost = hdev->shost;
+
+ dev_info(hdev->dev, "enter spraid remove\n");
+
+ spraid_change_host_state(hdev, SPRAID_DELETING);
+ flush_work(&hdev->reset_work);
+
+ if (!pci_device_is_present(pdev))
+ spraid_back_all_io(hdev);
+
+ spraid_remove_bsg(hdev);
+ scsi_remove_host(shost);
+ spraid_free_ioq_ptcmds(hdev);
+ kfree(hdev->devices);
+ spraid_remove_io_queues(hdev);
+ spraid_free_iod_ext_mem_pool(hdev);
+ spraid_disable_admin_queue(hdev, false);
+ spraid_pci_disable(hdev);
+ spraid_free_resources(hdev);
+ spraid_dev_unmap(hdev);
+ put_device(hdev->dev);
+ scsi_host_put(shost);
+
+ dev_info(hdev->dev, "exit spraid remove\n");
+}
+
+static const struct pci_device_id spraid_id_table[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC,
+ SPRAID_SERVER_DEVICE_HBA_DID) },
+ { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC,
+ SPRAID_SERVER_DEVICE_RAID_DID) },
+ { 0, }
+};
+MODULE_DEVICE_TABLE(pci, spraid_id_table);
+
+static struct pci_driver spraid_driver = {
+ .name = "spraid",
+ .id_table = spraid_id_table,
+ .probe = spraid_probe,
+ .remove = spraid_remove,
+ .shutdown = spraid_shutdown,
+ .err_handler = &spraid_err_handler,
+};
+
+static int __init spraid_init(void)
+{
+ int ret;
+
+ spraid_wq = alloc_workqueue("spraid-wq", WQ_UNBOUND | WQ_MEM_RECLAIM |
+ WQ_SYSFS, 0);
+ if (!spraid_wq)
+ return -ENOMEM;
+
+ spraid_class = class_create(THIS_MODULE, "spraid");
+ if (IS_ERR(spraid_class)) {
+ ret = PTR_ERR(spraid_class);
+ goto destroy_wq;
+ }
+
+ ret = pci_register_driver(&spraid_driver);
+ if (ret < 0)
+ goto destroy_class;
+
+ return 0;
+
+destroy_class:
+ class_destroy(spraid_class);
+destroy_wq:
+ destroy_workqueue(spraid_wq);
+
+ return ret;
+}
+
+static void __exit spraid_exit(void)
+{
+ pci_unregister_driver(&spraid_driver);
+ class_destroy(spraid_class);
+ destroy_workqueue(spraid_wq);
+ ida_destroy(&spraid_instance_ida);
+}
+
+MODULE_AUTHOR("songyl(a)ramaxel.com");
+MODULE_DESCRIPTION("Ramaxel Memory Technology SPraid Driver");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(SPRAID_DRV_VERSION);
+module_init(spraid_init);
+module_exit(spraid_exit);
--
2.27.0
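
A note on the error handling above: spraid_probe() acquires its resources
in a fixed order and, on failure, unwinds only the steps already completed,
in reverse, through the chain of goto labels (remove_bsg, remove_io_queues,
..., put_dev). Below is a minimal user-space sketch of that unwind idiom;
the resource names are stand-ins, not the driver's real API.

#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for the driver's resources (bar mapping, queues, pools). */
static void *acquire(const char *name, int fail)
{
	if (fail) {
		printf("acquire %s: failed\n", name);
		return NULL;
	}
	printf("acquire %s\n", name);
	return malloc(1);
}

static void release(const char *name, void *res)
{
	printf("release %s\n", name);
	free(res);
}

static int probe(int fail_at)
{
	void *bar, *queues, *pool;

	bar = acquire("bar mapping", fail_at == 1);
	if (!bar)
		goto out;
	queues = acquire("queues", fail_at == 2);
	if (!queues)
		goto unmap_bar;
	pool = acquire("dma pool", fail_at == 3);
	if (!pool)
		goto free_queues;

	return 0;	/* success: resources stay held until remove() */

free_queues:
	release("queues", queues);
unmap_bar:
	release("bar mapping", bar);
out:
	return -1;
}

int main(void)
{
	/* Fail at the third step: only the first two get unwound. */
	return probe(3) == -1 ? 0 : 1;
}

Each label releases exactly one step and falls through to the labels below
it, so every failure point jumps to the first label matching what has been
acquired so far.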

[PATCH openEuler-1.0-LTS 1/2] blk-mq: fix abnormal free in single queue process
by Yang Yingliang 20 Dec '21
From: Zhang Wensheng <zhangwensheng5(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 185922, https://gitee.com/openeuler/kernel/issues/I4NDPF
CVE: NA
--------------------------------
Fix the problem introduced by commit 80086651acbd ("blk: Fix
lock inversion between ioc lock and bfqd lock"), which led
blk_queue_bio() to free requests abnormally via
blk_mq_free_request(). Fix it by using __blk_put_request()
instead.
Fixes: 80086651acbdb ("blk: Fix lock inversion between ioc lock and bfqd lock")
Signed-off-by: Zhang Wensheng <zhangwensheng5(a)huawei.com>
Reviewed-by: Laibin Qiu <qiulaibin(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
block/blk-mq.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-mq.h b/block/blk-mq.h
index bb8aabbc5c579..b3540d62c541e 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -228,7 +228,7 @@ static inline void blk_mq_free_requests(struct list_head *list)
struct request *rq = list_entry_rq(list->next);
list_del_init(&rq->queuelist);
- blk_mq_free_request(rq);
+ __blk_put_request(rq->q, rq);
}
}
--
2.25.1
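
The inversion at issue is the classic ABBA pattern: one code path takes
lock A then lock B while another takes B then A, so two threads can each
hold one lock and wait forever on the other. Below is a small pthread
sketch of the single-ordering rule that avoids it; the mutexes are
stand-ins for the ioc and bfqd locks, not kernel code.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

/* Safe: every path acquires lock_a before lock_b. The bug pattern is one
 * path doing b-then-a while another does a-then-b. */
static void *worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++) {
		pthread_mutex_lock(&lock_a);
		pthread_mutex_lock(&lock_b);
		/* critical section touching both structures */
		pthread_mutex_unlock(&lock_b);
		pthread_mutex_unlock(&lock_a);
	}
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, worker, NULL);
	pthread_create(&t2, NULL, worker, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	puts("done without deadlock");
	return 0;
}

As the commit message says, requests on the legacy blk_queue_bio path must
be put back with __blk_put_request() rather than blk_mq_free_request().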

[PATCH openEuler-1.0-LTS] scsi: hisi_sas: Add support for sata disk I/O errors report to libsas
by Yang Yingliang 20 Dec '21
From: Xingui Yang <yangxingui(a)huawei.com>
driver inclusion
category: feature
bugzilla: NA
CVE: NA
---------------------------
If the response frame has been written back to memory and carries the
disk error and status when a SATA I/O completes abnormally, then set the
task stat to SAS_PROTO_RESPONSE and let libsas handle it.
Signed-off-by: Xingui Yang <yangxingui(a)huawei.com>
Reviewed-by: Kangfenglong <kangfenglong(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 4508c4a2f02fc..3c662fa5f6c0c 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -470,6 +470,9 @@ struct hisi_sas_err_record_v3 {
#define RX_DATA_LEN_UNDERFLOW_OFF 6
#define RX_DATA_LEN_UNDERFLOW_MSK (1 << RX_DATA_LEN_UNDERFLOW_OFF)
+#define RX_FIS_STATUS_ERR_OFF 0
+#define RX_FIS_STATUS_ERR_MSK (1 << RX_FIS_STATUS_ERR_OFF)
+
#define HISI_SAS_COMMAND_ENTRIES_V3_HW 4096
#define HISI_SAS_MSI_COUNT_V3_HW 32
@@ -2241,17 +2244,20 @@ slot_err_v3_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
hisi_sas_status_buf_addr_mem(slot);
u32 dma_rx_err_type = record->dma_rx_err_type;
u32 trans_tx_fail_type = record->trans_tx_fail_type;
+ u16 sipc_rx_err_type = le16_to_cpu(record->sipc_rx_err_type);
+ u32 dw0 = le32_to_cpu(complete_hdr->dw0);
+ u32 dw3 = le32_to_cpu(complete_hdr->dw3);
switch (task->task_proto) {
case SAS_PROTOCOL_SSP:
if (dma_rx_err_type & RX_DATA_LEN_UNDERFLOW_MSK) {
ts->residual = trans_tx_fail_type;
ts->stat = SAS_DATA_UNDERRUN;
- if ((!(complete_hdr->dw0 & CMPLT_HDR_RSPNS_GOOD_MSK)) &&
- (complete_hdr->dw0 & CMPLT_HDR_RSPNS_XFRD_MSK)) {
+ if (!(dw0 & CMPLT_HDR_RSPNS_GOOD_MSK) &&
+ (dw0 & CMPLT_HDR_RSPNS_XFRD_MSK)) {
hisi_sas_set_sense_data(task, slot);
}
- } else if (complete_hdr->dw3 & CMPLT_HDR_IO_IN_TARGET_MSK) {
+ } else if (dw3 & CMPLT_HDR_IO_IN_TARGET_MSK) {
ts->stat = SAS_QUEUE_FULL;
slot->abort = 1;
} else {
@@ -2262,7 +2268,10 @@ slot_err_v3_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
case SAS_PROTOCOL_SATA:
case SAS_PROTOCOL_STP:
case SAS_PROTOCOL_SATA | SAS_PROTOCOL_STP:
- if (complete_hdr->dw3 & CMPLT_HDR_IO_IN_TARGET_MSK) {
+ if ((dw0 & CMPLT_HDR_RSPNS_XFRD_MSK) &&
+ (sipc_rx_err_type & RX_FIS_STATUS_ERR_MSK)) {
+ ts->stat = SAS_PROTO_RESPONSE;
+ } else if (dw3 & CMPLT_HDR_IO_IN_TARGET_MSK) {
ts->stat = SAS_PHY_DOWN;
slot->abort = 1;
} else if (dma_rx_err_type & RX_DATA_LEN_UNDERFLOW_MSK) {
--
2.25.1
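
The new SATA branch reports SAS_PROTO_RESPONSE only when two bits agree:
the completion header says the response FIS was transferred back to memory
(CMPLT_HDR_RSPNS_XFRD) and the received FIS status carries a device error
(RX_FIS_STATUS_ERR). A minimal sketch of that two-condition gate follows;
the XFRD bit offset is invented for illustration, only the
RX_FIS_STATUS_ERR offset comes from the patch.

#include <stdio.h>
#include <stdint.h>

#define CMPLT_HDR_RSPNS_XFRD_OFF 10	/* illustrative offset */
#define CMPLT_HDR_RSPNS_XFRD_MSK (1u << CMPLT_HDR_RSPNS_XFRD_OFF)
#define RX_FIS_STATUS_ERR_OFF 0	/* as in the patch */
#define RX_FIS_STATUS_ERR_MSK (1u << RX_FIS_STATUS_ERR_OFF)

/* 1 when the completion should become SAS_PROTO_RESPONSE: the FIS made it
 * back to memory *and* it carries a device error. */
static int sata_error_fis_received(uint32_t dw0, uint16_t sipc_rx_err_type)
{
	return (dw0 & CMPLT_HDR_RSPNS_XFRD_MSK) &&
	       (sipc_rx_err_type & RX_FIS_STATUS_ERR_MSK);
}

int main(void)
{
	printf("%d\n", sata_error_fis_received(CMPLT_HDR_RSPNS_XFRD_MSK, 0x0001)); /* 1 */
	printf("%d\n", sata_error_fis_received(0, 0x0001));                        /* 0 */
	return 0;
}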

20 Dec '21
Ramaxel inclusion
category: features
bugzilla: https://gitee.com/openeuler/kernel/issues/I4JXCG
CVE: NA
Changes from v1:
1. Add more debug info.
2. Remove some unnecessary module parameters.
3. Report disks in order of channel/target id.
4. Add host_reset handler.
5. Use get_unaligned_be24.
Signed-off-by: Yanling Song <songyl(a)ramaxel.com>
Reviewed-by: Jiang Yu<yujiang(a)ramaxel.com>
---
drivers/scsi/spraid/Kconfig | 6 +-
drivers/scsi/spraid/spraid.h | 142 ++-
drivers/scsi/spraid/spraid_main.c | 1571 +++++++++++++++--------------
3 files changed, 938 insertions(+), 781 deletions(-)
diff --git a/drivers/scsi/spraid/Kconfig b/drivers/scsi/spraid/Kconfig
index 83962efaab07..bfbba3db8db0 100644
--- a/drivers/scsi/spraid/Kconfig
+++ b/drivers/scsi/spraid/Kconfig
@@ -5,7 +5,9 @@
config RAMAXEL_SPRAID
tristate "Ramaxel spraid Adapter"
depends on PCI && SCSI
+ select BLK_DEV_BSGLIB
depends on ARM64 || X86_64
- default m
help
- This driver supports Ramaxel spraid driver.
+ This driver supports the Ramaxel SPRxxx series
+ RAID controllers, which have a PCIe Gen4 host
+ interface and support SAS/SATA HDDs and SSDs.
diff --git a/drivers/scsi/spraid/spraid.h b/drivers/scsi/spraid/spraid.h
index da46d8e1b4b6..983d7af2faa8 100644
--- a/drivers/scsi/spraid/spraid.h
+++ b/drivers/scsi/spraid/spraid.h
@@ -1,4 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
#ifndef __SPRAID_H_
#define __SPRAID_H_
@@ -24,7 +25,7 @@
#define SENSE_SIZE(depth) ((depth) * SCSI_SENSE_BUFFERSIZE)
#define SPRAID_AQ_DEPTH 128
-#define SPRAID_NR_AEN_COMMANDS 1
+#define SPRAID_NR_AEN_COMMANDS 16
#define SPRAID_AQ_BLK_MQ_DEPTH (SPRAID_AQ_DEPTH - SPRAID_NR_AEN_COMMANDS)
#define SPRAID_AQ_MQ_TAG_DEPTH (SPRAID_AQ_BLK_MQ_DEPTH - 1)
@@ -44,7 +45,7 @@
#define SMALL_POOL_SIZE 256
#define MAX_SMALL_POOL_NUM 16
-#define MAX_CMD_PER_DEV 32
+#define MAX_CMD_PER_DEV 64
#define MAX_CDB_LEN 32
#define SPRAID_UP_TO_MULTY4(x) (((x) + 4) & (~0x03))
@@ -53,7 +54,7 @@
#define PCI_VENDOR_ID_RAMAXEL_LOGIC 0x1E81
-#define SPRAID_SERVER_DEVICE_HAB_DID 0x2100
+#define SPRAID_SERVER_DEVICE_HBA_DID 0x2100
#define SPRAID_SERVER_DEVICE_RAID_DID 0x2200
#define IO_6_DEFAULT_TX_LEN 256
@@ -142,11 +143,15 @@ enum {
enum {
SPRAID_AEN_DEV_CHANGED = 0x00,
+ SPRAID_AEN_FW_ACT_START = 0x01,
SPRAID_AEN_HOST_PROBING = 0x10,
};
enum {
- SPRAID_AEN_TIMESYN = 0x07
+ SPRAID_AEN_TIMESYN = 0x00,
+ SPRAID_AEN_FW_ACT_FINISH = 0x02,
+ SPRAID_AEN_EVENT_MIN = 0x80,
+ SPRAID_AEN_EVENT_MAX = 0xff,
};
enum {
@@ -175,6 +180,16 @@ enum spraid_state {
SPRAID_DEAD,
};
+enum {
+ SPRAID_CARD_HBA,
+ SPRAID_CARD_RAID,
+};
+
+enum spraid_cmd_type {
+ SPRAID_CMD_ADM,
+ SPRAID_CMD_IOPT,
+};
+
struct spraid_completion {
__le32 result;
union {
@@ -217,8 +232,6 @@ struct spraid_dev {
struct dma_pool *prp_page_pool;
struct dma_pool *prp_small_pool[MAX_SMALL_POOL_NUM];
mempool_t *iod_mempool;
- struct blk_mq_tag_set admin_tagset;
- struct request_queue *admin_q;
void __iomem *bar;
u32 max_qid;
u32 num_vecs;
@@ -232,23 +245,27 @@ struct spraid_dev {
u32 ctrl_config;
u32 online_queues;
u64 cap;
- struct device ctrl_device;
- struct cdev cdev;
int instance;
struct spraid_ctrl_info *ctrl_info;
struct spraid_dev_info *devices;
- struct spraid_ioq_ptcmd *ioq_ptcmds;
+ struct spraid_cmd *adm_cmds;
+ struct list_head adm_cmd_list;
+ spinlock_t adm_cmd_lock;
+
+ struct spraid_cmd *ioq_ptcmds;
struct list_head ioq_pt_list;
spinlock_t ioq_pt_lock;
- struct work_struct aen_work;
struct work_struct scan_work;
struct work_struct timesyn_work;
struct work_struct reset_work;
+ struct work_struct fw_act_work;
enum spraid_state state;
spinlock_t state_lock;
+
+ struct request_queue *bsg_queue;
};
struct spraid_sgl_desc {
@@ -347,6 +364,35 @@ struct spraid_get_info {
__u32 rsvd12[4];
};
+struct spraid_usr_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ union {
+ struct {
+ __le16 subopcode;
+ __le16 rsvd1;
+ } info_0;
+ __le32 cdw2;
+ };
+ union {
+ struct {
+ __le16 data_len;
+ __le16 param_len;
+ } info_1;
+ __le32 cdw3;
+ };
+ __u64 metadata;
+ union spraid_data_ptr dptr;
+ __le32 cdw10;
+ __le32 cdw11;
+ __le32 cdw12;
+ __le32 cdw13;
+ __le32 cdw14;
+ __le32 cdw15;
+};
+
enum {
SPRAID_CMD_FLAG_SGL_METABUF = (1 << 6),
SPRAID_CMD_FLAG_SGL_METASEG = (1 << 7),
@@ -393,6 +439,7 @@ struct spraid_admin_command {
struct spraid_get_info get_info;
struct spraid_abort_cmd abort;
struct spraid_reset_cmd reset;
+ struct spraid_usr_cmd usr_cmd;
};
};
@@ -456,9 +503,6 @@ struct spraid_ioq_command {
};
};
-#define SPRAID_IOCTL_RESET_CMD _IOWR('N', 0x80, struct spraid_passthru_common_cmd)
-#define SPRAID_IOCTL_ADMIN_CMD _IOWR('N', 0x41, struct spraid_passthru_common_cmd)
-
struct spraid_passthru_common_cmd {
__u8 opcode;
__u8 flags;
@@ -494,8 +538,6 @@ struct spraid_passthru_common_cmd {
__u32 result1;
};
-#define SPRAID_IOCTL_IOQ_CMD _IOWR('N', 0x42, struct spraid_ioq_passthru_cmd)
-
struct spraid_ioq_passthru_cmd {
__u8 opcode;
__u8 flags;
@@ -560,7 +602,21 @@ struct spraid_ioq_passthru_cmd {
__u32 result1;
};
-struct spraid_ioq_ptcmd {
+struct spraid_bsg_request {
+ u32 msgcode;
+ u32 control;
+ union {
+ struct spraid_passthru_common_cmd admcmd;
+ struct spraid_ioq_passthru_cmd ioqcmd;
+ };
+};
+
+enum {
+ SPRAID_BSG_ADM,
+ SPRAID_BSG_IOQ,
+};
+
+struct spraid_cmd {
int qid;
int cid;
u32 result0;
@@ -572,14 +628,6 @@ struct spraid_ioq_ptcmd {
struct list_head list;
};
-struct spraid_admin_request {
- struct spraid_admin_command *cmd;
- u32 result0;
- u32 result1;
- u16 flags;
- u16 status;
-};
-
struct spraid_queue {
struct spraid_dev *hdev;
spinlock_t sq_lock; /* spinlock for lock handling */
@@ -607,7 +655,6 @@ struct spraid_queue {
};
struct spraid_iod {
- struct spraid_admin_request req;
struct spraid_queue *spraidq;
enum spraid_cmd_state state;
int npages;
@@ -623,13 +670,51 @@ struct spraid_iod {
};
#define SPRAID_DEV_INFO_ATTR_BOOT(attr) ((attr) & 0x01)
-#define SPRAID_DEV_INFO_ATTR_HDD(attr) ((attr) & 0x02)
+#define SPRAID_DEV_INFO_ATTR_VD(attr) (((attr) & 0x02) == 0x0)
#define SPRAID_DEV_INFO_ATTR_PT(attr) (((attr) & 0x22) == 0x02)
#define SPRAID_DEV_INFO_ATTR_RAWDISK(attr) ((attr) & 0x20)
#define SPRAID_DEV_INFO_FLAG_VALID(flag) ((flag) & 0x01)
#define SPRAID_DEV_INFO_FLAG_CHANGE(flag) ((flag) & 0x02)
+#define BGTASK_TYPE_REBUILD 4
+#define USR_CMD_READ 0xc2
+#define USR_CMD_RDLEN 0x1000
+#define USR_CMD_VDINFO 0x704
+#define USR_CMD_BGTASK 0x504
+#define VDINFO_PARAM_LEN 0x04
+
+struct spraid_vd_info {
+ __u8 name[32];
+ __le16 id;
+ __u8 rg_id;
+ __u8 rg_level;
+ __u8 sg_num;
+ __u8 sg_disk_num;
+ __u8 vd_status;
+ __u8 vd_type;
+ __u8 rsvd1[4056];
+};
+
+#define MAX_REALTIME_BGTASK_NUM 32
+
+struct bgtask_info {
+ __u8 type;
+ __u8 progress;
+ __u8 rate;
+ __u8 rsvd0;
+ __le16 vd_id;
+ __le16 time_left;
+ __u8 rsvd1[4];
+};
+
+struct spraid_bgtask {
+ __u8 sw;
+ __u8 task_num;
+ __u8 rsvd[6];
+ struct bgtask_info bgtask[MAX_REALTIME_BGTASK_NUM];
+};
+
struct spraid_dev_info {
__le32 hdid;
__le16 target;
@@ -649,6 +734,11 @@ struct spraid_dev_list {
struct spraid_sdev_hostdata {
u32 hdid;
+ u16 max_io_kb;
+ u8 attr;
+ u8 flag;
+ u8 rg_id;
+ u8 rsvd[3];
};
#endif
diff --git a/drivers/scsi/spraid/spraid_main.c b/drivers/scsi/spraid/spraid_main.c
index a0a75ecb0027..976b7ded336a 100644
--- a/drivers/scsi/spraid/spraid_main.c
+++ b/drivers/scsi/spraid/spraid_main.c
@@ -1,8 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
-/*
- * Linux spraid device driver
- * Copyright(c) 2021 Ramaxel Memory Technology, Ltd
- */
+/* Copyright(c) 2021 Ramaxel Memory Technology, Ltd */
+
+/* Ramaxel Raid SPXXX Series Linux Driver */
+
#define pr_fmt(fmt) "spraid: " fmt
#include <linux/sched/signal.h>
@@ -23,6 +23,9 @@
#include <linux/debugfs.h>
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/blkdev.h>
+#include <linux/bsg-lib.h>
+#include <asm/unaligned.h>
+#include <linux/sort.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
@@ -31,28 +34,17 @@
#include <scsi/scsi_transport.h>
#include <scsi/scsi_dbg.h>
+
#include "spraid.h"
static u32 admin_tmout = 60;
module_param(admin_tmout, uint, 0644);
MODULE_PARM_DESC(admin_tmout, "admin commands timeout (seconds)");
-static u32 scmd_tmout_pt = 30;
-module_param(scmd_tmout_pt, uint, 0644);
-MODULE_PARM_DESC(scmd_tmout_pt, "scsi commands timeout for passthrough(seconds)");
-
static u32 scmd_tmout_nonpt = 180;
module_param(scmd_tmout_nonpt, uint, 0644);
MODULE_PARM_DESC(scmd_tmout_nonpt, "scsi commands timeout for rawdisk&raid(seconds)");
-static u32 wait_abl_tmout = 3;
-module_param(wait_abl_tmout, uint, 0644);
-MODULE_PARM_DESC(wait_abl_tmout, "wait abnormal io timeout(seconds)");
-
-static bool use_sgl_force;
-module_param(use_sgl_force, bool, 0644);
-MODULE_PARM_DESC(use_sgl_force, "force IO use sgl format, default false");
-
static int ioq_depth_set(const char *val, const struct kernel_param *kp);
static const struct kernel_param_ops ioq_depth_ops = {
.set = ioq_depth_set,
@@ -106,16 +98,20 @@ static const struct kernel_param_ops small_pool_num_ops = {
.get = param_get_byte,
};
+/* It was found that the spinlock of a single pool is heavily
+ * contended when multiple CPUs allocate from it, so multiple
+ * pools are introduced to reduce the contention.
+ */
static unsigned char small_pool_num = 4;
module_param_cb(small_pool_num, &small_pool_num_ops, &small_pool_num, 0644);
MODULE_PARM_DESC(small_pool_num, "set prp small pool num, default 4, MAX 16");
static void spraid_free_queue(struct spraid_queue *spraidq);
static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result);
-static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result);
+static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result, u32 result1);
static DEFINE_IDA(spraid_instance_ida);
-static dev_t spraid_chr_devt;
+
static struct class *spraid_class;
#define SPRAID_CAP_TIMEOUT_UNIT_MS (HZ / 2)
@@ -133,7 +129,7 @@ static struct workqueue_struct *spraid_wq;
#define ADMIN_TIMEOUT (admin_tmout * HZ)
#define ADMIN_ERR_TIMEOUT 32757
-#define SPRAID_WAIT_ABNL_CMD_TIMEOUT (wait_abl_tmout * 2)
+#define SPRAID_WAIT_ABNL_CMD_TIMEOUT (3 * 2)
#define SPRAID_DMA_MSK_BIT_MAX 64
@@ -147,6 +143,13 @@ enum FW_STAT_CODE {
FW_STAT_NEED_RETRY
};
+static const char * const raid_levels[] = {"0", "1", "5", "6", "10", "50", "60", "NA"};
+
+static const char * const raid_states[] = {
+ "NA", "NORMAL", "FAULT", "DEGRADE", "NOT_FORMATTED", "FORMATTING", "SANITIZING",
+ "INITIALIZING", "INITIALIZE_FAIL", "DELETING", "DELETE_FAIL", "WRITE_PROTECT"
+};
+
static int ioq_depth_set(const char *val, const struct kernel_param *kp)
{
int n = 0;
@@ -263,12 +266,6 @@ static int spraid_pci_enable(struct spraid_dev *hdev)
return ret;
}
-static inline
-struct spraid_admin_request *spraid_admin_req(struct request *req)
-{
- return blk_mq_rq_to_pdu(req);
-}
-
static int spraid_npages_prp(u32 size, struct spraid_dev *hdev)
{
u32 nprps = DIV_ROUND_UP(size + hdev->page_size, hdev->page_size);
@@ -419,7 +416,7 @@ static void spraid_submit_cmd(struct spraid_queue *spraidq, const void *cmd)
writel(spraidq->sq_tail, spraidq->q_db);
spin_unlock_irqrestore(&spraidq->sq_lock, flags);
- dev_log_dbg(spraidq->hdev->dev, "cid[%d], qid[%d], opcode[0x%x], flags[0x%x], hdid[%u]\n",
+ dev_log_dbg(spraidq->hdev->dev, "cid[%d] qid[%d], opcode[0x%x], flags[0x%x], hdid[%u]\n",
acd->command_id, spraidq->qid, acd->opcode, acd->flags, le32_to_cpu(acd->hdid));
}
@@ -605,18 +602,15 @@ static void spraid_setup_rw_cmd(struct spraid_dev *hdev,
if (scmd->cmd_len == 6) {
datalength = (u32)(scmd->cmnd[4] == 0 ?
IO_6_DEFAULT_TX_LEN : scmd->cmnd[4]);
- start_lba_lo = ((u32)scmd->cmnd[1] << 16) |
- ((u32)scmd->cmnd[2] << 8) | (u32)scmd->cmnd[3];
+ start_lba_lo = (u32)get_unaligned_be24(&scmd->cmnd[1]);
start_lba_lo &= 0x1FFFFF;
}
/* 10-byte READ(0x28) or WRITE(0x2A) cdb */
else if (scmd->cmd_len == 10) {
- datalength = (u32)scmd->cmnd[8] | ((u32)scmd->cmnd[7] << 8);
- start_lba_lo = ((u32)scmd->cmnd[2] << 24) |
- ((u32)scmd->cmnd[3] << 16) |
- ((u32)scmd->cmnd[4] << 8) | (u32)scmd->cmnd[5];
+ datalength = (u32)get_unaligned_be16(&scmd->cmnd[7]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
if (scmd->cmnd[1] & FUA_MASK)
control |= SPRAID_RW_FUA;
@@ -624,42 +618,26 @@ static void spraid_setup_rw_cmd(struct spraid_dev *hdev,
/* 12-byte READ(0xA8) or WRITE(0xAA) cdb */
else if (scmd->cmd_len == 12) {
- datalength = ((u32)scmd->cmnd[6] << 24) |
- ((u32)scmd->cmnd[7] << 16) |
- ((u32)scmd->cmnd[8] << 8) | (u32)scmd->cmnd[9];
- start_lba_lo = ((u32)scmd->cmnd[2] << 24) |
- ((u32)scmd->cmnd[3] << 16) |
- ((u32)scmd->cmnd[4] << 8) | (u32)scmd->cmnd[5];
+ datalength = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[2]);
if (scmd->cmnd[1] & FUA_MASK)
control |= SPRAID_RW_FUA;
}
/* 16-byte READ(0x88) or WRITE(0x8A) cdb */
else if (scmd->cmd_len == 16) {
- datalength = ((u32)scmd->cmnd[10] << 24) |
- ((u32)scmd->cmnd[11] << 16) |
- ((u32)scmd->cmnd[12] << 8) | (u32)scmd->cmnd[13];
- start_lba_lo = ((u32)scmd->cmnd[6] << 24) |
- ((u32)scmd->cmnd[7] << 16) |
- ((u32)scmd->cmnd[8] << 8) | (u32)scmd->cmnd[9];
- start_lba_hi = ((u32)scmd->cmnd[2] << 24) |
- ((u32)scmd->cmnd[3] << 16) |
- ((u32)scmd->cmnd[4] << 8) | (u32)scmd->cmnd[5];
+ datalength = get_unaligned_be32(&scmd->cmnd[10]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[6]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[2]);
if (scmd->cmnd[1] & FUA_MASK)
control |= SPRAID_RW_FUA;
}
/* 32-byte READ(0x88) or WRITE(0x8A) cdb */
else if (scmd->cmd_len == 32) {
- datalength = ((u32)scmd->cmnd[28] << 24) |
- ((u32)scmd->cmnd[29] << 16) |
- ((u32)scmd->cmnd[30] << 8) | (u32)scmd->cmnd[31];
- start_lba_lo = ((u32)scmd->cmnd[16] << 24) |
- ((u32)scmd->cmnd[17] << 16) |
- ((u32)scmd->cmnd[18] << 8) | (u32)scmd->cmnd[19];
- start_lba_hi = ((u32)scmd->cmnd[12] << 24) |
- ((u32)scmd->cmnd[13] << 16) |
- ((u32)scmd->cmnd[14] << 8) | (u32)scmd->cmnd[15];
+ datalength = get_unaligned_be32(&scmd->cmnd[28]);
+ start_lba_lo = get_unaligned_be32(&scmd->cmnd[16]);
+ start_lba_hi = get_unaligned_be32(&scmd->cmnd[12]);
if (scmd->cmnd[10] & FUA_MASK)
control |= SPRAID_RW_FUA;
@@ -814,7 +792,7 @@ static void spraid_map_status(struct spraid_iod *iod, struct scsi_cmnd *scmd,
if (scmd->result & SAM_STAT_CHECK_CONDITION) {
memset(scmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
memcpy(scmd->sense_buffer, iod->sense, SCSI_SENSE_BUFFERSIZE);
- set_driver_byte(scmd, DRIVER_SENSE);
+ scmd->result = (scmd->result & 0x00ffffff) | (DRIVER_SENSE << 24);
}
break;
case FW_STAT_ABORTED:
@@ -850,14 +828,13 @@ static int spraid_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
int ret;
if (unlikely(!scmd)) {
- dev_err(hdev->dev, "err, scmd is null, return 0\n");
+ dev_err(hdev->dev, "err, scmd is null\n");
return 0;
}
if (unlikely(hdev->state != SPRAID_LIVE)) {
set_host_byte(scmd, DID_NO_CONNECT);
scmd->scsi_done(scmd);
- dev_err(hdev->dev, "[%s] err, hdev state is not live\n", __func__);
return 0;
}
@@ -894,7 +871,7 @@ static int spraid_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
WRITE_ONCE(iod->state, SPRAID_CMD_IN_FLIGHT);
spraid_submit_cmd(ioq, &ioq_cmd);
elapsed = jiffies - scmd->jiffies_at_alloc;
- dev_log_dbg(hdev->dev, "cid[%d], qid[%d] submit IO cost %3ld.%3ld seconds\n",
+ dev_log_dbg(hdev->dev, "cid[%d] qid[%d] submit IO cost %3ld.%3ld seconds\n",
cid, hwq, elapsed / HZ, elapsed % HZ);
return 0;
@@ -945,6 +922,10 @@ static int spraid_slave_alloc(struct scsi_device *sdev)
scan_host:
hostdata->hdid = le32_to_cpu(hdev->devices[idx].hdid);
+ hostdata->max_io_kb = le16_to_cpu(hdev->devices[idx].max_io_kb);
+ hostdata->attr = hdev->devices[idx].attr;
+ hostdata->flag = hdev->devices[idx].flag;
+ hostdata->rg_id = 0xff;
sdev->hostdata = hostdata;
up_read(&hdev->devices_rwsem);
return 0;
@@ -964,13 +945,13 @@ static int spraid_slave_configure(struct scsi_device *sdev)
struct spraid_sdev_hostdata *hostdata = sdev->hostdata;
u32 max_sec = sdev->host->max_sectors;
- if (!hostdata) {
+ if (hostdata) {
idx = hostdata->hdid - 1;
if (sdev->channel == hdev->devices[idx].channel &&
sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
sdev->lun < hdev->devices[idx].lun) {
if (SPRAID_DEV_INFO_ATTR_PT(hdev->devices[idx].attr))
- timeout = scmd_tmout_pt * HZ;
+ timeout = 30 * HZ;
else
timeout = scmd_tmout_nonpt * HZ;
max_sec = le16_to_cpu(hdev->devices[idx].max_io_kb) << 1;
@@ -991,7 +972,6 @@ static int spraid_slave_configure(struct scsi_device *sdev)
if ((max_sec == 0) || (max_sec > sdev->host->max_sectors))
max_sec = sdev->host->max_sectors;
- blk_queue_max_hw_sectors(sdev->request_queue, max_sec);
dev_info(hdev->dev, "[%s] sdev->channel:id:lun[%d:%d:%lld], scmd_timeout[%d]s, maxsec[%d]\n",
__func__, sdev->channel, sdev->id, sdev->lun, timeout / HZ, max_sec);
@@ -1176,6 +1156,75 @@ static inline bool spraid_cqe_pending(struct spraid_queue *spraidq)
spraidq->cq_phase;
}
+static void spraid_sata_report_zone_handle(struct scsi_cmnd *scmd, struct spraid_iod *iod)
+{
+ int i = 0;
+ unsigned int bytes = 0;
+ struct scatterlist *sg = scsi_sglist(scmd);
+
+ scsi_for_each_sg(scmd, sg, iod->nsge, i) {
+ unsigned int offset = 0;
+
+ if (bytes == 0) {
+ char *hdr;
+ u32 list_length;
+ u64 max_lba, opt_lba;
+ u16 same;
+
+ hdr = sg_virt(sg);
+
+ list_length = get_unaligned_le32(&hdr[0]);
+ same = get_unaligned_le16(&hdr[4]);
+ max_lba = get_unaligned_le64(&hdr[8]);
+ opt_lba = get_unaligned_le64(&hdr[16]);
+ put_unaligned_be32(list_length, &hdr[0]);
+ hdr[4] = same & 0xf;
+ put_unaligned_be64(max_lba, &hdr[8]);
+ put_unaligned_be64(opt_lba, &hdr[16]);
+ offset += 64;
+ bytes += 64;
+ }
+ while (offset < sg_dma_len(sg)) {
+ char *rec;
+ u8 cond, type, non_seq, reset;
+ u64 size, start, wp;
+
+ rec = sg_virt(sg) + offset;
+ type = rec[0] & 0xf;
+ cond = (rec[1] >> 4) & 0xf;
+ non_seq = (rec[1] & 2);
+ reset = (rec[1] & 1);
+ size = get_unaligned_le64(&rec[8]);
+ start = get_unaligned_le64(&rec[16]);
+ wp = get_unaligned_le64(&rec[24]);
+ rec[0] = type;
+ rec[1] = (cond << 4) | non_seq | reset;
+ put_unaligned_be64(size, &rec[8]);
+ put_unaligned_be64(start, &rec[16]);
+ put_unaligned_be64(wp, &rec[24]);
+ WARN_ON(offset + 64 > sg_dma_len(sg));
+ offset += 64;
+ bytes += 64;
+ }
+ }
+}
+
+static inline void spraid_handle_ata_cmd(struct spraid_dev *hdev, struct scsi_cmnd *scmd,
+ struct spraid_iod *iod)
+{
+ if (hdev->ctrl_info->card_type != SPRAID_CARD_HBA)
+ return;
+
+ switch (scmd->cmnd[0]) {
+ case ZBC_IN:
+ dev_info(hdev->dev, "[%s] process report zone\n", __func__);
+ spraid_sata_report_zone_handle(scmd, iod);
+ break;
+ default:
+ break;
+ }
+}
+
static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq, struct spraid_completion *cqe)
{
struct spraid_dev *hdev = ioq->hdev;
@@ -1197,12 +1246,12 @@ static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq, struct spraid_com
iod = scsi_cmd_priv(scmd);
elapsed = jiffies - scmd->jiffies_at_alloc;
- dev_log_dbg(hdev->dev, "cid[%d], qid[%d] finish IO cost %3ld.%3ld seconds\n",
+ dev_log_dbg(hdev->dev, "cid[%d] qid[%d] finish IO cost %3ld.%3ld seconds\n",
cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
if (cmpxchg(&iod->state, SPRAID_CMD_IN_FLIGHT, SPRAID_CMD_COMPLETE) !=
SPRAID_CMD_IN_FLIGHT) {
- dev_warn(hdev->dev, "cid[%d], qid[%d] enters abnormal handler, cost %3ld.%3ld seconds\n",
+ dev_warn(hdev->dev, "cid[%d] qid[%d] enters abnormal handler, cost %3ld.%3ld seconds\n",
cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
WRITE_ONCE(iod->state, SPRAID_CMD_TMO_COMPLETE);
@@ -1215,6 +1264,8 @@ static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq, struct spraid_com
return;
}
+ spraid_handle_ata_cmd(hdev, scmd, iod);
+
spraid_map_status(iod, scmd, cqe);
if (iod->nsge) {
iod->nsge = 0;
@@ -1224,38 +1275,36 @@ static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq, struct spraid_com
scmd->scsi_done(scmd);
}
-static inline void spraid_end_admin_request(struct request *req, __le16 status,
- __le32 result0, __le32 result1)
-{
- struct spraid_admin_request *rq = spraid_admin_req(req);
-
- rq->status = le16_to_cpu(status) >> 1;
- rq->result0 = le32_to_cpu(result0);
- rq->result1 = le32_to_cpu(result1);
- blk_mq_complete_request(req);
-}
-
static void spraid_complete_adminq_cmnd(struct spraid_queue *adminq, struct spraid_completion *cqe)
{
- struct blk_mq_tags *tags = adminq->hdev->admin_tagset.tags[0];
- struct request *req;
+ struct spraid_dev *hdev = adminq->hdev;
+ struct spraid_cmd *adm_cmd;
- req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
- if (unlikely(!req)) {
+ adm_cmd = hdev->adm_cmds + cqe->cmd_id;
+ if (unlikely(adm_cmd->state == SPRAID_CMD_IDLE)) {
dev_warn(adminq->hdev->dev, "Invalid id %d completed on queue %d\n",
cqe->cmd_id, le16_to_cpu(cqe->sq_id));
return;
}
- spraid_end_admin_request(req, cqe->status, cqe->result, cqe->result1);
+
+ adm_cmd->status = le16_to_cpu(cqe->status) >> 1;
+ adm_cmd->result0 = le32_to_cpu(cqe->result);
+ adm_cmd->result1 = le32_to_cpu(cqe->result1);
+
+ complete(&adm_cmd->cmd_done);
}
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid);
+
static void spraid_complete_aen(struct spraid_queue *spraidq, struct spraid_completion *cqe)
{
struct spraid_dev *hdev = spraidq->hdev;
u32 result = le32_to_cpu(cqe->result);
- dev_info(hdev->dev, "rcv aen, status[%x], result[%x]\n",
- le16_to_cpu(cqe->status) >> 1, result);
+ dev_info(hdev->dev, "rcv aen, cid[%d], status[0x%x], result[0x%x]\n",
+ cqe->cmd_id, le16_to_cpu(cqe->status) >> 1, result);
+
+ spraid_send_aen(hdev, cqe->cmd_id);
if ((le16_to_cpu(cqe->status) >> 1) != SPRAID_SC_SUCCESS)
return;
@@ -1264,22 +1313,19 @@ static void spraid_complete_aen(struct spraid_queue *spraidq, struct spraid_comp
spraid_handle_aen_notice(hdev, result);
break;
case SPRAID_AEN_VS:
- spraid_handle_aen_vs(hdev, result);
+ spraid_handle_aen_vs(hdev, result, le32_to_cpu(cqe->result1));
break;
default:
dev_warn(hdev->dev, "Unsupported async event type: %u\n",
result & 0x7);
break;
}
- queue_work(spraid_wq, &hdev->aen_work);
}
-static void spraid_put_ioq_ptcmd(struct spraid_dev *hdev, struct spraid_ioq_ptcmd *cmd);
-
static void spraid_complete_ioq_sync_cmnd(struct spraid_queue *ioq, struct spraid_completion *cqe)
{
struct spraid_dev *hdev = ioq->hdev;
- struct spraid_ioq_ptcmd *ptcmd;
+ struct spraid_cmd *ptcmd;
ptcmd = hdev->ioq_ptcmds + (ioq->qid - 1) * SPRAID_PTCMDS_PERQ +
cqe->cmd_id - SPRAID_IO_BLK_MQ_DEPTH;
@@ -1289,8 +1335,6 @@ static void spraid_complete_ioq_sync_cmnd(struct spraid_queue *ioq, struct sprai
ptcmd->result1 = le32_to_cpu(cqe->result1);
complete(&ptcmd->cmd_done);
-
- spraid_put_ioq_ptcmd(hdev, ptcmd);
}
static inline void spraid_handle_cqe(struct spraid_queue *spraidq, u16 idx)
@@ -1304,7 +1348,7 @@ static inline void spraid_handle_cqe(struct spraid_queue *spraidq, u16 idx)
return;
}
- dev_log_dbg(hdev->dev, "cid[%d], qid[%d], result[0x%x], sq_id[%d], status[0x%x]\n",
+ dev_log_dbg(hdev->dev, "cid[%d] qid[%d], result[0x%x], sq_id[%d], status[0x%x]\n",
cqe->cmd_id, spraidq->qid, le32_to_cpu(cqe->result),
le16_to_cpu(cqe->sq_id), le16_to_cpu(cqe->status));
@@ -1452,62 +1496,119 @@ static u32 spraid_bar_size(struct spraid_dev *hdev, u32 nr_ioqs)
return (SPRAID_REG_DBS + ((nr_ioqs + 1) * 8 * hdev->db_stride));
}
-static inline void spraid_clear_spraid_request(struct request *req)
+static int spraid_alloc_admin_cmds(struct spraid_dev *hdev)
{
- if (!(req->rq_flags & RQF_DONTPREP)) {
- spraid_admin_req(req)->flags = 0;
- req->rq_flags |= RQF_DONTPREP;
+ int i;
+
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+ spin_lock_init(&hdev->adm_cmd_lock);
+
+ hdev->adm_cmds = kcalloc_node(SPRAID_AQ_BLK_MQ_DEPTH, sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
+
+ if (!hdev->adm_cmds) {
+ dev_err(hdev->dev, "Alloc admin cmds failed\n");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < SPRAID_AQ_BLK_MQ_DEPTH; i++) {
+ hdev->adm_cmds[i].qid = 0;
+ hdev->adm_cmds[i].cid = i;
+ list_add_tail(&(hdev->adm_cmds[i].list), &hdev->adm_cmd_list);
}
+
+ dev_info(hdev->dev, "Alloc admin cmds success, num[%d]\n", SPRAID_AQ_BLK_MQ_DEPTH);
+
+ return 0;
}
-static struct request *spraid_alloc_admin_request(struct request_queue *q,
- struct spraid_admin_command *cmd,
- blk_mq_req_flags_t flags)
+static void spraid_free_admin_cmds(struct spraid_dev *hdev)
{
- u32 op = COMMAND_IS_WRITE(cmd) ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
- struct request *req;
+ kfree(hdev->adm_cmds);
+ hdev->adm_cmds = NULL;
+ INIT_LIST_HEAD(&hdev->adm_cmd_list);
+}
+
+static struct spraid_cmd *spraid_get_cmd(struct spraid_dev *hdev, enum spraid_cmd_type type)
+{
+ struct spraid_cmd *cmd = NULL;
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
- req = blk_mq_alloc_request(q, op, flags);
- if (IS_ERR(req))
- return req;
- req->cmd_flags |= REQ_FAILFAST_DRIVER;
- spraid_clear_spraid_request(req);
- spraid_admin_req(req)->cmd = cmd;
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
+
+ spin_lock_irqsave(slock, flags);
+ if (list_empty(head)) {
+ spin_unlock_irqrestore(slock, flags);
+ dev_err(hdev->dev, "err, cmd[%d] list empty\n", type);
+ return NULL;
+ }
+ cmd = list_entry(head->next, struct spraid_cmd, list);
+ list_del_init(&cmd->list);
+ spin_unlock_irqrestore(slock, flags);
+
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IN_FLIGHT);
- return req;
+ return cmd;
}
-static int spraid_submit_admin_sync_cmd(struct request_queue *q,
- struct spraid_admin_command *cmd,
- u32 *result, void *buffer,
- u32 bufflen, u32 timeout, int at_head, blk_mq_req_flags_t flags)
+static void spraid_put_cmd(struct spraid_dev *hdev, struct spraid_cmd *cmd,
+ enum spraid_cmd_type type)
{
- struct request *req;
- int ret;
+ unsigned long flags;
+ struct list_head *head = &hdev->adm_cmd_list;
+ spinlock_t *slock = &hdev->adm_cmd_lock;
- req = spraid_alloc_admin_request(q, cmd, flags);
- if (IS_ERR(req))
- return PTR_ERR(req);
+ if (type == SPRAID_CMD_IOPT) {
+ head = &hdev->ioq_pt_list;
+ slock = &hdev->ioq_pt_lock;
+ }
- req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
- if (buffer && bufflen) {
- ret = blk_rq_map_kern(q, req, buffer, bufflen, GFP_KERNEL);
- if (ret)
- goto out;
+ spin_lock_irqsave(slock, flags);
+ WRITE_ONCE(cmd->state, SPRAID_CMD_IDLE);
+ list_add_tail(&cmd->list, head);
+ spin_unlock_irqrestore(slock, flags);
+}
+
+static int spraid_submit_admin_sync_cmd(struct spraid_dev *hdev, struct spraid_admin_command *cmd,
+ u32 *result0, u32 *result1, u32 timeout)
+{
+ struct spraid_cmd *adm_cmd = spraid_get_cmd(hdev, SPRAID_CMD_ADM);
+
+ if (!adm_cmd) {
+ dev_err(hdev->dev, "err, get admin cmd failed\n");
+ return -EFAULT;
}
- blk_execute_rq(req->q, NULL, req, at_head);
- if (result)
- *result = spraid_admin_req(req)->result0;
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
- if (spraid_admin_req(req)->flags & SPRAID_REQ_CANCELLED)
- ret = -EINTR;
- else
- ret = spraid_admin_req(req)->status;
+ init_completion(&adm_cmd->cmd_done);
-out:
- blk_mq_free_request(req);
- return ret;
+ cmd->common.command_id = adm_cmd->cid;
+ spraid_submit_cmd(&hdev->queues[0], cmd);
+
+ if (!wait_for_completion_timeout(&adm_cmd->cmd_done, timeout)) {
+ dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout, opcode[0x%x] subopcode[0x%x]\n",
+ __func__, adm_cmd->cid, adm_cmd->qid, cmd->usr_cmd.opcode,
+ cmd->usr_cmd.info_0.subopcode);
+ WRITE_ONCE(adm_cmd->state, SPRAID_CMD_TIMEOUT);
+ spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+ return -EINVAL;
+ }
+
+ if (result0)
+ *result0 = adm_cmd->result0;
+ if (result1)
+ *result1 = adm_cmd->result1;
+
+ spraid_put_cmd(hdev, adm_cmd, SPRAID_CMD_ADM);
+
+ return adm_cmd->status;
}
static int spraid_create_cq(struct spraid_dev *hdev, u16 qid,
@@ -1524,8 +1625,7 @@ static int spraid_create_cq(struct spraid_dev *hdev, u16 qid,
admin_cmd.create_cq.cq_flags = cpu_to_le16(flags);
admin_cmd.create_cq.irq_vector = cpu_to_le16(cq_vector);
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
}
static int spraid_create_sq(struct spraid_dev *hdev, u16 qid,
@@ -1542,8 +1642,7 @@ static int spraid_create_sq(struct spraid_dev *hdev, u16 qid,
admin_cmd.create_sq.sq_flags = cpu_to_le16(flags);
admin_cmd.create_sq.cqid = cpu_to_le16(qid);
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
}
static void spraid_free_queue(struct spraid_queue *spraidq)
@@ -1581,8 +1680,7 @@ static int spraid_delete_queue(struct spraid_dev *hdev, u8 op, u16 id)
admin_cmd.delete_queue.opcode = op;
admin_cmd.delete_queue.qid = cpu_to_le16(id);
- ret = spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
if (ret)
dev_err(hdev->dev, "Delete %s:[%d] failed\n",
@@ -1663,19 +1761,28 @@ static int spraid_set_features(struct spraid_dev *hdev, u32 fid, u32 dword11, vo
size_t buflen, u32 *result)
{
struct spraid_admin_command admin_cmd;
- u32 res;
int ret;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+
+ if (buffer && buflen) {
+ data_ptr = dma_alloc_coherent(hdev->dev, buflen, &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memcpy(data_ptr, buffer, buflen);
+ }
memset(&admin_cmd, 0, sizeof(admin_cmd));
admin_cmd.features.opcode = SPRAID_ADMIN_SET_FEATURES;
admin_cmd.features.fid = cpu_to_le32(fid);
admin_cmd.features.dword11 = cpu_to_le32(dword11);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
- ret = spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, &res,
- buffer, buflen, 0, 0, 0);
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, result, NULL, 0);
- if (!ret && result)
- *result = res;
+ if (data_ptr)
+ dma_free_coherent(hdev->dev, buflen, data_ptr, data_dma);
return ret;
}
@@ -1764,8 +1871,7 @@ static int spraid_setup_io_queues(struct spraid_dev *hdev)
break;
}
dev_info(hdev->dev, "[%s] max_qid: %d, queue_count: %d, online_queue: %d, ioq_depth: %d\n",
- __func__, hdev->max_qid, hdev->queue_count,
- hdev->online_queues, hdev->ioq_depth);
+ __func__, hdev->max_qid, hdev->queue_count, hdev->online_queues, hdev->ioq_depth);
return spraid_create_io_queues(hdev);
}
@@ -1889,10 +1995,11 @@ static int spraid_get_dev_list(struct spraid_dev *hdev, struct spraid_dev_info *
u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
struct spraid_admin_command admin_cmd;
struct spraid_dev_list *list_buf;
+ dma_addr_t data_dma = 0;
u32 i, idx, hdid, ndev;
int ret = 0;
- list_buf = kmalloc(sizeof(*list_buf), GFP_KERNEL);
+ list_buf = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
if (!list_buf)
return -ENOMEM;
@@ -1901,9 +2008,9 @@ static int spraid_get_dev_list(struct spraid_dev *hdev, struct spraid_dev_info *
admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
admin_cmd.get_info.type = SPRAID_GET_INFO_DEV_LIST;
admin_cmd.get_info.cdw11 = cpu_to_le32(idx);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
- ret = spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL, list_buf,
- sizeof(*list_buf), 0, 0, 0);
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
if (ret) {
dev_err(hdev->dev, "Get device list failed, nd: %u, idx: %u, ret: %d\n",
@@ -1916,12 +2023,11 @@ static int spraid_get_dev_list(struct spraid_dev *hdev, struct spraid_dev_info *
for (i = 0; i < ndev; i++) {
hdid = le32_to_cpu(list_buf->devices[i].hdid);
- dev_info(hdev->dev, "list_buf->devices[%d], hdid: %u target: %d, channel: %d, lun: %d, attr[%x]\n",
- i, hdid,
- le16_to_cpu(list_buf->devices[i].target),
- list_buf->devices[i].channel,
- list_buf->devices[i].lun,
- list_buf->devices[i].attr);
+ dev_info(hdev->dev, "list_buf->devices[%d], hdid: %u target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ i, hdid, le16_to_cpu(list_buf->devices[i].target),
+ list_buf->devices[i].channel,
+ list_buf->devices[i].lun,
+ list_buf->devices[i].attr);
if (hdid > nd || hdid == 0) {
dev_err(hdev->dev, "err, hdid[%d] invalid\n", hdid);
continue;
@@ -1936,21 +2042,29 @@ static int spraid_get_dev_list(struct spraid_dev *hdev, struct spraid_dev_info *
}
out:
- kfree(list_buf);
+ dma_free_coherent(hdev->dev, PAGE_SIZE, list_buf, data_dma);
return ret;
}
-static void spraid_send_aen(struct spraid_dev *hdev)
+static void spraid_send_aen(struct spraid_dev *hdev, u16 cid)
{
struct spraid_queue *adminq = &hdev->queues[0];
struct spraid_admin_command admin_cmd;
memset(&admin_cmd, 0, sizeof(admin_cmd));
admin_cmd.common.opcode = SPRAID_ADMIN_ASYNC_EVENT;
- admin_cmd.common.command_id = SPRAID_AQ_BLK_MQ_DEPTH;
+ admin_cmd.common.command_id = cid;
spraid_submit_cmd(adminq, &admin_cmd);
- dev_info(hdev->dev, "send aen, cid[%d]\n", SPRAID_AQ_BLK_MQ_DEPTH);
+ dev_info(hdev->dev, "send aen, cid[%d]\n", cid);
+}
+
+static inline void spraid_send_all_aen(struct spraid_dev *hdev)
+{
+ u16 i;
+
+ for (i = 0; i < hdev->ctrl_info->aerl; i++)
+ spraid_send_aen(hdev, i + SPRAID_AQ_BLK_MQ_DEPTH);
}
static int spraid_add_device(struct spraid_dev *hdev, struct spraid_dev_info *device)
@@ -1958,6 +2072,10 @@ static int spraid_add_device(struct spraid_dev *hdev, struct spraid_dev_info *de
struct Scsi_Host *shost = hdev->shost;
struct scsi_device *sdev;
+ dev_info(hdev->dev, "add device, hdid: %u target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ le32_to_cpu(device->hdid), le16_to_cpu(device->target),
+ device->channel, device->lun, device->attr);
+
sdev = scsi_device_lookup(shost, device->channel, le16_to_cpu(device->target), 0);
if (sdev) {
dev_warn(hdev->dev, "Device is already exist, channel: %d, target_id: %d, lun: %d\n",
@@ -1974,9 +2092,13 @@ static int spraid_rescan_device(struct spraid_dev *hdev, struct spraid_dev_info
struct Scsi_Host *shost = hdev->shost;
struct scsi_device *sdev;
+ dev_info(hdev->dev, "rescan device, hdid: %u target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ le32_to_cpu(device->hdid), le16_to_cpu(device->target),
+ device->channel, device->lun, device->attr);
+
sdev = scsi_device_lookup(shost, device->channel, le16_to_cpu(device->target), 0);
if (!sdev) {
- dev_warn(hdev->dev, "Device is not exit, channel: %d, target_id: %d, lun: %d\n",
+ dev_warn(hdev->dev, "device is not exit rescan it, channel: %d, target_id: %d, lun: %d\n",
device->channel, le16_to_cpu(device->target), 0);
return -ENODEV;
}
@@ -1991,9 +2113,13 @@ static int spraid_remove_device(struct spraid_dev *hdev, struct spraid_dev_info
struct Scsi_Host *shost = hdev->shost;
struct scsi_device *sdev;
+ dev_info(hdev->dev, "remove device, hdid: %u target: %d, channel: %d, lun: %d, attr[0x%x]\n",
+ le32_to_cpu(org_device->hdid), le16_to_cpu(org_device->target),
+ org_device->channel, org_device->lun, org_device->attr);
+
sdev = scsi_device_lookup(shost, org_device->channel, le16_to_cpu(org_device->target), 0);
if (!sdev) {
- dev_warn(hdev->dev, "Device is not exit, channel: %d, target_id: %d, lun: %d\n",
+ dev_warn(hdev->dev, "device is not exit remove it, channel: %d, target_id: %d, lun: %d\n",
org_device->channel, le16_to_cpu(org_device->target), 0);
return -ENODEV;
}
@@ -2029,36 +2155,54 @@ static int spraid_dev_list_init(struct spraid_dev *hdev)
return 0;
}
+static int luntarget_cmp_func(const void *l, const void *r)
+{
+ const struct spraid_dev_info *ln = l;
+ const struct spraid_dev_info *rn = r;
+
+ if (ln->channel == rn->channel)
+ return ln->target - rn->target;
+
+ return ln->channel - rn->channel;
+}
+
static void spraid_scan_work(struct work_struct *work)
{
struct spraid_dev *hdev =
container_of(work, struct spraid_dev, scan_work);
struct spraid_dev_info *devices, *org_devices;
+ struct spraid_dev_info *sortdevice;
u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
u8 flag, org_flag;
int i, ret;
+ int count = 0;
devices = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
if (!devices)
return;
+
+ sortdevice = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
+ if (!sortdevice)
+ goto free_list;
+
ret = spraid_get_dev_list(hdev, devices);
if (ret)
- goto free_list;
+ goto free_all;
org_devices = hdev->devices;
for (i = 0; i < nd; i++) {
org_flag = org_devices[i].flag;
flag = devices[i].flag;
- dev_log_dbg(hdev->dev, "i: %d, org_flag: 0x%x, flag: 0x%x\n",
- i, org_flag, flag);
+ dev_log_dbg(hdev->dev, "i: %d, org_flag: 0x%x, flag: 0x%x\n", i, org_flag, flag);
if (SPRAID_DEV_INFO_FLAG_VALID(flag)) {
if (!SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
down_write(&hdev->devices_rwsem);
memcpy(&org_devices[i], &devices[i],
- sizeof(struct spraid_dev_info));
+ sizeof(struct spraid_dev_info));
+ memcpy(&sortdevice[count++], &devices[i],
+ sizeof(struct spraid_dev_info));
up_write(&hdev->devices_rwsem);
- spraid_add_device(hdev, &devices[i]);
} else if (SPRAID_DEV_INFO_FLAG_CHANGE(flag)) {
spraid_rescan_device(hdev, &devices[i]);
}
@@ -2071,6 +2215,16 @@ static void spraid_scan_work(struct work_struct *work)
}
}
}
+
+ dev_info(hdev->dev, "scan work add device count = %d\n", count);
+
+ sort(sortdevice, count, sizeof(sortdevice[0]), luntarget_cmp_func, NULL);
+
+ for (i = 0; i < count; i++)
+ spraid_add_device(hdev, &sortdevice[i]);
+
+free_all:
+ kfree(sortdevice);
free_list:
kfree(devices);
}
@@ -2083,6 +2237,15 @@ static void spraid_timesyn_work(struct work_struct *work)
spraid_configure_timestamp(hdev);
}
+static int spraid_init_ctrl_info(struct spraid_dev *hdev);
+static void spraid_fw_act_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev = container_of(work, struct spraid_dev, fw_act_work);
+
+ if (spraid_init_ctrl_info(hdev))
+ dev_err(hdev->dev, "get ctrl info failed after fw act\n");
+}
+
static void spraid_queue_scan(struct spraid_dev *hdev)
{
queue_work(spraid_wq, &hdev->scan_work);
@@ -2094,6 +2257,9 @@ static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result)
case SPRAID_AEN_DEV_CHANGED:
spraid_queue_scan(hdev);
break;
+ case SPRAID_AEN_FW_ACT_START:
+ dev_info(hdev->dev, "fw activation starting\n");
+ break;
case SPRAID_AEN_HOST_PROBING:
break;
default:
@@ -2101,25 +2267,25 @@ static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result)
}
}
-static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result)
+static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result, u32 result1)
{
- switch (result) {
+ switch ((result & 0xff00) >> 8) {
case SPRAID_AEN_TIMESYN:
queue_work(spraid_wq, &hdev->timesyn_work);
break;
+ case SPRAID_AEN_FW_ACT_FINISH:
+ dev_info(hdev->dev, "fw activation finish\n");
+ queue_work(spraid_wq, &hdev->fw_act_work);
+ break;
+ case SPRAID_AEN_EVENT_MIN ... SPRAID_AEN_EVENT_MAX:
+ dev_info(hdev->dev, "rcv card event[%d], param1[0x%x] param2[0x%x]\n",
+ (result & 0xff00) >> 8, result, result1);
+ break;
default:
- dev_warn(hdev->dev, "async event result: %x\n", result);
+ dev_warn(hdev->dev, "async event result: 0x%x\n", result);
}
}
-static void spraid_async_event_work(struct work_struct *work)
-{
- struct spraid_dev *hdev =
- container_of(work, struct spraid_dev, aen_work);
-
- spraid_send_aen(hdev);
-}
-
static int spraid_alloc_resources(struct spraid_dev *hdev)
{
int ret, nqueue;
@@ -2149,10 +2315,16 @@ static int spraid_alloc_resources(struct spraid_dev *hdev)
goto destroy_dma_pools;
}
+ ret = spraid_alloc_admin_cmds(hdev);
+ if (ret)
+ goto free_queues;
+
dev_info(hdev->dev, "[%s] queues num: %d\n", __func__, nqueue);
return 0;
+free_queues:
+ kfree(hdev->queues);
destroy_dma_pools:
spraid_destroy_dma_pools(hdev);
free_ctrl_info:
@@ -2164,50 +2336,18 @@ static int spraid_alloc_resources(struct spraid_dev *hdev)
static void spraid_free_resources(struct spraid_dev *hdev)
{
+ spraid_free_admin_cmds(hdev);
kfree(hdev->queues);
spraid_destroy_dma_pools(hdev);
kfree(hdev->ctrl_info);
ida_free(&spraid_instance_ida, hdev->instance);
}
-static void spraid_setup_passthrough(struct request *req, struct spraid_admin_command *cmd)
-{
- memcpy(cmd, spraid_admin_req(req)->cmd, sizeof(*cmd));
- cmd->common.flags &= ~SPRAID_CMD_FLAG_SGL_ALL;
-}
-
-static inline void spraid_clear_hreq(struct request *req)
-{
- if (!(req->rq_flags & RQF_DONTPREP)) {
- spraid_admin_req(req)->flags = 0;
- req->rq_flags |= RQF_DONTPREP;
- }
-}
-
-static blk_status_t spraid_setup_admin_cmd(struct request *req, struct spraid_admin_command *cmd)
+static void spraid_bsg_unmap_data(struct spraid_dev *hdev, struct bsg_job *job)
{
- spraid_clear_hreq(req);
-
- memset(cmd, 0, sizeof(*cmd));
- switch (req_op(req)) {
- case REQ_OP_DRV_IN:
- case REQ_OP_DRV_OUT:
- spraid_setup_passthrough(req, cmd);
- break;
- default:
- WARN_ON_ONCE(1);
- return BLK_STS_IOERR;
- }
-
- cmd->common.command_id = req->tag;
- return BLK_STS_OK;
-}
-
-static void spraid_unmap_data(struct spraid_dev *hdev, struct request *req)
-{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- enum dma_data_direction dma_dir = rq_data_dir(req) ?
- DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir = rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
if (iod->nsge)
dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
@@ -2215,36 +2355,36 @@ static void spraid_unmap_data(struct spraid_dev *hdev, struct request *req)
spraid_free_iod_res(hdev, iod);
}
-static blk_status_t spraid_admin_map_data(struct spraid_dev *hdev, struct request *req,
- struct spraid_admin_command *cmd)
+static int spraid_bsg_map_data(struct spraid_dev *hdev, struct bsg_job *job,
+ struct spraid_admin_command *cmd)
{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct request_queue *admin_q = req->q;
- enum dma_data_direction dma_dir = rq_data_dir(req) ?
- DMA_TO_DEVICE : DMA_FROM_DEVICE;
- blk_status_t ret = BLK_STS_IOERR;
- int nr_mapped;
- int res;
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_iod *iod = job->dd_data;
+ enum dma_data_direction dma_dir = rq_data_dir(rq) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ int ret = 0;
+
+ iod->sg = job->request_payload.sg_list;
+ iod->nsge = job->request_payload.sg_cnt;
+ iod->length = job->request_payload.payload_len;
+ iod->use_sgl = false;
+ iod->npages = -1;
+ iod->sg_drv_mgmt = false;
- sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
- iod->nsge = blk_rq_map_sg(admin_q, req, iod->sg);
if (!iod->nsge)
goto out;
- dev_info(hdev->dev, "nseg: %u, nsge: %u\n",
- blk_rq_nr_phys_segments(req), iod->nsge);
-
- ret = BLK_STS_RESOURCE;
- nr_mapped = dma_map_sg_attrs(hdev->dev, iod->sg, iod->nsge, dma_dir, DMA_ATTR_NO_WARN);
- if (!nr_mapped)
+ ret = dma_map_sg_attrs(hdev->dev, iod->sg, iod->nsge, dma_dir, DMA_ATTR_NO_WARN);
+ if (!ret)
goto out;
- res = spraid_setup_prps(hdev, iod);
- if (res)
+ ret = spraid_setup_prps(hdev, iod);
+ if (ret)
goto unmap;
+
cmd->common.dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
- return BLK_STS_OK;
+
+ return 0;
unmap:
dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
@@ -2252,137 +2392,29 @@ static blk_status_t spraid_admin_map_data(struct spraid_dev *hdev, struct reques
return ret;
}
-static blk_status_t spraid_init_admin_iod(struct request *rq, struct spraid_dev *hdev)
-{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(rq);
- int nents = blk_rq_nr_phys_segments(rq);
- unsigned int size = blk_rq_payload_bytes(rq);
-
- if (nents > SPRAID_INT_PAGES || size > SPRAID_INT_BYTES(hdev)) {
- iod->sg = mempool_alloc(hdev->iod_mempool, GFP_ATOMIC);
- if (!iod->sg)
- return BLK_STS_RESOURCE;
- } else {
- iod->sg = iod->inline_sg;
- }
-
- iod->nsge = 0;
- iod->use_sgl = false;
- iod->npages = -1;
- iod->length = size;
- iod->sg_drv_mgmt = true;
-
- return BLK_STS_OK;
-}
-
-static blk_status_t spraid_queue_admin_rq(struct blk_mq_hw_ctx *hctx,
- const struct blk_mq_queue_data *bd)
-{
- struct spraid_queue *adminq = hctx->driver_data;
- struct spraid_dev *hdev = adminq->hdev;
- struct request *req = bd->rq;
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct spraid_admin_command cmd;
- blk_status_t ret;
-
- ret = spraid_setup_admin_cmd(req, &cmd);
- if (ret)
- goto out;
-
- ret = spraid_init_admin_iod(req, hdev);
- if (ret)
- goto out;
-
- if (blk_rq_nr_phys_segments(req)) {
- ret = spraid_admin_map_data(hdev, req, &cmd);
- if (ret)
- goto cleanup_iod;
- }
-
- blk_mq_start_request(req);
- spraid_submit_cmd(adminq, &cmd);
- return BLK_STS_OK;
-
-cleanup_iod:
- spraid_free_iod_res(hdev, iod);
-out:
- return ret;
-}
-
-static blk_status_t spraid_error_status(struct request *req)
-{
- switch (spraid_admin_req(req)->status & 0x7ff) {
- case SPRAID_SC_SUCCESS:
- return BLK_STS_OK;
- default:
- return BLK_STS_IOERR;
- }
-}
-
-static void spraid_complete_admin_rq(struct request *req)
-{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct spraid_dev *hdev = iod->spraidq->hdev;
-
- if (blk_rq_nr_phys_segments(req))
- spraid_unmap_data(hdev, req);
- blk_mq_end_request(req, spraid_error_status(req));
-}
-
-static int spraid_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, unsigned int hctx_idx)
-{
- struct spraid_dev *hdev = data;
- struct spraid_queue *adminq = &hdev->queues[0];
-
- WARN_ON(hctx_idx != 0);
- WARN_ON(hdev->admin_tagset.tags[0] != hctx->tags);
-
- hctx->driver_data = adminq;
- return 0;
-}
-
-static int spraid_admin_init_request(struct blk_mq_tag_set *set, struct request *req,
- unsigned int hctx_idx, unsigned int numa_node)
-{
- struct spraid_dev *hdev = set->driver_data;
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct spraid_queue *adminq = &hdev->queues[0];
-
- WARN_ON(!adminq);
- iod->spraidq = adminq;
- return 0;
-}
-
-static enum blk_eh_timer_return
-spraid_admin_timeout(struct request *req, bool reserved)
-{
- struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
- struct spraid_queue *spraidq = iod->spraidq;
- struct spraid_dev *hdev = spraidq->hdev;
-
- dev_err(hdev->dev, "Admin cid[%d] qid[%d] timeout\n",
- req->tag, spraidq->qid);
-
- if (spraid_poll_cq(spraidq, req->tag)) {
- dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, completion polled\n",
- req->tag, spraidq->qid);
- return BLK_EH_DONE;
- }
-
- spraid_end_admin_request(req, cpu_to_le16(-EINVAL), 0, 0);
- return BLK_EH_DONE;
-}
-
static int spraid_get_ctrl_info(struct spraid_dev *hdev, struct spraid_ctrl_info *ctrl_info)
{
struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
memset(&admin_cmd, 0, sizeof(admin_cmd));
admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
admin_cmd.get_info.type = SPRAID_GET_INFO_CTRL;
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(ctrl_info, data_ptr, sizeof(struct spraid_ctrl_info));
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- ctrl_info, sizeof(struct spraid_ctrl_info), 0, 0, 0);
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
}
static int spraid_init_ctrl_info(struct spraid_dev *hdev)
@@ -2416,6 +2448,11 @@ static int spraid_init_ctrl_info(struct spraid_dev *hdev)
dev_info(hdev->dev, "[%s]sn = %s\n", __func__, hdev->ctrl_info->sn);
dev_info(hdev->dev, "[%s]fr = %s\n", __func__, hdev->ctrl_info->fr);
+ if (!hdev->ctrl_info->aerl)
+ hdev->ctrl_info->aerl = 1;
+ if (hdev->ctrl_info->aerl > SPRAID_NR_AEN_COMMANDS)
+ hdev->ctrl_info->aerl = SPRAID_NR_AEN_COMMANDS;
+
return 0;
}
@@ -2444,99 +2481,54 @@ static void spraid_free_iod_ext_mem_pool(struct spraid_dev *hdev)
mempool_destroy(hdev->iod_mempool);
}
-static int spraid_submit_user_cmd(struct request_queue *q, struct spraid_admin_command *cmd,
- void __user *ubuffer, unsigned int bufflen, u32 *result,
- unsigned int timeout)
-{
- struct request *req;
- struct bio *bio = NULL;
- int ret;
-
- req = spraid_alloc_admin_request(q, cmd, 0);
- if (IS_ERR(req))
- return PTR_ERR(req);
-
- req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
- spraid_admin_req(req)->flags |= SPRAID_REQ_USERCMD;
-
- if (ubuffer && bufflen) {
- ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen, GFP_KERNEL);
- if (ret)
- goto out;
- bio = req->bio;
- }
- blk_execute_rq(req->q, NULL, req, 0);
- if (spraid_admin_req(req)->flags & SPRAID_REQ_CANCELLED)
- ret = -EINTR;
- else
- ret = spraid_admin_req(req)->status;
- if (result) {
- result[0] = spraid_admin_req(req)->result0;
- result[1] = spraid_admin_req(req)->result1;
- }
- if (bio)
- blk_rq_unmap_user(bio);
-out:
- blk_mq_free_request(req);
- return ret;
-}
-
-static int spraid_user_admin_cmd(struct spraid_dev *hdev,
- struct spraid_passthru_common_cmd __user *ucmd)
+static int spraid_user_admin_cmd(struct spraid_dev *hdev, struct bsg_job *job)
{
- struct spraid_passthru_common_cmd cmd;
+ struct spraid_bsg_request *bsg_req = job->request;
+ struct spraid_passthru_common_cmd *cmd = &(bsg_req->admcmd);
struct spraid_admin_command admin_cmd;
- u32 timeout = 0;
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
+ u32 result[2] = {0};
int status;
- if (!capable(CAP_SYS_ADMIN)) {
- dev_err(hdev->dev, "Current user hasn't administrator right, reject service\n");
- return -EACCES;
- }
-
- if (copy_from_user(&cmd, ucmd, sizeof(cmd))) {
- dev_err(hdev->dev, "Copy command from user space to kernel space failed\n");
- return -EFAULT;
- }
-
- if (cmd.flags) {
- dev_err(hdev->dev, "Invalid flags in user command\n");
- return -EINVAL;
+ if (hdev->state >= SPRAID_RESETTING) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not right\n",
+ __func__, hdev->state);
+ return -EBUSY;
}
- dev_info(hdev->dev, "user_admin_cmd opcode: 0x%x, subopcode: 0x%x\n",
- cmd.opcode, cmd.cdw2 & 0x7ff);
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode);
memset(&admin_cmd, 0, sizeof(admin_cmd));
- admin_cmd.common.opcode = cmd.opcode;
- admin_cmd.common.flags = cmd.flags;
- admin_cmd.common.hdid = cpu_to_le32(cmd.nsid);
- admin_cmd.common.cdw2[0] = cpu_to_le32(cmd.cdw2);
- admin_cmd.common.cdw2[1] = cpu_to_le32(cmd.cdw3);
- admin_cmd.common.cdw10 = cpu_to_le32(cmd.cdw10);
- admin_cmd.common.cdw11 = cpu_to_le32(cmd.cdw11);
- admin_cmd.common.cdw12 = cpu_to_le32(cmd.cdw12);
- admin_cmd.common.cdw13 = cpu_to_le32(cmd.cdw13);
- admin_cmd.common.cdw14 = cpu_to_le32(cmd.cdw14);
- admin_cmd.common.cdw15 = cpu_to_le32(cmd.cdw15);
-
- if (cmd.timeout_ms)
- timeout = msecs_to_jiffies(cmd.timeout_ms);
-
- status = spraid_submit_user_cmd(hdev->admin_q, &admin_cmd,
- (void __user *)(uintptr_t)cmd.addr, cmd.info_1.data_len,
- &cmd.result0, timeout);
-
- dev_info(hdev->dev, "user_admin_cmd status: 0x%x, result0: 0x%x, result1: 0x%x\n",
- status, cmd.result0, cmd.result1);
+ admin_cmd.common.opcode = cmd->opcode;
+ admin_cmd.common.flags = cmd->flags;
+ admin_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ admin_cmd.common.cdw2[0] = cpu_to_le32(cmd->cdw2);
+ admin_cmd.common.cdw2[1] = cpu_to_le32(cmd->cdw3);
+ admin_cmd.common.cdw10 = cpu_to_le32(cmd->cdw10);
+ admin_cmd.common.cdw11 = cpu_to_le32(cmd->cdw11);
+ admin_cmd.common.cdw12 = cpu_to_le32(cmd->cdw12);
+ admin_cmd.common.cdw13 = cpu_to_le32(cmd->cdw13);
+ admin_cmd.common.cdw14 = cpu_to_le32(cmd->cdw14);
+ admin_cmd.common.cdw15 = cpu_to_le32(cmd->cdw15);
+
+ status = spraid_bsg_map_data(hdev, job, &admin_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
+ }
+ status = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, &result[0], &result[1], timeout);
if (status >= 0) {
- if (put_user(cmd.result0, &ucmd->result0))
- return -EFAULT;
- if (put_user(cmd.result1, &ucmd->result1))
- return -EFAULT;
+ job->reply_len = sizeof(result);
+ memcpy(job->reply, result, sizeof(result));
}
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x], status[0x%x] result0[0x%x] result1[0x%x]\n",
+ __func__, cmd->opcode, cmd->info_0.subopcode, status, result[0], result[1]);
+
+ spraid_bsg_unmap_data(hdev, job);
+
return status;
}
@@ -2548,8 +2540,8 @@ static int spraid_alloc_ioq_ptcmds(struct spraid_dev *hdev)
INIT_LIST_HEAD(&hdev->ioq_pt_list);
spin_lock_init(&hdev->ioq_pt_lock);
- hdev->ioq_ptcmds = kcalloc_node(ptnum, sizeof(struct spraid_ioq_ptcmd),
- GFP_KERNEL, hdev->numa_node);
+ hdev->ioq_ptcmds = kcalloc_node(ptnum, sizeof(struct spraid_cmd),
+ GFP_KERNEL, hdev->numa_node);
if (!hdev->ioq_ptcmds) {
dev_err(hdev->dev, "Alloc ioq_ptcmds failed\n");
@@ -2567,55 +2559,35 @@ static int spraid_alloc_ioq_ptcmds(struct spraid_dev *hdev)
return 0;
}
-static struct spraid_ioq_ptcmd *spraid_get_ioq_ptcmd(struct spraid_dev *hdev)
+static void spraid_free_ioq_ptcmds(struct spraid_dev *hdev)
{
- struct spraid_ioq_ptcmd *cmd = NULL;
- unsigned long flags;
-
- spin_lock_irqsave(&hdev->ioq_pt_lock, flags);
- if (list_empty(&hdev->ioq_pt_list)) {
- spin_unlock_irqrestore(&hdev->ioq_pt_lock, flags);
- dev_err(hdev->dev, "err, ioq ptcmd list empty\n");
- return NULL;
- }
- cmd = list_entry((&hdev->ioq_pt_list)->next, struct spraid_ioq_ptcmd, list);
- list_del_init(&cmd->list);
- spin_unlock_irqrestore(&hdev->ioq_pt_lock, flags);
-
- WRITE_ONCE(cmd->state, SPRAID_CMD_IDLE);
-
- return cmd;
-}
-
-static void spraid_put_ioq_ptcmd(struct spraid_dev *hdev, struct spraid_ioq_ptcmd *cmd)
-{
- unsigned long flags;
+ kfree(hdev->ioq_ptcmds);
+ hdev->ioq_ptcmds = NULL;
- spin_lock_irqsave(&hdev->ioq_pt_lock, flags);
- list_add(&cmd->list, (&hdev->ioq_pt_list)->next);
- spin_unlock_irqrestore(&hdev->ioq_pt_lock, flags);
+ INIT_LIST_HEAD(&hdev->ioq_pt_list);
}
static int spraid_submit_ioq_sync_cmd(struct spraid_dev *hdev, struct spraid_ioq_command *cmd,
- u32 *result, void **sense, u32 timeout)
+ u32 *result, u32 *reslen, u32 timeout)
{
- struct spraid_queue *ioq;
int ret;
dma_addr_t sense_dma;
- struct spraid_ioq_ptcmd *pt_cmd = spraid_get_ioq_ptcmd(hdev);
-
- *sense = NULL;
+ struct spraid_queue *ioq;
+ void *sense_addr = NULL;
+ struct spraid_cmd *pt_cmd = spraid_get_cmd(hdev, SPRAID_CMD_IOPT);
- if (!pt_cmd)
+ if (!pt_cmd) {
+ dev_err(hdev->dev, "err, get ioq cmd failed\n");
return -EFAULT;
+ }
- dev_info(hdev->dev, "[%s] ptcmd, cid[%d], qid[%d]\n", __func__, pt_cmd->cid, pt_cmd->qid);
+ timeout = timeout ? timeout : ADMIN_TIMEOUT;
init_completion(&pt_cmd->cmd_done);
ioq = &hdev->queues[pt_cmd->qid];
ret = pt_cmd->cid * SCSI_SENSE_BUFFERSIZE;
- pt_cmd->priv = ioq->sense + ret;
+ sense_addr = ioq->sense + ret;
sense_dma = ioq->sense_dma_addr + ret;
cmd->common.sense_addr = cpu_to_le64(sense_dma);
@@ -2625,260 +2597,87 @@ static int spraid_submit_ioq_sync_cmd(struct spraid_dev *hdev, struct spraid_ioq
spraid_submit_cmd(ioq, cmd);
if (!wait_for_completion_timeout(&pt_cmd->cmd_done, timeout)) {
- dev_err(hdev->dev, "[%s] cid[%d], qid[%d] timeout\n",
- __func__, pt_cmd->cid, pt_cmd->qid);
+ dev_err(hdev->dev, "[%s] cid[%d] qid[%d] timeout, opcode[0x%x] subopcode[0x%x]\n",
+ __func__, pt_cmd->cid, pt_cmd->qid, cmd->common.opcode,
+ (le32_to_cpu(cmd->common.cdw3[0]) & 0xffff));
WRITE_ONCE(pt_cmd->state, SPRAID_CMD_TIMEOUT);
+ spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
return -EINVAL;
}
- if (result) {
- result[0] = pt_cmd->result0;
- result[1] = pt_cmd->result1;
+ if (result && reslen) {
+ if ((pt_cmd->status & 0x17f) == 0x101) {
+ memcpy(result, sense_addr, SCSI_SENSE_BUFFERSIZE);
+ *reslen = SCSI_SENSE_BUFFERSIZE;
+ }
}
- if ((pt_cmd->status & 0x17f) == 0x101)
- *sense = pt_cmd->priv;
+ spraid_put_cmd(hdev, pt_cmd, SPRAID_CMD_IOPT);
return pt_cmd->status;
}
-static int spraid_user_ioq_cmd(struct spraid_dev *hdev,
- struct spraid_ioq_passthru_cmd __user *ucmd)
+static int spraid_user_ioq_cmd(struct spraid_dev *hdev, struct bsg_job *job)
{
- struct spraid_ioq_passthru_cmd cmd;
+ struct spraid_bsg_request *bsg_req = (struct spraid_bsg_request *)(job->request);
+ struct spraid_ioq_passthru_cmd *cmd = &(bsg_req->ioqcmd);
struct spraid_ioq_command ioq_cmd;
- u32 timeout = 0;
int status = 0;
- u8 *data_ptr = NULL;
- dma_addr_t data_dma;
- enum dma_data_direction dma_dir = DMA_NONE;
- void *sense = NULL;
-
- if (!capable(CAP_SYS_ADMIN)) {
- dev_err(hdev->dev, "Current user hasn't administrator right, reject service\n");
- return -EACCES;
- }
-
- if (copy_from_user(&cmd, ucmd, sizeof(cmd))) {
- dev_err(hdev->dev, "Copy command from user space to kernel space failed\n");
- return -EFAULT;
- }
+ u32 timeout = msecs_to_jiffies(cmd->timeout_ms);
- if (cmd.data_len > PAGE_SIZE) {
+ if (cmd->data_len > PAGE_SIZE) {
dev_err(hdev->dev, "[%s] data len bigger than 4k\n", __func__);
return -EFAULT;
}
- dev_info(hdev->dev, "[%s] opcode: 0x%x, subopcode: 0x%x, datalen: %d\n",
- __func__, cmd.opcode, cmd.info_1.subopcode, cmd.data_len);
-
- if (cmd.addr && cmd.data_len) {
- data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
- if (!data_ptr)
- return -ENOMEM;
-
- dma_dir = (cmd.opcode & 1) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ if (hdev->state != SPRAID_LIVE) {
+ dev_err(hdev->dev, "[%s] err, host state:[%d] is not live\n",
+ __func__, hdev->state);
+ return -EBUSY;
}
- if (dma_dir == DMA_TO_DEVICE) {
- if (copy_from_user(data_ptr, (void __user *)(uintptr_t)cmd.addr, cmd.data_len)) {
- dev_err(hdev->dev, "[%s] copy user data failed\n", __func__);
- status = -EFAULT;
- goto free_dma_mem;
- }
- }
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x] init, datalen[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode, cmd->data_len);
memset(&ioq_cmd, 0, sizeof(ioq_cmd));
- ioq_cmd.common.opcode = cmd.opcode;
- ioq_cmd.common.flags = cmd.flags;
- ioq_cmd.common.hdid = cpu_to_le32(cmd.nsid);
- ioq_cmd.common.sense_len = cpu_to_le16(cmd.info_0.res_sense_len);
- ioq_cmd.common.cdb_len = cmd.info_0.cdb_len;
- ioq_cmd.common.rsvd2 = cmd.info_0.rsvd0;
- ioq_cmd.common.cdw3[0] = cpu_to_le32(cmd.cdw3);
- ioq_cmd.common.cdw3[1] = cpu_to_le32(cmd.cdw4);
- ioq_cmd.common.cdw3[2] = cpu_to_le32(cmd.cdw5);
- ioq_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
-
- ioq_cmd.common.cdw10[0] = cpu_to_le32(cmd.cdw10);
- ioq_cmd.common.cdw10[1] = cpu_to_le32(cmd.cdw11);
- ioq_cmd.common.cdw10[2] = cpu_to_le32(cmd.cdw12);
- ioq_cmd.common.cdw10[3] = cpu_to_le32(cmd.cdw13);
- ioq_cmd.common.cdw10[4] = cpu_to_le32(cmd.cdw14);
- ioq_cmd.common.cdw10[5] = cpu_to_le32(cmd.data_len);
-
- memcpy(ioq_cmd.common.cdb, &cmd.cdw16, cmd.info_0.cdb_len);
-
- ioq_cmd.common.cdw26[0] = cpu_to_le32(cmd.cdw26[0]);
- ioq_cmd.common.cdw26[1] = cpu_to_le32(cmd.cdw26[1]);
- ioq_cmd.common.cdw26[2] = cpu_to_le32(cmd.cdw26[2]);
- ioq_cmd.common.cdw26[3] = cpu_to_le32(cmd.cdw26[3]);
-
- if (cmd.timeout_ms)
- timeout = msecs_to_jiffies(cmd.timeout_ms);
- timeout = timeout ? timeout : ADMIN_TIMEOUT;
-
- status = spraid_submit_ioq_sync_cmd(hdev, &ioq_cmd, &cmd.result0, &sense, timeout);
-
- if (status >= 0) {
- if (put_user(cmd.result0, &ucmd->result0)) {
- status = -EFAULT;
- goto free_dma_mem;
- }
- if (put_user(cmd.result1, &ucmd->result1)) {
- status = -EFAULT;
- goto free_dma_mem;
- }
- if (dma_dir == DMA_FROM_DEVICE &&
- copy_to_user((void __user *)(uintptr_t)cmd.addr, data_ptr, cmd.data_len)) {
- status = -EFAULT;
- goto free_dma_mem;
- }
- }
-
- if (sense) {
- if (copy_to_user((void *__user *)(uintptr_t)cmd.sense_addr,
- sense, cmd.info_0.res_sense_len)) {
- status = -EFAULT;
- goto free_dma_mem;
- }
- }
-
-free_dma_mem:
- if (data_ptr)
- dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
-
- return status;
-
-}
-
-static int spraid_reset_work_sync(struct spraid_dev *hdev);
-
-static int spraid_user_reset_cmd(struct spraid_dev *hdev)
-{
- int ret;
-
- dev_info(hdev->dev, "[%s] start user reset cmd\n", __func__);
- ret = spraid_reset_work_sync(hdev);
- dev_info(hdev->dev, "[%s] stop user reset cmd[%d]\n", __func__, ret);
-
- return ret;
-}
-
-static int hdev_open(struct inode *inode, struct file *file)
-{
- struct spraid_dev *hdev =
- container_of(inode->i_cdev, struct spraid_dev, cdev);
- file->private_data = hdev;
- return 0;
-}
-
-static long hdev_ioctl(struct file *file, u32 cmd, unsigned long arg)
-{
- struct spraid_dev *hdev = file->private_data;
- void __user *argp = (void __user *)arg;
-
- switch (cmd) {
- case SPRAID_IOCTL_ADMIN_CMD:
- return spraid_user_admin_cmd(hdev, argp);
- case SPRAID_IOCTL_IOQ_CMD:
- return spraid_user_ioq_cmd(hdev, argp);
- case SPRAID_IOCTL_RESET_CMD:
- return spraid_user_reset_cmd(hdev);
- default:
- return -ENOTTY;
- }
-}
-
-static const struct file_operations spraid_dev_fops = {
- .owner = THIS_MODULE,
- .open = hdev_open,
- .unlocked_ioctl = hdev_ioctl,
- .compat_ioctl = hdev_ioctl,
-};
-
-static int spraid_create_cdev(struct spraid_dev *hdev)
-{
- int ret;
-
- device_initialize(&hdev->ctrl_device);
- hdev->ctrl_device.devt = MKDEV(MAJOR(spraid_chr_devt), hdev->instance);
- hdev->ctrl_device.class = spraid_class;
- hdev->ctrl_device.parent = hdev->dev;
- dev_set_drvdata(&hdev->ctrl_device, hdev);
- ret = dev_set_name(&hdev->ctrl_device, "spraid%d", hdev->instance);
- if (ret)
- return ret;
- cdev_init(&hdev->cdev, &spraid_dev_fops);
- hdev->cdev.owner = THIS_MODULE;
- ret = cdev_device_add(&hdev->cdev, &hdev->ctrl_device);
- if (ret) {
- dev_err(hdev->dev, "Add cdev failed, ret: %d", ret);
- put_device(&hdev->ctrl_device);
- kfree_const(hdev->ctrl_device.kobj.name);
- return ret;
- }
-
- return 0;
-}
-
-static inline void spraid_remove_cdev(struct spraid_dev *hdev)
-{
- cdev_device_del(&hdev->cdev, &hdev->ctrl_device);
-}
-
-static const struct blk_mq_ops spraid_admin_mq_ops = {
- .queue_rq = spraid_queue_admin_rq,
- .complete = spraid_complete_admin_rq,
- .init_hctx = spraid_admin_init_hctx,
- .init_request = spraid_admin_init_request,
- .timeout = spraid_admin_timeout,
-};
-
-static void spraid_remove_admin_tagset(struct spraid_dev *hdev)
-{
- if (hdev->admin_q && !blk_queue_dying(hdev->admin_q)) {
- blk_mq_unquiesce_queue(hdev->admin_q);
- blk_cleanup_queue(hdev->admin_q);
- blk_mq_free_tag_set(&hdev->admin_tagset);
+ ioq_cmd.common.opcode = cmd->opcode;
+ ioq_cmd.common.flags = cmd->flags;
+ ioq_cmd.common.hdid = cpu_to_le32(cmd->nsid);
+ ioq_cmd.common.sense_len = cpu_to_le16(cmd->info_0.res_sense_len);
+ ioq_cmd.common.cdb_len = cmd->info_0.cdb_len;
+ ioq_cmd.common.rsvd2 = cmd->info_0.rsvd0;
+ ioq_cmd.common.cdw3[0] = cpu_to_le32(cmd->cdw3);
+ ioq_cmd.common.cdw3[1] = cpu_to_le32(cmd->cdw4);
+ ioq_cmd.common.cdw3[2] = cpu_to_le32(cmd->cdw5);
+
+ ioq_cmd.common.cdw10[0] = cpu_to_le32(cmd->cdw10);
+ ioq_cmd.common.cdw10[1] = cpu_to_le32(cmd->cdw11);
+ ioq_cmd.common.cdw10[2] = cpu_to_le32(cmd->cdw12);
+ ioq_cmd.common.cdw10[3] = cpu_to_le32(cmd->cdw13);
+ ioq_cmd.common.cdw10[4] = cpu_to_le32(cmd->cdw14);
+ ioq_cmd.common.cdw10[5] = cpu_to_le32(cmd->data_len);
+
+ memcpy(ioq_cmd.common.cdb, &cmd->cdw16, cmd->info_0.cdb_len);
+
+ ioq_cmd.common.cdw26[0] = cpu_to_le32(cmd->cdw26[0]);
+ ioq_cmd.common.cdw26[1] = cpu_to_le32(cmd->cdw26[1]);
+ ioq_cmd.common.cdw26[2] = cpu_to_le32(cmd->cdw26[2]);
+ ioq_cmd.common.cdw26[3] = cpu_to_le32(cmd->cdw26[3]);
+
+ status = spraid_bsg_map_data(hdev, job, (struct spraid_admin_command *)&ioq_cmd);
+ if (status) {
+ dev_err(hdev->dev, "[%s] err, map data failed\n", __func__);
+ return status;
}
-}
-static int spraid_alloc_admin_tags(struct spraid_dev *hdev)
-{
- if (!hdev->admin_q) {
- hdev->admin_tagset.ops = &spraid_admin_mq_ops;
- hdev->admin_tagset.nr_hw_queues = 1;
+ status = spraid_submit_ioq_sync_cmd(hdev, &ioq_cmd, job->reply, &job->reply_len, timeout);
- hdev->admin_tagset.queue_depth = SPRAID_AQ_MQ_TAG_DEPTH;
- hdev->admin_tagset.timeout = ADMIN_TIMEOUT;
- hdev->admin_tagset.numa_node = hdev->numa_node;
- hdev->admin_tagset.cmd_size =
- spraid_cmd_size(hdev, true, false);
- hdev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
- hdev->admin_tagset.driver_data = hdev;
+ dev_info(hdev->dev, "[%s] opcode[0x%x] subopcode[0x%x], status[0x%x], reply_len[%d]\n",
+ __func__, cmd->opcode, cmd->info_1.subopcode, status, job->reply_len);
- if (blk_mq_alloc_tag_set(&hdev->admin_tagset)) {
- dev_err(hdev->dev, "Allocate admin tagset failed\n");
- return -ENOMEM;
- }
+ spraid_bsg_unmap_data(hdev, job);
- hdev->admin_q = blk_mq_init_queue(&hdev->admin_tagset);
- if (IS_ERR(hdev->admin_q)) {
- dev_err(hdev->dev, "Initialize admin request queue failed\n");
- blk_mq_free_tag_set(&hdev->admin_tagset);
- return -ENOMEM;
- }
- if (!blk_get_queue(hdev->admin_q)) {
- dev_err(hdev->dev, "Get admin request queue failed\n");
- spraid_remove_admin_tagset(hdev);
- hdev->admin_q = NULL;
- return -ENODEV;
- }
- } else {
- blk_mq_unquiesce_queue(hdev->admin_q);
- }
- return 0;
+ return status;
}
static bool spraid_check_scmd_completed(struct scsi_cmnd *scmd)
@@ -2891,7 +2690,7 @@ static bool spraid_check_scmd_completed(struct scsi_cmnd *scmd)
spraid_get_tag_from_scmd(scmd, &hwq, &cid);
spraidq = &hdev->queues[hwq];
if (READ_ONCE(iod->state) == SPRAID_CMD_COMPLETE || spraid_poll_cq(spraidq, cid)) {
- dev_warn(hdev->dev, "cid[%d], qid[%d] has been completed\n",
+ dev_warn(hdev->dev, "cid[%d] qid[%d] has been completed\n",
cid, spraidq->qid);
return true;
}
@@ -2927,8 +2726,7 @@ static int spraid_send_abort_cmd(struct spraid_dev *hdev, u32 hdid, u16 qid, u16
admin_cmd.abort.sqid = cpu_to_le16(qid);
admin_cmd.abort.cid = cpu_to_le16(cid);
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
}
/* send reset command by admin quueue temporary */
@@ -2941,8 +2739,7 @@ static int spraid_send_reset_cmd(struct spraid_dev *hdev, int type, u32 hdid)
admin_cmd.reset.hdid = cpu_to_le32(hdid);
admin_cmd.reset.type = type;
- return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
- NULL, 0, 0, 0, 0);
+ return spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
}
static bool spraid_change_host_state(struct spraid_dev *hdev, enum spraid_state newstate)
@@ -3022,7 +2819,7 @@ static void spraid_back_fault_cqe(struct spraid_queue *ioq, struct spraid_comple
scsi_dma_unmap(scmd);
spraid_free_iod_res(hdev, iod);
scmd->scsi_done(scmd);
- dev_warn(hdev->dev, "Back fault CQE, cid[%d], qid[%d]\n",
+ dev_warn(hdev->dev, "Back fault CQE, cid[%d] qid[%d]\n",
cqe->cmd_id, ioq->qid);
}
@@ -3032,6 +2829,8 @@ static void spraid_back_all_io(struct spraid_dev *hdev)
struct spraid_queue *ioq;
struct spraid_completion cqe = { 0 };
+ scsi_block_requests(hdev->shost);
+
for (i = 1; i <= hdev->shost->nr_hw_queues; i++) {
ioq = &hdev->queues[i];
for (j = 0; j < hdev->shost->can_queue; j++) {
@@ -3039,6 +2838,8 @@ static void spraid_back_all_io(struct spraid_dev *hdev)
spraid_back_fault_cqe(ioq, &cqe);
}
}
+
+ scsi_unblock_requests(hdev->shost);
}
static void spraid_dev_disable(struct spraid_dev *hdev, bool shutdown)
@@ -3106,17 +2907,13 @@ static void spraid_reset_work(struct work_struct *work)
if (ret)
goto pci_disable;
- ret = spraid_alloc_admin_tags(hdev);
- if (ret)
- goto pci_disable;
-
ret = spraid_setup_io_queues(hdev);
if (ret || hdev->online_queues <= hdev->shost->nr_hw_queues)
goto pci_disable;
spraid_change_host_state(hdev, SPRAID_LIVE);
- spraid_send_aen(hdev);
+ spraid_send_all_aen(hdev);
return;
@@ -3174,8 +2971,8 @@ static int spraid_abort_handler(struct scsi_cmnd *scmd)
scsi_print_command(scmd);
- if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd) ||
- hdev->state != SPRAID_LIVE)
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
return SUCCESS;
hostdata = scmd->device->hostdata;
@@ -3206,8 +3003,8 @@ static int spraid_tgt_reset_handler(struct scsi_cmnd *scmd)
scsi_print_command(scmd);
- if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd) ||
- hdev->state != SPRAID_LIVE)
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
return SUCCESS;
hostdata = scmd->device->hostdata;
@@ -3241,8 +3038,8 @@ static int spraid_bus_reset_handler(struct scsi_cmnd *scmd)
scsi_print_command(scmd);
- if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd) ||
- hdev->state != SPRAID_LIVE)
+ if (hdev->state != SPRAID_LIVE || !spraid_wait_abnl_cmd_done(iod) ||
+ spraid_check_scmd_completed(scmd))
return SUCCESS;
hostdata = scmd->device->hostdata;
@@ -3272,7 +3069,7 @@ static int spraid_shost_reset_handler(struct scsi_cmnd *scmd)
struct spraid_dev *hdev = shost_priv(scmd->device->host);
scsi_print_command(scmd);
- if (spraid_check_scmd_completed(scmd) || hdev->state != SPRAID_LIVE)
+ if (hdev->state != SPRAID_LIVE || spraid_check_scmd_completed(scmd))
return SUCCESS;
spraid_get_tag_from_scmd(scmd, &hwq, &cid);
@@ -3288,6 +3085,62 @@ static int spraid_shost_reset_handler(struct scsi_cmnd *scmd)
return SUCCESS;
}
+static pci_ers_result_t spraid_pci_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "enter pci error detect, state:%d\n", state);
+
+ switch (state) {
+ case pci_channel_io_normal:
+ dev_warn(hdev->dev, "channel is normal, do nothing\n");
+
+ return PCI_ERS_RESULT_CAN_RECOVER;
+ case pci_channel_io_frozen:
+ dev_warn(hdev->dev, "channel io frozen, need reset controller\n");
+
+ scsi_block_requests(hdev->shost);
+
+ spraid_change_host_state(hdev, SPRAID_RESETTING);
+
+ return PCI_ERS_RESULT_NEED_RESET;
+ case pci_channel_io_perm_failure:
+ dev_warn(hdev->dev, "channel io failure, request disconnect\n");
+
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+
+ return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t spraid_pci_slot_reset(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "restart after slot reset\n");
+
+ pci_restore_state(pdev);
+
+ if (!queue_work(spraid_wq, &hdev->reset_work)) {
+ dev_err(hdev->dev, "[%s] err, the device is resetting state\n", __func__);
+ return PCI_ERS_RESULT_NONE;
+ }
+
+ flush_work(&hdev->reset_work);
+
+ scsi_unblock_requests(hdev->shost);
+
+ return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void spraid_reset_done(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+
+ dev_info(hdev->dev, "enter spraid reset done\n");
+}
+
static ssize_t csts_pp_show(struct device *cdev, struct device_attribute *attr, char *buf)
{
struct Scsi_Host *shost = class_to_shost(cdev);
@@ -3347,7 +3200,7 @@ static ssize_t fw_version_show(struct device *cdev, struct device_attribute *att
struct Scsi_Host *shost = class_to_shost(cdev);
struct spraid_dev *hdev = shost_priv(shost);
- return snprintf(buf, sizeof(hdev->ctrl_info->fr), "%s\n", hdev->ctrl_info->fr);
+ return snprintf(buf, PAGE_SIZE, "%s\n", hdev->ctrl_info->fr);
}
static DEVICE_ATTR_RO(csts_pp);
@@ -3365,6 +3218,185 @@ static struct device_attribute *spraid_host_attrs[] = {
NULL,
};
+static int spraid_get_vd_info(struct spraid_dev *hdev, struct spraid_vd_info *vd_info, u16 vid)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.usr_cmd.opcode = USR_CMD_READ;
+ admin_cmd.usr_cmd.info_0.subopcode = cpu_to_le16(USR_CMD_VDINFO);
+ admin_cmd.usr_cmd.info_1.data_len = cpu_to_le16(USR_CMD_RDLEN);
+ admin_cmd.usr_cmd.info_1.param_len = cpu_to_le16(VDINFO_PARAM_LEN);
+ admin_cmd.usr_cmd.cdw10 = cpu_to_le32(vid);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(vd_info, data_ptr, sizeof(struct spraid_vd_info));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static int spraid_get_bgtask(struct spraid_dev *hdev, struct spraid_bgtask *bgtask)
+{
+ struct spraid_admin_command admin_cmd;
+ u8 *data_ptr = NULL;
+ dma_addr_t data_dma = 0;
+ int ret;
+
+ data_ptr = dma_alloc_coherent(hdev->dev, PAGE_SIZE, &data_dma, GFP_KERNEL);
+ if (!data_ptr)
+ return -ENOMEM;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.usr_cmd.opcode = USR_CMD_READ;
+ admin_cmd.usr_cmd.info_0.subopcode = cpu_to_le16(USR_CMD_BGTASK);
+ admin_cmd.usr_cmd.info_1.data_len = cpu_to_le16(USR_CMD_RDLEN);
+ admin_cmd.common.dptr.prp1 = cpu_to_le64(data_dma);
+
+ ret = spraid_submit_admin_sync_cmd(hdev, &admin_cmd, NULL, NULL, 0);
+ if (!ret)
+ memcpy(bgtask, data_ptr, sizeof(struct spraid_bgtask));
+
+ dma_free_coherent(hdev->dev, PAGE_SIZE, data_ptr, data_dma);
+
+ return ret;
+}
+
+static ssize_t raid_level_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_sdev_hostdata *hostdata;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret)
+ vd_info->rg_level = ARRAY_SIZE(raid_levels) - 1;
+
+ ret = (vd_info->rg_level < ARRAY_SIZE(raid_levels)) ?
+ vd_info->rg_level : (ARRAY_SIZE(raid_levels) - 1);
+
+ kfree(vd_info);
+
+ return snprintf(buf, PAGE_SIZE, "RAID-%s\n", raid_levels[ret]);
+}
+
+static ssize_t raid_state_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_sdev_hostdata *hostdata;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret) {
+ vd_info->vd_status = 0;
+ vd_info->rg_id = 0xff;
+ }
+
+ ret = (vd_info->vd_status < ARRAY_SIZE(raid_states)) ? vd_info->vd_status : 0;
+
+ kfree(vd_info);
+
+ return snprintf(buf, PAGE_SIZE, "%s\n", raid_states[ret]);
+}
+
+static ssize_t raid_resync_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct scsi_device *sdev;
+ struct spraid_dev *hdev;
+ struct spraid_vd_info *vd_info;
+ struct spraid_bgtask *bgtask;
+ struct spraid_sdev_hostdata *hostdata;
+ u8 rg_id, i, progress = 0;
+ int ret;
+
+ sdev = to_scsi_device(dev);
+ hdev = shost_priv(sdev->host);
+ hostdata = sdev->hostdata;
+
+ vd_info = kmalloc(sizeof(*vd_info), GFP_KERNEL);
+ if (!vd_info || !SPRAID_DEV_INFO_ATTR_VD(hostdata->attr))
+ return snprintf(buf, PAGE_SIZE, "NA\n");
+
+ ret = spraid_get_vd_info(hdev, vd_info, sdev->id);
+ if (ret)
+ goto out;
+
+ rg_id = vd_info->rg_id;
+
+ bgtask = (struct spraid_bgtask *)vd_info;
+ ret = spraid_get_bgtask(hdev, bgtask);
+ if (ret)
+ goto out;
+ for (i = 0; i < bgtask->task_num; i++) {
+ if ((bgtask->bgtask[i].type == BGTASK_TYPE_REBUILD) &&
+ (le16_to_cpu(bgtask->bgtask[i].vd_id) == rg_id))
+ progress = bgtask->bgtask[i].progress;
+ }
+
+out:
+ kfree(vd_info);
+ return snprintf(buf, PAGE_SIZE, "%d\n", progress);
+}
+
+static DEVICE_ATTR_RO(raid_level);
+static DEVICE_ATTR_RO(raid_state);
+static DEVICE_ATTR_RO(raid_resync);
+
+static struct device_attribute *spraid_dev_attrs[] = {
+ &dev_attr_raid_level,
+ &dev_attr_raid_state,
+ &dev_attr_raid_resync,
+ NULL,
+};
+
+static struct pci_error_handlers spraid_err_handler = {
+ .error_detected = spraid_pci_error_detected,
+ .slot_reset = spraid_pci_slot_reset,
+ .reset_done = spraid_reset_done,
+};
+
+static int spraid_sysfs_host_reset(struct Scsi_Host *shost, int reset_type)
+{
+ int ret;
+ struct spraid_dev *hdev = shost_priv(shost);
+
+ dev_info(hdev->dev, "[%s] start sysfs host reset cmd\n", __func__);
+ ret = spraid_reset_work_sync(hdev);
+ dev_info(hdev->dev, "[%s] stop sysfs host reset cmd[%d]\n", __func__, ret);
+
+ return ret;
+}
+
static struct scsi_host_template spraid_driver_template = {
.module = THIS_MODULE,
.name = "Ramaxel Logic spraid driver",
@@ -3379,9 +3411,11 @@ static struct scsi_host_template spraid_driver_template = {
.eh_bus_reset_handler = spraid_bus_reset_handler,
.eh_host_reset_handler = spraid_shost_reset_handler,
.change_queue_depth = scsi_change_queue_depth,
- .host_tagset = 1,
+ .host_tagset = 0,
.this_id = -1,
.shost_attrs = spraid_host_attrs,
+ .sdev_attrs = spraid_dev_attrs,
+ .host_reset = spraid_sysfs_host_reset,
};
static void spraid_shutdown(struct pci_dev *pdev)
@@ -3392,11 +3426,53 @@ static void spraid_shutdown(struct pci_dev *pdev)
spraid_disable_admin_queue(hdev, true);
}
+/* bsg dispatch user command */
+static int spraid_bsg_host_dispatch(struct bsg_job *job)
+{
+ struct Scsi_Host *shost = dev_to_shost(job->dev);
+ struct spraid_dev *hdev = shost_priv(shost);
+ struct request *rq = blk_mq_rq_from_pdu(job);
+ struct spraid_bsg_request *bsg_req = job->request;
+ int ret = 0;
+
+ dev_info(hdev->dev, "[%s] msgcode[%d], msglen[%d], timeout[%d], req_nsge[%d], req_len[%d]\n",
+ __func__, bsg_req->msgcode, job->request_len, rq->timeout,
+ job->request_payload.sg_cnt, job->request_payload.payload_len);
+
+ job->reply_len = 0;
+
+ switch (bsg_req->msgcode) {
+ case SPRAID_BSG_ADM:
+ ret = spraid_user_admin_cmd(hdev, job);
+ break;
+ case SPRAID_BSG_IOQ:
+ ret = spraid_user_ioq_cmd(hdev, job);
+ break;
+ default:
+ dev_info(hdev->dev, "[%s] unsupport msgcode[%d]\n", __func__, bsg_req->msgcode);
+ break;
+ }
+
+ if (ret > 0)
+ ret = ret | (ret << 8);
+
+ bsg_job_done(job, ret, 0);
+ return 0;
+}
+
+static inline void spraid_remove_bsg(struct spraid_dev *hdev)
+{
+ if (hdev->bsg_queue) {
+ bsg_unregister_queue(hdev->bsg_queue);
+ blk_cleanup_queue(hdev->bsg_queue);
+ }
+}
static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct spraid_dev *hdev;
struct Scsi_Host *shost;
int node, ret;
+ char bsg_name[15];
shost = scsi_host_alloc(&spraid_driver_template, sizeof(*hdev));
if (!shost) {
@@ -3421,10 +3497,10 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto put_dev;
init_rwsem(&hdev->devices_rwsem);
- INIT_WORK(&hdev->aen_work, spraid_async_event_work);
INIT_WORK(&hdev->scan_work, spraid_scan_work);
INIT_WORK(&hdev->timesyn_work, spraid_timesyn_work);
INIT_WORK(&hdev->reset_work, spraid_reset_work);
+ INIT_WORK(&hdev->fw_act_work, spraid_fw_act_work);
spin_lock_init(&hdev->state_lock);
ret = spraid_alloc_resources(hdev);
@@ -3439,17 +3515,13 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (ret)
goto pci_disable;
- ret = spraid_alloc_admin_tags(hdev);
- if (ret)
- goto disable_admin_q;
-
ret = spraid_init_ctrl_info(hdev);
if (ret)
- goto free_admin_tagset;
+ goto disable_admin_q;
ret = spraid_alloc_iod_ext_mem_pool(hdev);
if (ret)
- goto free_admin_tagset;
+ goto disable_admin_q;
ret = spraid_setup_io_queues(hdev);
if (ret)
@@ -3464,9 +3536,15 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto remove_io_queues;
}
- ret = spraid_create_cdev(hdev);
- if (ret)
+ snprintf(bsg_name, sizeof(bsg_name), "spraid%d", shost->host_no);
+ hdev->bsg_queue = bsg_setup_queue(&shost->shost_gendev, bsg_name,
+ spraid_bsg_host_dispatch, NULL,
+ spraid_cmd_size(hdev, true, false));
+ if (IS_ERR(hdev->bsg_queue)) {
+ dev_err(hdev->dev, "err, setup bsg failed\n");
+ hdev->bsg_queue = NULL;
goto remove_io_queues;
+ }
if (hdev->online_queues == SPRAID_ADMIN_QUEUE_NUM) {
dev_warn(hdev->dev, "warn only admin queue can be used\n");
@@ -3475,11 +3553,11 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
hdev->state = SPRAID_LIVE;
- spraid_send_aen(hdev);
+ spraid_send_all_aen(hdev);
ret = spraid_dev_list_init(hdev);
if (ret)
- goto remove_cdev;
+ goto remove_bsg;
ret = spraid_configure_timestamp(hdev);
if (ret)
@@ -3487,20 +3565,18 @@ static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
ret = spraid_alloc_ioq_ptcmds(hdev);
if (ret)
- goto remove_cdev;
+ goto remove_bsg;
scsi_scan_host(hdev->shost);
return 0;
-remove_cdev:
- spraid_remove_cdev(hdev);
+remove_bsg:
+ spraid_remove_bsg(hdev);
remove_io_queues:
spraid_remove_io_queues(hdev);
free_iod_mempool:
spraid_free_iod_ext_mem_pool(hdev);
-free_admin_tagset:
- spraid_remove_admin_tagset(hdev);
disable_admin_q:
spraid_disable_admin_queue(hdev, false);
pci_disable:
@@ -3524,22 +3600,17 @@ static void spraid_remove(struct pci_dev *pdev)
dev_info(hdev->dev, "enter spraid remove\n");
spraid_change_host_state(hdev, SPRAID_DELETING);
+ flush_work(&hdev->reset_work);
- if (!pci_device_is_present(pdev)) {
- scsi_block_requests(shost);
+ if (!pci_device_is_present(pdev))
spraid_back_all_io(hdev);
- scsi_unblock_requests(shost);
- }
- flush_work(&hdev->reset_work);
+ spraid_remove_bsg(hdev);
scsi_remove_host(shost);
-
- kfree(hdev->ioq_ptcmds);
+ spraid_free_ioq_ptcmds(hdev);
kfree(hdev->devices);
- spraid_remove_cdev(hdev);
spraid_remove_io_queues(hdev);
spraid_free_iod_ext_mem_pool(hdev);
- spraid_remove_admin_tagset(hdev);
spraid_disable_admin_queue(hdev, false);
spraid_pci_disable(hdev);
spraid_free_resources(hdev);
@@ -3551,7 +3622,7 @@ static void spraid_remove(struct pci_dev *pdev)
}
static const struct pci_device_id spraid_id_table[] = {
- { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC, SPRAID_SERVER_DEVICE_HAB_DID) },
+ { PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC, SPRAID_SERVER_DEVICE_HBA_DID) },
{ PCI_DEVICE(PCI_VENDOR_ID_RAMAXEL_LOGIC, SPRAID_SERVER_DEVICE_RAID_DID) },
{ 0, }
};
@@ -3563,6 +3634,7 @@ static struct pci_driver spraid_driver = {
.probe = spraid_probe,
.remove = spraid_remove,
.shutdown = spraid_shutdown,
+ .err_handler = &spraid_err_handler,
};
static int __init spraid_init(void)
@@ -3573,14 +3645,10 @@ static int __init spraid_init(void)
if (!spraid_wq)
return -ENOMEM;
- ret = alloc_chrdev_region(&spraid_chr_devt, 0, SPRAID_MINORS, "spraid");
- if (ret < 0)
- goto destroy_wq;
-
spraid_class = class_create(THIS_MODULE, "spraid");
if (IS_ERR(spraid_class)) {
ret = PTR_ERR(spraid_class);
- goto unregister_chrdev;
+ goto destroy_wq;
}
ret = pci_register_driver(&spraid_driver);
@@ -3591,8 +3659,6 @@ static int __init spraid_init(void)
destroy_class:
class_destroy(spraid_class);
-unregister_chrdev:
- unregister_chrdev_region(spraid_chr_devt, SPRAID_MINORS);
destroy_wq:
destroy_workqueue(spraid_wq);
@@ -3603,12 +3669,11 @@ static void __exit spraid_exit(void)
{
pci_unregister_driver(&spraid_driver);
class_destroy(spraid_class);
- unregister_chrdev_region(spraid_chr_devt, SPRAID_MINORS);
destroy_workqueue(spraid_wq);
ida_destroy(&spraid_instance_ida);
}
-MODULE_AUTHOR("Ramaxel Memory Technology");
+MODULE_AUTHOR("songyl(a)ramaxel.com");
MODULE_DESCRIPTION("Ramaxel Memory Technology SPraid Driver");
MODULE_LICENSE("GPL");
MODULE_VERSION(SPRAID_DRV_VERSION);
--
2.27.0
1
0
[PATCH openEuler-1.0-LTS v2 01/10] arm64: mm: Restore mm_cpumask (revert commit 38d96287504a ("arm64: mm: kill mm_cpumask usage"))
by Yang Yingliang 20 Dec '21
From: Takao Indoh <indou.takao(a)fujitsu.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4BLL0
CVE: NA
---------------------------
mm_cpumask was deleted by commit 38d96287504a ("arm64: mm: kill
mm_cpumask usage") because it was unused at the time. It is now needed
to find the appropriate CPUs for a TLB flush, so this patch reverts that
commit.
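As a rough illustration of why the mask matters (a minimal sketch under
assumed usage, not code from this series), a targeted flush can IPI only
the CPUs recorded in mm_cpumask() instead of broadcasting to every CPU:

#include <linux/smp.h>
#include <linux/mm_types.h>

/* Hypothetical per-CPU callback; the real invalidation is arch code. */
static void sketch_flush_one(void *info)
{
	struct mm_struct *mm = info;

	(void)mm;	/* local TLB invalidation for this mm would go here */
}

/* Flush only on the CPUs that have ever run @mm, per mm_cpumask(). */
static void sketch_flush_tlb_mm(struct mm_struct *mm)
{
	on_each_cpu_mask(mm_cpumask(mm), sketch_flush_one, mm, 1);
}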
Signed-off-by: QI Fuli <qi.fuli(a)fujitsu.com>
Signed-off-by: Takao Indoh <indou.takao(a)fujitsu.com>
Signed-off-by: Cheng Jian <cj.chengjian(a)huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/arm64/kernel/smp.c | 6 ++++++
arch/arm64/mm/context.c | 2 ++
2 files changed, 8 insertions(+)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index fe562778de352..e86940d353a3e 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -387,6 +387,7 @@ asmlinkage notrace void secondary_start_kernel(void)
*/
mmgrab(mm);
current->active_mm = mm;
+ cpumask_set_cpu(cpu, mm_cpumask(mm));
/*
* TTBR0 is only used for the identity mapping at this stage. Make it
@@ -489,6 +490,11 @@ int __cpu_disable(void)
*/
irq_migrate_all_off_this_cpu();
+ /*
+ * Remove this CPU from the vm mask set of all processes.
+ */
+ clear_tasks_mm_cpumask(cpu);
+
return 0;
}
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 2b80ceff5d6c2..27d1f3fec1cc9 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -207,6 +207,7 @@ static u64 new_context(struct mm_struct *mm, unsigned int cpu)
set_asid:
__set_bit(asid, asid_map);
cur_idx = asid;
+ cpumask_clear(mm_cpumask(mm));
return idx2asid(asid) | generation;
}
@@ -254,6 +255,7 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
switch_mm_fastpath:
arm64_apply_bp_hardening();
+ cpumask_set_cpu(cpu, mm_cpumask(mm));
/*
* Defer TTBR0_EL1 setting for user threads to uaccess_enable() when
--
2.25.1
1
9

[PATCH openEuler-1.0-LTS 1/2] audit: improve robustness of the audit queue handling
by Yang Yingliang 20 Dec '21
From: Paul Moore <paul(a)paul-moore.com>
hulk inclusion
category: bugfix
bugzilla: 185904, https://gitee.com/openeuler/kernel/issues/I4NAZN
CVE: NA
Reference: https://patchwork.kernel.org/project/linux-audit/patch/163942029335.62691.7…
-------------------------------------------------------------------
If the audit daemon were ever to get stuck in a stopped state the
kernel's kauditd_thread() could get blocked attempting to send audit
records to the userspace audit daemon. With the kernel thread
blocked it is possible that the audit queue could grow unbounded as
certain audit record generating events must be exempt from the queue
limits else the system enters a deadlock state.
This patch resolves this problem by lowering the kernel thread's
socket sending timeout from MAX_SCHEDULE_TIMEOUT to HZ/10 and tweaks
the kauditd_send_queue() function to better manage the various audit
queues when connection problems occur between the kernel and the
audit daemon. With this patch, the backlog may temporarily grow
beyond the defined limits when the audit daemon is stopped and the
system is under heavy audit pressure, but kauditd_thread() will
continue to make progress and drain the queues as it would for other
connection problems. For example, with the audit daemon put into a
stopped state and the system configured to audit every syscall it
was still possible to shutdown the system without a kernel panic,
deadlock, etc.; granted, the system was slow to shutdown but that is
to be expected given the extreme pressure of recording every syscall.
The timeout value of HZ/10 was chosen primarily through
experimentation and this developer's "gut feeling". There is likely
no one perfect value, but as this scenario is limited in scope (root
privileges would be needed to send SIGSTOP to the audit daemon), it
is likely not worth exposing this as a tunable at present. This can
always be done at a later date if it proves necessary.
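For readers skimming the diff below, the drain-with-bounded-retry
pattern it arrives at looks roughly like this (a simplified sketch of
kauditd_send_queue(), with the skb_hook/err_hook details omitted):

#include <linux/netlink.h>
#include <linux/skbuff.h>

/* Sketch: drain @queue, retrying each skb at most @retry_limit times. */
static int sketch_send_queue(struct sock *sk, u32 portid,
			     struct sk_buff_head *queue,
			     unsigned int retry_limit)
{
	struct sk_buff *skb;
	unsigned int failed = 0;	/* local, reset on every flush */
	int rc = 0;

	while ((skb = skb_dequeue(queue))) {
		if (!sk) {		/* connection already given up on */
			kfree_skb(skb);
			continue;
		}
retry:
		skb_get(skb);		/* extra ref in case the send fails */
		rc = netlink_unicast(sk, skb, portid, 0);
		if (rc < 0) {
			if (++failed >= retry_limit ||
			    rc == -ECONNREFUSED || rc == -EPERM) {
				sk = NULL;	/* stop sending, keep draining */
				kfree_skb(skb);
				continue;
			}
			goto retry;	/* transient error, try again */
		}
		consume_skb(skb);	/* sent, drop the extra reference */
		failed = 0;
	}
	return rc < 0 ? rc : 0;
}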
Cc: stable(a)vger.kernel.org
Fixes: 5b52330bbfe63 ("audit: fix auditd/kernel connection state tracking")
Reported-by: Gaosheng Cui <cuigaosheng1(a)huawei.com>
Tested-by: Gaosheng Cui <cuigaosheng1(a)huawei.com>
Reviewed-by: Richard Guy Briggs <rgb(a)redhat.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
Signed-off-by: Cui GaoSheng <cuigaosheng1(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
kernel/audit.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/kernel/audit.c b/kernel/audit.c
index 45741c3c48a47..968921d376b98 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -726,7 +726,7 @@ static int kauditd_send_queue(struct sock *sk, u32 portid,
{
int rc = 0;
struct sk_buff *skb;
- static unsigned int failed = 0;
+ unsigned int failed = 0;
/* NOTE: kauditd_thread takes care of all our locking, we just use
* the netlink info passed to us (e.g. sk and portid) */
@@ -743,32 +743,30 @@ static int kauditd_send_queue(struct sock *sk, u32 portid,
continue;
}
+retry:
/* grab an extra skb reference in case of error */
skb_get(skb);
rc = netlink_unicast(sk, skb, portid, 0);
if (rc < 0) {
- /* fatal failure for our queue flush attempt? */
+ /* send failed - try a few times unless fatal error */
if (++failed >= retry_limit ||
rc == -ECONNREFUSED || rc == -EPERM) {
- /* yes - error processing for the queue */
sk = NULL;
if (err_hook)
(*err_hook)(skb);
- if (!skb_hook)
- goto out;
- /* keep processing with the skb_hook */
+ if (rc == -EAGAIN)
+ rc = 0;
+ /* continue to drain the queue */
continue;
} else
- /* no - requeue to preserve ordering */
- skb_queue_head(queue, skb);
+ goto retry;
} else {
- /* it worked - drop the extra reference and continue */
+ /* skb sent - drop the extra reference and continue */
consume_skb(skb);
failed = 0;
}
}
-out:
return (rc >= 0 ? 0 : rc);
}
@@ -1557,7 +1555,8 @@ static int __net_init audit_net_init(struct net *net)
audit_panic("cannot initialize netlink socket in namespace");
return -ENOMEM;
}
- aunet->sk->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
+ /* limit the timeout in case auditd is blocked/stopped */
+ aunet->sk->sk_sndtimeo = HZ / 10;
return 0;
}
--
2.25.1
1
1
backport psi feature and avoid kabi change
bugzilla: https://gitee.com/openeuler/kernel/issues/I47QS2
Changes since v9:
1. Third-party kernel modules should not use struct cgroup, so the
struct changes are hidden behind __GENKSYMS__ to avoid kabi changes
(suggested by Xie Xiuqi).
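The usual shape of that workaround (an assumed sketch, not the exact
hunk from this series) hides the new member from the symbol-CRC pass:

struct psi_group;			/* from the new psi_types.h */

/*
 * Sketch: the addition is invisible to genksyms, so exported-symbol
 * CRCs stay unchanged. This is only safe because out-of-tree modules
 * are not expected to embed or dereference struct cgroup directly.
 */
struct cgroup_sketch {
	unsigned long flags;		/* pre-existing members keep offsets */
#ifndef __GENKSYMS__
	struct psi_group *psi;		/* hypothetical new PSI state */
#endif
};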
Baruch Siach (1):
psi: fix reference to kernel commandline enable
Dan Schatzberg (1):
kernel/sched/psi.c: expose pressure metrics on root cgroup
Jason Xing (1):
psi: get poll_work to run when calling poll syscall next time
Johannes Weiner (14):
mm: workingset: tell cache transitions from workingset thrashing
sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
sched: loadavg: make calc_load_n() public
sched: sched.h: make rq locking and clock functions available in
stats.h
sched: introduce this_rq_lock_irq()
psi: pressure stall information for CPU, memory, and IO
psi: cgroup support
psi: make disabling/enabling easier for vendor kernels
psi: fix aggregation idle shut-off
psi: avoid divide-by-zero crash inside virtual machines
fs: kernfs: add poll file operation
kernel: cgroup: add poll file operation
sched/psi: Fix sampling error and rare div0 crashes with cgroups and
high uptime
psi: Fix a division error in psi poll()
Josef Bacik (1):
blk-iolatency: use a percentile approache for ssd's
Liu Xinpeng (2):
psi:enable psi in config
psi:avoid kabi change
Miklos Szeredi (1):
fuse: ignore PG_workingset after stealing
Miles Chen (1):
sched/psi: Correct overly pessimistic size calculation
Odin Ugedal (1):
cgroup: fix psi monitor for root cgroup
Olof Johansson (1):
kernel/sched/psi.c: simplify cgroup_move_task()
Peter Zijlstra (1):
sched/psi: Reduce psimon FIFO priority
Suren Baghdasaryan (9):
psi: introduce state_mask to represent stalled psi states
psi: make psi_enable static
psi: rename psi fields in preparation for psi trigger addition
psi: split update_stats into parts
psi: track changed states
include/: refactor headers to allow kthread.h inclusion in psi_types.h
psi: introduce psi monitor
sched/psi: Do not require setsched permission from the
sched/psi: Fix OOB write when writing 0 bytes to PSI files
Yafang Shao (1):
mm, memcg: add workingset_restore in memory.stat
Documentation/accounting/psi.txt | 180 ++++
Documentation/admin-guide/cgroup-v2.rst | 22 +
Documentation/admin-guide/kernel-parameters.txt | 4 +
arch/arm64/configs/openeuler_defconfig | 2 +
arch/powerpc/platforms/cell/cpufreq_spudemand.c | 2 +-
arch/powerpc/platforms/cell/spufs/sched.c | 9 +-
arch/s390/appldata/appldata_os.c | 4 -
arch/x86/configs/openeuler_defconfig | 2 +
block/blk-iolatency.c | 183 +++-
drivers/cpuidle/governors/menu.c | 4 -
drivers/spi/spi-rockchip.c | 1 +
fs/fuse/dev.c | 1 +
fs/kernfs/file.c | 31 +-
fs/proc/loadavg.c | 3 -
include/linux/cgroup-defs.h | 11 +
include/linux/cgroup.h | 15 +
include/linux/kernfs.h | 8 +
include/linux/kthread.h | 4 +
include/linux/page-flags.h | 5 +
include/linux/psi.h | 65 ++
include/linux/psi_types.h | 173 +++
include/linux/sched.h | 13 +
include/linux/sched/loadavg.h | 24 +-
include/linux/swap.h | 1 +
include/trace/events/mmflags.h | 1 +
init/Kconfig | 28 +
kernel/cgroup/cgroup.c | 131 ++-
kernel/debug/kdb/kdb_main.c | 7 +-
kernel/fork.c | 4 +
kernel/kthread.c | 1 +
kernel/sched/Makefile | 1 +
kernel/sched/core.c | 16 +-
kernel/sched/loadavg.c | 139 ++-
kernel/sched/psi.c | 1292 +++++++++++++++++++++++
kernel/sched/sched.h | 178 ++--
kernel/sched/stats.h | 86 ++
kernel/workqueue.c | 23 +
kernel/workqueue_internal.h | 6 +-
mm/compaction.c | 5 +
mm/filemap.c | 20 +-
mm/huge_memory.c | 1 +
mm/migrate.c | 2 +
mm/page_alloc.c | 9 +
mm/swap_state.c | 1 +
mm/vmscan.c | 10 +
mm/workingset.c | 113 +-
46 files changed, 2562 insertions(+), 279 deletions(-)
create mode 100644 Documentation/accounting/psi.txt
create mode 100644 include/linux/psi.h
create mode 100644 include/linux/psi_types.h
create mode 100644 kernel/sched/psi.c
--
1.8.3.1
1
35
[PATCH openEuler-1.0-LTS] block/wbt: fix negative inflight counter when remove scsi device
by Yang Yingliang 18 Dec '21
From: Laibin Qiu <qiulaibin(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 185857
CVE: NA
--------------------------------
Wbt is disabled by setting WBT_STATE_OFF_DEFAULT in
wbt_disable_default() when the elevator is switched to bfq, and it is
re-enabled by wbt_enable_default() when the scsi device is removed. A
write request submitted in between can therefore see wbt disabled in
wbt_wait() but enabled in wbt_track(), a false positive.
The following is the scenario that triggered the problem.
T1                      T2                      T3
elevator_switch_mq
 bfq_init_queue
  wbt_disable_default
  <= set rwb->enable_state (OFF)
                        submit_bio
                        blk_mq_make_request
                         rq_qos_throttle
                         <= rwb->enable_state (OFF)
                                                scsi_remove_device
                                                 sd_remove
                                                  del_gendisk
                                                   blk_unregister_queue
                                                    elv_unregister_queue
                                                     wbt_enable_default
                                                     <= set rwb->enable_state (ON)
                         rq_qos_track
                         <= rwb->enable_state (ON)
^^^^^^ this request is marked WBT_TRACKED without the matching inflight
increment, so wbt_done() later drops rqw->inflight to -1, which triggers
an IO hang.
Fix this by moving wbt_enable_default() from elv_unregister_queue() to
bfq_exit_queue(), so that wbt is only re-enabled when bfq exits.
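The broken invariant can be boiled down to a few lines (a schematic
sketch, not the real blk-wbt code): a request may only carry the tracked
flag if the same enable-state decision also bumped the inflight counter.

#include <linux/atomic.h>

#define SKETCH_TRACKED	(1UL << 0)

struct sketch_rqw {
	atomic_t inflight;
};

/* wbt_wait() side: wbt looked disabled, so no inflight increment. */
static void sketch_throttle(struct sketch_rqw *rqw, bool enabled)
{
	if (enabled)
		atomic_inc(&rqw->inflight);
}

/* wbt_track() side: wbt re-enabled in between, request gets flagged. */
static void sketch_track(unsigned long *rq_flags, bool enabled)
{
	if (enabled)
		*rq_flags |= SKETCH_TRACKED;
}

/* wbt_done() side: the flagged request decrements, inflight hits -1. */
static void sketch_done(struct sketch_rqw *rqw, unsigned long rq_flags)
{
	if (rq_flags & SKETCH_TRACKED)
		atomic_dec(&rqw->inflight);
}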
Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")
Conflict:
block/elevator.c
Signed-off-by: Laibin Qiu <qiulaibin(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
block/bfq-iosched.c | 4 ++++
block/elevator.c | 2 --
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 9078299ffe3da..fe4255698d810 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5381,6 +5381,7 @@ static void bfq_exit_queue(struct elevator_queue *e)
{
struct bfq_data *bfqd = e->elevator_data;
struct bfq_queue *bfqq, *n;
+ struct request_queue *q = bfqd->queue;
hrtimer_cancel(&bfqd->idle_slice_timer);
@@ -5404,6 +5405,9 @@ static void bfq_exit_queue(struct elevator_queue *e)
#endif
kfree(bfqd);
+
+ /* Re-enable throttling in case elevator disabled it */
+ wbt_enable_default(q);
}
static void bfq_init_root_group(struct bfq_group *root_group,
diff --git a/block/elevator.c b/block/elevator.c
index 70ea9f3e64d7e..9621a812319e7 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -876,8 +876,6 @@ void elv_unregister_queue(struct request_queue *q)
kobject_uevent(&e->kobj, KOBJ_REMOVE);
kobject_del(&e->kobj);
e->registered = 0;
- /* Re-enable throttling in case elevator disabled it */
- wbt_enable_default(q);
}
}
--
2.25.1
1
0
18 Dec '21
From: Ye Bin <yebin10(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 185875
CVE: NA
-----------------------------------------------
We got the following issue:
[ 833.786542] nbd: failed to add new device
[ 833.791613] ==================================================================
[ 833.794918] BUG: KASAN: use-after-free in blk_mq_free_rqs+0x558/0x6c0
[ 833.798108] Read of size 8 at addr ffff800109b7c288 by task kworker/0:3/113
[ 833.804216] CPU: 0 PID: 113 Comm: kworker/0:3 Kdump: loaded Not tainted 4.19.90 #1
[ 833.807635] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 833.811798] Workqueue: events __blk_release_queue
[ 833.815035] Call trace:
[ 833.817964] dump_backtrace+0x0/0x3c0
[ 833.821070] show_stack+0x28/0x38
[ 833.824091] dump_stack+0xfc/0x154
[ 833.827042] print_address_description+0x68/0x278
[ 833.830147] kasan_report+0x204/0x330
[ 833.833121] __asan_report_load8_noabort+0x30/0x40
[ 833.836180] blk_mq_free_rqs+0x558/0x6c0
[ 833.839089] blk_mq_sched_tags_teardown+0xf4/0x1c0
[ 833.842035] blk_mq_exit_sched+0x1b8/0x260
[ 833.844878] elevator_exit+0x114/0x148
[ 833.847634] blk_exit_queue+0x68/0xe8
[ 833.850352] __blk_release_queue+0xd0/0x408
[ 833.853113] process_one_work+0x55c/0x10d0
[ 833.855864] worker_thread+0x3d4/0xe30
[ 833.858558] kthread+0x2c8/0x348
[ 833.861139] ret_from_fork+0x10/0x18
[ 833.863714]
[ 833.866000] Allocated by task 186531:
[ 833.868467] kasan_kmalloc+0xe0/0x190
[ 833.870936] kmem_cache_alloc_trace+0x104/0x218
[ 833.873483] nbd_dev_add+0x54/0x760 [nbd]
[ 833.875988] nbd_genl_connect+0x3c4/0x1348 [nbd]
[ 833.878591] genl_family_rcv_msg+0x798/0xa10
[ 833.881113] genl_rcv_msg+0xc0/0x170
[ 833.883489] netlink_rcv_skb+0x1b4/0x370
[ 833.885897] genl_rcv+0x40/0x58
[ 833.888225] netlink_unicast+0x4bc/0x660
[ 833.890661] netlink_sendmsg+0x880/0xa60
[ 833.893112] sock_sendmsg+0xb8/0x110
[ 833.895513] ____sys_sendmsg+0x570/0x698
[ 833.897927] ___sys_sendmsg+0x108/0x188
[ 833.900350] __sys_sendmsg+0xe8/0x198
[ 833.900360] __arm64_sys_sendmsg+0x78/0xa8
[ 833.906911] el0_svc_common+0x10c/0x330
[ 833.909289] el0_svc_handler+0x60/0xd0
[ 833.911660] el0_svc+0x8/0x1b0
[ 833.913963]
[ 833.916117] Freed by task 186531:
[ 833.918445] __kasan_slab_free+0x120/0x228
[ 833.920860] kasan_slab_free+0x10/0x18
[ 833.923193] kfree+0x80/0x1f0
[ 833.925392] nbd_dev_add+0xf0/0x760 [nbd]
[ 833.927686] nbd_genl_connect+0x3c4/0x1348 [nbd]
[ 833.929989] genl_family_rcv_msg+0x798/0xa10
[ 833.932231] genl_rcv_msg+0xc0/0x170
[ 833.934335] netlink_rcv_skb+0x1b4/0x370
[ 833.936444] genl_rcv+0x40/0x58
[ 833.938460] netlink_unicast+0x4bc/0x660
[ 833.940570] netlink_sendmsg+0x880/0xa60
[ 833.942682] sock_sendmsg+0xb8/0x110
[ 833.944745] ____sys_sendmsg+0x570/0x698
[ 833.946849] ___sys_sendmsg+0x108/0x188
[ 833.948924] __sys_sendmsg+0xe8/0x198
[ 833.950980] __arm64_sys_sendmsg+0x78/0xa8
[ 833.953059] el0_svc_common+0x10c/0x330
[ 833.955121] el0_svc_handler+0x60/0xd0
[ 833.957143] el0_svc+0x8/0x1b0
[ 833.959088]
[ 833.960846] The buggy address belongs to the object at ffff800109b7c280
[ 833.960846] which belongs to the cache kmalloc-512 of size 512
[ 833.965502] The buggy address is located 8 bytes inside of
[ 833.965502] 512-byte region [ffff800109b7c280, ffff800109b7c480)
[ 833.970108] The buggy address belongs to the page:
[ 833.972390] page:ffff7e000426df00 count:1 mapcount:0 mapping:ffff8000c0003800 index:0x0 compound_mapcount: 0
[ 833.975269] flags: 0x7ffff0000008100(slab|head)
[ 833.977640] raw: 07ffff0000008100 ffff7e00035d1600 0000000200000002 ffff8000c0003800
[ 833.980426] raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
[ 833.983212] page dumped because: kasan: bad access detected
[ 833.985778]
[ 833.987935] Memory state around the buggy address:
[ 833.990448] ffff800109b7c180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 833.993265] ffff800109b7c200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 833.996097] >ffff800109b7c280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 833.998930] ^
[ 834.001384] ffff800109b7c300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 834.004329] ffff800109b7c380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 834.007264] ==================================================================
Commit 98fd847a495f added a check on "disk->first_minor"; when that
check fails, the error path frees the tags and the subsequent put_disk()
frees the request_queue:
blk_mq_free_tag_set
  blk_mq_free_map_and_requests
    blk_mq_free_rqs
put_disk:
  __blk_release_queue
    blk_exit_queue
      elevator_exit
        blk_mq_exit_sched
          blk_mq_sched_tags_teardown
            blk_mq_free_rqs  --> will trigger UAF
To address this issue, move the 'disk->first_minor' check to the very
beginning of nbd_dev_add().
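To make the range check concrete, here is a small userspace
demonstration with assumed values (part_shift is normally derived from
a module parameter):

#include <stdio.h>

#define MINORMASK 0xfffff	/* (1U << 20) - 1, as in the kernel */

int main(void)
{
	int part_shift = 5;			/* assumed value */
	int index = 1 << 16;			/* a too-big index */
	int first_minor = index << part_shift;	/* 0x200000 */

	if (first_minor < index || first_minor > MINORMASK)
		printf("index %d rejected: first_minor 0x%x out of range\n",
		       index, first_minor);
	return 0;
}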
Fixes: 98fd847a495f ("nbd: add sanity check for first_minor")
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/block/nbd.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 775cbb4c1bbcd..a35189098581b 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1702,6 +1702,15 @@ static int nbd_dev_add(int index)
struct gendisk *disk;
struct request_queue *q;
int err = -ENOMEM;
+ int first_minor = index << part_shift;
+
+ /*
+ * Too big index can cause duplicate creation of sysfs files/links,
+ * because MKDEV() expect that the max first minor is MINORMASK, or
+ * index << part_shift can overflow.
+ */
+ if (first_minor < index || first_minor > MINORMASK)
+ return -EINVAL;
nbd = kzalloc(sizeof(struct nbd_device), GFP_KERNEL);
if (!nbd)
@@ -1765,18 +1774,7 @@ static int nbd_dev_add(int index)
refcount_set(&nbd->refs, 1);
INIT_LIST_HEAD(&nbd->list);
disk->major = NBD_MAJOR;
-
- /*
- * Too big index can cause duplicate creation of sysfs files/links,
- * because MKDEV() expect that the max first minor is MINORMASK, or
- * index << part_shift can overflow.
- */
- disk->first_minor = index << part_shift;
- if (disk->first_minor < index || disk->first_minor > MINORMASK) {
- err = -EINVAL;
- goto out_free_tags;
- }
-
+ disk->first_minor = first_minor;
disk->fops = &nbd_fops;
disk->private_data = nbd;
sprintf(disk->disk_name, "nbd%d", index);
--
2.25.1
1
0
[PATCH openEuler-1.0-LTS 1/5] block, bfq: improve asymmetric scenarios detection
by Yang Yingliang 17 Dec '21
From: Federico Motta <federico(a)willer.it>
mainline inclusion
from mainline-v4.20-rc1
commit 2d29c9f89fcd9bf408fcdaaf515c90a169f22ecd
category: bugfix
bugzilla: 185905
CVE: NA
---------------------------
bfq defines as asymmetric a scenario where an active entity, say E
(representing either a single bfq_queue or a group of other entities),
has a higher weight than some other entities. If the entity E does sync
I/O in such a scenario, then bfq plugs the dispatch of the I/O of the
other entities in the following situation: E is in service but
temporarily has no pending I/O request. In fact, without this plugging,
all the times that E stops being temporarily idle, it may find the
internal queues of the storage device already filled with an
out-of-control number of extra requests, from other entities. So E may
have to wait for the service of these extra requests, before finally
having its own requests served. This may easily break service
guarantees, with E getting less than its fair share of the device
throughput. Usually, the end result is that E gets the same fraction of
the throughput as the other entities, instead of getting more, according
to its higher weight.
Yet there are two other more subtle cases where E, even if its weight is
actually equal to or even lower than the weight of any other active
entities, may get less than its fair share of the throughput in case the
above I/O plugging is not performed:
1. other entities issue larger requests than E;
2. other entities contain more active child entities than E (or in
general tend to have more backlog than E).
In the first case, other entities may get more service than E because
they get larger requests, than those of E, served during the temporary
idle periods of E. In the second case, other entities get more service
because, by having many child entities, they have many requests ready
for dispatching while E is temporarily idle.
This commit addresses this issue by extending the definition of
asymmetric scenario: a scenario is asymmetric when
- active entities representing bfq_queues have differentiated weights,
as in the original definition
or (inclusive)
- one or more entities representing groups of entities are active.
This broader definition makes sure that I/O plugging will be performed
in all the above cases, provided that there is at least one active
group. Of course, this definition is very coarse, so it will trigger
I/O plugging also in cases where it is not needed, such as, e.g.,
multiple active entities with just one child each, and all with the same
I/O-request size. The reason for this coarse definition is just that a
finer-grained definition would be rather heavy to compute.
On the opposite end, even this new definition does not trigger I/O
plugging in all cases where there is no active group, and all bfq_queues
have the same weight. So, in these cases some unfairness may occur if
there are asymmetries in I/O-request sizes. We made this choice because
I/O plugging may lower throughput, and probably a user that has not
created any group cares more about throughput than about perfect
fairness. At any rate, as for possible applications that may care about
service guarantees, bfq already guarantees a high responsiveness and a
low latency to soft real-time applications automatically.
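One detail worth spelling out before the diff (a sketch of the
reasoning, not new driver code): a weight-counter rbtree holds one node
per distinct weight, so "at least two distinct weights" reduces to "the
root node has a child":

#include <linux/rbtree.h>

/* True iff @root holds two or more nodes, i.e. two distinct weights. */
static bool sketch_two_or_more_weights(struct rb_root *root)
{
	return root->rb_node &&
	       (root->rb_node->rb_left || root->rb_node->rb_right);
}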
Signed-off-by: Federico Motta <federico(a)willer.it>
Signed-off-by: Paolo Valente <paolo.valente(a)linaro.org>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
block/bfq-iosched.c | 223 ++++++++++++++++++++++++--------------------
block/bfq-iosched.h | 27 +++---
block/bfq-wf2q.c | 35 +++----
3 files changed, 154 insertions(+), 131 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 4686ca1e93afe..e2b5c3ccd5502 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -625,12 +625,13 @@ void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq)
}
/*
- * Tell whether there are active queues or groups with differentiated weights.
+ * Tell whether there are active queues with different weights or
+ * active groups.
*/
-static bool bfq_differentiated_weights(struct bfq_data *bfqd)
+static bool bfq_varied_queue_weights_or_active_groups(struct bfq_data *bfqd)
{
/*
- * For weights to differ, at least one of the trees must contain
+ * For queue weights to differ, queue_weights_tree must contain
* at least two nodes.
*/
return (!RB_EMPTY_ROOT(&bfqd->queue_weights_tree) &&
@@ -638,9 +639,7 @@ static bool bfq_differentiated_weights(struct bfq_data *bfqd)
bfqd->queue_weights_tree.rb_node->rb_right)
#ifdef CONFIG_BFQ_GROUP_IOSCHED
) ||
- (!RB_EMPTY_ROOT(&bfqd->group_weights_tree) &&
- (bfqd->group_weights_tree.rb_node->rb_left ||
- bfqd->group_weights_tree.rb_node->rb_right)
+ (bfqd->num_active_groups > 0
#endif
);
}
@@ -658,26 +657,25 @@ static bool bfq_differentiated_weights(struct bfq_data *bfqd)
* 3) all active groups at the same level in the groups tree have the same
* number of children.
*
- * Unfortunately, keeping the necessary state for evaluating exactly the
- * above symmetry conditions would be quite complex and time-consuming.
- * Therefore this function evaluates, instead, the following stronger
- * sub-conditions, for which it is much easier to maintain the needed
- * state:
+ * Unfortunately, keeping the necessary state for evaluating exactly
+ * the last two symmetry sub-conditions above would be quite complex
+ * and time consuming. Therefore this function evaluates, instead,
+ * only the following stronger two sub-conditions, for which it is
+ * much easier to maintain the needed state:
* 1) all active queues have the same weight,
- * 2) all active groups have the same weight,
- * 3) all active groups have at most one active child each.
- * In particular, the last two conditions are always true if hierarchical
- * support and the cgroups interface are not enabled, thus no state needs
- * to be maintained in this case.
+ * 2) there are no active groups.
+ * In particular, the last condition is always true if hierarchical
+ * support or the cgroups interface are not enabled, thus no state
+ * needs to be maintained in this case.
*/
static bool bfq_symmetric_scenario(struct bfq_data *bfqd)
{
- return !bfq_differentiated_weights(bfqd);
+ return !bfq_varied_queue_weights_or_active_groups(bfqd);
}
/*
* If the weight-counter tree passed as input contains no counter for
- * the weight of the input entity, then add that counter; otherwise just
+ * the weight of the input queue, then add that counter; otherwise just
* increment the existing counter.
*
* Note that weight-counter trees contain few nodes in mostly symmetric
@@ -688,25 +686,25 @@ static bool bfq_symmetric_scenario(struct bfq_data *bfqd)
* In most scenarios, the rate at which nodes are created/destroyed
* should be low too.
*/
-void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_entity *entity,
+void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
struct rb_root *root)
{
+ struct bfq_entity *entity = &bfqq->entity;
struct rb_node **new = &(root->rb_node), *parent = NULL;
/*
- * Do not insert if the entity is already associated with a
+ * Do not insert if the queue is already associated with a
* counter, which happens if:
- * 1) the entity is associated with a queue,
- * 2) a request arrival has caused the queue to become both
+ * 1) a request arrival has caused the queue to become both
* non-weight-raised, and hence change its weight, and
* backlogged; in this respect, each of the two events
* causes an invocation of this function,
- * 3) this is the invocation of this function caused by the
+ * 2) this is the invocation of this function caused by the
* second event. This second invocation is actually useless,
* and we handle this fact by exiting immediately. More
* efficient or clearer solutions might possibly be adopted.
*/
- if (entity->weight_counter)
+ if (bfqq->weight_counter)
return;
while (*new) {
@@ -716,7 +714,7 @@ void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_entity *entity,
parent = *new;
if (entity->weight == __counter->weight) {
- entity->weight_counter = __counter;
+ bfqq->weight_counter = __counter;
goto inc_counter;
}
if (entity->weight < __counter->weight)
@@ -725,66 +723,67 @@ void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_entity *entity,
new = &((*new)->rb_right);
}
- entity->weight_counter = kzalloc(sizeof(struct bfq_weight_counter),
- GFP_ATOMIC);
+ bfqq->weight_counter = kzalloc(sizeof(struct bfq_weight_counter),
+ GFP_ATOMIC);
/*
* In the unlucky event of an allocation failure, we just
- * exit. This will cause the weight of entity to not be
- * considered in bfq_differentiated_weights, which, in its
- * turn, causes the scenario to be deemed wrongly symmetric in
- * case entity's weight would have been the only weight making
- * the scenario asymmetric. On the bright side, no unbalance
- * will however occur when entity becomes inactive again (the
- * invocation of this function is triggered by an activation
- * of entity). In fact, bfq_weights_tree_remove does nothing
- * if !entity->weight_counter.
+ * exit. This will cause the weight of queue to not be
+ * considered in bfq_varied_queue_weights_or_active_groups,
+ * which, in its turn, causes the scenario to be deemed
+ * wrongly symmetric in case bfqq's weight would have been
+ * the only weight making the scenario asymmetric. On the
+ * bright side, no unbalance will however occur when bfqq
+ * becomes inactive again (the invocation of this function
+ * is triggered by an activation of queue). In fact,
+ * bfq_weights_tree_remove does nothing if
+ * !bfqq->weight_counter.
*/
- if (unlikely(!entity->weight_counter))
+ if (unlikely(!bfqq->weight_counter))
return;
- entity->weight_counter->weight = entity->weight;
- rb_link_node(&entity->weight_counter->weights_node, parent, new);
- rb_insert_color(&entity->weight_counter->weights_node, root);
+ bfqq->weight_counter->weight = entity->weight;
+ rb_link_node(&bfqq->weight_counter->weights_node, parent, new);
+ rb_insert_color(&bfqq->weight_counter->weights_node, root);
inc_counter:
- entity->weight_counter->num_active++;
+ bfqq->weight_counter->num_active++;
}
/*
- * Decrement the weight counter associated with the entity, and, if the
+ * Decrement the weight counter associated with the queue, and, if the
* counter reaches 0, remove the counter from the tree.
* See the comments to the function bfq_weights_tree_add() for considerations
* about overhead.
*/
void __bfq_weights_tree_remove(struct bfq_data *bfqd,
- struct bfq_entity *entity,
+ struct bfq_queue *bfqq,
struct rb_root *root)
{
- if (!entity->weight_counter)
+ if (!bfqq->weight_counter)
return;
- entity->weight_counter->num_active--;
- if (entity->weight_counter->num_active > 0)
+ bfqq->weight_counter->num_active--;
+ if (bfqq->weight_counter->num_active > 0)
goto reset_entity_pointer;
- rb_erase(&entity->weight_counter->weights_node, root);
- kfree(entity->weight_counter);
+ rb_erase(&bfqq->weight_counter->weights_node, root);
+ kfree(bfqq->weight_counter);
reset_entity_pointer:
- entity->weight_counter = NULL;
+ bfqq->weight_counter = NULL;
}
/*
- * Invoke __bfq_weights_tree_remove on bfqq and all its inactive
- * parent entities.
+ * Invoke __bfq_weights_tree_remove on bfqq and decrement the number
+ * of active groups for each queue's inactive parent entity.
*/
void bfq_weights_tree_remove(struct bfq_data *bfqd,
struct bfq_queue *bfqq)
{
struct bfq_entity *entity = bfqq->entity.parent;
- __bfq_weights_tree_remove(bfqd, &bfqq->entity,
+ __bfq_weights_tree_remove(bfqd, bfqq,
&bfqd->queue_weights_tree);
for_each_entity(entity) {
@@ -798,17 +797,13 @@ void bfq_weights_tree_remove(struct bfq_data *bfqd,
* next_in_service for details on why
* in_service_entity must be checked too).
*
- * As a consequence, the weight of entity is
- * not to be removed. In addition, if entity
- * is active, then its parent entities are
- * active as well, and thus their weights are
- * not to be removed either. In the end, this
- * loop must stop here.
+ * As a consequence, its parent entities are
+ * active as well, and thus this loop must
+ * stop here.
*/
break;
}
- __bfq_weights_tree_remove(bfqd, entity,
- &bfqd->group_weights_tree);
+ bfqd->num_active_groups--;
}
}
@@ -3539,9 +3534,11 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
* symmetric scenario where:
* (i) each of these processes must get the same throughput as
* the others;
- * (ii) all these processes have the same I/O pattern
- (either sequential or random).
- * In fact, in such a scenario, the drive will tend to treat
+ * (ii) the I/O of each process has the same properties, in
+ * terms of locality (sequential or random), direction
+ * (reads or writes), request sizes, greediness
+ * (from I/O-bound to sporadic), and so on.
+ * In fact, in such a scenario, the drive tends to treat
* the requests of each of these processes in about the same
* way as the requests of the others, and thus to provide
* each of these processes with about the same throughput
@@ -3550,18 +3547,50 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
* certainly needed to guarantee that bfqq receives its
* assigned fraction of the device throughput (see [1] for
* details).
+ * The problem is that idling may significantly reduce
+ * throughput with certain combinations of types of I/O and
+ * devices. An important example is sync random I/O, on flash
+ * storage with command queueing. So, unless bfqq falls in the
+ * above cases where idling also boosts throughput, it would
+ * be important to check conditions (i) and (ii) accurately,
+ * so as to avoid idling when not strictly needed for service
+ * guarantees.
+ *
+ * Unfortunately, it is extremely difficult to thoroughly
+ * check condition (ii). And, in case there are active groups,
+ * it becomes very difficult to check condition (i) too. In
+ * fact, if there are active groups, then, for condition (i)
+ * to become false, it is enough that an active group contains
+ * more active processes or sub-groups than some other active
+ * group. We address this issue with the following bi-modal
+ * behavior, implemented in the function
+ * bfq_symmetric_scenario().
*
- * We address this issue by controlling, actually, only the
- * symmetry sub-condition (i), i.e., provided that
- * sub-condition (i) holds, idling is not performed,
- * regardless of whether sub-condition (ii) holds. In other
- * words, only if sub-condition (i) holds, then idling is
+ * If there are active groups, then the scenario is tagged as
+ * asymmetric, conservatively, without checking any of the
+ * conditions (i) and (ii). So the device is idled for bfqq.
+ * This behavior matches also the fact that groups are created
+ * exactly if controlling I/O (to preserve bandwidth and
+ * latency guarantees) is a primary concern.
+ *
+ * On the opposite end, if there are no active groups, then
+ * only condition (i) is actually controlled, i.e., provided
+ * that condition (i) holds, idling is not performed,
+ * regardless of whether condition (ii) holds. In other words,
+ * only if condition (i) does not hold, then idling is
* allowed, and the device tends to be prevented from queueing
- * many requests, possibly of several processes. The reason
- * for not controlling also sub-condition (ii) is that we
- * exploit preemption to preserve guarantees in case of
- * symmetric scenarios, even if (ii) does not hold, as
- * explained in the next two paragraphs.
+ * many requests, possibly of several processes. Since there
+ * are no active groups, then, to control condition (i) it is
+ * enough to check whether all active queues have the same
+ * weight.
+ *
+ * Not checking condition (ii) evidently exposes bfqq to the
+ * risk of getting less throughput than its fair share.
+ * However, for queues with the same weight, a further
+ * mechanism, preemption, mitigates or even eliminates this
+ * problem. And it does so without consequences on overall
+ * throughput. This mechanism and its benefits are explained
+ * in the next three paragraphs.
*
* Even if a queue, say Q, is expired when it remains idle, Q
* can still preempt the new in-service queue if the next
@@ -3575,11 +3604,7 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
* idling allows the internal queues of the device to contain
* many requests, and thus to reorder requests, we can rather
* safely assume that the internal scheduler still preserves a
- * minimum of mid-term fairness. The motivation for using
- * preemption instead of idling is that, by not idling,
- * service guarantees are preserved without minimally
- * sacrificing throughput. In other words, both a high
- * throughput and its desired distribution are obtained.
+ * minimum of mid-term fairness.
*
* More precisely, this preemption-based, idleless approach
* provides fairness in terms of IOPS, and not sectors per
@@ -3598,27 +3623,27 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
* 1024/8 times as high as the service received by the other
* queue.
*
- * On the other hand, device idling is performed, and thus
- * pure sector-domain guarantees are provided, for the
- * following queues, which are likely to need stronger
- * throughput guarantees: weight-raised queues, and queues
- * with a higher weight than other queues. When such queues
- * are active, sub-condition (i) is false, which triggers
- * device idling.
+ * The motivation for using preemption instead of idling (for
+ * queues with the same weight) is that, by not idling,
+ * service guarantees are preserved (completely or at least in
+ * part) without minimally sacrificing throughput. And, if
+ * there is no active group, then the primary expectation for
+ * this device is probably a high throughput.
*
- * According to the above considerations, the next variable is
- * true (only) if sub-condition (i) holds. To compute the
- * value of this variable, we not only use the return value of
- * the function bfq_symmetric_scenario(), but also check
- * whether bfqq is being weight-raised, because
- * bfq_symmetric_scenario() does not take into account also
- * weight-raised queues (see comments on
- * bfq_weights_tree_add()). In particular, if bfqq is being
- * weight-raised, it is important to idle only if there are
- * other, non-weight-raised queues that may steal throughput
- * to bfqq. Actually, we should be even more precise, and
- * differentiate between interactive weight raising and
- * soft real-time weight raising.
+ * We are now left only with explaining the additional
+ * compound condition that is checked below for deciding
+ * whether the scenario is asymmetric. To explain this
+ * compound condition, we need to add that the function
+ * bfq_symmetric_scenario checks the weights of only
+ * non-weight-raised queues, for efficiency reasons (see
+ * comments on bfq_weights_tree_add()). Then the fact that
+ * bfqq is weight-raised is checked explicitly here. More
+ * precisely, the compound condition below takes into account
+ * also the fact that, even if bfqq is being weight-raised,
+ * the scenario is still symmetric if all active queues happen
+ * to be weight-raised. Actually, we should be even more
+ * precise here, and differentiate between interactive weight
+ * raising and soft real-time weight raising.
*
* As a side note, it is worth considering that the above
* device-idling countermeasures may however fail in the
@@ -5408,7 +5433,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
bfqd->queue_weights_tree = RB_ROOT;
- bfqd->group_weights_tree = RB_ROOT;
+ bfqd->num_active_groups = 0;
INIT_LIST_HEAD(&bfqd->active_list);
INIT_LIST_HEAD(&bfqd->idle_list);
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index a41e9884f2dd2..e9960727cb565 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -108,15 +108,14 @@ struct bfq_sched_data {
};
/**
- * struct bfq_weight_counter - counter of the number of all active entities
+ * struct bfq_weight_counter - counter of the number of all active queues
* with a given weight.
*/
struct bfq_weight_counter {
- unsigned int weight; /* weight of the entities this counter refers to */
- unsigned int num_active; /* nr of active entities with this weight */
+ unsigned int weight; /* weight of the queues this counter refers to */
+ unsigned int num_active; /* nr of active queues with this weight */
/*
- * Weights tree member (see bfq_data's @queue_weights_tree and
- * @group_weights_tree)
+ * Weights tree member (see bfq_data's @queue_weights_tree)
*/
struct rb_node weights_node;
};
@@ -151,8 +150,6 @@ struct bfq_weight_counter {
struct bfq_entity {
/* service_tree member */
struct rb_node rb_node;
- /* pointer to the weight counter associated with this entity */
- struct bfq_weight_counter *weight_counter;
/*
* Flag, true if the entity is on a tree (either the active or
@@ -266,6 +263,9 @@ struct bfq_queue {
/* entity representing this queue in the scheduler */
struct bfq_entity entity;
+ /* pointer to the weight counter associated with this entity */
+ struct bfq_weight_counter *weight_counter;
+
/* maximum budget allowed from the feedback mechanism */
int max_budget;
/* budget expiration (in jiffies) */
@@ -449,14 +449,9 @@ struct bfq_data {
*/
struct rb_root queue_weights_tree;
/*
- * rbtree of non-queue @bfq_entity weight counters, sorted by
- * weight. Used to keep track of whether all @bfq_groups have
- * the same weight. The tree contains one counter for each
- * distinct weight associated to some active @bfq_group (see
- * the comments to the functions bfq_weights_tree_[add|remove]
- * for further details).
+ * number of groups with requests still waiting for completion
*/
- struct rb_root group_weights_tree;
+ unsigned int num_active_groups;
/*
* Number of bfq_queues containing requests (including the
@@ -854,10 +849,10 @@ struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync);
void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync);
struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic);
void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
-void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_entity *entity,
+void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
struct rb_root *root);
void __bfq_weights_tree_remove(struct bfq_data *bfqd,
- struct bfq_entity *entity,
+ struct bfq_queue *bfqq,
struct rb_root *root);
void bfq_weights_tree_remove(struct bfq_data *bfqd,
struct bfq_queue *bfqq);
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index e830715fe15d6..cdd164c82398a 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -788,25 +788,29 @@ __bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
new_weight = entity->orig_weight *
(bfqq ? bfqq->wr_coeff : 1);
/*
- * If the weight of the entity changes, remove the entity
- * from its old weight counter (if there is a counter
- * associated with the entity), and add it to the counter
- * associated with its new weight.
+ * If the weight of the entity changes, and the entity is a
+ * queue, remove the entity from its old weight counter (if
+ * there is a counter associated with the entity).
*/
if (prev_weight != new_weight) {
- root = bfqq ? &bfqd->queue_weights_tree :
- &bfqd->group_weights_tree;
- __bfq_weights_tree_remove(bfqd, entity, root);
+ if (bfqq) {
+ root = &bfqd->queue_weights_tree;
+ __bfq_weights_tree_remove(bfqd, bfqq, root);
+ } else
+ bfqd->num_active_groups--;
}
entity->weight = new_weight;
/*
- * Add the entity to its weights tree only if it is
- * not associated with a weight-raised queue.
+ * Add the entity, if it is not a weight-raised queue,
+ * to the counter associated with its new weight.
*/
- if (prev_weight != new_weight &&
- (bfqq ? bfqq->wr_coeff == 1 : 1))
- /* If we get here, root has been initialized. */
- bfq_weights_tree_add(bfqd, entity, root);
+ if (prev_weight != new_weight) {
+ if (bfqq && bfqq->wr_coeff == 1) {
+ /* If we get here, root has been initialized. */
+ bfq_weights_tree_add(bfqd, bfqq, root);
+ } else
+ bfqd->num_active_groups++;
+ }
new_st->wsum += entity->weight;
@@ -1014,8 +1018,7 @@ static void __bfq_activate_entity(struct bfq_entity *entity,
container_of(entity, struct bfq_group, entity);
struct bfq_data *bfqd = bfqg->bfqd;
- bfq_weights_tree_add(bfqg->bfqd, entity,
- &bfqd->group_weights_tree);
+ bfqd->num_active_groups++;
}
#endif
@@ -1702,7 +1705,7 @@ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
if (!bfqq->dispatched)
if (bfqq->wr_coeff == 1)
- bfq_weights_tree_add(bfqd, &bfqq->entity,
+ bfq_weights_tree_add(bfqd, bfqq,
&bfqd->queue_weights_tree);
if (bfqq->wr_coeff > 1)
--
2.25.1
1
4

[PATCH openEuler-1.0-LTS 01/10] arm64: mm: Restore mm_cpumask (revert commit 38d96287504a ("arm64: mm: kill mm_cpumask usage"))
by Yang Yingliang 17 Dec '21
by Yang Yingliang 17 Dec '21
17 Dec '21
From: Takao Indoh <indou.takao(a)fujitsu.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4BLL0
CVE: NA
---------------------------
mm_cpumask was deleted by the commit 38d96287504a ("arm64: mm: kill
mm_cpumask usage") because it was not used at that time. Now this is needed
to find appropriate CPUs for TLB flush, so this patch reverts this commit.
Signed-off-by: QI Fuli <qi.fuli(a)fujitsu.com>
Signed-off-by: Takao Indoh <indou.takao(a)fujitsu.com>
Signed-off-by: Cheng Jian <cj.chengjian(a)huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/arm64/kernel/smp.c | 6 ++++++
arch/arm64/mm/context.c | 2 ++
2 files changed, 8 insertions(+)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index fe562778de352..e86940d353a3e 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -387,6 +387,7 @@ asmlinkage notrace void secondary_start_kernel(void)
*/
mmgrab(mm);
current->active_mm = mm;
+ cpumask_set_cpu(cpu, mm_cpumask(mm));
/*
* TTBR0 is only used for the identity mapping at this stage. Make it
@@ -489,6 +490,11 @@ int __cpu_disable(void)
*/
irq_migrate_all_off_this_cpu();
+ /*
+ * Remove this CPU from the vm mask set of all processes.
+ */
+ clear_tasks_mm_cpumask(cpu);
+
return 0;
}
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 2b80ceff5d6c2..27d1f3fec1cc9 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -207,6 +207,7 @@ static u64 new_context(struct mm_struct *mm, unsigned int cpu)
set_asid:
__set_bit(asid, asid_map);
cur_idx = asid;
+ cpumask_clear(mm_cpumask(mm));
return idx2asid(asid) | generation;
}
@@ -254,6 +255,7 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
switch_mm_fastpath:
arm64_apply_bp_hardening();
+ cpumask_set_cpu(cpu, mm_cpumask(mm));
/*
* Defer TTBR0_EL1 setting for user threads to uaccess_enable() when
--
2.25.1
1
9

[PATCH openEuler-1.0-LTS] fget: check that the fd still exists after getting a ref to it
by Yang Yingliang 17 Dec '21
by Yang Yingliang 17 Dec '21
17 Dec '21
From: Linus Torvalds <torvalds(a)linux-foundation.org>
stable inclusion
from stable-4.19.220
commit 8bf31f9d9395b71af3ed33166a057cd3ec0c59da
category: bugfix
bugzilla: 185907
CVE: CVE-2021-4083
-------------------------------------------------
commit 054aa8d439b9185d4f5eb9a90282d1ce74772969 upstream.
Jann Horn points out that there is another possible race wrt Unix domain
socket garbage collection, somewhat reminiscent of the one fixed in
commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK").
See the extended comment about the garbage collection requirements added
to unix_peek_fds() by that commit for details.
The race comes from how we can locklessly look up a file descriptor just
as it is in the process of being closed, and with the right artificial
timing (Jann added a few strategic 'mdelay(500)' calls to do that), the
Unix domain socket garbage collector could see the reference count
decrement of the close() happen before fget() took its reference to the
file and the file was attached onto a new file descriptor.
This is all (intentionally) correct on the 'struct file *' side, with
RCU lookups and lockless reference counting very much part of the
design. Getting that reference count out of order isn't a problem per
se.
But the garbage collector can get confused by seeing this situation of
having seen a file not having any remaining external references and then
seeing it being attached to an fd.
In commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK") the
fix was to serialize the file descriptor install with the garbage
collector by taking and releasing the unix_gc_lock.
That's not really an option here, but since this all happens when we are
in the process of looking up a file descriptor, we can instead simply
just re-check that the file hasn't been closed in the meantime, and just
re-do the lookup if we raced with a concurrent close() of the same file
descriptor.
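For context, a condensed sketch of how the patched lookup loop reads,
based on the 4.19-era __fget() (mask handling kept minimal; this is an
illustration, not the exact final code):
	rcu_read_lock();
loop:
	file = fcheck_files(files, fd);			/* lockless lookup */
	if (file) {
		if (file->f_mode & mask)
			file = NULL;
		else if (!get_file_rcu_many(file, refs))
			goto loop;			/* lost a refcount race, retry */
		else if (__fcheck_files(files, fd) != file) {
			/* fd was closed between the lookup and our ref:
			 * drop the reference and retry */
			fput_many(file, refs);
			goto loop;
		}
	}
	rcu_read_unlock();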
Reported-and-tested-by: Jann Horn <jannh(a)google.com>
Acked-by: Miklos Szeredi <mszeredi(a)redhat.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/file.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/file.c b/fs/file.c
index 0caadedd3cc43..20a44d0ac400b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -779,6 +779,10 @@ static struct file *__fget(unsigned int fd, fmode_t mask, unsigned int refs)
file = NULL;
else if (!get_file_rcu_many(file, refs))
goto loop;
+ else if (__fcheck_files(files, fd) != file) {
+ fput_many(file, refs);
+ goto loop;
+ }
}
rcu_read_unlock();
--
2.25.1
1
0
1
0

[PATCH openEuler-1.0-LTS 1/4] Revert "Revert "ext4: Allow parallel DIO reads""
by Yang Yingliang 17 Dec '21
by Yang Yingliang 17 Dec '21
17 Dec '21
From: Zhang Xiaoxu <zhangxiaoxu5(a)huawei.com>
hulk inclusion
category: feature
bugzilla: 185871, https://gitee.com/openeuler/kernel/issues/I4MIQ0
CVE: NA
---------------------------
This reverts commit bb24e7e6309ae1bbc80062366b8f9bb2e8fcc025.
In order to improve the dio reads performance, we enable the
parallel dio reads.
And make it configuratable
Conflict:
fs/ext4/Kconfig
fs/ext4/inode.c
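As a rough sketch of the configurable read path this series re-enables
(condensed from the diff below; error handling elided):
	/* CONFIG_EXT4_PARALLEL_DIO_READ: shared lock allows concurrent readers */
	inode_lock_shared(inode);
	ret = filemap_write_and_wait_range(mapping, iocb->ki_pos,
					   iocb->ki_pos + count - 1);
	if (!ret)
		ret = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev,
					   iter, ext4_dio_get_block, NULL, NULL,
					   0 /* no DIO_LOCKING: we hold the lock */);
	inode_unlock_shared(inode);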
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/ext4/Kconfig | 5 +++++
fs/ext4/inode.c | 34 ++++++++++++++++++++++++++++------
2 files changed, 33 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index a453cc87082b5..de836b75c6f2b 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -120,3 +120,8 @@ config EXT4_DEBUG
If you select Y here, then you will be able to turn on debugging
with a command such as:
echo 1 > /sys/module/ext4/parameters/mballoc_debug
+config EXT4_PARALLEL_DIO_READ
+ bool "EXT4 parallel dio reads"
+ depends on EXT4_FS
+ help
+ Enable parallel dio read.
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fb4de707cd9ad..2f5231899d342 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3795,15 +3795,31 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter)
static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter)
{
+ struct address_space *mapping = iocb->ki_filp->f_mapping;
+ struct inode *inode = mapping->host;
+#ifdef CONFIG_EXT4_PARALLEL_DIO_READ
+ size_t count = iov_iter_count(iter);
+#else
int unlocked = 0;
- struct inode *inode = iocb->ki_filp->f_mapping->host;
+#endif
ssize_t ret;
loff_t offset = iocb->ki_pos;
loff_t size = i_size_read(inode);
if (offset >= size)
return 0;
-
+#ifdef CONFIG_EXT4_PARALLEL_DIO_READ
+ /*
+ * Shared inode_lock is enough for us - it protects against concurrent
+ * writes & truncates and since we take care of writing back page cache,
+ * we are protected against page writeback as well.
+ */
+ inode_lock_shared(inode);
+ ret = filemap_write_and_wait_range(mapping, iocb->ki_pos,
+ iocb->ki_pos + count - 1);
+ if (ret)
+ goto out_unlock;
+#else
if (ext4_should_dioread_nolock(inode)) {
/*
* Nolock dioread optimization may be dynamically disabled
@@ -3813,18 +3829,24 @@ static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter)
inode_dio_begin(inode);
smp_mb();
if (unlikely(ext4_test_inode_state(inode,
- EXT4_STATE_DIOREAD_LOCK)))
+ EXT4_STATE_DIOREAD_LOCK)))
inode_dio_end(inode);
else
unlocked = 1;
}
+#endif
+
ret = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev,
- iter, ext4_dio_get_block,
- NULL, NULL,
+ iter, ext4_dio_get_block, NULL, NULL,
+#ifdef CONFIG_EXT4_PARALLEL_DIO_READ
+ 0);
+out_unlock:
+ inode_unlock_shared(inode);
+#else
unlocked ? 0 : DIO_LOCKING);
-
if (unlocked)
inode_dio_end(inode);
+#endif
return ret;
}
--
2.25.1
1
3

[PATCH openEuler-1.0-LTS 1/4] net: hns3: fix VF RSS failed problem after PF enable multi-TCs
by Yang Yingliang 16 Dec '21
by Yang Yingliang 16 Dec '21
16 Dec '21
From: Guangbin Huang <huangguangbin2(a)huawei.com>
mainline inclusion
from mainline-v5.16-rc3
commit 8d2ad993aa05c0768f00c886c9d369cd97a337ac
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MWF6
CVE: NA
----------------------------
When the PF is set to multiple TCs and the mapping relationship between
priorities and TCs is configured, the hardware will activate these
settings for this PF and its VFs.
In this case, if the VF uses only one TC and its rx packets carry a
priority that is not mapped to TC0, the hardware always puts such
packets into queue 0 because the other TCs of the VF are not valid. As a
result, these VF packets cannot use the RSS function.
To fix this problem, set the tc mode of all unused TCs of the VF to the
setting of TC0, so that rx packets with a priority mapping to an unused
TC will be directed to TC0.
Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Guangbin Huang <huangguangbin2(a)huawei.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Yonglong Liu <liuyonglong(a)huawei.com>
Reviewed-by: li yongxin <liyongxin1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 8e79818f149d2..ebe09a304aeda 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -686,9 +686,9 @@ static int hclgevf_set_rss_tc_mode(struct hclgevf_dev *hdev, u16 rss_size)
roundup_size = ilog2(roundup_size);
for (i = 0; i < HCLGEVF_MAX_TC_NUM; i++) {
- tc_valid[i] = !!(hdev->hw_tc_map & BIT(i));
+ tc_valid[i] = 1;
tc_size[i] = roundup_size;
- tc_offset[i] = rss_size * i;
+ tc_offset[i] = (hdev->hw_tc_map & BIT(i)) ? rss_size * i : 0;
}
hclgevf_cmd_setup_basic_desc(&desc, HCLGEVF_OPC_RSS_TC_MODE, false);
--
2.25.1
1
3

[PATCH openEuler-1.0-LTS 1/2] hugetlbfs: flush TLBs correctly after huge_pmd_unshare
by Yang Yingliang 16 Dec '21
by Yang Yingliang 16 Dec '21
16 Dec '21
From: Nadav Amit <namit(a)vmware.com>
stable inclusion
from linux-v4.19.219
commit b0313bc7f5fbb6beee327af39d818ffdc921821a
category: bugfix
bugzilla: 185854
CVE: CVE-2021-4002
-----------------------------------------------
commit a4a118f2eead1d6c49e00765de89878288d4b890 upstream.
When __unmap_hugepage_range() calls to huge_pmd_unshare() succeed, a TLB
flush is missing. This TLB flush must be performed before releasing the
i_mmap_rwsem, in order to prevent an unshared PMDs page from being
released and reused before the TLB flush took place.
Arguably, a comprehensive solution would use mmu_gather interface to
batch the TLB flushes and the PMDs page release, however it is not an
easy solution: (1) try_to_unmap_one() and try_to_migrate_one() also call
huge_pmd_unshare() and they cannot use the mmu_gather interface; and (2)
deferring the release of the page reference for the PMDs page until
after i_mmap_rwsem is dropped can confuse huge_pmd_unshare() into
thinking PMDs are shared when they are not.
Fix __unmap_hugepage_range() by adding the missing TLB flush, and
forcing a flush when unshare is successful.
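The core of the fix, condensed from the diff below (most of the unmap
loop is elided for brevity):
	bool force_flush = false;

	/* ... inside the per-address loop of __unmap_hugepage_range() ... */
	if (huge_pmd_unshare(mm, &address, ptep)) {
		spin_unlock(ptl);
		/* record the PUD-sized range uncovered by the unshare */
		tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
		force_flush = true;
		continue;
	}

	/* ... after the loop, before i_mmap_rwsem can be dropped ... */
	if (force_flush)
		tlb_flush_mmu_tlbonly(tlb);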
Fixes: 24669e58477e ("hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages)" # 3.6
Signed-off-by: Nadav Amit <namit(a)vmware.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Conflicts:
include/asm-generic/tlb.h
mm/mmu_gather.c
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: tong tiangen <tongtiangen(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/arm/include/asm/tlb.h | 8 ++++++++
arch/ia64/include/asm/tlb.h | 10 ++++++++++
arch/s390/include/asm/tlb.h | 16 ++++++++++++++++
arch/sh/include/asm/tlb.h | 10 ++++++++++
arch/um/include/asm/tlb.h | 12 ++++++++++++
include/asm-generic/tlb.h | 2 ++
mm/hugetlb.c | 23 +++++++++++++++++++----
mm/mmu_gather.c | 10 ++++++++++
8 files changed, 87 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index f854148c8d7c2..00baa13c158d7 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -280,6 +280,14 @@ tlb_remove_pmd_tlb_entry(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr
tlb_add_flush(tlb, addr);
}
+static inline void
+tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address,
+ unsigned long size)
+{
+ tlb_add_flush(tlb, address);
+ tlb_add_flush(tlb, address + size - PMD_SIZE);
+}
+
#define pte_free_tlb(tlb, ptep, addr) __pte_free_tlb(tlb, ptep, addr)
#define pmd_free_tlb(tlb, pmdp, addr) __pmd_free_tlb(tlb, pmdp, addr)
#define pud_free_tlb(tlb, pudp, addr) pud_free((tlb)->mm, pudp)
diff --git a/arch/ia64/include/asm/tlb.h b/arch/ia64/include/asm/tlb.h
index 516355a774bfe..5d032d97c254e 100644
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -268,6 +268,16 @@ __tlb_remove_tlb_entry (struct mmu_gather *tlb, pte_t *ptep, unsigned long addre
tlb->end_addr = address + PAGE_SIZE;
}
+static inline void
+tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address,
+ unsigned long size)
+{
+ if (tlb->start_addr > address)
+ tlb->start_addr = address;
+ if (tlb->end_addr < address + size)
+ tlb->end_addr = address + size;
+}
+
#define tlb_migrate_finish(mm) platform_tlb_migrate_finish(mm)
#define tlb_start_vma(tlb, vma) do { } while (0)
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index b31c779cf5817..1df28a8e2f19e 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -116,6 +116,20 @@ static inline void tlb_remove_page_size(struct mmu_gather *tlb,
return tlb_remove_page(tlb, page);
}
+static inline void tlb_flush_pmd_range(struct mmu_gather *tlb,
+ unsigned long address, unsigned long size)
+{
+ /*
+ * the range might exceed the original range that was provided to
+ * tlb_gather_mmu(), so we need to update it despite the fact it is
+ * usually not updated.
+ */
+ if (tlb->start > address)
+ tlb->start = address;
+ if (tlb->end < address + size)
+ tlb->end = address + size;
+}
+
/*
* pte_free_tlb frees a pte table and clears the CRSTE for the
* page table from the tlb.
@@ -177,6 +191,8 @@ static inline void pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
#define tlb_remove_tlb_entry(tlb, ptep, addr) do { } while (0)
#define tlb_remove_pmd_tlb_entry(tlb, pmdp, addr) do { } while (0)
#define tlb_migrate_finish(mm) do { } while (0)
+#define tlb_flush_pmd_range(tlb, addr, sz) do { } while (0)
+
#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address) \
tlb_remove_tlb_entry(tlb, ptep, address)
diff --git a/arch/sh/include/asm/tlb.h b/arch/sh/include/asm/tlb.h
index 77abe192fb43d..adcb0bfe238e3 100644
--- a/arch/sh/include/asm/tlb.h
+++ b/arch/sh/include/asm/tlb.h
@@ -127,6 +127,16 @@ static inline void tlb_remove_page_size(struct mmu_gather *tlb,
return tlb_remove_page(tlb, page);
}
+static inline void
+tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address,
+ unsigned long size)
+{
+ if (tlb->start > address)
+ tlb->start = address;
+ if (tlb->end < address + size)
+ tlb->end = address + size;
+}
+
#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
unsigned int page_size)
diff --git a/arch/um/include/asm/tlb.h b/arch/um/include/asm/tlb.h
index dce6db147f245..02e61f6abfcab 100644
--- a/arch/um/include/asm/tlb.h
+++ b/arch/um/include/asm/tlb.h
@@ -130,6 +130,18 @@ static inline void tlb_remove_page_size(struct mmu_gather *tlb,
return tlb_remove_page(tlb, page);
}
+static inline void
+tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address,
+ unsigned long size)
+{
+ tlb->need_flush = 1;
+
+ if (tlb->start > address)
+ tlb->start = address;
+ if (tlb->end < address + size)
+ tlb->end = address + size;
+}
+
/**
* tlb_remove_tlb_entry - remember a pte unmapping for later tlb invalidation.
*
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 6be86c1c5c583..cfc86e3ba4606 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -139,6 +139,8 @@ void tlb_flush_mmu(struct mmu_gather *tlb);
void arch_tlb_finish_mmu(struct mmu_gather *tlb,
unsigned long start, unsigned long end, bool force);
void tlb_flush_mmu_free(struct mmu_gather *tlb);
+void tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address,
+ unsigned long size);
extern bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
int page_size);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2f65dad443ab3..b2ed9174bfd73 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3595,6 +3595,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
unsigned long sz = huge_page_size(h);
unsigned long mmun_start = start; /* For mmu_notifiers */
unsigned long mmun_end = end; /* For mmu_notifiers */
+ bool force_flush = false;
WARN_ON(!is_vm_hugetlb_page(vma));
BUG_ON(start & ~huge_page_mask(h));
@@ -3621,10 +3622,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
ptl = huge_pte_lock(h, mm, ptep);
if (huge_pmd_unshare(mm, &address, ptep)) {
spin_unlock(ptl);
- /*
- * We just unmapped a page of PMDs by clearing a PUD.
- * The caller's TLB flush range should cover this area.
- */
+ tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
+ force_flush = true;
continue;
}
@@ -3688,6 +3687,22 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
}
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
tlb_end_vma(tlb, vma);
+
+ /*
+ * If we unshared PMDs, the TLB flush was not recorded in mmu_gather. We
+ * could defer the flush until now, since by holding i_mmap_rwsem we
+ * guaranteed that the last refernece would not be dropped. But we must
+ * do the flushing before we return, as otherwise i_mmap_rwsem will be
+ * dropped and the last reference to the shared PMDs page might be
+ * dropped as well.
+ *
+ * In theory we could defer the freeing of the PMD pages as well, but
+ * huge_pmd_unshare() relies on the exact page_count for the PMD page to
+ * detect sharing, so we cannot defer the release of the page either.
+ * Instead, do flush now.
+ */
+ if (force_flush)
+ tlb_flush_mmu_tlbonly(tlb);
}
void __unmap_hugepage_range_final(struct mmu_gather *tlb,
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 2a9fbc4a37d59..c147a5aacfa96 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -139,6 +139,16 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_
return false;
}
+void tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address,
+ unsigned long size)
+{
+ if (tlb->page_size != 0 && tlb->page_size != PMD_SIZE)
+ tlb_flush_mmu(tlb);
+
+ tlb->page_size = PMD_SIZE;
+ tlb->start = min(tlb->start, address);
+ tlb->end = max(tlb->end, address + size);
+}
#endif /* HAVE_GENERIC_MMU_GATHER */
#ifdef CONFIG_HAVE_RCU_TABLE_FREE
--
2.25.1
1
1

[PATCH openEuler-1.0-LTS 1/2] mm: share_pool: adjust sp_make_share_k2u behavior when coredump
by Yang Yingliang 16 Dec '21
by Yang Yingliang 16 Dec '21
16 Dec '21
From: Guo Mengqi <guomengqi3(a)huawei.com>
ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MUV2
CVE: NA
When k2u is being executed on the whole sharepool group and one
process coredumps, k2u will skip the coredumped process and continue
with the remaining processes in the group.
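A condensed sketch of the changed group loop (taken from the diff
below; cleanup on fatal errors is elided):
	list_for_each_entry(spg_node, &spg->procs, proc_node) {
		kc.state = K2U_NORMAL;
		ret_addr = sp_remap_kva_to_vma(kva, spa, spg_node->master->mm,
					       spg_node->prot, &kc);
		if (IS_ERR_VALUE(ret_addr)) {
			if (kc.state == K2U_COREDUMP)
				continue;	/* skip only the dumping process */
			goto out;		/* any other failure aborts k2u */
		}
		uva = (void *)ret_addr;
	}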
Signed-off-by: Guo Mengqi <guomengqi3(a)huawei.com>
Reviewed-by: Weilong Chen <chenweilong(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
mm/share_pool.c | 50 +++++++++++++++++++++++++++++++------------------
1 file changed, 32 insertions(+), 18 deletions(-)
diff --git a/mm/share_pool.c b/mm/share_pool.c
index c9079e6a40b1a..ac97a68a70264 100644
--- a/mm/share_pool.c
+++ b/mm/share_pool.c
@@ -678,8 +678,25 @@ static unsigned long sp_mmap(struct mm_struct *mm, struct file *file,
struct sp_area *spa, unsigned long *populate,
unsigned long prot);
static void sp_munmap(struct mm_struct *mm, unsigned long addr, unsigned long size);
+
+#define K2U_NORMAL 0
+#define K2U_COREDUMP 1
+
+struct sp_k2u_context {
+ unsigned long kva;
+ unsigned long kva_aligned;
+ unsigned long size;
+ unsigned long size_aligned;
+ unsigned long sp_flags;
+ int state;
+ int spg_id;
+ bool to_task;
+ struct timespec64 start;
+ struct timespec64 end;
+};
+
static unsigned long sp_remap_kva_to_vma(unsigned long kva, struct sp_area *spa,
- struct mm_struct *mm, unsigned long prot);
+ struct mm_struct *mm, unsigned long prot, struct sp_k2u_context *kc);
static void free_sp_group_id(int spg_id)
{
@@ -1334,7 +1351,7 @@ int mg_sp_group_add_task(int pid, unsigned long prot, int spg_id)
spin_unlock(&sp_area_lock);
if (spa->type == SPA_TYPE_K2SPG && spa->kva) {
- addr = sp_remap_kva_to_vma(spa->kva, spa, mm, prot);
+ addr = sp_remap_kva_to_vma(spa->kva, spa, mm, prot, NULL);
if (IS_ERR_VALUE(addr))
pr_warn("add group remap k2u failed %ld\n", addr);
@@ -2606,7 +2623,7 @@ static unsigned long __sp_remap_get_pfn(unsigned long kva)
/* when called by k2u to group, always make sure rw_lock of spg is down */
static unsigned long sp_remap_kva_to_vma(unsigned long kva, struct sp_area *spa,
- struct mm_struct *mm, unsigned long prot)
+ struct mm_struct *mm, unsigned long prot, struct sp_k2u_context *kc)
{
struct vm_area_struct *vma;
unsigned long ret_addr;
@@ -2618,6 +2635,8 @@ static unsigned long sp_remap_kva_to_vma(unsigned long kva, struct sp_area *spa,
if (unlikely(mm->core_state)) {
pr_err("k2u mmap: encountered coredump, abort\n");
ret_addr = -EBUSY;
+ if (kc)
+ kc->state = K2U_COREDUMP;
goto put_mm;
}
@@ -2703,7 +2722,7 @@ static void *sp_make_share_kva_to_task(unsigned long kva, unsigned long size, un
spa->kva = kva;
- uva = (void *)sp_remap_kva_to_vma(kva, spa, current->mm, prot);
+ uva = (void *)sp_remap_kva_to_vma(kva, spa, current->mm, prot, NULL);
__sp_area_drop(spa);
if (IS_ERR(uva))
pr_err("remap k2u to task failed %ld\n", PTR_ERR(uva));
@@ -2731,6 +2750,8 @@ static void *sp_make_share_kva_to_spg(unsigned long kva, unsigned long size,
struct mm_struct *mm;
struct sp_group_node *spg_node;
void *uva = ERR_PTR(-ENODEV);
+ struct sp_k2u_context kc;
+ unsigned long ret_addr = -ENODEV;
down_read(&spg->rw_lock);
spa = sp_alloc_area(size, sp_flags, spg, SPA_TYPE_K2SPG, current->tgid);
@@ -2745,12 +2766,17 @@ static void *sp_make_share_kva_to_spg(unsigned long kva, unsigned long size,
list_for_each_entry(spg_node, &spg->procs, proc_node) {
mm = spg_node->master->mm;
- uva = (void *)sp_remap_kva_to_vma(kva, spa, mm, spg_node->prot);
- if (IS_ERR(uva)) {
+ kc.state = K2U_NORMAL;
+ ret_addr = sp_remap_kva_to_vma(kva, spa, mm, spg_node->prot, &kc);
+ if (IS_ERR_VALUE(ret_addr)) {
+ if (kc.state == K2U_COREDUMP)
+ continue;
+ uva = (void *)ret_addr;
pr_err("remap k2u to spg failed %ld\n", PTR_ERR(uva));
__sp_free(spg, spa->va_start, spa_size(spa), mm);
goto out;
}
+ uva = (void *)ret_addr;
}
out:
@@ -2775,18 +2801,6 @@ static bool vmalloc_area_set_flag(unsigned long kva, unsigned long flags)
return false;
}
-struct sp_k2u_context {
- unsigned long kva;
- unsigned long kva_aligned;
- unsigned long size;
- unsigned long size_aligned;
- unsigned long sp_flags;
- int spg_id;
- bool to_task;
- struct timespec64 start;
- struct timespec64 end;
-};
-
static void trace_sp_k2u_begin(struct sp_k2u_context *kc)
{
if (!sysctl_sp_perf_k2u)
--
2.25.1
1
1
bugzilla: https://gitee.com/openeuler/kernel/issues/I469VQ
Tian Tao (2):
drm/hisilicon: Support i2c driver algorithms for bit-shift adapters
drm/hisilicon: Features to support reading resolutions from EDID
drivers/gpu/drm/hisilicon/hibmc/Makefile | 2 +-
.../gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h | 25 ++++-
.../gpu/drm/hisilicon/hibmc/hibmc_drm_i2c.c | 99 +++++++++++++++++++
.../gpu/drm/hisilicon/hibmc/hibmc_drm_vdac.c | 38 ++++++-
4 files changed, 158 insertions(+), 6 deletions(-)
create mode 100644 drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_i2c.c
--
2.20.1
1
2
backport psi feature and avoid kabi change
bugzilla: https://gitee.com/openeuler/kernel/issues/I47QS2
Changes since v9:
1. There are some problems when allocating cgroups, from Cheng Jian.
First, we used `sizeof(struct cgroup_psi)` to allocate extra space,
which is inappropriate.
Second, the variable-length array isn't the last member of
cgroup_psi, so the offset of the psi member will not be a
compile-time-determined value. It's dangerous to use cgroup_psi()
(see the sketch after this list).
2. The free path should change too, from Xie Xiuqi.
psi: fix reference to kernel commandline enable
Dan Schatzberg (1):
kernel/sched/psi.c: expose pressure metrics on root cgroup
Johannes Weiner (12):
mm: workingset: tell cache transitions from workingset thrashing
sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
sched: loadavg: make calc_load_n() public
sched: sched.h: make rq locking and clock functions available in
stats.h
sched: introduce this_rq_lock_irq()
psi: pressure stall information for CPU, memory, and IO
psi: cgroup support
psi: make disabling/enabling easier for vendor kernels
psi: fix aggregation idle shut-off
psi: avoid divide-by-zero crash inside virtual machines
fs: kernfs: add poll file operation
sched/psi: Fix sampling error and rare div0 crashes with cgroups and
high uptime
Josef Bacik (1):
blk-iolatency: use a percentile approache for ssd's
Liu Xinpeng (2):
psi:enable psi in config
psi:avoid kabi change
Miklos Szeredi (1):
fuse: ignore PG_workingset after stealing
Olof Johansson (1):
kernel/sched/psi.c: simplify cgroup_move_task()
Suren Baghdasaryan (6):
psi: introduce state_mask to represent stalled psi states
psi: make psi_enable static
psi: rename psi fields in preparation for psi trigger addition
psi: split update_stats into parts
psi: track changed states
include/: refactor headers to allow kthread.h inclusion in psi_types.h
Yafang Shao (1):
mm, memcg: add workingset_restore in memory.stat
Documentation/accounting/psi.txt | 73 +++
Documentation/admin-guide/cgroup-v2.rst | 22 +
Documentation/admin-guide/kernel-parameters.txt | 4 +
arch/arm64/configs/openeuler_defconfig | 2 +
arch/powerpc/platforms/cell/cpufreq_spudemand.c | 2 +-
arch/powerpc/platforms/cell/spufs/sched.c | 9 +-
arch/s390/appldata/appldata_os.c | 4 -
arch/x86/configs/openeuler_defconfig | 2 +
block/blk-iolatency.c | 183 +++++-
drivers/cpuidle/governors/menu.c | 4 -
drivers/spi/spi-rockchip.c | 1 +
fs/fuse/dev.c | 1 +
fs/kernfs/file.c | 31 +-
fs/proc/loadavg.c | 3 -
include/linux/cgroup-defs.h | 12 +
include/linux/cgroup.h | 17 +
include/linux/kernfs.h | 8 +
include/linux/kthread.h | 4 +
include/linux/page-flags.h | 5 +
include/linux/psi.h | 55 ++
include/linux/psi_types.h | 95 +++
include/linux/sched.h | 13 +
include/linux/sched/loadavg.h | 24 +-
include/linux/swap.h | 1 +
include/trace/events/mmflags.h | 1 +
init/Kconfig | 28 +
kernel/cgroup/cgroup.c | 69 +-
kernel/debug/kdb/kdb_main.c | 7 +-
kernel/fork.c | 4 +
kernel/kthread.c | 3 +
kernel/sched/Makefile | 1 +
kernel/sched/core.c | 16 +-
kernel/sched/loadavg.c | 139 ++--
kernel/sched/psi.c | 823 ++++++++++++++++++++++++
kernel/sched/sched.h | 178 ++---
kernel/sched/stats.h | 86 +++
kernel/workqueue.c | 23 +
kernel/workqueue_internal.h | 6 +-
mm/compaction.c | 5 +
mm/filemap.c | 20 +-
mm/huge_memory.c | 1 +
mm/migrate.c | 2 +
mm/page_alloc.c | 9 +
mm/swap_state.c | 1 +
mm/vmscan.c | 10 +
mm/workingset.c | 113 +++-
46 files changed, 1841 insertions(+), 279 deletions(-)
create mode 100644 Documentation/accounting/psi.txt
create mode 100644 include/linux/psi.h
create mode 100644 include/linux/psi_types.h
create mode 100644 kernel/sched/psi.c
--
1.8.3.1
3
28

[PATCH openEuler-1.0-LTS 1/4] time: Normalize timespec64 before timespec64_compare()
by Yang Yingliang 15 Dec '21
by Yang Yingliang 15 Dec '21
15 Dec '21
From: Yu Liao <liaoyu15(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 185831, https://gitee.com/openeuler/kernel/issues/I4MO2G
CVE: NA
-------------------------------------------------
Passing unnormalized timespec64 to timespec64_compare() may cause
incorrect results.
For example:
wall_to_monotonic = {tv_sec = -10, tv_nsec = 900000000}
ts_delta = {tv_sec = -9, tv_nsec = -900000000}
timespec64_compare() returns -1, but actually wall_to_monotonic > ts_delta.
This will cause wall_to_monotonic to become a positive number.
Use timespec64_sub() instead of direct subtraction to avoid this.
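A minimal illustration with arbitrary values (timespec64_sub()
normalizes via set_normalized_timespec64(), keeping tv_nsec in
[0, NSEC_PER_SEC)):
	struct timespec64 ts = { .tv_sec = 1,  .tv_nsec = 50000000 };
	struct timespec64 xt = { .tv_sec = 10, .tv_nsec = 950000000 };
	struct timespec64 bad, good;

	/* direct subtraction: { tv_sec = -9, tv_nsec = -900000000 }, unnormalized */
	bad.tv_sec  = ts.tv_sec  - xt.tv_sec;
	bad.tv_nsec = ts.tv_nsec - xt.tv_nsec;

	/* timespec64_sub(): { tv_sec = -10, tv_nsec = 100000000 }, normalized */
	good = timespec64_sub(ts, xt);

	/* only 'good' is safe to pass to timespec64_compare() */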
Signed-off-by: Yu Liao <liaoyu15(a)huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
include/linux/time64.h | 2 ++
kernel/time/timekeeping.c | 3 +--
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/linux/time64.h b/include/linux/time64.h
index b58142aa0b1dd..e58f59674ad6d 100644
--- a/include/linux/time64.h
+++ b/include/linux/time64.h
@@ -62,6 +62,8 @@ static inline int timespec64_equal(const struct timespec64 *a,
* lhs < rhs: return <0
* lhs == rhs: return 0
* lhs > rhs: return >0
+ *
+ * Note: Both lhs and rhs must be normalized.
*/
static inline int timespec64_compare(const struct timespec64 *lhs, const struct timespec64 *rhs)
{
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 64aa492f79422..17ba0a2f5f5d1 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1236,8 +1236,7 @@ int do_settimeofday64(const struct timespec64 *ts)
timekeeping_forward_now(tk);
xt = tk_xtime(tk);
- ts_delta.tv_sec = ts->tv_sec - xt.tv_sec;
- ts_delta.tv_nsec = ts->tv_nsec - xt.tv_nsec;
+ ts_delta = timespec64_sub(*ts, xt);
if (timespec64_compare(&tk->wall_to_monotonic, &ts_delta) > 0) {
ret = -EINVAL;
--
2.25.1
1
3

15 Dec '21
openEuler inclusion
category: bugfix
bugzilla: NA
CVE: NA
------------------------
In line 1767, sas_alloc_slow_task() allocates and initializes a
sas_task structure. When errors occur, line 1778 and line 1795 forget
to free this structure, which leads to a memory leak.
There is a similar snippet of code in the same file (in function
pm8001_send_read_log), allocating and initializing in line 1812 and
releasing the memory in line 1822 and line 1867.
We can fix it by calling sas_free_task() when res or ret is nonzero,
before the function returns.
Signed-off-by: Jianglei Nie <niejianglei2021(a)163.com>
---
drivers/scsi/pm8001/pm8001_hwi.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c
index 124cb69740c6..25045a91620e 100644
--- a/drivers/scsi/pm8001/pm8001_hwi.c
+++ b/drivers/scsi/pm8001/pm8001_hwi.c
@@ -1774,8 +1774,10 @@ static void pm8001_send_abort_all(struct pm8001_hba_info *pm8001_ha,
task->task_done = pm8001_task_done;
res = pm8001_tag_alloc(pm8001_ha, &ccb_tag);
- if (res)
+ if (res) {
+ sas_free_task(task);
return;
+ }
ccb = &pm8001_ha->ccb_info[ccb_tag];
ccb->device = pm8001_ha_dev;
@@ -1791,8 +1793,10 @@ static void pm8001_send_abort_all(struct pm8001_hba_info *pm8001_ha,
ret = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &task_abort,
sizeof(task_abort), 0);
- if (ret)
+ if (ret) {
+ sas_free_task(task);
pm8001_tag_free(pm8001_ha, ccb_tag);
+ }
}
--
2.25.1
1
0
Hello, I am testing the perf spe-c2c feature with the latest openEuler-1.0, but I found that there is no arm_spe_0 directory under /sys/devices.
When running perf spe-c2c record, it shows:
event syntax error: 'arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/'
\___ Cannot find PMU `arm_spe_0'. Missing kernel support?
想问下您这是为什么? 需要哪里开启spe功能吗?
3
2
14 Dec '21
From: zhangwensheng <zhangwensheng5(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 185891
CVE: NA
--------------------------------
The last_events field in md_rdev was enlarged; hide the layout change behind __GENKSYMS__ to preserve the KABI.
Signed-off-by: zhangwensheng <zhangwensheng5(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/md/md.h | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index e745f701e1a82..422af63f1e1ee 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -47,8 +47,16 @@ struct md_rdev {
sector_t sectors; /* Device size (in 512bytes sectors) */
struct mddev *mddev; /* RAID array if running */
- long long last_events; /* IO event timestamp */
+ /*
+ * IO event timestamp, this pos has 64 bit space,
+ * can enlarge it solving its small size.
+ */
+#ifndef __GENKSYMS__
+ long long last_events;
+#else
+ int last_events;
+#endif
/*
* If meta_bdev is non-NULL, it means that a separate device is
* being used to store the metadata (superblock/bitmap) which
--
2.25.1
1
0
[PATCH openEuler-1.0-LTS] Revert "dm space maps: don't reset space map allocation cursor when committing"
by Yang Yingliang 14 Dec '21
by Yang Yingliang 14 Dec '21
14 Dec '21
From: Luo Meng <luomeng12(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 185894, https://gitee.com/openeuler/kernel/issues/I4MITL
CVE: NA
-----------------------------------------------
This reverts commit 38cb5d45845b9d8962eb26660c14b7f05abc602f.
Signed-off-by: Luo Meng <luomeng12(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/md/persistent-data/dm-space-map-disk.c | 9 +--------
drivers/md/persistent-data/dm-space-map-metadata.c | 9 +--------
2 files changed, 2 insertions(+), 16 deletions(-)
diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c
index e0acae7a3815d..bf4c5e2ccb6ff 100644
--- a/drivers/md/persistent-data/dm-space-map-disk.c
+++ b/drivers/md/persistent-data/dm-space-map-disk.c
@@ -171,14 +171,6 @@ static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
* Any block we allocate has to be free in both the old and current ll.
*/
r = sm_ll_find_common_free_block(&smd->old_ll, &smd->ll, smd->begin, smd->ll.nr_blocks, b);
- if (r == -ENOSPC) {
- /*
- * There's no free block between smd->begin and the end of the metadata device.
- * We search before smd->begin in case something has been freed.
- */
- r = sm_ll_find_common_free_block(&smd->old_ll, &smd->ll, 0, smd->begin, b);
- }
-
if (r)
return r;
@@ -207,6 +199,7 @@ static int sm_disk_commit(struct dm_space_map *sm)
return r;
memcpy(&smd->old_ll, &smd->ll, sizeof(smd->old_ll));
+ smd->begin = 0;
smd->nr_allocated_this_transaction = 0;
r = sm_disk_get_nr_free(sm, &nr_free);
diff --git a/drivers/md/persistent-data/dm-space-map-metadata.c b/drivers/md/persistent-data/dm-space-map-metadata.c
index da439ac857963..9e3c64ec2026f 100644
--- a/drivers/md/persistent-data/dm-space-map-metadata.c
+++ b/drivers/md/persistent-data/dm-space-map-metadata.c
@@ -452,14 +452,6 @@ static int sm_metadata_new_block_(struct dm_space_map *sm, dm_block_t *b)
* Any block we allocate has to be free in both the old and current ll.
*/
r = sm_ll_find_common_free_block(&smm->old_ll, &smm->ll, smm->begin, smm->ll.nr_blocks, b);
- if (r == -ENOSPC) {
- /*
- * There's no free block between smm->begin and the end of the metadata device.
- * We search before smm->begin in case something has been freed.
- */
- r = sm_ll_find_common_free_block(&smm->old_ll, &smm->ll, 0, smm->begin, b);
- }
-
if (r)
return r;
@@ -511,6 +503,7 @@ static int sm_metadata_commit(struct dm_space_map *sm)
return r;
memcpy(&smm->old_ll, &smm->ll, sizeof(smm->old_ll));
+ smm->begin = 0;
smm->allocated_this_transaction = 0;
return 0;
--
2.25.1
1
0
[PATCH openEuler-1.0-LTS 1/4] nvme-fabrics: reject I/O to offline device
by Yang Yingliang 14 Dec '21
by Yang Yingliang 14 Dec '21
14 Dec '21
From: Victor Gladkov <victor.gladkov(a)kioxia.com>
mainline inclusion
from mainline-v5.11-rc1
commit 8c4dfea97f15b80097b3f882ca428fb2751ec30c
category: bugfix
bugzilla: NA
CVE: NA
Link: https://gitee.com/openeuler/kernel/issues/I4JFPM?from=project-issue
-------------------------------------------------
Commands get stuck while Host NVMe-oF controller is in reconnect state.
The controller enters into reconnect state when it loses connection with
the target. It tries to reconnect every 10 seconds (default) until
a successful reconnect or until the reconnect time-out is reached.
The default reconnect time out is 10 minutes.
Applications are expecting commands to complete with success or error
within a certain timeout (30 seconds by default). The NVMe host is
enforcing that timeout while it is connected, but during reconnect the
timeout is not enforced and commands may get stuck for a long period or
even forever.
To fix this long delay due to the default timeout, introduce new
"fast_io_fail_tmo" session parameter. The timeout is measured in seconds
from the controller reconnect and any command beyond that timeout is
rejected. The new parameter value may be passed during 'connect'.
The default value of -1 means no timeout (similar to current behavior).
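As an illustrative userspace sketch (the transport, address and NQN
below are placeholders, not values from this patch), the new token is
simply appended to the options string written to /dev/nvme-fabrics:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* reject pending I/O 30 seconds after reconnect starts */
	const char *opts =
		"transport=rdma,traddr=192.168.1.10,trsvcid=4420,"
		"nqn=nqn.2021-12.io.example:subsys1,fast_io_fail_tmo=30";
	int fd = open("/dev/nvme-fabrics", O_RDWR);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* the kernel parses this token list in nvmf_parse_options() */
	if (write(fd, opts, strlen(opts)) < 0)
		perror("write");
	close(fd);
	return 0;
}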
Signed-off-by: Victor Gladkov <victor.gladkov(a)kioxia.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni(a)wdc.com>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Chao Leng <lengchao(a)huawei.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
conflicts:
drivers/nvme/host/core.c
[adjust context]
Signed-off-by: jiangtao <jiangtao62(a)huawei.com>
Reviewed-by: chengjike <chengjike.cheng(a)huawei.com>
Reviewed-by: Ao Sun <sunao.sun(a)huawei.com>
Reviewed-by: Zhenwei Yang <yangzhenwei(a)huawei.com>
Reviewed-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/nvme/host/core.c | 51 +++++++++++++++++++++++++++++++++--
drivers/nvme/host/fabrics.c | 28 ++++++++++++++++---
drivers/nvme/host/fabrics.h | 5 ++++
drivers/nvme/host/multipath.c | 2 ++
drivers/nvme/host/nvme.h | 3 +++
5 files changed, 83 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b45ae13aa5fe6..c9ca375a5ddb9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -130,6 +130,37 @@ static void nvme_queue_scan(struct nvme_ctrl *ctrl)
queue_work(nvme_wq, &ctrl->scan_work);
}
+static void nvme_failfast_work(struct work_struct *work)
+{
+ struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
+ struct nvme_ctrl, failfast_work);
+
+ if (ctrl->state != NVME_CTRL_CONNECTING)
+ return;
+
+ set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
+ dev_info(ctrl->device, "failfast expired\n");
+ nvme_kick_requeue_lists(ctrl);
+}
+
+static inline void nvme_start_failfast_work(struct nvme_ctrl *ctrl)
+{
+ if (!ctrl->opts || ctrl->opts->fast_io_fail_tmo == -1)
+ return;
+
+ schedule_delayed_work(&ctrl->failfast_work,
+ ctrl->opts->fast_io_fail_tmo * HZ);
+}
+
+static inline void nvme_stop_failfast_work(struct nvme_ctrl *ctrl)
+{
+ if (!ctrl->opts)
+ return;
+
+ cancel_delayed_work_sync(&ctrl->failfast_work);
+ clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
+}
+
int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
{
if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
@@ -385,8 +416,21 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
ctrl->state = new_state;
spin_unlock_irqrestore(&ctrl->lock, flags);
- if (changed && ctrl->state == NVME_CTRL_LIVE)
- nvme_kick_requeue_lists(ctrl);
+ if (changed) {
+ switch (ctrl->state) {
+ case NVME_CTRL_LIVE:
+ if (old_state == NVME_CTRL_CONNECTING)
+ nvme_stop_failfast_work(ctrl);
+ nvme_kick_requeue_lists(ctrl);
+ break;
+ case NVME_CTRL_CONNECTING:
+ if (old_state == NVME_CTRL_RESETTING)
+ nvme_start_failfast_work(ctrl);
+ break;
+ default:
+ break;
+ }
+ }
return changed;
}
EXPORT_SYMBOL_GPL(nvme_change_ctrl_state);
@@ -3712,6 +3756,7 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
{
nvme_mpath_stop(ctrl);
nvme_stop_keep_alive(ctrl);
+ nvme_stop_failfast_work(ctrl);
flush_work(&ctrl->async_event_work);
cancel_work_sync(&ctrl->fw_act_work);
if (ctrl->ops->stop_ctrl)
@@ -3776,6 +3821,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
int ret;
ctrl->state = NVME_CTRL_NEW;
+ clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
spin_lock_init(&ctrl->lock);
mutex_init(&ctrl->scan_lock);
INIT_LIST_HEAD(&ctrl->namespaces);
@@ -3791,6 +3837,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
+ INIT_DELAYED_WORK(&ctrl->failfast_work, nvme_failfast_work);
BUILD_BUG_ON(NVME_DSM_MAX_RANGES * sizeof(struct nvme_dsm_range) >
PAGE_SIZE);
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 738794af3f389..650b3bd89968f 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -550,6 +550,7 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
{
if (ctrl->state != NVME_CTRL_DELETING &&
ctrl->state != NVME_CTRL_DEAD &&
+ !test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) &&
!blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
return BLK_STS_RESOURCE;
@@ -607,6 +608,7 @@ static const match_table_t opt_tokens = {
{ NVMF_OPT_HOST_TRADDR, "host_traddr=%s" },
{ NVMF_OPT_HOST_ID, "hostid=%s" },
{ NVMF_OPT_DUP_CONNECT, "duplicate_connect" },
+ { NVMF_OPT_FAIL_FAST_TMO, "fast_io_fail_tmo=%d" },
{ NVMF_OPT_ERR, NULL }
};
@@ -626,6 +628,7 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
opts->reconnect_delay = NVMF_DEF_RECONNECT_DELAY;
opts->kato = NVME_DEFAULT_KATO;
opts->duplicate_connect = false;
+ opts->fast_io_fail_tmo = NVMF_DEF_FAIL_FAST_TMO;
options = o = kstrdup(buf, GFP_KERNEL);
if (!options)
@@ -750,6 +753,17 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
pr_warn("ctrl_loss_tmo < 0 will reconnect forever\n");
ctrl_loss_tmo = token;
break;
+ case NVMF_OPT_FAIL_FAST_TMO:
+ if (match_int(args, &token)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (token >= 0)
+ pr_warn("I/O will fail on after %d sec reconnect\n",
+ token);
+ opts->fast_io_fail_tmo = token;
+ break;
case NVMF_OPT_HOSTNQN:
if (opts->host) {
pr_err("hostnqn already user-assigned: %s\n",
@@ -830,11 +844,17 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
opts->nr_io_queues = 0;
opts->duplicate_connect = true;
}
- if (ctrl_loss_tmo < 0)
+
+ if (ctrl_loss_tmo < 0) {
opts->max_reconnects = -1;
- else
+ } else {
opts->max_reconnects = DIV_ROUND_UP(ctrl_loss_tmo,
opts->reconnect_delay);
+ if (ctrl_loss_tmo < opts->fast_io_fail_tmo)
+ pr_warn("failfast tmo (%d) > ctrl_loss_tmo (%d)\n",
+ opts->fast_io_fail_tmo,
+ ctrl_loss_tmo);
+ }
if (!opts->host) {
kref_get(&nvmf_default_host->ref);
@@ -903,8 +923,8 @@ EXPORT_SYMBOL_GPL(nvmf_free_options);
#define NVMF_REQUIRED_OPTS (NVMF_OPT_TRANSPORT | NVMF_OPT_NQN)
#define NVMF_ALLOWED_OPTS (NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
- NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT)
-
+ NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
+ NVMF_OPT_FAIL_FAST_TMO)
static struct nvme_ctrl *
nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
{
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 188ebbeec32ce..645ed75f22980 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -24,6 +24,8 @@
/* default to 600 seconds of reconnect attempts before giving up */
#define NVMF_DEF_CTRL_LOSS_TMO 600
#define NVMF_DEF_RECONNECT_FOREVER -1
+/* default is -1: the fail fast mechanism is disabled */
+#define NVMF_DEF_FAIL_FAST_TMO -1
/*
* Define a host as seen by the target. We allocate one at boot, but also
@@ -59,6 +61,7 @@ enum {
NVMF_OPT_CTRL_LOSS_TMO = 1 << 11,
NVMF_OPT_HOST_ID = 1 << 12,
NVMF_OPT_DUP_CONNECT = 1 << 13,
+ NVMF_OPT_FAIL_FAST_TMO = 1 << 20,
};
/**
@@ -86,6 +89,7 @@ enum {
* @max_reconnects: maximum number of allowed reconnect attempts before removing
* the controller, (-1) means reconnect forever, zero means remove
* immediately;
+ * @fast_io_fail_tmo: Fast I/O fail timeout in seconds
*/
struct nvmf_ctrl_options {
unsigned mask;
@@ -102,6 +106,7 @@ struct nvmf_ctrl_options {
unsigned int kato;
struct nvmf_host *host;
int max_reconnects;
+ int fast_io_fail_tmo;
};
/*
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index f4616df1f7f15..9201946568630 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -198,6 +198,8 @@ static bool nvme_available_path(struct nvme_ns_head *head)
struct nvme_ns *ns;
list_for_each_entry_rcu(ns, &head->list, siblings) {
+ if (test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ns->ctrl->flags))
+ continue;
switch (ns->ctrl->state) {
case NVME_CTRL_LIVE:
case NVME_CTRL_RESETTING:
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 85f814a91ce9c..cd66364eb5429 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -211,6 +211,7 @@ struct nvme_ctrl {
struct work_struct scan_work;
struct work_struct async_event_work;
struct delayed_work ka_work;
+ struct delayed_work failfast_work;
struct nvme_command ka_cmd;
struct work_struct fw_act_work;
unsigned long events;
@@ -245,6 +246,8 @@ struct nvme_ctrl {
u16 icdoff;
u16 maxcmd;
int nr_reconnects;
+ unsigned long flags;
+#define NVME_CTRL_FAILFAST_EXPIRED 0
struct nvmf_ctrl_options *opts;
struct page *discard_page;
--
2.25.1
1
3
[PATCH openEuler-1.0-LTS 01/12] nvme: add proper discard setup for the multipath device
by Yang Yingliang 14 Dec '21
by Yang Yingliang 14 Dec '21
14 Dec '21
From: Christoph Hellwig <hch(a)lst.de>
mainline inclusion
from mainline-v5.1-rc1
commit 2631857160ecbea04e54423f5053133fe2b6ea45
category: bugfix
bugzilla: NA
CVE: NA
Link: https://gitee.com/openeuler/kernel/issues/I4JFBE?from=project-issue
-------------------------------------------------
Add a gendisk argument to nvme_config_discard so that the call to
nvme_update_disk_info for the multipath device node updates the
proper request_queue.
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Reported-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Keith Busch <keith.busch(a)intel.com>
Reviewed-by: Max Gurtovoy <maxg(a)mellanox.com>
Tested-by: Sagi Grimberg <sagi(a)grimberg.me>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
conflicts:
drivers/nvme/host/core.c
[adjust context]
Signed-off-by: chengjike <chengjike.cheng(a)huawei.com>
Reviewed-by: Ao Sun <sunao.sun(a)huawei.com>
Reviewed-by: Zhenwei Yang <yangzhenwei(a)huawei.com>
Reviewed-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/nvme/host/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 522ab9c0133ca..7c4a97e2cd93a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1558,10 +1558,10 @@ static void nvme_set_chunk_size(struct nvme_ns *ns)
blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(chunk_size));
}
-static void nvme_config_discard(struct nvme_ns *ns)
+static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns)
{
struct nvme_ctrl *ctrl = ns->ctrl;
- struct request_queue *queue = ns->queue;
+ struct request_queue *queue = disk->queue;
u32 size = queue_logical_block_size(queue);
if (!(ctrl->oncs & NVME_CTRL_ONCS_DSM)) {
@@ -1647,7 +1647,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
capacity = 0;
set_capacity(disk, capacity);
- nvme_config_discard(ns);
+ nvme_config_discard(disk, ns);
if (id->nsattr & (1 << 0))
set_disk_ro(disk, true);
--
2.25.1
1
11
[PATCH openEuler-1.0-LTS 1/2] md: Fix undefined behaviour in is_mddev_idle
by Yang Yingliang 14 Dec '21
by Yang Yingliang 14 Dec '21
14 Dec '21
From: zhangwensheng <zhangwensheng5(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 185891
CVE: NA
--------------------------------
UBSAN reports this problem:
[ 5984.281385] UBSAN: Undefined behaviour in drivers/md/md.c:8175:15
[ 5984.281390] signed integer overflow:
[ 5984.281393] -2147483291 - 2072033152 cannot be represented in type 'int'
[ 5984.281400] CPU: 25 PID: 1854 Comm: md101_resync Kdump: loaded Not tainted 4.19.90
[ 5984.281404] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDDA
[ 5984.281406] Call trace:
[ 5984.281415] dump_backtrace+0x0/0x310
[ 5984.281418] show_stack+0x28/0x38
[ 5984.281425] dump_stack+0xec/0x15c
[ 5984.281430] ubsan_epilogue+0x18/0x84
[ 5984.281434] handle_overflow+0x14c/0x19c
[ 5984.281439] __ubsan_handle_sub_overflow+0x34/0x44
[ 5984.281445] is_mddev_idle+0x338/0x3d8
[ 5984.281449] md_do_sync+0x1bb8/0x1cf8
[ 5984.281452] md_thread+0x220/0x288
[ 5984.281457] kthread+0x1d8/0x1e0
[ 5984.281461] ret_from_fork+0x10/0x18
When the stat accum of the disk is greater than INT_MAX, its value
becomes negative after casting to 'int', which may lead to overflow
after subtracting a positive number. In the same way, when the value
of sync_io is greater than INT_MAX, overflow may also occur. These
situations will lead to undefined behavior.
Moreover, if the stat accum of the disk is close to INT_MAX when
creating raid arrays, the initial value of last_events would be set
close to INT_MAX when mddev initializes IO event counters.
'curr_events - rdev->last_events > 64' will then always be false during
synchronization. If all the disks of mddev are in this case,
is_mddev_idle() will always return 1, which may cause non-sync IO to
become very slow.
To address these problems, use a 64-bit signed integer type for
sync_io, last_events, and curr_events.
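A standalone sketch of the arithmetic with the numbers from the UBSAN
report (a demonstration only, not driver code):

#include <stdio.h>

int main(void)
{
	/* (int)part_stat_read_accum(...) after truncation, per the report */
	int last_events = -2147483291;
	int sync_io = 2072033152;

	/* the old code did the subtraction in 32 bits; the true result
	 * -4219516443 does not fit in int, which is the signed overflow
	 * UBSAN flagged:
	 *	int curr = last_events - sync_io;
	 */

	/* the fix: do the subtraction in a 64-bit signed type */
	long long curr = (long long)last_events - (long long)sync_io;

	printf("%lld\n", curr);
	return 0;
}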
Signed-off-by: zhangwensheng <zhangwensheng5(a)huawei.com>
Reviewed-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/md/md.c | 7 ++++---
drivers/md/md.h | 6 +++---
include/linux/genhd.h | 2 +-
3 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 409ec5ffd28d3..8a68ff71fbbf8 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8161,14 +8161,15 @@ static int is_mddev_idle(struct mddev *mddev, int init)
{
struct md_rdev *rdev;
int idle;
- int curr_events;
+ long long curr_events;
idle = 1;
rcu_read_lock();
rdev_for_each_rcu(rdev, mddev) {
struct gendisk *disk = rdev->bdev->bd_contains->bd_disk;
- curr_events = (int)part_stat_read_accum(&disk->part0, sectors) -
- atomic_read(&disk->sync_io);
+ curr_events =
+ (long long)part_stat_read_accum(&disk->part0, sectors) -
+ atomic64_read(&disk->sync_io_sectors);
/* sync IO will cause sync_io to increase before the disk_stats
* as sync_io is counted when a request starts, and
* disk_stats is counted when it completes.
diff --git a/drivers/md/md.h b/drivers/md/md.h
index ee3c9768ff9e7..e745f701e1a82 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -47,7 +47,7 @@ struct md_rdev {
sector_t sectors; /* Device size (in 512bytes sectors) */
struct mddev *mddev; /* RAID array if running */
- int last_events; /* IO event timestamp */
+ long long last_events; /* IO event timestamp */
/*
* If meta_bdev is non-NULL, it means that a separate device is
@@ -528,12 +528,12 @@ extern void mddev_unlock(struct mddev *mddev);
static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sectors)
{
- atomic_add(nr_sectors, &bdev->bd_contains->bd_disk->sync_io);
+ atomic64_add(nr_sectors, &bdev->bd_contains->bd_disk->sync_io_sectors);
}
static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
{
- atomic_add(nr_sectors, &bio->bi_disk->sync_io);
+ atomic64_add(nr_sectors, &bio->bi_disk->sync_io_sectors);
}
struct md_personality
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 9c398294b6279..38737c853b28d 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -212,7 +212,6 @@ struct gendisk {
struct kobject *slave_dir;
struct timer_rand_state *random;
- atomic_t sync_io; /* RAID */
struct disk_events *ev;
#ifdef CONFIG_BLK_DEV_INTEGRITY
struct kobject integrity_kobj;
@@ -220,6 +219,7 @@ struct gendisk {
int node_id;
struct badblocks *bb;
struct lockdep_map lockdep_map;
+ atomic64_t sync_io_sectors; /* RAID */
#ifndef __GENKSYMS__
unsigned long *user_ro_bitmap;
--
2.25.1
1
1
[PATCH openEuler-1.0-LTS 1/4] xfs: ensure that the inode uid/gid match values match the icdinode ones
by Yang Yingliang 14 Dec '21
by Yang Yingliang 14 Dec '21
14 Dec '21
From: Christoph Hellwig <hch(a)lst.de>
mainline inclusion
from mainline-v5.6-rc4
commit 3d8f2821502d0b60bac2789d0bea951fda61de0c
category: bugfix
bugzilla: 185881
CVE: CVE-2021-4037
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
-------------------------------------------------
Instead of only synchronizing the uid/gid values in xfs_setup_inode,
ensure that they always match to prepare for removing the icdinode
fields.
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Signed-off-by: Guo Xuenan <guoxuenan(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/xfs/libxfs/xfs_inode_buf.c | 2 ++
fs/xfs/xfs_icache.c | 4 ++++
fs/xfs/xfs_inode.c | 8 ++++++--
fs/xfs/xfs_iops.c | 3 ---
4 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 82c4374acb4c6..ed58741e09ea8 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -225,7 +225,9 @@ xfs_inode_from_disk(
to->di_format = from->di_format;
to->di_uid = be32_to_cpu(from->di_uid);
+ inode->i_uid = xfs_uid_to_kuid(to->di_uid);
to->di_gid = be32_to_cpu(from->di_gid);
+ inode->i_gid = xfs_gid_to_kgid(to->di_gid);
to->di_flushiter = be16_to_cpu(from->di_flushiter);
/*
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 56e9043bddc71..ceee27b703842 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -286,6 +286,8 @@ xfs_reinit_inode(
uint64_t version = inode_peek_iversion(inode);
umode_t mode = inode->i_mode;
dev_t dev = inode->i_rdev;
+ kuid_t uid = inode->i_uid;
+ kgid_t gid = inode->i_gid;
error = inode_init_always(mp->m_super, inode);
@@ -294,6 +296,8 @@ xfs_reinit_inode(
inode_set_iversion_queried(inode, version);
inode->i_mode = mode;
inode->i_rdev = dev;
+ inode->i_uid = uid;
+ inode->i_gid = gid;
return error;
}
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index c3226c48079da..70d5f0e2aa129 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -812,15 +812,19 @@ xfs_ialloc(
inode->i_mode = mode;
set_nlink(inode, nlink);
- ip->i_d.di_uid = xfs_kuid_to_uid(current_fsuid());
- ip->i_d.di_gid = xfs_kgid_to_gid(current_fsgid());
+ inode->i_uid = current_fsuid();
+ ip->i_d.di_uid = xfs_kuid_to_uid(inode->i_uid);
inode->i_rdev = rdev;
xfs_set_projid(ip, prid);
if (pip && XFS_INHERIT_GID(pip)) {
+ inode->i_gid = VFS_I(pip)->i_gid;
ip->i_d.di_gid = pip->i_d.di_gid;
if ((VFS_I(pip)->i_mode & S_ISGID) && S_ISDIR(mode))
inode->i_mode |= S_ISGID;
+ } else {
+ inode->i_gid = current_fsgid();
+ ip->i_d.di_gid = xfs_kgid_to_gid(inode->i_gid);
}
/*
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 0ac63cafb32a1..2c9d2d2f92983 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1305,9 +1305,6 @@ xfs_setup_inode(
/* make the inode look hashed for the writeback code */
inode_fake_hash(inode);
- inode->i_uid = xfs_uid_to_kuid(ip->i_d.di_uid);
- inode->i_gid = xfs_gid_to_kgid(ip->i_d.di_gid);
-
i_size_write(inode, ip->i_d.di_size);
xfs_diflags_to_iflags(inode, ip);
--
2.25.1
1
3
[PATCH openEuler-1.0-LTS] configfs: fix a use-after-free in __configfs_open_file
by Yang Yingliang 14 Dec '21
by Yang Yingliang 14 Dec '21
14 Dec '21
From: Daiyue Zhang <zhangdaiyue1(a)huawei.com>
mainline inclusion
from mainline-v5.12-rc3
commit 14fbbc8297728e880070f7b077b3301a8c698ef9
category: bugfix
bugzilla: NA
CVE: CVE-2021-39656
-----------------------------------------------
Commit b0841eefd969 ("configfs: provide exclusion between IO and removals")
uses ->frag_dead to mark the fragment state, thus not bothering with an
extra refcount on config_item when opening a file. The
configfs_get_config_item() call was removed from __configfs_open_file(),
but the matching config_item_put() was not. So the refcount on
config_item loses its balance, causing use-after-free issues in
situations like this:
Test:
1. Mount configfs on /config with read-only items:
drwxrwx--- 289 root root 0 2021-04-01 11:55 /config
drwxr-xr-x 2 root root 0 2021-04-01 11:54 /config/a
--w--w--w- 1 root root 4096 2021-04-01 11:53 /config/a/1.txt
......
2. Then run:
for file in /config
do
echo $file
grep -R 'key' $file
done
3. __configfs_open_file will be called in parallel, the first one
got called will do:
if (file->f_mode & FMODE_READ) {
if (!(inode->i_mode & S_IRUGO))
goto out_put_module;
config_item_put(buffer->item);
kref_put()
package_details_release()
kfree()
the other one will run into use-after-free issues like this:
BUG: KASAN: use-after-free in __configfs_open_file+0x1bc/0x3b0
Read of size 8 at addr fffffff155f02480 by task grep/13096
CPU: 0 PID: 13096 Comm: grep VIP: 00 Tainted: G W 4.14.116-kasan #1
TGID: 13096 Comm: grep
Call trace:
dump_stack+0x118/0x160
kasan_report+0x22c/0x294
__asan_load8+0x80/0x88
__configfs_open_file+0x1bc/0x3b0
configfs_open_file+0x28/0x34
do_dentry_open+0x2cc/0x5c0
vfs_open+0x80/0xe0
path_openat+0xd8c/0x2988
do_filp_open+0x1c4/0x2fc
do_sys_open+0x23c/0x404
SyS_openat+0x38/0x48
Allocated by task 2138:
kasan_kmalloc+0xe0/0x1ac
kmem_cache_alloc_trace+0x334/0x394
packages_make_item+0x4c/0x180
configfs_mkdir+0x358/0x740
vfs_mkdir2+0x1bc/0x2e8
SyS_mkdirat+0x154/0x23c
el0_svc_naked+0x34/0x38
Freed by task 13096:
kasan_slab_free+0xb8/0x194
kfree+0x13c/0x910
package_details_release+0x524/0x56c
kref_put+0xc4/0x104
config_item_put+0x24/0x34
__configfs_open_file+0x35c/0x3b0
configfs_open_file+0x28/0x34
do_dentry_open+0x2cc/0x5c0
vfs_open+0x80/0xe0
path_openat+0xd8c/0x2988
do_filp_open+0x1c4/0x2fc
do_sys_open+0x23c/0x404
SyS_openat+0x38/0x48
el0_svc_naked+0x34/0x38
To fix this issue, remove the config_item_put in
__configfs_open_file to balance the refcount of config_item.
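A compact userspace model of the imbalance (the names and the kref
logic are simplified stand-ins for the configfs structures):

#include <stdio.h>
#include <stdlib.h>

struct item { int refcount; };

static void item_put(struct item *it)
{
	/* models config_item_put() -> kref_put() -> ..._release() */
	if (--it->refcount == 0) {
		printf("released\n"); /* kfree() in package_details_release() */
		free(it);
	}
}

int main(void)
{
	struct item *it = malloc(sizeof(*it));

	it->refcount = 1; /* the reference taken when mkdir created the item */

	/* after b0841eefd969 the open path takes no reference of its own,
	 * so an unconditional put in the error path drops the creator's
	 * only reference... */
	item_put(it);

	/* ...and a concurrent __configfs_open_file() then touches freed
	 * memory -- the use-after-free KASAN reported */
	return 0;
}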
Fixes: b0841eefd969 ("configfs: provide exclusion between IO and removals")
Signed-off-by: Daiyue Zhang <zhangdaiyue1(a)huawei.com>
Signed-off-by: Yi Chen <chenyi77(a)huawei.com>
Signed-off-by: Ge Qiu <qiuge(a)huawei.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Acked-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: weiyang wang <wangweiyang2(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/configfs/file.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/configfs/file.c b/fs/configfs/file.c
index bb0a427517e92..50b7c4c4310e0 100644
--- a/fs/configfs/file.c
+++ b/fs/configfs/file.c
@@ -392,7 +392,7 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type
attr = to_attr(dentry);
if (!attr)
- goto out_put_item;
+ goto out_free_buffer;
if (type & CONFIGFS_ITEM_BIN_ATTR) {
buffer->bin_attr = to_bin_attr(dentry);
@@ -405,7 +405,7 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type
/* Grab the module reference for this attribute if we have one */
error = -ENODEV;
if (!try_module_get(buffer->owner))
- goto out_put_item;
+ goto out_free_buffer;
error = -EACCES;
if (!buffer->item->ci_type)
@@ -449,8 +449,6 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type
out_put_module:
module_put(buffer->owner);
-out_put_item:
- config_item_put(buffer->item);
out_free_buffer:
up_read(&frag->frag_sem);
kfree(buffer);
--
2.25.1
1
0
Variables allocated by kvmalloc() should not be freed with kfree(),
because they may have been allocated by vmalloc().
Replace kfree() with kvfree() here.
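A minimal sketch of the pairing rule (these are the real kernel APIs;
the wrapper function itself is illustrative):

#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/slab.h>

static int alloc_example(size_t size)
{
	/* kvmalloc() tries kmalloc() first but may fall back to
	 * vmalloc() for large or fragmented allocations... */
	void *bitmap_buffer = kvmalloc(size, GFP_KERNEL);

	if (!bitmap_buffer)
		return -ENOMEM;

	/* ...so the buffer must always be released with kvfree();
	 * kfree() on a vmalloc()ed pointer is invalid */
	kvfree(bitmap_buffer);
	return 0;
}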
Fixes: 738fe155f58b ("vfio/iommu_type1: Add support for manual dirty log clear")
Signed-off-by: Jiacheng Shi <billsjc(a)sjtu.edu.cn>
---
drivers/vfio/vfio_iommu_type1.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5daceec48811..6811a85109aa 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1150,7 +1150,7 @@ static int vfio_iova_dirty_log_clear(u64 __user *bitmap,
}
out:
- kfree(bitmap_buffer);
+ kvfree(bitmap_buffer);
return ret;
}
--
2.17.1
3
4
[PATCH openEuler-1.0-LTS 1/2] share_pool: Remove the redundant warning message
by Yang Yingliang 13 Dec '21
by Yang Yingliang 13 Dec '21
13 Dec '21
From: Ding Tianhong <dingtianhong(a)huawei.com>
ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4M253
CVE: NA
-------------------------------------------------
There is no need to log a redundant message on allocation failure;
the kernel already reports it on OOM.
Signed-off-by: Ding Tianhong <dingtianhong(a)huawei.com>
Reviewed-by: Weilong Chen <chenweilong(a)huawei.com>
Reviewed-by: Hanjun Guo <guohanjun(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
mm/share_pool.c | 20 +++++---------------
1 file changed, 5 insertions(+), 15 deletions(-)
diff --git a/mm/share_pool.c b/mm/share_pool.c
index 2cc15f894cf84..ea61f69cdd6ad 100644
--- a/mm/share_pool.c
+++ b/mm/share_pool.c
@@ -137,10 +137,8 @@ static struct sp_group_master *sp_init_group_master_locked(
}
master = kmalloc(sizeof(struct sp_group_master), GFP_KERNEL);
- if (master == NULL) {
- pr_err_ratelimited("no memory for spg master\n");
+ if (master == NULL)
return ERR_PTR(-ENOMEM);
- }
INIT_LIST_HEAD(&master->node_list);
master->count = 0;
@@ -192,10 +190,8 @@ static struct sp_proc_stat *create_proc_stat(struct mm_struct *mm,
struct sp_proc_stat *stat;
stat = kmalloc(sizeof(*stat), GFP_KERNEL);
- if (stat == NULL) {
- pr_err_ratelimited("no memory for proc stat\n");
+ if (stat == NULL)
return ERR_PTR(-ENOMEM);
- }
atomic_set(&stat->use_count, 1);
atomic64_set(&stat->alloc_size, 0);
@@ -342,10 +338,8 @@ static struct spg_proc_stat *create_spg_proc_stat(int tgid, int spg_id)
struct spg_proc_stat *stat;
stat = kmalloc(sizeof(struct spg_proc_stat), GFP_KERNEL);
- if (stat == NULL) {
- pr_err_ratelimited("no memory for spg proc stat\n");
+ if (stat == NULL)
return ERR_PTR(-ENOMEM);
- }
stat->tgid = tgid;
stat->spg_id = spg_id;
@@ -413,10 +407,8 @@ static struct sp_spg_stat *create_spg_stat(int spg_id)
struct sp_spg_stat *stat;
stat = kmalloc(sizeof(*stat), GFP_KERNEL);
- if (stat == NULL) {
- pr_err_ratelimited("no memory for spg stat\n");
+ if (stat == NULL)
return ERR_PTR(-ENOMEM);
- }
stat->spg_id = spg_id;
atomic_set(&stat->hugepage_failures, 0);
@@ -1624,10 +1616,8 @@ static struct sp_area *sp_alloc_area(unsigned long size, unsigned long flags,
}
spa = __kmalloc_node(sizeof(struct sp_area), GFP_KERNEL, node_id);
- if (unlikely(!spa)) {
- pr_err_ratelimited("no memory for spa\n");
+ if (unlikely(!spa))
return ERR_PTR(-ENOMEM);
- }
spin_lock(&sp_area_lock);
--
2.25.1
1
1
[PATCH OLK-5.10 000/107] NTFS read-write driver GPL implementation by Paragon Software
by Yin Xiujiang 13 Dec '21
by Yin Xiujiang 13 Dec '21
13 Dec '21
This patch series adds an NTFS read-write driver to fs/ntfs3.
Having decades of expertise in commercial file system development and huge
test coverage, we at Paragon Software GmbH want to make our contribution to
the Open Source Community by providing an implementation of an NTFS
read-write driver for the Linux kernel.
This is a fully functional NTFS read-write driver. The current version works
with NTFS (including v3.1) and normal/compressed/sparse files, and supports
journal replaying.
We plan to maintain this version once the codebase is merged, and to add new
features and fix bugs. For example, full journaling support over JBD will be
added in later updates.
v2:
- patch split into chunks (file-wise)
- build issues fixed
- sparse and checkpatch.pl errors fixed
- NULL pointer dereference on mkfs.ntfs-formatted volume mount fixed
- cosmetics + code cleanup
v3:
- added acl, noatime, no_acs_rules, prealloc mount options
- added fiemap support
- fixed encodings support
- removed typedefs
- adapted Kernel-way logging mechanisms
- fixed typos and corner-case issues
v4:
- atomic_open() refactored
- code style updated
- bugfixes
v5:
- nls/nls_alt mount options added
- Unicode conversion fixes
- Improved very fragmented files operations
- logging cosmetics
v6:
- Security Descriptor processing changed: added system.ntfs_security
xattr to set the SD
- atomic_open() optimized
- cosmetics
Christophe JAILLET (2):
fs/ntfs3: Remove a useless test in 'indx_find()'
fs/ntfs3: Remove a useless shadowing variable
Colin Ian King (4):
fs/ntfs3: Fix various spelling mistakes
fs/ntfs3: Fix integer overflow in multiplication
fs/ntfs3: Remove redundant initialization of variable err
fs/ntfs3: Fix a memory leak on object opts
Dan Carpenter (5):
fs/ntfs3: add checks for allocation failure
fs/ntfs3: fix an error code in ntfs_get_acl_ex()
fs/ntfs3: Fix error code in indx_add_allocate()
fs/ntfs3: Potential NULL dereference in hdr_find_split()
fs/ntfs3: Fix error handling in indx_insert_into_root()
Gustavo A. R. Silva (1):
fs/ntfs3: Fix fall-through warnings for Clang
Jiapeng Chong (1):
fs/ntfs3: Remove unused including <linux/version.h>
Kari Argillander (54):
fs/ntfs3: Use linux/log2 is_power_of_2 function
fs/ntfs3: Add ifndef + define to all header files
fs/ntfs3: Fix one none utf8 char in source file
fs/ntfs3: Restyle comment block in ni_parse_reparse()
fs/ntfs3: Use kernel ALIGN macros over driver specific
fs/ntfs3: Do not use driver own alloc wrappers
fs/ntfs3: Use kcalloc/kmalloc_array over kzalloc/kmalloc
fs/ntfs3: Restyle comments to better align with kernel-doc
fs/ntfs3: Remove fat ioctl's from ntfs3 driver for now
fs/ntfs3: Fix integer overflow in ni_fiemap with fiemap_prep()
fs/ntfs3: Remove unnecessary condition checking from
ntfs_file_read_iter
fs/ntfs3: Remove GPL boilerplates from decompress lib files
fs/ntfs3: Change how module init/info messages are displayed
fs/ntfs3: Remove unnecesarry mount option noatime
fs/ntfs3: Remove unnecesarry remount flag handling
fs/ntfs3: Convert mount options to pointer in sbi
fs/ntfs3: Use new api for mounting
fs/ntfs3: Init spi more in init_fs_context than fill_super
fs/ntfs3: Make mount option nohidden more universal
fs/ntfs3: Add iocharset= mount option as alias for nls=
fs/ntfs3: Rename mount option no_acs_rules > (no)acsrules
fs/ntfs3: Show uid/gid always in show_options()
fs/ntfs3. Add forward declarations for structs to debug.h
fs/ntfs3: Add missing header files to ntfs.h
fs/ntfs3: Add missing headers and forward declarations to ntfs_fs.h
fs/ntfs3: Add missing header and guards to lib/ headers
fs/ntfs3: Change right headers to bitfunc.c
fs/ntfs3: Change right headers to upcase.c
fs/ntfs3: Change right headers to lznt.c
fs/ntfs3: Remove unneeded header files from c files
fs/ntfs3: Limit binary search table size
fs/ntfs3: Make binary search to search smaller chunks in beginning
fs/ntfs3: Always use binary search with entry search
fs/ntfs3: Remove '+' before constant in ni_insert_resident()
fs/ntfs3: Place Comparisons constant right side of the test
fs/ntfs3: Remove braces from single statment block
fs/ntfs3: Remove tabs before spaces from comment
fs/ntfs3: Fix ntfs_look_for_free_space() does only report -ENOSPC
fs/ntfs3: Remove always false condition check
fs/ntfs3: Use clamp/max macros instead of comparisons
fs/ntfs3: Use min/max macros instated of ternary operators
fs/ntfs3: Fix wrong error message $Logfile -> $UpCase
fs/ntfs3: Change EINVAL to ENOMEM when d_make_root fails
fs/ntfs3: Remove impossible fault condition in fill_super
fs/ntfs3: Return straight without goto in fill_super
fs/ntfs3: Remove unnecessary variable loading in fill_super
fs/ntfs3: Use sb instead of sbi->sb in fill_super
fs/ntfs3: Remove tmp var is_ro in ntfs_fill_super
fs/ntfs3: Remove tmp pointer bd_inode in fill_super
fs/ntfs3: Remove tmp pointer upcase in fill_super
fs/ntfs3: Initialize pointer before use place in fill_super
fs/ntfs3: Initiliaze sb blocksize only in one place + refactor
Doc/fs/ntfs3: Fix rst format and make it cleaner
fs/ntfs3: Remove deprecated mount options nls
Konstantin Komarov (37):
fs/ntfs3: Add headers and misc files
fs/ntfs3: Add initialization of super block
fs/ntfs3: Add bitmap
fs/ntfs3: Add file operations and implementation
fs/ntfs3: Add attrib operations
fs/ntfs3: Add compression
fs/ntfs3: Add NTFS journal
fs/ntfs3: Add Kconfig, Makefile and doc
fs/ntfs3: Add NTFS3 in fs/Kconfig and fs/Makefile
fs/ntfs3: Rework file operations
fs/ntfs3: Restyle comments to better align with kernel-doc
fs/ntfs3: Fix insertion of attr in ni_ins_attr_ext
fs/ntfs3: Change max hardlinks limit to 4000
fs/ntfs3: Add sync flag to ntfs_sb_write_run and al_update
fs/ntfs3: Fix logical error in ntfs_create_inode
fs/ntfs3: Move ni_lock_dir and ni_unlock into ntfs_create_inode
fs/ntfs3: Refactor ntfs_get_acl_ex for better readability
fs/ntfs3: Pass flags to ntfs_set_ea in ntfs_set_acl_ex
fs/ntfs3: Change posix_acl_equiv_mode to posix_acl_update_mode
fs/ntfs3: Refactoring lock in ntfs_init_acl
fs/ntfs3: Reject mount if boot's cluster size < media sector size
fs/ntfs3: Refactoring of ntfs_init_from_boot
fs/ntfs3: Check for NULL if ATTR_EA_INFO is incorrect
fs/ntfs3: Use available posix_acl_release instead of
ntfs_posix_acl_release
fs/ntfs3: Remove locked argument in ntfs_set_ea
fs/ntfs3: Refactoring of ntfs_set_ea
fs/ntfs3: Forbid FALLOC_FL_PUNCH_HOLE for normal files
fs/ntfs3: Remove unnecessary functions
fs/ntfs3: Keep prealloc for all types of files
fs/ntfs3: Fix memory leak if fill_super failed
fs/ntfs3: Rework ntfs_utf16_to_nls
fs/ntfs3: Refactor ntfs_readlink_hlp
fs/ntfs3: Refactor ntfs_create_inode
fs/ntfs3: Refactor ni_parse_reparse
fs/ntfs3: Refactor ntfs_read_mft
fs/ntfs3: Check for NULL pointers in ni_try_remove_attr_list
fs/ntfs3: Add MAINTAINERS
Nathan Chancellor (1):
fs/ntfs3: Remove unused variable cnt in ntfs_security_init()
Yin Xiujiang (2):
fs/ntfs3: Fix the issue from backport 5.15 to 5.10
fs/ntfs3: Add ntfs3 module in openeuler_defconfig
Documentation/filesystems/index.rst | 1 +
Documentation/filesystems/ntfs3.rst | 115 +
MAINTAINERS | 9 +
arch/arm64/configs/openeuler_defconfig | 4 +
arch/x86/configs/openeuler_defconfig | 4 +
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/ntfs3/Kconfig | 46 +
fs/ntfs3/Makefile | 36 +
fs/ntfs3/attrib.c | 2083 ++++++++++
fs/ntfs3/attrlist.c | 457 +++
fs/ntfs3/bitfunc.c | 128 +
fs/ntfs3/bitmap.c | 1491 +++++++
fs/ntfs3/debug.h | 55 +
fs/ntfs3/dir.c | 593 +++
fs/ntfs3/file.c | 1254 ++++++
fs/ntfs3/frecord.c | 3284 +++++++++++++++
fs/ntfs3/fslog.c | 5213 ++++++++++++++++++++++++
fs/ntfs3/fsntfs.c | 2506 ++++++++++++
fs/ntfs3/index.c | 2584 ++++++++++++
fs/ntfs3/inode.c | 1960 +++++++++
fs/ntfs3/lib/decompress_common.c | 319 ++
fs/ntfs3/lib/decompress_common.h | 343 ++
fs/ntfs3/lib/lib.h | 32 +
fs/ntfs3/lib/lzx_decompress.c | 670 +++
fs/ntfs3/lib/xpress_decompress.c | 142 +
fs/ntfs3/lznt.c | 453 ++
fs/ntfs3/namei.c | 387 ++
fs/ntfs3/ntfs.h | 1224 ++++++
fs/ntfs3/ntfs_fs.h | 1138 ++++++
fs/ntfs3/record.c | 602 +++
fs/ntfs3/run.c | 1111 +++++
fs/ntfs3/super.c | 1507 +++++++
fs/ntfs3/upcase.c | 104 +
fs/ntfs3/xattr.c | 991 +++++
35 files changed, 30848 insertions(+)
create mode 100644 Documentation/filesystems/ntfs3.rst
create mode 100644 fs/ntfs3/Kconfig
create mode 100644 fs/ntfs3/Makefile
create mode 100644 fs/ntfs3/attrib.c
create mode 100644 fs/ntfs3/attrlist.c
create mode 100644 fs/ntfs3/bitfunc.c
create mode 100644 fs/ntfs3/bitmap.c
create mode 100644 fs/ntfs3/debug.h
create mode 100644 fs/ntfs3/dir.c
create mode 100644 fs/ntfs3/file.c
create mode 100644 fs/ntfs3/frecord.c
create mode 100644 fs/ntfs3/fslog.c
create mode 100644 fs/ntfs3/fsntfs.c
create mode 100644 fs/ntfs3/index.c
create mode 100644 fs/ntfs3/inode.c
create mode 100644 fs/ntfs3/lib/decompress_common.c
create mode 100644 fs/ntfs3/lib/decompress_common.h
create mode 100644 fs/ntfs3/lib/lib.h
create mode 100644 fs/ntfs3/lib/lzx_decompress.c
create mode 100644 fs/ntfs3/lib/xpress_decompress.c
create mode 100644 fs/ntfs3/lznt.c
create mode 100644 fs/ntfs3/namei.c
create mode 100644 fs/ntfs3/ntfs.h
create mode 100644 fs/ntfs3/ntfs_fs.h
create mode 100644 fs/ntfs3/record.c
create mode 100644 fs/ntfs3/run.c
create mode 100644 fs/ntfs3/super.c
create mode 100644 fs/ntfs3/upcase.c
create mode 100644 fs/ntfs3/xattr.c
--
2.30.0
1
34
From: Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
mainline inclusion
from mainline-v5.15
commit 4534a70b7056fd4b9a1c6db5a4ce3c98546b291e
category: feature
bugzilla:
https://gitee.com/openeuler/kernel/issues/I4G67J?from=project-issue
CVE: NA
----------------------------------------------------------------------
This adds headers and misc files
Signed-off-by: Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
Signed-off-by: Yin Xiujiang <yinxiujiang(a)kylinos.cn>
---
fs/ntfs3/debug.h | 64 +++
fs/ntfs3/ntfs.h | 1238 ++++++++++++++++++++++++++++++++++++++++++++
fs/ntfs3/ntfs_fs.h | 1092 ++++++++++++++++++++++++++++++++++++++
fs/ntfs3/upcase.c | 105 ++++
4 files changed, 2499 insertions(+)
create mode 100644 fs/ntfs3/debug.h
create mode 100644 fs/ntfs3/ntfs.h
create mode 100644 fs/ntfs3/ntfs_fs.h
create mode 100644 fs/ntfs3/upcase.c
diff --git a/fs/ntfs3/debug.h b/fs/ntfs3/debug.h
new file mode 100644
index 000000000000..dfaa4c79dc6d
--- /dev/null
+++ b/fs/ntfs3/debug.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ *
+ * Copyright (C) 2019-2021 Paragon Software GmbH, All rights reserved.
+ *
+ * useful functions for debuging
+ */
+
+// clang-format off
+#ifndef Add2Ptr
+#define Add2Ptr(P, I) ((void *)((u8 *)(P) + (I)))
+#define PtrOffset(B, O) ((size_t)((size_t)(O) - (size_t)(B)))
+#endif
+
+#define QuadAlign(n) (((n) + 7u) & (~7u))
+#define IsQuadAligned(n) (!((size_t)(n)&7u))
+#define Quad2Align(n) (((n) + 15u) & (~15u))
+#define IsQuad2Aligned(n) (!((size_t)(n)&15u))
+#define Quad4Align(n) (((n) + 31u) & (~31u))
+#define IsSizeTAligned(n) (!((size_t)(n) & (sizeof(size_t) - 1)))
+#define DwordAlign(n) (((n) + 3u) & (~3u))
+#define IsDwordAligned(n) (!((size_t)(n)&3u))
+#define WordAlign(n) (((n) + 1u) & (~1u))
+#define IsWordAligned(n) (!((size_t)(n)&1u))
+
+#ifdef CONFIG_PRINTK
+__printf(2, 3)
+void ntfs_printk(const struct super_block *sb, const char *fmt, ...);
+__printf(2, 3)
+void ntfs_inode_printk(struct inode *inode, const char *fmt, ...);
+#else
+static inline __printf(2, 3)
+void ntfs_printk(const struct super_block *sb, const char *fmt, ...)
+{
+}
+
+static inline __printf(2, 3)
+void ntfs_inode_printk(struct inode *inode, const char *fmt, ...)
+{
+}
+#endif
+
+/*
+ * Logging macros ( thanks Joe Perches <joe(a)perches.com> for implementation )
+ */
+
+#define ntfs_err(sb, fmt, ...) ntfs_printk(sb, KERN_ERR fmt, ##__VA_ARGS__)
+#define ntfs_warn(sb, fmt, ...) ntfs_printk(sb, KERN_WARNING fmt, ##__VA_ARGS__)
+#define ntfs_info(sb, fmt, ...) ntfs_printk(sb, KERN_INFO fmt, ##__VA_ARGS__)
+#define ntfs_notice(sb, fmt, ...) \
+ ntfs_printk(sb, KERN_NOTICE fmt, ##__VA_ARGS__)
+
+#define ntfs_inode_err(inode, fmt, ...) \
+ ntfs_inode_printk(inode, KERN_ERR fmt, ##__VA_ARGS__)
+#define ntfs_inode_warn(inode, fmt, ...) \
+ ntfs_inode_printk(inode, KERN_WARNING fmt, ##__VA_ARGS__)
+
+#define ntfs_malloc(s) kmalloc(s, GFP_NOFS)
+#define ntfs_zalloc(s) kzalloc(s, GFP_NOFS)
+#define ntfs_vmalloc(s) kvmalloc(s, GFP_KERNEL)
+#define ntfs_free(p) kfree(p)
+#define ntfs_vfree(p) kvfree(p)
+#define ntfs_memdup(src, len) kmemdup(src, len, GFP_NOFS)
+// clang-format on
diff --git a/fs/ntfs3/ntfs.h b/fs/ntfs3/ntfs.h
new file mode 100644
index 000000000000..40398e6c39c9
--- /dev/null
+++ b/fs/ntfs3/ntfs.h
@@ -0,0 +1,1238 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ *
+ * Copyright (C) 2019-2021 Paragon Software GmbH, All rights reserved.
+ *
+ * on-disk ntfs structs
+ */
+
+// clang-format off
+
+/* TODO:
+ * - Check 4K mft record and 512 bytes cluster
+ */
+
+/*
+ * Activate this define to use binary search in indexes
+ */
+#define NTFS3_INDEX_BINARY_SEARCH
+
+/*
+ * Check each run for marked clusters
+ */
+#define NTFS3_CHECK_FREE_CLST
+
+#define NTFS_NAME_LEN 255
+
+/*
+ * ntfs.sys used 500 maximum links
+ * on-disk struct allows up to 0xffff
+ */
+#define NTFS_LINK_MAX 0x400
+//#define NTFS_LINK_MAX 0xffff
+
+/*
+ * Activate to use 64 bit clusters instead of 32 bits in ntfs.sys
+ * Logical and virtual cluster number
+ * If needed, may be redefined to use 64 bit value
+ */
+//#define CONFIG_NTFS3_64BIT_CLUSTER
+
+#define NTFS_LZNT_MAX_CLUSTER 4096
+#define NTFS_LZNT_CUNIT 4
+#define NTFS_LZNT_CLUSTERS (1u<<NTFS_LZNT_CUNIT)
+
+struct GUID {
+ __le32 Data1;
+ __le16 Data2;
+ __le16 Data3;
+ u8 Data4[8];
+};
+
+/*
+ * this struct repeats layout of ATTR_FILE_NAME
+ * at offset 0x40
+ * it used to store global constants NAME_MFT/NAME_MIRROR...
+ * most constant names are shorter than 10
+ */
+struct cpu_str {
+ u8 len;
+ u8 unused;
+ u16 name[10];
+};
+
+struct le_str {
+ u8 len;
+ u8 unused;
+ __le16 name[];
+};
+
+static_assert(SECTOR_SHIFT == 9);
+
+#ifdef CONFIG_NTFS3_64BIT_CLUSTER
+typedef u64 CLST;
+static_assert(sizeof(size_t) == 8);
+#else
+typedef u32 CLST;
+#endif
+
+#define SPARSE_LCN64 ((u64)-1)
+#define SPARSE_LCN ((CLST)-1)
+#define RESIDENT_LCN ((CLST)-2)
+#define COMPRESSED_LCN ((CLST)-3)
+
+#define COMPRESSION_UNIT 4
+#define COMPRESS_MAX_CLUSTER 0x1000
+#define MFT_INCREASE_CHUNK 1024
+
+enum RECORD_NUM {
+ MFT_REC_MFT = 0,
+ MFT_REC_MIRR = 1,
+ MFT_REC_LOG = 2,
+ MFT_REC_VOL = 3,
+ MFT_REC_ATTR = 4,
+ MFT_REC_ROOT = 5,
+ MFT_REC_BITMAP = 6,
+ MFT_REC_BOOT = 7,
+ MFT_REC_BADCLUST = 8,
+ //MFT_REC_QUOTA = 9,
+ MFT_REC_SECURE = 9, // NTFS 3.0
+ MFT_REC_UPCASE = 10,
+ MFT_REC_EXTEND = 11, // NTFS 3.0
+ MFT_REC_RESERVED = 11,
+ MFT_REC_FREE = 16,
+ MFT_REC_USER = 24,
+};
+
+enum ATTR_TYPE {
+ ATTR_ZERO = cpu_to_le32(0x00),
+ ATTR_STD = cpu_to_le32(0x10),
+ ATTR_LIST = cpu_to_le32(0x20),
+ ATTR_NAME = cpu_to_le32(0x30),
+ // ATTR_VOLUME_VERSION on Nt4
+ ATTR_ID = cpu_to_le32(0x40),
+ ATTR_SECURE = cpu_to_le32(0x50),
+ ATTR_LABEL = cpu_to_le32(0x60),
+ ATTR_VOL_INFO = cpu_to_le32(0x70),
+ ATTR_DATA = cpu_to_le32(0x80),
+ ATTR_ROOT = cpu_to_le32(0x90),
+ ATTR_ALLOC = cpu_to_le32(0xA0),
+ ATTR_BITMAP = cpu_to_le32(0xB0),
+ // ATTR_SYMLINK on Nt4
+ ATTR_REPARSE = cpu_to_le32(0xC0),
+ ATTR_EA_INFO = cpu_to_le32(0xD0),
+ ATTR_EA = cpu_to_le32(0xE0),
+ ATTR_PROPERTYSET = cpu_to_le32(0xF0),
+ ATTR_LOGGED_UTILITY_STREAM = cpu_to_le32(0x100),
+ ATTR_END = cpu_to_le32(0xFFFFFFFF)
+};
+
+static_assert(sizeof(enum ATTR_TYPE) == 4);
+
+enum FILE_ATTRIBUTE {
+ FILE_ATTRIBUTE_READONLY = cpu_to_le32(0x00000001),
+ FILE_ATTRIBUTE_HIDDEN = cpu_to_le32(0x00000002),
+ FILE_ATTRIBUTE_SYSTEM = cpu_to_le32(0x00000004),
+ FILE_ATTRIBUTE_ARCHIVE = cpu_to_le32(0x00000020),
+ FILE_ATTRIBUTE_DEVICE = cpu_to_le32(0x00000040),
+ FILE_ATTRIBUTE_TEMPORARY = cpu_to_le32(0x00000100),
+ FILE_ATTRIBUTE_SPARSE_FILE = cpu_to_le32(0x00000200),
+ FILE_ATTRIBUTE_REPARSE_POINT = cpu_to_le32(0x00000400),
+ FILE_ATTRIBUTE_COMPRESSED = cpu_to_le32(0x00000800),
+ FILE_ATTRIBUTE_OFFLINE = cpu_to_le32(0x00001000),
+ FILE_ATTRIBUTE_NOT_CONTENT_INDEXED = cpu_to_le32(0x00002000),
+ FILE_ATTRIBUTE_ENCRYPTED = cpu_to_le32(0x00004000),
+ FILE_ATTRIBUTE_VALID_FLAGS = cpu_to_le32(0x00007fb7),
+ FILE_ATTRIBUTE_DIRECTORY = cpu_to_le32(0x10000000),
+};
+
+static_assert(sizeof(enum FILE_ATTRIBUTE) == 4);
+
+extern const struct cpu_str NAME_MFT;
+extern const struct cpu_str NAME_MIRROR;
+extern const struct cpu_str NAME_LOGFILE;
+extern const struct cpu_str NAME_VOLUME;
+extern const struct cpu_str NAME_ATTRDEF;
+extern const struct cpu_str NAME_ROOT;
+extern const struct cpu_str NAME_BITMAP;
+extern const struct cpu_str NAME_BOOT;
+extern const struct cpu_str NAME_BADCLUS;
+extern const struct cpu_str NAME_QUOTA;
+extern const struct cpu_str NAME_SECURE;
+extern const struct cpu_str NAME_UPCASE;
+extern const struct cpu_str NAME_EXTEND;
+extern const struct cpu_str NAME_OBJID;
+extern const struct cpu_str NAME_REPARSE;
+extern const struct cpu_str NAME_USNJRNL;
+
+extern const __le16 I30_NAME[4];
+extern const __le16 SII_NAME[4];
+extern const __le16 SDH_NAME[4];
+extern const __le16 SO_NAME[2];
+extern const __le16 SQ_NAME[2];
+extern const __le16 SR_NAME[2];
+
+extern const __le16 BAD_NAME[4];
+extern const __le16 SDS_NAME[4];
+extern const __le16 WOF_NAME[17]; /* WofCompressedData */
+
+/* MFT record number structure */
+struct MFT_REF {
+ __le32 low; // The low part of the number
+ __le16 high; // The high part of the number
+ __le16 seq; // The sequence number of MFT record
+};
+
+static_assert(sizeof(__le64) == sizeof(struct MFT_REF));
+
+static inline CLST ino_get(const struct MFT_REF *ref)
+{
+#ifdef CONFIG_NTFS3_64BIT_CLUSTER
+ return le32_to_cpu(ref->low) | ((u64)le16_to_cpu(ref->high) << 32);
+#else
+ return le32_to_cpu(ref->low);
+#endif
+}
+
+struct NTFS_BOOT {
+ u8 jump_code[3]; // 0x00: Jump to boot code
+ u8 system_id[8]; // 0x03: System ID, equals "NTFS "
+
+ // NOTE: this member is not aligned(!)
+ // bytes_per_sector[0] must be 0
+ // bytes_per_sector[1] must be multiplied by 256
+ u8 bytes_per_sector[2]; // 0x0B: Bytes per sector
+
+ u8 sectors_per_clusters;// 0x0D: Sectors per cluster
+ u8 unused1[7];
+ u8 media_type; // 0x15: Media type (0xF8 - harddisk)
+ u8 unused2[2];
+ __le16 sct_per_track; // 0x18: number of sectors per track
+ __le16 heads; // 0x1A: number of heads per cylinder
+ __le32 hidden_sectors; // 0x1C: number of 'hidden' sectors
+ u8 unused3[4];
+ u8 bios_drive_num; // 0x24: BIOS drive number =0x80
+ u8 unused4;
+ u8 signature_ex; // 0x26: Extended BOOT signature =0x80
+ u8 unused5;
+ __le64 sectors_per_volume;// 0x28: size of volume in sectors
+ __le64 mft_clst; // 0x30: first cluster of $MFT
+ __le64 mft2_clst; // 0x38: first cluster of $MFTMirr
+ s8 record_size; // 0x40: size of MFT record in clusters(sectors)
+ u8 unused6[3];
+ s8 index_size; // 0x44: size of INDX record in clusters(sectors)
+ u8 unused7[3];
+ __le64 serial_num; // 0x48: Volume serial number
+ __le32 check_sum; // 0x50: Simple additive checksum of all
+ // of the u32's which precede the 'check_sum'
+
+ u8 boot_code[0x200 - 0x50 - 2 - 4]; // 0x54:
+ u8 boot_magic[2]; // 0x1FE: Boot signature =0x55 + 0xAA
+};
+
+static_assert(sizeof(struct NTFS_BOOT) == 0x200);
+
+enum NTFS_SIGNATURE {
+ NTFS_FILE_SIGNATURE = cpu_to_le32(0x454C4946), // 'FILE'
+ NTFS_INDX_SIGNATURE = cpu_to_le32(0x58444E49), // 'INDX'
+ NTFS_CHKD_SIGNATURE = cpu_to_le32(0x444B4843), // 'CHKD'
+ NTFS_RSTR_SIGNATURE = cpu_to_le32(0x52545352), // 'RSTR'
+ NTFS_RCRD_SIGNATURE = cpu_to_le32(0x44524352), // 'RCRD'
+ NTFS_BAAD_SIGNATURE = cpu_to_le32(0x44414142), // 'BAAD'
+ NTFS_HOLE_SIGNATURE = cpu_to_le32(0x454C4F48), // 'HOLE'
+ NTFS_FFFF_SIGNATURE = cpu_to_le32(0xffffffff),
+};
+
+static_assert(sizeof(enum NTFS_SIGNATURE) == 4);
+
+/* MFT Record header structure */
+struct NTFS_RECORD_HEADER {
+ /* Record magic number, equals 'FILE'/'INDX'/'RSTR'/'RCRD' */
+ enum NTFS_SIGNATURE sign; // 0x00:
+ __le16 fix_off; // 0x04:
+ __le16 fix_num; // 0x06:
+ __le64 lsn; // 0x08: Log file sequence number
+};
+
+static_assert(sizeof(struct NTFS_RECORD_HEADER) == 0x10);
+
+static inline int is_baad(const struct NTFS_RECORD_HEADER *hdr)
+{
+ return hdr->sign == NTFS_BAAD_SIGNATURE;
+}
+
+/* Possible bits in struct MFT_REC.flags */
+enum RECORD_FLAG {
+ RECORD_FLAG_IN_USE = cpu_to_le16(0x0001),
+ RECORD_FLAG_DIR = cpu_to_le16(0x0002),
+ RECORD_FLAG_SYSTEM = cpu_to_le16(0x0004),
+ RECORD_FLAG_UNKNOWN = cpu_to_le16(0x0008),
+};
+
+/* MFT Record structure */
+struct MFT_REC {
+ struct NTFS_RECORD_HEADER rhdr; // 'FILE'
+
+ __le16 seq; // 0x10: Sequence number for this record
+ __le16 hard_links; // 0x12: The number of hard links to record
+ __le16 attr_off; // 0x14: Offset to attributes
+ __le16 flags; // 0x16: See RECORD_FLAG
+ __le32 used; // 0x18: The size of used part
+ __le32 total; // 0x1C: Total record size
+
+ struct MFT_REF parent_ref; // 0x20: Parent MFT record
+ __le16 next_attr_id; // 0x28: The next attribute Id
+
+ __le16 res; // 0x2A: High part of mft record?
+ __le32 mft_record; // 0x2C: Current mft record number
+ __le16 fixups[]; // 0x30:
+};
+
+#define MFTRECORD_FIXUP_OFFSET_1 offsetof(struct MFT_REC, res)
+#define MFTRECORD_FIXUP_OFFSET_3 offsetof(struct MFT_REC, fixups)
+
+static_assert(MFTRECORD_FIXUP_OFFSET_1 == 0x2A);
+static_assert(MFTRECORD_FIXUP_OFFSET_3 == 0x30);
+
+static inline bool is_rec_base(const struct MFT_REC *rec)
+{
+ const struct MFT_REF *r = &rec->parent_ref;
+
+ return !r->low && !r->high && !r->seq;
+}
+
+static inline bool is_mft_rec5(const struct MFT_REC *rec)
+{
+ return le16_to_cpu(rec->rhdr.fix_off) >=
+ offsetof(struct MFT_REC, fixups);
+}
+
+static inline bool is_rec_inuse(const struct MFT_REC *rec)
+{
+ return rec->flags & RECORD_FLAG_IN_USE;
+}
+
+static inline bool clear_rec_inuse(struct MFT_REC *rec)
+{
+ return rec->flags &= ~RECORD_FLAG_IN_USE;
+}
+
+/* Possible values of ATTR_RESIDENT.flags */
+#define RESIDENT_FLAG_INDEXED 0x01
+
+struct ATTR_RESIDENT {
+ __le32 data_size; // 0x10: The size of data
+ __le16 data_off; // 0x14: Offset to data
+ u8 flags; // 0x16: resident flags ( 1 - indexed )
+ u8 res; // 0x17:
+}; // sizeof() = 0x18
+
+struct ATTR_NONRESIDENT {
+ __le64 svcn; // 0x10: Starting VCN of this segment
+ __le64 evcn; // 0x18: End VCN of this segment
+ __le16 run_off; // 0x20: Offset to packed runs
+ // Unit of Compression size for this stream, expressed
+ // as a log of the cluster size.
+ //
+ // 0 means file is not compressed
+ // 1, 2, 3, and 4 are potentially legal values if the
+ // stream is compressed, however the implementation
+ // may only choose to use 4, or possibly 3. Note
+ // that 4 means cluster size time 16. If convenient
+ // the implementation may wish to accept a
+ // reasonable range of legal values here (1-5?),
+ // even if the implementation only generates
+ // a smaller set of values itself.
+ u8 c_unit; // 0x22
+ u8 res1[5]; // 0x23:
+ __le64 alloc_size; // 0x28: The allocated size of attribute in bytes
+ // (multiple of cluster size)
+ __le64 data_size; // 0x30: The size of attribute in bytes <= alloc_size
+ __le64 valid_size; // 0x38: The size of valid part in bytes <= data_size
+ __le64 total_size; // 0x40: The sum of the allocated clusters for a file
+ // (present only for the first segment (0 == vcn)
+ // of compressed attribute)
+
+}; // sizeof()=0x40 or 0x48 (if compressed)
+
+/* Possible values of ATTRIB.flags: */
+#define ATTR_FLAG_COMPRESSED cpu_to_le16(0x0001)
+#define ATTR_FLAG_COMPRESSED_MASK cpu_to_le16(0x00FF)
+#define ATTR_FLAG_ENCRYPTED cpu_to_le16(0x4000)
+#define ATTR_FLAG_SPARSED cpu_to_le16(0x8000)
+
+struct ATTRIB {
+ enum ATTR_TYPE type; // 0x00: The type of this attribute
+ __le32 size; // 0x04: The size of this attribute
+ u8 non_res; // 0x08: Is this attribute non-resident ?
+ u8 name_len; // 0x09: This attribute name length
+ __le16 name_off; // 0x0A: Offset to the attribute name
+ __le16 flags; // 0x0C: See ATTR_FLAG_XXX
+ __le16 id; // 0x0E: unique id (per record)
+
+ union {
+ struct ATTR_RESIDENT res; // 0x10
+ struct ATTR_NONRESIDENT nres; // 0x10
+ };
+};
+
+/* Define attribute sizes */
+#define SIZEOF_RESIDENT 0x18
+#define SIZEOF_NONRESIDENT_EX 0x48
+#define SIZEOF_NONRESIDENT 0x40
+
+#define SIZEOF_RESIDENT_LE cpu_to_le16(0x18)
+#define SIZEOF_NONRESIDENT_EX_LE cpu_to_le16(0x48)
+#define SIZEOF_NONRESIDENT_LE cpu_to_le16(0x40)
+
+static inline u64 attr_ondisk_size(const struct ATTRIB *attr)
+{
+ return attr->non_res ? ((attr->flags &
+ (ATTR_FLAG_COMPRESSED | ATTR_FLAG_SPARSED)) ?
+ le64_to_cpu(attr->nres.total_size) :
+ le64_to_cpu(attr->nres.alloc_size)) :
+ QuadAlign(le32_to_cpu(attr->res.data_size));
+}
+
+static inline u64 attr_size(const struct ATTRIB *attr)
+{
+ return attr->non_res ? le64_to_cpu(attr->nres.data_size) :
+ le32_to_cpu(attr->res.data_size);
+}
+
+static inline bool is_attr_encrypted(const struct ATTRIB *attr)
+{
+ return attr->flags & ATTR_FLAG_ENCRYPTED;
+}
+
+static inline bool is_attr_sparsed(const struct ATTRIB *attr)
+{
+ return attr->flags & ATTR_FLAG_SPARSED;
+}
+
+static inline bool is_attr_compressed(const struct ATTRIB *attr)
+{
+ return attr->flags & ATTR_FLAG_COMPRESSED;
+}
+
+static inline bool is_attr_ext(const struct ATTRIB *attr)
+{
+ return attr->flags & (ATTR_FLAG_SPARSED | ATTR_FLAG_COMPRESSED);
+}
+
+static inline bool is_attr_indexed(const struct ATTRIB *attr)
+{
+ return !attr->non_res && (attr->res.flags & RESIDENT_FLAG_INDEXED);
+}
+
+static inline __le16 const *attr_name(const struct ATTRIB *attr)
+{
+ return Add2Ptr(attr, le16_to_cpu(attr->name_off));
+}
+
+static inline u64 attr_svcn(const struct ATTRIB *attr)
+{
+ return attr->non_res ? le64_to_cpu(attr->nres.svcn) : 0;
+}
+
+/* the size of resident attribute by its resident size */
+#define BYTES_PER_RESIDENT(b) (0x18 + (b))
+
+static_assert(sizeof(struct ATTRIB) == 0x48);
+static_assert(sizeof(((struct ATTRIB *)NULL)->res) == 0x08);
+static_assert(sizeof(((struct ATTRIB *)NULL)->nres) == 0x38);
+
+static inline void *resident_data_ex(const struct ATTRIB *attr, u32 datasize)
+{
+ u32 asize, rsize;
+ u16 off;
+
+ if (attr->non_res)
+ return NULL;
+
+ asize = le32_to_cpu(attr->size);
+ off = le16_to_cpu(attr->res.data_off);
+
+ if (asize < datasize + off)
+ return NULL;
+
+ rsize = le32_to_cpu(attr->res.data_size);
+ if (rsize < datasize)
+ return NULL;
+
+ return Add2Ptr(attr, off);
+}
+
+static inline void *resident_data(const struct ATTRIB *attr)
+{
+ return Add2Ptr(attr, le16_to_cpu(attr->res.data_off));
+}
+
+static inline void *attr_run(const struct ATTRIB *attr)
+{
+ return Add2Ptr(attr, le16_to_cpu(attr->nres.run_off));
+}
+
+/* Standard information attribute (0x10) */
+struct ATTR_STD_INFO {
+ __le64 cr_time; // 0x00: File creation time
+ __le64 m_time; // 0x08: File modification time
+ __le64 c_time; // 0x10: Last time any attribute was modified
+ __le64 a_time; // 0x18: File last access time
+ enum FILE_ATTRIBUTE fa; // 0x20: Standard DOS attributes & more
+ __le32 max_ver_num; // 0x24: Maximum Number of Versions
+ __le32 ver_num; // 0x28: Version Number
+ __le32 class_id; // 0x2C: Class Id from bidirectional Class Id index
+};
+
+static_assert(sizeof(struct ATTR_STD_INFO) == 0x30);
+
+#define SECURITY_ID_INVALID 0x00000000
+#define SECURITY_ID_FIRST 0x00000100
+
+struct ATTR_STD_INFO5 {
+ __le64 cr_time; // 0x00: File creation time
+ __le64 m_time; // 0x08: File modification time
+ __le64 c_time; // 0x10: Last time any attribute was modified
+ __le64 a_time; // 0x18: File last access time
+ enum FILE_ATTRIBUTE fa; // 0x20: Standard DOS attributes & more
+ __le32 max_ver_num; // 0x24: Maximum Number of Versions
+ __le32 ver_num; // 0x28: Version Number
+ __le32 class_id; // 0x2C: Class Id from bidirectional Class Id index
+
+ __le32 owner_id; // 0x30: Owner Id of the user owning the file.
+ __le32 security_id; // 0x34: The Security Id is a key in the $SII Index and $SDS
+ __le64 quota_charge; // 0x38:
+ __le64 usn; // 0x40: Last Update Sequence Number of the file. This is a direct
+ // index into the file $UsnJrnl. If zero, the USN Journal is
+ // disabled.
+};
+
+static_assert(sizeof(struct ATTR_STD_INFO5) == 0x48);
+
+/* attribute list entry structure (0x20) */
+struct ATTR_LIST_ENTRY {
+ enum ATTR_TYPE type; // 0x00: The type of attribute
+ __le16 size; // 0x04: The size of this record
+ u8 name_len; // 0x06: The length of attribute name
+ u8 name_off; // 0x07: The offset to attribute name
+ __le64 vcn; // 0x08: Starting VCN of this attribute
+ struct MFT_REF ref; // 0x10: MFT record number with attribute
+ __le16 id; // 0x18: struct ATTRIB ID
+ __le16 name[3]; // 0x1A: Just to align. To get real name can use bNameOffset
+
+}; // sizeof(0x20)
+
+static_assert(sizeof(struct ATTR_LIST_ENTRY) == 0x20);
+
+static inline u32 le_size(u8 name_len)
+{
+ return QuadAlign(offsetof(struct ATTR_LIST_ENTRY, name) +
+ name_len * sizeof(short));
+}
+
+/* returns 0 if 'attr' has the same type and name */
+static inline int le_cmp(const struct ATTR_LIST_ENTRY *le,
+ const struct ATTRIB *attr)
+{
+ return le->type != attr->type || le->name_len != attr->name_len ||
+ (le->name_len &&
+ memcmp(Add2Ptr(le, le->name_off),
+ Add2Ptr(attr, le16_to_cpu(attr->name_off)),
+ le->name_len * sizeof(short)));
+}
+
+static inline __le16 const *le_name(const struct ATTR_LIST_ENTRY *le)
+{
+ return Add2Ptr(le, le->name_off);
+}
+
+/* File name types (the field type in struct ATTR_FILE_NAME ) */
+#define FILE_NAME_POSIX 0
+#define FILE_NAME_UNICODE 1
+#define FILE_NAME_DOS 2
+#define FILE_NAME_UNICODE_AND_DOS (FILE_NAME_DOS | FILE_NAME_UNICODE)
+
+/* Filename attribute structure (0x30) */
+struct NTFS_DUP_INFO {
+ __le64 cr_time; // 0x00: File creation time
+ __le64 m_time; // 0x08: File modification time
+ __le64 c_time; // 0x10: Last time any attribute was modified
+ __le64 a_time; // 0x18: File last access time
+ __le64 alloc_size; // 0x20: Data attribute allocated size, multiple of cluster size
+ __le64 data_size; // 0x28: Data attribute size <= alloc_size
+ enum FILE_ATTRIBUTE fa; // 0x30: Standard DOS attributes & more
+ __le16 ea_size; // 0x34: Packed EAs
+ __le16 reparse; // 0x36: Used by Reparse
+
+}; // 0x38
+
+struct ATTR_FILE_NAME {
+ struct MFT_REF home; // 0x00: MFT record for directory
+ struct NTFS_DUP_INFO dup;// 0x08
+ u8 name_len; // 0x40: File name length in words
+ u8 type; // 0x41: File name type
+ __le16 name[]; // 0x42: File name
+};
+
+static_assert(sizeof(((struct ATTR_FILE_NAME *)NULL)->dup) == 0x38);
+static_assert(offsetof(struct ATTR_FILE_NAME, name) == 0x42);
+#define SIZEOF_ATTRIBUTE_FILENAME 0x44
+#define SIZEOF_ATTRIBUTE_FILENAME_MAX (0x42 + 255 * 2)
+
+static inline struct ATTRIB *attr_from_name(struct ATTR_FILE_NAME *fname)
+{
+ return (struct ATTRIB *)((char *)fname - SIZEOF_RESIDENT);
+}
+
+static inline u16 fname_full_size(const struct ATTR_FILE_NAME *fname)
+{
+ // don't return struct_size(fname, name, fname->name_len);
+ return offsetof(struct ATTR_FILE_NAME, name) +
+ fname->name_len * sizeof(short);
+}
+
+static inline u8 paired_name(u8 type)
+{
+ if (type == FILE_NAME_UNICODE)
+ return FILE_NAME_DOS;
+ if (type == FILE_NAME_DOS)
+ return FILE_NAME_UNICODE;
+ return FILE_NAME_POSIX;
+}
+
+/* Index entry defines ( the field flags in NtfsDirEntry ) */
+#define NTFS_IE_HAS_SUBNODES cpu_to_le16(1)
+#define NTFS_IE_LAST cpu_to_le16(2)
+
+/* Directory entry structure */
+struct NTFS_DE {
+ union {
+ struct MFT_REF ref; // 0x00: MFT record number with this file
+ struct {
+ __le16 data_off; // 0x00:
+ __le16 data_size; // 0x02:
+ __le32 res; // 0x04: must be 0
+ } view;
+ };
+ __le16 size; // 0x08: The size of this entry
+ __le16 key_size; // 0x0A: The size of the key: file name length in bytes + 0x42
+ __le16 flags; // 0x0C: Entry flags: NTFS_IE_XXX
+ __le16 res; // 0x0E:
+
+ // Here any indexed attribute can be placed
+ // One of them is:
+ // struct ATTR_FILE_NAME AttrFileName;
+ //
+
+ // The last 8 bytes of this structure contains
+ // the VBN of subnode
+ // !!! Note !!!
+ // This field is presented only if (flags & NTFS_IE_HAS_SUBNODES)
+ // __le64 vbn;
+};
+
+static_assert(sizeof(struct NTFS_DE) == 0x10);
+
+static inline void de_set_vbn_le(struct NTFS_DE *e, __le64 vcn)
+{
+ __le64 *v = Add2Ptr(e, le16_to_cpu(e->size) - sizeof(__le64));
+
+ *v = vcn;
+}
+
+static inline void de_set_vbn(struct NTFS_DE *e, CLST vcn)
+{
+ __le64 *v = Add2Ptr(e, le16_to_cpu(e->size) - sizeof(__le64));
+
+ *v = cpu_to_le64(vcn);
+}
+
+static inline __le64 de_get_vbn_le(const struct NTFS_DE *e)
+{
+ return *(__le64 *)Add2Ptr(e, le16_to_cpu(e->size) - sizeof(__le64));
+}
+
+static inline CLST de_get_vbn(const struct NTFS_DE *e)
+{
+ __le64 *v = Add2Ptr(e, le16_to_cpu(e->size) - sizeof(__le64));
+
+ return le64_to_cpu(*v);
+}
+
+static inline struct NTFS_DE *de_get_next(const struct NTFS_DE *e)
+{
+ return Add2Ptr(e, le16_to_cpu(e->size));
+}
+
+static inline struct ATTR_FILE_NAME *de_get_fname(const struct NTFS_DE *e)
+{
+ return le16_to_cpu(e->key_size) >= SIZEOF_ATTRIBUTE_FILENAME ?
+ Add2Ptr(e, sizeof(struct NTFS_DE)) :
+ NULL;
+}
+
+static inline bool de_is_last(const struct NTFS_DE *e)
+{
+ return e->flags & NTFS_IE_LAST;
+}
+
+static inline bool de_has_vcn(const struct NTFS_DE *e)
+{
+ return e->flags & NTFS_IE_HAS_SUBNODES;
+}
+
+static inline bool de_has_vcn_ex(const struct NTFS_DE *e)
+{
+ return (e->flags & NTFS_IE_HAS_SUBNODES) &&
+ (u64)(-1) != *((u64 *)Add2Ptr(e, le16_to_cpu(e->size) -
+ sizeof(__le64)));
+}
+
+#define MAX_BYTES_PER_NAME_ENTRY \
+ QuadAlign(sizeof(struct NTFS_DE) + \
+ offsetof(struct ATTR_FILE_NAME, name) + \
+ NTFS_NAME_LEN * sizeof(short))
+
+struct INDEX_HDR {
+ __le32 de_off; // 0x00: The offset from the start of this structure
+ // to the first NTFS_DE
+ __le32 used; // 0x04: The size of this structure plus all
+ // entries (quad-word aligned)
+ __le32 total; // 0x08: The allocated size for this structure plus all entries
+ u8 flags; // 0x0C: 0x00 = Small directory, 0x01 = Large directory
+ u8 res[3];
+
+ //
+ // de_off + used <= total
+ //
+};
+
+static_assert(sizeof(struct INDEX_HDR) == 0x10);
+
+static inline struct NTFS_DE *hdr_first_de(const struct INDEX_HDR *hdr)
+{
+ u32 de_off = le32_to_cpu(hdr->de_off);
+ u32 used = le32_to_cpu(hdr->used);
+ struct NTFS_DE *e = Add2Ptr(hdr, de_off);
+ u16 esize;
+
+ if (de_off >= used || de_off >= le32_to_cpu(hdr->total))
+ return NULL;
+
+ esize = le16_to_cpu(e->size);
+ if (esize < sizeof(struct NTFS_DE) || de_off + esize > used)
+ return NULL;
+
+ return e;
+}
+
+static inline struct NTFS_DE *hdr_next_de(const struct INDEX_HDR *hdr,
+ const struct NTFS_DE *e)
+{
+ size_t off = PtrOffset(hdr, e);
+ u32 used = le32_to_cpu(hdr->used);
+ u16 esize;
+
+ if (off >= used)
+ return NULL;
+
+ esize = le16_to_cpu(e->size);
+
+ if (esize < sizeof(struct NTFS_DE) ||
+ off + esize + sizeof(struct NTFS_DE) > used)
+ return NULL;
+
+ return Add2Ptr(e, esize);
+}
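+
+/*
+ * Usage sketch (illustrative, not from the original patch): enumerate
+ * all entries of an index header with the two helpers above; the chain
+ * is terminated by the entry carrying NTFS_IE_LAST.
+ *
+ *   struct NTFS_DE *e;
+ *
+ *   for (e = hdr_first_de(hdr); e; e = hdr_next_de(hdr, e)) {
+ *           if (de_is_last(e))
+ *                   break;
+ *           ... process e ...
+ *   }
+ */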
+
+static inline bool hdr_has_subnode(const struct INDEX_HDR *hdr)
+{
+ return hdr->flags & 1;
+}
+
+struct INDEX_BUFFER {
+ struct NTFS_RECORD_HEADER rhdr; // 'INDX'
+ __le64 vbn; // 0x10: vcn if index >= cluster, or vsn if index < cluster
+ struct INDEX_HDR ihdr; // 0x18:
+};
+
+static_assert(sizeof(struct INDEX_BUFFER) == 0x28);
+
+static inline bool ib_is_empty(const struct INDEX_BUFFER *ib)
+{
+ const struct NTFS_DE *first = hdr_first_de(&ib->ihdr);
+
+ return !first || de_is_last(first);
+}
+
+static inline bool ib_is_leaf(const struct INDEX_BUFFER *ib)
+{
+ return !(ib->ihdr.flags & 1);
+}
+
+/* Index root structure ( 0x90 ) */
+enum COLLATION_RULE {
+ NTFS_COLLATION_TYPE_BINARY = cpu_to_le32(0),
+ // $I30
+ NTFS_COLLATION_TYPE_FILENAME = cpu_to_le32(0x01),
+ // $SII of $Secure and $Q of Quota
+ NTFS_COLLATION_TYPE_UINT = cpu_to_le32(0x10),
+ // $O of Quota
+ NTFS_COLLATION_TYPE_SID = cpu_to_le32(0x11),
+ // $SDH of $Secure
+ NTFS_COLLATION_TYPE_SECURITY_HASH = cpu_to_le32(0x12),
+ // $O of ObjId and "$R" for Reparse
+ NTFS_COLLATION_TYPE_UINTS = cpu_to_le32(0x13)
+};
+
+static_assert(sizeof(enum COLLATION_RULE) == 4);
+
+//
+struct INDEX_ROOT {
+ enum ATTR_TYPE type; // 0x00: The type of attribute to index on
+ enum COLLATION_RULE rule; // 0x04: The rule
+ __le32 index_block_size;// 0x08: The size of index record
+ u8 index_block_clst; // 0x0C: The number of clusters or sectors per index
+ u8 res[3];
+ struct INDEX_HDR ihdr; // 0x10:
+};
+
+static_assert(sizeof(struct INDEX_ROOT) == 0x20);
+static_assert(offsetof(struct INDEX_ROOT, ihdr) == 0x10);
+
+#define VOLUME_FLAG_DIRTY cpu_to_le16(0x0001)
+#define VOLUME_FLAG_RESIZE_LOG_FILE cpu_to_le16(0x0002)
+
+struct VOLUME_INFO {
+ __le64 res1; // 0x00
+ u8 major_ver; // 0x08: NTFS major version number (before .)
+ u8 minor_ver; // 0x09: NTFS minor version number (after .)
+ __le16 flags; // 0x0A: Volume flags, see VOLUME_FLAG_XXX
+
+}; // sizeof=0xC
+
+#define SIZEOF_ATTRIBUTE_VOLUME_INFO 0xc
+
+#define NTFS_LABEL_MAX_LENGTH (0x100 / sizeof(short))
+#define NTFS_ATTR_INDEXABLE cpu_to_le32(0x00000002)
+#define NTFS_ATTR_DUPALLOWED cpu_to_le32(0x00000004)
+#define NTFS_ATTR_MUST_BE_INDEXED cpu_to_le32(0x00000010)
+#define NTFS_ATTR_MUST_BE_NAMED cpu_to_le32(0x00000020)
+#define NTFS_ATTR_MUST_BE_RESIDENT cpu_to_le32(0x00000040)
+#define NTFS_ATTR_LOG_ALWAYS cpu_to_le32(0x00000080)
+
+/* $AttrDef file entry */
+struct ATTR_DEF_ENTRY {
+ __le16 name[0x40]; // 0x00: Attr name
+ enum ATTR_TYPE type; // 0x80: struct ATTRIB type
+ __le32 res; // 0x84:
+ enum COLLATION_RULE rule; // 0x88:
+ __le32 flags; // 0x8C: NTFS_ATTR_XXX (see above)
+ __le64 min_sz; // 0x90: Minimum attribute data size
+ __le64 max_sz; // 0x98: Maximum attribute data size
+};
+
+static_assert(sizeof(struct ATTR_DEF_ENTRY) == 0xa0);
+
+/* Object ID (0x40) */
+struct OBJECT_ID {
+ struct GUID ObjId; // 0x00: Unique Id assigned to file
+ struct GUID BirthVolumeId;// 0x10: Birth Volume Id is the Object Id of the Volume on
+ // which the Object Id was allocated. It never changes
+ struct GUID BirthObjectId; // 0x20: Birth Object Id is the first Object Id that was
+ // ever assigned to this MFT Record. I.e. If the Object Id
+ // is changed for some reason, this field will reflect the
+ // original value of the Object Id.
+ struct GUID DomainId; // 0x30: Domain Id is currently unused but it is intended to be
+ // used in a network environment where the local machine is
+ // part of a Windows 2000 Domain. This may be used in a Windows
+ // 2000 Advanced Server managed domain.
+};
+
+static_assert(sizeof(struct OBJECT_ID) == 0x40);
+
+/* O Directory entry structure ( rule = 0x13 ) */
+struct NTFS_DE_O {
+ struct NTFS_DE de;
+ struct GUID ObjId; // 0x10: Unique Id assigned to file
+ struct MFT_REF ref; // 0x20: MFT record number with this file
+ struct GUID BirthVolumeId; // 0x28: Birth Volume Id is the Object Id of the Volume on
+ // which the Object Id was allocated. It never changes
+ struct GUID BirthObjectId; // 0x38: Birth Object Id is the first Object Id that was
+ // ever assigned to this MFT Record. I.e. If the Object Id
+ // is changed for some reason, this field will reflect the
+ // original value of the Object Id.
+ // This field is valid if data_size == 0x48
+ struct GUID BirthDomainId; // 0x48: Domain Id is currently unused but it is intended
+ // to be used in a network environment where the local
+ // machine is part of a Windows 2000 Domain. This may be
+ // used in a Windows 2000 Advanced Server managed domain.
+};
+
+static_assert(sizeof(struct NTFS_DE_O) == 0x58);
+
+#define NTFS_OBJECT_ENTRY_DATA_SIZE1 \
+ 0x38 // struct NTFS_DE_O.BirthDomainId is not used
+#define NTFS_OBJECT_ENTRY_DATA_SIZE2 \
+ 0x48 // struct NTFS_DE_O.BirthDomainId is used
+
+/* Q Directory entry structure ( rule = 0x11 ) */
+struct NTFS_DE_Q {
+ struct NTFS_DE de;
+ __le32 owner_id; // 0x10: Unique Id assigned to file
+ __le32 Version; // 0x14: 0x02
+ __le32 flags2; // 0x18: Quota flags, see above
+ __le64 BytesUsed; // 0x1C:
+ __le64 ChangeTime; // 0x24:
+ __le64 WarningLimit; // 0x2C:
+ __le64 HardLimit; // 0x34:
+ __le64 ExceededTime; // 0x3C:
+
+ // SID is placed here
+}; // sizeof() = 0x44
+
+#define SIZEOF_NTFS_DE_Q 0x44
+
+#define SecurityDescriptorsBlockSize 0x40000 // 256K
+#define SecurityDescriptorMaxSize 0x20000 // 128K
+#define Log2OfSecurityDescriptorsBlockSize 18
+
+struct SECURITY_KEY {
+ __le32 hash; // Hash value for descriptor
+ __le32 sec_id; // Security Id (guaranteed unique)
+};
+
+/* Security descriptors (the content of $Secure::SDS data stream) */
+struct SECURITY_HDR {
+ struct SECURITY_KEY key; // 0x00: Security Key
+ __le64 off; // 0x08: Offset of this entry in the file
+ __le32 size; // 0x10: Size of this entry, 8 byte aligned
+ //
+ // Security descriptor itself is placed here
+ // Total size is 16 byte aligned
+ //
+} __packed;
+
+#define SIZEOF_SECURITY_HDR 0x14
+
+/* SII Directory entry structure */
+struct NTFS_DE_SII {
+ struct NTFS_DE de;
+ __le32 sec_id; // 0x10: Key: sizeof(security_id) = wKeySize
+ struct SECURITY_HDR sec_hdr; // 0x14:
+} __packed;
+
+#define SIZEOF_SII_DIRENTRY 0x28
+
+/* SDH Directory entry structure */
+struct NTFS_DE_SDH {
+ struct NTFS_DE de;
+ struct SECURITY_KEY key; // 0x10: Key
+ struct SECURITY_HDR sec_hdr; // 0x18: Data
+ __le16 magic[2]; // 0x2C: 0x00490049 "I I"
+};
+
+#define SIZEOF_SDH_DIRENTRY 0x30
+
+struct REPARSE_KEY {
+ __le32 ReparseTag; // 0x00: Reparse Tag
+ struct MFT_REF ref; // 0x04: MFT record number with this file
+}; // sizeof() = 0x0C
+
+static_assert(offsetof(struct REPARSE_KEY, ref) == 0x04);
+#define SIZEOF_REPARSE_KEY 0x0C
+
+/* Reparse Directory entry structure */
+struct NTFS_DE_R {
+ struct NTFS_DE de;
+ struct REPARSE_KEY key; // 0x10: Reparse Key
+ u32 zero; // 0x1c
+}; // sizeof() = 0x20
+
+static_assert(sizeof(struct NTFS_DE_R) == 0x20);
+
+/* CompressReparseBuffer.WofVersion */
+#define WOF_CURRENT_VERSION cpu_to_le32(1)
+/* CompressReparseBuffer.WofProvider */
+#define WOF_PROVIDER_WIM cpu_to_le32(1)
+/* CompressReparseBuffer.WofProvider */
+#define WOF_PROVIDER_SYSTEM cpu_to_le32(2)
+/* CompressReparseBuffer.ProviderVer */
+#define WOF_PROVIDER_CURRENT_VERSION cpu_to_le32(1)
+
+#define WOF_COMPRESSION_XPRESS4K cpu_to_le32(0) // 4k
+#define WOF_COMPRESSION_LZX32K cpu_to_le32(1) // 32k
+#define WOF_COMPRESSION_XPRESS8K cpu_to_le32(2) // 8k
+#define WOF_COMPRESSION_XPRESS16K cpu_to_le32(3) // 16k
+
+/*
+ * ATTR_REPARSE (0xC0)
+ *
+ * The reparse struct GUID structure is used by all 3rd party layered drivers to
+ * store data in a reparse point. For non-Microsoft tags, The struct GUID field
+ * cannot be GUID_NULL.
+ * The constraints on reparse tags are defined below.
+ * Microsoft tags can also be used with this format of the reparse point buffer.
+ */
+struct REPARSE_POINT {
+ __le32 ReparseTag; // 0x00:
+ __le16 ReparseDataLength;// 0x04:
+ __le16 Reserved;
+
+ struct GUID Guid; // 0x08:
+
+ //
+ // Here GenericReparseBuffer is placed
+ //
+};
+
+static_assert(sizeof(struct REPARSE_POINT) == 0x18);
+
+//
+// Maximum allowed size of the reparse data.
+//
+#define MAXIMUM_REPARSE_DATA_BUFFER_SIZE (16 * 1024)
+
+//
+// The value of the following constant needs to satisfy the following
+// conditions:
+// (1) Be at least as large as the largest of the reserved tags.
+// (2) Be strictly smaller than all the tags in use.
+//
+#define IO_REPARSE_TAG_RESERVED_RANGE 1
+
+//
+// The reparse tags are a ULONG. The 32 bits are laid out as follows:
+//
+// 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+// 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+// +-+-+-+-+-----------------------+-------------------------------+
+// |M|R|N|R| Reserved bits | Reparse Tag Value |
+// +-+-+-+-+-----------------------+-------------------------------+
+//
+// M is the Microsoft bit. When set to 1, it denotes a tag owned by Microsoft.
+// All ISVs must use a tag with a 0 in this position.
+// Note: If a Microsoft tag is used by non-Microsoft software, the
+// behavior is not defined.
+//
+// R is reserved. Must be zero for non-Microsoft tags.
+//
+// N is name surrogate. When set to 1, the file represents another named
+// entity in the system.
+//
+// The M and N bits are OR-able.
+// The following macros check for the M and N bit values:
+//
+
+//
+// Macro to determine whether a reparse point tag corresponds to a tag
+// owned by Microsoft.
+//
+#define IsReparseTagMicrosoft(_tag) (((_tag)&IO_REPARSE_TAG_MICROSOFT))
+
+//
+// Macro to determine whether a reparse point tag is a name surrogate
+//
+#define IsReparseTagNameSurrogate(_tag) (((_tag)&IO_REPARSE_TAG_NAME_SURROGATE))
+
+//
+// The following constant represents the bits that are valid to use in
+// reparse tags.
+//
+#define IO_REPARSE_TAG_VALID_VALUES 0xF000FFFF
+
+//
+// Macro to determine whether a reparse tag is a valid tag.
+//
+#define IsReparseTagValid(_tag) \
+ (!((_tag) & ~IO_REPARSE_TAG_VALID_VALUES) && \
+ ((_tag) > IO_REPARSE_TAG_RESERVED_RANGE))
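+
+/*
+ * Worked example (illustrative): IO_REPARSE_TAG_SYMLINK == 0xA000000C
+ * carries both the Microsoft bit (0x80000000) and the name-surrogate
+ * bit (0x20000000), so IsReparseTagMicrosoft() and
+ * IsReparseTagNameSurrogate() are both true for it, and the tag passes
+ * IsReparseTagValid().
+ */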
+
+//
+// Microsoft tags for reparse points.
+//
+
+enum IO_REPARSE_TAG {
+ IO_REPARSE_TAG_SYMBOLIC_LINK = cpu_to_le32(0),
+ IO_REPARSE_TAG_NAME_SURROGATE = cpu_to_le32(0x20000000),
+ IO_REPARSE_TAG_MICROSOFT = cpu_to_le32(0x80000000),
+ IO_REPARSE_TAG_MOUNT_POINT = cpu_to_le32(0xA0000003),
+ IO_REPARSE_TAG_SYMLINK = cpu_to_le32(0xA000000C),
+ IO_REPARSE_TAG_HSM = cpu_to_le32(0xC0000004),
+ IO_REPARSE_TAG_SIS = cpu_to_le32(0x80000007),
+ IO_REPARSE_TAG_DEDUP = cpu_to_le32(0x80000013),
+ IO_REPARSE_TAG_COMPRESS = cpu_to_le32(0x80000017),
+
+ //
+ // The reparse tag 0x80000008 is reserved for Microsoft internal use
+ // (may be published in the future)
+ //
+
+ //
+ // Microsoft reparse tag reserved for DFS
+ //
+ IO_REPARSE_TAG_DFS = cpu_to_le32(0x8000000A),
+
+ //
+ // Microsoft reparse tag reserved for the file system filter manager
+ //
+ IO_REPARSE_TAG_FILTER_MANAGER = cpu_to_le32(0x8000000B),
+
+ //
+ // Non-Microsoft tags for reparse points
+ //
+
+ //
+ // Tag allocated to CONGRUENT, May 2000. Used by IFSTEST
+ //
+ IO_REPARSE_TAG_IFSTEST_CONGRUENT = cpu_to_le32(0x00000009),
+
+ //
+ // Tag allocated to ARKIVIO
+ //
+ IO_REPARSE_TAG_ARKIVIO = cpu_to_le32(0x0000000C),
+
+ //
+ // Tag allocated to SOLUTIONSOFT
+ //
+ IO_REPARSE_TAG_SOLUTIONSOFT = cpu_to_le32(0x2000000D),
+
+ //
+ // Tag allocated to COMMVAULT
+ //
+ IO_REPARSE_TAG_COMMVAULT = cpu_to_le32(0x0000000E),
+
+ // OneDrive??
+ IO_REPARSE_TAG_CLOUD = cpu_to_le32(0x9000001A),
+ IO_REPARSE_TAG_CLOUD_1 = cpu_to_le32(0x9000101A),
+ IO_REPARSE_TAG_CLOUD_2 = cpu_to_le32(0x9000201A),
+ IO_REPARSE_TAG_CLOUD_3 = cpu_to_le32(0x9000301A),
+ IO_REPARSE_TAG_CLOUD_4 = cpu_to_le32(0x9000401A),
+ IO_REPARSE_TAG_CLOUD_5 = cpu_to_le32(0x9000501A),
+ IO_REPARSE_TAG_CLOUD_6 = cpu_to_le32(0x9000601A),
+ IO_REPARSE_TAG_CLOUD_7 = cpu_to_le32(0x9000701A),
+ IO_REPARSE_TAG_CLOUD_8 = cpu_to_le32(0x9000801A),
+ IO_REPARSE_TAG_CLOUD_9 = cpu_to_le32(0x9000901A),
+ IO_REPARSE_TAG_CLOUD_A = cpu_to_le32(0x9000A01A),
+ IO_REPARSE_TAG_CLOUD_B = cpu_to_le32(0x9000B01A),
+ IO_REPARSE_TAG_CLOUD_C = cpu_to_le32(0x9000C01A),
+ IO_REPARSE_TAG_CLOUD_D = cpu_to_le32(0x9000D01A),
+ IO_REPARSE_TAG_CLOUD_E = cpu_to_le32(0x9000E01A),
+ IO_REPARSE_TAG_CLOUD_F = cpu_to_le32(0x9000F01A),
+
+};
+
+#define SYMLINK_FLAG_RELATIVE 1
+
+/* Microsoft reparse buffer. (see DDK for details) */
+struct REPARSE_DATA_BUFFER {
+ __le32 ReparseTag; // 0x00:
+ __le16 ReparseDataLength; // 0x04:
+ __le16 Reserved;
+
+ union {
+ // If ReparseTag == 0xA0000003 (IO_REPARSE_TAG_MOUNT_POINT)
+ struct {
+ __le16 SubstituteNameOffset; // 0x08
+ __le16 SubstituteNameLength; // 0x0A
+ __le16 PrintNameOffset; // 0x0C
+ __le16 PrintNameLength; // 0x0E
+ __le16 PathBuffer[]; // 0x10
+ } MountPointReparseBuffer;
+
+ // If ReparseTag == 0xA000000C (IO_REPARSE_TAG_SYMLINK)
+ // https://msdn.microsoft.com/en-us/library/cc232006.aspx
+ struct {
+ __le16 SubstituteNameOffset; // 0x08
+ __le16 SubstituteNameLength; // 0x0A
+ __le16 PrintNameOffset; // 0x0C
+ __le16 PrintNameLength; // 0x0E
+ // 0-absolute path 1- relative path, SYMLINK_FLAG_RELATIVE
+ __le32 Flags; // 0x10
+ __le16 PathBuffer[]; // 0x14
+ } SymbolicLinkReparseBuffer;
+
+ // If ReparseTag == 0x80000017U
+ struct {
+ __le32 WofVersion; // 0x08 == 1
+ /* 1 - WIM backing provider ("WIMBoot"),
+ * 2 - System compressed file provider
+ */
+ __le32 WofProvider; // 0x0C
+ __le32 ProviderVer; // 0x10: == 1 WOF_FILE_PROVIDER_CURRENT_VERSION == 1
+ __le32 CompressionFormat; // 0x14: 0, 1, 2, 3. See WOF_COMPRESSION_XXX
+ } CompressReparseBuffer;
+
+ struct {
+ u8 DataBuffer[1]; // 0x08
+ } GenericReparseBuffer;
+ };
+};
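+
+/*
+ * Usage sketch (illustrative, not part of the original patch): for a
+ * symlink reparse point the target lives inside PathBuffer at
+ * SubstituteNameOffset, SubstituteNameLength bytes long:
+ *
+ *   const struct REPARSE_DATA_BUFFER *rp = ...;
+ *   const __le16 *nm = Add2Ptr(rp->SymbolicLinkReparseBuffer.PathBuffer,
+ *           le16_to_cpu(rp->SymbolicLinkReparseBuffer.SubstituteNameOffset));
+ *   u16 nm_bytes = le16_to_cpu(rp->SymbolicLinkReparseBuffer.SubstituteNameLength);
+ */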
+
+/* ATTR_EA_INFO (0xD0) */
+
+#define FILE_NEED_EA 0x80 // See ntifs.h
+/* FILE_NEED_EA, indicates that the file to which the EA belongs cannot be
+ * interpreted without understanding the associated extended attributes.
+ */
+struct EA_INFO {
+ __le16 size_pack; // 0x00: Size of buffer to hold in packed form
+ __le16 count; // 0x02: Count of EA's with FILE_NEED_EA bit set
+ __le32 size; // 0x04: Size of buffer to hold in unpacked form
+};
+
+static_assert(sizeof(struct EA_INFO) == 8);
+
+/* ATTR_EA (0xE0) */
+struct EA_FULL {
+ __le32 size; // 0x00: (not in packed)
+ u8 flags; // 0x04
+ u8 name_len; // 0x05
+ __le16 elength; // 0x06
+ u8 name[]; // 0x08
+};
+
+static_assert(offsetof(struct EA_FULL, name) == 8);
+
+#define ACL_REVISION 2
+#define ACL_REVISION_DS 4
+
+#define SE_SELF_RELATIVE cpu_to_le16(0x8000)
+
+struct SECURITY_DESCRIPTOR_RELATIVE {
+ u8 Revision;
+ u8 Sbz1;
+ __le16 Control;
+ __le32 Owner;
+ __le32 Group;
+ __le32 Sacl;
+ __le32 Dacl;
+};
+static_assert(sizeof(struct SECURITY_DESCRIPTOR_RELATIVE) == 0x14);
+
+struct ACE_HEADER {
+ u8 AceType;
+ u8 AceFlags;
+ __le16 AceSize;
+};
+static_assert(sizeof(struct ACE_HEADER) == 4);
+
+struct ACL {
+ u8 AclRevision;
+ u8 Sbz1;
+ __le16 AclSize;
+ __le16 AceCount;
+ __le16 Sbz2;
+};
+static_assert(sizeof(struct ACL) == 8);
+
+struct SID {
+ u8 Revision;
+ u8 SubAuthorityCount;
+ u8 IdentifierAuthority[6];
+ __le32 SubAuthority[];
+};
+static_assert(offsetof(struct SID, SubAuthority) == 8);
+
+// clang-format on
diff --git a/fs/ntfs3/ntfs_fs.h b/fs/ntfs3/ntfs_fs.h
new file mode 100644
index 000000000000..0c3ac89c3115
--- /dev/null
+++ b/fs/ntfs3/ntfs_fs.h
@@ -0,0 +1,1092 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ *
+ * Copyright (C) 2019-2021 Paragon Software GmbH, All rights reserved.
+ *
+ */
+
+// clang-format off
+#define MINUS_ONE_T ((size_t)(-1))
+/* Biggest MFT / smallest cluster */
+#define MAXIMUM_BYTES_PER_MFT 4096
+#define NTFS_BLOCKS_PER_MFT_RECORD (MAXIMUM_BYTES_PER_MFT / 512)
+
+#define MAXIMUM_BYTES_PER_INDEX 4096
+#define NTFS_BLOCKS_PER_INODE (MAXIMUM_BYTES_PER_INDEX / 512)
+
+/* ntfs specific error code when fixup failed */
+#define E_NTFS_FIXUP 555
+/* ntfs specific error code about resident->nonresident */
+#define E_NTFS_NONRESIDENT 556
+/* ntfs specific error code about punch hole */
+#define E_NTFS_NOTALIGNED 557
+
+
+/* sbi->flags */
+#define NTFS_FLAGS_NODISCARD 0x00000001
+/* Set when LogFile is replaying */
+#define NTFS_FLAGS_LOG_REPLAYING 0x00000008
+/* Set when we changed first MFT's which copy must be updated in $MftMirr */
+#define NTFS_FLAGS_MFTMIRR 0x00001000
+#define NTFS_FLAGS_NEED_REPLAY 0x04000000
+
+
+/* ni->ni_flags */
+/*
+ * Data attribute is external compressed (lzx/xpress)
+ * 1 - WOF_COMPRESSION_XPRESS4K
+ * 2 - WOF_COMPRESSION_XPRESS8K
+ * 3 - WOF_COMPRESSION_XPRESS16K
+ * 4 - WOF_COMPRESSION_LZX32K
+ */
+#define NI_FLAG_COMPRESSED_MASK 0x0000000f
+/* Data attribute is deduplicated */
+#define NI_FLAG_DEDUPLICATED 0x00000010
+#define NI_FLAG_EA 0x00000020
+#define NI_FLAG_DIR 0x00000040
+#define NI_FLAG_RESIDENT 0x00000080
+#define NI_FLAG_UPDATE_PARENT 0x00000100
+// clang-format on
+
+struct ntfs_mount_options {
+ struct nls_table *nls;
+
+ kuid_t fs_uid;
+ kgid_t fs_gid;
+ u16 fs_fmask_inv;
+ u16 fs_dmask_inv;
+
+ unsigned uid : 1, /* uid was set */
+ gid : 1, /* gid was set */
+ fmask : 1, /* fmask was set */
+ dmask : 1, /* dmask was set */
+ sys_immutable : 1, /* immutable system files */
+ discard : 1, /* issue discard requests on deletions */
+ sparse : 1, /* create sparse files */
+ showmeta : 1, /* show meta files */
+ nohidden : 1, /* do not show hidden files */
+ force : 1, /* rw mount dirty volume */
+ no_acs_rules : 1, /* exclude acs rules */
+ prealloc : 1 /* preallocate space when file is growing */
+ ;
+};
+
+/* special value to unpack and deallocate */
+#define RUN_DEALLOCATE ((struct runs_tree *)(size_t)1)
+
+/* TODO: use rb tree instead of array */
+struct runs_tree {
+ struct ntfs_run *runs;
+ size_t count; // Currently used size of the ntfs_run storage.
+ size_t allocated; // Currently allocated ntfs_run storage size.
+};
+
+struct ntfs_buffers {
+ /* Biggest MFT / smallest cluster = 4096 / 512 = 8 */
+ /* Biggest index / smallest cluster = 4096 / 512 = 8 */
+ struct buffer_head *bh[PAGE_SIZE >> SECTOR_SHIFT];
+ u32 bytes;
+ u32 nbufs;
+ u32 off;
+};
+
+enum ALLOCATE_OPT {
+ ALLOCATE_DEF = 0, // Allocate all clusters
+ ALLOCATE_MFT = 1, // Allocate for MFT
+};
+
+enum bitmap_mutex_classes {
+ BITMAP_MUTEX_CLUSTERS = 0,
+ BITMAP_MUTEX_MFT = 1,
+};
+
+struct wnd_bitmap {
+ struct super_block *sb;
+ struct rw_semaphore rw_lock;
+
+ struct runs_tree run;
+ size_t nbits;
+
+ size_t total_zeroes; // total number of free bits
+ u16 *free_bits; // free bits in each window
+ size_t nwnd;
+ u32 bits_last; // bits in last window
+
+ struct rb_root start_tree; // extents, sorted by 'start'
+ struct rb_root count_tree; // extents, sorted by 'count + start'
+ size_t count; // extents count
+
+ /*
+ * -1 - Tree is activated but not updated (too many fragments)
+ * 0 - Tree is not activated
+ * 1 - Tree is activated and updated
+ */
+ int uptodated;
+ size_t extent_min; // Minimal extent used while building
+ size_t extent_max; // Upper estimate of biggest free block
+
+ /* Zone [bit, end) */
+ size_t zone_bit;
+ size_t zone_end;
+
+ bool set_tail; // not necessary in driver
+ bool inited;
+};
+
+typedef int (*NTFS_CMP_FUNC)(const void *key1, size_t len1, const void *key2,
+ size_t len2, const void *param);
+
+enum index_mutex_classed {
+ INDEX_MUTEX_I30 = 0,
+ INDEX_MUTEX_SII = 1,
+ INDEX_MUTEX_SDH = 2,
+ INDEX_MUTEX_SO = 3,
+ INDEX_MUTEX_SQ = 4,
+ INDEX_MUTEX_SR = 5,
+ INDEX_MUTEX_TOTAL
+};
+
+/* ntfs_index - allocation unit inside directory */
+struct ntfs_index {
+ struct runs_tree bitmap_run;
+ struct runs_tree alloc_run;
+ /* read/write access to 'bitmap_run'/'alloc_run' while ntfs_readdir */
+ struct rw_semaphore run_lock;
+
+ /* TODO: remove 'cmp' */
+ NTFS_CMP_FUNC cmp;
+
+ u8 index_bits; // log2(root->index_block_size)
+ u8 idx2vbn_bits; // log2(root->index_block_clst)
+ u8 vbn2vbo_bits; // index_block_size < cluster? 9 : cluster_bits
+ u8 type; // index_mutex_classed
+};
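+
+/*
+ * Worked example (illustrative): with 4K index blocks on a volume with
+ * 1K clusters, index_bits == 12 and index_block_clst == 4, so
+ * idx2vbn_bits == 2 and vbn2vbo_bits == cluster_bits == 10; an index
+ * block number then maps to a byte offset as vbn << vbn2vbo_bits.
+ */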
+
+/* Minimum mft zone */
+#define NTFS_MIN_MFT_ZONE 100
+
+/* ntfs file system in-core superblock data */
+struct ntfs_sb_info {
+ struct super_block *sb;
+
+ u32 discard_granularity;
+ u64 discard_granularity_mask_inv; // ~(discard_granularity - 1)
+
+ u32 cluster_size; // bytes per cluster
+ u32 cluster_mask; // == cluster_size - 1
+ u64 cluster_mask_inv; // ~(cluster_size - 1)
+ u32 block_mask; // sb->s_blocksize - 1
+ u32 blocks_per_cluster; // cluster_size / sb->s_blocksize
+
+ u32 record_size;
+ u32 sector_size;
+ u32 index_size;
+
+ u8 sector_bits;
+ u8 cluster_bits;
+ u8 record_bits;
+
+ u64 maxbytes; // Maximum size for normal files
+ u64 maxbytes_sparse; // Maximum size for sparse file
+
+ u32 flags; // See NTFS_FLAGS_XXX
+
+ CLST bad_clusters; // The count of marked bad clusters
+
+ u16 max_bytes_per_attr; // maximum attribute size in record
+ u16 attr_size_tr; // attribute size threshold (320 bytes)
+
+ /* Records in $Extend */
+ CLST objid_no;
+ CLST quota_no;
+ CLST reparse_no;
+ CLST usn_jrnl_no;
+
+ struct ATTR_DEF_ENTRY *def_table; // attribute definition table
+ u32 def_entries;
+ u32 ea_max_size;
+
+ struct MFT_REC *new_rec;
+
+ u16 *upcase;
+
+ struct {
+ u64 lbo, lbo2;
+ struct ntfs_inode *ni;
+ struct wnd_bitmap bitmap; // $MFT::Bitmap
+ /*
+ * MFT records [11-24) are used to expand the MFT itself.
+ * They are always marked as used in $MFT::Bitmap.
+ * 'reserved_bitmap' contains the real bitmap of these records.
+ */
+ ulong reserved_bitmap; // bitmap of used records [11 - 24)
+ size_t next_free; // The next record to allocate from
+ size_t used; // mft valid size in records
+ u32 recs_mirr; // Number of records in MFTMirr
+ u8 next_reserved;
+ u8 reserved_bitmap_inited;
+ } mft;
+
+ struct {
+ struct wnd_bitmap bitmap; // $Bitmap::Data
+ CLST next_free_lcn;
+ } used;
+
+ struct {
+ u64 size; // in bytes
+ u64 blocks; // in blocks
+ u64 ser_num;
+ struct ntfs_inode *ni;
+ __le16 flags; // cached current VOLUME_INFO::flags, VOLUME_FLAG_DIRTY
+ u8 major_ver;
+ u8 minor_ver;
+ char label[65];
+ bool real_dirty; /* real fs state */
+ } volume;
+
+ struct {
+ struct ntfs_index index_sii;
+ struct ntfs_index index_sdh;
+ struct ntfs_inode *ni;
+ u32 next_id;
+ u64 next_off;
+
+ __le32 def_security_id;
+ } security;
+
+ struct {
+ struct ntfs_index index_r;
+ struct ntfs_inode *ni;
+ u64 max_size; // 16K
+ } reparse;
+
+ struct {
+ struct ntfs_index index_o;
+ struct ntfs_inode *ni;
+ } objid;
+
+ struct {
+ struct mutex mtx_lznt;
+ struct lznt *lznt;
+#ifdef CONFIG_NTFS3_LZX_XPRESS
+ struct mutex mtx_xpress;
+ struct xpress_decompressor *xpress;
+ struct mutex mtx_lzx;
+ struct lzx_decompressor *lzx;
+#endif
+ } compress;
+
+ struct ntfs_mount_options options;
+ struct ratelimit_state msg_ratelimit;
+};
+
+/*
+ * one MFT record (usually 1024 bytes); consists of attributes
+ */
+struct mft_inode {
+ struct rb_node node;
+ struct ntfs_sb_info *sbi;
+
+ struct MFT_REC *mrec;
+ struct ntfs_buffers nb;
+
+ CLST rno;
+ bool dirty;
+};
+
+/* nested class for ntfs_inode::ni_lock */
+enum ntfs_inode_mutex_lock_class {
+ NTFS_INODE_MUTEX_DIRTY,
+ NTFS_INODE_MUTEX_SECURITY,
+ NTFS_INODE_MUTEX_OBJID,
+ NTFS_INODE_MUTEX_REPARSE,
+ NTFS_INODE_MUTEX_NORMAL,
+ NTFS_INODE_MUTEX_PARENT,
+};
+
+/*
+ * ntfs inode - extends the linux inode; consists of one or more mft inodes
+ */
+struct ntfs_inode {
+ struct mft_inode mi; // base record
+
+ /*
+ * Valid size: [0 - i_valid) - this range of the file contains valid data
+ * Range [i_valid - inode->i_size) - reads as zeros
+ * Usually i_valid <= inode->i_size
+ */
+ u64 i_valid;
+ struct timespec64 i_crtime;
+
+ struct mutex ni_lock;
+
+ /* file attributes from std */
+ enum FILE_ATTRIBUTE std_fa;
+ __le32 std_security_id;
+
+ /*
+ * tree of mft_inode
+ * not empty when the primary MFT record (usually 1024 bytes) can't hold
+ * all attributes, e.g. the file becomes too fragmented or contains a lot of names
+ */
+ struct rb_root mi_tree;
+
+ /*
+ * This member is used in ntfs_readdir to ensure that all subrecords are loaded
+ */
+ u8 mi_loaded;
+
+ union {
+ struct ntfs_index dir;
+ struct {
+ struct rw_semaphore run_lock;
+ struct runs_tree run;
+#ifdef CONFIG_NTFS3_LZX_XPRESS
+ struct page *offs_page;
+#endif
+ } file;
+ };
+
+ struct {
+ struct runs_tree run;
+ struct ATTR_LIST_ENTRY *le; // 1K aligned memory
+ size_t size;
+ bool dirty;
+ } attr_list;
+
+ size_t ni_flags; // NI_FLAG_XXX
+
+ struct inode vfs_inode;
+};
+
+struct indx_node {
+ struct ntfs_buffers nb;
+ struct INDEX_BUFFER *index;
+};
+
+struct ntfs_fnd {
+ int level;
+ struct indx_node *nodes[20];
+ struct NTFS_DE *de[20];
+ struct NTFS_DE *root_de;
+};
+
+enum REPARSE_SIGN {
+ REPARSE_NONE = 0,
+ REPARSE_COMPRESSED = 1,
+ REPARSE_DEDUPLICATED = 2,
+ REPARSE_LINK = 3
+};
+
+/* functions from attrib.c*/
+int attr_load_runs(struct ATTRIB *attr, struct ntfs_inode *ni,
+ struct runs_tree *run, const CLST *vcn);
+int attr_allocate_clusters(struct ntfs_sb_info *sbi, struct runs_tree *run,
+ CLST vcn, CLST lcn, CLST len, CLST *pre_alloc,
+ enum ALLOCATE_OPT opt, CLST *alen, const size_t fr,
+ CLST *new_lcn);
+int attr_make_nonresident(struct ntfs_inode *ni, struct ATTRIB *attr,
+ struct ATTR_LIST_ENTRY *le, struct mft_inode *mi,
+ u64 new_size, struct runs_tree *run,
+ struct ATTRIB **ins_attr, struct page *page);
+int attr_set_size(struct ntfs_inode *ni, enum ATTR_TYPE type,
+ const __le16 *name, u8 name_len, struct runs_tree *run,
+ u64 new_size, const u64 *new_valid, bool keep_prealloc,
+ struct ATTRIB **ret);
+int attr_data_get_block(struct ntfs_inode *ni, CLST vcn, CLST clen, CLST *lcn,
+ CLST *len, bool *new);
+int attr_data_read_resident(struct ntfs_inode *ni, struct page *page);
+int attr_data_write_resident(struct ntfs_inode *ni, struct page *page);
+int attr_load_runs_vcn(struct ntfs_inode *ni, enum ATTR_TYPE type,
+ const __le16 *name, u8 name_len, struct runs_tree *run,
+ CLST vcn);
+int attr_load_runs_range(struct ntfs_inode *ni, enum ATTR_TYPE type,
+ const __le16 *name, u8 name_len, struct runs_tree *run,
+ u64 from, u64 to);
+int attr_wof_frame_info(struct ntfs_inode *ni, struct ATTRIB *attr,
+ struct runs_tree *run, u64 frame, u64 frames,
+ u8 frame_bits, u32 *ondisk_size, u64 *vbo_data);
+int attr_is_frame_compressed(struct ntfs_inode *ni, struct ATTRIB *attr,
+ CLST frame, CLST *clst_data);
+int attr_allocate_frame(struct ntfs_inode *ni, CLST frame, size_t compr_size,
+ u64 new_valid);
+int attr_collapse_range(struct ntfs_inode *ni, u64 vbo, u64 bytes);
+int attr_punch_hole(struct ntfs_inode *ni, u64 vbo, u64 bytes, u32 *frame_size);
+
+/* functions from attrlist.c*/
+void al_destroy(struct ntfs_inode *ni);
+bool al_verify(struct ntfs_inode *ni);
+int ntfs_load_attr_list(struct ntfs_inode *ni, struct ATTRIB *attr);
+struct ATTR_LIST_ENTRY *al_enumerate(struct ntfs_inode *ni,
+ struct ATTR_LIST_ENTRY *le);
+struct ATTR_LIST_ENTRY *al_find_le(struct ntfs_inode *ni,
+ struct ATTR_LIST_ENTRY *le,
+ const struct ATTRIB *attr);
+struct ATTR_LIST_ENTRY *al_find_ex(struct ntfs_inode *ni,
+ struct ATTR_LIST_ENTRY *le,
+ enum ATTR_TYPE type, const __le16 *name,
+ u8 name_len, const CLST *vcn);
+int al_add_le(struct ntfs_inode *ni, enum ATTR_TYPE type, const __le16 *name,
+ u8 name_len, CLST svcn, __le16 id, const struct MFT_REF *ref,
+ struct ATTR_LIST_ENTRY **new_le);
+bool al_remove_le(struct ntfs_inode *ni, struct ATTR_LIST_ENTRY *le);
+bool al_delete_le(struct ntfs_inode *ni, enum ATTR_TYPE type, CLST vcn,
+ const __le16 *name, size_t name_len,
+ const struct MFT_REF *ref);
+int al_update(struct ntfs_inode *ni);
+static inline size_t al_aligned(size_t size)
+{
+ return (size + 1023) & ~(size_t)1023;
+}
+
+/* globals from bitfunc.c */
+bool are_bits_clear(const ulong *map, size_t bit, size_t nbits);
+bool are_bits_set(const ulong *map, size_t bit, size_t nbits);
+size_t get_set_bits_ex(const ulong *map, size_t bit, size_t nbits);
+
+/* globals from dir.c */
+int ntfs_utf16_to_nls(struct ntfs_sb_info *sbi, const struct le_str *uni,
+ u8 *buf, int buf_len);
+int ntfs_nls_to_utf16(struct ntfs_sb_info *sbi, const u8 *name, u32 name_len,
+ struct cpu_str *uni, u32 max_ulen,
+ enum utf16_endian endian);
+struct inode *dir_search_u(struct inode *dir, const struct cpu_str *uni,
+ struct ntfs_fnd *fnd);
+bool dir_is_empty(struct inode *dir);
+extern const struct file_operations ntfs_dir_operations;
+
+/* globals from file.c*/
+int ntfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
+ struct kstat *stat, u32 request_mask, u32 flags);
+void ntfs_sparse_cluster(struct inode *inode, struct page *page0, CLST vcn,
+ CLST len);
+int ntfs3_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
+ struct iattr *attr);
+int ntfs_file_open(struct inode *inode, struct file *file);
+int ntfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
+ __u64 start, __u64 len);
+extern const struct inode_operations ntfs_special_inode_operations;
+extern const struct inode_operations ntfs_file_inode_operations;
+extern const struct file_operations ntfs_file_operations;
+
+/* globals from frecord.c */
+void ni_remove_mi(struct ntfs_inode *ni, struct mft_inode *mi);
+struct ATTR_STD_INFO *ni_std(struct ntfs_inode *ni);
+struct ATTR_STD_INFO5 *ni_std5(struct ntfs_inode *ni);
+void ni_clear(struct ntfs_inode *ni);
+int ni_load_mi_ex(struct ntfs_inode *ni, CLST rno, struct mft_inode **mi);
+int ni_load_mi(struct ntfs_inode *ni, struct ATTR_LIST_ENTRY *le,
+ struct mft_inode **mi);
+struct ATTRIB *ni_find_attr(struct ntfs_inode *ni, struct ATTRIB *attr,
+ struct ATTR_LIST_ENTRY **entry_o,
+ enum ATTR_TYPE type, const __le16 *name,
+ u8 name_len, const CLST *vcn,
+ struct mft_inode **mi);
+struct ATTRIB *ni_enum_attr_ex(struct ntfs_inode *ni, struct ATTRIB *attr,
+ struct ATTR_LIST_ENTRY **le,
+ struct mft_inode **mi);
+struct ATTRIB *ni_load_attr(struct ntfs_inode *ni, enum ATTR_TYPE type,
+ const __le16 *name, u8 name_len, CLST vcn,
+ struct mft_inode **pmi);
+int ni_load_all_mi(struct ntfs_inode *ni);
+bool ni_add_subrecord(struct ntfs_inode *ni, CLST rno, struct mft_inode **mi);
+int ni_remove_attr(struct ntfs_inode *ni, enum ATTR_TYPE type,
+ const __le16 *name, size_t name_len, bool base_only,
+ const __le16 *id);
+int ni_create_attr_list(struct ntfs_inode *ni);
+int ni_expand_list(struct ntfs_inode *ni);
+int ni_insert_nonresident(struct ntfs_inode *ni, enum ATTR_TYPE type,
+ const __le16 *name, u8 name_len,
+ const struct runs_tree *run, CLST svcn, CLST len,
+ __le16 flags, struct ATTRIB **new_attr,
+ struct mft_inode **mi);
+int ni_insert_resident(struct ntfs_inode *ni, u32 data_size,
+ enum ATTR_TYPE type, const __le16 *name, u8 name_len,
+ struct ATTRIB **new_attr, struct mft_inode **mi);
+int ni_remove_attr_le(struct ntfs_inode *ni, struct ATTRIB *attr,
+ struct ATTR_LIST_ENTRY *le);
+int ni_delete_all(struct ntfs_inode *ni);
+struct ATTR_FILE_NAME *ni_fname_name(struct ntfs_inode *ni,
+ const struct cpu_str *uni,
+ const struct MFT_REF *home,
+ struct ATTR_LIST_ENTRY **entry);
+struct ATTR_FILE_NAME *ni_fname_type(struct ntfs_inode *ni, u8 name_type,
+ struct ATTR_LIST_ENTRY **entry);
+int ni_new_attr_flags(struct ntfs_inode *ni, enum FILE_ATTRIBUTE new_fa);
+enum REPARSE_SIGN ni_parse_reparse(struct ntfs_inode *ni, struct ATTRIB *attr,
+ void *buffer);
+int ni_write_inode(struct inode *inode, int sync, const char *hint);
+#define _ni_write_inode(i, w) ni_write_inode(i, w, __func__)
+int ni_fiemap(struct ntfs_inode *ni, struct fiemap_extent_info *fieinfo,
+ __u64 vbo, __u64 len);
+int ni_readpage_cmpr(struct ntfs_inode *ni, struct page *page);
+int ni_decompress_file(struct ntfs_inode *ni);
+int ni_read_frame(struct ntfs_inode *ni, u64 frame_vbo, struct page **pages,
+ u32 pages_per_frame);
+int ni_write_frame(struct ntfs_inode *ni, struct page **pages,
+ u32 pages_per_frame);
+
+/* globals from fslog.c */
+int log_replay(struct ntfs_inode *ni, bool *initialized);
+
+/* globals from fsntfs.c */
+bool ntfs_fix_pre_write(struct NTFS_RECORD_HEADER *rhdr, size_t bytes);
+int ntfs_fix_post_read(struct NTFS_RECORD_HEADER *rhdr, size_t bytes,
+ bool simple);
+int ntfs_extend_init(struct ntfs_sb_info *sbi);
+int ntfs_loadlog_and_replay(struct ntfs_inode *ni, struct ntfs_sb_info *sbi);
+const struct ATTR_DEF_ENTRY *ntfs_query_def(struct ntfs_sb_info *sbi,
+ enum ATTR_TYPE Type);
+int ntfs_look_for_free_space(struct ntfs_sb_info *sbi, CLST lcn, CLST len,
+ CLST *new_lcn, CLST *new_len,
+ enum ALLOCATE_OPT opt);
+int ntfs_look_free_mft(struct ntfs_sb_info *sbi, CLST *rno, bool mft,
+ struct ntfs_inode *ni, struct mft_inode **mi);
+void ntfs_mark_rec_free(struct ntfs_sb_info *sbi, CLST rno);
+int ntfs_clear_mft_tail(struct ntfs_sb_info *sbi, size_t from, size_t to);
+int ntfs_refresh_zone(struct ntfs_sb_info *sbi);
+int ntfs_update_mftmirr(struct ntfs_sb_info *sbi, int wait);
+enum NTFS_DIRTY_FLAGS {
+ NTFS_DIRTY_CLEAR = 0,
+ NTFS_DIRTY_DIRTY = 1,
+ NTFS_DIRTY_ERROR = 2,
+};
+int ntfs_set_state(struct ntfs_sb_info *sbi, enum NTFS_DIRTY_FLAGS dirty);
+int ntfs_sb_read(struct super_block *sb, u64 lbo, size_t bytes, void *buffer);
+int ntfs_sb_write(struct super_block *sb, u64 lbo, size_t bytes,
+ const void *buffer, int wait);
+int ntfs_sb_write_run(struct ntfs_sb_info *sbi, const struct runs_tree *run,
+ u64 vbo, const void *buf, size_t bytes);
+struct buffer_head *ntfs_bread_run(struct ntfs_sb_info *sbi,
+ const struct runs_tree *run, u64 vbo);
+int ntfs_read_run_nb(struct ntfs_sb_info *sbi, const struct runs_tree *run,
+ u64 vbo, void *buf, u32 bytes, struct ntfs_buffers *nb);
+int ntfs_read_bh(struct ntfs_sb_info *sbi, const struct runs_tree *run, u64 vbo,
+ struct NTFS_RECORD_HEADER *rhdr, u32 bytes,
+ struct ntfs_buffers *nb);
+int ntfs_get_bh(struct ntfs_sb_info *sbi, const struct runs_tree *run, u64 vbo,
+ u32 bytes, struct ntfs_buffers *nb);
+int ntfs_write_bh(struct ntfs_sb_info *sbi, struct NTFS_RECORD_HEADER *rhdr,
+ struct ntfs_buffers *nb, int sync);
+int ntfs_bio_pages(struct ntfs_sb_info *sbi, const struct runs_tree *run,
+ struct page **pages, u32 nr_pages, u64 vbo, u32 bytes,
+ u32 op);
+int ntfs_bio_fill_1(struct ntfs_sb_info *sbi, const struct runs_tree *run);
+int ntfs_vbo_to_lbo(struct ntfs_sb_info *sbi, const struct runs_tree *run,
+ u64 vbo, u64 *lbo, u64 *bytes);
+struct ntfs_inode *ntfs_new_inode(struct ntfs_sb_info *sbi, CLST nRec,
+ bool dir);
+extern const u8 s_default_security[0x50];
+bool is_sd_valid(const struct SECURITY_DESCRIPTOR_RELATIVE *sd, u32 len);
+int ntfs_security_init(struct ntfs_sb_info *sbi);
+int ntfs_get_security_by_id(struct ntfs_sb_info *sbi, __le32 security_id,
+ struct SECURITY_DESCRIPTOR_RELATIVE **sd,
+ size_t *size);
+int ntfs_insert_security(struct ntfs_sb_info *sbi,
+ const struct SECURITY_DESCRIPTOR_RELATIVE *sd,
+ u32 size, __le32 *security_id, bool *inserted);
+int ntfs_reparse_init(struct ntfs_sb_info *sbi);
+int ntfs_objid_init(struct ntfs_sb_info *sbi);
+int ntfs_objid_remove(struct ntfs_sb_info *sbi, struct GUID *guid);
+int ntfs_insert_reparse(struct ntfs_sb_info *sbi, __le32 rtag,
+ const struct MFT_REF *ref);
+int ntfs_remove_reparse(struct ntfs_sb_info *sbi, __le32 rtag,
+ const struct MFT_REF *ref);
+void mark_as_free_ex(struct ntfs_sb_info *sbi, CLST lcn, CLST len, bool trim);
+int run_deallocate(struct ntfs_sb_info *sbi, struct runs_tree *run, bool trim);
+
+/* globals from index.c */
+int indx_used_bit(struct ntfs_index *indx, struct ntfs_inode *ni, size_t *bit);
+void fnd_clear(struct ntfs_fnd *fnd);
+static inline struct ntfs_fnd *fnd_get(void)
+{
+ return ntfs_zalloc(sizeof(struct ntfs_fnd));
+}
+static inline void fnd_put(struct ntfs_fnd *fnd)
+{
+ if (fnd) {
+ fnd_clear(fnd);
+ ntfs_free(fnd);
+ }
+}
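+/*
+ * Usage sketch (illustrative): fnd_get()/fnd_put() bracket an index
+ * lookup; fnd_put() both clears and frees the handle:
+ *
+ *   struct ntfs_fnd *fnd = fnd_get();
+ *
+ *   if (!fnd)
+ *           return -ENOMEM;
+ *   ... indx_find(indx, ni, root, key, key_len, param, &diff, &e, fnd) ...
+ *   fnd_put(fnd);
+ */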
+void indx_clear(struct ntfs_index *idx);
+int indx_init(struct ntfs_index *indx, struct ntfs_sb_info *sbi,
+ const struct ATTRIB *attr, enum index_mutex_classed type);
+struct INDEX_ROOT *indx_get_root(struct ntfs_index *indx, struct ntfs_inode *ni,
+ struct ATTRIB **attr, struct mft_inode **mi);
+int indx_read(struct ntfs_index *idx, struct ntfs_inode *ni, CLST vbn,
+ struct indx_node **node);
+int indx_find(struct ntfs_index *indx, struct ntfs_inode *dir,
+ const struct INDEX_ROOT *root, const void *Key, size_t KeyLen,
+ const void *param, int *diff, struct NTFS_DE **entry,
+ struct ntfs_fnd *fnd);
+int indx_find_sort(struct ntfs_index *indx, struct ntfs_inode *ni,
+ const struct INDEX_ROOT *root, struct NTFS_DE **entry,
+ struct ntfs_fnd *fnd);
+int indx_find_raw(struct ntfs_index *indx, struct ntfs_inode *ni,
+ const struct INDEX_ROOT *root, struct NTFS_DE **entry,
+ size_t *off, struct ntfs_fnd *fnd);
+int indx_insert_entry(struct ntfs_index *indx, struct ntfs_inode *ni,
+ const struct NTFS_DE *new_de, const void *param,
+ struct ntfs_fnd *fnd);
+int indx_delete_entry(struct ntfs_index *indx, struct ntfs_inode *ni,
+ const void *key, u32 key_len, const void *param);
+int indx_update_dup(struct ntfs_inode *ni, struct ntfs_sb_info *sbi,
+ const struct ATTR_FILE_NAME *fname,
+ const struct NTFS_DUP_INFO *dup, int sync);
+
+/* globals from inode.c */
+struct inode *ntfs_iget5(struct super_block *sb, const struct MFT_REF *ref,
+ const struct cpu_str *name);
+int ntfs_set_size(struct inode *inode, u64 new_size);
+int reset_log_file(struct inode *inode);
+int ntfs_get_block(struct inode *inode, sector_t vbn,
+ struct buffer_head *bh_result, int create);
+int ntfs3_write_inode(struct inode *inode, struct writeback_control *wbc);
+int ntfs_sync_inode(struct inode *inode);
+int ntfs_flush_inodes(struct super_block *sb, struct inode *i1,
+ struct inode *i2);
+int inode_write_data(struct inode *inode, const void *data, size_t bytes);
+struct inode *ntfs_create_inode(struct user_namespace *mnt_userns,
+ struct inode *dir, struct dentry *dentry,
+ const struct cpu_str *uni, umode_t mode,
+ dev_t dev, const char *symname, u32 size,
+ struct ntfs_fnd *fnd);
+int ntfs_link_inode(struct inode *inode, struct dentry *dentry);
+int ntfs_unlink_inode(struct inode *dir, const struct dentry *dentry);
+void ntfs_evict_inode(struct inode *inode);
+extern const struct inode_operations ntfs_link_inode_operations;
+extern const struct address_space_operations ntfs_aops;
+extern const struct address_space_operations ntfs_aops_cmpr;
+
+/* globals from name_i.c*/
+int fill_name_de(struct ntfs_sb_info *sbi, void *buf, const struct qstr *name,
+ const struct cpu_str *uni);
+struct dentry *ntfs3_get_parent(struct dentry *child);
+
+extern const struct inode_operations ntfs_dir_inode_operations;
+extern const struct inode_operations ntfs_special_inode_operations;
+
+/* globals from record.c */
+int mi_get(struct ntfs_sb_info *sbi, CLST rno, struct mft_inode **mi);
+void mi_put(struct mft_inode *mi);
+int mi_init(struct mft_inode *mi, struct ntfs_sb_info *sbi, CLST rno);
+int mi_read(struct mft_inode *mi, bool is_mft);
+struct ATTRIB *mi_enum_attr(struct mft_inode *mi, struct ATTRIB *attr);
+// TODO: id?
+struct ATTRIB *mi_find_attr(struct mft_inode *mi, struct ATTRIB *attr,
+ enum ATTR_TYPE type, const __le16 *name,
+ size_t name_len, const __le16 *id);
+static inline struct ATTRIB *rec_find_attr_le(struct mft_inode *rec,
+ struct ATTR_LIST_ENTRY *le)
+{
+ return mi_find_attr(rec, NULL, le->type, le_name(le), le->name_len,
+ &le->id);
+}
+int mi_write(struct mft_inode *mi, int wait);
+int mi_format_new(struct mft_inode *mi, struct ntfs_sb_info *sbi, CLST rno,
+ __le16 flags, bool is_mft);
+void mi_mark_free(struct mft_inode *mi);
+struct ATTRIB *mi_insert_attr(struct mft_inode *mi, enum ATTR_TYPE type,
+ const __le16 *name, u8 name_len, u32 asize,
+ u16 name_off);
+
+bool mi_remove_attr(struct mft_inode *mi, struct ATTRIB *attr);
+bool mi_resize_attr(struct mft_inode *mi, struct ATTRIB *attr, int bytes);
+int mi_pack_runs(struct mft_inode *mi, struct ATTRIB *attr,
+ struct runs_tree *run, CLST len);
+static inline bool mi_is_ref(const struct mft_inode *mi,
+ const struct MFT_REF *ref)
+{
+ if (le32_to_cpu(ref->low) != mi->rno)
+ return false;
+ if (ref->seq != mi->mrec->seq)
+ return false;
+
+#ifdef CONFIG_NTFS3_64BIT_CLUSTER
+ return le16_to_cpu(ref->high) == (mi->rno >> 32);
+#else
+ return !ref->high;
+#endif
+}
+
+static inline void mi_get_ref(const struct mft_inode *mi, struct MFT_REF *ref)
+{
+ ref->low = cpu_to_le32(mi->rno);
+#ifdef CONFIG_NTFS3_64BIT_CLUSTER
+ ref->high = cpu_to_le16(mi->rno >> 32);
+#else
+ ref->high = 0;
+#endif
+ ref->seq = mi->mrec->seq;
+}
+
+/* globals from run.c */
+bool run_lookup_entry(const struct runs_tree *run, CLST vcn, CLST *lcn,
+ CLST *len, size_t *index);
+void run_truncate(struct runs_tree *run, CLST vcn);
+void run_truncate_head(struct runs_tree *run, CLST vcn);
+void run_truncate_around(struct runs_tree *run, CLST vcn);
+bool run_lookup(const struct runs_tree *run, CLST vcn, size_t *Index);
+bool run_add_entry(struct runs_tree *run, CLST vcn, CLST lcn, CLST len,
+ bool is_mft);
+bool run_collapse_range(struct runs_tree *run, CLST vcn, CLST len);
+bool run_get_entry(const struct runs_tree *run, size_t index, CLST *vcn,
+ CLST *lcn, CLST *len);
+bool run_is_mapped_full(const struct runs_tree *run, CLST svcn, CLST evcn);
+
+int run_pack(const struct runs_tree *run, CLST svcn, CLST len, u8 *run_buf,
+ u32 run_buf_size, CLST *packed_vcns);
+int run_unpack(struct runs_tree *run, struct ntfs_sb_info *sbi, CLST ino,
+ CLST svcn, CLST evcn, CLST vcn, const u8 *run_buf,
+ u32 run_buf_size);
+
+#ifdef NTFS3_CHECK_FREE_CLST
+int run_unpack_ex(struct runs_tree *run, struct ntfs_sb_info *sbi, CLST ino,
+ CLST svcn, CLST evcn, CLST vcn, const u8 *run_buf,
+ u32 run_buf_size);
+#else
+#define run_unpack_ex run_unpack
+#endif
+int run_get_highest_vcn(CLST vcn, const u8 *run_buf, u64 *highest_vcn);
+
+/* globals from super.c */
+void *ntfs_set_shared(void *ptr, u32 bytes);
+void *ntfs_put_shared(void *ptr);
+void ntfs_unmap_meta(struct super_block *sb, CLST lcn, CLST len);
+int ntfs_discard(struct ntfs_sb_info *sbi, CLST Lcn, CLST Len);
+
+/* globals from bitmap.c*/
+int __init ntfs3_init_bitmap(void);
+void ntfs3_exit_bitmap(void);
+void wnd_close(struct wnd_bitmap *wnd);
+static inline size_t wnd_zeroes(const struct wnd_bitmap *wnd)
+{
+ return wnd->total_zeroes;
+}
+int wnd_init(struct wnd_bitmap *wnd, struct super_block *sb, size_t nbits);
+int wnd_set_free(struct wnd_bitmap *wnd, size_t bit, size_t bits);
+int wnd_set_used(struct wnd_bitmap *wnd, size_t bit, size_t bits);
+bool wnd_is_free(struct wnd_bitmap *wnd, size_t bit, size_t bits);
+bool wnd_is_used(struct wnd_bitmap *wnd, size_t bit, size_t bits);
+
+/* Possible values for 'flags' 'wnd_find' */
+#define BITMAP_FIND_MARK_AS_USED 0x01
+#define BITMAP_FIND_FULL 0x02
+size_t wnd_find(struct wnd_bitmap *wnd, size_t to_alloc, size_t hint,
+ size_t flags, size_t *allocated);
+int wnd_extend(struct wnd_bitmap *wnd, size_t new_bits);
+void wnd_zone_set(struct wnd_bitmap *wnd, size_t Lcn, size_t Len);
+int ntfs_trim_fs(struct ntfs_sb_info *sbi, struct fstrim_range *range);
+
+/* globals from upcase.c */
+int ntfs_cmp_names(const __le16 *s1, size_t l1, const __le16 *s2, size_t l2,
+ const u16 *upcase, bool bothcase);
+int ntfs_cmp_names_cpu(const struct cpu_str *uni1, const struct le_str *uni2,
+ const u16 *upcase, bool bothcase);
+
+/* globals from xattr.c */
+#ifdef CONFIG_NTFS3_FS_POSIX_ACL
+struct posix_acl *ntfs_get_acl(struct inode *inode, int type);
+int ntfs_set_acl(struct user_namespace *mnt_userns, struct inode *inode,
+ struct posix_acl *acl, int type);
+int ntfs_init_acl(struct user_namespace *mnt_userns, struct inode *inode,
+ struct inode *dir);
+#else
+#define ntfs_get_acl NULL
+#define ntfs_set_acl NULL
+#endif
+
+int ntfs_acl_chmod(struct user_namespace *mnt_userns, struct inode *inode);
+int ntfs_permission(struct user_namespace *mnt_userns, struct inode *inode,
+ int mask);
+ssize_t ntfs_listxattr(struct dentry *dentry, char *buffer, size_t size);
+extern const struct xattr_handler *ntfs_xattr_handlers[];
+
+int ntfs_save_wsl_perm(struct inode *inode);
+void ntfs_get_wsl_perm(struct inode *inode);
+
+/* globals from lznt.c */
+struct lznt *get_lznt_ctx(int level);
+size_t compress_lznt(const void *uncompressed, size_t uncompressed_size,
+ void *compressed, size_t compressed_size,
+ struct lznt *ctx);
+ssize_t decompress_lznt(const void *compressed, size_t compressed_size,
+ void *uncompressed, size_t uncompressed_size);
+
+static inline bool is_ntfs3(struct ntfs_sb_info *sbi)
+{
+ return sbi->volume.major_ver >= 3;
+}
+
+/*(sb->s_flags & SB_ACTIVE)*/
+static inline bool is_mounted(struct ntfs_sb_info *sbi)
+{
+ return !!sbi->sb->s_root;
+}
+
+static inline bool ntfs_is_meta_file(struct ntfs_sb_info *sbi, CLST rno)
+{
+ return rno < MFT_REC_FREE || rno == sbi->objid_no ||
+ rno == sbi->quota_no || rno == sbi->reparse_no ||
+ rno == sbi->usn_jrnl_no;
+}
+
+static inline void ntfs_unmap_page(struct page *page)
+{
+ kunmap(page);
+ put_page(page);
+}
+
+static inline struct page *ntfs_map_page(struct address_space *mapping,
+ unsigned long index)
+{
+ struct page *page = read_mapping_page(mapping, index, NULL);
+
+ if (!IS_ERR(page)) {
+ kmap(page);
+ if (!PageError(page))
+ return page;
+ ntfs_unmap_page(page);
+ return ERR_PTR(-EIO);
+ }
+ return page;
+}
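+
+/*
+ * Usage sketch (illustrative): a page successfully returned by
+ * ntfs_map_page() is kmapped and referenced, so it must be released
+ * with ntfs_unmap_page():
+ *
+ *   struct page *page = ntfs_map_page(mapping, idx);
+ *
+ *   if (IS_ERR(page))
+ *           return PTR_ERR(page);
+ *   ... use page_address(page) ...
+ *   ntfs_unmap_page(page);
+ */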
+
+static inline size_t wnd_zone_bit(const struct wnd_bitmap *wnd)
+{
+ return wnd->zone_bit;
+}
+
+static inline size_t wnd_zone_len(const struct wnd_bitmap *wnd)
+{
+ return wnd->zone_end - wnd->zone_bit;
+}
+
+static inline void run_init(struct runs_tree *run)
+{
+ run->runs = NULL;
+ run->count = 0;
+ run->allocated = 0;
+}
+
+static inline struct runs_tree *run_alloc(void)
+{
+ return ntfs_zalloc(sizeof(struct runs_tree));
+}
+
+static inline void run_close(struct runs_tree *run)
+{
+ ntfs_vfree(run->runs);
+ memset(run, 0, sizeof(*run));
+}
+
+static inline void run_free(struct runs_tree *run)
+{
+ if (run) {
+ ntfs_vfree(run->runs);
+ ntfs_free(run);
+ }
+}
+
+static inline bool run_is_empty(struct runs_tree *run)
+{
+ return !run->count;
+}
+
+/* NTFS uses quad aligned bitmaps */
+static inline size_t bitmap_size(size_t bits)
+{
+ return QuadAlign((bits + 7) >> 3);
+}
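+
+/*
+ * Worked example (illustrative): bitmap_size(100) ==
+ * QuadAlign((100 + 7) >> 3) == QuadAlign(13) == 16 bytes.
+ */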
+
+#define _100ns2seconds 10000000
+#define SecondsToStartOf1970 0x00000002B6109100
+
+#define NTFS_TIME_GRAN 100
+
+/*
+ * kernel2nt
+ *
+ * converts in-memory kernel timestamp into nt time
+ */
+static inline __le64 kernel2nt(const struct timespec64 *ts)
+{
+ // 10^7 units of 100 nanoseconds = one second
+ return cpu_to_le64(_100ns2seconds *
+ (ts->tv_sec + SecondsToStartOf1970) +
+ ts->tv_nsec / NTFS_TIME_GRAN);
+}
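+
+/*
+ * Worked example (illustrative): the Unix epoch, ts == {0, 0}, maps to
+ * 10^7 * 0x2B6109100 == 116444736000000000, the well-known NT
+ * timestamp of 1970-01-01T00:00:00Z.
+ */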
+
+/*
+ * nt2kernel
+ *
+ * converts on-disk nt time into kernel timestamp
+ */
+static inline void nt2kernel(const __le64 tm, struct timespec64 *ts)
+{
+ u64 t = le64_to_cpu(tm) - _100ns2seconds * SecondsToStartOf1970;
+
+ // WARNING: do_div changes its first argument(!)
+ ts->tv_nsec = do_div(t, _100ns2seconds) * 100;
+ ts->tv_sec = t;
+}
+
+static inline struct ntfs_sb_info *ntfs_sb(struct super_block *sb)
+{
+ return sb->s_fs_info;
+}
+
+/* Align up on cluster boundary */
+static inline u64 ntfs_up_cluster(const struct ntfs_sb_info *sbi, u64 size)
+{
+ return (size + sbi->cluster_mask) & sbi->cluster_mask_inv;
+}
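+
+/* e.g. with 4K clusters, ntfs_up_cluster(sbi, 5000) == 8192 (illustrative) */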
+
+/* Align up on cluster boundary */
+static inline u64 ntfs_up_block(const struct super_block *sb, u64 size)
+{
+ return (size + sb->s_blocksize - 1) & ~(u64)(sb->s_blocksize - 1);
+}
+
+static inline CLST bytes_to_cluster(const struct ntfs_sb_info *sbi, u64 size)
+{
+ return (size + sbi->cluster_mask) >> sbi->cluster_bits;
+}
+
+static inline u64 bytes_to_block(const struct super_block *sb, u64 size)
+{
+ return (size + sb->s_blocksize - 1) >> sb->s_blocksize_bits;
+}
+
+static inline struct buffer_head *ntfs_bread(struct super_block *sb,
+ sector_t block)
+{
+ struct buffer_head *bh = sb_bread(sb, block);
+
+ if (bh)
+ return bh;
+
+ ntfs_err(sb, "failed to read volume at offset 0x%llx",
+ (u64)block << sb->s_blocksize_bits);
+ return NULL;
+}
+
+static inline bool is_power_of2(size_t v)
+{
+ return v && !(v & (v - 1));
+}
+
+static inline struct ntfs_inode *ntfs_i(struct inode *inode)
+{
+ return container_of(inode, struct ntfs_inode, vfs_inode);
+}
+
+static inline bool is_compressed(const struct ntfs_inode *ni)
+{
+ return (ni->std_fa & FILE_ATTRIBUTE_COMPRESSED) ||
+ (ni->ni_flags & NI_FLAG_COMPRESSED_MASK);
+}
+
+static inline int ni_ext_compress_bits(const struct ntfs_inode *ni)
+{
+ return 0xb + (ni->ni_flags & NI_FLAG_COMPRESSED_MASK);
+}
+
+/* bits - 0xc, 0xd, 0xe, 0xf, 0x10 */
+static inline void ni_set_ext_compress_bits(struct ntfs_inode *ni, u8 bits)
+{
+ ni->ni_flags |= (bits - 0xb) & NI_FLAG_COMPRESSED_MASK;
+}
+
+static inline bool is_dedup(const struct ntfs_inode *ni)
+{
+ return ni->ni_flags & NI_FLAG_DEDUPLICATED;
+}
+
+static inline bool is_encrypted(const struct ntfs_inode *ni)
+{
+ return ni->std_fa & FILE_ATTRIBUTE_ENCRYPTED;
+}
+
+static inline bool is_sparsed(const struct ntfs_inode *ni)
+{
+ return ni->std_fa & FILE_ATTRIBUTE_SPARSE_FILE;
+}
+
+static inline int is_resident(struct ntfs_inode *ni)
+{
+ return ni->ni_flags & NI_FLAG_RESIDENT;
+}
+
+static inline void le16_sub_cpu(__le16 *var, u16 val)
+{
+ *var = cpu_to_le16(le16_to_cpu(*var) - val);
+}
+
+static inline void le32_sub_cpu(__le32 *var, u32 val)
+{
+ *var = cpu_to_le32(le32_to_cpu(*var) - val);
+}
+
+static inline void nb_put(struct ntfs_buffers *nb)
+{
+ u32 i, nbufs = nb->nbufs;
+
+ if (!nbufs)
+ return;
+
+ for (i = 0; i < nbufs; i++)
+ put_bh(nb->bh[i]);
+ nb->nbufs = 0;
+}
+
+static inline void put_indx_node(struct indx_node *in)
+{
+ if (!in)
+ return;
+
+ ntfs_free(in->index);
+ nb_put(&in->nb);
+ ntfs_free(in);
+}
+
+static inline void mi_clear(struct mft_inode *mi)
+{
+ nb_put(&mi->nb);
+ ntfs_free(mi->mrec);
+ mi->mrec = NULL;
+}
+
+static inline void ni_lock(struct ntfs_inode *ni)
+{
+ mutex_lock_nested(&ni->ni_lock, NTFS_INODE_MUTEX_NORMAL);
+}
+
+static inline void ni_lock_dir(struct ntfs_inode *ni)
+{
+ mutex_lock_nested(&ni->ni_lock, NTFS_INODE_MUTEX_PARENT);
+}
+
+static inline void ni_unlock(struct ntfs_inode *ni)
+{
+ mutex_unlock(&ni->ni_lock);
+}
+
+static inline int ni_trylock(struct ntfs_inode *ni)
+{
+ return mutex_trylock(&ni->ni_lock);
+}
+
+static inline int attr_load_runs_attr(struct ntfs_inode *ni,
+ struct ATTRIB *attr,
+ struct runs_tree *run, CLST vcn)
+{
+ return attr_load_runs_vcn(ni, attr->type, attr_name(attr),
+ attr->name_len, run, vcn);
+}
+
+static inline void le64_sub_cpu(__le64 *var, u64 val)
+{
+ *var = cpu_to_le64(le64_to_cpu(*var) - val);
+}
diff --git a/fs/ntfs3/upcase.c b/fs/ntfs3/upcase.c
new file mode 100644
index 000000000000..9617382aca64
--- /dev/null
+++ b/fs/ntfs3/upcase.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *
+ * Copyright (C) 2019-2021 Paragon Software GmbH, All rights reserved.
+ *
+ */
+#include <linux/blkdev.h>
+#include <linux/buffer_head.h>
+#include <linux/module.h>
+#include <linux/nls.h>
+
+#include "debug.h"
+#include "ntfs.h"
+#include "ntfs_fs.h"
+
+static inline u16 upcase_unicode_char(const u16 *upcase, u16 chr)
+{
+ if (chr < 'a')
+ return chr;
+
+ if (chr <= 'z')
+ return chr - ('a' - 'A');
+
+ return upcase[chr];
+}
+
+/*
+ * Thanks Kari Argillander <kari.argillander(a)gmail.com> for idea and implementation 'bothcase'
+ *
+ * Straight way to compare names:
+ * - case insensitive
+ * - if name equals and 'bothcases' then
+ * - case sensitive
+ * 'Straight way' code scans input names twice in the worst case
+ * Optimized code scans input names only once
+ */
+int ntfs_cmp_names(const __le16 *s1, size_t l1, const __le16 *s2, size_t l2,
+ const u16 *upcase, bool bothcase)
+{
+ int diff1 = 0;
+ int diff2;
+ size_t len = min(l1, l2);
+
+ if (!bothcase && upcase)
+ goto case_insentive;
+
+ for (; len; s1++, s2++, len--) {
+ diff1 = le16_to_cpu(*s1) - le16_to_cpu(*s2);
+ if (diff1) {
+ if (bothcase && upcase)
+ goto case_insentive;
+
+ return diff1;
+ }
+ }
+ return l1 - l2;
+
+case_insentive:
+ for (; len; s1++, s2++, len--) {
+ diff2 = upcase_unicode_char(upcase, le16_to_cpu(*s1)) -
+ upcase_unicode_char(upcase, le16_to_cpu(*s2));
+ if (diff2)
+ return diff2;
+ }
+
+ diff2 = l1 - l2;
+ return diff2 ? diff2 : diff1;
+}
+
+int ntfs_cmp_names_cpu(const struct cpu_str *uni1, const struct le_str *uni2,
+ const u16 *upcase, bool bothcase)
+{
+ const u16 *s1 = uni1->name;
+ const __le16 *s2 = uni2->name;
+ size_t l1 = uni1->len;
+ size_t l2 = uni2->len;
+ size_t len = min(l1, l2);
+ int diff1 = 0;
+ int diff2;
+
+ if (!bothcase && upcase)
+ goto case_insentive;
+
+ for (; len; s1++, s2++, len--) {
+ diff1 = *s1 - le16_to_cpu(*s2);
+ if (diff1) {
+ if (bothcase && upcase)
+ goto case_insentive;
+
+ return diff1;
+ }
+ }
+ return l1 - l2;
+
+case_insentive:
+ for (; len; s1++, s2++, len--) {
+ diff2 = upcase_unicode_char(upcase, *s1) -
+ upcase_unicode_char(upcase, le16_to_cpu(*s2));
+ if (diff2)
+ return diff2;
+ }
+
+ diff2 = l1 - l2;
+ return diff2 ? diff2 : diff1;
+}
--
2.30.0
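For contrast with the single-pass ntfs_cmp_names() above, here is a minimal
sketch of the 'straight way' its comment describes — a hypothetical two-pass
helper, not part of the patch, reusing upcase_unicode_char() from the same
file:

static int cmp_names_two_pass(const __le16 *s1, size_t l1,
			      const __le16 *s2, size_t l2,
			      const u16 *upcase, bool bothcase)
{
	size_t i, len = min(l1, l2);
	int diff;

	/* pass 1: case-insensitive comparison via the upcase table */
	for (i = 0; i < len; i++) {
		diff = upcase_unicode_char(upcase, le16_to_cpu(s1[i])) -
		       upcase_unicode_char(upcase, le16_to_cpu(s2[i]));
		if (diff)
			return diff;
	}
	if (l1 != l2 || !bothcase)
		return l1 - l2;

	/* pass 2: equal case-insensitively, so re-scan case-sensitively */
	for (i = 0; i < len; i++) {
		diff = le16_to_cpu(s1[i]) - le16_to_cpu(s2[i]);
		if (diff)
			return diff;
	}
	return 0;
}

The patch's version avoids the second scan by comparing case-sensitively as
it goes and switching to the case-insensitive loop from the first mismatch,
keeping the case-sensitive result as a tiebreak.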
Hi,
Thanks for your patch.
Please subscribe to the kernel(a)openeuler.org and kernel-discuss(a)openeuler.org mailing lists before sending
patches; otherwise your patches will be blocked and others will not receive your email.
Please resend this patch.
On 2021/12/13 15:11, Jiao Fenfang wrote:
> From: JiaoFenfang <jiaofenfang(a)uniontech.com>
>
> openEuler inclusion
> category: feature
> bugzilla: https://gitee.com/openeuler/kernel/issues/I4KVP7?from=project-issue
>
Add commit message here.
> Signed-off-by: Jiao Fenfang <jiaofenfang(a)uniontech.com>
> ---
> arch/x86/configs/openeuler_defconfig | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
Also enable it on arm64 platform?
> diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
> index 7b608301823c..01a902eba3b9 100644
> --- a/arch/x86/configs/openeuler_defconfig
> +++ b/arch/x86/configs/openeuler_defconfig
> @@ -7451,7 +7451,13 @@ CONFIG_XFS_POSIX_ACL=y
> CONFIG_GFS2_FS=m
> CONFIG_GFS2_FS_LOCKING_DLM=y
> # CONFIG_OCFS2_FS is not set
> -# CONFIG_BTRFS_FS is not set
> +CONFIG_BTRFS_FS=m
> +# CONFIG_BTRFS_FS_POSIX_ACL is not set
> +# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
> +# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
> +# CONFIG_BTRFS_DEBUG is not set
> +# CONFIG_BTRFS_ASSERT is not set
> +# CONFIG_BTRFS_FS_REF_VERIFY is not set
> # CONFIG_NILFS2_FS is not set
> # CONFIG_F2FS_FS is not set
> # CONFIG_ZONEFS_FS is not set
>
13 Dec '21
From: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
mainline inclusion
from mainline-v5.13-rc4
commit 63e39d29b3da02e901349f6cd71159818a4737a6
category: bugfix
bugzilla: NA
CVE: CVE-2021-33098
-----------------------------------------------
Check that the MTU value requested by the VF is in the supported
range of MTUs before attempting to set the VF large packet enable,
otherwise reject the request. This also avoids unnecessary
register updates in the case of the 82599 controller.
Fixes: 872844ddb9e4 ("ixgbe: Enable jumbo frames support w/ SR-IOV")
Co-developed-by: Piotr Skajewski <piotrx.skajewski(a)intel.com>
Signed-off-by: Piotr Skajewski <piotrx.skajewski(a)intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Co-developed-by: Mateusz Palczewski <mateusz.palczewski(a)intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski(a)intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Huang Guobin <huangguobin4(a)huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index f6ffd9fb20793..8aaf856771d7b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -467,12 +467,16 @@ static int ixgbe_set_vf_vlan(struct ixgbe_adapter *adapter, int add, int vid,
return err;
}
-static s32 ixgbe_set_vf_lpe(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
+static int ixgbe_set_vf_lpe(struct ixgbe_adapter *adapter, u32 max_frame, u32 vf)
{
struct ixgbe_hw *hw = &adapter->hw;
- int max_frame = msgbuf[1];
u32 max_frs;
+ if (max_frame < ETH_MIN_MTU || max_frame > IXGBE_MAX_JUMBO_FRAME_SIZE) {
+ e_err(drv, "VF max_frame %d out of range\n", max_frame);
+ return -EINVAL;
+ }
+
/*
* For 82599EB we have to keep all PFs and VFs operating with
* the same max_frame value in order to avoid sending an oversize
@@ -532,12 +536,6 @@ static s32 ixgbe_set_vf_lpe(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
}
}
- /* MTU < 68 is an error and causes problems on some kernels */
- if (max_frame > IXGBE_MAX_JUMBO_FRAME_SIZE) {
- e_err(drv, "VF max_frame %d out of range\n", max_frame);
- return -EINVAL;
- }
-
/* pull current max frame size from hardware */
max_frs = IXGBE_READ_REG(hw, IXGBE_MAXFRS);
max_frs &= IXGBE_MHADD_MFS_MASK;
@@ -1240,7 +1238,7 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
retval = ixgbe_set_vf_vlan_msg(adapter, msgbuf, vf);
break;
case IXGBE_VF_SET_LPE:
- retval = ixgbe_set_vf_lpe(adapter, msgbuf, vf);
+ retval = ixgbe_set_vf_lpe(adapter, msgbuf[1], vf);
break;
case IXGBE_VF_SET_MACVLAN:
retval = ixgbe_set_vf_macvlan_msg(adapter, msgbuf, vf);
--
2.25.1
1
0
Hi,
Thanks for your patch.
Please subscribe to the kernel(a)openeuler.org and kernel-discuss(a)openeuler.org mailing lists before sending
patches; otherwise your patches will be blocked.
Please resend this patch and specify the right branch.
You are welcome to start a topic at the openEuler kernel SIG meeting by sending mail to kernel-discuss(a)openeuler.org.
---
You can join the openEuler kernel SIG WeChat group via the openEuler kernel assistant:
https://openeuler.gitee.io/kernel-portal/img/wechat/openEuler_kernel_helper…
On 2021/12/12 18:14, Jiacheng Shi wrote:
> From: billsjchw <billsjc(a)126.com>
>
> Variables allocated by kvmalloc should not be freed by kfree.
> Because they may be allocated by vmalloc.
> So we replace kfree with kvfree here.
>
> Signed-off-by: Jiacheng Shi <billsjc(a)sjtu.edu.cn>
> ---
> drivers/vfio/vfio_iommu_type1.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 5daceec48811..6811a85109aa 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1150,7 +1150,7 @@ static int vfio_iova_dirty_log_clear(u64 __user *bitmap,
> }
>
> out:
> - kfree(bitmap_buffer);
> + kvfree(bitmap_buffer);
> return ret;
> }
>
>
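For background, the rule the quoted patch enforces: kvmalloc() may satisfy an
allocation from the slab or, for larger sizes, via vmalloc(), so the matching
release must always be kvfree(). A minimal self-contained sketch (hypothetical
bitmap user, not code from this driver):

#include <linux/mm.h>		/* kvmalloc, kvfree */
#include <linux/bitops.h>	/* BITS_TO_LONGS */

static int process_bitmap(size_t nbits)
{
	/* slab-backed for small sizes, vmalloc-backed for large ones */
	unsigned long *bitmap = kvmalloc(BITS_TO_LONGS(nbits) * sizeof(long),
					 GFP_KERNEL);

	if (!bitmap)
		return -ENOMEM;

	/* ... fill and consume the bitmap ... */

	kvfree(bitmap);	/* correct for both backings; kfree() is not */
	return 0;
}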
12 Dec '21
stable inclusion
category: bugfix
bugzilla: NA
CVE: NA
Line 6191 (#1) allocates a memory chunk for input with kmalloc().
Line 6213 (#3) frees the input before the function returns, while
line 6199 (#2) forgets to free it, which leads to a memory leak.
This bug affects all stable versions from 5.15.1 to 5.15.7.
We should kfree() input at line 6199 (#2).
6186 static int rtw_mp_QueryDrv(struct net_device *dev,
6187 struct iw_request_info *info,
6188 union iwreq_data *wrqu, char *extra)
6189 {
6191 char *input = kmalloc(wrqu->data.length, GFP_KERNEL);
// #1: kmalloc space
6195 if (!input)
6196 return -ENOMEM;
6198 if (copy_from_user(input, wrqu->data.pointer, wrqu->data.length))
6199 return -EFAULT; // #2: missing kfree
6213 kfree(input); // #3: kfree space
6214 return 0;
6215 }
Signed-off-by: Jianglei Nie <niejianglei2021(a)163.com>
---
drivers/staging/r8188eu/os_dep/ioctl_linux.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/r8188eu/os_dep/ioctl_linux.c b/drivers/staging/r8188eu/os_dep/ioctl_linux.c
index 906a57eae1af..edc660f15436 100644
--- a/drivers/staging/r8188eu/os_dep/ioctl_linux.c
+++ b/drivers/staging/r8188eu/os_dep/ioctl_linux.c
@@ -6195,8 +6195,11 @@ static int rtw_mp_QueryDrv(struct net_device *dev,
if (!input)
return -ENOMEM;
- if (copy_from_user(input, wrqu->data.pointer, wrqu->data.length))
- return -EFAULT;
+ if (copy_from_user(input, wrqu->data.pointer, wrqu->data.length)) {
+ kfree(input);
+ return -EFAULT;
+ }
+
DBG_88E("%s:iwpriv in =%s\n", __func__, input);
qAutoLoad = strncmp(input, "autoload", 8); /* strncmp true is 0 */
--
2.25.1
12 Dec '21
stable inclusion
category: bugfix
bugzilla: NA
CVE: NA
Line 5968 (#1) allocates a memory chunk for input with kmalloc().
Line 5973 (#2), line 5989 (#4) and line 5994 (#5) free the input
before the function returns, while line 5986 (#3) forgets to free it,
which leads to a memory leak. This bug affects all stable
versions from 5.15 to 5.15.7.
We should kfree() input at line 5986 (#3).
5960 static int rtw_mp_pwrtrk(struct net_device *dev,
5961 struct iw_request_info *info,
5962 struct iw_point *wrqu, char *extra)
5963 {
5968 char *input = kmalloc(wrqu->length, GFP_KERNEL);
// #1: kmalloc space
5970 if (!input)
5971 return -ENOMEM;
5972 if (copy_from_user(input, wrqu->pointer, wrqu->length)) {
5973 kfree(input); // #2: kfree space
5974 return -EFAULT;
5975 }
5980 if (strncmp(input, "stop", 4) == 0) {
5981 enable = 0;
5982 sprintf(extra, "mp tx power tracking stop");
5983 } else if (sscanf(input, "ther =%d", &thermal)) {
5984 ret = Hal_SetThermalMeter(padapter, (u8)thermal);
5985 if (ret == _FAIL)
5986 return -EPERM; // #3: missing kfree
5987 sprintf(extra, "mp tx power tracking start,
target value =%d ok ", thermal);
5988 } else {
5989 kfree(input); // #4: kfree space
5990 return -EINVAL;
5991 }
5994 kfree(input); // #5: kfree space
6000 return 0;
6001 }
Signed-off-by: Jianglei Nie <niejianglei2021(a)163.com>
---
drivers/staging/r8188eu/os_dep/ioctl_linux.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/staging/r8188eu/os_dep/ioctl_linux.c b/drivers/staging/r8188eu/os_dep/ioctl_linux.c
index 0eccce57c63a..906a57eae1af 100644
--- a/drivers/staging/r8188eu/os_dep/ioctl_linux.c
+++ b/drivers/staging/r8188eu/os_dep/ioctl_linux.c
@@ -5982,8 +5982,10 @@ static int rtw_mp_pwrtrk(struct net_device *dev,
sprintf(extra, "mp tx power tracking stop");
} else if (sscanf(input, "ther =%d", &thermal)) {
ret = Hal_SetThermalMeter(padapter, (u8)thermal);
- if (ret == _FAIL)
+ if (ret == _FAIL) {
+ kfree(input);
return -EPERM;
+ }
sprintf(extra, "mp tx power tracking start, target value =%d ok ", thermal);
} else {
kfree(input);
--
2.25.1
[PATCH master] powercap: DTPM: Fix reference leak in cpuhp_dtpm_cpu_offline()
by Jianglei Nie 12 Dec '21
stable inclusion
category: bugfix
bugzilla: NA
CVE: NA
At line 153 (#1), cpufreq_cpu_get() increments the kobject reference
counter of the policy it returns on success. According to the
documentation, the policy returned by cpufreq_cpu_get() has to be
released with cpufreq_cpu_put() to balance its kobject reference
counter properly. Forgetting the cpufreq_cpu_put() operation
results in a reference leak. This bug affects all stable versions
from v5.15 to v5.15.7.
We can fix it by calling cpufreq_cpu_put() before the function
returns (#2, #3 and #4).
147 static int cpuhp_dtpm_cpu_offline(unsigned int cpu)
148 {
153 policy = cpufreq_cpu_get(cpu);
// #1: reference increment
155 if (!policy)
156 return 0;
158 pd = em_cpu_get(cpu);
159 if (!pd)
160 return -EINVAL; // #2: missing reference decrement
166 if (cpumask_weight(policy->cpus) != 1)
167 return 0; // #3: missing reference decrement
174 return 0; // #4: missing reference decrement
175 }
Signed-off-by: Jianglei Nie <niejianglei2021(a)163.com>
---
drivers/powercap/dtpm_cpu.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
index 51c366938acd..6c94515b21ef 100644
--- a/drivers/powercap/dtpm_cpu.c
+++ b/drivers/powercap/dtpm_cpu.c
@@ -156,21 +156,25 @@ static int cpuhp_dtpm_cpu_offline(unsigned int cpu)
return 0;
pd = em_cpu_get(cpu);
- if (!pd)
+ if (!pd) {
+ cpufreq_cpu_put(policy);
return -EINVAL;
+ }
dtpm = per_cpu(dtpm_per_cpu, cpu);
power_sub(dtpm, pd);
- if (cpumask_weight(policy->cpus) != 1)
+ if (cpumask_weight(policy->cpus) != 1) {
+ cpufreq_cpu_put(policy);
return 0;
+ }
for_each_cpu(cpu, policy->related_cpus)
per_cpu(dtpm_per_cpu, cpu) = NULL;
dtpm_unregister(dtpm);
-
+ cpufreq_cpu_put(policy);
return 0;
}
--
2.25.1
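As a side note, the balance this fix restores can be shown in a minimal
sketch — a hypothetical inspection function, not the dtpm code itself —
where every successful cpufreq_cpu_get() is matched by cpufreq_cpu_put()
on each exit path:

#include <linux/cpufreq.h>

static int inspect_policy(unsigned int cpu)
{
	struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
	int ret = 0;

	if (!policy)
		return 0;	/* no reference taken, nothing to put */

	if (cpumask_weight(policy->cpus) != 1)
		ret = -EBUSY;	/* hypothetical error condition */

	cpufreq_cpu_put(policy);	/* balances the get on every path */
	return ret;
}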
[PATCH master] security:trusted_tpm2: Fix memory leak in tpm2_key_encode()
by Jianglei Nie 12 Dec '21
openEuler inclusion
category: bugfix
bugzilla: NA
CVE: NA
Line 36 (#1) allocates a memory chunk for scratch with kmalloc(), but
it is never freed anywhere in the function, which leads to a memory
leak.
We should kfree() scratch before the function returns (#2, #3 and #4).
31 static int tpm2_key_encode(struct trusted_key_payload *payload,
32 struct trusted_key_options *options,
33 u8 *src, u32 len)
34 {
36 u8 *scratch = kmalloc(SCRATCH_SIZE, GFP_KERNEL);
// #1: kmalloc space
37 u8 *work = scratch, *work1;
50 if (!scratch)
51 return -ENOMEM;
56 if (options->blobauth_len == 0) {
60 if (WARN(IS_ERR(w), "BUG: Boolean failed to encode"))
61 return PTR_ERR(w); // #2: missing kfree
63 }
71 if (WARN(work - scratch + pub_len + priv_len + 14 > SCRATCH_SIZE,
72 "BUG: scratch buffer is too small"))
73 return -EINVAL; // #3: missing kfree
// #4: missing kfree: scratch is never used afterwards.
82 if (WARN(IS_ERR(work1), "BUG: ASN.1 encoder failed"))
83 return PTR_ERR(work1);
85 return work1 - payload->blob;
86 }
Signed-off-by: Jianglei Nie <niejianglei2021(a)163.com>
---
security/keys/trusted-keys/trusted_tpm2.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/security/keys/trusted-keys/trusted_tpm2.c b/security/keys/trusted-keys/trusted_tpm2.c
index 0165da386289..3408a74c855f 100644
--- a/security/keys/trusted-keys/trusted_tpm2.c
+++ b/security/keys/trusted-keys/trusted_tpm2.c
@@ -57,8 +57,10 @@ static int tpm2_key_encode(struct trusted_key_payload *payload,
unsigned char bool[3], *w = bool;
/* tag 0 is emptyAuth */
w = asn1_encode_boolean(w, w + sizeof(bool), true);
- if (WARN(IS_ERR(w), "BUG: Boolean failed to encode"))
+ if (WARN(IS_ERR(w), "BUG: Boolean failed to encode")) {
+ kfree(scratch);
return PTR_ERR(w);
+ }
work = asn1_encode_tag(work, end_work, 0, bool, w - bool);
}
@@ -69,9 +71,12 @@ static int tpm2_key_encode(struct trusted_key_payload *payload,
* trigger, so if it does there's something nefarious going on
*/
if (WARN(work - scratch + pub_len + priv_len + 14 > SCRATCH_SIZE,
- "BUG: scratch buffer is too small"))
+ "BUG: scratch buffer is too small")) {
+ kfree(scratch);
return -EINVAL;
+ }
+ kfree(scratch);
work = asn1_encode_integer(work, end_work, options->keyhandle);
work = asn1_encode_octet_string(work, end_work, pub, pub_len);
work = asn1_encode_octet_string(work, end_work, priv, priv_len);
--
2.25.1
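As a general observation, several leak fixes in this series add a kfree() to
each early-return path individually; kernel code often funnels all failure
paths through a single unwind label instead, so the buffer is freed in
exactly one place after its last use. A minimal sketch with hypothetical
names (encode_blob(), build_payload(), copy_encoded() and SCRATCH_SIZE are
illustrative, not from any of the patches):

#include <linux/slab.h>

#define SCRATCH_SIZE 4096	/* hypothetical */

/* hypothetical helpers, for illustration only */
bool build_payload(u8 *buf, size_t len);
bool copy_encoded(u8 *dst, size_t dst_len, const u8 *src);

static int encode_blob(u8 *dst, size_t dst_len)
{
	u8 *scratch = kmalloc(SCRATCH_SIZE, GFP_KERNEL);
	int ret = 0;

	if (!scratch)
		return -ENOMEM;

	if (!build_payload(scratch, SCRATCH_SIZE)) {
		ret = -EINVAL;
		goto out;	/* every failure path funnels through one kfree */
	}

	if (!copy_encoded(dst, dst_len, scratch)) {
		ret = -EIO;
		goto out;
	}

out:
	kfree(scratch);	/* freed once, after the last use of scratch */
	return ret;
}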
openEuler inclusion
category: bugfix
bugzilla: NA
CVE: NA
Line 1169 (#3) allocates a memory chunk for victim_name with kmalloc(),
but when the function returns at line 1184 (#4), the victim_name
allocated at line 1169 (#3) is not freed, which leads to a memory leak.
A similar snippet of code earlier in this function allocates a memory
chunk for victim_name at line 1104 (#1) and releases the memory
at line 1116 (#2).
We should kfree() victim_name when the return value of backref_in_log()
is less than zero, before the function returns at line 1184 (#4).
1057 static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
1058 struct btrfs_root *root,
1059 struct btrfs_path *path,
1060 struct btrfs_root *log_root,
1061 struct btrfs_inode *dir,
1062 struct btrfs_inode *inode,
1063 u64 inode_objectid, u64 parent_objectid,
1064 u64 ref_index, char *name, int namelen,
1065 int *search_done)
1066 {
1104 victim_name = kmalloc(victim_name_len, GFP_NOFS);
// #1: kmalloc (victim_name-1)
1105 if (!victim_name)
1106 return -ENOMEM;
1112 ret = backref_in_log(log_root, &search_key,
1113 parent_objectid, victim_name,
1114 victim_name_len);
1115 if (ret < 0) {
1116 kfree(victim_name); // #2: kfree (victim_name-1)
1117 return ret;
1118 } else if (!ret) {
1169 victim_name = kmalloc(victim_name_len, GFP_NOFS);
// #3: kmalloc (victim_name-2)
1170 if (!victim_name)
1171 return -ENOMEM;
1180 ret = backref_in_log(log_root, &search_key,
1181 parent_objectid, victim_name,
1182 victim_name_len);
1183 if (ret < 0) {
1184 return ret; // #4: missing kfree (victim_name-2)
1185 } else if (!ret) {
1241 return 0;
1242 }
Signed-off-by: Jianglei Nie <niejianglei2021(a)163.com>
---
fs/btrfs/tree-log.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 8ab33caf016f..d373fec55521 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1181,6 +1181,7 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
parent_objectid, victim_name,
victim_name_len);
if (ret < 0) {
+ kfree(victim_name);
return ret;
} else if (!ret) {
ret = -ENOENT;
--
2.25.1
12 Dec '21
openEuler inclusion
category: bugfix
bugzilla: NA
CVE: NA
Line 715 (#1) calls mempool_alloc() to allocate an element
from a specific memory pool. When an error occurs, line 727 (#2)
returns the element to the pool, but the path through line 731 (#3)
does not, which leads to a memory leak.
We can fix it by calling mempool_free() when cbfn is not
NULL, before the function returns 0 at line 732 (#3).
705 static int
706 csio_wr_eq_destroy(struct csio_hw *hw, void *priv, int eq_idx,
707 void (*cbfn) (struct csio_hw *, struct csio_mb *))
708 {
710 struct csio_mb *mbp;
715 mbp = mempool_alloc(hw->mb_mempool, GFP_ATOMIC);
// #1: allocate memory pool
716 if (!mbp)
717 return -ENOMEM;
725 rv = csio_mb_issue(hw, mbp);
726 if (rv != 0) {
727 mempool_free(mbp, hw->mb_mempool);
// #2: free memory pool
728 return rv;
729 }
731 if (cbfn != NULL)
732 return 0; // #3: missing free
734 return csio_wr_eq_destroy_rsp(hw, mbp, eq_idx);
735 }
Signed-off-by: Jianglei Nie <niejianglei2021(a)163.com>
---
drivers/scsi/csiostor/csio_wr.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/csiostor/csio_wr.c b/drivers/scsi/csiostor/csio_wr.c
index fe0355c964bc..7dcc4fda0483 100644
--- a/drivers/scsi/csiostor/csio_wr.c
+++ b/drivers/scsi/csiostor/csio_wr.c
@@ -728,8 +728,10 @@ csio_wr_eq_destroy(struct csio_hw *hw, void *priv, int eq_idx,
return rv;
}
- if (cbfn != NULL)
+ if (cbfn != NULL) {
+ mempool_free(mbp, hw->mb_mempool);
return 0;
+ }
return csio_wr_eq_destroy_rsp(hw, mbp, eq_idx);
}
--
2.25.1
[PATCH master] drm/amdgpu: Fix reference leak in psp_xgmi_reflect_topology_info()
by Jianglei Nie 12 Dec '21
openEuler inclusion
category: bugfix
bugzilla: NA
CVE: NA
At line 1138 (#1), amdgpu_get_xgmi_hive() increases the kobject reference
counter of the hive it returns. The hive returned by
amdgpu_get_xgmi_hive() should be released with
amdgpu_put_xgmi_hive() to balance its kobject reference counter properly.
Forgetting the amdgpu_put_xgmi_hive() operation results in a
reference leak.
We can fix it by calling amdgpu_put_xgmi_hive() before the end of the
function (#2).
1128 static void psp_xgmi_reflect_topology_info(struct psp_context *psp,
1129 struct psp_xgmi_node_info node_info)
1130 {
1138 hive = amdgpu_get_xgmi_hive(psp->adev);
// #1: kzalloc space reference increment
1139 list_for_each_entry(mirror_adev, &hive->device_list, gmc.xgmi.head) {
1140 struct psp_xgmi_topology_info *mirror_top_info;
1141 int j;
1143 if (mirror_adev->gmc.xgmi.node_id != dst_node_id)
1144 continue;
1146 mirror_top_info = &mirror_adev->psp.xgmi_context.top_info;
1147 for (j = 0; j < mirror_top_info->num_nodes; j++) {
1148 if (mirror_top_info->nodes[j].node_id != src_node_id)
1149 continue;
1151 mirror_top_info->nodes[j].num_hops = dst_num_hops;
1157 if (dst_num_links)
1158 mirror_top_info->nodes[j].num_links = dst_num_links;
1160 break;
1161 }
1163 break;
1164 }
// #2: missing reference decrement
1165 }
Signed-off-by: Jianglei Nie <niejianglei2021(a)163.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index c641f84649d6..f6362047ed71 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1162,6 +1162,7 @@ static void psp_xgmi_reflect_topology_info(struct psp_context *psp,
break;
}
+ amdgpu_put_xgmi_hive(hive);
}
int psp_xgmi_get_topology_info(struct psp_context *psp,
--
2.25.1
12 Dec '21
mainline inclusion
from mainline-v5.16-rc7
category: bugfix
commit: c56c96303e9289cc34716b1179597b6f470833de
bugzilla: NA
CVE: NA
------------------------
At line 800 (#1), nfp_cpp_area_alloc() allocates and initializes a
CPP area structure. But at line 807 (#2), when the cache allocation
fails, this CPP area structure is not freed, which results in a
memory leak.
We can fix it by freeing the CPP area when the cache allocation
fails (#2).
792 int nfp_cpp_area_cache_add(struct nfp_cpp *cpp, size_t size)
793 {
794 struct nfp_cpp_area_cache *cache;
795 struct nfp_cpp_area *area;
800 area = nfp_cpp_area_alloc(cpp, NFP_CPP_ID(7, NFP_CPP_ACTION_RW, 0),
801 0, size);
// #1: allocates and initializes
802 if (!area)
803 return -ENOMEM;
805 cache = kzalloc(sizeof(*cache), GFP_KERNEL);
806 if (!cache)
807 return -ENOMEM; // #2: missing free
817 return 0;
818 }
Fixes: 4cb584e0ee7d ("nfp: add CPP access core")
Signed-off-by: Jianglei Nie <niejianglei2021(a)163.com>
Acked-by: Simon Horman <simon.horman(a)corigine.com>
Link: https://lore.kernel.org/r/20211209061511.122535-1-niejianglei2021@163.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Jianglei Nie <niejianglei2021(a)163.com>
---
drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
index d7ac0307797f..34c0d2ddf9ef 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
@@ -803,8 +803,10 @@ int nfp_cpp_area_cache_add(struct nfp_cpp *cpp, size_t size)
return -ENOMEM;
cache = kzalloc(sizeof(*cache), GFP_KERNEL);
- if (!cache)
+ if (!cache) {
+ nfp_cpp_area_free(area);
return -ENOMEM;
+ }
cache->id = 0;
cache->addr = 0;
--
2.25.1
Hi,
Thanks for your patchset.
Please subscribe to the kernel(a)openeuler.org and kernel-discuss(a)openeuler.org mailing lists before sending
patches; otherwise your patches will be blocked.
You are welcome to start a topic at the openEuler kernel SIG meeting by sending mail to kernel-discuss(a)openeuler.org.
---
You can join the openEuler kernel SIG WeChat group via the openEuler kernel assistant:
https://openeuler.gitee.io/kernel-portal/img/wechat/openEuler_kernel_helper…
On 2021/12/10 16:09, Jianglei Nie wrote:
> Line 1169 (#3) allocates a memory chunk for victim_name with kmalloc(),
> but when the function returns at line 1184 (#4), the victim_name
> allocated at line 1169 (#3) is not freed, which leads to a memory leak.
> A similar snippet of code earlier in this function allocates a memory
> chunk for victim_name at line 1104 (#1) and releases the memory
> at line 1116 (#2).
>
> We should kfree() victim_name when the return value of backref_in_log()
> is less than zero, before the function returns at line 1184 (#4).
>
> 1057 static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
> 1058 struct btrfs_root *root,
> 1059 struct btrfs_path *path,
> 1060 struct btrfs_root *log_root,
> 1061 struct btrfs_inode *dir,
> 1062 struct btrfs_inode *inode,
> 1063 u64 inode_objectid, u64 parent_objectid,
> 1064 u64 ref_index, char *name, int namelen,
> 1065 int *search_done)
> 1066 {
>
> 1104 victim_name = kmalloc(victim_name_len, GFP_NOFS);
> // #1: kmalloc (victim_name-1)
> 1105 if (!victim_name)
> 1106 return -ENOMEM;
>
> 1112 ret = backref_in_log(log_root, &search_key,
> 1113 parent_objectid, victim_name,
> 1114 victim_name_len);
> 1115 if (ret < 0) {
> 1116 kfree(victim_name); // #2: kfree (victim_name-1)
> 1117 return ret;
> 1118 } else if (!ret) {
>
> 1169 victim_name = kmalloc(victim_name_len, GFP_NOFS);
> // #3: kmalloc (victim_name-2)
> 1170 if (!victim_name)
> 1171 return -ENOMEM;
>
> 1180 ret = backref_in_log(log_root, &search_key,
> 1181 parent_objectid, victim_name,
> 1182 victim_name_len);
> 1183 if (ret < 0) {
> 1184 return ret; // #4: missing kfree (victim_name-2)
> 1185 } else if (!ret) {
>
> 1241 return 0;
> 1242 }
>
> Signed-off-by: Jianglei Nie <niejianglei2021(a)163.com>
> ---
> fs/btrfs/tree-log.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 8ab33caf016f..d373fec55521 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -1181,6 +1181,7 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
> parent_objectid, victim_name,
> victim_name_len);
> if (ret < 0) {
> + kfree(victim_name);
> return ret;
> } else if (!ret) {
> ret = -ENOENT;
>
[PATCH openEuler-5.10 1/7] bfq: Remove merged request already in bfq_requests_merged()
by Zheng Zengkai 10 Dec '21
From: Jan Kara <jack(a)suse.cz>
mainline inclusion
from mainline-v5.14-rc1
commit a921c655f2033dd1ce1379128efe881dda23ea37
category: bugfix
bugzilla: 185777 https://gitee.com/openeuler/kernel/issues/I4LM14
CVE: NA
---------------------------
Currently, bfq does very little in bfq_requests_merged() and handles all
the request cleanup in bfq_finish_requeue_request() called from
blk_mq_free_request(). That is currently safe only because
blk_mq_free_request() is called shortly after bfq_requests_merged()
while bfqd->lock is still held. However to fix a lock inversion between
bfqd->lock and ioc->lock, we need to call blk_mq_free_request() after
dropping bfqd->lock. That would mean that already merged request could
be seen by other processes inside bfq queues and possibly dispatched to
the device which is wrong. So move cleanup of the request from
bfq_finish_requeue_request() to bfq_requests_merged().
Acked-by: Paolo Valente <paolo.valente(a)linaro.org>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20210623093634.27879-2-jack@suse.cz
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
conflict: in bfq_finish_requeue_request(), the hulk branch has the line
atomic_dec(&rq->mq_hctx->elevator_queued); which conflicts;
Signed-off-by: zhangwensheng <zhangwensheng5(a)huawei.com>
Reviewed-by: qiulaibin <qiulaibin(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
block/bfq-iosched.c | 41 +++++++++++++----------------------------
1 file changed, 13 insertions(+), 28 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index fd3c23d516b8..27e01b4cd528 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2326,7 +2326,7 @@ static void bfq_requests_merged(struct request_queue *q, struct request *rq,
*next_bfqq = bfq_init_rq(next);
if (!bfqq)
- return;
+ goto remove;
/*
* If next and rq belong to the same bfq_queue and next is older
@@ -2349,6 +2349,14 @@ static void bfq_requests_merged(struct request_queue *q, struct request *rq,
bfqq->next_rq = rq;
bfqg_stats_update_io_merged(bfqq_group(bfqq), next->cmd_flags);
+remove:
+ /* Merged request may be in the IO scheduler. Remove it. */
+ if (!RB_EMPTY_NODE(&next->rb_node)) {
+ bfq_remove_request(next->q, next);
+ if (next_bfqq)
+ bfqg_stats_update_io_remove(bfqq_group(next_bfqq),
+ next->cmd_flags);
+ }
}
/* Must be called with bfqq != NULL */
@@ -5901,6 +5909,7 @@ static void bfq_finish_requeue_request(struct request *rq)
{
struct bfq_queue *bfqq = RQ_BFQQ(rq);
struct bfq_data *bfqd;
+ unsigned long flags;
/*
* rq either is not associated with any icq, or is an already
@@ -5918,40 +5927,16 @@ static void bfq_finish_requeue_request(struct request *rq)
rq->io_start_time_ns,
rq->cmd_flags);
+ spin_lock_irqsave(&bfqd->lock, flags);
if (likely(rq->rq_flags & RQF_STARTED)) {
- unsigned long flags;
-
- spin_lock_irqsave(&bfqd->lock, flags);
-
if (rq == bfqd->waited_rq)
bfq_update_inject_limit(bfqd, bfqq);
bfq_completed_request(bfqq, bfqd);
- bfq_finish_requeue_request_body(bfqq);
atomic_dec(&rq->mq_hctx->elevator_queued);
-
- spin_unlock_irqrestore(&bfqd->lock, flags);
- } else {
- /*
- * Request rq may be still/already in the scheduler,
- * in which case we need to remove it (this should
- * never happen in case of requeue). And we cannot
- * defer such a check and removal, to avoid
- * inconsistencies in the time interval from the end
- * of this function to the start of the deferred work.
- * This situation seems to occur only in process
- * context, as a consequence of a merge. In the
- * current version of the code, this implies that the
- * lock is held.
- */
-
- if (!RB_EMPTY_NODE(&rq->rb_node)) {
- bfq_remove_request(rq->q, rq);
- bfqg_stats_update_io_remove(bfqq_group(bfqq),
- rq->cmd_flags);
- }
- bfq_finish_requeue_request_body(bfqq);
}
+ bfq_finish_requeue_request_body(bfqq);
+ spin_unlock_irqrestore(&bfqd->lock, flags);
/*
* Reset private fields. In case of a requeue, this allows
--
2.20.1
10 Dec '21
Addresses printed with %p in the kernel expose kernel address information, which is unsafe.
Since Linux v4.15, %p has therefore printed a hashed value instead of the real address.
This patchset adds the no_hash_pointers startup parameter, which disables that restriction so that %p prints the actual kernel address.
I applied these patches together with the associated test modules, recompiled, and all the tests passed.
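For illustration, a minimal throwaway test module sketching the behavior
described above (assumes the patchset is applied for the no_hash_pointers
case; %px is the standard specifier that always prints the raw address,
while %p prints a hash by default):

#include <linux/module.h>
#include <linux/printk.h>

static int demo;

static int __init ptr_demo_init(void)
{
	/* hashed by default; raw when booted with no_hash_pointers */
	pr_info("via %%p:  %p\n", &demo);
	/* always the raw address -- use only for debugging */
	pr_info("via %%px: %px\n", &demo);
	return 0;
}

static void __exit ptr_demo_exit(void)
{
}

module_init(ptr_demo_init);
module_exit(ptr_demo_exit);
MODULE_LICENSE("GPL");

Booting with 'no_hash_pointers' on the kernel command line makes the %p
line match the %px line.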
Tobin C. Harding (3):
lib/test_printf: Add empty module_exit function
kselftest: Add test module framework header
lib: Use new kselftest header
Timur Tabi (3):
kselftest: add support for skipped tests
lib/vsprintf: no_hash_pointers prints all addresses as unhashed
lib: use KSTM_MODULE_GLOBALS macro in kselftest drivers
.../admin-guide/kernel-parameters.txt | 15 +++
Documentation/dev-tools/kselftest.rst | 94 +++++++++++++++++-
lib/test_bitmap.c | 23 +----
lib/test_printf.c | 29 +++---
lib/vsprintf.c | 36 ++++++-
tools/testing/selftests/kselftest_module.h | 54 ++++++++++
6 files changed, 215 insertions(+), 36 deletions(-)
create mode 100644 tools/testing/selftests/kselftest_module.h
--
2.30.0
[PATCH openEuler-1.0-LTS] block, bfq: move bfqq to root_group if parent group is offlined
by Yang Yingliang 10 Dec '21
From: Yu Kuai <yukuai3(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 185863
CVE: NA
---------------------------
Our test report a uaf problem:
[ 154.237639] ==================================================================
[ 154.239896] BUG: KASAN: use-after-free in __bfq_deactivate_entity+0x25/0x290
[ 154.241910] Read of size 1 at addr ffff88824501f7b8 by task rmmod/2447
[ 154.244248] CPU: 7 PID: 2447 Comm: rmmod Not tainted 4.19.90+ #1
[ 154.245962] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 154.248184] Call Trace:
[ 154.248532] dump_stack+0x7a/0xac
[ 154.248995] print_address_description+0x6c/0x237
[ 154.249649] ? __bfq_deactivate_entity+0x25/0x290
[ 154.250297] kasan_report.cold+0x88/0x29c
[ 154.250853] __bfq_deactivate_entity+0x25/0x290
[ 154.251483] bfq_pd_offline+0x13e/0x790
[ 154.252017] ? blk_mq_freeze_queue_wait+0x165/0x180
[ 154.252687] ? bfq_reparent_leaf_entity+0xa0/0xa0
[ 154.253333] ? bfq_put_queue+0x12c/0x1e0
[ 154.253877] ? kmem_cache_free+0x8e/0x1e0
[ 154.254433] ? hrtimer_active+0x53/0xa0
[ 154.254966] ? hrtimer_try_to_cancel+0x6d/0x1c0
[ 154.255576] ? __hrtimer_get_remaining+0xf0/0xf0
[ 154.256197] ? __bfq_deactivate_entity+0x11b/0x290
[ 154.256843] blkcg_deactivate_policy+0x106/0x1f0
[ 154.257464] bfq_exit_queue+0xf1/0x110
[ 154.257975] blk_mq_exit_sched+0x114/0x140
[ 154.258530] elevator_exit+0x9a/0xa0
[ 154.259023] blk_exit_queue+0x3d/0x70
[ 154.259523] blk_cleanup_queue+0x160/0x1e0
[ 154.260099] null_del_dev+0xda/0x1f0 [null_blk]
[ 154.260723] null_exit+0x5f/0xab [null_blk]
[ 154.261298] __x64_sys_delete_module+0x20e/0x2f0
[ 154.261931] ? __ia32_sys_delete_module+0x2f0/0x2f0
[ 154.262597] ? exit_to_usermode_loop+0x45/0xe0
[ 154.263219] do_syscall_64+0x73/0x280
[ 154.263731] ? page_fault+0x8/0x30
[ 154.264197] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 154.264882] RIP: 0033:0x7f033bf63acb
[ 154.265370] Code: 73 01 c3 48 8b 0d bd 33 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00
00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8d 33 0c 00 f7 d8 64 89 01 48
[ 154.267880] RSP: 002b:00007ffc7fe52548 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 154.268900] RAX: ffffffffffffffda RBX: 00005583e2b8e530 RCX: 00007f033bf63acb
[ 154.269865] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005583e2b8e598
[ 154.270837] RBP: 00007ffc7fe525a8 R08: 0000000000000000 R09: 0000000000000000
[ 154.271802] R10: 00007f033bfd7ac0 R11: 0000000000000206 R12: 00007ffc7fe52770
[ 154.272763] R13: 00007ffc7fe536f8 R14: 00005583e2b8d2a0 R15: 00005583e2b8e530
[ 154.273939] Allocated by task 2350:
[ 154.274419] kasan_kmalloc+0xc6/0xe0
[ 154.274916] kmem_cache_alloc_node_trace+0x119/0x240
[ 154.275594] bfq_pd_alloc+0x50/0x510
[ 154.276081] blkg_alloc+0x237/0x310
[ 154.276557] blkg_create+0x48a/0x5e0
[ 154.277044] blkg_lookup_create+0x144/0x1c0
[ 154.277614] generic_make_request_checks+0x5cf/0xad0
[ 154.278290] generic_make_request+0xdd/0x6c0
[ 154.278877] submit_bio+0xaa/0x250
[ 154.279342] mpage_readpages+0x2a2/0x3b0
[ 154.279878] read_pages+0xdf/0x3a0
[ 154.280343] __do_page_cache_readahead+0x27c/0x2a0
[ 154.280989] ondemand_readahead+0x275/0x460
[ 154.281556] generic_file_read_iter+0xc4e/0x1790
[ 154.282182] aio_read+0x174/0x260
[ 154.282635] io_submit_one+0x7d4/0x14b0
[ 154.283164] __x64_sys_io_submit+0x102/0x230
[ 154.283749] do_syscall_64+0x73/0x280
[ 154.284250] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 154.285159] Freed by task 2315:
[ 154.285588] __kasan_slab_free+0x12f/0x180
[ 154.286150] kfree+0xab/0x1d0
[ 154.286561] blkg_free.part.0+0x4a/0xe0
[ 154.287089] rcu_process_callbacks+0x424/0x6d0
[ 154.287689] __do_softirq+0x10d/0x370
[ 154.288395] The buggy address belongs to the object at ffff88824501f700
which belongs to the cache kmalloc-2048 of size 2048
[ 154.290083] The buggy address is located 184 bytes inside of
2048-byte region [ffff88824501f700, ffff88824501ff00)
[ 154.291661] The buggy address belongs to the page:
[ 154.292306] page:ffffea0009140600 count:1 mapcount:0 mapping:ffff88824bc0e800 index:0x0 compound_mapcount: 0
[ 154.293610] flags: 0x17ffffc0008100(slab|head)
[ 154.294211] raw: 0017ffffc0008100 ffffea000896da00 0000000200000002 ffff88824bc0e800
[ 154.295247] raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000
[ 154.296294] page dumped because: kasan: bad access detected
[ 154.297261] Memory state around the buggy address:
[ 154.297913] ffff88824501f680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 154.298884] ffff88824501f700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 154.299858] >ffff88824501f780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 154.300824] ^
[ 154.301505] ffff88824501f800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 154.302479] ffff88824501f880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 154.303459] ==================================================================
This is because when a bfq_group is offlined, bfq_queues that are not in
the active tree keep parents (bfqq->entity.parent) that still point to the
offlined bfq_group. After some I/Os are issued to such bfq_queues,
the offlined bfq_group is reinserted into the service tree.
Fix the problem by moving the bfq_queue to root_group if its parent
is found to be offlined.
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
block/bfq-cgroup.c | 14 ++++++++++----
block/bfq-wf2q.c | 9 +++++++++
2 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 78cfd008b89d7..73b82a5c03717 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -556,6 +556,7 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
struct bfq_group *bfqg)
{
struct bfq_entity *entity = &bfqq->entity;
+ struct bfq_group *old_parent = bfqq_group(bfqq);
/*
* Get extra reference to prevent bfqq from being freed in
@@ -577,17 +578,21 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bfq_deactivate_bfqq(bfqd, bfqq, false, false);
else if (entity->on_st)
bfq_put_idle_entity(bfq_entity_service_tree(entity), entity);
- bfqg_and_blkg_put(bfqq_group(bfqq));
entity->parent = bfqg->my_entity;
entity->sched_data = &bfqg->sched_data;
/* pin down bfqg and its associated blkg */
bfqg_and_blkg_get(bfqg);
- if (bfq_bfqq_busy(bfqq)) {
- bfq_pos_tree_add_move(bfqd, bfqq);
+ /*
+ * Don't leave the bfqq->pos_root to old bfqg, since the ref to old
+ * bfqg will be released and the bfqg might be freed.
+ */
+ bfq_pos_tree_add_move(bfqd, bfqq);
+ bfqg_and_blkg_put(old_parent);
+
+ if (bfq_bfqq_busy(bfqq))
bfq_activate_bfqq(bfqd, bfqq);
- }
if (!bfqd->in_service_queue && !bfqd->rq_in_driver)
bfq_schedule_dispatch(bfqd);
@@ -860,6 +865,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
put_async_queues:
bfq_put_async_queues(bfqd, bfqg);
+ pd->plid = BLKCG_MAX_POLS;
spin_unlock_irqrestore(&bfqd->lock, flags);
/*
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index 316a1c2d1b610..e830715fe15d6 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -1684,6 +1684,15 @@ void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
*/
void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
{
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+ /* If parent group is offlined, move the bfqq to root group */
+ if (bfqq->entity.parent) {
+ struct bfq_group *bfqg = bfq_bfqq_to_bfqg(bfqq);
+
+ if (bfqg->pd.plid >= BLKCG_MAX_POLS)
+ bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
+ }
+#endif
bfq_log_bfqq(bfqd, bfqq, "add to busy");
bfq_activate_bfqq(bfqd, bfqq);
--
2.25.1
Hello!
The openEuler Kernel SIG invites you to a Zoom meeting (auto-recorded) to be held at 2021-12-17 14:00.
Subject: openEuler Kernel Tech Sharing No. 16 & biweekly meeting
Agenda:
14:10-15:30 openEuler Kernel Tech Sharing No. 16 - an introduction to openEuler kernel live patching
Meeting link: https://us06web.zoom.us/j/88618073121?pwd=V3JyZG5IOWRVWENieDJhNndpV2x5UT09
Note: You are advised to change your participant name after joining the meeting; you may also use your ID at gitee.com.
More information: https://openeuler.org/en/ (Chinese: https://openeuler.org/zh/)
[PATCH openEuler-1.0-LTS 1/2] io_uring: fix soft lockup when call __io_remove_buffers
by Yang Yingliang 09 Dec '21
From: Ye Bin <yebin10(a)huawei.com>
mainline inclusion
from mainline-v5.16-rc3
commit 1d0254e6b47e73222fd3d6ae95cccbaafe5b3ecf
category: bugfix
bugzilla: 185746
CVE: NA
-----------------------------------------------
I hit the following issue:
[ 567.094140] __io_remove_buffers: [1]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
[ 594.360799] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
[ 594.364987] Modules linked in:
[ 594.365405] irq event stamp: 604180238
[ 594.365906] hardirqs last enabled at (604180237): [<ffffffff93fec9bd>] _raw_spin_unlock_irqrestore+0x2d/0x50
[ 594.367181] hardirqs last disabled at (604180238): [<ffffffff93fbbadb>] sysvec_apic_timer_interrupt+0xb/0xc0
[ 594.368420] softirqs last enabled at (569080666): [<ffffffff94200654>] __do_softirq+0x654/0xa9e
[ 594.369551] softirqs last disabled at (569080575): [<ffffffff913e1d6a>] irq_exit_rcu+0x1ca/0x250
[ 594.370692] CPU: 2 PID: 108 Comm: kworker/u32:5 Tainted: G L 5.15.0-next-20211112+ #88
[ 594.371891] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[ 594.373604] Workqueue: events_unbound io_ring_exit_work
[ 594.374303] RIP: 0010:_raw_spin_unlock_irqrestore+0x33/0x50
[ 594.375037] Code: 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 55 f5 55 fd 48 89 ef e8 ed a7 56 fd 80 e7 02 74 06 e8 43 13 7b fd fb bf 01 00 00 00 <e8> f8 78 474
[ 594.377433] RSP: 0018:ffff888101587a70 EFLAGS: 00000202
[ 594.378120] RAX: 0000000024030f0d RBX: 0000000000000246 RCX: 1ffffffff2f09106
[ 594.379053] RDX: 0000000000000000 RSI: ffffffff9449f0e0 RDI: 0000000000000001
[ 594.379991] RBP: ffffffff9586cdc0 R08: 0000000000000001 R09: fffffbfff2effcab
[ 594.380923] R10: ffffffff977fe557 R11: fffffbfff2effcaa R12: ffff8881b8f3def0
[ 594.381858] R13: 0000000000000246 R14: ffff888153a8b070 R15: 0000000000000000
[ 594.382787] FS: 0000000000000000(0000) GS:ffff888399c00000(0000) knlGS:0000000000000000
[ 594.383851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 594.384602] CR2: 00007fcbe71d2000 CR3: 00000000b4216000 CR4: 00000000000006e0
[ 594.385540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 594.386474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 594.387403] Call Trace:
[ 594.387738] <TASK>
[ 594.388042] find_and_remove_object+0x118/0x160
[ 594.389321] delete_object_full+0xc/0x20
[ 594.389852] kfree+0x193/0x470
[ 594.390275] __io_remove_buffers.part.0+0xed/0x147
[ 594.390931] io_ring_ctx_free+0x342/0x6a2
[ 594.392159] io_ring_exit_work+0x41e/0x486
[ 594.396419] process_one_work+0x906/0x15a0
[ 594.399185] worker_thread+0x8b/0xd80
[ 594.400259] kthread+0x3bf/0x4a0
[ 594.401847] ret_from_fork+0x22/0x30
[ 594.402343] </TASK>
Message from syslogd@localhost at Nov 13 09:09:54 ...
kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
[ 596.793660] __io_remove_buffers: [2099199]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
We can reproduce this issue with the following syzkaller log:
r0 = syz_io_uring_setup(0x401, &(0x7f0000000300), &(0x7f0000003000/0x2000)=nil, &(0x7f0000ff8000/0x4000)=nil, &(0x7f0000000280)=<r1=>0x0, &(0x7f0000000380)=<r2=>0x0)
sendmsg$ETHTOOL_MSG_FEATURES_SET(0xffffffffffffffff, &(0x7f0000003080)={0x0, 0x0, &(0x7f0000003040)={&(0x7f0000000040)=ANY=[], 0x18}}, 0x0)
syz_io_uring_submit(r1, r2, &(0x7f0000000240)=@IORING_OP_PROVIDE_BUFFERS={0x1f, 0x5, 0x0, 0x401, 0x1, 0x0, 0x100, 0x0, 0x1, {0xfffd}}, 0x0)
io_uring_enter(r0, 0x3a2d, 0x0, 0x0, 0x0, 0x0)
The reason for the above issue is that 'buf->list' has 2,100,000 nodes;
walking them all monopolizes the CPU and leads to a soft lockup.
To solve this issue, we need to add a scheduling point to the loop in
'__io_remove_buffers'.
After adding the scheduling point we reran the test and got the following data.
[ 240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[ 268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[ 275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[ 296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
[ 305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[ 325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00
[ 333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
...
Fixes: 8bab4c09f24e ("io_uring: allow conditional reschedule for intensive iterators")
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Link: https://lore.kernel.org/r/20211122024737.2198530-1-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
conflicts:
fs/io_uring.c
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/io_uring.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index d07388600bbed..f3b5f9d670df3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3462,6 +3462,7 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx, struct io_buffer *buf,
kfree(nxt);
if (++i == nbufs)
return i;
+ cond_resched();
}
i++;
kfree(buf);
--
2.25.1
[PATCH openEuler-1.0-LTS] block: Fix fsync always failed if once failed
by Yang Yingliang 09 Dec '21
From: Ye Bin <yebin10(a)huawei.com>
mainline inclusion
from mainline-v5.17
commit 8a7518931baa8ea023700987f3db31cb0a80610b
category: bugfix
bugzilla: 185819
CVE: NA
-----------------------------------------------
We ran tests with error-fault injection based on v4.19; after testing for some
time we found that sync /dev/sda always failed.
[root@localhost] sync /dev/sda
sync: error syncing '/dev/sda': Input/output error
scsi log as follows:
[19069.812296] sd 0:0:0:0: [sda] tag#64 Send: scmd 0x00000000d03a0b6b
[19069.812302] sd 0:0:0:0: [sda] tag#64 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[19069.812533] sd 0:0:0:0: [sda] tag#64 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[19069.812536] sd 0:0:0:0: [sda] tag#64 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[19069.812539] sd 0:0:0:0: [sda] tag#64 scsi host busy 1 failed 0
[19069.812542] sd 0:0:0:0: Notifying upper driver of completion (result 0)
[19069.812546] sd 0:0:0:0: [sda] tag#64 sd_done: completed 0 of 0 bytes
[19069.812549] sd 0:0:0:0: [sda] tag#64 0 sectors total, 0 bytes done.
[19069.812564] print_req_error: I/O error, dev sda, sector 0
ftrace log as follows:
rep-306069 [007] .... 19654.923315: block_bio_queue: 8,0 FWS 0 + 0 [rep]
rep-306069 [007] .... 19654.923333: block_getrq: 8,0 FWS 0 + 0 [rep]
kworker/7:1H-250 [007] .... 19654.923352: block_rq_issue: 8,0 FF 0 () 0 + 0 [kworker/7:1H]
<idle>-0 [007] ..s. 19654.923562: block_rq_complete: 8,0 FF () 18446744073709551615 + 0 [0]
<idle>-0 [007] d.s. 19654.923576: block_rq_complete: 8,0 WS () 0 + 0 [-5]
Commit 8d6996630c03 introduced 'fq->rq_status', which is only updated while
the 'flush_rq' reference count isn't zero. Once a flush request fails, its
error code is recorded in 'fq->rq_status'; if there is no later chance to
update 'fq->rq_status', every subsequent fsync will fail.
To address this issue, reset 'fq->rq_status' after returning the error code
to the upper layer.
Fixes: 8d6996630c03 ("block: fix null pointer dereference in blk_mq_rq_timed_out()")
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
Link: https://lore.kernel.org/r/20211129012659.1553733-1-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
conflicts:
block/blk-flush.c
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
block/blk-flush.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 022a2bbb012d7..c1bfcde165af5 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -245,8 +245,10 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
* avoiding use-after-free.
*/
WRITE_ONCE(flush_rq->state, MQ_RQ_IDLE);
- if (fq->rq_status != BLK_STS_OK)
+ if (fq->rq_status != BLK_STS_OK) {
error = fq->rq_status;
+ fq->rq_status = BLK_STS_OK;
+ }
hctx = blk_mq_map_queue(q, flush_rq->mq_ctx->cpu);
if (!q->elevator) {
--
2.25.1