From: John Harrison John.C.Harrison@Intel.com
mainline inclusion from mainline-5.11-rc1 commit c784e5249e773689e38d2bc1749f08b986621a26 bugzilla: 185850 https://gitee.com/openeuler/kernel/issues/I4DDEL CVE: CVE-2020-12363, CVE-2020-12364
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The latest GuC firmware includes a number of interface changes that require driver updates to match.
* Starting from Gen11, the ID to be provided to GuC needs to contain the engine class in bits [0..2] and the instance in bits [3..6].
NOTE: this patch breaks pointer dereferences in some existing GuC functions that use the guc_id to dereference arrays but these functions are not used for now as we have GuC submission disabled and we will update these functions in follow up patch which requires new IDs.
* The new GuC requires the additional data structure (ADS) and associated 'private_data' pointer to be setup. This is basically a scratch area of memory that the GuC owns. The size is read from the CSS header.
* There is now a physical to logical engine mapping table in the ADS which needs to be configured in order for the firmware to load. For now, the table is initialised with a 1 to 1 mapping.
* GUC_CTL_CTXINFO has been removed from the initialization params.
* reg_state_buffer is maintained internally by the GuC as part of the private data.
* The ADS layout has changed significantly. This patch updates the shared structure and also adds better documentation of the layout.
* While i915 does not use GuC doorbells, the firmware now requires that some initialisation is done.
* The number of engine classes and instances supported in the ADS has been increased.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Oscar Mateo oscar.mateo@intel.com Signed-off-by: Michel Thierry michel.thierry@intel.com Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com Cc: Michal Winiarski michal.winiarski@intel.com Cc: Tomasz Lis tomasz.lis@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Reviewed-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Joonas Lahtinen joonas.lahtinen@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20201028145826.2949180-2-John.... Signed-off-by: Chen Jun chenjun102@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 +- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 18 --- drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 131 +++++++++++++++---- drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 80 +++++------ drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h | 5 + drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 27 ++-- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h | 2 + drivers/gpu/drm/i915/gt/uc/intel_uc_fw_abi.h | 6 +- 8 files changed, 176 insertions(+), 96 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index a19537706ed1..c940ac3aae2f 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -305,8 +305,9 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id) engine->i915 = i915; engine->gt = gt; engine->uncore = gt->uncore; - engine->hw_id = engine->guc_id = info->hw_id; engine->mmio_base = __engine_mmio_base(i915, info->mmio_bases); + engine->hw_id = info->hw_id; + engine->guc_id = MAKE_GUC_ID(info->class, info->instance);
engine->class = info->class; engine->instance = info->instance; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 942c7c187adb..6909da1e1a73 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -213,23 +213,6 @@ static u32 guc_ctl_feature_flags(struct intel_guc *guc) return flags; }
-static u32 guc_ctl_ctxinfo_flags(struct intel_guc *guc) -{ - u32 flags = 0; - - if (intel_guc_submission_is_used(guc)) { - u32 ctxnum, base; - - base = intel_guc_ggtt_offset(guc, guc->stage_desc_pool); - ctxnum = GUC_MAX_STAGE_DESCRIPTORS / 16; - - base >>= PAGE_SHIFT; - flags |= (base << GUC_CTL_BASE_ADDR_SHIFT) | - (ctxnum << GUC_CTL_CTXNUM_IN16_SHIFT); - } - return flags; -} - static u32 guc_ctl_log_params_flags(struct intel_guc *guc) { u32 offset = intel_guc_ggtt_offset(guc, guc->log.vma) >> PAGE_SHIFT; @@ -291,7 +274,6 @@ static void guc_init_params(struct intel_guc *guc)
BUILD_BUG_ON(sizeof(guc->params) != GUC_CTL_MAX_DWORDS * sizeof(u32));
- params[GUC_CTL_CTXINFO] = guc_ctl_ctxinfo_flags(guc); params[GUC_CTL_LOG_PARAMS] = guc_ctl_log_params_flags(guc); params[GUC_CTL_FEATURE] = guc_ctl_feature_flags(guc); params[GUC_CTL_DEBUG] = guc_ctl_debug_flags(guc); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c index d44061033f23..7950d28beb8c 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c @@ -10,11 +10,52 @@
/* * The Additional Data Struct (ADS) has pointers for different buffers used by - * the GuC. One single gem object contains the ADS struct itself (guc_ads), the - * scheduling policies (guc_policies), a structure describing a collection of - * register sets (guc_mmio_reg_state) and some extra pages for the GuC to save - * its internal state for sleep. + * the GuC. One single gem object contains the ADS struct itself (guc_ads) and + * all the extra buffers indirectly linked via the ADS struct's entries. + * + * Layout of the ADS blob allocated for the GuC: + * + * +---------------------------------------+ <== base + * | guc_ads | + * +---------------------------------------+ + * | guc_policies | + * +---------------------------------------+ + * | guc_gt_system_info | + * +---------------------------------------+ + * | guc_clients_info | + * +---------------------------------------+ + * | guc_ct_pool_entry[size] | + * +---------------------------------------+ + * | padding | + * +---------------------------------------+ <== 4K aligned + * | private data | + * +---------------------------------------+ + * | padding | + * +---------------------------------------+ <== 4K aligned */ +struct __guc_ads_blob { + struct guc_ads ads; + struct guc_policies policies; + struct guc_gt_system_info system_info; + struct guc_clients_info clients_info; + struct guc_ct_pool_entry ct_pool[GUC_CT_POOL_SIZE]; +} __packed; + +static u32 guc_ads_private_data_size(struct intel_guc *guc) +{ + return PAGE_ALIGN(guc->fw.private_data_size); +} + +static u32 guc_ads_private_data_offset(struct intel_guc *guc) +{ + return PAGE_ALIGN(sizeof(struct __guc_ads_blob)); +} + +static u32 guc_ads_blob_size(struct intel_guc *guc) +{ + return guc_ads_private_data_offset(guc) + + guc_ads_private_data_size(guc); +}
static void guc_policy_init(struct guc_policy *policy) { @@ -48,26 +89,37 @@ static void guc_ct_pool_entries_init(struct guc_ct_pool_entry *pool, u32 num) memset(pool, 0, num * sizeof(*pool)); }
+static void guc_mapping_table_init(struct intel_gt *gt, + struct guc_gt_system_info *system_info) +{ + unsigned int i, j; + struct intel_engine_cs *engine; + enum intel_engine_id id; + + /* Table must be set to invalid values for entries not used */ + for (i = 0; i < GUC_MAX_ENGINE_CLASSES; ++i) + for (j = 0; j < GUC_MAX_INSTANCES_PER_CLASS; ++j) + system_info->mapping_table[i][j] = + GUC_MAX_INSTANCES_PER_CLASS; + + for_each_engine(engine, gt, id) { + u8 guc_class = engine->class; + + system_info->mapping_table[guc_class][engine->instance] = + engine->instance; + } +} + /* * The first 80 dwords of the register state context, containing the * execlists and ppgtt registers. */ #define LR_HW_CONTEXT_SIZE (80 * sizeof(u32))
-/* The ads obj includes the struct itself and buffers passed to GuC */ -struct __guc_ads_blob { - struct guc_ads ads; - struct guc_policies policies; - struct guc_mmio_reg_state reg_state; - struct guc_gt_system_info system_info; - struct guc_clients_info clients_info; - struct guc_ct_pool_entry ct_pool[GUC_CT_POOL_SIZE]; - u8 reg_state_buffer[GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE]; -} __packed; - static void __guc_ads_init(struct intel_guc *guc) { struct intel_gt *gt = guc_to_gt(guc); + struct drm_i915_private *i915 = gt->i915; struct __guc_ads_blob *blob = guc->ads_blob; const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE; u32 base; @@ -99,13 +151,25 @@ static void __guc_ads_init(struct intel_guc *guc) }
/* System info */ - blob->system_info.slice_enabled = hweight8(gt->info.sseu.slice_mask); - blob->system_info.rcs_enabled = 1; - blob->system_info.bcs_enabled = 1; + blob->system_info.engine_enabled_masks[RENDER_CLASS] = 1; + blob->system_info.engine_enabled_masks[COPY_ENGINE_CLASS] = 1; + blob->system_info.engine_enabled_masks[VIDEO_DECODE_CLASS] = VDBOX_MASK(gt); + blob->system_info.engine_enabled_masks[VIDEO_ENHANCEMENT_CLASS] = VEBOX_MASK(gt); + + blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED] = + hweight8(gt->info.sseu.slice_mask); + blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_VDBOX_SFC_SUPPORT_MASK] = + gt->info.vdbox_sfc_access; + + if (INTEL_GEN(i915) >= 12 && !IS_DGFX(i915)) { + u32 distdbreg = intel_uncore_read(gt->uncore, + GEN12_DIST_DBS_POPULATED); + blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_DOORBELL_COUNT_PER_SQIDI] = + ((distdbreg >> GEN12_DOORBELLS_PER_SQIDI_SHIFT) & + GEN12_DOORBELLS_PER_SQIDI) + 1; + }
- blob->system_info.vdbox_enable_mask = VDBOX_MASK(gt); - blob->system_info.vebox_enable_mask = VEBOX_MASK(gt); - blob->system_info.vdbox_sfc_support_mask = gt->info.vdbox_sfc_access; + guc_mapping_table_init(guc_to_gt(guc), &blob->system_info);
base = intel_guc_ggtt_offset(guc, guc->ads_vma);
@@ -118,11 +182,12 @@ static void __guc_ads_init(struct intel_guc *guc)
/* ADS */ blob->ads.scheduler_policies = base + ptr_offset(blob, policies); - blob->ads.reg_state_buffer = base + ptr_offset(blob, reg_state_buffer); - blob->ads.reg_state_addr = base + ptr_offset(blob, reg_state); blob->ads.gt_system_info = base + ptr_offset(blob, system_info); blob->ads.clients_info = base + ptr_offset(blob, clients_info);
+ /* Private Data */ + blob->ads.private_data = base + guc_ads_private_data_offset(guc); + i915_gem_object_flush_map(guc->ads_vma->obj); }
@@ -135,14 +200,15 @@ static void __guc_ads_init(struct intel_guc *guc) */ int intel_guc_ads_create(struct intel_guc *guc) { - const u32 size = PAGE_ALIGN(sizeof(struct __guc_ads_blob)); + u32 size; int ret;
GEM_BUG_ON(guc->ads_vma);
+ size = guc_ads_blob_size(guc); + ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma, (void **)&guc->ads_blob); - if (ret) return ret;
@@ -156,6 +222,18 @@ void intel_guc_ads_destroy(struct intel_guc *guc) i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP); }
+static void guc_ads_private_data_reset(struct intel_guc *guc) +{ + u32 size; + + size = guc_ads_private_data_size(guc); + if (!size) + return; + + memset((void *)guc->ads_blob + guc_ads_private_data_offset(guc), 0, + size); +} + /** * intel_guc_ads_reset() - prepares GuC Additional Data Struct for reuse * @guc: intel_guc struct @@ -168,5 +246,8 @@ void intel_guc_ads_reset(struct intel_guc *guc) { if (!guc->ads_vma) return; + __guc_ads_init(guc); + + guc_ads_private_data_reset(guc); } diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h index a6b733c146c9..79c560d9c0b6 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h @@ -26,8 +26,8 @@ #define GUC_VIDEO_ENGINE2 4 #define GUC_MAX_ENGINES_NUM (GUC_VIDEO_ENGINE2 + 1)
-#define GUC_MAX_ENGINE_CLASSES 5 -#define GUC_MAX_INSTANCES_PER_CLASS 16 +#define GUC_MAX_ENGINE_CLASSES 16 +#define GUC_MAX_INSTANCES_PER_CLASS 32
#define GUC_DOORBELL_INVALID 256
@@ -62,12 +62,7 @@ #define GUC_STAGE_DESC_ATTR_PCH BIT(6) #define GUC_STAGE_DESC_ATTR_TERMINATED BIT(7)
-/* New GuC control data */ -#define GUC_CTL_CTXINFO 0 -#define GUC_CTL_CTXNUM_IN16_SHIFT 0 -#define GUC_CTL_BASE_ADDR_SHIFT 12 - -#define GUC_CTL_LOG_PARAMS 1 +#define GUC_CTL_LOG_PARAMS 0 #define GUC_LOG_VALID (1 << 0) #define GUC_LOG_NOTIFY_ON_HALF_FULL (1 << 1) #define GUC_LOG_ALLOC_IN_MEGABYTE (1 << 3) @@ -79,11 +74,11 @@ #define GUC_LOG_ISR_MASK (0x7 << GUC_LOG_ISR_SHIFT) #define GUC_LOG_BUF_ADDR_SHIFT 12
-#define GUC_CTL_WA 2 -#define GUC_CTL_FEATURE 3 +#define GUC_CTL_WA 1 +#define GUC_CTL_FEATURE 2 #define GUC_CTL_DISABLE_SCHEDULER (1 << 14)
-#define GUC_CTL_DEBUG 4 +#define GUC_CTL_DEBUG 3 #define GUC_LOG_VERBOSITY_SHIFT 0 #define GUC_LOG_VERBOSITY_LOW (0 << GUC_LOG_VERBOSITY_SHIFT) #define GUC_LOG_VERBOSITY_MED (1 << GUC_LOG_VERBOSITY_SHIFT) @@ -97,12 +92,37 @@ #define GUC_LOG_DISABLED (1 << 6) #define GUC_PROFILE_ENABLED (1 << 7)
-#define GUC_CTL_ADS 5 +#define GUC_CTL_ADS 4 #define GUC_ADS_ADDR_SHIFT 1 #define GUC_ADS_ADDR_MASK (0xFFFFF << GUC_ADS_ADDR_SHIFT)
#define GUC_CTL_MAX_DWORDS (SOFT_SCRATCH_COUNT - 2) /* [1..14] */
+/* Generic GT SysInfo data types */ +#define GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED 0 +#define GUC_GENERIC_GT_SYSINFO_VDBOX_SFC_SUPPORT_MASK 1 +#define GUC_GENERIC_GT_SYSINFO_DOORBELL_COUNT_PER_SQIDI 2 +#define GUC_GENERIC_GT_SYSINFO_MAX 16 + +/* + * The class goes in bits [0..2] of the GuC ID, the instance in bits [3..6]. + * Bit 7 can be used for operations that apply to all engine classes&instances. + */ +#define GUC_ENGINE_CLASS_SHIFT 0 +#define GUC_ENGINE_CLASS_MASK (0x7 << GUC_ENGINE_CLASS_SHIFT) +#define GUC_ENGINE_INSTANCE_SHIFT 3 +#define GUC_ENGINE_INSTANCE_MASK (0xf << GUC_ENGINE_INSTANCE_SHIFT) +#define GUC_ENGINE_ALL_INSTANCES BIT(7) + +#define MAKE_GUC_ID(class, instance) \ + (((class) << GUC_ENGINE_CLASS_SHIFT) | \ + ((instance) << GUC_ENGINE_INSTANCE_SHIFT)) + +#define GUC_ID_TO_ENGINE_CLASS(guc_id) \ + (((guc_id) & GUC_ENGINE_CLASS_MASK) >> GUC_ENGINE_CLASS_SHIFT) +#define GUC_ID_TO_ENGINE_INSTANCE(guc_id) \ + (((guc_id) & GUC_ENGINE_INSTANCE_MASK) >> GUC_ENGINE_INSTANCE_SHIFT) + /* Work item for submitting workloads into work queue of GuC. */ struct guc_wq_item { u32 header; @@ -336,11 +356,6 @@ struct guc_policies { } __packed;
/* GuC MMIO reg state struct */ - - -#define GUC_REGSET_MAX_REGISTERS 64 -#define GUC_S3_SAVE_SPACE_PAGES 10 - struct guc_mmio_reg { u32 offset; u32 value; @@ -348,28 +363,18 @@ struct guc_mmio_reg { #define GUC_REGSET_MASKED (1 << 0) } __packed;
-struct guc_mmio_regset { - struct guc_mmio_reg registers[GUC_REGSET_MAX_REGISTERS]; - u32 values_valid; - u32 number_of_registers; -} __packed; - /* GuC register sets */ -struct guc_mmio_reg_state { - struct guc_mmio_regset engine_reg[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS]; - u32 reserved[98]; +struct guc_mmio_reg_set { + u32 address; + u16 count; + u16 reserved; } __packed;
/* HW info */ struct guc_gt_system_info { - u32 slice_enabled; - u32 rcs_enabled; - u32 reserved0; - u32 bcs_enabled; - u32 vdbox_enable_mask; - u32 vdbox_sfc_support_mask; - u32 vebox_enable_mask; - u32 reserved[9]; + u8 mapping_table[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS]; + u32 engine_enabled_masks[GUC_MAX_ENGINE_CLASSES]; + u32 generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_MAX]; } __packed;
/* Clients info */ @@ -390,15 +395,16 @@ struct guc_clients_info {
/* GuC Additional Data Struct */ struct guc_ads { - u32 reg_state_addr; - u32 reg_state_buffer; + struct guc_mmio_reg_set reg_state_list[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS]; + u32 reserved0; u32 scheduler_policies; u32 gt_system_info; u32 clients_info; u32 control_data; u32 golden_context_lrca[GUC_MAX_ENGINE_CLASSES]; u32 eng_state_size[GUC_MAX_ENGINE_CLASSES]; - u32 reserved[16]; + u32 private_data; + u32 reserved[15]; } __packed;
/* GuC logging structures */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h index 1949346e714e..b37fc2ffaef2 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h @@ -118,6 +118,11 @@ struct guc_doorbell_info { #define GEN8_DRB_VALID (1<<0) #define GEN8_DRBREGU(x) _MMIO(0x1000 + (x) * 8 + 4)
+#define GEN12_DIST_DBS_POPULATED _MMIO(0xd08) +#define GEN12_DOORBELLS_PER_SQIDI_SHIFT 16 +#define GEN12_DOORBELLS_PER_SQIDI (0xff) +#define GEN12_SQIDIS_DOORBELL_EXIST (0xffff) + #define DE_GUCRMR _MMIO(0x44054)
#define GUC_BCS_RCS_IER _MMIO(0xC550) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c index 80e8b6c3bc8c..ee4ac3922277 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c @@ -44,23 +44,19 @@ void intel_uc_fw_change_status(struct intel_uc_fw *uc_fw, * List of required GuC and HuC binaries per-platform. * Must be ordered based on platform + revid, from newer to older. * - * TGL 35.2 is interface-compatible with 33.0 for previous Gens. The deltas - * between 33.0 and 35.2 are only related to new additions to support new Gen12 - * features. - * * Note that RKL uses the same firmware as TGL. */ #define INTEL_UC_FIRMWARE_DEFS(fw_def, guc_def, huc_def) \ - fw_def(ROCKETLAKE, 0, guc_def(tgl, 35, 2, 0), huc_def(tgl, 7, 5, 0)) \ - fw_def(TIGERLAKE, 0, guc_def(tgl, 35, 2, 0), huc_def(tgl, 7, 5, 0)) \ - fw_def(ELKHARTLAKE, 0, guc_def(ehl, 33, 0, 4), huc_def(ehl, 9, 0, 0)) \ - fw_def(ICELAKE, 0, guc_def(icl, 33, 0, 0), huc_def(icl, 9, 0, 0)) \ - fw_def(COMETLAKE, 5, guc_def(cml, 33, 0, 0), huc_def(cml, 4, 0, 0)) \ - fw_def(COFFEELAKE, 0, guc_def(kbl, 33, 0, 0), huc_def(kbl, 4, 0, 0)) \ - fw_def(GEMINILAKE, 0, guc_def(glk, 33, 0, 0), huc_def(glk, 4, 0, 0)) \ - fw_def(KABYLAKE, 0, guc_def(kbl, 33, 0, 0), huc_def(kbl, 4, 0, 0)) \ - fw_def(BROXTON, 0, guc_def(bxt, 33, 0, 0), huc_def(bxt, 2, 0, 0)) \ - fw_def(SKYLAKE, 0, guc_def(skl, 33, 0, 0), huc_def(skl, 2, 0, 0)) + fw_def(ROCKETLAKE, 0, guc_def(tgl, 49, 0, 1), huc_def(tgl, 7, 5, 0)) \ + fw_def(TIGERLAKE, 0, guc_def(tgl, 49, 0, 1), huc_def(tgl, 7, 5, 0)) \ + fw_def(ELKHARTLAKE, 0, guc_def(ehl, 49, 0, 1), huc_def(ehl, 9, 0, 0)) \ + fw_def(ICELAKE, 0, guc_def(icl, 49, 0, 1), huc_def(icl, 9, 0, 0)) \ + fw_def(COMETLAKE, 5, guc_def(cml, 49, 0, 1), huc_def(cml, 4, 0, 0)) \ + fw_def(COFFEELAKE, 0, guc_def(kbl, 49, 0, 1), huc_def(kbl, 4, 0, 0)) \ + fw_def(GEMINILAKE, 0, guc_def(glk, 49, 0, 1), huc_def(glk, 4, 0, 0)) \ + fw_def(KABYLAKE, 0, guc_def(kbl, 49, 0, 1), huc_def(kbl, 4, 0, 0)) \ + fw_def(BROXTON, 0, guc_def(bxt, 49, 0, 1), huc_def(bxt, 2, 0, 0)) \ + fw_def(SKYLAKE, 0, guc_def(skl, 49, 0, 1), huc_def(skl, 2, 0, 0))
#define __MAKE_UC_FW_PATH(prefix_, name_, major_, minor_, patch_) \ "i915/" \ @@ -371,6 +367,9 @@ int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw) } }
+ if (uc_fw->type == INTEL_UC_FW_TYPE_GUC) + uc_fw->private_data_size = css->private_data_size; + obj = i915_gem_object_create_shmem_from_data(i915, fw->data, fw->size); if (IS_ERR(obj)) { err = PTR_ERR(obj); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h index 23d3a423ac0f..99bb1fe1af66 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h @@ -88,6 +88,8 @@ struct intel_uc_fw {
u32 rsa_size; u32 ucode_size; + + u32 private_data_size; };
#ifdef CONFIG_DRM_I915_DEBUG_GUC diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw_abi.h b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw_abi.h index 029214cdedd5..e41ffc7a7fbc 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw_abi.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw_abi.h @@ -69,7 +69,11 @@ struct uc_css_header { #define CSS_SW_VERSION_UC_MAJOR (0xFF << 16) #define CSS_SW_VERSION_UC_MINOR (0xFF << 8) #define CSS_SW_VERSION_UC_PATCH (0xFF << 0) - u32 reserved[14]; + u32 reserved0[13]; + union { + u32 private_data_size; /* only applies to GuC */ + u32 reserved1; + }; u32 header_info; } __packed; static_assert(sizeof(struct uc_css_header) == 128);
From: Hannes Reinecke hare@suse.de
mainline inclusion from mainline-5.14-rc1 commit ced202f7bd78eb6a79c441a8b217e0f3d38bccfc category: bugfix bugzilla: 185813 https://gitee.com/openeuler/kernel/issues/I4DDEL Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------
Return the actual error code in __scsi_execute() (which, according to the documentation, should have happened anyway). And audit all callers to cope with negative return values from __scsi_execute() and friends.
[mkp: resolve conflict and return bool]
Link: https://lore.kernel.org/r/20210427083046.31620-7-hare@suse.de Reviewed-by: Bart Van Assche bvanassche@acm.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Hannes Reinecke hare@suse.de Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/ata/libata-scsi.c | 8 ++++++++ drivers/scsi/ch.c | 3 ++- drivers/scsi/cxlflash/superpipe.c | 2 +- drivers/scsi/scsi.c | 2 ++ drivers/scsi/scsi_ioctl.c | 4 +++- drivers/scsi/scsi_lib.c | 15 +++++++++------ drivers/scsi/scsi_scan.c | 4 ++-- drivers/scsi/scsi_transport_spi.c | 2 +- drivers/scsi/sd.c | 15 +++++++++------ drivers/scsi/sd_zbc.c | 2 +- drivers/scsi/sr_ioctl.c | 4 ++++ drivers/scsi/ufs/ufshcd.c | 2 +- drivers/scsi/virtio_scsi.c | 2 +- include/scsi/scsi.h | 7 +++++-- 14 files changed, 49 insertions(+), 23 deletions(-)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 48b8934970f3..c5129b9e3afd 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -409,6 +409,10 @@ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg) cmd_result = scsi_execute(scsidev, scsi_cmd, data_dir, argbuf, argsize, sensebuf, &sshdr, (10*HZ), 5, 0, 0, NULL);
+ if (cmd_result < 0) { + rc = cmd_result; + goto error; + } if (driver_byte(cmd_result) == DRIVER_SENSE) {/* sense data available */ u8 *desc = sensebuf + 8; cmd_result &= ~(0xFF<<24); /* DRIVER_SENSE is not an error */ @@ -490,6 +494,10 @@ int ata_task_ioctl(struct scsi_device *scsidev, void __user *arg) cmd_result = scsi_execute(scsidev, scsi_cmd, DMA_NONE, NULL, 0, sensebuf, &sshdr, (10*HZ), 5, 0, 0, NULL);
+ if (cmd_result < 0) { + rc = cmd_result; + goto error; + } if (driver_byte(cmd_result) == DRIVER_SENSE) {/* sense data available */ u8 *desc = sensebuf + 8; cmd_result &= ~(0xFF<<24); /* DRIVER_SENSE is not an error */ diff --git a/drivers/scsi/ch.c b/drivers/scsi/ch.c index cb74ab1ae5a4..0e7d1214c3d8 100644 --- a/drivers/scsi/ch.c +++ b/drivers/scsi/ch.c @@ -198,7 +198,8 @@ ch_do_scsi(scsi_changer *ch, unsigned char *cmd, int cmd_len, result = scsi_execute_req(ch->device, cmd, direction, buffer, buflength, &sshdr, timeout * HZ, MAX_RETRIES, NULL); - + if (result < 0) + return result; if (driver_byte(result) == DRIVER_SENSE) { if (debug) scsi_print_sense_hdr(ch->device, ch->name, &sshdr); diff --git a/drivers/scsi/cxlflash/superpipe.c b/drivers/scsi/cxlflash/superpipe.c index 5dddf67dfa24..3ebb7ac86a64 100644 --- a/drivers/scsi/cxlflash/superpipe.c +++ b/drivers/scsi/cxlflash/superpipe.c @@ -369,7 +369,7 @@ static int read_cap16(struct scsi_device *sdev, struct llun_info *lli) goto out; }
- if (driver_byte(result) == DRIVER_SENSE) { + if (result > 0 && driver_byte(result) == DRIVER_SENSE) { result &= ~(0xFF<<24); /* DRIVER_SENSE is not an error */ if (result & SAM_STAT_CHECK_CONDITION) { switch (sshdr.sense_key) { diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 6ad834d61d4c..f561caabc771 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -495,6 +495,8 @@ int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer, result = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buffer, len, &sshdr, 30 * HZ, 3, NULL);
+ if (result < 0) + return result; if (result && scsi_sense_valid(&sshdr) && sshdr.sense_key == ILLEGAL_REQUEST && (sshdr.asc == 0x20 || sshdr.asc == 0x24) && sshdr.ascq == 0x00) diff --git a/drivers/scsi/scsi_ioctl.c b/drivers/scsi/scsi_ioctl.c index 14872c9dc78c..d34e1b41dc71 100644 --- a/drivers/scsi/scsi_ioctl.c +++ b/drivers/scsi/scsi_ioctl.c @@ -101,6 +101,8 @@ static int ioctl_internal_command(struct scsi_device *sdev, char *cmd, SCSI_LOG_IOCTL(2, sdev_printk(KERN_INFO, sdev, "Ioctl returned 0x%x\n", result));
+ if (result < 0) + goto out; if (driver_byte(result) == DRIVER_SENSE && scsi_sense_valid(&sshdr)) { switch (sshdr.sense_key) { @@ -133,7 +135,7 @@ static int ioctl_internal_command(struct scsi_device *sdev, char *cmd, break; } } - +out: SCSI_LOG_IOCTL(2, sdev_printk(KERN_INFO, sdev, "IOCTL Releasing command\n")); return result; diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 0a265a3e3dc6..9352a309fe55 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -245,20 +245,23 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd, { struct request *req; struct scsi_request *rq; - int ret = DRIVER_ERROR << 24; + int ret;
req = blk_get_request(sdev->request_queue, data_direction == DMA_TO_DEVICE ? REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, rq_flags & RQF_PM ? BLK_MQ_REQ_PM : 0); if (IS_ERR(req)) - return ret; - rq = scsi_req(req); + return PTR_ERR(req);
- if (bufflen && blk_rq_map_kern(sdev->request_queue, req, - buffer, bufflen, GFP_NOIO)) - goto out; + rq = scsi_req(req);
+ if (bufflen) { + ret = blk_rq_map_kern(sdev->request_queue, req, + buffer, bufflen, GFP_NOIO); + if (ret) + goto out; + } rq->cmd_len = COMMAND_SIZE(cmd[0]); memcpy(rq->cmd, cmd, rq->cmd_len); rq->retries = retries; diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 8e474b145249..57002529174f 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -599,7 +599,7 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result, "scsi scan: INQUIRY %s with code 0x%x\n", result ? "failed" : "successful", result));
- if (result) { + if (result > 0) { /* * not-ready to ready transition [asc/ascq=0x28/0x0] * or power-on, reset [asc/ascq=0x29/0x0], continue. @@ -614,7 +614,7 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result, (sshdr.ascq == 0)) continue; } - } else { + } else if (result == 0) { /* * if nothing was transferred, we try * again. It's a workaround for some USB diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c index c37dd15d16d2..a9bb7ae2fafd 100644 --- a/drivers/scsi/scsi_transport_spi.c +++ b/drivers/scsi/scsi_transport_spi.c @@ -127,7 +127,7 @@ static int spi_execute(struct scsi_device *sdev, const void *cmd, REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER, RQF_PM, NULL); - if (driver_byte(result) != DRIVER_SENSE || + if (result < 0 || driver_byte(result) != DRIVER_SENSE || sshdr->sense_key != UNIT_ATTENTION) break; } diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 97713f595a29..3fc184e9702f 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1673,7 +1673,7 @@ static unsigned int sd_check_events(struct gendisk *disk, unsigned int clearing) &sshdr);
/* failed to execute TUR, assume media not present */ - if (host_byte(retval)) { + if (retval < 0 || host_byte(retval)) { set_media_not_present(sdkp); goto out; } @@ -1734,6 +1734,9 @@ static int sd_sync_cache(struct scsi_disk *sdkp, struct scsi_sense_hdr *sshdr) if (res) { sd_print_result(sdkp, "Synchronize Cache(10) failed", res);
+ if (res < 0) + return res; + if (driver_byte(res) == DRIVER_SENSE) sd_print_sense_hdr(sdkp, sshdr);
@@ -1842,7 +1845,7 @@ static int sd_pr_command(struct block_device *bdev, u8 sa, result = scsi_execute_req(sdev, cmd, DMA_TO_DEVICE, &data, sizeof(data), &sshdr, SD_TIMEOUT, sdkp->max_retries, NULL);
- if (driver_byte(result) == DRIVER_SENSE && + if (result > 0 && driver_byte(result) == DRIVER_SENSE && scsi_sense_valid(&sshdr)) { sdev_printk(KERN_INFO, sdev, "PR command failed: %d\n", result); scsi_print_sense_hdr(sdev, NULL, &sshdr); @@ -2194,7 +2197,7 @@ sd_spinup_disk(struct scsi_disk *sdkp) ((driver_byte(the_result) == DRIVER_SENSE) && sense_valid && sshdr.sense_key == UNIT_ATTENTION)));
- if (driver_byte(the_result) != DRIVER_SENSE) { + if (the_result < 0 || driver_byte(the_result) != DRIVER_SENSE) { /* no sense, TUR either succeeded or failed * with a status error */ if(!spintime && !scsi_status_is_good(the_result)) { @@ -2379,7 +2382,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp, if (media_not_present(sdkp, &sshdr)) return -ENODEV;
- if (the_result) { + if (the_result > 0) { sense_valid = scsi_sense_valid(&sshdr); if (sense_valid && sshdr.sense_key == ILLEGAL_REQUEST && @@ -2464,7 +2467,7 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp, if (media_not_present(sdkp, &sshdr)) return -ENODEV;
- if (the_result) { + if (the_result > 0) { sense_valid = scsi_sense_valid(&sshdr); if (sense_valid && sshdr.sense_key == UNIT_ATTENTION && @@ -3614,7 +3617,7 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start) SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL); if (res) { sd_print_result(sdkp, "Start/Stop Unit failed", res); - if (driver_byte(res) == DRIVER_SENSE) + if (res > 0 && driver_byte(res) == DRIVER_SENSE) sd_print_sense_hdr(sdkp, &sshdr); if (scsi_sense_valid(&sshdr) && /* 0x3a is medium not present */ diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c index 01088f333dbc..6834e029906a 100644 --- a/drivers/scsi/sd_zbc.c +++ b/drivers/scsi/sd_zbc.c @@ -116,7 +116,7 @@ static int sd_zbc_do_report_zones(struct scsi_disk *sdkp, unsigned char *buf, sd_printk(KERN_ERR, sdkp, "REPORT ZONES start lba %llu failed\n", lba); sd_print_result(sdkp, "REPORT ZONES", result); - if (driver_byte(result) == DRIVER_SENSE && + if (result > 0 && driver_byte(result) == DRIVER_SENSE && scsi_sense_valid(&sshdr)) sd_print_sense_hdr(sdkp, &sshdr); return -EIO; diff --git a/drivers/scsi/sr_ioctl.c b/drivers/scsi/sr_ioctl.c index ffcf902da390..4c1de11e69fb 100644 --- a/drivers/scsi/sr_ioctl.c +++ b/drivers/scsi/sr_ioctl.c @@ -205,6 +205,10 @@ int sr_do_ioctl(Scsi_CD *cd, struct packet_command *cgc) cgc->timeout, IOCTL_RETRIES, 0, 0, NULL);
/* Minimal error checking. Ignore cases we know about, and report the rest. */ + if (result < 0) { + err = result; + goto out; + } if (driver_byte(result) != 0) { switch (sshdr->sense_key) { case UNIT_ATTENTION: diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 930f35863cbb..b8462f54cb72 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -8354,7 +8354,7 @@ static int ufshcd_set_dev_pwr_mode(struct ufs_hba *hba, sdev_printk(KERN_WARNING, sdp, "START_STOP failed for power mode: %d, result %x\n", pwr_mode, ret); - if (driver_byte(ret) == DRIVER_SENSE) + if (ret > 0 && driver_byte(ret) == DRIVER_SENSE) scsi_print_sense_hdr(sdp, NULL, &sshdr); }
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 6dac58ae6120..fabbad5cda60 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -355,7 +355,7 @@ static void virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi) if (result == 0 && inq_result[0] >> 5) { /* PQ indicates the LUN is not attached */ scsi_remove_device(sdev); - } else if (host_byte(result) == DID_BAD_TARGET) { + } else if (result > 0 && host_byte(result) == DID_BAD_TARGET) { /* * If all LUNs of a virtio-scsi device are unplugged * it will respond with BAD TARGET on any INQUIRY diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h index 7ac50dbee475..6f227b610681 100644 --- a/include/scsi/scsi.h +++ b/include/scsi/scsi.h @@ -256,10 +256,13 @@ static inline int scsi_is_wlun(u64 lun) * This returns true for known good conditions that may be treated as * command completed normally */ -static inline int scsi_status_is_good(int status) +static inline bool scsi_status_is_good(int status) { + if (status < 0) + return false; + if (host_byte(status) == DID_NO_CONNECT) - return 0; + return false;
/* * FIXME: bit0 is listed as reserved in SCSI-2, but is
From: Ye Bin yebin10@huawei.com
mainline inclusion from mainline-v5.13-rc1 commit ac2f7ca51b0929461ea49918f27c11b680f28995 category: bugfix bugzilla: 182973 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------------
Before commit 014c9caa29d3 ("ext4: make ext4_abort() use __ext4_error()"), the following series of commands would trigger a panic:
1. mount /dev/sda -o ro,errors=panic test 2. mount /dev/sda -o remount,abort test
After commit 014c9caa29d3, remounting a file system using the test mount option "abort" will no longer trigger a panic. This commit will restore the behaviour immediately before commit 014c9caa29d3. (However, note that the Linux kernel's behavior has not been consistent; some previous kernel versions, including 5.4 and 4.19 similarly did not panic after using the mount option "abort".)
This also makes a change to long-standing behaviour; namely, the following series commands will now cause a panic, when previously it did not:
1. mount /dev/sda -o ro,errors=panic test 2. echo test > /sys/fs/ext4/sda/trigger_fs_error
However, this makes ext4's behaviour much more consistent, so this is a good thing.
Cc: stable@kernel.org Fixes: 014c9caa29d3 ("ext4: make ext4_abort() use __ext4_error()") Signed-off-by: Ye Bin yebin10@huawei.com Link: https://lore.kernel.org/r/20210401081903.3421208-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Zheng Liang zhengliang6@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/ext4/super.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 478b1516b7d4..926c0d7d9a3d 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -667,9 +667,6 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error, ext4_commit_super(sb); }
- if (sb_rdonly(sb) || continue_fs) - return; - /* * We force ERRORS_RO behavior when system is rebooting. Otherwise we * could panic during 'reboot -f' as the underlying device got already @@ -679,6 +676,10 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error, panic("EXT4-fs (device %s): panic forced after error\n", sb->s_id); } + + if (sb_rdonly(sb) || continue_fs) + return; + ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only"); /* * Make sure updated value of ->s_mount_flags will be visible before
From: Pavel Begunkov asml.silence@gmail.com
mainline inclusion from mainline-v5.13-rc2 commit 447c19f3b5074409c794b350b10306e1da1ef4ba category: bugfix bugzilla: 185823 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Always remove linked timeout on io_link_timeout_fn() from the master request link list, otherwise we may get use-after-free when first io_link_timeout_fn() puts linked timeout in the fail path, and then will be found and put on master's free.
Cc: stable@vger.kernel.org # 5.10+ Fixes: 90cd7e424969d ("io_uring: track link timeout's master explicitly") Reported-and-tested-by: syzbot+5a864149dd970b546223@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://lore.kernel.org/r/69c46bf6ce37fec4fdcd98f0882e18eb07ce693a.162099012... Signed-off-by: Jens Axboe axboe@kernel.dk
conflicts: fs/io_uring.c
Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/io_uring.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index f3078708fdb8..98552a1852ea 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -6259,9 +6259,8 @@ static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer) if (!list_empty(&req->link_list)) { prev = list_entry(req->link_list.prev, struct io_kiocb, link_list); - if (refcount_inc_not_zero(&prev->refs)) - list_del_init(&req->link_list); - else + list_del_init(&req->link_list); + if (!refcount_inc_not_zero(&prev->refs)) prev = NULL; }
From: Pavel Begunkov asml.silence@gmail.com
mainline inclusion from mainline-v5.13-rc1 commit f70865db5ff35f5ed0c7e9ef63e7cca3d4947f04 category: bugfix bugzilla: 185824 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Revert of revert of "io_uring: wait potential ->release() on resurrect", which adds a helper for resurrect not racing completion reinit, as was removed because of a strange bug with no clear root or link to the patch.
Was improved, instead of rcu_synchronize(), just wait_for_completion() because we're at 0 refs and it will happen very shortly. Specifically use non-interruptible version to ignore all pending signals that may have ended prior interruptible wait.
This reverts commit cb5e1b81304e089ee3ca948db4d29f71902eb575.
Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://lore.kernel.org/r/7a080c20f686d026efade810b116b72f88abaff9.161810175... Signed-off-by: Jens Axboe axboe@kernel.dk
conflicts: fs/io_uring.c
Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/io_uring.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 98552a1852ea..95c807ad3cff 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1009,6 +1009,18 @@ static inline bool __io_match_files(struct io_kiocb *req, req->work.identity->files == files; }
+static void io_refs_resurrect(struct percpu_ref *ref, struct completion *compl) +{ + bool got = percpu_ref_tryget(ref); + + /* already at zero, wait for ->release() */ + if (!got) + wait_for_completion(compl); + percpu_ref_resurrect(ref); + if (got) + percpu_ref_put(ref); +} + static bool io_match_task(struct io_kiocb *head, struct task_struct *task, struct files_struct *files) @@ -9747,12 +9759,11 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode, if (ret < 0) break; } while (1); - mutex_lock(&ctx->uring_lock);
if (ret) { - percpu_ref_resurrect(&ctx->refs); - goto out_quiesce; + io_refs_resurrect(&ctx->refs, &ctx->ref_comp); + return ret; } }
@@ -9845,7 +9856,6 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode, if (io_register_op_must_quiesce(opcode)) { /* bring the ctx back to life */ percpu_ref_reinit(&ctx->refs); -out_quiesce: reinit_completion(&ctx->ref_comp); } return ret;
From: Ye Bin yebin10@huawei.com
mainline inclusion from mainline-v5.16-rc3 commit 1d0254e6b47e73222fd3d6ae95cccbaafe5b3ecf category: bugfix bugzilla: 185836 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
I got issue as follows: [ 567.094140] __io_remove_buffers: [1]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680 [ 594.360799] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108] [ 594.364987] Modules linked in: [ 594.365405] irq event stamp: 604180238 [ 594.365906] hardirqs last enabled at (604180237): [<ffffffff93fec9bd>] _raw_spin_unlock_irqrestore+0x2d/0x50 [ 594.367181] hardirqs last disabled at (604180238): [<ffffffff93fbbadb>] sysvec_apic_timer_interrupt+0xb/0xc0 [ 594.368420] softirqs last enabled at (569080666): [<ffffffff94200654>] __do_softirq+0x654/0xa9e [ 594.369551] softirqs last disabled at (569080575): [<ffffffff913e1d6a>] irq_exit_rcu+0x1ca/0x250 [ 594.370692] CPU: 2 PID: 108 Comm: kworker/u32:5 Tainted: G L 5.15.0-next-20211112+ #88 [ 594.371891] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 [ 594.373604] Workqueue: events_unbound io_ring_exit_work [ 594.374303] RIP: 0010:_raw_spin_unlock_irqrestore+0x33/0x50 [ 594.375037] Code: 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 55 f5 55 fd 48 89 ef e8 ed a7 56 fd 80 e7 02 74 06 e8 43 13 7b fd fb bf 01 00 00 00 <e8> f8 78 474 [ 594.377433] RSP: 0018:ffff888101587a70 EFLAGS: 00000202 [ 594.378120] RAX: 0000000024030f0d RBX: 0000000000000246 RCX: 1ffffffff2f09106 [ 594.379053] RDX: 0000000000000000 RSI: ffffffff9449f0e0 RDI: 0000000000000001 [ 594.379991] RBP: ffffffff9586cdc0 R08: 0000000000000001 R09: fffffbfff2effcab [ 594.380923] R10: ffffffff977fe557 R11: fffffbfff2effcaa R12: ffff8881b8f3def0 [ 594.381858] R13: 0000000000000246 R14: ffff888153a8b070 R15: 0000000000000000 [ 594.382787] FS: 0000000000000000(0000) GS:ffff888399c00000(0000) knlGS:0000000000000000 [ 594.383851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 594.384602] CR2: 00007fcbe71d2000 CR3: 00000000b4216000 CR4: 00000000000006e0 [ 594.385540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 594.386474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 594.387403] Call Trace: [ 594.387738] <TASK> [ 594.388042] find_and_remove_object+0x118/0x160 [ 594.389321] delete_object_full+0xc/0x20 [ 594.389852] kfree+0x193/0x470 [ 594.390275] __io_remove_buffers.part.0+0xed/0x147 [ 594.390931] io_ring_ctx_free+0x342/0x6a2 [ 594.392159] io_ring_exit_work+0x41e/0x486 [ 594.396419] process_one_work+0x906/0x15a0 [ 594.399185] worker_thread+0x8b/0xd80 [ 594.400259] kthread+0x3bf/0x4a0 [ 594.401847] ret_from_fork+0x22/0x30 [ 594.402343] </TASK>
Message from syslogd@localhost at Nov 13 09:09:54 ... kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108] [ 596.793660] __io_remove_buffers: [2099199]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
We can reproduce this issue by follow syzkaller log: r0 = syz_io_uring_setup(0x401, &(0x7f0000000300), &(0x7f0000003000/0x2000)=nil, &(0x7f0000ff8000/0x4000)=nil, &(0x7f0000000280)=<r1=>0x0, &(0x7f0000000380)=<r2=>0x0) sendmsg$ETHTOOL_MSG_FEATURES_SET(0xffffffffffffffff, &(0x7f0000003080)={0x0, 0x0, &(0x7f0000003040)={&(0x7f0000000040)=ANY=[], 0x18}}, 0x0) syz_io_uring_submit(r1, r2, &(0x7f0000000240)=@IORING_OP_PROVIDE_BUFFERS={0x1f, 0x5, 0x0, 0x401, 0x1, 0x0, 0x100, 0x0, 0x1, {0xfffd}}, 0x0) io_uring_enter(r0, 0x3a2d, 0x0, 0x0, 0x0, 0x0)
The reason above issue is 'buf->list' has 2,100,000 nodes, occupied cpu lead to soft lockup. To solve this issue, we need add schedule point when do while loop in '__io_remove_buffers'. After add schedule point we do regression, get follow data. [ 240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00 [ 268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180 [ 275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00 [ 296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380 [ 305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180 [ 325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00 [ 333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380 ...
Fixes:8bab4c09f24e("io_uring: allow conditional reschedule for intensive iterators") Signed-off-by: Ye Bin yebin10@huawei.com Link: https://lore.kernel.org/r/20211122024737.2198530-1-yebin10@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk
conflicts: fs/io_uring.c
Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/io_uring.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 95c807ad3cff..600dd0898d7e 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -3967,6 +3967,7 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx, struct io_buffer *buf, kfree(nxt); if (++i == nbufs) return i; + cond_resched(); } i++; kfree(buf);
From: Baokun Li libaokun1@huawei.com
hulk inclusion category: bugfix bugzilla: 185858 https://gitee.com/openeuler/kernel/issues/I4DDEL
-------------------------------------------------
Hulk robot reported a kmemleak problem: ----------------------------------------------------------------------- unreferenced object 0xffff93d1d8cc02e8 (size 248): comm "cat", pid 23327, jiffies 4624670141 (age 495992.217s) hex dump (first 32 bytes): 00 40 85 19 d4 93 ff ff 00 10 00 00 00 00 00 00 .@.............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000db5610b3>] seq_open+0x2a/0x80 [<00000000d66ac99d>] full_proxy_open+0x167/0x1e0 [<00000000d58ef917>] do_dentry_open+0x1e1/0x3a0 [<0000000016c91867>] path_openat+0x961/0xa20 [<00000000909c9564>] do_filp_open+0xae/0x120 [<0000000059c761e6>] do_sys_openat2+0x216/0x2f0 [<00000000b7a7b239>] do_sys_open+0x57/0x80 [<00000000e559d671>] do_syscall_64+0x33/0x40 [<000000000ea1fbfd>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 unreferenced object 0xffff93d419854000 (size 4096): comm "cat", pid 23327, jiffies 4624670141 (age 495992.217s) hex dump (first 32 bytes): 6b 66 65 6e 63 65 2d 23 32 35 30 3a 20 30 78 30 kfence-#250: 0x0 30 30 30 30 30 30 30 37 35 34 62 64 61 31 32 2d 0000000754bda12- backtrace: [<000000008162c6f2>] seq_read_iter+0x313/0x440 [<0000000020b1b3e3>] seq_read+0x14b/0x1a0 [<00000000af248fbc>] full_proxy_read+0x56/0x80 [<00000000f97679d1>] vfs_read+0xa5/0x1b0 [<000000000ed8a36f>] ksys_read+0xa0/0xf0 [<00000000e559d671>] do_syscall_64+0x33/0x40 [<000000000ea1fbfd>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 -----------------------------------------------------------------------
I find that we can easily reproduce this problem with the following commands: `cat /sys/kernel/debug/kfence/objects` `echo scan > /sys/kernel/debug/kmemleak` `cat /sys/kernel/debug/kmemleak`
The leaked memory is allocated in the stack below: ---------------------------------- do_syscall_64 do_sys_open do_dentry_open full_proxy_open seq_open ---> alloc seq_file vfs_read full_proxy_read seq_read seq_read_iter traverse ---> alloc seq_buf ----------------------------------
And it should have been released in the following process: ---------------------------------- do_syscall_64 syscall_exit_to_user_mode exit_to_user_mode_prepare task_work_run ____fput __fput full_proxy_release ---> free here ----------------------------------
However, the release function corresponding to file_operations is not implemented in kfence. As a result, a memory leak occurs. Therefore, the solution to this problem is to implement the corresponding release function.
Link: https://lkml.kernel.org/r/20211206133628.2822545-1-libaokun1@huawei.com Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure") Signed-off-by: Baokun Li libaokun1@huawei.com Reported-by: Hulk Robot hulkci@huawei.com Acked-by: Marco Elver elver@google.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Yu Kuai yukuai3@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/kfence/core.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 09945784df9e..a19154a8d196 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -683,6 +683,7 @@ static const struct file_operations objects_fops = { .open = open_objects, .read = seq_read, .llseek = seq_lseek, + .release = seq_release, };
static int __init kfence_debugfs_init(void)
From: Ye Bin yebin10@huawei.com
mainline inclusion from mainline-v5.17 commit 8a7518931baa8ea023700987f3db31cb0a80610b category: bugfix bugzilla: 185819 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
We do test with inject error fault base on v4.19, after test some time we found sync /dev/sda always failed. [root@localhost] sync /dev/sda sync: error syncing '/dev/sda': Input/output error
scsi log as follows: [19069.812296] sd 0:0:0:0: [sda] tag#64 Send: scmd 0x00000000d03a0b6b [19069.812302] sd 0:0:0:0: [sda] tag#64 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00 [19069.812533] sd 0:0:0:0: [sda] tag#64 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK [19069.812536] sd 0:0:0:0: [sda] tag#64 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00 [19069.812539] sd 0:0:0:0: [sda] tag#64 scsi host busy 1 failed 0 [19069.812542] sd 0:0:0:0: Notifying upper driver of completion (result 0) [19069.812546] sd 0:0:0:0: [sda] tag#64 sd_done: completed 0 of 0 bytes [19069.812549] sd 0:0:0:0: [sda] tag#64 0 sectors total, 0 bytes done. [19069.812564] print_req_error: I/O error, dev sda, sector 0
ftrace log as follows: rep-306069 [007] .... 19654.923315: block_bio_queue: 8,0 FWS 0 + 0 [rep] rep-306069 [007] .... 19654.923333: block_getrq: 8,0 FWS 0 + 0 [rep] kworker/7:1H-250 [007] .... 19654.923352: block_rq_issue: 8,0 FF 0 () 0 + 0 [kworker/7:1H] <idle>-0 [007] ..s. 19654.923562: block_rq_complete: 8,0 FF () 18446744073709551615 + 0 [0] <idle>-0 [007] d.s. 19654.923576: block_rq_complete: 8,0 WS () 0 + 0 [-5]
As 8d6996630c03 introduce 'fq->rq_status', this data only update when 'flush_rq' reference count isn't zero. If flush request once failed and record error code in 'fq->rq_status'. If there is no chance to update 'fq->rq_status',then do fsync will always failed. To address this issue reset 'fq->rq_status' after return error code to upper layer.
Fixes: 8d6996630c03("block: fix null pointer dereference in blk_mq_rq_timed_out()") Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Ming Lei ming.lei@redhat.com Link: https://lore.kernel.org/r/20211129012659.1553733-1-yebin10@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk
conflicts: block/blk-flush.c
Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- block/blk-flush.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c index b38839de9408..82919829bc4d 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -235,8 +235,10 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error) * avoiding use-after-free. */ WRITE_ONCE(flush_rq->state, MQ_RQ_IDLE); - if (fq->rq_status != BLK_STS_OK) + if (fq->rq_status != BLK_STS_OK) { error = fq->rq_status; + fq->rq_status = BLK_STS_OK; + }
if (!q->elevator) { flush_rq->tag = BLK_MQ_NO_TAG;
From: Yu Kuai yukuai3@huawei.com
mainline inclusion from mainline-v5.11-rc1 commit acaf523a7bf226b28504306c1cfee194520123b3 category: bugfix bugzilla: 185789 https://gitee.com/openeuler/kernel/issues/I4DDEL Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------
blk_throtl_update_limit_valid() will search for descendants to see if 'LIMIT_LOW' of bps/iops and READ/WRITE is nonzero. However, they're always zero if CONFIG_BLK_DEV_THROTTLING_LOW is not set, furthermore, a lot of time will be wasted to iterate descendants.
Thus do nothing in blk_throtl_update_limit_valid() in such situation.
Signed-off-by: Yu Kuai yukuai3@huawei.com Acked-by: Tejun Heo tj@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Reviewed-by: Yufen Yu yuyufen@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- block/blk-throttle.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 4087dcc13f7d..3db507f33243 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -591,6 +591,7 @@ static void throtl_pd_online(struct blkg_policy_data *pd) tg_update_has_rules(tg); }
+#ifdef CONFIG_BLK_DEV_THROTTLING_LOW static void blk_throtl_update_limit_valid(struct throtl_data *td) { struct cgroup_subsys_state *pos_css; @@ -611,6 +612,11 @@ static void blk_throtl_update_limit_valid(struct throtl_data *td)
td->limit_valid[LIMIT_LOW] = low_valid; } +#else +static inline void blk_throtl_update_limit_valid(struct throtl_data *td) +{ +} +#endif
static void throtl_upgrade_state(struct throtl_data *td); static void throtl_pd_offline(struct blkg_policy_data *pd)
From: Yu Kuai yukuai3@huawei.com
mainline inclusion from mainline-v5.14-rc1 commit a731763fc479a9c64456e0643d0ccf64203100c9 category: bugfix bugzilla: 185790 https://gitee.com/openeuler/kernel/issues/I4DDEL Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------
We run a test that create millions of cgroups and blkgs, and then trigger blkg_destroy_all(). blkg_destroy_all() will hold spin lock for a long time in such situation. Thus release the lock when a batch of blkgs are destroyed.
blkcg_activate_policy() and blkcg_deactivate_policy() might have the same problem, however, as they are basically only called from module init/exit paths, let's leave them alone for now.
Signed-off-by: Yu Kuai yukuai3@huawei.com Acked-by: Tejun Heo tj@kernel.org Link: https://lore.kernel.org/r/20210707015649.1929797-1-yukuai3@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Reviewed-by: Yufen Yu yuyufen@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- block/blk-cgroup.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index bd0c2bec05a8..cebcc00e30f8 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -57,6 +57,8 @@ static LIST_HEAD(all_blkcgs); /* protected by blkcg_pol_mutex */ bool blkcg_debug_stats = false; static struct workqueue_struct *blkcg_punt_bio_wq;
+#define BLKG_DESTROY_BATCH_SIZE 64 + static bool blkcg_policy_enabled(struct request_queue *q, const struct blkcg_policy *pol) { @@ -423,7 +425,9 @@ static void blkg_destroy(struct blkcg_gq *blkg) static void blkg_destroy_all(struct request_queue *q) { struct blkcg_gq *blkg, *n; + int count = BLKG_DESTROY_BATCH_SIZE;
+restart: spin_lock_irq(&q->queue_lock); list_for_each_entry_safe(blkg, n, &q->blkg_list, q_node) { struct blkcg *blkcg = blkg->blkcg; @@ -431,6 +435,17 @@ static void blkg_destroy_all(struct request_queue *q) spin_lock(&blkcg->lock); blkg_destroy(blkg); spin_unlock(&blkcg->lock); + + /* + * in order to avoid holding the spin lock for too long, release + * it when a batch of blkgs are destroyed. + */ + if (!(--count)) { + count = BLKG_DESTROY_BATCH_SIZE; + spin_unlock_irq(&q->queue_lock); + cond_resched(); + goto restart; + } }
q->root_blkg = NULL;
From: yangerkun yangerkun@huawei.com
hulk inclusion category: bugfix bugzilla: 185885 https://gitee.com/openeuler/kernel/issues/I4DDEL
---------------------------
Thers is a out-of-order access problem which would cause endless sleep when we use pipe with epoll.
The story is following, we assume the ring size is 2, the ring head is 1, the ring tail is 0, task0 is write task, task1 is read task, task2 is write task. Task0 Task1 Task2 epoll_ctl(fd, EPOLL_CTL_ADD, ...) pipe_poll() poll_wait() tail = READ_ONCE(pipe->tail); // Re-order and get tail=0 pipe_read tail++ //tail=1 pipe_write head++ //head=2 head = READ_ONCE(pipe->head); // head = 2 check ring is full by head - tail Task0 get head = 2 and tail = 0, so it mistake that the pipe ring is full, then task0 is not add into ready list. If the ring is not full anymore, task0 would not be woken up forever.
The reason of this problem is that we got inconsistent head/tail value of the pipe ring, so we fix the problem by getting them protected.
Signed-off-by: yangerkun yangerkun@huawei.com Signed-off-by: Zhang Xiaoxu zhangxiaoxu5@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/pipe.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/pipe.c b/fs/pipe.c index d6d4019ba32f..f5ae4feb512e 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -669,8 +669,10 @@ pipe_poll(struct file *filp, poll_table *wait) * if something changes and you got it wrong, the poll * table entry will wake you up and fix it. */ + spin_lock_irq(&pipe->rd_wait.lock); head = READ_ONCE(pipe->head); tail = READ_ONCE(pipe->tail); + spin_unlock_irq(&pipe->rd_wait.lock);
mask = 0; if (filp->f_mode & FMODE_READ) {
From: Christoph Hellwig hch@lst.de
mainline inclusion from mainline-v5.11-rc4 commit 01ea173e103edd5ec41acec65b9261b87e123fc2 category: bugfix bugzilla: 185881 https://gitee.com/openeuler/kernel/issues/I4DDEL CVE: CVE-2021-4037
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------------
XFS always inherits the SGID bit if it is set on the parent inode, while the generic inode_init_owner does not do this in a few cases where it can create a possible security problem, see commit 0fa3ecd87848 ("Fix up non-directory creation in SGID directories") for details.
Switch XFS to use the generic helper for the normal path to fix this, just keeping the simple field inheritance open coded for the case of the non-sgid case with the bsdgrpid mount option.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org
conflicts: fs/xfs/xfs_inode.c
Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_inode.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 64b8408af9f1..af54224ebbe7 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -802,6 +802,7 @@ xfs_ialloc( xfs_buf_t **ialloc_context, xfs_inode_t **ipp) { + struct inode *dir = pip ? VFS_I(pip) : NULL; struct xfs_mount *mp = tp->t_mountp; xfs_ino_t ino; xfs_inode_t *ip; @@ -847,18 +848,17 @@ xfs_ialloc( return error; ASSERT(ip != NULL); inode = VFS_I(ip); - inode->i_mode = mode; set_nlink(inode, nlink); - inode->i_uid = current_fsuid(); inode->i_rdev = rdev; ip->i_d.di_projid = prid;
- if (pip && XFS_INHERIT_GID(pip)) { - inode->i_gid = VFS_I(pip)->i_gid; - if ((VFS_I(pip)->i_mode & S_ISGID) && S_ISDIR(mode)) - inode->i_mode |= S_ISGID; + if (dir && !(dir->i_mode & S_ISGID) && + (mp->m_flags & XFS_MOUNT_GRPID)) { + inode->i_uid = current_fsuid(); + inode->i_gid = dir->i_gid; + inode->i_mode = mode; } else { - inode->i_gid = current_fsgid(); + inode_init_owner(inode, dir, mode); }
/*
From: Ye Bin yebin10@huawei.com
hulk inclusion category: bugfix bugzilla: 185875 https://gitee.com/openeuler/kernel/issues/I4DDEL
-----------------------------------------------
We got issue as follows: [ 833.786542] nbd: failed to add new device [ 833.791613] ================================================================== [ 833.794918] BUG: KASAN: use-after-free in blk_mq_free_rqs+0x558/0x6c0 [ 833.798108] Read of size 8 at addr ffff800109b7c288 by task kworker/0:3/113 [ 833.804216] CPU: 0 PID: 113 Comm: kworker/0:3 Kdump: loaded Not tainted 4.19.90 #1 [ 833.807635] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 833.811798] Workqueue: events __blk_release_queue [ 833.815035] Call trace: [ 833.817964] dump_backtrace+0x0/0x3c0 [ 833.821070] show_stack+0x28/0x38 [ 833.824091] dump_stack+0xfc/0x154 [ 833.827042] print_address_description+0x68/0x278 [ 833.830147] kasan_report+0x204/0x330 [ 833.833121] __asan_report_load8_noabort+0x30/0x40 [ 833.836180] blk_mq_free_rqs+0x558/0x6c0 [ 833.839089] blk_mq_sched_tags_teardown+0xf4/0x1c0 [ 833.842035] blk_mq_exit_sched+0x1b8/0x260 [ 833.844878] elevator_exit+0x114/0x148 [ 833.847634] blk_exit_queue+0x68/0xe8 [ 833.850352] __blk_release_queue+0xd0/0x408 [ 833.853113] process_one_work+0x55c/0x10d0 [ 833.855864] worker_thread+0x3d4/0xe30 [ 833.858558] kthread+0x2c8/0x348 [ 833.861139] ret_from_fork+0x10/0x18 [ 833.863714] [ 833.866000] Allocated by task 186531: [ 833.868467] kasan_kmalloc+0xe0/0x190 [ 833.870936] kmem_cache_alloc_trace+0x104/0x218 [ 833.873483] nbd_dev_add+0x54/0x760 [nbd] [ 833.875988] nbd_genl_connect+0x3c4/0x1348 [nbd] [ 833.878591] genl_family_rcv_msg+0x798/0xa10 [ 833.881113] genl_rcv_msg+0xc0/0x170 [ 833.883489] netlink_rcv_skb+0x1b4/0x370 [ 833.885897] genl_rcv+0x40/0x58 [ 833.888225] netlink_unicast+0x4bc/0x660 [ 833.890661] netlink_sendmsg+0x880/0xa60 [ 833.893112] sock_sendmsg+0xb8/0x110 [ 833.895513] ____sys_sendmsg+0x570/0x698 [ 833.897927] ___sys_sendmsg+0x108/0x188 [ 833.900350] __sys_sendmsg+0xe8/0x198 [ 833.900360] __arm64_sys_sendmsg+0x78/0xa8 [ 833.906911] el0_svc_common+0x10c/0x330 [ 833.909289] el0_svc_handler+0x60/0xd0 [ 833.911660] el0_svc+0x8/0x1b0 [ 833.913963] [ 833.916117] Freed by task 186531: [ 833.918445] __kasan_slab_free+0x120/0x228 [ 833.920860] kasan_slab_free+0x10/0x18 [ 833.923193] kfree+0x80/0x1f0 [ 833.925392] nbd_dev_add+0xf0/0x760 [nbd] [ 833.927686] nbd_genl_connect+0x3c4/0x1348 [nbd] [ 833.929989] genl_family_rcv_msg+0x798/0xa10 [ 833.932231] genl_rcv_msg+0xc0/0x170 [ 833.934335] netlink_rcv_skb+0x1b4/0x370 [ 833.936444] genl_rcv+0x40/0x58 [ 833.938460] netlink_unicast+0x4bc/0x660 [ 833.940570] netlink_sendmsg+0x880/0xa60 [ 833.942682] sock_sendmsg+0xb8/0x110 [ 833.944745] ____sys_sendmsg+0x570/0x698 [ 833.946849] ___sys_sendmsg+0x108/0x188 [ 833.948924] __sys_sendmsg+0xe8/0x198 [ 833.950980] __arm64_sys_sendmsg+0x78/0xa8 [ 833.953059] el0_svc_common+0x10c/0x330 [ 833.955121] el0_svc_handler+0x60/0xd0 [ 833.957143] el0_svc+0x8/0x1b0 [ 833.959088] [ 833.960846] The buggy address belongs to the object at ffff800109b7c280 [ 833.960846] which belongs to the cache kmalloc-512 of size 512 [ 833.965502] The buggy address is located 8 bytes inside of [ 833.965502] 512-byte region [ffff800109b7c280, ffff800109b7c480) [ 833.970108] The buggy address belongs to the page: [ 833.972390] page:ffff7e000426df00 count:1 mapcount:0 mapping:ffff8000c0003800 index:0x0 compound_mapcount: 0 [ 833.975269] flags: 0x7ffff0000008100(slab|head) [ 833.977640] raw: 07ffff0000008100 ffff7e00035d1600 0000000200000002 ffff8000c0003800 [ 833.980426] raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000 [ 833.983212] page dumped because: kasan: bad access detected [ 833.985778] [ 833.987935] Memory state around the buggy address: [ 833.990448] ffff800109b7c180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 833.993265] ffff800109b7c200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 833.996097] >ffff800109b7c280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 833.998930] ^ [ 834.001384] ffff800109b7c300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 834.004329] ffff800109b7c380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 834.007264] ==================================================================
As 98fd847a495f commit add check "disk->first_minor", if failed will free tags and finally call put_disk will free request_queue. blk_mq_free_tag_set blk_mq_free_map_and_requests blk_mq_free_rqs
put_disk: __blk_release_queue blk_exit_queue elevator_exit blk_mq_exit_sched blk_mq_sched_tags_teardown blk_mq_free_rqs -->will trigger UAF To address this issue, just move 'disk->first_minor' check at the first in nbd_dev_add.
Fixes:98fd847a495f("nbd: add sanity check for first_minor") Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/block/nbd.c | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 07b06fc6f70e..1bb9f45f2e3d 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -1758,6 +1758,15 @@ static int nbd_dev_add(int index) struct gendisk *disk; struct request_queue *q; int err = -ENOMEM; + int first_minor = index << part_shift; + + /* + * Too big index can cause duplicate creation of sysfs files/links, + * because MKDEV() expect that the max first minor is MINORMASK, or + * index << part_shift can overflow. + */ + if (first_minor < index || first_minor > MINORMASK) + return -EINVAL;
nbd = kzalloc(sizeof(struct nbd_device), GFP_KERNEL); if (!nbd) @@ -1821,18 +1830,7 @@ static int nbd_dev_add(int index) refcount_set(&nbd->refs, 1); INIT_LIST_HEAD(&nbd->list); disk->major = NBD_MAJOR; - - /* - * Too big index can cause duplicate creation of sysfs files/links, - * because MKDEV() expect that the max first minor is MINORMASK, or - * index << part_shift can overflow. - */ - disk->first_minor = index << part_shift; - if (disk->first_minor < index || disk->first_minor > MINORMASK) { - err = -EINVAL; - goto out_free_tags; - } - + disk->first_minor = first_minor; disk->fops = &nbd_fops; disk->private_data = nbd; sprintf(disk->disk_name, "nbd%d", index);
From: Linus Torvalds torvalds@linux-foundation.org
stable inclusion from stable-5.10.84 commit 4baba6ba56eb91a735a027f783cc4b9276b48d5b category: bugfix bugzilla: 185907 https://gitee.com/openeuler/kernel/issues/I4DDEL CVE: CVE-2021-4083
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=l...
-------------------------------------------------
commit 054aa8d439b9185d4f5eb9a90282d1ce74772969 upstream.
Jann Horn points out that there is another possible race wrt Unix domain socket garbage collection, somewhat reminiscent of the one fixed in commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK").
See the extended comment about the garbage collection requirements added to unix_peek_fds() by that commit for details.
The race comes from how we can locklessly look up a file descriptor just as it is in the process of being closed, and with the right artificial timing (Jann added a few strategic 'mdelay(500)' calls to do that), the Unix domain socket garbage collector could see the reference count decrement of the close() happen before fget() took its reference to the file and the file was attached onto a new file descriptor.
This is all (intentionally) correct on the 'struct file *' side, with RCU lookups and lockless reference counting very much part of the design. Getting that reference count out of order isn't a problem per se.
But the garbage collector can get confused by seeing this situation of having seen a file not having any remaining external references and then seeing it being attached to an fd.
In commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK") the fix was to serialize the file descriptor install with the garbage collector by taking and releasing the unix_gc_lock.
That's not really an option here, but since this all happens when we are in the process of looking up a file descriptor, we can instead simply just re-check that the file hasn't been closed in the meantime, and just re-do the lookup if we raced with a concurrent close() of the same file descriptor.
Reported-and-tested-by: Jann Horn jannh@google.com Acked-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Baokun Li libaokun1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/file.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/file.c b/fs/file.c index 59f8d3d82509..e4c168ddd8e7 100644 --- a/fs/file.c +++ b/fs/file.c @@ -883,6 +883,10 @@ static struct file *__fget_files(struct files_struct *files, unsigned int fd, file = NULL; else if (!get_file_rcu_many(file, refs)) goto loop; + else if (__fcheck_files(files, fd) != file) { + fput_many(file, refs); + goto loop; + } } rcu_read_unlock();
From: Zhihao Cheng chengzhihao1@huawei.com
hulk inclusion category: bugfix bugzilla: 185912 https://gitee.com/openeuler/kernel/issues/I4DDEL
---------------------------
There are two states for ubifs writing pages: 1. Dirty, Private 2. Not Dirty, Not Private
There is a third possibility which maybe related to [1] that page is private but not dirty caused by following process:
PA lock(page) ubifs_write_end attach_page_private // set Private __set_page_dirty_nobuffers // set Dirty unlock(page)
write_cache_pages lock(page) clear_page_dirty_for_io(page) // clear Dirty ubifs_writepage write_inode // fail, goto out, following codes are not executed // do_writepage // set_page_writeback // set Writeback // detach_page_private // clear Private // end_page_writeback // clear Writeback out: unlock(page) // Private, Not Dirty
PB ksys_fadvise64_64 generic_fadvise invalidate_inode_page // page is neither Dirty nor Writeback invalidate_complete_page // page_has_private is true try_to_release_page ubifs_releasepage ubifs_assert(c, 0) !!!
Then we may get following assertion failed: UBIFS error (ubi0:0 pid 1492): ubifs_assert_failed [ubifs]: UBIFS assert failed: 0, in fs/ubifs/file.c:1499 UBIFS warning (ubi0:0 pid 1492): ubifs_ro_mode [ubifs]: switched to read-only mode, error -22 CPU: 2 PID: 1492 Comm: aa Not tainted 5.16.0-rc2-00012-g7bb767dee0ba-dirty Call Trace: dump_stack+0x13/0x1b ubifs_ro_mode+0x54/0x60 [ubifs] ubifs_assert_failed+0x4b/0x80 [ubifs] ubifs_releasepage+0x7e/0x1e0 [ubifs] try_to_release_page+0x57/0xe0 invalidate_inode_page+0xfb/0x130 invalidate_mapping_pagevec+0x12/0x20 generic_fadvise+0x303/0x3c0 vfs_fadvise+0x35/0x40 ksys_fadvise64_64+0x4c/0xb0
Jump [2] to find a reproducer.
[1] https://linux-mtd.infradead.narkive.com/NQoBeT1u/patch-rfc-ubifs-fix-assert-... [2] https://bugzilla.kernel.org/show_bug.cgi?id=215357
Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/ubifs/file.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c index df46f2d3ff8b..d89f6a2f184a 100644 --- a/fs/ubifs/file.c +++ b/fs/ubifs/file.c @@ -1031,7 +1031,7 @@ static int ubifs_writepage(struct page *page, struct writeback_control *wbc) if (page->index >= synced_i_size >> PAGE_SHIFT) { err = inode->i_sb->s_op->write_inode(inode, NULL); if (err) - goto out_unlock; + goto out_redirty; /* * The inode has been written, but the write-buffer has * not been synchronized, so in case of an unclean @@ -1059,11 +1059,17 @@ static int ubifs_writepage(struct page *page, struct writeback_control *wbc) if (i_size > synced_i_size) { err = inode->i_sb->s_op->write_inode(inode, NULL); if (err) - goto out_unlock; + goto out_redirty; }
return do_writepage(page, len); - +out_redirty: + /* + * redirty_page_for_writepage() won't call ubifs_dirty_inode() because + * it passes I_DIRTY_PAGES flag while calling __mark_inode_dirty(), so + * there is no need to do space budget for dirty inode. + */ + redirty_page_for_writepage(wbc, page); out_unlock: unlock_page(page); return err;
From: Zhihao Cheng chengzhihao1@huawei.com
hulk inclusion category: bugfix bugzilla: 185912 https://gitee.com/openeuler/kernel/issues/I4DDEL
---------------------------
There are two states for ubifs writing pages: 1. Dirty, Private 2. Not Dirty, Not Private
The normal process cannot go to ubifs_releasepage() which means there exists pages being private but not dirty. Reproducer[1] shows that it could occur (which maybe related to [2]) with following process:
PA PB PC lock(page)[PA] ubifs_write_end attach_page_private // set Private __set_page_dirty_nobuffers // set Dirty unlock(page)
write_cache_pages[PA] lock(page) clear_page_dirty_for_io(page) // clear Dirty ubifs_writepage
do_truncation[PB] truncate_setsize i_size_write(inode, newsize) // newsize = 0
i_size = i_size_read(inode) // i_size = 0 end_index = i_size >> PAGE_SHIFT if (page->index > end_index) goto out // jump out: unlock(page) // Private, Not Dirty
generic_fadvise[PC] lock(page) invalidate_inode_page try_to_release_page ubifs_releasepage ubifs_assert(c, 0) // bad assertion! unlock(page)
Then we may get following assertion failed: UBIFS error (ubi0:0 pid 1683): ubifs_assert_failed [ubifs]: UBIFS assert failed: 0, in fs/ubifs/file.c:1513 UBIFS warning (ubi0:0 pid 1683): ubifs_ro_mode [ubifs]: switched to read-only mode, error -22 CPU: 2 PID: 1683 Comm: aa Not tainted 5.16.0-rc5-00184-g0bca5994cacc-dirty #308 Call Trace: dump_stack+0x13/0x1b ubifs_ro_mode+0x54/0x60 [ubifs] ubifs_assert_failed+0x4b/0x80 [ubifs] ubifs_releasepage+0x67/0x1d0 [ubifs] try_to_release_page+0x57/0xe0 invalidate_inode_page+0xfb/0x130 __invalidate_mapping_pages+0xb9/0x280 invalidate_mapping_pagevec+0x12/0x20 generic_fadvise+0x303/0x3c0 ksys_fadvise64_64+0x4c/0xb0
[1] https://bugzilla.kernel.org/show_bug.cgi?id=215373 [2] https://linux-mtd.infradead.narkive.com/NQoBeT1u/patch-rfc-ubifs-fix-assert-...
Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/ubifs/file.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c index d89f6a2f184a..b35983dff75b 100644 --- a/fs/ubifs/file.c +++ b/fs/ubifs/file.c @@ -1493,14 +1493,23 @@ static int ubifs_releasepage(struct page *page, gfp_t unused_gfp_flags) struct inode *inode = page->mapping->host; struct ubifs_info *c = inode->i_sb->s_fs_info;
- /* - * An attempt to release a dirty page without budgeting for it - should - * not happen. - */ if (PageWriteback(page)) return 0; + + /* + * Page is private but not dirty, weird? There is one condition + * making it happened. ubifs_writepage skipped the page because + * page index beyonds isize (for example. truncated by other + * process named A), then the page is invalidated by fadvise64 + * syscall before being truncated by process A. + */ ubifs_assert(c, PagePrivate(page)); - ubifs_assert(c, 0); + if (PageChecked(page)) + release_new_page_budget(c); + else + release_existing_page_budget(c); + + atomic_long_dec(&c->dirty_pg_cnt); detach_page_private(page); ClearPageChecked(page); return 1;