From: Xingui Yang yangxingui@huawei.com
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6J1SI CVE: NA
---------------------------- When the IO complete and free slot in function slot_complete_v3_hw(), it is possible that sas_dev.list is being traversed elsewhere, and it may trigger a null pointer exception, such as follows: ==>cq thread ==>scsi_eh_6
==>scsi_error_handler() ==>sas_eh_handle_sas_errors() ==>sas_scsi_find_task() ==>lldd_abort_task() ==>slot_complete_v3_hw() ==>hisi_sas_abort_task() ==>hisi_sas_slot_task_free() ==>dereg_device_v3_hw() ==>list_del_init() ==>list_for_each_entry_safe()
[ 7165.434918] sas: Enter sas_scsi_recover_host busy: 32 failed: 32 [ 7165.434926] sas: trying to find task 0x00000000769b5ba5 [ 7165.434927] sas: sas_scsi_find_task: aborting task 0x00000000769b5ba5 [ 7165.434940] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(00000000769b5ba5) aborted [ 7165.434964] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(00000000c9f7aa07) ignored [ 7165.434965] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(00000000e2a1cf01) ignored [ 7165.434968] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 7165.434972] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(0000000022d52d93) ignored [ 7165.434975] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(0000000066a7516c) ignored [ 7165.434976] Mem abort info: [ 7165.434982] ESR = 0x96000004 [ 7165.434991] Exception class = DABT (current EL), IL = 32 bits [ 7165.434992] SET = 0, FnV = 0 [ 7165.434993] EA = 0, S1PTW = 0 [ 7165.434994] Data abort info: [ 7165.434994] ISV = 0, ISS = 0x00000004 [ 7165.434995] CM = 0, WnR = 0 [ 7165.434997] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f29543f2 [ 7165.434998] [0000000000000000] pgd=0000000000000000 [ 7165.435003] Internal error: Oops: 96000004 [#1] SMP [ 7165.439863] Process scsi_eh_6 (pid: 4109, stack limit = 0x00000000c43818d5) [ 7165.468862] pstate: 00c00009 (nzcv daif +PAN +UAO) [ 7165.473637] pc : dereg_device_v3_hw+0x68/0xa8 [hisi_sas_v3_hw] [ 7165.479443] lr : dereg_device_v3_hw+0x2c/0xa8 [hisi_sas_v3_hw] [ 7165.485247] sp : ffff00001d623bc0 [ 7165.488546] x29: ffff00001d623bc0 x28: ffffa027d03b9508 [ 7165.493835] x27: ffff80278ed50af0 x26: ffffa027dd31e0a8 [ 7165.499123] x25: ffffa027d9b27f88 x24: ffffa027d9b209f8 [ 7165.504411] x23: ffffa027c45b0d60 x22: ffff80278ec07c00 [ 7165.509700] x21: 0000000000000008 x20: ffffa027d9b209f8 [ 7165.514988] x19: ffffa027d9b27f88 x18: ffffffffffffffff [ 7165.520276] x17: 0000000000000000 x16: 0000000000000000 [ 7165.525564] x15: ffff0000091d9708 x14: ffff0000093b7dc8 [ 7165.530852] x13: ffff0000093b7a23 x12: 6e7265746e692067 [ 7165.536140] x11: 0000000000000000 x10: 0000000000000bb0 [ 7165.541429] x9 : ffff00001d6238f0 x8 : ffffa027d877af00 [ 7165.546718] x7 : ffffa027d6329600 x6 : ffff7e809f58ca00 [ 7165.552006] x5 : 0000000000001f8a x4 : 000000000000088e [ 7165.557295] x3 : ffffa027d9b27fa8 x2 : 0000000000000000 [ 7165.562583] x1 : 0000000000000000 x0 : 000000003000188e [ 7165.567872] Call trace: [ 7165.570309] dereg_device_v3_hw+0x68/0xa8 [hisi_sas_v3_hw] [ 7165.575775] hisi_sas_abort_task+0x248/0x358 [hisi_sas_main] [ 7165.581415] sas_eh_handle_sas_errors+0x258/0x8e0 [libsas] [ 7165.586876] sas_scsi_recover_host+0x134/0x458 [libsas] [ 7165.592082] scsi_error_handler+0xb4/0x488 [ 7165.596163] kthread+0x134/0x138 [ 7165.599380] ret_from_fork+0x10/0x18 [ 7165.602940] Code: d5033e9f b9000040 aa0103e2 eb03003f (f9400021) [ 7165.609004] kernel fault(0x1) notification starting on CPU 75 [ 7165.700728] ---[ end trace fc042cbbea224efc ]--- [ 7165.705326] Kernel panic - not syncing: Fatal exception
So grab sas_dev lock when traversing the members of sas_dev.list in dereg_device_v3_hw() and hisi_sas_release_tasks() to avoid concurrency of adding and deleting member, which may cause an exception. And when hisi_sas_release_tasks() call hisi_sas_do_release_task() to free slot, the lock cannot be grabbed again in hisi_sas_slot_task_free(), then a bool parameter need_lock is added.
Signed-off-by: Xingui Yang yangxingui@huawei.com Reviewed-by: kang fenglong kangfenglong@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/scsi/hisi_sas/hisi_sas.h | 3 ++- drivers/scsi/hisi_sas/hisi_sas_main.c | 26 +++++++++++++++++--------- drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 2 +- drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 2 +- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 5 ++++- 5 files changed, 25 insertions(+), 13 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h index 533408277156..2880e4e22a73 100644 --- a/drivers/scsi/hisi_sas/hisi_sas.h +++ b/drivers/scsi/hisi_sas/hisi_sas.h @@ -601,7 +601,8 @@ extern void hisi_sas_phy_enable(struct hisi_hba *hisi_hba, int phy_no, extern void hisi_sas_phy_down(struct hisi_hba *hisi_hba, int phy_no, int rdy); extern void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task, - struct hisi_sas_slot *slot); + struct hisi_sas_slot *slot, + bool need_lock); extern void hisi_sas_init_mem(struct hisi_hba *hisi_hba); extern void hisi_sas_rst_work_handler(struct work_struct *work); extern void hisi_sas_sync_rst_work_handler(struct work_struct *work); diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c index e6ccaedcd8b9..b00bf3b0f418 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c @@ -229,7 +229,7 @@ static void hisi_sas_slot_index_init(struct hisi_hba *hisi_hba) }
void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task, - struct hisi_sas_slot *slot) + struct hisi_sas_slot *slot, bool need_lock) { unsigned long flags; int device_id = slot->device_id; @@ -260,9 +260,13 @@ void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task, } }
- spin_lock_irqsave(&sas_dev->lock, flags); - list_del_init(&slot->entry); - spin_unlock_irqrestore(&sas_dev->lock, flags); + if (need_lock) { + spin_lock_irqsave(&sas_dev->lock, flags); + list_del_init(&slot->entry); + spin_unlock_irqrestore(&sas_dev->lock, flags); + } else { + list_del_init(&slot->entry); + }
memset(slot, 0, offsetof(struct hisi_sas_slot, buf));
@@ -989,7 +993,7 @@ static void hisi_sas_port_notify_formed(struct asd_sas_phy *sas_phy) }
static void hisi_sas_do_release_task(struct hisi_hba *hisi_hba, struct sas_task *task, - struct hisi_sas_slot *slot) + struct hisi_sas_slot *slot, bool need_lock) { if (task) { unsigned long flags; @@ -1011,7 +1015,7 @@ static void hisi_sas_do_release_task(struct hisi_hba *hisi_hba, struct sas_task task->task_done(task); }
- hisi_sas_slot_task_free(hisi_hba, task, slot); + hisi_sas_slot_task_free(hisi_hba, task, slot, need_lock); }
static void hisi_sas_release_task(struct hisi_hba *hisi_hba, @@ -1019,9 +1023,13 @@ static void hisi_sas_release_task(struct hisi_hba *hisi_hba, { struct hisi_sas_slot *slot, *slot2; struct hisi_sas_device *sas_dev = device->lldd_dev; + unsigned long flags;
+ spin_lock_irqsave(&sas_dev->lock, flags); list_for_each_entry_safe(slot, slot2, &sas_dev->list, entry) - hisi_sas_do_release_task(hisi_hba, slot->task, slot); + hisi_sas_do_release_task(hisi_hba, slot->task, slot, false); + + spin_unlock_irqrestore(&sas_dev->lock, flags); }
void hisi_sas_release_tasks(struct hisi_hba *hisi_hba) @@ -1701,7 +1709,7 @@ static int hisi_sas_abort_task(struct sas_task *task) */ if (rc == TMF_RESP_FUNC_COMPLETE && rc2 != TMF_RESP_FUNC_SUCC) { if (task->lldd_task) - hisi_sas_do_release_task(hisi_hba, task, slot); + hisi_sas_do_release_task(hisi_hba, task, slot, true); } } else if (task->task_proto & SAS_PROTOCOL_SATA || task->task_proto & SAS_PROTOCOL_STP) { @@ -1722,7 +1730,7 @@ static int hisi_sas_abort_task(struct sas_task *task) */ if ((sas_dev->dev_status == HISI_SAS_DEV_NCQ_ERR) && qc && qc->scsicmd) { - hisi_sas_do_release_task(hisi_hba, task, slot); + hisi_sas_do_release_task(hisi_hba, task, slot, true); rc = TMF_RESP_FUNC_COMPLETE; } else { rc = hisi_sas_softreset_ata_disk(device); diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c index 452665c641a6..6485e2b6456c 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c @@ -1320,7 +1320,7 @@ static int slot_complete_v1_hw(struct hisi_hba *hisi_hba, }
out: - hisi_sas_slot_task_free(hisi_hba, task, slot); + hisi_sas_slot_task_free(hisi_hba, task, slot, true); sts = ts->stat;
if (task->task_done) diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c index 46d274582684..2df70c4873f5 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c @@ -2480,7 +2480,7 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot) } task->task_state_flags |= SAS_TASK_STATE_DONE; spin_unlock_irqrestore(&task->task_state_lock, flags); - hisi_sas_slot_task_free(hisi_hba, task, slot); + hisi_sas_slot_task_free(hisi_hba, task, slot, true);
if (!is_internal && (task->task_proto != SAS_PROTOCOL_SMP)) { spin_lock_irqsave(&device->done_lock, flags); diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index 77f451040c4c..de88dead4b72 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -914,9 +914,11 @@ static void dereg_device_v3_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot, *slot2; struct hisi_sas_device *sas_dev = device->lldd_dev; u32 cfg_abt_set_query_iptt; + unsigned long flags;
cfg_abt_set_query_iptt = hisi_sas_read32(hisi_hba, CFG_ABT_SET_QUERY_IPTT); + spin_lock_irqsave(&sas_dev->lock, flags); list_for_each_entry_safe(slot, slot2, &sas_dev->list, entry) { cfg_abt_set_query_iptt &= ~CFG_SET_ABORTED_IPTT_MSK; cfg_abt_set_query_iptt |= (1 << CFG_SET_ABORTED_EN_OFF) | @@ -924,6 +926,7 @@ static void dereg_device_v3_hw(struct hisi_hba *hisi_hba, hisi_sas_write32(hisi_hba, CFG_ABT_SET_QUERY_IPTT, cfg_abt_set_query_iptt); } + spin_unlock_irqrestore(&sas_dev->lock, flags); cfg_abt_set_query_iptt &= ~(1 << CFG_SET_ABORTED_EN_OFF); hisi_sas_write32(hisi_hba, CFG_ABT_SET_QUERY_IPTT, cfg_abt_set_query_iptt); @@ -2558,7 +2561,7 @@ slot_complete_v3_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot) } task->task_state_flags |= SAS_TASK_STATE_DONE; spin_unlock_irqrestore(&task->task_state_lock, flags); - hisi_sas_slot_task_free(hisi_hba, task, slot); + hisi_sas_slot_task_free(hisi_hba, task, slot, true);
if (!is_internal && (task->task_proto != SAS_PROTOCOL_SMP)) { spin_lock_irqsave(&device->done_lock, flags);
From: Xingui Yang yangxingui@huawei.com
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6J25G CVE: NA
---------------------------- If an NCQ error occurs when the IPTT is valid and slot->abort flag is set in completion path, sas_task_abort() will be called to abort only one NCQ command now, and the host would be set to SHOST_RECOVERY state. But this may not kick-off EH Immediately until other outstanding QCs timeouts. As a result, the host may remain in the SHOST_RECOVERY state for up to 30 seconds, such as follows:
[7972317.645234] hisi_sas_v3_hw 0000:74:04.0: erroneous completion iptt=3264 task=00000000466116b8 dev id=2 sas_addr=0x5000000000000502 CQ hdr: 0x1883 0x20cc0 0x40000 0x20420000 Error info: 0x0 0x0 0x200000 0x0 [7972341.508264] sas: Enter sas_scsi_recover_host busy: 32 failed: 32 [7972341.984731] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 32 tries: 1
So all NCQ commands that are in the queue should be aborted when an NCQ error occurs In this scenario.
Signed-off-by: Xingui Yang yangxingui@huawei.com Reviewed-by: kang fenglong kangfenglong@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 68 ++++++++++++++++++-------- 1 file changed, 47 insertions(+), 21 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index de88dead4b72..9d840538aea8 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -2268,6 +2268,22 @@ static bool is_ncq_err(struct hisi_sas_complete_v3_hdr *complete_hdr) (dw3 & FIS_ATA_STATUS_ERR); }
+static void hisi_sas_ata_device_link_abort(struct domain_device *device) +{ + struct ata_port *ap = device->sata_dev.ap; + struct ata_link *link = &ap->link; + unsigned long flags; + + spin_lock_irqsave(ap->lock, flags); + device->sata_dev.fis[2] = ATA_ERR | ATA_DRDY; /* tf status */ + device->sata_dev.fis[3] = ATA_ABORTED; /* tf error */ + + link->eh_info.err_mask |= AC_ERR_DEV; + link->eh_info.action |= ATA_EH_RESET; + ata_link_abort(link); + spin_unlock_irqrestore(ap->lock, flags); +} + static void set_aborted_iptt(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot) { @@ -2490,7 +2506,11 @@ slot_complete_v3_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot) iu->resp_data[0]); } if (unlikely(slot->abort)) { - sas_task_abort(task); + if (dev_is_sata(device) && task->ata_task.use_ncq) + hisi_sas_ata_device_link_abort(device); + else + sas_task_abort(task); + return ts->stat; } goto out; @@ -2580,6 +2600,31 @@ slot_complete_v3_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot) return sts; }
+static void hisi_sas_disk_err_handler(struct hisi_hba *hisi_hba, + struct hisi_sas_complete_v3_hdr *complete_hdr) +{ + u32 dw0 = le32_to_cpu(complete_hdr->dw0); + u32 dw1 = le32_to_cpu(complete_hdr->dw1); + u32 dw3 = le32_to_cpu(complete_hdr->dw3); + int device_id = (dw1 & CMPLT_HDR_DEV_ID_MSK) >> + CMPLT_HDR_DEV_ID_OFF; + struct hisi_sas_itct *itct = + &hisi_hba->itct[device_id]; + struct hisi_sas_device *sas_dev = + &hisi_hba->devices[device_id]; + struct domain_device *device = sas_dev->sas_device; + struct device *dev = hisi_hba->dev; + + dev_err(dev, "erroneous completion disk err dev id=%d sas_addr=0x%llx CQ hdr: 0x%x 0x%x 0x%x 0x%x\n", + device_id, itct->sas_addr, dw0, dw1, + complete_hdr->act, dw3); + + if (is_ncq_err(complete_hdr)) + sas_dev->dev_status = HISI_SAS_DEV_NCQ_ERR; + + hisi_sas_ata_device_link_abort(device); +} + static void cq_tasklet_v3_hw(unsigned long val) { struct hisi_sas_cq *cq = (struct hisi_sas_cq *)val; @@ -2608,26 +2653,7 @@ static void cq_tasklet_v3_hw(unsigned long val)
if (unlikely((dw0 & CMPLT_HDR_CMPLT_MSK) == 0x3) && (dw3 & CMPLT_HDR_SATA_DISK_ERR_MSK)) { - int device_id = (dw1 & CMPLT_HDR_DEV_ID_MSK) >> - CMPLT_HDR_DEV_ID_OFF; - struct hisi_sas_itct *itct = - &hisi_hba->itct[device_id]; - struct hisi_sas_device *sas_dev = - &hisi_hba->devices[device_id]; - struct domain_device *device = sas_dev->sas_device; - struct ata_port *ap = device->sata_dev.ap; - struct ata_link *link = &ap->link; - - dev_err(dev, "erroneous completion disk err dev id=%d sas_addr=0x%llx CQ hdr: 0x%x 0x%x 0x%x 0x%x\n", - device_id, itct->sas_addr, dw0, dw1, - complete_hdr->act, dw3); - - if (is_ncq_err(complete_hdr)) - sas_dev->dev_status = HISI_SAS_DEV_NCQ_ERR; - - link->eh_info.err_mask |= AC_ERR_DEV; - link->eh_info.action |= ATA_EH_RESET; - ata_link_abort(link); + hisi_sas_disk_err_handler(hisi_hba, complete_hdr); } else if (likely(iptt < HISI_SAS_COMMAND_ENTRIES_V3_HW)) { slot = &hisi_hba->slot_info[iptt]; slot->cmplt_queue_slot = rd_point;
From: Xingui Yang yangxingui@huawei.com
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6J2FQ CVE: NA
-----------------------------------------
The channel int0 flag is not cleared when exit interrupt, and the OOB interrupt is triggered by other PHYs. The previously uncleared interrupt will continues to be triggered as follows:
hisi_sas_v3_hw 0000:b4:02.0: phydown: phy2 phy_state=0xfb hisi_sas_v3_hw 0000:b4:02.0: ignore flutter phy2 down hisi_sas_v3_hw 0000:b4:02.0: phy2 OOB ready hisi_sas_v3_hw 0000:b4:02.0: phy2 phy_attached! hisi_sas_v3_hw 0000:b4:02.0: phyup: phy2 link_rate=10(sata)
hisi_sas_v3_hw 0000:b4:02.0: phydown: phy3 phy_state=0xf7 hisi_sas_v3_hw 0000:b4:02.0: ignore flutter phy3 down hisi_sas_v3_hw 0000:b4:02.0: phy2 OOB ready hisi_sas_v3_hw 0000:b4:02.0: phy2 phy_attached! hisi_sas_v3_hw 0000:b4:02.0: phy3 OOB ready hisi_sas_v3_hw 0000:b4:02.0: phy3 phy_attached!
So clear interrupt status when exiting channel int0 for v3 hw.
Signed-off-by: Xingui Yang yangxingui@huawei.com Reviewed-by: kang fenglong kangfenglong@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index 9d840538aea8..5e2488fd2466 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -1937,7 +1937,7 @@ static void handle_chl_int0_v3_hw(struct hisi_hba *hisi_hba, int phy_no)
dev_dbg(dev, "phy%d OOB ready\n", phy_no); if (phy->phy_attached) - return; + goto out;
if (!timer_pending(&phy->timer)) { phy->timer.function = wait_phyup_timedout_v3_hw; @@ -1947,6 +1947,7 @@ static void handle_chl_int0_v3_hw(struct hisi_hba *hisi_hba, int phy_no) } }
+out: hisi_sas_phy_write32(hisi_hba, phy_no, CHL_INT0, irq_value0 & (~CHL_INT0_SL_RX_BCST_ACK_MSK) & (~CHL_INT0_SL_PHY_ENABLE_MSK)
From: "Paul E. McKenney" paulmck@kernel.org
mainline inclusion from mainline-v5.13-rc1 commit 1c0c4bc1ceb580851b2d76fdef9712b3bdae134b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6O1UD CVE: NA
--------------------------------
If there is heavy softirq activity, the softirq system will attempt to awaken ksoftirqd and will stop the traditional back-of-interrupt softirq processing. This is all well and good, but only if the ksoftirqd kthreads already exist, which is not the case during early boot, in which case the system hangs.
One reproducer is as follows:
tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 2 --configs "TREE03" --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y CONFIG_NO_HZ_IDLE=y CONFIG_HZ_PERIODIC=n" --bootargs "threadirqs=1" --trust-make
This commit therefore adds a couple of existence checks for ksoftirqd and forces back-of-interrupt softirq processing when ksoftirqd does not yet exist. With this change, the above test passes.
Reported-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Reported-by: Uladzislau Rezki urezki@gmail.com Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de [ paulmck: Remove unneeded check per Sebastian Siewior feedback. ] Reviewed-by: Frederic Weisbecker frederic@kernel.org Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Lin Yujun linyujun809@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- kernel/softirq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c index 6f584861d329..4daab24bd4e2 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -362,7 +362,7 @@ static inline void invoke_softirq(void) if (ksoftirqd_running(local_softirq_pending())) return;
- if (!force_irqthreads) { + if (!force_irqthreads || !__this_cpu_read(ksoftirqd)) { #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK /* * We can safely execute softirq on the current stack if
From: Thomas Gleixner tglx@linutronix.de
mainline inclusion from mainline-v5.12-rc4 commit 81e2073c175b887398e5bca6c004efa89983f58d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6O1UD CVE: NA
--------------------------------
With interrupt force threading all device interrupt handlers are invoked from kernel threads. Contrary to hard interrupt context the invocation only disables bottom halfs, but not interrupts. This was an oversight back then because any code like this will have an issue:
thread(irq_A) irq_handler(A) spin_lock(&foo->lock);
interrupt(irq_B) irq_handler(B) spin_lock(&foo->lock);
This has been triggered with networking (NAPI vs. hrtimers) and console drivers where printk() happens from an interrupt which interrupted the force threaded handler.
Now people noticed and started to change the spin_lock() in the handler to spin_lock_irqsave() which affects performance or add IRQF_NOTHREAD to the interrupt request which in turn breaks RT.
Fix the root cause and not the symptom and disable interrupts before invoking the force threaded handler which preserves the regular semantics and the usefulness of the interrupt force threading as a general debugging tool.
For not RT this is not changing much, except that during the execution of the threaded handler interrupts are delayed until the handler returns. Vs. scheduling and softirq processing there is no difference.
For RT kernels there is no issue.
Fixes: 8d32a307e4fa ("genirq: Provide forced interrupt threading") Reported-by: Johan Hovold johan@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Johan Hovold johan@kernel.org Acked-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Link: https://lore.kernel.org/r/20210317143859.513307808@linutronix.de Signed-off-by: Lin Yujun linyujun809@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- kernel/irq/manage.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index 213f88f7fdfb..fc180f45f3a0 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -1030,11 +1030,15 @@ irq_forced_thread_fn(struct irq_desc *desc, struct irqaction *action) irqreturn_t ret;
local_bh_disable(); + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + local_irq_disable(); ret = action->thread_fn(action->irq, action->dev_id); if (ret == IRQ_HANDLED) atomic_inc(&desc->threads_handled);
irq_finalize_oneshot(desc, action); + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + local_irq_enable(); local_bh_enable(); return ret; }
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
mainline inclusion from mainline-v5.13-rc1 commit c5e3a41187ac01425f5ad1abce927905e4ac44e4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6O1UD CVE: NA
--------------------------------
KMSAN complains that new_value at cpumask_parse_user() from write_irq_affinity() from irq_affinity_proc_write() is uninitialized.
[ 148.133411][ T5509] ===================================================== [ 148.135383][ T5509] BUG: KMSAN: uninit-value in find_next_bit+0x325/0x340 [ 148.137819][ T5509] [ 148.138448][ T5509] Local variable ----new_value.i@irq_affinity_proc_write created at: [ 148.140768][ T5509] irq_affinity_proc_write+0xc3/0x3d0 [ 148.142298][ T5509] irq_affinity_proc_write+0xc3/0x3d0 [ 148.143823][ T5509] =====================================================
Since bitmap_parse() from cpumask_parse_user() calls find_next_bit(), any alloc_cpumask_var() + cpumask_parse_user() sequence has possibility that find_next_bit() accesses uninitialized cpu mask variable. Fix this problem by replacing alloc_cpumask_var() with zalloc_cpumask_var().
Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Steven Rostedt (VMware) rostedt@goodmis.org Link: https://lore.kernel.org/r/20210401055823.3929-1-penguin-kernel@I-love.SAKURA... Signed-off-by: Lin Yujun linyujun809@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- kernel/irq/proc.c | 4 ++-- kernel/profile.c | 2 +- kernel/trace/trace.c | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c index a914f1772c62..8bd25bba99b1 100644 --- a/kernel/irq/proc.c +++ b/kernel/irq/proc.c @@ -148,7 +148,7 @@ static ssize_t write_irq_affinity(int type, struct file *file, if (!irq_can_set_affinity_usr(irq) || no_irq_affinity) return -EIO;
- if (!alloc_cpumask_var(&new_value, GFP_KERNEL)) + if (!zalloc_cpumask_var(&new_value, GFP_KERNEL)) return -ENOMEM;
if (type) @@ -247,7 +247,7 @@ static ssize_t default_affinity_write(struct file *file, cpumask_var_t new_value; int err;
- if (!alloc_cpumask_var(&new_value, GFP_KERNEL)) + if (!zalloc_cpumask_var(&new_value, GFP_KERNEL)) return -ENOMEM;
err = cpumask_parse_user(buffer, count, new_value); diff --git a/kernel/profile.c b/kernel/profile.c index 7fc621404230..ca466307db9b 100644 --- a/kernel/profile.c +++ b/kernel/profile.c @@ -437,7 +437,7 @@ static ssize_t prof_cpu_mask_proc_write(struct file *file, cpumask_var_t new_value; int err;
- if (!alloc_cpumask_var(&new_value, GFP_KERNEL)) + if (!zalloc_cpumask_var(&new_value, GFP_KERNEL)) return -ENOMEM;
err = cpumask_parse_user(buffer, count, new_value); diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 9daaee7a9dd2..1e1781ace916 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4231,7 +4231,7 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf, cpumask_var_t tracing_cpumask_new; int err, cpu;
- if (!alloc_cpumask_var(&tracing_cpumask_new, GFP_KERNEL)) + if (!zalloc_cpumask_var(&tracing_cpumask_new, GFP_KERNEL)) return -ENOMEM;
err = cpumask_parse_user(ubuf, count, tracing_cpumask_new);
From: James Morse james.morse@arm.com
stable inclusion from stable-v4.19.264 commit 8f513afabebc46900120badb3d4e858395b6f08f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6O1UD CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 44b3834b2eed595af07021b1c64e6f9bc396398b upstream.
Cortex-A57 and Cortex-A72 have an erratum where an interrupt that occurs between a pair of AES instructions in aarch32 mode may corrupt the ELR. The task will subsequently produce the wrong AES result.
The AES instructions are part of the cryptographic extensions, which are optional. User-space software will detect the support for these instructions from the hwcaps. If the platform doesn't support these instructions a software implementation should be used.
Remove the hwcap bits on affected parts to indicate user-space should not use the AES instructions.
Acked-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: James Morse james.morse@arm.com Link: https://lore.kernel.org/r/20220714161523.279570-3-james.morse@arm.com Signed-off-by: Will Deacon will@kernel.org [florian: resolved conflicts in arch/arm64/tools/cpucaps and cpu_errata.c] Signed-off-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
conflicts: arch/arm64/include/asm/cpucaps.h arch/arm64/kernel/cpu_errata.c arch/arm64/kernel/cpufeature.c
Signed-off-by: Lin Yujun linyujun809@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- Documentation/arm64/silicon-errata.txt | 2 ++ arch/arm64/Kconfig | 16 ++++++++++++++++ arch/arm64/include/asm/cpucaps.h | 1 + arch/arm64/kernel/cpu_errata.c | 17 +++++++++++++++++ arch/arm64/kernel/cpufeature.c | 13 ++++++++++++- 5 files changed, 48 insertions(+), 1 deletion(-)
diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt index 553e6aff3862..5d97fd9f1ccb 100644 --- a/Documentation/arm64/silicon-errata.txt +++ b/Documentation/arm64/silicon-errata.txt @@ -55,7 +55,9 @@ stable kernels. | ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 | | ARM | Cortex-A57 | #852523 | N/A | | ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 | +| ARM | Cortex-A57 | #1742098 | ARM64_ERRATUM_1742098 | | ARM | Cortex-A72 | #853709 | N/A | +| ARM | Cortex-A72 | #1655431 | ARM64_ERRATUM_1742098 | | ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 | | ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 | | ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 | diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index e24792913bfb..88b8031a93b2 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -512,6 +512,22 @@ config ARM64_ERRATUM_1463225
If unsure, say Y.
+config ARM64_ERRATUM_1742098 + bool "Cortex-A57/A72: 1742098: ELR recorded incorrectly on interrupt taken between cryptographic instructions in a sequence" + depends on COMPAT + default y + help + This option removes the AES hwcap for aarch32 user-space to + workaround erratum 1742098 on Cortex-A57 and Cortex-A72. + + Affected parts may corrupt the AES state if an interrupt is + taken between a pair of AES instructions. These instructions + are only present if the cryptography extensions are present. + All software should have a fallback implementation for CPUs + that don't implement the cryptography extensions. + + If unsure, say Y. + config CAVIUM_ERRATUM_22375 bool "Cavium erratum 22375, 24313" default y diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h index f6d2ed5341a4..5edcd00ee6df 100644 --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -65,5 +65,6 @@ #endif
#define ARM64_SPECTRE_BHB 40 +#define ARM64_WORKAROUND_1742098 41
#endif /* __ASM_CPUCAPS_H */ diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index afa72ccbf3ca..bae255cf5ab5 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -769,6 +769,15 @@ static const struct midr_range arm64_harden_el2_vectors[] = {
#endif
+#ifdef CONFIG_ARM64_ERRATUM_1742098 +static struct midr_range broken_aarch32_aes[] = { + MIDR_RANGE(MIDR_CORTEX_A57, 0, 1, 0xf, 0xf), + MIDR_ALL_VERSIONS(MIDR_CORTEX_A72), + {}, +}; +#endif + + const struct arm64_cpu_capabilities arm64_errata[] = { #if defined(CONFIG_ARM64_ERRATUM_826319) || \ defined(CONFIG_ARM64_ERRATUM_827319) || \ @@ -968,6 +977,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = { ERRATA_MIDR_RANGE_LIST(tx2_family_cpus), .matches = needs_tx2_tvm_workaround, }, +#endif +#ifdef CONFIG_ARM64_ERRATUM_1742098 + { + .desc = "ARM erratum 1742098", + .capability = ARM64_WORKAROUND_1742098, + CAP_MIDR_RANGE_LIST(broken_aarch32_aes), + .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM, + }, #endif { } diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index d7d8a4ab94b4..1c93cc3f7692 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -31,6 +31,7 @@ #include <asm/cpufeature.h> #include <asm/cpu_ops.h> #include <asm/fpsimd.h> +#include <asm/hwcap.h> #include <asm/mmu_context.h> #include <asm/processor.h> #include <asm/sysreg.h> @@ -1279,6 +1280,14 @@ static bool can_use_gic_priorities(const struct arm64_cpu_capabilities *entry, } #endif
+static void elf_hwcap_fixup(void) +{ +#ifdef CONFIG_ARM64_ERRATUM_1742098 + if (cpus_have_const_cap(ARM64_WORKAROUND_1742098)) + a32_elf_hwcap2 &= ~COMPAT_HWCAP2_AES; +#endif /* ARM64_ERRATUM_1742098 */ +} + static const struct arm64_cpu_capabilities arm64_features[] = { { .desc = "GIC system register CPU interface", @@ -1972,8 +1981,10 @@ void __init setup_cpu_features(void) mark_const_caps_ready(); setup_elf_hwcaps(arm64_elf_hwcaps);
- if (system_supports_32bit_el0()) + if (system_supports_32bit_el0()) { setup_elf_hwcaps(a32_elf_hwcaps); + elf_hwcap_fixup(); + }
if (system_uses_ttbr0_pan()) pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n");
From: Sven Schnelle svens@linux.ibm.com
mainline inclusion from mainline-v6.3-rc1 commit db4df8e9d79e7d37732c1a1b560958e8dadfefa1 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6Q4F0 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When specifying an invalid console= device like console=tty3270, tty_driver_lookup_tty() returns the tty struct without checking whether index is a valid number.
To reproduce:
qemu-system-x86_64 -enable-kvm -nographic -serial mon:stdio \ -kernel ../linux-build-x86/arch/x86/boot/bzImage \ -append "console=ttyS0 console=tty3270"
This crashes with:
[ 0.770599] BUG: kernel NULL pointer dereference, address: 00000000000000ef [ 0.771265] #PF: supervisor read access in kernel mode [ 0.771773] #PF: error_code(0x0000) - not-present page [ 0.772609] Oops: 0000 [#1] PREEMPT SMP PTI [ 0.774878] RIP: 0010:tty_open+0x268/0x6f0 [ 0.784013] chrdev_open+0xbd/0x230 [ 0.784444] ? cdev_device_add+0x80/0x80 [ 0.784920] do_dentry_open+0x1e0/0x410 [ 0.785389] path_openat+0xca9/0x1050 [ 0.785813] do_filp_open+0xaa/0x150 [ 0.786240] file_open_name+0x133/0x1b0 [ 0.786746] filp_open+0x27/0x50 [ 0.787244] console_on_rootfs+0x14/0x4d [ 0.787800] kernel_init_freeable+0x1e4/0x20d [ 0.788383] ? rest_init+0xc0/0xc0 [ 0.788881] kernel_init+0x11/0x120 [ 0.789356] ret_from_fork+0x22/0x30
Signed-off-by: Sven Schnelle svens@linux.ibm.com Reviewed-by: Jiri Slaby jirislaby@kernel.org Link: https://lore.kernel.org/r/20221209112737.3222509-2-svens@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yi Yang yiyang13@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Reviewed-by: Cai Xinchen caixinchen1@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/tty/tty_io.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 708be6258609..e7917270ef61 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -1155,14 +1155,16 @@ static struct tty_struct *tty_driver_lookup_tty(struct tty_driver *driver, { struct tty_struct *tty;
- if (driver->ops->lookup) + if (driver->ops->lookup) { if (!file) tty = ERR_PTR(-EIO); else tty = driver->ops->lookup(driver, file, idx); - else + } else { + if (idx >= driver->num) + return ERR_PTR(-EINVAL); tty = driver->ttys[idx]; - + } if (!IS_ERR(tty)) tty_kref_get(tty); return tty;
From: Peng Liu liupeng256@huawei.com
mainline inclusion from mainline-v5.19-rc1 commit 0a7a0f6f7f3679c906fc55e3805c1d5e2c566f55 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6PMWS CVE: NA
--------------------------------
Patch series "hugetlb: Fix some incorrect behavior", v3.
This series fix three bugs of hugetlb: 1) Invalid use of nr_online_nodes; 2) Inconsistency between 1G hugepage and 2M hugepage; 3) Useless information in dmesg.
This patch (of 4):
Certain systems are designed to have sparse/discontiguous nodes. In this case, nr_online_nodes can not be used to walk through numa node. Also, a valid node may be greater than nr_online_nodes.
However, in hugetlb, it is assumed that nodes are contiguous.
For sparse/discontiguous nodes, the current code may treat a valid node as invalid, and will fail to allocate all hugepages on a valid node that "nid >= nr_online_nodes".
As David suggested:
if (tmp >= nr_online_nodes) goto invalid;
Just imagine node 0 and node 2 are online, and node 1 is offline. Assuming that "node < 2" is valid is wrong.
Recheck all the places that use nr_online_nodes, and repair them one by one.
[liupeng256@huawei.com: v4] Link: https://lkml.kernel.org/r/20220416103526.3287348-1-liupeng256@huawei.com Link: https://lkml.kernel.org/r/20220413032915.251254-1-liupeng256@huawei.com Link: https://lkml.kernel.org/r/20220413032915.251254-2-liupeng256@huawei.com Fixes: 4178158ef8ca ("hugetlbfs: fix issue of preallocation of gigantic pages can't work") Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation") Fixes: e79ce9832316 ("hugetlbfs: fix a truncation issue in hugepages parameter") Fixes: f9317f77a6e0 ("hugetlb: clean up potential spectre issue warnings") Signed-off-by: Peng Liu liupeng256@huawei.com Suggested-by: David Hildenbrand david@redhat.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Reviewed-by: Davidlohr Bueso dave@stgolabs.net Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Acked-by: David Hildenbrand david@redhat.com Cc: Zhenguo Yao yaozhenguo1@gmail.com Cc: Muchun Song songmuchun@bytedance.com Cc: Liu Yuntao liuyuntao10@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Conflicts: mm/hugetlb.c Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: tong tiangen tongtiangen@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- mm/hugetlb.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index a468d94bc16a..6b440d01b226 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2514,9 +2514,6 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid) struct huge_bootmem_page *m = NULL; /* initialize for clang */ int nr_nodes, node;
- if (nid != NUMA_NO_NODE && nid >= nr_online_nodes) - return 0; - if (!huge_page_limit_check(HUGE_PAGE_BOOTMEM_ALLOC, huge_page_size(h), nid)) return 0;
@@ -2624,7 +2621,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h) bool node_specific_alloc = false;
/* do node specific alloc */ - for (i = 0; i < nr_online_nodes; i++) { + for_each_online_node(i) { if (h->max_huge_pages_node[i] > 0) { hugetlb_hstate_alloc_pages_onenode(h, i); node_specific_alloc = true; @@ -4243,7 +4240,7 @@ static int __init hugetlb_init(void) if (!default_hstate.max_huge_pages) { default_hstate.max_huge_pages = default_hstate_max_huge_pages;
- for (i = 0; i < nr_online_nodes; i++) + for_each_online_node(i) default_hstate.max_huge_pages_node[i] = default_hugepages_in_node[i]; } @@ -4367,7 +4364,7 @@ static int __init hugetlb_nrpages_setup(char *s) pr_warn("HugeTLB: architecture can't support node specific alloc, ignoring!\n"); return 0; } - if (tmp >= nr_online_nodes) + if (tmp >= MAX_NUMNODES || !node_online(tmp)) goto invalid; node = tmp; p += count + 1;
From: Peng Liu liupeng256@huawei.com
mainline inclusion from mainline-v5.19-rc1 commit f87442f407af80dac4dc81c8a7772b71b36b2e09 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6PMWS CVE: NA
--------------------------------
Hugepages can be specified to pernode since "hugetlbfs: extend the definition of hugepages parameter to support node allocation", but the following problem is observed.
Confusing behavior is observed when both 1G and 2M hugepage is set after "numa=off". cmdline hugepage settings: hugepagesz=1G hugepages=0:3,1:3 hugepagesz=2M hugepages=0:1024,1:1024 results: HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages
Furthermore, confusing behavior can be also observed when an invalid node behind a valid node. To fix this, never allocate any typical hugepage when an invalid parameter is received.
Link: https://lkml.kernel.org/r/20220413032915.251254-3-liupeng256@huawei.com Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation") Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Cc: Baolin Wang baolin.wang@linux.alibaba.com Cc: David Hildenbrand david@redhat.com Cc: Liu Yuntao liuyuntao10@huawei.com Cc: Muchun Song songmuchun@bytedance.com Cc: Zhenguo Yao yaozhenguo1@gmail.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Conflicts: mm/hugetlb.c Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: tong tiangen tongtiangen@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- mm/hugetlb.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6b440d01b226..79d5d2cab271 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4325,6 +4325,19 @@ bool __init __weak hugetlb_node_alloc_supported(void) return true; }
+static void __init hugepages_clear_pages_in_node(void) +{ + if (!hugetlb_max_hstate) { + default_hstate_max_huge_pages = 0; + memset(default_hugepages_in_node, 0, + MAX_NUMNODES * sizeof(unsigned int)); + } else { + parsed_hstate->max_huge_pages = 0; + memset(parsed_hstate->max_huge_pages_node, 0, + MAX_NUMNODES * sizeof(unsigned int)); + } +} + static int __init hugetlb_nrpages_setup(char *s) { unsigned long *mhp; @@ -4403,6 +4416,7 @@ static int __init hugetlb_nrpages_setup(char *s)
invalid: pr_warn("HugeTLB: Invalid hugepages parameter %s\n", p); + hugepages_clear_pages_in_node(); return 0; } __setup("hugepages=", hugetlb_nrpages_setup);