From: Jie Hai haijie1@huawei.com
stable inclusion from stable-v5.10.150 commit af12e209a9d559394d35875ba0e6c80407605888 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0XA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 2cbb95883c990d0002a77e13d3278913ab26ad79 ]
When we get a DMA channel and try to use it in multiple threads it will cause oops and hanging the system.
% echo 100 > /sys/module/dmatest/parameters/threads_per_chan % echo 100 > /sys/module/dmatest/parameters/iterations % echo 1 > /sys/module/dmatest/parameters/run [383493.327077] Unable to handle kernel paging request at virtual address dead000000000108 [383493.335103] Mem abort info: [383493.335103] ESR = 0x96000044 [383493.335105] EC = 0x25: DABT (current EL), IL = 32 bits [383493.335107] SET = 0, FnV = 0 [383493.335108] EA = 0, S1PTW = 0 [383493.335109] FSC = 0x04: level 0 translation fault [383493.335110] Data abort info: [383493.335111] ISV = 0, ISS = 0x00000044 [383493.364739] CM = 0, WnR = 1 [383493.367793] [dead000000000108] address between user and kernel address ranges [383493.375021] Internal error: Oops: 96000044 [#1] PREEMPT SMP [383493.437574] CPU: 63 PID: 27895 Comm: dma0chan0-copy2 Kdump: loaded Tainted: GO 5.17.0-rc4+ #2 [383493.457851] pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [383493.465331] pc : vchan_tx_submit+0x64/0xa0 [383493.469957] lr : vchan_tx_submit+0x34/0xa0
This occurs because the transmission timed out, and that's due to data race. Each thread rewrite channels's descriptor as soon as device_issue_pending is called. It leads to the situation that the driver thinks that it uses the right descriptor in interrupt handler while channels's descriptor has been changed by other thread. The descriptor which in fact reported interrupt will not be handled any more, as well as its tx->callback. That's why timeout reports.
With current fixes channels' descriptor changes it's value only when it has been used. A new descriptor is acquired from vc->desc_issued queue that is already filled with descriptors that are ready to be sent. Threads have no direct access to DMA channel descriptor. In case of channel's descriptor is busy, try to submit to HW again when a descriptor is completed. In this case, vc->desc_issued may be empty when hisi_dma_start_transfer is called, so delete error reporting on this. Now it is just possible to queue a descriptor for further processing.
Fixes: e9f08b65250d ("dmaengine: hisilicon: Add Kunpeng DMA engine support") Signed-off-by: Jie Hai haijie1@huawei.com Acked-by: Zhou Wang wangzhou1@hisilicon.com Link: https://lore.kernel.org/r/20220830062251.52993-4-haijie1@huawei.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/dma/hisi_dma.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/dma/hisi_dma.c b/drivers/dma/hisi_dma.c index 08ec90dd4c46..8f1651367310 100644 --- a/drivers/dma/hisi_dma.c +++ b/drivers/dma/hisi_dma.c @@ -276,7 +276,6 @@ static void hisi_dma_start_transfer(struct hisi_dma_chan *chan)
vd = vchan_next_desc(&chan->vc); if (!vd) { - dev_err(&hdma_dev->pdev->dev, "no issued task!\n"); chan->desc = NULL; return; } @@ -308,7 +307,7 @@ static void hisi_dma_issue_pending(struct dma_chan *c)
spin_lock_irqsave(&chan->vc.lock, flags);
- if (vchan_issue_pending(&chan->vc)) + if (vchan_issue_pending(&chan->vc) && !chan->desc) hisi_dma_start_transfer(chan);
spin_unlock_irqrestore(&chan->vc.lock, flags); @@ -447,11 +446,10 @@ static irqreturn_t hisi_dma_irq(int irq, void *data) chan->qp_num, chan->cq_head); if (FIELD_GET(STATUS_MASK, cqe->w0) == STATUS_SUCC) { vchan_cookie_complete(&desc->vd); + hisi_dma_start_transfer(chan); } else { dev_err(&hdma_dev->pdev->dev, "task error!\n"); } - - chan->desc = NULL; }
spin_unlock_irqrestore(&chan->vc.lock, flags);