[Linuxarm] Re: [PATCH v3 3/3] iommu/iopf: add timeout function to iommu

16 Jan 2023


      ...
-----Original Message-----
From: liulongfang
Sent: 14 January 2023 07:46
To: Shameerali Kolothum Thodi shameerali.kolothum.thodi@huawei.com;
Jonathan Cameron jonathan.cameron@huawei.com; Zengtao (B)
prime.zeng@hisilicon.com
Cc: linuxarm@openeuler.org; yekai (A) yekai13@huawei.com; fanghao (A)
fanghao11@huawei.com; Wangzhou (B) wangzhou1@hisilicon.com;
shenyang (M) shenyang39@huawei.com; qianweili
qianweili@huawei.com; liudongdong (C) liudongdong3@huawei.com
Subject: Re: [PATCH v3 3/3] iommu/iopf: add timeout function to iommu
On 2023/1/9 14:56, Shameerali Kolothum Thodi Wrote:
...
...
-----Original Message-----
From: liulongfang
Sent: 09 January 2023 02:39
To: Shameerali Kolothum Thodi
shameerali.kolothum.thodi@huawei.com;
...
...
Jonathan Cameron jonathan.cameron@huawei.com; Zengtao (B)
prime.zeng@hisilicon.com
Cc: linuxarm@openeuler.org; yekai (A) yekai13@huawei.com; fanghao
(A)
...
...
fanghao11@huawei.com; Wangzhou (B) wangzhou1@hisilicon.com;
shenyang (M) shenyang39@huawei.com; qianweili
qianweili@huawei.com; liudongdong (C) liudongdong3@huawei.com
Subject: Re: [PATCH v3 3/3] iommu/iopf: add timeout function to iommu
On 2023/1/8 17:12, Shameerali Kolothum Thodi Wrote:
...
...
-----Original Message-----
From: liulongfang
Sent: 07 January 2023 09:24
To: Shameerali Kolothum Thodi
shameerali.kolothum.thodi@huawei.com;
...
...
Jonathan Cameron jonathan.cameron@huawei.com; Zengtao (B)
prime.zeng@hisilicon.com
Cc: linuxarm@openeuler.org; yekai (A) yekai13@huawei.com;
fanghao
...
...
...
...
(A) fanghao11@huawei.com; Wangzhou (B)
wangzhou1@hisilicon.com;
...
...
shenyang (M) shenyang39@huawei.com; qianweili
qianweili@huawei.com; liudongdong (C)
liudongdong3@huawei.com
...
...
...
...
Subject: Re: [PATCH v3 3/3] iommu/iopf: add timeout function to
iommu
...
...
...
...
On 2023/1/6 16:57, Shameerali Kolothum Thodi wrote:
...
> -----Original Message-----
> From: liulongfang
> Sent: 06 January 2023 00:55
> To: Shameerali Kolothum Thodi
shameerali.kolothum.thodi@huawei.com;
...
> Jonathan Cameron jonathan.cameron@huawei.com; Zengtao (B)
> prime.zeng@hisilicon.com
> Cc: linuxarm@openeuler.org; yekai (A) yekai13@huawei.com;
fanghao
...
...
(A)
...
> fanghao11@huawei.com; Wangzhou (B)
wangzhou1@hisilicon.com;
...
...
...
> shenyang (M) shenyang39@huawei.com; qianweili
> qianweili@huawei.com; liudongdong (C)
liudongdong3@huawei.com
...
...
...
> Subject: Re: [PATCH v3 3/3] iommu/iopf: add timeout function to
> iommu
>
> On 2023/1/6 1:11, Shameerali Kolothum Thodi wrote:
>>
>>
>>> -----Original Message-----
>>> From: liulongfang
>>> Sent: 05 January 2023 07:48
>>> To: Shameerali Kolothum Thodi
> shameerali.kolothum.thodi@huawei.com;
>>> Jonathan Cameron jonathan.cameron@huawei.com; Zengtao (B)
>>> prime.zeng@hisilicon.com
>>> Cc: linuxarm@openeuler.org; yekai (A) yekai13@huawei.com;
fanghao
...
>>> (A) fanghao11@huawei.com; Wangzhou (B)
> wangzhou1@hisilicon.com;
>>> shenyang (M) shenyang39@huawei.com; liulongfang
>>> liulongfang@huawei.com; qianweili qianweili@huawei.com;
>>> liudongdong (C) liudongdong3@huawei.com
>>> Subject: [PATCH v3 3/3] iommu/iopf: add timeout function to
iommu
...
...
...
...
...
>>>
[...]
...
...
...
...
...
...
>>> send-email-jacob.jun.pan@linux.intel.com/
>>>
>>> The current timeout timer will create a timer for each IO page
>>> fault request. After the page fault request is processed normally
>>> or an error is returned after timeout, the timer will be
>>> automatically destroyed and will not be reused.
>>
>> I think it is better to have a per device timer rather than a
>> timer for each request. What was the problem with Intel's
approach?
...
...
...
...
...
>>
>
> Intel's patch did not handle the timer scheme completely.
You meant not calling the
iopf_complete_group()/iommu_page_response()
from their timer fn? But they have listed out the options there to
handle the timeout. So probably we could just send FAILURE response
from that timer fn for now and leave the other options for discussion.
I didn't analyze their plan carefully, because their plan will query
the list every time the timer is activated,
Because it is per device timer, not per event timer.
and their processing part is still TODO,
...
which has not been implemented.
Also that ToDo list is how they see this interface will used further which
is
...
...
valid.
...
Remember this is common IOMMU code and should cater to
ARM/Intel/AMD
...
etc and not just devices that support STALL mode but also normal PCI
PRQ
...
...
as well .
...
Their plan may have value, but it is a surefire one. As for the reason for
the
...
...
failure, it is because their timers are created based on the device.
Every time the device receives a new IOPF event, it will refresh this timer,
but
...
...
its timing time starting point cannot be updated (the previous IOPF event
still
...
...
needs to continue timing),
As I said in previous comment, could you please move that refresh timer to
page
...
response path? I think that will solve the problem.
You said "page response path" specifically refers to which part of the code ?
can you give me a specific file path or function name?
In addition, why is it better to put the refresh timer in the "page response
path"
than the current response workqueue of IOPF?
Can you explain why?
I think I have explained my reason earlier!. Anyway again from my previous 
comment,
" ... Going through their patch, one problem
could be that they reset the timer in the iommu_report_device_fault().
So if there are multiple page faults from the same device, the timer expiry
will not happen in case of a fault handler failure. If you move that timer 
reset to iommu_page_response() fn, I think it will work."
The idea is if there are no response for a specified time, the per device timer will
get fired and send the FAIURE code. Do you see any problem in moving the
timer update to iommu_page_response()?
...
...
which leads to the fact that as long as one IOPF
...
event times out, the all IOPF events received by the device after this will
time
...
...
out, that making the device unusable at all.
Ok. I am not sure what device does here when you send out a FAILURE
response
...
code to any one IOPF event. Does it continue to work after that or does
require
...
a reset?
Take the current accelerator device as an example.
When the refresh timer returns a FAULURE, the accelerator device will end
the
current task and return a task error message to the software.
Then wait for the delivery and processing of the next task.
Ok. So our ACC driver will handle it. Most likely other device drivers/vfio/iommufd
may require a callback to handle anything specific with respect to this error.
Thanks,
Shameer
...
Thanks,
Longfang.
...
Thanks,
Shameer
...
Therefore, to solve this problem, it can only be solved by creating a timer
independently for each IOPF event.
Thansk,
Longfang.
...
...
...
> In addition, it will appear kernel calltrace under high system load.
Have you debugged this further? Going through their patch, one
problem could be that they reset the timer in the
iommu_report_device_fault().
...
...
...
So if there are multiple page faults from the save device, the timer
expiry will not happen in case of a fault handler failure. If you
move that timer reset to iommu_page_response() fn, I think it will
work.
...
...
...
...
...
Since they have not completed their plans themselves, and there are
too many TODO parts in their plans, which do not match our current
needs, we will not debug their code.
Their patch has been tested on our version, and it will cause kernel
calltrace, so we reverted it directly, and redesigned another solution by
ourselves.
...
I think per device timer is more optimum than setting up a timer for
each event unless you can prove/explain otherwise. Since a solution is
already proposed in the ML, it is always good to anlyse it and come up
with an explanation why another different solution is required.
Thanks,
Shameer
...
Thanks,
Longfang.
...
Thanks,
Shameer
>
>> Thanks,
>> Shameer
>>
>
> Thanks,
> Longfang.
>
>>> Signed-off-by: Longfang Liu liulongfang@huawei.com
>>> ---
>>>  drivers/iommu/io-pgfault.c | 63
>>> ++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 61 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/iommu/io-pgfault.c
>>> b/drivers/iommu/io-pgfault.c index cd7ebbfe4a67..0fb3ec1390a8
>>> 100644
>>> --- a/drivers/iommu/io-pgfault.c
>>> +++ b/drivers/iommu/io-pgfault.c
>>> @@ -6,9 +6,11 @@
>>>   */
>>>
>>>  #include <linux/iommu.h>
>>> +#include <linux/jiffies.h>
>>>  #include <linux/list.h>
>>>  #include <linux/sched/mm.h>
>>>  #include <linux/slab.h>
>>> +#include <linux/timer.h>
>>>  #include <linux/workqueue.h>
>>>
>>>  #include "iommu-sva.h"
>>> @@ -56,11 +58,20 @@ struct iopf_fault {
>>>  #define IOPF_RESPONSE_MAX_TIMEOUT		10000
>>>  static unsigned long iopf_timeout_ms =
> IOPF_RESPONSE_DEF_TIMEOUT;
>>>
>>> +enum iopf_timer_state {
>>> +	IOPF_TIME_INIT = 0,
>>> +	IOPF_IN_PROCESS,
>>> +	IOPF_COMPLETE,
>>> +	IOPF_TIME_OUT,
>>> +};
>>> +
>>>  struct iopf_group {
>>>  	struct iopf_fault		last_fault;
>>>  	struct list_head		faults;
>>>  	struct work_struct		work;
>>>  	struct device			*dev;
>>> +	struct timer_list		iopf_timer;
>>> +	atomic_t			iopf_state;
>>>  };
>>>
>>>  static int __init iommu_set_iopf_timeout(char *str) @@ -101,6
>>> +112,32 @@ static int iopf_complete_group(struct device
> *dev,
>>> struct iopf_fault *iopf,
>>>  	return iommu_page_response(dev, &resp);  }
>>>
>>> +static void iopf_timer_func(struct timer_list *t) {
>>> +	struct iopf_group *group = from_timer(group, t, iopf_timer);
>>> +
>>> +	/* Workqueue process time out */
>>> +	if (atomic_read(&group->iopf_state) < IOPF_COMPLETE) {
>>> +		atomic_set(&group->iopf_state, IOPF_TIME_OUT);
>>> +		iopf_complete_group(group->dev, &group->last_fault,
>>> +					 IOMMU_PAGE_RESP_FAILURE);
>>> +		dev_err(group->dev, "[timer] iopf workqueue process
time
...
...
...
...
...
>>> out.\n");
>>> +	}
>>> +}
>>> +
>>> +static void iopf_timer_setup(struct iopf_group *group) {
>>> +	atomic_set(&group->iopf_state, IOPF_TIME_INIT);
>>> +
>>> +	/* IOPF timeout is 0 means no timeout tracking */
>>> +	if (iopf_timeout_ms) {
>>> +		timer_setup(&group->iopf_timer, iopf_timer_func, 0);
>>> +		group->iopf_timer.expires = jiffies +
>>> +					msecs_to_jiffies(iopf_timeout_ms);
>>> +		mod_timer(&group->iopf_timer,
group->iopf_timer.expires);
...
...
...
...
...
>>> +	}
>>> +}
>>> +
>>>  static void iopf_handler(struct work_struct *work)  {
>>>  	struct iopf_group *group;
>>> @@ -109,6 +146,17 @@ static void iopf_handler(struct
work_struct
...
...
...
...
...
> *work)
>>>  	enum iommu_page_response_code status =
IOMMU_PAGE_RESP_SUCCESS;
...
...
...
>>>
>>>  	group = container_of(work, struct iopf_group, work);
>>> +	if (atomic_read(&group->iopf_state) == IOPF_TIME_OUT) {
>>> +		list_for_each_entry_safe(iopf, next, &group->faults, list) {
>>> +			if (!(iopf->fault.prm.flags &
>>> +
IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
...
...
...
...
...
>>> +				kfree(iopf);
>>> +		}
>>> +		dev_err(group->dev, "iopf workqueue wait time out.\n");
>>> +		goto iopf_err;
>>> +	}
>>> +	atomic_set(&group->iopf_state, IOPF_IN_PROCESS);
>>> +
>>>  	domain = iommu_get_domain_for_dev_pasid(group->dev,
>>>  				group->last_fault.fault.prm.pasid, 0);
>>>  	if (!domain || !domain->iopf_handler) @@ -128,7 +176,16
@@
...
...
...
...
...
>>> static void iopf_handler(struct work_struct
> *work)
>>>  			kfree(iopf);
>>>  	}
>>>
>>> +	if (atomic_read(&group->iopf_state) == IOPF_TIME_OUT) {
>>> +		dev_err(group->dev, "iopf handle fault time out.\n");
>>> +		goto iopf_err;
>>> +	}
>>> +
>>> +	atomic_set(&group->iopf_state, IOPF_COMPLETE);
>>>  	iopf_complete_group(group->dev, &group->last_fault, status);
>>> +
>>> +iopf_err:
>>> +	del_timer_sync(&group->iopf_timer);
>>>  	kfree(group);
>>>  }
>>>
>>> @@ -183,7 +240,6 @@ int iommu_queue_iopf(struct
iommu_fault
...
...
...
...
*fault,
...
>>> void *cookie)
>>>  	struct dev_iommu *param = dev->iommu;
>>>
>>>  	lockdep_assert_held(&param->lock);
>>> -
>>>  	if (fault->type != IOMMU_FAULT_PAGE_REQ)
>>>  		/* Not a recoverable page fault */
>>>  		return -EOPNOTSUPP;
>>> @@ -222,6 +278,8 @@ int iommu_queue_iopf(struct
iommu_fault
...
...
...
...
*fault,
...
>>> void *cookie)
>>>
>>>  	group->dev = dev;
>>>  	group->last_fault.fault = *fault;
>>> +	iopf_timer_setup(group);
>>> +
>>>  	INIT_LIST_HEAD(&group->faults);
>>>  	list_add(&group->last_fault.list, &group->faults);
>>>  	INIT_WORK(&group->work, iopf_handler); @@ -414,7 +472,8
@@
...
...
...
...
...
>>> struct iopf_queue *iopf_queue_alloc(const
char
...
>>> *name)
>>>  	 * that's dealt with, the high-level function can handle groups
>>> out
> of
>>>  	 * order.
>>>  	 */
>>> -	queue->wq = alloc_workqueue("iopf_queue/%s",
WQ_UNBOUND,
...
...
...
...
0,
...
>>> name);
>>> +	queue->wq = alloc_workqueue("iopf_queue/%s",
WQ_HIGHPRI |
...
...
...
...
...
>>> +				 WQ_MEM_RECLAIM | WQ_UNBOUND, 0,
name);
...
...
...
...
...
>>>  	if (!queue->wq) {
>>>  		kfree(queue);
>>>  		return NULL;
>>> --
>>> 2.24.0
>>
>> .
>>
.
.
.

    

2025

2024

2023

2022

2021

2020

[Linuxarm] Re: [PATCH v3 3/3] iommu/iopf: add timeout function to iommu