Linuxarm

linuxarm@openeuler.org

797 discussions
Re: [PATCH V4] ethdev: add queue state when retrieve queue information
by Thomas Monjalon 26 Apr '21

16/04/2021 10:46, Lijun Ou:
> Currently, upper-layer application could get queue state only
> through pointers such as dev->data->tx_queue_state[queue_id],
> this is not the recommended way to access it. So this patch
> add get queue state when call rte_eth_rx_queue_info_get and
> rte_eth_tx_queue_info_get API.
>
> Note: After add queue_state field, the 'struct rte_eth_rxq_info' size
> remains 128B, and the 'struct rte_eth_txq_info' size remains 64B, so
> it could be ABI compatible.
[...]
> --- a/doc/guides/rel_notes/release_21_05.rst
> +++ b/doc/guides/rel_notes/release_21_05.rst
> @@ -251,6 +251,12 @@ ABI Changes
>    function was already marked as internal in the API documentation for it,
>    and was not for use by external applications.
>
> +* Added new field ``queue_state`` to ``rte_eth_rxq_info`` structure
> +  to provide indicated rxq queue state.
> +
> +* Added new field ``queue_state`` to ``rte_eth_txq_info`` structure
> +  to provide indicated txq queue state.

Not sure we should add a note here for additions which do not break ABI
compatibility. It may be confusing.
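A minimal sketch of how an application could consume the new field through the public API once this lands, assuming the patched 21.05 headers; port_id and queue_id are placeholders for an already configured device:

#include <rte_ethdev.h>

/* Return 1 if the Rx queue is started, 0 if not, negative errno on failure.
 * Uses the public rte_eth_rx_queue_info_get() instead of peeking at
 * dev->data->rx_queue_state[] directly, which is what the patch enables. */
static int
rxq_is_started(uint16_t port_id, uint16_t queue_id)
{
	struct rte_eth_rxq_info qinfo;
	int ret = rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo);

	if (ret != 0)
		return ret; /* e.g. -ENOTSUP if the PMD lacks rxq_info_get */

	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STARTED;
}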
Re: [PATCH 2/2] Documentation/ABI: Move the topology-related sysfs interface to the right place
by Song Bao Hua (Barry Song) 26 Apr '21

> -----Original Message----- > From: tiantao (H) > Sent: Monday, April 26, 2021 2:26 PM > To: Song Bao Hua (Barry Song) <song.bao.hua(a)hisilicon.com> > Cc: linuxarm(a)openeuler.org; tiantao (H) <tiantao6(a)hisilicon.com> > Subject: [PATCH 2/2] Documentation/ABI: Move the topology-related sysfs > interface to the right place > > Move the interface that exists under > /sys/devices/system/cpu/cpuX/topology/ to the more logical > Documentation/ABI/ file that can be properly parsed and > displayed to the user space > > Signed-off-by: Tian Tao <tiantao6(a)hisilicon.com> > Signed-off-by: Barry Song <song.bao.hua(a)hisilicon.com> > --- > Documentation/ABI/stable/sysfs-devices-system-cpu | 149 > ++++++++++++++++++++++ > Documentation/admin-guide/cputopology.rst | 104 --------------- > 2 files changed, 149 insertions(+), 104 deletions(-) > > diff --git a/Documentation/ABI/stable/sysfs-devices-system-cpu > b/Documentation/ABI/stable/sysfs-devices-system-cpu > index 33c133e..7d0b23e 100644 > --- a/Documentation/ABI/stable/sysfs-devices-system-cpu > +++ b/Documentation/ABI/stable/sysfs-devices-system-cpu > @@ -1,3 +1,7 @@ > +Export CPU topology info via sysfs. Items (attributes) are similar > +to /proc/cpuinfo output of some architectures. They reside in > +/sys/devices/system/cpu/cpuX/topology/: > + > What: /sys/devices/system/cpu/dscr_default > Date: 13-May-2014 > KernelVersion: v3.15.0 > @@ -23,3 +27,148 @@ Description: Default value for the Data Stream Control > Register (DSCR) on > here). > If set by a process it will be inherited by child processes. > Values: 64 bit unsigned integer (bit field) > + > +What: /sys/devices/system/cpu/cpuX/topology/physical_package_id > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: physical package id of cpuX. Typically corresponds to a > physical > + socket number, but the actual value is architecture and platform > + dependent. > +Values: 64 bit unsigned integer (bit field) > + > +What: /sys/devices/system/cpu/cpuX/topology/die_id > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: the CPU die ID of cpuX. Typically it is the hardware platform's > + identifier (rather than the kernel's). The actual value is > + architecture and platform dependent. > +Values: 64 bit unsigned integer (bit field) > + > +What: /sys/devices/system/cpu/cpuX/topology/core_id > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: the CPU core ID of cpuX. Typically it is the hardware platform's > + identifier (rather than the kernel's). The actual value is > + architecture and platform dependent. > +Values: 64 bit unsigned integer (bit field) > + > +What: /sys/devices/system/cpu/cpuX/topology/book_id > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: the book ID of cpuX. Typically it is the hardware platform's > + identifier (rather than the kernel's). The actual value is > + architecture and platform dependent. > +Values: 64 bit unsigned integer (bit field) > + > +What: /sys/devices/system/cpu/cpuX/topology/drawer_id > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: the drawer ID of cpuX. Typically it is the hardware platform's > + identifier (rather than the kernel's). The actual value is > + architecture and platform dependent. > +Values: 64 bit unsigned integer (bit field) > + > +What: /sys/devices/system/cpu/cpuX/topology/core_cpus > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: internal kernel map of CPUs within the same core. 
> + (deprecated name: "thread_siblings") > +Values: hexadecimal bitmask. > + > +What: /sys/devices/system/cpu/cpuX/topology/core_cpus_list > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: human-readable list of CPUs within the same core. > + The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > + so the tail of the string will be trimmed while its size is larger > + than PAGE_SIZE. > + (deprecated name: "thread_siblings_list"). > +Values: hexadecimal bitmask. No. this is a list not a mask. > + > +What: /sys/devices/system/cpu/cpuX/topology/package_cpus > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: internal kernel map of the CPUs sharing the same > physical_package_id. > + (deprecated name: "core_siblings"). > +Values: 64 bit unsigned integer (bit field) Id is unsigned integer. Here it is hexadecimal bitmask. > + > +What: /sys/devices/system/cpu/cpuX/topology/package_cpus_list > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: human-readable list of CPUs sharing the same > physical_package_id. > + The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > + so the tail of the string will be trimmed while its size is larger > + than PAGE_SIZE. > + (deprecated name: "core_siblings_list") > +Values: hexadecimal bitmask. As above. > + > +What: /sys/devices/system/cpu/cpuX/topology/die_cpus > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: internal kernel map of CPUs within the same die. > +Values: 64 bit unsigned integer (bit field) As above. > + > +What: /sys/devices/system/cpu/cpuX/topology/die_cpus_list > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: human-readable list of CPUs within the same die. > + The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > + so the tail of the string will be trimmed while its size is larger > + than PAGE_SIZE. > +Values: hexadecimal bitmask. As above. > + > +What: /sys/devices/system/cpu/cpuX/topology/book_siblings > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: internal kernel map of cpuX's hardware threads within the same > + book_id. > +Values: 64 bit unsigned integer (bit field) As above. > + > +What: /sys/devices/system/cpu/cpuX/topology/book_siblings_list > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: human-readable list of cpuX's hardware threads within the same > + book_id. > + The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > + so the tail of the string will be trimmed while its size is larger > + than PAGE_SIZE. For example here we should mark it "available to s390 only" > +Values: hexadecimal bitmask. > + > +What: /sys/devices/system/cpu/cpuX/topology/drawer_siblings > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: internal kernel map of cpuX's hardware threads within the same > + drawer_id. > +Values: 64 bit unsigned integer (bit field) > + > +What: /sys/devices/system/cpu/cpuX/topology/drawer_siblings_list > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: human-readable list of cpuX's hardware threads within the same > + drawer_id. > + The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > + so the tail of the string will be trimmed while its size is larger > + than PAGE_SIZE. > +Values: hexadecimal bitmask. > + > +Architecture-neutral, drivers/base/topology.c, exports these attributes. 
> +However, the book and drawer related sysfs files will only be created if > +CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively. > + > +CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are currently only used on s390, > +where they reflect the cpu and cache hierarchy. These are not ABIs, better to be in original doc. Better to describe drawer ABI, book ABIs are only available for s390 in the ABI description. > diff --git a/Documentation/admin-guide/cputopology.rst > b/Documentation/admin-guide/cputopology.rst > index 4538d78..4672465 100644 > --- a/Documentation/admin-guide/cputopology.rst > +++ b/Documentation/admin-guide/cputopology.rst > @@ -2,110 +2,6 @@ > How CPU topology info is exported via sysfs > =========================================== > > -Export CPU topology info via sysfs. Items (attributes) are similar > -to /proc/cpuinfo output of some architectures. They reside in > -/sys/devices/system/cpu/cpuX/topology/: > - > -physical_package_id: > - > - physical package id of cpuX. Typically corresponds to a physical > - socket number, but the actual value is architecture and platform > - dependent. > - > -die_id: > - > - the CPU die ID of cpuX. Typically it is the hardware platform's > - identifier (rather than the kernel's). The actual value is > - architecture and platform dependent. > - > -core_id: > - > - the CPU core ID of cpuX. Typically it is the hardware platform's > - identifier (rather than the kernel's). The actual value is > - architecture and platform dependent. > - > -book_id: > - > - the book ID of cpuX. Typically it is the hardware platform's > - identifier (rather than the kernel's). The actual value is > - architecture and platform dependent. > - > -drawer_id: > - > - the drawer ID of cpuX. Typically it is the hardware platform's > - identifier (rather than the kernel's). The actual value is > - architecture and platform dependent. > - > -core_cpus: > - > - internal kernel map of CPUs within the same core. > - (deprecated name: "thread_siblings") > - > -core_cpus_list: > - > - human-readable list of CPUs within the same core. > - The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > - so the tail of the string will be trimmed while its size is larger > - than PAGE_SIZE. > - (deprecated name: "thread_siblings_list"); > - > -package_cpus: > - > - internal kernel map of the CPUs sharing the same physical_package_id. > - (deprecated name: "core_siblings") > - > -package_cpus_list: > - > - human-readable list of CPUs sharing the same physical_package_id. > - The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > - so the tail of the string will be trimmed while its size is larger > - than PAGE_SIZE. > - (deprecated name: "core_siblings_list") > - > -die_cpus: > - > - internal kernel map of CPUs within the same die. > - > -die_cpus_list: > - > - human-readable list of CPUs within the same die. > - The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > - so the tail of the string will be trimmed while its size is larger > - than PAGE_SIZE. > - > -book_siblings: > - > - internal kernel map of cpuX's hardware threads within the same > - book_id. > - > -book_siblings_list: > - > - human-readable list of cpuX's hardware threads within the same > - book_id. > - The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > - so the tail of the string will be trimmed while its size is larger > - than PAGE_SIZE. 
> - > -drawer_siblings: > - > - internal kernel map of cpuX's hardware threads within the same > - drawer_id. > - > -drawer_siblings_list: > - > - human-readable list of cpuX's hardware threads within the same > - drawer_id. > - The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > - so the tail of the string will be trimmed while its size is larger > - than PAGE_SIZE. > - > -Architecture-neutral, drivers/base/topology.c, exports these attributes. > -However, the book and drawer related sysfs files will only be created if > -CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively. > - > -CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are currently only used on s390, > -where they reflect the cpu and cache hierarchy. > - > For an architecture to support this feature, it must define some of > these macros in include/asm-XXX/topology.h:: > > -- > 2.7.4
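As a usage illustration (not part of the patch), a small C sketch reading two of the attributes documented above for cpu0; which attributes exist is architecture dependent (for example, book_id and drawer_id are s390-only):

#include <stdio.h>

int main(void)
{
	int core_id;
	char siblings[256];
	FILE *f;

	/* Per-CPU core ID, as documented in sysfs-devices-system-cpu. */
	f = fopen("/sys/devices/system/cpu/cpu0/topology/core_id", "r");
	if (f && fscanf(f, "%d", &core_id) == 1)
		printf("cpu0 core_id: %d\n", core_id);
	if (f)
		fclose(f);

	/* Human-readable sibling list (deprecated name: thread_siblings_list). */
	f = fopen("/sys/devices/system/cpu/cpu0/topology/core_cpus_list", "r");
	if (f && fgets(siblings, sizeof(siblings), f))
		printf("cpu0 core_cpus_list: %s", siblings);
	if (f)
		fclose(f);

	return 0;
}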
[PATCH 0/2] clarify and cleanup CPU and NUMA topology ABIs
by Tian Tao 26 Apr '21

patch #1: Move the interface that exists under
/sys/devices/system/cpu/cpuX/topology/ to the more logical
Documentation/ABI/ file that can be properly parsed and displayed to the
user space.

patch #2: clarify the overflow issue of sysfs pagebuf, and Move the
presence of BUILD_BUG_ON to more sensible place.

Tian Tao (2):
  CPU, NUMA topology ABIs: clarify and cleanup CPU and NUMA topology ABIs
  cpumask: clarify the overflow issue of sysfs pagebuf

 Documentation/ABI/stable/sysfs-devices-node       |   4 +
 Documentation/ABI/stable/sysfs-devices-system-cpu | 149 ++++++++++++++++++++++
 Documentation/admin-guide/cputopology.rst         |  89 -------------
 drivers/base/node.c                               |   3 -
 include/linux/cpumask.h                           |   9 +-
 5 files changed, 161 insertions(+), 93 deletions(-)

-- 
2.7.4
Re: [dpdk-dev] [PATCH V5] ethdev: add queue state when retrieve queue information
by Kinsella, Ray 23 Apr '21

On 17/04/2021 04:09, Lijun Ou wrote: > Currently, upper-layer application could get queue state only > through pointers such as dev->data->tx_queue_state[queue_id], > this is not the recommended way to access it. So this patch > add get queue state when call rte_eth_rx_queue_info_get and > rte_eth_tx_queue_info_get API. > > Note: After add queue_state field, the 'struct rte_eth_rxq_info' size > remains 128B, and the 'struct rte_eth_txq_info' size remains 64B, so > it could be ABI compatible. > > Signed-off-by: Chengwen Feng <fengchengwen(a)huawei.com> > Signed-off-by: Lijun Ou <oulijun(a)huawei.com> > Acked-by: Konstantin Ananyev <konstantin.ananyev(a)intel.com> > --- > V4->V5: > - Add acked-by > - add a note to the "New features" section to annouce the new feature. > > V3->V4: > - update libabigail.abignore for removing the CI warnings > > V2->V3: > - rewrite the commit log and delete the part Note > - rewrite tht comments for queue state > - move the queue_state definition locations > > V1->V2: > - move queue state defines to public file > --- > doc/guides/rel_notes/release_21_05.rst | 6 ++++++ > lib/librte_ethdev/ethdev_driver.h | 7 ------- > lib/librte_ethdev/rte_ethdev.c | 3 +++ > lib/librte_ethdev/rte_ethdev.h | 9 +++++++++ > 4 files changed, 18 insertions(+), 7 deletions(-) > > diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst > index 58272e1..1ab3681 100644 > --- a/doc/guides/rel_notes/release_21_05.rst > +++ b/doc/guides/rel_notes/release_21_05.rst > @@ -81,6 +81,12 @@ New Features > representor=[[c#]pf#]sf# sf[0,2-1023] /* 1023 SFs. */ > representor=[c#]pf# c2pf[0,1] /* 2 PFs on controller 2. */ > > +* **Enhanced function for getting rxq/txq info ABI.** > + * Added new field ``queue_state`` to ``rte_eth_rxq_info`` structure to > + provide indicated rxq queue state. > + * Added new field ``queue_state`` to ``rte_eth_txq_info`` structure to > + provide indicated txq queue state. > + > * **Added support for meter PPS profile.** > > Currently meter algorithms only supports bytes per second(BPS). > diff --git a/lib/librte_ethdev/ethdev_driver.h b/lib/librte_ethdev/ethdev_driver.h > index 113129d..40e474a 100644 > --- a/lib/librte_ethdev/ethdev_driver.h > +++ b/lib/librte_ethdev/ethdev_driver.h > @@ -952,13 +952,6 @@ struct eth_dev_ops { > }; > > /** > - * RX/TX queue states > - */ > -#define RTE_ETH_QUEUE_STATE_STOPPED 0 > -#define RTE_ETH_QUEUE_STATE_STARTED 1 > -#define RTE_ETH_QUEUE_STATE_HAIRPIN 2 > - > -/** > * @internal > * Check if the selected Rx queue is hairpin queue. 
> * > diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c > index c73d263..d5adf4f 100644 > --- a/lib/librte_ethdev/rte_ethdev.c > +++ b/lib/librte_ethdev/rte_ethdev.c > @@ -5038,6 +5038,8 @@ rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id, > > memset(qinfo, 0, sizeof(*qinfo)); > dev->dev_ops->rxq_info_get(dev, queue_id, qinfo); > + qinfo->queue_state = dev->data->rx_queue_state[queue_id]; > + > return 0; > } > > @@ -5078,6 +5080,7 @@ rte_eth_tx_queue_info_get(uint16_t port_id, uint16_t queue_id, > > memset(qinfo, 0, sizeof(*qinfo)); > dev->dev_ops->txq_info_get(dev, queue_id, qinfo); > + qinfo->queue_state = dev->data->tx_queue_state[queue_id]; > > return 0; > } > diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h > index 3b773b6..a0d01d2 100644 > --- a/lib/librte_ethdev/rte_ethdev.h > +++ b/lib/librte_ethdev/rte_ethdev.h > @@ -1588,6 +1588,13 @@ struct rte_eth_dev_info { > }; > > /** > + * RX/TX queue states > + */ > +#define RTE_ETH_QUEUE_STATE_STOPPED 0 > +#define RTE_ETH_QUEUE_STATE_STARTED 1 > +#define RTE_ETH_QUEUE_STATE_HAIRPIN 2 > + > +/** > * Ethernet device RX queue information structure. > * Used to retrieve information about configured queue. > */ > @@ -1595,6 +1602,7 @@ struct rte_eth_rxq_info { > struct rte_mempool *mp; /**< mempool used by that queue. */ > struct rte_eth_rxconf conf; /**< queue config parameters. */ > uint8_t scattered_rx; /**< scattered packets RX supported. */ > + uint8_t queue_state; /**< one of RTE_ETH_QUEUE_STATE_*. */ Are we sure this is a false positive? - it is being added mid-structure on the rx-side at least. Shouldn't this be appended to the end - unless it is being sneaked into padding between fields. > uint16_t nb_desc; /**< configured number of RXDs. */ > uint16_t rx_buf_size; /**< hardware receive buffer size. */ > } __rte_cache_min_aligned; > @@ -1606,6 +1614,7 @@ struct rte_eth_rxq_info { > struct rte_eth_txq_info { > struct rte_eth_txconf conf; /**< queue config parameters. */ > uint16_t nb_desc; /**< configured number of TXDs. */ > + uint8_t queue_state; /**< one of RTE_ETH_QUEUE_STATE_*. */ > } __rte_cache_min_aligned; > > /* Generic Burst mode flag definition, values can be ORed. */ >
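One way to answer the layout question empirically (a sketch, not part of the patch) is to print the structure size and field offsets with unpatched and patched headers and diff the output; the queue_state line of course only compiles against the patched tree:

#include <stddef.h>
#include <stdio.h>
#include <rte_ethdev.h>

int main(void)
{
	/* Expected to stay at 128 B / 64 B per the commit message; the exact
	 * figures depend on RTE_CACHE_LINE_MIN_SIZE for the build target. */
	printf("sizeof(rte_eth_rxq_info)  = %zu\n", sizeof(struct rte_eth_rxq_info));
	printf("  offsetof(scattered_rx)  = %zu\n", offsetof(struct rte_eth_rxq_info, scattered_rx));
	printf("  offsetof(queue_state)   = %zu\n", offsetof(struct rte_eth_rxq_info, queue_state));
	printf("  offsetof(nb_desc)       = %zu\n", offsetof(struct rte_eth_rxq_info, nb_desc));
	printf("sizeof(rte_eth_txq_info)  = %zu\n", sizeof(struct rte_eth_txq_info));
	return 0;
}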
Re: [PATCH V6] app/test-pmd: support cleanup txq mbufs command
by Ferruh Yigit 21 Apr '21

On 4/21/2021 9:45 AM, Lijun Ou wrote:
> From: Chengwen Feng <fengchengwen(a)huawei.com>
>
> This patch supports cleanup txq mbufs command:
>   port cleanup (port_id) txq (queue_id) (free_cnt)
>
> Signed-off-by: Chengwen Feng <fengchengwen(a)huawei.com>
> Signed-off-by: Lijun Ou <oulijun(a)huawei.com>

Reviewed-by: Ferruh Yigit <ferruh.yigit(a)intel.com>

Applied to dpdk-next-net/main, thanks.
Re: [RFC PATCH 2/3] vfio/hisilicon: register the driver to vfio
by Jason Gunthorpe 21 Apr '21

On Tue, Apr 20, 2021 at 08:50:12PM +0800, liulongfang wrote:
> On 2021/4/19 20:33, Jason Gunthorpe wrote:
> > On Mon, Apr 19, 2021 at 08:24:40PM +0800, liulongfang wrote:
> >
> >>> I'm also confused how this works securely at all, as a general rule a
> >>> VFIO PCI driver cannot access the MMIO memory of the function it is
> >>> planning to assign to the guest. There is a lot of danger that the
> >>> guest could access that MMIO space one way or another.
> >>
> >> VF's MMIO memory is divided into two parts, one is the guest part,
> >> and the other is the live migration part. They do not affect each other,
> >> so there is no security problem.
> >
> > AFAIK there are several scenarios where a guest can access this MMIO
> > memory using DMA even if it is not mapped into the guest for CPU
> > access.
> >
> The hardware divides VF's MMIO memory into two parts. The live migration
> driver in the host uses the live migration part, and the device driver in
> the guest uses the guest part. They obtain the address of VF's MMIO memory
> in their respective drivers, although these two parts The memory is
> continuous on the hardware device, but due to the needs of the drive function,
> they will not perform operations on another part of the memory, and the
> device hardware also independently responds to the operation commands of
> the two parts.

It doesn't matter, the memory is still under the same PCI BDF and VFIO
supports scenarios where devices in the same IOMMU group are not
isolated from each other.

This is why the granule of isolation is a PCI BDF - VFIO directly blocks
kernel drivers from attaching to PCI BDFs that are not completely
isolated from VFIO BDF.

Bypassing this prevention and attaching a kernel driver directly to the
same BDF being exposed to the guest breaks that isolation model.

> So, I still don't understand what the security risk you are talking about is,
> and what do you think the security design should look like?
> Can you elaborate on it?

Each security domain must have its own PCI BDF.

The migration control registers must be on a different VF from the VF
being plugged into a guest and the two VFs have to be in different IOMMU
groups to ensure they are isolated from each other.

Jason
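To illustrate the isolation model described here (a sketch only; the two BDFs are hypothetical placeholders), one can check whether two PCI functions fall into the same IOMMU group via sysfs:

#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Resolve /sys/bus/pci/devices/<BDF>/iommu_group, which is a symlink to
 * the group the function belongs to. Two functions are only isolated
 * from each other, in VFIO terms, if they resolve to different groups. */
static int iommu_group(const char *bdf, char *out, size_t len)
{
	char path[PATH_MAX];
	ssize_t n;

	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/iommu_group", bdf);
	n = readlink(path, out, len - 1);
	if (n < 0)
		return -1;
	out[n] = '\0';
	return 0;
}

int main(void)
{
	char g1[PATH_MAX], g2[PATH_MAX];

	/* Placeholder BDFs: replace with the two VFs under discussion. */
	if (iommu_group("0000:81:00.2", g1, sizeof(g1)) == 0 &&
	    iommu_group("0000:81:00.3", g2, sizeof(g2)) == 0)
		printf("%s isolation between the two VFs\n",
		       strcmp(g1, g2) ? "group-level" : "NO");
	return 0;
}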
Re: [PATCH net v4 1/2] net: sched: fix packet stuck problem for lockless qdisc
by Michal Kubecek 21 Apr '21

On Wed, Apr 21, 2021 at 04:21:54PM +0800, Yunsheng Lin wrote:
>
> I tried using below shell to simulate your testcase:
>
> #!/bin/sh
>
> for((i=0; i<20; i++))
> do
> taskset -c 0-31 netperf -t TCP_STREAM -H 192.168.100.2 -l 30 -- -m 1048576
> done
>
> And I got a quite stable result: 9413~9416 (10^6bits/sec) for 10G netdev.

Perhaps try it without the taskset, in my test, there was only one
connection.

> >
> >   https://github.com/mkubecek/nperf
> >
> > It is still raw and a lot of features are missing but it can be already
> > used for multithreaded TCP_STREAM and TCP_RR tests. In particular, the
> > output above was with
> >
> >   nperf -H 172.17.1.1 -l 30 -i 20 --exact -t TCP_STREAM -M 1

> I tried your nperf too, unfortunately it does not seem to work on my
> system(arm64), which exits very quickly and output the blow result:
>
> root@(none):/home# nperf -H 192.168.100.2 -l 30 -i 20 --exact -t TCP_STREAM -M 1
> server: 192.168.100.2, port 12543
> iterations: 20, threads: 1, test length: 30
> test: TCP_STREAM, message size: 1048576
>
> 1 4.0 B/s, avg 4.0 B/s, mdev 0.0 B/s ( 0.0%)
[...]

Did you start nperfd on the other side? (It plays a role similar to
netserver for netperf.)

Few days ago I noticed that there is something wrong with error handling
in case of failed connection but didn't get to fixing it yet.

I'll try running some tests also on other architectures, including arm64
and s390x (to catch potential endianness issues).

Michal
Re: [PATCH V5] app/test-pmd: support cleanup txq mbufs command
by Ferruh Yigit 21 Apr '21

On 4/21/2021 9:09 AM, Lijun Ou wrote:
> From: Chengwen Feng <fengchengwen(a)huawei.com>
>
> This patch supports cleanup txq mbufs command:
>   port cleanup (port_id) txq (queue_id) (free_cnt)
>
> Signed-off-by: Chengwen Feng <fengchengwen(a)huawei.com>
> Signed-off-by: Lijun Ou <oulijun(a)huawei.com>
> ---
> V4->V5:
> - rewrite patch title
> - define the new cmd.
> - Fix the comments given by Ferruh.yigit
>
> V3->V4:
> - revert the V3 scheme.
>
> V2->V3:
> - The command implementation is changed so that the queuestate does
>   not depend on the command execution.
>
> V1->V2:
> - use Tx instead of TX
> - add note in doc
> ---
>  app/test-pmd/cmdline.c                      | 85 +++++++++++++++++++++++++++++
>  doc/guides/rel_notes/release_21_05.rst      |  2 +
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  9 +++
>  3 files changed, 96 insertions(+)
>

Can you please update 'cmd_help_long_parsed' for the help string, this
was in v4 and seems dropped during change.
Re: [Intel-wired-lan] [PATCH V2 net] ice: Re-organizes reqstd/avail {R, T}XQ check/code for efficiency+readability
by Paul Menzel 21 Apr '21

Dear Salil,

Thank you very much for your patch.

In the git commit message summary, could you please use imperative mood [1]?

> Re-organize reqstd/avail {R, T}XQ check/code for efficiency+readability

It’s a bit long though. Maybe:

> Avoid unnecessary assignment with user specified {R,T}XQs

On 14.04.21 at 00:44, Salil Mehta wrote:
> If user has explicitly requested the number of {R,T}XQs, then it is
> unnecessary to get the count of already available {R,T}XQs from the
> PF avail_{r,t}xqs bitmap. This value will get overridden by user specified
> value in any case.
>
> This patch does minor re-organization of the code for improving the flow
> and readabiltiy. This scope of improvement was found during the review of

readabil*it*y

> the ICE driver code.
>
> FYI, I could not test this change due to unavailability of the hardware.
> It would be helpful if somebody can test this patch and provide Tested-by
> Tag. Many thanks!

This should go outside the commit message (below the --- for example).

> Fixes: 87324e747fde ("ice: Implement ethtool ops for channels")

Did you check the behavior before is actually a bug? Or is it just for
the detection heuristic for commits to be applied to the stable series?

> Cc: intel-wired-lan(a)lists.osuosl.org
> Cc: Jeff Kirsher <jeffrey.t.kirsher(a)intel.com>
> Signed-off-by: Salil Mehta <salil.mehta(a)huawei.com>
> --
> Change V1->V2
> (*) Fixed the comments from Anthony Nguyen(Intel)
>     Link: https://lkml.org/lkml/2021/4/12/1997
> ---
>  drivers/net/ethernet/intel/ice/ice_lib.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> index d13c7fc8fb0a..d77133d6baa7 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -161,12 +161,13 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id)
>
>  	switch (vsi->type) {
>  	case ICE_VSI_PF:
> -		vsi->alloc_txq = min3(pf->num_lan_msix,
> -				      ice_get_avail_txq_count(pf),
> -				      (u16)num_online_cpus());
>  		if (vsi->req_txq) {
>  			vsi->alloc_txq = vsi->req_txq;
>  			vsi->num_txq = vsi->req_txq;
> +		} else {
> +			vsi->alloc_txq = min3(pf->num_lan_msix,
> +					      ice_get_avail_txq_count(pf),
> +					      (u16)num_online_cpus());
>  		}

I am curious, did you check the compiler actually creates different
code, or did it notice the inefficiency by itself and optimized it
already?

>
>  		pf->num_lan_tx = vsi->alloc_txq;
> @@ -175,12 +176,13 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id)
>  		if (!test_bit(ICE_FLAG_RSS_ENA, pf->flags)) {
>  			vsi->alloc_rxq = 1;
>  		} else {
> -			vsi->alloc_rxq = min3(pf->num_lan_msix,
> -				      ice_get_avail_rxq_count(pf),
> -				      (u16)num_online_cpus());
>  			if (vsi->req_rxq) {
>  				vsi->alloc_rxq = vsi->req_rxq;
>  				vsi->num_rxq = vsi->req_rxq;
> +			} else {
> +				vsi->alloc_rxq = min3(pf->num_lan_msix,
> +						      ice_get_avail_rxq_count(pf),
> +						      (u16)num_online_cpus());
>  			}
>  		}
>

Kind regards,

Paul
Re: [PATCH net v4 1/2] net: sched: fix packet stuck problem for lockless qdisc
by Michal Kubecek 21 Apr '21

On Wed, Apr 21, 2021 at 09:52:40AM +0800, Yunsheng Lin wrote: > On 2021/4/21 4:34, Michal Kubecek wrote: > > However, I noticed something disturbing in the results of a simple > > 1-thread TCP_STREAM test (client sends data through a TCP connection to > > server using long writes, we measure the amount of data received by the > > server): > > > > server: 172.17.1.1, port 12543 > > iterations: 20, threads: 1, test length: 30 > > test: TCP_STREAM, message size: 1048576 > > > > 1 927403548.4 B/s, avg 927403548.4 B/s, mdev 0.0 B/s ( 0.0%) > > 2 1176317172.1 B/s, avg 1051860360.2 B/s, mdev 124456811.8 B/s ( 11.8%), confid. +/- 1581348251.3 B/s (150.3%) > > 3 927335837.8 B/s, avg 1010352186.1 B/s, mdev 117354970.3 B/s ( 11.6%), confid. +/- 357073677.2 B/s ( 35.3%) > > 4 1176728045.1 B/s, avg 1051946150.8 B/s, mdev 124576544.7 B/s ( 11.8%), confid. +/- 228863127.8 B/s ( 21.8%) > > 5 1176788216.3 B/s, avg 1076914563.9 B/s, mdev 122102985.3 B/s ( 11.3%), confid. +/- 169478943.5 B/s ( 15.7%) > > 6 1158167055.1 B/s, avg 1090456645.8 B/s, mdev 115504209.5 B/s ( 10.6%), confid. +/- 132805140.8 B/s ( 12.2%) > > 7 1176243474.4 B/s, avg 1102711907.0 B/s, mdev 111069717.1 B/s ( 10.1%), confid. +/- 110956822.2 B/s ( 10.1%) > > 8 1176771142.8 B/s, avg 1111969311.5 B/s, mdev 106744173.5 B/s ( 9.6%), confid. +/- 95417120.0 B/s ( 8.6%) > > 9 1176206364.6 B/s, avg 1119106761.8 B/s, mdev 102644185.2 B/s ( 9.2%), confid. +/- 83685200.5 B/s ( 7.5%) > > 10 1175888409.4 B/s, avg 1124784926.6 B/s, mdev 98855550.5 B/s ( 8.8%), confid. +/- 74537085.1 B/s ( 6.6%) > > 11 1176541407.6 B/s, avg 1129490061.2 B/s, mdev 95422224.8 B/s ( 8.4%), confid. +/- 67230249.7 B/s ( 6.0%) > > 12 934185352.8 B/s, avg 1113214668.9 B/s, mdev 106114984.5 B/s ( 9.5%), confid. +/- 70420712.5 B/s ( 6.3%) > > 13 1176550558.1 B/s, avg 1118086660.3 B/s, mdev 103339448.9 B/s ( 9.2%), confid. +/- 65002902.4 B/s ( 5.8%) > > 14 1176521808.8 B/s, avg 1122260599.5 B/s, mdev 100711151.3 B/s ( 9.0%), confid. +/- 60333655.0 B/s ( 5.4%) > > 15 1176744840.8 B/s, avg 1125892882.3 B/s, mdev 98240838.2 B/s ( 8.7%), confid. +/- 56319052.3 B/s ( 5.0%) > > 16 1176593778.5 B/s, avg 1129061688.3 B/s, mdev 95909740.8 B/s ( 8.5%), confid. +/- 52771633.5 B/s ( 4.7%) > > 17 1176583967.4 B/s, avg 1131857116.5 B/s, mdev 93715582.2 B/s ( 8.3%), confid. +/- 49669258.6 B/s ( 4.4%) > > 18 1176853301.8 B/s, avg 1134356904.5 B/s, mdev 91656530.2 B/s ( 8.1%), confid. +/- 46905244.8 B/s ( 4.1%) > > 19 1176592845.7 B/s, avg 1136579848.8 B/s, mdev 89709043.8 B/s ( 7.9%), confid. +/- 44424855.9 B/s ( 3.9%) > > 20 1176608117.3 B/s, avg 1138581262.2 B/s, mdev 87871692.6 B/s ( 7.7%), confid. +/- 42193098.5 B/s ( 3.7%) > > all avg 1138581262.2 B/s, mdev 87871692.6 B/s ( 7.7%), confid. +/- 42193098.5 B/s ( 3.7%) > > > > Each line shows result of one 30 second long test and average, mean > > deviation and 99% confidence interval half width through the iterations > > so far. While 17 iteration results are essentially the wire speed minus > > TCP overhead, iterations 1, 3 and 12 are more than 20% lower. As results > > of the same test on unpatched 5.12-rc7 are much more consistent (the > > lowest iteration result through the whole test was 1175939718.3 and the > > mean deviation only 276889.1 B/s), it doesn't seeem to be just a random > > fluctuation. 
> I think I need to relearn the statistial math to understand the above
> "99% confidence interval half width ":)

An easy way to understand it is that if the last column shows 42 MB/s,
it means that with 99% confidence (probability), the measured average is
within 42 MB/s off the actual one.

> But the problem do not seems related too much with "99% confidence
> interval half width ", but with "mean deviation"?

Mean deviation is a symptom here. What worries me is that most results
show the same value (corresponding to fully saturated line) with very
little variation, in some iterations (1, 3 and 12 here) we can suddenly
see much lower value (by ~2.5 GB/s, i.e. 20-25%). And as each iteration
runs the connection for 30 seconds, it cannot be just some short glitch.

I managed to get tcpdump captures yesterday but they are huge even with
"-s 128" (client ~5.6 GB, server ~9.0 GB) so that working with them is
rather slow so I did not find anything interesting yet.

> I tried using netperf, which seems only show throughput of 9415.06
> (10^6bits/sec) using 10G netdev. which tool did you used to show the
> above number?

9415.06 * 10^6 b/s is 1176.9 * 10^6 B/s so it's about the same as the
numbers above (the good ones, that is). As this was part of a longer
test trying different thread counts from 1 to 128, I was using another
utility I started writing recently:

  https://github.com/mkubecek/nperf

It is still raw and a lot of features are missing but it can be already
used for multithreaded TCP_STREAM and TCP_RR tests. In particular, the
output above was with

  nperf -H 172.17.1.1 -l 30 -i 20 --exact -t TCP_STREAM -M 1

The results are with 1 thread so that they should be also reproducible
with netperf too. But it needs to be repeated enough times, when I
wanted to get the packet captures, I did 40 iterations and only two of
them showed lower result.

Michal
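For reference, the avg and mdev columns in the output quoted above can be reproduced from the per-iteration throughputs; numerically, mdev appears to be a population RMS deviation (as in ping), and the 99% confidence half-width would additionally need a Student-t quantile, which this sketch omits. The three samples are iterations 1-3 from the quoted run (build with -lm):

#include <math.h>
#include <stdio.h>

int main(void)
{
	/* Iterations 1-3 from the quoted nperf output, in B/s. */
	double samples[] = { 927403548.4, 1176317172.1, 927335837.8 };
	int n = sizeof(samples) / sizeof(samples[0]);
	double sum = 0.0, avg, sq = 0.0, mdev;

	for (int i = 0; i < n; i++)
		sum += samples[i];
	avg = sum / n;

	for (int i = 0; i < n; i++)
		sq += (samples[i] - avg) * (samples[i] - avg);
	mdev = sqrt(sq / n);   /* population RMS deviation */

	/* Should print roughly: avg 1010352186.1 B/s, mdev 117354970 B/s (11.6%),
	 * matching iteration 3 of the quoted output. */
	printf("avg %.1f B/s, mdev %.1f B/s (%.1f%%)\n",
	       avg, mdev, 100.0 * mdev / avg);
	return 0;
}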