- Linuxarm - mailweb.openeuler.org

Re: [PATCH V3 5/7] app/testpmd: move position of verifying DCB test
by Li, Xiaoyun 27 Apr '21

27 Apr '21

> -----Original Message----- > From: Huisong Li <lihuisong(a)huawei.com> > Sent: Tuesday, April 20, 2021 17:01 > To: dev(a)dpdk.org > Cc: Yigit, Ferruh <ferruh.yigit(a)intel.com>; Li, Xiaoyun <xiaoyun.li(a)intel.com>; > linuxarm(a)openeuler.org; lihuisong(a)huawei.com > Subject: [PATCH V3 5/7] app/testpmd: move position of verifying DCB test > > Currently, the check for doing DCB test is assigned to start_packet_forwarding(), > which will be called when run "start" cmd. But fwd_config_setup() is used in > many scenarios, such as, "port config all rxq". > > This patch moves the check from start_packet_forwarding() to > fwd_config_setup(). > > Fixes: 7741e4cf16c0 ("app/testpmd: VMDq and DCB updates") > > Signed-off-by: Huisong Li <lihuisong(a)huawei.com> > Signed-off-by: Lijun Ou <oulijun(a)huawei.com> > --- > app/test-pmd/config.c | 23 +++++++++++++++++++++-- app/test- > pmd/testpmd.c | 19 ------------------- > 2 files changed, 21 insertions(+), 21 deletions(-) Acked-by: Xiaoyun Li <xiaoyun.li(a)intel.com>

1 0

Re: [PATCH V3 4/7] app/testpmd: add check for support of reporting DCB info
by Li, Xiaoyun 27 Apr '21

27 Apr '21

> -----Original Message----- > From: Huisong Li <lihuisong(a)huawei.com> > Sent: Tuesday, April 20, 2021 17:01 > To: dev(a)dpdk.org > Cc: Yigit, Ferruh <ferruh.yigit(a)intel.com>; Li, Xiaoyun <xiaoyun.li(a)intel.com>; > linuxarm(a)openeuler.org; lihuisong(a)huawei.com > Subject: [PATCH V3 4/7] app/testpmd: add check for support of reporting DCB > info > > Currently, '.get_dcb_info' must be supported for the port doing DCB test, or all > information in 'rte_eth_dcb_info' are zero. It should be prevented when user run > cmd "port config 0 dcb vt off 4 pfc off". > > This patch adds the check for support of reporting dcb info. > > Signed-off-by: Huisong Li <lihuisong(a)huawei.com> > Signed-off-by: Lijun Ou <oulijun(a)huawei.com> > --- > V2->V3 > - fix the abnormal print information > > --- > app/test-pmd/cmdline.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > 2.7.4 Acked-by: Xiaoyun Li <xiaoyun.li(a)intel.com>

1 0

Re: [PATCH V3 3/7] app/testpmd: fix a segment fault when DCB test
by Li, Xiaoyun 27 Apr '21

27 Apr '21

> -----Original Message----- > From: Huisong Li <lihuisong(a)huawei.com> > Sent: Tuesday, April 20, 2021 17:01 > To: dev(a)dpdk.org > Cc: Yigit, Ferruh <ferruh.yigit(a)intel.com>; Li, Xiaoyun <xiaoyun.li(a)intel.com>; > linuxarm(a)openeuler.org; lihuisong(a)huawei.com > Subject: [PATCH V3 3/7] app/testpmd: fix a segment fault when DCB test > > After DCB mode is configured, if we decrease the number of RX and TX queues, > fwd_config_setup() will be called to setup the DCB forwarding configuration. > And forwarding streams are updated based on new queue numbers in > fwd_config_setup(), but the mapping between the TC and queues obtained by > rte_eth_dev_get_dcb_info() is still old queue numbers (old queue numbers are > greater than new queue numbers). > In this case, the segment fault happens. So rte_eth_dev_configure() should be > called again to update the mapping between the TC and queues before > rte_eth_dev_get_dcb_info(). > > Like: > set nbcore 4 > port stop all > port config 0 dcb vt off 4 pfc on > port start all > port stop all > port config all rxq 8 > port config all txq 8 > > Fixes: 900550de04a7 ("app/testpmd: add dcb support") > Cc: stable(a)dpdk.org > > Signed-off-by: Huisong Li <lihuisong(a)huawei.com> > Signed-off-by: Lijun Ou <oulijun(a)huawei.com> > --- > app/test-pmd/config.c | 26 ++++++++++++++++++++++++++ > 1 file changed, 26 insertions(+) > > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index > 03ee40c..18b197b 100644 > --- a/app/test-pmd/config.c > +++ b/app/test-pmd/config.c > @@ -2996,7 +2996,33 @@ dcb_fwd_config_setup(void) > uint16_t nb_rx_queue, nb_tx_queue; > uint16_t i, j, k, sm_id = 0; > uint16_t total_tc_num; > + struct rte_port *port; > uint8_t tc = 0; > + portid_t pid; > + int ret; > + > + /* > + * The fwd_config_setup() is called when the port is > RTE_PORT_STARTED When the port is RTE_PORT_STARTED or RTE_PORT_STARTED? Missing an 'or' here. Maybe something like the following will be better? (just a reference, you can put it in a better way) /* * Re-configure ports to get updated mapping between tc and queue in case the queue number of the port is changed. * Skip for started ports since configuring queue number needs to stop ports first. */ > + * RTE_PORT_STOPPED. When a port is RTE_PORT_STARTED, > dev_configure > + * cannot be called. > + * > + * re-configure the device after changing queue numbers of stopped > + * ports, so that the updated mapping between tc and queue can be > + * obtained. > + */ > + for (pid = 0; pid < nb_fwd_ports; pid++) { > + if (port_is_started(pid) == 1) > + continue; > + > + port = &ports[pid]; > + ret = rte_eth_dev_configure(pid, nb_rxq, nb_txq, > + &port->dev_conf); > + if (ret < 0) { > + printf("Failed to re-configure port %d, ret = %d.\n", > + pid, ret); > + return; > + } > + } > > cur_fwd_config.nb_fwd_lcores = (lcoreid_t) nb_fwd_lcores; > cur_fwd_config.nb_fwd_ports = nb_fwd_ports; > -- > 2.7.4

1 0

Re: [PATCH V3 2/7] app/testpmd: fix DCB forwarding configuration
by Li, Xiaoyun 27 Apr '21

27 Apr '21

> -----Original Message----- > From: Huisong Li <lihuisong(a)huawei.com> > Sent: Tuesday, April 20, 2021 17:01 > To: dev(a)dpdk.org > Cc: Yigit, Ferruh <ferruh.yigit(a)intel.com>; Li, Xiaoyun <xiaoyun.li(a)intel.com>; > linuxarm(a)openeuler.org; lihuisong(a)huawei.com > Subject: [PATCH V3 2/7] app/testpmd: fix DCB forwarding configuration > > After DCB mode is configured, the operations of port stop and port start change > the value of the global variable "dcb_test", As a result, the forwarding > configuration from DCB to RSS mode, namely, “dcb_fwd_config_setup()” to > "rss_fwd_config_setup()". > > Currently, the 'dcb_flag' field in struct 'rte_port' indicates whether the port is > configured with DCB. And it is sufficient to have 'dcb_config' > as a global variable to control the DCB test status. So this patch deletes the > "dcb_test". > > In addition, the 'dcb_config' is first set to 1 first in init_port_dcb_config(), but > the function may fail. > So it should be moved to the end. Change this to the following will be better: In addition, setting 'dcb_config' at the end of init_port_dcb_config() in case that ports fail to enter DCB mode. > > Fixes: 900550de04a7 ("app/testpmd: add dcb support") > Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings") > Fixes: 7741e4cf16c0 ("app/testpmd: VMDq and DCB updates") > Cc: stable(a)dpdk.org > > Signed-off-by: Huisong Li <lihuisong(a)huawei.com> > Signed-off-by: Lijun Ou <oulijun(a)huawei.com> > --- > app/test-pmd/testpmd.c | 18 ++++-------------- app/test-pmd/testpmd.h | 1 - > 2 files changed, 4 insertions(+), 15 deletions(-) > > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index > d4be23f..a076b1d 100644 > --- a/app/test-pmd/testpmd.c > +++ b/app/test-pmd/testpmd.c > @@ -246,9 +246,6 @@ uint16_t mb_mempool_cache = DEF_MBUF_CACHE; > /**< Size of mbuf mempool cache. */ > /* current configuration is in DCB or not,0 means it is not in DCB mode */ > uint8_t dcb_config = 0; > > -/* Whether the dcb is in testing status */ -uint8_t dcb_test = 0; > - > /* > * Configurable number of RX/TX queues. > */ > @@ -2167,8 +2164,7 @@ start_packet_forwarding(int with_tx_first) > return; > } > > - > - if(dcb_test) { > + if (dcb_config) { > for (i = 0; i < nb_fwd_ports; i++) { > pt_id = fwd_ports_ids[i]; > port = &ports[pt_id]; > @@ -2476,8 +2472,6 @@ start_port(portid_t pid) > if (port_id_is_invalid(pid, ENABLED_WARN)) > return 0; > > - if(dcb_config) > - dcb_test = 1; > RTE_ETH_FOREACH_DEV(pi) { > if (pid != pi && pid != (portid_t)RTE_PORT_ALL) > continue; > @@ -2717,11 +2711,6 @@ stop_port(portid_t pid) > portid_t peer_pl[RTE_MAX_ETHPORTS]; > int peer_pi; > > - if (dcb_test) { > - dcb_test = 0; > - dcb_config = 0; > - } > - > if (port_id_is_invalid(pid, ENABLED_WARN)) > return; > > @@ -3625,8 +3614,6 @@ init_port_dcb_config(portid_t pid, > rte_port = &ports[pid]; > > memset(&port_conf, 0, sizeof(struct rte_eth_conf)); > - /* Enter DCB configuration status */ > - dcb_config = 1; > > port_conf.rxmode = rte_port->dev_conf.rxmode; > port_conf.txmode = rte_port->dev_conf.txmode; @@ -3694,6 +3681,9 > @@ init_port_dcb_config(portid_t pid, > > rte_port->dcb_flag = 1; > > + /* Enter DCB configuration status */ > + dcb_config = 1; > + > return 0; > } > > diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index > 9530ec5..432c66d 100644 > --- a/app/test-pmd/testpmd.h > +++ b/app/test-pmd/testpmd.h > @@ -425,7 +425,6 @@ extern uint64_t noisy_lkup_num_reads; extern uint64_t > noisy_lkup_num_reads_writes; > > extern uint8_t dcb_config; > -extern uint8_t dcb_test; > > extern uint32_t mbuf_data_size_n; > extern uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT]; > -- > 2.7.4

1 0

Re: [PATCH V3 1/7] app/testpmd: fix forward lcores number when DCB test
by Li, Xiaoyun 27 Apr '21

27 Apr '21

> -----Original Message----- > From: Huisong Li <lihuisong(a)huawei.com> > Sent: Tuesday, April 20, 2021 17:01 > To: dev(a)dpdk.org > Cc: Yigit, Ferruh <ferruh.yigit(a)intel.com>; Li, Xiaoyun <xiaoyun.li(a)intel.com>; > linuxarm(a)openeuler.org; lihuisong(a)huawei.com > Subject: [PATCH V3 1/7] app/testpmd: fix forward lcores number when DCB test > > Currently, 'nb_fwd_lcores' value are both adjusted based on 'nb_fwd_streams' in > rss/simple/icmp_echo_fwd_config_setup. > But the operation is missing in dcb_fwd_config_setup, which may lead to a bad > behavior in which multiple polling threads operate on the same queue. In this > case, the device sends and receives packets, causing unexpected results. The > procedure is as follows: > 1/ run testpmd with "--rxq=8 --txq=8" > 2/ port stop all > 3/ set nbcore 8 > 4/ port config 0 dcb vt off 4 pfc on > 5/ port config all rxq 16 > 6/ port config all txq 16 > 7/ port start all > 8/ set fwd mac > 9/ start > Same as v2 comment. Please check that. > For the DCB forwarding test, each core is assigned to each traffic class and each > core is assigned a multi-stream. > Therefore, 'nb_fwd_lcores' value needs to be adjusted based on 'total_tc_num' > in all forwarding ports. > > Fixes: 900550de04a7 ("app/testpmd: add dcb support") > Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings") > Cc: stable(a)dpdk.org > > Signed-off-by: Huisong Li <lihuisong(a)huawei.com> > Signed-off-by: Lijun Ou <oulijun(a)huawei.com> > --- > app/test-pmd/config.c | 19 +++++++++++++++++++ > 1 file changed, 19 insertions(+) > > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index > ccb9bd3..03ee40c 100644 > --- a/app/test-pmd/config.c > +++ b/app/test-pmd/config.c > @@ -2961,6 +2961,21 @@ rss_fwd_config_setup(void) > } > } > > +static uint16_t > +get_fwd_port_total_tc_num(void) > +{ > + struct rte_eth_dcb_info dcb_info; > + uint16_t total_tc_num = 0; > + unsigned int i; > + > + for (i = 0; i < nb_fwd_ports; i++) { > + (void)rte_eth_dev_get_dcb_info(fwd_ports_ids[i], &dcb_info); > + total_tc_num += dcb_info.nb_tcs; > + } > + > + return total_tc_num; > +} > + > /** > * For the DCB forwarding test, each core is assigned on each traffic class. > * > @@ -2980,12 +2995,16 @@ dcb_fwd_config_setup(void) > lcoreid_t lc_id; > uint16_t nb_rx_queue, nb_tx_queue; > uint16_t i, j, k, sm_id = 0; > + uint16_t total_tc_num; > uint8_t tc = 0; > > cur_fwd_config.nb_fwd_lcores = (lcoreid_t) nb_fwd_lcores; > cur_fwd_config.nb_fwd_ports = nb_fwd_ports; > cur_fwd_config.nb_fwd_streams = > (streamid_t) (nb_rxq * cur_fwd_config.nb_fwd_ports); > + total_tc_num = get_fwd_port_total_tc_num(); > + if (cur_fwd_config.nb_fwd_lcores > total_tc_num) > + cur_fwd_config.nb_fwd_lcores = total_tc_num; > > /* reinitialize forwarding streams */ > init_fwd_streams(); > -- > 2.7.4

1 0

Re: [PATCH V2 1/7] app/testpmd: fix forward lcores number when DCB test
by Li, Xiaoyun 27 Apr '21

27 Apr '21

> -----Original Message----- > From: Huisong Li <lihuisong(a)huawei.com> > Sent: Tuesday, April 20, 2021 15:32 > To: dev(a)dpdk.org > Cc: Yigit, Ferruh <ferruh.yigit(a)intel.com>; Li, Xiaoyun <xiaoyun.li(a)intel.com>; > linuxarm(a)openeuler.org; lihuisong(a)huawei.com > Subject: [PATCH V2 1/7] app/testpmd: fix forward lcores number when DCB test > > Currently, 'nb_fwd_lcores' value are both adjusted based on 'nb_fwd_streams' in > rss/simple/icmp_echo_fwd_config_setup. > But the operation is missing in dcb_fwd_config_setup, which may lead to a bad > behavior in which multiple polling threads operate on the same queue. This patch is OK. But commit log is redundant and confusing. The above is enough to explain what your patch is doing and can even be more simple. >In this > case, the device sends and receives packets, causing unexpected results. The > procedure is as follows: I don't understand what you're saying here. The commands you're showing is 8 nbcores dealing with 16 queues. So it's one thread dealing with multiple queues which doesn't have issues at all. Please remove the useless and confusing commands. > 1/ run testpmd with "--rxq=8 --txq=8" > 2/ port stop all > 3/ set nbcore 8 > 4/ port config 0 dcb vt off 4 pfc on > 5/ port config all rxq 16 > 6/ port config all txq 16 > 7/ port start all > 8/ set fwd mac > 9/ start > > For the DCB forwarding test, each core is assigned to each traffic class and each > core is assigned a multi-stream. > Therefore, 'nb_fwd_lcores' value needs to be adjusted based on 'total_tc_num' > in all forwarding ports. Please refer to the RSS fwd config fix patch to write your own commit log. Use simple and easy-understanding words to explain yourself. Below is the reference of RSS. commit 017d680a91fcf30da14a6d3a2f96d41f6dda3a0f Author: Pablo de Lara <pablo.de.lara.guarch(a)intel.com> Date: Mon Jun 27 23:35:19 2016 +0100 app/testpmd: limit number of forwarding cores Number of forwarding cores must be equal or less than number of forwarding streams, otherwise two cores would try to use a same queue on a port, which is not allowed. > > Fixes: 900550de04a7 ("app/testpmd: add dcb support") > Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings") > Cc: stable(a)dpdk.org > > Signed-off-by: Huisong Li <lihuisong(a)huawei.com> > Signed-off-by: Lijun Ou <oulijun(a)huawei.com> > --- > app/test-pmd/config.c | 19 +++++++++++++++++++ > 1 file changed, 19 insertions(+) > > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index > ccb9bd3..03ee40c 100644 > --- a/app/test-pmd/config.c > +++ b/app/test-pmd/config.c > @@ -2961,6 +2961,21 @@ rss_fwd_config_setup(void) > } > } > > +static uint16_t > +get_fwd_port_total_tc_num(void) > +{ > + struct rte_eth_dcb_info dcb_info; > + uint16_t total_tc_num = 0; > + unsigned int i; > + > + for (i = 0; i < nb_fwd_ports; i++) { > + (void)rte_eth_dev_get_dcb_info(fwd_ports_ids[i], &dcb_info); > + total_tc_num += dcb_info.nb_tcs; > + } > + > + return total_tc_num; > +} > + > /** > * For the DCB forwarding test, each core is assigned on each traffic class. > * > @@ -2980,12 +2995,16 @@ dcb_fwd_config_setup(void) > lcoreid_t lc_id; > uint16_t nb_rx_queue, nb_tx_queue; > uint16_t i, j, k, sm_id = 0; > + uint16_t total_tc_num; > uint8_t tc = 0; > > cur_fwd_config.nb_fwd_lcores = (lcoreid_t) nb_fwd_lcores; > cur_fwd_config.nb_fwd_ports = nb_fwd_ports; > cur_fwd_config.nb_fwd_streams = > (streamid_t) (nb_rxq * cur_fwd_config.nb_fwd_ports); > + total_tc_num = get_fwd_port_total_tc_num(); > + if (cur_fwd_config.nb_fwd_lcores > total_tc_num) > + cur_fwd_config.nb_fwd_lcores = total_tc_num; > > /* reinitialize forwarding streams */ > init_fwd_streams(); > -- > 2.7.4

1 0

Re: [PATCH] sched/fair: don't use waker's cpu if the waker of sync wake-up is interrupt
by Mike Galbraith 27 Apr '21

27 Apr '21

On Tue, 2021-04-27 at 14:37 +1200, Barry Song wrote: > > To fix this issue, this patch checks the waker of sync wake-up is a task > but not an interrupt. In this case, the waker will schedule out and give > CPU to wakee. That rash "the waker will schedule out" assumption, ie this really really is a synchronous wakeup, may be true in your particular case, but the sync hint is so overused as to be fairly meaningless. We've squabbled with it repeatedly over the years because of that. It really should either become more of a communication of intent than it currently is, or just go away. I'd argue for go away, simply because there is no way for the kernel to know that there isn't more work directly behind any particular wakeup. Take a pipe, does shoving some bits through a pipe mean you have no further use of your CPU? IFF you're doing nothing but playing ping- pong, sure it does, but how many real proggies have zero overlap among its threads of execution? The mere notion of threaded apps having no overlap *to be converted to throughput* is dainbramaged, which should be the death knell of the sync wakeup hint. Threaded apps can't do stuff like, oh, networking, which uses the sync hint heavily, without at least to some extent defeating the purpose of threading if we were to take the hint seriously. Heck, just look at the beauty (gak) of wake_wide(). It was born specifically to combat the pure-evil side of the sync wakeup hint. Bah, 'nuff "Danger Will Robinson, that thing will *eat you*!!" ;-) -Mike

2 4

Re: [RFC PATCH 2/3] vfio/hisilicon: register the driver to vfio
by Alex Williamson 26 Apr '21

26 Apr '21

On Wed, 21 Apr 2021 17:59:02 +0800 liulongfang <liulongfang(a)huawei.com> wrote: > On 2021/4/21 6:04, Alex Williamson wrote: > > On Tue, 20 Apr 2021 09:59:57 -0300 > > Jason Gunthorpe <jgg(a)nvidia.com> wrote: > > > >> On Tue, Apr 20, 2021 at 08:50:12PM +0800, liulongfang wrote: > >>> On 2021/4/19 20:33, Jason Gunthorpe wrote: > >>>> On Mon, Apr 19, 2021 at 08:24:40PM +0800, liulongfang wrote: > >>>> > >>>>>> I'm also confused how this works securely at all, as a general rule a > >>>>>> VFIO PCI driver cannot access the MMIO memory of the function it is > >>>>>> planning to assign to the guest. There is a lot of danger that the > >>>>>> guest could access that MMIO space one way or another. > >>>>> > >>>>> VF's MMIO memory is divided into two parts, one is the guest part, > >>>>> and the other is the live migration part. They do not affect each other, > >>>>> so there is no security problem. > >>>> > >>>> AFAIK there are several scenarios where a guest can access this MMIO > >>>> memory using DMA even if it is not mapped into the guest for CPU > >>>> access. > >>>> > >>> The hardware divides VF's MMIO memory into two parts. The live migration > >>> driver in the host uses the live migration part, and the device driver in > >>> the guest uses the guest part. They obtain the address of VF's MMIO memory > >>> in their respective drivers, although these two parts The memory is > >>> continuous on the hardware device, but due to the needs of the drive function, > >>> they will not perform operations on another part of the memory, and the > >>> device hardware also independently responds to the operation commands of > >>> the two parts. > >> > >> It doesn't matter, the memory is still under the same PCI BDF and VFIO > >> supports scenarios where devices in the same IOMMU group are not > >> isolated from each other. > >> > >> This is why the granual of isolation is a PCI BDF - VFIO directly > >> blocks kernel drivers from attaching to PCI BDFs that are not > >> completely isolated from VFIO BDF. > >> > >> Bypassing this prevention and attaching a kernel driver directly to > >> the same BDF being exposed to the guest breaks that isolation model. > >> > >>> So, I still don't understand what the security risk you are talking about is, > >>> and what do you think the security design should look like? > >>> Can you elaborate on it? > >> > >> Each security domain must have its own PCI BDF. > >> > >> The migration control registers must be on a different VF from the VF > >> being plugged into a guest and the two VFs have to be in different > >> IOMMU groups to ensure they are isolated from each other. > > > > I think that's a solution, I don't know if it's the only solution. > > AIUI, the issue here is that we have a device specific kernel driver > > extending vfio-pci with migration support for this device by using an > > If the two parts of the MMIO region are split into different BAR spaces on > the device, the MMIO region of the business function is still placed in BAR2, > and the MMIO region of the live migration function is moved to BAR4. > Only BAR2 is mapped in the guest. only BAR4 is mapped in the host. > This can solve this security issue. The concern is really the "on the device" part rather than whether the resources are within the same BAR or not. We need to assume that a user driver can generate a DMA targeting any address in the system, including in this case the user driver could generate a DMA targeting this migration BAR. Ideally this would be routed upstream to the IOMMU where it would be blocked for lack of a translation entry. However, because this range resides on the same PCIe requester ID, it's logically more similar to a two-function device where the functions are considered non-isolated and are therefore exposed within the same IOMMU group. We would not allow a kernel driver for one of those functions and a userspace driver for the other. In this case those drivers are strongly related, but we still need to consider to what extent a malicious user driver can interfere with or exploit the kernel side driver. > > MMIO region of the same device. This is susceptible to DMA> manipulation by the user device. Whether that's a security issue or> not depends on how the user can break the device. If the scope is > > limited to breaking their own device, they can do that any number of > > ways and it's not very interesting. If the user can manipulate device > > state in order to trigger an exploit of the host-side kernel driver, > > that's obviously more of a problem. > > > > The other side of this is that if migration support can be implemented > > entirely within the VF using this portion of the device MMIO space, why > > do we need the host kernel to support this rather than implementing it > > in userspace? For example, QEMU could know about this device, > > manipulate the BAR size to expose only the operational portion of MMIO > > to the VM and use the remainder to support migration itself. I'm > > afraid that just like mdev, the vfio migration uAPI is going to be used > > as an excuse to create kernel drivers simply to be able to make use of > > that uAPI. I haven't looked at this driver to know if it has some > > When the accelerator device is designed to support the live migration > function, it is based on the uAPI of the migration region to realize the > live migration function, so the live migration function requires a driver > that connects to this uAPI. > Is this set of interfaces not open to us now? In your model, if both BARs are exposed to userspace and a device specific extension in QEMU claims the migration BAR rather than exposing it to the VM, could that driver mimic the migration region uAPI from userspace? For example, you don't need page pinning to interact with the IOMMU, you don't need resources beyond the scope of the endpoint device itself, and the migration info BAR is safe for userspace to manage? If so, then a kernel-based driver to expose a migration uAPI seems like it's only a risk for the kernel, ie. moving what could be a userspace driver into the kernel for the convenience of re-using a kernel uAPI. Thanks, Alex

2 1

Re: [PATCH V4] ethdev: add queue state when retrieve queue information
by Thomas Monjalon 26 Apr '21

26 Apr '21

16/04/2021 10:46, Lijun Ou: > Currently, upper-layer application could get queue state only > through pointers such as dev->data->tx_queue_state[queue_id], > this is not the recommended way to access it. So this patch > add get queue state when call rte_eth_rx_queue_info_get and > rte_eth_tx_queue_info_get API. > > Note: After add queue_state field, the 'struct rte_eth_rxq_info' size > remains 128B, and the 'struct rte_eth_txq_info' size remains 64B, so > it could be ABI compatible. [...] > --- a/doc/guides/rel_notes/release_21_05.rst > +++ b/doc/guides/rel_notes/release_21_05.rst > @@ -251,6 +251,12 @@ ABI Changes > function was already marked as internal in the API documentation for it, > and was not for use by external applications. > > +* Added new field ``queue_state`` to ``rte_eth_rxq_info`` structure > + to provide indicated rxq queue state. > + > +* Added new field ``queue_state`` to ``rte_eth_txq_info`` structure > + to provide indicated txq queue state. Not sure we should add a note here for additions which do not break ABI compatibility. It may be confusing.

4 6

Re: [PATCH 2/2] Documentation/ABI: Move the topology-related sysfs interface to the right place
by Song Bao Hua (Barry Song) 26 Apr '21

26 Apr '21

> -----Original Message----- > From: tiantao (H) > Sent: Monday, April 26, 2021 2:26 PM > To: Song Bao Hua (Barry Song) <song.bao.hua(a)hisilicon.com> > Cc: linuxarm(a)openeuler.org; tiantao (H) <tiantao6(a)hisilicon.com> > Subject: [PATCH 2/2] Documentation/ABI: Move the topology-related sysfs > interface to the right place > > Move the interface that exists under > /sys/devices/system/cpu/cpuX/topology/ to the more logical > Documentation/ABI/ file that can be properly parsed and > displayed to the user space > > Signed-off-by: Tian Tao <tiantao6(a)hisilicon.com> > Signed-off-by: Barry Song <song.bao.hua(a)hisilicon.com> > --- > Documentation/ABI/stable/sysfs-devices-system-cpu | 149 > ++++++++++++++++++++++ > Documentation/admin-guide/cputopology.rst | 104 --------------- > 2 files changed, 149 insertions(+), 104 deletions(-) > > diff --git a/Documentation/ABI/stable/sysfs-devices-system-cpu > b/Documentation/ABI/stable/sysfs-devices-system-cpu > index 33c133e..7d0b23e 100644 > --- a/Documentation/ABI/stable/sysfs-devices-system-cpu > +++ b/Documentation/ABI/stable/sysfs-devices-system-cpu > @@ -1,3 +1,7 @@ > +Export CPU topology info via sysfs. Items (attributes) are similar > +to /proc/cpuinfo output of some architectures. They reside in > +/sys/devices/system/cpu/cpuX/topology/: > + > What: /sys/devices/system/cpu/dscr_default > Date: 13-May-2014 > KernelVersion: v3.15.0 > @@ -23,3 +27,148 @@ Description: Default value for the Data Stream Control > Register (DSCR) on > here). > If set by a process it will be inherited by child processes. > Values: 64 bit unsigned integer (bit field) > + > +What: /sys/devices/system/cpu/cpuX/topology/physical_package_id > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: physical package id of cpuX. Typically corresponds to a > physical > + socket number, but the actual value is architecture and platform > + dependent. > +Values: 64 bit unsigned integer (bit field) > + > +What: /sys/devices/system/cpu/cpuX/topology/die_id > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: the CPU die ID of cpuX. Typically it is the hardware platform's > + identifier (rather than the kernel's). The actual value is > + architecture and platform dependent. > +Values: 64 bit unsigned integer (bit field) > + > +What: /sys/devices/system/cpu/cpuX/topology/core_id > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: the CPU core ID of cpuX. Typically it is the hardware platform's > + identifier (rather than the kernel's). The actual value is > + architecture and platform dependent. > +Values: 64 bit unsigned integer (bit field) > + > +What: /sys/devices/system/cpu/cpuX/topology/book_id > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: the book ID of cpuX. Typically it is the hardware platform's > + identifier (rather than the kernel's). The actual value is > + architecture and platform dependent. > +Values: 64 bit unsigned integer (bit field) > + > +What: /sys/devices/system/cpu/cpuX/topology/drawer_id > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: the drawer ID of cpuX. Typically it is the hardware platform's > + identifier (rather than the kernel's). The actual value is > + architecture and platform dependent. > +Values: 64 bit unsigned integer (bit field) > + > +What: /sys/devices/system/cpu/cpuX/topology/core_cpus > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: internal kernel map of CPUs within the same core. > + (deprecated name: "thread_siblings") > +Values: hexadecimal bitmask. > + > +What: /sys/devices/system/cpu/cpuX/topology/core_cpus_list > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: human-readable list of CPUs within the same core. > + The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > + so the tail of the string will be trimmed while its size is larger > + than PAGE_SIZE. > + (deprecated name: "thread_siblings_list"). > +Values: hexadecimal bitmask. No. this is a list not a mask. > + > +What: /sys/devices/system/cpu/cpuX/topology/package_cpus > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: internal kernel map of the CPUs sharing the same > physical_package_id. > + (deprecated name: "core_siblings"). > +Values: 64 bit unsigned integer (bit field) Id is unsigned integer. Here it is hexadecimal bitmask. > + > +What: /sys/devices/system/cpu/cpuX/topology/package_cpus_list > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: human-readable list of CPUs sharing the same > physical_package_id. > + The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > + so the tail of the string will be trimmed while its size is larger > + than PAGE_SIZE. > + (deprecated name: "core_siblings_list") > +Values: hexadecimal bitmask. As above. > + > +What: /sys/devices/system/cpu/cpuX/topology/die_cpus > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: internal kernel map of CPUs within the same die. > +Values: 64 bit unsigned integer (bit field) As above. > + > +What: /sys/devices/system/cpu/cpuX/topology/die_cpus_list > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: human-readable list of CPUs within the same die. > + The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > + so the tail of the string will be trimmed while its size is larger > + than PAGE_SIZE. > +Values: hexadecimal bitmask. As above. > + > +What: /sys/devices/system/cpu/cpuX/topology/book_siblings > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: internal kernel map of cpuX's hardware threads within the same > + book_id. > +Values: 64 bit unsigned integer (bit field) As above. > + > +What: /sys/devices/system/cpu/cpuX/topology/book_siblings_list > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: human-readable list of cpuX's hardware threads within the same > + book_id. > + The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > + so the tail of the string will be trimmed while its size is larger > + than PAGE_SIZE. For example here we should mark it "available to s390 only" > +Values: hexadecimal bitmask. > + > +What: /sys/devices/system/cpu/cpuX/topology/drawer_siblings > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: internal kernel map of cpuX's hardware threads within the same > + drawer_id. > +Values: 64 bit unsigned integer (bit field) > + > +What: /sys/devices/system/cpu/cpuX/topology/drawer_siblings_list > +Date: 19-Mar-2021 > +KernelVersion: v5.12 > +Contact: > +Description: human-readable list of cpuX's hardware threads within the same > + drawer_id. > + The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > + so the tail of the string will be trimmed while its size is larger > + than PAGE_SIZE. > +Values: hexadecimal bitmask. > + > +Architecture-neutral, drivers/base/topology.c, exports these attributes. > +However, the book and drawer related sysfs files will only be created if > +CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively. > + > +CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are currently only used on s390, > +where they reflect the cpu and cache hierarchy. These are not ABIs, better to be in original doc. Better to describe drawer ABI, book ABIs are only available for s390 in the ABI description. > diff --git a/Documentation/admin-guide/cputopology.rst > b/Documentation/admin-guide/cputopology.rst > index 4538d78..4672465 100644 > --- a/Documentation/admin-guide/cputopology.rst > +++ b/Documentation/admin-guide/cputopology.rst > @@ -2,110 +2,6 @@ > How CPU topology info is exported via sysfs > =========================================== > > -Export CPU topology info via sysfs. Items (attributes) are similar > -to /proc/cpuinfo output of some architectures. They reside in > -/sys/devices/system/cpu/cpuX/topology/: > - > -physical_package_id: > - > - physical package id of cpuX. Typically corresponds to a physical > - socket number, but the actual value is architecture and platform > - dependent. > - > -die_id: > - > - the CPU die ID of cpuX. Typically it is the hardware platform's > - identifier (rather than the kernel's). The actual value is > - architecture and platform dependent. > - > -core_id: > - > - the CPU core ID of cpuX. Typically it is the hardware platform's > - identifier (rather than the kernel's). The actual value is > - architecture and platform dependent. > - > -book_id: > - > - the book ID of cpuX. Typically it is the hardware platform's > - identifier (rather than the kernel's). The actual value is > - architecture and platform dependent. > - > -drawer_id: > - > - the drawer ID of cpuX. Typically it is the hardware platform's > - identifier (rather than the kernel's). The actual value is > - architecture and platform dependent. > - > -core_cpus: > - > - internal kernel map of CPUs within the same core. > - (deprecated name: "thread_siblings") > - > -core_cpus_list: > - > - human-readable list of CPUs within the same core. > - The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > - so the tail of the string will be trimmed while its size is larger > - than PAGE_SIZE. > - (deprecated name: "thread_siblings_list"); > - > -package_cpus: > - > - internal kernel map of the CPUs sharing the same physical_package_id. > - (deprecated name: "core_siblings") > - > -package_cpus_list: > - > - human-readable list of CPUs sharing the same physical_package_id. > - The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > - so the tail of the string will be trimmed while its size is larger > - than PAGE_SIZE. > - (deprecated name: "core_siblings_list") > - > -die_cpus: > - > - internal kernel map of CPUs within the same die. > - > -die_cpus_list: > - > - human-readable list of CPUs within the same die. > - The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > - so the tail of the string will be trimmed while its size is larger > - than PAGE_SIZE. > - > -book_siblings: > - > - internal kernel map of cpuX's hardware threads within the same > - book_id. > - > -book_siblings_list: > - > - human-readable list of cpuX's hardware threads within the same > - book_id. > - The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > - so the tail of the string will be trimmed while its size is larger > - than PAGE_SIZE. > - > -drawer_siblings: > - > - internal kernel map of cpuX's hardware threads within the same > - drawer_id. > - > -drawer_siblings_list: > - > - human-readable list of cpuX's hardware threads within the same > - drawer_id. > - The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE, > - so the tail of the string will be trimmed while its size is larger > - than PAGE_SIZE. > - > -Architecture-neutral, drivers/base/topology.c, exports these attributes. > -However, the book and drawer related sysfs files will only be created if > -CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively. > - > -CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are currently only used on s390, > -where they reflect the cpu and cache hierarchy. > - > For an architecture to support this feature, it must define some of > these macros in include/asm-XXX/topology.h:: > > -- > 2.7.4

1 0