FYI.
> From: anthony.l.nguyen(a)intel.com [mailto:anthony.l.nguyen@intel.com]
> Sent: Wednesday, April 14, 2021 10:28 PM
> To: Salil Mehta <salil.mehta(a)huawei.com>
> Subject: Patch "ice: Re-organizes reqstd/avail {R,T}XQ check/code for
> efficiency+readability" has been added to net-queue tree
>
> This is a note to let you know that I applied the following patch to
> the dev-queue branch on the net-queue tree.
>
> http://patchwork.ozlabs.org/patch/1465962/
>
> From 2c58268dac5bd0dbfe46614febd22b757e29582c Mon Sep 17 00:00:00 2001
> From: Salil Mehta <salil.mehta(a)huawei.com>
> Date: Wed, 14 Apr 2021 12:01:49 -0700
> Subject: [PATCH] ice: Re-organizes reqstd/avail {R,T}XQ check/code for
> efficiency+readability
>
> If the user has explicitly requested the number of {R,T}XQs, then it is
> unnecessary to get the count of already available {R,T}XQs from the
> PF avail_{r,t}xqs bitmap. This value will get overridden by the
> user-specified value in any case.
>
> This patch does a minor re-organization of the code to improve the flow
> and readability. This scope for improvement was found during review of
> the ICE driver code.
>
> FYI, I could not test this change due to unavailability of the hardware.
> It would be helpful if somebody could test this patch and provide a
> Tested-by tag. Many thanks!
>
> Fixes: 87324e747fde ("ice: Implement ethtool ops for channels")
> Cc: intel-wired-lan(a)lists.osuosl.org
> Cc: Jeff Kirsher <jeffrey.t.kirsher(a)intel.com>
> Signed-off-by: Salil Mehta <salil.mehta(a)huawei.com>
> ---
>
> checkpatch.pl reports:
>
> total: 0 errors, 0 warnings, 0 checks, 32 lines checked
>
> .apply/V2-net-ice-Re-organizes-reqstd-avail-R-T-XQ-check-code-for-efficiency-readability.patch
> has no obvious style problems and is ready for submission.
This series adds support for pushing link status to VFs for
the HNS3 ethernet driver.
Guangbin Huang (2):
net: hns3: PF add support for pushing link status to VFs
net: hns3: VF not request link status when PF support push link status
feature
drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h | 3 ++
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 35 +++++++++++++++++++++-
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.h | 1 +
.../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 12 ++++----
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 8 +++--
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h | 1 +
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c | 6 ++++
7 files changed, 55 insertions(+), 11 deletions(-)
--
2.7.4
Currently, an upper-layer application can get the queue state only
through pointers such as dev->data->tx_queue_state[queue_id], which
is not the recommended way to access it. So this patch adds the queue
state to the information returned by the rte_eth_rx_queue_info_get
and rte_eth_tx_queue_info_get APIs.
Note: The hairpin queue is not supported by the above
rte_eth_*x_queue_info_get APIs, so the queue state can only be
RTE_ETH_QUEUE_STATE_STARTED or RTE_ETH_QUEUE_STATE_STOPPED.
Note: After adding the queue_state field, the 'struct rte_eth_rxq_info'
size remains 128B and the 'struct rte_eth_txq_info' size remains 64B,
so the change is ABI compatible.
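
For illustration, a minimal sketch of how an application could read the
new field (assuming a port and queue that are already configured; the
info-get API and the state macros exist in ethdev, only the helper
function below is made up):

  #include <stdio.h>
  #include <rte_ethdev.h>

  /* Hypothetical helper: print whether an RX queue is started. */
  static void show_rxq_state(uint16_t port_id, uint16_t queue_id)
  {
          struct rte_eth_rxq_info qinfo;

          /* Fills qinfo, including the queue_state field added here. */
          if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) != 0) {
                  printf("rx queue info get failed\n");
                  return;
          }
          printf("rxq %u: %s\n", (unsigned int)queue_id,
                 qinfo.queue_state == RTE_ETH_QUEUE_STATE_STARTED ?
                 "started" : "stopped");
  }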
Signed-off-by: Chengwen Feng <fengchengwen(a)huawei.com>
Signed-off-by: Lijun Ou <oulijun(a)huawei.com>
---
doc/guides/rel_notes/release_21_05.rst | 6 ++++++
lib/librte_ethdev/rte_ethdev.c | 3 +++
lib/librte_ethdev/rte_ethdev.h | 4 ++++
3 files changed, 13 insertions(+)
diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
index 43063e3..165b5f7 100644
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -156,6 +156,12 @@ ABI Changes
* No ABI change that would break compatibility with 20.11.
+* Added new field ``queue_state`` to ``rte_eth_rxq_info`` structure
+  to provide the state of the Rx queue.
+
+* Added new field ``queue_state`` to ``rte_eth_txq_info`` structure
+  to provide the state of the Tx queue.
+
Known Issues
------------
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 3059aa5..fbd10b2 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -5042,6 +5042,8 @@ rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
memset(qinfo, 0, sizeof(*qinfo));
dev->dev_ops->rxq_info_get(dev, queue_id, qinfo);
+ qinfo->queue_state = dev->data->rx_queue_state[queue_id];
+
return 0;
}
@@ -5082,6 +5084,7 @@ rte_eth_tx_queue_info_get(uint16_t port_id, uint16_t queue_id,
memset(qinfo, 0, sizeof(*qinfo));
dev->dev_ops->txq_info_get(dev, queue_id, qinfo);
+ qinfo->queue_state = dev->data->tx_queue_state[queue_id];
return 0;
}
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index efda313..3b83c5a 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1591,6 +1591,8 @@ struct rte_eth_rxq_info {
uint8_t scattered_rx; /**< scattered packets RX supported. */
uint16_t nb_desc; /**< configured number of RXDs. */
uint16_t rx_buf_size; /**< hardware receive buffer size. */
+ uint8_t queue_state; /**< Queue state: STARTED(1) / STOPPED(0). */
} __rte_cache_min_aligned;
/**
@@ -1600,6 +1602,8 @@ struct rte_eth_rxq_info {
struct rte_eth_txq_info {
struct rte_eth_txconf conf; /**< queue config parameters. */
uint16_t nb_desc; /**< configured number of TXDs. */
+ uint8_t queue_state; /**< Queue state: STARTED(1) / STOPPED(0). */
} __rte_cache_min_aligned;
/* Generic Burst mode flag definition, values can be ORed. */
--
2.7.4
> From: lipeng (Y)
> Sent: Wednesday, April 14, 2021 2:59 AM
>
>
> > On 2021/4/13 21:55, Salil Mehta wrote:
> >> From: lipeng (Y)
> >> Sent: Tuesday, April 13, 2021 1:38 PM
> >>
> >>
> >> On 2021/4/13 19:17, Salil Mehta wrote:
> >>>> From: lipeng (Y)
> >>>> Sent: Tuesday, April 13, 2021 10:30 AM
> >>>> To: Salil Mehta <salil.mehta(a)huawei.com>; Linuxarm <linuxarm(a)huawei.com>;
> …
[View More]>>>> linuxarm(a)openeuler.org
> >>>> Cc: Zhuangyuzeng (Yisen) <yisen.zhuang(a)huawei.com>; huangdaode
> >>>> <huangdaode(a)huawei.com>; linyunsheng <linyunsheng(a)huawei.com>;
> >>>> shenjian (K) <shenjian15(a)huawei.com>; moyufeng <moyufeng(a)huawei.com>;
> >>>> zhangjiaran <zhangjiaran(a)huawei.com>; huangguangbin (A) <huangguangbin2(a)huawei.com>;
> >>>> Jonathan Cameron <jonathan.cameron(a)huawei.com>
> >>>> Subject: Re: [PATCH RFC net] net: hns3: fixes+refactors the broken set
> >>>> channel fallback logic
> >>>>
> >>>>
> >>>> On 2021/4/13 17:12, Salil Mehta wrote:
> >>>>> The fallback logic of set channels is not handled properly when
> >>>>> set_channels() fails to configure the TX Sched/RSS H/W configuration,
> >>>>> or in the code which brings down/restores the client before/thereafter.
> >>>>>
> >>>>> This patch fixes and refactors that code to improve the readability.
> >>>>>
> >>>>> [INTERNAL ONLY]
> >>>>> Fixes: <Tag TODO> ("Comment TODO")
> >>>>> Signed-off-by: Salil Mehta <salil.mehta(a)huawei.com>
> >>>>> ---
> >>>>>  .../net/ethernet/hisilicon/hns3/hns3_enet.c | 72 +++++++++++--------
> >>>>>  1 file changed, 43 insertions(+), 29 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> >>>>> index bf4302a5cf95..41d8bdb34f6e 100644
> >>>>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> >>>>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> >>>>> @@ -4692,23 +4692,56 @@ static int hns3_reset_notify(struct hnae3_handle *handle,
> >>>>>  static int hns3_change_channels(struct hnae3_handle *handle, u32 new_tqp_num,
> >>>>>  				bool rxfh_configured)
> >>>>> {
> >>>>> - int ret;
> >>>>> + const struct hnae3_ae_ops *ops = handle->ae_algo->ops;
> >>>>> + u32 org_tqp_num = handle->kinfo.num_tqps;
> >>>>> + struct device *dev = &handle->pdev->dev;
> >>>>> + u32 req_tqp_num = new_tqp_num;
> >>>>> + bool revert_old_config = false;
> >>>>> + int ret, retval = 0;
> >>>>> +
> >>>>> + /* bring down the client */
> >>>>> + ret = hns3_reset_notify(handle, HNAE3_DOWN_CLIENT);
> >>>>> + if (ret)
> >>>>> + return ret;
> >>>>>
> >>>>> - ret = handle->ae_algo->ops->set_channels(handle, new_tqp_num,
> >>>>> - rxfh_configured);
> >>>>> - if (ret) {
> >>>>> - dev_err(&handle->pdev->dev,
> >>>>> - "Change tqp num(%u) fail.\n", new_tqp_num);
> >>>>> + ret = hns3_reset_notify(handle, HNAE3_UNINIT_CLIENT);
> >>>>> + if (ret)
> >>>>> return ret;
> >>>>> +
> >>>>> +revert_old_tpqs_config:
> >>>>> +	/* update the traffic sched and RSS config in the H/W as per new TPQ num */
> >>>>> + ret = ops->set_channels(handle, req_tqp_num, rxfh_configured);
> >>>>> + if (ret) {
> >>>>> +		dev_err(dev, "failed(=%d) to update TX sched/RSS h/w cfg for %s TPQs\n",
> >>>>> +			ret, revert_old_config ? "old" : "new");
> >>>>> +
> >>>>> + /* try reverting back h/w config with old TPQ num */
> >>>>> + if (!revert_old_config) {
> >>>>> + dev_warn(dev,
> >>>>> + "will try to revert TX sched/RSS h/w cfg with old TPQs\n");
> >>>>> + req_tqp_num = org_tqp_num;
> >>>>> + revert_old_config = true;
> >>>>> + retval = ret;
> >>>>> +
> >>>>> + goto revert_old_tpqs_config;
> >>>>> + }
> >>>>> +
> >>>>> + /* bad, we could not revert to h/w config with old TQP number */
> >>>>> + goto err_set_chan_fail;
> >>>> why do you change it like this?
> >>>>
> >>>>
> >>>> when setting fails, the port should work OK just as it did before.
> >>> The port can get into a problematic state if the client remains
> >>> uninitialized after errors are encountered while setting the channels
> >>> for the new number of TPQs. The port cannot be said to be in a sane state.
> >>>
> >>> Conditions (after the attempt to revert to the old TPQ config fails):
> >>> 1. half-uninitialized state of the client, or
> >>> 2. client uninitialized but not re-initialized, or
> >>> 3. client re-initialization fails later
> >>>
> >>>
> >>> This condition can only be recovered from by resetting or re-loading the driver.
> >>>
> >>>
> >> do you have any test case for it?
> >
> > We need to simulate the TX Sched/RSS hardware config failure at the
> > firmware/IMP. For integration testing we can always create one, but for
> > system testing we need ways in the firmware to simulate these failures,
> > so some debugging feature is required in the firmware. Maybe somebody
> > can take this task and introduce this feature in the IMP firmware code
> > if it does not exist already? I would suggest exposing all these controls
> > to the user through the 'devlink' interface, which the sys test team can
> > use as well.
> >
> >
> >> when the user enlarges the channel count with the ethtool command
> >> and there is not enough memory, it will fail.
> >>
> >> we should recover in that case too.
> > All of these errors should get conveyed by the hnae3 layer when we do
> > set channels or client re-initialization (init+up).
> >
> This patch will make some test cases fail which pass before this change.
Can you please list the test cases which are failing or will fail?
The ARM64 server chip Kunpeng 920 has 6 clusters in each NUMA node, and
each cluster has 4 CPUs. All clusters share L3 cache data, while each
cluster has a local L3 tag. On the other hand, each cluster shares some
internal system bus. This means cache is much more affine inside one
cluster than across clusters.
+-----------------------------------+   +---------+
| +------+  +------+                |   |         |
| | CPU0 |  | cpu1 |  +----------+  |   |         |
| +------+  +------+  |          |  |   |         |
|            cluster  |    L3    +--+---+         |
| +------+  +------+  |   tag    |  |   |         |
| | CPU2 |  | CPU3 |  |          |  |   |         |
| +------+  +------+  +----------+  |   |         |
+-----------------------------------+   |         |
+-----------------------------------+   |         |
| +------+  +------+                |   |         |
| |      |  |      |  +----------+  |   |   L3    |
| +------+  +------+  |          |  |   |  data   |
|                     |    L3    +--+---+         |
| +------+  +------+  |   tag    |  |   |         |
| |      |  |      |  |          |  |   |         |
| +------+  +------+  +----------+  |   |         |
+-----------------------------------+   |         |
              ......                    |         |
+-----------------------------------+   |         |
| +------+  +------+                |   |         |
| |      |  |      |  +----------+  |   |         |
| +------+  +------+  |          |  |   |         |
|                     |    L3    +--+---+         |
| +------+  +------+  |   tag    |  |   |         |
| |      |  |      |  |          |  |   |         |
| +------+  +------+  +----------+  |   |         |
+-----------------------------------+   +---------+
Through the following small program, you can see the performance impact of
running it in one cluster and across two clusters:
#include <pthread.h>

struct foo {
	int x;
	int y;
} f;

/* Reader: repeatedly loads f.x. */
void *thread1_fun(void *param)
{
	int s = 0;
	for (int i = 0; i < 0xfffffff; i++)
		s += f.x;
	return NULL;
}

/* Writer: repeatedly stores to f.y, which shares a cache line with f.x. */
void *thread2_fun(void *param)
{
	for (int i = 0; i < 0xfffffff; i++)
		f.y++;
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t tid1, tid2;

	pthread_create(&tid1, NULL, thread1_fun, NULL);
	pthread_create(&tid2, NULL, thread2_fun, NULL);
	pthread_join(tid1, NULL);
	pthread_join(tid2, NULL);
	return 0;
}
While running this program in one cluster, it takes:
$ time taskset -c 0,1 ./a.out
real 0m0.832s
user 0m1.649s
sys 0m0.004s
As a contrast, it takes much more time if we run the same program
in two clusters:
$ time taskset -c 0,4 ./a.out
real 0m1.133s
user 0m1.960s
sys 0m0.000s
0.832/1.133 = 73%; running across two clusters takes about 36% longer,
a huge difference.
Also, hackbench running on 4 CPUs within a single cluster versus 4 CPUs
spread across different clusters shows a large contrast:
* inside a cluster:
root@ubuntu:~# taskset -c 0,1,2,3 hackbench -p -T -l 20000 -g 1
Running in threaded mode with 1 groups using 40 file descriptors each
(== 40 tasks)
Each sender will pass 20000 messages of 100 bytes
Time: 4.285
* across clusters:
root@ubuntu:~# taskset -c 0,4,8,12 hackbench -p -T -l 20000 -g 1
Running in threaded mode with 1 groups using 40 file descriptors each
(== 40 tasks)
Each sender will pass 20000 messages of 100 bytes
Time: 5.524
The score is 4.285 vs 5.524; shorter time means better performance.
All this testing implies that we should let the Linux scheduler use
this topology to make better load-balancing and WAKE_AFFINE decisions.
However, the current scheduler has no idea of clusters.
This patchset first exposes the cluster topology, then adds a sched
domain for clusters. While it is named "cluster", architectures and
machines can define the exact meaning of cluster as long as they have
some resources shared below the LLC and can leverage the affinity
of those resources to achieve better scheduling performance.
-v3:
- rebased against 5.11-rc2
- addressed the comments of Valentin Schneider, Peter Zijlstra,
  Vincent Guittot, Mel Gorman and others:
  * moved the scheduler changes from arm64 to the common place for all
    architectures.
  * added the SD_SHARE_CLS_RESOURCES sd_flag specifying the sched_domain
    from which select_idle_cpu() should begin to scan
  * removed the redundant select_idle_cluster() function since all code
    is in select_idle_cpu() now; this also avoids scanning cluster CPUs
    twice, as the v2 code did
  * redid the hackbench test in one NUMA node after the above changes
Valentin suggested that select_idle_cpu() could begin scanning from the
domain with SD_SHARE_PKG_RESOURCES. A change like this might be too
aggressive and limit the spreading of tasks. Thus, this patchset lets
architectures and machines decide where to start by adding the new
SD_SHARE_CLS_RESOURCES flag; a sketch of how an architecture might wire
this up follows below.
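
For illustration only, a minimal sketch of how an architecture's
sched-domain topology table could gain a cluster level below MC
(cpu_clustergroup_mask and the flag plumbing are assumptions in the
spirit of this series, not its verbatim code):

  /* Hypothetical per-arch topology table with a cluster level. */
  #ifdef CONFIG_SCHED_CLUSTER
  static inline int cpu_cluster_flags(void)
  {
  	/* Let select_idle_cpu() start scanning from this domain. */
  	return SD_SHARE_CLS_RESOURCES;
  }
  #endif

  static struct sched_domain_topology_level example_topology[] = {
  #ifdef CONFIG_SCHED_SMT
  	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
  #endif
  #ifdef CONFIG_SCHED_CLUSTER
  	{ cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
  #endif
  	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
  	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
  	{ NULL, },
  };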
Barry Song (1):
scheduler: add scheduler level for clusters
Jonathan Cameron (1):
topology: Represent clusters of CPUs within a die.
Documentation/admin-guide/cputopology.rst | 26 +++++++++++---
arch/arm64/Kconfig | 7 ++++
arch/arm64/kernel/topology.c | 2 ++
drivers/acpi/pptt.c | 60 +++++++++++++++++++++++++++++++
drivers/base/arch_topology.c | 14 ++++++++
drivers/base/topology.c | 10 ++++++
include/linux/acpi.h | 5 +++
include/linux/arch_topology.h | 5 +++
include/linux/sched/sd_flags.h | 9 +++++
include/linux/sched/topology.h | 7 ++++
include/linux/topology.h | 13 +++++++
kernel/sched/fair.c | 27 ++++++++++----
kernel/sched/topology.c | 6 ++++
13 files changed, 181 insertions(+), 10 deletions(-)
--
2.7.4
If the user has explicitly requested the number of {R,T}XQs, then it is
unnecessary to get the count of already available {R,T}XQs from the PF
avail_{r,t}xqs bitmap. This value will get overridden by the user-specified
value in any case.
This patch does a minor re-organization of the code to improve the flow and
readability. This scope for improvement was found during review of the ICE
driver code.
FYI, I could not test this change due to unavailability of the hardware. It
would be helpful if somebody could test this and provide a Tested-by tag.
Many thanks!
Fixes: 11b7551e096d ("ice: Implement ethtool ops for channels")
Signed-off-by: Salil Mehta <salil.mehta(a)huawei.com>
---
drivers/net/ethernet/intel/ice/ice_lib.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index d13c7fc8fb0a..161e8dfe548c 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -161,12 +161,13 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id)
switch (vsi->type) {
case ICE_VSI_PF:
- vsi->alloc_txq = min3(pf->num_lan_msix,
- ice_get_avail_txq_count(pf),
- (u16)num_online_cpus());
if (vsi->req_txq) {
vsi->alloc_txq = vsi->req_txq;
vsi->num_txq = vsi->req_txq;
+ } else {
+ vsi->alloc_txq = min3(pf->num_lan_msix,
+ ice_get_avail_txq_count(pf),
+ (u16)num_online_cpus());
}
pf->num_lan_tx = vsi->alloc_txq;
@@ -175,12 +176,13 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id)
if (!test_bit(ICE_FLAG_RSS_ENA, pf->flags)) {
vsi->alloc_rxq = 1;
} else {
- vsi->alloc_rxq = min3(pf->num_lan_msix,
- ice_get_avail_rxq_count(pf),
- (u16)num_online_cpus());
if (vsi->req_rxq) {
vsi->alloc_rxq = vsi->req_rxq;
vsi->num_rxq = vsi->req_rxq;
+ } else {
+ vsi->alloc_rxq = min3(pf->num_lan_msix,
+ ice_get_avail_rxq_count(pf),
+ (u16)num_online_cpus());
}
}
--
2.17.1