[PATCH v2] PCI/DPC: Check host->native_dpc before enabling DPC service
by Yicong Yang
Per the Downstream Port Containment Related Enhancements ECN [1],
Table 4-6 (Interpretation of _OSC Control Field Returned Value),
bit 7 of the _OSC control return value is defined as:
"Firmware sets this bit to 1 to grant the OS control over PCI Express
Downstream Port Containment configuration."
"If control of this feature was requested and denied,
or was not requested, the firmware returns this bit set to 0."
We store bit 7 of the _OSC control return value in host->native_dpc and
check it before enabling the DPC service, since the firmware may not
have granted the OS control of it.
[1] Downstream Port Containment Related Enhancements ECN,
Jan 28, 2019, affecting PCI Firmware Specification, Rev. 3.2
https://members.pcisig.com/wg/PCI-SIG/document/12888
Signed-off-by: Yicong Yang <yangyicong(a)hisilicon.com>
---
Change since v1:
- use correct reference for _OSC control return value
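For reviewers, a minimal sketch of the resulting check; it assumes the
host bridge is looked up via pci_find_host_bridge(dev->bus) earlier in
get_port_device_capability(), which is not visible in the hunk below:

	struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);

	if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
	    pci_aer_available() &&
	    (pcie_ports_dpc_native ||
	     ((services & PCIE_PORT_SERVICE_AER) && host->native_dpc)))
		services |= PCIE_PORT_SERVICE_DPC;	/* only with firmware consent */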
drivers/pci/pcie/portdrv_core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index e1fed664..7445d03 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -253,7 +253,8 @@ static int get_port_device_capability(struct pci_dev *dev)
*/
if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
pci_aer_available() &&
- (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
+ (pcie_ports_dpc_native ||
+ ((services & PCIE_PORT_SERVICE_AER) && host->native_dpc)))
services |= PCIE_PORT_SERVICE_DPC;
if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
--
2.8.1
[RFC PATCH 0/5] KVM/ARM64 Add support for pinned VMIDs
by Shameer Kolothum
On an ARM64 system with an SMMUv3 implementation that fully supports
the Broadcast TLB Maintenance (BTM) feature as part of the Distributed
Virtual Memory (DVM) protocol, the CPU TLB invalidate instructions are
also received by the SMMUv3. This is very useful when the SMMUv3 shares
the page tables with the CPU (e.g. the guest SVA use case). For this to
work, the SMMU must use the same VMID that is allocated by KVM to
configure the stage 2 translations. At present, KVM VMID allocations are
recycled on rollover and may change as a result. This creates issues if
we have to share the KVM VMID with the SMMU.
Please see the discussion here,
https://lore.kernel.org/linux-iommu/20200522101755.GA3453945@myrica/
This series proposes a way to share the VMID between KVM and the IOMMU
driver by:
1. Splitting the KVM VMID space into two equal halves based on the
command line option "kvm-arm.pinned_vmid_enable".
2. The first half of the VMID space follows the normal recycle-on-rollover
policy.
3. The second half of the VMID space does not roll over and is used to
allocate pinned VMIDs.
4. Providing a helper function to retrieve the KVM instance associated
with a device (if it is part of a vfio group).
5. Introducing generic interfaces to get/put pinned KVM VMIDs (a rough
sketch follows this list).
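A rough sketch of the generic get/put interfaces this series has in mind;
the names and signatures below are illustrative only, see the individual
patches for the actual definitions:

	/* Pin a VMID for this VM so that it survives VMID rollover; the
	 * pinned value can then be programmed into the SMMUv3 stage 2
	 * configuration. (Illustrative prototype.)
	 */
	int kvm_pinned_vmid_get(struct kvm *kvm);

	/* Drop the pin once the IOMMU driver no longer needs the VMID.
	 * (Illustrative prototype.)
	 */
	void kvm_pinned_vmid_put(struct kvm *kvm);

	/* Hypothetical helper name: look up the KVM instance associated
	 * with a device that is part of a vfio group (patch 1).
	 */
	struct kvm *vfio_kvm_get_from_dev(struct device *dev);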
Open Items:
1. I couldn't figure out a way to determine whether a platform actually
fully supports DVM/BTM or not. I'm not sure we can make that call based on
the SMMUv3 BTM feature bit alone. Perhaps we can get it from firmware
via IORT?
2. The current splitting of VMID space is only one way to do this and
probably not the best. Maybe we can follow the pinned ASID method used
in SVA code. Suggestions welcome here.
3. The detach_pasid_table() interface is not very clear to me as the current
Qemu prototype is not using it. This requires fixing on my side.
This is based on Jean-Philippe's SVA series[1] and Eric's SMMUv3 dual-stage
support series[2].
The branch with the whole vSVA + BTM solution is here,
https://github.com/hisilicon/kernel-dev/tree/5.10-rc4-2stage-v13-vsva-btm...
This is lightly tested on a HiSilicon D06 platform with the uacce/zip dev test tool:
./zip_sva_per -k tlb
Thanks,
Shameer
1. https://github.com/Linaro/linux-kernel-uadk/commits/uacce-devel-5.10
2. https://lore.kernel.org/linux-iommu/20201118112151.25412-1-eric.auger@red...
Shameer Kolothum (5):
vfio: Add a helper to retrieve kvm instance from a dev
KVM: Add generic infrastructure to support pinned VMIDs
KVM: ARM64: Add support for pinned VMIDs
iommu/arm-smmu-v3: Use pinned VMID for NESTED stage with BTM
KVM: arm64: Make sure pinned vmid is released on VM exit
arch/arm64/include/asm/kvm_host.h | 2 +
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/arm.c | 116 +++++++++++++++++++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 49 ++++++++-
drivers/vfio/vfio.c | 12 ++
include/linux/kvm_host.h | 17 +++
include/linux/vfio.h | 1 +
virt/kvm/Kconfig | 2 +
virt/kvm/kvm_main.c | 25 +++++
9 files changed, 220 insertions(+), 5 deletions(-)
--
2.17.1
[PATCH net-next 0/9] net: hns3: refactor and new features for flow director
by Huazhong Tan
This patchset refactors some functions and adds new features for the
flow director.
patch 1~3: refactor large functions
patch 4, 7: add traffic class and user-def field support for ethtool (example below)
patch 5: use asynchronous configuration
patch 6: clean up hns3_del_all_fd_entries()
patch 8, 9: add support for queue bonding mode
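For reference, the user-def field is normally exercised through ethtool's
ntuple interface; the device name and values below are purely illustrative:

	# match UDP packets to port 4789 carrying user-def data 0x1 and
	# steer them to queue 3
	ethtool -N eth0 flow-type udp4 dst-port 4789 user-def 0x1 action 3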
Jian Shen (9):
net: hns3: refactor out hclge_add_fd_entry()
net: hns3: refactor out hclge_fd_get_tuple()
net: hns3: refactor for function hclge_fd_convert_tuple
net: hns3: add support for traffic class tuple support for flow
director by ethtool
net: hns3: refactor flow director configuration
net: hns3: refine for hns3_del_all_fd_entries()
net: hns3: add support for user-def data of flow director
net: hns3: add support for queue bonding mode of flow director
net: hns3: add queue bonding mode support for VF
drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h | 8 +
drivers/net/ethernet/hisilicon/hns3/hnae3.h | 9 +-
drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c | 7 +-
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 91 +-
drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 14 +-
drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 13 +-
.../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 2 +
.../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h | 21 +
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 1570 ++++++++++++++------
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.h | 63 +
.../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 33 +
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c | 2 +
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 74 +
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h | 7 +
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c | 17 +
15 files changed, 1450 insertions(+), 481 deletions(-)
--
2.7.4
[RFC PATCH v5 0/4] scheduler: expose the topology of clusters and add cluster scheduler
by Barry Song
ARM64 server chip Kunpeng 920 has 6 or 8 clusters in each NUMA node, and each
cluster has 4 cpus. All clusters share L3 cache data while each cluster has
local L3 tag. In addition, each cluster shares some internal system
bus. This means cache is much more affine inside one cluster than across
clusters.
+-----------------------------------+ +---------+
| +------+ +------+ +---------------------------+ |
| | CPU0 | | cpu1 | | +-----------+ | |
| +------+ +------+ | | | | |
| +----+ L3 | | |
| +------+ +------+ cluster | | tag | | |
| | CPU2 | | CPU3 | | | | | |
| +------+ +------+ | +-----------+ | |
| | | |
+-----------------------------------+ | |
+-----------------------------------+ | |
| +------+ +------+ +--------------------------+ |
| | | | | | +-----------+ | |
| +------+ +------+ | | | | |
| | | L3 | | |
| +------+ +------+ +----+ tag | | |
| | | | | | | | | |
| +------+ +------+ | +-----------+ | |
| | | |
+-----------------------------------+ | L3 |
| data |
+-----------------------------------+ | |
| +------+ +------+ | +-----------+ | |
| | | | | | | | | |
| +------+ +------+ +----+ L3 | | |
| | | tag | | |
| +------+ +------+ | | | | |
| | | | | ++ +-----------+ | |
| +------+ +------+ |---------------------------+ |
+-----------------------------------| | |
+-----------------------------------| | |
| +------+ +------+ +---------------------------+ |
| | | | | | +-----------+ | |
| +------+ +------+ | | | | |
| +----+ L3 | | |
| +------+ +------+ | | tag | | |
| | | | | | | | | |
| +------+ +------+ | +-----------+ | |
| | | |
+-----------------------------------+ | |
+-----------------------------------+ | |
| +------+ +------+ +--------------------------+ |
| | | | | | +-----------+ | |
| +------+ +------+ | | | | |
There is a similar need for clustering on x86. Some x86 cores share L2 caches
in a way that is similar to the cluster in Kunpeng 920 (e.g. on Jacobsville there are 6 clusters
of 4 Atom cores, each cluster sharing a separate L2, and 24 cores sharing L3).
Having a sched_domain for clusters will bring two aspects of improvement:
1. Spreading unrelated tasks among clusters, which decreases resource contention
and improves throughput.
unrelated tasks might be put randomly without cluster sched_domain:
+-------------------+ +-----------------+
| +----+ +----+ | | |
| |task| |task| | | |
| |1 | |2 | | | |
| +----+ +----+ | | |
| | | |
| cluster1 | | cluster2 |
+-------------------+ +-----------------+
but with a cluster sched_domain, they are likely to be spread apart by load balancing (LB):
+-------------------+ +-----------------+
| +----+ | | +----+ |
| |task| | | |task| |
| |1 | | | |2 | |
| +----+ | | +----+ |
| | | |
| cluster1 | | cluster2 |
+-------------------+ +-----------------+
2. Gathering related tasks within a cluster, which improves the cache affinity of tasks
talking to each other.
Without a cluster sched_domain, related tasks might be placed randomly. Suppose tasks 1-8
have the following wakeup relationships:
Task1 wakes up task4
Task2 wakes up task5
Task3 wakes up task6
Task4 wakes up task7
With select_idle_cpu() tuned to scan the local cluster first (a rough sketch of this
scanning order follows the diagrams), those tasks might get a chance to be gathered like:
+---------------------------+ +----------------------+
| +----+ +-----+ | | +----+ +-----+ |
| |task| |task | | | |task| |task | |
| |1 | | 4 | | | |2 | |5 | |
| +----+ +-----+ | | +----+ +-----+ |
| | | |
| cluster1 | | cluster2 |
| | | |
| | | |
| +-----+ +------+ | | +-----+ +------+ |
| |task | | task | | | |task | |task | |
| |3 | | 6 | | | |4 | |8 | |
| +-----+ +------+ | | +-----+ +------+ |
+---------------------------+ +----------------------+
Otherwise, the result might be:
+---------------------------+ +----------------------+
| +----+ +-----+ | | +----+ +-----+ |
| |task| |task | | | |task| |task | |
| |1 | | 2 | | | |5 | |6 | |
| +----+ +-----+ | | +----+ +-----+ |
| | | |
| cluster1 | | cluster2 |
| | | |
| | | |
| +-----+ +------+ | | +-----+ +------+ |
| |task | | task | | | |task | |task | |
| |3 | | 4 | | | |7 | |8 | |
| +-----+ +------+ | | +-----+ +------+ |
+---------------------------+ +----------------------+
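A rough, illustrative sketch of the cluster-first scanning order; this is not
the patch itself, and cpu_clustergroup_mask() is assumed here to be the
cluster-sibling mask helper introduced by the topology patch of this series:

	/* Try the target's cluster siblings before widening the idle-CPU
	 * search to the rest of the LLC.
	 */
	static int scan_cluster_first(struct task_struct *p, int target)
	{
		const struct cpumask *cluster = cpu_clustergroup_mask(target);
		int cpu;

		for_each_cpu_wrap(cpu, cluster, target + 1) {
			if (!cpumask_test_cpu(cpu, p->cpus_ptr))
				continue;
			if (available_idle_cpu(cpu))
				return cpu;
		}

		/* Nothing idle in the cluster: fall back to the LLC-wide scan. */
		return -1;
	}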
-v5:
* split "add scheduler level for clusters" into two patches to evaluate the
impact of spreading and gathering separately;
* add a tracepoint of select_idle_cpu for debug purpose; add bcc script in
commit log;
* add cluster_id = -1 in reset_cpu_topology()
* rebased to tip/sched/core
-v4:
* rebased to tip/sched/core with the latest unified code of select_idle_cpu
* added Tim's patch for x86 Jacobsville
* also added benchmark data of spreading unrelated tasks
* avoided the iteration of sched_domain by moving to static_key (addressing
Vincent's comment)
* used acpi_cpu_id for acpi_find_processor_node() (addressing Masa's comment)
Barry Song (2):
scheduler: add scheduler level for clusters
scheduler: scan idle cpu in cluster before scanning the whole llc
Jonathan Cameron (1):
topology: Represent clusters of CPUs within a die
Tim Chen (1):
scheduler: Add cluster scheduler level for x86
Documentation/admin-guide/cputopology.rst | 26 +++++++++++--
arch/arm64/Kconfig | 7 ++++
arch/arm64/kernel/topology.c | 2 +
arch/x86/Kconfig | 8 ++++
arch/x86/include/asm/smp.h | 7 ++++
arch/x86/include/asm/topology.h | 1 +
arch/x86/kernel/cpu/cacheinfo.c | 1 +
arch/x86/kernel/cpu/common.c | 3 ++
arch/x86/kernel/smpboot.c | 43 ++++++++++++++++++++-
drivers/acpi/pptt.c | 63 +++++++++++++++++++++++++++++++
drivers/base/arch_topology.c | 15 ++++++++
drivers/base/topology.c | 10 +++++
include/linux/acpi.h | 5 +++
include/linux/arch_topology.h | 5 +++
include/linux/sched/cluster.h | 19 ++++++++++
include/linux/sched/topology.h | 7 ++++
include/linux/topology.h | 13 +++++++
include/trace/events/sched.h | 22 +++++++++++
kernel/sched/core.c | 20 ++++++++++
kernel/sched/fair.c | 36 +++++++++++++++++-
kernel/sched/sched.h | 1 +
kernel/sched/topology.c | 5 +++
22 files changed, 313 insertions(+), 6 deletions(-)
create mode 100644 include/linux/sched/cluster.h
--
1.8.3.1
[PATCH net v3] net: sched: fix packet stuck problem for lockless qdisc
by Yunsheng Lin
Lockless qdisc has the following concurrency problem:
cpu0 cpu1
. .
q->enqueue .
. .
qdisc_run_begin() .
. .
dequeue_skb() .
. .
sch_direct_xmit() .
. .
. q->enqueue
. qdisc_run_begin()
. return and do nothing
. .
qdisc_run_end() .
cpu1 enqueues a skb without calling __qdisc_run() because cpu0
has not released the lock yet and spin_trylock() returns false
for cpu1 in qdisc_run_begin(), and cpu0 does not see the skb
enqueued by cpu1 when calling dequeue_skb() because cpu1 may
enqueue the skb after cpu0 calls dequeue_skb() and before
cpu0 calls qdisc_run_end().
Lockless qdisc has another concurrency problem when
tx_action is involved:
cpu0(serving tx_action) cpu1 cpu2
. . .
. q->enqueue .
. qdisc_run_begin() .
. dequeue_skb() .
. . q->enqueue
. . .
. sch_direct_xmit() .
. . qdisc_run_begin()
. . return and do nothing
. . .
clear __QDISC_STATE_SCHED . .
qdisc_run_begin() . .
return and do nothing . .
. . .
. qdisc_run_end() .
This patch fixes the above data race by:
1. Getting the flag before doing spin_trylock().
2. If the first spin_trylock() returns false and the flag was not
set before the first spin_trylock(), setting the flag and retrying
another spin_trylock(), in case the other CPU may not see the new
flag after it releases the lock.
3. Rescheduling if the flag is set after the lock is released
at the end of qdisc_run_end().
For the tx_action case, the flag is also set when cpu1 reaches the
end of qdisc_run_end(), so tx_action will be rescheduled
again to dequeue the skb enqueued by cpu2.
The flag is only cleared before retrying a dequeue when the dequeue
returns NULL, in order to reduce the overhead of the above double
spin_trylock() and __netif_schedule() calls.
The performance impact of this patch, tested using pktgen and
dummy netdev with pfifo_fast qdisc attached:
threads without+this_patch with+this_patch delta
1 2.61Mpps 2.60Mpps -0.3%
2 3.97Mpps 3.82Mpps -3.7%
4 5.62Mpps 5.59Mpps -0.5%
8 2.78Mpps 2.77Mpps -0.3%
16 2.22Mpps 2.22Mpps -0.0%
Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
Signed-off-by: Yunsheng Lin <linyunsheng(a)huawei.com>
---
V3: fix a compile error and a few comment typos, remove the
__QDISC_STATE_DEACTIVATED checking, and update the
performance data.
V2: Avoid the overhead of fixing the data race as much as
possible.
---
include/net/sch_generic.h | 38 +++++++++++++++++++++++++++++++++++++-
net/sched/sch_generic.c | 12 ++++++++++++
2 files changed, 49 insertions(+), 1 deletion(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f7a6e14..e3f46eb 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -36,6 +36,7 @@ struct qdisc_rate_table {
enum qdisc_state_t {
__QDISC_STATE_SCHED,
__QDISC_STATE_DEACTIVATED,
+ __QDISC_STATE_NEED_RESCHEDULE,
};
struct qdisc_size_table {
@@ -159,8 +160,38 @@ static inline bool qdisc_is_empty(const struct Qdisc *qdisc)
static inline bool qdisc_run_begin(struct Qdisc *qdisc)
{
if (qdisc->flags & TCQ_F_NOLOCK) {
+ bool dont_retry = test_bit(__QDISC_STATE_NEED_RESCHEDULE,
+ &qdisc->state);
+
+ if (spin_trylock(&qdisc->seqlock))
+ goto nolock_empty;
+
+ /* If the flag is set before doing the spin_trylock() and
+ * the above spin_trylock() returns false, it means the other cpu
+ * holding the lock will do the dequeuing for us, or it will see
+ * the flag set after releasing the lock and reschedule the
+ * net_tx_action() to do the dequeuing.
+ */
+ if (dont_retry)
+ return false;
+
+ /* We could do the set_bit() before the first spin_trylock()
+ * and avoid the second spin_trylock() completely, but then
+ * multiple cpus could be doing the set_bit(). Here we use
+ * dont_retry to avoid the set_bit() and the second
+ * spin_trylock() when possible, which is about 5% faster than
+ * doing the set_bit() before the first spin_trylock().
+ */
+ set_bit(__QDISC_STATE_NEED_RESCHEDULE,
+ &qdisc->state);
+
+ /* Retry again in case other CPU may not see the new flag
+ * after it releases the lock at the end of qdisc_run_end().
+ */
if (!spin_trylock(&qdisc->seqlock))
return false;
+
+nolock_empty:
WRITE_ONCE(qdisc->empty, false);
} else if (qdisc_is_running(qdisc)) {
return false;
@@ -176,8 +207,13 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
static inline void qdisc_run_end(struct Qdisc *qdisc)
{
write_seqcount_end(&qdisc->running);
- if (qdisc->flags & TCQ_F_NOLOCK)
+ if (qdisc->flags & TCQ_F_NOLOCK) {
spin_unlock(&qdisc->seqlock);
+
+ if (unlikely(test_bit(__QDISC_STATE_NEED_RESCHEDULE,
+ &qdisc->state)))
+ __netif_schedule(qdisc);
+ }
}
static inline bool qdisc_may_bulk(const struct Qdisc *qdisc)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 44991ea..4953430 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -640,8 +640,10 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
{
struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
struct sk_buff *skb = NULL;
+ bool need_retry = true;
int band;
+retry:
for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
struct skb_array *q = band2list(priv, band);
@@ -652,6 +654,16 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
}
if (likely(skb)) {
qdisc_update_stats_at_dequeue(qdisc, skb);
+ } else if (need_retry &&
+ test_and_clear_bit(__QDISC_STATE_NEED_RESCHEDULE,
+ &qdisc->state)) {
+ /* do another dequeuing after clearing the flag to
+ * avoid calling __netif_schedule().
+ */
+ smp_mb__after_atomic();
+ need_retry = false;
+
+ goto retry;
} else {
WRITE_ONCE(qdisc->empty, true);
}
--
2.7.4
[PATCH] app/testpmd: support Tx mbuf free on demand cmd
by Lijun Ou
From: Chengwen Feng <fengchengwen(a)huawei.com>
This patch supports the tx_done_cleanup command:
tx_done_cleanup port (port_id) (queue_id) (free_cnt)
The user must make sure there is no concurrent access to the same Tx
queue (e.g. rte_eth_tx_burst, rte_eth_dev_tx_queue_stop and so on)
when this command is executed.
Signed-off-by: Chengwen Feng <fengchengwen(a)huawei.com>
Signed-off-by: Lijun Ou <oulijun(a)huawei.com>
---
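For reference, an example invocation inside testpmd; the port, queue and
free_cnt values below are illustrative (free_cnt 0 conventionally requests
freeing as many used mbufs as possible):

	testpmd> tx_done_cleanup port 0 0 0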
app/test-pmd/cmdline.c | 91 +++++++++++++++++++++++++++++
doc/guides/rel_notes/release_21_05.rst | 2 +
doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 +++
3 files changed, 100 insertions(+)
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 14110eb..832ae70 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -36,6 +36,7 @@
#include <rte_pci.h>
#include <rte_ether.h>
#include <rte_ethdev.h>
+#include <rte_ethdev_driver.h>
#include <rte_string_fns.h>
#include <rte_devargs.h>
#include <rte_flow.h>
@@ -675,6 +676,9 @@ static void cmd_help_long_parsed(void *parsed_result,
"set port (port_id) ptype_mask (ptype_mask)\n"
" set packet types classification for a specific port\n\n"
+ "tx_done_cleanup (port_id) (queue_id) (free_cnt)\n"
+ " Cleanup a tx queue's mbuf on a port\n\n"
+
"set port (port_id) queue-region region_id (value) "
"queue_start_index (value) queue_num (value)\n"
" Set a queue region on a port\n\n"
@@ -16910,6 +16914,92 @@ cmdline_parse_inst_t cmd_showport_macs = {
},
};
+/* *** tx_done_cleanup *** */
+struct cmd_tx_done_cleanup_result {
+ cmdline_fixed_string_t clean;
+ cmdline_fixed_string_t port;
+ uint16_t port_id;
+ uint16_t queue_id;
+ uint32_t free_cnt;
+};
+
+static void
+cmd_tx_done_cleanup_parsed(void *parsed_result,
+ __rte_unused struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_tx_done_cleanup_result *res = parsed_result;
+ struct rte_eth_dev *dev;
+ uint16_t port_id = res->port_id;
+ uint16_t queue_id = res->queue_id;
+ uint32_t free_cnt = res->free_cnt;
+ int ret;
+
+ if (!rte_eth_dev_is_valid_port(port_id)) {
+ printf("Invalid port_id %u\n", port_id);
+ return;
+ }
+
+ dev = &rte_eth_devices[port_id];
+ if (queue_id >= dev->data->nb_tx_queues) {
+ printf("Invalid TX queue_id %u\n", queue_id);
+ return;
+ }
+
+ if (dev->data->tx_queue_state[queue_id] !=
+ RTE_ETH_QUEUE_STATE_STARTED) {
+ printf("TX queue_id %u not started!\n", queue_id);
+ return;
+ }
+
+ /*
+ * rte_eth_tx_done_cleanup() is a dataplane API; the user must make sure
+ * there is no concurrent access to the same Tx queue (like
+ * rte_eth_tx_burst, rte_eth_dev_tx_queue_stop and so on) when this API
+ * is called.
+ */
+ ret = rte_eth_tx_done_cleanup(port_id, queue_id, free_cnt);
+ if (ret < 0) {
+ printf("Failed to cleanup mbuf for port %u TX queue %u "
+ "error desc: %s(%d)\n",
+ port_id, queue_id, strerror(-ret), ret);
+ return;
+ }
+
+ printf("Cleanup port %u TX queue %u mbuf nums: %u\n",
+ port_id, queue_id, ret);
+}
+
+cmdline_parse_token_string_t cmd_tx_done_cleanup_clean =
+ TOKEN_STRING_INITIALIZER(struct cmd_tx_done_cleanup_result, clean,
+ "tx_done_cleanup");
+cmdline_parse_token_string_t cmd_tx_done_cleanup_port =
+ TOKEN_STRING_INITIALIZER(struct cmd_tx_done_cleanup_result, port,
+ "port");
+cmdline_parse_token_num_t cmd_tx_done_cleanup_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_tx_done_cleanup_result, port_id,
+ UINT16);
+cmdline_parse_token_num_t cmd_tx_done_cleanup_queue_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_tx_done_cleanup_result, queue_id,
+ UINT16);
+cmdline_parse_token_num_t cmd_tx_done_cleanup_free_cnt =
+ TOKEN_NUM_INITIALIZER(struct cmd_tx_done_cleanup_result, free_cnt,
+ UINT32);
+
+cmdline_parse_inst_t cmd_tx_done_cleanup = {
+ .f = cmd_tx_done_cleanup_parsed,
+ .data = NULL,
+ .help_str = "tx_done_cleanup port <port_id> <queue_id> <free_cnt>",
+ .tokens = {
+ (void *)&cmd_tx_done_cleanup_clean,
+ (void *)&cmd_tx_done_cleanup_port,
+ (void *)&cmd_tx_done_cleanup_port_id,
+ (void *)&cmd_tx_done_cleanup_queue_id,
+ (void *)&cmd_tx_done_cleanup_free_cnt,
+ NULL,
+ },
+};
+
/* ******************************************************************************** */
/* list of instructions */
@@ -17035,6 +17125,7 @@ cmdline_parse_ctx_t main_ctx[] = {
(cmdline_parse_inst_t *)&cmd_config_rss_reta,
(cmdline_parse_inst_t *)&cmd_showport_reta,
(cmdline_parse_inst_t *)&cmd_showport_macs,
+ (cmdline_parse_inst_t *)&cmd_tx_done_cleanup,
(cmdline_parse_inst_t *)&cmd_config_burst,
(cmdline_parse_inst_t *)&cmd_config_thresh,
(cmdline_parse_inst_t *)&cmd_config_threshold,
diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
index 23f7f0b..8077573 100644
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -69,6 +69,8 @@ New Features
* Added command to display Rx queue used descriptor count.
``show port (port_id) rxq (queue_id) desc used count``
+ * Added command to cleanup a Tx queue's mbuf on a port.
+ ``tx_done_cleanup port <port_id> <queue_id> <free_cnt>``
Removed Items
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index f59eb8a..39281f5 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -272,6 +272,13 @@ and ready to be processed by the driver on a given RX queue::
testpmd> show port (port_id) rxq (queue_id) desc used count
+cleanup txq mbufs
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Request the driver to free mbufs currently cached by the driver for a given port's
+Tx queue::
+ testpmd> tx_done_cleanup port (port_id) (queue_id) (free_cnt)
+
show config
~~~~~~~~~~~
--
2.7.4
[PATCH] ethdev: add queue state when retrieve queue information
by Lijun Ou
Currently, an upper-layer application can get the queue state only
through pointers such as dev->data->tx_queue_state[queue_id], which
is not the recommended way to access it. So this patch adds the queue
state to the information returned by the rte_eth_rx_queue_info_get and
rte_eth_tx_queue_info_get APIs.
Note: The hairpin queue is not supported by the above
rte_eth_*x_queue_info_get calls, so the queue state returned can only be
RTE_ETH_QUEUE_STATE_STARTED or RTE_ETH_QUEUE_STATE_STOPPED.
Note: After adding the queue_state field, the 'struct rte_eth_rxq_info'
size remains 128B and the 'struct rte_eth_txq_info' size remains 64B,
so the change is ABI compatible.
Signed-off-by: Chengwen Feng <fengchengwen(a)huawei.com>
Signed-off-by: Lijun Ou <oulijun(a)huawei.com>
---
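A short sketch of how an application could read the new field once this
change is in place; the port/queue ids and the minimal error handling are
illustrative only:

	struct rte_eth_rxq_info rx_qinfo;

	if (rte_eth_rx_queue_info_get(port_id, queue_id, &rx_qinfo) == 0)
		printf("rxq %u state: %s\n", queue_id,
		       rx_qinfo.queue_state == RTE_ETH_QUEUE_STATE_STARTED ?
		       "started" : "stopped");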
doc/guides/rel_notes/release_21_05.rst | 6 ++++++
lib/librte_ethdev/rte_ethdev.c | 3 +++
lib/librte_ethdev/rte_ethdev.h | 4 ++++
3 files changed, 13 insertions(+)
diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
index 43063e3..165b5f7 100644
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -156,6 +156,12 @@ ABI Changes
* No ABI change that would break compatibility with 20.11.
+* Added new field ``queue_state`` to ``rte_eth_rxq_info`` structure
+ to provide the Rx queue state.
+
+* Added new field ``queue_state`` to ``rte_eth_txq_info`` structure
+ to provide the Tx queue state.
+
Known Issues
------------
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 3059aa5..fbd10b2 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -5042,6 +5042,8 @@ rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
memset(qinfo, 0, sizeof(*qinfo));
dev->dev_ops->rxq_info_get(dev, queue_id, qinfo);
+ qinfo->queue_state = dev->data->rx_queue_state[queue_id];
+
return 0;
}
@@ -5082,6 +5084,7 @@ rte_eth_tx_queue_info_get(uint16_t port_id, uint16_t queue_id,
memset(qinfo, 0, sizeof(*qinfo));
dev->dev_ops->txq_info_get(dev, queue_id, qinfo);
+ qinfo->queue_state = dev->data->tx_queue_state[queue_id];
return 0;
}
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index efda313..3b83c5a 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1591,6 +1591,8 @@ struct rte_eth_rxq_info {
uint8_t scattered_rx; /**< scattered packets RX supported. */
uint16_t nb_desc; /**< configured number of RXDs. */
uint16_t rx_buf_size; /**< hardware receive buffer size. */
+ /** Queue state: STARTED(1) / STOPPED(0). */
+ uint8_t queue_state;
} __rte_cache_min_aligned;
/**
@@ -1600,6 +1602,8 @@ struct rte_eth_rxq_info {
struct rte_eth_txq_info {
struct rte_eth_txconf conf; /**< queue config parameters. */
uint16_t nb_desc; /**< configured number of TXDs. */
+ /** Queue state: STARTED(1) / STOPPED(0). */
+ uint8_t queue_state;
} __rte_cache_min_aligned;
/* Generic Burst mode flag definition, values can be ORed. */
--
2.7.4
[RFC PATCH v3 0/2] scheduler: expose the topology of clusters and add cluster scheduler
by Barry Song
ARM64 server chip Kunpeng 920 has 6 clusters in each NUMA node, and each
cluster has 4 cpus. All clusters share L3 cache data while each cluster
has local L3 tag. In addition, each cluster shares some
internal system bus. This means cache is much more affine inside one cluster
than across clusters.
+-----------------------------------+ +---------+
| +------+ +------+ +---------------------------+ |
| | CPU0 | | cpu1 | | +-----------+ | |
| +------+ +------+ | | | | |
| +----+ L3 | | |
| +------+ +------+ cluster | | tag | | |
| | CPU2 | | CPU3 | | | | | |
| +------+ +------+ | +-----------+ | |
| | | |
+-----------------------------------+ | |
+-----------------------------------+ | |
| +------+ +------+ +--------------------------+ |
| | | | | | +-----------+ | |
| +------+ +------+ | | | | |
| | | L3 | | |
| +------+ +------+ +----+ tag | | |
| | | | | | | | | |
| +------+ +------+ | +-----------+ | |
| | | |
+-----------------------------------+ | L3 |
| data |
+-----------------------------------+ | |
| +------+ +------+ | +-----------+ | |
| | | | | | | | | |
| +------+ +------+ +----+ L3 | | |
| | | tag | | |
| +------+ +------+ | | | | |
| | | | | ++ +-----------+ | |
| +------+ +------+ |---------------------------+ |
+-----------------------------------| | |
+-----------------------------------| | |
| +------+ +------+ +---------------------------+ |
| | | | | | +-----------+ | |
| +------+ +------+ | | | | |
| +----+ L3 | | |
| +------+ +------+ | | tag | | |
| | | | | | | | | |
| +------+ +------+ | +-----------+ | |
| | | |
+-----------------------------------+ | |
+-----------------------------------+ | |
| +------+ +------+ +--------------------------+ |
| | | | | | +-----------+ | |
| +------+ +------+ | | | | |
Through the following small program, you can see the performance impact of
running it in one cluster and across two clusters:
#include <pthread.h>

/* x and y live in the same small struct, so the two threads below
 * typically contend on the same cache line.
 */
struct foo {
	int x;
	int y;
} f;

void *thread1_fun(void *param)
{
	int s = 0;
	for (int i = 0; i < 0xfffffff; i++)
		s += f.x;
	return NULL;
}

void *thread2_fun(void *param)
{
	int s = 0;
	for (int i = 0; i < 0xfffffff; i++)
		f.y++;
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t tid1, tid2;

	pthread_create(&tid1, NULL, thread1_fun, NULL);
	pthread_create(&tid2, NULL, thread2_fun, NULL);
	pthread_join(tid1, NULL);
	pthread_join(tid2, NULL);
	return 0;
}
While running this program in one cluster, it takes:
$ time taskset -c 0,1 ./a.out
real 0m0.832s
user 0m1.649s
sys 0m0.004s
As a contrast, it takes much more time if we run the same program
in two clusters:
$ time taskset -c 0,4 ./a.out
real 0m1.133s
user 0m1.960s
sys 0m0.000s
0.832 / 1.133 = 73%; it is a huge difference.
Also, hackbench running on 4 cpus within a single cluster versus 4 cpus spread
across different clusters shows a large contrast:
* inside a cluster:
root@ubuntu:~# taskset -c 0,1,2,3 hackbench -p -T -l 20000 -g 1
Running in threaded mode with 1 groups using 40 file descriptors each
(== 40 tasks)
Each sender will pass 20000 messages of 100 bytes
Time: 4.285
* across clusters:
root@ubuntu:~# taskset -c 0,4,8,12 hackbench -p -T -l 20000 -g 1
Running in threaded mode with 1 groups using 40 file descriptors each
(== 40 tasks)
Each sender will pass 20000 messages of 100 bytes
Time: 5.524
The score is 4.285 vs 5.524; a shorter time means better performance.
All this testing implies that we should let the Linux scheduler use
this topology to make better load balancing and WAKE_AFFINE decisions.
However, the current scheduler has no idea of clusters at all.
This patchset first exposes the cluster topology, then adds a sched
domain for clusters. While it is named "cluster", architectures and
machines can define the exact meaning of cluster as long as they have
some resource shared below the LLC and can leverage the affinity
of this resource to achieve better scheduling performance.
-v3:
- rebased against 5.11-rc2
- with respect to the comments of Valentin Schneider, Peter Zijlstra,
Vincent Guittot and Mel Gorman etc.
* moved the scheduler changes from arm64 to the common place for all
architectures.
* added SD_SHARE_CLS_RESOURCES sd_flags specifying the sched_domain
where select_idle_cpu() should begin to scan from
* removed the redundant select_idle_cluster() function since all code is
in select_idle_cpu() now. It also avoids scanning cluster cpus
twice, as the v2 code did;
* redid the hackbench runs within one NUMA node after the above changes
Valentin suggested that select_idle_cpu() could begin to scan from the
domain with SD_SHARE_PKG_RESOURCES. Changing it like this might be too
aggressive and limit the spreading of tasks. Thus, this patch lets
architectures and machines decide where to start by adding
a new SD_SHARE_CLS_RESOURCES flag.
Barry Song (1):
scheduler: add scheduler level for clusters
Jonathan Cameron (1):
topology: Represent clusters of CPUs within a die.
Documentation/admin-guide/cputopology.rst | 26 +++++++++++---
arch/arm64/Kconfig | 7 ++++
arch/arm64/kernel/topology.c | 2 ++
drivers/acpi/pptt.c | 60 +++++++++++++++++++++++++++++++
drivers/base/arch_topology.c | 14 ++++++++
drivers/base/topology.c | 10 ++++++
include/linux/acpi.h | 5 +++
include/linux/arch_topology.h | 5 +++
include/linux/sched/sd_flags.h | 9 +++++
include/linux/sched/topology.h | 7 ++++
include/linux/topology.h | 13 +++++++
kernel/sched/fair.c | 27 ++++++++++----
kernel/sched/topology.c | 6 ++++
13 files changed, 181 insertions(+), 10 deletions(-)
--
2.7.4
[PATCH v1 0/2] scsi: libsas: few clean up patches
by Luo Jiaxing
Two types of errors are detected by checkpatch:
1. Misaligned switch and case statements
2. Improper use of whitespace
Here are the cleanup patches.
Luo Jiaxing (2):
scsi: libsas: make switch and case at the same indent in
sas_to_ata_err()
scsi: libsas: clean up for white spaces
drivers/scsi/libsas/sas_ata.c | 74 ++++++++++++++++++--------------------
drivers/scsi/libsas/sas_discover.c | 2 +-
drivers/scsi/libsas/sas_expander.c | 15 ++++----
3 files changed, 43 insertions(+), 48 deletions(-)
--
2.7.4
[PATCH 0/3] Fixes for testpmd
by Lijun Ou
This series adds two bug fixes and one print-style cleanup for testpmd.
Hongbo Zheng (1):
app/testpmd: use of Rx/Tx in testpmd
Huisong Li (2):
app/testpmd: fix forwarding configuration when DCB test
app/testpmd: remove forwarding config from parsing Rx and Tx
app/test-pmd/cmdline.c | 106 ++++++++++++++++----------------
app/test-pmd/config.c | 147 +++++++++++++++++++++++++--------------------
app/test-pmd/csumonly.c | 22 +++----
app/test-pmd/icmpecho.c | 2 +-
app/test-pmd/ieee1588fwd.c | 18 +++---
app/test-pmd/parameters.c | 50 +++++++--------
app/test-pmd/testpmd.c | 132 ++++++++++++++++++++--------------------
app/test-pmd/testpmd.h | 28 ++++-----
app/test-pmd/txonly.c | 2 +-
9 files changed, 263 insertions(+), 244 deletions(-)
--
2.7.4