[RFC PATCH v3 0/2] scheduler: expose the topology of clusters and add cluster scheduler
by Barry Song
The ARM64 server chip Kunpeng 920 has 6 clusters in each NUMA node, and each
cluster has 4 CPUs. All clusters share the L3 cache data while each cluster
has a local L3 tag. In addition, the CPUs within a cluster share an
internal system bus. This means the cache is much more affine inside one
cluster than across clusters.
[ASCII topology diagram: each cluster contains four CPUs and a private L3 tag block; all six clusters attach to a shared L3 data block.]
The following small program shows the performance impact of running it
within one cluster versus across two clusters:

/*
 * f.x and f.y share one cache line: thread1 keeps reading f.x while
 * thread2 keeps writing f.y, so the cache line bounces between the two
 * CPUs running the threads.
 */
#include <pthread.h>

struct foo {
	int x;
	int y;
} f;

void *thread1_fun(void *param)
{
	int s = 0;

	for (int i = 0; i < 0xfffffff; i++)
		s += f.x;
	return NULL;
}

void *thread2_fun(void *param)
{
	for (int i = 0; i < 0xfffffff; i++)
		f.y++;
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t tid1, tid2;

	pthread_create(&tid1, NULL, thread1_fun, NULL);
	pthread_create(&tid2, NULL, thread2_fun, NULL);
	pthread_join(tid1, NULL);
	pthread_join(tid2, NULL);
	return 0;
}
While running this program in one cluster, it takes:
$ time taskset -c 0,1 ./a.out
real 0m0.832s
user 0m1.649s
sys 0m0.004s
By contrast, it takes much more time if we run the same program
across two clusters:
$ time taskset -c 0,4 ./a.out
real 0m1.133s
user 0m1.960s
sys 0m0.000s
0.832/1.133 = 73%; it is a huge difference.
Hackbench running on 4 CPUs within a single cluster versus 4 CPUs in
different clusters also shows a large contrast:
* inside a cluster:
root@ubuntu:~# taskset -c 0,1,2,3 hackbench -p -T -l 20000 -g 1
Running in threaded mode with 1 groups using 40 file descriptors each
(== 40 tasks)
Each sender will pass 20000 messages of 100 bytes
Time: 4.285
* across clusters:
root@ubuntu:~# taskset -c 0,4,8,12 hackbench -p -T -l 20000 -g 1
Running in threaded mode with 1 groups using 40 file descriptors each
(== 40 tasks)
Each sender will pass 20000 messages of 100 bytes
Time: 5.524
The score is 4.285 vs 5.524; a shorter time means better performance.
All this testing implies that we should let the Linux scheduler use
this topology to make better load-balancing and WAKE_AFFINE decisions.
However, the current scheduler has no idea of clusters.
This patchset first exposes the cluster topology, then adds a sched
domain for clusters. While it is named "cluster", architectures and
machines can define the exact meaning of cluster as long as some
resources are shared below the LLC and the affinity of those resources
can be leveraged to achieve better scheduling performance.
-v3:
- rebased against 5.11-rc2
- addressed the comments of Valentin Schneider, Peter Zijlstra,
Vincent Guittot, Mel Gorman and others:
* moved the scheduler changes from arm64 to the common place for all
architectures.
* added the SD_SHARE_CLS_RESOURCES sd_flag specifying the sched_domain
from which select_idle_cpu() should begin to scan
* removed the redundant select_idle_cluster() function since all code is
in select_idle_cpu() now; this also avoids scanning cluster CPUs
twice as the v2 code did
* redid the hackbench test within one NUMA node after the above changes
Valentin suggested that select_idle_cpu() could begin to scan from the
domain with SD_SHARE_PKG_RESOURCES. Changing it like this might be too
aggressive and limit the spreading of tasks. Thus, this patch lets
architectures and machines decide where to start by adding
a new SD_SHARE_CLS_RESOURCES.
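For reference, a minimal sketch of where the new cluster level could sit in
the default sched_domain topology table; the CLS entry and the
cpu_clustergroup_mask()/cpu_cluster_flags() helpers are assumed from this
patchset, while the surrounding SMT/MC/DIE entries mirror the existing
default table:

/*
 * Sketch only: a cluster (CLS) level between SMT and MC. cpu_cluster_flags()
 * is assumed to set the SD_SHARE_CLS_RESOURCES flag introduced by this series.
 */
static struct sched_domain_topology_level default_topology[] = {
#ifdef CONFIG_SCHED_SMT
	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
#ifdef CONFIG_SCHED_CLUSTER
	{ cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
#endif
#ifdef CONFIG_SCHED_MC
	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
#endif
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ NULL, },
};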
Barry Song (1):
scheduler: add scheduler level for clusters
Jonathan Cameron (1):
topology: Represent clusters of CPUs within a die.
Documentation/admin-guide/cputopology.rst | 26 +++++++++++---
arch/arm64/Kconfig | 7 ++++
arch/arm64/kernel/topology.c | 2 ++
drivers/acpi/pptt.c | 60 +++++++++++++++++++++++++++++++
drivers/base/arch_topology.c | 14 ++++++++
drivers/base/topology.c | 10 ++++++
include/linux/acpi.h | 5 +++
include/linux/arch_topology.h | 5 +++
include/linux/sched/sd_flags.h | 9 +++++
include/linux/sched/topology.h | 7 ++++
include/linux/topology.h | 13 +++++++
kernel/sched/fair.c | 27 ++++++++++----
kernel/sched/topology.c | 6 ++++
13 files changed, 181 insertions(+), 10 deletions(-)
--
2.7.4
[PATCH v2 0/2] EDAC/ghes: Add EDAC device for reporting the CPU cache error count
by Shiju Jose
CPU cache corrected errors are occasionally detected on a few of our
ARM64 hardware boards. Though it is rare, the possibility of CPU cache
errors occurring frequently cannot be ruled out. Detecting failures
early, by monitoring corrected cache errors for frequent occurrences
and taking preventive action, could prevent more serious hardware
faults.
On Intel architectures, cache corrected errors are reported and the
affected cores are offlined in an architecture-specific way:
http://www.mcelog.org/cache.html
However, for firmware-first error reporting, specifically on the ARM64
architecture, there is no provision for reporting the cache corrected
error count to user-space and taking preventive action such as
offlining the affected cores.
For this purpose, it was suggested to create a CPU EDAC device for the
CPU caches to report the cache error count for firmware-first error
reporting.
A user-space application could monitor the recorded corrected error
count for early hardware failure detection and take preventive action,
such as offlining the corresponding CPU core(s).
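As a rough illustration of that monitoring flow (the EDAC sysfs path below
is a hypothetical placeholder; the actual attribute names depend on the
device this series creates), such an application could look like:

/*
 * User-space monitoring sketch. The ce_count path is an assumption for
 * illustration only.
 */
#include <stdio.h>

#define CE_THRESHOLD	10

static long read_count(const char *path)
{
	FILE *f = fopen(path, "r");
	long val = -1;

	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}

int main(void)
{
	long ce = read_count("/sys/devices/system/edac/cpu/cpu1/L3_ce_count");

	if (ce > CE_THRESHOLD) {
		/* preventive action: offline the affected core */
		FILE *f = fopen("/sys/devices/system/cpu/cpu1/online", "w");

		if (f) {
			fputs("0", f);
			fclose(f);
		}
	}
	return 0;
}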
Changes:
RFC V1 -> RFC V2:
1. Addressed feedback from Boris.
1.1. Added the reason for this patch.
1.2. Changed "CPU errors" to "CPU cache errors" in drivers/edac/Kconfig.
1.3. Changed the EDAC cache list to percpu variables.
1.4. Made the configuration depend on ARM64.
1.5. Moved discovery of cacheinfo to ghes_scan_system().
2. Changes in the descriptions.
Shiju Jose (2):
EDAC/ghes: Add EDAC device for reporting the CPU cache errors
ACPI / APEI: Add reporting ARM64 CPU cache corrected error count
Documentation/ABI/testing/sysfs-devices-edac | 15 ++
drivers/acpi/apei/ghes.c | 76 +++++++-
drivers/edac/Kconfig | 12 ++
drivers/edac/ghes_edac.c | 186 +++++++++++++++++++
include/acpi/ghes.h | 27 +++
include/linux/cper.h | 4 +
6 files changed, 316 insertions(+), 4 deletions(-)
--
2.17.1
Re: [PATCH] lib/logic_pio: Fix overlap check for pio registery
by John Garry
On 21/12/2020 13:04, Jiahui Cen wrote:
>> On 21/12/2020 03:24, Jiahui Cen wrote:
>>> Hi John,
>>>
>>> On 2020/12/18 18:40, John Garry wrote:
>>>> On 18/12/2020 06:23, Jiahui Cen wrote:
>>>>> Since the [start, end) is a half-open interval, a range with the end equal
>>>>> to the start of another range should not be considered as overlapped.
>>>>>
>>>>> Signed-off-by: Jiahui Cen<cenjiahui(a)huawei.com>
>>>>> ---
>>>>> lib/logic_pio.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/lib/logic_pio.c b/lib/logic_pio.c
>>>>> index f32fe481b492..445d611f1dc1 100644
>>>>> --- a/lib/logic_pio.c
>>>>> +++ b/lib/logic_pio.c
>>>>> @@ -57,7 +57,7 @@ int logic_pio_register_range(struct logic_pio_hwaddr *new_range)
>>>>> new_range->flags == LOGIC_PIO_CPU_MMIO) {
>>>>> /* for MMIO ranges we need to check for overlap */
>>>>> if (start >= range->hw_start + range->size ||
>>>>> - end < range->hw_start) {
>>>>> + end <= range->hw_start) {
>>>> It looks like your change is correct, but should not really have an impact in practice since:
>>>> a: BIOSes generally list ascending IO port CPU addresses
>>>> b. there is space between IO port CPU address regions
>>>>
>>>> Have you seen a problem here?
>>>>
>>> No serious problem. I found it just when I was working on adding support of
>>> pci expander bridge for Arm in QEMU. I found the IO window of some extended
>>> root bus could not be registered when I inserted the extended buses' _CRS
>>> info into DSDT table in the x86 way, which does not sort the buses.
>>>
>>> Though root buses should be sorted in QEMU, would it be better to accept
>>> those non-ascending IO windows?
>>>
>> ok, so it seems that you have seen a real problem, and this issue is not just detected by code analysis.
>>
>>> BTW, for b, it seems to be no space between IO windows of different root buses
>>> generated by EDK2. Or maybe I missed something obvious.
>> I don't know about that. Anyway, your change looks ok.
>>
>> Reviewed-by: John Garry<john.garry(a)huawei.com>
>>
>> BTW, for your virt env, will there be requirement to unregister PCI MMIO ranges? Currently we don't see that in non-virt world.
>>
> Thanks for your review.
>
> And currently there is no such a requirement in my virt env.
>
I am not sure what happened to this patch, but I plan on sending some
patches in this area soon - do you want me to include this one?
Thanks,
John
[RFC V2] app/testpmd: support multi-process
by Lijun Ou
This patch adds multi-process support for testpmd.
The test command examples are as follows:
the primary cmd:
./testpmd -w xxx --file-prefix=xx -l 0-1 -n 2 -- -i\
--rxq=16 --txq=16 --num-procs=2 --proc-id=0
the secondary cmd:
./testpmd -w xxx --file-prefix=xx -l 2-3 -n 2 -- -i\
--rxq=16 --txq=16 --num-procs=2 --proc-id=1
Signed-off-by: Min Hu (Connor) <humin29(a)huawei.com>
Signed-off-by: Lijun Ou <oulijun(a)huawei.com>
---
app/test-pmd/cmdline.c | 6 ++-
app/test-pmd/config.c | 9 +++-
app/test-pmd/parameters.c | 9 ++++
app/test-pmd/testpmd.c | 135 +++++++++++++++++++++++++++++-----------------
app/test-pmd/testpmd.h | 7 +++
5 files changed, 114 insertions(+), 52 deletions(-)
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 89034c8..48af5cd 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -71,8 +71,6 @@
#include "cmdline_tm.h"
#include "bpf_cmd.h"
-static struct cmdline *testpmd_cl;
-
static void cmd_reconfig_device_queue(portid_t id, uint8_t dev, uint8_t queue);
/* *** Help command with introduction. *** */
@@ -17124,6 +17122,10 @@ prompt(void)
if (testpmd_cl == NULL)
return;
cmdline_interact(testpmd_cl);
+ if (unlikely(f_quit == 1)) {
+ dup2(testpmd_fd_copy, testpmd_cl->s_in);
+ close(testpmd_fd_copy);
+ }
cmdline_stdin_exit(testpmd_cl);
}
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 0e2b9f7..f065008 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -3100,6 +3100,8 @@ rss_fwd_config_setup(void)
queueid_t rxq;
queueid_t nb_q;
streamid_t sm_id;
+ int start;
+ int end;
nb_q = nb_rxq;
if (nb_q > nb_txq)
@@ -3117,7 +3119,10 @@ rss_fwd_config_setup(void)
init_fwd_streams();
setup_fwd_config_of_each_lcore(&cur_fwd_config);
- rxp = 0; rxq = 0;
+ start = proc_id * nb_q / num_procs;
+ end = start + nb_q / num_procs;
+ rxp = 0;
+ rxq = start;
for (sm_id = 0; sm_id < cur_fwd_config.nb_fwd_streams; sm_id++) {
struct fwd_stream *fs;
@@ -3134,6 +3139,8 @@ rss_fwd_config_setup(void)
continue;
rxp = 0;
rxq++;
+ if (rxq >= end)
+ rxq = start;
}
}
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index df5eb10..ac63854 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -45,6 +45,8 @@
#include <rte_flow.h>
#include "testpmd.h"
+#define PARAM_PROC_ID "proc-id"
+#define PARAM_NUM_PROCS "num-procs"
static void
usage(char* progname)
@@ -603,6 +605,8 @@ launch_args_parse(int argc, char** argv)
{ "rx-mq-mode", 1, 0, 0 },
{ "record-core-cycles", 0, 0, 0 },
{ "record-burst-stats", 0, 0, 0 },
+ { PARAM_NUM_PROCS, 1, 0, 0 },
+ { PARAM_PROC_ID, 1, 0, 0 },
{ 0, 0, 0, 0 },
};
@@ -1356,6 +1360,11 @@ launch_args_parse(int argc, char** argv)
record_core_cycles = 1;
if (!strcmp(lgopts[opt_idx].name, "record-burst-stats"))
record_burst_stats = 1;
+
+ if (strncmp(lgopts[opt_idx].name, PARAM_NUM_PROCS, 8) == 0)
+ num_procs = atoi(optarg);
+ if (strncmp(lgopts[opt_idx].name, PARAM_PROC_ID, 7) == 0)
+ proc_id = atoi(optarg);
break;
case 'h':
usage(argv[0]);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index c256e71..3abd080 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -63,6 +63,8 @@
#include "testpmd.h"
+int testpmd_fd_copy = 500; /* the copy of STDIN_FILENO */
+
#ifndef MAP_HUGETLB
/* FreeBSD may not have MAP_HUGETLB (in fact, it probably doesn't) */
#define HUGE_FLAG (0x40000)
@@ -125,6 +127,9 @@ uint8_t port_numa[RTE_MAX_ETHPORTS];
*/
uint8_t rxring_numa[RTE_MAX_ETHPORTS];
+int proc_id = 0;
+unsigned num_procs = 1;
+
/*
* Store specified sockets on which TX ring to be used by ports
* is allocated.
@@ -978,16 +983,26 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
/* wrapper to rte_mempool_create() */
TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
rte_mbuf_best_mempool_ops());
- rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
- mb_mempool_cache, 0, mbuf_seg_size, socket_id);
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+ rte_mp = rte_pktmbuf_pool_create(pool_name,
+ nb_mbuf, mb_mempool_cache, 0,
+ mbuf_seg_size, socket_id);
+ else
+ rte_mp = rte_mempool_lookup(pool_name);
+
break;
}
case MP_ALLOC_ANON:
{
- rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
- mb_size, (unsigned int) mb_mempool_cache,
- sizeof(struct rte_pktmbuf_pool_private),
- socket_id, mempool_flags);
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+ rte_mp = rte_mempool_create_empty(pool_name,
+ nb_mbuf, mb_size,
+ (unsigned int)mb_mempool_cache,
+ sizeof(struct rte_pktmbuf_pool_private),
+ socket_id, mempool_flags);
+ else
+ rte_mp = rte_mempool_lookup(pool_name);
+
if (rte_mp == NULL)
goto err;
@@ -1017,9 +1032,13 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
rte_mbuf_best_mempool_ops());
- rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
- mb_mempool_cache, 0, mbuf_seg_size,
- heap_socket);
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+ rte_mp = rte_pktmbuf_pool_create(pool_name,
+ nb_mbuf, mb_mempool_cache, 0,
+ mbuf_seg_size, heap_socket);
+ else
+ rte_mp = rte_mempool_lookup(pool_name);
+
break;
}
case MP_ALLOC_XBUF:
@@ -2503,21 +2522,28 @@ start_port(portid_t pid)
return -1;
}
/* configure port */
- diag = rte_eth_dev_configure(pi, nb_rxq + nb_hairpinq,
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ diag = rte_eth_dev_configure(pi,
+ nb_rxq + nb_hairpinq,
nb_txq + nb_hairpinq,
&(port->dev_conf));
- if (diag != 0) {
- if (rte_atomic16_cmpset(&(port->port_status),
- RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
- printf("Port %d can not be set back "
- "to stopped\n", pi);
- printf("Fail to configure port %d\n", pi);
- /* try to reconfigure port next time */
- port->need_reconfig = 1;
- return -1;
+ if (diag != 0) {
+ if (rte_atomic16_cmpset(
+ &(port->port_status),
+ RTE_PORT_HANDLING,
+ RTE_PORT_STOPPED) == 0)
+ printf("Port %d can not be set "
+ "back to stopped\n", pi);
+ printf("Fail to configure port %d\n",
+ pi);
+ /* try to reconfigure port next time */
+ port->need_reconfig = 1;
+ return -1;
+ }
}
}
- if (port->need_reconfig_queues > 0) {
+ if (port->need_reconfig_queues > 0 &&
+ rte_eal_process_type() == RTE_PROC_PRIMARY) {
port->need_reconfig_queues = 0;
/* setup tx queues */
for (qi = 0; qi < nb_txq; qi++) {
@@ -2618,15 +2644,18 @@ start_port(portid_t pid)
cnt_pi++;
/* start port */
- if (rte_eth_dev_start(pi) < 0) {
- printf("Fail to start port %d\n", pi);
-
- /* Fail to setup rx queue, return */
- if (rte_atomic16_cmpset(&(port->port_status),
- RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
- printf("Port %d can not be set back to "
- "stopped\n", pi);
- continue;
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ diag = rte_eth_dev_start(pi);
+ if (diag < 0) {
+ printf("Fail to start port %d\n", pi);
+
+ /* Fail to setup rx queue, return */
+ if (rte_atomic16_cmpset(&(port->port_status),
+ RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+ printf("Port %d can not be set back to "
+ "stopped\n", pi);
+ continue;
+ }
}
if (rte_atomic16_cmpset(&(port->port_status),
@@ -2755,7 +2784,7 @@ stop_port(portid_t pid)
if (port->flow_list)
port_flow_flush(pi);
- if (rte_eth_dev_stop(pi) != 0)
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY && rte_eth_dev_stop(pi) != 0)
RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port %u\n",
pi);
@@ -2824,8 +2853,10 @@ close_port(portid_t pid)
continue;
}
- port_flow_flush(pi);
- rte_eth_dev_close(pi);
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+ port_flow_flush(pi);
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+ rte_eth_dev_close(pi);
}
remove_invalid_ports();
@@ -3089,7 +3120,7 @@ pmd_test_exit(void)
}
}
for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
- if (mempools[i])
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY && mempools[i])
rte_mempool_free(mempools[i]);
}
@@ -3537,6 +3568,10 @@ init_port_dcb_config(portid_t pid,
int retval;
uint16_t i;
+ if (num_procs > 1) {
+ printf("The multi-process feature doesn't support dcb.\n");
+ return -ENOTSUP;
+ }
rte_port = &ports[pid];
memset(&port_conf, 0, sizeof(struct rte_eth_conf));
@@ -3635,13 +3670,6 @@ init_port(void)
}
static void
-force_quit(void)
-{
- pmd_test_exit();
- prompt_exit();
-}
-
-static void
print_stats(void)
{
uint8_t i;
@@ -3672,12 +3700,16 @@ signal_handler(int signum)
if (latencystats_enabled != 0)
rte_latencystats_uninit();
#endif
- force_quit();
/* Set flag to indicate the force termination. */
f_quit = 1;
- /* exit with the expected status */
- signal(signum, SIG_DFL);
- kill(getpid(), signum);
+ if (interactive == 1) {
+ dup2(testpmd_cl->s_in, testpmd_fd_copy);
+ close(testpmd_cl->s_in);
+ } else {
+ dup2(0, testpmd_fd_copy);
+ close(0);
+ }
+
}
}
@@ -3702,10 +3734,6 @@ main(int argc, char** argv)
rte_exit(EXIT_FAILURE, "Cannot init EAL: %s\n",
rte_strerror(rte_errno));
- if (rte_eal_process_type() == RTE_PROC_SECONDARY)
- rte_exit(EXIT_FAILURE,
- "Secondary process type not supported.\n");
-
ret = register_eth_event_callback();
if (ret != 0)
rte_exit(EXIT_FAILURE, "Cannot register for ethdev events");
@@ -3801,8 +3829,10 @@ main(int argc, char** argv)
}
}
- if (!no_device_start && start_port(RTE_PORT_ALL) != 0)
+ if (!no_device_start && start_port(RTE_PORT_ALL) != 0) {
+ pmd_test_exit();
rte_exit(EXIT_FAILURE, "Start ports failed\n");
+ }
/* set all ports to promiscuous mode by default */
RTE_ETH_FOREACH_DEV(port_id) {
@@ -3848,6 +3878,8 @@ main(int argc, char** argv)
}
prompt();
pmd_test_exit();
+ if (unlikely(f_quit == 1))
+ prompt_exit();
} else
#endif
{
@@ -3883,6 +3915,11 @@ main(int argc, char** argv)
printf("Press enter to exit\n");
rc = read(0, &c, 1);
pmd_test_exit();
+ if (unlikely(f_quit == 1)) {
+ dup2(testpmd_fd_copy, 0);
+ close(testpmd_fd_copy);
+ prompt_exit();
+ }
if (rc < 0)
return 1;
}
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 5f23162..8629c57 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -13,6 +13,7 @@
#include <rte_gso.h>
#include <cmdline.h>
#include <sys/queue.h>
+#include "cmdline.h"
#define RTE_PORT_ALL (~(portid_t)0x0)
@@ -24,6 +25,10 @@
#define RTE_PORT_CLOSED (uint16_t)2
#define RTE_PORT_HANDLING (uint16_t)3
+uint8_t f_quit;
+int testpmd_fd_copy;
+struct cmdline *testpmd_cl;
+
/*
* It is used to allocate the memory for hash key.
* The hash key size is NIC dependent.
@@ -421,6 +426,8 @@ extern uint64_t noisy_lkup_mem_sz;
extern uint64_t noisy_lkup_num_writes;
extern uint64_t noisy_lkup_num_reads;
extern uint64_t noisy_lkup_num_reads_writes;
+extern int proc_id;
+extern unsigned num_procs;
extern uint8_t dcb_config;
extern uint8_t dcb_test;
--
2.7.4
Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
by Shameerali Kolothum Thodi
Hi Eric,
> -----Original Message-----
> From: Eric Auger [mailto:eric.auger@redhat.com]
> Sent: 18 November 2020 11:22
> To: eric.auger.pro(a)gmail.com; eric.auger(a)redhat.com;
> iommu(a)lists.linux-foundation.org; linux-kernel(a)vger.kernel.org;
> kvm(a)vger.kernel.org; kvmarm(a)lists.cs.columbia.edu; will(a)kernel.org;
> joro(a)8bytes.org; maz(a)kernel.org; robin.murphy(a)arm.com;
> alex.williamson(a)redhat.com
> Cc: jean-philippe(a)linaro.org; zhangfei.gao(a)linaro.org;
> zhangfei.gao(a)gmail.com; vivek.gautam(a)arm.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi(a)huawei.com>;
> jacob.jun.pan(a)linux.intel.com; yi.l.liu(a)intel.com; tn(a)semihalf.com;
> nicoleotsuka(a)gmail.com; yuzenghui <yuzenghui(a)huawei.com>
> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
>
> This series brings the IOMMU part of HW nested paging support
> in the SMMUv3. The VFIO part is submitted separately.
>
> The IOMMU API is extended to support 2 new API functionalities:
> 1) pass the guest stage 1 configuration
> 2) pass stage 1 MSI bindings
>
> Then those capabilities gets implemented in the SMMUv3 driver.
>
> The virtualizer passes information through the VFIO user API
> which cascades them to the iommu subsystem. This allows the guest
> to own stage 1 tables and context descriptors (so-called PASID
> table) while the host owns stage 2 tables and main configuration
> structures (STE).
I am seeing an issue with Guest testpmd run with this series.
I have two different setups and testpmd works fine with the
first one but not with the second.
1). Guest doesn't have a kernel driver built-in for the pass-through dev.
root@ubuntu:/# lspci -v
...
00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
Subsystem: Huawei Technologies Co., Ltd. Device 0000
Flags: fast devsel
Memory at 8000100000 (64-bit, prefetchable) [disabled] [size=64K]
Memory at 8000000000 (64-bit, prefetchable) [disabled] [size=1M]
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
Capabilities: [b0] Power Management version 3
Capabilities: [100] Access Control Services
Capabilities: [300] Transaction Processing Hints
root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-32768kB
EAL: No available hugepages reported in hugepages-64kB
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Invalid NUMA socket, default to 0
EAL: using IOMMU type 1 (Type 1)
EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
EAL: No legacy callbacks, legacy socket not created
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
Configuring Port 0 (socket 0)
Port 0: 8E:A6:8C:43:43:45
Checking link statuses...
Done
testpmd>
2). Guest has a kernel driver built-in for the pass-through dev.
root@ubuntu:/# lspci -v
...
00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
Subsystem: Huawei Technologies Co., Ltd. Device 0000
Flags: bus master, fast devsel, latency 0
Memory at 8000100000 (64-bit, prefetchable) [size=64K]
Memory at 8000000000 (64-bit, prefetchable) [size=1M]
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
Capabilities: [b0] Power Management version 3
Capabilities: [100] Access Control Services
Capabilities: [300] Transaction Processing Hints
Kernel driver in use: hns3
root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers/hns3/unbind
root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-32768kB
EAL: No available hugepages reported in hugepages-64kB
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Invalid NUMA socket, default to 0
EAL: using IOMMU type 1 (Type 1)
EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
0000:00:02.0 hns3_get_mbx_resp(): VF could not get mbx(11,0) head(1) tail(0) lost(1) from PF in_irq:0
hns3vf_get_queue_info(): Failed to get tqp info from PF: -62
hns3vf_init_vf(): Failed to fetch configuration: -62
hns3vf_dev_init(): Failed to init vf: -62
EAL: Releasing pci mapped resource for 0000:00:02.0
EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100800000
EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100810000
EAL: Requested device 0000:00:02.0 cannot be used
EAL: Bus (pci) probe failed.
EAL: No legacy callbacks, legacy socket not created
testpmd: No probed ethernet devices
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Done
testpmd>
And in this case, the SMMU (host) reports a translation fault:
[ 6542.670624] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
[ 6542.670630] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1200000010
[ 6542.670631] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000012000000007c
[ 6542.670633] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef040
[ 6542.670634] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef000
Tested with an Intel 82599 card (ixgbevf) as well, but got the same error.
Not able to root cause the problem yet. Hoping that this is
related to TLB entries not being invalidated properly, I tried explicitly
issuing CMD_TLBI_NSNH_ALL and CMD_CFGI_CD_ALL just before
the STE update, but no luck yet :(
Please let me know if I am missing something here, or if you have any
clue or can replicate this on your setup.
Thanks,
Shameer
>
> Best Regards
>
> Eric
>
> This series can be found at:
> https://github.com/eauger/linux/tree/5.10-rc4-2stage-v13
> (including the VFIO part in his last version: v11)
>
> The series includes a patch from Jean-Philippe. It is better to
> review the original patch:
> [PATCH v8 2/9] iommu/arm-smmu-v3: Maintain a SID->device structure
>
> The VFIO series is sent separately.
>
> History:
>
> v12 -> v13:
> - fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
> reported by Shameer. This urged me to revisit patch 4 into
> iommu/smmuv3: Allow s1 and s2 configs to coexist where
> s1_cfg and s2_cfg are not dynamically allocated anymore.
> Instead I use a new set field in existing structs
> - fixed 2 others config checks
> - Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
> according to the last version
>
> v11 -> v12:
> - rebase on top of v5.10-rc4
>
> Eric Auger (14):
> iommu: Introduce attach/detach_pasid_table API
> iommu: Introduce bind/unbind_guest_msi
> iommu/smmuv3: Allow s1 and s2 configs to coexist
> iommu/smmuv3: Get prepared for nested stage support
> iommu/smmuv3: Implement attach/detach_pasid_table
> iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
> iommu/smmuv3: Implement cache_invalidate
> dma-iommu: Implement NESTED_MSI cookie
> iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
> iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
> regions
> iommu/smmuv3: Implement bind/unbind_guest_msi
> iommu/smmuv3: Report non recoverable faults
> iommu/smmuv3: Accept configs with more than one context descriptor
> iommu/smmuv3: Add PASID cache invalidation per PASID
>
> Jean-Philippe Brucker (1):
> iommu/arm-smmu-v3: Maintain a SID->device structure
>
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 659
> ++++++++++++++++++--
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 103 ++-
> drivers/iommu/dma-iommu.c | 142 ++++-
> drivers/iommu/iommu.c | 105 ++++
> include/linux/dma-iommu.h | 16 +
> include/linux/iommu.h | 41 ++
> include/uapi/linux/iommu.h | 54 ++
> 7 files changed, 1042 insertions(+), 78 deletions(-)
>
> --
> 2.21.3
[PATCH v4 00/12] add IRQF_NO_AUTOEN for request_irq
by Barry Song
This patchset adds IRQF_NO_AUTOEN for request_irq() and converts
drivers/input to this new API.
Other drivers will be handled afterwards.
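The conversion pattern is roughly the following (an illustrative sketch,
not copied from any specific driver in the series):

/*
 * Before: the driver has to disable the IRQ right after requesting it,
 * either with disable_irq() or by setting IRQ_NOAUTOEN beforehand.
 */
error = request_irq(irq, example_handler, 0, "example", data);
if (!error)
	disable_irq(irq);

/*
 * After: IRQF_NO_AUTOEN keeps the line disabled until the driver calls
 * enable_irq() explicitly, so the extra call goes away.
 */
error = request_irq(irq, example_handler, IRQF_NO_AUTOEN, "example", data);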
-v4:
remove the irq_settings magic for NOAUTOEN
Barry Song (12):
genirq: add IRQF_NO_AUTOEN for request_irq
Input: ar1021 - request_irq by IRQF_NO_AUTOEN and remove disable_irq
Input: atmel_mxt_ts - request_irq by IRQF_NO_AUTOEN and remove
disable_irq
Input: melfas_mip4 - request_irq by IRQF_NO_AUTOEN and remove
disable_irq
Input: bu21029_ts - request_irq by IRQF_NO_AUTOEN and remove
irq_set_status_flags
Input: stmfts - request_irq by IRQF_NO_AUTOEN and remove
irq_set_status_flags
Input: zinitix - request_irq by IRQF_NO_AUTOEN and remove
irq_set_status_flags
Input: mms114 - request_irq by IRQF_NO_AUTOEN and remove disable_irq
Input: wm831x-ts - request_irq by IRQF_NO_AUTOEN and remove
disable_irq
Input: cyttsp - request_irq by IRQF_NO_AUTOEN and remove disable_irq
Input: tegra-kbc - request_irq by IRQF_NO_AUTOEN and remove
disable_irq
Input: tca6416-keypad - request_irq by IRQF_NO_AUTOEN and remove
disable_irq
drivers/input/keyboard/tca6416-keypad.c | 3 +--
drivers/input/keyboard/tegra-kbc.c | 5 ++---
drivers/input/touchscreen/ar1021_i2c.c | 5 +----
drivers/input/touchscreen/atmel_mxt_ts.c | 5 ++---
drivers/input/touchscreen/bu21029_ts.c | 4 ++--
drivers/input/touchscreen/cyttsp_core.c | 5 ++---
drivers/input/touchscreen/melfas_mip4.c | 5 ++---
drivers/input/touchscreen/mms114.c | 4 ++--
drivers/input/touchscreen/stmfts.c | 3 +--
drivers/input/touchscreen/wm831x-ts.c | 3 +--
drivers/input/touchscreen/zinitix.c | 4 ++--
include/linux/interrupt.h | 3 +++
kernel/irq/manage.c | 8 +++++++-
13 files changed, 28 insertions(+), 29 deletions(-)
--
2.25.1
[PATCH v2 for-next] RDMA/hns: Add support of direct wqe
by Weihang Li
From: Yixing Liu <liuyixing1(a)huawei.com>
Direct WQE is a mechanism to fill the WQE directly into the hardware. In
the case of light load, the WQE will be written into the PCIe BAR space of
the hardware; this saves one memory access operation and therefore reduces
the latency.
Signed-off-by: Yixing Liu <liuyixing1(a)huawei.com>
Signed-off-by: Lang Cheng <chenglang(a)huawei.com>
Signed-off-by: Weihang Li <liweihang(a)huawei.com>
---
Changes since v1:
* Delete an extra blank line.
* Link: https://patchwork.kernel.org/project/linux-rdma/patch/1611395717-11081-1-...
drivers/infiniband/hw/hns/hns_roce_device.h | 6 ++++
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 44 ++++++++++++++++++++++++++++-
drivers/infiniband/hw/hns/hns_roce_hw_v2.h | 13 +++++++++
3 files changed, 62 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index f62851f..907bf71 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -90,6 +90,7 @@
#define HNS_ROCE_MAX_PORTS 6
#define HNS_ROCE_GID_SIZE 16
#define HNS_ROCE_SGE_SIZE 16
+#define HNS_ROCE_DWQE_SIZE 65536
#define HNS_ROCE_HOP_NUM_0 0xff
@@ -643,6 +644,10 @@ struct hns_roce_work {
u32 queue_num;
};
+enum {
+ HNS_ROCE_QP_CAP_DIRECT_WQE = BIT(5),
+};
+
struct hns_roce_qp {
struct ib_qp ibqp;
struct hns_roce_wq rq;
@@ -984,6 +989,7 @@ struct hns_roce_dev {
struct mutex pgdir_mutex;
int irq[HNS_ROCE_MAX_IRQ_NUM];
u8 __iomem *reg_base;
+ void __iomem *mem_base;
struct hns_roce_caps caps;
struct xarray qp_table_xa;
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index a5bbfb1..8ae3317 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -503,6 +503,8 @@ static inline int set_ud_wqe(struct hns_roce_qp *qp,
if (ret)
return ret;
+ qp->sl = to_hr_ah(ud_wr(wr)->ah)->av.sl;
+
set_extend_sge(qp, wr->sg_list, &curr_idx, valid_num_sge);
/*
@@ -635,6 +637,8 @@ static inline void update_sq_db(struct hns_roce_dev *hr_dev,
V2_DB_BYTE_4_TAG_S, qp->doorbell_qpn);
roce_set_field(sq_db.byte_4, V2_DB_BYTE_4_CMD_M,
V2_DB_BYTE_4_CMD_S, HNS_ROCE_V2_SQ_DB);
+ /* indicates data on new BAR, 0 : SQ doorbell, 1 : DWQE */
+ roce_set_bit(sq_db.byte_4, V2_DB_FLAG_S, 0);
roce_set_field(sq_db.parameter, V2_DB_PARAMETER_IDX_M,
V2_DB_PARAMETER_IDX_S, qp->sq.head);
roce_set_field(sq_db.parameter, V2_DB_PARAMETER_SL_M,
@@ -644,6 +648,38 @@ static inline void update_sq_db(struct hns_roce_dev *hr_dev,
}
}
+static void hns_roce_write512(struct hns_roce_dev *hr_dev, u64 *val,
+ u64 __iomem *dest)
+{
+#define HNS_ROCE_WRITE_TIMES 8
+ struct hns_roce_v2_priv *priv = (struct hns_roce_v2_priv *)hr_dev->priv;
+ struct hnae3_handle *handle = priv->handle;
+ const struct hnae3_ae_ops *ops = handle->ae_algo->ops;
+ int i;
+
+ if (!hr_dev->dis_db && !ops->get_hw_reset_stat(handle))
+ for (i = 0; i < HNS_ROCE_WRITE_TIMES; i++)
+ writeq_relaxed(*(val + i), dest + i);
+}
+
+static void write_dwqe(struct hns_roce_dev *hr_dev, struct hns_roce_qp *qp,
+ void *wqe)
+{
+ struct hns_roce_v2_rc_send_wqe *rc_sq_wqe = wqe;
+
+ /* All kinds of DirectWQE have the same header field layout */
+ roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_FLAG_S, 1);
+ roce_set_field(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_DB_SL_L_M,
+ V2_RC_SEND_WQE_BYTE_4_DB_SL_L_S, qp->sl);
+ roce_set_field(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_DB_SL_H_M,
+ V2_RC_SEND_WQE_BYTE_4_DB_SL_H_S, qp->sl >> 2);
+ roce_set_field(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_WQE_INDEX_M,
+ V2_RC_SEND_WQE_BYTE_4_WQE_INDEX_S, qp->sq.head);
+
+ hns_roce_write512(hr_dev, wqe, hr_dev->mem_base +
+ HNS_ROCE_DWQE_SIZE * qp->ibqp.qp_num);
+}
+
static int hns_roce_v2_post_send(struct ib_qp *ibqp,
const struct ib_send_wr *wr,
const struct ib_send_wr **bad_wr)
@@ -710,7 +746,12 @@ static int hns_roce_v2_post_send(struct ib_qp *ibqp,
qp->next_sge = sge_idx;
/* Memory barrier */
wmb();
- update_sq_db(hr_dev, qp);
+
+ if (nreq == 1 && qp->sq.head == qp->sq.tail + 1 &&
+ (qp->en_flags & HNS_ROCE_QP_CAP_DIRECT_WQE))
+ write_dwqe(hr_dev, qp, wqe);
+ else
+ update_sq_db(hr_dev, qp);
}
spin_unlock_irqrestore(&qp->sq.lock, flags);
@@ -6273,6 +6314,7 @@ static void hns_roce_hw_v2_get_cfg(struct hns_roce_dev *hr_dev,
/* Get info from NIC driver. */
hr_dev->reg_base = handle->rinfo.roce_io_base;
+ hr_dev->mem_base = handle->rinfo.roce_mem_base;
hr_dev->caps.num_ports = 1;
hr_dev->iboe.netdevs[0] = handle->rinfo.netdev;
hr_dev->iboe.phy_port[0] = 0;
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
index 69bc072..add1816 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
@@ -1098,6 +1098,8 @@ struct hns_roce_v2_mpt_entry {
#define V2_DB_BYTE_4_CMD_S 24
#define V2_DB_BYTE_4_CMD_M GENMASK(27, 24)
+#define V2_DB_FLAG_S 31
+
#define V2_DB_PARAMETER_IDX_S 0
#define V2_DB_PARAMETER_IDX_M GENMASK(15, 0)
@@ -1194,6 +1196,15 @@ struct hns_roce_v2_rc_send_wqe {
#define V2_RC_SEND_WQE_BYTE_4_OPCODE_S 0
#define V2_RC_SEND_WQE_BYTE_4_OPCODE_M GENMASK(4, 0)
+#define V2_RC_SEND_WQE_BYTE_4_DB_SL_L_S 5
+#define V2_RC_SEND_WQE_BYTE_4_DB_SL_L_M GENMASK(6, 5)
+
+#define V2_RC_SEND_WQE_BYTE_4_DB_SL_H_S 13
+#define V2_RC_SEND_WQE_BYTE_4_DB_SL_H_M GENMASK(14, 13)
+
+#define V2_RC_SEND_WQE_BYTE_4_WQE_INDEX_S 15
+#define V2_RC_SEND_WQE_BYTE_4_WQE_INDEX_M GENMASK(30, 15)
+
#define V2_RC_SEND_WQE_BYTE_4_OWNER_S 7
#define V2_RC_SEND_WQE_BYTE_4_CQE_S 8
@@ -1216,6 +1227,8 @@ struct hns_roce_v2_rc_send_wqe {
#define V2_RC_FRMR_WQE_BYTE_4_LW_S 23
+#define V2_RC_SEND_WQE_BYTE_4_FLAG_S 31
+
#define V2_RC_SEND_WQE_BYTE_16_XRC_SRQN_S 0
#define V2_RC_SEND_WQE_BYTE_16_XRC_SRQN_M GENMASK(23, 0)
--
2.8.1
[PATCH for-next 00/12] RDMA/hns: Several fixes and cleanups of RQ/SRQ
by Weihang Li
There are some issues when using SRQ on HIP08/HIP09; the first part of
this series fixes them.
In addition, the code for RQ/SRQ, including the creation and post recv
flow, is a bit hard to understand and needs to be refactored.
Lang Cheng (2):
RDMA/hns: Allocate one more recv SGE for HIP08
RDMA/hns: Use new interfaces to write SRQC
Wenpeng Liang (8):
RDMA/hns: Bugfix for checking whether the srq is full when post wr
RDMA/hns: Force srq_limit to 0 when creating SRQ
RDMA/hns: Fixed wrong judgments in the goto branch
RDMA/hns: Remove the reserved WQE of SRQ
RDMA/hns: Refactor hns_roce_create_srq()
RDMA/hns: Refactor code about SRQ Context
RDMA/hns: Refactor hns_roce_v2_post_srq_recv()
RDMA/hns: Add verification of QP type when post_recv
Xi Wang (2):
RDMA/hns: Refactor post recv flow
RDMA/hns: Clear remaining unused sges when post_recv
drivers/infiniband/hw/hns/hns_roce_device.h | 16 +-
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 407 +++++++++++++++-------------
drivers/infiniband/hw/hns/hns_roce_hw_v2.h | 72 +++--
drivers/infiniband/hw/hns/hns_roce_main.c | 3 +-
drivers/infiniband/hw/hns/hns_roce_qp.c | 37 ++-
drivers/infiniband/hw/hns/hns_roce_srq.c | 329 ++++++++++++----------
6 files changed, 510 insertions(+), 354 deletions(-)
--
2.8.1
[PATCH v2 RFC 0/7] RDMA/hns: Add support for Dynamic Context Attachment
by Weihang Li
The hip09 introduces the DCA (Dynamic Context Attachment) feature, which
allows many RC QPs to share WQE buffers in a memory pool. If a QP enables
the DCA feature, its WQE buffer will not be allocated at creation time but
only when the user starts to post WRs. This reduces memory
consumption when many QPs are inactive.
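A conceptual sketch of the attach-on-demand flow described above (the
helper names and the dca_attached field are placeholders for illustration,
not the driver's actual interface):

static int post_send_with_dca(struct hns_roce_qp *hr_qp)
{
	/* attach WQE pages from the shared DCA pool on first use */
	if (qp_uses_dca(hr_qp) && !hr_qp->dca_attached) {
		int ret = dca_attach_wqe_buf(hr_qp);

		if (ret)
			return ret;
		hr_qp->dca_attached = true;
	}

	/* ... fill WQEs and ring the doorbell as usual ... */
	return 0;
}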
Changes since v1:
* Replace all GFP_ATOMIC with GFP_NOWAIT, because the former may use the
emergency pool if no regular memory can be found.
* Change the size of cap_flags in alloc_ucontext_resp from 32 to 64 bits to
avoid a potential problem when passing it back to userspace.
* Move definition of HNS_ROCE_CAP_FLAG_DCA_MODE to hns-abi.h.
* Rename free_mem_states() to free_dca_states() in #1.
* Link: https://patchwork.kernel.org/project/linux-rdma/cover/1610706138-4219-1-g...
Xi Wang (7):
RDMA/hns: Introduce DCA for RC QP
RDMA/hns: Add method for shrinking DCA memory pool
RDMA/hns: Configure DCA mode for the userspace QP
RDMA/hns: Add method for attaching WQE buffer
RDMA/hns: Setup the configuration of WQE addressing to QPC
RDMA/hns: Add method to detach WQE buffer
RDMA/hns: Add method to query WQE buffer's address
drivers/infiniband/hw/hns/Makefile | 2 +-
drivers/infiniband/hw/hns/hns_roce_dca.c | 1264 +++++++++++++++++++++++++++
drivers/infiniband/hw/hns/hns_roce_dca.h | 68 ++
drivers/infiniband/hw/hns/hns_roce_device.h | 31 +
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 223 ++++-
drivers/infiniband/hw/hns/hns_roce_hw_v2.h | 3 +
drivers/infiniband/hw/hns/hns_roce_main.c | 27 +-
drivers/infiniband/hw/hns/hns_roce_qp.c | 119 ++-
include/uapi/rdma/hns-abi.h | 64 ++
9 files changed, 1754 insertions(+), 47 deletions(-)
create mode 100644 drivers/infiniband/hw/hns/hns_roce_dca.c
create mode 100644 drivers/infiniband/hw/hns/hns_roce_dca.h
--
2.8.1
[PATCH] perf metricgroup: Fix system PMU metrics
by John Garry
Joakim reports that getting "perf stat" for multiple system PMU metrics
segfaults:
./perf stat -a -I 1000 -M imx8mm_ddr_write.all,imx8mm_ddr_write.all
Segmentation fault
The same command works without issue for a single metric.
The logic in metricgroup__add_metric_sys_event_iter() is broken, in that
the add_metric() @m argument should be NULL for each new metric. Fix by
not passing a holder for it, and instead making it local to
metricgroup__add_metric_sys_event_iter().
Fixes: be335ec28efa ("perf metricgroup: Support adding metrics for system PMUs")
Reported-by: Joakim Zhang <qiangqing.zhang(a)nxp.com>
Signed-off-by: John Garry <john.garry(a)huawei.com>
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index ee94d3e8dd65..2e60ee170abc 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -766,7 +766,6 @@ int __weak arch_get_runtimeparam(struct pmu_event *pe __maybe_unused)
struct metricgroup_add_iter_data {
struct list_head *metric_list;
const char *metric;
- struct metric **m;
struct expr_ids *ids;
int *ret;
bool *has_match;
@@ -1058,12 +1057,13 @@ static int metricgroup__add_metric_sys_event_iter(struct pmu_event *pe,
void *data)
{
struct metricgroup_add_iter_data *d = data;
+ struct metric *m = NULL;
int ret;
if (!match_pe_metric(pe, d->metric))
return 0;
- ret = add_metric(d->metric_list, pe, d->metric_no_group, d->m, NULL, d->ids);
+ ret = add_metric(d->metric_list, pe, d->metric_no_group, &m, NULL, d->ids);
if (ret)
return ret;
@@ -1114,7 +1114,6 @@ static int metricgroup__add_metric(const char *metric, bool metric_no_group,
.metric_list = &list,
.metric = metric,
.metric_no_group = metric_no_group,
- .m = &m,
.ids = &ids,
.has_match = &has_match,
.ret = &ret,
--
2.26.2