[PATCH OLK-6.6 0/2] perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions and Don't use PMCCNTR_EL0 on SMT cores

From: Qizhi Zhang <zhangqizhi3@h-partners.com>

PMCCNTR_EL0 is preferred for counting CPU_CYCLES under certain conditions.
Factor out the condition check into a separate function for further
extension. Add comments documenting the conditions. No functional changes
intended.

CPU_CYCLES is expected to count the logical CPU (PE) clock. PMCCNTR_EL0 is
currently preferred for counting CPU_CYCLES, but on a multi-threaded
implementation it counts the processor clock rather than the PE clock
(ARM DDI0487 L.b D13.1.3) whenever one of the SMT siblings is not idle.
So don't use it on SMT cores.

Yicong Yang (2):
  perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions
  perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores

 drivers/perf/arm_pmu.c        |  3 +++
 drivers/perf/arm_pmuv3.c      | 32 ++++++++++++++++++++++++++++++--
 include/linux/arch_topology.h | 11 +++++++++++
 include/linux/perf/arm_pmu.h  |  1 +
 4 files changed, 45 insertions(+), 2 deletions(-)

-- 
2.33.0

From: Yicong Yang <yangyicong@hisilicon.com>

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/ICXCYF

----------------------------------------------------------------------

PMCCNTR_EL0 is preferred for counting CPU_CYCLES under certain
conditions. Factor out the condition check to a separate function
for further extension. Add documents for better understanding.

No functional changes intended.

Fixes: 81e15ca3e523 ("perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold")
Reviewed-by: James Clark <james.clark@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Qizhi Zhang <zhangqizhi3@h-partners.com>
---
 drivers/perf/arm_pmuv3.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 52ccd3d4b883..63740d8ef5d6 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -1177,6 +1177,25 @@ static int armv8pmu_get_hw_metric_event_idx(struct pmu_hw_events *cpuc,
 	return leader_idx - 1;
 }
 
+static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
+				     struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
+
+	if (evtype != ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
+		return false;
+
+	/*
+	 * A CPU_CYCLES event with threshold counting cannot use PMCCNTR_EL0
+	 * since it lacks threshold support.
+	 */
+	if (armv8pmu_event_get_threshold(&event->attr))
+		return false;
+
+	return true;
+}
+
 static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 				  struct perf_event *event)
 {
@@ -1191,8 +1210,7 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 		return -EINVAL;
 
 	/* Always prefer to place a cycle counter into the cycle counter. */
-	if ((evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) &&
-	    !armv8pmu_event_get_threshold(&event->attr)) {
+	if (armv8pmu_can_use_pmccntr(cpuc, event)) {
 		if (!test_and_set_bit(ARMV8_IDX_CYCLE_COUNTER, cpuc->used_mask))
 			return ARMV8_IDX_CYCLE_COUNTER;
 		else if (armv8pmu_event_is_64bit(event) &&
-- 
2.33.0
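
For anyone checking the "no functional changes" claim in patch 1, here is a minimal userspace sketch, not kernel code. It models the open-coded condition and the early-return helper side by side and asserts that they agree. The struct mock_event type and every function name below are hypothetical stand-ins; only the event number 0x11 for CPU_CYCLES is taken from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural PMUv3 event number for CPU_CYCLES. */
#define EVT_CPU_CYCLES 0x11u

/* Hypothetical stand-in for the perf_event fields the check reads. */
struct mock_event {
	uint32_t evtype;	/* event number */
	uint32_t threshold;	/* 0 means no threshold counting */
};

/* The condition as it was open-coded in armv8pmu_get_event_idx(). */
static bool old_inline_check(const struct mock_event *ev)
{
	return (ev->evtype == EVT_CPU_CYCLES) && !ev->threshold;
}

/* The same condition written as early returns, like the new helper. */
static bool can_use_pmccntr(const struct mock_event *ev)
{
	if (ev->evtype != EVT_CPU_CYCLES)
		return false;

	/* PMCCNTR_EL0 lacks threshold support. */
	if (ev->threshold)
		return false;

	return true;
}

/* Assert that both forms agree over a small grid of inputs. */
static void check_refactor_equivalence(void)
{
	/* 0x08 and 0x1d are arbitrary non-CPU_CYCLES event numbers. */
	const uint32_t evtypes[] = { 0x08, EVT_CPU_CYCLES, 0x1d };
	const uint32_t thresholds[] = { 0, 1, 100 };

	for (unsigned int i = 0; i < 3; i++) {
		for (unsigned int j = 0; j < 3; j++) {
			struct mock_event ev = { evtypes[i], thresholds[j] };

			assert(old_inline_check(&ev) == can_use_pmccntr(&ev));
		}
	}
}
```

Calling check_refactor_equivalence() from a test harness exercises all nine combinations. The early-return shape is what lets patch 2 add a further condition without touching the caller.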

From: Yicong Yang <yangyicong@hisilicon.com>

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/ICXCYF

----------------------------------------------------------------------

CPU_CYCLES is expected to count the logical CPU (PE) clock. PMCCNTR_EL0
is currently preferred for counting CPU_CYCLES, but on a multi-threaded
implementation it counts the processor clock rather than the PE clock
(ARM DDI0487 L.b D13.1.3) whenever one of the SMT siblings is not idle.
So don't use it on SMT cores.

Introduce topology_core_has_smt() to detect an SMT implementation and
cache the result in arm_pmu::has_smt during allocation.

When counting cycles on SMT CPUs 2-3 while CPU 3 is idle, without this
patch we get:

  [root@client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1 --taskset 2 --timeout 1
  [...]
   Performance counter stats for 'CPU(s) 2-3':

  CPU2           2880457316      cycles
  CPU3           2880459810      cycles

         1.254688470 seconds time elapsed

With this patch the idle state of CPU3 is observed as expected:

  [root@client1 ~]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1 --taskset 2 --timeout 1
  [...]
   Performance counter stats for 'CPU(s) 2-3':

  CPU2           2558580492      cycles
  CPU3               305749      cycles

         1.113626410 seconds time elapsed

Fixes: 81e15ca3e523 ("perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Qizhi Zhang <zhangqizhi3@h-partners.com>
---
 drivers/perf/arm_pmu.c        |  3 +++
 drivers/perf/arm_pmuv3.c      | 10 ++++++++++
 include/linux/arch_topology.h | 11 +++++++++++
 include/linux/perf/arm_pmu.h  |  1 +
 4 files changed, 25 insertions(+)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 5621bbc828af..b72c4e35b387 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -953,6 +953,9 @@ struct arm_pmu *armpmu_alloc(void)
 		events = per_cpu_ptr(pmu->hw_events, cpu);
 		raw_spin_lock_init(&events->pmu_lock);
 		events->percpu_pmu = pmu;
+
+		if (!pmu->has_smt && topology_core_has_smt(cpu))
+			pmu->has_smt = true;
 	}
 
 	return pmu;
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 63740d8ef5d6..a9a079f89e39 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -1180,6 +1180,7 @@ static int armv8pmu_get_hw_metric_event_idx(struct pmu_hw_events *cpuc,
 static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
 				     struct perf_event *event)
 {
+	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
 
@@ -1193,6 +1194,15 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
 	if (armv8pmu_event_get_threshold(&event->attr))
 		return false;
 
+	/*
+	 * The PMCCNTR_EL0 increments from the processor clock rather than
+	 * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
+	 * counting on a WFI PE if one of its SMT siblings is not idle on a
+	 * multi-threaded implementation. So don't use it on SMT cores.
+	 */
+	if (cpu_pmu->has_smt)
+		return false;
+
 	return true;
 }
 
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index a63d61ca55af..c531cec149d1 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -100,6 +100,17 @@ void remove_cpu_topology(unsigned int cpuid);
 void reset_cpu_topology(void);
 int parse_acpi_topology(void);
 void freq_inv_set_max_ratio(int cpu, u64 max_rate);
+
+/*
+ * Architectures like ARM64 don't have a reliable architectural way to get SMT
+ * information and depend on the firmware (ACPI/OF) report. Non-SMT core won't
+ * initialize thread_id so we can use this to detect the SMT implementation.
+ */
+static inline bool topology_core_has_smt(int cpu)
+{
+	return cpu_topology[cpu].thread_id != -1;
+}
+
 #endif
 
 #endif /* _LINUX_ARCH_TOPOLOGY_H_ */
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index ac84689cc11c..4f2a25e37042 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -110,6 +110,7 @@ struct arm_pmu {
 	cpumask_t	supported_cpus;
 	char		*name;
 	int		pmuver;
+	bool		has_smt;
 	irqreturn_t	(*handle_irq)(struct arm_pmu *pmu);
 	void		(*enable)(struct perf_event *event);
 	void		(*disable)(struct perf_event *event);
-- 
2.33.0
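
To make the detection path in patch 2 easier to follow, here is a minimal userspace sketch under stated assumptions. It models a firmware-populated topology table, the thread_id != -1 test, the latch-once flag set during allocation, and the resulting counter choice. The struct mock_cpu_topology type and all mock_* names are hypothetical, and the loop over CPUs is inferred from the context lines of the armpmu_alloc() hunk rather than shown in full there.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the kernel's cpu_topology[]. */
struct mock_cpu_topology {
	int thread_id;	/* stays -1 when firmware reports no SMT thread */
	int core_id;
};

/* Mirrors topology_core_has_smt(): an initialized thread_id implies SMT. */
static bool mock_core_has_smt(const struct mock_cpu_topology *topo, size_t cpu)
{
	return topo[cpu].thread_id != -1;
}

/* Mirrors the allocation loop: the flag latches once any CPU has SMT. */
static bool mock_pmu_has_smt(const struct mock_cpu_topology *topo, size_t ncpus)
{
	bool has_smt = false;

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		if (!has_smt && mock_core_has_smt(topo, cpu))
			has_smt = true;
	}

	return has_smt;
}

/* Mirrors the helper after patch 2: three conditions, checked in order. */
static bool mock_can_use_pmccntr(bool is_cpu_cycles, bool has_threshold,
				 bool has_smt)
{
	if (!is_cpu_cycles)
		return false;

	/* PMCCNTR_EL0 lacks threshold support. */
	if (has_threshold)
		return false;

	/* On SMT cores PMCCNTR_EL0 follows the processor clock, not the PE. */
	if (has_smt)
		return false;

	return true;
}
```

With a table whose entries all carry thread_id == -1, mock_pmu_has_smt() returns false and a plain CPU_CYCLES event may still take the dedicated cycle counter. A single entry with a valid thread_id is enough to rule it out for that PMU.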

Feedback: The patch(es) you sent to kernel@openeuler.org could not be converted to a PR.
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/CK5...
Failure reason: applying the patch(es) failed; patch failed at 0001 perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions
Suggested solution: review the failure reason and check whether the patch(es) apply to the latest code on the intended branch.
participants (2): patchwork bot, Yushan Wang