This series enables perf branch stack sampling support on arm64 platform via a new arch feature called Branch Record Buffer Extension (BRBE). All the relevant register definitions could be accessed here.
https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers
This series applies on 6.10-rc3.
Also this series is being hosted below for quick access, review and test.
https://git.gitlab.arm.com/linux-arm/linux-anshuman.git (brbe_v18)
- Anshuman
========== Perf Branch Stack Sampling Support (arm64 platforms) ===========
Currently arm64 platform does not support perf branch stack sampling. Hence any event requesting for branch stack records i.e PERF_SAMPLE_BRANCH_STACK marked in event->attr.sample_type, will be rejected in armpmu_event_init().
static int armpmu_event_init(struct perf_event *event) { ........ /* does not support taken branch sampling */ if (has_branch_stack(event)) return -EOPNOTSUPP; ........ }
$perf record -j any,u,k ls Error: cycles:P: PMU Hardware or event type doesn't support branch stack sampling.
-------------------- CONFIG_ARM64_BRBE and FEAT_BRBE ----------------------
After this series, perf branch stack sampling feature gets enabled on arm64 platforms where FEAT_BRBE HW feature is supported, and CONFIG_ARM64_BRBE is also selected during build. Let's observe all all possible scenarios here.
1. Feature not built (!CONFIG_ARM64_BRBE):
Falls back to the current behaviour i.e event gets rejected.
2. Feature built but HW not supported (CONFIG_ARM64_BRBE && !FEAT_BRBE):
Falls back to the current behaviour i.e event gets rejected.
3. Feature built and HW supported (CONFIG_ARM64_BRBE && FEAT_BRBE):
Platform supports branch stack sampling requests. Let's observe through a simple example here.
$perf record -j any_call,u,k,save_type ls
[Please refer perf-record man pages for all possible branch filter options]
$perf report -------------------------- Snip ---------------------- # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles # ........ ....... .................... ............................................ ............................................ .................. # 3.52% ls [kernel.kallsyms] [k] sched_clock_noinstr [k] arch_counter_get_cntpct 16 3.52% ls [kernel.kallsyms] [k] sched_clock [k] sched_clock_noinstr 9 1.85% ls [kernel.kallsyms] [k] sched_clock_cpu [k] sched_clock 5 1.80% ls [kernel.kallsyms] [k] irqtime_account_irq [k] sched_clock_cpu 20 1.58% ls [kernel.kallsyms] [k] gic_handle_irq [k] generic_handle_domain_irq 19 1.58% ls [kernel.kallsyms] [k] call_on_irq_stack [k] gic_handle_irq 9 1.58% ls [kernel.kallsyms] [k] do_interrupt_handler [k] call_on_irq_stack 23 1.58% ls [kernel.kallsyms] [k] generic_handle_domain_irq [k] __irq_resolve_mapping 6 1.58% ls [kernel.kallsyms] [k] __irq_resolve_mapping [k] __rcu_read_lock 10 -------------------------- Snip ----------------------
$perf report -D | grep cycles -------------------------- Snip ---------------------- ..... 1: ffff800080dd3334 -> ffff800080dd759c 39 cycles P 0 IND_CALL ..... 2: ffff800080ffaea0 -> ffff800080ffb688 16 cycles P 0 IND_CALL ..... 3: ffff800080139918 -> ffff800080ffae64 9 cycles P 0 CALL ..... 4: ffff800080dd3324 -> ffff8000801398f8 7 cycles P 0 CALL ..... 5: ffff8000800f8548 -> ffff800080dd330c 21 cycles P 0 IND_CALL ..... 6: ffff8000800f864c -> ffff8000800f84ec 6 cycles P 0 CALL ..... 7: ffff8000800f86dc -> ffff8000800f8638 11 cycles P 0 CALL ..... 8: ffff8000800f86d4 -> ffff800081008630 16 cycles P 0 CALL -------------------------- Snip ----------------------
perf script and other tooling can also be applied on the captured perf.data Similarly branch stack sampling records can be collected via direct system call i.e perf_event_open() method after setting 'struct perf_event_attr' as required.
event->attr.sample_type |= PERF_SAMPLE_BRANCH_STACK event->attr.branch_sample_type |= PERF_SAMPLE_BRANCH_<FILTER_1> | PERF_SAMPLE_BRANCH_<FILTER_2> | PERF_SAMPLE_BRANCH_<FILTER_3> | ...............................
But all branch filters might not be supported on the platform.
----------------------- BRBE Branch Filters Support -----------------------
- Following branch filters are supported on arm64.
PERF_SAMPLE_BRANCH_USER /* Branch privilege filters */ PERF_SAMPLE_BRANCH_KERNEL PERF_SAMPLE_BRANCH_HV
PERF_SAMPLE_BRANCH_ANY /* Branch type filters */ PERF_SAMPLE_BRANCH_ANY_CALL PERF_SAMPLE_BRANCH_ANY_RETURN PERF_SAMPLE_BRANCH_IND_CALL PERF_SAMPLE_BRANCH_COND PERF_SAMPLE_BRANCH_IND_JUMP PERF_SAMPLE_BRANCH_CALL
PERF_SAMPLE_BRANCH_NO_FLAGS /* Branch record flags */ PERF_SAMPLE_BRANCH_NO_CYCLES PERF_SAMPLE_BRANCH_TYPE_SAVE PERF_SAMPLE_BRANCH_HW_INDEX PERF_SAMPLE_BRANCH_PRIV_SAVE
- Following branch filters are not supported on arm64.
PERF_SAMPLE_BRANCH_ABORT_TX PERF_SAMPLE_BRANCH_IN_TX PERF_SAMPLE_BRANCH_NO_TX PERF_SAMPLE_BRANCH_CALL_STACK
Events requesting above non-supported branch filters get rejected.
--------------------------- Virtualisation support ------------------------
- No guest support
-------------------------------- Testing ---------------------------------
- Cross compiled for both arm64 and arm32 platforms - Passes all branch tests with 'perf test branch' on arm64
Anshuman Khandual (6): arm64/sysreg: Add BRBE registers and fields KVM: arm64: Explicitly handle BRBE traps as UNDEFINED drivers: perf: arm_pmu: Add infrastructure for branch stack sampling arm64/boot: Enable EL2 requirements for BRBE drivers: perf: arm_pmuv3: Enable branch stack sampling via FEAT_BRBE KVM: arm64: nvhe: Disable branch generation in nVHE guests
James Clark (3): perf: test: Speed up running brstack test on an Arm model perf: test: Remove empty lines from branch filter test output perf: test: Extend branch stack sampling test for Arm64 BRBE
Junhao He (1): arm64/perf: Enable branch stack sampling
Marc Zyngier (7): KVM: arm64: Fix TRFCR_EL1/PMSCR_EL1 access in hVHE mode KVM: arm64: Add accessor for per-CPU state KVM: arm64: Exclude host_debug_data from vcpu_arch KVM: arm64: Exclude mdcr_el2_host from kvm_vcpu_arch KVM: arm64: Exclude host_fpsimd_state pointer from kvm_vcpu_arch KVM: arm64: Exclude FP ownership from kvm_vcpu_arch KVM: arm64: nv: Drop VCPU_HYP_CONTEXT flag
Miguel Luis (2): arm64: Add missing _EL12 encodings arm64: Add missing _EL2 encodings
Documentation/arch/arm64/booting.rst | 21 + arch/arm64/configs/openeuler_defconfig | 1 + arch/arm64/include/asm/el2_setup.h | 87 +- arch/arm64/include/asm/kvm_emulate.h | 4 +- arch/arm64/include/asm/kvm_host.h | 94 +- arch/arm64/include/asm/sysreg.h | 55 +- arch/arm64/kvm/arm.c | 8 +- arch/arm64/kvm/debug.c | 5 + arch/arm64/kvm/fpsimd.c | 13 +- arch/arm64/kvm/hyp/include/hyp/debug-sr.h | 8 +- arch/arm64/kvm/hyp/include/hyp/switch.h | 20 +- arch/arm64/kvm/hyp/nvhe/debug-sr.c | 51 +- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 3 - arch/arm64/kvm/hyp/nvhe/psci-relay.c | 2 +- arch/arm64/kvm/hyp/nvhe/setup.c | 3 +- arch/arm64/kvm/hyp/nvhe/switch.c | 6 +- arch/arm64/kvm/hyp/vhe/switch.c | 13 +- arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 3 +- arch/arm64/kvm/pmu.c | 2 +- arch/arm64/kvm/sys_regs.c | 56 + arch/arm64/tools/sysreg | 131 +++ drivers/perf/Kconfig | 11 + drivers/perf/Makefile | 1 + drivers/perf/arm_brbe.c | 1198 +++++++++++++++++++++ drivers/perf/arm_pmu.c | 42 +- drivers/perf/arm_pmuv3.c | 160 ++- drivers/perf/arm_pmuv3_branch.h | 83 ++ include/linux/perf/arm_pmu.h | 37 +- tools/perf/tests/builtin-test.c | 1 + tools/perf/tests/shell/test_brstack.sh | 57 +- tools/perf/tests/tests.h | 1 + tools/perf/tests/workloads/Build | 2 + tools/perf/tests/workloads/traploop.c | 39 + 33 files changed, 2109 insertions(+), 109 deletions(-) create mode 100644 drivers/perf/arm_brbe.c create mode 100644 drivers/perf/arm_pmuv3_branch.h create mode 100644 tools/perf/tests/workloads/traploop.c