This series enables perf branch stack sampling support on arm64 platform via a new arch feature called Branch Record Buffer Extension (BRBE). All the relevant register definitions could be accessed here.
https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers
This series applies on 6.10-rc3.
Also this series is being hosted below for quick access, review and test.
https://git.gitlab.arm.com/linux-arm/linux-anshuman.git (brbe_v18)
- Anshuman
========== Perf Branch Stack Sampling Support (arm64 platforms) ===========
Currently arm64 platform does not support perf branch stack sampling. Hence any event requesting for branch stack records i.e PERF_SAMPLE_BRANCH_STACK marked in event->attr.sample_type, will be rejected in armpmu_event_init().
static int armpmu_event_init(struct perf_event *event) { ........ /* does not support taken branch sampling */ if (has_branch_stack(event)) return -EOPNOTSUPP; ........ }
$perf record -j any,u,k ls Error: cycles:P: PMU Hardware or event type doesn't support branch stack sampling.
-------------------- CONFIG_ARM64_BRBE and FEAT_BRBE ----------------------
After this series, perf branch stack sampling feature gets enabled on arm64 platforms where FEAT_BRBE HW feature is supported, and CONFIG_ARM64_BRBE is also selected during build. Let's observe all all possible scenarios here.
1. Feature not built (!CONFIG_ARM64_BRBE):
Falls back to the current behaviour i.e event gets rejected.
2. Feature built but HW not supported (CONFIG_ARM64_BRBE && !FEAT_BRBE):
Falls back to the current behaviour i.e event gets rejected.
3. Feature built and HW supported (CONFIG_ARM64_BRBE && FEAT_BRBE):
Platform supports branch stack sampling requests. Let's observe through a simple example here.
$perf record -j any_call,u,k,save_type ls
[Please refer perf-record man pages for all possible branch filter options]
$perf report -------------------------- Snip ---------------------- # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles # ........ ....... .................... ............................................ ............................................ .................. # 3.52% ls [kernel.kallsyms] [k] sched_clock_noinstr [k] arch_counter_get_cntpct 16 3.52% ls [kernel.kallsyms] [k] sched_clock [k] sched_clock_noinstr 9 1.85% ls [kernel.kallsyms] [k] sched_clock_cpu [k] sched_clock 5 1.80% ls [kernel.kallsyms] [k] irqtime_account_irq [k] sched_clock_cpu 20 1.58% ls [kernel.kallsyms] [k] gic_handle_irq [k] generic_handle_domain_irq 19 1.58% ls [kernel.kallsyms] [k] call_on_irq_stack [k] gic_handle_irq 9 1.58% ls [kernel.kallsyms] [k] do_interrupt_handler [k] call_on_irq_stack 23 1.58% ls [kernel.kallsyms] [k] generic_handle_domain_irq [k] __irq_resolve_mapping 6 1.58% ls [kernel.kallsyms] [k] __irq_resolve_mapping [k] __rcu_read_lock 10 -------------------------- Snip ----------------------
$perf report -D | grep cycles -------------------------- Snip ---------------------- ..... 1: ffff800080dd3334 -> ffff800080dd759c 39 cycles P 0 IND_CALL ..... 2: ffff800080ffaea0 -> ffff800080ffb688 16 cycles P 0 IND_CALL ..... 3: ffff800080139918 -> ffff800080ffae64 9 cycles P 0 CALL ..... 4: ffff800080dd3324 -> ffff8000801398f8 7 cycles P 0 CALL ..... 5: ffff8000800f8548 -> ffff800080dd330c 21 cycles P 0 IND_CALL ..... 6: ffff8000800f864c -> ffff8000800f84ec 6 cycles P 0 CALL ..... 7: ffff8000800f86dc -> ffff8000800f8638 11 cycles P 0 CALL ..... 8: ffff8000800f86d4 -> ffff800081008630 16 cycles P 0 CALL -------------------------- Snip ----------------------
perf script and other tooling can also be applied on the captured perf.data Similarly branch stack sampling records can be collected via direct system call i.e perf_event_open() method after setting 'struct perf_event_attr' as required.
event->attr.sample_type |= PERF_SAMPLE_BRANCH_STACK event->attr.branch_sample_type |= PERF_SAMPLE_BRANCH_<FILTER_1> | PERF_SAMPLE_BRANCH_<FILTER_2> | PERF_SAMPLE_BRANCH_<FILTER_3> | ...............................
But all branch filters might not be supported on the platform.
----------------------- BRBE Branch Filters Support -----------------------
- Following branch filters are supported on arm64.
PERF_SAMPLE_BRANCH_USER /* Branch privilege filters */ PERF_SAMPLE_BRANCH_KERNEL PERF_SAMPLE_BRANCH_HV
PERF_SAMPLE_BRANCH_ANY /* Branch type filters */ PERF_SAMPLE_BRANCH_ANY_CALL PERF_SAMPLE_BRANCH_ANY_RETURN PERF_SAMPLE_BRANCH_IND_CALL PERF_SAMPLE_BRANCH_COND PERF_SAMPLE_BRANCH_IND_JUMP PERF_SAMPLE_BRANCH_CALL
PERF_SAMPLE_BRANCH_NO_FLAGS /* Branch record flags */ PERF_SAMPLE_BRANCH_NO_CYCLES PERF_SAMPLE_BRANCH_TYPE_SAVE PERF_SAMPLE_BRANCH_HW_INDEX PERF_SAMPLE_BRANCH_PRIV_SAVE
- Following branch filters are not supported on arm64.
PERF_SAMPLE_BRANCH_ABORT_TX PERF_SAMPLE_BRANCH_IN_TX PERF_SAMPLE_BRANCH_NO_TX PERF_SAMPLE_BRANCH_CALL_STACK
Events requesting above non-supported branch filters get rejected.
--------------------------- Virtualisation support ------------------------
- No guest support
-------------------------------- Testing ---------------------------------
- Cross compiled for both arm64 and arm32 platforms - Passes all branch tests with 'perf test branch' on arm64
Changes in V18:
- Changed BRBIDR0_EL1 register fields CC and FORMAT, updated the commit message - Replaced BRBIDR0_EL1_FORMAT_0 as BRBIDR0_EL1_FORMAT_FORMAT_0 in BRBE driver - Dropped ifdef CONFIG_ARM64_BRBE around __init_el2_brbe() - Updated in code comment around __init_el2_brbe() - Dropped the write up for EL2->EL1 transition, also moved up the EL3 write up - Unconditionally capture branch record type and privilege information - Scan valid branch stack events in armpmu_start() to create merged filter - Dropped branch_sample_type override in armv8pmu_branch_stack_add() - Dropped branch filter mismatch between PMU and event in read_branch_records() - Added SW filtering framework in read_branch_records() during filter mismatch - Added SW filtering for privilege modes - Used host_data_ptr() to access host_debug_state.brbcr_el1 register - Changed DEBUG_STATE_SAVE_BRBE to use BIT(7) - Reverted back iflags as u8
https://lore.kernel.org/all/20240405024639.1179064-1-anshuman.khandual@arm.c...
Changes in V17:
- Added back Reviewed-by tags from Mark Brown - Updated the commit message regarding the field BRBINFx_EL1_TYPE_IMPDEF_TRAP_EL3 - Added leading 0s for all values as BRBIDR0_EL1.NUMREC is a 8 bit field - Added leading 0s for all values as BRBFCR_EL1.BANK is a 2 bit field - Reordered BRBCR_EL1/BRBCR_EL12/BRBCR_EL2 registers as per sysreg encodings - Renamed s/FIRST/BANK_0 and s/SECOND/BANK_1 in BRBFCR_EL1.BANK - Renamed s/UNCOND_DIRECT/DIRECT_UNCOND in BRBINFx_EL1.TYPE - Renamed s/COND_DIRECT/DIRECT_COND in BRBINFx_EL1.TYPE - Dropped __SYS_BRBINF/__SYS_BRBSRC/__SYS_BRBTGT and their expansions - Moved all existing BRBE registers from sysreg.h header to tools/sysreg format - Updated the commit message including about sys_insn_descs[] - Changed KVM to use existing SYS_BRBSRC/TGT/INF_EL1(n) format - Moved the BRBE instructions into sys_insn_descs[] array - ARM PMUV3 changes have been moved into the BRBE driver patch instead - Moved down branch_stack_add() in armpmu_add() after event's basic checks - Added new callbacks in struct arm_pmu e.g branch_stack_[init|add|del]() - Renamed struct arm_pmu callback branch_reset() as branch_stack_reset() - Dropped the comment in armpmu_event_init() - Renamed 'pmu_hw_events' elements from 'brbe_' to more generic 'branch_' - Separated out from the BRBE driver implementation patch - Dropped the comment in __init_el2_brbe() - Updated __init_el2_brbe() with BRBCR_EL2.MPRED requirements - Updated __init_el2_brbe() with __check_hvhe() constructs - Updated booting.rst regarding MPRED, MDCR_EL3 and fine grained control - Dropped Documentation/arch/arm64/brbe.rst - Renamed armv8pmu_branch_reset() as armv8pmu_branch_stack_reset() - Separated out booting.rst and EL2 boot requirements into a new patch - Dropped process_branch_aborts() completely - Added an warning if transaction states get detected unexpectedly - Dropped enum brbe_bank_idx from the driver - Defined armv8pmu_branch_stack_init/add/del() callbacks in the driver - Changed BRBE driver to use existing SYS_BRBSRC/TGT/INF_EL1(n) format - Dropped isb() call sites in __debug_[save|restore]_brbe() - Changed to [read|write]_sysreg_el1() accessors in __debug_[save|restore]_brbe()
Changes in V16
https://lore.kernel.org/all/20240125094119.2542332-1-anshuman.khandual@arm.c...
- Updated BRBINFx_EL1.TYPE = 0b110000 as field IMPDEF_TRAP_EL3 - Updated BRBCR_ELx[9] as field FZPSS - Updated BRBINFINJ_EL1 to use sysreg field BRBINFx_EL1 - Added BRB_INF_SRC_TGT_EL1 macro for corresponding BRB_[INF|SRC|TGT] expansion - Renamed arm_brbe.h as arm_pmuv3_branch.h - Updated perf_sample_save_brstack()'s new argument requirements with NULL - Fixed typo (s/informations/information) in Documentation/arch/arm64/brbe.rst - Added SPDX-License-Identifier in Documentation/arch/arm64/brbe.rst - Added new PERF_SAMPLE_BRANCH_COUNTERS into BRBE_EXCLUDE_BRANCH_FILTERS - Dropped BRBFCR_EL1 and BRBCR_EL1 from enum vcpu_sysreg - Reverted back the KVM NVHE patch - use host_debug_state based 'brbcr_el1' element and dropped the previous dependency on Jame's coresight series
Changes in V15:
https://lore.kernel.org/all/20231201053906.1261704-1-anshuman.khandual@arm.c...
- Added a comment for armv8pmu_branch_probe() regarding single cpu probe - Added a text in brbe.rst regarding single cpu probe - Dropped runtime BRBE enable for setting DEBUG_STATE_SAVE_BRBE - Dropped zero_branch_stack based zero branch records mechanism - Replaced BRBFCR_EL1_DEFAULT_CONFIG with BRBFCR_EL1_CONFIG_MASK - Added BRBFCR_EL1_CONFIG_MASK masking in branch_type_to_brbfcr() - Moved BRBE helpers from arm_brbe.h into arm_brbe.c - Moved armv8_pmu_xxx() declaration inside arm_brbe.h for arm64 (CONFIG_ARM64_BRBE) - Moved armv8_pmu_xxx() stub definitions inside arm_brbe.h for arm32 (!CONFIG_ARM64_BRBE) - Included arm_brbe.h header both in arm_pmuv3.c and arm_brbe.c - Dropped BRBE custom pr_fmt() - Dropped CONFIG_PERF_EVENTS wrapping from header entries - Flush branch records when a cpu bound event follows a task bound event - Dropped BRBFCR_EL1 from __debug_save_brbe()/__debug_restore_brbe() - Always save the live SYS_BRBCR_EL1 in host context and then check if BRBE was enabled before resetting SYS_BRBCR_EL1 for the host
Changes in V14:
https://lore.kernel.org/all/20231114051329.327572-1-anshuman.khandual@arm.co...
- This series has been reorganised as suggested during V13 - There are just eight patches now i.e 5 enablement and 3 perf branch tests
- Fixed brackets problem in __SYS_BRBINFO/BRBSRC/BRBTGT() macros - Renamed the macro i.e s/__SYS_BRBINFO/__SYS_BRBINF/ - Renamed s/BRB_IALL/BRB_IALL_INSN and s/BRBE_INJ/BRB_INJ_INSN - Moved BRB_IALL_INSN and SYS_BRB_INSN instructions to sysreg patch - Changed E1BRE as ExBRE in sysreg fields inside BRBCR_ELx - Used BRBCR_ELx for defining all BRBCR_EL1, BRBCR_EL2, and BRBCR_EL12 (new)
- Folded the following three patches into a single patch i.e [PATCH 3/8]
drivers: perf: arm_pmu: Add new sched_task() callback arm64/perf: Add branch stack support in struct arm_pmu arm64/perf: Add branch stack support in struct pmu_hw_events arm64/perf: Add branch stack support in ARMV8 PMU arm64/perf: Add PERF_ATTACH_TASK_DATA to events with has_branch_stack()
- All armv8pmu_branch_xxxx() stub definitions have been moved inside include/linux/perf/arm_pmuv3.h for easy access from both arm32 and arm64 - Added brbe_users, brbe_context and brbe_sample_type in struct pmu_hw_events - Added comments for all the above new elements in struct pmu_hw_events - Added branch_reset() and sched_task() callbacks - Changed and optimized branch records processing during a PMU IRQ - NO branch records get captured for event with mismatched brbe_sample_type - Branch record context is tracked from armpmu_del() & armpmu_add() - Branch record hardware is driven from armv8pmu_start() & armv8pmu_stop() - Dropped NULL check for 'pmu_ctx' inside armv8pmu_sched_task() - Moved down PERF_ATTACH_TASK_DATA assignment with a preceding comment - In conflicting branch sample type requests, first event takes precedence
- Folded the following five patches from V13 into a single patch i.e [PATCH 4/8]
arm64/perf: Enable branch stack events via FEAT_BRBE arm64/perf: Add struct brbe_regset helper functions arm64/perf: Implement branch records save on task sched out arm64/perf: Implement branch records save on PMU IRQ
- Fixed the year in copyright statement - Added Documentation/arch/arm64/brbe.rst - Updated Documentation/arch/arm64/booting.rst (BRBCR_EL2.CC for EL1 entry) - Added __init_el2_brbe() which enables branch record cycle count support - Disabled EL2 traps in __init_el2_fgt() while accessing BRBE registers and executing instructions - Changed CONFIG_ARM64_BRBE user visible description - Fixed a typo in CONFIG_ARM64_BRBE config option description text - Added BUILD_BUG_ON() co-relating BRBE_BANK_MAX_ENTRIES and MAX_BRANCH_RECORDS - Dropped arm64_create_brbe_task_ctx_kmem_cache() - Moved down comment for PERF_SAMPLE_BRANCH_KERNEL in branch_type_to_brbcr() - Renamed BRBCR_ELx_DEFAULT_CONFIG as BRBCR_ELx_CONFIG_MASK - Replaced BRBCR_ELx_DEFAULT_TS with BRBCR_ELx_TS_MASK in BRBCR_ELx_CONFIG_MASK - Replaced BRBCR_ELx_E1BRE instances with BRBCR_ELx_ExBRE
- Added BRBE specific branch stack sampling perf test patches into the series - Added a patch to prevent guest accesses into BRBE registers and instructions - Added a patch to save the BRBE host context in NVHE environment - Updated most commit messages
Changes in V13:
https://lore.kernel.org/all/20230711082455.215983-1-anshuman.khandual@arm.co... https://lore.kernel.org/all/20230622065351.1092893-1-anshuman.khandual@arm.c...
- Added branch callback stubs for aarch32 pmuv3 based platforms - Updated the comments for capture_brbe_regset() - Deleted the comments in __read_brbe_regset() - Reversed the arguments order in capture_brbe_regset() and brbe_branch_save() - Fixed BRBE_BANK[0|1]_IDX_MAX indices comparison in armv8pmu_branch_read() - Fixed BRBE_BANK[0|1]_IDX_MAX indices comparison in capture_brbe_regset()
Changes in V12:
https://lore.kernel.org/all/20230615133239.442736-1-anshuman.khandual@arm.co...
- Replaced branch types with complete DIRECT/INDIRECT prefixes/suffixes - Replaced branch types with complete INSN/ALIGN prefixes/suffixes - Replaced return branch types as simple RET/ERET - Replaced time field GST_PHYSICAL as GUEST_PHYSICAL - Added 0 padding for BRBIDR0_EL1.NUMREC enum values - Dropped helper arm_pmu_branch_stack_supported() - Renamed armv8pmu_branch_valid() as armv8pmu_branch_attr_valid() - Separated perf_task_ctx_cache setup from arm_pmu private allocation - Collected changes to branch_records_alloc() in a single patch [5/10] - Reworked and cleaned up branch_records_alloc() - Reworked armv8pmu_branch_read() with new loop iterations in patch [6/10] - Reworked capture_brbe_regset() with new loop iterations in patch [8/10] - Updated the comment in branch_type_to_brbcr() - Fixed the comment before stitch_stored_live_entries() - Fixed BRBINFINJ_EL1 definition for VALID_FULL enum field - Factored out helper __read_brbe_regset() from capture_brbe_regset() - Dropped the helper copy_brbe_regset() - Simplified stitch_stored_live_entries() with memcpy(), memmove() - Reworked armv8pmu_probe_pmu() to bail out early with !probe.present - Rework brbe_attributes_probe() without 'struct brbe_hw_attr' - Dropped 'struct brbe_hw_attr' argument from capture_brbe_regset() - Dropped 'struct brbe_hw_attr' argument from brbe_branch_save() - Dropped arm_pmu->private and added arm_pmu->reg_trbidr instead
Changes in V11:
https://lore.kernel.org/all/20230531040428.501523-1-anshuman.khandual@arm.co...
- Fixed the crash for per-cpu events without event->pmu_ctx->task_ctx_data
Changes in V10:
https://lore.kernel.org/all/20230517022410.722287-1-anshuman.khandual@arm.co...
- Rebased the series on v6.4-rc2 - Moved ARMV8 PMUV3 changes inside drivers/perf/arm_pmuv3.c - Moved BRBE driver changes inside drivers/perf/arm_brbe.[c|h] - Moved the WARN_ON() inside the if condition in armv8pmu_handle_irq()
Changes in V9:
https://lore.kernel.org/all/20230315051444.1683170-1-anshuman.khandual@arm.c...
- Fixed build problem with has_branch_stack() in arm64 header - BRBINF_EL1 definition has been changed from 'Sysreg' to 'SysregFields' - Renamed all BRBINF_EL1 call sites as BRBINFx_EL1 - Dropped static const char branch_filter_error_msg[] - Implemented a positive list check for BRBE supported perf branch filters - Added a comment in armv8pmu_handle_irq() - Implemented per-cpu allocation for struct branch_record records - Skipped looping through bank 1 if an invalid record is detected in bank 0 - Added comment in armv8pmu_branch_read() explaining prohibited region etc - Added comment warning about erroneously marking transactions as aborted - Replaced the first argument (perf_branch_entry) in capture_brbe_flags() - Dropped the last argument (idx) in capture_brbe_flags() - Dropped the brbcr argument from capture_brbe_flags() - Used perf_sample_save_brstack() to capture branch records for perf_sample_data - Added comment explaining rationale for setting BRBCR_EL1_FZP for user only traces - Dropped BRBE prohibited state mechanism while in armv8pmu_branch_read() - Implemented event task context based branch records save mechanism
Changes in V8:
https://lore.kernel.org/all/20230123125956.1350336-1-anshuman.khandual@arm.c...
- Replaced arm_pmu->features as arm_pmu->has_branch_stack, updated its helper - Added a comment and line break before arm_pmu->private element - Added WARN_ON_ONCE() in helpers i.e armv8pmu_branch_[read|valid|enable|disable]() - Dropped comments in armv8pmu_enable_event() and armv8pmu_disable_event() - Replaced open bank encoding in BRBFCR_EL1 with SYS_FIELD_PREP() - Changed brbe_hw_attr->brbe_version from 'bool' to 'int' - Updated pr_warn() as pr_warn_once() with values in brbe_get_perf_[type|priv]() - Replaced all pr_warn_once() as pr_debug_once() in armv8pmu_branch_valid() - Added a comment in branch_type_to_brbcr() for the BRBCR_EL1 privilege settings - Modified the comment related to BRBINFx_EL1.LASTFAILED in capture_brbe_flags() - Modified brbe_get_perf_entry_type() as brbe_set_perf_entry_type() - Renamed brbe_valid() as brbe_record_is_complete() - Renamed brbe_source() as brbe_record_is_source_only() - Renamed brbe_target() as brbe_record_is_target_only() - Inverted checks for !brbe_record_is_[target|source]_only() for info capture - Replaced 'fetch' with 'get' in all helpers that extract field value - Dropped 'static int brbe_current_bank' optimization in select_brbe_bank() - Dropped select_brbe_bank_index() completely, added capture_branch_entry() - Process captured branch entries in two separate loops one for each BRBE bank - Moved branch_records_alloc() inside armv8pmu_probe_pmu() - Added a forward declaration for the helper has_branch_stack() - Added new callbacks armv8pmu_private_alloc() and armv8pmu_private_free() - Updated armv8pmu_probe_pmu() to allocate the private structure before SMP call
Changes in V7:
https://lore.kernel.org/all/20230105031039.207972-1-anshuman.khandual@arm.co...
- Folded [PATCH 7/7] into [PATCH 3/7] which enables branch stack sampling event - Defined BRBFCR_EL1_BRANCH_FILTERS, BRBCR_EL1_DEFAULT_CONFIG in the header - Defined BRBFCR_EL1_DEFAULT_CONFIG in the header - Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_FZP - Defined BRBCR_EL1_DEFAULT_TS in the header - Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_DEFAULT_TS - Moved BRBCR_EL1_DEFAULT_CONFIG check inside branch_type_to_brbcr() - Moved down BRBCR_EL1_CC, BRBCR_EL1_MPRED later in branch_type_to_brbcr() - Also set BRBE in paused state in armv8pmu_branch_disable() - Dropped brbe_paused(), set_brbe_paused() helpers - Extracted error string via branch_filter_error_msg[] for armv8pmu_branch_valid() - Replaced brbe_v1p1 with brbe_version in struct brbe_hw_attr - Added valid_brbe_[cc, format, version]() helpers - Split a separate brbe_attributes_probe() from armv8pmu_branch_probe() - Capture event->attr.branch_sample_type earlier in armv8pmu_branch_valid() - Defined enum brbe_bank_idx with possible values for BRBE bank indices - Changed armpmu->hw_attr into armpmu->private - Added missing space in stub definition for armv8pmu_branch_valid() - Replaced both kmalloc() with kzalloc() - Added BRBE_BANK_MAX_ENTRIES - Updated comment for capture_brbe_flags() - Updated comment for struct brbe_hw_attr - Dropped space after type cast in couple of places - Replaced inverse with negation for testing BRBCR_EL1_FZP in armv8pmu_branch_read() - Captured cpuc->branches->branch_entries[idx] in a local variable - Dropped saved_priv from armv8pmu_branch_read() - Reorganize PERF_SAMPLE_BRANCH_NO_[CYCLES|NO_FLAGS] related configuration - Replaced with FIELD_GET() and FIELD_PREP() wherever applicable - Replaced BRBCR_EL1_TS_PHYSICAL with BRBCR_EL1_TS_VIRTUAL - Moved valid_brbe_nr(), valid_brbe_cc(), valid_brbe_format(), valid_brbe_version() select_brbe_bank(), select_brbe_bank_index() helpers inside the C implementation - Reorganized brbe_valid_nr() and dropped the pr_warn() message - Changed probe sequence in brbe_attributes_probe() - Added 'brbcr' argument into capture_brbe_flags() to ascertain correct state - Disable BRBE before disabling the PMU event counter - Enable PERF_SAMPLE_BRANCH_HV filters when is_kernel_in_hyp_mode() - Guard armv8pmu_reset() & armv8pmu_sched_task() with arm_pmu_branch_stack_supported()
Changes in V6:
https://lore.kernel.org/linux-arm-kernel/20221208084402.863310-1-anshuman.kh...
- Restore the exception level privilege after reading the branch records - Unpause the buffer after reading the branch records - Decouple BRBCR_EL1_EXCEPTION/ERTN from perf event privilege level - Reworked BRBE implementation and branch stack sampling support on arm pmu - BRBE implementation is now part of overall ARMV8 PMU implementation - BRBE implementation moved from drivers/perf/ to inside arch/arm64/kernel/ - CONFIG_ARM_BRBE_PMU renamed as CONFIG_ARM64_BRBE in arch/arm64/Kconfig - File moved - drivers/perf/arm_pmu_brbe.c -> arch/arm64/kernel/brbe.c - File moved - drivers/perf/arm_pmu_brbe.h -> arch/arm64/kernel/brbe.h - BRBE name has been dropped from struct arm_pmu and struct hw_pmu_events - BRBE name has been abstracted out as 'branches' in arm_pmu and hw_pmu_events - BRBE name has been abstracted out as 'branches' in ARMV8 PMU implementation - Added sched_task() callback into struct arm_pmu - Added 'hw_attr' into struct arm_pmu encapsulating possible PMU HW attributes - Dropped explicit attributes brbe_(v1p1, nr, cc, format) from struct arm_pmu - Dropped brbfcr, brbcr, registers scratch area from struct hw_pmu_events - Dropped brbe_users, brbe_context tracking in struct hw_pmu_events - Added 'features' tracking into struct arm_pmu with ARM_PMU_BRANCH_STACK flag - armpmu->hw_attr maps into 'struct brbe_hw_attr' inside BRBE implementation - Set ARM_PMU_BRANCH_STACK in 'arm_pmu->features' after successful BRBE probe - Added armv8pmu_branch_reset() inside armv8pmu_branch_enable() - Dropped brbe_supported() as events will be rejected via ARM_PMU_BRANCH_STACK - Dropped set_brbe_disabled() as well - Reformated armv8pmu_branch_valid() warnings while rejecting unsupported events
Changes in V5:
https://lore.kernel.org/linux-arm-kernel/20221107062514.2851047-1-anshuman.k...
- Changed BRBCR_EL1.VIRTUAL from 0b1 to 0b01 - Changed BRBFCR_EL1.EnL into BRBFCR_EL1.EnI - Changed config ARM_BRBE_PMU from 'tristate' to 'bool'
Changes in V4:
https://lore.kernel.org/all/20221017055713.451092-1-anshuman.khandual@arm.co...
- Changed ../tools/sysreg declarations as suggested - Set PERF_SAMPLE_BRANCH_STACK in data.sample_flags - Dropped perfmon_capable() check in armpmu_event_init() - s/pr_warn_once/pr_info in armpmu_event_init() - Added brbe_format element into struct pmu_hw_events - Changed v1p1 as brbe_v1p1 in struct pmu_hw_events - Dropped pr_info() from arm64_pmu_brbe_probe(), solved LOCKDEP warning
Changes in V3:
https://lore.kernel.org/all/20220929075857.158358-1-anshuman.khandual@arm.co...
- Moved brbe_stack from the stack and now dynamically allocated - Return PERF_BR_PRIV_UNKNOWN instead of -1 in brbe_fetch_perf_priv() - Moved BRBIDR0, BRBCR, BRBFCR registers and fields into tools/sysreg - Created dummy BRBINF_EL1 field definitions in tools/sysreg - Dropped ARMPMU_EVT_PRIV framework which cached perfmon_capable() - Both exception and exception return branche records are now captured only if the event has PERF_SAMPLE_BRANCH_KERNEL which would already been checked in generic perf via perf_allow_kernel()
Changes in V2:
https://lore.kernel.org/all/20220908051046.465307-1-anshuman.khandual@arm.co...
- Dropped branch sample filter helpers consolidation patch from this series - Added new hw_perf_event.flags element ARMPMU_EVT_PRIV to cache perfmon_capable() - Use cached perfmon_capable() while configuring BRBE branch record filters
Changes in V1:
https://lore.kernel.org/linux-arm-kernel/20220613100119.684673-1-anshuman.kh...
- Added CONFIG_PERF_EVENTS wrapper for all branch sample filter helpers - Process new perf branch types via PERF_BR_EXTEND_ABI
Changes in RFC V2:
https://lore.kernel.org/linux-arm-kernel/20220412115455.293119-1-anshuman.kh...
- Added branch_sample_priv() while consolidating other branch sample filter helpers - Changed all SYS_BRBXXXN_EL1 register definition encodings per Marc - Changed the BRBE driver as per proposed BRBE related perf ABI changes (V5) - Added documentation for struct arm_pmu changes, updated commit message - Updated commit message for BRBE detection infrastructure patch - PERF_SAMPLE_BRANCH_KERNEL gets checked during arm event init (outside the driver) - Branch privilege state capture mechanism has now moved inside the driver
Changes in RFC V1:
https://lore.kernel.org/all/1642998653-21377-1-git-send-email-anshuman.khand...
Anshuman Khandual (6): arm64/sysreg: Add BRBE registers and fields KVM: arm64: Explicitly handle BRBE traps as UNDEFINED drivers: perf: arm_pmu: Add infrastructure for branch stack sampling arm64/boot: Enable EL2 requirements for BRBE drivers: perf: arm_pmuv3: Enable branch stack sampling via FEAT_BRBE KVM: arm64: nvhe: Disable branch generation in nVHE guests
James Clark (3): perf: test: Speed up running brstack test on an Arm model perf: test: Remove empty lines from branch filter test output perf: test: Extend branch stack sampling test for Arm64 BRBE
Junhao He (1): arm64/perf: Enable branch stack sampling
Marc Zyngier (2): KVM: arm64: nv: Drop VCPU_HYP_CONTEXT flag KVM: arm64: Fix TRFCR_EL1/PMSCR_EL1 access in hVHE mode
Miguel Luis (2): arm64: Add missing _EL12 encodings arm64: Add missing _EL2 encodings
Documentation/arch/arm64/booting.rst | 21 + arch/arm64/configs/openeuler_defconfig | 1 + arch/arm64/include/asm/el2_setup.h | 87 +- arch/arm64/include/asm/kvm_host.h | 5 +- arch/arm64/include/asm/sysreg.h | 55 +- arch/arm64/kvm/debug.c | 5 + arch/arm64/kvm/hyp/nvhe/debug-sr.c | 43 +- arch/arm64/kvm/hyp/vhe/switch.c | 7 +- arch/arm64/kvm/sys_regs.c | 56 ++ arch/arm64/tools/sysreg | 131 +++ drivers/perf/Kconfig | 11 + drivers/perf/Makefile | 1 + drivers/perf/arm_brbe.c | 1198 ++++++++++++++++++++++++ drivers/perf/arm_pmu.c | 42 +- drivers/perf/arm_pmuv3.c | 160 +++- drivers/perf/arm_pmuv3_branch.h | 83 ++ include/linux/perf/arm_pmu.h | 37 +- tools/perf/tests/builtin-test.c | 1 + tools/perf/tests/shell/test_brstack.sh | 57 +- tools/perf/tests/tests.h | 1 + tools/perf/tests/workloads/Build | 2 + tools/perf/tests/workloads/traploop.c | 39 + 22 files changed, 2007 insertions(+), 36 deletions(-) create mode 100644 drivers/perf/arm_brbe.c create mode 100644 drivers/perf/arm_pmuv3_branch.h create mode 100644 tools/perf/tests/workloads/traploop.c