From: ZhangPeng zhangpeng362@huawei.com
Backport linux-6.6.8 LTS patches from upstream. git cherry-pick v6.6.7..v6.6.8~1 -s There are only 2 simple context conflicts. Build and boot test passed for arm64 & x86.
Al Viro (1): io_uring/cmd: fix breakage in SOCKET_URING_OP_SIOC* implementation
Alex Deucher (1): drm/amdgpu/sdma5.2: add begin/end_use ring callbacks
Amelie Delaunay (1): dmaengine: stm32-dma: avoid bitfield overflow assertion
Amir Goldstein (1): fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP
Andrew Halaney (1): net: stmmac: Handle disabled MDIO busses from devicetree
Andrzej Kacprowski (1): accel/ivpu/37xx: Fix interrupt_clear_with_0 WA initialization
Andy Shevchenko (1): platform/x86: intel_telemetry: Fix kernel doc descriptions
Aoba K (1): HID: multitouch: Add quirk for HONOR GLO-GXXX touchpad
Ard Biesheuvel (1): efi/x86: Avoid physical KASLR on older Dell systems
Baokun Li (1): ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS
Bjorn Helgaas (1): Revert "PCI: acpiphp: Reassign resources on bridge if necessary"
Boris Burkov (3): btrfs: free qgroup reserve when ORDERED_IOERR is set btrfs: fix qgroup_free_reserved_data int overflow btrfs: don't clear qgroup reserved bit in release_folio
Brett Raye (1): HID: glorious: fix Glorious Model I HID report
Chengfeng Ye (2): atm: solos-pci: Fix potential deadlock on &cli_queue_lock atm: solos-pci: Fix potential deadlock on &tx_queue_lock
Chris Mi (2): net/mlx5e: Disable IPsec offload support if not FW steering net/mlx5e: TC, Don't offload post action rule if not supported
Christian König (1): drm/amdgpu: fix tear down order in amdgpu_vm_pt_free
Colin Ian King (1): bcache: remove redundant assignment to variable cur_idx
Coly Li (3): bcache: avoid oversize memory allocation by small stripe_size bcache: add code comments for bch_btree_node_get() and __bch_btree_node_alloc() bcache: avoid NULL checking to c->root in run_cache_set()
Dan Carpenter (1): net/mlx5: Fix a NULL vs IS_ERR() check
Dan Williams (1): cxl/hdm: Fix dpa translation locking
David Arinzon (4): net: ena: Destroy correct number of xdp queues upon failure net: ena: Fix xdp drops handling due to multibuf packets net: ena: Fix DMA syncing in XDP path when SWIOTLB is on net: ena: Fix XDP redirection error
David Hildenbrand (1): selftests/mm: cow: print ksft header before printing anything else
David Howells (2): afs: Fix refcount underflow from error handling race rxrpc: Fix some minor issues with bundle tracing
David Stevens (1): mm/shmem: fix race in shmem_undo_range w/THP
Denis Benato (2): HID: hid-asus: reset the backlight brightness level on resume HID: hid-asus: add const to read-only outgoing usb buffer
Dinghao Liu (1): qed: Fix a potential use-after-free in qed_cxt_tables_alloc
Dong Chenchen (1): net: Remove acked SYN flag from packet in the transmit queue correctly
Eduard Zingerman (1): selftests/bpf: fix bpf_loop_bench for new callback verification scheme
Fangrui Song (1): x86/speculation, objtool: Use absolute relocations for annotations
Florent Revest (1): team: Fix use-after-free when an option instance allocation fails
Frank Li (1): dmaengine: fsl-edma: fix DMA channel leak in eDMAv4
Gavin Li (1): net/mlx5e: Check netdev pointer before checking its net ns
Gergo Koteles (4): ALSA: hda/tas2781: leave hda_component in usable state ALSA: hda/tas2781: handle missing EFI calibration data ALSA: hda/tas2781: call cleanup functions only once ALSA: hda/tas2781: reset the amp before component_add
Hamish Martin (2): HID: mcp2221: Set driver data before I2C adapter add HID: mcp2221: Allow IO to start during probe
Hangyu Hua (1): fuse: dax: set fc->dax to NULL in fuse_dax_conn_free()
Hannes Reinecke (1): nvme: catch errors from nvme_configure_metadata()
Hariprasad Kelam (3): octeontx2-pf: Fix promisc mcam entry action octeontx2-af: Update RSS algorithm index octeontx2-af: Fix pause frame configuration
Hartmut Knaack (1): ALSA: hda/realtek: Apply mute LED quirk for HP15-db
Heiko Carstens (1): scripts/checkstack.pl: match all stack sizes for s390
Huacai Chen (2): LoongArch: Silence the boot warning about 'nokaslr' LoongArch: Mark {dmw,tlb}_virt_to_page() exports as non-GPL
Hyunwoo Kim (3): atm: Fix Use-After-Free in do_vcc_ioctl net/rose: Fix Use-After-Free in rose_ioctl appletalk: Fix Use-After-Free in atalk_ioctl
Ignat Korchagin (1): kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP
Igor Russkikh (1): net: atlantic: fix double free in ring reinit logic
Ioana Ciornei (2): dpaa2-switch: fix size of the dma_unmap dpaa2-switch: do not ask for MDB, VLAN and FDB replay
James Houghton (1): arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
Jan Kara (1): ext4: fix warning in ext4_dio_write_end_io()
Jani Nikula (1): drm/edid: also call add modes in EDID connector update fallback
Jason-JH.Lin (1): drm/mediatek: Add spinlock for setting vblank event in atomic_begin
Jean Delvare (1): stmmac: dwmac-loongson: Add architecture dependency
Jianbo Liu (2): net/mlx5e: Reduce eswitch mode_lock protection context net/mlx5e: Check the number of elements before walk TC rhashtable
Jiaxun Yang (1): PCI: loongson: Limit MRRS to 256
Johan Hovold (2): PCI/ASPM: Add pci_enable_link_state_locked() PCI: vmd: Fix potential deadlock when enabling ASPM
John Hubbard (1): Revert "selftests: error out if kernel header files are not yet built"
Josef Bacik (1): btrfs: do not allow non subvolume root targets for snapshot
Kai Vehmanen (2): ALSA: hda/hdmi: add force-connect quirk for NUC5CPYB ALSA: hda/hdmi: add force-connect quirks for ASUSTeK Z170 variants
Kalesh AP (1): bnxt_en: Fix wrong return value check in bnxt_close_nic()
Kelly Kane (1): r8152: add vendor/device ID pair for ASUS USB-C2500
Krister Johansen (1): fuse: share lookup state between submount and its parent
Krzysztof Kozlowski (1): soundwire: stream: fix NULL pointer dereference for multi_link
Lech Perczak (1): net: usb: qmi_wwan: claim interface 4 for ZTE MF290
Leon Romanovsky (3): net/mlx5e: Honor user choice of IPsec replay window size net/mlx5e: Ensure that IPsec sequence packet number starts from 1 net/mlx5e: Tidy up IPsec NAT-T SA discovery
Linus Torvalds (1): asm-generic: qspinlock: fix queued_spin_value_unlocked() implementation
Maciej Żenczykowski (1): net: ipv6: support reporting otherwise unknown prefix flags in RTM_NEWPREFIX
Mario Limonciello (3): HID: i2c-hid: Add IDEA5002 to i2c_hid_acpi_blacklist[] drm/amd/display: Restore guard against default backlight value < 1 nit drm/amd/display: Disable PSR-SU on Parade 0803 TCON again
Mark O'Donovan (1): nvme-auth: set explanation code for failure2 msgs
Mark Rutland (1): perf: Fix perf_event_validate_size() lockdep splat
Masahiro Yamada (2): LoongArch: Add dependency between vmlinuz.efi and vmlinux.efi arm64: add dependency between vmlinuz.efi and Image
Michael Chan (1): bnxt_en: Fix HWTSTAMP_FILTER_ALL packet timestamp logic
Michael Walle (1): drm/mediatek: fix kernel oops if no crtc is found
Mikhail Khvainitski (1): HID: lenovo: Restrict detection of patched firmware only to USB cptkbd
Ming Lei (1): blk-throttle: fix lockdep warning of "cgroup_mutex or RCU read lock required!"
Moshe Shemesh (2): net/mlx5e: Fix possible deadlock on mlx5e_tx_timeout_work net/mlx5: Nack sync reset request when HotPlug is enabled
Namjae Jeon (1): ksmbd: fix wrong name of SMB2_CREATE_ALLOCATION_SIZE
Nguyen Dinh Phi (1): nfc: virtual_ncidev: Add variable to check if ndev is running
Nikolay Kuratov (1): vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space()
Oliver Neukum (2): usb: aqc111: check packet for fixup for true limit HID: add ALWAYS_POLL quirk for Apple kb
Patrisious Haddad (3): net/mlx5e: Unify esw and normal IPsec status table creation/destruction RDMA/mlx5: Send events from IB driver about device affiliation state RDMA/mlx5: Change the key being sent for MPV device affiliation
Paulo Alcantara (6): smb: client: implement ->query_reparse_point() for SMB1 smb: client: introduce ->parse_reparse_point() smb: client: set correct file type from NFS reparse points smb: client: fix potential OOBs in smb2_parse_contexts() smb: client: fix NULL deref in asn1_ber_decoder() smb: client: fix OOB in smb2_query_reparse_point()
Piotr Gardocki (2): iavf: Introduce new state machines for flow director iavf: Handle ntuple on/off based on new state machines for flow director
Radu Bulie (1): net: fec: correct queue selection
Saurabh Sengar (1): x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM
Sebastian Parschauer (1): HID: Add quirk for Labtec/ODDOR/aikeec handbrake
Shinas Rasheed (1): octeon_ep: explicitly test for firmware ready value
Slawomir Laba (1): iavf: Fix iavf_shutdown to call iavf_remove instead iavf_close
Sneh Shah (1): net: stmmac: dwmac-qcom-ethqos: Fix drops in 10M SGMII RX
Somnath Kotur (1): bnxt_en: Clear resource reservation during resume
Sreekanth Reddy (1): bnxt_en: Fix skb recycling logic in bnxt_deliver_skb()
Stanislaw Gruszka (1): accel/ivpu: Print information about used workarounds
Stefan Wahren (3): qca_debug: Prevent crash on TX ring changes qca_debug: Fix ethtool -G iface tx behavior qca_spi: Fix reset behavior
Steven Rostedt (Google) (9): eventfs: Do not allow NULL parent to eventfs_start_creating() ring-buffer: Fix memory leak of free page tracing: Update snapshot buffer on resize if it is allocated ring-buffer: Do not update before stamp when switching sub-buffers ring-buffer: Have saved event hold the entire event ring-buffer: Fix writing to the buffer with max_data_size ring-buffer: Fix a race in rb_time_cmpxchg() for 32 bit archs ring-buffer: Do not try to put back write_stamp ring-buffer: Have rb_time_cmpxchg() set the msb counter too
Stuart Lee (1): drm/mediatek: Fix access violation in mtk_drm_crtc_dma_dev_get
Tvrtko Ursulin (2): drm/i915/selftests: Fix engine reset count storage for multi-tile drm/i915: Use internal class when counting engine resets
Tyler Fanelli (1): fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP
Ville Syrjälä (3): drm/i915: Fix ADL+ tiled plane stride when the POT stride is smaller than the original drm/i915: Fix intel_atomic_setup_scalers() plane_state handling drm/i915: Fix remapped stride with CCS on ADL+
Vlad Buslov (1): net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table
WANG Rui (1): LoongArch: Record pc instead of offset in la_abs relocation
Yanteng Si (1): stmmac: dwmac-loongson: Make sure MDIO is initialized before use
Yihong Cao (1): HID: apple: add Jamesdonkey and A3R to non-apple keyboards list
Yu Zhao (4): mm/mglru: fix underprotected page cache mm/mglru: try to stop at high watermarks mm/mglru: respect min_ttl_ms with memcgs mm/mglru: reclaim offlined memcgs harder
Yusong Gao (1): sign-file: Fix incorrect return values check
Zhipeng Lu (1): octeontx2-af: fix a use-after-free in rvu_nix_register_reporters
Zizhi Wo (1): ksmbd: fix memory leak in smb2_lock()
arch/arm64/Makefile | 2 +- arch/arm64/include/asm/pgtable.h | 6 + arch/loongarch/Makefile | 2 + arch/loongarch/include/asm/asmmacro.h | 3 +- arch/loongarch/include/asm/setup.h | 2 +- arch/loongarch/kernel/relocate.c | 10 +- arch/loongarch/mm/pgtable.c | 4 +- arch/riscv/Kconfig | 4 +- arch/riscv/kernel/crash_core.c | 4 +- arch/x86/hyperv/hv_init.c | 25 +- arch/x86/include/asm/alternative.h | 4 +- arch/x86/include/asm/nospec-branch.h | 4 +- block/blk-throttle.c | 2 + drivers/accel/ivpu/ivpu_drv.h | 5 + drivers/accel/ivpu/ivpu_hw_37xx.c | 17 +- drivers/accel/ivpu/ivpu_hw_40xx.c | 4 + drivers/atm/solos-pci.c | 8 +- drivers/cxl/core/hdm.c | 3 +- drivers/cxl/core/port.c | 4 +- drivers/dma/fsl-edma-common.c | 1 + drivers/dma/stm32-dma.c | 8 +- drivers/firmware/efi/libstub/x86-stub.c | 31 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 3 +- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 28 +++ .../link/protocols/link_edp_panel_control.c | 4 +- .../amd/display/modules/power/power_helpers.c | 2 + drivers/gpu/drm/drm_edid.c | 3 +- drivers/gpu/drm/i915/display/intel_fb.c | 19 +- drivers/gpu/drm/i915/display/skl_scaler.c | 2 +- drivers/gpu/drm/i915/gt/intel_reset.c | 2 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +- drivers/gpu/drm/i915/i915_gpu_error.h | 12 +- .../gpu/drm/i915/selftests/igt_live_test.c | 9 +- .../gpu/drm/i915/selftests/igt_live_test.h | 3 +- drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 14 +- drivers/gpu/drm/mediatek/mtk_drm_drv.c | 5 +- drivers/hid/hid-apple.c | 2 + drivers/hid/hid-asus.c | 27 ++- drivers/hid/hid-glorious.c | 16 +- drivers/hid/hid-ids.h | 12 +- drivers/hid/hid-lenovo.c | 3 +- drivers/hid/hid-mcp2221.c | 4 +- drivers/hid/hid-multitouch.c | 5 + drivers/hid/hid-quirks.c | 2 + drivers/hid/i2c-hid/i2c-hid-acpi.c | 5 + drivers/infiniband/hw/mlx5/main.c | 17 ++ drivers/md/bcache/bcache.h | 1 + drivers/md/bcache/btree.c | 7 + drivers/md/bcache/super.c | 4 +- drivers/md/bcache/writeback.c | 2 +- drivers/net/ethernet/amazon/ena/ena_eth_com.c | 3 - drivers/net/ethernet/amazon/ena/ena_netdev.c | 53 +++-- .../net/ethernet/aquantia/atlantic/aq_ring.c | 5 +- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 38 ++- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 10 +- .../net/ethernet/broadcom/bnxt/bnxt_devlink.c | 11 +- .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 19 +- drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c | 5 +- .../freescale/dpaa2/dpaa2-switch-flower.c | 7 +- .../ethernet/freescale/dpaa2/dpaa2-switch.c | 11 +- drivers/net/ethernet/freescale/fec_main.c | 27 +-- drivers/net/ethernet/intel/iavf/iavf.h | 1 + .../net/ethernet/intel/iavf/iavf_ethtool.c | 27 ++- drivers/net/ethernet/intel/iavf/iavf_fdir.h | 15 +- drivers/net/ethernet/intel/iavf/iavf_main.c | 179 +++++++++----- .../net/ethernet/intel/iavf/iavf_virtchnl.c | 71 +++++- .../ethernet/marvell/octeon_ep/octep_main.c | 3 +- .../net/ethernet/marvell/octeontx2/af/rpm.c | 11 +- .../marvell/octeontx2/af/rvu_devlink.c | 5 +- .../ethernet/marvell/octeontx2/af/rvu_npc.c | 55 ++++- .../ethernet/marvell/octeontx2/nic/otx2_pf.c | 25 +- drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 + .../mellanox/mlx5/core/en/tc/post_act.c | 6 + .../mellanox/mlx5/core/en_accel/ipsec.c | 56 +++-- .../mellanox/mlx5/core/en_accel/ipsec_fs.c | 218 +++++++++++++----- .../mlx5/core/en_accel/ipsec_offload.c | 10 +- .../net/ethernet/mellanox/mlx5/core/en_main.c | 27 ++- .../net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +- .../net/ethernet/mellanox/mlx5/core/en_tc.c | 25 +- .../mellanox/mlx5/core/esw/ipsec_fs.c | 154 +------------ .../mellanox/mlx5/core/esw/ipsec_fs.h | 15 -- .../net/ethernet/mellanox/mlx5/core/eswitch.c | 35 +-- .../net/ethernet/mellanox/mlx5/core/eswitch.h | 2 + .../mellanox/mlx5/core/eswitch_offloads.c | 54 +++-- .../ethernet/mellanox/mlx5/core/fw_reset.c | 29 +++ .../net/ethernet/mellanox/mlx5/core/main.c | 6 + drivers/net/ethernet/qlogic/qed/qed_cxt.c | 1 + drivers/net/ethernet/qualcomm/qca_debug.c | 17 +- drivers/net/ethernet/qualcomm/qca_spi.c | 20 +- drivers/net/ethernet/stmicro/stmmac/Kconfig | 2 +- .../ethernet/stmicro/stmmac/dwmac-loongson.c | 14 +- .../stmicro/stmmac/dwmac-qcom-ethqos.c | 10 + .../net/ethernet/stmicro/stmmac/stmmac_mdio.c | 6 +- drivers/net/team/team.c | 4 +- drivers/net/usb/aqc111.c | 8 +- drivers/net/usb/qmi_wwan.c | 1 + drivers/net/usb/r8152.c | 1 + drivers/nfc/virtual_ncidev.c | 7 +- drivers/nvme/host/auth.c | 2 + drivers/nvme/host/core.c | 19 +- drivers/pci/controller/pci-loongson.c | 46 +++- drivers/pci/controller/vmd.c | 2 +- drivers/pci/hotplug/acpiphp_glue.c | 9 +- drivers/pci/pcie/aspm.c | 53 +++-- drivers/platform/x86/intel/telemetry/core.c | 4 +- drivers/soundwire/stream.c | 7 +- fs/afs/rxrpc.c | 2 +- fs/btrfs/delalloc-space.c | 2 +- fs/btrfs/extent_io.c | 3 +- fs/btrfs/file.c | 2 +- fs/btrfs/inode.c | 16 +- fs/btrfs/ioctl.c | 9 + fs/btrfs/ordered-data.c | 11 +- fs/btrfs/qgroup.c | 25 +- fs/btrfs/qgroup.h | 4 +- fs/ext4/file.c | 14 +- fs/ext4/mballoc.c | 4 + fs/fuse/dax.c | 1 + fs/fuse/file.c | 8 +- fs/fuse/fuse_i.h | 19 +- fs/fuse/inode.c | 81 ++++++- fs/smb/client/cached_dir.c | 17 +- fs/smb/client/cifsglob.h | 14 +- fs/smb/client/cifspdu.h | 4 +- fs/smb/client/cifsproto.h | 11 +- fs/smb/client/cifssmb.c | 193 +++++++--------- fs/smb/client/inode.c | 74 ++++-- fs/smb/client/readdir.c | 6 +- fs/smb/client/smb1ops.c | 73 ++---- fs/smb/client/smb2inode.c | 2 +- fs/smb/client/smb2misc.c | 26 +-- fs/smb/client/smb2ops.c | 164 ++++++------- fs/smb/client/smb2pdu.c | 93 +++++--- fs/smb/client/smb2proto.h | 12 +- fs/smb/common/smb2pdu.h | 2 +- fs/smb/server/smb2pdu.c | 1 + fs/tracefs/inode.c | 13 +- include/asm-generic/qspinlock.h | 2 +- include/linux/mlx5/device.h | 2 + include/linux/mlx5/driver.h | 2 + include/linux/mlx5/mlx5_ifc.h | 9 +- include/linux/mm_inline.h | 23 +- include/linux/mmzone.h | 34 +-- include/linux/objtool.h | 10 +- include/linux/pci.h | 3 + include/linux/usb/r8152.h | 1 + include/net/addrconf.h | 12 +- include/net/if_inet6.h | 4 - include/net/netfilter/nf_flow_table.h | 10 + include/uapi/linux/fuse.h | 10 +- io_uring/uring_cmd.c | 2 +- kernel/Kconfig.kexec | 1 - kernel/events/core.c | 10 + kernel/trace/ring_buffer.c | 58 ++--- kernel/trace/trace.c | 4 +- mm/shmem.c | 19 +- mm/vmscan.c | 92 +++++--- mm/workingset.c | 6 +- net/appletalk/ddp.c | 9 +- net/atm/ioctl.c | 7 +- net/ipv4/tcp_output.c | 6 + net/ipv6/addrconf.c | 6 +- net/rose/af_rose.c | 4 +- net/rxrpc/conn_client.c | 7 +- net/sched/act_ct.c | 34 ++- net/vmw_vsock/virtio_transport_common.c | 2 +- scripts/checkstack.pl | 3 +- scripts/sign-file.c | 12 +- sound/pci/hda/patch_hdmi.c | 3 + sound/pci/hda/patch_realtek.c | 1 + sound/pci/hda/tas2781_hda_i2c.c | 21 +- tools/testing/selftests/Makefile | 21 +- .../selftests/bpf/progs/bpf_loop_bench.c | 13 +- tools/testing/selftests/lib.mk | 40 +--- tools/testing/selftests/mm/cow.c | 3 +- 175 files changed, 2023 insertions(+), 1214 deletions(-)
From: Kelly Kane kelly@hawknetworks.com
stable inclusion from stable-v6.6.8 commit da94fb0217e52ef10993135dd67263857b170b63 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 7037d95a047cd89b1f680eed253c6ab586bef1ed ]
The ASUS USB-C2500 is an RTL8156 based 2.5G Ethernet controller.
Add the vendor and product ID values to the driver. This makes Ethernet work with the adapter.
Signed-off-by: Kelly Kane kelly@hawknetworks.com Link: https://lore.kernel.org/r/20231203011712.6314-1-kelly@hawknetworks.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/usb/r8152.c | 1 + include/linux/usb/r8152.h | 1 + 2 files changed, 2 insertions(+)
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index 7611c406fdb9..127b34dcc5b3 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -10016,6 +10016,7 @@ static const struct usb_device_id rtl8152_table[] = { { USB_DEVICE(VENDOR_ID_NVIDIA, 0x09ff) }, { USB_DEVICE(VENDOR_ID_TPLINK, 0x0601) }, { USB_DEVICE(VENDOR_ID_DLINK, 0xb301) }, + { USB_DEVICE(VENDOR_ID_ASUS, 0x1976) }, {} };
diff --git a/include/linux/usb/r8152.h b/include/linux/usb/r8152.h index 287e9d83fb8b..33a4c146dc19 100644 --- a/include/linux/usb/r8152.h +++ b/include/linux/usb/r8152.h @@ -30,6 +30,7 @@ #define VENDOR_ID_NVIDIA 0x0955 #define VENDOR_ID_TPLINK 0x2357 #define VENDOR_ID_DLINK 0x2001 +#define VENDOR_ID_ASUS 0x0b05
#if IS_REACHABLE(CONFIG_USB_RTL8152) extern u8 rtl8152_get_version(struct usb_interface *intf);
From: Jan Kara jack@suse.cz
stable inclusion from stable-v6.6.8 commit 73dddf9858ff68b6cb6bd3193b1c53b3331a8efb bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 619f75dae2cf117b1d07f27b046b9ffb071c4685 ]
The syzbot has reported that it can hit the warning in ext4_dio_write_end_io() because i_size < i_disksize. Indeed the reproducer creates a race between DIO IO completion and truncate expanding the file and thus ext4_dio_write_end_io() sees an inconsistent inode state where i_disksize is already updated but i_size is not updated yet. Since we are careful when setting up DIO write and consider it extending (and thus performing the IO synchronously with i_rwsem held exclusively) whenever it goes past either of i_size or i_disksize, we can use the same test during IO completion without risking entering ext4_handle_inode_extension() without i_rwsem held. This way we make it obvious both i_size and i_disksize are large enough when we report DIO completion without relying on unreliable WARN_ON.
Reported-by: syzbot+47479b71cdfc78f56d30@syzkaller.appspotmail.com Fixes: 91562895f803 ("ext4: properly sync file size update after O_SYNC direct IO") Signed-off-by: Jan Kara jack@suse.cz Reviewed-by: Ritesh Harjani (IBM) ritesh.list@gmail.com Link: https://lore.kernel.org/r/20231130095653.22679-1-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/ext4/file.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 33f8eb715b41..324b45b51d1f 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -353,9 +353,10 @@ static void ext4_inode_extension_cleanup(struct inode *inode, ssize_t count) return; } /* - * If i_disksize got extended due to writeback of delalloc blocks while - * the DIO was running we could fail to cleanup the orphan list in - * ext4_handle_inode_extension(). Do it now. + * If i_disksize got extended either due to writeback of delalloc + * blocks or extending truncate while the DIO was running we could fail + * to cleanup the orphan list in ext4_handle_inode_extension(). Do it + * now. */ if (!list_empty(&EXT4_I(inode)->i_orphan) && inode->i_nlink) { handle_t *handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); @@ -390,10 +391,11 @@ static int ext4_dio_write_end_io(struct kiocb *iocb, ssize_t size, * blocks. But the code in ext4_iomap_alloc() is careful to use * zeroed/unwritten extents if this is possible; thus we won't leave * uninitialized blocks in a file even if we didn't succeed in writing - * as much as we intended. + * as much as we intended. Also we can race with truncate or write + * expanding the file so we have to be a bit careful here. */ - WARN_ON_ONCE(i_size_read(inode) < READ_ONCE(EXT4_I(inode)->i_disksize)); - if (pos + size <= READ_ONCE(EXT4_I(inode)->i_disksize)) + if (pos + size <= READ_ONCE(EXT4_I(inode)->i_disksize) && + pos + size <= i_size_read(inode)) return size; return ext4_handle_inode_extension(inode, pos, size); }
From: Zizhi Wo wozizhi@huawei.com
stable inclusion from stable-v6.6.8 commit 809d50d36e71bbd56a2b253d94c991e82fe1d2dc bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 8f1752723019db900fb60a5b9d0dfd3a2bdea36c ]
In smb2_lock(), if setup_async_work() executes successfully, work->cancel_argv will bind the argv that generated by kmalloc(). And release_async_work() is called in ksmbd_conn_try_dequeue_request() or smb2_lock() to release argv. However, when setup_async_work function fails, work->cancel_argv has not been bound to the argv, resulting in the previously allocated argv not being released. Call kfree() to fix it.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Zizhi Wo wozizhi@huawei.com Acked-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/smb/server/smb2pdu.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/smb/server/smb2pdu.c b/fs/smb/server/smb2pdu.c index 269fbfb3cd67..2b248d45d40a 100644 --- a/fs/smb/server/smb2pdu.c +++ b/fs/smb/server/smb2pdu.c @@ -7074,6 +7074,7 @@ int smb2_lock(struct ksmbd_work *work) smb2_remove_blocked_lock, argv); if (rc) { + kfree(argv); err = -ENOMEM; goto out; }
From: Ard Biesheuvel ardb@kernel.org
stable inclusion from stable-v6.6.8 commit 800f84d8f0de521c5681c0c6df4284b004588202 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 50d7cdf7a9b1ab6f4f74a69c84e974d5dc0c1bf1 ]
River reports boot hangs with v6.6 and v6.7, and the bisect points to commit
a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
which moves the memory allocation and kernel decompression from the legacy decompressor (which executes *after* ExitBootServices()) to the EFI stub, using boot services for allocating the memory. The memory allocation succeeds but the subsequent call to decompress_kernel() never returns, resulting in a failed boot and a hanging system.
As it turns out, this issue only occurs when physical address randomization (KASLR) is enabled, and given that this is a feature we can live without (virtual KASLR is much more important), let's disable the physical part of KASLR when booting on AMI UEFI firmware claiming to implement revision v2.0 of the specification (which was released in 2006), as this is the version these systems advertise.
Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218173 Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/firmware/efi/libstub/x86-stub.c | 31 +++++++++++++++++++------ 1 file changed, 24 insertions(+), 7 deletions(-)
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index 9d5df683f882..70b325a2f1f3 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -307,17 +307,20 @@ static void setup_unaccepted_memory(void) efi_err("Memory acceptance protocol failed\n"); }
+static efi_char16_t *efistub_fw_vendor(void) +{ + unsigned long vendor = efi_table_attr(efi_system_table, fw_vendor); + + return (efi_char16_t *)vendor; +} + static const efi_char16_t apple[] = L"Apple";
static void setup_quirks(struct boot_params *boot_params) { - efi_char16_t *fw_vendor = (efi_char16_t *)(unsigned long) - efi_table_attr(efi_system_table, fw_vendor); - - if (!memcmp(fw_vendor, apple, sizeof(apple))) { - if (IS_ENABLED(CONFIG_APPLE_PROPERTIES)) - retrieve_apple_device_properties(boot_params); - } + if (IS_ENABLED(CONFIG_APPLE_PROPERTIES) && + !memcmp(efistub_fw_vendor(), apple, sizeof(apple))) + retrieve_apple_device_properties(boot_params); }
/* @@ -799,11 +802,25 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && !efi_nokaslr) { u64 range = KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR - kernel_total_size; + static const efi_char16_t ami[] = L"American Megatrends";
efi_get_seed(seed, sizeof(seed));
virt_addr += (range * seed[1]) >> 32; virt_addr &= ~(CONFIG_PHYSICAL_ALIGN - 1); + + /* + * Older Dell systems with AMI UEFI firmware v2.0 may hang + * while decompressing the kernel if physical address + * randomization is enabled. + * + * https://bugzilla.kernel.org/show_bug.cgi?id=218173 + */ + if (efi_system_table->hdr.revision <= EFI_2_00_SYSTEM_TABLE_REVISION && + !memcmp(efistub_fw_vendor(), ami, sizeof(ami))) { + efi_debug("AMI firmware v2.0 or older detected - disabling physical KASLR\n"); + seed[0] = 0; + } }
status = efi_random_alloc(alloc_size, CONFIG_PHYSICAL_ALIGN, &addr,
From: David Howells dhowells@redhat.com
stable inclusion from stable-v6.6.8 commit 8715fe2fc1e8b462b29381d342ab1ecd895d2637 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 52bf9f6c09fca8c74388cd41cc24e5d1bff812a9 ]
If an AFS cell that has an unreachable (eg. ENETUNREACH) server listed (VL server or fileserver), an asynchronous probe to one of its addresses may fail immediately because sendmsg() returns an error. When this happens, a refcount underflow can happen if certain events hit a very small window.
The way this occurs is:
(1) There are two levels of "call" object, the afs_call and the rxrpc_call. Each of them can be transitioned to a "completed" state in the event of success or failure.
(2) Asynchronous afs_calls are self-referential whilst they are active to prevent them from evaporating when they're not being processed. This reference is disposed of when the afs_call is completed.
Note that an afs_call may only be completed once; once completed completing it again will do nothing.
(3) When a call transmission is made, the app-side rxrpc code queues a Tx buffer for the rxrpc I/O thread to transmit. The I/O thread invokes sendmsg() to transmit it - and in the case of failure, it transitions the rxrpc_call to the completed state.
(4) When an rxrpc_call is completed, the app layer is notified. In this case, the app is kafs and it schedules a work item to process events pertaining to an afs_call.
(5) When the afs_call event processor is run, it goes down through the RPC-specific handler to afs_extract_data() to retrieve data from rxrpc - and, in this case, it picks up the error from the rxrpc_call and returns it.
The error is then propagated to the afs_call and that is completed too. At this point the self-reference is released.
(6) If the rxrpc I/O thread manages to complete the rxrpc_call within the window between rxrpc_send_data() queuing the request packet and checking for call completion on the way out, then rxrpc_kernel_send_data() will return the error from sendmsg() to the app.
(7) Then afs_make_call() will see an error and will jump to the error handling path which will attempt to clean up the afs_call.
(8) The problem comes when the error handling path in afs_make_call() tries to unconditionally drop an async afs_call's self-reference. This self-reference, however, may already have been dropped by afs_extract_data() completing the afs_call
(9) The refcount underflows when we return to afs_do_probe_vlserver() and that tries to drop its reference on the afs_call.
Fix this by making afs_make_call() attempt to complete the afs_call rather than unconditionally putting it. That way, if afs_extract_data() manages to complete the call first, afs_make_call() won't do anything.
The bug can be forced by making do_udp_sendmsg() return -ENETUNREACH and sticking an msleep() in rxrpc_send_data() after the 'success:' label to widen the race window.
The error message looks something like:
refcount_t: underflow; use-after-free. WARNING: CPU: 3 PID: 720 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110 ... RIP: 0010:refcount_warn_saturate+0xba/0x110 ... afs_put_call+0x1dc/0x1f0 [kafs] afs_fs_get_capabilities+0x8b/0xe0 [kafs] afs_fs_probe_fileserver+0x188/0x1e0 [kafs] afs_lookup_server+0x3bf/0x3f0 [kafs] afs_alloc_server_list+0x130/0x2e0 [kafs] afs_create_volume+0x162/0x400 [kafs] afs_get_tree+0x266/0x410 [kafs] vfs_get_tree+0x25/0xc0 fc_mount+0xe/0x40 afs_d_automount+0x1b3/0x390 [kafs] __traverse_mounts+0x8f/0x210 step_into+0x340/0x760 path_openat+0x13a/0x1260 do_filp_open+0xaf/0x160 do_sys_openat2+0xaf/0x170
or something like:
refcount_t: underflow; use-after-free. ... RIP: 0010:refcount_warn_saturate+0x99/0xda ... afs_put_call+0x4a/0x175 afs_send_vl_probes+0x108/0x172 afs_select_vlserver+0xd6/0x311 afs_do_cell_detect_alias+0x5e/0x1e9 afs_cell_detect_alias+0x44/0x92 afs_validate_fc+0x9d/0x134 afs_get_tree+0x20/0x2e6 vfs_get_tree+0x1d/0xc9 fc_mount+0xe/0x33 afs_d_automount+0x48/0x9d __traverse_mounts+0xe0/0x166 step_into+0x140/0x274 open_last_lookups+0x1c1/0x1df path_openat+0x138/0x1c3 do_filp_open+0x55/0xb4 do_sys_openat2+0x6c/0xb6
Fixes: 34fa47612bfe ("afs: Fix race in async call refcounting") Reported-by: Bill MacAllister bill@ca-zephyr.org Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1052304 Suggested-by: Jeffrey E Altman jaltman@auristor.com Signed-off-by: David Howells dhowells@redhat.com Reviewed-by: Jeffrey Altman jaltman@auristor.com cc: Marc Dionne marc.dionne@auristor.com cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/2633992.1702073229@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/afs/rxrpc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c index ed1644e7683f..d642d06a453b 100644 --- a/fs/afs/rxrpc.c +++ b/fs/afs/rxrpc.c @@ -424,7 +424,7 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp) if (call->async) { if (cancel_work_sync(&call->async_work)) afs_put_call(call); - afs_put_call(call); + afs_set_call_complete(call, ret, 0); }
ac->error = ret;
From: Mikhail Khvainitski me@khvoinitsky.org
stable inclusion from stable-v6.6.8 commit b89b7c7635705e0b1661e0a2a5965560ac37a0f2 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 43527a0094c10dfbf0d5a2e7979395a38de3ff65 ]
Commit 46a0a2c96f0f ("HID: lenovo: Detect quirk-free fw on cptkbd and stop applying workaround") introduced a regression for ThinkPad TrackPoint Keyboard II which has similar quirks to cptkbd (so it uses the same workarounds) but slightly different so that there are false-positives during detecting well-behaving firmware. This commit restricts detecting well-behaving firmware to the only model which known to have one and have stable enough quirks to not cause false-positives.
Fixes: 46a0a2c96f0f ("HID: lenovo: Detect quirk-free fw on cptkbd and stop applying workaround") Link: https://lore.kernel.org/linux-input/ZXRiiPsBKNasioqH@jekhomev/ Link: https://bbs.archlinux.org/viewtopic.php?pid=2135468#p2135468 Signed-off-by: Mikhail Khvainitski me@khvoinitsky.org Tested-by: Yauhen Kharuzhy jekhor@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/hid/hid-lenovo.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/hid/hid-lenovo.c b/drivers/hid/hid-lenovo.c index 7c1b33be9d13..149a3c74346b 100644 --- a/drivers/hid/hid-lenovo.c +++ b/drivers/hid/hid-lenovo.c @@ -692,7 +692,8 @@ static int lenovo_event_cptkbd(struct hid_device *hdev, * so set middlebutton_state to 3 * to never apply workaround anymore */ - if (cptkbd_data->middlebutton_state == 1 && + if (hdev->product == USB_DEVICE_ID_LENOVO_CUSBKBD && + cptkbd_data->middlebutton_state == 1 && usage->type == EV_REL && (usage->code == REL_X || usage->code == REL_Y)) { cptkbd_data->middlebutton_state = 3;
From: Leon Romanovsky leonro@nvidia.com
stable inclusion from stable-v6.6.8 commit 17e600e438c6b597ac1cf8c592b2ab53c680f6e4 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit a5e400a985df8041ed4659ed1462aa9134318130 ]
Users can configure IPsec replay window size, but mlx5 driver didn't honor their choice and set always 32bits. Fix assignment logic to configure right size from the beginning.
Fixes: 7db21ef4566e ("net/mlx5e: Set IPsec replay sequence numbers") Reviewed-by: Patrisious Haddad phaddad@nvidia.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- .../mellanox/mlx5/core/en_accel/ipsec.c | 21 +++++++++++++++++++ .../mlx5/core/en_accel/ipsec_offload.c | 2 +- include/linux/mlx5/mlx5_ifc.h | 7 +++++++ 3 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index 7d4ceb9b9c16..65678e89aea6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -335,6 +335,27 @@ void mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry, attrs->replay_esn.esn = sa_entry->esn_state.esn; attrs->replay_esn.esn_msb = sa_entry->esn_state.esn_msb; attrs->replay_esn.overlap = sa_entry->esn_state.overlap; + switch (x->replay_esn->replay_window) { + case 32: + attrs->replay_esn.replay_window = + MLX5_IPSEC_ASO_REPLAY_WIN_32BIT; + break; + case 64: + attrs->replay_esn.replay_window = + MLX5_IPSEC_ASO_REPLAY_WIN_64BIT; + break; + case 128: + attrs->replay_esn.replay_window = + MLX5_IPSEC_ASO_REPLAY_WIN_128BIT; + break; + case 256: + attrs->replay_esn.replay_window = + MLX5_IPSEC_ASO_REPLAY_WIN_256BIT; + break; + default: + WARN_ON(true); + return; + } }
attrs->dir = x->xso.dir; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c index 3245d1c9d539..55b11d8cba53 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c @@ -94,7 +94,7 @@ static void mlx5e_ipsec_packet_setup(void *obj, u32 pdn,
if (attrs->dir == XFRM_DEV_OFFLOAD_IN) { MLX5_SET(ipsec_aso, aso_ctx, window_sz, - attrs->replay_esn.replay_window / 64); + attrs->replay_esn.replay_window); MLX5_SET(ipsec_aso, aso_ctx, mode, MLX5_IPSEC_ASO_REPLAY_PROTECTION); } diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index fc3db401f8a2..f08cd1303145 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -11936,6 +11936,13 @@ enum { MLX5_IPSEC_ASO_INC_SN = 0x2, };
+enum { + MLX5_IPSEC_ASO_REPLAY_WIN_32BIT = 0x0, + MLX5_IPSEC_ASO_REPLAY_WIN_64BIT = 0x1, + MLX5_IPSEC_ASO_REPLAY_WIN_128BIT = 0x2, + MLX5_IPSEC_ASO_REPLAY_WIN_256BIT = 0x3, +}; + struct mlx5_ifc_ipsec_aso_bits { u8 valid[0x1]; u8 reserved_at_201[0x1];
From: Leon Romanovsky leonro@nvidia.com
stable inclusion from stable-v6.6.8 commit 80299a1c685fff46755f336faed4b7a29cbd44fb bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 3d42c8cc67a8fcbff0181f9ed6d03d353edcee07 ]
According to RFC4303, section "3.3.3. Sequence Number Generation", the first packet sent using a given SA will contain a sequence number of 1.
However if user didn't set seq/oseq, the HW used zero as first sequence packet number. Such misconfiguration causes to drop of first packet if replay window protection was enabled in SA.
To fix it, set sequence number to be at least 1.
Fixes: 7db21ef4566e ("net/mlx5e: Set IPsec replay sequence numbers") Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index 65678e89aea6..0d4b8aef6add 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -121,7 +121,14 @@ static bool mlx5e_ipsec_update_esn_state(struct mlx5e_ipsec_sa_entry *sa_entry) if (x->xso.type == XFRM_DEV_OFFLOAD_CRYPTO) esn_msb = xfrm_replay_seqhi(x, htonl(seq_bottom));
- sa_entry->esn_state.esn = esn; + if (sa_entry->esn_state.esn_msb) + sa_entry->esn_state.esn = esn; + else + /* According to RFC4303, section "3.3.3. Sequence Number Generation", + * the first packet sent using a given SA will contain a sequence + * number of 1. + */ + sa_entry->esn_state.esn = max_t(u32, esn, 1); sa_entry->esn_state.esn_msb = esn_msb;
if (unlikely(overlap && seq_bottom < MLX5E_IPSEC_ESN_SCOPE_MID)) {
From: Patrisious Haddad phaddad@nvidia.com
stable inclusion from stable-v6.6.8 commit 20af7afcd8b85a4cb413072d631bf9a6469eee3a bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 94af50c0a9bb961fe93cf0fdd14eb0883da86721 ]
Change normal IPsec flow to use the same creation/destruction functions for status flow table as that of ESW, which first of all refines the code to have less code duplication.
And more importantly, the ESW status table handles IPsec syndrome checks at steering by HW, which is more efficient than the previous behaviour we had where it was copied to WQE meta data and checked by the driver.
Fixes: 1762f132d542 ("net/mlx5e: Support IPsec packet offload for RX in switchdev mode") Signed-off-by: Patrisious Haddad phaddad@nvidia.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- .../mellanox/mlx5/core/en_accel/ipsec_fs.c | 187 +++++++++++++----- .../mellanox/mlx5/core/esw/ipsec_fs.c | 152 -------------- .../mellanox/mlx5/core/esw/ipsec_fs.h | 15 -- 3 files changed, 141 insertions(+), 213 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c index 7dba4221993f..fc6aca7c05a4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c @@ -128,63 +128,166 @@ static struct mlx5_flow_table *ipsec_ft_create(struct mlx5_flow_namespace *ns, return mlx5_create_auto_grouped_flow_table(ns, &ft_attr); }
-static int ipsec_status_rule(struct mlx5_core_dev *mdev, - struct mlx5e_ipsec_rx *rx, - struct mlx5_flow_destination *dest) +static void ipsec_rx_status_drop_destroy(struct mlx5e_ipsec *ipsec, + struct mlx5e_ipsec_rx *rx) { - u8 action[MLX5_UN_SZ_BYTES(set_add_copy_action_in_auto)] = {}; + mlx5_del_flow_rules(rx->status_drop.rule); + mlx5_destroy_flow_group(rx->status_drop.group); + mlx5_fc_destroy(ipsec->mdev, rx->status_drop_cnt); +} + +static void ipsec_rx_status_pass_destroy(struct mlx5e_ipsec *ipsec, + struct mlx5e_ipsec_rx *rx) +{ + mlx5_del_flow_rules(rx->status.rule); + + if (rx != ipsec->rx_esw) + return; + +#ifdef CONFIG_MLX5_ESWITCH + mlx5_chains_put_table(esw_chains(ipsec->mdev->priv.eswitch), 0, 1, 0); +#endif +} + +static int ipsec_rx_status_drop_create(struct mlx5e_ipsec *ipsec, + struct mlx5e_ipsec_rx *rx) +{ + int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in); + struct mlx5_flow_table *ft = rx->ft.status; + struct mlx5_core_dev *mdev = ipsec->mdev; + struct mlx5_flow_destination dest = {}; struct mlx5_flow_act flow_act = {}; - struct mlx5_modify_hdr *modify_hdr; - struct mlx5_flow_handle *fte; + struct mlx5_flow_handle *rule; + struct mlx5_fc *flow_counter; struct mlx5_flow_spec *spec; - int err; + struct mlx5_flow_group *g; + u32 *flow_group_in; + int err = 0;
+ flow_group_in = kvzalloc(inlen, GFP_KERNEL); spec = kvzalloc(sizeof(*spec), GFP_KERNEL); - if (!spec) - return -ENOMEM; + if (!flow_group_in || !spec) { + err = -ENOMEM; + goto err_out; + }
- /* Action to copy 7 bit ipsec_syndrome to regB[24:30] */ - MLX5_SET(copy_action_in, action, action_type, MLX5_ACTION_TYPE_COPY); - MLX5_SET(copy_action_in, action, src_field, MLX5_ACTION_IN_FIELD_IPSEC_SYNDROME); - MLX5_SET(copy_action_in, action, src_offset, 0); - MLX5_SET(copy_action_in, action, length, 7); - MLX5_SET(copy_action_in, action, dst_field, MLX5_ACTION_IN_FIELD_METADATA_REG_B); - MLX5_SET(copy_action_in, action, dst_offset, 24); + MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, ft->max_fte - 1); + MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, ft->max_fte - 1); + g = mlx5_create_flow_group(ft, flow_group_in); + if (IS_ERR(g)) { + err = PTR_ERR(g); + mlx5_core_err(mdev, + "Failed to add ipsec rx status drop flow group, err=%d\n", err); + goto err_out; + }
- modify_hdr = mlx5_modify_header_alloc(mdev, MLX5_FLOW_NAMESPACE_KERNEL, - 1, action); + flow_counter = mlx5_fc_create(mdev, false); + if (IS_ERR(flow_counter)) { + err = PTR_ERR(flow_counter); + mlx5_core_err(mdev, + "Failed to add ipsec rx status drop rule counter, err=%d\n", err); + goto err_cnt; + }
- if (IS_ERR(modify_hdr)) { - err = PTR_ERR(modify_hdr); + flow_act.action = MLX5_FLOW_CONTEXT_ACTION_DROP | MLX5_FLOW_CONTEXT_ACTION_COUNT; + dest.type = MLX5_FLOW_DESTINATION_TYPE_COUNTER; + dest.counter_id = mlx5_fc_id(flow_counter); + if (rx == ipsec->rx_esw) + spec->flow_context.flow_source = MLX5_FLOW_CONTEXT_FLOW_SOURCE_UPLINK; + rule = mlx5_add_flow_rules(ft, spec, &flow_act, &dest, 1); + if (IS_ERR(rule)) { + err = PTR_ERR(rule); mlx5_core_err(mdev, - "fail to alloc ipsec copy modify_header_id err=%d\n", err); - goto out_spec; + "Failed to add ipsec rx status drop rule, err=%d\n", err); + goto err_rule; }
- /* create fte */ - flow_act.action = MLX5_FLOW_CONTEXT_ACTION_MOD_HDR | - MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | + rx->status_drop.group = g; + rx->status_drop.rule = rule; + rx->status_drop_cnt = flow_counter; + + kvfree(flow_group_in); + kvfree(spec); + return 0; + +err_rule: + mlx5_fc_destroy(mdev, flow_counter); +err_cnt: + mlx5_destroy_flow_group(g); +err_out: + kvfree(flow_group_in); + kvfree(spec); + return err; +} + +static int ipsec_rx_status_pass_create(struct mlx5e_ipsec *ipsec, + struct mlx5e_ipsec_rx *rx, + struct mlx5_flow_destination *dest) +{ + struct mlx5_flow_act flow_act = {}; + struct mlx5_flow_handle *rule; + struct mlx5_flow_spec *spec; + int err; + + spec = kvzalloc(sizeof(*spec), GFP_KERNEL); + if (!spec) + return -ENOMEM; + + MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, + misc_parameters_2.ipsec_syndrome); + MLX5_SET(fte_match_param, spec->match_value, + misc_parameters_2.ipsec_syndrome, 0); + if (rx == ipsec->rx_esw) + spec->flow_context.flow_source = MLX5_FLOW_CONTEXT_FLOW_SOURCE_UPLINK; + spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS_2; + flow_act.flags = FLOW_ACT_NO_APPEND; + flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | MLX5_FLOW_CONTEXT_ACTION_COUNT; - flow_act.modify_hdr = modify_hdr; - fte = mlx5_add_flow_rules(rx->ft.status, spec, &flow_act, dest, 2); - if (IS_ERR(fte)) { - err = PTR_ERR(fte); - mlx5_core_err(mdev, "fail to add ipsec rx err copy rule err=%d\n", err); - goto out; + rule = mlx5_add_flow_rules(rx->ft.status, spec, &flow_act, dest, 2); + if (IS_ERR(rule)) { + err = PTR_ERR(rule); + mlx5_core_warn(ipsec->mdev, + "Failed to add ipsec rx status pass rule, err=%d\n", err); + goto err_rule; }
+ rx->status.rule = rule; kvfree(spec); - rx->status.rule = fte; - rx->status.modify_hdr = modify_hdr; return 0;
-out: - mlx5_modify_header_dealloc(mdev, modify_hdr); -out_spec: +err_rule: kvfree(spec); return err; }
+static void mlx5_ipsec_rx_status_destroy(struct mlx5e_ipsec *ipsec, + struct mlx5e_ipsec_rx *rx) +{ + ipsec_rx_status_pass_destroy(ipsec, rx); + ipsec_rx_status_drop_destroy(ipsec, rx); +} + +static int mlx5_ipsec_rx_status_create(struct mlx5e_ipsec *ipsec, + struct mlx5e_ipsec_rx *rx, + struct mlx5_flow_destination *dest) +{ + int err; + + err = ipsec_rx_status_drop_create(ipsec, rx); + if (err) + return err; + + err = ipsec_rx_status_pass_create(ipsec, rx, dest); + if (err) + goto err_pass_create; + + return 0; + +err_pass_create: + ipsec_rx_status_drop_destroy(ipsec, rx); + return err; +} + static int ipsec_miss_create(struct mlx5_core_dev *mdev, struct mlx5_flow_table *ft, struct mlx5e_ipsec_miss *miss, @@ -256,12 +359,7 @@ static void rx_destroy(struct mlx5_core_dev *mdev, struct mlx5e_ipsec *ipsec, mlx5_destroy_flow_table(rx->ft.sa); if (rx->allow_tunnel_mode) mlx5_eswitch_unblock_encap(mdev); - if (rx == ipsec->rx_esw) { - mlx5_esw_ipsec_rx_status_destroy(ipsec, rx); - } else { - mlx5_del_flow_rules(rx->status.rule); - mlx5_modify_header_dealloc(mdev, rx->status.modify_hdr); - } + mlx5_ipsec_rx_status_destroy(ipsec, rx); mlx5_destroy_flow_table(rx->ft.status);
mlx5_ipsec_fs_roce_rx_destroy(ipsec->roce, family); @@ -351,10 +449,7 @@ static int rx_create(struct mlx5_core_dev *mdev, struct mlx5e_ipsec *ipsec,
dest[1].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER; dest[1].counter_id = mlx5_fc_id(rx->fc->cnt); - if (rx == ipsec->rx_esw) - err = mlx5_esw_ipsec_rx_status_create(ipsec, rx, dest); - else - err = ipsec_status_rule(mdev, rx, dest); + err = mlx5_ipsec_rx_status_create(ipsec, rx, dest); if (err) goto err_add;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c index 095f31f380fa..13b5916b64e2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c @@ -21,158 +21,6 @@ enum { MLX5_ESW_IPSEC_TX_ESP_FT_CNT_LEVEL, };
-static void esw_ipsec_rx_status_drop_destroy(struct mlx5e_ipsec *ipsec, - struct mlx5e_ipsec_rx *rx) -{ - mlx5_del_flow_rules(rx->status_drop.rule); - mlx5_destroy_flow_group(rx->status_drop.group); - mlx5_fc_destroy(ipsec->mdev, rx->status_drop_cnt); -} - -static void esw_ipsec_rx_status_pass_destroy(struct mlx5e_ipsec *ipsec, - struct mlx5e_ipsec_rx *rx) -{ - mlx5_del_flow_rules(rx->status.rule); - mlx5_chains_put_table(esw_chains(ipsec->mdev->priv.eswitch), 0, 1, 0); -} - -static int esw_ipsec_rx_status_drop_create(struct mlx5e_ipsec *ipsec, - struct mlx5e_ipsec_rx *rx) -{ - int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in); - struct mlx5_flow_table *ft = rx->ft.status; - struct mlx5_core_dev *mdev = ipsec->mdev; - struct mlx5_flow_destination dest = {}; - struct mlx5_flow_act flow_act = {}; - struct mlx5_flow_handle *rule; - struct mlx5_fc *flow_counter; - struct mlx5_flow_spec *spec; - struct mlx5_flow_group *g; - u32 *flow_group_in; - int err = 0; - - flow_group_in = kvzalloc(inlen, GFP_KERNEL); - spec = kvzalloc(sizeof(*spec), GFP_KERNEL); - if (!flow_group_in || !spec) { - err = -ENOMEM; - goto err_out; - } - - MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, ft->max_fte - 1); - MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, ft->max_fte - 1); - g = mlx5_create_flow_group(ft, flow_group_in); - if (IS_ERR(g)) { - err = PTR_ERR(g); - mlx5_core_err(mdev, - "Failed to add ipsec rx status drop flow group, err=%d\n", err); - goto err_out; - } - - flow_counter = mlx5_fc_create(mdev, false); - if (IS_ERR(flow_counter)) { - err = PTR_ERR(flow_counter); - mlx5_core_err(mdev, - "Failed to add ipsec rx status drop rule counter, err=%d\n", err); - goto err_cnt; - } - - flow_act.action = MLX5_FLOW_CONTEXT_ACTION_DROP | MLX5_FLOW_CONTEXT_ACTION_COUNT; - dest.type = MLX5_FLOW_DESTINATION_TYPE_COUNTER; - dest.counter_id = mlx5_fc_id(flow_counter); - spec->flow_context.flow_source = MLX5_FLOW_CONTEXT_FLOW_SOURCE_UPLINK; - rule = mlx5_add_flow_rules(ft, spec, &flow_act, &dest, 1); - if (IS_ERR(rule)) { - err = PTR_ERR(rule); - mlx5_core_err(mdev, - "Failed to add ipsec rx status drop rule, err=%d\n", err); - goto err_rule; - } - - rx->status_drop.group = g; - rx->status_drop.rule = rule; - rx->status_drop_cnt = flow_counter; - - kvfree(flow_group_in); - kvfree(spec); - return 0; - -err_rule: - mlx5_fc_destroy(mdev, flow_counter); -err_cnt: - mlx5_destroy_flow_group(g); -err_out: - kvfree(flow_group_in); - kvfree(spec); - return err; -} - -static int esw_ipsec_rx_status_pass_create(struct mlx5e_ipsec *ipsec, - struct mlx5e_ipsec_rx *rx, - struct mlx5_flow_destination *dest) -{ - struct mlx5_flow_act flow_act = {}; - struct mlx5_flow_handle *rule; - struct mlx5_flow_spec *spec; - int err; - - spec = kvzalloc(sizeof(*spec), GFP_KERNEL); - if (!spec) - return -ENOMEM; - - MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, - misc_parameters_2.ipsec_syndrome); - MLX5_SET(fte_match_param, spec->match_value, - misc_parameters_2.ipsec_syndrome, 0); - spec->flow_context.flow_source = MLX5_FLOW_CONTEXT_FLOW_SOURCE_UPLINK; - spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS_2; - flow_act.flags = FLOW_ACT_NO_APPEND; - flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | - MLX5_FLOW_CONTEXT_ACTION_COUNT; - rule = mlx5_add_flow_rules(rx->ft.status, spec, &flow_act, dest, 2); - if (IS_ERR(rule)) { - err = PTR_ERR(rule); - mlx5_core_warn(ipsec->mdev, - "Failed to add ipsec rx status pass rule, err=%d\n", err); - goto err_rule; - } - - rx->status.rule = rule; - kvfree(spec); - return 0; - -err_rule: - kvfree(spec); - return err; -} - -void mlx5_esw_ipsec_rx_status_destroy(struct mlx5e_ipsec *ipsec, - struct mlx5e_ipsec_rx *rx) -{ - esw_ipsec_rx_status_pass_destroy(ipsec, rx); - esw_ipsec_rx_status_drop_destroy(ipsec, rx); -} - -int mlx5_esw_ipsec_rx_status_create(struct mlx5e_ipsec *ipsec, - struct mlx5e_ipsec_rx *rx, - struct mlx5_flow_destination *dest) -{ - int err; - - err = esw_ipsec_rx_status_drop_create(ipsec, rx); - if (err) - return err; - - err = esw_ipsec_rx_status_pass_create(ipsec, rx, dest); - if (err) - goto err_pass_create; - - return 0; - -err_pass_create: - esw_ipsec_rx_status_drop_destroy(ipsec, rx); - return err; -} - void mlx5_esw_ipsec_rx_create_attr_set(struct mlx5e_ipsec *ipsec, struct mlx5e_ipsec_rx_create_attr *attr) { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.h b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.h index 0c90f7a8b0d3..ac9c65b89166 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.h @@ -8,11 +8,6 @@ struct mlx5e_ipsec; struct mlx5e_ipsec_sa_entry;
#ifdef CONFIG_MLX5_ESWITCH -void mlx5_esw_ipsec_rx_status_destroy(struct mlx5e_ipsec *ipsec, - struct mlx5e_ipsec_rx *rx); -int mlx5_esw_ipsec_rx_status_create(struct mlx5e_ipsec *ipsec, - struct mlx5e_ipsec_rx *rx, - struct mlx5_flow_destination *dest); void mlx5_esw_ipsec_rx_create_attr_set(struct mlx5e_ipsec *ipsec, struct mlx5e_ipsec_rx_create_attr *attr); int mlx5_esw_ipsec_rx_status_pass_dest_get(struct mlx5e_ipsec *ipsec, @@ -26,16 +21,6 @@ void mlx5_esw_ipsec_tx_create_attr_set(struct mlx5e_ipsec *ipsec, struct mlx5e_ipsec_tx_create_attr *attr); void mlx5_esw_ipsec_restore_dest_uplink(struct mlx5_core_dev *mdev); #else -static inline void mlx5_esw_ipsec_rx_status_destroy(struct mlx5e_ipsec *ipsec, - struct mlx5e_ipsec_rx *rx) {} - -static inline int mlx5_esw_ipsec_rx_status_create(struct mlx5e_ipsec *ipsec, - struct mlx5e_ipsec_rx *rx, - struct mlx5_flow_destination *dest) -{ - return -EINVAL; -} - static inline void mlx5_esw_ipsec_rx_create_attr_set(struct mlx5e_ipsec *ipsec, struct mlx5e_ipsec_rx_create_attr *attr) {}
From: Leon Romanovsky leonro@nvidia.com
stable inclusion from stable-v6.6.8 commit 1a0d0e97a750f883b0f4ae6938346092bc41fcdb bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit c2bf84f1d1a1595dcc45fe867f0e02b331993fee ]
IPsec NAT-T packets are UDP encapsulated packets over ESP normal ones. In case they arrive to RX, the SPI and ESP are located in inner header, while the check was performed on outer header instead.
That wrong check caused to the situation where received rekeying request was missed and caused to rekey timeout, which "compensated" this failure by completing rekeying.
Fixes: d65954934937 ("net/mlx5e: Support IPsec NAT-T functionality") Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- .../mellanox/mlx5/core/en_accel/ipsec_fs.c | 22 ++++++++++++++----- include/linux/mlx5/mlx5_ifc.h | 2 +- 2 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c index fc6aca7c05a4..6dc60be2a697 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c @@ -974,13 +974,22 @@ static void setup_fte_esp(struct mlx5_flow_spec *spec) MLX5_SET(fte_match_param, spec->match_value, outer_headers.ip_protocol, IPPROTO_ESP); }
-static void setup_fte_spi(struct mlx5_flow_spec *spec, u32 spi) +static void setup_fte_spi(struct mlx5_flow_spec *spec, u32 spi, bool encap) { /* SPI number */ spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS;
- MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, misc_parameters.outer_esp_spi); - MLX5_SET(fte_match_param, spec->match_value, misc_parameters.outer_esp_spi, spi); + if (encap) { + MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, + misc_parameters.inner_esp_spi); + MLX5_SET(fte_match_param, spec->match_value, + misc_parameters.inner_esp_spi, spi); + } else { + MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, + misc_parameters.outer_esp_spi); + MLX5_SET(fte_match_param, spec->match_value, + misc_parameters.outer_esp_spi, spi); + } }
static void setup_fte_no_frags(struct mlx5_flow_spec *spec) @@ -1339,8 +1348,9 @@ static int rx_add_rule(struct mlx5e_ipsec_sa_entry *sa_entry) else setup_fte_addr6(spec, attrs->saddr.a6, attrs->daddr.a6);
- setup_fte_spi(spec, attrs->spi); - setup_fte_esp(spec); + setup_fte_spi(spec, attrs->spi, attrs->encap); + if (!attrs->encap) + setup_fte_esp(spec); setup_fte_no_frags(spec); setup_fte_upper_proto_match(spec, &attrs->upspec);
@@ -1443,7 +1453,7 @@ static int tx_add_rule(struct mlx5e_ipsec_sa_entry *sa_entry)
switch (attrs->type) { case XFRM_DEV_OFFLOAD_CRYPTO: - setup_fte_spi(spec, attrs->spi); + setup_fte_spi(spec, attrs->spi, false); setup_fte_esp(spec); setup_fte_reg_a(spec); break; diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index f08cd1303145..8ac6ae79e083 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -620,7 +620,7 @@ struct mlx5_ifc_fte_match_set_misc_bits {
u8 reserved_at_140[0x8]; u8 bth_dst_qp[0x18]; - u8 reserved_at_160[0x20]; + u8 inner_esp_spi[0x20]; u8 outer_esp_spi[0x20]; u8 reserved_at_1a0[0x60]; };
From: Jianbo Liu jianbol@nvidia.com
stable inclusion from stable-v6.6.8 commit 594a306461de8617a4b497b88a0c7670c4d93f2e bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit baac8351f74c543896b8fd40138b7ad9365587a3 ]
Currently eswitch mode_lock is so heavy, for example, it's locked during the whole process of the mode change, which may need to hold other locks. As the mode_lock is also used by IPSec to block mode and encap change now, it is easy to cause lock dependency.
Since some of protections are also done by devlink lock, the eswitch mode_lock is not needed at those places, and thus the possibility of lockdep issue is reduced.
Fixes: c8e350e62fc5 ("net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev") Signed-off-by: Jianbo Liu jianbol@nvidia.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- .../mellanox/mlx5/core/en_accel/ipsec_fs.c | 9 +++-- .../net/ethernet/mellanox/mlx5/core/eswitch.c | 35 ++++++++++------- .../net/ethernet/mellanox/mlx5/core/eswitch.h | 2 + .../mellanox/mlx5/core/eswitch_offloads.c | 38 +++++++++++-------- 4 files changed, 52 insertions(+), 32 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c index 6dc60be2a697..03f69c485a00 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c @@ -1834,8 +1834,11 @@ static int mlx5e_ipsec_block_tc_offload(struct mlx5_core_dev *mdev) struct mlx5_eswitch *esw = mdev->priv.eswitch; int err = 0;
- if (esw) - down_write(&esw->mode_lock); + if (esw) { + err = mlx5_esw_lock(esw); + if (err) + return err; + }
if (mdev->num_block_ipsec) { err = -EBUSY; @@ -1846,7 +1849,7 @@ static int mlx5e_ipsec_block_tc_offload(struct mlx5_core_dev *mdev)
unlock: if (esw) - up_write(&esw->mode_lock); + mlx5_esw_unlock(esw);
return err; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index 8d0b915a3121..3047d7015c52 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1463,7 +1463,7 @@ int mlx5_eswitch_enable_locked(struct mlx5_eswitch *esw, int num_vfs) { int err;
- lockdep_assert_held(&esw->mode_lock); + devl_assert_locked(priv_to_devlink(esw->dev));
if (!MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, ft_support)) { esw_warn(esw->dev, "FDB is not supported, aborting ...\n"); @@ -1531,7 +1531,6 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs) if (toggle_lag) mlx5_lag_disable_change(esw->dev);
- down_write(&esw->mode_lock); if (!mlx5_esw_is_fdb_created(esw)) { ret = mlx5_eswitch_enable_locked(esw, num_vfs); } else { @@ -1554,8 +1553,6 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs) } }
- up_write(&esw->mode_lock); - if (toggle_lag) mlx5_lag_enable_change(esw->dev);
@@ -1569,12 +1566,11 @@ void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf) return;
devl_assert_locked(priv_to_devlink(esw->dev)); - down_write(&esw->mode_lock); /* If driver is unloaded, this function is called twice by remove_one() * and mlx5_unload(). Prevent the second call. */ if (!esw->esw_funcs.num_vfs && !esw->esw_funcs.num_ec_vfs && !clear_vf) - goto unlock; + return;
esw_info(esw->dev, "Unload vfs: mode(%s), nvfs(%d), necvfs(%d), active vports(%d)\n", esw->mode == MLX5_ESWITCH_LEGACY ? "LEGACY" : "OFFLOADS", @@ -1603,9 +1599,6 @@ void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf) esw->esw_funcs.num_vfs = 0; else esw->esw_funcs.num_ec_vfs = 0; - -unlock: - up_write(&esw->mode_lock); }
/* Free resources for corresponding eswitch mode. It is called by devlink @@ -1647,10 +1640,8 @@ void mlx5_eswitch_disable(struct mlx5_eswitch *esw)
devl_assert_locked(priv_to_devlink(esw->dev)); mlx5_lag_disable_change(esw->dev); - down_write(&esw->mode_lock); mlx5_eswitch_disable_locked(esw); esw->mode = MLX5_ESWITCH_LEGACY; - up_write(&esw->mode_lock); mlx5_lag_enable_change(esw->dev); }
@@ -2254,8 +2245,13 @@ bool mlx5_esw_hold(struct mlx5_core_dev *mdev) if (!mlx5_esw_allowed(esw)) return true;
- if (down_read_trylock(&esw->mode_lock) != 0) + if (down_read_trylock(&esw->mode_lock) != 0) { + if (esw->eswitch_operation_in_progress) { + up_read(&esw->mode_lock); + return false; + } return true; + }
return false; } @@ -2312,7 +2308,8 @@ int mlx5_esw_try_lock(struct mlx5_eswitch *esw) if (down_write_trylock(&esw->mode_lock) == 0) return -EINVAL;
- if (atomic64_read(&esw->user_count) > 0) { + if (esw->eswitch_operation_in_progress || + atomic64_read(&esw->user_count) > 0) { up_write(&esw->mode_lock); return -EBUSY; } @@ -2320,6 +2317,18 @@ int mlx5_esw_try_lock(struct mlx5_eswitch *esw) return esw->mode; }
+int mlx5_esw_lock(struct mlx5_eswitch *esw) +{ + down_write(&esw->mode_lock); + + if (esw->eswitch_operation_in_progress) { + up_write(&esw->mode_lock); + return -EBUSY; + } + + return 0; +} + /** * mlx5_esw_unlock() - Release write lock on esw mode lock * @esw: eswitch device. diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index 37ab66e7b403..b674b57d05aa 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -383,6 +383,7 @@ struct mlx5_eswitch { struct xarray paired; struct mlx5_devcom_comp_dev *devcom; u16 enabled_ipsec_vf_count; + bool eswitch_operation_in_progress; };
void esw_offloads_disable(struct mlx5_eswitch *esw); @@ -827,6 +828,7 @@ void mlx5_esw_release(struct mlx5_core_dev *dev); void mlx5_esw_get(struct mlx5_core_dev *dev); void mlx5_esw_put(struct mlx5_core_dev *dev); int mlx5_esw_try_lock(struct mlx5_eswitch *esw); +int mlx5_esw_lock(struct mlx5_eswitch *esw); void mlx5_esw_unlock(struct mlx5_eswitch *esw);
void esw_vport_change_handle_locked(struct mlx5_vport *vport); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index 88236e75fd90..bf78eeca401b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -3733,13 +3733,16 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode, goto unlock; }
+ esw->eswitch_operation_in_progress = true; + up_write(&esw->mode_lock); + mlx5_eswitch_disable_locked(esw); if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV) { if (mlx5_devlink_trap_get_num_active(esw->dev)) { NL_SET_ERR_MSG_MOD(extack, "Can't change mode while devlink traps are active"); err = -EOPNOTSUPP; - goto unlock; + goto skip; } err = esw_offloads_start(esw, extack); } else if (mode == DEVLINK_ESWITCH_MODE_LEGACY) { @@ -3749,6 +3752,9 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode, err = -EINVAL; }
+skip: + down_write(&esw->mode_lock); + esw->eswitch_operation_in_progress = false; unlock: mlx5_esw_unlock(esw); enable_lag: @@ -3759,16 +3765,12 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode, int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode) { struct mlx5_eswitch *esw; - int err;
esw = mlx5_devlink_eswitch_get(devlink); if (IS_ERR(esw)) return PTR_ERR(esw);
- down_read(&esw->mode_lock); - err = esw_mode_to_devlink(esw->mode, mode); - up_read(&esw->mode_lock); - return err; + return esw_mode_to_devlink(esw->mode, mode); }
static int mlx5_esw_vports_inline_set(struct mlx5_eswitch *esw, u8 mlx5_mode, @@ -3862,11 +3864,15 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode, if (err) goto out;
+ esw->eswitch_operation_in_progress = true; + up_write(&esw->mode_lock); + err = mlx5_esw_vports_inline_set(esw, mlx5_mode, extack); - if (err) - goto out; + if (!err) + esw->offloads.inline_mode = mlx5_mode;
- esw->offloads.inline_mode = mlx5_mode; + down_write(&esw->mode_lock); + esw->eswitch_operation_in_progress = false; up_write(&esw->mode_lock); return 0;
@@ -3878,16 +3884,12 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode, int mlx5_devlink_eswitch_inline_mode_get(struct devlink *devlink, u8 *mode) { struct mlx5_eswitch *esw; - int err;
esw = mlx5_devlink_eswitch_get(devlink); if (IS_ERR(esw)) return PTR_ERR(esw);
- down_read(&esw->mode_lock); - err = esw_inline_mode_to_devlink(esw->offloads.inline_mode, mode); - up_read(&esw->mode_lock); - return err; + return esw_inline_mode_to_devlink(esw->offloads.inline_mode, mode); }
bool mlx5_eswitch_block_encap(struct mlx5_core_dev *dev) @@ -3969,6 +3971,9 @@ int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink, goto unlock; }
+ esw->eswitch_operation_in_progress = true; + up_write(&esw->mode_lock); + esw_destroy_offloads_fdb_tables(esw);
esw->offloads.encap = encap; @@ -3982,6 +3987,9 @@ int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink, (void)esw_create_offloads_fdb_tables(esw); }
+ down_write(&esw->mode_lock); + esw->eswitch_operation_in_progress = false; + unlock: up_write(&esw->mode_lock); return err; @@ -3996,9 +4004,7 @@ int mlx5_devlink_eswitch_encap_mode_get(struct devlink *devlink, if (IS_ERR(esw)) return PTR_ERR(esw);
- down_read(&esw->mode_lock); *encap = esw->offloads.encap; - up_read(&esw->mode_lock); return 0; }
From: Jianbo Liu jianbol@nvidia.com
stable inclusion from stable-v6.6.8 commit 4a95f412b7ee000b279df4842ad227a6eadec89c bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 4e25b661f484df54b6751b65f9ea2434a3b67539 ]
After IPSec TX tables are destroyed, the flow rules in TC rhashtable, which have the destination to IPSec, are restored to the original one, the uplink.
However, when the device is in switchdev mode and unload driver with IPSec rules configured, TC rhashtable cleanup is done before IPSec cleanup, which means tc_ht->tbl is already freed when walking TC rhashtable, in order to restore the destination. So add the checking before walking to avoid unexpected behavior.
Fixes: d1569537a837 ("net/mlx5e: Modify and restore TC rules for IPSec TX rules") Signed-off-by: Jianbo Liu jianbol@nvidia.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c index 13b5916b64e2..d5d33c3b3aa2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c @@ -152,7 +152,7 @@ void mlx5_esw_ipsec_restore_dest_uplink(struct mlx5_core_dev *mdev)
xa_for_each(&esw->offloads.vport_reps, i, rep) { rpriv = rep->rep_data[REP_ETH].priv; - if (!rpriv || !rpriv->netdev) + if (!rpriv || !rpriv->netdev || !atomic_read(&rpriv->tc_ht.nelems)) continue;
rhashtable_walk_enter(&rpriv->tc_ht, &iter);
From: Patrisious Haddad phaddad@nvidia.com
stable inclusion from stable-v6.6.8 commit fdd350fe5e1a8a87b3bda46792a72980b93e72ab bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 0d293714ac32650bfb669ceadf7cc2fad8161401 ]
Send blocking events from IB driver whenever the device is done being affiliated or if it is removed from an affiliation.
This is useful since now the EN driver can register to those event and know when a device is affiliated or not.
Signed-off-by: Patrisious Haddad phaddad@nvidia.com Reviewed-by: Mark Bloch mbloch@nvidia.com Link: https://lore.kernel.org/r/a7491c3e483cfd8d962f5f75b9a25f253043384a.169529668... Signed-off-by: Leon Romanovsky leon@kernel.org Stable-dep-of: 762a55a54eec ("net/mlx5e: Disable IPsec offload support if not FW steering") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/infiniband/hw/mlx5/main.c | 17 +++++++++++++++++ drivers/net/ethernet/mellanox/mlx5/core/main.c | 6 ++++++ include/linux/mlx5/device.h | 2 ++ include/linux/mlx5/driver.h | 2 ++ 4 files changed, 27 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 5d963abb7e60..4c4233b9c8b0 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -24,6 +24,7 @@ #include <linux/mlx5/vport.h> #include <linux/mlx5/fs.h> #include <linux/mlx5/eswitch.h> +#include <linux/mlx5/driver.h> #include <linux/list.h> #include <rdma/ib_smi.h> #include <rdma/ib_umem_odp.h> @@ -3175,6 +3176,13 @@ static void mlx5_ib_unbind_slave_port(struct mlx5_ib_dev *ibdev,
lockdep_assert_held(&mlx5_ib_multiport_mutex);
+ mlx5_core_mp_event_replay(ibdev->mdev, + MLX5_DRIVER_EVENT_AFFILIATION_REMOVED, + NULL); + mlx5_core_mp_event_replay(mpi->mdev, + MLX5_DRIVER_EVENT_AFFILIATION_REMOVED, + NULL); + mlx5_ib_cleanup_cong_debugfs(ibdev, port_num);
spin_lock(&port->mp.mpi_lock); @@ -3226,6 +3234,7 @@ static bool mlx5_ib_bind_slave_port(struct mlx5_ib_dev *ibdev, struct mlx5_ib_multiport_info *mpi) { u32 port_num = mlx5_core_native_port_num(mpi->mdev) - 1; + u64 key; int err;
lockdep_assert_held(&mlx5_ib_multiport_mutex); @@ -3254,6 +3263,14 @@ static bool mlx5_ib_bind_slave_port(struct mlx5_ib_dev *ibdev,
mlx5_ib_init_cong_debugfs(ibdev, port_num);
+ key = ibdev->ib_dev.index; + mlx5_core_mp_event_replay(mpi->mdev, + MLX5_DRIVER_EVENT_AFFILIATION_DONE, + &key); + mlx5_core_mp_event_replay(ibdev->mdev, + MLX5_DRIVER_EVENT_AFFILIATION_DONE, + &key); + return true;
unbind: diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 15561965d2af..6ca91c0e8a6a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -361,6 +361,12 @@ void mlx5_core_uplink_netdev_event_replay(struct mlx5_core_dev *dev) } EXPORT_SYMBOL(mlx5_core_uplink_netdev_event_replay);
+void mlx5_core_mp_event_replay(struct mlx5_core_dev *dev, u32 event, void *data) +{ + mlx5_blocking_notifier_call_chain(dev, event, data); +} +EXPORT_SYMBOL(mlx5_core_mp_event_replay); + int mlx5_core_get_caps_mode(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type, enum mlx5_cap_mode cap_mode) { diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 4d5be378fa8c..26333d602a50 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -366,6 +366,8 @@ enum mlx5_driver_event { MLX5_DRIVER_EVENT_UPLINK_NETDEV, MLX5_DRIVER_EVENT_MACSEC_SA_ADDED, MLX5_DRIVER_EVENT_MACSEC_SA_DELETED, + MLX5_DRIVER_EVENT_AFFILIATION_DONE, + MLX5_DRIVER_EVENT_AFFILIATION_REMOVED, };
enum { diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 3033bbaeac81..5ca4e085d813 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1027,6 +1027,8 @@ bool mlx5_cmd_is_down(struct mlx5_core_dev *dev); void mlx5_core_uplink_netdev_set(struct mlx5_core_dev *mdev, struct net_device *netdev); void mlx5_core_uplink_netdev_event_replay(struct mlx5_core_dev *mdev);
+void mlx5_core_mp_event_replay(struct mlx5_core_dev *dev, u32 event, void *data); + void mlx5_health_cleanup(struct mlx5_core_dev *dev); int mlx5_health_init(struct mlx5_core_dev *dev); void mlx5_start_health_poll(struct mlx5_core_dev *dev);
From: Chris Mi cmi@nvidia.com
stable inclusion from stable-v6.6.8 commit 7e46db5e2a311ff4b28d8a4e0a3f23d9cdf3e78b bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 762a55a54eec4217e4cec9265ab6e5d4c11b61bd ]
IPsec FDB offload can only work with FW steering as of now, disable the cap upon non FW steering.
And since the IPSec cap is dynamic now based on steering mode. Cleanup the resources if they exist instead of checking the IPsec cap again.
Fixes: edd8b295f9e2 ("Merge branch 'mlx5-ipsec-packet-offload-support-in-eswitch-mode'") Signed-off-by: Chris Mi cmi@nvidia.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- .../mellanox/mlx5/core/en_accel/ipsec.c | 26 ++++++++----------- .../mlx5/core/en_accel/ipsec_offload.c | 8 +++++- 2 files changed, 18 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index 0d4b8aef6add..5834e47e72d8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -929,9 +929,11 @@ void mlx5e_ipsec_cleanup(struct mlx5e_priv *priv) return;
mlx5e_accel_ipsec_fs_cleanup(ipsec); - if (mlx5_ipsec_device_caps(priv->mdev) & MLX5_IPSEC_CAP_TUNNEL) + if (ipsec->netevent_nb.notifier_call) { unregister_netevent_notifier(&ipsec->netevent_nb); - if (mlx5_ipsec_device_caps(priv->mdev) & MLX5_IPSEC_CAP_PACKET_OFFLOAD) + ipsec->netevent_nb.notifier_call = NULL; + } + if (ipsec->aso) mlx5e_ipsec_aso_cleanup(ipsec); destroy_workqueue(ipsec->wq); kfree(ipsec); @@ -1040,6 +1042,12 @@ static int mlx5e_xfrm_validate_policy(struct mlx5_core_dev *mdev, } }
+ if (x->xdo.type == XFRM_DEV_OFFLOAD_PACKET && + !(mlx5_ipsec_device_caps(mdev) & MLX5_IPSEC_CAP_PACKET_OFFLOAD)) { + NL_SET_ERR_MSG_MOD(extack, "Packet offload is not supported"); + return -EINVAL; + } + return 0; }
@@ -1135,14 +1143,6 @@ static const struct xfrmdev_ops mlx5e_ipsec_xfrmdev_ops = { .xdo_dev_state_free = mlx5e_xfrm_free_state, .xdo_dev_offload_ok = mlx5e_ipsec_offload_ok, .xdo_dev_state_advance_esn = mlx5e_xfrm_advance_esn_state, -}; - -static const struct xfrmdev_ops mlx5e_ipsec_packet_xfrmdev_ops = { - .xdo_dev_state_add = mlx5e_xfrm_add_state, - .xdo_dev_state_delete = mlx5e_xfrm_del_state, - .xdo_dev_state_free = mlx5e_xfrm_free_state, - .xdo_dev_offload_ok = mlx5e_ipsec_offload_ok, - .xdo_dev_state_advance_esn = mlx5e_xfrm_advance_esn_state,
.xdo_dev_state_update_curlft = mlx5e_xfrm_update_curlft, .xdo_dev_policy_add = mlx5e_xfrm_add_policy, @@ -1160,11 +1160,7 @@ void mlx5e_ipsec_build_netdev(struct mlx5e_priv *priv)
mlx5_core_info(mdev, "mlx5e: IPSec ESP acceleration enabled\n");
- if (mlx5_ipsec_device_caps(mdev) & MLX5_IPSEC_CAP_PACKET_OFFLOAD) - netdev->xfrmdev_ops = &mlx5e_ipsec_packet_xfrmdev_ops; - else - netdev->xfrmdev_ops = &mlx5e_ipsec_xfrmdev_ops; - + netdev->xfrmdev_ops = &mlx5e_ipsec_xfrmdev_ops; netdev->features |= NETIF_F_HW_ESP; netdev->hw_enc_features |= NETIF_F_HW_ESP;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c index 55b11d8cba53..ce29e3172120 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c @@ -5,6 +5,8 @@ #include "en.h" #include "ipsec.h" #include "lib/crypto.h" +#include "fs_core.h" +#include "eswitch.h"
enum { MLX5_IPSEC_ASO_REMOVE_FLOW_PKT_CNT_OFFSET, @@ -37,7 +39,10 @@ u32 mlx5_ipsec_device_caps(struct mlx5_core_dev *mdev) MLX5_CAP_ETH(mdev, insert_trailer) && MLX5_CAP_ETH(mdev, swp)) caps |= MLX5_IPSEC_CAP_CRYPTO;
- if (MLX5_CAP_IPSEC(mdev, ipsec_full_offload)) { + if (MLX5_CAP_IPSEC(mdev, ipsec_full_offload) && + (mdev->priv.steering->mode == MLX5_FLOW_STEERING_MODE_DMFS || + (mdev->priv.steering->mode == MLX5_FLOW_STEERING_MODE_SMFS && + is_mdev_legacy_mode(mdev)))) { if (MLX5_CAP_FLOWTABLE_NIC_TX(mdev, reformat_add_esp_trasport) && MLX5_CAP_FLOWTABLE_NIC_RX(mdev, @@ -558,6 +563,7 @@ void mlx5e_ipsec_aso_cleanup(struct mlx5e_ipsec *ipsec) dma_unmap_single(pdev, aso->dma_addr, sizeof(aso->ctx), DMA_BIDIRECTIONAL); kfree(aso); + ipsec->aso = NULL; }
static void mlx5e_ipsec_aso_copy(struct mlx5_wqe_aso_ctrl_seg *ctrl,
From: Moshe Shemesh moshe@nvidia.com
stable inclusion from stable-v6.6.8 commit 8ce3d969348a7c7fa3469588eb1319f9f3cc0eaa bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit eab0da38912ebdad922ed0388209f7eb0a5163cd ]
Due to the cited patch, devlink health commands take devlink lock and this may result in deadlock for mlx5e_tx_reporter as it takes local state_lock before calling devlink health report and on the other hand devlink health commands such as diagnose for same reporter take local state_lock after taking devlink lock (see kernel log below).
To fix it, remove local state_lock from mlx5e_tx_timeout_work() before calling devlink_health_report() and take care to cancel the work before any call to close channels, which may free the SQs that should be handled by the work. Before cancel_work_sync(), use current_work() to check we are not calling it from within the work, as mlx5e_tx_timeout_work() itself may close the channels and reopen as part of recovery flow.
While removing state_lock from mlx5e_tx_timeout_work() keep rtnl_lock to ensure no change in netdev->real_num_tx_queues, but use rtnl_trylock() and a flag to avoid deadlock by calling cancel_work_sync() before closing the channels while holding rtnl_lock too.
Kernel log: ====================================================== WARNING: possible circular locking dependency detected 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Not tainted ------------------------------------------------------ kworker/u16:2/65 is trying to acquire lock: ffff888122f6c2f8 (&devlink->lock_key#2){+.+.}-{3:3}, at: devlink_health_report+0x2f1/0x7e0
but task is already holding lock: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&priv->state_lock){+.+.}-{3:3}: __mutex_lock+0x12c/0x14b0 mlx5e_rx_reporter_diagnose+0x71/0x700 [mlx5_core] devlink_nl_cmd_health_reporter_diagnose_doit+0x212/0xa50 genl_family_rcv_msg_doit+0x1e9/0x2f0 genl_rcv_msg+0x2e9/0x530 netlink_rcv_skb+0x11d/0x340 genl_rcv+0x24/0x40 netlink_unicast+0x438/0x710 netlink_sendmsg+0x788/0xc40 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x1c1/0x290 __x64_sys_sendto+0xdd/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0
-> #0 (&devlink->lock_key#2){+.+.}-{3:3}: __lock_acquire+0x2c8a/0x6200 lock_acquire+0x1c1/0x550 __mutex_lock+0x12c/0x14b0 devlink_health_report+0x2f1/0x7e0 mlx5e_health_report+0xc9/0xd7 [mlx5_core] mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core] mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core] process_one_work+0x7c2/0x1340 worker_thread+0x59d/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- lock(&priv->state_lock); lock(&devlink->lock_key#2); lock(&priv->state_lock); lock(&devlink->lock_key#2);
*** DEADLOCK ***
4 locks held by kworker/u16:2/65: #0: ffff88811a55b138 ((wq_completion)mlx5e#2){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340 #1: ffff888101de7db8 ((work_completion)(&priv->tx_timeout_work)){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340 #2: ffffffff84ce8328 (rtnl_mutex){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x53/0x280 [mlx5_core] #3: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core]
stack backtrace: CPU: 1 PID: 65 Comm: kworker/u16:2 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x57/0x7d check_noncircular+0x278/0x300 ? print_circular_bug+0x460/0x460 ? find_held_lock+0x2d/0x110 ? __stack_depot_save+0x24c/0x520 ? alloc_chain_hlocks+0x228/0x700 __lock_acquire+0x2c8a/0x6200 ? register_lock_class+0x1860/0x1860 ? kasan_save_stack+0x1e/0x40 ? kasan_set_free_info+0x20/0x30 ? ____kasan_slab_free+0x11d/0x1b0 ? kfree+0x1ba/0x520 ? devlink_health_do_dump.part.0+0x171/0x3a0 ? devlink_health_report+0x3d5/0x7e0 lock_acquire+0x1c1/0x550 ? devlink_health_report+0x2f1/0x7e0 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? find_held_lock+0x2d/0x110 __mutex_lock+0x12c/0x14b0 ? devlink_health_report+0x2f1/0x7e0 ? devlink_health_report+0x2f1/0x7e0 ? mutex_lock_io_nested+0x1320/0x1320 ? trace_hardirqs_on+0x2d/0x100 ? bit_wait_io_timeout+0x170/0x170 ? devlink_health_do_dump.part.0+0x171/0x3a0 ? kfree+0x1ba/0x520 ? devlink_health_do_dump.part.0+0x171/0x3a0 devlink_health_report+0x2f1/0x7e0 mlx5e_health_report+0xc9/0xd7 [mlx5_core] mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? mlx5e_reporter_tx_err_cqe+0x1b0/0x1b0 [mlx5_core] ? mlx5e_tx_reporter_timeout_dump+0x70/0x70 [mlx5_core] ? mlx5e_tx_reporter_dump_sq+0x320/0x320 [mlx5_core] ? mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] ? mutex_lock_io_nested+0x1320/0x1320 ? process_one_work+0x70f/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? lock_downgrade+0x6e0/0x6e0 mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core] process_one_work+0x7c2/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? pwq_dec_nr_in_flight+0x230/0x230 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x59d/0xec0 ? process_one_work+0x1340/0x1340 kthread+0x28f/0x330 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK>
Fixes: c90005b5f75c ("devlink: Hold the instance lock in health callbacks") Signed-off-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 + .../net/ethernet/mellanox/mlx5/core/en_main.c | 27 ++++++++++++++++--- 2 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 86f2690c5e01..20a6bc1a234f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -818,6 +818,7 @@ enum { MLX5E_STATE_DESTROYING, MLX5E_STATE_XDP_TX_ENABLED, MLX5E_STATE_XDP_ACTIVE, + MLX5E_STATE_CHANNELS_ACTIVE, };
struct mlx5e_modify_sq_param { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index acb40770cf0c..c3961c2bbc57 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2668,6 +2668,7 @@ void mlx5e_close_channels(struct mlx5e_channels *chs) { int i;
+ ASSERT_RTNL(); if (chs->ptp) { mlx5e_ptp_close(chs->ptp); chs->ptp = NULL; @@ -2945,17 +2946,29 @@ void mlx5e_activate_priv_channels(struct mlx5e_priv *priv) if (mlx5e_is_vport_rep(priv)) mlx5e_rep_activate_channels(priv);
+ set_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state); + mlx5e_wait_channels_min_rx_wqes(&priv->channels);
if (priv->rx_res) mlx5e_rx_res_channels_activate(priv->rx_res, &priv->channels); }
+static void mlx5e_cancel_tx_timeout_work(struct mlx5e_priv *priv) +{ + WARN_ON_ONCE(test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state)); + if (current_work() != &priv->tx_timeout_work) + cancel_work_sync(&priv->tx_timeout_work); +} + void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv) { if (priv->rx_res) mlx5e_rx_res_channels_deactivate(priv->rx_res);
+ clear_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state); + mlx5e_cancel_tx_timeout_work(priv); + if (mlx5e_is_vport_rep(priv)) mlx5e_rep_deactivate_channels(priv);
@@ -4734,8 +4747,17 @@ static void mlx5e_tx_timeout_work(struct work_struct *work) struct net_device *netdev = priv->netdev; int i;
- rtnl_lock(); - mutex_lock(&priv->state_lock); + /* Take rtnl_lock to ensure no change in netdev->real_num_tx_queues + * through this flow. However, channel closing flows have to wait for + * this work to finish while holding rtnl lock too. So either get the + * lock or find that channels are being closed for other reason and + * this work is not relevant anymore. + */ + while (!rtnl_trylock()) { + if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state)) + return; + msleep(20); + }
if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) goto unlock; @@ -4754,7 +4776,6 @@ static void mlx5e_tx_timeout_work(struct work_struct *work) }
unlock: - mutex_unlock(&priv->state_lock); rtnl_unlock(); }
From: Chris Mi cmi@nvidia.com
stable inclusion from stable-v6.6.8 commit b766f8b8d4d13d358ebc4d169ededd50d1468a74 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit ccbe33003b109f14c4dde2a4fca9c2a50c423601 ]
If post action is not supported, eg. ignore_flow_level is not supported, don't offload post action rule. Otherwise, will hit panic [1].
Fix it by checking if post action table is valid or not.
[1] [445537.863880] BUG: unable to handle page fault for address: ffffffffffffffb1 [445537.864617] #PF: supervisor read access in kernel mode [445537.865244] #PF: error_code(0x0000) - not-present page [445537.865860] PGD 70683a067 P4D 70683a067 PUD 70683c067 PMD 0 [445537.866497] Oops: 0000 [#1] PREEMPT SMP NOPTI [445537.867077] CPU: 19 PID: 248742 Comm: tc Kdump: loaded Tainted: G O 6.5.0+ #1 [445537.867888] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [445537.868834] RIP: 0010:mlx5e_tc_post_act_add+0x51/0x130 [mlx5_core] [445537.869635] Code: c0 0d 00 00 e8 20 96 c6 d3 48 85 c0 0f 84 e5 00 00 00 c7 83 b0 01 00 00 00 00 00 00 49 89 c5 31 c0 31 d2 66 89 83 b4 01 00 00 <49> 8b 44 24 10 83 23 df 83 8b d8 01 00 00 04 48 89 83 c0 01 00 00 [445537.871318] RSP: 0018:ffffb98741cef428 EFLAGS: 00010246 [445537.871962] RAX: 0000000000000000 RBX: ffff8df341167000 RCX: 0000000000000001 [445537.872704] RDX: 0000000000000000 RSI: ffffffff954844e1 RDI: ffffffff9546e9cb [445537.873430] RBP: ffffb98741cef448 R08: 0000000000000020 R09: 0000000000000246 [445537.874160] R10: 0000000000000000 R11: ffffffff943f73ff R12: ffffffffffffffa1 [445537.874893] R13: ffff8df36d336c20 R14: ffffffffffffffa1 R15: ffff8df341167000 [445537.875628] FS: 00007fcd6564f800(0000) GS:ffff8dfa9ea00000(0000) knlGS:0000000000000000 [445537.876425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [445537.877090] CR2: ffffffffffffffb1 CR3: 00000003b5884001 CR4: 0000000000770ee0 [445537.877832] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [445537.878564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [445537.879300] PKRU: 55555554 [445537.879797] Call Trace: [445537.880263] <TASK> [445537.880713] ? show_regs+0x6e/0x80 [445537.881232] ? __die+0x29/0x70 [445537.881731] ? page_fault_oops+0x85/0x160 [445537.882276] ? search_exception_tables+0x65/0x70 [445537.882852] ? kernelmode_fixup_or_oops+0xa2/0x120 [445537.883432] ? __bad_area_nosemaphore+0x18b/0x250 [445537.884019] ? bad_area_nosemaphore+0x16/0x20 [445537.884566] ? do_kern_addr_fault+0x8b/0xa0 [445537.885105] ? exc_page_fault+0xf5/0x1c0 [445537.885623] ? asm_exc_page_fault+0x2b/0x30 [445537.886149] ? __kmem_cache_alloc_node+0x1df/0x2a0 [445537.886717] ? mlx5e_tc_post_act_add+0x51/0x130 [mlx5_core] [445537.887431] ? mlx5e_tc_post_act_add+0x30/0x130 [mlx5_core] [445537.888172] alloc_flow_post_acts+0xfb/0x1c0 [mlx5_core] [445537.888849] parse_tc_actions+0x582/0x5c0 [mlx5_core] [445537.889505] parse_tc_fdb_actions+0xd7/0x1f0 [mlx5_core] [445537.890175] __mlx5e_add_fdb_flow+0x1ab/0x2b0 [mlx5_core] [445537.890843] mlx5e_add_fdb_flow+0x56/0x120 [mlx5_core] [445537.891491] ? debug_smp_processor_id+0x1b/0x30 [445537.892037] mlx5e_tc_add_flow+0x79/0x90 [mlx5_core] [445537.892676] mlx5e_configure_flower+0x305/0x450 [mlx5_core] [445537.893341] mlx5e_rep_setup_tc_cls_flower+0x3d/0x80 [mlx5_core] [445537.894037] mlx5e_rep_setup_tc_cb+0x5c/0xa0 [mlx5_core] [445537.894693] tc_setup_cb_add+0xdc/0x220 [445537.895177] fl_hw_replace_filter+0x15f/0x220 [cls_flower] [445537.895767] fl_change+0xe87/0x1190 [cls_flower] [445537.896302] tc_new_tfilter+0x484/0xa50
Fixes: f0da4daa3413 ("net/mlx5e: Refactor ct to use post action infrastructure") Signed-off-by: Chris Mi cmi@nvidia.com Reviewed-by: Jianbo Liu jianbol@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Reviewed-by: Automatic Verification verifier@nvidia.com Reviewed-by: Maher Sanalla msanalla@nvidia.com Reviewed-by: Shay Drory shayd@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Shachar Kagan skagan@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- .../mellanox/mlx5/core/en/tc/post_act.c | 6 +++++ .../net/ethernet/mellanox/mlx5/core/en_tc.c | 25 ++++++++++++++++--- 2 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c index 4e923a2874ae..86bf007fd05b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c @@ -83,6 +83,9 @@ mlx5e_tc_post_act_offload(struct mlx5e_post_act *post_act, struct mlx5_flow_spec *spec; int err;
+ if (IS_ERR(post_act)) + return PTR_ERR(post_act); + spec = kvzalloc(sizeof(*spec), GFP_KERNEL); if (!spec) return -ENOMEM; @@ -111,6 +114,9 @@ mlx5e_tc_post_act_add(struct mlx5e_post_act *post_act, struct mlx5_flow_attr *po struct mlx5e_post_act_handle *handle; int err;
+ if (IS_ERR(post_act)) + return ERR_CAST(post_act); + handle = kzalloc(sizeof(*handle), GFP_KERNEL); if (!handle) return ERR_PTR(-ENOMEM); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index b62fd3749341..1bead98f73bf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -444,6 +444,9 @@ mlx5e_tc_add_flow_meter(struct mlx5e_priv *priv, struct mlx5e_flow_meter_handle *meter; enum mlx5e_post_meter_type type;
+ if (IS_ERR(post_act)) + return PTR_ERR(post_act); + meter = mlx5e_tc_meter_replace(priv->mdev, &attr->meter_attr.params); if (IS_ERR(meter)) { mlx5_core_err(priv->mdev, "Failed to get flow meter\n"); @@ -3736,6 +3739,20 @@ alloc_flow_post_acts(struct mlx5e_tc_flow *flow, struct netlink_ext_ack *extack) return err; }
+static int +set_branch_dest_ft(struct mlx5e_priv *priv, struct mlx5_flow_attr *attr) +{ + struct mlx5e_post_act *post_act = get_post_action(priv); + + if (IS_ERR(post_act)) + return PTR_ERR(post_act); + + attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; + attr->dest_ft = mlx5e_tc_post_act_get_ft(post_act); + + return 0; +} + static int alloc_branch_attr(struct mlx5e_tc_flow *flow, struct mlx5e_tc_act_branch_ctrl *cond, @@ -3759,8 +3776,8 @@ alloc_branch_attr(struct mlx5e_tc_flow *flow, break; case FLOW_ACTION_ACCEPT: case FLOW_ACTION_PIPE: - attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; - attr->dest_ft = mlx5e_tc_post_act_get_ft(get_post_action(flow->priv)); + if (set_branch_dest_ft(flow->priv, attr)) + goto out_err; break; case FLOW_ACTION_JUMP: if (*jump_count) { @@ -3769,8 +3786,8 @@ alloc_branch_attr(struct mlx5e_tc_flow *flow, goto out_err; } *jump_count = cond->extval; - attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; - attr->dest_ft = mlx5e_tc_post_act_get_ft(get_post_action(flow->priv)); + if (set_branch_dest_ft(flow->priv, attr)) + goto out_err; break; default: err = -EOPNOTSUPP;
From: Moshe Shemesh moshe@nvidia.com
stable inclusion from stable-v6.6.8 commit a4839771d7b9d74b98b8209967ffdbfb28ae9104 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 3d7a3f2612d75de5f371a681038b089ded6667eb ]
Current sync reset flow is not supported when PCIe bridge connected directly to mlx5 device has HotPlug interrupt enabled and can be triggered on link state change event. Return nack on reset request in such case.
Fixes: 92501fa6e421 ("net/mlx5: Ack on sync_reset_request only if PF can do reset_now") Signed-off-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Shay Drory shayd@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- .../ethernet/mellanox/mlx5/core/fw_reset.c | 29 +++++++++++++++++++ 1 file changed, 29 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c index b568988e92e3..c4e19d627da2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c @@ -325,6 +325,29 @@ static void mlx5_fw_live_patch_event(struct work_struct *work) mlx5_core_err(dev, "Failed to reload FW tracer\n"); }
+#if IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) +static int mlx5_check_hotplug_interrupt(struct mlx5_core_dev *dev) +{ + struct pci_dev *bridge = dev->pdev->bus->self; + u16 reg16; + int err; + + if (!bridge) + return -EOPNOTSUPP; + + err = pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, ®16); + if (err) + return err; + + if ((reg16 & PCI_EXP_SLTCTL_HPIE) && (reg16 & PCI_EXP_SLTCTL_DLLSCE)) { + mlx5_core_warn(dev, "FW reset is not supported as HotPlug is enabled\n"); + return -EOPNOTSUPP; + } + + return 0; +} +#endif + static int mlx5_check_dev_ids(struct mlx5_core_dev *dev, u16 dev_id) { struct pci_bus *bridge_bus = dev->pdev->bus; @@ -357,6 +380,12 @@ static bool mlx5_is_reset_now_capable(struct mlx5_core_dev *dev) return false; }
+#if IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) + err = mlx5_check_hotplug_interrupt(dev); + if (err) + return false; +#endif + err = pci_read_config_word(dev->pdev, PCI_DEVICE_ID, &dev_id); if (err) return false;
From: Gavin Li gavinl@nvidia.com
stable inclusion from stable-v6.6.8 commit ef3b2d5f21526a47067ac0ab885fd1c096fcec82 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 7aaf975238c47b710fcc4eca0da1e7902a53abe2 ]
Previously, when comparing the net namespaces, the case where the netdev doesn't exist wasn't taken into account, and therefore can cause a crash. In such a case, the comparing function should return false, as there is no netdev->net to compare the devlink->net to.
Furthermore, this will result in an attempt to enter switchdev mode without a netdev to fail, and which is the desired result as there is no meaning in switchdev mode without a net device.
Fixes: 662404b24a4c ("net/mlx5e: Block entering switchdev mode with ns inconsistency") Signed-off-by: Gavin Li gavinl@nvidia.com Reviewed-by: Gavi Teitz gavi@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- .../mellanox/mlx5/core/eswitch_offloads.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index bf78eeca401b..bb8bcb448ae9 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -3653,14 +3653,18 @@ static int esw_inline_mode_to_devlink(u8 mlx5_mode, u8 *mode)
static bool esw_offloads_devlink_ns_eq_netdev_ns(struct devlink *devlink) { + struct mlx5_core_dev *dev = devlink_priv(devlink); struct net *devl_net, *netdev_net; - struct mlx5_eswitch *esw; - - esw = mlx5_devlink_eswitch_nocheck_get(devlink); - netdev_net = dev_net(esw->dev->mlx5e_res.uplink_netdev); - devl_net = devlink_net(devlink); + bool ret = false;
- return net_eq(devl_net, netdev_net); + mutex_lock(&dev->mlx5e_res.uplink_netdev_lock); + if (dev->mlx5e_res.uplink_netdev) { + netdev_net = dev_net(dev->mlx5e_res.uplink_netdev); + devl_net = devlink_net(devlink); + ret = net_eq(devl_net, netdev_net); + } + mutex_unlock(&dev->mlx5e_res.uplink_netdev_lock); + return ret; }
int mlx5_eswitch_block_mode(struct mlx5_core_dev *dev)
From: Dan Carpenter dan.carpenter@linaro.org
stable inclusion from stable-v6.6.8 commit 6cb39c79bca9ed05ba9b33083a6f09211d79f05e bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit ca4ef28d0ad831d2521fa2b16952f37fd9324ca3 ]
The mlx5_esw_offloads_devlink_port() function returns error pointers, not NULL.
Fixes: 7bef147a6ab6 ("net/mlx5: Don't skip vport check") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Wojciech Drewek wojciech.drewek@intel.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 825f9c687633..007cb167cabc 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -1503,7 +1503,7 @@ mlx5e_vport_vf_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep)
dl_port = mlx5_esw_offloads_devlink_port(dev->priv.eswitch, rpriv->rep->vport); - if (dl_port) { + if (!IS_ERR(dl_port)) { SET_NETDEV_DEVLINK_PORT(netdev, dl_port); mlx5e_rep_vnic_reporter_create(priv, dl_port); }
From: Maciej Żenczykowski maze@google.com
stable inclusion from stable-v6.6.8 commit 92d813f73f649a32cb936583030c2edcc309eb37 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit bd4a816752bab609dd6d65ae021387beb9e2ddbd ]
Lorenzo points out that we effectively clear all unknown flags from PIO when copying them to userspace in the netlink RTM_NEWPREFIX notification.
We could fix this one at a time as new flags are defined, or in one fell swoop - I choose the latter.
We could either define 6 new reserved flags (reserved1..6) and handle them individually (and rename them as new flags are defined), or we could simply copy the entire unmodified byte over - I choose the latter.
This unfortunately requires some anonymous union/struct magic, so we add a static assert on the struct size for a little extra safety.
Cc: David Ahern dsahern@kernel.org Cc: Lorenzo Colitti lorenzo@google.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Maciej Żenczykowski maze@google.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- include/net/addrconf.h | 12 ++++++++++-- include/net/if_inet6.h | 4 ---- net/ipv6/addrconf.c | 6 +----- 3 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/include/net/addrconf.h b/include/net/addrconf.h index 82da55101b5a..61ebe723ee4d 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -31,17 +31,22 @@ struct prefix_info { __u8 length; __u8 prefix_len;
+ union __packed { + __u8 flags; + struct __packed { #if defined(__BIG_ENDIAN_BITFIELD) - __u8 onlink : 1, + __u8 onlink : 1, autoconf : 1, reserved : 6; #elif defined(__LITTLE_ENDIAN_BITFIELD) - __u8 reserved : 6, + __u8 reserved : 6, autoconf : 1, onlink : 1; #else #error "Please fix <asm/byteorder.h>" #endif + }; + }; __be32 valid; __be32 prefered; __be32 reserved2; @@ -49,6 +54,9 @@ struct prefix_info { struct in6_addr prefix; };
+/* rfc4861 4.6.2: IPv6 PIO is 32 bytes in size */ +static_assert(sizeof(struct prefix_info) == 32); + #include <linux/ipv6.h> #include <linux/netdevice.h> #include <net/if_inet6.h> diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h index 1199310b847b..c4a5c3d3ab65 100644 --- a/include/net/if_inet6.h +++ b/include/net/if_inet6.h @@ -22,10 +22,6 @@ #define IF_RS_SENT 0x10 #define IF_READY 0x80000000
-/* prefix flags */ -#define IF_PREFIX_ONLINK 0x01 -#define IF_PREFIX_AUTOCONF 0x02 - enum { INET6_IFADDR_STATE_PREDAD, INET6_IFADDR_STATE_DAD, diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 0b6ee962c84e..b007d098ffe2 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -6137,11 +6137,7 @@ static int inet6_fill_prefix(struct sk_buff *skb, struct inet6_dev *idev, pmsg->prefix_len = pinfo->prefix_len; pmsg->prefix_type = pinfo->type; pmsg->prefix_pad3 = 0; - pmsg->prefix_flags = 0; - if (pinfo->onlink) - pmsg->prefix_flags |= IF_PREFIX_ONLINK; - if (pinfo->autoconf) - pmsg->prefix_flags |= IF_PREFIX_AUTOCONF; + pmsg->prefix_flags = pinfo->flags;
if (nla_put(skb, PREFIX_ADDRESS, sizeof(pinfo->prefix), &pinfo->prefix)) goto nla_put_failure;
From: Stefan Wahren wahrenst@gmx.net
stable inclusion from stable-v6.6.8 commit 21b9dc814d3fe447b8f1fef67efeb7a3bc0f1502 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit f4e6064c97c050bd9904925ff7d53d0c9954fc7b ]
The qca_spi driver stop and restart the SPI kernel thread (via ndo_stop & ndo_open) in case of TX ring changes. This is a big issue because it allows userspace to prevent restart of the SPI kernel thread (via signals). A subsequent change of TX ring wrongly assume a valid spi_thread pointer which result in a crash.
So prevent this by stopping the network traffic handling and temporary park the SPI thread.
Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren wahrenst@gmx.net Link: https://lore.kernel.org/r/20231206141222.52029-2-wahrenst@gmx.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/qualcomm/qca_debug.c | 9 ++++----- drivers/net/ethernet/qualcomm/qca_spi.c | 12 ++++++++++++ 2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/qualcomm/qca_debug.c b/drivers/net/ethernet/qualcomm/qca_debug.c index 6f2fa2a42770..a5445252b0c4 100644 --- a/drivers/net/ethernet/qualcomm/qca_debug.c +++ b/drivers/net/ethernet/qualcomm/qca_debug.c @@ -263,7 +263,6 @@ qcaspi_set_ringparam(struct net_device *dev, struct ethtool_ringparam *ring, struct kernel_ethtool_ringparam *kernel_ring, struct netlink_ext_ack *extack) { - const struct net_device_ops *ops = dev->netdev_ops; struct qcaspi *qca = netdev_priv(dev);
if ((ring->rx_pending) || @@ -271,14 +270,14 @@ qcaspi_set_ringparam(struct net_device *dev, struct ethtool_ringparam *ring, (ring->rx_jumbo_pending)) return -EINVAL;
- if (netif_running(dev)) - ops->ndo_stop(dev); + if (qca->spi_thread) + kthread_park(qca->spi_thread);
qca->txr.count = max_t(u32, ring->tx_pending, TX_RING_MIN_LEN); qca->txr.count = min_t(u16, qca->txr.count, TX_RING_MAX_LEN);
- if (netif_running(dev)) - ops->ndo_open(dev); + if (qca->spi_thread) + kthread_unpark(qca->spi_thread);
return 0; } diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index bec723028e96..b0fad69bb755 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -580,6 +580,18 @@ qcaspi_spi_thread(void *data) netdev_info(qca->net_dev, "SPI thread created\n"); while (!kthread_should_stop()) { set_current_state(TASK_INTERRUPTIBLE); + if (kthread_should_park()) { + netif_tx_disable(qca->net_dev); + netif_carrier_off(qca->net_dev); + qcaspi_flush_tx_ring(qca); + kthread_parkme(); + if (qca->sync == QCASPI_SYNC_READY) { + netif_carrier_on(qca->net_dev); + netif_wake_queue(qca->net_dev); + } + continue; + } + if ((qca->intr_req == qca->intr_svc) && !qca->txr.skb[qca->txr.head]) schedule();
From: Stefan Wahren wahrenst@gmx.net
stable inclusion from stable-v6.6.8 commit 02296b1d8449980b81cab590717ca10743d93527 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 96a7e861d9e04d07febd3011c30cd84cd141d81f ]
After calling ethtool -g it was not possible to adjust the TX ring size again:
# ethtool -g eth1 Ring parameters for eth1: Pre-set maximums: RX: 4 RX Mini: n/a RX Jumbo: n/a TX: 10 Current hardware settings: RX: 4 RX Mini: n/a RX Jumbo: n/a TX: 10 # ethtool -G eth1 tx 8 netlink error: Invalid argument
The reason for this is that the readonly setting rx_pending get initialized and after that the range check in qcaspi_set_ringparam() fails regardless of the provided parameter. So fix this by accepting the exposed RX defaults. Instead of adding another magic number better use a new define here.
Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Suggested-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Stefan Wahren wahrenst@gmx.net Link: https://lore.kernel.org/r/20231206141222.52029-3-wahrenst@gmx.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/qualcomm/qca_debug.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/qualcomm/qca_debug.c b/drivers/net/ethernet/qualcomm/qca_debug.c index a5445252b0c4..1822f2ad8f0d 100644 --- a/drivers/net/ethernet/qualcomm/qca_debug.c +++ b/drivers/net/ethernet/qualcomm/qca_debug.c @@ -30,6 +30,8 @@
#define QCASPI_MAX_REGS 0x20
+#define QCASPI_RX_MAX_FRAMES 4 + static const u16 qcaspi_spi_regs[] = { SPI_REG_BFR_SIZE, SPI_REG_WRBUF_SPC_AVA, @@ -252,9 +254,9 @@ qcaspi_get_ringparam(struct net_device *dev, struct ethtool_ringparam *ring, { struct qcaspi *qca = netdev_priv(dev);
- ring->rx_max_pending = 4; + ring->rx_max_pending = QCASPI_RX_MAX_FRAMES; ring->tx_max_pending = TX_RING_MAX_LEN; - ring->rx_pending = 4; + ring->rx_pending = QCASPI_RX_MAX_FRAMES; ring->tx_pending = qca->txr.count; }
@@ -265,7 +267,7 @@ qcaspi_set_ringparam(struct net_device *dev, struct ethtool_ringparam *ring, { struct qcaspi *qca = netdev_priv(dev);
- if ((ring->rx_pending) || + if (ring->rx_pending != QCASPI_RX_MAX_FRAMES || (ring->rx_mini_pending) || (ring->rx_jumbo_pending)) return -EINVAL;
From: Stefan Wahren wahrenst@gmx.net
stable inclusion from stable-v6.6.8 commit f7dac967e17081a84b718990a85e48bc74fb9412 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 1057812d146dd658c9a9a96d869c2551150207b5 ]
In case of a reset triggered by the QCA7000 itself, the behavior of the qca_spi driver was not quite correct: - in case of a pending RX frame decoding the drop counter must be incremented and decoding state machine reseted - also the reset counter must always be incremented regardless of sync state
Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren wahrenst@gmx.net Link: https://lore.kernel.org/r/20231206141222.52029-4-wahrenst@gmx.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/qualcomm/qca_spi.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index b0fad69bb755..5f3c11fb3fa2 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -620,11 +620,17 @@ qcaspi_spi_thread(void *data) if (intr_cause & SPI_INT_CPU_ON) { qcaspi_qca7k_sync(qca, QCASPI_EVENT_CPUON);
+ /* Frame decoding in progress */ + if (qca->frm_handle.state != qca->frm_handle.init) + qca->net_dev->stats.rx_dropped++; + + qcafrm_fsm_init_spi(&qca->frm_handle); + qca->stats.device_reset++; + /* not synced. */ if (qca->sync != QCASPI_SYNC_READY) continue;
- qca->stats.device_reset++; netif_wake_queue(qca->net_dev); netif_carrier_on(qca->net_dev); }
From: Somnath Kotur somnath.kotur@broadcom.com
stable inclusion from stable-v6.6.8 commit bf9ceb1633621d15e83723ec87c938e4b73b301f bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 9ef7c58f5abe41e6d91f37f28fe2d851ffedd92a ]
We are issuing HWRM_FUNC_RESET cmd to reset the device including all reserved resources, but not clearing the reservations within the driver struct. As a result, when the driver re-initializes as part of resume, it believes that there is no need to do any resource reservation and goes ahead and tries to allocate rings which will eventually fail beyond a certain number pre-reserved by the firmware.
Fixes: 674f50a5b026 ("bnxt_en: Implement new method to reserve rings.") Reviewed-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Reviewed-by: Ajit Khaparde ajit.khaparde@broadcom.com Reviewed-by: Andy Gospodarek andrew.gospodarek@broadcom.com Signed-off-by: Somnath Kotur somnath.kotur@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Link: https://lore.kernel.org/r/20231208001658.14230-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 7551aa8068f8..4d2296f201ad 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -13897,6 +13897,8 @@ static int bnxt_resume(struct device *device) if (rc) goto resume_exit;
+ bnxt_clear_reservations(bp, true); + if (bnxt_hwrm_func_drv_rgtr(bp, NULL, 0, false)) { rc = -ENODEV; goto resume_exit;
From: Sreekanth Reddy sreekanth.reddy@broadcom.com
stable inclusion from stable-v6.6.8 commit d8ea6b0d549bb1d7522d17124a66d6e52e8c4d9e bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit aded5d1feb08e48d544845d3594d70c4d5fe6e54 ]
Receive SKBs can go through the VF-rep path or the normal path. skb_mark_for_recycle() is only called for the normal path. Fix it to do it for both paths to fix possible stalled page pool shutdown errors.
Fixes: 86b05508f775 ("bnxt_en: Use the unified RX page pool buffers for XDP and non-XDP") Reviewed-by: Somnath Kotur somnath.kotur@broadcom.com Reviewed-by: Andy Gospodarek andrew.gospodarek@broadcom.com Reviewed-by: Vikas Gupta vikas.gupta@broadcom.com Signed-off-by: Sreekanth Reddy sreekanth.reddy@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Link: https://lore.kernel.org/r/20231208001658.14230-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 4d2296f201ad..9f52b943fede 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1749,13 +1749,14 @@ static void bnxt_tpa_agg(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, static void bnxt_deliver_skb(struct bnxt *bp, struct bnxt_napi *bnapi, struct sk_buff *skb) { + skb_mark_for_recycle(skb); + if (skb->dev != bp->dev) { /* this packet belongs to a vf-rep */ bnxt_vf_rep_rx(bp, skb); return; } skb_record_rx_queue(skb, bnapi->index); - skb_mark_for_recycle(skb); napi_gro_receive(&bnapi->napi, skb); }
From: Kalesh AP kalesh-anakkur.purayil@broadcom.com
stable inclusion from stable-v6.6.8 commit 909f5a48bf23b0b2e5e45fc68f8b4e136c5153ac bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit bd6781c18cb5b5e5d8c5873fa9a51668e89ec76e ]
The wait_event_interruptible_timeout() function returns 0 if the timeout elapsed, -ERESTARTSYS if it was interrupted by a signal, and the remaining jiffies otherwise if the condition evaluated to true before the timeout elapsed.
Driver should have checked for zero return value instead of a positive value.
MChan: Print a warning for -ERESTARTSYS. The close operation will proceed anyway when wait_event_interruptible_timeout() returns for any reason. Since we do the close no matter what, we should not return this error code to the caller. Change bnxt_close_nic() to a void function and remove all error handling from some of the callers.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Reviewed-by: Andy Gospodarek andrew.gospodarek@broadcom.com Reviewed-by: Vikas Gupta vikas.gupta@broadcom.com Reviewed-by: Somnath Kotur somnath.kotur@broadcom.com Signed-off-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Link: https://lore.kernel.org/r/20231208001658.14230-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 13 +++++++------ drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 +- .../net/ethernet/broadcom/bnxt/bnxt_devlink.c | 11 ++--------- .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 19 ++++--------------- drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c | 5 ++--- 5 files changed, 16 insertions(+), 34 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 9f52b943fede..4ce34a39bb5e 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -10704,10 +10704,8 @@ static void __bnxt_close_nic(struct bnxt *bp, bool irq_re_init, bnxt_free_mem(bp, irq_re_init); }
-int bnxt_close_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init) +void bnxt_close_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init) { - int rc = 0; - if (test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) { /* If we get here, it means firmware reset is in progress * while we are trying to close. We can safely proceed with @@ -10722,15 +10720,18 @@ int bnxt_close_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init)
#ifdef CONFIG_BNXT_SRIOV if (bp->sriov_cfg) { + int rc; + rc = wait_event_interruptible_timeout(bp->sriov_cfg_wait, !bp->sriov_cfg, BNXT_SRIOV_CFG_WAIT_TMO); - if (rc) - netdev_warn(bp->dev, "timeout waiting for SRIOV config operation to complete!\n"); + if (!rc) + netdev_warn(bp->dev, "timeout waiting for SRIOV config operation to complete, proceeding to close!\n"); + else if (rc < 0) + netdev_warn(bp->dev, "SRIOV config operation interrupted, proceeding to close!\n"); } #endif __bnxt_close_nic(bp, irq_re_init, link_re_init); - return rc; }
static int bnxt_close(struct net_device *dev) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 84cbcfa61bc1..ea0f47eceea7 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -2362,7 +2362,7 @@ int bnxt_open_nic(struct bnxt *, bool, bool); int bnxt_half_open_nic(struct bnxt *bp); void bnxt_half_close_nic(struct bnxt *bp); void bnxt_reenable_sriov(struct bnxt *bp); -int bnxt_close_nic(struct bnxt *, bool, bool); +void bnxt_close_nic(struct bnxt *, bool, bool); void bnxt_get_ring_err_stats(struct bnxt *bp, struct bnxt_total_ring_err_stats *stats); int bnxt_dbg_hwrm_rd_reg(struct bnxt *bp, u32 reg_off, u16 num_words, diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c index 8b3e7697390f..9d39f194b260 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c @@ -478,15 +478,8 @@ static int bnxt_dl_reload_down(struct devlink *dl, bool netns_change, return -ENODEV; } bnxt_ulp_stop(bp); - if (netif_running(bp->dev)) { - rc = bnxt_close_nic(bp, true, true); - if (rc) { - NL_SET_ERR_MSG_MOD(extack, "Failed to close"); - dev_close(bp->dev); - rtnl_unlock(); - break; - } - } + if (netif_running(bp->dev)) + bnxt_close_nic(bp, true, true); bnxt_vf_reps_free(bp); rc = bnxt_hwrm_func_drv_unrgtr(bp); if (rc) { diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index 547247d98eba..3c36dd805148 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -164,9 +164,8 @@ static int bnxt_set_coalesce(struct net_device *dev, reset_coalesce: if (test_bit(BNXT_STATE_OPEN, &bp->state)) { if (update_stats) { - rc = bnxt_close_nic(bp, true, false); - if (!rc) - rc = bnxt_open_nic(bp, true, false); + bnxt_close_nic(bp, true, false); + rc = bnxt_open_nic(bp, true, false); } else { rc = bnxt_hwrm_set_coal(bp); } @@ -955,12 +954,7 @@ static int bnxt_set_channels(struct net_device *dev, * before PF unload */ } - rc = bnxt_close_nic(bp, true, false); - if (rc) { - netdev_err(bp->dev, "Set channel failure rc :%x\n", - rc); - return rc; - } + bnxt_close_nic(bp, true, false); }
if (sh) { @@ -3737,12 +3731,7 @@ static void bnxt_self_test(struct net_device *dev, struct ethtool_test *etest, bnxt_run_fw_tests(bp, test_mask, &test_results); } else { bnxt_ulp_stop(bp); - rc = bnxt_close_nic(bp, true, false); - if (rc) { - etest->flags |= ETH_TEST_FL_FAILED; - bnxt_ulp_start(bp, rc); - return; - } + bnxt_close_nic(bp, true, false); bnxt_run_fw_tests(bp, test_mask, &test_results);
buf[BNXT_MACLPBK_TEST_IDX] = 1; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c index f3886710e778..6e3da3362bd6 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c @@ -521,9 +521,8 @@ static int bnxt_hwrm_ptp_cfg(struct bnxt *bp)
if (netif_running(bp->dev)) { if (ptp->rx_filter == HWTSTAMP_FILTER_ALL) { - rc = bnxt_close_nic(bp, false, false); - if (!rc) - rc = bnxt_open_nic(bp, false, false); + bnxt_close_nic(bp, false, false); + rc = bnxt_open_nic(bp, false, false); } else { bnxt_ptp_cfg_tstamp_filters(bp); }
From: Michael Chan michael.chan@broadcom.com
stable inclusion from stable-v6.6.8 commit 9542105eb4ffe5fb498737dd93d57532b3f5ab58 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit c13e268c0768659cdaae4bfe2fb24860bcc8ddb4 ]
When the chip is configured to timestamp all receive packets, the timestamp in the RX completion is only valid if the metadata present flag is not set for packets received on the wire. In addition, internal loopback packets will never have a valid timestamp and the timestamp field will always be zero. We must exclude any 0 value in the timestamp field because there is no way to determine if it is a loopback packet or not.
Add a new function bnxt_rx_ts_valid() to check for all timestamp valid conditions.
Fixes: 66ed81dcedc6 ("bnxt_en: Enable packet timestamping for all RX packets") Reviewed-by: Andy Gospodarek andrew.gospodarek@broadcom.com Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Link: https://lore.kernel.org/r/20231208001658.14230-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 20 +++++++++++++++++--- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 8 +++++++- 2 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 4ce34a39bb5e..f811d59fd71f 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1760,6 +1760,21 @@ static void bnxt_deliver_skb(struct bnxt *bp, struct bnxt_napi *bnapi, napi_gro_receive(&bnapi->napi, skb); }
+static bool bnxt_rx_ts_valid(struct bnxt *bp, u32 flags, + struct rx_cmp_ext *rxcmp1, u32 *cmpl_ts) +{ + u32 ts = le32_to_cpu(rxcmp1->rx_cmp_timestamp); + + if (BNXT_PTP_RX_TS_VALID(flags)) + goto ts_valid; + if (!bp->ptp_all_rx_tstamp || !ts || !BNXT_ALL_RX_TS_VALID(flags)) + return false; + +ts_valid: + *cmpl_ts = ts; + return true; +} + /* returns the following: * 1 - 1 packet successfully received * 0 - successful TPA_START, packet not completed yet @@ -1785,6 +1800,7 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr, struct sk_buff *skb; struct xdp_buff xdp; u32 flags, misc; + u32 cmpl_ts; void *data; int rc = 0;
@@ -2007,10 +2023,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr, } }
- if (unlikely((flags & RX_CMP_FLAGS_ITYPES_MASK) == - RX_CMP_FLAGS_ITYPE_PTP_W_TS) || bp->ptp_all_rx_tstamp) { + if (bnxt_rx_ts_valid(bp, flags, rxcmp1, &cmpl_ts)) { if (bp->flags & BNXT_FLAG_CHIP_P5) { - u32 cmpl_ts = le32_to_cpu(rxcmp1->rx_cmp_timestamp); u64 ns, ts;
if (!bnxt_get_rx_ts_p5(bp, &ts, cmpl_ts)) { diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index ea0f47eceea7..0116f67593e3 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -161,7 +161,7 @@ struct rx_cmp { #define RX_CMP_FLAGS_ERROR (1 << 6) #define RX_CMP_FLAGS_PLACEMENT (7 << 7) #define RX_CMP_FLAGS_RSS_VALID (1 << 10) - #define RX_CMP_FLAGS_UNUSED (1 << 11) + #define RX_CMP_FLAGS_PKT_METADATA_PRESENT (1 << 11) #define RX_CMP_FLAGS_ITYPES_SHIFT 12 #define RX_CMP_FLAGS_ITYPES_MASK 0xf000 #define RX_CMP_FLAGS_ITYPE_UNKNOWN (0 << 12) @@ -188,6 +188,12 @@ struct rx_cmp { __le32 rx_cmp_rss_hash; };
+#define BNXT_PTP_RX_TS_VALID(flags) \ + (((flags) & RX_CMP_FLAGS_ITYPES_MASK) == RX_CMP_FLAGS_ITYPE_PTP_W_TS) + +#define BNXT_ALL_RX_TS_VALID(flags) \ + !((flags) & RX_CMP_FLAGS_PKT_METADATA_PRESENT) + #define RX_CMP_HASH_VALID(rxcmp) \ ((rxcmp)->rx_cmp_len_flags_type & cpu_to_le32(RX_CMP_FLAGS_RSS_VALID))
From: Chengfeng Ye dg573847474@gmail.com
stable inclusion from stable-v6.6.8 commit 4faf39c4252ae5eb5bdacc92c85e087f240dd3e0 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit d5dba32b8f6cb39be708b726044ba30dbc088b30 ]
As &card->cli_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock.
<deadlock #1> console_show() --> spin_lock(&card->cli_queue_lock) <interrupt> --> solos_bh() --> spin_lock(&card->cli_queue_lock)
This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock.
To prevent the potential deadlock, the patch uses spin_lock_bh() on the card->cli_queue_lock under process context code consistently to prevent the possible deadlock scenario.
Fixes: 9c54004ea717 ("atm: Driver for Solos PCI ADSL2+ card.") Signed-off-by: Chengfeng Ye dg573847474@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/atm/solos-pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c index 94fbc3abe60e..95f768b28a5e 100644 --- a/drivers/atm/solos-pci.c +++ b/drivers/atm/solos-pci.c @@ -449,9 +449,9 @@ static ssize_t console_show(struct device *dev, struct device_attribute *attr, struct sk_buff *skb; unsigned int len;
- spin_lock(&card->cli_queue_lock); + spin_lock_bh(&card->cli_queue_lock); skb = skb_dequeue(&card->cli_queue[SOLOS_CHAN(atmdev)]); - spin_unlock(&card->cli_queue_lock); + spin_unlock_bh(&card->cli_queue_lock); if(skb == NULL) return sprintf(buf, "No data.\n");
From: Chengfeng Ye dg573847474@gmail.com
stable inclusion from stable-v6.6.8 commit 82102501e08e3ce28ee52f8df97d2ac98a3cea51 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 15319a4e8ee4b098118591c6ccbd17237f841613 ]
As &card->tx_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock.
<deadlock #2> pclose() --> spin_lock(&card->tx_queue_lock) <interrupt> --> solos_bh() --> fpga_tx() --> spin_lock(&card->tx_queue_lock)
This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock.
To prevent the potential deadlock, the patch uses spin_lock_bh() on &card->tx_queue_lock under process context code consistently to prevent the possible deadlock scenario.
Fixes: 213e85d38912 ("solos-pci: clean up pclose() function") Signed-off-by: Chengfeng Ye dg573847474@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/atm/solos-pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c index 95f768b28a5e..d3c30a28c410 100644 --- a/drivers/atm/solos-pci.c +++ b/drivers/atm/solos-pci.c @@ -956,14 +956,14 @@ static void pclose(struct atm_vcc *vcc) struct pkt_hdr *header;
/* Remove any yet-to-be-transmitted packets from the pending queue */ - spin_lock(&card->tx_queue_lock); + spin_lock_bh(&card->tx_queue_lock); skb_queue_walk_safe(&card->tx_queue[port], skb, tmpskb) { if (SKB_CB(skb)->vcc == vcc) { skb_unlink(skb, &card->tx_queue[port]); solos_pop(vcc, skb); } } - spin_unlock(&card->tx_queue_lock); + spin_unlock_bh(&card->tx_queue_lock);
skb = alloc_skb(sizeof(*header), GFP_KERNEL); if (!skb) {
From: Radu Bulie radu-andrei.bulie@nxp.com
stable inclusion from stable-v6.6.8 commit 4317fba45ff3eb5a4ce31d7781b6eb3e71abcca4 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 9fc95fe95c3e2a63ced8eeca4b256518ab204b63 ]
The old implementation extracted VLAN TCI info from the payload before the VLAN tag has been pushed in the payload.
Another problem was that the VLAN TCI was extracted even if the packet did not have VLAN protocol header.
This resulted in invalid VLAN TCI and as a consequence a random queue was computed.
This patch fixes the above issues and use the VLAN TCI from the skb if it is present or VLAN TCI from payload if present. If no VLAN header is present queue 0 is selected.
Fixes: 52c4a1a85f4b ("net: fec: add ndo_select_queue to fix TX bandwidth fluctuations") Signed-off-by: Radu Bulie radu-andrei.bulie@nxp.com Signed-off-by: Wei Fang wei.fang@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/freescale/fec_main.c | 27 +++++++++-------------- 1 file changed, 11 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 77c8e9cfb445..35c95f07fd6d 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -3710,31 +3710,26 @@ static int fec_set_features(struct net_device *netdev, return 0; }
-static u16 fec_enet_get_raw_vlan_tci(struct sk_buff *skb) -{ - struct vlan_ethhdr *vhdr; - unsigned short vlan_TCI = 0; - - if (skb->protocol == htons(ETH_P_ALL)) { - vhdr = (struct vlan_ethhdr *)(skb->data); - vlan_TCI = ntohs(vhdr->h_vlan_TCI); - } - - return vlan_TCI; -} - static u16 fec_enet_select_queue(struct net_device *ndev, struct sk_buff *skb, struct net_device *sb_dev) { struct fec_enet_private *fep = netdev_priv(ndev); - u16 vlan_tag; + u16 vlan_tag = 0;
if (!(fep->quirks & FEC_QUIRK_HAS_AVB)) return netdev_pick_tx(ndev, skb, NULL);
- vlan_tag = fec_enet_get_raw_vlan_tci(skb); - if (!vlan_tag) + /* VLAN is present in the payload.*/ + if (eth_type_vlan(skb->protocol)) { + struct vlan_ethhdr *vhdr = skb_vlan_eth_hdr(skb); + + vlan_tag = ntohs(vhdr->h_vlan_TCI); + /* VLAN is present in the skb but not yet pushed in the payload.*/ + } else if (skb_vlan_tag_present(skb)) { + vlan_tag = skb->vlan_tci; + } else { return vlan_tag; + }
return fec_enet_vlan_pri_to_queue[vlan_tag >> 13]; }
From: Zhipeng Lu alexious@zju.edu.cn
stable inclusion from stable-v6.6.8 commit dd75adfdc2865724c156b180c833ea63650c54a7 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 28a7cb045ab700de5554193a1642917602787784 ]
The rvu_dl will be freed in rvu_nix_health_reporters_destroy(rvu_dl) after the create_workqueue fails, and after that free, the rvu_dl will be translate back through the following call chain:
rvu_nix_health_reporters_destroy |-> rvu_nix_health_reporters_create |-> rvu_health_reporters_create |-> rvu_register_dl (label err_dl_health)
Finally. in the err_dl_health label, rvu_dl being freed again in rvu_health_reporters_destroy(rvu) by rvu_nix_health_reporters_destroy. In the second calls of rvu_nix_health_reporters_destroy, however, it uses rvu_dl->rvu_nix_health_reporter, which is already freed at the end of rvu_nix_health_reporters_destroy in the first call.
So this patch prevents the first destroy by instantly returning -ENONMEN when create_workqueue fails. In addition, since the failure of create_workqueue is the only entrence of label err, it has been integrated into the error-handling path of create_workqueue.
Fixes: 5ed66306eab6 ("octeontx2-af: Add devlink health reporters for NIX") Signed-off-by: Zhipeng Lu alexious@zju.edu.cn Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c index 058f75dc4c8a..bffe04e6d025 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c @@ -642,7 +642,7 @@ static int rvu_nix_register_reporters(struct rvu_devlink *rvu_dl)
rvu_dl->devlink_wq = create_workqueue("rvu_devlink_wq"); if (!rvu_dl->devlink_wq) - goto err; + return -ENOMEM;
INIT_WORK(&rvu_reporters->intr_work, rvu_nix_intr_work); INIT_WORK(&rvu_reporters->gen_work, rvu_nix_gen_work); @@ -650,9 +650,6 @@ static int rvu_nix_register_reporters(struct rvu_devlink *rvu_dl) INIT_WORK(&rvu_reporters->ras_work, rvu_nix_ras_work);
return 0; -err: - rvu_nix_health_reporters_destroy(rvu_dl); - return -ENOMEM; }
static int rvu_nix_health_reporters_create(struct rvu_devlink *rvu_dl)
From: Vlad Buslov vladbu@nvidia.com
stable inclusion from stable-v6.6.8 commit 15f300ed1d5e21ac85ab63de504dc246839fdf12 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 125f1c7f26ffcdbf96177abe75b70c1a6ceb17bc ]
The referenced change added custom cleanup code to act_ct to delete any callbacks registered on the parent block when deleting the tcf_ct_flow_table instance. However, the underlying issue is that the drivers don't obtain the reference to the tcf_ct_flow_table instance when registering callbacks which means that not only driver callbacks may still be on the table when deleting it but also that the driver can still have pointers to its internal nf_flowtable and can use it concurrently which results either warning in netfilter[0] or use-after-free.
Fix the issue by taking a reference to the underlying struct tcf_ct_flow_table instance when registering the callback and release the reference when unregistering. Expose new API required for such reference counting by adding two new callbacks to nf_flowtable_type and implementing them for act_ct flowtable_ct type. This fixes the issue by extending the lifetime of nf_flowtable until all users have unregistered.
[0]: [106170.938634] ------------[ cut here ]------------ [106170.939111] WARNING: CPU: 21 PID: 3688 at include/net/netfilter/nf_flow_table.h:262 mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.940108] Modules linked in: act_ct nf_flow_table act_mirred act_skbedit act_tunnel_key vxlan cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa bonding openvswitch nsh rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_regis try overlay mlx5_core [106170.943496] CPU: 21 PID: 3688 Comm: kworker/u48:0 Not tainted 6.6.0-rc7_for_upstream_min_debug_2023_11_01_13_02 #1 [106170.944361] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [106170.945292] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core] [106170.945846] RIP: 0010:mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.946413] Code: 89 ef 48 83 05 71 a4 14 00 01 e8 f4 06 04 e1 48 83 05 6c a4 14 00 01 48 83 c4 28 5b 5d 41 5c 41 5d c3 48 83 05 d1 8b 14 00 01 <0f> 0b 48 83 05 d7 8b 14 00 01 e9 96 fe ff ff 48 83 05 a2 90 14 00 [106170.947924] RSP: 0018:ffff88813ff0fcb8 EFLAGS: 00010202 [106170.948397] RAX: 0000000000000000 RBX: ffff88811eabac40 RCX: ffff88811eabad48 [106170.949040] RDX: ffff88811eab8000 RSI: ffffffffa02cd560 RDI: 0000000000000000 [106170.949679] RBP: ffff88811eab8000 R08: 0000000000000001 R09: ffffffffa0229700 [106170.950317] R10: ffff888103538fc0 R11: 0000000000000001 R12: ffff88811eabad58 [106170.950969] R13: ffff888110c01c00 R14: ffff888106b40000 R15: 0000000000000000 [106170.951616] FS: 0000000000000000(0000) GS:ffff88885fd40000(0000) knlGS:0000000000000000 [106170.952329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [106170.952834] CR2: 00007f1cefd28cb0 CR3: 000000012181b006 CR4: 0000000000370ea0 [106170.953482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [106170.954121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [106170.954766] Call Trace: [106170.955057] <TASK> [106170.955315] ? __warn+0x79/0x120 [106170.955648] ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.956172] ? report_bug+0x17c/0x190 [106170.956537] ? handle_bug+0x3c/0x60 [106170.956891] ? exc_invalid_op+0x14/0x70 [106170.957264] ? asm_exc_invalid_op+0x16/0x20 [106170.957666] ? mlx5_del_flow_rules+0x10/0x310 [mlx5_core] [106170.958172] ? mlx5_tc_ct_block_flow_offload_add+0x1240/0x1240 [mlx5_core] [106170.958788] ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.959339] ? mlx5_tc_ct_del_ft_cb+0xc6/0x2b0 [mlx5_core] [106170.959854] ? mapping_remove+0x154/0x1d0 [mlx5_core] [106170.960342] ? mlx5e_tc_action_miss_mapping_put+0x4f/0x80 [mlx5_core] [106170.960927] mlx5_tc_ct_delete_flow+0x76/0xc0 [mlx5_core] [106170.961441] mlx5_free_flow_attr_actions+0x13b/0x220 [mlx5_core] [106170.962001] mlx5e_tc_del_fdb_flow+0x22c/0x3b0 [mlx5_core] [106170.962524] mlx5e_tc_del_flow+0x95/0x3c0 [mlx5_core] [106170.963034] mlx5e_flow_put+0x73/0xe0 [mlx5_core] [106170.963506] mlx5e_put_flow_list+0x38/0x70 [mlx5_core] [106170.964002] mlx5e_rep_update_flows+0xec/0x290 [mlx5_core] [106170.964525] mlx5e_rep_neigh_update+0x1da/0x310 [mlx5_core] [106170.965056] process_one_work+0x13a/0x2c0 [106170.965443] worker_thread+0x2e5/0x3f0 [106170.965808] ? rescuer_thread+0x410/0x410 [106170.966192] kthread+0xc6/0xf0 [106170.966515] ? kthread_complete_and_exit+0x20/0x20 [106170.966970] ret_from_fork+0x2d/0x50 [106170.967332] ? kthread_complete_and_exit+0x20/0x20 [106170.967774] ret_from_fork_asm+0x11/0x20 [106170.970466] </TASK> [106170.970726] ---[ end trace 0000000000000000 ]---
Fixes: 77ac5e40c44e ("net/sched: act_ct: remove and free nf_table callbacks") Signed-off-by: Vlad Buslov vladbu@nvidia.com Reviewed-by: Paul Blakey paulb@nvidia.com Acked-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- include/net/netfilter/nf_flow_table.h | 10 ++++++++ net/sched/act_ct.c | 34 ++++++++++++++++++++++----- 2 files changed, 38 insertions(+), 6 deletions(-)
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index fe1507c1db82..692d5955911c 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -62,6 +62,8 @@ struct nf_flowtable_type { enum flow_offload_tuple_dir dir, struct nf_flow_rule *flow_rule); void (*free)(struct nf_flowtable *ft); + void (*get)(struct nf_flowtable *ft); + void (*put)(struct nf_flowtable *ft); nf_hookfn *hook; struct module *owner; }; @@ -240,6 +242,11 @@ nf_flow_table_offload_add_cb(struct nf_flowtable *flow_table, }
list_add_tail(&block_cb->list, &block->cb_list); + up_write(&flow_table->flow_block_lock); + + if (flow_table->type->get) + flow_table->type->get(flow_table); + return 0;
unlock: up_write(&flow_table->flow_block_lock); @@ -262,6 +269,9 @@ nf_flow_table_offload_del_cb(struct nf_flowtable *flow_table, WARN_ON(true); } up_write(&flow_table->flow_block_lock); + + if (flow_table->type->put) + flow_table->type->put(flow_table); }
void flow_offload_route_init(struct flow_offload *flow, diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 6dcc4585576e..dd710fb9f490 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -286,9 +286,31 @@ static bool tcf_ct_flow_is_outdated(const struct flow_offload *flow) !test_bit(NF_FLOW_HW_ESTABLISHED, &flow->flags); }
+static void tcf_ct_flow_table_get_ref(struct tcf_ct_flow_table *ct_ft); + +static void tcf_ct_nf_get(struct nf_flowtable *ft) +{ + struct tcf_ct_flow_table *ct_ft = + container_of(ft, struct tcf_ct_flow_table, nf_ft); + + tcf_ct_flow_table_get_ref(ct_ft); +} + +static void tcf_ct_flow_table_put(struct tcf_ct_flow_table *ct_ft); + +static void tcf_ct_nf_put(struct nf_flowtable *ft) +{ + struct tcf_ct_flow_table *ct_ft = + container_of(ft, struct tcf_ct_flow_table, nf_ft); + + tcf_ct_flow_table_put(ct_ft); +} + static struct nf_flowtable_type flowtable_ct = { .gc = tcf_ct_flow_is_outdated, .action = tcf_ct_flow_table_fill_actions, + .get = tcf_ct_nf_get, + .put = tcf_ct_nf_put, .owner = THIS_MODULE, };
@@ -337,9 +359,13 @@ static int tcf_ct_flow_table_get(struct net *net, struct tcf_ct_params *params) return err; }
+static void tcf_ct_flow_table_get_ref(struct tcf_ct_flow_table *ct_ft) +{ + refcount_inc(&ct_ft->ref); +} + static void tcf_ct_flow_table_cleanup_work(struct work_struct *work) { - struct flow_block_cb *block_cb, *tmp_cb; struct tcf_ct_flow_table *ct_ft; struct flow_block *block;
@@ -347,13 +373,9 @@ static void tcf_ct_flow_table_cleanup_work(struct work_struct *work) rwork); nf_flow_table_free(&ct_ft->nf_ft);
- /* Remove any remaining callbacks before cleanup */ block = &ct_ft->nf_ft.flow_block; down_write(&ct_ft->nf_ft.flow_block_lock); - list_for_each_entry_safe(block_cb, tmp_cb, &block->cb_list, list) { - list_del(&block_cb->list); - flow_block_cb_free(block_cb); - } + WARN_ON(!list_empty(&block->cb_list)); up_write(&ct_ft->nf_ft.flow_block_lock); kfree(ct_ft);
From: Shinas Rasheed srasheed@marvell.com
stable inclusion from stable-v6.6.8 commit 6047060105e44c7108c715535fc92c0226995a40 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 284f717622417cb267e344a9174f8e5698d1e3c1 ]
The firmware ready value is 1, and get firmware ready status function should explicitly test for that value. The firmware ready value read will be 2 after driver load, and on unbind till firmware rewrites the firmware ready back to 0, the value seen by driver will be 2, which should be regarded as not ready.
Fixes: 10c073e40469 ("octeon_ep: defer probe if firmware not ready") Signed-off-by: Shinas Rasheed srasheed@marvell.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c index 5b46ca47c8e5..2ee1374db4c0 100644 --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c @@ -1076,7 +1076,8 @@ static bool get_fw_ready_status(struct pci_dev *pdev)
pci_read_config_byte(pdev, (pos + 8), &status); dev_info(&pdev->dev, "Firmware ready status = %u\n", status); - return status; +#define FW_STATUS_READY 1ULL + return status == FW_STATUS_READY; } return false; }
From: Hariprasad Kelam hkelam@marvell.com
stable inclusion from stable-v6.6.8 commit 5295d2ad9103381daba63b206b34cfed08bdba69 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit dbda436824ded8ef6a05bb82cd9baa8d42377a49 ]
Current implementation is such that, promisc mcam entry action is set as multicast even when there are no trusted VFs. multicast action causes the hardware to copy packet data, which reduces the performance.
This patch fixes this issue by setting the promisc mcam entry action to unicast instead of multicast when there are no trusted VFs. The same change is made for the 'allmulti' mcam entry action.
Fixes: ffd2f89ad05c ("octeontx2-pf: Enable promisc/allmulti match MCAM entries.") Signed-off-by: Hariprasad Kelam hkelam@marvell.com Signed-off-by: Sunil Kovvuri Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- .../ethernet/marvell/octeontx2/nic/otx2_pf.c | 25 ++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c index 0c17ebdda148..a57455aebff6 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c @@ -1650,6 +1650,21 @@ static void otx2_free_hw_resources(struct otx2_nic *pf) mutex_unlock(&mbox->lock); }
+static bool otx2_promisc_use_mce_list(struct otx2_nic *pfvf) +{ + int vf; + + /* The AF driver will determine whether to allow the VF netdev or not */ + if (is_otx2_vf(pfvf->pcifunc)) + return true; + + /* check if there are any trusted VFs associated with the PF netdev */ + for (vf = 0; vf < pci_num_vf(pfvf->pdev); vf++) + if (pfvf->vf_configs[vf].trusted) + return true; + return false; +} + static void otx2_do_set_rx_mode(struct otx2_nic *pf) { struct net_device *netdev = pf->netdev; @@ -1682,7 +1697,8 @@ static void otx2_do_set_rx_mode(struct otx2_nic *pf) if (netdev->flags & (IFF_ALLMULTI | IFF_MULTICAST)) req->mode |= NIX_RX_MODE_ALLMULTI;
- req->mode |= NIX_RX_MODE_USE_MCE; + if (otx2_promisc_use_mce_list(pf)) + req->mode |= NIX_RX_MODE_USE_MCE;
otx2_sync_mbox_msg(&pf->mbox); mutex_unlock(&pf->mbox.lock); @@ -2691,11 +2707,14 @@ static int otx2_ndo_set_vf_trust(struct net_device *netdev, int vf, pf->vf_configs[vf].trusted = enable; rc = otx2_set_vf_permissions(pf, vf, OTX2_TRUSTED_VF);
- if (rc) + if (rc) { pf->vf_configs[vf].trusted = !enable; - else + } else { netdev_info(pf->netdev, "VF %d is %strusted\n", vf, enable ? "" : "not "); + otx2_set_rx_mode(netdev); + } + return rc; }
From: Hariprasad Kelam hkelam@marvell.com
stable inclusion from stable-v6.6.8 commit 6b5de31e372cc79ade47b54d8cbb6d258887497d bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 570ba37898ecd9069beb58bf0b6cf84daba6e0fe ]
The RSS flow algorithm is not set up correctly for promiscuous or all multi MCAM entries. This has an impact on flow distribution.
This patch fixes the issue by updating flow algorithm index in above mentioned MCAM entries.
Fixes: 967db3529eca ("octeontx2-af: add support for multicast/promisc packet replication feature") Signed-off-by: Hariprasad Kelam hkelam@marvell.com Signed-off-by: Sunil Kovvuri Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- .../ethernet/marvell/octeontx2/af/rvu_npc.c | 55 +++++++++++++++---- 1 file changed, 44 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c index f65805860c8d..0bcf3e559280 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c @@ -671,6 +671,7 @@ void rvu_npc_install_promisc_entry(struct rvu *rvu, u16 pcifunc, int blkaddr, ucast_idx, index; struct nix_rx_action action = { 0 }; u64 relaxed_mask; + u8 flow_key_alg;
if (!hw->cap.nix_rx_multicast && is_cgx_vf(rvu, pcifunc)) return; @@ -701,6 +702,8 @@ void rvu_npc_install_promisc_entry(struct rvu *rvu, u16 pcifunc, action.op = NIX_RX_ACTIONOP_UCAST; }
+ flow_key_alg = action.flow_key_alg; + /* RX_ACTION set to MCAST for CGX PF's */ if (hw->cap.nix_rx_multicast && pfvf->use_mce_list && is_pf_cgxmapped(rvu, rvu_get_pf(pcifunc))) { @@ -740,7 +743,7 @@ void rvu_npc_install_promisc_entry(struct rvu *rvu, u16 pcifunc, req.vf = pcifunc; req.index = action.index; req.match_id = action.match_id; - req.flow_key_alg = action.flow_key_alg; + req.flow_key_alg = flow_key_alg;
rvu_mbox_handler_npc_install_flow(rvu, &req, &rsp); } @@ -854,6 +857,7 @@ void rvu_npc_install_allmulti_entry(struct rvu *rvu, u16 pcifunc, int nixlf, u8 mac_addr[ETH_ALEN] = { 0 }; struct nix_rx_action action = { 0 }; struct rvu_pfvf *pfvf; + u8 flow_key_alg; u16 vf_func;
/* Only CGX PF/VF can add allmulticast entry */ @@ -888,6 +892,7 @@ void rvu_npc_install_allmulti_entry(struct rvu *rvu, u16 pcifunc, int nixlf, *(u64 *)&action = npc_get_mcam_action(rvu, mcam, blkaddr, ucast_idx);
+ flow_key_alg = action.flow_key_alg; if (action.op != NIX_RX_ACTIONOP_RSS) { *(u64 *)&action = 0; action.op = NIX_RX_ACTIONOP_UCAST; @@ -924,7 +929,7 @@ void rvu_npc_install_allmulti_entry(struct rvu *rvu, u16 pcifunc, int nixlf, req.vf = pcifunc | vf_func; req.index = action.index; req.match_id = action.match_id; - req.flow_key_alg = action.flow_key_alg; + req.flow_key_alg = flow_key_alg;
rvu_mbox_handler_npc_install_flow(rvu, &req, &rsp); } @@ -990,11 +995,38 @@ static void npc_update_vf_flow_entry(struct rvu *rvu, struct npc_mcam *mcam, mutex_unlock(&mcam->lock); }
+static void npc_update_rx_action_with_alg_idx(struct rvu *rvu, struct nix_rx_action action, + struct rvu_pfvf *pfvf, int mcam_index, int blkaddr, + int alg_idx) + +{ + struct npc_mcam *mcam = &rvu->hw->mcam; + struct rvu_hwinfo *hw = rvu->hw; + int bank, op_rss; + + if (!is_mcam_entry_enabled(rvu, mcam, blkaddr, mcam_index)) + return; + + op_rss = (!hw->cap.nix_rx_multicast || !pfvf->use_mce_list); + + bank = npc_get_bank(mcam, mcam_index); + mcam_index &= (mcam->banksize - 1); + + /* If Rx action is MCAST update only RSS algorithm index */ + if (!op_rss) { + *(u64 *)&action = rvu_read64(rvu, blkaddr, + NPC_AF_MCAMEX_BANKX_ACTION(mcam_index, bank)); + + action.flow_key_alg = alg_idx; + } + rvu_write64(rvu, blkaddr, + NPC_AF_MCAMEX_BANKX_ACTION(mcam_index, bank), *(u64 *)&action); +} + void rvu_npc_update_flowkey_alg_idx(struct rvu *rvu, u16 pcifunc, int nixlf, int group, int alg_idx, int mcam_index) { struct npc_mcam *mcam = &rvu->hw->mcam; - struct rvu_hwinfo *hw = rvu->hw; struct nix_rx_action action; int blkaddr, index, bank; struct rvu_pfvf *pfvf; @@ -1050,15 +1082,16 @@ void rvu_npc_update_flowkey_alg_idx(struct rvu *rvu, u16 pcifunc, int nixlf, /* If PF's promiscuous entry is enabled, * Set RSS action for that entry as well */ - if ((!hw->cap.nix_rx_multicast || !pfvf->use_mce_list) && - is_mcam_entry_enabled(rvu, mcam, blkaddr, index)) { - bank = npc_get_bank(mcam, index); - index &= (mcam->banksize - 1); + npc_update_rx_action_with_alg_idx(rvu, action, pfvf, index, blkaddr, + alg_idx);
- rvu_write64(rvu, blkaddr, - NPC_AF_MCAMEX_BANKX_ACTION(index, bank), - *(u64 *)&action); - } + index = npc_get_nixlf_mcam_index(mcam, pcifunc, + nixlf, NIXLF_ALLMULTI_ENTRY); + /* If PF's allmulti entry is enabled, + * Set RSS action for that entry as well + */ + npc_update_rx_action_with_alg_idx(rvu, action, pfvf, index, blkaddr, + alg_idx); }
void npc_enadis_default_mce_entry(struct rvu *rvu, u16 pcifunc,
From: Hariprasad Kelam hkelam@marvell.com
stable inclusion from stable-v6.6.8 commit f115b31d7e96b4674f99427126e78721ef96a2ae bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit e307b5a845c5951dabafc48d00b6424ee64716c4 ]
The current implementation's default Pause Forward setting is causing unnecessary network traffic. This patch disables Pause Forward to address this issue.
Fixes: 1121f6b02e7a ("octeontx2-af: Priority flow control configuration support") Signed-off-by: Hariprasad Kelam hkelam@marvell.com Signed-off-by: Sunil Kovvuri Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/marvell/octeontx2/af/rpm.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rpm.c b/drivers/net/ethernet/marvell/octeontx2/af/rpm.c index af21e2030cff..4728ba34b0e3 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rpm.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rpm.c @@ -373,6 +373,11 @@ void rpm_lmac_pause_frm_config(void *rpmd, int lmac_id, bool enable) cfg |= RPMX_MTI_MAC100X_COMMAND_CONFIG_TX_P_DISABLE; rpm_write(rpm, lmac_id, RPMX_MTI_MAC100X_COMMAND_CONFIG, cfg);
+ /* Disable forward pause to driver */ + cfg = rpm_read(rpm, lmac_id, RPMX_MTI_MAC100X_COMMAND_CONFIG); + cfg &= ~RPMX_MTI_MAC100X_COMMAND_CONFIG_PAUSE_FWD; + rpm_write(rpm, lmac_id, RPMX_MTI_MAC100X_COMMAND_CONFIG, cfg); + /* Enable channel mask for all LMACS */ if (is_dev_rpm2(rpm)) rpm_write(rpm, lmac_id, RPM2_CMR_CHAN_MSK_OR, 0xffff); @@ -616,12 +621,10 @@ int rpm_lmac_pfc_config(void *rpmd, int lmac_id, u8 tx_pause, u8 rx_pause, u16 p
if (rx_pause) { cfg &= ~(RPMX_MTI_MAC100X_COMMAND_CONFIG_RX_P_DISABLE | - RPMX_MTI_MAC100X_COMMAND_CONFIG_PAUSE_IGNORE | - RPMX_MTI_MAC100X_COMMAND_CONFIG_PAUSE_FWD); + RPMX_MTI_MAC100X_COMMAND_CONFIG_PAUSE_IGNORE); } else { cfg |= (RPMX_MTI_MAC100X_COMMAND_CONFIG_RX_P_DISABLE | - RPMX_MTI_MAC100X_COMMAND_CONFIG_PAUSE_IGNORE | - RPMX_MTI_MAC100X_COMMAND_CONFIG_PAUSE_FWD); + RPMX_MTI_MAC100X_COMMAND_CONFIG_PAUSE_IGNORE); }
if (tx_pause) {
From: Hyunwoo Kim v4bel@theori.io
stable inclusion from stable-v6.6.8 commit 531fd46f92895bcdc41bedd12533266c397196da bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 24e90b9e34f9e039f56b5f25f6e6eb92cdd8f4b3 ]
Because do_vcc_ioctl() accesses sk->sk_receive_queue without holding a sk->sk_receive_queue.lock, it can cause a race with vcc_recvmsg(). A use-after-free for skb occurs with the following flow. ``` do_vcc_ioctl() -> skb_peek() vcc_recvmsg() -> skb_recv_datagram() -> skb_free_datagram() ``` Add sk->sk_receive_queue.lock to do_vcc_ioctl() to fix this issue.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Hyunwoo Kim v4bel@theori.io Link: https://lore.kernel.org/r/20231209094210.GA403126@v4bel-B760M-AORUS-ELITE-AX Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- net/atm/ioctl.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/atm/ioctl.c b/net/atm/ioctl.c index 838ebf0cabbf..f81f8d56f5c0 100644 --- a/net/atm/ioctl.c +++ b/net/atm/ioctl.c @@ -73,14 +73,17 @@ static int do_vcc_ioctl(struct socket *sock, unsigned int cmd, case SIOCINQ: { struct sk_buff *skb; + int amount;
if (sock->state != SS_CONNECTED) { error = -EINVAL; goto done; } + spin_lock_irq(&sk->sk_receive_queue.lock); skb = skb_peek(&sk->sk_receive_queue); - error = put_user(skb ? skb->len : 0, - (int __user *)argp) ? -EFAULT : 0; + amount = skb ? skb->len : 0; + spin_unlock_irq(&sk->sk_receive_queue.lock); + error = put_user(amount, (int __user *)argp) ? -EFAULT : 0; goto done; } case ATM_SETSC:
From: Hyunwoo Kim v4bel@theori.io
stable inclusion from stable-v6.6.8 commit 63caa51833e8701248a8a89d83effe96f30e4c80 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 810c38a369a0a0ce625b5c12169abce1dd9ccd53 ]
Because rose_ioctl() accesses sk->sk_receive_queue without holding a sk->sk_receive_queue.lock, it can cause a race with rose_accept(). A use-after-free for skb occurs with the following flow. ``` rose_ioctl() -> skb_peek() rose_accept() -> skb_dequeue() -> kfree_skb() ``` Add sk->sk_receive_queue.lock to rose_ioctl() to fix this issue.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Hyunwoo Kim v4bel@theori.io Link: https://lore.kernel.org/r/20231209100538.GA407321@v4bel-B760M-AORUS-ELITE-AX Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- net/rose/af_rose.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c index 49dafe9ac72f..4a5c2dc8dd7a 100644 --- a/net/rose/af_rose.c +++ b/net/rose/af_rose.c @@ -1315,9 +1315,11 @@ static int rose_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) case TIOCINQ: { struct sk_buff *skb; long amount = 0L; - /* These two are safe on a single CPU system as only user tasks fiddle here */ + + spin_lock_irq(&sk->sk_receive_queue.lock); if ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) amount = skb->len; + spin_unlock_irq(&sk->sk_receive_queue.lock); return put_user(amount, (unsigned int __user *) argp); }
From: Piotr Gardocki piotrx.gardocki@intel.com
stable inclusion from stable-v6.6.8 commit 3beb9d66e4422d1c3865da9027eee49863bff9e1 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 3a0b5a2929fdeda63fc921c2dbed237059acf732 ]
New states introduced:
IAVF_FDIR_FLTR_DIS_REQUEST IAVF_FDIR_FLTR_DIS_PENDING IAVF_FDIR_FLTR_INACTIVE
Current FDIR state machines (SM) are not adequate to handle a few scenarios in the link DOWN/UP event, reset event and ntuple-feature.
For example, when VF link goes DOWN and comes back UP administratively, the expectation is that previously installed filters should also be restored. But with current SM, filters are not restored. So with new SM, during link DOWN filters are marked as INACTIVE in the iavf list but removed from PF. After link UP, SM will transition from INACTIVE to ADD_REQUEST to restore the filter.
Similarly, with VF reset, filters will be removed from the PF, but marked as INACTIVE in the iavf list. Filters will be restored after reset completion.
Steps to reproduce: -------------------
1. Create a VF. Here VF is enp8s0.
2. Assign IP addresses to VF and link partner and ping continuously from remote. Here remote IP is 1.1.1.1.
3. Check default RX Queue of traffic.
ethtool -S enp8s0 | grep -E "rx-[[:digit:]]+.packets"
4. Add filter - change default RX Queue (to 15 here)
ethtool -U ens8s0 flow-type ip4 src-ip 1.1.1.1 action 15 loc 5
5. Ensure filter gets added and traffic is received on RX queue 15 now.
Link event testing: ------------------- 6. Bring VF link down and up. If traffic flows to configured queue 15, test is success, otherwise it is a failure.
Reset event testing: -------------------- 7. Reset the VF. If traffic flows to configured queue 15, test is success, otherwise it is a failure.
Fixes: 0dbfbabb840d ("iavf: Add framework to enable ethtool ntuple filters") Signed-off-by: Piotr Gardocki piotrx.gardocki@intel.com Reviewed-by: Larysa Zaremba larysa.zaremba@intel.com Signed-off-by: Ranganatha Rao ranganatha.rao@intel.com Tested-by: Rafal Romanowski rafal.romanowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/intel/iavf/iavf.h | 1 + .../net/ethernet/intel/iavf/iavf_ethtool.c | 27 ++++--- drivers/net/ethernet/intel/iavf/iavf_fdir.h | 15 +++- drivers/net/ethernet/intel/iavf/iavf_main.c | 48 ++++++++++--- .../net/ethernet/intel/iavf/iavf_virtchnl.c | 71 +++++++++++++++++-- 5 files changed, 139 insertions(+), 23 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h index d8d7b62ceb24..431d9d62c8c6 100644 --- a/drivers/net/ethernet/intel/iavf/iavf.h +++ b/drivers/net/ethernet/intel/iavf/iavf.h @@ -303,6 +303,7 @@ struct iavf_adapter { #define IAVF_FLAG_QUEUES_DISABLED BIT(17) #define IAVF_FLAG_SETUP_NETDEV_FEATURES BIT(18) #define IAVF_FLAG_REINIT_MSIX_NEEDED BIT(20) +#define IAVF_FLAG_FDIR_ENABLED BIT(21) /* duplicates for common code */ #define IAVF_FLAG_DCB_ENABLED 0 /* flags for admin queue service task */ diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c index 1b412754aa42..892c6a4f03bb 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c +++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c @@ -1063,7 +1063,7 @@ iavf_get_ethtool_fdir_entry(struct iavf_adapter *adapter, struct iavf_fdir_fltr *rule = NULL; int ret = 0;
- if (!FDIR_FLTR_SUPPORT(adapter)) + if (!(adapter->flags & IAVF_FLAG_FDIR_ENABLED)) return -EOPNOTSUPP;
spin_lock_bh(&adapter->fdir_fltr_lock); @@ -1205,7 +1205,7 @@ iavf_get_fdir_fltr_ids(struct iavf_adapter *adapter, struct ethtool_rxnfc *cmd, unsigned int cnt = 0; int val = 0;
- if (!FDIR_FLTR_SUPPORT(adapter)) + if (!(adapter->flags & IAVF_FLAG_FDIR_ENABLED)) return -EOPNOTSUPP;
cmd->data = IAVF_MAX_FDIR_FILTERS; @@ -1397,7 +1397,7 @@ static int iavf_add_fdir_ethtool(struct iavf_adapter *adapter, struct ethtool_rx int count = 50; int err;
- if (!FDIR_FLTR_SUPPORT(adapter)) + if (!(adapter->flags & IAVF_FLAG_FDIR_ENABLED)) return -EOPNOTSUPP;
if (fsp->flow_type & FLOW_MAC_EXT) @@ -1438,12 +1438,16 @@ static int iavf_add_fdir_ethtool(struct iavf_adapter *adapter, struct ethtool_rx spin_lock_bh(&adapter->fdir_fltr_lock); iavf_fdir_list_add_fltr(adapter, fltr); adapter->fdir_active_fltr++; - fltr->state = IAVF_FDIR_FLTR_ADD_REQUEST; - adapter->aq_required |= IAVF_FLAG_AQ_ADD_FDIR_FILTER; + if (adapter->link_up) { + fltr->state = IAVF_FDIR_FLTR_ADD_REQUEST; + adapter->aq_required |= IAVF_FLAG_AQ_ADD_FDIR_FILTER; + } else { + fltr->state = IAVF_FDIR_FLTR_INACTIVE; + } spin_unlock_bh(&adapter->fdir_fltr_lock);
- mod_delayed_work(adapter->wq, &adapter->watchdog_task, 0); - + if (adapter->link_up) + mod_delayed_work(adapter->wq, &adapter->watchdog_task, 0); ret: if (err && fltr) kfree(fltr); @@ -1465,7 +1469,7 @@ static int iavf_del_fdir_ethtool(struct iavf_adapter *adapter, struct ethtool_rx struct iavf_fdir_fltr *fltr = NULL; int err = 0;
- if (!FDIR_FLTR_SUPPORT(adapter)) + if (!(adapter->flags & IAVF_FLAG_FDIR_ENABLED)) return -EOPNOTSUPP;
spin_lock_bh(&adapter->fdir_fltr_lock); @@ -1474,6 +1478,11 @@ static int iavf_del_fdir_ethtool(struct iavf_adapter *adapter, struct ethtool_rx if (fltr->state == IAVF_FDIR_FLTR_ACTIVE) { fltr->state = IAVF_FDIR_FLTR_DEL_REQUEST; adapter->aq_required |= IAVF_FLAG_AQ_DEL_FDIR_FILTER; + } else if (fltr->state == IAVF_FDIR_FLTR_INACTIVE) { + list_del(&fltr->list); + kfree(fltr); + adapter->fdir_active_fltr--; + fltr = NULL; } else { err = -EBUSY; } @@ -1782,7 +1791,7 @@ static int iavf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd, ret = 0; break; case ETHTOOL_GRXCLSRLCNT: - if (!FDIR_FLTR_SUPPORT(adapter)) + if (!(adapter->flags & IAVF_FLAG_FDIR_ENABLED)) break; spin_lock_bh(&adapter->fdir_fltr_lock); cmd->rule_cnt = adapter->fdir_active_fltr; diff --git a/drivers/net/ethernet/intel/iavf/iavf_fdir.h b/drivers/net/ethernet/intel/iavf/iavf_fdir.h index 9eb9f73f6adf..d31bd923ba8c 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_fdir.h +++ b/drivers/net/ethernet/intel/iavf/iavf_fdir.h @@ -6,12 +6,25 @@
struct iavf_adapter;
-/* State of Flow Director filter */ +/* State of Flow Director filter + * + * *_REQUEST states are used to mark filter to be sent to PF driver to perform + * an action (either add or delete filter). *_PENDING states are an indication + * that request was sent to PF and the driver is waiting for response. + * + * Both DELETE and DISABLE states are being used to delete a filter in PF. + * The difference is that after a successful response filter in DEL_PENDING + * state is being deleted from VF driver as well and filter in DIS_PENDING state + * is being changed to INACTIVE state. + */ enum iavf_fdir_fltr_state_t { IAVF_FDIR_FLTR_ADD_REQUEST, /* User requests to add filter */ IAVF_FDIR_FLTR_ADD_PENDING, /* Filter pending add by the PF */ IAVF_FDIR_FLTR_DEL_REQUEST, /* User requests to delete filter */ IAVF_FDIR_FLTR_DEL_PENDING, /* Filter pending delete by the PF */ + IAVF_FDIR_FLTR_DIS_REQUEST, /* Filter scheduled to be disabled */ + IAVF_FDIR_FLTR_DIS_PENDING, /* Filter pending disable by the PF */ + IAVF_FDIR_FLTR_INACTIVE, /* Filter inactive on link down */ IAVF_FDIR_FLTR_ACTIVE, /* Filter is active */ };
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index 68783a7b7096..5158addc0aa9 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -1356,18 +1356,20 @@ static void iavf_clear_cloud_filters(struct iavf_adapter *adapter) **/ static void iavf_clear_fdir_filters(struct iavf_adapter *adapter) { - struct iavf_fdir_fltr *fdir, *fdirtmp; + struct iavf_fdir_fltr *fdir;
/* remove all Flow Director filters */ spin_lock_bh(&adapter->fdir_fltr_lock); - list_for_each_entry_safe(fdir, fdirtmp, &adapter->fdir_list_head, - list) { + list_for_each_entry(fdir, &adapter->fdir_list_head, list) { if (fdir->state == IAVF_FDIR_FLTR_ADD_REQUEST) { - list_del(&fdir->list); - kfree(fdir); - adapter->fdir_active_fltr--; - } else { - fdir->state = IAVF_FDIR_FLTR_DEL_REQUEST; + /* Cancel a request, keep filter as inactive */ + fdir->state = IAVF_FDIR_FLTR_INACTIVE; + } else if (fdir->state == IAVF_FDIR_FLTR_ADD_PENDING || + fdir->state == IAVF_FDIR_FLTR_ACTIVE) { + /* Disable filters which are active or have a pending + * request to PF to be added + */ + fdir->state = IAVF_FDIR_FLTR_DIS_REQUEST; } } spin_unlock_bh(&adapter->fdir_fltr_lock); @@ -4174,6 +4176,33 @@ static int iavf_setup_tc(struct net_device *netdev, enum tc_setup_type type, } }
+/** + * iavf_restore_fdir_filters + * @adapter: board private structure + * + * Restore existing FDIR filters when VF netdev comes back up. + **/ +static void iavf_restore_fdir_filters(struct iavf_adapter *adapter) +{ + struct iavf_fdir_fltr *f; + + spin_lock_bh(&adapter->fdir_fltr_lock); + list_for_each_entry(f, &adapter->fdir_list_head, list) { + if (f->state == IAVF_FDIR_FLTR_DIS_REQUEST) { + /* Cancel a request, keep filter as active */ + f->state = IAVF_FDIR_FLTR_ACTIVE; + } else if (f->state == IAVF_FDIR_FLTR_DIS_PENDING || + f->state == IAVF_FDIR_FLTR_INACTIVE) { + /* Add filters which are inactive or have a pending + * request to PF to be deleted + */ + f->state = IAVF_FDIR_FLTR_ADD_REQUEST; + adapter->aq_required |= IAVF_FLAG_AQ_ADD_FDIR_FILTER; + } + } + spin_unlock_bh(&adapter->fdir_fltr_lock); +} + /** * iavf_open - Called when a network interface is made active * @netdev: network interface device structure @@ -4241,8 +4270,9 @@ static int iavf_open(struct net_device *netdev)
spin_unlock_bh(&adapter->mac_vlan_list_lock);
- /* Restore VLAN filters that were removed with IFF_DOWN */ + /* Restore filters that were removed with IFF_DOWN */ iavf_restore_filters(adapter); + iavf_restore_fdir_filters(adapter);
iavf_configure(adapter);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c index 0b97b424e487..b95a4f903204 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c +++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c @@ -1738,8 +1738,8 @@ void iavf_add_fdir_filter(struct iavf_adapter *adapter) **/ void iavf_del_fdir_filter(struct iavf_adapter *adapter) { + struct virtchnl_fdir_del f = {}; struct iavf_fdir_fltr *fdir; - struct virtchnl_fdir_del f; bool process_fltr = false; int len;
@@ -1756,11 +1756,16 @@ void iavf_del_fdir_filter(struct iavf_adapter *adapter) list_for_each_entry(fdir, &adapter->fdir_list_head, list) { if (fdir->state == IAVF_FDIR_FLTR_DEL_REQUEST) { process_fltr = true; - memset(&f, 0, len); f.vsi_id = fdir->vc_add_msg.vsi_id; f.flow_id = fdir->flow_id; fdir->state = IAVF_FDIR_FLTR_DEL_PENDING; break; + } else if (fdir->state == IAVF_FDIR_FLTR_DIS_REQUEST) { + process_fltr = true; + f.vsi_id = fdir->vc_add_msg.vsi_id; + f.flow_id = fdir->flow_id; + fdir->state = IAVF_FDIR_FLTR_DIS_PENDING; + break; } } spin_unlock_bh(&adapter->fdir_fltr_lock); @@ -1904,6 +1909,48 @@ static void iavf_netdev_features_vlan_strip_set(struct net_device *netdev, netdev->features &= ~NETIF_F_HW_VLAN_CTAG_RX; }
+/** + * iavf_activate_fdir_filters - Reactivate all FDIR filters after a reset + * @adapter: private adapter structure + * + * Called after a reset to re-add all FDIR filters and delete some of them + * if they were pending to be deleted. + */ +static void iavf_activate_fdir_filters(struct iavf_adapter *adapter) +{ + struct iavf_fdir_fltr *f, *ftmp; + bool add_filters = false; + + spin_lock_bh(&adapter->fdir_fltr_lock); + list_for_each_entry_safe(f, ftmp, &adapter->fdir_list_head, list) { + if (f->state == IAVF_FDIR_FLTR_ADD_REQUEST || + f->state == IAVF_FDIR_FLTR_ADD_PENDING || + f->state == IAVF_FDIR_FLTR_ACTIVE) { + /* All filters and requests have been removed in PF, + * restore them + */ + f->state = IAVF_FDIR_FLTR_ADD_REQUEST; + add_filters = true; + } else if (f->state == IAVF_FDIR_FLTR_DIS_REQUEST || + f->state == IAVF_FDIR_FLTR_DIS_PENDING) { + /* Link down state, leave filters as inactive */ + f->state = IAVF_FDIR_FLTR_INACTIVE; + } else if (f->state == IAVF_FDIR_FLTR_DEL_REQUEST || + f->state == IAVF_FDIR_FLTR_DEL_PENDING) { + /* Delete filters that were pending to be deleted, the + * list on PF is already cleared after a reset + */ + list_del(&f->list); + kfree(f); + adapter->fdir_active_fltr--; + } + } + spin_unlock_bh(&adapter->fdir_fltr_lock); + + if (add_filters) + adapter->aq_required |= IAVF_FLAG_AQ_ADD_FDIR_FILTER; +} + /** * iavf_virtchnl_completion * @adapter: adapter structure @@ -2081,7 +2128,8 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter, spin_lock_bh(&adapter->fdir_fltr_lock); list_for_each_entry(fdir, &adapter->fdir_list_head, list) { - if (fdir->state == IAVF_FDIR_FLTR_DEL_PENDING) { + if (fdir->state == IAVF_FDIR_FLTR_DEL_PENDING || + fdir->state == IAVF_FDIR_FLTR_DIS_PENDING) { fdir->state = IAVF_FDIR_FLTR_ACTIVE; dev_info(&adapter->pdev->dev, "Failed to del Flow Director filter, error %s\n", iavf_stat_str(&adapter->hw, @@ -2217,6 +2265,8 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
spin_unlock_bh(&adapter->mac_vlan_list_lock);
+ iavf_activate_fdir_filters(adapter); + iavf_parse_vf_resource_msg(adapter);
/* negotiated VIRTCHNL_VF_OFFLOAD_VLAN_V2, so wait for the @@ -2406,7 +2456,9 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter, list_for_each_entry_safe(fdir, fdir_tmp, &adapter->fdir_list_head, list) { if (fdir->state == IAVF_FDIR_FLTR_DEL_PENDING) { - if (del_fltr->status == VIRTCHNL_FDIR_SUCCESS) { + if (del_fltr->status == VIRTCHNL_FDIR_SUCCESS || + del_fltr->status == + VIRTCHNL_FDIR_FAILURE_RULE_NONEXIST) { dev_info(&adapter->pdev->dev, "Flow Director filter with location %u is deleted\n", fdir->loc); list_del(&fdir->list); @@ -2418,6 +2470,17 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter, del_fltr->status); iavf_print_fdir_fltr(adapter, fdir); } + } else if (fdir->state == IAVF_FDIR_FLTR_DIS_PENDING) { + if (del_fltr->status == VIRTCHNL_FDIR_SUCCESS || + del_fltr->status == + VIRTCHNL_FDIR_FAILURE_RULE_NONEXIST) { + fdir->state = IAVF_FDIR_FLTR_INACTIVE; + } else { + fdir->state = IAVF_FDIR_FLTR_ACTIVE; + dev_info(&adapter->pdev->dev, "Failed to disable Flow Director filter with status: %d\n", + del_fltr->status); + iavf_print_fdir_fltr(adapter, fdir); + } } } spin_unlock_bh(&adapter->fdir_fltr_lock);
From: Piotr Gardocki piotrx.gardocki@intel.com
stable inclusion from stable-v6.6.8 commit e768a04908de2d72caaf8bc491472597c922cb39 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 09d23b8918f9ab0f8114f6b94f2faf8bde3fb52a ]
ntuple-filter feature on/off: Default is on. If turned off, the filters will be removed from both PF and iavf list. The removal is irrespective of current filter state.
Steps to reproduce: -------------------
1. Ensure ntuple is on.
ethtool -K enp8s0 ntuple-filters on
2. Create a filter to receive the traffic into non-default rx-queue like 15 and ensure traffic is flowing into queue into 15. Now, turn off ntuple. Traffic should not flow to configured queue 15. It should flow to default RX queue.
Fixes: 0dbfbabb840d ("iavf: Add framework to enable ethtool ntuple filters") Signed-off-by: Piotr Gardocki piotrx.gardocki@intel.com Reviewed-by: Larysa Zaremba larysa.zaremba@intel.com Signed-off-by: Ranganatha Rao ranganatha.rao@intel.com Tested-by: Rafal Romanowski rafal.romanowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/intel/iavf/iavf_main.c | 59 +++++++++++++++++++++ 1 file changed, 59 insertions(+)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index 5158addc0aa9..af8eb27a3615 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -4409,6 +4409,49 @@ static int iavf_change_mtu(struct net_device *netdev, int new_mtu) return ret; }
+/** + * iavf_disable_fdir - disable Flow Director and clear existing filters + * @adapter: board private structure + **/ +static void iavf_disable_fdir(struct iavf_adapter *adapter) +{ + struct iavf_fdir_fltr *fdir, *fdirtmp; + bool del_filters = false; + + adapter->flags &= ~IAVF_FLAG_FDIR_ENABLED; + + /* remove all Flow Director filters */ + spin_lock_bh(&adapter->fdir_fltr_lock); + list_for_each_entry_safe(fdir, fdirtmp, &adapter->fdir_list_head, + list) { + if (fdir->state == IAVF_FDIR_FLTR_ADD_REQUEST || + fdir->state == IAVF_FDIR_FLTR_INACTIVE) { + /* Delete filters not registered in PF */ + list_del(&fdir->list); + kfree(fdir); + adapter->fdir_active_fltr--; + } else if (fdir->state == IAVF_FDIR_FLTR_ADD_PENDING || + fdir->state == IAVF_FDIR_FLTR_DIS_REQUEST || + fdir->state == IAVF_FDIR_FLTR_ACTIVE) { + /* Filters registered in PF, schedule their deletion */ + fdir->state = IAVF_FDIR_FLTR_DEL_REQUEST; + del_filters = true; + } else if (fdir->state == IAVF_FDIR_FLTR_DIS_PENDING) { + /* Request to delete filter already sent to PF, change + * state to DEL_PENDING to delete filter after PF's + * response, not set as INACTIVE + */ + fdir->state = IAVF_FDIR_FLTR_DEL_PENDING; + } + } + spin_unlock_bh(&adapter->fdir_fltr_lock); + + if (del_filters) { + adapter->aq_required |= IAVF_FLAG_AQ_DEL_FDIR_FILTER; + mod_delayed_work(adapter->wq, &adapter->watchdog_task, 0); + } +} + #define NETIF_VLAN_OFFLOAD_FEATURES (NETIF_F_HW_VLAN_CTAG_RX | \ NETIF_F_HW_VLAN_CTAG_TX | \ NETIF_F_HW_VLAN_STAG_RX | \ @@ -4431,6 +4474,13 @@ static int iavf_set_features(struct net_device *netdev, iavf_set_vlan_offload_features(adapter, netdev->features, features);
+ if ((netdev->features & NETIF_F_NTUPLE) ^ (features & NETIF_F_NTUPLE)) { + if (features & NETIF_F_NTUPLE) + adapter->flags |= IAVF_FLAG_FDIR_ENABLED; + else + iavf_disable_fdir(adapter); + } + return 0; }
@@ -4726,6 +4776,9 @@ static netdev_features_t iavf_fix_features(struct net_device *netdev, { struct iavf_adapter *adapter = netdev_priv(netdev);
+ if (!FDIR_FLTR_SUPPORT(adapter)) + features &= ~NETIF_F_NTUPLE; + return iavf_fix_netdev_vlan_features(adapter, features); }
@@ -4843,6 +4896,12 @@ int iavf_process_config(struct iavf_adapter *adapter) if (vfres->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_VLAN) netdev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
+ if (FDIR_FLTR_SUPPORT(adapter)) { + netdev->hw_features |= NETIF_F_NTUPLE; + netdev->features |= NETIF_F_NTUPLE; + adapter->flags |= IAVF_FLAG_FDIR_ENABLED; + } + netdev->priv_flags |= IFF_UNICAST_FLT;
/* Do not turn on offloads when they are requested to be turned off.
From: Slawomir Laba slawomirx.laba@intel.com
stable inclusion from stable-v6.6.8 commit 54f59a242bcfcd7ef7bbb9e1928edd39d908c8d5 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 7ae42ef308ed0f6250b36f43e4eeb182ebbe6215 ]
Make the flow for pci shutdown be the same to the pci remove.
iavf_shutdown was implementing an incomplete version of iavf_remove. It misses several calls to the kernel like iavf_free_misc_irq, iavf_reset_interrupt_capability, iounmap that might break the system on reboot or hibernation.
Implement the call of iavf_remove directly in iavf_shutdown to close this gap.
Fixes below error messages (dmesg) during shutdown stress tests - [685814.900917] ice 0000:88:00.0: MAC 02:d0:5f:82:43:5d does not exist for VF 0 [685814.900928] ice 0000:88:00.0: MAC 33:33:00:00:00:01 does not exist for VF 0
Reproduction:
1. Create one VF interface: echo 1 > /sys/class/net/<interface_name>/device/sriov_numvfs
2. Run live dmesg on the host: dmesg -wH
3. On SUT, script below steps into vf_namespace_assignment.sh
<#!/bin/sh> // Remove <>. Git removes # line if=<VF name> (edit this per VF name) loop=0
while true; do
echo test round $loop let loop++
ip netns add ns$loop ip link set dev $if up ip link set dev $if netns ns$loop ip netns exec ns$loop ip link set dev $if up ip netns exec ns$loop ip link set dev $if netns 1 ip netns delete ns$loop
done
4. Run the script for at least 1000 iterations on SUT: ./vf_namespace_assignment.sh
Expected result: No errors in dmesg.
Fixes: 129cf89e5856 ("iavf: rename functions and structs to new name") Signed-off-by: Slawomir Laba slawomirx.laba@intel.com Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Reviewed-by: Ahmed Zaki ahmed.zaki@intel.com Reviewed-by: Jesse Brandeburg jesse.brandeburg@intel.com Co-developed-by: Ranganatha Rao ranganatha.rao@intel.com Signed-off-by: Ranganatha Rao ranganatha.rao@intel.com Tested-by: Rafal Romanowski rafal.romanowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/intel/iavf/iavf_main.c | 72 ++++++--------------- 1 file changed, 21 insertions(+), 51 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index af8eb27a3615..257865647c86 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -277,27 +277,6 @@ void iavf_free_virt_mem(struct iavf_hw *hw, struct iavf_virt_mem *mem) kfree(mem->va); }
-/** - * iavf_lock_timeout - try to lock mutex but give up after timeout - * @lock: mutex that should be locked - * @msecs: timeout in msecs - * - * Returns 0 on success, negative on failure - **/ -static int iavf_lock_timeout(struct mutex *lock, unsigned int msecs) -{ - unsigned int wait, delay = 10; - - for (wait = 0; wait < msecs; wait += delay) { - if (mutex_trylock(lock)) - return 0; - - msleep(delay); - } - - return -1; -} - /** * iavf_schedule_reset - Set the flags and schedule a reset event * @adapter: board private structure @@ -4925,34 +4904,6 @@ int iavf_process_config(struct iavf_adapter *adapter) return 0; }
-/** - * iavf_shutdown - Shutdown the device in preparation for a reboot - * @pdev: pci device structure - **/ -static void iavf_shutdown(struct pci_dev *pdev) -{ - struct iavf_adapter *adapter = iavf_pdev_to_adapter(pdev); - struct net_device *netdev = adapter->netdev; - - netif_device_detach(netdev); - - if (netif_running(netdev)) - iavf_close(netdev); - - if (iavf_lock_timeout(&adapter->crit_lock, 5000)) - dev_warn(&adapter->pdev->dev, "%s: failed to acquire crit_lock\n", __func__); - /* Prevent the watchdog from running. */ - iavf_change_state(adapter, __IAVF_REMOVE); - adapter->aq_required = 0; - mutex_unlock(&adapter->crit_lock); - -#ifdef CONFIG_PM - pci_save_state(pdev); - -#endif - pci_disable_device(pdev); -} - /** * iavf_probe - Device Initialization Routine * @pdev: PCI device information struct @@ -5166,17 +5117,22 @@ static int __maybe_unused iavf_resume(struct device *dev_d) **/ static void iavf_remove(struct pci_dev *pdev) { - struct iavf_adapter *adapter = iavf_pdev_to_adapter(pdev); struct iavf_fdir_fltr *fdir, *fdirtmp; struct iavf_vlan_filter *vlf, *vlftmp; struct iavf_cloud_filter *cf, *cftmp; struct iavf_adv_rss *rss, *rsstmp; struct iavf_mac_filter *f, *ftmp; + struct iavf_adapter *adapter; struct net_device *netdev; struct iavf_hw *hw; int err;
- netdev = adapter->netdev; + /* Don't proceed with remove if netdev is already freed */ + netdev = pci_get_drvdata(pdev); + if (!netdev) + return; + + adapter = iavf_pdev_to_adapter(pdev); hw = &adapter->hw;
if (test_and_set_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section)) @@ -5304,11 +5260,25 @@ static void iavf_remove(struct pci_dev *pdev)
destroy_workqueue(adapter->wq);
+ pci_set_drvdata(pdev, NULL); + free_netdev(netdev);
pci_disable_device(pdev); }
+/** + * iavf_shutdown - Shutdown the device in preparation for a reboot + * @pdev: pci device structure + **/ +static void iavf_shutdown(struct pci_dev *pdev) +{ + iavf_remove(pdev); + + if (system_state == SYSTEM_POWER_OFF) + pci_set_power_state(pdev, PCI_D3hot); +} + static SIMPLE_DEV_PM_OPS(iavf_pm_ops, iavf_suspend, iavf_resume);
static struct pci_driver iavf_driver = {
From: Dinghao Liu dinghao.liu@zju.edu.cn
stable inclusion from stable-v6.6.8 commit 7106a15b96d7debc1da1c16623881ec94ac2b018 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit b65d52ac9c085c0c52dee012a210d4e2f352611b ]
qed_ilt_shadow_alloc() will call qed_ilt_shadow_free() to free p_hwfn->p_cxt_mngr->ilt_shadow on error. However, qed_cxt_tables_alloc() accesses the freed pointer on failure of qed_ilt_shadow_alloc() through calling qed_cxt_mngr_free(), which may lead to use-after-free. Fix this issue by setting p_mngr->ilt_shadow to NULL in qed_ilt_shadow_free().
Fixes: fe56b9e6a8d9 ("qed: Add module with basic common support") Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Signed-off-by: Dinghao Liu dinghao.liu@zju.edu.cn Link: https://lore.kernel.org/r/20231210045255.21383-1-dinghao.liu@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/qlogic/qed/qed_cxt.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c b/drivers/net/ethernet/qlogic/qed/qed_cxt.c index 65e20693c549..33f4f58ee51c 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c +++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c @@ -933,6 +933,7 @@ static void qed_ilt_shadow_free(struct qed_hwfn *p_hwfn) p_dma->virt_addr = NULL; } kfree(p_mngr->ilt_shadow); + p_mngr->ilt_shadow = NULL; }
static int qed_ilt_blk_alloc(struct qed_hwfn *p_hwfn,
From: Dong Chenchen dongchenchen2@huawei.com
stable inclusion from stable-v6.6.8 commit d6bef004987084ec0db319b0ec3f087b6f56f880 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit f99cd56230f56c8b6b33713c5be4da5d6766be1f ]
syzkaller report:
kernel BUG at net/core/skbuff.c:3452! invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc4-00009-gbee0e7762ad2-dirty #135 RIP: 0010:skb_copy_and_csum_bits (net/core/skbuff.c:3452) Call Trace: icmp_glue_bits (net/ipv4/icmp.c:357) __ip_append_data.isra.0 (net/ipv4/ip_output.c:1165) ip_append_data (net/ipv4/ip_output.c:1362 net/ipv4/ip_output.c:1341) icmp_push_reply (net/ipv4/icmp.c:370) __icmp_send (./include/net/route.h:252 net/ipv4/icmp.c:772) ip_fragment.constprop.0 (./include/linux/skbuff.h:1234 net/ipv4/ip_output.c:592 net/ipv4/ip_output.c:577) __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:295) ip_output (net/ipv4/ip_output.c:427) __ip_queue_xmit (net/ipv4/ip_output.c:535) __tcp_transmit_skb (net/ipv4/tcp_output.c:1462) __tcp_retransmit_skb (net/ipv4/tcp_output.c:3387) tcp_retransmit_skb (net/ipv4/tcp_output.c:3404) tcp_retransmit_timer (net/ipv4/tcp_timer.c:604) tcp_write_timer (./include/linux/spinlock.h:391 net/ipv4/tcp_timer.c:716)
The panic issue was trigered by tcp simultaneous initiation. The initiation process is as follows:
TCP A TCP B
1. CLOSED CLOSED
2. SYN-SENT --> <SEQ=100><CTL=SYN> ...
3. SYN-RECEIVED <-- <SEQ=300><CTL=SYN> <-- SYN-SENT
4. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED
5. SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...
// TCP B: not send challenge ack for ack limit or packet loss // TCP A: close tcp_close tcp_send_fin if (!tskb && tcp_under_memory_pressure(sk)) tskb = skb_rb_last(&sk->tcp_rtx_queue); //pick SYN_ACK packet TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN; // set FIN flag
6. FIN_WAIT_1 --> <SEQ=100><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ...
// TCP B: send challenge ack to SYN_FIN_ACK
7. ... <SEQ=301><ACK=101><CTL=ACK> <-- SYN-RECEIVED //challenge ack
// TCP A: <SND.UNA=101>
8. FIN_WAIT_1 --> <SEQ=101><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // retransmit panic
__tcp_retransmit_skb //skb->len=0 tcp_trim_head len = tp->snd_una - TCP_SKB_CB(skb)->seq // len=101-100 __pskb_trim_head skb->data_len -= len // skb->len=-1, wrap around ... ... ip_fragment icmp_glue_bits //BUG_ON
If we use tcp_trim_head() to remove acked SYN from packet that contains data or other flags, skb->len will be incorrectly decremented. We can remove SYN flag that has been acked from rtx_queue earlier than tcp_trim_head(), which can fix the problem mentioned above.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Co-developed-by: Eric Dumazet edumazet@google.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Dong Chenchen dongchenchen2@huawei.com Link: https://lore.kernel.org/r/20231210020200.1539875-1-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- net/ipv4/tcp_output.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index eee34f6c7643..49500cd5c48d 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3315,7 +3315,13 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs) if (skb_still_in_host_queue(sk, skb)) return -EBUSY;
+start: if (before(TCP_SKB_CB(skb)->seq, tp->snd_una)) { + if (unlikely(TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN)) { + TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_SYN; + TCP_SKB_CB(skb)->seq++; + goto start; + } if (unlikely(before(TCP_SKB_CB(skb)->end_seq, tp->snd_una))) { WARN_ON_ONCE(1); return -EINVAL;
From: David Arinzon darinzon@amazon.com
stable inclusion from stable-v6.6.8 commit c22877fafd6b905e2714701fb3611953ff1b869e bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 41db6f99b5489a0d2ef26afe816ef0c6118d1d47 ]
The ena_setup_and_create_all_xdp_queues() function freed all the resources upon failure, after creating only xdp_num_queues queues, instead of freeing just the created ones.
In this patch, the only resources that are freed, are the ones allocated right before the failure occurs.
Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Shahar Itzko itzko@amazon.com Signed-off-by: David Arinzon darinzon@amazon.com Link: https://lore.kernel.org/r/20231211062801.27891-2-darinzon@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index f955bde10cf9..098025d29247 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -74,6 +74,8 @@ static void ena_unmap_tx_buff(struct ena_ring *tx_ring, struct ena_tx_buffer *tx_info); static int ena_create_io_tx_queues_in_range(struct ena_adapter *adapter, int first_index, int count); +static void ena_free_all_io_tx_resources_in_range(struct ena_adapter *adapter, + int first_index, int count);
/* Increase a stat by cnt while holding syncp seqlock on 32bit machines */ static void ena_increase_stat(u64 *statp, u64 cnt, @@ -457,23 +459,22 @@ static void ena_init_all_xdp_queues(struct ena_adapter *adapter)
static int ena_setup_and_create_all_xdp_queues(struct ena_adapter *adapter) { + u32 xdp_first_ring = adapter->xdp_first_ring; + u32 xdp_num_queues = adapter->xdp_num_queues; int rc = 0;
- rc = ena_setup_tx_resources_in_range(adapter, adapter->xdp_first_ring, - adapter->xdp_num_queues); + rc = ena_setup_tx_resources_in_range(adapter, xdp_first_ring, xdp_num_queues); if (rc) goto setup_err;
- rc = ena_create_io_tx_queues_in_range(adapter, - adapter->xdp_first_ring, - adapter->xdp_num_queues); + rc = ena_create_io_tx_queues_in_range(adapter, xdp_first_ring, xdp_num_queues); if (rc) goto create_err;
return 0;
create_err: - ena_free_all_io_tx_resources(adapter); + ena_free_all_io_tx_resources_in_range(adapter, xdp_first_ring, xdp_num_queues); setup_err: return rc; }
From: David Arinzon darinzon@amazon.com
stable inclusion from stable-v6.6.8 commit 0cb2021b968e701582c984382b9a9efa55c36466 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 505b1a88d311ff6f8c44a34f94e3be21745cce6f ]
Current xdp code drops packets larger than ENA_XDP_MAX_MTU. This is an incorrect condition since the problem is not the size of the packet, rather the number of buffers it contains.
This commit:
1. Identifies and drops XDP multi-buffer packets at the beginning of the function. 2. Increases the xdp drop statistic when this drop occurs. 3. Adds a one-time print that such drops are happening to give better indication to the user.
Fixes: 838c93dc5449 ("net: ena: implement XDP drop support") Signed-off-by: Arthur Kiyanovski akiyano@amazon.com Signed-off-by: David Arinzon darinzon@amazon.com Link: https://lore.kernel.org/r/20231211062801.27891-3-darinzon@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 098025d29247..b638e1d3d151 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -1672,20 +1672,23 @@ static void ena_set_rx_hash(struct ena_ring *rx_ring, } }
-static int ena_xdp_handle_buff(struct ena_ring *rx_ring, struct xdp_buff *xdp) +static int ena_xdp_handle_buff(struct ena_ring *rx_ring, struct xdp_buff *xdp, u16 num_descs) { struct ena_rx_buffer *rx_info; int ret;
+ /* XDP multi-buffer packets not supported */ + if (unlikely(num_descs > 1)) { + netdev_err_once(rx_ring->adapter->netdev, + "xdp: dropped unsupported multi-buffer packets\n"); + ena_increase_stat(&rx_ring->rx_stats.xdp_drop, 1, &rx_ring->syncp); + return ENA_XDP_DROP; + } + rx_info = &rx_ring->rx_buffer_info[rx_ring->ena_bufs[0].req_id]; xdp_prepare_buff(xdp, page_address(rx_info->page), rx_info->buf_offset, rx_ring->ena_bufs[0].len, false); - /* If for some reason we received a bigger packet than - * we expect, then we simply drop it - */ - if (unlikely(rx_ring->ena_bufs[0].len > ENA_XDP_MAX_MTU)) - return ENA_XDP_DROP;
ret = ena_xdp_execute(rx_ring, xdp);
@@ -1754,7 +1757,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi, ena_rx_ctx.l4_proto, ena_rx_ctx.hash);
if (ena_xdp_present_ring(rx_ring)) - xdp_verdict = ena_xdp_handle_buff(rx_ring, &xdp); + xdp_verdict = ena_xdp_handle_buff(rx_ring, &xdp, ena_rx_ctx.descs);
/* allocate skb and fill it */ if (xdp_verdict == ENA_XDP_PASS)
From: David Arinzon darinzon@amazon.com
stable inclusion from stable-v6.6.8 commit 0116e02cee5a1a6d304f64faa22eea8cccb4ca9f bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit d760117060cf2e90b5c59c5492cab179a4dbce01 ]
This patch fixes two issues:
Issue 1 ------- Description ``````````` Current code does not call dma_sync_single_for_cpu() to sync data from the device side memory to the CPU side memory before the XDP code path uses the CPU side data. This causes the XDP code path to read the unset garbage data in the CPU side memory, resulting in incorrect handling of the packet by XDP.
Solution ```````` 1. Add a call to dma_sync_single_for_cpu() before the XDP code starts to use the data in the CPU side memory. 2. The XDP code verdict can be XDP_PASS, in which case there is a fallback to the non-XDP code, which also calls dma_sync_single_for_cpu(). To avoid calling dma_sync_single_for_cpu() twice: 2.1. Put the dma_sync_single_for_cpu() in the code in such a place where it happens before XDP and non-XDP code. 2.2. Remove the calls to dma_sync_single_for_cpu() in the non-XDP code for the first buffer only (rx_copybreak and non-rx_copybreak cases), since the new call that was added covers these cases. The call to dma_sync_single_for_cpu() for the second buffer and on stays because only the first buffer is handled by the newly added dma_sync_single_for_cpu(). And there is no need for special handling of the second buffer and on for the XDP path since currently the driver supports only single buffer packets.
Issue 2 ------- Description ``````````` In case the XDP code forwarded the packet (ENA_XDP_FORWARDED), ena_unmap_rx_buff_attrs() is called with attrs set to 0. This means that before unmapping the buffer, the internal function dma_unmap_page_attrs() will also call dma_sync_single_for_cpu() on the whole buffer (not only on the data part of it). This sync is both wasteful (since a sync was already explicitly called before) and also causes a bug, which will be explained using the below diagram.
The following diagram shows the flow of events causing the bug. The order of events is (1)-(4) as shown in the diagram.
CPU side memory area
(3)convert_to_xdp_frame() initializes the headroom with xdpf metadata || / ___________________________________ | | 0 | V 4K --------------------------------------------------------------------- | xdpf->data | other xdpf | < data > | tailroom ||...| | | fields | | GARBAGE || | ---------------------------------------------------------------------
/\ /\ || || (4)ena_unmap_rx_buff_attrs() calls (2)dma_sync_single_for_cpu() dma_sync_single_for_cpu() on the copies data from device whole buffer page, overwriting side to CPU side memory the xdpf->data with GARBAGE. || 0 4K --------------------------------------------------------------------- | headroom | < data > | tailroom ||...| | GARBAGE | | GARBAGE || | ---------------------------------------------------------------------
Device side memory area /\ || (1) device writes RX packet data
After the call to ena_unmap_rx_buff_attrs() in (4), the xdpf->data becomes corrupted, and so when it is later accessed in ena_clean_xdp_irq()->xdp_return_frame(), it causes a page fault, crashing the kernel.
Solution ```````` Explicitly tell ena_unmap_rx_buff_attrs() not to call dma_sync_single_for_cpu() by passing it the ENA_DMA_ATTR_SKIP_CPU_SYNC flag.
Fixes: f7d625adeb7b ("net: ena: Add dynamic recycling mechanism for rx buffers") Signed-off-by: Arthur Kiyanovski akiyano@amazon.com Signed-off-by: David Arinzon darinzon@amazon.com Link: https://lore.kernel.org/r/20231211062801.27891-4-darinzon@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 23 ++++++++------------ 1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index b638e1d3d151..14e41eb57731 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -1493,11 +1493,6 @@ static struct sk_buff *ena_rx_skb(struct ena_ring *rx_ring, if (unlikely(!skb)) return NULL;
- /* sync this buffer for CPU use */ - dma_sync_single_for_cpu(rx_ring->dev, - dma_unmap_addr(&rx_info->ena_buf, paddr) + pkt_offset, - len, - DMA_FROM_DEVICE); skb_copy_to_linear_data(skb, buf_addr + buf_offset, len); dma_sync_single_for_device(rx_ring->dev, dma_unmap_addr(&rx_info->ena_buf, paddr) + pkt_offset, @@ -1516,17 +1511,10 @@ static struct sk_buff *ena_rx_skb(struct ena_ring *rx_ring,
buf_len = SKB_DATA_ALIGN(len + buf_offset + tailroom);
- pre_reuse_paddr = dma_unmap_addr(&rx_info->ena_buf, paddr); - /* If XDP isn't loaded try to reuse part of the RX buffer */ reuse_rx_buf_page = !is_xdp_loaded && ena_try_rx_buf_page_reuse(rx_info, buf_len, len, pkt_offset);
- dma_sync_single_for_cpu(rx_ring->dev, - pre_reuse_paddr + pkt_offset, - len, - DMA_FROM_DEVICE); - if (!reuse_rx_buf_page) ena_unmap_rx_buff_attrs(rx_ring, rx_info, DMA_ATTR_SKIP_CPU_SYNC);
@@ -1723,6 +1711,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi, int xdp_flags = 0; int total_len = 0; int xdp_verdict; + u8 pkt_offset; int rc = 0; int i;
@@ -1749,13 +1738,19 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi,
/* First descriptor might have an offset set by the device */ rx_info = &rx_ring->rx_buffer_info[rx_ring->ena_bufs[0].req_id]; - rx_info->buf_offset += ena_rx_ctx.pkt_offset; + pkt_offset = ena_rx_ctx.pkt_offset; + rx_info->buf_offset += pkt_offset;
netif_dbg(rx_ring->adapter, rx_status, rx_ring->netdev, "rx_poll: q %d got packet from ena. descs #: %d l3 proto %d l4 proto %d hash: %x\n", rx_ring->qid, ena_rx_ctx.descs, ena_rx_ctx.l3_proto, ena_rx_ctx.l4_proto, ena_rx_ctx.hash);
+ dma_sync_single_for_cpu(rx_ring->dev, + dma_unmap_addr(&rx_info->ena_buf, paddr) + pkt_offset, + rx_ring->ena_bufs[0].len, + DMA_FROM_DEVICE); + if (ena_xdp_present_ring(rx_ring)) xdp_verdict = ena_xdp_handle_buff(rx_ring, &xdp, ena_rx_ctx.descs);
@@ -1781,7 +1776,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi, if (xdp_verdict & ENA_XDP_FORWARDED) { ena_unmap_rx_buff_attrs(rx_ring, &rx_ring->rx_buffer_info[req_id], - 0); + DMA_ATTR_SKIP_CPU_SYNC); rx_ring->rx_buffer_info[req_id].page = NULL; } }
From: David Arinzon darinzon@amazon.com
stable inclusion from stable-v6.6.8 commit 2cc8ffc3ad31b5618ad6b34450274d3a9817ce1f bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 4ab138ca0a340e6d6e7a6a9bd5004bd8f83127ca ]
When sending TX packets, the meta descriptor can be all zeroes as no meta information is required (as in XDP).
This patch removes the validity check, as when `disable_meta_caching` is enabled, such TX packets will be dropped otherwise.
Fixes: 0e3a3f6dacf0 ("net: ena: support new LLQ acceleration mode") Signed-off-by: Shay Agroskin shayagr@amazon.com Signed-off-by: David Arinzon darinzon@amazon.com Link: https://lore.kernel.org/r/20231211062801.27891-5-darinzon@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/amazon/ena/ena_eth_com.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.c b/drivers/net/ethernet/amazon/ena/ena_eth_com.c index 3d6f0a466a9e..f9f886289b97 100644 --- a/drivers/net/ethernet/amazon/ena/ena_eth_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.c @@ -328,9 +328,6 @@ static int ena_com_create_and_store_tx_meta_desc(struct ena_com_io_sq *io_sq, * compare it to the stored version, just create the meta */ if (io_sq->disable_meta_caching) { - if (unlikely(!ena_tx_ctx->meta_valid)) - return -EINVAL; - *have_meta = true; return ena_com_create_meta(io_sq, ena_meta); }
From: Yanteng Si siyanteng@loongson.cn
stable inclusion from stable-v6.6.8 commit e0069c26c552c5c7f1009918eedac31a1eaa141b bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit e87d3a1370ce9f04770d789bcf7cce44865d2e8d ]
Generic code will use mdio. If it is not initialized before use, the kernel will Oops.
Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson") Signed-off-by: Yanteng Si siyanteng@loongson.cn Signed-off-by: Feiyang Chen chenfeiyang@loongson.cn Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- .../net/ethernet/stmicro/stmmac/dwmac-loongson.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c index 2cd6fce5c993..e7701326adc6 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c @@ -68,17 +68,15 @@ static int loongson_dwmac_probe(struct pci_dev *pdev, const struct pci_device_id if (!plat) return -ENOMEM;
+ plat->mdio_bus_data = devm_kzalloc(&pdev->dev, + sizeof(*plat->mdio_bus_data), + GFP_KERNEL); + if (!plat->mdio_bus_data) + return -ENOMEM; + plat->mdio_node = of_get_child_by_name(np, "mdio"); if (plat->mdio_node) { dev_info(&pdev->dev, "Found MDIO subnode\n"); - - plat->mdio_bus_data = devm_kzalloc(&pdev->dev, - sizeof(*plat->mdio_bus_data), - GFP_KERNEL); - if (!plat->mdio_bus_data) { - ret = -ENOMEM; - goto err_put_node; - } plat->mdio_bus_data->needs_reset = true; }
From: Yusong Gao a869920004@gmail.com
stable inclusion from stable-v6.6.8 commit f18ac4bae15ea7821716c8d46707bdeaa594b64e bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 829649443e78d85db0cff0c37cadb28fbb1a5f6f ]
There are some wrong return values check in sign-file when call OpenSSL API. The ERR() check cond is wrong because of the program only check the return value is < 0 which ignored the return val is 0. For example: 1. CMS_final() return 1 for success or 0 for failure. 2. i2d_CMS_bio_stream() returns 1 for success or 0 for failure. 3. i2d_TYPEbio() return 1 for success and 0 for failure. 4. BIO_free() return 1 for success and 0 for failure.
Link: https://www.openssl.org/docs/manmaster/man3/ Fixes: e5a2e3c84782 ("scripts/sign-file.c: Add support for signing with a raw signature") Signed-off-by: Yusong Gao a869920004@gmail.com Reviewed-by: Juerg Haefliger juerg.haefliger@canonical.com Signed-off-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/20231213024405.624692-1-a869920004@gmail.com/ # v5 Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- scripts/sign-file.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/scripts/sign-file.c b/scripts/sign-file.c index 598ef5465f82..3edb156ae52c 100644 --- a/scripts/sign-file.c +++ b/scripts/sign-file.c @@ -322,7 +322,7 @@ int main(int argc, char **argv) CMS_NOSMIMECAP | use_keyid | use_signed_attrs), "CMS_add1_signer"); - ERR(CMS_final(cms, bm, NULL, CMS_NOCERTS | CMS_BINARY) < 0, + ERR(CMS_final(cms, bm, NULL, CMS_NOCERTS | CMS_BINARY) != 1, "CMS_final");
#else @@ -341,10 +341,10 @@ int main(int argc, char **argv) b = BIO_new_file(sig_file_name, "wb"); ERR(!b, "%s", sig_file_name); #ifndef USE_PKCS7 - ERR(i2d_CMS_bio_stream(b, cms, NULL, 0) < 0, + ERR(i2d_CMS_bio_stream(b, cms, NULL, 0) != 1, "%s", sig_file_name); #else - ERR(i2d_PKCS7_bio(b, pkcs7) < 0, + ERR(i2d_PKCS7_bio(b, pkcs7) != 1, "%s", sig_file_name); #endif BIO_free(b); @@ -374,9 +374,9 @@ int main(int argc, char **argv)
if (!raw_sig) { #ifndef USE_PKCS7 - ERR(i2d_CMS_bio_stream(bd, cms, NULL, 0) < 0, "%s", dest_name); + ERR(i2d_CMS_bio_stream(bd, cms, NULL, 0) != 1, "%s", dest_name); #else - ERR(i2d_PKCS7_bio(bd, pkcs7) < 0, "%s", dest_name); + ERR(i2d_PKCS7_bio(bd, pkcs7) != 1, "%s", dest_name); #endif } else { BIO *b; @@ -396,7 +396,7 @@ int main(int argc, char **argv) ERR(BIO_write(bd, &sig_info, sizeof(sig_info)) < 0, "%s", dest_name); ERR(BIO_write(bd, magic_number, sizeof(magic_number) - 1) < 0, "%s", dest_name);
- ERR(BIO_free(bd) < 0, "%s", dest_name); + ERR(BIO_free(bd) != 1, "%s", dest_name);
/* Finally, if we're signing in place, replace the original. */ if (replace_orig)
From: Nikolay Kuratov kniv@yandex-team.ru
stable inclusion from stable-v6.6.8 commit fa634779ffcc6938882bb6ac28308973c4975eab bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 60316d7f10b17a7ebb1ead0642fee8710e1560e0 ]
We need to do signed arithmetic if we expect condition `if (bytes < 0)` to be possible
Found by Linux Verification Center (linuxtesting.org) with SVACE
Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko") Signed-off-by: Nikolay Kuratov kniv@yandex-team.ru Reviewed-by: Stefano Garzarella sgarzare@redhat.com Link: https://lore.kernel.org/r/20231211162317.4116625-1-kniv@yandex-team.ru Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- net/vmw_vsock/virtio_transport_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index 8bc272b6003b..4084578b0b91 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -679,7 +679,7 @@ static s64 virtio_transport_has_space(struct vsock_sock *vsk) struct virtio_vsock_sock *vvs = vsk->trans; s64 bytes;
- bytes = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt); + bytes = (s64)vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt); if (bytes < 0) bytes = 0;
From: Ioana Ciornei ioana.ciornei@nxp.com
stable inclusion from stable-v6.6.8 commit 77e566c8813024d9e75ce6b8e9618d94eb9f05fa bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 2aad7d4189a923b24efa8ea6ad09059882b1bfe4 ]
The size of the DMA unmap was wrongly put as a sizeof of a pointer. Change the value of the DMA unmap to be the actual macro used for the allocation and the DMA map.
Fixes: 1110318d83e8 ("dpaa2-switch: add tc flower hardware offload on ingress traffic") Signed-off-by: Ioana Ciornei ioana.ciornei@nxp.com Link: https://lore.kernel.org/r/20231212164326.2753457-2-ioana.ciornei@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/freescale/dpaa2/dpaa2-switch-flower.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch-flower.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch-flower.c index 4798fb7fe35d..b6a534a3e0b1 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch-flower.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch-flower.c @@ -139,7 +139,8 @@ int dpaa2_switch_acl_entry_add(struct dpaa2_switch_filter_block *filter_block, err = dpsw_acl_add_entry(ethsw->mc_io, 0, ethsw->dpsw_handle, filter_block->acl_id, acl_entry_cfg);
- dma_unmap_single(dev, acl_entry_cfg->key_iova, sizeof(cmd_buff), + dma_unmap_single(dev, acl_entry_cfg->key_iova, + DPAA2_ETHSW_PORT_ACL_CMD_BUF_SIZE, DMA_TO_DEVICE); if (err) { dev_err(dev, "dpsw_acl_add_entry() failed %d\n", err); @@ -181,8 +182,8 @@ dpaa2_switch_acl_entry_remove(struct dpaa2_switch_filter_block *block, err = dpsw_acl_remove_entry(ethsw->mc_io, 0, ethsw->dpsw_handle, block->acl_id, acl_entry_cfg);
- dma_unmap_single(dev, acl_entry_cfg->key_iova, sizeof(cmd_buff), - DMA_TO_DEVICE); + dma_unmap_single(dev, acl_entry_cfg->key_iova, + DPAA2_ETHSW_PORT_ACL_CMD_BUF_SIZE, DMA_TO_DEVICE); if (err) { dev_err(dev, "dpsw_acl_remove_entry() failed %d\n", err); kfree(cmd_buff);
From: Ioana Ciornei ioana.ciornei@nxp.com
stable inclusion from stable-v6.6.8 commit da8732cb38eac4461d7cbbfe51f6b43db18965c7 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit f24a49a375f65e8e75ee1b19d806f46dbaae57fd ]
Starting with commit 4e51bf44a03a ("net: bridge: move the switchdev object replay helpers to "push" mode") the switchdev_bridge_port_offload() helper was extended with the intention to provide switchdev drivers easy access to object addition and deletion replays. This works by calling the replay helpers with non-NULL notifier blocks.
In the same commit, the dpaa2-switch driver was updated so that it passes valid notifier blocks to the helper. At that moment, no regression was identified through testing.
In the meantime, the blamed commit changed the behavior in terms of which ports get hit by the replay. Before this commit, only the initial port which identified itself as offloaded through switchdev_bridge_port_offload() got a replay of all port objects and FDBs. After this, the newly joining port will trigger a replay of objects on all bridge ports and on the bridge itself.
This behavior leads to errors in dpaa2_switch_port_vlans_add() when a VLAN gets installed on the same interface multiple times.
The intended mechanism to address this is to pass a non-NULL ctx to the switchdev_bridge_port_offload() helper and then check it against the port's private structure. But since the driver does not have any use for the replayed port objects and FDBs until it gains support for LAG offload, it's better to fix the issue by reverting the dpaa2-switch driver to not ask for replay. The pointers will be added back when we are prepared to ignore replays on unrelated ports.
Fixes: b28d580e2939 ("net: bridge: switchdev: replay all VLAN groups") Signed-off-by: Ioana Ciornei ioana.ciornei@nxp.com Link: https://lore.kernel.org/r/20231212164326.2753457-3-ioana.ciornei@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c index 97d3151076d5..e01a246124ac 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c @@ -1998,9 +1998,6 @@ static int dpaa2_switch_port_attr_set_event(struct net_device *netdev, return notifier_from_errno(err); }
-static struct notifier_block dpaa2_switch_port_switchdev_nb; -static struct notifier_block dpaa2_switch_port_switchdev_blocking_nb; - static int dpaa2_switch_port_bridge_join(struct net_device *netdev, struct net_device *upper_dev, struct netlink_ext_ack *extack) @@ -2043,9 +2040,7 @@ static int dpaa2_switch_port_bridge_join(struct net_device *netdev, goto err_egress_flood;
err = switchdev_bridge_port_offload(netdev, netdev, NULL, - &dpaa2_switch_port_switchdev_nb, - &dpaa2_switch_port_switchdev_blocking_nb, - false, extack); + NULL, NULL, false, extack); if (err) goto err_switchdev_offload;
@@ -2079,9 +2074,7 @@ static int dpaa2_switch_port_restore_rxvlan(struct net_device *vdev, int vid, vo
static void dpaa2_switch_port_pre_bridge_leave(struct net_device *netdev) { - switchdev_bridge_port_unoffload(netdev, NULL, - &dpaa2_switch_port_switchdev_nb, - &dpaa2_switch_port_switchdev_blocking_nb); + switchdev_bridge_port_unoffload(netdev, NULL, NULL, NULL); }
static int dpaa2_switch_port_bridge_leave(struct net_device *netdev)
From: Sneh Shah quic_snehshah@quicinc.com
stable inclusion from stable-v6.6.8 commit ad531dfcc648eb24d512a4acfeec76000eabaf42 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 981d947bcd382c3950a593690e0e13d194d65b1c ]
In 10M SGMII mode all the packets are being dropped due to wrong Rx clock. SGMII 10MBPS mode needs RX clock divider programmed to avoid drops in Rx. Update configure SGMII function with Rx clk divider programming.
Fixes: 463120c31c58 ("net: stmmac: dwmac-qcom-ethqos: add support for SGMII") Tested-by: Andrew Halaney ahalaney@redhat.com Signed-off-by: Sneh Shah quic_snehshah@quicinc.com Reviewed-by: Bjorn Andersson quic_bjorande@quicinc.com Link: https://lore.kernel.org/r/20231212092208.22393-1-quic_snehshah@quicinc.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- .../net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c index d3bf42d0fceb..31631e3f89d0 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c @@ -34,6 +34,7 @@ #define RGMII_CONFIG_LOOPBACK_EN BIT(2) #define RGMII_CONFIG_PROG_SWAP BIT(1) #define RGMII_CONFIG_DDR_MODE BIT(0) +#define RGMII_CONFIG_SGMII_CLK_DVDR GENMASK(18, 10)
/* SDCC_HC_REG_DLL_CONFIG fields */ #define SDCC_DLL_CONFIG_DLL_RST BIT(30) @@ -78,6 +79,8 @@ #define ETHQOS_MAC_CTRL_SPEED_MODE BIT(14) #define ETHQOS_MAC_CTRL_PORT_SEL BIT(15)
+#define SGMII_10M_RX_CLK_DVDR 0x31 + struct ethqos_emac_por { unsigned int offset; unsigned int value; @@ -598,6 +601,9 @@ static int ethqos_configure_rgmii(struct qcom_ethqos *ethqos) return 0; }
+/* On interface toggle MAC registers gets reset. + * Configure MAC block for SGMII on ethernet phy link up + */ static int ethqos_configure_sgmii(struct qcom_ethqos *ethqos) { int val; @@ -617,6 +623,10 @@ static int ethqos_configure_sgmii(struct qcom_ethqos *ethqos) case SPEED_10: val |= ETHQOS_MAC_CTRL_PORT_SEL; val &= ~ETHQOS_MAC_CTRL_SPEED_MODE; + rgmii_updatel(ethqos, RGMII_CONFIG_SGMII_CLK_DVDR, + FIELD_PREP(RGMII_CONFIG_SGMII_CLK_DVDR, + SGMII_10M_RX_CLK_DVDR), + RGMII_IO_MACRO_CONFIG); break; }
From: Andrew Halaney ahalaney@redhat.com
stable inclusion from stable-v6.6.8 commit 58c556661641c5ecbfc064067a31729d8d2c193d bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit e23c0d21ce9234fbc31ece35663ababbb83f9347 ]
Many hardware configurations have the MDIO bus disabled, and are instead using some other MDIO bus to talk to the MAC's phy.
of_mdiobus_register() returns -ENODEV in this case. Let's handle it gracefully instead of failing to probe the MAC.
Fixes: 47dd7a540b8a ("net: add support for STMicroelectronics Ethernet controllers.") Signed-off-by: Andrew Halaney ahalaney@redhat.com Reviewed-by: Serge Semin fancer.lancer@gmail.com Link: https://lore.kernel.org/r/20231212-b4-stmmac-handle-mdio-enodev-v2-1-600171a... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c index fa9e7e7040b9..0542cfd1817e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c @@ -591,7 +591,11 @@ int stmmac_mdio_register(struct net_device *ndev) new_bus->parent = priv->device;
err = of_mdiobus_register(new_bus, mdio_node); - if (err != 0) { + if (err == -ENODEV) { + err = 0; + dev_info(dev, "MDIO bus is disabled\n"); + goto bus_register_fail; + } else if (err) { dev_err_probe(dev, err, "Cannot register the MDIO bus\n"); goto bus_register_fail; }
From: Hyunwoo Kim v4bel@theori.io
stable inclusion from stable-v6.6.8 commit e15ded324a3911358e8541a1b573665f99f216ef bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 189ff16722ee36ced4d2a2469d4ab65a8fee4198 ]
Because atalk_ioctl() accesses sk->sk_receive_queue without holding a sk->sk_receive_queue.lock, it can cause a race with atalk_recvmsg(). A use-after-free for skb occurs with the following flow. ``` atalk_ioctl() -> skb_peek() atalk_recvmsg() -> skb_recv_datagram() -> skb_free_datagram() ``` Add sk->sk_receive_queue.lock to atalk_ioctl() to fix this issue.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Hyunwoo Kim v4bel@theori.io Link: https://lore.kernel.org/r/20231213041056.GA519680@v4bel-B760M-AORUS-ELITE-AX Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- net/appletalk/ddp.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c index 8978fb6212ff..b070a8991200 100644 --- a/net/appletalk/ddp.c +++ b/net/appletalk/ddp.c @@ -1811,15 +1811,14 @@ static int atalk_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) break; } case TIOCINQ: { - /* - * These two are safe on a single CPU system as only - * user tasks fiddle here - */ - struct sk_buff *skb = skb_peek(&sk->sk_receive_queue); + struct sk_buff *skb; long amount = 0;
+ spin_lock_irq(&sk->sk_receive_queue.lock); + skb = skb_peek(&sk->sk_receive_queue); if (skb) amount = skb->len - sizeof(struct ddpehdr); + spin_unlock_irq(&sk->sk_receive_queue.lock); rc = put_user(amount, (int __user *)argp); break; }
From: Igor Russkikh irusskikh@marvell.com
stable inclusion from stable-v6.6.8 commit 3b5daf20c426b04d9278c0d3d6e3c7e0d6f5c852 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 7bb26ea74aa86fdf894b7dbd8c5712c5b4187da7 ]
Driver has a logic leak in ring data allocation/free, where double free may happen in aq_ring_free if system is under stress and driver init/deinit is happening.
The probability is higher to get this during suspend/resume cycle.
Verification was done simulating same conditions with
stress -m 2000 --vm-bytes 20M --vm-hang 10 --backoff 1000 while true; do sudo ifconfig enp1s0 down; sudo ifconfig enp1s0 up; done
Fixed by explicitly clearing pointers to NULL on deallocation
Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code") Reported-by: Linus Torvalds torvalds@linux-foundation.org Closes: https://lore.kernel.org/netdev/CAHk-=wiZZi7FcvqVSUirHBjx0bBUZ4dFrMDVLc3+3HCr... Signed-off-by: Igor Russkikh irusskikh@marvell.com Link: https://lore.kernel.org/r/20231213094044.22988-1-irusskikh@marvell.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c index 694daeaf3e61..e1885c1eb100 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c @@ -938,11 +938,14 @@ void aq_ring_free(struct aq_ring_s *self) return;
kfree(self->buff_ring); + self->buff_ring = NULL;
- if (self->dx_ring) + if (self->dx_ring) { dma_free_coherent(aq_nic_get_dev(self->aq_nic), self->size * self->dx_size, self->dx_ring, self->dx_ring_pa); + self->dx_ring = NULL; + } }
unsigned int aq_ring_fill_stats_data(struct aq_ring_s *self, u64 *data)
From: Mario Limonciello mario.limonciello@amd.com
stable inclusion from stable-v6.6.8 commit fea8562f51b001de81a8115e34d138386b525afb bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit a9f68ffe1170ca4bc17ab29067d806a354a026e0 upstream.
Users have reported problems with recent Lenovo laptops that contain an IDEA5002 I2C HID device. Reports include fans turning on and running even at idle and spurious wakeups from suspend.
Presumably in the Windows ecosystem there is an application that uses the HID device. Maybe that puts it into a lower power state so it doesn't cause spurious events.
This device doesn't serve any functional purpose in Linux as nothing interacts with it so blacklist it from being probed. This will prevent the GPIO driver from setting up the GPIO and the spurious interrupts and wake events will not occur.
Cc: stable@vger.kernel.org # 6.1 Reported-and-tested-by: Marcus Aram marcus+oss@oxar.nl Reported-and-tested-by: Mark Herbert mark.herbert42@gmail.com Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2812 Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Jiri Kosina jkosina@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/hid/i2c-hid/i2c-hid-acpi.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/hid/i2c-hid/i2c-hid-acpi.c b/drivers/hid/i2c-hid/i2c-hid-acpi.c index ac918a9ea8d3..1b49243adb16 100644 --- a/drivers/hid/i2c-hid/i2c-hid-acpi.c +++ b/drivers/hid/i2c-hid/i2c-hid-acpi.c @@ -40,6 +40,11 @@ static const struct acpi_device_id i2c_hid_acpi_blacklist[] = { * ICN8505 controller, has a _CID of PNP0C50 but is not HID compatible. */ { "CHPN0001" }, + /* + * The IDEA5002 ACPI device causes high interrupt usage and spurious + * wakeups from suspend. + */ + { "IDEA5002" }, { } };
From: Sebastian Parschauer s.parschauer@gmx.de
stable inclusion from stable-v6.6.8 commit 6e5782b1e18b9dfe01fe220f14d5d4f25292a40e bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 31e52523267faab5ed8569b9d5c22c9a2283872f upstream.
This device needs ALWAYS_POLL quirk, otherwise it keeps reconnecting indefinitely. It is a handbrake for sim racing detected as joystick. Reported and tested by GitHub user N0th1ngM4tt3rs.
Link: https://github.com/sriemer/fix-linux-mouse issue 22 Signed-off-by: Sebastian Parschauer s.parschauer@gmx.de Signed-off-by: Jiri Kosina jkosina@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-quirks.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index d10ccfa17e16..97ab317570dd 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -744,6 +744,7 @@
#define USB_VENDOR_ID_LABTEC 0x1020 #define USB_DEVICE_ID_LABTEC_WIRELESS_KEYBOARD 0x0006 +#define USB_DEVICE_ID_LABTEC_ODDOR_HANDBRAKE 0x8888
#define USB_VENDOR_ID_LCPOWER 0x1241 #define USB_DEVICE_ID_LCPOWER_LC1000 0xf767 diff --git a/drivers/hid/hid-quirks.c b/drivers/hid/hid-quirks.c index 5a48fcaa32f0..d9a4f8f7bbb0 100644 --- a/drivers/hid/hid-quirks.c +++ b/drivers/hid/hid-quirks.c @@ -120,6 +120,7 @@ static const struct hid_device_id hid_quirks[] = { { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_EASYPEN_M406XE), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_MOUSEPEN_I608X_V2), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_PENSKETCH_T609A), HID_QUIRK_MULTI_INPUT }, + { HID_USB_DEVICE(USB_VENDOR_ID_LABTEC, USB_DEVICE_ID_LABTEC_ODDOR_HANDBRAKE), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_OPTICAL_USB_MOUSE_600E), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_PIXART_USB_MOUSE_608D), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_PIXART_USB_MOUSE_6019), HID_QUIRK_ALWAYS_POLL },
From: Tyler Fanelli tfanelli@redhat.com
stable inclusion from stable-v6.6.8 commit 9f36c1c5132f3a03f420e88924b8829b508c59b8 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit c55e0a55b165202f18cbc4a20650d2e1becd5507 upstream.
Although DIRECT_IO_RELAX's initial usage is to allow shared mmap, its description indicates a purpose of reducing memory footprint. This may imply that it could be further used to relax other DIRECT_IO operations in the future.
Replace it with a flag DIRECT_IO_ALLOW_MMAP which does only one thing, allow shared mmap of DIRECT_IO files while still bypassing the cache on regular reads and writes.
[Miklos] Also Keep DIRECT_IO_RELAX definition for backward compatibility.
Signed-off-by: Tyler Fanelli tfanelli@redhat.com Fixes: e78662e818f9 ("fuse: add a new fuse init flag to relax restrictions in no cache mode") Cc: stable@vger.kernel.org # v6.6 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/fuse/file.c | 6 +++--- fs/fuse/fuse_i.h | 4 ++-- fs/fuse/inode.c | 6 +++--- include/uapi/linux/fuse.h | 10 ++++++---- 4 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 1cdb6327511e..89e870d1a526 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1448,7 +1448,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter, if (!ia) return -ENOMEM;
- if (fopen_direct_io && fc->direct_io_relax) { + if (fopen_direct_io && fc->direct_io_allow_mmap) { res = filemap_write_and_wait_range(mapping, pos, pos + count - 1); if (res) { fuse_io_free(ia); @@ -2466,9 +2466,9 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
if (ff->open_flags & FOPEN_DIRECT_IO) { /* Can't provide the coherency needed for MAP_SHARED - * if FUSE_DIRECT_IO_RELAX isn't set. + * if FUSE_DIRECT_IO_ALLOW_MMAP isn't set. */ - if ((vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_relax) + if ((vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_allow_mmap) return -ENODEV;
invalidate_inode_pages2(file->f_mapping); diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 5c83c1d95cf7..69bbcf05ecfc 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -802,8 +802,8 @@ struct fuse_conn { /* Is tmpfile not implemented by fs? */ unsigned int no_tmpfile:1;
- /* relax restrictions in FOPEN_DIRECT_IO mode */ - unsigned int direct_io_relax:1; + /* Relax restrictions to allow shared mmap in FOPEN_DIRECT_IO mode */ + unsigned int direct_io_allow_mmap:1;
/* Is statx not implemented by fs? */ unsigned int no_statx:1; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 2e4eb7cf26fb..444418e240c8 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -1232,8 +1232,8 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args, fc->init_security = 1; if (flags & FUSE_CREATE_SUPP_GROUP) fc->create_supp_group = 1; - if (flags & FUSE_DIRECT_IO_RELAX) - fc->direct_io_relax = 1; + if (flags & FUSE_DIRECT_IO_ALLOW_MMAP) + fc->direct_io_allow_mmap = 1; } else { ra_pages = fc->max_read / PAGE_SIZE; fc->no_lock = 1; @@ -1280,7 +1280,7 @@ void fuse_send_init(struct fuse_mount *fm) FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA | FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT | FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP | - FUSE_HAS_EXPIRE_ONLY | FUSE_DIRECT_IO_RELAX; + FUSE_HAS_EXPIRE_ONLY | FUSE_DIRECT_IO_ALLOW_MMAP; #ifdef CONFIG_FUSE_DAX if (fm->fc->dax) flags |= FUSE_MAP_ALIGNMENT; diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index db92a7202b34..e7418d15fe39 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -209,7 +209,7 @@ * - add FUSE_HAS_EXPIRE_ONLY * * 7.39 - * - add FUSE_DIRECT_IO_RELAX + * - add FUSE_DIRECT_IO_ALLOW_MMAP * - add FUSE_STATX and related structures */
@@ -409,8 +409,7 @@ struct fuse_file_lock { * FUSE_CREATE_SUPP_GROUP: add supplementary group info to create, mkdir, * symlink and mknod (single group that matches parent) * FUSE_HAS_EXPIRE_ONLY: kernel supports expiry-only entry invalidation - * FUSE_DIRECT_IO_RELAX: relax restrictions in FOPEN_DIRECT_IO mode, for now - * allow shared mmap + * FUSE_DIRECT_IO_ALLOW_MMAP: allow shared mmap in FOPEN_DIRECT_IO mode. */ #define FUSE_ASYNC_READ (1 << 0) #define FUSE_POSIX_LOCKS (1 << 1) @@ -449,7 +448,10 @@ struct fuse_file_lock { #define FUSE_HAS_INODE_DAX (1ULL << 33) #define FUSE_CREATE_SUPP_GROUP (1ULL << 34) #define FUSE_HAS_EXPIRE_ONLY (1ULL << 35) -#define FUSE_DIRECT_IO_RELAX (1ULL << 36) +#define FUSE_DIRECT_IO_ALLOW_MMAP (1ULL << 36) + +/* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */ +#define FUSE_DIRECT_IO_RELAX FUSE_DIRECT_IO_ALLOW_MMAP
/** * CUSE INIT request/reply flags
From: Krister Johansen kjlx@templeofstupid.com
stable inclusion from stable-v6.6.8 commit 2939dd306b1f00faf71472f2b16dead077f74c36 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit c4d361f66ac91db8fc65061a9671682f61f4ca9d upstream.
Fuse submounts do not perform a lookup for the nodeid that they inherit from their parent. Instead, the code decrements the nlookup on the submount's fuse_inode when it is instantiated, and no forget is performed when a submount root is evicted.
Trouble arises when the submount's parent is evicted despite the submount itself being in use. In this author's case, the submount was in a container and deatched from the initial mount namespace via a MNT_DEATCH operation. When memory pressure triggered the shrinker, the inode from the parent was evicted, which triggered enough forgets to render the submount's nodeid invalid.
Since submounts should still function, even if their parent goes away, solve this problem by sharing refcounted state between the parent and its submount. When all of the references on this shared state reach zero, it's safe to forget the final lookup of the fuse nodeid.
Signed-off-by: Krister Johansen kjlx@templeofstupid.com Cc: stable@vger.kernel.org Fixes: 1866d779d5d2 ("fuse: Allow fuse_fill_super_common() for submounts") Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Conflicts: fs/fuse/fuse_i.h
Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/fuse/fuse_i.h | 15 ++++++++++ fs/fuse/inode.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 87 insertions(+), 3 deletions(-)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 69bbcf05ecfc..d47a16a20c7b 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -64,6 +64,19 @@ struct fuse_forget_link { struct fuse_forget_link *next; };
+/* Submount lookup tracking */ +struct fuse_submount_lookup { + /** Refcount */ + refcount_t count; + + /** Unique ID, which identifies the inode between userspace + * and kernel */ + u64 nodeid; + + /** The request used for sending the FORGET message */ + struct fuse_forget_link *forget; +}; + /** FUSE inode */ struct fuse_inode { /** Inode data */ @@ -159,6 +172,8 @@ struct fuse_inode { */ struct fuse_inode_dax *dax; #endif + /** Submount specific lookup tracking */ + struct fuse_submount_lookup *submount_lookup;
KABI_RESERVE(1) }; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 444418e240c8..59743813563e 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -68,6 +68,24 @@ struct fuse_forget_link *fuse_alloc_forget(void) return kzalloc(sizeof(struct fuse_forget_link), GFP_KERNEL_ACCOUNT); }
+static struct fuse_submount_lookup *fuse_alloc_submount_lookup(void) +{ + struct fuse_submount_lookup *sl; + + sl = kzalloc(sizeof(struct fuse_submount_lookup), GFP_KERNEL_ACCOUNT); + if (!sl) + return NULL; + sl->forget = fuse_alloc_forget(); + if (!sl->forget) + goto out_free; + + return sl; + +out_free: + kfree(sl); + return NULL; +} + static struct inode *fuse_alloc_inode(struct super_block *sb) { struct fuse_inode *fi; @@ -83,6 +101,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb) fi->attr_version = 0; fi->orig_ino = 0; fi->state = 0; + fi->submount_lookup = NULL; mutex_init(&fi->mutex); spin_lock_init(&fi->lock); fi->forget = fuse_alloc_forget(); @@ -113,6 +132,17 @@ static void fuse_free_inode(struct inode *inode) kmem_cache_free(fuse_inode_cachep, fi); }
+static void fuse_cleanup_submount_lookup(struct fuse_conn *fc, + struct fuse_submount_lookup *sl) +{ + if (!refcount_dec_and_test(&sl->count)) + return; + + fuse_queue_forget(fc, sl->forget, sl->nodeid, 1); + sl->forget = NULL; + kfree(sl); +} + static void fuse_evict_inode(struct inode *inode) { struct fuse_inode *fi = get_fuse_inode(inode); @@ -132,6 +162,11 @@ static void fuse_evict_inode(struct inode *inode) fi->nlookup); fi->forget = NULL; } + + if (fi->submount_lookup) { + fuse_cleanup_submount_lookup(fc, fi->submount_lookup); + fi->submount_lookup = NULL; + } } if (S_ISREG(inode->i_mode) && !fuse_is_bad(inode)) { WARN_ON(!list_empty(&fi->write_files)); @@ -332,6 +367,13 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr, fuse_dax_dontcache(inode, attr->flags); }
+static void fuse_init_submount_lookup(struct fuse_submount_lookup *sl, + u64 nodeid) +{ + sl->nodeid = nodeid; + refcount_set(&sl->count, 1); +} + static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr, struct fuse_conn *fc) { @@ -395,12 +437,22 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid, */ if (fc->auto_submounts && (attr->flags & FUSE_ATTR_SUBMOUNT) && S_ISDIR(attr->mode)) { + struct fuse_inode *fi; + inode = new_inode(sb); if (!inode) return NULL;
fuse_init_inode(inode, attr, fc); - get_fuse_inode(inode)->nodeid = nodeid; + fi = get_fuse_inode(inode); + fi->nodeid = nodeid; + fi->submount_lookup = fuse_alloc_submount_lookup(); + if (!fi->submount_lookup) { + iput(inode); + return NULL; + } + /* Sets nlookup = 1 on fi->submount_lookup->nlookup */ + fuse_init_submount_lookup(fi->submount_lookup, nodeid); inode->i_flags |= S_AUTOMOUNT; goto done; } @@ -423,11 +475,11 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid, iput(inode); goto retry; } -done: fi = get_fuse_inode(inode); spin_lock(&fi->lock); fi->nlookup++; spin_unlock(&fi->lock); +done: fuse_change_attributes(inode, attr, NULL, attr_valid, attr_version);
return inode; @@ -1465,6 +1517,8 @@ static int fuse_fill_super_submount(struct super_block *sb, struct super_block *parent_sb = parent_fi->inode.i_sb; struct fuse_attr root_attr; struct inode *root; + struct fuse_submount_lookup *sl; + struct fuse_inode *fi;
fuse_sb_defaults(sb); fm->sb = sb; @@ -1487,12 +1541,27 @@ static int fuse_fill_super_submount(struct super_block *sb, * its nlookup should not be incremented. fuse_iget() does * that, though, so undo it here. */ - get_fuse_inode(root)->nlookup--; + fi = get_fuse_inode(root); + fi->nlookup--; + sb->s_d_op = &fuse_dentry_operations; sb->s_root = d_make_root(root); if (!sb->s_root) return -ENOMEM;
+ /* + * Grab the parent's submount_lookup pointer and take a + * reference on the shared nlookup from the parent. This is to + * prevent the last forget for this nodeid from getting + * triggered until all users have finished with it. + */ + sl = parent_fi->submount_lookup; + WARN_ON(!sl); + if (sl) { + refcount_inc(&sl->count); + fi->submount_lookup = sl; + } + return 0; }
From: Amir Goldstein amir73il@gmail.com
stable inclusion from stable-v6.6.8 commit fbcddc7410625f90a3c1352c503d286bb23c1a60 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 3f29f1c336c0e8a4bec52f1e5217f88835553e5b upstream.
The new fuse init flag FUSE_DIRECT_IO_ALLOW_MMAP breaks assumptions made by FOPEN_PARALLEL_DIRECT_WRITES and causes test generic/095 to hit BUG_ON(fi->writectr < 0) assertions in fuse_set_nowrite():
generic/095 5s ... kernel BUG at fs/fuse/dir.c:1756! ... ? fuse_set_nowrite+0x3d/0xdd ? do_raw_spin_unlock+0x88/0x8f ? _raw_spin_unlock+0x2d/0x43 ? fuse_range_is_writeback+0x71/0x84 fuse_sync_writes+0xf/0x19 fuse_direct_io+0x167/0x5bd fuse_direct_write_iter+0xf0/0x146
Auto disable FOPEN_PARALLEL_DIRECT_WRITES when server negotiated FUSE_DIRECT_IO_ALLOW_MMAP.
Fixes: e78662e818f9 ("fuse: add a new fuse init flag to relax restrictions in no cache mode") Cc: stable@vger.kernel.org # v6.6 Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/fuse/file.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 89e870d1a526..a660f1f21540 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1574,6 +1574,7 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from) ssize_t res; bool exclusive_lock = !(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) || + get_fuse_conn(inode)->direct_io_allow_mmap || iocb->ki_flags & IOCB_APPEND || fuse_direct_write_extending_i_size(iocb, from);
@@ -1581,6 +1582,7 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from) * Take exclusive lock if * - Parallel direct writes are disabled - a user space decision * - Parallel direct writes are enabled and i_size is being extended. + * - Shared mmap on direct_io file is supported (FUSE_DIRECT_IO_ALLOW_MMAP). * This might not be needed at all, but needs further investigation. */ if (exclusive_lock)
From: Hangyu Hua hbh25y@gmail.com
stable inclusion from stable-v6.6.8 commit ce5a6df21a00a02c155f2cf269f0e258aa522346 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 7f8ed28d1401320bcb02dda81b3c23ab2dc5a6d8 upstream.
fuse_dax_conn_free() will be called when fuse_fill_super_common() fails after fuse_dax_conn_alloc(). Then deactivate_locked_super() in virtio_fs_get_tree() will call virtio_kill_sb() to release the discarded superblock. This will call fuse_dax_conn_free() again in fuse_conn_put(), resulting in a possible double free.
Fixes: 1dd539577c42 ("virtiofs: add a mount option to enable dax") Signed-off-by: Hangyu Hua hbh25y@gmail.com Acked-by: Vivek Goyal vgoyal@redhat.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Cc: stable@vger.kernel.org # v5.10 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/fuse/dax.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c index 23904a6a9a96..12ef91d170bb 100644 --- a/fs/fuse/dax.c +++ b/fs/fuse/dax.c @@ -1222,6 +1222,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc) if (fc->dax) { fuse_free_dax_mem_ranges(&fc->dax->free_ranges); kfree(fc->dax); + fc->dax = NULL; } }
From: Al Viro viro@zeniv.linux.org.uk
stable inclusion from stable-v6.6.8 commit 9566ef570cc4c9a69af493fdba5f8b4ea101426f bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 1ba0e9d69b2000e95267c888cbfa91d823388d47 upstream.
In 8e9fad0e70b7 "io_uring: Add io_uring command support for sockets" you've got an include of asm-generic/ioctls.h done in io_uring/uring_cmd.c. That had been done for the sake of this chunk - + ret = prot->ioctl(sk, SIOCINQ, &arg); + if (ret) + return ret; + return arg; + case SOCKET_URING_OP_SIOCOUTQ: + ret = prot->ioctl(sk, SIOCOUTQ, &arg);
SIOC{IN,OUT}Q are defined to symbols (FIONREAD and TIOCOUTQ) that come from ioctls.h, all right, but the values vary by the architecture.
FIONREAD is 0x467F on mips 0x4004667F on alpha, powerpc and sparc 0x8004667F on sh and xtensa 0x541B everywhere else TIOCOUTQ is 0x7472 on mips 0x40047473 on alpha, powerpc and sparc 0x80047473 on sh and xtensa 0x5411 everywhere else
->ioctl() expects the same values it would've gotten from userland; all places where we compare with SIOC{IN,OUT}Q are using asm/ioctls.h, so they pick the correct values. io_uring_cmd_sock(), OTOH, ends up passing the default ones.
Fixes: 8e9fad0e70b7 ("io_uring: Add io_uring command support for sockets") Cc: stable@vger.kernel.org Signed-off-by: Al Viro viro@zeniv.linux.org.uk Link: https://lore.kernel.org/r/20231214213408.GT1674809@ZenIV Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- io_uring/uring_cmd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 537795fddc87..5fa19861cda5 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -7,7 +7,7 @@ #include <linux/nospec.h>
#include <uapi/linux/io_uring.h> -#include <uapi/asm-generic/ioctls.h> +#include <asm/ioctls.h>
#include "io_uring.h" #include "rsrc.h"
From: Kai Vehmanen kai.vehmanen@linux.intel.com
stable inclusion from stable-v6.6.8 commit c52ebaf742734113b41b4b6bf8bd9a86e34e4445 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 3b1ff57e24a7bcd2e2a8426dd2013a80d1fa96eb upstream.
Add one more older NUC model that requires quirk to force all pins to be connected. The display codec pins are not registered properly without the force-connect quirk. The codec will report only one pin as having external connectivity, but i915 finds all three connectors on the system, so the two drivers are not in sync.
Issue found with DRM igt-gpu-tools test kms_hdmi_inject@inject-audio.
Link: https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/issues/3 Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Jani Saarinen jani.saarinen@intel.com Signed-off-by: Kai Vehmanen kai.vehmanen@linux.intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231208132127.2438067-2-kai.vehmanen@linux.intel.... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- sound/pci/hda/patch_hdmi.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c index f6053fe32b2b..361b0268ff49 100644 --- a/sound/pci/hda/patch_hdmi.c +++ b/sound/pci/hda/patch_hdmi.c @@ -1994,6 +1994,7 @@ static const struct snd_pci_quirk force_connect_list[] = { SND_PCI_QUIRK(0x103c, 0x8711, "HP", 1), SND_PCI_QUIRK(0x103c, 0x8715, "HP", 1), SND_PCI_QUIRK(0x1462, 0xec94, "MS-7C94", 1), + SND_PCI_QUIRK(0x8086, 0x2060, "Intel NUC5CPYB", 1), SND_PCI_QUIRK(0x8086, 0x2081, "Intel NUC 10", 1), {} };
From: Kai Vehmanen kai.vehmanen@linux.intel.com
stable inclusion from stable-v6.6.8 commit 7ec57c10b01832d699bb61d4049e57031592625d bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 924f5ca2975b2993ee81a7ecc3c809943a70f334 upstream.
On ASUSTeK Z170M PLUS and Z170 PRO GAMING systems, the display codec pins are not registered properly without the force-connect quirk. The codec will report only one pin as having external connectivity, but i915 finds all three connectors on the system, so the two drivers are not in sync.
Issue found with DRM igt-gpu-tools test kms_hdmi_inject@inject-audio.
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/9801 Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Jani Saarinen jani.saarinen@intel.com Signed-off-by: Kai Vehmanen kai.vehmanen@linux.intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231208132127.2438067-3-kai.vehmanen@linux.intel.... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- sound/pci/hda/patch_hdmi.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c index 361b0268ff49..99cd52045468 100644 --- a/sound/pci/hda/patch_hdmi.c +++ b/sound/pci/hda/patch_hdmi.c @@ -1993,6 +1993,8 @@ static const struct snd_pci_quirk force_connect_list[] = { SND_PCI_QUIRK(0x103c, 0x871a, "HP", 1), SND_PCI_QUIRK(0x103c, 0x8711, "HP", 1), SND_PCI_QUIRK(0x103c, 0x8715, "HP", 1), + SND_PCI_QUIRK(0x1043, 0x86ae, "ASUS", 1), /* Z170 PRO */ + SND_PCI_QUIRK(0x1043, 0x86c7, "ASUS", 1), /* Z170M PLUS */ SND_PCI_QUIRK(0x1462, 0xec94, "MS-7C94", 1), SND_PCI_QUIRK(0x8086, 0x2060, "Intel NUC5CPYB", 1), SND_PCI_QUIRK(0x8086, 0x2081, "Intel NUC 10", 1),
From: Hartmut Knaack knaack.h@gmx.de
stable inclusion from stable-v6.6.8 commit ffd1fe12d4c95c5644152fcbc85696a4195190ae bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 9b726bf6ae11add6a7a52883a21f90ff9cbca916 upstream.
The HP laptop 15-db0403ng uses the ALC236 codec and controls the mute LED using COEF 0x07 index 1. Sound card subsystem: Hewlett-Packard Company Device [103c:84ae]
Use the existing quirk for this model.
Signed-off-by: Hartmut Knaack knaack.h@gmx.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/e61815d0-f1c7-b164-e49d-6ca84771476a@gmx.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 5c609c1150c4..2590879f0a84 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9705,6 +9705,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x103c, 0x83b9, "HP Spectre x360", ALC269_FIXUP_HP_MUTE_LED_MIC3), SND_PCI_QUIRK(0x103c, 0x841c, "HP Pavilion 15-CK0xx", ALC269_FIXUP_HP_MUTE_LED_MIC3), SND_PCI_QUIRK(0x103c, 0x8497, "HP Envy x360", ALC269_FIXUP_HP_MUTE_LED_MIC3), + SND_PCI_QUIRK(0x103c, 0x84ae, "HP 15-db0403ng", ALC236_FIXUP_HP_MUTE_LED_COEFBIT2), SND_PCI_QUIRK(0x103c, 0x84da, "HP OMEN dc0019-ur", ALC295_FIXUP_HP_OMEN), SND_PCI_QUIRK(0x103c, 0x84e7, "HP Pavilion 15", ALC269_FIXUP_HP_MUTE_LED_MIC3), SND_PCI_QUIRK(0x103c, 0x8519, "HP Spectre x360 15-df0xxx", ALC285_FIXUP_HP_SPECTRE_X360),
From: Gergo Koteles soyer@irl.hu
stable inclusion from stable-v6.6.8 commit 7fc8bfdb7007ccd013eb786c76f6619da56a98f2 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 75a25d31b80770485641ad2789a854955f5c1e40 upstream.
Unloading then loading the module causes a NULL ponter dereference.
The hda_unbind zeroes the hda_component, later the hda_bind tries to dereference the codec field.
The hda_component is only initialized once by tas2781_generic_fixup.
Set only previously modified fields to NULL.
BUG: kernel NULL pointer dereference, address: 0000000000000322 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x171/0x4e0 ? exc_page_fault+0x7f/0x180 ? asm_exc_page_fault+0x26/0x30 ? tas2781_hda_bind+0x59/0x140 [snd_hda_scodec_tas2781_i2c] component_bind_all+0xf3/0x240 try_to_bring_up_aggregate_device+0x1c3/0x270 __component_add+0xbc/0x1a0 tas2781_hda_i2c_probe+0x289/0x3a0 [snd_hda_scodec_tas2781_i2c] i2c_device_probe+0x136/0x2e0
Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver") Cc: stable@vger.kernel.org Signed-off-by: Gergo Koteles soyer@irl.hu Link: https://lore.kernel.org/r/8b8ed2bd5f75fbb32e354a3226c2f966fa85b46b.170215652... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- sound/pci/hda/tas2781_hda_i2c.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/sound/pci/hda/tas2781_hda_i2c.c b/sound/pci/hda/tas2781_hda_i2c.c index fb802802939e..b42837105c22 100644 --- a/sound/pci/hda/tas2781_hda_i2c.c +++ b/sound/pci/hda/tas2781_hda_i2c.c @@ -612,9 +612,13 @@ static void tas2781_hda_unbind(struct device *dev, { struct tasdevice_priv *tas_priv = dev_get_drvdata(dev); struct hda_component *comps = master_data; + comps = &comps[tas_priv->index];
- if (comps[tas_priv->index].dev == dev) - memset(&comps[tas_priv->index], 0, sizeof(*comps)); + if (comps->dev == dev) { + comps->dev = NULL; + memset(comps->name, 0, sizeof(comps->name)); + comps->playback_hook = NULL; + }
tasdevice_config_info_remove(tas_priv); tasdevice_dsp_remove(tas_priv);
From: Gergo Koteles soyer@irl.hu
stable inclusion from stable-v6.6.8 commit 795e91c599c29ab9b6b6cd57548b497d0d78e0d6 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 33071422714a4c9587753b0ccc130ca59323bf42 upstream.
The code does not properly check whether the calibration variable is available in the EFI. If it is not available, it causes a NULL pointer dereference.
Check the return value of the first get_variable call also.
BUG: kernel NULL pointer dereference, address: 0000000000000000 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x171/0x4e0 ? srso_alias_return_thunk+0x5/0x7f ? schedule+0x5e/0xd0 ? exc_page_fault+0x7f/0x180 ? asm_exc_page_fault+0x26/0x30 ? crc32_body+0x2c/0x120 ? tas2781_save_calibration+0xe4/0x220 [snd_hda_scodec_tas2781_i2c] tasdev_fw_ready+0x1af/0x280 [snd_hda_scodec_tas2781_i2c] request_firmware_work_func+0x59/0xa0
Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver") CC: stable@vger.kernel.org Signed-off-by: Gergo Koteles soyer@irl.hu Link: https://lore.kernel.org/r/f1f6583bda918f78556f67d522ca7b3b91cebbd5.170225110... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- sound/pci/hda/tas2781_hda_i2c.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/pci/hda/tas2781_hda_i2c.c b/sound/pci/hda/tas2781_hda_i2c.c index b42837105c22..d3dafc9d150b 100644 --- a/sound/pci/hda/tas2781_hda_i2c.c +++ b/sound/pci/hda/tas2781_hda_i2c.c @@ -455,9 +455,9 @@ static int tas2781_save_calibration(struct tasdevice_priv *tas_priv) status = efi.get_variable(efi_name, &efi_guid, &attr, &tas_priv->cali_data.total_sz, tas_priv->cali_data.data); - if (status != EFI_SUCCESS) - return -EINVAL; } + if (status != EFI_SUCCESS) + return -EINVAL;
tmp_val = (unsigned int *)tas_priv->cali_data.data;
From: Gergo Koteles soyer@irl.hu
stable inclusion from stable-v6.6.8 commit d94fad04a64b2a58157940ec7ab59d4ccd400a1c bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 6c6fa2641402e8e753262fb61ed9a15a7cb225ad upstream.
If the module can load the RCA but not the firmware binary, it will call the cleanup functions. Then unloading the module causes general protection fault due to double free.
Do not call the cleanup functions in tasdev_fw_ready.
general protection fault, probably for non-canonical address 0x6f2b8a2bff4c8fec: 0000 [#1] PREEMPT SMP NOPTI Call Trace: <TASK> ? die_addr+0x36/0x90 ? exc_general_protection+0x1c5/0x430 ? asm_exc_general_protection+0x26/0x30 ? tasdevice_config_info_remove+0x6d/0xd0 [snd_soc_tas2781_fmwlib] tas2781_hda_unbind+0xaa/0x100 [snd_hda_scodec_tas2781_i2c] component_unbind+0x2e/0x50 component_unbind_all+0x92/0xa0 component_del+0xa8/0x140 tas2781_hda_remove.isra.0+0x32/0x60 [snd_hda_scodec_tas2781_i2c] i2c_device_remove+0x26/0xb0
Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver") CC: stable@vger.kernel.org Signed-off-by: Gergo Koteles soyer@irl.hu Link: https://lore.kernel.org/r/1a0885c424bb21172702d254655882b59ef6477a.170251001... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- sound/pci/hda/tas2781_hda_i2c.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/sound/pci/hda/tas2781_hda_i2c.c b/sound/pci/hda/tas2781_hda_i2c.c index d3dafc9d150b..c8ee5f809c38 100644 --- a/sound/pci/hda/tas2781_hda_i2c.c +++ b/sound/pci/hda/tas2781_hda_i2c.c @@ -550,11 +550,6 @@ static void tasdev_fw_ready(const struct firmware *fmw, void *context) tas2781_save_calibration(tas_priv);
out: - if (tas_priv->fw_state == TASDEVICE_DSP_FW_FAIL) { - /*If DSP FW fail, kcontrol won't be created */ - tasdevice_config_info_remove(tas_priv); - tasdevice_dsp_remove(tas_priv); - } mutex_unlock(&tas_priv->codec_lock); if (fmw) release_firmware(fmw);
From: Gergo Koteles soyer@irl.hu
stable inclusion from stable-v6.6.8 commit 56e22123449cf0be89db7bbf7c4667eb9c6bb953 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 315deab289924c83ab1ded50022e8db95d6e428b upstream.
Calling component_add starts loading the firmware, the callback function writes the program to the amplifiers. If the module resets the amplifiers after component_add, it happens that one of the amplifiers does not work because the reset and program writing are interleaving.
Call tas2781_reset before component_add to ensure reliable initialization.
Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver") CC: stable@vger.kernel.org Signed-off-by: Gergo Koteles soyer@irl.hu Link: https://lore.kernel.org/r/4d23bf58558e23ee8097de01f70f1eb8d9de2d15.170251124... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- sound/pci/hda/tas2781_hda_i2c.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/pci/hda/tas2781_hda_i2c.c b/sound/pci/hda/tas2781_hda_i2c.c index c8ee5f809c38..63a90c7e8976 100644 --- a/sound/pci/hda/tas2781_hda_i2c.c +++ b/sound/pci/hda/tas2781_hda_i2c.c @@ -674,14 +674,14 @@ static int tas2781_hda_i2c_probe(struct i2c_client *clt)
pm_runtime_put_autosuspend(tas_priv->dev);
+ tas2781_reset(tas_priv); + ret = component_add(tas_priv->dev, &tas2781_hda_comp_ops); if (ret) { dev_err(tas_priv->dev, "Register component failed: %d\n", ret); pm_runtime_disable(tas_priv->dev); - goto err; }
- tas2781_reset(tas_priv); err: if (ret) tas2781_hda_remove(&clt->dev);
From: Bjorn Helgaas bhelgaas@google.com
stable inclusion from stable-v6.6.8 commit 5cc8d88a1b94b900fd74abda744c29ff5845430b bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 5df12742b7e3aae2594a30a9d14d5d6e9e7699f4 upstream.
This reverts commit 40613da52b13fb21c5566f10b287e0ca8c12c4e9 and the subsequent fix to it:
cc22522fd55e ("PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus")
40613da52b13 fixed a problem where hot-adding a device with large BARs failed if the bridge windows programmed by firmware were not large enough.
cc22522fd55e ("PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus") fixed a problem with 40613da52b13: an ACPI hot-add of a device on a PCI root bus (common in the virt world) or firmware sending ACPI Bus Check to non-existent Root Ports (e.g., on Dell Inspiron 7352/0W6WV0) caused a NULL pointer dereference and suspend/resume hangs.
Unfortunately the combination of 40613da52b13 and cc22522fd55e caused other problems:
- Fiona reported that hot-add of SCSI disks in QEMU virtual machine fails sometimes.
- Dongli reported a similar problem with hot-add of SCSI disks.
- Jonathan reported a console freeze during boot on bare metal due to an error in radeon GPU initialization.
Revert both patches to avoid adding these problems. This means we will again see the problems with hot-adding devices with large BARs and the NULL pointer dereferences and suspend/resume issues that 40613da52b13 and cc22522fd55e were intended to fix.
Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary") Fixes: cc22522fd55e ("PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus") Reported-by: Fiona Ebner f.ebner@proxmox.com Closes: https://lore.kernel.org/r/9eb669c0-d8f2-431d-a700-6da13053ae54@proxmox.com Reported-by: Dongli Zhang dongli.zhang@oracle.com Closes: https://lore.kernel.org/r/3c4a446a-b167-11b8-f36f-d3c1b49b42e9@oracle.com Reported-by: Jonathan Woithe jwoithe@just42.net Closes: https://lore.kernel.org/r/ZXpaNCLiDM+Kv38H@marvin.atrad.com.au Signed-off-by: Bjorn Helgaas bhelgaas@google.com Acked-by: Michael S. Tsirkin mst@redhat.com Acked-by: Igor Mammedov imammedo@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/pci/hotplug/acpiphp_glue.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c index 601129772b2d..5b1f271c6034 100644 --- a/drivers/pci/hotplug/acpiphp_glue.c +++ b/drivers/pci/hotplug/acpiphp_glue.c @@ -512,15 +512,12 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) if (pass && dev->subordinate) { check_hotplug_bridge(slot, dev); pcibios_resource_survey_bus(dev->subordinate); - if (pci_is_root_bus(bus)) - __pci_bus_size_bridges(dev->subordinate, &add_list); + __pci_bus_size_bridges(dev->subordinate, + &add_list); } } } - if (pci_is_root_bus(bus)) - __pci_bus_assign_resources(bus, &add_list, NULL); - else - pci_assign_unassigned_bridge_resources(bus->self); + __pci_bus_assign_resources(bus, &add_list, NULL); }
acpiphp_sanitize_bus(bus);
From: Jiaxun Yang jiaxun.yang@flygoat.com
stable inclusion from stable-v6.6.8 commit 4fb5358c574e1b49a55e090dbe26cd4ddaa6c639 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit ef61a0405742a9f7f6051bc6fd2f017d87d07911 upstream.
This is a partial revert of 8b3517f88ff2 ("PCI: loongson: Prevent LS7A MRRS increases") for MIPS-based Loongson.
Some MIPS Loongson systems don't support arbitrary Max_Read_Request_Size (MRRS) settings. 8b3517f88ff2 ("PCI: loongson: Prevent LS7A MRRS increases") worked around that by (1) assuming that firmware configured MRRS to the maximum supported value and (2) preventing the PCI core from increasing MRRS.
Unfortunately, some firmware doesn't set that maximum MRRS correctly, which results in devices not being initialized correctly. One symptom, from the Debian report below, is this:
ata4.00: exception Emask 0x0 SAct 0x20000000 SErr 0x0 action 0x6 frozen ata4.00: failed command: WRITE FPDMA QUEUED ata4.00: cmd 61/20:e8:00:f0:e1/00:00:00:00:00/40 tag 29 ncq dma 16384 out res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) ata4.00: status: { DRDY } ata4: hard resetting link
Limit MRRS to 256 because MIPS Loongson with higher MRRS support is considered rare.
This must be done at device enablement stage because the MRRS setting may get lost if PCI_COMMAND_MASTER on the parent bridge is cleared, and we are only sure parent bridge is enabled at this point.
Fixes: 8b3517f88ff2 ("PCI: loongson: Prevent LS7A MRRS increases") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217680 Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1035587 Link: https://lore.kernel.org/r/20231201115028.84351-1-jiaxun.yang@flygoat.com Signed-off-by: Jiaxun Yang jiaxun.yang@flygoat.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Acked-by: Huacai Chen chenhuacai@loongson.cn Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Conflicts: drivers/pci/controller/pci-loongson.c
Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/pci/controller/pci-loongson.c | 46 ++++++++++++++++++++++++--- 1 file changed, 41 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/controller/pci-loongson.c b/drivers/pci/controller/pci-loongson.c index 114fc8da1aff..df08dbf60c5c 100644 --- a/drivers/pci/controller/pci-loongson.c +++ b/drivers/pci/controller/pci-loongson.c @@ -96,13 +96,49 @@ DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_LOONGSON, DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_LOONGSON, DEV_LS7A_PCIE_PORT6, loongson_d3_quirk);
+/* + * Some Loongson PCIe ports have hardware limitations on their Maximum Read + * Request Size. They can't handle anything larger than this. Sane + * firmware will set proper MRRS at boot, so we only need no_inc_mrrs for + * bridges. However, some MIPS Loongson firmware doesn't set MRRS properly, + * so we have to enforce maximum safe MRRS, which is 256 bytes. + */ +#ifdef CONFIG_MIPS +static void loongson_set_min_mrrs_quirk(struct pci_dev *pdev) +{ + struct pci_bus *bus = pdev->bus; + struct pci_dev *bridge; + static const struct pci_device_id bridge_devids[] = { + { PCI_VDEVICE(LOONGSON, DEV_LS2K_PCIE_PORT0) }, + { PCI_VDEVICE(LOONGSON, DEV_LS7A_PCIE_PORT0) }, + { PCI_VDEVICE(LOONGSON, DEV_LS7A_PCIE_PORT1) }, + { PCI_VDEVICE(LOONGSON, DEV_LS7A_PCIE_PORT2) }, + { PCI_VDEVICE(LOONGSON, DEV_LS7A_PCIE_PORT3) }, + { PCI_VDEVICE(LOONGSON, DEV_LS7A_PCIE_PORT4) }, + { PCI_VDEVICE(LOONGSON, DEV_LS7A_PCIE_PORT5) }, + { PCI_VDEVICE(LOONGSON, DEV_LS7A_PCIE_PORT6) }, + { 0, }, + }; + + /* look for the matching bridge */ + while (!pci_is_root_bus(bus)) { + bridge = bus->self; + bus = bus->parent; + + if (pci_match_id(bridge_devids, bridge)) { + if (pcie_get_readrq(pdev) > 256) { + pci_info(pdev, "limiting MRRS to 256\n"); + pcie_set_readrq(pdev, 256); + } + break; + } + } +} +DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_set_min_mrrs_quirk); +#endif + static void loongson_mrrs_quirk(struct pci_dev *pdev) { - /* - * Some Loongson PCIe ports have h/w limitations of maximum read - * request size. They can't handle anything larger than this. So - * force this limit on any devices attached under these ports. - */ struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
bridge->no_inc_mrrs = 1;
From: Johan Hovold johan+linaro@kernel.org
stable inclusion from stable-v6.6.8 commit 1e1f461ea574b1a9aba7ffcc15d8721c087c5cda bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 718ab8226636a1a3a7d281f5d6a7ad7c925efe5a upstream.
Add pci_enable_link_state_locked() for enabling link states that can be used in contexts where a pci_bus_sem read lock is already held (e.g. from pci_walk_bus()).
This helper will be used to fix a couple of potential deadlocks where the current helper is called with the lock already held, hence the CC stable tag.
Fixes: f492edb40b54 ("PCI: vmd: Add quirk to configure PCIe ASPM and LTR") Link: https://lore.kernel.org/r/20231128081512.19387-2-johan+linaro@kernel.org Signed-off-by: Johan Hovold johan+linaro@kernel.org [bhelgaas: include helper name in subject, commit log] Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Cc: stable@vger.kernel.org # 6.3 Cc: Michael Bottini michael.a.bottini@linux.intel.com Cc: David E. Box david.e.box@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/pci/pcie/aspm.c | 53 +++++++++++++++++++++++++++++++---------- include/linux/pci.h | 3 +++ 2 files changed, 43 insertions(+), 13 deletions(-)
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index fc18e42f0a6e..67b13f26ba7c 100644 --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -1102,17 +1102,7 @@ int pci_disable_link_state(struct pci_dev *pdev, int state) } EXPORT_SYMBOL(pci_disable_link_state);
-/** - * pci_enable_link_state - Clear and set the default device link state so that - * the link may be allowed to enter the specified states. Note that if the - * BIOS didn't grant ASPM control to the OS, this does nothing because we can't - * touch the LNKCTL register. Also note that this does not enable states - * disabled by pci_disable_link_state(). Return 0 or a negative errno. - * - * @pdev: PCI device - * @state: Mask of ASPM link states to enable - */ -int pci_enable_link_state(struct pci_dev *pdev, int state) +static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked) { struct pcie_link_state *link = pcie_aspm_get_link(pdev);
@@ -1129,7 +1119,8 @@ int pci_enable_link_state(struct pci_dev *pdev, int state) return -EPERM; }
- down_read(&pci_bus_sem); + if (!locked) + down_read(&pci_bus_sem); mutex_lock(&aspm_lock); link->aspm_default = 0; if (state & PCIE_LINK_STATE_L0S) @@ -1150,12 +1141,48 @@ int pci_enable_link_state(struct pci_dev *pdev, int state) link->clkpm_default = (state & PCIE_LINK_STATE_CLKPM) ? 1 : 0; pcie_set_clkpm(link, policy_to_clkpm_state(link)); mutex_unlock(&aspm_lock); - up_read(&pci_bus_sem); + if (!locked) + up_read(&pci_bus_sem);
return 0; } + +/** + * pci_enable_link_state - Clear and set the default device link state so that + * the link may be allowed to enter the specified states. Note that if the + * BIOS didn't grant ASPM control to the OS, this does nothing because we can't + * touch the LNKCTL register. Also note that this does not enable states + * disabled by pci_disable_link_state(). Return 0 or a negative errno. + * + * @pdev: PCI device + * @state: Mask of ASPM link states to enable + */ +int pci_enable_link_state(struct pci_dev *pdev, int state) +{ + return __pci_enable_link_state(pdev, state, false); +} EXPORT_SYMBOL(pci_enable_link_state);
+/** + * pci_enable_link_state_locked - Clear and set the default device link state + * so that the link may be allowed to enter the specified states. Note that if + * the BIOS didn't grant ASPM control to the OS, this does nothing because we + * can't touch the LNKCTL register. Also note that this does not enable states + * disabled by pci_disable_link_state(). Return 0 or a negative errno. + * + * @pdev: PCI device + * @state: Mask of ASPM link states to enable + * + * Context: Caller holds pci_bus_sem read lock. + */ +int pci_enable_link_state_locked(struct pci_dev *pdev, int state) +{ + lockdep_assert_held_read(&pci_bus_sem); + + return __pci_enable_link_state(pdev, state, true); +} +EXPORT_SYMBOL(pci_enable_link_state_locked); + static int pcie_aspm_set_policy(const char *val, const struct kernel_param *kp) { diff --git a/include/linux/pci.h b/include/linux/pci.h index 6f96facca51a..adb98762b513 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1846,6 +1846,7 @@ extern bool pcie_ports_native; int pci_disable_link_state(struct pci_dev *pdev, int state); int pci_disable_link_state_locked(struct pci_dev *pdev, int state); int pci_enable_link_state(struct pci_dev *pdev, int state); +int pci_enable_link_state_locked(struct pci_dev *pdev, int state); void pcie_no_aspm(void); bool pcie_aspm_support_enabled(void); bool pcie_aspm_enabled(struct pci_dev *pdev); @@ -1856,6 +1857,8 @@ static inline int pci_disable_link_state_locked(struct pci_dev *pdev, int state) { return 0; } static inline int pci_enable_link_state(struct pci_dev *pdev, int state) { return 0; } +static inline int pci_enable_link_state_locked(struct pci_dev *pdev, int state) +{ return 0; } static inline void pcie_no_aspm(void) { } static inline bool pcie_aspm_support_enabled(void) { return false; } static inline bool pcie_aspm_enabled(struct pci_dev *pdev) { return false; }
From: Namjae Jeon linkinjeon@kernel.org
stable inclusion from stable-v6.6.8 commit f94c44342f0a5612be4710c0d3122198244ffa59 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 13736654481198e519059d4a2e2e3b20fa9fdb3e upstream.
MS confirm that "AISi" name of SMB2_CREATE_ALLOCATION_SIZE in MS-SMB2 specification is a typo. cifs/ksmbd have been using this wrong name from MS-SMB2. It should be "AlSi". Also It will cause problem when running smb2.create.open test in smbtorture against ksmbd.
Cc: stable@vger.kernel.org Fixes: 12197a7fdda9 ("Clarify SMB2/SMB3 create context and add missing ones") Signed-off-by: Namjae Jeon linkinjeon@kernel.org Reviewed-by: Paulo Alcantara (SUSE) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/smb/common/smb2pdu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/smb/common/smb2pdu.h b/fs/smb/common/smb2pdu.h index 319fb9ffc6a0..ec20c83cc836 100644 --- a/fs/smb/common/smb2pdu.h +++ b/fs/smb/common/smb2pdu.h @@ -1120,7 +1120,7 @@ struct smb2_change_notify_rsp { #define SMB2_CREATE_SD_BUFFER "SecD" /* security descriptor */ #define SMB2_CREATE_DURABLE_HANDLE_REQUEST "DHnQ" #define SMB2_CREATE_DURABLE_HANDLE_RECONNECT "DHnC" -#define SMB2_CREATE_ALLOCATION_SIZE "AISi" +#define SMB2_CREATE_ALLOCATION_SIZE "AlSi" #define SMB2_CREATE_QUERY_MAXIMAL_ACCESS_REQUEST "MxAc" #define SMB2_CREATE_TIMEWARP_REQUEST "TWrp" #define SMB2_CREATE_QUERY_ON_DISK_ID "QFid"
From: Johan Hovold johan+linaro@kernel.org
stable inclusion from stable-v6.6.8 commit 98bd0b4ad5d46fe746c9f99e0f11da7cfec8d8bc bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 49de0dc87965079a8e2803ee4b39f9d946259423 upstream.
The vmd_pm_enable_quirk() helper is called from pci_walk_bus() during probe to enable ASPM for controllers with VMD_FEAT_BIOS_PM_QUIRK set.
Since pci_walk_bus() already holds a pci_bus_sem read lock, use pci_enable_link_state_locked() to enable link states in order to avoid a potential deadlock (e.g. in case someone takes a write lock before reacquiring the read lock).
Fixes: f492edb40b54 ("PCI: vmd: Add quirk to configure PCIe ASPM and LTR") Link: https://lore.kernel.org/r/20231128081512.19387-3-johan+linaro@kernel.org Signed-off-by: Johan Hovold johan+linaro@kernel.org [bhelgaas: add "potential" in subject since the deadlock has only been reported by lockdep, include helper name in commit log] Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Cc: stable@vger.kernel.org # 6.3 Cc: Michael Bottini michael.a.bottini@linux.intel.com Cc: David E. Box david.e.box@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/pci/controller/vmd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index 1c1c1aa940a5..6ac0afae0ca1 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -751,7 +751,7 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata) if (!(features & VMD_FEAT_BIOS_PM_QUIRK)) return 0;
- pci_enable_link_state(pdev, PCIE_LINK_STATE_ALL); + pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR); if (!pos)
From: Michael Walle mwalle@kernel.org
stable inclusion from stable-v6.6.8 commit 8964524158ace2f0fcc055e663ab5308cae40175 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 4662817aed5a9d6c695658d0105d8ff4b84ac6cb ]
drm_crtc_from_index(0) might return NULL if there are no CRTCs registered at all which will lead to a kernel oops in mtk_drm_crtc_dma_dev_get(). Add the missing return value check.
Fixes: 0d9eee9118b7 ("drm/mediatek: Add drm ovl_adaptor sub driver for MT8195") Signed-off-by: Michael Walle mwalle@kernel.org Reviewed-by: Nícolas F. R. A. Prado nfraprado@collabora.com Tested-by: Nícolas F. R. A. Prado nfraprado@collabora.com Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Tested-by: Eugen Hristev eugen.hristev@collabora.com Reviewed-by: Eugen Hristev eugen.hristev@collabora.com Link: https://patchwork.kernel.org/project/dri-devel/patch/20230905084922.3908121-... Signed-off-by: Chun-Kuang Hu chunkuang.hu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/gpu/drm/mediatek/mtk_drm_drv.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c index 2d6a979afe8f..cdd506c80373 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c @@ -421,6 +421,7 @@ static int mtk_drm_kms_init(struct drm_device *drm) struct mtk_drm_private *private = drm->dev_private; struct mtk_drm_private *priv_n; struct device *dma_dev = NULL; + struct drm_crtc *crtc; int ret, i, j;
if (drm_firmware_drivers_only()) @@ -495,7 +496,9 @@ static int mtk_drm_kms_init(struct drm_device *drm) }
/* Use OVL device for all DMA memory allocations */ - dma_dev = mtk_drm_crtc_dma_dev_get(drm_crtc_from_index(drm, 0)); + crtc = drm_crtc_from_index(drm, 0); + if (crtc) + dma_dev = mtk_drm_crtc_dma_dev_get(crtc); if (!dma_dev) { ret = -ENODEV; dev_err(drm->dev, "Need at least one OVL device\n");
From: "Jason-JH.Lin" jason-jh.lin@mediatek.com
stable inclusion from stable-v6.6.8 commit 7d6e9cb7b95165b40e7718b017c16b543d93411a bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit fe4c5f662097978b6c91c23a13c24ed92339a180 ]
Add spinlock protection to avoid race condition on vblank event between mtk_drm_crtc_atomic_begin() and mtk_drm_finish_page_flip().
Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.") Signed-off-by: Jason-JH.Lin jason-jh.lin@mediatek.com Suggested-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Reviewed-by: Alexandre Mergnat amergnat@baylibre.com Reviewed-by: Fei Shao fshao@chromium.org Tested-by: Fei Shao fshao@chromium.org Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Reviewed-by: CK Hu ck.hu@mediatek.com Link: https://patchwork.kernel.org/project/dri-devel/patch/20230920090658.31181-1-... Signed-off-by: Chun-Kuang Hu chunkuang.hu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c index 0a511d7688a3..a033da279a94 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c @@ -747,6 +747,7 @@ static void mtk_drm_crtc_atomic_begin(struct drm_crtc *crtc, crtc); struct mtk_crtc_state *mtk_crtc_state = to_mtk_crtc_state(crtc_state); struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc); + unsigned long flags;
if (mtk_crtc->event && mtk_crtc_state->base.event) DRM_ERROR("new event while there is still a pending event\n"); @@ -754,7 +755,11 @@ static void mtk_drm_crtc_atomic_begin(struct drm_crtc *crtc, if (mtk_crtc_state->base.event) { mtk_crtc_state->base.event->pipe = drm_crtc_index(crtc); WARN_ON(drm_crtc_vblank_get(crtc) != 0); + + spin_lock_irqsave(&crtc->dev->event_lock, flags); mtk_crtc->event = mtk_crtc_state->base.event; + spin_unlock_irqrestore(&crtc->dev->event_lock, flags); + mtk_crtc_state->base.event = NULL; } }
From: Stanislaw Gruszka stanislaw.gruszka@linux.intel.com
stable inclusion from stable-v6.6.8 commit 0afcc6291024dc8fa32b966a0da27345b82bc6cd bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit eefa13a69053a09f20b2d1c00dda59be9c98cfe9 ]
Use ivpu_dbg(MISC) to print information about workarounds.
Reviewed-by: Karol Wachowski karol.wachowski@linux.intel.com Reviewed-by: Jeffrey Hugo quic_jhugo@quicinc.com Signed-off-by: Stanislaw Gruszka stanislaw.gruszka@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230901094957.168898-6-stanis... Stable-dep-of: 35c49cfc8b70 ("accel/ivpu/37xx: Fix interrupt_clear_with_0 WA initialization") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/accel/ivpu/ivpu_drv.h | 5 +++++ drivers/accel/ivpu/ivpu_hw_37xx.c | 5 +++++ drivers/accel/ivpu/ivpu_hw_40xx.c | 4 ++++ 3 files changed, 14 insertions(+)
diff --git a/drivers/accel/ivpu/ivpu_drv.h b/drivers/accel/ivpu/ivpu_drv.h index 2adc349126bb..6853dfe1c7e5 100644 --- a/drivers/accel/ivpu/ivpu_drv.h +++ b/drivers/accel/ivpu/ivpu_drv.h @@ -76,6 +76,11 @@
#define IVPU_WA(wa_name) (vdev->wa.wa_name)
+#define IVPU_PRINT_WA(wa_name) do { \ + if (IVPU_WA(wa_name)) \ + ivpu_dbg(vdev, MISC, "Using WA: " #wa_name "\n"); \ +} while (0) + struct ivpu_wa_table { bool punit_disabled; bool clear_runtime_mem; diff --git a/drivers/accel/ivpu/ivpu_hw_37xx.c b/drivers/accel/ivpu/ivpu_hw_37xx.c index b8010c07eec1..2409ff0dda61 100644 --- a/drivers/accel/ivpu/ivpu_hw_37xx.c +++ b/drivers/accel/ivpu/ivpu_hw_37xx.c @@ -104,6 +104,11 @@ static void ivpu_hw_wa_init(struct ivpu_device *vdev)
if (ivpu_device_id(vdev) == PCI_DEVICE_ID_MTL && ivpu_revision(vdev) < 4) vdev->wa.interrupt_clear_with_0 = true; + + IVPU_PRINT_WA(punit_disabled); + IVPU_PRINT_WA(clear_runtime_mem); + IVPU_PRINT_WA(d3hot_after_power_off); + IVPU_PRINT_WA(interrupt_clear_with_0); }
static void ivpu_hw_timeouts_init(struct ivpu_device *vdev) diff --git a/drivers/accel/ivpu/ivpu_hw_40xx.c b/drivers/accel/ivpu/ivpu_hw_40xx.c index 7c3ff25232a2..03600a7a5aca 100644 --- a/drivers/accel/ivpu/ivpu_hw_40xx.c +++ b/drivers/accel/ivpu/ivpu_hw_40xx.c @@ -125,6 +125,10 @@ static void ivpu_hw_wa_init(struct ivpu_device *vdev)
if (ivpu_hw_gen(vdev) == IVPU_HW_40XX) vdev->wa.disable_clock_relinquish = true; + + IVPU_PRINT_WA(punit_disabled); + IVPU_PRINT_WA(clear_runtime_mem); + IVPU_PRINT_WA(disable_clock_relinquish); }
static void ivpu_hw_timeouts_init(struct ivpu_device *vdev)
From: Andrzej Kacprowski Andrzej.Kacprowski@intel.com
stable inclusion from stable-v6.6.8 commit 83a42d791ba206978ec5381743dbcc49f3357e1d bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 35c49cfc8b702eda7a0d3f05497b16f81b69e289 ]
Using PCI Device ID/Revision to initialize the interrupt_clear_with_0 workaround is problematic - there are many pre-production steppings with different behavior, even with the same PCI ID/Revision
Instead of checking for PCI Device ID/Revision, check the VPU buttress interrupt status register behavior - if this register is not zero after writing 1s it means there register is RW instead of RW1C and we need to enable the interrupt_clear_with_0 workaround.
Fixes: 7f34e01f77f8 ("accel/ivpu: Clear specific interrupt status bits on C0") Signed-off-by: Andrzej Kacprowski Andrzej.Kacprowski@intel.com Signed-off-by: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com Reviewed-by: Jeffrey Hugo quic_jhugo@quicinc.com Link: https://lore.kernel.org/all/20231204122331.40560-1-jacek.lawrynowicz@linux.i... Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/accel/ivpu/ivpu_hw_37xx.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/accel/ivpu/ivpu_hw_37xx.c b/drivers/accel/ivpu/ivpu_hw_37xx.c index 2409ff0dda61..ddf03498fd4c 100644 --- a/drivers/accel/ivpu/ivpu_hw_37xx.c +++ b/drivers/accel/ivpu/ivpu_hw_37xx.c @@ -53,10 +53,12 @@
#define ICB_0_1_IRQ_MASK ((((u64)ICB_1_IRQ_MASK) << 32) | ICB_0_IRQ_MASK)
-#define BUTTRESS_IRQ_MASK ((REG_FLD(VPU_37XX_BUTTRESS_INTERRUPT_STAT, FREQ_CHANGE)) | \ - (REG_FLD(VPU_37XX_BUTTRESS_INTERRUPT_STAT, ATS_ERR)) | \ +#define BUTTRESS_IRQ_MASK ((REG_FLD(VPU_37XX_BUTTRESS_INTERRUPT_STAT, ATS_ERR)) | \ (REG_FLD(VPU_37XX_BUTTRESS_INTERRUPT_STAT, UFI_ERR)))
+#define BUTTRESS_ALL_IRQ_MASK (BUTTRESS_IRQ_MASK | \ + (REG_FLD(VPU_37XX_BUTTRESS_INTERRUPT_STAT, FREQ_CHANGE))) + #define BUTTRESS_IRQ_ENABLE_MASK ((u32)~BUTTRESS_IRQ_MASK) #define BUTTRESS_IRQ_DISABLE_MASK ((u32)-1)
@@ -102,8 +104,12 @@ static void ivpu_hw_wa_init(struct ivpu_device *vdev) vdev->wa.clear_runtime_mem = false; vdev->wa.d3hot_after_power_off = true;
- if (ivpu_device_id(vdev) == PCI_DEVICE_ID_MTL && ivpu_revision(vdev) < 4) + REGB_WR32(VPU_37XX_BUTTRESS_INTERRUPT_STAT, BUTTRESS_ALL_IRQ_MASK); + if (REGB_RD32(VPU_37XX_BUTTRESS_INTERRUPT_STAT) == BUTTRESS_ALL_IRQ_MASK) { + /* Writing 1s does not clear the interrupt status register */ vdev->wa.interrupt_clear_with_0 = true; + REGB_WR32(VPU_37XX_BUTTRESS_INTERRUPT_STAT, 0x0); + }
IVPU_PRINT_WA(punit_disabled); IVPU_PRINT_WA(clear_runtime_mem);
From: Tvrtko Ursulin tvrtko.ursulin@intel.com
stable inclusion from stable-v6.6.8 commit 54d08313a34faf7cdfa29cd6da320d1aa58b5235 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 7c7c863bf89c5f76d8c7fda177a81559b61dc15b ]
Engine->id namespace is per-tile so struct igt_live_test->reset_engine[] needs to be two-dimensional so engine reset counts from all tiles can be stored with no aliasing. With aliasing, if we had a real multi-tile platform, the reset counts would be incorrect for same engine instance on different tiles.
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Fixes: 0c29efa23f5c ("drm/i915/selftests: Consider multi-gt instead of to_gt()") Reported-by: Alan Previn Teres Alexis alan.previn.teres.alexis@intel.com Cc: Tejas Upadhyay tejas.upadhyay@intel.com Cc: Andi Shyti andi.shyti@linux.intel.com Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Reviewed-by: Andi Shyti andi.shyti@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20231201122109.729006-1-tvrtko... (cherry picked from commit 0647ece3819b018cb62a71c3bcb7c2c3243e78ac) Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/gpu/drm/i915/selftests/igt_live_test.c | 9 +++++---- drivers/gpu/drm/i915/selftests/igt_live_test.h | 3 ++- 2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c index 4ddc6d902752..7d41874a49c5 100644 --- a/drivers/gpu/drm/i915/selftests/igt_live_test.c +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c @@ -37,8 +37,9 @@ int igt_live_test_begin(struct igt_live_test *t, }
for_each_engine(engine, gt, id) - t->reset_engine[id] = - i915_reset_engine_count(&i915->gpu_error, engine); + t->reset_engine[i][id] = + i915_reset_engine_count(&i915->gpu_error, + engine); }
t->reset_global = i915_reset_count(&i915->gpu_error); @@ -66,14 +67,14 @@ int igt_live_test_end(struct igt_live_test *t)
for_each_gt(gt, i915, i) { for_each_engine(engine, gt, id) { - if (t->reset_engine[id] == + if (t->reset_engine[i][id] == i915_reset_engine_count(&i915->gpu_error, engine)) continue;
gt_err(gt, "%s(%s): engine '%s' was reset %d times!\n", t->func, t->name, engine->name, i915_reset_engine_count(&i915->gpu_error, engine) - - t->reset_engine[id]); + t->reset_engine[i][id]); return -EIO; } } diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.h b/drivers/gpu/drm/i915/selftests/igt_live_test.h index 36ed42736c52..83e3ad430922 100644 --- a/drivers/gpu/drm/i915/selftests/igt_live_test.h +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.h @@ -7,6 +7,7 @@ #ifndef IGT_LIVE_TEST_H #define IGT_LIVE_TEST_H
+#include "gt/intel_gt_defines.h" /* for I915_MAX_GT */ #include "gt/intel_engine.h" /* for I915_NUM_ENGINES */
struct drm_i915_private; @@ -17,7 +18,7 @@ struct igt_live_test { const char *name;
unsigned int reset_global; - unsigned int reset_engine[I915_NUM_ENGINES]; + unsigned int reset_engine[I915_MAX_GT][I915_NUM_ENGINES]; };
/*
From: Tvrtko Ursulin tvrtko.ursulin@intel.com
stable inclusion from stable-v6.6.8 commit cd378c371ba080652a9c4d02f77d0737ff0d126e bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 1f721a93a528268fa97875cff515d1fcb69f4f44 ]
Commit 503579448db9 ("drm/i915/gsc: Mark internal GSC engine with reserved uabi class") made the GSC0 engine not have a valid uabi class and so broke the engine reset counting, which in turn was made class based in cb823ed9915b ("drm/i915/gt: Use intel_gt as the primary object for handling resets").
Despite the title and commit text of the latter is not mentioning it (and has left the storage array incorrectly sized), tracking by class, despite it adding aliasing in hypthotetical multi-tile systems, is handy for virtual engines which for instance do not have a valid engine->id.
Therefore we keep that but just change it to use the internal class which is always valid. We also add a helper to increment the count, which aligns with the existing getter.
What was broken without this fix were out of bounds reads every time a reset would happen on the GSC0 engine, or during selftests when storing and cross-checking the counts in igt_live_test_begin and igt_live_test_end.
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Fixes: 503579448db9 ("drm/i915/gsc: Mark internal GSC engine with reserved uabi class") [tursulin: fixed Fixes tag] Reported-by: Alan Previn Teres Alexis alan.previn.teres.alexis@intel.com Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Reviewed-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20231201122109.729006-2-tvrtko... (cherry picked from commit cf9cb028ac56696ff879af1154c4b2f0b12701fd) Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/gpu/drm/i915/gt/intel_reset.c | 2 +- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +++-- drivers/gpu/drm/i915/i915_gpu_error.h | 12 ++++++++++-- 3 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index cc6bd21a3e51..5fa57a34cf4b 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -1297,7 +1297,7 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg) if (msg) drm_notice(&engine->i915->drm, "Resetting %s for %s\n", engine->name, msg); - atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]); + i915_increase_reset_engine_count(&engine->i915->gpu_error, engine);
ret = intel_gt_reset_engine(engine); if (ret) { diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index dc7b40e06e38..836e4d9d65ef 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -4774,7 +4774,8 @@ static void capture_error_state(struct intel_guc *guc, if (match) { intel_engine_set_hung_context(e, ce); engine_mask |= e->mask; - atomic_inc(&i915->gpu_error.reset_engine_count[e->uabi_class]); + i915_increase_reset_engine_count(&i915->gpu_error, + e); } }
@@ -4786,7 +4787,7 @@ static void capture_error_state(struct intel_guc *guc, } else { intel_engine_set_hung_context(ce->engine, ce); engine_mask = ce->engine->mask; - atomic_inc(&i915->gpu_error.reset_engine_count[ce->engine->uabi_class]); + i915_increase_reset_engine_count(&i915->gpu_error, ce->engine); }
with_intel_runtime_pm(&i915->runtime_pm, wakeref) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h index 9f5971f5e980..48f6c00402c4 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.h +++ b/drivers/gpu/drm/i915/i915_gpu_error.h @@ -16,6 +16,7 @@
#include "display/intel_display_device.h" #include "gt/intel_engine.h" +#include "gt/intel_engine_types.h" #include "gt/intel_gt_types.h" #include "gt/uc/intel_uc_fw.h"
@@ -232,7 +233,7 @@ struct i915_gpu_error { atomic_t reset_count;
/** Number of times an engine has been reset */ - atomic_t reset_engine_count[I915_NUM_ENGINES]; + atomic_t reset_engine_count[MAX_ENGINE_CLASS]; };
struct drm_i915_error_state_buf { @@ -255,7 +256,14 @@ static inline u32 i915_reset_count(struct i915_gpu_error *error) static inline u32 i915_reset_engine_count(struct i915_gpu_error *error, const struct intel_engine_cs *engine) { - return atomic_read(&error->reset_engine_count[engine->uabi_class]); + return atomic_read(&error->reset_engine_count[engine->class]); +} + +static inline void +i915_increase_reset_engine_count(struct i915_gpu_error *error, + const struct intel_engine_cs *engine) +{ + atomic_inc(&error->reset_engine_count[engine->class]); }
#define CORE_DUMP_FLAG_NONE 0x0
From: David Hildenbrand david@redhat.com
stable inclusion from stable-v6.6.8 commit ca3ebcf2c448f04ae144ddb19ad68b4cab6fcad2 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit a6fcd57cf2df409d35e9225b8dbad6f937b28df0 ]
Doing a ksft_print_msg() before the ksft_print_header() seems to confuse the ksft framework in a strange way: running the test on the cmdline results in the expected output.
But piping the output somewhere else, results in some odd output, whereby we repeatedly get the same info printed: # [INFO] detected THP size: 2048 KiB # [INFO] detected hugetlb page size: 2048 KiB # [INFO] detected hugetlb page size: 1048576 KiB # [INFO] huge zeropage is enabled TAP version 13 1..190 # [INFO] Anonymous memory tests in private mappings # [RUN] Basic COW after fork() ... with base page # [INFO] detected THP size: 2048 KiB # [INFO] detected hugetlb page size: 2048 KiB # [INFO] detected hugetlb page size: 1048576 KiB # [INFO] huge zeropage is enabled TAP version 13 1..190 # [INFO] Anonymous memory tests in private mappings # [RUN] Basic COW after fork() ... with base page ok 1 No leak from parent into child # [RUN] Basic COW after fork() ... with swapped out base page # [INFO] detected THP size: 2048 KiB # [INFO] detected hugetlb page size: 2048 KiB # [INFO] detected hugetlb page size: 1048576 KiB # [INFO] huge zeropage is enabled
Doing the ksft_print_header() first seems to resolve that and gives us the output we expect: TAP version 13 # [INFO] detected THP size: 2048 KiB # [INFO] detected hugetlb page size: 2048 KiB # [INFO] detected hugetlb page size: 1048576 KiB # [INFO] huge zeropage is enabled 1..190 # [INFO] Anonymous memory tests in private mappings # [RUN] Basic COW after fork() ... with base page ok 1 No leak from parent into child # [RUN] Basic COW after fork() ... with swapped out base page ok 2 No leak from parent into child # [RUN] Basic COW after fork() ... with THP ok 3 No leak from parent into child # [RUN] Basic COW after fork() ... with swapped-out THP ok 4 No leak from parent into child # [RUN] Basic COW after fork() ... with PTE-mapped THP ok 5 No leak from parent into child
Link: https://lkml.kernel.org/r/20231206103558.38040-1-david@redhat.com Fixes: f4b5fd6946e2 ("selftests/vm: anon_cow: THP tests") Signed-off-by: David Hildenbrand david@redhat.com Reported-by: Nico Pache npache@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- tools/testing/selftests/mm/cow.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c index ace94bfb532d..363bf5f801be 100644 --- a/tools/testing/selftests/mm/cow.c +++ b/tools/testing/selftests/mm/cow.c @@ -1738,6 +1738,8 @@ int main(int argc, char **argv) int err; struct thp_settings default_settings;
+ ksft_print_header(); + pagesize = getpagesize(); pmdsize = read_pmd_pagesize(); if (pmdsize) { @@ -1755,7 +1757,6 @@ int main(int argc, char **argv) ARRAY_SIZE(hugetlbsizes)); detect_huge_zeropage();
- ksft_print_header(); ksft_set_plan(ARRAY_SIZE(anon_test_cases) * tests_per_anon_test_case() + ARRAY_SIZE(anon_thp_test_cases) * tests_per_anon_thp_test_case() + ARRAY_SIZE(non_anon_test_cases) * tests_per_non_anon_test_case());
From: Saurabh Sengar ssengar@linux.microsoft.com
stable inclusion from stable-v6.6.8 commit 54d8c1d3261dfb8be84ffc17cf94fa231c9d88d4 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 7e8037b099c0bbe8f2109dc452dbcab8d400fc53 ]
A Gen2 VM doesn't support legacy PCI/PCIe, so both raw_pci_ops and raw_pci_ext_ops are NULL, and pci_subsys_init() -> pcibios_init() doesn't call pcibios_resource_survey() -> e820__reserve_resources_late(); as a result, any emulated persistent memory of E820_TYPE_PRAM (12) via the kernel parameter memmap=nn[KMG]!ss is not added into iomem_resource and hence can't be detected by register_e820_pmem().
Fix this by directly calling e820__reserve_resources_late() in hv_pci_init(), which is called from arch_initcall(pci_arch_init).
It's ok to move a Gen2 VM's e820__reserve_resources_late() from subsys_initcall(pci_subsys_init) to arch_initcall(pci_arch_init) because the code in-between doesn't depend on the E820 resources. e820__reserve_resources_late() depends on e820__reserve_resources(), which has been called earlier from setup_arch().
For a Gen-2 VM, the new hv_pci_init() also adds any memory of E820_TYPE_PMEM (7) into iomem_resource, and acpi_nfit_register_region() -> acpi_nfit_insert_resource() -> region_intersects() returns REGION_INTERSECTS, so the memory of E820_TYPE_PMEM won't get added twice.
Changed the local variable "int gen2vm" to "bool gen2vm".
Signed-off-by: Saurabh Sengar ssengar@linux.microsoft.com Signed-off-by: Dexuan Cui decui@microsoft.com Signed-off-by: Wei Liu wei.liu@kernel.org Message-ID: 1699691867-9827-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- arch/x86/hyperv/hv_init.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index 21556ad87f4b..8f3a4d16bb79 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -15,6 +15,7 @@ #include <linux/io.h> #include <asm/apic.h> #include <asm/desc.h> +#include <asm/e820/api.h> #include <asm/sev.h> #include <asm/ibt.h> #include <asm/hypervisor.h> @@ -286,15 +287,31 @@ static int hv_cpu_die(unsigned int cpu)
static int __init hv_pci_init(void) { - int gen2vm = efi_enabled(EFI_BOOT); + bool gen2vm = efi_enabled(EFI_BOOT);
/* - * For Generation-2 VM, we exit from pci_arch_init() by returning 0. - * The purpose is to suppress the harmless warning: + * A Generation-2 VM doesn't support legacy PCI/PCIe, so both + * raw_pci_ops and raw_pci_ext_ops are NULL, and pci_subsys_init() -> + * pcibios_init() doesn't call pcibios_resource_survey() -> + * e820__reserve_resources_late(); as a result, any emulated persistent + * memory of E820_TYPE_PRAM (12) via the kernel parameter + * memmap=nn[KMG]!ss is not added into iomem_resource and hence can't be + * detected by register_e820_pmem(). Fix this by directly calling + * e820__reserve_resources_late() here: e820__reserve_resources_late() + * depends on e820__reserve_resources(), which has been called earlier + * from setup_arch(). Note: e820__reserve_resources_late() also adds + * any memory of E820_TYPE_PMEM (7) into iomem_resource, and + * acpi_nfit_register_region() -> acpi_nfit_insert_resource() -> + * region_intersects() returns REGION_INTERSECTS, so the memory of + * E820_TYPE_PMEM won't get added twice. + * + * We return 0 here so that pci_arch_init() won't print the warning: * "PCI: Fatal: No config space access function found" */ - if (gen2vm) + if (gen2vm) { + e820__reserve_resources_late(); return 0; + }
/* For Generation-1 VM, we'll proceed in pci_arch_init(). */ return 1;
From: Oliver Neukum oneukum@suse.com
stable inclusion from stable-v6.6.8 commit 2ebf775f0541ae0d474836fa0cf3220e502f8e3e bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit ccab434e674ca95d483788b1895a70c21b7f016a ]
If a device sends a packet that is inbetween 0 and sizeof(u64) the value passed to skb_trim() as length will wrap around ending up as some very large value.
The driver will then proceed to parse the header located at that position, which will either oops or process some random value.
The fix is to check against sizeof(u64) rather than 0, which the driver currently does. The issue exists since the introduction of the driver.
Signed-off-by: Oliver Neukum oneukum@suse.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/usb/aqc111.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c index a017e9de2119..7b8afa589a53 100644 --- a/drivers/net/usb/aqc111.c +++ b/drivers/net/usb/aqc111.c @@ -1079,17 +1079,17 @@ static int aqc111_rx_fixup(struct usbnet *dev, struct sk_buff *skb) u16 pkt_count = 0; u64 desc_hdr = 0; u16 vlan_tag = 0; - u32 skb_len = 0; + u32 skb_len;
if (!skb) goto err;
- if (skb->len == 0) + skb_len = skb->len; + if (skb_len < sizeof(desc_hdr)) goto err;
- skb_len = skb->len; /* RX Descriptor Header */ - skb_trim(skb, skb->len - sizeof(desc_hdr)); + skb_trim(skb, skb_len - sizeof(desc_hdr)); desc_hdr = le64_to_cpup((u64 *)skb_tail_pointer(skb));
/* Check these packets */
From: Jean Delvare jdelvare@suse.de
stable inclusion from stable-v6.6.8 commit c40c0b89bf1d243de9b3d5f233d8d19ac51f9eae bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 7fbd5fc2b35a8f559a6b380dfa9bcd964a758186 ]
Only present the DWMAC_LOONGSON option on architectures where it can actually be used.
This follows the same logic as the DWMAC_INTEL option.
Signed-off-by: Jean Delvare jdelvare@suse.de Cc: Keguang Zhang keguang.zhang@gmail.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/ethernet/stmicro/stmmac/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig b/drivers/net/ethernet/stmicro/stmmac/Kconfig index 06c6871f8788..25f2d42de406 100644 --- a/drivers/net/ethernet/stmicro/stmmac/Kconfig +++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig @@ -269,7 +269,7 @@ config DWMAC_INTEL config DWMAC_LOONGSON tristate "Loongson PCI DWMAC support" default MACH_LOONGSON64 - depends on STMMAC_ETH && PCI + depends on (MACH_LOONGSON64 || COMPILE_TEST) && STMMAC_ETH && PCI depends on COMMON_CLK help This selects the LOONGSON PCI bus support for the stmmac driver,
From: David Howells dhowells@redhat.com
stable inclusion from stable-v6.6.8 commit 462f1111d9459baa1ff98c3bc966b64d480b9ea0 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 0c3bd086d12d185650d095a906662593ec607bd0 ]
Fix some superficial issues with the tracing of rxrpc_bundle structs, including:
(1) Set the debug_id when the bundle is allocated rather than when it is set up so that the "NEW" trace line displays the correct bundle ID.
(2) Show the refcount when emitting the "FREE" traceline.
Signed-off-by: David Howells dhowells@redhat.com cc: Marc Dionne marc.dionne@auristor.com cc: "David S. Miller" davem@davemloft.net cc: Eric Dumazet edumazet@google.com cc: Jakub Kicinski kuba@kernel.org cc: Paolo Abeni pabeni@redhat.com cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- net/rxrpc/conn_client.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c index 981ca5b98bcb..1d95f8bc769f 100644 --- a/net/rxrpc/conn_client.c +++ b/net/rxrpc/conn_client.c @@ -73,6 +73,7 @@ static void rxrpc_destroy_client_conn_ids(struct rxrpc_local *local) static struct rxrpc_bundle *rxrpc_alloc_bundle(struct rxrpc_call *call, gfp_t gfp) { + static atomic_t rxrpc_bundle_id; struct rxrpc_bundle *bundle;
bundle = kzalloc(sizeof(*bundle), gfp); @@ -85,6 +86,7 @@ static struct rxrpc_bundle *rxrpc_alloc_bundle(struct rxrpc_call *call, bundle->upgrade = test_bit(RXRPC_CALL_UPGRADE, &call->flags); bundle->service_id = call->dest_srx.srx_service; bundle->security_level = call->security_level; + bundle->debug_id = atomic_inc_return(&rxrpc_bundle_id); refcount_set(&bundle->ref, 1); atomic_set(&bundle->active, 1); INIT_LIST_HEAD(&bundle->waiting_calls); @@ -105,7 +107,8 @@ struct rxrpc_bundle *rxrpc_get_bundle(struct rxrpc_bundle *bundle,
static void rxrpc_free_bundle(struct rxrpc_bundle *bundle) { - trace_rxrpc_bundle(bundle->debug_id, 1, rxrpc_bundle_free); + trace_rxrpc_bundle(bundle->debug_id, refcount_read(&bundle->ref), + rxrpc_bundle_free); rxrpc_put_peer(bundle->peer, rxrpc_peer_put_bundle); key_put(bundle->key); kfree(bundle); @@ -239,7 +242,6 @@ static bool rxrpc_may_reuse_conn(struct rxrpc_connection *conn) */ int rxrpc_look_up_bundle(struct rxrpc_call *call, gfp_t gfp) { - static atomic_t rxrpc_bundle_id; struct rxrpc_bundle *bundle, *candidate; struct rxrpc_local *local = call->local; struct rb_node *p, **pp, *parent; @@ -306,7 +308,6 @@ int rxrpc_look_up_bundle(struct rxrpc_call *call, gfp_t gfp) }
_debug("new bundle"); - candidate->debug_id = atomic_inc_return(&rxrpc_bundle_id); rb_link_node(&candidate->local_node, parent, pp); rb_insert_color(&candidate->local_node, &local->client_bundles); call->bundle = rxrpc_get_bundle(candidate, rxrpc_bundle_get_client_call);
From: Ming Lei ming.lei@redhat.com
stable inclusion from stable-v6.6.8 commit 5aba47ce61b7c277246e367f51fdcc930af6a699 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 27b13e209ddca5979847a1b57890e0372c1edcee ]
Inside blkg_for_each_descendant_pre(), both css_for_each_descendant_pre() and blkg_lookup() requires RCU read lock, and either cgroup_assert_mutex_or_rcu_locked() or rcu_read_lock_held() is called.
Fix the warning by adding rcu read lock.
Reported-by: Changhui Zhong czhong@redhat.com Signed-off-by: Ming Lei ming.lei@redhat.com Link: https://lore.kernel.org/r/20231117023527.3188627-2-ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- block/blk-throttle.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 76a54f43135c..b1cbb3a1da0f 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -1346,6 +1346,7 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global) tg_bps_limit(tg, READ), tg_bps_limit(tg, WRITE), tg_iops_limit(tg, READ), tg_iops_limit(tg, WRITE));
+ rcu_read_lock(); /* * Update has_rules[] flags for the updated tg's subtree. A tg is * considered to have rules if either the tg itself or any of its @@ -1373,6 +1374,7 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global) this_tg->latency_target = max(this_tg->latency_target, parent_tg->latency_target); } + rcu_read_unlock();
/* * We're already holding queue_lock and know @tg is valid. Let's
From: Coly Li colyli@suse.de
stable inclusion from stable-v6.6.8 commit 09bdafb89a56a267c070fdb96d385934c14d2735 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit baf8fb7e0e5ec54ea0839f0c534f2cdcd79bea9c ]
Arraies bcache->stripe_sectors_dirty and bcache->full_dirty_stripes are used for dirty data writeback, their sizes are decided by backing device capacity and stripe size. Larger backing device capacity or smaller stripe size make these two arraies occupies more dynamic memory space.
Currently bcache->stripe_size is directly inherited from queue->limits.io_opt of underlying storage device. For normal hard drives, its limits.io_opt is 0, and bcache sets the corresponding stripe_size to 1TB (1<<31 sectors), it works fine 10+ years. But for devices do declare value for queue->limits.io_opt, small stripe_size (comparing to 1TB) becomes an issue for oversize memory allocations of bcache->stripe_sectors_dirty and bcache->full_dirty_stripes, while the capacity of hard drives gets much larger in recent decade.
For example a raid5 array assembled by three 20TB hardrives, the raid device capacity is 40TB with typical 512KB limits.io_opt. After the math calculation in bcache code, these two arraies will occupy 400MB dynamic memory. Even worse Andrea Tomassetti reports that a 4KB limits.io_opt is declared on a new 2TB hard drive, then these two arraies request 2GB and 512MB dynamic memory from kzalloc(). The result is that bcache device always fails to initialize on his system.
To avoid the oversize memory allocation, bcache->stripe_size should not directly inherited by queue->limits.io_opt from the underlying device. This patch defines BCH_MIN_STRIPE_SZ (4MB) as minimal bcache stripe size and set bcache device's stripe size against the declared limits.io_opt value from the underlying storage device, - If the declared limits.io_opt > BCH_MIN_STRIPE_SZ, bcache device will set its stripe size directly by this limits.io_opt value. - If the declared limits.io_opt < BCH_MIN_STRIPE_SZ, bcache device will set its stripe size by a value multiplying limits.io_opt and euqal or large than BCH_MIN_STRIPE_SZ.
Then the minimal stripe size of a bcache device will always be >= 4MB. For a 40TB raid5 device with 512KB limits.io_opt, memory occupied by bcache->stripe_sectors_dirty and bcache->full_dirty_stripes will be 50MB in total. For a 2TB hard drive with 4KB limits.io_opt, memory occupied by these two arraies will be 2.5MB in total.
Such mount of memory allocated for bcache->stripe_sectors_dirty and bcache->full_dirty_stripes is reasonable for most of storage devices.
Reported-by: Andrea Tomassetti andrea.tomassetti-opensource@devo.com Signed-off-by: Coly Li colyli@suse.de Reviewed-by: Eric Wheeler bcache@lists.ewheeler.net Link: https://lore.kernel.org/r/20231120052503.6122-2-colyli@suse.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/md/bcache/bcache.h | 1 + drivers/md/bcache/super.c | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h index 2aa3f2c1f719..04e54a2d0ebf 100644 --- a/drivers/md/bcache/bcache.h +++ b/drivers/md/bcache/bcache.h @@ -265,6 +265,7 @@ struct bcache_device { #define BCACHE_DEV_WB_RUNNING 3 #define BCACHE_DEV_RATE_DW_RUNNING 4 int nr_stripes; +#define BCH_MIN_STRIPE_SZ ((4 << 20) >> SECTOR_SHIFT) unsigned int stripe_size; atomic_t *stripe_sectors_dirty; unsigned long *full_dirty_stripes; diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index a30c8d4f2ac8..cc5c603f92d9 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -905,6 +905,8 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
if (!d->stripe_size) d->stripe_size = 1 << 31; + else if (d->stripe_size < BCH_MIN_STRIPE_SZ) + d->stripe_size = roundup(BCH_MIN_STRIPE_SZ, d->stripe_size);
n = DIV_ROUND_UP_ULL(sectors, d->stripe_size); if (!n || n > max_stripes) {
From: Colin Ian King colin.i.king@gmail.com
stable inclusion from stable-v6.6.8 commit 665341724499addb5400202294eef513b0f5f5de bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit be93825f0e6428c2d3f03a6e4d447dc48d33d7ff ]
Variable cur_idx is being initialized with a value that is never read, it is being re-assigned later in a while-loop. Remove the redundant assignment. Cleans up clang scan build warning:
drivers/md/bcache/writeback.c:916:2: warning: Value stored to 'cur_idx' is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King colin.i.king@gmail.com Reviewed-by: Coly Li colyli@suse.de Signed-off-by: Coly Li colyli@suse.de Link: https://lore.kernel.org/r/20231120052503.6122-4-colyli@suse.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/md/bcache/writeback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index d4432b3a6f96..3accfdaee6b1 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -913,7 +913,7 @@ static int bch_dirty_init_thread(void *arg) int cur_idx, prev_idx, skip_nr;
k = p = NULL; - cur_idx = prev_idx = 0; + prev_idx = 0;
bch_btree_iter_init(&c->root->keys, &iter, NULL); k = bch_btree_iter_next_filter(&iter, &c->root->keys, bch_ptr_bad);
From: Coly Li colyli@suse.de
stable inclusion from stable-v6.6.8 commit 286918928ed7094d53e6fd08513e4a3f1a09f1d0 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 31f5b956a197d4ec25c8a07cb3a2ab69d0c0b82f ]
This patch adds code comments to bch_btree_node_get() and __bch_btree_node_alloc() that NULL pointer will not be returned and it is unnecessary to check NULL pointer by the callers of these routines.
Signed-off-by: Coly Li colyli@suse.de Link: https://lore.kernel.org/r/20231120052503.6122-10-colyli@suse.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/md/bcache/btree.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 3084c57248f6..b709c2fde782 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -995,6 +995,9 @@ static struct btree *mca_alloc(struct cache_set *c, struct btree_op *op, * * The btree node will have either a read or a write lock held, depending on * level and op->lock. + * + * Note: Only error code or btree pointer will be returned, it is unncessary + * for callers to check NULL pointer. */ struct btree *bch_btree_node_get(struct cache_set *c, struct btree_op *op, struct bkey *k, int level, bool write, @@ -1106,6 +1109,10 @@ static void btree_node_free(struct btree *b) mutex_unlock(&b->c->bucket_lock); }
+/* + * Only error code or btree pointer will be returned, it is unncessary for + * callers to check NULL pointer. + */ struct btree *__bch_btree_node_alloc(struct cache_set *c, struct btree_op *op, int level, bool wait, struct btree *parent)
From: Coly Li colyli@suse.de
stable inclusion from stable-v6.6.8 commit 4a4bba9f0470d0cb6afd8c8e880b04567585f542 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 3eba5e0b2422aec3c9e79822029599961fdcab97 ]
In run_cache_set() after c->root returned from bch_btree_node_get(), it is checked by IS_ERR_OR_NULL(). Indeed it is unncessary to check NULL because bch_btree_node_get() will not return NULL pointer to caller.
This patch replaces IS_ERR_OR_NULL() by IS_ERR() for the above reason.
Signed-off-by: Coly Li colyli@suse.de Link: https://lore.kernel.org/r/20231120052503.6122-11-colyli@suse.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/md/bcache/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index cc5c603f92d9..d028dfbbb6c6 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -2018,7 +2018,7 @@ static int run_cache_set(struct cache_set *c) c->root = bch_btree_node_get(c, NULL, k, j->btree_level, true, NULL); - if (IS_ERR_OR_NULL(c->root)) + if (IS_ERR(c->root)) goto err;
list_del_init(&c->root->list);
From: Mark O'Donovan shiftee@posteo.net
stable inclusion from stable-v6.6.8 commit 89fc9028e86e9968294ea4cb66d94fa7e5438871 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 38ce1570e2c46e7e9af983aa337edd7e43723aa2 ]
Some error cases were not setting an auth-failure-reason-code-explanation. This means an AUTH_Failure2 message will be sent with an explanation value of 0 which is a reserved value.
Signed-off-by: Mark O'Donovan shiftee@posteo.net Reviewed-by: Hannes Reinecke hare@suse.de Reviewed-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/nvme/host/auth.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c index 064592a5d546..811541ce206b 100644 --- a/drivers/nvme/host/auth.c +++ b/drivers/nvme/host/auth.c @@ -840,6 +840,8 @@ static void nvme_queue_auth_work(struct work_struct *work) }
fail2: + if (chap->status == 0) + chap->status = NVME_AUTH_DHCHAP_FAILURE_FAILED; dev_dbg(ctrl->device, "%s: qid %d send failure2, status %x\n", __func__, chap->qid, chap->status); tl = nvme_auth_set_dhchap_failure2_data(ctrl, chap);
From: Hannes Reinecke hare@suse.de
stable inclusion from stable-v6.6.8 commit 9514925a9abc4e26ce58ec93f70bfb565b53bd72 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit cd9aed606088d36a7ffff3e808db4e76b1854285 ]
nvme_configure_metadata() is issuing I/O, so we might incur an I/O error which will cause the connection to be reset. But in that case any further probing will race with reset and cause UAF errors. So return a status from nvme_configure_metadata() and abort probing if there was an I/O error.
Signed-off-by: Hannes Reinecke hare@suse.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/nvme/host/core.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index c09048984a27..d5c8b0a08d49 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1813,16 +1813,18 @@ static int nvme_init_ms(struct nvme_ns *ns, struct nvme_id_ns *id) return ret; }
-static void nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id) +static int nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id) { struct nvme_ctrl *ctrl = ns->ctrl; + int ret;
- if (nvme_init_ms(ns, id)) - return; + ret = nvme_init_ms(ns, id); + if (ret) + return ret;
ns->features &= ~(NVME_NS_METADATA_SUPPORTED | NVME_NS_EXT_LBAS); if (!ns->ms || !(ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)) - return; + return 0;
if (ctrl->ops->flags & NVME_F_FABRICS) { /* @@ -1831,7 +1833,7 @@ static void nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id) * remap the separate metadata buffer from the block layer. */ if (WARN_ON_ONCE(!(id->flbas & NVME_NS_FLBAS_META_EXT))) - return; + return 0;
ns->features |= NVME_NS_EXT_LBAS;
@@ -1858,6 +1860,7 @@ static void nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id) else ns->features |= NVME_NS_METADATA_SUPPORTED; } + return 0; }
static void nvme_set_queue_limits(struct nvme_ctrl *ctrl, @@ -2038,7 +2041,11 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns, ns->lba_shift = id->lbaf[lbaf].ds; nvme_set_queue_limits(ns->ctrl, ns->queue);
- nvme_configure_metadata(ns, id); + ret = nvme_configure_metadata(ns, id); + if (ret < 0) { + blk_mq_unfreeze_queue(ns->disk->queue); + goto out; + } nvme_set_chunk_sectors(ns, id); nvme_update_disk_info(ns->disk, ns, id);
From: Eduard Zingerman eddyz87@gmail.com
stable inclusion from stable-v6.6.8 commit 0ade0b82faf7b8dec11075fe8ea38a7dba898042 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit f40bfd1679446b22d321e64a1fa98b7d07d2be08 ]
This is a preparatory change. A follow-up patch "bpf: verify callbacks as if they are called unknown number of times" changes logic for callbacks handling. While previously callbacks were verified as a single function call, new scheme takes into account that callbacks could be executed unknown number of times.
This has dire implications for bpf_loop_bench:
SEC("fentry/" SYS_PREFIX "sys_getpgid") int benchmark(void *ctx) { for (int i = 0; i < 1000; i++) { bpf_loop(nr_loops, empty_callback, NULL, 0); __sync_add_and_fetch(&hits, nr_loops); } return 0; }
W/o callbacks change verifier sees it as a 1000 calls to empty_callback(). However, with callbacks change things become exponential: - i=0: state exploring empty_callback is scheduled with i=0 (a); - i=1: state exploring empty_callback is scheduled with i=1; ... - i=999: state exploring empty_callback is scheduled with i=999; - state (a) is popped from stack; - i=1: state exploring empty_callback is scheduled with i=1; ...
Avoid this issue by rewriting outer loop as bpf_loop(). Unfortunately, this adds a function call to a loop at runtime, which negatively affects performance:
throughput latency before: 149.919 ± 0.168 M ops/s, 6.670 ns/op after : 137.040 ± 0.187 M ops/s, 7.297 ns/op
Acked-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231121020701.26440-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- tools/testing/selftests/bpf/progs/bpf_loop_bench.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/bpf_loop_bench.c b/tools/testing/selftests/bpf/progs/bpf_loop_bench.c index 4ce76eb064c4..d461746fd3c1 100644 --- a/tools/testing/selftests/bpf/progs/bpf_loop_bench.c +++ b/tools/testing/selftests/bpf/progs/bpf_loop_bench.c @@ -15,13 +15,16 @@ static int empty_callback(__u32 index, void *data) return 0; }
+static int outer_loop(__u32 index, void *data) +{ + bpf_loop(nr_loops, empty_callback, NULL, 0); + __sync_add_and_fetch(&hits, nr_loops); + return 0; +} + SEC("fentry/" SYS_PREFIX "sys_getpgid") int benchmark(void *ctx) { - for (int i = 0; i < 1000; i++) { - bpf_loop(nr_loops, empty_callback, NULL, 0); - - __sync_add_and_fetch(&hits, nr_loops); - } + bpf_loop(1000, outer_loop, NULL, 0); return 0; }
From: Masahiro Yamada masahiroy@kernel.org
stable inclusion from stable-v6.6.8 commit 03372601f5f1f143e759b611d93c0f2c2cb0b0b3 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit d3ec75bc635cb0cb8185b63293d33a3d1b942d22 ]
A common issue in Makefile is a race in parallel building.
You need to be careful to prevent multiple threads from writing to the same file simultaneously.
Commit 3939f3345050 ("ARM: 8418/1: add boot image dependencies to not generate invalid images") addressed such a bad scenario.
A similar symptom occurs with the following command:
$ make -j$(nproc) ARCH=loongarch vmlinux.efi vmlinuz.efi [ snip ] SORTTAB vmlinux OBJCOPY arch/loongarch/boot/vmlinux.efi OBJCOPY arch/loongarch/boot/vmlinux.efi PAD arch/loongarch/boot/vmlinux.bin GZIP arch/loongarch/boot/vmlinuz OBJCOPY arch/loongarch/boot/vmlinuz.o LD arch/loongarch/boot/vmlinuz.efi.elf OBJCOPY arch/loongarch/boot/vmlinuz.efi
The log "OBJCOPY arch/loongarch/boot/vmlinux.efi" is displayed twice.
It indicates that two threads simultaneously enter arch/loongarch/boot/ and write to arch/loongarch/boot/vmlinux.efi.
It occasionally leads to a build failure:
$ make -j$(nproc) ARCH=loongarch vmlinux.efi vmlinuz.efi [ snip ] SORTTAB vmlinux OBJCOPY arch/loongarch/boot/vmlinux.efi PAD arch/loongarch/boot/vmlinux.bin truncate: Invalid number: ‘arch/loongarch/boot/vmlinux.bin’ make[2]: *** [drivers/firmware/efi/libstub/Makefile.zboot:13: arch/loongarch/boot/vmlinux.bin] Error 1 make[2]: *** Deleting file 'arch/loongarch/boot/vmlinux.bin' make[1]: *** [arch/loongarch/Makefile:146: vmlinuz.efi] Error 2 make[1]: *** Waiting for unfinished jobs.... make: *** [Makefile:234: __sub-make] Error 2
vmlinuz.efi depends on vmlinux.efi, but such a dependency is not specified in arch/loongarch/Makefile.
Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- arch/loongarch/Makefile | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile index fb0fada43197..96747bfec1a1 100644 --- a/arch/loongarch/Makefile +++ b/arch/loongarch/Makefile @@ -142,6 +142,8 @@ vdso_install:
all: $(notdir $(KBUILD_IMAGE))
+vmlinuz.efi: vmlinux.efi + vmlinux.elf vmlinux.efi vmlinuz.efi: vmlinux $(Q)$(MAKE) $(build)=$(boot) $(bootvars-y) $(boot)/$@
From: WANG Rui wangrui@loongson.cn
stable inclusion from stable-v6.6.8 commit ab3f300524697919f64ae920e904d0836b4057b0 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit aa0cbc1b506b090c3a775b547c693ada108cc0d7 ]
To clarify, the previous version functioned flawlessly. However, it's worth noting that the LLVM's LoongArch backend currently lacks support for cross-section label calculations. With this patch, we enable the use of clang to compile relocatable kernels.
Tested-by: Nathan Chancellor nathan@kernel.org Signed-off-by: WANG Rui wangrui@loongson.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- arch/loongarch/include/asm/asmmacro.h | 3 +-- arch/loongarch/include/asm/setup.h | 2 +- arch/loongarch/kernel/relocate.c | 2 +- 3 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/loongarch/include/asm/asmmacro.h b/arch/loongarch/include/asm/asmmacro.h index c9544f358c33..655db7d7a427 100644 --- a/arch/loongarch/include/asm/asmmacro.h +++ b/arch/loongarch/include/asm/asmmacro.h @@ -609,8 +609,7 @@ lu32i.d \reg, 0 lu52i.d \reg, \reg, 0 .pushsection ".la_abs", "aw", %progbits - 768: - .dword 768b-766b + .dword 766b .dword \sym .popsection #endif diff --git a/arch/loongarch/include/asm/setup.h b/arch/loongarch/include/asm/setup.h index a0bc159ce8bd..ee52fb1e9963 100644 --- a/arch/loongarch/include/asm/setup.h +++ b/arch/loongarch/include/asm/setup.h @@ -25,7 +25,7 @@ extern void set_merr_handler(unsigned long offset, void *addr, unsigned long len #ifdef CONFIG_RELOCATABLE
struct rela_la_abs { - long offset; + long pc; long symvalue; };
diff --git a/arch/loongarch/kernel/relocate.c b/arch/loongarch/kernel/relocate.c index 6c3eff9af9fb..288b739ca88d 100644 --- a/arch/loongarch/kernel/relocate.c +++ b/arch/loongarch/kernel/relocate.c @@ -52,7 +52,7 @@ static inline void __init relocate_absolute(long random_offset) for (p = begin; (void *)p < end; p++) { long v = p->symvalue; uint32_t lu12iw, ori, lu32id, lu52id; - union loongarch_instruction *insn = (void *)p - p->offset; + union loongarch_instruction *insn = (void *)p->pc;
lu12iw = (v >> 12) & 0xfffff; ori = v & 0xfff;
From: Huacai Chen chenhuacai@loongson.cn
stable inclusion from stable-v6.6.8 commit 71d8348cca92fa423538df047771653bf14f7810 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 902d75cdf0cf0a3fb58550089ee519abf12566f5 ]
The kernel parameter 'nokaslr' is handled before start_kernel(), so we don't need early_param() to mark it technically. But it can cause a boot warning as follows:
Unknown kernel command line parameters "nokaslr", will be passed to user space.
When we use 'init=/bin/bash', 'nokaslr' which passed to user space will even cause a kernel panic. So we use early_param() to mark 'nokaslr', simply print a notice and silence the boot warning (also fix a potential panic). This logic is similar to RISC-V.
Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- arch/loongarch/kernel/relocate.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/arch/loongarch/kernel/relocate.c b/arch/loongarch/kernel/relocate.c index 288b739ca88d..1acfa704c8d0 100644 --- a/arch/loongarch/kernel/relocate.c +++ b/arch/loongarch/kernel/relocate.c @@ -102,6 +102,14 @@ static inline __init unsigned long get_random_boot(void) return hash; }
+static int __init nokaslr(char *p) +{ + pr_info("KASLR is disabled.\n"); + + return 0; /* Print a notice and silence the boot warning */ +} +early_param("nokaslr", nokaslr); + static inline __init bool kaslr_disabled(void) { char *str;
From: Huacai Chen chenhuacai@loongson.cn
stable inclusion from stable-v6.6.8 commit c28fec461df3869355c95b6baed3c36aa870f1fe bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 19d86a496233731882aea7ec24505ce6641b1c0c ]
Mark {dmw,tlb}_virt_to_page() exports as non-GPL, in order to let out-of-tree modules (e.g. OpenZFS) be built without errors. Otherwise we get:
ERROR: modpost: GPL-incompatible module zfs.ko uses GPL-only symbol 'dmw_virt_to_page' ERROR: modpost: GPL-incompatible module zfs.ko uses GPL-only symbol 'tlb_virt_to_page'
Reported-by: Haowu Ge gehaowu@bitmoe.com Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- arch/loongarch/mm/pgtable.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c index 71d0539e2d0b..2aae72e63871 100644 --- a/arch/loongarch/mm/pgtable.c +++ b/arch/loongarch/mm/pgtable.c @@ -13,13 +13,13 @@ struct page *dmw_virt_to_page(unsigned long kaddr) { return pfn_to_page(virt_to_pfn(kaddr)); } -EXPORT_SYMBOL_GPL(dmw_virt_to_page); +EXPORT_SYMBOL(dmw_virt_to_page);
struct page *tlb_virt_to_page(unsigned long kaddr) { return pfn_to_page(pte_pfn(*virt_to_kpte(kaddr))); } -EXPORT_SYMBOL_GPL(tlb_virt_to_page); +EXPORT_SYMBOL(tlb_virt_to_page);
pgd_t *pgd_alloc(struct mm_struct *mm) {
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
stable inclusion from stable-v6.6.8 commit 086f91f3ce3b8d13cb794b955002b74444a384fa bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit a6584711e64d9d12ab79a450ec3628fd35e4f476 ]
LKP found issues with a kernel doc in the driver:
core.c:116: warning: Function parameter or member 'ioss_evtconfig' not described in 'telemetry_update_events' core.c:188: warning: Function parameter or member 'ioss_evtconfig' not described in 'telemetry_get_eventconfig'
It looks like it were copy'n'paste typos when these descriptions had been introduced. Fix the typos.
Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202310070743.WALmRGSY-lkp@intel.com/ Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Link: https://lore.kernel.org/r/20231120150756.1661425-1-andriy.shevchenko@linux.i... Reviewed-by: Rajneesh Bhardwaj irenic.rajneesh@gmail.com Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/platform/x86/intel/telemetry/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/platform/x86/intel/telemetry/core.c b/drivers/platform/x86/intel/telemetry/core.c index fdf55b5d6948..e4be40f73eeb 100644 --- a/drivers/platform/x86/intel/telemetry/core.c +++ b/drivers/platform/x86/intel/telemetry/core.c @@ -102,7 +102,7 @@ static const struct telemetry_core_ops telm_defpltops = { /** * telemetry_update_events() - Update telemetry Configuration * @pss_evtconfig: PSS related config. No change if num_evts = 0. - * @pss_evtconfig: IOSS related config. No change if num_evts = 0. + * @ioss_evtconfig: IOSS related config. No change if num_evts = 0. * * This API updates the IOSS & PSS Telemetry configuration. Old config * is overwritten. Call telemetry_reset_events when logging is over @@ -176,7 +176,7 @@ EXPORT_SYMBOL_GPL(telemetry_reset_events); /** * telemetry_get_eventconfig() - Returns the pss and ioss events enabled * @pss_evtconfig: Pointer to PSS related configuration. - * @pss_evtconfig: Pointer to IOSS related configuration. + * @ioss_evtconfig: Pointer to IOSS related configuration. * @pss_len: Number of u32 elements allocated for pss_evtconfig array * @ioss_len: Number of u32 elements allocated for ioss_evtconfig array *
From: Hamish Martin hamish.martin@alliedtelesis.co.nz
stable inclusion from stable-v6.6.8 commit 2afe67cfe8f121b1b9dd244a0edc01b89cd65813 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit f2d4a5834638bbc967371b9168c0b481519f7c5e ]
The process of adding an I2C adapter can invoke I2C accesses on that new adapter (see i2c_detect()).
Ensure we have set the adapter's driver data to avoid null pointer dereferences in the xfer functions during the adapter add.
This has been noted in the past and the same fix proposed but not completed. See: https://lore.kernel.org/lkml/ef597e73-ed71-168e-52af-0d19b03734ac@vigem.de/
Signed-off-by: Hamish Martin hamish.martin@alliedtelesis.co.nz Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/hid/hid-mcp2221.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hid/hid-mcp2221.c b/drivers/hid/hid-mcp2221.c index 72883e0ce757..b95f31cf0fa2 100644 --- a/drivers/hid/hid-mcp2221.c +++ b/drivers/hid/hid-mcp2221.c @@ -1157,12 +1157,12 @@ static int mcp2221_probe(struct hid_device *hdev, snprintf(mcp->adapter.name, sizeof(mcp->adapter.name), "MCP2221 usb-i2c bridge");
+ i2c_set_adapdata(&mcp->adapter, mcp); ret = devm_i2c_add_adapter(&hdev->dev, &mcp->adapter); if (ret) { hid_err(hdev, "can't add usb-i2c adapter: %d\n", ret); return ret; } - i2c_set_adapdata(&mcp->adapter, mcp);
#if IS_REACHABLE(CONFIG_GPIOLIB) /* Setup GPIO chip */
From: Hamish Martin hamish.martin@alliedtelesis.co.nz
stable inclusion from stable-v6.6.8 commit d4b50ac06ea6e47bdd2b04e97d1090c673c80645 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 73ce9f1f2741a38f5d27393e627702ae2c46e6f2 ]
During the probe we add an I2C adapter and as soon as we add that adapter it may be used for a transfer (e.g via the code in i2cdetect()). Those transfers are not able to complete and time out. This is because the HID raw_event callback (mcp2221_raw_event) will not be invoked until the HID device's 'driver_input_lock' is marked up at the completion of the probe in hid_device_probe(). This starves the driver of the responses it is waiting for. In order to allow the I2C transfers to complete while we are still in the probe, start the IO once we have completed init of the HID device.
This issue seems to have been seen before and a patch was submitted but it seems it was never accepted. See: https://lore.kernel.org/all/20221103222714.21566-3-Enrik.Berkhan@inka.de/
Signed-off-by: Hamish Martin hamish.martin@alliedtelesis.co.nz Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/hid/hid-mcp2221.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/hid/hid-mcp2221.c b/drivers/hid/hid-mcp2221.c index b95f31cf0fa2..aef0785c91cc 100644 --- a/drivers/hid/hid-mcp2221.c +++ b/drivers/hid/hid-mcp2221.c @@ -1142,6 +1142,8 @@ static int mcp2221_probe(struct hid_device *hdev, if (ret) return ret;
+ hid_device_io_start(hdev); + /* Set I2C bus clock diviser */ if (i2c_clk_freq > 400) i2c_clk_freq = 400;
From: Yihong Cao caoyihong4@outlook.com
stable inclusion from stable-v6.6.8 commit 6b3507b8ea55aae5377aaccd6d9cb9f6e0204136 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 113f736655e4f20633e107d731dd5bd097d5938c ]
Jamesdonkey A3R keyboard is identified as "Jamesdonkey A3R" in wired mode, "A3R-U" in wireless mode and "A3R" in bluetooth mode. Adding them to non-apple keyboards fixes function key.
Signed-off-by: Yihong Cao caoyihong4@outlook.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/hid/hid-apple.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/hid/hid-apple.c b/drivers/hid/hid-apple.c index 3ca45975c686..d9e9829b2200 100644 --- a/drivers/hid/hid-apple.c +++ b/drivers/hid/hid-apple.c @@ -345,6 +345,8 @@ static const struct apple_non_apple_keyboard non_apple_keyboards[] = { { "AONE" }, { "GANSS" }, { "Hailuck" }, + { "Jamesdonkey" }, + { "A3R" }, };
static bool apple_is_non_apple_keyboard(struct hid_device *hdev)
From: Brett Raye braye@fastmail.com
stable inclusion from stable-v6.6.8 commit c38f7b0f554f0403eb90516332a9b70a64cadc75 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit a5e913c25b6b2b6ae02acef6d9400645ac03dfdf ]
The Glorious Model I mouse has a buggy HID report descriptor for its keyboard endpoint (used for programmable buttons). For report ID 2, there is a mismatch between Logical Minimum and Usage Minimum in the array that reports keycodes.
The offending portion of the descriptor: (from hid-decode)
0x95, 0x05, // Report Count (5) 30 0x75, 0x08, // Report Size (8) 32 0x15, 0x00, // Logical Minimum (0) 34 0x25, 0x65, // Logical Maximum (101) 36 0x05, 0x07, // Usage Page (Keyboard) 38 0x19, 0x01, // Usage Minimum (1) 40 0x29, 0x65, // Usage Maximum (101) 42 0x81, 0x00, // Input (Data,Arr,Abs) 44
This bug shifts all programmed keycodes up by 1. Importantly, this causes "empty" array indexes of 0x00 to be interpreted as 0x01, ErrorRollOver. The presence of ErrorRollOver causes the system to ignore all keypresses from the endpoint and breaks the ability to use the programmable buttons.
Setting byte 41 to 0x00 fixes this, and causes keycodes to be interpreted correctly.
Also, USB_VENDOR_ID_GLORIOUS is changed to USB_VENDOR_ID_SINOWEALTH, and a new ID for Laview Technology is added. Glorious seems to be white-labeling controller boards or mice from these vendors. There isn't a single canonical vendor ID for Glorious products.
Signed-off-by: Brett Raye braye@fastmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/hid/hid-glorious.c | 16 ++++++++++++++-- drivers/hid/hid-ids.h | 11 +++++++---- 2 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/drivers/hid/hid-glorious.c b/drivers/hid/hid-glorious.c index 558eb08c19ef..281b3a7187ce 100644 --- a/drivers/hid/hid-glorious.c +++ b/drivers/hid/hid-glorious.c @@ -21,6 +21,10 @@ MODULE_DESCRIPTION("HID driver for Glorious PC Gaming Race mice"); * Glorious Model O and O- specify the const flag in the consumer input * report descriptor, which leads to inputs being ignored. Fix this * by patching the descriptor. + * + * Glorious Model I incorrectly specifes the Usage Minimum for its + * keyboard HID report, causing keycodes to be misinterpreted. + * Fix this by setting Usage Minimum to 0 in that report. */ static __u8 *glorious_report_fixup(struct hid_device *hdev, __u8 *rdesc, unsigned int *rsize) @@ -32,6 +36,10 @@ static __u8 *glorious_report_fixup(struct hid_device *hdev, __u8 *rdesc, rdesc[85] = rdesc[113] = rdesc[141] = \ HID_MAIN_ITEM_VARIABLE | HID_MAIN_ITEM_RELATIVE; } + if (*rsize == 156 && rdesc[41] == 1) { + hid_info(hdev, "patching Glorious Model I keyboard report descriptor\n"); + rdesc[41] = 0; + } return rdesc; }
@@ -44,6 +52,8 @@ static void glorious_update_name(struct hid_device *hdev) model = "Model O"; break; case USB_DEVICE_ID_GLORIOUS_MODEL_D: model = "Model D"; break; + case USB_DEVICE_ID_GLORIOUS_MODEL_I: + model = "Model I"; break; }
snprintf(hdev->name, sizeof(hdev->name), "%s %s", "Glorious", model); @@ -66,10 +76,12 @@ static int glorious_probe(struct hid_device *hdev, }
static const struct hid_device_id glorious_devices[] = { - { HID_USB_DEVICE(USB_VENDOR_ID_GLORIOUS, + { HID_USB_DEVICE(USB_VENDOR_ID_SINOWEALTH, USB_DEVICE_ID_GLORIOUS_MODEL_O) }, - { HID_USB_DEVICE(USB_VENDOR_ID_GLORIOUS, + { HID_USB_DEVICE(USB_VENDOR_ID_SINOWEALTH, USB_DEVICE_ID_GLORIOUS_MODEL_D) }, + { HID_USB_DEVICE(USB_VENDOR_ID_LAVIEW, + USB_DEVICE_ID_GLORIOUS_MODEL_I) }, { } }; MODULE_DEVICE_TABLE(hid, glorious_devices); diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 97ab317570dd..72046039d1be 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -511,10 +511,6 @@ #define USB_DEVICE_ID_GENERAL_TOUCH_WIN8_PIT_010A 0x010a #define USB_DEVICE_ID_GENERAL_TOUCH_WIN8_PIT_E100 0xe100
-#define USB_VENDOR_ID_GLORIOUS 0x258a -#define USB_DEVICE_ID_GLORIOUS_MODEL_D 0x0033 -#define USB_DEVICE_ID_GLORIOUS_MODEL_O 0x0036 - #define I2C_VENDOR_ID_GOODIX 0x27c6 #define I2C_DEVICE_ID_GOODIX_01F0 0x01f0
@@ -746,6 +742,9 @@ #define USB_DEVICE_ID_LABTEC_WIRELESS_KEYBOARD 0x0006 #define USB_DEVICE_ID_LABTEC_ODDOR_HANDBRAKE 0x8888
+#define USB_VENDOR_ID_LAVIEW 0x22D4 +#define USB_DEVICE_ID_GLORIOUS_MODEL_I 0x1503 + #define USB_VENDOR_ID_LCPOWER 0x1241 #define USB_DEVICE_ID_LCPOWER_LC1000 0xf767
@@ -1160,6 +1159,10 @@ #define USB_VENDOR_ID_SIGMATEL 0x066F #define USB_DEVICE_ID_SIGMATEL_STMP3780 0x3780
+#define USB_VENDOR_ID_SINOWEALTH 0x258a +#define USB_DEVICE_ID_GLORIOUS_MODEL_D 0x0033 +#define USB_DEVICE_ID_GLORIOUS_MODEL_O 0x0036 + #define USB_VENDOR_ID_SIS_TOUCH 0x0457 #define USB_DEVICE_ID_SIS9200_TOUCH 0x9200 #define USB_DEVICE_ID_SIS817_TOUCH 0x0817
From: Oliver Neukum oneukum@suse.com
stable inclusion from stable-v6.6.8 commit af48c4099bd86ae0b600eabfc3911f0923526525 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit c55092187d9ad7b2f8f5a8645286fa03997d442f ]
These devices disconnect if suspended without remote wakeup. They can operate with the standard driver.
Signed-off-by: Oliver Neukum oneukum@suse.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/hid/hid-quirks.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/hid/hid-quirks.c b/drivers/hid/hid-quirks.c index d9a4f8f7bbb0..e0bbf0c6345d 100644 --- a/drivers/hid/hid-quirks.c +++ b/drivers/hid/hid-quirks.c @@ -33,6 +33,7 @@ static const struct hid_device_id hid_quirks[] = { { HID_USB_DEVICE(USB_VENDOR_ID_AKAI, USB_DEVICE_ID_AKAI_MPKMINI2), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_ALPS, USB_DEVICE_ID_IBM_GAMEPAD), HID_QUIRK_BADPAD }, { HID_USB_DEVICE(USB_VENDOR_ID_AMI, USB_DEVICE_ID_AMI_VIRT_KEYBOARD_AND_MOUSE), HID_QUIRK_ALWAYS_POLL }, + { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_REVB_ANSI), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_ATEN, USB_DEVICE_ID_ATEN_2PORTKVM), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_ATEN, USB_DEVICE_ID_ATEN_4PORTKVMC), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_ATEN, USB_DEVICE_ID_ATEN_4PORTKVM), HID_QUIRK_NOGET },
From: Denis Benato benato.denis96@gmail.com
stable inclusion from stable-v6.6.8 commit 9fc2827c02425c1a74233acaa258e725043d7d59 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 546edbd26cff7ae990e480a59150e801a06f77b1 ]
Some devices managed by this driver automatically set brightness to 0 before entering a suspended state and reset it back to a default brightness level after the resume: this has the effect of having the kernel report wrong brightness status after a sleep, and on some devices (like the Asus RC71L) that brightness is the intensity of LEDs directly facing the user.
Fix the above issue by setting back brightness to the level it had before entering a sleep state.
Signed-off-by: Denis Benato benato.denis96@gmail.com Signed-off-by: Luke D. Jones luke@ljones.dev Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/hid/hid-asus.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
diff --git a/drivers/hid/hid-asus.c b/drivers/hid/hid-asus.c index fd61dba88233..194a86cf30db 100644 --- a/drivers/hid/hid-asus.c +++ b/drivers/hid/hid-asus.c @@ -1000,6 +1000,24 @@ static int asus_start_multitouch(struct hid_device *hdev) return 0; }
+static int __maybe_unused asus_resume(struct hid_device *hdev) { + struct asus_drvdata *drvdata = hid_get_drvdata(hdev); + int ret = 0; + + if (drvdata->kbd_backlight) { + const u8 buf[] = { FEATURE_KBD_REPORT_ID, 0xba, 0xc5, 0xc4, + drvdata->kbd_backlight->cdev.brightness }; + ret = asus_kbd_set_report(hdev, buf, sizeof(buf)); + if (ret < 0) { + hid_err(hdev, "Asus failed to set keyboard backlight: %d\n", ret); + goto asus_resume_err; + } + } + +asus_resume_err: + return ret; +} + static int __maybe_unused asus_reset_resume(struct hid_device *hdev) { struct asus_drvdata *drvdata = hid_get_drvdata(hdev); @@ -1294,6 +1312,7 @@ static struct hid_driver asus_driver = { .input_configured = asus_input_configured, #ifdef CONFIG_PM .reset_resume = asus_reset_resume, + .resume = asus_resume, #endif .event = asus_event, .raw_event = asus_raw_event
From: Aoba K nexp_0x17@outlook.com
stable inclusion from stable-v6.6.8 commit c9d25e4639c1d95d384e08d48d7b89cf690db9d1 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 9ffccb691adb854e7b7f3ee57fbbda12ff70533f ]
Honor MagicBook 13 2023 has a touchpad which do not switch to the multitouch mode until the input mode feature is written by the host. The touchpad do report the input mode at touchpad(3), while itself working under mouse mode. As a workaround, it is possible to call MT_QUIRE_FORCE_GET_FEATURE to force set feature in mt_set_input_mode for such device.
The touchpad reports as BLTP7853, which cannot retrive any useful manufacture information on the internel by this string at present. As the serial number of the laptop is GLO-G52, while DMI info reports the laptop serial number as GLO-GXXX, this workaround should applied to all models which has the GLO-GXXX.
Signed-off-by: Aoba K nexp_0x17@outlook.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/hid/hid-multitouch.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/hid/hid-multitouch.c b/drivers/hid/hid-multitouch.c index 8db4ae05febc..5ec1f174127a 100644 --- a/drivers/hid/hid-multitouch.c +++ b/drivers/hid/hid-multitouch.c @@ -2048,6 +2048,11 @@ static const struct hid_device_id mt_devices[] = { MT_USB_DEVICE(USB_VENDOR_ID_HANVON_ALT, USB_DEVICE_ID_HANVON_ALT_MULTITOUCH) },
+ /* HONOR GLO-GXXX panel */ + { .driver_data = MT_CLS_VTL, + HID_DEVICE(BUS_I2C, HID_GROUP_MULTITOUCH_WIN_8, + 0x347d, 0x7853) }, + /* Ilitek dual touch panel */ { .driver_data = MT_CLS_NSMU, MT_USB_DEVICE(USB_VENDOR_ID_ILITEK,
From: Nguyen Dinh Phi phind.uet@gmail.com
stable inclusion from stable-v6.6.8 commit 1f75542ce7c4e958386fb4932185842c3c8e1c4d bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 84d2db91f14a32dc856a5972e3f0907089093c7a ]
syzbot reported an memory leak that happens when an skb is add to send_buff after virtual nci closed. This patch adds a variable to track if the ndev is running before handling new skb in send function.
Signed-off-by: Nguyen Dinh Phi phind.uet@gmail.com Reported-by: syzbot+6eb09d75211863f15e3e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/00000000000075472b06007df4fb@google.com Reviewed-by: Bongsu Jeon Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/nfc/virtual_ncidev.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/nfc/virtual_ncidev.c b/drivers/nfc/virtual_ncidev.c index b027be0b0b6f..590b038e449e 100644 --- a/drivers/nfc/virtual_ncidev.c +++ b/drivers/nfc/virtual_ncidev.c @@ -26,10 +26,14 @@ struct virtual_nci_dev { struct mutex mtx; struct sk_buff *send_buff; struct wait_queue_head wq; + bool running; };
static int virtual_nci_open(struct nci_dev *ndev) { + struct virtual_nci_dev *vdev = nci_get_drvdata(ndev); + + vdev->running = true; return 0; }
@@ -40,6 +44,7 @@ static int virtual_nci_close(struct nci_dev *ndev) mutex_lock(&vdev->mtx); kfree_skb(vdev->send_buff); vdev->send_buff = NULL; + vdev->running = false; mutex_unlock(&vdev->mtx);
return 0; @@ -50,7 +55,7 @@ static int virtual_nci_send(struct nci_dev *ndev, struct sk_buff *skb) struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
mutex_lock(&vdev->mtx); - if (vdev->send_buff) { + if (vdev->send_buff || !vdev->running) { mutex_unlock(&vdev->mtx); kfree_skb(skb); return -1;
From: Heiko Carstens hca@linux.ibm.com
stable inclusion from stable-v6.6.8 commit 97774998f8e18dd1dfd728701ddbf82e73fa037f bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit aab1f809d7540def24498e81347740a7239a74d5 ]
For some unknown reason the regular expression for checkstack only matches three digit numbers starting with the number "3", or any higher number. Which means that it skips any stack sizes smaller than 304 bytes. This makes the checkstack script a bit less useful than it could be.
Change the script to match any number. To be filtered out stack sizes can be configured with the min_stack variable, which omits any stack frame sizes smaller than 100 bytes by default.
Tested-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- scripts/checkstack.pl | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/scripts/checkstack.pl b/scripts/checkstack.pl index ac74f8629cea..f27d552aec43 100755 --- a/scripts/checkstack.pl +++ b/scripts/checkstack.pl @@ -97,8 +97,7 @@ my (@stack, $re, $dre, $sub, $x, $xs, $funcre, $min_stack); # 11160: a7 fb ff 60 aghi %r15,-160 # or # 100092: e3 f0 ff c8 ff 71 lay %r15,-56(%r15) - $re = qr/.*(?:lay|ag?hi).*%r15,-(([0-9]{2}|[3-9])[0-9]{2}) - (?:(%r15))?$/ox; + $re = qr/.*(?:lay|ag?hi).*%r15,-([0-9]+)(?:(%r15))?$/o; } elsif ($arch eq 'sparc' || $arch eq 'sparc64') { # f0019d10: 9d e3 bf 90 save %sp, -112, %sp $re = qr/.*save.*%sp, -(([0-9]{2}|[3-9])[0-9]{2}), %sp/o;
From: Linus Torvalds torvalds@linux-foundation.org
stable inclusion from stable-v6.6.8 commit a739ceb7474561e522250b6209f73c12f15c0b77 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 125b0bb95dd6bec81b806b997a4ccb026eeecf8f ]
We really don't want to do atomic_read() or anything like that, since we already have the value, not the lock. The whole point of this is that we've loaded the lock from memory, and we want to check whether the value we loaded was a locked one or not.
The main use of this is the lockref code, which loads both the lock and the reference count in one atomic operation, and then works on that combined value. With the atomic_read(), the compiler would pointlessly spill the value to the stack, in order to then be able to read it back "atomically".
This is the qspinlock version of commit c6f4a9002252 ("asm-generic: ticket-lock: Optimize arch_spin_value_unlocked()") which fixed this same bug for ticket locks.
Cc: Guo Ren guoren@kernel.org Cc: Ingo Molnar mingo@kernel.org Cc: Waiman Long longman@redhat.com Link: https://lore.kernel.org/all/CAHk-=whNRv0v6kQiV5QO6DJhjH4KEL36vWQ6Re8Csrnh4zb... Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- include/asm-generic/qspinlock.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h index 995513fa2690..0655aa5b57b2 100644 --- a/include/asm-generic/qspinlock.h +++ b/include/asm-generic/qspinlock.h @@ -70,7 +70,7 @@ static __always_inline int queued_spin_is_locked(struct qspinlock *lock) */ static __always_inline int queued_spin_value_unlocked(struct qspinlock lock) { - return !atomic_read(&lock.val); + return !lock.val.counter; }
/**
From: "Steven Rostedt (Google)" rostedt@goodmis.org
stable inclusion from stable-v6.6.18 commit 29bb70cad6685839f4e5d0b97c4d7ffc8639cb9d bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit fc4561226feaad5fcdcb55646c348d77b8ee69c5 ]
The eventfs directory is dynamically created via the meta data supplied by the existing trace events. All files and directories in eventfs has a parent. Do not allow NULL to be passed into eventfs_start_creating() as the parent because that should never happen. Warn if it does.
Link: https://lkml.kernel.org/r/20231121231112.693841807@goodmis.org
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Andrew Morton akpm@linux-foundation.org Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/tracefs/inode.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 891653ba9cf3..0292c6a2bed9 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -509,20 +509,15 @@ struct dentry *eventfs_start_creating(const char *name, struct dentry *parent) struct dentry *dentry; int error;
+ /* Must always have a parent. */ + if (WARN_ON_ONCE(!parent)) + return ERR_PTR(-EINVAL); + error = simple_pin_fs(&trace_fs_type, &tracefs_mount, &tracefs_mount_count); if (error) return ERR_PTR(error);
- /* - * If the parent is not specified, we create it in the root. - * We need the root dentry to do this, which is in the super - * block. A pointer to that is in the struct vfsmount that we - * have around. - */ - if (!parent) - parent = tracefs_mount->mnt_root; - if (unlikely(IS_DEADDIR(parent->d_inode))) dentry = ERR_PTR(-ENOENT); else
From: Lech Perczak lech.perczak@gmail.com
stable inclusion from stable-v6.6.8 commit e25ee0c2459ae021b19d81febea329baad641eba bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 99360d9620f09fb8bc15548d855011bbb198c680 ]
Interface 4 is used by for QMI interface in stock firmware of MF28D, the router which uses MF290 modem. Rebind it to qmi_wwan after freeing it up from option driver. The proper configuration is:
Interface mapping is: 0: QCDM, 1: (unknown), 2: AT (PCUI), 2: AT (Modem), 4: QMI
T: Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 4 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=19d2 ProdID=0189 Rev= 0.00 S: Manufacturer=ZTE, Incorporated S: Product=ZTE LTE Technologies MSM C:* #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=84(I) Atr=03(Int.) MxPS= 64 Ivl=2ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan E: Ad=86(I) Atr=03(Int.) MxPS= 64 Ivl=2ms E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
Cc: Bjørn Mork bjorn@mork.no Signed-off-by: Lech Perczak lech.perczak@gmail.com Link: https://lore.kernel.org/r/20231117231918.100278-3-lech.perczak@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 344af3c5c836..e2e181378f41 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1289,6 +1289,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x19d2, 0x0168, 4)}, {QMI_FIXED_INTF(0x19d2, 0x0176, 3)}, {QMI_FIXED_INTF(0x19d2, 0x0178, 3)}, + {QMI_FIXED_INTF(0x19d2, 0x0189, 4)}, /* ZTE MF290 */ {QMI_FIXED_INTF(0x19d2, 0x0191, 4)}, /* ZTE EuFi890 */ {QMI_FIXED_INTF(0x19d2, 0x0199, 1)}, /* ZTE MF820S */ {QMI_FIXED_INTF(0x19d2, 0x0200, 1)},
From: Paulo Alcantara pc@manguebit.com
stable inclusion from stable-v6.6.8 commit d5c959a1dba6b21ba305ca173ffee89891640b03 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit ed3e0a149b58ea8cfd10cc4f7cefb39877ff07ac ]
Reparse points are not limited to symlinks, so implement ->query_reparse_point() in order to handle different file types.
Signed-off-by: Paulo Alcantara (SUSE) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/smb/client/cifspdu.h | 2 +- fs/smb/client/cifsproto.h | 9 ++ fs/smb/client/cifssmb.c | 193 +++++++++++++++----------------------- fs/smb/client/smb1ops.c | 49 ++-------- fs/smb/client/smb2ops.c | 35 ++++--- 5 files changed, 113 insertions(+), 175 deletions(-)
diff --git a/fs/smb/client/cifspdu.h b/fs/smb/client/cifspdu.h index a75220db5c1e..2a90134331a4 100644 --- a/fs/smb/client/cifspdu.h +++ b/fs/smb/client/cifspdu.h @@ -1356,7 +1356,7 @@ typedef struct smb_com_transaction_ioctl_rsp { __le32 DataDisplacement; __u8 SetupCount; /* 1 */ __le16 ReturnedDataLen; - __u16 ByteCount; + __le16 ByteCount; } __attribute__((packed)) TRANSACT_IOCTL_RSP;
#define CIFS_ACL_OWNER 1 diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h index 8e53abcfc5ec..dc2c43dd6a91 100644 --- a/fs/smb/client/cifsproto.h +++ b/fs/smb/client/cifsproto.h @@ -457,6 +457,12 @@ extern int CIFSSMBUnixQuerySymLink(const unsigned int xid, struct cifs_tcon *tcon, const unsigned char *searchName, char **syminfo, const struct nls_table *nls_codepage, int remap); +extern int cifs_query_reparse_point(const unsigned int xid, + struct cifs_tcon *tcon, + struct cifs_sb_info *cifs_sb, + const char *full_path, + u32 *tag, struct kvec *rsp, + int *rsp_buftype); extern int CIFSSMBQuerySymLink(const unsigned int xid, struct cifs_tcon *tcon, __u16 fid, char **symlinkinfo, const struct nls_table *nls_codepage); @@ -656,6 +662,9 @@ void cifs_put_tcp_super(struct super_block *sb); int cifs_update_super_prepath(struct cifs_sb_info *cifs_sb, char *prefix); char *extract_hostname(const char *unc); char *extract_sharename(const char *unc); +int parse_reparse_point(struct reparse_data_buffer *buf, + u32 plen, struct cifs_sb_info *cifs_sb, + bool unicode, char **target_path);
#ifdef CONFIG_CIFS_DFS_UPCALL static inline int get_dfs_path(const unsigned int xid, struct cifs_ses *ses, diff --git a/fs/smb/client/cifssmb.c b/fs/smb/client/cifssmb.c index 25503f1a4fd2..bad91ba6c3a9 100644 --- a/fs/smb/client/cifssmb.c +++ b/fs/smb/client/cifssmb.c @@ -2690,136 +2690,97 @@ CIFSSMBUnixQuerySymLink(const unsigned int xid, struct cifs_tcon *tcon, return rc; }
-/* - * Recent Windows versions now create symlinks more frequently - * and they use the "reparse point" mechanism below. We can of course - * do symlinks nicely to Samba and other servers which support the - * CIFS Unix Extensions and we can also do SFU symlinks and "client only" - * "MF" symlinks optionally, but for recent Windows we really need to - * reenable the code below and fix the cifs_symlink callers to handle this. - * In the interim this code has been moved to its own config option so - * it is not compiled in by default until callers fixed up and more tested. - */ -int -CIFSSMBQuerySymLink(const unsigned int xid, struct cifs_tcon *tcon, - __u16 fid, char **symlinkinfo, - const struct nls_table *nls_codepage) +int cifs_query_reparse_point(const unsigned int xid, + struct cifs_tcon *tcon, + struct cifs_sb_info *cifs_sb, + const char *full_path, + u32 *tag, struct kvec *rsp, + int *rsp_buftype) { - int rc = 0; - int bytes_returned; - struct smb_com_transaction_ioctl_req *pSMB; - struct smb_com_transaction_ioctl_rsp *pSMBr; - bool is_unicode; - unsigned int sub_len; - char *sub_start; - struct reparse_symlink_data *reparse_buf; - struct reparse_posix_data *posix_buf; + struct cifs_open_parms oparms; + TRANSACT_IOCTL_REQ *io_req = NULL; + TRANSACT_IOCTL_RSP *io_rsp = NULL; + struct cifs_fid fid; __u32 data_offset, data_count; - char *end_of_smb; + __u8 *start, *end; + int io_rsp_len; + int oplock = 0; + int rc;
- cifs_dbg(FYI, "In Windows reparse style QueryLink for fid %u\n", fid); - rc = smb_init(SMB_COM_NT_TRANSACT, 23, tcon, (void **) &pSMB, - (void **) &pSMBr); + cifs_tcon_dbg(FYI, "%s: path=%s\n", __func__, full_path); + + if (cap_unix(tcon->ses)) + return -EOPNOTSUPP; + + oparms = (struct cifs_open_parms) { + .tcon = tcon, + .cifs_sb = cifs_sb, + .desired_access = FILE_READ_ATTRIBUTES, + .create_options = cifs_create_options(cifs_sb, + OPEN_REPARSE_POINT), + .disposition = FILE_OPEN, + .path = full_path, + .fid = &fid, + }; + + rc = CIFS_open(xid, &oparms, &oplock, NULL); if (rc) return rc;
- pSMB->TotalParameterCount = 0 ; - pSMB->TotalDataCount = 0; - pSMB->MaxParameterCount = cpu_to_le32(2); - /* BB find exact data count max from sess structure BB */ - pSMB->MaxDataCount = cpu_to_le32(CIFSMaxBufSize & 0xFFFFFF00); - pSMB->MaxSetupCount = 4; - pSMB->Reserved = 0; - pSMB->ParameterOffset = 0; - pSMB->DataCount = 0; - pSMB->DataOffset = 0; - pSMB->SetupCount = 4; - pSMB->SubCommand = cpu_to_le16(NT_TRANSACT_IOCTL); - pSMB->ParameterCount = pSMB->TotalParameterCount; - pSMB->FunctionCode = cpu_to_le32(FSCTL_GET_REPARSE_POINT); - pSMB->IsFsctl = 1; /* FSCTL */ - pSMB->IsRootFlag = 0; - pSMB->Fid = fid; /* file handle always le */ - pSMB->ByteCount = 0; + rc = smb_init(SMB_COM_NT_TRANSACT, 23, tcon, + (void **)&io_req, (void **)&io_rsp); + if (rc) + goto error;
- rc = SendReceive(xid, tcon->ses, (struct smb_hdr *) pSMB, - (struct smb_hdr *) pSMBr, &bytes_returned, 0); - if (rc) { - cifs_dbg(FYI, "Send error in QueryReparseLinkInfo = %d\n", rc); - goto qreparse_out; - } + io_req->TotalParameterCount = 0; + io_req->TotalDataCount = 0; + io_req->MaxParameterCount = cpu_to_le32(2); + /* BB find exact data count max from sess structure BB */ + io_req->MaxDataCount = cpu_to_le32(CIFSMaxBufSize & 0xFFFFFF00); + io_req->MaxSetupCount = 4; + io_req->Reserved = 0; + io_req->ParameterOffset = 0; + io_req->DataCount = 0; + io_req->DataOffset = 0; + io_req->SetupCount = 4; + io_req->SubCommand = cpu_to_le16(NT_TRANSACT_IOCTL); + io_req->ParameterCount = io_req->TotalParameterCount; + io_req->FunctionCode = cpu_to_le32(FSCTL_GET_REPARSE_POINT); + io_req->IsFsctl = 1; + io_req->IsRootFlag = 0; + io_req->Fid = fid.netfid; + io_req->ByteCount = 0; + + rc = SendReceive(xid, tcon->ses, (struct smb_hdr *)io_req, + (struct smb_hdr *)io_rsp, &io_rsp_len, 0); + if (rc) + goto error;
- data_offset = le32_to_cpu(pSMBr->DataOffset); - data_count = le32_to_cpu(pSMBr->DataCount); - if (get_bcc(&pSMBr->hdr) < 2 || data_offset > 512) { - /* BB also check enough total bytes returned */ - rc = -EIO; /* bad smb */ - goto qreparse_out; - } - if (!data_count || (data_count > 2048)) { + data_offset = le32_to_cpu(io_rsp->DataOffset); + data_count = le32_to_cpu(io_rsp->DataCount); + if (get_bcc(&io_rsp->hdr) < 2 || data_offset > 512 || + !data_count || data_count > 2048) { rc = -EIO; - cifs_dbg(FYI, "Invalid return data count on get reparse info ioctl\n"); - goto qreparse_out; - } - end_of_smb = 2 + get_bcc(&pSMBr->hdr) + (char *)&pSMBr->ByteCount; - reparse_buf = (struct reparse_symlink_data *) - ((char *)&pSMBr->hdr.Protocol + data_offset); - if ((char *)reparse_buf >= end_of_smb) { - rc = -EIO; - goto qreparse_out; - } - if (reparse_buf->ReparseTag == cpu_to_le32(IO_REPARSE_TAG_NFS)) { - cifs_dbg(FYI, "NFS style reparse tag\n"); - posix_buf = (struct reparse_posix_data *)reparse_buf; - - if (posix_buf->InodeType != cpu_to_le64(NFS_SPECFILE_LNK)) { - cifs_dbg(FYI, "unsupported file type 0x%llx\n", - le64_to_cpu(posix_buf->InodeType)); - rc = -EOPNOTSUPP; - goto qreparse_out; - } - is_unicode = true; - sub_len = le16_to_cpu(reparse_buf->ReparseDataLength); - if (posix_buf->PathBuffer + sub_len > end_of_smb) { - cifs_dbg(FYI, "reparse buf beyond SMB\n"); - rc = -EIO; - goto qreparse_out; - } - *symlinkinfo = cifs_strndup_from_utf16(posix_buf->PathBuffer, - sub_len, is_unicode, nls_codepage); - goto qreparse_out; - } else if (reparse_buf->ReparseTag != - cpu_to_le32(IO_REPARSE_TAG_SYMLINK)) { - rc = -EOPNOTSUPP; - goto qreparse_out; + goto error; }
- /* Reparse tag is NTFS symlink */ - sub_start = le16_to_cpu(reparse_buf->SubstituteNameOffset) + - reparse_buf->PathBuffer; - sub_len = le16_to_cpu(reparse_buf->SubstituteNameLength); - if (sub_start + sub_len > end_of_smb) { - cifs_dbg(FYI, "reparse buf beyond SMB\n"); + end = 2 + get_bcc(&io_rsp->hdr) + (__u8 *)&io_rsp->ByteCount; + start = (__u8 *)&io_rsp->hdr.Protocol + data_offset; + if (start >= end) { rc = -EIO; - goto qreparse_out; + goto error; } - if (pSMBr->hdr.Flags2 & SMBFLG2_UNICODE) - is_unicode = true; - else - is_unicode = false; - - /* BB FIXME investigate remapping reserved chars here */ - *symlinkinfo = cifs_strndup_from_utf16(sub_start, sub_len, is_unicode, - nls_codepage); - if (!*symlinkinfo) - rc = -ENOMEM; -qreparse_out: - cifs_buf_release(pSMB);
- /* - * Note: On -EAGAIN error only caller can retry on handle based calls - * since file handle passed in no longer valid. - */ + *tag = le32_to_cpu(((struct reparse_data_buffer *)start)->ReparseTag); + rsp->iov_base = io_rsp; + rsp->iov_len = io_rsp_len; + *rsp_buftype = CIFS_LARGE_BUFFER; + CIFSSMBClose(xid, tcon, fid.netfid); + return 0; + +error: + cifs_buf_release(io_req); + CIFSSMBClose(xid, tcon, fid.netfid); return rc; }
diff --git a/fs/smb/client/smb1ops.c b/fs/smb/client/smb1ops.c index 9bf8735cdd1e..6b4d8effa79d 100644 --- a/fs/smb/client/smb1ops.c +++ b/fs/smb/client/smb1ops.c @@ -979,18 +979,13 @@ static int cifs_query_symlink(const unsigned int xid, char **target_path, struct kvec *rsp_iov) { + struct reparse_data_buffer *buf; + TRANSACT_IOCTL_RSP *io = rsp_iov->iov_base; + bool unicode = !!(io->hdr.Flags2 & SMBFLG2_UNICODE); + u32 plen = le16_to_cpu(io->ByteCount); int rc; - int oplock = 0; - bool is_reparse_point = !!rsp_iov; - struct cifs_fid fid; - struct cifs_open_parms oparms;
- cifs_dbg(FYI, "%s: path: %s\n", __func__, full_path); - - if (is_reparse_point) { - cifs_dbg(VFS, "reparse points not handled for SMB1 symlinks\n"); - return -EOPNOTSUPP; - } + cifs_tcon_dbg(FYI, "%s: path=%s\n", __func__, full_path);
/* Check for unix extensions */ if (cap_unix(tcon->ses)) { @@ -1001,37 +996,12 @@ static int cifs_query_symlink(const unsigned int xid, rc = cifs_unix_dfs_readlink(xid, tcon, full_path, target_path, cifs_sb->local_nls); - - goto out; + return rc; }
- oparms = (struct cifs_open_parms) { - .tcon = tcon, - .cifs_sb = cifs_sb, - .desired_access = FILE_READ_ATTRIBUTES, - .create_options = cifs_create_options(cifs_sb, - OPEN_REPARSE_POINT), - .disposition = FILE_OPEN, - .path = full_path, - .fid = &fid, - }; - - rc = CIFS_open(xid, &oparms, &oplock, NULL); - if (rc) - goto out; - - rc = CIFSSMBQuerySymLink(xid, tcon, fid.netfid, target_path, - cifs_sb->local_nls); - if (rc) - goto out_close; - - convert_delimiter(*target_path, '/'); -out_close: - CIFSSMBClose(xid, tcon, fid.netfid); -out: - if (!rc) - cifs_dbg(FYI, "%s: target path: %s\n", __func__, *target_path); - return rc; + buf = (struct reparse_data_buffer *)((__u8 *)&io->hdr.Protocol + + le32_to_cpu(io->DataOffset)); + return parse_reparse_point(buf, plen, cifs_sb, unicode, target_path); }
static bool @@ -1214,6 +1184,7 @@ struct smb_version_operations smb1_operations = { .is_path_accessible = cifs_is_path_accessible, .can_echo = cifs_can_echo, .query_path_info = cifs_query_path_info, + .query_reparse_point = cifs_query_reparse_point, .query_file_info = cifs_query_file_info, .get_srv_inum = cifs_get_srv_inum, .set_path_size = CIFSSMBSetEOF, diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c index ea7ccaea711a..917ea8ac1ea5 100644 --- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -2894,27 +2894,26 @@ parse_reparse_posix(struct reparse_posix_data *symlink_buf, return 0; }
-static int -parse_reparse_symlink(struct reparse_symlink_data_buffer *symlink_buf, - u32 plen, char **target_path, - struct cifs_sb_info *cifs_sb) +static int parse_reparse_symlink(struct reparse_symlink_data_buffer *sym, + u32 plen, bool unicode, char **target_path, + struct cifs_sb_info *cifs_sb) { unsigned int sub_len; unsigned int sub_offset;
/* We handle Symbolic Link reparse tag here. See: MS-FSCC 2.1.2.4 */
- sub_offset = le16_to_cpu(symlink_buf->SubstituteNameOffset); - sub_len = le16_to_cpu(symlink_buf->SubstituteNameLength); + sub_offset = le16_to_cpu(sym->SubstituteNameOffset); + sub_len = le16_to_cpu(sym->SubstituteNameLength); if (sub_offset + 20 > plen || sub_offset + sub_len + 20 > plen) { cifs_dbg(VFS, "srv returned malformed symlink buffer\n"); return -EIO; }
- *target_path = cifs_strndup_from_utf16( - symlink_buf->PathBuffer + sub_offset, - sub_len, true, cifs_sb->local_nls); + *target_path = cifs_strndup_from_utf16(sym->PathBuffer + sub_offset, + sub_len, unicode, + cifs_sb->local_nls); if (!(*target_path)) return -ENOMEM;
@@ -2924,19 +2923,17 @@ parse_reparse_symlink(struct reparse_symlink_data_buffer *symlink_buf, return 0; }
-static int -parse_reparse_point(struct reparse_data_buffer *buf, - u32 plen, char **target_path, - struct cifs_sb_info *cifs_sb) +int parse_reparse_point(struct reparse_data_buffer *buf, + u32 plen, struct cifs_sb_info *cifs_sb, + bool unicode, char **target_path) { - if (plen < sizeof(struct reparse_data_buffer)) { + if (plen < sizeof(*buf)) { cifs_dbg(VFS, "reparse buffer is too small. Must be at least 8 bytes but was %d\n", plen); return -EIO; }
- if (plen < le16_to_cpu(buf->ReparseDataLength) + - sizeof(struct reparse_data_buffer)) { + if (plen < le16_to_cpu(buf->ReparseDataLength) + sizeof(*buf)) { cifs_dbg(VFS, "srv returned invalid reparse buf length: %d\n", plen); return -EIO; @@ -2951,7 +2948,7 @@ parse_reparse_point(struct reparse_data_buffer *buf, case IO_REPARSE_TAG_SYMLINK: return parse_reparse_symlink( (struct reparse_symlink_data_buffer *)buf, - plen, target_path, cifs_sb); + plen, unicode, target_path, cifs_sb); default: cifs_dbg(VFS, "srv returned unknown symlink buffer tag:0x%08x\n", le32_to_cpu(buf->ReparseTag)); @@ -2970,11 +2967,11 @@ static int smb2_query_symlink(const unsigned int xid, struct smb2_ioctl_rsp *io = rsp_iov->iov_base; u32 plen = le32_to_cpu(io->OutputCount);
- cifs_dbg(FYI, "%s: path: %s\n", __func__, full_path); + cifs_tcon_dbg(FYI, "%s: path: %s\n", __func__, full_path);
buf = (struct reparse_data_buffer *)((u8 *)io + le32_to_cpu(io->OutputOffset)); - return parse_reparse_point(buf, plen, target_path, cifs_sb); + return parse_reparse_point(buf, plen, cifs_sb, true, target_path); }
static int smb2_query_reparse_point(const unsigned int xid,
From: Paulo Alcantara pc@manguebit.com
stable inclusion from stable-v6.6.8 commit 4d07e5df13877921cbbebd1e305b50f512c241ee bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 539aad7f14dab7f947e5ab81901c0b20513a50db ]
Parse reparse point into cifs_open_info_data structure and feed it through cifs_open_info_to_fattr().
Signed-off-by: Paulo Alcantara (SUSE) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/smb/client/cifsglob.h | 6 ++++-- fs/smb/client/inode.c | 23 +++++++++++++--------- fs/smb/client/smb1ops.c | 41 ++++++++++++++++++++++------------------ fs/smb/client/smb2ops.c | 28 ++++++++++++++------------- 4 files changed, 56 insertions(+), 42 deletions(-)
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h index b8d1c19f6771..7180b5713bea 100644 --- a/fs/smb/client/cifsglob.h +++ b/fs/smb/client/cifsglob.h @@ -395,8 +395,7 @@ struct smb_version_operations { struct cifs_tcon *tcon, struct cifs_sb_info *cifs_sb, const char *full_path, - char **target_path, - struct kvec *rsp_iov); + char **target_path); /* open a file for non-posix mounts */ int (*open)(const unsigned int xid, struct cifs_open_parms *oparms, __u32 *oplock, void *buf); @@ -551,6 +550,9 @@ struct smb_version_operations { bool (*is_status_io_timeout)(char *buf); /* Check for STATUS_NETWORK_NAME_DELETED */ bool (*is_network_name_deleted)(char *buf, struct TCP_Server_Info *srv); + int (*parse_reparse_point)(struct cifs_sb_info *cifs_sb, + struct kvec *rsp_iov, + struct cifs_open_info_data *data); };
struct smb_version_values { diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c index d6aa5e474d5e..7a2a013bb05e 100644 --- a/fs/smb/client/inode.c +++ b/fs/smb/client/inode.c @@ -457,8 +457,7 @@ static int cifs_get_unix_fattr(const unsigned char *full_path, return -EOPNOTSUPP; rc = server->ops->query_symlink(xid, tcon, cifs_sb, full_path, - &fattr->cf_symlink_target, - NULL); + &fattr->cf_symlink_target); cifs_dbg(FYI, "%s: query_symlink: %d\n", __func__, rc); } return rc; @@ -1035,22 +1034,28 @@ static int reparse_info_to_fattr(struct cifs_open_info_data *data, if (!rc) iov = &rsp_iov; } + + rc = -EOPNOTSUPP; switch ((data->reparse_tag = tag)) { case 0: /* SMB1 symlink */ - iov = NULL; - fallthrough; - case IO_REPARSE_TAG_NFS: - case IO_REPARSE_TAG_SYMLINK: - if (!data->symlink_target && server->ops->query_symlink) { + if (server->ops->query_symlink) { rc = server->ops->query_symlink(xid, tcon, cifs_sb, full_path, - &data->symlink_target, - iov); + &data->symlink_target); } break; case IO_REPARSE_TAG_MOUNT_POINT: cifs_create_junction_fattr(fattr, sb); + rc = 0; goto out; + default: + if (data->symlink_target) { + rc = 0; + } else if (server->ops->parse_reparse_point) { + rc = server->ops->parse_reparse_point(cifs_sb, + iov, data); + } + break; }
cifs_open_info_to_fattr(fattr, data, sb); diff --git a/fs/smb/client/smb1ops.c b/fs/smb/client/smb1ops.c index 6b4d8effa79d..0dd599004e04 100644 --- a/fs/smb/client/smb1ops.c +++ b/fs/smb/client/smb1ops.c @@ -976,32 +976,36 @@ static int cifs_query_symlink(const unsigned int xid, struct cifs_tcon *tcon, struct cifs_sb_info *cifs_sb, const char *full_path, - char **target_path, - struct kvec *rsp_iov) + char **target_path) { - struct reparse_data_buffer *buf; - TRANSACT_IOCTL_RSP *io = rsp_iov->iov_base; - bool unicode = !!(io->hdr.Flags2 & SMBFLG2_UNICODE); - u32 plen = le16_to_cpu(io->ByteCount); int rc;
cifs_tcon_dbg(FYI, "%s: path=%s\n", __func__, full_path);
- /* Check for unix extensions */ - if (cap_unix(tcon->ses)) { - rc = CIFSSMBUnixQuerySymLink(xid, tcon, full_path, target_path, - cifs_sb->local_nls, - cifs_remap(cifs_sb)); - if (rc == -EREMOTE) - rc = cifs_unix_dfs_readlink(xid, tcon, full_path, - target_path, - cifs_sb->local_nls); - return rc; - } + if (!cap_unix(tcon->ses)) + return -EOPNOTSUPP; + + rc = CIFSSMBUnixQuerySymLink(xid, tcon, full_path, target_path, + cifs_sb->local_nls, cifs_remap(cifs_sb)); + if (rc == -EREMOTE) + rc = cifs_unix_dfs_readlink(xid, tcon, full_path, + target_path, cifs_sb->local_nls); + return rc; +} + +static int cifs_parse_reparse_point(struct cifs_sb_info *cifs_sb, + struct kvec *rsp_iov, + struct cifs_open_info_data *data) +{ + struct reparse_data_buffer *buf; + TRANSACT_IOCTL_RSP *io = rsp_iov->iov_base; + bool unicode = !!(io->hdr.Flags2 & SMBFLG2_UNICODE); + u32 plen = le16_to_cpu(io->ByteCount);
buf = (struct reparse_data_buffer *)((__u8 *)&io->hdr.Protocol + le32_to_cpu(io->DataOffset)); - return parse_reparse_point(buf, plen, cifs_sb, unicode, target_path); + return parse_reparse_point(buf, plen, cifs_sb, unicode, + &data->symlink_target); }
static bool @@ -1200,6 +1204,7 @@ struct smb_version_operations smb1_operations = { .rename = CIFSSMBRename, .create_hardlink = CIFSCreateHardLink, .query_symlink = cifs_query_symlink, + .parse_reparse_point = cifs_parse_reparse_point, .open = cifs_open_file, .set_fid = cifs_set_fid, .close = cifs_close_file, diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c index 917ea8ac1ea5..e3e2ecc32dc8 100644 --- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -2949,6 +2949,12 @@ int parse_reparse_point(struct reparse_data_buffer *buf, return parse_reparse_symlink( (struct reparse_symlink_data_buffer *)buf, plen, unicode, target_path, cifs_sb); + case IO_REPARSE_TAG_LX_SYMLINK: + case IO_REPARSE_TAG_AF_UNIX: + case IO_REPARSE_TAG_LX_FIFO: + case IO_REPARSE_TAG_LX_CHR: + case IO_REPARSE_TAG_LX_BLK: + return 0; default: cifs_dbg(VFS, "srv returned unknown symlink buffer tag:0x%08x\n", le32_to_cpu(buf->ReparseTag)); @@ -2956,22 +2962,18 @@ int parse_reparse_point(struct reparse_data_buffer *buf, } }
-static int smb2_query_symlink(const unsigned int xid, - struct cifs_tcon *tcon, - struct cifs_sb_info *cifs_sb, - const char *full_path, - char **target_path, - struct kvec *rsp_iov) +static int smb2_parse_reparse_point(struct cifs_sb_info *cifs_sb, + struct kvec *rsp_iov, + struct cifs_open_info_data *data) { struct reparse_data_buffer *buf; struct smb2_ioctl_rsp *io = rsp_iov->iov_base; u32 plen = le32_to_cpu(io->OutputCount);
- cifs_tcon_dbg(FYI, "%s: path: %s\n", __func__, full_path); - buf = (struct reparse_data_buffer *)((u8 *)io + le32_to_cpu(io->OutputOffset)); - return parse_reparse_point(buf, plen, cifs_sb, true, target_path); + return parse_reparse_point(buf, plen, cifs_sb, + true, &data->symlink_target); }
static int smb2_query_reparse_point(const unsigned int xid, @@ -5217,7 +5219,7 @@ struct smb_version_operations smb20_operations = { .unlink = smb2_unlink, .rename = smb2_rename_path, .create_hardlink = smb2_create_hardlink, - .query_symlink = smb2_query_symlink, + .parse_reparse_point = smb2_parse_reparse_point, .query_mf_symlink = smb3_query_mf_symlink, .create_mf_symlink = smb3_create_mf_symlink, .open = smb2_open_file, @@ -5319,7 +5321,7 @@ struct smb_version_operations smb21_operations = { .unlink = smb2_unlink, .rename = smb2_rename_path, .create_hardlink = smb2_create_hardlink, - .query_symlink = smb2_query_symlink, + .parse_reparse_point = smb2_parse_reparse_point, .query_mf_symlink = smb3_query_mf_symlink, .create_mf_symlink = smb3_create_mf_symlink, .open = smb2_open_file, @@ -5424,7 +5426,7 @@ struct smb_version_operations smb30_operations = { .unlink = smb2_unlink, .rename = smb2_rename_path, .create_hardlink = smb2_create_hardlink, - .query_symlink = smb2_query_symlink, + .parse_reparse_point = smb2_parse_reparse_point, .query_mf_symlink = smb3_query_mf_symlink, .create_mf_symlink = smb3_create_mf_symlink, .open = smb2_open_file, @@ -5538,7 +5540,7 @@ struct smb_version_operations smb311_operations = { .unlink = smb2_unlink, .rename = smb2_rename_path, .create_hardlink = smb2_create_hardlink, - .query_symlink = smb2_query_symlink, + .parse_reparse_point = smb2_parse_reparse_point, .query_mf_symlink = smb3_query_mf_symlink, .create_mf_symlink = smb3_create_mf_symlink, .open = smb2_open_file,
From: Paulo Alcantara pc@manguebit.com
stable inclusion from stable-v6.6.8 commit df32e887d32b80875c410e2a1091be166549e051 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 45e724022e2704b5a5193fd96f378822b0448e07 ]
Handle all file types in NFS reparse points as specified in MS-FSCC 2.1.2.6 Network File System (NFS) Reparse Data Buffer.
The client is now able to set all file types based on the parsed NFS reparse point, which used to support only symlinks. This works for SMB1+.
Before patch:
$ mount.cifs //srv/share /mnt -o ... $ ls -l /mnt ls: cannot access 'block': Operation not supported ls: cannot access 'char': Operation not supported ls: cannot access 'fifo': Operation not supported ls: cannot access 'sock': Operation not supported total 1 l????????? ? ? ? ? ? block l????????? ? ? ? ? ? char -rwxr-xr-x 1 root root 5 Nov 18 23:22 f0 l????????? ? ? ? ? ? fifo l--------- 1 root root 0 Nov 18 23:23 link -> f0 l????????? ? ? ? ? ? sock
After patch:
$ mount.cifs //srv/share /mnt -o ... $ ls -l /mnt total 1 brwxr-xr-x 1 root root 123, 123 Nov 18 00:34 block crwxr-xr-x 1 root root 1234, 1234 Nov 18 00:33 char -rwxr-xr-x 1 root root 5 Nov 18 23:22 f0 prwxr-xr-x 1 root root 0 Nov 18 23:23 fifo lrwxr-xr-x 1 root root 0 Nov 18 23:23 link -> f0 srwxr-xr-x 1 root root 0 Nov 19 2023 sock
Signed-off-by: Paulo Alcantara (SUSE) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/smb/client/cifsglob.h | 8 ++- fs/smb/client/cifspdu.h | 2 +- fs/smb/client/cifsproto.h | 4 +- fs/smb/client/inode.c | 51 +++++++++++++++++-- fs/smb/client/readdir.c | 6 ++- fs/smb/client/smb1ops.c | 3 +- fs/smb/client/smb2inode.c | 2 +- fs/smb/client/smb2ops.c | 101 ++++++++++++++++++++------------------ 8 files changed, 116 insertions(+), 61 deletions(-)
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h index 7180b5713bea..bd7fc20c49de 100644 --- a/fs/smb/client/cifsglob.h +++ b/fs/smb/client/cifsglob.h @@ -191,7 +191,13 @@ struct cifs_open_info_data { bool reparse_point; bool symlink; }; - __u32 reparse_tag; + struct { + __u32 tag; + union { + struct reparse_data_buffer *buf; + struct reparse_posix_data *posix; + }; + } reparse; char *symlink_target; union { struct smb2_file_all_info fi; diff --git a/fs/smb/client/cifspdu.h b/fs/smb/client/cifspdu.h index 2a90134331a4..83ccc51a54d0 100644 --- a/fs/smb/client/cifspdu.h +++ b/fs/smb/client/cifspdu.h @@ -1509,7 +1509,7 @@ struct reparse_posix_data { __le16 ReparseDataLength; __u16 Reserved; __le64 InodeType; /* LNK, FIFO, CHR etc. */ - char PathBuffer[]; + __u8 DataBuffer[]; } __attribute__((packed));
struct cifs_quota_data { diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h index dc2c43dd6a91..c858feaf4f92 100644 --- a/fs/smb/client/cifsproto.h +++ b/fs/smb/client/cifsproto.h @@ -209,7 +209,7 @@ int cifs_get_inode_info(struct inode **inode, const char *full_path, const struct cifs_fid *fid); bool cifs_reparse_point_to_fattr(struct cifs_sb_info *cifs_sb, struct cifs_fattr *fattr, - u32 tag); + struct cifs_open_info_data *data); extern int smb311_posix_get_inode_info(struct inode **pinode, const char *search_path, struct super_block *sb, unsigned int xid); extern int cifs_get_inode_info_unix(struct inode **pinode, @@ -664,7 +664,7 @@ char *extract_hostname(const char *unc); char *extract_sharename(const char *unc); int parse_reparse_point(struct reparse_data_buffer *buf, u32 plen, struct cifs_sb_info *cifs_sb, - bool unicode, char **target_path); + bool unicode, struct cifs_open_info_data *data);
#ifdef CONFIG_CIFS_DFS_UPCALL static inline int get_dfs_path(const unsigned int xid, struct cifs_ses *ses, diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c index 7a2a013bb05e..6a856945f2b4 100644 --- a/fs/smb/client/inode.c +++ b/fs/smb/client/inode.c @@ -719,10 +719,51 @@ static void smb311_posix_info_to_fattr(struct cifs_fattr *fattr, fattr->cf_mode, fattr->cf_uniqueid, fattr->cf_nlink); }
+static inline dev_t nfs_mkdev(struct reparse_posix_data *buf) +{ + u64 v = le64_to_cpu(*(__le64 *)buf->DataBuffer); + + return MKDEV(v >> 32, v & 0xffffffff); +} + bool cifs_reparse_point_to_fattr(struct cifs_sb_info *cifs_sb, struct cifs_fattr *fattr, - u32 tag) + struct cifs_open_info_data *data) { + struct reparse_posix_data *buf = data->reparse.posix; + u32 tag = data->reparse.tag; + + if (tag == IO_REPARSE_TAG_NFS && buf) { + switch (le64_to_cpu(buf->InodeType)) { + case NFS_SPECFILE_CHR: + fattr->cf_mode |= S_IFCHR | cifs_sb->ctx->file_mode; + fattr->cf_dtype = DT_CHR; + fattr->cf_rdev = nfs_mkdev(buf); + break; + case NFS_SPECFILE_BLK: + fattr->cf_mode |= S_IFBLK | cifs_sb->ctx->file_mode; + fattr->cf_dtype = DT_BLK; + fattr->cf_rdev = nfs_mkdev(buf); + break; + case NFS_SPECFILE_FIFO: + fattr->cf_mode |= S_IFIFO | cifs_sb->ctx->file_mode; + fattr->cf_dtype = DT_FIFO; + break; + case NFS_SPECFILE_SOCK: + fattr->cf_mode |= S_IFSOCK | cifs_sb->ctx->file_mode; + fattr->cf_dtype = DT_SOCK; + break; + case NFS_SPECFILE_LNK: + fattr->cf_mode = S_IFLNK | cifs_sb->ctx->file_mode; + fattr->cf_dtype = DT_LNK; + break; + default: + WARN_ON_ONCE(1); + return false; + } + return true; + } + switch (tag) { case IO_REPARSE_TAG_LX_SYMLINK: fattr->cf_mode |= S_IFLNK | cifs_sb->ctx->file_mode; @@ -788,7 +829,7 @@ static void cifs_open_info_to_fattr(struct cifs_fattr *fattr, fattr->cf_nlink = le32_to_cpu(info->NumberOfLinks);
if (cifs_open_data_reparse(data) && - cifs_reparse_point_to_fattr(cifs_sb, fattr, data->reparse_tag)) + cifs_reparse_point_to_fattr(cifs_sb, fattr, data)) goto out_reparse;
if (fattr->cf_cifsattrs & ATTR_DIRECTORY) { @@ -855,7 +896,7 @@ cifs_get_file_info(struct file *filp) data.adjust_tz = false; if (data.symlink_target) { data.symlink = true; - data.reparse_tag = IO_REPARSE_TAG_SYMLINK; + data.reparse.tag = IO_REPARSE_TAG_SYMLINK; } cifs_open_info_to_fattr(&fattr, &data, inode->i_sb); break; @@ -1024,7 +1065,7 @@ static int reparse_info_to_fattr(struct cifs_open_info_data *data, struct cifs_sb_info *cifs_sb = CIFS_SB(sb); struct kvec rsp_iov, *iov = NULL; int rsp_buftype = CIFS_NO_BUFFER; - u32 tag = data->reparse_tag; + u32 tag = data->reparse.tag; int rc = 0;
if (!tag && server->ops->query_reparse_point) { @@ -1036,7 +1077,7 @@ static int reparse_info_to_fattr(struct cifs_open_info_data *data, }
rc = -EOPNOTSUPP; - switch ((data->reparse_tag = tag)) { + switch ((data->reparse.tag = tag)) { case 0: /* SMB1 symlink */ if (server->ops->query_symlink) { rc = server->ops->query_symlink(xid, tcon, diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c index 47fc22de8d20..d30ea2005eb3 100644 --- a/fs/smb/client/readdir.c +++ b/fs/smb/client/readdir.c @@ -153,6 +153,10 @@ static bool reparse_file_needs_reval(const struct cifs_fattr *fattr) static void cifs_fill_common_info(struct cifs_fattr *fattr, struct cifs_sb_info *cifs_sb) { + struct cifs_open_info_data data = { + .reparse = { .tag = fattr->cf_cifstag, }, + }; + fattr->cf_uid = cifs_sb->ctx->linux_uid; fattr->cf_gid = cifs_sb->ctx->linux_gid;
@@ -165,7 +169,7 @@ cifs_fill_common_info(struct cifs_fattr *fattr, struct cifs_sb_info *cifs_sb) * reasonably map some of them to directories vs. files vs. symlinks */ if ((fattr->cf_cifsattrs & ATTR_REPARSE) && - cifs_reparse_point_to_fattr(cifs_sb, fattr, fattr->cf_cifstag)) + cifs_reparse_point_to_fattr(cifs_sb, fattr, &data)) goto out_reparse;
if (fattr->cf_cifsattrs & ATTR_DIRECTORY) { diff --git a/fs/smb/client/smb1ops.c b/fs/smb/client/smb1ops.c index 0dd599004e04..64e25233e85d 100644 --- a/fs/smb/client/smb1ops.c +++ b/fs/smb/client/smb1ops.c @@ -1004,8 +1004,7 @@ static int cifs_parse_reparse_point(struct cifs_sb_info *cifs_sb,
buf = (struct reparse_data_buffer *)((__u8 *)&io->hdr.Protocol + le32_to_cpu(io->DataOffset)); - return parse_reparse_point(buf, plen, cifs_sb, unicode, - &data->symlink_target); + return parse_reparse_point(buf, plen, cifs_sb, unicode, data); }
static bool diff --git a/fs/smb/client/smb2inode.c b/fs/smb/client/smb2inode.c index 0b89f7008ac0..c94940af5d4b 100644 --- a/fs/smb/client/smb2inode.c +++ b/fs/smb/client/smb2inode.c @@ -555,7 +555,7 @@ static int parse_create_response(struct cifs_open_info_data *data, break; } data->reparse_point = reparse_point; - data->reparse_tag = tag; + data->reparse.tag = tag; return rc; }
diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c index e3e2ecc32dc8..258a591f716a 100644 --- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -2866,89 +2866,95 @@ smb2_get_dfs_refer(const unsigned int xid, struct cifs_ses *ses, return rc; }
-static int -parse_reparse_posix(struct reparse_posix_data *symlink_buf, - u32 plen, char **target_path, - struct cifs_sb_info *cifs_sb) +/* See MS-FSCC 2.1.2.6 for the 'NFS' style reparse tags */ +static int parse_reparse_posix(struct reparse_posix_data *buf, + struct cifs_sb_info *cifs_sb, + struct cifs_open_info_data *data) { unsigned int len; - - /* See MS-FSCC 2.1.2.6 for the 'NFS' style reparse tags */ - len = le16_to_cpu(symlink_buf->ReparseDataLength); - - if (le64_to_cpu(symlink_buf->InodeType) != NFS_SPECFILE_LNK) { - cifs_dbg(VFS, "%lld not a supported symlink type\n", - le64_to_cpu(symlink_buf->InodeType)); + u64 type; + + switch ((type = le64_to_cpu(buf->InodeType))) { + case NFS_SPECFILE_LNK: + len = le16_to_cpu(buf->ReparseDataLength); + data->symlink_target = cifs_strndup_from_utf16(buf->DataBuffer, + len, true, + cifs_sb->local_nls); + if (!data->symlink_target) + return -ENOMEM; + convert_delimiter(data->symlink_target, '/'); + cifs_dbg(FYI, "%s: target path: %s\n", + __func__, data->symlink_target); + break; + case NFS_SPECFILE_CHR: + case NFS_SPECFILE_BLK: + case NFS_SPECFILE_FIFO: + case NFS_SPECFILE_SOCK: + break; + default: + cifs_dbg(VFS, "%s: unhandled inode type: 0x%llx\n", + __func__, type); return -EOPNOTSUPP; } - - *target_path = cifs_strndup_from_utf16( - symlink_buf->PathBuffer, - len, true, cifs_sb->local_nls); - if (!(*target_path)) - return -ENOMEM; - - convert_delimiter(*target_path, '/'); - cifs_dbg(FYI, "%s: target path: %s\n", __func__, *target_path); - return 0; }
static int parse_reparse_symlink(struct reparse_symlink_data_buffer *sym, - u32 plen, bool unicode, char **target_path, - struct cifs_sb_info *cifs_sb) + u32 plen, bool unicode, + struct cifs_sb_info *cifs_sb, + struct cifs_open_info_data *data) { - unsigned int sub_len; - unsigned int sub_offset; + unsigned int len; + unsigned int offs;
/* We handle Symbolic Link reparse tag here. See: MS-FSCC 2.1.2.4 */
- sub_offset = le16_to_cpu(sym->SubstituteNameOffset); - sub_len = le16_to_cpu(sym->SubstituteNameLength); - if (sub_offset + 20 > plen || - sub_offset + sub_len + 20 > plen) { + offs = le16_to_cpu(sym->SubstituteNameOffset); + len = le16_to_cpu(sym->SubstituteNameLength); + if (offs + 20 > plen || offs + len + 20 > plen) { cifs_dbg(VFS, "srv returned malformed symlink buffer\n"); return -EIO; }
- *target_path = cifs_strndup_from_utf16(sym->PathBuffer + sub_offset, - sub_len, unicode, - cifs_sb->local_nls); - if (!(*target_path)) + data->symlink_target = cifs_strndup_from_utf16(sym->PathBuffer + offs, + len, unicode, + cifs_sb->local_nls); + if (!data->symlink_target) return -ENOMEM;
- convert_delimiter(*target_path, '/'); - cifs_dbg(FYI, "%s: target path: %s\n", __func__, *target_path); + convert_delimiter(data->symlink_target, '/'); + cifs_dbg(FYI, "%s: target path: %s\n", __func__, data->symlink_target);
return 0; }
int parse_reparse_point(struct reparse_data_buffer *buf, u32 plen, struct cifs_sb_info *cifs_sb, - bool unicode, char **target_path) + bool unicode, struct cifs_open_info_data *data) { if (plen < sizeof(*buf)) { - cifs_dbg(VFS, "reparse buffer is too small. Must be at least 8 bytes but was %d\n", - plen); + cifs_dbg(VFS, "%s: reparse buffer is too small. Must be at least 8 bytes but was %d\n", + __func__, plen); return -EIO; }
if (plen < le16_to_cpu(buf->ReparseDataLength) + sizeof(*buf)) { - cifs_dbg(VFS, "srv returned invalid reparse buf length: %d\n", - plen); + cifs_dbg(VFS, "%s: invalid reparse buf length: %d\n", + __func__, plen); return -EIO; }
+ data->reparse.buf = buf; + /* See MS-FSCC 2.1.2 */ switch (le32_to_cpu(buf->ReparseTag)) { case IO_REPARSE_TAG_NFS: - return parse_reparse_posix( - (struct reparse_posix_data *)buf, - plen, target_path, cifs_sb); + return parse_reparse_posix((struct reparse_posix_data *)buf, + cifs_sb, data); case IO_REPARSE_TAG_SYMLINK: return parse_reparse_symlink( (struct reparse_symlink_data_buffer *)buf, - plen, unicode, target_path, cifs_sb); + plen, unicode, cifs_sb, data); case IO_REPARSE_TAG_LX_SYMLINK: case IO_REPARSE_TAG_AF_UNIX: case IO_REPARSE_TAG_LX_FIFO: @@ -2956,8 +2962,8 @@ int parse_reparse_point(struct reparse_data_buffer *buf, case IO_REPARSE_TAG_LX_BLK: return 0; default: - cifs_dbg(VFS, "srv returned unknown symlink buffer tag:0x%08x\n", - le32_to_cpu(buf->ReparseTag)); + cifs_dbg(VFS, "%s: unhandled reparse tag: 0x%08x\n", + __func__, le32_to_cpu(buf->ReparseTag)); return -EOPNOTSUPP; } } @@ -2972,8 +2978,7 @@ static int smb2_parse_reparse_point(struct cifs_sb_info *cifs_sb,
buf = (struct reparse_data_buffer *)((u8 *)io + le32_to_cpu(io->OutputOffset)); - return parse_reparse_point(buf, plen, cifs_sb, - true, &data->symlink_target); + return parse_reparse_point(buf, plen, cifs_sb, true, data); }
static int smb2_query_reparse_point(const unsigned int xid,
From: Masahiro Yamada masahiroy@kernel.org
stable inclusion from stable-v6.6.8 commit 610610da58af14a2da813c550264616e99a1a756 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit c0a8574204054effad6ac83cc75c02576e2985fe ]
A common issue in Makefile is a race in parallel building.
You need to be careful to prevent multiple threads from writing to the same file simultaneously.
Commit 3939f3345050 ("ARM: 8418/1: add boot image dependencies to not generate invalid images") addressed such a bad scenario.
A similar symptom occurs with the following command:
$ make -j$(nproc) ARCH=arm64 Image vmlinuz.efi [ snip ] SORTTAB vmlinux OBJCOPY arch/arm64/boot/Image OBJCOPY arch/arm64/boot/Image AS arch/arm64/boot/zboot-header.o PAD arch/arm64/boot/vmlinux.bin GZIP arch/arm64/boot/vmlinuz OBJCOPY arch/arm64/boot/vmlinuz.o LD arch/arm64/boot/vmlinuz.efi.elf OBJCOPY arch/arm64/boot/vmlinuz.efi
The log "OBJCOPY arch/arm64/boot/Image" is displayed twice.
It indicates that two threads simultaneously enter arch/arm64/boot/ and write to arch/arm64/boot/Image.
It occasionally leads to a build failure:
$ make -j$(nproc) ARCH=arm64 Image vmlinuz.efi [ snip ] SORTTAB vmlinux OBJCOPY arch/arm64/boot/Image PAD arch/arm64/boot/vmlinux.bin truncate: Invalid number: 'arch/arm64/boot/vmlinux.bin' make[2]: *** [drivers/firmware/efi/libstub/Makefile.zboot:13: arch/arm64/boot/vmlinux.bin] Error 1 make[2]: *** Deleting file 'arch/arm64/boot/vmlinux.bin' make[1]: *** [arch/arm64/Makefile:163: vmlinuz.efi] Error 2 make[1]: *** Waiting for unfinished jobs.... make: *** [Makefile:234: __sub-make] Error 2
vmlinuz.efi depends on Image, but such a dependency is not specified in arch/arm64/Makefile.
Signed-off-by: Masahiro Yamada masahiroy@kernel.org Acked-by: Ard Biesheuvel ardb@kernel.org Reviewed-by: SImon Glass sjg@chromium.org Link: https://lore.kernel.org/r/20231119053234.2367621-1-masahiroy@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- arch/arm64/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index efd4b861fb86..94be26b6c147 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -158,7 +158,7 @@ endif
all: $(notdir $(KBUILD_IMAGE))
- +vmlinuz.efi: Image Image vmlinuz.efi: vmlinux $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
From: Denis Benato benato.denis96@gmail.com
stable inclusion from stable-v6.6.8 commit 5ce0fb87311d17907910e06e0a9b1306da4c62b2 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 06ae5afce8cc1f7621cc5c7751e449ce20d68af7 ]
In the function asus_kbd_set_report the parameter buf is read-only as it gets copied in a memory portion suitable for USB transfer, but the parameter is not marked as const: add the missing const and mark const immutable buffers passed to that function.
Signed-off-by: Denis Benato benato.denis96@gmail.com Signed-off-by: Luke D. Jones luke@ljones.dev Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/hid/hid-asus.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/hid/hid-asus.c b/drivers/hid/hid-asus.c index 194a86cf30db..78cdfb8b9a7a 100644 --- a/drivers/hid/hid-asus.c +++ b/drivers/hid/hid-asus.c @@ -381,7 +381,7 @@ static int asus_raw_event(struct hid_device *hdev, return 0; }
-static int asus_kbd_set_report(struct hid_device *hdev, u8 *buf, size_t buf_size) +static int asus_kbd_set_report(struct hid_device *hdev, const u8 *buf, size_t buf_size) { unsigned char *dmabuf; int ret; @@ -404,7 +404,7 @@ static int asus_kbd_set_report(struct hid_device *hdev, u8 *buf, size_t buf_size
static int asus_kbd_init(struct hid_device *hdev) { - u8 buf[] = { FEATURE_KBD_REPORT_ID, 0x41, 0x53, 0x55, 0x53, 0x20, 0x54, + const u8 buf[] = { FEATURE_KBD_REPORT_ID, 0x41, 0x53, 0x55, 0x53, 0x20, 0x54, 0x65, 0x63, 0x68, 0x2e, 0x49, 0x6e, 0x63, 0x2e, 0x00 }; int ret;
@@ -418,7 +418,7 @@ static int asus_kbd_init(struct hid_device *hdev) static int asus_kbd_get_functions(struct hid_device *hdev, unsigned char *kbd_func) { - u8 buf[] = { FEATURE_KBD_REPORT_ID, 0x05, 0x20, 0x31, 0x00, 0x08 }; + const u8 buf[] = { FEATURE_KBD_REPORT_ID, 0x05, 0x20, 0x31, 0x00, 0x08 }; u8 *readbuf; int ret;
@@ -449,7 +449,7 @@ static int asus_kbd_get_functions(struct hid_device *hdev,
static int rog_nkey_led_init(struct hid_device *hdev) { - u8 buf_init_start[] = { FEATURE_KBD_LED_REPORT_ID1, 0xB9 }; + const u8 buf_init_start[] = { FEATURE_KBD_LED_REPORT_ID1, 0xB9 }; u8 buf_init2[] = { FEATURE_KBD_LED_REPORT_ID1, 0x41, 0x53, 0x55, 0x53, 0x20, 0x54, 0x65, 0x63, 0x68, 0x2e, 0x49, 0x6e, 0x63, 0x2e, 0x00 }; u8 buf_init3[] = { FEATURE_KBD_LED_REPORT_ID1,
From: Mark Rutland mark.rutland@arm.com
stable inclusion from stable-v6.6.8 commit 545d55a3e0c8f4e59fbc9d92c99ef5725c799014 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 7e2c1e4b34f07d9aa8937fab88359d4a0fce468e upstream.
When lockdep is enabled, the for_each_sibling_event(sibling, event) macro checks that event->ctx->mutex is held. When creating a new group leader event, we call perf_event_validate_size() on a partially initialized event where event->ctx is NULL, and so when for_each_sibling_event() attempts to check event->ctx->mutex, we get a splat, as reported by Lucas De Marchi:
WARNING: CPU: 8 PID: 1471 at kernel/events/core.c:1950 __do_sys_perf_event_open+0xf37/0x1080
This only happens for a new event which is its own group_leader, and in this case there cannot be any sibling events. Thus it's safe to skip the check for siblings, which avoids having to make invasive and ugly changes to for_each_sibling_event().
Avoid the splat by bailing out early when the new event is its own group_leader.
Fixes: 382c27f4ed28f803 ("perf: Fix perf_event_validate_size()") Closes: https://lore.kernel.org/lkml/20231214000620.3081018-1-lucas.demarchi@intel.c... Closes: https://lore.kernel.org/lkml/ZXpm6gQ%2Fd59jGsuW@xpf.sh.intel.com/ Reported-by: Lucas De Marchi lucas.demarchi@intel.com Reported-by: Pengfei Xu pengfei.xu@intel.com Signed-off-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20231215112450.3972309-1-mark.rutland@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- kernel/events/core.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c index 4995c6ba32e6..c383a26c4409 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1947,6 +1947,16 @@ static bool perf_event_validate_size(struct perf_event *event) group_leader->nr_siblings + 1) > 16*1024) return false;
+ /* + * When creating a new group leader, group_leader->ctx is initialized + * after the size has been validated, but we cannot safely use + * for_each_sibling_event() until group_leader->ctx is set. A new group + * leader cannot have any siblings yet, so we can safely skip checking + * the non-existent siblings. + */ + if (event == group_leader) + return true; + for_each_sibling_event(sibling, group_leader) { if (__perf_event_read_size(sibling->attr.read_format, group_leader->nr_siblings + 1) > 16*1024)
From: Josef Bacik josef@toxicpanda.com
stable inclusion from stable-v6.6.8 commit 654461744af8c33ee6943b7937272e5a4945e5c9 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit a8892fd71933126ebae3d60aec5918d4dceaae76 upstream.
Our btrfs subvolume snapshot <source> <destination> utility enforces that <source> is the root of the subvolume, however this isn't enforced in the kernel. Update the kernel to also enforce this limitation to avoid problems with other users of this ioctl that don't have the appropriate checks in place.
Reported-by: Martin Michaelis code@mgjm.de CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Neal Gompa neal@gompa.dev Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/btrfs/ioctl.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index e2f00e084c88..0a05a5e65295 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1290,6 +1290,15 @@ static noinline int __btrfs_ioctl_snap_create(struct file *file, * are limited to own subvolumes only */ ret = -EPERM; + } else if (btrfs_ino(BTRFS_I(src_inode)) != BTRFS_FIRST_FREE_OBJECTID) { + /* + * Snapshots must be made with the src_inode referring + * to the subvolume inode, otherwise the permission + * checking above is useless because we may have + * permission on a lower directory but not the subvol + * itself. + */ + ret = -EINVAL; } else { ret = btrfs_mksnapshot(&file->f_path, idmap, name, namelen,
From: Dan Williams dan.j.williams@intel.com
stable inclusion from stable-v6.6.8 commit c1d2d084751dfd8a8dc71856f741a4e0170af115 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 6f5c4eca48ffe18307b4e1d375817691c9005c87 upstream.
The helper, cxl_dpa_resource_start(), snapshots the dpa-address of an endpoint-decoder after acquiring the cxl_dpa_rwsem. However, it is sufficient to assert that cxl_dpa_rwsem is held rather than acquire it in the helper. Otherwise, it triggers multiple lockdep reports:
1/ Tracing callbacks are in an atomic context that can not acquire sleeping locks:
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1288, name: bash preempt_count: 2, expected: 0 RCU nest depth: 0, expected: 0 [..] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023 Call Trace: <TASK> dump_stack_lvl+0x71/0x90 __might_resched+0x1b2/0x2c0 down_read+0x1a/0x190 cxl_dpa_resource_start+0x15/0x50 [cxl_core] cxl_trace_hpa+0x122/0x300 [cxl_core] trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
2/ The rwsem is already held in the inject poison path:
WARNING: possible recursive locking detected 6.7.0-rc2+ #12 Tainted: G W OE N -------------------------------------------- bash/1288 is trying to acquire lock: ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_dpa_resource_start+0x15/0x50 [cxl_core]
but task is already holding lock: ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_inject_poison+0x7d/0x1e0 [cxl_core] [..] Call Trace: <TASK> dump_stack_lvl+0x71/0x90 __might_resched+0x1b2/0x2c0 down_read+0x1a/0x190 cxl_dpa_resource_start+0x15/0x50 [cxl_core] cxl_trace_hpa+0x122/0x300 [cxl_core] trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core] __traceiter_cxl_poison+0x5c/0x80 [cxl_core] cxl_inject_poison+0x1bc/0x1e0 [cxl_core]
This appears to have been an issue since the initial implementation and uncovered by the new cxl-poison.sh test [1]. That test is now passing with these changes.
Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events") Link: http://lore.kernel.org/r/e4f2716646918135ddbadf4146e92abb659de734.1700615159... [1] Cc: stable@vger.kernel.org Cc: Alison Schofield alison.schofield@intel.com Cc: Jonathan Cameron Jonathan.Cameron@huawei.com Cc: Dave Jiang dave.jiang@intel.com Cc: Ira Weiny ira.weiny@intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/cxl/core/hdm.c | 3 +-- drivers/cxl/core/port.c | 4 ++-- 2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index dd1ec5914adf..4a4b10f4aa8a 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -363,10 +363,9 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled) { resource_size_t base = -1;
- down_read(&cxl_dpa_rwsem); + lockdep_assert_held(&cxl_dpa_rwsem); if (cxled->dpa_res) base = cxled->dpa_res->start; - up_read(&cxl_dpa_rwsem);
return base; } diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index e2f888224362..eb4a6adecca2 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -219,9 +219,9 @@ static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *at char *buf) { struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); - u64 base = cxl_dpa_resource_start(cxled);
- return sysfs_emit(buf, "%#llx\n", base); + guard(rwsem_read)(&cxl_dpa_rwsem); + return sysfs_emit(buf, "%#llx\n", (u64)cxl_dpa_resource_start(cxled)); } static DEVICE_ATTR_RO(dpa_resource);
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
stable inclusion from stable-v6.6.8 commit 40abc387459a0fc1bfd52feab095b2384a79c262 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit e199bf52ffda8f98f129728d57244a9cd9ad5623 upstream.
If bus is marked as multi_link, but number of masters in the stream is not higher than bus->hw_sync_min_links (bus->multi_link && m_rt_count >= bus->hw_sync_min_links), bank switching should not happen. The first part of do_bank_switch() code properly takes these conditions into account, but second part (sdw_ml_sync_bank_switch()) relies purely on bus->multi_link property. This is not balanced and leads to NULL pointer dereference:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 ... Call trace: wait_for_completion_timeout+0x124/0x1f0 do_bank_switch+0x370/0x6f8 sdw_prepare_stream+0x2d0/0x438 qcom_snd_sdw_prepare+0xa0/0x118 sm8450_snd_prepare+0x128/0x148 snd_soc_link_prepare+0x5c/0xe8 __soc_pcm_prepare+0x28/0x1ec dpcm_be_dai_prepare+0x1e0/0x2c0 dpcm_fe_dai_prepare+0x108/0x28c snd_pcm_do_prepare+0x44/0x68 snd_pcm_action_single+0x54/0xc0 snd_pcm_action_nonatomic+0xe4/0xec snd_pcm_prepare+0xc4/0x114 snd_pcm_common_ioctl+0x1154/0x1cc0 snd_pcm_ioctl+0x54/0x74
Fixes: ce6e74d008ff ("soundwire: Add support for multi link bank switch") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Link: https://lore.kernel.org/r/20231124180136.390621-1-krzysztof.kozlowski@linaro... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/soundwire/stream.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/soundwire/stream.c b/drivers/soundwire/stream.c index d77a8a0d42c8..68d54887992d 100644 --- a/drivers/soundwire/stream.c +++ b/drivers/soundwire/stream.c @@ -742,14 +742,15 @@ static int sdw_bank_switch(struct sdw_bus *bus, int m_rt_count) * sdw_ml_sync_bank_switch: Multilink register bank switch * * @bus: SDW bus instance + * @multi_link: whether this is a multi-link stream with hardware-based sync * * Caller function should free the buffers on error */ -static int sdw_ml_sync_bank_switch(struct sdw_bus *bus) +static int sdw_ml_sync_bank_switch(struct sdw_bus *bus, bool multi_link) { unsigned long time_left;
- if (!bus->multi_link) + if (!multi_link) return 0;
/* Wait for completion of transfer */ @@ -847,7 +848,7 @@ static int do_bank_switch(struct sdw_stream_runtime *stream) bus->bank_switch_timeout = DEFAULT_BANK_SWITCH_TIMEOUT;
/* Check if bank switch was successful */ - ret = sdw_ml_sync_bank_switch(bus); + ret = sdw_ml_sync_bank_switch(bus, multi_link); if (ret < 0) { dev_err(bus->dev, "multi link bank switch failed: %d\n", ret);
From: Baokun Li libaokun1@huawei.com
stable inclusion from stable-v6.6.8 commit 4f18d187fb2a99f181b1aab0322f3f8451255b6b bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 2dcf5fde6dffb312a4bfb8ef940cea2d1f402e32 upstream.
For files with logical blocks close to EXT_MAX_BLOCKS, the file size predicted in ext4_mb_normalize_request() may exceed EXT_MAX_BLOCKS. This can cause some blocks to be preallocated that will not be used. And after [Fixes], the following issue may be triggered:
========================================================= kernel BUG at fs/ext4/mballoc.c:4653! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 1 PID: 2357 Comm: xfs_io 6.7.0-rc2-00195-g0f5cc96c367f Hardware name: linux,dummy-virt (DT) pc : ext4_mb_use_inode_pa+0x148/0x208 lr : ext4_mb_use_inode_pa+0x98/0x208 Call trace: ext4_mb_use_inode_pa+0x148/0x208 ext4_mb_new_inode_pa+0x240/0x4a8 ext4_mb_use_best_found+0x1d4/0x208 ext4_mb_try_best_found+0xc8/0x110 ext4_mb_regular_allocator+0x11c/0xf48 ext4_mb_new_blocks+0x790/0xaa8 ext4_ext_map_blocks+0x7cc/0xd20 ext4_map_blocks+0x170/0x600 ext4_iomap_begin+0x1c0/0x348 =========================================================
Here is a calculation when adjusting ac_b_ex in ext4_mb_new_inode_pa():
ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len); if (ac->ac_o_ex.fe_logical >= ex.fe_logical) goto adjust_bex;
The problem is that when orig_goal_end is subtracted from ac_b_ex.fe_len it is still greater than EXT_MAX_BLOCKS, which causes ex.fe_logical to overflow to a very small value, which ultimately triggers a BUG_ON in ext4_mb_new_inode_pa() because pa->pa_free < len.
The last logical block of an actual write request does not exceed EXT_MAX_BLOCKS, so in ext4_mb_normalize_request() also avoids normalizing the last logical block to exceed EXT_MAX_BLOCKS to avoid the above issue.
The test case in [Link] can reproduce the above issue with 64k block size.
Link: https://patchwork.kernel.org/project/fstests/list/?series=804003 Cc: stable@kernel.org # 6.4 Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()") Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20231127063313.3734294-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/ext4/mballoc.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index bbb2410a0724..a37972aef95e 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -4509,6 +4509,10 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac, start = max(start, rounddown(ac->ac_o_ex.fe_logical, (ext4_lblk_t)EXT4_BLOCKS_PER_GROUP(ac->ac_sb)));
+ /* avoid unnecessary preallocation that may trigger assertions */ + if (start + size > EXT_MAX_BLOCKS) + size = EXT_MAX_BLOCKS - start; + /* don't cover already allocated blocks in selected range */ if (ar->pleft && start <= ar->lleft) { size -= ar->lleft + 1 - start;
From: John Hubbard jhubbard@nvidia.com
stable inclusion from stable-v6.6.8 commit d228e98dfacb205d4e9ae181f5a8f467e28fc777 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 43e8832fed08438e2a27afed9bac21acd0ceffe5 upstream.
This reverts commit 9fc96c7c19df ("selftests: error out if kernel header files are not yet built").
It turns out that requiring the kernel headers to be built as a prerequisite to building selftests, does not work in many cases. For example, Peter Zijlstra writes:
"My biggest beef with the whole thing is that I simply do not want to use 'make headers', it doesn't work for me.
I have a ton of output directories and I don't care to build tools into the output dirs, in fact some of them flat out refuse to work that way (bpf comes to mind)." [1]
Therefore, stop erroring out on the selftests build. Additional patches will be required in order to change over to not requiring the kernel headers.
[1] https://lore.kernel.org/20231208221007.GO28727@noisy.programming.kicks-ass.n...
Link: https://lkml.kernel.org/r/20231209020144.244759-1-jhubbard@nvidia.com Fixes: 9fc96c7c19df ("selftests: error out if kernel header files are not yet built") Signed-off-by: John Hubbard jhubbard@nvidia.com Cc: Anders Roxell anders.roxell@linaro.org Cc: Muhammad Usama Anjum usama.anjum@collabora.com Cc: David Hildenbrand david@redhat.com Cc: Peter Xu peterx@redhat.com Cc: Jonathan Corbet corbet@lwn.net Cc: Nathan Chancellor nathan@kernel.org Cc: Shuah Khan shuah@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Marcos Paulo de Souza mpdesouza@suse.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- tools/testing/selftests/Makefile | 21 +---------------- tools/testing/selftests/lib.mk | 40 +++----------------------------- 2 files changed, 4 insertions(+), 57 deletions(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 1a21d6beebc6..697f13bbbc32 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -152,12 +152,10 @@ ifneq ($(KBUILD_OUTPUT),) abs_objtree := $(realpath $(abs_objtree)) BUILD := $(abs_objtree)/kselftest KHDR_INCLUDES := -isystem ${abs_objtree}/usr/include - KHDR_DIR := ${abs_objtree}/usr/include else BUILD := $(CURDIR) abs_srctree := $(shell cd $(top_srcdir) && pwd) KHDR_INCLUDES := -isystem ${abs_srctree}/usr/include - KHDR_DIR := ${abs_srctree}/usr/include DEFAULT_INSTALL_HDR_PATH := 1 endif
@@ -171,7 +169,7 @@ export KHDR_INCLUDES # all isn't the first target in the file. .DEFAULT_GOAL := all
-all: kernel_header_files +all: @ret=1; \ for TARGET in $(TARGETS); do \ BUILD_TARGET=$$BUILD/$$TARGET; \ @@ -182,23 +180,6 @@ all: kernel_header_files ret=$$((ret * $$?)); \ done; exit $$ret;
-kernel_header_files: - @ls $(KHDR_DIR)/linux/*.h >/dev/null 2>/dev/null; \ - if [ $$? -ne 0 ]; then \ - RED='\033[1;31m'; \ - NOCOLOR='\033[0m'; \ - echo; \ - echo -e "$${RED}error$${NOCOLOR}: missing kernel header files."; \ - echo "Please run this and try again:"; \ - echo; \ - echo " cd $(top_srcdir)"; \ - echo " make headers"; \ - echo; \ - exit 1; \ - fi - -.PHONY: kernel_header_files - run_tests: all @for TARGET in $(TARGETS); do \ BUILD_TARGET=$$BUILD/$$TARGET; \ diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk index 118e0964bda9..aa646e0661f3 100644 --- a/tools/testing/selftests/lib.mk +++ b/tools/testing/selftests/lib.mk @@ -44,26 +44,10 @@ endif selfdir = $(realpath $(dir $(filter %/lib.mk,$(MAKEFILE_LIST)))) top_srcdir = $(selfdir)/../../..
-ifeq ("$(origin O)", "command line") - KBUILD_OUTPUT := $(O) +ifeq ($(KHDR_INCLUDES),) +KHDR_INCLUDES := -isystem $(top_srcdir)/usr/include endif
-ifneq ($(KBUILD_OUTPUT),) - # Make's built-in functions such as $(abspath ...), $(realpath ...) cannot - # expand a shell special character '~'. We use a somewhat tedious way here. - abs_objtree := $(shell cd $(top_srcdir) && mkdir -p $(KBUILD_OUTPUT) && cd $(KBUILD_OUTPUT) && pwd) - $(if $(abs_objtree),, \ - $(error failed to create output directory "$(KBUILD_OUTPUT)")) - # $(realpath ...) resolves symlinks - abs_objtree := $(realpath $(abs_objtree)) - KHDR_DIR := ${abs_objtree}/usr/include -else - abs_srctree := $(shell cd $(top_srcdir) && pwd) - KHDR_DIR := ${abs_srctree}/usr/include -endif - -KHDR_INCLUDES := -isystem $(KHDR_DIR) - # The following are built by lib.mk common compile rules. # TEST_CUSTOM_PROGS should be used by tests that require # custom build rule and prevent common build rule use. @@ -74,25 +58,7 @@ TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS)) TEST_GEN_PROGS_EXTENDED := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS_EXTENDED)) TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
-all: kernel_header_files $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) \ - $(TEST_GEN_FILES) - -kernel_header_files: - @ls $(KHDR_DIR)/linux/*.h >/dev/null 2>/dev/null; \ - if [ $$? -ne 0 ]; then \ - RED='\033[1;31m'; \ - NOCOLOR='\033[0m'; \ - echo; \ - echo -e "$${RED}error$${NOCOLOR}: missing kernel header files."; \ - echo "Please run this and try again:"; \ - echo; \ - echo " cd $(top_srcdir)"; \ - echo " make headers"; \ - echo; \ - exit 1; \ - fi - -.PHONY: kernel_header_files +all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
define RUN_TESTS BASE_DIR="$(selfdir)"; \
From: James Houghton jthoughton@google.com
stable inclusion from stable-v6.6.8 commit 2c8a21a124cab4391254680cda796036d061d9b9 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 3c0696076aad60a2f04c019761921954579e1b0e upstream.
It is currently possible for a userspace application to enter an infinite page fault loop when using HugeTLB pages implemented with contiguous PTEs when HAFDBS is not available. This happens because:
1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).
2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling the memory access on a system without HAFDBS, we will get a page fault.
3. HugeTLB will check if it needs to update the dirty bits on the PTE. For contiguous PTEs, it will check to see if the pgprot bits need updating. In this case, HugeTLB wants to write a sequence of sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()), so it thinks no update is necessary.
We can get the kernel to write a sw-dirty, hw-clean PTE with the following steps (showing the relevant VMA flags and pgprot bits):
i. Create a valid, writable contiguous PTE. VMA vmflags: VM_SHARED | VM_READ | VM_WRITE VMA pgprot bits: PTE_RDONLY | PTE_WRITE PTE pgprot bits: PTE_DIRTY | PTE_WRITE
ii. mprotect the VMA to PROT_NONE. VMA vmflags: VM_SHARED VMA pgprot bits: PTE_RDONLY PTE pgprot bits: PTE_DIRTY | PTE_RDONLY
iii. mprotect the VMA back to PROT_READ | PROT_WRITE. VMA vmflags: VM_SHARED | VM_READ | VM_WRITE VMA pgprot bits: PTE_RDONLY | PTE_WRITE PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY
Make it impossible to create a writeable sw-dirty, hw-clean PTE with pte_modify(). Such a PTE should be impossible to create, and there may be places that assume that pte_dirty() implies pte_hw_dirty().
Signed-off-by: James Houghton jthoughton@google.com Fixes: 031e6e6b4e12 ("arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags") Cc: stable@vger.kernel.org Acked-by: Will Deacon will@kernel.org Reviewed-by: Ryan Roberts ryan.roberts@arm.com Link: https://lore.kernel.org/r/20231204172646.2541916-3-jthoughton@google.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- arch/arm64/include/asm/pgtable.h | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 7f7d9b1df4e5..07bdf5dd8ebe 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -826,6 +826,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) pte = set_pte_bit(pte, __pgprot(PTE_DIRTY));
pte_val(pte) = (pte_val(pte) & ~mask) | (pgprot_val(newprot) & mask); + /* + * If we end up clearing hw dirtiness for a sw-dirty PTE, set hardware + * dirtiness again. + */ + if (pte_sw_dirty(pte)) + pte = pte_mkdirty(pte); return pte; }
From: Florent Revest revest@chromium.org
stable inclusion from stable-v6.6.8 commit 28b36426b83e35dbfbf3d2104c9aac61c1d8f5df bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit c12296bbecc488623b7d1932080e394d08f3226b upstream.
In __team_options_register, team_options are allocated and appended to the team's option_list. If one option instance allocation fails, the "inst_rollback" cleanup path frees the previously allocated options but doesn't remove them from the team's option_list. This leaves dangling pointers that can be dereferenced later by other parts of the team driver that iterate over options.
This patch fixes the cleanup path to remove the dangling pointers from the list.
As far as I can tell, this uaf doesn't have much security implications since it would be fairly hard to exploit (an attacker would need to make the allocation of that specific small object fail) but it's still nice to fix.
Cc: stable@vger.kernel.org Fixes: 80f7c6683fe0 ("team: add support for per-port options") Signed-off-by: Florent Revest revest@chromium.org Reviewed-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Hangbin Liu liuhangbin@gmail.com Link: https://lore.kernel.org/r/20231206123719.1963153-1-revest@chromium.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/net/team/team.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c index 508d9a392ab1..f575f225d417 100644 --- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -281,8 +281,10 @@ static int __team_options_register(struct team *team, return 0;
inst_rollback: - for (i--; i >= 0; i--) + for (i--; i >= 0; i--) { __team_option_inst_del_option(team, dst_opts[i]); + list_del(&dst_opts[i]->list); + }
i = option_count; alloc_rollback:
From: Alex Deucher alexander.deucher@amd.com
stable inclusion from stable-v6.6.8 commit 3aae4ef4d799fb3d0381157640fdb251008cf0ae bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit ab4750332dbe535243def5dcebc24ca00c1f98ac upstream.
Add begin/end_use ring callbacks to disallow GFXOFF when SDMA work is submitted and allow it again afterward.
This should avoid corner cases where GFXOFF is erroneously entered when SDMA is still active. For now just allow/disallow GFXOFF in the begin and end helpers until we root cause the issue. This should not impact power as SDMA usage is pretty minimal and GFXOSS should not be active when SDMA is active anyway, this just makes it explicit.
v2: move everything into sdma5.2 code. No reason for this to be generic at this point. v3: Add comments in new code
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2220 Reviewed-by: Mario Limonciello mario.limonciello@amd.com (v1) Tested-by: Mario Limonciello mario.limonciello@amd.com (v1) Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org # 5.15+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 28 ++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c index 2b3ebebc4299..0f930fd8a383 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c @@ -1651,6 +1651,32 @@ static void sdma_v5_2_get_clockgating_state(void *handle, u64 *flags) *flags |= AMD_CG_SUPPORT_SDMA_LS; }
+static void sdma_v5_2_ring_begin_use(struct amdgpu_ring *ring) +{ + struct amdgpu_device *adev = ring->adev; + + /* SDMA 5.2.3 (RMB) FW doesn't seem to properly + * disallow GFXOFF in some cases leading to + * hangs in SDMA. Disallow GFXOFF while SDMA is active. + * We can probably just limit this to 5.2.3, + * but it shouldn't hurt for other parts since + * this GFXOFF will be disallowed anyway when SDMA is + * active, this just makes it explicit. + */ + amdgpu_gfx_off_ctrl(adev, false); +} + +static void sdma_v5_2_ring_end_use(struct amdgpu_ring *ring) +{ + struct amdgpu_device *adev = ring->adev; + + /* SDMA 5.2.3 (RMB) FW doesn't seem to properly + * disallow GFXOFF in some cases leading to + * hangs in SDMA. Allow GFXOFF when SDMA is complete. + */ + amdgpu_gfx_off_ctrl(adev, true); +} + const struct amd_ip_funcs sdma_v5_2_ip_funcs = { .name = "sdma_v5_2", .early_init = sdma_v5_2_early_init, @@ -1698,6 +1724,8 @@ static const struct amdgpu_ring_funcs sdma_v5_2_ring_funcs = { .test_ib = sdma_v5_2_ring_test_ib, .insert_nop = sdma_v5_2_ring_insert_nop, .pad_ib = sdma_v5_2_ring_pad_ib, + .begin_use = sdma_v5_2_ring_begin_use, + .end_use = sdma_v5_2_ring_end_use, .emit_wreg = sdma_v5_2_ring_emit_wreg, .emit_reg_wait = sdma_v5_2_ring_emit_reg_wait, .emit_reg_write_reg_wait = sdma_v5_2_ring_emit_reg_write_reg_wait,
From: Stuart Lee stuart.lee@mediatek.com
stable inclusion from stable-v6.6.8 commit 03e63e497a404f444f5af25e120739da3a4f03ff bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit b6961d187fcd138981b8707dac87b9fcdbfe75d1 upstream.
Add error handling to check NULL input in mtk_drm_crtc_dma_dev_get function.
While display path is not configured correctly, none of crtc is established. So the caller of mtk_drm_crtc_dma_dev_get may pass input parameter *crtc as NULL, Which may cause coredump when we try to get the container of NULL pointer.
Fixes: cb1d6bcca542 ("drm/mediatek: Add dma dev get function") Signed-off-by: Stuart Lee stuart.lee@mediatek.com Cc: stable@vger.kernel.org Reviewed-by: AngeloGioacchino DEl Regno angelogioacchino.delregno@collabora.com Tested-by: Macpaul Lin macpaul.lin@mediatek.com Link: https://patchwork.kernel.org/project/dri-devel/patch/20231110012914.14884-2-... Signed-off-by: Chun-Kuang Hu chunkuang.hu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c index a033da279a94..80e34b39a6b1 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c @@ -885,7 +885,14 @@ static int mtk_drm_crtc_init_comp_planes(struct drm_device *drm_dev,
struct device *mtk_drm_crtc_dma_dev_get(struct drm_crtc *crtc) { - struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc); + struct mtk_drm_crtc *mtk_crtc = NULL; + + if (!crtc) + return NULL; + + mtk_crtc = to_mtk_crtc(crtc); + if (!mtk_crtc) + return NULL;
return mtk_crtc->dma_dev; }
From: Amelie Delaunay amelie.delaunay@foss.st.com
stable inclusion from stable-v6.6.8 commit 9127515bf9cd196d71ffe33bd7791bf272bbff0c bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 54bed6bafa0f38daf9697af50e3aff5ff1354fe1 upstream.
stm32_dma_get_burst() returns a negative error for invalid input, which gets turned into a large u32 value in stm32_dma_prep_dma_memcpy() that in turn triggers an assertion because it does not fit into a two-bit field: drivers/dma/stm32-dma.c: In function 'stm32_dma_prep_dma_memcpy': include/linux/compiler_types.h:354:38: error: call to '__compiletime_assert_282' declared with attribute error: FIELD_PREP: value too large for the field _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^ include/linux/compiler_types.h:335:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^~~~~~ include/linux/compiler_types.h:354:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^~~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert' #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) ^~~~~~~~~~~~~~~~~~ include/linux/bitfield.h:68:3: note: in expansion of macro 'BUILD_BUG_ON_MSG' BUILD_BUG_ON_MSG(__builtin_constant_p(_val) ? \ ^~~~~~~~~~~~~~~~ include/linux/bitfield.h:114:3: note: in expansion of macro '__BF_FIELD_CHECK' __BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_PREP: "); \ ^~~~~~~~~~~~~~~~ drivers/dma/stm32-dma.c:1237:4: note: in expansion of macro 'FIELD_PREP' FIELD_PREP(STM32_DMA_SCR_PBURST_MASK, dma_burst) | ^~~~~~~~~~
As an easy workaround, assume the error can happen, so try to handle this by failing stm32_dma_prep_dma_memcpy() before the assertion. It replicates what is done in stm32_dma_set_xfer_param() where stm32_dma_get_burst() is also used.
Fixes: 1c32d6c37cc2 ("dmaengine: stm32-dma: use bitfield helpers") Fixes: a2b6103b7a8a ("dmaengine: stm32-dma: Improve memory burst management") Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Amelie Delaunay amelie.delaunay@foss.st.com Cc: stable@vger.kernel.org Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202311060135.Q9eMnpCL-lkp@intel.com/ Link: https://lore.kernel.org/r/20231106134832.1470305-1-amelie.delaunay@foss.st.c... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/dma/stm32-dma.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c index 0b30151fb45c..9840594a6aaa 100644 --- a/drivers/dma/stm32-dma.c +++ b/drivers/dma/stm32-dma.c @@ -1249,8 +1249,8 @@ static struct dma_async_tx_descriptor *stm32_dma_prep_dma_memcpy( enum dma_slave_buswidth max_width; struct stm32_dma_desc *desc; size_t xfer_count, offset; - u32 num_sgs, best_burst, dma_burst, threshold; - int i; + u32 num_sgs, best_burst, threshold; + int dma_burst, i;
num_sgs = DIV_ROUND_UP(len, STM32_DMA_ALIGNED_MAX_DATA_ITEMS); desc = kzalloc(struct_size(desc, sg_req, num_sgs), GFP_NOWAIT); @@ -1268,6 +1268,10 @@ static struct dma_async_tx_descriptor *stm32_dma_prep_dma_memcpy( best_burst = stm32_dma_get_best_burst(len, STM32_DMA_MAX_BURST, threshold, max_width); dma_burst = stm32_dma_get_burst(chan, best_burst); + if (dma_burst < 0) { + kfree(desc); + return NULL; + }
stm32_dma_clear_reg(&desc->sg_req[i].chan_reg); desc->sg_req[i].chan_reg.dma_scr =
From: Frank Li Frank.Li@nxp.com
stable inclusion from stable-v6.6.8 commit ed50e07d6a8eae1c079b637926811d7500e8dd9e bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 4ee632c82d2dbb9e2dcc816890ef182a151cbd99 upstream.
Allocate channel count consistently increases due to a missing source ID (srcid) cleanup in the fsl_edma_free_chan_resources() function at imx93 eDMAv4.
Reset 'srcid' at fsl_edma_free_chan_resources().
Cc: stable@vger.kernel.org Fixes: 72f5801a4e2b ("dmaengine: fsl-edma: integrate v3 support") Signed-off-by: Frank Li Frank.Li@nxp.com Link: https://lore.kernel.org/r/20231127214325.2477247-1-Frank.Li@nxp.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/dma/fsl-edma-common.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c index 6a3abe5b1790..b53f46245c37 100644 --- a/drivers/dma/fsl-edma-common.c +++ b/drivers/dma/fsl-edma-common.c @@ -828,6 +828,7 @@ void fsl_edma_free_chan_resources(struct dma_chan *chan) dma_pool_destroy(fsl_chan->tcd_pool); fsl_chan->tcd_pool = NULL; fsl_chan->is_sw = false; + fsl_chan->srcid = 0; }
void fsl_edma_cleanup_vchan(struct dma_device *dmadev)
From: Yu Zhao yuzhao@google.com
stable inclusion from stable-v6.6.8 commit b2ce691b452f2731bd7d30c5b05333bf2ae97a0d bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 081488051d28d32569ebb7c7a23572778b2e7d57 upstream.
Unmapped folios accessed through file descriptors can be underprotected. Those folios are added to the oldest generation based on:
1. The fact that they are less costly to reclaim (no need to walk the rmap and flush the TLB) and have less impact on performance (don't cause major PFs and can be non-blocking if needed again). 2. The observation that they are likely to be single-use. E.g., for client use cases like Android, its apps parse configuration files and store the data in heap (anon); for server use cases like MySQL, it reads from InnoDB files and holds the cached data for tables in buffer pools (anon).
However, the oldest generation can be very short lived, and if so, it doesn't provide the PID controller with enough time to respond to a surge of refaults. (Note that the PID controller uses weighted refaults and those from evicted generations only take a half of the whole weight.) In other words, for a short lived generation, the moving average smooths out the spike quickly.
To fix the problem: 1. For folios that are already on LRU, if they can be beyond the tracking range of tiers, i.e., five accesses through file descriptors, move them to the second oldest generation to give them more time to age. (Note that tiers are used by the PID controller to statistically determine whether folios accessed multiple times through file descriptors are worth protecting.) 2. When adding unmapped folios to LRU, adjust the placement of them so that they are not too close to the tail. The effect of this is similar to the above.
On Android, launching 55 apps sequentially: Before After Change workingset_refault_anon 25641024 25598972 0% workingset_refault_file 115016834 106178438 -8%
Link: https://lkml.kernel.org/r/20231208061407.2125867-1-yuzhao@google.com Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation") Signed-off-by: Yu Zhao yuzhao@google.com Reported-by: Charan Teja Kalla quic_charante@quicinc.com Tested-by: Kalesh Singh kaleshsingh@google.com Cc: T.J. Mercier tjmercier@google.com Cc: Kairui Song ryncsn@gmail.com Cc: Hillf Danton hdanton@sina.com Cc: Jaroslav Pulchart jaroslav.pulchart@gooddata.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- include/linux/mm_inline.h | 23 ++++++++++++++--------- mm/vmscan.c | 2 +- mm/workingset.c | 6 +++--- 3 files changed, 18 insertions(+), 13 deletions(-)
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 57acfa854841..386e1eaac1fa 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -234,22 +234,27 @@ static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, if (folio_test_unevictable(folio) || !lrugen->enabled) return false; /* - * There are three common cases for this page: - * 1. If it's hot, e.g., freshly faulted in or previously hot and - * migrated, add it to the youngest generation. - * 2. If it's cold but can't be evicted immediately, i.e., an anon page - * not in swapcache or a dirty page pending writeback, add it to the - * second oldest generation. - * 3. Everything else (clean, cold) is added to the oldest generation. + * There are four common cases for this page: + * 1. If it's hot, i.e., freshly faulted in, add it to the youngest + * generation, and it's protected over the rest below. + * 2. If it can't be evicted immediately, i.e., a dirty page pending + * writeback, add it to the second youngest generation. + * 3. If it should be evicted first, e.g., cold and clean from + * folio_rotate_reclaimable(), add it to the oldest generation. + * 4. Everything else falls between 2 & 3 above and is added to the + * second oldest generation if it's considered inactive, or the + * oldest generation otherwise. See lru_gen_is_active(). */ if (folio_test_active(folio)) seq = lrugen->max_seq; else if ((type == LRU_GEN_ANON && !folio_test_swapcache(folio)) || (folio_test_reclaim(folio) && (folio_test_dirty(folio) || folio_test_writeback(folio)))) - seq = lrugen->min_seq[type] + 1; - else + seq = lrugen->max_seq - 1; + else if (reclaiming || lrugen->min_seq[type] + MIN_NR_GENS >= lrugen->max_seq) seq = lrugen->min_seq[type]; + else + seq = lrugen->min_seq[type] + 1;
gen = lru_gen_from_seq(seq); flags = (gen + 1UL) << LRU_GEN_PGOFF; diff --git a/mm/vmscan.c b/mm/vmscan.c index fc3d70abc78e..b2dd9d47d0d9 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4953,7 +4953,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c }
/* protected */ - if (tier > tier_idx) { + if (tier > tier_idx || refs == BIT(LRU_REFS_WIDTH)) { int hist = lru_hist_from_seq(lrugen->min_seq[type]);
gen = folio_inc_gen(lruvec, folio, false); diff --git a/mm/workingset.c b/mm/workingset.c index da58a26d0d4d..2559a1f2fc1c 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -313,10 +313,10 @@ static void lru_gen_refault(struct folio *folio, void *shadow) * 1. For pages accessed through page tables, hotter pages pushed out * hot pages which refaulted immediately. * 2. For pages accessed multiple times through file descriptors, - * numbers of accesses might have been out of the range. + * they would have been protected by sort_folio(). */ - if (lru_gen_in_fault() || refs == BIT(LRU_REFS_WIDTH)) { - folio_set_workingset(folio); + if (lru_gen_in_fault() || refs >= BIT(LRU_REFS_WIDTH) - 1) { + set_mask_bits(&folio->flags, 0, LRU_REFS_MASK | BIT(PG_workingset)); mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + type, delta); } unlock:
From: Yu Zhao yuzhao@google.com
stable inclusion from stable-v6.6.8 commit c5f67b7e847490d41219f1dfe1b1610e88a997e4 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 5095a2b23987d3c3c47dd16b3d4080e2733b8bb9 upstream.
The initial MGLRU patchset didn't include the memcg LRU support, and it relied on should_abort_scan(), added by commit f76c83378851 ("mm: multi-gen LRU: optimize multiple memcgs"), to "backoff to avoid overshooting their aggregate reclaim target by too much".
Later on when the memcg LRU was added, should_abort_scan() was deemed unnecessary, and the test results [1] showed no side effects after it was removed by commit a579086c99ed ("mm: multi-gen LRU: remove eviction fairness safeguard").
However, that test used memory.reclaim, which sets nr_to_reclaim to SWAP_CLUSTER_MAX. So it can overshoot only by SWAP_CLUSTER_MAX-1 pages, i.e., from nr_reclaimed=nr_to_reclaim-1 to nr_reclaimed=nr_to_reclaim+SWAP_CLUSTER_MAX-1. Compared with the batch size kswapd sets to nr_to_reclaim, SWAP_CLUSTER_MAX is tiny. Therefore that test isn't able to reproduce the worst case scenario, i.e., kswapd overshooting GBs on large systems and "consuming 100% CPU" (see the Closes tag).
Bring back a simplified version of should_abort_scan() on top of the memcg LRU, so that kswapd stops when all eligible zones are above their respective high watermarks plus a small delta to lower the chance of KSWAPD_HIGH_WMARK_HIT_QUICKLY. Note that this only applies to order-0 reclaim, meaning compaction-induced reclaim can still run wild (which is a different problem).
On Android, launching 55 apps sequentially: Before After Change pgpgin 838377172 802955040 -4% pgpgout 38037080 34336300 -10%
[1] https://lore.kernel.org/20221222041905.2431096-1-yuzhao@google.com/
Link: https://lkml.kernel.org/r/20231208061407.2125867-2-yuzhao@google.com Fixes: a579086c99ed ("mm: multi-gen LRU: remove eviction fairness safeguard") Signed-off-by: Yu Zhao yuzhao@google.com Reported-by: Charan Teja Kalla quic_charante@quicinc.com Reported-by: Jaroslav Pulchart jaroslav.pulchart@gooddata.com Closes: https://lore.kernel.org/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@... Tested-by: Jaroslav Pulchart jaroslav.pulchart@gooddata.com Tested-by: Kalesh Singh kaleshsingh@google.com Cc: Hillf Danton hdanton@sina.com Cc: Kairui Song ryncsn@gmail.com Cc: T.J. Mercier tjmercier@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- mm/vmscan.c | 36 ++++++++++++++++++++++++++++-------- 1 file changed, 28 insertions(+), 8 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c index b2dd9d47d0d9..4266401df75e 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -5361,20 +5361,41 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool return try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false) ? -1 : 0; }
-static unsigned long get_nr_to_reclaim(struct scan_control *sc) +static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc) { + int i; + enum zone_watermarks mark; + /* don't abort memcg reclaim to ensure fairness */ if (!root_reclaim(sc)) - return -1; + return false; + + if (sc->nr_reclaimed >= max(sc->nr_to_reclaim, compact_gap(sc->order))) + return true; + + /* check the order to exclude compaction-induced reclaim */ + if (!current_is_kswapd() || sc->order) + return false;
- return max(sc->nr_to_reclaim, compact_gap(sc->order)); + mark = sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING ? + WMARK_PROMO : WMARK_HIGH; + + for (i = 0; i <= sc->reclaim_idx; i++) { + struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i; + unsigned long size = wmark_pages(zone, mark) + MIN_LRU_BATCH; + + if (managed_zone(zone) && !zone_watermark_ok(zone, 0, size, sc->reclaim_idx, 0)) + return false; + } + + /* kswapd should abort if all eligible zones are safe */ + return true; }
static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) { long nr_to_scan; unsigned long scanned = 0; - unsigned long nr_to_reclaim = get_nr_to_reclaim(sc); int swappiness = get_swappiness(lruvec, sc);
/* clean file folios are more likely to exist */ @@ -5396,7 +5417,7 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) if (scanned >= nr_to_scan) break;
- if (sc->nr_reclaimed >= nr_to_reclaim) + if (should_abort_scan(lruvec, sc)) break;
cond_resched(); @@ -5457,7 +5478,6 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc) struct lru_gen_folio *lrugen; struct mem_cgroup *memcg; const struct hlist_nulls_node *pos; - unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
bin = first_bin = get_random_u32_below(MEMCG_NR_BINS); restart: @@ -5490,7 +5510,7 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
rcu_read_lock();
- if (sc->nr_reclaimed >= nr_to_reclaim) + if (should_abort_scan(lruvec, sc)) break; }
@@ -5501,7 +5521,7 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
mem_cgroup_put(memcg);
- if (sc->nr_reclaimed >= nr_to_reclaim) + if (!is_a_nulls(pos)) return;
/* restart if raced with lru_gen_rotate_memcg() */
From: Yu Zhao yuzhao@google.com
stable inclusion from stable-v6.6.8 commit 6b131c2a2875682bc5ea0d4b15ba76775aa3ea80 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 8aa420617918d12d1f5d55030a503c9418e73c2c upstream.
While investigating kswapd "consuming 100% CPU" [1] (also see "mm/mglru: try to stop at high watermarks"), it was discovered that the memcg LRU can breach the thrashing protection imposed by min_ttl_ms.
Before the memcg LRU: kswapd() shrink_node_memcgs() mem_cgroup_iter() inc_max_seq() // always hit a different memcg lru_gen_age_node() mem_cgroup_iter() check the timestamp of the oldest generation
After the memcg LRU: kswapd() shrink_many() restart: iterate the memcg LRU: inc_max_seq() // occasionally hit the same memcg if raced with lru_gen_rotate_memcg(): goto restart lru_gen_age_node() mem_cgroup_iter() check the timestamp of the oldest generation
Specifically, when the restart happens in shrink_many(), it needs to stick with the (memcg LRU) generation it began with. In other words, it should neither re-read memcg_lru->seq nor age an lruvec of a different generation. Otherwise it can hit the same memcg multiple times without giving lru_gen_age_node() a chance to check the timestamp of that memcg's oldest generation (against min_ttl_ms).
[1] https://lore.kernel.org/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@...
Link: https://lkml.kernel.org/r/20231208061407.2125867-3-yuzhao@google.com Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists") Signed-off-by: Yu Zhao yuzhao@google.com Tested-by: T.J. Mercier tjmercier@google.com Cc: Charan Teja Kalla quic_charante@quicinc.com Cc: Hillf Danton hdanton@sina.com Cc: Jaroslav Pulchart jaroslav.pulchart@gooddata.com Cc: Kairui Song ryncsn@gmail.com Cc: Kalesh Singh kaleshsingh@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- include/linux/mmzone.h | 30 +++++++++++++++++------------- mm/vmscan.c | 30 ++++++++++++++++-------------- 2 files changed, 33 insertions(+), 27 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 655157b32fa6..e9dbaae315bf 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -514,33 +514,37 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); * the old generation, is incremented when all its bins become empty. * * There are four operations: - * 1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin in its + * 1. MEMCG_LRU_HEAD, which moves a memcg to the head of a random bin in its * current generation (old or young) and updates its "seg" to "head"; - * 2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin in its + * 2. MEMCG_LRU_TAIL, which moves a memcg to the tail of a random bin in its * current generation (old or young) and updates its "seg" to "tail"; - * 3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in the old + * 3. MEMCG_LRU_OLD, which moves a memcg to the head of a random bin in the old * generation, updates its "gen" to "old" and resets its "seg" to "default"; - * 4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin in the + * 4. MEMCG_LRU_YOUNG, which moves a memcg to the tail of a random bin in the * young generation, updates its "gen" to "young" and resets its "seg" to * "default". * * The events that trigger the above operations are: * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD; - * 2. The first attempt to reclaim an memcg below low, which triggers + * 2. The first attempt to reclaim a memcg below low, which triggers * MEMCG_LRU_TAIL; - * 3. The first attempt to reclaim an memcg below reclaimable size threshold, + * 3. The first attempt to reclaim a memcg below reclaimable size threshold, * which triggers MEMCG_LRU_TAIL; - * 4. The second attempt to reclaim an memcg below reclaimable size threshold, + * 4. The second attempt to reclaim a memcg below reclaimable size threshold, * which triggers MEMCG_LRU_YOUNG; - * 5. Attempting to reclaim an memcg below min, which triggers MEMCG_LRU_YOUNG; + * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG; * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG; - * 7. Offlining an memcg, which triggers MEMCG_LRU_OLD. + * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD. * - * Note that memcg LRU only applies to global reclaim, and the round-robin - * incrementing of their max_seq counters ensures the eventual fairness to all - * eligible memcgs. For memcg reclaim, it still relies on mem_cgroup_iter(). + * Notes: + * 1. Memcg LRU only applies to global reclaim, and the round-robin incrementing + * of their max_seq counters ensures the eventual fairness to all eligible + * memcgs. For memcg reclaim, it still relies on mem_cgroup_iter(). + * 2. There are only two valid generations: old (seq) and young (seq+1). + * MEMCG_NR_GENS is set to three so that when reading the generation counter + * locklessly, a stale value (seq-1) does not wraparound to young. */ -#define MEMCG_NR_GENS 2 +#define MEMCG_NR_GENS 3 #define MEMCG_NR_BINS 8
struct lru_gen_memcg { diff --git a/mm/vmscan.c b/mm/vmscan.c index 4266401df75e..38dee3f8f631 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4810,6 +4810,9 @@ static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op) else VM_WARN_ON_ONCE(true);
+ WRITE_ONCE(lruvec->lrugen.seg, seg); + WRITE_ONCE(lruvec->lrugen.gen, new); + hlist_nulls_del_rcu(&lruvec->lrugen.list);
if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD) @@ -4820,9 +4823,6 @@ static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op) pgdat->memcg_lru.nr_memcgs[old]--; pgdat->memcg_lru.nr_memcgs[new]++;
- lruvec->lrugen.gen = new; - WRITE_ONCE(lruvec->lrugen.seg, seg); - if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq)) WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
@@ -4845,11 +4845,11 @@ void lru_gen_online_memcg(struct mem_cgroup *memcg)
gen = get_memcg_gen(pgdat->memcg_lru.seq);
+ lruvec->lrugen.gen = gen; + hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]); pgdat->memcg_lru.nr_memcgs[gen]++;
- lruvec->lrugen.gen = gen; - spin_unlock_irq(&pgdat->memcg_lru.lock); } } @@ -5348,7 +5348,7 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool DEFINE_MAX_SEQ(lruvec);
if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg)) - return 0; + return -1;
if (!should_run_aging(lruvec, max_seq, sc, can_swap, &nr_to_scan)) return nr_to_scan; @@ -5423,7 +5423,7 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) cond_resched(); }
- /* whether try_to_inc_max_seq() was successful */ + /* whether this lruvec should be rotated */ return nr_to_scan < 0; }
@@ -5477,13 +5477,13 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc) struct lruvec *lruvec; struct lru_gen_folio *lrugen; struct mem_cgroup *memcg; - const struct hlist_nulls_node *pos; + struct hlist_nulls_node *pos;
+ gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq)); bin = first_bin = get_random_u32_below(MEMCG_NR_BINS); restart: op = 0; memcg = NULL; - gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
rcu_read_lock();
@@ -5494,6 +5494,10 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc) }
mem_cgroup_put(memcg); + memcg = NULL; + + if (gen != READ_ONCE(lrugen->gen)) + continue;
lruvec = container_of(lrugen, struct lruvec, lrugen); memcg = lruvec_memcg(lruvec); @@ -5578,16 +5582,14 @@ static void set_initial_priority(struct pglist_data *pgdat, struct scan_control if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH) return; /* - * Determine the initial priority based on ((total / MEMCG_NR_GENS) >> - * priority) * reclaimed_to_scanned_ratio = nr_to_reclaim, where the - * estimated reclaimed_to_scanned_ratio = inactive / total. + * Determine the initial priority based on + * (total >> priority) * reclaimed_to_scanned_ratio = nr_to_reclaim, + * where reclaimed_to_scanned_ratio = inactive / total. */ reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE); if (get_swappiness(lruvec, sc)) reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON);
- reclaimable /= MEMCG_NR_GENS; - /* round down reclaimable and round up sc->nr_to_reclaim */ priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
From: Yu Zhao yuzhao@google.com
stable inclusion from stable-v6.6.8 commit a107d6a132cbb596afa648f588f0078713ad46e9 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 4376807bf2d5371c3e00080c972be568c3f8a7d1 upstream.
In the effort to reduce zombie memcgs [1], it was discovered that the memcg LRU doesn't apply enough pressure on offlined memcgs. Specifically, instead of rotating them to the tail of the current generation (MEMCG_LRU_TAIL) for a second attempt, it moves them to the next generation (MEMCG_LRU_YOUNG) after the first attempt.
Not applying enough pressure on offlined memcgs can cause them to build up, and this can be particularly harmful to memory-constrained systems.
On Pixel 8 Pro, launching apps for 50 cycles: Before After Change Zombie memcgs 45 35 -22%
[1] https://lore.kernel.org/CABdmKX2M6koq4Q0Cmp_-=wbP0Qa190HdEGGaHfxNS05gAkUtPA@...
Link: https://lkml.kernel.org/r/20231208061407.2125867-4-yuzhao@google.com Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists") Signed-off-by: Yu Zhao yuzhao@google.com Reported-by: T.J. Mercier tjmercier@google.com Tested-by: T.J. Mercier tjmercier@google.com Cc: Charan Teja Kalla quic_charante@quicinc.com Cc: Hillf Danton hdanton@sina.com Cc: Jaroslav Pulchart jaroslav.pulchart@gooddata.com Cc: Kairui Song ryncsn@gmail.com Cc: Kalesh Singh kaleshsingh@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- include/linux/mmzone.h | 8 ++++---- mm/vmscan.c | 24 ++++++++++++++++-------- 2 files changed, 20 insertions(+), 12 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index e9dbaae315bf..e2ff25522c7a 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -528,10 +528,10 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD; * 2. The first attempt to reclaim a memcg below low, which triggers * MEMCG_LRU_TAIL; - * 3. The first attempt to reclaim a memcg below reclaimable size threshold, - * which triggers MEMCG_LRU_TAIL; - * 4. The second attempt to reclaim a memcg below reclaimable size threshold, - * which triggers MEMCG_LRU_YOUNG; + * 3. The first attempt to reclaim a memcg offlined or below reclaimable size + * threshold, which triggers MEMCG_LRU_TAIL; + * 4. The second attempt to reclaim a memcg offlined or below reclaimable size + * threshold, which triggers MEMCG_LRU_YOUNG; * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG; * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG; * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD. diff --git a/mm/vmscan.c b/mm/vmscan.c index 38dee3f8f631..93097be6f10d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -5311,7 +5311,12 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, }
/* try to scrape all its memory if this memcg was deleted */ - *nr_to_scan = mem_cgroup_online(memcg) ? (total >> sc->priority) : total; + if (!mem_cgroup_online(memcg)) { + *nr_to_scan = total; + return false; + } + + *nr_to_scan = total >> sc->priority;
/* * The aging tries to be lazy to reduce the overhead, while the eviction @@ -5432,14 +5437,9 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc) bool success; unsigned long scanned = sc->nr_scanned; unsigned long reclaimed = sc->nr_reclaimed; - int seg = lru_gen_memcg_seg(lruvec); struct mem_cgroup *memcg = lruvec_memcg(lruvec); struct pglist_data *pgdat = lruvec_pgdat(lruvec);
- /* see the comment on MEMCG_NR_GENS */ - if (!lruvec_is_sizable(lruvec, sc)) - return seg != MEMCG_LRU_TAIL ? MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG; - mem_cgroup_calculate_protection(NULL, memcg);
if (mem_cgroup_below_min(NULL, memcg)) @@ -5447,7 +5447,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
if (mem_cgroup_below_low(NULL, memcg)) { /* see the comment on MEMCG_NR_GENS */ - if (seg != MEMCG_LRU_TAIL) + if (lru_gen_memcg_seg(lruvec) != MEMCG_LRU_TAIL) return MEMCG_LRU_TAIL;
memcg_memory_event(memcg, MEMCG_LOW); @@ -5463,7 +5463,15 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
flush_reclaim_state(sc);
- return success ? MEMCG_LRU_YOUNG : 0; + if (success && mem_cgroup_online(memcg)) + return MEMCG_LRU_YOUNG; + + if (!success && lruvec_is_sizable(lruvec, sc)) + return 0; + + /* one retry if offlined or too small */ + return lru_gen_memcg_seg(lruvec) != MEMCG_LRU_TAIL ? + MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG; }
#ifdef CONFIG_MEMCG
From: David Stevens stevensd@chromium.org
stable inclusion from stable-v6.6.8 commit 7a4ae7acd20805940307be98b304871a43288a01 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 55ac8bbe358bdd2f3c044c12f249fd22d48fe015 upstream.
Split folios during the second loop of shmem_undo_range. It's not sufficient to only split folios when dealing with partial pages, since it's possible for a THP to be faulted in after that point. Calling truncate_inode_folio in that situation can result in throwing away data outside of the range being targeted.
[akpm@linux-foundation.org: tidy up comment layout] Link: https://lkml.kernel.org/r/20230418084031.3439795-1-stevensd@google.com Fixes: b9a8a4195c7d ("truncate,shmem: Handle truncates that split large folios") Signed-off-by: David Stevens stevensd@chromium.org Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Suleiman Souhlal suleiman@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- mm/shmem.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c index de9a884b10e8..bc0091750825 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1101,7 +1101,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend, } VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio); - truncate_inode_folio(mapping, folio); + + if (!folio_test_large(folio)) { + truncate_inode_folio(mapping, folio); + } else if (truncate_inode_partial_folio(folio, lstart, lend)) { + /* + * If we split a page, reset the loop so + * that we pick up the new sub pages. + * Otherwise the THP was entirely + * dropped or the target range was + * zeroed, so just continue the loop as + * is. + */ + if (!folio_test_large(folio)) { + folio_unlock(folio); + index = start; + break; + } + } } folio_unlock(folio); }
From: Ignat Korchagin ignat@cloudflare.com
stable inclusion from stable-v6.6.8 commit 37b561d55936d0fcebd199d22180e595bb72b784 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit c41bd2514184d75db087fe4c1221237fb7922875 upstream.
In commit f8ff23429c62 ("kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP") we tried to fix a config regression, where CONFIG_CRASH_DUMP required CONFIG_KEXEC.
However, it was not enough at least for arm64 platforms. While further testing the patch with our arm64 config I noticed that CONFIG_CRASH_DUMP is unavailable in menuconfig. This is because CONFIG_CRASH_DUMP still depends on the new CONFIG_ARCH_SUPPORTS_KEXEC introduced in commit 91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") and on arm64 CONFIG_ARCH_SUPPORTS_KEXEC requires CONFIG_PM_SLEEP_SMP=y, which in turn requires either CONFIG_SUSPEND=y or CONFIG_HIBERNATION=y neither of which are set in our config.
Given that we already established that CONFIG_KEXEC (which is a switch for kexec system call itself) is not required for CONFIG_CRASH_DUMP drop CONFIG_ARCH_SUPPORTS_KEXEC dependency as well. The arm64 kernel builds just fine with CONFIG_CRASH_DUMP=y and with both CONFIG_KEXEC=n and CONFIG_KEXEC_FILE=n after f8ff23429c62 ("kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP") and this patch are applied given that the necessary shared bits are included via CONFIG_KEXEC_CORE dependency.
[bhe@redhat.com: don't export some symbols when CONFIG_MMU=n] Link: https://lkml.kernel.org/r/ZW03ODUKGGhP1ZGU@MiWiFi-R3L-srv [bhe@redhat.com: riscv, kexec: fix dependency of two items] Link: https://lkml.kernel.org/r/ZW04G/SKnhbE5mnX@MiWiFi-R3L-srv Link: https://lkml.kernel.org/r/20231129220409.55006-1-ignat@cloudflare.com Fixes: 91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") Signed-off-by: Ignat Korchagin ignat@cloudflare.com Signed-off-by: Baoquan He bhe@redhat.com Acked-by: Baoquan He bhe@redhat.com Cc: Alexander Gordeev agordeev@linux.ibm.com Cc: stable@vger.kernel.org # 6.6+: f8ff234: kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP Cc: stable@vger.kernel.org # 6.6+ Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- arch/riscv/Kconfig | 4 ++-- arch/riscv/kernel/crash_core.c | 4 +++- kernel/Kconfig.kexec | 1 - 3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 18646dee0f53..32aadf652681 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -670,7 +670,7 @@ config RISCV_BOOT_SPINWAIT If unsure what to do here, say N.
config ARCH_SUPPORTS_KEXEC - def_bool MMU + def_bool y
config ARCH_SELECTS_KEXEC def_bool y @@ -678,7 +678,7 @@ config ARCH_SELECTS_KEXEC select HOTPLUG_CPU if SMP
config ARCH_SUPPORTS_KEXEC_FILE - def_bool 64BIT && MMU + def_bool 64BIT
config ARCH_SELECTS_KEXEC_FILE def_bool y diff --git a/arch/riscv/kernel/crash_core.c b/arch/riscv/kernel/crash_core.c index 55f1d7856b54..8706736fd4e2 100644 --- a/arch/riscv/kernel/crash_core.c +++ b/arch/riscv/kernel/crash_core.c @@ -5,17 +5,19 @@
void arch_crash_save_vmcoreinfo(void) { - VMCOREINFO_NUMBER(VA_BITS); VMCOREINFO_NUMBER(phys_ram_base);
vmcoreinfo_append_str("NUMBER(PAGE_OFFSET)=0x%lx\n", PAGE_OFFSET); vmcoreinfo_append_str("NUMBER(VMALLOC_START)=0x%lx\n", VMALLOC_START); vmcoreinfo_append_str("NUMBER(VMALLOC_END)=0x%lx\n", VMALLOC_END); +#ifdef CONFIG_MMU + VMCOREINFO_NUMBER(VA_BITS); vmcoreinfo_append_str("NUMBER(VMEMMAP_START)=0x%lx\n", VMEMMAP_START); vmcoreinfo_append_str("NUMBER(VMEMMAP_END)=0x%lx\n", VMEMMAP_END); #ifdef CONFIG_64BIT vmcoreinfo_append_str("NUMBER(MODULES_VADDR)=0x%lx\n", MODULES_VADDR); vmcoreinfo_append_str("NUMBER(MODULES_END)=0x%lx\n", MODULES_END); +#endif #endif vmcoreinfo_append_str("NUMBER(KERNEL_LINK_ADDR)=0x%lx\n", KERNEL_LINK_ADDR); vmcoreinfo_append_str("NUMBER(va_kernel_pa_offset)=0x%lx\n", diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec index 143f4d8eab7c..f9619ac6b71d 100644 --- a/kernel/Kconfig.kexec +++ b/kernel/Kconfig.kexec @@ -94,7 +94,6 @@ config KEXEC_JUMP config CRASH_DUMP bool "kernel crash dumps" depends on ARCH_SUPPORTS_CRASH_DUMP - depends on ARCH_SUPPORTS_KEXEC select CRASH_CORE select KEXEC_CORE help
From: Boris Burkov boris@bur.io
stable inclusion from stable-v6.6.8 commit 14570dfa170ede01eacedc60f717df121d8b9bbd bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit f63e1164b90b385cd832ff0fdfcfa76c3cc15436 upstream.
An ordered extent completing is a critical moment in qgroup reserve handling, as the ownership of the reservation is handed off from the ordered extent to the delayed ref. In the happy path we release (unlock) but do not free (decrement counter) the reservation, and the delayed ref drives the free. However, on an error, we don't create a delayed ref, since there is no ref to add. Therefore, free on the error path.
CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/btrfs/ordered-data.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 345c449d588c..0fa0bc13e8ac 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -603,7 +603,9 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode, release = entry->disk_num_bytes; else release = entry->num_bytes; - btrfs_delalloc_release_metadata(btrfs_inode, release, false); + btrfs_delalloc_release_metadata(btrfs_inode, release, + test_bit(BTRFS_ORDERED_IOERR, + &entry->flags)); }
percpu_counter_add_batch(&fs_info->ordered_bytes, -entry->num_bytes,
From: Boris Burkov boris@bur.io
stable inclusion from stable-v6.6.8 commit 1823491513e3f72beb85f7fba2478ef3562615d8 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 9e65bfca24cf1d77e4a5c7a170db5867377b3fe7 upstream.
The reserved data counter and input parameter is a u64, but we inadvertently accumulate it in an int. Overflowing that int results in freeing the wrong amount of data and breaking reserve accounting.
Unfortunately, this overflow rot spreads from there, as the qgroup release/free functions rely on returning an int to take advantage of negative values for error codes.
Therefore, the full fix is to return the "released" or "freed" amount by a u64 argument and to return 0 or negative error code via the return value.
Most of the call sites simply ignore the return value, though some of them handle the error and count the returned bytes. Change all of them accordingly.
CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Boris Burkov boris@bur.io Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/btrfs/delalloc-space.c | 2 +- fs/btrfs/file.c | 2 +- fs/btrfs/inode.c | 16 ++++++++-------- fs/btrfs/ordered-data.c | 7 ++++--- fs/btrfs/qgroup.c | 25 +++++++++++++++---------- fs/btrfs/qgroup.h | 4 ++-- 6 files changed, 31 insertions(+), 25 deletions(-)
diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c index 0d105ed1b8de..eef341bbcc60 100644 --- a/fs/btrfs/delalloc-space.c +++ b/fs/btrfs/delalloc-space.c @@ -199,7 +199,7 @@ void btrfs_free_reserved_data_space(struct btrfs_inode *inode, start = round_down(start, fs_info->sectorsize);
btrfs_free_reserved_data_space_noquota(fs_info, len); - btrfs_qgroup_free_data(inode, reserved, start, len); + btrfs_qgroup_free_data(inode, reserved, start, len, NULL); }
/* diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 23a145ca9457..c997b790568f 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -3187,7 +3187,7 @@ static long btrfs_fallocate(struct file *file, int mode, qgroup_reserved -= range->len; } else if (qgroup_reserved > 0) { btrfs_qgroup_free_data(BTRFS_I(inode), data_reserved, - range->start, range->len); + range->start, range->len, NULL); qgroup_reserved -= range->len; } list_del(&range->list); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index c92c589b454d..7bf5c2c1a54d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -687,7 +687,7 @@ static noinline int cow_file_range_inline(struct btrfs_inode *inode, u64 size, * And at reserve time, it's always aligned to page size, so * just free one page here. */ - btrfs_qgroup_free_data(inode, NULL, 0, PAGE_SIZE); + btrfs_qgroup_free_data(inode, NULL, 0, PAGE_SIZE, NULL); btrfs_free_path(path); btrfs_end_transaction(trans); return ret; @@ -5129,7 +5129,7 @@ static void evict_inode_truncate_pages(struct inode *inode) */ if (state_flags & EXTENT_DELALLOC) btrfs_qgroup_free_data(BTRFS_I(inode), NULL, start, - end - start + 1); + end - start + 1, NULL);
clear_extent_bit(io_tree, start, end, EXTENT_CLEAR_ALL_BITS | EXTENT_DO_ACCOUNTING, @@ -8051,7 +8051,7 @@ static void btrfs_invalidate_folio(struct folio *folio, size_t offset, * reserved data space. * Since the IO will never happen for this page. */ - btrfs_qgroup_free_data(inode, NULL, cur, range_end + 1 - cur); + btrfs_qgroup_free_data(inode, NULL, cur, range_end + 1 - cur, NULL); if (!inode_evicting) { clear_extent_bit(tree, cur, range_end, EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_UPTODATE | @@ -9481,7 +9481,7 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent( struct btrfs_path *path; u64 start = ins->objectid; u64 len = ins->offset; - int qgroup_released; + u64 qgroup_released = 0; int ret;
memset(&stack_fi, 0, sizeof(stack_fi)); @@ -9494,9 +9494,9 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent( btrfs_set_stack_file_extent_compression(&stack_fi, BTRFS_COMPRESS_NONE); /* Encryption and other encoding is reserved and all 0 */
- qgroup_released = btrfs_qgroup_release_data(inode, file_offset, len); - if (qgroup_released < 0) - return ERR_PTR(qgroup_released); + ret = btrfs_qgroup_release_data(inode, file_offset, len, &qgroup_released); + if (ret < 0) + return ERR_PTR(ret);
if (trans) { ret = insert_reserved_file_extent(trans, inode, @@ -10391,7 +10391,7 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from, btrfs_delalloc_release_metadata(inode, disk_num_bytes, ret < 0); out_qgroup_free_data: if (ret < 0) - btrfs_qgroup_free_data(inode, data_reserved, start, num_bytes); + btrfs_qgroup_free_data(inode, data_reserved, start, num_bytes, NULL); out_free_data_space: /* * If btrfs_reserve_extent() succeeded, then we already decremented diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 0fa0bc13e8ac..2b8ff8b53af0 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -153,11 +153,12 @@ static struct btrfs_ordered_extent *alloc_ordered_extent( { struct btrfs_ordered_extent *entry; int ret; + u64 qgroup_rsv = 0;
if (flags & ((1 << BTRFS_ORDERED_NOCOW) | (1 << BTRFS_ORDERED_PREALLOC))) { /* For nocow write, we can release the qgroup rsv right now */ - ret = btrfs_qgroup_free_data(inode, NULL, file_offset, num_bytes); + ret = btrfs_qgroup_free_data(inode, NULL, file_offset, num_bytes, &qgroup_rsv); if (ret < 0) return ERR_PTR(ret); } else { @@ -165,7 +166,7 @@ static struct btrfs_ordered_extent *alloc_ordered_extent( * The ordered extent has reserved qgroup space, release now * and pass the reserved number for qgroup_record to free. */ - ret = btrfs_qgroup_release_data(inode, file_offset, num_bytes); + ret = btrfs_qgroup_release_data(inode, file_offset, num_bytes, &qgroup_rsv); if (ret < 0) return ERR_PTR(ret); } @@ -183,7 +184,7 @@ static struct btrfs_ordered_extent *alloc_ordered_extent( entry->inode = igrab(&inode->vfs_inode); entry->compress_type = compress_type; entry->truncated_len = (u64)-1; - entry->qgroup_rsv = ret; + entry->qgroup_rsv = qgroup_rsv; entry->flags = flags; refcount_set(&entry->refs, 1); init_waitqueue_head(&entry->wait); diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index bdaebb9fc689..7c9249438154 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -3855,13 +3855,14 @@ int btrfs_qgroup_reserve_data(struct btrfs_inode *inode,
/* Free ranges specified by @reserved, normally in error path */ static int qgroup_free_reserved_data(struct btrfs_inode *inode, - struct extent_changeset *reserved, u64 start, u64 len) + struct extent_changeset *reserved, + u64 start, u64 len, u64 *freed_ret) { struct btrfs_root *root = inode->root; struct ulist_node *unode; struct ulist_iterator uiter; struct extent_changeset changeset; - int freed = 0; + u64 freed = 0; int ret;
extent_changeset_init(&changeset); @@ -3902,7 +3903,9 @@ static int qgroup_free_reserved_data(struct btrfs_inode *inode, } btrfs_qgroup_free_refroot(root->fs_info, root->root_key.objectid, freed, BTRFS_QGROUP_RSV_DATA); - ret = freed; + if (freed_ret) + *freed_ret = freed; + ret = 0; out: extent_changeset_release(&changeset); return ret; @@ -3910,7 +3913,7 @@ static int qgroup_free_reserved_data(struct btrfs_inode *inode,
static int __btrfs_qgroup_release_data(struct btrfs_inode *inode, struct extent_changeset *reserved, u64 start, u64 len, - int free) + u64 *released, int free) { struct extent_changeset changeset; int trace_op = QGROUP_RELEASE; @@ -3922,7 +3925,7 @@ static int __btrfs_qgroup_release_data(struct btrfs_inode *inode, /* In release case, we shouldn't have @reserved */ WARN_ON(!free && reserved); if (free && reserved) - return qgroup_free_reserved_data(inode, reserved, start, len); + return qgroup_free_reserved_data(inode, reserved, start, len, released); extent_changeset_init(&changeset); ret = clear_record_extent_bits(&inode->io_tree, start, start + len -1, EXTENT_QGROUP_RESERVED, &changeset); @@ -3937,7 +3940,8 @@ static int __btrfs_qgroup_release_data(struct btrfs_inode *inode, btrfs_qgroup_free_refroot(inode->root->fs_info, inode->root->root_key.objectid, changeset.bytes_changed, BTRFS_QGROUP_RSV_DATA); - ret = changeset.bytes_changed; + if (released) + *released = changeset.bytes_changed; out: extent_changeset_release(&changeset); return ret; @@ -3956,9 +3960,10 @@ static int __btrfs_qgroup_release_data(struct btrfs_inode *inode, * NOTE: This function may sleep for memory allocation. */ int btrfs_qgroup_free_data(struct btrfs_inode *inode, - struct extent_changeset *reserved, u64 start, u64 len) + struct extent_changeset *reserved, + u64 start, u64 len, u64 *freed) { - return __btrfs_qgroup_release_data(inode, reserved, start, len, 1); + return __btrfs_qgroup_release_data(inode, reserved, start, len, freed, 1); }
/* @@ -3976,9 +3981,9 @@ int btrfs_qgroup_free_data(struct btrfs_inode *inode, * * NOTE: This function may sleep for memory allocation. */ -int btrfs_qgroup_release_data(struct btrfs_inode *inode, u64 start, u64 len) +int btrfs_qgroup_release_data(struct btrfs_inode *inode, u64 start, u64 len, u64 *released) { - return __btrfs_qgroup_release_data(inode, NULL, start, len, 0); + return __btrfs_qgroup_release_data(inode, NULL, start, len, released, 0); }
static void add_root_meta_rsv(struct btrfs_root *root, int num_bytes, diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index 7bffa10589d6..104c9bd3c337 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -363,10 +363,10 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid, /* New io_tree based accurate qgroup reserve API */ int btrfs_qgroup_reserve_data(struct btrfs_inode *inode, struct extent_changeset **reserved, u64 start, u64 len); -int btrfs_qgroup_release_data(struct btrfs_inode *inode, u64 start, u64 len); +int btrfs_qgroup_release_data(struct btrfs_inode *inode, u64 start, u64 len, u64 *released); int btrfs_qgroup_free_data(struct btrfs_inode *inode, struct extent_changeset *reserved, u64 start, - u64 len); + u64 len, u64 *freed); int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes, enum btrfs_qgroup_rsv_type type, bool enforce); int __btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
From: Boris Burkov boris@bur.io
stable inclusion from stable-v6.6.8 commit d3cf024353e213989a48399ed9ed24775e6472f0 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit a86805504b88f636a6458520d85afdf0634e3c6b upstream.
The EXTENT_QGROUP_RESERVED bit is used to "lock" regions of the file for duplicate reservations. That is two writes to that range in one transaction shouldn't create two reservations, as the reservation will only be freed once when the write finally goes down. Therefore, it is never OK to clear that bit without freeing the associated qgroup reserve. At this point, we don't want to be freeing the reserve, so mask off the bit.
CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/btrfs/extent_io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 1530df88370c..03c10e0ba0e2 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2303,7 +2303,8 @@ static int try_release_extent_state(struct extent_io_tree *tree, ret = 0; } else { u32 clear_bits = ~(EXTENT_LOCKED | EXTENT_NODATASUM | - EXTENT_DELALLOC_NEW | EXTENT_CTLBITS); + EXTENT_DELALLOC_NEW | EXTENT_CTLBITS | + EXTENT_QGROUP_RESERVED);
/* * At this point we can safely clear everything except the
From: Christian König christian.koenig@amd.com
stable inclusion from stable-v6.6.8 commit d50670681d8a14980a54238bad00a07fee122b5f bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit ceb9a321e7639700844aa3bf234a4e0884f13b77 upstream.
When freeing PD/PT with shadows it can happen that the shadow destruction races with detaching the PD/PT from the VM causing a NULL pointer dereference in the invalidation code.
Fix this by detaching the the PD/PT from the VM first and then freeing the shadow instead.
Signed-off-by: Christian König christian.koenig@amd.com Fixes: https://gitlab.freedesktop.org/drm/amd/-/issues/2867 Cc: stable@vger.kernel.org Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c index 96d601e209b8..0d51222f6f8e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c @@ -642,13 +642,14 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
if (!entry->bo) return; + + entry->bo->vm_bo = NULL; shadow = amdgpu_bo_shadowed(entry->bo); if (shadow) { ttm_bo_set_bulk_move(&shadow->tbo, NULL); amdgpu_bo_unref(&shadow); } ttm_bo_set_bulk_move(&entry->bo->tbo, NULL); - entry->bo->vm_bo = NULL;
spin_lock(&entry->vm->status_lock); list_del(&entry->vm_status);
From: Jani Nikula jani.nikula@intel.com
stable inclusion from stable-v6.6.8 commit a511e851d49e13835f0da022d48e873c8cc15827 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 759f14e20891de72e676d9d738eb2c573aa15f52 upstream.
When the separate add modes call was added back in commit c533b5167c7e ("drm/edid: add separate drm_edid_connector_add_modes()"), it failed to address drm_edid_override_connector_update(). Also call add modes there.
Reported-by: bbaa bbaa@bbaa.fun Closes: https://lore.kernel.org/r/930E9B4C7D91FDFF+29b34d89-8658-4910-966a-c772f320e... Fixes: c533b5167c7e ("drm/edid: add separate drm_edid_connector_add_modes()") Cc: stable@vger.kernel.org # v6.3+ Signed-off-by: Jani Nikula jani.nikula@intel.com Reviewed-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20231207093821.2654267-1-jani.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/gpu/drm/drm_edid.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c index b3e1b288fc0c..a491280ca48c 100644 --- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -2308,7 +2308,8 @@ int drm_edid_override_connector_update(struct drm_connector *connector)
override = drm_edid_override_get(connector); if (override) { - num_modes = drm_edid_connector_update(connector, override); + if (drm_edid_connector_update(connector, override) == 0) + num_modes = drm_edid_connector_add_modes(connector);
drm_edid_free(override);
From: Mario Limonciello mario.limonciello@amd.com
stable inclusion from stable-v6.6.8 commit c7f6e836e675683e704a40e68eded505bdc30428 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit b96ab339ee50470d13a1faa6ad94d2218a7cd49f upstream.
Mark reports that brightness is not restored after Xorg dpms screen blank.
This behavior was introduced by commit d9e865826c20 ("drm/amd/display: Simplify brightness initialization") which dropped the cached backlight value in display code, but also removed code for when the default value read back was less than 1 nit.
Restore this code so that the backlight brightness is restored to the correct default value in this circumstance.
Reported-by: Mark Herbert mark.herbert42@gmail.com Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3031 Cc: stable@vger.kernel.org Cc: Camille Cho camille.cho@amd.com Cc: Krunoslav Kovac krunoslav.kovac@amd.com Cc: Hamza Mahfooz hamza.mahfooz@amd.com Fixes: d9e865826c20 ("drm/amd/display: Simplify brightness initialization") Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- .../amd/display/dc/link/protocols/link_edp_panel_control.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c b/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c index fe74d4252a51..13c3d7ff6139 100644 --- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c +++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c @@ -280,8 +280,8 @@ bool set_default_brightness_aux(struct dc_link *link) if (link && link->dpcd_sink_ext_caps.bits.oled == 1) { if (!read_default_bl_aux(link, &default_backlight)) default_backlight = 150000; - // if > 5000, it might be wrong readback - if (default_backlight > 5000000) + // if < 1 nits or > 5000, it might be wrong readback + if (default_backlight < 1000 || default_backlight > 5000000) default_backlight = 150000;
return edp_set_backlight_level_nits(link, true,
From: Mario Limonciello mario.limonciello@amd.com
stable inclusion from stable-v6.6.15 commit 107a11637f43e7cdcca96c09525481e38b004455 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit e7ab758741672acb21c5d841a9f0309d30e48a06 upstream.
When screen brightness is rapidly changed and PSR-SU is enabled the display hangs on panels with this TCON even on the latest DCN 3.1.4 microcode (0x8002a81 at this time).
This was disabled previously as commit 072030b17830 ("drm/amd: Disable PSR-SU on Parade 0803 TCON") but reverted as commit 1e66a17ce546 ("Revert "drm/amd: Disable PSR-SU on Parade 0803 TCON"") in favor of testing for a new enough microcode (commit cd2e31a9ab93 ("drm/amd/display: Set minimum requirement for using PSR-SU on Phoenix")).
As hangs are still happening specifically with this TCON, disable PSR-SU again for it until it can be root caused.
Cc: stable@vger.kernel.org Cc: aaron.ma@canonical.com Cc: binli@gnome.org Cc: Marc Rossi Marc.Rossi@amd.com Cc: Hamza Mahfooz Hamza.Mahfooz@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2046131 Acked-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/gpu/drm/amd/display/modules/power/power_helpers.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c index 73a2b37fbbd7..2b3d5183818a 100644 --- a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c +++ b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c @@ -839,6 +839,8 @@ bool is_psr_su_specific_panel(struct dc_link *link) ((dpcd_caps->sink_dev_id_str[1] == 0x08 && dpcd_caps->sink_dev_id_str[0] == 0x08) || (dpcd_caps->sink_dev_id_str[1] == 0x08 && dpcd_caps->sink_dev_id_str[0] == 0x07))) isPSRSUSupported = false; + else if (dpcd_caps->sink_dev_id_str[1] == 0x08 && dpcd_caps->sink_dev_id_str[0] == 0x03) + isPSRSUSupported = false; else if (dpcd_caps->psr_info.force_psrsu_cap == 0x1) isPSRSUSupported = true; }
From: Ville Syrjälä ville.syrjala@linux.intel.com
stable inclusion from stable-v6.6.8 commit b6295a167fa55fff2c0ac7ed534137342a422eb1 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 324b70e997aab0a7deab8cb90711faccda4e98c8 upstream.
plane_view_scanout_stride() currently assumes that we had to pad the mapping stride with dummy pages in order to align it. But that is not the case if the original fb stride exceeds the aligned stride used to populate the remapped view, which is calculated from the user specified framebuffer width rather than the user specified framebuffer stride.
Ignore the original fb stride in this case and just stick to the POT aligned stride. Getting this wrong will cause the plane to fetch the wrong data, and can lead to fault errors if the page tables at the bogus location aren't even populated.
TODO: figure out if this is OK for CCS, or if we should instead increase the width of the view to cover the entire user specified fb stride instead...
Cc: Imre Deak imre.deak@intel.com Cc: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20231204202443.31247-1-ville.s... Reviewed-by: Imre Deak imre.deak@intel.com Reviewed-by: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com (cherry picked from commit 01a39f1c4f1220a4e6a25729fae87ff5794cbc52) Cc: stable@vger.kernel.org Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/gpu/drm/i915/display/intel_fb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 446bbf7986b6..ca37f90eeafd 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -1370,7 +1370,8 @@ plane_view_scanout_stride(const struct intel_framebuffer *fb, int color_plane, struct drm_i915_private *i915 = to_i915(fb->base.dev); unsigned int stride_tiles;
- if (IS_ALDERLAKE_P(i915) || DISPLAY_VER(i915) >= 14) + if ((IS_ALDERLAKE_P(i915) || DISPLAY_VER(i915) >= 14) && + src_stride_tiles < dst_stride_tiles) stride_tiles = src_stride_tiles; else stride_tiles = dst_stride_tiles;
From: Ville Syrjälä ville.syrjala@linux.intel.com
stable inclusion from stable-v6.6.8 commit a9d951b007904942e2519bc8ff7ba2d1b4121060 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit c3070f080f9ba18dea92eaa21730f7ab85b5c8f4 upstream.
Since the plane_state variable is declared outside the scaler_users loop in intel_atomic_setup_scalers(), and it's never reset back to NULL inside the loop we may end up calling intel_atomic_setup_scaler() with a non-NULL plane state for the pipe scaling case. That is bad because intel_atomic_setup_scaler() determines whether we are doing plane scaling or pipe scaling based on plane_state!=NULL. The end result is that we may miscalculate the scaler mode for pipe scaling.
The hardware becomes somewhat upset if we end up in this situation when scanning out a planar format on a SDR plane. We end up programming the pipe scaler into planar mode as well, and the result is a screenfull of garbage.
Fix the situation by making sure we pass the correct plane_state==NULL when calculating the scaler mode for pipe scaling.
Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20231207193441.20206-2-ville.s... Reviewed-by: Jani Nikula jani.nikula@intel.com (cherry picked from commit e81144106e21271c619f0c722a09e27ccb8c043d) Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/gpu/drm/i915/display/skl_scaler.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/skl_scaler.c b/drivers/gpu/drm/i915/display/skl_scaler.c index 1e7c97243fcf..8a934bada624 100644 --- a/drivers/gpu/drm/i915/display/skl_scaler.c +++ b/drivers/gpu/drm/i915/display/skl_scaler.c @@ -504,7 +504,6 @@ int intel_atomic_setup_scalers(struct drm_i915_private *dev_priv, { struct drm_plane *plane = NULL; struct intel_plane *intel_plane; - struct intel_plane_state *plane_state = NULL; struct intel_crtc_scaler_state *scaler_state = &crtc_state->scaler_state; struct drm_atomic_state *drm_state = crtc_state->uapi.state; @@ -536,6 +535,7 @@ int intel_atomic_setup_scalers(struct drm_i915_private *dev_priv,
/* walkthrough scaler_users bits and start assigning scalers */ for (i = 0; i < sizeof(scaler_state->scaler_users) * 8; i++) { + struct intel_plane_state *plane_state = NULL; int *scaler_id; const char *name; int idx, ret;
From: Ville Syrjälä ville.syrjala@linux.intel.com
stable inclusion from stable-v6.6.8 commit 4029b025bedaee1cd819f4f9a8b2244266fd320b bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 0ccd963fe555451b1f84e6d14d2b3ef03dd5c947 upstream.
On ADL+ the hardware automagically calculates the CCS AUX surface stride from the main surface stride, so when remapping we can't really play a lot of tricks with the main surface stride, or else the AUX surface stride would get miscalculated and no longer match the actual data layout in memory.
Supposedly we could remap in 256 main surface tile units (AUX page(4096)/cachline(64)*4(4x1 main surface tiles per AUX cacheline)=256 main surface tiles), but the extra complexity is probably not worth the hassle.
So let's just make sure our mapping stride is calculated from the full framebuffer stride (instead of the framebuffer width). This way the stride we program into PLANE_STRIDE will be the original framebuffer stride, and thus there will be no change to the AUX stride/layout.
Cc: stable@vger.kernel.org Cc: Imre Deak imre.deak@intel.com Cc: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20231205180308.7505-1-ville.sy... Reviewed-by: Imre Deak imre.deak@intel.com (cherry picked from commit 2c12eb36f849256f5eb00ffaee9bf99396fd3814) Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/gpu/drm/i915/display/intel_fb.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index ca37f90eeafd..689b7c16d300 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -1498,8 +1498,20 @@ static u32 calc_plane_remap_info(const struct intel_framebuffer *fb, int color_p
size += remap_info->size; } else { - unsigned int dst_stride = plane_view_dst_stride_tiles(fb, color_plane, - remap_info->width); + unsigned int dst_stride; + + /* + * The hardware automagically calculates the CCS AUX surface + * stride from the main surface stride so can't really remap a + * smaller subset (unless we'd remap in whole AUX page units). + */ + if (intel_fb_needs_pot_stride_remap(fb) && + intel_fb_is_ccs_modifier(fb->base.modifier)) + dst_stride = remap_info->src_stride; + else + dst_stride = remap_info->width; + + dst_stride = plane_view_dst_stride_tiles(fb, color_plane, dst_stride);
assign_chk_ovf(i915, remap_info->dst_stride, dst_stride); color_plane_info->mapping_stride = dst_stride *
From: Paulo Alcantara pc@manguebit.com
stable inclusion from stable-v6.6.8 commit 17a0f64cc02d4972e21c733d9f21d1c512963afa bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit af1689a9b7701d9907dfc84d2a4b57c4bc907144 upstream.
Validate offsets and lengths before dereferencing create contexts in smb2_parse_contexts().
This fixes following oops when accessing invalid create contexts from server:
BUG: unable to handle page fault for address: ffff8881178d8cc3 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 4a01067 P4D 4a01067 PUD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 1736 Comm: mount.cifs Not tainted 6.7.0-rc4 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 RIP: 0010:smb2_parse_contexts+0xa0/0x3a0 [cifs] Code: f8 10 75 13 48 b8 93 ad 25 50 9c b4 11 e7 49 39 06 0f 84 d2 00 00 00 8b 45 00 85 c0 74 61 41 29 c5 48 01 c5 41 83 fd 0f 76 55 <0f> b7 7d 04 0f b7 45 06 4c 8d 74 3d 00 66 83 f8 04 75 bc ba 04 00 RSP: 0018:ffffc900007939e0 EFLAGS: 00010216 RAX: ffffc90000793c78 RBX: ffff8880180cc000 RCX: ffffc90000793c90 RDX: ffffc90000793cc0 RSI: ffff8880178d8cc0 RDI: ffff8880180cc000 RBP: ffff8881178d8cbf R08: ffffc90000793c22 R09: 0000000000000000 R10: ffff8880180cc000 R11: 0000000000000024 R12: 0000000000000000 R13: 0000000000000020 R14: 0000000000000000 R15: ffffc90000793c22 FS: 00007f873753cbc0(0000) GS:ffff88806bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8881178d8cc3 CR3: 00000000181ca000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x181/0x480 ? search_module_extables+0x19/0x60 ? srso_alias_return_thunk+0x5/0xfbef5 ? exc_page_fault+0x1b6/0x1c0 ? asm_exc_page_fault+0x26/0x30 ? smb2_parse_contexts+0xa0/0x3a0 [cifs] SMB2_open+0x38d/0x5f0 [cifs] ? smb2_is_path_accessible+0x138/0x260 [cifs] smb2_is_path_accessible+0x138/0x260 [cifs] cifs_is_path_remote+0x8d/0x230 [cifs] cifs_mount+0x7e/0x350 [cifs] cifs_smb3_do_mount+0x128/0x780 [cifs] smb3_get_tree+0xd9/0x290 [cifs] vfs_get_tree+0x2c/0x100 ? capable+0x37/0x70 path_mount+0x2d7/0xb80 ? srso_alias_return_thunk+0x5/0xfbef5 ? _raw_spin_unlock_irqrestore+0x44/0x60 __x64_sys_mount+0x11a/0x150 do_syscall_64+0x47/0xf0 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7f8737657b1e
Reported-by: Robert Morris rtm@csail.mit.edu Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/smb/client/cached_dir.c | 17 +++++-- fs/smb/client/smb2pdu.c | 93 +++++++++++++++++++++++--------------- fs/smb/client/smb2proto.h | 12 +++-- 3 files changed, 75 insertions(+), 47 deletions(-)
diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c index 59f6b8e32cc9..d64a306a414b 100644 --- a/fs/smb/client/cached_dir.c +++ b/fs/smb/client/cached_dir.c @@ -291,16 +291,23 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon, oparms.fid->mid = le64_to_cpu(o_rsp->hdr.MessageId); #endif /* CIFS_DEBUG2 */
- rc = -EINVAL; + if (o_rsp->OplockLevel != SMB2_OPLOCK_LEVEL_LEASE) { + spin_unlock(&cfids->cfid_list_lock); + rc = -EINVAL; + goto oshr_free; + } + + rc = smb2_parse_contexts(server, rsp_iov, + &oparms.fid->epoch, + oparms.fid->lease_key, + &oplock, NULL, NULL); + if (rc) { spin_unlock(&cfids->cfid_list_lock); goto oshr_free; }
- smb2_parse_contexts(server, o_rsp, - &oparms.fid->epoch, - oparms.fid->lease_key, &oplock, - NULL, NULL); + rc = -EINVAL; if (!(oplock & SMB2_LEASE_READ_CACHING_HE)) { spin_unlock(&cfids->cfid_list_lock); goto oshr_free; diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index c75a80bb6d9e..2df118540e89 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -2141,17 +2141,18 @@ parse_posix_ctxt(struct create_context *cc, struct smb2_file_all_info *info, posix->nlink, posix->mode, posix->reparse_tag); }
-void -smb2_parse_contexts(struct TCP_Server_Info *server, - struct smb2_create_rsp *rsp, - unsigned int *epoch, char *lease_key, __u8 *oplock, - struct smb2_file_all_info *buf, - struct create_posix_rsp *posix) +int smb2_parse_contexts(struct TCP_Server_Info *server, + struct kvec *rsp_iov, + unsigned int *epoch, + char *lease_key, __u8 *oplock, + struct smb2_file_all_info *buf, + struct create_posix_rsp *posix) { - char *data_offset; + struct smb2_create_rsp *rsp = rsp_iov->iov_base; struct create_context *cc; - unsigned int next; - unsigned int remaining; + size_t rem, off, len; + size_t doff, dlen; + size_t noff, nlen; char *name; static const char smb3_create_tag_posix[] = { 0x93, 0xAD, 0x25, 0x50, 0x9C, @@ -2160,45 +2161,63 @@ smb2_parse_contexts(struct TCP_Server_Info *server, };
*oplock = 0; - data_offset = (char *)rsp + le32_to_cpu(rsp->CreateContextsOffset); - remaining = le32_to_cpu(rsp->CreateContextsLength); - cc = (struct create_context *)data_offset; + + off = le32_to_cpu(rsp->CreateContextsOffset); + rem = le32_to_cpu(rsp->CreateContextsLength); + if (check_add_overflow(off, rem, &len) || len > rsp_iov->iov_len) + return -EINVAL; + cc = (struct create_context *)((u8 *)rsp + off);
/* Initialize inode number to 0 in case no valid data in qfid context */ if (buf) buf->IndexNumber = 0;
- while (remaining >= sizeof(struct create_context)) { - name = le16_to_cpu(cc->NameOffset) + (char *)cc; - if (le16_to_cpu(cc->NameLength) == 4 && - strncmp(name, SMB2_CREATE_REQUEST_LEASE, 4) == 0) - *oplock = server->ops->parse_lease_buf(cc, epoch, - lease_key); - else if (buf && (le16_to_cpu(cc->NameLength) == 4) && - strncmp(name, SMB2_CREATE_QUERY_ON_DISK_ID, 4) == 0) - parse_query_id_ctxt(cc, buf); - else if ((le16_to_cpu(cc->NameLength) == 16)) { - if (posix && - memcmp(name, smb3_create_tag_posix, 16) == 0) + while (rem >= sizeof(*cc)) { + doff = le16_to_cpu(cc->DataOffset); + dlen = le32_to_cpu(cc->DataLength); + if (check_add_overflow(doff, dlen, &len) || len > rem) + return -EINVAL; + + noff = le16_to_cpu(cc->NameOffset); + nlen = le16_to_cpu(cc->NameLength); + if (noff + nlen >= doff) + return -EINVAL; + + name = (char *)cc + noff; + switch (nlen) { + case 4: + if (!strncmp(name, SMB2_CREATE_REQUEST_LEASE, 4)) { + *oplock = server->ops->parse_lease_buf(cc, epoch, + lease_key); + } else if (buf && + !strncmp(name, SMB2_CREATE_QUERY_ON_DISK_ID, 4)) { + parse_query_id_ctxt(cc, buf); + } + break; + case 16: + if (posix && !memcmp(name, smb3_create_tag_posix, 16)) parse_posix_ctxt(cc, buf, posix); + break; + default: + cifs_dbg(FYI, "%s: unhandled context (nlen=%zu dlen=%zu)\n", + __func__, nlen, dlen); + if (IS_ENABLED(CONFIG_CIFS_DEBUG2)) + cifs_dump_mem("context data: ", cc, dlen); + break; } - /* else { - cifs_dbg(FYI, "Context not matched with len %d\n", - le16_to_cpu(cc->NameLength)); - cifs_dump_mem("Cctxt name: ", name, 4); - } */ - - next = le32_to_cpu(cc->Next); - if (!next) + + off = le32_to_cpu(cc->Next); + if (!off) break; - remaining -= next; - cc = (struct create_context *)((char *)cc + next); + if (check_sub_overflow(rem, off, &rem)) + return -EINVAL; + cc = (struct create_context *)((u8 *)cc + off); }
if (rsp->OplockLevel != SMB2_OPLOCK_LEVEL_LEASE) *oplock = rsp->OplockLevel;
- return; + return 0; }
static int @@ -3029,8 +3048,8 @@ SMB2_open(const unsigned int xid, struct cifs_open_parms *oparms, __le16 *path, }
- smb2_parse_contexts(server, rsp, &oparms->fid->epoch, - oparms->fid->lease_key, oplock, buf, posix); + rc = smb2_parse_contexts(server, &rsp_iov, &oparms->fid->epoch, + oparms->fid->lease_key, oplock, buf, posix); creat_exit: SMB2_open_free(&rqst); free_rsp_buf(resp_buftype, rsp); diff --git a/fs/smb/client/smb2proto.h b/fs/smb/client/smb2proto.h index 46eff9ec302a..0e371f7e2854 100644 --- a/fs/smb/client/smb2proto.h +++ b/fs/smb/client/smb2proto.h @@ -251,11 +251,13 @@ extern int smb3_validate_negotiate(const unsigned int, struct cifs_tcon *);
extern enum securityEnum smb2_select_sectype(struct TCP_Server_Info *, enum securityEnum); -extern void smb2_parse_contexts(struct TCP_Server_Info *server, - struct smb2_create_rsp *rsp, - unsigned int *epoch, char *lease_key, - __u8 *oplock, struct smb2_file_all_info *buf, - struct create_posix_rsp *posix); +int smb2_parse_contexts(struct TCP_Server_Info *server, + struct kvec *rsp_iov, + unsigned int *epoch, + char *lease_key, __u8 *oplock, + struct smb2_file_all_info *buf, + struct create_posix_rsp *posix); + extern int smb3_encryption_required(const struct cifs_tcon *tcon); extern int smb2_validate_iov(unsigned int offset, unsigned int buffer_length, struct kvec *iov, unsigned int min_buf_size);
From: Paulo Alcantara pc@manguebit.com
stable inclusion from stable-v6.6.8 commit ef748d4a62a788e235b5443bef664b39e00444a8 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 90d025c2e953c11974e76637977c473200593a46 upstream.
If server replied SMB2_NEGOTIATE with a zero SecurityBufferOffset, smb2_get_data_area() sets @len to non-zero but return NULL, so decode_negTokeninit() ends up being called with a NULL @security_blob:
BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 871 Comm: mount.cifs Not tainted 6.7.0-rc4 #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 RIP: 0010:asn1_ber_decoder+0x173/0xc80 Code: 01 4c 39 2c 24 75 09 45 84 c9 0f 85 2f 03 00 00 48 8b 14 24 4c 29 ea 48 83 fa 01 0f 86 1e 07 00 00 48 8b 74 24 28 4d 8d 5d 01 <42> 0f b6 3c 2e 89 fa 40 88 7c 24 5c f7 d2 83 e2 1f 0f 84 3d 07 00 RSP: 0018:ffffc9000063f950 EFLAGS: 00010202 RAX: 0000000000000002 RBX: 0000000000000000 RCX: 000000000000004a RDX: 000000000000004a RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 000000000000004d R15: 0000000000000000 FS: 00007fce52b0fbc0(0000) GS:ffff88806ba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000001ae64000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x181/0x480 ? __stack_depot_save+0x1e6/0x480 ? exc_page_fault+0x6f/0x1c0 ? asm_exc_page_fault+0x26/0x30 ? asn1_ber_decoder+0x173/0xc80 ? check_object+0x40/0x340 decode_negTokenInit+0x1e/0x30 [cifs] SMB2_negotiate+0xc99/0x17c0 [cifs] ? smb2_negotiate+0x46/0x60 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 smb2_negotiate+0x46/0x60 [cifs] cifs_negotiate_protocol+0xae/0x130 [cifs] cifs_get_smb_ses+0x517/0x1040 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? queue_delayed_work_on+0x5d/0x90 cifs_mount_get_session+0x78/0x200 [cifs] dfs_mount_share+0x13a/0x9f0 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? lock_acquire+0xbf/0x2b0 ? find_nls+0x16/0x80 ? srso_alias_return_thunk+0x5/0xfbef5 cifs_mount+0x7e/0x350 [cifs] cifs_smb3_do_mount+0x128/0x780 [cifs] smb3_get_tree+0xd9/0x290 [cifs] vfs_get_tree+0x2c/0x100 ? capable+0x37/0x70 path_mount+0x2d7/0xb80 ? srso_alias_return_thunk+0x5/0xfbef5 ? _raw_spin_unlock_irqrestore+0x44/0x60 __x64_sys_mount+0x11a/0x150 do_syscall_64+0x47/0xf0 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7fce52c2ab1e
Fix this by setting @len to zero when @off == 0 so callers won't attempt to dereference non-existing data areas.
Reported-by: Robert Morris rtm@csail.mit.edu Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/smb/client/smb2misc.c | 26 ++++++++++---------------- 1 file changed, 10 insertions(+), 16 deletions(-)
diff --git a/fs/smb/client/smb2misc.c b/fs/smb/client/smb2misc.c index 32dfa0f7a78c..e20b4354e703 100644 --- a/fs/smb/client/smb2misc.c +++ b/fs/smb/client/smb2misc.c @@ -313,6 +313,9 @@ static const bool has_smb2_data_area[NUMBER_OF_SMB2_COMMANDS] = { char * smb2_get_data_area_len(int *off, int *len, struct smb2_hdr *shdr) { + const int max_off = 4096; + const int max_len = 128 * 1024; + *off = 0; *len = 0;
@@ -384,29 +387,20 @@ smb2_get_data_area_len(int *off, int *len, struct smb2_hdr *shdr) * Invalid length or offset probably means data area is invalid, but * we have little choice but to ignore the data area in this case. */ - if (*off > 4096) { - cifs_dbg(VFS, "offset %d too large, data area ignored\n", *off); - *len = 0; - *off = 0; - } else if (*off < 0) { - cifs_dbg(VFS, "negative offset %d to data invalid ignore data area\n", - *off); + if (unlikely(*off < 0 || *off > max_off || + *len < 0 || *len > max_len)) { + cifs_dbg(VFS, "%s: invalid data area (off=%d len=%d)\n", + __func__, *off, *len); *off = 0; *len = 0; - } else if (*len < 0) { - cifs_dbg(VFS, "negative data length %d invalid, data area ignored\n", - *len); - *len = 0; - } else if (*len > 128 * 1024) { - cifs_dbg(VFS, "data area larger than 128K: %d\n", *len); + } else if (*off == 0) { *len = 0; }
/* return pointer to beginning of data area, ie offset from SMB start */ - if ((*off != 0) && (*len != 0)) + if (*off > 0 && *len > 0) return (char *)shdr + *off; - else - return NULL; + return NULL; }
/*
From: Paulo Alcantara pc@manguebit.com
stable inclusion from stable-v6.6.8 commit e72ed491bc6ef97bec6a976cb3832f0e01da24f0 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 3a42709fa909e22b0be4bb1e2795aa04ada732a3 upstream.
Validate @ioctl_rsp->OutputOffset and @ioctl_rsp->OutputCount so that their sum does not wrap to a number that is smaller than @reparse_buf and we end up with a wild pointer as follows:
BUG: unable to handle page fault for address: ffff88809c5cd45f #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 4a01067 P4D 4a01067 PUD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 1260 Comm: mount.cifs Not tainted 6.7.0-rc4 #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 RIP: 0010:smb2_query_reparse_point+0x3e0/0x4c0 [cifs] Code: ff ff e8 f3 51 fe ff 41 89 c6 58 5a 45 85 f6 0f 85 14 fe ff ff 49 8b 57 48 8b 42 60 44 8b 42 64 42 8d 0c 00 49 39 4f 50 72 40 <8b> 04 02 48 8b 9d f0 fe ff ff 49 8b 57 50 89 03 48 8b 9d e8 fe ff RSP: 0018:ffffc90000347a90 EFLAGS: 00010212 RAX: 000000008000001f RBX: ffff88800ae11000 RCX: 00000000000000ec RDX: ffff88801c5cd440 RSI: 0000000000000000 RDI: ffffffff82004aa4 RBP: ffffc90000347bb0 R08: 00000000800000cd R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000024 R12: ffff8880114d4100 R13: ffff8880114d4198 R14: 0000000000000000 R15: ffff8880114d4000 FS: 00007f02c07babc0(0000) GS:ffff88806ba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88809c5cd45f CR3: 0000000011750000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x181/0x480 ? search_module_extables+0x19/0x60 ? srso_alias_return_thunk+0x5/0xfbef5 ? exc_page_fault+0x1b6/0x1c0 ? asm_exc_page_fault+0x26/0x30 ? _raw_spin_unlock_irqrestore+0x44/0x60 ? smb2_query_reparse_point+0x3e0/0x4c0 [cifs] cifs_get_fattr+0x16e/0xa50 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? lock_acquire+0xbf/0x2b0 cifs_root_iget+0x163/0x5f0 [cifs] cifs_smb3_do_mount+0x5bd/0x780 [cifs] smb3_get_tree+0xd9/0x290 [cifs] vfs_get_tree+0x2c/0x100 ? capable+0x37/0x70 path_mount+0x2d7/0xb80 ? srso_alias_return_thunk+0x5/0xfbef5 ? _raw_spin_unlock_irqrestore+0x44/0x60 __x64_sys_mount+0x11a/0x150 do_syscall_64+0x47/0xf0 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7f02c08d5b1e
Fixes: 2e4564b31b64 ("smb3: add support for stat of WSL reparse points for special file types") Cc: stable@vger.kernel.org Reported-by: Robert Morris rtm@csail.mit.edu Signed-off-by: Paulo Alcantara (SUSE) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- fs/smb/client/smb2ops.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c index 258a591f716a..dbcfdf7bc270 100644 --- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -3001,7 +3001,7 @@ static int smb2_query_reparse_point(const unsigned int xid, struct kvec *rsp_iov; struct smb2_ioctl_rsp *ioctl_rsp; struct reparse_data_buffer *reparse_buf; - u32 plen; + u32 off, count, len;
cifs_dbg(FYI, "%s: path: %s\n", __func__, full_path);
@@ -3082,16 +3082,22 @@ static int smb2_query_reparse_point(const unsigned int xid, */ if (rc == 0) { /* See MS-FSCC 2.3.23 */ + off = le32_to_cpu(ioctl_rsp->OutputOffset); + count = le32_to_cpu(ioctl_rsp->OutputCount); + if (check_add_overflow(off, count, &len) || + len > rsp_iov[1].iov_len) { + cifs_tcon_dbg(VFS, "%s: invalid ioctl: off=%d count=%d\n", + __func__, off, count); + rc = -EIO; + goto query_rp_exit; + }
- reparse_buf = (struct reparse_data_buffer *) - ((char *)ioctl_rsp + - le32_to_cpu(ioctl_rsp->OutputOffset)); - plen = le32_to_cpu(ioctl_rsp->OutputCount); - - if (plen + le32_to_cpu(ioctl_rsp->OutputOffset) > - rsp_iov[1].iov_len) { - cifs_tcon_dbg(FYI, "srv returned invalid ioctl len: %d\n", - plen); + reparse_buf = (void *)((u8 *)ioctl_rsp + off); + len = sizeof(*reparse_buf); + if (count < len || + count < le16_to_cpu(reparse_buf->ReparseDataLength) + len) { + cifs_tcon_dbg(VFS, "%s: invalid ioctl: off=%d count=%d\n", + __func__, off, count); rc = -EIO; goto query_rp_exit; }
From: "Steven Rostedt (Google)" rostedt@goodmis.org
stable inclusion from stable-v6.6.8 commit b02bf0d952ad237444d91881adc56a75912e5e7a bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 17d801758157bec93f26faaf5ff1a8b9a552d67a upstream.
Reading the ring buffer does a swap of a sub-buffer within the ring buffer with a empty sub-buffer. This allows the reader to have full access to the content of the sub-buffer that was swapped out without having to worry about contention with the writer.
The readers call ring_buffer_alloc_read_page() to allocate a page that will be used to swap with the ring buffer. When the code is finished with the reader page, it calls ring_buffer_free_read_page(). Instead of freeing the page, it stores it as a spare. Then next call to ring_buffer_alloc_read_page() will return this spare instead of calling into the memory management system to allocate a new page.
Unfortunately, on freeing of the ring buffer, this spare page is not freed, and causes a memory leak.
Link: https://lore.kernel.org/linux-trace-kernel/20231210221250.7b9cc83c@rorschach...
Cc: stable@vger.kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer") Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- kernel/trace/ring_buffer.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index e70aa61539dc..3cba9e176d7f 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1794,6 +1794,8 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer) free_buffer_page(bpage); }
+ free_page((unsigned long)cpu_buffer->free_page); + kfree(cpu_buffer); }
From: "Steven Rostedt (Google)" rostedt@goodmis.org
stable inclusion from stable-v6.6.8 commit 5062b8c5ae2fe11c52bbb824723445d9dd562b3e bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit d06aff1cb13d2a0d52b48e605462518149c98c81 upstream.
The snapshot buffer is to mimic the main buffer so that when a snapshot is needed, the snapshot and main buffer are swapped. When the snapshot buffer is allocated, it is set to the minimal size that the ring buffer may be at and still functional. When it is allocated it becomes the same size as the main ring buffer, and when the main ring buffer changes in size, it should do.
Currently, the resize only updates the snapshot buffer if it's used by the current tracer (ie. the preemptirqsoff tracer). But it needs to be updated anytime it is allocated.
When changing the size of the main buffer, instead of looking to see if the current tracer is utilizing the snapshot buffer, just check if it is allocated to know if it should be updated or not.
Also fix typo in comment just above the code change.
Link: https://lore.kernel.org/linux-trace-kernel/20231210225447.48476a6a@rorschach...
Cc: stable@vger.kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Fixes: ad909e21bbe69 ("tracing: Add internal tracing_snapshot() functions") Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- kernel/trace/trace.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 5afa58302b06..310cfaa9c0f1 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -6356,7 +6356,7 @@ static int __tracing_resize_ring_buffer(struct trace_array *tr, if (!tr->array_buffer.buffer) return 0;
- /* Do not allow tracing while resizng ring buffer */ + /* Do not allow tracing while resizing ring buffer */ tracing_stop_tr(tr);
ret = ring_buffer_resize(tr->array_buffer.buffer, size, cpu); @@ -6364,7 +6364,7 @@ static int __tracing_resize_ring_buffer(struct trace_array *tr, goto out_start;
#ifdef CONFIG_TRACER_MAX_TRACE - if (!tr->current_trace->use_max_tr) + if (!tr->allocated_snapshot) goto out;
ret = ring_buffer_resize(tr->max_buffer.buffer, size, cpu);
From: "Steven Rostedt (Google)" rostedt@goodmis.org
stable inclusion from stable-v6.6.8 commit 5e584836779b15ff1f40b8eec0d261355dbd026f bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 9e45e39dc249c970d99d2681f6bcb55736fd725c upstream.
The ring buffer timestamps are synchronized by two timestamp placeholders. One is the "before_stamp" and the other is the "write_stamp" (sometimes referred to as the "after stamp" but only in the comments. These two stamps are key to knowing how to handle nested events coming in with a lockless system.
When moving across sub-buffers, the before stamp is updated but the write stamp is not. There's an effort to put back the before stamp to something that seems logical in case there's nested events. But as the current event is about to cross sub-buffers, and so will any new nested event that happens, updating the before stamp is useless, and could even introduce new race conditions.
The first event on a sub-buffer simply uses the sub-buffer's timestamp and keeps a "delta" of zero. The "before_stamp" and "write_stamp" are not used in the algorithm in this case. There's no reason to try to fix the before_stamp when this happens.
As a bonus, it removes a cmpxchg() when crossing sub-buffers!
Link: https://lore.kernel.org/linux-trace-kernel/20231211114420.36dde01b@gandalf.l...
Cc: stable@vger.kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp") Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- kernel/trace/ring_buffer.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 3cba9e176d7f..4b2653953b85 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -3611,14 +3611,7 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
/* See if we shot pass the end of this buffer page */ if (unlikely(write > BUF_PAGE_SIZE)) { - /* before and after may now different, fix it up*/ - b_ok = rb_time_read(&cpu_buffer->before_stamp, &info->before); - a_ok = rb_time_read(&cpu_buffer->write_stamp, &info->after); - if (a_ok && b_ok && info->before != info->after) - (void)rb_time_cmpxchg(&cpu_buffer->before_stamp, - info->before, info->after); - if (a_ok && b_ok) - check_buffer(cpu_buffer, info, CHECK_FULL_PAGE); + check_buffer(cpu_buffer, info, CHECK_FULL_PAGE); return rb_move_tail(cpu_buffer, tail, info); }
From: "Steven Rostedt (Google)" rostedt@goodmis.org
stable inclusion from stable-v6.6.8 commit 307ed139d7af56502698438f0a737e4c7dfcac82 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit b049525855fdd0024881c9b14b8fbec61c3f53d3 upstream.
For the ring buffer iterator (non-consuming read), the event needs to be copied into the iterator buffer to make sure that a writer does not overwrite it while the user is reading it. If a write happens during the copy, the buffer is simply discarded.
But the temp buffer itself was not big enough. The allocation of the buffer was only BUF_MAX_DATA_SIZE, which is the maximum data size that can be passed into the ring buffer and saved. But the temp buffer needs to hold the meta data as well. That would be BUF_PAGE_SIZE and not BUF_MAX_DATA_SIZE.
Link: https://lore.kernel.org/linux-trace-kernel/20231212072558.61f76493@gandalf.l...
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Fixes: 785888c544e04 ("ring-buffer: Have rb_iter_head_event() handle concurrent writer") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- kernel/trace/ring_buffer.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 4b2653953b85..4b4980ab1f63 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -2416,7 +2416,7 @@ rb_iter_head_event(struct ring_buffer_iter *iter) */ barrier();
- if ((iter->head + length) > commit || length > BUF_MAX_DATA_SIZE) + if ((iter->head + length) > commit || length > BUF_PAGE_SIZE) /* Writer corrupted the read? */ goto reset;
@@ -5120,7 +5120,8 @@ ring_buffer_read_prepare(struct trace_buffer *buffer, int cpu, gfp_t flags) if (!iter) return NULL;
- iter->event = kmalloc(BUF_MAX_DATA_SIZE, flags); + /* Holds the entire event: data and meta data */ + iter->event = kmalloc(BUF_PAGE_SIZE, flags); if (!iter->event) { kfree(iter); return NULL;
From: "Steven Rostedt (Google)" rostedt@goodmis.org
stable inclusion from stable-v6.6.8 commit ae76d9bdf100e588109036c3b7db23b6693e46e3 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit b3ae7b67b87fed771fa5bf95389df06b0433603e upstream.
The maximum ring buffer data size is the maximum size of data that can be recorded on the ring buffer. Events must be smaller than the sub buffer data size minus any meta data. This size is checked before trying to allocate from the ring buffer because the allocation assumes that the size will fit on the sub buffer.
The maximum size was calculated as the size of a sub buffer page (which is currently PAGE_SIZE minus the sub buffer header) minus the size of the meta data of an individual event. But it missed the possible adding of a time stamp for events that are added long enough apart that the event meta data can't hold the time delta.
When an event is added that is greater than the current BUF_MAX_DATA_SIZE minus the size of a time stamp, but still less than or equal to BUF_MAX_DATA_SIZE, the ring buffer would go into an infinite loop, looking for a page that can hold the event. Luckily, there's a check for this loop and after 1000 iterations and a warning is emitted and the ring buffer is disabled. But this should never happen.
This can happen when a large event is added first, or after a long period where an absolute timestamp is prefixed to the event, increasing its size by 8 bytes. This passes the check and then goes into the algorithm that causes the infinite loop.
For events that are the first event on the sub-buffer, it does not need to add a timestamp, because the sub-buffer itself contains an absolute timestamp, and adding one is redundant.
The fix is to check if the event is to be the first event on the sub-buffer, and if it is, then do not add a timestamp.
This also fixes 32 bit adding a timestamp when a read of before_stamp or write_stamp is interrupted. There's still no need to add that timestamp if the event is going to be the first event on the sub buffer.
Also, if the buffer has "time_stamp_abs" set, then also check if the length plus the timestamp is greater than the BUF_MAX_DATA_SIZE.
Link: https://lore.kernel.org/all/20231212104549.58863438@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20231212071837.5fdd6c13@gandalf.l... Link: https://lore.kernel.org/linux-trace-kernel/20231212111617.39e02849@gandalf.l...
Cc: stable@vger.kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Fixes: a4543a2fa9ef3 ("ring-buffer: Get timestamp after event is allocated") Fixes: 58fbc3c63275c ("ring-buffer: Consolidate add_timestamp to remove some branches") Reported-by: Kent Overstreet kent.overstreet@linux.dev # (on IRC) Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- kernel/trace/ring_buffer.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 4b4980ab1f63..6ff4b98af2df 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -3588,7 +3588,10 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer, * absolute timestamp. * Don't bother if this is the start of a new page (w == 0). */ - if (unlikely(!a_ok || !b_ok || (info->before != info->after && w))) { + if (!w) { + /* Use the sub-buffer timestamp */ + info->delta = 0; + } else if (unlikely(!a_ok || !b_ok || info->before != info->after)) { info->add_timestamp |= RB_ADD_STAMP_FORCE | RB_ADD_STAMP_EXTEND; info->length += RB_LEN_TIME_EXTEND; } else { @@ -3739,6 +3742,8 @@ rb_reserve_next_event(struct trace_buffer *buffer, if (ring_buffer_time_stamp_abs(cpu_buffer->buffer)) { add_ts_default = RB_ADD_STAMP_ABSOLUTE; info.length += RB_LEN_TIME_EXTEND; + if (info.length > BUF_MAX_DATA_SIZE) + goto out_fail; } else { add_ts_default = RB_ADD_STAMP_NONE; }
From: "Steven Rostedt (Google)" rostedt@goodmis.org
stable inclusion from stable-v6.6.8 commit bc17bc96432868ee5d59c44556959306c220aea7 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit fff88fa0fbc7067ba46dde570912d63da42c59a9 upstream.
Mathieu Desnoyers pointed out an issue in the rb_time_cmpxchg() for 32 bit architectures. That is:
static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set) { unsigned long cnt, top, bottom, msb; unsigned long cnt2, top2, bottom2, msb2; u64 val;
/* The cmpxchg always fails if it interrupted an update */ if (!__rb_time_read(t, &val, &cnt2)) return false;
if (val != expect) return false;
<<<< interrupted here!
cnt = local_read(&t->cnt);
The problem is that the synchronization counter in the rb_time_t is read *after* the value of the timestamp is read. That means if an interrupt were to come in between the value being read and the counter being read, it can change the value and the counter and the interrupted process would be clueless about it!
The counter needs to be read first and then the value. That way it is easy to tell if the value is stale or not. If the counter hasn't been updated, then the value is still good.
Link: https://lore.kernel.org/linux-trace-kernel/20231211201324.652870-1-mathieu.d... Link: https://lore.kernel.org/linux-trace-kernel/20231212115301.7a9c9a64@gandalf.l...
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Fixes: 10464b4aa605e ("ring-buffer: Add rb_time_t 64 bit operations for speeding up 32 bit") Reported-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Reviewed-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- kernel/trace/ring_buffer.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 6ff4b98af2df..b9f7998c5bdd 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -713,6 +713,9 @@ static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set) unsigned long cnt2, top2, bottom2, msb2; u64 val;
+ /* Any interruptions in this function should cause a failure */ + cnt = local_read(&t->cnt); + /* The cmpxchg always fails if it interrupted an update */ if (!__rb_time_read(t, &val, &cnt2)) return false; @@ -720,7 +723,6 @@ static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set) if (val != expect) return false;
- cnt = local_read(&t->cnt); if ((cnt & 3) != cnt2) return false;
From: "Steven Rostedt (Google)" rostedt@goodmis.org
stable inclusion from stable-v6.6.8 commit b3778a2fa4a281a21ea0f1d12022cdcf90616f61 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit dd939425707898da992e59ab0fcfae4652546910 upstream.
If an update to an event is interrupted by another event between the time the initial event allocated its buffer and where it wrote to the write_stamp, the code try to reset the write stamp back to the what it had just overwritten. It knows that it was overwritten via checking the before_stamp, and if it didn't match what it wrote to the before_stamp before it allocated its space, it knows it was overwritten.
To put back the write_stamp, it uses the before_stamp it read. The problem here is that by writing the before_stamp to the write_stamp it makes the two equal again, which means that the write_stamp can be considered valid as the last timestamp written to the ring buffer. But this is not necessarily true. The event that interrupted the event could have been interrupted in a way that it was interrupted as well, and can end up leaving with an invalid write_stamp. But if this happens and returns to this context that uses the before_stamp to update the write_stamp again, it can possibly incorrectly make it valid, causing later events to have in correct time stamps.
As it is OK to leave this function with an invalid write_stamp (one that doesn't match the before_stamp), there's no reason to try to make it valid again in this case. If this race happens, then just leave with the invalid write_stamp and the next event to come along will just add a absolute timestamp and validate everything again.
Bonus points: This gets rid of another cmpxchg64!
Link: https://lore.kernel.org/linux-trace-kernel/20231214222921.193037a7@gandalf.l...
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Joel Fernandes joel@joelfernandes.org Cc: Vincent Donnefort vdonnefort@google.com Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- kernel/trace/ring_buffer.c | 29 ++++++----------------------- 1 file changed, 6 insertions(+), 23 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index b9f7998c5bdd..08eb6c310c38 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -3621,14 +3621,14 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer, }
if (likely(tail == w)) { - u64 save_before; - bool s_ok; - /* Nothing interrupted us between A and C */ /*D*/ rb_time_set(&cpu_buffer->write_stamp, info->ts); - barrier(); - /*E*/ s_ok = rb_time_read(&cpu_buffer->before_stamp, &save_before); - RB_WARN_ON(cpu_buffer, !s_ok); + /* + * If something came in between C and D, the write stamp + * may now not be in sync. But that's fine as the before_stamp + * will be different and then next event will just be forced + * to use an absolute timestamp. + */ if (likely(!(info->add_timestamp & (RB_ADD_STAMP_FORCE | RB_ADD_STAMP_ABSOLUTE)))) /* This did not interrupt any time update */ @@ -3636,24 +3636,7 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer, else /* Just use full timestamp for interrupting event */ info->delta = info->ts; - barrier(); check_buffer(cpu_buffer, info, tail); - if (unlikely(info->ts != save_before)) { - /* SLOW PATH - Interrupted between C and E */ - - a_ok = rb_time_read(&cpu_buffer->write_stamp, &info->after); - RB_WARN_ON(cpu_buffer, !a_ok); - - /* Write stamp must only go forward */ - if (save_before > info->after) { - /* - * We do not care about the result, only that - * it gets updated atomically. - */ - (void)rb_time_cmpxchg(&cpu_buffer->write_stamp, - info->after, save_before); - } - } } else { u64 ts; /* SLOW PATH - Interrupted between A and C */
From: "Steven Rostedt (Google)" rostedt@goodmis.org
stable inclusion from stable-v6.6.8 commit 3432f9686a37a591da81bff73d15da3735e4bac8 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 0aa0e5289cfe984a8a9fdd79ccf46ccf080151f7 upstream.
The rb_time_cmpxchg() on 32-bit architectures requires setting three 32-bit words to represent the 64-bit timestamp, with some salt for synchronization. Those are: msb, top, and bottom
The issue is, the rb_time_cmpxchg() did not properly salt the msb portion, and the msb that was written was stale.
Link: https://lore.kernel.org/linux-trace-kernel/20231215084114.20899342@rorschach...
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Fixes: f03f2abce4f39 ("ring-buffer: Have 32 bit time stamps use all 64 bits") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- kernel/trace/ring_buffer.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 08eb6c310c38..91cb774fb7c1 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -729,10 +729,12 @@ static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set) cnt2 = cnt + 1;
rb_time_split(val, &top, &bottom, &msb); + msb = rb_time_val_cnt(msb, cnt); top = rb_time_val_cnt(top, cnt); bottom = rb_time_val_cnt(bottom, cnt);
rb_time_split(set, &top2, &bottom2, &msb2); + msb2 = rb_time_val_cnt(msb2, cnt); top2 = rb_time_val_cnt(top2, cnt2); bottom2 = rb_time_val_cnt(bottom2, cnt2);
From: Fangrui Song maskray@google.com
stable inclusion from stable-v6.6.8 commit 06f61af8025452514945657e9b4cb1c9ba8967c5 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit b8ec60e1186cdcfce41e7db4c827cb107e459002 upstream.
.discard.retpoline_safe sections do not have the SHF_ALLOC flag. These sections referencing text sections' STT_SECTION symbols with PC-relative relocations like R_386_PC32 [0] is conceptually not suitable. Newer LLD will report warnings for REL relocations even for relocatable links [1]:
ld.lld: warning: vmlinux.a(drivers/i2c/busses/i2c-i801.o):(.discard.retpoline_safe+0x120): has non-ABS relocation R_386_PC32 against symbol ''
Switch to absolute relocations instead, which indicate link-time addresses. In a relocatable link, these addresses are also output section offsets, used by checks in tools/objtool/check.c. When linking vmlinux, these .discard.* sections will be discarded, therefore it is not a problem that R_X86_64_32 cannot represent a kernel address.
Alternatively, we could set the SHF_ALLOC flag for .discard.* sections, but I think non-SHF_ALLOC for sections to be discarded makes more sense.
Note: if we decide to never support REL architectures (e.g. arm, i386), we can utilize R_*_NONE relocations (.reloc ., BFD_RELOC_NONE, sym), making .discard.* sections zero-sized. That said, the section content waste is 4 bytes per entry, much smaller than sizeof(Elf{32,64}_Rel).
[0] commit 1c0c1faf5692 ("objtool: Use relative pointers for annotations") [1] https://github.com/ClangBuiltLinux/linux/issues/1937
Signed-off-by: Fangrui Song maskray@google.com Signed-off-by: Ingo Molnar mingo@kernel.org Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Josh Poimboeuf jpoimboe@redhat.com Link: https://lore.kernel.org/r/20230920001728.1439947-1-maskray@google.com Cc: Nathan Chancellor nathan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- arch/x86/include/asm/alternative.h | 4 ++-- arch/x86/include/asm/nospec-branch.h | 4 ++-- include/linux/objtool.h | 10 +++++----- 3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index 9c4da699e11a..65f79092c9d9 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -58,7 +58,7 @@ #define ANNOTATE_IGNORE_ALTERNATIVE \ "999:\n\t" \ ".pushsection .discard.ignore_alts\n\t" \ - ".long 999b - .\n\t" \ + ".long 999b\n\t" \ ".popsection\n\t"
/* @@ -352,7 +352,7 @@ static inline int alternatives_text_reserved(void *start, void *end) .macro ANNOTATE_IGNORE_ALTERNATIVE .Lannotate_@: .pushsection .discard.ignore_alts - .long .Lannotate_@ - . + .long .Lannotate_@ .popsection .endm
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index 197ff4f4d1ce..0396458c201f 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -196,7 +196,7 @@ .macro ANNOTATE_RETPOLINE_SAFE .Lhere_@: .pushsection .discard.retpoline_safe - .long .Lhere_@ - . + .long .Lhere_@ .popsection .endm
@@ -334,7 +334,7 @@ #define ANNOTATE_RETPOLINE_SAFE \ "999:\n\t" \ ".pushsection .discard.retpoline_safe\n\t" \ - ".long 999b - .\n\t" \ + ".long 999b\n\t" \ ".popsection\n\t"
typedef u8 retpoline_thunk_t[RETPOLINE_THUNK_SIZE]; diff --git a/include/linux/objtool.h b/include/linux/objtool.h index b5440e7da55b..33212e93f4a6 100644 --- a/include/linux/objtool.h +++ b/include/linux/objtool.h @@ -48,13 +48,13 @@ #define ANNOTATE_NOENDBR \ "986: \n\t" \ ".pushsection .discard.noendbr\n\t" \ - ".long 986b - .\n\t" \ + ".long 986b\n\t" \ ".popsection\n\t"
#define ASM_REACHABLE \ "998:\n\t" \ ".pushsection .discard.reachable\n\t" \ - ".long 998b - .\n\t" \ + ".long 998b\n\t" \ ".popsection\n\t"
#else /* __ASSEMBLY__ */ @@ -66,7 +66,7 @@ #define ANNOTATE_INTRA_FUNCTION_CALL \ 999: \ .pushsection .discard.intra_function_calls; \ - .long 999b - .; \ + .long 999b; \ .popsection;
/* @@ -118,7 +118,7 @@ .macro ANNOTATE_NOENDBR .Lhere_@: .pushsection .discard.noendbr - .long .Lhere_@ - . + .long .Lhere_@ .popsection .endm
@@ -142,7 +142,7 @@ .macro REACHABLE .Lhere_@: .pushsection .discard.reachable - .long .Lhere_@ - . + .long .Lhere_@ .popsection .endm
From: Patrisious Haddad phaddad@nvidia.com
stable inclusion from stable-v6.6.8 commit 885faf3c7e5f53561c34104d140f2e321ea305a6 bugzilla: https://gitee.com/openeuler/kernel/issues/I99K53
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 02e7d139e5e24abb5fde91934fc9dc0344ac1926 upstream.
Change the key that we send from IB driver to EN driver regarding the MPV device affiliation, since at that stage the IB device is not yet initialized, so its index would be zero for different IB devices and cause wrong associations between unrelated master and slave devices.
Instead use a unique value from inside the core device which is already initialized at this stage.
Fixes: 0d293714ac32 ("RDMA/mlx5: Send events from IB driver about device affiliation state") Signed-off-by: Patrisious Haddad phaddad@nvidia.com Link: https://lore.kernel.org/r/ac7e66357d963fc68d7a419515180212c96d137d.169770518... Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- drivers/infiniband/hw/mlx5/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 4c4233b9c8b0..102ead497196 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -3263,7 +3263,7 @@ static bool mlx5_ib_bind_slave_port(struct mlx5_ib_dev *ibdev,
mlx5_ib_init_cong_debugfs(ibdev, port_num);
- key = ibdev->ib_dev.index; + key = mpi->mdev->priv.adev_idx; mlx5_core_mp_event_replay(mpi->mdev, MLX5_DRIVER_EVENT_AFFILIATION_DONE, &key);
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/5318 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/N...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/5318 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/N...