Backport 5.10.140 LTS patches from upstream. git cherry-pick v5.10.139..v5.10.140~1 -s
Already merged(-9): 6a564bad3a64 xfs: prevent a WARN_ONCE() in xfs_ioc_attr_list() 1b9b4139d794 xfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP* 37837bc3ef31 xfs: return errors in xfs_fs_sync_fs 70d560e2fb5e xfs: only bother with sync_filesystem during readonly remount c5c4d4c9806d af_key: Do not call xfrm_probe_algs in parallel c08a104a8bce netfilter: nf_tables: disallow binding to already bound chain e3e0d117294d x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry 3ddbd0907f6d blk-mq: fix io hung due to missing commit_rqs e89798071784 bpf: Don't use tnum_range on array range checking for poke descriptors
Context conflict: 4379a10c1db7 xfrm: clone missing x->lastused in xfrm_do_migrate 6301a73bd83d netfilter: nf_tables: disallow jump to implicit chain from set element
Skipped(-1): 2761612bcde9 xfrm: policy: fix metadata dst->dev xmit null pointer dereference due to 5b7f84b1f9f4 ("xfrm: Add possibility to set the default to block if we have no policy") skipped
Total patches: 86 - 9 - 1 = 76
Alex Elder (1): net: ipa: don't assume SMEM is page-aligned
Anand Jain (2): btrfs: replace: drop assert for suspended replace btrfs: add info when mount fails due to stale replace target
Antony Antony (1): xfrm: clone missing x->lastused in xfrm_do_migrate
Basavaraj Natikar (1): pinctrl: amd: Don't save/restore interrupt status and wake status bits
Bernard Pidoux (1): rose: check NULL rose_loopback_neigh->loopback
Brian Foster (1): s390: fix double free of GS and RI CBs on fork() failure
Christoph Hellwig (1): fs: remove __sync_filesystem
Colin Ian King (1): netfilter: nftables: remove redundant assignment of variable err
Darrick J. Wong (1): vfs: make sync_filesystem return errors from ->sync_fs
David Hildenbrand (1): mm/hugetlb: fix hugetlb not supporting softdirty tracking
David Howells (1): rxrpc: Fix locking in rxrpc's sendmsg
Duoming Zhou (1): nfc: pn533: Fix use-after-free bugs caused by pn532_cmd_timeout
Eric Dumazet (1): tcp: tweak len/truesize ratio for coalesce candidates
Filipe Manana (1): btrfs: fix silent failure when deleting root reference
Florian Westphal (1): netfilter: ebtables: reject blobs that don't provide all entry points
Gaosheng Cui (1): audit: fix potential double free on error path from fsnotify_add_inode_mark
Goldwyn Rodrigues (1): btrfs: check if root is readonly while setting security xattr
Guoqing Jiang (2): Revert "md-raid: destroy the bitmap after destroying the thread" md: call __md_stop_writes in md_stop
Helge Deller (1): parisc: Fix exception handler for fldw and fstw instructions
Hui Su (1): kernel/sched: Remove dl_boosted flag comment
Jacob Keller (1): ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
James Clark (1): perf python: Fix build when PYTHON_CONFIG is user supplied
Jeremy Sowden (1): netfilter: bitwise: improve error goto labels
Jonathan Toppins (1): bonding: 802.3ad: fix no transmission of LACPDUs
Juergen Gross (1): xen/privcmd: fix error exit of privcmd_ioctl_dm_op()
Kan Liang (1): perf/x86/lbr: Enable the branch type for the Arch LBR by default
Kiwoong Kim (1): scsi: ufs: core: Enable link lost interrupt
Kuniyuki Iwashima (15): net: Fix data-races around sysctl_[rw]mem(_offset)?. net: Fix data-races around sysctl_[rw]mem_(max|default). net: Fix data-races around weight_p and dev_weight_[rt]x_bias. net: Fix data-races around netdev_max_backlog. net: Fix data-races around netdev_tstamp_prequeue. ratelimit: Fix data-races in ___ratelimit(). net: Fix data-races around sysctl_optmem_max. net: Fix a data-race around sysctl_tstamp_allow_data. net: Fix a data-race around sysctl_net_busy_poll. net: Fix a data-race around sysctl_net_busy_read. net: Fix a data-race around netdev_budget. net: Fix a data-race around netdev_budget_usecs. net: Fix data-races around sysctl_fb_tunnels_only_for_init_net. net: Fix data-races around sysctl_devconf_inherit_init_net. net: Fix a data-race around sysctl_somaxconn.
Maciej Fijalkowski (2): ice: xsk: Force rings to be sized to power of 2 ice: xsk: prohibit usage of non-balanced queue id
Maciej Żenczykowski (1): net: ipvtap - add __init/__exit annotations to module init/exit funcs
Martin KaFai Lau (1): bpf: Folding omem_charge() into sk_storage_charge()
Miaohe Lin (1): mm/huge_memory.c: use helper function migration_entry_to_page()
Olga Kornievskaia (1): NFSv4.2 fix problems with __nfs42_ssc_open
Pablo Neira Ayuso (9): netfilter: nft_payload: report ERANGE for too long offset and length netfilter: nft_payload: do not truncate csum_offset and csum_type netfilter: nf_tables: do not leave chain stats enabled on error netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families netfilter: nft_tunnel: restrict it to netdev family netfilter: nf_tables: consolidate rule verdict trace call netfilter: nft_cmp: optimize comparison for 16-bytes netfilter: nf_tables: upfront validation of data via nft_data_init() netfilter: nf_tables: disallow jump to implicit chain from set element
Pawan Gupta (1): x86/bugs: Add "unknown" reporting for MMIO Stale Data
Peter Xu (1): mm/smaps: don't access young/dirty bit if pte unpresent
Quanyang Wang (1): asm-generic: sections: refactor memory_intersects
Randy Dunlap (1): kernel/sys_ni: add compat entry for fadvise64_64
Riwen Lu (1): ACPI: processor: Remove freq Qos request for all CPUs
Salvatore Bonaccorso (1): Documentation/ABI: Mention retbleed vulnerability info file for sysfs
Saurabh Sengar (1): scsi: storvsc: Remove WQ_MEM_RECLAIM from storvsc_error_wq
Sergei Antonov (1): net: moxa: get rid of asymmetry in DMA mapping/unmapping
Shannon Nelson (1): ionic: fix up issues with handling EAGAIN on FW cmds
Siddh Raman Pant (1): loop: Check for overflow while configuring loop
Stephane Eranian (1): perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU
Trond Myklebust (2): NFS: Don't allocate nfs_fattr on the stack in __nfs42_ssc_open() SUNRPC: RPC level errors should set task->tk_rpc_status
Vikas Gupta (1): bnxt_en: fix NQ resource accounting during vf creation on 57500 chips
Vlad Buslov (1): net/mlx5e: Properly disable vlan strip on non-UL reps
Xin Xiong (1): xfrm: fix refcount leak in __xfrm_policy_check()
Zenghui Yu (1): arm64: Fix match_list for erratum 1286807 on Arm Cortex-A76
.../ABI/testing/sysfs-devices-system-cpu | 1 + .../hw-vuln/processor_mmio_stale_data.rst | 14 ++ Documentation/admin-guide/sysctl/net.rst | 2 +- arch/arm64/kernel/cpu_errata.c | 2 + arch/parisc/kernel/unaligned.c | 2 +- arch/s390/kernel/process.c | 22 ++- arch/x86/events/intel/lbr.c | 8 + arch/x86/events/intel/uncore_snb.c | 18 ++- arch/x86/include/asm/cpufeatures.h | 5 +- arch/x86/kernel/cpu/bugs.c | 14 +- arch/x86/kernel/cpu/common.c | 42 ++++-- drivers/acpi/processor_thermal.c | 2 +- drivers/block/loop.c | 5 + drivers/md/md.c | 3 +- drivers/net/bonding/bond_3ad.c | 38 ++--- .../net/ethernet/broadcom/bnxt/bnxt_sriov.c | 2 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 14 ++ drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 59 ++++++-- .../net/ethernet/mellanox/mlx5/core/en_rep.c | 2 + drivers/net/ethernet/moxa/moxart_ether.c | 11 +- .../net/ethernet/pensando/ionic/ionic_main.c | 4 +- drivers/net/ipa/ipa_mem.c | 2 +- drivers/net/ipvlan/ipvtap.c | 4 +- drivers/nfc/pn533/uart.c | 1 + drivers/pinctrl/pinctrl-amd.c | 11 +- drivers/scsi/storvsc_drv.c | 2 +- drivers/scsi/ufs/ufshci.h | 6 +- drivers/xen/privcmd.c | 21 +-- fs/btrfs/dev-replace.c | 5 +- fs/btrfs/root-tree.c | 5 +- fs/btrfs/xattr.c | 3 + fs/nfs/nfs4file.c | 16 +- fs/proc/task_mmu.c | 7 +- fs/sync.c | 48 +++--- include/asm-generic/sections.h | 7 +- include/linux/netdevice.h | 20 ++- include/linux/netfilter_bridge/ebtables.h | 4 - include/linux/sched.h | 4 - include/net/busy_poll.h | 2 +- include/net/netfilter/nf_tables.h | 9 +- include/net/netfilter/nf_tables_core.h | 9 ++ include/net/sock.h | 8 +- kernel/audit_fsnotify.c | 1 + kernel/sys_ni.c | 1 + lib/ratelimit.c | 12 +- mm/huge_memory.c | 4 +- mm/mmap.c | 8 +- net/bridge/netfilter/ebtable_broute.c | 8 - net/bridge/netfilter/ebtable_filter.c | 8 - net/bridge/netfilter/ebtable_nat.c | 8 - net/bridge/netfilter/ebtables.c | 8 +- net/core/bpf_sk_storage.c | 24 ++- net/core/dev.c | 18 +-- net/core/filter.c | 13 +- net/core/gro_cells.c | 2 +- net/core/skbuff.c | 2 +- net/core/sock.c | 18 ++- net/core/sysctl_net_core.c | 15 +- net/decnet/af_decnet.c | 4 +- net/ipv4/devinet.c | 16 +- net/ipv4/ip_output.c | 2 +- net/ipv4/ip_sockglue.c | 6 +- net/ipv4/tcp.c | 6 +- net/ipv4/tcp_input.c | 51 +++++-- net/ipv4/tcp_output.c | 4 +- net/ipv6/addrconf.c | 5 +- net/ipv6/ipv6_sockglue.c | 4 +- net/mptcp/protocol.c | 6 +- net/netfilter/ipvs/ip_vs_sync.c | 4 +- net/netfilter/nf_tables_api.c | 88 ++++++----- net/netfilter/nf_tables_core.c | 55 ++++++- net/netfilter/nft_bitwise.c | 67 +++++---- net/netfilter/nft_cmp.c | 142 +++++++++++++++--- net/netfilter/nft_immediate.c | 22 ++- net/netfilter/nft_osf.c | 18 ++- net/netfilter/nft_payload.c | 29 +++- net/netfilter/nft_range.c | 27 ++-- net/netfilter/nft_tunnel.c | 1 + net/rose/rose_loopback.c | 3 +- net/rxrpc/call_object.c | 4 +- net/rxrpc/sendmsg.c | 92 +++++++----- net/sched/sch_generic.c | 2 +- net/socket.c | 2 +- net/sunrpc/clnt.c | 2 +- net/tipc/socket.c | 2 +- net/xfrm/espintcp.c | 2 +- net/xfrm/xfrm_input.c | 2 +- net/xfrm/xfrm_policy.c | 1 + net/xfrm/xfrm_state.c | 1 + tools/perf/Makefile.config | 2 +- 90 files changed, 834 insertions(+), 452 deletions(-)
From: Gaosheng Cui cuigaosheng1@huawei.com
stable inclusion from stable-v5.10.140 commit e10bb2f2e99b01ab7f9ec965735dcb4592b5490a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit ad982c3be4e60c7d39c03f782733503cbd88fd2a upstream.
Audit_alloc_mark() assign pathname to audit_mark->path, on error path from fsnotify_add_inode_mark(), fsnotify_put_mark will free memory of audit_mark->path, but the caller of audit_alloc_mark will free the pathname again, so there will be double free problem.
Fix this by resetting audit_mark->path to NULL pointer on error path from fsnotify_add_inode_mark().
Cc: stable@vger.kernel.org Fixes: 7b1293234084d ("fsnotify: Add group pointer in fsnotify_init_mark()") Signed-off-by: Gaosheng Cui cuigaosheng1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- kernel/audit_fsnotify.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c index 5b3f01da172b..b2ebacd2f309 100644 --- a/kernel/audit_fsnotify.c +++ b/kernel/audit_fsnotify.c @@ -102,6 +102,7 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, true); if (ret < 0) { + audit_mark->path = NULL; fsnotify_put_mark(&audit_mark->mark); audit_mark = ERR_PTR(ret); }
From: Helge Deller deller@gmx.de
stable inclusion from stable-v5.10.140 commit df1d445e7fcfbc8e4bbed4ce936977b4a7642cd9 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 7ae1f5508d9a33fd58ed3059bd2d569961e3b8bd upstream.
The exception handler is broken for unaligned memory acceses with fldw and fstw instructions, because it trashes or uses randomly some other floating point register than the one specified in the instruction word on loads and stores.
The instruction "fldw 0(addr),%fr22L" (and the other fldw/fstw instructions) encode the target register (%fr22) in the rightmost 5 bits of the instruction word. The 7th rightmost bit of the instruction word defines if the left or right half of %fr22 should be used.
While processing unaligned address accesses, the FR3() define is used to extract the offset into the local floating-point register set. But the calculation in FR3() was buggy, so that for example instead of %fr22, register %fr12 [((22 * 2) & 0x1f) = 12] was used.
This bug has been since forever in the parisc kernel and I wonder why it wasn't detected earlier. Interestingly I noticed this bug just because the libime debian package failed to build on *native* hardware, while it successfully built in qemu.
This patch corrects the bitshift and masking calculation in FR3().
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- arch/parisc/kernel/unaligned.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/parisc/kernel/unaligned.c b/arch/parisc/kernel/unaligned.c index 286cec4d86d7..cc6ed7496050 100644 --- a/arch/parisc/kernel/unaligned.c +++ b/arch/parisc/kernel/unaligned.c @@ -107,7 +107,7 @@ #define R1(i) (((i)>>21)&0x1f) #define R2(i) (((i)>>16)&0x1f) #define R3(i) ((i)&0x1f) -#define FR3(i) ((((i)<<1)&0x1f)|(((i)>>6)&1)) +#define FR3(i) ((((i)&0x1f)<<1)|(((i)>>6)&1)) #define IM(i,n) (((i)>>1&((1<<(n-1))-1))|((i)&1?((0-1L)<<(n-1)):0)) #define IM5_2(i) IM((i)>>16,5) #define IM5_3(i) IM((i),5)
From: Randy Dunlap rdunlap@infradead.org
stable inclusion from stable-v5.10.140 commit 665433b5ddc29901c2611c4bfd7ae56e402307eb category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit a8faed3a02eeb75857a3b5d660fa80fe79db77a3 upstream.
When CONFIG_ADVISE_SYSCALLS is not set/enabled and CONFIG_COMPAT is set/enabled, the riscv compat_syscall_table references 'compat_sys_fadvise64_64', which is not defined:
riscv64-linux-ld: arch/riscv/kernel/compat_syscall_table.o:(.rodata+0x6f8): undefined reference to `compat_sys_fadvise64_64'
Add 'fadvise64_64' to kernel/sys_ni.c as a conditional COMPAT function so that when CONFIG_ADVISE_SYSCALLS is not set, there is a fallback function available.
Link: https://lkml.kernel.org/r/20220807220934.5689-1-rdunlap@infradead.org Fixes: d3ac21cacc24 ("mm: Support compiling out madvise and fadvise") Signed-off-by: Randy Dunlap rdunlap@infradead.org Suggested-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Arnd Bergmann arnd@arndb.de Cc: Josh Triplett josh@joshtriplett.org Cc: Paul Walmsley paul.walmsley@sifive.com Cc: Palmer Dabbelt palmer@dabbelt.com Cc: Albert Ou aou@eecs.berkeley.edu Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- kernel/sys_ni.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index f27ac94d5fa7..cdecd47e5580 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -268,6 +268,7 @@ COND_SYSCALL_COMPAT(keyctl);
/* mm/fadvise.c */ COND_SYSCALL(fadvise64_64); +COND_SYSCALL_COMPAT(fadvise64_64);
/* mm/, CONFIG_MMU only */ COND_SYSCALL(swapon);
From: Basavaraj Natikar Basavaraj.Natikar@amd.com
stable inclusion from stable-v5.10.140 commit a5757df6128b9d7ac923fa59aceff9f212b36106 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit b8c824a869f220c6b46df724f85794349bafbf23 upstream.
Saving/restoring interrupt and wake status bits across suspend can cause the suspend to fail if an IRQ is serviced across the suspend cycle.
Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Basavaraj Natikar Basavaraj.Natikar@amd.com Fixes: 79d2c8bede2c ("pinctrl/amd: save pin registers over suspend/resume") Link: https://lore.kernel.org/r/20220613064127.220416-3-Basavaraj.Natikar@amd.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/pinctrl/pinctrl-amd.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c index e20bcc835d6a..82b658a3c220 100644 --- a/drivers/pinctrl/pinctrl-amd.c +++ b/drivers/pinctrl/pinctrl-amd.c @@ -815,6 +815,7 @@ static int amd_gpio_suspend(struct device *dev) { struct amd_gpio *gpio_dev = dev_get_drvdata(dev); struct pinctrl_desc *desc = gpio_dev->pctrl->desc; + unsigned long flags; int i;
for (i = 0; i < desc->npins; i++) { @@ -823,7 +824,9 @@ static int amd_gpio_suspend(struct device *dev) if (!amd_gpio_should_save(gpio_dev, pin)) continue;
- gpio_dev->saved_regs[i] = readl(gpio_dev->base + pin*4); + raw_spin_lock_irqsave(&gpio_dev->lock, flags); + gpio_dev->saved_regs[i] = readl(gpio_dev->base + pin * 4) & ~PIN_IRQ_PENDING; + raw_spin_unlock_irqrestore(&gpio_dev->lock, flags); }
return 0; @@ -833,6 +836,7 @@ static int amd_gpio_resume(struct device *dev) { struct amd_gpio *gpio_dev = dev_get_drvdata(dev); struct pinctrl_desc *desc = gpio_dev->pctrl->desc; + unsigned long flags; int i;
for (i = 0; i < desc->npins; i++) { @@ -841,7 +845,10 @@ static int amd_gpio_resume(struct device *dev) if (!amd_gpio_should_save(gpio_dev, pin)) continue;
- writel(gpio_dev->saved_regs[i], gpio_dev->base + pin*4); + raw_spin_lock_irqsave(&gpio_dev->lock, flags); + gpio_dev->saved_regs[i] |= readl(gpio_dev->base + pin * 4) & PIN_IRQ_PENDING; + writel(gpio_dev->saved_regs[i], gpio_dev->base + pin * 4); + raw_spin_unlock_irqrestore(&gpio_dev->lock, flags); }
return 0;
From: Christoph Hellwig hch@lst.de
stable inclusion from stable-v5.10.140 commit 9255a42fe7ab65bc1972ed7452966199d47b7009 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 9a208ba5c9afa62c7b1e9c6f5e783066e84e2d3c upstream.
[backported for dependency]
There is no clear benefit in having this helper vs just open coding it.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Chaitanya Kulkarni kch@nvidia.com Link: https://lore.kernel.org/r/20211019062530.2174626-2-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Amir Goldstein amir73il@gmail.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- fs/sync.c | 38 +++++++++++++++++--------------------- 1 file changed, 17 insertions(+), 21 deletions(-)
diff --git a/fs/sync.c b/fs/sync.c index 1373a610dc78..0d6cdc507cb9 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -21,25 +21,6 @@ #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \ SYNC_FILE_RANGE_WAIT_AFTER)
-/* - * Do the filesystem syncing work. For simple filesystems - * writeback_inodes_sb(sb) just dirties buffers with inodes so we have to - * submit IO for these buffers via __sync_blockdev(). This also speeds up the - * wait == 1 case since in that case write_inode() functions do - * sync_dirty_buffer() and thus effectively write one block at a time. - */ -static int __sync_filesystem(struct super_block *sb, int wait) -{ - if (wait) - sync_inodes_sb(sb); - else - writeback_inodes_sb(sb, WB_REASON_SYNC); - - if (sb->s_op->sync_fs) - sb->s_op->sync_fs(sb, wait); - return __sync_blockdev(sb->s_bdev, wait); -} - /* * Write out and wait upon all dirty data associated with this * superblock. Filesystem data as well as the underlying block @@ -61,10 +42,25 @@ int sync_filesystem(struct super_block *sb) if (sb_rdonly(sb)) return 0;
- ret = __sync_filesystem(sb, 0); + /* + * Do the filesystem syncing work. For simple filesystems + * writeback_inodes_sb(sb) just dirties buffers with inodes so we have + * to submit I/O for these buffers via __sync_blockdev(). This also + * speeds up the wait == 1 case since in that case write_inode() + * methods call sync_dirty_buffer() and thus effectively write one block + * at a time. + */ + writeback_inodes_sb(sb, WB_REASON_SYNC); + if (sb->s_op->sync_fs) + sb->s_op->sync_fs(sb, 0); + ret = __sync_blockdev(sb->s_bdev, 0); if (ret < 0) return ret; - return __sync_filesystem(sb, 1); + + sync_inodes_sb(sb); + if (sb->s_op->sync_fs) + sb->s_op->sync_fs(sb, 1); + return __sync_blockdev(sb->s_bdev, 1); } EXPORT_SYMBOL(sync_filesystem);
From: "Darrick J. Wong" djwong@kernel.org
stable inclusion from stable-v5.10.140 commit 76a51e49da9c9919b97225982d720da5315ebaa7 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 5679897eb104cec9e99609c3f045a0c20603da4c upstream.
[backport to 5.10 only differs in __sync_blockdev helper]
Strangely, sync_filesystem ignores the return code from the ->sync_fs call, which means that syscalls like syncfs(2) never see the error. This doesn't seem right, so fix that.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Christoph Hellwig hch@lst.de Acked-by: Christian Brauner brauner@kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- fs/sync.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/fs/sync.c b/fs/sync.c index 0d6cdc507cb9..79180e58d862 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -28,7 +28,7 @@ */ int sync_filesystem(struct super_block *sb) { - int ret; + int ret = 0;
/* * We need to be protected against the filesystem going from @@ -51,15 +51,21 @@ int sync_filesystem(struct super_block *sb) * at a time. */ writeback_inodes_sb(sb, WB_REASON_SYNC); - if (sb->s_op->sync_fs) - sb->s_op->sync_fs(sb, 0); + if (sb->s_op->sync_fs) { + ret = sb->s_op->sync_fs(sb, 0); + if (ret) + return ret; + } ret = __sync_blockdev(sb->s_bdev, 0); - if (ret < 0) + if (ret) return ret;
sync_inodes_sb(sb); - if (sb->s_op->sync_fs) - sb->s_op->sync_fs(sb, 1); + if (sb->s_op->sync_fs) { + ret = sb->s_op->sync_fs(sb, 1); + if (ret) + return ret; + } return __sync_blockdev(sb->s_bdev, 1); } EXPORT_SYMBOL(sync_filesystem);
From: Hui Su suhui_kernel@163.com
stable inclusion from stable-v5.10.140 commit c30c0f720533c6bcab2e16b90d302afb50a61d2a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 0e3872499de1a1230cef5221607d71aa09264bd5 upstream.
since commit 2279f540ea7d ("sched/deadline: Fix priority inheritance with multiple scheduling classes"), we should not keep it here.
Signed-off-by: Hui Su suhui_kernel@163.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Daniel Bristot de Oliveira bristot@redhat.com Link: https://lore.kernel.org/r/20220107095254.GA49258@localhost.localdomain Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- include/linux/sched.h | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h index c52aa9bc4acc..a84372945f9e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -567,10 +567,6 @@ struct sched_dl_entity { * task has to wait for a replenishment to be performed at the * next firing of dl_timer. * - * @dl_boosted tells if we are boosted due to DI. If so we are - * outside bandwidth enforcement mechanism (but only until we - * exit the critical section); - * * @dl_yielded tells if task gave up the CPU before consuming * all its available runtime during the last job. *
From: Xin Xiong xiongx18@fudan.edu.cn
stable inclusion from stable-v5.10.140 commit 1305d7d4f35ca6f214a2d23b075aa6a924cff3be category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 9c9cb23e00ddf45679b21b4dacc11d1ae7961ebe ]
The issue happens on an error path in __xfrm_policy_check(). When the fetching process of the object `pols[1]` fails, the function simply returns 0, forgetting to decrement the reference count of `pols[0]`, which is incremented earlier by either xfrm_sk_policy_lookup() or xfrm_policy_lookup(). This may result in memory leaks.
Fix it by decreasing the reference count of `pols[0]` in that path.
Fixes: 134b0fc544ba ("IPsec: propagate security module errors up from flow_cache_lookup") Signed-off-by: Xin Xiong xiongx18@fudan.edu.cn Signed-off-by: Xin Tan tanxin.ctf@gmail.com Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/xfrm/xfrm_policy.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 01a783fccf76..78a687d5ab3e 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -3631,6 +3631,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, if (pols[1]) { if (IS_ERR(pols[1])) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR); + xfrm_pol_put(pols[0]); return 0; } pols[1]->curlft.use_time = ktime_get_real_seconds();
From: Antony Antony antony.antony@secunet.com
stable inclusion from stable-v5.10.140 commit 4379a10c1db793ce39ea410e36ddc3099ec0694c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 6aa811acdb76facca0b705f4e4c1d948ccb6af8b ]
x->lastused was not cloned in xfrm_do_migrate. Add it to clone during migrate.
Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)") Signed-off-by: Antony Antony antony.antony@secunet.com Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com
Conflicts: net/xfrm/xfrm_state.c Reviewed-by: Wei Li liwei391@huawei.com --- net/xfrm/xfrm_state.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 25bf8df9bf31..ac2f1a7330c9 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1556,6 +1556,7 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig, x->km.seq = orig->km.seq; x->replay = orig->replay; x->preplay = orig->preplay; + x->lastused = orig->lastused;
return x;
From: Trond Myklebust trond.myklebust@hammerspace.com
stable inclusion from stable-v5.10.140 commit 23c6f25a60435f85e1a442dd5492e96a260dd731 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 156cd28562a4e8ca454d11b234d9f634a45d6390 ]
The preferred behaviour is always to allocate struct nfs_fattr from the slab.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- fs/nfs/nfs4file.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index f12494e08267..33481c8fc131 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -318,7 +318,7 @@ static int read_name_gen = 1; static struct file *__nfs42_ssc_open(struct vfsmount *ss_mnt, struct nfs_fh *src_fh, nfs4_stateid *stateid) { - struct nfs_fattr fattr; + struct nfs_fattr *fattr = nfs_alloc_fattr(); struct file *filep, *res; struct nfs_server *server; struct inode *r_ino = NULL; @@ -329,9 +329,10 @@ static struct file *__nfs42_ssc_open(struct vfsmount *ss_mnt,
server = NFS_SERVER(ss_mnt->mnt_root->d_inode);
- nfs_fattr_init(&fattr); + if (!fattr) + return ERR_PTR(-ENOMEM);
- status = nfs4_proc_getattr(server, src_fh, &fattr, NULL, NULL); + status = nfs4_proc_getattr(server, src_fh, fattr, NULL, NULL); if (status < 0) { res = ERR_PTR(status); goto out; @@ -344,7 +345,7 @@ static struct file *__nfs42_ssc_open(struct vfsmount *ss_mnt, goto out; snprintf(read_name, len, SSC_READ_NAME_BODY, read_name_gen++);
- r_ino = nfs_fhget(ss_mnt->mnt_root->d_inode->i_sb, src_fh, &fattr, + r_ino = nfs_fhget(ss_mnt->mnt_root->d_inode->i_sb, src_fh, fattr, NULL); if (IS_ERR(r_ino)) { res = ERR_CAST(r_ino); @@ -389,6 +390,7 @@ static struct file *__nfs42_ssc_open(struct vfsmount *ss_mnt, out_free_name: kfree(read_name); out: + nfs_free_fattr(fattr); return res; out_stateowner: nfs4_put_state_owner(sp);
From: Olga Kornievskaia kolga@netapp.com
stable inclusion from stable-v5.10.140 commit 5e49ea099850feadcbf33c74b4f514a3e8049b91 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit fcfc8be1e9cf2f12b50dce8b579b3ae54443a014 ]
A destination server while doing a COPY shouldn't accept using the passed in filehandle if its not a regular filehandle.
If alloc_file_pseudo() has failed, we need to decrement a reference on the newly created inode, otherwise it leaks.
Reported-by: Al Viro viro@zeniv.linux.org.uk Fixes: ec4b092508982 ("NFS: inter ssc open") Signed-off-by: Olga Kornievskaia kolga@netapp.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- fs/nfs/nfs4file.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index 33481c8fc131..fc95b2a0ef9f 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -338,6 +338,11 @@ static struct file *__nfs42_ssc_open(struct vfsmount *ss_mnt, goto out; }
+ if (!S_ISREG(fattr->mode)) { + res = ERR_PTR(-EBADF); + goto out; + } + res = ERR_PTR(-ENOMEM); len = strlen(SSC_READ_NAME_BODY) + 16; read_name = kzalloc(len, GFP_NOFS); @@ -356,6 +361,7 @@ static struct file *__nfs42_ssc_open(struct vfsmount *ss_mnt, r_ino->i_fop); if (IS_ERR(filep)) { res = ERR_CAST(filep); + iput(r_ino); goto out_free_name; } filep->f_mode |= FMODE_READ;
From: Trond Myklebust trond.myklebust@hammerspace.com
stable inclusion from stable-v5.10.140 commit 8be096f018e4d4a4d430c8c87c4009c7024ffb90 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit ed06fce0b034b2e25bd93430f5c4cbb28036cc1a ]
Fix up a case in call_encode() where we're failing to set task->tk_rpc_status when an RPC level error occurred.
Fixes: 9c5948c24869 ("SUNRPC: task should be exit if encode return EKEYEXPIRED more times") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/sunrpc/clnt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index c5af31312e0c..78c6648af782 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1867,7 +1867,7 @@ call_encode(struct rpc_task *task) break; case -EKEYEXPIRED: if (!task->tk_cred_retry) { - rpc_exit(task, task->tk_status); + rpc_call_rpcerror(task, task->tk_status); } else { task->tk_action = call_refresh; task->tk_cred_retry--;
From: Miaohe Lin linmiaohe@huawei.com
stable inclusion from stable-v5.10.140 commit c7c77185fa3e9f8c3358426c2584a5b1dc1fdf0f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit a44f89dc6c5f8ba70240b81a570260d29d04bcb0 ]
It's more recommended to use helper function migration_entry_to_page() to get the page via migration entry. We can also enjoy the PageLocked() check there.
Link: https://lkml.kernel.org/r/20210318122722.13135-7-linmiaohe@huawei.com Signed-off-by: Miaohe Lin linmiaohe@huawei.com Reviewed-by: Peter Xu peterx@redhat.com Cc: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Cc: Matthew Wilcox willy@infradead.org Cc: Michel Lespinasse walken@google.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Thomas Hellstrm (Intel) thomas_os@shipmail.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Wei Yang richard.weiyang@linux.alibaba.com Cc: William Kucharski william.kucharski@oracle.com Cc: Yang Shi yang.shi@linux.alibaba.com Cc: yuleixzhang yulei.kernel@gmail.com Cc: Zi Yan ziy@nvidia.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- mm/huge_memory.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index e6f1903f6f63..e20f47caeabb 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1707,7 +1707,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); entry = pmd_to_swp_entry(orig_pmd); - page = pfn_to_page(swp_offset(entry)); + page = migration_entry_to_page(entry); flush_needed = 0; } else WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); @@ -2128,7 +2128,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, swp_entry_t entry;
entry = pmd_to_swp_entry(old_pmd); - page = pfn_to_page(swp_offset(entry)); + page = migration_entry_to_page(entry); write = is_write_migration_entry(entry); young = false; soft_dirty = pmd_swp_soft_dirty(old_pmd);
From: Peter Xu peterx@redhat.com
stable inclusion from stable-v5.10.140 commit e9fe1283a88c658773112b261df31ae177119caa category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit efd4149342db2df41b1bbe68972ead853b30e444 ]
These bits should only be valid when the ptes are present. Introducing two booleans for it and set it to false when !pte_present() for both pte and pmd accountings.
The bug is found during code reading and no real world issue reported, but logically such an error can cause incorrect readings for either smaps or smaps_rollup output on quite a few fields.
For example, it could cause over-estimate on values like Shared_Dirty, Private_Dirty, Referenced. Or it could also cause under-estimate on values like LazyFree, Shared_Clean, Private_Clean.
Link: https://lkml.kernel.org/r/20220805160003.58929-1-peterx@redhat.com Fixes: b1d4d9e0cbd0 ("proc/smaps: carefully handle migration entries") Fixes: c94b6923fa0a ("/proc/PID/smaps: Add PMD migration entry parsing") Signed-off-by: Peter Xu peterx@redhat.com Reviewed-by: Vlastimil Babka vbabka@suse.cz Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Yang Shi shy828301@gmail.com Cc: Konstantin Khlebnikov khlebnikov@openvz.org Cc: Huang Ying ying.huang@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- fs/proc/task_mmu.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index f96d081b881a..81a871ce7b17 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -505,10 +505,12 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr, struct vm_area_struct *vma = walk->vma; bool locked = !!(vma->vm_flags & VM_LOCKED); struct page *page = NULL; - bool migration = false; + bool migration = false, young = false, dirty = false;
if (pte_present(*pte)) { page = vm_normal_page(vma, addr, *pte); + young = pte_young(*pte); + dirty = pte_dirty(*pte); } else if (is_swap_pte(*pte)) { swp_entry_t swpent = pte_to_swp_entry(*pte);
@@ -542,8 +544,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr, if (!page) return;
- smaps_account(mss, page, false, pte_young(*pte), pte_dirty(*pte), - locked, migration); + smaps_account(mss, page, false, young, dirty, locked, migration); }
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
From: Bernard Pidoux f6bvp@free.fr
stable inclusion from stable-v5.10.140 commit de3deadd11987070788b48825bec4647458b988d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 3c53cd65dece47dd1f9d3a809f32e59d1d87b2b8 ]
Commit 3b3fd068c56e3fbea30090859216a368398e39bf added NULL check for `rose_loopback_neigh->dev` in rose_loopback_timer() but omitted to check rose_loopback_neigh->loopback.
It thus prevents *all* rose connect.
The reason is that a special rose_neigh loopback has a NULL device.
/proc/net/rose_neigh illustrates it via rose_neigh_show() function : [...] seq_printf(seq, "%05d %-9s %-4s %3d %3d %3s %3s %3lu %3lu", rose_neigh->number, (rose_neigh->loopback) ? "RSLOOP-0" : ax2asc(buf, &rose_neigh->callsign), rose_neigh->dev ? rose_neigh->dev->name : "???", rose_neigh->count,
/proc/net/rose_neigh displays special rose_loopback_neigh->loopback as callsign RSLOOP-0:
addr callsign dev count use mode restart t0 tf digipeaters 00001 RSLOOP-0 ??? 1 2 DCE yes 0 0
By checking rose_loopback_neigh->loopback, rose_rx_call_request() is called even in case rose_loopback_neigh->dev is NULL. This repairs rose connections.
Verification with rose client application FPAC:
FPAC-Node v 4.1.3 (built Aug 5 2022) for LINUX (help = h) F6BVP-4 (Commands = ?) : u Users - AX.25 Level 2 sessions : Port Callsign Callsign AX.25 state ROSE state NetRom status axudp F6BVP-5 -> F6BVP-9 Connected Connected ---------
Fixes: 3b3fd068c56e ("rose: Fix Null pointer dereference in rose_send_frame()") Signed-off-by: Bernard Pidoux f6bvp@free.fr Suggested-by: Francois Romieu romieu@fr.zoreil.com Cc: Thomas DL9SAU Osterried thomas@osterried.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/rose/rose_loopback.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/rose/rose_loopback.c b/net/rose/rose_loopback.c index 11c45c8c6c16..036d92c0ad79 100644 --- a/net/rose/rose_loopback.c +++ b/net/rose/rose_loopback.c @@ -96,7 +96,8 @@ static void rose_loopback_timer(struct timer_list *unused) }
if (frametype == ROSE_CALL_REQUEST) { - if (!rose_loopback_neigh->dev) { + if (!rose_loopback_neigh->dev && + !rose_loopback_neigh->loopback) { kfree_skb(skb); continue; }
From: Duoming Zhou duoming@zju.edu.cn
stable inclusion from stable-v5.10.140 commit 50403ee6daddf0d7a14e9d3b51a377c39a08ec8c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit f1e941dbf80a9b8bab0bffbc4cbe41cc7f4c6fb6 ]
When the pn532 uart device is detaching, the pn532_uart_remove() is called. But there are no functions in pn532_uart_remove() that could delete the cmd_timeout timer, which will cause use-after-free bugs. The process is shown below:
(thread 1) | (thread 2) | pn532_uart_send_frame pn532_uart_remove | mod_timer(&pn532->cmd_timeout,...) ... | (wait a time) kfree(pn532) //FREE | pn532_cmd_timeout | pn532_uart_send_frame | pn532->... //USE
This patch adds del_timer_sync() in pn532_uart_remove() in order to prevent the use-after-free bugs. What's more, the pn53x_unregister_nfc() is well synchronized, it sets nfc_dev->shutting_down to true and there are no syscalls could restart the cmd_timeout timer.
Fixes: c656aa4c27b1 ("nfc: pn533: add UART phy driver") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/nfc/pn533/uart.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/nfc/pn533/uart.c b/drivers/nfc/pn533/uart.c index a0665d8ea85b..e92535ebb528 100644 --- a/drivers/nfc/pn533/uart.c +++ b/drivers/nfc/pn533/uart.c @@ -310,6 +310,7 @@ static void pn532_uart_remove(struct serdev_device *serdev) pn53x_unregister_nfc(pn532->priv); serdev_device_close(serdev); pn53x_common_clean(pn532->priv); + del_timer_sync(&pn532->cmd_timeout); kfree_skb(pn532->recv_skb); kfree(pn532); }
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
stable inclusion from stable-v5.10.140 commit d29d7108e19e1c471e398bf85028999c0d0e4eae category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 296f13ff3854535009a185aaf8e3603266d39d94 ]
With the upcoming introduction of batching to XSK data path, performance wise it will be the best to have the ring descriptor count to be aligned to power of 2.
Check if ring sizes that user is going to attach the XSK socket fulfill the condition above. For Tx side, although check is being done against the Tx queue and in the end the socket will be attached to the XDP queue, it is fine since XDP queues get the ring->count setting from Tx queues.
Suggested-by: Alexander Lobakin alexandr.lobakin@intel.com Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Alexander Lobakin alexandr.lobakin@intel.com Acked-by: Magnus Karlsson magnus.karlsson@intel.com Link: https://lore.kernel.org/bpf/20220125160446.78976-3-maciej.fijalkowski@intel.... Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/net/ethernet/intel/ice/ice_xsk.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 5733526fa245..4bb62950d92d 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -371,6 +371,13 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid) bool if_running, pool_present = !!pool; int ret = 0, pool_failure = 0;
+ if (!is_power_of_2(vsi->rx_rings[qid]->count) || + !is_power_of_2(vsi->tx_rings[qid]->count)) { + netdev_err(vsi->netdev, "Please align ring sizes to power of 2\n"); + pool_failure = -EINVAL; + goto failure; + } + if_running = netif_running(vsi->netdev) && ice_is_xdp_ena_vsi(vsi);
if (if_running) { @@ -393,6 +400,7 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid) netdev_err(vsi->netdev, "ice_qp_ena error = %d\n", ret); }
+failure: if (pool_failure) { netdev_err(vsi->netdev, "Could not %sable buffer pool, error = %d\n", pool_present ? "en" : "dis", pool_failure);
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
stable inclusion from stable-v5.10.140 commit 1bfdcde723d8ceb2d73291b0415767e7c1cc1d8a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 5a42f112d367bb4700a8a41f5c12724fde6bfbb9 ]
Fix the following scenario: 1. ethtool -L $IFACE rx 8 tx 96 2. xdpsock -q 10 -t -z
Above refers to a case where user would like to attach XSK socket in txonly mode at a queue id that does not have a corresponding Rx queue. At this moment ice's XSK logic is tightly bound to act on a "queue pair", e.g. both Tx and Rx queues at a given queue id are disabled/enabled and both of them will get XSK pool assigned, which is broken for the presented queue configuration. This results in the splat included at the bottom, which is basically an OOB access to Rx ring array.
To fix this, allow using the ids only in scope of "combined" queues reported by ethtool. However, logic should be rewritten to allow such configurations later on, which would end up as a complete rewrite of the control path, so let us go with this temporary fix.
[420160.558008] BUG: kernel NULL pointer dereference, address: 0000000000000082 [420160.566359] #PF: supervisor read access in kernel mode [420160.572657] #PF: error_code(0x0000) - not-present page [420160.579002] PGD 0 P4D 0 [420160.582756] Oops: 0000 [#1] PREEMPT SMP NOPTI [420160.588396] CPU: 10 PID: 21232 Comm: xdpsock Tainted: G OE 5.19.0-rc7+ #10 [420160.597893] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [420160.609894] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice] [420160.616968] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85 [420160.639421] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282 [420160.646650] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8 [420160.655893] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000 [420160.665166] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000 [420160.674493] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a [420160.683833] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828 [420160.693211] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000 [420160.703645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [420160.711783] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0 [420160.721399] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [420160.731045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [420160.740707] PKRU: 55555554 [420160.745960] Call Trace: [420160.750962] <TASK> [420160.755597] ? kmalloc_large_node+0x79/0x90 [420160.762703] ? __kmalloc_node+0x3f5/0x4b0 [420160.769341] xp_assign_dev+0xfd/0x210 [420160.775661] ? shmem_file_read_iter+0x29a/0x420 [420160.782896] xsk_bind+0x152/0x490 [420160.788943] __sys_bind+0xd0/0x100 [420160.795097] ? exit_to_user_mode_prepare+0x20/0x120 [420160.802801] __x64_sys_bind+0x16/0x20 [420160.809298] do_syscall_64+0x38/0x90 [420160.815741] entry_SYSCALL_64_after_hwframe+0x63/0xcd [420160.823731] RIP: 0033:0x7fa86a0dd2fb [420160.830264] Code: c3 66 0f 1f 44 00 00 48 8b 15 69 8b 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 8b 0c 00 f7 d8 64 89 01 48 [420160.855410] RSP: 002b:00007ffc1146f618 EFLAGS: 00000246 ORIG_RAX: 0000000000000031 [420160.866366] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa86a0dd2fb [420160.876957] RDX: 0000000000000010 RSI: 00007ffc1146f680 RDI: 0000000000000003 [420160.887604] RBP: 000055d7113a0520 R08: 00007fa868fb8000 R09: 0000000080000000 [420160.898293] R10: 0000000000008001 R11: 0000000000000246 R12: 000055d7113a04e0 [420160.909038] R13: 000055d7113a0320 R14: 000000000000000a R15: 0000000000000000 [420160.919817] </TASK> [420160.925659] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mei_me coretemp ioatdma mei ipmi_si wmi ipmi_msghandler acpi_pad acpi_power_meter ip_tables x_tables autofs4 ixgbe i40e crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ahci mdio dca libahci lpc_ich [last unloaded: ice] [420160.977576] CR2: 0000000000000082 [420160.985037] ---[ end trace 0000000000000000 ]--- [420161.097724] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice] [420161.107341] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85 [420161.134741] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282 [420161.144274] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8 [420161.155690] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000 [420161.168088] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000 [420161.179295] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a [420161.190420] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828 [420161.201505] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000 [420161.213628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [420161.223413] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0 [420161.234653] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [420161.245893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [420161.257052] PKRU: 55555554
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: George Kuruvinakunnel george.kuruvinakunnel@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/net/ethernet/intel/ice/ice_xsk.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 4bb62950d92d..59963b901be0 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -371,6 +371,12 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid) bool if_running, pool_present = !!pool; int ret = 0, pool_failure = 0;
+ if (qid >= vsi->num_rxq || qid >= vsi->num_txq) { + netdev_err(vsi->netdev, "Please use queue id in scope of combined queues count\n"); + pool_failure = -EINVAL; + goto failure; + } + if (!is_power_of_2(vsi->rx_rings[qid]->count) || !is_power_of_2(vsi->tx_rings[qid]->count)) { netdev_err(vsi->netdev, "Please align ring sizes to power of 2\n");
From: Vlad Buslov vladbu@nvidia.com
stable inclusion from stable-v5.10.140 commit 29accb2d96e69a9ccd5050a19d00831d3546c2b5 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit f37044fd759b6bc40b6398a978e0b1acdf717372 ]
When querying mlx5 non-uplink representors capabilities with ethtool rx-vlan-offload is marked as "off [fixed]". However, it is actually always enabled because mlx5e_params->vlan_strip_disable is 0 by default when initializing struct mlx5e_params instance. Fix the issue by explicitly setting the vlan_strip_disable to 'true' for non-uplink representors.
Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors") Signed-off-by: Vlad Buslov vladbu@nvidia.com Reviewed-by: Roi Dayan roid@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 5778ce5ff4e8..21362367e913 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -716,6 +716,8 @@ static void mlx5e_build_rep_params(struct net_device *netdev)
params->num_tc = 1; params->tunneled_offload_en = false; + if (rep->vport != MLX5_VPORT_UPLINK) + params->vlan_strip_disable = true;
mlx5_query_min_inline(mdev, ¶ms->tx_min_inline_mode);
From: Alex Elder elder@linaro.org
stable inclusion from stable-v5.10.140 commit faa8bf8451067aec58322cc6e101cbcf19811585 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit b8d4380365c515d8e0351f2f46d371738dd19be1 ]
In ipa_smem_init(), a Qualcomm SMEM region is allocated (if needed) and then its virtual address is fetched using qcom_smem_get(). The physical address associated with that region is also fetched.
The physical address is adjusted so that it is page-aligned, and an attempt is made to update the size of the region to compensate for any non-zero adjustment.
But that adjustment isn't done properly. The physical address is aligned twice, and as a result the size is never actually adjusted.
Fix this by *not* aligning the "addr" local variable, and instead making the "phys" local variable be the adjusted "addr" value.
Fixes: a0036bb413d5b ("net: ipa: define SMEM memory region for IPA") Signed-off-by: Alex Elder elder@linaro.org Link: https://lore.kernel.org/r/20220818134206.567618-1-elder@linaro.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/net/ipa/ipa_mem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ipa/ipa_mem.c b/drivers/net/ipa/ipa_mem.c index a78d66051a17..25a8d029f207 100644 --- a/drivers/net/ipa/ipa_mem.c +++ b/drivers/net/ipa/ipa_mem.c @@ -414,7 +414,7 @@ static int ipa_smem_init(struct ipa *ipa, u32 item, size_t size) }
/* Align the address down and the size up to a page boundary */ - addr = qcom_smem_virt_to_phys(virt) & PAGE_MASK; + addr = qcom_smem_virt_to_phys(virt); phys = addr & PAGE_MASK; size = PAGE_ALIGN(size + addr - phys); iova = phys; /* We just want a direct mapping */
From: Sergei Antonov saproj@gmail.com
stable inclusion from stable-v5.10.140 commit 14ef913a9582fcc06602f641e97a4ae12b5f4247 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 0ee7828dfc56e97d71e51e6374dc7b4eb2b6e081 ]
Since priv->rx_mapping[i] is maped in moxart_mac_open(), we should unmap it from moxart_mac_stop(). Fixes 2 warnings.
1. During error unwinding in moxart_mac_probe(): "goto init_fail;", then moxart_mac_free_memory() calls dma_unmap_single() with priv->rx_mapping[i] pointers zeroed.
WARNING: CPU: 0 PID: 1 at kernel/dma/debug.c:963 check_unmap+0x704/0x980 DMA-API: moxart-ethernet 92000000.mac: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=1600 bytes] CPU: 0 PID: 1 Comm: swapper Not tainted 5.19.0+ #60 Hardware name: Generic DT based system unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x34/0x44 dump_stack_lvl from __warn+0xbc/0x1f0 __warn from warn_slowpath_fmt+0x94/0xc8 warn_slowpath_fmt from check_unmap+0x704/0x980 check_unmap from debug_dma_unmap_page+0x8c/0x9c debug_dma_unmap_page from moxart_mac_free_memory+0x3c/0xa8 moxart_mac_free_memory from moxart_mac_probe+0x190/0x218 moxart_mac_probe from platform_probe+0x48/0x88 platform_probe from really_probe+0xc0/0x2e4
2. After commands: ip link set dev eth0 down ip link set dev eth0 up
WARNING: CPU: 0 PID: 55 at kernel/dma/debug.c:570 add_dma_entry+0x204/0x2ec DMA-API: moxart-ethernet 92000000.mac: cacheline tracking EEXIST, overlapping mappings aren't supported CPU: 0 PID: 55 Comm: ip Not tainted 5.19.0+ #57 Hardware name: Generic DT based system unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x34/0x44 dump_stack_lvl from __warn+0xbc/0x1f0 __warn from warn_slowpath_fmt+0x94/0xc8 warn_slowpath_fmt from add_dma_entry+0x204/0x2ec add_dma_entry from dma_map_page_attrs+0x110/0x328 dma_map_page_attrs from moxart_mac_open+0x134/0x320 moxart_mac_open from __dev_open+0x11c/0x1ec __dev_open from __dev_change_flags+0x194/0x22c __dev_change_flags from dev_change_flags+0x14/0x44 dev_change_flags from devinet_ioctl+0x6d4/0x93c devinet_ioctl from inet_ioctl+0x1ac/0x25c
v1 -> v2: Extraneous change removed.
Fixes: 6c821bd9edc9 ("net: Add MOXA ART SoCs ethernet driver") Signed-off-by: Sergei Antonov saproj@gmail.com Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20220819110519.1230877-1-saproj@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/net/ethernet/moxa/moxart_ether.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/moxa/moxart_ether.c b/drivers/net/ethernet/moxa/moxart_ether.c index 6137000b11c5..73aac97fb5c9 100644 --- a/drivers/net/ethernet/moxa/moxart_ether.c +++ b/drivers/net/ethernet/moxa/moxart_ether.c @@ -74,11 +74,6 @@ static int moxart_set_mac_address(struct net_device *ndev, void *addr) static void moxart_mac_free_memory(struct net_device *ndev) { struct moxart_mac_priv_t *priv = netdev_priv(ndev); - int i; - - for (i = 0; i < RX_DESC_NUM; i++) - dma_unmap_single(&priv->pdev->dev, priv->rx_mapping[i], - priv->rx_buf_size, DMA_FROM_DEVICE);
if (priv->tx_desc_base) dma_free_coherent(&priv->pdev->dev, @@ -193,6 +188,7 @@ static int moxart_mac_open(struct net_device *ndev) static int moxart_mac_stop(struct net_device *ndev) { struct moxart_mac_priv_t *priv = netdev_priv(ndev); + int i;
napi_disable(&priv->napi);
@@ -204,6 +200,11 @@ static int moxart_mac_stop(struct net_device *ndev) /* disable all functions */ writel(0, priv->base + REG_MAC_CTRL);
+ /* unmap areas mapped in moxart_mac_setup_desc_ring() */ + for (i = 0; i < RX_DESC_NUM; i++) + dma_unmap_single(&priv->pdev->dev, priv->rx_mapping[i], + priv->rx_buf_size, DMA_FROM_DEVICE); + return 0; }
From: Jonathan Toppins jtoppins@redhat.com
stable inclusion from stable-v5.10.140 commit 7e7e88e8b5b4aba2076663b925c6aa00c435647b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit d745b5062ad2b5da90a5e728d7ca884fc07315fd ]
This is caused by the global variable ad_ticks_per_sec being zero as demonstrated by the reproducer script discussed below. This causes all timer values in __ad_timer_to_ticks to be zero, resulting in the periodic timer to never fire.
To reproduce: Run the script in `tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which puts bonding into a state where it never transmits LACPDUs.
line 44: ip link add fbond type bond mode 4 miimon 200 \ xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast setting bond param: ad_actor_sys_prio given: params.ad_actor_system = 0 call stack: bond_option_ad_actor_sys_prio() -> bond_3ad_update_ad_actor_settings() -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio -> ad.system.sys_mac_addr = bond->dev->dev_addr; because params.ad_actor_system == 0 results: ad.system.sys_mac_addr = bond->dev->dev_addr
line 48: ip link set fbond address 52:54:00:3B:7C:A6 setting bond MAC addr call stack: bond->dev->dev_addr = new_mac
line 52: ip link set fbond type bond ad_actor_sys_prio 65535 setting bond param: ad_actor_sys_prio given: params.ad_actor_system = 0 call stack: bond_option_ad_actor_sys_prio() -> bond_3ad_update_ad_actor_settings() -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio -> ad.system.sys_mac_addr = bond->dev->dev_addr; because params.ad_actor_system == 0 results: ad.system.sys_mac_addr = bond->dev->dev_addr
line 60: ip link set veth1-bond down master fbond given: params.ad_actor_system = 0 params.mode = BOND_MODE_8023AD ad.system.sys_mac_addr == bond->dev->dev_addr call stack: bond_enslave -> bond_3ad_initialize(); because first slave -> if ad.system.sys_mac_addr != bond->dev->dev_addr return results: Nothing is run in bond_3ad_initialize() because dev_addr equals sys_mac_addr leaving the global ad_ticks_per_sec zero as it is never initialized anywhere else.
The if check around the contents of bond_3ad_initialize() is no longer needed due to commit 5ee14e6d336f ("bonding: 3ad: apply ad_actor settings changes immediately") which sets ad.system.sys_mac_addr if any one of the bonding parameters whos set function calls bond_3ad_update_ad_actor_settings(). This is because if ad.system.sys_mac_addr is zero it will be set to the current bond mac address, this causes the if check to never be true.
Fixes: 5ee14e6d336f ("bonding: 3ad: apply ad_actor settings changes immediately") Signed-off-by: Jonathan Toppins jtoppins@redhat.com Acked-by: Jay Vosburgh jay.vosburgh@canonical.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/net/bonding/bond_3ad.c | 38 ++++++++++++++-------------------- 1 file changed, 16 insertions(+), 22 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index f2af84ecd1a2..e351dc638a5c 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -1997,30 +1997,24 @@ void bond_3ad_initiate_agg_selection(struct bonding *bond, int timeout) */ void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution) { - /* check that the bond is not initialized yet */ - if (!MAC_ADDRESS_EQUAL(&(BOND_AD_INFO(bond).system.sys_mac_addr), - bond->dev->dev_addr)) { - - BOND_AD_INFO(bond).aggregator_identifier = 0; - - BOND_AD_INFO(bond).system.sys_priority = - bond->params.ad_actor_sys_prio; - if (is_zero_ether_addr(bond->params.ad_actor_system)) - BOND_AD_INFO(bond).system.sys_mac_addr = - *((struct mac_addr *)bond->dev->dev_addr); - else - BOND_AD_INFO(bond).system.sys_mac_addr = - *((struct mac_addr *)bond->params.ad_actor_system); + BOND_AD_INFO(bond).aggregator_identifier = 0; + BOND_AD_INFO(bond).system.sys_priority = + bond->params.ad_actor_sys_prio; + if (is_zero_ether_addr(bond->params.ad_actor_system)) + BOND_AD_INFO(bond).system.sys_mac_addr = + *((struct mac_addr *)bond->dev->dev_addr); + else + BOND_AD_INFO(bond).system.sys_mac_addr = + *((struct mac_addr *)bond->params.ad_actor_system);
- /* initialize how many times this module is called in one - * second (should be about every 100ms) - */ - ad_ticks_per_sec = tick_resolution; + /* initialize how many times this module is called in one + * second (should be about every 100ms) + */ + ad_ticks_per_sec = tick_resolution;
- bond_3ad_initiate_agg_selection(bond, - AD_AGGREGATOR_SELECTION_TIMER * - ad_ticks_per_sec); - } + bond_3ad_initiate_agg_selection(bond, + AD_AGGREGATOR_SELECTION_TIMER * + ad_ticks_per_sec); }
/**
From: Maciej Żenczykowski maze@google.com
stable inclusion from stable-v5.10.140 commit f82a6b85e0ae5f57f6b8dd5cc3a00650de0b6bcf category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 4b2e3a17e9f279325712b79fb01d1493f9e3e005 ]
Looks to have been left out in an oversight.
Cc: Mahesh Bandewar maheshb@google.com Cc: Sainath Grandhi sainath.grandhi@intel.com Fixes: 235a9d89da97 ('ipvtap: IP-VLAN based tap driver') Signed-off-by: Maciej Żenczykowski maze@google.com Link: https://lore.kernel.org/r/20220821130808.12143-1-zenczykowski@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/net/ipvlan/ipvtap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ipvlan/ipvtap.c b/drivers/net/ipvlan/ipvtap.c index 1cedb634f4f7..f01078b2581c 100644 --- a/drivers/net/ipvlan/ipvtap.c +++ b/drivers/net/ipvlan/ipvtap.c @@ -194,7 +194,7 @@ static struct notifier_block ipvtap_notifier_block __read_mostly = { .notifier_call = ipvtap_device_event, };
-static int ipvtap_init(void) +static int __init ipvtap_init(void) { int err;
@@ -228,7 +228,7 @@ static int ipvtap_init(void) } module_init(ipvtap_init);
-static void ipvtap_exit(void) +static void __exit ipvtap_exit(void) { rtnl_link_unregister(&ipvtap_link_ops); unregister_netdevice_notifier(&ipvtap_notifier_block);
From: Florian Westphal fw@strlen.de
stable inclusion from stable-v5.10.140 commit 624c30521233e110cf50ba01980a591e045036ae category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 7997eff82828304b780dc0a39707e1946d6f1ebf ]
Harshit Mogalapalli says: In ebt_do_table() function dereferencing 'private->hook_entry[hook]' can lead to NULL pointer dereference. [..] Kernel panic:
general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] [..] RIP: 0010:ebt_do_table+0x1dc/0x1ce0 Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 5c 16 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 6c df 08 48 8d 7d 2c 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 88 [..] Call Trace: nf_hook_slow+0xb1/0x170 __br_forward+0x289/0x730 maybe_deliver+0x24b/0x380 br_flood+0xc6/0x390 br_dev_xmit+0xa2e/0x12c0
For some reason ebtables rejects blobs that provide entry points that are not supported by the table, but what it should instead reject is the opposite: blobs that DO NOT provide an entry point supported by the table.
t->valid_hooks is the bitmask of hooks (input, forward ...) that will see packets. Providing an entry point that is not support is harmless (never called/used), but the inverse isn't: it results in a crash because the ebtables traverser doesn't expect a NULL blob for a location its receiving packets for.
Instead of fixing all the individual checks, do what iptables is doing and reject all blobs that differ from the expected hooks.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com Reported-by: syzkaller syzkaller@googlegroups.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- include/linux/netfilter_bridge/ebtables.h | 4 ---- net/bridge/netfilter/ebtable_broute.c | 8 -------- net/bridge/netfilter/ebtable_filter.c | 8 -------- net/bridge/netfilter/ebtable_nat.c | 8 -------- net/bridge/netfilter/ebtables.c | 8 +------- 5 files changed, 1 insertion(+), 35 deletions(-)
diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h index 3a956145a25c..a18fb73a2b77 100644 --- a/include/linux/netfilter_bridge/ebtables.h +++ b/include/linux/netfilter_bridge/ebtables.h @@ -94,10 +94,6 @@ struct ebt_table { struct ebt_replace_kernel *table; unsigned int valid_hooks; rwlock_t lock; - /* e.g. could be the table explicitly only allows certain - * matches, targets, ... 0 == let it in */ - int (*check)(const struct ebt_table_info *info, - unsigned int valid_hooks); /* the data used by the kernel */ struct ebt_table_info *private; struct module *me; diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c index 32bc2821027f..57f91efce0f7 100644 --- a/net/bridge/netfilter/ebtable_broute.c +++ b/net/bridge/netfilter/ebtable_broute.c @@ -36,18 +36,10 @@ static struct ebt_replace_kernel initial_table = { .entries = (char *)&initial_chain, };
-static int check(const struct ebt_table_info *info, unsigned int valid_hooks) -{ - if (valid_hooks & ~(1 << NF_BR_BROUTING)) - return -EINVAL; - return 0; -} - static const struct ebt_table broute_table = { .name = "broute", .table = &initial_table, .valid_hooks = 1 << NF_BR_BROUTING, - .check = check, .me = THIS_MODULE, };
diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c index bcf982e12f16..7f2e620f4978 100644 --- a/net/bridge/netfilter/ebtable_filter.c +++ b/net/bridge/netfilter/ebtable_filter.c @@ -43,18 +43,10 @@ static struct ebt_replace_kernel initial_table = { .entries = (char *)initial_chains, };
-static int check(const struct ebt_table_info *info, unsigned int valid_hooks) -{ - if (valid_hooks & ~FILTER_VALID_HOOKS) - return -EINVAL; - return 0; -} - static const struct ebt_table frame_filter = { .name = "filter", .table = &initial_table, .valid_hooks = FILTER_VALID_HOOKS, - .check = check, .me = THIS_MODULE, };
diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c index 0d092773f816..1743a105485c 100644 --- a/net/bridge/netfilter/ebtable_nat.c +++ b/net/bridge/netfilter/ebtable_nat.c @@ -43,18 +43,10 @@ static struct ebt_replace_kernel initial_table = { .entries = (char *)initial_chains, };
-static int check(const struct ebt_table_info *info, unsigned int valid_hooks) -{ - if (valid_hooks & ~NAT_VALID_HOOKS) - return -EINVAL; - return 0; -} - static const struct ebt_table frame_nat = { .name = "nat", .table = &initial_table, .valid_hooks = NAT_VALID_HOOKS, - .check = check, .me = THIS_MODULE, };
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c index d481ff24a150..310740cc684a 100644 --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -999,8 +999,7 @@ static int do_replace_finish(struct net *net, struct ebt_replace *repl, goto free_iterate; }
- /* the table doesn't like it */ - if (t->check && (ret = t->check(newinfo, repl->valid_hooks))) + if (repl->valid_hooks != t->valid_hooks) goto free_unlock;
if (repl->num_counters && repl->num_counters != t->private->nentries) { @@ -1186,11 +1185,6 @@ int ebt_register_table(struct net *net, const struct ebt_table *input_table, if (ret != 0) goto free_chainstack;
- if (table->check && table->check(newinfo, table->valid_hooks)) { - ret = -EINVAL; - goto free_chainstack; - } - table->private = newinfo; rwlock_init(&table->lock); mutex_lock(&ebt_mutex);
From: Vikas Gupta vikas.gupta@broadcom.com
stable inclusion from stable-v5.10.140 commit e0f8cf01927d334cd1095f2d8a20b75f6bcd570c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 09a89cc59ad67794a11e1d3dd13c5b3172adcc51 ]
There are 2 issues:
1. We should decrement hw_resc->max_nqs instead of hw_resc->max_irqs with the number of NQs assigned to the VFs. The IRQs are fixed on each function and cannot be re-assigned. Only the NQs are being assigned to the VFs.
2. vf_msix is the total number of NQs to be assigned to the VFs. So we should decrement vf_msix from hw_resc->max_nqs.
Fixes: b16b68918674 ("bnxt_en: Add SR-IOV support for 57500 chips.") Signed-off-by: Vikas Gupta vikas.gupta@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c index 23b80aa171dd..819f9df9425c 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c @@ -599,7 +599,7 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs, bool reset) hw_resc->max_stat_ctxs -= le16_to_cpu(req.min_stat_ctx) * n; hw_resc->max_vnics -= le16_to_cpu(req.min_vnics) * n; if (bp->flags & BNXT_FLAG_CHIP_P5) - hw_resc->max_irqs -= vf_msix * n; + hw_resc->max_nqs -= vf_msix;
rc = pf->active_vfs; }
From: Pablo Neira Ayuso pablo@netfilter.org
stable inclusion from stable-v5.10.140 commit 93a46d6c72b18be6c91b1d11f20310fec5daed37 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 94254f990c07e9ddf1634e0b727fab821c3b5bf9 ]
Instead of offset and length are truncation to u8, report ERANGE.
Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/netfilter/nft_payload.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c index 01878c16418c..bdb07362c0ef 100644 --- a/net/netfilter/nft_payload.c +++ b/net/netfilter/nft_payload.c @@ -733,6 +733,7 @@ nft_payload_select_ops(const struct nft_ctx *ctx, { enum nft_payload_bases base; unsigned int offset, len; + int err;
if (tb[NFTA_PAYLOAD_BASE] == NULL || tb[NFTA_PAYLOAD_OFFSET] == NULL || @@ -758,8 +759,13 @@ nft_payload_select_ops(const struct nft_ctx *ctx, if (tb[NFTA_PAYLOAD_DREG] == NULL) return ERR_PTR(-EINVAL);
- offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET])); - len = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN])); + err = nft_parse_u32_check(tb[NFTA_PAYLOAD_OFFSET], U8_MAX, &offset); + if (err < 0) + return ERR_PTR(err); + + err = nft_parse_u32_check(tb[NFTA_PAYLOAD_LEN], U8_MAX, &len); + if (err < 0) + return ERR_PTR(err);
if (len <= 4 && is_power_of_2(len) && IS_ALIGNED(offset, len) && base != NFT_PAYLOAD_LL_HEADER)
From: Pablo Neira Ayuso pablo@netfilter.org
stable inclusion from stable-v5.10.140 commit ea358cfc8e25255b15305f14297e09e1defea4ca category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 7044ab281febae9e2fa9b0b247693d6026166293 ]
Instead report ERANGE if csum_offset is too long, and EOPNOTSUPP if type is not support.
Fixes: 7ec3f7b47b8d ("netfilter: nft_payload: add packet mangling support") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/netfilter/nft_payload.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c index bdb07362c0ef..551e0d6cf63d 100644 --- a/net/netfilter/nft_payload.c +++ b/net/netfilter/nft_payload.c @@ -660,17 +660,23 @@ static int nft_payload_set_init(const struct nft_ctx *ctx, const struct nlattr * const tb[]) { struct nft_payload_set *priv = nft_expr_priv(expr); + u32 csum_offset, csum_type = NFT_PAYLOAD_CSUM_NONE; + int err;
priv->base = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE])); priv->offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET])); priv->len = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN]));
if (tb[NFTA_PAYLOAD_CSUM_TYPE]) - priv->csum_type = - ntohl(nla_get_be32(tb[NFTA_PAYLOAD_CSUM_TYPE])); - if (tb[NFTA_PAYLOAD_CSUM_OFFSET]) - priv->csum_offset = - ntohl(nla_get_be32(tb[NFTA_PAYLOAD_CSUM_OFFSET])); + csum_type = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_CSUM_TYPE])); + if (tb[NFTA_PAYLOAD_CSUM_OFFSET]) { + err = nft_parse_u32_check(tb[NFTA_PAYLOAD_CSUM_OFFSET], U8_MAX, + &csum_offset); + if (err < 0) + return err; + + priv->csum_offset = csum_offset; + } if (tb[NFTA_PAYLOAD_CSUM_FLAGS]) { u32 flags;
@@ -681,7 +687,7 @@ static int nft_payload_set_init(const struct nft_ctx *ctx, priv->csum_flags = flags; }
- switch (priv->csum_type) { + switch (csum_type) { case NFT_PAYLOAD_CSUM_NONE: case NFT_PAYLOAD_CSUM_INET: break; @@ -695,6 +701,7 @@ static int nft_payload_set_init(const struct nft_ctx *ctx, default: return -EOPNOTSUPP; } + priv->csum_type = csum_type;
return nft_parse_register_load(tb[NFTA_PAYLOAD_SREG], &priv->sreg, priv->len);
From: Pablo Neira Ayuso pablo@netfilter.org
stable inclusion from stable-v5.10.140 commit c907dfe4eaca9665694a0340de1458a093abe354 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 43eb8949cfdffa764b92bc6c54b87cbe5b0003fe ]
Error might occur later in the nf_tables_addchain() codepath, enable static key only after transaction has been created.
Fixes: 9f08ea848117 ("netfilter: nf_tables: keep chain counters away from hot path") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/netfilter/nf_tables_api.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 6ad420957c54..7d3116c876e1 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -1999,9 +1999,9 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask, u8 policy, u32 flags) { const struct nlattr * const *nla = ctx->nla; + struct nft_stats __percpu *stats = NULL; struct nft_table *table = ctx->table; struct nft_base_chain *basechain; - struct nft_stats __percpu *stats; struct net *net = ctx->net; char name[NFT_NAME_MAXLEN]; struct nft_trans *trans; @@ -2037,7 +2037,6 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask, return PTR_ERR(stats); } rcu_assign_pointer(basechain->stats, stats); - static_branch_inc(&nft_counters_enabled); }
err = nft_basechain_init(basechain, family, &hook, flags); @@ -2120,6 +2119,9 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask, goto err_unregister_hook; }
+ if (stats) + static_branch_inc(&nft_counters_enabled); + table->use++;
return 0;
From: Pablo Neira Ayuso pablo@netfilter.org
stable inclusion from stable-v5.10.140 commit 9a67c2c89c32bca7da6082c6b174457f8521121a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 5f3b7aae14a706d0d7da9f9e39def52ff5fc3d39 ]
As it was originally intended, restrict extension to supported families.
Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/netfilter/nft_osf.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nft_osf.c b/net/netfilter/nft_osf.c index d82677e83400..720dc9fba6d4 100644 --- a/net/netfilter/nft_osf.c +++ b/net/netfilter/nft_osf.c @@ -115,9 +115,21 @@ static int nft_osf_validate(const struct nft_ctx *ctx, const struct nft_expr *expr, const struct nft_data **data) { - return nft_chain_validate_hooks(ctx->chain, (1 << NF_INET_LOCAL_IN) | - (1 << NF_INET_PRE_ROUTING) | - (1 << NF_INET_FORWARD)); + unsigned int hooks; + + switch (ctx->family) { + case NFPROTO_IPV4: + case NFPROTO_IPV6: + case NFPROTO_INET: + hooks = (1 << NF_INET_LOCAL_IN) | + (1 << NF_INET_PRE_ROUTING) | + (1 << NF_INET_FORWARD); + break; + default: + return -EOPNOTSUPP; + } + + return nft_chain_validate_hooks(ctx->chain, hooks); }
static struct nft_expr_type nft_osf_type;
From: Pablo Neira Ayuso pablo@netfilter.org
stable inclusion from stable-v5.10.140 commit 35519ce7bac9138b00126817c7ddb8f4ebdbc066 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 01e4092d53bc4fe122a6e4b6d664adbd57528ca3 ]
Only allow to use this expression from NFPROTO_NETDEV family.
Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/netfilter/nft_tunnel.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/netfilter/nft_tunnel.c b/net/netfilter/nft_tunnel.c index 3b27926d5382..2ee50996da8c 100644 --- a/net/netfilter/nft_tunnel.c +++ b/net/netfilter/nft_tunnel.c @@ -133,6 +133,7 @@ static const struct nft_expr_ops nft_tunnel_get_ops = {
static struct nft_expr_type nft_tunnel_type __read_mostly = { .name = "tunnel", + .family = NFPROTO_NETDEV, .ops = &nft_tunnel_get_ops, .policy = nft_tunnel_policy, .maxattr = NFTA_TUNNEL_MAX,
From: Colin Ian King colin.king@canonical.com
stable inclusion from stable-v5.10.140 commit cd962806c4493eec8a0da72d1df227dac28d3e15 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 626899a02e6afcd4b2ce5c0551092e3554cec4aa ]
The variable err is being assigned a value that is never read, the same error number is being returned at the error return path via label err1. Clean up the code by removing the assignment.
Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King colin.king@canonical.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/netfilter/nft_cmp.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/net/netfilter/nft_cmp.c b/net/netfilter/nft_cmp.c index b529c0e86546..47b6d05f1ae6 100644 --- a/net/netfilter/nft_cmp.c +++ b/net/netfilter/nft_cmp.c @@ -303,10 +303,8 @@ nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[]) if (err < 0) return ERR_PTR(err);
- if (desc.type != NFT_DATA_VALUE) { - err = -EINVAL; + if (desc.type != NFT_DATA_VALUE) goto err1; - }
if (desc.len <= sizeof(u32) && (op == NFT_CMP_EQ || op == NFT_CMP_NEQ)) return &nft_cmp_fast_ops;
From: Pablo Neira Ayuso pablo@netfilter.org
stable inclusion from stable-v5.10.140 commit 1d7d74a8240e5d4505f5f3f67417bb7334230c11 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 4765473fefd4403b5eeca371637065b561522c50 ]
Add function to consolidate verdict tracing.
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/netfilter/nf_tables_core.c | 39 ++++++++++++++++++++++++++++------ 1 file changed, 32 insertions(+), 7 deletions(-)
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c index a61b5bf5aa0f..6dd27c8cd425 100644 --- a/net/netfilter/nf_tables_core.c +++ b/net/netfilter/nf_tables_core.c @@ -67,6 +67,36 @@ static void nft_cmp_fast_eval(const struct nft_expr *expr, regs->verdict.code = NFT_BREAK; }
+static noinline void __nft_trace_verdict(struct nft_traceinfo *info, + const struct nft_chain *chain, + const struct nft_regs *regs) +{ + enum nft_trace_types type; + + switch (regs->verdict.code) { + case NFT_CONTINUE: + case NFT_RETURN: + type = NFT_TRACETYPE_RETURN; + break; + default: + type = NFT_TRACETYPE_RULE; + break; + } + + __nft_trace_packet(info, chain, type); +} + +static inline void nft_trace_verdict(struct nft_traceinfo *info, + const struct nft_chain *chain, + const struct nft_rule *rule, + const struct nft_regs *regs) +{ + if (static_branch_unlikely(&nft_trace_enabled)) { + info->rule = rule; + __nft_trace_verdict(info, chain, regs); + } +} + static bool nft_payload_fast_eval(const struct nft_expr *expr, struct nft_regs *regs, const struct nft_pktinfo *pkt) @@ -207,13 +237,13 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv) break; }
+ nft_trace_verdict(&info, chain, rule, ®s); + switch (regs.verdict.code & NF_VERDICT_MASK) { case NF_ACCEPT: case NF_DROP: case NF_QUEUE: case NF_STOLEN: - nft_trace_packet(&info, chain, rule, - NFT_TRACETYPE_RULE); return regs.verdict.code; }
@@ -226,15 +256,10 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv) stackptr++; fallthrough; case NFT_GOTO: - nft_trace_packet(&info, chain, rule, - NFT_TRACETYPE_RULE); - chain = regs.verdict.chain; goto do_chain; case NFT_CONTINUE: case NFT_RETURN: - nft_trace_packet(&info, chain, rule, - NFT_TRACETYPE_RETURN); break; default: WARN_ON(1);
From: Pablo Neira Ayuso pablo@netfilter.org
stable inclusion from stable-v5.10.140 commit 2267d38520c4aa74c9110c9ef2f6619aebc8446d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 23f68d462984bfda47c7bf663dca347e8e3df549 ]
Allow up to 16-byte comparisons with a new cmp fast version. Use two 64-bit words and calculate the mask representing the bits to be compared. Make sure the comparison is 64-bit aligned and avoid out-of-bound memory access on registers.
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- include/net/netfilter/nf_tables_core.h | 9 +++ net/netfilter/nf_tables_core.c | 16 ++++ net/netfilter/nft_cmp.c | 102 ++++++++++++++++++++++++- 3 files changed, 125 insertions(+), 2 deletions(-)
diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h index fd10a7862fdc..ce75121782bf 100644 --- a/include/net/netfilter/nf_tables_core.h +++ b/include/net/netfilter/nf_tables_core.h @@ -38,6 +38,14 @@ struct nft_cmp_fast_expr { bool inv; };
+struct nft_cmp16_fast_expr { + struct nft_data data; + struct nft_data mask; + u8 sreg; + u8 len; + bool inv; +}; + struct nft_immediate_expr { struct nft_data data; u8 dreg; @@ -55,6 +63,7 @@ static inline u32 nft_cmp_fast_mask(unsigned int len) }
extern const struct nft_expr_ops nft_cmp_fast_ops; +extern const struct nft_expr_ops nft_cmp16_fast_ops;
struct nft_payload { enum nft_payload_bases base:8; diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c index 6dd27c8cd425..9dc18429ed87 100644 --- a/net/netfilter/nf_tables_core.c +++ b/net/netfilter/nf_tables_core.c @@ -67,6 +67,20 @@ static void nft_cmp_fast_eval(const struct nft_expr *expr, regs->verdict.code = NFT_BREAK; }
+static void nft_cmp16_fast_eval(const struct nft_expr *expr, + struct nft_regs *regs) +{ + const struct nft_cmp16_fast_expr *priv = nft_expr_priv(expr); + const u64 *reg_data = (const u64 *)®s->data[priv->sreg]; + const u64 *mask = (const u64 *)&priv->mask; + const u64 *data = (const u64 *)&priv->data; + + if (((reg_data[0] & mask[0]) == data[0] && + ((reg_data[1] & mask[1]) == data[1])) ^ priv->inv) + return; + regs->verdict.code = NFT_BREAK; +} + static noinline void __nft_trace_verdict(struct nft_traceinfo *info, const struct nft_chain *chain, const struct nft_regs *regs) @@ -215,6 +229,8 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv) nft_rule_for_each_expr(expr, last, rule) { if (expr->ops == &nft_cmp_fast_ops) nft_cmp_fast_eval(expr, ®s); + else if (expr->ops == &nft_cmp16_fast_ops) + nft_cmp16_fast_eval(expr, ®s); else if (expr->ops == &nft_bitwise_fast_ops) nft_bitwise_fast_eval(expr, ®s); else if (expr->ops != &nft_payload_fast_ops || diff --git a/net/netfilter/nft_cmp.c b/net/netfilter/nft_cmp.c index 47b6d05f1ae6..917072af09df 100644 --- a/net/netfilter/nft_cmp.c +++ b/net/netfilter/nft_cmp.c @@ -272,12 +272,103 @@ const struct nft_expr_ops nft_cmp_fast_ops = { .offload = nft_cmp_fast_offload, };
+static u32 nft_cmp_mask(u32 bitlen) +{ + return (__force u32)cpu_to_le32(~0U >> (sizeof(u32) * BITS_PER_BYTE - bitlen)); +} + +static void nft_cmp16_fast_mask(struct nft_data *data, unsigned int bitlen) +{ + int len = bitlen / BITS_PER_BYTE; + int i, words = len / sizeof(u32); + + for (i = 0; i < words; i++) { + data->data[i] = 0xffffffff; + bitlen -= sizeof(u32) * BITS_PER_BYTE; + } + + if (len % sizeof(u32)) + data->data[i++] = nft_cmp_mask(bitlen); + + for (; i < 4; i++) + data->data[i] = 0; +} + +static int nft_cmp16_fast_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_cmp16_fast_expr *priv = nft_expr_priv(expr); + struct nft_data_desc desc; + int err; + + err = nft_data_init(NULL, &priv->data, sizeof(priv->data), &desc, + tb[NFTA_CMP_DATA]); + if (err < 0) + return err; + + err = nft_parse_register_load(tb[NFTA_CMP_SREG], &priv->sreg, desc.len); + if (err < 0) + return err; + + nft_cmp16_fast_mask(&priv->mask, desc.len * BITS_PER_BYTE); + priv->inv = ntohl(nla_get_be32(tb[NFTA_CMP_OP])) != NFT_CMP_EQ; + priv->len = desc.len; + + return 0; +} + +static int nft_cmp16_fast_offload(struct nft_offload_ctx *ctx, + struct nft_flow_rule *flow, + const struct nft_expr *expr) +{ + const struct nft_cmp16_fast_expr *priv = nft_expr_priv(expr); + struct nft_cmp_expr cmp = { + .data = priv->data, + .sreg = priv->sreg, + .len = priv->len, + .op = priv->inv ? NFT_CMP_NEQ : NFT_CMP_EQ, + }; + + return __nft_cmp_offload(ctx, flow, &cmp); +} + +static int nft_cmp16_fast_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_cmp16_fast_expr *priv = nft_expr_priv(expr); + enum nft_cmp_ops op = priv->inv ? NFT_CMP_NEQ : NFT_CMP_EQ; + + if (nft_dump_register(skb, NFTA_CMP_SREG, priv->sreg)) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_CMP_OP, htonl(op))) + goto nla_put_failure; + + if (nft_data_dump(skb, NFTA_CMP_DATA, &priv->data, + NFT_DATA_VALUE, priv->len) < 0) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -1; +} + + +const struct nft_expr_ops nft_cmp16_fast_ops = { + .type = &nft_cmp_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_cmp16_fast_expr)), + .eval = NULL, /* inlined */ + .init = nft_cmp16_fast_init, + .dump = nft_cmp16_fast_dump, + .offload = nft_cmp16_fast_offload, +}; + static const struct nft_expr_ops * nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[]) { struct nft_data_desc desc; struct nft_data data; enum nft_cmp_ops op; + u8 sreg; int err;
if (tb[NFTA_CMP_SREG] == NULL || @@ -306,9 +397,16 @@ nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[]) if (desc.type != NFT_DATA_VALUE) goto err1;
- if (desc.len <= sizeof(u32) && (op == NFT_CMP_EQ || op == NFT_CMP_NEQ)) - return &nft_cmp_fast_ops; + sreg = ntohl(nla_get_be32(tb[NFTA_CMP_SREG]));
+ if (op == NFT_CMP_EQ || op == NFT_CMP_NEQ) { + if (desc.len <= sizeof(u32)) + return &nft_cmp_fast_ops; + else if (desc.len <= sizeof(data) && + ((sreg >= NFT_REG_1 && sreg <= NFT_REG_4) || + (sreg >= NFT_REG32_00 && sreg <= NFT_REG32_12 && sreg % 2 == 0))) + return &nft_cmp16_fast_ops; + } return &nft_cmp_ops; err1: nft_data_release(&data, desc.type);
From: Jeremy Sowden jeremy@azazel.net
stable inclusion from stable-v5.10.140 commit 8790eecdea019fe9a94e69271dc1633dd26c9505 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 00bd435208e5201eb935d273052930bd3b272b6f ]
Replace two labels (`err1` and `err2`) with more informative ones.
Signed-off-by: Jeremy Sowden jeremy@azazel.net Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/netfilter/nft_bitwise.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/nft_bitwise.c b/net/netfilter/nft_bitwise.c index 47b0dba95054..d0c648b64cd4 100644 --- a/net/netfilter/nft_bitwise.c +++ b/net/netfilter/nft_bitwise.c @@ -109,22 +109,23 @@ static int nft_bitwise_init_bool(struct nft_bitwise *priv, return err; if (mask.type != NFT_DATA_VALUE || mask.len != priv->len) { err = -EINVAL; - goto err1; + goto err_mask_release; }
err = nft_data_init(NULL, &priv->xor, sizeof(priv->xor), &xor, tb[NFTA_BITWISE_XOR]); if (err < 0) - goto err1; + goto err_mask_release; if (xor.type != NFT_DATA_VALUE || xor.len != priv->len) { err = -EINVAL; - goto err2; + goto err_xor_release; }
return 0; -err2: + +err_xor_release: nft_data_release(&priv->xor, xor.type); -err1: +err_mask_release: nft_data_release(&priv->mask, mask.type); return err; }
From: Pablo Neira Ayuso pablo@netfilter.org
stable inclusion from stable-v5.10.140 commit 98827687593b579f20feb60c7ce3ac8a6fbb5f2e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 341b6941608762d8235f3fd1e45e4d7114ed8c2c ]
Instead of parsing the data and then validate that type and length are correct, pass a description of the expected data so it can be validated upfront before parsing it to bail out earlier.
This patch adds a new .size field to specify the maximum size of the data area. The .len field is optional and it is used as an input/output field, it provides the specific length of the expected data in the input path. If then .len field is not specified, then obtained length from the netlink attribute is stored. This is required by cmp, bitwise, range and immediate, which provide no netlink attribute that describes the data length. The immediate expression uses the destination register type to infer the expected data type.
Relying on opencoded validation of the expected data might lead to subtle bugs as described in 7e6bc1f6cabc ("netfilter: nf_tables: stricter validation of element data").
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- include/net/netfilter/nf_tables.h | 4 +- net/netfilter/nf_tables_api.c | 78 ++++++++++++++++--------------- net/netfilter/nft_bitwise.c | 66 +++++++++++++------------- net/netfilter/nft_cmp.c | 44 ++++++++--------- net/netfilter/nft_immediate.c | 22 +++++++-- net/netfilter/nft_range.c | 27 +++++------ 6 files changed, 126 insertions(+), 115 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index b9948e7861f2..6c062b2509b9 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -178,11 +178,11 @@ struct nft_ctx {
struct nft_data_desc { enum nft_data_types type; + unsigned int size; unsigned int len; };
-int nft_data_init(const struct nft_ctx *ctx, - struct nft_data *data, unsigned int size, +int nft_data_init(const struct nft_ctx *ctx, struct nft_data *data, struct nft_data_desc *desc, const struct nlattr *nla); void nft_data_hold(const struct nft_data *data, enum nft_data_types type); void nft_data_release(const struct nft_data *data, enum nft_data_types type); diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 7d3116c876e1..ca53378ad5fa 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4841,19 +4841,13 @@ static int nft_setelem_parse_flags(const struct nft_set *set, static int nft_setelem_parse_key(struct nft_ctx *ctx, struct nft_set *set, struct nft_data *key, struct nlattr *attr) { - struct nft_data_desc desc; - int err; - - err = nft_data_init(ctx, key, NFT_DATA_VALUE_MAXLEN, &desc, attr); - if (err < 0) - return err; - - if (desc.type != NFT_DATA_VALUE || desc.len != set->klen) { - nft_data_release(key, desc.type); - return -EINVAL; - } + struct nft_data_desc desc = { + .type = NFT_DATA_VALUE, + .size = NFT_DATA_VALUE_MAXLEN, + .len = set->klen, + };
- return 0; + return nft_data_init(ctx, key, &desc, attr); }
static int nft_setelem_parse_data(struct nft_ctx *ctx, struct nft_set *set, @@ -4862,24 +4856,17 @@ static int nft_setelem_parse_data(struct nft_ctx *ctx, struct nft_set *set, struct nlattr *attr) { u32 dtype; - int err; - - err = nft_data_init(ctx, data, NFT_DATA_VALUE_MAXLEN, desc, attr); - if (err < 0) - return err;
if (set->dtype == NFT_DATA_VERDICT) dtype = NFT_DATA_VERDICT; else dtype = NFT_DATA_VALUE;
- if (dtype != desc->type || - set->dlen != desc->len) { - nft_data_release(data, desc->type); - return -EINVAL; - } + desc->type = dtype; + desc->size = NFT_DATA_VALUE_MAXLEN; + desc->len = set->dlen;
- return 0; + return nft_data_init(ctx, data, desc, attr); }
static int nft_get_set_elem(struct nft_ctx *ctx, struct nft_set *set, @@ -8699,7 +8686,7 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data, }
desc->len = sizeof(data->verdict); - desc->type = NFT_DATA_VERDICT; + return 0; }
@@ -8752,20 +8739,25 @@ int nft_verdict_dump(struct sk_buff *skb, int type, const struct nft_verdict *v) }
static int nft_value_init(const struct nft_ctx *ctx, - struct nft_data *data, unsigned int size, - struct nft_data_desc *desc, const struct nlattr *nla) + struct nft_data *data, struct nft_data_desc *desc, + const struct nlattr *nla) { unsigned int len;
len = nla_len(nla); if (len == 0) return -EINVAL; - if (len > size) + if (len > desc->size) return -EOVERFLOW; + if (desc->len) { + if (len != desc->len) + return -EINVAL; + } else { + desc->len = len; + }
nla_memcpy(data->data, nla, len); - desc->type = NFT_DATA_VALUE; - desc->len = len; + return 0; }
@@ -8785,7 +8777,6 @@ static const struct nla_policy nft_data_policy[NFTA_DATA_MAX + 1] = { * * @ctx: context of the expression using the data * @data: destination struct nft_data - * @size: maximum data length * @desc: data description * @nla: netlink attribute containing data * @@ -8795,24 +8786,35 @@ static const struct nla_policy nft_data_policy[NFTA_DATA_MAX + 1] = { * The caller can indicate that it only wants to accept data of type * NFT_DATA_VALUE by passing NULL for the ctx argument. */ -int nft_data_init(const struct nft_ctx *ctx, - struct nft_data *data, unsigned int size, +int nft_data_init(const struct nft_ctx *ctx, struct nft_data *data, struct nft_data_desc *desc, const struct nlattr *nla) { struct nlattr *tb[NFTA_DATA_MAX + 1]; int err;
+ if (WARN_ON_ONCE(!desc->size)) + return -EINVAL; + err = nla_parse_nested_deprecated(tb, NFTA_DATA_MAX, nla, nft_data_policy, NULL); if (err < 0) return err;
- if (tb[NFTA_DATA_VALUE]) - return nft_value_init(ctx, data, size, desc, - tb[NFTA_DATA_VALUE]); - if (tb[NFTA_DATA_VERDICT] && ctx != NULL) - return nft_verdict_init(ctx, data, desc, tb[NFTA_DATA_VERDICT]); - return -EINVAL; + if (tb[NFTA_DATA_VALUE]) { + if (desc->type != NFT_DATA_VALUE) + return -EINVAL; + + err = nft_value_init(ctx, data, desc, tb[NFTA_DATA_VALUE]); + } else if (tb[NFTA_DATA_VERDICT] && ctx != NULL) { + if (desc->type != NFT_DATA_VERDICT) + return -EINVAL; + + err = nft_verdict_init(ctx, data, desc, tb[NFTA_DATA_VERDICT]); + } else { + err = -EINVAL; + } + + return err; } EXPORT_SYMBOL_GPL(nft_data_init);
diff --git a/net/netfilter/nft_bitwise.c b/net/netfilter/nft_bitwise.c index d0c648b64cd4..d6ab7aa14adc 100644 --- a/net/netfilter/nft_bitwise.c +++ b/net/netfilter/nft_bitwise.c @@ -93,7 +93,16 @@ static const struct nla_policy nft_bitwise_policy[NFTA_BITWISE_MAX + 1] = { static int nft_bitwise_init_bool(struct nft_bitwise *priv, const struct nlattr *const tb[]) { - struct nft_data_desc mask, xor; + struct nft_data_desc mask = { + .type = NFT_DATA_VALUE, + .size = sizeof(priv->mask), + .len = priv->len, + }; + struct nft_data_desc xor = { + .type = NFT_DATA_VALUE, + .size = sizeof(priv->xor), + .len = priv->len, + }; int err;
if (tb[NFTA_BITWISE_DATA]) @@ -103,37 +112,30 @@ static int nft_bitwise_init_bool(struct nft_bitwise *priv, !tb[NFTA_BITWISE_XOR]) return -EINVAL;
- err = nft_data_init(NULL, &priv->mask, sizeof(priv->mask), &mask, - tb[NFTA_BITWISE_MASK]); + err = nft_data_init(NULL, &priv->mask, &mask, tb[NFTA_BITWISE_MASK]); if (err < 0) return err; - if (mask.type != NFT_DATA_VALUE || mask.len != priv->len) { - err = -EINVAL; - goto err_mask_release; - }
- err = nft_data_init(NULL, &priv->xor, sizeof(priv->xor), &xor, - tb[NFTA_BITWISE_XOR]); + err = nft_data_init(NULL, &priv->xor, &xor, tb[NFTA_BITWISE_XOR]); if (err < 0) - goto err_mask_release; - if (xor.type != NFT_DATA_VALUE || xor.len != priv->len) { - err = -EINVAL; - goto err_xor_release; - } + goto err_xor_err;
return 0;
-err_xor_release: - nft_data_release(&priv->xor, xor.type); -err_mask_release: +err_xor_err: nft_data_release(&priv->mask, mask.type); + return err; }
static int nft_bitwise_init_shift(struct nft_bitwise *priv, const struct nlattr *const tb[]) { - struct nft_data_desc d; + struct nft_data_desc desc = { + .type = NFT_DATA_VALUE, + .size = sizeof(priv->data), + .len = sizeof(u32), + }; int err;
if (tb[NFTA_BITWISE_MASK] || @@ -143,13 +145,12 @@ static int nft_bitwise_init_shift(struct nft_bitwise *priv, if (!tb[NFTA_BITWISE_DATA]) return -EINVAL;
- err = nft_data_init(NULL, &priv->data, sizeof(priv->data), &d, - tb[NFTA_BITWISE_DATA]); + err = nft_data_init(NULL, &priv->data, &desc, tb[NFTA_BITWISE_DATA]); if (err < 0) return err; - if (d.type != NFT_DATA_VALUE || d.len != sizeof(u32) || - priv->data.data[0] >= BITS_PER_TYPE(u32)) { - nft_data_release(&priv->data, d.type); + + if (priv->data.data[0] >= BITS_PER_TYPE(u32)) { + nft_data_release(&priv->data, desc.type); return -EINVAL; }
@@ -291,22 +292,21 @@ static const struct nft_expr_ops nft_bitwise_ops = { static int nft_bitwise_extract_u32_data(const struct nlattr * const tb, u32 *out) { - struct nft_data_desc desc; struct nft_data data; - int err = 0; + struct nft_data_desc desc = { + .type = NFT_DATA_VALUE, + .size = sizeof(data), + .len = sizeof(u32), + }; + int err;
- err = nft_data_init(NULL, &data, sizeof(data), &desc, tb); + err = nft_data_init(NULL, &data, &desc, tb); if (err < 0) return err;
- if (desc.type != NFT_DATA_VALUE || desc.len != sizeof(u32)) { - err = -EINVAL; - goto err; - } *out = data.data[0]; -err: - nft_data_release(&data, desc.type); - return err; + + return 0; }
static int nft_bitwise_fast_init(const struct nft_ctx *ctx, diff --git a/net/netfilter/nft_cmp.c b/net/netfilter/nft_cmp.c index 917072af09df..461763a571f2 100644 --- a/net/netfilter/nft_cmp.c +++ b/net/netfilter/nft_cmp.c @@ -73,20 +73,16 @@ static int nft_cmp_init(const struct nft_ctx *ctx, const struct nft_expr *expr, const struct nlattr * const tb[]) { struct nft_cmp_expr *priv = nft_expr_priv(expr); - struct nft_data_desc desc; + struct nft_data_desc desc = { + .type = NFT_DATA_VALUE, + .size = sizeof(priv->data), + }; int err;
- err = nft_data_init(NULL, &priv->data, sizeof(priv->data), &desc, - tb[NFTA_CMP_DATA]); + err = nft_data_init(NULL, &priv->data, &desc, tb[NFTA_CMP_DATA]); if (err < 0) return err;
- if (desc.type != NFT_DATA_VALUE) { - err = -EINVAL; - nft_data_release(&priv->data, desc.type); - return err; - } - err = nft_parse_register_load(tb[NFTA_CMP_SREG], &priv->sreg, desc.len); if (err < 0) return err; @@ -201,12 +197,14 @@ static int nft_cmp_fast_init(const struct nft_ctx *ctx, const struct nlattr * const tb[]) { struct nft_cmp_fast_expr *priv = nft_expr_priv(expr); - struct nft_data_desc desc; struct nft_data data; + struct nft_data_desc desc = { + .type = NFT_DATA_VALUE, + .size = sizeof(data), + }; int err;
- err = nft_data_init(NULL, &data, sizeof(data), &desc, - tb[NFTA_CMP_DATA]); + err = nft_data_init(NULL, &data, &desc, tb[NFTA_CMP_DATA]); if (err < 0) return err;
@@ -299,11 +297,13 @@ static int nft_cmp16_fast_init(const struct nft_ctx *ctx, const struct nlattr * const tb[]) { struct nft_cmp16_fast_expr *priv = nft_expr_priv(expr); - struct nft_data_desc desc; + struct nft_data_desc desc = { + .type = NFT_DATA_VALUE, + .size = sizeof(priv->data), + }; int err;
- err = nft_data_init(NULL, &priv->data, sizeof(priv->data), &desc, - tb[NFTA_CMP_DATA]); + err = nft_data_init(NULL, &priv->data, &desc, tb[NFTA_CMP_DATA]); if (err < 0) return err;
@@ -365,8 +365,11 @@ const struct nft_expr_ops nft_cmp16_fast_ops = { static const struct nft_expr_ops * nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[]) { - struct nft_data_desc desc; struct nft_data data; + struct nft_data_desc desc = { + .type = NFT_DATA_VALUE, + .size = sizeof(data), + }; enum nft_cmp_ops op; u8 sreg; int err; @@ -389,14 +392,10 @@ nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[]) return ERR_PTR(-EINVAL); }
- err = nft_data_init(NULL, &data, sizeof(data), &desc, - tb[NFTA_CMP_DATA]); + err = nft_data_init(NULL, &data, &desc, tb[NFTA_CMP_DATA]); if (err < 0) return ERR_PTR(err);
- if (desc.type != NFT_DATA_VALUE) - goto err1; - sreg = ntohl(nla_get_be32(tb[NFTA_CMP_SREG]));
if (op == NFT_CMP_EQ || op == NFT_CMP_NEQ) { @@ -408,9 +407,6 @@ nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[]) return &nft_cmp16_fast_ops; } return &nft_cmp_ops; -err1: - nft_data_release(&data, desc.type); - return ERR_PTR(-EINVAL); }
struct nft_expr_type nft_cmp_type __read_mostly = { diff --git a/net/netfilter/nft_immediate.c b/net/netfilter/nft_immediate.c index d0f67d325bdf..fcdbc5ed3f36 100644 --- a/net/netfilter/nft_immediate.c +++ b/net/netfilter/nft_immediate.c @@ -29,20 +29,36 @@ static const struct nla_policy nft_immediate_policy[NFTA_IMMEDIATE_MAX + 1] = { [NFTA_IMMEDIATE_DATA] = { .type = NLA_NESTED }, };
+static enum nft_data_types nft_reg_to_type(const struct nlattr *nla) +{ + enum nft_data_types type; + u8 reg; + + reg = ntohl(nla_get_be32(nla)); + if (reg == NFT_REG_VERDICT) + type = NFT_DATA_VERDICT; + else + type = NFT_DATA_VALUE; + + return type; +} + static int nft_immediate_init(const struct nft_ctx *ctx, const struct nft_expr *expr, const struct nlattr * const tb[]) { struct nft_immediate_expr *priv = nft_expr_priv(expr); - struct nft_data_desc desc; + struct nft_data_desc desc = { + .size = sizeof(priv->data), + }; int err;
if (tb[NFTA_IMMEDIATE_DREG] == NULL || tb[NFTA_IMMEDIATE_DATA] == NULL) return -EINVAL;
- err = nft_data_init(ctx, &priv->data, sizeof(priv->data), &desc, - tb[NFTA_IMMEDIATE_DATA]); + desc.type = nft_reg_to_type(tb[NFTA_IMMEDIATE_DREG]); + err = nft_data_init(ctx, &priv->data, &desc, tb[NFTA_IMMEDIATE_DATA]); if (err < 0) return err;
diff --git a/net/netfilter/nft_range.c b/net/netfilter/nft_range.c index e4a1c44d7f51..e6bbe32c323d 100644 --- a/net/netfilter/nft_range.c +++ b/net/netfilter/nft_range.c @@ -51,7 +51,14 @@ static int nft_range_init(const struct nft_ctx *ctx, const struct nft_expr *expr const struct nlattr * const tb[]) { struct nft_range_expr *priv = nft_expr_priv(expr); - struct nft_data_desc desc_from, desc_to; + struct nft_data_desc desc_from = { + .type = NFT_DATA_VALUE, + .size = sizeof(priv->data_from), + }; + struct nft_data_desc desc_to = { + .type = NFT_DATA_VALUE, + .size = sizeof(priv->data_to), + }; int err; u32 op;
@@ -61,26 +68,16 @@ static int nft_range_init(const struct nft_ctx *ctx, const struct nft_expr *expr !tb[NFTA_RANGE_TO_DATA]) return -EINVAL;
- err = nft_data_init(NULL, &priv->data_from, sizeof(priv->data_from), - &desc_from, tb[NFTA_RANGE_FROM_DATA]); + err = nft_data_init(NULL, &priv->data_from, &desc_from, + tb[NFTA_RANGE_FROM_DATA]); if (err < 0) return err;
- if (desc_from.type != NFT_DATA_VALUE) { - err = -EINVAL; - goto err1; - } - - err = nft_data_init(NULL, &priv->data_to, sizeof(priv->data_to), - &desc_to, tb[NFTA_RANGE_TO_DATA]); + err = nft_data_init(NULL, &priv->data_to, &desc_to, + tb[NFTA_RANGE_TO_DATA]); if (err < 0) goto err1;
- if (desc_to.type != NFT_DATA_VALUE) { - err = -EINVAL; - goto err2; - } - if (desc_from.len != desc_to.len) { err = -EINVAL; goto err2;
From: Pablo Neira Ayuso pablo@netfilter.org
stable inclusion from stable-v5.10.140 commit 6301a73bd83d94b9b3eab8581adb04e40fb5f079 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit f323ef3a0d49e147365284bc1f02212e617b7f09 ]
Extend struct nft_data_desc to add a flag field that specifies nft_data_init() is being called for set element data.
Use it to disallow jump to implicit chain from set element, only jump to chain via immediate expression is allowed.
Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com
Conflicts: net/netfilter/nf_tables_api.c Reviewed-by: Wei Li liwei391@huawei.com --- include/net/netfilter/nf_tables.h | 5 +++++ net/netfilter/nf_tables_api.c | 4 ++++ 2 files changed, 9 insertions(+)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 6c062b2509b9..e66fee99ed3e 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -176,10 +176,15 @@ struct nft_ctx { bool report; };
+enum nft_data_desc_flags { + NFT_DATA_DESC_SETELEM = (1 << 0), +}; + struct nft_data_desc { enum nft_data_types type; unsigned int size; unsigned int len; + unsigned int flags; };
int nft_data_init(const struct nft_ctx *ctx, struct nft_data *data, diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index ca53378ad5fa..1b039476e4d6 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4865,6 +4865,7 @@ static int nft_setelem_parse_data(struct nft_ctx *ctx, struct nft_set *set, desc->type = dtype; desc->size = NFT_DATA_VALUE_MAXLEN; desc->len = set->dlen; + desc->flags = NFT_DATA_DESC_SETELEM;
return nft_data_init(ctx, data, desc, attr); } @@ -8679,6 +8680,9 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data, return -EOPNOTSUPP; if (nft_chain_is_bound(chain)) return -EINVAL; + if (desc->flags & NFT_DATA_DESC_SETELEM && + chain->flags & NFT_CHAIN_BINDING) + return -EINVAL;
chain->use++; data->verdict.chain = chain;
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v5.10.140 commit e73a29554f0b7adfb9411a0587ac56703ed23e2c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 240bfd134c592791fdceba1ce7fc3f973c33df2d ]
tcp_grow_window() is using skb->len/skb->truesize to increase tp->rcv_ssthresh which has a direct impact on advertized window sizes.
We added TCP coalescing in linux-3.4 & linux-3.5:
Instead of storing skbs with one or two MSS in receive queue (or OFO queue), we try to append segments together to reduce memory overhead.
High performance network drivers tend to cook skb with 3 parts :
1) sk_buff structure (256 bytes) 2) skb->head contains room to copy headers as needed, and skb_shared_info 3) page fragment(s) containing the ~1514 bytes frame (or more depending on MTU)
Once coalesced into a previous skb, 1) and 2) are freed.
We can therefore tweak the way we compute len/truesize ratio knowing that skb->truesize is inflated by 1) and 2) soon to be freed.
This is done only for in-order skb, or skb coalesced into OFO queue.
The result is that low rate flows no longer pay the memory price of having low GRO aggregation factor. Same result for drivers not using GRO.
This is critical to allow a big enough receiver window, typically tcp_rmem[2] / 2.
We have been using this at Google for about 5 years, it is due time to make it upstream.
Signed-off-by: Eric Dumazet edumazet@google.com Cc: Soheil Hassas Yeganeh soheil@google.com Cc: Neal Cardwell ncardwell@google.com Cc: Yuchung Cheng ycheng@google.com Acked-by: Soheil Hassas Yeganeh soheil@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/ipv4/tcp_input.c | 38 ++++++++++++++++++++++++++++++-------- 1 file changed, 30 insertions(+), 8 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index a98bf77960f1..4f6c5bfdb603 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -454,11 +454,12 @@ static void tcp_sndbuf_expand(struct sock *sk) */
/* Slow part of check#2. */ -static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb) +static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb, + unsigned int skbtruesize) { struct tcp_sock *tp = tcp_sk(sk); /* Optimize this! */ - int truesize = tcp_win_from_space(sk, skb->truesize) >> 1; + int truesize = tcp_win_from_space(sk, skbtruesize) >> 1; int window = tcp_win_from_space(sk, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]) >> 1;
while (tp->rcv_ssthresh <= window) { @@ -471,7 +472,27 @@ static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb) return 0; }
-static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb) +/* Even if skb appears to have a bad len/truesize ratio, TCP coalescing + * can play nice with us, as sk_buff and skb->head might be either + * freed or shared with up to MAX_SKB_FRAGS segments. + * Only give a boost to drivers using page frag(s) to hold the frame(s), + * and if no payload was pulled in skb->head before reaching us. + */ +static u32 truesize_adjust(bool adjust, const struct sk_buff *skb) +{ + u32 truesize = skb->truesize; + + if (adjust && !skb_headlen(skb)) { + truesize -= SKB_TRUESIZE(skb_end_offset(skb)); + /* paranoid check, some drivers might be buggy */ + if (unlikely((int)truesize < (int)skb->len)) + truesize = skb->truesize; + } + return truesize; +} + +static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb, + bool adjust) { struct tcp_sock *tp = tcp_sk(sk); int room; @@ -480,15 +501,16 @@ static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb)
/* Check #1 */ if (room > 0 && !tcp_under_memory_pressure(sk)) { + unsigned int truesize = truesize_adjust(adjust, skb); int incr;
/* Check #2. Increase window, if skb with such overhead * will fit to rcvbuf in future. */ - if (tcp_win_from_space(sk, skb->truesize) <= skb->len) + if (tcp_win_from_space(sk, truesize) <= skb->len) incr = 2 * tp->advmss; else - incr = __tcp_grow_window(sk, skb); + incr = __tcp_grow_window(sk, skb, truesize);
if (incr) { incr = max_t(int, incr, 2 * skb->len); @@ -782,7 +804,7 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb) tcp_ecn_check_ce(sk, skb);
if (skb->len >= 128) - tcp_grow_window(sk, skb); + tcp_grow_window(sk, skb, true); }
/* Called to compute a smoothed rtt estimate. The data fed to this @@ -4783,7 +4805,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) * and trigger fast retransmit. */ if (tcp_is_sack(tp)) - tcp_grow_window(sk, skb); + tcp_grow_window(sk, skb, true); kfree_skb_partial(skb, fragstolen); skb = NULL; goto add_sack; @@ -4871,7 +4893,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) * and trigger fast retransmit. */ if (tcp_is_sack(tp)) - tcp_grow_window(sk, skb); + tcp_grow_window(sk, skb, false); skb_condense(skb); skb_set_owner_r(skb, sk); }
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit 613fd026209e6f6ee69afaf45d1bec2a11b09fea category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 02739545951ad4c1215160db7fbf9b7a918d3c0b ]
While reading these sysctl variables, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers.
- .sysctl_rmem - .sysctl_rwmem - .sysctl_rmem_offset - .sysctl_wmem_offset - sysctl_tcp_rmem[1, 2] - sysctl_tcp_wmem[1, 2] - sysctl_decnet_rmem[1] - sysctl_decnet_wmem[1] - sysctl_tipc_rmem[1]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- include/net/sock.h | 8 ++++---- net/decnet/af_decnet.c | 4 ++-- net/ipv4/tcp.c | 6 +++--- net/ipv4/tcp_input.c | 13 +++++++------ net/ipv4/tcp_output.c | 2 +- net/mptcp/protocol.c | 6 +++--- net/tipc/socket.c | 2 +- 7 files changed, 21 insertions(+), 20 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h index 02dac64dcff6..f4de05072597 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2715,18 +2715,18 @@ static inline int sk_get_wmem0(const struct sock *sk, const struct proto *proto) { /* Does this proto have per netns sysctl_wmem ? */ if (proto->sysctl_wmem_offset) - return *(int *)((void *)sock_net(sk) + proto->sysctl_wmem_offset); + return READ_ONCE(*(int *)((void *)sock_net(sk) + proto->sysctl_wmem_offset));
- return *proto->sysctl_wmem; + return READ_ONCE(*proto->sysctl_wmem); }
static inline int sk_get_rmem0(const struct sock *sk, const struct proto *proto) { /* Does this proto have per netns sysctl_rmem ? */ if (proto->sysctl_rmem_offset) - return *(int *)((void *)sock_net(sk) + proto->sysctl_rmem_offset); + return READ_ONCE(*(int *)((void *)sock_net(sk) + proto->sysctl_rmem_offset));
- return *proto->sysctl_rmem; + return READ_ONCE(*proto->sysctl_rmem); }
/* Default TCP Small queue budget is ~1 ms of data (1sec >> 10) diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c index dc92a67baea3..7d542eb46172 100644 --- a/net/decnet/af_decnet.c +++ b/net/decnet/af_decnet.c @@ -480,8 +480,8 @@ static struct sock *dn_alloc_sock(struct net *net, struct socket *sock, gfp_t gf sk->sk_family = PF_DECnet; sk->sk_protocol = 0; sk->sk_allocation = gfp; - sk->sk_sndbuf = sysctl_decnet_wmem[1]; - sk->sk_rcvbuf = sysctl_decnet_rmem[1]; + sk->sk_sndbuf = READ_ONCE(sysctl_decnet_wmem[1]); + sk->sk_rcvbuf = READ_ONCE(sysctl_decnet_rmem[1]);
/* Initialization of DECnet Session Control Port */ scp = DN_SK(sk); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index f1e6a819502f..3b217c428d90 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -455,8 +455,8 @@ void tcp_init_sock(struct sock *sk)
icsk->icsk_sync_mss = tcp_sync_mss;
- WRITE_ONCE(sk->sk_sndbuf, sock_net(sk)->ipv4.sysctl_tcp_wmem[1]); - WRITE_ONCE(sk->sk_rcvbuf, sock_net(sk)->ipv4.sysctl_tcp_rmem[1]); + WRITE_ONCE(sk->sk_sndbuf, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[1])); + WRITE_ONCE(sk->sk_rcvbuf, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[1]));
sk_sockets_allocated_inc(sk); sk->sk_route_forced_caps = NETIF_F_GSO; @@ -1715,7 +1715,7 @@ int tcp_set_rcvlowat(struct sock *sk, int val) if (sk->sk_userlocks & SOCK_RCVBUF_LOCK) cap = sk->sk_rcvbuf >> 1; else - cap = sock_net(sk)->ipv4.sysctl_tcp_rmem[2] >> 1; + cap = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]) >> 1; val = min(val, cap); WRITE_ONCE(sk->sk_rcvlowat, val ? : 1);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 4f6c5bfdb603..c62327b2714b 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -425,7 +425,7 @@ static void tcp_sndbuf_expand(struct sock *sk)
if (sk->sk_sndbuf < sndmem) WRITE_ONCE(sk->sk_sndbuf, - min(sndmem, sock_net(sk)->ipv4.sysctl_tcp_wmem[2])); + min(sndmem, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[2]))); }
/* 2. Tuning advertised window (window_clamp, rcv_ssthresh) @@ -460,7 +460,7 @@ static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb, struct tcp_sock *tp = tcp_sk(sk); /* Optimize this! */ int truesize = tcp_win_from_space(sk, skbtruesize) >> 1; - int window = tcp_win_from_space(sk, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]) >> 1; + int window = tcp_win_from_space(sk, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])) >> 1;
while (tp->rcv_ssthresh <= window) { if (truesize <= skb->len) @@ -565,16 +565,17 @@ static void tcp_clamp_window(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk); struct net *net = sock_net(sk); + int rmem2;
icsk->icsk_ack.quick = 0; + rmem2 = READ_ONCE(net->ipv4.sysctl_tcp_rmem[2]);
- if (sk->sk_rcvbuf < net->ipv4.sysctl_tcp_rmem[2] && + if (sk->sk_rcvbuf < rmem2 && !(sk->sk_userlocks & SOCK_RCVBUF_LOCK) && !tcp_under_memory_pressure(sk) && sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)) { WRITE_ONCE(sk->sk_rcvbuf, - min(atomic_read(&sk->sk_rmem_alloc), - net->ipv4.sysctl_tcp_rmem[2])); + min(atomic_read(&sk->sk_rmem_alloc), rmem2)); } if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) tp->rcv_ssthresh = min(tp->window_clamp, 2U * tp->advmss); @@ -736,7 +737,7 @@ void tcp_rcv_space_adjust(struct sock *sk)
do_div(rcvwin, tp->advmss); rcvbuf = min_t(u64, rcvwin * rcvmem, - sock_net(sk)->ipv4.sysctl_tcp_rmem[2]); + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); if (rcvbuf > sk->sk_rcvbuf) { WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index dce5eeaf5733..d38fb1cff08f 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -238,7 +238,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss, *rcv_wscale = 0; if (wscale_ok) { /* Set window scaling on max possible window */ - space = max_t(u32, space, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]); + space = max_t(u32, space, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); space = max_t(u32, space, sysctl_rmem_max); space = min_t(u32, space, *window_clamp); *rcv_wscale = clamp_t(int, ilog2(space) - 15, diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index d0e91aa7b30e..e61c85873ea2 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1439,7 +1439,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
do_div(rcvwin, advmss); rcvbuf = min_t(u64, rcvwin * rcvmem, - sock_net(sk)->ipv4.sysctl_tcp_rmem[2]); + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
if (rcvbuf > sk->sk_rcvbuf) { u32 window_clamp; @@ -1872,8 +1872,8 @@ static int mptcp_init_sock(struct sock *sk) return ret;
sk_sockets_allocated_inc(sk); - sk->sk_rcvbuf = sock_net(sk)->ipv4.sysctl_tcp_rmem[1]; - sk->sk_sndbuf = sock_net(sk)->ipv4.sysctl_tcp_wmem[1]; + sk->sk_rcvbuf = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[1]); + sk->sk_sndbuf = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[1]);
return 0; } diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 38256aabf4f1..8f3c9fbb9916 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -504,7 +504,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock, timer_setup(&sk->sk_timer, tipc_sk_timeout, 0); sk->sk_shutdown = 0; sk->sk_backlog_rcv = tipc_sk_backlog_rcv; - sk->sk_rcvbuf = sysctl_tipc_rmem[1]; + sk->sk_rcvbuf = READ_ONCE(sysctl_tipc_rmem[1]); sk->sk_data_ready = tipc_data_ready; sk->sk_write_space = tipc_write_space; sk->sk_destruct = tipc_sock_destruct;
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit fb442c72db385695684ef5a2df930326a2c4f1b3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 1227c1771dd2ad44318aa3ab9e3a293b3f34ff2a ]
While reading sysctl_[rw]mem_(max|default), they can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/core/filter.c | 4 ++-- net/core/sock.c | 8 ++++---- net/ipv4/ip_output.c | 2 +- net/ipv4/tcp_output.c | 2 +- net/netfilter/ipvs/ip_vs_sync.c | 4 ++-- 5 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c index 23b6820a2e28..8adcf207d481 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4713,14 +4713,14 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname, /* Only some socketops are supported */ switch (optname) { case SO_RCVBUF: - val = min_t(u32, val, sysctl_rmem_max); + val = min_t(u32, val, READ_ONCE(sysctl_rmem_max)); val = min_t(int, val, INT_MAX / 2); sk->sk_userlocks |= SOCK_RCVBUF_LOCK; WRITE_ONCE(sk->sk_rcvbuf, max_t(int, val * 2, SOCK_MIN_RCVBUF)); break; case SO_SNDBUF: - val = min_t(u32, val, sysctl_wmem_max); + val = min_t(u32, val, READ_ONCE(sysctl_wmem_max)); val = min_t(int, val, INT_MAX / 2); sk->sk_userlocks |= SOCK_SNDBUF_LOCK; WRITE_ONCE(sk->sk_sndbuf, diff --git a/net/core/sock.c b/net/core/sock.c index 3f49f1117753..3be6caef862f 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -887,7 +887,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname, * play 'guess the biggest size' games. RCVBUF/SNDBUF * are treated in BSD as hints */ - val = min_t(u32, val, sysctl_wmem_max); + val = min_t(u32, val, READ_ONCE(sysctl_wmem_max)); set_sndbuf: /* Ensure val * 2 fits into an int, to prevent max_t() * from treating it as a negative value. @@ -919,7 +919,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname, * play 'guess the biggest size' games. RCVBUF/SNDBUF * are treated in BSD as hints */ - __sock_set_rcvbuf(sk, min_t(u32, val, sysctl_rmem_max)); + __sock_set_rcvbuf(sk, min_t(u32, val, READ_ONCE(sysctl_rmem_max))); break;
case SO_RCVBUFFORCE: @@ -2973,8 +2973,8 @@ void sock_init_data(struct socket *sock, struct sock *sk) timer_setup(&sk->sk_timer, NULL, 0);
sk->sk_allocation = GFP_KERNEL; - sk->sk_rcvbuf = sysctl_rmem_default; - sk->sk_sndbuf = sysctl_wmem_default; + sk->sk_rcvbuf = READ_ONCE(sysctl_rmem_default); + sk->sk_sndbuf = READ_ONCE(sysctl_wmem_default); sk->sk_state = TCP_CLOSE; sk_set_socket(sk, sock);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index f77b0af3cb65..0dbf950de832 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1721,7 +1721,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
sk->sk_protocol = ip_hdr(skb)->protocol; sk->sk_bound_dev_if = arg->bound_dev_if; - sk->sk_sndbuf = sysctl_wmem_default; + sk->sk_sndbuf = READ_ONCE(sysctl_wmem_default); ipc.sockc.mark = fl4.flowi4_mark; err = ip_append_data(sk, &fl4, ip_reply_glue_bits, arg->iov->iov_base, len, 0, &ipc, &rt, MSG_DONTWAIT); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index d38fb1cff08f..61bfd54beb59 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -239,7 +239,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss, if (wscale_ok) { /* Set window scaling on max possible window */ space = max_t(u32, space, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); - space = max_t(u32, space, sysctl_rmem_max); + space = max_t(u32, space, READ_ONCE(sysctl_rmem_max)); space = min_t(u32, space, *window_clamp); *rcv_wscale = clamp_t(int, ilog2(space) - 15, 0, TCP_MAX_WSCALE); diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c index 16b48064f715..daab857c52a8 100644 --- a/net/netfilter/ipvs/ip_vs_sync.c +++ b/net/netfilter/ipvs/ip_vs_sync.c @@ -1280,12 +1280,12 @@ static void set_sock_size(struct sock *sk, int mode, int val) lock_sock(sk); if (mode) { val = clamp_t(int, val, (SOCK_MIN_SNDBUF + 1) / 2, - sysctl_wmem_max); + READ_ONCE(sysctl_wmem_max)); sk->sk_sndbuf = val * 2; sk->sk_userlocks |= SOCK_SNDBUF_LOCK; } else { val = clamp_t(int, val, (SOCK_MIN_RCVBUF + 1) / 2, - sysctl_rmem_max); + READ_ONCE(sysctl_rmem_max)); sk->sk_rcvbuf = val * 2; sk->sk_userlocks |= SOCK_RCVBUF_LOCK; }
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit b498a1b0171e7152ce5a837c7f74b4c1a296586f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit bf955b5ab8f6f7b0632cdef8e36b14e4f6e77829 ]
While reading weight_p, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Also, dev_[rt]x_weight can be read/written at the same time. So, we need to use READ_ONCE() and WRITE_ONCE() for its access. Moreover, to use the same weight_p while changing dev_[rt]x_weight, we add a mutex in proc_do_dev_weight().
Fixes: 3d48b53fb2ae ("net: dev_weight: TX/RX orthogonality") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/core/dev.c | 2 +- net/core/sysctl_net_core.c | 15 +++++++++------ net/sched/sch_generic.c | 2 +- 3 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c index 326f4ce33a3d..b4ebbbb8a6c9 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6377,7 +6377,7 @@ static int process_backlog(struct napi_struct *napi, int quota) net_rps_action_and_irq_enable(sd); }
- napi->weight = dev_rx_weight; + napi->weight = READ_ONCE(dev_rx_weight); while (again) { struct sk_buff *skb;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index 2e0a4378e778..0dfe9f255ab3 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -235,14 +235,17 @@ static int set_default_qdisc(struct ctl_table *table, int write, static int proc_do_dev_weight(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { - int ret; + static DEFINE_MUTEX(dev_weight_mutex); + int ret, weight;
+ mutex_lock(&dev_weight_mutex); ret = proc_dointvec(table, write, buffer, lenp, ppos); - if (ret != 0) - return ret; - - dev_rx_weight = weight_p * dev_weight_rx_bias; - dev_tx_weight = weight_p * dev_weight_tx_bias; + if (!ret && write) { + weight = READ_ONCE(weight_p); + WRITE_ONCE(dev_rx_weight, weight * dev_weight_rx_bias); + WRITE_ONCE(dev_tx_weight, weight * dev_weight_tx_bias); + } + mutex_unlock(&dev_weight_mutex);
return ret; } diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 5d5391adb667..68f1e89430b3 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -403,7 +403,7 @@ static inline bool qdisc_restart(struct Qdisc *q, int *packets)
void __qdisc_run(struct Qdisc *q) { - int quota = dev_tx_weight; + int quota = READ_ONCE(dev_tx_weight); int packets;
while (qdisc_restart(q, &packets)) {
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit 3850060352f41366bdc25b091baeeb5c144b4a9e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 5dcd08cd19912892586c6082d56718333e2d19db ]
While reading netdev_max_backlog, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
While at it, we remove the unnecessary spaces in the doc.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- Documentation/admin-guide/sysctl/net.rst | 2 +- net/core/dev.c | 4 ++-- net/core/gro_cells.c | 2 +- net/xfrm/espintcp.c | 2 +- net/xfrm/xfrm_input.c | 2 +- 5 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst index f2ab8a5b6a4b..7f553859dba8 100644 --- a/Documentation/admin-guide/sysctl/net.rst +++ b/Documentation/admin-guide/sysctl/net.rst @@ -271,7 +271,7 @@ poll cycle or the number of packets processed reaches netdev_budget. netdev_max_backlog ------------------
-Maximum number of packets, queued on the INPUT side, when the interface +Maximum number of packets, queued on the INPUT side, when the interface receives packets faster than kernel can process them.
netdev_rss_key diff --git a/net/core/dev.c b/net/core/dev.c index b4ebbbb8a6c9..c6f4513c0f90 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4518,7 +4518,7 @@ static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen) struct softnet_data *sd; unsigned int old_flow, new_flow;
- if (qlen < (netdev_max_backlog >> 1)) + if (qlen < (READ_ONCE(netdev_max_backlog) >> 1)) return false;
sd = this_cpu_ptr(&softnet_data); @@ -4566,7 +4566,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu, if (!netif_running(skb->dev)) goto drop; qlen = skb_queue_len(&sd->input_pkt_queue); - if (qlen <= netdev_max_backlog && !skb_flow_limit(skb, qlen)) { + if (qlen <= READ_ONCE(netdev_max_backlog) && !skb_flow_limit(skb, qlen)) { if (qlen) { enqueue: __skb_queue_tail(&sd->input_pkt_queue, skb); diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c index 6eb2e5ec2c50..2f66f3f29563 100644 --- a/net/core/gro_cells.c +++ b/net/core/gro_cells.c @@ -26,7 +26,7 @@ int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
cell = this_cpu_ptr(gcells->cells);
- if (skb_queue_len(&cell->napi_skbs) > netdev_max_backlog) { + if (skb_queue_len(&cell->napi_skbs) > READ_ONCE(netdev_max_backlog)) { drop: atomic_long_inc(&dev->rx_dropped); kfree_skb(skb); diff --git a/net/xfrm/espintcp.c b/net/xfrm/espintcp.c index 1f08ebf7d80c..24ca49ecebea 100644 --- a/net/xfrm/espintcp.c +++ b/net/xfrm/espintcp.c @@ -170,7 +170,7 @@ int espintcp_queue_out(struct sock *sk, struct sk_buff *skb) { struct espintcp_ctx *ctx = espintcp_getctx(sk);
- if (skb_queue_len(&ctx->out_queue) >= netdev_max_backlog) + if (skb_queue_len(&ctx->out_queue) >= READ_ONCE(netdev_max_backlog)) return -ENOBUFS;
__skb_queue_tail(&ctx->out_queue, skb); diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 61e6220ddd5a..77e82033ad70 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -782,7 +782,7 @@ int xfrm_trans_queue_net(struct net *net, struct sk_buff *skb,
trans = this_cpu_ptr(&xfrm_trans_tasklet);
- if (skb_queue_len(&trans->queue) >= netdev_max_backlog) + if (skb_queue_len(&trans->queue) >= READ_ONCE(netdev_max_backlog)) return -ENOBUFS;
BUILD_BUG_ON(sizeof(struct xfrm_trans_cb) > sizeof(skb->cb));
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit e73009ebc1233e447ae6503da51c402e4ed7ca4b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 61adf447e38664447526698872e21c04623afb8e ]
While reading netdev_tstamp_prequeue, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 3b098e2d7c69 ("net: Consistent skb timestamping") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/core/dev.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c index c6f4513c0f90..113d2c78f671 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4801,7 +4801,7 @@ static int netif_rx_internal(struct sk_buff *skb) { int ret;
- net_timestamp_check(netdev_tstamp_prequeue, skb); + net_timestamp_check(READ_ONCE(netdev_tstamp_prequeue), skb);
trace_netif_rx(skb);
@@ -5162,7 +5162,7 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc, int ret = NET_RX_DROP; __be16 type;
- net_timestamp_check(!netdev_tstamp_prequeue, skb); + net_timestamp_check(!READ_ONCE(netdev_tstamp_prequeue), skb);
trace_netif_receive_skb(skb);
@@ -5564,7 +5564,7 @@ static int netif_receive_skb_internal(struct sk_buff *skb) { int ret;
- net_timestamp_check(netdev_tstamp_prequeue, skb); + net_timestamp_check(READ_ONCE(netdev_tstamp_prequeue), skb);
if (skb_defer_rx_timestamp(skb)) return NET_RX_SUCCESS; @@ -5594,7 +5594,7 @@ static void netif_receive_skb_list_internal(struct list_head *head)
INIT_LIST_HEAD(&sublist); list_for_each_entry_safe(skb, next, head, list) { - net_timestamp_check(netdev_tstamp_prequeue, skb); + net_timestamp_check(READ_ONCE(netdev_tstamp_prequeue), skb); skb_list_del_init(skb); if (!skb_defer_rx_timestamp(skb)) list_add_tail(&skb->list, &sublist);
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit 4d4e39245dd58bb36f57f9b48a142b20d291c605 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 6bae8ceb90ba76cdba39496db936164fa672b9be ]
While reading rs->interval and rs->burst, they can be changed concurrently via sysctl (e.g. net_ratelimit_state). Thus, we need to add READ_ONCE() to their readers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- lib/ratelimit.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/lib/ratelimit.c b/lib/ratelimit.c index e01a93f46f83..ce945c17980b 100644 --- a/lib/ratelimit.c +++ b/lib/ratelimit.c @@ -26,10 +26,16 @@ */ int ___ratelimit(struct ratelimit_state *rs, const char *func) { + /* Paired with WRITE_ONCE() in .proc_handler(). + * Changing two values seperately could be inconsistent + * and some message could be lost. (See: net_ratelimit_state). + */ + int interval = READ_ONCE(rs->interval); + int burst = READ_ONCE(rs->burst); unsigned long flags; int ret;
- if (!rs->interval) + if (!interval) return 1;
/* @@ -44,7 +50,7 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func) if (!rs->begin) rs->begin = jiffies;
- if (time_is_before_jiffies(rs->begin + rs->interval)) { + if (time_is_before_jiffies(rs->begin + interval)) { if (rs->missed) { if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) { printk_deferred(KERN_WARNING @@ -56,7 +62,7 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func) rs->begin = jiffies; rs->printed = 0; } - if (rs->burst && rs->burst > rs->printed) { + if (burst && burst > rs->printed) { rs->printed++; ret = 1; } else {
From: Martin KaFai Lau kafai@fb.com
stable inclusion from stable-v5.10.140 commit 27e8ade792655177cc74417a6ded25735f3cacf0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 9e838b02b0bb795793f12049307a354e28b5749c ]
sk_storage_charge() is the only user of omem_charge(). This patch simplifies it by folding omem_charge() into sk_storage_charge().
Signed-off-by: Martin KaFai Lau kafai@fb.com Signed-off-by: Alexei Starovoitov ast@kernel.org Acked-by: Song Liu songliubraving@fb.com Acked-by: KP Singh kpsingh@google.com Link: https://lore.kernel.org/bpf/20201112211301.2586255-1-kafai@fb.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/core/bpf_sk_storage.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c index 7bbdb4d2cb9e..2a5428a5e52e 100644 --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -15,18 +15,6 @@
DEFINE_BPF_STORAGE_CACHE(sk_cache);
-static int omem_charge(struct sock *sk, unsigned int size) -{ - /* same check as in sock_kmalloc() */ - if (size <= sysctl_optmem_max && - atomic_read(&sk->sk_omem_alloc) + size < sysctl_optmem_max) { - atomic_add(size, &sk->sk_omem_alloc); - return 0; - } - - return -ENOMEM; -} - static struct bpf_local_storage_data * sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit) { @@ -316,7 +304,16 @@ BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk) static int sk_storage_charge(struct bpf_local_storage_map *smap, void *owner, u32 size) { - return omem_charge(owner, size); + struct sock *sk = (struct sock *)owner; + + /* same check as in sock_kmalloc() */ + if (size <= sysctl_optmem_max && + atomic_read(&sk->sk_omem_alloc) + size < sysctl_optmem_max) { + atomic_add(size, &sk->sk_omem_alloc); + return 0; + } + + return -ENOMEM; }
static void sk_storage_uncharge(struct bpf_local_storage_map *smap,
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit ed48223f87c5c5928b6bf56c02c75dda0a2ab0dd category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 7de6d09f51917c829af2b835aba8bb5040f8e86a ]
While reading sysctl_optmem_max, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/core/bpf_sk_storage.c | 5 +++-- net/core/filter.c | 9 +++++---- net/core/sock.c | 8 +++++--- net/ipv4/ip_sockglue.c | 6 +++--- net/ipv6/ipv6_sockglue.c | 4 ++-- 5 files changed, 18 insertions(+), 14 deletions(-)
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c index 2a5428a5e52e..d59c29209164 100644 --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -304,11 +304,12 @@ BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk) static int sk_storage_charge(struct bpf_local_storage_map *smap, void *owner, u32 size) { + int optmem_max = READ_ONCE(sysctl_optmem_max); struct sock *sk = (struct sock *)owner;
/* same check as in sock_kmalloc() */ - if (size <= sysctl_optmem_max && - atomic_read(&sk->sk_omem_alloc) + size < sysctl_optmem_max) { + if (size <= optmem_max && + atomic_read(&sk->sk_omem_alloc) + size < optmem_max) { atomic_add(size, &sk->sk_omem_alloc); return 0; } diff --git a/net/core/filter.c b/net/core/filter.c index 8adcf207d481..4bcf93934241 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1212,10 +1212,11 @@ void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp) static bool __sk_filter_charge(struct sock *sk, struct sk_filter *fp) { u32 filter_size = bpf_prog_size(fp->prog->len); + int optmem_max = READ_ONCE(sysctl_optmem_max);
/* same check as in sock_kmalloc() */ - if (filter_size <= sysctl_optmem_max && - atomic_read(&sk->sk_omem_alloc) + filter_size < sysctl_optmem_max) { + if (filter_size <= optmem_max && + atomic_read(&sk->sk_omem_alloc) + filter_size < optmem_max) { atomic_add(filter_size, &sk->sk_omem_alloc); return true; } @@ -1547,7 +1548,7 @@ int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk) if (IS_ERR(prog)) return PTR_ERR(prog);
- if (bpf_prog_size(prog->len) > sysctl_optmem_max) + if (bpf_prog_size(prog->len) > READ_ONCE(sysctl_optmem_max)) err = -ENOMEM; else err = reuseport_attach_prog(sk, prog); @@ -1614,7 +1615,7 @@ int sk_reuseport_attach_bpf(u32 ufd, struct sock *sk) } } else { /* BPF_PROG_TYPE_SOCKET_FILTER */ - if (bpf_prog_size(prog->len) > sysctl_optmem_max) { + if (bpf_prog_size(prog->len) > READ_ONCE(sysctl_optmem_max)) { err = -ENOMEM; goto err_prog_put; } diff --git a/net/core/sock.c b/net/core/sock.c index 3be6caef862f..c035ac5b4b13 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2219,7 +2219,7 @@ struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size,
/* small safe race: SKB_TRUESIZE may differ from final skb->truesize */ if (atomic_read(&sk->sk_omem_alloc) + SKB_TRUESIZE(size) > - sysctl_optmem_max) + READ_ONCE(sysctl_optmem_max)) return NULL;
skb = alloc_skb(size, priority); @@ -2237,8 +2237,10 @@ struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size, */ void *sock_kmalloc(struct sock *sk, int size, gfp_t priority) { - if ((unsigned int)size <= sysctl_optmem_max && - atomic_read(&sk->sk_omem_alloc) + size < sysctl_optmem_max) { + int optmem_max = READ_ONCE(sysctl_optmem_max); + + if ((unsigned int)size <= optmem_max && + atomic_read(&sk->sk_omem_alloc) + size < optmem_max) { void *mem; /* First do the add, to avoid the race if kmalloc * might sleep. diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index 22507a6a3f71..4cc39c62af55 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -773,7 +773,7 @@ static int ip_set_mcast_msfilter(struct sock *sk, sockptr_t optval, int optlen)
if (optlen < GROUP_FILTER_SIZE(0)) return -EINVAL; - if (optlen > sysctl_optmem_max) + if (optlen > READ_ONCE(sysctl_optmem_max)) return -ENOBUFS;
gsf = memdup_sockptr(optval, optlen); @@ -808,7 +808,7 @@ static int compat_ip_set_mcast_msfilter(struct sock *sk, sockptr_t optval,
if (optlen < size0) return -EINVAL; - if (optlen > sysctl_optmem_max - 4) + if (optlen > READ_ONCE(sysctl_optmem_max) - 4) return -ENOBUFS;
p = kmalloc(optlen + 4, GFP_KERNEL); @@ -1231,7 +1231,7 @@ static int do_ip_setsockopt(struct sock *sk, int level, int optname,
if (optlen < IP_MSFILTER_SIZE(0)) goto e_inval; - if (optlen > sysctl_optmem_max) { + if (optlen > READ_ONCE(sysctl_optmem_max)) { err = -ENOBUFS; break; } diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 65f83ce728fa..ce4e0da4ab9b 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -208,7 +208,7 @@ static int ipv6_set_mcast_msfilter(struct sock *sk, sockptr_t optval,
if (optlen < GROUP_FILTER_SIZE(0)) return -EINVAL; - if (optlen > sysctl_optmem_max) + if (optlen > READ_ONCE(sysctl_optmem_max)) return -ENOBUFS;
gsf = memdup_sockptr(optval, optlen); @@ -242,7 +242,7 @@ static int compat_ipv6_set_mcast_msfilter(struct sock *sk, sockptr_t optval,
if (optlen < size0) return -EINVAL; - if (optlen > sysctl_optmem_max - 4) + if (optlen > READ_ONCE(sysctl_optmem_max) - 4) return -ENOBUFS;
p = kmalloc(optlen + 4, GFP_KERNEL);
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit 8db070463e3ebedf81dfcc1d8bc51f46e751872f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit d2154b0afa73c0159b2856f875c6b4fe7cf6a95e ]
While reading sysctl_tstamp_allow_data, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/core/skbuff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 1aa1be87238b..bb31ac64ec38 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4768,7 +4768,7 @@ static bool skb_may_tx_timestamp(struct sock *sk, bool tsonly) { bool ret;
- if (likely(sysctl_tstamp_allow_data || tsonly)) + if (likely(READ_ONCE(sysctl_tstamp_allow_data) || tsonly)) return true;
read_lock_bh(&sk->sk_callback_lock);
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit 2c7dae6c45112ee7ead62155fed375cb2e7d7cf8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit c42b7cddea47503411bfb5f2f93a4154aaffa2d9 ]
While reading sysctl_net_busy_poll, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 060212928670 ("net: add low latency socket poll") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- include/net/busy_poll.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h index 716b7c5f6fdd..36e5e75e7172 100644 --- a/include/net/busy_poll.h +++ b/include/net/busy_poll.h @@ -31,7 +31,7 @@ extern unsigned int sysctl_net_busy_poll __read_mostly;
static inline bool net_busy_loop_on(void) { - return sysctl_net_busy_poll; + return READ_ONCE(sysctl_net_busy_poll); }
static inline bool sk_can_busy_loop(const struct sock *sk)
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit 410c88314ce351c9c77ec68da1d37bd91ddd76a2 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit e59ef36f0795696ab229569c153936bfd068d21c ]
While reading sysctl_net_busy_read, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 2d48d67fa8cd ("net: poll/select low latency socket support") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/sock.c b/net/core/sock.c index c035ac5b4b13..56a927b9b372 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -3033,7 +3033,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
#ifdef CONFIG_NET_RX_BUSY_POLL sk->sk_napi_id = 0; - sk->sk_ll_usec = sysctl_net_busy_read; + sk->sk_ll_usec = READ_ONCE(sysctl_net_busy_read); #endif
sk->sk_max_pacing_rate = ~0UL;
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit 12a34d7f0463ebb6db01977091b5cda46d9060bc category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 2e0c42374ee32e72948559d2ae2f7ba3dc6b977c ]
While reading netdev_budget, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 51b0bdedb8e7 ("[NET]: Separate two usages of netdev_max_backlog.") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/core/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c index 113d2c78f671..a57783e41c98 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6886,7 +6886,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) struct softnet_data *sd = this_cpu_ptr(&softnet_data); unsigned long time_limit = jiffies + usecs_to_jiffies(netdev_budget_usecs); - int budget = netdev_budget; + int budget = READ_ONCE(netdev_budget); LIST_HEAD(list); LIST_HEAD(repoll);
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit c3bda708e9c40d0a88f3c86c3d9103db44b38c0b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit fa45d484c52c73f79db2c23b0cdfc6c6455093ad ]
While reading netdev_budget_usecs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/core/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c index a57783e41c98..64837bb056ee 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6885,7 +6885,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) { struct softnet_data *sd = this_cpu_ptr(&softnet_data); unsigned long time_limit = jiffies + - usecs_to_jiffies(netdev_budget_usecs); + usecs_to_jiffies(READ_ONCE(netdev_budget_usecs)); int budget = READ_ONCE(netdev_budget); LIST_HEAD(list); LIST_HEAD(repoll);
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit 371a3bcf3144584c511f80e87d4c28ac2c75e9a7 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit af67508ea6cbf0e4ea27f8120056fa2efce127dd ]
While reading sysctl_fb_tunnels_only_for_init_net, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- include/linux/netdevice.h | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 0f95a702bc22..23a895168b97 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -647,9 +647,14 @@ extern int sysctl_devconf_inherit_init_net; */ static inline bool net_has_fallback_tunnels(const struct net *net) { - return !IS_ENABLED(CONFIG_SYSCTL) || - !sysctl_fb_tunnels_only_for_init_net || - (net == &init_net && sysctl_fb_tunnels_only_for_init_net == 1); +#if IS_ENABLED(CONFIG_SYSCTL) + int fb_tunnels_only_for_init_net = READ_ONCE(sysctl_fb_tunnels_only_for_init_net); + + return !fb_tunnels_only_for_init_net || + (net_eq(net, &init_net) && fb_tunnels_only_for_init_net == 1); +#else + return true; +#endif }
static inline int netdev_queue_numa_node_read(const struct netdev_queue *q)
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit 9fcc4f4066208b3383824ed8f0eba6bf47c23e87 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit a5612ca10d1aa05624ebe72633e0c8c792970833 ]
While reading sysctl_devconf_inherit_init_net, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- include/linux/netdevice.h | 9 +++++++++ net/ipv4/devinet.c | 16 ++++++++++------ net/ipv6/addrconf.c | 5 ++--- 3 files changed, 21 insertions(+), 9 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 23a895168b97..42b44b1ba116 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -657,6 +657,15 @@ static inline bool net_has_fallback_tunnels(const struct net *net) #endif }
+static inline int net_inherit_devconf(void) +{ +#if IS_ENABLED(CONFIG_SYSCTL) + return READ_ONCE(sysctl_devconf_inherit_init_net); +#else + return 0; +#endif +} + static inline int netdev_queue_numa_node_read(const struct netdev_queue *q) { #if defined(CONFIG_XPS) && defined(CONFIG_NUMA) diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 148ef484a66c..8f1753875550 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -2668,23 +2668,27 @@ static __net_init int devinet_init_net(struct net *net) #endif
if (!net_eq(net, &init_net)) { - if (IS_ENABLED(CONFIG_SYSCTL) && - sysctl_devconf_inherit_init_net == 3) { + switch (net_inherit_devconf()) { + case 3: /* copy from the current netns */ memcpy(all, current->nsproxy->net_ns->ipv4.devconf_all, sizeof(ipv4_devconf)); memcpy(dflt, current->nsproxy->net_ns->ipv4.devconf_dflt, sizeof(ipv4_devconf_dflt)); - } else if (!IS_ENABLED(CONFIG_SYSCTL) || - sysctl_devconf_inherit_init_net != 2) { - /* inherit == 0 or 1: copy from init_net */ + break; + case 0: + case 1: + /* copy from init_net */ memcpy(all, init_net.ipv4.devconf_all, sizeof(ipv4_devconf)); memcpy(dflt, init_net.ipv4.devconf_dflt, sizeof(ipv4_devconf_dflt)); + break; + case 2: + /* use compiled values */ + break; } - /* else inherit == 2: use compiled values */ }
#ifdef CONFIG_SYSCTL diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 97624d494c5b..31a0b28e4167 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -7046,9 +7046,8 @@ static int __net_init addrconf_init_net(struct net *net) if (!dflt) goto err_alloc_dflt;
- if (IS_ENABLED(CONFIG_SYSCTL) && - !net_eq(net, &init_net)) { - switch (sysctl_devconf_inherit_init_net) { + if (!net_eq(net, &init_net)) { + switch (net_inherit_devconf()) { case 1: /* copy from init_net */ memcpy(all, init_net.ipv6.devconf_all, sizeof(ipv6_devconf));
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.140 commit 23cf93bb32e571cf1fc678c83888df9950749d64 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 3c9ba81d72047f2e81bb535d42856517b613aba7 ]
While reading sysctl_somaxconn, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/socket.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/socket.c b/net/socket.c index 7d84c289e5ae..78fdcbe5eaa5 100644 --- a/net/socket.c +++ b/net/socket.c @@ -1672,7 +1672,7 @@ int __sys_listen(int fd, int backlog)
sock = sockfd_lookup_light(fd, &err, &fput_needed); if (sock) { - somaxconn = sock_net(sock->sk)->core.sysctl_somaxconn; + somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn); if ((unsigned int)backlog > somaxconn) backlog = somaxconn;
From: Jacob Keller jacob.e.keller@intel.com
stable inclusion from stable-v5.10.140 commit c3a6e863d51bd7cacef6c68f92ccfd235c797428 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 25d7a5f5a6bb15a2dae0a3f39ea5dda215024726 ]
The ixgbe_ptp_start_cyclecounter is intended to be called whenever the cyclecounter parameters need to be changed.
Since commit a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices"), this function has cleared the SYSTIME registers and reset the TSAUXC DISABLE_SYSTIME bit.
While these need to be cleared during ixgbe_ptp_reset, it is wrong to clear them during ixgbe_ptp_start_cyclecounter. This function may be called during both reset and link status change. When link changes, the SYSTIME counter is still operating normally, but the cyclecounter should be updated to account for the possibly changed parameters.
Clearing SYSTIME when link changes causes the timecounter to jump because the cycle counter now reads zero.
Extract the SYSTIME initialization out to a new function and call this during ixgbe_ptp_reset. This prevents the timecounter adjustment and avoids an unnecessary reset of the current time.
This also restores the original SYSTIME clearing that occurred during ixgbe_ptp_reset before the commit above.
Reported-by: Steve Payne spayne@aurora.tech Reported-by: Ilya Evenbach ievenbach@aurora.tech Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices") Signed-off-by: Jacob Keller jacob.e.keller@intel.com Tested-by: Gurucharan gurucharanx.g@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 59 +++++++++++++++----- 1 file changed, 46 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c index 22a874eee2e8..8b7f30035571 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c @@ -1211,7 +1211,6 @@ void ixgbe_ptp_start_cyclecounter(struct ixgbe_adapter *adapter) struct cyclecounter cc; unsigned long flags; u32 incval = 0; - u32 tsauxc = 0; u32 fuse0 = 0;
/* For some of the boards below this mask is technically incorrect. @@ -1246,18 +1245,6 @@ void ixgbe_ptp_start_cyclecounter(struct ixgbe_adapter *adapter) case ixgbe_mac_x550em_a: case ixgbe_mac_X550: cc.read = ixgbe_ptp_read_X550; - - /* enable SYSTIME counter */ - IXGBE_WRITE_REG(hw, IXGBE_SYSTIMR, 0); - IXGBE_WRITE_REG(hw, IXGBE_SYSTIML, 0); - IXGBE_WRITE_REG(hw, IXGBE_SYSTIMH, 0); - tsauxc = IXGBE_READ_REG(hw, IXGBE_TSAUXC); - IXGBE_WRITE_REG(hw, IXGBE_TSAUXC, - tsauxc & ~IXGBE_TSAUXC_DISABLE_SYSTIME); - IXGBE_WRITE_REG(hw, IXGBE_TSIM, IXGBE_TSIM_TXTS); - IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_TIMESYNC); - - IXGBE_WRITE_FLUSH(hw); break; case ixgbe_mac_X540: cc.read = ixgbe_ptp_read_82599; @@ -1289,6 +1276,50 @@ void ixgbe_ptp_start_cyclecounter(struct ixgbe_adapter *adapter) spin_unlock_irqrestore(&adapter->tmreg_lock, flags); }
+/** + * ixgbe_ptp_init_systime - Initialize SYSTIME registers + * @adapter: the ixgbe private board structure + * + * Initialize and start the SYSTIME registers. + */ +static void ixgbe_ptp_init_systime(struct ixgbe_adapter *adapter) +{ + struct ixgbe_hw *hw = &adapter->hw; + u32 tsauxc; + + switch (hw->mac.type) { + case ixgbe_mac_X550EM_x: + case ixgbe_mac_x550em_a: + case ixgbe_mac_X550: + tsauxc = IXGBE_READ_REG(hw, IXGBE_TSAUXC); + + /* Reset SYSTIME registers to 0 */ + IXGBE_WRITE_REG(hw, IXGBE_SYSTIMR, 0); + IXGBE_WRITE_REG(hw, IXGBE_SYSTIML, 0); + IXGBE_WRITE_REG(hw, IXGBE_SYSTIMH, 0); + + /* Reset interrupt settings */ + IXGBE_WRITE_REG(hw, IXGBE_TSIM, IXGBE_TSIM_TXTS); + IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_TIMESYNC); + + /* Activate the SYSTIME counter */ + IXGBE_WRITE_REG(hw, IXGBE_TSAUXC, + tsauxc & ~IXGBE_TSAUXC_DISABLE_SYSTIME); + break; + case ixgbe_mac_X540: + case ixgbe_mac_82599EB: + /* Reset SYSTIME registers to 0 */ + IXGBE_WRITE_REG(hw, IXGBE_SYSTIML, 0); + IXGBE_WRITE_REG(hw, IXGBE_SYSTIMH, 0); + break; + default: + /* Other devices aren't supported */ + return; + }; + + IXGBE_WRITE_FLUSH(hw); +} + /** * ixgbe_ptp_reset * @adapter: the ixgbe private board structure @@ -1315,6 +1346,8 @@ void ixgbe_ptp_reset(struct ixgbe_adapter *adapter)
ixgbe_ptp_start_cyclecounter(adapter);
+ ixgbe_ptp_init_systime(adapter); + spin_lock_irqsave(&adapter->tmreg_lock, flags); timecounter_init(&adapter->hw_tc, &adapter->hw_cc, ktime_to_ns(ktime_get_real()));
From: David Howells dhowells@redhat.com
stable inclusion from stable-v5.10.140 commit 79e2ca7aa96e80961828ab6312264633b66183cc category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit b0f571ecd7943423c25947439045f0d352ca3dbf ]
Fix three bugs in the rxrpc's sendmsg implementation:
(1) rxrpc_new_client_call() should release the socket lock when returning an error from rxrpc_get_call_slot().
(2) rxrpc_wait_for_tx_window_intr() will return without the call mutex held in the event that we're interrupted by a signal whilst waiting for tx space on the socket or relocking the call mutex afterwards.
Fix this by: (a) moving the unlock/lock of the call mutex up to rxrpc_send_data() such that the lock is not held around all of rxrpc_wait_for_tx_window*() and (b) indicating to higher callers whether we're return with the lock dropped. Note that this means recvmsg() will not block on this call whilst we're waiting.
(3) After dropping and regaining the call mutex, rxrpc_send_data() needs to go and recheck the state of the tx_pending buffer and the tx_total_len check in case we raced with another sendmsg() on the same call.
Thinking on this some more, it might make sense to have different locks for sendmsg() and recvmsg(). There's probably no need to make recvmsg() wait for sendmsg(). It does mean that recvmsg() can return MSG_EOR indicating that a call is dead before a sendmsg() to that call returns - but that can currently happen anyway.
Without fix (2), something like the following can be induced:
WARNING: bad unlock balance detected! 5.16.0-rc6-syzkaller #0 Not tainted ------------------------------------- syz-executor011/3597 is trying to release lock (&call->user_mutex) at: [<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748 but there are no more locks to release!
other info that might help us debug this: no locks held by syz-executor011/3597. ... Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline] __lock_release kernel/locking/lockdep.c:5306 [inline] lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657 __mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900 rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748 rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
[Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this]
Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Reported-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com Signed-off-by: David Howells dhowells@redhat.com Reviewed-by: Marc Dionne marc.dionne@auristor.com Tested-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com cc: Hawkins Jiawei yin31149@gmail.com cc: Khalid Masum khalid.masum.92@gmail.com cc: Dan Carpenter dan.carpenter@oracle.com cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@wart... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/rxrpc/call_object.c | 4 +- net/rxrpc/sendmsg.c | 92 ++++++++++++++++++++++++----------------- 2 files changed, 57 insertions(+), 39 deletions(-)
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index 043508fd8d8a..150cd7b2154c 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -285,8 +285,10 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx, _enter("%p,%lx", rx, p->user_call_ID);
limiter = rxrpc_get_call_slot(p, gfp); - if (!limiter) + if (!limiter) { + release_sock(&rx->sk); return ERR_PTR(-ERESTARTSYS); + }
call = rxrpc_alloc_client_call(rx, srx, gfp, debug_id); if (IS_ERR(call)) { diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index aa23ba4e2566..eef3c14fd1c1 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -51,10 +51,7 @@ static int rxrpc_wait_for_tx_window_intr(struct rxrpc_sock *rx, return sock_intr_errno(*timeo);
trace_rxrpc_transmit(call, rxrpc_transmit_wait); - mutex_unlock(&call->user_mutex); *timeo = schedule_timeout(*timeo); - if (mutex_lock_interruptible(&call->user_mutex) < 0) - return sock_intr_errno(*timeo); } }
@@ -290,37 +287,48 @@ static int rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call, static int rxrpc_send_data(struct rxrpc_sock *rx, struct rxrpc_call *call, struct msghdr *msg, size_t len, - rxrpc_notify_end_tx_t notify_end_tx) + rxrpc_notify_end_tx_t notify_end_tx, + bool *_dropped_lock) { struct rxrpc_skb_priv *sp; struct sk_buff *skb; struct sock *sk = &rx->sk; + enum rxrpc_call_state state; long timeo; - bool more; - int ret, copied; + bool more = msg->msg_flags & MSG_MORE; + int ret, copied = 0;
timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
/* this should be in poll */ sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
+reload: + ret = -EPIPE; if (sk->sk_shutdown & SEND_SHUTDOWN) - return -EPIPE; - - more = msg->msg_flags & MSG_MORE; - + goto maybe_error; + state = READ_ONCE(call->state); + ret = -ESHUTDOWN; + if (state >= RXRPC_CALL_COMPLETE) + goto maybe_error; + ret = -EPROTO; + if (state != RXRPC_CALL_CLIENT_SEND_REQUEST && + state != RXRPC_CALL_SERVER_ACK_REQUEST && + state != RXRPC_CALL_SERVER_SEND_REPLY) + goto maybe_error; + + ret = -EMSGSIZE; if (call->tx_total_len != -1) { - if (len > call->tx_total_len) - return -EMSGSIZE; - if (!more && len != call->tx_total_len) - return -EMSGSIZE; + if (len - copied > call->tx_total_len) + goto maybe_error; + if (!more && len - copied != call->tx_total_len) + goto maybe_error; }
skb = call->tx_pending; call->tx_pending = NULL; rxrpc_see_skb(skb, rxrpc_skb_seen);
- copied = 0; do { /* Check to see if there's a ping ACK to reply to. */ if (call->ackr_reason == RXRPC_ACK_PING_RESPONSE) @@ -331,16 +339,8 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
_debug("alloc");
- if (!rxrpc_check_tx_space(call, NULL)) { - ret = -EAGAIN; - if (msg->msg_flags & MSG_DONTWAIT) - goto maybe_error; - ret = rxrpc_wait_for_tx_window(rx, call, - &timeo, - msg->msg_flags & MSG_WAITALL); - if (ret < 0) - goto maybe_error; - } + if (!rxrpc_check_tx_space(call, NULL)) + goto wait_for_space;
max = RXRPC_JUMBO_DATALEN; max -= call->conn->security_size; @@ -485,6 +485,27 @@ static int rxrpc_send_data(struct rxrpc_sock *rx, efault: ret = -EFAULT; goto out; + +wait_for_space: + ret = -EAGAIN; + if (msg->msg_flags & MSG_DONTWAIT) + goto maybe_error; + mutex_unlock(&call->user_mutex); + *_dropped_lock = true; + ret = rxrpc_wait_for_tx_window(rx, call, &timeo, + msg->msg_flags & MSG_WAITALL); + if (ret < 0) + goto maybe_error; + if (call->interruptibility == RXRPC_INTERRUPTIBLE) { + if (mutex_lock_interruptible(&call->user_mutex) < 0) { + ret = sock_intr_errno(timeo); + goto maybe_error; + } + } else { + mutex_lock(&call->user_mutex); + } + *_dropped_lock = false; + goto reload; }
/* @@ -646,6 +667,7 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len) enum rxrpc_call_state state; struct rxrpc_call *call; unsigned long now, j; + bool dropped_lock = false; int ret;
struct rxrpc_send_params p = { @@ -754,21 +776,13 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len) ret = rxrpc_send_abort_packet(call); } else if (p.command != RXRPC_CMD_SEND_DATA) { ret = -EINVAL; - } else if (rxrpc_is_client_call(call) && - state != RXRPC_CALL_CLIENT_SEND_REQUEST) { - /* request phase complete for this client call */ - ret = -EPROTO; - } else if (rxrpc_is_service_call(call) && - state != RXRPC_CALL_SERVER_ACK_REQUEST && - state != RXRPC_CALL_SERVER_SEND_REPLY) { - /* Reply phase not begun or not complete for service call. */ - ret = -EPROTO; } else { - ret = rxrpc_send_data(rx, call, msg, len, NULL); + ret = rxrpc_send_data(rx, call, msg, len, NULL, &dropped_lock); }
out_put_unlock: - mutex_unlock(&call->user_mutex); + if (!dropped_lock) + mutex_unlock(&call->user_mutex); error_put: rxrpc_put_call(call, rxrpc_call_put); _leave(" = %d", ret); @@ -796,6 +810,7 @@ int rxrpc_kernel_send_data(struct socket *sock, struct rxrpc_call *call, struct msghdr *msg, size_t len, rxrpc_notify_end_tx_t notify_end_tx) { + bool dropped_lock = false; int ret;
_enter("{%d,%s},", call->debug_id, rxrpc_call_states[call->state]); @@ -813,7 +828,7 @@ int rxrpc_kernel_send_data(struct socket *sock, struct rxrpc_call *call, case RXRPC_CALL_SERVER_ACK_REQUEST: case RXRPC_CALL_SERVER_SEND_REPLY: ret = rxrpc_send_data(rxrpc_sk(sock->sk), call, msg, len, - notify_end_tx); + notify_end_tx, &dropped_lock); break; case RXRPC_CALL_COMPLETE: read_lock_bh(&call->state_lock); @@ -827,7 +842,8 @@ int rxrpc_kernel_send_data(struct socket *sock, struct rxrpc_call *call, break; }
- mutex_unlock(&call->user_mutex); + if (!dropped_lock) + mutex_unlock(&call->user_mutex); _leave(" = %d", ret); return ret; }
From: Shannon Nelson snelson@pensando.io
stable inclusion from stable-v5.10.140 commit 3a351b567e204036da00db319cf0838146ea64b4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 0fc4dd452d6c14828eed6369155c75c0ac15bab3 ]
In looping on FW update tests we occasionally see the FW_ACTIVATE_STATUS command fail while it is in its EAGAIN loop waiting for the FW activate step to finsh inside the FW. The firmware is complaining that the done bit is set when a new dev_cmd is going to be processed.
Doing a clean on the cmd registers and doorbell before exiting the wait-for-done and cleaning the done bit before the sleep prevents this from occurring.
Fixes: fbfb8031533c ("ionic: Add hardware init and device commands") Signed-off-by: Shannon Nelson snelson@pensando.io Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/net/ethernet/pensando/ionic/ionic_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_main.c b/drivers/net/ethernet/pensando/ionic/ionic_main.c index e14869a2e24a..f60ffef33e0c 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_main.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_main.c @@ -378,8 +378,8 @@ int ionic_dev_cmd_wait(struct ionic *ionic, unsigned long max_seconds) ionic_opcode_to_str(opcode), opcode, ionic_error_to_str(err), err);
- msleep(1000); iowrite32(0, &idev->dev_cmd_regs->done); + msleep(1000); iowrite32(1, &idev->dev_cmd_regs->doorbell); goto try_again; } @@ -392,6 +392,8 @@ int ionic_dev_cmd_wait(struct ionic *ionic, unsigned long max_seconds) return ionic_error_to_errno(err); }
+ ionic_dev_cmd_clean(ionic); + return 0; }
From: Filipe Manana fdmanana@suse.com
stable inclusion from stable-v5.10.140 commit 2fc3c168d5b66ff8e673c9b6b7763dd783089134 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 47bf225a8d2cccb15f7e8d4a1ed9b757dd86afd7 upstream.
At btrfs_del_root_ref(), if btrfs_search_slot() returns an error, we end up returning from the function with a value of 0 (success). This happens because the function returns the value stored in the variable 'err', which is 0, while the error value we got from btrfs_search_slot() is stored in the 'ret' variable.
So fix it by setting 'err' with the error value.
Fixes: 8289ed9f93bef2 ("btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling") CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- fs/btrfs/root-tree.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c index db37a3799649..e9e8ca4e98a7 100644 --- a/fs/btrfs/root-tree.c +++ b/fs/btrfs/root-tree.c @@ -336,9 +336,10 @@ int btrfs_del_root_ref(struct btrfs_trans_handle *trans, u64 root_id, key.offset = ref_id; again: ret = btrfs_search_slot(trans, tree_root, &key, path, -1, 1); - if (ret < 0) + if (ret < 0) { + err = ret; goto out; - if (ret == 0) { + } else if (ret == 0) { leaf = path->nodes[0]; ref = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_root_ref);
From: Anand Jain anand.jain@oracle.com
stable inclusion from stable-v5.10.140 commit b2d352ed4d489f9fdfe5caf7d5e62bd2c310f0a8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 59a3991984dbc1fc47e5651a265c5200bd85464e upstream.
If the filesystem mounts with the replace-operation in a suspended state and try to cancel the suspended replace-operation, we hit the assert. The assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's scrub must not be running in suspended state") that was actually not required. So just remove it.
$ mount /dev/sda5 /btrfs
BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'
$ mount -o degraded /dev/sda5 /btrfs <-- success.
$ btrfs replace cancel /btrfs
kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131 kernel: ------------[ cut here ]------------ kernel: kernel BUG at fs/btrfs/ctree.h:3750!
After the patch:
$ btrfs replace cancel /btrfs
BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled
Fixes: fe97e2e173af ("btrfs: dev-replace: replace's scrub must not be running in suspended state") CC: stable@vger.kernel.org # 5.0+ Signed-off-by: Anand Jain anand.jain@oracle.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- fs/btrfs/dev-replace.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index d29780463182..81ea35fa6a63 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -954,8 +954,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info) up_write(&dev_replace->rwsem);
/* Scrub for replace must not be running in suspended state */ - ret = btrfs_scrub_cancel(fs_info); - ASSERT(ret != -ENOTCONN); + btrfs_scrub_cancel(fs_info);
trans = btrfs_start_transaction(root, 0); if (IS_ERR(trans)) {
From: Anand Jain anand.jain@oracle.com
stable inclusion from stable-v5.10.140 commit dcac6293f57136370a6f2016346f7c134fdf005d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit f2c3bec215694fb8bc0ef5010f2a758d1906fc2d upstream.
If the replace target device reappears after the suspended replace is cancelled, it blocks the mount operation as it can't find the matching replace-item in the metadata. As shown below,
BTRFS error (device sda5): replace devid present without an active replace item
To overcome this situation, the user can run the command
btrfs device scan --forget <replace target device>
and try the mount command again. And also, to avoid repeating the issue, superblock on the devid=0 must be wiped.
wipefs -a device-path-to-devid=0.
This patch adds some info when this situation occurs.
Reported-by: Samuel Greiner samuel@balkonien.org Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@bal... CC: stable@vger.kernel.org # 5.0+ Signed-off-by: Anand Jain anand.jain@oracle.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- fs/btrfs/dev-replace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 81ea35fa6a63..be6935d19197 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -161,7 +161,7 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info) if (btrfs_find_device(fs_info->fs_devices, BTRFS_DEV_REPLACE_DEVID, NULL, NULL, false)) { btrfs_err(fs_info, - "replace devid present without an active replace item"); +"replace without active item, run 'device scan --forget' on the target device"); ret = -EUCLEAN; } else { dev_replace->srcdev = NULL;
From: Goldwyn Rodrigues rgoldwyn@suse.de
stable inclusion from stable-v5.10.140 commit d2bd18d50c1e835d154e018adb8f56d35d622528 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit b51111271b0352aa596c5ae8faf06939e91b3b68 upstream.
For a filesystem which has btrfs read-only property set to true, all write operations including xattr should be denied. However, security xattr can still be changed even if btrfs ro property is true.
This happens because xattr_permission() does not have any restrictions on security.*, system.* and in some cases trusted.* from VFS and the decision is left to the underlying filesystem. See comments in xattr_permission() for more details.
This patch checks if the root is read-only before performing the set xattr operation.
Testcase:
DEV=/dev/vdb MNT=/mnt
mkfs.btrfs -f $DEV mount $DEV $MNT echo "file one" > $MNT/f1
setfattr -n "security.one" -v 2 $MNT/f1 btrfs property set /mnt ro true
setfattr -n "security.one" -v 1 $MNT/f1
umount $MNT
CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Qu Wenruo wqu@suse.com Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Goldwyn Rodrigues rgoldwyn@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- fs/btrfs/xattr.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c index f1a60bcdb3db..cd6049b0bde5 100644 --- a/fs/btrfs/xattr.c +++ b/fs/btrfs/xattr.c @@ -389,6 +389,9 @@ static int btrfs_xattr_handler_set(const struct xattr_handler *handler, const char *name, const void *buffer, size_t size, int flags) { + if (btrfs_root_readonly(BTRFS_I(inode)->root)) + return -EROFS; + name = xattr_full_name(handler, name); return btrfs_setxattr_trans(inode, name, buffer, size, flags); }
From: Kan Liang kan.liang@linux.intel.com
stable inclusion from stable-v5.10.140 commit 090f0ac167a04d5cbc22718655f0c733a3b99775 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 32ba156df1b1c8804a4e5be5339616945eafea22 upstream.
On the platform with Arch LBR, the HW raw branch type encoding may leak to the perf tool when the SAVE_TYPE option is not set.
In the intel_pmu_store_lbr(), the HW raw branch type is stored in lbr_entries[].type. If the SAVE_TYPE option is set, the lbr_entries[].type will be converted into the generic PERF_BR_* type in the intel_pmu_lbr_filter() and exposed to the user tools. But if the SAVE_TYPE option is NOT set by the user, the current perf kernel doesn't clear the field. The HW raw branch type leaks.
There are two solutions to fix the issue for the Arch LBR. One is to clear the field if the SAVE_TYPE option is NOT set. The other solution is to unconditionally convert the branch type and expose the generic type to the user tools.
The latter is implemented here, because - The branch type is valuable information. I don't see a case where you would not benefit from the branch type. (Stephane Eranian) - Not having the branch type DOES NOT save any space in the branch record (Stephane Eranian) - The Arch LBR HW can retrieve the common branch types from the LBR_INFO. It doesn't require the high overhead SW disassemble.
Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Reported-by: Stephane Eranian eranian@google.com Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220816125612.2042397-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- arch/x86/events/intel/lbr.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c index a6876a7f6738..9c3069aab490 100644 --- a/arch/x86/events/intel/lbr.c +++ b/arch/x86/events/intel/lbr.c @@ -1112,6 +1112,14 @@ static int intel_pmu_setup_hw_lbr_filter(struct perf_event *event)
if (static_cpu_has(X86_FEATURE_ARCH_LBR)) { reg->config = mask; + + /* + * The Arch LBR HW can retrieve the common branch types + * from the LBR_INFO. It doesn't require the high overhead + * SW disassemble. + * Enable the branch type by default for the Arch LBR. + */ + reg->reg |= X86_BR_TYPE_SAVE; return 0; }
From: Pawan Gupta pawan.kumar.gupta@linux.intel.com
stable inclusion from stable-v5.10.140 commit 14cbbb9c9914663d0eeca6b59c1c9d4f5a547ee0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 7df548840c496b0141fb2404b889c346380c2b22 upstream.
Older Intel CPUs that are not in the affected processor list for MMIO Stale Data vulnerabilities currently report "Not affected" in sysfs, which may not be correct. Vulnerability status for these older CPUs is unknown.
Add known-not-affected CPUs to the whitelist. Report "unknown" mitigation status for CPUs that are not in blacklist, whitelist and also don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware immunity to MMIO Stale Data vulnerabilities.
Mitigation is not deployed when the status is unknown.
[ bp: Massage, fixup. ]
Fixes: 8d50cdf8b834 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data") Suggested-by: Andrew Cooper andrew.cooper3@citrix.com Suggested-by: Tony Luck tony.luck@intel.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Borislav Petkov bp@suse.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.165784626... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- .../hw-vuln/processor_mmio_stale_data.rst | 14 +++++++ arch/x86/include/asm/cpufeatures.h | 5 ++- arch/x86/kernel/cpu/bugs.c | 14 ++++++- arch/x86/kernel/cpu/common.c | 42 ++++++++++++------- 4 files changed, 56 insertions(+), 19 deletions(-)
diff --git a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst index 9393c50b5afc..c98fd11907cc 100644 --- a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst +++ b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst @@ -230,6 +230,20 @@ The possible values in this file are: * - 'Mitigation: Clear CPU buffers' - The processor is vulnerable and the CPU buffer clearing mitigation is enabled. + * - 'Unknown: No mitigations' + - The processor vulnerability status is unknown because it is + out of Servicing period. Mitigation is not attempted. + +Definitions: +------------ + +Servicing period: The process of providing functional and security updates to +Intel processors or platforms, utilizing the Intel Platform Update (IPU) +process or other similar mechanisms. + +End of Servicing Updates (ESU): ESU is the date at which Intel will no +longer provide Servicing, such as through IPU or other similar update +processes. ESU dates will typically be aligned to end of quarter.
If the processor is vulnerable then the following information is appended to the above information: diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index ef38a1d33b76..3cb7e1793c25 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -468,7 +468,8 @@ #define X86_BUG_ITLB_MULTIHIT X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */ #define X86_BUG_SRBDS X86_BUG(24) /* CPU may leak RNG bits if not mitigated */ #define X86_BUG_MMIO_STALE_DATA X86_BUG(25) /* CPU is affected by Processor MMIO Stale Data vulnerabilities */ -#define X86_BUG_RETBLEED X86_BUG(26) /* CPU is affected by RETBleed */ -#define X86_BUG_EIBRS_PBRSB X86_BUG(27) /* EIBRS is vulnerable to Post Barrier RSB Predictions */ +#define X86_BUG_MMIO_UNKNOWN X86_BUG(26) /* CPU is too old and its MMIO Stale Data status is unknown */ +#define X86_BUG_RETBLEED X86_BUG(27) /* CPU is affected by RETBleed */ +#define X86_BUG_EIBRS_PBRSB X86_BUG(28) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
#endif /* _ASM_X86_CPUFEATURES_H */ diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 1fd8df7e7f7c..29ae372eb949 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -424,7 +424,8 @@ static void __init mmio_select_mitigation(void) u64 ia32_cap;
if (!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA) || - cpu_mitigations_off()) { + boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN) || + cpu_mitigations_off()) { mmio_mitigation = MMIO_MITIGATION_OFF; return; } @@ -529,6 +530,8 @@ static void __init md_clear_update_mitigation(void) pr_info("TAA: %s\n", taa_strings[taa_mitigation]); if (boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA)) pr_info("MMIO Stale Data: %s\n", mmio_strings[mmio_mitigation]); + else if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN)) + pr_info("MMIO Stale Data: Unknown: No mitigations\n"); }
static void __init md_clear_select_mitigation(void) @@ -2198,6 +2201,9 @@ static ssize_t tsx_async_abort_show_state(char *buf)
static ssize_t mmio_stale_data_show_state(char *buf) { + if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN)) + return sysfs_emit(buf, "Unknown: No mitigations\n"); + if (mmio_mitigation == MMIO_MITIGATION_OFF) return sysfs_emit(buf, "%s\n", mmio_strings[mmio_mitigation]);
@@ -2344,6 +2350,7 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr return srbds_show_state(buf);
case X86_BUG_MMIO_STALE_DATA: + case X86_BUG_MMIO_UNKNOWN: return mmio_stale_data_show_state(buf);
case X86_BUG_RETBLEED: @@ -2403,7 +2410,10 @@ ssize_t cpu_show_srbds(struct device *dev, struct device_attribute *attr, char *
ssize_t cpu_show_mmio_stale_data(struct device *dev, struct device_attribute *attr, char *buf) { - return cpu_show_common(dev, attr, buf, X86_BUG_MMIO_STALE_DATA); + if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN)) + return cpu_show_common(dev, attr, buf, X86_BUG_MMIO_UNKNOWN); + else + return cpu_show_common(dev, attr, buf, X86_BUG_MMIO_STALE_DATA); }
ssize_t cpu_show_retbleed(struct device *dev, struct device_attribute *attr, char *buf) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 281629d0b8c7..00974e304028 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1023,7 +1023,8 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c) #define NO_SWAPGS BIT(6) #define NO_ITLB_MULTIHIT BIT(7) #define NO_SPECTRE_V2 BIT(8) -#define NO_EIBRS_PBRSB BIT(9) +#define NO_MMIO BIT(9) +#define NO_EIBRS_PBRSB BIT(10)
#define VULNWL(vendor, family, model, whitelist) \ X86_MATCH_VENDOR_FAM_MODEL(vendor, family, model, whitelist) @@ -1044,6 +1045,11 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = { VULNWL(NSC, 5, X86_MODEL_ANY, NO_SPECULATION),
/* Intel Family 6 */ + VULNWL_INTEL(TIGERLAKE, NO_MMIO), + VULNWL_INTEL(TIGERLAKE_L, NO_MMIO), + VULNWL_INTEL(ALDERLAKE, NO_MMIO), + VULNWL_INTEL(ALDERLAKE_L, NO_MMIO), + VULNWL_INTEL(ATOM_SALTWELL, NO_SPECULATION | NO_ITLB_MULTIHIT), VULNWL_INTEL(ATOM_SALTWELL_TABLET, NO_SPECULATION | NO_ITLB_MULTIHIT), VULNWL_INTEL(ATOM_SALTWELL_MID, NO_SPECULATION | NO_ITLB_MULTIHIT), @@ -1062,9 +1068,9 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = { VULNWL_INTEL(ATOM_AIRMONT_MID, NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT), VULNWL_INTEL(ATOM_AIRMONT_NP, NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
- VULNWL_INTEL(ATOM_GOLDMONT, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT), - VULNWL_INTEL(ATOM_GOLDMONT_D, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT), - VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB), + VULNWL_INTEL(ATOM_GOLDMONT, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO), + VULNWL_INTEL(ATOM_GOLDMONT_D, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO), + VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_EIBRS_PBRSB),
/* * Technically, swapgs isn't serializing on AMD (despite it previously @@ -1079,18 +1085,18 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = { VULNWL_INTEL(ATOM_TREMONT_D, NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),
/* AMD Family 0xf - 0x12 */ - VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT), - VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT), - VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT), - VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT), + VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO), + VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO), + VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO), + VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
/* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */ - VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT), - VULNWL_HYGON(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT), + VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO), + VULNWL_HYGON(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
/* Zhaoxin Family 7 */ - VULNWL(CENTAUR, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS), - VULNWL(ZHAOXIN, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS), + VULNWL(CENTAUR, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS | NO_MMIO), + VULNWL(ZHAOXIN, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS | NO_MMIO), {} };
@@ -1244,10 +1250,16 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c) * Affected CPU list is generally enough to enumerate the vulnerability, * but for virtualization case check for ARCH_CAP MSR bits also, VMM may * not want the guest to enumerate the bug. + * + * Set X86_BUG_MMIO_UNKNOWN for CPUs that are neither in the blacklist, + * nor in the whitelist and also don't enumerate MSR ARCH_CAP MMIO bits. */ - if (cpu_matches(cpu_vuln_blacklist, MMIO) && - !arch_cap_mmio_immune(ia32_cap)) - setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA); + if (!arch_cap_mmio_immune(ia32_cap)) { + if (cpu_matches(cpu_vuln_blacklist, MMIO)) + setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA); + else if (!cpu_matches(cpu_vuln_whitelist, NO_MMIO)) + setup_force_cpu_bug(X86_BUG_MMIO_UNKNOWN); + }
if (!cpu_has(c, X86_FEATURE_BTC_NO)) { if (cpu_matches(cpu_vuln_blacklist, RETBLEED) || (ia32_cap & ARCH_CAP_RSBA))
From: Siddh Raman Pant code@siddh.me
stable inclusion from stable-v5.10.140 commit 6858933131d0dadac071c4d33335a9ea4b8e76cf category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit c490a0b5a4f36da3918181a8acdc6991d967c5f3 upstream.
The userspace can configure a loop using an ioctl call, wherein a configuration of type loop_config is passed (see lo_ioctl()'s case on line 1550 of drivers/block/loop.c). This proceeds to call loop_configure() which in turn calls loop_set_status_from_info() (see line 1050 of loop.c), passing &config->info which is of type loop_info64*. This function then sets the appropriate values, like the offset.
loop_device has lo_offset of type loff_t (see line 52 of loop.c), which is typdef-chained to long long, whereas loop_info64 has lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).
The function directly copies offset from info to the device as follows (See line 980 of loop.c): lo->lo_offset = info->lo_offset;
This results in an overflow, which triggers a warning in iomap_iter() due to a call to iomap_iter_done() which has: WARN_ON_ONCE(iter->iomap.offset > iter->pos);
Thus, check for negative value during loop_set_status_from_info().
Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29...
Reported-and-tested-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Siddh Raman Pant code@siddh.me Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.me Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/block/loop.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 9ee97e593347..76b96c42f417 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1031,6 +1031,11 @@ loop_set_status_from_info(struct loop_device *lo,
lo->lo_offset = info->lo_offset; lo->lo_sizelimit = info->lo_sizelimit; + + /* loff_t vars have been assigned __u64 */ + if (lo->lo_offset < 0 || lo->lo_sizelimit < 0) + return -EOVERFLOW; + memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE); memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE); lo->lo_file_name[LO_NAME_SIZE-1] = 0;
From: Quanyang Wang quanyang.wang@windriver.com
stable inclusion from stable-v5.10.140 commit c60ae878782db483918eca4b398a9c8f076aed06 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 0c7d7cc2b4fe2e74ef8728f030f0f1674f9f6aee upstream.
There are two problems with the current code of memory_intersects:
First, it doesn't check whether the region (begin, end) falls inside the region (virt, vend), that is (virt < begin && vend > end).
The second problem is if vend is equal to begin, it will return true but this is wrong since vend (virt + size) is not the last address of the memory region but (virt + size -1) is. The wrong determination will trigger the misreporting when the function check_for_illegal_area calls memory_intersects to check if the dma region intersects with stext region.
The misreporting is as below (stext is at 0x80100000): WARNING: CPU: 0 PID: 77 at kernel/dma/debug.c:1073 check_for_illegal_area+0x130/0x168 DMA-API: chipidea-usb2 e0002000.usb: device driver maps memory from kernel text or rodata [addr=800f0000] [len=65536] Modules linked in: CPU: 1 PID: 77 Comm: usb-storage Not tainted 5.19.0-yocto-standard #5 Hardware name: Xilinx Zynq Platform unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from __warn+0xb0/0x198 __warn from warn_slowpath_fmt+0x80/0xb4 warn_slowpath_fmt from check_for_illegal_area+0x130/0x168 check_for_illegal_area from debug_dma_map_sg+0x94/0x368 debug_dma_map_sg from __dma_map_sg_attrs+0x114/0x128 __dma_map_sg_attrs from dma_map_sg_attrs+0x18/0x24 dma_map_sg_attrs from usb_hcd_map_urb_for_dma+0x250/0x3b4 usb_hcd_map_urb_for_dma from usb_hcd_submit_urb+0x194/0x214 usb_hcd_submit_urb from usb_sg_wait+0xa4/0x118 usb_sg_wait from usb_stor_bulk_transfer_sglist+0xa0/0xec usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x38/0x70 usb_stor_bulk_srb from usb_stor_Bulk_transport+0x150/0x360 usb_stor_Bulk_transport from usb_stor_invoke_transport+0x38/0x440 usb_stor_invoke_transport from usb_stor_control_thread+0x1e0/0x238 usb_stor_control_thread from kthread+0xf8/0x104 kthread from ret_from_fork+0x14/0x2c
Refactor memory_intersects to fix the two problems above.
Before the 1d7db834a027e ("dma-debug: use memory_intersects() directly"), memory_intersects is called only by printk_late_init:
printk_late_init -> init_section_intersects ->memory_intersects.
There were few places where memory_intersects was called.
When commit 1d7db834a027e ("dma-debug: use memory_intersects() directly") was merged and CONFIG_DMA_API_DEBUG is enabled, the DMA subsystem uses it to check for an illegal area and the calltrace above is triggered.
[akpm@linux-foundation.org: fix nearby comment typo] Link: https://lkml.kernel.org/r/20220819081145.948016-1-quanyang.wang@windriver.co... Fixes: 979559362516 ("asm/sections: add helpers to check for section data") Signed-off-by: Quanyang Wang quanyang.wang@windriver.com Cc: Ard Biesheuvel ardb@kernel.org Cc: Arnd Bergmann arnd@arndb.de Cc: Thierry Reding treding@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- include/asm-generic/sections.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index d16302d3eb59..72f1e2a8c167 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -114,7 +114,7 @@ static inline bool memory_contains(void *begin, void *end, void *virt, /** * memory_intersects - checks if the region occupied by an object intersects * with another memory region - * @begin: virtual address of the beginning of the memory regien + * @begin: virtual address of the beginning of the memory region * @end: virtual address of the end of the memory region * @virt: virtual address of the memory object * @size: size of the memory object @@ -127,7 +127,10 @@ static inline bool memory_intersects(void *begin, void *end, void *virt, { void *vend = virt + size;
- return (virt >= begin && virt < end) || (vend >= begin && vend < end); + if (virt < end && vend > begin) + return true; + + return false; }
/**
From: Brian Foster bfoster@redhat.com
stable inclusion from stable-v5.10.140 commit 297ae7e87a87a001dd3dfeac1cb26a42fd929708 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 13cccafe0edcd03bf1c841de8ab8a1c8e34f77d9 upstream.
The pointers for guarded storage and runtime instrumentation control blocks are stored in the thread_struct of the associated task. These pointers are initially copied on fork() via arch_dup_task_struct() and then cleared via copy_thread() before fork() returns. If fork() happens to fail after the initial task dup and before copy_thread(), the newly allocated task and associated thread_struct memory are freed via free_task() -> arch_release_task_struct(). This results in a double free of the guarded storage and runtime info structs because the fields in the failed task still refer to memory associated with the source task.
This problem can manifest as a BUG_ON() in set_freepointer() (with CONFIG_SLAB_FREELIST_HARDENED enabled) or KASAN splat (if enabled) when running trinity syscall fuzz tests on s390x. To avoid this problem, clear the associated pointer fields in arch_dup_task_struct() immediately after the new task is copied. Note that the RI flag is still cleared in copy_thread() because it resides in thread stack memory and that is where stack info is copied.
Signed-off-by: Brian Foster bfoster@redhat.com Fixes: 8d9047f8b967c ("s390/runtime instrumentation: simplify task exit handling") Fixes: 7b83c6297d2fc ("s390/guarded storage: simplify task exit handling") Cc: stable@vger.kernel.org # 4.15 Reviewed-by: Gerald Schaefer gerald.schaefer@linux.ibm.com Reviewed-by: Heiko Carstens hca@linux.ibm.com Link: https://lore.kernel.org/r/20220816155407.537372-1-bfoster@redhat.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- arch/s390/kernel/process.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c index ec801d3bbb37..137a170f47d4 100644 --- a/arch/s390/kernel/process.c +++ b/arch/s390/kernel/process.c @@ -77,6 +77,18 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
memcpy(dst, src, arch_task_struct_size); dst->thread.fpu.regs = dst->thread.fpu.fprs; + + /* + * Don't transfer over the runtime instrumentation or the guarded + * storage control block pointers. These fields are cleared here instead + * of in copy_thread() to avoid premature freeing of associated memory + * on fork() failure. Wait to clear the RI flag because ->stack still + * refers to the source thread. + */ + dst->thread.ri_cb = NULL; + dst->thread.gs_cb = NULL; + dst->thread.gs_bc_cb = NULL; + return 0; }
@@ -134,13 +146,11 @@ int copy_thread(unsigned long clone_flags, unsigned long new_stackp, frame->childregs.flags = 0; if (new_stackp) frame->childregs.gprs[15] = new_stackp; - - /* Don't copy runtime instrumentation info */ - p->thread.ri_cb = NULL; + /* + * Clear the runtime instrumentation flag after the above childregs + * copy. The CB pointer was already cleared in arch_dup_task_struct(). + */ frame->childregs.psw.mask &= ~PSW_MASK_RI; - /* Don't copy guarded storage control block */ - p->thread.gs_cb = NULL; - p->thread.gs_bc_cb = NULL;
/* Set a new TLS ? */ if (clone_flags & CLONE_SETTLS) {
From: Riwen Lu luriwen@kylinos.cn
stable inclusion from stable-v5.10.140 commit 8d5f8a4f25b1cb88e5a75e83e223e1197f9d734d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 36527b9d882362567ceb4eea8666813280f30e6f upstream.
The freq Qos request would be removed repeatedly if the cpufreq policy relates to more than one CPU. Then, it would cause the "called for unknown object" warning.
Remove the freq Qos request for each CPU relates to the cpufreq policy, instead of removing repeatedly for the last CPU of it.
Fixes: a1bb46c36ce3 ("ACPI: processor: Add QoS requests for all CPUs") Reported-by: Jeremy Linton Jeremy.Linton@arm.com Tested-by: Jeremy Linton jeremy.linton@arm.com Signed-off-by: Riwen Lu luriwen@kylinos.cn Cc: 5.4+ stable@vger.kernel.org # 5.4+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/acpi/processor_thermal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/acpi/processor_thermal.c b/drivers/acpi/processor_thermal.c index 6c7d05b37c98..7df0c6e3ba63 100644 --- a/drivers/acpi/processor_thermal.c +++ b/drivers/acpi/processor_thermal.c @@ -148,7 +148,7 @@ void acpi_thermal_cpufreq_exit(struct cpufreq_policy *policy) unsigned int cpu;
for_each_cpu(cpu, policy->related_cpus) { - struct acpi_processor *pr = per_cpu(processors, policy->cpu); + struct acpi_processor *pr = per_cpu(processors, cpu);
if (pr) freq_qos_remove_request(&pr->thermal_req);
From: Juergen Gross jgross@suse.com
stable inclusion from stable-v5.10.140 commit 6de50db104af0dc921f593fd95c55db86a52ceef category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit c5deb27895e017a0267de0a20d140ad5fcc55a54 upstream.
The error exit of privcmd_ioctl_dm_op() is calling unlock_pages() potentially with pages being NULL, leading to a NULL dereference.
Additionally lock_pages() doesn't check for pin_user_pages_fast() having been completely successful, resulting in potentially not locking all pages into memory. This could result in sporadic failures when using the related memory in user mode.
Fix all of that by calling unlock_pages() always with the real number of pinned pages, which will be zero in case pages being NULL, and by checking the number of pages pinned by pin_user_pages_fast() matching the expected number of pages.
Cc: stable@vger.kernel.org Fixes: ab520be8cd5d ("xen/privcmd: Add IOCTL_PRIVCMD_DM_OP") Reported-by: Rustam Subkhankulov subkhankulov@ispras.ru Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Jan Beulich jbeulich@suse.com Reviewed-by: Oleksandr Tyshchenko oleksandr_tyshchenko@epam.com Link: https://lore.kernel.org/r/20220825141918.3581-1-jgross@suse.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/xen/privcmd.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index fe8df32bb612..cd5f2f09468e 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -581,27 +581,30 @@ static int lock_pages( struct privcmd_dm_op_buf kbufs[], unsigned int num, struct page *pages[], unsigned int nr_pages, unsigned int *pinned) { - unsigned int i; + unsigned int i, off = 0;
- for (i = 0; i < num; i++) { + for (i = 0; i < num; ) { unsigned int requested; int page_count;
requested = DIV_ROUND_UP( offset_in_page(kbufs[i].uptr) + kbufs[i].size, - PAGE_SIZE); + PAGE_SIZE) - off; if (requested > nr_pages) return -ENOSPC;
page_count = pin_user_pages_fast( - (unsigned long) kbufs[i].uptr, + (unsigned long)kbufs[i].uptr + off * PAGE_SIZE, requested, FOLL_WRITE, pages); - if (page_count < 0) - return page_count; + if (page_count <= 0) + return page_count ? : -EFAULT;
*pinned += page_count; nr_pages -= page_count; pages += page_count; + + off = (requested == page_count) ? 0 : off + page_count; + i += !off; }
return 0; @@ -677,10 +680,8 @@ static long privcmd_ioctl_dm_op(struct file *file, void __user *udata) }
rc = lock_pages(kbufs, kdata.num, pages, nr_pages, &pinned); - if (rc < 0) { - nr_pages = pinned; + if (rc < 0) goto out; - }
for (i = 0; i < kdata.num; i++) { set_xen_guest_handle(xbufs[i].h, kbufs[i].uptr); @@ -692,7 +693,7 @@ static long privcmd_ioctl_dm_op(struct file *file, void __user *udata) xen_preemptible_hcall_end();
out: - unlock_pages(pages, nr_pages); + unlock_pages(pages, pinned); kfree(xbufs); kfree(pages); kfree(kbufs);
From: David Hildenbrand david@redhat.com
stable inclusion from stable-v5.10.140 commit 62af37c5cd7f5fd071086cab645844bf5bcdc0ef category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit f96f7a40874d7c746680c0b9f57cef2262ae551f upstream.
Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.
I observed that hugetlb does not support/expect write-faults in shared mappings that would have to map the R/O-mapped page writable -- and I found two case where we could currently get such faults and would erroneously map an anon page into a shared mapping.
Reproducers part of the patches.
I propose to backport both fixes to stable trees. The first fix needs a small adjustment.
This patch (of 2):
Staring at hugetlb_wp(), one might wonder where all the logic for shared mappings is when stumbling over a write-protected page in a shared mapping. In fact, there is none, and so far we thought we could get away with that because e.g., mprotect() should always do the right thing and map all pages directly writable.
Looks like we were wrong:
-------------------------------------------------------------------------- #include <stdio.h> #include <stdlib.h> #include <string.h> #include <fcntl.h> #include <unistd.h> #include <errno.h> #include <sys/mman.h>
#define HUGETLB_SIZE (2 * 1024 * 1024u)
static void clear_softdirty(void) { int fd = open("/proc/self/clear_refs", O_WRONLY); const char *ctrl = "4"; int ret;
if (fd < 0) { fprintf(stderr, "open(clear_refs) failed\n"); exit(1); } ret = write(fd, ctrl, strlen(ctrl)); if (ret != strlen(ctrl)) { fprintf(stderr, "write(clear_refs) failed\n"); exit(1); } close(fd); }
int main(int argc, char **argv) { char *map; int fd;
fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT); if (!fd) { fprintf(stderr, "open() failed\n"); return -errno; } if (ftruncate(fd, HUGETLB_SIZE)) { fprintf(stderr, "ftruncate() failed\n"); return -errno; }
map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); if (map == MAP_FAILED) { fprintf(stderr, "mmap() failed\n"); return -errno; }
*map = 0;
if (mprotect(map, HUGETLB_SIZE, PROT_READ)) { fprintf(stderr, "mmprotect() failed\n"); return -errno; }
clear_softdirty();
if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) { fprintf(stderr, "mmprotect() failed\n"); return -errno; }
*map = 0;
return 0; } --------------------------------------------------------------------------
Above test fails with SIGBUS when there is only a single free hugetlb page. # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # ./test Bus error (core dumped)
And worse, with sufficient free hugetlb pages it will map an anonymous page into a shared mapping, for example, messing up accounting during unmap and breaking MAP_SHARED semantics: # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # ./test # cat /proc/meminfo | grep HugePages_ HugePages_Total: 2 HugePages_Free: 1 HugePages_Rsvd: 18446744073709551615 HugePages_Surp: 0
Reason in this particular case is that vma_wants_writenotify() will return "true", removing VM_SHARED in vma_set_page_prot() to map pages write-protected. Let's teach vma_wants_writenotify() that hugetlb does not support softdirty tracking.
Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared") Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Cc: Peter Feiner pfeiner@google.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Cyrill Gorcunov gorcunov@openvz.org Cc: Pavel Emelyanov xemul@parallels.com Cc: Jamie Liu jamieliu@google.com Cc: Hugh Dickins hughd@google.com Cc: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Cc: Bjorn Helgaas bhelgaas@google.com Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: stable@vger.kernel.org [3.18+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: David Hildenbrand david@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- mm/mmap.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c index 07824ada1949..568bc1b68fae 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1952,8 +1952,12 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) pgprot_val(vm_pgprot_modify(vm_page_prot, vm_flags))) return 0;
- /* Do we need to track softdirty? */ - if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) && !(vm_flags & VM_SOFTDIRTY)) + /* + * Do we need to track softdirty? hugetlb does not support softdirty + * tracking yet. + */ + if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) && !(vm_flags & VM_SOFTDIRTY) && + !is_vm_hugetlb_page(vma)) return 1;
/* Specialty mapping? */
From: Guoqing Jiang guoqing.jiang@linux.dev
stable inclusion from stable-v5.10.140 commit f68f025c7e692f817d5f459253e6604bb1417977 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 1d258758cf06a0734482989911d184dd5837ed4e upstream.
This reverts commit e151db8ecfb019b7da31d076130a794574c89f6f. Because it obviously breaks clustered raid as noticed by Neil though it fixed KASAN issue for dm-raid, let's revert it and fix KASAN issue in next commit.
[1]. https://lore.kernel.org/linux-raid/a6657e08-b6a7-358b-2d2a-0ac37d49d23a@linu...
Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread") Signed-off-by: Guoqing Jiang guoqing.jiang@linux.dev Signed-off-by: Song Liu song@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/md/md.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c index abee20704bd5..9e6989c05314 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -6278,11 +6278,11 @@ static void mddev_detach(struct mddev *mddev) static void __md_stop(struct mddev *mddev) { struct md_personality *pers = mddev->pers; + md_bitmap_destroy(mddev); mddev_detach(mddev); /* Ensure ->event_work is done */ if (mddev->event_work.func) flush_workqueue(md_misc_wq); - md_bitmap_destroy(mddev); spin_lock(&mddev->lock); mddev->pers = NULL; spin_unlock(&mddev->lock);
From: Guoqing Jiang guoqing.jiang@linux.dev
stable inclusion from stable-v5.10.140 commit a5a58fab556bfe618b4c9719eb85712d78c6cb10 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 0dd84b319352bb8ba64752d4e45396d8b13e6018 upstream.
From the link [1], we can see raid1d was running even after the path raid_dtr -> md_stop -> __md_stop.
Let's stop write first in destructor to align with normal md-raid to fix the KASAN issue.
[1]. https://lore.kernel.org/linux-raid/CAPhsuW5gc4AakdGNdF8ubpezAuDLFOYUO_sfMZce...
Fixes: 48df498daf62 ("md: move bitmap_destroy to the beginning of __md_stop") Reported-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Guoqing Jiang guoqing.jiang@linux.dev Signed-off-by: Song Liu song@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/md/md.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/md/md.c b/drivers/md/md.c index 9e6989c05314..8658e3df3b24 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -6299,6 +6299,7 @@ void md_stop(struct mddev *mddev) /* stop the array and free an attached data structures. * This is called from dm-raid */ + __md_stop_writes(mddev); __md_stop(mddev); bioset_exit(&mddev->bio_set); bioset_exit(&mddev->sync_set);
From: Zenghui Yu yuzenghui@huawei.com
stable inclusion from stable-v5.10.140 commit 189623261994fe6e7416b3648f2c0d5f71e9e3b4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 5e1e087457c94ad7fafbe1cf6f774c6999ee29d4 upstream.
Since commit 51f559d66527 ("arm64: Enable repeat tlbi workaround on KRYO4XX gold CPUs"), we failed to detect erratum 1286807 on Cortex-A76 because its entry in arm64_repeat_tlbi_list[] was accidently corrupted by this commit.
Fix this issue by creating a separate entry for Kryo4xx Gold.
Fixes: 51f559d66527 ("arm64: Enable repeat tlbi workaround on KRYO4XX gold CPUs") Cc: Shreyas K K quic_shrekk@quicinc.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Acked-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220809043848.969-1-yuzenghui@huawei.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- arch/arm64/kernel/cpu_errata.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index a4f258c83a1f..3e7f2c546e16 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -292,6 +292,8 @@ static const struct arm64_cpu_capabilities arm64_repeat_tlbi_list[] = { #ifdef CONFIG_ARM64_ERRATUM_1286807 { ERRATA_MIDR_RANGE(MIDR_CORTEX_A76, 0, 0, 3, 0), + }, + { /* Kryo4xx Gold (rcpe to rfpe) => (r0p0 to r3p0) */ ERRATA_MIDR_RANGE(MIDR_QCOM_KRYO_4XX_GOLD, 0xc, 0xe, 0xf, 0xe), },
From: Salvatore Bonaccorso carnil@debian.org
stable inclusion from stable-v5.10.140 commit 7ca73d0a16e382ae365d5fab7e2e47f753d18f35 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 00da0cb385d05a89226e150a102eb49d8abb0359 upstream.
While reporting for the AMD retbleed vulnerability was added in
6b80b59b3555 ("x86/bugs: Report AMD retbleed vulnerability")
the new sysfs file was not mentioned so far in the ABI documentation for sysfs-devices-system-cpu. Fix that.
Fixes: 6b80b59b3555 ("x86/bugs: Report AMD retbleed vulnerability") Signed-off-by: Salvatore Bonaccorso carnil@debian.org Signed-off-by: Borislav Petkov bp@suse.de Link: https://lore.kernel.org/r/20220801091529.325327-1-carnil@debian.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- Documentation/ABI/testing/sysfs-devices-system-cpu | 1 + 1 file changed, 1 insertion(+)
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 44c6e5730398..500d5d8937cb 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -511,6 +511,7 @@ What: /sys/devices/system/cpu/vulnerabilities /sys/devices/system/cpu/vulnerabilities/tsx_async_abort /sys/devices/system/cpu/vulnerabilities/itlb_multihit /sys/devices/system/cpu/vulnerabilities/mmio_stale_data + /sys/devices/system/cpu/vulnerabilities/retbleed Date: January 2018 Contact: Linux kernel mailing list linux-kernel@vger.kernel.org Description: Information about CPU vulnerabilities
From: James Clark james.clark@arm.com
stable inclusion from stable-v5.10.140 commit 5a768c977085e75eaa7471fcefe65c01a7c1569c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit bc9e7fe313d5e56d4d5f34bcc04d1165f94f86fb upstream.
The previous change to Python autodetection had a small mistake where the auto value was used to determine the Python binary, rather than the user supplied value. The Python binary is only used for one part of the build process, rather than the final linking, so it was producing correct builds in most scenarios, especially when the auto detected value matched what the user wanted, or the system only had a valid set of Pythons.
Change it so that the Python binary path is derived from either the PYTHON_CONFIG value or PYTHON value, depending on what is specified by the user. This was the original intention.
This error was spotted in a build failure an odd cross compilation environment after commit 4c41cb46a732fe82 ("perf python: Prefer python3") was merged.
Fixes: 630af16eee495f58 ("perf tools: Use Python devtools for version autodetection rather than runtime") Signed-off-by: James Clark james.clark@arm.com Acked-by: Ian Rogers irogers@google.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Ingo Molnar mingo@redhat.com Cc: James Clark james.clark@arm.com Cc: Jiri Olsa jolsa@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Link: https://lore.kernel.org/r/20220728093946.1337642-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- tools/perf/Makefile.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config index e3df1705df61..f195e079660f 100644 --- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -254,7 +254,7 @@ endif # defined. get-executable-or-default fails with an error if the first argument is supplied but # doesn't exist. override PYTHON_CONFIG := $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON_AUTO)) -override PYTHON := $(call get-executable-or-default,PYTHON,$(subst -config,,$(PYTHON_AUTO))) +override PYTHON := $(call get-executable-or-default,PYTHON,$(subst -config,,$(PYTHON_CONFIG)))
grep-libs = $(filter -l%,$(1)) strip-libs = $(filter-out -l%,$(1))
From: Stephane Eranian eranian@google.com
stable inclusion from stable-v5.10.140 commit c0ba9aa95bf79aa49f1097ff1bb5884c8e08023c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 11745ecfe8fea4b4a4c322967a7605d2ecbd5080 upstream.
Existing code was generating bogus counts for the SNB IMC bandwidth counters:
$ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/ 1.000327813 1,024.03 MiB uncore_imc/data_reads/ 1.000327813 20.73 MiB uncore_imc/data_writes/ 2.000580153 261,120.00 MiB uncore_imc/data_reads/ 2.000580153 23.28 MiB uncore_imc/data_writes/
The problem was introduced by commit: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC")
Where the read_counter callback was replace to point to the generic uncore_mmio_read_counter() function.
The SNB IMC counters are freerunnig 32-bit counters laid out contiguously in MMIO. But uncore_mmio_read_counter() is using a readq() call to read from MMIO therefore reading 64-bit from MMIO. Although this is okay for the uncore_perf_event_update() function because it is shifting the value based on the actual counter width to compute a delta, it is not okay for the uncore_pmu_event_start() which is simply reading the counter and therefore priming the event->prev_count with a bogus value which is responsible for causing bogus deltas in the perf stat command above.
The fix is to reintroduce the custom callback for read_counter for the SNB IMC PMU and use readl() instead of readq(). With the change the output of perf stat is back to normal: $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/ 1.000120987 296.94 MiB uncore_imc/data_reads/ 1.000120987 138.42 MiB uncore_imc/data_writes/ 2.000403144 175.91 MiB uncore_imc/data_reads/ 2.000403144 68.50 MiB uncore_imc/data_writes/
Fixes: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC") Signed-off-by: Stephane Eranian eranian@google.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Kan Liang kan.liang@linux.intel.com Link: https://lore.kernel.org/r/20220803160031.1379788-1-eranian@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- arch/x86/events/intel/uncore_snb.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c index f90b80957192..c477272225ce 100644 --- a/arch/x86/events/intel/uncore_snb.c +++ b/arch/x86/events/intel/uncore_snb.c @@ -657,6 +657,22 @@ int snb_pci2phy_map_init(int devid) return 0; }
+static u64 snb_uncore_imc_read_counter(struct intel_uncore_box *box, struct perf_event *event) +{ + struct hw_perf_event *hwc = &event->hw; + + /* + * SNB IMC counters are 32-bit and are laid out back to back + * in MMIO space. Therefore we must use a 32-bit accessor function + * using readq() from uncore_mmio_read_counter() causes problems + * because it is reading 64-bit at a time. This is okay for the + * uncore_perf_event_update() function because it drops the upper + * 32-bits but not okay for plain uncore_read_counter() as invoked + * in uncore_pmu_event_start(). + */ + return (u64)readl(box->io_addr + hwc->event_base); +} + static struct pmu snb_uncore_imc_pmu = { .task_ctx_nr = perf_invalid_context, .event_init = snb_uncore_imc_event_init, @@ -676,7 +692,7 @@ static struct intel_uncore_ops snb_uncore_imc_ops = { .disable_event = snb_uncore_imc_disable_event, .enable_event = snb_uncore_imc_enable_event, .hw_config = snb_uncore_imc_hw_config, - .read_counter = uncore_mmio_read_counter, + .read_counter = snb_uncore_imc_read_counter, };
static struct intel_uncore_type snb_uncore_imc = {
From: Kiwoong Kim kwmad.kim@samsung.com
stable inclusion from stable-v5.10.140 commit 8d5c106fe216bf16080d7070c37adf56a9227e60 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 6d17a112e9a63ff6a5edffd1676b99e0ffbcd269 upstream.
Link lost is treated as fatal error with commit c99b9b230149 ("scsi: ufs: Treat link loss as fatal error"), but the event isn't registered as interrupt source. Enable it.
Link: https://lore.kernel.org/r/1659404551-160958-1-git-send-email-kwmad.kim@samsu... Fixes: c99b9b230149 ("scsi: ufs: Treat link loss as fatal error") Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Kiwoong Kim kwmad.kim@samsung.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/scsi/ufs/ufshci.h | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/scsi/ufs/ufshci.h b/drivers/scsi/ufs/ufshci.h index 1d999228efc8..e38094131811 100644 --- a/drivers/scsi/ufs/ufshci.h +++ b/drivers/scsi/ufs/ufshci.h @@ -129,11 +129,7 @@ enum {
#define UFSHCD_UIC_MASK (UIC_COMMAND_COMPL | UFSHCD_UIC_PWR_MASK)
-#define UFSHCD_ERROR_MASK (UIC_ERROR |\ - DEVICE_FATAL_ERROR |\ - CONTROLLER_FATAL_ERROR |\ - SYSTEM_BUS_FATAL_ERROR |\ - CRYPTO_ENGINE_FATAL_ERROR) +#define UFSHCD_ERROR_MASK (UIC_ERROR | INT_FATAL_ERRORS)
#define INT_FATAL_ERRORS (DEVICE_FATAL_ERROR |\ CONTROLLER_FATAL_ERROR |\
From: Saurabh Sengar ssengar@linux.microsoft.com
stable inclusion from stable-v5.10.140 commit 46fcb0fc884db78a0384be92cc2a51927e6581b8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit d957e7ffb2c72410bcc1a514153a46719255a5da upstream.
storvsc_error_wq workqueue should not be marked as WQ_MEM_RECLAIM as it doesn't need to make forward progress under memory pressure. Marking this workqueue as WQ_MEM_RECLAIM may cause deadlock while flushing a non-WQ_MEM_RECLAIM workqueue. In the current state it causes the following warning:
[ 14.506347] ------------[ cut here ]------------ [ 14.506354] workqueue: WQ_MEM_RECLAIM storvsc_error_wq_0:storvsc_remove_lun is flushing !WQ_MEM_RECLAIM events_freezable_power_:disk_events_workfn [ 14.506360] WARNING: CPU: 0 PID: 8 at <-snip->kernel/workqueue.c:2623 check_flush_dependency+0xb5/0x130 [ 14.506390] CPU: 0 PID: 8 Comm: kworker/u4:0 Not tainted 5.4.0-1086-azure #91~18.04.1-Ubuntu [ 14.506391] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022 [ 14.506393] Workqueue: storvsc_error_wq_0 storvsc_remove_lun [ 14.506395] RIP: 0010:check_flush_dependency+0xb5/0x130 <-snip-> [ 14.506408] Call Trace: [ 14.506412] __flush_work+0xf1/0x1c0 [ 14.506414] __cancel_work_timer+0x12f/0x1b0 [ 14.506417] ? kernfs_put+0xf0/0x190 [ 14.506418] cancel_delayed_work_sync+0x13/0x20 [ 14.506420] disk_block_events+0x78/0x80 [ 14.506421] del_gendisk+0x3d/0x2f0 [ 14.506423] sr_remove+0x28/0x70 [ 14.506427] device_release_driver_internal+0xef/0x1c0 [ 14.506428] device_release_driver+0x12/0x20 [ 14.506429] bus_remove_device+0xe1/0x150 [ 14.506431] device_del+0x167/0x380 [ 14.506432] __scsi_remove_device+0x11d/0x150 [ 14.506433] scsi_remove_device+0x26/0x40 [ 14.506434] storvsc_remove_lun+0x40/0x60 [ 14.506436] process_one_work+0x209/0x400 [ 14.506437] worker_thread+0x34/0x400 [ 14.506439] kthread+0x121/0x140 [ 14.506440] ? process_one_work+0x400/0x400 [ 14.506441] ? kthread_park+0x90/0x90 [ 14.506443] ret_from_fork+0x35/0x40 [ 14.506445] ---[ end trace 2d9633159fdc6ee7 ]---
Link: https://lore.kernel.org/r/1659628534-17539-1-git-send-email-ssengar@linux.mi... Fixes: 436ad9413353 ("scsi: storvsc: Allow only one remove lun work item to be issued per lun") Reviewed-by: Michael Kelley mikelley@microsoft.com Signed-off-by: Saurabh Sengar ssengar@linux.microsoft.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/scsi/storvsc_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index 0ee0b80006e0..7ac1090d4379 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1997,7 +1997,7 @@ static int storvsc_probe(struct hv_device *device, */ host_dev->handle_error_wq = alloc_ordered_workqueue("storvsc_error_wq_%d", - WQ_MEM_RECLAIM, + 0, host->host_no); if (!host_dev->handle_error_wq) { ret = -ENOMEM;