mailweb.openeuler.org
Kernel

kernel@openeuler.org

  • 51 participants
  • 18726 discussions
[PATCH OLK-5.10] ext4: flexibly control whether to enable dioread_nolock by default
by Yang Erkun 09 Aug '24

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAAPPE

--------------------------------

After commit 244adf6426ee ("ext4: make dioread_nolock the default"),
writeback on an ext4 filesystem mounted with dioread_nolock first starts
the journal, then allocates an unwritten extent, updates i_size and marks
the inode dirty. Because the extent is unwritten, we do not call
ext4_jbd2_inode_add_write, so when jbd2 commits the journal it does not
wait for the data to become stable. If power is lost before the data
writepage succeeds, we end up with a file whose size has already been
updated but whose content still reads as zeros, since the extent is still
unwritten. This is intolerable for some production environments, so we
need a way to choose whether dioread_nolock is really enabled by default.

As for why dioread_nolock was made the default: the commit above gives
some description, and the most important problem is that a dioread running
in parallel with a fault write (where writepage for the fault write has
already allocated the block) can read stale data. However, dioread in
parallel with fault write is rarely used, so the impact seems small for
now.

Provide a more flexible way to control whether dioread_nolock is enabled
by default:

- With CONFIG_EXT4_DIOREAD_NOLOCK_PARAM set to N, dioread_nolock is still
  enabled by default.
- With CONFIG_EXT4_DIOREAD_NOLOCK_PARAM set to Y, dioread_nolock is
  disabled by default, and a module param default_dioread_nolock is
  provided to control it; if you want dioread_nolock enabled by default,
  set default_dioread_nolock to 1.

Fixes: 244adf6426ee ("ext4: make dioread_nolock the default")
Signed-off-by: Yang Erkun <yangerkun(a)huawei.com>
---
 fs/ext4/Kconfig |  8 ++++++++
 fs/ext4/super.c | 19 +++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index cd7f1e90c237..2e1a5b0796ed 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -134,3 +134,11 @@ config EXT4_MITIGATION_FALSE_SHARING
 	  Enable this to mitigation cacheline false sharing in ext4 inode info.
 	  If unsure, say N.
+
+config EXT4_DIOREAD_NOLOCK_PARAM
+	bool "Ext4 default_dioread_nolock module param support"
+	depends on EXT4_FS
+	default n
+	help
+	  Support to enable default_dioread_nolock module param, be attention to
+	  that we default disable dioread_nolock with this config set to Y.
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 90f886184fc6..db64771c6e46 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -64,6 +64,20 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/ext4.h>
 
+#ifdef CONFIG_EXT4_DIOREAD_NOLOCK_PARAM
+/*
+ * After 244adf6426ee ("ext4: make dioread_nolock the default"), we will enable
+ * dioread_nolock by default, but this options may lead data lose combine with
+ * poweroff(Since we may first update i_size, and then unwritten extent convert
+ * to written extent). For this case, we give a param to help control does we
+ * really default enable dioread_nolock and we default disable dioread_nolock,
+ * enable it with ext4.default_dioread_nolock=1 if you want.
+ */
+int default_dioread_nolock;
+module_param_named(default_dioread_nolock, default_dioread_nolock, int, 0644);
+MODULE_PARM_DESC(default_dioread_nolock, "Default enable dioread_nolock");
+#endif
+
 static struct ext4_lazy_init *ext4_li_info;
 static struct mutex ext4_li_mtx;
 static struct ratelimit_state ext4_mount_msg_ratelimit;
@@ -4292,8 +4306,13 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 
 	blocksize = EXT4_MIN_BLOCK_SIZE << le32_to_cpu(es->s_log_block_size);
 
+#ifdef CONFIG_EXT4_DIOREAD_NOLOCK_PARAM
+	if (blocksize == PAGE_SIZE && default_dioread_nolock)
+		set_opt(sb, DIOREAD_NOLOCK);
+#else
 	if (blocksize == PAGE_SIZE)
 		set_opt(sb, DIOREAD_NOLOCK);
+#endif
 
 	if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV) {
 		sbi->s_inode_size = EXT4_GOOD_OLD_INODE_SIZE;
-- 
2.39.2
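
The default chosen by the patch above can be read as a small truth table over the Kconfig symbol, the module param and the block size. What follows is only a userspace sketch of that decision, not kernel code: dioread_nolock_default(), SKETCH_PAGE_SIZE and the parameter names are hypothetical and exist purely for illustration, and the 4096-byte page size is an assumption. It models only the default picked in ext4_fill_super(); it says nothing about explicit mount options.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical model of the default selected by the patch.
 *
 * config_param:           stands in for CONFIG_EXT4_DIOREAD_NOLOCK_PARAM (Y/N)
 * default_dioread_nolock: stands in for the module param of the same name
 * blocksize:              filesystem block size in bytes
 *
 * Returns true when DIOREAD_NOLOCK would be set by default.
 */
static bool dioread_nolock_default(bool config_param,
				   int default_dioread_nolock,
				   unsigned int blocksize)
{
	/* Block sizes handled here are assumed to be non-zero powers of two. */
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);

	/* Both branches of the patch require blocksize == PAGE_SIZE. */
	if (blocksize != SKETCH_PAGE_SIZE)
		return false;

	/* Config set to Y: the module param decides, and it starts at 0. */
	if (config_param)
		return default_dioread_nolock != 0;

	/* Config set to N: behaviour is unchanged, enabled by default. */
	return true;
}
```

With these assumptions, dioread_nolock_default(true, 0, 4096) is false and dioread_nolock_default(true, 1, 4096) is true, which corresponds to booting with ext4.default_dioread_nolock=1 as mentioned in the patch comment.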
[PATCH openEuler-22.03-LTS-SP1] wireguard: allowedips: avoid unaligned 64-bit memory accesses
by Zhengchao Shao 09 Aug '24

From: Helge Deller <deller(a)kernel.org>

stable inclusion
from stable-v5.10.222
commit ae630de24efb123d7199a43256396d7758f4cb75
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAILG8
CVE: CVE-2024-42247

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

---------------------------

commit 948f991c62a4018fb81d85804eeab3029c6209f8 upstream.

On the parisc platform, the kernel issues kernel warnings because
swap_endian() tries to load a 128-bit IPv6 address from an unaligned
memory location:

 Kernel: unaligned access to 0x55f4688c in wg_allowedips_insert_v6+0x2c/0x80 [wireguard] (iir 0xf3010df)
 Kernel: unaligned access to 0x55f46884 in wg_allowedips_insert_v6+0x38/0x80 [wireguard] (iir 0xf2010dc)

Avoid such unaligned memory accesses by instead using the
get_unaligned_be64() helper macro.

Signed-off-by: Helge Deller <deller(a)gmx.de>
[Jason: replace src[8] in original patch with src+8]
Cc: stable(a)vger.kernel.org
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
Link: https://patch.msgid.link/20240704154517.1572127-3-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
---
 drivers/net/wireguard/allowedips.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireguard/allowedips.c b/drivers/net/wireguard/allowedips.c
index 5bf7822c53f1..04d7047167d0 100644
--- a/drivers/net/wireguard/allowedips.c
+++ b/drivers/net/wireguard/allowedips.c
@@ -15,8 +15,8 @@ static void swap_endian(u8 *dst, const u8 *src, u8 bits)
 	if (bits == 32) {
 		*(u32 *)dst = be32_to_cpu(*(const __be32 *)src);
 	} else if (bits == 128) {
-		((u64 *)dst)[0] = be64_to_cpu(((const __be64 *)src)[0]);
-		((u64 *)dst)[1] = be64_to_cpu(((const __be64 *)src)[1]);
+		((u64 *)dst)[0] = get_unaligned_be64(src);
+		((u64 *)dst)[1] = get_unaligned_be64(src + 8);
 	}
 }
 
-- 
2.34.1
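
To illustrate why the patch above avoids the warning, here is a minimal userspace sketch of an alignment-safe big-endian 64-bit load. It is not the kernel's implementation of get_unaligned_be64(); load_be64_unaligned() and swap_endian_128() are hypothetical names used only for this illustration. The idea is that assembling the value byte by byte never issues a 64-bit load, so the source pointer may sit at any address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for get_unaligned_be64(): build the value from
 * individual bytes, most significant byte first, so no aligned 64-bit
 * access to p is ever required.
 */
static uint64_t load_be64_unaligned(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Convert a 128-bit big-endian address at src into two host-order 64-bit
 * words stored at dst. Unlike the kernel code, which stores through a
 * u64 pointer, this sketch uses memcpy() for the store as well, so dst
 * does not need to be aligned either.
 */
static void swap_endian_128(uint8_t *dst, const uint8_t *src)
{
	uint64_t hi, lo;

	assert(dst != NULL && src != NULL);

	hi = load_be64_unaligned(src);
	lo = load_be64_unaligned(src + 8);
	memcpy(dst, &hi, sizeof(hi));
	memcpy(dst + 8, &lo, sizeof(lo));
}
```

Calling swap_endian_128() with src pointing one byte into a buffer exercises the same kind of misaligned address that triggered the parisc warnings quoted in the commit message.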
[PATCH OLK-5.10] wireguard: allowedips: avoid unaligned 64-bit memory accesses
by Zhengchao Shao 09 Aug '24

From: Helge Deller <deller(a)kernel.org>

stable inclusion
from stable-v5.10.222
commit ae630de24efb123d7199a43256396d7758f4cb75
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAILG8
CVE: CVE-2024-42247

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

---------------------------

commit 948f991c62a4018fb81d85804eeab3029c6209f8 upstream.

On the parisc platform, the kernel issues kernel warnings because
swap_endian() tries to load a 128-bit IPv6 address from an unaligned
memory location:

 Kernel: unaligned access to 0x55f4688c in wg_allowedips_insert_v6+0x2c/0x80 [wireguard] (iir 0xf3010df)
 Kernel: unaligned access to 0x55f46884 in wg_allowedips_insert_v6+0x38/0x80 [wireguard] (iir 0xf2010dc)

Avoid such unaligned memory accesses by instead using the
get_unaligned_be64() helper macro.

Signed-off-by: Helge Deller <deller(a)gmx.de>
[Jason: replace src[8] in original patch with src+8]
Cc: stable(a)vger.kernel.org
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
Link: https://patch.msgid.link/20240704154517.1572127-3-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
---
 drivers/net/wireguard/allowedips.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireguard/allowedips.c b/drivers/net/wireguard/allowedips.c
index 0ba714ca5185..4b8528206cc8 100644
--- a/drivers/net/wireguard/allowedips.c
+++ b/drivers/net/wireguard/allowedips.c
@@ -15,8 +15,8 @@ static void swap_endian(u8 *dst, const u8 *src, u8 bits)
 	if (bits == 32) {
 		*(u32 *)dst = be32_to_cpu(*(const __be32 *)src);
 	} else if (bits == 128) {
-		((u64 *)dst)[0] = be64_to_cpu(((const __be64 *)src)[0]);
-		((u64 *)dst)[1] = be64_to_cpu(((const __be64 *)src)[1]);
+		((u64 *)dst)[0] = get_unaligned_be64(src);
+		((u64 *)dst)[1] = get_unaligned_be64(src + 8);
 	}
 }
 
-- 
2.34.1
[openeuler:OLK-6.6] BUILD SUCCESS dc3e480568627c27f13878766588cc8c32213af5
by kernel test robot 09 Aug '24

tree/branch: https://gitee.com/openeuler/kernel.git OLK-6.6
branch HEAD: dc3e480568627c27f13878766588cc8c32213af5  !10816 sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime

Warning ids grouped by kconfigs:

recent_errors
|-- arm64-allmodconfig
|   |-- arch-arm64-kvm-arm.c:warning:variable-r-is-used-uninitialized-whenever-if-condition-is-false
|   |-- arch-arm64-kvm-tmi.c:warning:no-previous-prototype-for-function-tmi_tmm_inf_test
|   |-- arch-arm64-kvm-virtcca_cvm.c:warning:no-previous-prototype-for-function-kvm_cvm_create_ttt_levels
|   |-- arch-arm64-kvm-virtcca_cvm.c:warning:no-previous-prototype-for-function-kvm_cvm_get_num_brps
|   |-- arch-arm64-kvm-virtcca_cvm.c:warning:no-previous-prototype-for-function-kvm_cvm_get_num_wrps
|   |-- arch-arm64-kvm-virtcca_cvm.c:warning:no-previous-prototype-for-function-kvm_cvm_ipa_limit
|   |-- arch-arm64-kvm-virtcca_cvm.c:warning:no-previous-prototype-for-function-kvm_cvm_populate_par_region
|   |-- arch-arm64-kvm-virtcca_cvm.c:warning:no-previous-prototype-for-function-kvm_cvm_supports_pmu
|   `-- arch-arm64-kvm-virtcca_cvm.c:warning:no-previous-prototype-for-function-kvm_cvm_supports_sve
|-- arm64-randconfig-051-20240808
|   |-- arch-arm64-boot-dts-freescale-imx8mp-beacon-kit.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-beacon-kit.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-beacon-kit.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-beacon-kit.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-data-modul-edm-sbc.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-data-modul-edm-sbc.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-data-modul-edm-sbc.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-data-modul-edm-sbc.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-debix-model-a.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-debix-model-a.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-debix-model-a.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-debix-model-a.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-debix-som-a-bmb-.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-debix-som-a-bmb-.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-debix-som-a-bmb-.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-debix-som-a-bmb-.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-dhcom-pdk2.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-dhcom-pdk2.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-dhcom-pdk2.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-dhcom-pdk2.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-dhcom-pdk3.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-dhcom-pdk3.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-dhcom-pdk3.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-dhcom-pdk3.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-evk.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-evk.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-evk.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-evk.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-icore-mx8mp-edimm2..dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-icore-mx8mp-edimm2..dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-icore-mx8mp-edimm2..dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-icore-mx8mp-edimm2..dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-msc-sm2s-ep1.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-msc-sm2s-ep1.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-msc-sm2s-ep1.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-msc-sm2s-ep1.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-phyboard-pollux-rdk.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-phyboard-pollux-rdk.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-phyboard-pollux-rdk.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-phyboard-pollux-rdk.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw71xx-2x.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw71xx-2x.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw71xx-2x.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw71xx-2x.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw72xx-2x.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw72xx-2x.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw72xx-2x.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw72xx-2x.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw73xx-2x.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw73xx-2x.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw73xx-2x.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw73xx-2x.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw74xx.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw74xx.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw74xx.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw74xx.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw7905-2x.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw7905-2x.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw7905-2x.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-venice-gw7905-2x.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-nonwifi-dahlia.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-nonwifi-dahlia.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-nonwifi-dahlia.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-nonwifi-dahlia.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-nonwifi-dev.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-nonwifi-dev.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-nonwifi-dev.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-nonwifi-dev.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-nonwifi-yavia.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-nonwifi-yavia.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-nonwifi-yavia.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-nonwifi-yavia.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-wifi-dahlia.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-wifi-dahlia.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-wifi-dahlia.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-wifi-dahlia.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-wifi-dev.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-wifi-dev.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-wifi-dev.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-wifi-dev.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-wifi-yavia.dtb:blk-ctrl-32fc0000:clock-names:apb-axi-ref_266m-ref_24m-fdcc-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-wifi-yavia.dtb:blk-ctrl-32fc0000:clocks:is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-wifi-yavia.dtb:blk-ctrl-32fc0000:power-domain-names:bus-irqsteer-lcdif-pai-pvi-trng-hdmi-tx-hdmi-tx-phy-hdcp-hrv-is-too-long
|   |-- arch-arm64-boot-dts-freescale-imx8mp-verdin-wifi-yavia.dtb:blk-ctrl-32fc0000:power-domains:is-too-long
|   |-- arch-arm64-boot-dts-qcom-sa8775p-ride.dtb:ethernet:Unevaluated-properties-are-not-allowed-(-dma-coherent-mdio-phy-handle-phy-mode-power-domains-rx-fifo-depth-rx-queues-config-snps-mtl-rx-config-snps-m
|   `-- arch-arm64-boot-dts-qcom-sa8775p-ride.dtb:ethernet:Unevaluated-properties-are-not-allowed-(-dma-coherent-phy-handle-phy-mode-power-domains-rx-fifo-depth-rx-queues-config-snps-mtl-rx-config-snps-mtl-tx
|-- loongarch-allmodconfig
|   `-- arch-loongarch-kvm-..-..-..-virt-kvm-kvm_main.c:warning:kvmalloc_array-sizes-specified-with-sizeof-in-the-earlier-argument-and-not-in-the-later-argument
|-- loongarch-randconfig-001-20240809
|   `-- drivers-char-virtio_console.c:warning:u-directive-output-may-be-truncated-writing-between-and-bytes-into-a-region-of-size-between-and
`-- x86_64-allyesconfig
    `-- drivers-gpu-drm-amd-amdgpu-..-amdkfd-kfd_topology.c:warning:stack-frame-size-()-exceeds-limit-()-in-kfd_topology_add_device

elapsed time: 731m

configs tested: 14
configs skipped: 124

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
arm64        allmodconfig                clang-20
arm64        allnoconfig                 gcc-14.1.0
arm64        randconfig-001-20240809     clang-20
arm64        randconfig-002-20240809     clang-15
arm64        randconfig-003-20240809     clang-15
arm64        randconfig-004-20240809     clang-20
loongarch    allmodconfig                gcc-14.1.0
loongarch    allnoconfig                 gcc-14.1.0
loongarch    randconfig-001-20240809     gcc-14.1.0
loongarch    randconfig-002-20240809     gcc-14.1.0
x86_64       allnoconfig                 clang-18
x86_64       allyesconfig                clang-18
x86_64       defconfig                   gcc-11
x86_64       rhel-8.3-rust               clang-18

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
[PATCH 1/4] RDMA/hns: Register notifier block of bonding events in bond_grp
by Chengchang Tang 08 Aug '24

From: Junxian Huang <huangjunxian6(a)hisilicon.com>

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IAH10J

----------------------------------------------------------------------

Currently the notifier block of bonding events lives in the hr_dev
structure, and bond_grp is dynamically allocated in the event handler.
Since all of these hr_dev respond to bonding events, we had to add a
complicated filter to choose a suitable hr_dev to handle the events.
We also had to worry about the validity of bond_grp pointers in many
concurrent cases, as they may already have been freed.

Refactor the bonding event handler by:
1. allocating/deallocating bond_grp structures when the driver
   inits/exits;
2. registering the notifier block of bonding events in bond_grp.

Signed-off-by: Junxian Huang <huangjunxian6(a)hisilicon.com>
Signed-off-by: Xinghai Cen <cenxinghai(a)h-partners.com>
---
 drivers/infiniband/hw/hns/hns_roce_bond.c   | 415 +++++++++++---------
 drivers/infiniband/hw/hns/hns_roce_bond.h   |   3 +
 drivers/infiniband/hw/hns/hns_roce_device.h |   1 -
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  |   9 +-
 drivers/infiniband/hw/hns/hns_roce_main.c   |   2 +-
 5 files changed, 247 insertions(+), 183 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_bond.c b/drivers/infiniband/hw/hns/hns_roce_bond.c
index 4b2b5538c918..c5b30d7dab67 100644
--- a/drivers/infiniband/hw/hns/hns_roce_bond.c
+++ b/drivers/infiniband/hw/hns/hns_roce_bond.c
@@ -52,32 +52,6 @@ static int get_netdev_bond_slave_id(struct net_device *net_dev,
 	return -ENOENT;
 }
 
-static bool is_hrdev_bond_slave(struct hns_roce_dev *hr_dev,
-				struct net_device *upper_dev)
-{
-	struct hns_roce_bond_group *bond_grp;
-	struct net_device *net_dev;
-	u8 bus_num;
-
-	if (!hr_dev || !upper_dev)
-		return false;
-
-	if (!netif_is_lag_master(upper_dev))
-		return false;
-
-	net_dev = get_hr_netdev(hr_dev, 0);
-	bus_num = get_hr_bus_num(hr_dev);
-
-	if (upper_dev == get_upper_dev_from_ndev(net_dev))
-		return true;
-
-	bond_grp = hns_roce_get_bond_grp(net_dev, bus_num);
-	if (bond_grp && upper_dev == bond_grp->upper_dev)
-		return true;
-
-	return false;
-}
-
 struct hns_roce_bond_group *hns_roce_get_bond_grp(struct net_device *net_dev,
 						  u8 bus_num)
 {
@@ -92,8 +66,10 @@ struct hns_roce_bond_group *hns_roce_get_bond_grp(struct net_device *net_dev,
 		bond_grp = die_info->bgrps[i];
 		if (!bond_grp)
 			continue;
-		if (get_netdev_bond_slave_id(net_dev, bond_grp) >= 0 ||
-		    (bond_grp->upper_dev == get_upper_dev_from_ndev(net_dev)))
+		if (get_netdev_bond_slave_id(net_dev, bond_grp) >= 0)
+			return bond_grp;
+		if (bond_grp->upper_dev &&
+		    bond_grp->upper_dev == get_upper_dev_from_ndev(net_dev))
 			return bond_grp;
 	}
 
@@ -107,8 +83,8 @@ bool hns_roce_bond_is_active(struct hns_roce_dev *hr_dev)
 	u8 bus_num = get_hr_bus_num(hr_dev);
 
 	bond_grp = hns_roce_get_bond_grp(net_dev, bus_num);
-
-	if (bond_grp && bond_grp->bond_state != HNS_ROCE_BOND_NOT_BONDED)
+	if (bond_grp && bond_grp->bond_state != HNS_ROCE_BOND_NOT_BONDED &&
+	    bond_grp->bond_state != HNS_ROCE_BOND_NOT_ATTACHED)
 		return true;
 
 	return false;
@@ -507,39 +483,6 @@ static void hns_roce_do_bond_work(struct work_struct *work)
 	hns_roce_queue_bond_work(bond_grp, HZ);
 }
 
-int hns_roce_bond_init(struct hns_roce_dev *hr_dev)
-{
-	struct net_device *net_dev = get_hr_netdev(hr_dev, 0);
-	struct hns_roce_v2_priv *priv = hr_dev->priv;
-	struct hns_roce_bond_group *bond_grp;
-	u8 bus_num = get_hr_bus_num(hr_dev);
-	int ret;
-
-	bond_grp = hns_roce_get_bond_grp(net_dev, bus_num);
-	if (priv->handle->rinfo.reset_state == HNS_ROCE_STATE_RST_INIT &&
-	    bond_grp) {
-		bond_grp->main_hr_dev = hr_dev;
-		ret = hns_roce_recover_bond(bond_grp);
-		if (ret) {
-			ibdev_err(&hr_dev->ib_dev,
-				  "failed to recover RoCE bond, ret = %d.\n",
-				  ret);
-			return ret;
-		}
-	}
-
-	hr_dev->bond_nb.notifier_call = hns_roce_bond_event;
-	ret = register_netdevice_notifier(&hr_dev->bond_nb);
-	if (ret) {
-		ibdev_err(&hr_dev->ib_dev,
-			  "failed to register notifier for RoCE bond, ret = %d.\n",
-			  ret);
-		hr_dev->bond_nb.notifier_call = NULL;
-	}
-
-	return ret;
-}
-
 static struct hns_roce_die_info *alloc_die_info(int bus_num)
 {
 	struct hns_roce_die_info *die_info;
@@ -611,67 +554,116 @@ static int remove_bond_id(int bus_num, u8 bond_id)
 	return 0;
 }
 
-int hns_roce_cleanup_bond(struct hns_roce_bond_group *bond_grp)
+static int hns_roce_alloc_bond_grp(struct hns_roce_dev *hr_dev)
 {
-	bool completion_no_waiter;
+	struct hns_roce_bond_group *bgrps[ROCE_BOND_NUM_MAX];
+	struct hns_roce_bond_group *bond_grp;
 	int ret;
+	int i;
 
-	ret = bond_grp->main_hr_dev ?
-	      hns_roce_cmd_bond(bond_grp, HNS_ROCE_CLEAR_BOND) : -EIO;
-	if (ret)
-		BOND_ERR_LOG("failed to clear RoCE bond, ret = %d.\n", ret);
-
-	cancel_delayed_work(&bond_grp->bond_work);
-	ret = remove_bond_id(bond_grp->bus_num, bond_grp->bond_id);
-	if (ret)
-		BOND_ERR_LOG("failed to remove bond id %u, ret = %d.\n",
-			     bond_grp->bond_id, ret);
+	for (i = 0; i < ROCE_BOND_NUM_MAX; i++) {
+		bond_grp = kvzalloc(sizeof(*bond_grp), GFP_KERNEL);
+		if (!bond_grp) {
+			ret = -ENOMEM;
+			goto mem_err;
+		}
 
-	completion_no_waiter = completion_done(&bond_grp->bond_work_done);
-	complete(&bond_grp->bond_work_done);
-	mutex_destroy(&bond_grp->bond_mutex);
-	if (completion_no_waiter)
-		kfree(bond_grp);
+		mutex_init(&bond_grp->bond_mutex);
+		INIT_DELAYED_WORK(&bond_grp->bond_work, hns_roce_do_bond_work);
+		init_completion(&bond_grp->bond_work_done);
 
-	return ret;
-}
+		bond_grp->bond_ready = false;
+		bond_grp->bond_state = HNS_ROCE_BOND_NOT_ATTACHED;
+		bond_grp->bus_num = get_hr_bus_num(hr_dev);
 
-static bool hns_roce_bond_lowerstate_event(struct hns_roce_dev *hr_dev,
-					   struct hns_roce_bond_group *bond_grp,
-					   struct netdev_notifier_changelowerstate_info *info)
-{
-	struct net_device *net_dev =
-		netdev_notifier_info_to_dev((struct netdev_notifier_info *)info);
+		ret = alloc_bond_id(bond_grp);
+		if (ret) {
+			ibdev_err(&hr_dev->ib_dev,
+				  "failed to alloc bond ID, ret = %d.\n", ret);
+			goto alloc_id_err;
+		}
 
-	if (!netif_is_lag_port(net_dev) ||
-	    (!bond_grp || hr_dev != bond_grp->main_hr_dev))
-		return false;
+		bond_grp->bond_nb.notifier_call = hns_roce_bond_event;
+		ret = register_netdevice_notifier(&bond_grp->bond_nb);
+		if (ret) {
+			ibdev_err(&hr_dev->ib_dev,
+				  "failed to register bond nb, ret = %d.\n", ret);
+			goto register_nb_err;
+		}
+		bgrps[i] = bond_grp;
+	}
 
-	mutex_lock(&bond_grp->bond_mutex);
+	return 0;
 
-	if (bond_grp->bond_ready &&
-	    bond_grp->bond_state == HNS_ROCE_BOND_IS_BONDED)
-		bond_grp->bond_state = HNS_ROCE_BOND_SLAVE_CHANGESTATE;
+register_nb_err:
+	remove_bond_id(bond_grp->bus_num, bond_grp->bond_id);
+alloc_id_err:
+	mutex_destroy(&bond_grp->bond_mutex);
+	kvfree(bond_grp);
+mem_err:
+	for (i--; i >= 0; i--) {
+		unregister_netdevice_notifier(&bgrps[i]->bond_nb);
+		cancel_delayed_work_sync(&bgrps[i]->bond_work);
+		complete(&bgrps[i]->bond_work_done);
+		remove_bond_id(bgrps[i]->bus_num, bgrps[i]->bond_id);
+		mutex_destroy(&bgrps[i]->bond_mutex);
+		kvfree(bgrps[i]);
+	}
+	return ret;
+}
 
-	mutex_unlock(&bond_grp->bond_mutex);
+void hns_roce_dealloc_bond_grp(void)
+{
+	struct hns_roce_bond_group *bond_grp;
+	struct hns_roce_die_info *die_info;
+	unsigned long id;
+	int i;
 
-	return true;
+	xa_for_each(&roce_bond_xa, id, die_info) {
+		for (i = 0; i < ROCE_BOND_NUM_MAX; i++) {
+			bond_grp = die_info->bgrps[i];
+			if (!bond_grp)
+				continue;
+			unregister_netdevice_notifier(&bond_grp->bond_nb);
+			cancel_delayed_work_sync(&bond_grp->bond_work);
+			remove_bond_id(bond_grp->bus_num, bond_grp->bond_id);
+			mutex_destroy(&bond_grp->bond_mutex);
+			kvfree(bond_grp);
+		}
+	}
 }
 
-static bool is_bond_setting_supported(struct netdev_lag_upper_info *bond_info)
+int hns_roce_bond_init(struct hns_roce_dev *hr_dev)
 {
-	if (!bond_info)
-		return false;
+	struct net_device *net_dev = get_hr_netdev(hr_dev, 0);
+	struct hns_roce_v2_priv *priv = hr_dev->priv;
+	struct hns_roce_bond_group *bond_grp;
+	u8 bus_num = get_hr_bus_num(hr_dev);
+	int ret = 0;
 
-	if (bond_info->tx_type != NETDEV_LAG_TX_TYPE_ACTIVEBACKUP &&
-	    bond_info->tx_type != NETDEV_LAG_TX_TYPE_HASH)
-		return false;
+	if (priv->handle->rinfo.reset_state == HNS_ROCE_STATE_RST_INIT) {
+		bond_grp = hns_roce_get_bond_grp(net_dev, bus_num);
+		if (!bond_grp)
+			return 0;
 
-	if (bond_info->tx_type == NETDEV_LAG_TX_TYPE_HASH &&
-	    bond_info->hash_type > NETDEV_LAG_HASH_L23)
-		return false;
+		bond_grp->main_hr_dev = hr_dev;
+		ret = hns_roce_recover_bond(bond_grp);
+		if (ret)
+			ibdev_err(&hr_dev->ib_dev,
+				  "failed to recover RoCE bond, ret = %d.\n",
+				  ret);
+		return ret;
+	}
 
-	return true;
+	if (!xa_load(&roce_bond_xa, bus_num)) {
+		ret = hns_roce_alloc_bond_grp(hr_dev);
+		if (ret)
+			ibdev_err(&hr_dev->ib_dev,
+				  "failed to alloc RoCE bond, ret = %d.\n",
+				  ret);
+	}
+
+	return ret;
 }
 
 static void hns_roce_bond_info_update(struct hns_roce_bond_group *bond_grp,
@@ -716,6 +708,80 @@ static void hns_roce_bond_info_update(struct hns_roce_bond_group *bond_grp,
 	rcu_read_unlock();
 }
 
+static void hns_roce_attach_bond_grp(struct hns_roce_bond_group *bond_grp,
+				     struct hns_roce_dev *hr_dev,
+				     struct net_device *upper_dev)
+{
+	bond_grp->upper_dev = upper_dev;
+	bond_grp->main_hr_dev = hr_dev;
+	bond_grp->bond_state = HNS_ROCE_BOND_NOT_BONDED;
+	bond_grp->bond_ready = false;
+	hns_roce_bond_info_update(bond_grp, upper_dev, true);
+}
+
+static void hns_roce_detach_bond_grp(struct hns_roce_bond_group *bond_grp)
+{
+	cancel_delayed_work(&bond_grp->bond_work);
+	bond_grp->upper_dev = NULL;
+	bond_grp->main_hr_dev = NULL;
+	bond_grp->bond_ready = false;
+	bond_grp->bond_state = HNS_ROCE_BOND_NOT_ATTACHED;
+	bond_grp->slave_map = 0;
+	bond_grp->slave_map_diff = 0;
+	memset(bond_grp->bond_func_info, 0, sizeof(bond_grp->bond_func_info));
+}
+
+int hns_roce_cleanup_bond(struct hns_roce_bond_group *bond_grp)
+{
+	int ret;
+
+	ret = bond_grp->main_hr_dev ?
+	      hns_roce_cmd_bond(bond_grp, HNS_ROCE_CLEAR_BOND) : -EIO;
+	if (ret)
+		BOND_ERR_LOG("failed to clear RoCE bond, ret = %d.\n", ret);
+
+	hns_roce_detach_bond_grp(bond_grp);
+	complete(&bond_grp->bond_work_done);
+
+	return ret;
+}
+
+static bool hns_roce_bond_lowerstate_event(struct hns_roce_bond_group *bond_grp,
+					   struct netdev_notifier_changelowerstate_info *info)
+{
+	struct net_device *net_dev =
+		netdev_notifier_info_to_dev((struct netdev_notifier_info *)info);
+
+	if (!netif_is_lag_port(net_dev))
+		return false;
+
+	mutex_lock(&bond_grp->bond_mutex);
+
+	if (bond_grp->bond_ready &&
+	    bond_grp->bond_state == HNS_ROCE_BOND_IS_BONDED)
+		bond_grp->bond_state = HNS_ROCE_BOND_SLAVE_CHANGESTATE;
+
+	mutex_unlock(&bond_grp->bond_mutex);
+
+	return true;
+}
+
+static bool is_bond_setting_supported(struct netdev_lag_upper_info *bond_info)
+{
+	if (!bond_info)
+		return false;
+
+	if (bond_info->tx_type != NETDEV_LAG_TX_TYPE_ACTIVEBACKUP &&
+	    bond_info->tx_type != NETDEV_LAG_TX_TYPE_HASH)
+		return false;
+
+	if (bond_info->tx_type == NETDEV_LAG_TX_TYPE_HASH &&
+	    bond_info->hash_type > NETDEV_LAG_HASH_L23)
+		return false;
+
+	return true;
+}
+
 static bool hns_roce_bond_upper_event(struct hns_roce_bond_group *bond_grp,
 				      struct netdev_notifier_changeupper_info *info)
 {
@@ -755,44 +821,8 @@ static bool hns_roce_bond_upper_event(struct hns_roce_bond_group *bond_grp,
 	return changed;
 }
 
-static struct hns_roce_bond_group *hns_roce_alloc_bond_grp(struct hns_roce_dev *main_hr_dev,
-							   struct net_device *upper_dev)
-{
-	struct hns_roce_bond_group *bond_grp;
-	int ret;
-
-	bond_grp = kzalloc(sizeof(*bond_grp), GFP_KERNEL);
-	if (!bond_grp)
-		return NULL;
-
-	mutex_init(&bond_grp->bond_mutex);
-
-	INIT_DELAYED_WORK(&bond_grp->bond_work, hns_roce_do_bond_work);
-
-	init_completion(&bond_grp->bond_work_done);
-
-	bond_grp->upper_dev = upper_dev;
-	bond_grp->main_hr_dev = main_hr_dev;
-	bond_grp->bond_ready = false;
-	bond_grp->bond_state = HNS_ROCE_BOND_NOT_BONDED;
-	bond_grp->bus_num = main_hr_dev->pci_dev->bus->number;
-
-	ret = alloc_bond_id(bond_grp);
-	if (ret) {
-		ibdev_err(&main_hr_dev->ib_dev,
-			  "failed to alloc bond ID, ret = %d.\n", ret);
-		mutex_destroy(&bond_grp->bond_mutex);
-		kfree(bond_grp);
-		return NULL;
-	}
-
-	hns_roce_bond_info_update(bond_grp, upper_dev, true);
-
-	return bond_grp;
-}
-
 static bool is_dev_bond_supported(struct hns_roce_bond_group *bond_grp,
-				  struct net_device *net_dev, int bus_num)
+				  struct net_device *net_dev)
 {
 	struct hns_roce_dev *hr_dev = hns_roce_get_hrdev_by_netdev(net_dev);
 
@@ -810,7 +840,7 @@ static bool is_dev_bond_supported(struct hns_roce_bond_group *bond_grp,
 	if (hr_dev->is_vf || pci_num_vf(hr_dev->pci_dev) > 0)
 		return false;
 
-	if (bus_num != get_hr_bus_num(hr_dev))
+	if (bond_grp->bus_num != get_hr_bus_num(hr_dev))
 		return false;
 
 	return true;
@@ -833,8 +863,7 @@ static bool check_unlinking_bond_support(struct hns_roce_bond_group *bond_grp)
 
 static bool check_linking_bond_support(struct netdev_lag_upper_info *bond_info,
 				       struct hns_roce_bond_group *bond_grp,
-				       struct net_device *upper_dev,
-				       int bus_num)
+				       struct net_device *upper_dev)
 {
 	struct net_device *net_dev;
 	u8 slave_num = 0;
@@ -844,7 +873,7 @@ static bool check_linking_bond_support(struct netdev_lag_upper_info *bond_info,
 
 	rcu_read_lock();
 	for_each_netdev_in_bond_rcu(upper_dev, net_dev) {
-		if (is_dev_bond_supported(bond_grp, net_dev, bus_num)) {
+		if (is_dev_bond_supported(bond_grp, net_dev)) {
 			slave_num++;
 		} else {
 			rcu_read_unlock();
@@ -857,19 +886,14 @@ static bool check_linking_bond_support(struct netdev_lag_upper_info *bond_info,
 }
 
 static enum bond_support_type
-	check_bond_support(struct hns_roce_dev *hr_dev,
-			   struct net_device **upper_dev,
+	check_bond_support(struct hns_roce_bond_group *bond_grp,
+			   struct net_device *upper_dev,
 			   struct netdev_notifier_changeupper_info *info)
 {
-	struct net_device *net_dev = get_hr_netdev(hr_dev, 0);
-	struct hns_roce_bond_group *bond_grp;
-	int bus_num = get_hr_bus_num(hr_dev);
 	bool bond_grp_exist =
false; bool support; - *upper_dev = info->upper_dev; - bond_grp = hns_roce_get_bond_grp(net_dev, bus_num); - if (bond_grp && *upper_dev == bond_grp->upper_dev) + if (upper_dev == bond_grp->upper_dev) bond_grp_exist = true; if (!info->linking && !bond_grp_exist) @@ -877,7 +901,7 @@ static enum bond_support_type if (info->linking) support = check_linking_bond_support(info->upper_info, bond_grp, - *upper_dev, bus_num); + upper_dev); else support = check_unlinking_bond_support(bond_grp); @@ -887,16 +911,56 @@ static enum bond_support_type return bond_grp_exist ? BOND_EXISTING_NOT_SUPPORT : BOND_NOT_SUPPORT; } +static bool upper_event_filter(struct netdev_notifier_changeupper_info *info, + struct hns_roce_bond_group *bond_grp, + struct net_device *net_dev) +{ + struct net_device *upper_dev = info->upper_dev; + struct hns_roce_bond_group *bond_grp_tmp; + struct hns_roce_dev *hr_dev; + u8 bus_num; + + if (!info->linking) + return bond_grp->upper_dev == upper_dev; + + hr_dev = hns_roce_get_hrdev_by_netdev(net_dev); + if (!hr_dev) + return false; + + bus_num = get_hr_bus_num(hr_dev); + if (bond_grp->bus_num != bus_num) + return false; + + bond_grp_tmp = hns_roce_get_bond_grp(net_dev, bus_num); + if (bond_grp_tmp && bond_grp_tmp != bond_grp) + return false; + + if (bond_grp->bond_state != HNS_ROCE_BOND_NOT_ATTACHED && + bond_grp->upper_dev != upper_dev) + return false; + + return true; +} + +static bool lowerstate_event_filter(struct hns_roce_bond_group *bond_grp, + struct net_device *net_dev) +{ + struct hns_roce_bond_group *bond_grp_tmp; + + bond_grp_tmp = hns_roce_get_bond_grp(net_dev, bond_grp->bus_num); + return bond_grp_tmp == bond_grp; +} + int hns_roce_bond_event(struct notifier_block *self, unsigned long event, void *ptr) { + struct hns_roce_bond_group *bond_grp = + container_of(self, struct hns_roce_bond_group, bond_nb); struct net_device *net_dev = netdev_notifier_info_to_dev(ptr); - struct hns_roce_dev *hr_dev = - container_of(self, struct hns_roce_dev, bond_nb); 
+ struct netdev_notifier_changeupper_info *info; enum bond_support_type support = BOND_SUPPORT; - struct hns_roce_bond_group *bond_grp; - u8 bus_num = get_hr_bus_num(hr_dev); struct net_device *upper_dev; + struct hns_roce_dev *hr_dev; bool changed; int slave_id; @@ -904,30 +968,27 @@ int hns_roce_bond_event(struct notifier_block *self, return NOTIFY_DONE; if (event == NETDEV_CHANGEUPPER) { - support = check_bond_support(hr_dev, &upper_dev, ptr); + if (!upper_event_filter(ptr, bond_grp, net_dev)) + return NOTIFY_DONE; + info = (struct netdev_notifier_changeupper_info *)ptr; + upper_dev = info->upper_dev; + support = check_bond_support(bond_grp, upper_dev, ptr); if (support == BOND_NOT_SUPPORT) return NOTIFY_DONE; } else { + if (!lowerstate_event_filter(bond_grp, net_dev)) + return NOTIFY_DONE; upper_dev = get_upper_dev_from_ndev(net_dev); } - if (upper_dev && !is_hrdev_bond_slave(hr_dev, upper_dev)) - return NOTIFY_DONE; - else if (!upper_dev && hr_dev != hns_roce_get_hrdev_by_netdev(net_dev)) - return NOTIFY_DONE; - - bond_grp = hns_roce_get_bond_grp(get_hr_netdev(hr_dev, 0), bus_num); if (event == NETDEV_CHANGEUPPER) { - if (!bond_grp) { - bond_grp = hns_roce_alloc_bond_grp(hr_dev, upper_dev); - if (!bond_grp) { - ibdev_err(&hr_dev->ib_dev, - "failed to alloc RoCE bond_grp!\n"); + if (bond_grp->bond_state == HNS_ROCE_BOND_NOT_ATTACHED) { + hr_dev = hns_roce_get_hrdev_by_netdev(net_dev); + if (!hr_dev) return NOTIFY_DONE; - } - } else if (hr_dev != bond_grp->main_hr_dev) { - return NOTIFY_DONE; + hns_roce_attach_bond_grp(bond_grp, hr_dev, upper_dev); } + /* In the case of netdev being unregistered, the roce * instance shouldn't be inited. 
*/ @@ -944,7 +1005,7 @@ int hns_roce_bond_event(struct notifier_block *self, } changed = hns_roce_bond_upper_event(bond_grp, ptr); } else { - changed = hns_roce_bond_lowerstate_event(hr_dev, bond_grp, ptr); + changed = hns_roce_bond_lowerstate_event(bond_grp, ptr); } if (changed) hns_roce_queue_bond_work(bond_grp, HZ); diff --git a/drivers/infiniband/hw/hns/hns_roce_bond.h b/drivers/infiniband/hw/hns/hns_roce_bond.h index 75c9d670de7c..ba82ec12e4e5 100644 --- a/drivers/infiniband/hw/hns/hns_roce_bond.h +++ b/drivers/infiniband/hw/hns/hns_roce_bond.h @@ -33,6 +33,7 @@ enum bond_support_type { }; enum hns_roce_bond_state { + HNS_ROCE_BOND_NOT_ATTACHED, HNS_ROCE_BOND_NOT_BONDED, HNS_ROCE_BOND_IS_BONDED, HNS_ROCE_BOND_REGISTERING, @@ -72,6 +73,7 @@ struct hns_roce_bond_group { struct hns_roce_func_info bond_func_info[ROCE_BOND_FUNC_MAX]; struct delayed_work bond_work; struct completion bond_work_done; + struct notifier_block bond_nb; }; struct hns_roce_die_info { @@ -88,5 +90,6 @@ struct net_device *hns_roce_get_bond_netdev(struct hns_roce_dev *hr_dev); struct hns_roce_bond_group *hns_roce_get_bond_grp(struct net_device *net_dev, u8 bus_num); bool is_bond_slave_in_reset(struct hns_roce_bond_group *bond_grp); +void hns_roce_dealloc_bond_grp(void); #endif diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h index 0f1b87f8c75b..8add4742c094 100644 --- a/drivers/infiniband/hw/hns/hns_roce_device.h +++ b/drivers/infiniband/hw/hns/hns_roce_device.h @@ -1154,7 +1154,6 @@ struct hns_roce_dev { struct hns_roce_dev_debugfs dbgfs; atomic64_t *dfx_cnt; struct hns_roce_scc_param *scc_param; - struct notifier_block bond_nb; struct list_head mtr_unfree_list; /* list of unfree mtr on this dev */ struct mutex mtr_unfree_list_mutex; /* protect mtr_unfree_list */ diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c index 2568609edb60..54ea073db0d0 100644 --- 
a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -2388,6 +2388,9 @@ static int hns_roce_query_caps(struct hns_roce_dev *hr_dev) caps->flags |= le16_to_cpu(resp_d->cap_flags_ex) << HNS_ROCE_CAP_FLAGS_EX_SHIFT; + if (hr_dev->is_vf) + caps->flags &= ~HNS_ROCE_CAP_FLAG_BOND; + caps->num_cqs = 1 << hr_reg_read(resp_c, PF_CAPS_C_NUM_CQS); caps->gid_table_len[0] = hr_reg_read(resp_c, PF_CAPS_C_MAX_GID); caps->max_cqes = 1 << hr_reg_read(resp_c, PF_CAPS_C_CQ_DEPTH); @@ -7460,11 +7463,8 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle, if (handle->rinfo.instance_state == HNS_ROCE_STATE_BOND_UNINIT) { bond_grp = hns_roce_get_bond_grp(handle->rinfo.netdev, handle->pdev->bus->number); - if (bond_grp) { + if (bond_grp) wait_for_completion(&bond_grp->bond_work_done); - if (bond_grp->bond_state == HNS_ROCE_BOND_NOT_BONDED) - kfree(bond_grp); - } } if (handle->rinfo.instance_state != HNS_ROCE_STATE_INITED) @@ -7705,6 +7705,7 @@ static int __init hns_roce_hw_v2_init(void) static void __exit hns_roce_hw_v2_exit(void) { + hns_roce_dealloc_bond_grp(); hnae3_unregister_client(&hns_roce_hw_v2_client); hns_roce_cleanup_debugfs(); } diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c index 3628dbb250ab..725f2f08cb0b 100644 --- a/drivers/infiniband/hw/hns/hns_roce_main.c +++ b/drivers/infiniband/hw/hns/hns_roce_main.c @@ -856,7 +856,6 @@ static void hns_roce_unregister_device(struct hns_roce_dev *hr_dev, if (!(hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_BOND)) goto normal_unregister; - unregister_netdevice_notifier(&hr_dev->bond_nb); bond_grp = hns_roce_get_bond_grp(net_dev, bus_num); if (!bond_grp) goto normal_unregister; @@ -866,6 +865,7 @@ static void hns_roce_unregister_device(struct hns_roce_dev *hr_dev, * is unregistered, re-initialized the remaining slaves before * the bond resources cleanup. 
*/ + cancel_delayed_work_sync(&bond_grp->bond_work); bond_grp->bond_state = HNS_ROCE_BOND_NOT_BONDED; for (i = 0; i < ROCE_BOND_FUNC_MAX; i++) { net_dev = bond_grp->bond_func_info[i].net_dev; -- 2.33.0
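One structural change in the patch above is that `bond_nb` moves out of `struct hns_roce_dev` and into `struct hns_roce_bond_group`, so `hns_roce_bond_event()` recovers the group with `container_of()` instead of the device. The following is a minimal userspace sketch of that pattern, not the driver code: `struct bond_group`, `bond_event()` and `last_bond_id` are hypothetical stand-ins, and `container_of` is redefined locally because the kernel macro is not available outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(): recover the
 * enclosing structure from a pointer to one of its members.
 */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified stand-in for the kernel type. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb, unsigned long event);
};

/* Hypothetical stand-in for struct hns_roce_bond_group. */
struct bond_group {
	int bus_num;
	int bond_id;
	struct notifier_block bond_nb;	/* embedded: one notifier per group */
};

/* Records which group handled the most recent event (illustration only). */
static int last_bond_id = -1;

static int bond_event(struct notifier_block *self, unsigned long event)
{
	struct bond_group *grp;

	assert(self != NULL);
	/* The callback only receives the notifier_block; the owning
	 * group is found by stepping back to the enclosing structure.
	 */
	grp = container_of(self, struct bond_group, bond_nb);
	(void)event;
	last_bond_id = grp->bond_id;
	return 0;
}
```

Because the notifier is embedded in the group, each callback invocation already knows which group it belongs to, which is presumably why the patch can drop the per-device lookup at the top of the handler.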
[PATCH openEuler-22.03-LTS-SP1] netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers
by Wang Hai 08 Aug '24

From: Pablo Neira Ayuso <pablo(a)netfilter.org>

stable inclusion
from stable-v5.10.221
commit 5d43d789b57943720dca4181a05f6477362b94cf
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEO4
CVE: CVE-2024-42070

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit 7931d32955e09d0a11b1fe0b6aac1bfa061c005c ]

register store validation for NFT_DATA_VALUE is conditional, however,
the datatype is always either NFT_DATA_VALUE or NFT_DATA_VERDICT. This
only requires a new helper function to infer the register type from the
set datatype so this conditional check can be removed. Otherwise,
pointer to chain object can be leaked through the registers.

Fixes: 96518518cc41 ("netfilter: add nftables")
Reported-by: Linus Torvalds <torvalds(a)linuxfoundation.org>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Conflicts:
	include/net/netfilter/nf_tables.h
[just context conflict]
Signed-off-by: Wang Hai <wanghai38(a)huawei.com>
---
 include/net/netfilter/nf_tables.h | 5 +++++
 net/netfilter/nf_tables_api.c     | 8 ++++----
 net/netfilter/nft_lookup.c        | 3 ++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 7550328080bf..6ab0312e3f28 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -475,6 +475,11 @@ static inline void *nft_set_priv(const struct nft_set *set)
 	return (void *)set->data;
 }
 
+static inline enum nft_data_types nft_set_datatype(const struct nft_set *set)
+{
+	return set->dtype == NFT_DATA_VERDICT ? NFT_DATA_VERDICT : NFT_DATA_VALUE;
+}
+
 static inline struct nft_set *nft_set_container_of(const void *priv)
 {
 	return (void *)priv - offsetof(struct nft_set, data);
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 990fdecca0d2..f3492376bf74 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4896,8 +4896,7 @@ static int nf_tables_fill_setelem(struct sk_buff *skb,
 
 	if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
 	    nft_data_dump(skb, NFTA_SET_ELEM_DATA, nft_set_ext_data(ext),
-			  set->dtype == NFT_DATA_VERDICT ? NFT_DATA_VERDICT : NFT_DATA_VALUE,
-			  set->dlen) < 0)
+			  nft_set_datatype(set), set->dlen) < 0)
 		goto nla_put_failure;
 
 	if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPR) &&
@@ -8841,6 +8840,9 @@ static int nft_validate_register_store(const struct nft_ctx *ctx,
 
 		return 0;
 	default:
+		if (type != NFT_DATA_VALUE)
+			return -EINVAL;
+
 		if (reg < NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE)
 			return -EINVAL;
 		if (len == 0)
@@ -8849,8 +8851,6 @@ static int nft_validate_register_store(const struct nft_ctx *ctx,
 		    sizeof_field(struct nft_regs, data))
 			return -ERANGE;
 
-		if (data != NULL && type != NFT_DATA_VALUE)
-			return -EINVAL;
 		return 0;
 	}
 }
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index 8bc008ff00cb..d2f8131edaf1 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -101,7 +101,8 @@ static int nft_lookup_init(const struct nft_ctx *ctx,
 			return -EINVAL;
 
 		err = nft_parse_register_store(ctx, tb[NFTA_LOOKUP_DREG],
-					       &priv->dreg, NULL, set->dtype,
+					       &priv->dreg, NULL,
+					       nft_set_datatype(set),
 					       set->dlen);
 		if (err < 0)
 			return err;
--
2.17.1
[PATCH OLK-5.10] netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers
by Wang Hai 08 Aug '24

From: Pablo Neira Ayuso <pablo(a)netfilter.org>

stable inclusion
from stable-v5.10.221
commit 5d43d789b57943720dca4181a05f6477362b94cf
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEO4
CVE: CVE-2024-42070

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit 7931d32955e09d0a11b1fe0b6aac1bfa061c005c ]

register store validation for NFT_DATA_VALUE is conditional, however,
the datatype is always either NFT_DATA_VALUE or NFT_DATA_VERDICT. This
only requires a new helper function to infer the register type from the
set datatype so this conditional check can be removed. Otherwise,
pointer to chain object can be leaked through the registers.

Fixes: 96518518cc41 ("netfilter: add nftables")
Reported-by: Linus Torvalds <torvalds(a)linuxfoundation.org>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Wang Hai <wanghai38(a)huawei.com>
---
 include/net/netfilter/nf_tables.h | 5 +++++
 net/netfilter/nf_tables_api.c     | 8 ++++----
 net/netfilter/nft_lookup.c        | 3 ++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 5e3529c73e61..17659138e15a 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -504,6 +504,11 @@ static inline void *nft_set_priv(const struct nft_set *set)
 	return (void *)set->data;
 }
 
+static inline enum nft_data_types nft_set_datatype(const struct nft_set *set)
+{
+	return set->dtype == NFT_DATA_VERDICT ? NFT_DATA_VERDICT : NFT_DATA_VALUE;
+}
+
 static inline bool nft_set_gc_is_pending(const struct nft_set *s)
 {
 	return refcount_read(&s->refs) != 1;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 79fe3743502b..f990851709a3 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -5019,8 +5019,7 @@ static int nf_tables_fill_setelem(struct sk_buff *skb,
 
 	if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
 	    nft_data_dump(skb, NFTA_SET_ELEM_DATA, nft_set_ext_data(ext),
-			  set->dtype == NFT_DATA_VERDICT ? NFT_DATA_VERDICT : NFT_DATA_VALUE,
-			  set->dlen) < 0)
+			  nft_set_datatype(set), set->dlen) < 0)
 		goto nla_put_failure;
 
 	if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPR) &&
@@ -9277,6 +9276,9 @@ static int nft_validate_register_store(const struct nft_ctx *ctx,
 
 		return 0;
 	default:
+		if (type != NFT_DATA_VALUE)
+			return -EINVAL;
+
 		if (reg < NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE)
 			return -EINVAL;
 		if (len == 0)
@@ -9285,8 +9287,6 @@ static int nft_validate_register_store(const struct nft_ctx *ctx,
 		    sizeof_field(struct nft_regs, data))
 			return -ERANGE;
 
-		if (data != NULL && type != NFT_DATA_VALUE)
-			return -EINVAL;
 		return 0;
 	}
 }
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index c2dd1d3ac328..f6ea1b32dae1 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -101,7 +101,8 @@ static int nft_lookup_init(const struct nft_ctx *ctx,
 			return -EINVAL;
 
 		err = nft_parse_register_store(ctx, tb[NFTA_LOOKUP_DREG],
-					       &priv->dreg, NULL, set->dtype,
+					       &priv->dreg, NULL,
+					       nft_set_datatype(set),
 					       set->dlen);
 		if (err < 0)
 			return err;
--
2.17.1
[PATCH openEuler-1.0-LTS] netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers
by Wang Hai 08 Aug '24

From: Pablo Neira Ayuso <pablo(a)netfilter.org>

stable inclusion
from stable-v4.19.317
commit 40188a25a9847dbeb7ec67517174a835a677752f
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/
CVE: CVE-2024-42070

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit 7931d32955e09d0a11b1fe0b6aac1bfa061c005c ]

register store validation for NFT_DATA_VALUE is conditional, however,
the datatype is always either NFT_DATA_VALUE or NFT_DATA_VERDICT. This
only requires a new helper function to infer the register type from the
set datatype so this conditional check can be removed. Otherwise,
pointer to chain object can be leaked through the registers.

Fixes: 96518518cc41 ("netfilter: add nftables")
Reported-by: Linus Torvalds <torvalds(a)linuxfoundation.org>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Conflicts:
	include/net/netfilter/nf_tables.h
	net/netfilter/nft_lookup.c
[nf_tables.h conflicts due to context, nft_lookup.c conflicts because
e1c59f90e1a6 ("netfilter: nftables: add nft_parse_register_store() and
use it") is not merged.]
Signed-off-by: Wang Hai <wanghai38(a)huawei.com>
---
 include/net/netfilter/nf_tables.h | 5 +++++
 net/netfilter/nf_tables_api.c     | 8 ++++----
 net/netfilter/nft_lookup.c        | 2 +-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index ecc301ee1556..247e4637b458 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -438,6 +438,11 @@ static inline void *nft_set_priv(const struct nft_set *set)
 	return (void *)set->data;
 }
 
+static inline enum nft_data_types nft_set_datatype(const struct nft_set *set)
+{
+	return set->dtype == NFT_DATA_VERDICT ? NFT_DATA_VERDICT : NFT_DATA_VALUE;
+}
+
 static inline struct nft_set *nft_set_container_of(const void *priv)
 {
 	return (void *)priv - offsetof(struct nft_set, data);
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 97e56cadb4bf..133cff3a3f77 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3961,8 +3961,7 @@ static int nf_tables_fill_setelem(struct sk_buff *skb,
 
 	if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
 	    nft_data_dump(skb, NFTA_SET_ELEM_DATA, nft_set_ext_data(ext),
-			  set->dtype == NFT_DATA_VERDICT ? NFT_DATA_VERDICT : NFT_DATA_VALUE,
-			  set->dlen) < 0)
+			  nft_set_datatype(set), set->dlen) < 0)
 		goto nla_put_failure;
 
 	if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPR) &&
@@ -7174,6 +7173,9 @@ int nft_validate_register_store(const struct nft_ctx *ctx,
 
 		return 0;
 	default:
+		if (type != NFT_DATA_VALUE)
+			return -EINVAL;
+
 		if (reg < NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE)
 			return -EINVAL;
 		if (len == 0)
@@ -7182,8 +7184,6 @@ int nft_validate_register_store(const struct nft_ctx *ctx,
 		    FIELD_SIZEOF(struct nft_regs, data))
 			return -ERANGE;
 
-		if (data != NULL && type != NFT_DATA_VALUE)
-			return -EINVAL;
 		return 0;
 	}
 }
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index cb9e937a5ce0..0d7a21ec9b11 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -102,7 +102,7 @@ static int nft_lookup_init(const struct nft_ctx *ctx,
 
 		priv->dreg = nft_parse_register(tb[NFTA_LOOKUP_DREG]);
 		err = nft_validate_register_store(ctx, priv->dreg, NULL,
-						  set->dtype, set->dlen);
+						  nft_set_datatype(set), set->dlen);
 		if (err < 0)
 			return err;
 	} else if (set->flags & NFT_SET_MAP)
--
2.17.1
[PATCH openEuler-22.03-LTS-SP1] netfilter: nf_tables: prefer nft_chain_validate
by Wang Hai 08 Aug '24

From: Florian Westphal <fw(a)strlen.de>

mainline inclusion
from mainline-v6.10
commit cff3bd012a9512ac5ed858d38e6ed65f6391008c
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEMR
CVE: CVE-2024-41042

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

nft_chain_validate already performs loop detection because a cycle will
result in a call stack overflow (ctx->level >= NFT_JUMP_STACK_SIZE).

It also follows maps via ->validate callback in nft_lookup, so there
appears no reason to iterate the maps again.

nf_tables_check_loops() and all its helper functions can be removed.
This improves ruleset load time significantly, from 23s down to 12s.

This also fixes a crash bug. Old loop detection code can result in
unbounded recursion:

BUG: TASK stack guard page was hit at ....
Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN
CPU: 4 PID: 1539 Comm: nft Not tainted 6.10.0-rc5+ #1
[..]

with a suitable ruleset during validation of register stores.

I can't see any actual reason to attempt to check for this from
nft_validate_register_store(), at this point the transaction is still in
progress, so we don't have a full picture of the rule graph.

For nf-next it might make sense to either remove it or make this depend
on table->validate_state in case we could catch an error earlier
(for improved error reporting to userspace).

Fixes: 20a69341f2d0 ("netfilter: nf_tables: add netlink set API")
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Conflicts:
	net/netfilter/nf_tables_api.c
[Because the deleted function had some patches that were not merged in,
resulting in conflicts]
Signed-off-by: Wang Hai <wanghai38(a)huawei.com>
---
 net/netfilter/nf_tables_api.c | 115 ++++------------------------------
 1 file changed, 13 insertions(+), 102 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 6105149b9e1a..990fdecca0d2 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3288,6 +3288,15 @@ static void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *r
 	nf_tables_rule_destroy(ctx, rule);
 }
 
+/** nft_chain_validate - loop detection and hook validation
+ *
+ * @ctx: context containing call depth and base chain
+ * @chain: chain to validate
+ *
+ * Walk through the rules of the given chain and chase all jumps/gotos
+ * and set lookups until either the jump limit is hit or all reachable
+ * chains have been validated.
+ */
 int nft_chain_validate(const struct nft_ctx *ctx, const struct nft_chain *chain)
 {
 	struct nft_expr *expr, *last;
@@ -3306,6 +3315,9 @@ int nft_chain_validate(const struct nft_ctx *ctx, const struct nft_chain *chain)
 			if (!expr->ops->validate)
 				continue;
 
+			/* This may call nft_chain_validate() recursively,
+			 * callers that do so must increment ctx->level.
+			 */
 			err = expr->ops->validate(ctx, expr, &data);
 			if (err < 0)
 				return err;
@@ -8677,107 +8689,6 @@ int nft_chain_validate_hooks(const struct nft_chain *chain,
 }
 EXPORT_SYMBOL_GPL(nft_chain_validate_hooks);
 
-/*
- * Loop detection - walk through the ruleset beginning at the destination chain
- * of a new jump until either the source chain is reached (loop) or all
- * reachable chains have been traversed.
- *
- * The loop check is performed whenever a new jump verdict is added to an
- * expression or verdict map or a verdict map is bound to a new chain.
- */
-
-static int nf_tables_check_loops(const struct nft_ctx *ctx,
-				 const struct nft_chain *chain);
-
-static int nf_tables_loop_check_setelem(const struct nft_ctx *ctx,
-					struct nft_set *set,
-					const struct nft_set_iter *iter,
-					struct nft_set_elem *elem)
-{
-	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
-	const struct nft_data *data;
-
-	if (nft_set_ext_exists(ext, NFT_SET_EXT_FLAGS) &&
-	    *nft_set_ext_flags(ext) & NFT_SET_ELEM_INTERVAL_END)
-		return 0;
-
-	data = nft_set_ext_data(ext);
-	switch (data->verdict.code) {
-	case NFT_JUMP:
-	case NFT_GOTO:
-		return nf_tables_check_loops(ctx, data->verdict.chain);
-	default:
-		return 0;
-	}
-}
-
-static int nf_tables_check_loops(const struct nft_ctx *ctx,
-				 const struct nft_chain *chain)
-{
-	const struct nft_rule *rule;
-	const struct nft_expr *expr, *last;
-	struct nft_set *set;
-	struct nft_set_binding *binding;
-	struct nft_set_iter iter;
-
-	if (ctx->chain == chain)
-		return -ELOOP;
-
-	list_for_each_entry(rule, &chain->rules, list) {
-		nft_rule_for_each_expr(expr, last, rule) {
-			struct nft_immediate_expr *priv;
-			const struct nft_data *data;
-			int err;
-
-			if (strcmp(expr->ops->type->name, "immediate"))
-				continue;
-
-			priv = nft_expr_priv(expr);
-			if (priv->dreg != NFT_REG_VERDICT)
-				continue;
-
-			data = &priv->data;
-			switch (data->verdict.code) {
-			case NFT_JUMP:
-			case NFT_GOTO:
-				err = nf_tables_check_loops(ctx,
-							data->verdict.chain);
-				if (err < 0)
-					return err;
-				break;
-			default:
-				break;
-			}
-		}
-	}
-
-	list_for_each_entry(set, &ctx->table->sets, list) {
-		if (!nft_is_active_next(ctx->net, set))
-			continue;
-		if (!(set->flags & NFT_SET_MAP) ||
-		    set->dtype != NFT_DATA_VERDICT)
-			continue;
-
-		list_for_each_entry(binding, &set->bindings, list) {
-			if (!(binding->flags & NFT_SET_MAP) ||
-			    binding->chain != chain)
-				continue;
-
-			iter.genmask	= nft_genmask_next(ctx->net);
-			iter.skip	= 0;
-			iter.count	= 0;
-			iter.err	= 0;
-			iter.fn		= nf_tables_loop_check_setelem;
-
-			set->ops->walk(ctx, set, &iter);
-			if (iter.err < 0)
-				return iter.err;
-		}
-	}
-
-	return 0;
-}
-
 /**
  *	nft_parse_u32_check - fetch u32 attribute and check for maximum value
  *
@@ -8923,7 +8834,7 @@ static int nft_validate_register_store(const struct nft_ctx *ctx,
 		if (data != NULL &&
 		    (data->verdict.code == NFT_GOTO ||
 		     data->verdict.code == NFT_JUMP)) {
-			err = nf_tables_check_loops(ctx, data->verdict.chain);
+			err = nft_chain_validate(ctx, data->verdict.chain);
 			if (err < 0)
 				return err;
 		}
--
2.17.1
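The commit message's point that a depth limit doubles as loop detection can be sketched in a few lines. This is a simplified userspace model, not the kernel code: `struct chain`, `chain_validate()`, `MAX_TARGETS` and `JUMP_STACK_SIZE` are hypothetical names, the limit of 16 and the `-EMLINK` return value are assumptions of the sketch, and real chains hold rules and expressions rather than a fixed array of jump targets.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define JUMP_STACK_SIZE	16	/* assumed depth limit for this sketch */
#define MAX_TARGETS	2	/* hypothetical: jump/goto targets per chain */

/* Hypothetical stand-in for a chain: only its jump targets are kept. */
struct chain {
	const struct chain *targets[MAX_TARGETS];
};

/* Follow every jump/goto target, carrying the current call depth.
 * A cycle can never terminate on its own, so it is guaranteed to
 * reach the depth limit and be rejected there.
 */
static int chain_validate(const struct chain *chain, unsigned int level)
{
	size_t i;
	int err;

	assert(chain != NULL);

	if (level >= JUMP_STACK_SIZE)
		return -EMLINK;

	for (i = 0; i < MAX_TARGETS; i++) {
		if (!chain->targets[i])
			continue;
		/* Each nested chain is validated one level deeper. */
		err = chain_validate(chain->targets[i], level + 1);
		if (err < 0)
			return err;
	}

	return 0;
}
```

The design trade-off in this model is that a cycle is not identified by comparing against the source chain; it is simply cut off once the walk gets too deep, which bounds the recursion regardless of how the ruleset is shaped.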
[PATCH OLK-6.6] netfilter: nf_tables: prefer nft_chain_validate
by Wang Hai 08 Aug '24

From: Florian Westphal <fw(a)strlen.de>

mainline inclusion
from mainline-v6.10
commit cff3bd012a9512ac5ed858d38e6ed65f6391008c
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEMR
CVE: CVE-2024-41042

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

nft_chain_validate already performs loop detection because a cycle will
result in a call stack overflow (ctx->level >= NFT_JUMP_STACK_SIZE).

It also follows maps via ->validate callback in nft_lookup, so there
appears no reason to iterate the maps again.

nf_tables_check_loops() and all its helper functions can be removed.
This improves ruleset load time significantly, from 23s down to 12s.

This also fixes a crash bug. Old loop detection code can result in
unbounded recursion:

BUG: TASK stack guard page was hit at ....
Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN
CPU: 4 PID: 1539 Comm: nft Not tainted 6.10.0-rc5+ #1
[..]

with a suitable ruleset during validation of register stores.

I can't see any actual reason to attempt to check for this from
nft_validate_register_store(), at this point the transaction is still in
progress, so we don't have a full picture of the rule graph.

For nf-next it might make sense to either remove it or make this depend
on table->validate_state in case we could catch an error earlier
(for improved error reporting to userspace).

Fixes: 20a69341f2d0 ("netfilter: nf_tables: add netlink set API")
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Conflicts:
	net/netfilter/nf_tables_api.c
[Because the deleted function had some patches that were not merged in,
resulting in conflicts]
Signed-off-by: Wang Hai <wanghai38(a)huawei.com>
---
 net/netfilter/nf_tables_api.c | 154 +++-------------------------------
 1 file changed, 13 insertions(+), 141 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index dd044a47c872..ea139fca74cb 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3743,6 +3743,15 @@ static void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *r
 	nf_tables_rule_destroy(ctx, rule);
 }
 
+/** nft_chain_validate - loop detection and hook validation
+ *
+ * @ctx: context containing call depth and base chain
+ * @chain: chain to validate
+ *
+ * Walk through the rules of the given chain and chase all jumps/gotos
+ * and set lookups until either the jump limit is hit or all reachable
+ * chains have been validated.
+ */
 int nft_chain_validate(const struct nft_ctx *ctx, const struct nft_chain *chain)
 {
 	struct nft_expr *expr, *last;
@@ -3764,6 +3773,9 @@ int nft_chain_validate(const struct nft_ctx *ctx, const struct nft_chain *chain)
 			if (!expr->ops->validate)
 				continue;
 
+			/* This may call nft_chain_validate() recursively,
+			 * callers that do so must increment ctx->level.
+			 */
 			err = expr->ops->validate(ctx, expr, &data);
 			if (err < 0)
 				return err;
@@ -10621,146 +10633,6 @@ int nft_chain_validate_hooks(const struct nft_chain *chain,
 }
 EXPORT_SYMBOL_GPL(nft_chain_validate_hooks);
 
-/*
- * Loop detection - walk through the ruleset beginning at the destination chain
- * of a new jump until either the source chain is reached (loop) or all
- * reachable chains have been traversed.
- *
- * The loop check is performed whenever a new jump verdict is added to an
- * expression or verdict map or a verdict map is bound to a new chain.
- */
-
-static int nf_tables_check_loops(const struct nft_ctx *ctx,
-				 const struct nft_chain *chain);
-
-static int nft_check_loops(const struct nft_ctx *ctx,
-			   const struct nft_set_ext *ext)
-{
-	const struct nft_data *data;
-	int ret;
-
-	data = nft_set_ext_data(ext);
-	switch (data->verdict.code) {
-	case NFT_JUMP:
-	case NFT_GOTO:
-		ret = nf_tables_check_loops(ctx, data->verdict.chain);
-		break;
-	default:
-		ret = 0;
-		break;
-	}
-
-	return ret;
-}
-
-static int nf_tables_loop_check_setelem(const struct nft_ctx *ctx,
-					struct nft_set *set,
-					const struct nft_set_iter *iter,
-					struct nft_set_elem *elem)
-{
-	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
-
-	if (nft_set_ext_exists(ext, NFT_SET_EXT_FLAGS) &&
-	    *nft_set_ext_flags(ext) & NFT_SET_ELEM_INTERVAL_END)
-		return 0;
-
-	return nft_check_loops(ctx, ext);
-}
-
-static int nft_set_catchall_loops(const struct nft_ctx *ctx,
-				  struct nft_set *set)
-{
-	u8 genmask = nft_genmask_next(ctx->net);
-	struct nft_set_elem_catchall *catchall;
-	struct nft_set_ext *ext;
-	int ret = 0;
-
-	list_for_each_entry_rcu(catchall, &set->catchall_list, list) {
-		ext = nft_set_elem_ext(set, catchall->elem);
-		if (!nft_set_elem_active(ext, genmask))
-			continue;
-
-		ret = nft_check_loops(ctx, ext);
-		if (ret < 0)
-			return ret;
-	}
-
-	return ret;
-}
-
-static int nf_tables_check_loops(const struct nft_ctx *ctx,
-				 const struct nft_chain *chain)
-{
-	const struct nft_rule *rule;
-	const struct nft_expr *expr, *last;
-	struct nft_set *set;
-	struct nft_set_binding *binding;
-	struct nft_set_iter iter;
-
-	if (ctx->chain == chain)
-		return -ELOOP;
-
-	if (fatal_signal_pending(current))
-		return -EINTR;
-
-	list_for_each_entry(rule, &chain->rules, list) {
-		nft_rule_for_each_expr(expr, last, rule) {
-			struct nft_immediate_expr *priv;
-			const struct nft_data *data;
-			int err;
-
-			if (strcmp(expr->ops->type->name, "immediate"))
-				continue;
-
-			priv = nft_expr_priv(expr);
-			if (priv->dreg != NFT_REG_VERDICT)
-				continue;
-
-			data = &priv->data;
-			switch (data->verdict.code) {
-			case NFT_JUMP:
-			case NFT_GOTO:
-				err = nf_tables_check_loops(ctx,
-							data->verdict.chain);
-				if (err < 0)
-					return err;
-				break;
-			default:
-				break;
-			}
-		}
-	}
-
-	list_for_each_entry(set, &ctx->table->sets, list) {
-		if (!nft_is_active_next(ctx->net, set))
-			continue;
-		if (!(set->flags & NFT_SET_MAP) ||
-		    set->dtype != NFT_DATA_VERDICT)
-			continue;
-
-		list_for_each_entry(binding, &set->bindings, list) {
-			if (!(binding->flags & NFT_SET_MAP) ||
-			    binding->chain != chain)
-				continue;
-
-			iter.genmask	= nft_genmask_next(ctx->net);
-			iter.skip	= 0;
-			iter.count	= 0;
-			iter.err	= 0;
-			iter.fn		= nf_tables_loop_check_setelem;
-
-			set->ops->walk(ctx, set, &iter);
-			if (!iter.err)
-				iter.err = nft_set_catchall_loops(ctx, set);
-
-			if (iter.err < 0)
-				return iter.err;
-		}
-	}
-
-	return 0;
-}
-
 /**
  *	nft_parse_u32_check - fetch u32 attribute and check for maximum value
  *
@@ -10873,7 +10745,7 @@ static int nft_validate_register_store(const struct nft_ctx *ctx,
 		if (data != NULL &&
 		    (data->verdict.code == NFT_GOTO ||
 		     data->verdict.code == NFT_JUMP)) {
-			err = nf_tables_check_loops(ctx, data->verdict.chain);
+			err = nft_chain_validate(ctx, data->verdict.chain);
 			if (err < 0)
 				return err;
 		}
--
2.17.1