mailweb.openeuler.org
Kernel

kernel@openeuler.org

June 2023

  • 61 participants
  • 246 discussions
[PATCH OLK-5.10 v3 0/2] Add vhost net polling to suppress kick ratio
by Xu Kuohai 13 Jun '23

During high-frequency packet transmission, if vhost notifies the virtio driver immediately after finding the virtio_queue empty, the virtio driver will perform a kick after sending the next packet. However, if vhost waits a little longer, it could pick up the next packet sent by the virtio driver without the driver needing to perform a kick.

This patch set optimizes for this case. If the TX interval most recently recorded by vhost-net is within 50us, the traffic is treated as high-frequency packet sending. In that case, after detecting that the virtio_queue is empty, vhost-net waits for new packets to arrive before notifying the virtio driver.

Xu Kuohai (2):
  vhost_net: Suppress kick ratio when high frequency TX detected
  openeuler: vhost_net: Enable vhost net polling for openeuler arm64 and x86

 arch/arm64/configs/openeuler_defconfig |  1 +
 arch/x86/configs/openeuler_defconfig   |  1 +
 drivers/vhost/Kconfig                  |  9 +++++
 drivers/vhost/net.c                    | 49 ++++++++++++++++++++++++++
 4 files changed, 60 insertions(+)

--
2.30.2
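The decision described in this cover letter can be sketched in plain user-space C. This is only an illustration under stated assumptions: struct tx_poll_state, tx_poll_record() and tx_poll_should_wait() are hypothetical names, not code from drivers/vhost/net.c, and only the 50us threshold is taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Threshold from the cover letter: intervals within 50us count as high-frequency TX. */
#define HIGH_FREQ_TX_INTERVAL_NS (50u * 1000u)

/* Hypothetical per-queue bookkeeping; not the layout used by vhost-net. */
struct tx_poll_state {
	uint64_t last_tx_ns;       /* timestamp of the previous TX */
	uint64_t last_interval_ns; /* most recently recorded TX interval */
	bool has_prev;             /* at least one TX has been seen */
	bool has_interval;         /* at least two TXs have been seen */
};

/* Record a TX event and update the most recent interval. */
static void tx_poll_record(struct tx_poll_state *s, uint64_t now_ns)
{
	if (s->has_prev) {
		assert(now_ns >= s->last_tx_ns);
		s->last_interval_ns = now_ns - s->last_tx_ns;
		s->has_interval = true;
	}
	s->last_tx_ns = now_ns;
	s->has_prev = true;
}

/*
 * When the queue is found empty: return true if it looks worthwhile to wait
 * for more packets before notifying the driver, false if the driver should
 * be notified right away.
 */
static bool tx_poll_should_wait(const struct tx_poll_state *s)
{
	return s->has_interval &&
	       s->last_interval_ns <= HIGH_FREQ_TX_INTERVAL_NS;
}
```

A caller would invoke tx_poll_record() on every TX and consult tx_poll_should_wait() once the queue runs empty. How long to wait, and how to bound that wait, is left out of this sketch.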
[PATCH OLK-5.10] proc: allow pid_revalidate() during LOOKUP_RCU
by Li Nan 13 Jun '23

From: Stephen Brennan <stephen.s.brennan(a)oracle.com>

mainline inclusion
from mainline-v5.16-rc1
commit da4d6b9cf80ae5b0083f640133b85b68b53b6497
category: bugfix
bugzilla: 188892, https://gitee.com/openeuler/kernel/issues/I7CWJ7
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?i…

----------------------------------------

Problem Description:

When running ~128 parallel instances of

  TZ=/etc/localtime ps -fe >/dev/null

on a 128CPU machine, the %sys utilization reaches 97%, and perf shows the following code path as being responsible for heavy contention on the d_lockref spinlock:

      walk_component()
        lookup_fast()
          d_revalidate()
            pid_revalidate() // returns -ECHILD
          unlazy_child()
            lockref_get_not_dead(&nd->path.dentry->d_lockref) <-- contention

The reason is that pid_revalidate() is triggering a drop from RCU to ref path walk mode. All concurrent path lookups thus try to grab a reference to the dentry for /proc/, before re-executing pid_revalidate() and then stepping into the /proc/$pid directory. Thus there is huge spinlock contention.

This patch allows pid_revalidate() to execute in RCU mode, meaning that the path lookup can successfully enter the /proc/$pid directory while still in RCU mode. Later on, the path lookup may still drop into ref mode, but the contention will be much reduced at this point.

By applying this patch, %sys utilization falls to around 85% under the same workload, and the number of ps processes executed per unit time increases by 3x-4x. Although this particular workload is a bit contrived, we have seen some large collections of eager monitoring scripts which produced similarly high %sys time due to contention in the /proc directory.

As a result of this patch, Al noted that several procfs methods which were only called in ref-walk mode could now be called from RCU mode. To ensure that this patch is safe, I audited all the inode get_link and permission() implementations, as well as dentry d_revalidate() implementations, in fs/proc. The purpose here is to ensure that they either are safe to call in RCU (i.e. don't sleep) or correctly bail out of RCU mode if they don't support it. My analysis shows that all at-risk procfs methods are safe to call under RCU, and thus this patch is safe.

Procfs RCU-walk Analysis:

This analysis is up-to-date with 5.15-rc3. When called under RCU mode, these functions have arguments as follows:

* get_link() receives a NULL dentry pointer when called in RCU mode.
* permission() receives MAY_NOT_BLOCK in the mode parameter when called from RCU.
* d_revalidate() receives LOOKUP_RCU in flags.

For the following functions, either they are trivially RCU safe, or they explicitly bail at the beginning of the function when they run:

proc_ns_get_link       (bails out)
proc_get_link          (RCU safe)
proc_pid_get_link      (bails out)
map_files_d_revalidate (bails out)
map_misc_d_revalidate  (bails out)
proc_net_d_revalidate  (RCU safe)
proc_sys_revalidate    (bails out, also not under /proc/$pid)
tid_fd_revalidate      (bails out)
proc_sys_permission    (not under /proc/$pid)

The remainder of the functions require a bit more detail:

* proc_fd_permission: RCU safe. All of the body of this function is under rcu_read_lock(), except generic_permission() which declares itself RCU safe in its documentation string.
* proc_self_get_link uses GFP_ATOMIC in the RCU case, so it is RCU aware and otherwise looks safe. The same is true of proc_thread_self_get_link.
* proc_map_files_get_link: calls ns_capable, which calls capable(), and thus calls into the audit code (see note #1 below). The remainder is just a call to the trivially safe proc_pid_get_link().
* proc_pid_permission: calls ptrace_may_access(), which appears RCU safe, although it does call into the "security_ptrace_access_check()" hook, which looks safe under smack and selinux. Just the audit code is of concern. Also uses get_task_struct() and put_task_struct(), see note #2 below.
* proc_tid_comm_permission: Appears safe, though calls put_task_struct (see note #2 below).

Note #1:
  Most of the concern of RCU safety has centered around the audit code. However, since b17ec22fb339 ("selinux: slow_avc_audit has become non-blocking"), it's safe to call this code under RCU. So all of the above are safe by my estimation.

Note #2: get_task_struct() and put_task_struct():
  The majority of get_task_struct() is under RCU read lock, and in any case it is a simple increment. But put_task_struct() is complex, given that it could at some point free the task struct, and this process has many steps which I couldn't manually verify. However, several other places call put_task_struct() under RCU, so it appears safe to use here too (see kernel/hung_task.c:165 or rcu/tree-stall.h:296)

Patch description:

pid_revalidate() drops from RCU into REF lookup mode. When many threads are resolving paths within /proc in parallel, this can result in heavy spinlock contention on d_lockref as each thread tries to grab a reference to the /proc dentry (and drop it shortly thereafter).

Investigation indicates that it is not necessary to drop RCU in pid_revalidate(), as no RCU data is modified and the function never sleeps. So, remove the LOOKUP_RCU check.

Link: https://lkml.kernel.org/r/20211004175629.292270-2-stephen.s.brennan@oracle.…
Signed-off-by: Stephen Brennan <stephen.s.brennan(a)oracle.com>
Cc: Konrad Wilk <konrad.wilk(a)oracle.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Li Nan <linan122(a)huawei.com>
---
 fs/proc/base.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 22c65289128e..24c70ff923b8 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2101,19 +2101,21 @@ static int pid_revalidate(struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	struct task_struct *task;
+	int ret = 0;

-	if (flags & LOOKUP_RCU)
-		return -ECHILD;
-
-	inode = d_inode(dentry);
-	task = get_proc_task(inode);
+	rcu_read_lock();
+	inode = d_inode_rcu(dentry);
+	if (!inode)
+		goto out;
+	task = pid_task(proc_pid(inode), PIDTYPE_PID);

 	if (task) {
 		pid_update_inode(task, inode);
-		put_task_struct(task);
-		return 1;
+		ret = 1;
 	}
-	return 0;
+out:
+	rcu_read_unlock();
+	return ret;
 }

 static inline bool proc_inode_is_dead(struct inode *inode)
--
2.39.2
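The effect of removing the LOOKUP_RCU check can be illustrated with a toy user-space model. It is a deliberately simplified sketch, not the path walk in fs/namei.c: lookup_step(), pid_revalidate_old() and pid_revalidate_new() are hypothetical stand-ins, and the LOOKUP_RCU value here is illustrative only. The model just counts how often a lookup step has to take a reference on the parent dentry, which is the contended operation in the perf trace above.

```c
#include <assert.h>
#include <errno.h>

#define LOOKUP_RCU 0x0040 /* illustrative flag value for this model */

typedef int (*revalidate_fn)(unsigned int flags);

/* Old behaviour: refuse to run in RCU-walk mode, forcing a fallback. */
static int pid_revalidate_old(unsigned int flags)
{
	if (flags & LOOKUP_RCU)
		return -ECHILD;
	return 1; /* dentry still valid */
}

/* New behaviour: the check is valid in both modes, so no fallback is needed. */
static int pid_revalidate_new(unsigned int flags)
{
	(void)flags;
	return 1;
}

/*
 * Model one lookup step that starts in RCU-walk mode. Returns how many
 * references had to be taken on the parent dentry to finish the step.
 */
static unsigned int lookup_step(revalidate_fn revalidate)
{
	unsigned int lockref_gets = 0;
	int ret = revalidate(LOOKUP_RCU);

	if (ret == -ECHILD) {
		lockref_gets++;      /* leave RCU-walk: grab a reference */
		ret = revalidate(0); /* retry in ref-walk mode */
	}
	assert(ret == 1);
	return lockref_gets;
}
```

With the old callback every lookup step costs one reference acquisition on the shared dentry; with the new one the step completes without touching the reference count, which is where the reduced contention comes from.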
[PATCH openEuler-22.03-LTS-SP2 v2 0/2] add two bpf helper function for redissockmap
by Liu Jian 13 Jun '23

v1->v2:
1) add error handling in localip_init()
2) add static attribute to localip_init() and localip_cleanup()

Add two bpf helper functions for redissockmap.

Liu Jian (2):
  net: add bpf_is_local_ipaddr bpf helper function
  net: let sockops can use bpf_get_current_comm()

 arch/arm64/configs/openeuler_defconfig |   1 +
 arch/x86/configs/openeuler_defconfig   |   1 +
 drivers/net/Kconfig                    |   8 ++
 drivers/net/Makefile                   |   1 +
 drivers/net/localip/Makefile           |   8 ++
 drivers/net/localip/localip.c          | 149 +++++++++++++++++++++++++
 include/uapi/linux/bpf.h               |   7 ++
 net/core/filter.c                      |  22 ++++
 tools/include/uapi/linux/bpf.h         |   7 ++
 9 files changed, 204 insertions(+)
 create mode 100644 drivers/net/localip/Makefile
 create mode 100644 drivers/net/localip/localip.c

--
2.34.1
[PATCH OLK-5.10 v2 0/2] Add vhost net polling to suppress kick ratio
by Xu Kuohai 13 Jun '23

During high-frequency packet transmission, if vhost notifies the virtio driver immediately after finding the virtio_queue empty, the virtio driver will perform a kick after sending the next packet. However, if vhost waits a little longer, it could pick up the next packet sent by the virtio driver without the driver needing to perform a kick.

This patch set optimizes for this case. If the TX interval most recently recorded by vhost-net is within 50us, the traffic is treated as high-frequency packet sending. In that case, after detecting that the virtio_queue is empty, vhost-net waits for new packets to arrive before notifying the virtio driver.

Xu Kuohai (2):
  vhost_net: Suppress kick ratio when high frequency TX detected
  vhost_net: Enable vhost net polling for openeuler arm64 and x86

 arch/arm64/configs/openeuler_defconfig |  1 +
 arch/x86/configs/openeuler_defconfig   |  1 +
 drivers/vhost/Kconfig                  |  9 ++++++
 drivers/vhost/net.c                    | 45 ++++++++++++++++++++++++++
 4 files changed, 56 insertions(+)

--
2.30.2
[OLK-5.10 v2] irqchip/gicv3-its: Add workaround for hip09 ITS erratum 162100801
by chenxiang 12 Jun '23

From: Xiang Chen <chenxiang66(a)hisilicon.com>

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7C103

------------------------------------------------

When GICv4.1 is enabled on hip09, some vPE configurations in the configuration table are invalid in certain situations, which causes some vSGI interrupts to be lost. To fix the issue, send a vinvall command after vmovp.

Signed-off-by: Nianyao Tang <tangnianyao(a)huawei.com>
Signed-off-by: Xiang Chen <chenxiang66(a)hisilicon.com>
Signed-off-by: caijian <caijian11(a)h-partners.com>
---
 Documentation/arm64/silicon-errata.rst |  2 ++
 arch/arm64/Kconfig                     | 11 +++++++
 arch/arm64/configs/openeuler_defconfig |  1 +
 drivers/irqchip/irq-gic-v3-its.c       | 40 ++++++++++++++++++++------
 4 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index b72fbbfe3fcb..2305def38396 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -149,6 +149,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | Hisilicon      | TSV{110,200}    | #1980005        | HISILICON_ERRATUM_1980005   |
 +----------------+-----------------+-----------------+-----------------------------+
+| Hisilicon      | Hip09           | #162100801      | HISILICON_ERRATUM_162100801 |
++----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
 | Qualcomm Tech. | Kryo/Falkor v1  | E1003           | QCOM_FALKOR_ERRATUM_1003    |
 +----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3dcf96f37a3c..766356f04abc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -802,6 +802,17 @@ config HISILICON_ERRATUM_1980005

 	  If unsure, say N.

+config HISILICON_ERRATUM_162100801
+	bool "Hip09 162100801 erratum support"
+	default y
+	help
+	  When enabled GICv4.1 in hip09, there are some invalid vPE config
+	  in configuration tables for some situation, which will cause vSGI
+	  interrupts lost. So fix it by sending vinvall commands after vmovp.
+
+	  If unsure, say Y.
+
+
 config QCOM_FALKOR_ERRATUM_1003
 	bool "Falkor E1003: Incorrect translation due to ASID change"
 	default y
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 8261f11b54fd..1b633c835279 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -390,6 +390,7 @@ CONFIG_QCOM_FALKOR_ERRATUM_E1041=y
 CONFIG_SOCIONEXT_SYNQUACER_PREITS=y
 CONFIG_HISILICON_ERRATUM_HIP08_RU_PREFETCH=y
 # CONFIG_HISILICON_HIP08_RU_PREFETCH_DEFAULT_OFF is not set
+CONFIG_HISILICON_ERRATUM_162100801=y
 # end of ARM errata workarounds via the alternatives framework

 CONFIG_ARM64_4K_PAGES=y
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 99fcc5601ea4..db8219d65e7c 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -46,6 +46,7 @@
 #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING		(1ULL << 0)
 #define ITS_FLAGS_WORKAROUND_CAVIUM_22375	(1ULL << 1)
 #define ITS_FLAGS_WORKAROUND_CAVIUM_23144	(1ULL << 2)
+#define ITS_FLAGS_WORKAROUND_HISILICON_162100801	(1ULL << 3)

 #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING	(1 << 0)
 #define RDIST_FLAGS_RD_TABLES_PREALLOCATED	(1 << 1)
@@ -1294,6 +1295,14 @@ static void its_send_vmapp(struct its_node *its,
 	its_send_single_vcommand(its, its_build_vmapp_cmd, &desc);
 }

+static void its_send_vinvall(struct its_node *its, struct its_vpe *vpe)
+{
+	struct its_cmd_desc desc;
+
+	desc.its_vinvall_cmd.vpe = vpe;
+	its_send_single_vcommand(its, its_build_vinvall_cmd, &desc);
+}
+
 static void its_send_vmovp(struct its_vpe *vpe)
 {
 	struct its_cmd_desc desc = {};
@@ -1307,6 +1316,10 @@ static void its_send_vmovp(struct its_vpe *vpe)
 		its = list_first_entry(&its_nodes, struct its_node, entry);
 		desc.its_vmovp_cmd.col = &its->collections[col_id];
 		its_send_single_vcommand(its, its_build_vmovp_cmd, &desc);
+		if (is_v4_1(its) && (its->flags &
+				     ITS_FLAGS_WORKAROUND_HISILICON_162100801))
+			its_send_vinvall(its, vpe);
+
 		return;
 	}

@@ -1333,19 +1346,14 @@ static void its_send_vmovp(struct its_vpe *vpe)

 		desc.its_vmovp_cmd.col = &its->collections[col_id];
 		its_send_single_vcommand(its, its_build_vmovp_cmd, &desc);
+		if (is_v4_1(its) && (its->flags &
+				     ITS_FLAGS_WORKAROUND_HISILICON_162100801))
+			its_send_vinvall(its, vpe);
 	}

 	raw_spin_unlock_irqrestore(&vmovp_lock, flags);
 }

-static void its_send_vinvall(struct its_node *its, struct its_vpe *vpe)
-{
-	struct its_cmd_desc desc;
-
-	desc.its_vinvall_cmd.vpe = vpe;
-	its_send_single_vcommand(its, its_build_vinvall_cmd, &desc);
-}
-
 static void its_send_vinv(struct its_device *dev, u32 event_id)
 {
 	struct its_cmd_desc desc;
@@ -4963,6 +4971,14 @@ static bool __maybe_unused its_enable_quirk_hip07_161600802(void *data)
 	return true;
 }

+static bool __maybe_unused its_enable_quirk_hip09_162100801(void *data)
+{
+	struct its_node *its = data;
+
+	its->flags |= ITS_FLAGS_WORKAROUND_HISILICON_162100801;
+	return true;
+}
+
 static const struct gic_quirk its_quirks[] = {
 #ifdef CONFIG_CAVIUM_ERRATUM_22375
 	{
@@ -5008,6 +5024,14 @@ static const struct gic_quirk its_quirks[] = {
 		.mask	= 0xffffffff,
 		.init	= its_enable_quirk_hip07_161600802,
 	},
+#endif
+#ifdef CONFIG_HISILICON_ERRATUM_162100801
+	{
+		.desc	= "ITS: Hip09 erratum 162100801",
+		.iidr	= 0x00051736,
+		.mask	= 0xffffffff,
+		.init	= its_enable_quirk_hip09_162100801,
+	},
 #endif
 	{
 	}
--
2.30.0
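The quirk mechanism this patch hooks into can be sketched as a small stand-alone C model. It is a sketch under assumptions, not the driver code: struct its_model, struct quirk, apply_quirks() and needs_vinvall_after_vmovp() are hypothetical names, and the matching rule (compare the masked IIDR against the table entry) reflects my understanding of how such tables are usually walked. Only the IIDR value, the mask and the flag bit are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit position as ITS_FLAGS_WORKAROUND_HISILICON_162100801 in the patch. */
#define FLAG_WORKAROUND_HIP09_162100801 (1ULL << 3)

/* Hypothetical, much reduced stand-in for struct its_node. */
struct its_model {
	uint32_t iidr;  /* implementer identification register value */
	uint64_t flags; /* workaround flags set by matching quirks */
};

struct quirk {
	const char *desc;
	uint32_t iidr;
	uint32_t mask;
	bool (*init)(struct its_model *its);
};

static bool enable_quirk_hip09(struct its_model *its)
{
	its->flags |= FLAG_WORKAROUND_HIP09_162100801;
	return true;
}

/* Table terminated by an entry with a NULL description. */
static const struct quirk quirks[] = {
	{ "ITS: Hip09 erratum 162100801", 0x00051736, 0xffffffff, enable_quirk_hip09 },
	{ NULL, 0, 0, NULL },
};

/* Run the init hook of every quirk whose masked IIDR matches this ITS. */
static void apply_quirks(struct its_model *its)
{
	assert(its != NULL);
	for (const struct quirk *q = quirks; q->desc; q++) {
		if ((its->iidr & q->mask) != q->iidr)
			continue;
		q->init(its);
	}
}

/* Mirrors the condition the patch adds after each vmovp command. */
static bool needs_vinvall_after_vmovp(const struct its_model *its, bool is_v4_1)
{
	return is_v4_1 && (its->flags & FLAG_WORKAROUND_HIP09_162100801);
}
```

Keeping the decision in a flag set once at probe time means the hot path only tests a bit, rather than re-reading the IIDR on every vmovp.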
[PATCH openEuler-22.03-LTS-SP2 0/2] add two bpf helper function for redissockmap
by Liu Jian 12 Jun '23

Add two bpf helper functions for redissockmap.

Liu Jian (2):
  net: add bpf_is_local_ipaddr bpf helper function
  net: let sockops can use bpf_get_current_comm()

 arch/arm64/configs/openeuler_defconfig |   1 +
 arch/x86/configs/openeuler_defconfig   |   1 +
 drivers/net/Kconfig                    |   8 ++
 drivers/net/Makefile                   |   1 +
 drivers/net/localip/Makefile           |   8 ++
 drivers/net/localip/localip.c          | 146 +++++++++++++++++++++++++
 include/uapi/linux/bpf.h               |   7 ++
 net/core/filter.c                      |  22 ++++
 tools/include/uapi/linux/bpf.h         |   7 ++
 9 files changed, 201 insertions(+)
 create mode 100644 drivers/net/localip/Makefile
 create mode 100644 drivers/net/localip/localip.c

--
2.34.1
[PATCH openEuler-22.03-LTS-SP2] irqchip: gic-v3: Collection table support muti pages
by wangwudi 12 Jun '23

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7CX6S
CVE: NA

--------------------------------------------------------------------------

Only one page is allocated to the collection table. Recalculate the number of pages for the collection table based on the number of CPUs.

Signed-off-by: wangwudi <wangwudi(a)hisilicon.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 99fcc5601ea4..fa45fd7ed173 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2665,6 +2665,10 @@ static int its_alloc_tables(struct its_node *its)
 			indirect = its_parse_indirect_baser(its, baser, &order,
 							    ITS_MAX_VPEID_BITS);
 			break;
+		case GITS_BASER_TYPE_COLLECTION:
+			indirect = its_parse_indirect_baser(its, baser, &order,
+						order_base_2(num_possible_cpus()));
+			break;
 		}

 		err = its_setup_baser(its, baser, cache, shr, order, indirect);
--
2.31.0
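The value the patch passes for the collection table is order_base_2(num_possible_cpus()), which I read as the number of ID bits needed to give every possible CPU its own collection. A minimal user-space sketch of that calculation follows; order_base_2_model() is a hypothetical stand-in written for this example, not the kernel macro, and it assumes an argument of at least 1.

```c
#include <assert.h>

/*
 * Smallest n such that (1 << n) >= x, for x >= 1.
 * This mirrors the intent of the kernel's order_base_2() for this use case.
 */
static unsigned int order_base_2_model(unsigned long x)
{
	unsigned int bits = 0;

	assert(x >= 1);
	while ((1UL << bits) < x)
		bits++;
	return bits;
}
```

For example, a machine with 96 possible CPUs needs 7 ID bits (128 slots), while one with 129 needs 8. The table sizing code can then derive the number of pages from the entry size and that bit count.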
[PATCH openEuler-22.03-LTS-SP2] irqchip: gic-v3: Collection table support muti pages
by wangwudi 12 Jun '23

From 89bf31ca572987247ae0cf147df7ff5a83ab45d3 Mon Sep 17 00:00:00 2001
From: wangwudi <wangwudi(a)hisilicon.com>
Date: Mon, 12 Jun 2023 22:16:32 +0800
Subject: [PATCH openEuler-22.03-LTS-SP2] irqchip: gic-v3: Collection table support muti pages

--------

Signed-off-by: wangwudi <wangwudi(a)hisilicon.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 0e429fe9cfd8..111e86b1a99e 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2624,6 +2624,10 @@ static int its_alloc_tables(struct its_node *its)
 			indirect = its_parse_indirect_baser(its, baser, &order,
 							    ITS_MAX_VPEID_BITS);
 			break;
+		case GITS_BASER_TYPE_COLLECTION:
+			indirect = its_parse_indirect_baser(its, baser, &order,
+						order_base_2(num_possible_cpus()));
+			break;
 		}

 		err = its_setup_baser(its, baser, cache, shr, order, indirect);
--
2.31.0
[PATCH openEuler-22.03-LTS-SP2 1/2] net: add bpf_is_local_ipaddr bpf helper function
by Liu Jian 12 Jun '23

hulk inclusion
category: feature
bugzilla: NA
CVE: N/A

----------------------------------------------------

Some network acceleration solutions, such as sockmap, are valid only for internal packets of the local host. The bpf_is_local_ipaddr() bpf helper function is added so that the ebpf program can determine whether a packet is an internal packet of the local host.

Signed-off-by: Liu Jian <liujian56(a)huawei.com>
---
 arch/arm64/configs/openeuler_defconfig |   1 +
 arch/x86/configs/openeuler_defconfig   |   1 +
 drivers/net/Kconfig                    |   8 ++
 drivers/net/Makefile                   |   1 +
 drivers/net/localip/Makefile           |   8 ++
 drivers/net/localip/localip.c          | 146 +++++++++++++++++++++++++
 include/uapi/linux/bpf.h               |   7 ++
 net/core/filter.c                      |  20 ++++
 tools/include/uapi/linux/bpf.h         |   7 ++
 9 files changed, 199 insertions(+)
 create mode 100644 drivers/net/localip/Makefile
 create mode 100644 drivers/net/localip/localip.c

diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index f934557ff765..b2d06797c08a 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -3131,6 +3131,7 @@ CONFIG_DLCI_MAX=8
 CONFIG_USB4_NET=m
 # CONFIG_NETDEVSIM is not set
 CONFIG_NET_FAILOVER=m
+CONFIG_NET_LOCALIP_LST=m
 # CONFIG_ISDN is not set

 #
diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
index 11323cdc3301..4f2682d4b71e 100644
--- a/arch/x86/configs/openeuler_defconfig
+++ b/arch/x86/configs/openeuler_defconfig
@@ -3206,6 +3206,7 @@ CONFIG_USB4_NET=m
 CONFIG_HYPERV_NET=m
 CONFIG_NETDEVSIM=m
 CONFIG_NET_FAILOVER=m
+CONFIG_NET_LOCALIP_LST=m
 CONFIG_ISDN=y
 CONFIG_ISDN_CAPI=y
 CONFIG_CAPI_TRACE=y
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index f20808024305..913acb7eed6f 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -592,4 +592,12 @@ config NET_FAILOVER
 	  a VM with direct attached VF by failing over to the paravirtual
 	  datapath when the VF is unplugged.

+config NET_LOCALIP_LST
+	tristate "Collect local ipv4 address"
+	depends on INET
+	default n
+	help
+	  Similar to inet_addr_lst, only the IP address is recorded, and
+	  net_namespace is not concerned.
+
 endif # NETDEVICES
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 72e18d505d1a..b4ff8a310dcc 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -84,3 +84,4 @@ thunderbolt-net-y += thunderbolt.o
 obj-$(CONFIG_USB4_NET) += thunderbolt-net.o
 obj-$(CONFIG_NETDEVSIM) += netdevsim/
 obj-$(CONFIG_NET_FAILOVER) += net_failover.o
+obj-$(CONFIG_NET_LOCALIP_LST) += localip/
diff --git a/drivers/net/localip/Makefile b/drivers/net/localip/Makefile
new file mode 100644
index 000000000000..03b535709271
--- /dev/null
+++ b/drivers/net/localip/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the linux kernel.
+#
+
+# Object file lists.
+
+obj-$(CONFIG_NET_LOCALIP_LST) += localip.o
diff --git a/drivers/net/localip/localip.c b/drivers/net/localip/localip.c
new file mode 100644
index 000000000000..3756924553bf
--- /dev/null
+++ b/drivers/net/localip/localip.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2023 Huawei Technologies Co., Ltd
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/device.h>
+#include <linux/inetdevice.h>
+#include <linux/spinlock.h>
+
+#define IN4_ADDR_HSIZE_SHIFT	8
+#define IN4_ADDR_HSIZE		(1U << IN4_ADDR_HSIZE_SHIFT)
+
+static struct hlist_head localip_lst[IN4_ADDR_HSIZE];
+
+static DEFINE_SPINLOCK(localip_lock);
+
+struct localipaddr {
+	struct hlist_node node;
+	struct rcu_head rcu;
+	__u32 ipaddr;
+};
+
+static u32 localip_hash(__be32 addr)
+{
+	return hash_32(addr, IN4_ADDR_HSIZE_SHIFT);
+}
+
+static void localip_hash_insert(struct localipaddr *ip)
+{
+	u32 hash = localip_hash(ip->ipaddr);
+
+	hlist_add_head_rcu(&ip->node, &localip_lst[hash]);
+}
+
+static void localip_hash_remove(struct localipaddr *ip)
+{
+	hlist_del_init_rcu(&ip->node);
+}
+
+static int is_local_ipaddr(uint32_t ipaddr)
+{
+	u32 hash = localip_hash(ipaddr);
+	struct localipaddr *localip;
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(localip, &localip_lst[hash], node) {
+		if (localip->ipaddr == ipaddr) {
+			rcu_read_unlock();
+			return 1;
+		}
+	}
+	rcu_read_unlock();
+
+	return 0;
+}
+
+static int localip_event(struct notifier_block *this, unsigned long event,
+			 void *ptr)
+{
+	struct in_ifaddr *ifa = ptr;
+	struct net_device *event_netdev = ifa->ifa_dev->dev;
+	struct localipaddr *localip;
+	u32 hash;
+
+	if (ipv4_is_loopback(ifa->ifa_local))
+		return NOTIFY_DONE;
+
+	switch (event) {
+	case NETDEV_UP:
+		pr_debug("UP, dev:%s, ip:0x%x, mask:0x%x\n", event_netdev->name,
+			 ifa->ifa_local, ifa->ifa_mask);
+		localip = kzalloc(sizeof(struct localipaddr), GFP_KERNEL);
+		if (!localip) {
+			pr_err("kzalloc failed.\n");
+			break;
+		}
+		localip->ipaddr = ifa->ifa_local;
+		spin_lock(&localip_lock);
+		localip_hash_insert(localip);
+		spin_unlock(&localip_lock);
+		break;
+	case NETDEV_DOWN:
+		pr_debug("DOWN, dev:%s, ip:0x%x, mask:0x%x\n", event_netdev->name,
+			 ifa->ifa_local, ifa->ifa_mask);
+		hash = localip_hash(ifa->ifa_local);
+		spin_lock(&localip_lock);
+		hlist_for_each_entry(localip, &localip_lst[hash], node) {
+			if (localip->ipaddr == ifa->ifa_local) {
+				localip_hash_remove(localip);
+				kfree_rcu(localip, rcu);
+				break;
+			}
+		}
+		spin_unlock(&localip_lock);
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block localip_notifier = {
+	.notifier_call = localip_event,
+};
+
+typedef int (*is_local_ipaddr_func)(uint32_t ipaddr);
+extern is_local_ipaddr_func bpf_is_local_ipaddr_func;
+
+int localip_init(void)
+{
+	int i;
+
+	for (i = 0; i < IN4_ADDR_HSIZE; i++)
+		INIT_HLIST_HEAD(&localip_lst[i]);
+
+	register_inetaddr_notifier(&localip_notifier);
+	bpf_is_local_ipaddr_func = is_local_ipaddr;
+	return 0;
+}
+
+void localip_cleanup(void)
+{
+	struct localipaddr *localip;
+	struct hlist_node *n;
+	int i;
+
+	bpf_is_local_ipaddr_func = NULL;
+	unregister_inetaddr_notifier(&localip_notifier);
+
+	spin_lock(&localip_lock);
+	for (i = 0; i < IN4_ADDR_HSIZE; i++) {
+		hlist_for_each_entry_safe(localip, n, &localip_lst[i], node) {
+			pr_debug("cleanup, hash:%i, ip:0x%x\n", i, localip->ipaddr);
+			localip_hash_remove(localip);
+			kfree_rcu(localip, rcu);
+		}
+	}
+	spin_unlock(&localip_lock);
+	synchronize_rcu();
+}
+
+module_init(localip_init);
+module_exit(localip_cleanup);
+MODULE_LICENSE("GPL");
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9e64dac44d60..9373abafcb91 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3872,6 +3872,12 @@ union bpf_attr {
  *		check src_cpu whether share cache with dst_cpu.
  *	Return
  *		yes 1, no 0.
+ *
+ * long bpf_is_local_ipaddr(u32 ipaddr)
+ *	Description
+ *		Check the ipaddr is local address or not.
+ *	Return
+ *		1 is local address, 0 is not.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -4044,6 +4050,7 @@ union bpf_attr {
 	FN(sched_entity_to_tg),		\
 	FN(cpumask_op),			\
 	FN(cpus_share_cache),		\
+	FN(is_local_ipaddr),		\
 	/* */

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/net/core/filter.c b/net/core/filter.c
index 012a5070a9e5..527b66921dd6 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5084,6 +5084,24 @@ static const struct bpf_func_proto bpf_sk_original_addr_proto = {
 	.arg4_type	= ARG_CONST_SIZE,
 };

+typedef int (*is_local_ipaddr_func)(uint32_t ipaddr);
+is_local_ipaddr_func bpf_is_local_ipaddr_func;
+EXPORT_SYMBOL(bpf_is_local_ipaddr_func);
+
+BPF_CALL_1(bpf_is_local_ipaddr, uint32_t, ipaddr)
+{
+	if (!bpf_is_local_ipaddr_func)
+		return 0;
+	return bpf_is_local_ipaddr_func(ipaddr);
+}
+
+static const struct bpf_func_proto bpf_is_local_ipaddr_proto = {
+	.func		= bpf_is_local_ipaddr,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+};
+
 BPF_CALL_5(bpf_sock_addr_getsockopt, struct bpf_sock_addr_kern *, ctx,
 	   int, level, int, optname, char *, optval, int, optlen)
 {
@@ -7398,6 +7416,8 @@ sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_tcp_sock:
 		return &bpf_tcp_sock_proto;
 #endif /* CONFIG_INET */
+	case BPF_FUNC_is_local_ipaddr:
+		return &bpf_is_local_ipaddr_proto;
 	default:
 		return bpf_sk_base_func_proto(func_id);
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index abf8023d606b..41bc2f496176 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3872,6 +3872,12 @@ union bpf_attr {
  *		check src_cpu whether share cache with dst_cpu.
  *	Return
  *		true yes, false no.
+ *
+ * long bpf_is_local_ipaddr(u32 ipaddr)
+ *	Description
+ *		Check the ipaddr is local address or not.
+ *	Return
+ *		1 is local address, 0 is not.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -4044,6 +4050,7 @@ union bpf_attr {
 	FN(sched_entity_to_tg),		\
 	FN(cpumask_op),			\
 	FN(cpus_share_cache),		\
+	FN(is_local_ipaddr),		\
 	/* */

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
--
2.34.1
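The data structure behind the helper, a small hash table of local IPv4 addresses that is filled on address-up events and queried per packet, can be sketched in user-space C. This is a simplified model under stated assumptions: ip_add(), ip_del() and ip_is_local() are hypothetical names, the hash is a multiplicative hash in the spirit of hash_32() rather than a copy of it, and all locking and RCU handling from the patch is intentionally left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define HSIZE_SHIFT 8
#define HSIZE (1u << HSIZE_SHIFT)

/* One recorded local address, chained per hash bucket. */
struct ip_node {
	struct ip_node *next;
	uint32_t ipaddr;
};

static struct ip_node *buckets[HSIZE];

/* Multiplicative hash reduced to HSIZE_SHIFT bits (assumed constant). */
static uint32_t ip_hash(uint32_t addr)
{
	uint32_t h = (uint32_t)(addr * 0x61C88647u) >> (32 - HSIZE_SHIFT);

	assert(h < HSIZE);
	return h;
}

/* Record an address (the NETDEV_UP case). Returns 0, or -1 if out of memory. */
static int ip_add(uint32_t addr)
{
	struct ip_node *n = malloc(sizeof(*n));
	uint32_t h = ip_hash(addr);

	if (!n)
		return -1;
	n->ipaddr = addr;
	n->next = buckets[h];
	buckets[h] = n;
	return 0;
}

/* Forget an address (the NETDEV_DOWN case). Returns 1 if an entry was removed. */
static int ip_del(uint32_t addr)
{
	struct ip_node **pp = &buckets[ip_hash(addr)];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->ipaddr == addr) {
			struct ip_node *victim = *pp;

			*pp = victim->next;
			free(victim);
			return 1;
		}
	}
	return 0;
}

/* Lookup used on the fast path: is this address one of ours? */
static bool ip_is_local(uint32_t addr)
{
	for (const struct ip_node *n = buckets[ip_hash(addr)]; n; n = n->next)
		if (n->ipaddr == addr)
			return true;
	return false;
}
```

In the kernel patch the lookup runs under rcu_read_lock() and updates are serialized by a spinlock. A real concurrent version would need equivalent protection, which this single-threaded sketch does not attempt.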