mailweb.openeuler.org

Kernel

kernel@openeuler.org

  • 44 participants
  • 18219 discussions
[PATCH v2 OLK-5.10] iommu/iova: decrease cpu rcaches to improve iova rcache performance
by Zhang Zekun 13 Jun '23
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7ASVH
CVE: NA

---------------------------------------

The cpu rcache logic can cause a severe performance problem. This is
because the iova rcache logic creates per-CPU rcaches that are used
only by the local CPU, so free iovas in a cpu_rcache are not shared
among CPUs. For example, a platform with 256 CPUs running 256 threads
will have 417792 entries (6 * (256 * 2 + 32) * 128) to cache iovas,
but every thread running on a CPU can only use 1/256 of the total
cache size, which can cause a severe performance problem. Limit the
maximum number of cpu rcaches to 64 to fix this problem; each iovad
also saves about 2MB of memory.

In our FIO test with 256 threads:

Jobs: 12 (f=2): [/(1),R(2),/(9)][16.7%][r=2KiB/s][r=0 IOPS][eta 00m:30s]
Jobs: 12 (f=12): [R(12)][20.0%][r=1091MiB/s][r=279k IOPS][eta 00m:28s]
Jobs: 12 (f=12): [R(12)][22.2%][r=1426MiB/s][r=365k IOPS][eta 00m:28s]
Jobs: 12 (f=12): [R(12)][25.0%][r=1607MiB/s][r=411k IOPS][eta 00m:27s]
Jobs: 12 (f=12): [R(12)][27.8%][r=1501MiB/s][r=384k IOPS][eta 00m:26s]
Jobs: 12 (f=12): [R(12)][30.6%][r=1486MiB/s][r=380k IOPS][eta 00m:25s]
Jobs: 12 (f=12): [R(12)][33.3%][r=1393MiB/s][r=357k IOPS][eta 00m:24s]
Jobs: 12 (f=12): [R(12)][36.1%][r=1550MiB/s][r=397k IOPS][eta 00m:23s]
Jobs: 12 (f=12): [R(12)][38.9%][r=1485MiB/s][r=380k IOPS][eta 00m:22s]

After this patch:

Jobs: 10 (f=10): [R(10)][98.4%][r=7414MiB/s][r=1898k IOPS][eta 00m:15s]
Jobs: 10 (f=10): [R(10)][98.5%][r=7495MiB/s][r=1919k IOPS][eta 00m:14s]
Jobs: 10 (f=10): [R(10)][98.6%][r=7497MiB/s][r=1919k IOPS][eta 00m:13s]
Jobs: 10 (f=10): [R(10)][98.7%][r=7497MiB/s][r=1919k IOPS][eta 00m:12s]
Jobs: 10 (f=10): [R(10)][98.8%][r=7471MiB/s][r=1913k IOPS][eta 00m:11s]
Jobs: 10 (f=10): [R(10)][98.9%][r=7483MiB/s][r=1916k IOPS][eta 00m:10s]
Jobs: 10 (f=10): [R(10)][99.0%][r=7491MiB/s][r=1918k IOPS][eta 00m:09s]
Jobs: 10 (f=10): [R(10)][99.1%][r=7436MiB/s][r=1904k IOPS][eta 00m:08s]
Jobs: 10 (f=10): [R(10)][99.2%][r=7462MiB/s][r=1910k IOPS][eta 00m:07s]

IOPS increases by about 500%.

Signed-off-by: Zhang Zekun <zhangzekun11(a)huawei.com>
Reviewed-by: Weilong Chen <chenweilong(a)huawei.com>
---
v2:
 - add hulk inclusion

 drivers/iommu/Kconfig | 10 ++++++++++
 drivers/iommu/iova.c  | 33 +++++++++++++++++++++++++++++++++
 include/linux/iova.h  |  5 +++++
 3 files changed, 48 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f04a2bde0018..c71eb20bc702 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -343,6 +343,16 @@ config ARM_SMMU_V3_PM
 	help
 	  Add support for suspend and resume support for arm smmu v3.
 
+config ARM_SMMU_V3_REDUCE_CPURCACHE
+	bool "Add support for cpu rcache"
+	depends on ARM_SMMU_V3
+	default n
+	help
+	  Add support for reducing the total amount of cpu rcache. When
+	  the number of cpus is large, iova rcache has a severe performance
+	  problem. Reduce the number of cpu rcaches to improve the cache
+	  hit rate.
+
 config S390_IOMMU
 	def_bool y if S390 && PCI
 	depends on S390 && PCI
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 1246e8f8bf08..ed86bb09db4f 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -881,11 +881,19 @@ static void init_iova_rcaches(struct iova_domain *iovad)
 		rcache = &iovad->rcaches[i];
 		spin_lock_init(&rcache->lock);
 		rcache->depot_size = 0;
+#ifdef CONFIG_ARM_SMMU_V3_REDUCE_CPURCACHE
+		for (cpu = 0; cpu < MAX_CPU_SIZE; cpu++) {
+			rcache->cpu_rcaches[cpu] = kmalloc(sizeof(*cpu_rcache), GFP_KERNEL);
+			if (WARN_ON(!rcache->cpu_rcaches[cpu]))
+				continue;
+			cpu_rcache = rcache->cpu_rcaches[cpu];
+#else
 		rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), cache_line_size());
 		if (WARN_ON(!rcache->cpu_rcaches))
 			continue;
 		for_each_possible_cpu(cpu) {
 			cpu_rcache = per_cpu_ptr(rcache->cpu_rcaches, cpu);
+#endif
 			spin_lock_init(&cpu_rcache->lock);
 			cpu_rcache->loaded = iova_magazine_alloc(GFP_KERNEL);
 			cpu_rcache->prev = iova_magazine_alloc(GFP_KERNEL);
@@ -907,8 +915,14 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 	struct iova_cpu_rcache *cpu_rcache;
 	bool can_insert = false;
 	unsigned long flags;
+#ifdef CONFIG_ARM_SMMU_V3_REDUCE_CPURCACHE
+	int cpu;
+	cpu = raw_smp_processor_id();
+	cpu_rcache = rcache->cpu_rcaches[cpu % MAX_CPU_SIZE];
+#else
 	cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
+#endif
 	spin_lock_irqsave(&cpu_rcache->lock, flags);
 
 	if (!iova_magazine_full(cpu_rcache->loaded)) {
@@ -970,8 +984,14 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
 	unsigned long iova_pfn = 0;
 	bool has_pfn = false;
 	unsigned long flags;
+#ifdef CONFIG_ARM_SMMU_V3_REDUCE_CPURCACHE
+	int cpu;
+	cpu = raw_smp_processor_id();
+	cpu_rcache = rcache->cpu_rcaches[cpu % MAX_CPU_SIZE];
+#else
 	cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
+#endif
 	spin_lock_irqsave(&cpu_rcache->lock, flags);
 
 	if (!iova_magazine_empty(cpu_rcache->loaded)) {
@@ -1026,12 +1046,21 @@ static void free_iova_rcaches(struct iova_domain *iovad)
 	for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
 		rcache = &iovad->rcaches[i];
+#ifdef CONFIG_ARM_SMMU_V3_REDUCE_CPURCACHE
+		for (cpu = 0; cpu < MAX_CPU_SIZE; cpu++) {
+			cpu_rcache = rcache->cpu_rcaches[cpu];
+			iova_magazine_free(cpu_rcache->loaded);
+			iova_magazine_free(cpu_rcache->prev);
+			kfree(cpu_rcache);
+		}
+#else
 		for_each_possible_cpu(cpu) {
 			cpu_rcache = per_cpu_ptr(rcache->cpu_rcaches, cpu);
 			iova_magazine_free(cpu_rcache->loaded);
 			iova_magazine_free(cpu_rcache->prev);
 		}
 		free_percpu(rcache->cpu_rcaches);
+#endif
 		for (j = 0; j < rcache->depot_size; ++j)
 			iova_magazine_free(rcache->depot[j]);
 	}
@@ -1049,7 +1078,11 @@ void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad)
 	for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
 		rcache = &iovad->rcaches[i];
+#ifdef CONFIG_ARM_SMMU_V3_REDUCE_CPURCACHE
+		cpu_rcache = rcache->cpu_rcaches[cpu % MAX_CPU_SIZE];
+#else
 		cpu_rcache = per_cpu_ptr(rcache->cpu_rcaches, cpu);
+#endif
 		spin_lock_irqsave(&cpu_rcache->lock, flags);
 		iova_magazine_free_pfns(cpu_rcache->loaded, iovad);
 		iova_magazine_free_pfns(cpu_rcache->prev, iovad);
diff --git a/include/linux/iova.h b/include/linux/iova.h
index dfa51ae49666..06897fc970eb 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -27,12 +27,17 @@ struct iova_cpu_rcache;
 #define IOVA_RANGE_CACHE_MAX_SIZE 6	/* log of max cached IOVA range size (in pages) */
 #define MAX_GLOBAL_MAGS 32	/* magazines per bin */
+#define MAX_CPU_SIZE (NR_CPUS > 64 ? 64 : NR_CPUS)	/* max cpu rcaches */
 
 struct iova_rcache {
 	spinlock_t lock;
 	unsigned long depot_size;
 	struct iova_magazine *depot[MAX_GLOBAL_MAGS];
+#ifdef CONFIG_ARM_SMMU_V3_REDUCE_CPURCACHE
+	struct iova_cpu_rcache *cpu_rcaches[MAX_CPU_SIZE];
+#else
 	struct iova_cpu_rcache __percpu *cpu_rcaches;
+#endif
 };
 
 struct iova_domain;
-- 
2.17.1
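[Editor's note: for readers checking the sizing claim in the commit message, a small stand-alone userspace sketch of the arithmetic; the constants mirror the patch, and IOVA_MAG_SIZE = 128 is the per-magazine capacity implied by the 417792-entry figure, assumed here.]

#include <stdio.h>

#define IOVA_RANGE_CACHE_MAX_SIZE 6	/* cache bins, as in iova.h */
#define MAX_GLOBAL_MAGS 32		/* depot magazines per bin */
#define IOVA_MAG_SIZE 128		/* iovas per magazine (assumed) */

/* Total cached iovas: each CPU holds two magazines (loaded + prev),
 * and each bin additionally keeps a shared depot. */
static unsigned long rcache_entries(unsigned long cpus)
{
	return IOVA_RANGE_CACHE_MAX_SIZE *
	       (cpus * 2 + MAX_GLOBAL_MAGS) * IOVA_MAG_SIZE;
}

int main(void)
{
	/* 6 * (256 * 2 + 32) * 128 = 417792, but per-CPU and unshared */
	printf("256 rcaches: %lu entries\n", rcache_entries(256));
	/* 6 * (64 * 2 + 32) * 128 = 122880, shared via cpu % MAX_CPU_SIZE */
	printf("64 rcaches:  %lu entries\n", rcache_entries(64));
	return 0;
}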
[PATCH OLK-5.10 v3 0/2] Add vhost net polling to suppress kick ratio
by Xu Kuohai 13 Jun '23
During high-frequency packet transmission, if vhost notifies the virtio
driver immediately after finding the virtio_queue empty, the virtio
driver will perform a kick after sending the next packet. However, if
vhost waits a little longer, it can pick up the next packet sent by the
virtio driver without virtio performing a kick. This series optimizes
for that case: if the TX interval recently recorded by vhost-net is
within 50us, transmission is considered high frequency, and after
detecting that the virtio_queue is empty, vhost-net waits for new
packets to arrive before notifying the virtio driver.

Xu Kuohai (2):
  vhost_net: Suppress kick ratio when high frequency TX detected
  openeuler: vhost_net: Enable vhost net polling for openeuler arm64 and x86

 arch/arm64/configs/openeuler_defconfig |  1 +
 arch/x86/configs/openeuler_defconfig   |  1 +
 drivers/vhost/Kconfig                  |  9 +++++
 drivers/vhost/net.c                    | 49 ++++++++++++++++++++++++++
 4 files changed, 60 insertions(+)

-- 
2.30.2
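[Editor's note: the series body is not quoted here, so purely as a hedged sketch of the described heuristic; the 50us threshold comes from the cover letter, while the function and variable names below are invented for illustration and are not the patch's actual code.]

#include <linux/ktime.h>

/* Illustrative only -- not the actual patch code. */
#define VHOST_HIGH_FREQ_TX_NS	(50 * NSEC_PER_USEC)

static ktime_t vhost_last_tx;	/* hypothetical per-virtqueue timestamp */

/* True if the previous TX was less than 50us ago, i.e. the guest is
 * transmitting at high frequency and briefly polling the virtqueue is
 * likely cheaper than another notify/kick round trip. */
static bool vhost_tx_is_high_freq(void)
{
	ktime_t now = ktime_get();
	bool busy = ktime_to_ns(ktime_sub(now, vhost_last_tx)) <
		    VHOST_HIGH_FREQ_TX_NS;

	vhost_last_tx = now;
	return busy;
}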
[PATCH OLK-5.10] proc: allow pid_revalidate() during LOOKUP_RCU
by Li Nan 13 Jun '23
From: Stephen Brennan <stephen.s.brennan(a)oracle.com>

mainline inclusion
from mainline-v5.16-rc1
commit da4d6b9cf80ae5b0083f640133b85b68b53b6497
category: bugfix
bugzilla: 188892, https://gitee.com/openeuler/kernel/issues/I7CWJ7
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?i…

----------------------------------------

Problem Description:

When running ~128 parallel instances of

  TZ=/etc/localtime ps -fe >/dev/null

on a 128-CPU machine, the %sys utilization reaches 97%, and perf shows
the following code path as being responsible for heavy contention on
the d_lockref spinlock:

  walk_component()
    lookup_fast()
      d_revalidate()
        pid_revalidate() // returns -ECHILD
      unlazy_child()
        lockref_get_not_dead(&nd->path.dentry->d_lockref) <-- contention

The reason is that pid_revalidate() is triggering a drop from RCU to
ref path walk mode. All concurrent path lookups thus try to grab a
reference to the dentry for /proc/, before re-executing
pid_revalidate() and then stepping into the /proc/$pid directory. Thus
there is huge spinlock contention.

This patch allows pid_revalidate() to execute in RCU mode, meaning
that the path lookup can successfully enter the /proc/$pid directory
while still in RCU mode. Later on, the path lookup may still drop into
ref mode, but the contention will be much reduced at this point.

By applying this patch, %sys utilization falls to around 85% under the
same workload, and the number of ps processes executed per unit time
increases by 3x-4x. Although this particular workload is a bit
contrived, we have seen some large collections of eager monitoring
scripts which produced similarly high %sys time due to contention in
the /proc directory.

As a result of this patch, Al noted that several procfs methods which
were only called in ref-walk mode could now be called from RCU mode.
To ensure that this patch is safe, I audited all the inode get_link
and permission() implementations, as well as dentry d_revalidate()
implementations, in fs/proc. The purpose here is to ensure that they
either are safe to call in RCU (i.e. don't sleep) or correctly bail
out of RCU mode if they don't support it. My analysis shows that all
at-risk procfs methods are safe to call under RCU, and thus this patch
is safe.

Procfs RCU-walk Analysis:

This analysis is up-to-date with 5.15-rc3. When called under RCU mode,
these functions have arguments as follows:

* get_link() receives a NULL dentry pointer when called in RCU mode.
* permission() receives MAY_NOT_BLOCK in the mode parameter when called
  from RCU.
* d_revalidate() receives LOOKUP_RCU in flags.

For the following functions, either they are trivially RCU safe, or
they explicitly bail at the beginning of the function when they run:

  proc_ns_get_link       (bails out)
  proc_get_link          (RCU safe)
  proc_pid_get_link      (bails out)
  map_files_d_revalidate (bails out)
  map_misc_d_revalidate  (bails out)
  proc_net_d_revalidate  (RCU safe)
  proc_sys_revalidate    (bails out, also not under /proc/$pid)
  tid_fd_revalidate      (bails out)
  proc_sys_permission    (not under /proc/$pid)

The remainder of the functions require a bit more detail:

* proc_fd_permission: RCU safe. All of the body of this function is
  under rcu_read_lock(), except generic_permission() which declares
  itself RCU safe in its documentation string.
* proc_self_get_link uses GFP_ATOMIC in the RCU case, so it is RCU
  aware and otherwise looks safe. The same is true of
  proc_thread_self_get_link.
* proc_map_files_get_link: calls ns_capable(), which calls capable(),
  and thus calls into the audit code (see note #1 below). The remainder
  is just a call to the trivially safe proc_pid_get_link().
* proc_pid_permission: calls ptrace_may_access(), which appears RCU
  safe, although it does call into the security_ptrace_access_check()
  hook, which looks safe under smack and selinux. Just the audit code
  is of concern. Also uses get_task_struct() and put_task_struct(),
  see note #2 below.
* proc_tid_comm_permission: Appears safe, though calls
  put_task_struct() (see note #2 below).

Note #1: Most of the concern about RCU safety has centered around the
audit code. However, since b17ec22fb339 ("selinux: slow_avc_audit has
become non-blocking"), it's safe to call this code under RCU. So all
of the above are safe by my estimation.

Note #2: get_task_struct() and put_task_struct(): The majority of
get_task_struct() is under RCU read lock, and in any case it is a
simple increment. But put_task_struct() is complex, given that it
could at some point free the task struct, and this process has many
steps which I couldn't manually verify. However, several other places
call put_task_struct() under RCU, so it appears safe to use here too
(see kernel/hung_task.c:165 or rcu/tree-stall.h:296).

Patch description:

pid_revalidate() drops from RCU into REF lookup mode. When many
threads are resolving paths within /proc in parallel, this can result
in heavy spinlock contention on d_lockref as each thread tries to grab
a reference to the /proc dentry (and drop it shortly thereafter).

Investigation indicates that it is not necessary to drop RCU in
pid_revalidate(), as no RCU data is modified and the function never
sleeps. So, remove the LOOKUP_RCU check.

Link: https://lkml.kernel.org/r/20211004175629.292270-2-stephen.s.brennan@oracle.…
Signed-off-by: Stephen Brennan <stephen.s.brennan(a)oracle.com>
Cc: Konrad Wilk <konrad.wilk(a)oracle.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Li Nan <linan122(a)huawei.com>
---
 fs/proc/base.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 22c65289128e..24c70ff923b8 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2101,19 +2101,21 @@ static int pid_revalidate(struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	struct task_struct *task;
+	int ret = 0;
 
-	if (flags & LOOKUP_RCU)
-		return -ECHILD;
-
-	inode = d_inode(dentry);
-	task = get_proc_task(inode);
+	rcu_read_lock();
+	inode = d_inode_rcu(dentry);
+	if (!inode)
+		goto out;
+	task = pid_task(proc_pid(inode), PIDTYPE_PID);
 
 	if (task) {
 		pid_update_inode(task, inode);
-		put_task_struct(task);
-		return 1;
+		ret = 1;
 	}
-	return 0;
+out:
+	rcu_read_unlock();
+	return ret;
 }
 
 static inline bool proc_inode_is_dead(struct inode *inode)
-- 
2.39.2
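[Editor's note: for contrast with the "bails out" entries in the analysis above, the canonical pre-patch pattern -- the exact lines this patch removes from pid_revalidate() -- looks like this.]

/* A d_revalidate that cannot run under RCU must punt rcu-walk back to
 * ref-walk by returning -ECHILD; the VFS then retries with references
 * held. The patched pid_revalidate() no longer needs this. */
static int d_revalidate_bail_example(struct dentry *dentry, unsigned int flags)
{
	if (flags & LOOKUP_RCU)
		return -ECHILD;
	/* ... work that may sleep or take references ... */
	return 1;
}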
[PATCH openEuler-22.03-LTS-SP2 v2 0/2] add two bpf helper functions for redissockmap
by Liu Jian 13 Jun '23
v1->v2:
 1) add error handling in localip_init()
 2) add static attribute to localip_init() and localip_cleanup()

Add two bpf helper functions for redissockmap.

Liu Jian (2):
  net: add bpf_is_local_ipaddr bpf helper function
  net: let sockops use bpf_get_current_comm()

 arch/arm64/configs/openeuler_defconfig |   1 +
 arch/x86/configs/openeuler_defconfig   |   1 +
 drivers/net/Kconfig                    |   8 ++
 drivers/net/Makefile                   |   1 +
 drivers/net/localip/Makefile           |   8 ++
 drivers/net/localip/localip.c          | 149 +++++++++++++++++++++++++
 include/uapi/linux/bpf.h               |   7 ++
 net/core/filter.c                      |  22 ++++
 tools/include/uapi/linux/bpf.h         |   7 ++
 9 files changed, 204 insertions(+)
 create mode 100644 drivers/net/localip/Makefile
 create mode 100644 drivers/net/localip/localip.c

-- 
2.34.1
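[Editor's note: a rough usage sketch only. bpf_get_current_comm() is the standard helper; the prototype and helper ID used for bpf_is_local_ipaddr() below are assumptions based on the cover letter, not a documented interface.]

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Assumed prototype/ID for the out-of-tree helper from patch 1/2;
 * the real helper ID in the openEuler tree will differ. */
static long (*bpf_is_local_ipaddr)(__u32 ipaddr) = (void *)211;

SEC("sockops")
int redis_sockops(struct bpf_sock_ops *skops)
{
	char comm[16];

	/* Allowed for sockops programs by patch 2/2. */
	if (bpf_get_current_comm(comm, sizeof(comm)))
		return 0;

	/* Hypothetical use: local traffic could be short-circuited via
	 * a sockmap when the peer address belongs to this host. */
	if (bpf_is_local_ipaddr(skops->remote_ip4))
		skops->reply = 1;

	return 0;
}

char _license[] SEC("license") = "GPL";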
[PATCH OLK-5.10 v2 0/2] Add vhost net polling to suppress kick ratio
by Xu Kuohai 13 Jun '23
During high-frequency packet transmission, if vhost notifies the virtio
driver immediately after finding the virtio_queue empty, the virtio
driver will perform a kick after sending the next packet. However, if
vhost waits a little longer, it can pick up the next packet sent by the
virtio driver without virtio performing a kick. This series optimizes
for that case: if the TX interval recently recorded by vhost-net is
within 50us, transmission is considered high frequency, and after
detecting that the virtio_queue is empty, vhost-net waits for new
packets to arrive before notifying the virtio driver.

Xu Kuohai (2):
  vhost_net: Suppress kick ratio when high frequency TX detected
  vhost_net: Enable vhost net polling for openeuler arm64 and x86

 arch/arm64/configs/openeuler_defconfig |  1 +
 arch/x86/configs/openeuler_defconfig   |  1 +
 drivers/vhost/Kconfig                  |  9 ++++++
 drivers/vhost/net.c                    | 45 ++++++++++++++++++++++++++
 4 files changed, 56 insertions(+)

-- 
2.30.2
[OLK-5.10 v2] irqchip/gicv3-its: Add workaround for hip09 ITS erratum 162100801
by chenxiang 12 Jun '23
From: Xiang Chen <chenxiang66(a)hisilicon.com>

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7C103

------------------------------------------------

When GICv4.1 is enabled in hip09, invalid vPE configurations can appear
in the configuration table in some situations, which causes some vSGI
interrupts to be lost. To fix the issue, send a vinvall command after
each vmovp.

Signed-off-by: Nianyao Tang <tangnianyao(a)huawei.com>
Signed-off-by: Xiang Chen <chenxiang66(a)hisilicon.com>
Signed-off-by: caijian <caijian11(a)h-partners.com>
---
 Documentation/arm64/silicon-errata.rst |  2 ++
 arch/arm64/Kconfig                     | 11 +++++++
 arch/arm64/configs/openeuler_defconfig |  1 +
 drivers/irqchip/irq-gic-v3-its.c       | 40 ++++++++++++++++++++------
 4 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index b72fbbfe3fcb..2305def38396 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -149,6 +149,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | Hisilicon      | TSV{110,200}    | #1980005        | HISILICON_ERRATUM_1980005   |
 +----------------+-----------------+-----------------+-----------------------------+
+| Hisilicon      | Hip09           | #162100801      | HISILICON_ERRATUM_162100801 |
++----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
 | Qualcomm Tech. | Kryo/Falkor v1  | E1003           | QCOM_FALKOR_ERRATUM_1003    |
 +----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3dcf96f37a3c..766356f04abc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -802,6 +802,17 @@ config HISILICON_ERRATUM_1980005
 
 	  If unsure, say N.
 
+config HISILICON_ERRATUM_162100801
+	bool "Hip09 162100801 erratum support"
+	default y
+	help
+	  When GICv4.1 is enabled in hip09, invalid vPE configurations in
+	  the configuration tables can cause vSGI interrupts to be lost in
+	  some situations. Fix it by sending vinvall commands after vmovp.
+
+	  If unsure, say Y.
+
+
 config QCOM_FALKOR_ERRATUM_1003
 	bool "Falkor E1003: Incorrect translation due to ASID change"
 	default y
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 8261f11b54fd..1b633c835279 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -390,6 +390,7 @@ CONFIG_QCOM_FALKOR_ERRATUM_E1041=y
 CONFIG_SOCIONEXT_SYNQUACER_PREITS=y
 CONFIG_HISILICON_ERRATUM_HIP08_RU_PREFETCH=y
 # CONFIG_HISILICON_HIP08_RU_PREFETCH_DEFAULT_OFF is not set
+CONFIG_HISILICON_ERRATUM_162100801=y
 # end of ARM errata workarounds via the alternatives framework
 
 CONFIG_ARM64_4K_PAGES=y
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 99fcc5601ea4..db8219d65e7c 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -46,6 +46,7 @@
 #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING		(1ULL << 0)
 #define ITS_FLAGS_WORKAROUND_CAVIUM_22375	(1ULL << 1)
 #define ITS_FLAGS_WORKAROUND_CAVIUM_23144	(1ULL << 2)
+#define ITS_FLAGS_WORKAROUND_HISILICON_162100801	(1ULL << 3)
 
 #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING	(1 << 0)
 #define RDIST_FLAGS_RD_TABLES_PREALLOCATED	(1 << 1)
@@ -1294,6 +1295,14 @@ static void its_send_vmapp(struct its_node *its,
 	its_send_single_vcommand(its, its_build_vmapp_cmd, &desc);
 }
 
+static void its_send_vinvall(struct its_node *its, struct its_vpe *vpe)
+{
+	struct its_cmd_desc desc;
+
+	desc.its_vinvall_cmd.vpe = vpe;
+	its_send_single_vcommand(its, its_build_vinvall_cmd, &desc);
+}
+
 static void its_send_vmovp(struct its_vpe *vpe)
 {
 	struct its_cmd_desc desc = {};
@@ -1307,6 +1316,10 @@ static void its_send_vmovp(struct its_vpe *vpe)
 		its = list_first_entry(&its_nodes, struct its_node, entry);
 		desc.its_vmovp_cmd.col = &its->collections[col_id];
 		its_send_single_vcommand(its, its_build_vmovp_cmd, &desc);
+		if (is_v4_1(its) && (its->flags &
+		    ITS_FLAGS_WORKAROUND_HISILICON_162100801))
+			its_send_vinvall(its, vpe);
+
 		return;
 	}
@@ -1333,19 +1346,14 @@ static void its_send_vmovp(struct its_vpe *vpe)
 
 		desc.its_vmovp_cmd.col = &its->collections[col_id];
 		its_send_single_vcommand(its, its_build_vmovp_cmd, &desc);
+		if (is_v4_1(its) && (its->flags &
+		    ITS_FLAGS_WORKAROUND_HISILICON_162100801))
+			its_send_vinvall(its, vpe);
 	}
 
 	raw_spin_unlock_irqrestore(&vmovp_lock, flags);
 }
 
-static void its_send_vinvall(struct its_node *its, struct its_vpe *vpe)
-{
-	struct its_cmd_desc desc;
-
-	desc.its_vinvall_cmd.vpe = vpe;
-	its_send_single_vcommand(its, its_build_vinvall_cmd, &desc);
-}
-
 static void its_send_vinv(struct its_device *dev, u32 event_id)
 {
 	struct its_cmd_desc desc;
@@ -4963,6 +4971,14 @@ static bool __maybe_unused its_enable_quirk_hip07_161600802(void *data)
 	return true;
 }
 
+static bool __maybe_unused its_enable_quirk_hip09_162100801(void *data)
+{
+	struct its_node *its = data;
+
+	its->flags |= ITS_FLAGS_WORKAROUND_HISILICON_162100801;
+	return true;
+}
+
 static const struct gic_quirk its_quirks[] = {
 #ifdef CONFIG_CAVIUM_ERRATUM_22375
 	{
@@ -5008,6 +5024,14 @@ static const struct gic_quirk its_quirks[] = {
 		.mask	= 0xffffffff,
 		.init	= its_enable_quirk_hip07_161600802,
 	},
+#endif
+#ifdef CONFIG_HISILICON_ERRATUM_162100801
+	{
+		.desc	= "ITS: Hip09 erratum 162100801",
+		.iidr	= 0x00051736,
+		.mask	= 0xffffffff,
+		.init	= its_enable_quirk_hip09_162100801,
+	},
 #endif
 	{
 	}
-- 
2.30.0
[PATCH openEuler-22.03-LTS-SP2 0/2] add two bpf helper functions for redissockmap
by Liu Jian 12 Jun '23
Add two bpf helper functions for redissockmap.

Liu Jian (2):
  net: add bpf_is_local_ipaddr bpf helper function
  net: let sockops use bpf_get_current_comm()

 arch/arm64/configs/openeuler_defconfig |   1 +
 arch/x86/configs/openeuler_defconfig   |   1 +
 drivers/net/Kconfig                    |   8 ++
 drivers/net/Makefile                   |   1 +
 drivers/net/localip/Makefile           |   8 ++
 drivers/net/localip/localip.c          | 146 +++++++++++++++++++++++++
 include/uapi/linux/bpf.h               |   7 ++
 net/core/filter.c                      |  22 ++++
 tools/include/uapi/linux/bpf.h         |   7 ++
 9 files changed, 201 insertions(+)
 create mode 100644 drivers/net/localip/Makefile
 create mode 100644 drivers/net/localip/localip.c

-- 
2.34.1
[PATCH openEuler-22.03-LTS-SP2] irqchip: gic-v3: Collection table supports multiple pages
by wangwudi 12 Jun '23
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7CX6S
CVE: NA

--------------------------------------------------------------------------

Only one page is allocated for the collection table. Recalculate the
number of pages for the collection table based on the number of CPUs.

Signed-off-by: wangwudi <wangwudi(a)hisilicon.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 99fcc5601ea4..fa45fd7ed173 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2665,6 +2665,10 @@ static int its_alloc_tables(struct its_node *its)
 			indirect = its_parse_indirect_baser(its, baser, &order,
 							    ITS_MAX_VPEID_BITS);
 			break;
+		case GITS_BASER_TYPE_COLLECTION:
+			indirect = its_parse_indirect_baser(its, baser, &order,
+							    order_base_2(num_possible_cpus()));
+			break;
 		}
 
 		err = its_setup_baser(its, baser, cache, shr, order, indirect);
-- 
2.31.0
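[Editor's note: to see why one fixed page can be too small, a stand-alone sketch of the sizing with hypothetical numbers; the 16-byte collection entry size and the CPU count are examples, since real GITS_BASER entry sizes are reported by the hardware.]

#include <stdio.h>

/* Minimal order_base_2() stand-in for illustration. */
static unsigned int order_base_2(unsigned int n)
{
	unsigned int bits = 0;

	while ((1u << bits) < n)
		bits++;
	return bits;
}

int main(void)
{
	unsigned int cpus = 512;	/* hypothetical large machine */
	unsigned int id_bits = order_base_2(cpus);	/* 9 bits -> 512 IDs */
	unsigned long entry_size = 16;	/* example entry size in bytes */
	unsigned long bytes = (1ul << id_bits) * entry_size;

	/* 512 collections * 16 bytes = 8192 bytes = 2 pages of 4KiB; a
	 * single page would only cover the first 256 CPUs. Passing the
	 * CPU-derived ID width lets its_parse_indirect_baser() size the
	 * table (or go indirect) instead of capping it at one page. */
	printf("%u cpus -> %lu bytes (%lu pages)\n",
	       cpus, bytes, (bytes + 4095) / 4096);
	return 0;
}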

Powered by HyperKitty