From: Toke Høiland-Jørgensen <toke@redhat.com>
stable inclusion
from stable-v5.10.214
commit 225da02acdc97af01b6bc6ce1a3e5362bf01d3fb
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9HK1R
CVE: CVE-2024-26885
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
[ Upstream commit 281d464a34f540de166cee74b723e97ac2515ec3 ]
The devmap code allocates a number of hash buckets equal to the next power
of two of the max_entries value provided when creating the map. When
rounding up to the next power of two, the 32-bit variable storing the
number of buckets can overflow, and the code checks for overflow by
checking if the truncated 32-bit value is equal to 0. However, on 32-bit
arches the rounding up itself can overflow mid-way through, because it
ends up doing a left-shift of 32 bits on an unsigned long value. If the
size of an unsigned long is four bytes, this is undefined behaviour, so
there is no guarantee that we'll end up with a nice and tidy 0-value at
the end.
Syzbot managed to turn this into a crash on arm32 by creating a
DEVMAP_HASH with max_entries > 0x80000000 and then trying to update it.
Fix this by moving the overflow check to before the rounding up
operation.
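As an illustration, here is a minimal userspace C sketch of the failure
mode; round_up_pow2() is a hypothetical stand-in for the kernel's
roundup_pow_of_two(), not the actual implementation:

  #include <stdio.h>

  static unsigned long round_up_pow2(unsigned long n)
  {
          int bits = 0;

          n--;
          while (n) {             /* count significant bits, like fls_long() */
                  n >>= 1;
                  bits++;
          }
          return 1UL << bits;     /* shift by 32 is UB if long is 32-bit */
  }

  int main(void)
  {
          unsigned long max_entries = 0x80000001UL;

          /* On a 64-bit arch this yields 0x100000000, which truncates to 0
           * in a u32 bucket count, so the old zero check fired. On a 32-bit
           * arch the shift itself is undefined behaviour, so checking the
           * result is unsafe; hence the fix rejects max_entries > 1UL << 31
           * before rounding up.
           */
          printf("rounded: %#lx\n", round_up_pow2(max_entries));
          return 0;
  }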
Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
Link: https://lore.kernel.org/r/000000000000ed666a0611af6818@google.com
Reported-and-tested-by: syzbot+8cd36f6b65f3cafd400a@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Message-ID: <20240307120340.99577-2-toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Conflicts:
kernel/bpf/devmap.c
Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
kernel/bpf/devmap.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 01149821ded9..7eb1282edc8e 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -131,10 +131,13 @@ static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
bpf_map_init_from_attr(&dtab->map, attr);
if (attr->map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
- dtab->n_buckets = roundup_pow_of_two(dtab->map.max_entries);
-
- if (!dtab->n_buckets) /* Overflow check */
+ /* hash table size must be power of 2; roundup_pow_of_two() can
+ * overflow into UB on 32-bit arches, so check that first
+ */
+ if (dtab->map.max_entries > 1UL << 31)
return -EINVAL;
+
+ dtab->n_buckets = roundup_pow_of_two(dtab->map.max_entries);
cost += (u64) sizeof(struct hlist_head) * dtab->n_buckets;
} else {
cost += (u64) dtab->map.max_entries * sizeof(struct bpf_dtab_netdev *);
--
2.17.1
hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I9N68A
CVE: NA
--------------------------------
After commit df0ad0579ac9 ("mm: align larger anonymous mappings on THP
boundaries"), memory read latency increased significantly on arm64 but not
on x86_64, so disable THP alignment on arm64 for now; it will be
re-enabled once the regression is fixed.
Signed-off-by: Ze Zuo <zuoze1@huawei.com>
---
mm/mmap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 6cd5d3f607e5..77e66c5a1f46 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1848,7 +1848,8 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
* so use shmem's get_unmapped_area in case it can be huge.
*/
get_area = shmem_get_unmapped_area;
- } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+ } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+ !IS_ENABLED(CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT)) {
/* Ensures that larger anonymous mappings are THP aligned. */
get_area = thp_get_unmapped_area;
}
--
2.33.0
From: Jianxing Wang <wangjianxing@loongson.cn>
mainline inclusion
from mainline-v5.19-rc1
commit b191c9bc334a936775843867485c207e23b30e1b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9N6YV
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
Freeing a large list of pages can cause rcu_sched starvation on
non-preemptible kernels. However, free_unref_page_list() cannot call
cond_resched(), as it may be invoked from interrupt or atomic context;
in particular, atomic context cannot be detected when CONFIG_PREEMPTION=n.
The issue was detected in a guest with KVM CPU 200% overcommit; the same
warning did not appear on the host running the same application. The
patch is clearly needed for the guest kernel, though it is not certain
whether the host needs it too.
To reproduce: set up two virtual machines on one host, each with the same
number of CPUs as the host and half of its memory, then run ltpstress.sh
in each VM; an rcu stall warning will appear. The kernel must have
preemption disabled; append 'preempt=none' to the kernel command line if
dynamic preemption is enabled. The issue was reproduced on a Loongson
machine (32 cores, 128G memory) and a ProLiant DL380 Gen9 (x86 E5-2680,
28 cores, 64G memory).
The TLB flush batch count depends on PAGE_SIZE and becomes too large when
PAGE_SIZE > 4K, so limit the free batch count to 512 and add a scheduling
point in tlb_batch_pages_flush().
rcu: rcu_sched kthread starved for 5359 jiffies! g454793 f0x0
RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=19
[...]
Call Trace:
free_unref_page_list+0x19c/0x270
release_pages+0x3cc/0x498
tlb_flush_mmu_free+0x44/0x70
zap_pte_range+0x450/0x738
unmap_page_range+0x108/0x240
unmap_vmas+0x74/0xf0
unmap_region+0xb0/0x120
do_munmap+0x264/0x438
vm_munmap+0x58/0xa0
sys_munmap+0x10/0x20
syscall_common+0x24/0x38
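As a back-of-the-envelope illustration of the batch sizing, here is a
hedged userspace C sketch (it ignores the mmu_gather_batch header, so the
numbers are approximate): one batch page holds roughly
PAGE_SIZE / sizeof(void *) page pointers, and on a 64-bit machine with 4K
pages that is about 512, which matches the chosen cap.

  #include <stdio.h>

  int main(void)
  {
          unsigned long page_sizes[] = { 4096, 16384, 65536 };

          for (int i = 0; i < 3; i++) {
                  /* approximate page pointers held per batch page */
                  unsigned long per_batch = page_sizes[i] / sizeof(void *);

                  printf("PAGE_SIZE %6lu -> ~%lu pages freed per batch\n",
                         page_sizes[i], per_batch);
          }
          return 0;
  }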
Link: https://lkml.kernel.org/r/20220317072857.2635262-1-wangjianxing@loongson.cn
Signed-off-by: Jianxing Wang <wangjianxing@loongson.cn>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
mm/mmu_gather.c
[Context conflicts]
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
mm/mmu_gather.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index a44cf211ffee..2b3f6967176f 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -71,8 +71,20 @@ void tlb_flush_mmu_free(struct mmu_gather *tlb)
tlb_table_flush(tlb);
#endif
for (batch = &tlb->local; batch && batch->nr; batch = batch->next) {
- free_pages_and_swap_cache(batch->pages, batch->nr);
- batch->nr = 0;
+ struct page **pages = batch->pages;
+
+ do {
+ /*
+ * limit free batch count when PAGE_SIZE > 4K
+ */
+ unsigned int nr = min(512U, batch->nr);
+
+ free_pages_and_swap_cache(pages, nr);
+ pages += nr;
+ batch->nr -= nr;
+
+ cond_resched();
+ } while (batch->nr);
}
tlb->active = &tlb->local;
}
--
2.25.1