From: Peng Liu <liupeng256@huawei.com>
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V388
CVE: NA
--------------------------------
KFENCE is designed to be enabled in production kernels, but it is also useful in some debugging situations. On machines with limited memory and CPU resources, KASAN is hard to run, and KFENCE is a suitable alternative. For KFENCE running on a single machine, the probability of discovering an existing bug increases with the number of KFENCE objects, but so does the memory cost. To balance bug-detection capability against memory cost, the number of KFENCE objects needs to be tunable for an already-compiled kernel image. Adding a module parameter to adjust the number of KFENCE objects lets the same kernel image be used on machines with different resources.
In short, the following reasons motivate adding this parameter:
1) In some debugging situations, it makes KFENCE more flexible.
2) For production machines with different memory and CPU sizes, it reduces the burden of maintaining a separate kernel image per configuration.
The main change is replacing the CONFIG_KFENCE_NUM_OBJECTS constant with the kfence_num_objects variable so that the object count can be configured dynamically. To support this, kfence_metadata and alloc_covered are allocated with memblock_alloc(). Since "cat /sys/kernel/debug/kfence/objects" reads kfence_metadata, the debugfs initialization must check that kfence_metadata was allocated successfully.
Unfortunately, dynamic allocation requires the KFENCE pool size to be a variable rather than a compile-time constant, which adds instructions (e.g. a load) to the fast path of memory allocation and therefore degrades performance. To avoid this cost on production machines, an ugly macro is used to isolate the changes behind a config option.
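To make the fast-path cost concrete, here is a rough sketch (not the exact code in this patch) of why the pool bounds check executed on every allocation and free gains an extra memory load once the pool size stops being a compile-time constant; the helper has roughly the shape of the mainline is_kfence_address():

#ifdef CONFIG_KFENCE_DYNAMIC_OBJECTS
/* The pool size depends on a runtime variable, so the comparison below
 * must first load kfence_num_objects from memory. */
#define KFENCE_POOL_SIZE ((kfence_num_objects + 1) * 2 * PAGE_SIZE)
#else
/* The pool size folds into an immediate operand at compile time. */
#define KFENCE_POOL_SIZE ((CONFIG_KFENCE_NUM_OBJECTS + 1) * 2 * PAGE_SIZE)
#endif

static __always_inline bool is_kfence_address(const void *addr)
{
        return unlikely((unsigned long)((char *)addr - __kfence_pool) <
                        KFENCE_POOL_SIZE && __kfence_pool);
}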
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 Documentation/dev-tools/kfence.rst |   8 +-
 include/linux/kfence.h             |   9 +-
 lib/Kconfig.kfence                 |  10 +++
 mm/kfence/core.c                   | 138 ++++++++++++++++++++++++++---
 mm/kfence/kfence.h                 |   6 +-
 mm/kfence/kfence_test.c            |   2 +-
 6 files changed, 155 insertions(+), 18 deletions(-)
diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst index ac6b89d1a8c3..5d194615aed0 100644 --- a/Documentation/dev-tools/kfence.rst +++ b/Documentation/dev-tools/kfence.rst @@ -41,13 +41,19 @@ guarded by KFENCE. The default is configurable via the Kconfig option ``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0`` disables KFENCE.
-The KFENCE memory pool is of fixed size, and if the pool is exhausted, no +If ``CONFIG_KFENCE_DYNAMIC_OBJECTS`` is disabled, +the KFENCE memory pool is of fixed size, and if the pool is exhausted, no further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default 255), the number of available guarded objects can be controlled. Each object requires 2 pages, one for the object itself and the other one used as a guard page; object pages are interleaved with guard pages, and every object page is therefore surrounded by two guard pages.
+If ``CONFIG_KFENCE_DYNAMIC_OBJECTS`` is enabled, +the KFENCE memory pool size could be set via the kernel boot parameter +``kfence.num_objects``. Note, the performance will degrade due to additional +instructions(eg, load) added to the fast path of the memory allocation. + The total memory dedicated to the KFENCE memory pool can be computed as::
( #objects + 1 ) * 2 * PAGE_SIZE diff --git a/include/linux/kfence.h b/include/linux/kfence.h index 4b5e3679a72c..3ea58c70d9c7 100644 --- a/include/linux/kfence.h +++ b/include/linux/kfence.h @@ -17,12 +17,19 @@ #include <linux/atomic.h> #include <linux/static_key.h>
+#ifdef CONFIG_KFENCE_DYNAMIC_OBJECTS +extern unsigned long kfence_num_objects; +#define KFENCE_NR_OBJECTS kfence_num_objects +#else +#define KFENCE_NR_OBJECTS CONFIG_KFENCE_NUM_OBJECTS +#endif + /* * We allocate an even number of pages, as it simplifies calculations to map * address to metadata indices; effectively, the very first page serves as an * extended guard page, but otherwise has no special purpose. */ -#define KFENCE_POOL_SIZE ((CONFIG_KFENCE_NUM_OBJECTS + 1) * 2 * PAGE_SIZE) +#define KFENCE_POOL_SIZE ((KFENCE_NR_OBJECTS + 1) * 2 * PAGE_SIZE) extern char *__kfence_pool;
DECLARE_STATIC_KEY_FALSE(kfence_allocation_key); diff --git a/lib/Kconfig.kfence b/lib/Kconfig.kfence index 912f252a41fc..f7adceb4a4cf 100644 --- a/lib/Kconfig.kfence +++ b/lib/Kconfig.kfence @@ -45,6 +45,16 @@ config KFENCE_NUM_OBJECTS pages are required; with one containing the object and two adjacent ones used as guard pages.
+config KFENCE_DYNAMIC_OBJECTS + bool "Support dynamic configuration of the number of guarded objects" + default n + help + Enable dynamic configuration of the number of KFENCE guarded objects. + If this config is enabled, the number of KFENCE guarded objects could + be overridden via boot parameter "kfence.num_objects". Note that the + performance will degrade due to additional instructions(eg, load) + added to the fast path of the memory allocation. + config KFENCE_STATIC_KEYS bool "Use static keys to set up allocations" if EXPERT depends on JUMP_LABEL diff --git a/mm/kfence/core.c b/mm/kfence/core.c index a19154a8d196..0249af5f8244 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -93,13 +93,73 @@ module_param_named(skip_covered_thresh, kfence_skip_covered_thresh, ulong, 0644) char *__kfence_pool __ro_after_init; EXPORT_SYMBOL(__kfence_pool); /* Export for test modules. */
+#ifdef CONFIG_KFENCE_DYNAMIC_OBJECTS +/* + * The number of kfence objects will affect performance and bug detection + * accuracy. The initial value of this global parameter is determined by + * compiling settings. + */ +unsigned long kfence_num_objects = CONFIG_KFENCE_NUM_OBJECTS; +EXPORT_SYMBOL(kfence_num_objects); /* Export for test modules. */ + +#define MIN_KFENCE_OBJECTS 1 +#define MAX_KFENCE_OBJECTS 65535 + +static int param_set_num_objects(const char *val, const struct kernel_param *kp) +{ + unsigned long num; + + if (system_state != SYSTEM_BOOTING) + return -EINVAL; /* Cannot adjust KFENCE objects number on-the-fly. */ + + if (kstrtoul(val, 0, &num) < 0) + return -EINVAL; + + if (num < MIN_KFENCE_OBJECTS || num > MAX_KFENCE_OBJECTS) { + pr_warn("kfence_num_objects = %lu is not in valid range [%d, %d]\n", + num, MIN_KFENCE_OBJECTS, MAX_KFENCE_OBJECTS); + return -EINVAL; + } + + *((unsigned long *)kp->arg) = num; + return 0; +} + +static int param_get_num_objects(char *buffer, const struct kernel_param *kp) +{ + if (!READ_ONCE(kfence_enabled)) + return sprintf(buffer, "0\n"); + + return param_get_ulong(buffer, kp); +} + +static const struct kernel_param_ops num_objects_param_ops = { + .set = param_set_num_objects, + .get = param_get_num_objects, +}; +module_param_cb(num_objects, &num_objects_param_ops, &kfence_num_objects, 0600); +#endif + /* * Per-object metadata, with one-to-one mapping of object metadata to * backing pages (in __kfence_pool). */ +#ifdef CONFIG_KFENCE_DYNAMIC_OBJECTS +struct kfence_metadata *kfence_metadata; +static phys_addr_t metadata_size; + +static inline bool kfence_metadata_valid(void) +{ + return !!kfence_metadata; +} + +#else static_assert(CONFIG_KFENCE_NUM_OBJECTS > 0); struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS];
+static inline bool kfence_metadata_valid(void) { return true; } +#endif + /* Freelist with available objects. */ static struct list_head kfence_freelist = LIST_HEAD_INIT(kfence_freelist); static DEFINE_RAW_SPINLOCK(kfence_freelist_lock); /* Lock protecting freelist. */ @@ -124,11 +184,16 @@ atomic_t kfence_allocation_gate = ATOMIC_INIT(1); * P(alloc_traces) = (1 - e^(-HNUM * (alloc_traces / SIZE)) ^ HNUM */ #define ALLOC_COVERED_HNUM 2 -#define ALLOC_COVERED_ORDER (const_ilog2(CONFIG_KFENCE_NUM_OBJECTS) + 2) +#define ALLOC_COVERED_ORDER (const_ilog2(KFENCE_NR_OBJECTS) + 2) #define ALLOC_COVERED_SIZE (1 << ALLOC_COVERED_ORDER) #define ALLOC_COVERED_HNEXT(h) hash_32(h, ALLOC_COVERED_ORDER) #define ALLOC_COVERED_MASK (ALLOC_COVERED_SIZE - 1) +#ifdef CONFIG_KFENCE_DYNAMIC_OBJECTS +static atomic_t *alloc_covered; +static phys_addr_t covered_size; +#else static atomic_t alloc_covered[ALLOC_COVERED_SIZE]; +#endif
/* Stack depth used to determine uniqueness of an allocation. */ #define UNIQUE_ALLOC_STACK_DEPTH ((size_t)8) @@ -168,7 +233,7 @@ static_assert(ARRAY_SIZE(counter_names) == KFENCE_COUNTER_COUNT);
static inline bool should_skip_covered(void) { - unsigned long thresh = (CONFIG_KFENCE_NUM_OBJECTS * kfence_skip_covered_thresh) / 100; + unsigned long thresh = (KFENCE_NR_OBJECTS * kfence_skip_covered_thresh) / 100;
return atomic_long_read(&counters[KFENCE_COUNTER_ALLOCATED]) > thresh; } @@ -236,7 +301,7 @@ static inline struct kfence_metadata *addr_to_metadata(unsigned long addr) * error. */ index = (addr - (unsigned long)__kfence_pool) / (PAGE_SIZE * 2) - 1; - if (index < 0 || index >= CONFIG_KFENCE_NUM_OBJECTS) + if (index < 0 || index >= KFENCE_NR_OBJECTS) return NULL;
return &kfence_metadata[index]; @@ -251,7 +316,7 @@ static inline unsigned long metadata_to_pageaddr(const struct kfence_metadata *m
/* Only call with a pointer into kfence_metadata. */ if (KFENCE_WARN_ON(meta < kfence_metadata || - meta >= kfence_metadata + CONFIG_KFENCE_NUM_OBJECTS)) + meta >= kfence_metadata + KFENCE_NR_OBJECTS)) return 0;
/* @@ -576,7 +641,7 @@ static bool __init kfence_init_pool(void) addr += PAGE_SIZE; }
- for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) { + for (i = 0; i < KFENCE_NR_OBJECTS; i++) { struct kfence_metadata *meta = &kfence_metadata[i];
/* Initialize metadata. */ @@ -637,7 +702,7 @@ DEFINE_SHOW_ATTRIBUTE(stats); */ static void *start_object(struct seq_file *seq, loff_t *pos) { - if (*pos < CONFIG_KFENCE_NUM_OBJECTS) + if (*pos < KFENCE_NR_OBJECTS) return (void *)((long)*pos + 1); return NULL; } @@ -649,7 +714,7 @@ static void stop_object(struct seq_file *seq, void *v) static void *next_object(struct seq_file *seq, void *v, loff_t *pos) { ++*pos; - if (*pos < CONFIG_KFENCE_NUM_OBJECTS) + if (*pos < KFENCE_NR_OBJECTS) return (void *)((long)*pos + 1); return NULL; } @@ -691,7 +756,11 @@ static int __init kfence_debugfs_init(void) struct dentry *kfence_dir = debugfs_create_dir("kfence", NULL);
debugfs_create_file("stats", 0444, kfence_dir, NULL, &stats_fops); - debugfs_create_file("objects", 0400, kfence_dir, NULL, &objects_fops); + + /* Variable kfence_metadata may fail to allocate. */ + if (kfence_metadata_valid()) + debugfs_create_file("objects", 0400, kfence_dir, NULL, &objects_fops); + return 0; }
@@ -751,6 +820,40 @@ static void toggle_allocation_gate(struct work_struct *work) } static DECLARE_DELAYED_WORK(kfence_timer, toggle_allocation_gate);
+#ifdef CONFIG_KFENCE_DYNAMIC_OBJECTS +static int __init kfence_dynamic_init(void) +{ + metadata_size = sizeof(struct kfence_metadata) * KFENCE_NR_OBJECTS; + kfence_metadata = memblock_alloc(metadata_size, PAGE_SIZE); + if (!kfence_metadata) { + pr_err("failed to allocate metadata\n"); + return -ENOMEM; + } + + covered_size = sizeof(atomic_t) * KFENCE_NR_OBJECTS; + alloc_covered = memblock_alloc(covered_size, PAGE_SIZE); + if (!alloc_covered) { + memblock_free((phys_addr_t)kfence_metadata, metadata_size); + kfence_metadata = NULL; + pr_err("failed to allocate covered\n"); + return -ENOMEM; + } + + return 0; +} + +static void __init kfence_dynamic_destroy(void) +{ + memblock_free((phys_addr_t)alloc_covered, covered_size); + alloc_covered = NULL; + memblock_free((phys_addr_t)kfence_metadata, metadata_size); + kfence_metadata = NULL; +} +#else +static int __init kfence_dynamic_init(void) { return 0; } +static void __init kfence_dynamic_destroy(void) { } +#endif + /* === Public interface ===================================================== */
void __init kfence_alloc_pool(void) @@ -758,10 +861,14 @@ void __init kfence_alloc_pool(void) if (!kfence_sample_interval) return;
- __kfence_pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE); + if (kfence_dynamic_init()) + return;
- if (!__kfence_pool) + __kfence_pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE); + if (!__kfence_pool) { pr_err("failed to allocate pool\n"); + kfence_dynamic_destroy(); + } }
void __init kfence_init(void) @@ -780,8 +887,8 @@ void __init kfence_init(void) static_branch_enable(&kfence_allocation_key); WRITE_ONCE(kfence_enabled, true); queue_delayed_work(system_unbound_wq, &kfence_timer, 0); - pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE, - CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool, + pr_info("initialized - using %lu bytes for %lu objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE, + (unsigned long)KFENCE_NR_OBJECTS, (void *)__kfence_pool, (void *)(__kfence_pool + KFENCE_POOL_SIZE)); }
@@ -791,7 +898,10 @@ void kfence_shutdown_cache(struct kmem_cache *s) struct kfence_metadata *meta; int i;
- for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) { + if (!kfence_metadata_valid()) + return; + + for (i = 0; i < KFENCE_NR_OBJECTS; i++) { bool in_use;
meta = &kfence_metadata[i]; @@ -830,7 +940,7 @@ void kfence_shutdown_cache(struct kmem_cache *s) } }
- for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) { + for (i = 0; i < KFENCE_NR_OBJECTS; i++) { meta = &kfence_metadata[i];
/* See above. */ diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h index 2a2d5de9d379..e5f8f8577911 100644 --- a/mm/kfence/kfence.h +++ b/mm/kfence/kfence.h @@ -91,7 +91,11 @@ struct kfence_metadata { u32 alloc_stack_hash; };
-extern struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS]; +#ifdef CONFIG_KFENCE_DYNAMIC_OBJECTS +extern struct kfence_metadata *kfence_metadata; +#else +extern struct kfence_metadata kfence_metadata[KFENCE_NR_OBJECTS]; +#endif
/* KFENCE error types for report generation. */ enum kfence_error_type { diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c index f1690cf54199..213a49e0c742 100644 --- a/mm/kfence/kfence_test.c +++ b/mm/kfence/kfence_test.c @@ -621,7 +621,7 @@ static void test_gfpzero(struct kunit *test) break; test_free(buf2);
- if (i == CONFIG_KFENCE_NUM_OBJECTS) { + if (i == KFENCE_NR_OBJECTS) { kunit_warn(test, "giving up ... cannot get same object back\n"); return; }
From: Peng Liu <liupeng256@huawei.com>
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V388
CVE: NA
--------------------------------
The parameter kfence_sample_interval can be set via a boot parameter or from the shell at runtime, which is convenient for automated tests and for KFENCE parameter tuning. However, the KFENCE test cases only use the compile-time CONFIG_KFENCE_SAMPLE_INTERVAL, so they may not run with the interval the user actually set. Export kfence_sample_interval so that the KFENCE test cases use the runtime sample interval.
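As a minimal sketch (assuming only the symbol exported by this patch, not the exact test code), a test module can now derive its timeouts from the runtime value instead of the Kconfig constant:

#include <linux/jiffies.h>
#include <linux/kfence.h>

/* Wait up to 100 sample periods for KFENCE to pick up an allocation. */
static unsigned long kfence_test_timeout(void)
{
        return jiffies + msecs_to_jiffies(100 * kfence_sample_interval);
}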
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 include/linux/kfence.h  | 2 ++
 mm/kfence/core.c        | 3 ++-
 mm/kfence/kfence_test.c | 8 ++++----
 3 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/include/linux/kfence.h b/include/linux/kfence.h index 3ea58c70d9c7..f77b0e4de937 100644 --- a/include/linux/kfence.h +++ b/include/linux/kfence.h @@ -17,6 +17,8 @@ #include <linux/atomic.h> #include <linux/static_key.h>
+extern unsigned long kfence_sample_interval; + #ifdef CONFIG_KFENCE_DYNAMIC_OBJECTS extern unsigned long kfence_num_objects; #define KFENCE_NR_OBJECTS kfence_num_objects diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 0249af5f8244..3a559f0f282d 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -47,7 +47,8 @@
static bool kfence_enabled __read_mostly;
-static unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL; +unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL; +EXPORT_SYMBOL_GPL(kfence_sample_interval); /* Export for test modules. */
#ifdef MODULE_PARAM_PREFIX #undef MODULE_PARAM_PREFIX diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c index 213a49e0c742..c9952fc8d596 100644 --- a/mm/kfence/kfence_test.c +++ b/mm/kfence/kfence_test.c @@ -263,13 +263,13 @@ static void *test_alloc(struct kunit *test, size_t size, gfp_t gfp, enum allocat * 100x the sample interval should be more than enough to ensure we get * a KFENCE allocation eventually. */ - timeout = jiffies + msecs_to_jiffies(100 * CONFIG_KFENCE_SAMPLE_INTERVAL); + timeout = jiffies + msecs_to_jiffies(100 * kfence_sample_interval); /* * Especially for non-preemption kernels, ensure the allocation-gate * timer can catch up: after @resched_after, every failed allocation * attempt yields, to ensure the allocation-gate timer is scheduled. */ - resched_after = jiffies + msecs_to_jiffies(CONFIG_KFENCE_SAMPLE_INTERVAL); + resched_after = jiffies + msecs_to_jiffies(kfence_sample_interval); do { if (test_cache) alloc = kmem_cache_alloc(test_cache, gfp); @@ -603,7 +603,7 @@ static void test_gfpzero(struct kunit *test) char *buf1, *buf2; int i;
- if (CONFIG_KFENCE_SAMPLE_INTERVAL > 100) { + if (kfence_sample_interval > 100) { kunit_warn(test, "skipping ... would take too long\n"); return; } @@ -737,7 +737,7 @@ static void test_memcache_alloc_bulk(struct kunit *test) * 100x the sample interval should be more than enough to ensure we get * a KFENCE allocation eventually. */ - timeout = jiffies + msecs_to_jiffies(100 * CONFIG_KFENCE_SAMPLE_INTERVAL); + timeout = jiffies + msecs_to_jiffies(100 * kfence_sample_interval); do { void *objects[100]; int i, num = kmem_cache_alloc_bulk(test_cache, GFP_ATOMIC, ARRAY_SIZE(objects),
From: Pablo Neira Ayuso <pablo@netfilter.org>
mainline inclusion
from mainline-v5.17-rc6
commit b1a5983f56e371046dcf164f90bfaf704d2b89f6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VNH7
CVE: CVE-2022-25636
--------------------------------
The immediate verdict expression needs to allocate one slot in the flow offload action array; however, an immediate data expression does not need to do so.
The fwd and dup expressions also need to allocate one slot each, and this is currently missing.
Add a new offload_action interface so that each expression can report whether it needs to allocate a slot in the flow offload action array.
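Conceptually, the action-array sizing loop in nft_flow_rule_create() then becomes the following (a condensed sketch of the change in the diff below):

num_actions = 0;
expr = nft_expr_first(rule);
while (nft_expr_more(rule, expr)) {
        /* Ask each expression whether it consumes a flow action slot. */
        if (expr->ops->offload_action && expr->ops->offload_action(expr))
                num_actions++;
        expr = nft_expr_next(expr);
}

With this, nft_immediate reports true only when its destination is the verdict register, while fwd and dup always report true.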
Fixes: be2861dc36d7 ("netfilter: nft_{fwd,dup}_netdev: add offload support")
Reported-and-tested-by: Nick Gregory <Nick.Gregory@Sophos.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Conflicts:
	net/netfilter/nft_fwd_netdev.c
	include/net/netfilter/nf_tables.h
Signed-off-by: Lu Wei <luwei32@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 include/net/netfilter/nf_tables.h         |  2 +-
 include/net/netfilter/nf_tables_offload.h |  2 --
 net/netfilter/nf_tables_offload.c         |  3 ++-
 net/netfilter/nft_dup_netdev.c            |  6 ++++++
 net/netfilter/nft_fwd_netdev.c            |  6 ++++++
 net/netfilter/nft_immediate.c             | 12 +++++++++++-
 6 files changed, 26 insertions(+), 5 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index ed4a9d098164..76bfb6cd5815 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -825,7 +825,7 @@ struct nft_expr_ops { int (*offload)(struct nft_offload_ctx *ctx, struct nft_flow_rule *flow, const struct nft_expr *expr); - u32 offload_flags; + bool (*offload_action)(const struct nft_expr *expr); const struct nft_expr_type *type; void *data; }; diff --git a/include/net/netfilter/nf_tables_offload.h b/include/net/netfilter/nf_tables_offload.h index 434a6158852f..7a453a35a41d 100644 --- a/include/net/netfilter/nf_tables_offload.h +++ b/include/net/netfilter/nf_tables_offload.h @@ -67,8 +67,6 @@ struct nft_flow_rule { struct flow_rule *rule; };
-#define NFT_OFFLOAD_F_ACTION (1 << 0) - void nft_flow_rule_set_addr_type(struct nft_flow_rule *flow, enum flow_dissector_key_id addr_type);
diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c index e5fcbb0e4b8e..839fd09f1bb4 100644 --- a/net/netfilter/nf_tables_offload.c +++ b/net/netfilter/nf_tables_offload.c @@ -94,7 +94,8 @@ struct nft_flow_rule *nft_flow_rule_create(struct net *net,
expr = nft_expr_first(rule); while (nft_expr_more(rule, expr)) { - if (expr->ops->offload_flags & NFT_OFFLOAD_F_ACTION) + if (expr->ops->offload_action && + expr->ops->offload_action(expr)) num_actions++;
expr = nft_expr_next(expr); diff --git a/net/netfilter/nft_dup_netdev.c b/net/netfilter/nft_dup_netdev.c index 40788b3f1071..70c457476b87 100644 --- a/net/netfilter/nft_dup_netdev.c +++ b/net/netfilter/nft_dup_netdev.c @@ -67,6 +67,11 @@ static int nft_dup_netdev_offload(struct nft_offload_ctx *ctx, return nft_fwd_dup_netdev_offload(ctx, flow, FLOW_ACTION_MIRRED, oif); }
+static bool nft_dup_netdev_offload_action(const struct nft_expr *expr) +{ + return true; +} + static struct nft_expr_type nft_dup_netdev_type; static const struct nft_expr_ops nft_dup_netdev_ops = { .type = &nft_dup_netdev_type, @@ -75,6 +80,7 @@ static const struct nft_expr_ops nft_dup_netdev_ops = { .init = nft_dup_netdev_init, .dump = nft_dup_netdev_dump, .offload = nft_dup_netdev_offload, + .offload_action = nft_dup_netdev_offload_action, };
static struct nft_expr_type nft_dup_netdev_type __read_mostly = { diff --git a/net/netfilter/nft_fwd_netdev.c b/net/netfilter/nft_fwd_netdev.c index b77985986b24..3b0dcd170551 100644 --- a/net/netfilter/nft_fwd_netdev.c +++ b/net/netfilter/nft_fwd_netdev.c @@ -77,6 +77,11 @@ static int nft_fwd_netdev_offload(struct nft_offload_ctx *ctx, return nft_fwd_dup_netdev_offload(ctx, flow, FLOW_ACTION_REDIRECT, oif); }
+static bool nft_fwd_netdev_offload_action(const struct nft_expr *expr) +{ + return true; +} + struct nft_fwd_neigh { enum nft_registers sreg_dev:8; enum nft_registers sreg_addr:8; @@ -219,6 +224,7 @@ static const struct nft_expr_ops nft_fwd_netdev_ops = { .dump = nft_fwd_netdev_dump, .validate = nft_fwd_validate, .offload = nft_fwd_netdev_offload, + .offload_action = nft_fwd_netdev_offload_action, };
static const struct nft_expr_ops * diff --git a/net/netfilter/nft_immediate.c b/net/netfilter/nft_immediate.c index c63eb3b17178..5c9d88560a47 100644 --- a/net/netfilter/nft_immediate.c +++ b/net/netfilter/nft_immediate.c @@ -213,6 +213,16 @@ static int nft_immediate_offload(struct nft_offload_ctx *ctx, return 0; }
+static bool nft_immediate_offload_action(const struct nft_expr *expr) +{ + const struct nft_immediate_expr *priv = nft_expr_priv(expr); + + if (priv->dreg == NFT_REG_VERDICT) + return true; + + return false; +} + static const struct nft_expr_ops nft_imm_ops = { .type = &nft_imm_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_immediate_expr)), @@ -224,7 +234,7 @@ static const struct nft_expr_ops nft_imm_ops = { .dump = nft_immediate_dump, .validate = nft_immediate_validate, .offload = nft_immediate_offload, - .offload_flags = NFT_OFFLOAD_F_ACTION, + .offload_action = nft_immediate_offload_action, };
struct nft_expr_type nft_imm_type __read_mostly = {
From: Cong Wang <cong.wang@bytedance.com>
mainline inclusion
from mainline-v5.14-rc1
commit 1581a6c1c3291a8320b080f4411345f60229976d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VIOZ?from=project-issue
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Currently sk_psock_verdict_apply() is void even though it handles some error conditions, so its callers have no way to learn whether it succeeded or failed; this matters especially for sk_psock_verdict_recv().
Make it return int to indicate error cases and propagate errors to callers properly.
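Condensed from the diff below, the error now flows back up the verdict path so the read descriptor does not count dropped data as consumed:

/* sk_psock_verdict_recv(), simplified: sk_psock_skb_redirect() returns
 * -EIO when the peer socket is gone, sk_psock_verdict_apply() passes
 * that back, and the receive callback then reports zero bytes consumed. */
ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
if (sk_psock_verdict_apply(psock, skb, ret) < 0)
        len = 0;
return len;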
Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program")
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210615021342.7416-7-xiyou.wangcong@gmail.com
Conflicts:
	net/core/skmsg.c
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 net/core/skmsg.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 5dd5569f89bf..4ee4fe436847 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -765,7 +765,7 @@ static struct sk_psock *sk_psock_from_strp(struct strparser *strp) return container_of(parser, struct sk_psock, parser); }
-static void sk_psock_skb_redirect(struct sk_buff *skb) +static int sk_psock_skb_redirect(struct sk_buff *skb) { struct sk_psock *psock_other; struct sock *sk_other; @@ -776,7 +776,7 @@ static void sk_psock_skb_redirect(struct sk_buff *skb) */ if (unlikely(!sk_other)) { kfree_skb(skb); - return; + return -EIO; } psock_other = sk_psock(sk_other); /* This error indicates the socket is being torn down or had another @@ -786,11 +786,12 @@ static void sk_psock_skb_redirect(struct sk_buff *skb) if (!psock_other || sock_flag(sk_other, SOCK_DEAD) || !sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) { kfree_skb(skb); - return; + return -EIO; }
skb_queue_tail(&psock_other->ingress_skb, skb); schedule_work(&psock_other->work); + return 0; }
static void sk_psock_tls_verdict_apply(struct sk_buff *skb, struct sock *sk, int verdict) @@ -826,15 +827,16 @@ int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb) } EXPORT_SYMBOL_GPL(sk_psock_tls_strp_read);
-static void sk_psock_verdict_apply(struct sk_psock *psock, - struct sk_buff *skb, int verdict) +static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb, + int verdict) { struct tcp_skb_cb *tcp; struct sock *sk_other; - int err = -EIO; + int err = 0;
switch (verdict) { case __SK_PASS: + err = -EIO; sk_other = psock->sk; if (sock_flag(sk_other, SOCK_DEAD) || !sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) { @@ -859,13 +861,15 @@ static void sk_psock_verdict_apply(struct sk_psock *psock, } break; case __SK_REDIRECT: - sk_psock_skb_redirect(skb); + err = sk_psock_skb_redirect(skb); break; case __SK_DROP: default: out_free: kfree_skb(skb); } + + return err; }
static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb) @@ -967,7 +971,8 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb, ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb)); skb->sk = NULL; } - sk_psock_verdict_apply(psock, skb, ret); + if (sk_psock_verdict_apply(psock, skb, ret) < 0) + len = 0; out: rcu_read_unlock(); return len;
From: liubo <liubo254@huawei.com>
euleros inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VUFS
CVE: NA
-------------------------------------------------

The etmem_scan.ko module is used to scan the process memory.
The specific usage is as follows: the user-mode etmem process issues scan commands through /proc/pid/idle_pages, and the etmem_scan module scans memory based on the address information it is given.
Under certain circumstances the scan result may be empty. This is part of the normal logic flow and does not warrant logging through WARN_ONCE().
Therefore, replace WARN_ONCE() with debug_printk for the "nothing read" case.
Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 fs/proc/etmem_scan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/proc/etmem_scan.c b/fs/proc/etmem_scan.c index ec06e606ca7b..8bcb8d3af7c5 100644 --- a/fs/proc/etmem_scan.c +++ b/fs/proc/etmem_scan.c @@ -1244,7 +1244,7 @@ static int mm_idle_walk_range(struct page_idle_ctrl *pic, pic->next_hva, end); ret = 0; } else - WARN_ONCE(1, "nothing read"); + debug_printk("nothing read"); return ret; }
From: Wei Yongjun <weiyongjun1@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4UV43
CVE: NA
---------------------------
Allow vendor modules to attach bonding driver hooks. This patch introduces the vendor_bond_check_dev_link hook.
Usage:
static void vendor_foo(void *data, const struct bonding *bond,
		       const struct slave *slave, int *state)
{
	pr_info("%s\n", __func__);
}

static int __init vendor_bond_init(void)
{
	return register_trace_vendor_bond_check_dev_link(&vendor_foo, NULL);
}

static void __exit vendor_bond_exit(void)
{
	unregister_trace_vendor_bond_check_dev_link(&vendor_foo, NULL);
}

module_init(vendor_bond_init);
module_exit(vendor_bond_exit);
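For reference, the probe's last argument lets a vendor module override the MII link state that bond_miimon_inspect() will use. A hypothetical probe (vendor_link_is_bad() is an assumed helper, not part of this patch) could look like:

static void vendor_foo(void *data, const struct bonding *bond,
		       const struct slave *slave, int *state)
{
	/* Force this slave to be treated as link-down. */
	if (vendor_link_is_bad(slave->dev))	/* hypothetical helper */
		*state = 0;
}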
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Zhang Jialin <zhangjialin11@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 arch/arm64/configs/openeuler_defconfig |  1 +
 arch/x86/configs/openeuler_defconfig   |  1 +
 drivers/hooks/Kconfig                  | 10 +++++++++
 drivers/hooks/vendor_hooks.c           |  4 ++++
 drivers/net/bonding/bond_main.c        |  5 +++++
 include/trace/hooks/bonding.h          | 29 ++++++++++++++++++++++++++
 6 files changed, 50 insertions(+)
 create mode 100644 include/trace/hooks/bonding.h
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig index b476b105ee10..460fcdcca62b 100644 --- a/arch/arm64/configs/openeuler_defconfig +++ b/arch/arm64/configs/openeuler_defconfig @@ -6090,6 +6090,7 @@ CONFIG_USB4=m # Vendor Hooks # CONFIG_VENDOR_HOOKS=y +CONFIG_VENDOR_BOND_HOOKS=y # end of Vendor Hooks
CONFIG_LIBNVDIMM=m diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig index 6c0f50e766a7..64afe7021c45 100644 --- a/arch/x86/configs/openeuler_defconfig +++ b/arch/x86/configs/openeuler_defconfig @@ -7183,6 +7183,7 @@ CONFIG_USB4=m # Vendor Hooks # CONFIG_VENDOR_HOOKS=y +CONFIG_VENDOR_BOND_HOOKS=y # end of Vendor Hooks
CONFIG_LIBNVDIMM=m diff --git a/drivers/hooks/Kconfig b/drivers/hooks/Kconfig index 1c0e33ef9a56..6a00168e67ad 100644 --- a/drivers/hooks/Kconfig +++ b/drivers/hooks/Kconfig @@ -10,4 +10,14 @@ config VENDOR_HOOKS Allow vendor modules to attach to tracepoint "hooks" defined via DECLARE_HOOK or DECLARE_RESTRICTED_HOOK.
+config VENDOR_BOND_HOOKS + bool "Ethernet Bonding driver Vendor Hooks" + depends on VENDOR_HOOKS && BONDING + default n + help + Enable ethernet bonding driver vendor hooks + + Allow vendor modules to attach bonding driver hooks defined via + DECLARE_HOOK or DECLARE_RESTRICTED_HOOK. + endmenu diff --git a/drivers/hooks/vendor_hooks.c b/drivers/hooks/vendor_hooks.c index 359989d1bb32..85bda58159f6 100644 --- a/drivers/hooks/vendor_hooks.c +++ b/drivers/hooks/vendor_hooks.c @@ -8,9 +8,13 @@
#define CREATE_TRACE_POINTS #include <trace/hooks/vendor_hooks.h> +#include <trace/hooks/bonding.h>
/* * Export tracepoints that act as a bare tracehook (ie: have no trace event * associated with them) to allow external modules to probe them. */
+#ifdef CONFIG_VENDOR_BOND_HOOKS +EXPORT_TRACEPOINT_SYMBOL_GPL(vendor_bond_check_dev_link); +#endif diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index c065da5b6ca2..4804264c012f 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -83,6 +83,7 @@ #include <net/bonding.h> #include <net/bond_3ad.h> #include <net/bond_alb.h> +#include <trace/hooks/bonding.h>
#include "bonding_priv.h"
@@ -2415,6 +2416,10 @@ static int bond_miimon_inspect(struct bonding *bond)
link_state = bond_check_dev_link(bond, slave->dev, 0);
+#ifdef CONFIG_VENDOR_BOND_HOOKS + trace_vendor_bond_check_dev_link(bond, slave, &link_state); +#endif + switch (slave->link) { case BOND_LINK_UP: if (link_state) diff --git a/include/trace/hooks/bonding.h b/include/trace/hooks/bonding.h new file mode 100644 index 000000000000..fc77d6da3a19 --- /dev/null +++ b/include/trace/hooks/bonding.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Ethernet Bonding driver Vendor Hooks + * + * Copyright (c) 2022, Huawei Tech. Co., Ltd. + */ + +#ifdef CONFIG_VENDOR_BOND_HOOKS + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM bonding + +#define TRACE_INCLUDE_PATH trace/hooks +#if !defined(_TRACE_HOOK_BONDING_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_HOOK_BONDING_H +#include <linux/tracepoint.h> +#include <trace/hooks/vendor_hooks.h> + +struct bonding; +struct slave; +DECLARE_HOOK(vendor_bond_check_dev_link, + TP_PROTO(const struct bonding *bond, const struct slave *slave, int *state), + TP_ARGS(bond, slave, state)); + +#endif +/* This part must be outside protection */ +#include <trace/define_trace.h> + +#endif
From: Luo Meng <luomeng12@huawei.com>
hulk inclusion
category: bugfix
bugzilla: 185894, https://gitee.com/openeuler/kernel/issues/I4SJ8H?from=project-issue
CVE: NA
-----------------------------------------------
This reverts commit b0d9aeb41d5d5e90601fdf89834eba0a0613291c.
Commit b0d9aeb41d5d ("dm space maps: don't reset space map allocation cursor when committing") changed the way free blocks are found.
However, when a ramdisk (which does not support discard) is used for the thin-pool and storage is over-committed, repeatedly creating and deleting files eventually reaches a state where a free block can still be found in the thin-pool, but no block can be found in the ramdisk.
So this commit needs to be reverted.
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 drivers/md/persistent-data/dm-space-map-disk.c     | 9 +--------
 drivers/md/persistent-data/dm-space-map-metadata.c | 9 +--------
 2 files changed, 2 insertions(+), 16 deletions(-)
diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c index e0acae7a3815..bf4c5e2ccb6f 100644 --- a/drivers/md/persistent-data/dm-space-map-disk.c +++ b/drivers/md/persistent-data/dm-space-map-disk.c @@ -171,14 +171,6 @@ static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b) * Any block we allocate has to be free in both the old and current ll. */ r = sm_ll_find_common_free_block(&smd->old_ll, &smd->ll, smd->begin, smd->ll.nr_blocks, b); - if (r == -ENOSPC) { - /* - * There's no free block between smd->begin and the end of the metadata device. - * We search before smd->begin in case something has been freed. - */ - r = sm_ll_find_common_free_block(&smd->old_ll, &smd->ll, 0, smd->begin, b); - } - if (r) return r;
@@ -207,6 +199,7 @@ static int sm_disk_commit(struct dm_space_map *sm) return r;
memcpy(&smd->old_ll, &smd->ll, sizeof(smd->old_ll)); + smd->begin = 0; smd->nr_allocated_this_transaction = 0;
r = sm_disk_get_nr_free(sm, &nr_free); diff --git a/drivers/md/persistent-data/dm-space-map-metadata.c b/drivers/md/persistent-data/dm-space-map-metadata.c index da439ac85796..9e3c64ec2026 100644 --- a/drivers/md/persistent-data/dm-space-map-metadata.c +++ b/drivers/md/persistent-data/dm-space-map-metadata.c @@ -452,14 +452,6 @@ static int sm_metadata_new_block_(struct dm_space_map *sm, dm_block_t *b) * Any block we allocate has to be free in both the old and current ll. */ r = sm_ll_find_common_free_block(&smm->old_ll, &smm->ll, smm->begin, smm->ll.nr_blocks, b); - if (r == -ENOSPC) { - /* - * There's no free block between smm->begin and the end of the metadata device. - * We search before smm->begin in case something has been freed. - */ - r = sm_ll_find_common_free_block(&smm->old_ll, &smm->ll, 0, smm->begin, b); - } - if (r) return r;
@@ -511,6 +503,7 @@ static int sm_metadata_commit(struct dm_space_map *sm) return r;
memcpy(&smm->old_ll, &smm->ll, sizeof(smm->old_ll)); + smm->begin = 0; smm->allocated_this_transaction = 0;
return 0;
From: Chao Liu <liuchao173@huawei.com>
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VPL5
CVE: NA
-------------------------------------------------
Enable configs to support 9P.
Signed-off-by: Chao Liu <liuchao173@huawei.com>
Reviewed-by: Kai Liu <kai.liu@suse.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 arch/arm64/configs/openeuler_defconfig | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig index 460fcdcca62b..62f43ff270f9 100644 --- a/arch/arm64/configs/openeuler_defconfig +++ b/arch/arm64/configs/openeuler_defconfig @@ -1876,7 +1876,10 @@ CONFIG_RFKILL=m CONFIG_RFKILL_LEDS=y CONFIG_RFKILL_INPUT=y CONFIG_RFKILL_GPIO=m -# CONFIG_NET_9P is not set +CONFIG_NET_9P=m +CONFIG_NET_9P_VIRTIO=m +# CONFIG_NET_9P_RDMA is not set +# CONFIG_NET_9P_DEBUG is not set # CONFIG_CAIF is not set CONFIG_CEPH_LIB=m # CONFIG_CEPH_LIB_PRETTYDEBUG is not set @@ -6398,6 +6401,10 @@ CONFIG_CIFS_DFS_UPCALL=y # CONFIG_CIFS_FSCACHE is not set # CONFIG_CODA_FS is not set # CONFIG_AFS_FS is not set +CONFIG_9P_FS=m +CONFIG_9P_FSCACHE=y +CONFIG_9P_FS_POSIX_ACL=y +CONFIG_9P_FS_SECURITY=y CONFIG_EULER_FS=m CONFIG_NLS=y CONFIG_NLS_DEFAULT="utf8"
From: Baokun Li <libaokun1@huawei.com>
hulk inclusion
category: bugfix
bugzilla: 186191, https://gitee.com/openeuler/kernel/issues/I4U4SQ
CVE: NA
--------------------------------
When renaming the whiteout file, the old whiteout file is not deleted. Therefore, add the old dentry size to the old dir, as XFS does. Otherwise, an error may be reported in check_inodes() because `fscki->calc_sz != fscki->size`.
Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
Reported-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 fs/ubifs/dir.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c index 3db1c75fda5b..f5777f59a101 100644 --- a/fs/ubifs/dir.c +++ b/fs/ubifs/dir.c @@ -1400,6 +1400,9 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry, iput(whiteout); goto out_release; } + + /* Add the old_dentry size to the old_dir size. */ + old_sz -= CALC_DENT_SIZE(fname_len(&old_nm)); }
lock_4_inodes(old_dir, new_dir, new_inode, whiteout);