From: Bharata B Rao <bharata@amd.com>
mainline inclusion
from mainline-v5.16-rc1
commit 6cf253925df72e522c06dac09ede7e81a6e38121
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4T0ML
CVE: NA
-------------------------------------------------
Patch series "Fix NUMA nodes fallback list ordering".
For a NUMA system that has multiple nodes at the same distance from other nodes, fallback list generation prefers the same node order for all of them instead of round-robin, thereby penalizing one node over the others. This series fixes it.
More description of the problem and the fix is present in the patch description.
This patch (of 2):
Print an informational message about the allocation fallback order for each NUMA node during boot.
No functional changes here. This makes it easier to illustrate the problem in the node fallback list generation, which the next patch fixes.
Link: https://lkml.kernel.org/r/20210830121603.1081-1-bharata@amd.com
Link: https://lkml.kernel.org/r/20210830121603.1081-2-bharata@amd.com
Signed-off-by: Bharata B Rao <bharata@amd.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Krupa Ramakrishnan <krupa.ramakrishnan@amd.com>
Cc: Sadagopan Srinivasan <Sadagopan.Srinivasan@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 mm/page_alloc.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3791bdc958bd..4e67b4506238 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6067,6 +6067,10 @@ static void build_zonelists(pg_data_t *pgdat)
 
 	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
 	build_thisnode_zonelists(pgdat);
+	pr_info("Fallback order for Node %d: ", local_node);
+	for (node = 0; node < nr_nodes; node++)
+		pr_cont("%d ", node_order[node]);
+	pr_cont("\n");
 }
 
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
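For illustration only: with the four-node topology described in the next patch of this series, the new pr_info()/pr_cont() lines would emit boot messages along these lines (a sketch derived from the fallback lists documented below, not captured dmesg output):

```
Fallback order for Node 0: 0 1 2 3
Fallback order for Node 1: 1 0 3 2
Fallback order for Node 2: 2 3 0 1
Fallback order for Node 3: 3 2 0 1
```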
From: Krupa Ramakrishnan <krupa.ramakrishnan@amd.com>
mainline inclusion
from mainline-v5.16-rc1
commit 54d032ced98378bcb9d32dd5e378b7e402b36ad8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4T0ML
CVE: NA
-----------------------------------------------
In build_zonelists(), when the fallback list is built for the nodes, the node load gets reinitialized during each iteration. This results in nodes with the same distance occupying the same slot in different node fallback lists rather than appearing in the intended round-robin manner, so one node gets picked for allocation more often than other nodes at the same distance.
As an example, consider a 4 node system with the following distance matrix.
Node    0   1   2   3
----------------------
0      10  12  32  32
1      12  10  32  32
2      32  32  10  12
3      32  32  12  10
For this case, the node fallback list gets built like this:
Node    Fallback list
---------------------
0       0 1 2 3
1       1 0 3 2
2       2 3 0 1
3       3 2 0 1 <-- Unexpected fallback order
In the fallback list for nodes 2 and 3, the nodes 0 and 1 appear in the same order which results in more allocations getting satisfied from node 0 compared to node 1.
The effect of this on remote memory bandwidth as seen by stream benchmark is shown below:
Case 1: Bandwidth from cores on nodes 2 & 3 to memory on nodes 0 & 1
        (numactl -m 0,1 ./stream_lowOverhead ... --cores <from 2, 3>)
Case 2: Bandwidth from cores on nodes 0 & 1 to memory on nodes 2 & 3
        (numactl -m 2,3 ./stream_lowOverhead ... --cores <from 0, 1>)
----------------------------------------
              BANDWIDTH (MB/s)
    TEST      Case 1        Case 2
----------------------------------------
    COPY      57479.6      110791.8
   SCALE      55372.9      105685.9
     ADD      50460.6       96734.2
   TRIAD      50397.6       97119.1
----------------------------------------
The bandwidth drop in Case 1 occurs because most of the allocations get satisfied by node 0 as it appears first in the fallback order for both nodes 2 and 3.
This can be fixed by accumulating the node load in build_zonelists() rather than reinitializing it during each iteration. With this, nodes at the same distance are correctly assigned in round-robin order.
In fact this was how it was originally until commit f0c0b2b808f2 ("change zonelist order: zonelist order selection logic") dropped the load accumulation and resorted to initializing the load during each iteration.
While zonelist ordering was removed by commit c9bff3eebc09 ("mm, page_alloc: rip out ZONELIST_ORDER_ZONE"), the change to the node load accumulation in build_zonelists() remained. So essentially this patch reverts back to the accumulated node load logic.
After this fix, the fallback order gets built like this:
Node    Fallback list
------------------
0       0 1 2 3
1       1 0 3 2
2       2 3 0 1
3       3 2 1 0 <-- Note the change here
The bandwidth in Case 1 improves and matches Case 2 as shown below.
----------------------------------------
              BANDWIDTH (MB/s)
    TEST      Case 1        Case 2
----------------------------------------
    COPY     110438.9      110107.2
   SCALE     105930.5      105817.5
     ADD      97005.1       96159.8
   TRIAD      97441.5       96757.1
----------------------------------------
The correctness of the fallback list generation has also been verified for the above node configuration when node 3 starts as a memory-less node and comes online only during memory hotplug.
[bharata@amd.com: Added changelog, review, test validation]
Link: https://lkml.kernel.org/r/20210830121603.1081-3-bharata@amd.com
Fixes: f0c0b2b808f2 ("change zonelist order: zonelist order selection logic")
Signed-off-by: Krupa Ramakrishnan <krupa.ramakrishnan@amd.com>
Co-developed-by: Sadagopan Srinivasan <Sadagopan.Srinivasan@amd.com>
Signed-off-by: Sadagopan Srinivasan <Sadagopan.Srinivasan@amd.com>
Signed-off-by: Bharata B Rao <bharata@amd.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e67b4506238..ef761d5c7025 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6058,7 +6058,7 @@ static void build_zonelists(pg_data_t *pgdat)
 		 */
 		if (node_distance(local_node, node) !=
 		    node_distance(local_node, prev_node))
-			node_load[node] = load;
+			node_load[node] += load;
 
 		node_order[nr_nodes++] = node;
 		prev_node = node;
From: wangshouping <wangshouping@huawei.com>
euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4T4W4?from=project-issue
CVE: NA
--------
Reserve some fields in advance for parsing RSASSA-PSS style certificates.
---------
Signed-off-by: wangshouping <wangshouping@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 include/crypto/akcipher.h    | 2 ++
 include/crypto/public_key.h  | 9 +++++++++
 include/linux/keyctl.h       | 3 +++
 include/linux/oid_registry.h | 2 ++
 4 files changed, 16 insertions(+)
diff --git a/include/crypto/akcipher.h b/include/crypto/akcipher.h
index 5764b46bd1ec..4ecbdd745a5f 100644
--- a/include/crypto/akcipher.h
+++ b/include/crypto/akcipher.h
@@ -8,6 +8,7 @@
 #ifndef _CRYPTO_AKCIPHER_H
 #define _CRYPTO_AKCIPHER_H
 #include <linux/crypto.h>
+#include <linux/kabi.h>
 
 /**
  * struct akcipher_request - public key request
@@ -101,6 +102,7 @@ struct akcipher_alg {
 	unsigned int (*max_size)(struct crypto_akcipher *tfm);
 	int (*init)(struct crypto_akcipher *tfm);
 	void (*exit)(struct crypto_akcipher *tfm);
+	KABI_RESERVE(1)
 
 	unsigned int reqsize;
 	struct crypto_alg base;
diff --git a/include/crypto/public_key.h b/include/crypto/public_key.h
index f5bd80858fc5..041e2c023a8e 100644
--- a/include/crypto/public_key.h
+++ b/include/crypto/public_key.h
@@ -13,6 +13,7 @@
 #include <linux/keyctl.h>
 #include <linux/oid_registry.h>
 #include <crypto/akcipher.h>
+#include <linux/kabi.h>
 
 /*
  * Cryptographic data for the public-key subtype of the asymmetric key type.
@@ -29,6 +30,11 @@ struct public_key {
 	bool key_is_private;
 	const char *id_type;
 	const char *pkey_algo;
+	KABI_RESERVE(1)
+	KABI_RESERVE(2)
+	KABI_RESERVE(3)
+	KABI_RESERVE(4)
+	KABI_RESERVE(5)
 };
 
 extern void public_key_free(struct public_key *key);
@@ -47,6 +53,9 @@ struct public_key_signature {
 	const char *encoding;
 	const void *data;
 	unsigned int data_size;
+	KABI_RESERVE(1)
+	KABI_RESERVE(2)
+	KABI_RESERVE(3)
 };
 
 extern void public_key_signature_free(struct public_key_signature *sig);
diff --git a/include/linux/keyctl.h b/include/linux/keyctl.h
index 5b79847207ef..1228af4053a8 100644
--- a/include/linux/keyctl.h
+++ b/include/linux/keyctl.h
@@ -9,6 +9,7 @@
 #define __LINUX_KEYCTL_H
 
 #include <uapi/linux/keyctl.h>
+#include <linux/kabi.h>
 
 struct kernel_pkey_query {
 	__u32		supported_ops;	/* Which ops are supported */
@@ -37,6 +38,8 @@ struct kernel_pkey_params {
 		__u32	in2_len;	/* 2nd input data size (verify) */
 	};
 	enum kernel_pkey_operation op : 8;
+	KABI_RESERVE(1)
+	KABI_RESERVE(2)
 };
 
 #endif /* __LINUX_KEYCTL_H */
diff --git a/include/linux/oid_registry.h b/include/linux/oid_registry.h
index f32d91895e4d..72882f35bd88 100644
--- a/include/linux/oid_registry.h
+++ b/include/linux/oid_registry.h
@@ -116,6 +116,8 @@ enum OID {
 	OID_sm3,			/* 1.2.156.10197.1.401 */
 	OID_SM2_with_SM3,		/* 1.2.156.10197.1.501 */
 	OID_sm3WithRSAEncryption,	/* 1.2.156.10197.1.504 */
+	OID_mgf1,			/* 1.2.840.113549.1.1.8 */
+	OID_rsassaPSS,			/* 1.2.840.113549.1.1.10 */
 
 	OID__NR
 };
From: Wei Li <liwei391@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4T1NF
CVE: NA
-------------------------------------------------
When CONFIG_KABI_RESERVE=n and CONFIG_KABI_SIZE_ALIGN_CHECKS=y, replacing kabi reserved padding with KABI_USE() produces the following build error:
include/linux/kabi.h:383:3: error: static assertion failed: \
"include/linux/fs.h:2306: long aaa is larger than . \
Disable CONFIG_KABI_SIZE_ALIGN_CHECKS if debugging."
  _Static_assert(sizeof(struct{_new;}) <= sizeof(struct{_orig;}), \
  ^~~~~~~~~~~~~~
In fact, what KABI_USE() expands to when CONFIG_KABI_RESERVE=n makes no sense; update _KABI_REPLACE() to fix this.
Signed-off-by: Wei Li <liwei391@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 include/linux/kabi.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/include/linux/kabi.h b/include/linux/kabi.h
index 0bc7ca2483f4..a52d9fa72cfa 100644
--- a/include/linux/kabi.h
+++ b/include/linux/kabi.h
@@ -396,6 +396,7 @@
 # define _KABI_DEPRECATE(_type, _orig)	_type kabi_reserved_##_orig
 # define _KABI_DEPRECATE_FN(_type, _orig, _args...)	\
 	_type (* kabi_reserved_##_orig)(_args)
+#ifdef CONFIG_KABI_RESERVE
 # define _KABI_REPLACE(_orig, _new)		\
 	union {					\
 		_new;				\
@@ -404,6 +405,9 @@
 		} __UNIQUE_ID(kabi_hide);	\
 		__KABI_CHECK_SIZE_ALIGN(_orig, _new);	\
 	}
+#else
+# define _KABI_REPLACE(_orig, _new)	KABI_BROKEN_REPLACE(_orig, _new)
+#endif
 
 # define _KABI_EXCLUDE(_elem)		_elem
 
@@ -426,9 +430,9 @@
  * code.
  */
 #ifdef CONFIG_KABI_RESERVE
- # define _KABI_RESERVE(n)	u64 kabi_reserved##n
+# define _KABI_RESERVE(n)	u64 kabi_reserved##n
 #else
- # define _KABI_RESERVE(n)
+# define _KABI_RESERVE(n)
 #endif
 # define KABI_RESERVE(n)	_KABI_RESERVE(n);
 /*
From: Wei Li <liwei391@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4T1NF
CVE: NA
-------------------------------------------------
Move the kabi config entries to "General setup", and make CONFIG_KABI_SIZE_ALIGN_CHECKS depend on CONFIG_KABI_RESERVE.
Signed-off-by: Wei Li <liwei391@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 Kconfig      | 15 ---------------
 init/Kconfig | 16 ++++++++++++++++
 2 files changed, 16 insertions(+), 15 deletions(-)
diff --git a/Kconfig b/Kconfig
index 9c9b5fedd43c..745bc773f567 100644
--- a/Kconfig
+++ b/Kconfig
@@ -30,18 +30,3 @@ source "lib/Kconfig"
 source "lib/Kconfig.debug"
 
 source "Documentation/Kconfig"
-
-config KABI_SIZE_ALIGN_CHECKS
-	bool "Enables more stringent kabi checks in the macros"
-	default y
-	help
-	  This option enables more stringent kabi checks. Those must be disable
-	  in case of a debug-build because they allow to change struct sizes.
-
-config KABI_RESERVE
-	bool "Enable KABI PADDING RESERVE"
-	default y
-	help
-	  This option enables KABI padding reserve.
-	  For some embedded system, KABI padding reserve may be not necessary.
-	  Disable it on demand.
diff --git a/init/Kconfig b/init/Kconfig
index b86163991bc9..4410b711f9dc 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2086,6 +2086,22 @@ config PROFILING
 config TRACEPOINTS
 	bool
 
+config KABI_RESERVE
+	bool "Enable KABI PADDING RESERVE"
+	default y
+	help
+	  This option enables KABI padding reserve.
+	  For some embedded system, KABI padding reserve may be not necessary.
+	  Disable it on demand.
+
+config KABI_SIZE_ALIGN_CHECKS
+	bool "Enables more stringent kabi checks in the macros"
+	default y
+	depends on KABI_RESERVE
+	help
+	  This option enables more stringent kabi checks. Those must be disable
+	  in case of a debug-build because they allow to change struct sizes.
+
 endmenu		# General setup
 
 source "arch/Kconfig"