patch #1: Move the documentation of the interfaces under /sys/devices/system/cpu/cpuX/topology/ into the more appropriate Documentation/ABI/ file, where it can be properly parsed and displayed to user space.
patch #2: Clarify the overflow issue of the sysfs pagebuf, and move the BUILD_BUG_ON() to a more sensible place.
Tian Tao (2):
  CPU, NUMA topology ABIs: clarify and cleanup CPU and NUMA topology ABIs
  cpumask: clarify the overflow issue of sysfs pagebuf
 Documentation/ABI/stable/sysfs-devices-node       |   4 +
 Documentation/ABI/stable/sysfs-devices-system-cpu | 149 ++++++++++++++++++++++
 Documentation/admin-guide/cputopology.rst         |  89 -------------
 drivers/base/node.c                               |   3 -
 include/linux/cpumask.h                           |   9 +-
 5 files changed, 161 insertions(+), 93 deletions(-)
Move the documentation of the interfaces that exist under /sys/devices/system/cpu/cpuX/topology from Documentation/admin-guide/cputopology.rst to Documentation/ABI/stable/sysfs-devices-system-cpu.
Topology bitmap mask strings shouldn't be larger than PAGE_SIZE, as lstopo and numactl depend on them. Other ABIs exposing CPU lists are not really used by common applications, so this patch also documents that those lists may be trimmed: there is no guarantee that a list always fits in PAGE_SIZE, especially when it looks like 0, 3, 5, 7, 9, 11, etc.
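For readers consuming the *_list attributes from user space, here is a minimal sketch (not part of this patch) of a parser that tolerates a trimmed tail. It assumes the list format described above, and it additionally assumes that an untrimmed string ends with a newline, so a missing newline is treated as a sign of trimming and the last, possibly incomplete, token is ignored. The names parse_cpulist and SKETCH_MAX_CPUS are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_CPUS 8192	/* hypothetical bound chosen for this sketch */

/*
 * Parse a cpulist string such as "0-3,8-11,14,17\n" into present[].
 * Returns the number of CPUs marked, or -1 on malformed input.
 * *truncated is set when the string lacks a trailing newline (assumed
 * here to mean the kernel trimmed it); the final token is then ignored
 * because it may have been cut in the middle of a number.
 */
int parse_cpulist(const char *s, bool present[SKETCH_MAX_CPUS],
		  bool *truncated)
{
	size_t len = strlen(s);
	const char *end = s + len;
	int count = 0;

	memset(present, 0, SKETCH_MAX_CPUS * sizeof(present[0]));
	*truncated = (len == 0 || s[len - 1] != '\n');
	if (*truncated) {
		/* Step back to just after the last complete token. */
		while (end > s && end[-1] != ',')
			end--;
	}

	while (s < end && *s != '\n') {
		char *next;
		unsigned long lo, hi;

		/* Skip separators between tokens. */
		while (s < end && (*s == ',' || *s == ' '))
			s++;
		if (s >= end || *s == '\n')
			break;

		lo = strtoul(s, &next, 10);
		if (next == s)
			return -1;
		hi = lo;
		if (*next == '-') {
			const char *p = next + 1;

			hi = strtoul(p, &next, 10);
			if (next == p)
				return -1;
		}
		if (hi < lo || hi >= SKETCH_MAX_CPUS)
			return -1;

		for (unsigned long cpu = lo; cpu <= hi; cpu++) {
			if (!present[cpu]) {
				present[cpu] = true;
				count++;
			}
		}
		s = next;
	}
	return count;
}
```

A caller that sees *truncated set knows the result is only a lower bound on the CPUs in the list and can fall back to the corresponding bitmask attribute.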
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
 Documentation/ABI/stable/sysfs-devices-node       |   4 +
 Documentation/ABI/stable/sysfs-devices-system-cpu | 149 ++++++++++++++++++++++
 Documentation/admin-guide/cputopology.rst         |  89 -------------
 3 files changed, 153 insertions(+), 89 deletions(-)
diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index 484fc04..ef783d9 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -48,6 +48,10 @@ Date:		October 2002
 Contact:	Linux Memory Management list <linux-mm@kvack.org>
 Description:
 		The CPUs associated to the node.
+		The CPUs associated to the node. The format is like 0-3,
+		8-11, 14,17. The maximum size is PAGE_SIZE, so the tail
+		of the string will be trimmed while its size is larger
+		than PAGE_SIZE.
 
 What:		/sys/devices/system/node/nodeX/meminfo
 Date:		October 2002
diff --git a/Documentation/ABI/stable/sysfs-devices-system-cpu b/Documentation/ABI/stable/sysfs-devices-system-cpu
index 33c133e..7d0b23e 100644
--- a/Documentation/ABI/stable/sysfs-devices-system-cpu
+++ b/Documentation/ABI/stable/sysfs-devices-system-cpu
@@ -1,3 +1,7 @@
+Export CPU topology info via sysfs. Items (attributes) are similar
+to /proc/cpuinfo output of some architectures. They reside in
+/sys/devices/system/cpu/cpuX/topology/:
+
 What:		/sys/devices/system/cpu/dscr_default
 Date:		13-May-2014
 KernelVersion:	v3.15.0
@@ -23,3 +27,148 @@ Description:	Default value for the Data Stream Control Register (DSCR) on
 		here). If set by a process it will be inherited by child
 		processes.
 Values:		64 bit unsigned integer (bit field)
+
+What:		/sys/devices/system/cpu/cpuX/topology/physical_package_id
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	physical package id of cpuX. Typically corresponds to a physical
+		socket number, but the actual value is architecture and platform
+		dependent.
+Values:		64 bit unsigned integer (bit field)
+
+What:		/sys/devices/system/cpu/cpuX/topology/die_id
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	the CPU die ID of cpuX. Typically it is the hardware platform's
+		identifier (rather than the kernel's). The actual value is
+		architecture and platform dependent.
+Values:		64 bit unsigned integer (bit field)
+
+What:		/sys/devices/system/cpu/cpuX/topology/core_id
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	the CPU core ID of cpuX. Typically it is the hardware platform's
+		identifier (rather than the kernel's). The actual value is
+		architecture and platform dependent.
+Values:		64 bit unsigned integer (bit field)
+
+What:		/sys/devices/system/cpu/cpuX/topology/book_id
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	the book ID of cpuX. Typically it is the hardware platform's
+		identifier (rather than the kernel's). The actual value is
+		architecture and platform dependent.
+Values:		64 bit unsigned integer (bit field)
+
+What:		/sys/devices/system/cpu/cpuX/topology/drawer_id
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	the drawer ID of cpuX. Typically it is the hardware platform's
+		identifier (rather than the kernel's). The actual value is
+		architecture and platform dependent.
+Values:		64 bit unsigned integer (bit field)
+
+What:		/sys/devices/system/cpu/cpuX/topology/core_cpus
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	internal kernel map of CPUs within the same core.
+		(deprecated name: "thread_siblings")
+Values:		hexadecimal bitmask.
+
+What:		/sys/devices/system/cpu/cpuX/topology/core_cpus_list
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	human-readable list of CPUs within the same core.
+		The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE,
+		so the tail of the string will be trimmed while its size is larger
+		than PAGE_SIZE.
+		(deprecated name: "thread_siblings_list").
+Values:		hexadecimal bitmask.
+
+What:		/sys/devices/system/cpu/cpuX/topology/package_cpus
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	internal kernel map of the CPUs sharing the same physical_package_id.
+		(deprecated name: "core_siblings").
+Values:		64 bit unsigned integer (bit field)
+
+What:		/sys/devices/system/cpu/cpuX/topology/package_cpus_list
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	human-readable list of CPUs sharing the same physical_package_id.
+		The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE,
+		so the tail of the string will be trimmed while its size is larger
+		than PAGE_SIZE.
+		(deprecated name: "core_siblings_list")
+Values:		hexadecimal bitmask.
+
+What:		/sys/devices/system/cpu/cpuX/topology/die_cpus
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	internal kernel map of CPUs within the same die.
+Values:		64 bit unsigned integer (bit field)
+
+What:		/sys/devices/system/cpu/cpuX/topology/die_cpus_list
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	human-readable list of CPUs within the same die.
+		The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE,
+		so the tail of the string will be trimmed while its size is larger
+		than PAGE_SIZE.
+Values:		hexadecimal bitmask.
+
+What:		/sys/devices/system/cpu/cpuX/topology/book_siblings
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	internal kernel map of cpuX's hardware threads within the same
+		book_id.
+Values:		64 bit unsigned integer (bit field)
+
+What:		/sys/devices/system/cpu/cpuX/topology/book_siblings_list
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	human-readable list of cpuX's hardware threads within the same
+		book_id.
+		The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE,
+		so the tail of the string will be trimmed while its size is larger
+		than PAGE_SIZE.
+Values:		hexadecimal bitmask.
+
+What:		/sys/devices/system/cpu/cpuX/topology/drawer_siblings
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	internal kernel map of cpuX's hardware threads within the same
+		drawer_id.
+Values:		64 bit unsigned integer (bit field)
+
+What:		/sys/devices/system/cpu/cpuX/topology/drawer_siblings_list
+Date:		19-Mar-2021
+KernelVersion:	v5.12
+Contact:
+Description:	human-readable list of cpuX's hardware threads within the same
+		drawer_id.
+		The format is like 0-3, 8-11, 14,17. The maximum size is PAGE_SIZE,
+		so the tail of the string will be trimmed while its size is larger
+		than PAGE_SIZE.
+Values:		hexadecimal bitmask.
+
+Architecture-neutral, drivers/base/topology.c, exports these attributes.
+However, the book and drawer related sysfs files will only be created if
+CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively.
+
+CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are currently only used on s390,
+where they reflect the cpu and cache hierarchy.
diff --git a/Documentation/admin-guide/cputopology.rst b/Documentation/admin-guide/cputopology.rst
index b90dafc..4672465 100644
--- a/Documentation/admin-guide/cputopology.rst
+++ b/Documentation/admin-guide/cputopology.rst
@@ -2,95 +2,6 @@
 How CPU topology info is exported via sysfs
 ===========================================
 
-Export CPU topology info via sysfs. Items (attributes) are similar
-to /proc/cpuinfo output of some architectures. They reside in
-/sys/devices/system/cpu/cpuX/topology/:
-
-physical_package_id:
-
-	physical package id of cpuX. Typically corresponds to a physical
-	socket number, but the actual value is architecture and platform
-	dependent.
-
-die_id:
-
-	the CPU die ID of cpuX. Typically it is the hardware platform's
-	identifier (rather than the kernel's). The actual value is
-	architecture and platform dependent.
-
-core_id:
-
-	the CPU core ID of cpuX. Typically it is the hardware platform's
-	identifier (rather than the kernel's). The actual value is
-	architecture and platform dependent.
-
-book_id:
-
-	the book ID of cpuX. Typically it is the hardware platform's
-	identifier (rather than the kernel's). The actual value is
-	architecture and platform dependent.
-
-drawer_id:
-
-	the drawer ID of cpuX. Typically it is the hardware platform's
-	identifier (rather than the kernel's). The actual value is
-	architecture and platform dependent.
-
-core_cpus:
-
-	internal kernel map of CPUs within the same core.
-	(deprecated name: "thread_siblings")
-
-core_cpus_list:
-
-	human-readable list of CPUs within the same core.
-	(deprecated name: "thread_siblings_list");
-
-package_cpus:
-
-	internal kernel map of the CPUs sharing the same physical_package_id.
-	(deprecated name: "core_siblings")
-
-package_cpus_list:
-
-	human-readable list of CPUs sharing the same physical_package_id.
-	(deprecated name: "core_siblings_list")
-
-die_cpus:
-
-	internal kernel map of CPUs within the same die.
-
-die_cpus_list:
-
-	human-readable list of CPUs within the same die.
-
-book_siblings:
-
-	internal kernel map of cpuX's hardware threads within the same
-	book_id.
-
-book_siblings_list:
-
-	human-readable list of cpuX's hardware threads within the same
-	book_id.
-
-drawer_siblings:
-
-	internal kernel map of cpuX's hardware threads within the same
-	drawer_id.
-
-drawer_siblings_list:
-
-	human-readable list of cpuX's hardware threads within the same
-	drawer_id.
-
-Architecture-neutral, drivers/base/topology.c, exports these attributes.
-However, the book and drawer related sysfs files will only be created if
-CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively.
-
-CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are currently only used on s390,
-where they reflect the cpu and cache hierarchy.
-
 For an architecture to support this feature, it must define some of
 these macros in include/asm-XXX/topology.h::
Both numa node and cpu use a cpu bitmap like 3,ffffffff to expose hardware topology. When the number of CPUs is large, the page buffer of sysfs will overflow. This doesn't really happen nowadays, as the maximum NR_CPUS is 8192 for X86_64 and 4096 for ARM64, and 8192 * 9 / 32 = 2304 is still smaller than the 4KB page size. So the existing BUILD_BUG_ON() in drivers/base/node.c is mostly guarding against future problems, similar to Y2K, as hardware gets more and more CPUs. On the other hand, it is more sensible to move the guard to common code which can protect both cpu and numa:
/sys/devices/system/cpu/cpu0/topology/die_cpus etc.
/sys/devices/system/node/node0/cpumap etc.
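The arithmetic above can be checked with a small userspace sketch (not part of this patch). It assumes 4KB pages and 9 bytes of output per 32 CPUs, and it mirrors the condition tested by the BUILD_BUG_ON(). SKETCH_PAGE_SIZE, cpumask_hex_bytes and cpumask_fits_page are hypothetical names used only for illustration.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4KB pages */

/* Bytes needed for the hex mask: 8 hex digits plus 1 separator per 32 CPUs. */
size_t cpumask_hex_bytes(unsigned int nr_cpus)
{
	return (size_t)nr_cpus / 32 * 9;
}

/*
 * Run-time form of the guard: the BUILD_BUG_ON() fires when
 * (NR_CPUS/32 * 9) > (PAGE_SIZE-1), so the mask fits when the
 * opposite holds.
 */
int cpumask_fits_page(unsigned int nr_cpus)
{
	return cpumask_hex_bytes(nr_cpus) <= SKETCH_PAGE_SIZE - 1;
}

/* Compile-time form for userspace C11, using 8192 CPUs as the example. */
_Static_assert(8192 / 32 * 9 <= SKETCH_PAGE_SIZE - 1,
	       "hex cpumask for 8192 CPUs must fit in one page");
```

Under these assumptions 8192 CPUs need 2304 bytes and 4096 CPUs need 1152 bytes, both well below 4095, while the condition first fails at 14592 CPUs (456 * 9 = 4104).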
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
 drivers/base/node.c     | 3 ---
 include/linux/cpumask.h | 9 ++++++++-
 2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 2c36f61d..e24530c3 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -33,9 +33,6 @@ static ssize_t node_read_cpumap(struct device *dev, bool list, char *buf)
 	cpumask_var_t mask;
 	struct node *node_dev = to_node(dev);
 
-	/* 2008/04/07: buf currently PAGE_SIZE, need 9 chars per 32 bits. */
-	BUILD_BUG_ON((NR_CPUS/32 * 9) > (PAGE_SIZE-1));
-
 	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
 		return 0;
 
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index bfc4690..7e3ccdc 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -13,6 +13,8 @@
 #include <linux/atomic.h>
 #include <linux/bug.h>
 
+#include <asm/page.h>
+
 /* Don't assign or return these: may not be this big! */
 typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
 
@@ -979,8 +981,13 @@ static inline bool cpu_dying(unsigned int cpu)
 static inline ssize_t
 cpumap_print_to_pagebuf(bool list, char *buf, const struct cpumask *mask)
 {
+	/*
+	 * 32bits requires 9bytes: "ff,ffffffff", thus, too many CPUs will
+	 * cause the overflow of sysfs pagebuf
+	 */
+	BUILD_BUG_ON((NR_CPUS/32 * 9) > (PAGE_SIZE-1));
 	return bitmap_print_to_pagebuf(list, buf, cpumask_bits(mask),
-				      nr_cpu_ids);
+					nr_cpu_ids);
 }
 
#if NR_CPUS <= BITS_PER_LONG