*** BLURB HERE ***
Waiman Long (3): cgroup/cpuset: Documentation update for partition cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partition cgroup/cpuset: Make cpuset.cpus.exclusive independent of cpuset.cpus
Documentation/admin-guide/cgroup-v2.rst | 127 ++++++++++++++++++------ kernel/cgroup/cpuset.c | 103 ++++++++++++++----- 2 files changed, 171 insertions(+), 59 deletions(-)
From: Waiman Long longman@redhat.com
mainline inclusion from mainline-v6.7-rc1 commit efdf7532bd3d302a96436beee153364f26a1ddae category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRJD CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
----------------------------------------------------------------------
This patch updates the cgroup-v2.rst file to include information about the new "cpuset.cpus.exclusive" and "cpuset.cpus.excluisve.effective" control files as well as the new remote partition type.
Signed-off-by: Waiman Long longman@redhat.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Chen Ridong chenridong@huawei.com --- Documentation/admin-guide/cgroup-v2.rst | 123 ++++++++++++++++++------ 1 file changed, 91 insertions(+), 32 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 61b31f209a25..30d1c26668d2 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -2266,6 +2266,49 @@ Cpuset Interface Files
Its value will be affected by memory nodes hotplug events.
+ cpuset.cpus.exclusive + A read-write multiple values file which exists on non-root + cpuset-enabled cgroups. + + It lists all the exclusive CPUs that are allowed to be used + to create a new cpuset partition. Its value is not used + unless the cgroup becomes a valid partition root. See the + "cpuset.cpus.partition" section below for a description of what + a cpuset partition is. + + When the cgroup becomes a partition root, the actual exclusive + CPUs that are allocated to that partition are listed in + "cpuset.cpus.exclusive.effective" which may be different + from "cpuset.cpus.exclusive". If "cpuset.cpus.exclusive" + has previously been set, "cpuset.cpus.exclusive.effective" + is always a subset of it. + + Users can manually set it to a value that is different from + "cpuset.cpus". The only constraint in setting it is that the + list of CPUs must be exclusive with respect to its sibling. + + For a parent cgroup, any one of its exclusive CPUs can only + be distributed to at most one of its child cgroups. Having an + exclusive CPU appearing in two or more of its child cgroups is + not allowed (the exclusivity rule). A value that violates the + exclusivity rule will be rejected with a write error. + + The root cgroup is a partition root and all its available CPUs + are in its exclusive CPU set. + + cpuset.cpus.exclusive.effective + A read-only multiple values file which exists on all non-root + cpuset-enabled cgroups. + + This file shows the effective set of exclusive CPUs that + can be used to create a partition root. The content of this + file will always be a subset of "cpuset.cpus" and its parent's + "cpuset.cpus.exclusive.effective" if its parent is not the root + cgroup. It will also be a subset of "cpuset.cpus.exclusive" + if it is set. If "cpuset.cpus.exclusive" is not set, it is + treated to have an implicit value of "cpuset.cpus" in the + formation of local partition. + cpuset.cpus.partition A read-write single value file which exists on non-root cpuset-enabled cgroups. This flag is owned by the parent cgroup @@ -2279,26 +2322,41 @@ Cpuset Interface Files "isolated" Partition root without load balancing ========== =====================================
- The root cgroup is always a partition root and its state - cannot be changed. All other non-root cgroups start out as - "member". + A cpuset partition is a collection of cpuset-enabled cgroups with + a partition root at the top of the hierarchy and its descendants + except those that are separate partition roots themselves and + their descendants. A partition has exclusive access to the + set of exclusive CPUs allocated to it. Other cgroups outside + of that partition cannot use any CPUs in that set. + + There are two types of partitions - local and remote. A local + partition is one whose parent cgroup is also a valid partition + root. A remote partition is one whose parent cgroup is not a + valid partition root itself. Writing to "cpuset.cpus.exclusive" + is optional for the creation of a local partition as its + "cpuset.cpus.exclusive" file will assume an implicit value that + is the same as "cpuset.cpus" if it is not set. Writing the + proper "cpuset.cpus.exclusive" values down the cgroup hierarchy + before the target partition root is mandatory for the creation + of a remote partition. + + Currently, a remote partition cannot be created under a local + partition. All the ancestors of a remote partition root except + the root cgroup cannot be a partition root. + + The root cgroup is always a partition root and its state cannot + be changed. All other non-root cgroups start out as "member".
When set to "root", the current cgroup is the root of a new - partition or scheduling domain that comprises itself and all - its descendants except those that are separate partition roots - themselves and their descendants. + partition or scheduling domain. The set of exclusive CPUs is + determined by the value of its "cpuset.cpus.exclusive.effective".
- When set to "isolated", the CPUs in that partition root will + When set to "isolated", the CPUs in that partition will be in an isolated state without any load balancing from the scheduler. Tasks placed in such a partition with multiple CPUs should be carefully distributed and bound to each of the individual CPUs for optimal performance.
- The value shown in "cpuset.cpus.effective" of a partition root - is the CPUs that the partition root can dedicate to a potential - new child partition root. The new child subtracts available - CPUs from its parent "cpuset.cpus.effective". - A partition root ("root" or "isolated") can be in one of the two possible states - valid or invalid. An invalid partition root is in a degraded state where some state information may @@ -2321,37 +2379,33 @@ Cpuset Interface Files In the case of an invalid partition root, a descriptive string on why the partition is invalid is included within parentheses.
- For a partition root to become valid, the following conditions + For a local partition root to be valid, the following conditions must be met.
- 1) The "cpuset.cpus" is exclusive with its siblings , i.e. they - are not shared by any of its siblings (exclusivity rule). - 2) The parent cgroup is a valid partition root. - 3) The "cpuset.cpus" is not empty and must contain at least - one of the CPUs from parent's "cpuset.cpus", i.e. they overlap. - 4) The "cpuset.cpus.effective" cannot be empty unless there is + 1) The parent cgroup is a valid partition root. + 2) The "cpuset.cpus.exclusive.effective" file cannot be empty, + though it may contain offline CPUs. + 3) The "cpuset.cpus.effective" cannot be empty unless there is no task associated with this partition.
- External events like hotplug or changes to "cpuset.cpus" can - cause a valid partition root to become invalid and vice versa. - Note that a task cannot be moved to a cgroup with empty - "cpuset.cpus.effective". + For a remote partition root to be valid, all the above conditions + except the first one must be met.
- For a valid partition root with the sibling cpu exclusivity - rule enabled, changes made to "cpuset.cpus" that violate the - exclusivity rule will invalidate the partition as well as its - sibling partitions with conflicting cpuset.cpus values. So - care must be taking in changing "cpuset.cpus". + External events like hotplug or changes to "cpuset.cpus" or + "cpuset.cpus.exclusive" can cause a valid partition root to + become invalid and vice versa. Note that a task cannot be + moved to a cgroup with empty "cpuset.cpus.effective".
A valid non-root parent partition may distribute out all its CPUs - to its child partitions when there is no task associated with it. + to its child local partitions when there is no task associated + with it.
- Care must be taken to change a valid partition root to - "member" as all its child partitions, if present, will become + Care must be taken to change a valid partition root to "member" + as all its child local partitions, if present, will become invalid causing disruption to tasks running in those child partitions. These inactivated partitions could be recovered if their parent is switched back to a partition root with a proper - set of "cpuset.cpus". + value in "cpuset.cpus" or "cpuset.cpus.exclusive".
Poll and inotify events are triggered whenever the state of "cpuset.cpus.partition" changes. That includes changes caused @@ -2361,6 +2415,11 @@ Cpuset Interface Files to "cpuset.cpus.partition" without the need to do continuous polling.
+ A user can pre-configure certain CPUs to an isolated state + with load balancing disabled at boot time with the "isolcpus" + kernel boot command line option. If those CPUs are to be put + into a partition, they have to be used in an isolated partition. +
Device controller -----------------
From: Waiman Long longman@redhat.com
mainline inclusion from mainline-v6.11-rc1 commit fe8cd2736e75c8ca3aed1ef181a834e41dc5310f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRJD CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
----------------------------------------------------------------------
The CS_CPU_EXCLUSIVE flag is currently set whenever cpuset.cpus.exclusive is set to make sure that the exclusivity test will be run to ensure its exclusiveness. At the same time, this flag can be changed whenever the partition root state is changed. For example, the CS_CPU_EXCLUSIVE flag will be reset whenever a partition root becomes invalid. This makes using CS_CPU_EXCLUSIVE to ensure exclusiveness a bit fragile.
The current scheme also makes setting up a cpuset.cpus.exclusive hierarchy to enable remote partition harder as cpuset.cpus.exclusive cannot overlap with any cpuset.cpus of sibling cpusets if their cpuset.cpus.exclusive aren't set.
Solve these issues by deferring the setting of CS_CPU_EXCLUSIVE flag until the cpuset become a valid partition root while adding new checks in validate_change() to ensure that cpuset.cpus.exclusive of sibling cpusets cannot overlap.
An additional check is also added to validate_change() to make sure that cpuset.cpus of one cpuset cannot be a subset of cpuset.cpus.exclusive of a sibling cpuset to avoid the problem that none of those CPUs will be available when these exclusive CPUs are extracted out to a newly enabled partition root. The Documentation/admin-guide/cgroup-v2.rst file is updated to document the new constraints.
Signed-off-by: Waiman Long longman@redhat.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Chen Ridong chenridong@huawei.com --- Documentation/admin-guide/cgroup-v2.rst | 8 ++++-- kernel/cgroup/cpuset.c | 36 ++++++++++++++++++++----- 2 files changed, 35 insertions(+), 9 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 30d1c26668d2..69905668e06f 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -2284,8 +2284,12 @@ Cpuset Interface Files is always a subset of it.
Users can manually set it to a value that is different from - "cpuset.cpus". The only constraint in setting it is that the - list of CPUs must be exclusive with respect to its sibling. + "cpuset.cpus". One constraint in setting it is that the list of + CPUs must be exclusive with respect to "cpuset.cpus.exclusive" + of its sibling. If "cpuset.cpus.exclusive" of a sibling cgroup + isn't set, its "cpuset.cpus" value, if set, cannot be a subset + of it to leave at least one CPU available when the exclusive + CPUs are taken away.
For a parent cgroup, any one of its exclusive CPUs can only be distributed to at most one of its child cgroups. Having an diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index f7870857d1dd..26e95916ee8e 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -862,17 +862,41 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
/* * If either I or some sibling (!= me) is exclusive, we can't - * overlap + * overlap. exclusive_cpus cannot overlap with each other if set. */ ret = -EINVAL; cpuset_for_each_child(c, css, par) { - if ((is_cpu_exclusive(trial) || is_cpu_exclusive(c)) && - c != cur) { + bool txset, cxset; /* Are exclusive_cpus set? */ + + if (c == cur) + continue; + + txset = !cpumask_empty(trial->exclusive_cpus); + cxset = !cpumask_empty(c->exclusive_cpus); + if (is_cpu_exclusive(trial) || is_cpu_exclusive(c) || + (txset && cxset)) { if (!cpusets_are_exclusive(trial, c)) goto out; + } else if (txset || cxset) { + struct cpumask *xcpus, *acpus; + + /* + * When just one of the exclusive_cpus's is set, + * cpus_allowed of the other cpuset, if set, cannot be + * a subset of it or none of those CPUs will be + * available if these exclusive CPUs are activated. + */ + if (txset) { + xcpus = trial->exclusive_cpus; + acpus = c->cpus_allowed; + } else { + xcpus = c->exclusive_cpus; + acpus = trial->cpus_allowed; + } + if (!cpumask_empty(acpus) && cpumask_subset(acpus, xcpus)) + goto out; } if ((is_mem_exclusive(trial) || is_mem_exclusive(c)) && - c != cur && nodes_intersects(trial->mems_allowed, c->mems_allowed)) goto out; } @@ -1437,7 +1461,7 @@ static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs, */ static int update_partition_exclusive(struct cpuset *cs, int new_prs) { - bool exclusive = (new_prs > 0); + bool exclusive = (new_prs > PRS_MEMBER);
if (exclusive && !is_cpu_exclusive(cs)) { if (update_flag(CS_CPU_EXCLUSIVE, cs, 1)) @@ -2594,8 +2618,6 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs, retval = cpulist_parse(buf, trialcs->exclusive_cpus); if (retval < 0) return retval; - if (!is_cpu_exclusive(cs)) - set_bit(CS_CPU_EXCLUSIVE, &trialcs->flags); }
/* Nothing to do if the CPUs didn't change */
From: Waiman Long longman@redhat.com
mainline inclusion from mainline-v6.11-rc commit 737bb142a00d53de3743ae389732721b3b9f0191 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRJD CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
----------------------------------------------------------------------
The "cpuset.cpus.exclusive.effective" value is currently limited to a subset of its "cpuset.cpus". This makes the exclusive CPUs distribution hierarchy subsumed within the larger "cpuset.cpus" hierarchy. We have to decide on what CPUs are used locally and what CPUs can be passed down as exclusive CPUs down the hierarchy and combine them into "cpuset.cpus".
The advantage of the current scheme is to have only one hierarchy to worry about. However, it make it harder to use as all the "cpuset.cpus" values have to be properly set along the way down to the designated remote partition root. It also makes it more cumbersome to find out what CPUs can be used locally.
Make creation of remote partition simpler by breaking the dependency of "cpuset.cpus.exclusive" on "cpuset.cpus" and make them independent entities. Now we have two separate hierarchies - one for setting "cpuset.cpus.effective" and the other one for setting "cpuset.cpus.exclusive.effective". We may not need to set "cpuset.cpus" when we activate a partition root anymore.
Also update Documentation/admin-guide/cgroup-v2.rst and cpuset.c comment to document this change.
Suggested-by: Petr Malat oss@malat.biz Signed-off-by: Waiman Long longman@redhat.com Signed-off-by: Tejun Heo tj@kernel.org
Conflicts: kernel/cgroup/cpuset.c [isolated was not backported] Signed-off-by: Chen Ridong chenridong@huawei.com --- Documentation/admin-guide/cgroup-v2.rst | 4 +- kernel/cgroup/cpuset.c | 67 +++++++++++++++++-------- 2 files changed, 49 insertions(+), 22 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 69905668e06f..86e294970484 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -2305,8 +2305,8 @@ Cpuset Interface Files cpuset-enabled cgroups.
This file shows the effective set of exclusive CPUs that - can be used to create a partition root. The content of this - file will always be a subset of "cpuset.cpus" and its parent's + can be used to create a partition root. The content + of this file will always be a subset of its parent's "cpuset.cpus.exclusive.effective" if its parent is not the root cgroup. It will also be a subset of "cpuset.cpus.exclusive" if it is set. If "cpuset.cpus.exclusive" is not set, it is diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 26e95916ee8e..52ef54b1b268 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -86,7 +86,7 @@ static const char * const perr_strings[] = { [PERR_NOTEXCL] = "Cpu list in cpuset.cpus not exclusive", [PERR_NOCPUS] = "Parent unable to distribute cpu downstream", [PERR_HOTPLUG] = "No cpu available due to hotplug", - [PERR_CPUSEMPTY] = "cpuset.cpus is empty", + [PERR_CPUSEMPTY] = "cpuset.cpus and cpuset.cpus.exclusive are empty", [PERR_HKEEPING] = "partition config conflicts with housekeeping setup", };
@@ -129,19 +129,28 @@ struct cpuset { /* * Exclusive CPUs dedicated to current cgroup (default hierarchy only) * - * This exclusive CPUs must be a subset of cpus_allowed. A parent - * cgroup can only grant exclusive CPUs to one of its children. + * The effective_cpus of a valid partition root comes solely from its + * effective_xcpus and some of the effective_xcpus may be distributed + * to sub-partitions below & hence excluded from its effective_cpus. + * For a valid partition root, its effective_cpus have no relationship + * with cpus_allowed unless its exclusive_cpus isn't set. * - * When the cgroup becomes a valid partition root, effective_xcpus - * defaults to cpus_allowed if not set. The effective_cpus of a valid - * partition root comes solely from its effective_xcpus and some of the - * effective_xcpus may be distributed to sub-partitions below & hence - * excluded from its effective_cpus. + * This value will only be set if either exclusive_cpus is set or + * when this cpuset becomes a local partition root. */ cpumask_var_t effective_xcpus;
/* * Exclusive CPUs as requested by the user (default hierarchy only) + * + * Its value is independent of cpus_allowed and designates the set of + * CPUs that can be granted to the current cpuset or its children when + * it becomes a valid partition root. The effective set of exclusive + * CPUs granted (effective_xcpus) depends on whether those exclusive + * CPUs are passed down by its ancestors and not yet taken up by + * another sibling partition root along the way. + * + * If its value isn't set, it defaults to cpus_allowed. */ cpumask_var_t exclusive_cpus;
@@ -232,6 +241,17 @@ static struct list_head remote_children; * 2 - partition root without load balancing (isolated) * -1 - invalid partition root * -2 - invalid isolated partition root + * + * There are 2 types of partitions - local or remote. Local partitions are + * those whose parents are partition root themselves. Setting of + * cpuset.cpus.exclusive are optional in setting up local partitions. + * Remote partitions are those whose parents are not partition roots. Passing + * down exclusive CPUs by setting cpuset.cpus.exclusive along its ancestor + * nodes are mandatory in creating a remote partition. + * + * For simplicity, a local partition can be created under a local or remote + * partition but a remote partition cannot have any partition root in its + * ancestor chain except the cgroup root. */ #define PRS_MEMBER 0 #define PRS_ROOT 1 @@ -740,6 +760,19 @@ static inline void free_cpuset(struct cpuset *cs) kfree(cs); }
+/* Return user specified exclusive CPUs */ +static inline struct cpumask *user_xcpus(struct cpuset *cs) +{ + return cpumask_empty(cs->exclusive_cpus) ? cs->cpus_allowed + : cs->exclusive_cpus; +} + +static inline bool xcpus_empty(struct cpuset *cs) +{ + return cpumask_empty(cs->cpus_allowed) && + cpumask_empty(cs->exclusive_cpus); +} + static inline struct cpumask *fetch_xcpus(struct cpuset *cs) { return !cpumask_empty(cs->exclusive_cpus) ? cs->exclusive_cpus : @@ -1552,7 +1585,7 @@ static void reset_partition_data(struct cpuset *cs) * Return: true if xcpus is not empty, false otherwise. * * Starting with exclusive_cpus (cpus_allowed if exclusive_cpus is not set), - * it must be a subset of cpus_allowed and parent's effective_xcpus. + * it must be a subset of parent's effective_xcpus. */ static bool compute_effective_exclusive_cpumask(struct cpuset *cs, struct cpumask *xcpus) @@ -1562,12 +1595,7 @@ static bool compute_effective_exclusive_cpumask(struct cpuset *cs, if (!xcpus) xcpus = cs->effective_xcpus;
- if (!cpumask_empty(cs->exclusive_cpus)) - cpumask_and(xcpus, cs->exclusive_cpus, cs->cpus_allowed); - else - cpumask_copy(xcpus, cs->cpus_allowed); - - return cpumask_and(xcpus, xcpus, parent->effective_xcpus); + return cpumask_and(xcpus, user_xcpus(cs), parent->effective_xcpus); }
static inline bool is_remote_partition(struct cpuset *cs) @@ -1853,8 +1881,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd, */ adding = deleting = false; old_prs = new_prs = cs->partition_root_state; - xcpus = !cpumask_empty(cs->exclusive_cpus) - ? cs->effective_xcpus : cs->cpus_allowed; + xcpus = user_xcpus(cs);
if (cmd == partcmd_invalidate) { if (is_prs_invalid(old_prs)) @@ -1882,7 +1909,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd, return is_partition_invalid(parent) ? PERR_INVPARENT : PERR_NOTPART; } - if (!newmask && cpumask_empty(cs->cpus_allowed)) + if (!newmask && xcpus_empty(cs)) return PERR_CPUSEMPTY;
nocpu = tasks_nocpu_error(parent, cs, xcpus); @@ -3100,9 +3127,9 @@ static int update_prstate(struct cpuset *cs, int new_prs)
if (!old_prs) { /* - * cpus_allowed cannot be empty. + * cpus_allowed and exclusive_cpus cannot be both empty. */ - if (cpumask_empty(cs->cpus_allowed)) { + if (xcpus_empty(cs)) { err = PERR_CPUSEMPTY; goto out; }
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/10470 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/7...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/10470 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/7...