Cheng Jian (2): membarrier/kabi: fix kabi for membarrier_state selftest/membarrier: fix build error
Mathieu Desnoyers (5): sched/membarrier: Remove redundant check sched/membarrier: Fix p->mm->membarrier_state racy load selftests, sched/membarrier: Add multi-threaded test sched/membarrier: Skip IPIs when mm->mm_users == 1 sched/membarrier: Return -ENOMEM to userspace on memory allocation failure
Peter Zijlstra (2): sched: Clean up active_mm reference counting membarrier: Fix RCU locking bug caused by faulty merge
Tyler Hicks (1): drm/i915: Fix use-after-free when destroying GEM context
Xiongfeng Wang (2): PCI: add a member in 'struct pci_bus' to record the original 'pci_ops' PCI: fix kabi change in struct pci_bus
Ying Fang (2): KVM: arm/arm64: use esr_ec as trace field of kvm_exit tracepoint KVM: tools/kvm_stat: Fix kvm_exit filter name
drivers/gpu/drm/i915/i915_gem_context.c | 13 +- drivers/pci/probe.c | 7 +- fs/exec.c | 2 +- include/linux/mm_types.h | 8 + include/linux/pci.h | 5 +- include/linux/sched/mm.h | 8 +- kernel/sched/core.c | 49 ++-- kernel/sched/membarrier.c | 236 +++++++++------ kernel/sched/sched.h | 37 +++ tools/kvm/kvm_stat/kvm_stat | 8 +- tools/testing/selftests/membarrier/.gitignore | 3 +- tools/testing/selftests/membarrier/Makefile | 4 +- .../testing/selftests/membarrier/membarrier_test.c | 312 -------------------- .../selftests/membarrier/membarrier_test_impl.h | 317 +++++++++++++++++++++ .../membarrier/membarrier_test_multi_thread.c | 72 +++++ .../membarrier/membarrier_test_single_thread.c | 23 ++ virt/kvm/arm/arm.c | 2 +- virt/kvm/arm/trace.h | 14 +- 18 files changed, 678 insertions(+), 442 deletions(-) delete mode 100644 tools/testing/selftests/membarrier/membarrier_test.c create mode 100644 tools/testing/selftests/membarrier/membarrier_test_impl.h create mode 100644 tools/testing/selftests/membarrier/membarrier_test_multi_thread.c create mode 100644 tools/testing/selftests/membarrier/membarrier_test_single_thread.c
From: Ying Fang <Ying Fang>
euleros inclusion category: bugfix
-------------------------------------------------
The filter name 'exit_reason' is only applicable to x86, however kvm_stat use this trace fileld to filter Guest trap info. It will fail to call ioctl(fd, SET_FILTER, fileld_name) on a different architecture other than x86.
This patch makes arm64 use 'esr_ec' as the trace fileld of kvm_exit tracepoint, so that kvm_stat will work.
Signed-off-by: Ying Fang fangying1@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/arm.c | 2 +- virt/kvm/arm/trace.h | 14 +++++++------- 2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index a54c221..d1fef36 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -853,7 +853,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) * guest time. */ guest_exit(); - trace_kvm_exit(vcpu->vcpu_id, ret, *vcpu_pc(vcpu)); + trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
/* Exit types that need handling before we can be preempted */ handle_exit_early(vcpu, run, ret); diff --git a/virt/kvm/arm/trace.h b/virt/kvm/arm/trace.h index e32fee7..674fda1 100644 --- a/virt/kvm/arm/trace.h +++ b/virt/kvm/arm/trace.h @@ -28,24 +28,24 @@ );
TRACE_EVENT(kvm_exit, - TP_PROTO(unsigned int vcpu_id, int ret, unsigned long vcpu_pc), - TP_ARGS(vcpu_id, ret, vcpu_pc), + TP_PROTO(int ret, unsigned int esr_ec, unsigned long vcpu_pc), + TP_ARGS(ret, esr_ec, vcpu_pc),
TP_STRUCT__entry( - __field( unsigned int, vcpu_id ) __field( int, ret ) + __field( unsigned int, esr_ec ) __field( unsigned long, vcpu_pc ) ),
TP_fast_assign( - __entry->vcpu_id = vcpu_id; __entry->ret = ARM_EXCEPTION_CODE(ret); + __entry->esr_ec = ARM_EXCEPTION_IS_TRAP(ret) ? esr_ec : 0; __entry->vcpu_pc = vcpu_pc; ), - - TP_printk("VCPU %u: exit_type=%s, PC=0x%08lx", - __entry->vcpu_id, + TP_printk("%s: HSR_EC: 0x%04x (%s), PC: 0x%08lx", __print_symbolic(__entry->ret, kvm_arm_exception_type), + __entry->esr_ec, + __print_symbolic(__entry->esr_ec, kvm_arm_exception_class), __entry->vcpu_pc) );
From: Ying Fang <Ying Fang>
euleros inclusion category: bugfix
picked from: https://patchwork.kernel.org/patch/11296413/
-------------------------------------------------
The filter name is fixed to "exit_reason" for some kvm_exit events, no matter what architect we have. Actually, the filter name ("exit_reason") is only applicable to x86, meaning it's broken on other architects including aarch64.
This fixes the issue by providing various kvm_exit filter names, depending on architect we're on. Afterwards, the variable filter name is picked and applied by ioctl(fd, SET_FILTER).
Reported-by: Andrew Jones drjones@redhat.com Signed-off-by: Gavin Shan gshan@redhat.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- tools/kvm/kvm_stat/kvm_stat | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/tools/kvm/kvm_stat/kvm_stat b/tools/kvm/kvm_stat/kvm_stat index ba7ee74..f6ca0a2 100755 --- a/tools/kvm/kvm_stat/kvm_stat +++ b/tools/kvm/kvm_stat/kvm_stat @@ -271,6 +271,7 @@ class ArchX86(Arch): def __init__(self, exit_reasons): self.sc_perf_evt_open = 298 self.ioctl_numbers = IOCTL_NUMBERS + self.exit_reason_field = 'exit_reason' self.exit_reasons = exit_reasons
def debugfs_is_child(self, field): @@ -290,6 +291,7 @@ class ArchPPC(Arch): # numbers depend on the wordsize. char_ptr_size = ctypes.sizeof(ctypes.c_char_p) self.ioctl_numbers['SET_FILTER'] = 0x80002406 | char_ptr_size << 16 + self.exit_reason_field = 'exit_nr' self.exit_reasons = {}
def debugfs_is_child(self, field): @@ -301,6 +303,7 @@ class ArchA64(Arch): def __init__(self): self.sc_perf_evt_open = 241 self.ioctl_numbers = IOCTL_NUMBERS + self.exit_reason_field = 'esr_ec' self.exit_reasons = AARCH64_EXIT_REASONS
def debugfs_is_child(self, field): @@ -312,6 +315,7 @@ class ArchS390(Arch): def __init__(self): self.sc_perf_evt_open = 331 self.ioctl_numbers = IOCTL_NUMBERS + self.exit_reason_field = None self.exit_reasons = None
def debugfs_is_child(self, field): @@ -542,8 +546,8 @@ class TracepointProvider(Provider): """ filters = {} filters['kvm_userspace_exit'] = ('reason', USERSPACE_EXIT_REASONS) - if ARCH.exit_reasons: - filters['kvm_exit'] = ('exit_reason', ARCH.exit_reasons) + if ARCH.exit_reason_field and ARCH.exit_reasons: + filters['kvm_exit'] = (ARCH.exit_reason_field, ARCH.exit_reasons) return filters
def _get_available_fields(self):
From: Xiongfeng Wang wangxiongfeng2@huawei.com
hulk inclusion category: bugfix bugzilla: NA CVE: NA
-------------------------------------------------
When I test 'aer-inject' with the following procedures: 1. inject a fatal error into a upstream PCI bridge 2. remove the upstream bridge by sysfs 3. rescan the PCI tree by 'echo 1 > /sys/bus/pci/rescan' 4. execute command 'rmmod aer-inject' 5. remove the upstream bridge by sysfs again
I came across the following Oops.
[ 799.713238] Internal error: Oops: 96000007 [#1] SMP [ 799.718099] Process bash (pid: 10683, stack limit = 0x00000000125a3b1b) [ 799.724686] CPU: 108 PID: 10683 Comm: bash Kdump: loaded Not tainted 4.19.36 #2 [ 799.731962] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.05 09/18/2019 [ 799.739325] pstate: 40400009 (nZcv daif +PAN -UAO) [ 799.744104] pc : pci_remove_bus+0xc0/0x1c0 [ 799.748182] lr : pci_remove_bus+0x94/0x1c0 [ 799.752260] sp : ffffa02e335df940 [ 799.755560] x29: ffffa02e335df940 x28: ffff2000088216a8 [ 799.760849] x27: 1ffff405c66bbfbc x26: ffff20000a9518c0 [ 799.766139] x25: ffffa02dea6ec418 x24: 1ffff405bd4dd883 [ 799.771427] x23: ffffa02e72576628 x22: 1ffff405ce4aecc0 [ 799.776715] x21: ffffa02e72576608 x20: ffff200002e75080 [ 799.782003] x19: ffffa02e72576600 x18: 0000000000000000 [ 799.787291] x17: 0000000000000000 x16: 0000000000000000 [ 799.792578] x15: 0000000000000001 x14: dfff200000000000 [ 799.797866] x13: ffff20000a6dfaf0 x12: 0000000000000000 [ 799.803154] x11: 1fffe4000159b217 x10: ffff04000159b217 [ 799.808442] x9 : dfff200000000000 x8 : ffff20000acd90bf [ 799.813730] x7 : 0000000000000000 x6 : 0000000000000000 [ 799.819017] x5 : 0000000000000001 x4 : 0000000000000000 [ 799.824306] x3 : 1ffff405dbe62603 x2 : 1fffe400005cea11 [ 799.829593] x1 : dfff200000000000 x0 : ffff200002e75088 [ 799.834882] Call trace: [ 799.837323] pci_remove_bus+0xc0/0x1c0 [ 799.841056] pci_remove_bus_device+0xd0/0x2f0 [ 799.845392] pci_stop_and_remove_bus_device_locked+0x2c/0x40 [ 799.851028] remove_store+0x1b8/0x1d0 [ 799.854679] dev_attr_store+0x60/0x80 [ 799.858330] sysfs_kf_write+0x104/0x170 [ 799.862149] kernfs_fop_write+0x23c/0x430 [ 799.866143] __vfs_write+0xec/0x4e0 [ 799.869615] vfs_write+0x12c/0x3d0 [ 799.873001] ksys_write+0xd0/0x190 [ 799.876389] __arm64_sys_write+0x70/0xa0 [ 799.880298] el0_svc_common+0xfc/0x278 [ 799.884030] el0_svc_handler+0x50/0xc0 [ 799.887764] el0_svc+0x8/0xc [ 799.890634] Code: d2c40001 f2fbffe1 91002280 d343fc02 (38e16841) [ 799.896700] kernel fault(0x1) notification starting on CPU 108
It is because when we alloc a new bus in rescanning process, the 'pci_ops' of the newly allocced 'pci_bus' is inherited from its parent pci bus. Whereas, the 'pci_ops' of the parent bus may be changed to 'aer_inj_pci_ops' in 'aer_inject()'. When we unload the module 'aer_inject', we only restore the 'pci_ops' for the pci bus of the error-injected device and the root port in 'aer_inject_exit'. After we have unloaded the module, the 'pci_ops' of the newly allocced pci bus is still 'aer_inj_pci_ops'. When we access it, an Oops happened.
This patch add a member 'backup_ops' in 'struct pci_bus' to record the original 'ops'. When we alloc a child pci bus, we assign the 'backup_ops' of the parent bus to the 'ops' of the child bus.
Maybe the best way is to not modify the 'pci_ops' in 'struct pci_bus', but this will refactor the 'aer_inject' framework a lot. I haven't found a better way to handle it.
Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com Reviewed-by: Hanjun Guo guohanjun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/pci/probe.c | 7 ++++++- include/linux/pci.h | 1 + 2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 1320b48..95d7a5f 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -848,6 +848,7 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge) bus->sysdata = bridge->sysdata; bus->msi = bridge->msi; bus->ops = bridge->ops; + bus->backup_ops = bus->ops; bus->number = bus->busn_res.start = bridge->busnr; #ifdef CONFIG_PCI_DOMAINS_GENERIC bus->domain_nr = pci_bus_find_domain_nr(bus, parent); @@ -993,7 +994,11 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent, return NULL;
child->parent = parent; - child->ops = parent->ops; + if (parent->backup_ops) + child->ops = parent->backup_ops; + else + child->ops = parent->ops; + child->backup_ops = child->ops; child->msi = parent->msi; child->sysdata = parent->sysdata; child->bus_flags = parent->bus_flags; diff --git a/include/linux/pci.h b/include/linux/pci.h index 1221ff8..c97de5c 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -584,6 +584,7 @@ struct pci_bus { struct resource busn_res; /* Bus numbers routed to this bus */
struct pci_ops *ops; /* Configuration access functions */ + struct pci_ops *backup_ops; struct msi_controller *msi; /* MSI controller */ void *sysdata; /* Hook for sys-specific extension */ struct proc_dir_entry *procdir; /* Directory entry in /proc/bus/pci */
From: Xiongfeng Wang wangxiongfeng2@huawei.com
hulk inclusion category: bugfix bugzilla: NA CVE: NA
---------------------------
Fix kabi change in struct pci_bus since the following patch. 12de28f380a9 ("PCI: add a member in 'struct pci_bus' to record the original 'pci_ops'")
Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com Reviewed-by: Hanjun Guo guohanjun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- include/linux/pci.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/pci.h b/include/linux/pci.h index c97de5c..eb85ed6 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -584,7 +584,6 @@ struct pci_bus { struct resource busn_res; /* Bus numbers routed to this bus */
struct pci_ops *ops; /* Configuration access functions */ - struct pci_ops *backup_ops; struct msi_controller *msi; /* MSI controller */ void *sysdata; /* Hook for sys-specific extension */ struct proc_dir_entry *procdir; /* Directory entry in /proc/bus/pci */ @@ -606,8 +605,11 @@ struct pci_bus { struct bin_attribute *legacy_io; /* Legacy I/O for this bus */ struct bin_attribute *legacy_mem; /* Legacy mem */ unsigned int is_added:1; - +#ifndef __GENKSYMS__ + struct pci_ops *backup_ops; +#else KABI_RESERVE(1) +#endif KABI_RESERVE(2) KABI_RESERVE(3) KABI_RESERVE(4)
From: Tyler Hicks tyhicks@canonical.com
This patch is a simplified fix to address a use-after-free in 4.14.x and 4.19.x stable kernels. The flaw is already fixed upstream, starting in 5.2, by commit 7dc40713618c ("drm/i915: Introduce a mutex for file_priv->context_idr") as part of a more complex patch series that isn't appropriate for backporting to stable kernels.
Expand mutex coverage, while destroying the GEM context, to include the GEM context lookup step. This fixes a use-after-free detected by KASAN:
================================================================== BUG: KASAN: use-after-free in i915_ppgtt_close+0x2ca/0x2f0 Write of size 1 at addr ffff8881368a8368 by task i915-poc/3124
CPU: 0 PID: 3124 Comm: i915-poc Not tainted 4.14.164 #1 Hardware name: HP HP Elite x2 1012 G1 /80FC, BIOS N85 Ver. 01.20 04/05/2017 Call Trace: dump_stack+0xcd/0x12e ? _atomic_dec_and_lock+0x1b2/0x1b2 ? i915_ppgtt_close+0x2ca/0x2f0 ? printk+0x8f/0xab ? show_regs_print_info+0x53/0x53 ? i915_ppgtt_close+0x2ca/0x2f0 print_address_description+0x65/0x270 ? i915_ppgtt_close+0x2ca/0x2f0 kasan_report+0x251/0x340 i915_ppgtt_close+0x2ca/0x2f0 ? __radix_tree_insert+0x3f0/0x3f0 ? i915_ppgtt_init_hw+0x7c0/0x7c0 context_close+0x42e/0x680 ? i915_gem_context_release+0x230/0x230 ? kasan_kmalloc+0xa0/0xd0 ? radix_tree_delete_item+0x1d4/0x250 ? radix_tree_lookup+0x10/0x10 ? inet_recvmsg+0x4b0/0x4b0 ? kasan_slab_free+0x88/0xc0 i915_gem_context_destroy_ioctl+0x236/0x300 ? i915_gem_context_create_ioctl+0x360/0x360 ? drm_dev_printk+0x1d0/0x1d0 ? memcpy+0x34/0x50 ? i915_gem_context_create_ioctl+0x360/0x360 drm_ioctl_kernel+0x1b0/0x2b0 ? drm_ioctl_permit+0x2a0/0x2a0 ? avc_ss_reset+0xd0/0xd0 drm_ioctl+0x6fe/0xa20 ? i915_gem_context_create_ioctl+0x360/0x360 ? drm_getstats+0x20/0x20 ? put_unused_fd+0x260/0x260 do_vfs_ioctl+0x189/0x12d0 ? ioctl_preallocate+0x280/0x280 ? selinux_file_ioctl+0x3a7/0x680 ? selinux_bprm_set_creds+0xe30/0xe30 ? security_file_ioctl+0x69/0xa0 ? selinux_bprm_set_creds+0xe30/0xe30 SyS_ioctl+0x6f/0x80 ? __sys_sendmmsg+0x4a0/0x4a0 ? do_vfs_ioctl+0x12d0/0x12d0 do_syscall_64+0x214/0x5f0 ? __switch_to_asm+0x31/0x60 ? __switch_to_asm+0x25/0x60 ? __switch_to_asm+0x31/0x60 ? syscall_return_slowpath+0x2c0/0x2c0 ? copy_overflow+0x20/0x20 ? __switch_to_asm+0x25/0x60 ? syscall_return_via_sysret+0x2a/0x7a ? prepare_exit_to_usermode+0x200/0x200 ? __switch_to_asm+0x31/0x60 ? __switch_to_asm+0x31/0x60 ? __switch_to_asm+0x25/0x60 ? __switch_to_asm+0x25/0x60 ? __switch_to_asm+0x31/0x60 ? __switch_to_asm+0x25/0x60 ? __switch_to_asm+0x31/0x60 ? __switch_to_asm+0x31/0x60 ? __switch_to_asm+0x25/0x60 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7f7fda5115d7 RSP: 002b:00007f7eec317ec8 EFLAGS: 00000286 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7fda5115d7 RDX: 000055b306db9188 RSI: 000000004008646e RDI: 0000000000000003 RBP: 00007f7eec317ef0 R08: 00007f7eec318700 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000286 R12: 00007f7eec317fc0 R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffd8007ade0
Allocated by task 2898: save_stack+0x32/0xb0 kasan_kmalloc+0xa0/0xd0 kmem_cache_alloc_trace+0x5e/0x180 i915_ppgtt_create+0xab/0x2510 i915_gem_create_context+0x981/0xf90 i915_gem_context_create_ioctl+0x1d7/0x360 drm_ioctl_kernel+0x1b0/0x2b0 drm_ioctl+0x6fe/0xa20 do_vfs_ioctl+0x189/0x12d0 SyS_ioctl+0x6f/0x80 do_syscall_64+0x214/0x5f0 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Freed by task 104: save_stack+0x32/0xb0 kasan_slab_free+0x72/0xc0 kfree+0x88/0x190 i915_ppgtt_release+0x24e/0x460 i915_gem_context_free+0x90/0x480 contexts_free_worker+0x54/0x80 process_one_work+0x876/0x14e0 worker_thread+0x1b8/0xfd0 kthread+0x2f8/0x3c0 ret_from_fork+0x35/0x40
The buggy address belongs to the object at ffff8881368a8000 which belongs to the cache kmalloc-8192 of size 8192 The buggy address is located 872 bytes inside of 8192-byte region [ffff8881368a8000, ffff8881368aa000) The buggy address belongs to the page: page:ffffea0004da2a00 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x200000000008100(slab|head) raw: 0200000000008100 0000000000000000 0000000000000000 0000000100030003 raw: dead000000000100 dead000000000200 ffff88822a002280 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff8881368a8200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8881368a8280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8881368a8300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff8881368a8380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8881368a8400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
Fixes: 1acfc104cdf8 ("drm/i915: Enable rcu-only context lookups") Reported-by: 罗权 luoquan@qianxin.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Jon Bloomfield jon.bloomfield@intel.com Cc: stable@vger.kernel.org # 4.14.x Cc: stable@vger.kernel.org # 4.19.x Signed-off-by: Tyler Hicks tyhicks@canonical.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/gpu/drm/i915/i915_gem_context.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 7a0e6db..ef383fd 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -770,18 +770,19 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data, if (args->ctx_id == DEFAULT_CONTEXT_HANDLE) return -ENOENT;
+ ret = i915_mutex_lock_interruptible(dev); + if (ret) + return ret; + ctx = i915_gem_context_lookup(file_priv, args->ctx_id); - if (!ctx) + if (!ctx) { + mutex_unlock(&dev->struct_mutex); return -ENOENT; - - ret = mutex_lock_interruptible(&dev->struct_mutex); - if (ret) - goto out; + }
__destroy_hw_context(ctx, file_priv); mutex_unlock(&dev->struct_mutex);
-out: i915_gem_context_put(ctx); return 0; }
From: Mathieu Desnoyers mathieu.desnoyers@efficios.com
mainline inclusion from mainline-5.4-rc1 commit 09554009c0cad4cb2223dd943c813c9257c6883a category: bugfix bugzilla: 28332 CVE: NA
-------------------------------------------------
Checking that the number of threads is 1 is redundant with checking mm_users == 1.
No change in functionality intended.
Suggested-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Chris Metcalf cmetcalf@ezchip.com Cc: Christoph Lameter cl@linux.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: Kirill Tkhai tkhai@yandex.ru Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mike Galbraith efault@gmx.de Cc: Paul E. McKenney paulmck@linux.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Russell King - ARM Linux admin linux@armlinux.org.uk Cc: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/20190919173705.2181-3-mathieu.desnoyers@efficios.c... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Cheng Jian cj.chengjian@huawei.com Reviewed-By: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- kernel/sched/membarrier.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c index dd27e632..bcd0ff7 100644 --- a/kernel/sched/membarrier.c +++ b/kernel/sched/membarrier.c @@ -195,7 +195,7 @@ static int membarrier_register_global_expedited(void) MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY) return 0; atomic_or(MEMBARRIER_STATE_GLOBAL_EXPEDITED, &mm->membarrier_state); - if (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) { + if (atomic_read(&mm->mm_users) == 1) { /* * For single mm user, single threaded process, we can * simply issue a memory barrier after setting @@ -241,7 +241,7 @@ static int membarrier_register_private_expedited(int flags) if (flags & MEMBARRIER_FLAG_SYNC_CORE) atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE, &mm->membarrier_state); - if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) { + if (atomic_read(&mm->mm_users) != 1) { /* * Ensure all future scheduler executions will observe the * new thread flag state for this process.
From: Peter Zijlstra peterz@infradead.org
mainline inclusion from mainline-5.4-rc1 commit 139d025cda1da5484e7287b35c019fe1dcf9b650 category: bugfix bugzilla: 28332 [preparation] CVE: NA
-------------------------------------------------
The current active_mm reference counting is confusing and sub-optimal.
Rewrite the code to explicitly consider the 4 separate cases:
user -> user
When switching between two user tasks, all we need to consider is switch_mm().
user -> kernel
When switching from a user task to a kernel task (which doesn't have an associated mm) we retain the last mm in our active_mm. Increment a reference count on active_mm.
kernel -> kernel
When switching between kernel threads, all we need to do is pass along the active_mm reference.
kernel -> user
When switching between a kernel and user task, we must switch from the last active_mm to the next mm, hoping of course that these are the same. Decrement a reference on the active_mm.
The code keeps a different order, because as you'll note, both 'to user' cases require switch_mm().
And where the old code would increment/decrement for the 'kernel -> kernel' case, the new code observes this is a neutral operation and avoids touching the reference count.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Rik van Riel riel@surriel.com Reviewed-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: luto@kernel.org Signed-off-by: Cheng Jian cj.chengjian@huawei.com Reviewed-By: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- kernel/sched/core.c | 49 ++++++++++++++++++++++++++++++------------------- 1 file changed, 30 insertions(+), 19 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 48f81e6..e2ee09d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2811,12 +2811,8 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev) context_switch(struct rq *rq, struct task_struct *prev, struct task_struct *next, struct rq_flags *rf) { - struct mm_struct *mm, *oldmm; - prepare_task_switch(rq, prev, next);
- mm = next->mm; - oldmm = prev->active_mm; /* * For paravirt, this is coupled with an exit in switch_to to * combine the page table reload and the switch backend into @@ -2825,22 +2821,37 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev) arch_start_context_switch(prev);
/* - * If mm is non-NULL, we pass through switch_mm(). If mm is - * NULL, we will pass through mmdrop() in finish_task_switch(). - * Both of these contain the full memory barrier required by - * membarrier after storing to rq->curr, before returning to - * user-space. + * kernel -> kernel lazy + transfer active + * user -> kernel lazy + mmgrab() active + * + * kernel -> user switch + mmdrop() active + * user -> user switch */ - if (!mm) { - next->active_mm = oldmm; - mmgrab(oldmm); - enter_lazy_tlb(oldmm, next); - } else - switch_mm_irqs_off(oldmm, mm, next); - - if (!prev->mm) { - prev->active_mm = NULL; - rq->prev_mm = oldmm; + if (!next->mm) { // to kernel + enter_lazy_tlb(prev->active_mm, next); + + next->active_mm = prev->active_mm; + if (prev->mm) // from user + mmgrab(prev->active_mm); + else + prev->active_mm = NULL; + } else { // to user + /* + * sys_membarrier() requires an smp_mb() between setting + * rq->curr and returning to userspace. + * + * The below provides this either through switch_mm(), or in + * case 'prev->active_mm == next->mm' through + * finish_task_switch()'s mmdrop(). + */ + + switch_mm_irqs_off(prev->active_mm, next->mm, next); + + if (!prev->mm) { // from kernel + /* will mmdrop() in finish_task_switch(). */ + rq->prev_mm = prev->active_mm; + prev->active_mm = NULL; + } }
rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
From: Mathieu Desnoyers mathieu.desnoyers@efficios.com
mainline inclusion from mainline-5.4-rc1 commit 227a4aadc75ba22fcb6c4e1c078817b8cbaae4ce category: bugfix bugzilla: 28332 CVE: NA
-------------------------------------------------
The membarrier_state field is located within the mm_struct, which is not guaranteed to exist when used from runqueue-lock-free iteration on runqueues by the membarrier system call.
Copy the membarrier_state from the mm_struct into the scheduler runqueue when the scheduler switches between mm.
When registering membarrier for mm, after setting the registration bit in the mm membarrier state, issue a synchronize_rcu() to ensure the scheduler observes the change. In order to take care of the case where a runqueue keeps executing the target mm without swapping to other mm, iterate over each runqueue and issue an IPI to copy the membarrier_state from the mm_struct into each runqueue which have the same mm which state has just been modified.
Move the mm membarrier_state field closer to pgd in mm_struct to use a cache line already touched by the scheduler switch_mm.
The membarrier_execve() (now membarrier_exec_mmap) hook now needs to clear the runqueue's membarrier state in addition to clear the mm membarrier state, so move its implementation into the scheduler membarrier code so it can access the runqueue structure.
Add memory barrier in membarrier_exec_mmap() prior to clearing the membarrier state, ensuring memory accesses executed prior to exec are not reordered with the stores clearing the membarrier state.
As suggested by Linus, move all membarrier.c RCU read-side locks outside of the for each cpu loops.
[Cheng Jian: use task_rcu_dereference in sync_runqueues_membarrier_state]
Suggested-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Chris Metcalf cmetcalf@ezchip.com Cc: Christoph Lameter cl@linux.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: Kirill Tkhai tkhai@yandex.ru Cc: Mike Galbraith efault@gmx.de Cc: Oleg Nesterov oleg@redhat.com Cc: Paul E. McKenney paulmck@linux.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Russell King - ARM Linux admin linux@armlinux.org.uk Cc: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/20190919173705.2181-5-mathieu.desnoyers@efficios.c... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Cheng Jian cj.chengjian@huawei.com Reviewed-By: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/exec.c | 2 +- include/linux/mm_types.h | 14 +++- include/linux/sched/mm.h | 8 +-- kernel/sched/core.c | 4 +- kernel/sched/membarrier.c | 175 ++++++++++++++++++++++++++++++++++------------ kernel/sched/sched.h | 34 +++++++++ 6 files changed, 183 insertions(+), 54 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c index 10f1e24..f0a7a59 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1029,6 +1029,7 @@ static int exec_mmap(struct mm_struct *mm) } task_lock(tsk); active_mm = tsk->active_mm; + membarrier_exec_mmap(mm); tsk->mm = mm; tsk->active_mm = mm; activate_mm(active_mm, mm); @@ -1823,7 +1824,6 @@ static int __do_execve_file(int fd, struct filename *filename, /* execve succeeded */ current->fs->in_exec = 0; current->in_execve = 0; - membarrier_execve(current); rseq_execve(current); acct_update_integrals(current); task_numa_free(current, false); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 3c639a8..178e9de 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -369,6 +369,16 @@ struct mm_struct { unsigned long highest_vm_end; /* highest vma end address */ pgd_t * pgd;
+#ifdef CONFIG_MEMBARRIER + /** + * @membarrier_state: Flags controlling membarrier behavior. + * + * This field is close to @pgd to hopefully fit in the same + * cache-line, which needs to be touched by switch_mm(). + */ + atomic_t membarrier_state; +#endif + /** * @mm_users: The number of users including userspace. * @@ -438,9 +448,7 @@ struct mm_struct { unsigned long flags; /* Must use atomic bitops to access */
struct core_state *core_state; /* coredumping support */ -#ifdef CONFIG_MEMBARRIER - atomic_t membarrier_state; -#endif + #ifdef CONFIG_AIO spinlock_t ioctx_lock; struct kioctx_table __rcu *ioctx_table; diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index e9d4e38..8c3eb9d 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -338,10 +338,8 @@ static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm) sync_core_before_usermode(); }
-static inline void membarrier_execve(struct task_struct *t) -{ - atomic_set(&t->mm->membarrier_state, 0); -} +extern void membarrier_exec_mmap(struct mm_struct *mm); + #else #ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS static inline void membarrier_arch_switch_mm(struct mm_struct *prev, @@ -350,7 +348,7 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev, { } #endif -static inline void membarrier_execve(struct task_struct *t) +static inline void membarrier_exec_mmap(struct mm_struct *mm) { } static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e2ee09d..5057f19 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2836,15 +2836,15 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev) else prev->active_mm = NULL; } else { // to user + membarrier_switch_mm(rq, prev->active_mm, next->mm); /* * sys_membarrier() requires an smp_mb() between setting - * rq->curr and returning to userspace. + * rq->curr / membarrier_switch_mm() and returning to userspace. * * The below provides this either through switch_mm(), or in * case 'prev->active_mm == next->mm' through * finish_task_switch()'s mmdrop(). */ - switch_mm_irqs_off(prev->active_mm, next->mm, next);
if (!prev->mm) { // from kernel diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c index bcd0ff7..dc25e83 100644 --- a/kernel/sched/membarrier.c +++ b/kernel/sched/membarrier.c @@ -39,6 +39,39 @@ static void ipi_mb(void *info) smp_mb(); /* IPIs should be serializing but paranoid. */ }
+static void ipi_sync_rq_state(void *info) +{ + struct mm_struct *mm = (struct mm_struct *) info; + + if (current->mm != mm) + return; + this_cpu_write(runqueues.membarrier_state, + atomic_read(&mm->membarrier_state)); + /* + * Issue a memory barrier after setting + * MEMBARRIER_STATE_GLOBAL_EXPEDITED in the current runqueue to + * guarantee that no memory access following registration is reordered + * before registration. + */ + smp_mb(); +} + +void membarrier_exec_mmap(struct mm_struct *mm) +{ + /* + * Issue a memory barrier before clearing membarrier_state to + * guarantee that no memory access prior to exec is reordered after + * clearing this state. + */ + smp_mb(); + atomic_set(&mm->membarrier_state, 0); + /* + * Keep the runqueue membarrier_state in sync with this mm + * membarrier_state. + */ + this_cpu_write(runqueues.membarrier_state, 0); +} + static int membarrier_global_expedited(void) { int cpu; @@ -65,6 +98,7 @@ static int membarrier_global_expedited(void) }
cpus_read_lock(); + rcu_read_lock(); for_each_online_cpu(cpu) { struct task_struct *p;
@@ -79,17 +113,25 @@ static int membarrier_global_expedited(void) if (cpu == raw_smp_processor_id()) continue;
- rcu_read_lock(); + if (!(READ_ONCE(cpu_rq(cpu)->membarrier_state) & + MEMBARRIER_STATE_GLOBAL_EXPEDITED)) + continue; + + /* + * Skip the CPU if it runs a kernel thread. The scheduler + * leaves the prior task mm in place as an optimization when + * scheduling a kthread. + */ p = task_rcu_dereference(&cpu_rq(cpu)->curr); - if (p && p->mm && (atomic_read(&p->mm->membarrier_state) & - MEMBARRIER_STATE_GLOBAL_EXPEDITED)) { - if (!fallback) - __cpumask_set_cpu(cpu, tmpmask); - else - smp_call_function_single(cpu, ipi_mb, NULL, 1); - } - rcu_read_unlock(); + if (p->flags & PF_KTHREAD) + continue; + + if (!fallback) + __cpumask_set_cpu(cpu, tmpmask); + else + smp_call_function_single(cpu, ipi_mb, NULL, 1); } + rcu_read_unlock(); if (!fallback) { preempt_disable(); smp_call_function_many(tmpmask, ipi_mb, NULL, 1); @@ -145,6 +187,7 @@ static int membarrier_private_expedited(int flags) }
cpus_read_lock(); + rcu_read_lock(); for_each_online_cpu(cpu) { struct task_struct *p;
@@ -166,8 +209,8 @@ static int membarrier_private_expedited(int flags) else smp_call_function_single(cpu, ipi_mb, NULL, 1); } - rcu_read_unlock(); } + rcu_read_unlock(); if (!fallback) { preempt_disable(); smp_call_function_many(tmpmask, ipi_mb, NULL, 1); @@ -186,32 +229,78 @@ static int membarrier_private_expedited(int flags) return 0; }
+static int sync_runqueues_membarrier_state(struct mm_struct *mm) +{ + int membarrier_state = atomic_read(&mm->membarrier_state); + cpumask_var_t tmpmask; + int cpu; + + if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1) { + this_cpu_write(runqueues.membarrier_state, membarrier_state); + + /* + * For single mm user, we can simply issue a memory barrier + * after setting MEMBARRIER_STATE_GLOBAL_EXPEDITED in the + * mm and in the current runqueue to guarantee that no memory + * access following registration is reordered before + * registration. + */ + smp_mb(); + return 0; + } + + if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL)) + return -ENOMEM; + + /* + * For mm with multiple users, we need to ensure all future + * scheduler executions will observe @mm's new membarrier + * state. + */ + synchronize_rcu(); + + /* + * For each cpu runqueue, if the task's mm match @mm, ensure that all + * @mm's membarrier state set bits are also set in in the runqueue's + * membarrier state. This ensures that a runqueue scheduling + * between threads which are users of @mm has its membarrier state + * updated. + */ + cpus_read_lock(); + rcu_read_lock(); + for_each_online_cpu(cpu) { + struct rq *rq = cpu_rq(cpu); + struct task_struct *p; + + p = task_rcu_dereference(&rq->curr); + if (p && p->mm == mm) + __cpumask_set_cpu(cpu, tmpmask); + } + rcu_read_unlock(); + + preempt_disable(); + smp_call_function_many(tmpmask, ipi_sync_rq_state, mm, 1); + preempt_enable(); + + free_cpumask_var(tmpmask); + cpus_read_unlock(); + + return 0; +} + static int membarrier_register_global_expedited(void) { struct task_struct *p = current; struct mm_struct *mm = p->mm; + int ret;
if (atomic_read(&mm->membarrier_state) & MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY) return 0; atomic_or(MEMBARRIER_STATE_GLOBAL_EXPEDITED, &mm->membarrier_state); - if (atomic_read(&mm->mm_users) == 1) { - /* - * For single mm user, single threaded process, we can - * simply issue a memory barrier after setting - * MEMBARRIER_STATE_GLOBAL_EXPEDITED to guarantee that - * no memory access following registration is reordered - * before registration. - */ - smp_mb(); - } else { - /* - * For multi-mm user threads, we need to ensure all - * future scheduler executions will observe the new - * thread flag state for this mm. - */ - synchronize_sched(); - } + ret = sync_runqueues_membarrier_state(mm); + if (ret) + return ret; atomic_or(MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY, &mm->membarrier_state);
@@ -222,12 +311,15 @@ static int membarrier_register_private_expedited(int flags) { struct task_struct *p = current; struct mm_struct *mm = p->mm; - int state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY; + int ready_state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY, + set_state = MEMBARRIER_STATE_PRIVATE_EXPEDITED, + ret;
if (flags & MEMBARRIER_FLAG_SYNC_CORE) { if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE)) return -EINVAL; - state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY; + ready_state = + MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY; }
/* @@ -235,20 +327,15 @@ static int membarrier_register_private_expedited(int flags) * groups, which use the same mm. (CLONE_VM but not * CLONE_THREAD). */ - if ((atomic_read(&mm->membarrier_state) & state) == state) + if ((atomic_read(&mm->membarrier_state) & ready_state) == ready_state) return 0; - atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state); if (flags & MEMBARRIER_FLAG_SYNC_CORE) - atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE, - &mm->membarrier_state); - if (atomic_read(&mm->mm_users) != 1) { - /* - * Ensure all future scheduler executions will observe the - * new thread flag state for this process. - */ - synchronize_sched(); - } - atomic_or(state, &mm->membarrier_state); + set_state |= MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE; + atomic_or(set_state, &mm->membarrier_state); + ret = sync_runqueues_membarrier_state(mm); + if (ret) + return ret; + atomic_or(ready_state, &mm->membarrier_state);
return 0; } @@ -262,8 +349,10 @@ static int membarrier_register_private_expedited(int flags) * command specified does not exist, not available on the running * kernel, or if the command argument is invalid, this system call * returns -EINVAL. For a given command, with flags argument set to 0, - * this system call is guaranteed to always return the same value until - * reboot. + * if this system call returns -ENOSYS or -EINVAL, it is guaranteed to + * always return the same value until reboot. In addition, it can return + * -ENOMEM if there is not enough memory available to perform the system + * call. * * All memory accesses performed in program order from each targeted thread * is guaranteed to be ordered with respect to sys_membarrier(). If we use diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 0c2fb4b..2213dac 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -843,6 +843,10 @@ struct rq {
atomic_t nr_iowait;
+#ifdef CONFIG_MEMBARRIER + int membarrier_state; +#endif + #ifdef CONFIG_SMP struct root_domain *rd; struct sched_domain *sd; @@ -2273,3 +2277,33 @@ unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned return util; } #endif + +#ifdef CONFIG_MEMBARRIER +/* + * The scheduler provides memory barriers required by membarrier between: + * - prior user-space memory accesses and store to rq->membarrier_state, + * - store to rq->membarrier_state and following user-space memory accesses. + * In the same way it provides those guarantees around store to rq->curr. + */ +static inline void membarrier_switch_mm(struct rq *rq, + struct mm_struct *prev_mm, + struct mm_struct *next_mm) +{ + int membarrier_state; + + if (prev_mm == next_mm) + return; + + membarrier_state = atomic_read(&next_mm->membarrier_state); + if (READ_ONCE(rq->membarrier_state) == membarrier_state) + return; + + WRITE_ONCE(rq->membarrier_state, membarrier_state); +} +#else +static inline void membarrier_switch_mm(struct rq *rq, + struct mm_struct *prev_mm, + struct mm_struct *next_mm) +{ +} +#endif
From: Mathieu Desnoyers mathieu.desnoyers@efficios.com
mainline inclusion from mainline-5.4-rc1 commit 19a4ff534bb09686f53800564cb977bad2177c00 category: bugfix bugzilla: 28332 CVE: NA
-------------------------------------------------
membarrier commands cover very different code paths if they are in a single-threaded vs multi-threaded process. Therefore, exercise both scenarios in the kernel selftests to increase coverage of this selftest.
Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Chris Metcalf cmetcalf@ezchip.com Cc: Christoph Lameter cl@linux.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: Kirill Tkhai tkhai@yandex.ru Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mike Galbraith efault@gmx.de Cc: Oleg Nesterov oleg@redhat.com Cc: Paul E. McKenney paulmck@linux.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Russell King - ARM Linux admin linux@armlinux.org.uk Cc: Shuah Khan shuahkh@osg.samsung.com Cc: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/20190919173705.2181-6-mathieu.desnoyers@efficios.c... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Cheng Jian cj.chengjian@huawei.com Reviewed-By: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- tools/testing/selftests/membarrier/.gitignore | 3 +- tools/testing/selftests/membarrier/Makefile | 4 +- .../testing/selftests/membarrier/membarrier_test.c | 312 -------------------- .../selftests/membarrier/membarrier_test_impl.h | 317 +++++++++++++++++++++ .../membarrier/membarrier_test_multi_thread.c | 73 +++++ .../membarrier/membarrier_test_single_thread.c | 24 ++ 6 files changed, 419 insertions(+), 314 deletions(-) delete mode 100644 tools/testing/selftests/membarrier/membarrier_test.c create mode 100644 tools/testing/selftests/membarrier/membarrier_test_impl.h create mode 100644 tools/testing/selftests/membarrier/membarrier_test_multi_thread.c create mode 100644 tools/testing/selftests/membarrier/membarrier_test_single_thread.c
diff --git a/tools/testing/selftests/membarrier/.gitignore b/tools/testing/selftests/membarrier/.gitignore index 020c44f4..f2f7ec0 100644 --- a/tools/testing/selftests/membarrier/.gitignore +++ b/tools/testing/selftests/membarrier/.gitignore @@ -1 +1,2 @@ -membarrier_test +membarrier_test_multi_thread +membarrier_test_single_thread diff --git a/tools/testing/selftests/membarrier/Makefile b/tools/testing/selftests/membarrier/Makefile index 0284553..97b6893 100644 --- a/tools/testing/selftests/membarrier/Makefile +++ b/tools/testing/selftests/membarrier/Makefile @@ -1,6 +1,8 @@ CFLAGS += -g -I../../../../usr/include/ +LDLIBS += -lpthread
TEST_GEN_PROGS := membarrier_test +TEST_GEN_PROGS := membarrier_test_single_thread \ + membarrier_test_multi_thread
include ../lib.mk - diff --git a/tools/testing/selftests/membarrier/membarrier_test.c b/tools/testing/selftests/membarrier/membarrier_test.c deleted file mode 100644 index 6793f8e..00000000 --- a/tools/testing/selftests/membarrier/membarrier_test.c +++ /dev/null @@ -1,312 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -#define _GNU_SOURCE -#include <linux/membarrier.h> -#include <syscall.h> -#include <stdio.h> -#include <errno.h> -#include <string.h> - -#include "../kselftest.h" - -static int sys_membarrier(int cmd, int flags) -{ - return syscall(__NR_membarrier, cmd, flags); -} - -static int test_membarrier_cmd_fail(void) -{ - int cmd = -1, flags = 0; - const char *test_name = "sys membarrier invalid command"; - - if (sys_membarrier(cmd, flags) != -1) { - ksft_exit_fail_msg( - "%s test: command = %d, flags = %d. Should fail, but passed\n", - test_name, cmd, flags); - } - if (errno != EINVAL) { - ksft_exit_fail_msg( - "%s test: flags = %d. Should return (%d: "%s"), but returned (%d: "%s").\n", - test_name, flags, EINVAL, strerror(EINVAL), - errno, strerror(errno)); - } - - ksft_test_result_pass( - "%s test: command = %d, flags = %d, errno = %d. Failed as expected\n", - test_name, cmd, flags, errno); - return 0; -} - -static int test_membarrier_flags_fail(void) -{ - int cmd = MEMBARRIER_CMD_QUERY, flags = 1; - const char *test_name = "sys membarrier MEMBARRIER_CMD_QUERY invalid flags"; - - if (sys_membarrier(cmd, flags) != -1) { - ksft_exit_fail_msg( - "%s test: flags = %d. Should fail, but passed\n", - test_name, flags); - } - if (errno != EINVAL) { - ksft_exit_fail_msg( - "%s test: flags = %d. Should return (%d: "%s"), but returned (%d: "%s").\n", - test_name, flags, EINVAL, strerror(EINVAL), - errno, strerror(errno)); - } - - ksft_test_result_pass( - "%s test: flags = %d, errno = %d. Failed as expected\n", - test_name, flags, errno); - return 0; -} - -static int test_membarrier_global_success(void) -{ - int cmd = MEMBARRIER_CMD_GLOBAL, flags = 0; - const char *test_name = "sys membarrier MEMBARRIER_CMD_GLOBAL"; - - if (sys_membarrier(cmd, flags) != 0) { - ksft_exit_fail_msg( - "%s test: flags = %d, errno = %d\n", - test_name, flags, errno); - } - - ksft_test_result_pass( - "%s test: flags = %d\n", test_name, flags); - return 0; -} - -static int test_membarrier_private_expedited_fail(void) -{ - int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0; - const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED not registered failure"; - - if (sys_membarrier(cmd, flags) != -1) { - ksft_exit_fail_msg( - "%s test: flags = %d. Should fail, but passed\n", - test_name, flags); - } - if (errno != EPERM) { - ksft_exit_fail_msg( - "%s test: flags = %d. Should return (%d: "%s"), but returned (%d: "%s").\n", - test_name, flags, EPERM, strerror(EPERM), - errno, strerror(errno)); - } - - ksft_test_result_pass( - "%s test: flags = %d, errno = %d\n", - test_name, flags, errno); - return 0; -} - -static int test_membarrier_register_private_expedited_success(void) -{ - int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, flags = 0; - const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED"; - - if (sys_membarrier(cmd, flags) != 0) { - ksft_exit_fail_msg( - "%s test: flags = %d, errno = %d\n", - test_name, flags, errno); - } - - ksft_test_result_pass( - "%s test: flags = %d\n", - test_name, flags); - return 0; -} - -static int test_membarrier_private_expedited_success(void) -{ - int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0; - const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED"; - - if (sys_membarrier(cmd, flags) != 0) { - ksft_exit_fail_msg( - "%s test: flags = %d, errno = %d\n", - test_name, flags, errno); - } - - ksft_test_result_pass( - "%s test: flags = %d\n", - test_name, flags); - return 0; -} - -static int test_membarrier_private_expedited_sync_core_fail(void) -{ - int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0; - const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE not registered failure"; - - if (sys_membarrier(cmd, flags) != -1) { - ksft_exit_fail_msg( - "%s test: flags = %d. Should fail, but passed\n", - test_name, flags); - } - if (errno != EPERM) { - ksft_exit_fail_msg( - "%s test: flags = %d. Should return (%d: "%s"), but returned (%d: "%s").\n", - test_name, flags, EPERM, strerror(EPERM), - errno, strerror(errno)); - } - - ksft_test_result_pass( - "%s test: flags = %d, errno = %d\n", - test_name, flags, errno); - return 0; -} - -static int test_membarrier_register_private_expedited_sync_core_success(void) -{ - int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0; - const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE"; - - if (sys_membarrier(cmd, flags) != 0) { - ksft_exit_fail_msg( - "%s test: flags = %d, errno = %d\n", - test_name, flags, errno); - } - - ksft_test_result_pass( - "%s test: flags = %d\n", - test_name, flags); - return 0; -} - -static int test_membarrier_private_expedited_sync_core_success(void) -{ - int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0; - const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE"; - - if (sys_membarrier(cmd, flags) != 0) { - ksft_exit_fail_msg( - "%s test: flags = %d, errno = %d\n", - test_name, flags, errno); - } - - ksft_test_result_pass( - "%s test: flags = %d\n", - test_name, flags); - return 0; -} - -static int test_membarrier_register_global_expedited_success(void) -{ - int cmd = MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED, flags = 0; - const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED"; - - if (sys_membarrier(cmd, flags) != 0) { - ksft_exit_fail_msg( - "%s test: flags = %d, errno = %d\n", - test_name, flags, errno); - } - - ksft_test_result_pass( - "%s test: flags = %d\n", - test_name, flags); - return 0; -} - -static int test_membarrier_global_expedited_success(void) -{ - int cmd = MEMBARRIER_CMD_GLOBAL_EXPEDITED, flags = 0; - const char *test_name = "sys membarrier MEMBARRIER_CMD_GLOBAL_EXPEDITED"; - - if (sys_membarrier(cmd, flags) != 0) { - ksft_exit_fail_msg( - "%s test: flags = %d, errno = %d\n", - test_name, flags, errno); - } - - ksft_test_result_pass( - "%s test: flags = %d\n", - test_name, flags); - return 0; -} - -static int test_membarrier(void) -{ - int status; - - status = test_membarrier_cmd_fail(); - if (status) - return status; - status = test_membarrier_flags_fail(); - if (status) - return status; - status = test_membarrier_global_success(); - if (status) - return status; - status = test_membarrier_private_expedited_fail(); - if (status) - return status; - status = test_membarrier_register_private_expedited_success(); - if (status) - return status; - status = test_membarrier_private_expedited_success(); - if (status) - return status; - status = sys_membarrier(MEMBARRIER_CMD_QUERY, 0); - if (status < 0) { - ksft_test_result_fail("sys_membarrier() failed\n"); - return status; - } - if (status & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) { - status = test_membarrier_private_expedited_sync_core_fail(); - if (status) - return status; - status = test_membarrier_register_private_expedited_sync_core_success(); - if (status) - return status; - status = test_membarrier_private_expedited_sync_core_success(); - if (status) - return status; - } - /* - * It is valid to send a global membarrier from a non-registered - * process. - */ - status = test_membarrier_global_expedited_success(); - if (status) - return status; - status = test_membarrier_register_global_expedited_success(); - if (status) - return status; - status = test_membarrier_global_expedited_success(); - if (status) - return status; - return 0; -} - -static int test_membarrier_query(void) -{ - int flags = 0, ret; - - ret = sys_membarrier(MEMBARRIER_CMD_QUERY, flags); - if (ret < 0) { - if (errno == ENOSYS) { - /* - * It is valid to build a kernel with - * CONFIG_MEMBARRIER=n. However, this skips the tests. - */ - ksft_exit_skip( - "sys membarrier (CONFIG_MEMBARRIER) is disabled.\n"); - } - ksft_exit_fail_msg("sys_membarrier() failed\n"); - } - if (!(ret & MEMBARRIER_CMD_GLOBAL)) - ksft_exit_skip( - "sys_membarrier unsupported: CMD_GLOBAL not found.\n"); - - ksft_test_result_pass("sys_membarrier available\n"); - return 0; -} - -int main(int argc, char **argv) -{ - ksft_print_header(); - - test_membarrier_query(); - test_membarrier(); - - return ksft_exit_pass(); -} diff --git a/tools/testing/selftests/membarrier/membarrier_test_impl.h b/tools/testing/selftests/membarrier/membarrier_test_impl.h new file mode 100644 index 00000000..186be69 --- /dev/null +++ b/tools/testing/selftests/membarrier/membarrier_test_impl.h @@ -0,0 +1,317 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#define _GNU_SOURCE +#include <linux/membarrier.h> +#include <syscall.h> +#include <stdio.h> +#include <errno.h> +#include <string.h> +#include <pthread.h> + +#include "../kselftest.h" + +static int sys_membarrier(int cmd, int flags) +{ + return syscall(__NR_membarrier, cmd, flags); +} + +static int test_membarrier_cmd_fail(void) +{ + int cmd = -1, flags = 0; + const char *test_name = "sys membarrier invalid command"; + + if (sys_membarrier(cmd, flags) != -1) { + ksft_exit_fail_msg( + "%s test: command = %d, flags = %d. Should fail, but passed\n", + test_name, cmd, flags); + } + if (errno != EINVAL) { + ksft_exit_fail_msg( + "%s test: flags = %d. Should return (%d: "%s"), but returned (%d: "%s").\n", + test_name, flags, EINVAL, strerror(EINVAL), + errno, strerror(errno)); + } + + ksft_test_result_pass( + "%s test: command = %d, flags = %d, errno = %d. Failed as expected\n", + test_name, cmd, flags, errno); + return 0; +} + +static int test_membarrier_flags_fail(void) +{ + int cmd = MEMBARRIER_CMD_QUERY, flags = 1; + const char *test_name = "sys membarrier MEMBARRIER_CMD_QUERY invalid flags"; + + if (sys_membarrier(cmd, flags) != -1) { + ksft_exit_fail_msg( + "%s test: flags = %d. Should fail, but passed\n", + test_name, flags); + } + if (errno != EINVAL) { + ksft_exit_fail_msg( + "%s test: flags = %d. Should return (%d: "%s"), but returned (%d: "%s").\n", + test_name, flags, EINVAL, strerror(EINVAL), + errno, strerror(errno)); + } + + ksft_test_result_pass( + "%s test: flags = %d, errno = %d. Failed as expected\n", + test_name, flags, errno); + return 0; +} + +static int test_membarrier_global_success(void) +{ + int cmd = MEMBARRIER_CMD_GLOBAL, flags = 0; + const char *test_name = "sys membarrier MEMBARRIER_CMD_GLOBAL"; + + if (sys_membarrier(cmd, flags) != 0) { + ksft_exit_fail_msg( + "%s test: flags = %d, errno = %d\n", + test_name, flags, errno); + } + + ksft_test_result_pass( + "%s test: flags = %d\n", test_name, flags); + return 0; +} + +static int test_membarrier_private_expedited_fail(void) +{ + int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0; + const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED not registered failure"; + + if (sys_membarrier(cmd, flags) != -1) { + ksft_exit_fail_msg( + "%s test: flags = %d. Should fail, but passed\n", + test_name, flags); + } + if (errno != EPERM) { + ksft_exit_fail_msg( + "%s test: flags = %d. Should return (%d: "%s"), but returned (%d: "%s").\n", + test_name, flags, EPERM, strerror(EPERM), + errno, strerror(errno)); + } + + ksft_test_result_pass( + "%s test: flags = %d, errno = %d\n", + test_name, flags, errno); + return 0; +} + +static int test_membarrier_register_private_expedited_success(void) +{ + int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, flags = 0; + const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED"; + + if (sys_membarrier(cmd, flags) != 0) { + ksft_exit_fail_msg( + "%s test: flags = %d, errno = %d\n", + test_name, flags, errno); + } + + ksft_test_result_pass( + "%s test: flags = %d\n", + test_name, flags); + return 0; +} + +static int test_membarrier_private_expedited_success(void) +{ + int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0; + const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED"; + + if (sys_membarrier(cmd, flags) != 0) { + ksft_exit_fail_msg( + "%s test: flags = %d, errno = %d\n", + test_name, flags, errno); + } + + ksft_test_result_pass( + "%s test: flags = %d\n", + test_name, flags); + return 0; +} + +static int test_membarrier_private_expedited_sync_core_fail(void) +{ + int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0; + const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE not registered failure"; + + if (sys_membarrier(cmd, flags) != -1) { + ksft_exit_fail_msg( + "%s test: flags = %d. Should fail, but passed\n", + test_name, flags); + } + if (errno != EPERM) { + ksft_exit_fail_msg( + "%s test: flags = %d. Should return (%d: "%s"), but returned (%d: "%s").\n", + test_name, flags, EPERM, strerror(EPERM), + errno, strerror(errno)); + } + + ksft_test_result_pass( + "%s test: flags = %d, errno = %d\n", + test_name, flags, errno); + return 0; +} + +static int test_membarrier_register_private_expedited_sync_core_success(void) +{ + int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0; + const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE"; + + if (sys_membarrier(cmd, flags) != 0) { + ksft_exit_fail_msg( + "%s test: flags = %d, errno = %d\n", + test_name, flags, errno); + } + + ksft_test_result_pass( + "%s test: flags = %d\n", + test_name, flags); + return 0; +} + +static int test_membarrier_private_expedited_sync_core_success(void) +{ + int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0; + const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE"; + + if (sys_membarrier(cmd, flags) != 0) { + ksft_exit_fail_msg( + "%s test: flags = %d, errno = %d\n", + test_name, flags, errno); + } + + ksft_test_result_pass( + "%s test: flags = %d\n", + test_name, flags); + return 0; +} + +static int test_membarrier_register_global_expedited_success(void) +{ + int cmd = MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED, flags = 0; + const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED"; + + if (sys_membarrier(cmd, flags) != 0) { + ksft_exit_fail_msg( + "%s test: flags = %d, errno = %d\n", + test_name, flags, errno); + } + + ksft_test_result_pass( + "%s test: flags = %d\n", + test_name, flags); + return 0; +} + +static int test_membarrier_global_expedited_success(void) +{ + int cmd = MEMBARRIER_CMD_GLOBAL_EXPEDITED, flags = 0; + const char *test_name = "sys membarrier MEMBARRIER_CMD_GLOBAL_EXPEDITED"; + + if (sys_membarrier(cmd, flags) != 0) { + ksft_exit_fail_msg( + "%s test: flags = %d, errno = %d\n", + test_name, flags, errno); + } + + ksft_test_result_pass( + "%s test: flags = %d\n", + test_name, flags); + return 0; +} + +static int test_membarrier_fail(void) +{ + int status; + + status = test_membarrier_cmd_fail(); + if (status) + return status; + status = test_membarrier_flags_fail(); + if (status) + return status; + status = test_membarrier_private_expedited_fail(); + if (status) + return status; + status = sys_membarrier(MEMBARRIER_CMD_QUERY, 0); + if (status < 0) { + ksft_test_result_fail("sys_membarrier() failed\n"); + return status; + } + if (status & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) { + status = test_membarrier_private_expedited_sync_core_fail(); + if (status) + return status; + } + return 0; +} + +static int test_membarrier_success(void) +{ + int status; + + status = test_membarrier_global_success(); + if (status) + return status; + status = test_membarrier_register_private_expedited_success(); + if (status) + return status; + status = test_membarrier_private_expedited_success(); + if (status) + return status; + status = sys_membarrier(MEMBARRIER_CMD_QUERY, 0); + if (status < 0) { + ksft_test_result_fail("sys_membarrier() failed\n"); + return status; + } + if (status & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) { + status = test_membarrier_register_private_expedited_sync_core_success(); + if (status) + return status; + status = test_membarrier_private_expedited_sync_core_success(); + if (status) + return status; + } + /* + * It is valid to send a global membarrier from a non-registered + * process. + */ + status = test_membarrier_global_expedited_success(); + if (status) + return status; + status = test_membarrier_register_global_expedited_success(); + if (status) + return status; + status = test_membarrier_global_expedited_success(); + if (status) + return status; + return 0; +} + +static int test_membarrier_query(void) +{ + int flags = 0, ret; + + ret = sys_membarrier(MEMBARRIER_CMD_QUERY, flags); + if (ret < 0) { + if (errno == ENOSYS) { + /* + * It is valid to build a kernel with + * CONFIG_MEMBARRIER=n. However, this skips the tests. + */ + ksft_exit_skip( + "sys membarrier (CONFIG_MEMBARRIER) is disabled.\n"); + } + ksft_exit_fail_msg("sys_membarrier() failed\n"); + } + if (!(ret & MEMBARRIER_CMD_GLOBAL)) + ksft_exit_skip( + "sys_membarrier unsupported: CMD_GLOBAL not found.\n"); + + ksft_test_result_pass("sys_membarrier available\n"); + return 0; +} diff --git a/tools/testing/selftests/membarrier/membarrier_test_multi_thread.c b/tools/testing/selftests/membarrier/membarrier_test_multi_thread.c new file mode 100644 index 00000000..ac5613e --- /dev/null +++ b/tools/testing/selftests/membarrier/membarrier_test_multi_thread.c @@ -0,0 +1,73 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include <linux/membarrier.h> +#include <syscall.h> +#include <stdio.h> +#include <errno.h> +#include <string.h> +#include <pthread.h> + +#include "membarrier_test_impl.h" + +static int thread_ready, thread_quit; +static pthread_mutex_t test_membarrier_thread_mutex = + PTHREAD_MUTEX_INITIALIZER; +static pthread_cond_t test_membarrier_thread_cond = + PTHREAD_COND_INITIALIZER; + +void *test_membarrier_thread(void *arg) +{ + pthread_mutex_lock(&test_membarrier_thread_mutex); + thread_ready = 1; + pthread_cond_broadcast(&test_membarrier_thread_cond); + pthread_mutex_unlock(&test_membarrier_thread_mutex); + + pthread_mutex_lock(&test_membarrier_thread_mutex); + while (!thread_quit) + pthread_cond_wait(&test_membarrier_thread_cond, + &test_membarrier_thread_mutex); + pthread_mutex_unlock(&test_membarrier_thread_mutex); + + return NULL; +} + +static int test_mt_membarrier(void) +{ + int i; + pthread_t test_thread; + + pthread_create(&test_thread, NULL, + test_membarrier_thread, NULL); + + pthread_mutex_lock(&test_membarrier_thread_mutex); + while (!thread_ready) + pthread_cond_wait(&test_membarrier_thread_cond, + &test_membarrier_thread_mutex); + pthread_mutex_unlock(&test_membarrier_thread_mutex); + + test_membarrier_fail(); + + test_membarrier_success(); + + pthread_mutex_lock(&test_membarrier_thread_mutex); + thread_quit = 1; + pthread_cond_broadcast(&test_membarrier_thread_cond); + pthread_mutex_unlock(&test_membarrier_thread_mutex); + + pthread_join(test_thread, NULL); + + return 0; +} + +int main(int argc, char **argv) +{ + ksft_print_header(); + ksft_set_plan(13); + + test_membarrier_query(); + + /* Multi-threaded */ + test_mt_membarrier(); + + return ksft_exit_pass(); +} diff --git a/tools/testing/selftests/membarrier/membarrier_test_single_thread.c b/tools/testing/selftests/membarrier/membarrier_test_single_thread.c new file mode 100644 index 00000000..c1c9639 --- /dev/null +++ b/tools/testing/selftests/membarrier/membarrier_test_single_thread.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include <linux/membarrier.h> +#include <syscall.h> +#include <stdio.h> +#include <errno.h> +#include <string.h> +#include <pthread.h> + +#include "membarrier_test_impl.h" + +int main(int argc, char **argv) +{ + ksft_print_header(); + ksft_set_plan(13); + + test_membarrier_query(); + + test_membarrier_fail(); + + test_membarrier_success(); + + return ksft_exit_pass(); +}
From: Mathieu Desnoyers mathieu.desnoyers@efficios.com
mainline inclusion from mainline-5.4-rc1 commit c6d68c1c4a4d6611fc0f8145d764226571d737ca category: bugfix bugzilla: 28332 CVE: NA
-------------------------------------------------
If there is only a single mm_user for the mm, the private expedited membarrier command can skip the IPIs, because only a single thread is using the mm.
Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Chris Metcalf cmetcalf@ezchip.com Cc: Christoph Lameter cl@linux.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: Kirill Tkhai tkhai@yandex.ru Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mike Galbraith efault@gmx.de Cc: Oleg Nesterov oleg@redhat.com Cc: Paul E. McKenney paulmck@linux.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Russell King - ARM Linux admin linux@armlinux.org.uk Cc: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/20190919173705.2181-7-mathieu.desnoyers@efficios.c... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Cheng Jian cj.chengjian@huawei.com Reviewed-By: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- kernel/sched/membarrier.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c index dc25e83..5f68c54 100644 --- a/kernel/sched/membarrier.c +++ b/kernel/sched/membarrier.c @@ -154,20 +154,21 @@ static int membarrier_private_expedited(int flags) int cpu; bool fallback = false; cpumask_var_t tmpmask; + struct mm_struct *mm = current->mm;
if (flags & MEMBARRIER_FLAG_SYNC_CORE) { if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE)) return -EINVAL; - if (!(atomic_read(¤t->mm->membarrier_state) & + if (!(atomic_read(&mm->membarrier_state) & MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY)) return -EPERM; } else { - if (!(atomic_read(¤t->mm->membarrier_state) & + if (!(atomic_read(&mm->membarrier_state) & MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)) return -EPERM; }
- if (num_online_cpus() == 1) + if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1) return 0;
/* @@ -203,7 +204,7 @@ static int membarrier_private_expedited(int flags) continue; rcu_read_lock(); p = task_rcu_dereference(&cpu_rq(cpu)->curr); - if (p && p->mm == current->mm) { + if (p && p->mm == mm) { if (!fallback) __cpumask_set_cpu(cpu, tmpmask); else
From: Mathieu Desnoyers mathieu.desnoyers@efficios.com
mainline inclusion from mainline-5.4-rc1 commit c172e0a3e8e65a4c6fffec5bc4d6de08d6f894f7 category: bugfix bugzilla: 28332 CVE: NA
-------------------------------------------------
Remove the IPI fallback code from membarrier to deal with very infrequent cpumask memory allocation failure. Use GFP_KERNEL rather than GFP_NOWAIT, and relax the blocking guarantees for the expedited membarrier system call commands, allowing it to block if waiting for memory to be made available.
In addition, now -ENOMEM can be returned to user-space if the cpumask memory allocation fails.
Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Chris Metcalf cmetcalf@ezchip.com Cc: Christoph Lameter cl@linux.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: Kirill Tkhai tkhai@yandex.ru Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mike Galbraith efault@gmx.de Cc: Oleg Nesterov oleg@redhat.com Cc: Paul E. McKenney paulmck@linux.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Russell King - ARM Linux admin linux@armlinux.org.uk Cc: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/20190919173705.2181-8-mathieu.desnoyers@efficios.c... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Cheng Jian cj.chengjian@huawei.com Reviewed-By: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- kernel/sched/membarrier.c | 61 +++++++++++++++-------------------------------- 1 file changed, 19 insertions(+), 42 deletions(-)
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c index 5f68c54..436c427 100644 --- a/kernel/sched/membarrier.c +++ b/kernel/sched/membarrier.c @@ -75,7 +75,6 @@ void membarrier_exec_mmap(struct mm_struct *mm) static int membarrier_global_expedited(void) { int cpu; - bool fallback = false; cpumask_var_t tmpmask;
if (num_online_cpus() == 1) @@ -87,15 +86,8 @@ static int membarrier_global_expedited(void) */ smp_mb(); /* system call entry is not a mb. */
- /* - * Expedited membarrier commands guarantee that they won't - * block, hence the GFP_NOWAIT allocation flag and fallback - * implementation. - */ - if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) { - /* Fallback for OOM. */ - fallback = true; - } + if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL)) + return -ENOMEM;
cpus_read_lock(); rcu_read_lock(); @@ -126,18 +118,15 @@ static int membarrier_global_expedited(void) if (p->flags & PF_KTHREAD) continue;
- if (!fallback) - __cpumask_set_cpu(cpu, tmpmask); - else - smp_call_function_single(cpu, ipi_mb, NULL, 1); + __cpumask_set_cpu(cpu, tmpmask); } rcu_read_unlock(); - if (!fallback) { - preempt_disable(); - smp_call_function_many(tmpmask, ipi_mb, NULL, 1); - preempt_enable(); - free_cpumask_var(tmpmask); - } + + preempt_disable(); + smp_call_function_many(tmpmask, ipi_mb, NULL, 1); + preempt_enable(); + + free_cpumask_var(tmpmask); cpus_read_unlock();
/* @@ -152,7 +141,6 @@ static int membarrier_global_expedited(void) static int membarrier_private_expedited(int flags) { int cpu; - bool fallback = false; cpumask_var_t tmpmask; struct mm_struct *mm = current->mm;
@@ -177,15 +165,8 @@ static int membarrier_private_expedited(int flags) */ smp_mb(); /* system call entry is not a mb. */
- /* - * Expedited membarrier commands guarantee that they won't - * block, hence the GFP_NOWAIT allocation flag and fallback - * implementation. - */ - if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) { - /* Fallback for OOM. */ - fallback = true; - } + if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL)) + return -ENOMEM;
cpus_read_lock(); rcu_read_lock(); @@ -204,20 +185,16 @@ static int membarrier_private_expedited(int flags) continue; rcu_read_lock(); p = task_rcu_dereference(&cpu_rq(cpu)->curr); - if (p && p->mm == mm) { - if (!fallback) - __cpumask_set_cpu(cpu, tmpmask); - else - smp_call_function_single(cpu, ipi_mb, NULL, 1); - } + if (p && p->mm == mm) + __cpumask_set_cpu(cpu, tmpmask); } rcu_read_unlock(); - if (!fallback) { - preempt_disable(); - smp_call_function_many(tmpmask, ipi_mb, NULL, 1); - preempt_enable(); - free_cpumask_var(tmpmask); - } + + preempt_disable(); + smp_call_function_many(tmpmask, ipi_mb, NULL, 1); + preempt_enable(); + + free_cpumask_var(tmpmask); cpus_read_unlock();
/*
From: Peter Zijlstra peterz@infradead.org
mainline inclusion from mainline-5.4-rc2 commit 73956fc07dd7b25d4a33ab3fdd6247c60d0b237c category: bugfix bugzilla: 28332 CVE: NA
-------------------------------------------------
The following commit:
227a4aadc75b ("sched/membarrier: Fix p->mm->membarrier_state racy load")
got fat fingered by me when merging it with other patches. It meant to move the RCU section out of the for loop but ended up doing it partially, leaving a superfluous rcu_read_lock() inside, causing havok.
Reported-by: Ingo Molnar mingo@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Borislav Petkov bp@alien8.de Cc: Chris Metcalf cmetcalf@ezchip.com Cc: Christoph Lameter cl@linux.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: Kirill Tkhai tkhai@yandex.ru Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Mike Galbraith efault@gmx.de Cc: Oleg Nesterov oleg@redhat.com Cc: Paul E. McKenney paulmck@linux.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Russell King - ARM Linux admin linux@armlinux.org.uk Cc: Thomas Gleixner tglx@linutronix.de Cc: linux-tip-commits@vger.kernel.org Fixes: 227a4aadc75b ("sched/membarrier: Fix p->mm->membarrier_state racy load") Link: https://lkml.kernel.org/r/20191001085033.GP4519@hirez.programming.kicks-ass.... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Cheng Jian cj.chengjian@huawei.com Reviewed-By: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- kernel/sched/membarrier.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c index 436c427..c4ea07e 100644 --- a/kernel/sched/membarrier.c +++ b/kernel/sched/membarrier.c @@ -183,7 +183,6 @@ static int membarrier_private_expedited(int flags) */ if (cpu == raw_smp_processor_id()) continue; - rcu_read_lock(); p = task_rcu_dereference(&cpu_rq(cpu)->curr); if (p && p->mm == mm) __cpumask_set_cpu(cpu, tmpmask);
From: Cheng Jian cj.chengjian@huawei.com
hulk inclusion category: feature bugzilla: 28332 CVE: NA
-------------------------------------------------
Signed-off-by: Cheng Jian cj.chengjian@huawei.com Reviewed-By: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- include/linux/mm_types.h | 20 ++++++++++---------- kernel/sched/sched.h | 11 +++++++---- 2 files changed, 17 insertions(+), 14 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 178e9de..bb48449 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -369,16 +369,6 @@ struct mm_struct { unsigned long highest_vm_end; /* highest vma end address */ pgd_t * pgd;
-#ifdef CONFIG_MEMBARRIER - /** - * @membarrier_state: Flags controlling membarrier behavior. - * - * This field is close to @pgd to hopefully fit in the same - * cache-line, which needs to be touched by switch_mm(). - */ - atomic_t membarrier_state; -#endif - /** * @mm_users: The number of users including userspace. * @@ -449,6 +439,16 @@ struct mm_struct {
struct core_state *core_state; /* coredumping support */
+#ifdef CONFIG_MEMBARRIER + /** + * @membarrier_state: Flags controlling membarrier behavior. + * + * This field is close to @pgd to hopefully fit in the same + * cache-line, which needs to be touched by switch_mm(). + */ + atomic_t membarrier_state; +#endif + #ifdef CONFIG_AIO spinlock_t ioctx_lock; struct kioctx_table __rcu *ioctx_table; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 2213dac..378bf05 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -843,10 +843,6 @@ struct rq {
atomic_t nr_iowait;
-#ifdef CONFIG_MEMBARRIER - int membarrier_state; -#endif - #ifdef CONFIG_SMP struct root_domain *rd; struct sched_domain *sd; @@ -930,7 +926,14 @@ struct rq { struct cpuidle_state *idle_state; #endif
+#if defined(CONFIG_MEMBARRIER) && !defined(__GENKSYMS__) + union { + int membarrier_state; + long membarrier_state_KABI; + }; +#else KABI_RESERVE(1) +#endif KABI_RESERVE(2) };
From: Cheng Jian cj.chengjian@huawei.com
hulk inclusion category: bugzilla bugzilla: 28332 CVE: NA
-------------------------------------------------
commit 2293e84612ef ("selftests, sched/membarrier: Add multi-threaded test") add a new test for membarrier. Some non-existent functions were used.
ksft_set_plan() was introduced by the following commit : 5821ba969511 ("selftests: Add test plan API to kselftest.h and adjust callers") which was not be merged.
so we get the next error when compile this testcase : membarrier_test_single_thread.c: In function 'main': membarrier_test_single_thread.c:15:2: warning: implicit declaration of function 'ksft_set_plan'; did you mean 'ksft_exit_pass'? [-Wimplicit-function-declaration] ksft_set_plan(13); ^~~~~~~~~~~~~ ksft_exit_pass
Fixes : 2293e84612ef ("selftests, sched/membarrier: Add multi-threaded test") Signed-off-by: Cheng Jian cj.chengjian@huawei.com Reviewed-By: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- tools/testing/selftests/membarrier/membarrier_test_multi_thread.c | 1 - tools/testing/selftests/membarrier/membarrier_test_single_thread.c | 1 - 2 files changed, 2 deletions(-)
diff --git a/tools/testing/selftests/membarrier/membarrier_test_multi_thread.c b/tools/testing/selftests/membarrier/membarrier_test_multi_thread.c index ac5613e..6ae8678 100644 --- a/tools/testing/selftests/membarrier/membarrier_test_multi_thread.c +++ b/tools/testing/selftests/membarrier/membarrier_test_multi_thread.c @@ -62,7 +62,6 @@ static int test_mt_membarrier(void) int main(int argc, char **argv) { ksft_print_header(); - ksft_set_plan(13);
test_membarrier_query();
diff --git a/tools/testing/selftests/membarrier/membarrier_test_single_thread.c b/tools/testing/selftests/membarrier/membarrier_test_single_thread.c index c1c9639..97dd81d 100644 --- a/tools/testing/selftests/membarrier/membarrier_test_single_thread.c +++ b/tools/testing/selftests/membarrier/membarrier_test_single_thread.c @@ -12,7 +12,6 @@ int main(int argc, char **argv) { ksft_print_header(); - ksft_set_plan(13);
test_membarrier_query();