From: Xiaoming Ni <nixiaoming@huawei.com>
mainline inclusion
from mainline-v5.5-rc1
commit 1a50cb80f219c44adb6265f5071b81fc3c1deced
category: bugfix
bugzilla: NA
CVE: NA
---------------------------------------------
[ Upstream commit 1a50cb80f219c44adb6265f5071b81fc3c1deced ]
Registering the same notifier to a hook repeatedly can cause the hook list to form a ring or lose other members of the list.
case1: An infinite loop in notifier_chain_register() can cause soft lockup
	atomic_notifier_chain_register(&test_notifier_list, &test1);
	atomic_notifier_chain_register(&test_notifier_list, &test1);
	atomic_notifier_chain_register(&test_notifier_list, &test2);

case2: An infinite loop in notifier_call_chain() can cause soft lockup
	atomic_notifier_chain_register(&test_notifier_list, &test1);
	atomic_notifier_chain_register(&test_notifier_list, &test1);
	atomic_notifier_call_chain(&test_notifier_list, 0, NULL);

case3: lose other hook test2
	atomic_notifier_chain_register(&test_notifier_list, &test1);
	atomic_notifier_chain_register(&test_notifier_list, &test2);
	atomic_notifier_chain_register(&test_notifier_list, &test1);
case4: Unregister returns 0, but the hook is still in the linked list even though it is no longer really registered. If notifier_call_chain() is called after the module (.ko) has been unloaded, it triggers an oops.
If the system is configured with softlockup_panic and the same hook is repeatedly registered on panic_notifier_list, it will cause a panic loop.
Add a check in notifier_chain_register() to intercept duplicate registrations and avoid the infinite loop.
Link: http://lkml.kernel.org/r/1568861888-34045-2-git-send-email-nixiaoming@huawei...
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 kernel/notifier.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/kernel/notifier.c b/kernel/notifier.c
index 6196af8a8223..c6de38836f50 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -22,6 +22,11 @@ static int notifier_chain_register(struct notifier_block **nl,
 		struct notifier_block *n)
 {
 	while ((*nl) != NULL) {
+		if (unlikely((*nl) == n)) {
+			WARN(1, "double register detected");
+			return 0;
+		}
+
 		if (n->priority > (*nl)->priority)
 			break;
 		nl = &((*nl)->next);
From: Michal Hocko <mhocko@suse.com>
mainline inclusion
from mainline-v5.4-rc7
commit 93b3a674485f6a4b8ffff85d1682d5e8b7c51560
category: bugfix
bugzilla: NA
CVE: NA
-------------------------------------------------
pagetypeinfo_showfree_print is called with zone->lock held in irq mode. This is not really nice because it blocks both any interrupts on that cpu and the page allocator. On large machines this might even trigger the hard lockup detector.
Considering the pagetypeinfo is a debugging tool, we do not really need exact numbers here. The primary reason to look at the output is to see how pageblocks are spread among different migratetypes, and a low number of pages is much more interesting, therefore putting a bound on the number of pages on the free_list sounds like a reasonable tradeoff.
The new output will simply tell
[...]
Node 6, zone Normal, type Movable >100000 >100000 >100000 >100000 41019 31560 23996 10054 3229 983 648

instead of

Node 6, zone Normal, type Movable 399568 294127 221558 102119 41019 31560 23996 10054 3229 983 648
The limit has been chosen arbitrarily and is subject to a future change should there be a need for that.
While we are at it, also drop the zone lock after each free_list iteration, which will help with the IRQ and page allocator responsiveness even further, as the lock hold time is always bound to those 100k pages.
[akpm@linux-foundation.org: tweak comment text, per David Hildenbrand]
Link: http://lkml.kernel.org/r/20191025072610.18526-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/vmstat.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index ce81b0a7d018..96028cc96f2f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1378,12 +1378,29 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
 			unsigned long freecount = 0;
 			struct free_area *area;
 			struct list_head *curr;
+			bool overflow = false;
 
 			area = &(zone->free_area[order]);
 
-			list_for_each(curr, &area->free_list[mtype])
-				freecount++;
-			seq_printf(m, "%6lu ", freecount);
+			list_for_each(curr, &area->free_list[mtype]) {
+				/*
+				 * Cap the free_list iteration because it might
+				 * be really large and we are under a spinlock
+				 * so a long time spent here could trigger a
+				 * hard lockup detector. Anyway this is a
+				 * debugging tool so knowing there is a handful
+				 * of pages of this order should be more than
+				 * sufficient.
+				 */
+				if (++freecount >= 100000) {
+					overflow = true;
+					break;
+				}
+			}
+			seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
+			spin_unlock_irq(&zone->lock);
+			cond_resched();
+			spin_lock_irq(&zone->lock);
 		}
 		seq_putc(m, '\n');
 	}