fix hugetlb deadlock.
Mike Kravetz (1):
  hugetlb: make free_huge_page irq safe

Waiman Long (1):
  mm/hugetlb: defer freeing of huge pages if in non-task context

 fs/hugetlbfs/inode.c |   4 +-
 mm/hugetlb.c         | 124 +++++++++++++++++++++++--------------------
 mm/hugetlb_cgroup.c  |   8 +--
 3 files changed, 71 insertions(+), 65 deletions(-)
Feedback: The patch(es) which you have sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/8205 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/V...
From: Waiman Long <longman@redhat.com>

mainline inclusion
from mainline-v5.5-rc5
commit c77c0a8ac4c522638a8242fcb9de9496e3cdbb2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9SZXR
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
The following lockdep splat was observed when a certain hugetlbfs test was run:
  ================================
  WARNING: inconsistent lock state
  4.18.0-159.el8.x86_64+debug #1 Tainted: G W --------- - -
  --------------------------------
  inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
  swapper/30/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
  ffffffff9acdc038 (hugetlb_lock){+.?.}, at: free_huge_page+0x36f/0xaa0
  {SOFTIRQ-ON-W} state was registered at:
    lock_acquire+0x14f/0x3b0
    _raw_spin_lock+0x30/0x70
    __nr_hugepages_store_common+0x11b/0xb30
    hugetlb_sysctl_handler_common+0x209/0x2d0
    proc_sys_call_handler+0x37f/0x450
    vfs_write+0x157/0x460
    ksys_write+0xb8/0x170
    do_syscall_64+0xa5/0x4d0
    entry_SYSCALL_64_after_hwframe+0x6a/0xdf
  irq event stamp: 691296
  hardirqs last enabled at (691296): [<ffffffff99bb034b>] _raw_spin_unlock_irqrestore+0x4b/0x60
  hardirqs last disabled at (691295): [<ffffffff99bb0ad2>] _raw_spin_lock_irqsave+0x22/0x81
  softirqs last enabled at (691284): [<ffffffff97ff0c63>] irq_enter+0xc3/0xe0
  softirqs last disabled at (691285): [<ffffffff97ff0ebe>] irq_exit+0x23e/0x2b0

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(hugetlb_lock);
    <Interrupt>
      lock(hugetlb_lock);

   *** DEADLOCK ***
      :
  Call Trace:
   <IRQ>
   __lock_acquire+0x146b/0x48c0
   lock_acquire+0x14f/0x3b0
   _raw_spin_lock+0x30/0x70
   free_huge_page+0x36f/0xaa0
   bio_check_pages_dirty+0x2fc/0x5c0
   clone_endio+0x17f/0x670 [dm_mod]
   blk_update_request+0x276/0xe50
   scsi_end_request+0x7b/0x6a0
   scsi_io_completion+0x1c6/0x1570
   blk_done_softirq+0x22e/0x350
   __do_softirq+0x23d/0xad8
   irq_exit+0x23e/0x2b0
   do_IRQ+0x11a/0x200
   common_interrupt+0xf/0xf
   </IRQ>
Both the hugetlb_lock and the subpool lock can be acquired in free_huge_page(). One way to solve the problem is to make both locks irq-safe. However, Mike Kravetz had learned that the hugetlb_lock is held for a linear scan of ALL hugetlb pages during a cgroup reparenting operation. So it is just too long to have irq disabled unless we can break hugetlb_lock down into finer-grained locks with shorter lock hold times.
Another alternative is to defer the freeing to a workqueue job. This patch implements the deferred freeing by adding a free_hpage_workfn() work function to do the actual freeing. The free_huge_page() call in a non-task context saves the page to be freed in the hpage_freelist linked list in a lockless manner using the llist APIs.
The generic workqueue is used to process the work, but a dedicated workqueue can be used instead if it is desirable to have the huge page freed ASAP.
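As an illustration of the deferral scheme described above, here is a minimal userspace sketch, not the kernel code. It assumes C11 atomics as a stand-in for the kernel's llist implementation; struct fake_page, defer_free(), drain(), demo() and the two counters are hypothetical names used only here. Unlike the patch, which casts &page->mapping to a struct llist_node *, the sketch overlays the node on the mapping field with an anonymous union so that it stays within standard C aliasing rules. It shows the two properties the patch relies on: llist_add() reports whether the list was empty, so the work item is scheduled only once per batch, and llist_del_all() detaches the whole batch in one atomic step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's llist types (illustration only). */
struct llist_node { struct llist_node *next; };
struct llist_head { _Atomic(struct llist_node *) first; };

/* Push a node; returns true if the list was empty before the push. */
static bool llist_add(struct llist_node *node, struct llist_head *head)
{
	struct llist_node *first = atomic_load(&head->first);

	do {
		node->next = first;
	} while (!atomic_compare_exchange_weak(&head->first, &first, node));

	return first == NULL;
}

/* Detach the whole list in a single atomic exchange. */
static struct llist_node *llist_del_all(struct llist_head *head)
{
	return atomic_exchange(&head->first, (struct llist_node *)NULL);
}

/* Hypothetical page type: the list node overlays the mapping pointer. */
struct fake_page {
	int id;
	union {
		void *mapping;
		struct llist_node free_node;
	};
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct llist_head hpage_freelist;
static int work_scheduled;	/* counts stand-in schedule_work() calls */
static int pages_freed;		/* counts stand-in __free_huge_page() calls */

/* Models the !in_task() branch: queue the page, kick the worker if needed. */
static void defer_free(struct fake_page *page)
{
	if (llist_add(&page->free_node, &hpage_freelist))
		work_scheduled++;
}

/* Models the work function: take the whole batch, then free one by one. */
static void drain(void)
{
	struct llist_node *node = llist_del_all(&hpage_freelist);

	while (node) {
		struct fake_page *page =
			container_of(node, struct fake_page, free_node);

		node = node->next;	/* read the link before reusing the field */
		page->mapping = NULL;
		pages_freed++;
	}
}

/* Single-threaded walk-through: three deferred frees, one scheduled work. */
static void demo(void)
{
	struct fake_page pages[3] = { { .id = 0 }, { .id = 1 }, { .id = 2 } };

	for (int i = 0; i < 3; i++)
		defer_free(&pages[i]);
	assert(work_scheduled == 1);

	drain();
	assert(pages_freed == 3);
}
```

Calling demo() from a main() function exercises the sketch. Only the first defer_free() bumps work_scheduled, which mirrors the comment in the patch about calling schedule_work() only when hpage_freelist was previously empty.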
Thanks to Kirill Tkhai <ktkhai@virtuozzo.com> for suggesting the use of llist APIs which simplify the code.
Link: http://lkml.kernel.org/r/20191217170331.30893-1-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	mm/hugetlb.c
[Context conflicts.]
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 mm/hugetlb.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 04caf77c51d7..41e42d2fcc44 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -33,6 +33,7 @@
 #include <linux/delay.h>
 #include <linux/migrate.h>
 #include <linux/mm_inline.h>
+#include <linux/llist.h>
 
 #include <asm/page.h>
 #include <asm/pgtable.h>
@@ -1386,7 +1387,7 @@ void free_huge_page_to_dhugetlb_pool(struct page *page, bool restore_reserve)
 }
 #endif
 
-void free_huge_page(struct page *page)
+static void __free_huge_page(struct page *page)
 {
 	/*
 	 * Can't pass hstate in here because it is called from the
@@ -1461,6 +1462,54 @@ void free_huge_page(struct page *page)
 	spin_unlock(&hugetlb_lock);
 }
 
+/*
+ * As free_huge_page() can be called from a non-task context, we have
+ * to defer the actual freeing in a workqueue to prevent potential
+ * hugetlb_lock deadlock.
+ *
+ * free_hpage_workfn() locklessly retrieves the linked list of pages to
+ * be freed and frees them one-by-one. As the page->mapping pointer is
+ * going to be cleared in __free_huge_page() anyway, it is reused as the
+ * llist_node structure of a lockless linked list of huge pages to be freed.
+ */
+static LLIST_HEAD(hpage_freelist);
+
+static void free_hpage_workfn(struct work_struct *work)
+{
+	struct llist_node *node;
+	struct page *page;
+
+	node = llist_del_all(&hpage_freelist);
+
+	while (node) {
+		page = container_of((struct address_space **)node,
+				     struct page, mapping);
+		node = node->next;
+		__free_huge_page(page);
+	}
+}
+static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
+
+void free_huge_page(struct page *page)
+{
+	/*
+	 * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
+	 */
+	if (!in_task()) {
+		/*
+		 * Only call schedule_work() if hpage_freelist is previously
+		 * empty. Otherwise, schedule_work() had been called but the
+		 * workfn hasn't retrieved the list yet.
+		 */
+		if (llist_add((struct llist_node *)&page->mapping,
+			      &hpage_freelist))
+			schedule_work(&free_hpage_work);
+		return;
+	}
+
+	__free_huge_page(page);
+}
+
 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 {
 	INIT_LIST_HEAD(&page->lru);
From: Mike Kravetz <mike.kravetz@oracle.com>

mainline inclusion
from mainline-v5.13-rc1
commit db71ef79b59bb2e78dc4df83d0e4bf6beaa5c82d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9SZXR
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task context") was added to address the issue of free_huge_page being called from irq context. That commit hands off free_huge_page processing to a workqueue if !in_task. However, this doesn't cover all the cases as pointed out by 0day bot lockdep report [1].
:  Possible interrupt unsafe locking scenario:
:
:        CPU0                    CPU1
:        ----                    ----
:   lock(hugetlb_lock);
:                                local_irq_disable();
:                                lock(slock-AF_INET);
:                                lock(hugetlb_lock);
:   <Interrupt>
:     lock(slock-AF_INET);
Shakeel later explained that this is very likely the TCP TX zerocopy from hugetlb pages scenario, in which the networking code drops the last reference to a hugetlb page while IRQs are disabled. The hugetlb freeing path doesn't disable IRQs while holding hugetlb_lock, so a lock dependency chain can lead to a deadlock.
This commit addresses the issue by doing the following:
- Make hugetlb_lock irq safe. This is mostly a simple process of
  changing spin_*lock calls to spin_*lock_irq* calls.
- Make subpool lock irq safe in a similar manner.
- Revert the !in_task check and workqueue handoff.
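The split between the _irq and _irqsave lock variants in the list above can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel code: a single bool stands in for the CPU's interrupt-enable state, and every model_*, free_with_* and demo_* name is hypothetical. It relies only on the documented behaviour that spin_unlock_irq() unconditionally re-enables interrupts, whereas spin_unlock_irqrestore() restores the saved state. That is presumably why the diff uses the irqsave variants in free_huge_page() and the subpool put paths, which may run with IRQs already disabled, and the plain _irq variants elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one flag stands in for the CPU's "interrupts enabled" state. */
static bool irqs_enabled = true;

/* Models spin_lock_irq()/spin_unlock_irq(): unlock always enables IRQs. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Models spin_lock_irqsave(): remember the previous state, then disable. */
static unsigned long model_lock_irqsave(void)
{
	unsigned long flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

/* Models spin_unlock_irqrestore(): put back whatever the caller had. */
static void model_unlock_irqrestore(unsigned long flags)
{
	irqs_enabled = (flags != 0);
}

/* A free path written with the _irq variants. */
static void free_with_irq(void)
{
	model_lock_irq();
	model_unlock_irq();
}

/* A free path written with the _irqsave variants. */
static void free_with_irqsave(void)
{
	unsigned long flags = model_lock_irqsave();

	model_unlock_irqrestore(flags);
}

/*
 * Caller that disabled IRQs before dropping the last page reference.
 * Returns true if IRQs are still disabled after the free path returns.
 */
static bool caller_irqs_still_off(void (*free_fn)(void))
{
	bool still_off;

	irqs_enabled = false;		/* stands in for local_irq_disable() */
	free_fn();
	still_off = !irqs_enabled;
	irqs_enabled = true;		/* stands in for local_irq_enable() */
	return still_off;
}

static void demo_irq(void)
{
	/* The _irq variant silently re-enables IRQs under the caller. */
	assert(!caller_irqs_still_off(free_with_irq));
	/* The _irqsave variant leaves the caller's IRQ state untouched. */
	assert(caller_irqs_still_off(free_with_irqsave));
}
```

In the model, a caller that enters the free path with interrupts off gets them back on unexpectedly when the _irq variants are used, while the irqsave variants preserve the caller's state.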
[1] https://lore.kernel.org/linux-mm/000000000000f1c03b05bc43aadc@google.com/
Link: https://lkml.kernel.org/r/20210409205254.242291-8-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: HORIGUCHI NAOYA <naoya.horiguchi@nec.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
	mm/hugetlb.c
	mm/hugetlb_cgroup.c
	fs/hugetlbfs/inode.c
[Context conflicts. The Dynamic Hugetlb feature and hugetlb_checknode()
also use hugetlb_lock; convert these uses too.]
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 fs/hugetlbfs/inode.c |   4 +-
 mm/hugetlb.c         | 173 ++++++++++++++++---------------------------
 mm/hugetlb_cgroup.c  |   8 +-
 3 files changed, 71 insertions(+), 114 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 014ee6533e2e..92077383f320 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -131,7 +131,7 @@ static int hugetlb_checknode(struct vm_area_struct *vma, long nr)
 	int ret = 0;
 	struct hstate *h = &default_hstate;
 
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 
 	nid = vma->vm_flags >> CHECKNODE_BITS;
 
@@ -155,7 +155,7 @@ static int hugetlb_checknode(struct vm_area_struct *vma, long nr)
 	}
 
 err:
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 	return ret;
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 41e42d2fcc44..14b87bd77376 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -105,11 +105,12 @@ static inline void ClearPageHugeFreed(struct page *head)
 static int hugetlb_acct_memory(struct hstate *h, long delta,
 			       struct dhugetlb_pool *hpool);
 
-static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
+static inline void unlock_or_release_subpool(struct hugepage_subpool *spool,
+						unsigned long irq_flags)
 {
 	bool free = (spool->count == 0) && (spool->used_hpages == 0);
 
-	spin_unlock(&spool->lock);
+	spin_unlock_irqrestore(&spool->lock, irq_flags);
 
 	/* If no pages are used, and no other handles to the subpool
 	 * remain, give up any reservations mased on minimum size and
@@ -148,10 +149,12 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
 
 void hugepage_put_subpool(struct hugepage_subpool *spool)
 {
-	spin_lock(&spool->lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&spool->lock, flags);
 	BUG_ON(!spool->count);
 	spool->count--;
-	unlock_or_release_subpool(spool);
+	unlock_or_release_subpool(spool, flags);
 }
 
 /*
@@ -174,7 +177,7 @@ static long hugepage_subpool_get_pages(struct hugepage_subpool *spool,
 	if (dhugetlb_enabled && hpool)
 		return ret;
 
-	spin_lock(&spool->lock);
+	spin_lock_irq(&spool->lock);
 
 	if (spool->max_hpages != -1) {		/* maximum size accounting */
 		if ((spool->used_hpages + delta) <= spool->max_hpages)
@@ -201,7 +204,7 @@ static long hugepage_subpool_get_pages(struct hugepage_subpool *spool,
 	}
 
 unlock_ret:
-	spin_unlock(&spool->lock);
+	spin_unlock_irq(&spool->lock);
 	return ret;
 }
 
@@ -215,6 +218,7 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool,
 				       long delta, struct dhugetlb_pool *hpool)
 {
 	long ret = delta;
+	unsigned long flags;
 
 	if (!spool)
 		return delta;
@@ -223,7 +227,7 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool,
 	if (dhugetlb_enabled && hpool)
 		return ret;
 
-	spin_lock(&spool->lock);
+	spin_lock_irqsave(&spool->lock, flags);
 
 	if (spool->max_hpages != -1)		/* maximum size accounting */
 		spool->used_hpages -= delta;
@@ -244,7 +248,7 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool,
 	 * If hugetlbfs_put_super couldn't free spool due to an outstanding
 	 * quota reference, free it now.
 	 */
-	unlock_or_release_subpool(spool);
+	unlock_or_release_subpool(spool, flags);
 
 	return ret;
 }
@@ -1387,7 +1391,7 @@ void free_huge_page_to_dhugetlb_pool(struct page *page, bool restore_reserve)
 }
 #endif
 
-static void __free_huge_page(struct page *page)
+void free_huge_page(struct page *page)
 {
 	/*
 	 * Can't pass hstate in here because it is called from the
@@ -1398,6 +1402,7 @@ static void __free_huge_page(struct page *page)
 	struct hugepage_subpool *spool =
 		(struct hugepage_subpool *)page_private(page);
 	bool restore_reserve;
+	unsigned long flags;
 
 	sp_kmemcg_uncharge_hpage(page);
 	set_page_private(page, 0);
@@ -1408,12 +1413,12 @@ static void __free_huge_page(struct page *page)
 	ClearPagePrivate(page);
 
 	if (dhugetlb_enabled && PagePool(page)) {
-		spin_lock(&hugetlb_lock);
+		spin_lock_irqsave(&hugetlb_lock, flags);
 		clear_page_huge_active(page);
 		list_del(&page->lru);
 		hugetlb_cgroup_uncharge_page(hstate_index(h),
 					     pages_per_huge_page(h), page);
-		spin_unlock(&hugetlb_lock);
+		spin_unlock_irqrestore(&hugetlb_lock, flags);
 		free_huge_page_to_dhugetlb_pool(page, restore_reserve);
 		return;
 	}
@@ -1437,7 +1442,7 @@ static void __free_huge_page(struct page *page)
 		restore_reserve = true;
 	}
 
-	spin_lock(&hugetlb_lock);
+	spin_lock_irqsave(&hugetlb_lock, flags);
 	clear_page_huge_active(page);
 	hugetlb_cgroup_uncharge_page(hstate_index(h),
 				     pages_per_huge_page(h), page);
@@ -1459,67 +1464,19 @@ static void __free_huge_page(struct page *page)
 		arch_clear_hugepage_flags(page);
 		enqueue_huge_page(h, page);
 	}
-	spin_unlock(&hugetlb_lock);
-}
-
-/*
- * As free_huge_page() can be called from a non-task context, we have
- * to defer the actual freeing in a workqueue to prevent potential
- * hugetlb_lock deadlock.
- *
- * free_hpage_workfn() locklessly retrieves the linked list of pages to
- * be freed and frees them one-by-one. As the page->mapping pointer is
- * going to be cleared in __free_huge_page() anyway, it is reused as the
- * llist_node structure of a lockless linked list of huge pages to be freed.
- */
-static LLIST_HEAD(hpage_freelist);
-
-static void free_hpage_workfn(struct work_struct *work)
-{
-	struct llist_node *node;
-	struct page *page;
-
-	node = llist_del_all(&hpage_freelist);
-
-	while (node) {
-		page = container_of((struct address_space **)node,
-				     struct page, mapping);
-		node = node->next;
-		__free_huge_page(page);
-	}
-}
-static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
-
-void free_huge_page(struct page *page)
-{
-	/*
-	 * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
-	 */
-	if (!in_task()) {
-		/*
-		 * Only call schedule_work() if hpage_freelist is previously
-		 * empty. Otherwise, schedule_work() had been called but the
-		 * workfn hasn't retrieved the list yet.
-		 */
-		if (llist_add((struct llist_node *)&page->mapping,
-			      &hpage_freelist))
-			schedule_work(&free_hpage_work);
-		return;
-	}
-
-	__free_huge_page(page);
+	spin_unlock_irqrestore(&hugetlb_lock, flags);
 }
 
 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 {
 	INIT_LIST_HEAD(&page->lru);
 	set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	set_hugetlb_cgroup(page, NULL);
 	h->nr_huge_pages++;
 	h->nr_huge_pages_node[nid]++;
 	ClearPageHugeFreed(page);
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 }
 
 static void prep_compound_gigantic_page(struct page *page, unsigned int order)
@@ -1821,7 +1778,7 @@ int dissolve_free_huge_page(struct page *page)
 	if (page_belong_to_dynamic_hugetlb(page))
 		return -EBUSY;
 
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	if (!PageHuge(page)) {
 		rc = 0;
 		goto out;
@@ -1839,7 +1796,7 @@ int dissolve_free_huge_page(struct page *page)
 		 * when it is dissolved.
 		 */
 		if (unlikely(!PageHugeFreed(head))) {
-			spin_unlock(&hugetlb_lock);
+			spin_unlock_irq(&hugetlb_lock);
 			cond_resched();
 
 			/*
@@ -1869,7 +1826,7 @@ int dissolve_free_huge_page(struct page *page)
 		rc = 0;
 	}
 out:
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 	return rc;
 }
 
@@ -1911,16 +1868,16 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
 	if (hstate_is_gigantic(h))
 		return NULL;
 
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages)
 		goto out_unlock;
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 
 	page = alloc_fresh_huge_page(h, gfp_mask, nid, nmask, NULL);
 	if (!page)
 		return NULL;
 
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	/*
 	 * We could have raced with the pool size change.
 	 * Double check that and simply deallocate the new page
@@ -1930,7 +1887,7 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
 	 */
 	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
 		SetPageHugeTemporary(page);
-		spin_unlock(&hugetlb_lock);
+		spin_unlock_irq(&hugetlb_lock);
 		put_page(page);
 		return NULL;
 	} else {
@@ -1939,7 +1896,7 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
 	}
 
 out_unlock:
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 
 	return page;
 }
@@ -1994,10 +1951,10 @@ struct page *alloc_huge_page_node(struct hstate *h, int nid)
 	if (nid != NUMA_NO_NODE)
 		gfp_mask |= __GFP_THISNODE;
 
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	if (h->free_huge_pages - h->resv_huge_pages > 0)
 		page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL, NULL);
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 
 	if (!page) {
 		if (enable_charge_mighp)
@@ -2015,18 +1972,18 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 {
 	gfp_t gfp_mask = htlb_alloc_mask(h);
 
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	if (h->free_huge_pages - h->resv_huge_pages > 0) {
 		struct page *page;
 
 		page = dequeue_huge_page_nodemask(h, gfp_mask, preferred_nid,
 						  nmask, NULL);
 		if (page) {
-			spin_unlock(&hugetlb_lock);
+			spin_unlock_irq(&hugetlb_lock);
 			return page;
 		}
 	}
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 
 	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
 }
@@ -2073,7 +2030,7 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 
 	ret = -ENOMEM;
 retry:
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 	for (i = 0; i < needed; i++) {
 		page = alloc_surplus_huge_page(h, htlb_alloc_mask(h),
 				NUMA_NO_NODE, NULL);
@@ -2090,7 +2047,7 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 	 * After retaking hugetlb_lock, we need to recalculate 'needed'
 	 * because either resv_huge_pages or free_huge_pages may have changed.
 	 */
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	needed = (h->resv_huge_pages + delta) -
 			(h->free_huge_pages + allocated);
 	if (needed > 0) {
@@ -2128,12 +2085,12 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 		enqueue_huge_page(h, page);
 	}
 free:
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 
 	/* Free unnecessary surplus pages to the buddy allocator */
 	list_for_each_entry_safe(page, tmp, &surplus_list, lru)
 		put_page(page);
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 
 	return ret;
 }
@@ -2489,18 +2446,18 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 			 * Use hugetlb_lock to manage the account of
 			 * hugetlb cgroup.
 			 */
-			spin_lock(&hugetlb_lock);
+			spin_lock_irq(&hugetlb_lock);
 			list_add(&page->lru, &h->hugepage_activelist);
 			hugetlb_cgroup_commit_charge(idx,
 				pages_per_huge_page(hstate_vma(vma)),
 				h_cg, page);
-			spin_unlock(&hugetlb_lock);
+			spin_unlock_irq(&hugetlb_lock);
 			goto out;
 		}
 		goto out_uncharge_cgroup;
 	}
 
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	/*
 	 * glb_chg is passed to indicate whether or not a page must be taken
 	 * from the global free pool (global change). gbl_chg == 0 indicates
@@ -2508,11 +2465,11 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 	 */
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, gbl_chg);
 	if (!page) {
-		spin_unlock(&hugetlb_lock);
+		spin_unlock_irq(&hugetlb_lock);
 		page = alloc_buddy_huge_page_with_mpol(h, vma, addr);
 		if (!page)
 			goto out_uncharge_cgroup;
-		spin_lock(&hugetlb_lock);
+		spin_lock_irq(&hugetlb_lock);
 		if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
 			SetPagePrivate(page);
 			h->resv_huge_pages--;
@@ -2521,7 +2478,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 		/* Fall through */
 	}
 	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 out:
 	set_page_private(page, (unsigned long)spool);
 
@@ -2830,7 +2787,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
 		return h->max_huge_pages;
 	}
 
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 
 	/*
 	 * Check for a node specific request.
@@ -2874,14 +2831,14 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
 		 * page, free_huge_page will handle it by freeing the page
 		 * and reducing the surplus.
 		 */
-		spin_unlock(&hugetlb_lock);
+		spin_unlock_irq(&hugetlb_lock);
 
 		/* yield cpu to avoid soft lockup */
 		cond_resched();
 
 		ret = alloc_pool_huge_page(h, nodes_allowed,
 						node_alloc_noretry);
-		spin_lock(&hugetlb_lock);
+		spin_lock_irq(&hugetlb_lock);
 		if (!ret)
 			goto out;
 
@@ -2919,7 +2876,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
 	}
 out:
 	ret = persistent_huge_pages(h);
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 
 	NODEMASK_FREE(node_alloc_noretry);
 
@@ -3086,9 +3043,9 @@ static ssize_t nr_overcommit_hugepages_store(struct kobject *kobj,
 	if (err)
 		return err;
 
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	h->nr_overcommit_huge_pages = input;
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 
 	return count;
 }
@@ -3399,7 +3356,7 @@ int alloc_hugepage_from_hugetlb(struct dhugetlb_pool *hpool,
 		return -ENOMEM;
 
 	spin_lock(&hpool->lock);
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	if (h->free_huge_pages_node[nid] < size) {
 		ret = -ENOMEM;
 		goto out_unlock;
@@ -3421,7 +3378,7 @@ int alloc_hugepage_from_hugetlb(struct dhugetlb_pool *hpool,
 	}
 	ret = 0;
 out_unlock:
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 	spin_unlock(&hpool->lock);
 	return ret;
 }
@@ -3786,7 +3743,7 @@ static void free_back_hugetlb(struct dhugetlb_pool *hpool)
 	if (!h)
 		return;
 
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	list_for_each_entry_safe(page, page_next,
 			&hpool->dhugetlb_1G_freelists, lru) {
 		nr_pages = 1 << huge_page_order(h);
@@ -3813,7 +3770,7 @@ static void free_back_hugetlb(struct dhugetlb_pool *hpool)
 	hpool->free_reserved_1G = 0;
 	hpool->total_reserved_1G = 0;
 	INIT_LIST_HEAD(&hpool->dhugetlb_1G_freelists);
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 }
 
 bool free_dhugetlb_pool(struct dhugetlb_pool *hpool)
@@ -4583,9 +4540,9 @@ int hugetlb_overcommit_handler(struct ctl_table *table, int write,
 		goto out;
 
 	if (write) {
-		spin_lock(&hugetlb_lock);
+		spin_lock_irq(&hugetlb_lock);
 		h->nr_overcommit_huge_pages = tmp;
-		spin_unlock(&hugetlb_lock);
+		spin_unlock_irq(&hugetlb_lock);
 	}
 out:
 	return ret;
@@ -4682,7 +4639,7 @@ static int hugetlb_acct_memory(struct hstate *h, long delta,
 	if (dhugetlb_enabled && hpool)
 		return dhugetlb_acct_memory(h, delta, hpool);
 
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	/*
 	 * When cpuset is configured, it breaks the strict hugetlb page
 	 * reservation as the accounting is done on a global variable. Such
@@ -4715,7 +4672,7 @@ static int hugetlb_acct_memory(struct hstate *h, long delta,
 		return_unused_surplus_pages(h, (unsigned long) -delta);
 
 out:
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 	return ret;
 }
 
@@ -6622,7 +6579,7 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
 {
 	bool ret = true;
 
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	if (!PageHeadHuge(page) || !page_huge_active(page) ||
 	    !get_page_unless_zero(page)) {
 		ret = false;
@@ -6631,17 +6588,17 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
 	clear_page_huge_active(page);
 	list_move_tail(&page->lru, list);
 unlock:
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 	return ret;
 }
 
 void putback_active_hugepage(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageHead(page), page);
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	set_page_huge_active(page);
 	list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 	put_page(page);
 }
 
@@ -6669,12 +6626,12 @@ void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason)
 		SetPageHugeTemporary(oldpage);
 		ClearPageHugeTemporary(newpage);
 
-		spin_lock(&hugetlb_lock);
+		spin_lock_irq(&hugetlb_lock);
 		if (h->surplus_huge_pages_node[old_nid]) {
 			h->surplus_huge_pages_node[old_nid]--;
 			h->surplus_huge_pages_node[new_nid]++;
 		}
-		spin_unlock(&hugetlb_lock);
+		spin_unlock_irq(&hugetlb_lock);
 	}
 }
 
@@ -6690,10 +6647,10 @@ static struct page *hugetlb_alloc_hugepage_normal(struct hstate *h,
 {
 	struct page *page = NULL;
 
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	if (h->free_huge_pages - h->resv_huge_pages > 0)
 		page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL, NULL);
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 
 	return page;
 }
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 7a93e1e439dd..13110d8ca3ea 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -167,11 +167,11 @@ static void hugetlb_cgroup_css_offline(struct cgroup_subsys_state *css)
 
 	do {
 		for_each_hstate(h) {
-			spin_lock(&hugetlb_lock);
+			spin_lock_irq(&hugetlb_lock);
 			list_for_each_entry(page, &h->hugepage_activelist, lru)
 				hugetlb_cgroup_move_parent(idx, h_cg, page);
 
-			spin_unlock(&hugetlb_lock);
+			spin_unlock_irq(&hugetlb_lock);
 			idx++;
 		}
 		cond_resched();
@@ -422,14 +422,14 @@ void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
 		return;
 
 	VM_BUG_ON_PAGE(!PageHuge(oldhpage), oldhpage);
-	spin_lock(&hugetlb_lock);
+	spin_lock_irq(&hugetlb_lock);
 	h_cg = hugetlb_cgroup_from_page(oldhpage);
 	set_hugetlb_cgroup(oldhpage, NULL);
 
 	/* move the h_cg details to new cgroup */
 	set_hugetlb_cgroup(newhpage, h_cg);
 	list_move(&newhpage->lru, &h->hugepage_activelist);
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_irq(&hugetlb_lock);
 	return;
 }