fix hugetlb deadlock
Jinjiang Tu (2):
  Revert "hugetlbfs: fix hugetlbfs_statfs() locking"
  Revert "hugetlb: make free_huge_page irq safe"

Mike Kravetz (4):
  hugetlb: create remove_hugetlb_page() to separate functionality
  hugetlb: call update_and_free_page without hugetlb_lock
  hugetlb: change free_pool_huge_page to remove_pool_huge_page
  hugetlb: make free_huge_page irq safe

Mina Almasry (1):
  hugetlbfs: fix hugetlbfs_statfs() locking

Naoya Horiguchi (1):
  hugetlb: pass head page to remove_hugetlb_page()

 mm/hugetlb.c | 185 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 117 insertions(+), 68 deletions(-)
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/8447 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/4...
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9SZXR
--------------------------------
The prior patches are not merged, so this commit leads to heavy work being done, and even scheduling, with IRQs disabled. Revert it first.
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 fs/hugetlbfs/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 58e879f089c4..92077383f320 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -1099,12 +1099,12 @@ static int hugetlbfs_statfs(struct dentry *dentry, struct kstatfs *buf) if (sbinfo->spool) { long free_pages;
- spin_lock_irq(&sbinfo->spool->lock); + spin_lock(&sbinfo->spool->lock); buf->f_blocks = sbinfo->spool->max_hpages; free_pages = sbinfo->spool->max_hpages - sbinfo->spool->used_hpages; buf->f_bavail = buf->f_bfree = free_pages; - spin_unlock_irq(&sbinfo->spool->lock); + spin_unlock(&sbinfo->spool->lock); buf->f_files = sbinfo->max_inodes; buf->f_ffree = sbinfo->free_inodes; }
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9SZXR
--------------------------------
The prior patches are not merged, so this commit leads to heavy work being done, and even scheduling, with IRQs disabled. Revert it first.
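For illustration, here is a minimal kernel-style sketch of the pattern that remains when the irq-safe conversion is applied without the restructuring patches; shrink_pool_sketch() is a made-up name and the loop is simplified from set_max_huge_pages(), so treat it as an assumption-laden sketch rather than the actual source:

/*
 * Illustrative sketch only: the pool-shrinking loop still frees pages
 * to the buddy allocator and calls cond_resched_lock() while holding
 * hugetlb_lock.  Once the lock is taken with spin_lock_irq(), that
 * heavy freeing, and the potential reschedule, happen with interrupts
 * disabled.
 */
static void shrink_pool_sketch(struct hstate *h, unsigned long count)
{
	spin_lock_irq(&hugetlb_lock);		/* IRQs disabled from here */
	while (count < persistent_huge_pages(h)) {
		if (!free_pool_huge_page(h, &node_states[N_MEMORY], 0))
			break;			/* frees under the lock */
		cond_resched_lock(&hugetlb_lock); /* may schedule, IRQs off */
	}
	spin_unlock_irq(&hugetlb_lock);
}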
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 fs/hugetlbfs/inode.c |   4 +-
 mm/hugetlb.c         | 173 +++++++++++++++++++++++++++----------------
 mm/hugetlb_cgroup.c  |   8 +-
 3 files changed, 114 insertions(+), 71 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 92077383f320..014ee6533e2e 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -131,7 +131,7 @@ static int hugetlb_checknode(struct vm_area_struct *vma, long nr) int ret = 0; struct hstate *h = &default_hstate;
- spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock);
nid = vma->vm_flags >> CHECKNODE_BITS;
@@ -155,7 +155,7 @@ static int hugetlb_checknode(struct vm_area_struct *vma, long nr) }
err: - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); return ret; }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 14b87bd77376..41e42d2fcc44 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -105,12 +105,11 @@ static inline void ClearPageHugeFreed(struct page *head) static int hugetlb_acct_memory(struct hstate *h, long delta, struct dhugetlb_pool *hpool);
-static inline void unlock_or_release_subpool(struct hugepage_subpool *spool, - unsigned long irq_flags) +static inline void unlock_or_release_subpool(struct hugepage_subpool *spool) { bool free = (spool->count == 0) && (spool->used_hpages == 0);
- spin_unlock_irqrestore(&spool->lock, irq_flags); + spin_unlock(&spool->lock);
/* If no pages are used, and no other handles to the subpool * remain, give up any reservations mased on minimum size and @@ -149,12 +148,10 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
void hugepage_put_subpool(struct hugepage_subpool *spool) { - unsigned long flags; - - spin_lock_irqsave(&spool->lock, flags); + spin_lock(&spool->lock); BUG_ON(!spool->count); spool->count--; - unlock_or_release_subpool(spool, flags); + unlock_or_release_subpool(spool); }
/* @@ -177,7 +174,7 @@ static long hugepage_subpool_get_pages(struct hugepage_subpool *spool, if (dhugetlb_enabled && hpool) return ret;
- spin_lock_irq(&spool->lock); + spin_lock(&spool->lock);
if (spool->max_hpages != -1) { /* maximum size accounting */ if ((spool->used_hpages + delta) <= spool->max_hpages) @@ -204,7 +201,7 @@ static long hugepage_subpool_get_pages(struct hugepage_subpool *spool, }
unlock_ret: - spin_unlock_irq(&spool->lock); + spin_unlock(&spool->lock); return ret; }
@@ -218,7 +215,6 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool, long delta, struct dhugetlb_pool *hpool) { long ret = delta; - unsigned long flags;
if (!spool) return delta; @@ -227,7 +223,7 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool, if (dhugetlb_enabled && hpool) return ret;
- spin_lock_irqsave(&spool->lock, flags); + spin_lock(&spool->lock);
if (spool->max_hpages != -1) /* maximum size accounting */ spool->used_hpages -= delta; @@ -248,7 +244,7 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool, * If hugetlbfs_put_super couldn't free spool due to an outstanding * quota reference, free it now. */ - unlock_or_release_subpool(spool, flags); + unlock_or_release_subpool(spool);
return ret; } @@ -1391,7 +1387,7 @@ void free_huge_page_to_dhugetlb_pool(struct page *page, bool restore_reserve) } #endif
-void free_huge_page(struct page *page) +static void __free_huge_page(struct page *page) { /* * Can't pass hstate in here because it is called from the @@ -1402,7 +1398,6 @@ void free_huge_page(struct page *page) struct hugepage_subpool *spool = (struct hugepage_subpool *)page_private(page); bool restore_reserve; - unsigned long flags;
sp_kmemcg_uncharge_hpage(page); set_page_private(page, 0); @@ -1413,12 +1408,12 @@ void free_huge_page(struct page *page) ClearPagePrivate(page);
if (dhugetlb_enabled && PagePool(page)) { - spin_lock_irqsave(&hugetlb_lock, flags); + spin_lock(&hugetlb_lock); clear_page_huge_active(page); list_del(&page->lru); hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h), page); - spin_unlock_irqrestore(&hugetlb_lock, flags); + spin_unlock(&hugetlb_lock); free_huge_page_to_dhugetlb_pool(page, restore_reserve); return; } @@ -1442,7 +1437,7 @@ void free_huge_page(struct page *page) restore_reserve = true; }
- spin_lock_irqsave(&hugetlb_lock, flags); + spin_lock(&hugetlb_lock); clear_page_huge_active(page); hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h), page); @@ -1464,19 +1459,67 @@ void free_huge_page(struct page *page) arch_clear_hugepage_flags(page); enqueue_huge_page(h, page); } - spin_unlock_irqrestore(&hugetlb_lock, flags); + spin_unlock(&hugetlb_lock); +} + +/* + * As free_huge_page() can be called from a non-task context, we have + * to defer the actual freeing in a workqueue to prevent potential + * hugetlb_lock deadlock. + * + * free_hpage_workfn() locklessly retrieves the linked list of pages to + * be freed and frees them one-by-one. As the page->mapping pointer is + * going to be cleared in __free_huge_page() anyway, it is reused as the + * llist_node structure of a lockless linked list of huge pages to be freed. + */ +static LLIST_HEAD(hpage_freelist); + +static void free_hpage_workfn(struct work_struct *work) +{ + struct llist_node *node; + struct page *page; + + node = llist_del_all(&hpage_freelist); + + while (node) { + page = container_of((struct address_space **)node, + struct page, mapping); + node = node->next; + __free_huge_page(page); + } +} +static DECLARE_WORK(free_hpage_work, free_hpage_workfn); + +void free_huge_page(struct page *page) +{ + /* + * Defer freeing if in non-task context to avoid hugetlb_lock deadlock. + */ + if (!in_task()) { + /* + * Only call schedule_work() if hpage_freelist is previously + * empty. Otherwise, schedule_work() had been called but the + * workfn hasn't retrieved the list yet. + */ + if (llist_add((struct llist_node *)&page->mapping, + &hpage_freelist)) + schedule_work(&free_hpage_work); + return; + } + + __free_huge_page(page); }
static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) { INIT_LIST_HEAD(&page->lru); set_compound_page_dtor(page, HUGETLB_PAGE_DTOR); - spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); set_hugetlb_cgroup(page, NULL); h->nr_huge_pages++; h->nr_huge_pages_node[nid]++; ClearPageHugeFreed(page); - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); }
static void prep_compound_gigantic_page(struct page *page, unsigned int order) @@ -1778,7 +1821,7 @@ int dissolve_free_huge_page(struct page *page) if (page_belong_to_dynamic_hugetlb(page)) return -EBUSY;
- spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); if (!PageHuge(page)) { rc = 0; goto out; @@ -1796,7 +1839,7 @@ int dissolve_free_huge_page(struct page *page) * when it is dissolved. */ if (unlikely(!PageHugeFreed(head))) { - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); cond_resched();
/* @@ -1826,7 +1869,7 @@ int dissolve_free_huge_page(struct page *page) rc = 0; } out: - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); return rc; }
@@ -1868,16 +1911,16 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask, if (hstate_is_gigantic(h)) return NULL;
- spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) goto out_unlock; - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock);
page = alloc_fresh_huge_page(h, gfp_mask, nid, nmask, NULL); if (!page) return NULL;
- spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); /* * We could have raced with the pool size change. * Double check that and simply deallocate the new page @@ -1887,7 +1930,7 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask, */ if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) { SetPageHugeTemporary(page); - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); put_page(page); return NULL; } else { @@ -1896,7 +1939,7 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask, }
out_unlock: - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock);
return page; } @@ -1951,10 +1994,10 @@ struct page *alloc_huge_page_node(struct hstate *h, int nid) if (nid != NUMA_NO_NODE) gfp_mask |= __GFP_THISNODE;
- spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); if (h->free_huge_pages - h->resv_huge_pages > 0) page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL, NULL); - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock);
if (!page) { if (enable_charge_mighp) @@ -1972,18 +2015,18 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, { gfp_t gfp_mask = htlb_alloc_mask(h);
- spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); if (h->free_huge_pages - h->resv_huge_pages > 0) { struct page *page;
page = dequeue_huge_page_nodemask(h, gfp_mask, preferred_nid, nmask, NULL); if (page) { - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); return page; } } - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock);
return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask); } @@ -2030,7 +2073,7 @@ static int gather_surplus_pages(struct hstate *h, long delta)
ret = -ENOMEM; retry: - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); for (i = 0; i < needed; i++) { page = alloc_surplus_huge_page(h, htlb_alloc_mask(h), NUMA_NO_NODE, NULL); @@ -2047,7 +2090,7 @@ static int gather_surplus_pages(struct hstate *h, long delta) * After retaking hugetlb_lock, we need to recalculate 'needed' * because either resv_huge_pages or free_huge_pages may have changed. */ - spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); needed = (h->resv_huge_pages + delta) - (h->free_huge_pages + allocated); if (needed > 0) { @@ -2085,12 +2128,12 @@ static int gather_surplus_pages(struct hstate *h, long delta) enqueue_huge_page(h, page); } free: - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock);
/* Free unnecessary surplus pages to the buddy allocator */ list_for_each_entry_safe(page, tmp, &surplus_list, lru) put_page(page); - spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock);
return ret; } @@ -2446,18 +2489,18 @@ struct page *alloc_huge_page(struct vm_area_struct *vma, * Use hugetlb_lock to manage the account of * hugetlb cgroup. */ - spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); list_add(&page->lru, &h->hugepage_activelist); hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(hstate_vma(vma)), h_cg, page); - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); goto out; } goto out_uncharge_cgroup; }
- spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); /* * glb_chg is passed to indicate whether or not a page must be taken * from the global free pool (global change). gbl_chg == 0 indicates @@ -2465,11 +2508,11 @@ struct page *alloc_huge_page(struct vm_area_struct *vma, */ page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, gbl_chg); if (!page) { - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); page = alloc_buddy_huge_page_with_mpol(h, vma, addr); if (!page) goto out_uncharge_cgroup; - spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) { SetPagePrivate(page); h->resv_huge_pages--; @@ -2478,7 +2521,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma, /* Fall through */ } hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); out: set_page_private(page, (unsigned long)spool);
@@ -2787,7 +2830,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count, return h->max_huge_pages; }
- spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock);
/* * Check for a node specific request. @@ -2831,14 +2874,14 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count, * page, free_huge_page will handle it by freeing the page * and reducing the surplus. */ - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock);
/* yield cpu to avoid soft lockup */ cond_resched();
ret = alloc_pool_huge_page(h, nodes_allowed, node_alloc_noretry); - spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); if (!ret) goto out;
@@ -2876,7 +2919,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count, } out: ret = persistent_huge_pages(h); - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock);
NODEMASK_FREE(node_alloc_noretry);
@@ -3043,9 +3086,9 @@ static ssize_t nr_overcommit_hugepages_store(struct kobject *kobj, if (err) return err;
- spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); h->nr_overcommit_huge_pages = input; - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock);
return count; } @@ -3356,7 +3399,7 @@ int alloc_hugepage_from_hugetlb(struct dhugetlb_pool *hpool, return -ENOMEM;
spin_lock(&hpool->lock); - spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); if (h->free_huge_pages_node[nid] < size) { ret = -ENOMEM; goto out_unlock; @@ -3378,7 +3421,7 @@ int alloc_hugepage_from_hugetlb(struct dhugetlb_pool *hpool, } ret = 0; out_unlock: - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); spin_unlock(&hpool->lock); return ret; } @@ -3743,7 +3786,7 @@ static void free_back_hugetlb(struct dhugetlb_pool *hpool) if (!h) return;
- spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); list_for_each_entry_safe(page, page_next, &hpool->dhugetlb_1G_freelists, lru) { nr_pages = 1 << huge_page_order(h); @@ -3770,7 +3813,7 @@ static void free_back_hugetlb(struct dhugetlb_pool *hpool) hpool->free_reserved_1G = 0; hpool->total_reserved_1G = 0; INIT_LIST_HEAD(&hpool->dhugetlb_1G_freelists); - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); }
bool free_dhugetlb_pool(struct dhugetlb_pool *hpool) @@ -4540,9 +4583,9 @@ int hugetlb_overcommit_handler(struct ctl_table *table, int write, goto out;
if (write) { - spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); h->nr_overcommit_huge_pages = tmp; - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); } out: return ret; @@ -4639,7 +4682,7 @@ static int hugetlb_acct_memory(struct hstate *h, long delta, if (dhugetlb_enabled && hpool) return dhugetlb_acct_memory(h, delta, hpool);
- spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); /* * When cpuset is configured, it breaks the strict hugetlb page * reservation as the accounting is done on a global variable. Such @@ -4672,7 +4715,7 @@ static int hugetlb_acct_memory(struct hstate *h, long delta, return_unused_surplus_pages(h, (unsigned long) -delta);
out: - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); return ret; }
@@ -6579,7 +6622,7 @@ bool isolate_huge_page(struct page *page, struct list_head *list) { bool ret = true;
- spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); if (!PageHeadHuge(page) || !page_huge_active(page) || !get_page_unless_zero(page)) { ret = false; @@ -6588,17 +6631,17 @@ bool isolate_huge_page(struct page *page, struct list_head *list) clear_page_huge_active(page); list_move_tail(&page->lru, list); unlock: - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); return ret; }
void putback_active_hugepage(struct page *page) { VM_BUG_ON_PAGE(!PageHead(page), page); - spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); set_page_huge_active(page); list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist); - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); put_page(page); }
@@ -6626,12 +6669,12 @@ void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason) SetPageHugeTemporary(oldpage); ClearPageHugeTemporary(newpage);
- spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); if (h->surplus_huge_pages_node[old_nid]) { h->surplus_huge_pages_node[old_nid]--; h->surplus_huge_pages_node[new_nid]++; } - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); } }
@@ -6647,10 +6690,10 @@ static struct page *hugetlb_alloc_hugepage_normal(struct hstate *h, { struct page *page = NULL;
- spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); if (h->free_huge_pages - h->resv_huge_pages > 0) page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL, NULL); - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock);
return page; } diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 13110d8ca3ea..7a93e1e439dd 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -167,11 +167,11 @@ static void hugetlb_cgroup_css_offline(struct cgroup_subsys_state *css)
do { for_each_hstate(h) { - spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); list_for_each_entry(page, &h->hugepage_activelist, lru) hugetlb_cgroup_move_parent(idx, h_cg, page);
- spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); idx++; } cond_resched(); @@ -422,14 +422,14 @@ void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage) return;
VM_BUG_ON_PAGE(!PageHuge(oldhpage), oldhpage); - spin_lock_irq(&hugetlb_lock); + spin_lock(&hugetlb_lock); h_cg = hugetlb_cgroup_from_page(oldhpage); set_hugetlb_cgroup(oldhpage, NULL);
/* move the h_cg details to new cgroup */ set_hugetlb_cgroup(newhpage, h_cg); list_move(&newhpage->lru, &h->hugepage_activelist); - spin_unlock_irq(&hugetlb_lock); + spin_unlock(&hugetlb_lock); return; }
From: Mike Kravetz <mike.kravetz@oracle.com>
mainline inclusion
from mainline-v5.13-rc1
commit 6eb4e88a6d27022ea8aff424d47a0a5dfc9fcb34
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9SZXR
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
The new remove_hugetlb_page() routine is designed to remove a hugetlb page from hugetlbfs processing. It will remove the page from the active or free list, update global counters and set the compound page destructor to NULL so that PageHuge() will return false for the 'page'. After this call, the 'page' can be treated as a normal compound page or a collection of base size pages.
update_and_free_page no longer decrements h->nr_huge_pages{_node} as this is performed in remove_hugetlb_page. The only functionality performed by update_and_free_page is to free the base pages to the lower level allocators.
update_and_free_page is typically called after remove_hugetlb_page.
remove_hugetlb_page is to be called with the hugetlb_lock held.
Creating this routine and separating functionality is in preparation for restructuring code to reduce lock hold times. This commit should not introduce any changes to functionality.
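As a rough sketch of the resulting division of labour (simplified: surplus/free counter updates, gigantic-page handling and the exact destructor choice are reduced to comments, so this is not the literal kernel code):

/* Called with hugetlb_lock held: bookkeeping only. */
static void remove_hugetlb_page_sketch(struct hstate *h, struct page *page)
{
	list_del(&page->lru);			/* off the free or active list */
	h->nr_huge_pages--;			/* global and per-node counts */
	h->nr_huge_pages_node[page_to_nid(page)]--;
	/* free/surplus counters adjusted here as well (omitted) */
	set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
	/* PageHuge(page) is now false; 'page' is an ordinary compound page */
}

/* No longer touches hstate counters: just hands memory back to buddy. */
static void update_and_free_page_sketch(struct hstate *h, struct page *page)
{
	__free_pages(page, huge_page_order(h));
}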
Link: https://lkml.kernel.org/r/20210409205254.242291-5-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: HORIGUCHI NAOYA <naoya.horiguchi@nec.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
	mm/hugetlb.c
[Context Conflicts.]
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 mm/hugetlb.c | 83 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 32 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 41e42d2fcc44..560e3c6a1a48 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1232,24 +1232,33 @@ static inline void destroy_compound_gigantic_page(struct page *page, unsigned int order) { } #endif
-static void update_and_free_page(struct hstate *h, struct page *page) +/* + * Remove hugetlb page from lists, and update dtor so that page appears + * as just a compound page. A reference is held on the page. + * + * Must be called with hugetlb lock held. + */ +static void remove_hugetlb_page(struct hstate *h, struct page *page, + bool adjust_surplus) { - int i; - struct page *subpage = page; + int nid = page_to_nid(page); + + VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
if (hstate_is_gigantic(h) && !gigantic_page_supported()) return;
- h->nr_huge_pages--; - h->nr_huge_pages_node[page_to_nid(page)]--; - for (i = 0; i < pages_per_huge_page(h); - i++, subpage = mem_map_next(subpage, page, i)) { - subpage->flags &= ~(1 << PG_locked | 1 << PG_error | - 1 << PG_referenced | 1 << PG_dirty | - 1 << PG_active | 1 << PG_private | - 1 << PG_writeback); + list_del(&page->lru); + + if (PageHugeFreed(page)) { + h->free_huge_pages--; + h->free_huge_pages_node[nid]--; } - VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page); + if (adjust_surplus) { + h->surplus_huge_pages--; + h->surplus_huge_pages_node[nid]--; + } + /* * Very subtle * @@ -1268,12 +1277,35 @@ static void update_and_free_page(struct hstate *h, struct page *page) * after update_and_free_page is called. */ set_page_refcounted(page); - if (hstate_is_gigantic(h)) { + if (hstate_is_gigantic(h)) set_compound_page_dtor(page, NULL_COMPOUND_DTOR); + else + set_compound_page_dtor(page, COMPOUND_PAGE_DTOR); + + h->nr_huge_pages--; + h->nr_huge_pages_node[nid]--; +} + +static void update_and_free_page(struct hstate *h, struct page *page) +{ + int i; + struct page *subpage = page; + + if (hstate_is_gigantic(h) && !gigantic_page_supported()) + return; + + for (i = 0; i < pages_per_huge_page(h); + i++, subpage = mem_map_next(subpage, page, i)) { + subpage->flags &= ~(1 << PG_locked | 1 << PG_error | + 1 << PG_referenced | 1 << PG_dirty | + 1 << PG_active | 1 << PG_private | + 1 << PG_writeback); + } + + if (hstate_is_gigantic(h)) { destroy_compound_gigantic_page(page, huge_page_order(h)); free_gigantic_page(page, huge_page_order(h)); } else { - set_compound_page_dtor(page, COMPOUND_PAGE_DTOR); __free_pages(page, huge_page_order(h)); } } @@ -1446,15 +1478,13 @@ static void __free_huge_page(struct page *page)
if (PageHugeTemporary(page)) { sp_memcg_uncharge_hpage(page); - list_del(&page->lru); ClearPageHugeTemporary(page); + remove_hugetlb_page(h, page, false); update_and_free_page(h, page); } else if (h->surplus_huge_pages_node[nid]) { /* remove the page from active list */ - list_del(&page->lru); + remove_hugetlb_page(h, page, true); update_and_free_page(h, page); - h->surplus_huge_pages--; - h->surplus_huge_pages_node[nid]--; } else { arch_clear_hugepage_flags(page); enqueue_huge_page(h, page); @@ -1782,13 +1812,7 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, struct page *page = list_entry(h->hugepage_freelists[node].next, struct page, lru); - list_del(&page->lru); - h->free_huge_pages--; - h->free_huge_pages_node[node]--; - if (acct_surplus) { - h->surplus_huge_pages--; - h->surplus_huge_pages_node[node]--; - } + remove_hugetlb_page(h, page, acct_surplus); update_and_free_page(h, page); ret = 1; break; @@ -1830,7 +1854,6 @@ int dissolve_free_huge_page(struct page *page) if (!page_count(page)) { struct page *head = compound_head(page); struct hstate *h = page_hstate(head); - int nid = page_to_nid(head); if (h->free_huge_pages - h->resv_huge_pages == 0) goto out;
@@ -1861,9 +1884,7 @@ int dissolve_free_huge_page(struct page *page) SetPageHWPoison(page); ClearPageHWPoison(head); } - list_del(&head->lru); - h->free_huge_pages--; - h->free_huge_pages_node[nid]--; + remove_hugetlb_page(h, page, false); h->max_huge_pages--; update_and_free_page(h, head); rc = 0; @@ -2762,10 +2783,8 @@ static void try_to_free_low(struct hstate *h, unsigned long count, return; if (PageHighMem(page)) continue; - list_del(&page->lru); + remove_hugetlb_page(h, page, false); update_and_free_page(h, page); - h->free_huge_pages--; - h->free_huge_pages_node[page_to_nid(page)]--; } } }
From: Mike Kravetz <mike.kravetz@oracle.com>
mainline inclusion
from mainline-v5.13-rc1
commit 1121828a0c213caa55ddd5ee23ee78e99cbdd33e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9SZXR
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
With the introduction of remove_hugetlb_page(), there is no need for update_and_free_page to hold the hugetlb lock. Change all callers to drop the lock before calling.
With additional code modifications, this will allow loops which decrease the huge page pool to drop the hugetlb_lock with each page to reduce long hold times.
The ugly unlock/lock cycle in free_pool_huge_page will be removed in a subsequent patch which restructures free_pool_huge_page.
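The resulting change in the calling pattern, as an illustrative before/after sketch (not a literal hunk):

	/* before: page handed back to buddy with hugetlb_lock still held */
	spin_lock(&hugetlb_lock);
	remove_hugetlb_page(h, page, false);
	update_and_free_page(h, page);
	spin_unlock(&hugetlb_lock);

	/* after: only the removal happens under the lock */
	spin_lock(&hugetlb_lock);
	remove_hugetlb_page(h, page, false);
	spin_unlock(&hugetlb_lock);
	update_and_free_page(h, page);	/* lock no longer held while freeing */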
Link: https://lkml.kernel.org/r/20210409205254.242291-6-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: HORIGUCHI NAOYA <naoya.horiguchi@nec.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
	mm/hugetlb.c
[Context conflicts.]
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 mm/hugetlb.c | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 560e3c6a1a48..e2c2dfc3f241 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1480,16 +1480,18 @@ static void __free_huge_page(struct page *page) sp_memcg_uncharge_hpage(page); ClearPageHugeTemporary(page); remove_hugetlb_page(h, page, false); + spin_unlock(&hugetlb_lock); update_and_free_page(h, page); } else if (h->surplus_huge_pages_node[nid]) { /* remove the page from active list */ remove_hugetlb_page(h, page, true); + spin_unlock(&hugetlb_lock); update_and_free_page(h, page); } else { arch_clear_hugepage_flags(page); enqueue_huge_page(h, page); + spin_unlock(&hugetlb_lock); } - spin_unlock(&hugetlb_lock); }
/* @@ -1813,7 +1815,13 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, list_entry(h->hugepage_freelists[node].next, struct page, lru); remove_hugetlb_page(h, page, acct_surplus); + /* + * unlock/lock around update_and_free_page is temporary + * and will be removed with subsequent patch. + */ + spin_unlock(&hugetlb_lock); update_and_free_page(h, page); + spin_lock(&hugetlb_lock); ret = 1; break; } @@ -1886,8 +1894,9 @@ int dissolve_free_huge_page(struct page *page) } remove_hugetlb_page(h, page, false); h->max_huge_pages--; + spin_unlock(&hugetlb_lock); update_and_free_page(h, head); - rc = 0; + return 0; } out: spin_unlock(&hugetlb_lock); @@ -2771,22 +2780,34 @@ static void try_to_free_low(struct hstate *h, unsigned long count, nodemask_t *nodes_allowed) { int i; + struct page *page, *next; + LIST_HEAD(page_list);
if (hstate_is_gigantic(h)) return;
+ /* + * Collect pages to be freed on a list, and free after dropping lock + */ for_each_node_mask(i, *nodes_allowed) { - struct page *page, *next; struct list_head *freel = &h->hugepage_freelists[i]; list_for_each_entry_safe(page, next, freel, lru) { if (count >= h->nr_huge_pages) - return; + goto out; if (PageHighMem(page)) continue; remove_hugetlb_page(h, page, false); - update_and_free_page(h, page); + list_add(&page->lru, &page_list); } } + +out: + spin_unlock(&hugetlb_lock); + list_for_each_entry_safe(page, next, &page_list, lru) { + update_and_free_page(h, page); + cond_resched(); + } + spin_lock(&hugetlb_lock); } #else static inline void try_to_free_low(struct hstate *h, unsigned long count,
From: Mike Kravetz <mike.kravetz@oracle.com>
mainline inclusion
from mainline-v5.13-rc1
commit 10c6ec49802b1779c01fc029cfd92ea20ae33c06
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9SZXR
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------------
free_pool_huge_page was called with hugetlb_lock held. It would remove a hugetlb page, and then free the corresponding pages to the lower level allocators such as buddy. free_pool_huge_page was called in a loop to remove hugetlb pages and these loops could hold the hugetlb_lock for a considerable time.
Create new routine remove_pool_huge_page to replace free_pool_huge_page. remove_pool_huge_page will remove the hugetlb page, and it must be called with the hugetlb_lock held. It will return the removed page and it is the responsibility of the caller to free the page to the lower level allocators. The hugetlb_lock is dropped before freeing to these allocators which results in shorter lock hold times.
Add new helper routine to call update_and_free_page for a list of pages.
Note: Some changes to the routine return_unused_surplus_pages are in need of explanation. Commit e5bbc8a6c992 ("mm/hugetlb.c: fix reservation race when freeing surplus pages") modified this routine to address a race which could occur when dropping the hugetlb_lock in the loop that removes pool pages. Accounting changes introduced in that commit were subtle and took some thought to understand. This commit removes the cond_resched_lock() and the potential race. Therefore, remove the subtle code and restore the more straightforward accounting, effectively reverting that commit.
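A condensed sketch of the collect-then-free pattern this introduces, modelled on the set_max_huge_pages() hunk below (surrounding accounting omitted, so treat it as illustrative rather than the exact code):

	LIST_HEAD(page_list);
	struct page *page;

	spin_lock(&hugetlb_lock);
	while (min_count < persistent_huge_pages(h)) {
		page = remove_pool_huge_page(h, nodes_allowed, 0);
		if (!page)
			break;
		list_add(&page->lru, &page_list);	/* only queue it here */
	}
	/* heavy freeing to the buddy allocator happens with the lock dropped */
	spin_unlock(&hugetlb_lock);
	update_and_free_pages_bulk(h, &page_list);
	spin_lock(&hugetlb_lock);	/* re-take for the remaining accounting */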
Link: https://lkml.kernel.org/r/20210409205254.242291-7-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: HORIGUCHI NAOYA <naoya.horiguchi@nec.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
	mm/hugetlb.c
[Context conflicts]
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 mm/hugetlb.c | 93 ++++++++++++++++++++++++++++------------------------
 1 file changed, 51 insertions(+), 42 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e2c2dfc3f241..0d8d9ab65d5a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1087,7 +1087,7 @@ static int hstate_next_node_to_alloc(struct hstate *h, }
/* - * helper for free_pool_huge_page() - return the previously saved + * helper for remove_pool_huge_page() - return the previously saved * node ["this node"] from which to free a huge page. Advance the * next node id whether or not we find a free huge page to free so * that the next attempt to free addresses the next node. @@ -1310,6 +1310,16 @@ static void update_and_free_page(struct hstate *h, struct page *page) } }
+static void update_and_free_pages_bulk(struct hstate *h, struct list_head *list) +{ + struct page *page, *t_page; + + list_for_each_entry_safe(page, t_page, list, lru) { + update_and_free_page(h, page); + cond_resched(); + } +} + struct hstate *size_to_hstate(unsigned long size) { struct hstate *h; @@ -1793,16 +1803,18 @@ static int alloc_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, }
/* - * Free huge page from pool from next node to free. - * Attempt to keep persistent huge pages more or less - * balanced over allowed nodes. + * Remove huge page from pool from next node to free. Attempt to keep + * persistent huge pages more or less balanced over allowed nodes. + * This routine only 'removes' the hugetlb page. The caller must make + * an additional call to free the page to low level allocators. * Called with hugetlb_lock locked. */ -static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, - bool acct_surplus) +static struct page *remove_pool_huge_page(struct hstate *h, + nodemask_t *nodes_allowed, + bool acct_surplus) { int nr_nodes, node; - int ret = 0; + struct page *page = NULL;
for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) { /* @@ -1811,23 +1823,14 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, */ if ((!acct_surplus || h->surplus_huge_pages_node[node]) && !list_empty(&h->hugepage_freelists[node])) { - struct page *page = - list_entry(h->hugepage_freelists[node].next, + page = list_entry(h->hugepage_freelists[node].next, struct page, lru); remove_hugetlb_page(h, page, acct_surplus); - /* - * unlock/lock around update_and_free_page is temporary - * and will be removed with subsequent patch. - */ - spin_unlock(&hugetlb_lock); - update_and_free_page(h, page); - spin_lock(&hugetlb_lock); - ret = 1; break; } }
- return ret; + return page; }
/* @@ -2175,17 +2178,16 @@ static int gather_surplus_pages(struct hstate *h, long delta) * to the associated reservation map. * 2) Free any unused surplus pages that may have been allocated to satisfy * the reservation. As many as unused_resv_pages may be freed. - * - * Called with hugetlb_lock held. However, the lock could be dropped (and - * reacquired) during calls to cond_resched_lock. Whenever dropping the lock, - * we must make sure nobody else can claim pages we are in the process of - * freeing. Do this by ensuring resv_huge_page always is greater than the - * number of huge pages we plan to free when dropping the lock. */ static void return_unused_surplus_pages(struct hstate *h, unsigned long unused_resv_pages) { unsigned long nr_pages; + struct page *page; + LIST_HEAD(page_list); + + /* Uncommit the reservation */ + h->resv_huge_pages -= unused_resv_pages;
/* Cannot return gigantic pages currently */ if (hstate_is_gigantic(h)) @@ -2202,24 +2204,21 @@ static void return_unused_surplus_pages(struct hstate *h, * evenly across all nodes with memory. Iterate across these nodes * until we can no longer free unreserved surplus pages. This occurs * when the nodes with surplus pages have no free pages. - * free_pool_huge_page() will balance the the freed pages across the + * remove_pool_huge_page() will balance the the freed pages across the * on-line nodes with memory and will handle the hstate accounting. - * - * Note that we decrement resv_huge_pages as we free the pages. If - * we drop the lock, resv_huge_pages will still be sufficiently large - * to cover subsequent pages we may free. */ while (nr_pages--) { - h->resv_huge_pages--; - unused_resv_pages--; - if (!free_pool_huge_page(h, &node_states[N_MEMORY], 1)) + page = remove_pool_huge_page(h, &node_states[N_MEMORY], 1); + if (!page) goto out; - cond_resched_lock(&hugetlb_lock); + + list_add(&page->lru, &page_list); }
out: - /* Fully uncommit the reservation */ - h->resv_huge_pages -= unused_resv_pages; + spin_unlock(&hugetlb_lock); + update_and_free_pages_bulk(h, &page_list); + spin_lock(&hugetlb_lock); }
@@ -2780,7 +2779,6 @@ static void try_to_free_low(struct hstate *h, unsigned long count, nodemask_t *nodes_allowed) { int i; - struct page *page, *next; LIST_HEAD(page_list);
if (hstate_is_gigantic(h)) @@ -2790,6 +2788,7 @@ static void try_to_free_low(struct hstate *h, unsigned long count, * Collect pages to be freed on a list, and free after dropping lock */ for_each_node_mask(i, *nodes_allowed) { + struct page *page, *next; struct list_head *freel = &h->hugepage_freelists[i]; list_for_each_entry_safe(page, next, freel, lru) { if (count >= h->nr_huge_pages) @@ -2803,10 +2802,7 @@ static void try_to_free_low(struct hstate *h, unsigned long count,
out: spin_unlock(&hugetlb_lock); - list_for_each_entry_safe(page, next, &page_list, lru) { - update_and_free_page(h, page); - cond_resched(); - } + update_and_free_pages_bulk(h, &page_list); spin_lock(&hugetlb_lock); } #else @@ -2853,6 +2849,8 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count, int nid, nodemask_t *nodes_allowed) { unsigned long min_count, ret; + struct page *page; + LIST_HEAD(page_list); NODEMASK_ALLOC(nodemask_t, node_alloc_noretry, GFP_KERNEL);
/* @@ -2948,11 +2946,22 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count, min_count = h->resv_huge_pages + h->nr_huge_pages - h->free_huge_pages; min_count = max(count, min_count); try_to_free_low(h, min_count, nodes_allowed); + + /* + * Collect pages to be removed on list without dropping lock + */ while (min_count < persistent_huge_pages(h)) { - if (!free_pool_huge_page(h, nodes_allowed, 0)) + page = remove_pool_huge_page(h, nodes_allowed, 0); + if (!page) break; - cond_resched_lock(&hugetlb_lock); + + list_add(&page->lru, &page_list); } + /* free the pages after dropping lock */ + spin_unlock(&hugetlb_lock); + update_and_free_pages_bulk(h, &page_list); + spin_lock(&hugetlb_lock); + while (count < persistent_huge_pages(h)) { if (!adjust_pool_surplus(h, nodes_allowed, 1)) break;
From: Mike Kravetz <mike.kravetz@oracle.com>
mainline inclusion
from mainline-v5.13-rc1
commit db71ef79b59bb2e78dc4df83d0e4bf6beaa5c82d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9SZXR
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task context") was added to address the issue of free_huge_page being called from irq context. That commit hands off free_huge_page processing to a workqueue if !in_task. However, this doesn't cover all the cases, as pointed out by the 0day bot lockdep report [1].
:  Possible interrupt unsafe locking scenario:
:
:        CPU0                    CPU1
:        ----                    ----
:   lock(hugetlb_lock);
:                                local_irq_disable();
:                                lock(slock-AF_INET);
:                                lock(hugetlb_lock);
:   <Interrupt>
:     lock(slock-AF_INET);
Shakeel later explained that this is very likely the TCP TX zerocopy from hugetlb pages scenario, where the networking code drops the last reference to a hugetlb page while IRQs are disabled. The hugetlb freeing path doesn't disable IRQs while holding hugetlb_lock, so a lock dependency chain can lead to a deadlock.
This commit addresses the issue by doing the following:

- Make hugetlb_lock irq safe.  This is mostly a simple process of
  changing spin_*lock calls to spin_*lock_irq* calls.
- Make subpool lock irq safe in a similar manner.
- Revert the !in_task check and workqueue handoff.
[1] https://lore.kernel.org/linux-mm/000000000000f1c03b05bc43aadc@google.com/
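In sketch form (simplified, not the literal kernel code): callers in process context where IRQs are known to be enabled use the _irq variants, while the freeing path, which can be reached from softirq context, saves and restores the IRQ state:

	/* e.g. sysctl/sysfs handlers: process context, IRQs known enabled */
	spin_lock_irq(&hugetlb_lock);
	h->nr_overcommit_huge_pages = tmp;
	spin_unlock_irq(&hugetlb_lock);

	/*
	 * free_huge_page(): may run with IRQs already disabled (e.g. from
	 * a TCP zerocopy completion in softirq), so preserve the old state.
	 */
	unsigned long flags;

	spin_lock_irqsave(&hugetlb_lock, flags);
	enqueue_huge_page(h, page);		/* cheap bookkeeping only */
	spin_unlock_irqrestore(&hugetlb_lock, flags);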
Link: https://lkml.kernel.org/r/20210409205254.242291-8-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: HORIGUCHI NAOYA <naoya.horiguchi@nec.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
	mm/hugetlb.c
	mm/hugetlb_cgroup.c
	fs/hugetlbfs/inode.c
[Context conflicts. The Dynamic Hugetlb feature and hugetlb_checknode() also use hugetlb_lock; convert those uses too.]
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 fs/hugetlbfs/inode.c |   4 +-
 mm/hugetlb.c         | 191 +++++++++++++++++--------------------
 mm/hugetlb_cgroup.c  |   8 +-
 3 files changed, 80 insertions(+), 123 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 014ee6533e2e..92077383f320 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -131,7 +131,7 @@ static int hugetlb_checknode(struct vm_area_struct *vma, long nr) int ret = 0; struct hstate *h = &default_hstate;
- spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock);
nid = vma->vm_flags >> CHECKNODE_BITS;
@@ -155,7 +155,7 @@ static int hugetlb_checknode(struct vm_area_struct *vma, long nr) }
err: - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); return ret; }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 0d8d9ab65d5a..2844458b5fc5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -105,11 +105,12 @@ static inline void ClearPageHugeFreed(struct page *head) static int hugetlb_acct_memory(struct hstate *h, long delta, struct dhugetlb_pool *hpool);
-static inline void unlock_or_release_subpool(struct hugepage_subpool *spool) +static inline void unlock_or_release_subpool(struct hugepage_subpool *spool, + unsigned long irq_flags) { bool free = (spool->count == 0) && (spool->used_hpages == 0);
- spin_unlock(&spool->lock); + spin_unlock_irqrestore(&spool->lock, irq_flags);
/* If no pages are used, and no other handles to the subpool * remain, give up any reservations mased on minimum size and @@ -148,10 +149,12 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
void hugepage_put_subpool(struct hugepage_subpool *spool) { - spin_lock(&spool->lock); + unsigned long flags; + + spin_lock_irqsave(&spool->lock, flags); BUG_ON(!spool->count); spool->count--; - unlock_or_release_subpool(spool); + unlock_or_release_subpool(spool, flags); }
/* @@ -174,7 +177,7 @@ static long hugepage_subpool_get_pages(struct hugepage_subpool *spool, if (dhugetlb_enabled && hpool) return ret;
- spin_lock(&spool->lock); + spin_lock_irq(&spool->lock);
if (spool->max_hpages != -1) { /* maximum size accounting */ if ((spool->used_hpages + delta) <= spool->max_hpages) @@ -201,7 +204,7 @@ static long hugepage_subpool_get_pages(struct hugepage_subpool *spool, }
unlock_ret: - spin_unlock(&spool->lock); + spin_unlock_irq(&spool->lock); return ret; }
@@ -215,6 +218,7 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool, long delta, struct dhugetlb_pool *hpool) { long ret = delta; + unsigned long flags;
if (!spool) return delta; @@ -223,7 +227,7 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool, if (dhugetlb_enabled && hpool) return ret;
- spin_lock(&spool->lock); + spin_lock_irqsave(&spool->lock, flags);
if (spool->max_hpages != -1) /* maximum size accounting */ spool->used_hpages -= delta; @@ -244,7 +248,7 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool, * If hugetlbfs_put_super couldn't free spool due to an outstanding * quota reference, free it now. */ - unlock_or_release_subpool(spool); + unlock_or_release_subpool(spool, flags);
return ret; } @@ -1429,7 +1433,7 @@ void free_huge_page_to_dhugetlb_pool(struct page *page, bool restore_reserve) } #endif
-static void __free_huge_page(struct page *page) +void free_huge_page(struct page *page) { /* * Can't pass hstate in here because it is called from the @@ -1440,6 +1444,7 @@ static void __free_huge_page(struct page *page) struct hugepage_subpool *spool = (struct hugepage_subpool *)page_private(page); bool restore_reserve; + unsigned long flags;
sp_kmemcg_uncharge_hpage(page); set_page_private(page, 0); @@ -1450,12 +1455,12 @@ static void __free_huge_page(struct page *page) ClearPagePrivate(page);
if (dhugetlb_enabled && PagePool(page)) { - spin_lock(&hugetlb_lock); + spin_lock_irqsave(&hugetlb_lock, flags); clear_page_huge_active(page); list_del(&page->lru); hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h), page); - spin_unlock(&hugetlb_lock); + spin_unlock_irqrestore(&hugetlb_lock, flags); free_huge_page_to_dhugetlb_pool(page, restore_reserve); return; } @@ -1479,7 +1484,7 @@ static void __free_huge_page(struct page *page) restore_reserve = true; }
- spin_lock(&hugetlb_lock); + spin_lock_irqsave(&hugetlb_lock, flags); clear_page_huge_active(page); hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h), page); @@ -1490,78 +1495,30 @@ static void __free_huge_page(struct page *page) sp_memcg_uncharge_hpage(page); ClearPageHugeTemporary(page); remove_hugetlb_page(h, page, false); - spin_unlock(&hugetlb_lock); + spin_unlock_irqrestore(&hugetlb_lock, flags); update_and_free_page(h, page); } else if (h->surplus_huge_pages_node[nid]) { /* remove the page from active list */ remove_hugetlb_page(h, page, true); - spin_unlock(&hugetlb_lock); + spin_unlock_irqrestore(&hugetlb_lock, flags); update_and_free_page(h, page); } else { arch_clear_hugepage_flags(page); enqueue_huge_page(h, page); - spin_unlock(&hugetlb_lock); + spin_unlock_irqrestore(&hugetlb_lock, flags); } }
-/* - * As free_huge_page() can be called from a non-task context, we have - * to defer the actual freeing in a workqueue to prevent potential - * hugetlb_lock deadlock. - * - * free_hpage_workfn() locklessly retrieves the linked list of pages to - * be freed and frees them one-by-one. As the page->mapping pointer is - * going to be cleared in __free_huge_page() anyway, it is reused as the - * llist_node structure of a lockless linked list of huge pages to be freed. - */ -static LLIST_HEAD(hpage_freelist); - -static void free_hpage_workfn(struct work_struct *work) -{ - struct llist_node *node; - struct page *page; - - node = llist_del_all(&hpage_freelist); - - while (node) { - page = container_of((struct address_space **)node, - struct page, mapping); - node = node->next; - __free_huge_page(page); - } -} -static DECLARE_WORK(free_hpage_work, free_hpage_workfn); - -void free_huge_page(struct page *page) -{ - /* - * Defer freeing if in non-task context to avoid hugetlb_lock deadlock. - */ - if (!in_task()) { - /* - * Only call schedule_work() if hpage_freelist is previously - * empty. Otherwise, schedule_work() had been called but the - * workfn hasn't retrieved the list yet. - */ - if (llist_add((struct llist_node *)&page->mapping, - &hpage_freelist)) - schedule_work(&free_hpage_work); - return; - } - - __free_huge_page(page); -} - static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) { INIT_LIST_HEAD(&page->lru); set_compound_page_dtor(page, HUGETLB_PAGE_DTOR); - spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); set_hugetlb_cgroup(page, NULL); h->nr_huge_pages++; h->nr_huge_pages_node[nid]++; ClearPageHugeFreed(page); - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); }
static void prep_compound_gigantic_page(struct page *page, unsigned int order) @@ -1856,7 +1813,7 @@ int dissolve_free_huge_page(struct page *page) if (page_belong_to_dynamic_hugetlb(page)) return -EBUSY;
- spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); if (!PageHuge(page)) { rc = 0; goto out; @@ -1873,7 +1830,7 @@ int dissolve_free_huge_page(struct page *page) * when it is dissolved. */ if (unlikely(!PageHugeFreed(head))) { - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); cond_resched();
/* @@ -1897,12 +1854,12 @@ int dissolve_free_huge_page(struct page *page) } remove_hugetlb_page(h, page, false); h->max_huge_pages--; - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); update_and_free_page(h, head); return 0; } out: - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); return rc; }
@@ -1944,16 +1901,16 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask, if (hstate_is_gigantic(h)) return NULL;
- spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) goto out_unlock; - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock);
page = alloc_fresh_huge_page(h, gfp_mask, nid, nmask, NULL); if (!page) return NULL;
- spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); /* * We could have raced with the pool size change. * Double check that and simply deallocate the new page @@ -1963,7 +1920,7 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask, */ if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) { SetPageHugeTemporary(page); - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); put_page(page); return NULL; } else { @@ -1972,7 +1929,7 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask, }
out_unlock: - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock);
return page; } @@ -2027,10 +1984,10 @@ struct page *alloc_huge_page_node(struct hstate *h, int nid) if (nid != NUMA_NO_NODE) gfp_mask |= __GFP_THISNODE;
- spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); if (h->free_huge_pages - h->resv_huge_pages > 0) page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL, NULL); - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock);
if (!page) { if (enable_charge_mighp) @@ -2048,18 +2005,18 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, { gfp_t gfp_mask = htlb_alloc_mask(h);
- spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); if (h->free_huge_pages - h->resv_huge_pages > 0) { struct page *page;
page = dequeue_huge_page_nodemask(h, gfp_mask, preferred_nid, nmask, NULL); if (page) { - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); return page; } } - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock);
return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask); } @@ -2106,7 +2063,7 @@ static int gather_surplus_pages(struct hstate *h, long delta)
ret = -ENOMEM; retry: - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); for (i = 0; i < needed; i++) { page = alloc_surplus_huge_page(h, htlb_alloc_mask(h), NUMA_NO_NODE, NULL); @@ -2123,7 +2080,7 @@ static int gather_surplus_pages(struct hstate *h, long delta) * After retaking hugetlb_lock, we need to recalculate 'needed' * because either resv_huge_pages or free_huge_pages may have changed. */ - spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); needed = (h->resv_huge_pages + delta) - (h->free_huge_pages + allocated); if (needed > 0) { @@ -2161,12 +2118,12 @@ static int gather_surplus_pages(struct hstate *h, long delta) enqueue_huge_page(h, page); } free: - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock);
/* Free unnecessary surplus pages to the buddy allocator */ list_for_each_entry_safe(page, tmp, &surplus_list, lru) put_page(page); - spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock);
return ret; } @@ -2216,9 +2173,9 @@ static void return_unused_surplus_pages(struct hstate *h, }
out: - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); update_and_free_pages_bulk(h, &page_list); - spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); }
@@ -2518,18 +2475,18 @@ struct page *alloc_huge_page(struct vm_area_struct *vma, * Use hugetlb_lock to manage the account of * hugetlb cgroup. */ - spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); list_add(&page->lru, &h->hugepage_activelist); hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(hstate_vma(vma)), h_cg, page); - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); goto out; } goto out_uncharge_cgroup; }
- spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); /* * glb_chg is passed to indicate whether or not a page must be taken * from the global free pool (global change). gbl_chg == 0 indicates @@ -2537,11 +2494,11 @@ struct page *alloc_huge_page(struct vm_area_struct *vma, */ page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, gbl_chg); if (!page) { - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); page = alloc_buddy_huge_page_with_mpol(h, vma, addr); if (!page) goto out_uncharge_cgroup; - spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock);; if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) { SetPagePrivate(page); h->resv_huge_pages--; @@ -2550,7 +2507,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma, /* Fall through */ } hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); out: set_page_private(page, (unsigned long)spool);
@@ -2801,9 +2758,9 @@ static void try_to_free_low(struct hstate *h, unsigned long count, }
out: - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); update_and_free_pages_bulk(h, &page_list); - spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); } #else static inline void try_to_free_low(struct hstate *h, unsigned long count, @@ -2868,7 +2825,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count, return h->max_huge_pages; }
- spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock);
/* * Check for a node specific request. @@ -2912,14 +2869,14 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count, * page, free_huge_page will handle it by freeing the page * and reducing the surplus. */ - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock);
/* yield cpu to avoid soft lockup */ cond_resched();
ret = alloc_pool_huge_page(h, nodes_allowed, node_alloc_noretry); - spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); if (!ret) goto out;
@@ -2958,9 +2915,9 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count, list_add(&page->lru, &page_list); } /* free the pages after dropping lock */ - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); update_and_free_pages_bulk(h, &page_list); - spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock);
while (count < persistent_huge_pages(h)) { if (!adjust_pool_surplus(h, nodes_allowed, 1)) @@ -2968,7 +2925,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count, } out: ret = persistent_huge_pages(h); - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock);
NODEMASK_FREE(node_alloc_noretry);
@@ -3135,9 +3092,9 @@ static ssize_t nr_overcommit_hugepages_store(struct kobject *kobj, if (err) return err;
- spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); h->nr_overcommit_huge_pages = input; - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock);
return count; } @@ -3448,7 +3405,7 @@ int alloc_hugepage_from_hugetlb(struct dhugetlb_pool *hpool, return -ENOMEM;
spin_lock(&hpool->lock); - spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); if (h->free_huge_pages_node[nid] < size) { ret = -ENOMEM; goto out_unlock; @@ -3470,7 +3427,7 @@ int alloc_hugepage_from_hugetlb(struct dhugetlb_pool *hpool, } ret = 0; out_unlock: - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); spin_unlock(&hpool->lock); return ret; } @@ -3835,7 +3792,7 @@ static void free_back_hugetlb(struct dhugetlb_pool *hpool) if (!h) return;
- spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); list_for_each_entry_safe(page, page_next, &hpool->dhugetlb_1G_freelists, lru) { nr_pages = 1 << huge_page_order(h); @@ -3862,7 +3819,7 @@ static void free_back_hugetlb(struct dhugetlb_pool *hpool) hpool->free_reserved_1G = 0; hpool->total_reserved_1G = 0; INIT_LIST_HEAD(&hpool->dhugetlb_1G_freelists); - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); }
bool free_dhugetlb_pool(struct dhugetlb_pool *hpool) @@ -4632,9 +4589,9 @@ int hugetlb_overcommit_handler(struct ctl_table *table, int write, goto out;
if (write) { - spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); h->nr_overcommit_huge_pages = tmp; - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); } out: return ret; @@ -4731,7 +4688,7 @@ static int hugetlb_acct_memory(struct hstate *h, long delta, if (dhugetlb_enabled && hpool) return dhugetlb_acct_memory(h, delta, hpool);
- spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); /* * When cpuset is configured, it breaks the strict hugetlb page * reservation as the accounting is done on a global variable. Such @@ -4764,7 +4721,7 @@ static int hugetlb_acct_memory(struct hstate *h, long delta, return_unused_surplus_pages(h, (unsigned long) -delta);
out: - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); return ret; }
@@ -6671,7 +6628,7 @@ bool isolate_huge_page(struct page *page, struct list_head *list) { bool ret = true;
- spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); if (!PageHeadHuge(page) || !page_huge_active(page) || !get_page_unless_zero(page)) { ret = false; @@ -6680,17 +6637,17 @@ bool isolate_huge_page(struct page *page, struct list_head *list) clear_page_huge_active(page); list_move_tail(&page->lru, list); unlock: - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); return ret; }
void putback_active_hugepage(struct page *page) { VM_BUG_ON_PAGE(!PageHead(page), page); - spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); set_page_huge_active(page); list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist); - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); put_page(page); }
@@ -6718,12 +6675,12 @@ void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason) SetPageHugeTemporary(oldpage); ClearPageHugeTemporary(newpage);
- spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); if (h->surplus_huge_pages_node[old_nid]) { h->surplus_huge_pages_node[old_nid]--; h->surplus_huge_pages_node[new_nid]++; } - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); } }
@@ -6739,10 +6696,10 @@ static struct page *hugetlb_alloc_hugepage_normal(struct hstate *h, { struct page *page = NULL;
- spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); if (h->free_huge_pages - h->resv_huge_pages > 0) page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL, NULL); - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock);
return page; } diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 7a93e1e439dd..13110d8ca3ea 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -167,11 +167,11 @@ static void hugetlb_cgroup_css_offline(struct cgroup_subsys_state *css)
do { for_each_hstate(h) { - spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); list_for_each_entry(page, &h->hugepage_activelist, lru) hugetlb_cgroup_move_parent(idx, h_cg, page);
- spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); idx++; } cond_resched(); @@ -422,14 +422,14 @@ void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage) return;
VM_BUG_ON_PAGE(!PageHuge(oldhpage), oldhpage); - spin_lock(&hugetlb_lock); + spin_lock_irq(&hugetlb_lock); h_cg = hugetlb_cgroup_from_page(oldhpage); set_hugetlb_cgroup(oldhpage, NULL);
/* move the h_cg details to new cgroup */ set_hugetlb_cgroup(newhpage, h_cg); list_move(&newhpage->lru, &h->hugepage_activelist); - spin_unlock(&hugetlb_lock); + spin_unlock_irq(&hugetlb_lock); return; }
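To summarize the pattern applied throughout the diff above (an illustrative sketch, not part of the patch; the wrapper name shrink_pool_sketch() is made up, while the lock and helpers are the real ones used in the diff): with free_huge_page() callable from IRQ context, hugetlb_lock is always taken with interrupts disabled, and anything that can block, such as update_and_free_pages_bulk(), runs with the lock dropped and interrupts enabled again.

/* Illustrative sketch, not part of the patch above. */
static void shrink_pool_sketch(struct hstate *h)
{
	LIST_HEAD(page_list);

	spin_lock_irq(&hugetlb_lock);	/* interrupts stay off while the lock is held */

	/* ... unlink the pages to be freed onto page_list ... */

	spin_unlock_irq(&hugetlb_lock);	/* drop the lock: the bulk free may block */
	update_and_free_pages_bulk(h, &page_list);
	spin_lock_irq(&hugetlb_lock);	/* retake it to finish the pool accounting */

	/* ... adjust h->max_huge_pages and the per-node counters ... */

	spin_unlock_irq(&hugetlb_lock);
}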
From: Mina Almasry almasrymina@google.com
mainline inclusion from mainline-v5.19-rc1 commit 4b25f030ae69ba710eff587cabb4c57cb7e7a8a1 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9SZXR CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
After commit db71ef79b59b ("hugetlb: make free_huge_page irq safe"), the subpool lock should be locked with spin_lock_irq() and all call sites were modified as such, except for the ones in hugetlbfs_statfs().
Link: https://lkml.kernel.org/r/20220429202207.3045-1-almasrymina@google.com Fixes: db71ef79b59b ("hugetlb: make free_huge_page irq safe") Signed-off-by: Mina Almasry almasrymina@google.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- fs/hugetlbfs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 92077383f320..58e879f089c4 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -1099,12 +1099,12 @@ static int hugetlbfs_statfs(struct dentry *dentry, struct kstatfs *buf) if (sbinfo->spool) { long free_pages;
- spin_lock(&sbinfo->spool->lock); + spin_lock_irq(&sbinfo->spool->lock); buf->f_blocks = sbinfo->spool->max_hpages; free_pages = sbinfo->spool->max_hpages - sbinfo->spool->used_hpages; buf->f_bavail = buf->f_bfree = free_pages; - spin_unlock(&sbinfo->spool->lock); + spin_unlock_irq(&sbinfo->spool->lock); buf->f_files = sbinfo->max_inodes; buf->f_ffree = sbinfo->free_inodes; }
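To make the locking rule concrete (an illustrative sketch, not part of the patch; read_spool_free_sketch() is a made-up name): once free_huge_page() may run from IRQ context, it takes spool->lock via hugepage_subpool_put_pages(), so a process-context holder that leaves interrupts enabled, as hugetlbfs_statfs() did, can self-deadlock if such an interrupt arrives on the same CPU. Every acquisition therefore has to use the irq-disabling variant:

/* Illustrative sketch, not part of the patch above. */
static long read_spool_free_sketch(struct hugepage_subpool *spool)
{
	long free_hpages;

	/*
	 * free_huge_page(), possibly running from IRQ context, also takes
	 * spool->lock, so interrupts must stay disabled while it is held.
	 */
	spin_lock_irq(&spool->lock);
	free_hpages = spool->max_hpages - spool->used_hpages;
	spin_unlock_irq(&spool->lock);

	return free_hpages;
}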
From: Naoya Horiguchi naoya.horiguchi@nec.com
mainline inclusion from mainline-v5.13-rc5 commit 0c5da35723a961d8c02ea516da2bcfeb007d7d2c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9SZXR CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
When memory_failure() or soft_offline_page() is called on a tail page of some hugetlb page, a "BUG: unable to handle page fault" error can be triggered.

remove_hugetlb_page() dereferences page->lru, so it is assumed that the page points to a head page, but one of its callers, dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page', which could be a tail page. So pass 'head' to it instead.
Link: https://lkml.kernel.org/r/20210526235257.2769473-1-nao.horiguchi@gmail.com Fixes: 6eb4e88a6d27 ("hugetlb: create remove_hugetlb_page() to separate functionality") Signed-off-by: Naoya Horiguchi naoya.horiguchi@nec.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Reviewed-by: Muchun Song songmuchun@bytedance.com Acked-by: Michal Hocko mhocko@suse.com Reviewed-by: Oscar Salvador osalvador@suse.de Cc: Miaohe Lin linmiaohe@huawei.com Cc: Matthew Wilcox willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Conflicts: mm/hugetlb.c [Context conflicts.] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- mm/hugetlb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 2844458b5fc5..10509d9f9c58 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1852,7 +1852,7 @@ int dissolve_free_huge_page(struct page *page) SetPageHWPoison(page); ClearPageHWPoison(head); } - remove_hugetlb_page(h, page, false); + remove_hugetlb_page(h, head, false); h->max_huge_pages--; spin_unlock_irq(&hugetlb_lock); update_and_free_page(h, head);
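An illustrative sketch of the rule this fix enforces (dissolve_sketch() is a made-up name; the helpers are the real ones from the diff): remove_hugetlb_page() unlinks head->lru, and a tail page's lru words double as compound-page metadata, so a caller that may hold a tail-page pointer has to resolve the head page first, exactly as dissolve_free_huge_page() does above.

/* Illustrative sketch, not part of the patch above. */
static void dissolve_sketch(struct hstate *h, struct page *page)
{
	struct page *head = compound_head(page);	/* 'page' may be a tail page */

	spin_lock_irq(&hugetlb_lock);
	/* remove_hugetlb_page() touches head->lru, so only the head page is valid here */
	remove_hugetlb_page(h, head, false);
	h->max_huge_pages--;
	spin_unlock_irq(&hugetlb_lock);

	update_and_free_page(h, head);	/* the actual free happens outside hugetlb_lock */
}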