fix CVE-2024-26960
Huang Ying (1): mm/swapfile.c: use __try_to_reclaim_swap() in free_swap_and_cache()
Ryan Roberts (1): mm: swap: fix race between free_swap_and_cache() and swapoff()
mm/swapfile.c | 70 +++++++++++++++++++++++++++------------------------ 1 file changed, 37 insertions(+), 33 deletions(-)
From: Huang Ying ying.huang@intel.com
mainline inclusion from mainline-v4.20-rc1 commit bcd49e86710b42f15c7512de594d23b3ae0b21d7 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9L4RJ CVE: CVE-2024-26960
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The code path to reclaim the swap entry in free_swap_and_cache() is almost same as that of __try_to_reclaim_swap(). The largest difference is just coding style. So the support to the additional requirement of free_swap_and_cache() is added into __try_to_reclaim_swap(). free_swap_and_cache() is changed to call __try_to_reclaim_swap(), and delete the duplicated code. This will improve code readability and reduce the potential bugs.
There are 2 functionality differences between __try_to_reclaim_swap() and swap entry reclaim code of free_swap_and_cache().
- free_swap_and_cache() only reclaims the swap entry if the page is unmapped or swap is getting full. The support has been added into __try_to_reclaim_swap().
- try_to_free_swap() (called by __try_to_reclaim_swap()) checks pm_suspended_storage(), while free_swap_and_cache() not. I think this is OK. Because the page and the swap entry can be reclaimed later eventually.
Link: http://lkml.kernel.org/r/20180827075535.17406-2-ying.huang@intel.com Signed-off-by: "Huang, Ying" ying.huang@intel.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Michal Hocko mhocko@suse.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Shaohua Li shli@kernel.org Cc: Hugh Dickins hughd@google.com Cc: Minchan Kim minchan@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- mm/swapfile.c | 57 ++++++++++++++++++++++----------------------------- 1 file changed, 25 insertions(+), 32 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c index 9e0e519254b4..f096c09d917b 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -112,26 +112,39 @@ static inline unsigned char swap_count(unsigned char ent) return ent & ~SWAP_HAS_CACHE; /* may include COUNT_CONTINUED flag */ }
+/* Reclaim the swap entry anyway if possible */ +#define TTRS_ANYWAY 0x1 +/* + * Reclaim the swap entry if there are no more mappings of the + * corresponding page + */ +#define TTRS_UNMAPPED 0x2 +/* Reclaim the swap entry if swap is getting full*/ +#define TTRS_FULL 0x4 + /* returns 1 if swap entry is freed */ -static int -__try_to_reclaim_swap(struct swap_info_struct *si, unsigned long offset) +static int __try_to_reclaim_swap(struct swap_info_struct *si, + unsigned long offset, unsigned long flags) { swp_entry_t entry = swp_entry(si->type, offset); struct page *page; int ret = 0;
- page = find_get_page(swap_address_space(entry), swp_offset(entry)); + page = find_get_page(swap_address_space(entry), offset); if (!page) return 0; /* - * This function is called from scan_swap_map() and it's called - * by vmscan.c at reclaiming pages. So, we hold a lock on a page, here. - * We have to use trylock for avoiding deadlock. This is a special + * When this function is called from scan_swap_map_slots() and it's + * called by vmscan.c at reclaiming pages. So, we hold a lock on a page, + * here. We have to use trylock for avoiding deadlock. This is a special * case and you should use try_to_free_swap() with explicit lock_page() * in usual operations. */ if (trylock_page(page)) { - ret = try_to_free_swap(page); + if ((flags & TTRS_ANYWAY) || + ((flags & TTRS_UNMAPPED) && !page_mapped(page)) || + ((flags & TTRS_FULL) && mem_cgroup_swap_full(page))) + ret = try_to_free_swap(page); unlock_page(page); } put_page(page); @@ -790,7 +803,7 @@ static int scan_swap_map_slots(struct swap_info_struct *si, int swap_was_freed; unlock_cluster(ci); spin_unlock(&si->lock); - swap_was_freed = __try_to_reclaim_swap(si, offset); + swap_was_freed = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY); spin_lock(&si->lock); /* entry was freed successfully, try to use this again */ if (swap_was_freed) @@ -1702,7 +1715,6 @@ int try_to_free_swap(struct page *page) int free_swap_and_cache(swp_entry_t entry) { struct swap_info_struct *p; - struct page *page = NULL; unsigned char count;
if (non_swap_entry(entry)) @@ -1712,31 +1724,12 @@ int free_swap_and_cache(swp_entry_t entry) if (p) { count = __swap_entry_free(p, entry, 1); if (count == SWAP_HAS_CACHE && - !swap_page_trans_huge_swapped(p, entry)) { - page = find_get_page(swap_address_space(entry), - swp_offset(entry)); - if (page && !trylock_page(page)) { - put_page(page); - page = NULL; - } - } else if (!count) + !swap_page_trans_huge_swapped(p, entry)) + __try_to_reclaim_swap(p, swp_offset(entry), + TTRS_UNMAPPED | TTRS_FULL); + else if (!count) free_swap_slot(entry); } - if (page) { - /* - * Not mapped elsewhere, or swap space full? Free it! - * Also recheck PageSwapCache now page is locked (above). - */ - if (PageSwapCache(page) && !PageWriteback(page) && - (!page_mapped(page) || mem_cgroup_swap_full(page)) && - !swap_page_trans_huge_swapped(p, entry)) { - page = compound_head(page); - delete_from_swap_cache(page); - SetPageDirty(page); - } - unlock_page(page); - put_page(page); - } return p != NULL; }
From: Ryan Roberts ryan.roberts@arm.com
stable inclusion from stable-v5.10.215 commit d85c11c97ecf92d47a4b29e3faca714dc1f18d0d category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9L4RJ CVE: CVE-2024-26960
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 82b1c07a0af603e3c47b906c8e991dc96f01688e ]
There was previously a theoretical window where swapoff() could run and teardown a swap_info_struct while a call to free_swap_and_cache() was running in another thread. This could cause, amongst other bad possibilities, swap_page_trans_huge_swapped() (called by free_swap_and_cache()) to access the freed memory for swap_map.
This is a theoretical problem and I haven't been able to provoke it from a test case. But there has been agreement based on code review that this is possible (see link below).
Fix it by using get_swap_device()/put_swap_device(), which will stall swapoff(). There was an extra check in _swap_info_get() to confirm that the swap entry was not free. This isn't present in get_swap_device() because it doesn't make sense in general due to the race between getting the reference and swapoff. So I've added an equivalent check directly in free_swap_and_cache().
Details of how to provoke one possible issue (thanks to David Hildenbrand for deriving this):
--8<-----
__swap_entry_free() might be the last user and result in "count == SWAP_HAS_CACHE".
swapoff->try_to_unuse() will stop as soon as soon as si->inuse_pages==0.
So the question is: could someone reclaim the folio and turn si->inuse_pages==0, before we completed swap_page_trans_huge_swapped().
Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are still references by swap entries.
Process 1 still references subpage 0 via swap entry. Process 2 still references subpage 1 via swap entry.
Process 1 quits. Calls free_swap_and_cache(). -> count == SWAP_HAS_CACHE [then, preempted in the hypervisor etc.]
Process 2 quits. Calls free_swap_and_cache(). -> count == SWAP_HAS_CACHE
Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls __try_to_reclaim_swap().
__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()-> put_swap_folio()->free_swap_slot()->swapcache_free_entries()-> swap_entry_free()->swap_range_free()-> ... WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
What stops swapoff to succeed after process 2 reclaimed the swap cache but before process1 finished its call to swap_page_trans_huge_swapped()?
--8<-----
Link: https://lkml.kernel.org/r/20240306140356.3974886-1-ryan.roberts@arm.com Fixes: 7c00bafee87c ("mm/swap: free swap slots in batch") Closes: https://lore.kernel.org/linux-mm/65a66eb9-41f8-4790-8db2-0c70ea15979f@redhat... Signed-off-by: Ryan Roberts ryan.roberts@arm.com Cc: David Hildenbrand david@redhat.com Cc: "Huang, Ying" ying.huang@intel.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org
Conflicts: mm/swap.c [ conflicts due to prior cleanup commit is not merged]
Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- mm/swapfile.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c index f096c09d917b..1d98be001d3d 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1192,6 +1192,11 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *p, }
/* + * Note that when only holding the PTL, swapoff might succeed immediately + * after freeing a swap entry. Therefore, immediately after + * __swap_entry_free(), the swap info might become stale and should not + * be touched without a prior get_swap_device(). + * * Check whether swap entry is valid in the swap device. If so, * return pointer to swap_info_struct, and keep the swap entry valid * via preventing the swap device from being swapoff, until @@ -1720,8 +1725,13 @@ int free_swap_and_cache(swp_entry_t entry) if (non_swap_entry(entry)) return 1;
- p = _swap_info_get(entry); + p = get_swap_device(entry); if (p) { + if (WARN_ON(!p->swap_map[swp_offset(entry)])) { + put_swap_device(p); + return 0; + } + count = __swap_entry_free(p, entry, 1); if (count == SWAP_HAS_CACHE && !swap_page_trans_huge_swapped(p, entry)) @@ -1729,6 +1739,7 @@ int free_swap_and_cache(swp_entry_t entry) TTRS_UNMAPPED | TTRS_FULL); else if (!count) free_swap_slot(entry); + put_swap_device(p); } return p != NULL; }
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/7109 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/6...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/7109 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/6...