From: Guo Ziliang guo.ziliang@zte.com.cn
stable inclusion from stable-v5.10.108 commit fa3aa103e79c7e7628e7c0ac55c80f0cb7a668b4 bugzilla: https://gitee.com/openeuler/kernel/issues/I574A9
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 029c4628b2eb2ca969e9bf979b05dc18d8d5575e upstream.
In our testing, a livelock task was found. Through sysrq printing, same stack was found every time, as follows:
__swap_duplicate+0x58/0x1a0 swapcache_prepare+0x24/0x30 __read_swap_cache_async+0xac/0x220 read_swap_cache_async+0x58/0xa0 swapin_readahead+0x24c/0x628 do_swap_page+0x374/0x8a0 __handle_mm_fault+0x598/0xd60 handle_mm_fault+0x114/0x200 do_page_fault+0x148/0x4d0 do_translation_fault+0xb0/0xd4 do_mem_abort+0x50/0xb0
The reason for the livelock is that swapcache_prepare() always returns EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so that it cannot jump out of the loop. We suspect that the task that clears the SWAP_HAS_CACHE flag never gets a chance to run. We try to lower the priority of the task stuck in a livelock so that the task that clears the SWAP_HAS_CACHE flag will run. The results show that the system returns to normal after the priority is lowered.
In our testing, multiple real-time tasks are bound to the same core, and the task in the livelock is the highest priority task of the core, so the livelocked task cannot be preempted.
Although cond_resched() is used by __read_swap_cache_async, it is an empty function in the preemptive system and cannot achieve the purpose of releasing the CPU. A high-priority task cannot release the CPU unless preempted by a higher-priority task. But when this task is already the highest priority task on this core, other tasks will not be able to be scheduled. So we think we should replace cond_resched() with schedule_timeout_uninterruptible(1), schedule_timeout_interruptible will call set_current_state first to set the task state, so the task will be removed from the running queue, so as to achieve the purpose of giving up the CPU and prevent it from running in kernel mode for too long.
(akpm: ugly hack becomes uglier. But it fixes the issue in a backportable-to-stable fashion while we hopefully work on something better)
Link: https://lkml.kernel.org/r/20220221111749.1928222-1-cgel.zte@gmail.com Signed-off-by: Guo Ziliang guo.ziliang@zte.com.cn Reported-by: Zeal Robot zealci@zte.com.cn Reviewed-by: Ran Xiaokai ran.xiaokai@zte.com.cn Reviewed-by: Jiang Xuexin jiang.xuexin@zte.com.cn Reviewed-by: Yang Yang yang.yang29@zte.com.cn Acked-by: Hugh Dickins hughd@google.com Cc: Naoya Horiguchi naoya.horiguchi@nec.com Cc: Michal Hocko mhocko@kernel.org Cc: Minchan Kim minchan@kernel.org Cc: Johannes Weiner hannes@cmpxchg.org Cc: Roger Quadros rogerq@kernel.org Cc: Ziliang Guo guo.ziliang@zte.com.cn Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yu Liao liaoyu15@huawei.com Reviewed-by: Wei Li liwei391@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/swap_state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/swap_state.c b/mm/swap_state.c index a9e42d48312b..149f46781061 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -502,7 +502,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * __read_swap_cache_async(), which has set SWAP_HAS_CACHE * in swap_map, but not yet added its page to swap cache. */ - cond_resched(); + schedule_timeout_uninterruptible(1); }
/*