[PATCH OLK-5.10 0/2] fix CFS bandwidth vs. hrtimer self deadlock

25 Nov 2023

      Fix the following issue:

	CPU1      		         CPU2                            CPU3

T1 sets cfs_quota
   starts hrtimer cfs_bandwidth 'period_timer'
T1 is migrated to CPU3
					T2(worker thread) initiates
					offlining of CPU1
Hotplug operation starts
  ...
'period_timer' expires and is
re-enqueued on CPU1
  ...
take_cpu_down()
  CPU1 shuts down and does not handle timers
  anymore. They have to be migrated in the
  post dead hotplug steps by the control task.

					T2(worker thread) runs the
					post dead offline operation
									T1 holds lockA
									T1 is scheduled out
									//throttled by CFS bandwidth control
									T1 waits for 'period_timer' to expire
					T2(worker thread) waits for lockA

T1 waits there forever if it is scheduled out before it can execute the
hrtimer offline callback hrtimers_dead_cpu().
Thus T2 waits for lockA forever.

Thomas Gleixner (1):
  hrtimers: Push pending hrtimers away from outgoing CPU earlier

Yu Liao (1):
  cpu/hotplug: fix kabi breakage in enum cpuhp_state

 include/linux/hrtimer.h |  4 ++--
 include/linux/smp.h     |  1 +
 kernel/cpu.c            | 17 +++++++++++++++--
 kernel/smp.c            |  8 ++++++++
 kernel/time/hrtimer.c   | 33 ++++++++++++---------------------
 5 files changed, 38 insertions(+), 25 deletions(-)

-- 
2.25.1