From: Benjamin Berg bberg@redhat.com
[ Upstream commit 9c3bafaa1fd88e4dd2dba3735a1f1abb0f2c7bb7 ]
On modern CPUs it is quite normal that the temperature limits are reached and the CPU is throttled. In fact, often the thermal design is not sufficient to cool the CPU at full load and limits can quickly be reached when a burst in load happens. This will even happen with technologies like RAPL limitting the long term power consumption of the package.
Also, these limits are "softer", as Srinivas explains:
"CPU temperature doesn't have to hit max(TjMax) to get these warnings. OEMs ha[ve] an ability to program a threshold where a thermal interrupt can be generated. In some systems the offset is 20C+ (Read only value).
In recent systems, there is another offset on top of it which can be programmed by OS, once some agent can adjust power limits dynamically. By default this is set to low by the firmware, which I guess the prime motivation of Benjamin to submit the patch."
So these messages do not usually indicate a hardware issue (e.g. insufficient cooling). Log them as warnings to avoid confusion about their severity.
[ bp: Massage commit mesage. ]
Signed-off-by: Benjamin Berg bberg@redhat.com Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: Hans de Goede hdegoede@redhat.com Tested-by: Christian Kellner ckellner@redhat.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: linux-edac linux-edac@vger.kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Tony Luck tony.luck@intel.com Cc: x86-ml x86@kernel.org Link: https://lkml.kernel.org/r/20191009155424.249277-1-bberg@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/x86/kernel/cpu/mce/therm_throt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c index ee229ce..ec6a07b 100644 --- a/arch/x86/kernel/cpu/mce/therm_throt.c +++ b/arch/x86/kernel/cpu/mce/therm_throt.c @@ -185,7 +185,7 @@ static void therm_throt_process(bool new_event, int event, int level) /* if we just entered the thermal event */ if (new_event) { if (event == THERMAL_THROTTLING_EVENT) - pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n", + pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n", this_cpu, level == CORE_LEVEL ? "Core" : "Package", state->count);