[PATCH kernel-4.19 1/3] x86/mce: Add Zhaoxin MCE support

mainline inclusion
from mainline-5.5
commit 6e898d2bf67a82df0aa0c955adc9278faba9a635
category: x86/mce

Add support for more Zhaoxin CPUs.

--------------------------------

All newer Zhaoxin CPUs are compatible with Intel's Machine-Check
Architecture, so add support for them.

 [ bp: Reflow comment in vendor_disable_error_reporting() and massage
   commit message. ]

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: CooperYan@zhaoxin.com
Cc: DavidWang@zhaoxin.com
Cc: HerryYang@zhaoxin.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: QiyuanWang@zhaoxin.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1568787573-1297-2-git-send-email-TonyWWang-oc@zhao...
Signed-off-by: LeoLiu-oc <LeoLiu-oc@zhaoxin.com>
---
 arch/x86/kernel/cpu/mce/core.c | 42 ++++++++++++++++++++++++++--------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5221c49d335e..dce0fbd4cb0f 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -473,8 +473,10 @@ int mce_usable_address(struct mce *m)
 	if (!(m->status & MCI_STATUS_ADDRV))
 		return 0;
 
-	/* Checks after this one are Intel-specific: */
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+	/* Checks after this one are Intel/Zhaoxin-specific: */
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
+	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN &&
+	    boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR)
 		return 1;
 
 	if (!(m->status & MCI_STATUS_MISCV))
@@ -492,10 +494,14 @@ EXPORT_SYMBOL_GPL(mce_usable_address);
 
 bool mce_is_memory_error(struct mce *m)
 {
-	if (m->cpuvendor == X86_VENDOR_AMD ||
-	    m->cpuvendor == X86_VENDOR_HYGON) {
+	switch (m->cpuvendor) {
+	case X86_VENDOR_AMD:
+	case X86_VENDOR_HYGON:
 		return amd_mce_is_memory_error(m);
-	} else if (m->cpuvendor == X86_VENDOR_INTEL) {
+
+	case X86_VENDOR_INTEL:
+	case X86_VENDOR_ZHAOXIN:
+	case X86_VENDOR_CENTAUR:
 		/*
 		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
 		 *
@@ -512,9 +518,10 @@ bool mce_is_memory_error(struct mce *m)
 		return (m->status & 0xef80) == BIT(7) ||
 		       (m->status & 0xef00) == BIT(8) ||
 		       (m->status & 0xeffc) == 0xc;
-	}
 
-	return false;
+	default:
+		return false;
+	}
 }
 EXPORT_SYMBOL_GPL(mce_is_memory_error);
 
@@ -1658,6 +1665,19 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 		if (c->x86 == 6 && c->x86_model == 45)
 			quirk_no_way_out = quirk_sandybridge_ifu;
 	}
+
+	if (c->x86_vendor == X86_VENDOR_ZHAOXIN ||
+	    c->x86_vendor == X86_VENDOR_CENTAUR) {
+		/*
+		 * All newer Zhaoxin CPUs support MCE broadcasting. Enable
+		 * synchronization with a one second timeout.
+		 */
+		if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
+			if (cfg->monarch_timeout < 0)
+				cfg->monarch_timeout = USEC_PER_SEC;
+		}
+	}
+
 	if (cfg->monarch_timeout < 0)
 		cfg->monarch_timeout = 0;
 	if (cfg->bootlog != 0)
@@ -1963,15 +1983,17 @@ static void mce_disable_error_reporting(void)
 static void vendor_disable_error_reporting(void)
 {
 	/*
-	 * Don't clear on Intel, AMD or Hygon CPUs. Some of these MSRs are
-	 * socket-wide.
+	 * Don't clear on Intel, AMD, Hygon or Zhaoxin CPUs. Some of these
+	 * MSRs are socket-wide.
 	 * Disabling them for just a single offlined CPU is bad, since it will
 	 * inhibit reporting for all shared resources on the socket like the
 	 * last level cache (LLC), the integrated memory controller (iMC), etc.
 	 */
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
 	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ||
-	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+	    boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN ||
+	    boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR)
 		return;
 
 	mce_disable_error_reporting();
-- 
2.20.1
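With this patch, Zhaoxin and Centaur CPUs take the Intel branch of mce_is_memory_error(), which decides from the MCA error code (the low 16 bits of IA32_MCi_STATUS) whether an error is memory-related, following the compound error codes in Intel SDM Volume 3B, section 15.9.2. The userspace sketch below reproduces only those three mask tests so they can be tried outside the kernel. The function name intel_style_is_memory_error() and the local BIT() macro are hypothetical, and the comments reflect my reading of the SDM encodings: bit 7 marks a memory controller error, bit 8 a cache hierarchy error, 0b11LL in the low nibble a generic cache hierarchy error, bit 11 a bus/interconnect error, and bit 12 is the filter bit, which every mask leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro (hypothetical for this sketch). */
#define BIT(n) (1ULL << (n))

/*
 * Hypothetical userspace copy of the Intel/Zhaoxin/Centaur branch of
 * mce_is_memory_error(). Bit 12 (0x1000, the filter bit) is cleared in
 * every mask, bits above 15 are ignored, and bit 11 (bus/interconnect)
 * is in every mask but never in an expected value, so it always fails.
 */
static bool intel_style_is_memory_error(uint64_t status)
{
	return (status & 0xef80) == BIT(7) ||	/* 000F 0000 1MMM CCCC: memory controller */
	       (status & 0xef00) == BIT(8) ||	/* 000F 0001 RRRR TTLL: cache hierarchy */
	       (status & 0xeffc) == 0xc;	/* 000F 0000 0000 11LL: generic cache hierarchy */
}
```

For example, 0x0990 has bits 7 and 8 set, but bit 11 is set too, so none of the patterns match and the error is not classified as a memory error; setting the filter bit, as in 0x1080, does not change the result compared with 0x0080.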

On 2021/3/25 18:07, LeoLiu-oc wrote:
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5221c49d335e..dce0fbd4cb0f 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -473,8 +473,10 @@ int mce_usable_address(struct mce *m)
>  	if (!(m->status & MCI_STATUS_ADDRV))
>  		return 0;
>
> -	/* Checks after this one are Intel-specific: */
> -	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> +	/* Checks after this one are Intel/Zhaoxin-specific: */
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
> +	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN &&
> +	    boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR)

This looks good to me,

[...]
> +
> +	if (c->x86_vendor == X86_VENDOR_ZHAOXIN ||
> +	    c->x86_vendor == X86_VENDOR_CENTAUR) {
> +		/*
> +		 * All newer Zhaoxin CPUs support MCE broadcasting. Enable
> +		 * synchronization with a one second timeout.
> +		 */
> +		if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
But do we need constraints on x86_model for both vendors, Zhaoxin and
Centaur?

I'm not familiar with those two CPUs, and you are the experts, but I can
see patches for the C state, for Centaur:

+	/*
+	 * For all recent Centaur CPUs, the ucode will make sure that each
+	 * core can keep cache coherence with each other while entering C3
+	 * type state. So, set bm_check to 1 to indicate that the kernel
+	 * doesn't need to execute a cache flush operation (WBINVD) when
+	 * entering C3 type state.
+	 */
+	if (c->x86_vendor == X86_VENDOR_CENTAUR) {
+		if (c->x86 > 6 || (c->x86 == 6 && c->x86_model == 0x0f &&
+		    c->x86_stepping >= 0x0e))
+			flags->bm_check = 1;
+	}

But for Zhaoxin,

+	if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
+		/*
+		 * All Zhaoxin CPUs that support C3 share cache.
+		 * And caches should not be flushed by software while
+		 * entering C3 type state.
+		 */
+		flags->bm_check = 1;

I'm just curious, correct me if I'm wrong.

Thanks
Hanjun
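The two C-state snippets quoted in this message can be compared side by side as one pure function. This is only a sketch for reasoning about the conditions, not the code from those patches: the enum cpu_vendor values and the function c3_bm_check() are hypothetical stand-ins for the kernel's X86_VENDOR_* constants and the cpuinfo_x86 family/model/stepping fields. It returns true when one of the quoted branches would set flags->bm_check; false means those branches leave it untouched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's X86_VENDOR_* constants. */
enum cpu_vendor { VENDOR_OTHER, VENDOR_CENTAUR, VENDOR_ZHAOXIN };

/*
 * True when one of the two quoted branches would set bm_check, i.e. the
 * kernel may skip WBINVD when entering C3. False means neither branch
 * touches the flag.
 */
static bool c3_bm_check(enum cpu_vendor vendor, unsigned int family,
			unsigned int model, unsigned int stepping)
{
	switch (vendor) {
	case VENDOR_CENTAUR:
		/* Only recent "CentaurHauls" parts qualify. */
		return family > 6 ||
		       (family == 6 && model == 0x0f && stepping >= 0x0e);
	case VENDOR_ZHAOXIN:
		/* All Zhaoxin CPUs that support C3 share cache. */
		return true;
	default:
		return false;
	}
}
```

The Centaur branch needs the family/model/stepping test because older VIA CPUs also report "CentaurHauls", while the Zhaoxin branch sets the flag unconditionally.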

On 26/03/2021 09:56, Hanjun Guo wrote:
> On 2021/3/25 18:07, LeoLiu-oc wrote:
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index 5221c49d335e..dce0fbd4cb0f 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -473,8 +473,10 @@ int mce_usable_address(struct mce *m)
>>  	if (!(m->status & MCI_STATUS_ADDRV))
>>  		return 0;
>>
>> -	/* Checks after this one are Intel-specific: */
>> -	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
>> +	/* Checks after this one are Intel/Zhaoxin-specific: */
>> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
>> +	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN &&
>> +	    boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR)
>
> This looks good to me,
>
> [...]
>> +
>> +	if (c->x86_vendor == X86_VENDOR_ZHAOXIN ||
>> +	    c->x86_vendor == X86_VENDOR_CENTAUR) {
>> +		/*
>> +		 * All newer Zhaoxin CPUs support MCE broadcasting. Enable
>> +		 * synchronization with a one second timeout.
>> +		 */
>> +		if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
>
> But do we need constraints on x86_model for both vendors, Zhaoxin and
> Centaur?
Yes. Zhaoxin currently has two CPU vendor IDs, "CentaurHauls" and
"  Shanghai  ". Zhaoxin CPUs with Family > 6, or Family == 6 and
Model == 0x19/0x1f, support MCE broadcasting.
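The rule given here can be written down as a small predicate together with the monarch_timeout defaulting that the patch adds to __mcheck_cpu_apply_quirks(). This is a hedged userspace sketch: zx_mce_broadcast() and zx_monarch_timeout_us() are hypothetical names, a negative monarch_timeout stands for "not set by the user" as in the quoted hunk, and USEC_PER_SEC is redefined locally. The predicate encodes the rule exactly as stated in this reply, with an explicit family == 6 test; the condition in the posted hunk checks the model numbers without one.

```c
#include <assert.h>
#include <stdbool.h>

/* Local redefinition of the kernel constant for this sketch. */
#define USEC_PER_SEC 1000000L

/*
 * Hypothetical predicate for broadcast-capable Zhaoxin CPUs, following
 * the stated rule: family > 6, or family 6 with model 0x19 or 0x1f.
 * It applies to both the "CentaurHauls" and "  Shanghai  " vendor IDs.
 */
static bool zx_mce_broadcast(unsigned int family, unsigned int model)
{
	return family > 6 ||
	       (family == 6 && (model == 0x19 || model == 0x1f));
}

/*
 * Sketch of the monarch_timeout handling in the quirks hunk: a negative
 * value means "not set". Broadcast-capable parts default to one second;
 * a value that is still unset afterwards falls back to 0.
 */
static long zx_monarch_timeout_us(unsigned int family, unsigned int model,
				  long monarch_timeout)
{
	if (zx_mce_broadcast(family, model) && monarch_timeout < 0)
		monarch_timeout = USEC_PER_SEC;
	if (monarch_timeout < 0)
		monarch_timeout = 0;
	return monarch_timeout;
}
```

With these definitions, an unset timeout on a family 7 part becomes one second, on family 6 model 0x0f it falls back to 0, and a user-supplied value such as 500 is kept.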
> I'm not familiar with those two CPUs, and you are the experts, but I can
> see patches for the C state, for Centaur:
>
> +	/*
> +	 * For all recent Centaur CPUs, the ucode will make sure that each
> +	 * core can keep cache coherence with each other while entering C3
> +	 * type state. So, set bm_check to 1 to indicate that the kernel
> +	 * doesn't need to execute a cache flush operation (WBINVD) when
> +	 * entering C3 type state.
> +	 */
> +	if (c->x86_vendor == X86_VENDOR_CENTAUR) {
> +		if (c->x86 > 6 || (c->x86 == 6 && c->x86_model == 0x0f &&
> +		    c->x86_stepping >= 0x0e))
> +			flags->bm_check = 1;
> +	}

This is a different item from MCE. These CPUs belong to Zhaoxin, and the
if condition distinguishes them from the older VIA-made CPUs that also use
the "CentaurHauls" CPU vendor ID.

> But for Zhaoxin,
>
> +	if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
> +		/*
> +		 * All Zhaoxin CPUs that support C3 share cache.
> +		 * And caches should not be flushed by software while
> +		 * entering C3 type state.
> +		 */
> +		flags->bm_check = 1;
>
> I'm just curious, correct me if I'm wrong.
Zhaoxin CPUs with the "  Shanghai  " vendor ID do not need to be
distinguished from CPUs with other vendor IDs.

Sincerely
TonyWWangoc

> Thanks
> Hanjun
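Because Zhaoxin parts may report either vendor ID, the vendor checks in this MCE patch accept both X86_VENDOR_ZHAOXIN and X86_VENDOR_CENTAUR, and a "CentaurHauls" match on its own does not say whether the CPU is a Zhaoxin or an older VIA part. The kernel matches these strings through the c_ident entries of its per-vendor cpu_dev structures; the sketch below is a hypothetical standalone version that maps the 12-byte CPUID leaf 0 vendor string to a tag. The enum cpu_vendor, vendor_from_id() and uses_intel_mce_paths() are illustrative names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical vendor tags for this sketch only. */
enum cpu_vendor { VENDOR_UNKNOWN, VENDOR_CENTAUR, VENDOR_ZHAOXIN };

/*
 * Map the 12-byte CPUID leaf 0 vendor string (EBX, EDX, ECX concatenated,
 * not NUL-terminated) to a tag. The Zhaoxin string has two leading and
 * two trailing spaces.
 */
static enum cpu_vendor vendor_from_id(const char id[12])
{
	if (memcmp(id, "CentaurHauls", 12) == 0)
		return VENDOR_CENTAUR;	/* Zhaoxin or older VIA CPU */
	if (memcmp(id, "  Shanghai  ", 12) == 0)
		return VENDOR_ZHAOXIN;	/* Zhaoxin CPU */
	return VENDOR_UNKNOWN;
}

/*
 * Among this sketch's tags, both Zhaoxin vendor IDs take the
 * Intel-compatible MCE paths that the patch extends.
 */
static bool uses_intel_mce_paths(enum cpu_vendor v)
{
	return v == VENDOR_CENTAUR || v == VENDOR_ZHAOXIN;
}
```

Model-specific behaviour, such as MCE broadcasting, still has to be decided from family and model after the vendor is known.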

On 2021/3/26 11:37, Tony W Wang-oc wrote:
> On 26/03/2021 09:56, Hanjun Guo wrote:
>> On 2021/3/25 18:07, LeoLiu-oc wrote:
>>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>>> index 5221c49d335e..dce0fbd4cb0f 100644
>>> --- a/arch/x86/kernel/cpu/mce/core.c
>>> +++ b/arch/x86/kernel/cpu/mce/core.c
>>> @@ -473,8 +473,10 @@ int mce_usable_address(struct mce *m)
>>>  	if (!(m->status & MCI_STATUS_ADDRV))
>>>  		return 0;
>>>
>>> -	/* Checks after this one are Intel-specific: */
>>> -	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
>>> +	/* Checks after this one are Intel/Zhaoxin-specific: */
>>> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
>>> +	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN &&
>>> +	    boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR)
>>
>> This looks good to me,
>>
>> [...]
>>> +
>>> +	if (c->x86_vendor == X86_VENDOR_ZHAOXIN ||
>>> +	    c->x86_vendor == X86_VENDOR_CENTAUR) {
>>> +		/*
>>> +		 * All newer Zhaoxin CPUs support MCE broadcasting. Enable
>>> +		 * synchronization with a one second timeout.
>>> +		 */
>>> +		if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
>>
>> But do we need constraints on x86_model for both vendors, Zhaoxin and
>> Centaur?
>
> Yes. Zhaoxin currently has two CPU vendor IDs, "CentaurHauls" and
> "  Shanghai  ". Zhaoxin CPUs with Family > 6, or Family == 6 and
> Model == 0x19/0x1f, support MCE broadcasting.
>
>> I'm not familiar with those two CPUs, and you are the experts, but I can
>> see patches for the C state, for Centaur:
>>
>> +	/*
>> +	 * For all recent Centaur CPUs, the ucode will make sure that each
>> +	 * core can keep cache coherence with each other while entering C3
>> +	 * type state. So, set bm_check to 1 to indicate that the kernel
>> +	 * doesn't need to execute a cache flush operation (WBINVD) when
>> +	 * entering C3 type state.
>> +	 */
>> +	if (c->x86_vendor == X86_VENDOR_CENTAUR) {
>> +		if (c->x86 > 6 || (c->x86 == 6 && c->x86_model == 0x0f &&
>> +		    c->x86_stepping >= 0x0e))
>> +			flags->bm_check = 1;
>> +	}
>
> This is a different item from MCE. These CPUs belong to Zhaoxin, and the
> if condition distinguishes them from the older VIA-made CPUs that also use
> the "CentaurHauls" CPU vendor ID.
>
>> But for Zhaoxin,
>>
>> +	if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
>> +		/*
>> +		 * All Zhaoxin CPUs that support C3 share cache.
>> +		 * And caches should not be flushed by software while
>> +		 * entering C3 type state.
>> +		 */
>> +		flags->bm_check = 1;
>>
>> I'm just curious, correct me if I'm wrong.
>
> Zhaoxin CPUs with the "  Shanghai  " vendor ID do not need to be
> distinguished from CPUs with other vendor IDs.

Thanks for the reply. I will add my review for this patch set.

Thanks
Hanjun
participants (3)
- Hanjun Guo
- LeoLiu-oc
- Tony W Wang-oc