Kernel

[PATCH kernel-4.19 1/2] Revert "kretprobe: check re-registration of the same kretprobe earlier"
by Yang Yingliang 28 Jul '21
From: Wang ShaoBo <bobo.shaobowang(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 31369
CVE: NA
-----------------------------------------------
This reverts commit dd99d9a22a13b9a33dd829048cdf8fe4dc50556b.
The following mainline patches have fixed this issue:
0188b87899ff ("kretprobe: Avoid re-registration of the same kretprobe earlier")
33b1d1466885 ("kprobes: Warn if the kprobe is reregistered")
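[Editor's note] For context, a minimal kretprobe registration sketch, not part of this patch; the probed symbol and maxactive value are only examples. It shows the register_kretprobe() path that the check dropped by this revert lived in:

#include <linux/kprobes.h>
#include <linux/module.h>

/* Return-probe handler: runs when the probed function returns. */
static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	pr_info("probed function returned %lu\n", regs_return_value(regs));
	return 0;
}

static struct kretprobe my_kretprobe = {
	.handler	= ret_handler,
	.kp.symbol_name	= "_do_fork",	/* example symbol on a 4.19 kernel */
	.maxactive	= 20,
};

static int __init kretprobe_example_init(void)
{
	/*
	 * Registering the same kretprobe twice is the case that the
	 * dropped check (and the later mainline fixes) guard against.
	 */
	return register_kretprobe(&my_kretprobe);
}

static void __exit kretprobe_example_exit(void)
{
	unregister_kretprobe(&my_kretprobe);
}

module_init(kretprobe_example_init);
module_exit(kretprobe_example_exit);
MODULE_LICENSE("GPL");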
Signed-off-by: Wang ShaoBo <bobo.shaobowang(a)huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
kernel/kprobes.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 60e7f181a97d3..91985ae1fe0b3 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1979,14 +1979,6 @@ int register_kretprobe(struct kretprobe *rp)
}
}
- /*
- * Return error if it's being re-registered,
- * also give a warning message to the developer.
- */
- ret = check_kprobe_rereg(&rp->kp);
- if (WARN_ON(ret))
- return ret;
-
rp->kp.pre_handler = pre_handler_kretprobe;
rp->kp.post_handler = NULL;
rp->kp.fault_handler = NULL;
--
2.25.1

High quality JYMED Foley Catheter Strap and foam belt products factory Shanghai
by allan01@jyylsh.cn 28 Jul '21
Only they who fulfill their duties in everyday matters will fulfill them on great occasions. Hello Sir,
Shanghai JianYing Medical company has been the main manufacturer of medical bandages, catheter holder leg straps, fetal belts, and CPAP headgear straps and pads in China since 2006.
Our major customers are reputable and renowned companies in the field.
We believe you can rely on us, as we provide satisfying factory service and products of the best quality. Our products are
medical bandages, fetal belts, and CPAP headgear straps. Please do not hesitate to send us any of your inquiries, and we will always provide
satisfying factory service.
Looking forward to hearing from you at your convenience.
Thanks & Best regards,
Janey
Shanghai QiuMing Textile Co.,Ltd.
Shanghai JianYing Medical Technology Co.,Ltd.
Area B 4th Floor Building 11, Lane 258, Yinlong road
Jiading district, Shanghai China 201806
Mob: +86 177 49730956
WhatsApp: +86 177 49730956
Web:www.cpapjy.com
Web: http://www.elasticmedicalbelt.com/news.html
If you are not interested in our products and services, you can unsubscribe from our emails and you will no longer receive similar notifications.

27 Jul '21
We introduced several features to support kernel seamless upgrade:
1. pin mem function for checkpoint and restore
2. pid reserve function for checkpoint and restore
3. cpu park to accelerate bringing cpus down and up
4. quick kexec to accelerate relocating kernel images
5. legacy pmem support for arm64
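[Editor's note] As a rough usage illustration for the legacy pmem item, assuming the new arm64 parameter follows the x86 memmap=nn[KMG]!ss[KMG] form that this series mirrors (the size and offset below are made up):

    memmap=4G!16G

This would reserve 4 GiB of RAM starting at physical offset 16 GiB as legacy pmem, which the kernel can then register as a pmem block device.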
-----------------------------------
Jingxian He (2):
mm: add pin memory method for checkpoint add restore
pid: add pid reserve method for checkpoint and recover
Sang Yan (4):
kexec: Add quick kexec support for kernel
arm64: Reserve memory for quick kexec
arm64: smp: Add support for cpu park
config: enable kernel hotupgrade features by default
ZhuLing (1):
arm64: Add memmap parameter and register pmem
arch/Kconfig | 10 +
arch/arm64/Kconfig | 33 +
arch/arm64/configs/openeuler_defconfig | 8 +
arch/arm64/include/asm/kexec.h | 6 +
arch/arm64/include/asm/smp.h | 15 +
arch/arm64/kernel/Makefile | 2 +
arch/arm64/kernel/cpu-park.S | 58 ++
arch/arm64/kernel/machine_kexec.c | 2 +-
arch/arm64/kernel/pmem.c | 35 +
arch/arm64/kernel/process.c | 4 +
arch/arm64/kernel/setup.c | 26 +
arch/arm64/kernel/smp.c | 231 +++++
arch/arm64/mm/init.c | 258 ++++++
drivers/char/Kconfig | 7 +
drivers/char/Makefile | 1 +
drivers/char/pin_memory.c | 209 +++++
drivers/nvdimm/Kconfig | 5 +
drivers/nvdimm/Makefile | 2 +-
fs/proc/task_mmu.c | 138 +++
include/linux/crash_core.h | 5 +
include/linux/ioport.h | 1 +
include/linux/kexec.h | 24 +-
include/linux/pin_mem.h | 99 ++
include/uapi/linux/kexec.h | 1 +
kernel/crash_core.c | 11 +
kernel/kexec.c | 10 +
kernel/kexec_core.c | 42 +-
kernel/pid.c | 10 +
mm/Kconfig | 18 +
mm/Makefile | 1 +
mm/huge_memory.c | 63 ++
mm/memory.c | 65 ++
mm/pin_mem.c | 1142 ++++++++++++++++++++++++
33 files changed, 2527 insertions(+), 15 deletions(-)
create mode 100644 arch/arm64/kernel/cpu-park.S
create mode 100644 arch/arm64/kernel/pmem.c
create mode 100644 drivers/char/pin_memory.c
create mode 100644 include/linux/pin_mem.h
create mode 100644 mm/pin_mem.c
--
2.20.1

[PATCH openEuler-21.09 01/24] irqchip/gic-phytium-2500: Add support for GIC of Phytium S2500
by Zheng Zengkai 27 Jul '21
From: Wang Yinfeng <wangyinfeng(a)phytium.com.cn>
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I41AUQ
CVE: NA
-------------------------------------------------
Phytium S2500 adjusts the GIC's implementation to support multi-socket
system designs. This patch adds the driver for this new implementation
while keeping the kernel binary compatible with other ARM servers
in the ecosystem.
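[Editor's note] As a usage sketch, not part of the patch text itself: enabling the driver in a kernel configuration would look roughly like the fragment below. ARM_GIC_PHYTIUM_2500 has no prompt and is selected automatically once the platform option added by this patch is chosen:

    CONFIG_ARCH_PHYTIUM=y
    # CONFIG_ARM_GIC_PHYTIUM_2500=y is pulled in via "select ARM_GIC_PHYTIUM_2500"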
Signed-off-by: Wang Yinfeng <wangyinfeng(a)phytium.com.cn>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
arch/arm64/Kconfig.platforms | 6 +
drivers/irqchip/Kconfig | 9 +
drivers/irqchip/Makefile | 1 +
drivers/irqchip/irq-gic-phytium-2500-its.c | 5524 ++++++++++++++++++
drivers/irqchip/irq-gic-phytium-2500.c | 2503 ++++++++
include/acpi/actbl2.h | 3 +-
include/linux/irqchip/arm-gic-phytium-2500.h | 725 +++
7 files changed, 8770 insertions(+), 1 deletion(-)
create mode 100644 drivers/irqchip/irq-gic-phytium-2500-its.c
create mode 100644 drivers/irqchip/irq-gic-phytium-2500.c
create mode 100644 include/linux/irqchip/arm-gic-phytium-2500.h
diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
index 5c4ac1c9f4e0..dd6b7466fe28 100644
--- a/arch/arm64/Kconfig.platforms
+++ b/arch/arm64/Kconfig.platforms
@@ -199,6 +199,12 @@ config ARCH_MXC
This enables support for the ARMv8 based SoCs in the
NXP i.MX family.
+config ARCH_PHYTIUM
+ bool "Phytium SoC Family"
+ help
+ This enables support for Phytium ARMv8 SoC family.
+ select ARM_GIC_PHYTIUM_2500
+
config ARCH_QCOM
bool "Qualcomm Platforms"
select GPIOLIB
diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index 6156a065681b..8bee8d6d7a42 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -56,6 +56,15 @@ config ARM_GIC_V3_ITS_FSL_MC
depends on FSL_MC_BUS
default ARM_GIC_V3_ITS
+config ARM_GIC_PHYTIUM_2500
+ bool
+ select IRQ_DOMAIN
+ select GENERIC_IRQ_MULTI_HANDLER
+ select IRQ_DOMAIN_HIERARCHY
+ select PARTITION_PERCPU
+ select GENERIC_IRQ_EFFECTIVE_AFF_MASK
+ select GENERIC_MSI_IRQ_DOMAIN
+
config ARM_NVIC
bool
select IRQ_DOMAIN_HIERARCHY
diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
index 94c2885882ee..ed171c68cc6d 100644
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -34,6 +34,7 @@ obj-$(CONFIG_ARM_GIC_V3) += irq-gic-v3.o irq-gic-v3-mbi.o irq-gic-common.o
obj-$(CONFIG_ARM_GIC_V3_ITS) += irq-gic-v3-its.o irq-gic-v3-its-platform-msi.o irq-gic-v4.o
obj-$(CONFIG_ARM_GIC_V3_ITS_PCI) += irq-gic-v3-its-pci-msi.o
obj-$(CONFIG_ARM_GIC_V3_ITS_FSL_MC) += irq-gic-v3-its-fsl-mc-msi.o
+obj-$(CONFIG_ARM_GIC_PHYTIUM_2500) += irq-gic-phytium-2500.o irq-gic-phytium-2500-its.o
obj-$(CONFIG_PARTITION_PERCPU) += irq-partition-percpu.o
obj-$(CONFIG_HISILICON_IRQ_MBIGEN) += irq-mbigen.o
obj-$(CONFIG_ARM_NVIC) += irq-nvic.o
diff --git a/drivers/irqchip/irq-gic-phytium-2500-its.c b/drivers/irqchip/irq-gic-phytium-2500-its.c
new file mode 100644
index 000000000000..227fbc00c3da
--- /dev/null
+++ b/drivers/irqchip/irq-gic-phytium-2500-its.c
@@ -0,0 +1,5524 @@
+/*
+ * Copyright (C) 2020 Phytium Corporation.
+ * Author: Wang Yinfeng <wangyinfeng(a)phytium.com.cn>
+ * Chen Baozi <chenbaozi(a)phytium.com.cn>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/acpi.h>
+#include <linux/acpi_iort.h>
+#include <linux/bitfield.h>
+#include <linux/bitmap.h>
+#include <linux/cpu.h>
+#include <linux/crash_dump.h>
+#include <linux/delay.h>
+#include <linux/dma-iommu.h>
+#include <linux/efi.h>
+#include <linux/interrupt.h>
+#include <linux/iopoll.h>
+#include <linux/irqdomain.h>
+#include <linux/list.h>
+#include <linux/log2.h>
+#include <linux/memblock.h>
+#include <linux/mm.h>
+#include <linux/msi.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/of_pci.h>
+#include <linux/of_platform.h>
+#include <linux/percpu.h>
+#include <linux/slab.h>
+#include <linux/syscore_ops.h>
+
+#include <linux/irqchip.h>
+#include <linux/irqchip/arm-gic-phytium-2500.h>
+#include <linux/irqchip/arm-gic-v4.h>
+
+#include <asm/cputype.h>
+#include <asm/exception.h>
+
+#include "irq-gic-common.h"
+
+#define ITS_FLAGS_CMDQ_NEEDS_FLUSHING (1ULL << 0)
+#define ITS_FLAGS_WORKAROUND_CAVIUM_22375 (1ULL << 1)
+#define ITS_FLAGS_WORKAROUND_CAVIUM_23144 (1ULL << 2)
+
+#define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING (1 << 0)
+#define RDIST_FLAGS_RD_TABLES_PREALLOCATED (1 << 1)
+
+static u32 lpi_id_bits;
+
+/*
+ * We allocate memory for PROPBASE to cover 2 ^ lpi_id_bits LPIs to
+ * deal with (one configuration byte per interrupt). PENDBASE has to
+ * be 64kB aligned (one bit per LPI, plus 8192 bits for SPI/PPI/SGI).
+ */
+#define LPI_NRBITS lpi_id_bits
+#define LPI_PROPBASE_SZ ALIGN(BIT(LPI_NRBITS), SZ_64K)
+#define LPI_PENDBASE_SZ ALIGN(BIT(LPI_NRBITS) / 8, SZ_64K)
+
+#define LPI_PROP_DEFAULT_PRIO GICD_INT_DEF_PRI
+
+/*
+ * Collection structure - just an ID, and a redistributor address to
+ * ping. We use one per CPU as a bag of interrupts assigned to this
+ * CPU.
+ */
+struct its_collection {
+ u64 target_address;
+ u16 col_id;
+};
+
+/*
+ * The ITS_BASER structure - contains memory information, cached
+ * value of BASER register configuration and ITS page size.
+ */
+struct its_baser {
+ void *base;
+ u64 val;
+ u32 order;
+ u32 psz;
+};
+
+struct its_device;
+
+/*
+ * The ITS structure - contains most of the infrastructure, with the
+ * top-level MSI domain, the command queue, the collections, and the
+ * list of devices writing to it.
+ *
+ * dev_alloc_lock has to be taken for device allocations, while the
+ * spinlock must be taken to parse data structures such as the device
+ * list.
+ */
+struct its_node {
+ raw_spinlock_t lock;
+ struct mutex dev_alloc_lock;
+ struct list_head entry;
+ void __iomem *base;
+ void __iomem *sgir_base;
+ phys_addr_t phys_base;
+ struct its_cmd_block *cmd_base;
+ struct its_cmd_block *cmd_write;
+ struct its_baser tables[GITS_BASER_NR_REGS];
+ struct its_collection *collections;
+ struct fwnode_handle *fwnode_handle;
+ u64 (*get_msi_base)(struct its_device *its_dev);
+ u64 typer;
+ u64 cbaser_save;
+ u32 ctlr_save;
+ u32 mpidr;
+ struct list_head its_device_list;
+ u64 flags;
+ unsigned long list_nr;
+ int numa_node;
+ unsigned int msi_domain_flags;
+ u32 pre_its_base; /* for Socionext Synquacer */
+ int vlpi_redist_offset;
+};
+
+#define is_v4(its) (!!((its)->typer & GITS_TYPER_VLPIS))
+#define is_v4_1(its) (!!((its)->typer & GITS_TYPER_VMAPP))
+#define device_ids(its) (FIELD_GET(GITS_TYPER_DEVBITS, (its)->typer) + 1)
+
+#define ITS_ITT_ALIGN SZ_256
+
+/* The maximum number of VPEID bits supported by VLPI commands */
+#define ITS_MAX_VPEID_BITS \
+ ({ \
+ int nvpeid = 16; \
+ if (gic_rdists->has_rvpeid && \
+ gic_rdists->gicd_typer2 & GICD_TYPER2_VIL) \
+ nvpeid = 1 + (gic_rdists->gicd_typer2 & \
+ GICD_TYPER2_VID); \
+ \
+ nvpeid; \
+ })
+#define ITS_MAX_VPEID (1 << (ITS_MAX_VPEID_BITS))
+
+/* Convert page order to size in bytes */
+#define PAGE_ORDER_TO_SIZE(o) (PAGE_SIZE << (o))
+
+struct event_lpi_map {
+ unsigned long *lpi_map;
+ u16 *col_map;
+ irq_hw_number_t lpi_base;
+ int nr_lpis;
+ raw_spinlock_t vlpi_lock;
+ struct its_vm *vm;
+ struct its_vlpi_map *vlpi_maps;
+ int nr_vlpis;
+};
+
+/*
+ * The ITS view of a device - belongs to an ITS, owns an interrupt
+ * translation table, and a list of interrupts. If it some of its
+ * LPIs are injected into a guest (GICv4), the event_map.vm field
+ * indicates which one.
+ */
+struct its_device {
+ struct list_head entry;
+ struct its_node *its;
+ struct event_lpi_map event_map;
+ void *itt;
+ u32 nr_ites;
+ u32 device_id;
+ bool shared;
+};
+
+static struct {
+ raw_spinlock_t lock;
+ struct its_device *dev;
+ struct its_vpe **vpes;
+ int next_victim;
+} vpe_proxy;
+
+struct cpu_lpi_count {
+ atomic_t managed;
+ atomic_t unmanaged;
+};
+
+static DEFINE_PER_CPU(struct cpu_lpi_count, cpu_lpi_count_ft2500);
+
+static LIST_HEAD(its_nodes);
+static DEFINE_RAW_SPINLOCK(its_lock);
+static struct rdists *gic_rdists;
+static struct irq_domain *its_parent;
+
+static unsigned long its_list_map;
+static u16 vmovp_seq_num;
+static DEFINE_RAW_SPINLOCK(vmovp_lock);
+
+static DEFINE_IDA(its_vpeid_ida);
+
+#define gic_data_rdist() (raw_cpu_ptr(gic_rdists->rdist))
+#define gic_data_rdist_cpu(cpu) (per_cpu_ptr(gic_rdists->rdist, cpu))
+#define gic_data_rdist_rd_base() (gic_data_rdist()->rd_base)
+#define gic_data_rdist_vlpi_base() (gic_data_rdist_rd_base() + SZ_128K)
+
+/*
+ * Skip ITSs that have no vLPIs mapped, unless we're on GICv4.1, as we
+ * always have vSGIs mapped.
+ */
+static bool require_its_list_vmovp(struct its_vm *vm, struct its_node *its)
+{
+ return (gic_rdists->has_rvpeid || vm->vlpi_count[its->list_nr]);
+}
+
+static u16 get_its_list(struct its_vm *vm)
+{
+ struct its_node *its;
+ unsigned long its_list = 0;
+
+ list_for_each_entry(its, &its_nodes, entry) {
+ if (!is_v4(its))
+ continue;
+
+ if (require_its_list_vmovp(vm, its))
+ __set_bit(its->list_nr, &its_list);
+ }
+
+ return (u16)its_list;
+}
+
+static inline u32 its_get_event_id(struct irq_data *d)
+{
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ return d->hwirq - its_dev->event_map.lpi_base;
+}
+
+static struct its_collection *dev_event_to_col(struct its_device *its_dev,
+ u32 event)
+{
+ struct its_node *its = its_dev->its;
+
+ return its->collections + its_dev->event_map.col_map[event];
+}
+
+static struct its_vlpi_map *dev_event_to_vlpi_map(struct its_device *its_dev,
+ u32 event)
+{
+ if (WARN_ON_ONCE(event >= its_dev->event_map.nr_lpis))
+ return NULL;
+
+ return &its_dev->event_map.vlpi_maps[event];
+}
+
+static struct its_vlpi_map *get_vlpi_map(struct irq_data *d)
+{
+ if (irqd_is_forwarded_to_vcpu(d)) {
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ u32 event = its_get_event_id(d);
+
+ return dev_event_to_vlpi_map(its_dev, event);
+ }
+
+ return NULL;
+}
+
+static int vpe_to_cpuid_lock(struct its_vpe *vpe, unsigned long *flags)
+{
+ raw_spin_lock_irqsave(&vpe->vpe_lock, *flags);
+ return vpe->col_idx;
+}
+
+static void vpe_to_cpuid_unlock(struct its_vpe *vpe, unsigned long flags)
+{
+ raw_spin_unlock_irqrestore(&vpe->vpe_lock, flags);
+}
+
+static int irq_to_cpuid_lock(struct irq_data *d, unsigned long *flags)
+{
+ struct its_vlpi_map *map = get_vlpi_map(d);
+ int cpu;
+
+ if (map) {
+ cpu = vpe_to_cpuid_lock(map->vpe, flags);
+ } else {
+ /* Physical LPIs are already locked via the irq_desc lock */
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ cpu = its_dev->event_map.col_map[its_get_event_id(d)];
+ /* Keep GCC quiet... */
+ *flags = 0;
+ }
+
+ return cpu;
+}
+
+static void irq_to_cpuid_unlock(struct irq_data *d, unsigned long flags)
+{
+ struct its_vlpi_map *map = get_vlpi_map(d);
+
+ if (map)
+ vpe_to_cpuid_unlock(map->vpe, flags);
+}
+
+static struct its_collection *valid_col(struct its_collection *col)
+{
+ if (WARN_ON_ONCE(col->target_address & GENMASK_ULL(15, 0)))
+ return NULL;
+
+ return col;
+}
+
+static struct its_vpe *valid_vpe(struct its_node *its, struct its_vpe *vpe)
+{
+ if (valid_col(its->collections + vpe->col_idx))
+ return vpe;
+
+ return NULL;
+}
+
+/*
+ * ITS command descriptors - parameters to be encoded in a command
+ * block.
+ */
+struct its_cmd_desc {
+ union {
+ struct {
+ struct its_device *dev;
+ u32 event_id;
+ } its_inv_cmd;
+
+ struct {
+ struct its_device *dev;
+ u32 event_id;
+ } its_clear_cmd;
+
+ struct {
+ struct its_device *dev;
+ u32 event_id;
+ } its_int_cmd;
+
+ struct {
+ struct its_device *dev;
+ int valid;
+ } its_mapd_cmd;
+
+ struct {
+ struct its_collection *col;
+ int valid;
+ } its_mapc_cmd;
+
+ struct {
+ struct its_device *dev;
+ u32 phys_id;
+ u32 event_id;
+ } its_mapti_cmd;
+
+ struct {
+ struct its_device *dev;
+ struct its_collection *col;
+ u32 event_id;
+ } its_movi_cmd;
+
+ struct {
+ struct its_device *dev;
+ u32 event_id;
+ } its_discard_cmd;
+
+ struct {
+ struct its_collection *col;
+ } its_invall_cmd;
+
+ struct {
+ struct its_vpe *vpe;
+ } its_vinvall_cmd;
+
+ struct {
+ struct its_vpe *vpe;
+ struct its_collection *col;
+ bool valid;
+ } its_vmapp_cmd;
+
+ struct {
+ struct its_vpe *vpe;
+ struct its_device *dev;
+ u32 virt_id;
+ u32 event_id;
+ bool db_enabled;
+ } its_vmapti_cmd;
+
+ struct {
+ struct its_vpe *vpe;
+ struct its_device *dev;
+ u32 event_id;
+ bool db_enabled;
+ } its_vmovi_cmd;
+
+ struct {
+ struct its_vpe *vpe;
+ struct its_collection *col;
+ u16 seq_num;
+ u16 its_list;
+ } its_vmovp_cmd;
+
+ struct {
+ struct its_vpe *vpe;
+ } its_invdb_cmd;
+
+ struct {
+ struct its_vpe *vpe;
+ u8 sgi;
+ u8 priority;
+ bool enable;
+ bool group;
+ bool clear;
+ } its_vsgi_cmd;
+ };
+};
+
+/*
+ * The ITS command block, which is what the ITS actually parses.
+ */
+struct its_cmd_block {
+ union {
+ u64 raw_cmd[4];
+ __le64 raw_cmd_le[4];
+ };
+};
+
+#define ITS_CMD_QUEUE_SZ SZ_64K
+#define ITS_CMD_QUEUE_NR_ENTRIES (ITS_CMD_QUEUE_SZ / sizeof(struct its_cmd_block))
+
+typedef struct its_collection *(*its_cmd_builder_t)(struct its_node *,
+ struct its_cmd_block *,
+ struct its_cmd_desc *);
+
+typedef struct its_vpe *(*its_cmd_vbuilder_t)(struct its_node *,
+ struct its_cmd_block *,
+ struct its_cmd_desc *);
+
+static void its_mask_encode(u64 *raw_cmd, u64 val, int h, int l)
+{
+ u64 mask = GENMASK_ULL(h, l);
+ *raw_cmd &= ~mask;
+ *raw_cmd |= (val << l) & mask;
+}
+
+static void its_encode_cmd(struct its_cmd_block *cmd, u8 cmd_nr)
+{
+ its_mask_encode(&cmd->raw_cmd[0], cmd_nr, 7, 0);
+}
+
+static void its_encode_devid(struct its_cmd_block *cmd, u32 devid)
+{
+ its_mask_encode(&cmd->raw_cmd[0], devid, 63, 32);
+}
+
+static void its_encode_event_id(struct its_cmd_block *cmd, u32 id)
+{
+ its_mask_encode(&cmd->raw_cmd[1], id, 31, 0);
+}
+
+static void its_encode_phys_id(struct its_cmd_block *cmd, u32 phys_id)
+{
+ its_mask_encode(&cmd->raw_cmd[1], phys_id, 63, 32);
+}
+
+static void its_encode_size(struct its_cmd_block *cmd, u8 size)
+{
+ its_mask_encode(&cmd->raw_cmd[1], size, 4, 0);
+}
+
+static void its_encode_itt(struct its_cmd_block *cmd, u64 itt_addr)
+{
+ its_mask_encode(&cmd->raw_cmd[2], itt_addr >> 8, 51, 8);
+}
+
+static void its_encode_valid(struct its_cmd_block *cmd, int valid)
+{
+ its_mask_encode(&cmd->raw_cmd[2], !!valid, 63, 63);
+}
+
+static void its_encode_target(struct its_cmd_block *cmd, u64 target_addr)
+{
+ its_mask_encode(&cmd->raw_cmd[2], target_addr >> 16, 51, 16);
+}
+
+static void its_encode_collection(struct its_cmd_block *cmd, u16 col)
+{
+ its_mask_encode(&cmd->raw_cmd[2], col, 15, 0);
+}
+
+static void its_encode_vpeid(struct its_cmd_block *cmd, u16 vpeid)
+{
+ its_mask_encode(&cmd->raw_cmd[1], vpeid, 47, 32);
+}
+
+static void its_encode_virt_id(struct its_cmd_block *cmd, u32 virt_id)
+{
+ its_mask_encode(&cmd->raw_cmd[2], virt_id, 31, 0);
+}
+
+static void its_encode_db_phys_id(struct its_cmd_block *cmd, u32 db_phys_id)
+{
+ its_mask_encode(&cmd->raw_cmd[2], db_phys_id, 63, 32);
+}
+
+static void its_encode_db_valid(struct its_cmd_block *cmd, bool db_valid)
+{
+ its_mask_encode(&cmd->raw_cmd[2], db_valid, 0, 0);
+}
+
+static void its_encode_seq_num(struct its_cmd_block *cmd, u16 seq_num)
+{
+ its_mask_encode(&cmd->raw_cmd[0], seq_num, 47, 32);
+}
+
+static void its_encode_its_list(struct its_cmd_block *cmd, u16 its_list)
+{
+ its_mask_encode(&cmd->raw_cmd[1], its_list, 15, 0);
+}
+
+static void its_encode_vpt_addr(struct its_cmd_block *cmd, u64 vpt_pa)
+{
+ its_mask_encode(&cmd->raw_cmd[3], vpt_pa >> 16, 51, 16);
+}
+
+static void its_encode_vpt_size(struct its_cmd_block *cmd, u8 vpt_size)
+{
+ its_mask_encode(&cmd->raw_cmd[3], vpt_size, 4, 0);
+}
+
+static void its_encode_vconf_addr(struct its_cmd_block *cmd, u64 vconf_pa)
+{
+ its_mask_encode(&cmd->raw_cmd[0], vconf_pa >> 16, 51, 16);
+}
+
+static void its_encode_alloc(struct its_cmd_block *cmd, bool alloc)
+{
+ its_mask_encode(&cmd->raw_cmd[0], alloc, 8, 8);
+}
+
+static void its_encode_ptz(struct its_cmd_block *cmd, bool ptz)
+{
+ its_mask_encode(&cmd->raw_cmd[0], ptz, 9, 9);
+}
+
+static void its_encode_vmapp_default_db(struct its_cmd_block *cmd,
+ u32 vpe_db_lpi)
+{
+ its_mask_encode(&cmd->raw_cmd[1], vpe_db_lpi, 31, 0);
+}
+
+static void its_encode_vmovp_default_db(struct its_cmd_block *cmd,
+ u32 vpe_db_lpi)
+{
+ its_mask_encode(&cmd->raw_cmd[3], vpe_db_lpi, 31, 0);
+}
+
+static void its_encode_db(struct its_cmd_block *cmd, bool db)
+{
+ its_mask_encode(&cmd->raw_cmd[2], db, 63, 63);
+}
+
+static void its_encode_sgi_intid(struct its_cmd_block *cmd, u8 sgi)
+{
+ its_mask_encode(&cmd->raw_cmd[0], sgi, 35, 32);
+}
+
+static void its_encode_sgi_priority(struct its_cmd_block *cmd, u8 prio)
+{
+ its_mask_encode(&cmd->raw_cmd[0], prio >> 4, 23, 20);
+}
+
+static void its_encode_sgi_group(struct its_cmd_block *cmd, bool grp)
+{
+ its_mask_encode(&cmd->raw_cmd[0], grp, 10, 10);
+}
+
+static void its_encode_sgi_clear(struct its_cmd_block *cmd, bool clr)
+{
+ its_mask_encode(&cmd->raw_cmd[0], clr, 9, 9);
+}
+
+static void its_encode_sgi_enable(struct its_cmd_block *cmd, bool en)
+{
+ its_mask_encode(&cmd->raw_cmd[0], en, 8, 8);
+}
+
+static inline void its_fixup_cmd(struct its_cmd_block *cmd)
+{
+ /* Let's fixup BE commands */
+ cmd->raw_cmd_le[0] = cpu_to_le64(cmd->raw_cmd[0]);
+ cmd->raw_cmd_le[1] = cpu_to_le64(cmd->raw_cmd[1]);
+ cmd->raw_cmd_le[2] = cpu_to_le64(cmd->raw_cmd[2]);
+ cmd->raw_cmd_le[3] = cpu_to_le64(cmd->raw_cmd[3]);
+}
+
+static struct its_collection *its_build_mapd_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ unsigned long itt_addr;
+ u8 size = ilog2(desc->its_mapd_cmd.dev->nr_ites);
+
+ itt_addr = virt_to_phys(desc->its_mapd_cmd.dev->itt);
+ itt_addr = ALIGN(itt_addr, ITS_ITT_ALIGN);
+
+ its_encode_cmd(cmd, GITS_CMD_MAPD);
+ its_encode_devid(cmd, desc->its_mapd_cmd.dev->device_id);
+ its_encode_size(cmd, size - 1);
+ its_encode_itt(cmd, itt_addr);
+ its_encode_valid(cmd, desc->its_mapd_cmd.valid);
+
+ its_fixup_cmd(cmd);
+
+ return NULL;
+}
+
+static struct its_collection *its_build_mapc_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ its_encode_cmd(cmd, GITS_CMD_MAPC);
+ its_encode_collection(cmd, desc->its_mapc_cmd.col->col_id);
+ its_encode_target(cmd, desc->its_mapc_cmd.col->target_address);
+ its_encode_valid(cmd, desc->its_mapc_cmd.valid);
+
+ its_fixup_cmd(cmd);
+
+ return desc->its_mapc_cmd.col;
+}
+
+static struct its_collection *its_build_mapti_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ struct its_collection *col;
+
+ col = dev_event_to_col(desc->its_mapti_cmd.dev,
+ desc->its_mapti_cmd.event_id);
+ col->col_id = col->col_id % 64;
+
+ its_encode_cmd(cmd, GITS_CMD_MAPTI);
+ its_encode_devid(cmd, desc->its_mapti_cmd.dev->device_id);
+ its_encode_event_id(cmd, desc->its_mapti_cmd.event_id);
+ its_encode_phys_id(cmd, desc->its_mapti_cmd.phys_id);
+ its_encode_collection(cmd, col->col_id);
+
+ its_fixup_cmd(cmd);
+
+ return valid_col(col);
+}
+
+static struct its_collection *its_build_movi_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ struct its_collection *col;
+
+ col = dev_event_to_col(desc->its_movi_cmd.dev,
+ desc->its_movi_cmd.event_id);
+
+ its_encode_cmd(cmd, GITS_CMD_MOVI);
+ its_encode_devid(cmd, desc->its_movi_cmd.dev->device_id);
+ its_encode_event_id(cmd, desc->its_movi_cmd.event_id);
+ its_encode_collection(cmd, desc->its_movi_cmd.col->col_id);
+
+ its_fixup_cmd(cmd);
+
+ return valid_col(col);
+}
+
+static struct its_collection *its_build_discard_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ struct its_collection *col;
+
+ col = dev_event_to_col(desc->its_discard_cmd.dev,
+ desc->its_discard_cmd.event_id);
+
+ its_encode_cmd(cmd, GITS_CMD_DISCARD);
+ its_encode_devid(cmd, desc->its_discard_cmd.dev->device_id);
+ its_encode_event_id(cmd, desc->its_discard_cmd.event_id);
+
+ its_fixup_cmd(cmd);
+
+ return valid_col(col);
+}
+
+static struct its_collection *its_build_inv_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ struct its_collection *col;
+
+ col = dev_event_to_col(desc->its_inv_cmd.dev,
+ desc->its_inv_cmd.event_id);
+
+ its_encode_cmd(cmd, GITS_CMD_INV);
+ its_encode_devid(cmd, desc->its_inv_cmd.dev->device_id);
+ its_encode_event_id(cmd, desc->its_inv_cmd.event_id);
+
+ its_fixup_cmd(cmd);
+
+ return valid_col(col);
+}
+
+static struct its_collection *its_build_int_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ struct its_collection *col;
+
+ col = dev_event_to_col(desc->its_int_cmd.dev,
+ desc->its_int_cmd.event_id);
+
+ its_encode_cmd(cmd, GITS_CMD_INT);
+ its_encode_devid(cmd, desc->its_int_cmd.dev->device_id);
+ its_encode_event_id(cmd, desc->its_int_cmd.event_id);
+
+ its_fixup_cmd(cmd);
+
+ return valid_col(col);
+}
+
+static struct its_collection *its_build_clear_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ struct its_collection *col;
+
+ col = dev_event_to_col(desc->its_clear_cmd.dev,
+ desc->its_clear_cmd.event_id);
+
+ its_encode_cmd(cmd, GITS_CMD_CLEAR);
+ its_encode_devid(cmd, desc->its_clear_cmd.dev->device_id);
+ its_encode_event_id(cmd, desc->its_clear_cmd.event_id);
+
+ its_fixup_cmd(cmd);
+
+ return valid_col(col);
+}
+
+static struct its_collection *its_build_invall_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ its_encode_cmd(cmd, GITS_CMD_INVALL);
+ its_encode_collection(cmd, desc->its_invall_cmd.col->col_id);
+
+ its_fixup_cmd(cmd);
+
+ return NULL;
+}
+
+static struct its_vpe *its_build_vinvall_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ its_encode_cmd(cmd, GITS_CMD_VINVALL);
+ its_encode_vpeid(cmd, desc->its_vinvall_cmd.vpe->vpe_id);
+
+ its_fixup_cmd(cmd);
+
+ return valid_vpe(its, desc->its_vinvall_cmd.vpe);
+}
+
+static struct its_vpe *its_build_vmapp_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ unsigned long vpt_addr, vconf_addr;
+ u64 target;
+ bool alloc;
+
+ its_encode_cmd(cmd, GITS_CMD_VMAPP);
+ its_encode_vpeid(cmd, desc->its_vmapp_cmd.vpe->vpe_id);
+ its_encode_valid(cmd, desc->its_vmapp_cmd.valid);
+
+ if (!desc->its_vmapp_cmd.valid) {
+ if (is_v4_1(its)) {
+ alloc = !atomic_dec_return(&desc->its_vmapp_cmd.vpe->vmapp_count);
+ its_encode_alloc(cmd, alloc);
+ }
+
+ goto out;
+ }
+
+ vpt_addr = virt_to_phys(page_address(desc->its_vmapp_cmd.vpe->vpt_page));
+ target = desc->its_vmapp_cmd.col->target_address + its->vlpi_redist_offset;
+
+ its_encode_target(cmd, target);
+ its_encode_vpt_addr(cmd, vpt_addr);
+ its_encode_vpt_size(cmd, LPI_NRBITS - 1);
+
+ if (!is_v4_1(its))
+ goto out;
+
+ vconf_addr = virt_to_phys(page_address(desc->its_vmapp_cmd.vpe->its_vm->vprop_page));
+
+ alloc = !atomic_fetch_inc(&desc->its_vmapp_cmd.vpe->vmapp_count);
+
+ its_encode_alloc(cmd, alloc);
+
+ /* We can only signal PTZ when alloc==1. Why do we have two bits? */
+ its_encode_ptz(cmd, alloc);
+ its_encode_vconf_addr(cmd, vconf_addr);
+ its_encode_vmapp_default_db(cmd, desc->its_vmapp_cmd.vpe->vpe_db_lpi);
+
+out:
+ its_fixup_cmd(cmd);
+
+ return valid_vpe(its, desc->its_vmapp_cmd.vpe);
+}
+
+static struct its_vpe *its_build_vmapti_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ u32 db;
+
+ if (!is_v4_1(its) && desc->its_vmapti_cmd.db_enabled)
+ db = desc->its_vmapti_cmd.vpe->vpe_db_lpi;
+ else
+ db = 1023;
+
+ its_encode_cmd(cmd, GITS_CMD_VMAPTI);
+ its_encode_devid(cmd, desc->its_vmapti_cmd.dev->device_id);
+ its_encode_vpeid(cmd, desc->its_vmapti_cmd.vpe->vpe_id);
+ its_encode_event_id(cmd, desc->its_vmapti_cmd.event_id);
+ its_encode_db_phys_id(cmd, db);
+ its_encode_virt_id(cmd, desc->its_vmapti_cmd.virt_id);
+
+ its_fixup_cmd(cmd);
+
+ return valid_vpe(its, desc->its_vmapti_cmd.vpe);
+}
+
+static struct its_vpe *its_build_vmovi_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ u32 db;
+
+ if (!is_v4_1(its) && desc->its_vmovi_cmd.db_enabled)
+ db = desc->its_vmovi_cmd.vpe->vpe_db_lpi;
+ else
+ db = 1023;
+
+ its_encode_cmd(cmd, GITS_CMD_VMOVI);
+ its_encode_devid(cmd, desc->its_vmovi_cmd.dev->device_id);
+ its_encode_vpeid(cmd, desc->its_vmovi_cmd.vpe->vpe_id);
+ its_encode_event_id(cmd, desc->its_vmovi_cmd.event_id);
+ its_encode_db_phys_id(cmd, db);
+ its_encode_db_valid(cmd, true);
+
+ its_fixup_cmd(cmd);
+
+ return valid_vpe(its, desc->its_vmovi_cmd.vpe);
+}
+
+static struct its_vpe *its_build_vmovp_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ u64 target;
+
+ target = desc->its_vmovp_cmd.col->target_address + its->vlpi_redist_offset;
+ its_encode_cmd(cmd, GITS_CMD_VMOVP);
+ its_encode_seq_num(cmd, desc->its_vmovp_cmd.seq_num);
+ its_encode_its_list(cmd, desc->its_vmovp_cmd.its_list);
+ its_encode_vpeid(cmd, desc->its_vmovp_cmd.vpe->vpe_id);
+ its_encode_target(cmd, target);
+
+ if (is_v4_1(its)) {
+ its_encode_db(cmd, true);
+ its_encode_vmovp_default_db(cmd, desc->its_vmovp_cmd.vpe->vpe_db_lpi);
+ }
+
+ its_fixup_cmd(cmd);
+
+ return valid_vpe(its, desc->its_vmovp_cmd.vpe);
+}
+
+static struct its_vpe *its_build_vinv_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ struct its_vlpi_map *map;
+
+ map = dev_event_to_vlpi_map(desc->its_inv_cmd.dev,
+ desc->its_inv_cmd.event_id);
+
+ its_encode_cmd(cmd, GITS_CMD_INV);
+ its_encode_devid(cmd, desc->its_inv_cmd.dev->device_id);
+ its_encode_event_id(cmd, desc->its_inv_cmd.event_id);
+
+ its_fixup_cmd(cmd);
+
+ return valid_vpe(its, map->vpe);
+}
+
+static struct its_vpe *its_build_vint_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ struct its_vlpi_map *map;
+
+ map = dev_event_to_vlpi_map(desc->its_int_cmd.dev,
+ desc->its_int_cmd.event_id);
+
+ its_encode_cmd(cmd, GITS_CMD_INT);
+ its_encode_devid(cmd, desc->its_int_cmd.dev->device_id);
+ its_encode_event_id(cmd, desc->its_int_cmd.event_id);
+
+ its_fixup_cmd(cmd);
+
+ return valid_vpe(its, map->vpe);
+}
+
+static struct its_vpe *its_build_vclear_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ struct its_vlpi_map *map;
+
+ map = dev_event_to_vlpi_map(desc->its_clear_cmd.dev,
+ desc->its_clear_cmd.event_id);
+
+ its_encode_cmd(cmd, GITS_CMD_CLEAR);
+ its_encode_devid(cmd, desc->its_clear_cmd.dev->device_id);
+ its_encode_event_id(cmd, desc->its_clear_cmd.event_id);
+
+ its_fixup_cmd(cmd);
+
+ return valid_vpe(its, map->vpe);
+}
+
+static struct its_vpe *its_build_invdb_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ if (WARN_ON(!is_v4_1(its)))
+ return NULL;
+
+ its_encode_cmd(cmd, GITS_CMD_INVDB);
+ its_encode_vpeid(cmd, desc->its_invdb_cmd.vpe->vpe_id);
+
+ its_fixup_cmd(cmd);
+
+ return valid_vpe(its, desc->its_invdb_cmd.vpe);
+}
+
+static struct its_vpe *its_build_vsgi_cmd(struct its_node *its,
+ struct its_cmd_block *cmd,
+ struct its_cmd_desc *desc)
+{
+ if (WARN_ON(!is_v4_1(its)))
+ return NULL;
+
+ its_encode_cmd(cmd, GITS_CMD_VSGI);
+ its_encode_vpeid(cmd, desc->its_vsgi_cmd.vpe->vpe_id);
+ its_encode_sgi_intid(cmd, desc->its_vsgi_cmd.sgi);
+ its_encode_sgi_priority(cmd, desc->its_vsgi_cmd.priority);
+ its_encode_sgi_group(cmd, desc->its_vsgi_cmd.group);
+ its_encode_sgi_clear(cmd, desc->its_vsgi_cmd.clear);
+ its_encode_sgi_enable(cmd, desc->its_vsgi_cmd.enable);
+
+ its_fixup_cmd(cmd);
+
+ return valid_vpe(its, desc->its_vsgi_cmd.vpe);
+}
+
+static u64 its_cmd_ptr_to_offset(struct its_node *its,
+ struct its_cmd_block *ptr)
+{
+ return (ptr - its->cmd_base) * sizeof(*ptr);
+}
+
+static int its_queue_full(struct its_node *its)
+{
+ int widx;
+ int ridx;
+
+ widx = its->cmd_write - its->cmd_base;
+ ridx = readl_relaxed(its->base + GITS_CREADR) / sizeof(struct its_cmd_block);
+
+ /* This is incredibly unlikely to happen, unless the ITS locks up. */
+ if (((widx + 1) % ITS_CMD_QUEUE_NR_ENTRIES) == ridx)
+ return 1;
+
+ return 0;
+}
+
+static struct its_cmd_block *its_allocate_entry(struct its_node *its)
+{
+ struct its_cmd_block *cmd;
+ u32 count = 1000000; /* 1s! */
+
+ while (its_queue_full(its)) {
+ count--;
+ if (!count) {
+ pr_err_ratelimited("ITS queue not draining\n");
+ return NULL;
+ }
+ cpu_relax();
+ udelay(1);
+ }
+
+ cmd = its->cmd_write++;
+
+ /* Handle queue wrapping */
+ if (its->cmd_write == (its->cmd_base + ITS_CMD_QUEUE_NR_ENTRIES))
+ its->cmd_write = its->cmd_base;
+
+ /* Clear command */
+ cmd->raw_cmd[0] = 0;
+ cmd->raw_cmd[1] = 0;
+ cmd->raw_cmd[2] = 0;
+ cmd->raw_cmd[3] = 0;
+
+ return cmd;
+}
+
+static struct its_cmd_block *its_post_commands(struct its_node *its)
+{
+ u64 wr = its_cmd_ptr_to_offset(its, its->cmd_write);
+
+ writel_relaxed(wr, its->base + GITS_CWRITER);
+
+ return its->cmd_write;
+}
+
+static void its_flush_cmd(struct its_node *its, struct its_cmd_block *cmd)
+{
+ /*
+ * Make sure the commands written to memory are observable by
+ * the ITS.
+ */
+ if (its->flags & ITS_FLAGS_CMDQ_NEEDS_FLUSHING)
+ gic_flush_dcache_to_poc(cmd, sizeof(*cmd));
+ else
+ dsb(ishst);
+}
+
+static int its_wait_for_range_completion(struct its_node *its,
+ u64 prev_idx,
+ struct its_cmd_block *to)
+{
+ u64 rd_idx, to_idx, linear_idx;
+ u32 count = 1000000; /* 1s! */
+
+ /* Linearize to_idx if the command set has wrapped around */
+ to_idx = its_cmd_ptr_to_offset(its, to);
+ if (to_idx < prev_idx)
+ to_idx += ITS_CMD_QUEUE_SZ;
+
+ linear_idx = prev_idx;
+
+ while (1) {
+ s64 delta;
+
+ rd_idx = readl_relaxed(its->base + GITS_CREADR);
+
+ /*
+ * Compute the read pointer progress, taking the
+ * potential wrap-around into account.
+ */
+ delta = rd_idx - prev_idx;
+ if (rd_idx < prev_idx)
+ delta += ITS_CMD_QUEUE_SZ;
+
+ linear_idx += delta;
+ if (linear_idx >= to_idx)
+ break;
+
+ count--;
+ if (!count) {
+ pr_err_ratelimited("ITS queue timeout (%llu %llu)\n",
+ to_idx, linear_idx);
+ return -1;
+ }
+ prev_idx = rd_idx;
+ cpu_relax();
+ udelay(1);
+ }
+
+ return 0;
+}
+
+/* Warning, macro hell follows */
+#define BUILD_SINGLE_CMD_FUNC(name, buildtype, synctype, buildfn) \
+void name(struct its_node *its, \
+ buildtype builder, \
+ struct its_cmd_desc *desc) \
+{ \
+ struct its_cmd_block *cmd, *sync_cmd, *next_cmd; \
+ synctype *sync_obj; \
+ unsigned long flags; \
+ u64 rd_idx; \
+ \
+ raw_spin_lock_irqsave(&its->lock, flags); \
+ \
+ cmd = its_allocate_entry(its); \
+ if (!cmd) { /* We're soooooo screewed... */ \
+ raw_spin_unlock_irqrestore(&its->lock, flags); \
+ return; \
+ } \
+ sync_obj = builder(its, cmd, desc); \
+ its_flush_cmd(its, cmd); \
+ \
+ if (sync_obj) { \
+ sync_cmd = its_allocate_entry(its); \
+ if (!sync_cmd) \
+ goto post; \
+ \
+ buildfn(its, sync_cmd, sync_obj); \
+ its_flush_cmd(its, sync_cmd); \
+ } \
+ \
+post: \
+ rd_idx = readl_relaxed(its->base + GITS_CREADR); \
+ next_cmd = its_post_commands(its); \
+ raw_spin_unlock_irqrestore(&its->lock, flags); \
+ \
+ if (its_wait_for_range_completion(its, rd_idx, next_cmd)) \
+ pr_err_ratelimited("ITS cmd %ps failed\n", builder); \
+}
+
+static void its_build_sync_cmd(struct its_node *its,
+ struct its_cmd_block *sync_cmd,
+ struct its_collection *sync_col)
+{
+ its_encode_cmd(sync_cmd, GITS_CMD_SYNC);
+ its_encode_target(sync_cmd, sync_col->target_address);
+
+ its_fixup_cmd(sync_cmd);
+}
+
+static BUILD_SINGLE_CMD_FUNC(its_send_single_command, its_cmd_builder_t,
+ struct its_collection, its_build_sync_cmd)
+
+static void its_build_vsync_cmd(struct its_node *its,
+ struct its_cmd_block *sync_cmd,
+ struct its_vpe *sync_vpe)
+{
+ its_encode_cmd(sync_cmd, GITS_CMD_VSYNC);
+ its_encode_vpeid(sync_cmd, sync_vpe->vpe_id);
+
+ its_fixup_cmd(sync_cmd);
+}
+
+static BUILD_SINGLE_CMD_FUNC(its_send_single_vcommand, its_cmd_vbuilder_t,
+ struct its_vpe, its_build_vsync_cmd)
+
+static void its_send_int(struct its_device *dev, u32 event_id)
+{
+ struct its_cmd_desc desc;
+
+ desc.its_int_cmd.dev = dev;
+ desc.its_int_cmd.event_id = event_id;
+
+ its_send_single_command(dev->its, its_build_int_cmd, &desc);
+}
+
+static void its_send_clear(struct its_device *dev, u32 event_id)
+{
+ struct its_cmd_desc desc;
+
+ desc.its_clear_cmd.dev = dev;
+ desc.its_clear_cmd.event_id = event_id;
+
+ its_send_single_command(dev->its, its_build_clear_cmd, &desc);
+}
+
+static void its_send_inv(struct its_device *dev, u32 event_id)
+{
+ struct its_cmd_desc desc;
+
+ desc.its_inv_cmd.dev = dev;
+ desc.its_inv_cmd.event_id = event_id;
+
+ its_send_single_command(dev->its, its_build_inv_cmd, &desc);
+}
+
+static void its_send_mapd(struct its_device *dev, int valid)
+{
+ struct its_cmd_desc desc;
+
+ desc.its_mapd_cmd.dev = dev;
+ desc.its_mapd_cmd.valid = !!valid;
+
+ its_send_single_command(dev->its, its_build_mapd_cmd, &desc);
+}
+
+static void its_send_mapc(struct its_node *its, struct its_collection *col,
+ int valid)
+{
+ struct its_cmd_desc desc;
+
+ desc.its_mapc_cmd.col = col;
+ desc.its_mapc_cmd.valid = !!valid;
+
+ its_send_single_command(its, its_build_mapc_cmd, &desc);
+}
+
+static void its_send_mapti(struct its_device *dev, u32 irq_id, u32 id)
+{
+ struct its_cmd_desc desc;
+
+ desc.its_mapti_cmd.dev = dev;
+ desc.its_mapti_cmd.phys_id = irq_id;
+ desc.its_mapti_cmd.event_id = id;
+
+ its_send_single_command(dev->its, its_build_mapti_cmd, &desc);
+}
+
+static void its_send_movi(struct its_device *dev,
+ struct its_collection *col, u32 id)
+{
+ struct its_cmd_desc desc;
+
+ desc.its_movi_cmd.dev = dev;
+ desc.its_movi_cmd.col = col;
+ desc.its_movi_cmd.event_id = id;
+
+ its_send_single_command(dev->its, its_build_movi_cmd, &desc);
+}
+
+static void its_send_discard(struct its_device *dev, u32 id)
+{
+ struct its_cmd_desc desc;
+
+ desc.its_discard_cmd.dev = dev;
+ desc.its_discard_cmd.event_id = id;
+
+ its_send_single_command(dev->its, its_build_discard_cmd, &desc);
+}
+
+static void its_send_invall(struct its_node *its, struct its_collection *col)
+{
+ struct its_cmd_desc desc;
+
+ desc.its_invall_cmd.col = col;
+
+ its_send_single_command(its, its_build_invall_cmd, &desc);
+}
+
+static void its_send_vmapti(struct its_device *dev, u32 id)
+{
+ struct its_vlpi_map *map = dev_event_to_vlpi_map(dev, id);
+ struct its_cmd_desc desc;
+
+ desc.its_vmapti_cmd.vpe = map->vpe;
+ desc.its_vmapti_cmd.dev = dev;
+ desc.its_vmapti_cmd.virt_id = map->vintid;
+ desc.its_vmapti_cmd.event_id = id;
+ desc.its_vmapti_cmd.db_enabled = map->db_enabled;
+
+ its_send_single_vcommand(dev->its, its_build_vmapti_cmd, &desc);
+}
+
+static void its_send_vmovi(struct its_device *dev, u32 id)
+{
+ struct its_vlpi_map *map = dev_event_to_vlpi_map(dev, id);
+ struct its_cmd_desc desc;
+
+ desc.its_vmovi_cmd.vpe = map->vpe;
+ desc.its_vmovi_cmd.dev = dev;
+ desc.its_vmovi_cmd.event_id = id;
+ desc.its_vmovi_cmd.db_enabled = map->db_enabled;
+
+ its_send_single_vcommand(dev->its, its_build_vmovi_cmd, &desc);
+}
+
+static void its_send_vmapp(struct its_node *its,
+ struct its_vpe *vpe, bool valid)
+{
+ struct its_cmd_desc desc;
+
+ desc.its_vmapp_cmd.vpe = vpe;
+ desc.its_vmapp_cmd.valid = valid;
+ desc.its_vmapp_cmd.col = &its->collections[vpe->col_idx];
+
+ its_send_single_vcommand(its, its_build_vmapp_cmd, &desc);
+}
+
+static void its_send_vmovp(struct its_vpe *vpe)
+{
+ struct its_cmd_desc desc = {};
+ struct its_node *its;
+ unsigned long flags;
+ int col_id = vpe->col_idx;
+
+ desc.its_vmovp_cmd.vpe = vpe;
+
+ if (!its_list_map) {
+ its = list_first_entry(&its_nodes, struct its_node, entry);
+ desc.its_vmovp_cmd.col = &its->collections[col_id];
+ its_send_single_vcommand(its, its_build_vmovp_cmd, &desc);
+ return;
+ }
+
+ /*
+ * Yet another marvel of the architecture. If using the
+ * its_list "feature", we need to make sure that all ITSs
+ * receive all VMOVP commands in the same order. The only way
+ * to guarantee this is to make vmovp a serialization point.
+ *
+ * Wall <-- Head.
+ */
+ raw_spin_lock_irqsave(&vmovp_lock, flags);
+
+ desc.its_vmovp_cmd.seq_num = vmovp_seq_num++;
+ desc.its_vmovp_cmd.its_list = get_its_list(vpe->its_vm);
+
+ /* Emit VMOVPs */
+ list_for_each_entry(its, &its_nodes, entry) {
+ if (!is_v4(its))
+ continue;
+
+ if (!require_its_list_vmovp(vpe->its_vm, its))
+ continue;
+
+ desc.its_vmovp_cmd.col = &its->collections[col_id];
+ its_send_single_vcommand(its, its_build_vmovp_cmd, &desc);
+ }
+
+ raw_spin_unlock_irqrestore(&vmovp_lock, flags);
+}
+
+static void its_send_vinvall(struct its_node *its, struct its_vpe *vpe)
+{
+ struct its_cmd_desc desc;
+
+ desc.its_vinvall_cmd.vpe = vpe;
+ its_send_single_vcommand(its, its_build_vinvall_cmd, &desc);
+}
+
+static void its_send_vinv(struct its_device *dev, u32 event_id)
+{
+ struct its_cmd_desc desc;
+
+ /*
+ * There is no real VINV command. This is just a normal INV,
+ * with a VSYNC instead of a SYNC.
+ */
+ desc.its_inv_cmd.dev = dev;
+ desc.its_inv_cmd.event_id = event_id;
+
+ its_send_single_vcommand(dev->its, its_build_vinv_cmd, &desc);
+}
+
+static void its_send_vint(struct its_device *dev, u32 event_id)
+{
+ struct its_cmd_desc desc;
+
+ /*
+ * There is no real VINT command. This is just a normal INT,
+ * with a VSYNC instead of a SYNC.
+ */
+ desc.its_int_cmd.dev = dev;
+ desc.its_int_cmd.event_id = event_id;
+
+ its_send_single_vcommand(dev->its, its_build_vint_cmd, &desc);
+}
+
+static void its_send_vclear(struct its_device *dev, u32 event_id)
+{
+ struct its_cmd_desc desc;
+
+ /*
+ * There is no real VCLEAR command. This is just a normal CLEAR,
+ * with a VSYNC instead of a SYNC.
+ */
+ desc.its_clear_cmd.dev = dev;
+ desc.its_clear_cmd.event_id = event_id;
+
+ its_send_single_vcommand(dev->its, its_build_vclear_cmd, &desc);
+}
+
+static void its_send_invdb(struct its_node *its, struct its_vpe *vpe)
+{
+ struct its_cmd_desc desc;
+
+ desc.its_invdb_cmd.vpe = vpe;
+ its_send_single_vcommand(its, its_build_invdb_cmd, &desc);
+}
+
+/*
+ * irqchip functions - assumes MSI, mostly.
+ */
+static void lpi_write_config(struct irq_data *d, u8 clr, u8 set)
+{
+ struct its_vlpi_map *map = get_vlpi_map(d);
+ irq_hw_number_t hwirq;
+ void *va;
+ u8 *cfg;
+
+ if (map) {
+ va = page_address(map->vm->vprop_page);
+ hwirq = map->vintid;
+
+ /* Remember the updated property */
+ map->properties &= ~clr;
+ map->properties |= set | LPI_PROP_GROUP1;
+ } else {
+ va = gic_rdists->prop_table_va;
+ hwirq = d->hwirq;
+ }
+
+ cfg = va + hwirq - 8192;
+ *cfg &= ~clr;
+ *cfg |= set | LPI_PROP_GROUP1;
+
+ /*
+ * Make the above write visible to the redistributors.
+ * And yes, we're flushing exactly: One. Single. Byte.
+ * Humpf...
+ */
+ if (gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING)
+ gic_flush_dcache_to_poc(cfg, sizeof(*cfg));
+ else
+ dsb(ishst);
+}
+
+static void wait_for_syncr(void __iomem *rdbase)
+{
+ while (readl_relaxed(rdbase + GICR_SYNCR) & 1)
+ cpu_relax();
+}
+
+static void direct_lpi_inv(struct irq_data *d)
+{
+ struct its_vlpi_map *map = get_vlpi_map(d);
+ void __iomem *rdbase;
+ unsigned long flags;
+ u64 val;
+ int cpu;
+
+ if (map) {
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+
+ WARN_ON(!is_v4_1(its_dev->its));
+
+ val = GICR_INVLPIR_V;
+ val |= FIELD_PREP(GICR_INVLPIR_VPEID, map->vpe->vpe_id);
+ val |= FIELD_PREP(GICR_INVLPIR_INTID, map->vintid);
+ } else {
+ val = d->hwirq;
+ }
+
+ /* Target the redistributor this LPI is currently routed to */
+ cpu = irq_to_cpuid_lock(d, &flags);
+ raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
+ rdbase = per_cpu_ptr(gic_rdists->rdist, cpu)->rd_base;
+ gic_write_lpir(val, rdbase + GICR_INVLPIR);
+
+ wait_for_syncr(rdbase);
+ raw_spin_unlock(&gic_data_rdist_cpu(cpu)->rd_lock);
+ irq_to_cpuid_unlock(d, flags);
+}
+
+static void lpi_update_config(struct irq_data *d, u8 clr, u8 set)
+{
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+
+ lpi_write_config(d, clr, set);
+ if (gic_rdists->has_direct_lpi &&
+ (is_v4_1(its_dev->its) || !irqd_is_forwarded_to_vcpu(d)))
+ direct_lpi_inv(d);
+ else if (!irqd_is_forwarded_to_vcpu(d))
+ its_send_inv(its_dev, its_get_event_id(d));
+ else
+ its_send_vinv(its_dev, its_get_event_id(d));
+}
+
+static void its_vlpi_set_doorbell(struct irq_data *d, bool enable)
+{
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ u32 event = its_get_event_id(d);
+ struct its_vlpi_map *map;
+
+ /*
+ * GICv4.1 does away with the per-LPI nonsense, nothing to do
+ * here.
+ */
+ if (is_v4_1(its_dev->its))
+ return;
+
+ map = dev_event_to_vlpi_map(its_dev, event);
+
+ if (map->db_enabled == enable)
+ return;
+
+ map->db_enabled = enable;
+
+ /*
+ * More fun with the architecture:
+ *
+ * Ideally, we'd issue a VMAPTI to set the doorbell to its LPI
+ * value or to 1023, depending on the enable bit. But that
+ * would be issueing a mapping for an /existing/ DevID+EventID
+ * pair, which is UNPREDICTABLE. Instead, let's issue a VMOVI
+ * to the /same/ vPE, using this opportunity to adjust the
+ * doorbell. Mouahahahaha. We loves it, Precious.
+ */
+ its_send_vmovi(its_dev, event);
+}
+
+static void its_mask_irq(struct irq_data *d)
+{
+ if (irqd_is_forwarded_to_vcpu(d))
+ its_vlpi_set_doorbell(d, false);
+
+ lpi_update_config(d, LPI_PROP_ENABLED, 0);
+}
+
+static void its_unmask_irq(struct irq_data *d)
+{
+ if (irqd_is_forwarded_to_vcpu(d))
+ its_vlpi_set_doorbell(d, true);
+
+ lpi_update_config(d, 0, LPI_PROP_ENABLED);
+}
+
+static __maybe_unused u32 its_read_lpi_count(struct irq_data *d, int cpu)
+{
+ if (irqd_affinity_is_managed(d))
+ return atomic_read(&per_cpu_ptr(&cpu_lpi_count_ft2500, cpu)->managed);
+
+ return atomic_read(&per_cpu_ptr(&cpu_lpi_count_ft2500, cpu)->unmanaged);
+}
+
+static void its_inc_lpi_count(struct irq_data *d, int cpu)
+{
+ if (irqd_affinity_is_managed(d))
+ atomic_inc(&per_cpu_ptr(&cpu_lpi_count_ft2500, cpu)->managed);
+ else
+ atomic_inc(&per_cpu_ptr(&cpu_lpi_count_ft2500, cpu)->unmanaged);
+}
+
+static void its_dec_lpi_count(struct irq_data *d, int cpu)
+{
+ if (irqd_affinity_is_managed(d))
+ atomic_dec(&per_cpu_ptr(&cpu_lpi_count_ft2500, cpu)->managed);
+ else
+ atomic_dec(&per_cpu_ptr(&cpu_lpi_count_ft2500, cpu)->unmanaged);
+}
+
+static unsigned int cpumask_pick_least_loaded(struct irq_data *d,
+ const struct cpumask *cpu_mask)
+{
+ unsigned int cpu = nr_cpu_ids, tmp;
+ int count = S32_MAX;
+
+ for_each_cpu(tmp, cpu_mask) {
+ int this_count = its_read_lpi_count(d, tmp);
+ if (this_count < count) {
+ cpu = tmp;
+ count = this_count;
+ }
+ }
+
+ return cpu;
+}
+
+/*
+ * As suggested by Thomas Gleixner in:
+ * https://lore.kernel.org/r/87h80q2aoc.fsf@nanos.tec.linutronix.de
+ */
+static int its_select_cpu(struct irq_data *d,
+ const struct cpumask *aff_mask)
+{
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ cpumask_var_t tmpmask;
+ int cpu, node;
+
+ if (!alloc_cpumask_var(&tmpmask, GFP_ATOMIC))
+ return -ENOMEM;
+
+ node = its_dev->its->numa_node;
+
+ if (!irqd_affinity_is_managed(d)) {
+ /* First try the NUMA node */
+ if (node != NUMA_NO_NODE) {
+ /*
+ * Try the intersection of the affinity mask and the
+ * node mask (and the online mask, just to be safe).
+ */
+ cpumask_and(tmpmask, cpumask_of_node(node), aff_mask);
+ cpumask_and(tmpmask, tmpmask, cpu_online_mask);
+
+ /*
+ * Ideally, we would check if the mask is empty, and
+ * try again on the full node here.
+ *
+ * But it turns out that the way ACPI describes the
+ * affinity for ITSs only deals about memory, and
+ * not target CPUs, so it cannot describe a single
+ * ITS placed next to two NUMA nodes.
+ *
+ * Instead, just fallback on the online mask. This
+ * diverges from Thomas' suggestion above.
+ */
+ cpu = cpumask_pick_least_loaded(d, tmpmask);
+ if (cpu < nr_cpu_ids)
+ goto out;
+
+ /* If we can't cross sockets, give up */
+ if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144))
+ goto out;
+
+ /* If the above failed, expand the search */
+ }
+
+ /* Try the intersection of the affinity and online masks */
+ cpumask_and(tmpmask, aff_mask, cpu_online_mask);
+
+ /* If that doesn't fly, the online mask is the last resort */
+ if (cpumask_empty(tmpmask))
+ cpumask_copy(tmpmask, cpu_online_mask);
+
+ cpu = cpumask_pick_least_loaded(d, tmpmask);
+ } else {
+ cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
+
+ /* If we cannot cross sockets, limit the search to that node */
+ if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&
+ node != NUMA_NO_NODE)
+ cpumask_and(tmpmask, tmpmask, cpumask_of_node(node));
+
+ cpu = cpumask_pick_least_loaded(d, tmpmask);
+ }
+out:
+ free_cpumask_var(tmpmask);
+
+ pr_debug("IRQ%d -> %*pbl CPU%d\n", d->irq, cpumask_pr_args(aff_mask), cpu);
+ return cpu;
+}
+
+#define MAX_MARS3_SKT_COUNT 8
+
+static int its_cpumask_select(struct its_device *its_dev,
+ const struct cpumask *mask_val,
+ const struct cpumask *cpu_mask)
+{
+ unsigned int skt, skt_id, i;
+ phys_addr_t its_phys_base;
+ unsigned int cpu, cpus = 0;
+
+ unsigned int skt_cpu_cnt[MAX_MARS3_SKT_COUNT] = {0};
+
+ for (i = 0; i < nr_cpu_ids; i++) {
+ skt = (cpu_logical_map(i) >> 16) & 0xff;
+ if ((skt >= 0) && (skt < MAX_MARS3_SKT_COUNT)) {
+ skt_cpu_cnt[skt]++;
+ } else if (0xff != skt) {
+ pr_err("socket address: %d is out of range.", skt);
+ }
+ }
+
+ its_phys_base = its_dev->its->phys_base;
+ skt_id = (its_phys_base >> 41) & 0x7;
+
+ if (0 != skt_id) {
+ for (i = 0; i < skt_id; i++) {
+ cpus += skt_cpu_cnt[i];
+ }
+ }
+
+ cpu = cpumask_any_and(mask_val, cpu_mask);
+ if ((cpu > cpus) && (cpu < (cpus + skt_cpu_cnt[skt_id]))) {
+ cpus = cpu;
+ }
+
+ return cpus;
+}
+
+static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
+ bool force)
+{
+ unsigned int cpu;
+ const struct cpumask *cpu_mask = cpu_online_mask;
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ struct its_collection *target_col;
+ u32 id = its_get_event_id(d);
+ int prev_cpu;
+ unsigned int skt_t1, skt_t2, cpu_idx;
+
+ /* A forwarded interrupt should use irq_set_vcpu_affinity */
+ if (irqd_is_forwarded_to_vcpu(d))
+ return -EINVAL;
+
+ prev_cpu = its_dev->event_map.col_map[id];
+ its_dec_lpi_count(d, prev_cpu);
+
+ cpu_idx = its_cpumask_select(its_dev, mask_val, cpu_mask);
+ skt_t1 = (cpu_logical_map(cpu_idx) >> 16) & 0xff;
+ if (!force)
+ cpu = its_select_cpu(d, mask_val);
+ else
+ cpu = cpumask_pick_least_loaded(d, mask_val);
+ skt_t2 = (cpu_logical_map(cpu) >> 16) & 0xff;
+ if (skt_t1 != skt_t2)
+ cpu = cpu_idx;
+
+ if (cpu < 0 || cpu >= nr_cpu_ids)
+ goto err;
+
+ /* don't set the affinity when the target cpu is same as current one */
+ if (cpu != prev_cpu) {
+ target_col = &its_dev->its->collections[cpu];
+ its_send_movi(its_dev, target_col, id);
+ its_dev->event_map.col_map[id] = cpu;
+ irq_data_update_effective_affinity(d, cpumask_of(cpu));
+ }
+
+ its_inc_lpi_count(d, cpu);
+
+ return IRQ_SET_MASK_OK_DONE;
+
+err:
+ its_inc_lpi_count(d, prev_cpu);
+ return -EINVAL;
+}
+
+static u64 its_irq_get_msi_base(struct its_device *its_dev)
+{
+ struct its_node *its = its_dev->its;
+
+ return its->phys_base + GITS_TRANSLATER;
+}
+
+static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg)
+{
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ struct its_node *its;
+ u64 addr;
+
+ its = its_dev->its;
+ addr = its->get_msi_base(its_dev);
+
+ msg->address_lo = lower_32_bits(addr);
+ msg->address_hi = upper_32_bits(addr);
+ msg->data = its_get_event_id(d);
+}
+
+static int its_irq_set_irqchip_state(struct irq_data *d,
+ enum irqchip_irq_state which,
+ bool state)
+{
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ u32 event = its_get_event_id(d);
+
+ if (which != IRQCHIP_STATE_PENDING)
+ return -EINVAL;
+
+ if (irqd_is_forwarded_to_vcpu(d)) {
+ if (state)
+ its_send_vint(its_dev, event);
+ else
+ its_send_vclear(its_dev, event);
+ } else {
+ if (state)
+ its_send_int(its_dev, event);
+ else
+ its_send_clear(its_dev, event);
+ }
+
+ return 0;
+}
+
+static int its_irq_retrigger(struct irq_data *d)
+{
+ return !its_irq_set_irqchip_state(d, IRQCHIP_STATE_PENDING, true);
+}
+
+/*
+ * Two favourable cases:
+ *
+ * (a) Either we have a GICv4.1, and all vPEs have to be mapped at all times
+ * for vSGI delivery
+ *
+ * (b) Or the ITSs do not use a list map, meaning that VMOVP is cheap enough
+ * and we're better off mapping all VPEs always
+ *
+ * If neither (a) nor (b) is true, then we map vPEs on demand.
+ *
+ */
+static bool gic_requires_eager_mapping(void)
+{
+ if (!its_list_map || gic_rdists->has_rvpeid)
+ return true;
+
+ return false;
+}
+
+static void its_map_vm(struct its_node *its, struct its_vm *vm)
+{
+ unsigned long flags;
+
+ if (gic_requires_eager_mapping())
+ return;
+
+ raw_spin_lock_irqsave(&vmovp_lock, flags);
+
+ /*
+ * If the VM wasn't mapped yet, iterate over the vpes and get
+ * them mapped now.
+ */
+ vm->vlpi_count[its->list_nr]++;
+
+ if (vm->vlpi_count[its->list_nr] == 1) {
+ int i;
+
+ for (i = 0; i < vm->nr_vpes; i++) {
+ struct its_vpe *vpe = vm->vpes[i];
+ struct irq_data *d = irq_get_irq_data(vpe->irq);
+
+ /* Map the VPE to the first possible CPU */
+ vpe->col_idx = cpumask_first(cpu_online_mask);
+ its_send_vmapp(its, vpe, true);
+ its_send_vinvall(its, vpe);
+ irq_data_update_effective_affinity(d, cpumask_of(vpe->col_idx));
+ }
+ }
+
+ raw_spin_unlock_irqrestore(&vmovp_lock, flags);
+}
+
+static void its_unmap_vm(struct its_node *its, struct its_vm *vm)
+{
+ unsigned long flags;
+
+ /* Not using the ITS list? Everything is always mapped. */
+ if (gic_requires_eager_mapping())
+ return;
+
+ raw_spin_lock_irqsave(&vmovp_lock, flags);
+
+ if (!--vm->vlpi_count[its->list_nr]) {
+ int i;
+
+ for (i = 0; i < vm->nr_vpes; i++)
+ its_send_vmapp(its, vm->vpes[i], false);
+ }
+
+ raw_spin_unlock_irqrestore(&vmovp_lock, flags);
+}
+
+static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
+{
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ u32 event = its_get_event_id(d);
+ int ret = 0;
+
+ if (!info->map)
+ return -EINVAL;
+
+ raw_spin_lock(&its_dev->event_map.vlpi_lock);
+
+ if (!its_dev->event_map.vm) {
+ struct its_vlpi_map *maps;
+
+ maps = kcalloc(its_dev->event_map.nr_lpis, sizeof(*maps),
+ GFP_ATOMIC);
+ if (!maps) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ its_dev->event_map.vm = info->map->vm;
+ its_dev->event_map.vlpi_maps = maps;
+ } else if (its_dev->event_map.vm != info->map->vm) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* Get our private copy of the mapping information */
+ its_dev->event_map.vlpi_maps[event] = *info->map;
+
+ if (irqd_is_forwarded_to_vcpu(d)) {
+ /* Already mapped, move it around */
+ its_send_vmovi(its_dev, event);
+ } else {
+ /* Ensure all the VPEs are mapped on this ITS */
+ its_map_vm(its_dev->its, info->map->vm);
+
+ /*
+ * Flag the interrupt as forwarded so that we can
+ * start poking the virtual property table.
+ */
+ irqd_set_forwarded_to_vcpu(d);
+
+ /* Write out the property to the prop table */
+ lpi_write_config(d, 0xff, info->map->properties);
+
+ /* Drop the physical mapping */
+ its_send_discard(its_dev, event);
+
+ /* and install the virtual one */
+ its_send_vmapti(its_dev, event);
+
+ /* Increment the number of VLPIs */
+ its_dev->event_map.nr_vlpis++;
+ }
+
+out:
+ raw_spin_unlock(&its_dev->event_map.vlpi_lock);
+ return ret;
+}
+
+static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
+{
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ struct its_vlpi_map *map;
+ int ret = 0;
+
+ raw_spin_lock(&its_dev->event_map.vlpi_lock);
+
+ map = get_vlpi_map(d);
+
+ if (!its_dev->event_map.vm || !map) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* Copy our mapping information to the incoming request */
+ *info->map = *map;
+
+out:
+ raw_spin_unlock(&its_dev->event_map.vlpi_lock);
+ return ret;
+}
+
+static int its_vlpi_unmap(struct irq_data *d)
+{
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ u32 event = its_get_event_id(d);
+ int ret = 0;
+
+ raw_spin_lock(&its_dev->event_map.vlpi_lock);
+
+ if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* Drop the virtual mapping */
+ its_send_discard(its_dev, event);
+
+ /* and restore the physical one */
+ irqd_clr_forwarded_to_vcpu(d);
+ its_send_mapti(its_dev, d->hwirq, event);
+ lpi_update_config(d, 0xff, (LPI_PROP_DEFAULT_PRIO |
+ LPI_PROP_ENABLED |
+ LPI_PROP_GROUP1));
+
+ /* Potentially unmap the VM from this ITS */
+ its_unmap_vm(its_dev->its, its_dev->event_map.vm);
+
+ /*
+ * Drop the refcount and make the device available again if
+ * this was the last VLPI.
+ */
+ if (!--its_dev->event_map.nr_vlpis) {
+ its_dev->event_map.vm = NULL;
+ kfree(its_dev->event_map.vlpi_maps);
+ }
+
+out:
+ raw_spin_unlock(&its_dev->event_map.vlpi_lock);
+ return ret;
+}
+
+static int its_vlpi_prop_update(struct irq_data *d, struct its_cmd_info *info)
+{
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+
+ if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d))
+ return -EINVAL;
+
+ if (info->cmd_type == PROP_UPDATE_AND_INV_VLPI)
+ lpi_update_config(d, 0xff, info->config);
+ else
+ lpi_write_config(d, 0xff, info->config);
+ its_vlpi_set_doorbell(d, !!(info->config & LPI_PROP_ENABLED));
+
+ return 0;
+}
+
+static int its_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu_info)
+{
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ struct its_cmd_info *info = vcpu_info;
+
+ /* Need a v4 ITS */
+ if (!is_v4(its_dev->its))
+ return -EINVAL;
+
+ /* Unmap request? */
+ if (!info)
+ return its_vlpi_unmap(d);
+
+ switch (info->cmd_type) {
+ case MAP_VLPI:
+ return its_vlpi_map(d, info);
+
+ case GET_VLPI:
+ return its_vlpi_get(d, info);
+
+ case PROP_UPDATE_VLPI:
+ case PROP_UPDATE_AND_INV_VLPI:
+ return its_vlpi_prop_update(d, info);
+
+ default:
+ return -EINVAL;
+ }
+}
+
+static struct irq_chip its_irq_chip = {
+ .name = "ITS",
+ .irq_mask = its_mask_irq,
+ .irq_unmask = its_unmask_irq,
+ .irq_eoi = irq_chip_eoi_parent,
+ .irq_set_affinity = its_set_affinity,
+ .irq_compose_msi_msg = its_irq_compose_msi_msg,
+ .irq_set_irqchip_state = its_irq_set_irqchip_state,
+ .irq_retrigger = its_irq_retrigger,
+ .irq_set_vcpu_affinity = its_irq_set_vcpu_affinity,
+};
+
+
+/*
+ * How we allocate LPIs:
+ *
+ * lpi_range_list contains ranges of LPIs that are available to
+ * allocate from. To allocate LPIs, just pick the first range that
+ * fits the required allocation, and reduce it by the required
+ * amount. Once empty, remove the range from the list.
+ *
+ * To free a range of LPIs, add a free range to the list, sort it and
+ * merge the result if the new range happens to be adjacent to an
+ * already free block.
+ *
+ * The consequence of the above is that allocation cost is low, but
+ * freeing is expensive. We assume that freeing rarely occurs.
+ */
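+/*
+ * Worked example (hypothetical numbers): starting from a single free
+ * range beginning at 8192, alloc_lpi_range(32, &base) returns
+ * base = 8192 and shrinks the free range to start at 8224; a later
+ * free_lpi_range(8192, 32) re-inserts that block and merges it back
+ * with its neighbour into one contiguous range.
+ */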
+#define ITS_MAX_LPI_NRBITS 16 /* 64K LPIs */
+
+static DEFINE_MUTEX(lpi_range_lock);
+static LIST_HEAD(lpi_range_list);
+
+struct lpi_range {
+ struct list_head entry;
+ u32 base_id;
+ u32 span;
+};
+
+static struct lpi_range *mk_lpi_range(u32 base, u32 span)
+{
+ struct lpi_range *range;
+
+ range = kmalloc(sizeof(*range), GFP_KERNEL);
+ if (range) {
+ range->base_id = base;
+ range->span = span;
+ }
+
+ return range;
+}
+
+static int alloc_lpi_range(u32 nr_lpis, u32 *base)
+{
+ struct lpi_range *range, *tmp;
+ int err = -ENOSPC;
+
+ mutex_lock(&lpi_range_lock);
+
+ list_for_each_entry_safe(range, tmp, &lpi_range_list, entry) {
+ if (range->span >= nr_lpis) {
+ *base = range->base_id;
+ range->base_id += nr_lpis;
+ range->span -= nr_lpis;
+
+ if (range->span == 0) {
+ list_del(&range->entry);
+ kfree(range);
+ }
+
+ err = 0;
+ break;
+ }
+ }
+
+ mutex_unlock(&lpi_range_lock);
+
+ pr_debug("ITS: alloc %u:%u\n", *base, nr_lpis);
+ return err;
+}
+
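+/* If range 'a' immediately precedes range 'b', fold 'a' into 'b' and free it */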
+static void merge_lpi_ranges(struct lpi_range *a, struct lpi_range *b)
+{
+ if (&a->entry == &lpi_range_list || &b->entry == &lpi_range_list)
+ return;
+ if (a->base_id + a->span != b->base_id)
+ return;
+ b->base_id = a->base_id;
+ b->span += a->span;
+ list_del(&a->entry);
+ kfree(a);
+}
+
+static int free_lpi_range(u32 base, u32 nr_lpis)
+{
+ struct lpi_range *new, *old;
+
+ new = mk_lpi_range(base, nr_lpis);
+ if (!new)
+ return -ENOMEM;
+
+ mutex_lock(&lpi_range_lock);
+
+ list_for_each_entry_reverse(old, &lpi_range_list, entry) {
+ if (old->base_id < base)
+ break;
+ }
+ /*
+ * old is the last element with ->base_id smaller than base,
+ * so new goes right after it. If there are no elements with
+ * ->base_id smaller than base, &old->entry ends up pointing
+	 * at the head of the list, and inserting new at the start of
+ * the list is the right thing to do in that case as well.
+ */
+ list_add(&new->entry, &old->entry);
+ /*
+ * Now check if we can merge with the preceding and/or
+ * following ranges.
+ */
+ merge_lpi_ranges(old, new);
+ merge_lpi_ranges(new, list_next_entry(new, entry));
+
+ mutex_unlock(&lpi_range_lock);
+ return 0;
+}
+
+static int __init its_lpi_init(u32 id_bits)
+{
+ u32 lpis = (1UL << id_bits) - 8192;
+ u32 numlpis;
+ int err;
+
+ numlpis = 1UL << GICD_TYPER_NUM_LPIS(gic_rdists->gicd_typer);
+
+ if (numlpis > 2 && !WARN_ON(numlpis > lpis)) {
+ lpis = numlpis;
+ pr_info("ITS: Using hypervisor restricted LPI range [%u]\n",
+ lpis);
+ }
+
+ /*
+ * Initializing the allocator is just the same as freeing the
+ * full range of LPIs.
+ */
+ err = free_lpi_range(8192, lpis);
+ pr_debug("ITS: Allocator initialized for %u LPIs\n", lpis);
+ return err;
+}
+
+static unsigned long *its_lpi_alloc(int nr_irqs, u32 *base, int *nr_ids)
+{
+ unsigned long *bitmap = NULL;
+ int err = 0;
+
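+	/*
+	 * If the full request cannot be satisfied, keep halving it until
+	 * a large enough free range is found (or we run out of LPIs).
+	 */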
+ do {
+ err = alloc_lpi_range(nr_irqs, base);
+ if (!err)
+ break;
+
+ nr_irqs /= 2;
+ } while (nr_irqs > 0);
+
+ if (!nr_irqs)
+ err = -ENOSPC;
+
+ if (err)
+ goto out;
+
+ bitmap = kcalloc(BITS_TO_LONGS(nr_irqs), sizeof (long), GFP_ATOMIC);
+ if (!bitmap)
+ goto out;
+
+ *nr_ids = nr_irqs;
+
+out:
+ if (!bitmap)
+ *base = *nr_ids = 0;
+
+ return bitmap;
+}
+
+static void its_lpi_free(unsigned long *bitmap, u32 base, u32 nr_ids)
+{
+ WARN_ON(free_lpi_range(base, nr_ids));
+ kfree(bitmap);
+}
+
+static void gic_reset_prop_table(void *va)
+{
+ /* Priority 0xa0, Group-1, disabled */
+ memset(va, LPI_PROP_DEFAULT_PRIO | LPI_PROP_GROUP1, LPI_PROPBASE_SZ);
+
+ /* Make sure the GIC will observe the written configuration */
+ gic_flush_dcache_to_poc(va, LPI_PROPBASE_SZ);
+}
+
+static struct page *its_allocate_prop_table(gfp_t gfp_flags)
+{
+ struct page *prop_page;
+
+ prop_page = alloc_pages(gfp_flags, get_order(LPI_PROPBASE_SZ));
+ if (!prop_page)
+ return NULL;
+
+ gic_reset_prop_table(page_address(prop_page));
+
+ return prop_page;
+}
+
+static void its_free_prop_table(struct page *prop_page)
+{
+ free_pages((unsigned long)page_address(prop_page),
+ get_order(LPI_PROPBASE_SZ));
+}
+
+static bool gic_check_reserved_range(phys_addr_t addr, unsigned long size)
+{
+ phys_addr_t start, end, addr_end;
+ u64 i;
+
+ /*
+ * We don't bother checking for a kdump kernel as by
+ * construction, the LPI tables are out of this kernel's
+ * memory map.
+ */
+ if (is_kdump_kernel())
+ return true;
+
+ addr_end = addr + size - 1;
+
+ for_each_reserved_mem_range(i, &start, &end) {
+ if (addr >= start && addr_end <= end)
+ return true;
+ }
+
+ /* Not found, not a good sign... */
+ pr_warn("GIC-2500: Expected reserved range [%pa:%pa], not found\n",
+ &addr, &addr_end);
+ add_taint(TAINT_CRAP, LOCKDEP_STILL_OK);
+ return false;
+}
+
+static int gic_reserve_range(phys_addr_t addr, unsigned long size)
+{
+ if (efi_enabled(EFI_CONFIG_TABLES))
+ return efi_mem_reserve_persistent(addr, size);
+
+ return 0;
+}
+
+static int __init its_setup_lpi_prop_table(void)
+{
+ if (gic_rdists->flags & RDIST_FLAGS_RD_TABLES_PREALLOCATED) {
+ u64 val;
+
+ val = gicr_read_propbaser(gic_data_rdist_rd_base() + GICR_PROPBASER);
+ lpi_id_bits = (val & GICR_PROPBASER_IDBITS_MASK) + 1;
+
+ gic_rdists->prop_table_pa = val & GENMASK_ULL(51, 12);
+ gic_rdists->prop_table_va = memremap(gic_rdists->prop_table_pa,
+ LPI_PROPBASE_SZ,
+ MEMREMAP_WB);
+ gic_reset_prop_table(gic_rdists->prop_table_va);
+ } else {
+ struct page *page;
+
+ lpi_id_bits = min_t(u32,
+ GICD_TYPER_ID_BITS(gic_rdists->gicd_typer),
+ ITS_MAX_LPI_NRBITS);
+ page = its_allocate_prop_table(GFP_NOWAIT);
+ if (!page) {
+ pr_err("Failed to allocate PROPBASE\n");
+ return -ENOMEM;
+ }
+
+ gic_rdists->prop_table_pa = page_to_phys(page);
+ gic_rdists->prop_table_va = page_address(page);
+ WARN_ON(gic_reserve_range(gic_rdists->prop_table_pa,
+ LPI_PROPBASE_SZ));
+ }
+
+ pr_info("GIC-2500: using LPI property table @%pa\n",
+ &gic_rdists->prop_table_pa);
+
+ return its_lpi_init(lpi_id_bits);
+}
+
+static const char *its_base_type_string[] = {
+ [GITS_BASER_TYPE_DEVICE] = "Devices",
+ [GITS_BASER_TYPE_VCPU] = "Virtual CPUs",
+ [GITS_BASER_TYPE_RESERVED3] = "Reserved (3)",
+ [GITS_BASER_TYPE_COLLECTION] = "Interrupt Collections",
+ [GITS_BASER_TYPE_RESERVED5] = "Reserved (5)",
+ [GITS_BASER_TYPE_RESERVED6] = "Reserved (6)",
+ [GITS_BASER_TYPE_RESERVED7] = "Reserved (7)",
+};
+
+static u64 its_read_baser(struct its_node *its, struct its_baser *baser)
+{
+ u32 idx = baser - its->tables;
+
+ return gits_read_baser(its->base + GITS_BASER + (idx << 3));
+}
+
+static void its_write_baser(struct its_node *its, struct its_baser *baser,
+ u64 val)
+{
+ u32 idx = baser - its->tables;
+
+ gits_write_baser(val, its->base + GITS_BASER + (idx << 3));
+ baser->val = its_read_baser(its, baser);
+}
+
+static int its_setup_baser(struct its_node *its, struct its_baser *baser,
+ u64 cache, u64 shr, u32 order, bool indirect)
+{
+ u64 val = its_read_baser(its, baser);
+ u64 esz = GITS_BASER_ENTRY_SIZE(val);
+ u64 type = GITS_BASER_TYPE(val);
+ u64 baser_phys, tmp;
+ u32 alloc_pages, psz;
+ struct page *page;
+ void *base;
+
+ psz = baser->psz;
+ alloc_pages = (PAGE_ORDER_TO_SIZE(order) / psz);
+ if (alloc_pages > GITS_BASER_PAGES_MAX) {
+ pr_warn("ITS@%pa: %s too large, reduce ITS pages %u->%u\n",
+ &its->phys_base, its_base_type_string[type],
+ alloc_pages, GITS_BASER_PAGES_MAX);
+ alloc_pages = GITS_BASER_PAGES_MAX;
+ order = get_order(GITS_BASER_PAGES_MAX * psz);
+ }
+
+ page = alloc_pages_node(its->numa_node, GFP_KERNEL | __GFP_ZERO, order);
+ if (!page)
+ return -ENOMEM;
+
+ base = (void *)page_address(page);
+ baser_phys = virt_to_phys(base);
+
+ /* Check if the physical address of the memory is above 48bits */
+ if (IS_ENABLED(CONFIG_ARM64_64K_PAGES) && (baser_phys >> 48)) {
+
+ /* 52bit PA is supported only when PageSize=64K */
+ if (psz != SZ_64K) {
+ pr_err("ITS: no 52bit PA support when psz=%d\n", psz);
+ free_pages((unsigned long)base, order);
+ return -ENXIO;
+ }
+
+ /* Convert 52bit PA to 48bit field */
+ baser_phys = GITS_BASER_PHYS_52_to_48(baser_phys);
+ }
+
+retry_baser:
+ val = (baser_phys |
+ (type << GITS_BASER_TYPE_SHIFT) |
+ ((esz - 1) << GITS_BASER_ENTRY_SIZE_SHIFT) |
+ ((alloc_pages - 1) << GITS_BASER_PAGES_SHIFT) |
+ cache |
+ shr |
+ GITS_BASER_VALID);
+
+ val |= indirect ? GITS_BASER_INDIRECT : 0x0;
+
+ switch (psz) {
+ case SZ_4K:
+ val |= GITS_BASER_PAGE_SIZE_4K;
+ break;
+ case SZ_16K:
+ val |= GITS_BASER_PAGE_SIZE_16K;
+ break;
+ case SZ_64K:
+ val |= GITS_BASER_PAGE_SIZE_64K;
+ break;
+ }
+
+ its_write_baser(its, baser, val);
+ tmp = baser->val;
+
+ if ((val ^ tmp) & GITS_BASER_SHAREABILITY_MASK) {
+ /*
+ * Shareability didn't stick. Just use
+ * whatever the read reported, which is likely
+ * to be the only thing this redistributor
+ * supports. If that's zero, make it
+ * non-cacheable as well.
+ */
+ shr = tmp & GITS_BASER_SHAREABILITY_MASK;
+ if (!shr) {
+ cache = GITS_BASER_nC;
+ gic_flush_dcache_to_poc(base, PAGE_ORDER_TO_SIZE(order));
+ }
+ goto retry_baser;
+ }
+
+ if (val != tmp) {
+ pr_err("ITS@%pa: %s doesn't stick: %llx %llx\n",
+ &its->phys_base, its_base_type_string[type],
+ val, tmp);
+ free_pages((unsigned long)base, order);
+ return -ENXIO;
+ }
+
+ baser->order = order;
+ baser->base = base;
+ baser->psz = psz;
+ tmp = indirect ? GITS_LVL1_ENTRY_SIZE : esz;
+
+ pr_info("ITS@%pa: allocated %d %s @%lx (%s, esz %d, psz %dK, shr %d)\n",
+ &its->phys_base, (int)(PAGE_ORDER_TO_SIZE(order) / (int)tmp),
+ its_base_type_string[type],
+ (unsigned long)virt_to_phys(base),
+ indirect ? "indirect" : "flat", (int)esz,
+ psz / SZ_1K, (int)shr >> GITS_BASER_SHAREABILITY_SHIFT);
+
+ return 0;
+}
+
+static bool its_parse_indirect_baser(struct its_node *its,
+ struct its_baser *baser,
+ u32 *order, u32 ids)
+{
+ u64 tmp = its_read_baser(its, baser);
+ u64 type = GITS_BASER_TYPE(tmp);
+ u64 esz = GITS_BASER_ENTRY_SIZE(tmp);
+ u64 val = GITS_BASER_InnerShareable | GITS_BASER_RaWaWb;
+ u32 new_order = *order;
+ u32 psz = baser->psz;
+ bool indirect = false;
+
+	/* No need to enable Indirection if memory requirement < (psz * 2) bytes */
+ if ((esz << ids) > (psz * 2)) {
+ /*
+		 * Find out whether hw supports a single or two-level table
+		 * by reading bit at offset '62' after writing '1' to it.
+ */
+ its_write_baser(its, baser, val | GITS_BASER_INDIRECT);
+ indirect = !!(baser->val & GITS_BASER_INDIRECT);
+
+ if (indirect) {
+ /*
+			 * The size of the lvl2 table is equal to the ITS page
+			 * size, which is 'psz'. To compute the lvl1 table
+			 * size, subtract the number of ID bits covered by a
+			 * single lvl2 table from 'ids' (as reported by the
+			 * ITS hardware), and use the lvl1 entry size.
+ */
+ ids -= ilog2(psz / (int)esz);
+ esz = GITS_LVL1_ENTRY_SIZE;
+ }
+ }
+
+ /*
+ * Allocate as many entries as required to fit the
+ * range of device IDs that the ITS can grok... The ID
+ * space being incredibly sparse, this results in a
+	 * massive waste of memory if the two-level device table
+	 * feature is not supported by the hardware.
+ */
+ new_order = max_t(u32, get_order(esz << ids), new_order);
+ if (new_order >= MAX_ORDER) {
+ new_order = MAX_ORDER - 1;
+ ids = ilog2(PAGE_ORDER_TO_SIZE(new_order) / (int)esz);
+ pr_warn("ITS@%pa: %s Table too large, reduce ids %llu->%u\n",
+ &its->phys_base, its_base_type_string[type],
+ device_ids(its), ids);
+ }
+
+ *order = new_order;
+
+ return indirect;
+}
+
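+/*
+ * Reduce a GICR_TYPER-style affinity to the levels covered by the
+ * CommonLPIAff field, so that RDs/ITSs sharing LPI configuration
+ * tables compare as equal.
+ */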
+static u32 compute_common_aff(u64 val)
+{
+ u32 aff, clpiaff;
+
+ aff = FIELD_GET(GICR_TYPER_AFFINITY, val);
+ clpiaff = FIELD_GET(GICR_TYPER_COMMON_LPI_AFF, val);
+
+ return aff & ~(GENMASK(31, 0) >> (clpiaff * 8));
+}
+
+static u32 compute_its_aff(struct its_node *its)
+{
+ u64 val;
+ u32 svpet;
+
+ /*
+ * Reencode the ITS SVPET and MPIDR as a GICR_TYPER, and compute
+	 * the resulting affinity. We then use that to see if it matches
+	 * our own affinity.
+ */
+ svpet = FIELD_GET(GITS_TYPER_SVPET, its->typer);
+ val = FIELD_PREP(GICR_TYPER_COMMON_LPI_AFF, svpet);
+ val |= FIELD_PREP(GICR_TYPER_AFFINITY, its->mpidr);
+ return compute_common_aff(val);
+}
+
+static struct its_node *find_sibling_its(struct its_node *cur_its)
+{
+ struct its_node *its;
+ u32 aff;
+
+ if (!FIELD_GET(GITS_TYPER_SVPET, cur_its->typer))
+ return NULL;
+
+ aff = compute_its_aff(cur_its);
+
+ list_for_each_entry(its, &its_nodes, entry) {
+ u64 baser;
+
+ if (!is_v4_1(its) || its == cur_its)
+ continue;
+
+ if (!FIELD_GET(GITS_TYPER_SVPET, its->typer))
+ continue;
+
+ if (aff != compute_its_aff(its))
+ continue;
+
+ /* GICv4.1 guarantees that the vPE table is GITS_BASER2 */
+ baser = its->tables[2].val;
+ if (!(baser & GITS_BASER_VALID))
+ continue;
+
+ return its;
+ }
+
+ return NULL;
+}
+
+static void its_free_tables(struct its_node *its)
+{
+ int i;
+
+ for (i = 0; i < GITS_BASER_NR_REGS; i++) {
+ if (its->tables[i].base) {
+ free_pages((unsigned long)its->tables[i].base,
+ its->tables[i].order);
+ its->tables[i].base = NULL;
+ }
+ }
+}
+
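+/* Probe the largest ITS page size this BASER accepts: 64K, then 16K, then 4K */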
+static int its_probe_baser_psz(struct its_node *its, struct its_baser *baser)
+{
+ u64 psz = SZ_64K;
+
+ while (psz) {
+ u64 val, gpsz;
+
+ val = its_read_baser(its, baser);
+ val &= ~GITS_BASER_PAGE_SIZE_MASK;
+
+ switch (psz) {
+ case SZ_64K:
+ gpsz = GITS_BASER_PAGE_SIZE_64K;
+ break;
+ case SZ_16K:
+ gpsz = GITS_BASER_PAGE_SIZE_16K;
+ break;
+ case SZ_4K:
+ default:
+ gpsz = GITS_BASER_PAGE_SIZE_4K;
+ break;
+ }
+
+ gpsz >>= GITS_BASER_PAGE_SIZE_SHIFT;
+
+ val |= FIELD_PREP(GITS_BASER_PAGE_SIZE_MASK, gpsz);
+ its_write_baser(its, baser, val);
+
+ if (FIELD_GET(GITS_BASER_PAGE_SIZE_MASK, baser->val) == gpsz)
+ break;
+
+ switch (psz) {
+ case SZ_64K:
+ psz = SZ_16K;
+ break;
+ case SZ_16K:
+ psz = SZ_4K;
+ break;
+ case SZ_4K:
+ default:
+ return -1;
+ }
+ }
+
+ baser->psz = psz;
+ return 0;
+}
+
+static int its_alloc_tables(struct its_node *its)
+{
+ u64 shr = GITS_BASER_InnerShareable;
+ u64 cache = GITS_BASER_RaWaWb;
+ int err, i;
+
+ if (its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_22375)
+ /* erratum 24313: ignore memory access type */
+ cache = GITS_BASER_nCnB;
+
+ for (i = 0; i < GITS_BASER_NR_REGS; i++) {
+ struct its_baser *baser = its->tables + i;
+ u64 val = its_read_baser(its, baser);
+ u64 type = GITS_BASER_TYPE(val);
+ bool indirect = false;
+ u32 order;
+
+ if (type == GITS_BASER_TYPE_NONE)
+ continue;
+
+ if (its_probe_baser_psz(its, baser)) {
+ its_free_tables(its);
+ return -ENXIO;
+ }
+
+ order = get_order(baser->psz);
+
+ switch (type) {
+ case GITS_BASER_TYPE_DEVICE:
+ indirect = its_parse_indirect_baser(its, baser, &order,
+ device_ids(its));
+ break;
+
+ case GITS_BASER_TYPE_VCPU:
+ if (is_v4_1(its)) {
+ struct its_node *sibling;
+
+ WARN_ON(i != 2);
+ if ((sibling = find_sibling_its(its))) {
+ *baser = sibling->tables[2];
+ its_write_baser(its, baser, baser->val);
+ continue;
+ }
+ }
+
+ indirect = its_parse_indirect_baser(its, baser, &order,
+ ITS_MAX_VPEID_BITS);
+ break;
+ }
+
+ err = its_setup_baser(its, baser, cache, shr, order, indirect);
+ if (err < 0) {
+ its_free_tables(its);
+ return err;
+ }
+
+ /* Update settings which will be used for next BASERn */
+ cache = baser->val & GITS_BASER_CACHEABILITY_MASK;
+ shr = baser->val & GITS_BASER_SHAREABILITY_MASK;
+ }
+
+ return 0;
+}
+
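+/*
+ * GICv4.1: try to reuse the vPE table (GITS_BASER2) of an ITS whose
+ * CommonLPIAff affinity matches ours, returning a ready-made
+ * GICR_VPROPBASER value (or 0 if no such ITS exists).
+ */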
+static u64 inherit_vpe_l1_table_from_its(void)
+{
+ struct its_node *its;
+ u64 val;
+ u32 aff;
+
+ val = gic_read_typer(gic_data_rdist_rd_base() + GICR_TYPER);
+ aff = compute_common_aff(val);
+
+ list_for_each_entry(its, &its_nodes, entry) {
+ u64 baser, addr;
+
+ if (!is_v4_1(its))
+ continue;
+
+ if (!FIELD_GET(GITS_TYPER_SVPET, its->typer))
+ continue;
+
+ if (aff != compute_its_aff(its))
+ continue;
+
+ /* GICv4.1 guarantees that the vPE table is GITS_BASER2 */
+ baser = its->tables[2].val;
+ if (!(baser & GITS_BASER_VALID))
+ continue;
+
+ /* We have a winner! */
+ gic_data_rdist()->vpe_l1_base = its->tables[2].base;
+
+ val = GICR_VPROPBASER_4_1_VALID;
+ if (baser & GITS_BASER_INDIRECT)
+ val |= GICR_VPROPBASER_4_1_INDIRECT;
+ val |= FIELD_PREP(GICR_VPROPBASER_4_1_PAGE_SIZE,
+ FIELD_GET(GITS_BASER_PAGE_SIZE_MASK, baser));
+ switch (FIELD_GET(GITS_BASER_PAGE_SIZE_MASK, baser)) {
+ case GIC_PAGE_SIZE_64K:
+ addr = GITS_BASER_ADDR_48_to_52(baser);
+ break;
+ default:
+ addr = baser & GENMASK_ULL(47, 12);
+ break;
+ }
+ val |= FIELD_PREP(GICR_VPROPBASER_4_1_ADDR, addr >> 12);
+ val |= FIELD_PREP(GICR_VPROPBASER_SHAREABILITY_MASK,
+ FIELD_GET(GITS_BASER_SHAREABILITY_MASK, baser));
+ val |= FIELD_PREP(GICR_VPROPBASER_INNER_CACHEABILITY_MASK,
+ FIELD_GET(GITS_BASER_INNER_CACHEABILITY_MASK, baser));
+ val |= FIELD_PREP(GICR_VPROPBASER_4_1_SIZE, GITS_BASER_NR_PAGES(baser) - 1);
+
+ return val;
+ }
+
+ return 0;
+}
+
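+/*
+ * Alternatively, inherit the vPE table from another RD that has
+ * already booted and shares our CommonLPIAff affinity.
+ */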
+static u64 inherit_vpe_l1_table_from_rd(cpumask_t **mask)
+{
+ u32 aff;
+ u64 val;
+ int cpu;
+
+ val = gic_read_typer(gic_data_rdist_rd_base() + GICR_TYPER);
+ aff = compute_common_aff(val);
+
+ for_each_possible_cpu(cpu) {
+ void __iomem *base = gic_data_rdist_cpu(cpu)->rd_base;
+
+ if (!base || cpu == smp_processor_id())
+ continue;
+
+ val = gic_read_typer(base + GICR_TYPER);
+ if (aff != compute_common_aff(val))
+ continue;
+
+ /*
+ * At this point, we have a victim. This particular CPU
+ * has already booted, and has an affinity that matches
+ * ours wrt CommonLPIAff. Let's use its own VPROPBASER.
+ * Make sure we don't write the Z bit in that case.
+ */
+ val = gicr_read_vpropbaser(base + SZ_128K + GICR_VPROPBASER);
+ val &= ~GICR_VPROPBASER_4_1_Z;
+
+ gic_data_rdist()->vpe_l1_base = gic_data_rdist_cpu(cpu)->vpe_l1_base;
+ *mask = gic_data_rdist_cpu(cpu)->vpe_table_mask;
+
+ return val;
+ }
+
+ return 0;
+}
+
+static bool allocate_vpe_l2_table(int cpu, u32 id)
+{
+ void __iomem *base = gic_data_rdist_cpu(cpu)->rd_base;
+ unsigned int psz, esz, idx, npg, gpsz;
+ u64 val;
+ struct page *page;
+ __le64 *table;
+
+ if (!gic_rdists->has_rvpeid)
+ return true;
+
+ /* Skip non-present CPUs */
+ if (!base)
+ return true;
+
+ val = gicr_read_vpropbaser(base + SZ_128K + GICR_VPROPBASER);
+
+ esz = FIELD_GET(GICR_VPROPBASER_4_1_ENTRY_SIZE, val) + 1;
+ gpsz = FIELD_GET(GICR_VPROPBASER_4_1_PAGE_SIZE, val);
+ npg = FIELD_GET(GICR_VPROPBASER_4_1_SIZE, val) + 1;
+
+ switch (gpsz) {
+ default:
+ WARN_ON(1);
+ fallthrough;
+ case GIC_PAGE_SIZE_4K:
+ psz = SZ_4K;
+ break;
+ case GIC_PAGE_SIZE_16K:
+ psz = SZ_16K;
+ break;
+ case GIC_PAGE_SIZE_64K:
+ psz = SZ_64K;
+ break;
+ }
+
+ /* Don't allow vpe_id that exceeds single, flat table limit */
+ if (!(val & GICR_VPROPBASER_4_1_INDIRECT))
+ return (id < (npg * psz / (esz * SZ_8)));
+
+ /* Compute 1st level table index & check if that exceeds table limit */
+ idx = id >> ilog2(psz / (esz * SZ_8));
+ if (idx >= (npg * psz / GITS_LVL1_ENTRY_SIZE))
+ return false;
+
+ table = gic_data_rdist_cpu(cpu)->vpe_l1_base;
+
+ /* Allocate memory for 2nd level table */
+ if (!table[idx]) {
+ page = alloc_pages(GFP_KERNEL | __GFP_ZERO, get_order(psz));
+ if (!page)
+ return false;
+
+ /* Flush Lvl2 table to PoC if hw doesn't support coherency */
+ if (!(val & GICR_VPROPBASER_SHAREABILITY_MASK))
+ gic_flush_dcache_to_poc(page_address(page), psz);
+
+ table[idx] = cpu_to_le64(page_to_phys(page) | GITS_BASER_VALID);
+
+ /* Flush Lvl1 entry to PoC if hw doesn't support coherency */
+ if (!(val & GICR_VPROPBASER_SHAREABILITY_MASK))
+ gic_flush_dcache_to_poc(table + idx, GITS_LVL1_ENTRY_SIZE);
+
+ /* Ensure updated table contents are visible to RD hardware */
+ dsb(sy);
+ }
+
+ return true;
+}
+
+static int allocate_vpe_l1_table(void)
+{
+ void __iomem *vlpi_base = gic_data_rdist_vlpi_base();
+ u64 val, gpsz, npg, pa;
+ unsigned int psz = SZ_64K;
+ unsigned int np, epp, esz;
+ struct page *page;
+
+ if (!gic_rdists->has_rvpeid)
+ return 0;
+
+ /*
+ * if VPENDBASER.Valid is set, disable any previously programmed
+ * VPE by setting PendingLast while clearing Valid. This has the
+ * effect of making sure no doorbell will be generated and we can
+ * then safely clear VPROPBASER.Valid.
+ */
+ if (gicr_read_vpendbaser(vlpi_base + GICR_VPENDBASER) & GICR_VPENDBASER_Valid)
+ gicr_write_vpendbaser(GICR_VPENDBASER_PendingLast,
+ vlpi_base + GICR_VPENDBASER);
+
+ /*
+ * If we can inherit the configuration from another RD, let's do
+ * so. Otherwise, we have to go through the allocation process. We
+ * assume that all RDs have the exact same requirements, as
+ * nothing will work otherwise.
+ */
+ val = inherit_vpe_l1_table_from_rd(&gic_data_rdist()->vpe_table_mask);
+ if (val & GICR_VPROPBASER_4_1_VALID)
+ goto out;
+
+ gic_data_rdist()->vpe_table_mask = kzalloc(sizeof(cpumask_t), GFP_ATOMIC);
+ if (!gic_data_rdist()->vpe_table_mask)
+ return -ENOMEM;
+
+ val = inherit_vpe_l1_table_from_its();
+ if (val & GICR_VPROPBASER_4_1_VALID)
+ goto out;
+
+ /* First probe the page size */
+ val = FIELD_PREP(GICR_VPROPBASER_4_1_PAGE_SIZE, GIC_PAGE_SIZE_64K);
+ gicr_write_vpropbaser(val, vlpi_base + GICR_VPROPBASER);
+ val = gicr_read_vpropbaser(vlpi_base + GICR_VPROPBASER);
+ gpsz = FIELD_GET(GICR_VPROPBASER_4_1_PAGE_SIZE, val);
+ esz = FIELD_GET(GICR_VPROPBASER_4_1_ENTRY_SIZE, val);
+
+ switch (gpsz) {
+ default:
+ gpsz = GIC_PAGE_SIZE_4K;
+ fallthrough;
+ case GIC_PAGE_SIZE_4K:
+ psz = SZ_4K;
+ break;
+ case GIC_PAGE_SIZE_16K:
+ psz = SZ_16K;
+ break;
+ case GIC_PAGE_SIZE_64K:
+ psz = SZ_64K;
+ break;
+ }
+
+ /*
+ * Start populating the register from scratch, including RO fields
+ * (which we want to print in debug cases...)
+ */
+ val = 0;
+ val |= FIELD_PREP(GICR_VPROPBASER_4_1_PAGE_SIZE, gpsz);
+ val |= FIELD_PREP(GICR_VPROPBASER_4_1_ENTRY_SIZE, esz);
+
+ /* How many entries per GIC page? */
+ esz++;
+ epp = psz / (esz * SZ_8);
+
+ /*
+ * If we need more than just a single L1 page, flag the table
+ * as indirect and compute the number of required L1 pages.
+ */
+ if (epp < ITS_MAX_VPEID) {
+ int nl2;
+
+ val |= GICR_VPROPBASER_4_1_INDIRECT;
+
+ /* Number of L2 pages required to cover the VPEID space */
+ nl2 = DIV_ROUND_UP(ITS_MAX_VPEID, epp);
+
+ /* Number of L1 pages to point to the L2 pages */
+ npg = DIV_ROUND_UP(nl2 * SZ_8, psz);
+ } else {
+ npg = 1;
+ }
+
+ val |= FIELD_PREP(GICR_VPROPBASER_4_1_SIZE, npg - 1);
+
+ /* Right, that's the number of CPU pages we need for L1 */
+ np = DIV_ROUND_UP(npg * psz, PAGE_SIZE);
+
+ pr_debug("np = %d, npg = %lld, psz = %d, epp = %d, esz = %d\n",
+ np, npg, psz, epp, esz);
+ page = alloc_pages(GFP_ATOMIC | __GFP_ZERO, get_order(np * PAGE_SIZE));
+ if (!page)
+ return -ENOMEM;
+
+ gic_data_rdist()->vpe_l1_base = page_address(page);
+ pa = virt_to_phys(page_address(page));
+ WARN_ON(!IS_ALIGNED(pa, psz));
+
+ val |= FIELD_PREP(GICR_VPROPBASER_4_1_ADDR, pa >> 12);
+ val |= GICR_VPROPBASER_RaWb;
+ val |= GICR_VPROPBASER_InnerShareable;
+ val |= GICR_VPROPBASER_4_1_Z;
+ val |= GICR_VPROPBASER_4_1_VALID;
+
+out:
+ gicr_write_vpropbaser(val, vlpi_base + GICR_VPROPBASER);
+ cpumask_set_cpu(smp_processor_id(), gic_data_rdist()->vpe_table_mask);
+
+ pr_debug("CPU%d: VPROPBASER = %llx %*pbl\n",
+ smp_processor_id(), val,
+ cpumask_pr_args(gic_data_rdist()->vpe_table_mask));
+
+ return 0;
+}
+
+static int its_alloc_collections(struct its_node *its)
+{
+ int i;
+
+ its->collections = kcalloc(nr_cpu_ids, sizeof(*its->collections),
+ GFP_KERNEL);
+ if (!its->collections)
+ return -ENOMEM;
+
+ for (i = 0; i < nr_cpu_ids; i++)
+ its->collections[i].target_address = ~0ULL;
+
+ return 0;
+}
+
+static struct page *its_allocate_pending_table(gfp_t gfp_flags)
+{
+ struct page *pend_page;
+
+ pend_page = alloc_pages(gfp_flags | __GFP_ZERO,
+ get_order(LPI_PENDBASE_SZ));
+ if (!pend_page)
+ return NULL;
+
+ /* Make sure the GIC will observe the zero-ed page */
+ gic_flush_dcache_to_poc(page_address(pend_page), LPI_PENDBASE_SZ);
+
+ return pend_page;
+}
+
+static void its_free_pending_table(struct page *pt)
+{
+ free_pages((unsigned long)page_address(pt), get_order(LPI_PENDBASE_SZ));
+}
+
+/*
+ * Booting with kdump and LPIs enabled is generally fine. Any other
+ * case is wrong in the absence of firmware/EFI support.
+ */
+static bool enabled_lpis_allowed(void)
+{
+ phys_addr_t addr;
+ u64 val;
+
+ /* Check whether the property table is in a reserved region */
+ val = gicr_read_propbaser(gic_data_rdist_rd_base() + GICR_PROPBASER);
+ addr = val & GENMASK_ULL(51, 12);
+
+ return gic_check_reserved_range(addr, LPI_PROPBASE_SZ);
+}
+
+static int __init allocate_lpi_tables(void)
+{
+ u64 val;
+ int err, cpu;
+
+ /*
+ * If LPIs are enabled while we run this from the boot CPU,
+ * flag the RD tables as pre-allocated if the stars do align.
+ */
+ val = readl_relaxed(gic_data_rdist_rd_base() + GICR_CTLR);
+ if ((val & GICR_CTLR_ENABLE_LPIS) && enabled_lpis_allowed()) {
+ gic_rdists->flags |= (RDIST_FLAGS_RD_TABLES_PREALLOCATED |
+ RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING);
+ pr_info("GIC-2500: Using preallocated redistributor tables\n");
+ }
+
+ err = its_setup_lpi_prop_table();
+ if (err)
+ return err;
+
+ /*
+ * We allocate all the pending tables anyway, as we may have a
+ * mix of RDs that have had LPIs enabled, and some that
+ * don't. We'll free the unused ones as each CPU comes online.
+ */
+ for_each_possible_cpu(cpu) {
+ struct page *pend_page;
+
+ pend_page = its_allocate_pending_table(GFP_NOWAIT);
+ if (!pend_page) {
+ pr_err("Failed to allocate PENDBASE for CPU%d\n", cpu);
+ return -ENOMEM;
+ }
+
+ gic_data_rdist_cpu(cpu)->pend_page = pend_page;
+ }
+
+ return 0;
+}
+
+static u64 its_clear_vpend_valid(void __iomem *vlpi_base, u64 clr, u64 set)
+{
+ u32 count = 1000000; /* 1s! */
+ bool clean;
+ u64 val;
+
+ val = gicr_read_vpendbaser(vlpi_base + GICR_VPENDBASER);
+ val &= ~GICR_VPENDBASER_Valid;
+ val &= ~clr;
+ val |= set;
+ gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER);
+
+ do {
+ val = gicr_read_vpendbaser(vlpi_base + GICR_VPENDBASER);
+ clean = !(val & GICR_VPENDBASER_Dirty);
+ if (!clean) {
+ count--;
+ cpu_relax();
+ udelay(1);
+ }
+ } while (!clean && count);
+
+ if (unlikely(val & GICR_VPENDBASER_Dirty)) {
+ pr_err_ratelimited("ITS virtual pending table not cleaning\n");
+ val |= GICR_VPENDBASER_PendingLast;
+ }
+
+ return val;
+}
+
+static void its_cpu_init_lpis(void)
+{
+ void __iomem *rbase = gic_data_rdist_rd_base();
+ struct page *pend_page;
+ phys_addr_t paddr;
+ u64 val, tmp;
+
+ if (gic_data_rdist()->lpi_enabled)
+ return;
+
+ val = readl_relaxed(rbase + GICR_CTLR);
+ if ((gic_rdists->flags & RDIST_FLAGS_RD_TABLES_PREALLOCATED) &&
+ (val & GICR_CTLR_ENABLE_LPIS)) {
+ /*
+ * Check that we get the same property table on all
+ * RDs. If we don't, this is hopeless.
+ */
+ paddr = gicr_read_propbaser(rbase + GICR_PROPBASER);
+ paddr &= GENMASK_ULL(51, 12);
+ if (WARN_ON(gic_rdists->prop_table_pa != paddr))
+ add_taint(TAINT_CRAP, LOCKDEP_STILL_OK);
+
+ paddr = gicr_read_pendbaser(rbase + GICR_PENDBASER);
+ paddr &= GENMASK_ULL(51, 16);
+
+ WARN_ON(!gic_check_reserved_range(paddr, LPI_PENDBASE_SZ));
+ its_free_pending_table(gic_data_rdist()->pend_page);
+ gic_data_rdist()->pend_page = NULL;
+
+ goto out;
+ }
+
+ pend_page = gic_data_rdist()->pend_page;
+ paddr = page_to_phys(pend_page);
+ WARN_ON(gic_reserve_range(paddr, LPI_PENDBASE_SZ));
+
+ /* set PROPBASE */
+ val = (gic_rdists->prop_table_pa |
+ GICR_PROPBASER_InnerShareable |
+ GICR_PROPBASER_RaWaWb |
+ ((LPI_NRBITS - 1) & GICR_PROPBASER_IDBITS_MASK));
+
+ gicr_write_propbaser(val, rbase + GICR_PROPBASER);
+ tmp = gicr_read_propbaser(rbase + GICR_PROPBASER);
+
+ if ((tmp ^ val) & GICR_PROPBASER_SHAREABILITY_MASK) {
+ if (!(tmp & GICR_PROPBASER_SHAREABILITY_MASK)) {
+ /*
+ * The HW reports non-shareable, we must
+ * remove the cacheability attributes as
+ * well.
+ */
+ val &= ~(GICR_PROPBASER_SHAREABILITY_MASK |
+ GICR_PROPBASER_CACHEABILITY_MASK);
+ val |= GICR_PROPBASER_nC;
+ gicr_write_propbaser(val, rbase + GICR_PROPBASER);
+ }
+ pr_info_once("GIC: using cache flushing for LPI property table\n");
+ gic_rdists->flags |= RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING;
+ }
+
+ /* set PENDBASE */
+ val = (page_to_phys(pend_page) |
+ GICR_PENDBASER_InnerShareable |
+ GICR_PENDBASER_RaWaWb);
+
+ gicr_write_pendbaser(val, rbase + GICR_PENDBASER);
+ tmp = gicr_read_pendbaser(rbase + GICR_PENDBASER);
+
+ if (!(tmp & GICR_PENDBASER_SHAREABILITY_MASK)) {
+ /*
+ * The HW reports non-shareable, we must remove the
+ * cacheability attributes as well.
+ */
+ val &= ~(GICR_PENDBASER_SHAREABILITY_MASK |
+ GICR_PENDBASER_CACHEABILITY_MASK);
+ val |= GICR_PENDBASER_nC;
+ gicr_write_pendbaser(val, rbase + GICR_PENDBASER);
+ }
+
+ /* Enable LPIs */
+ val = readl_relaxed(rbase + GICR_CTLR);
+ val |= GICR_CTLR_ENABLE_LPIS;
+ writel_relaxed(val, rbase + GICR_CTLR);
+
+ if (gic_rdists->has_vlpis && !gic_rdists->has_rvpeid) {
+ void __iomem *vlpi_base = gic_data_rdist_vlpi_base();
+
+ /*
+		 * It's possible for a CPU to receive VLPIs before it is
+		 * scheduled as a vPE, especially for the first CPU, and a
+		 * VLPI with an INTID larger than 2^(IDbits+1) will be
+		 * considered out of range and dropped by the GIC.
+		 * So we initialize IDbits to a known value to avoid VLPI drops.
+ */
+ val = (LPI_NRBITS - 1) & GICR_VPROPBASER_IDBITS_MASK;
+ pr_debug("GICv4: CPU%d: Init IDbits to 0x%llx for GICR_VPROPBASER\n",
+ smp_processor_id(), val);
+ gicr_write_vpropbaser(val, vlpi_base + GICR_VPROPBASER);
+
+ /*
+ * Also clear Valid bit of GICR_VPENDBASER, in case some
+ * ancient programming gets left in and has possibility of
+ * corrupting memory.
+ */
+ val = its_clear_vpend_valid(vlpi_base, 0, 0);
+ }
+
+ if (allocate_vpe_l1_table()) {
+ /*
+ * If the allocation has failed, we're in massive trouble.
+ * Disable direct injection, and pray that no VM was
+ * already running...
+ */
+ gic_rdists->has_rvpeid = false;
+ gic_rdists->has_vlpis = false;
+ }
+
+ /* Make sure the GIC has seen the above */
+ dsb(sy);
+out:
+ gic_data_rdist()->lpi_enabled = true;
+ pr_info("GIC-2500: CPU%d: using %s LPI pending table @%pa\n",
+ smp_processor_id(),
+ gic_data_rdist()->pend_page ? "allocated" : "reserved",
+ &paddr);
+}
+
+static void its_cpu_init_collection(struct its_node *its)
+{
+ int cpu = smp_processor_id();
+ u64 target;
+ unsigned long mpid;
+ phys_addr_t its_phys_base;
+ unsigned long skt_id;
+
+	/* Avoid cross-node collections and their ITS mapping */
+ if (its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) {
+ struct device_node *cpu_node;
+
+ cpu_node = of_get_cpu_node(cpu, NULL);
+ if (its->numa_node != NUMA_NO_NODE &&
+ its->numa_node != of_node_to_nid(cpu_node))
+ return;
+ }
+
+ mpid = cpu_logical_map(cpu);
+ its_phys_base = its->phys_base;
+ skt_id = (its_phys_base >> 41) & 0x7;
+
+ /*
+ * We now have to bind each collection to its target
+ * redistributor.
+ */
+ if (gic_read_typer(its->base + GITS_TYPER) & GITS_TYPER_PTA) {
+ /*
+ * This ITS wants the physical address of the
+ * redistributor.
+ */
+ target = gic_data_rdist()->phys_base;
+ } else {
+ /* This ITS wants a linear CPU number. */
+ target = gic_read_typer(gic_data_rdist_rd_base() + GICR_TYPER);
+ target = GICR_TYPER_CPU_NUMBER(target) << 16;
+ }
+
+ /* Perform collection mapping */
+ its->collections[cpu].target_address = target;
+ its->collections[cpu].col_id = cpu % 64;
+
+ its_send_mapc(its, &its->collections[cpu], 1);
+ its_send_invall(its, &its->collections[cpu]);
+}
+
+static void its_cpu_init_collections(void)
+{
+ struct its_node *its;
+
+ raw_spin_lock(&its_lock);
+
+ list_for_each_entry(its, &its_nodes, entry)
+ its_cpu_init_collection(its);
+
+ raw_spin_unlock(&its_lock);
+}
+
+static struct its_device *its_find_device(struct its_node *its, u32 dev_id)
+{
+ struct its_device *its_dev = NULL, *tmp;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&its->lock, flags);
+
+ list_for_each_entry(tmp, &its->its_device_list, entry) {
+ if (tmp->device_id == dev_id) {
+ its_dev = tmp;
+ break;
+ }
+ }
+
+ raw_spin_unlock_irqrestore(&its->lock, flags);
+
+ return its_dev;
+}
+
+static struct its_baser *its_get_baser(struct its_node *its, u32 type)
+{
+ int i;
+
+ for (i = 0; i < GITS_BASER_NR_REGS; i++) {
+ if (GITS_BASER_TYPE(its->tables[i].val) == type)
+ return &its->tables[i];
+ }
+
+ return NULL;
+}
+
+static bool its_alloc_table_entry(struct its_node *its,
+ struct its_baser *baser, u32 id)
+{
+ struct page *page;
+ u32 esz, idx;
+ __le64 *table;
+
+ /* Don't allow device id that exceeds single, flat table limit */
+ esz = GITS_BASER_ENTRY_SIZE(baser->val);
+ if (!(baser->val & GITS_BASER_INDIRECT))
+ return (id < (PAGE_ORDER_TO_SIZE(baser->order) / esz));
+
+ /* Compute 1st level table index & check if that exceeds table limit */
+ idx = id >> ilog2(baser->psz / esz);
+ if (idx >= (PAGE_ORDER_TO_SIZE(baser->order) / GITS_LVL1_ENTRY_SIZE))
+ return false;
+
+ table = baser->base;
+
+ /* Allocate memory for 2nd level table */
+ if (!table[idx]) {
+ page = alloc_pages_node(its->numa_node, GFP_KERNEL | __GFP_ZERO,
+ get_order(baser->psz));
+ if (!page)
+ return false;
+
+ /* Flush Lvl2 table to PoC if hw doesn't support coherency */
+ if (!(baser->val & GITS_BASER_SHAREABILITY_MASK))
+ gic_flush_dcache_to_poc(page_address(page), baser->psz);
+
+ table[idx] = cpu_to_le64(page_to_phys(page) | GITS_BASER_VALID);
+
+ /* Flush Lvl1 entry to PoC if hw doesn't support coherency */
+ if (!(baser->val & GITS_BASER_SHAREABILITY_MASK))
+ gic_flush_dcache_to_poc(table + idx, GITS_LVL1_ENTRY_SIZE);
+
+ /* Ensure updated table contents are visible to ITS hardware */
+ dsb(sy);
+ }
+
+ return true;
+}
+
+static bool its_alloc_device_table(struct its_node *its, u32 dev_id)
+{
+ struct its_baser *baser;
+
+ baser = its_get_baser(its, GITS_BASER_TYPE_DEVICE);
+
+ /* Don't allow device id that exceeds ITS hardware limit */
+ if (!baser)
+ return (ilog2(dev_id) < device_ids(its));
+
+ return its_alloc_table_entry(its, baser, dev_id);
+}
+
+static bool its_alloc_vpe_table(u32 vpe_id)
+{
+ struct its_node *its;
+ int cpu;
+
+ /*
+ * Make sure the L2 tables are allocated on *all* v4 ITSs. We
+ * could try and only do it on ITSs corresponding to devices
+ * that have interrupts targeted at this VPE, but the
+ * complexity becomes crazy (and you have tons of memory
+ * anyway, right?).
+ */
+ list_for_each_entry(its, &its_nodes, entry) {
+ struct its_baser *baser;
+
+ if (!is_v4(its))
+ continue;
+
+ baser = its_get_baser(its, GITS_BASER_TYPE_VCPU);
+ if (!baser)
+ return false;
+
+ if (!its_alloc_table_entry(its, baser, vpe_id))
+ return false;
+ }
+
+	/* Non v4.1? No need to iterate RDs; bail out early. */
+ if (!gic_rdists->has_rvpeid)
+ return true;
+
+ /*
+ * Make sure the L2 tables are allocated for all copies of
+ * the L1 table on *all* v4.1 RDs.
+ */
+ for_each_possible_cpu(cpu) {
+ if (!allocate_vpe_l2_table(cpu, vpe_id))
+ return false;
+ }
+
+ return true;
+}
+
+static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
+ int nvecs, bool alloc_lpis)
+{
+ struct its_device *dev;
+ unsigned long *lpi_map = NULL;
+ unsigned long flags;
+ u16 *col_map = NULL;
+ void *itt;
+ int lpi_base;
+ int nr_lpis;
+ int nr_ites;
+ int sz;
+
+ if (!its_alloc_device_table(its, dev_id))
+ return NULL;
+
+ if (WARN_ON(!is_power_of_2(nvecs)))
+ nvecs = roundup_pow_of_two(nvecs);
+
+ dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+ /*
+ * Even if the device wants a single LPI, the ITT must be
+ * sized as a power of two (and you need at least one bit...).
+ */
+ nr_ites = max(2, nvecs);
+ sz = nr_ites * (FIELD_GET(GITS_TYPER_ITT_ENTRY_SIZE, its->typer) + 1);
+ sz = max(sz, ITS_ITT_ALIGN) + ITS_ITT_ALIGN - 1;
+ itt = kzalloc_node(sz, GFP_KERNEL, its->numa_node);
+ if (alloc_lpis) {
+ lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
+ if (lpi_map)
+ col_map = kcalloc(nr_lpis, sizeof(*col_map),
+ GFP_KERNEL);
+ } else {
+ col_map = kcalloc(nr_ites, sizeof(*col_map), GFP_KERNEL);
+ nr_lpis = 0;
+ lpi_base = 0;
+ }
+
+ if (!dev || !itt || !col_map || (!lpi_map && alloc_lpis)) {
+ kfree(dev);
+ kfree(itt);
+ kfree(lpi_map);
+ kfree(col_map);
+ return NULL;
+ }
+
+ gic_flush_dcache_to_poc(itt, sz);
+
+ dev->its = its;
+ dev->itt = itt;
+ dev->nr_ites = nr_ites;
+ dev->event_map.lpi_map = lpi_map;
+ dev->event_map.col_map = col_map;
+ dev->event_map.lpi_base = lpi_base;
+ dev->event_map.nr_lpis = nr_lpis;
+ raw_spin_lock_init(&dev->event_map.vlpi_lock);
+ dev->device_id = dev_id;
+ INIT_LIST_HEAD(&dev->entry);
+
+ raw_spin_lock_irqsave(&its->lock, flags);
+ list_add(&dev->entry, &its->its_device_list);
+ raw_spin_unlock_irqrestore(&its->lock, flags);
+
+ /* Map device to its ITT */
+ its_send_mapd(dev, 1);
+
+ return dev;
+}
+
+static void its_free_device(struct its_device *its_dev)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&its_dev->its->lock, flags);
+ list_del(&its_dev->entry);
+ raw_spin_unlock_irqrestore(&its_dev->its->lock, flags);
+ kfree(its_dev->event_map.col_map);
+ kfree(its_dev->itt);
+ kfree(its_dev);
+}
+
+static int its_alloc_device_irq(struct its_device *dev, int nvecs, irq_hw_number_t *hwirq)
+{
+ int idx;
+
+ /* Find a free LPI region in lpi_map and allocate them. */
+ idx = bitmap_find_free_region(dev->event_map.lpi_map,
+ dev->event_map.nr_lpis,
+ get_count_order(nvecs));
+ if (idx < 0)
+ return -ENOSPC;
+
+ *hwirq = dev->event_map.lpi_base + idx;
+
+ return 0;
+}
+
+static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
+ int nvec, msi_alloc_info_t *info)
+{
+ struct its_node *its;
+ struct its_device *its_dev;
+ struct msi_domain_info *msi_info;
+ u32 dev_id;
+ int err = 0;
+
+ /*
+ * We ignore "dev" entirely, and rely on the dev_id that has
+ * been passed via the scratchpad. This limits this domain's
+ * usefulness to upper layers that definitely know that they
+ * are built on top of the ITS.
+ */
+ dev_id = info->scratchpad[0].ul;
+
+ msi_info = msi_get_domain_info(domain);
+ its = msi_info->data;
+
+ if (!gic_rdists->has_direct_lpi &&
+ vpe_proxy.dev &&
+ vpe_proxy.dev->its == its &&
+ dev_id == vpe_proxy.dev->device_id) {
+ /* Bad luck. Get yourself a better implementation */
+ WARN_ONCE(1, "DevId %x clashes with GICv4 VPE proxy device\n",
+ dev_id);
+ return -EINVAL;
+ }
+
+ mutex_lock(&its->dev_alloc_lock);
+ its_dev = its_find_device(its, dev_id);
+ if (its_dev) {
+ /*
+ * We already have seen this ID, probably through
+ * another alias (PCI bridge of some sort). No need to
+ * create the device.
+ */
+ its_dev->shared = true;
+ pr_debug("Reusing ITT for devID %x\n", dev_id);
+ goto out;
+ }
+
+ its_dev = its_create_device(its, dev_id, nvec, true);
+ if (!its_dev) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ pr_debug("ITT %d entries, %d bits\n", nvec, ilog2(nvec));
+out:
+ mutex_unlock(&its->dev_alloc_lock);
+ info->scratchpad[0].ptr = its_dev;
+ return err;
+}
+
+static struct msi_domain_ops its_msi_domain_ops = {
+ .msi_prepare = its_msi_prepare,
+};
+
+static int its_irq_gic_domain_alloc(struct irq_domain *domain,
+ unsigned int virq,
+ irq_hw_number_t hwirq)
+{
+ struct irq_fwspec fwspec;
+
+ if (irq_domain_get_of_node(domain->parent)) {
+ fwspec.fwnode = domain->parent->fwnode;
+ fwspec.param_count = 3;
+ fwspec.param[0] = GIC_IRQ_TYPE_LPI;
+ fwspec.param[1] = hwirq;
+ fwspec.param[2] = IRQ_TYPE_EDGE_RISING;
+ } else if (is_fwnode_irqchip(domain->parent->fwnode)) {
+ fwspec.fwnode = domain->parent->fwnode;
+ fwspec.param_count = 2;
+ fwspec.param[0] = hwirq;
+ fwspec.param[1] = IRQ_TYPE_EDGE_RISING;
+ } else {
+ return -EINVAL;
+ }
+
+ return irq_domain_alloc_irqs_parent(domain, virq, 1, &fwspec);
+}
+
+static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
+ unsigned int nr_irqs, void *args)
+{
+ msi_alloc_info_t *info = args;
+ struct its_device *its_dev = info->scratchpad[0].ptr;
+ struct its_node *its = its_dev->its;
+ struct irq_data *irqd;
+ irq_hw_number_t hwirq;
+ int err;
+ int i;
+
+ err = its_alloc_device_irq(its_dev, nr_irqs, &hwirq);
+ if (err)
+ return err;
+
+ err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev));
+ if (err)
+ return err;
+
+ for (i = 0; i < nr_irqs; i++) {
+ err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i);
+ if (err)
+ return err;
+
+ irq_domain_set_hwirq_and_chip(domain, virq + i,
+ hwirq + i, &its_irq_chip, its_dev);
+ irqd = irq_get_irq_data(virq + i);
+ irqd_set_single_target(irqd);
+ irqd_set_affinity_on_activate(irqd);
+ pr_debug("ID:%d pID:%d vID:%d\n",
+ (int)(hwirq + i - its_dev->event_map.lpi_base),
+ (int)(hwirq + i), virq + i);
+ }
+
+ return 0;
+}
+
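+/*
+ * Pick a target CPU on the same socket as the ITS: the socket ID is
+ * taken from bits [43:41] of the ITS physical base address (a
+ * vendor-specific layout), and the first CPU of that socket is used
+ * unless the requested mask already selects a CPU within it.
+ */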
+static int its_cpumask_first(struct its_device *its_dev,
+ const struct cpumask *cpu_mask)
+{
+ unsigned int skt, skt_id, i;
+ phys_addr_t its_phys_base;
+ unsigned int cpu, cpus = 0;
+
+ unsigned int skt_cpu_cnt[MAX_MARS3_SKT_COUNT] = {0};
+
+ for (i = 0; i < nr_cpu_ids; i++) {
+ skt = (cpu_logical_map(i) >> 16) & 0xff;
+ if ((skt >= 0) && (skt < MAX_MARS3_SKT_COUNT)) {
+ skt_cpu_cnt[skt]++;
+ } else if (0xff != skt) {
+			pr_err("socket address: %u is out of range.\n", skt);
+ }
+ }
+
+ its_phys_base = its_dev->its->phys_base;
+ skt_id = (its_phys_base >> 41) & 0x7;
+
+ if (0 != skt_id) {
+ for (i = 0; i < skt_id; i++) {
+ cpus += skt_cpu_cnt[i];
+ }
+ }
+
+ cpu = cpumask_first(cpu_mask);
+ if ((cpu > cpus) && (cpu < (cpus + skt_cpu_cnt[skt_id]))) {
+ cpus = cpu;
+ }
+
+ return cpus;
+}
+
+static int its_irq_domain_activate(struct irq_domain *domain,
+ struct irq_data *d, bool reserve)
+{
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ u32 event = its_get_event_id(d);
+ const struct cpumask *cpu_mask = cpu_online_mask;
+ int cpu;
+
+ cpu = its_cpumask_first(its_dev, cpu_mask);
+
+ if (cpu < 0 || cpu >= nr_cpu_ids)
+ return -EINVAL;
+
+ its_inc_lpi_count(d, cpu);
+ its_dev->event_map.col_map[event] = cpu;
+ irq_data_update_effective_affinity(d, cpumask_of(cpu));
+
+ /* Map the GIC IRQ and event to the device */
+ its_send_mapti(its_dev, d->hwirq, event);
+ return 0;
+}
+
+static void its_irq_domain_deactivate(struct irq_domain *domain,
+ struct irq_data *d)
+{
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ u32 event = its_get_event_id(d);
+
+ its_dec_lpi_count(d, its_dev->event_map.col_map[event]);
+ /* Stop the delivery of interrupts */
+ its_send_discard(its_dev, event);
+}
+
+static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
+ unsigned int nr_irqs)
+{
+ struct irq_data *d = irq_domain_get_irq_data(domain, virq);
+ struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+ struct its_node *its = its_dev->its;
+ int i;
+
+ bitmap_release_region(its_dev->event_map.lpi_map,
+ its_get_event_id(irq_domain_get_irq_data(domain, virq)),
+ get_count_order(nr_irqs));
+
+ for (i = 0; i < nr_irqs; i++) {
+ struct irq_data *data = irq_domain_get_irq_data(domain,
+ virq + i);
+ /* Nuke the entry in the domain */
+ irq_domain_reset_irq_data(data);
+ }
+
+ mutex_lock(&its->dev_alloc_lock);
+
+ /*
+ * If all interrupts have been freed, start mopping the
+	 * floor. This is conditioned on the device not being shared.
+ */
+ if (!its_dev->shared &&
+ bitmap_empty(its_dev->event_map.lpi_map,
+ its_dev->event_map.nr_lpis)) {
+ its_lpi_free(its_dev->event_map.lpi_map,
+ its_dev->event_map.lpi_base,
+ its_dev->event_map.nr_lpis);
+
+ /* Unmap device/itt */
+ its_send_mapd(its_dev, 0);
+ its_free_device(its_dev);
+ }
+
+ mutex_unlock(&its->dev_alloc_lock);
+
+ irq_domain_free_irqs_parent(domain, virq, nr_irqs);
+}
+
+static const struct irq_domain_ops its_domain_ops = {
+ .alloc = its_irq_domain_alloc,
+ .free = its_irq_domain_free,
+ .activate = its_irq_domain_activate,
+ .deactivate = its_irq_domain_deactivate,
+};
+
+/*
+ * This is insane.
+ *
+ * If a GICv4.0 doesn't implement Direct LPIs (which is extremely
+ * likely), the only way to perform an invalidate is to use a fake
+ * device to issue an INV command, implying that the LPI has first
+ * been mapped to some event on that device. Since this is not exactly
+ * cheap, we try to keep that mapping around as long as possible, and
+ * only issue an UNMAP if we're short on available slots.
+ *
+ * Broken by design(tm).
+ *
+ * GICv4.1, on the other hand, mandates that we're able to invalidate
+ * by writing to a MMIO register. It doesn't implement the whole of
+ * DirectLPI, but that's good enough. And most of the time, we don't
+ * even have to invalidate anything, as the redistributor can be told
+ * whether to generate a doorbell or not (we thus leave it enabled,
+ * always).
+ */
+static void its_vpe_db_proxy_unmap_locked(struct its_vpe *vpe)
+{
+ /* GICv4.1 doesn't use a proxy, so nothing to do here */
+ if (gic_rdists->has_rvpeid)
+ return;
+
+ /* Already unmapped? */
+ if (vpe->vpe_proxy_event == -1)
+ return;
+
+ its_send_discard(vpe_proxy.dev, vpe->vpe_proxy_event);
+ vpe_proxy.vpes[vpe->vpe_proxy_event] = NULL;
+
+ /*
+ * We don't track empty slots at all, so let's move the
+ * next_victim pointer if we can quickly reuse that slot
+ * instead of nuking an existing entry. Not clear that this is
+ * always a win though, and this might just generate a ripple
+ * effect... Let's just hope VPEs don't migrate too often.
+ */
+ if (vpe_proxy.vpes[vpe_proxy.next_victim])
+ vpe_proxy.next_victim = vpe->vpe_proxy_event;
+
+ vpe->vpe_proxy_event = -1;
+}
+
+static void its_vpe_db_proxy_unmap(struct its_vpe *vpe)
+{
+ /* GICv4.1 doesn't use a proxy, so nothing to do here */
+ if (gic_rdists->has_rvpeid)
+ return;
+
+ if (!gic_rdists->has_direct_lpi) {
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&vpe_proxy.lock, flags);
+ its_vpe_db_proxy_unmap_locked(vpe);
+ raw_spin_unlock_irqrestore(&vpe_proxy.lock, flags);
+ }
+}
+
+static void its_vpe_db_proxy_map_locked(struct its_vpe *vpe)
+{
+ /* GICv4.1 doesn't use a proxy, so nothing to do here */
+ if (gic_rdists->has_rvpeid)
+ return;
+
+ /* Already mapped? */
+ if (vpe->vpe_proxy_event != -1)
+ return;
+
+ /* This slot was already allocated. Kick the other VPE out. */
+ if (vpe_proxy.vpes[vpe_proxy.next_victim])
+ its_vpe_db_proxy_unmap_locked(vpe_proxy.vpes[vpe_proxy.next_victim]);
+
+ /* Map the new VPE instead */
+ vpe_proxy.vpes[vpe_proxy.next_victim] = vpe;
+ vpe->vpe_proxy_event = vpe_proxy.next_victim;
+ vpe_proxy.next_victim = (vpe_proxy.next_victim + 1) % vpe_proxy.dev->nr_ites;
+
+ vpe_proxy.dev->event_map.col_map[vpe->vpe_proxy_event] = vpe->col_idx;
+ its_send_mapti(vpe_proxy.dev, vpe->vpe_db_lpi, vpe->vpe_proxy_event);
+}
+
+static void its_vpe_db_proxy_move(struct its_vpe *vpe, int from, int to)
+{
+ unsigned long flags;
+ struct its_collection *target_col;
+
+ /* GICv4.1 doesn't use a proxy, so nothing to do here */
+ if (gic_rdists->has_rvpeid)
+ return;
+
+ if (gic_rdists->has_direct_lpi) {
+ void __iomem *rdbase;
+
+ rdbase = per_cpu_ptr(gic_rdists->rdist, from)->rd_base;
+ gic_write_lpir(vpe->vpe_db_lpi, rdbase + GICR_CLRLPIR);
+ wait_for_syncr(rdbase);
+
+ return;
+ }
+
+ raw_spin_lock_irqsave(&vpe_proxy.lock, flags);
+
+ its_vpe_db_proxy_map_locked(vpe);
+
+ target_col = &vpe_proxy.dev->its->collections[to];
+ its_send_movi(vpe_proxy.dev, target_col, vpe->vpe_proxy_event);
+ vpe_proxy.dev->event_map.col_map[vpe->vpe_proxy_event] = to;
+
+ raw_spin_unlock_irqrestore(&vpe_proxy.lock, flags);
+}
+
+static int its_vpe_set_affinity(struct irq_data *d,
+ const struct cpumask *mask_val,
+ bool force)
+{
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+ int from, cpu = cpumask_first(mask_val);
+ unsigned long flags;
+
+ /*
+ * Changing affinity is mega expensive, so let's be as lazy as
+ * we can and only do it if we really have to. Also, if mapped
+ * into the proxy device, we need to move the doorbell
+ * interrupt to its new location.
+ *
+ * Another thing is that changing the affinity of a vPE affects
+ * *other interrupts* such as all the vLPIs that are routed to
+ * this vPE. This means that the irq_desc lock is not enough to
+ * protect us, and that we must ensure nobody samples vpe->col_idx
+ * during the update, hence the lock below which must also be
+ * taken on any vLPI handling path that evaluates vpe->col_idx.
+ */
+ from = vpe_to_cpuid_lock(vpe, &flags);
+ if (from == cpu)
+ goto out;
+
+ vpe->col_idx = cpu;
+
+ /*
+ * GICv4.1 allows us to skip VMOVP if moving to a cpu whose RD
+ * is sharing its VPE table with the current one.
+ */
+ if (gic_data_rdist_cpu(cpu)->vpe_table_mask &&
+ cpumask_test_cpu(from, gic_data_rdist_cpu(cpu)->vpe_table_mask))
+ goto out;
+
+ its_send_vmovp(vpe);
+ its_vpe_db_proxy_move(vpe, from, cpu);
+
+out:
+ irq_data_update_effective_affinity(d, cpumask_of(cpu));
+ vpe_to_cpuid_unlock(vpe, flags);
+
+ return IRQ_SET_MASK_OK_DONE;
+}
+
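+/* Wait for the RD to finish parsing the virtual pending table (GICv4.1) */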
+static void its_wait_vpt_parse_complete(void)
+{
+ void __iomem *vlpi_base = gic_data_rdist_vlpi_base();
+ u64 val;
+
+ if (!gic_rdists->has_vpend_valid_dirty)
+ return;
+
+ WARN_ON_ONCE(readq_relaxed_poll_timeout_atomic(vlpi_base + GICR_VPENDBASER,
+ val,
+ !(val & GICR_VPENDBASER_Dirty),
+ 10, 500));
+}
+
+static void its_vpe_schedule(struct its_vpe *vpe)
+{
+ void __iomem *vlpi_base = gic_data_rdist_vlpi_base();
+ u64 val;
+
+ /* Schedule the VPE */
+ val = virt_to_phys(page_address(vpe->its_vm->vprop_page)) &
+ GENMASK_ULL(51, 12);
+ val |= (LPI_NRBITS - 1) & GICR_VPROPBASER_IDBITS_MASK;
+ val |= GICR_VPROPBASER_RaWb;
+ val |= GICR_VPROPBASER_InnerShareable;
+ gicr_write_vpropbaser(val, vlpi_base + GICR_VPROPBASER);
+
+ val = virt_to_phys(page_address(vpe->vpt_page)) &
+ GENMASK_ULL(51, 16);
+ val |= GICR_VPENDBASER_RaWaWb;
+ val |= GICR_VPENDBASER_InnerShareable;
+ /*
+ * There is no good way of finding out if the pending table is
+ * empty as we can race against the doorbell interrupt very
+ * easily. So in the end, vpe->pending_last is only an
+ * indication that the vcpu has something pending, not one
+ * that the pending table is empty. A good implementation
+ * would be able to read its coarse map pretty quickly anyway,
+ * making this a tolerable issue.
+ */
+ val |= GICR_VPENDBASER_PendingLast;
+ val |= vpe->idai ? GICR_VPENDBASER_IDAI : 0;
+ val |= GICR_VPENDBASER_Valid;
+ gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER);
+
+ its_wait_vpt_parse_complete();
+}
+
+static void its_vpe_deschedule(struct its_vpe *vpe)
+{
+ void __iomem *vlpi_base = gic_data_rdist_vlpi_base();
+ u64 val;
+
+ val = its_clear_vpend_valid(vlpi_base, 0, 0);
+
+ vpe->idai = !!(val & GICR_VPENDBASER_IDAI);
+ vpe->pending_last = !!(val & GICR_VPENDBASER_PendingLast);
+}
+
+static void its_vpe_invall(struct its_vpe *vpe)
+{
+ struct its_node *its;
+
+ list_for_each_entry(its, &its_nodes, entry) {
+ if (!is_v4(its))
+ continue;
+
+ if (its_list_map && !vpe->its_vm->vlpi_count[its->list_nr])
+ continue;
+
+ /*
+ * Sending a VINVALL to a single ITS is enough, as all
+ * we need is to reach the redistributors.
+ */
+ its_send_vinvall(its, vpe);
+ return;
+ }
+}
+
+static int its_vpe_set_vcpu_affinity(struct irq_data *d, void *vcpu_info)
+{
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+ struct its_cmd_info *info = vcpu_info;
+
+ switch (info->cmd_type) {
+ case SCHEDULE_VPE:
+ its_vpe_schedule(vpe);
+ return 0;
+
+ case DESCHEDULE_VPE:
+ its_vpe_deschedule(vpe);
+ return 0;
+
+ case INVALL_VPE:
+ its_vpe_invall(vpe);
+ return 0;
+
+ default:
+ return -EINVAL;
+ }
+}
+
+static void its_vpe_send_cmd(struct its_vpe *vpe,
+ void (*cmd)(struct its_device *, u32))
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&vpe_proxy.lock, flags);
+
+ its_vpe_db_proxy_map_locked(vpe);
+ cmd(vpe_proxy.dev, vpe->vpe_proxy_event);
+
+ raw_spin_unlock_irqrestore(&vpe_proxy.lock, flags);
+}
+
+static void its_vpe_send_inv(struct irq_data *d)
+{
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+
+ if (gic_rdists->has_direct_lpi) {
+ void __iomem *rdbase;
+
+ /* Target the redistributor this VPE is currently known on */
+ raw_spin_lock(&gic_data_rdist_cpu(vpe->col_idx)->rd_lock);
+ rdbase = per_cpu_ptr(gic_rdists->rdist, vpe->col_idx)->rd_base;
+ gic_write_lpir(d->parent_data->hwirq, rdbase + GICR_INVLPIR);
+ wait_for_syncr(rdbase);
+ raw_spin_unlock(&gic_data_rdist_cpu(vpe->col_idx)->rd_lock);
+ } else {
+ its_vpe_send_cmd(vpe, its_send_inv);
+ }
+}
+
+static void its_vpe_mask_irq(struct irq_data *d)
+{
+ /*
+ * We need to unmask the LPI, which is described by the parent
+ * irq_data. Instead of calling into the parent (which won't
+	 * exactly do the right thing), let's simply use the
+ * parent_data pointer. Yes, I'm naughty.
+ */
+ lpi_write_config(d->parent_data, LPI_PROP_ENABLED, 0);
+ its_vpe_send_inv(d);
+}
+
+static void its_vpe_unmask_irq(struct irq_data *d)
+{
+ /* Same hack as above... */
+ lpi_write_config(d->parent_data, 0, LPI_PROP_ENABLED);
+ its_vpe_send_inv(d);
+}
+
+static int its_vpe_set_irqchip_state(struct irq_data *d,
+ enum irqchip_irq_state which,
+ bool state)
+{
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+
+ if (which != IRQCHIP_STATE_PENDING)
+ return -EINVAL;
+
+ if (gic_rdists->has_direct_lpi) {
+ void __iomem *rdbase;
+
+ rdbase = per_cpu_ptr(gic_rdists->rdist, vpe->col_idx)->rd_base;
+ if (state) {
+ gic_write_lpir(vpe->vpe_db_lpi, rdbase + GICR_SETLPIR);
+ } else {
+ gic_write_lpir(vpe->vpe_db_lpi, rdbase + GICR_CLRLPIR);
+ wait_for_syncr(rdbase);
+ }
+ } else {
+ if (state)
+ its_vpe_send_cmd(vpe, its_send_int);
+ else
+ its_vpe_send_cmd(vpe, its_send_clear);
+ }
+
+ return 0;
+}
+
+static int its_vpe_retrigger(struct irq_data *d)
+{
+ return !its_vpe_set_irqchip_state(d, IRQCHIP_STATE_PENDING, true);
+}
+
+static struct irq_chip its_vpe_irq_chip = {
+ .name = "GICv4-vpe",
+ .irq_mask = its_vpe_mask_irq,
+ .irq_unmask = its_vpe_unmask_irq,
+ .irq_eoi = irq_chip_eoi_parent,
+ .irq_set_affinity = its_vpe_set_affinity,
+ .irq_retrigger = its_vpe_retrigger,
+ .irq_set_irqchip_state = its_vpe_set_irqchip_state,
+ .irq_set_vcpu_affinity = its_vpe_set_vcpu_affinity,
+};
+
+static struct its_node *find_4_1_its(void)
+{
+ static struct its_node *its = NULL;
+
+ if (!its) {
+ list_for_each_entry(its, &its_nodes, entry) {
+ if (is_v4_1(its))
+ return its;
+ }
+
+ /* Oops? */
+ its = NULL;
+ }
+
+ return its;
+}
+
+static void its_vpe_4_1_send_inv(struct irq_data *d)
+{
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+ struct its_node *its;
+
+ /*
+ * GICv4.1 wants doorbells to be invalidated using the
+ * INVDB command in order to be broadcast to all RDs. Send
+ * it to the first valid ITS, and let the HW do its magic.
+ */
+ its = find_4_1_its();
+ if (its)
+ its_send_invdb(its, vpe);
+}
+
+static void its_vpe_4_1_mask_irq(struct irq_data *d)
+{
+ lpi_write_config(d->parent_data, LPI_PROP_ENABLED, 0);
+ its_vpe_4_1_send_inv(d);
+}
+
+static void its_vpe_4_1_unmask_irq(struct irq_data *d)
+{
+ lpi_write_config(d->parent_data, 0, LPI_PROP_ENABLED);
+ its_vpe_4_1_send_inv(d);
+}
+
+static void its_vpe_4_1_schedule(struct its_vpe *vpe,
+ struct its_cmd_info *info)
+{
+ void __iomem *vlpi_base = gic_data_rdist_vlpi_base();
+ u64 val = 0;
+
+ /* Schedule the VPE */
+ val |= GICR_VPENDBASER_Valid;
+ val |= info->g0en ? GICR_VPENDBASER_4_1_VGRP0EN : 0;
+ val |= info->g1en ? GICR_VPENDBASER_4_1_VGRP1EN : 0;
+ val |= FIELD_PREP(GICR_VPENDBASER_4_1_VPEID, vpe->vpe_id);
+
+ gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER);
+
+ its_wait_vpt_parse_complete();
+}
+
+static void its_vpe_4_1_deschedule(struct its_vpe *vpe,
+ struct its_cmd_info *info)
+{
+ void __iomem *vlpi_base = gic_data_rdist_vlpi_base();
+ u64 val;
+
+ if (info->req_db) {
+ unsigned long flags;
+
+ /*
+ * vPE is going to block: make the vPE non-resident with
+ * PendingLast clear and DB set. The GIC guarantees that if
+ * we read-back PendingLast clear, then a doorbell will be
+ * delivered when an interrupt comes.
+ *
+ * Note the locking: the doorbell interrupt handler can update
+ * pending_last concurrently with this path.
+ */
+ raw_spin_lock_irqsave(&vpe->vpe_lock, flags);
+ val = its_clear_vpend_valid(vlpi_base,
+ GICR_VPENDBASER_PendingLast,
+ GICR_VPENDBASER_4_1_DB);
+ vpe->pending_last = !!(val & GICR_VPENDBASER_PendingLast);
+ raw_spin_unlock_irqrestore(&vpe->vpe_lock, flags);
+ } else {
+ /*
+ * We're not blocking, so just make the vPE non-resident
+ * with PendingLast set, indicating that we'll be back.
+ */
+ val = its_clear_vpend_valid(vlpi_base,
+ 0,
+ GICR_VPENDBASER_PendingLast);
+ vpe->pending_last = true;
+ }
+}
+
+static void its_vpe_4_1_invall(struct its_vpe *vpe)
+{
+ void __iomem *rdbase;
+ unsigned long flags;
+ u64 val;
+ int cpu;
+
+ val = GICR_INVALLR_V;
+ val |= FIELD_PREP(GICR_INVALLR_VPEID, vpe->vpe_id);
+
+ /* Target the redistributor this vPE is currently known on */
+ cpu = vpe_to_cpuid_lock(vpe, &flags);
+ raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
+ rdbase = per_cpu_ptr(gic_rdists->rdist, cpu)->rd_base;
+ gic_write_lpir(val, rdbase + GICR_INVALLR);
+
+ wait_for_syncr(rdbase);
+ raw_spin_unlock(&gic_data_rdist_cpu(cpu)->rd_lock);
+ vpe_to_cpuid_unlock(vpe, flags);
+}
+
+static int its_vpe_4_1_set_vcpu_affinity(struct irq_data *d, void *vcpu_info)
+{
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+ struct its_cmd_info *info = vcpu_info;
+
+ switch (info->cmd_type) {
+ case SCHEDULE_VPE:
+ its_vpe_4_1_schedule(vpe, info);
+ return 0;
+
+ case DESCHEDULE_VPE:
+ its_vpe_4_1_deschedule(vpe, info);
+ return 0;
+
+ case INVALL_VPE:
+ its_vpe_4_1_invall(vpe);
+ return 0;
+
+ default:
+ return -EINVAL;
+ }
+}
+
+static struct irq_chip its_vpe_4_1_irq_chip = {
+ .name = "GICv4.1-vpe",
+ .irq_mask = its_vpe_4_1_mask_irq,
+ .irq_unmask = its_vpe_4_1_unmask_irq,
+ .irq_eoi = irq_chip_eoi_parent,
+ .irq_set_affinity = its_vpe_set_affinity,
+ .irq_set_vcpu_affinity = its_vpe_4_1_set_vcpu_affinity,
+};
+
+static void its_configure_sgi(struct irq_data *d, bool clear)
+{
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+ struct its_cmd_desc desc;
+
+ desc.its_vsgi_cmd.vpe = vpe;
+ desc.its_vsgi_cmd.sgi = d->hwirq;
+ desc.its_vsgi_cmd.priority = vpe->sgi_config[d->hwirq].priority;
+ desc.its_vsgi_cmd.enable = vpe->sgi_config[d->hwirq].enabled;
+ desc.its_vsgi_cmd.group = vpe->sgi_config[d->hwirq].group;
+ desc.its_vsgi_cmd.clear = clear;
+
+ /*
+ * GICv4.1 allows us to send VSGI commands to any ITS as long as the
+ * destination VPE is mapped there. Since we map them eagerly at
+ * activation time, we're pretty sure the first GICv4.1 ITS will do.
+ */
+ its_send_single_vcommand(find_4_1_its(), its_build_vsgi_cmd, &desc);
+}
+
+static void its_sgi_mask_irq(struct irq_data *d)
+{
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+
+ vpe->sgi_config[d->hwirq].enabled = false;
+ its_configure_sgi(d, false);
+}
+
+static void its_sgi_unmask_irq(struct irq_data *d)
+{
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+
+ vpe->sgi_config[d->hwirq].enabled = true;
+ its_configure_sgi(d, false);
+}
+
+static int its_sgi_set_affinity(struct irq_data *d,
+ const struct cpumask *mask_val,
+ bool force)
+{
+ /*
+ * There is no notion of affinity for virtual SGIs, at least
+ * not on the host (since they can only be targeting a vPE).
+ * Tell the kernel we've done whatever it asked for.
+ */
+ irq_data_update_effective_affinity(d, mask_val);
+ return IRQ_SET_MASK_OK;
+}
+
+static int its_sgi_set_irqchip_state(struct irq_data *d,
+ enum irqchip_irq_state which,
+ bool state)
+{
+ if (which != IRQCHIP_STATE_PENDING)
+ return -EINVAL;
+
+ if (state) {
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+ struct its_node *its = find_4_1_its();
+ u64 val;
+
+ val = FIELD_PREP(GITS_SGIR_VPEID, vpe->vpe_id);
+ val |= FIELD_PREP(GITS_SGIR_VINTID, d->hwirq);
+ writeq_relaxed(val, its->sgir_base + GITS_SGIR - SZ_128K);
+ } else {
+ its_configure_sgi(d, true);
+ }
+
+ return 0;
+}
+
+static int its_sgi_get_irqchip_state(struct irq_data *d,
+ enum irqchip_irq_state which, bool *val)
+{
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+ void __iomem *base;
+ unsigned long flags;
+ u32 count = 1000000; /* 1s! */
+ u32 status;
+ int cpu;
+
+ if (which != IRQCHIP_STATE_PENDING)
+ return -EINVAL;
+
+ /*
+ * Locking galore! We can race against two different events:
+ *
+ * - Concurrent vPE affinity change: we must make sure it cannot
+ * happen, or we'll talk to the wrong redistributor. This is
+ * identical to what happens with vLPIs.
+ *
+ * - Concurrent VSGIPENDR access: As it involves accessing two
+ * MMIO registers, this must be made atomic one way or another.
+ */
+ cpu = vpe_to_cpuid_lock(vpe, &flags);
+ raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
+ base = gic_data_rdist_cpu(cpu)->rd_base + SZ_128K;
+ writel_relaxed(vpe->vpe_id, base + GICR_VSGIR);
+ do {
+ status = readl_relaxed(base + GICR_VSGIPENDR);
+ if (!(status & GICR_VSGIPENDR_BUSY))
+ goto out;
+
+ count--;
+ if (!count) {
+ pr_err_ratelimited("Unable to get SGI status\n");
+ goto out;
+ }
+ cpu_relax();
+ udelay(1);
+ } while (count);
+
+out:
+ raw_spin_unlock(&gic_data_rdist_cpu(cpu)->rd_lock);
+ vpe_to_cpuid_unlock(vpe, flags);
+
+ if (!count)
+ return -ENXIO;
+
+ *val = !!(status & (1 << d->hwirq));
+
+ return 0;
+}
+
+static int its_sgi_set_vcpu_affinity(struct irq_data *d, void *vcpu_info)
+{
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+ struct its_cmd_info *info = vcpu_info;
+
+ switch (info->cmd_type) {
+ case PROP_UPDATE_VSGI:
+ vpe->sgi_config[d->hwirq].priority = info->priority;
+ vpe->sgi_config[d->hwirq].group = info->group;
+ its_configure_sgi(d, false);
+ return 0;
+
+ default:
+ return -EINVAL;
+ }
+}
+
+static struct irq_chip its_sgi_irq_chip = {
+ .name = "GICv4.1-sgi",
+ .irq_mask = its_sgi_mask_irq,
+ .irq_unmask = its_sgi_unmask_irq,
+ .irq_set_affinity = its_sgi_set_affinity,
+ .irq_set_irqchip_state = its_sgi_set_irqchip_state,
+ .irq_get_irqchip_state = its_sgi_get_irqchip_state,
+ .irq_set_vcpu_affinity = its_sgi_set_vcpu_affinity,
+};
+
+static int its_sgi_irq_domain_alloc(struct irq_domain *domain,
+ unsigned int virq, unsigned int nr_irqs,
+ void *args)
+{
+ struct its_vpe *vpe = args;
+ int i;
+
+ /* Yes, we do want 16 SGIs */
+ WARN_ON(nr_irqs != 16);
+
+ for (i = 0; i < 16; i++) {
+ vpe->sgi_config[i].priority = 0;
+ vpe->sgi_config[i].enabled = false;
+ vpe->sgi_config[i].group = false;
+
+ irq_domain_set_hwirq_and_chip(domain, virq + i, i,
+ &its_sgi_irq_chip, vpe);
+ irq_set_status_flags(virq + i, IRQ_DISABLE_UNLAZY);
+ }
+
+ return 0;
+}
+
+static void its_sgi_irq_domain_free(struct irq_domain *domain,
+ unsigned int virq,
+ unsigned int nr_irqs)
+{
+ /* Nothing to do */
+}
+
+static int its_sgi_irq_domain_activate(struct irq_domain *domain,
+ struct irq_data *d, bool reserve)
+{
+ /* Write out the initial SGI configuration */
+ its_configure_sgi(d, false);
+ return 0;
+}
+
+static void its_sgi_irq_domain_deactivate(struct irq_domain *domain,
+ struct irq_data *d)
+{
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+
+ /*
+ * The VSGI command is awkward:
+ *
+ * - To change the configuration, CLEAR must be set to false,
+ * leaving the pending bit unchanged.
+ * - To clear the pending bit, CLEAR must be set to true, leaving
+ * the configuration unchanged.
+ *
+ * You just can't do both at once, hence the two commands below.
+ */
+ vpe->sgi_config[d->hwirq].enabled = false;
+ its_configure_sgi(d, false);
+ its_configure_sgi(d, true);
+}
+
+static const struct irq_domain_ops its_sgi_domain_ops = {
+ .alloc = its_sgi_irq_domain_alloc,
+ .free = its_sgi_irq_domain_free,
+ .activate = its_sgi_irq_domain_activate,
+ .deactivate = its_sgi_irq_domain_deactivate,
+};
+
+static int its_vpe_id_alloc(void)
+{
+ return ida_simple_get(&its_vpeid_ida, 0, ITS_MAX_VPEID, GFP_KERNEL);
+}
+
+static void its_vpe_id_free(u16 id)
+{
+ ida_simple_remove(&its_vpeid_ida, id);
+}
+
+static int its_vpe_init(struct its_vpe *vpe)
+{
+ struct page *vpt_page;
+ int vpe_id;
+
+ /* Allocate vpe_id */
+ vpe_id = its_vpe_id_alloc();
+ if (vpe_id < 0)
+ return vpe_id;
+
+ /* Allocate VPT */
+ vpt_page = its_allocate_pending_table(GFP_KERNEL);
+ if (!vpt_page) {
+ its_vpe_id_free(vpe_id);
+ return -ENOMEM;
+ }
+
+ if (!its_alloc_vpe_table(vpe_id)) {
+ its_vpe_id_free(vpe_id);
+ its_free_pending_table(vpt_page);
+ return -ENOMEM;
+ }
+
+ raw_spin_lock_init(&vpe->vpe_lock);
+ vpe->vpe_id = vpe_id;
+ vpe->vpt_page = vpt_page;
+ if (gic_rdists->has_rvpeid)
+ atomic_set(&vpe->vmapp_count, 0);
+ else
+ vpe->vpe_proxy_event = -1;
+
+ return 0;
+}
+
+static void its_vpe_teardown(struct its_vpe *vpe)
+{
+ its_vpe_db_proxy_unmap(vpe);
+ its_vpe_id_free(vpe->vpe_id);
+ its_free_pending_table(vpe->vpt_page);
+}
+
+static void its_vpe_irq_domain_free(struct irq_domain *domain,
+ unsigned int virq,
+ unsigned int nr_irqs)
+{
+ struct its_vm *vm = domain->host_data;
+ int i;
+
+ irq_domain_free_irqs_parent(domain, virq, nr_irqs);
+
+ for (i = 0; i < nr_irqs; i++) {
+ struct irq_data *data = irq_domain_get_irq_data(domain,
+ virq + i);
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(data);
+
+ BUG_ON(vm != vpe->its_vm);
+
+ clear_bit(data->hwirq, vm->db_bitmap);
+ its_vpe_teardown(vpe);
+ irq_domain_reset_irq_data(data);
+ }
+
+ if (bitmap_empty(vm->db_bitmap, vm->nr_db_lpis)) {
+ its_lpi_free(vm->db_bitmap, vm->db_lpi_base, vm->nr_db_lpis);
+ its_free_prop_table(vm->vprop_page);
+ }
+}
+
+static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
+ unsigned int nr_irqs, void *args)
+{
+ struct irq_chip *irqchip = &its_vpe_irq_chip;
+ struct its_vm *vm = args;
+ unsigned long *bitmap;
+ struct page *vprop_page;
+ int base, nr_ids, i, err = 0;
+
+ BUG_ON(!vm);
+
+ bitmap = its_lpi_alloc(roundup_pow_of_two(nr_irqs), &base, &nr_ids);
+ if (!bitmap)
+ return -ENOMEM;
+
+ if (nr_ids < nr_irqs) {
+ its_lpi_free(bitmap, base, nr_ids);
+ return -ENOMEM;
+ }
+
+ vprop_page = its_allocate_prop_table(GFP_KERNEL);
+ if (!vprop_page) {
+ its_lpi_free(bitmap, base, nr_ids);
+ return -ENOMEM;
+ }
+
+ vm->db_bitmap = bitmap;
+ vm->db_lpi_base = base;
+ vm->nr_db_lpis = nr_ids;
+ vm->vprop_page = vprop_page;
+
+ if (gic_rdists->has_rvpeid)
+ irqchip = &its_vpe_4_1_irq_chip;
+
+ for (i = 0; i < nr_irqs; i++) {
+ vm->vpes[i]->vpe_db_lpi = base + i;
+ err = its_vpe_init(vm->vpes[i]);
+ if (err)
+ break;
+ err = its_irq_gic_domain_alloc(domain, virq + i,
+ vm->vpes[i]->vpe_db_lpi);
+ if (err)
+ break;
+ irq_domain_set_hwirq_and_chip(domain, virq + i, i,
+ irqchip, vm->vpes[i]);
+ set_bit(i, bitmap);
+ }
+
+ if (err) {
+ if (i > 0)
+ its_vpe_irq_domain_free(domain, virq, i);
+
+ its_lpi_free(bitmap, base, nr_ids);
+ its_free_prop_table(vprop_page);
+ }
+
+ return err;
+}
+
+static int its_vpe_irq_domain_activate(struct irq_domain *domain,
+ struct irq_data *d, bool reserve)
+{
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+ struct its_node *its;
+
+ /*
+ * If we use the list map, we issue VMAPP on demand... Unless
+ * we're on a GICv4.1 and we eagerly map the VPE on all ITSs
+ * so that VSGIs can work.
+ */
+ if (!gic_requires_eager_mapping())
+ return 0;
+
+ /* Map the VPE to the first possible CPU */
+ vpe->col_idx = cpumask_first(cpu_online_mask);
+
+ list_for_each_entry(its, &its_nodes, entry) {
+ if (!is_v4(its))
+ continue;
+
+ its_send_vmapp(its, vpe, true);
+ its_send_vinvall(its, vpe);
+ }
+
+ irq_data_update_effective_affinity(d, cpumask_of(vpe->col_idx));
+
+ return 0;
+}
+
+static void its_vpe_irq_domain_deactivate(struct irq_domain *domain,
+ struct irq_data *d)
+{
+ struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+ struct its_node *its;
+
+ /*
+ * If we use the list map on GICv4.0, we unmap the VPE once no
+ * VLPIs are associated with the VM.
+ */
+ if (!gic_requires_eager_mapping())
+ return;
+
+ list_for_each_entry(its, &its_nodes, entry) {
+ if (!is_v4(its))
+ continue;
+
+ its_send_vmapp(its, vpe, false);
+ }
+}
+
+static const struct irq_domain_ops its_vpe_domain_ops = {
+ .alloc = its_vpe_irq_domain_alloc,
+ .free = its_vpe_irq_domain_free,
+ .activate = its_vpe_irq_domain_activate,
+ .deactivate = its_vpe_irq_domain_deactivate,
+};
+
+static int its_force_quiescent(void __iomem *base)
+{
+ u32 count = 1000000; /* 1s */
+ u32 val;
+
+ val = readl_relaxed(base + GITS_CTLR);
+ /*
+ * GIC architecture specification requires the ITS to be both
+ * disabled and quiescent for writes to GITS_BASER<n> or
+ * GITS_CBASER to not have UNPREDICTABLE results.
+ */
+ if ((val & GITS_CTLR_QUIESCENT) && !(val & GITS_CTLR_ENABLE))
+ return 0;
+
+ /* Disable the generation of all interrupts to this ITS */
+ val &= ~(GITS_CTLR_ENABLE | GITS_CTLR_ImDe);
+ writel_relaxed(val, base + GITS_CTLR);
+
+ /* Poll GITS_CTLR and wait until ITS becomes quiescent */
+ while (1) {
+ val = readl_relaxed(base + GITS_CTLR);
+ if (val & GITS_CTLR_QUIESCENT)
+ return 0;
+
+ count--;
+ if (!count)
+ return -EBUSY;
+
+ cpu_relax();
+ udelay(1);
+ }
+}
+
+static bool __maybe_unused its_enable_quirk_cavium_22375(void *data)
+{
+ struct its_node *its = data;
+
+ /* erratum 22375: only alloc 8MB table size (20 bits) */
+ its->typer &= ~GITS_TYPER_DEVBITS;
+ its->typer |= FIELD_PREP(GITS_TYPER_DEVBITS, 20 - 1);
+ its->flags |= ITS_FLAGS_WORKAROUND_CAVIUM_22375;
+
+ return true;
+}
+
+static bool __maybe_unused its_enable_quirk_cavium_23144(void *data)
+{
+ struct its_node *its = data;
+
+ its->flags |= ITS_FLAGS_WORKAROUND_CAVIUM_23144;
+
+ return true;
+}
+
+static bool __maybe_unused its_enable_quirk_qdf2400_e0065(void *data)
+{
+ struct its_node *its = data;
+
+ /* On QDF2400, the size of the ITE is 16 bytes */
+ its->typer &= ~GITS_TYPER_ITT_ENTRY_SIZE;
+ its->typer |= FIELD_PREP(GITS_TYPER_ITT_ENTRY_SIZE, 16 - 1);
+
+ return true;
+}
+
+static u64 its_irq_get_msi_base_pre_its(struct its_device *its_dev)
+{
+ struct its_node *its = its_dev->its;
+
+ /*
+ * The Socionext Synquacer SoC has a so-called 'pre-ITS',
+ * which maps 32-bit writes targeted at a separate window of
+ * size '4 << device_id_bits' onto writes to GITS_TRANSLATER
+ * with device ID taken from bits [device_id_bits + 1:2] of
+ * the window offset.
+ */
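+ /*
+ * For example, a device with ID 0x10 ends up writing its MSIs at
+ * pre_its_base + (0x10 << 2) = pre_its_base + 0x40.
+ */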
+ return its->pre_its_base + (its_dev->device_id << 2);
+}
+
+static bool __maybe_unused its_enable_quirk_socionext_synquacer(void *data)
+{
+ struct its_node *its = data;
+ u32 pre_its_window[2];
+ u32 ids;
+
+ if (!fwnode_property_read_u32_array(its->fwnode_handle,
+ "socionext,synquacer-pre-its",
+ pre_its_window,
+ ARRAY_SIZE(pre_its_window))) {
+
+ its->pre_its_base = pre_its_window[0];
+ its->get_msi_base = its_irq_get_msi_base_pre_its;
+
+ ids = ilog2(pre_its_window[1]) - 2;
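+ /*
+ * Illustration: a 1MB pre-ITS window (0x100000 bytes) gives
+ * ilog2(0x100000) - 2 = 18 usable device ID bits, consistent
+ * with the '4 << device_id_bits' window size described above.
+ */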
+ if (device_ids(its) > ids) {
+ its->typer &= ~GITS_TYPER_DEVBITS;
+ its->typer |= FIELD_PREP(GITS_TYPER_DEVBITS, ids - 1);
+ }
+
+ /* the pre-ITS breaks isolation, so disable MSI remapping */
+ its->msi_domain_flags &= ~IRQ_DOMAIN_FLAG_MSI_REMAP;
+ return true;
+ }
+ return false;
+}
+
+static bool __maybe_unused its_enable_quirk_hip07_161600802(void *data)
+{
+ struct its_node *its = data;
+
+ /*
+ * Hip07 insists on using the wrong address for the VLPI
+ * page. Trick it into doing the right thing...
+ */
+ its->vlpi_redist_offset = SZ_128K;
+ return true;
+}
+
+static const struct gic_quirk its_quirks[] = {
+#ifdef CONFIG_CAVIUM_ERRATUM_22375
+ {
+ .desc = "ITS: Cavium errata 22375, 24313",
+ .iidr = 0xa100034c, /* ThunderX pass 1.x */
+ .mask = 0xffff0fff,
+ .init = its_enable_quirk_cavium_22375,
+ },
+#endif
+#ifdef CONFIG_CAVIUM_ERRATUM_23144
+ {
+ .desc = "ITS: Cavium erratum 23144",
+ .iidr = 0xa100034c, /* ThunderX pass 1.x */
+ .mask = 0xffff0fff,
+ .init = its_enable_quirk_cavium_23144,
+ },
+#endif
+#ifdef CONFIG_QCOM_QDF2400_ERRATUM_0065
+ {
+ .desc = "ITS: QDF2400 erratum 0065",
+ .iidr = 0x00001070, /* QDF2400 ITS rev 1.x */
+ .mask = 0xffffffff,
+ .init = its_enable_quirk_qdf2400_e0065,
+ },
+#endif
+#ifdef CONFIG_SOCIONEXT_SYNQUACER_PREITS
+ {
+ /*
+ * The Socionext Synquacer SoC incorporates ARM's own GIC-500
+ * implementation, but with a 'pre-ITS' added that requires
+ * special handling in software.
+ */
+ .desc = "ITS: Socionext Synquacer pre-ITS",
+ .iidr = 0x0001143b,
+ .mask = 0xffffffff,
+ .init = its_enable_quirk_socionext_synquacer,
+ },
+#endif
+#ifdef CONFIG_HISILICON_ERRATUM_161600802
+ {
+ .desc = "ITS: Hip07 erratum 161600802",
+ .iidr = 0x00000004,
+ .mask = 0xffffffff,
+ .init = its_enable_quirk_hip07_161600802,
+ },
+#endif
+ {
+ }
+};
+
+static void its_enable_quirks(struct its_node *its)
+{
+ u32 iidr = readl_relaxed(its->base + GITS_IIDR);
+
+ gic_enable_quirks(iidr, its_quirks, its);
+}
+
+static int its_save_disable(void)
+{
+ struct its_node *its;
+ int err = 0;
+
+ raw_spin_lock(&its_lock);
+ list_for_each_entry(its, &its_nodes, entry) {
+ void __iomem *base;
+
+ base = its->base;
+ its->ctlr_save = readl_relaxed(base + GITS_CTLR);
+ err = its_force_quiescent(base);
+ if (err) {
+ pr_err("ITS@%pa: failed to quiesce: %d\n",
+ &its->phys_base, err);
+ writel_relaxed(its->ctlr_save, base + GITS_CTLR);
+ goto err;
+ }
+
+ its->cbaser_save = gits_read_cbaser(base + GITS_CBASER);
+ }
+
+err:
+ if (err) {
+ list_for_each_entry_continue_reverse(its, &its_nodes, entry) {
+ void __iomem *base;
+
+ base = its->base;
+ writel_relaxed(its->ctlr_save, base + GITS_CTLR);
+ }
+ }
+ raw_spin_unlock(&its_lock);
+
+ return err;
+}
+
+static void its_restore_enable(void)
+{
+ struct its_node *its;
+ int ret;
+
+ raw_spin_lock(&its_lock);
+ list_for_each_entry(its, &its_nodes, entry) {
+ void __iomem *base;
+ int i;
+
+ base = its->base;
+
+ /*
+ * Make sure that the ITS is disabled. If it fails to quiesce,
+ * don't restore it since writing to CBASER or BASER<n>
+ * registers is undefined according to the GIC v3 ITS
+ * Specification.
+ *
+ * Firmware resuming with the ITS enabled is terminally broken.
+ */
+ WARN_ON(readl_relaxed(base + GITS_CTLR) & GITS_CTLR_ENABLE);
+ ret = its_force_quiescent(base);
+ if (ret) {
+ pr_err("ITS@%pa: failed to quiesce on resume: %d\n",
+ &its->phys_base, ret);
+ continue;
+ }
+
+ gits_write_cbaser(its->cbaser_save, base + GITS_CBASER);
+
+ /*
+ * Writing CBASER resets CREADR to 0, so make CWRITER and
+ * cmd_write line up with it.
+ */
+ its->cmd_write = its->cmd_base;
+ gits_write_cwriter(0, base + GITS_CWRITER);
+
+ /* Restore GITS_BASER from the value cache. */
+ for (i = 0; i < GITS_BASER_NR_REGS; i++) {
+ struct its_baser *baser = &its->tables[i];
+
+ if (!(baser->val & GITS_BASER_VALID))
+ continue;
+
+ its_write_baser(its, baser, baser->val);
+ }
+ writel_relaxed(its->ctlr_save, base + GITS_CTLR);
+
+ /*
+ * Reinit the collection if it's stored in the ITS. This is
+ * indicated by the col_id being less than the HCC field.
+ * CID < HCC as specified in the GIC v3 Documentation.
+ */
+ if (its->collections[smp_processor_id()].col_id <
+ GITS_TYPER_HCC(gic_read_typer(base + GITS_TYPER)))
+ its_cpu_init_collection(its);
+ }
+ raw_spin_unlock(&its_lock);
+}
+
+static struct syscore_ops its_syscore_ops = {
+ .suspend = its_save_disable,
+ .resume = its_restore_enable,
+};
+
+static int its_init_domain(struct fwnode_handle *handle, struct its_node *its)
+{
+ struct irq_domain *inner_domain;
+ struct msi_domain_info *info;
+
+ info = kzalloc(sizeof(*info), GFP_KERNEL);
+ if (!info)
+ return -ENOMEM;
+
+ inner_domain = irq_domain_create_tree(handle, &its_domain_ops, its);
+ if (!inner_domain) {
+ kfree(info);
+ return -ENOMEM;
+ }
+
+ inner_domain->parent = its_parent;
+ irq_domain_update_bus_token(inner_domain, DOMAIN_BUS_NEXUS);
+ inner_domain->flags |= its->msi_domain_flags;
+ info->ops = &its_msi_domain_ops;
+ info->data = its;
+ inner_domain->host_data = info;
+
+ return 0;
+}
+
+static int its_init_vpe_domain(void)
+{
+ struct its_node *its;
+ u32 devid;
+ int entries;
+
+ if (gic_rdists->has_direct_lpi) {
+ pr_info("ITS: Using DirectLPI for VPE invalidation\n");
+ return 0;
+ }
+
+ /* Any ITS will do, even if not v4 */
+ its = list_first_entry(&its_nodes, struct its_node, entry);
+
+ entries = roundup_pow_of_two(nr_cpu_ids);
+ vpe_proxy.vpes = kcalloc(entries, sizeof(*vpe_proxy.vpes),
+ GFP_KERNEL);
+ if (!vpe_proxy.vpes) {
+ pr_err("ITS: Can't allocate GICv4 proxy device array\n");
+ return -ENOMEM;
+ }
+
+ /* Use the last possible DevID */
+ devid = GENMASK(device_ids(its) - 1, 0);
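+ /* e.g. with 16 device ID bits, this selects DevID 0xffff for the proxy. */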
+ vpe_proxy.dev = its_create_device(its, devid, entries, false);
+ if (!vpe_proxy.dev) {
+ kfree(vpe_proxy.vpes);
+ pr_err("ITS: Can't allocate GICv4 proxy device\n");
+ return -ENOMEM;
+ }
+
+ BUG_ON(entries > vpe_proxy.dev->nr_ites);
+
+ raw_spin_lock_init(&vpe_proxy.lock);
+ vpe_proxy.next_victim = 0;
+ pr_info("ITS: Allocated DevID %x as GICv4 proxy device (%d slots)\n",
+ devid, vpe_proxy.dev->nr_ites);
+
+ return 0;
+}
+
+static int __init its_compute_its_list_map(struct resource *res,
+ void __iomem *its_base)
+{
+ int its_number;
+ u32 ctlr;
+
+ /*
+ * This is assumed to be done early enough that we're
+ * guaranteed to be single-threaded, hence no
+ * locking. Should this change, we should address
+ * this.
+ */
+ its_number = find_first_zero_bit(&its_list_map, GICv4_ITS_LIST_MAX);
+ if (its_number >= GICv4_ITS_LIST_MAX) {
+ pr_err("ITS@%pa: No ITSList entry available!\n",
+ &res->start);
+ return -EINVAL;
+ }
+
+ ctlr = readl_relaxed(its_base + GITS_CTLR);
+ ctlr &= ~GITS_CTLR_ITS_NUMBER;
+ ctlr |= its_number << GITS_CTLR_ITS_NUMBER_SHIFT;
+ writel_relaxed(ctlr, its_base + GITS_CTLR);
+ ctlr = readl_relaxed(its_base + GITS_CTLR);
+ if ((ctlr & GITS_CTLR_ITS_NUMBER) != (its_number << GITS_CTLR_ITS_NUMBER_SHIFT)) {
+ its_number = ctlr & GITS_CTLR_ITS_NUMBER;
+ its_number >>= GITS_CTLR_ITS_NUMBER_SHIFT;
+ }
+
+ if (test_and_set_bit(its_number, &its_list_map)) {
+ pr_err("ITS@%pa: Duplicate ITSList entry %d\n",
+ &res->start, its_number);
+ return -EINVAL;
+ }
+
+ return its_number;
+}
+
+static int __init its_probe_one(struct resource *res,
+ struct fwnode_handle *handle, int numa_node)
+{
+ struct its_node *its;
+ void __iomem *its_base;
+ u32 val, ctlr;
+ u64 baser, tmp, typer;
+ struct page *page;
+ int err;
+
+ its_base = ioremap(res->start, SZ_64K);
+ if (!its_base) {
+ pr_warn("ITS@%pa: Unable to map ITS registers\n", &res->start);
+ return -ENOMEM;
+ }
+
+ val = readl_relaxed(its_base + GITS_PIDR2) & GIC_PIDR2_ARCH_MASK;
+ if (val != 0x30 && val != 0x40) {
+ pr_warn("ITS@%pa: No ITS detected, giving up\n", &res->start);
+ err = -ENODEV;
+ goto out_unmap;
+ }
+
+ err = its_force_quiescent(its_base);
+ if (err) {
+ pr_warn("ITS@%pa: Failed to quiesce, giving up\n", &res->start);
+ goto out_unmap;
+ }
+
+ pr_info("ITS %pR\n", res);
+
+ its = kzalloc(sizeof(*its), GFP_KERNEL);
+ if (!its) {
+ err = -ENOMEM;
+ goto out_unmap;
+ }
+
+ raw_spin_lock_init(&its->lock);
+ mutex_init(&its->dev_alloc_lock);
+ INIT_LIST_HEAD(&its->entry);
+ INIT_LIST_HEAD(&its->its_device_list);
+ typer = gic_read_typer(its_base + GITS_TYPER);
+ its->typer = typer;
+ its->base = its_base;
+ its->phys_base = res->start;
+ if (is_v4(its)) {
+ if (!(typer & GITS_TYPER_VMOVP)) {
+ err = its_compute_its_list_map(res, its_base);
+ if (err < 0)
+ goto out_free_its;
+
+ its->list_nr = err;
+
+ pr_info("ITS@%pa: Using ITS number %d\n",
+ &res->start, err);
+ } else {
+ pr_info("ITS@%pa: Single VMOVP capable\n", &res->start);
+ }
+
+ if (is_v4_1(its)) {
+ u32 svpet = FIELD_GET(GITS_TYPER_SVPET, typer);
+
+ its->sgir_base = ioremap(res->start + SZ_128K, SZ_64K);
+ if (!its->sgir_base) {
+ err = -ENOMEM;
+ goto out_free_its;
+ }
+
+ its->mpidr = readl_relaxed(its_base + GITS_MPIDR);
+
+ pr_info("ITS@%pa: Using GICv4.1 mode %08x %08x\n",
+ &res->start, its->mpidr, svpet);
+ }
+ }
+
+ its->numa_node = numa_node;
+
+ page = alloc_pages_node(its->numa_node, GFP_KERNEL | __GFP_ZERO,
+ get_order(ITS_CMD_QUEUE_SZ));
+ if (!page) {
+ err = -ENOMEM;
+ goto out_unmap_sgir;
+ }
+ its->cmd_base = (void *)page_address(page);
+ its->cmd_write = its->cmd_base;
+ its->fwnode_handle = handle;
+ its->get_msi_base = its_irq_get_msi_base;
+ its->msi_domain_flags = IRQ_DOMAIN_FLAG_MSI_REMAP;
+
+ its_enable_quirks(its);
+
+ err = its_alloc_tables(its);
+ if (err)
+ goto out_free_cmd;
+
+ err = its_alloc_collections(its);
+ if (err)
+ goto out_free_tables;
+
+ baser = (virt_to_phys(its->cmd_base) |
+ GITS_CBASER_RaWaWb |
+ GITS_CBASER_InnerShareable |
+ (ITS_CMD_QUEUE_SZ / SZ_4K - 1) |
+ GITS_CBASER_VALID);
+
+ gits_write_cbaser(baser, its->base + GITS_CBASER);
+ tmp = gits_read_cbaser(its->base + GITS_CBASER);
+
+ if ((tmp ^ baser) & GITS_CBASER_SHAREABILITY_MASK) {
+ if (!(tmp & GITS_CBASER_SHAREABILITY_MASK)) {
+ /*
+ * The HW reports non-shareable, we must
+ * remove the cacheability attributes as
+ * well.
+ */
+ baser &= ~(GITS_CBASER_SHAREABILITY_MASK |
+ GITS_CBASER_CACHEABILITY_MASK);
+ baser |= GITS_CBASER_nC;
+ gits_write_cbaser(baser, its->base + GITS_CBASER);
+ }
+ pr_info("ITS: using cache flushing for cmd queue\n");
+ its->flags |= ITS_FLAGS_CMDQ_NEEDS_FLUSHING;
+ }
+
+ gits_write_cwriter(0, its->base + GITS_CWRITER);
+ ctlr = readl_relaxed(its->base + GITS_CTLR);
+ ctlr |= GITS_CTLR_ENABLE;
+ if (is_v4(its))
+ ctlr |= GITS_CTLR_ImDe;
+ writel_relaxed(ctlr, its->base + GITS_CTLR);
+
+ err = its_init_domain(handle, its);
+ if (err)
+ goto out_free_tables;
+
+ raw_spin_lock(&its_lock);
+ list_add(&its->entry, &its_nodes);
+ raw_spin_unlock(&its_lock);
+
+ return 0;
+
+out_free_tables:
+ its_free_tables(its);
+out_free_cmd:
+ free_pages((unsigned long)its->cmd_base, get_order(ITS_CMD_QUEUE_SZ));
+out_unmap_sgir:
+ if (its->sgir_base)
+ iounmap(its->sgir_base);
+out_free_its:
+ kfree(its);
+out_unmap:
+ iounmap(its_base);
+ pr_err("ITS@%pa: failed probing (%d)\n", &res->start, err);
+ return err;
+}
+
+static bool gic_rdists_supports_plpis(void)
+{
+ return !!(gic_read_typer(gic_data_rdist_rd_base() + GICR_TYPER) & GICR_TYPER_PLPIS);
+}
+
+static int redist_disable_lpis(void)
+{
+ void __iomem *rbase = gic_data_rdist_rd_base();
+ u64 timeout = USEC_PER_SEC;
+ u64 val;
+
+ if (!gic_rdists_supports_plpis()) {
+ pr_info("CPU%d: LPIs not supported\n", smp_processor_id());
+ return -ENXIO;
+ }
+
+ val = readl_relaxed(rbase + GICR_CTLR);
+ if (!(val & GICR_CTLR_ENABLE_LPIS))
+ return 0;
+
+ /*
+ * If coming via a CPU hotplug event, we don't need to disable
+ * LPIs before trying to re-enable them. They are already
+ * configured and all is well in the world.
+ *
+ * If running with preallocated tables, there is nothing to do.
+ */
+ if (gic_data_rdist()->lpi_enabled ||
+ (gic_rdists->flags & RDIST_FLAGS_RD_TABLES_PREALLOCATED))
+ return 0;
+
+ /*
+ * From that point on, we only try to do some damage control.
+ */
+ pr_warn("GIC-2500: CPU%d: Booted with LPIs enabled, memory probably corrupted\n",
+ smp_processor_id());
+ add_taint(TAINT_CRAP, LOCKDEP_STILL_OK);
+
+ /* Disable LPIs */
+ val &= ~GICR_CTLR_ENABLE_LPIS;
+ writel_relaxed(val, rbase + GICR_CTLR);
+
+ /* Make sure any change to GICR_CTLR is observable by the GIC */
+ dsb(sy);
+
+ /*
+ * Software must observe RWP==0 after clearing GICR_CTLR.EnableLPIs
+ * from 1 to 0 before programming GICR_PEND{PROP}BASER registers.
+ * Error out if we time out waiting for RWP to clear.
+ */
+ while (readl_relaxed(rbase + GICR_CTLR) & GICR_CTLR_RWP) {
+ if (!timeout) {
+ pr_err("CPU%d: Timeout while disabling LPIs\n",
+ smp_processor_id());
+ return -ETIMEDOUT;
+ }
+ udelay(1);
+ timeout--;
+ }
+
+ /*
+ * After it has been written to 1, it is IMPLEMENTATION
+ * DEFINED whether GICR_CTLR.EnableLPI becomes RES1 or can be
+ * cleared to 0. Error out if clearing the bit failed.
+ */
+ if (readl_relaxed(rbase + GICR_CTLR) & GICR_CTLR_ENABLE_LPIS) {
+ pr_err("CPU%d: Failed to disable LPIs\n", smp_processor_id());
+ return -EBUSY;
+ }
+
+ return 0;
+}
+
+int phytium_its_cpu_init(void)
+{
+ if (!list_empty(&its_nodes)) {
+ int ret;
+
+ ret = redist_disable_lpis();
+ if (ret)
+ return ret;
+
+ its_cpu_init_lpis();
+ its_cpu_init_collections();
+ }
+
+ return 0;
+}
+
+static const struct of_device_id its_device_id[] = {
+ { .compatible = "arm,gic-phytium-2500-its", },
+ {},
+};
+
+static int __init its_of_probe(struct device_node *node)
+{
+ struct device_node *np;
+ struct resource res;
+
+ for (np = of_find_matching_node(node, its_device_id); np;
+ np = of_find_matching_node(np, its_device_id)) {
+ if (!of_device_is_available(np))
+ continue;
+ if (!of_property_read_bool(np, "msi-controller")) {
+ pr_warn("%pOF: no msi-controller property, ITS ignored\n",
+ np);
+ continue;
+ }
+
+ if (of_address_to_resource(np, 0, &res)) {
+ pr_warn("%pOF: no regs?\n", np);
+ continue;
+ }
+
+ its_probe_one(&res, &np->fwnode, of_node_to_nid(np));
+ }
+ return 0;
+}
+
+#ifdef CONFIG_ACPI
+
+#define ACPI_GICV3_ITS_MEM_SIZE (SZ_128K)
+
+#ifdef CONFIG_ACPI_NUMA
+struct its_srat_map {
+ /* numa node id */
+ u32 numa_node;
+ /* GIC ITS ID */
+ u32 its_id;
+};
+
+static struct its_srat_map *its_srat_maps __initdata;
+static int its_in_srat __initdata;
+
+static int __init acpi_get_its_numa_node(u32 its_id)
+{
+ int i;
+
+ for (i = 0; i < its_in_srat; i++) {
+ if (its_id == its_srat_maps[i].its_id)
+ return its_srat_maps[i].numa_node;
+ }
+ return NUMA_NO_NODE;
+}
+
+static int __init gic_acpi_match_srat_its(union acpi_subtable_headers *header,
+ const unsigned long end)
+{
+ return 0;
+}
+
+static int __init gic_acpi_parse_srat_its(union acpi_subtable_headers *header,
+ const unsigned long end)
+{
+ int node;
+ struct acpi_srat_gic_its_affinity *its_affinity;
+
+ its_affinity = (struct acpi_srat_gic_its_affinity *)header;
+ if (!its_affinity)
+ return -EINVAL;
+
+ if (its_affinity->header.length < sizeof(*its_affinity)) {
+ pr_err("SRAT: Invalid header length %d in ITS affinity\n",
+ its_affinity->header.length);
+ return -EINVAL;
+ }
+
+ /*
+ * Note that in theory a new proximity node could be created by this
+ * entry as it is an SRAT resource allocation structure.
+ * We do not currently support doing so.
+ */
+ node = pxm_to_node(its_affinity->proximity_domain);
+
+ if (node == NUMA_NO_NODE || node >= MAX_NUMNODES) {
+ pr_err("SRAT: Invalid NUMA node %d in ITS affinity\n", node);
+ return 0;
+ }
+
+ its_srat_maps[its_in_srat].numa_node = node;
+ its_srat_maps[its_in_srat].its_id = its_affinity->its_id;
+ its_in_srat++;
+ pr_info("SRAT: PXM %d -> ITS %d -> Node %d\n",
+ its_affinity->proximity_domain, its_affinity->its_id, node);
+
+ return 0;
+}
+
+static void __init acpi_table_parse_srat_its(void)
+{
+ int count;
+
+ count = acpi_table_parse_entries(ACPI_SIG_SRAT,
+ sizeof(struct acpi_table_srat),
+ ACPI_SRAT_TYPE_GIC_ITS_AFFINITY,
+ gic_acpi_match_srat_its, 0);
+ if (count <= 0)
+ return;
+
+ its_srat_maps = kmalloc_array(count, sizeof(struct its_srat_map),
+ GFP_KERNEL);
+ if (!its_srat_maps) {
+ pr_warn("SRAT: Failed to allocate memory for its_srat_maps!\n");
+ return;
+ }
+
+ acpi_table_parse_entries(ACPI_SIG_SRAT,
+ sizeof(struct acpi_table_srat),
+ ACPI_SRAT_TYPE_GIC_ITS_AFFINITY,
+ gic_acpi_parse_srat_its, 0);
+}
+
+/* free the its_srat_maps after ITS probing */
+static void __init acpi_its_srat_maps_free(void)
+{
+ kfree(its_srat_maps);
+}
+#else
+static void __init acpi_table_parse_srat_its(void) { }
+static int __init acpi_get_its_numa_node(u32 its_id) { return NUMA_NO_NODE; }
+static void __init acpi_its_srat_maps_free(void) { }
+#endif
+
+static int __init gic_acpi_parse_madt_its(union acpi_subtable_headers *header,
+ const unsigned long end)
+{
+ struct acpi_madt_generic_translator *its_entry;
+ struct fwnode_handle *dom_handle;
+ struct resource res;
+ int err;
+
+ its_entry = (struct acpi_madt_generic_translator *)header;
+ memset(&res, 0, sizeof(res));
+ res.start = its_entry->base_address;
+ res.end = its_entry->base_address + ACPI_GICV3_ITS_MEM_SIZE - 1;
+ res.flags = IORESOURCE_MEM;
+
+ dom_handle = irq_domain_alloc_fwnode(&res.start);
+ if (!dom_handle) {
+ pr_err("ITS@%pa: Unable to allocate GIC-phytium-2500 ITS domain token\n",
+ &res.start);
+ return -ENOMEM;
+ }
+
+ err = iort_register_domain_token(its_entry->translation_id, res.start,
+ dom_handle);
+ if (err) {
+ pr_err("ITS@%pa: Unable to register GIC-phytium-2500 ITS domain token (ITS ID %d) to IORT\n",
+ &res.start, its_entry->translation_id);
+ goto dom_err;
+ }
+
+ err = its_probe_one(&res, dom_handle,
+ acpi_get_its_numa_node(its_entry->translation_id));
+ if (!err)
+ return 0;
+
+ iort_deregister_domain_token(its_entry->translation_id);
+dom_err:
+ irq_domain_free_fwnode(dom_handle);
+ return err;
+}
+
+static void __init its_acpi_probe(void)
+{
+ acpi_table_parse_srat_its();
+ acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_TRANSLATOR,
+ gic_acpi_parse_madt_its, 0);
+ acpi_its_srat_maps_free();
+}
+#else
+static void __init its_acpi_probe(void) { }
+#endif
+
+int __init phytium_its_init(struct fwnode_handle *handle, struct rdists *rdists,
+ struct irq_domain *parent_domain)
+{
+ struct device_node *of_node;
+ struct its_node *its;
+ bool has_v4 = false;
+ bool has_v4_1 = false;
+ int err;
+
+ gic_rdists = rdists;
+
+ its_parent = parent_domain;
+ of_node = to_of_node(handle);
+ if (of_node)
+ its_of_probe(of_node);
+ else
+ its_acpi_probe();
+
+ if (list_empty(&its_nodes)) {
+ pr_warn("ITS: No ITS available, not enabling LPIs\n");
+ return -ENXIO;
+ }
+
+ err = allocate_lpi_tables();
+ if (err)
+ return err;
+
+ list_for_each_entry(its, &its_nodes, entry) {
+ has_v4 |= is_v4(its);
+ has_v4_1 |= is_v4_1(its);
+ }
+
+ /* Don't bother with inconsistent systems */
+ if (WARN_ON(!has_v4_1 && rdists->has_rvpeid))
+ rdists->has_rvpeid = false;
+
+ if (has_v4 & rdists->has_vlpis) {
+ const struct irq_domain_ops *sgi_ops;
+
+ if (has_v4_1)
+ sgi_ops = &its_sgi_domain_ops;
+ else
+ sgi_ops = NULL;
+
+ if (its_init_vpe_domain() ||
+ its_init_v4(parent_domain, &its_vpe_domain_ops, sgi_ops)) {
+ rdists->has_vlpis = false;
+ pr_err("ITS: Disabling GICv4 support\n");
+ }
+ }
+
+ register_syscore_ops(&its_syscore_ops);
+
+ return 0;
+}
diff --git a/drivers/irqchip/irq-gic-phytium-2500.c b/drivers/irqchip/irq-gic-phytium-2500.c
new file mode 100644
index 000000000000..ba0545fcee56
--- /dev/null
+++ b/drivers/irqchip/irq-gic-phytium-2500.c
@@ -0,0 +1,2503 @@
+/*
+ * Copyright (C) 2020 Phytium Corporation.
+ * Author: Wang Yinfeng <wangyinfeng(a)phytium.com.cn>
+ * Chen Baozi <chenbaozi(a)phytium.com.cn>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+
+#define pr_fmt(fmt) "GIC-2500: " fmt
+
+#include <linux/acpi.h>
+#include <linux/cpu.h>
+#include <linux/cpu_pm.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/irqdomain.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/percpu.h>
+#include <linux/refcount.h>
+#include <linux/slab.h>
+
+#include <linux/irqchip.h>
+#include <linux/irqchip/arm-gic-common.h>
+#include <linux/irqchip/arm-gic-phytium-2500.h>
+#include <linux/irqchip/irq-partition-percpu.h>
+
+#include <asm/cputype.h>
+#include <asm/exception.h>
+#include <asm/smp_plat.h>
+#include <asm/virt.h>
+
+#include "irq-gic-common.h"
+
+#define MAX_MARS3_SOC_COUNT 8
+#define MARS3_ADDR_SKTID_SHIFT 41
+
+struct gic_dist_desc {
+ void __iomem *dist_base;
+ phys_addr_t phys_base;
+ unsigned long size;
+};
+
+static struct gic_dist_desc mars3_gic_dists[MAX_MARS3_SOC_COUNT] __read_mostly;
+
+static unsigned int mars3_sockets_bitmap = 0x1;
+
+#define mars3_irq_to_skt(hwirq) (((hwirq) - 32) % 8)
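+/*
+ * Example of the mapping above: SPI 32 -> socket 0, SPI 35 -> socket 3,
+ * SPI 40 -> socket 0 again; SPIs are thus striped across up to 8 sockets.
+ */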
+
+#define GICD_INT_NMI_PRI (GICD_INT_DEF_PRI & ~0x80)
+
+#define FLAGS_WORKAROUND_GICR_WAKER_MSM8996 (1ULL << 0)
+#define FLAGS_WORKAROUND_CAVIUM_ERRATUM_38539 (1ULL << 1)
+
+#define GIC_IRQ_TYPE_PARTITION (GIC_IRQ_TYPE_LPI + 1)
+
+struct redist_region {
+ void __iomem *redist_base;
+ phys_addr_t phys_base;
+ bool single_redist;
+};
+
+struct gic_chip_data {
+ struct fwnode_handle *fwnode;
+ void __iomem *dist_base;
+ struct redist_region *redist_regions;
+ struct rdists rdists;
+ struct irq_domain *domain;
+ u64 redist_stride;
+ u32 nr_redist_regions;
+ u64 flags;
+ bool has_rss;
+ unsigned int ppi_nr;
+ struct partition_desc **ppi_descs;
+};
+
+static struct gic_chip_data gic_data __read_mostly;
+static DEFINE_STATIC_KEY_TRUE(supports_deactivate_key);
+
+#define GIC_ID_NR (1U << GICD_TYPER_ID_BITS(gic_data.rdists.gicd_typer))
+#define GIC_LINE_NR min(GICD_TYPER_SPIS(gic_data.rdists.gicd_typer), 1020U)
+#define GIC_ESPI_NR GICD_TYPER_ESPIS(gic_data.rdists.gicd_typer)
+
+/*
+ * The behaviours of RPR and PMR registers differ depending on the value of
+ * SCR_EL3.FIQ, and the behaviour of non-secure priority registers of the
+ * distributor and redistributors depends on whether security is enabled in the
+ * GIC.
+ *
+ * When security is enabled, non-secure priority values from the (re)distributor
+ * are presented to the GIC CPUIF as follows:
+ * (GIC_(R)DIST_PRI[irq] >> 1) | 0x80;
+ *
+ * If SCR_EL3.FIQ == 1, the values written to/read from PMR and RPR at non-secure
+ * EL1 are subject to a similar operation, thus matching the priorities presented
+ * from the (re)distributor when security is enabled. When SCR_EL3.FIQ == 0,
+ * these values are unchanged by the GIC.
+ *
+ * see GICv3/GICv4 Architecture Specification (IHI0069D):
+ * - section 4.8.1 Non-secure accesses to register fields for Secure interrupt
+ * priorities.
+ * - Figure 4-7 Secure read of the priority field for a Non-secure Group 1
+ * interrupt.
+ */
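+/*
+ * Worked example of the rule above: a non-secure priority of 0xa0 in
+ * GIC_(R)DIST_PRI is presented to the CPU interface as
+ * (0xa0 >> 1) | 0x80 = 0xd0 when security is enabled.
+ */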
+static DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis_ft2500);
+
+#ifndef CONFIG_ARM_GIC_V3
+/*
+ * Global static key controlling whether an update to PMR allowing more
+ * interrupts requires to be propagated to the redistributor (DSB SY).
+ * And this needs to be exported for modules to be able to enable
+ * interrupts...
+ */
+DEFINE_STATIC_KEY_FALSE(gic_pmr_sync);
+EXPORT_SYMBOL(gic_pmr_sync);
+
+DEFINE_STATIC_KEY_FALSE(gic_nonsecure_priorities);
+EXPORT_SYMBOL(gic_nonsecure_priorities);
+#else
+extern struct static_key_false gic_pmr_sync;
+extern struct static_key_false gic_nonsecure_priorities;
+#endif
+
+/* ppi_nmi_refs[n] == number of cpus having ppi[n + 16] set as NMI */
+static refcount_t *ppi_nmi_refs;
+
+static struct gic_kvm_info gic_v3_kvm_info;
+static DEFINE_PER_CPU(bool, has_rss_ft2500);
+
+#define MPIDR_RS(mpidr) (((mpidr) & 0xF0UL) >> 4)
+#define gic_data_rdist() (this_cpu_ptr(gic_data.rdists.rdist))
+#define gic_data_rdist_rd_base() (gic_data_rdist()->rd_base)
+#define gic_data_rdist_sgi_base() (gic_data_rdist_rd_base() + SZ_64K)
+
+/* Our default, arbitrary priority value. Linux only uses one anyway. */
+#define DEFAULT_PMR_VALUE 0xf0
+
+enum gic_intid_range {
+ SGI_RANGE,
+ PPI_RANGE,
+ SPI_RANGE,
+ EPPI_RANGE,
+ ESPI_RANGE,
+ LPI_RANGE,
+ __INVALID_RANGE__
+};
+
+static enum gic_intid_range __get_intid_range(irq_hw_number_t hwirq)
+{
+ switch (hwirq) {
+ case 0 ... 15:
+ return SGI_RANGE;
+ case 16 ... 31:
+ return PPI_RANGE;
+ case 32 ... 1019:
+ return SPI_RANGE;
+ case EPPI_BASE_INTID ... (EPPI_BASE_INTID + 63):
+ return EPPI_RANGE;
+ case ESPI_BASE_INTID ... (ESPI_BASE_INTID + 1023):
+ return ESPI_RANGE;
+ case 8192 ... GENMASK(23, 0):
+ return LPI_RANGE;
+ default:
+ return __INVALID_RANGE__;
+ }
+}
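+/*
+ * For instance, hwirq 5 is an SGI, 27 a PPI, 42 an SPI and 8192 the first
+ * LPI; IDs outside all the windows above map to __INVALID_RANGE__.
+ */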
+
+static enum gic_intid_range get_intid_range(struct irq_data *d)
+{
+ return __get_intid_range(d->hwirq);
+}
+
+static inline unsigned int gic_irq(struct irq_data *d)
+{
+ return d->hwirq;
+}
+
+static inline bool gic_irq_in_rdist(struct irq_data *d)
+{
+ switch (get_intid_range(d)) {
+ case SGI_RANGE:
+ case PPI_RANGE:
+ case EPPI_RANGE:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static inline void __iomem *gic_dist_base(struct irq_data *d)
+{
+ switch (get_intid_range(d)) {
+ case SGI_RANGE:
+ case PPI_RANGE:
+ case EPPI_RANGE:
+ /* SGI+PPI -> SGI_base for this CPU */
+ return gic_data_rdist_sgi_base();
+
+ case SPI_RANGE:
+ case ESPI_RANGE:
+ /* SPI -> dist_base */
+ return gic_data.dist_base;
+
+ default:
+ return NULL;
+ }
+}
+
+static void gic_do_wait_for_rwp(void __iomem *base)
+{
+ u32 count = 1000000; /* 1s! */
+
+ while (readl_relaxed(base + GICD_CTLR) & GICD_CTLR_RWP) {
+ count--;
+ if (!count) {
+ pr_err_ratelimited("RWP timeout, gone fishing\n");
+ return;
+ }
+ cpu_relax();
+ udelay(1);
+ }
+}
+
+/* Wait for completion of a distributor change */
+static void gic_dist_wait_for_rwp(void)
+{
+ gic_do_wait_for_rwp(gic_data.dist_base);
+}
+
+/* Wait for completion of a redistributor change */
+static void gic_redist_wait_for_rwp(void)
+{
+ gic_do_wait_for_rwp(gic_data_rdist_rd_base());
+}
+
+#ifdef CONFIG_ARM64
+
+static u64 __maybe_unused gic_read_iar(void)
+{
+ if (cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_23154))
+ return gic_read_iar_cavium_thunderx();
+ else
+ return gic_read_iar_common();
+}
+#endif
+
+static void gic_enable_redist(bool enable)
+{
+ void __iomem *rbase;
+ u32 count = 1000000; /* 1s! */
+ u32 val;
+ unsigned long mpidr;
+ int i;
+
+ if (gic_data.flags & FLAGS_WORKAROUND_GICR_WAKER_MSM8996)
+ return;
+
+ rbase = gic_data_rdist_rd_base();
+
+ val = readl_relaxed(rbase + GICR_WAKER);
+ if (enable)
+ /* Wake up this CPU redistributor */
+ val &= ~GICR_WAKER_ProcessorSleep;
+ else
+ val |= GICR_WAKER_ProcessorSleep;
+ writel_relaxed(val, rbase + GICR_WAKER);
+
+ if (!enable) { /* Check that GICR_WAKER is writeable */
+ val = readl_relaxed(rbase + GICR_WAKER);
+ if (!(val & GICR_WAKER_ProcessorSleep))
+ return; /* No PM support in this redistributor */
+ }
+
+ while (--count) {
+ val = readl_relaxed(rbase + GICR_WAKER);
+ if (enable ^ (bool)(val & GICR_WAKER_ChildrenAsleep))
+ break;
+ cpu_relax();
+ udelay(1);
+ }
+ if (!count)
+ pr_err_ratelimited("redistributor failed to %s...\n",
+ enable ? "wakeup" : "sleep");
+
+ mpidr = (unsigned long)cpu_logical_map(smp_processor_id());
+
+ if (mpidr & 0xFFFF) // either Aff1 or Aff0 is not zero
+ return;
+
+ rbase = rbase + 64*SZ_128K; // skip 64 Redistributors
+
+ for (i = 0; i < 4; i++) {
+ val = readl_relaxed(rbase + GICR_WAKER);
+ if (enable)
+ /* Wake up this CPU redistributor */
+ val &= ~GICR_WAKER_ProcessorSleep;
+ else
+ val |= GICR_WAKER_ProcessorSleep;
+ writel_relaxed(val, rbase + GICR_WAKER);
+
+ if (!enable) { /* Check that GICR_WAKER is writeable */
+ val = readl_relaxed(rbase + GICR_WAKER);
+ if (!(val & GICR_WAKER_ProcessorSleep))
+ return; /* No PM support in this redistributor */
+ }
+
+ count = 1000000; /* 1s! */
+ while (--count) {
+ val = readl_relaxed(rbase + GICR_WAKER);
+ if (enable ^ (bool)(val & GICR_WAKER_ChildrenAsleep))
+ break;
+ cpu_relax();
+ udelay(1);
+ }
+ if (!count)
+ pr_err_ratelimited("CPU MPIDR 0x%lx: redistributor %d failed to %s...\n", mpidr, 64 + i,
+ enable ? "wakeup" : "sleep");
+
+ rbase = rbase + SZ_128K; // next redistributor
+ }
+}
+
+/*
+ * Routines to disable, enable, EOI and route interrupts
+ */
+static u32 convert_offset_index(struct irq_data *d, u32 offset, u32 *index)
+{
+ switch (get_intid_range(d)) {
+ case SGI_RANGE:
+ case PPI_RANGE:
+ case SPI_RANGE:
+ *index = d->hwirq;
+ return offset;
+ case EPPI_RANGE:
+ /*
+ * Contrary to the ESPI range, the EPPI range is contiguous
+ * to the PPI range in the registers, so let's adjust the
+ * displacement accordingly. Consistency is overrated.
+ */
+ *index = d->hwirq - EPPI_BASE_INTID + 32;
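+ /* e.g. the first EPPI maps to index 32, right after the 16 SGIs and 16 PPIs. */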
+ return offset;
+ case ESPI_RANGE:
+ *index = d->hwirq - ESPI_BASE_INTID;
+ switch (offset) {
+ case GICD_ISENABLER:
+ return GICD_ISENABLERnE;
+ case GICD_ICENABLER:
+ return GICD_ICENABLERnE;
+ case GICD_ISPENDR:
+ return GICD_ISPENDRnE;
+ case GICD_ICPENDR:
+ return GICD_ICPENDRnE;
+ case GICD_ISACTIVER:
+ return GICD_ISACTIVERnE;
+ case GICD_ICACTIVER:
+ return GICD_ICACTIVERnE;
+ case GICD_IPRIORITYR:
+ return GICD_IPRIORITYRnE;
+ case GICD_ICFGR:
+ return GICD_ICFGRnE;
+ case GICD_IROUTER:
+ return GICD_IROUTERnE;
+ default:
+ break;
+ }
+ break;
+ default:
+ break;
+ }
+
+ WARN_ON(1);
+ *index = d->hwirq;
+ return offset;
+}
+
+static int gic_peek_irq(struct irq_data *d, u32 offset)
+{
+ void __iomem *base;
+ u32 index, mask;
+
+ offset = convert_offset_index(d, offset, &index);
+ mask = 1 << (index % 32);
+
+ if (gic_irq_in_rdist(d))
+ base = gic_data_rdist_sgi_base();
+ else {
+ unsigned int skt;
+
+ skt = mars3_irq_to_skt(gic_irq(d));
+ base = mars3_gic_dists[skt].dist_base;
+ }
+
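+ /*
+ * e.g. for SPI 34 (index 34) this reads the second 32-bit word of the
+ * register bank (offset + 4) and tests bit 34 % 32 = 2.
+ */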
+ return !!(readl_relaxed(base + offset + (index / 32) * 4) & mask);
+}
+
+static void gic_poke_irq(struct irq_data *d, u32 offset)
+{
+ void __iomem *base;
+
+ unsigned long mpidr;
+ void __iomem *rbase;
+ int i;
+ unsigned int skt;
+ u32 index, mask;
+
+ offset = convert_offset_index(d, offset, &index);
+ mask = 1 << (index % 32);
+
+ if (gic_irq_in_rdist(d)) {
+ base = gic_data_rdist_sgi_base();
+
+ writel_relaxed(mask, base + offset + (index / 32) * 4);
+ gic_redist_wait_for_rwp();
+
+ mpidr = (unsigned long)cpu_logical_map(smp_processor_id());
+
+ if ((mpidr & 0xFFFF) == 0) { // both Aff1 and Aff0 are zero
+ rbase = base + 64*SZ_128K; // skip 64 Redistributors
+
+ for (i = 0; i < 4; i++) {
+ writel_relaxed(mask, rbase + offset + (index / 32) * 4);
+ gic_do_wait_for_rwp(rbase - SZ_64K); // RD from SGI base
+ rbase = rbase + SZ_128K;
+ }
+ } // core 0 of each socket
+ } else {
+ skt = mars3_irq_to_skt(gic_irq(d));
+ base = mars3_gic_dists[skt].dist_base;
+ writel_relaxed(mask, base + offset + (index / 32) * 4);
+ gic_do_wait_for_rwp(base);
+ }
+}
+
+static void gic_mask_irq(struct irq_data *d)
+{
+ gic_poke_irq(d, GICD_ICENABLER);
+}
+
+static void gic_eoimode1_mask_irq(struct irq_data *d)
+{
+ gic_mask_irq(d);
+ /*
+ * When masking a forwarded interrupt, make sure it is
+ * deactivated as well.
+ *
+ * This ensures that an interrupt that is getting
+ * disabled/masked will not get "stuck", because there is
+ * no one to deactivate it (the guest is being terminated).
+ */
+ if (irqd_is_forwarded_to_vcpu(d))
+ gic_poke_irq(d, GICD_ICACTIVER);
+}
+
+static void gic_unmask_irq(struct irq_data *d)
+{
+ gic_poke_irq(d, GICD_ISENABLER);
+}
+
+static inline bool gic_supports_nmi_ft2500(void)
+{
+ return IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) &&
+ static_branch_likely(&supports_pseudo_nmis_ft2500);
+}
+
+static int gic_irq_set_irqchip_state(struct irq_data *d,
+ enum irqchip_irq_state which, bool val)
+{
+ u32 reg;
+
+ if (d->hwirq >= 8192) /* SGI/PPI/SPI only */
+ return -EINVAL;
+
+ switch (which) {
+ case IRQCHIP_STATE_PENDING:
+ reg = val ? GICD_ISPENDR : GICD_ICPENDR;
+ break;
+
+ case IRQCHIP_STATE_ACTIVE:
+ reg = val ? GICD_ISACTIVER : GICD_ICACTIVER;
+ break;
+
+ case IRQCHIP_STATE_MASKED:
+ reg = val ? GICD_ICENABLER : GICD_ISENABLER;
+ break;
+
+ default:
+ return -EINVAL;
+ }
+
+ gic_poke_irq(d, reg);
+ return 0;
+}
+
+static int gic_irq_get_irqchip_state(struct irq_data *d,
+ enum irqchip_irq_state which, bool *val)
+{
+ if (d->hwirq >= 8192) /* SGI/PPI/SPI only */
+ return -EINVAL;
+
+ switch (which) {
+ case IRQCHIP_STATE_PENDING:
+ *val = gic_peek_irq(d, GICD_ISPENDR);
+ break;
+
+ case IRQCHIP_STATE_ACTIVE:
+ *val = gic_peek_irq(d, GICD_ISACTIVER);
+ break;
+
+ case IRQCHIP_STATE_MASKED:
+ *val = !gic_peek_irq(d, GICD_ISENABLER);
+ break;
+
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static void gic_irq_set_prio(struct irq_data *d, u8 prio)
+{
+ void __iomem *base = gic_dist_base(d);
+ u32 offset, index;
+
+ offset = convert_offset_index(d, GICD_IPRIORITYR, &index);
+
+ writeb_relaxed(prio, base + offset + index);
+}
+
+static u32 gic_get_ppi_index(struct irq_data *d)
+{
+ switch (get_intid_range(d)) {
+ case PPI_RANGE:
+ return d->hwirq - 16;
+ case EPPI_RANGE:
+ return d->hwirq - EPPI_BASE_INTID + 16;
+ default:
+ unreachable();
+ }
+}
+
+static int gic_irq_nmi_setup(struct irq_data *d)
+{
+ struct irq_desc *desc = irq_to_desc(d->irq);
+
+ if (!gic_supports_nmi_ft2500())
+ return -EINVAL;
+
+ if (gic_peek_irq(d, GICD_ISENABLER)) {
+ pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
+ return -EINVAL;
+ }
+
+ /*
+ * A secondary irq_chip should be in charge of LPI requests;
+ * it should not be possible to get here.
+ */
+ if (WARN_ON(gic_irq(d) >= 8192))
+ return -EINVAL;
+
+ /* desc lock should already be held */
+ if (gic_irq_in_rdist(d)) {
+ u32 idx = gic_get_ppi_index(d);
+
+ /* Setting up PPI as NMI, only switch handler for first NMI */
+ if (!refcount_inc_not_zero(&ppi_nmi_refs[idx])) {
+ refcount_set(&ppi_nmi_refs[idx], 1);
+ desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
+ }
+ } else {
+ desc->handle_irq = handle_fasteoi_nmi;
+ }
+
+ gic_irq_set_prio(d, GICD_INT_NMI_PRI);
+
+ return 0;
+}
+
+static void gic_irq_nmi_teardown(struct irq_data *d)
+{
+ struct irq_desc *desc = irq_to_desc(d->irq);
+
+ if (WARN_ON(!gic_supports_nmi_ft2500()))
+ return;
+
+ if (gic_peek_irq(d, GICD_ISENABLER)) {
+ pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
+ return;
+ }
+
+ /*
+ * A secondary irq_chip should be in charge of LPI requests;
+ * it should not be possible to get here.
+ */
+ if (WARN_ON(gic_irq(d) >= 8192))
+ return;
+
+ /* desc lock should already be held */
+ if (gic_irq_in_rdist(d)) {
+ u32 idx = gic_get_ppi_index(d);
+
+ /* Tearing down NMI, only switch handler for last NMI */
+ if (refcount_dec_and_test(&ppi_nmi_refs[idx]))
+ desc->handle_irq = handle_percpu_devid_irq;
+ } else {
+ desc->handle_irq = handle_fasteoi_irq;
+ }
+
+ gic_irq_set_prio(d, GICD_INT_DEF_PRI);
+}
+
+static void gic_eoi_irq(struct irq_data *d)
+{
+ gic_write_eoir(gic_irq(d));
+}
+
+static void gic_eoimode1_eoi_irq(struct irq_data *d)
+{
+ /*
+ * No need to deactivate an LPI, or an interrupt that
+ * is getting forwarded to a vcpu.
+ */
+ if (gic_irq(d) >= 8192 || irqd_is_forwarded_to_vcpu(d))
+ return;
+ gic_write_dir(gic_irq(d));
+}
+
+static int gic_set_type(struct irq_data *d, unsigned int type)
+{
+ enum gic_intid_range range;
+ unsigned int irq = gic_irq(d);
+ void __iomem *base;
+ u32 offset, index;
+ int ret;
+
+ unsigned long mpidr;
+ int i;
+ void __iomem *rbase;
+ unsigned int skt;
+
+ range = get_intid_range(d);
+ /* Interrupt configuration for SGIs can't be changed */
+ if (range == SGI_RANGE)
+ return type != IRQ_TYPE_EDGE_RISING ? -EINVAL : 0;
+
+ /* SPIs have restrictions on the supported types */
+ if ((range == SPI_RANGE || range == ESPI_RANGE) &&
+ type != IRQ_TYPE_LEVEL_HIGH && type != IRQ_TYPE_EDGE_RISING)
+ return -EINVAL;
+
+ offset = convert_offset_index(d, GICD_ICFGR, &index);
+
+ if (gic_irq_in_rdist(d)) {
+ base = gic_data_rdist_sgi_base();
+ ret = gic_configure_irq(index, type, base + offset, gic_redist_wait_for_rwp);
+
+ mpidr = (unsigned long)cpu_logical_map(smp_processor_id());
+
+ if ((mpidr & 0xFFFF) == 0) { // both Aff1 and Aff0 are zero
+ rbase = base + 64*SZ_128K; // skip 64 Redistributors
+
+ for (i = 0; i < 4; i++) {
+ ret = gic_configure_irq(index, type, rbase + offset, NULL);
+ gic_do_wait_for_rwp(rbase - SZ_64K);
+ rbase = rbase + SZ_128K;
+ }
+ }
+ } else {
+ skt = mars3_irq_to_skt(gic_irq(d));
+ base = mars3_gic_dists[skt].dist_base;
+ ret = gic_configure_irq(index, type, base + offset, NULL);
+ gic_do_wait_for_rwp(base);
+ }
+
+ if (ret && (range == PPI_RANGE || range == EPPI_RANGE)) {
+ /* Misconfigured PPIs are usually not fatal */
+ pr_warn("GIC: PPI INTID%d is secure or misconfigured\n", irq);
+ ret = 0;
+ }
+
+ return ret;
+}
+
+static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
+{
+ if (get_intid_range(d) == SGI_RANGE)
+ return -EINVAL;
+
+ if (vcpu)
+ irqd_set_forwarded_to_vcpu(d);
+ else
+ irqd_clr_forwarded_to_vcpu(d);
+ return 0;
+}
+
+static u64 gic_mpidr_to_affinity(unsigned long mpidr)
+{
+ u64 aff;
+
+ aff = ((u64)MPIDR_AFFINITY_LEVEL(mpidr, 3) << 32 |
+ MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16 |
+ MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8 |
+ MPIDR_AFFINITY_LEVEL(mpidr, 0));
+
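+ /*
+ * e.g. an MPIDR with Aff2=1, Aff1=2, Aff0=3 packs into the routing
+ * value 0x0000000000010203.
+ */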
+ return aff;
+}
+
+static void gic_deactivate_unhandled(u32 irqnr)
+{
+ if (static_branch_likely(&supports_deactivate_key)) {
+ if (irqnr < 8192)
+ gic_write_dir(irqnr);
+ } else {
+ gic_write_eoir(irqnr);
+ }
+}
+
+static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
+{
+ bool irqs_enabled = interrupts_enabled(regs);
+ int err;
+
+ if (irqs_enabled)
+ nmi_enter();
+
+ if (static_branch_likely(&supports_deactivate_key))
+ gic_write_eoir(irqnr);
+ /*
+ * Leave the PSR.I bit set to prevent other NMIs from being
+ * received while handling this one.
+ * PSR.I will be restored when we ERET to the
+ * interrupted context.
+ */
+ err = handle_domain_nmi(gic_data.domain, irqnr, regs);
+ if (err)
+ gic_deactivate_unhandled(irqnr);
+
+ if (irqs_enabled)
+ nmi_exit();
+}
+
+static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
+{
+ u32 irqnr;
+
+ irqnr = gic_read_iar();
+
+ if (gic_supports_nmi_ft2500() &&
+ unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) {
+ gic_handle_nmi(irqnr, regs);
+ return;
+ }
+
+ if (gic_prio_masking_enabled()) {
+ gic_pmr_mask_irqs();
+ gic_arch_enable_irqs();
+ }
+
+ /* Check for special IDs first */
+ if (irqnr >= 1020 && irqnr <= 1023)
+ return;
+
+ if (static_branch_likely(&supports_deactivate_key))
+ gic_write_eoir(irqnr);
+ else
+ isb();
+
+ if (handle_domain_irq(gic_data.domain, irqnr, regs)) {
+ WARN_ONCE(true, "Unexpected interrupt received!\n");
+ gic_deactivate_unhandled(irqnr);
+ }
+}
+
+static u32 gic_get_pribits(void)
+{
+ u32 pribits;
+
+ pribits = gic_read_ctlr();
+ pribits &= ICC_CTLR_EL1_PRI_BITS_MASK;
+ pribits >>= ICC_CTLR_EL1_PRI_BITS_SHIFT;
+ pribits++;
+
+ return pribits;
+}
+
+static bool gic_has_group0(void)
+{
+ u32 val;
+ u32 old_pmr;
+
+ old_pmr = gic_read_pmr();
+
+ /*
+ * Let's find out if Group0 is under control of EL3 or not by
+ * setting the highest possible, non-zero priority in PMR.
+ *
+ * If SCR_EL3.FIQ is set, the priority gets shifted down in
+ * order for the CPU interface to set bit 7, and keep the
+ * actual priority in the non-secure range. In the process, it
+ * loses the least significant bit and the actual priority
+ * becomes 0x80. Reading it back returns 0, indicating that
+ * we don't have access to Group0.
+ */
+ gic_write_pmr(BIT(8 - gic_get_pribits()));
+ val = gic_read_pmr();
+
+ gic_write_pmr(old_pmr);
+
+ return val != 0;
+}
+
+static void __init gic_dist_init(void)
+{
+ unsigned int i;
+ u64 affinity;
+ void __iomem *base = gic_data.dist_base;
+ u32 val;
+
+ unsigned int skt;
+
+ for (skt = 0; skt < MAX_MARS3_SOC_COUNT; skt++) {
+ if ((((unsigned int)1 << skt) & mars3_sockets_bitmap) == 0)
+ continue;
+
+ base = mars3_gic_dists[skt].dist_base;
+
+ /* Disable the distributor */
+ writel_relaxed(0, base + GICD_CTLR);
+ gic_do_wait_for_rwp(base);
+
+ /*
+ * Configure SPIs as non-secure Group-1. This will only matter
+ * if the GIC only has a single security state. This will not
+ * do the right thing if the kernel is running in secure mode,
+ * but that's not the intended use case anyway.
+ */
+ for (i = 32; i < GIC_LINE_NR; i += 32)
+ writel_relaxed(~0, base + GICD_IGROUPR + i / 8);
+
+ /* Extended SPI range, not handled by the GICv2/GICv3 common code */
+ for (i = 0; i < GIC_ESPI_NR; i += 32) {
+ writel_relaxed(~0U, base + GICD_ICENABLERnE + i / 8);
+ writel_relaxed(~0U, base + GICD_ICACTIVERnE + i / 8);
+ }
+
+ for (i = 0; i < GIC_ESPI_NR; i += 32)
+ writel_relaxed(~0U, base + GICD_IGROUPRnE + i / 8);
+
+ for (i = 0; i < GIC_ESPI_NR; i += 16)
+ writel_relaxed(0, base + GICD_ICFGRnE + i / 4);
+
+ for (i = 0; i < GIC_ESPI_NR; i += 4)
+ writel_relaxed(GICD_INT_DEF_PRI_X4, base + GICD_IPRIORITYRnE + i);
+
+ /* Now do the common stuff, and wait for the distributor to drain */
+ gic_dist_config(base, GIC_LINE_NR, NULL);
+ gic_do_wait_for_rwp(base); // do sync outside of gic_dist_config
+
+ val = GICD_CTLR_ARE_NS | GICD_CTLR_ENABLE_G1A | GICD_CTLR_ENABLE_G1;
+ if (gic_data.rdists.gicd_typer2 & GICD_TYPER2_nASSGIcap) {
+ pr_info("Enabling SGIs without active state\n");
+ val |= GICD_CTLR_nASSGIreq;
+ }
+
+ /* Enable distributor with ARE, Group1 */
+ writel_relaxed(val, base + GICD_CTLR);
+
+ /*
+ * Set all global interrupts to the boot CPU only. ARE must be
+ * enabled.
+ */
+ affinity = gic_mpidr_to_affinity(cpu_logical_map(smp_processor_id()));
+ for (i = 32; i < GIC_LINE_NR; i++)
+ gic_write_irouter(affinity, base + GICD_IROUTER + i * 8);
+
+ for (i = 0; i < GIC_ESPI_NR; i++)
+ gic_write_irouter(affinity, base + GICD_IROUTERnE + i * 8);
+ }
+}
+
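+/*
+ * Walk every redistributor frame of every region and call @fn on each one,
+ * stopping as soon as @fn returns 0 (frame claimed) or all frames are done.
+ */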
+static int gic_iterate_rdists(int (*fn)(struct redist_region *, void __iomem *))
+{
+ int ret = -ENODEV;
+ int i;
+
+ for (i = 0; i < gic_data.nr_redist_regions; i++) {
+ void __iomem *ptr = gic_data.redist_regions[i].redist_base;
+ u64 typer;
+ u32 reg;
+
+ reg = readl_relaxed(ptr + GICR_PIDR2) & GIC_PIDR2_ARCH_MASK;
+ if (reg != GIC_PIDR2_ARCH_GICv3 &&
+ reg != GIC_PIDR2_ARCH_GICv4) { /* We're in trouble... */
+ pr_warn("No redistributor present @%p\n", ptr);
+ break;
+ }
+
+ do {
+ typer = gic_read_typer(ptr + GICR_TYPER);
+ ret = fn(gic_data.redist_regions + i, ptr);
+ if (!ret)
+ return 0;
+
+ if (gic_data.redist_regions[i].single_redist)
+ break;
+
+ if (gic_data.redist_stride) {
+ ptr += gic_data.redist_stride;
+ } else {
+ ptr += SZ_64K * 2; /* Skip RD_base + SGI_base */
+ if (typer & GICR_TYPER_VLPIS)
+ ptr += SZ_64K * 2; /* Skip VLPI_base + reserved page */
+ }
+ } while (!(typer & GICR_TYPER_LAST));
+ }
+
+ return ret ? -ENODEV : 0;
+}
+
+static int __gic_populate_rdist(struct redist_region *region, void __iomem *ptr)
+{
+ unsigned long mpidr = cpu_logical_map(smp_processor_id());
+ u64 typer;
+ u32 aff;
+ u32 aff2_skt;
+ u32 redist_skt;
+
+ /*
+ * Convert affinity to a 32bit value that can be matched to
+ * GICR_TYPER bits [63:32].
+ */
+ aff = (MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8 |
+ MPIDR_AFFINITY_LEVEL(mpidr, 0));
+
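+ /*
+ * The socket id is encoded both in MPIDR Aff2 and in the upper bits of
+ * the redistributor region's physical address; only match
+ * redistributors that belong to this CPU's socket.
+ */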
+ aff2_skt = MPIDR_AFFINITY_LEVEL(mpidr, 2) & 0x7;
+ redist_skt = (((u64)region->phys_base >> MARS3_ADDR_SKTID_SHIFT) & 0x7);
+
+ if (aff2_skt != redist_skt) {
+ return 1;
+ }
+
+ typer = gic_read_typer(ptr + GICR_TYPER);
+ if ((typer >> 32) == aff) {
+ u64 offset = ptr - region->redist_base;
+ raw_spin_lock_init(&gic_data_rdist()->rd_lock);
+ gic_data_rdist_rd_base() = ptr;
+ gic_data_rdist()->phys_base = region->phys_base + offset;
+
+ pr_info("CPU%d: found redistributor %lx region %d:%pa\n",
+ smp_processor_id(), mpidr,
+ (int)(region - gic_data.redist_regions),
+ &gic_data_rdist()->phys_base);
+ return 0;
+ }
+
+ /* Try next one */
+ return 1;
+}
+
+static int gic_populate_rdist(void)
+{
+ if (gic_iterate_rdists(__gic_populate_rdist) == 0)
+ return 0;
+
+ /* We couldn't even deal with ourselves... */
+ WARN(true, "CPU%d: mpidr %lx has no re-distributor!\n",
+ smp_processor_id(),
+ (unsigned long)cpu_logical_map(smp_processor_id()));
+ return -ENODEV;
+}
+
+static int __gic_update_rdist_properties(struct redist_region *region,
+ void __iomem *ptr)
+{
+ u64 typer = gic_read_typer(ptr + GICR_TYPER);
+
+ gic_data.rdists.has_vlpis &= !!(typer & GICR_TYPER_VLPIS);
+
+ /* RVPEID implies some form of DirectLPI, no matter what the doc says... :-/ */
+ gic_data.rdists.has_rvpeid &= !!(typer & GICR_TYPER_RVPEID);
+ gic_data.rdists.has_direct_lpi &= (!!(typer & GICR_TYPER_DirectLPIS) |
+ gic_data.rdists.has_rvpeid);
+ gic_data.rdists.has_vpend_valid_dirty &= !!(typer & GICR_TYPER_DIRTY);
+
+ /* Detect non-sensical configurations */
+ if (WARN_ON_ONCE(gic_data.rdists.has_rvpeid && !gic_data.rdists.has_vlpis)) {
+ gic_data.rdists.has_direct_lpi = false;
+ gic_data.rdists.has_vlpis = false;
+ gic_data.rdists.has_rvpeid = false;
+ }
+
+ gic_data.ppi_nr = min(GICR_TYPER_NR_PPIS(typer), gic_data.ppi_nr);
+
+ return 1;
+}
+
+static void gic_update_rdist_properties(void)
+{
+ gic_data.ppi_nr = UINT_MAX;
+ gic_iterate_rdists(__gic_update_rdist_properties);
+ if (WARN_ON(gic_data.ppi_nr == UINT_MAX))
+ gic_data.ppi_nr = 0;
+ pr_info("%d PPIs implemented\n", gic_data.ppi_nr);
+ if (gic_data.rdists.has_vlpis)
+ pr_info("GICv4 features: %s%s%s\n",
+ gic_data.rdists.has_direct_lpi ? "DirectLPI " : "",
+ gic_data.rdists.has_rvpeid ? "RVPEID " : "",
+ gic_data.rdists.has_vpend_valid_dirty ? "Valid+Dirty " : "");
+}
+
+/* Check whether it's single security state view */
+static inline bool gic_dist_security_disabled(void)
+{
+ return readl_relaxed(gic_data.dist_base + GICD_CTLR) & GICD_CTLR_DS;
+}
+
+static void gic_cpu_sys_reg_init(void)
+{
+ int i, cpu = smp_processor_id();
+ u64 mpidr = cpu_logical_map(cpu);
+ u64 need_rss = MPIDR_RS(mpidr);
+ bool group0;
+ u32 pribits;
+
+ /*
+ * Need to check that the SRE bit has actually been set. If
+ * not, it means that SRE is disabled at EL2. We're going to
+ * die painfully, and there is nothing we can do about it.
+ *
+ * Kindly inform the luser.
+ */
+ if (!gic_enable_sre())
+ pr_err("GIC: unable to set SRE (disabled at EL2), panic ahead\n");
+
+ pribits = gic_get_pribits();
+
+ group0 = gic_has_group0();
+
+ /* Set priority mask register */
+ if (!gic_prio_masking_enabled()) {
+ write_gicreg(DEFAULT_PMR_VALUE, ICC_PMR_EL1);
+ } else if (gic_supports_nmi_ft2500()) {
+ /*
+ * Mismatch configuration with boot CPU, the system is likely
+ * to die as interrupt masking will not work properly on all
+ * CPUs
+ *
+ * The boot CPU calls this function before enabling NMI support,
+ * and as a result we'll never see this warning in the boot path
+ * for that CPU.
+ */
+ if (static_branch_unlikely(&gic_nonsecure_priorities))
+ WARN_ON(!group0 || gic_dist_security_disabled());
+ else
+ WARN_ON(group0 && !gic_dist_security_disabled());
+ }
+
+ /*
+ * Some firmwares hand over to the kernel with the BPR changed from
+ * its reset value (and with a value large enough to prevent
+ * any pre-emptive interrupts from working at all). Writing a zero
+ * to BPR restores its reset value.
+ */
+ gic_write_bpr1(0);
+
+ if (static_branch_likely(&supports_deactivate_key)) {
+ /* EOI drops priority only (mode 1) */
+ gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop);
+ } else {
+ /* EOI deactivates interrupt too (mode 0) */
+ gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop_dir);
+ }
+
+ /* Always whack Group0 before Group1 */
+ if (group0) {
+ switch (pribits) {
+ case 8:
+ case 7:
+ write_gicreg(0, ICC_AP0R3_EL1);
+ write_gicreg(0, ICC_AP0R2_EL1);
+ fallthrough;
+ case 6:
+ write_gicreg(0, ICC_AP0R1_EL1);
+ fallthrough;
+ case 5:
+ case 4:
+ write_gicreg(0, ICC_AP0R0_EL1);
+ }
+
+ isb();
+ }
+
+ switch (pribits) {
+ case 8:
+ case 7:
+ write_gicreg(0, ICC_AP1R3_EL1);
+ write_gicreg(0, ICC_AP1R2_EL1);
+ fallthrough;
+ case 6:
+ write_gicreg(0, ICC_AP1R1_EL1);
+ fallthrough;
+ case 5:
+ case 4:
+ write_gicreg(0, ICC_AP1R0_EL1);
+ }
+
+ isb();
+
+ /* ... and let's hit the road... */
+ gic_write_grpen1(1);
+
+ /* Keep the RSS capability status in per_cpu variable */
+ per_cpu(has_rss_ft2500, cpu) = !!(gic_read_ctlr() & ICC_CTLR_EL1_RSS);
+
+ /* Check that all CPUs are capable of sending SGIs to other CPUs */
+ for_each_online_cpu(i) {
+ bool have_rss = per_cpu(has_rss_ft2500, i) && per_cpu(has_rss_ft2500, cpu);
+
+ need_rss |= MPIDR_RS(cpu_logical_map(i));
+ if (need_rss && (!have_rss))
+ pr_crit("CPU%d (%lx) can't SGI CPU%d (%lx), no RSS\n",
+ cpu, (unsigned long)mpidr,
+ i, (unsigned long)cpu_logical_map(i));
+ }
+
+ /**
+ * GIC spec says, when ICC_CTLR_EL1.RSS==1 and GICD_TYPER.RSS==0,
+ * writing ICC_ASGI1R_EL1 register with RS != 0 is a CONSTRAINED
+ * UNPREDICTABLE choice of :
+ * - The write is ignored.
+ * - The RS field is treated as 0.
+ */
+ if (need_rss && (!gic_data.has_rss))
+ pr_crit_once("RSS is required but GICD doesn't support it\n");
+}
+
+static bool gicv3_nolpi;
+
+static int __init gicv3_nolpi_cfg(char *buf)
+{
+ return strtobool(buf, &gicv3_nolpi);
+}
+early_param("irqchip.gicv3_nolpi", gicv3_nolpi_cfg);
+
+static int gic_dist_supports_lpis(void)
+{
+ return (IS_ENABLED(CONFIG_ARM_GIC_V3_ITS) &&
+ !!(readl_relaxed(gic_data.dist_base + GICD_TYPER) & GICD_TYPER_LPIS) &&
+ !gicv3_nolpi);
+}
+
+static void gic_cpu_init(void)
+{
+ void __iomem *rbase;
+ int i;
+ unsigned long mpidr;
+
+ /* Register ourselves with the rest of the world */
+ if (gic_populate_rdist())
+ return;
+
+ gic_enable_redist(true);
+
+ WARN((gic_data.ppi_nr > 16 || GIC_ESPI_NR != 0) &&
+ !(gic_read_ctlr() & ICC_CTLR_EL1_ExtRange),
+ "Distributor has extended ranges, but CPU%d doesn't\n",
+ smp_processor_id());
+
+ rbase = gic_data_rdist_sgi_base();
+
+ /* Configure SGIs/PPIs as non-secure Group-1 */
+ for (i = 0; i < gic_data.ppi_nr + 16; i += 32)
+ writel_relaxed(~0, rbase + GICR_IGROUPR0 + i / 8);
+
+ gic_cpu_config(rbase, gic_data.ppi_nr + 16, gic_redist_wait_for_rwp);
+
+ mpidr = (unsigned long)cpu_logical_map(smp_processor_id());
+
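+ /*
+ * The first CPU of a socket (Aff1 == Aff0 == 0) also configures the
+ * four extra SGI frames that follow the socket's first 64
+ * redistributor frames.
+ */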
+ if ((mpidr & 0xFFFF) == 0) { // both Aff1 and Aff0 are zero
+ rbase = rbase + 64 * SZ_128K; // skip 64 Redistributors
+
+ for (i = 0; i < 4; i++) {
+ /* Configure SGIs/PPIs as non-secure Group-1 */
+ writel_relaxed(~0, rbase + GICR_IGROUPR0);
+
+ gic_cpu_config(rbase, gic_data.ppi_nr + 16, NULL);
+ gic_do_wait_for_rwp(rbase - SZ_64K);
+
+ rbase = rbase + SZ_128K;
+ }
+ }
+
+ /* initialise system registers */
+ gic_cpu_sys_reg_init();
+}
+
+#ifdef CONFIG_SMP
+
+#define MPIDR_TO_SGI_RS(mpidr) (MPIDR_RS(mpidr) << ICC_SGI1R_RS_SHIFT)
+#define MPIDR_TO_SGI_CLUSTER_ID(mpidr) ((mpidr) & ~0xFUL)
+
+static int gic_starting_cpu(unsigned int cpu)
+{
+ gic_cpu_init();
+
+ if (gic_dist_supports_lpis())
+ phytium_its_cpu_init();
+
+ return 0;
+}
+
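+/*
+ * Build the 16-bit SGI target list for the cluster that starts at *base_cpu,
+ * advancing *base_cpu past the CPUs of @mask that were covered.
+ */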
+static u16 gic_compute_target_list(int *base_cpu, const struct cpumask *mask,
+ unsigned long cluster_id)
+{
+ int next_cpu, cpu = *base_cpu;
+ unsigned long mpidr = cpu_logical_map(cpu);
+ u16 tlist = 0;
+
+ while (cpu < nr_cpu_ids) {
+ tlist |= 1 << (mpidr & 0xf);
+
+ next_cpu = cpumask_next(cpu, mask);
+ if (next_cpu >= nr_cpu_ids)
+ goto out;
+ cpu = next_cpu;
+
+ mpidr = cpu_logical_map(cpu);
+
+ if (cluster_id != MPIDR_TO_SGI_CLUSTER_ID(mpidr)) {
+ cpu--;
+ goto out;
+ }
+ }
+out:
+ *base_cpu = cpu;
+ return tlist;
+}
+
+#define MPIDR_TO_SGI_AFFINITY(cluster_id, level) \
+ (MPIDR_AFFINITY_LEVEL(cluster_id, level) \
+ << ICC_SGI1R_AFFINITY_## level ##_SHIFT)
+
+static void gic_send_sgi(u64 cluster_id, u16 tlist, unsigned int irq)
+{
+ u64 val;
+
+ val = (MPIDR_TO_SGI_AFFINITY(cluster_id, 3) |
+ MPIDR_TO_SGI_AFFINITY(cluster_id, 2) |
+ irq << ICC_SGI1R_SGI_ID_SHIFT |
+ MPIDR_TO_SGI_AFFINITY(cluster_id, 1) |
+ MPIDR_TO_SGI_RS(cluster_id) |
+ tlist << ICC_SGI1R_TARGET_LIST_SHIFT);
+
+ pr_devel("CPU%d: ICC_SGI1R_EL1 %llx\n", smp_processor_id(), val);
+ gic_write_sgi1r(val);
+}
+
+static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
+{
+ int cpu;
+
+ if (WARN_ON(d->hwirq >= 16))
+ return;
+
+ /*
+ * Ensure that stores to Normal memory are visible to the
+ * other CPUs before issuing the IPI.
+ */
+ wmb();
+
+ for_each_cpu(cpu, mask) {
+ u64 cluster_id = MPIDR_TO_SGI_CLUSTER_ID(cpu_logical_map(cpu));
+ u16 tlist;
+
+ tlist = gic_compute_target_list(&cpu, mask, cluster_id);
+ gic_send_sgi(cluster_id, tlist, d->hwirq);
+ }
+
+ /* Force the above writes to ICC_SGI1R_EL1 to be executed */
+ isb();
+}
+
+static void __init gic_smp_init(void)
+{
+ struct irq_fwspec sgi_fwspec = {
+ .fwnode = gic_data.fwnode,
+ .param_count = 1,
+ };
+ int base_sgi;
+
+ cpuhp_setup_state_nocalls(CPUHP_AP_IRQ_GIC_STARTING,
+ "irqchip/arm/gicv3:starting",
+ gic_starting_cpu, NULL);
+
+ /* Register all 8 non-secure SGIs */
+ base_sgi = __irq_domain_alloc_irqs(gic_data.domain, -1, 8,
+ NUMA_NO_NODE, &sgi_fwspec,
+ false, NULL);
+ if (WARN_ON(base_sgi <= 0))
+ return;
+
+ set_smp_ipi_range(base_sgi, 8);
+}
+
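+/*
+ * Pick a target CPU from @mask_val that belongs to the same socket as the
+ * interrupt's distributor, falling back to the first CPU of that socket if
+ * the mask contains no suitable online CPU.
+ */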
+static int gic_cpumask_select(struct irq_data *d, const struct cpumask *mask_val)
+{
+ unsigned int skt, irq_skt, i;
+ unsigned int cpu, cpus = 0;
+
+ unsigned int skt_cpu_cnt[MAX_MARS3_SOC_COUNT] = {0};
+
+ for (i = 0; i < nr_cpu_ids; i++) {
+ skt = (cpu_logical_map(i) >> 16) & 0xff;
+ if ((skt >= 0) && (skt < MAX_MARS3_SOC_COUNT)) {
+ skt_cpu_cnt[skt]++;
+ } else if (skt != 0xff) {
+ pr_err("socket address: %u is out of range\n", skt);
+ }
+ }
+
+ irq_skt = mars3_irq_to_skt(gic_irq(d));
+
+ if (0 != irq_skt) {
+ for (i = 0; i < irq_skt; i++) {
+ cpus += skt_cpu_cnt[i];
+ }
+ }
+
+ cpu = cpumask_any_and(mask_val, cpu_online_mask);
+ if ((cpu > cpus) && (cpu < (cpus + skt_cpu_cnt[irq_skt]))) {
+ cpus = cpu;
+ }
+
+ return cpus;
+}
+
+static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
+ bool force)
+{
+ unsigned int cpu;
+ u32 offset, index;
+ void __iomem *reg;
+ int enabled;
+ u64 val;
+ unsigned int skt;
+
+ if (force)
+ cpu = cpumask_first(mask_val);
+ else
+ cpu = gic_cpumask_select(d, mask_val);
+
+ if (cpu >= nr_cpu_ids)
+ return -EINVAL;
+
+ if (gic_irq_in_rdist(d))
+ return -EINVAL;
+
+ /* If interrupt was enabled, disable it first */
+ enabled = gic_peek_irq(d, GICD_ISENABLER);
+ if (enabled)
+ gic_mask_irq(d);
+
+ offset = convert_offset_index(d, GICD_IROUTER, &index);
+
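+ /* Route the SPI through the distributor of the socket it belongs to */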
+ skt = mars3_irq_to_skt(gic_irq(d));
+ reg = mars3_gic_dists[skt].dist_base + offset + GICD_IROUTER + (index * 8);
+ val = gic_mpidr_to_affinity(cpu_logical_map(cpu));
+
+ gic_write_irouter(val, reg);
+
+ /*
+ * If the interrupt was enabled, enable it again. Otherwise,
+ * just wait for the distributor to have digested our changes.
+ */
+ if (enabled)
+ gic_unmask_irq(d);
+ else
+ gic_dist_wait_for_rwp();
+
+ irq_data_update_effective_affinity(d, cpumask_of(cpu));
+
+ return IRQ_SET_MASK_OK_DONE;
+}
+#else
+#define gic_set_affinity NULL
+#define gic_ipi_send_mask NULL
+#define gic_smp_init() do { } while (0)
+#endif
+
+static int gic_retrigger(struct irq_data *data)
+{
+ return !gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true);
+}
+
+#ifdef CONFIG_CPU_PM
+static int gic_cpu_pm_notifier(struct notifier_block *self,
+ unsigned long cmd, void *v)
+{
+ if (cmd == CPU_PM_EXIT) {
+ if (gic_dist_security_disabled())
+ gic_enable_redist(true);
+ gic_cpu_sys_reg_init();
+ } else if (cmd == CPU_PM_ENTER && gic_dist_security_disabled()) {
+ gic_write_grpen1(0);
+ gic_enable_redist(false);
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block gic_cpu_pm_notifier_block = {
+ .notifier_call = gic_cpu_pm_notifier,
+};
+
+static void gic_cpu_pm_init(void)
+{
+ cpu_pm_register_notifier(&gic_cpu_pm_notifier_block);
+}
+
+#else
+static inline void gic_cpu_pm_init(void) { }
+#endif /* CONFIG_CPU_PM */
+
+static struct irq_chip gic_chip = {
+ .name = "GIC-phytium-2500",
+ .irq_mask = gic_mask_irq,
+ .irq_unmask = gic_unmask_irq,
+ .irq_eoi = gic_eoi_irq,
+ .irq_set_type = gic_set_type,
+ .irq_set_affinity = gic_set_affinity,
+ .irq_retrigger = gic_retrigger,
+ .irq_get_irqchip_state = gic_irq_get_irqchip_state,
+ .irq_set_irqchip_state = gic_irq_set_irqchip_state,
+ .irq_nmi_setup = gic_irq_nmi_setup,
+ .irq_nmi_teardown = gic_irq_nmi_teardown,
+ .ipi_send_mask = gic_ipi_send_mask,
+ .flags = IRQCHIP_SET_TYPE_MASKED |
+ IRQCHIP_SKIP_SET_WAKE |
+ IRQCHIP_MASK_ON_SUSPEND,
+};
+
+static struct irq_chip gic_eoimode1_chip = {
+ .name = "GICv3-phytium-2500",
+ .irq_mask = gic_eoimode1_mask_irq,
+ .irq_unmask = gic_unmask_irq,
+ .irq_eoi = gic_eoimode1_eoi_irq,
+ .irq_set_type = gic_set_type,
+ .irq_set_affinity = gic_set_affinity,
+ .irq_retrigger = gic_retrigger,
+ .irq_get_irqchip_state = gic_irq_get_irqchip_state,
+ .irq_set_irqchip_state = gic_irq_set_irqchip_state,
+ .irq_set_vcpu_affinity = gic_irq_set_vcpu_affinity,
+ .irq_nmi_setup = gic_irq_nmi_setup,
+ .irq_nmi_teardown = gic_irq_nmi_teardown,
+ .ipi_send_mask = gic_ipi_send_mask,
+ .flags = IRQCHIP_SET_TYPE_MASKED |
+ IRQCHIP_SKIP_SET_WAKE |
+ IRQCHIP_MASK_ON_SUSPEND,
+};
+
+static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,
+ irq_hw_number_t hw)
+{
+ struct irq_chip *chip = &gic_chip;
+ struct irq_data *irqd = irq_desc_get_irq_data(irq_to_desc(irq));
+
+ if (static_branch_likely(&supports_deactivate_key))
+ chip = &gic_eoimode1_chip;
+
+ switch (__get_intid_range(hw)) {
+ case SGI_RANGE:
+ irq_set_percpu_devid(irq);
+ irq_domain_set_info(d, irq, hw, chip, d->host_data,
+ handle_percpu_devid_fasteoi_ipi,
+ NULL, NULL);
+ break;
+
+ case PPI_RANGE:
+ case EPPI_RANGE:
+ irq_set_percpu_devid(irq);
+ irq_domain_set_info(d, irq, hw, chip, d->host_data,
+ handle_percpu_devid_irq, NULL, NULL);
+ break;
+
+ case SPI_RANGE:
+ case ESPI_RANGE:
+ irq_domain_set_info(d, irq, hw, chip, d->host_data,
+ handle_fasteoi_irq, NULL, NULL);
+ irq_set_probe(irq);
+ irqd_set_single_target(irqd);
+ break;
+
+ case LPI_RANGE:
+ if (!gic_dist_supports_lpis())
+ return -EPERM;
+ irq_domain_set_info(d, irq, hw, chip, d->host_data,
+ handle_fasteoi_irq, NULL, NULL);
+ break;
+
+ default:
+ return -EPERM;
+ }
+
+ /* Prevents SW retriggers which mess up the ACK/EOI ordering */
+ irqd_set_handle_enforce_irqctx(irqd);
+ return 0;
+}
+
+static int gic_irq_domain_translate(struct irq_domain *d,
+ struct irq_fwspec *fwspec,
+ unsigned long *hwirq,
+ unsigned int *type)
+{
+ if (fwspec->param_count == 1 && fwspec->param[0] < 16) {
+ *hwirq = fwspec->param[0];
+ *type = IRQ_TYPE_EDGE_RISING;
+ return 0;
+ }
+
+ if (is_of_node(fwspec->fwnode)) {
+ if (fwspec->param_count < 3)
+ return -EINVAL;
+
+ switch (fwspec->param[0]) {
+ case 0: /* SPI */
+ *hwirq = fwspec->param[1] + 32;
+ break;
+ case 1: /* PPI */
+ *hwirq = fwspec->param[1] + 16;
+ break;
+ case 2: /* ESPI */
+ *hwirq = fwspec->param[1] + ESPI_BASE_INTID;
+ break;
+ case 3: /* EPPI */
+ *hwirq = fwspec->param[1] + EPPI_BASE_INTID;
+ break;
+ case GIC_IRQ_TYPE_LPI: /* LPI */
+ *hwirq = fwspec->param[1];
+ break;
+ case GIC_IRQ_TYPE_PARTITION:
+ *hwirq = fwspec->param[1];
+ if (fwspec->param[1] >= 16)
+ *hwirq += EPPI_BASE_INTID - 16;
+ else
+ *hwirq += 16;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ *type = fwspec->param[2] & IRQ_TYPE_SENSE_MASK;
+
+ /*
+ * Make it clear that broken DTs are... broken.
+ * Partitioned PPIs are an unfortunate exception.
+ */
+ WARN_ON(*type == IRQ_TYPE_NONE &&
+ fwspec->param[0] != GIC_IRQ_TYPE_PARTITION);
+ return 0;
+ }
+
+ if (is_fwnode_irqchip(fwspec->fwnode)) {
+ if (fwspec->param_count != 2)
+ return -EINVAL;
+
+ *hwirq = fwspec->param[0];
+ *type = fwspec->param[1];
+
+ WARN_ON(*type == IRQ_TYPE_NONE);
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+static int gic_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
+ unsigned int nr_irqs, void *arg)
+{
+ int i, ret;
+ irq_hw_number_t hwirq;
+ unsigned int type = IRQ_TYPE_NONE;
+ struct irq_fwspec *fwspec = arg;
+
+ ret = gic_irq_domain_translate(domain, fwspec, &hwirq, &type);
+ if (ret)
+ return ret;
+
+ for (i = 0; i < nr_irqs; i++) {
+ ret = gic_irq_domain_map(domain, virq + i, hwirq + i);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static void gic_irq_domain_free(struct irq_domain *domain, unsigned int virq,
+ unsigned int nr_irqs)
+{
+ int i;
+
+ for (i = 0; i < nr_irqs; i++) {
+ struct irq_data *d = irq_domain_get_irq_data(domain, virq + i);
+ irq_set_handler(virq + i, NULL);
+ irq_domain_reset_irq_data(d);
+ }
+}
+
+static int gic_irq_domain_select(struct irq_domain *d,
+ struct irq_fwspec *fwspec,
+ enum irq_domain_bus_token bus_token)
+{
+ /* Not for us */
+ if (fwspec->fwnode != d->fwnode)
+ return 0;
+
+ /* If this is not DT, then we have a single domain */
+ if (!is_of_node(fwspec->fwnode))
+ return 1;
+
+ /*
+ * If this is a PPI and we have a 4th (non-null) parameter,
+ * then we need to match the partition domain.
+ */
+ if (fwspec->param_count >= 4 &&
+ fwspec->param[0] == 1 && fwspec->param[3] != 0 &&
+ gic_data.ppi_descs)
+ return d == partition_get_domain(gic_data.ppi_descs[fwspec->param[1]]);
+
+ return d == gic_data.domain;
+}
+
+static const struct irq_domain_ops gic_irq_domain_ops = {
+ .translate = gic_irq_domain_translate,
+ .alloc = gic_irq_domain_alloc,
+ .free = gic_irq_domain_free,
+ .select = gic_irq_domain_select,
+};
+
+static int partition_domain_translate(struct irq_domain *d,
+ struct irq_fwspec *fwspec,
+ unsigned long *hwirq,
+ unsigned int *type)
+{
+ struct device_node *np;
+ int ret;
+
+ if (!gic_data.ppi_descs)
+ return -ENOMEM;
+
+ np = of_find_node_by_phandle(fwspec->param[3]);
+ if (WARN_ON(!np))
+ return -EINVAL;
+
+ ret = partition_translate_id(gic_data.ppi_descs[fwspec->param[1]],
+ of_node_to_fwnode(np));
+ if (ret < 0)
+ return ret;
+
+ *hwirq = ret;
+ *type = fwspec->param[2] & IRQ_TYPE_SENSE_MASK;
+
+ return 0;
+}
+
+static const struct irq_domain_ops partition_domain_ops = {
+ .translate = partition_domain_translate,
+ .select = gic_irq_domain_select,
+};
+
+static bool gic_enable_quirk_msm8996(void *data)
+{
+ struct gic_chip_data *d = data;
+
+ d->flags |= FLAGS_WORKAROUND_GICR_WAKER_MSM8996;
+
+ return true;
+}
+
+static bool gic_enable_quirk_cavium_38539(void *data)
+{
+ struct gic_chip_data *d = data;
+
+ d->flags |= FLAGS_WORKAROUND_CAVIUM_ERRATUM_38539;
+
+ return true;
+}
+
+static bool gic_enable_quirk_hip06_07(void *data)
+{
+ struct gic_chip_data *d = data;
+
+ /*
+ * HIP06 GICD_IIDR clashes with GIC-600 product number (despite
+ * not being an actual ARM implementation). The saving grace is
+ * that GIC-600 doesn't have ESPI, so nothing to do in that case.
+ * HIP07 doesn't even have a proper IIDR, and still pretends to
+ * have ESPI. In both cases, put them right.
+ */
+ if (d->rdists.gicd_typer & GICD_TYPER_ESPI) {
+ /* Zero both ESPI and the RES0 field next to it... */
+ d->rdists.gicd_typer &= ~GENMASK(9, 8);
+ return true;
+ }
+
+ return false;
+}
+
+static const struct gic_quirk gic_quirks[] = {
+ {
+ .desc = "GICv3: Qualcomm MSM8996 broken firmware",
+ .compatible = "qcom,msm8996-gic-v3",
+ .init = gic_enable_quirk_msm8996,
+ },
+ {
+ .desc = "GICv3: HIP06 erratum 161010803",
+ .iidr = 0x0204043b,
+ .mask = 0xffffffff,
+ .init = gic_enable_quirk_hip06_07,
+ },
+ {
+ .desc = "GICv3: HIP07 erratum 161010803",
+ .iidr = 0x00000000,
+ .mask = 0xffffffff,
+ .init = gic_enable_quirk_hip06_07,
+ },
+ {
+ /*
+ * Reserved register accesses generate a Synchronous
+ * External Abort. This erratum applies to:
+ * - ThunderX: CN88xx
+ * - OCTEON TX: CN83xx, CN81xx
+ * - OCTEON TX2: CN93xx, CN96xx, CN98xx, CNF95xx*
+ */
+ .desc = "GICv3: Cavium erratum 38539",
+ .iidr = 0xa000034c,
+ .mask = 0xe8f00fff,
+ .init = gic_enable_quirk_cavium_38539,
+ },
+ {
+ }
+};
+
+static void gic_enable_nmi_support(void)
+{
+ int i;
+
+ if (!gic_prio_masking_enabled())
+ return;
+
+ ppi_nmi_refs = kcalloc(gic_data.ppi_nr, sizeof(*ppi_nmi_refs), GFP_KERNEL);
+ if (!ppi_nmi_refs)
+ return;
+
+ for (i = 0; i < gic_data.ppi_nr; i++)
+ refcount_set(&ppi_nmi_refs[i], 0);
+
+ /*
+ * Linux itself doesn't use 1:N distribution, so has no need to
+ * set PMHE. The only reason to have it set is if EL3 requires it
+ * (and we can't change it).
+ */
+ if (gic_read_ctlr() & ICC_CTLR_EL1_PMHE_MASK)
+ static_branch_enable(&gic_pmr_sync);
+
+ pr_info("Pseudo-NMIs enabled using %s ICC_PMR_EL1 synchronisation\n",
+ static_branch_unlikely(&gic_pmr_sync) ? "forced" : "relaxed");
+
+ /*
+ * How priority values are used by the GIC depends on two things:
+ * the security state of the GIC (controlled by the GICD_CTRL.DS bit)
+ * and if Group 0 interrupts can be delivered to Linux in the non-secure
+ * world as FIQs (controlled by the SCR_EL3.FIQ bit). These affect the
+ * ICC_PMR_EL1 register and the priority that software assigns to
+ * interrupts:
+ *
+ * GICD_CTRL.DS | SCR_EL3.FIQ | ICC_PMR_EL1 | Group 1 priority
+ * -----------------------------------------------------------
+ * 1 | - | unchanged | unchanged
+ * -----------------------------------------------------------
+ * 0 | 1 | non-secure | non-secure
+ * -----------------------------------------------------------
+ * 0 | 0 | unchanged | non-secure
+ *
+ * where non-secure means that the value is right-shifted by one and the
+ * MSB bit set, to make it fit in the non-secure priority range.
+ *
+ * In the first two cases, where ICC_PMR_EL1 and the interrupt priority
+ * are both either modified or unchanged, we can use the same set of
+ * priorities.
+ *
+ * In the last case, where only the interrupt priorities are modified to
+ * be in the non-secure range, we use a different PMR value to mask IRQs
+ * and the rest of the values that we use remain unchanged.
+ */
+ if (gic_has_group0() && !gic_dist_security_disabled())
+ static_branch_enable(&gic_nonsecure_priorities);
+
+ static_branch_enable(&supports_pseudo_nmis_ft2500);
+
+ if (static_branch_likely(&supports_deactivate_key))
+ gic_eoimode1_chip.flags |= IRQCHIP_SUPPORTS_NMI;
+ else
+ gic_chip.flags |= IRQCHIP_SUPPORTS_NMI;
+}
+
+static int __init gic_init_bases(void __iomem *dist_base,
+ struct redist_region *rdist_regs,
+ u32 nr_redist_regions,
+ u64 redist_stride,
+ struct fwnode_handle *handle)
+{
+ u32 typer;
+ int err;
+
+ if (!is_hyp_mode_available())
+ static_branch_disable(&supports_deactivate_key);
+
+ if (static_branch_likely(&supports_deactivate_key))
+ pr_info("GIC: Using split EOI/Deactivate mode\n");
+
+ gic_data.fwnode = handle;
+ gic_data.dist_base = dist_base;
+ gic_data.redist_regions = rdist_regs;
+ gic_data.nr_redist_regions = nr_redist_regions;
+ gic_data.redist_stride = redist_stride;
+
+ /*
+ * Find out how many interrupts are supported.
+ */
+ typer = readl_relaxed(gic_data.dist_base + GICD_TYPER);
+ gic_data.rdists.gicd_typer = typer;
+
+ gic_enable_quirks(readl_relaxed(gic_data.dist_base + GICD_IIDR),
+ gic_quirks, &gic_data);
+
+ pr_info("%d SPIs implemented\n", GIC_LINE_NR - 32);
+ pr_info("%d Extended SPIs implemented\n", GIC_ESPI_NR);
+
+ /*
+ * ThunderX1 explodes on reading GICD_TYPER2, in violation of the
+ * architecture spec (which says that reserved registers are RES0).
+ */
+ if (!(gic_data.flags & FLAGS_WORKAROUND_CAVIUM_ERRATUM_38539))
+ gic_data.rdists.gicd_typer2 = readl_relaxed(gic_data.dist_base + GICD_TYPER2);
+
+ gic_data.domain = irq_domain_create_tree(handle, &gic_irq_domain_ops,
+ &gic_data);
+ gic_data.rdists.rdist = alloc_percpu(typeof(*gic_data.rdists.rdist));
+ gic_data.rdists.has_rvpeid = true;
+ gic_data.rdists.has_vlpis = true;
+ gic_data.rdists.has_direct_lpi = true;
+ gic_data.rdists.has_vpend_valid_dirty = true;
+
+ if (WARN_ON(!gic_data.domain) || WARN_ON(!gic_data.rdists.rdist)) {
+ err = -ENOMEM;
+ goto out_free;
+ }
+
+ irq_domain_update_bus_token(gic_data.domain, DOMAIN_BUS_WIRED);
+
+ gic_data.has_rss = !!(typer & GICD_TYPER_RSS);
+ pr_info("Distributor has %sRange Selector support\n",
+ gic_data.has_rss ? "" : "no ");
+
+ if (typer & GICD_TYPER_MBIS) {
+ err = mbi_init(handle, gic_data.domain);
+ if (err)
+ pr_err("Failed to initialize MBIs\n");
+ }
+
+ set_handle_irq(gic_handle_irq);
+
+ gic_update_rdist_properties();
+
+ gic_dist_init();
+ gic_cpu_init();
+ gic_smp_init();
+ gic_cpu_pm_init();
+
+ if (gic_dist_supports_lpis()) {
+ phytium_its_init(handle, &gic_data.rdists, gic_data.domain);
+ phytium_its_cpu_init();
+ } else {
+ if (IS_ENABLED(CONFIG_ARM_GIC_V2M))
+ gicv2m_init(handle, gic_data.domain);
+ }
+
+ gic_enable_nmi_support();
+
+ return 0;
+
+out_free:
+ if (gic_data.domain)
+ irq_domain_remove(gic_data.domain);
+ free_percpu(gic_data.rdists.rdist);
+ return err;
+}
+
+static int __init gic_validate_dist_version(void __iomem *dist_base)
+{
+ u32 reg = readl_relaxed(dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
+
+ if (reg != GIC_PIDR2_ARCH_GICv3 && reg != GIC_PIDR2_ARCH_GICv4)
+ return -ENODEV;
+
+ return 0;
+}
+
+/* Create all possible partitions at boot time */
+static void __init gic_populate_ppi_partitions(struct device_node *gic_node)
+{
+ struct device_node *parts_node, *child_part;
+ int part_idx = 0, i;
+ int nr_parts;
+ struct partition_affinity *parts;
+
+ parts_node = of_get_child_by_name(gic_node, "ppi-partitions");
+ if (!parts_node)
+ return;
+
+ gic_data.ppi_descs = kcalloc(gic_data.ppi_nr, sizeof(*gic_data.ppi_descs), GFP_KERNEL);
+ if (!gic_data.ppi_descs)
+ return;
+
+ nr_parts = of_get_child_count(parts_node);
+
+ if (!nr_parts)
+ goto out_put_node;
+
+ parts = kcalloc(nr_parts, sizeof(*parts), GFP_KERNEL);
+ if (WARN_ON(!parts))
+ goto out_put_node;
+
+ for_each_child_of_node(parts_node, child_part) {
+ struct partition_affinity *part;
+ int n;
+
+ part = &parts[part_idx];
+
+ part->partition_id = of_node_to_fwnode(child_part);
+
+ pr_info("GIC: PPI partition %pOFn[%d] { ",
+ child_part, part_idx);
+
+ n = of_property_count_elems_of_size(child_part, "affinity",
+ sizeof(u32));
+ WARN_ON(n <= 0);
+
+ for (i = 0; i < n; i++) {
+ int err, cpu;
+ u32 cpu_phandle;
+ struct device_node *cpu_node;
+
+ err = of_property_read_u32_index(child_part, "affinity",
+ i, &cpu_phandle);
+ if (WARN_ON(err))
+ continue;
+
+ cpu_node = of_find_node_by_phandle(cpu_phandle);
+ if (WARN_ON(!cpu_node))
+ continue;
+
+ cpu = of_cpu_node_to_id(cpu_node);
+ if (WARN_ON(cpu < 0))
+ continue;
+
+ pr_cont("%pOF[%d] ", cpu_node, cpu);
+
+ cpumask_set_cpu(cpu, &part->mask);
+ }
+
+ pr_cont("}\n");
+ part_idx++;
+ }
+
+ for (i = 0; i < gic_data.ppi_nr; i++) {
+ unsigned int irq;
+ struct partition_desc *desc;
+ struct irq_fwspec ppi_fwspec = {
+ .fwnode = gic_data.fwnode,
+ .param_count = 3,
+ .param = {
+ [0] = GIC_IRQ_TYPE_PARTITION,
+ [1] = i,
+ [2] = IRQ_TYPE_NONE,
+ },
+ };
+
+ irq = irq_create_fwspec_mapping(&ppi_fwspec);
+ if (WARN_ON(!irq))
+ continue;
+ desc = partition_create_desc(gic_data.fwnode, parts, nr_parts,
+ irq, &partition_domain_ops);
+ if (WARN_ON(!desc))
+ continue;
+
+ gic_data.ppi_descs[i] = desc;
+ }
+
+out_put_node:
+ of_node_put(parts_node);
+}
+
+static void __init gic_of_setup_kvm_info(struct device_node *node)
+{
+ int ret;
+ struct resource r;
+ u32 gicv_idx;
+
+ gic_v3_kvm_info.type = GIC_V3;
+
+ gic_v3_kvm_info.maint_irq = irq_of_parse_and_map(node, 0);
+ if (!gic_v3_kvm_info.maint_irq)
+ return;
+
+ if (of_property_read_u32(node, "#redistributor-regions",
+ &gicv_idx))
+ gicv_idx = 1;
+
+ gicv_idx += 3; /* Also skip GICD, GICC, GICH */
+ ret = of_address_to_resource(node, gicv_idx, &r);
+ if (!ret)
+ gic_v3_kvm_info.vcpu = r;
+
+ gic_v3_kvm_info.has_v4 = gic_data.rdists.has_vlpis;
+ gic_v3_kvm_info.has_v4_1 = gic_data.rdists.has_rvpeid;
+ gic_set_kvm_info(&gic_v3_kvm_info);
+}
+
+static int __init gic_of_init(struct device_node *node, struct device_node *parent)
+{
+ void __iomem *dist_base;
+ struct redist_region *rdist_regs;
+ u64 redist_stride;
+ u32 nr_redist_regions;
+ int err, i;
+ struct resource res;
+ unsigned long skt;
+
+ dist_base = of_iomap(node, 0);
+ if (!dist_base) {
+ pr_err("%pOF: unable to map gic dist registers\n", node);
+ return -ENXIO;
+ }
+
+ err = gic_validate_dist_version(dist_base);
+ if (err) {
+ pr_err("%pOF: no distributor detected, giving up\n", node);
+ goto out_unmap_dist;
+ }
+
+ if (of_address_to_resource(node, 0, &res)) {
+ pr_err("Error: No GIC Distributor in FDT\n");
+ err = -ENXIO;
+ goto out_unmap_dist;
+ }
+
+ mars3_gic_dists[0].phys_base = res.start;
+ mars3_gic_dists[0].size = resource_size(&res);
+ mars3_gic_dists[0].dist_base = dist_base;
+
+ if (of_property_read_u32(node, "#mars3_soc_bitmap", &mars3_sockets_bitmap))
+ mars3_sockets_bitmap = 0x1;
+
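+ /*
+ * Every populated socket exposes its own distributor alias: the socket
+ * id shifted by MARS3_ADDR_SKTID_SHIFT is OR-ed into the base physical
+ * address, and each alias is mapped separately.
+ */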
+ for (skt = 1; skt < MAX_MARS3_SOC_COUNT; skt++) {
+ if ((((unsigned int)1 << skt) & mars3_sockets_bitmap) == 0)
+ continue;
+
+ mars3_gic_dists[skt].phys_base = ((unsigned long)skt << MARS3_ADDR_SKTID_SHIFT) | mars3_gic_dists[0].phys_base;
+ mars3_gic_dists[skt].size = mars3_gic_dists[0].size;
+ mars3_gic_dists[skt].dist_base = ioremap(mars3_gic_dists[skt].phys_base, mars3_gic_dists[skt].size);
+ }
+
+ if (of_property_read_u32(node, "#redistributor-regions", &nr_redist_regions))
+ nr_redist_regions = 1;
+
+ rdist_regs = kcalloc(nr_redist_regions, sizeof(*rdist_regs),
+ GFP_KERNEL);
+ if (!rdist_regs) {
+ err = -ENOMEM;
+ goto out_unmap_dist;
+ }
+
+ for (i = 0; i < nr_redist_regions; i++) {
+ struct resource res;
+ int ret;
+
+ ret = of_address_to_resource(node, 1 + i, &res);
+ rdist_regs[i].redist_base = of_iomap(node, 1 + i);
+ if (ret || !rdist_regs[i].redist_base) {
+ pr_err("%pOF: couldn't map region %d\n", node, i);
+ err = -ENODEV;
+ goto out_unmap_rdist;
+ }
+ rdist_regs[i].phys_base = res.start;
+ }
+
+ if (of_property_read_u64(node, "redistributor-stride", &redist_stride))
+ redist_stride = 0;
+
+ gic_enable_of_quirks(node, gic_quirks, &gic_data);
+
+ err = gic_init_bases(dist_base, rdist_regs, nr_redist_regions,
+ redist_stride, &node->fwnode);
+ if (err)
+ goto out_unmap_rdist;
+
+ gic_populate_ppi_partitions(node);
+
+ if (static_branch_likely(&supports_deactivate_key))
+ gic_of_setup_kvm_info(node);
+ return 0;
+
+out_unmap_rdist:
+ for (i = 0; i < nr_redist_regions; i++)
+ if (rdist_regs[i].redist_base)
+ iounmap(rdist_regs[i].redist_base);
+ kfree(rdist_regs);
+out_unmap_dist:
+ iounmap(dist_base);
+ return err;
+}
+
+IRQCHIP_DECLARE(gic_phyt_2500, "arm,gic-phytium-2500", gic_of_init);
+
+#ifdef CONFIG_ACPI
+static struct
+{
+ void __iomem *dist_base;
+ struct redist_region *redist_regs;
+ u32 nr_redist_regions;
+ bool single_redist;
+ int enabled_rdists;
+ u32 maint_irq;
+ int maint_irq_mode;
+ phys_addr_t vcpu_base;
+} acpi_data __initdata;
+
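+/*
+ * Derive the bitmap of populated sockets from the MPIDR Aff2 field of all
+ * possible CPUs; it determines which per-socket distributor aliases to map.
+ */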
+static int gic_mars3_sockets_bitmap(void)
+{
+ unsigned int skt, i;
+ int skt_bitmap = 0;
+
+ unsigned int skt_cpu_cnt[MAX_MARS3_SOC_COUNT] = {0};
+
+ for (i = 0; i < nr_cpu_ids; i++) {
+ skt = (cpu_logical_map(i) >> 16) & 0xff;
+ if ((skt >= 0) && (skt < MAX_MARS3_SOC_COUNT)) {
+ skt_cpu_cnt[skt]++;
+ } else if (skt != 0xff) {
+ pr_err("socket address: %u is out of range\n", skt);
+ }
+ }
+
+ for (i = 0; i < MAX_MARS3_SOC_COUNT; i++) {
+ if (skt_cpu_cnt[i] > 0)
+ skt_bitmap |= (1 << i);
+ }
+
+ return skt_bitmap;
+}
+
+static void __init
+gic_acpi_register_redist(phys_addr_t phys_base, void __iomem *redist_base)
+{
+ static int count = 0;
+
+ acpi_data.redist_regs[count].phys_base = phys_base;
+ acpi_data.redist_regs[count].redist_base = redist_base;
+ acpi_data.redist_regs[count].single_redist = acpi_data.single_redist;
+ count++;
+}
+
+static int __init
+gic_acpi_parse_madt_redist(union acpi_subtable_headers *header,
+ const unsigned long end)
+{
+ struct acpi_madt_generic_redistributor *redist =
+ (struct acpi_madt_generic_redistributor *)header;
+ void __iomem *redist_base;
+
+ redist_base = ioremap(redist->base_address, redist->length);
+ if (!redist_base) {
+ pr_err("Couldn't map GICR region @%llx\n", redist->base_address);
+ return -ENOMEM;
+ }
+
+ gic_acpi_register_redist(redist->base_address, redist_base);
+ return 0;
+}
+
+static int __init
+gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
+ const unsigned long end)
+{
+ struct acpi_madt_generic_interrupt *gicc =
+ (struct acpi_madt_generic_interrupt *)header;
+ u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
+ u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
+ void __iomem *redist_base;
+
+ /* A GICC entry without ACPI_MADT_ENABLED is not usable, so skip it */
+ if (!(gicc->flags & ACPI_MADT_ENABLED))
+ return 0;
+
+ redist_base = ioremap(gicc->gicr_base_address, size);
+ if (!redist_base)
+ return -ENOMEM;
+
+ gic_acpi_register_redist(gicc->gicr_base_address, redist_base);
+ return 0;
+}
+
+static int __init gic_acpi_collect_gicr_base(void)
+{
+ acpi_tbl_entry_handler redist_parser;
+ enum acpi_madt_type type;
+
+ if (acpi_data.single_redist) {
+ type = ACPI_MADT_TYPE_GENERIC_INTERRUPT;
+ redist_parser = gic_acpi_parse_madt_gicc;
+ } else {
+ type = ACPI_MADT_TYPE_GENERIC_REDISTRIBUTOR;
+ redist_parser = gic_acpi_parse_madt_redist;
+ }
+
+ /* Collect redistributor base addresses in GICR entries */
+ if (acpi_table_parse_madt(type, redist_parser, 0) > 0)
+ return 0;
+
+ pr_info("No valid GICR entries exist\n");
+ return -ENODEV;
+}
+
+static int __init gic_acpi_match_gicr(union acpi_subtable_headers *header,
+ const unsigned long end)
+{
+ /* Subtable presence means that redist exists, that's it */
+ return 0;
+}
+
+static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
+ const unsigned long end)
+{
+ struct acpi_madt_generic_interrupt *gicc =
+ (struct acpi_madt_generic_interrupt *)header;
+
+ /*
+ * If GICC is enabled and has a valid GICR base address, then the
+ * GICR base is presented via GICC.
+ */
+ if ((gicc->flags & ACPI_MADT_ENABLED) && gicc->gicr_base_address) {
+ acpi_data.enabled_rdists++;
+ return 0;
+ }
+
+ /*
+ * It is perfectly valid for firmware to pass a disabled GICC entry;
+ * don't treat it as an error, just skip the entry instead of failing
+ * the probe.
+ */
+ if (!(gicc->flags & ACPI_MADT_ENABLED))
+ return 0;
+
+ return -ENODEV;
+}
+
+static int __init gic_acpi_count_gicr_regions(void)
+{
+ int count;
+
+ /*
+ * Count how many redistributor regions we have. It is not allowed
+ * to mix redistributor description, GICR and GICC subtables have to be
+ * mutually exclusive.
+ */
+ count = acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_REDISTRIBUTOR,
+ gic_acpi_match_gicr, 0);
+ if (count > 0) {
+ acpi_data.single_redist = false;
+ return count;
+ }
+
+ count = acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_INTERRUPT,
+ gic_acpi_match_gicc, 0);
+ if (count > 0) {
+ acpi_data.single_redist = true;
+ count = acpi_data.enabled_rdists;
+ }
+
+ return count;
+}
+
+static bool __init acpi_validate_gic_table(struct acpi_subtable_header *header,
+ struct acpi_probe_entry *ape)
+{
+ struct acpi_madt_generic_distributor *dist;
+ int count;
+
+ dist = (struct acpi_madt_generic_distributor *)header;
+ if (dist->version != ape->driver_data)
+ return false;
+
+ /* We need to do that exercise anyway, the sooner the better */
+ count = gic_acpi_count_gicr_regions();
+ if (count <= 0)
+ return false;
+
+ acpi_data.nr_redist_regions = count;
+ return true;
+}
+
+static int __init gic_acpi_parse_virt_madt_gicc(union acpi_subtable_headers *header,
+ const unsigned long end)
+{
+ struct acpi_madt_generic_interrupt *gicc =
+ (struct acpi_madt_generic_interrupt *)header;
+ int maint_irq_mode;
+ static int first_madt = true;
+
+ /* Skip unusable CPUs */
+ if (!(gicc->flags & ACPI_MADT_ENABLED))
+ return 0;
+
+ maint_irq_mode = (gicc->flags & ACPI_MADT_VGIC_IRQ_MODE) ?
+ ACPI_EDGE_SENSITIVE : ACPI_LEVEL_SENSITIVE;
+
+ if (first_madt) {
+ first_madt = false;
+
+ acpi_data.maint_irq = gicc->vgic_interrupt;
+ acpi_data.maint_irq_mode = maint_irq_mode;
+ acpi_data.vcpu_base = gicc->gicv_base_address;
+
+ return 0;
+ }
+
+ /*
+ * The maintenance interrupt and GICV should be the same for every CPU
+ */
+ if ((acpi_data.maint_irq != gicc->vgic_interrupt) ||
+ (acpi_data.maint_irq_mode != maint_irq_mode) ||
+ (acpi_data.vcpu_base != gicc->gicv_base_address))
+ return -EINVAL;
+
+ return 0;
+}
+
+static bool __init gic_acpi_collect_virt_info(void)
+{
+ int count;
+
+ count = acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_INTERRUPT,
+ gic_acpi_parse_virt_madt_gicc, 0);
+
+ return (count > 0);
+}
+
+#define ACPI_GICV3_DIST_MEM_SIZE (SZ_64K)
+#define ACPI_GICV2_VCTRL_MEM_SIZE (SZ_4K)
+#define ACPI_GICV2_VCPU_MEM_SIZE (SZ_8K)
+
+static void __init gic_acpi_setup_kvm_info(void)
+{
+ int irq;
+
+ if (!gic_acpi_collect_virt_info()) {
+ pr_warn("Unable to get hardware information used for virtualization\n");
+ return;
+ }
+
+ gic_v3_kvm_info.type = GIC_V3;
+
+ irq = acpi_register_gsi(NULL, acpi_data.maint_irq,
+ acpi_data.maint_irq_mode,
+ ACPI_ACTIVE_HIGH);
+ if (irq <= 0)
+ return;
+
+ gic_v3_kvm_info.maint_irq = irq;
+
+ if (acpi_data.vcpu_base) {
+ struct resource *vcpu = &gic_v3_kvm_info.vcpu;
+
+ vcpu->flags = IORESOURCE_MEM;
+ vcpu->start = acpi_data.vcpu_base;
+ vcpu->end = vcpu->start + ACPI_GICV2_VCPU_MEM_SIZE - 1;
+ }
+
+ gic_v3_kvm_info.has_v4 = gic_data.rdists.has_vlpis;
+ gic_v3_kvm_info.has_v4_1 = gic_data.rdists.has_rvpeid;
+ gic_set_kvm_info(&gic_v3_kvm_info);
+}
+
+static int __init
+gic_acpi_init(union acpi_subtable_headers *header, const unsigned long end)
+{
+ struct acpi_madt_generic_distributor *dist;
+ struct fwnode_handle *domain_handle;
+ size_t size;
+ int i, err;
+ int skt;
+
+ /* Get distributor base address */
+ dist = (struct acpi_madt_generic_distributor *)header;
+ acpi_data.dist_base = ioremap(dist->base_address,
+ ACPI_GICV3_DIST_MEM_SIZE);
+ if (!acpi_data.dist_base) {
+ pr_err("Unable to map GICD registers\n");
+ return -ENOMEM;
+ }
+
+ err = gic_validate_dist_version(acpi_data.dist_base);
+ if (err) {
+ pr_err("No distributor detected at @%p, giving up\n",
+ acpi_data.dist_base);
+ goto out_dist_unmap;
+ }
+
+ mars3_gic_dists[0].phys_base = dist->base_address;
+ mars3_gic_dists[0].size = ACPI_GICV3_DIST_MEM_SIZE;
+ mars3_gic_dists[0].dist_base = acpi_data.dist_base;
+
+ mars3_sockets_bitmap = gic_mars3_sockets_bitmap();
+ if (mars3_sockets_bitmap == 0) {
+ mars3_sockets_bitmap = 0x1;
+ pr_err("No socket, please check cpus MPIDR_AFFINITY_LEVEL!!!\n");
+ } else {
+ pr_info("mars3_sockets_bitmap = 0x%x\n", mars3_sockets_bitmap);
+ }
+
+ for (skt = 1; skt < MAX_MARS3_SOC_COUNT; skt++) {
+ if ((((unsigned int)1 << skt) & mars3_sockets_bitmap) == 0)
+ continue;
+
+ mars3_gic_dists[skt].phys_base = ((unsigned long)skt << MARS3_ADDR_SKTID_SHIFT) | mars3_gic_dists[0].phys_base;
+ mars3_gic_dists[skt].size = mars3_gic_dists[0].size;
+ mars3_gic_dists[skt].dist_base = ioremap(mars3_gic_dists[skt].phys_base, mars3_gic_dists[skt].size);
+ }
+
+ size = sizeof(*acpi_data.redist_regs) * acpi_data.nr_redist_regions;
+ acpi_data.redist_regs = kzalloc(size, GFP_KERNEL);
+ if (!acpi_data.redist_regs) {
+ err = -ENOMEM;
+ goto out_dist_unmap;
+ }
+
+ err = gic_acpi_collect_gicr_base();
+ if (err)
+ goto out_redist_unmap;
+
+ domain_handle = irq_domain_alloc_fwnode(&dist->base_address);
+ if (!domain_handle) {
+ err = -ENOMEM;
+ goto out_redist_unmap;
+ }
+
+ err = gic_init_bases(acpi_data.dist_base, acpi_data.redist_regs,
+ acpi_data.nr_redist_regions, 0, domain_handle);
+ if (err)
+ goto out_fwhandle_free;
+
+ acpi_set_irq_model(ACPI_IRQ_MODEL_GIC, domain_handle);
+
+ if (static_branch_likely(&supports_deactivate_key))
+ gic_acpi_setup_kvm_info();
+
+ return 0;
+
+out_fwhandle_free:
+ irq_domain_free_fwnode(domain_handle);
+out_redist_unmap:
+ for (i = 0; i < acpi_data.nr_redist_regions; i++)
+ if (acpi_data.redist_regs[i].redist_base)
+ iounmap(acpi_data.redist_regs[i].redist_base);
+ kfree(acpi_data.redist_regs);
+out_dist_unmap:
+ iounmap(acpi_data.dist_base);
+ return err;
+}
+IRQCHIP_ACPI_DECLARE(gic_phyt_2500, ACPI_MADT_TYPE_PHYTIUM_2500,
+ acpi_validate_gic_table, ACPI_MADT_GIC_VERSION_V3,
+ gic_acpi_init);
+#endif
diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
index e44230e4aff9..a1f33b091801 100644
--- a/include/acpi/actbl2.h
+++ b/include/acpi/actbl2.h
@@ -518,7 +518,8 @@ enum acpi_madt_type {
ACPI_MADT_TYPE_GENERIC_MSI_FRAME = 13,
ACPI_MADT_TYPE_GENERIC_REDISTRIBUTOR = 14,
ACPI_MADT_TYPE_GENERIC_TRANSLATOR = 15,
- ACPI_MADT_TYPE_RESERVED = 16 /* 16 and greater are reserved */
+ ACPI_MADT_TYPE_RESERVED = 16, /* 16 and greater are reserved */
+ ACPI_MADT_TYPE_PHYTIUM_2500 = 128
};
/*
diff --git a/include/linux/irqchip/arm-gic-phytium-2500.h b/include/linux/irqchip/arm-gic-phytium-2500.h
new file mode 100644
index 000000000000..96c5ca83f44d
--- /dev/null
+++ b/include/linux/irqchip/arm-gic-phytium-2500.h
@@ -0,0 +1,725 @@
+/*
+ * Copyright (C) 2020 Phytium Corporation.
+ * Author: Wang Yinfeng <wangyinfeng(a)phytium.com.cn>
+ * Chen Baozi <chenbaozi(a)phytium.com.cn>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __LINUX_IRQCHIP_ARM_GIC_PHYTIUM_2500_H
+#define __LINUX_IRQCHIP_ARM_GIC_PHYTIUM_2500_H
+
+/*
+ * Distributor registers. We assume we're running non-secure, with ARE
+ * being set. Secure-only and non-ARE registers are not described.
+ */
+#define GICD_CTLR 0x0000
+#define GICD_TYPER 0x0004
+#define GICD_IIDR 0x0008
+#define GICD_TYPER2 0x000C
+#define GICD_STATUSR 0x0010
+#define GICD_SETSPI_NSR 0x0040
+#define GICD_CLRSPI_NSR 0x0048
+#define GICD_SETSPI_SR 0x0050
+#define GICD_CLRSPI_SR 0x0058
+#define GICD_IGROUPR 0x0080
+#define GICD_ISENABLER 0x0100
+#define GICD_ICENABLER 0x0180
+#define GICD_ISPENDR 0x0200
+#define GICD_ICPENDR 0x0280
+#define GICD_ISACTIVER 0x0300
+#define GICD_ICACTIVER 0x0380
+#define GICD_IPRIORITYR 0x0400
+#define GICD_ICFGR 0x0C00
+#define GICD_IGRPMODR 0x0D00
+#define GICD_NSACR 0x0E00
+#define GICD_IGROUPRnE 0x1000
+#define GICD_ISENABLERnE 0x1200
+#define GICD_ICENABLERnE 0x1400
+#define GICD_ISPENDRnE 0x1600
+#define GICD_ICPENDRnE 0x1800
+#define GICD_ISACTIVERnE 0x1A00
+#define GICD_ICACTIVERnE 0x1C00
+#define GICD_IPRIORITYRnE 0x2000
+#define GICD_ICFGRnE 0x3000
+#define GICD_IROUTER 0x6000
+#define GICD_IROUTERnE 0x8000
+#define GICD_IDREGS 0xFFD0
+#define GICD_PIDR2 0xFFE8
+
+#define ESPI_BASE_INTID 4096
+
+/*
+ * Those registers are actually from GICv2, but the spec demands that they
+ * are implemented as RES0 if ARE is 1 (which we do in KVM's emulated GICv3).
+ */
+#define GICD_ITARGETSR 0x0800
+#define GICD_SGIR 0x0F00
+#define GICD_CPENDSGIR 0x0F10
+#define GICD_SPENDSGIR 0x0F20
+
+#define GICD_CTLR_RWP (1U << 31)
+#define GICD_CTLR_nASSGIreq (1U << 8)
+#define GICD_CTLR_DS (1U << 6)
+#define GICD_CTLR_ARE_NS (1U << 4)
+#define GICD_CTLR_ENABLE_G1A (1U << 1)
+#define GICD_CTLR_ENABLE_G1 (1U << 0)
+
+#define GICD_IIDR_IMPLEMENTER_SHIFT 0
+#define GICD_IIDR_IMPLEMENTER_MASK (0xfff << GICD_IIDR_IMPLEMENTER_SHIFT)
+#define GICD_IIDR_REVISION_SHIFT 12
+#define GICD_IIDR_REVISION_MASK (0xf << GICD_IIDR_REVISION_SHIFT)
+#define GICD_IIDR_VARIANT_SHIFT 16
+#define GICD_IIDR_VARIANT_MASK (0xf << GICD_IIDR_VARIANT_SHIFT)
+#define GICD_IIDR_PRODUCT_ID_SHIFT 24
+#define GICD_IIDR_PRODUCT_ID_MASK (0xff << GICD_IIDR_PRODUCT_ID_SHIFT)
+
+
+/*
+ * In systems with a single security state (what we emulate in KVM)
+ * the meaning of the interrupt group enable bits is slightly different
+ */
+#define GICD_CTLR_ENABLE_SS_G1 (1U << 1)
+#define GICD_CTLR_ENABLE_SS_G0 (1U << 0)
+
+#define GICD_TYPER_RSS (1U << 26)
+#define GICD_TYPER_LPIS (1U << 17)
+#define GICD_TYPER_MBIS (1U << 16)
+#define GICD_TYPER_ESPI (1U << 8)
+
+#define GICD_TYPER_ID_BITS(typer) ((((typer) >> 19) & 0x1f) + 1)
+#define GICD_TYPER_NUM_LPIS(typer) ((((typer) >> 11) & 0x1f) + 1)
+#define GICD_TYPER_SPIS(typer) ((((typer) & 0x1f) + 1) * 32)
+#define GICD_TYPER_ESPIS(typer) \
+ (((typer) & GICD_TYPER_ESPI) ? GICD_TYPER_SPIS((typer) >> 27) : 0)
+
+#define GICD_TYPER2_nASSGIcap (1U << 8)
+#define GICD_TYPER2_VIL (1U << 7)
+#define GICD_TYPER2_VID GENMASK(4, 0)
+
+#define GICD_IROUTER_SPI_MODE_ONE (0U << 31)
+#define GICD_IROUTER_SPI_MODE_ANY (1U << 31)
+
+#define GIC_PIDR2_ARCH_MASK 0xf0
+#define GIC_PIDR2_ARCH_GICv3 0x30
+#define GIC_PIDR2_ARCH_GICv4 0x40
+
+#define GIC_V3_DIST_SIZE 0x10000
+
+#define GIC_PAGE_SIZE_4K 0ULL
+#define GIC_PAGE_SIZE_16K 1ULL
+#define GIC_PAGE_SIZE_64K 2ULL
+#define GIC_PAGE_SIZE_MASK 3ULL
+
+/*
+ * Re-Distributor registers, offsets from RD_base
+ */
+#define GICR_CTLR GICD_CTLR
+#define GICR_IIDR 0x0004
+#define GICR_TYPER 0x0008
+#define GICR_STATUSR GICD_STATUSR
+#define GICR_WAKER 0x0014
+#define GICR_SETLPIR 0x0040
+#define GICR_CLRLPIR 0x0048
+#define GICR_PROPBASER 0x0070
+#define GICR_PENDBASER 0x0078
+#define GICR_INVLPIR 0x00A0
+#define GICR_INVALLR 0x00B0
+#define GICR_SYNCR 0x00C0
+#define GICR_IDREGS GICD_IDREGS
+#define GICR_PIDR2 GICD_PIDR2
+
+#define GICR_CTLR_ENABLE_LPIS (1UL << 0)
+#define GICR_CTLR_RWP (1UL << 3)
+
+#define GICR_TYPER_CPU_NUMBER(r) (((r) >> 8) & 0xffff)
+
+#define EPPI_BASE_INTID 1056
+
+#define GICR_TYPER_NR_PPIS(r) \
+ ({ \
+ unsigned int __ppinum = ((r) >> 27) & 0x1f; \
+ unsigned int __nr_ppis = 16; \
+ if (__ppinum == 1 || __ppinum == 2) \
+ __nr_ppis += __ppinum * 32; \
+ \
+ __nr_ppis; \
+ })
+
+#define GICR_WAKER_ProcessorSleep (1U << 1)
+#define GICR_WAKER_ChildrenAsleep (1U << 2)
+
+#define GIC_BASER_CACHE_nCnB 0ULL
+#define GIC_BASER_CACHE_SameAsInner 0ULL
+#define GIC_BASER_CACHE_nC 1ULL
+#define GIC_BASER_CACHE_RaWt 2ULL
+#define GIC_BASER_CACHE_RaWb 3ULL
+#define GIC_BASER_CACHE_WaWt 4ULL
+#define GIC_BASER_CACHE_WaWb 5ULL
+#define GIC_BASER_CACHE_RaWaWt 6ULL
+#define GIC_BASER_CACHE_RaWaWb 7ULL
+#define GIC_BASER_CACHE_MASK 7ULL
+#define GIC_BASER_NonShareable 0ULL
+#define GIC_BASER_InnerShareable 1ULL
+#define GIC_BASER_OuterShareable 2ULL
+#define GIC_BASER_SHAREABILITY_MASK 3ULL
+
+#define GIC_BASER_CACHEABILITY(reg, inner_outer, type) \
+ (GIC_BASER_CACHE_##type << reg##_##inner_outer##_CACHEABILITY_SHIFT)
+
+#define GIC_BASER_SHAREABILITY(reg, type) \
+ (GIC_BASER_##type << reg##_SHAREABILITY_SHIFT)
+
+/* encode a size field of width @w containing @n - 1 units */
+#define GIC_ENCODE_SZ(n, w) (((unsigned long)(n) - 1) & GENMASK_ULL(((w) - 1), 0))
+
+#define GICR_PROPBASER_SHAREABILITY_SHIFT (10)
+#define GICR_PROPBASER_INNER_CACHEABILITY_SHIFT (7)
+#define GICR_PROPBASER_OUTER_CACHEABILITY_SHIFT (56)
+#define GICR_PROPBASER_SHAREABILITY_MASK \
+ GIC_BASER_SHAREABILITY(GICR_PROPBASER, SHAREABILITY_MASK)
+#define GICR_PROPBASER_INNER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, MASK)
+#define GICR_PROPBASER_OUTER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_PROPBASER, OUTER, MASK)
+#define GICR_PROPBASER_CACHEABILITY_MASK GICR_PROPBASER_INNER_CACHEABILITY_MASK
+
+#define GICR_PROPBASER_InnerShareable \
+ GIC_BASER_SHAREABILITY(GICR_PROPBASER, InnerShareable)
+
+#define GICR_PROPBASER_nCnB GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, nCnB)
+#define GICR_PROPBASER_nC GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, nC)
+#define GICR_PROPBASER_RaWt GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWt)
+#define GICR_PROPBASER_RaWb GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWb)
+#define GICR_PROPBASER_WaWt GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, WaWt)
+#define GICR_PROPBASER_WaWb GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, WaWb)
+#define GICR_PROPBASER_RaWaWt GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWaWt)
+#define GICR_PROPBASER_RaWaWb GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWaWb)
+
+#define GICR_PROPBASER_IDBITS_MASK (0x1f)
+#define GICR_PROPBASER_ADDRESS(x) ((x) & GENMASK_ULL(51, 12))
+#define GICR_PENDBASER_ADDRESS(x) ((x) & GENMASK_ULL(51, 16))
+
+#define GICR_PENDBASER_SHAREABILITY_SHIFT (10)
+#define GICR_PENDBASER_INNER_CACHEABILITY_SHIFT (7)
+#define GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT (56)
+#define GICR_PENDBASER_SHAREABILITY_MASK \
+ GIC_BASER_SHAREABILITY(GICR_PENDBASER, SHAREABILITY_MASK)
+#define GICR_PENDBASER_INNER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, MASK)
+#define GICR_PENDBASER_OUTER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_PENDBASER, OUTER, MASK)
+#define GICR_PENDBASER_CACHEABILITY_MASK GICR_PENDBASER_INNER_CACHEABILITY_MASK
+
+#define GICR_PENDBASER_InnerShareable \
+ GIC_BASER_SHAREABILITY(GICR_PENDBASER, InnerShareable)
+
+#define GICR_PENDBASER_nCnB GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, nCnB)
+#define GICR_PENDBASER_nC GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, nC)
+#define GICR_PENDBASER_RaWt GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWt)
+#define GICR_PENDBASER_RaWb GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWb)
+#define GICR_PENDBASER_WaWt GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, WaWt)
+#define GICR_PENDBASER_WaWb GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, WaWb)
+#define GICR_PENDBASER_RaWaWt GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWaWt)
+#define GICR_PENDBASER_RaWaWb GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWaWb)
+
+#define GICR_PENDBASER_PTZ BIT_ULL(62)
+
+/*
+ * Re-Distributor registers, offsets from SGI_base
+ */
+#define GICR_IGROUPR0 GICD_IGROUPR
+#define GICR_ISENABLER0 GICD_ISENABLER
+#define GICR_ICENABLER0 GICD_ICENABLER
+#define GICR_ISPENDR0 GICD_ISPENDR
+#define GICR_ICPENDR0 GICD_ICPENDR
+#define GICR_ISACTIVER0 GICD_ISACTIVER
+#define GICR_ICACTIVER0 GICD_ICACTIVER
+#define GICR_IPRIORITYR0 GICD_IPRIORITYR
+#define GICR_ICFGR0 GICD_ICFGR
+#define GICR_IGRPMODR0 GICD_IGRPMODR
+#define GICR_NSACR GICD_NSACR
+
+#define GICR_TYPER_PLPIS (1U << 0)
+#define GICR_TYPER_VLPIS (1U << 1)
+#define GICR_TYPER_DIRTY (1U << 2)
+#define GICR_TYPER_DirectLPIS (1U << 3)
+#define GICR_TYPER_LAST (1U << 4)
+#define GICR_TYPER_RVPEID (1U << 7)
+#define GICR_TYPER_COMMON_LPI_AFF GENMASK_ULL(25, 24)
+#define GICR_TYPER_AFFINITY GENMASK_ULL(63, 32)
+
+#define GICR_INVLPIR_INTID GENMASK_ULL(31, 0)
+#define GICR_INVLPIR_VPEID GENMASK_ULL(47, 32)
+#define GICR_INVLPIR_V GENMASK_ULL(63, 63)
+
+#define GICR_INVALLR_VPEID GICR_INVLPIR_VPEID
+#define GICR_INVALLR_V GICR_INVLPIR_V
+
+#define GIC_V3_REDIST_SIZE 0x20000
+
+#define LPI_PROP_GROUP1 (1 << 1)
+#define LPI_PROP_ENABLED (1 << 0)
+
+/*
+ * Re-Distributor registers, offsets from VLPI_base
+ */
+#define GICR_VPROPBASER 0x0070
+
+#define GICR_VPROPBASER_IDBITS_MASK 0x1f
+
+#define GICR_VPROPBASER_SHAREABILITY_SHIFT (10)
+#define GICR_VPROPBASER_INNER_CACHEABILITY_SHIFT (7)
+#define GICR_VPROPBASER_OUTER_CACHEABILITY_SHIFT (56)
+
+#define GICR_VPROPBASER_SHAREABILITY_MASK \
+ GIC_BASER_SHAREABILITY(GICR_VPROPBASER, SHAREABILITY_MASK)
+#define GICR_VPROPBASER_INNER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, MASK)
+#define GICR_VPROPBASER_OUTER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_VPROPBASER, OUTER, MASK)
+#define GICR_VPROPBASER_CACHEABILITY_MASK \
+ GICR_VPROPBASER_INNER_CACHEABILITY_MASK
+
+#define GICR_VPROPBASER_InnerShareable \
+ GIC_BASER_SHAREABILITY(GICR_VPROPBASER, InnerShareable)
+
+#define GICR_VPROPBASER_nCnB GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, nCnB)
+#define GICR_VPROPBASER_nC GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, nC)
+#define GICR_VPROPBASER_RaWt GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWt)
+#define GICR_VPROPBASER_RaWb GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWb)
+#define GICR_VPROPBASER_WaWt GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, WaWt)
+#define GICR_VPROPBASER_WaWb GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, WaWb)
+#define GICR_VPROPBASER_RaWaWt GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWaWt)
+#define GICR_VPROPBASER_RaWaWb GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWaWb)
+
+/*
+ * GICv4.1 VPROPBASER reinvention. A subtle mix between the old
+ * VPROPBASER and ITS_BASER. Just not quite any of the two.
+ */
+#define GICR_VPROPBASER_4_1_VALID (1ULL << 63)
+#define GICR_VPROPBASER_4_1_ENTRY_SIZE GENMASK_ULL(61, 59)
+#define GICR_VPROPBASER_4_1_INDIRECT (1ULL << 55)
+#define GICR_VPROPBASER_4_1_PAGE_SIZE GENMASK_ULL(54, 53)
+#define GICR_VPROPBASER_4_1_Z (1ULL << 52)
+#define GICR_VPROPBASER_4_1_ADDR GENMASK_ULL(51, 12)
+#define GICR_VPROPBASER_4_1_SIZE GENMASK_ULL(6, 0)
+
+#define GICR_VPENDBASER 0x0078
+
+#define GICR_VPENDBASER_SHAREABILITY_SHIFT (10)
+#define GICR_VPENDBASER_INNER_CACHEABILITY_SHIFT (7)
+#define GICR_VPENDBASER_OUTER_CACHEABILITY_SHIFT (56)
+#define GICR_VPENDBASER_SHAREABILITY_MASK \
+ GIC_BASER_SHAREABILITY(GICR_VPENDBASER, SHAREABILITY_MASK)
+#define GICR_VPENDBASER_INNER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, MASK)
+#define GICR_VPENDBASER_OUTER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_VPENDBASER, OUTER, MASK)
+#define GICR_VPENDBASER_CACHEABILITY_MASK \
+ GICR_VPENDBASER_INNER_CACHEABILITY_MASK
+
+#define GICR_VPENDBASER_NonShareable \
+ GIC_BASER_SHAREABILITY(GICR_VPENDBASER, NonShareable)
+
+#define GICR_VPENDBASER_InnerShareable \
+ GIC_BASER_SHAREABILITY(GICR_VPENDBASER, InnerShareable)
+
+#define GICR_VPENDBASER_nCnB GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, nCnB)
+#define GICR_VPENDBASER_nC GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, nC)
+#define GICR_VPENDBASER_RaWt GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWt)
+#define GICR_VPENDBASER_RaWb GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWb)
+#define GICR_VPENDBASER_WaWt GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, WaWt)
+#define GICR_VPENDBASER_WaWb GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, WaWb)
+#define GICR_VPENDBASER_RaWaWt GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWaWt)
+#define GICR_VPENDBASER_RaWaWb GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWaWb)
+
+#define GICR_VPENDBASER_Dirty (1ULL << 60)
+#define GICR_VPENDBASER_PendingLast (1ULL << 61)
+#define GICR_VPENDBASER_IDAI (1ULL << 62)
+#define GICR_VPENDBASER_Valid (1ULL << 63)
+
+/*
+ * GICv4.1 VPENDBASER, used for VPE residency. On top of these fields,
+ * also use the above Valid, PendingLast and Dirty.
+ */
+#define GICR_VPENDBASER_4_1_DB (1ULL << 62)
+#define GICR_VPENDBASER_4_1_VGRP0EN (1ULL << 59)
+#define GICR_VPENDBASER_4_1_VGRP1EN (1ULL << 58)
+#define GICR_VPENDBASER_4_1_VPEID GENMASK_ULL(15, 0)
+
+#define GICR_VSGIR 0x0080
+
+#define GICR_VSGIR_VPEID GENMASK(15, 0)
+
+#define GICR_VSGIPENDR 0x0088
+
+#define GICR_VSGIPENDR_BUSY (1U << 31)
+#define GICR_VSGIPENDR_PENDING GENMASK(15, 0)
+
+/*
+ * ITS registers, offsets from ITS_base
+ */
+#define GITS_CTLR 0x0000
+#define GITS_IIDR 0x0004
+#define GITS_TYPER 0x0008
+#define GITS_MPIDR 0x0018
+#define GITS_CBASER 0x0080
+#define GITS_CWRITER 0x0088
+#define GITS_CREADR 0x0090
+#define GITS_BASER 0x0100
+#define GITS_IDREGS_BASE 0xffd0
+#define GITS_PIDR0 0xffe0
+#define GITS_PIDR1 0xffe4
+#define GITS_PIDR2 GICR_PIDR2
+#define GITS_PIDR4 0xffd0
+#define GITS_CIDR0 0xfff0
+#define GITS_CIDR1 0xfff4
+#define GITS_CIDR2 0xfff8
+#define GITS_CIDR3 0xfffc
+
+#define GITS_TRANSLATER 0x10040
+
+#define GITS_SGIR 0x20020
+
+#define GITS_SGIR_VPEID GENMASK_ULL(47, 32)
+#define GITS_SGIR_VINTID GENMASK_ULL(3, 0)
+
+#define GITS_CTLR_ENABLE (1U << 0)
+#define GITS_CTLR_ImDe (1U << 1)
+#define GITS_CTLR_ITS_NUMBER_SHIFT 4
+#define GITS_CTLR_ITS_NUMBER (0xFU << GITS_CTLR_ITS_NUMBER_SHIFT)
+#define GITS_CTLR_QUIESCENT (1U << 31)
+
+#define GITS_TYPER_PLPIS (1UL << 0)
+#define GITS_TYPER_VLPIS (1UL << 1)
+#define GITS_TYPER_ITT_ENTRY_SIZE_SHIFT 4
+#define GITS_TYPER_ITT_ENTRY_SIZE GENMASK_ULL(7, 4)
+#define GITS_TYPER_IDBITS_SHIFT 8
+#define GITS_TYPER_DEVBITS_SHIFT 13
+#define GITS_TYPER_DEVBITS GENMASK_ULL(17, 13)
+#define GITS_TYPER_PTA (1UL << 19)
+#define GITS_TYPER_HCC_SHIFT 24
+#define GITS_TYPER_HCC(r) (((r) >> GITS_TYPER_HCC_SHIFT) & 0xff)
+#define GITS_TYPER_VMOVP (1ULL << 37)
+#define GITS_TYPER_VMAPP (1ULL << 40)
+#define GITS_TYPER_SVPET GENMASK_ULL(42, 41)
+
+#define GITS_IIDR_REV_SHIFT 12
+#define GITS_IIDR_REV_MASK (0xf << GITS_IIDR_REV_SHIFT)
+#define GITS_IIDR_REV(r) (((r) >> GITS_IIDR_REV_SHIFT) & 0xf)
+#define GITS_IIDR_PRODUCTID_SHIFT 24
+
+#define GITS_CBASER_VALID (1ULL << 63)
+#define GITS_CBASER_SHAREABILITY_SHIFT (10)
+#define GITS_CBASER_INNER_CACHEABILITY_SHIFT (59)
+#define GITS_CBASER_OUTER_CACHEABILITY_SHIFT (53)
+#define GITS_CBASER_SHAREABILITY_MASK \
+ GIC_BASER_SHAREABILITY(GITS_CBASER, SHAREABILITY_MASK)
+#define GITS_CBASER_INNER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, MASK)
+#define GITS_CBASER_OUTER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GITS_CBASER, OUTER, MASK)
+#define GITS_CBASER_CACHEABILITY_MASK GITS_CBASER_INNER_CACHEABILITY_MASK
+
+#define GITS_CBASER_InnerShareable \
+ GIC_BASER_SHAREABILITY(GITS_CBASER, InnerShareable)
+
+#define GITS_CBASER_nCnB GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, nCnB)
+#define GITS_CBASER_nC GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, nC)
+#define GITS_CBASER_RaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWt)
+#define GITS_CBASER_RaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWb)
+#define GITS_CBASER_WaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, WaWt)
+#define GITS_CBASER_WaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, WaWb)
+#define GITS_CBASER_RaWaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWt)
+#define GITS_CBASER_RaWaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWb)
+
+#define GITS_CBASER_ADDRESS(cbaser) ((cbaser) & GENMASK_ULL(51, 12))
+
+#define GITS_BASER_NR_REGS 8
+
+#define GITS_BASER_VALID (1ULL << 63)
+#define GITS_BASER_INDIRECT (1ULL << 62)
+
+#define GITS_BASER_INNER_CACHEABILITY_SHIFT (59)
+#define GITS_BASER_OUTER_CACHEABILITY_SHIFT (53)
+#define GITS_BASER_INNER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GITS_BASER, INNER, MASK)
+#define GITS_BASER_CACHEABILITY_MASK GITS_BASER_INNER_CACHEABILITY_MASK
+#define GITS_BASER_OUTER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GITS_BASER, OUTER, MASK)
+#define GITS_BASER_SHAREABILITY_MASK \
+ GIC_BASER_SHAREABILITY(GITS_BASER, SHAREABILITY_MASK)
+
+#define GITS_BASER_nCnB GIC_BASER_CACHEABILITY(GITS_BASER, INNER, nCnB)
+#define GITS_BASER_nC GIC_BASER_CACHEABILITY(GITS_BASER, INNER, nC)
+#define GITS_BASER_RaWt GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWt)
+#define GITS_BASER_RaWb GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWb)
+#define GITS_BASER_WaWt GIC_BASER_CACHEABILITY(GITS_BASER, INNER, WaWt)
+#define GITS_BASER_WaWb GIC_BASER_CACHEABILITY(GITS_BASER, INNER, WaWb)
+#define GITS_BASER_RaWaWt GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWaWt)
+#define GITS_BASER_RaWaWb GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWaWb)
+
+#define GITS_BASER_TYPE_SHIFT (56)
+#define GITS_BASER_TYPE(r) (((r) >> GITS_BASER_TYPE_SHIFT) & 7)
+#define GITS_BASER_ENTRY_SIZE_SHIFT (48)
+#define GITS_BASER_ENTRY_SIZE(r) ((((r) >> GITS_BASER_ENTRY_SIZE_SHIFT) & 0x1f) + 1)
+#define GITS_BASER_ENTRY_SIZE_MASK GENMASK_ULL(52, 48)
+#define GITS_BASER_PHYS_52_to_48(phys) \
+ (((phys) & GENMASK_ULL(47, 16)) | (((phys) >> 48) & 0xf) << 12)
+#define GITS_BASER_ADDR_48_to_52(baser) \
+ (((baser) & GENMASK_ULL(47, 16)) | (((baser) >> 12) & 0xf) << 48)
+
+#define GITS_BASER_SHAREABILITY_SHIFT (10)
+#define GITS_BASER_InnerShareable \
+ GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable)
+#define GITS_BASER_PAGE_SIZE_SHIFT (8)
+#define __GITS_BASER_PSZ(sz) (GIC_PAGE_SIZE_ ## sz << GITS_BASER_PAGE_SIZE_SHIFT)
+#define GITS_BASER_PAGE_SIZE_4K __GITS_BASER_PSZ(4K)
+#define GITS_BASER_PAGE_SIZE_16K __GITS_BASER_PSZ(16K)
+#define GITS_BASER_PAGE_SIZE_64K __GITS_BASER_PSZ(64K)
+#define GITS_BASER_PAGE_SIZE_MASK __GITS_BASER_PSZ(MASK)
+#define GITS_BASER_PAGES_MAX 256
+#define GITS_BASER_PAGES_SHIFT (0)
+#define GITS_BASER_NR_PAGES(r) (((r) & 0xff) + 1)
+
+#define GITS_BASER_TYPE_NONE 0
+#define GITS_BASER_TYPE_DEVICE 1
+#define GITS_BASER_TYPE_VCPU 2
+#define GITS_BASER_TYPE_RESERVED3 3
+#define GITS_BASER_TYPE_COLLECTION 4
+#define GITS_BASER_TYPE_RESERVED5 5
+#define GITS_BASER_TYPE_RESERVED6 6
+#define GITS_BASER_TYPE_RESERVED7 7
+
+#define GITS_LVL1_ENTRY_SIZE (8UL)
+
+/*
+ * ITS commands
+ */
+#define GITS_CMD_MAPD 0x08
+#define GITS_CMD_MAPC 0x09
+#define GITS_CMD_MAPTI 0x0a
+#define GITS_CMD_MAPI 0x0b
+#define GITS_CMD_MOVI 0x01
+#define GITS_CMD_DISCARD 0x0f
+#define GITS_CMD_INV 0x0c
+#define GITS_CMD_MOVALL 0x0e
+#define GITS_CMD_INVALL 0x0d
+#define GITS_CMD_INT 0x03
+#define GITS_CMD_CLEAR 0x04
+#define GITS_CMD_SYNC 0x05
+
+/*
+ * GICv4 ITS specific commands
+ */
+#define GITS_CMD_GICv4(x) ((x) | 0x20)
+#define GITS_CMD_VINVALL GITS_CMD_GICv4(GITS_CMD_INVALL)
+#define GITS_CMD_VMAPP GITS_CMD_GICv4(GITS_CMD_MAPC)
+#define GITS_CMD_VMAPTI GITS_CMD_GICv4(GITS_CMD_MAPTI)
+#define GITS_CMD_VMOVI GITS_CMD_GICv4(GITS_CMD_MOVI)
+#define GITS_CMD_VSYNC GITS_CMD_GICv4(GITS_CMD_SYNC)
+/* VMOVP, VSGI and INVDB are the odd ones, as they dont have a physical counterpart */
+#define GITS_CMD_VMOVP GITS_CMD_GICv4(2)
+#define GITS_CMD_VSGI GITS_CMD_GICv4(3)
+#define GITS_CMD_INVDB GITS_CMD_GICv4(0xe)
+
+/*
+ * ITS error numbers
+ */
+#define E_ITS_MOVI_UNMAPPED_INTERRUPT 0x010107
+#define E_ITS_MOVI_UNMAPPED_COLLECTION 0x010109
+#define E_ITS_INT_UNMAPPED_INTERRUPT 0x010307
+#define E_ITS_CLEAR_UNMAPPED_INTERRUPT 0x010507
+#define E_ITS_MAPD_DEVICE_OOR 0x010801
+#define E_ITS_MAPD_ITTSIZE_OOR 0x010802
+#define E_ITS_MAPC_PROCNUM_OOR 0x010902
+#define E_ITS_MAPC_COLLECTION_OOR 0x010903
+#define E_ITS_MAPTI_UNMAPPED_DEVICE 0x010a04
+#define E_ITS_MAPTI_ID_OOR 0x010a05
+#define E_ITS_MAPTI_PHYSICALID_OOR 0x010a06
+#define E_ITS_INV_UNMAPPED_INTERRUPT 0x010c07
+#define E_ITS_INVALL_UNMAPPED_COLLECTION 0x010d09
+#define E_ITS_MOVALL_PROCNUM_OOR 0x010e01
+#define E_ITS_DISCARD_UNMAPPED_INTERRUPT 0x010f07
+
+/*
+ * CPU interface registers
+ */
+#define ICC_CTLR_EL1_EOImode_SHIFT (1)
+#define ICC_CTLR_EL1_EOImode_drop_dir (0U << ICC_CTLR_EL1_EOImode_SHIFT)
+#define ICC_CTLR_EL1_EOImode_drop (1U << ICC_CTLR_EL1_EOImode_SHIFT)
+#define ICC_CTLR_EL1_EOImode_MASK (1 << ICC_CTLR_EL1_EOImode_SHIFT)
+#define ICC_CTLR_EL1_CBPR_SHIFT 0
+#define ICC_CTLR_EL1_CBPR_MASK (1 << ICC_CTLR_EL1_CBPR_SHIFT)
+#define ICC_CTLR_EL1_PMHE_SHIFT 6
+#define ICC_CTLR_EL1_PMHE_MASK (1 << ICC_CTLR_EL1_PMHE_SHIFT)
+#define ICC_CTLR_EL1_PRI_BITS_SHIFT 8
+#define ICC_CTLR_EL1_PRI_BITS_MASK (0x7 << ICC_CTLR_EL1_PRI_BITS_SHIFT)
+#define ICC_CTLR_EL1_ID_BITS_SHIFT 11
+#define ICC_CTLR_EL1_ID_BITS_MASK (0x7 << ICC_CTLR_EL1_ID_BITS_SHIFT)
+#define ICC_CTLR_EL1_SEIS_SHIFT 14
+#define ICC_CTLR_EL1_SEIS_MASK (0x1 << ICC_CTLR_EL1_SEIS_SHIFT)
+#define ICC_CTLR_EL1_A3V_SHIFT 15
+#define ICC_CTLR_EL1_A3V_MASK (0x1 << ICC_CTLR_EL1_A3V_SHIFT)
+#define ICC_CTLR_EL1_RSS (0x1 << 18)
+#define ICC_CTLR_EL1_ExtRange (0x1 << 19)
+#define ICC_PMR_EL1_SHIFT 0
+#define ICC_PMR_EL1_MASK (0xff << ICC_PMR_EL1_SHIFT)
+#define ICC_BPR0_EL1_SHIFT 0
+#define ICC_BPR0_EL1_MASK (0x7 << ICC_BPR0_EL1_SHIFT)
+#define ICC_BPR1_EL1_SHIFT 0
+#define ICC_BPR1_EL1_MASK (0x7 << ICC_BPR1_EL1_SHIFT)
+#define ICC_IGRPEN0_EL1_SHIFT 0
+#define ICC_IGRPEN0_EL1_MASK (1 << ICC_IGRPEN0_EL1_SHIFT)
+#define ICC_IGRPEN1_EL1_SHIFT 0
+#define ICC_IGRPEN1_EL1_MASK (1 << ICC_IGRPEN1_EL1_SHIFT)
+#define ICC_SRE_EL1_DIB (1U << 2)
+#define ICC_SRE_EL1_DFB (1U << 1)
+#define ICC_SRE_EL1_SRE (1U << 0)
+
+/*
+ * Hypervisor interface registers (SRE only)
+ */
+#define ICH_LR_VIRTUAL_ID_MASK ((1ULL << 32) - 1)
+
+#define ICH_LR_EOI (1ULL << 41)
+#define ICH_LR_GROUP (1ULL << 60)
+#define ICH_LR_HW (1ULL << 61)
+#define ICH_LR_STATE (3ULL << 62)
+#define ICH_LR_PENDING_BIT (1ULL << 62)
+#define ICH_LR_ACTIVE_BIT (1ULL << 63)
+#define ICH_LR_PHYS_ID_SHIFT 32
+#define ICH_LR_PHYS_ID_MASK (0x3ffULL << ICH_LR_PHYS_ID_SHIFT)
+#define ICH_LR_PRIORITY_SHIFT 48
+#define ICH_LR_PRIORITY_MASK (0xffULL << ICH_LR_PRIORITY_SHIFT)
+
+/* These are for GICv2 emulation only */
+#define GICH_LR_VIRTUALID (0x3ffUL << 0)
+#define GICH_LR_PHYSID_CPUID_SHIFT (10)
+#define GICH_LR_PHYSID_CPUID (7UL << GICH_LR_PHYSID_CPUID_SHIFT)
+
+#define ICH_MISR_EOI (1 << 0)
+#define ICH_MISR_U (1 << 1)
+
+#define ICH_HCR_EN (1 << 0)
+#define ICH_HCR_UIE (1 << 1)
+#define ICH_HCR_NPIE (1 << 3)
+#define ICH_HCR_TC (1 << 10)
+#define ICH_HCR_TALL0 (1 << 11)
+#define ICH_HCR_TALL1 (1 << 12)
+#define ICH_HCR_EOIcount_SHIFT 27
+#define ICH_HCR_EOIcount_MASK (0x1f << ICH_HCR_EOIcount_SHIFT)
+
+#define ICH_VMCR_ACK_CTL_SHIFT 2
+#define ICH_VMCR_ACK_CTL_MASK (1 << ICH_VMCR_ACK_CTL_SHIFT)
+#define ICH_VMCR_FIQ_EN_SHIFT 3
+#define ICH_VMCR_FIQ_EN_MASK (1 << ICH_VMCR_FIQ_EN_SHIFT)
+#define ICH_VMCR_CBPR_SHIFT 4
+#define ICH_VMCR_CBPR_MASK (1 << ICH_VMCR_CBPR_SHIFT)
+#define ICH_VMCR_EOIM_SHIFT 9
+#define ICH_VMCR_EOIM_MASK (1 << ICH_VMCR_EOIM_SHIFT)
+#define ICH_VMCR_BPR1_SHIFT 18
+#define ICH_VMCR_BPR1_MASK (7 << ICH_VMCR_BPR1_SHIFT)
+#define ICH_VMCR_BPR0_SHIFT 21
+#define ICH_VMCR_BPR0_MASK (7 << ICH_VMCR_BPR0_SHIFT)
+#define ICH_VMCR_PMR_SHIFT 24
+#define ICH_VMCR_PMR_MASK (0xffUL << ICH_VMCR_PMR_SHIFT)
+#define ICH_VMCR_ENG0_SHIFT 0
+#define ICH_VMCR_ENG0_MASK (1 << ICH_VMCR_ENG0_SHIFT)
+#define ICH_VMCR_ENG1_SHIFT 1
+#define ICH_VMCR_ENG1_MASK (1 << ICH_VMCR_ENG1_SHIFT)
+
+#define ICH_VTR_PRI_BITS_SHIFT 29
+#define ICH_VTR_PRI_BITS_MASK (7 << ICH_VTR_PRI_BITS_SHIFT)
+#define ICH_VTR_ID_BITS_SHIFT 23
+#define ICH_VTR_ID_BITS_MASK (7 << ICH_VTR_ID_BITS_SHIFT)
+#define ICH_VTR_SEIS_SHIFT 22
+#define ICH_VTR_SEIS_MASK (1 << ICH_VTR_SEIS_SHIFT)
+#define ICH_VTR_A3V_SHIFT 21
+#define ICH_VTR_A3V_MASK (1 << ICH_VTR_A3V_SHIFT)
+
+#define ICC_IAR1_EL1_SPURIOUS 0x3ff
+
+#define ICC_SRE_EL2_SRE (1 << 0)
+#define ICC_SRE_EL2_ENABLE (1 << 3)
+
+#define ICC_SGI1R_TARGET_LIST_SHIFT 0
+#define ICC_SGI1R_TARGET_LIST_MASK (0xffff << ICC_SGI1R_TARGET_LIST_SHIFT)
+#define ICC_SGI1R_AFFINITY_1_SHIFT 16
+#define ICC_SGI1R_AFFINITY_1_MASK (0xff << ICC_SGI1R_AFFINITY_1_SHIFT)
+#define ICC_SGI1R_SGI_ID_SHIFT 24
+#define ICC_SGI1R_SGI_ID_MASK (0xfULL << ICC_SGI1R_SGI_ID_SHIFT)
+#define ICC_SGI1R_AFFINITY_2_SHIFT 32
+#define ICC_SGI1R_AFFINITY_2_MASK (0xffULL << ICC_SGI1R_AFFINITY_2_SHIFT)
+#define ICC_SGI1R_IRQ_ROUTING_MODE_BIT 40
+#define ICC_SGI1R_RS_SHIFT 44
+#define ICC_SGI1R_RS_MASK (0xfULL << ICC_SGI1R_RS_SHIFT)
+#define ICC_SGI1R_AFFINITY_3_SHIFT 48
+#define ICC_SGI1R_AFFINITY_3_MASK (0xffULL << ICC_SGI1R_AFFINITY_3_SHIFT)
+
+#include <asm/arch_gicv3.h>
+
+#ifndef __ASSEMBLY__
+
+/*
+ * We need a value to serve as a irq-type for LPIs. Choose one that will
+ * hopefully pique the interest of the reviewer.
+ */
+#define GIC_IRQ_TYPE_LPI 0xa110c8ed
+
+struct rdists {
+ struct {
+ raw_spinlock_t rd_lock;
+ void __iomem *rd_base;
+ struct page *pend_page;
+ phys_addr_t phys_base;
+ bool lpi_enabled;
+ cpumask_t *vpe_table_mask;
+ void *vpe_l1_base;
+ } __percpu *rdist;
+ phys_addr_t prop_table_pa;
+ void *prop_table_va;
+ u64 flags;
+ u32 gicd_typer;
+ u32 gicd_typer2;
+ bool has_vlpis;
+ bool has_rvpeid;
+ bool has_direct_lpi;
+ bool has_vpend_valid_dirty;
+};
+
+struct irq_domain;
+struct fwnode_handle;
+int phytium_its_cpu_init(void);
+int phytium_its_init(struct fwnode_handle *handle, struct rdists *rdists,
+ struct irq_domain *domain);
+int mbi_init(struct fwnode_handle *fwnode, struct irq_domain *parent);
+
+static inline bool gic_enable_sre(void)
+{
+ u32 val;
+
+ val = gic_read_sre();
+ if (val & ICC_SRE_EL1_SRE)
+ return true;
+
+ val |= ICC_SRE_EL1_SRE;
+ gic_write_sre(val);
+ val = gic_read_sre();
+
+ return !!(val & ICC_SRE_EL1_SRE);
+}
+
+#endif
+
+#endif
--
2.20.1
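For reference, the GENMASK_ULL()-based field macros in the header above are typically consumed with FIELD_GET() from <linux/bitfield.h>, while the shift/mask pairs are used directly. A minimal sketch of how a driver might parse GITS_TYPER with these macros follows; it is not part of the patch, and the function name and the plain readq_relaxed() accessor are illustrative only.

#include <linux/bitfield.h>
#include <linux/io-64-nonatomic-lo-hi.h>

/* Sketch only: decode a few GITS_TYPER fields using the macros above. */
static void sketch_parse_gits_typer(void __iomem *its_base)
{
	u64 typer = readq_relaxed(its_base + GITS_TYPER);
	bool has_vlpis = !!(typer & GITS_TYPER_VLPIS);
	unsigned int devid_bits = FIELD_GET(GITS_TYPER_DEVBITS, typer) + 1;
	unsigned int hcc = GITS_TYPER_HCC(typer);

	pr_info("ITS: VLPIS=%d, DeviceID bits=%u, HCC=%u\n",
		has_vlpis, devid_bits, hcc);
}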
Re: [PATCH OLK-5.10 0/2] Enable smmu-v3 when 3408iMR/3416iMRraid card exist
by Hanjun Guo 27 Jul '21
On 2021/7/27 9:53, luochunsheng wrote:
> smmu-v3 support Virtualization pass-through feature when 3408iMR/3416iMRraid card exist
>
> Zheng Zengkai (1):
> openeuler_defconfig: Enable CONFIG_SMMU_BYPASS_DEV by default
>
> luochunsheng (1):
> iommu: smmu-v3 support Virtualization pass-through feature when
> 3408iMR/3416iMRraid card exist
>
> arch/arm64/configs/openeuler_defconfig | 1 +
> drivers/iommu/Kconfig | 9 ++
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 120 ++++++++++++++++++++
> 3 files changed, 130 insertions(+)
>
Reviewed-by: Hanjun Guo <guohanjun(a)huawei.com>
26 Jul '21
From: Zheng Zengkai <zhengzengkai(a)huawei.com>
Scan for an idle sibling in a single pass
Mel Gorman (4):
sched/fair: Remove SIS_AVG_CPU
sched/fair: Move avg_scan_cost calculations under SIS_PROP
sched/fair: Remove select_idle_smt()
sched/fair: Merge select_idle_core/cpu()
kernel/sched/fair.c | 150 +++++++++++++++++++---------------------
kernel/sched/features.h | 1 -
2 files changed, 70 insertions(+), 81 deletions(-)
--
2.20.1
24 Jul '21
This series includes some updates for the HNS3 ethernet driver and contains the
following contents:
1. refactor and add debugfs commands;
2. optimize the IO path;
3. add support for PTP, imp-controlled PHYs, querying the tx spare buffer size,
   RXD advanced layout, and handling all errors through MSI-X;
4. cleanups for the drivers;
5. add capability bits to resolve compatibility;
6. fix driver bugs;
Baokun Li (2):
net: hns3: use list_move_tail instead of list_del/list_add_tail in
hclgevf_main.c
net: hns3: use list_move_tail instead of list_del/list_add_tail in
hclge_main.c
Christophe JAILLET (1):
net: hns3: Fix a memory leak in an error handling path in
'hclge_handle_error_info_log()'
Colin Ian King (3):
net: hns3: remove redundant null check of an array
net: hns3: Fix potential null pointer defererence of null ae_dev
net: hns3: Fix return of uninitialized variable ret
Dan Carpenter (2):
net: hns3: fix a double shift bug
net: hns3: fix different snprintf() limit
Guangbin Huang (18):
net: hns3: add interfaces to query information of tm priority/qset
net: hns3: add debugfs support for tm nodes, priority and qset info
net: hns3: RSS indirection table use device specification
net: hns3: debugfs add max tm rate specification print
net: hns3: replace macro of max qset number with specification
net: hns3: add support for imp-controlled PHYs
net: hns3: add get/set pause parameters support for imp-controlled
PHYs
net: hns3: add ioctl support for imp-controlled PHYs
net: hns3: add phy loopback support for imp-controlled PHYs
net: hns3: PF add support for pushing link status to VFs
net: hns3: VF not request link status when PF support push link status
feature
net: hns3: refactor dump tm map of debugfs
net: hns3: refactor dump tm of debugfs
net: hns3: refactor dump tc of debugfs
net: hns3: refactor dump qos pause cfg of debugfs
net: hns3: refactor dump qos pri map of debugfs
net: hns3: refactor dump qos buf cfg of debugfs
net: hns3: refactor dump qs shaper of debugfs
GuoJia Liao (1):
net: hns3: optimize the code when update the tc info
Guojia Liao (2):
net: hns3: split out hclge_tm_vport_tc_info_update()
net: hns3: expand the tc config command
Gustavo A. R. Silva (1):
net: hns3: fix return of random stack value
Hao Chen (4):
net: hns3: refactor out hclge_rm_vport_all_mac_table()
net: hns3: refactor queue map of debugfs
net: hns3: refactor queue info of debugfs
net: hns3: refactor dump fd tcam of debugfs
Huazhong Tan (19):
net: hns3: remove redundant return value of hns3_uninit_all_ring()
net: hns3: remove an unused parameter in hclge_vf_rate_param_check()
net: hns3: refactor out hclge_set_rss_tuple()
net: hns3: refactor out hclgevf_set_rss_tuple()
net: hns3: remove unused parameter from hclge_dbg_dump_loopback()
net: hns3: fix prototype warning
net: hns3: fix some typos in hclge_main.c
net: hns3: remove a duplicate pf reset counting
net: hns3: cleanup inappropriate spaces in struct hlcgevf_tqp_stats
net: hns3: change the value of the SEPARATOR_VALUE macro in
hclgevf_main.c
net: hns3: support RXD advanced layout
net: hns3: refactor out RX completion checksum
net: hns3: refactor dump bd info of debugfs
net: hns3: refactor dump mac list of debugfs
net: hns3: fix user's coalesce configuration lost issue
net: hns3: switch to dim algorithm for adaptive interrupt moderation
net: hns3: add support for PTP
net: hns3: add debugfs support for ptp info
net: hns3: add support to query tx spare buffer size for pf
Jian Shen (29):
net: hns3: add api capability bits for firmware
net: hns3: remove redundant client_setup_tc handle
net: hns3: cleanup for endian issue for VF RSS
net: hns3: refactor out hclge_get_rss_tuple()
net: hns3: refactor out hclgevf_get_rss_tuple()
net: hns3: split out hclge_dbg_dump_qos_buf_cfg()
net: hns3: refactor out hclge_add_fd_entry()
net: hns3: refactor out hclge_fd_get_tuple()
net: hns3: refactor for function hclge_fd_convert_tuple
net: hns3: add support for traffic class tuple support for flow
director by ethtool
net: hns3: refactor flow director configuration
net: hns3: refine for hns3_del_all_fd_entries()
net: hns3: add support for user-def data of flow director
net: hns3: remove unused code of vmdq
net: hns3: fix missing rule state assignment
net: hns3: fix use-after-free issue for hclge_add_fd_entry_common()
net: hns3: remove the rss_size limitation by vector num
net: hns3: configure promisc mode for VF asynchronously
net: hns3: use HCLGE_VPORT_STATE_PROMISC_CHANGE to replace
HCLGE_STATE_PROMISC_CHANGED
net: hns3: add 'QoS' support for port based VLAN configuration
net: hns3: refine for hclge_push_vf_port_base_vlan_info()
net: hns3: remove unnecessary updating port based VLAN
net: hns3: refine function hclge_set_vf_vlan_cfg()
net: hns3: add support for modify VLAN filter state
net: hns3: add query basic info support for VF
net: hns3: add support for VF modify VLAN filter state
net: hns3: add debugfs support for vlan configuration
net: hns3: add support for FD counter in debugfs
net: hns3: add support for dumping MAC umv counter in debugfs
Jiaran Zhang (17):
net: hns3: modify some unmacthed types print parameter
net: hns3: use ipv6_addr_any() helper
net: hns3: remove redundant query in hclge_config_tm_hw_err_int()
net: hns3: change flr_prepare/flr_done function names
net: hns3: add suspend and resume pm_ops
net: hns3: refactor dev capability and dev spec of debugfs
net: hns3: refactor dump intr of debugfs
net: hns3: refactor dump reset info of debugfs
net: hns3: refactor dump m7 info of debugfs
net: hns3: refactor dump ncl config of debugfs
net: hns3: refactor dump mac tnl status of debugfs
net: hns3: add a separate error handling task
net: hns3: add scheduling logic for error handling task
net: hns3: add the RAS compatibility adaptation solution
net: hns3: add support for imp-handle ras capability
net: hns3: update error recovery module and type
net: hns3: add error handling compatibility during initialization
Liu Jian (1):
net: hns3: no return statement in hclge_clear_arfs_rules
Peng Li (10):
net: hns3: remove the shaper param magic number
net: hns3: change hclge_parse_speed() param type
net: hns3: change hclge_query_bd_num() param type
net: hns3: remove unused macro definition
net: hns3: refactor out hclge_cmd_convert_err_code()
net: hns3: refactor out hclgevf_cmd_convert_err_code()
net: hns3: clean up hns3_dbg_cmd_write()
net: hns3: refactor out hclge_set_vf_vlan_common()
net: hns3: remove redundant blank lines
net: hns3: remove unused parameter from hclge_set_vf_vlan_common()
Salil Mehta (3):
net: hns3: Remove the left over redundant check & assignment
net: hns3: Remove un-necessary 'else-if' in the hclge_reset_event()
net: hns3: Trivial spell fix in hns3 driver
Yonglong Liu (1):
net: hns3: clean up some incorrect variable types in
hclge_dbg_dump_tm_map()
Yufeng Mo (18):
net: hns3: add support for obtaining the maximum frame size
net: hns3: clean up unnecessary parentheses in macro definitions
net: hns3: split out hclge_cmd_send()
net: hns3: split out hclgevf_cmd_send()
net: hns3: use FEC capability queried from firmware
net: hns3: use pause capability queried from firmware
net: hns3: split function hclge_reset_rebuild()
net: hns3: optimize the process of queue reset
net: hns3: clear unnecessary reset request in hclge_reset_rebuild
net: hns3: refactor the debugfs process
net: hns3: refactor dump mng tbl of debugfs
net: hns3: refactor dump loopback of debugfs
net: hns3: refactor dump reg of debugfs
net: hns3: refactor dump reg dcb info of debugfs
net: hns3: refactor dump serv info of debugfs
net: hns3: remove the useless debugfs file node cmd
net: hns3: remove now redundant logic related to HNAE3_UNKNOWN_RESET
net: hns3: add support for handling all errors through MSI-X
Yunsheng Lin (9):
net: hns3: add tx send size handling for tso skb
net: hns3: add stats logging when skb padding fails
net: hns3: minor refactor related to desc_cb handling
net: hns3: refactor for hns3_fill_desc() function
net: hns3: use tx bounce buffer for small packets
net: hns3: support dma_map_sg() for multi frags skb
net: hns3: optimize the rx page reuse handling process
net: hns3: use bounce buffer when rx page can not be reused
net: hns3: fix reuse conflict of the rx page
drivers/net/ethernet/hisilicon/Kconfig | 2 +
.../net/ethernet/hisilicon/hns3/hclge_mbx.h | 13 +-
drivers/net/ethernet/hisilicon/hns3/hnae3.h | 117 +-
.../ethernet/hisilicon/hns3/hns3_debugfs.c | 1429 ++++++--
.../ethernet/hisilicon/hns3/hns3_debugfs.h | 64 +
.../net/ethernet/hisilicon/hns3/hns3_enet.c | 1448 ++++++--
.../net/ethernet/hisilicon/hns3/hns3_enet.h | 122 +-
.../ethernet/hisilicon/hns3/hns3_ethtool.c | 177 +-
.../ethernet/hisilicon/hns3/hns3pf/Makefile | 2 +-
.../hisilicon/hns3/hns3pf/hclge_cmd.c | 192 +-
.../hisilicon/hns3/hns3pf/hclge_cmd.h | 125 +-
.../hisilicon/hns3/hns3pf/hclge_dcb.c | 27 -
.../hisilicon/hns3/hns3pf/hclge_debugfs.c | 2791 ++++++++++-----
.../hisilicon/hns3/hns3pf/hclge_debugfs.h | 47 +-
.../hisilicon/hns3/hns3pf/hclge_err.c | 428 ++-
.../hisilicon/hns3/hns3pf/hclge_err.h | 89 +
.../hisilicon/hns3/hns3pf/hclge_main.c | 3110 +++++++++++------
.../hisilicon/hns3/hns3pf/hclge_main.h | 137 +-
.../hisilicon/hns3/hns3pf/hclge_mbx.c | 155 +-
.../hisilicon/hns3/hns3pf/hclge_mdio.c | 39 +
.../hisilicon/hns3/hns3pf/hclge_mdio.h | 2 +
.../hisilicon/hns3/hns3pf/hclge_ptp.c | 542 +++
.../hisilicon/hns3/hns3pf/hclge_ptp.h | 134 +
.../ethernet/hisilicon/hns3/hns3pf/hclge_tm.c | 448 ++-
.../ethernet/hisilicon/hns3/hns3pf/hclge_tm.h | 67 +-
.../hisilicon/hns3/hns3vf/hclgevf_cmd.c | 207 +-
.../hisilicon/hns3/hns3vf/hclgevf_cmd.h | 28 +-
.../hisilicon/hns3/hns3vf/hclgevf_main.c | 361 +-
.../hisilicon/hns3/hns3vf/hclgevf_main.h | 16 +-
.../hisilicon/hns3/hns3vf/hclgevf_mbx.c | 6 +
30 files changed, 8958 insertions(+), 3367 deletions(-)
create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.h
create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_ptp.c
create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_ptp.h
--
2.20.1
[PATCH openEuler-1.0-LTS] mm: vmalloc: prevent use after free in _vm_unmap_aliases
by Yang Yingliang 23 Jul '21
From: Vijayanand Jitta <vjitta(a)codeaurora.org>
mainline inclusion
from mainline-5.13-rc1
commit ad216c0316ad6391d90f4de0a7f59396b2925a06
category: bugfix
bugzilla: 56231
CVE: NA
-------------------------------------------------
A potential use-after-free can occur in _vm_unmap_aliases where an already
freed vmap_area could be accessed. Consider the following scenario:
Process 1                                  Process 2
__vm_unmap_aliases                         __vm_unmap_aliases
  purge_fragmented_blocks_allcpus            rcu_read_lock()
    rcu_read_lock()
    list_del_rcu(&vb->free_list)
                                             list_for_each_entry_rcu(vb .. )
  __purge_vmap_area_lazy
    kmem_cache_free(va)
                                             va_start = vb->va->va_start
Here Process 1 is in the purge path: it does list_del_rcu() on the vmap_block
and later frees the vmap_area. Since Process 2 was holding the RCU read lock
at that time, the vmap_block is still visible to it, so Process 2 accesses it
and thereby tries to read the vmap_area of that vmap_block, which was already
freed by Process 1. This results in a use-after-free.
Fix this by adding a check for vb->dirty before accessing the vmap_area
structure: vb->dirty is set to VMAP_BBMAP_BITS in the purge path, so checking
for that value prevents the use-after-free.
Link: https://lkml.kernel.org/r/1616062105-23263-1-git-send-email-vjitta@codeauro…
Signed-off-by: Vijayanand Jitta <vjitta(a)codeaurora.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
(cherry picked from commit ad216c0316ad6391d90f4de0a7f59396b2925a06)
Signed-off-by: Yue Zou <zouyue3(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Reviewed-by: tong tiangen <tongtiangen(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
mm/vmalloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 12c68353feebf..881d0c79a6d0b 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1631,7 +1631,7 @@ void vm_unmap_aliases(void)
rcu_read_lock();
list_for_each_entry_rcu(vb, &vbq->free, free_list) {
spin_lock(&vb->lock);
- if (vb->dirty) {
+ if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
unsigned long va_start = vb->va->va_start;
unsigned long s, e;
--
2.25.1
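For context (not part of the patch): the one-line check lands in the per-CPU
free-list walk of vm_unmap_aliases() in 4.19 mm/vmalloc.c. A trimmed sketch of
that loop with the new condition shows why a vmap_block whose dirty count
equals VMAP_BBMAP_BITS (i.e. one already handed to the purge path) is skipped
before vb->va is dereferenced; surrounding variables (vbq, start, end, flush)
come from the enclosing function.

rcu_read_lock();
list_for_each_entry_rcu(vb, &vbq->free, free_list) {
	spin_lock(&vb->lock);
	if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
		/* vb->va is still valid here, the purge path has not freed it */
		unsigned long va_start = vb->va->va_start;
		unsigned long s, e;

		s = va_start + (vb->dirty_min << PAGE_SHIFT);
		e = va_start + (vb->dirty_max << PAGE_SHIFT);

		start = min(s, start);
		end   = max(e, end);
		flush = 1;
	}
	spin_unlock(&vb->lock);
}
rcu_read_unlock();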
[PATCH 01/17] ima: fix CONFIG_IMA_DIGEST_DB_MEGABYTES in openeuler_defconfig
by Zheng Zengkai 23 Jul '21
From: Zhang Tianxing <zhangtianxing3(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I409K9
CVE: NA
-----------------------------------------------------------------
Commit 7c9d18bcaa ("ima: Add max size for IMA digest database") adds
a new Kconfig for IMA Digest Lists: CONFIG_IMA_DIGEST_DB_MEGABYTES.
However, that commit has typos in openeuler_defconfig. This patch
fixes them.
Fixes: 7c9d18bcaa ("ima: Add max size for IMA digest database")
Signed-off-by: Zhang Tianxing <zhangtianxing3(a)huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
arch/arm64/configs/openeuler_defconfig | 2 +-
arch/x86/configs/openeuler_defconfig | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 39cb54bcaa49..0ff8e6ce6b78 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -6424,7 +6424,7 @@ CONFIG_IMA_DIGEST_LISTS_DIR="/etc/ima/digest_lists"
CONFIG_IMA_STANDARD_DIGEST_DB_SIZE=y
# CONFIG_IMA_MAX_DIGEST_DB_SIZE is not set
# CONFIG_IMA_CUSTOM_DIGEST_DB_SIZE is not set
-CONFIG_IMA_DIGEST_DB_SIZE=16
+CONFIG_IMA_DIGEST_DB_MEGABYTES=16
CONFIG_IMA_PARSER_BINARY_PATH="/usr/bin/upload_digest_lists"
CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS=y
CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS=y
diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
index 607d4a7dfcba..e2a7fda97fa3 100644
--- a/arch/x86/configs/openeuler_defconfig
+++ b/arch/x86/configs/openeuler_defconfig
@@ -7792,7 +7792,7 @@ CONFIG_IMA_DIGEST_LISTS_DIR="/etc/ima/digest_lists"
CONFIG_IMA_STANDARD_DIGEST_DB_SIZE=y
# CONFIG_IMA_MAX_DIGEST_DB_SIZE is not set
# CONFIG_IMA_CUSTOM_DIGEST_DB_SIZE is not set
-CONFIG_IMA_DIGEST_DB_SIZE=16
+CONFIG_IMA_DIGEST_DB_MEGABYTES=16
CONFIG_IMA_PARSER_BINARY_PATH="/usr/bin/upload_digest_lists"
CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS=y
CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS=y
--
2.20.1
23 Jul '21
From: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA
------------------------------------
When running some aer-inject and sysfs remove stress tests, I got the
following use-after-free call trace:
==================================================================
BUG: KASAN: use-after-free in pci_stop_bus_device+0x174/0x178
Read of size 8 at addr fffffc3e2e402218 by task bash/26311
CPU: 38 PID: 26311 Comm: bash Tainted: G W 4.19.105+ #82
Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B161.01 06/10/2021
Call trace:
dump_backtrace+0x0/0x360
show_stack+0x24/0x30
dump_stack+0x130/0x164
print_address_description+0x68/0x278
kasan_report+0x204/0x330
__asan_report_load8_noabort+0x30/0x40
pci_stop_bus_device+0x174/0x178
pci_stop_and_remove_bus_device_locked+0x24/0x40
remove_store+0x1c8/0x1e0
dev_attr_store+0x60/0x80
sysfs_kf_write+0x104/0x170
kernfs_fop_write+0x23c/0x430
__vfs_write+0xec/0x4e0
vfs_write+0x12c/0x3d0
ksys_write+0xe8/0x208
__arm64_sys_write+0x70/0xa0
el0_svc_common+0x10c/0x450
el0_svc_handler+0x50/0xc0
el0_svc+0x10/0x14
Allocated by task 684:
kasan_kmalloc+0xe0/0x190
kmem_cache_alloc_trace+0x110/0x240
pci_alloc_dev+0x4c/0x110
pci_scan_single_device+0x100/0x218
pci_scan_slot+0x8c/0x2d8
pci_scan_child_bus_extend+0x90/0x628
pci_scan_child_bus+0x24/0x30
pci_scan_bridge_extend+0x3b8/0xb28
pci_scan_child_bus_extend+0x350/0x628
pci_rescan_bus+0x24/0x48
pcie_do_fatal_recovery+0x390/0x4b0
handle_error_source+0x124/0x158
aer_isr+0x5a0/0x800
process_one_work+0x598/0x1250
worker_thread+0x384/0xf08
kthread+0x2a4/0x320
ret_from_fork+0x10/0x18
Freed by task 685:
__kasan_slab_free+0x120/0x228
kasan_slab_free+0x10/0x18
kfree+0x88/0x218
pci_release_dev+0xb4/0xd8
device_release+0x6c/0x1c0
kobject_put+0x12c/0x400
put_device+0x24/0x30
pci_dev_put+0x24/0x30
handle_error_source+0x12c/0x158
aer_isr+0x5a0/0x800
process_one_work+0x598/0x1250
worker_thread+0x384/0xf08
kthread+0x2a4/0x320
ret_from_fork+0x10/0x18
The buggy address belongs to the object at fffffc3e2e402200
which belongs to the cache kmalloc-4096 of size 4096
The buggy address is located 24 bytes inside of
4096-byte region [fffffc3e2e402200, fffffc3e2e403200)
The buggy address belongs to the page:
page:ffff7ff0f8b90000 count:1 mapcount:0 mapping:ffffdc365f016e00 index:0x0 compound_mapcount: 0
flags: 0x6ffffe0000008100(slab|head)
raw: 6ffffe0000008100 ffff7f70d83aae00 0000000300000003 ffffdc365f016e00
raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
fffffc3e2e402100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
fffffc3e2e402180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>fffffc3e2e402200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
fffffc3e2e402280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fffffc3e2e402300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
It is caused by the following race condition:
CPU0                                        CPU1
remove_store()                              aer_isr()
  device_remove_file_self()                   handle_error_source()
  pci_stop_and_remove_bus_device_locked         pcie_do_fatal_recovery()
    (blocked)                                     pci_lock_rescan_remove()   # CPU1 acquires the lock
                                                  pci_stop_and_remove_bus_device()
                                                  pci_unlock_rescan_remove() # CPU1 releases the lock
    pci_lock_rescan_remove()  # CPU0 acquires the lock
                                                  pci_dev_put()              # frees pci_dev
    pci_stop_and_remove_bus_device()
      pci_stop_bus_device()   # use-after-free
    pci_unlock_rescan_remove()
An AER interrupt is triggered on CPU1 and CPU1 starts to process it: the
work 'aer_isr()' is scheduled on CPU1, calls into
pcie_do_fatal_recovery(), and acquires the lock 'pci_rescan_remove_lock'.
Before it removes the sysfs entries corresponding to the failing PCI
device, a sysfs remove operation is executed on CPU0. CPU0 uses
device_remove_file_self() to remove the sysfs directory and waits for the
lock to be released. After CPU1 finishes pci_stop_and_remove_bus_device(),
it releases the lock and frees the 'pci_dev' in pci_dev_put(). CPU0 then
acquires the lock and accesses the 'pci_dev', which triggers the
use-after-free.
To fix this issue, take a reference on the device in remove_store()
before removing it and drop that reference at the end.
Signed-off-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Reviewed-by: Hanjun Guo <guohanjun(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/pci/pci-sysfs.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 391a811ba3445..fcc997cbf8438 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -470,7 +470,8 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
unsigned long val;
- struct pci_dev *rpdev = to_pci_dev(dev)->rpdev;
+ struct pci_dev *pdev = to_pci_dev(dev);
+ struct pci_dev *rpdev = pdev->rpdev;
if (kstrtoul(buf, 0, &val) < 0)
return -EINVAL;
@@ -490,8 +491,12 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
if (rpdev)
pci_dev_get(rpdev);
- if (val && device_remove_file_self(dev, attr))
- pci_stop_and_remove_bus_device_locked(to_pci_dev(dev));
+ if (val) {
+ pci_dev_get(pdev);
+ if (device_remove_file_self(dev, attr))
+ pci_stop_and_remove_bus_device_locked(pdev);
+ pci_dev_put(pdev);
+ }
if (rpdev) {
clear_bit(0, &rpdev->slot_being_removed_rescanned);
--
2.25.1
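Stripped of the rpdev bookkeeping in the hunk above, the fix follows the usual
pattern for sysfs callbacks that can remove their own device: pin the device
with a reference for the whole operation, because another path (here the AER
recovery worker) may drop the last reference once the rescan/remove lock is
released. A reduced sketch of that pattern, with input parsing and the rpdev
handling omitted:

#include <linux/device.h>
#include <linux/pci.h>

static ssize_t remove_store_sketch(struct device *dev,
				   struct device_attribute *attr,
				   const char *buf, size_t count)
{
	struct pci_dev *pdev = to_pci_dev(dev);

	pci_dev_get(pdev);		/* pin pdev across the self-removal */
	if (device_remove_file_self(dev, attr))
		pci_stop_and_remove_bus_device_locked(pdev);
	pci_dev_put(pdev);		/* drop our pin; pdev may be freed now */

	return count;
}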
[PATCH openEuler-1.0-LTS] seq_file: disallow extremely large seq buffer allocations
by Yang Yingliang 23 Jul '21
From: Eric Sandeen <sandeen(a)redhat.com>
stable inclusion
from linux-4.19.198
commit 6de9f0bf7cacc772a618699f9ed5c9f6fca58a1d
CVE: CVE-2021-33909
--------------------------------
commit 8cae8cd89f05f6de223d63e6d15e31c8ba9cf53b upstream.
There is no reasonable need for a buffer larger than this, and it avoids
int overflow pitfalls.
Fixes: 058504edd026 ("fs/seq_file: fallback to vmalloc allocation")
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Reported-by: Qualys Security Advisory <qsa(a)qualys.com>
Signed-off-by: Eric Sandeen <sandeen(a)redhat.com>
Cc: stable(a)kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/seq_file.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/seq_file.c b/fs/seq_file.c
index 05e58b56f6202..e11f62b29be87 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -29,6 +29,9 @@ static void seq_set_overflow(struct seq_file *m)
static void *seq_buf_alloc(unsigned long size)
{
+ if (unlikely(size > MAX_RW_COUNT))
+ return NULL;
+
return kvmalloc(size, GFP_KERNEL_ACCOUNT);
}
--
2.25.1
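For reference (not part of the patch): seq_read()/traverse() grow the seq_file
buffer by doubling until a whole record fits, so without an upper bound a
pathological record length can drive the allocation size toward integer
overflow. MAX_RW_COUNT (INT_MAX & PAGE_MASK, from <linux/fs.h>) is already the
most a single read() can return, which is why it is a natural cap. A
simplified sketch of the growth step that the new check now bounds:

if (seq_has_overflowed(m)) {			/* record did not fit in m->buf */
	kvfree(m->buf);
	m->count = 0;
	m->buf = seq_buf_alloc(m->size <<= 1);	/* NULL once size exceeds MAX_RW_COUNT */
	if (!m->buf)
		goto Enomem;
}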
Al Cooper (1):
mmc: sdhci: Fix warning message when accessing RPMB in HS400 mode
Al Viro (1):
iov_iter_fault_in_readable() should do nothing in xarray case
Alex Williamson (1):
vfio/pci: Handle concurrent vma faults
Alexander Aring (2):
fs: dlm: cancel work sync othercon
fs: dlm: fix memory leak when fenced
Alexander Larkin (1):
Input: joydev - prevent use of not validated data in JSIOCSBTNMAP
ioctl
Alexander Shishkin (1):
intel_th: Wait until port is in reset before programming it
Alvin Šipraga (2):
brcmfmac: fix setting of station info chains bitmask
brcmfmac: correctly report average RSSI in station info
Andrew Gabbasov (1):
usb: gadget: f_fs: Fix setting of device and driver data
cross-references
Andy Shevchenko (5):
net: mvpp2: Put fwnode in error case during ->probe()
net: pch_gbe: Propagate error from devm_gpio_request_one()
eeprom: idt_89hpesx: Put fwnode in matching case during ->probe()
eeprom: idt_89hpesx: Restore printing the unsupported fwnode name
net: pch_gbe: Use proper accessors to BE data in pch_ptp_match()
Anirudh Rayabharam (2):
ext4: fix kernel infoleak via ext4_extent_header
media: pvrusb2: fix warning in pvr2_i2c_core_done
Ard Biesheuvel (1):
crypto: shash - avoid comparing pointers to exported functions under
CFI
Arnaldo Carvalho de Melo (1):
perf llvm: Return -ENOMEM when asprintf() fails
Arnd Bergmann (4):
ia64: mca_drv: fix incorrect array size calculation
mwifiex: re-fix for unaligned accesses
media: subdev: disallow ioctl for saa6588/davinci
mips: always link byteswap helpers into decompressor
Arturo Giusti (1):
udf: Fix NULL pointer dereference in udf_symlink function
Aswath Govindraju (2):
ARM: dts: am335x: align ti,pindir-d0-out-d1-in property with dt-shema
ARM: dts: am437x: align ti,pindir-d0-out-d1-in property with dt-shema
Athira Rajeev (1):
selftests/powerpc: Fix "no_handler" EBB selftest
Axel Lin (1):
regulator: da9052: Ensure enough delay time for .set_voltage_time_sel
Bean Huo (1):
mmc: block: Disable CMDQ on the ioctl path
Benjamin Drung (1):
media: uvcvideo: Fix pixel format change for Elgato Cam Link 4K
Benjamin Herrenschmidt (1):
powerpc/boot: Fixup device-tree on little endian
Bibo Mao (1):
hugetlb: clear huge pte during flush function on mips platform
Bixuan Cui (3):
crypto: nx - add missing MODULE_DEVICE_TABLE
EDAC/ti: Add missing MODULE_DEVICE_TABLE
power: reset: gpio-poweroff: add missing MODULE_DEVICE_TABLE
Bob Pearson (1):
RDMA/rxe: Fix qp reference counting for atomic ops
Bryan O'Donoghue (1):
wcn36xx: Move hal_buf allocation to devm_kmalloc in probe
Chao Yu (1):
f2fs: add MODULE_SOFTDEP to ensure crc32 is included in the initramfs
Charles Keepax (1):
spi: Make of_register_spi_device also set the fwnode
Chris Chiu (1):
ACPI: EC: Make more Asus laptops use ECDT _GPE
Christian Löhle (1):
mmc: core: Allow UHS-I voltage switch for SDSC cards if supported
Christoph Niedermaier (3):
ARM: dts: imx6q-dhcom: Fix ethernet reset time properties
ARM: dts: imx6q-dhcom: Fix ethernet plugin detection problems
ARM: dts: imx6q-dhcom: Add gpios pinctrl for i2c bus recovery
Christophe JAILLET (9):
crypto: ccp - Fix a resource leak in an error handling path
media: rc: i2c: Fix an error message
brcmsmac: mac80211_if: Fix a resource leak in an error handling path
tty: nozomi: Fix a resource leak in an error handling function
tty: nozomi: Fix the error handling path of 'nozomi_card_init()'
phy: ti: dm816x: Fix the error handling path in
'dm816x_usb_phy_probe()
leds: ktd2692: Fix an error handling path
tty: serial: 8250: serial_cs: Fix a memory leak in error handling path
scsi: be2iscsi: Fix an error handling path in beiscsi_dev_probe()
Christophe Leroy (1):
btrfs: disable build on platforms having page size 256K
Chung-Chiang Cheng (1):
configfs: fix memleak in configfs_release_bin_file
Codrin Ciubotariu (1):
ASoC: atmel-i2s: Fix usage of capture and playback at the same time
Colin Ian King (2):
drm: qxl: ensure surf.data is ininitialized
fsi: core: Fix return of error values on failures
Corentin Labbe (1):
crypto: ixp4xx - dma_unmap the correct address
Daehwan Jung (1):
ALSA: usb-audio: fix rate on Ozone Z90 USB headset
Dan Carpenter (5):
ocfs2: fix snprintf() checking
staging: gdm724x: check for buffer overflow in gdm_lte_multi_sdu_pkt()
staging: gdm724x: check for overflow in gdm_lte_netif_rx()
rtc: fix snprintf() checking in is_rtc_hctosys()
scsi: scsi_dh_alua: Fix signedness bug in alua_rtpg()
Daniel Vetter (1):
drm/msm/mdp4: Fix modifier support enabling
Dany Madden (1):
Revert "ibmvnic: remove duplicate napi_schedule call in open function"
Dave Hansen (1):
selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really
random
David Sterba (2):
btrfs: clear defrag status of a root if starting transaction fails
btrfs: clear log tree recovering status if starting transaction fails
Desmond Cheong Zhi Xi (1):
ntfs: fix validity check for file name attribute
Dillon Min (1):
media: s5p-g2d: Fix a memory leak on ctx->fh.m2m_ctx
Dimitri John Ledkov (1):
lib/decompress_unlz4.c: correctly handle zero-padding around initrds.
Dinghao Liu (1):
i40e: Fix error handling in i40e_vsi_open
Dmitry Osipenko (2):
clk: tegra: Ensure that PLLU configuration is applied properly
ASoC: tegra: Set driver_name=tegra for all machine drivers
Dmitry Torokhov (2):
HID: do not use down_interruptible() when unbinding devices
i2c: core: Disable client irq on reboot/shutdown
Dmytro Laktyushkin (1):
drm/amd/display: fix use_max_lb flag for 420 pixel formats
Dongliang Mu (3):
media: dvd_usb: memory leak in cinergyt2_fe_attach
ieee802154: hwsim: Fix possible memory leak in
hwsim_subscribe_all_others
ieee802154: hwsim: Fix memory leak in hwsim_add_one
Eddie James (1):
fsi: scom: Reset the FSI2PIB engine for any error
Eric Biggers (1):
fscrypt: don't ignore minor_hash when hash is 0
Eric Dumazet (5):
pkt_sched: sch_qfq: fix qfq_change_class() error path
vxlan: add missing rcu_read_lock() in neigh_reduce()
ieee802154: hwsim: avoid possible crash in hwsim_del_edge_nl()
ipv6: exthdrs: do not blindly use init_net
ipv6: fix out-of-bound access in ip6_parse_tlv()
Eric Sandeen (1):
seq_file: disallow extremely large seq buffer allocations
Erik Kaneda (1):
ACPICA: Fix memory leak caused by _CID repair function
Evgeny Novikov (1):
media: st-hva: Fix potential NULL pointer dereferences
Fabio Aiuto (1):
staging: rtl8723bs: fix macro value for 2.4Ghz only device
Filipe Manana (1):
btrfs: send: fix invalid path for unlink operations after parent
orphanization
Gao Xiang (1):
nfs: fix acl memory leak of posix_acl_create()
Geert Uytterhoeven (2):
of: Fix truncation of memory sizes on 32-bit platforms
ARM: dts: r8a7779, marzen: Fix DU clock names
Geoff Levand (1):
powerpc/ps3: Add dma_mask to ps3_dma_region
Gerd Rausch (1):
RDMA/cma: Fix rdma_resolve_route() memory leak
Greg Kroah-Hartman (1):
Linux 4.19.198
Guchun Chen (1):
drm/amd/display: fix incorrrect valid irq check
Guenter Roeck (2):
hwmon: (max31722) Remove non-standard ACPI device IDs
hwmon: (max31790) Fix fan speed reporting for fan7..12
Gustavo A. R. Silva (2):
media: siano: Fix out-of-bounds warnings in
smscore_load_firmware_family2()
wireless: wext-spy: Fix out-of-bounds warning
Hanjun Guo (1):
ACPI: bus: Call kobject_put() in acpi_init() error path
Hannes Reinecke (1):
scsi: scsi_dh_alua: Check for negative result value
Hannu Hartikainen (1):
USB: cdc-acm: blacklist Heimann USB Appset device
Hans Verkuil (1):
media: cobalt: fix race condition in setting HPD
Hans de Goede (1):
ACPI: video: Add quirk for the Dell Vostro 3350
Herbert Xu (1):
crypto: nx - Fix RCU warning in nx842_OF_upd_status
Huang Pei (1):
MIPS: add PMD table accounting into MIPS'pmd_alloc_one
Igor Matheus Andrade Torrente (1):
media: em28xx: Fix possible memory leak of em28xx struct
Jack Xu (2):
crypto: qat - check return code of qat_hal_rd_rel_reg()
crypto: qat - remove unused macro in FW loader
Jack Zhang (1):
drm/amd/amdgpu/sriov disable all ip hw status by default
Jakub Kicinski (1):
net: ip: avoid OOM kills with large UDP sends over loopback
James Smart (2):
scsi: lpfc: Fix "Unexpected timeout" error in direct attach topology
scsi: lpfc: Fix crash when lpfc_sli4_hba_setup() fails to initialize
the SGLs
Jan Kiszka (1):
watchdog: iTCO_wdt: Account for rebooting on second timeout
Jason Gerecke (1):
HID: wacom: Correct base usage for capacitive ExpressKey status bits
Jay Fang (2):
spi: spi-loopback-test: Fix 'tx_buf' might be 'rx_buf'
spi: spi-topcliff-pch: Fix potential double free in
pch_spi_process_messages()
Jeff Layton (1):
ceph: remove bogus checks and WARN_ONs from ceph_set_page_dirty
Jesse Brandeburg (1):
e100: handle eeprom as little endian
Jiajun Cao (1):
ALSA: hda: Add IRQ check for platform_get_irq()
Jian Shen (1):
net: fix mistake path for netdev_features_strings
Jian-Hong Pan (1):
net: bcmgenet: Fix attaching to PYH failed on RPi 4B
Jiapeng Chong (3):
platform/x86: toshiba_acpi: Fix missing error code in
toshiba_acpi_setup_keyboard()
RDMA/cxgb4: Fix missing error code in create_qp()
fs/jfs: Fix missing error code in lmLogInit()
Jing Xiangfeng (2):
usb: typec: Add the missed altmode_id_remove() in
typec_register_altmode()
drm/radeon: Add the missed drm_gem_object_put() in
radeon_user_framebuffer_create()
Joachim Fenkes (2):
fsi/sbefifo: Clean up correct FIFO when receiving reset request from
SBE
fsi/sbefifo: Fix reset timeout
Joe Thornber (1):
dm space maps: don't reset space map allocation cursor when committing
Johan Hovold (6):
Input: usbtouchscreen - fix control-request directions
media: gspca/gl860: fix zero-length control requests
mmc: vub3000: fix control-request direction
media: dtv5100: fix control-request directions
media: gspca/sq905: fix control-request direction
media: gspca/sunplus: fix zero-length control requests
Johannes Berg (2):
iwlwifi: mvm: don't change band on bound PHY contexts
iwlwifi: pcie: free IML DMA memory allocation
John Garry (1):
scsi: core: Cap scsi_host cmd_per_lun at can_queue
Jonathan Cameron (21):
iio: accel: bma180: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: accel: bma220: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: accel: hid: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: accel: kxcjk-1013: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: accel: stk8312: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: accel: stk8ba50: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: adc: ti-ads1015: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: adc: vf610: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: gyro: bmg160: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: humidity: am2315: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: prox: srf08: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: prox: pulsed-light: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: prox: as3935: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: light: isl29125: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: light: tcs3414: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: light: tcs3472: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: potentiostat: lmp91000: Fix alignment of buffer in
iio_push_to_buffers_with_timestamp()
iio: adc: hx711: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: adc: mxs-lradc: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
iio: adc: ti-ads8688: Fix alignment of buffer in
iio_push_to_buffers_with_timestamp()
iio: prox: isl29501: Fix buffer alignment in
iio_push_to_buffers_with_timestamp()
Josef Bacik (2):
btrfs: fix error handling in __btrfs_update_delayed_inode
btrfs: abort transaction if we fail to update the delayed inode
Kai-Heng Feng (1):
Bluetooth: Shutdown controller after workqueues are flushed or
cancelled
Kamal Heib (1):
RDMA/rxe: Fix failure during driver load
Konstantin Kharlamov (1):
PCI: Leave Apple Thunderbolt controllers on for s2idle or standby
Krzysztof Kozlowski (8):
power: supply: max17042: Do not enforce (incorrect) interrupt trigger
type
reset: a10sr: add missing of_match_table reference
ARM: dts: exynos: fix PWM LED max brightness on Odroid XU/XU3
ARM: dts: exynos: fix PWM LED max brightness on Odroid HC1
ARM: dts: exynos: fix PWM LED max brightness on Odroid XU4
memory: atmel-ebi: add missing of_node_put for loop iteration
memory: fsl_ifc: fix leak of IO mapping on probe failure
memory: fsl_ifc: fix leak of private memory on probe failure
Krzysztof Wilczyński (2):
ACPI: sysfs: Fix a buffer overrun problem with description_show()
PCI/sysfs: Fix dsm_label_utf16s_to_utf8s() buffer overrun
Kuninori Morimoto (2):
ASoC: rsnd: tidyup loop on rsnd_adg_clk_query()
clk: renesas: r8a77995: Add ZA2 clock
Lai Jiangshan (1):
KVM: X86: Disable hardware breakpoints unconditionally before
kvm_x86->run()
Lee Gibson (1):
wl1251: Fix possible buffer overflow in wl1251_cmd_scan
Leon Romanovsky (2):
RDMA/mlx5: Don't add slave port to unaffiliated list
RDMA/mlx5: Don't access NULL-cleared mpi pointer
Liguang Zhang (1):
ACPI: AMBA: Fix resource name in /proc/iomem
Linus Walleij (2):
power: supply: ab8500: Fix an old bug
power: supply: ab8500: Avoid NULL pointers
Linyu Yuan (1):
usb: gadget: eem: fix echo command packet response issue
Liu Shixin (1):
netlabel: Fix memory leak in netlbl_mgmt_add_common
Liwei Song (1):
ice: set the value of global config lock timeout longer
Longpeng(Mike) (1):
vsock: notify server to shutdown when client has pending signal
Ludovic Desroches (1):
ARM: dts: at91: sama5d4: fix pinctrl muxing
Luiz Augusto von Dentz (2):
Bluetooth: mgmt: Fix slab-out-of-bounds in tlv_data_is_valid
Bluetooth: Fix handling of HCI_LE_Advertising_Set_Terminated event
Luiz Sampaio (1):
w1: ds2438: fixing bug that would always get page0
Lv Yunlong (4):
media: v4l2-core: Avoid the dangling pointer in v4l2_fh_release
media: exynos4-is: Fix a use after free in isp_video_release
ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe
misc/libmasm/module: Fix two use after free in ibmasm_init_one
Maciej W. Rozycki (1):
serial: 8250: Actually allow UPF_MAGIC_MULTIPLIER baud rates
Maciej Żenczykowski (1):
bpf: Do not change gso_size during bpf_skb_change_proto()
Marc Kleine-Budde (1):
iio: ltr501: mark register holding upper 8 bits of ALS_DATA{0,1} and
PS_DATA as volatile, too
Marcelo Ricardo Leitner (2):
sctp: validate from_addr_param return
sctp: add size validation when walking chunks
Marek Szyprowski (1):
extcon: max8997: Add missing modalias string
Marek Vasut (1):
rsi: Assign beacon rate settings to the correct rate_info descriptor
field
Mario Limonciello (1):
ACPI: processor idle: Fix up C-state latency if not ordered
Martin Fuzzey (2):
rtc: stm32: Fix unbalanced clk_disable_unprepare() on probe error path
rsi: fix AP mode with WPA failure due to encrypted EAPOL
Martin Fäcknitz (1):
MIPS: vdso: Invalid GIC access through VDSO
Mateusz Palczewski (1):
i40e: Fix autoneg disabling for non-10GBaseT links
Mauro Carvalho Chehab (3):
media: dvb_net: avoid speculation from net slot
media: siano: fix device register error path
media: s5p_cec: decrement usage count if disabled
Maximilian Luz (1):
pinctrl/amd: Add device HID for new AMD GPIO controller
Miao Wang (1):
net/ipv4: swap flow ports when validating source
Miaohe Lin (1):
mm/huge_memory.c: don't discard hugepage if other processes are
mapping it
Michael Buesch (1):
ssb: sdio: Don't overwrite const buffer if block_write fails
Michael Ellerman (1):
powerpc/stacktrace: Fix spurious "stale" traces in
raise_backtrace_ipi()
Michael S. Tsirkin (1):
virtio_net: move tx vq operation under tx queue lock
Mike Christie (4):
scsi: iscsi: Add iscsi_cls_conn refcount helpers
scsi: iscsi: Fix conn use after free during resets
scsi: iscsi: Fix shost->max_id use
scsi: qedi: Fix null ref during abort handling
Mike Marshall (1):
orangefs: fix orangefs df output.
Miklos Szeredi (2):
fuse: check connected before queueing on fpq->io
fuse: reject internal errno
Mimi Zohar (1):
evm: fix writing <securityfs>/evm overflow
Minas Harutyunyan (1):
usb: dwc3: Fix debugfs creation flow
Minchan Kim (1):
selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC
Miquel Raynal (1):
serial: mvebu-uart: clarify the baud rate derivation
Mirko Vogt (1):
spi: spi-sun6i: Fix chipselect/clock bug
Muchun Song (1):
writeback: fix obtain a reference to a freeing memcg css
Nathan Chancellor (3):
powerpc/barrier: Avoid collision with clang's __lwsync macro
qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute
hexagon: use common DISCARDS macro
Nicholas Piggin (1):
powerpc: Offline CPU in stop_this_cpu()
Nick Desaulniers (2):
MIPS: set mips32r5 for virt extensions
ARM: 9087/1: kprobes: test-thumb: fix for LLVM_IAS=1
Nikolay Aleksandrov (1):
net: bridge: multicast: fix PIM hello router port marking race
Nuno Sa (1):
iio: adis_buffer: do not return ints in irq handlers
Odin Ugedal (1):
sched/fair: Fix ascii art by relpacing tabs
Oliver Hartkopp (1):
can: gw: synchronize rcu operations before removing gw job entry
Oliver Lang (2):
iio: ltr501: ltr559: fix initialization of LTR501_ALS_CONTR
iio: ltr501: ltr501_read_ps(): add missing endianness conversion
Ondrej Zary (2):
serial_cs: Add Option International GSM-Ready 56K/ISDN modem
serial_cs: remove wrong GLOBETROTTER.cis entry
Pablo Neira Ayuso (3):
netfilter: nft_exthdr: check for IPv6 packet before further processing
netfilter: nft_osf: check for TCP packet before further processing
netfilter: nft_tproxy: restrict support to TCP and UDP transport
protocols
Pali Rohár (6):
ath9k: Fix kernel NULL pointer dereference during ath_reset_internal()
serial: mvebu-uart: correctly calculate minimal possible baudrate
arm64: dts: marvell: armada-37xx: Fix reg for standard variant of UART
serial: mvebu-uart: fix calculation of clock divisor
PCI: aardvark: Fix checking for PIO Non-posted Request
PCI: aardvark: Fix kernel panic during PIO transfer
Pan Dong (1):
ext4: fix avefreec in find_group_orlov
Pascal Terjan (1):
rtl8xxxu: Fix device info for RTL8192EU devices
Paul Burton (2):
tracing: Simplify & fix saved_tgids logic
tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
Paul E. McKenney (1):
clocksource: Retry clock read if long delays detected
Pavel Skripkin (10):
media: dvb-usb: fix wrong definition
net: can: ems_usb: fix use-after-free in ems_usb_disconnect()
media: cpia2: fix memory leak in cpia2_usb_probe
net: ethernet: aeroflex: fix UAF in greth_of_remove
net: ethernet: ezchip: fix UAF in nps_enet_remove
net: ethernet: ezchip: fix error handling
net: sched: fix warning in tcindex_alloc_perfect_hash
reiserfs: add check for invalid 1st journal block
media: zr364xx: fix memory leak in zr364xx_start_readpipe
jfs: fix GPF in diFree
Peter Robinson (1):
gpio: pca953x: Add support for the On Semi pca9655
Petr Pavlu (1):
ipmi/watchdog: Stop watchdog timer when the current action is 'none'
Philipp Zabel (1):
reset: bail if try_module_get() fails
Ping-Ke Shih (1):
mac80211: remove iwlwifi specific workaround NDPs of null_response
Po-Hsu Lin (1):
selftests: timers: rtcpie: skip test if default RTC device does not
exist
Quat Le (1):
scsi: core: Retry I/O for Notify (Enable Spinup) Required error
Radim Pavlik (1):
pinctrl: mcp23s08: fix race condition in irq handler
Rafał Miłecki (1):
ARM: dts: BCM5301X: Fixup SPI binding
Randy Dunlap (5):
media: I2C: change 'RST' to "RSET" to fix multiple build errors
wireless: carl9170: fix LEDS build errors & warnings
scsi: FlashPoint: Rename si_flags field
s390: appldata depends on PROC_SYSCTL
mips: disable branch profiling in boot/decompress.o
Remi Pommarel (1):
PCI: aardvark: Don't rely on jiffies while holding spinlock
Richard Fitzgerald (4):
lib: vsprintf: Fix handling of number field widths in vsscanf
random32: Fix implicit truncation warning in prandom_seed_state()
ACPI: tables: Add custom DSDT file as makefile prerequisite
ASoC: cs42l42: Correct definition of CS42L42_ADC_PDN_MASK
Roberto Sassu (2):
evm: Execute evm_inode_init_security() only when an HMAC key is loaded
evm: Refuse EVM_ALLOW_METADATA_WRITES only if an HMAC key is loaded
Ruslan Bilovol (1):
usb: gadget: f_hid: fix endianness issue with descriptors
Sai Prakash Ranjan (1):
coresight: tmc-etf: Fix global-out-of-bounds in
tmc_update_etf_buffer()
Samuel Holland (1):
clocksource/arm_arch_timer: Improve Allwinner A64 timer workaround
Sandor Bodo-Merle (2):
PCI: iproc: Fix multi-MSI base vector number allocation
PCI: iproc: Support multi-MSI only on uniprocessor kernel
Sean Christopherson (1):
KVM: x86: Use guest MAXPHYADDR from CPUID.0x8000_0008 iff TDP is
enabled
Sean Young (1):
media, bpf: Do not copy more entries than user space requested
Sebastian Andrzej Siewior (1):
net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
Sergey Shtylyov (4):
sata_highbank: fix deferred probing
pata_rb532_cf: fix deferred probing
pata_octeon_cf: avoid WARN_ON() in ata_host_activate()
pata_ep93xx: fix deferred probing
Sergio Paracuellos (1):
staging: mt7621-dts: fix pci address for PCI memory range
Sherry Sun (1):
tty: serial: fsl_lpuart: fix the potential risk of division or modulo
by zero
Srinivas Neeli (1):
gpio: zynq: Check return value of pm_runtime_get_sync
Steffen Klassert (1):
xfrm: Fix error reporting in xfrm_state_construct.
Stephan Gerhold (2):
extcon: sm5502: Drop invalid register write in sm5502_reg_data
power: supply: rt5033_battery: Fix device tree enumeration
Stephane Grosjean (1):
can: peak_pciefd: pucan_handle_status(): fix a potential starvation
issue in TX path
Stephen Brennan (1):
ext4: use ext4_grp_locked_error in mb_find_extent
Steve Longerbeam (1):
media: imx-csi: Skip first few frames from a BT.656 source
Steven Rostedt (VMware) (3):
tracing/histograms: Fix parsing of "sym-offset" modifier
tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
tracing: Do not reference char * as a string in histograms
Sukadev Bhattiprolu (1):
ibmvnic: free tx_pool if tso_pool alloc fails
Takashi Iwai (2):
ALSA: usb-audio: Fix OOB access at proc output
ALSA: sb: Fix potential double-free of CSP mixer elements
Takashi Sakamoto (2):
Revert "ALSA: bebob/oxfw: fix Kconfig entry for Mackie d.2 Pro"
ALSA: bebob: add support for ToneWeal FW66
Tao Ren (1):
watchdog: aspeed: fix hardware timeout calculation
Tetsuo Handa (1):
smackfs: restrict bytes count in smk_set_cipso()
Thomas Gleixner (3):
cpu/hotplug: Cure the cpusets trainwreck
x86/fpu: Return proper error codes from user access functions
x86/fpu: Limit xstate copy size in xstateregs_set()
Thomas Zimmermann (2):
drm/mxsfb: Don't select DRM_KMS_FB_HELPER
drm/zte: Don't select DRM_KMS_FB_HELPER
Tian Tao (1):
spi: omap-100k: Fix the length judgment problem
Tim Jiang (1):
Bluetooth: btusb: fix bt fiwmare downloading failure issue for qca
btsoc.
Timo Sigurdsson (1):
ata: ahci_sunxi: Disable DIPM
Tony Lindgren (1):
wlcore/wl12xx: Fix wl12xx get_mac error if device is in ELP
Trond Myklebust (3):
NFS: nfs_find_open_context() may only select open files
NFSv4: Initialise connection to the server in nfs4_alloc_client()
NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times
Tyrel Datwyler (1):
scsi: core: Fix bad pointer dereference when ehandler kthread is
invalid
Uwe Kleine-König (3):
backlight: lm3630a: Fix return code of .update_status() callback
pwm: spear: Don't modify HW state in .remove callback
pwm: tegra: Don't modify HW state in .remove callback
Vadim Fedorenko (1):
net: lwtunnel: handle MTU calculation in forwading
Valentin Vidic (1):
s390/sclp_vt220: fix console name to match device
Valentine Barshak (1):
arm64: dts: renesas: v3msk: Fix memory size
Vineeth Vijayan (1):
s390/cio: dont call css_wait_for_slow_path() inside a lock
Wang Hai (1):
samples/bpf: Fix the error return code of xdp_redirect's main()
Willy Tarreau (1):
ipv6: use prandom_u32() for ID generation
Wolfram Sang (1):
mmc: core: clear flags before allowing to retune
Xianting Tian (1):
virtio_net: Remove BUG() to avoid machine dead
Xiao Yang (1):
RDMA/rxe: Don't overwrite errno from ib_umem_get()
Xie Yongji (4):
drm/virtio: Fix double free on probe failure
virtio-blk: Fix memory leak among suspend/resume procedure
virtio_net: Fix error handling in virtnet_restore()
virtio_console: Assure used length from device is limited
Yang Jihong (1):
arm_pmu: Fix write counter incorrect in ARMv7 big-endian mode
Yang Li (1):
ath10k: Fix an error code in ath10k_add_interface()
Yang Yingliang (10):
ext4: return error code when ext4_fill_flex_info() fails
drm/rockchip: cdn-dp-core: add missing clk_disable_unprepare() on
error in cdn_dp_grf_write()
ASoC: hisilicon: fix missing clk_disable_unprepare() on error in
hi6210_i2s_startup()
mtd: rawnand: marvell: add missing clk_disable_unprepare() on error in
marvell_nfc_resume()
net: bcmgenet: check return value after calling
platform_get_resource()
net: mvpp2: check return value after calling platform_get_resource()
net: micrel: check return value after calling platform_get_resource()
fjes: check return value after calling platform_get_resource()
ALSA: ppc: fix error return code in snd_pmac_probe()
usb: gadget: hid: fix error return code in hid_bind()
Yizhuo Zhai (1):
Input: hideep - fix the uninitialized use in hideep_nvm_unlock()
Yoshihiro Shimoda (1):
serial: sh-sci: Stop dmaengine transfer in sci_stop_tx()
Yu Kuai (1):
char: pcmcia: error out if 'num_bytes_read' is greater than 4 in
set_protocol()
Yu Liu (1):
Bluetooth: Fix the HCI to MGMT status conversion table
YueHaibing (1):
hv_utils: Fix passing zero to 'PTR_ERR' warning
Yufen Yu (1):
ALSA: ac97: fix PM reference leak in ac97_bus_remove()
Yun Zhou (2):
seq_buf: Make trace_seq_putmem_hex() support data longer than 8
seq_buf: Fix overflow in seq_buf_putmem_hex()
Zhang Yi (2):
ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
ext4: remove check for zero nr_to_scan in ext4_es_scan()
Zhangjiantao (Kirin, nanjing) (1):
xhci: solve a double free problem while doing s4
Zhen Lei (13):
crypto: ux500 - Fix error return code in hash_hw_final()
media: tc358743: Fix error return code in tc358743_probe_of()
mmc: usdhi6rol0: fix error return code in usdhi6_probe()
ehea: fix error return code in ehea_restart_qps()
ssb: Fix error return code in ssb_bus_scan()
Input: hil_kbd - fix error return code in hil_dev_connect()
visorbus: fix error return code in visorchipset_init()
scsi: mpt3sas: Fix error return value in _scsih_expander_add()
leds: as3645a: Fix error return code in as3645a_parse_node()
ASoC: soc-core: Fix the error return code in
snd_soc_of_parse_audio_routing()
um: fix error return code in slip_open()
um: fix error return code in winch_tramp()
ALSA: isa: Fix error return code in snd_cmi8330_probe()
Zheyu Ma (4):
media: bt8xx: Fix a missing check bug in bt878_probe
mmc: via-sdmmc: add a check against NULL pointer dereference
atm: nicstar: use 'dma_free_coherent' instead of 'kfree'
atm: nicstar: register the interrupt handler in the right place
Zhihao Cheng (2):
tools/bpftool: Fix error return code in do_batch()
ubifs: Set/Clear I_LINKABLE under i_lock for whiteout inode
Zou Wei (13):
regulator: uniphier: Add missing MODULE_DEVICE_TABLE
atm: iphase: fix possible use-after-free in ia_module_exit()
mISDN: fix possible use-after-free in HFC_cleanup()
atm: nicstar: Fix possible use-after-free in nicstar_cleanup()
cw1200: add missing MODULE_DEVICE_TABLE
pinctrl: mcp23s08: Fix missing unlock on error in mcp23s08_irq()
mfd: da9052/stmpe: Add and modify MODULE_DEVICE_TABLE
watchdog: Fix possible use-after-free in wdt_startup()
watchdog: sc520_wdt: Fix possible use-after-free in wdt_turnoff()
watchdog: Fix possible use-after-free by calling del_timer_sync()
PCI: tegra: Add missing MODULE_DEVICE_TABLE
power: supply: charger-manager: add missing MODULE_DEVICE_TABLE
power: supply: ab8500: add missing MODULE_DEVICE_TABLE
frank zago (1):
iio: light: tcs3472: do not free unallocated IRQ
Íñigo Huguet (2):
sfc: avoid double pci_remove of VFs
sfc: error code if SRIOV cannot be disabled
Documentation/ABI/testing/evm | 26 +++-
.../admin-guide/kernel-parameters.txt | 6 +
Makefile | 2 +-
arch/arm/boot/dts/am335x-cm-t335.dts | 2 +-
arch/arm/boot/dts/am43x-epos-evm.dts | 4 +-
arch/arm/boot/dts/bcm5301x.dtsi | 18 +--
arch/arm/boot/dts/exynos5422-odroidhc1.dts | 2 +-
arch/arm/boot/dts/exynos5422-odroidxu4.dts | 2 +-
.../boot/dts/exynos54xx-odroidxu-leds.dtsi | 4 +-
arch/arm/boot/dts/imx6q-dhcom-som.dtsi | 41 +++++-
arch/arm/boot/dts/r8a7779-marzen.dts | 2 +-
arch/arm/boot/dts/r8a7779.dtsi | 1 +
arch/arm/boot/dts/sama5d4.dtsi | 2 +-
arch/arm/kernel/perf_event_v7.c | 4 +-
arch/arm/probes/kprobes/test-thumb.c | 10 +-
arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 2 +-
.../arm64/boot/dts/renesas/r8a77970-v3msk.dts | 2 +-
arch/hexagon/kernel/vmlinux.lds.S | 7 +-
arch/ia64/kernel/mca_drv.c | 2 +-
arch/mips/boot/compressed/Makefile | 4 +-
arch/mips/boot/compressed/decompress.c | 2 +
arch/mips/include/asm/hugetlb.h | 8 +-
arch/mips/include/asm/mipsregs.h | 8 +-
arch/mips/include/asm/pgalloc.h | 10 +-
arch/mips/vdso/vdso.h | 2 +-
arch/powerpc/boot/devtree.c | 59 +++++----
arch/powerpc/boot/ns16550.c | 9 +-
arch/powerpc/include/asm/barrier.h | 2 +
arch/powerpc/include/asm/ps3.h | 2 +
arch/powerpc/kernel/smp.c | 11 ++
arch/powerpc/kernel/stacktrace.c | 27 +++-
arch/powerpc/platforms/ps3/mm.c | 12 ++
arch/s390/Kconfig | 2 +-
arch/s390/kernel/setup.c | 2 +-
arch/um/drivers/chan_user.c | 3 +-
arch/um/drivers/slip_user.c | 3 +-
arch/x86/include/asm/fpu/internal.h | 19 ++-
arch/x86/kernel/fpu/regset.c | 2 +-
arch/x86/kvm/cpuid.c | 8 +-
arch/x86/kvm/x86.c | 2 +
crypto/shash.c | 18 ++-
drivers/acpi/Makefile | 5 +
drivers/acpi/acpi_amba.c | 1 +
drivers/acpi/acpi_video.c | 9 ++
drivers/acpi/acpica/nsrepair2.c | 7 +
drivers/acpi/bus.c | 1 +
drivers/acpi/device_sysfs.c | 2 +-
drivers/acpi/ec.c | 16 +++
drivers/acpi/processor_idle.c | 40 ++++++
drivers/ata/ahci_sunxi.c | 2 +-
drivers/ata/pata_ep93xx.c | 2 +-
drivers/ata/pata_octeon_cf.c | 5 +-
drivers/ata/pata_rb532_cf.c | 6 +-
drivers/ata/sata_highbank.c | 6 +-
drivers/atm/iphase.c | 2 +-
drivers/atm/nicstar.c | 26 ++--
drivers/block/virtio_blk.c | 2 +
drivers/bluetooth/btusb.c | 5 +
drivers/char/ipmi/ipmi_watchdog.c | 22 ++--
drivers/char/pcmcia/cm4000_cs.c | 4 +
drivers/char/virtio_console.c | 4 +-
drivers/clk/renesas/r8a77995-cpg-mssr.c | 1 +
drivers/clk/tegra/clk-pll.c | 6 +-
drivers/clocksource/arm_arch_timer.c | 2 +-
drivers/crypto/ccp/sp-pci.c | 6 +-
drivers/crypto/ixp4xx_crypto.c | 2 +-
drivers/crypto/nx/nx-842-pseries.c | 9 +-
drivers/crypto/qat/qat_common/qat_hal.c | 6 +-
drivers/crypto/qat/qat_common/qat_uclo.c | 1 -
drivers/crypto/ux500/hash/hash_core.c | 1 +
drivers/edac/ti_edac.c | 1 +
drivers/extcon/extcon-max8997.c | 1 +
drivers/extcon/extcon-sm5502.c | 1 -
drivers/firmware/qemu_fw_cfg.c | 8 +-
drivers/fsi/fsi-core.c | 4 +-
drivers/fsi/fsi-sbefifo.c | 10 +-
drivers/fsi/fsi-scom.c | 16 ++-
drivers/gpio/gpio-pca953x.c | 1 +
drivers/gpio/gpio-zynq.c | 5 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
.../drm/amd/display/dc/dcn10/dcn10_dpp_dscl.c | 9 +-
drivers/gpu/drm/amd/display/dc/irq_types.h | 2 +-
drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 2 -
drivers/gpu/drm/msm/disp/mdp4/mdp4_plane.c | 8 +-
drivers/gpu/drm/mxsfb/Kconfig | 1 -
drivers/gpu/drm/qxl/qxl_dumb.c | 2 +
drivers/gpu/drm/radeon/radeon_display.c | 1 +
drivers/gpu/drm/rockchip/cdn-dp-core.c | 1 +
drivers/gpu/drm/virtio/virtgpu_kms.c | 1 +
drivers/gpu/drm/zte/Kconfig | 1 -
drivers/hid/hid-core.c | 10 +-
drivers/hid/wacom_wac.h | 2 +-
drivers/hv/hv_util.c | 4 +-
drivers/hwmon/max31722.c | 9 --
drivers/hwmon/max31790.c | 2 +-
.../hwtracing/coresight/coresight-tmc-etf.c | 2 +-
drivers/hwtracing/intel_th/core.c | 17 +++
drivers/hwtracing/intel_th/gth.c | 16 +++
drivers/hwtracing/intel_th/intel_th.h | 3 +
drivers/i2c/i2c-core-base.c | 3 +
drivers/iio/accel/bma180.c | 10 +-
drivers/iio/accel/bma220_spi.c | 10 +-
drivers/iio/accel/hid-sensor-accel-3d.c | 13 +-
drivers/iio/accel/kxcjk-1013.c | 24 ++--
drivers/iio/accel/stk8312.c | 12 +-
drivers/iio/accel/stk8ba50.c | 17 ++-
drivers/iio/adc/hx711.c | 4 +-
drivers/iio/adc/mxs-lradc-adc.c | 3 +-
drivers/iio/adc/ti-ads1015.c | 12 +-
drivers/iio/adc/ti-ads8688.c | 3 +-
drivers/iio/adc/vf610_adc.c | 10 +-
drivers/iio/gyro/bmg160_core.c | 10 +-
drivers/iio/humidity/am2315.c | 16 ++-
drivers/iio/imu/adis_buffer.c | 3 -
drivers/iio/light/isl29125.c | 10 +-
drivers/iio/light/ltr501.c | 15 ++-
drivers/iio/light/tcs3414.c | 10 +-
drivers/iio/light/tcs3472.c | 16 ++-
drivers/iio/potentiostat/lmp91000.c | 4 +-
drivers/iio/proximity/as3935.c | 10 +-
drivers/iio/proximity/isl29501.c | 2 +-
.../iio/proximity/pulsedlight-lidar-lite-v2.c | 10 +-
drivers/iio/proximity/srf08.c | 14 +-
drivers/infiniband/core/cma.c | 3 +-
drivers/infiniband/hw/cxgb4/qp.c | 1 +
drivers/infiniband/hw/mlx5/main.c | 4 +-
drivers/infiniband/sw/rxe/rxe_mr.c | 2 +-
drivers/infiniband/sw/rxe/rxe_net.c | 10 +-
drivers/infiniband/sw/rxe/rxe_qp.c | 1 -
drivers/infiniband/sw/rxe/rxe_resp.c | 2 -
drivers/input/joydev.c | 2 +-
drivers/input/keyboard/hil_kbd.c | 1 +
drivers/input/touchscreen/hideep.c | 13 +-
drivers/input/touchscreen/usbtouchscreen.c | 8 +-
drivers/ipack/carriers/tpci200.c | 5 +-
drivers/isdn/hardware/mISDN/hfcpci.c | 2 +-
drivers/leds/leds-as3645a.c | 1 +
drivers/leds/leds-ktd2692.c | 27 ++--
.../md/persistent-data/dm-space-map-disk.c | 9 +-
.../persistent-data/dm-space-map-metadata.c | 9 +-
drivers/media/common/siano/smscoreapi.c | 22 ++--
drivers/media/common/siano/smscoreapi.h | 4 +-
drivers/media/common/siano/smsdvb-main.c | 4 +
drivers/media/dvb-core/dvb_net.c | 25 +++-
drivers/media/i2c/ir-kbd-i2c.c | 4 +-
drivers/media/i2c/s5c73m3/s5c73m3-core.c | 6 +-
drivers/media/i2c/s5c73m3/s5c73m3.h | 2 +-
drivers/media/i2c/s5k4ecgx.c | 10 +-
drivers/media/i2c/s5k5baf.c | 6 +-
drivers/media/i2c/s5k6aa.c | 10 +-
drivers/media/i2c/saa6588.c | 4 +-
drivers/media/i2c/tc358743.c | 1 +
drivers/media/pci/bt8xx/bt878.c | 3 +
drivers/media/pci/bt8xx/bttv-driver.c | 6 +-
drivers/media/pci/cobalt/cobalt-driver.c | 1 +
drivers/media/pci/cobalt/cobalt-driver.h | 7 +-
drivers/media/pci/saa7134/saa7134-video.c | 6 +-
drivers/media/platform/davinci/vpbe_display.c | 2 +-
drivers/media/platform/davinci/vpbe_venc.c | 6 +-
.../platform/exynos4-is/fimc-isp-video.c | 7 +-
drivers/media/platform/s5p-cec/s5p_cec.c | 2 +-
drivers/media/platform/s5p-g2d/g2d.c | 3 +
drivers/media/platform/sti/hva/hva-hw.c | 3 +-
drivers/media/rc/bpf-lirc.c | 3 +-
drivers/media/usb/cpia2/cpia2.h | 1 +
drivers/media/usb/cpia2/cpia2_core.c | 12 ++
drivers/media/usb/cpia2/cpia2_usb.c | 13 +-
drivers/media/usb/dvb-usb/cinergyT2-core.c | 2 +
drivers/media/usb/dvb-usb/cxusb.c | 2 +-
drivers/media/usb/dvb-usb/dtv5100.c | 7 +-
drivers/media/usb/em28xx/em28xx-input.c | 8 +-
drivers/media/usb/gspca/gl860/gl860.c | 4 +-
drivers/media/usb/gspca/sq905.c | 2 +-
drivers/media/usb/gspca/sunplus.c | 8 +-
drivers/media/usb/pvrusb2/pvrusb2-hdw.c | 4 +-
drivers/media/usb/uvc/uvc_video.c | 27 ++++
drivers/media/usb/zr364xx/zr364xx.c | 1 +
drivers/media/v4l2-core/v4l2-fh.c | 1 +
drivers/memory/atmel-ebi.c | 4 +-
drivers/memory/fsl_ifc.c | 8 +-
drivers/mfd/da9052-i2c.c | 1 +
drivers/mfd/stmpe-i2c.c | 2 +-
drivers/misc/eeprom/idt_89hpesx.c | 8 +-
drivers/misc/ibmasm/module.c | 5 +-
drivers/mmc/core/block.c | 8 ++
drivers/mmc/core/core.c | 7 +-
drivers/mmc/core/sd.c | 10 +-
drivers/mmc/host/sdhci.c | 4 +
drivers/mmc/host/sdhci.h | 1 +
drivers/mmc/host/usdhi6rol0.c | 1 +
drivers/mmc/host/via-sdmmc.c | 3 +
drivers/mmc/host/vub300.c | 2 +-
drivers/mtd/nand/raw/marvell_nand.c | 4 +-
drivers/net/can/peak_canfd/peak_canfd.c | 4 +-
drivers/net/can/usb/ems_usb.c | 3 +-
drivers/net/ethernet/aeroflex/greth.c | 3 +-
.../net/ethernet/broadcom/genet/bcmgenet.c | 1 +
drivers/net/ethernet/broadcom/genet/bcmmii.c | 4 +
drivers/net/ethernet/ezchip/nps_enet.c | 4 +-
drivers/net/ethernet/ibm/ehea/ehea_main.c | 9 +-
drivers/net/ethernet/ibm/ibmvnic.c | 10 +-
drivers/net/ethernet/intel/e100.c | 12 +-
.../net/ethernet/intel/i40e/i40e_ethtool.c | 3 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +
drivers/net/ethernet/intel/ice/ice_type.h | 2 +-
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 6 +
drivers/net/ethernet/micrel/ks8842.c | 4 +
.../ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 29 ++---
drivers/net/ethernet/sfc/ef10_sriov.c | 25 ++--
drivers/net/fjes/fjes_main.c | 4 +
drivers/net/ieee802154/mac802154_hwsim.c | 11 +-
drivers/net/virtio_net.c | 29 ++++-
drivers/net/vxlan.c | 2 +
drivers/net/wireless/ath/ath10k/mac.c | 1 +
drivers/net/wireless/ath/ath9k/main.c | 5 +
drivers/net/wireless/ath/carl9170/Kconfig | 8 +-
drivers/net/wireless/ath/wcn36xx/main.c | 21 ++-
.../broadcom/brcm80211/brcmfmac/cfg80211.c | 37 +++---
.../broadcom/brcm80211/brcmsmac/mac80211_if.c | 8 +-
.../net/wireless/intel/iwlwifi/mvm/mac80211.c | 24 +++-
.../intel/iwlwifi/pcie/ctxt-info-gen3.c | 15 ++-
.../wireless/intel/iwlwifi/pcie/internal.h | 3 +
drivers/net/wireless/marvell/mwifiex/pcie.c | 10 +-
.../net/wireless/realtek/rtl8xxxu/rtl8xxxu.h | 11 +-
.../realtek/rtl8xxxu/rtl8xxxu_8192e.c | 59 ++++++++-
drivers/net/wireless/rsi/rsi_91x_hal.c | 6 +-
drivers/net/wireless/rsi/rsi_91x_mac80211.c | 3 -
drivers/net/wireless/rsi/rsi_91x_mgmt.c | 3 +-
drivers/net/wireless/rsi/rsi_main.h | 1 -
drivers/net/wireless/st/cw1200/cw1200_sdio.c | 1 +
drivers/net/wireless/ti/wl1251/cmd.c | 9 +-
drivers/net/wireless/ti/wl12xx/main.c | 7 +
drivers/of/fdt.c | 8 +-
drivers/of/of_reserved_mem.c | 8 +-
drivers/pci/controller/pci-aardvark.c | 61 ++++++---
drivers/pci/controller/pci-tegra.c | 1 +
drivers/pci/controller/pcie-iproc-msi.c | 29 +++--
drivers/pci/pci-label.c | 2 +-
drivers/pci/quirks.c | 11 ++
drivers/phy/ti/phy-dm816x-usb.c | 17 ++-
drivers/pinctrl/pinctrl-amd.c | 1 +
drivers/pinctrl/pinctrl-mcp23s08.c | 10 +-
drivers/platform/x86/toshiba_acpi.c | 1 +
drivers/power/reset/gpio-poweroff.c | 1 +
drivers/power/supply/Kconfig | 3 +-
drivers/power/supply/ab8500_btemp.c | 1 +
drivers/power/supply/ab8500_charger.c | 19 ++-
drivers/power/supply/ab8500_fg.c | 1 +
drivers/power/supply/charger-manager.c | 1 +
drivers/power/supply/max17042_battery.c | 2 +-
drivers/power/supply/rt5033_battery.c | 7 +
drivers/pwm/pwm-spear.c | 4 -
drivers/pwm/pwm-tegra.c | 13 --
drivers/regulator/da9052-regulator.c | 3 +-
drivers/regulator/uniphier-regulator.c | 1 +
drivers/reset/core.c | 5 +-
drivers/reset/reset-a10sr.c | 1 +
drivers/rtc/rtc-proc.c | 4 +-
drivers/rtc/rtc-stm32.c | 6 +-
drivers/s390/char/sclp_vt220.c | 4 +-
drivers/s390/cio/chp.c | 3 +
drivers/s390/cio/chsc.c | 2 -
drivers/scsi/FlashPoint.c | 32 ++---
drivers/scsi/be2iscsi/be_main.c | 5 +-
drivers/scsi/bnx2i/bnx2i_iscsi.c | 2 +-
drivers/scsi/cxgbi/libcxgbi.c | 4 +-
drivers/scsi/device_handler/scsi_dh_alua.c | 11 +-
drivers/scsi/hosts.c | 4 +
drivers/scsi/libiscsi.c | 122 ++++++++----------
drivers/scsi/lpfc/lpfc_els.c | 9 ++
drivers/scsi/lpfc/lpfc_sli.c | 5 +-
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 4 +-
drivers/scsi/qedi/qedi_fw.c | 2 +-
drivers/scsi/qedi/qedi_main.c | 2 +-
drivers/scsi/scsi_lib.c | 1 +
drivers/scsi/scsi_transport_iscsi.c | 12 ++
drivers/spi/spi-loopback-test.c | 2 +-
drivers/spi/spi-omap-100k.c | 2 +-
drivers/spi/spi-sun6i.c | 6 +-
drivers/spi/spi-topcliff-pch.c | 4 +-
drivers/spi/spi.c | 1 +
drivers/ssb/scan.c | 1 +
drivers/ssb/sdio.c | 1 -
drivers/staging/gdm724x/gdm_lte.c | 20 ++-
drivers/staging/media/imx/imx-media-csi.c | 14 +-
drivers/staging/mt7621-dts/mt7621.dtsi | 2 +-
drivers/staging/rtl8723bs/hal/odm.h | 5 +-
drivers/tty/nozomi.c | 9 +-
drivers/tty/serial/8250/8250_port.c | 19 ++-
drivers/tty/serial/8250/serial_cs.c | 13 +-
drivers/tty/serial/fsl_lpuart.c | 3 +
drivers/tty/serial/mvebu-uart.c | 33 +++--
drivers/tty/serial/sh-sci.c | 8 ++
drivers/usb/class/cdc-acm.c | 5 +
drivers/usb/dwc3/core.c | 3 +-
drivers/usb/gadget/function/f_eem.c | 43 +++++-
drivers/usb/gadget/function/f_fs.c | 67 +++++-----
drivers/usb/gadget/function/f_hid.c | 2 +-
drivers/usb/gadget/legacy/hid.c | 4 +-
drivers/usb/host/xhci-mem.c | 1 +
drivers/usb/typec/class.c | 4 +-
drivers/vfio/pci/vfio_pci.c | 29 +++--
drivers/video/backlight/lm3630a_bl.c | 12 +-
drivers/visorbus/visorchipset.c | 6 +-
drivers/w1/slaves/w1_ds2438.c | 4 +-
drivers/watchdog/aspeed_wdt.c | 2 +-
drivers/watchdog/iTCO_wdt.c | 12 +-
drivers/watchdog/lpc18xx_wdt.c | 2 +-
drivers/watchdog/sbc60xxwdt.c | 2 +-
drivers/watchdog/sc520_wdt.c | 2 +-
drivers/watchdog/w83877f_wdt.c | 2 +-
fs/btrfs/Kconfig | 2 +
fs/btrfs/delayed-inode.c | 18 ++-
fs/btrfs/send.c | 11 ++
fs/btrfs/transaction.c | 6 +-
fs/btrfs/tree-log.c | 1 +
fs/ceph/addr.c | 10 +-
fs/configfs/file.c | 10 +-
fs/crypto/fname.c | 9 +-
fs/dlm/config.c | 9 ++
fs/dlm/lowcomms.c | 2 +-
fs/ext4/extents.c | 3 +
fs/ext4/extents_status.c | 4 +-
fs/ext4/ialloc.c | 11 +-
fs/ext4/mballoc.c | 9 +-
fs/ext4/super.c | 1 +
fs/f2fs/super.c | 1 +
fs/fs-writeback.c | 9 +-
fs/fuse/dev.c | 11 +-
fs/jfs/inode.c | 3 +-
fs/jfs/jfs_logmgr.c | 1 +
fs/nfs/inode.c | 4 +
fs/nfs/nfs3proc.c | 4 +-
fs/nfs/nfs4client.c | 82 ++++++------
fs/nfs/pnfs_nfs.c | 52 ++++----
fs/ntfs/inode.c | 2 +-
fs/ocfs2/filecheck.c | 6 +-
fs/ocfs2/stackglue.c | 8 +-
fs/orangefs/super.c | 2 +-
fs/reiserfs/journal.c | 14 ++
fs/seq_file.c | 3 +
fs/ubifs/dir.c | 7 +
fs/udf/namei.c | 4 +
include/crypto/internal/hash.h | 8 +-
include/linux/mfd/abx500/ux500_chargalg.h | 2 +-
include/linux/netdev_features.h | 2 +-
include/linux/nfs_fs.h | 1 +
include/linux/prandom.h | 2 +-
include/linux/tracepoint.h | 10 ++
include/media/v4l2-subdev.h | 4 +
include/net/ip.h | 12 +-
include/net/ip6_route.h | 16 ++-
include/net/sctp/structs.h | 2 +-
include/scsi/libiscsi.h | 11 +-
include/scsi/scsi_transport_iscsi.h | 2 +
include/uapi/linux/ethtool.h | 4 +-
kernel/cpu.c | 49 +++++++
kernel/sched/fair.c | 8 +-
kernel/time/clocksource.c | 53 +++++++-
kernel/trace/bpf_trace.c | 3 +-
kernel/trace/trace.c | 91 +++++++------
kernel/trace/trace_events_hist.c | 13 +-
kernel/tracepoint.c | 33 ++++-
lib/decompress_unlz4.c | 8 ++
lib/iov_iter.c | 2 +-
lib/kstrtox.c | 13 +-
lib/kstrtox.h | 2 +
lib/seq_buf.c | 8 +-
lib/vsprintf.c | 82 +++++++-----
mm/huge_memory.c | 2 +-
net/bluetooth/hci_core.c | 16 +--
net/bluetooth/hci_event.c | 13 +-
net/bluetooth/mgmt.c | 6 +
net/bridge/br_multicast.c | 2 +
net/can/gw.c | 3 +
net/core/dev.c | 11 +-
net/core/filter.c | 4 -
net/ipv4/fib_frontend.c | 2 +
net/ipv4/ip_output.c | 32 +++--
net/ipv4/route.c | 3 +-
net/ipv6/exthdrs.c | 31 +++--
net/ipv6/ip6_output.c | 32 ++---
net/ipv6/output_core.c | 28 +---
net/mac80211/sta_info.c | 5 -
net/netfilter/nft_exthdr.c | 3 +
net/netfilter/nft_osf.c | 5 +
net/netfilter/nft_tproxy.c | 9 +-
net/netlabel/netlabel_mgmt.c | 19 +--
net/sched/cls_tcindex.c | 2 +-
net/sched/sch_qfq.c | 8 +-
net/sctp/bind_addr.c | 19 +--
net/sctp/input.c | 8 +-
net/sctp/ipv6.c | 7 +-
net/sctp/protocol.c | 7 +-
net/sctp/sm_make_chunk.c | 29 +++--
net/vmw_vsock/af_vsock.c | 2 +-
net/wireless/wext-spy.c | 14 +-
net/xfrm/xfrm_user.c | 28 ++--
samples/bpf/xdp_redirect_user.c | 2 +-
security/integrity/evm/evm_main.c | 2 +-
security/integrity/evm/evm_secfs.c | 13 +-
security/selinux/avc.c | 13 +-
security/smack/smackfs.c | 2 +
sound/ac97/bus.c | 2 +-
sound/firewire/Kconfig | 5 +-
sound/firewire/bebob/bebob.c | 5 +-
sound/firewire/oxfw/oxfw.c | 2 +-
sound/isa/cmi8330.c | 2 +-
sound/isa/sb/sb16_csp.c | 8 +-
sound/pci/hda/hda_tegra.c | 3 +
sound/ppc/powermac.c | 6 +-
sound/soc/atmel/atmel-i2s.c | 34 +++--
sound/soc/codecs/cs42l42.h | 2 +-
sound/soc/hisilicon/hi6210-i2s.c | 14 +-
sound/soc/sh/rcar/adg.c | 4 +-
sound/soc/soc-core.c | 2 +-
sound/soc/tegra/tegra_alc5632.c | 1 +
sound/soc/tegra/tegra_max98090.c | 1 +
sound/soc/tegra/tegra_rt5640.c | 1 +
sound/soc/tegra/tegra_rt5677.c | 1 +
sound/soc/tegra/tegra_sgtl5000.c | 1 +
sound/soc/tegra/tegra_wm8753.c | 1 +
sound/soc/tegra/tegra_wm8903.c | 1 +
sound/soc/tegra/tegra_wm9712.c | 1 +
sound/soc/tegra/trimslice.c | 1 +
sound/usb/format.c | 2 +
sound/usb/mixer.c | 5 +-
tools/bpf/bpftool/main.c | 4 +-
tools/perf/util/llvm-utils.c | 2 +
.../powerpc/pmu/ebb/no_handler_test.c | 2 -
tools/testing/selftests/timers/rtcpie.c | 10 +-
tools/testing/selftests/x86/protection_keys.c | 3 +-
432 files changed, 2528 insertions(+), 1274 deletions(-)
--
2.25.1
[Meeting Notice] openEuler kernel technical sharing session No. 8 & biweekly regular meeting Time: 2021-07-23 14:00-18:00
by Meeting Book 23 Jul '21
23 Jul '21
From: Wang Hai <wanghai38(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 172330
CVE: HWPSIRT-2021-84477
--------------------------------
Specially crafted USB packets can leak kernel memory through the
following RNDIS steps:
1. Construct a packet that makes rndis call gen_ndis_set_resp().
In gen_ndis_set_resp(), BufOffset comes straight from the USB packet
and is never validated, so it can take any value. Therefore, if the
OID is RNDIS_OID_GEN_CURRENT_PACKET_FILTER, *params->filter can be
loaded from an arbitrary address.
2. Construct a packet that makes rndis call rndis_query_response().
In rndis_query_response(), if the OID is RNDIS_OID_GEN_CURRENT_PACKET_FILTER,
the value of *params->filter is read and returned to the host, leaking
the data fetched in step 1.
Therefore BufOffset needs to be checked to prevent the leak. The request
buffer is USB_COMP_EP0_BUFSIZ bytes, so an access is treated as legal
only when "8 + BufOffset + BufLength" is less than USB_COMP_EP0_BUFSIZ.
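To make the condition concrete, it can be sketched as a small helper
(illustrative only: the helper name is made up, while USB_COMP_EP0_BUFSIZ
and the 8-byte header offset come from the patch below, which open-codes
the corresponding rejection test):

/*
 * Illustrative sketch of the range check, not part of the patch.
 * The untrusted offset/length pair from the RNDIS message must keep the
 * access inside the USB_COMP_EP0_BUFSIZ-byte ep0 request buffer; the 8
 * accounts for the message bytes that precede the information buffer.
 * The sum is widened here as an extra precaution against 32-bit wrap.
 */
static bool rndis_info_buf_ok(u32 buf_offset, u32 buf_len)
{
        return (u64)8 + buf_offset + buf_len < USB_COMP_EP0_BUFSIZ;
}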
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Wang Hai <wanghai38(a)huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
drivers/usb/gadget/composite.c | 2 +-
drivers/usb/gadget/function/rndis.c | 37 +++++++++++++++++++++++++----
2 files changed, 34 insertions(+), 5 deletions(-)
diff --git a/drivers/usb/gadget/composite.c b/drivers/usb/gadget/composite.c
index 1a556a628971..7f963bb1c59b 100644
--- a/drivers/usb/gadget/composite.c
+++ b/drivers/usb/gadget/composite.c
@@ -2157,7 +2157,7 @@ int composite_dev_prepare(struct usb_composite_driver *composite,
if (!cdev->req)
return -ENOMEM;
- cdev->req->buf = kmalloc(USB_COMP_EP0_BUFSIZ, GFP_KERNEL);
+ cdev->req->buf = kzalloc(USB_COMP_EP0_BUFSIZ, GFP_KERNEL);
if (!cdev->req->buf)
goto fail;
diff --git a/drivers/usb/gadget/function/rndis.c b/drivers/usb/gadget/function/rndis.c
index 64de9f1b874c..9ea94215e113 100644
--- a/drivers/usb/gadget/function/rndis.c
+++ b/drivers/usb/gadget/function/rndis.c
@@ -506,6 +506,10 @@ static int gen_ndis_set_resp(struct rndis_params *params, u32 OID,
switch (OID) {
case RNDIS_OID_GEN_CURRENT_PACKET_FILTER:
+ if (buf_len < 2) {
+ pr_err("%s:Not support for buf_len < 2\n", __func__);
+ break;
+ }
/* these NDIS_PACKET_TYPE_* bitflags are shared with
* cdc_filter; it's not RNDIS-specific
@@ -592,6 +596,7 @@ static int rndis_query_response(struct rndis_params *params,
rndis_query_msg_type *buf)
{
rndis_query_cmplt_type *resp;
+ u32 BufOffset, BufLength;
rndis_resp_t *r;
/* pr_debug("%s: OID = %08X\n", __func__, cpu_to_le32(buf->OID)); */
@@ -612,12 +617,25 @@ static int rndis_query_response(struct rndis_params *params,
resp->MessageType = cpu_to_le32(RNDIS_MSG_QUERY_C);
resp->RequestID = buf->RequestID; /* Still LE in msg buffer */
+ BufOffset = le32_to_cpu(buf->InformationBufferOffset);
+ BufLength = le32_to_cpu(buf->InformationBufferLength);
+
+ /*
+ * If the address of the buf to be accessed exceeds the valid
+ * range of the buf, then return RNDIS_STATUS_NOT_SUPPORTED.
+ */
+ if (8 + BufOffset + BufLength >= USB_COMP_EP0_BUFSIZ) {
+ resp->Status = cpu_to_le32(RNDIS_STATUS_NOT_SUPPORTED);
+ resp->MessageLength = cpu_to_le32(sizeof(*resp));
+ resp->InformationBufferLength = cpu_to_le32(0);
+ resp->InformationBufferOffset = cpu_to_le32(0);
+ params->resp_avail(params->v);
+ return 0;
+ }
if (gen_ndis_query_resp(params, le32_to_cpu(buf->OID),
- le32_to_cpu(buf->InformationBufferOffset)
- + 8 + (u8 *)buf,
- le32_to_cpu(buf->InformationBufferLength),
- r)) {
+ BufOffset + 8 + (u8 *)buf, BufLength,
+ r)) {
/* OID not supported */
resp->Status = cpu_to_le32(RNDIS_STATUS_NOT_SUPPORTED);
resp->MessageLength = cpu_to_le32(sizeof *resp);
@@ -660,6 +678,17 @@ static int rndis_set_response(struct rndis_params *params,
resp->MessageType = cpu_to_le32(RNDIS_MSG_SET_C);
resp->MessageLength = cpu_to_le32(16);
resp->RequestID = buf->RequestID; /* Still LE in msg buffer */
+
+ /*
+ * If the address of the buf to be accessed exceeds the valid
+ * range of the buf, then return RNDIS_STATUS_NOT_SUPPORTED.
+ */
+ if (8 + BufOffset + BufLength >= USB_COMP_EP0_BUFSIZ) {
+ resp->Status = cpu_to_le32(RNDIS_STATUS_NOT_SUPPORTED);
+ params->resp_avail(params->v);
+ return 0;
+ }
+
if (gen_ndis_set_resp(params, le32_to_cpu(buf->OID),
((u8 *)buf) + 8 + BufOffset, BufLength, r))
resp->Status = cpu_to_le32(RNDIS_STATUS_NOT_SUPPORTED);
--
2.20.1
22 Jul '21
crypto: hisilicon - ACC add new feature and bugfix.
Colin Ian King (1):
crypto: hisilicon/sec - Fix spelling mistake "fallbcak" -> "fallback"
Eric Biggers (1):
crypto: sha - split sha.h into sha1.h and sha2.h
Hao Fang (1):
crypto: hisilicon - use the correct HiSilicon copyright
Hui Tang (37):
crypto: hisilicon/hpre - delete ECC 1bit error reported threshold
crypto: hisilicon/hpre - add two RAS correctable errors processing
crypto: hisilicon/hpre - add ecc algorithm inqury for uacce device
crypto: hisilicon/hpre - adapt the number of clusters
crypto: hisilicon/hpre - tiny fix
crypto: hisilicon/hpre - enable Elliptic curve cryptography
crypto: hisilicon/hpre - delete wrap of 'CONFIG_CRYPTO_DH'
crypto: hisilicon/hpre - optimise 'hpre_algs_register' error path
crypto: hisilicon - fix the check on dma address
crypto: hisilicon/hpre - fix "hpre_ctx_init" resource leak
crypto: hisilicon/hpre - fix Kconfig
crypto: hisilicon/hpre - fix PASID setting on kunpeng 920
crypto: hisilicon/hpre - fix a typo and delete redundant blank line
crypto: hisilicon/hpre - delete redundant '\n'
crypto: hisilicon/hpre - delete the rudundant space after return
crypto: hisilicon/hpre - use the correct variable type
crypto: hisilicon/hpre - add debug log
crypto: hisilicon/hpre - delete redundant log and return in advance
crypto: testmgr - fix initialization of 'secret_size'
crypto: ecdh - extend 'cra_driver_name' with curve name
crypto: hisilicon/hpre - extend 'cra_driver_name' with curve name
crypto: hisilicon/hpre - fix unmapping invalid dma address
crypto: hisilicon/hpre - the macro 'HPRE_ADDR' expands
crypto: hisilicon/hpre - init a structure member each line
crypto: hisilicon/hpre - replace macro with inline function
crypto: hisilicon/hpre - remove the macro of 'HPRE_DEV'
crypto: hisilicon/hpre - delete rudundant initialization
crypto: hisilicon/hpre - use 'GENMASK' to generate mask value
crypto: hisilicon/hpre - delete rudundant macro definition
crypto: hisilicon/hpre - add 'default' for switch statement
crypto: ecdh - fix ecdh-nist-p192's entry in testmgr
crypto: ecdh - fix 'ecdh_init'
crypto: ecdh - register NIST P384 tfm
crypto: ecdh - add test suite for NIST P384
crypto: hisilicon/hpre - fix ecdh self test issue
crypto: hisilicon/hpre - add check before gx modulo p
crypto: hisilicon/hpre - register ecdh NIST P384
Kai Ye (36):
crypto: hisilicon/sec2 - Fix aead authentication setting key error
uacce: delete some redundant code.
uacce: modify the module author information.
crypto: hisilicon/qm - SVA bugfixed on Kunpeng920
crypto: hisilicon - add ZIP device using mode parameter
crypto: hisilicon/hpre - register HPRE device to uacce
crypto: hisilicon/sec - register SEC device to uacce
uacce: delete unneeded variable initialization
crypto: hisilicon/sec - fixup checking the 3DES weak key
crypto: hisilicon/qm - delete redundant code
crypto: hisilicon/sec - use the correct print format
crypto: hisilicon/sgl - add a comment for block size initialization
crypto: hisilicon/sgl - delete unneeded variable initialization
crypto: hisilicon/sgl - add some dfx logs
crypto: hisilicon/sgl - fix the soft sg map to hardware sg
crypto: hisilicon/sgl - fix the sg buf unmap
crypto: hisilicon/qm - add dfx log if not use hardware crypto algs
crypto: hisilicon/qm - fix the process of VF's list adding
crypto: hisilicon/sec - add new type of SQE
crypto: hisilicon/sec - driver adapt to new SQE
crypto: hisilicon/sec - add new skcipher mode for SEC
crypto: hisilicon/sec - add fallback tfm supporting for XTS mode
crypto: hisilicon/sec - fixup 3des minimum key size declaration
crypto: hisilicon/sec - add new algorithm mode for AEAD
crypto: hisilicon/sec - add fallback tfm supporting for aeads
crypto: hisilicon/sec - add hardware integrity check value process
crypto: hisilicon/sec - modify the SEC request structure
uacce: add print information if not enable sva
crypto: hisilicon/qm - supports writing QoS int the host
crypto: hisilicon/qm - add the "alg_qos" file node
crypto: hisilicon/qm - merges the work initialization process into a
single function
crypto: hisilicon/qm - add pf ping single vf function
crypto: hisilicon/qm - supports to inquiry each function's QoS
crypto: hisilicon/sec - adds the max shaper type rate
crypto: hisilicon/hpre - adds the max shaper type rate
crypto: hisilicon/zip - adds the max shaper type rate
Lee Jones (1):
crypto: hisilicon/sec - Supply missing description for
'sec_queue_empty()'s 'queue' param
Longfang Liu (7):
crypto: hisilicon - delete unused structure member variables
crypto: hisilicon - fixes some coding style
crypto: hisilicon/sec - fixes some coding style
crypto: hisilicon/sec - fixes some driver coding style
crypto: hisilicon/sec - Fixes AES algorithm mode parameter problem
crypto: hisilicon/sec - Fix a module parameter error
crypto: hisilicon/qm - support address prefetching
Meng Yu (10):
crypto: hisilicon/hpre - add version adapt to new algorithms
crypto: hisilicon/hpre - add algorithm type
crypto: ecdh - move curve_id of ECDH from the key to algorithm name
crypto: ecc - expose ecc curves
crypto: ecc - add curve25519 params and expose them
crypto: hisilicon/hpre - add 'ECDH' algorithm
crypto: hisilicon/hpre - add 'CURVE25519' algorithm
crypto: ecc - Correct an error in the comments
crypto: hisilicon/hpre - Add processing of src_data in 'CURVE25519'
crypto: ecc - delete a useless function declaration
Ruiqi Gong (1):
crypto: hisilicon/hpre - fix a typo in hpre_crypto.c
Saulo Alessandre (3):
crypto: ecc - Add NIST P384 curve parameters
crypto: ecc - Add math to support fast NIST P384
crypto: ecdsa - Register NIST P384 and extend test suite
Shiju Jose (1):
crypto: hisilicon - Fix doc warnings in sgl.c and qm.c
Sihang Chen (1):
crypto: hisilicon/qm - update irqflag
Stefan Berger (2):
oid_registry: Add OIDs for ECDSA with SHA224/256/384/512
crypto: ecdsa - Add support for ECDSA signature verification
Weili Qian (37):
crypto: hisilicon/qm - numbers are replaced by macros
crypto: hisilicon/qm - modify the return type of function
crypto: hisilicon/qm - modify the return type of debugfs interface
crypto: hisilicon/qm - modify return type of 'qm_set_sqctype'
crypto: hisilicon/qm - replace 'sprintf' with 'scnprintf'
crypto: hisilicon/qm - split 'qm_qp_ctx_cfg' into smaller pieces
crypto: hisilicon/qm - split 'qm_eq_ctx_cfg' into smaller pieces
crypto: hisilicon/qm - split 'hisi_qm_init' into smaller pieces
hwrng: hisi - remove HiSilicon TRNG driver
crypto: hisilicon/trng - add HiSilicon TRNG driver support
crypto: hisilicon/trng - add support for PRNG
crypto: hisilicon/qm - fix use of 'dma_map_single'
crypto: hisilicon - PASID fixed on Kunpeng 930
crypto: hisilicon/qm - removing driver after reset
crypto: hisilicon/qm - fix request missing error
crypto: hisilicon/qm - fix the value of 'QM_SQC_VFT_BASE_MASK_V2'
crypto: hisilicon/qm - do not reset hardware when CE happens
crypto: hisilicon/qm - fix printing format issue
crypto: hisilicon/qm - set the total number of queues
crypto: hisilicon/qm - move 'CURRENT_QM' code to qm.c
crypto: hisilicon/qm - set the number of queues for function
crypto: hisilicon/qm - add queue isolation support for Kunpeng930
crypto: hisilicon/qm - add stop queue by hardware
crypto: hisilicon/trng - add version to adapt new algorithm
crypto: hisilicon - dynamic configuration 'err_info'
crypto: hisilicon - support new error types for ZIP
crypto: hisilicon - add new error type for SEC
crypto: hisilicon - enable new error types for QM
crypto: hisilicon/qm - initialize the device before doing tasks
crypto: hisilicon/qm - modify 'QM_RESETTING' clearing error
crypto: hisilicon/qm - adjust order of device error configuration
crypto: hisilicon/qm - enable to close master ooo when NFE occurs
crypto: hisilicon/qm - add MSI detection steps on Kunpeng930
crypto: hisilicon/qm - adjust reset interface
crypto: hisilicon/qm - enable PF and VFs communication
crypto: hisilicon/qm - add callback to support communication
crypto: hisilicon/qm - update reset flow
Wenkai Lin (1):
crypto: hisilicon/qm - implement for querying hardware tasks status.
Yang Shen (5):
crypto: hisilicon/zip - add a work_queue for zip irq
crypto: hisilicon/zip - adjust functions location
crypto: hisilicon/zip - add comments for 'hisi_zip_sqe'
crypto: hisilicon/zip - initialize operations about 'sqe' in
'acomp_alg.init'
crypto: hisilicon/zip - support new 'sqe' type in Kunpeng930
Yejune Deng (1):
crypto: hisilicon/trng - replace atomic_add_return()
Zou Wei (1):
crypto: hisilicon - switch to memdup_user_nul()
arch/arm/crypto/sha1-ce-glue.c | 2 +-
arch/arm/crypto/sha1.h | 2 +-
arch/arm/crypto/sha1_glue.c | 2 +-
arch/arm/crypto/sha1_neon_glue.c | 2 +-
arch/arm/crypto/sha2-ce-glue.c | 2 +-
arch/arm/crypto/sha256_glue.c | 2 +-
arch/arm/crypto/sha256_neon_glue.c | 2 +-
arch/arm/crypto/sha512-glue.c | 2 +-
arch/arm/crypto/sha512-neon-glue.c | 2 +-
arch/arm64/configs/defconfig | 1 +
arch/arm64/crypto/aes-glue.c | 2 +-
arch/arm64/crypto/sha1-ce-glue.c | 2 +-
arch/arm64/crypto/sha2-ce-glue.c | 2 +-
arch/arm64/crypto/sha256-glue.c | 2 +-
arch/arm64/crypto/sha512-ce-glue.c | 2 +-
arch/arm64/crypto/sha512-glue.c | 2 +-
arch/mips/cavium-octeon/crypto/octeon-sha1.c | 2 +-
.../mips/cavium-octeon/crypto/octeon-sha256.c | 2 +-
.../mips/cavium-octeon/crypto/octeon-sha512.c | 2 +-
arch/powerpc/crypto/sha1-spe-glue.c | 2 +-
arch/powerpc/crypto/sha1.c | 2 +-
arch/powerpc/crypto/sha256-spe-glue.c | 2 +-
arch/s390/crypto/sha.h | 3 +-
arch/s390/crypto/sha1_s390.c | 2 +-
arch/s390/crypto/sha256_s390.c | 2 +-
arch/s390/crypto/sha3_256_s390.c | 1 -
arch/s390/crypto/sha3_512_s390.c | 1 -
arch/s390/crypto/sha512_s390.c | 2 +-
arch/s390/purgatory/purgatory.c | 2 +-
arch/sparc/crypto/sha1_glue.c | 2 +-
arch/sparc/crypto/sha256_glue.c | 2 +-
arch/sparc/crypto/sha512_glue.c | 2 +-
arch/x86/crypto/sha1_ssse3_glue.c | 2 +-
arch/x86/crypto/sha256_ssse3_glue.c | 2 +-
arch/x86/crypto/sha512_ssse3_glue.c | 2 +-
arch/x86/purgatory/purgatory.c | 2 +-
crypto/Kconfig | 10 +
crypto/Makefile | 6 +
crypto/asymmetric_keys/asym_tpm.c | 2 +-
crypto/ecc.c | 291 +-
crypto/ecc.h | 49 +-
crypto/ecc_curve_defs.h | 49 +
crypto/ecdh.c | 117 +-
crypto/ecdh_helper.c | 4 +-
crypto/ecdsa.c | 376 +++
crypto/ecdsasignature.asn1 | 4 +
crypto/sha1_generic.c | 2 +-
crypto/sha256_generic.c | 2 +-
crypto/sha512_generic.c | 2 +-
crypto/testmgr.c | 35 +-
crypto/testmgr.h | 527 +++-
drivers/char/hw_random/Kconfig | 13 -
drivers/char/hw_random/Makefile | 1 -
drivers/char/hw_random/hisi-trng-v2.c | 99 -
drivers/char/random.c | 2 +-
drivers/crypto/allwinner/sun4i-ss/sun4i-ss.h | 2 +-
.../crypto/allwinner/sun8i-ce/sun8i-ce-hash.c | 3 +-
drivers/crypto/allwinner/sun8i-ce/sun8i-ce.h | 3 +-
.../crypto/allwinner/sun8i-ss/sun8i-ss-hash.c | 3 +-
drivers/crypto/allwinner/sun8i-ss/sun8i-ss.h | 3 +-
drivers/crypto/amcc/crypto4xx_alg.c | 2 +-
drivers/crypto/amcc/crypto4xx_core.c | 2 +-
drivers/crypto/atmel-authenc.h | 3 +-
drivers/crypto/atmel-ecc.c | 28 +-
drivers/crypto/atmel-sha.c | 3 +-
drivers/crypto/axis/artpec6_crypto.c | 3 +-
drivers/crypto/bcm/cipher.c | 3 +-
drivers/crypto/bcm/cipher.h | 3 +-
drivers/crypto/bcm/spu.h | 3 +-
drivers/crypto/caam/compat.h | 3 +-
drivers/crypto/cavium/nitrox/nitrox_aead.c | 1 -
drivers/crypto/ccp/ccp-crypto-sha.c | 3 +-
drivers/crypto/ccp/ccp-crypto.h | 3 +-
drivers/crypto/ccree/cc_driver.h | 3 +-
drivers/crypto/chelsio/chcr_algo.c | 3 +-
drivers/crypto/hisilicon/Kconfig | 10 +
drivers/crypto/hisilicon/Makefile | 1 +
drivers/crypto/hisilicon/hpre/hpre.h | 26 +-
drivers/crypto/hisilicon/hpre/hpre_crypto.c | 1052 ++++++-
drivers/crypto/hisilicon/hpre/hpre_main.c | 489 +--
drivers/crypto/hisilicon/qm.c | 2716 +++++++++++++----
drivers/crypto/hisilicon/qm.h | 81 +-
drivers/crypto/hisilicon/sec/sec_algs.c | 2 +-
drivers/crypto/hisilicon/sec/sec_drv.c | 13 +-
drivers/crypto/hisilicon/sec/sec_drv.h | 2 +-
drivers/crypto/hisilicon/sec2/sec.h | 35 +-
drivers/crypto/hisilicon/sec2/sec_crypto.c | 1170 +++++--
drivers/crypto/hisilicon/sec2/sec_crypto.h | 199 +-
drivers/crypto/hisilicon/sec2/sec_main.c | 415 ++-
drivers/crypto/hisilicon/sgl.c | 39 +-
drivers/crypto/hisilicon/trng/Makefile | 2 +
drivers/crypto/hisilicon/trng/trng.c | 341 +++
drivers/crypto/hisilicon/zip/zip.h | 50 +-
drivers/crypto/hisilicon/zip/zip_crypto.c | 710 +++--
drivers/crypto/hisilicon/zip/zip_main.c | 249 +-
drivers/crypto/img-hash.c | 3 +-
drivers/crypto/inside-secure/safexcel.h | 3 +-
.../crypto/inside-secure/safexcel_cipher.c | 3 +-
drivers/crypto/inside-secure/safexcel_hash.c | 3 +-
drivers/crypto/ixp4xx_crypto.c | 2 +-
drivers/crypto/marvell/cesa/hash.c | 3 +-
.../crypto/marvell/octeontx/otx_cptvf_algs.c | 3 +-
drivers/crypto/mediatek/mtk-sha.c | 3 +-
drivers/crypto/mxs-dcp.c | 3 +-
drivers/crypto/n2_core.c | 3 +-
drivers/crypto/nx/nx-sha256.c | 2 +-
drivers/crypto/nx/nx-sha512.c | 2 +-
drivers/crypto/nx/nx.c | 2 +-
drivers/crypto/omap-sham.c | 3 +-
drivers/crypto/padlock-sha.c | 3 +-
drivers/crypto/picoxcell_crypto.c | 3 +-
drivers/crypto/qat/qat_common/qat_algs.c | 3 +-
drivers/crypto/qce/common.c | 3 +-
drivers/crypto/qce/core.c | 1 -
drivers/crypto/qce/sha.h | 3 +-
drivers/crypto/rockchip/rk3288_crypto.h | 3 +-
drivers/crypto/s5p-sss.c | 3 +-
drivers/crypto/sa2ul.c | 3 +-
drivers/crypto/sa2ul.h | 2 +
drivers/crypto/sahara.c | 3 +-
drivers/crypto/stm32/stm32-hash.c | 3 +-
drivers/crypto/talitos.c | 3 +-
drivers/crypto/ux500/hash/hash_core.c | 3 +-
drivers/firmware/efi/embedded-firmware.c | 2 +-
drivers/misc/uacce/uacce.c | 26 +-
.../inline_crypto/ch_ipsec/chcr_ipsec.c | 3 +-
.../chelsio/inline_crypto/chtls/chtls.h | 3 +-
drivers/nfc/s3fwrn5/firmware.c | 2 +-
drivers/tee/tee_core.c | 2 +-
fs/crypto/fname.c | 2 +-
fs/crypto/hkdf.c | 2 +-
fs/ubifs/auth.c | 1 -
fs/verity/fsverity_private.h | 2 +-
include/crypto/ecc_curve.h | 60 +
include/crypto/ecdh.h | 3 +-
include/crypto/hash_info.h | 3 +-
include/crypto/sha1.h | 46 +
include/crypto/sha1_base.h | 2 +-
include/crypto/{sha.h => sha2.h} | 41 +-
include/crypto/sha256_base.h | 2 +-
include/crypto/sha512_base.h | 2 +-
include/linux/ccp.h | 3 +-
include/linux/filter.h | 2 +-
include/linux/oid_registry.h | 6 +-
include/linux/purgatory.h | 2 +-
include/uapi/misc/uacce/hisi_qm.h | 1 +
kernel/crash_core.c | 2 +-
kernel/kexec_core.c | 1 -
kernel/kexec_file.c | 2 +-
lib/crypto/sha256.c | 2 +-
lib/digsig.c | 2 +-
lib/sha1.c | 2 +-
net/bluetooth/ecdh_helper.c | 2 -
net/bluetooth/selftest.c | 2 +-
net/bluetooth/smp.c | 6 +-
net/ipv6/seg6_hmac.c | 1 -
net/mptcp/crypto.c | 2 +-
net/mptcp/options.c | 2 +-
net/mptcp/subflow.c | 2 +-
security/integrity/integrity.h | 2 +-
security/keys/encrypted-keys/encrypted.c | 2 +-
security/keys/trusted-keys/trusted_tpm1.c | 2 +-
sound/soc/codecs/cros_ec_codec.c | 2 +-
163 files changed, 7589 insertions(+), 2085 deletions(-)
create mode 100644 crypto/ecdsa.c
create mode 100644 crypto/ecdsasignature.asn1
delete mode 100644 drivers/char/hw_random/hisi-trng-v2.c
create mode 100644 drivers/crypto/hisilicon/trng/Makefile
create mode 100644 drivers/crypto/hisilicon/trng/trng.c
create mode 100644 include/crypto/ecc_curve.h
create mode 100644 include/crypto/sha1.h
rename include/crypto/{sha.h => sha2.h} (77%)
--
2.20.1
[PATCH openEuler-1.0-LTS] ARM: footbridge: remove personal server platform
by Yang Yingliang 21 Jul '21
21 Jul '21
From: Russell King <rmk+kernel(a)armlinux.org.uk>
mainline inclusion
from mainline-v5.13-rc1
commit 298a58e165e447ccfaae35fe9f651f9d7e15166f
category: bugfix
bugzilla: NA
CVE: CVE-2021-32078
--------------------------------
Remove the personal server platform, as that has had an array overrun
issue identified. It is believed that no one is using this code.
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Conflicts:
arch/arm/mach-footbridge/Kconfig
arch/arm/mach-footbridge/personal-pci.c
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/arm/configs/footbridge_defconfig | 1 -
arch/arm/mach-footbridge/Kconfig | 21 ---------
arch/arm/mach-footbridge/Makefile | 2 -
arch/arm/mach-footbridge/personal-pci.c | 58 -------------------------
arch/arm/mach-footbridge/personal.c | 25 -----------
5 files changed, 107 deletions(-)
delete mode 100644 arch/arm/mach-footbridge/personal-pci.c
delete mode 100644 arch/arm/mach-footbridge/personal.c
diff --git a/arch/arm/configs/footbridge_defconfig b/arch/arm/configs/footbridge_defconfig
index 3a7938f244e56..2aa3ebeb89d7f 100644
--- a/arch/arm/configs/footbridge_defconfig
+++ b/arch/arm/configs/footbridge_defconfig
@@ -7,7 +7,6 @@ CONFIG_EXPERT=y
CONFIG_MODULES=y
CONFIG_ARCH_FOOTBRIDGE=y
CONFIG_ARCH_CATS=y
-CONFIG_ARCH_PERSONAL_SERVER=y
CONFIG_ARCH_EBSA285_HOST=y
CONFIG_ARCH_NETWINDER=y
CONFIG_LEDS=y
diff --git a/arch/arm/mach-footbridge/Kconfig b/arch/arm/mach-footbridge/Kconfig
index cbbdd84cf49ad..84c400f96aa21 100644
--- a/arch/arm/mach-footbridge/Kconfig
+++ b/arch/arm/mach-footbridge/Kconfig
@@ -15,27 +15,6 @@ config ARCH_CATS
Saying N will reduce the size of the Footbridge kernel.
-config ARCH_PERSONAL_SERVER
- bool "Compaq Personal Server"
- select FOOTBRIDGE_HOST
- select ISA
- select ISA_DMA
- select PCI
- ---help---
- Say Y here if you intend to run this kernel on the Compaq
- Personal Server.
-
- Saying N will reduce the size of the Footbridge kernel.
-
- The Compaq Personal Server is not available for purchase.
- There are no product plans beyond the current research
- prototypes at this time. Information is available at:
-
- <http://www.crl.hpl.hp.com/projects/personalserver/>
-
- If you have any questions or comments about the Compaq Personal
- Server, send e-mail to <skiff(a)crl.dec.com>.
-
config ARCH_EBSA285_ADDIN
bool "EBSA285 (addin mode)"
select ARCH_EBSA285
diff --git a/arch/arm/mach-footbridge/Makefile b/arch/arm/mach-footbridge/Makefile
index a09f1041f1413..6262993c05558 100644
--- a/arch/arm/mach-footbridge/Makefile
+++ b/arch/arm/mach-footbridge/Makefile
@@ -11,12 +11,10 @@ pci-y += dc21285.o
pci-$(CONFIG_ARCH_CATS) += cats-pci.o
pci-$(CONFIG_ARCH_EBSA285_HOST) += ebsa285-pci.o
pci-$(CONFIG_ARCH_NETWINDER) += netwinder-pci.o
-pci-$(CONFIG_ARCH_PERSONAL_SERVER) += personal-pci.o
obj-$(CONFIG_ARCH_CATS) += cats-hw.o isa-timer.o
obj-$(CONFIG_ARCH_EBSA285) += ebsa285.o dc21285-timer.o
obj-$(CONFIG_ARCH_NETWINDER) += netwinder-hw.o isa-timer.o
-obj-$(CONFIG_ARCH_PERSONAL_SERVER) += personal.o dc21285-timer.o
obj-$(CONFIG_PCI) +=$(pci-y)
diff --git a/arch/arm/mach-footbridge/personal-pci.c b/arch/arm/mach-footbridge/personal-pci.c
deleted file mode 100644
index 4391e433a4b2f..0000000000000
--- a/arch/arm/mach-footbridge/personal-pci.c
+++ /dev/null
@@ -1,58 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * linux/arch/arm/mach-footbridge/personal-pci.c
- *
- * PCI bios-type initialisation for PCI machines
- *
- * Bits taken from various places.
- */
-#include <linux/kernel.h>
-#include <linux/pci.h>
-#include <linux/init.h>
-
-#include <asm/irq.h>
-#include <asm/mach/pci.h>
-#include <asm/mach-types.h>
-
-static int irqmap_personal_server[] __initdata = {
- IRQ_IN0, IRQ_IN1, IRQ_IN2, IRQ_IN3, 0, 0, 0,
- IRQ_DOORBELLHOST, IRQ_DMA1, IRQ_DMA2, IRQ_PCI
-};
-
-static int __init personal_server_map_irq(const struct pci_dev *dev, u8 slot,
- u8 pin)
-{
- unsigned char line;
-
- pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &line);
-
- if (line > 0x40 && line <= 0x5f) {
- /* line corresponds to the bit controlling this interrupt
- * in the footbridge. Ignore the first 8 interrupt bits,
- * look up the rest in the map. IN0 is bit number 8
- */
- return irqmap_personal_server[(line & 0x1f) - 8];
- } else if (line == 0) {
- /* no interrupt */
- return 0;
- } else
- return irqmap_personal_server[(line - 1) & 3];
-}
-
-static struct hw_pci personal_server_pci __initdata = {
- .map_irq = personal_server_map_irq,
- .nr_controllers = 1,
- .ops = &dc21285_ops,
- .setup = dc21285_setup,
- .preinit = dc21285_preinit,
- .postinit = dc21285_postinit,
-};
-
-static int __init personal_pci_init(void)
-{
- if (machine_is_personal_server())
- pci_common_init(&personal_server_pci);
- return 0;
-}
-
-subsys_initcall(personal_pci_init);
diff --git a/arch/arm/mach-footbridge/personal.c b/arch/arm/mach-footbridge/personal.c
deleted file mode 100644
index ca715754fc007..0000000000000
--- a/arch/arm/mach-footbridge/personal.c
+++ /dev/null
@@ -1,25 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * linux/arch/arm/mach-footbridge/personal.c
- *
- * Personal server (Skiff) machine fixup
- */
-#include <linux/init.h>
-#include <linux/spinlock.h>
-
-#include <asm/hardware/dec21285.h>
-#include <asm/mach-types.h>
-
-#include <asm/mach/arch.h>
-
-#include "common.h"
-
-MACHINE_START(PERSONAL_SERVER, "Compaq-PersonalServer")
- /* Maintainer: Jamey Hicks / George France */
- .atag_offset = 0x100,
- .map_io = footbridge_map_io,
- .init_irq = footbridge_init_irq,
- .init_time = footbridge_timer_init,
- .restart = footbridge_restart,
-MACHINE_END
-
--
2.25.1
21 Jul '21
From: Russell King <rmk+kernel(a)armlinux.org.uk>
mainline inclusion
from mainline-v5.13-rc1
commit 298a58e165e447ccfaae35fe9f651f9d7e15166f
category: bugfix
bugzilla: NA
CVE: CVE-2021-32078
--------------------------------
Remove the personal server platform, as that has had an array overrun
issue identified. It is believed that no one is using this code.
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Conflicts:
arch/arm/mach-footbridge/Kconfig
arch/arm/mach-footbridge/personal-pci.c
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/arm/configs/footbridge_defconfig | 1 -
arch/arm/mach-footbridge/Kconfig | 21 ---------
arch/arm/mach-footbridge/Makefile | 2 -
arch/arm/mach-footbridge/personal-pci.c | 57 -------------------------
arch/arm/mach-footbridge/personal.c | 25 -----------
5 files changed, 106 deletions(-)
delete mode 100644 arch/arm/mach-footbridge/personal-pci.c
delete mode 100644 arch/arm/mach-footbridge/personal.c
diff --git a/arch/arm/configs/footbridge_defconfig b/arch/arm/configs/footbridge_defconfig
index 3a7938f244e56..2aa3ebeb89d7f 100644
--- a/arch/arm/configs/footbridge_defconfig
+++ b/arch/arm/configs/footbridge_defconfig
@@ -7,7 +7,6 @@ CONFIG_EXPERT=y
CONFIG_MODULES=y
CONFIG_ARCH_FOOTBRIDGE=y
CONFIG_ARCH_CATS=y
-CONFIG_ARCH_PERSONAL_SERVER=y
CONFIG_ARCH_EBSA285_HOST=y
CONFIG_ARCH_NETWINDER=y
CONFIG_LEDS=y
diff --git a/arch/arm/mach-footbridge/Kconfig b/arch/arm/mach-footbridge/Kconfig
index cbbdd84cf49ad..84c400f96aa21 100644
--- a/arch/arm/mach-footbridge/Kconfig
+++ b/arch/arm/mach-footbridge/Kconfig
@@ -15,27 +15,6 @@ config ARCH_CATS
Saying N will reduce the size of the Footbridge kernel.
-config ARCH_PERSONAL_SERVER
- bool "Compaq Personal Server"
- select FOOTBRIDGE_HOST
- select ISA
- select ISA_DMA
- select PCI
- ---help---
- Say Y here if you intend to run this kernel on the Compaq
- Personal Server.
-
- Saying N will reduce the size of the Footbridge kernel.
-
- The Compaq Personal Server is not available for purchase.
- There are no product plans beyond the current research
- prototypes at this time. Information is available at:
-
- <http://www.crl.hpl.hp.com/projects/personalserver/>
-
- If you have any questions or comments about the Compaq Personal
- Server, send e-mail to <skiff(a)crl.dec.com>.
-
config ARCH_EBSA285_ADDIN
bool "EBSA285 (addin mode)"
select ARCH_EBSA285
diff --git a/arch/arm/mach-footbridge/Makefile b/arch/arm/mach-footbridge/Makefile
index a09f1041f1413..6262993c05558 100644
--- a/arch/arm/mach-footbridge/Makefile
+++ b/arch/arm/mach-footbridge/Makefile
@@ -11,12 +11,10 @@ pci-y += dc21285.o
pci-$(CONFIG_ARCH_CATS) += cats-pci.o
pci-$(CONFIG_ARCH_EBSA285_HOST) += ebsa285-pci.o
pci-$(CONFIG_ARCH_NETWINDER) += netwinder-pci.o
-pci-$(CONFIG_ARCH_PERSONAL_SERVER) += personal-pci.o
obj-$(CONFIG_ARCH_CATS) += cats-hw.o isa-timer.o
obj-$(CONFIG_ARCH_EBSA285) += ebsa285.o dc21285-timer.o
obj-$(CONFIG_ARCH_NETWINDER) += netwinder-hw.o isa-timer.o
-obj-$(CONFIG_ARCH_PERSONAL_SERVER) += personal.o dc21285-timer.o
obj-$(CONFIG_PCI) +=$(pci-y)
diff --git a/arch/arm/mach-footbridge/personal-pci.c b/arch/arm/mach-footbridge/personal-pci.c
deleted file mode 100644
index 9d19aa98a663e..0000000000000
--- a/arch/arm/mach-footbridge/personal-pci.c
+++ /dev/null
@@ -1,57 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * linux/arch/arm/mach-footbridge/personal-pci.c
- *
- * PCI bios-type initialisation for PCI machines
- *
- * Bits taken from various places.
- */
-#include <linux/kernel.h>
-#include <linux/pci.h>
-#include <linux/init.h>
-
-#include <asm/irq.h>
-#include <asm/mach/pci.h>
-#include <asm/mach-types.h>
-
-static int irqmap_personal_server[] = {
- IRQ_IN0, IRQ_IN1, IRQ_IN2, IRQ_IN3, 0, 0, 0,
- IRQ_DOORBELLHOST, IRQ_DMA1, IRQ_DMA2, IRQ_PCI
-};
-
-static int personal_server_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
-{
- unsigned char line;
-
- pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &line);
-
- if (line > 0x40 && line <= 0x5f) {
- /* line corresponds to the bit controlling this interrupt
- * in the footbridge. Ignore the first 8 interrupt bits,
- * look up the rest in the map. IN0 is bit number 8
- */
- return irqmap_personal_server[(line & 0x1f) - 8];
- } else if (line == 0) {
- /* no interrupt */
- return 0;
- } else
- return irqmap_personal_server[(line - 1) & 3];
-}
-
-static struct hw_pci personal_server_pci __initdata = {
- .map_irq = personal_server_map_irq,
- .nr_controllers = 1,
- .ops = &dc21285_ops,
- .setup = dc21285_setup,
- .preinit = dc21285_preinit,
- .postinit = dc21285_postinit,
-};
-
-static int __init personal_pci_init(void)
-{
- if (machine_is_personal_server())
- pci_common_init(&personal_server_pci);
- return 0;
-}
-
-subsys_initcall(personal_pci_init);
diff --git a/arch/arm/mach-footbridge/personal.c b/arch/arm/mach-footbridge/personal.c
deleted file mode 100644
index ca715754fc007..0000000000000
--- a/arch/arm/mach-footbridge/personal.c
+++ /dev/null
@@ -1,25 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * linux/arch/arm/mach-footbridge/personal.c
- *
- * Personal server (Skiff) machine fixup
- */
-#include <linux/init.h>
-#include <linux/spinlock.h>
-
-#include <asm/hardware/dec21285.h>
-#include <asm/mach-types.h>
-
-#include <asm/mach/arch.h>
-
-#include "common.h"
-
-MACHINE_START(PERSONAL_SERVER, "Compaq-PersonalServer")
- /* Maintainer: Jamey Hicks / George France */
- .atag_offset = 0x100,
- .map_io = footbridge_map_io,
- .init_irq = footbridge_init_irq,
- .init_time = footbridge_timer_init,
- .restart = footbridge_restart,
-MACHINE_END
-
--
2.25.1
[PATCH kernel-4.19] mm: slab: fix kmem_cache_create failed when sysfs node not destroyed
by Yang Yingliang 21 Jul '21
21 Jul '21
From: Nanyong Sun <sunnanyong(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 174641
CVE: NA
------------------------------------
Commit d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root
kmem_cache destroy") introduced a problem: if one thread destroys a
kmem_cache A while another thread concurrently creates a kmem_cache B
that is mergeable with A and has the same size as A, creation of B may
fail because of a duplicate sysfs node.
The scenario in detail:
1) Thread 1 uses kmem_cache_destroy() to destroy kmem_cache A, which is
mergeable. It decreases A's refcount and, once the refcount reaches 0,
calls memcg_set_kmem_cache_dying(), which sets
A->memcg_params.dying = true; it then unlocks the slab_mutex and calls
flush_memcg_workqueue(), which may take a while.
Note: at this point the sysfs node of A (like '/kernel/slab/:0000248')
is still present; it will be deleted in shutdown_cache(), which is
called after flush_memcg_workqueue() finishes and the slab_mutex is
taken again.
2) Now if thread 2 is coming, it use kmem_cache_create() to create B, which
is mergeable with A(their size is same), it gain the lock of slab_mutex,
then call __kmem_cache_alias() trying to find a mergeable node, because
of the below added code in commit d38a2b7a9c93 ("mm: memcg/slab: fix
memory leak at non-root kmem_cache destroy"), B is not mergeable with
A whose memcg_params.dying is true.
int slab_unmergeable(struct kmem_cache *s)
if (s->refcount < 0)
return 1;
/*
* Skip the dying kmem_cache.
*/
if (s->memcg_params.dying)
return 1;
return 0;
}
So B has to create its own sysfs node by calling:
create_cache->
__kmem_cache_create->
sysfs_slab_add->
kobject_init_and_add
Because B is itself mergeable, the filename of its sysfs node is derived
from its size, like '/kernel/slab/:0000248', which duplicates A's name, and
A's sysfs node is still present at this point, so kobject_init_and_add()
fails and kmem_cache_create() fails as a result.
Concurrent modprobe and rmmod of the two modules below can reproduce the
issue quickly: nf_conntrack_expect, se_sess_cache. See the call trace at
the end.
The LTS versions v4.19.y and v5.4.y have this problem, whereas kernels
after v5.9 do not, because the patchset ("The new cgroup slab memory
controller") largely reworked the memcg slab code.
A potential solution (the one this patch takes): simply let the dying
kmem_cache stay mergeable; the slab_mutex lock prevents the race between
the thread creating the alias kmem_cache and the thread destroying the
root kmem_cache. In the destroying thread, after flush_memcg_workqueue()
is done, check the refcount again; if someone took a reference again while
the lock was dropped, we do not need to destroy the kmem_cache completely
and can reuse it.
Another potential solution: revert commit d38a2b7a9c93 ("mm: memcg/slab:
fix memory leak at non-root kmem_cache destroy"); compared to making
kmem_cache_create() fail, the memory leak in that special scenario seems
less harmful.
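As an editorial illustration (not part of the patch), the chosen destroy-side pattern can be sketched in plain C, with a pthread mutex standing in for slab_mutex and all names invented; it only shows the "drop the lock for the slow flush, then re-check the refcount" idea:

#include <pthread.h>
#include <stdbool.h>

struct cache { int refcount; bool dying; };
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Destroy path: the slow flush runs without the lock, so a concurrent
 * create may alias (re-reference) the dying cache in the meantime. */
static bool destroy_cache(struct cache *s)
{
	pthread_mutex_lock(&big_lock);
	if (--s->refcount > 0) {		/* still referenced: nothing to do */
		pthread_mutex_unlock(&big_lock);
		return false;
	}
	s->dying = true;
	pthread_mutex_unlock(&big_lock);
	/* ... flush work runs here, possibly for a while ... */
	pthread_mutex_lock(&big_lock);
	if (s->refcount > 0) {			/* re-referenced in the window: reuse it */
		s->dying = false;
		pthread_mutex_unlock(&big_lock);
		return false;
	}
	/* ... really shut the cache down and remove its sysfs node ... */
	pthread_mutex_unlock(&big_lock);
	return true;
}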
Call trace:
sysfs: cannot create duplicate filename '/kernel/slab/:0000248'
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
dump_backtrace+0x0/0x198
show_stack+0x24/0x30
dump_stack+0xb0/0x100
sysfs_warn_dup+0x6c/0x88
sysfs_create_dir_ns+0x104/0x120
kobject_add_internal+0xd0/0x378
kobject_init_and_add+0x90/0xd8
sysfs_slab_add+0x16c/0x2d0
__kmem_cache_create+0x16c/0x1d8
create_cache+0xbc/0x1f8
kmem_cache_create_usercopy+0x1a0/0x230
kmem_cache_create+0x50/0x68
init_se_kmem_caches+0x38/0x258 [target_core_mod]
target_core_init_configfs+0x8c/0x390 [target_core_mod]
do_one_initcall+0x54/0x230
do_init_module+0x64/0x1ec
load_module+0x150c/0x16f0
__se_sys_finit_module+0xf0/0x108
__arm64_sys_finit_module+0x24/0x30
el0_svc_common+0x80/0x1c0
el0_svc_handler+0x78/0xe0
el0_svc+0x10/0x260
kobject_add_internal failed for :0000248 with -EEXIST, don't try to register things with the same name in the same directory.
kmem_cache_create(se_sess_cache) failed with error -17
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
dump_backtrace+0x0/0x198
show_stack+0x24/0x30
dump_stack+0xb0/0x100
kmem_cache_create_usercopy+0xa8/0x230
kmem_cache_create+0x50/0x68
init_se_kmem_caches+0x38/0x258 [target_core_mod]
target_core_init_configfs+0x8c/0x390 [target_core_mod]
do_one_initcall+0x54/0x230
do_init_module+0x64/0x1ec
load_module+0x150c/0x16f0
__se_sys_finit_module+0xf0/0x108
__arm64_sys_finit_module+0x24/0x30
el0_svc_common+0x80/0x1c0
el0_svc_handler+0x78/0xe0
el0_svc+0x10/0x260
Fixes: d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root kmem_cache destroy")
Signed-off-by: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: tong tiangen <tongtiangen(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
mm/slab_common.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index d208b47e01a8e..acc743315bb5c 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -326,14 +326,6 @@ int slab_unmergeable(struct kmem_cache *s)
if (s->refcount < 0)
return 1;
-#ifdef CONFIG_MEMCG_KMEM
- /*
- * Skip the dying kmem_cache.
- */
- if (s->memcg_params.dying)
- return 1;
-#endif
-
return 0;
}
@@ -947,6 +939,16 @@ void kmem_cache_destroy(struct kmem_cache *s)
get_online_mems();
mutex_lock(&slab_mutex);
+
+ /*
+ *Another thread referenced it again
+ */
+ if (READ_ONCE(s->refcount)) {
+ spin_lock_irq(&memcg_kmem_wq_lock);
+ s->memcg_params.dying = false;
+ spin_unlock_irq(&memcg_kmem_wq_lock);
+ goto out_unlock;
+ }
#endif
err = shutdown_memcg_caches(s);
--
2.25.1
[PATCH openEuler-1.0-LTS] ARM: ensure the signal page contains defined contents
by Yang Yingliang 21 Jul '21
From: Russell King <rmk+kernel(a)armlinux.org.uk>
stable inclusion
from linux-4.19.177
commit 80ef523d2cb719c3de66787e922a96b5099d2fbb
CVE: CVE-2021-21781
--------------------------------
[ Upstream commit 9c698bff66ab4914bb3d71da7dc6112519bde23e ]
Ensure that the signal page contains our poison instruction to increase
the protection against ROP attacks and also contains well defined
contents.
Acked-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/arm/kernel/signal.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index 76bb8de6bf6b6..dfe24883cc93a 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -697,18 +697,20 @@ struct page *get_signal_page(void)
addr = page_address(page);
+ /* Poison the entire page */
+ memset32(addr, __opcode_to_mem_arm(0xe7fddef1),
+ PAGE_SIZE / sizeof(u32));
+
/* Give the signal return code some randomness */
offset = 0x200 + (get_random_int() & 0x7fc);
signal_return_offset = offset;
- /*
- * Copy signal return handlers into the vector page, and
- * set sigreturn to be a pointer to these.
- */
+ /* Copy signal return handlers into the page */
memcpy(addr + offset, sigreturn_codes, sizeof(sigreturn_codes));
- ptr = (unsigned long)addr + offset;
- flush_icache_range(ptr, ptr + sizeof(sigreturn_codes));
+ /* Flush out all instructions in this page */
+ ptr = (unsigned long)addr;
+ flush_icache_range(ptr, ptr + PAGE_SIZE);
return page;
}
--
2.25.1
[PATCH kernel-4.19 01/11] nvme-pci: Fix controller freeze wait disabling
by Yang Yingliang 21 Jul '21
From: Keith Busch <keith.busch(a)intel.com>
mainline inclusion
from mainline-5.2-rc2
commit e43269e6e5c49d7fec599e6bba71963935b0e4ba
category: bugfix
bugzilla: 167363
CVE: NA
---------------------------
If a controller disabling didn't start a freeze, don't wait for the
operation to complete.
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Keith Busch <keith.busch(a)intel.com>
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/nvme/host/pci.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d295594da64d7..78ce02e476aad 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2185,7 +2185,7 @@ static void nvme_pci_disable(struct nvme_dev *dev)
static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
{
- bool dead = true;
+ bool dead = true, freeze = false;
struct pci_dev *pdev = to_pci_dev(dev->dev);
mutex_lock(&dev->shutdown_lock);
@@ -2193,8 +2193,10 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
u32 csts = readl(dev->bar + NVME_REG_CSTS);
if (dev->ctrl.state == NVME_CTRL_LIVE ||
- dev->ctrl.state == NVME_CTRL_RESETTING)
+ dev->ctrl.state == NVME_CTRL_RESETTING) {
+ freeze = true;
nvme_start_freeze(&dev->ctrl);
+ }
dead = !!((csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
pdev->error_state != pci_channel_io_normal);
}
@@ -2203,10 +2205,8 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
* Give the controller a chance to complete all entered requests if
* doing a safe shutdown.
*/
- if (!dead) {
- if (shutdown)
- nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
- }
+ if (!dead && shutdown && freeze)
+ nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
nvme_stop_queues(&dev->ctrl);
--
2.25.1
[PATCH kernel-4.19] block: error out if blk_get_queue() failed in blk_init_rl()
by Yang Yingliang 20 Jul '21
From: Yu Kuai <yukuai3(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 174635
CVE: NA
-----------------------------------------------
If blkg creation races with device removal, the queue reference count
might become unbalanced:
t1 t2 t3
cgroup_file_write
blkg_conf_prep
q = disk->queue
dm_ctl_ioctl
dev_remove
blk_cleanup_queue
queue_flag_set(QUEUE_FLAG_DYING, q)
blkg_alloc
blk_init_rl
blk_get_queue -> failed
blk_exit_queue
blkcg_exit_queue
blkg_destroy_all
blkg_destroy
call_rcu(&blkg->rcu_head, __blkg_release_rcu);
__blkg_release_rcu
blkg_free
blk_exit_rl
blk_put_queue -> extra put
Thus, error out if blk_get_queue() fails in blk_init_rl(), since there
is no need to create a blkg while the queue is dying.
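To make the imbalance concrete, here is a minimal editorial C sketch (all names invented, not the kernel code): if the setup path silently ignores a failed get, the teardown path's unconditional put drops a reference that was never taken.

#include <stdbool.h>

struct queue { int refs; bool dying; };

static bool queue_get(struct queue *q)
{
	if (q->dying)
		return false;		/* dying queue: refuse new references */
	q->refs++;
	return true;
}

static int rl_init(struct queue *q, bool is_root)
{
	if (!is_root && !queue_get(q))
		return -1;		/* error out instead of continuing without a reference */
	return 0;
}

static void rl_exit(struct queue *q, bool is_root)
{
	if (!is_root)
		q->refs--;		/* balanced only if rl_init() really took the reference */
}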
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
block/blk-core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 93970be37c17c..4b63336465306 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -888,8 +888,11 @@ int blk_init_rl(struct request_list *rl, struct request_queue *q,
if (!rl->rq_pool)
return -ENOMEM;
- if (rl != &q->root_rl)
- WARN_ON_ONCE(!blk_get_queue(q));
+ if (rl != &q->root_rl && !blk_get_queue(q)) {
+ mempool_destroy(rl->rq_pool);
+ rl->rq_pool = NULL;
+ return -ENODEV;
+ }
return 0;
}
--
2.25.1
[PATCH openEuler-1.0-LTS 1/2] Revert "vt: Fix character height handling with VT_RESIZEX"
by Yang Yingliang 20 Jul '21
hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA
--------------------------------
This reverts commit cb173fc207d51b6781dc059c84b3f3a4fa309b56
to avoid kabi broken.
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/tty/vt/vt_ioctl.c | 6 ++---
drivers/video/console/vgacon.c | 44 +++++++++++++++++-----------------
include/linux/console_struct.h | 1 -
3 files changed, 25 insertions(+), 26 deletions(-)
diff --git a/drivers/tty/vt/vt_ioctl.c b/drivers/tty/vt/vt_ioctl.c
index ce6c7dd7bc126..2e959563af534 100644
--- a/drivers/tty/vt/vt_ioctl.c
+++ b/drivers/tty/vt/vt_ioctl.c
@@ -895,17 +895,17 @@ int vt_ioctl(struct tty_struct *tty,
if (vcp) {
int ret;
int save_scan_lines = vcp->vc_scan_lines;
- int save_cell_height = vcp->vc_cell_height;
+ int save_font_height = vcp->vc_font.height;
if (v.v_vlin)
vcp->vc_scan_lines = v.v_vlin;
if (v.v_clin)
- vcp->vc_cell_height = v.v_clin;
+ vcp->vc_font.height = v.v_clin;
vcp->vc_resize_user = 1;
ret = vc_resize(vcp, v.v_cols, v.v_rows);
if (ret) {
vcp->vc_scan_lines = save_scan_lines;
- vcp->vc_cell_height = save_cell_height;
+ vcp->vc_font.height = save_font_height;
console_unlock();
return ret;
}
diff --git a/drivers/video/console/vgacon.c b/drivers/video/console/vgacon.c
index a992d922b3a71..4509d0547087a 100644
--- a/drivers/video/console/vgacon.c
+++ b/drivers/video/console/vgacon.c
@@ -384,7 +384,7 @@ static void vgacon_init(struct vc_data *c, int init)
vc_resize(c, vga_video_num_columns, vga_video_num_lines);
c->vc_scan_lines = vga_scan_lines;
- c->vc_font.height = c->vc_cell_height = vga_video_font_height;
+ c->vc_font.height = vga_video_font_height;
c->vc_complement_mask = 0x7700;
if (vga_512_chars)
c->vc_hi_font_mask = 0x0800;
@@ -517,32 +517,32 @@ static void vgacon_cursor(struct vc_data *c, int mode)
switch (c->vc_cursor_type & 0x0f) {
case CUR_UNDERLINE:
vgacon_set_cursor_size(c->vc_x,
- c->vc_cell_height -
- (c->vc_cell_height <
+ c->vc_font.height -
+ (c->vc_font.height <
10 ? 2 : 3),
- c->vc_cell_height -
- (c->vc_cell_height <
+ c->vc_font.height -
+ (c->vc_font.height <
10 ? 1 : 2));
break;
case CUR_TWO_THIRDS:
vgacon_set_cursor_size(c->vc_x,
- c->vc_cell_height / 3,
- c->vc_cell_height -
- (c->vc_cell_height <
+ c->vc_font.height / 3,
+ c->vc_font.height -
+ (c->vc_font.height <
10 ? 1 : 2));
break;
case CUR_LOWER_THIRD:
vgacon_set_cursor_size(c->vc_x,
- (c->vc_cell_height * 2) / 3,
- c->vc_cell_height -
- (c->vc_cell_height <
+ (c->vc_font.height * 2) / 3,
+ c->vc_font.height -
+ (c->vc_font.height <
10 ? 1 : 2));
break;
case CUR_LOWER_HALF:
vgacon_set_cursor_size(c->vc_x,
- c->vc_cell_height / 2,
- c->vc_cell_height -
- (c->vc_cell_height <
+ c->vc_font.height / 2,
+ c->vc_font.height -
+ (c->vc_font.height <
10 ? 1 : 2));
break;
case CUR_NONE:
@@ -553,7 +553,7 @@ static void vgacon_cursor(struct vc_data *c, int mode)
break;
default:
vgacon_set_cursor_size(c->vc_x, 1,
- c->vc_cell_height);
+ c->vc_font.height);
break;
}
break;
@@ -564,13 +564,13 @@ static int vgacon_doresize(struct vc_data *c,
unsigned int width, unsigned int height)
{
unsigned long flags;
- unsigned int scanlines = height * c->vc_cell_height;
+ unsigned int scanlines = height * c->vc_font.height;
u8 scanlines_lo = 0, r7 = 0, vsync_end = 0, mode, max_scan;
raw_spin_lock_irqsave(&vga_lock, flags);
vgacon_xres = width * VGA_FONTWIDTH;
- vgacon_yres = height * c->vc_cell_height;
+ vgacon_yres = height * c->vc_font.height;
if (vga_video_type >= VIDEO_TYPE_VGAC) {
outb_p(VGA_CRTC_MAX_SCAN, vga_video_port_reg);
max_scan = inb_p(vga_video_port_val);
@@ -625,9 +625,9 @@ static int vgacon_doresize(struct vc_data *c,
static int vgacon_switch(struct vc_data *c)
{
int x = c->vc_cols * VGA_FONTWIDTH;
- int y = c->vc_rows * c->vc_cell_height;
+ int y = c->vc_rows * c->vc_font.height;
int rows = screen_info.orig_video_lines * vga_default_font_height/
- c->vc_cell_height;
+ c->vc_font.height;
/*
* We need to save screen size here as it's the only way
* we can spot the screen has been resized and we need to
@@ -1058,7 +1058,7 @@ static int vgacon_adjust_height(struct vc_data *vc, unsigned fontheight)
cursor_size_lastto = 0;
c->vc_sw->con_cursor(c, CM_DRAW);
}
- c->vc_font.height = c->vc_cell_height = fontheight;
+ c->vc_font.height = fontheight;
vc_resize(c, 0, rows); /* Adjust console size */
}
}
@@ -1113,12 +1113,12 @@ static int vgacon_resize(struct vc_data *c, unsigned int width,
*/
screen_info.orig_video_cols = width;
screen_info.orig_video_lines = height;
- vga_default_font_height = c->vc_cell_height;
+ vga_default_font_height = c->vc_font.height;
return 0;
}
if (width % 2 || width > screen_info.orig_video_cols ||
height > (screen_info.orig_video_lines * vga_default_font_height)/
- c->vc_cell_height)
+ c->vc_font.height)
return -EINVAL;
if (con_is_visible(c) && !vga_is_gfx) /* who knows */
diff --git a/include/linux/console_struct.h b/include/linux/console_struct.h
index 8b5bc3a47bf5f..fea64f2692a05 100644
--- a/include/linux/console_struct.h
+++ b/include/linux/console_struct.h
@@ -62,7 +62,6 @@ struct vc_data {
unsigned int vc_rows;
unsigned int vc_size_row; /* Bytes per row */
unsigned int vc_scan_lines; /* # of scan lines */
- unsigned int vc_cell_height; /* CRTC character cell height */
unsigned long vc_origin; /* [!] Start of real screen */
unsigned long vc_scr_end; /* [!] End of real screen */
unsigned long vc_visible_origin; /* [!] Top of visible window */
--
2.25.1
[PATCH kernel-4.19] block: only call sched requeue_request() for scheduled requests
by Yang Yingliang 20 Jul '21
From: Omar Sandoval <osandov(a)fb.com>
mainline inclusion
from mainline-5.9-rc5
commit e8a8a185051a460e3eb0617dca33f996f4e31516
category: bugfix
bugzilla: 172928
CVE: NA
---------------------------
Yang Yang reported the following crash caused by requeueing a flush
request in Kyber:
[ 2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
...
[ 2.517468] pc : clear_bit+0x18/0x2c
[ 2.517502] lr : sbitmap_queue_clear+0x40/0x228
[ 2.517503] sp : ffffff800832bc60 pstate : 00c00145
...
[ 2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
[ 2.517602] Call trace:
[ 2.517606] clear_bit+0x18/0x2c
[ 2.517619] kyber_finish_request+0x74/0x80
[ 2.517627] blk_mq_requeue_request+0x3c/0xc0
[ 2.517637] __scsi_queue_insert+0x11c/0x148
[ 2.517640] scsi_softirq_done+0x114/0x130
[ 2.517643] blk_done_softirq+0x7c/0xb0
[ 2.517651] __do_softirq+0x208/0x3bc
[ 2.517657] run_ksoftirqd+0x34/0x60
[ 2.517663] smpboot_thread_fn+0x1c4/0x2c0
[ 2.517667] kthread+0x110/0x120
[ 2.517669] ret_from_fork+0x10/0x18
This happens because Kyber doesn't track flush requests, so
kyber_finish_request() reads a garbage domain token. Only call the
scheduler's requeue_request() hook if RQF_ELVPRIV is set (like we do for
the finish_request() hook in blk_mq_free_request()). Now that we're
handling it in blk-mq, also remove the check from BFQ.
Reported-by: Yang Yang <yang.yang(a)vivo.com>
Signed-off-by: Omar Sandoval <osandov(a)fb.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Conflicts:
block/blk-mq-sched.h
[ Cleanup patch 67cae4c948("blk-mq: cleanup and improve list
insertion") is not applied. ]
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
block/bfq-iosched.c | 12 ------------
block/blk-mq-sched.h | 2 +-
2 files changed, 1 insertion(+), 13 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index a214c95f071ad..c15e2c112ea8a 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4867,18 +4867,6 @@ static void bfq_finish_requeue_request(struct request *rq)
struct bfq_queue *bfqq = RQ_BFQQ(rq);
struct bfq_data *bfqd;
- /*
- * Requeue and finish hooks are invoked in blk-mq without
- * checking whether the involved request is actually still
- * referenced in the scheduler. To handle this fact, the
- * following two checks make this function exit in case of
- * spurious invocations, for which there is nothing to do.
- *
- * First, check whether rq has nothing to do with an elevator.
- */
- if (unlikely(!(rq->rq_flags & RQF_ELVPRIV)))
- return;
-
/*
* rq either is not associated with any icq, or is an already
* requeued request that has not (yet) been re-inserted into
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index fe660764b8d13..011445f7852bf 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -72,7 +72,7 @@ static inline void blk_mq_sched_requeue_request(struct request *rq)
struct request_queue *q = rq->q;
struct elevator_queue *e = q->elevator;
- if (e && e->type->ops.mq.requeue_request)
+ if ((rq->rq_flags & RQF_ELVPRIV) && e && e->type->ops.mq.requeue_request)
e->type->ops.mq.requeue_request(rq);
}
--
2.25.1
[PATCH openEuler-1.0-LTS] KVM: arm/arm64: Drop the WARN_ON() in probe_hisi_cpu_type()
by Zenghui Yu 19 Jul '21
virt inclusion
category: bugfix
bugzilla: NA
CVE: NA
-----------------------------------------------
I used WARN_ON(hi_cpu_type == UNKNOWN_HI_TYPE) to catch some probing issues
about 2 years ago, when HiSilicon servers were the only arm64 boxes
available to me. However, it's obvious that this can be hit if we're playing
with other kinds of boards (or with L1 guests in NV), which ends up as a
boot failure if panic_on_warn is set.
Given that hi_cpu_type is only used in the probing of (HiSilicon specific)
ncsnp (and there's already a hint for it in probe_hisi_ncsnp_support()),
let's drop the unnecessary WARN_ON() and use a kvm_debug() instead.
Reported-by: zou_wei(a)huawei.com
Signed-off-by: Zenghui Yu <yuzenghui(a)huawei.com>
---
virt/kvm/arm/hisi_cpu_model.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/arm/hisi_cpu_model.c b/virt/kvm/arm/hisi_cpu_model.c
index b0f7e0d7ef1c..6fd91b538100 100644
--- a/virt/kvm/arm/hisi_cpu_model.c
+++ b/virt/kvm/arm/hisi_cpu_model.c
@@ -78,7 +78,8 @@ void probe_hisi_cpu_type(void)
else
of_get_hw_cpu_type();
- WARN_ON(hi_cpu_type == UNKNOWN_HI_TYPE);
+ /* UNKNOWN_HI_TYPE is not fatal */
+ kvm_debug("HiSilicon CPU type [%u] detected\n", hi_cpu_type);
}
#define NCSNP_MMIO_BASE 0x20107E238
--
2.19.1
[PATCH openEuler-1.0-LTS 01/60] efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
by Yang Yingliang 19 Jul '21
From: Heiner Kallweit <hkallweit1(a)gmail.com>
stable inclusion
from linux-4.19.194
commit 5fdb418b14e41b33880b949db64b18f1f448916b
--------------------------------
[ Upstream commit 45add3cc99feaaf57d4b6f01d52d532c16a1caee ]
UEFI spec 2.9, p.108, table 4-1 lists the scenario that both attributes
are cleared with the description "No memory access protection is
possible for Entry". So we can have valid entries where both attributes
are cleared, so remove the check.
Signed-off-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Fixes: 10f0d2f577053 ("efi: Implement generic support for the Memory Attributes table")
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/firmware/efi/memattr.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c
index aac972b056d91..e0889922cc6d7 100644
--- a/drivers/firmware/efi/memattr.c
+++ b/drivers/firmware/efi/memattr.c
@@ -69,11 +69,6 @@ static bool entry_is_valid(const efi_memory_desc_t *in, efi_memory_desc_t *out)
return false;
}
- if (!(in->attribute & (EFI_MEMORY_RO | EFI_MEMORY_XP))) {
- pr_warn("Entry attributes invalid: RO and XP bits both cleared\n");
- return false;
- }
-
if (PAGE_SIZE > EFI_PAGE_SIZE &&
(!PAGE_ALIGNED(in->phys_addr) ||
!PAGE_ALIGNED(in->num_pages << EFI_PAGE_SHIFT))) {
--
2.25.1
【Meeting Notice】openEuler kernel technical sharing session #8 & biweekly regular meeting, Time: 2021-07-23 14:00-18:00
by Meeting Book 19 Jul '21
[PATCH kernel-4.19] lib/clear_user: ensure loop in __arch_clear_user cache-aligned
by Yang Yingliang 19 Jul '21
From: Cheng Jian <cj.chengjian(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I3OX0C
CVE: #682
--------------------------------
We must ensure that the following four instructions are cache-aligned
(.align 6 puts the loop on a 64-byte boundary, which is a typical arm64
cache-line size). Otherwise, the loop may straddle a cache line, which
hurts the performance of the libMicro pread benchmark.
1:
# uao_user_alternative 9f, str, sttr, xzr, x0, 8
str xzr, [x0], #8
nop
subs x1, x1, #8
b.pl 1b
with this patch:
prc thr usecs/call samples errors cnt/samp size
pread_z100 1 1 5.88400 807 0 1 102400
The pread result can range from 5 to 9 usecs/call depending on how this
function happens to be aligned.
Signed-off-by: Cheng Jian <cj.chengjian(a)huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/arm64/lib/clear_user.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/lib/clear_user.S b/arch/arm64/lib/clear_user.S
index 4374020c824a0..9ebc5d84e6154 100644
--- a/arch/arm64/lib/clear_user.S
+++ b/arch/arm64/lib/clear_user.S
@@ -20,7 +20,7 @@
#include <asm/asm-uaccess.h>
.text
-
+ .align 6
/* Prototype: int __arch_clear_user(void *addr, size_t sz)
* Purpose : clear some user memory
* Params : addr - user memory address to clear
--
2.25.1
[PATCH openEuler-1.0-LTS 1/3] ext4: fix WARN_ON_ONCE(!buffer_uptodate) after an error writing the superblock
by Yang Yingliang 19 Jul '21
From: Ye Bin <yebin10(a)huawei.com>
mainline inclusion
from mainline-v5.14-rc1
commit 558d6450c7755aa005d89021204b6cdcae5e848f
category: bugfix
bugzilla: 109280
CVE: NA
-----------------------------------------------
If a writeback of the superblock fails with an I/O error, the buffer
is marked not uptodate. However, this can cause a WARN_ON to trigger
when we attempt to write the superblock a second time. (Which might
succeed this time, for certain types of block devices such as iSCSI
devices over a flaky network.)
Try to detect this case in flush_stashed_error_work(), and also change
__ext4_handle_dirty_metadata() so we always set the uptodate flag, not
just in the nojournal case.
Before this commit, this problem could be replicated via:
1. dmsetup create dust1 --table '0 2097152 dust /dev/sdc 0 4096'
2. mount /dev/mapper/dust1 /home/test
3. dmsetup message dust1 0 addbadblock 0 10
4. cd /home/test
5. echo "XXXXXXX" > t
After a few seconds, we got following warning:
[ 80.654487] end_buffer_async_write: bh=0xffff88842f18bdd0
[ 80.656134] Buffer I/O error on dev dm-0, logical block 0, lost async page write
[ 85.774450] EXT4-fs error (device dm-0): ext4_check_bdev_write_error:193: comm kworker/u16:8: Error while async write back metadata
[ 91.415513] mark_buffer_dirty: bh=0xffff88842f18bdd0
[ 91.417038] ------------[ cut here ]------------
[ 91.418450] WARNING: CPU: 1 PID: 1944 at fs/buffer.c:1092 mark_buffer_dirty.cold+0x1c/0x5e
[ 91.440322] Call Trace:
[ 91.440652] __jbd2_journal_temp_unlink_buffer+0x135/0x220
[ 91.441354] __jbd2_journal_unfile_buffer+0x24/0x90
[ 91.441981] __jbd2_journal_refile_buffer+0x134/0x1d0
[ 91.442628] jbd2_journal_commit_transaction+0x249a/0x3240
[ 91.443336] ? put_prev_entity+0x2a/0x200
[ 91.443856] ? kjournald2+0x12e/0x510
[ 91.444324] kjournald2+0x12e/0x510
[ 91.444773] ? woken_wake_function+0x30/0x30
[ 91.445326] kthread+0x150/0x1b0
[ 91.445739] ? commit_timeout+0x20/0x20
[ 91.446258] ? kthread_flush_worker+0xb0/0xb0
[ 91.446818] ret_from_fork+0x1f/0x30
[ 91.447293] ---[ end trace 66f0b6bf3d1abade ]---
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Link: https://lore.kernel.org/r/20210615090537.3423231-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/ext4/ext4_jbd2.c | 2 +-
fs/ext4/super.c | 12 ++++++++++--
2 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index fd7c41da1f8f9..022ab8afd9499 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -304,6 +304,7 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line,
set_buffer_meta(bh);
set_buffer_prio(bh);
+ set_buffer_uptodate(bh);
if (ext4_handle_valid(handle)) {
err = jbd2_journal_dirty_metadata(handle, bh);
/* Errors can only happen due to aborted journal or a nasty bug */
@@ -332,7 +333,6 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line,
err);
}
} else {
- set_buffer_uptodate(bh);
if (inode)
mark_buffer_dirty_inode(bh, inode);
else
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index fd67f3037fb8e..dd96627ec6ae4 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -695,15 +695,23 @@ static void flush_stashed_error_work(struct work_struct *work)
* ext4 error handling code during handling of previous errors.
*/
if (!sb_rdonly(sbi->s_sb) && journal) {
+ struct buffer_head *sbh = sbi->s_sbh;
handle = jbd2_journal_start(journal, 1);
if (IS_ERR(handle))
goto write_directly;
- if (jbd2_journal_get_write_access(handle, sbi->s_sbh)) {
+ if (jbd2_journal_get_write_access(handle, sbh)) {
jbd2_journal_stop(handle);
goto write_directly;
}
ext4_update_super(sbi->s_sb);
- if (jbd2_journal_dirty_metadata(handle, sbi->s_sbh)) {
+ if (buffer_write_io_error(sbh) || !buffer_uptodate(sbh)) {
+ ext4_msg(sbi->s_sb, KERN_ERR, "previous I/O error to "
+ "superblock detected");
+ clear_buffer_write_io_error(sbh);
+ set_buffer_uptodate(sbh);
+ }
+
+ if (jbd2_journal_dirty_metadata(handle, sbh)) {
jbd2_journal_stop(handle);
goto write_directly;
}
--
2.25.1
[PATCH kernel-4.19 1/3] ext4: fix WARN_ON_ONCE(!buffer_uptodate) after an error writing the superblock
by Yang Yingliang 19 Jul '21
From: Ye Bin <yebin10(a)huawei.com>
mainline inclusion
from mainline-v5.14-rc1
commit 558d6450c7755aa005d89021204b6cdcae5e848f
category: bugfix
bugzilla: 109280
CVE: NA
-----------------------------------------------
If a writeback of the superblock fails with an I/O error, the buffer
is marked not uptodate. However, this can cause a WARN_ON to trigger
when we attempt to write the superblock a second time. (Which might
succeed this time, for certain types of block devices such as iSCSI
devices over a flaky network.)
Try to detect this case in flush_stashed_error_work(), and also change
__ext4_handle_dirty_metadata() so we always set the uptodate flag, not
just in the nojournal case.
Before this commit, this problem could be replicated via:
1. dmsetup create dust1 --table '0 2097152 dust /dev/sdc 0 4096'
2. mount /dev/mapper/dust1 /home/test
3. dmsetup message dust1 0 addbadblock 0 10
4. cd /home/test
5. echo "XXXXXXX" > t
After a few seconds, we got following warning:
[ 80.654487] end_buffer_async_write: bh=0xffff88842f18bdd0
[ 80.656134] Buffer I/O error on dev dm-0, logical block 0, lost async page write
[ 85.774450] EXT4-fs error (device dm-0): ext4_check_bdev_write_error:193: comm kworker/u16:8: Error while async write back metadata
[ 91.415513] mark_buffer_dirty: bh=0xffff88842f18bdd0
[ 91.417038] ------------[ cut here ]------------
[ 91.418450] WARNING: CPU: 1 PID: 1944 at fs/buffer.c:1092 mark_buffer_dirty.cold+0x1c/0x5e
[ 91.440322] Call Trace:
[ 91.440652] __jbd2_journal_temp_unlink_buffer+0x135/0x220
[ 91.441354] __jbd2_journal_unfile_buffer+0x24/0x90
[ 91.441981] __jbd2_journal_refile_buffer+0x134/0x1d0
[ 91.442628] jbd2_journal_commit_transaction+0x249a/0x3240
[ 91.443336] ? put_prev_entity+0x2a/0x200
[ 91.443856] ? kjournald2+0x12e/0x510
[ 91.444324] kjournald2+0x12e/0x510
[ 91.444773] ? woken_wake_function+0x30/0x30
[ 91.445326] kthread+0x150/0x1b0
[ 91.445739] ? commit_timeout+0x20/0x20
[ 91.446258] ? kthread_flush_worker+0xb0/0xb0
[ 91.446818] ret_from_fork+0x1f/0x30
[ 91.447293] ---[ end trace 66f0b6bf3d1abade ]---
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Link: https://lore.kernel.org/r/20210615090537.3423231-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/ext4/ext4_jbd2.c | 2 +-
fs/ext4/super.c | 12 ++++++++++--
2 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index fd7c41da1f8f9..022ab8afd9499 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -304,6 +304,7 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line,
set_buffer_meta(bh);
set_buffer_prio(bh);
+ set_buffer_uptodate(bh);
if (ext4_handle_valid(handle)) {
err = jbd2_journal_dirty_metadata(handle, bh);
/* Errors can only happen due to aborted journal or a nasty bug */
@@ -332,7 +333,6 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line,
err);
}
} else {
- set_buffer_uptodate(bh);
if (inode)
mark_buffer_dirty_inode(bh, inode);
else
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index fd67f3037fb8e..dd96627ec6ae4 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -695,15 +695,23 @@ static void flush_stashed_error_work(struct work_struct *work)
* ext4 error handling code during handling of previous errors.
*/
if (!sb_rdonly(sbi->s_sb) && journal) {
+ struct buffer_head *sbh = sbi->s_sbh;
handle = jbd2_journal_start(journal, 1);
if (IS_ERR(handle))
goto write_directly;
- if (jbd2_journal_get_write_access(handle, sbi->s_sbh)) {
+ if (jbd2_journal_get_write_access(handle, sbh)) {
jbd2_journal_stop(handle);
goto write_directly;
}
ext4_update_super(sbi->s_sb);
- if (jbd2_journal_dirty_metadata(handle, sbi->s_sbh)) {
+ if (buffer_write_io_error(sbh) || !buffer_uptodate(sbh)) {
+ ext4_msg(sbi->s_sb, KERN_ERR, "previous I/O error to "
+ "superblock detected");
+ clear_buffer_write_io_error(sbh);
+ set_buffer_uptodate(sbh);
+ }
+
+ if (jbd2_journal_dirty_metadata(handle, sbh)) {
jbd2_journal_stop(handle);
goto write_directly;
}
--
2.25.1
19 Jul '21
From: zhenpengzheng <zhenpengzheng(a)net-swift.com>
hulk inclusion
category: feature
bugzilla: 50777
CVE: NA
-------------------------------------------------------------------------
Ensure the netswift 10G NIC driver ko can be distributed in ISO on X86.
Signed-off-by: zhenpengzheng <zhenpengzheng(a)net-swift.com>
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/arm64/configs/openeuler_defconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 1e1f7f3201028..8cfe2c3cd7dde 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -2489,6 +2489,8 @@ CONFIG_HNS3_CAE=m
CONFIG_NET_VENDOR_HUAWEI=y
CONFIG_HINIC=m
CONFIG_BMA=m
+CONFIG_NET_VENDOR_NETSWIFT=y
+CONFIG_TXGBE=m
# CONFIG_NET_VENDOR_I825XX is not set
CONFIG_NET_VENDOR_INTEL=y
# CONFIG_E100 is not set
--
2.25.1
[PATCH openEuler-1.0-LTS 1/4] Add traffic policy for low cache available.
by Yang Yingliang 19 Jul '21
From: Xu Wei <xuwei56(a)huawei.com>
euleros inclusion
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=327
CVE: NA
When the cache's available space is low, bcache switches to writethrough
mode. All write IO is then sent directly to the backing device, which is
usually an HDD. At the same time, the cache device flushes dirty data to
the backing device in the bcache writeback process, so write IO from the
user damages the sequentiality of writeback, and if there is a lot of
writeback IO, the user's write IO may be blocked. This patch adds a
traffic policy to bcache to solve the problem and improve bcache
performance when cache space is low.
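As an editorial illustration of the new interface (not part of the patch), a userspace tool could query the per-second writeback statistics through the BCACHE_GET_WRITE_STATUS ioctl roughly as follows. The struct layout and ioctl number are copied from the patch; since the patch adds them to a kernel-internal header rather than a uapi one, they are duplicated here, and the /dev/bcache0 path is an assumption about which cached device node to open:

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Duplicated from the definitions the patch adds to drivers/md/bcache/bcache.h. */
struct get_bcache_status {
	unsigned int writeback_sector_size_per_sec;
	unsigned int writeback_io_num_per_sec;
	unsigned int front_io_num_per_sec;
	uint64_t dirty_rate;
	unsigned int available;
};
#define BCACHE_MAJOR 'B'
#define BCACHE_GET_WRITE_STATUS _IOR(BCACHE_MAJOR, 0x0, struct get_bcache_status)

int main(void)
{
	struct get_bcache_status st;
	int fd = open("/dev/bcache0", O_RDONLY);	/* assumed device node */

	if (fd < 0 || ioctl(fd, BCACHE_GET_WRITE_STATUS, &st) < 0) {
		perror("BCACHE_GET_WRITE_STATUS");
		return 1;
	}
	/* dirty_rate and available are percentages in the patch */
	printf("writeback %u sectors/s, %u IOs/s; front %u IOs/s; dirty %llu%%; available %u%%\n",
	       st.writeback_sector_size_per_sec, st.writeback_io_num_per_sec,
	       st.front_io_num_per_sec, (unsigned long long)st.dirty_rate,
	       st.available);
	close(fd);
	return 0;
}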
Signed-off-by: qinghaixiang <xuweiqhx(a)163.com>
Signed-off-by: Xu Wei <xuwei56(a)huawei.com>
Acked-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Reviewed-by: Li Ruilin <liruilin4(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/md/bcache/bcache.h | 49 ++++++++++++
drivers/md/bcache/btree.h | 6 +-
drivers/md/bcache/request.c | 143 +++++++++++++++++++++++++++++++++-
drivers/md/bcache/request.h | 2 +
drivers/md/bcache/super.c | 35 +++++++++
drivers/md/bcache/sysfs.c | 56 +++++++++++++
drivers/md/bcache/writeback.c | 11 ++-
drivers/md/bcache/writeback.h | 6 +-
8 files changed, 300 insertions(+), 8 deletions(-)
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 99d12fce876b2..70fbde8ca70c9 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -399,6 +399,28 @@ struct cached_dev {
unsigned int offline_seconds;
char backing_dev_name[BDEVNAME_SIZE];
+
+ /* Count the front and writeback io bandwidth per second */
+ atomic_t writeback_sector_size;
+ atomic_t writeback_io_num;
+ atomic_t front_io_num;
+ unsigned int writeback_sector_size_per_sec;
+ unsigned int writeback_io_num_per_sec;
+ unsigned int front_io_num_per_sec;
+ struct timer_list io_stat_timer;
+
+ unsigned int writeback_state;
+#define WRITEBACK_DEFAULT 0
+#define WRITEBACK_QUICK 1
+#define WRITEBACK_SLOW 2
+
+ /* realize for token bucket */
+ spinlock_t token_lock;
+ unsigned int max_sector_size;
+ unsigned int max_io_num;
+ unsigned int write_token_sector_size;
+ unsigned int write_token_io_num;
+ struct timer_list token_assign_timer;
};
enum alloc_reserve {
@@ -717,6 +739,10 @@ struct cache_set {
#define BUCKET_HASH_BITS 12
struct hlist_head bucket_hash[1 << BUCKET_HASH_BITS];
+ unsigned int cutoff_writeback_sync;
+ bool traffic_policy_start;
+ bool force_write_through;
+ unsigned int gc_sectors;
};
struct bbio {
@@ -732,6 +758,29 @@ struct bbio {
struct bio bio;
};
+struct get_bcache_status {
+ unsigned int writeback_sector_size_per_sec;
+ unsigned int writeback_io_num_per_sec;
+ unsigned int front_io_num_per_sec;
+ uint64_t dirty_rate;
+ unsigned int available;
+};
+
+struct set_bcache_status {
+ unsigned int write_token_sector_size;
+ unsigned int write_token_io_num;
+ bool traffic_policy_start;
+ bool force_write_through;
+ bool copy_gc_enabled;
+ bool trigger_gc;
+ unsigned int writeback_state;
+ unsigned int gc_sectors;
+ unsigned int cutoff_writeback_sync;
+};
+#define BCACHE_MAJOR 'B'
+#define BCACHE_GET_WRITE_STATUS _IOR(BCACHE_MAJOR, 0x0, struct get_bcache_status)
+#define BCACHE_SET_WRITE_STATUS _IOW(BCACHE_MAJOR, 0x1, struct set_bcache_status)
+
#define BTREE_PRIO USHRT_MAX
#define INITIAL_PRIO 32768U
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index 4d0cca145f699..7ddadcc485ea6 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -193,7 +193,11 @@ static inline unsigned int bset_block_offset(struct btree *b, struct bset *i)
static inline void set_gc_sectors(struct cache_set *c)
{
- atomic_set(&c->sectors_to_gc, c->sb.bucket_size * c->nbuckets / 16);
+ if (c->gc_sectors == 0)
+ atomic_set(&c->sectors_to_gc,
+ c->sb.bucket_size * c->nbuckets / 16);
+ else
+ atomic_set(&c->sectors_to_gc, c->gc_sectors);
}
void bkey_put(struct cache_set *c, struct bkey *k);
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 6d89e56a4a410..c05544e07722e 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -28,6 +28,7 @@
struct kmem_cache *bch_search_cache;
static void bch_data_insert_start(struct closure *cl);
+static void alloc_token(struct cached_dev *dc, unsigned int sectors);
static unsigned int cache_mode(struct cached_dev *dc)
{
@@ -396,7 +397,8 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
goto skip;
if (mode == CACHE_MODE_NONE ||
- (mode == CACHE_MODE_WRITEAROUND &&
+ ((mode == CACHE_MODE_WRITEAROUND ||
+ c->force_write_through == true) &&
op_is_write(bio_op(bio))))
goto skip;
@@ -858,6 +860,10 @@ static void cached_dev_read_done(struct closure *cl)
if (s->iop.bio && (!dc->read_bypass || s->prefetch) &&
!test_bit(CACHE_SET_STOPPING, &s->iop.c->flags)) {
BUG_ON(!s->iop.replace);
+ if ((dc->disk.c->traffic_policy_start == true) &&
+ (dc->disk.c->force_write_through != true)) {
+ alloc_token(dc, bio_sectors(s->iop.bio));
+ }
closure_call(&s->iop.cl, bch_data_insert, NULL, cl);
}
@@ -1000,6 +1006,35 @@ static void cached_dev_write_complete(struct closure *cl)
continue_at(cl, cached_dev_bio_complete, NULL);
}
+static void alloc_token(struct cached_dev *dc, unsigned int sectors)
+{
+ int count = 0;
+
+ spin_lock_bh(&dc->token_lock);
+
+ while ((dc->write_token_sector_size < sectors) &&
+ (dc->write_token_io_num == 0)) {
+ spin_unlock_bh(&dc->token_lock);
+ schedule_timeout_interruptible(msecs_to_jiffies(10));
+ count++;
+ if ((dc->disk.c->traffic_policy_start != true) ||
+ (cache_mode(dc) != CACHE_MODE_WRITEBACK) ||
+ (count > 100))
+ return;
+ spin_lock_bh(&dc->token_lock);
+ }
+
+ if (dc->write_token_sector_size >= sectors)
+ dc->write_token_sector_size -= sectors;
+ else
+ dc->write_token_sector_size = 0;
+
+ if (dc->write_token_io_num > 0)
+ dc->write_token_io_num--;
+
+ spin_unlock_bh(&dc->token_lock);
+}
+
static void cached_dev_write(struct cached_dev *dc, struct search *s)
{
struct closure *cl = &s->cl;
@@ -1247,6 +1282,7 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
cached_dev_nodata,
bcache_wq);
} else {
+ atomic_inc(&dc->front_io_num);
s->iop.bypass = check_should_bypass(dc, bio);
if (!s->iop.bypass && bio->bi_iter.bi_size && !rw) {
@@ -1258,10 +1294,17 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
save_circ_item(&s->smp);
}
- if (rw)
+ if (rw) {
+ if ((s->iop.bypass == false) &&
+ (dc->disk.c->traffic_policy_start == true) &&
+ (cache_mode(dc) == CACHE_MODE_WRITEBACK) &&
+ (bio_op(bio) != REQ_OP_DISCARD)) {
+ alloc_token(dc, bio_sectors(bio));
+ }
cached_dev_write(dc, s);
- else
+ } else {
cached_dev_read(dc, s);
+ }
}
} else
/* I/O request sent to backing device */
@@ -1270,6 +1313,65 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
return BLK_QC_T_NONE;
}
+static int bcache_get_write_status(struct cached_dev *dc, unsigned long arg)
+{
+ struct get_bcache_status a;
+ uint64_t cache_sectors;
+ struct cache_set *c = dc->disk.c;
+
+ if (c == NULL)
+ return -ENODEV;
+
+ a.writeback_sector_size_per_sec = dc->writeback_sector_size_per_sec;
+ a.writeback_io_num_per_sec = dc->writeback_io_num_per_sec;
+ a.front_io_num_per_sec = dc->front_io_num_per_sec;
+ cache_sectors = c->nbuckets * c->sb.bucket_size -
+ atomic_long_read(&c->flash_dev_dirty_sectors);
+ a.dirty_rate = div64_u64(bcache_dev_sectors_dirty(&dc->disk) * 100,
+ cache_sectors);
+ a.available = 100 - c->gc_stats.in_use;
+ if (copy_to_user((struct get_bcache_status *)arg, &a,
+ sizeof(struct get_bcache_status)))
+ return -EFAULT;
+ return 0;
+}
+
+static int bcache_set_write_status(struct cached_dev *dc, unsigned long arg)
+{
+ struct set_bcache_status a;
+ struct cache_set *c = dc->disk.c;
+
+ if (c == NULL)
+ return -ENODEV;
+ if (copy_from_user(&a, (struct set_bcache_status *)arg,
+ sizeof(struct set_bcache_status)))
+ return -EFAULT;
+
+ if (c->traffic_policy_start != a.traffic_policy_start)
+ pr_info("%s traffic policy %s", dc->disk.disk->disk_name,
+ (a.traffic_policy_start == true) ? "enable" : "disable");
+ if (c->force_write_through != a.force_write_through)
+ pr_info("%s force write through %s", dc->disk.disk->disk_name,
+ (a.force_write_through == true) ? "enable" : "disable");
+ if (a.trigger_gc) {
+ pr_info("trigger %s gc", dc->disk.disk->disk_name);
+ atomic_set(&c->sectors_to_gc, -1);
+ wake_up_gc(c);
+ }
+ if ((a.cutoff_writeback_sync >= MIN_CUTOFF_WRITEBACK_SYNC) &&
+ (a.cutoff_writeback_sync <= MAX_CUTOFF_WRITEBACK_SYNC)) {
+ c->cutoff_writeback_sync = a.cutoff_writeback_sync;
+ }
+
+ dc->max_sector_size = a.write_token_sector_size;
+ dc->max_io_num = a.write_token_io_num;
+ c->traffic_policy_start = a.traffic_policy_start;
+ c->force_write_through = a.force_write_through;
+ c->gc_sectors = a.gc_sectors;
+ dc->writeback_state = a.writeback_state;
+ return 0;
+}
+
static int cached_dev_ioctl(struct bcache_device *d, fmode_t mode,
unsigned int cmd, unsigned long arg)
{
@@ -1278,7 +1380,14 @@ static int cached_dev_ioctl(struct bcache_device *d, fmode_t mode,
if (dc->io_disable)
return -EIO;
- return __blkdev_driver_ioctl(dc->bdev, mode, cmd, arg);
+ switch (cmd) {
+ case BCACHE_GET_WRITE_STATUS:
+ return bcache_get_write_status(dc, arg);
+ case BCACHE_SET_WRITE_STATUS:
+ return bcache_set_write_status(dc, arg);
+ default:
+ return __blkdev_driver_ioctl(dc->bdev, mode, cmd, arg);
+ }
}
static int cached_dev_congested(void *data, int bits)
@@ -1438,3 +1547,29 @@ int __init bch_request_init(void)
return 0;
}
+
+static void token_assign(struct timer_list *t)
+{
+ struct cached_dev *dc = from_timer(dc, t, token_assign_timer);
+
+ dc->token_assign_timer.expires = jiffies + HZ / 8;
+ add_timer(&dc->token_assign_timer);
+
+ spin_lock(&dc->token_lock);
+ dc->write_token_sector_size = dc->max_sector_size / 8;
+ dc->write_token_io_num = dc->max_io_num / 8;
+ dc->write_token_io_num =
+ (dc->write_token_io_num == 0) ? 1 : dc->write_token_io_num;
+ spin_unlock(&dc->token_lock);
+}
+
+void bch_traffic_policy_init(struct cached_dev *dc)
+{
+ spin_lock_init(&dc->token_lock);
+ dc->write_token_sector_size = 0;
+ dc->write_token_io_num = 0;
+
+ timer_setup(&dc->token_assign_timer, token_assign, 0);
+ dc->token_assign_timer.expires = jiffies + HZ / 8;
+ add_timer(&dc->token_assign_timer);
+}
diff --git a/drivers/md/bcache/request.h b/drivers/md/bcache/request.h
index 3667bc5390dfe..f677ba8704940 100644
--- a/drivers/md/bcache/request.h
+++ b/drivers/md/bcache/request.h
@@ -41,6 +41,8 @@ void bch_data_insert(struct closure *cl);
void bch_cached_dev_request_init(struct cached_dev *dc);
void bch_flash_dev_request_init(struct bcache_device *d);
+void bch_traffic_policy_init(struct cached_dev *dc);
+
extern struct kmem_cache *bch_search_cache, *bch_passthrough_cache;
struct search {
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e7f7a0f038682..3f858de9e9602 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1210,6 +1210,8 @@ static void cached_dev_free(struct closure *cl)
{
struct cached_dev *dc = container_of(cl, struct cached_dev, disk.cl);
+ del_timer_sync(&dc->io_stat_timer);
+ del_timer_sync(&dc->token_assign_timer);
if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
cancel_writeback_rate_update_dwork(dc);
@@ -1250,6 +1252,36 @@ static void cached_dev_flush(struct closure *cl)
continue_at(cl, cached_dev_free, system_wq);
}
+static void cached_dev_io_stat(struct timer_list *t)
+{
+ struct cached_dev *dc = from_timer(dc, t, io_stat_timer);
+
+ dc->io_stat_timer.expires = jiffies + HZ;
+ add_timer(&dc->io_stat_timer);
+
+ dc->writeback_sector_size_per_sec =
+ atomic_read(&dc->writeback_sector_size);
+ dc->writeback_io_num_per_sec = atomic_read(&dc->writeback_io_num);
+ dc->front_io_num_per_sec = atomic_read(&dc->front_io_num);
+ atomic_set(&dc->writeback_sector_size, 0);
+ atomic_set(&dc->writeback_io_num, 0);
+ atomic_set(&dc->front_io_num, 0);
+}
+
+static void cached_dev_timer_init(struct cached_dev *dc)
+{
+ dc->writeback_sector_size_per_sec = 0;
+ dc->writeback_io_num_per_sec = 0;
+ dc->front_io_num_per_sec = 0;
+ atomic_set(&dc->writeback_sector_size, 0);
+ atomic_set(&dc->writeback_io_num, 0);
+ atomic_set(&dc->front_io_num, 0);
+
+ timer_setup(&dc->io_stat_timer, cached_dev_io_stat, 0);
+ dc->io_stat_timer.expires = jiffies + HZ;
+ add_timer(&dc->io_stat_timer);
+}
+
static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
{
int ret;
@@ -1266,6 +1298,8 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
INIT_LIST_HEAD(&dc->io_lru);
spin_lock_init(&dc->io_lock);
bch_cache_accounting_init(&dc->accounting, &dc->disk.cl);
+ cached_dev_timer_init(dc);
+ bch_traffic_policy_init(dc);
dc->sequential_cutoff = 4 << 20;
dc->inflight_block_enable = 1;
@@ -1774,6 +1808,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
c->congested_read_threshold_us = 2000;
c->congested_write_threshold_us = 20000;
c->error_limit = DEFAULT_IO_ERROR_LIMIT;
+ c->cutoff_writeback_sync = MIN_CUTOFF_WRITEBACK_SYNC;
WARN_ON(test_and_clear_bit(CACHE_SET_IO_DISABLE, &c->flags));
return c;
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 706d3a245dba6..4c693ac29b0e0 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -51,6 +51,13 @@ static const char * const error_actions[] = {
NULL
};
+static const char * const writeback_state[] = {
+ "default",
+ "quick",
+ "slow",
+ NULL
+};
+
write_attribute(attach);
write_attribute(detach);
write_attribute(unregister);
@@ -96,6 +103,9 @@ read_attribute(io_errors);
read_attribute(congested);
rw_attribute(congested_read_threshold_us);
rw_attribute(congested_write_threshold_us);
+rw_attribute(gc_sectors);
+rw_attribute(traffic_policy_start);
+rw_attribute(force_write_through);
rw_attribute(sequential_cutoff);
rw_attribute(read_bypass);
@@ -114,7 +124,13 @@ rw_attribute(writeback_rate_update_seconds);
rw_attribute(writeback_rate_i_term_inverse);
rw_attribute(writeback_rate_p_term_inverse);
rw_attribute(writeback_rate_minimum);
+rw_attribute(writeback_state);
+read_attribute(writeback_sector_size_per_sec);
+read_attribute(writeback_io_num_per_sec);
+read_attribute(front_io_num_per_sec);
read_attribute(writeback_rate_debug);
+read_attribute(write_token_sector_size);
+read_attribute(write_token_io_num);
read_attribute(stripe_size);
read_attribute(partial_stripes_expensive);
@@ -169,6 +185,11 @@ SHOW(__bch_cached_dev)
bch_cache_modes,
BDEV_CACHE_MODE(&dc->sb));
+ if (attr == &sysfs_writeback_state)
+ return bch_snprint_string_list(buf, PAGE_SIZE,
+ writeback_state,
+ dc->writeback_state);
+
if (attr == &sysfs_readahead_cache_policy)
return bch_snprint_string_list(buf, PAGE_SIZE,
bch_reada_cache_policies,
@@ -186,6 +207,9 @@ SHOW(__bch_cached_dev)
var_printf(writeback_metadata, "%i");
var_printf(writeback_running, "%i");
var_print(writeback_delay);
+ var_print(writeback_sector_size_per_sec);
+ var_print(writeback_io_num_per_sec);
+ var_print(front_io_num_per_sec);
var_print(writeback_percent);
sysfs_hprint(writeback_rate,
wb ? atomic_long_read(&dc->writeback_rate.rate) << 9 : 0);
@@ -248,6 +272,8 @@ SHOW(__bch_cached_dev)
sysfs_print(running, atomic_read(&dc->running));
sysfs_print(state, states[BDEV_STATE(&dc->sb)]);
+ var_print(write_token_sector_size);
+ var_print(write_token_io_num);
if (attr == &sysfs_label) {
memcpy(buf, dc->sb.label, SB_LABEL_SIZE);
@@ -346,6 +372,15 @@ STORE(__cached_dev)
}
}
+ if (attr == &sysfs_writeback_state) {
+ v = __sysfs_match_string(writeback_state, -1, buf);
+
+ if (v < 0)
+ return v;
+
+ dc->writeback_state = v;
+ }
+
if (attr == &sysfs_readahead_cache_policy) {
v = __sysfs_match_string(bch_reada_cache_policies, -1, buf);
if (v < 0)
@@ -448,11 +483,14 @@ static struct attribute *bch_cached_dev_files[] = {
&sysfs_data_csum,
#endif
&sysfs_cache_mode,
+ &sysfs_writeback_state,
&sysfs_readahead_cache_policy,
&sysfs_stop_when_cache_set_failed,
&sysfs_writeback_metadata,
&sysfs_writeback_running,
&sysfs_writeback_delay,
+ &sysfs_writeback_sector_size_per_sec,
+ &sysfs_writeback_io_num_per_sec,
&sysfs_writeback_percent,
&sysfs_writeback_rate,
&sysfs_writeback_rate_update_seconds,
@@ -460,6 +498,9 @@ static struct attribute *bch_cached_dev_files[] = {
&sysfs_writeback_rate_p_term_inverse,
&sysfs_writeback_rate_minimum,
&sysfs_writeback_rate_debug,
+ &sysfs_write_token_sector_size,
+ &sysfs_write_token_io_num,
+ &sysfs_front_io_num_per_sec,
&sysfs_io_errors,
&sysfs_io_error_limit,
&sysfs_io_disable,
@@ -714,6 +755,12 @@ SHOW(__bch_cache_set)
c->congested_read_threshold_us);
sysfs_print(congested_write_threshold_us,
c->congested_write_threshold_us);
+ sysfs_print(gc_sectors,
+ c->gc_sectors);
+ sysfs_print(traffic_policy_start,
+ c->traffic_policy_start);
+ sysfs_print(force_write_through,
+ c->force_write_through);
sysfs_print(active_journal_entries, fifo_used(&c->journal.pin));
sysfs_printf(verify, "%i", c->verify);
@@ -800,6 +847,12 @@ STORE(__bch_cache_set)
c->congested_read_threshold_us);
sysfs_strtoul(congested_write_threshold_us,
c->congested_write_threshold_us);
+ sysfs_strtoul(gc_sectors,
+ c->gc_sectors);
+ sysfs_strtoul(traffic_policy_start,
+ c->traffic_policy_start);
+ sysfs_strtoul(force_write_through,
+ c->force_write_through);
if (attr == &sysfs_errors) {
v = __sysfs_match_string(error_actions, -1, buf);
@@ -926,6 +979,9 @@ static struct attribute *bch_cache_set_internal_files[] = {
&sysfs_btree_shrinker_disabled,
&sysfs_copy_gc_enabled,
&sysfs_io_disable,
+ &sysfs_gc_sectors,
+ &sysfs_traffic_policy_start,
+ &sysfs_force_write_through,
NULL
};
KTYPE(bch_cache_set_internal);
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index b5fc3c6c7178e..901ad8bae7614 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -222,7 +222,13 @@ static unsigned int writeback_delay(struct cached_dev *dc,
!dc->writeback_percent)
return 0;
- return bch_next_delay(&dc->writeback_rate, sectors);
+ if (dc->writeback_state == WRITEBACK_DEFAULT) {
+ return bch_next_delay(&dc->writeback_rate, sectors);
+ } else if (dc->writeback_state == WRITEBACK_QUICK) {
+ return 0;
+ } else {
+ return msecs_to_jiffies(1000);
+ }
}
struct dirty_io {
@@ -287,6 +293,9 @@ static void write_dirty_finish(struct closure *cl)
: &dc->disk.c->writeback_keys_done);
}
+ atomic_add(KEY_SIZE(&w->key), &dc->writeback_sector_size);
+ atomic_inc(&dc->writeback_io_num);
+
bch_keybuf_del(&dc->writeback_keys, w);
up(&dc->in_flight);
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index e75dc33339f6f..a3151c0e96609 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -3,7 +3,8 @@
#define _BCACHE_WRITEBACK_H
#define CUTOFF_WRITEBACK 40
-#define CUTOFF_WRITEBACK_SYNC 70
+#define MIN_CUTOFF_WRITEBACK_SYNC 70
+#define MAX_CUTOFF_WRITEBACK_SYNC 90
#define MAX_WRITEBACKS_IN_PASS 5
#define MAX_WRITESIZE_IN_PASS 5000 /* *512b */
@@ -57,10 +58,11 @@ static inline bool should_writeback(struct cached_dev *dc, struct bio *bio,
unsigned int cache_mode, bool would_skip)
{
unsigned int in_use = dc->disk.c->gc_stats.in_use;
+ unsigned int cutoff = dc->disk.c->cutoff_writeback_sync;
if (cache_mode != CACHE_MODE_WRITEBACK ||
test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) ||
- in_use > CUTOFF_WRITEBACK_SYNC)
+ in_use > cutoff)
return false;
if (bio_op(bio) == REQ_OP_DISCARD)
--
2.25.1
19 Jul '21
From: Xu Wei <xuwei56(a)huawei.com>
euleros inclusion
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=327
CVE: NA
When the cache's available space is low, bcache switches to writethrough
mode. All write IO is then sent directly to the backing device, which is
usually an HDD. At the same time, the cache device flushes dirty data to
the backing device in the bcache writeback process, so write IO from the
user damages the sequentiality of writeback, and if there is a lot of
writeback IO, the user's write IO may be blocked. This patch adds a
traffic policy to bcache to solve the problem and improve bcache
performance when cache space is low.
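The throttling itself is a simple token bucket: a timer refills the write budget several times a second (every HZ/8 in the patch's token_assign()), and the submit path consumes tokens, sleeping briefly and retrying when the bucket is empty (the 10 ms wait in alloc_token()). A minimal editorial C sketch of the idea, with invented names and no locking, looks like this:

#include <stdbool.h>

/* One bucket: refill() is driven by a periodic timer, consume() is
 * called on the write submission path. */
struct token_bucket {
	unsigned int max_sectors;	/* budget handed out per refill interval */
	unsigned int sectors;		/* tokens currently available */
};

static void refill(struct token_bucket *tb)
{
	tb->sectors = tb->max_sectors;
}

/* Returns true if the write may proceed now; false means the caller
 * should sleep briefly and try again. */
static bool consume(struct token_bucket *tb, unsigned int sectors)
{
	if (tb->sectors < sectors)
		return false;
	tb->sectors -= sectors;
	return true;
}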
Signed-off-by: qinghaixiang <xuweiqhx(a)163.com>
Signed-off-by: Xu Wei <xuwei56(a)huawei.com>
Acked-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Reviewed-by: Li Ruilin <liruilin4(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/md/bcache/bcache.h | 49 ++++++++++++
drivers/md/bcache/btree.h | 6 +-
drivers/md/bcache/request.c | 145 ++++++++++++++++++++++++++++++++--
drivers/md/bcache/request.h | 2 +
drivers/md/bcache/super.c | 36 +++++++++
drivers/md/bcache/sysfs.c | 56 +++++++++++++
drivers/md/bcache/writeback.c | 11 ++-
drivers/md/bcache/writeback.h | 3 +-
8 files changed, 300 insertions(+), 8 deletions(-)
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 3340f59117112..aa9e85dc04872 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -400,6 +400,28 @@ struct cached_dev {
unsigned int offline_seconds;
char backing_dev_name[BDEVNAME_SIZE];
+
+ /* Count the front and writeback io bandwidth per second */
+ atomic_t writeback_sector_size;
+ atomic_t writeback_io_num;
+ atomic_t front_io_num;
+ unsigned int writeback_sector_size_per_sec;
+ unsigned int writeback_io_num_per_sec;
+ unsigned int front_io_num_per_sec;
+ struct timer_list io_stat_timer;
+
+ unsigned int writeback_state;
+#define WRITEBACK_DEFAULT 0
+#define WRITEBACK_QUICK 1
+#define WRITEBACK_SLOW 2
+
+ /* realize for token bucket */
+ spinlock_t token_lock;
+ unsigned int max_sector_size;
+ unsigned int max_io_num;
+ unsigned int write_token_sector_size;
+ unsigned int write_token_io_num;
+ struct timer_list token_assign_timer;
};
enum alloc_reserve {
@@ -739,6 +761,10 @@ struct cache_set {
#define BUCKET_HASH_BITS 12
struct hlist_head bucket_hash[1 << BUCKET_HASH_BITS];
+ unsigned int cutoff_writeback_sync;
+ bool traffic_policy_start;
+ bool force_write_through;
+ unsigned int gc_sectors;
};
struct bbio {
@@ -754,6 +780,29 @@ struct bbio {
struct bio bio;
};
+struct get_bcache_status {
+ unsigned int writeback_sector_size_per_sec;
+ unsigned int writeback_io_num_per_sec;
+ unsigned int front_io_num_per_sec;
+ uint64_t dirty_rate;
+ unsigned int available;
+};
+
+struct set_bcache_status {
+ unsigned int write_token_sector_size;
+ unsigned int write_token_io_num;
+ bool traffic_policy_start;
+ bool force_write_through;
+ bool copy_gc_enabled;
+ bool trigger_gc;
+ unsigned int writeback_state;
+ unsigned int gc_sectors;
+ unsigned int cutoff_writeback_sync;
+};
+#define BCACHE_MAJOR 'B'
+#define BCACHE_GET_WRITE_STATUS _IOR(BCACHE_MAJOR, 0x0, struct get_bcache_status)
+#define BCACHE_SET_WRITE_STATUS _IOW(BCACHE_MAJOR, 0x1, struct set_bcache_status)
+
#define BTREE_PRIO USHRT_MAX
#define INITIAL_PRIO 32768U
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index 76cfd121a4861..7c5f7c235c533 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -193,7 +193,11 @@ static inline unsigned int bset_block_offset(struct btree *b, struct bset *i)
static inline void set_gc_sectors(struct cache_set *c)
{
- atomic_set(&c->sectors_to_gc, c->sb.bucket_size * c->nbuckets / 16);
+ if (c->gc_sectors == 0)
+ atomic_set(&c->sectors_to_gc,
+ c->sb.bucket_size * c->nbuckets / 16);
+ else
+ atomic_set(&c->sectors_to_gc, c->gc_sectors);
}
void bkey_put(struct cache_set *c, struct bkey *k);
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 2f3ecd8fc5801..50303841dd45b 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -28,6 +28,7 @@
struct kmem_cache *bch_search_cache;
static void bch_data_insert_start(struct closure *cl);
+static void alloc_token(struct cached_dev *dc, unsigned int sectors);
static unsigned int cache_mode(struct cached_dev *dc)
{
@@ -384,8 +385,9 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
goto skip;
if (mode == CACHE_MODE_NONE ||
- (mode == CACHE_MODE_WRITEAROUND &&
- op_is_write(bio_op(bio))))
+ ((mode == CACHE_MODE_WRITEAROUND ||
+ c->force_write_through == true) &&
+ op_is_write(bio_op(bio))))
goto skip;
/*
@@ -863,6 +865,10 @@ static void cached_dev_read_done(struct closure *cl)
if (s->iop.bio && (!dc->read_bypass || s->prefetch) &&
!test_bit(CACHE_SET_STOPPING, &s->iop.c->flags)) {
BUG_ON(!s->iop.replace);
+ if ((dc->disk.c->traffic_policy_start == true) &&
+ (dc->disk.c->force_write_through != true)) {
+ alloc_token(dc, bio_sectors(s->iop.bio));
+ }
closure_call(&s->iop.cl, bch_data_insert, NULL, cl);
}
@@ -1005,6 +1011,35 @@ static void cached_dev_write_complete(struct closure *cl)
continue_at(cl, cached_dev_bio_complete, NULL);
}
+static void alloc_token(struct cached_dev *dc, unsigned int sectors)
+{
+ int count = 0;
+
+ spin_lock_bh(&dc->token_lock);
+
+ while ((dc->write_token_sector_size < sectors) &&
+ (dc->write_token_io_num == 0)) {
+ spin_unlock_bh(&dc->token_lock);
+ schedule_timeout_interruptible(msecs_to_jiffies(10));
+ count++;
+ if ((dc->disk.c->traffic_policy_start != true) ||
+ (cache_mode(dc) != CACHE_MODE_WRITEBACK) ||
+ (count > 100))
+ return;
+ spin_lock_bh(&dc->token_lock);
+ }
+
+ if (dc->write_token_sector_size >= sectors)
+ dc->write_token_sector_size -= sectors;
+ else
+ dc->write_token_sector_size = 0;
+
+ if (dc->write_token_io_num > 0)
+ dc->write_token_io_num--;
+
+ spin_unlock_bh(&dc->token_lock);
+}
+
static void cached_dev_write(struct cached_dev *dc, struct search *s)
{
struct closure *cl = &s->cl;
@@ -1252,6 +1287,7 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
cached_dev_nodata,
bcache_wq);
} else {
+ atomic_inc(&dc->front_io_num);
s->iop.bypass = check_should_bypass(dc, bio);
if (!s->iop.bypass && bio->bi_iter.bi_size && !rw) {
@@ -1263,10 +1299,17 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
save_circ_item(&s->smp);
}
- if (rw)
+ if (rw) {
+ if ((s->iop.bypass == false) &&
+ (dc->disk.c->traffic_policy_start == true) &&
+ (cache_mode(dc) == CACHE_MODE_WRITEBACK) &&
+ (bio_op(bio) != REQ_OP_DISCARD)) {
+ alloc_token(dc, bio_sectors(bio));
+ }
cached_dev_write(dc, s);
- else
+ } else {
cached_dev_read(dc, s);
+ }
}
} else
/* I/O request sent to backing device */
@@ -1275,6 +1318,65 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
return BLK_QC_T_NONE;
}
+static int bcache_get_write_status(struct cached_dev *dc, unsigned long arg)
+{
+ struct get_bcache_status a;
+ uint64_t cache_sectors;
+ struct cache_set *c = dc->disk.c;
+
+ if (c == NULL)
+ return -ENODEV;
+
+ a.writeback_sector_size_per_sec = dc->writeback_sector_size_per_sec;
+ a.writeback_io_num_per_sec = dc->writeback_io_num_per_sec;
+ a.front_io_num_per_sec = dc->front_io_num_per_sec;
+ cache_sectors = c->nbuckets * c->sb.bucket_size -
+ atomic_long_read(&c->flash_dev_dirty_sectors);
+ a.dirty_rate = div64_u64(bcache_dev_sectors_dirty(&dc->disk) * 100,
+ cache_sectors);
+ a.available = 100 - c->gc_stats.in_use;
+ if (copy_to_user((struct get_bcache_status *)arg, &a,
+ sizeof(struct get_bcache_status)))
+ return -EFAULT;
+ return 0;
+}
+
+static int bcache_set_write_status(struct cached_dev *dc, unsigned long arg)
+{
+ struct set_bcache_status a;
+ struct cache_set *c = dc->disk.c;
+
+ if (c == NULL)
+ return -ENODEV;
+ if (copy_from_user(&a, (struct set_bcache_status *)arg,
+ sizeof(struct set_bcache_status)))
+ return -EFAULT;
+
+ if (c->traffic_policy_start != a.traffic_policy_start)
+ pr_info("%s traffic policy %s", dc->disk.disk->disk_name,
+ (a.traffic_policy_start == true) ? "enable" : "disable");
+ if (c->force_write_through != a.force_write_through)
+ pr_info("%s force write through %s", dc->disk.disk->disk_name,
+ (a.force_write_through == true) ? "enable" : "disable");
+ if (a.trigger_gc) {
+ pr_info("trigger %s gc", dc->disk.disk->disk_name);
+ atomic_set(&c->sectors_to_gc, -1);
+ wake_up_gc(c);
+ }
+ if ((a.cutoff_writeback_sync >= CUTOFF_WRITEBACK_SYNC) &&
+ (a.cutoff_writeback_sync <= CUTOFF_WRITEBACK_SYNC_MAX)) {
+ c->cutoff_writeback_sync = a.cutoff_writeback_sync;
+ }
+
+ dc->max_sector_size = a.write_token_sector_size;
+ dc->max_io_num = a.write_token_io_num;
+ c->traffic_policy_start = a.traffic_policy_start;
+ c->force_write_through = a.force_write_through;
+ c->gc_sectors = a.gc_sectors;
+ dc->writeback_state = a.writeback_state;
+ return 0;
+}
+
static int cached_dev_ioctl(struct bcache_device *d, fmode_t mode,
unsigned int cmd, unsigned long arg)
{
@@ -1283,7 +1385,14 @@ static int cached_dev_ioctl(struct bcache_device *d, fmode_t mode,
if (dc->io_disable)
return -EIO;
- return __blkdev_driver_ioctl(dc->bdev, mode, cmd, arg);
+ switch (cmd) {
+ case BCACHE_GET_WRITE_STATUS:
+ return bcache_get_write_status(dc, arg);
+ case BCACHE_SET_WRITE_STATUS:
+ return bcache_set_write_status(dc, arg);
+ default:
+ return __blkdev_driver_ioctl(dc->bdev, mode, cmd, arg);
+ }
}
static int cached_dev_congested(void *data, int bits)
@@ -1443,3 +1552,29 @@ int __init bch_request_init(void)
return 0;
}
+
+static void token_assign(struct timer_list *t)
+{
+ struct cached_dev *dc = from_timer(dc, t, token_assign_timer);
+
+ dc->token_assign_timer.expires = jiffies + HZ / 8;
+ add_timer(&dc->token_assign_timer);
+
+ spin_lock(&dc->token_lock);
+ dc->write_token_sector_size = dc->max_sector_size / 8;
+ dc->write_token_io_num = dc->max_io_num / 8;
+ dc->write_token_io_num =
+ (dc->write_token_io_num == 0) ? 1 : dc->write_token_io_num;
+ spin_unlock(&dc->token_lock);
+}
+
+void bch_traffic_policy_init(struct cached_dev *dc)
+{
+ spin_lock_init(&dc->token_lock);
+ dc->write_token_sector_size = 0;
+ dc->write_token_io_num = 0;
+
+ timer_setup(&dc->token_assign_timer, token_assign, 0);
+ dc->token_assign_timer.expires = jiffies + HZ / 8;
+ add_timer(&dc->token_assign_timer);
+}
diff --git a/drivers/md/bcache/request.h b/drivers/md/bcache/request.h
index 7682630db73f6..8e2ab8071b5bf 100644
--- a/drivers/md/bcache/request.h
+++ b/drivers/md/bcache/request.h
@@ -41,6 +41,8 @@ void bch_data_insert(struct closure *cl);
void bch_cached_dev_request_init(struct cached_dev *dc);
void bch_flash_dev_request_init(struct bcache_device *d);
+void bch_traffic_policy_init(struct cached_dev *dc);
+
extern struct kmem_cache *bch_search_cache;
struct search {
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 754e88895738b..e2828c762edc5 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1260,6 +1260,9 @@ static void cached_dev_free(struct closure *cl)
{
struct cached_dev *dc = container_of(cl, struct cached_dev, disk.cl);
+ del_timer_sync(&dc->io_stat_timer);
+ del_timer_sync(&dc->token_assign_timer);
+
if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
cancel_writeback_rate_update_dwork(dc);
@@ -1303,6 +1306,36 @@ static void cached_dev_flush(struct closure *cl)
continue_at(cl, cached_dev_free, system_wq);
}
+static void cached_dev_io_stat(struct timer_list *t)
+{
+ struct cached_dev *dc = from_timer(dc, t, io_stat_timer);
+
+ dc->io_stat_timer.expires = jiffies + HZ;
+ add_timer(&dc->io_stat_timer);
+
+ dc->writeback_sector_size_per_sec =
+ atomic_read(&dc->writeback_sector_size);
+ dc->writeback_io_num_per_sec = atomic_read(&dc->writeback_io_num);
+ dc->front_io_num_per_sec = atomic_read(&dc->front_io_num);
+ atomic_set(&dc->writeback_sector_size, 0);
+ atomic_set(&dc->writeback_io_num, 0);
+ atomic_set(&dc->front_io_num, 0);
+}
+
+static void cached_dev_timer_init(struct cached_dev *dc)
+{
+ dc->writeback_sector_size_per_sec = 0;
+ dc->writeback_io_num_per_sec = 0;
+ dc->front_io_num_per_sec = 0;
+ atomic_set(&dc->writeback_sector_size, 0);
+ atomic_set(&dc->writeback_io_num, 0);
+ atomic_set(&dc->front_io_num, 0);
+
+ timer_setup(&dc->io_stat_timer, cached_dev_io_stat, 0);
+ dc->io_stat_timer.expires = jiffies + HZ;
+ add_timer(&dc->io_stat_timer);
+}
+
static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
{
int ret;
@@ -1319,6 +1352,8 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
INIT_LIST_HEAD(&dc->io_lru);
spin_lock_init(&dc->io_lock);
bch_cache_accounting_init(&dc->accounting, &dc->disk.cl);
+ cached_dev_timer_init(dc);
+ bch_traffic_policy_init(dc);
dc->sequential_cutoff = 4 << 20;
dc->inflight_block_enable = 1;
@@ -1837,6 +1872,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
c->congested_read_threshold_us = 2000;
c->congested_write_threshold_us = 20000;
c->error_limit = DEFAULT_IO_ERROR_LIMIT;
+ c->cutoff_writeback_sync = bch_cutoff_writeback_sync;
c->idle_max_writeback_rate_enabled = 1;
WARN_ON(test_and_clear_bit(CACHE_SET_IO_DISABLE, &c->flags));
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index e23c426229399..097560389760c 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -53,6 +53,13 @@ static const char * const error_actions[] = {
NULL
};
+static const char * const writeback_state[] = {
+ "default",
+ "quick",
+ "slow",
+ NULL
+};
+
write_attribute(attach);
write_attribute(detach);
write_attribute(unregister);
@@ -102,6 +109,9 @@ read_attribute(cutoff_writeback);
read_attribute(cutoff_writeback_sync);
rw_attribute(congested_read_threshold_us);
rw_attribute(congested_write_threshold_us);
+rw_attribute(gc_sectors);
+rw_attribute(traffic_policy_start);
+rw_attribute(force_write_through);
rw_attribute(sequential_cutoff);
rw_attribute(read_bypass);
@@ -120,7 +130,13 @@ rw_attribute(writeback_rate_update_seconds);
rw_attribute(writeback_rate_i_term_inverse);
rw_attribute(writeback_rate_p_term_inverse);
rw_attribute(writeback_rate_minimum);
+rw_attribute(writeback_state);
+read_attribute(writeback_sector_size_per_sec);
+read_attribute(writeback_io_num_per_sec);
+read_attribute(front_io_num_per_sec);
read_attribute(writeback_rate_debug);
+read_attribute(write_token_sector_size);
+read_attribute(write_token_io_num);
read_attribute(stripe_size);
read_attribute(partial_stripes_expensive);
@@ -177,6 +193,11 @@ SHOW(__bch_cached_dev)
bch_cache_modes,
BDEV_CACHE_MODE(&dc->sb));
+ if (attr == &sysfs_writeback_state)
+ return bch_snprint_string_list(buf, PAGE_SIZE,
+ writeback_state,
+ dc->writeback_state);
+
if (attr == &sysfs_readahead_cache_policy)
return bch_snprint_string_list(buf, PAGE_SIZE,
bch_reada_cache_policies,
@@ -194,6 +215,9 @@ SHOW(__bch_cached_dev)
var_printf(writeback_metadata, "%i");
var_printf(writeback_running, "%i");
var_print(writeback_delay);
+ var_print(writeback_sector_size_per_sec);
+ var_print(writeback_io_num_per_sec);
+ var_print(front_io_num_per_sec);
var_print(writeback_percent);
sysfs_hprint(writeback_rate,
wb ? atomic_long_read(&dc->writeback_rate.rate) << 9 : 0);
@@ -256,6 +280,8 @@ SHOW(__bch_cached_dev)
sysfs_print(running, atomic_read(&dc->running));
sysfs_print(state, states[BDEV_STATE(&dc->sb)]);
+ var_print(write_token_sector_size);
+ var_print(write_token_io_num);
if (attr == &sysfs_label) {
memcpy(buf, dc->sb.label, SB_LABEL_SIZE);
@@ -375,6 +401,15 @@ STORE(__cached_dev)
}
}
+ if (attr == &sysfs_writeback_state) {
+ v = __sysfs_match_string(writeback_state, -1, buf);
+
+ if (v < 0)
+ return v;
+
+ dc->writeback_state = v;
+ }
+
if (attr == &sysfs_readahead_cache_policy) {
v = __sysfs_match_string(bch_reada_cache_policies, -1, buf);
if (v < 0)
@@ -498,11 +533,14 @@ static struct attribute *bch_cached_dev_files[] = {
&sysfs_data_csum,
#endif
&sysfs_cache_mode,
+ &sysfs_writeback_state,
&sysfs_readahead_cache_policy,
&sysfs_stop_when_cache_set_failed,
&sysfs_writeback_metadata,
&sysfs_writeback_running,
&sysfs_writeback_delay,
+ &sysfs_writeback_sector_size_per_sec,
+ &sysfs_writeback_io_num_per_sec,
&sysfs_writeback_percent,
&sysfs_writeback_rate,
&sysfs_writeback_rate_update_seconds,
@@ -510,6 +548,9 @@ static struct attribute *bch_cached_dev_files[] = {
&sysfs_writeback_rate_p_term_inverse,
&sysfs_writeback_rate_minimum,
&sysfs_writeback_rate_debug,
+ &sysfs_write_token_sector_size,
+ &sysfs_write_token_io_num,
+ &sysfs_front_io_num_per_sec,
&sysfs_io_errors,
&sysfs_io_error_limit,
&sysfs_io_disable,
@@ -770,6 +811,12 @@ SHOW(__bch_cache_set)
c->congested_read_threshold_us);
sysfs_print(congested_write_threshold_us,
c->congested_write_threshold_us);
+ sysfs_print(gc_sectors,
+ c->gc_sectors);
+ sysfs_print(traffic_policy_start,
+ c->traffic_policy_start);
+ sysfs_print(force_write_through,
+ c->force_write_through);
sysfs_print(cutoff_writeback, bch_cutoff_writeback);
sysfs_print(cutoff_writeback_sync, bch_cutoff_writeback_sync);
@@ -855,6 +902,12 @@ STORE(__bch_cache_set)
sysfs_strtoul_clamp(congested_write_threshold_us,
c->congested_write_threshold_us,
0, UINT_MAX);
+ sysfs_strtoul(gc_sectors,
+ c->gc_sectors);
+ sysfs_strtoul(traffic_policy_start,
+ c->traffic_policy_start);
+ sysfs_strtoul(force_write_through,
+ c->force_write_through);
if (attr == &sysfs_errors) {
v = __sysfs_match_string(error_actions, -1, buf);
@@ -997,6 +1050,9 @@ static struct attribute *bch_cache_set_internal_files[] = {
&sysfs_idle_max_writeback_rate,
&sysfs_gc_after_writeback,
&sysfs_io_disable,
+ &sysfs_gc_sectors,
+ &sysfs_traffic_policy_start,
+ &sysfs_force_write_through,
&sysfs_cutoff_writeback,
&sysfs_cutoff_writeback_sync,
NULL
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 25a66e54f7ba5..1e3041f6363c7 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -239,7 +239,13 @@ static unsigned int writeback_delay(struct cached_dev *dc,
!dc->writeback_percent)
return 0;
- return bch_next_delay(&dc->writeback_rate, sectors);
+ if (dc->writeback_state == WRITEBACK_DEFAULT) {
+ return bch_next_delay(&dc->writeback_rate, sectors);
+ } else if (dc->writeback_state == WRITEBACK_QUICK) {
+ return 0;
+ } else {
+ return msecs_to_jiffies(1000);
+ }
}
struct dirty_io {
@@ -304,6 +310,9 @@ static void write_dirty_finish(struct closure *cl)
: &dc->disk.c->writeback_keys_done);
}
+ atomic_add(KEY_SIZE(&w->key), &dc->writeback_sector_size);
+ atomic_inc(&dc->writeback_io_num);
+
bch_keybuf_del(&dc->writeback_keys, w);
up(&dc->in_flight);
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index c4ff76037227b..912b1a146cea2 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -80,10 +80,11 @@ static inline bool should_writeback(struct cached_dev *dc, struct bio *bio,
unsigned int cache_mode, bool would_skip)
{
unsigned int in_use = dc->disk.c->gc_stats.in_use;
+ unsigned int cutoff = dc->disk.c->cutoff_writeback_sync;
if (cache_mode != CACHE_MODE_WRITEBACK ||
test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) ||
- in_use > bch_cutoff_writeback_sync)
+ in_use > cutoff)
return false;
if (bio_op(bio) == REQ_OP_DISCARD)
--
2.25.1
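For illustration, a minimal userspace sketch of driving the new ioctls
follows; it is not part of the patch. The struct layouts and ioctl
numbers mirror the definitions the patch adds to bcache.h (they must
match exactly, since the struct size is encoded in the ioctl number);
the device path /dev/bcache0 and the token budget values are only
examples.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>	/* _IOR/_IOW */

struct get_bcache_status {
	unsigned int writeback_sector_size_per_sec;
	unsigned int writeback_io_num_per_sec;
	unsigned int front_io_num_per_sec;
	uint64_t dirty_rate;
	unsigned int available;
};

struct set_bcache_status {
	unsigned int write_token_sector_size;
	unsigned int write_token_io_num;
	bool traffic_policy_start;
	bool force_write_through;
	bool copy_gc_enabled;
	bool trigger_gc;
	unsigned int writeback_state;
	unsigned int gc_sectors;
	unsigned int cutoff_writeback_sync;
};

#define BCACHE_MAJOR 'B'
#define BCACHE_GET_WRITE_STATUS _IOR(BCACHE_MAJOR, 0x0, struct get_bcache_status)
#define BCACHE_SET_WRITE_STATUS _IOW(BCACHE_MAJOR, 0x1, struct set_bcache_status)

int main(void)
{
	struct get_bcache_status gs = {0};
	struct set_bcache_status ss = {0};
	int fd = open("/dev/bcache0", O_RDWR);

	if (fd < 0)
		return 1;

	/* Read the per-second front/writeback IO statistics. */
	if (ioctl(fd, BCACHE_GET_WRITE_STATUS, &gs) == 0)
		printf("available %u%%, dirty %llu%%, front io/s %u\n",
		       gs.available, (unsigned long long)gs.dirty_rate,
		       gs.front_io_num_per_sec);

	/* Enable the traffic policy with an example token budget. */
	ss.traffic_policy_start = true;
	ss.write_token_sector_size = 8192;	/* sectors refilled per second */
	ss.write_token_io_num = 1024;		/* IOs refilled per second */
	ss.writeback_state = 0;			/* WRITEBACK_DEFAULT */
	ioctl(fd, BCACHE_SET_WRITE_STATUS, &ss);

	close(fd);
	return 0;
}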
19 Jul '21
From: gouhao <gouhao(a)uniontech.com>
kernel.spec: fix rpmbuild error with patch
uniontech inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I3ZNR0
CVE: NA
--------------------------------------------------------------
When %{with_patch} is 1, the second rpmbuild compilation fails with:
Reversed (or previously applied) patch detected!
An rpm package build usually deletes the source directory under
~/rpmbuild/BUILD in the %prep stage, but in our spec, when %{with_patch}
is 1 and vanilla-%{TarballVer} already exists, the %prep stage does not
delete the source directory and enters it directly, which causes this
bug.
Signed-off-by: gouhao <gouhao(a)uniontech.com>
---
kernel.spec | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/kernel.spec b/kernel.spec
index e00a323..11c76aa 100644
--- a/kernel.spec
+++ b/kernel.spec
@@ -205,20 +205,14 @@ package or when debugging this package.\
%endif
%prep
-%if 0%{?with_patch}
-if [ ! -d kernel-%{version}/vanilla-%{TarballVer} ];then
-%setup -q -n kernel-%{version} -a 9998 -c
- mv linux-%{TarballVer} vanilla-%{TarballVer}
-else
- cd kernel-%{version}
-fi
-cp -rl vanilla-%{TarballVer} linux-%{KernelVer}
-%else
+
%setup -q -n kernel-%{version} -c
-mv kernel linux-%{version}
-cp -rl linux-%{version} linux-%{KernelVer}
+
+%if 0%{?with_patch}
+tar -xjf %{SOURCE9998}
%endif
+mv linux-%{version} linux-%{KernelVer}
cd linux-%{KernelVer}
%if 0%{?with_patch}
--
2.20.1
19 Jul '21
From: Liu Jian <liujian56(a)huawei.com>
mainline inclusion
from mainline-net-next
commit 23d2b94043ca8835bd1e67749020e839f396a1c2
category: bugfix
bugzilla: NA
CVE: NA
--------------------------------
I got the following panic when doing fuzz testing:
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 4056 Comm: syz-executor.3 Tainted: G B 5.14.0-rc1-00195-gcff5c4254439-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack_lvl+0x7a/0x9b
panic+0x2cd/0x5af
end_report.cold+0x5a/0x5a
kasan_report+0xec/0x110
ip_check_mc_rcu+0x556/0x5d0
__mkroute_output+0x895/0x1740
ip_route_output_key_hash_rcu+0x2d0/0x1050
ip_route_output_key_hash+0x182/0x2e0
ip_route_output_flow+0x28/0x130
udp_sendmsg+0x165d/0x2280
udpv6_sendmsg+0x121e/0x24f0
inet6_sendmsg+0xf7/0x140
sock_sendmsg+0xe9/0x180
____sys_sendmsg+0x2b8/0x7a0
___sys_sendmsg+0xf0/0x160
__sys_sendmmsg+0x17e/0x3c0
__x64_sys_sendmmsg+0x9e/0x100
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x462eb9
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8
48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3df5af1c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462eb9
RDX: 0000000000000312 RSI: 0000000020001700 RDI: 0000000000000007
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3df5af26bc
R13: 00000000004c372d R14: 0000000000700b10 R15: 00000000ffffffff
This is a use-after-free in ip_check_mc_rcu().
In ip_mc_del_src(), the ip_sf_list of pmc is freed under pmc->lock protection,
but the access to ip_sf_list in ip_check_mc_rcu() is not protected by that lock.
Signed-off-by: Liu Jian <liujian56(a)huawei.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Reviewed-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
net/ipv4/igmp.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index ffa847fc96194..1a0953b2b1396 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -2736,6 +2736,7 @@ int ip_check_mc_rcu(struct in_device *in_dev, __be32 mc_addr, __be32 src_addr, u
rv = 1;
} else if (im) {
if (src_addr) {
+ spin_lock_bh(&im->lock);
for (psf = im->sources; psf; psf = psf->sf_next) {
if (psf->sf_inaddr == src_addr)
break;
@@ -2746,6 +2747,7 @@ int ip_check_mc_rcu(struct in_device *in_dev, __be32 mc_addr, __be32 src_addr, u
im->sfcount[MCAST_EXCLUDE];
else
rv = im->sfcount[MCAST_EXCLUDE] != 0;
+ spin_unlock_bh(&im->lock);
} else
rv = 1; /* unspecified source; tentatively allow */
}
--
2.25.1
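A simplified sketch of the pattern being fixed (names are made up for
illustration; the real structures are ip_mc_list and ip_sf_list in
net/ipv4/igmp.c): the writer frees the source list under the per-group
lock, so a reader that walks the list without taking the same lock can
dereference freed entries, which is the KASAN report above. The fix
makes the reader take the lock around the traversal.

#include <linux/spinlock.h>
#include <linux/slab.h>
#include <linux/types.h>

struct src_entry {
	struct src_entry *sf_next;
	__be32 sf_inaddr;
};

struct mc_group {
	spinlock_t lock;
	struct src_entry *sources;
};

/* Writer side (ip_mc_del_src path): frees the list under grp->lock. */
static void group_del_sources(struct mc_group *grp)
{
	struct src_entry *psf, *next;

	spin_lock_bh(&grp->lock);
	for (psf = grp->sources; psf; psf = next) {
		next = psf->sf_next;
		kfree(psf);
	}
	grp->sources = NULL;
	spin_unlock_bh(&grp->lock);
}

/* Reader side (ip_check_mc_rcu path): must hold the same lock while
 * walking the list, otherwise it races with the kfree() above. */
static bool group_has_source(struct mc_group *grp, __be32 addr)
{
	struct src_entry *psf;
	bool found = false;

	spin_lock_bh(&grp->lock);
	for (psf = grp->sources; psf; psf = psf->sf_next) {
		if (psf->sf_inaddr == addr) {
			found = true;
			break;
		}
	}
	spin_unlock_bh(&grp->lock);
	return found;
}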
[PATCH openEuler-1.0-LTS] memcg: fix unsuitable null check after alloc memory
by Yang Yingliang 18 Jul '21
From: Lu Jialin <lujialin4(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 51815, https://gitee.com/openeuler/kernel/issues/I3IJ9I
CVE: NA
--------
Signed-off-by: Lu Jialin <lujialin4(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
mm/memcontrol.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 713a839013f72..e55b46d5d0fcb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4850,8 +4850,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
if (!node_state(node, N_NORMAL_MEMORY))
tmp = -1;
pn_ext = kzalloc_node(sizeof(*pn_ext), GFP_KERNEL, tmp);
- pn = &pn_ext->pn;
- if (!pn)
+ if (!pn_ext)
return 1;
pn_ext->lruvec_stat_local = alloc_percpu(struct lruvec_stat);
@@ -4860,6 +4859,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
return 1;
}
+ pn = &pn_ext->pn;
pn->lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
if (!pn->lruvec_stat_cpu) {
free_percpu(pn_ext->lruvec_stat_local);
@@ -4927,10 +4927,10 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
size += nr_node_ids * sizeof(struct mem_cgroup_per_node *);
memcg_ext = kzalloc(size, GFP_KERNEL);
- memcg = &memcg_ext->memcg;
- if (!memcg)
+ if (!memcg_ext)
return NULL;
+ memcg = &memcg_ext->memcg;
memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
1, MEM_CGROUP_ID_MAX,
GFP_KERNEL);
--
2.25.1
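The change above is subtle enough to deserve a standalone illustration.
Checking the address of an embedded member (&pn_ext->pn) instead of the
allocation result is never a valid NULL check: computing a member
address from a NULL pointer is undefined behaviour, and even when it
happens to work (for instance when the member sits at offset 0), the
compiler is free to assume the derived pointer is non-NULL and drop the
check, so the error path may never run when the allocation fails. A
minimal userspace sketch with made-up names:

#include <stdio.h>
#include <stdlib.h>

struct inner { int x; };

struct outer_ext {
	struct inner inner;	/* embedded member, like pn inside pn_ext */
	long extra[4];
};

/* Buggy shape: the check tests a derived pointer, not the allocation. */
static struct inner *alloc_inner_buggy(void)
{
	struct outer_ext *ext = calloc(1, sizeof(*ext));
	struct inner *in = &ext->inner;	/* undefined behaviour if ext == NULL */

	if (!in)			/* may be optimised away entirely */
		return NULL;
	return in;
}

/* Fixed shape: check the allocation result, then derive the member. */
static struct inner *alloc_inner_fixed(void)
{
	struct outer_ext *ext = calloc(1, sizeof(*ext));

	if (!ext)
		return NULL;
	return &ext->inner;
}

int main(void)
{
	printf("buggy: %p fixed: %p\n",
	       (void *)alloc_inner_buggy(), (void *)alloc_inner_fixed());
	return 0;
}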
16 Jul '21
hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA
--------------------------------
Enable KASAN and UBSAN by default for testing.
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/arm64/configs/hulk_defconfig | 10 ++++++++--
arch/x86/configs/hulk_defconfig | 10 ++++++++--
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/configs/hulk_defconfig b/arch/arm64/configs/hulk_defconfig
index 741b33fab4e9c..6a3937a4017dd 100644
--- a/arch/arm64/configs/hulk_defconfig
+++ b/arch/arm64/configs/hulk_defconfig
@@ -5504,7 +5504,10 @@ CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
CONFIG_HAVE_ARCH_KASAN=y
-# CONFIG_KASAN is not set
+CONFIG_KASAN=y
+# CONFIG_KASAN_OUTLINE is not set
+CONFIG_KASAN_INLINE=y
+# CONFIG_TEST_KASAN is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
@@ -5668,7 +5671,10 @@ CONFIG_KDB_DEFAULT_ENABLE=0x0
CONFIG_KDB_KEYBOARD=y
CONFIG_KDB_CONTINUE_CATASTROPHIC=0
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
-# CONFIG_UBSAN is not set
+CONFIG_UBSAN=y
+CONFIG_UBSAN_SANITIZE_ALL=y
+# CONFIG_UBSAN_ALIGNMENT is not set
+# CONFIG_TEST_UBSAN is not set
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
CONFIG_STRICT_DEVMEM=y
CONFIG_IO_STRICT_DEVMEM=y
diff --git a/arch/x86/configs/hulk_defconfig b/arch/x86/configs/hulk_defconfig
index 6ea79ca8a8a1e..f43f5ead30fa4 100644
--- a/arch/x86/configs/hulk_defconfig
+++ b/arch/x86/configs/hulk_defconfig
@@ -7331,7 +7331,10 @@ CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_HAVE_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_HAVE_ARCH_KASAN=y
-# CONFIG_KASAN is not set
+CONFIG_KASAN=y
+# CONFIG_KASAN_OUTLINE is not set
+CONFIG_KASAN_INLINE=y
+# CONFIG_TEST_KASAN is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_DEBUG_SHIRQ=y
@@ -7504,7 +7507,10 @@ CONFIG_KDB_DEFAULT_ENABLE=0x0
CONFIG_KDB_KEYBOARD=y
CONFIG_KDB_CONTINUE_CATASTROPHIC=0
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
-# CONFIG_UBSAN is not set
+CONFIG_UBSAN=y
+CONFIG_UBSAN_SANITIZE_ALL=y
+# CONFIG_UBSAN_ALIGNMENT is not set
+# CONFIG_TEST_UBSAN is not set
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
CONFIG_STRICT_DEVMEM=y
# CONFIG_IO_STRICT_DEVMEM is not set
--
2.25.1
16 Jul '21
hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA
--------------------------------
Enable KASAN and UBSAN by default for testing.
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/arm64/configs/hulk_defconfig | 10 ++++++++--
arch/x86/configs/hulk_defconfig | 10 ++++++++--
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/configs/hulk_defconfig b/arch/arm64/configs/hulk_defconfig
index 59069097da854..6e9b7ff0bf2dc 100644
--- a/arch/arm64/configs/hulk_defconfig
+++ b/arch/arm64/configs/hulk_defconfig
@@ -5542,7 +5542,10 @@ CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
CONFIG_HAVE_ARCH_KASAN=y
-# CONFIG_KASAN is not set
+CONFIG_KASAN=y
+# CONFIG_KASAN_OUTLINE is not set
+CONFIG_KASAN_INLINE=y
+# CONFIG_TEST_KASAN is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
@@ -5708,7 +5711,10 @@ CONFIG_KDB_DEFAULT_ENABLE=0x0
CONFIG_KDB_KEYBOARD=y
CONFIG_KDB_CONTINUE_CATASTROPHIC=0
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
-# CONFIG_UBSAN is not set
+CONFIG_UBSAN=y
+CONFIG_UBSAN_SANITIZE_ALL=y
+# CONFIG_UBSAN_ALIGNMENT is not set
+# CONFIG_TEST_UBSAN is not set
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
CONFIG_STRICT_DEVMEM=y
CONFIG_IO_STRICT_DEVMEM=y
diff --git a/arch/x86/configs/hulk_defconfig b/arch/x86/configs/hulk_defconfig
index b0e92300198c2..38ac853d4e31c 100644
--- a/arch/x86/configs/hulk_defconfig
+++ b/arch/x86/configs/hulk_defconfig
@@ -7320,7 +7320,10 @@ CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_HAVE_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_HAVE_ARCH_KASAN=y
-# CONFIG_KASAN is not set
+CONFIG_KASAN=y
+# CONFIG_KASAN_OUTLINE is not set
+CONFIG_KASAN_INLINE=y
+# CONFIG_TEST_KASAN is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
@@ -7495,7 +7498,10 @@ CONFIG_KDB_DEFAULT_ENABLE=0x0
CONFIG_KDB_KEYBOARD=y
CONFIG_KDB_CONTINUE_CATASTROPHIC=0
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
-# CONFIG_UBSAN is not set
+CONFIG_UBSAN=y
+CONFIG_UBSAN_SANITIZE_ALL=y
+# CONFIG_UBSAN_ALIGNMENT is not set
+# CONFIG_TEST_UBSAN is not set
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
CONFIG_STRICT_DEVMEM=y
# CONFIG_IO_STRICT_DEVMEM is not set
--
2.25.1
[PATCH openEuler-1.0-LTS] cpuidle: fix a build error when compiling haltpoll into module
by GONG, Ruiqi 16 Jul '21
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZURN
CVE: NA
--------
The kernel build fails when CONFIG_HALTPOLL_CPUIDLE=m because
haltpoll_switch_governor() is not marked as an exported symbol. Fix this
by adding the missing EXPORT_SYMBOL_GPL statement.
Fixes: 97c227885e0a ("cpuidle: fix container_of err in cpuidle_device and cpuidle_driver")
Signed-off-by: GONG, Ruiqi <gongruiqi1(a)huawei.com>
Cc: Jiajun Chen <chenjiajun8(a)huawei.com>
---
drivers/cpuidle/driver.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 484d0e9655fc..c97d08c90655 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -260,6 +260,8 @@ void haltpoll_switch_governor(struct cpuidle_driver *drv)
}
}
+EXPORT_SYMBOL_GPL(haltpoll_switch_governor);
+
/**
* cpuidle_register_driver - registers a driver
* @drv: a pointer to a valid struct cpuidle_driver
--
2.27.0
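For context, a minimal sketch (made-up names, the two halves would live
in separate objects in practice) of the export requirement being fixed:
a symbol defined in built-in code is only visible to loadable modules if
it is exported, so building the caller with =m fails at the
link/modpost stage unless the callee carries an
EXPORT_SYMBOL()/EXPORT_SYMBOL_GPL() annotation, exactly as added above
for haltpoll_switch_governor().

#include <linux/module.h>

/* Built-in side (always compiled into vmlinux). */
void example_switch_governor(void)
{
	/* ... switch the active cpuidle governor ... */
}
EXPORT_SYMBOL_GPL(example_switch_governor);

/* Modular side (a tristate driver that may be built with =m). */
static int __init example_driver_init(void)
{
	example_switch_governor();	/* needs the export above when =m */
	return 0;
}
module_init(example_driver_init);
MODULE_LICENSE("GPL");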
[PATCH openEuler-1.0-LTS 1/6] KVM: cpuid: do_cpuid_ent works on a whole CPUID function
by Yang Yingliang 16 Jul '21
From: Paolo Bonzini <pbonzini(a)redhat.com>
mainline inclusion
from mainline-v5.3-rc1
commit ab8bcf64971180e1344ce2c7e70c49b0f24f6b0d
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAEG
CVE: NA
-----------------------------
Rename it, as well as __do_cpuid_ent and __do_cpuid_ent_emulated, to have
"func" in their names, and drop the index parameter, which is always 0.
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11(a)huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/x86/kvm/cpuid.c | 105 +++++++++++++++++++++----------------------
1 file changed, 50 insertions(+), 55 deletions(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1a0b5843b0e34..fe925f146773a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -287,14 +287,19 @@ static void do_cpuid_1_ent(struct kvm_cpuid_entry2 *entry, u32 function,
{
entry->function = function;
entry->index = index;
+ entry->flags = 0;
+
cpuid_count(entry->function, entry->index,
&entry->eax, &entry->ebx, &entry->ecx, &entry->edx);
- entry->flags = 0;
}
-static int __do_cpuid_ent_emulated(struct kvm_cpuid_entry2 *entry,
- u32 func, u32 index, int *nent, int maxnent)
+static int __do_cpuid_func_emulated(struct kvm_cpuid_entry2 *entry,
+ u32 func, int *nent, int maxnent)
{
+ entry->function = func;
+ entry->index = 0;
+ entry->flags = 0;
+
switch (func) {
case 0:
entry->eax = 7;
@@ -306,21 +311,18 @@ static int __do_cpuid_ent_emulated(struct kvm_cpuid_entry2 *entry,
break;
case 7:
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
- if (index == 0)
- entry->ecx = F(RDPID);
+ entry->eax = 0;
+ entry->ecx = F(RDPID);
++*nent;
default:
break;
}
- entry->function = func;
- entry->index = index;
-
return 0;
}
-static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
- u32 index, int *nent, int maxnent)
+static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
+ int *nent, int maxnent)
{
int r;
unsigned f_nx = is_efer_nx() ? F(NX) : 0;
@@ -423,7 +425,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
if (WARN_ON(*nent >= maxnent))
goto out;
- do_cpuid_1_ent(entry, function, index);
+ do_cpuid_1_ent(entry, function, 0);
++*nent;
switch (function) {
@@ -486,42 +488,36 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
break;
case 7: {
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
- /* Mask ebx against host capability word 9 */
- if (index == 0) {
- entry->ebx &= kvm_cpuid_7_0_ebx_x86_features;
- cpuid_mask(&entry->ebx, CPUID_7_0_EBX);
- // TSC_ADJUST is emulated
- entry->ebx |= F(TSC_ADJUST);
- entry->ecx &= kvm_cpuid_7_0_ecx_x86_features;
- f_la57 = entry->ecx & F(LA57);
- cpuid_mask(&entry->ecx, CPUID_7_ECX);
- /* Set LA57 based on hardware capability. */
- entry->ecx |= f_la57;
- entry->ecx |= f_umip;
- /* PKU is not yet implemented for shadow paging. */
- if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
- entry->ecx &= ~F(PKU);
-
- entry->edx &= kvm_cpuid_7_0_edx_x86_features;
- cpuid_mask(&entry->edx, CPUID_7_EDX);
- if (boot_cpu_has(X86_FEATURE_IBPB) &&
- boot_cpu_has(X86_FEATURE_IBRS))
- entry->edx |= F(SPEC_CTRL);
- if (boot_cpu_has(X86_FEATURE_STIBP))
- entry->edx |= F(INTEL_STIBP);
- if (boot_cpu_has(X86_FEATURE_SSBD))
- entry->edx |= F(SPEC_CTRL_SSBD);
- /*
- * We emulate ARCH_CAPABILITIES in software even
- * if the host doesn't support it.
- */
- entry->edx |= F(ARCH_CAPABILITIES);
- } else {
- entry->ebx = 0;
- entry->ecx = 0;
- entry->edx = 0;
- }
entry->eax = 0;
+ /* Mask ebx against host capability word 9 */
+ entry->ebx &= kvm_cpuid_7_0_ebx_x86_features;
+ cpuid_mask(&entry->ebx, CPUID_7_0_EBX);
+ // TSC_ADJUST is emulated
+ entry->ebx |= F(TSC_ADJUST);
+ entry->ecx &= kvm_cpuid_7_0_ecx_x86_features;
+ f_la57 = entry->ecx & F(LA57);
+ cpuid_mask(&entry->ecx, CPUID_7_ECX);
+ /* Set LA57 based on hardware capability. */
+ entry->ecx |= f_la57;
+ entry->ecx |= f_umip;
+ /* PKU is not yet implemented for shadow paging. */
+ if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
+ entry->ecx &= ~F(PKU);
+
+ entry->edx &= kvm_cpuid_7_0_edx_x86_features;
+ cpuid_mask(&entry->edx, CPUID_7_EDX);
+ if (boot_cpu_has(X86_FEATURE_IBPB) &&
+ boot_cpu_has(X86_FEATURE_IBRS))
+ entry->edx |= F(SPEC_CTRL);
+ if (boot_cpu_has(X86_FEATURE_STIBP))
+ entry->edx |= F(INTEL_STIBP);
+ if (boot_cpu_has(X86_FEATURE_SSBD))
+ entry->edx |= F(SPEC_CTRL_SSBD);
+ /*
+ * We emulate ARCH_CAPABILITIES in software even
+ * if the host doesn't support it.
+ */
+ entry->edx |= F(ARCH_CAPABILITIES);
break;
}
case 9:
@@ -727,23 +723,22 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
return r;
}
-static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 func,
- u32 idx, int *nent, int maxnent, unsigned int type)
+static int do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 func,
+ int *nent, int maxnent, unsigned int type)
{
if (*nent >= maxnent)
return -E2BIG;
if (type == KVM_GET_EMULATED_CPUID)
- return __do_cpuid_ent_emulated(entry, func, idx, nent, maxnent);
+ return __do_cpuid_func_emulated(entry, func, nent, maxnent);
- return __do_cpuid_ent(entry, func, idx, nent, maxnent);
+ return __do_cpuid_func(entry, func, nent, maxnent);
}
#undef F
struct kvm_cpuid_param {
u32 func;
- u32 idx;
bool has_leaf_count;
bool (*qualifier)(const struct kvm_cpuid_param *param);
};
@@ -816,8 +811,8 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
if (ent->qualifier && !ent->qualifier(ent))
continue;
- r = do_cpuid_ent(&cpuid_entries[nent], ent->func, ent->idx,
- &nent, cpuid->nent, type);
+ r = do_cpuid_func(&cpuid_entries[nent], ent->func,
+ &nent, cpuid->nent, type);
if (r)
goto out_free;
@@ -827,8 +822,8 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
limit = cpuid_entries[nent - 1].eax;
for (func = ent->func + 1; func <= limit && nent < cpuid->nent && r == 0; ++func)
- r = do_cpuid_ent(&cpuid_entries[nent], func, ent->idx,
- &nent, cpuid->nent, type);
+ r = do_cpuid_func(&cpuid_entries[nent], func,
+ &nent, cpuid->nent, type);
if (r)
goto out_free;
--
2.25.1
[PATCH kernel-4.19 1/6] KVM: cpuid: do_cpuid_ent works on a whole CPUID function
by Yang Yingliang 15 Jul '21
From: Paolo Bonzini <pbonzini(a)redhat.com>
mainline inclusion
from mainline-v5.3-rc1
commit ab8bcf64971180e1344ce2c7e70c49b0f24f6b0d
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAEG
CVE: NA
-----------------------------
Rename it, as well as __do_cpuid_ent and __do_cpuid_ent_emulated, to have
"func" in their names, and drop the index parameter, which is always 0.
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11(a)huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/x86/kvm/cpuid.c | 107 +++++++++++++++++++++----------------------
1 file changed, 51 insertions(+), 56 deletions(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 85f439bd02f34..6775b9bd82911 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -287,14 +287,19 @@ static void do_cpuid_1_ent(struct kvm_cpuid_entry2 *entry, u32 function,
{
entry->function = function;
entry->index = index;
+ entry->flags = 0;
+
cpuid_count(entry->function, entry->index,
&entry->eax, &entry->ebx, &entry->ecx, &entry->edx);
- entry->flags = 0;
}
-static int __do_cpuid_ent_emulated(struct kvm_cpuid_entry2 *entry,
- u32 func, u32 index, int *nent, int maxnent)
+static int __do_cpuid_func_emulated(struct kvm_cpuid_entry2 *entry,
+ u32 func, int *nent, int maxnent)
{
+ entry->function = func;
+ entry->index = 0;
+ entry->flags = 0;
+
switch (func) {
case 0:
entry->eax = 7;
@@ -306,21 +311,18 @@ static int __do_cpuid_ent_emulated(struct kvm_cpuid_entry2 *entry,
break;
case 7:
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
- if (index == 0)
- entry->ecx = F(RDPID);
+ entry->eax = 0;
+ entry->ecx = F(RDPID);
++*nent;
default:
break;
}
- entry->function = func;
- entry->index = index;
-
return 0;
}
-static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
- u32 index, int *nent, int maxnent)
+static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
+ int *nent, int maxnent)
{
int r;
unsigned f_nx = is_efer_nx() ? F(NX) : 0;
@@ -423,7 +425,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
if (WARN_ON(*nent >= maxnent))
goto out;
- do_cpuid_1_ent(entry, function, index);
+ do_cpuid_1_ent(entry, function, 0);
++*nent;
switch (function) {
@@ -486,43 +488,37 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
break;
case 7: {
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
- /* Mask ebx against host capability word 9 */
- if (index == 0) {
- entry->ebx &= kvm_cpuid_7_0_ebx_x86_features;
- cpuid_mask(&entry->ebx, CPUID_7_0_EBX);
- // TSC_ADJUST is emulated
- entry->ebx |= F(TSC_ADJUST);
- entry->ecx &= kvm_cpuid_7_0_ecx_x86_features;
- f_la57 = entry->ecx & F(LA57);
- cpuid_mask(&entry->ecx, CPUID_7_ECX);
- /* Set LA57 based on hardware capability. */
- entry->ecx |= f_la57;
- entry->ecx |= f_umip;
- /* PKU is not yet implemented for shadow paging. */
- if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
- entry->ecx &= ~F(PKU);
-
- entry->edx &= kvm_cpuid_7_0_edx_x86_features;
- cpuid_mask(&entry->edx, CPUID_7_EDX);
- if (boot_cpu_has(X86_FEATURE_IBPB) &&
- boot_cpu_has(X86_FEATURE_IBRS))
- entry->edx |= F(SPEC_CTRL);
- if (boot_cpu_has(X86_FEATURE_STIBP))
- entry->edx |= F(INTEL_STIBP);
- if (boot_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) ||
- boot_cpu_has(X86_FEATURE_AMD_SSBD))
- entry->edx |= F(SPEC_CTRL_SSBD);
- /*
- * We emulate ARCH_CAPABILITIES in software even
- * if the host doesn't support it.
- */
- entry->edx |= F(ARCH_CAPABILITIES);
- } else {
- entry->ebx = 0;
- entry->ecx = 0;
- entry->edx = 0;
- }
entry->eax = 0;
+ /* Mask ebx against host capability word 9 */
+ entry->ebx &= kvm_cpuid_7_0_ebx_x86_features;
+ cpuid_mask(&entry->ebx, CPUID_7_0_EBX);
+ // TSC_ADJUST is emulated
+ entry->ebx |= F(TSC_ADJUST);
+ entry->ecx &= kvm_cpuid_7_0_ecx_x86_features;
+ f_la57 = entry->ecx & F(LA57);
+ cpuid_mask(&entry->ecx, CPUID_7_ECX);
+ /* Set LA57 based on hardware capability. */
+ entry->ecx |= f_la57;
+ entry->ecx |= f_umip;
+ /* PKU is not yet implemented for shadow paging. */
+ if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
+ entry->ecx &= ~F(PKU);
+
+ entry->edx &= kvm_cpuid_7_0_edx_x86_features;
+ cpuid_mask(&entry->edx, CPUID_7_EDX);
+ if (boot_cpu_has(X86_FEATURE_IBPB) &&
+ boot_cpu_has(X86_FEATURE_IBRS))
+ entry->edx |= F(SPEC_CTRL);
+ if (boot_cpu_has(X86_FEATURE_STIBP))
+ entry->edx |= F(INTEL_STIBP);
+ if (boot_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) ||
+ boot_cpu_has(X86_FEATURE_AMD_SSBD))
+ entry->edx |= F(SPEC_CTRL_SSBD);
+ /*
+ * We emulate ARCH_CAPABILITIES in software even
+ * if the host doesn't support it.
+ */
+ entry->edx |= F(ARCH_CAPABILITIES);
break;
}
case 9:
@@ -728,23 +724,22 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
return r;
}
-static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 func,
- u32 idx, int *nent, int maxnent, unsigned int type)
+static int do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 func,
+ int *nent, int maxnent, unsigned int type)
{
if (*nent >= maxnent)
return -E2BIG;
if (type == KVM_GET_EMULATED_CPUID)
- return __do_cpuid_ent_emulated(entry, func, idx, nent, maxnent);
+ return __do_cpuid_func_emulated(entry, func, nent, maxnent);
- return __do_cpuid_ent(entry, func, idx, nent, maxnent);
+ return __do_cpuid_func(entry, func, nent, maxnent);
}
#undef F
struct kvm_cpuid_param {
u32 func;
- u32 idx;
bool has_leaf_count;
bool (*qualifier)(const struct kvm_cpuid_param *param);
};
@@ -817,8 +812,8 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
if (ent->qualifier && !ent->qualifier(ent))
continue;
- r = do_cpuid_ent(&cpuid_entries[nent], ent->func, ent->idx,
- &nent, cpuid->nent, type);
+ r = do_cpuid_func(&cpuid_entries[nent], ent->func,
+ &nent, cpuid->nent, type);
if (r)
goto out_free;
@@ -828,8 +823,8 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
limit = cpuid_entries[nent - 1].eax;
for (func = ent->func + 1; func <= limit && nent < cpuid->nent && r == 0; ++func)
- r = do_cpuid_ent(&cpuid_entries[nent], func, ent->idx,
- &nent, cpuid->nent, type);
+ r = do_cpuid_func(&cpuid_entries[nent], func,
+ &nent, cpuid->nent, type);
if (r)
goto out_free;
--
2.25.1
[PATCH openEuler-1.0-LTS 1/6] Revert "sched/membarrier: fix NULL poiner in membarrier_global_expedited"
by Yang Yingliang 15 Jul '21
From: Li Hua <hucool.lihua(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I3UKOW
CVE: NA
-------------------------------------------------
This reverts commit 281caf306accfea76683f1dabc86749cc1de616a.
Remove the workaround that fixed the NULL pointer dereference in
membarrier_global_expedited(), to prepare for introducing the formal bug
fix.
Signed-off-by: Li Hua <hucool.lihua(a)huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian(a)huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
kernel/sched/membarrier.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index ea888ddb914f6..c4ea07e857985 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -115,7 +115,7 @@ static int membarrier_global_expedited(void)
* scheduling a kthread.
*/
p = task_rcu_dereference(&cpu_rq(cpu)->curr);
- if (p && p->flags & PF_KTHREAD)
+ if (p->flags & PF_KTHREAD)
continue;
__cpumask_set_cpu(cpu, tmpmask);
--
2.25.1
From: Xiaofei Tan <tanxiaofei(a)huawei.com>
mainline inclusion
from mainline-5.13-rc2
commit ccb5ecdc2ddeaff744ee075b54cdff8a689e8fa7
category: bugfix
bugzilla: 167378
DTS: #525
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
---------------------------
Before commit 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea()
synchronise with APEI's irq work"), do_sea() would unconditionally
signal the affected task from the arch code. Since that change,
the GHES driver sends the signals.
This exposes a problem: errors the GHES driver doesn't understand or
doesn't handle effectively are silently ignored. The errors are then
taken again and circulate endlessly, and the affected user-space task
gets stuck in this loop.
Existing firmware on Kunpeng9xx systems reports cache errors with the
'ARM Processor Error' CPER records.
Do memory failure handling for ARM Processor Error Section just like
for Memory Error Section.
Fixes: 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work")
Signed-off-by: Xiaofei Tan <tanxiaofei(a)huawei.com>
Reviewed-by: James Morse <james.morse(a)arm.com>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Chen Jun <chenjun102(a)huawei.com>
Patchwork Links: http://patchwork.huawei.com/project/hulk5.10/list/?series=16359
---
drivers/acpi/apei/ghes.c | 81 +++++++++++++++++++++++++++++++---------
1 file changed, 64 insertions(+), 17 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fce7ade2aba9..0c8330ed1ffd 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -441,28 +441,35 @@ static void ghes_kick_task_work(struct callback_head *head)
gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node, node_len);
}
-static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
- int sev)
+static bool ghes_do_memory_failure(u64 physical_addr, int flags)
{
unsigned long pfn;
- int flags = -1;
- int sec_sev = ghes_severity(gdata->error_severity);
- struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
return false;
- if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
- return false;
-
- pfn = mem_err->physical_addr >> PAGE_SHIFT;
+ pfn = PHYS_PFN(physical_addr);
if (!pfn_valid(pfn)) {
pr_warn_ratelimited(FW_WARN GHES_PFX
"Invalid address in generic error data: %#llx\n",
- mem_err->physical_addr);
+ physical_addr);
return false;
}
+ memory_failure_queue(pfn, flags);
+ return true;
+}
+
+static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
+ int sev)
+{
+ int flags = -1;
+ int sec_sev = ghes_severity(gdata->error_severity);
+ struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
+
+ if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
+ return false;
+
/* iff following two events can be handled properly by now */
if (sec_sev == GHES_SEV_CORRECTED &&
(gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
@@ -470,14 +477,56 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
flags = 0;
- if (flags != -1) {
- memory_failure_queue(pfn, flags);
- return true;
- }
+ if (flags != -1)
+ return ghes_do_memory_failure(mem_err->physical_addr, flags);
return false;
}
+static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int sev)
+{
+ struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
+ bool queued = false;
+ int sec_sev, i;
+ char *p;
+
+ log_arm_hw_error(err);
+
+ sec_sev = ghes_severity(gdata->error_severity);
+ if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
+ return false;
+
+ p = (char *)(err + 1);
+ for (i = 0; i < err->err_info_num; i++) {
+ struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p;
+ bool is_cache = (err_info->type == CPER_ARM_CACHE_ERROR);
+ bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
+ const char *error_type = "unknown error";
+
+ /*
+ * The field (err_info->error_info & BIT(26)) is fixed to set to
+ * 1 in some old firmware of HiSilicon Kunpeng920. We assume that
+ * firmware won't mix corrected errors in an uncorrected section,
+ * and don't filter out 'corrected' error here.
+ */
+ if (is_cache && has_pa) {
+ queued = ghes_do_memory_failure(err_info->physical_fault_addr, 0);
+ p += err_info->length;
+ continue;
+ }
+
+ if (err_info->type < ARRAY_SIZE(cper_proc_error_type_strs))
+ error_type = cper_proc_error_type_strs[err_info->type];
+
+ pr_warn_ratelimited(FW_WARN GHES_PFX
+ "Unhandled processor error type: %s\n",
+ error_type);
+ p += err_info->length;
+ }
+
+ return queued;
+}
+
/*
* PCIe AER errors need to be sent to the AER driver for reporting and
* recovery. The GHES severities map to the following AER severities and
@@ -605,9 +654,7 @@ static bool ghes_do_proc(struct ghes *ghes,
ghes_handle_aer(gdata);
}
else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
- struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
-
- log_arm_hw_error(err);
+ queued = ghes_handle_arm_hw_error(gdata, sev);
} else {
void *err = acpi_hest_get_payload(gdata);
--
2.20.1
[PATCH openEuler-1.0-LTS] cifs: Fix leak when handling lease break for cached root fid
by Yang Yingliang 14 Jul '21
From: Paul Aurich <paul(a)darkrain42.org>
mainline inclusion
from mainline-5.9-rc1
commit baf57b56d3604880ccb3956ec6c62ea894f5de99
category: bugfix
bugzilla: 40791
CVE: NA
---------------------------
Handling a lease break for the cached root didn't free the
smb2_lease_break_work allocation, resulting in a leak:
unreferenced object 0xffff98383a5af480 (size 128):
comm "cifsd", pid 684, jiffies 4294936606 (age 534.868s)
hex dump (first 32 bytes):
c0 ff ff ff 1f 00 00 00 88 f4 5a 3a 38 98 ff ff ..........Z:8...
88 f4 5a 3a 38 98 ff ff 80 88 d6 8a ff ff ff ff ..Z:8...........
backtrace:
[<0000000068957336>] smb2_is_valid_oplock_break+0x1fa/0x8c0
[<0000000073b70b9e>] cifs_demultiplex_thread+0x73d/0xcc0
[<00000000905fa372>] kthread+0x11c/0x150
[<0000000079378e4e>] ret_from_fork+0x22/0x30
Avoid this leak by only allocating when necessary.
Fixes: a93864d93977 ("cifs: add lease tracking to the cached root fid")
Signed-off-by: Paul Aurich <paul(a)darkrain42.org>
CC: Stable <stable(a)vger.kernel.org> # v4.18+
Reviewed-by: Aurelien Aptel <aaptel(a)suse.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
Conflicts:
fs/cifs/smb2misc.c
[ 9bd4540836 ("CIFS: Properly process SMB3 lease breaks") is not applied ]
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/cifs/smb2misc.c | 73 +++++++++++++++++++++++++++++++++-------------
1 file changed, 52 insertions(+), 21 deletions(-)
diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
index 3b2c1f2ac9563..5236636f71b54 100644
--- a/fs/cifs/smb2misc.c
+++ b/fs/cifs/smb2misc.c
@@ -509,15 +509,31 @@ cifs_ses_oplock_break(struct work_struct *work)
kfree(lw);
}
+static void
+smb2_queue_pending_open_break(struct tcon_link *tlink, __u8 *lease_key,
+ __le32 new_lease_state)
+{
+ struct smb2_lease_break_work *lw;
+
+ lw = kmalloc(sizeof(struct smb2_lease_break_work), GFP_KERNEL);
+ if (!lw) {
+ cifs_put_tlink(tlink);
+ return;
+ }
+
+ INIT_WORK(&lw->lease_break, cifs_ses_oplock_break);
+ lw->tlink = tlink;
+ lw->lease_state = new_lease_state;
+ memcpy(lw->lease_key, lease_key, SMB2_LEASE_KEY_SIZE);
+ queue_work(cifsiod_wq, &lw->lease_break);
+}
+
static bool
-smb2_tcon_has_lease(struct cifs_tcon *tcon, struct smb2_lease_break *rsp,
- struct smb2_lease_break_work *lw)
+smb2_tcon_has_lease(struct cifs_tcon *tcon, struct smb2_lease_break *rsp)
{
- bool found;
__u8 lease_state;
struct list_head *tmp;
struct cifsFileInfo *cfile;
- struct cifs_pending_open *open;
struct cifsInodeInfo *cinode;
int ack_req = le32_to_cpu(rsp->Flags &
SMB2_NOTIFY_BREAK_LEASE_FLAG_ACK_REQUIRED);
@@ -556,22 +572,29 @@ smb2_tcon_has_lease(struct cifs_tcon *tcon, struct smb2_lease_break *rsp,
&cinode->flags);
cifs_queue_oplock_break(cfile);
- kfree(lw);
return true;
}
- found = false;
+ return false;
+}
+
+static struct cifs_pending_open *
+smb2_tcon_find_pending_open_lease(struct cifs_tcon *tcon,
+ struct smb2_lease_break *rsp)
+{
+ __u8 lease_state = le32_to_cpu(rsp->NewLeaseState);
+ int ack_req = le32_to_cpu(rsp->Flags &
+ SMB2_NOTIFY_BREAK_LEASE_FLAG_ACK_REQUIRED);
+ struct cifs_pending_open *open;
+ struct cifs_pending_open *found = NULL;
+
list_for_each_entry(open, &tcon->pending_opens, olist) {
if (memcmp(open->lease_key, rsp->LeaseKey,
SMB2_LEASE_KEY_SIZE))
continue;
if (!found && ack_req) {
- found = true;
- memcpy(lw->lease_key, open->lease_key,
- SMB2_LEASE_KEY_SIZE);
- lw->tlink = cifs_get_tlink(open->tlink);
- queue_work(cifsiod_wq, &lw->lease_break);
+ found = open;
}
cifs_dbg(FYI, "found in the pending open list\n");
@@ -592,14 +615,7 @@ smb2_is_valid_lease_break(char *buffer)
struct TCP_Server_Info *server;
struct cifs_ses *ses;
struct cifs_tcon *tcon;
- struct smb2_lease_break_work *lw;
-
- lw = kmalloc(sizeof(struct smb2_lease_break_work), GFP_KERNEL);
- if (!lw)
- return false;
-
- INIT_WORK(&lw->lease_break, cifs_ses_oplock_break);
- lw->lease_state = rsp->NewLeaseState;
+ struct cifs_pending_open *open;
cifs_dbg(FYI, "Checking for lease break\n");
@@ -617,11 +633,27 @@ smb2_is_valid_lease_break(char *buffer)
spin_lock(&tcon->open_file_lock);
cifs_stats_inc(
&tcon->stats.cifs_stats.num_oplock_brks);
- if (smb2_tcon_has_lease(tcon, rsp, lw)) {
+ if (smb2_tcon_has_lease(tcon, rsp)) {
spin_unlock(&tcon->open_file_lock);
spin_unlock(&cifs_tcp_ses_lock);
return true;
}
+ open = smb2_tcon_find_pending_open_lease(tcon,
+ rsp);
+ if (open) {
+ __u8 lease_key[SMB2_LEASE_KEY_SIZE];
+ struct tcon_link *tlink;
+
+ tlink = cifs_get_tlink(open->tlink);
+ memcpy(lease_key, open->lease_key,
+ SMB2_LEASE_KEY_SIZE);
+ spin_unlock(&tcon->open_file_lock);
+ spin_unlock(&cifs_tcp_ses_lock);
+ smb2_queue_pending_open_break(tlink,
+ lease_key,
+ rsp->NewLeaseState);
+ return true;
+ }
spin_unlock(&tcon->open_file_lock);
if (tcon->crfid.is_valid &&
@@ -639,7 +671,6 @@ smb2_is_valid_lease_break(char *buffer)
}
}
spin_unlock(&cifs_tcp_ses_lock);
- kfree(lw);
cifs_dbg(FYI, "Can not process lease break - no lease matched\n");
return false;
}
--
2.25.1
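The leak pattern fixed here is worth a standalone illustration: the work
item was allocated up front, but one of the early-return paths (the
cached root fid case) handled the lease break without ever queueing or
freeing it. Allocating only on the path that actually queues the work
removes the problem. A simplified userspace sketch with made-up names:

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct break_work { char key[16]; };

static bool handled_without_work;	/* stands in for the cached-root path */

/* Buggy shape: allocate first, then take a return path that forgets it. */
static bool handle_break_buggy(const char key[16])
{
	struct break_work *w = malloc(sizeof(*w));

	if (!w)
		return false;
	if (handled_without_work)
		return true;		/* early return: w is leaked */
	memcpy(w->key, key, sizeof(w->key));
	/* ... queue w; the consumer frees it ... */
	return true;
}

/* Fixed shape: allocate only when the work item will really be queued. */
static bool handle_break_fixed(const char key[16])
{
	struct break_work *w;

	if (handled_without_work)
		return true;		/* nothing allocated, nothing leaked */
	w = malloc(sizeof(*w));
	if (!w)
		return false;
	memcpy(w->key, key, sizeof(w->key));
	/* ... queue w; the consumer frees it ... */
	return true;
}

int main(void)
{
	char key[16] = "lease-key";

	handled_without_work = true;
	return handle_break_buggy(key) && handle_break_fixed(key) ? 0 : 1;
}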
[PATCH kernel-4.19 1/6] nbd: handle device refs for DESTROY_ON_DISCONNECT properly
by Yang Yingliang 14 Jul '21
From: Josef Bacik <josef(a)toxicpanda.com>
mainline inclusion
from mainline-5.12-rc1
commit c9a2f90f4d6b
category: bugfix
bugzilla: 50455
CVE: NA
-------------------------------------------------
There exists a race where we can be attempting to create a new nbd
configuration while a previous configuration is going down, both
configured with DESTROY_ON_DISCONNECT. Normally devices all have a
reference of 1, as they won't be cleaned up until the module is torn
down. However with DESTROY_ON_DISCONNECT we'll make sure that there is
only 1 reference (generally) on the device for the config itself, and
then once the config is dropped, the device is torn down.
The race that exists looks like this
TASK1 TASK2
nbd_genl_connect()
idr_find()
refcount_inc_not_zero(nbd)
* count is 2 here ^^
nbd_config_put()
nbd_put(nbd) (count is 1)
setup new config
check DESTROY_ON_DISCONNECT
put_dev = true
if (put_dev) nbd_put(nbd)
* free'd here ^^
In nbd_genl_connect() we assume that the nbd ref count will be 2,
however clearly that won't be true if the nbd device had been setup as
DESTROY_ON_DISCONNECT with its prior configuration. Fix this by getting
rid of the runtime flag to check if we need to mess with the nbd device
refcount, and use the device NBD_DESTROY_ON_DISCONNECT flag to check if
we need to adjust the ref counts. This was reported by syzkaller with
the following kasan dump
BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:71 [inline]
BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
BUG: KASAN: use-after-free in refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76
Read of size 4 at addr ffff888143bf71a0 by task systemd-udevd/8451
CPU: 0 PID: 8451 Comm: systemd-udevd Not tainted 5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:230
__kasan_report mm/kasan/report.c:396 [inline]
kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413
check_memory_region_inline mm/kasan/generic.c:179 [inline]
check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
instrument_atomic_read include/linux/instrumented.h:71 [inline]
atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76
refcount_dec_and_mutex_lock+0x19/0x140 lib/refcount.c:115
nbd_put drivers/block/nbd.c:248 [inline]
nbd_release+0x116/0x190 drivers/block/nbd.c:1508
__blkdev_put+0x548/0x800 fs/block_dev.c:1579
blkdev_put+0x92/0x570 fs/block_dev.c:1632
blkdev_close+0x8c/0xb0 fs/block_dev.c:1640
__fput+0x283/0x920 fs/file_table.c:280
task_work_run+0xdd/0x190 kernel/task_work.c:140
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fc1e92b5270
Code: 73 01 c3 48 8b 0d 38 7d 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c1 20 00 00 75 10 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24
RSP: 002b:00007ffe8beb2d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00007fc1e92b5270
RDX: 000000000aba9500 RSI: 0000000000000000 RDI: 0000000000000007
RBP: 00007fc1ea16f710 R08: 000000000000004a R09: 0000000000000008
R10: 0000562f8cb0b2a8 R11: 0000000000000246 R12: 0000000000000000
R13: 0000562f8cb0afd0 R14: 0000000000000003 R15: 000000000000000e
Allocated by task 1:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:401 [inline]
____kasan_kmalloc.constprop.0+0x82/0xa0 mm/kasan/common.c:429
kmalloc include/linux/slab.h:552 [inline]
kzalloc include/linux/slab.h:682 [inline]
nbd_dev_add+0x44/0x8e0 drivers/block/nbd.c:1673
nbd_init+0x250/0x271 drivers/block/nbd.c:2394
do_one_initcall+0x103/0x650 init/main.c:1223
do_initcall_level init/main.c:1296 [inline]
do_initcalls init/main.c:1312 [inline]
do_basic_setup init/main.c:1332 [inline]
kernel_init_freeable+0x605/0x689 init/main.c:1533
kernel_init+0xd/0x1b8 init/main.c:1421
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
Freed by task 8451:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:356
____kasan_slab_free+0xe1/0x110 mm/kasan/common.c:362
kasan_slab_free include/linux/kasan.h:192 [inline]
slab_free_hook mm/slub.c:1547 [inline]
slab_free_freelist_hook+0x5d/0x150 mm/slub.c:1580
slab_free mm/slub.c:3143 [inline]
kfree+0xdb/0x3b0 mm/slub.c:4139
nbd_dev_remove drivers/block/nbd.c:243 [inline]
nbd_put.part.0+0x180/0x1d0 drivers/block/nbd.c:251
nbd_put drivers/block/nbd.c:295 [inline]
nbd_config_put+0x6dd/0x8c0 drivers/block/nbd.c:1242
nbd_release+0x103/0x190 drivers/block/nbd.c:1507
__blkdev_put+0x548/0x800 fs/block_dev.c:1579
blkdev_put+0x92/0x570 fs/block_dev.c:1632
blkdev_close+0x8c/0xb0 fs/block_dev.c:1640
__fput+0x283/0x920 fs/file_table.c:280
task_work_run+0xdd/0x190 kernel/task_work.c:140
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff888143bf7000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 416 bytes inside of
1024-byte region [ffff888143bf7000, ffff888143bf7400)
The buggy address belongs to the page:
page:000000005238f4ce refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x143bf0
head:000000005238f4ce order:3 compound_mapcount:0 compound_pincount:0
flags: 0x57ff00000010200(slab|head)
raw: 057ff00000010200 ffffea00004b1400 0000000300000003 ffff888010c41140
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888143bf7080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888143bf7100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888143bf7180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888143bf7200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
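As a rough illustration of the refcount rule described above, here is a
minimal userspace sketch, not the driver code itself: the long-lived
device reference is dropped or retaken only on an actual transition of
the NBD_DESTROY_ON_DISCONNECT flag, mirroring the test_and_set_bit() /
test_and_clear_bit() logic in the diff below. The helpers and the plain
refs counter are simplified stand-ins for the kernel primitives.

/*
 * Illustrative model only; names mirror the driver but this is not
 * kernel code, and refs stands in for refcount_t nbd->refs.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NBD_DESTROY_ON_DISCONNECT 0

struct nbd_device_model {
	unsigned long flags;
	int refs;
};

/* test_and_set_bit()-style helper on a plain unsigned long */
static bool test_and_set_flag(unsigned long *flags, int bit)
{
	bool old = *flags & (1UL << bit);
	*flags |= 1UL << bit;
	return old;
}

static bool test_and_clear_flag(unsigned long *flags, int bit)
{
	bool old = *flags & (1UL << bit);
	*flags &= ~(1UL << bit);
	return old;
}

/* connect/reconfigure with NBD_CFLAG_DESTROY_ON_DISCONNECT requested */
static void set_destroy_on_disconnect(struct nbd_device_model *nbd)
{
	/* Drop the long-lived ref only on the 0 -> 1 transition. */
	if (!test_and_set_flag(&nbd->flags, NBD_DESTROY_ON_DISCONNECT))
		nbd->refs--;		/* the "put_dev = true" path */
}

/* connect/reconfigure without the client flag */
static void clear_destroy_on_disconnect(struct nbd_device_model *nbd)
{
	/* Re-take the ref only on the 1 -> 0 transition. */
	if (test_and_clear_flag(&nbd->flags, NBD_DESTROY_ON_DISCONNECT))
		nbd->refs++;		/* refcount_inc(&nbd->refs) path */
}

int main(void)
{
	struct nbd_device_model nbd = { .flags = 0, .refs = 1 };

	set_destroy_on_disconnect(&nbd);	/* first connect: drops ref */
	set_destroy_on_disconnect(&nbd);	/* reconfigure: no double drop */
	assert(nbd.refs == 0);

	clear_destroy_on_disconnect(&nbd);	/* flag cleared: ref retaken */
	clear_destroy_on_disconnect(&nbd);	/* no double increment */
	assert(nbd.refs == 1);

	printf("refs balanced: %d\n", nbd.refs);
	return 0;
}
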
Reported-and-tested-by: syzbot+429d3f82d757c211bff3(a)syzkaller.appspotmail.com
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Luo Meng <luomeng12(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/block/nbd.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 55552f719b25b..01bede097ed25 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -76,8 +76,7 @@ struct link_dead_args {
#define NBD_RT_HAS_PID_FILE 3
#define NBD_RT_HAS_CONFIG_REF 4
#define NBD_RT_BOUND 5
-#define NBD_RT_DESTROY_ON_DISCONNECT 6
-#define NBD_RT_DISCONNECT_ON_CLOSE 7
+#define NBD_RT_DISCONNECT_ON_CLOSE 6
#define NBD_DESTROY_ON_DISCONNECT 0
#define NBD_DISCONNECT_REQUESTED 1
@@ -1894,12 +1893,21 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
if (info->attrs[NBD_ATTR_CLIENT_FLAGS]) {
u64 flags = nla_get_u64(info->attrs[NBD_ATTR_CLIENT_FLAGS]);
if (flags & NBD_CFLAG_DESTROY_ON_DISCONNECT) {
- set_bit(NBD_RT_DESTROY_ON_DISCONNECT,
- &config->runtime_flags);
- set_bit(NBD_DESTROY_ON_DISCONNECT, &nbd->flags);
- put_dev = true;
+ /*
+ * We have 1 ref to keep the device around, and then 1
+ * ref for our current operation here, which will be
+ * inherited by the config. If we already have
+ * DESTROY_ON_DISCONNECT set then we know we don't have
+ * that extra ref already held so we don't need the
+ * put_dev.
+ */
+ if (!test_and_set_bit(NBD_DESTROY_ON_DISCONNECT,
+ &nbd->flags))
+ put_dev = true;
} else {
- clear_bit(NBD_DESTROY_ON_DISCONNECT, &nbd->flags);
+ if (test_and_clear_bit(NBD_DESTROY_ON_DISCONNECT,
+ &nbd->flags))
+ refcount_inc(&nbd->refs);
}
if (flags & NBD_CFLAG_DISCONNECT_ON_CLOSE) {
set_bit(NBD_RT_DISCONNECT_ON_CLOSE,
@@ -2067,15 +2075,13 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
if (info->attrs[NBD_ATTR_CLIENT_FLAGS]) {
u64 flags = nla_get_u64(info->attrs[NBD_ATTR_CLIENT_FLAGS]);
if (flags & NBD_CFLAG_DESTROY_ON_DISCONNECT) {
- if (!test_and_set_bit(NBD_RT_DESTROY_ON_DISCONNECT,
- &config->runtime_flags))
+ if (!test_and_set_bit(NBD_DESTROY_ON_DISCONNECT,
+ &nbd->flags))
put_dev = true;
- set_bit(NBD_DESTROY_ON_DISCONNECT, &nbd->flags);
} else {
- if (test_and_clear_bit(NBD_RT_DESTROY_ON_DISCONNECT,
- &config->runtime_flags))
+ if (test_and_clear_bit(NBD_DESTROY_ON_DISCONNECT,
+ &nbd->flags))
refcount_inc(&nbd->refs);
- clear_bit(NBD_DESTROY_ON_DISCONNECT, &nbd->flags);
}
if (flags & NBD_CFLAG_DISCONNECT_ON_CLOSE) {
--
2.25.1
Alex Shi (1):
mm: add VM_WARN_ON_ONCE_PAGE() macro
Alper Gun (1):
KVM: SVM: Call SEV Guest Decommission if ASID binding fails
Anson Huang (1):
ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment
Christian König (1):
drm/nouveau: fix dma_address check for CPU/GPU sync
Greg Kroah-Hartman (1):
Linux 4.19.197
Hugh Dickins (16):
mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
mm/thp: make is_huge_zero_pmd() safe and quicker
mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
mm/thp: fix vma_address() if virtual address below file offset
mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
mm: page_vma_mapped_walk(): use page for pvmw->page
mm: page_vma_mapped_walk(): settle PageHuge on entry
mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
mm: page_vma_mapped_walk(): crossing page table boundary
mm: page_vma_mapped_walk(): add a level of indentation
mm: page_vma_mapped_walk(): use goto instead of while (1)
mm: page_vma_mapped_walk(): get vma_address_end() earlier
mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
mm, futex: fix shared futex pgoff on shmem huge page
Jue Wang (1):
mm/thp: fix page_address_in_vma() on file THP tails
Juergen Gross (1):
xen/events: reset active flag for lateeoi events later
ManYi Li (1):
scsi: sr: Return appropriate error code when disk is ejected
Miaohe Lin (2):
mm/rmap: remove unneeded semicolon in page_not_mapped()
mm/rmap: use page_not_mapped in try_to_unmap()
Petr Mladek (2):
kthread_worker: split code for canceling the delayed work timer
kthread: prevent deadlock when kthread_mod_delayed_work() races with
kthread_cancel_delayed_work_sync()
Tony Lindgren (3):
clocksource/drivers/timer-ti-dm: Add clockevent and clocksource
support
clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap
issue
clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940
Yang Shi (1):
mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
afzal mohammed (1):
ARM: OMAP: replace setup_irq() by request_irq()
Makefile | 2 +-
arch/arm/boot/dts/dra7.dtsi | 11 ++
arch/arm/boot/dts/imx6qdl-sabresd.dtsi | 4 -
arch/arm/mach-omap1/pm.c | 13 +-
arch/arm/mach-omap1/time.c | 10 +-
arch/arm/mach-omap1/timer32k.c | 10 +-
arch/arm/mach-omap2/board-generic.c | 4 +-
arch/arm/mach-omap2/timer.c | 181 +++++++++++++++++--------
arch/x86/kvm/svm.c | 32 +++--
drivers/clk/ti/clk-7xx.c | 1 +
drivers/gpu/drm/nouveau/nouveau_bo.c | 4 +-
drivers/scsi/sr.c | 2 +
drivers/xen/events/events_base.c | 23 +++-
include/linux/cpuhotplug.h | 1 +
include/linux/huge_mm.h | 8 +-
include/linux/hugetlb.h | 16 ---
include/linux/mm.h | 3 +
include/linux/mmdebug.h | 13 ++
include/linux/pagemap.h | 13 +-
include/linux/rmap.h | 3 +-
kernel/futex.c | 2 +-
kernel/kthread.c | 77 +++++++----
mm/huge_memory.c | 56 ++++----
mm/hugetlb.c | 5 +-
mm/internal.h | 53 ++++++--
mm/memory.c | 41 ++++++
mm/page_vma_mapped.c | 160 +++++++++++++---------
mm/pgtable-generic.c | 4 +-
mm/rmap.c | 48 ++++---
mm/truncate.c | 43 +++---
30 files changed, 541 insertions(+), 302 deletions(-)
--
2.25.1
[PATCH openEuler-1.0-LTS] mm/memcontrol.c: fix kasan slab-out-of-bounds in mem_cgroup_css_alloc
by Yang Yingliang 12 Jul '21
From: Lu Jialin <lujialin4(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 51815, https://gitee.com/openeuler/kernel/issues/I3IJ9I
CVE: NA
--------
static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
{
	...
	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
	if (!pn)
		return 1;

	pnext = to_mgpn_ext(pn);
	pnext->lruvec_stat_local = alloc_percpu(struct lruvec_stat);
}

Only sizeof(*pn) bytes are allocated here, but pnext points at the larger
struct mem_cgroup_per_node_extension, so writing pnext->lruvec_stat_local
goes out of bounds. Fix this by allocating the extension structure and
taking the embedded per-node structure from it.
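To make the allocation bug concrete, here is a small self-contained
sketch with illustrative stand-in structures rather than the real kernel
layouts: the buggy path allocates only the base structure yet treats it
as the larger extension, while the fixed path allocates the extension
and takes the embedded base from it, as the diff below does.

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative stand-ins for the kernel structures. */
struct mem_cgroup_per_node {
	long lruvec_stat_cpu;          /* placeholder field */
};

struct mem_cgroup_per_node_extension {
	struct mem_cgroup_per_node pn; /* base struct embedded first */
	void *lruvec_stat_local;       /* the extra field being written */
};

#define to_mgpn_ext(pn) \
	((struct mem_cgroup_per_node_extension *) \
	 ((char *)(pn) - offsetof(struct mem_cgroup_per_node_extension, pn)))

int main(void)
{
	/*
	 * Buggy pattern: only sizeof(*pn) bytes are allocated, so the
	 * extension's field lies past the end of the allocation.
	 */
	struct mem_cgroup_per_node *pn = calloc(1, sizeof(*pn));
	struct mem_cgroup_per_node_extension *pnext = to_mgpn_ext(pn);
	/* pnext->lruvec_stat_local = ...;  <-- would be out of bounds */
	printf("buggy: pn %p, pnext %p (same address, too-small object)\n",
	       (void *)pn, (void *)pnext);
	free(pn);

	/*
	 * Fixed pattern: allocate the extension and take the embedded
	 * base struct from it, so the extra field is inside the object.
	 */
	struct mem_cgroup_per_node_extension *pn_ext =
		calloc(1, sizeof(*pn_ext));
	pn = &pn_ext->pn;
	pn_ext->lruvec_stat_local = malloc(16);	/* in bounds now */

	printf("fixed: pn %p inside pn_ext %p\n", (void *)pn, (void *)pn_ext);
	free(pn_ext->lruvec_stat_local);
	free(pn_ext);
	return 0;
}
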
Signed-off-by: Lu Jialin <lujialin4(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
include/linux/memcontrol.h | 12 +++++-----
mm/memcontrol.c | 48 +++++++++++++++++++-------------------
2 files changed, 30 insertions(+), 30 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c7c7c0a418771..b4a25979ee8c7 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -609,11 +609,11 @@ static inline unsigned long memcg_page_state_local(struct mem_cgroup *memcg,
{
long x = 0;
int cpu;
- struct mem_cgroup_extension *mgext;
+ struct mem_cgroup_extension *memcg_ext;
- mgext = to_memcg_ext(memcg);
+ memcg_ext = to_memcg_ext(memcg);
for_each_possible_cpu(cpu)
- x += per_cpu(mgext->vmstats_local->count[idx], cpu);
+ x += per_cpu(memcg_ext->vmstats_local->count[idx], cpu);
#ifdef CONFIG_SMP
if (x < 0)
x = 0;
@@ -687,7 +687,7 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
enum node_stat_item idx)
{
struct mem_cgroup_per_node *pn;
- struct mem_cgroup_per_node_extension *pnext;
+ struct mem_cgroup_per_node_extension *pn_ext;
long x = 0;
int cpu;
@@ -695,9 +695,9 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
return node_page_state(lruvec_pgdat(lruvec), idx);
pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
- pnext = to_mgpn_ext(pn);
+ pn_ext = to_mgpn_ext(pn);
for_each_possible_cpu(cpu)
- x += per_cpu(pnext->lruvec_stat_local->count[idx], cpu);
+ x += per_cpu(pn_ext->lruvec_stat_local->count[idx], cpu);
#ifdef CONFIG_SMP
if (x < 0)
x = 0;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index da10300a6e7d7..713a839013f72 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -698,14 +698,14 @@ void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
x = val + __this_cpu_read(memcg->stat_cpu->count[idx]);
if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
struct mem_cgroup *mi;
- struct mem_cgroup_extension *mgext;
+ struct mem_cgroup_extension *memcg_ext;
/*
* Batch local counters to keep them in sync with
* the hierarchical ones.
*/
- mgext = to_memcg_ext(memcg);
- __this_cpu_add(mgext->vmstats_local->count[idx], x);
+ memcg_ext = to_memcg_ext(memcg);
+ __this_cpu_add(memcg_ext->vmstats_local->count[idx], x);
for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
atomic_long_add(x, &mi->stat[idx]);
x = 0;
@@ -739,7 +739,7 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
{
pg_data_t *pgdat = lruvec_pgdat(lruvec);
struct mem_cgroup_per_node *pn;
- struct mem_cgroup_per_node_extension *pnext;
+ struct mem_cgroup_per_node_extension *pn_ext;
struct mem_cgroup *memcg;
long x;
@@ -756,8 +756,8 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
__mod_memcg_state(memcg, idx, val);
/* Update lruvec */
- pnext = to_mgpn_ext(pn);
- __this_cpu_add(pnext->lruvec_stat_local->count[idx], val);
+ pn_ext = to_mgpn_ext(pn);
+ __this_cpu_add(pn_ext->lruvec_stat_local->count[idx], val);
x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
@@ -787,14 +787,14 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
x = count + __this_cpu_read(memcg->stat_cpu->events[idx]);
if (unlikely(x > MEMCG_CHARGE_BATCH)) {
struct mem_cgroup *mi;
- struct mem_cgroup_extension *mgext;
+ struct mem_cgroup_extension *memcg_ext;
/*
* Batch local counters to keep them in sync with
* the hierarchical ones.
*/
- mgext = to_memcg_ext(memcg);
- __this_cpu_add(mgext->vmstats_local->events[idx], x);
+ memcg_ext = to_memcg_ext(memcg);
+ __this_cpu_add(memcg_ext->vmstats_local->events[idx], x);
for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
atomic_long_add(x, &mi->events[idx]);
x = 0;
@@ -811,11 +811,11 @@ static unsigned long memcg_events_local(struct mem_cgroup *memcg, int event)
{
long x = 0;
int cpu;
- struct mem_cgroup_extension *mgext;
+ struct mem_cgroup_extension *memcg_ext;
- mgext = to_memcg_ext(memcg);
+ memcg_ext = to_memcg_ext(memcg);
for_each_possible_cpu(cpu)
- x += per_cpu(mgext->vmstats_local->events[event], cpu);
+ x += per_cpu(memcg_ext->vmstats_local->events[event], cpu);
return x;
}
@@ -4837,7 +4837,7 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
{
struct mem_cgroup_per_node *pn;
- struct mem_cgroup_per_node_extension *pnext;
+ struct mem_cgroup_per_node_extension *pn_ext;
int tmp = node;
/*
* This routine is called against possible nodes.
@@ -4849,21 +4849,21 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
*/
if (!node_state(node, N_NORMAL_MEMORY))
tmp = -1;
- pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
+ pn_ext = kzalloc_node(sizeof(*pn_ext), GFP_KERNEL, tmp);
+ pn = &pn_ext->pn;
if (!pn)
return 1;
- pnext = to_mgpn_ext(pn);
- pnext->lruvec_stat_local = alloc_percpu(struct lruvec_stat);
- if (!pnext->lruvec_stat_local) {
- kfree(pnext);
+ pn_ext->lruvec_stat_local = alloc_percpu(struct lruvec_stat);
+ if (!pn_ext->lruvec_stat_local) {
+ kfree(pn_ext);
return 1;
}
pn->lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
if (!pn->lruvec_stat_cpu) {
- free_percpu(pnext->lruvec_stat_local);
- kfree(pn);
+ free_percpu(pn_ext->lruvec_stat_local);
+ kfree(pn_ext);
return 1;
}
@@ -4879,15 +4879,15 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
static void free_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
{
struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
- struct mem_cgroup_per_node_extension *pnext;
+ struct mem_cgroup_per_node_extension *pn_ext;
if (!pn)
return;
+ pn_ext = to_mgpn_ext(pn);
free_percpu(pn->lruvec_stat_cpu);
- pnext = to_mgpn_ext(pn);
- free_percpu(pnext->lruvec_stat_local);
- kfree(pn);
+ free_percpu(pn_ext->lruvec_stat_local);
+ kfree(pn_ext);
}
static void __mem_cgroup_free(struct mem_cgroup *memcg)
--
2.25.1
[PATCH kernel-4.19] btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation
by Yang Yingliang 12 Jul '21
From: Qu Wenruo <wqu(a)suse.com>
mainline inclusion
from mainline-v5.13-rc5
commit 6d4572a9d71d5fc2affee0258d8582d39859188c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I39MZM
CVE: NA
------------------------------------------------------
[BUG]
When data space is exhausted, we still refuse to truncate an unaligned
range with ENOSPC even if the inode has the NOCOW attribute.
The following script reproduces it easily:
#!/bin/bash
dev=/dev/test/test
mnt=/mnt/btrfs
umount $dev &> /dev/null
umount $mnt &> /dev/null
mkfs.btrfs -f $dev -b 1G
mount -o nospace_cache $dev $mnt
touch $mnt/foobar
chattr +C $mnt/foobar
xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
sync
xfs_io -c "fpunch 0 2k" $mnt/foobar
umount $mnt
Currently this will fail at the fpunch part.
[CAUSE]
btrfs_truncate_block() always reserves data space without checking the
NOCOW attribute.
Since the writeback path honors the NOCOW bit, only the space
reservation code in btrfs_truncate_block() needs to change.
[FIX]
Make btrfs_truncate_block() follow btrfs_buffered_write(): try to
reserve data space first, and fall back to the NOCOW check only when
there is not enough space.
This always-try-reserve behaviour is an optimization introduced in
btrfs_buffered_write() to avoid the expensive btrfs_check_can_nocow()
call.
This patch exports check_can_nocow() as btrfs_check_can_nocow() and uses
it in btrfs_truncate_block() to fix the problem.
Reference: https://patchwork.kernel.org/project/linux-btrfs/patch/20200130052822.11765…
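A compact sketch of the reservation order adopted by the fix; the
helpers here are made-up stubs standing in for
btrfs_check_data_free_space(), btrfs_check_can_nocow() and the metadata
reservation calls, not the real btrfs API: try the data reservation
first, fall back to the NOCOW check only when it fails, and on error
release only what was actually reserved.

#include <stdbool.h>
#include <stdio.h>

/* Stubs standing in for the btrfs helpers; return values are made up. */
static int check_data_free_space(void) { return -1; /* pretend ENOSPC */ }
static int check_can_nocow(void)       { return 1;  /* NOCOW possible  */ }
static int reserve_metadata(void)      { return 0;  }
static void free_reserved_data(void)   { puts("release data reservation"); }

static bool inode_is_nocow = true; /* NODATACOW/PREALLOC set on the inode */

static int truncate_block_reserve(void)
{
	bool only_release_metadata = false;
	int ret;

	/* Always try the cheap data-space reservation first. */
	ret = check_data_free_space();
	if (ret < 0) {
		/* Fall back to NOCOW only when the reservation failed. */
		if (inode_is_nocow && check_can_nocow() > 0)
			only_release_metadata = true;	/* no data space needed */
		else
			return ret;			/* real ENOSPC */
	}

	ret = reserve_metadata();
	if (ret < 0) {
		/* Undo only what was actually reserved. */
		if (!only_release_metadata)
			free_reserved_data();
		return ret;
	}

	printf("reserved (nocow fallback: %s)\n",
	       only_release_metadata ? "yes" : "no");

	/* ... zero the block; later error paths release accordingly ... */
	return 0;
}

int main(void)
{
	return truncate_block_reserve() < 0;
}
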
Reported-by: Martin Doucha <martin.doucha(a)suse.com>
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: Anand Jain <anand.jain(a)oracle.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Conflicts:
fs/btrfs/file.c
fs/btrfs/inode.c
Signed-off-by: Gou Hao <gouhao(a)uniontech.com>
Signed-off-by: Cheng Jian <cj.chengjian(a)huawei.com>
Reviewed-by: Jiao Fenfang <jiaofenfang(a)uniontech.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/btrfs/ctree.h | 3 ++-
fs/btrfs/file.c | 8 ++++----
fs/btrfs/inode.c | 39 +++++++++++++++++++++++++++++++++------
3 files changed, 39 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4d1c12faada89..4f5c58d40a79f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3271,7 +3271,8 @@ int btrfs_dirty_pages(struct inode *inode, struct page **pages,
int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
int btrfs_clone_file_range(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out, u64 len);
-
+int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+ size_t *write_bytes);
/* tree-defrag.c */
int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
struct btrfs_root *root);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 41ad37f8062a9..3cd05edca30ce 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1536,8 +1536,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
return ret;
}
-static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
- size_t *write_bytes)
+int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+ size_t *write_bytes)
{
struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct btrfs_root *root = inode->root;
@@ -1647,7 +1647,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
if (ret < 0) {
if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
BTRFS_INODE_PREALLOC)) &&
- check_can_nocow(BTRFS_I(inode), pos,
+ btrfs_check_can_nocow(BTRFS_I(inode), pos,
&write_bytes) > 0) {
/*
* For nodata cow case, no need to reserve
@@ -1925,7 +1925,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
*/
if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
BTRFS_INODE_PREALLOC)) ||
- check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
+ btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
inode_unlock(inode);
return -EAGAIN;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index bf0e0e3e09c5d..9e3e003e4488e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4958,11 +4958,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
struct extent_state *cached_state = NULL;
struct extent_changeset *data_reserved = NULL;
char *kaddr;
+ bool only_release_metadata = false;
u32 blocksize = fs_info->sectorsize;
pgoff_t index = from >> PAGE_SHIFT;
unsigned offset = from & (blocksize - 1);
struct page *page;
gfp_t mask = btrfs_alloc_write_mask(mapping);
+ size_t write_bytes = blocksize;
int ret = 0;
u64 block_start;
u64 block_end;
@@ -4974,10 +4976,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
block_start = round_down(from, blocksize);
block_end = block_start + blocksize - 1;
- ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
- block_start, blocksize);
- if (ret)
+ ret = btrfs_check_data_free_space(inode, &data_reserved,
+ block_start, blocksize);
+ if (ret < 0) {
+ if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+ BTRFS_INODE_PREALLOC)) &&
+ btrfs_check_can_nocow(BTRFS_I(inode), block_start,
+ &write_bytes) > 0) {
+ /* For nocow case, no need to reserve data space */
+ only_release_metadata = true;
+ } else {
+ goto out;
+ }
+ }
+ ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
+ if (ret < 0) {
+ if (!only_release_metadata)
+ btrfs_free_reserved_data_space(inode, data_reserved,
+ block_start, blocksize);
goto out;
+ }
again:
page = find_or_create_page(mapping, index, mask);
@@ -5048,10 +5066,19 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
set_page_dirty(page);
unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
+ if (only_release_metadata)
+ set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
+ block_end, EXTENT_NORESERVE, NULL, NULL,
+ GFP_NOFS);
out_unlock:
- if (ret)
- btrfs_delalloc_release_space(inode, data_reserved, block_start,
- blocksize, true);
+ if (ret) {
+ if (only_release_metadata)
+ btrfs_delalloc_release_metadata(BTRFS_I(inode),
+ blocksize, true);
+ else
+ btrfs_delalloc_release_space(inode, data_reserved,
+ block_start, blocksize, true);
+ }
btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
unlock_page(page);
put_page(page);
--
2.25.1
12 Jul '21
From: Mimi Zohar <zohar(a)linux.ibm.com>
stable inclusion
from linux-4.19.196
commit ff660863628fb144badcb3395cde7821c82c13a6
CVE: CVE-2021-35039
--------------------------------
[ Upstream commit 0c18f29aae7ce3dadd26d8ee3505d07cc982df75 ]
Irrespective of whether CONFIG_MODULE_SIG is configured, specifying
"module.sig_enforce=1" on the boot command line sets "sig_enforce".
Only allow "sig_enforce" to be set when CONFIG_MODULE_SIG is configured.
This patch makes the presence of /sys/module/module/parameters/sig_enforce
dependent on CONFIG_MODULE_SIG=y.
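For illustration, a standalone sketch of the guard pattern the patch
uses; CONFIG_MODULE_SIG below is just a local preprocessor toggle and
set_module_sig_enforced() is a simplified stand-in for the kernel
helper: with the option enabled the flag is a real writable variable,
and without it the name degrades to a compile-time false, so no
parameter can be exposed or set.

#include <stdbool.h>
#include <stdio.h>

/* Uncomment to model CONFIG_MODULE_SIG=y. */
/* #define CONFIG_MODULE_SIG 1 */

#ifdef CONFIG_MODULE_SIG
static bool sig_enforce = false;	/* writable, exposed as a parameter */
void set_module_sig_enforced(void) { sig_enforce = true; }
#else
/* Without signature support the flag is a compile-time constant false. */
#define sig_enforce false
#endif

int main(void)
{
	printf("sig_enforce: %d\n", sig_enforce);
	return 0;
}
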
Fixes: fda784e50aac ("module: export module signature enforcement status")
Reported-by: Nayna Jain <nayna(a)linux.ibm.com>
Tested-by: Mimi Zohar <zohar(a)linux.ibm.com>
Tested-by: Jessica Yu <jeyu(a)kernel.org>
Signed-off-by: Mimi Zohar <zohar(a)linux.ibm.com>
Signed-off-by: Jessica Yu <jeyu(a)kernel.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
kernel/module.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/module.c b/kernel/module.c
index ad4c1d7b7a956..82c83cbf6ce6c 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -268,9 +268,18 @@ static void module_assert_mutex_or_preempt(void)
#endif
}
+#ifdef CONFIG_MODULE_SIG
static bool sig_enforce = IS_ENABLED(CONFIG_MODULE_SIG_FORCE);
module_param(sig_enforce, bool_enable_only, 0644);
+void set_module_sig_enforced(void)
+{
+ sig_enforce = true;
+}
+#else
+#define sig_enforce false
+#endif
+
/*
* Export sig_enforce kernel cmdline parameter to allow other subsystems rely
* on that instead of directly to CONFIG_MODULE_SIG_FORCE config.
--
2.25.1
[PATCH kernel-4.19 1/4] xfs: let writable tracepoint enable to clear flag of f_mode
by Yang Yingliang 12 Jul '21
From: Yufen Yu <yuyufen(a)huawei.com>
hulk inclusion
category: feature
bugzilla: 173267
CVE: NA
---------------------------
Add a new member, clear_f_mode, to struct xfs_writable_file so that
flags can also be cleared from file->f_mode.
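As a sketch of how the new field is intended to be used (the flag values
and FMODE_MASK mirror the patch, but example_hook() is a hypothetical
stand-in for whatever fills struct xfs_writable_file): the hook sets
bits in f_mode and bits to drop in clear_f_mode, and the caller applies
both, masked so only the readahead-related modes can change.

#include <stdio.h>

/* Flag values as defined in the patch (illustrative copies). */
#define FMODE_RANDOM        0x1000u
#define FMODE_WILLNEED      0x400000u
#define FMODE_SPC_READAHEAD 0x800000u
#define FMODE_MASK (FMODE_RANDOM | FMODE_WILLNEED | FMODE_SPC_READAHEAD)

struct xfs_writable_file {
	const char *name;
	unsigned int clear_f_mode;	/* flags to clear from file->f_mode */
	unsigned int f_mode;		/* flags to set in file->f_mode */
	long long i_size;
	long long prev_pos;		/* previous position, in pages */
	long long pos;			/* current position, in pages */
};

/* Hypothetical stand-in for the hook that fills the struct. */
static void example_hook(struct xfs_writable_file *file)
{
	if (file->pos == 0 && file->i_size > 0)
		file->f_mode |= FMODE_WILLNEED;		/* preload file head */
	else
		file->clear_f_mode |= FMODE_WILLNEED;	/* stop doing so */
}

int main(void)
{
	unsigned int filp_f_mode = FMODE_WILLNEED;	/* current file->f_mode */
	struct xfs_writable_file file = {
		.name = "example", .i_size = 4096, .prev_pos = 0, .pos = 1,
	};

	example_hook(&file);

	/* Caller side, as in xfs_file_buffered_aio_read(): */
	filp_f_mode |= file.f_mode & FMODE_MASK;
	filp_f_mode &= ~(file.clear_f_mode & FMODE_MASK);

	printf("f_mode now 0x%x\n", filp_f_mode);
	return 0;
}
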
Signed-off-by: Yufen Yu <yuyufen(a)huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/xfs/xfs_file.c | 10 +++++++---
include/linux/fs.h | 9 ++++++---
include/uapi/linux/xfs.h | 4 +++-
tools/include/uapi/linux/xfs.h | 2 +-
4 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index ffc388c8b4523..bd8ae4df20042 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -35,6 +35,8 @@
#include <linux/mman.h>
#include <linux/fadvise.h>
+#define FMODE_MASK (FMODE_RANDOM | FMODE_WILLNEED | FMODE_SPC_READAHEAD)
+
static const struct vm_operations_struct xfs_file_vm_ops;
int
@@ -238,15 +240,17 @@ xfs_file_buffered_aio_read(
struct xfs_writable_file file;
file.name = file_dentry(filp)->d_name.name;
+ file.clear_f_mode = 0;
file.f_mode = 0;
file.i_size = file_inode(filp)->i_size;
- file.prev_pos = filp->f_ra.prev_pos;
+ file.prev_pos = filp->f_ra.prev_pos >> PAGE_SHIFT;
+ file.pos = iocb->ki_pos >> PAGE_SHIFT;
trace_xfs_file_buffered_read(ip, iov_iter_count(to), iocb->ki_pos);
trace_xfs_file_read(&file, ip, iov_iter_count(to), iocb->ki_pos);
- if (file.f_mode)
- filp->f_mode |= file.f_mode;
+ filp->f_mode |= file.f_mode & FMODE_MASK;
+ filp->f_mode &= ~(file.clear_f_mode & FMODE_MASK);
if (iocb->ki_flags & IOCB_NOWAIT) {
if (!xfs_ilock_nowait(ip, XFS_IOLOCK_SHARED))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f5bc43ac95035..394da46d143c2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -160,6 +160,12 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
/* File is stream-like */
#define FMODE_STREAM ((__force fmode_t)0x200000)
+/* File will try to read head of the file into pagecache */
+#define FMODE_WILLNEED ((__force fmode_t)0x400000)
+
+/* File will do special readahead */
+#define FMODE_SPC_READAHEAD ((__force fmode_t)0x800000)
+
/* File was opened by fanotify and shouldn't generate fanotify events */
#define FMODE_NONOTIFY ((__force fmode_t)0x4000000)
@@ -169,9 +175,6 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
/* File does not contribute to nr_files count */
#define FMODE_NOACCOUNT ((__force fmode_t)0x20000000)
-/* File will try to read head of the file into pagecache */
-#define FMODE_WILLNEED ((__force fmode_t)0x40000000)
-
/*
* Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
* that indicates that they should check the contents of the iovec are
diff --git a/include/uapi/linux/xfs.h b/include/uapi/linux/xfs.h
index 635a83914273b..0a11c2344e5a3 100644
--- a/include/uapi/linux/xfs.h
+++ b/include/uapi/linux/xfs.h
@@ -7,9 +7,11 @@
struct xfs_writable_file {
const unsigned char *name;
+ unsigned int clear_f_mode; /* can be cleared from file->f_mode */
unsigned int f_mode; /* can be set into file->f_mode */
long long i_size; /* file size */
- long long prev_pos; /* ra->prev_pos */
+ long long prev_pos; /* ra->prev_pos page index */
+ long long pos; /* iocb->ki_pos page index */
};
#endif /* _UAPI_LINUX_XFS_H */
diff --git a/tools/include/uapi/linux/xfs.h b/tools/include/uapi/linux/xfs.h
index f333a2eb74074..2c4c61d5ba539 100644
--- a/tools/include/uapi/linux/xfs.h
+++ b/tools/include/uapi/linux/xfs.h
@@ -5,7 +5,7 @@
#include <linux/types.h>
#define FMODE_RANDOM (0x1000)
-#define FMODE_WILLNEED (0x40000000)
+#define FMODE_WILLNEED (0x400000)
struct xfs_writable_file {
const unsigned char *name;
--
2.25.1
09 Jul '21
From: yangerkun <yangerkun(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 172974
CVE: NA
---------------------------
Commit 72c9e4df6a99 ('jbd2: ensure abort the journal if detect IO error
when writing original buffer back') added 'j_atomic_flags' to struct
journal_s, which breaks KABI for interfaces such as
jbd2_journal_destroy() and jbd2_journal_abort().
Fix it by adding a wrapper structure that holds the new field outside
of struct journal_s.
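The wrapper follows the usual KABI-preserving pattern; below is a
minimal generic sketch, not the jbd2 code itself: the new field lives in
a wrapper that embeds the original structure as its first member, the
wrapper is what gets allocated and freed, and code that only holds a
pointer to the embedded structure recovers the wrapper with
container_of(). Because the embedded journal keeps its layout, existing
users of journal_t * are unaffected.

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Original structure whose layout must not change (KABI). */
struct journal_s {
	unsigned long j_flags;
};

/* Wrapper carrying the new field outside of struct journal_s. */
struct journal_wrapper_s {
	struct journal_s jw_journal;	/* real journal, embedded first */
	unsigned long j_atomic_flags;	/* the field that would break KABI */
};

static struct journal_s *journal_init_common(void)
{
	struct journal_wrapper_s *jw = calloc(1, sizeof(*jw));

	if (!jw)
		return NULL;
	return &jw->jw_journal;		/* callers keep seeing journal_s * */
}

static void journal_destroy(struct journal_s *journal)
{
	/* Recover the wrapper from the embedded journal pointer. */
	struct journal_wrapper_s *jw =
		container_of(journal, struct journal_wrapper_s, jw_journal);

	printf("atomic flags at destroy: %lu\n", jw->j_atomic_flags);
	free(jw);			/* free the allocation we made */
}

int main(void)
{
	struct journal_s *journal = journal_init_common();

	if (!journal)
		return 1;
	journal_destroy(journal);
	return 0;
}
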
Signed-off-by: yangerkun <yangerkun(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/jbd2/checkpoint.c | 5 ++++-
fs/jbd2/journal.c | 19 +++++++++++++------
include/linux/jbd2.h | 23 ++++++++++++++++++-----
3 files changed, 35 insertions(+), 12 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index b1af15ad36dcb..f2c36c9c58be3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -562,6 +562,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
struct transaction_chp_stats_s *stats;
transaction_t *transaction;
journal_t *journal;
+ journal_wrapper_t *journal_wrapper;
struct buffer_head *bh = jh2bh(jh);
JBUFFER_TRACE(jh, "entry");
@@ -572,6 +573,8 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 0;
}
journal = transaction->t_journal;
+ journal_wrapper = container_of(journal, journal_wrapper_t,
+ jw_journal);
JBUFFER_TRACE(jh, "removing from transaction");
@@ -583,7 +586,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
* journal here and we abort the journal later from a better context.
*/
if (buffer_write_io_error(bh))
- set_bit(JBD2_CHECKPOINT_IO_ERROR, &journal->j_atomic_flags);
+ set_bit(JBD2_CHECKPOINT_IO_ERROR, &journal_wrapper->j_atomic_flags);
__buffer_unlink(jh);
jh->b_cp_transaction = NULL;
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 89fad4c3e13cb..ef9a942fc9a1a 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1129,14 +1129,17 @@ static journal_t *journal_init_common(struct block_device *bdev,
{
static struct lock_class_key jbd2_trans_commit_key;
journal_t *journal;
+ journal_wrapper_t *journal_wrapper;
int err;
struct buffer_head *bh;
int n;
- journal = kzalloc(sizeof(*journal), GFP_KERNEL);
- if (!journal)
+ journal_wrapper = kzalloc(sizeof(*journal_wrapper), GFP_KERNEL);
+ if (!journal_wrapper)
return NULL;
+ journal = &(journal_wrapper->jw_journal);
+
init_waitqueue_head(&journal->j_wait_transaction_locked);
init_waitqueue_head(&journal->j_wait_done_commit);
init_waitqueue_head(&journal->j_wait_commit);
@@ -1195,7 +1198,7 @@ static journal_t *journal_init_common(struct block_device *bdev,
err_cleanup:
kfree(journal->j_wbuf);
jbd2_journal_destroy_revoke(journal);
- kfree(journal);
+ kfree(journal_wrapper);
return NULL;
}
@@ -1425,11 +1428,13 @@ int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
unsigned long tail_block, int write_op)
{
journal_superblock_t *sb = journal->j_superblock;
+ journal_wrapper_t *journal_wrapper = container_of(journal,
+ journal_wrapper_t, jw_journal);
int ret;
if (is_journal_aborted(journal))
return -EIO;
- if (test_bit(JBD2_CHECKPOINT_IO_ERROR, &journal->j_atomic_flags)) {
+ if (test_bit(JBD2_CHECKPOINT_IO_ERROR, &journal_wrapper->j_atomic_flags)) {
jbd2_journal_abort(journal, -EIO);
return -EIO;
}
@@ -1754,6 +1759,8 @@ int jbd2_journal_load(journal_t *journal)
int jbd2_journal_destroy(journal_t *journal)
{
int err = 0;
+ journal_wrapper_t *journal_wrapper = container_of(journal,
+ journal_wrapper_t, jw_journal);
/* Wait for the commit thread to wake up and die. */
journal_kill_thread(journal);
@@ -1795,7 +1802,7 @@ int jbd2_journal_destroy(journal_t *journal)
* may become inconsistent.
*/
if (!is_journal_aborted(journal) &&
- test_bit(JBD2_CHECKPOINT_IO_ERROR, &journal->j_atomic_flags))
+ test_bit(JBD2_CHECKPOINT_IO_ERROR, &journal_wrapper->j_atomic_flags))
jbd2_journal_abort(journal, -EIO);
if (journal->j_sb_buffer) {
@@ -1823,7 +1830,7 @@ int jbd2_journal_destroy(journal_t *journal)
if (journal->j_chksum_driver)
crypto_free_shash(journal->j_chksum_driver);
kfree(journal->j_wbuf);
- kfree(journal);
+ kfree(journal_wrapper);
return err;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 5c0446f22bee1..ef213666c3a3b 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -105,6 +105,8 @@ typedef struct jbd2_journal_handle handle_t; /* Atomic operation type */
* This is an opaque datatype.
**/
typedef struct journal_s journal_t; /* Journal control structure */
+
+typedef struct journal_wrapper_s journal_wrapper_t;
#endif
/*
@@ -780,11 +782,6 @@ struct journal_s
*/
unsigned long j_flags;
- /**
- * @j_atomic_flags: Atomic journaling state flags.
- */
- unsigned long j_atomic_flags;
-
/**
* @j_errno:
*
@@ -1199,6 +1196,22 @@ struct journal_s
#endif
};
+/**
+ * struct journal_wrapper_s - The wrapper of journal_s to fix KABI.
+ */
+struct journal_wrapper_s
+{
+ /**
+ * @jw_journal: real journal.
+ */
+ journal_t jw_journal;
+
+ /**
+ * @j_atomic_flags: Atomic journaling state flags.
+ */
+ unsigned long j_atomic_flags;
+};
+
#define jbd2_might_wait_for_commit(j) \
do { \
rwsem_acquire(&j->j_trans_commit_map, 0, 0, _THIS_IP_); \
--
2.25.1