
11 Sep '21
Enable Intel Icelake CPU support by backporting the related patches
from upstream.
Fix the kABI breakage introduced by the patch set with patches 0288 - 0292.
Alexander Shishkin (28):
Intel: perf/ring_buffer: Fix AUX software double buffering
Intel: perf/x86/intel/pt: Remove software double buffering PMU
capability
intel_th: Only create useful device nodes
intel_th: Rework resource passing between glue layers and core
intel_th: Skip subdevices if their MMIO is missing
intel_th: Add "rtit" source device
intel_th: Communicate IRQ via resource
intel_th: pci: Use MSI interrupt signalling
intel_th: msu: Start handling IRQs
intel_th: Only report useful IRQs to subdevices
intel_th: msu: Replace open-coded list_{first,last,next}_entry
variants
intel_th: msu: Switch over to scatterlist
intel_th: msu: Factor out pipeline draining
intel_th: gth: Factor out trace start/stop
intel_th: Add switch triggering support
intel_th: msu: Correct the block wrap detection
intel_th: msu: Add a sysfs attribute to trigger window switch
intel_th: msu: Add current window tracking
intel_th: msu: Support multipage blocks
intel_th: msu: Split sgt array and pointer in multiwindow mode
intel_th: msu: Start read iterator from a non-empty window
intel_th: msu: Introduce buffer interface
intel_th: msu-sink: An example msu buffer "sink"
intel_th: gth: Fix the window switching sequence
intel_th: msu: Fix an uninitialized mutex
intel_th: Fix freeing IRQs
intel_th: msu: Fix window switching without windows
intel_th: msu: Fix the unexpected state warning
Alexandru Gagniuc (1):
Intel:PCI: pciehp: Wait for PDS if in-band presence is disabled
Alison Schofield (1):
acpi/hmat: Update acpi_hmat_type enum with ACPI_HMAT_TYPE_PROXIMITY
Andi Kleen (3):
Intel: perf/x86/intel: Extract memory code PEBS parser for reuse
Intel: perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles
them
Intel: perf tools x86: Add support for recording and printing XMM
registers
Andrew Murray (2):
Intel: perf/core: Add function to test for event exclusion flags
Intel: perf/core: Add PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable
PMUs
Andy Lutomirski (1):
x86/traps: Stop using ist_enter/exit() in do_int3()
Andy Shevchenko (4):
Intel:PCI/AER: Use match_string() helper to simplify the code
Intel:PCI/AER: Use for_each_set_bit() to simplify code
Intel:PCI/AER: Fix kernel-doc warnings
intel_th: pti: Use sysfs_match_string() helper
Aneesh Kumar K.V (1):
drivers/dax: Allow to include DEV_DAX_PMEM as builtin
Arnaldo Carvalho de Melo (3):
Intel: perf record: Fix suggestion to get list of registers usable
with --user-regs and --intr-regs
Intel: perf parse-regs: Improve error output when faced with unknown
register name
tools x86 uapi asm: Sync the pt_regs.h copy with the kernel sources
Bharat Kumar Gogada (1):
Intel:PCI: Enable SERR# forwarding for all bridges
Bjorn Helgaas (5):
PCI: Add pci_speed_string()
PCI: Use pci_speed_string() for all PCI/PCI-X/PCIe strings
Intel:PCI/ASPM: Save LTR Capability for suspend/resume
Intel:PCI/portdrv: Use conventional Device ID table formatting
Intel:PCI: Use dev_printk() when possible
Chen Yu (3):
Intel: intel_idle: Customize IceLake server support
tools/power turbostat: Support Ice Lake server
intel_idle: Fix max_cstate for processor models without C-state tables
Colin Ian King (2):
intel_th: msu: Fix missing allocation failure check on a kstrndup
intel_th: msu: Fix overflow in shift of an unsigned int
Dan Carpenter (2):
tools/power/x86/intel-speed-select: Fix a read overflow in
isst_set_tdp_level_msr()
node: fix device cleanups in error handling code
Dan Williams (11):
Intel: device-dax: Kill dax_region ida
Intel: device-dax: Kill dax_region base
Intel: device-dax: Remove multi-resource infrastructure
Intel: device-dax: Start defining a dax bus model
Intel: device-dax: Introduce bus + driver model
Intel: device-dax: Move resource pinning+mapping into the common
driver
Intel: device-dax: Add support for a dax override driver
Intel: device-dax: Add /sys/class/dax backwards compatibility
Intel: acpi/nfit, device-dax: Identify differentiated memory with a
unique numa-node
Intel: device-dax: Auto-bind device after successful new_id
Intel: device-dax: Add a 'target_node' attribute
Dave Hansen (4):
Intel: mm/resource: Move HMM pr_debug() deeper into resource code
Intel: mm/memory-hotplug: Allow memory resources to be children
mm/resource: Let walk_system_ram_range() search child resources
Intel: device-dax: "Hotplug" persistent memory for use like normal RAM
Dave Jiang (7):
Intel: dmaengine: ioatdma: Add Snow Ridge ioatdma device id
Intel: dmaengine: ioatdma: disable DCA enabling on IOATDMA v3.4
Intel: dmaengine: ioatdma: add descriptor pre-fetch support for v3.4
Intel: dmaengine: ioatdma: support latency tolerance report (LTR) for
v3.4
Intel: ntb: intel: Add Icelake (gen4) support for Intel NTB
Intel: ntb: intel: fix static declaration
Intel: ntb: intel: add hw workaround for NTB BAR alignment
Dinghao Liu (1):
ntb: intel: Fix memleak in intel_ntb_pci_probe
Dongdong Liu (1):
PCI/AER: Initialize aer_fifo
Erik Schmauss (1):
Intel: ACPICA: ACPI 6.3: HMAT updates
Felipe Balbi (1):
Intel: PCI: Add support for Immediate Readiness
Frederick Lawler (1):
Intel:PCI/DPC: Log messages with pci_dev, not pcie_device
Guoqing Jiang (11):
Intel: intel_idle: Use ACPI _CST on server systems
Intel: perf/x86/intel: Add more Icelake CPUIDs
Intel:PCI/AER: Use threaded IRQ for bottom half
Intel:PCI/AER: Use managed resource allocations
Intel:PCI/AER: Log messages with pci_dev, not pcie_device
Intel:PCI: pciehp: Disable in-band presence detect when possible
perf/x86/intel: Export mem events only if there's PEBS support
Intel: perf/x86: Use the new pmu::update_attrs attribute group
intel: perf/x86/intel: Use update attributes for skylake format
Intel: perf/x86/amd: Constrain Large Increment per Cycle events
Intel: perf/x86/intel: Support TopDown metrics on Ice Lake
Gustavo A. R. Silva (1):
intel_th: Mark expected switch fall-throughs
Gustavo Pimentel (1):
PCI: Decode PCIe 32 GT/s link speed
Honghui Zhang (1):
Intel:PCI/portdrv: Support PCIe services on subtractive decode bridges
Jann Horn (2):
Intel: x86/extable: Introduce _ASM_EXTABLE_UA for uaccess fixups
Intel: x86/insn-eval: Add support for 64-bit kernel mode
Jay Fang (1):
PCI/PME: Fix kernel-doc of pcie_pme_resume() and pcie_pme_remove()
Jiri Olsa (7):
Intel: sysfs: Add sysfs_update_groups function
Intel: perf/core: Add attr_groups_update into struct pmu
Intel: perf/x86: Get rid of x86_pmu::event_attrs
Intel: perf/x86: Add is_visible attribute_group callback for base
events
Intel: perf/x86: Use update attribute groups for caps
Intel: perf/x86: Use update attribute groups for extra format
Intel: perf/x86: Use update attribute groups for default attributes
Jonathan Corbet (1):
docs: fix numaperf.rst and add it to the doc tree
Kan Liang (39):
perf/x86/intel: Add Icelake support
perf/x86/intel: Add Tremont core PMU support
Intel: perf/x86: Support outputting XMM registers
Intel: perf/x86/intel/ds: Extract code of event update in short period
Intel: perf/x86/intel: Support adaptive PEBS v4
Intel: perf/x86/intel/uncore: Add Intel Icelake uncore support
Intel: perf parse-regs: Split parse_regs
Intel: perf parse-regs: Add generic support for
arch__intr/user_reg_mask()
Intel: perf regs x86: Add X86 specific arch__intr_reg_mask()
Intel: perf/x86: Disable extended registers for non-supported PMUs
Intel: perf/x86/regs: Check reserved bits
Intel: perf/x86: Clean up PEBS_XMM_REGS
Intel: perf/x86: Remove pmu->pebs_no_xmm_regs
Intel: perf/x86/regs: Use PERF_REG_EXTENDED_MASK
Intel: perf/x86/intel/uncore: Add uncore support for Snow Ridge server
Intel: perf/x86/intel/uncore: Factor out box ref/unref functions
Intel: perf/x86/intel/uncore: Support MMIO type uncore blocks
Intel: perf/x86/intel/uncore: Clean up client IMC
Intel: perf/x86/intel/uncore: Add IMC uncore support for Snow Ridge
Intel: perf/x86/intel/uncore: Factor out __snr_uncore_mmio_init_box
Intel: perf/x86/intel/uncore: Add box_offsets for free-running
counters
Intel: perf/x86/intel/uncore: Add Ice Lake server uncore support
Intel: perf/x86/intel: Factor out common code of PMI handler
Intel: perf/x86/intel: Fix SLOTS PEBS event constraint
Intel: perf/x86: Use event_base_rdpmc for the RDPMC userspace support
Intel: perf/x86/intel: Name the global status bit in NMI handler
Intel: perf/x86/intel: Introduce the fourth fixed counter
Intel: perf/x86/intel: Move BTS index to 47
Intel: perf/x86/intel: Fix the name of perf METRICS
Intel: perf/x86/intel: Use switch in intel_pmu_disable/enable_event
Intel: perf/core: Add a new PERF_EV_CAP_SIBLING event capability
Intel: perf/x86/intel: Generic support for hardware TopDown metrics
Intel: perf/x86: Add a macro for RDPMC offset of fixed counters
Intel: perf/x86/intel: Support per-thread RDPMC TopDown metrics
Intel: perf/x86/intel: Check perf metrics feature for each CPU
perf/x86/intel/uncore: Fix missing marker for
snr_uncore_imc_freerunning_events
perf/x86/intel/uncore: Reduce the number of CBOX counters
perf/x86/intel/uncore: Fix the scale of the IMC free-running events
perf/x86/intel/uncore: Fix M2M event umask for Ice Lake server
Keith Busch (23):
PCI/AER: Remove error source from AER struct aer_rpc
PCI/AER: Use kfifo for tracking events instead of reimplementing it
Intel: acpi: Create subtable parsing infrastructure
Intel: acpi: Add HMAT to generic parsing tables
Intel: acpi/hmat: Parse and report heterogeneous memory
node: Link memory nodes to their compute nodes
Intel: node: Add heterogenous memory access attributes
Intel: node: Add memory-side caching attributes
Intel: acpi/hmat: Register processor domain to its memory
Intel: acpi/hmat: Register performance attributes
Intel: acpi/hmat: Register memory side cache attributes
Intel: doc/mm: New documentation for memory performance
PCI: portdrv: Restore PCI config state on slot reset
Intel:PCI/DPC: Save and restore config state
PCI/ERR: Handle fatal error recovery
PCI/ERR: Simplify broadcast callouts
Intel:PCI/ERR: Always report current recovery status for udev
PCI: Unify device inaccessible
Intel:PCI: Make link active reporting detection generic
Intel:PCI/AER: Remove unused aer_error_resume()
Intel:PCI/AER: Use kfifo_in_spinlocked() to insert locked elements
Intel:PCI/AER: Reuse existing pcie_port_find_device() interface
Intel:PCI/AER: Abstract AER interrupt handling
Kim Phillips (2):
perf/x86/amd: Add support for Large Increment per Cycle Events
perf/x86/amd: Fix sampling Large Increment per Cycle events
Kuppuswamy Sathyanarayanan (2):
PCI/ERR: Combine pci_channel_io_frozen cases
PCI/ERR: Update error status after reset_link()
Len Brown (9):
Intel: topology: Simplify cputopology.txt formatting and wording
Intel: x86/topology: Add CPUID.1F multi-die/package support
Intel: x86/topology: Create topology_max_die_per_package()
Intel: cpu/topology: Export die_id
Intel: x86/topology: Define topology_die_id()
Intel: x86/topology: Define topology_logical_die_id()
tools/power turbostat: reduce debug output
tools/power turbostat: consolidate duplicate model numbers
tools/power turbostat: Fix Haswell Core systems
Leonid Ravich (1):
Intel: NTB: add new parameter to peer_db_addr() db_bit and db_data
Like Xu (4):
perf/x86/core: Refactor hw->idx checks and cleanup
Intel: perf/x86/lbr: Add interface to get LBR information
Intel: perf/x86: Add constraint to create guest LBR event without hw
counter
Intel: perf/x86: Keep LBR records unchanged in host context for guest
usage
Lukas Wunner (1):
PCI: Simplify disconnected marking
Mel Gorman (1):
intel_idle: Ignore _CST if control cannot be taken from the platform
Mika Westerberg (2):
Intel:PCI: Make pcie_downstream_port() available outside of access.c
Intel:PCI: Get rid of dev->has_secondary_link flag
Mohan Kumar (2):
Intel:PCI: Replace printk(KERN_INFO) with pr_info(), etc
Intel:PCI: Replace dev_printk(KERN_DEBUG) with dev_info(), etc
Olof Johansson (1):
Intel:PCI/DPC: Add "pcie_ports=dpc-native" to allow DPC without AER
control
Otto Sabart (1):
doc: trace: fix reference to cpuidle documentation file
Patel, Mayurkumar (1):
Intel:PCI/AER: Save AER Capability for suspend/resume
Pavel Tatashin (1):
device-dax: fix memory and resource leak if hotplug fails
Peter Zijlstra (4):
perf/x86: Support constraint ranges
Intel: perf/x86: Fix n_metric for cancelled txn
Intel: x86/uaccess: Move copy_user_handle_tail() into asm
Intel: hardirq/nmi: Allow nested nmi_enter()
Qian Cai (2):
acpi/hmat: fix memory leaks in hmat_init()
acpi/hmat: fix an uninitialized memory_target
Qiuxu Zhuo (9):
Intel: EDAC, skx_common: Separate common code out from skx_edac
EDAC, skx_edac: Delete duplicated code
Intel: EDAC, i10nm: Add a driver for Intel 10nm server processors
Intel: EDAC, skx, i10nm: Make skx_common.c a pure library
Intel: EDAC, i10nm: Add Intel additional Ice-Lake support
Intel: EDAC, i10nm: Check ECC enabling status per channel
Intel: EDAC, skx, i10nm: Fix source ID register offset
Intel: EDAC, {skx,i10nm}: Make some configurations CPU model specific
Intel: EDAC/i10nm: Update driver to support different bus number
config register offsets
Rafael J. Wysocki (12):
Intel: ACPI: processor: Export function to claim _CST control
Intel: ACPI: processor: Introduce acpi_processor_evaluate_cst()
Intel: ACPI: processor: Clean up acpi_processor_evaluate_cst()
Intel: ACPI: processor: Export acpi_processor_evaluate_cst()
Intel: intel_idle: Refactor intel_idle_cpuidle_driver_init()
Intel: intel_idle: Use ACPI _CST for processor models without C-state
tables
Intel: Documentation: admin-guide: PM: Add cpuidle document
Intel: cpuidle: Allow idle states to be disabled by default
Intel: intel_idle: Allow ACPI _CST to be used for selected known
processors
Intel: intel_idle: Add module parameter to prevent ACPI _CST from
being used
Intel: ACPI: processor: Make ACPI_PROCESSOR_CSTATE depend on
ACPI_PROCESSOR
Intel: Documentation: admin-guide: PM: Add intel_idle document
Shaokun Zhang (1):
intel_th: msu: Fix unused variable warning on arm64 platform
Srinivas Pandruvada (13):
Intel: platform/x86: ISST: Update ioctl-number.txt for Intel Speed
Select interface
Intel: platform/x86: ISST: Add common API to register and handle
ioctls
Intel: platform/x86: ISST: Store per CPU information
Intel: platform/x86: ISST: Add IOCTL to Translate Linux logical CPU to
PUNIT CPU number
Intel: platform/x86: ISST: Add Intel Speed Select mmio interface
Intel: platform/x86: ISST: Add Intel Speed Select mailbox interface
via PCI
Intel: platform/x86: ISST: Add Intel Speed Select mailbox interface
via MSRs
Intel: platform/x86: ISST: Add Intel Speed Select PUNIT MSR interface
Intel: platform/x86: ISST: Restore state on resume
Intel: tools/power/x86: A tool to validate Intel Speed Select commands
Intel: ICX: platform/x86: ISST: Allow additional core-power mailbox
commands
Intel: ICX: platform/x86: ISST: Fix wrong unregister type
Intel: platform/x86: ISST: Increase timeout
Stephen Rothwell (1):
intel_rapl: need linux/cpuhotplug.h for enum cpuhp_state
Stuart Hayes (1):
PCI: pciehp: Add DMI table for in-band presence detection disabled
Subbaraya Sundeep (2):
Intel:PCI: Assign bus numbers present in EA capability for bridges
PCI: Do not use bus number zero from EA capability
Sumeet Pawnikar (1):
powercap: RAPL: remove unused local MSR define
Thomas Gleixner (2):
genirq: Provide interrupt injection mechanism
Intel:PCI/AER: Fix the broken interrupt injection
Tony Luck (5):
Intel: EDAC, skx_common: Add code to recognise new compound error code
Intel: EDAC, skx_common: Refactor so that we initialize "dev" in
result of adxl decode.
Intel: EDAC, skx: Retrieve and print retry_rd_err_log registers
MAINTAINERS: Update entry for EDAC-SKYLAKE
MAINTAINERS: Add entry for EDAC-I10NM
Vishal Verma (1):
Intel: device-dax: Add a 'modalias' attribute to DAX 'bus' devices
Wang Hai (1):
device-dax/core: Fix memory leak when rmmod dax.ko
Wei Li (1):
kabi: Fix "Intel: perf/core: Add attr_groups_update into struct pmu"
Wei Wang (1):
Intel: perf/x86: Fix variable types for LBR registers
Wei Yongjun (1):
intel_th: msu: Fix possible memory leak in mode_store()
Yangtao Li (1):
Intel: cpuidle: use BIT() for idle state flags and remove
CPUIDLE_DRIVER_FLAGS_MASK
Yanjiang Jin (1):
Intel:PCI/AER: Queue one GHES event, not several uninitialized ones
Yicong Yang (3):
PCI/AER: Log which device prevents error recovery
PCI: Add 32 GT/s decoding in some macros
PCI: Add PCIE_LNKCAP2_SLS2SPEED() macro
YueHaibing (2):
Intel:PCI/ERR: Remove duplicated include from err.c
intel_th: msu: Remove set but not used variable 'last'
Yunying Sun (1):
Intel: perf/x86/intel: Fix invalid Bit 13 for Icelake
MSR_OFFCORE_RSP_x register
Zhang Rui (18):
Intel: powercap/intel_rapl: Simplify rapl_find_package()
Intel: powercap/intel_rapl: Support multi-die/package
Intel: powercap/intel_rapl: Update RAPL domain name and debug messages
Intel: intel_rapl: use reg instead of msr
Intel: intel_rapl: remove hardcoded register index
Intel: intel_rapl: introduce intel_rapl.h
Intel: intel_rapl: introduce struct rapl_if_private
Intel: intel_rapl: abstract register address
Intel: intel_rapl: abstract register access operations
Intel: intel_rapl: cleanup some functions
Intel: intel_rapl: cleanup hardcoded MSR access
intel_rapl: abstract RAPL common code
Intel: intel_rapl: support 64 bit register
Intel: intel_rapl: support two power limits for every RAPL domain
Intel: intel_rapl: Fix module autoloading issue
Intel: powercap/intel_rapl: add support for IceLake desktop
Intel: powercap/intel_rapl: add support for ICX
Intel: powercap/intel_rapl: add support for ICX-D
Zheng Zengkai (9):
irqchip: phytium-2500: Fix compilation issues
hulk_defconfig: Enable some Icelake support configs
openeuler_defconfig: Enable some Icelake support configs
hulk_defconfig: Adjust some configs for Intel icelake support
openeuler_defconfig: Adjust some configs for Intel icelake support
kabi: Fix "PCI: Decode PCIe 32 GT/s link speed"
PCI: kabi: fix kabi broken for struct pci_dev
kabi: Fix "perf/x86/intel: Support per-thread RDPMC TopDown metrics"
x86: Fix kabi broken for struct cpuinfo_x86
Documentation/ABI/obsolete/sysfs-class-dax | 22 +
Documentation/ABI/stable/sysfs-devices-node | 87 +-
.../testing/sysfs-bus-intel_th-devices-msc | 11 +-
.../ABI/testing/sysfs-devices-system-cpu | 6 +
Documentation/PCI/pci-error-recovery.txt | 35 +-
.../admin-guide/kernel-parameters.txt | 2 +
Documentation/admin-guide/mm/index.rst | 1 +
Documentation/admin-guide/mm/numaperf.rst | 169 ++
Documentation/admin-guide/pm/intel_idle.rst | 246 +++
.../admin-guide/pm/working-state.rst | 2 +
Documentation/cpuidle/core.txt | 23 -
Documentation/cpuidle/sysfs.txt | 98 -
Documentation/cputopology.txt | 55 +-
Documentation/ioctl/ioctl-number.txt | 1 +
Documentation/trace/coresight-cpu-debug.txt | 2 +-
Documentation/x86/topology.txt | 4 +
MAINTAINERS | 10 +-
arch/arm64/configs/hulk_defconfig | 2 +
arch/arm64/configs/openeuler_defconfig | 2 +
arch/arm64/kernel/acpi_numa.c | 2 +-
arch/arm64/kernel/smp.c | 4 +-
arch/ia64/kernel/acpi.c | 14 +-
arch/x86/configs/hulk_defconfig | 18 +-
arch/x86/configs/openeuler_defconfig | 18 +-
arch/x86/events/amd/core.c | 107 +-
arch/x86/events/core.c | 315 ++--
arch/x86/events/intel/core.c | 929 ++++++++--
arch/x86/events/intel/ds.c | 501 ++++-
arch/x86/events/intel/lbr.c | 84 +-
arch/x86/events/intel/pt.c | 3 +-
arch/x86/events/intel/uncore.c | 137 +-
arch/x86/events/intel/uncore.h | 40 +-
arch/x86/events/intel/uncore_snb.c | 107 +-
arch/x86/events/intel/uncore_snbep.c | 1119 ++++++++++++
arch/x86/events/perf_event.h | 160 +-
arch/x86/include/asm/asm.h | 31 +-
arch/x86/include/asm/futex.h | 6 +-
arch/x86/include/asm/intel_ds.h | 2 +-
arch/x86/include/asm/msr-index.h | 4 +
arch/x86/include/asm/perf_event.h | 181 +-
arch/x86/include/asm/processor.h | 6 +-
arch/x86/include/asm/ptrace.h | 13 +
arch/x86/include/asm/topology.h | 16 +
arch/x86/include/asm/uaccess.h | 16 +-
arch/x86/include/asm/uaccess_64.h | 3 -
arch/x86/include/uapi/asm/perf_regs.h | 26 +-
arch/x86/kernel/acpi/boot.c | 36 +-
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/cpu/topology.c | 88 +-
arch/x86/kernel/perf_regs.c | 28 +-
arch/x86/kernel/smpboot.c | 47 +
arch/x86/kernel/traps.c | 21 +-
arch/x86/lib/checksum_32.S | 4 +-
arch/x86/lib/copy_user_64.S | 138 +-
arch/x86/lib/csum-copy_64.S | 8 +-
arch/x86/lib/getuser.S | 12 +-
arch/x86/lib/insn-eval.c | 26 +-
arch/x86/lib/putuser.S | 10 +-
arch/x86/lib/usercopy_32.c | 126 +-
arch/x86/lib/usercopy_64.c | 24 +-
arch/x86/mm/extable.c | 8 +
drivers/acpi/Kconfig | 2 +
drivers/acpi/Makefile | 1 +
drivers/acpi/acpi_processor.c | 182 ++
drivers/acpi/hmat/Kconfig | 11 +
drivers/acpi/hmat/Makefile | 1 +
drivers/acpi/hmat/hmat.c | 666 +++++++
drivers/acpi/nfit/core.c | 8 +-
drivers/acpi/numa.c | 17 +-
drivers/acpi/processor_idle.c | 174 +-
drivers/acpi/scan.c | 4 +-
drivers/acpi/tables.c | 76 +-
drivers/base/Kconfig | 8 +
drivers/base/memory.c | 1 +
drivers/base/node.c | 350 +++-
drivers/base/topology.c | 4 +
drivers/cpuidle/cpuidle.c | 7 +-
drivers/cpuidle/sysfs.c | 10 +
drivers/dax/Kconfig | 27 +-
drivers/dax/Makefile | 6 +-
drivers/dax/bus.c | 503 ++++++
drivers/dax/bus.h | 61 +
drivers/dax/dax-private.h | 34 +-
drivers/dax/dax.h | 18 -
drivers/dax/device-dax.h | 25 -
drivers/dax/device.c | 363 +---
drivers/dax/kmem.c | 111 ++
drivers/dax/pmem.c | 153 --
drivers/dax/pmem/Makefile | 7 +
drivers/dax/pmem/compat.c | 73 +
drivers/dax/pmem/core.c | 71 +
drivers/dax/pmem/pmem.c | 40 +
drivers/dax/super.c | 42 +-
drivers/dma/ioat/dma.c | 12 +
drivers/dma/ioat/dma.h | 2 +-
drivers/dma/ioat/hw.h | 3 +
drivers/dma/ioat/init.c | 40 +-
drivers/dma/ioat/registers.h | 24 +
drivers/edac/Kconfig | 12 +
drivers/edac/Makefile | 7 +-
drivers/edac/i10nm_base.c | 344 ++++
drivers/edac/skx_base.c | 752 ++++++++
drivers/edac/skx_common.c | 657 +++++++
drivers/edac/skx_common.h | 153 ++
drivers/edac/skx_edac.c | 1357 --------------
drivers/hwtracing/intel_th/Makefile | 3 +
drivers/hwtracing/intel_th/acpi.c | 10 +-
drivers/hwtracing/intel_th/core.c | 146 +-
drivers/hwtracing/intel_th/gth.c | 130 +-
drivers/hwtracing/intel_th/gth.h | 19 +
drivers/hwtracing/intel_th/intel_th.h | 32 +-
drivers/hwtracing/intel_th/msu-sink.c | 116 ++
drivers/hwtracing/intel_th/msu.c | 832 +++++++--
drivers/hwtracing/intel_th/msu.h | 28 +-
drivers/hwtracing/intel_th/pci.c | 32 +-
drivers/hwtracing/intel_th/pti.c | 16 +-
drivers/hwtracing/intel_th/sth.c | 4 +
drivers/idle/intel_idle.c | 388 +++-
drivers/irqchip/irq-gic-phytium-2500-its.c | 6 +-
drivers/irqchip/irq-gic-phytium-2500.c | 10 +-
drivers/irqchip/irq-gic-v2m.c | 2 +-
drivers/irqchip/irq-gic-v3-its-pci-msi.c | 2 +-
drivers/irqchip/irq-gic-v3-its-platform-msi.c | 2 +-
drivers/irqchip/irq-gic-v3-its.c | 6 +-
drivers/irqchip/irq-gic-v3.c | 10 +-
drivers/irqchip/irq-gic.c | 4 +-
drivers/mailbox/pcc.c | 2 +-
drivers/ntb/hw/intel/Makefile | 2 +-
drivers/ntb/hw/intel/ntb_hw_gen1.c | 72 +-
drivers/ntb/hw/intel/ntb_hw_gen1.h | 6 +-
drivers/ntb/hw/intel/ntb_hw_gen3.c | 44 +-
drivers/ntb/hw/intel/ntb_hw_gen3.h | 8 +
drivers/ntb/hw/intel/ntb_hw_gen4.c | 552 ++++++
drivers/ntb/hw/intel/ntb_hw_gen4.h | 100 +
drivers/ntb/hw/intel/ntb_hw_intel.h | 12 +
drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 9 +-
drivers/nvdimm/e820.c | 1 +
drivers/nvdimm/nd.h | 2 +-
drivers/nvdimm/of_pmem.c | 1 +
drivers/nvdimm/region_devs.c | 1 +
drivers/pci/access.c | 11 +-
drivers/pci/bus.c | 5 +-
drivers/pci/hotplug/pciehp.h | 1 +
drivers/pci/hotplug/pciehp_hpc.c | 75 +-
drivers/pci/hotplug/pciehp_pci.c | 9 +-
drivers/pci/pci-acpi.c | 11 +-
drivers/pci/pci-stub.c | 10 +-
drivers/pci/pci-sysfs.c | 27 +-
drivers/pci/pci.c | 143 +-
drivers/pci/pci.h | 101 +-
drivers/pci/pcie/Kconfig | 2 +-
drivers/pci/pcie/aer.c | 340 ++--
drivers/pci/pcie/aer_inject.c | 48 +-
drivers/pci/pcie/aspm.c | 8 +-
drivers/pci/pcie/dpc.c | 106 +-
drivers/pci/pcie/err.c | 210 +--
drivers/pci/pcie/pme.c | 4 +-
drivers/pci/pcie/portdrv.h | 6 +-
drivers/pci/pcie/portdrv_core.c | 12 +-
drivers/pci/pcie/portdrv_pci.c | 24 +-
drivers/pci/probe.c | 193 +-
drivers/pci/quirks.c | 15 +-
drivers/pci/setup-bus.c | 30 +-
drivers/pci/slot.c | 39 +-
drivers/pci/vc.c | 4 +-
drivers/pci/xen-pcifront.c | 7 +-
drivers/platform/x86/Kconfig | 2 +
drivers/platform/x86/Makefile | 1 +
.../x86/intel_speed_select_if/Kconfig | 17 +
.../x86/intel_speed_select_if/Makefile | 10 +
.../intel_speed_select_if/isst_if_common.c | 675 +++++++
.../intel_speed_select_if/isst_if_common.h | 69 +
.../intel_speed_select_if/isst_if_mbox_msr.c | 216 +++
.../intel_speed_select_if/isst_if_mbox_pci.c | 213 +++
.../x86/intel_speed_select_if/isst_if_mmio.c | 180 ++
drivers/powercap/Kconfig | 11 +-
drivers/powercap/Makefile | 4 +-
.../{intel_rapl.c => intel_rapl_common.c} | 857 ++++-----
drivers/powercap/intel_rapl_msr.c | 183 ++
fs/sysfs/group.c | 54 +-
include/acpi/actbl1.h | 10 +-
include/linux/acpi.h | 26 +-
include/linux/aer.h | 4 +
include/linux/cpuidle.h | 10 +-
include/linux/hardirq.h | 5 +-
include/linux/intel_rapl.h | 155 ++
include/linux/intel_th.h | 79 +
include/linux/interrupt.h | 2 +
include/linux/libnvdimm.h | 1 +
include/linux/node.h | 71 +
include/linux/ntb.h | 10 +-
include/linux/pci.h | 15 +-
include/linux/perf_event.h | 47 +-
include/linux/perf_regs.h | 8 +
include/linux/preempt.h | 4 +-
include/linux/sysfs.h | 8 +
include/linux/topology.h | 3 +
include/uapi/linux/isst_if.h | 172 ++
include/uapi/linux/pci_regs.h | 13 +
kernel/events/core.c | 90 +-
kernel/events/ring_buffer.c | 3 +-
kernel/irq/Kconfig | 5 +
kernel/irq/chip.c | 2 +-
kernel/irq/debugfs.c | 34 +-
kernel/irq/internals.h | 2 +-
kernel/irq/resend.c | 53 +-
kernel/resource.c | 14 +-
mm/memory_hotplug.c | 33 +-
tools/Makefile | 11 +-
tools/arch/x86/include/uapi/asm/perf_regs.h | 26 +-
tools/perf/Documentation/perf-record.txt | 3 +-
tools/perf/arch/x86/include/perf_regs.h | 25 +-
tools/perf/arch/x86/util/perf_regs.c | 44 +
tools/perf/builtin-record.c | 4 +-
tools/perf/util/parse-regs-options.c | 33 +-
tools/perf/util/parse-regs-options.h | 3 +-
tools/perf/util/perf_regs.c | 10 +
tools/perf/util/perf_regs.h | 3 +
tools/power/x86/intel-speed-select/Build | 1 +
tools/power/x86/intel-speed-select/Makefile | 56 +
.../x86/intel-speed-select/isst-config.c | 1607 +++++++++++++++++
.../power/x86/intel-speed-select/isst-core.c | 721 ++++++++
.../x86/intel-speed-select/isst-display.c | 479 +++++
tools/power/x86/intel-speed-select/isst.h | 231 +++
tools/power/x86/turbostat/turbostat.c | 80 +-
tools/testing/nvdimm/Kbuild | 7 +-
tools/testing/nvdimm/dax-dev.c | 16 +-
227 files changed, 17781 insertions(+), 4637 deletions(-)
create mode 100644 Documentation/ABI/obsolete/sysfs-class-dax
create mode 100644 Documentation/admin-guide/mm/numaperf.rst
create mode 100644 Documentation/admin-guide/pm/intel_idle.rst
delete mode 100644 Documentation/cpuidle/core.txt
delete mode 100644 Documentation/cpuidle/sysfs.txt
create mode 100644 drivers/acpi/hmat/Kconfig
create mode 100644 drivers/acpi/hmat/Makefile
create mode 100644 drivers/acpi/hmat/hmat.c
create mode 100644 drivers/dax/bus.c
create mode 100644 drivers/dax/bus.h
delete mode 100644 drivers/dax/dax.h
delete mode 100644 drivers/dax/device-dax.h
create mode 100644 drivers/dax/kmem.c
delete mode 100644 drivers/dax/pmem.c
create mode 100644 drivers/dax/pmem/Makefile
create mode 100644 drivers/dax/pmem/compat.c
create mode 100644 drivers/dax/pmem/core.c
create mode 100644 drivers/dax/pmem/pmem.c
create mode 100644 drivers/edac/i10nm_base.c
create mode 100644 drivers/edac/skx_base.c
create mode 100644 drivers/edac/skx_common.c
create mode 100644 drivers/edac/skx_common.h
delete mode 100644 drivers/edac/skx_edac.c
create mode 100644 drivers/hwtracing/intel_th/msu-sink.c
create mode 100644 drivers/ntb/hw/intel/ntb_hw_gen4.c
create mode 100644 drivers/ntb/hw/intel/ntb_hw_gen4.h
create mode 100644 drivers/platform/x86/intel_speed_select_if/Kconfig
create mode 100644 drivers/platform/x86/intel_speed_select_if/Makefile
create mode 100644 drivers/platform/x86/intel_speed_select_if/isst_if_common.c
create mode 100644 drivers/platform/x86/intel_speed_select_if/isst_if_common.h
create mode 100644 drivers/platform/x86/intel_speed_select_if/isst_if_mbox_msr.c
create mode 100644 drivers/platform/x86/intel_speed_select_if/isst_if_mbox_pci.c
create mode 100644 drivers/platform/x86/intel_speed_select_if/isst_if_mmio.c
rename drivers/powercap/{intel_rapl.c => intel_rapl_common.c} (61%)
create mode 100644 drivers/powercap/intel_rapl_msr.c
create mode 100644 include/linux/intel_rapl.h
create mode 100644 include/linux/intel_th.h
create mode 100644 include/uapi/linux/isst_if.h
create mode 100644 tools/power/x86/intel-speed-select/Build
create mode 100644 tools/power/x86/intel-speed-select/Makefile
create mode 100644 tools/power/x86/intel-speed-select/isst-config.c
create mode 100644 tools/power/x86/intel-speed-select/isst-core.c
create mode 100644 tools/power/x86/intel-speed-select/isst-display.c
create mode 100644 tools/power/x86/intel-speed-select/isst.h
--
2.20.1
From: Krzysztof Struczynski <krzysztof.struczynski@huawei.com>
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I49KW1
CVE: NA
--------------------------------
IMA namespace wraps the global IMA resources in an abstraction, to
enable IMA to work with containers. Currently, the IMA namespace
contains no useful data but a dummy interface. IMA resources related to
different aspects of IMA, namely IMA-audit, IMA-measurement and
IMA-appraisal, will be added in the following patches.
An IMA namespace is created in a way analogous to the time namespace:
the unshare(CLONE_NEWIMA) system call creates a new IMA namespace but
doesn't assign it to the current process. All children of the process
will be born in the new IMA namespace, or a process can use the setns()
system call to join the new IMA namespace. A call to clone3(CLONE_NEWIMA)
creates a new namespace, which the new process joins instantly.
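For illustration only (not part of this patch), a minimal user-space
sketch of the clone3() path described above; it assumes a kernel built
with CONFIG_IMA_NS, the CLONE_NEWIMA value introduced by this patch,
and UAPI headers that expose struct clone_args and SYS_clone3:
#define _GNU_SOURCE
#include <linux/sched.h>   /* struct clone_args */
#include <sys/syscall.h>
#include <sys/wait.h>
#include <signal.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#ifndef CLONE_NEWIMA
#define CLONE_NEWIMA 0x400000000ULL  /* value added by this patch */
#endif
int main(void)
{
        struct clone_args args;
        long pid;
        memset(&args, 0, sizeof(args));
        args.flags = CLONE_NEWIMA;   /* child starts in a new IMA namespace */
        args.exit_signal = SIGCHLD;
        /* clone3() has no glibc wrapper, invoke the raw syscall */
        pid = syscall(SYS_clone3, &args, sizeof(args));
        if (pid < 0) {
                perror("clone3");
                return 1;
        }
        if (pid == 0) {
                char buf[64];
                ssize_t n;
                /* child: /proc/self/ns/ima is provided by imans_operations */
                n = readlink("/proc/self/ns/ima", buf, sizeof(buf) - 1);
                if (n > 0) {
                        buf[n] = '\0';
                        printf("child ima ns: %s\n", buf);
                }
                _exit(0);
        }
        waitpid(pid, NULL, 0);
        return 0;
}
With the unshare() path, the parent would instead call
syscall(SYS_unshare, CLONE_NEWIMA) (the raw syscall, since the glibc
unshare() wrapper takes an int and would truncate this flag) and only
processes forked afterwards would start in the new IMA namespace.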
This scheme allows the new IMA namespace to be configured before any
process appears in it. If the user initially unshares the new IMA
namespace, IMA can be configured using the IMA entries in securityfs. If
the user calls clone3() directly, the new IMA namespace can be
configured using the clone arguments. To allow this, new securityfs
entries have to be added, and the clone_args and kernel_clone_args
structures have to be extended.
Early configuration is crucial: the new IMA policies must apply to the
first process in the new namespace, and the appraisal key has to be
loaded beforehand.
Add a new CONFIG_IMA_NS option to the kernel configuration that enables
one to create a new IMA namespace. IMA namespace functionality is
disabled by default.
Signed-off-by: Krzysztof Struczynski <krzysztof.struczynski@huawei.com>
Reviewed-by: Zhang Tianxing <zhangtianxing3@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
fs/proc/namespaces.c | 4 +
include/linux/ima.h | 57 ++++++++
include/linux/nsproxy.h | 3 +
include/linux/proc_ns.h | 5 +-
include/linux/user_namespace.h | 1 +
include/uapi/linux/sched.h | 1 +
init/Kconfig | 12 ++
kernel/fork.c | 24 +++-
kernel/nsproxy.c | 42 +++++-
kernel/ucount.c | 1 +
security/integrity/ima/Makefile | 1 +
security/integrity/ima/ima.h | 13 ++
security/integrity/ima/ima_fs.c | 4 +-
security/integrity/ima/ima_init.c | 13 ++
security/integrity/ima/ima_ns.c | 232 ++++++++++++++++++++++++++++++
15 files changed, 402 insertions(+), 11 deletions(-)
create mode 100644 security/integrity/ima/ima_ns.c
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index 8e159fc78c0a..117812a59e5d 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -37,6 +37,10 @@ static const struct proc_ns_operations *ns_entries[] = {
&timens_operations,
&timens_for_children_operations,
#endif
+#ifdef CONFIG_IMA_NS
+ &imans_operations,
+ &imans_for_children_operations,
+#endif
};
static const char *proc_ns_get_link(struct dentry *dentry,
diff --git a/include/linux/ima.h b/include/linux/ima.h
index f7a088b2579e..67af79961e4e 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -13,6 +13,9 @@
#include <linux/kexec.h>
struct linux_binprm;
+struct nsproxy;
+struct task_struct;
+
#ifdef CONFIG_IMA
extern int ima_bprm_check(struct linux_binprm *bprm);
extern int ima_file_check(struct file *file, int mask);
@@ -197,4 +200,58 @@ static inline bool ima_appraise_signature(enum kernel_read_file_id func)
return false;
}
#endif /* CONFIG_IMA_APPRAISE && CONFIG_INTEGRITY_TRUSTED_KEYRING */
+
+struct ima_namespace {
+ struct kref kref;
+ struct ns_common ns;
+ struct ucounts *ucounts;
+ struct user_namespace *user_ns;
+} __randomize_layout;
+
+extern struct ima_namespace init_ima_ns;
+
+#ifdef CONFIG_IMA_NS
+struct ima_namespace *copy_ima_ns(unsigned long flags,
+ struct user_namespace *user_ns,
+ struct ima_namespace *old_ns);
+
+void free_ima_ns(struct kref *kref);
+
+int imans_on_fork(struct nsproxy *nsproxy, struct task_struct *tsk);
+
+static inline struct ima_namespace *get_ima_ns(struct ima_namespace *ns)
+{
+ if (ns)
+ kref_get(&ns->kref);
+ return ns;
+}
+static inline void put_ima_ns(struct ima_namespace *ns)
+{
+ if (ns)
+ kref_put(&ns->kref, free_ima_ns);
+}
+
+#else
+static inline struct ima_namespace *copy_ima_ns(unsigned long flags,
+ struct user_namespace *user_ns,
+ struct ima_namespace *old_ns)
+{
+ return old_ns;
+}
+
+static inline int imans_on_fork(struct nsproxy *nsproxy,
+ struct task_struct *tsk)
+{
+ return 0;
+}
+
+static inline struct ima_namespace *get_ima_ns(struct ima_namespace *ns)
+{
+ return ns;
+}
+
+static inline void put_ima_ns(struct ima_namespace *ns)
+{
+}
+#endif /* CONFIG_IMA_NS */
#endif /* _LINUX_IMA_H */
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index cdb171efc7cb..56216a94009d 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -10,6 +10,7 @@ struct uts_namespace;
struct ipc_namespace;
struct pid_namespace;
struct cgroup_namespace;
+struct ima_namespace;
struct fs_struct;
/*
@@ -38,6 +39,8 @@ struct nsproxy {
struct time_namespace *time_ns;
struct time_namespace *time_ns_for_children;
struct cgroup_namespace *cgroup_ns;
+ struct ima_namespace *ima_ns;
+ struct ima_namespace *ima_ns_for_children;
};
extern struct nsproxy init_nsproxy;
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index 75807ecef880..c8c596d67629 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -16,7 +16,7 @@ struct inode;
struct proc_ns_operations {
const char *name;
const char *real_ns_name;
- int type;
+ uint64_t type;
struct ns_common *(*get)(struct task_struct *task);
void (*put)(struct ns_common *ns);
int (*install)(struct nsset *nsset, struct ns_common *ns);
@@ -34,6 +34,8 @@ extern const struct proc_ns_operations mntns_operations;
extern const struct proc_ns_operations cgroupns_operations;
extern const struct proc_ns_operations timens_operations;
extern const struct proc_ns_operations timens_for_children_operations;
+extern const struct proc_ns_operations imans_operations;
+extern const struct proc_ns_operations imans_for_children_operations;
/*
* We always define these enumerators
@@ -46,6 +48,7 @@ enum {
PROC_PID_INIT_INO = 0xEFFFFFFCU,
PROC_CGROUP_INIT_INO = 0xEFFFFFFBU,
PROC_TIME_INIT_INO = 0xEFFFFFFAU,
+ PROC_IMA_INIT_INO = 0xEFFFFFF9U,
};
#ifdef CONFIG_PROC_FS
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 7616c7bf4b24..3eb64a50f248 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -46,6 +46,7 @@ enum ucount_type {
UCOUNT_MNT_NAMESPACES,
UCOUNT_CGROUP_NAMESPACES,
UCOUNT_TIME_NAMESPACES,
+ UCOUNT_IMA_NAMESPACES,
#ifdef CONFIG_INOTIFY_USER
UCOUNT_INOTIFY_INSTANCES,
UCOUNT_INOTIFY_WATCHES,
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 3bac0a8ceab2..b30e27efee92 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -36,6 +36,7 @@
/* Flags for the clone3() syscall. */
#define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
#define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
+#define CLONE_NEWIMA 0x400000000ULL /* New IMA namespace. */
/*
* cloning flags intersect with CSIGNAL so can be used with unshare and clone3
diff --git a/init/Kconfig b/init/Kconfig
index fb3eb910f224..b095baa2b83c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1206,6 +1206,18 @@ config NET_NS
Allow user space to create what appear to be multiple instances
of the network stack.
+config IMA_NS
+ bool "IMA namespace"
+ depends on IMA
+ default n
+ help
+ This allows container engines to use ima namespaces to provide
+ different IMA policy rules for different containers. Also, it allows
+ to create what appear to be multiple instances of the IMA measurement
+ list and other IMA related resources.
+
+ If unsure, say N.
+
endif # NAMESPACES
config CHECKPOINT_RESTORE
diff --git a/kernel/fork.c b/kernel/fork.c
index a1db14fc6656..705c3a8dcecd 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1922,11 +1922,24 @@ static __latent_entropy struct task_struct *copy_process(
}
/*
- * If the new process will be in a different time namespace
- * do not allow it to share VM or a thread group with the forking task.
+ * If the new process will be in a different time namespace or a
+ * different ima namespace, do not allow it to share VM or a thread
+ * group with the forking task.
*/
if (clone_flags & (CLONE_THREAD | CLONE_VM)) {
- if (nsp->time_ns != nsp->time_ns_for_children)
+ if ((nsp->time_ns != nsp->time_ns_for_children) ||
+ ((clone_flags & CLONE_NEWIMA) ||
+ (nsp->ima_ns != nsp->ima_ns_for_children)))
+ return ERR_PTR(-EINVAL);
+ }
+
+ /*
+ * If the new process will be in a different ima namespace
+ * do not allow it to share the same file descriptor table.
+ */
+ if (clone_flags & CLONE_FILES) {
+ if ((clone_flags & CLONE_NEWIMA) ||
+ (nsp->ima_ns != nsp->ima_ns_for_children))
return ERR_PTR(-EINVAL);
}
@@ -2701,7 +2714,8 @@ static bool clone3_args_valid(struct kernel_clone_args *kargs)
{
/* Verify that no unknown flags are passed along. */
if (kargs->flags &
- ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND | CLONE_INTO_CGROUP))
+ ~(CLONE_LEGACY_FLAGS |
+ CLONE_CLEAR_SIGHAND | CLONE_INTO_CGROUP | CLONE_NEWIMA))
return false;
/*
@@ -2848,7 +2862,7 @@ static int check_unshare_flags(unsigned long unshare_flags)
CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET|
CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWCGROUP|
- CLONE_NEWTIME))
+ CLONE_NEWTIME|CLONE_NEWIMA))
return -EINVAL;
/*
* Not implemented, but pretend it works if there is nothing
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 12dd41b39a7f..e2cddc22dc53 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -19,6 +19,7 @@
#include <net/net_namespace.h>
#include <linux/ipc_namespace.h>
#include <linux/time_namespace.h>
+#include <linux/ima.h>
#include <linux/fs_struct.h>
#include <linux/proc_fs.h>
#include <linux/proc_ns.h>
@@ -47,6 +48,10 @@ struct nsproxy init_nsproxy = {
.time_ns = &init_time_ns,
.time_ns_for_children = &init_time_ns,
#endif
+#ifdef CONFIG_IMA_NS
+ .ima_ns = &init_ima_ns,
+ .ima_ns_for_children = &init_ima_ns,
+#endif
};
static inline struct nsproxy *create_nsproxy(void)
@@ -121,8 +126,19 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
}
new_nsp->time_ns = get_time_ns(tsk->nsproxy->time_ns);
+ new_nsp->ima_ns_for_children = copy_ima_ns(flags, user_ns,
+ tsk->nsproxy->ima_ns_for_children);
+ if (IS_ERR(new_nsp->ima_ns_for_children)) {
+ err = PTR_ERR(new_nsp->ima_ns_for_children);
+ goto out_ima;
+ }
+ new_nsp->ima_ns = get_ima_ns(tsk->nsproxy->ima_ns);
+
return new_nsp;
+out_ima:
+ put_time_ns(new_nsp->time_ns);
+ put_time_ns(new_nsp->time_ns_for_children);
out_time:
put_net(new_nsp->net_ns);
out_net:
@@ -157,8 +173,10 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
CLONE_NEWPID | CLONE_NEWNET |
- CLONE_NEWCGROUP | CLONE_NEWTIME)))) {
- if (likely(old_ns->time_ns_for_children == old_ns->time_ns)) {
+ CLONE_NEWCGROUP | CLONE_NEWTIME |
+ CLONE_NEWIMA)))) {
+ if (likely((old_ns->time_ns_for_children == old_ns->time_ns) &&
+ (old_ns->ima_ns_for_children == old_ns->ima_ns))) {
get_nsproxy(old_ns);
return 0;
}
@@ -186,6 +204,12 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
return ret;
}
+ ret = imans_on_fork(new_ns, tsk);
+ if (ret) {
+ free_nsproxy(new_ns);
+ return ret;
+ }
+
tsk->nsproxy = new_ns;
return 0;
}
@@ -204,6 +228,10 @@ void free_nsproxy(struct nsproxy *ns)
put_time_ns(ns->time_ns);
if (ns->time_ns_for_children)
put_time_ns(ns->time_ns_for_children);
+ if (ns->ima_ns)
+ put_ima_ns(ns->ima_ns);
+ if (ns->ima_ns_for_children)
+ put_ima_ns(ns->ima_ns_for_children);
put_cgroup_ns(ns->cgroup_ns);
put_net(ns->net_ns);
kmem_cache_free(nsproxy_cachep, ns);
@@ -221,7 +249,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
CLONE_NEWNET | CLONE_NEWPID | CLONE_NEWCGROUP |
- CLONE_NEWTIME)))
+ CLONE_NEWTIME | CLONE_NEWIMA)))
return 0;
user_ns = new_cred ? new_cred->user_ns : current_user_ns();
@@ -476,6 +504,14 @@ static int validate_nsset(struct nsset *nsset, struct pid *pid)
}
#endif
+#ifdef CONFIG_IMA_NS
+ if (flags & CLONE_NEWIMA) {
+ ret = validate_ns(nsset, &nsp->ima_ns->ns);
+ if (ret)
+ goto out;
+ }
+#endif
+
out:
if (pid_ns)
put_pid_ns(pid_ns);
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 11b1596e2542..3f4768d62b8f 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -70,6 +70,7 @@ static struct ctl_table user_table[] = {
UCOUNT_ENTRY("max_mnt_namespaces"),
UCOUNT_ENTRY("max_cgroup_namespaces"),
UCOUNT_ENTRY("max_time_namespaces"),
+ UCOUNT_ENTRY("max_ima_namespaces"),
#ifdef CONFIG_INOTIFY_USER
UCOUNT_ENTRY("max_inotify_instances"),
UCOUNT_ENTRY("max_inotify_watches"),
diff --git a/security/integrity/ima/Makefile b/security/integrity/ima/Makefile
index 9dda78739c85..7c7272b8df65 100644
--- a/security/integrity/ima/Makefile
+++ b/security/integrity/ima/Makefile
@@ -15,3 +15,4 @@ ima-$(CONFIG_IMA_BLACKLIST_KEYRING) += ima_mok.o
ima-$(CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS) += ima_asymmetric_keys.o
ima-$(CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS) += ima_queue_keys.o
ima-$(CONFIG_IMA_DIGEST_LIST) += ima_digest_list.o
+ima-$(CONFIG_IMA_NS) += ima_ns.o
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index bd554510d67f..dc536da8058e 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -20,6 +20,7 @@
#include <linux/hash.h>
#include <linux/tpm.h>
#include <linux/audit.h>
+#include <linux/ima.h>
#include <crypto/hash_info.h>
#include "../integrity.h"
@@ -362,6 +363,18 @@ static inline enum integrity_status ima_get_cache_status(struct integrity_iint_c
#endif /* CONFIG_IMA_APPRAISE */
+#ifdef CONFIG_IMA_NS
+static inline struct ima_namespace *get_current_ns(void)
+{
+ return current->nsproxy->ima_ns;
+}
+#else
+static inline struct ima_namespace *get_current_ns(void)
+{
+ return &init_ima_ns;
+}
+#endif /* CONFIG_IMA_NS */
+
#ifdef CONFIG_IMA_APPRAISE_MODSIG
int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len,
struct modsig **modsig);
diff --git a/security/integrity/ima/ima_fs.c b/security/integrity/ima/ima_fs.c
index 96eeee9e12c1..e3d5f4154586 100644
--- a/security/integrity/ima/ima_fs.c
+++ b/security/integrity/ima/ima_fs.c
@@ -275,7 +275,7 @@ static const struct file_operations ima_ascii_measurements_ops = {
.release = seq_release,
};
-static ssize_t ima_read_file(char *path, struct dentry *dentry)
+static ssize_t ima_read_sfs_file(char *path, struct dentry *dentry)
{
void *data = NULL;
char *datap;
@@ -398,7 +398,7 @@ static ssize_t ima_write_data(struct file *file, const char __user *buf,
goto out_free;
if (data[0] == '/') {
- result = ima_read_file(data, dentry);
+ result = ima_read_sfs_file(data, dentry);
} else if (dentry == ima_policy) {
if (ima_appraise & IMA_APPRAISE_POLICY) {
pr_err("signed policy file (specified "
diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
index 913d6b879b0b..b1c341e239a8 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -15,6 +15,9 @@
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/err.h>
+#include <linux/kref.h>
+#include <linux/proc_ns.h>
+#include <linux/user_namespace.h>
#include "ima.h"
@@ -22,6 +25,16 @@
const char boot_aggregate_name[] = "boot_aggregate";
struct tpm_chip *ima_tpm_chip;
+struct ima_namespace init_ima_ns = {
+ .kref = KREF_INIT(2),
+ .user_ns = &init_user_ns,
+ .ns.inum = PROC_IMA_INIT_INO,
+#ifdef CONFIG_IMA_NS
+ .ns.ops = &imans_operations,
+#endif
+};
+EXPORT_SYMBOL(init_ima_ns);
+
/* Add the boot aggregate to the IMA measurement list and extend
* the PCR register.
*
diff --git a/security/integrity/ima/ima_ns.c b/security/integrity/ima/ima_ns.c
new file mode 100644
index 000000000000..8f5f301406a2
--- /dev/null
+++ b/security/integrity/ima/ima_ns.c
@@ -0,0 +1,232 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019-2020 Huawei Technologies Duesseldorf GmbH
+ *
+ * Author: Krzysztof Struczynski <krzysztof.struczynski@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ * File: ima_ns.c
+ * Functions to manage the IMA namespace.
+ */
+
+#include <linux/export.h>
+#include <linux/ima.h>
+#include <linux/kref.h>
+#include <linux/proc_ns.h>
+#include <linux/slab.h>
+#include <linux/user_namespace.h>
+#include <linux/nsproxy.h>
+#include <linux/sched.h>
+
+#include "ima.h"
+
+static struct ucounts *inc_ima_namespaces(struct user_namespace *ns)
+{
+ return inc_ucount(ns, current_euid(), UCOUNT_IMA_NAMESPACES);
+}
+
+static void dec_ima_namespaces(struct ucounts *ucounts)
+{
+ return dec_ucount(ucounts, UCOUNT_IMA_NAMESPACES);
+}
+
+static struct ima_namespace *ima_ns_alloc(void)
+{
+ struct ima_namespace *ima_ns;
+
+ ima_ns = kzalloc(sizeof(*ima_ns), GFP_KERNEL);
+ if (!ima_ns)
+ return NULL;
+
+ return ima_ns;
+}
+
+/**
+ * Clone a new ns copying an original ima namespace, setting refcount to 1
+ *
+ * @user_ns: User namespace that current task runs in
+ * @old_ns: Old ima namespace to clone
+ * Return: ERR_PTR(-ENOMEM) on error (failure to kmalloc), new ns otherwise
+ */
+static struct ima_namespace *clone_ima_ns(struct user_namespace *user_ns,
+ struct ima_namespace *old_ns)
+{
+ struct ima_namespace *ns;
+ struct ucounts *ucounts;
+ int err;
+
+ err = -ENOSPC;
+ ucounts = inc_ima_namespaces(user_ns);
+ if (!ucounts)
+ goto fail;
+
+ err = -ENOMEM;
+ ns = ima_ns_alloc();
+ if (!ns)
+ goto fail_dec;
+
+ kref_init(&ns->kref);
+
+ err = ns_alloc_inum(&ns->ns);
+ if (err)
+ goto fail_free;
+
+ ns->ns.ops = &imans_operations;
+ ns->user_ns = get_user_ns(user_ns);
+ ns->ucounts = ucounts;
+
+ return ns;
+
+fail_free:
+ kfree(ns);
+fail_dec:
+ dec_ima_namespaces(ucounts);
+fail:
+ return ERR_PTR(err);
+}
+
+/**
+ * Copy task's ima namespace, or clone it if flags specifies CLONE_NEWNS.
+ *
+ * @flags: Cloning flags
+ * @user_ns: User namespace that current task runs in
+ * @old_ns: Old ima namespace to clone
+ *
+ * Return: IMA namespace or ERR_PTR.
+ */
+
+struct ima_namespace *copy_ima_ns(unsigned long flags,
+ struct user_namespace *user_ns,
+ struct ima_namespace *old_ns)
+{
+ if (!(flags & CLONE_NEWIMA))
+ return get_ima_ns(old_ns);
+
+ return clone_ima_ns(user_ns, old_ns);
+}
+
+static void destroy_ima_ns(struct ima_namespace *ns)
+{
+ dec_ima_namespaces(ns->ucounts);
+ put_user_ns(ns->user_ns);
+ ns_free_inum(&ns->ns);
+ kfree(ns);
+}
+
+void free_ima_ns(struct kref *kref)
+{
+ struct ima_namespace *ns;
+
+ ns = container_of(kref, struct ima_namespace, kref);
+
+ destroy_ima_ns(ns);
+}
+
+static inline struct ima_namespace *to_ima_ns(struct ns_common *ns)
+{
+ return container_of(ns, struct ima_namespace, ns);
+}
+
+static struct ns_common *imans_get(struct task_struct *task)
+{
+ struct ima_namespace *ns = NULL;
+ struct nsproxy *nsproxy;
+
+ task_lock(task);
+ nsproxy = task->nsproxy;
+ if (nsproxy) {
+ ns = nsproxy->ima_ns;
+ get_ima_ns(ns);
+ }
+ task_unlock(task);
+
+ return ns ? &ns->ns : NULL;
+}
+
+static struct ns_common *imans_for_children_get(struct task_struct *task)
+{
+ struct ima_namespace *ns = NULL;
+ struct nsproxy *nsproxy;
+
+ task_lock(task);
+ nsproxy = task->nsproxy;
+ if (nsproxy) {
+ ns = nsproxy->ima_ns_for_children;
+ get_ima_ns(ns);
+ }
+ task_unlock(task);
+
+ return ns ? &ns->ns : NULL;
+}
+
+static void imans_put(struct ns_common *ns)
+{
+ put_ima_ns(to_ima_ns(ns));
+}
+
+static int imans_install(struct nsset *nsset, struct ns_common *new)
+{
+ struct nsproxy *nsproxy = nsset->nsproxy;
+ struct ima_namespace *ns = to_ima_ns(new);
+
+ if (!current_is_single_threaded())
+ return -EUSERS;
+
+ if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
+ !ns_capable(nsset->cred->user_ns, CAP_SYS_ADMIN))
+ return -EPERM;
+
+ get_ima_ns(ns);
+ put_ima_ns(nsproxy->ima_ns);
+ nsproxy->ima_ns = ns;
+
+ get_ima_ns(ns);
+ put_ima_ns(nsproxy->ima_ns_for_children);
+ nsproxy->ima_ns_for_children = ns;
+
+ return 0;
+}
+
+int imans_on_fork(struct nsproxy *nsproxy, struct task_struct *tsk)
+{
+ struct ns_common *nsc = &nsproxy->ima_ns_for_children->ns;
+ struct ima_namespace *ns = to_ima_ns(nsc);
+
+ /* create_new_namespaces() already incremented the ref counter */
+ if (nsproxy->ima_ns == nsproxy->ima_ns_for_children)
+ return 0;
+
+ get_ima_ns(ns);
+ put_ima_ns(nsproxy->ima_ns);
+ nsproxy->ima_ns = ns;
+
+ return 0;
+}
+
+static struct user_namespace *imans_owner(struct ns_common *ns)
+{
+ return to_ima_ns(ns)->user_ns;
+}
+
+const struct proc_ns_operations imans_operations = {
+ .name = "ima",
+ .type = CLONE_NEWIMA,
+ .get = imans_get,
+ .put = imans_put,
+ .install = imans_install,
+ .owner = imans_owner,
+};
+
+const struct proc_ns_operations imans_for_children_operations = {
+ .name = "ima_for_children",
+ .type = CLONE_NEWIMA,
+ .get = imans_for_children_get,
+ .put = imans_put,
+ .install = imans_install,
+ .owner = imans_owner,
+};
+
--
2.20.1
This initial commit contains Ramaxel's spraid module.
The spraid controller has two modes: HBA mode and RAID mode. RAID mode
supports RAID 0/1/5/6/10/50/60.
The spraid driver works under the SCSI subsystem and forwards SCSI
commands to the Ramaxel RAID chip.
Signed-off-by: Yanling Song <songyl@ramaxel.com>
---
drivers/scsi/Kconfig | 1 +
drivers/scsi/Makefile | 1 +
drivers/scsi/spraid/Kconfig | 11 +
drivers/scsi/spraid/Makefile | 7 +
drivers/scsi/spraid/spraid.h | 572 ++++++
drivers/scsi/spraid/spraid_main.c | 3159 +++++++++++++++++++++++++++++
6 files changed, 3751 insertions(+)
create mode 100644 drivers/scsi/spraid/Kconfig
create mode 100644 drivers/scsi/spraid/Makefile
create mode 100644 drivers/scsi/spraid/spraid.h
create mode 100644 drivers/scsi/spraid/spraid_main.c
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 871f8ea7b928..0fbe4edeccd0 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -484,6 +484,7 @@ source "drivers/scsi/megaraid/Kconfig.megaraid"
source "drivers/scsi/mpt3sas/Kconfig"
source "drivers/scsi/smartpqi/Kconfig"
source "drivers/scsi/ufs/Kconfig"
+source "drivers/scsi/spraid/Kconfig"
config SCSI_HPTIOP
tristate "HighPoint RocketRAID 3xxx/4xxx Controller support"
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index ce61ba07fadd..78a3c832394c 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -97,6 +97,7 @@ obj-$(CONFIG_SCSI_ZALON) += zalon7xx.o
obj-$(CONFIG_SCSI_DC395x) += dc395x.o
obj-$(CONFIG_SCSI_AM53C974) += esp_scsi.o am53c974.o
obj-$(CONFIG_CXLFLASH) += cxlflash/
+obj-$(CONFIG_RAMAXEL_SPRAID) += spraid/
obj-$(CONFIG_MEGARAID_LEGACY) += megaraid.o
obj-$(CONFIG_MEGARAID_NEWGEN) += megaraid/
obj-$(CONFIG_MEGARAID_SAS) += megaraid/
diff --git a/drivers/scsi/spraid/Kconfig b/drivers/scsi/spraid/Kconfig
new file mode 100644
index 000000000000..83962efaab07
--- /dev/null
+++ b/drivers/scsi/spraid/Kconfig
@@ -0,0 +1,11 @@
+#
+# Ramaxel driver configuration
+#
+
+config RAMAXEL_SPRAID
+ tristate "Ramaxel spraid Adapter"
+ depends on PCI && SCSI
+ depends on ARM64 || X86_64
+ default m
+ help
+ This driver supports Ramaxel spraid driver.
diff --git a/drivers/scsi/spraid/Makefile b/drivers/scsi/spraid/Makefile
new file mode 100644
index 000000000000..aadc2ffd37eb
--- /dev/null
+++ b/drivers/scsi/spraid/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the Ramaxel device drivers.
+#
+
+obj-$(CONFIG_RAMAXEL_SPRAID) += spraid.o
+
+spraid-objs := spraid_main.o
\ No newline at end of file
diff --git a/drivers/scsi/spraid/spraid.h b/drivers/scsi/spraid/spraid.h
new file mode 100644
index 000000000000..49b110312770
--- /dev/null
+++ b/drivers/scsi/spraid/spraid.h
@@ -0,0 +1,572 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __SPRAID_H_
+#define __SPRAID_H_
+
+#define SPRAID_CAP_MQES(cap) ((cap) & 0xffff)
+#define SPRAID_CAP_STRIDE(cap) (((cap) >> 32) & 0xf)
+#define SPRAID_CAP_MPSMIN(cap) (((cap) >> 48) & 0xf)
+#define SPRAID_CAP_MPSMAX(cap) (((cap) >> 52) & 0xf)
+#define SPRAID_CAP_TIMEOUT(cap) (((cap) >> 24) & 0xff)
+
+#define SPRAID_DEFAULT_MAX_CHANNEL 4
+#define SPRAID_DEFAULT_MAX_ID 240
+#define SPRAID_DEFAULT_MAX_LUN_PER_HOST 8
+#define MAX_SECTORS 2048
+
+#define IO_SQE_SIZE sizeof(struct spraid_ioq_command)
+#define ADMIN_SQE_SIZE sizeof(struct spraid_admin_command)
+#define SQE_SIZE(qid) (((qid) > 0) ? IO_SQE_SIZE : ADMIN_SQE_SIZE)
+#define CQ_SIZE(depth) ((depth) * sizeof(struct spraid_completion))
+#define SQ_SIZE(qid, depth) ((depth) * SQE_SIZE(qid))
+
+#define SENSE_SIZE(depth) ((depth) * SCSI_SENSE_BUFFERSIZE)
+
+#define SPRAID_AQ_DEPTH 128
+#define SPRAID_NR_AEN_COMMANDS 1
+#define SPRAID_AQ_BLK_MQ_DEPTH (SPRAID_AQ_DEPTH - SPRAID_NR_AEN_COMMANDS)
+#define SPRAID_AQ_MQ_TAG_DEPTH (SPRAID_AQ_BLK_MQ_DEPTH - 1)
+#define SPRAID_ADMIN_QUEUE_NUM 1
+
+#define FUA_MASK 0x08
+#define SPRAID_MINORS BIT(MINORBITS)
+
+#define COMMAND_IS_WRITE(cmd) ((cmd)->common.opcode & 1)
+
+#define SPRAID_IO_IOSQES 7
+#define SPRAID_IO_IOCQES 4
+#define PRP_ENTRY_SIZE 8
+
+#define SMALL_POOL_SIZE 256
+#define MAX_SMALL_POOL_NUM 16
+#define MAX_CMD_PER_DEV 32
+#define MAX_CDB_LEN 32
+
+#define SPRAID_UP_TO_MULTY4(x) (((x) + 4) & (~0x03))
+
+#define CQE_STATUS_SUCCESS (0x0)
+
+#define PCI_VENDOR_ID_RMT_LOGIC 0x1E81
+
+#define PCI_DEVICE_ID_RMT_TEST 0x1880
+#define SPRAID_SERVER_DEVICE_HAB_DID 0x2100
+#define SPRAID_SERVER_DEVICE_RAID_DID 0x2200
+#define SPRAID_SERVER_DEVICE_NVME_DID 0x3758
+#define SPRAID_SERVER_DEVICE_FANOUT_DID 0x3858
+
+#define IO_6_DEFAULT_TX_LEN 256
+
+#define SPRAID_INT_PAGES 2
+#define SPRAID_INT_BYTES(hdev) (SPRAID_INT_PAGES * (hdev)->page_size)
+
+enum {
+ SPRAID_REQ_CANCELLED = (1 << 0),
+ SPRAID_REQ_USERCMD = (1 << 1),
+};
+
+enum {
+ SPRAID_SC_SUCCESS = 0x0,
+ SPRAID_SC_INVALID_OPCODE = 0x1,
+ SPRAID_SC_INVALID_FIELD = 0x2,
+
+ SPRAID_SC_ABORT_LIMIT = 0x103,
+ SPRAID_SC_ABORT_MISSING = 0x104,
+ SPRAID_SC_ASYNC_LIMIT = 0x105,
+
+ SPRAID_SC_DNR = 0x4000,
+};
+
+enum {
+ SPRAID_REG_CAP = 0x0000,
+ SPRAID_REG_CC = 0x0014,
+ SPRAID_REG_CSTS = 0x001c,
+ SPRAID_REG_AQA = 0x0024,
+ SPRAID_REG_ASQ = 0x0028,
+ SPRAID_REG_ACQ = 0x0030,
+ SPRAID_REG_DBS = 0x1000,
+};
+
+enum {
+ SPRAID_CC_ENABLE = 1 << 0,
+ SPRAID_CC_CSS_NVM = 0 << 4,
+ SPRAID_CC_MPS_SHIFT = 7,
+ SPRAID_CC_AMS_SHIFT = 11,
+ SPRAID_CC_SHN_SHIFT = 14,
+ SPRAID_CC_IOSQES_SHIFT = 16,
+ SPRAID_CC_IOCQES_SHIFT = 20,
+ SPRAID_CC_AMS_RR = 0 << SPRAID_CC_AMS_SHIFT,
+ SPRAID_CC_SHN_NONE = 0 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CC_IOSQES = SPRAID_IO_IOSQES << SPRAID_CC_IOSQES_SHIFT,
+ SPRAID_CC_IOCQES = SPRAID_IO_IOCQES << SPRAID_CC_IOCQES_SHIFT,
+ SPRAID_CC_SHN_NORMAL = 1 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CC_SHN_MASK = 3 << SPRAID_CC_SHN_SHIFT,
+ SPRAID_CSTS_RDY = 1 << 0,
+ SPRAID_CSTS_SHST_CMPLT = 2 << 2,
+ SPRAID_CSTS_SHST_MASK = 3 << 2,
+};
+
+enum {
+ SPRAID_ADMIN_DELETE_SQ = 0x00,
+ SPRAID_ADMIN_CREATE_SQ = 0x01,
+ SPRAID_ADMIN_DELETE_CQ = 0x04,
+ SPRAID_ADMIN_CREATE_CQ = 0x05,
+ SPRAID_ADMIN_ABORT_CMD = 0x08,
+ SPRAID_ADMIN_SET_FEATURES = 0x09,
+ SPRAID_ADMIN_ASYNC_EVENT = 0x0c,
+ SPRAID_ADMIN_GET_INFO = 0xc6,
+ SPRAID_ADMIN_RESET = 0xc8,
+};
+
+enum {
+ SPRAID_GET_INFO_CTRL = 0,
+ SPRAID_GET_INFO_DEV_LIST = 1,
+};
+
+enum {
+ SPRAID_RESET_TARGET = 0,
+ SPRAID_RESET_BUS = 1,
+};
+
+enum {
+ SPRAID_AEN_ERROR = 0,
+ SPRAID_AEN_NOTICE = 2,
+ SPRAID_AEN_VS = 7,
+};
+
+enum {
+ SPRAID_AEN_DEV_CHANGED = 0x00,
+ SPRAID_AEN_HOST_PROBING = 0x10,
+};
+
+enum {
+ SPRAID_CMD_WRITE = 0x01,
+ SPRAID_CMD_READ = 0x02,
+
+ SPRAID_CMD_NONIO_NONE = 0x80,
+ SPRAID_CMD_NONIO_TODEV = 0x81,
+ SPRAID_CMD_NONIO_FROMDEV = 0x82,
+};
+
+enum {
+ SPRAID_QUEUE_PHYS_CONTIG = (1 << 0),
+ SPRAID_CQ_IRQ_ENABLED = (1 << 1),
+
+ SPRAID_FEAT_NUM_QUEUES = 0x07,
+ SPRAID_FEAT_ASYNC_EVENT = 0x0b,
+ SPRAID_FEAT_TIMESTAMP = 0x0e,
+};
+
+enum {
+ SPRAID_AEN_TIMESYN = 0x07
+};
+
+enum spraid_state {
+ SPRAID_NEW,
+ SPRAID_LIVE,
+ SPRAID_RESETTING,
+ SPRAID_DELEING,
+ SPRAID_DEAD,
+};
+
+struct spraid_completion {
+ __le32 result;
+ union {
+ struct {
+ __u8 sense_len;
+ __u8 resv[3];
+ };
+ __le32 result1;
+ };
+ __le16 sq_head;
+ __le16 sq_id;
+ __u16 cmd_id;
+ __le16 status;
+};
+
+struct spraid_ctrl_info {
+ __le32 nd;
+ __le16 max_cmds;
+ __le16 max_cmd_per_dev;
+ __le16 max_sge;
+ __u16 rsvd0;
+ __le16 max_channel;
+ __le16 max_tgt_id;
+ __le16 max_lun;
+ __le16 max_cdb_len;
+ __le16 min_stripe_sz;
+ __le16 max_stripe_sz;
+ __le16 max_strips_per_io;
+ __u16 rsvd1;
+ __u8 mdts;
+ __u8 acl;
+ __u8 aer1;
+ __u8 rsvd2;
+ __le16 vid;
+ __le16 ssvid;
+ __u8 sn[20];
+ __u8 mn[40];
+ __u8 fr[8];
+ __u8 rsvd3[3992];
+};
+
+struct spraid_dev {
+ struct pci_dev *pdev;
+ struct device *dev;
+ struct Scsi_Host *shost;
+ struct spraid_queue *queues;
+ struct dma_pool *prp_page_pool;
+ struct dma_pool *prp_small_pool[MAX_SMALL_POOL_NUM];
+ mempool_t *iod_mempool;
+ struct blk_mq_tag_set admin_tagset;
+ struct request_queue *admin_q;
+ void __iomem *bar;
+ u32 max_qid;
+ u32 num_vecs;
+ u32 queue_count;
+ u32 ioq_depth;
+ int db_stride;
+ u32 __iomem *dbs;
+ struct rw_semaphore devices_rwsem;
+ int numa_node;
+ u32 page_size;
+ u32 ctrl_config;
+ u32 online_queues;
+ u64 cap;
+ struct device ctrl_device;
+ struct cdev cdev;
+ int instance;
+ struct spraid_ctrl_info *ctrl_info;
+ struct spraid_dev_info *devices;
+
+ struct work_struct aen_work;
+ struct work_struct scan_work;
+ struct work_struct timesyn_work;
+
+ enum spraid_state state;
+};
+
+struct spraid_sgl_desc {
+ __le64 addr;
+ __le32 length;
+ __u8 rsvd[3];
+ __u8 type;
+};
+
+union spraid_data_ptr {
+ struct {
+ __le64 prp1;
+ __le64 prp2;
+ };
+ struct spraid_sgl_desc sgl;
+};
+
+struct spraid_admin_common_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le32 cdw2[4];
+ union spraid_data_ptr dptr;
+ __le32 cdw10;
+ __le32 cdw11;
+ __le32 cdw12;
+ __le32 cdw13;
+ __le32 cdw14;
+ __le32 cdw15;
+};
+
+struct spraid_features {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[2];
+ union spraid_data_ptr dptr;
+ __le32 fid;
+ __le32 dword11;
+ __le32 dword12;
+ __le32 dword13;
+ __le32 dword14;
+ __le32 dword15;
+};
+
+struct spraid_create_cq {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[5];
+ __le64 prp1;
+ __u64 rsvd8;
+ __le16 cqid;
+ __le16 qsize;
+ __le16 cq_flags;
+ __le16 irq_vector;
+ __u32 rsvd12[4];
+};
+
+struct spraid_create_sq {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[5];
+ __le64 prp1;
+ __u64 rsvd8;
+ __le16 sqid;
+ __le16 qsize;
+ __le16 sq_flags;
+ __le16 cqid;
+ __u32 rsvd12[4];
+};
+
+struct spraid_delete_queue {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __u32 rsvd1[9];
+ __le16 qid;
+ __u16 rsvd10;
+ __u32 rsvd11[5];
+};
+
+struct spraid_get_info {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u32 rsvd2[4];
+ union spraid_data_ptr dptr;
+ __u8 type;
+ __u8 rsvd10[3];
+ __le32 cdw11;
+ __u32 rsvd12[4];
+};
+
+enum {
+ SPRAID_CMD_FLAG_SGL_METABUF = (1 << 6),
+ SPRAID_CMD_FLAG_SGL_METASEG = (1 << 7),
+ SPRAID_CMD_FLAG_SGL_ALL = SPRAID_CMD_FLAG_SGL_METABUF |
+ SPRAID_CMD_FLAG_SGL_METASEG,
+};
+
+enum spraid_cmd_state {
+ SPRAID_CMD_IDLE = 0,
+ SPRAID_CMD_IN_FLIGHT = 1,
+ SPRAID_CMD_COMPLETE = 2,
+ SPRAID_CMD_TIMEOUT = 3,
+ SPRAID_CMD_TMO_COMPLETE = 4,
+};
+
+struct spraid_abort_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[4];
+ __le16 sqid;
+ __le16 cid;
+ __u32 rsvd11[5];
+};
+
+struct spraid_reset_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __u64 rsvd2[4];
+ __u8 type;
+ __u8 rsvd10[3];
+ __u32 rsvd11[5];
+};
+
+struct spraid_admin_command {
+ union {
+ struct spraid_admin_common_command common;
+ struct spraid_features features;
+ struct spraid_create_cq create_cq;
+ struct spraid_create_sq create_sq;
+ struct spraid_delete_queue delete_queue;
+ struct spraid_get_info get_info;
+ struct spraid_abort_cmd abort;
+ struct spraid_reset_cmd reset;
+ };
+};
+
+struct spraid_ioq_common_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_len;
+ __u8 rsvd2;
+ __le32 cdw3[3];
+ union spraid_data_ptr dptr;
+ __le32 cdw10[6];
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __le32 cdw26[6];
+};
+
+struct spraid_rw_command {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_len;
+ __u8 rsvd2;
+ __u32 rsvd3[3];
+ union spraid_data_ptr dptr;
+ __le64 slba;
+ __le16 nlb;
+ __le16 control;
+ __u32 rsvd13[3];
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __u32 rsvd26[6];
+};
+
+struct spraid_scsi_nonio {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 hdid;
+ __le16 sense_len;
+ __u8 cdb_length;
+ __u8 rsvd2;
+ __u32 rsvd3[3];
+ union spraid_data_ptr dptr;
+ __u32 rsvd10[5];
+ __le32 buffer_len;
+ __u8 cdb[32];
+ __le64 sense_addr;
+ __u32 rsvd26[6];
+};
+
+struct spraid_ioq_command {
+ union {
+ struct spraid_ioq_common_command common;
+ struct spraid_rw_command rw;
+ struct spraid_scsi_nonio scsi_nonio;
+ };
+};
+
+#define SPRAID_IOCTL_ADMIN_CMD \
+ _IOWR('N', 0x41, struct spraid_passthru_common_cmd)
+
+struct spraid_passthru_common_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 rsvd0;
+ __u32 nsid;
+ union {
+ struct {
+ __u16 subopcode:16;
+ __u16 rsvd1;
+ } info_0;
+ __u32 cdw2;
+ };
+ union {
+ struct {
+ __u16 data_len;
+ __u16 param_len;
+ } info_1;
+ __u32 cdw3;
+ };
+ __u64 metadata;
+
+ __u64 addr;
+ __u64 prp2;
+
+ __u32 cdw10;
+ __u32 cdw11;
+ __u32 cdw12;
+ __u32 cdw13;
+ __u32 cdw14;
+ __u32 cdw15;
+ __u32 timeout_ms;
+ __u32 result0;
+ __u32 result1;
+};
+
+struct spraid_admin_request {
+ struct spraid_admin_command *cmd;
+ u32 result0;
+ u32 result1;
+ u16 flags;
+ u16 status;
+};
+
+struct spraid_queue {
+ struct spraid_dev *hdev;
+ spinlock_t sq_lock; /* spinlock for lock handling */
+
+ spinlock_t cq_lock ____cacheline_aligned_in_smp; /* spinlock for lock handling */
+
+ void *sq_cmds;
+
+ struct spraid_completion *cqes;
+
+ dma_addr_t sq_dma_addr;
+ dma_addr_t cq_dma_addr;
+ u32 __iomem *q_db;
+ u8 cq_phase;
+ u8 sqes;
+ u16 qid;
+ u16 sq_tail;
+ u16 cq_head;
+ u16 last_cq_head;
+ u16 q_depth;
+ s16 cq_vector;
+ void *sense;
+ dma_addr_t sense_dma_addr;
+ struct dma_pool *prp_small_pool;
+};
+
+struct spraid_iod {
+ struct spraid_admin_request req;
+ struct spraid_queue *spraidq;
+ enum spraid_cmd_state state;
+ int npages;
+ u32 nsge;
+ u32 length;
+ bool use_sgl;
+ bool sg_drv_mgmt;
+ dma_addr_t first_dma;
+ void *sense;
+ dma_addr_t sense_dma;
+ struct scatterlist *sg;
+ struct scatterlist inline_sg[0];
+};
+
+#define SPRAID_DEV_INFO_ATTR_BOOT(attr) ((attr) & 0x01)
+#define SPRAID_DEV_INFO_ATTR_HDD(attr) ((attr) & 0x02)
+#define SPRAID_DEV_INFO_ATTR_PT(attr) (((attr) & 0x22) == 0x02)
+#define SPRAID_DEV_INFO_ATTR_RAWDISK(attr) ((attr) & 0x20)
+
+#define SPRAID_DEV_INFO_FLAG_VALID(flag) ((flag) & 0x01)
+#define SPRAID_DEV_INFO_FLAG_CHANGE(flag) ((flag) & 0x02)
+
+struct spraid_dev_info {
+ __le32 hdid;
+ __le16 target;
+ __u8 channel;
+ __u8 lun;
+ __u8 attr;
+ __u8 flag;
+ __u8 rsvd2[2];
+};
+
+#define MAX_DEV_ENTRY_PER_PAGE_4K 340
+struct spraid_dev_list {
+ __le32 dev_num;
+ __u32 rsvd0[3];
+ struct spraid_dev_info devices[MAX_DEV_ENTRY_PER_PAGE_4K];
+};
+
+struct spraid_sdev_hostdata {
+ u32 hdid;
+};
+
+#endif
+
diff --git a/drivers/scsi/spraid/spraid_main.c b/drivers/scsi/spraid/spraid_main.c
new file mode 100644
index 000000000000..aebc92b84510
--- /dev/null
+++ b/drivers/scsi/spraid/spraid_main.c
@@ -0,0 +1,3159 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Linux spraid device driver
+ *
+ * Copyright(c) 2021 Ramaxel Memory Technology, Ltd
+ *
+ * Description: Linux device driver for Ramaxel Memory Technology Logic SPRAID
+ * controller. This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#define pr_fmt(fmt) "spraid: " fmt
+
+#include <linux/sched/signal.h>
+#include <linux/version.h>
+#include <linux/pci.h>
+#include <linux/aer.h>
+#include <linux/module.h>
+#include <linux/ioport.h>
+#include <linux/device.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/cdev.h>
+#include <linux/sysfs.h>
+#include <linux/gfp.h>
+#include <linux/types.h>
+#include <linux/ratelimit.h>
+#include <linux/once.h>
+#include <linux/debugfs.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/blkdev.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_transport.h>
+#include <scsi/scsi_dbg.h>
+
+#include "spraid.h"
+
+static u32 admin_tmout = 60;
+module_param(admin_tmout, uint, 0644);
+MODULE_PARM_DESC(admin_tmout, "admin commands timeout (seconds)");
+
+static u32 scmd_tmout_pt = 30;
+module_param(scmd_tmout_pt, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_pt, "scsi commands timeout for passthrough(seconds)");
+
+static u32 scmd_tmout_nonpt = 180;
+module_param(scmd_tmout_nonpt, uint, 0644);
+MODULE_PARM_DESC(scmd_tmout_nonpt, "scsi commands timeout for rawdisk&raid(seconds)");
+
+static u32 wait_abl_tmout = 3;
+module_param(wait_abl_tmout, uint, 0644);
+MODULE_PARM_DESC(wait_abl_tmout, "wait abnormal io timeout(seconds)");
+
+static bool use_sgl_force;
+module_param(use_sgl_force, bool, 0644);
+MODULE_PARM_DESC(use_sgl_force, "force IO use sgl format, default false");
+
+static int ioq_depth_set(const char *val, const struct kernel_param *kp);
+static const struct kernel_param_ops ioq_depth_ops = {
+ .set = ioq_depth_set,
+ .get = param_get_uint,
+};
+
+static u32 io_queue_depth = 1024;
+module_param_cb(io_queue_depth, &ioq_depth_ops, &io_queue_depth, 0644);
+MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
+
+static int log_debug_switch_set(const char *val, const struct kernel_param *kp)
+{
+ u8 n = 0;
+ int ret;
+
+ ret = kstrtou8(val, 10, &n);
+ if (ret != 0)
+ return -EINVAL;
+
+ return param_set_byte(val, kp);
+}
+
+static const struct kernel_param_ops log_debug_switch_ops = {
+ .set = log_debug_switch_set,
+ .get = param_get_byte,
+};
+
+static unsigned char log_debug_switch = 1;
+module_param_cb(log_debug_switch, &log_debug_switch_ops, &log_debug_switch, 0644);
+MODULE_PARM_DESC(log_debug_switch, "set log state, default non-zero for switch on");
+
+static int small_pool_num_set(const char *val, const struct kernel_param *kp)
+{
+ u8 n = 0;
+ int ret;
+
+ ret = kstrtou8(val, 10, &n);
+ if (ret != 0)
+ return -EINVAL;
+ if (n > MAX_SMALL_POOL_NUM)
+ n = MAX_SMALL_POOL_NUM;
+ if (n < 1)
+ n = 1;
+ *((u8 *)kp->arg) = n;
+
+ return 0;
+}
+
+static const struct kernel_param_ops small_pool_num_ops = {
+ .set = small_pool_num_set,
+ .get = param_get_byte,
+};
+
+static unsigned char small_pool_num = 4;
+module_param_cb(small_pool_num, &small_pool_num_ops, &small_pool_num, 0644);
+MODULE_PARM_DESC(small_pool_num, "set prp small pool num, default 4, MAX 16");
+
+static void spraid_free_queue(struct spraid_queue *spraidq);
+static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result);
+static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result);
+
+static DEFINE_IDA(spraid_instance_ida);
+static dev_t spraid_chr_devt;
+static struct class *spraid_class;
+
+#define SPRAID_CAP_TIMEOUT_UNIT_MS (HZ / 2)
+
+static struct workqueue_struct *spraid_wq;
+
+#define dev_log_dbg(dev, fmt, ...) do { \
+ if (unlikely(log_debug_switch)) \
+ dev_info(dev, "[%s] [%d] " fmt, \
+ __func__, __LINE__, ##__VA_ARGS__); \
+} while (0)
+
+#define SPRAID_DRV_VERSION "1.00.0"
+
+#define SHUTDOWN_TIMEOUT (50 * HZ)
+#define ADMIN_TIMEOUT (admin_tmout * HZ)
+#define ADMIN_ERR_TIMEOUT 32757
+
+#define SPRAID_WAIT_ABNL_CMD_TIMEOUT (wait_abl_tmout * 2)
+
+enum FW_STAT_CODE {
+ FW_STAT_OK = 0,
+ FW_STAT_NEED_CHECK,
+ FW_STAT_ERROR,
+ FW_STAT_EP_PCIE_ERROR,
+ FW_STAT_NAC_DMA_ERROR,
+ FW_STAT_ABORTED,
+ FW_STAT_NEED_RETRY
+};
+
+static int ioq_depth_set(const char *val, const struct kernel_param *kp)
+{
+ int n = 0;
+ int ret;
+
+ ret = kstrtoint(val, 10, &n);
+ if (ret != 0 || n < 2)
+ return -EINVAL;
+
+ return param_set_int(val, kp);
+}
+
+static int spraid_remap_bar(struct spraid_dev *hdev, u32 size)
+{
+ struct pci_dev *pdev = hdev->pdev;
+
+ if (size > pci_resource_len(pdev, 0)) {
+		dev_err(hdev->dev, "Input size[%u] exceeds bar0 length[%llu]\n",
+ size, pci_resource_len(pdev, 0));
+ return -ENOMEM;
+ }
+
+ if (hdev->bar)
+ iounmap(hdev->bar);
+
+ hdev->bar = ioremap(pci_resource_start(pdev, 0), size);
+ if (!hdev->bar) {
+ dev_err(hdev->dev, "ioremap for bar0 failed\n");
+ return -ENOMEM;
+ }
+ hdev->dbs = hdev->bar + SPRAID_REG_DBS;
+
+ return 0;
+}
+
+static int spraid_dev_map(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ int ret;
+
+ ret = pci_request_mem_regions(pdev, "spraid");
+ if (ret) {
+ dev_err(hdev->dev, "fail to request memory regions\n");
+ return ret;
+ }
+
+ ret = spraid_remap_bar(hdev, SPRAID_REG_DBS + 4096);
+ if (ret) {
+ pci_release_mem_regions(pdev);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void spraid_dev_unmap(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+
+ if (hdev->bar) {
+ iounmap(hdev->bar);
+ hdev->bar = NULL;
+ }
+ pci_release_mem_regions(pdev);
+}
+
+static int spraid_pci_enable(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ int ret = -ENOMEM;
+
+ if (pci_enable_device_mem(pdev)) {
+ dev_err(hdev->dev, "Enable pci device memory resources failed\n");
+ return ret;
+ }
+ pci_set_master(pdev);
+
+ if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64))) {
+ dev_err(hdev->dev, "Set dma mask and coherent failed\n");
+ goto disable;
+ }
+
+ if (readl(hdev->bar + SPRAID_REG_CSTS) == U32_MAX) {
+ ret = -ENODEV;
+ dev_err(hdev->dev, "Read csts register failed\n");
+ goto disable;
+ }
+
+ ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
+ if (ret < 0) {
+ dev_err(hdev->dev, "Allocate one IRQ for setup admin channel failed\n");
+ goto disable;
+ }
+
+ hdev->cap = lo_hi_readq(hdev->bar + SPRAID_REG_CAP);
+ hdev->ioq_depth = min_t(u32, SPRAID_CAP_MQES(hdev->cap) + 1,
+ io_queue_depth);
+ hdev->db_stride = 1 << SPRAID_CAP_STRIDE(hdev->cap);
+
+ pci_enable_pcie_error_reporting(pdev);
+ pci_save_state(pdev);
+
+ return 0;
+
+disable:
+ pci_disable_device(pdev);
+ return ret;
+}
+
+static inline
+struct spraid_admin_request *spraid_admin_req(struct request *req)
+{
+ return blk_mq_rq_to_pdu(req);
+}
+
+static int spraid_npages_prp(u32 size, struct spraid_dev *hdev)
+{
+ u32 nprps = DIV_ROUND_UP(size + hdev->page_size, hdev->page_size);
+
+ return DIV_ROUND_UP(PRP_ENTRY_SIZE * nprps, PAGE_SIZE - PRP_ENTRY_SIZE);
+}
+
+static int spraid_npages_sgl(u32 nseg)
+{
+ return DIV_ROUND_UP(nseg * sizeof(struct spraid_sgl_desc), PAGE_SIZE);
+}
+
+static void **spraid_iod_list(struct spraid_iod *iod)
+{
+ return (void **)(iod->inline_sg + (iod->sg_drv_mgmt ? iod->nsge : 0));
+}
+
+static u32 spraid_iod_ext_size(struct spraid_dev *hdev, u32 size, u32 nsge,
+ bool sg_drv_mgmt, bool use_sgl)
+{
+ size_t alloc_size, sg_size;
+
+ if (use_sgl)
+ alloc_size = sizeof(__le64 *) * spraid_npages_sgl(nsge);
+ else
+ alloc_size = sizeof(__le64 *) * spraid_npages_prp(size, hdev);
+
+ sg_size = sg_drv_mgmt ? (sizeof(struct scatterlist) * nsge) : 0;
+ return sg_size + alloc_size;
+}
+
+static u32 spraid_cmd_size(struct spraid_dev *hdev, bool sg_drv_mgmt, bool use_sgl)
+{
+ u32 alloc_size = spraid_iod_ext_size(hdev, SPRAID_INT_BYTES(hdev),
+ SPRAID_INT_PAGES, sg_drv_mgmt, use_sgl);
+
+ dev_log_dbg(hdev->dev, "sg_drv_mgmt: %s, use_sgl: %s, iod size: %lu, alloc_size: %u\n",
+ sg_drv_mgmt ? "true" : "false", use_sgl ? "true" : "false",
+ sizeof(struct spraid_iod), alloc_size);
+
+ return sizeof(struct spraid_iod) + alloc_size;
+}
+
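+/*
+ * Build the PRP entries for a scattered IO. The range covered by prp1 (up
+ * to the first page boundary) is skipped here; if the remainder fits in a
+ * single page it is addressed directly, otherwise PRP list pages are
+ * allocated and chained, with the last entry of each full list pointing to
+ * the next list page.
+ */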
+static int spraid_setup_prps(struct spraid_dev *hdev, struct spraid_iod *iod)
+{
+ struct scatterlist *sg = iod->sg;
+ u64 dma_addr = sg_dma_address(sg);
+ int dma_len = sg_dma_len(sg);
+ __le64 *prp_list, *old_prp_list;
+ u32 page_size = hdev->page_size;
+ int offset = dma_addr & (page_size - 1);
+ void **list = spraid_iod_list(iod);
+ int length = iod->length;
+ struct dma_pool *pool;
+ dma_addr_t prp_dma;
+ int nprps, i;
+
+ length -= (page_size - offset);
+ if (length <= 0) {
+ iod->first_dma = 0;
+ return 0;
+ }
+
+ dma_len -= (page_size - offset);
+ if (dma_len) {
+ dma_addr += (page_size - offset);
+ } else {
+ sg = sg_next(sg);
+ dma_addr = sg_dma_address(sg);
+ dma_len = sg_dma_len(sg);
+ }
+
+ if (length <= page_size) {
+ iod->first_dma = dma_addr;
+ return 0;
+ }
+
+ nprps = DIV_ROUND_UP(length, page_size);
+ if (nprps <= (SMALL_POOL_SIZE / PRP_ENTRY_SIZE)) {
+ pool = iod->spraidq->prp_small_pool;
+ iod->npages = 0;
+ } else {
+ pool = hdev->prp_page_pool;
+ iod->npages = 1;
+ }
+
+ prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+ if (!prp_list) {
+ dev_err_ratelimited(hdev->dev, "Allocate first prp_list memory failed\n");
+ iod->first_dma = dma_addr;
+ iod->npages = -1;
+ return -ENOMEM;
+ }
+ list[0] = prp_list;
+ iod->first_dma = prp_dma;
+ i = 0;
+ for (;;) {
+ if (i == page_size / PRP_ENTRY_SIZE) {
+ old_prp_list = prp_list;
+
+ prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+ if (!prp_list) {
+ dev_err_ratelimited(hdev->dev, "Allocate %dth prp_list memory failed\n",
+ iod->npages + 1);
+ return -ENOMEM;
+ }
+ list[iod->npages++] = prp_list;
+ prp_list[0] = old_prp_list[i - 1];
+ old_prp_list[i - 1] = cpu_to_le64(prp_dma);
+ i = 1;
+ }
+ prp_list[i++] = cpu_to_le64(dma_addr);
+ dma_len -= page_size;
+ dma_addr += page_size;
+ length -= page_size;
+ if (length <= 0)
+ break;
+ if (dma_len > 0)
+ continue;
+ if (unlikely(dma_len < 0))
+ goto bad_sgl;
+ sg = sg_next(sg);
+ dma_addr = sg_dma_address(sg);
+ dma_len = sg_dma_len(sg);
+ }
+
+ return 0;
+
+bad_sgl:
+ dev_err(hdev->dev, "Setup prps, invalid SGL for payload: %d nents: %d\n",
+ iod->length, iod->nsge);
+ return -EIO;
+}
+
+#define SGES_PER_PAGE (PAGE_SIZE / sizeof(struct spraid_sgl_desc))
+
+static void spraid_submit_cmd(struct spraid_queue *spraidq, const void *cmd)
+{
+ u32 sqes = SQE_SIZE(spraidq->qid);
+ unsigned long flags;
+ struct spraid_admin_common_command *acd =
+ (struct spraid_admin_common_command *)cmd;
+
+ spin_lock_irqsave(&spraidq->sq_lock, flags);
+ memcpy((spraidq->sq_cmds + sqes * spraidq->sq_tail), cmd, sqes);
+ if (++spraidq->sq_tail == spraidq->q_depth)
+ spraidq->sq_tail = 0;
+
+ writel(spraidq->sq_tail, spraidq->q_db);
+ spin_unlock_irqrestore(&spraidq->sq_lock, flags);
+
+ dev_log_dbg(spraidq->hdev->dev, "cid[%d], qid[%d], opcode[0x%x], flags[0x%x], hdid[%u]\n",
+ acd->command_id, spraidq->qid, acd->opcode, acd->flags,
+ le32_to_cpu(acd->hdid));
+}
+
+static u32 spraid_mod64(u64 dividend, u32 divisor)
+{
+ u64 d;
+ u32 remainder;
+
+	if (WARN_ON_ONCE(!divisor))
+		return 0;
+
+ d = dividend;
+ remainder = do_div(d, divisor);
+ return remainder;
+}
+
+static inline bool spraid_is_rw_scmd(struct scsi_cmnd *scmd)
+{
+ switch (scmd->cmnd[0]) {
+ case READ_6:
+ case READ_10:
+ case READ_12:
+ case READ_16:
+ case READ_32:
+ case WRITE_6:
+ case WRITE_10:
+ case WRITE_12:
+ case WRITE_16:
+ case WRITE_32:
+ return true;
+ default:
+ return false;
+ }
+}
+
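+/*
+ * Check whether the scatterlist can be described with PRPs: middle elements
+ * must be page aligned in both address and length, the first element must
+ * end on a page boundary and the last must start on one. Anything else
+ * falls back to the SGL format.
+ */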
+static bool spraid_is_prp(struct spraid_dev *hdev, struct scsi_cmnd *scmd, u32 nsge)
+{
+ struct scatterlist *sg = scsi_sglist(scmd);
+ u32 page_size = hdev->page_size;
+ bool is_prp = true;
+ int i = 0;
+
+ scsi_for_each_sg(scmd, sg, nsge, i) {
+ if (i != 0 && i != nsge - 1) {
+ if (spraid_mod64(sg_dma_len(sg), page_size) ||
+ spraid_mod64(sg_dma_address(sg), page_size)) {
+ is_prp = false;
+ break;
+ }
+ }
+
+ if (nsge > 1 && i == 0) {
+ if ((spraid_mod64((sg_dma_address(sg) + sg_dma_len(sg)), page_size))) {
+ is_prp = false;
+ break;
+ }
+ }
+
+ if (nsge > 1 && i == (nsge - 1)) {
+ if (spraid_mod64(sg_dma_address(sg), page_size)) {
+ is_prp = false;
+ break;
+ }
+ }
+ }
+
+ return is_prp;
+}
+
+enum {
+ SPRAID_SGL_FMT_DATA_DESC = 0x00,
+ SPRAID_SGL_FMT_SEG_DESC = 0x02,
+ SPRAID_SGL_FMT_LAST_SEG_DESC = 0x03,
+ SPRAID_KEY_SGL_FMT_DATA_DESC = 0x04,
+ SPRAID_TRANSPORT_SGL_DATA_DESC = 0x05
+};
+
+static void spraid_sgl_set_data(struct spraid_sgl_desc *sge, struct scatterlist *sg)
+{
+ sge->addr = cpu_to_le64(sg_dma_address(sg));
+ sge->length = cpu_to_le32(sg_dma_len(sg));
+ sge->type = SPRAID_SGL_FMT_DATA_DESC << 4;
+}
+
+static void spraid_sgl_set_seg(struct spraid_sgl_desc *sge, dma_addr_t dma_addr, int entries)
+{
+ sge->addr = cpu_to_le64(dma_addr);
+ if (entries <= SGES_PER_PAGE) {
+ sge->length = cpu_to_le32(entries * sizeof(*sge));
+ sge->type = SPRAID_SGL_FMT_LAST_SEG_DESC << 4;
+ } else {
+ sge->length = cpu_to_le32(PAGE_SIZE);
+ sge->type = SPRAID_SGL_FMT_SEG_DESC << 4;
+ }
+}
+
+static int spraid_setup_ioq_cmd_sgl(struct spraid_dev *hdev,
+ struct scsi_cmnd *scmd, struct spraid_ioq_command *ioq_cmd,
+ struct spraid_iod *iod)
+{
+ struct spraid_sgl_desc *sg_list, *link, *old_sg_list;
+ struct scatterlist *sg = scsi_sglist(scmd);
+ void **list = spraid_iod_list(iod);
+ struct dma_pool *pool;
+ int nsge = iod->nsge;
+ dma_addr_t sgl_dma;
+ int i = 0;
+
+ ioq_cmd->common.flags |= SPRAID_CMD_FLAG_SGL_METABUF;
+
+ if (nsge == 1) {
+ spraid_sgl_set_data(&ioq_cmd->common.dptr.sgl, sg);
+ return 0;
+ }
+
+ if (nsge <= (SMALL_POOL_SIZE / sizeof(struct spraid_sgl_desc))) {
+ pool = iod->spraidq->prp_small_pool;
+ iod->npages = 0;
+ } else {
+ pool = hdev->prp_page_pool;
+ iod->npages = 1;
+ }
+
+ sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
+ if (!sg_list) {
+ dev_err_ratelimited(hdev->dev, "Allocate first sgl_list failed\n");
+ iod->npages = -1;
+ return -ENOMEM;
+ }
+
+ list[0] = sg_list;
+ iod->first_dma = sgl_dma;
+ spraid_sgl_set_seg(&ioq_cmd->common.dptr.sgl, sgl_dma, nsge);
+ do {
+ if (i == SGES_PER_PAGE) {
+ old_sg_list = sg_list;
+ link = &old_sg_list[SGES_PER_PAGE - 1];
+
+ sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
+ if (!sg_list) {
+ dev_err_ratelimited(hdev->dev, "Allocate %dth sgl_list failed\n",
+ iod->npages + 1);
+ return -ENOMEM;
+ }
+ list[iod->npages++] = sg_list;
+
+ i = 0;
+ memcpy(&sg_list[i++], link, sizeof(*link));
+ spraid_sgl_set_seg(link, sgl_dma, nsge);
+ }
+
+ spraid_sgl_set_data(&sg_list[i++], sg);
+ sg = sg_next(sg);
+ } while (--nsge > 0);
+
+ return 0;
+}
+
+#define SPRAID_RW_FUA BIT(14)
+
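+/*
+ * Translate a 6/10/12/16/32-byte READ or WRITE CDB into the firmware rw
+ * command: extract the start LBA, the transfer length and the FUA bit.
+ * A 6-byte CDB with a zero length field means 256 blocks.
+ */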
+static void spraid_setup_rw_cmd(struct spraid_dev *hdev,
+ struct spraid_rw_command *rw,
+ struct scsi_cmnd *scmd)
+{
+ u32 start_lba_lo, start_lba_hi;
+ u32 datalength = 0;
+ u16 control = 0;
+
+ start_lba_lo = 0;
+ start_lba_hi = 0;
+
+ if (scmd->sc_data_direction == DMA_TO_DEVICE) {
+ rw->opcode = SPRAID_CMD_WRITE;
+ } else if (scmd->sc_data_direction == DMA_FROM_DEVICE) {
+ rw->opcode = SPRAID_CMD_READ;
+ } else {
+ dev_err(hdev->dev, "Invalid IO for unsupported data direction: %d\n",
+ scmd->sc_data_direction);
+ WARN_ON(1);
+ }
+
+ /* 6-byte READ(0x08) or WRITE(0x0A) cdb */
+ if (scmd->cmd_len == 6) {
+ datalength = (u32)(scmd->cmnd[4] == 0 ?
+ IO_6_DEFAULT_TX_LEN : scmd->cmnd[4]);
+ start_lba_lo = ((u32)scmd->cmnd[1] << 16) |
+ ((u32)scmd->cmnd[2] << 8) | (u32)scmd->cmnd[3];
+
+ start_lba_lo &= 0x1FFFFF;
+ }
+
+ /* 10-byte READ(0x28) or WRITE(0x2A) cdb */
+ else if (scmd->cmd_len == 10) {
+ datalength = (u32)scmd->cmnd[8] | ((u32)scmd->cmnd[7] << 8);
+ start_lba_lo = ((u32)scmd->cmnd[2] << 24) |
+ ((u32)scmd->cmnd[3] << 16) |
+ ((u32)scmd->cmnd[4] << 8) | (u32)scmd->cmnd[5];
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+
+ /* 12-byte READ(0xA8) or WRITE(0xAA) cdb */
+ else if (scmd->cmd_len == 12) {
+ datalength = ((u32)scmd->cmnd[6] << 24) |
+ ((u32)scmd->cmnd[7] << 16) |
+ ((u32)scmd->cmnd[8] << 8) | (u32)scmd->cmnd[9];
+ start_lba_lo = ((u32)scmd->cmnd[2] << 24) |
+ ((u32)scmd->cmnd[3] << 16) |
+ ((u32)scmd->cmnd[4] << 8) | (u32)scmd->cmnd[5];
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+ /* 16-byte READ(0x88) or WRITE(0x8A) cdb */
+ else if (scmd->cmd_len == 16) {
+ datalength = ((u32)scmd->cmnd[10] << 24) |
+ ((u32)scmd->cmnd[11] << 16) |
+ ((u32)scmd->cmnd[12] << 8) | (u32)scmd->cmnd[13];
+ start_lba_lo = ((u32)scmd->cmnd[6] << 24) |
+ ((u32)scmd->cmnd[7] << 16) |
+ ((u32)scmd->cmnd[8] << 8) | (u32)scmd->cmnd[9];
+ start_lba_hi = ((u32)scmd->cmnd[2] << 24) |
+ ((u32)scmd->cmnd[3] << 16) |
+ ((u32)scmd->cmnd[4] << 8) | (u32)scmd->cmnd[5];
+
+ if (scmd->cmnd[1] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+ /* 32-byte READ(0x88) or WRITE(0x8A) cdb */
+ else if (scmd->cmd_len == 32) {
+ datalength = ((u32)scmd->cmnd[28] << 24) |
+ ((u32)scmd->cmnd[29] << 16) |
+ ((u32)scmd->cmnd[30] << 8) | (u32)scmd->cmnd[31];
+ start_lba_lo = ((u32)scmd->cmnd[16] << 24) |
+ ((u32)scmd->cmnd[17] << 16) |
+ ((u32)scmd->cmnd[18] << 8) | (u32)scmd->cmnd[19];
+ start_lba_hi = ((u32)scmd->cmnd[12] << 24) |
+ ((u32)scmd->cmnd[13] << 16) |
+ ((u32)scmd->cmnd[14] << 8) | (u32)scmd->cmnd[15];
+
+ if (scmd->cmnd[10] & FUA_MASK)
+ control |= SPRAID_RW_FUA;
+ }
+
+ if (unlikely(datalength > U16_MAX || datalength == 0)) {
+ dev_err(hdev->dev, "Invalid IO for illegal transfer data length: %u\n",
+ datalength);
+ WARN_ON(1);
+ }
+
+ rw->slba = cpu_to_le64(((u64)start_lba_hi << 32) | start_lba_lo);
+ /* 0base for nlb */
+ rw->nlb = cpu_to_le16((u16)(datalength - 1));
+ rw->control = cpu_to_le16(control);
+}
+
+static void spraid_setup_nonio_cmd(struct spraid_dev *hdev,
+ struct spraid_scsi_nonio *scsi_nonio, struct scsi_cmnd *scmd)
+{
+ scsi_nonio->buffer_len = cpu_to_le32(scsi_bufflen(scmd));
+
+ switch (scmd->sc_data_direction) {
+ case DMA_NONE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_NONE;
+ break;
+ case DMA_TO_DEVICE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_TODEV;
+ break;
+ case DMA_FROM_DEVICE:
+ scsi_nonio->opcode = SPRAID_CMD_NONIO_FROMDEV;
+ break;
+ default:
+ dev_err(hdev->dev, "Invalid IO for unsupported data direction: %d\n",
+ scmd->sc_data_direction);
+ WARN_ON(1);
+ }
+}
+
+static void spraid_setup_ioq_cmd(struct spraid_dev *hdev,
+ struct spraid_ioq_command *ioq_cmd, struct scsi_cmnd *scmd)
+{
+ memcpy(ioq_cmd->common.cdb, scmd->cmnd, scmd->cmd_len);
+ ioq_cmd->common.cdb_len = scmd->cmd_len;
+
+ if (spraid_is_rw_scmd(scmd))
+ spraid_setup_rw_cmd(hdev, &ioq_cmd->rw, scmd);
+ else
+ spraid_setup_nonio_cmd(hdev, &ioq_cmd->scsi_nonio, scmd);
+}
+
+static int spraid_init_iod(struct spraid_dev *hdev,
+ struct spraid_iod *iod, struct spraid_ioq_command *ioq_cmd,
+ struct scsi_cmnd *scmd)
+{
+ if (unlikely(!iod->sense)) {
+ dev_err(hdev->dev, "Allocate sense data buffer failed\n");
+ return -ENOMEM;
+ }
+ ioq_cmd->common.sense_addr = cpu_to_le64(iod->sense_dma);
+ ioq_cmd->common.sense_len = cpu_to_le16(SCSI_SENSE_BUFFERSIZE);
+
+ iod->nsge = 0;
+ iod->npages = -1;
+ iod->use_sgl = 0;
+ iod->sg_drv_mgmt = false;
+ WRITE_ONCE(iod->state, SPRAID_CMD_IDLE);
+
+ return 0;
+}
+
+static void spraid_free_iod_res(struct spraid_dev *hdev, struct spraid_iod *iod)
+{
+ const int last_prp = hdev->page_size / sizeof(__le64) - 1;
+ dma_addr_t dma_addr, next_dma_addr;
+ struct spraid_sgl_desc *sg_list;
+ __le64 *prp_list;
+ void *addr;
+ int i;
+
+ dma_addr = iod->first_dma;
+ if (iod->npages == 0)
+ dma_pool_free(iod->spraidq->prp_small_pool, spraid_iod_list(iod)[0], dma_addr);
+
+ for (i = 0; i < iod->npages; i++) {
+ addr = spraid_iod_list(iod)[i];
+
+ if (iod->use_sgl) {
+ sg_list = addr;
+ next_dma_addr =
+ le64_to_cpu((sg_list[SGES_PER_PAGE - 1]).addr);
+ } else {
+ prp_list = addr;
+ next_dma_addr = le64_to_cpu(prp_list[last_prp]);
+ }
+
+ dma_pool_free(hdev->prp_page_pool, addr, dma_addr);
+ dma_addr = next_dma_addr;
+ }
+
+ if (iod->sg_drv_mgmt && iod->sg != iod->inline_sg) {
+ iod->sg_drv_mgmt = false;
+ mempool_free(iod->sg, hdev->iod_mempool);
+ }
+
+ iod->sense = NULL;
+ iod->npages = -1;
+}
+
+static int spraid_io_map_data(struct spraid_dev *hdev, struct spraid_iod *iod,
+ struct scsi_cmnd *scmd, struct spraid_ioq_command *ioq_cmd)
+{
+ int ret;
+
+ iod->nsge = scsi_dma_map(scmd);
+
+ /* No data to DMA, it may be scsi no-rw command */
+ if (unlikely(iod->nsge == 0))
+ return 0;
+
+ iod->length = scsi_bufflen(scmd);
+ iod->sg = scsi_sglist(scmd);
+ iod->use_sgl = !spraid_is_prp(hdev, scmd, iod->nsge);
+
+ if (iod->use_sgl) {
+ ret = spraid_setup_ioq_cmd_sgl(hdev, scmd, ioq_cmd, iod);
+ } else {
+ ret = spraid_setup_prps(hdev, iod);
+ ioq_cmd->common.dptr.prp1 =
+ cpu_to_le64(sg_dma_address(iod->sg));
+ ioq_cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
+ }
+
+ if (ret)
+ scsi_dma_unmap(scmd);
+
+ return ret;
+}
+
+static void spraid_map_status(struct spraid_iod *iod, struct scsi_cmnd *scmd,
+ struct spraid_completion *cqe)
+{
+ scsi_set_resid(scmd, 0);
+
+ switch ((le16_to_cpu(cqe->status) >> 1) & 0x7f) {
+ case FW_STAT_OK:
+ set_host_byte(scmd, DID_OK);
+ break;
+ case FW_STAT_NEED_CHECK:
+ set_host_byte(scmd, DID_OK);
+ scmd->result |= le16_to_cpu(cqe->status) >> 8;
+ if (scmd->result & SAM_STAT_CHECK_CONDITION) {
+ memset(scmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
+ memcpy(scmd->sense_buffer, iod->sense, SCSI_SENSE_BUFFERSIZE);
+ set_driver_byte(scmd, DRIVER_SENSE);
+ }
+ break;
+ case FW_STAT_ABORTED:
+ set_host_byte(scmd, DID_ABORT);
+ break;
+ case FW_STAT_NEED_RETRY:
+ set_host_byte(scmd, DID_REQUEUE);
+ break;
+ default:
+ set_host_byte(scmd, DID_BAD_TARGET);
+ break;
+ }
+}
+
+static inline void spraid_get_tag_from_scmd(struct scsi_cmnd *scmd, u16 *qid, u16 *cid)
+{
+ u32 tag = blk_mq_unique_tag(scmd->request);
+
+ *qid = blk_mq_unique_tag_to_hwq(tag) + 1;
+ *cid = blk_mq_unique_tag_to_tag(tag);
+}
+
+static int spraid_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
+{
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_dev *hdev = shost_priv(shost);
+ struct scsi_device *sdev = scmd->device;
+ struct spraid_sdev_hostdata *hostdata;
+ struct spraid_ioq_command ioq_cmd;
+ struct spraid_queue *ioq;
+ unsigned long elapsed;
+ u16 hwq, cid;
+ int ret;
+
+ if (unlikely(!scmd)) {
+ dev_err(hdev->dev, "err, scmd is null, return 0\n");
+ return 0;
+ }
+
+ if (unlikely(hdev->state != SPRAID_LIVE)) {
+ set_host_byte(scmd, DID_NO_CONNECT);
+ scmd->scsi_done(scmd);
+ dev_err(hdev->dev, "err, hdev state is not live.\n");
+ return 0;
+ }
+
+ if (log_debug_switch)
+ scsi_print_command(scmd);
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ hostdata = sdev->hostdata;
+ ioq = &hdev->queues[hwq];
+ memset(&ioq_cmd, 0, sizeof(ioq_cmd));
+ ioq_cmd.rw.hdid = cpu_to_le32(hostdata->hdid);
+ ioq_cmd.rw.command_id = cid;
+
+ spraid_setup_ioq_cmd(hdev, &ioq_cmd, scmd);
+
+ ret = cid * SCSI_SENSE_BUFFERSIZE;
+ iod->sense = ioq->sense + ret;
+ iod->sense_dma = ioq->sense_dma_addr + ret;
+
+ ret = spraid_init_iod(hdev, iod, &ioq_cmd, scmd);
+ if (unlikely(ret))
+ return SCSI_MLQUEUE_HOST_BUSY;
+
+ iod->spraidq = ioq;
+ ret = spraid_io_map_data(hdev, iod, scmd, &ioq_cmd);
+ if (unlikely(ret)) {
+ dev_err(hdev->dev, "spraid_io_map_data Err.\n");
+ set_host_byte(scmd, DID_ERROR);
+ scmd->scsi_done(scmd);
+ ret = 0;
+ goto deinit_iod;
+ }
+
+ WRITE_ONCE(iod->state, SPRAID_CMD_IN_FLIGHT);
+ spraid_submit_cmd(ioq, &ioq_cmd);
+ elapsed = jiffies - scmd->jiffies_at_alloc;
+ dev_log_dbg(hdev->dev, "cid[%d], qid[%d] submit IO cost %3ld.%3ld seconds\n",
+ cid, hwq, elapsed / HZ, elapsed % HZ);
+ return 0;
+
+deinit_iod:
+ spraid_free_iod_res(hdev, iod);
+ return ret;
+}
+
+static int spraid_match_dev(struct spraid_dev *hdev, u16 idx, struct scsi_device *sdev)
+{
+ if (SPRAID_DEV_INFO_FLAG_VALID(hdev->devices[idx].flag)) {
+ if (sdev->channel == hdev->devices[idx].channel &&
+ sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
+ sdev->lun < hdev->devices[idx].lun)
+ return 1;
+ }
+
+ return 0;
+}
+
+static int spraid_slave_alloc(struct scsi_device *sdev)
+{
+ struct spraid_sdev_hostdata *hostdata;
+ struct spraid_dev *hdev;
+ u16 idx;
+
+ hdev = shost_priv(sdev->host);
+ hostdata = kzalloc(sizeof(*hostdata), GFP_KERNEL);
+ if (!hostdata) {
+ dev_err(hdev->dev, "Alloc scsi host data memory failed\n");
+ return -ENOMEM;
+ }
+
+ down_read(&hdev->devices_rwsem);
+ for (idx = 0; idx < le32_to_cpu(hdev->ctrl_info->nd); idx++) {
+ if (spraid_match_dev(hdev, idx, sdev))
+ goto scan_host;
+ }
+ up_read(&hdev->devices_rwsem);
+
+ kfree(hostdata);
+ return -ENXIO;
+
+scan_host:
+	dev_log_dbg(hdev->dev, "Match device succeeded\n");
+
+ hostdata->hdid = le32_to_cpu(hdev->devices[idx].hdid);
+ sdev->hostdata = hostdata;
+ up_read(&hdev->devices_rwsem);
+ return 0;
+}
+
+static void spraid_slave_destroy(struct scsi_device *sdev)
+{
+ kfree(sdev->hostdata);
+ sdev->hostdata = NULL;
+}
+
+static int spraid_slave_configure(struct scsi_device *sdev)
+{
+ u16 idx;
+ unsigned int timeout = scmd_tmout_nonpt * HZ;
+ struct spraid_dev *hdev = shost_priv(sdev->host);
+ struct spraid_sdev_hostdata *hostdata = sdev->hostdata;
+
+	if (hostdata) {
+ idx = hostdata->hdid;
+ if (sdev->channel == hdev->devices[idx].channel &&
+ sdev->id == le16_to_cpu(hdev->devices[idx].target) &&
+ sdev->lun < hdev->devices[idx].lun) {
+ if (SPRAID_DEV_INFO_ATTR_PT(hdev->devices[idx].attr))
+ timeout = scmd_tmout_pt * HZ;
+ else
+ timeout = scmd_tmout_nonpt * HZ;
+ } else {
+ dev_err(hdev->dev, "[%s] err, sdev->channel:id:lun[%d:%d:%lld];"
+ "devices[%d], channel:target:lun[%d:%d:%d]\n",
+ __func__, sdev->channel, sdev->id, sdev->lun,
+ idx, hdev->devices[idx].channel,
+ hdev->devices[idx].target,
+ hdev->devices[idx].lun);
+ }
+ } else {
+ dev_err(hdev->dev, "[%s] err, sdev->hostdata is null\n", __func__);
+ }
+
+ blk_queue_rq_timeout(sdev->request_queue, timeout);
+ sdev->eh_timeout = timeout;
+
+ dev_info(hdev->dev, "[%s] sdev->channel:id:lun[%d:%d:%lld], scmd_timeout[%d]s\n",
+ __func__, sdev->channel, sdev->id, sdev->lun, timeout / HZ);
+
+ return 0;
+}
+
+static void spraid_shost_init(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ u8 domain, bus;
+ u32 dev_func;
+
+ domain = pci_domain_nr(pdev->bus);
+ bus = pdev->bus->number;
+ dev_func = pdev->devfn;
+
+ hdev->shost->nr_hw_queues = hdev->online_queues - 1;
+ hdev->shost->can_queue = hdev->ioq_depth;
+
+ hdev->shost->sg_tablesize = le16_to_cpu(hdev->ctrl_info->max_sge);
+ /* 512B per sector */
+ hdev->shost->max_sectors =
+ (1U << ((hdev->ctrl_info->mdts) * 1U) << 12) / 512;
+
+ hdev->shost->cmd_per_lun = MAX_CMD_PER_DEV;
+ hdev->shost->max_channel = le16_to_cpu(hdev->ctrl_info->max_channel);
+ hdev->shost->max_id = le16_to_cpu(hdev->ctrl_info->max_tgt_id);
+ hdev->shost->max_lun = le16_to_cpu(hdev->ctrl_info->max_lun);
+
+ hdev->shost->this_id = -1;
+ hdev->shost->unique_id = (domain << 16) | (bus << 8) | dev_func;
+ hdev->shost->max_cmd_len = MAX_CDB_LEN;
+ hdev->shost->hostt->cmd_size = max(spraid_cmd_size(hdev, false, true),
+ spraid_cmd_size(hdev, false, false));
+}
+
+static inline void spraid_host_deinit(struct spraid_dev *hdev)
+{
+ ida_free(&spraid_instance_ida, hdev->instance);
+}
+
+static int spraid_alloc_queue(struct spraid_dev *hdev, u16 qid, u16 depth)
+{
+ struct spraid_queue *spraidq = &hdev->queues[qid];
+ int ret = 0;
+
+ if (hdev->queue_count > qid) {
+		dev_info(hdev->dev, "[%s] warn: queue already exists\n", __func__);
+ return 0;
+ }
+
+ spraidq->cqes = dma_alloc_coherent(hdev->dev, CQ_SIZE(depth),
+ &spraidq->cq_dma_addr, GFP_KERNEL | __GFP_ZERO);
+ if (!spraidq->cqes)
+ return -ENOMEM;
+
+ spraidq->sq_cmds = dma_alloc_coherent(hdev->dev, SQ_SIZE(qid, depth),
+ &spraidq->sq_dma_addr, GFP_KERNEL);
+ if (!spraidq->sq_cmds) {
+ ret = -ENOMEM;
+ goto free_cqes;
+ }
+
+ spin_lock_init(&spraidq->sq_lock);
+ spin_lock_init(&spraidq->cq_lock);
+ spraidq->hdev = hdev;
+ spraidq->q_depth = depth;
+ spraidq->qid = qid;
+ spraidq->cq_vector = -1;
+ hdev->queue_count++;
+
+ /* alloc sense buffer */
+ spraidq->sense = dma_alloc_coherent(hdev->dev, SENSE_SIZE(depth),
+ &spraidq->sense_dma_addr, GFP_KERNEL | __GFP_ZERO);
+ if (!spraidq->sense) {
+ ret = -ENOMEM;
+ goto free_sq_cmds;
+ }
+
+ return 0;
+
+free_sq_cmds:
+ dma_free_coherent(hdev->dev, SQ_SIZE(qid, depth), (void *)spraidq->sq_cmds,
+ spraidq->sq_dma_addr);
+free_cqes:
+ dma_free_coherent(hdev->dev, CQ_SIZE(depth), (void *)spraidq->cqes,
+ spraidq->cq_dma_addr);
+ return ret;
+}
+
+static int spraid_wait_ready(struct spraid_dev *hdev, u64 cap, bool enabled)
+{
+ unsigned long timeout =
+ ((SPRAID_CAP_TIMEOUT(cap) + 1) * SPRAID_CAP_TIMEOUT_UNIT_MS) + jiffies;
+ u32 bit = enabled ? SPRAID_CSTS_RDY : 0;
+
+ while ((readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_RDY) != bit) {
+ usleep_range(1000, 2000);
+ if (fatal_signal_pending(current))
+ return -EINTR;
+
+ if (time_after(jiffies, timeout)) {
+ dev_err(hdev->dev, "Device not ready; aborting %s\n",
+ enabled ? "initialisation" : "reset");
+ return -ENODEV;
+ }
+ }
+ return 0;
+}
+
+static int spraid_shutdown_ctrl(struct spraid_dev *hdev)
+{
+ unsigned long timeout = SHUTDOWN_TIMEOUT + jiffies;
+
+ hdev->ctrl_config &= ~SPRAID_CC_SHN_MASK;
+ hdev->ctrl_config |= SPRAID_CC_SHN_NORMAL;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ while ((readl(hdev->bar + SPRAID_REG_CSTS) & SPRAID_CSTS_SHST_MASK) !=
+ SPRAID_CSTS_SHST_CMPLT) {
+ msleep(100);
+ if (fatal_signal_pending(current))
+ return -EINTR;
+ if (time_after(jiffies, timeout)) {
+ dev_err(hdev->dev, "Device shutdown incomplete; abort shutdown\n");
+ return -ENODEV;
+ }
+ }
+ return 0;
+}
+
+static int spraid_disable_ctrl(struct spraid_dev *hdev)
+{
+ hdev->ctrl_config &= ~SPRAID_CC_SHN_MASK;
+ hdev->ctrl_config &= ~SPRAID_CC_ENABLE;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ return spraid_wait_ready(hdev, hdev->cap, false);
+}
+
+static int spraid_enable_ctrl(struct spraid_dev *hdev)
+{
+ u64 cap = hdev->cap;
+ u32 dev_page_min = SPRAID_CAP_MPSMIN(cap) + 12;
+ u32 page_shift = PAGE_SHIFT;
+
+ if (page_shift < dev_page_min) {
+ dev_err(hdev->dev, "Minimum device page size[%u], too large for host[%u]\n",
+ 1U << dev_page_min, 1U << page_shift);
+ return -ENODEV;
+ }
+
+ page_shift = min_t(unsigned int, SPRAID_CAP_MPSMAX(cap) + 12, PAGE_SHIFT);
+ hdev->page_size = 1U << page_shift;
+
+ hdev->ctrl_config = SPRAID_CC_CSS_NVM;
+ hdev->ctrl_config |= (page_shift - 12) << SPRAID_CC_MPS_SHIFT;
+ hdev->ctrl_config |= SPRAID_CC_AMS_RR | SPRAID_CC_SHN_NONE;
+ hdev->ctrl_config |= SPRAID_CC_IOSQES | SPRAID_CC_IOCQES;
+ hdev->ctrl_config |= SPRAID_CC_ENABLE;
+ writel(hdev->ctrl_config, hdev->bar + SPRAID_REG_CC);
+
+ return spraid_wait_ready(hdev, cap, true);
+}
+
+static void spraid_init_queue(struct spraid_queue *spraidq, u16 qid)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ spraidq->sq_tail = 0;
+ spraidq->cq_head = 0;
+ spraidq->cq_phase = 1;
+ spraidq->q_db = &hdev->dbs[qid * 2 * hdev->db_stride];
+ spraidq->prp_small_pool = hdev->prp_small_pool[qid % small_pool_num];
+ hdev->online_queues++;
+}
+
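+/* A CQ entry is new when its phase bit matches the queue's current phase. */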
+static inline bool spraid_cqe_pending(struct spraid_queue *spraidq)
+{
+ return (le16_to_cpu(spraidq->cqes[spraidq->cq_head].status) & 1) ==
+ spraidq->cq_phase;
+}
+
+static void spraid_complete_ioq_cmnd(struct spraid_queue *ioq, struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct blk_mq_tags *tags;
+ struct scsi_cmnd *scmd;
+ struct spraid_iod *iod;
+ struct request *req;
+ unsigned long elapsed;
+
+ tags = hdev->shost->tag_set.tags[ioq->qid - 1];
+ req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
+ if (unlikely(!req || !blk_mq_request_started(req))) {
+ dev_warn(hdev->dev, "Invalid id %d completed on queue %d\n",
+ cqe->cmd_id, ioq->qid);
+ return;
+ }
+
+ scmd = blk_mq_rq_to_pdu(req);
+ iod = scsi_cmd_priv(scmd);
+
+ elapsed = jiffies - scmd->jiffies_at_alloc;
+ dev_log_dbg(hdev->dev, "cid[%d], qid[%d] finish IO cost %3ld.%3ld seconds\n",
+ cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
+
+ if (cmpxchg(&iod->state, SPRAID_CMD_IN_FLIGHT, SPRAID_CMD_COMPLETE) !=
+ SPRAID_CMD_IN_FLIGHT) {
+ dev_warn(hdev->dev, "cid[%d], qid[%d] enters abnormal handler, cost %3ld.%3ld seconds\n",
+ cqe->cmd_id, ioq->qid, elapsed / HZ, elapsed % HZ);
+ WRITE_ONCE(iod->state, SPRAID_CMD_TMO_COMPLETE);
+
+ if (iod->nsge) {
+ iod->nsge = 0;
+ scsi_dma_unmap(scmd);
+ }
+ spraid_free_iod_res(hdev, iod);
+
+ return;
+ }
+
+ spraid_map_status(iod, scmd, cqe);
+ if (iod->nsge) {
+ iod->nsge = 0;
+ scsi_dma_unmap(scmd);
+ }
+ spraid_free_iod_res(hdev, iod);
+ scmd->scsi_done(scmd);
+}
+
+static inline void spraid_end_admin_request(struct request *req, __le16 status,
+ __le32 result0, __le32 result1)
+{
+ struct spraid_admin_request *rq = spraid_admin_req(req);
+
+ rq->status = le16_to_cpu(status) >> 1;
+ rq->result0 = le32_to_cpu(result0);
+ rq->result1 = le32_to_cpu(result1);
+ blk_mq_complete_request(req);
+}
+
+static void spraid_complete_adminq_cmnd(struct spraid_queue *adminq, struct spraid_completion *cqe)
+{
+ struct blk_mq_tags *tags = adminq->hdev->admin_tagset.tags[0];
+ struct request *req;
+
+ req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
+ if (unlikely(!req)) {
+ dev_warn(adminq->hdev->dev, "Invalid id %d completed on queue %d\n",
+ cqe->cmd_id, le16_to_cpu(cqe->sq_id));
+ return;
+ }
+ spraid_end_admin_request(req, cqe->status, cqe->result, cqe->result1);
+}
+
+static void spraid_complete_aen(struct spraid_queue *spraidq, struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+ u32 result = le32_to_cpu(cqe->result);
+
+ dev_info(hdev->dev, "rcv aen, status[%x], result[%x]\n",
+ le16_to_cpu(cqe->status) >> 1, result);
+
+ if ((le16_to_cpu(cqe->status) >> 1) != SPRAID_SC_SUCCESS)
+ return;
+ switch (result & 0x7) {
+ case SPRAID_AEN_NOTICE:
+ spraid_handle_aen_notice(hdev, result);
+ break;
+ case SPRAID_AEN_VS:
+ spraid_handle_aen_vs(hdev, result);
+ break;
+ default:
+ dev_warn(hdev->dev, "Unsupported async event type: %u\n",
+ result & 0x7);
+ break;
+ }
+ queue_work(spraid_wq, &hdev->aen_work);
+}
+
+static inline void spraid_handle_cqe(struct spraid_queue *spraidq, u16 idx)
+{
+ struct spraid_completion *cqe = &spraidq->cqes[idx];
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ if (unlikely(cqe->cmd_id >= spraidq->q_depth)) {
+ dev_err(hdev->dev, "Invalid command id[%d] completed on queue %d\n",
+ cqe->cmd_id, cqe->sq_id);
+ return;
+ }
+
+ dev_log_dbg(hdev->dev, "cid[%d], qid[%d], result[0x%x], sq_id[%d], status[0x%x]\n",
+ cqe->cmd_id, spraidq->qid, le32_to_cpu(cqe->result),
+ le16_to_cpu(cqe->sq_id), le16_to_cpu(cqe->status));
+
+ if (unlikely(spraidq->qid == 0 && cqe->cmd_id >= SPRAID_AQ_BLK_MQ_DEPTH)) {
+ spraid_complete_aen(spraidq, cqe);
+ return;
+ }
+
+ if (spraidq->qid)
+ spraid_complete_ioq_cmnd(spraidq, cqe);
+ else
+ spraid_complete_adminq_cmnd(spraidq, cqe);
+}
+
+static void spraid_complete_cqes(struct spraid_queue *spraidq, u16 start, u16 end)
+{
+ while (start != end) {
+ spraid_handle_cqe(spraidq, start);
+ if (++start == spraidq->q_depth)
+ start = 0;
+ }
+}
+
+static inline void spraid_update_cq_head(struct spraid_queue *spraidq)
+{
+ if (++spraidq->cq_head == spraidq->q_depth) {
+ spraidq->cq_head = 0;
+ spraidq->cq_phase = !spraidq->cq_phase;
+ }
+}
+
+static inline bool spraid_process_cq(struct spraid_queue *spraidq, u16 *start, u16 *end, int tag)
+{
+ bool found = false;
+
+ *start = spraidq->cq_head;
+ while (!found && spraid_cqe_pending(spraidq)) {
+ if (spraidq->cqes[spraidq->cq_head].cmd_id == tag)
+ found = true;
+ spraid_update_cq_head(spraidq);
+ }
+ *end = spraidq->cq_head;
+
+ if (*start != *end)
+ writel(spraidq->cq_head, spraidq->q_db + spraidq->hdev->db_stride);
+
+ return found;
+}
+
+static bool spraid_poll_cq(struct spraid_queue *spraidq, int cid)
+{
+ u16 start, end;
+ bool found;
+
+ if (!spraid_cqe_pending(spraidq))
+ return 0;
+
+ spin_lock_irq(&spraidq->cq_lock);
+ found = spraid_process_cq(spraidq, &start, &end, cid);
+ spin_unlock_irq(&spraidq->cq_lock);
+
+ spraid_complete_cqes(spraidq, start, end);
+ return found;
+}
+
+static irqreturn_t spraid_irq(int irq, void *data)
+{
+ struct spraid_queue *spraidq = data;
+ irqreturn_t ret = IRQ_NONE;
+ u16 start, end;
+
+ spin_lock(&spraidq->cq_lock);
+ if (spraidq->cq_head != spraidq->last_cq_head)
+ ret = IRQ_HANDLED;
+
+ spraid_process_cq(spraidq, &start, &end, -1);
+ spraidq->last_cq_head = spraidq->cq_head;
+ spin_unlock(&spraidq->cq_lock);
+
+ if (start != end) {
+ spraid_complete_cqes(spraidq, start, end);
+ ret = IRQ_HANDLED;
+ }
+ return ret;
+}
+
+static int spraid_setup_admin_queue(struct spraid_dev *hdev)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u32 aqa;
+ int ret;
+
+ dev_info(hdev->dev, "[%s] start disable ctrl\n", __func__);
+
+ ret = spraid_disable_ctrl(hdev);
+ if (ret)
+ return ret;
+
+ ret = spraid_alloc_queue(hdev, 0, SPRAID_AQ_DEPTH);
+ if (ret)
+ return ret;
+
+ aqa = adminq->q_depth - 1;
+ aqa |= aqa << 16;
+ writel(aqa, hdev->bar + SPRAID_REG_AQA);
+ lo_hi_writeq(adminq->sq_dma_addr, hdev->bar + SPRAID_REG_ASQ);
+ lo_hi_writeq(adminq->cq_dma_addr, hdev->bar + SPRAID_REG_ACQ);
+
+ dev_info(hdev->dev, "[%s] start enable ctrl\n", __func__);
+
+ ret = spraid_enable_ctrl(hdev);
+ if (ret) {
+ ret = -ENODEV;
+ goto free_queue;
+ }
+
+ adminq->cq_vector = 0;
+ spraid_init_queue(adminq, 0);
+ ret = pci_request_irq(hdev->pdev, adminq->cq_vector, spraid_irq, NULL,
+ adminq, "spraid%d_q%d", hdev->instance, adminq->qid);
+
+ if (ret) {
+ adminq->cq_vector = -1;
+ hdev->online_queues--;
+ goto free_queue;
+ }
+
+ dev_info(hdev->dev, "[%s] success, queuecount:[%d], onlinequeue:[%d]\n",
+ __func__, hdev->queue_count, hdev->online_queues);
+
+ return 0;
+
+free_queue:
+ spraid_free_queue(adminq);
+ return ret;
+}
+
+static u32 spraid_bar_size(struct spraid_dev *hdev, u32 nr_ioqs)
+{
+ return (SPRAID_REG_DBS + ((nr_ioqs + 1) * 8 * hdev->db_stride));
+}
+
+static inline void spraid_clear_spraid_request(struct request *req)
+{
+ if (!(req->rq_flags & RQF_DONTPREP)) {
+ spraid_admin_req(req)->flags = 0;
+ req->rq_flags |= RQF_DONTPREP;
+ }
+}
+
+static struct request *spraid_alloc_admin_request(struct request_queue *q,
+ struct spraid_admin_command *cmd,
+ blk_mq_req_flags_t flags)
+{
+ u32 op = COMMAND_IS_WRITE(cmd) ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
+ struct request *req;
+
+ req = blk_mq_alloc_request(q, op, flags);
+ if (IS_ERR(req))
+ return req;
+ req->cmd_flags |= REQ_FAILFAST_DRIVER;
+ spraid_clear_spraid_request(req);
+ spraid_admin_req(req)->cmd = cmd;
+
+ return req;
+}
+
+static int spraid_submit_admin_sync_cmd(struct request_queue *q,
+ struct spraid_admin_command *cmd,
+ u32 *result, void *buffer,
+ u32 bufflen, u32 timeout, int at_head, blk_mq_req_flags_t flags)
+{
+ struct request *req;
+ int ret;
+
+ req = spraid_alloc_admin_request(q, cmd, flags);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
+ if (buffer && bufflen) {
+ ret = blk_rq_map_kern(q, req, buffer, bufflen, GFP_KERNEL);
+ if (ret)
+ goto out;
+ }
+ blk_execute_rq(req->q, NULL, req, at_head);
+
+ if (result)
+ *result = spraid_admin_req(req)->result0;
+
+ if (spraid_admin_req(req)->flags & SPRAID_REQ_CANCELLED)
+ ret = -EINTR;
+ else
+ ret = spraid_admin_req(req)->status;
+
+out:
+ blk_mq_free_request(req);
+ return ret;
+}
+
+static int spraid_create_cq(struct spraid_dev *hdev, u16 qid,
+ struct spraid_queue *spraidq, u16 cq_vector)
+{
+ struct spraid_admin_command admin_cmd;
+ int flags = SPRAID_QUEUE_PHYS_CONTIG | SPRAID_CQ_IRQ_ENABLED;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.create_cq.opcode = SPRAID_ADMIN_CREATE_CQ;
+ admin_cmd.create_cq.prp1 = cpu_to_le64(spraidq->cq_dma_addr);
+ admin_cmd.create_cq.cqid = cpu_to_le16(qid);
+ admin_cmd.create_cq.qsize = cpu_to_le16(spraidq->q_depth - 1);
+ admin_cmd.create_cq.cq_flags = cpu_to_le16(flags);
+ admin_cmd.create_cq.irq_vector = cpu_to_le16(cq_vector);
+
+ return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
+ NULL, 0, 0, 0, 0);
+}
+
+static int spraid_create_sq(struct spraid_dev *hdev, u16 qid,
+ struct spraid_queue *spraidq)
+{
+ struct spraid_admin_command admin_cmd;
+ int flags = SPRAID_QUEUE_PHYS_CONTIG;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.create_sq.opcode = SPRAID_ADMIN_CREATE_SQ;
+ admin_cmd.create_sq.prp1 = cpu_to_le64(spraidq->sq_dma_addr);
+ admin_cmd.create_sq.sqid = cpu_to_le16(qid);
+ admin_cmd.create_sq.qsize = cpu_to_le16(spraidq->q_depth - 1);
+ admin_cmd.create_sq.sq_flags = cpu_to_le16(flags);
+ admin_cmd.create_sq.cqid = cpu_to_le16(qid);
+
+ return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
+ NULL, 0, 0, 0, 0);
+}
+
+static void spraid_free_queue(struct spraid_queue *spraidq)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ hdev->queue_count--;
+ dma_free_coherent(hdev->dev, CQ_SIZE(spraidq->q_depth),
+ (void *)spraidq->cqes, spraidq->cq_dma_addr);
+ dma_free_coherent(hdev->dev, SQ_SIZE(spraidq->qid, spraidq->q_depth),
+ spraidq->sq_cmds, spraidq->sq_dma_addr);
+ dma_free_coherent(hdev->dev, SENSE_SIZE(spraidq->q_depth),
+ spraidq->sense, spraidq->sense_dma_addr);
+}
+
+static void spraid_free_admin_queue(struct spraid_dev *hdev)
+{
+ spraid_free_queue(&hdev->queues[0]);
+}
+
+static void spraid_free_io_queues(struct spraid_dev *hdev)
+{
+ int i;
+
+ for (i = hdev->queue_count - 1; i >= 1; i--)
+ spraid_free_queue(&hdev->queues[i]);
+}
+
+static int spraid_delete_queue(struct spraid_dev *hdev, u8 op, u16 id)
+{
+ struct spraid_admin_command admin_cmd;
+ int ret;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.delete_queue.opcode = op;
+ admin_cmd.delete_queue.qid = cpu_to_le16(id);
+
+ ret = spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
+ NULL, 0, 0, 0, 0);
+
+ if (ret)
+ dev_err(hdev->dev, "Delete %s:[%d] failed\n",
+ (op == SPRAID_ADMIN_DELETE_CQ) ? "cq" : "sq", id);
+
+ return ret;
+}
+
+static int spraid_delete_cq(struct spraid_dev *hdev, u16 cqid)
+{
+ return spraid_delete_queue(hdev, SPRAID_ADMIN_DELETE_CQ, cqid);
+}
+
+static int spraid_delete_sq(struct spraid_dev *hdev, u16 sqid)
+{
+ return spraid_delete_queue(hdev, SPRAID_ADMIN_DELETE_SQ, sqid);
+}
+
+static int spraid_create_queue(struct spraid_queue *spraidq, u16 qid)
+{
+ struct spraid_dev *hdev = spraidq->hdev;
+ u16 cq_vector;
+ int ret;
+
+ cq_vector = (hdev->num_vecs == 1) ? 0 : qid;
+ ret = spraid_create_cq(hdev, qid, spraidq, cq_vector);
+ if (ret)
+ return ret;
+
+ ret = spraid_create_sq(hdev, qid, spraidq);
+ if (ret)
+ goto delete_cq;
+
+ spraid_init_queue(spraidq, qid);
+ spraidq->cq_vector = cq_vector;
+
+ ret = pci_request_irq(hdev->pdev, cq_vector, spraid_irq, NULL,
+ spraidq, "spraid%d_q%d", hdev->instance, qid);
+
+ if (ret) {
+ dev_err(hdev->dev, "Request queue[%d] irq failed\n", qid);
+ goto delete_sq;
+ }
+
+ return 0;
+
+delete_sq:
+ spraidq->cq_vector = -1;
+ hdev->online_queues--;
+ spraid_delete_sq(hdev, qid);
+delete_cq:
+ spraid_delete_cq(hdev, qid);
+
+ return ret;
+}
+
+static int spraid_create_io_queues(struct spraid_dev *hdev)
+{
+ u32 i, max;
+ int ret = 0;
+
+ max = min(hdev->max_qid, hdev->queue_count - 1);
+ for (i = hdev->online_queues; i <= max; i++) {
+ ret = spraid_create_queue(&hdev->queues[i], i);
+ if (ret) {
+ dev_err(hdev->dev, "Create queue[%d] failed\n", i);
+ break;
+ }
+ }
+
+	dev_info(hdev->dev, "[%s] queue_count[%d], online_queue[%d]\n",
+ __func__, hdev->queue_count, hdev->online_queues);
+
+ return ret >= 0 ? 0 : ret;
+}
+
+static int spraid_set_features(struct spraid_dev *hdev, u32 fid, u32 dword11, void *buffer,
+ size_t buflen, u32 *result)
+{
+ struct spraid_admin_command admin_cmd;
+ u32 res;
+ int ret;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.features.opcode = SPRAID_ADMIN_SET_FEATURES;
+ admin_cmd.features.fid = cpu_to_le32(fid);
+ admin_cmd.features.dword11 = cpu_to_le32(dword11);
+
+ ret = spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, &res,
+ buffer, buflen, 0, 0, 0);
+
+ if (!ret && result)
+ *result = res;
+
+ return ret;
+}
+
+static int spraid_configure_timestamp(struct spraid_dev *hdev)
+{
+ __le64 ts;
+ int ret;
+
+ ts = cpu_to_le64(ktime_to_ms(ktime_get_real()));
+ ret = spraid_set_features(hdev, SPRAID_FEAT_TIMESTAMP, 0, &ts, sizeof(ts), NULL);
+
+ if (ret)
+ dev_err(hdev->dev, "set timestamp failed: %d\n", ret);
+ return ret;
+}
+
+static int spraid_set_queue_cnt(struct spraid_dev *hdev, u32 *cnt)
+{
+ u32 q_cnt = (*cnt - 1) | ((*cnt - 1) << 16);
+ u32 nr_ioqs, result;
+ int status;
+
+ status = spraid_set_features(hdev, SPRAID_FEAT_NUM_QUEUES, q_cnt, NULL, 0, &result);
+ if (status) {
+ dev_err(hdev->dev, "Set queue count failed, status: %d\n",
+ status);
+ return -EIO;
+ }
+
+ nr_ioqs = min(result & 0xffff, result >> 16) + 1;
+ *cnt = min(*cnt, nr_ioqs);
+ if (*cnt == 0) {
+ dev_err(hdev->dev, "Illegal queue count: zero\n");
+ return -EIO;
+ }
+ return 0;
+}
+
+static int spraid_setup_io_queues(struct spraid_dev *hdev)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ struct pci_dev *pdev = hdev->pdev;
+ u32 nr_ioqs = num_online_cpus();
+ u32 i, size;
+ int ret;
+
+ struct irq_affinity affd = {
+ .pre_vectors = 1
+ };
+
+ ret = spraid_set_queue_cnt(hdev, &nr_ioqs);
+ if (ret < 0)
+ return ret;
+
+ size = spraid_bar_size(hdev, nr_ioqs);
+ ret = spraid_remap_bar(hdev, size);
+ if (ret)
+ return -ENOMEM;
+
+ adminq->q_db = hdev->dbs;
+
+ pci_free_irq(pdev, 0, adminq);
+ pci_free_irq_vectors(pdev);
+
+ ret = pci_alloc_irq_vectors_affinity(pdev, 1, (nr_ioqs + 1),
+ PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
+ if (ret <= 0)
+ return -EIO;
+
+ hdev->num_vecs = ret;
+
+ hdev->max_qid = max(ret - 1, 1);
+
+ ret = pci_request_irq(pdev, adminq->cq_vector, spraid_irq, NULL,
+ adminq, "spraid%d_q%d", hdev->instance, adminq->qid);
+ if (ret) {
+ dev_err(hdev->dev, "Request admin irq failed\n");
+ adminq->cq_vector = -1;
+ return ret;
+ }
+
+ for (i = hdev->queue_count; i <= hdev->max_qid; i++) {
+ ret = spraid_alloc_queue(hdev, i, hdev->ioq_depth);
+ if (ret)
+ break;
+ }
+ dev_info(hdev->dev, "[%s] max_qid: %d, queue_count: %d, online_queue: %d, ioq_depth: %d\n",
+ __func__, hdev->max_qid, hdev->queue_count,
+ hdev->online_queues, hdev->ioq_depth);
+
+ return spraid_create_io_queues(hdev);
+}
+
+static void spraid_delete_io_queues(struct spraid_dev *hdev)
+{
+ u16 queues = hdev->online_queues - 1;
+ u8 opcode = SPRAID_ADMIN_DELETE_SQ;
+ u16 i, pass;
+
+ if (hdev->online_queues < 2) {
+		dev_err(hdev->dev, "[%s] err, io queues have already been deleted\n", __func__);
+ return;
+ }
+
+ for (pass = 0; pass < 2; pass++) {
+ for (i = queues; i > 0; i--)
+ if (spraid_delete_queue(hdev, opcode, i))
+ break;
+
+ opcode = SPRAID_ADMIN_DELETE_CQ;
+ }
+}
+
+static void spraid_remove_io_queues(struct spraid_dev *hdev)
+{
+ spraid_delete_io_queues(hdev);
+ spraid_free_io_queues(hdev);
+}
+
+static void spraid_pci_disable(struct spraid_dev *hdev)
+{
+ struct pci_dev *pdev = hdev->pdev;
+ u32 i;
+
+ for (i = 0; i < hdev->online_queues; i++)
+ pci_free_irq(pdev, hdev->queues[i].cq_vector, &hdev->queues[i]);
+ pci_free_irq_vectors(pdev);
+ if (pci_is_enabled(pdev)) {
+ pci_disable_pcie_error_reporting(pdev);
+ pci_disable_device(pdev);
+ }
+ hdev->online_queues = 0;
+}
+
+static void spraid_disable_admin_queue(struct spraid_dev *hdev, bool shutdown)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u16 start, end;
+
+ if (shutdown)
+ spraid_shutdown_ctrl(hdev);
+ else
+ spraid_disable_ctrl(hdev);
+
+ if (hdev->queue_count == 0 || hdev->queue_count > 129) {
+		dev_err(hdev->dev, "[%s] err, admin queue has already been deleted, queue count: %d\n",
+ __func__, hdev->queue_count);
+ return;
+ }
+
+ spin_lock_irq(&adminq->cq_lock);
+ spraid_process_cq(adminq, &start, &end, -1);
+ spin_unlock_irq(&adminq->cq_lock);
+
+ spraid_complete_cqes(adminq, start, end);
+ spraid_free_admin_queue(hdev);
+}
+
+static int spraid_create_dma_pools(struct spraid_dev *hdev)
+{
+ int i;
+ char poolname[20] = { 0 };
+
+ hdev->prp_page_pool = dma_pool_create("prp list page", hdev->dev,
+ PAGE_SIZE, PAGE_SIZE, 0);
+
+	if (!hdev->prp_page_pool) {
+		dev_err(hdev->dev, "create prp_page_pool failed\n");
+		return -ENOMEM;
+ }
+
+ for (i = 0; i < small_pool_num; i++) {
+ sprintf(poolname, "prp_list_256_%d", i);
+ hdev->prp_small_pool[i] = dma_pool_create(poolname, hdev->dev, SMALL_POOL_SIZE,
+ SMALL_POOL_SIZE, 0);
+
+ if (!hdev->prp_small_pool[i]) {
+ dev_err(hdev->dev, "create prp_small_pool %d failed\n", i);
+ goto destroy_prp_small_pool;
+ }
+ }
+
+ return 0;
+
+destroy_prp_small_pool:
+ while (i > 0)
+ dma_pool_destroy(hdev->prp_small_pool[--i]);
+ dma_pool_destroy(hdev->prp_page_pool);
+
+ return -ENOMEM;
+}
+
+static void spraid_destroy_dma_pools(struct spraid_dev *hdev)
+{
+ int i;
+
+ for (i = 0; i < small_pool_num; i++)
+ dma_pool_destroy(hdev->prp_small_pool[i]);
+ dma_pool_destroy(hdev->prp_page_pool);
+}
+
+static int spraid_get_dev_list(struct spraid_dev *hdev, struct spraid_dev_info *devices)
+{
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ struct spraid_admin_command admin_cmd;
+ struct spraid_dev_list *list_buf;
+ u32 i, idx, hdid, ndev;
+ int ret = 0;
+
+ list_buf = kmalloc(sizeof(*list_buf), GFP_KERNEL);
+ if (!list_buf)
+ return -ENOMEM;
+
+ for (idx = 0; idx < nd;) {
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
+ admin_cmd.get_info.type = SPRAID_GET_INFO_DEV_LIST;
+ admin_cmd.get_info.cdw11 = cpu_to_le32(idx);
+
+ ret = spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL, list_buf,
+ sizeof(*list_buf), 0, 0, 0);
+
+ if (ret) {
+ dev_err(hdev->dev, "Get device list failed, nd: %u, idx: %u, ret: %d\n",
+ nd, idx, ret);
+ goto out;
+ }
+ ndev = le32_to_cpu(list_buf->dev_num);
+
+ dev_log_dbg(hdev->dev, "ndev: %u\n", ndev);
+
+ for (i = 0; i < ndev; i++) {
+ hdid = le32_to_cpu(list_buf->devices[i].hdid);
+ dev_log_dbg(hdev->dev, "list_buf->devices[%d], hdid: %u target: %d, channel: %d, lun: %d, attr[%x]\n",
+ i, hdid,
+ le16_to_cpu(list_buf->devices[i].target),
+ list_buf->devices[i].channel,
+ list_buf->devices[i].lun,
+ list_buf->devices[i].attr);
+ if (hdid >= nd) {
+ dev_err(hdev->dev, "err, hdid[%d] bigger than nd[%d]\n",
+ hdid, nd);
+ continue;
+ }
+ memcpy(&devices[hdid], &list_buf->devices[i],
+ sizeof(struct spraid_dev_info));
+ }
+ idx += ndev;
+
+ if (idx < MAX_DEV_ENTRY_PER_PAGE_4K)
+ break;
+ }
+
+out:
+ kfree(list_buf);
+ return ret;
+}
+
+static void spraid_send_aen(struct spraid_dev *hdev)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.common.opcode = SPRAID_ADMIN_ASYNC_EVENT;
+ admin_cmd.common.command_id = SPRAID_AQ_BLK_MQ_DEPTH;
+
+ spraid_submit_cmd(adminq, &admin_cmd);
+ dev_info(hdev->dev, "send aen, cid[%d]\n", SPRAID_AQ_BLK_MQ_DEPTH);
+}
+
+static int spraid_add_device(struct spraid_dev *hdev, struct spraid_dev_info *device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ sdev = scsi_device_lookup(shost, device->channel, le16_to_cpu(device->target), 0);
+ if (sdev) {
+ dev_warn(hdev->dev, "Device is already exist, channel: %d, target_id: %d, lun: %d\n",
+ device->channel, le16_to_cpu(device->target), 0);
+ scsi_device_put(sdev);
+ return -EEXIST;
+ }
+ scsi_add_device(shost, device->channel, le16_to_cpu(device->target), 0);
+ return 0;
+}
+
+static int spraid_rescan_device(struct spraid_dev *hdev, struct spraid_dev_info *device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ sdev = scsi_device_lookup(shost, device->channel, le16_to_cpu(device->target), 0);
+ if (!sdev) {
+ dev_warn(hdev->dev, "Device is not exit, channel: %d, target_id: %d, lun: %d\n",
+ device->channel, le16_to_cpu(device->target), 0);
+ return -ENODEV;
+ }
+
+ scsi_rescan_device(&sdev->sdev_gendev);
+ scsi_device_put(sdev);
+ return 0;
+}
+
+static int spraid_remove_device(struct spraid_dev *hdev, struct spraid_dev_info *org_device)
+{
+ struct Scsi_Host *shost = hdev->shost;
+ struct scsi_device *sdev;
+
+ sdev = scsi_device_lookup(shost, org_device->channel, le16_to_cpu(org_device->target), 0);
+ if (!sdev) {
+ dev_warn(hdev->dev, "Device is not exit, channel: %d, target_id: %d, lun: %d\n",
+ org_device->channel, le16_to_cpu(org_device->target), 0);
+ return -ENODEV;
+ }
+
+ scsi_remove_device(sdev);
+ scsi_device_put(sdev);
+ return 0;
+}
+
+static int spraid_dev_list_init(struct spraid_dev *hdev)
+{
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ int i, ret;
+
+ hdev->devices = kzalloc_node(nd * sizeof(struct spraid_dev_info),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->devices)
+ return -ENOMEM;
+
+ ret = spraid_get_dev_list(hdev, hdev->devices);
+ if (ret) {
+ dev_err(hdev->dev, "Ignore failure of getting device list within initialization\n");
+ return 0;
+ }
+
+ for (i = 0; i < nd; i++) {
+ if (SPRAID_DEV_INFO_FLAG_VALID(hdev->devices[i].flag) &&
+ SPRAID_DEV_INFO_ATTR_BOOT(hdev->devices[i].attr)) {
+ spraid_add_device(hdev, &hdev->devices[i]);
+ break;
+ }
+ }
+ return 0;
+}
+
+static void spraid_scan_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, scan_work);
+ struct spraid_dev_info *devices, *org_devices;
+ u32 nd = le32_to_cpu(hdev->ctrl_info->nd);
+ u8 flag, org_flag;
+ int i, ret;
+
+ devices = kcalloc(nd, sizeof(struct spraid_dev_info), GFP_KERNEL);
+ if (!devices)
+ return;
+ ret = spraid_get_dev_list(hdev, devices);
+ if (ret)
+ goto free_list;
+ org_devices = hdev->devices;
+ for (i = 0; i < nd; i++) {
+ org_flag = org_devices[i].flag;
+ flag = devices[i].flag;
+
+ dev_log_dbg(hdev->dev, "i: %d, org_flag: 0x%x, flag: 0x%x\n",
+ i, org_flag, flag);
+
+ if (SPRAID_DEV_INFO_FLAG_VALID(flag)) {
+ if (!SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
+ down_write(&hdev->devices_rwsem);
+ memcpy(&org_devices[i], &devices[i],
+ sizeof(struct spraid_dev_info));
+ up_write(&hdev->devices_rwsem);
+ spraid_add_device(hdev, &devices[i]);
+ } else if (SPRAID_DEV_INFO_FLAG_CHANGE(flag)) {
+ spraid_rescan_device(hdev, &devices[i]);
+ }
+ } else {
+ if (SPRAID_DEV_INFO_FLAG_VALID(org_flag)) {
+ down_write(&hdev->devices_rwsem);
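+ /* clear the low flag bit so SPRAID_DEV_INFO_FLAG_VALID() no longer matches this entry */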
+ org_devices[i].flag &= 0xfe;
+ up_write(&hdev->devices_rwsem);
+ spraid_remove_device(hdev, &org_devices[i]);
+ }
+ }
+ }
+free_list:
+ kfree(devices);
+}
+
+static void spraid_timesyn_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, timesyn_work);
+
+ spraid_configure_timestamp(hdev);
+}
+
+static void spraid_queue_scan(struct spraid_dev *hdev)
+{
+ queue_work(spraid_wq, &hdev->scan_work);
+}
+
+static void spraid_handle_aen_notice(struct spraid_dev *hdev, u32 result)
+{
+ switch ((result & 0xff00) >> 8) {
+ case SPRAID_AEN_DEV_CHANGED:
+ spraid_queue_scan(hdev);
+ break;
+ case SPRAID_AEN_HOST_PROBING:
+ break;
+ default:
+ dev_warn(hdev->dev, "async event result %08x\n", result);
+ }
+}
+
+static void spraid_handle_aen_vs(struct spraid_dev *hdev, u32 result)
+{
+ switch (result) {
+ case SPRAID_AEN_TIMESYN:
+ queue_work(spraid_wq, &hdev->timesyn_work);
+ break;
+ default:
+ dev_warn(hdev->dev, "async event result: %x\n", result);
+ }
+}
+
+static void spraid_async_event_work(struct work_struct *work)
+{
+ struct spraid_dev *hdev =
+ container_of(work, struct spraid_dev, aen_work);
+
+ spraid_send_aen(hdev);
+}
+
+static int spraid_alloc_resources(struct spraid_dev *hdev)
+{
+ int ret, nqueue;
+
+ ret = ida_alloc(&spraid_instance_ida, GFP_KERNEL);
+ if (ret < 0) {
+ dev_err(hdev->dev, "Get instance id failed\n");
+ return ret;
+ }
+ hdev->instance = ret;
+
+ hdev->ctrl_info = kzalloc_node(sizeof(*hdev->ctrl_info),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->ctrl_info) {
+ ret = -ENOMEM;
+ goto release_instance;
+ }
+
+ ret = spraid_create_dma_pools(hdev);
+ if (ret)
+ goto free_ctrl_info;
+ nqueue = num_possible_cpus() + 1;
+ hdev->queues = kcalloc_node(nqueue, sizeof(struct spraid_queue),
+ GFP_KERNEL, hdev->numa_node);
+ if (!hdev->queues) {
+ ret = -ENOMEM;
+ goto destroy_dma_pools;
+ }
+
+ dev_info(hdev->dev, "[%s] queues num: %d\n", __func__, nqueue);
+
+ return 0;
+
+destroy_dma_pools:
+ spraid_destroy_dma_pools(hdev);
+free_ctrl_info:
+ kfree(hdev->ctrl_info);
+release_instance:
+ ida_free(&spraid_instance_ida, hdev->instance);
+ return ret;
+}
+
+static void spraid_free_resources(struct spraid_dev *hdev)
+{
+ kfree(hdev->queues);
+ spraid_destroy_dma_pools(hdev);
+ kfree(hdev->ctrl_info);
+ ida_free(&spraid_instance_ida, hdev->instance);
+}
+
+static void spraid_setup_passthrough(struct request *req, struct spraid_admin_command *cmd)
+{
+ memcpy(cmd, spraid_admin_req(req)->cmd, sizeof(*cmd));
+ cmd->common.flags &= ~SPRAID_CMD_FLAG_SGL_ALL;
+}
+
+static inline void spraid_clear_hreq(struct request *req)
+{
+ if (!(req->rq_flags & RQF_DONTPREP)) {
+ spraid_admin_req(req)->flags = 0;
+ req->rq_flags |= RQF_DONTPREP;
+ }
+}
+
+static blk_status_t spraid_setup_admin_cmd(struct request *req, struct spraid_admin_command *cmd)
+{
+ spraid_clear_hreq(req);
+
+ memset(cmd, 0, sizeof(*cmd));
+ switch (req_op(req)) {
+ case REQ_OP_DRV_IN:
+ case REQ_OP_DRV_OUT:
+ spraid_setup_passthrough(req, cmd);
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return BLK_STS_IOERR;
+ }
+
+ cmd->common.command_id = req->tag;
+ return BLK_STS_OK;
+}
+
+static void spraid_unmap_data(struct spraid_dev *hdev, struct request *req)
+{
+ struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
+ enum dma_data_direction dma_dir = rq_data_dir(req) ?
+ DMA_TO_DEVICE : DMA_FROM_DEVICE;
+
+ if (iod->nsge)
+ dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
+
+ spraid_free_iod_res(hdev, iod);
+}
+
+static blk_status_t spraid_admin_map_data(struct spraid_dev *hdev, struct request *req,
+ struct spraid_admin_command *cmd)
+{
+ struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
+ struct request_queue *admin_q = req->q;
+ enum dma_data_direction dma_dir = rq_data_dir(req) ?
+ DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ blk_status_t ret = BLK_STS_IOERR;
+ int nr_mapped;
+ int res;
+
+ sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
+ iod->nsge = blk_rq_map_sg(admin_q, req, iod->sg);
+ if (!iod->nsge)
+ goto out;
+
+ dev_log_dbg(hdev->dev, "nseg: %u, nsge: %u\n",
+ blk_rq_nr_phys_segments(req), iod->nsge);
+
+ ret = BLK_STS_RESOURCE;
+ nr_mapped = dma_map_sg_attrs(hdev->dev, iod->sg, iod->nsge, dma_dir, DMA_ATTR_NO_WARN);
+ if (!nr_mapped)
+ goto out;
+
+ res = spraid_setup_prps(hdev, iod);
+ if (res)
+ goto unmap;
+ cmd->common.dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
+ cmd->common.dptr.prp2 = cpu_to_le64(iod->first_dma);
+ return BLK_STS_OK;
+
+unmap:
+ dma_unmap_sg(hdev->dev, iod->sg, iod->nsge, dma_dir);
+out:
+ return ret;
+}
+
+static blk_status_t spraid_init_admin_iod(struct request *rq, struct spraid_dev *hdev)
+{
+ struct spraid_iod *iod = blk_mq_rq_to_pdu(rq);
+ int nents = blk_rq_nr_phys_segments(rq);
+ unsigned int size = blk_rq_payload_bytes(rq);
+
+ if (nents > SPRAID_INT_PAGES || size > SPRAID_INT_BYTES(hdev)) {
+ iod->sg = mempool_alloc(hdev->iod_mempool, GFP_ATOMIC);
+ if (!iod->sg)
+ return BLK_STS_RESOURCE;
+ } else {
+ iod->sg = iod->inline_sg;
+ }
+
+ iod->nsge = 0;
+ iod->use_sgl = 0;
+ iod->npages = -1;
+ iod->length = size;
+ iod->sg_drv_mgmt = true;
+
+ return BLK_STS_OK;
+}
+
+static blk_status_t spraid_queue_admin_rq(struct blk_mq_hw_ctx *hctx,
+ const struct blk_mq_queue_data *bd)
+{
+ struct spraid_queue *adminq = hctx->driver_data;
+ struct spraid_dev *hdev = adminq->hdev;
+ struct request *req = bd->rq;
+ struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
+ struct spraid_admin_command cmd;
+ blk_status_t ret;
+
+ ret = spraid_setup_admin_cmd(req, &cmd);
+ if (ret)
+ goto out;
+
+ ret = spraid_init_admin_iod(req, hdev);
+ if (ret)
+ goto out;
+
+ if (blk_rq_nr_phys_segments(req)) {
+ ret = spraid_admin_map_data(hdev, req, &cmd);
+ if (ret)
+ goto cleanup_iod;
+ }
+
+ blk_mq_start_request(req);
+ spraid_submit_cmd(adminq, &cmd);
+ return BLK_STS_OK;
+
+cleanup_iod:
+ spraid_free_iod_res(hdev, iod);
+out:
+ return ret;
+}
+
+static blk_status_t spraid_error_status(struct request *req)
+{
+ switch (spraid_admin_req(req)->status & 0x7ff) {
+ case SPRAID_SC_SUCCESS:
+ return BLK_STS_OK;
+ default:
+ return BLK_STS_IOERR;
+ }
+}
+
+static void spraid_complete_admin_rq(struct request *req)
+{
+ struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
+ struct spraid_dev *hdev = iod->spraidq->hdev;
+
+ if (blk_rq_nr_phys_segments(req))
+ spraid_unmap_data(hdev, req);
+ blk_mq_end_request(req, spraid_error_status(req));
+}
+
+static int spraid_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, unsigned int hctx_idx)
+{
+ struct spraid_dev *hdev = data;
+ struct spraid_queue *adminq = &hdev->queues[0];
+
+ WARN_ON(hctx_idx != 0);
+ WARN_ON(hdev->admin_tagset.tags[0] != hctx->tags);
+
+ hctx->driver_data = adminq;
+ return 0;
+}
+
+static int spraid_admin_init_request(struct blk_mq_tag_set *set, struct request *req,
+ unsigned int hctx_idx, unsigned int numa_node)
+{
+ struct spraid_dev *hdev = set->driver_data;
+ struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
+ struct spraid_queue *adminq = &hdev->queues[0];
+
+ WARN_ON(!adminq);
+ iod->spraidq = adminq;
+ return 0;
+}
+
+static enum blk_eh_timer_return
+spraid_admin_timeout(struct request *req, bool reserved)
+{
+ struct spraid_iod *iod = blk_mq_rq_to_pdu(req);
+ struct spraid_queue *spraidq = iod->spraidq;
+ struct spraid_dev *hdev = spraidq->hdev;
+
+ dev_err(hdev->dev, "Admin cid[%d] qid[%d] timeout\n",
+ req->tag, spraidq->qid);
+
+ if (spraid_poll_cq(spraidq, req->tag)) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, completion polled\n",
+ req->tag, spraidq->qid);
+ return BLK_EH_DONE;
+ }
+
+ spraid_end_admin_request(req, cpu_to_le16(-EINVAL), 0, 0);
+ return BLK_EH_DONE;
+}
+
+static int spraid_get_ctrl_info(struct spraid_dev *hdev, struct spraid_ctrl_info *ctrl_info)
+{
+ struct spraid_admin_command cmd;
+
+ cmd.get_info.opcode = SPRAID_ADMIN_GET_INFO;
+ cmd.get_info.type = SPRAID_GET_INFO_CTRL;
+
+ return spraid_submit_admin_sync_cmd(hdev->admin_q, &cmd, NULL,
+ ctrl_info, sizeof(struct spraid_ctrl_info), 0, 0, 0);
+}
+
+static int spraid_init_ctrl_info(struct spraid_dev *hdev)
+{
+ int ret;
+
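+ /* conservative defaults, kept only if the GET_INFO query below fails */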
+ hdev->ctrl_info->nd = cpu_to_le32(240);
+ hdev->ctrl_info->mdts = 8;
+ hdev->ctrl_info->max_cmds = cpu_to_le16(4096);
+ hdev->ctrl_info->max_sge = cpu_to_le16(128);
+ hdev->ctrl_info->max_channel = cpu_to_le16(4);
+ hdev->ctrl_info->max_tgt_id = cpu_to_le16(256);
+ hdev->ctrl_info->max_lun = cpu_to_le16(8);
+
+ ret = spraid_get_ctrl_info(hdev, hdev->ctrl_info);
+ if (ret) {
+ dev_err(hdev->dev, "get controller info failed: %d\n", ret);
+ return 0;
+ }
+ return 0;
+}
+
+#define SPRAID_MAX_ADMIN_PAYLOAD_SIZE BIT(16)
+static int spraid_alloc_iod_ext_mem_pool(struct spraid_dev *hdev)
+{
+ u16 max_sge = le16_to_cpu(hdev->ctrl_info->max_sge);
+ size_t alloc_size;
+
+ alloc_size = spraid_iod_ext_size(hdev, SPRAID_MAX_ADMIN_PAYLOAD_SIZE,
+ max_sge, true, false);
+ if (alloc_size > PAGE_SIZE)
+ dev_warn(hdev->dev, "It is unreasonable for sg allocation more than one page\n");
+ hdev->iod_mempool = mempool_create_node(1, mempool_kmalloc, mempool_kfree,
+ (void *)alloc_size, GFP_KERNEL, hdev->numa_node);
+ if (!hdev->iod_mempool) {
+ dev_err(hdev->dev, "Create iod extension memory pool failed\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void spraid_free_iod_ext_mem_pool(struct spraid_dev *hdev)
+{
+ mempool_destroy(hdev->iod_mempool);
+}
+
+static int spraid_submit_user_cmd(struct request_queue *q, struct spraid_admin_command *cmd,
+ void __user *ubuffer, unsigned int bufflen, u32 *result,
+ unsigned int timeout)
+{
+ struct request *req;
+ struct bio *bio = NULL;
+ int ret;
+
+ req = spraid_alloc_admin_request(q, cmd, 0);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
+ spraid_admin_req(req)->flags |= SPRAID_REQ_USERCMD;
+
+ if (ubuffer && bufflen) {
+ ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen, GFP_KERNEL);
+ if (ret)
+ goto out;
+ bio = req->bio;
+ }
+ blk_execute_rq(req->q, NULL, req, 0);
+ if (spraid_admin_req(req)->flags & SPRAID_REQ_CANCELLED)
+ ret = -EINTR;
+ else
+ ret = spraid_admin_req(req)->status;
+ if (result) {
+ result[0] = spraid_admin_req(req)->result0;
+ result[1] = spraid_admin_req(req)->result1;
+ }
+ if (bio)
+ blk_rq_unmap_user(bio);
+out:
+ blk_mq_free_request(req);
+ return ret;
+}
+
+static int spraid_user_admin_cmd(struct spraid_dev *hdev,
+ struct spraid_passthru_common_cmd __user *ucmd)
+{
+ struct spraid_passthru_common_cmd cmd;
+ struct spraid_admin_command admin_cmd;
+ u32 timeout = 0;
+ int status;
+
+ if (!capable(CAP_SYS_ADMIN)) {
+ dev_err(hdev->dev, "Current user hasn't administrator right, reject service\n");
+ return -EACCES;
+ }
+
+ if (copy_from_user(&cmd, ucmd, sizeof(cmd))) {
+ dev_err(hdev->dev, "Copy command from user space to kernel space failed\n");
+ return -EFAULT;
+ }
+
+ if (cmd.flags) {
+ dev_err(hdev->dev, "Invalid flags in user command\n");
+ return -EINVAL;
+ }
+
+ dev_log_dbg(hdev->dev, "user_admin_cmd opcode: 0x%x, subopcode: 0x%x",
+ cmd.opcode, cmd.cdw2 & 0x7ff);
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.common.opcode = cmd.opcode;
+ admin_cmd.common.flags = cmd.flags;
+ admin_cmd.common.hdid = cpu_to_le32(cmd.nsid);
+ admin_cmd.common.cdw2[0] = cpu_to_le32(cmd.cdw2);
+ admin_cmd.common.cdw2[1] = cpu_to_le32(cmd.cdw3);
+ admin_cmd.common.cdw10 = cpu_to_le32(cmd.cdw10);
+ admin_cmd.common.cdw11 = cpu_to_le32(cmd.cdw11);
+ admin_cmd.common.cdw12 = cpu_to_le32(cmd.cdw12);
+ admin_cmd.common.cdw13 = cpu_to_le32(cmd.cdw13);
+ admin_cmd.common.cdw14 = cpu_to_le32(cmd.cdw14);
+ admin_cmd.common.cdw15 = cpu_to_le32(cmd.cdw15);
+
+ if (cmd.timeout_ms)
+ timeout = msecs_to_jiffies(cmd.timeout_ms);
+
+ status = spraid_submit_user_cmd(hdev->admin_q, &admin_cmd,
+ (void __user *)(uintptr_t)cmd.addr, cmd.info_1.data_len,
+ &cmd.result0, timeout);
+ if (status >= 0) {
+ if (put_user(cmd.result0, &ucmd->result0))
+ return -EFAULT;
+ if (put_user(cmd.result1, &ucmd->result1))
+ return -EFAULT;
+ }
+
+ return status;
+}
+
+static int hdev_open(struct inode *inode, struct file *file)
+{
+ struct spraid_dev *hdev =
+ container_of(inode->i_cdev, struct spraid_dev, cdev);
+ file->private_data = hdev;
+ return 0;
+}
+
+static long hdev_ioctl(struct file *file, u32 cmd, unsigned long arg)
+{
+ struct spraid_dev *hdev = file->private_data;
+ void __user *argp = (void __user *)arg;
+
+ switch (cmd) {
+ case SPRAID_IOCTL_ADMIN_CMD:
+ return spraid_user_admin_cmd(hdev, argp);
+ default:
+ return -ENOTTY;
+ }
+}
+
+static const struct file_operations spraid_dev_fops = {
+ .owner = THIS_MODULE,
+ .open = hdev_open,
+ .unlocked_ioctl = hdev_ioctl,
+ .compat_ioctl = hdev_ioctl,
+};
+
+static int spraid_create_cdev(struct spraid_dev *hdev)
+{
+ int ret;
+
+ device_initialize(&hdev->ctrl_device);
+ hdev->ctrl_device.devt = MKDEV(MAJOR(spraid_chr_devt), hdev->instance);
+ hdev->ctrl_device.class = spraid_class;
+ hdev->ctrl_device.parent = hdev->dev;
+ dev_set_drvdata(&hdev->ctrl_device, hdev);
+ ret = dev_set_name(&hdev->ctrl_device, "spraid%d", hdev->instance);
+ if (ret)
+ return ret;
+ cdev_init(&hdev->cdev, &spraid_dev_fops);
+ hdev->cdev.owner = THIS_MODULE;
+ ret = cdev_device_add(&hdev->cdev, &hdev->ctrl_device);
+ if (ret) {
+ dev_err(hdev->dev, "Add cdev failed, ret: %d", ret);
+ put_device(&hdev->ctrl_device);
+ kfree_const(hdev->ctrl_device.kobj.name);
+ return ret;
+ }
+
+ return 0;
+}
+
+static inline void spraid_remove_cdev(struct spraid_dev *hdev)
+{
+ cdev_device_del(&hdev->cdev, &hdev->ctrl_device);
+}
+
+static const struct blk_mq_ops spraid_admin_mq_ops = {
+ .queue_rq = spraid_queue_admin_rq,
+ .complete = spraid_complete_admin_rq,
+ .init_hctx = spraid_admin_init_hctx,
+ .init_request = spraid_admin_init_request,
+ .timeout = spraid_admin_timeout,
+};
+
+static void spraid_remove_admin_tagset(struct spraid_dev *hdev)
+{
+ if (hdev->admin_q && !blk_queue_dying(hdev->admin_q)) {
+ blk_mq_unquiesce_queue(hdev->admin_q);
+ blk_cleanup_queue(hdev->admin_q);
+ blk_mq_free_tag_set(&hdev->admin_tagset);
+ }
+}
+
+static int spraid_alloc_admin_tags(struct spraid_dev *hdev)
+{
+ if (!hdev->admin_q) {
+ hdev->admin_tagset.ops = &spraid_admin_mq_ops;
+ hdev->admin_tagset.nr_hw_queues = 1;
+
+ hdev->admin_tagset.queue_depth = SPRAID_AQ_MQ_TAG_DEPTH;
+ hdev->admin_tagset.timeout = ADMIN_TIMEOUT;
+ hdev->admin_tagset.numa_node = hdev->numa_node;
+ hdev->admin_tagset.cmd_size =
+ spraid_cmd_size(hdev, true, false);
+ hdev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
+ hdev->admin_tagset.driver_data = hdev;
+
+ if (blk_mq_alloc_tag_set(&hdev->admin_tagset)) {
+ dev_err(hdev->dev, "Allocate admin tagset failed\n");
+ return -ENOMEM;
+ }
+
+ hdev->admin_q = blk_mq_init_queue(&hdev->admin_tagset);
+ if (IS_ERR(hdev->admin_q)) {
+ dev_err(hdev->dev, "Initialize admin request queue failed\n");
+ blk_mq_free_tag_set(&hdev->admin_tagset);
+ return -ENOMEM;
+ }
+ if (!blk_get_queue(hdev->admin_q)) {
+ dev_err(hdev->dev, "Get admin request queue failed\n");
+ spraid_remove_admin_tagset(hdev);
+ hdev->admin_q = NULL;
+ return -ENODEV;
+ }
+ } else {
+ blk_mq_unquiesce_queue(hdev->admin_q);
+ }
+ return 0;
+}
+
+static bool spraid_check_scmd_completed(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_queue *spraidq;
+ u16 hwq, cid;
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ spraidq = &hdev->queues[hwq];
+ if (READ_ONCE(iod->state) == SPRAID_CMD_COMPLETE || spraid_poll_cq(spraidq, cid)) {
+ dev_warn(hdev->dev, "cid[%d], qid[%d] has been completed\n",
+ cid, spraidq->qid);
+ return true;
+ }
+ return false;
+}
+
+static enum blk_eh_timer_return spraid_scmd_timeout(struct scsi_cmnd *scmd)
+{
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ unsigned int timeout = scmd->device->request_queue->rq_timeout;
+
+ if (spraid_check_scmd_completed(scmd))
+ goto out;
+
+ if (time_after(jiffies, scmd->jiffies_at_alloc + timeout)) {
+ if (cmpxchg(&iod->state, SPRAID_CMD_IN_FLIGHT, SPRAID_CMD_TIMEOUT) ==
+ SPRAID_CMD_IN_FLIGHT) {
+ return BLK_EH_DONE;
+ }
+ }
+out:
+ return BLK_EH_RESET_TIMER;
+}
+
+/* send the abort command via the admin queue for now */
+static int spraid_send_abort_cmd(struct spraid_dev *hdev, u32 hdid, u16 qid, u16 cid)
+{
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.abort.opcode = SPRAID_ADMIN_ABORT_CMD;
+ admin_cmd.abort.hdid = cpu_to_le32(hdid);
+ admin_cmd.abort.sqid = cpu_to_le16(qid);
+ admin_cmd.abort.cid = cpu_to_le16(cid);
+
+ return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
+ NULL, 0, 0, 0, 0);
+}
+
+/* send the reset command via the admin queue for now */
+static int spraid_send_reset_cmd(struct spraid_dev *hdev, int type, u32 hdid)
+{
+ struct spraid_admin_command admin_cmd;
+
+ memset(&admin_cmd, 0, sizeof(admin_cmd));
+ admin_cmd.reset.opcode = SPRAID_ADMIN_RESET;
+ admin_cmd.reset.hdid = cpu_to_le32(hdid);
+ admin_cmd.reset.type = type;
+
+ return spraid_submit_admin_sync_cmd(hdev->admin_q, &admin_cmd, NULL,
+ NULL, 0, 0, 0, 0);
+}
+
+static void spraid_back_fault_cqe(struct spraid_queue *ioq, struct spraid_completion *cqe)
+{
+ struct spraid_dev *hdev = ioq->hdev;
+ struct blk_mq_tags *tags;
+ struct scsi_cmnd *scmd;
+ struct spraid_iod *iod;
+ struct request *req;
+
+ tags = hdev->shost->tag_set.tags[ioq->qid - 1];
+ req = blk_mq_tag_to_rq(tags, cqe->cmd_id);
+ if (unlikely(!req || !blk_mq_request_started(req)))
+ return;
+
+ scmd = blk_mq_rq_to_pdu(req);
+ iod = scsi_cmd_priv(scmd);
+
+ spraid_map_status(iod, scmd, cqe);
+ if (iod->nsge)
+ scsi_dma_unmap(scmd);
+ spraid_free_iod_res(hdev, iod);
+ scmd->scsi_done(scmd);
+ dev_warn(hdev->dev, "Back fault CQE, cid[%d], qid[%d]\n",
+ cqe->cmd_id, ioq->qid);
+}
+
+static void spraid_back_all_io(struct spraid_dev *hdev)
+{
+ int i, j;
+ struct spraid_queue *ioq;
+ struct spraid_completion cqe = {0};
+
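+ /* fake an error completion for every possible tag on each online IO queue */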
+ cqe.status = cpu_to_le16(FW_STAT_ERROR);
+ for (i = 1; i < hdev->online_queues; i++) {
+ ioq = &hdev->queues[i];
+ for (j = 0; j < ioq->q_depth; j++) {
+ cqe.cmd_id = j;
+ spraid_back_fault_cqe(ioq, &cqe);
+ }
+ }
+}
+
+static void spraid_dev_disable(struct spraid_dev *hdev, bool shutdown)
+{
+ struct spraid_queue *adminq = &hdev->queues[0];
+ u16 start, end;
+ unsigned long timeout = jiffies + 600 * HZ;
+
+ if (shutdown)
+ spraid_shutdown_ctrl(hdev);
+ else
+ spraid_disable_ctrl(hdev);
+
+ while (!time_after(jiffies, timeout)) {
+ if (!spraid_wait_ready(hdev, hdev->cap, false)) {
+ dev_info(hdev->dev, "[%s] wait ready succ\n", __func__);
+ break;
+ }
+ }
+
+ if (hdev->queue_count == 0 || hdev->queue_count > 129) {
+ dev_err(hdev->dev, "[%s] err, queue has been delete, queue_count: %d\n",
+ __func__, hdev->queue_count);
+ return;
+ }
+
+ spin_lock_irq(&adminq->cq_lock);
+ spraid_process_cq(adminq, &start, &end, -1);
+ spin_unlock_irq(&adminq->cq_lock);
+ spraid_complete_cqes(adminq, start, end);
+
+ spraid_pci_disable(hdev);
+
+ spraid_back_all_io(hdev);
+
+ spraid_free_io_queues(hdev);
+ spraid_free_admin_queue(hdev);
+
+ hdev->online_queues = 0;
+}
+
+static int spraid_reset_work(struct spraid_dev *hdev)
+{
+ int ret;
+
+ if (hdev->state == SPRAID_RESETTING) {
+ dev_info(hdev->dev, "host reset is already running\n");
+ return -EBUSY;
+ }
+ dev_info(hdev->dev, "first enter host reset\n");
+ hdev->state = SPRAID_RESETTING;
+
+ if (hdev->ctrl_config & SPRAID_CC_ENABLE) {
+ dev_info(hdev->dev, "[%s] start dev_disable\n", __func__);
+ spraid_dev_disable(hdev, false);
+ }
+
+ ret = spraid_pci_enable(hdev);
+ if (ret)
+ goto out;
+
+ ret = spraid_setup_admin_queue(hdev);
+ if (ret)
+ goto pci_disable;
+
+ ret = spraid_alloc_admin_tags(hdev);
+ if (ret)
+ goto disable_admin_q;
+
+ ret = spraid_setup_io_queues(hdev);
+ if (ret || hdev->online_queues <= hdev->shost->nr_hw_queues)
+ goto remove_io_queues;
+
+ hdev->state = SPRAID_LIVE;
+
+ spraid_send_aen(hdev);
+
+ ret = spraid_dev_list_init(hdev);
+ if (ret)
+ goto remove_io_queues;
+
+ return 0;
+
+remove_io_queues:
+ spraid_remove_io_queues(hdev);
+ spraid_remove_admin_tagset(hdev);
+disable_admin_q:
+ spraid_disable_admin_queue(hdev, false);
+pci_disable:
+ spraid_pci_disable(hdev);
+out:
+ hdev->state = SPRAID_DEAD;
+ dev_err(hdev->dev, "err, host reset failed\n");
+ return -ENODEV;
+}
+
+static int spraid_wait_abnl_cmd_done(struct spraid_iod *iod)
+{
+ u16 times = 0;
+
+ do {
+ if (READ_ONCE(iod->state) == SPRAID_CMD_TMO_COMPLETE)
+ break;
+ msleep(500);
+ times++;
+ } while (times <= SPRAID_WAIT_ABNL_CMD_TIMEOUT);
+
+ /* wait command completion timeout after abort/reset success */
+ if (times >= SPRAID_WAIT_ABNL_CMD_TIMEOUT)
+ return -ETIMEDOUT;
+
+ return 0;
+}
+
+static int spraid_abort_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, aborting\n", cid, hwq);
+ ret = spraid_send_abort_cmd(hdev, hostdata->hdid, hwq, cid);
+ if (ret != ADMIN_ERR_TIMEOUT) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed, not found\n", cid, hwq);
+ return FAILED;
+ }
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort succ\n", cid, hwq);
+ return SUCCESS;
+ }
+ dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed, timeout\n", cid, hwq);
+ return FAILED;
+}
+
+static int spraid_tgt_reset_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, target reset\n", cid, hwq);
+ ret = spraid_send_reset_cmd(hdev, SPRAID_RESET_TARGET, hostdata->hdid);
+ if (ret == 0) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d]target reset failed, not found\n",
+ cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] target reset success\n", cid, hwq);
+ return SUCCESS;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] ret[%d] target reset failed\n", cid, hwq, ret);
+ return FAILED;
+}
+
+static int spraid_bus_reset_handler(struct scsi_cmnd *scmd)
+{
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+ struct spraid_iod *iod = scsi_cmd_priv(scmd);
+ struct spraid_sdev_hostdata *hostdata;
+ u16 hwq, cid;
+ int ret;
+
+ scsi_print_command(scmd);
+
+ if (!spraid_wait_abnl_cmd_done(iod) || spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ hostdata = scmd->device->hostdata;
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] timeout, bus reset\n", cid, hwq);
+ ret = spraid_send_reset_cmd(hdev, SPRAID_RESET_BUS, hostdata->hdid);
+ if (ret == 0) {
+ ret = spraid_wait_abnl_cmd_done(iod);
+ if (ret) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] bus reset failed, not found\n",
+ cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] bus reset succ\n", cid, hwq);
+ return SUCCESS;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] ret[%d] bus reset failed\n", cid, hwq, ret);
+ return FAILED;
+}
+
+static int spraid_shost_reset_handler(struct scsi_cmnd *scmd)
+{
+ u16 hwq, cid;
+ struct spraid_dev *hdev = shost_priv(scmd->device->host);
+
+ scsi_print_command(scmd);
+ if (spraid_check_scmd_completed(scmd))
+ return SUCCESS;
+
+ spraid_get_tag_from_scmd(scmd, &hwq, &cid);
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset\n", cid, hwq);
+
+ if (spraid_reset_work(hdev)) {
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset failed\n", cid, hwq);
+ return FAILED;
+ }
+
+ dev_warn(hdev->dev, "cid[%d] qid[%d] host reset success\n", cid, hwq);
+
+ return SUCCESS;
+}
+
+static struct scsi_host_template spraid_driver_template = {
+ .module = THIS_MODULE,
+ .name = "RMT Logic SpRAID driver",
+ .proc_name = "spraid",
+ .queuecommand = spraid_queue_command,
+ .slave_alloc = spraid_slave_alloc,
+ .slave_destroy = spraid_slave_destroy,
+ .slave_configure = spraid_slave_configure,
+ .eh_timed_out = spraid_scmd_timeout,
+ .eh_abort_handler = spraid_abort_handler,
+ .eh_target_reset_handler = spraid_tgt_reset_handler,
+ .eh_bus_reset_handler = spraid_bus_reset_handler,
+ .eh_host_reset_handler = spraid_shost_reset_handler,
+ .change_queue_depth = scsi_change_queue_depth,
+ .this_id = -1,
+};
+
+static int spraid_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct spraid_dev *hdev;
+ struct Scsi_Host *shost;
+ int node, ret;
+
+ shost = scsi_host_alloc(&spraid_driver_template, sizeof(*hdev));
+ if (!shost) {
+ dev_err(&pdev->dev, "Failed to allocate scsi host\n");
+ return -ENOMEM;
+ }
+ hdev = shost_priv(shost);
+ hdev->pdev = pdev;
+ hdev->dev = get_device(&pdev->dev);
+
+ node = dev_to_node(hdev->dev);
+ if (node == NUMA_NO_NODE) {
+ node = first_memory_node;
+ set_dev_node(hdev->dev, node);
+ }
+ hdev->numa_node = node;
+ hdev->shost = shost;
+ pci_set_drvdata(pdev, hdev);
+
+ ret = spraid_dev_map(hdev);
+ if (ret)
+ goto put_dev;
+
+ init_rwsem(&hdev->devices_rwsem);
+ INIT_WORK(&hdev->aen_work, spraid_async_event_work);
+ INIT_WORK(&hdev->scan_work, spraid_scan_work);
+ INIT_WORK(&hdev->timesyn_work, spraid_timesyn_work);
+
+ ret = spraid_alloc_resources(hdev);
+ if (ret)
+ goto dev_unmap;
+
+ ret = spraid_pci_enable(hdev);
+ if (ret)
+ goto resources_free;
+
+ ret = spraid_setup_admin_queue(hdev);
+ if (ret)
+ goto pci_disable;
+
+ ret = spraid_alloc_admin_tags(hdev);
+ if (ret)
+ goto disable_admin_q;
+
+ ret = spraid_init_ctrl_info(hdev);
+ if (ret)
+ goto free_admin_tagset;
+
+ ret = spraid_alloc_iod_ext_mem_pool(hdev);
+ if (ret)
+ goto free_admin_tagset;
+
+ ret = spraid_setup_io_queues(hdev);
+ if (ret)
+ goto free_iod_mempool;
+
+ spraid_shost_init(hdev);
+
+ ret = scsi_add_host(hdev->shost, hdev->dev);
+ if (ret) {
+ dev_err(hdev->dev, "Add shost to system failed, ret: %d\n",
+ ret);
+ goto remove_io_queues;
+ }
+
+ spraid_send_aen(hdev);
+
+ ret = spraid_create_cdev(hdev);
+ if (ret)
+ goto remove_io_queues;
+
+ if (hdev->online_queues == SPRAID_ADMIN_QUEUE_NUM) {
+ dev_warn(hdev->dev, "warn only admin queue can be used\n");
+ return 0;
+ }
+
+ hdev->state = SPRAID_LIVE;
+ spraid_send_aen(hdev);
+
+ ret = spraid_dev_list_init(hdev);
+ if (ret)
+ goto remove_cdev;
+
+ ret = spraid_configure_timestamp(hdev);
+ if (ret)
+ dev_warn(hdev->dev, "init set timestamp failed\n");
+
+ scsi_scan_host(hdev->shost);
+
+ return 0;
+
+remove_cdev:
+ spraid_remove_cdev(hdev);
+remove_io_queues:
+ spraid_remove_io_queues(hdev);
+free_iod_mempool:
+ spraid_free_iod_ext_mem_pool(hdev);
+free_admin_tagset:
+ spraid_remove_admin_tagset(hdev);
+disable_admin_q:
+ spraid_disable_admin_queue(hdev, false);
+pci_disable:
+ spraid_pci_disable(hdev);
+resources_free:
+ spraid_free_resources(hdev);
+dev_unmap:
+ spraid_dev_unmap(hdev);
+put_dev:
+ put_device(hdev->dev);
+ scsi_host_put(shost);
+
+ return -ENODEV;
+}
+
+static void spraid_remove(struct pci_dev *pdev)
+{
+ struct spraid_dev *hdev = pci_get_drvdata(pdev);
+ struct Scsi_Host *shost = hdev->shost;
+
+ scsi_remove_host(shost);
+
+ kfree(hdev->devices);
+ spraid_remove_cdev(hdev);
+ spraid_remove_io_queues(hdev);
+ spraid_free_iod_ext_mem_pool(hdev);
+ spraid_remove_admin_tagset(hdev);
+ spraid_disable_admin_queue(hdev, false);
+ spraid_pci_disable(hdev);
+ spraid_free_resources(hdev);
+ spraid_dev_unmap(hdev);
+ put_device(hdev->dev);
+ scsi_host_put(shost);
+}
+
+static const struct pci_device_id spraid_id_table[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_RMT_LOGIC, SPRAID_SERVER_DEVICE_HAB_DID) },
+ { PCI_DEVICE(PCI_VENDOR_ID_RMT_LOGIC, SPRAID_SERVER_DEVICE_RAID_DID) },
+ { PCI_DEVICE(PCI_VENDOR_ID_RMT_LOGIC, SPRAID_SERVER_DEVICE_NVME_DID) },
+ { PCI_DEVICE(PCI_VENDOR_ID_RMT_LOGIC, SPRAID_SERVER_DEVICE_FANOUT_DID) },
+ { PCI_DEVICE(PCI_VENDOR_ID_RMT_LOGIC, PCI_DEVICE_ID_RMT_TEST) },
+ { 0, }
+};
+MODULE_DEVICE_TABLE(pci, spraid_id_table);
+
+static struct pci_driver spraid_driver = {
+ .name = "spraid",
+ .id_table = spraid_id_table,
+ .probe = spraid_probe,
+ .remove = spraid_remove
+};
+
+static int __init spraid_init(void)
+{
+ int ret;
+
+ spraid_wq = alloc_workqueue("spraid-wq", WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
+ if (!spraid_wq)
+ return -ENOMEM;
+
+ ret = alloc_chrdev_region(&spraid_chr_devt, 0, SPRAID_MINORS, "spraid");
+ if (ret < 0)
+ goto destroy_wq;
+
+ spraid_class = class_create(THIS_MODULE, "spraid");
+ if (IS_ERR(spraid_class)) {
+ ret = PTR_ERR(spraid_class);
+ goto unregister_chrdev;
+ }
+
+ ret = pci_register_driver(&spraid_driver);
+ if (ret < 0)
+ goto destroy_class;
+
+ return 0;
+
+destroy_class:
+ class_destroy(spraid_class);
+unregister_chrdev:
+ unregister_chrdev_region(spraid_chr_devt, SPRAID_MINORS);
+destroy_wq:
+ destroy_workqueue(spraid_wq);
+
+ return ret;
+}
+
+static void __exit spraid_exit(void)
+{
+ pci_unregister_driver(&spraid_driver);
+ class_destroy(spraid_class);
+ unregister_chrdev_region(spraid_chr_devt, SPRAID_MINORS);
+ destroy_workqueue(spraid_wq);
+ ida_destroy(&spraid_instance_ida);
+}
+
+MODULE_AUTHOR("Ramaxel Memory Technology");
+MODULE_DESCRIPTION("Ramaxel Memory Technology SPraid Driver");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(SPRAID_DRV_VERSION);
+module_init(spraid_init);
+module_exit(spraid_exit);
--
2.17.1
Hi,
Where can I find documentation describing the git commit message format used by the openEuler kernel and the meaning of the fields in it? Thanks!
For example, I have seen information like this:
mainline inclusion
from mainline-5.14-rc7
commit f1c8e410cdac5df42a7806e49efde2859a10fecd
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I48J5M
CVE: NA
There are many tags such as mainline inclusion, stable inclusion, hulk inclusion, and so on. Some entries provide the commit line and some do not. Where can I find a definitive document describing this convention?
Regards,
Kai Liu 刘恺
E-mail: kai.liu(a)suse.com | Mobile #: +86 186 6591 9086 | TZ: UTC+8
09 Sep '21
From: Tang Yizhou <tangyizhou(a)huawei.com>
ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I46DPJ
CVE: NA
-------------------------------------------------
The sp_check_mmap_addr() check should be performed before the MAP_FIXED early return in arch_get_unmapped_area() and arch_get_unmapped_area_topdown().
Signed-off-by: Tang Yizhou <tangyizhou(a)huawei.com>
Reviewed-by: Weilong Chen <chenweilong(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
mm/mmap.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index e2b53084f2a71..378e1869ac7a0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2353,12 +2353,12 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
if (len > TASK_SIZE - mmap_min_addr)
return -ENOMEM;
- if (flags & MAP_FIXED)
- return addr;
-
if (sp_check_mmap_addr(addr, flags))
return -EINVAL;
+ if (flags & MAP_FIXED)
+ return addr;
+
if (addr) {
addr = PAGE_ALIGN(addr);
@@ -2407,12 +2407,12 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
if (len > TASK_SIZE - mmap_min_addr)
return -ENOMEM;
- if (flags & MAP_FIXED)
- return addr;
-
if (sp_check_mmap_addr(addr, flags))
return -EINVAL;
+ if (flags & MAP_FIXED)
+ return addr;
+
/* requesting a specific address */
if (addr) {
addr = PAGE_ALIGN(addr);
--
2.25.1
From: Stephen Hemminger <stephen(a)networkplumber.org>
mainline inclusion
from mainline-v4.20-rc1
commit bfddabfa230452cea32aae82f9cd85ab22601acf
category: bugfix
bugzilla: 180968
CVE: NA
-------------------------------------------------
Introduce the concept of mapping physical memory locations that
are normal memory. The new type UIO_MEM_IOVA is similar to the
existing UIO_MEM_PHYS but the backing memory is not marked as uncached.
Also, indent related switch to the currently used style.
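For illustration only (hypothetical driver fragment, not part of this patch), a device could expose a coherent DMA buffer to user space with the new type, so the mmap() keeps normal cached page protection instead of the pgprot_noncached() that UIO_MEM_PHYS forces:
static int example_setup_uio(struct device *dev, struct uio_info *info)
{
	dma_addr_t ring_dma;
	void *ring = dma_alloc_coherent(dev, PAGE_SIZE, &ring_dma, GFP_KERNEL);

	if (!ring)
		return -ENOMEM;

	info->name = "example-uio";
	info->version = "0.1";
	info->mem[0].name = "ring";
	info->mem[0].addr = ring_dma;	/* address user space will mmap */
	info->mem[0].size = PAGE_SIZE;
	info->mem[0].memtype = UIO_MEM_IOVA;	/* stays cached, unlike UIO_MEM_PHYS */
	return uio_register_device(dev, info);
}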
Signed-off-by: Stephen Hemminger <sthemmin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Chen Huang <chenhuang5(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/uio/uio.c | 24 +++++++++++++-----------
include/linux/uio_driver.h | 1 +
2 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 1876e4c4b81a0..f43498e5cfb85 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -738,7 +738,8 @@ static int uio_mmap_physical(struct vm_area_struct *vma)
return -EINVAL;
vma->vm_ops = &uio_physical_vm_ops;
- vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+ if (idev->info->mem[mi].memtype == UIO_MEM_PHYS)
+ vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
/*
* We cannot use the vm_iomap_memory() helper here,
@@ -795,18 +796,19 @@ static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
}
switch (idev->info->mem[mi].memtype) {
- case UIO_MEM_PHYS:
- ret = uio_mmap_physical(vma);
- break;
- case UIO_MEM_LOGICAL:
- case UIO_MEM_VIRTUAL:
- ret = uio_mmap_logical(vma);
- break;
- default:
- ret = -EINVAL;
+ case UIO_MEM_IOVA:
+ case UIO_MEM_PHYS:
+ ret = uio_mmap_physical(vma);
+ break;
+ case UIO_MEM_LOGICAL:
+ case UIO_MEM_VIRTUAL:
+ ret = uio_mmap_logical(vma);
+ break;
+ default:
+ ret = -EINVAL;
}
-out:
+ out:
mutex_unlock(&idev->info_lock);
return ret;
}
diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h
index 6f8b68cd460f8..a3cd7cb67a69f 100644
--- a/include/linux/uio_driver.h
+++ b/include/linux/uio_driver.h
@@ -133,6 +133,7 @@ extern void uio_event_notify(struct uio_info *info);
#define UIO_MEM_PHYS 1
#define UIO_MEM_LOGICAL 2
#define UIO_MEM_VIRTUAL 3
+#define UIO_MEM_IOVA 4
/* defines for uio_port->porttype */
#define UIO_PORT_NONE 0
--
2.25.1
[PATCH kernel-4.19 1/2] mm/mempolicy.c: check range first in queue_pages_test_walk
by Yang Yingliang 09 Sep '21
09 Sep '21
From: Li Xinhai <lixinhai.lxh(a)gmail.com>
mainline inclusion
from mainline-v5.5-rc1
commit a18b3ac25bb7be4781cb9e6d31f3e57b3ba01b06
category: bugfix
bugzilla: 97909
CVE: NA
-------------------------------------------------
Patch series "mm: Fix checking unmapped holes for mbind", v4.
This patch set fixes the checking of unmapped holes for mbind().
First patch makes sure the vma been correctly tracked in .test_walk(),
so each time when .test_walk() is called, the neighborhood of two vma
is correct.
Current problem is that the !vma_migratable() check could cause return
immediately without update tracking to vma.
Second patch fix the inconsistent report of EFAULT when mbind() is
called for MPOL_DEFAULT and non MPOL_DEFAULT cases, so application do
not need to have workaround code to handle this special behavior.
Currently there are two problems, one is that the .test_walk() can not
know there is hole at tail side of range, because .test_walk() only
call for vma not for hole. The other one is that mbind_range() checks
for hole at head side of range but do not consider the
MPOL_MF_DISCONTIG_OK flag as done in .test_walk().
This patch (of 2):
Checking unmapped hole and updating the previous vma must be handled
first, otherwise the unmapped hole could be calculated from a wrong
previous vma.
Several commits were relevant to this error:
- commit 6f4576e3687b ("mempolicy: apply page table walker on
queue_pages_range()")
This commit was correct, the VM_PFNMAP check was after updating
previous vma
- commit 48684a65b4e3 ("mm: pagewalk: fix misbehavior of
walk_page_range for vma(VM_PFNMAP)")
This commit added VM_PFNMAP check before updating previous vma. Then,
there were two VM_PFNMAP check did same thing twice.
- commit acda0c334028 ("mm/mempolicy.c: get rid of duplicated check for
vma(VM_PFNMAP) in queue_page s_range()")
This commit tried to fix the duplicated VM_PFNMAP check, but it
wrongly removed the one which was after updating vma.
Link: http://lkml.kernel.org/r/1573218104-11021-2-git-send-email-lixinhai.lxh@gma…
Fixes: acda0c334028 (mm/mempolicy.c: get rid of duplicated check for vma(VM_PFNMAP) in queue_pages_range())
Signed-off-by: Li Xinhai <lixinhai.lxh(a)gmail.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: linux-man <linux-man(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Nanyong Sun <sunnanyong(a)huawei.com>
Reviewed-by: tong tiangen <tongtiangen(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
mm/mempolicy.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3a835f96c8fea..7b4ba2f355911 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -657,6 +657,16 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
unsigned long endvma = vma->vm_end;
unsigned long flags = qp->flags;
+ /* range check first */
+ if (!(flags & MPOL_MF_DISCONTIG_OK)) {
+ if (!vma->vm_next && vma->vm_end < end)
+ return -EFAULT;
+ if (qp->prev && qp->prev->vm_end < vma->vm_start)
+ return -EFAULT;
+ }
+
+ qp->prev = vma;
+
/*
* Need check MPOL_MF_STRICT to return -EIO if possible
* regardless of vma_migratable
@@ -670,15 +680,6 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
if (vma->vm_start > start)
start = vma->vm_start;
- if (!(flags & MPOL_MF_DISCONTIG_OK)) {
- if (!vma->vm_next && vma->vm_end < end)
- return -EFAULT;
- if (qp->prev && qp->prev->vm_end < vma->vm_start)
- return -EFAULT;
- }
-
- qp->prev = vma;
-
if (flags & MPOL_MF_LAZY) {
/* Similar to task_numa_work, skip inaccessible VMAs */
if (!is_vm_hugetlb_page(vma) &&
--
2.25.1
[PATCH kernel-4.19 1/2] share_pool: Free newly generated id only when necessary
by Yang Yingliang 08 Sep '21
08 Sep '21
From: Tang Yizhou <tangyizhou(a)huawei.com>
ascend inclusion
category: bugfix
bugzilla: NA
CVE: NA
-------------------------------------------------
Once sp group is created, the generated id will be freed in
sp_group_drop. Before that, we should call free_sp_group_id()
when error occurs.
Signed-off-by: Tang Yizhou <tangyizhou(a)huawei.com>
Reviewed-by: Weilong Chen <chenweilong(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
mm/share_pool.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/mm/share_pool.c b/mm/share_pool.c
index 6a4da9ac83e14..2d9c0a8916211 100644
--- a/mm/share_pool.c
+++ b/mm/share_pool.c
@@ -349,6 +349,12 @@ static void free_sp_group_id(unsigned int spg_id)
ida_free(&sp_group_id_ida, spg_id);
}
+static void free_new_spg_id(bool new, int spg_id)
+{
+ if (new)
+ free_sp_group_id(spg_id);
+}
+
static void free_sp_group(struct sp_group *spg)
{
fput(spg->file);
@@ -665,7 +671,8 @@ int sp_group_add_task(int pid, int spg_id)
rcu_read_unlock();
if (ret) {
up_write(&sp_group_sem);
- goto out_free_id;
+ free_new_spg_id(id_newly_generated, spg_id);
+ goto out;
}
/*
@@ -682,12 +689,14 @@ int sp_group_add_task(int pid, int spg_id)
*/
mm = get_task_mm(tsk->group_leader);
if (!mm) {
- ret = -ESRCH;
up_write(&sp_group_sem);
+ ret = -ESRCH;
+ free_new_spg_id(id_newly_generated, spg_id);
goto out_put_task;
} else if (mm->sp_group) {
- ret = -EEXIST;
up_write(&sp_group_sem);
+ ret = -EEXIST;
+ free_new_spg_id(id_newly_generated, spg_id);
goto out_put_mm;
}
@@ -695,6 +704,7 @@ int sp_group_add_task(int pid, int spg_id)
if (IS_ERR(spg)) {
up_write(&sp_group_sem);
ret = PTR_ERR(spg);
+ free_new_spg_id(id_newly_generated, spg_id);
goto out_put_mm;
}
@@ -813,9 +823,7 @@ int sp_group_add_task(int pid, int spg_id)
mmput(mm);
out_put_task:
put_task_struct(tsk);
-out_free_id:
- if (unlikely(ret) && id_newly_generated)
- free_sp_group_id((unsigned int)spg_id);
+out:
return ret == 0 ? spg_id : ret;
}
EXPORT_SYMBOL_GPL(sp_group_add_task);
--
2.25.1
[PATCH openEuler-1.0-LTS 1/2] net: qrtr: fix OOB Read in qrtr_endpoint_post
by Yang Yingliang 08 Sep '21
08 Sep '21
From: Pavel Skripkin <paskripkin(a)gmail.com>
stable inclusion
from linux-4.19.196
commit f8111c0d7ed42ede41a3d0d393b104de0730a8a6
CVE: CVE-2021-3743
--------------------------------
[ Upstream commit ad9d24c9429e2159d1e279dc3a83191ccb4daf1d ]
Syzbot reported slab-out-of-bounds Read in
qrtr_endpoint_post. The problem was in wrong
_size_ type:
if (len != ALIGN(size, 4) + hdrlen)
goto err;
If size from qrtr_hdr is 4294967293 (0xfffffffd), the result of
ALIGN(size, 4) will be 0. In case of len == hdrlen and size == 4294967293
in header this check won't fail and
skb_put_data(skb, data + hdrlen, size);
will read out of bound from data, which is hdrlen allocated block.
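The wrap-around is easy to see in isolation; a minimal stand-alone sketch (the header length used here is only an assumed placeholder):
#include <stdio.h>

int main(void)
{
	unsigned int size = 0xfffffffdu;	/* value from the report */
	unsigned int hdrlen = 20;		/* assumed v2 header size */
	unsigned int aligned = (size + 3u) & ~3u;	/* ALIGN(size, 4) on a 32-bit type */

	/* aligned wraps to 0, so "len != ALIGN(size, 4) + hdrlen" passes for
	 * len == hdrlen, yet skb_put_data() would still copy 'size' bytes
	 */
	printf("ALIGN(size, 4) = %u, check passes for len == %u\n",
	       aligned, aligned + hdrlen);
	return 0;
}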
Fixes: 194ccc88297a ("net: qrtr: Support decoding incoming v2 packets")
Reported-and-tested-by: syzbot+1917d778024161609247(a)syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin(a)gmail.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Reviewed-by: Yue Haibing <yuehaibing(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
net/qrtr/qrtr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c
index e720763376387..320933db88915 100644
--- a/net/qrtr/qrtr.c
+++ b/net/qrtr/qrtr.c
@@ -262,7 +262,7 @@ int qrtr_endpoint_post(struct qrtr_endpoint *ep, const void *data, size_t len)
const struct qrtr_hdr_v2 *v2;
struct sk_buff *skb;
struct qrtr_cb *cb;
- unsigned int size;
+ size_t size;
unsigned int ver;
size_t hdrlen;
--
2.25.1
[PATCH kernel-4.19 1/3] mm, slub: stop freeing kmem_cache_node structures on node offline
by Yang Yingliang 08 Sep '21
08 Sep '21
From: Vlastimil Babka <vbabka(a)suse.cz>
mainline inclusion
from mainline-v5.12-rc1
commit 666716fd267df0007dfbb6480cd79dd5b05da4cc
category: bugfix
bugzilla: 175589
CVE: NA
-------------------------------------------------
Patch series "mm, slab, slub: remove cpu and memory hotplug locks".
Some related work caused me to look at how we use get/put_mems_online()
and get/put_online_cpus() during kmem cache
creation/descruction/shrinking, and realize that it should be actually
safe to remove all of that with rather small effort (as e.g. Michal Hocko
suspected in some of the past discussions already). This has the benefit
to avoid rather heavy locks that have caused locking order issues already
in the past. So this is the result, Patches 2 and 3 remove memory hotplug
and cpu hotplug locking, respectively. Patch 1 is due to realization that
in fact some races exist despite the locks (even if not removed), but the
most sane solution is not to introduce more of them, but rather accept
some wasted memory in scenarios that should be rare anyway (full memory
hot remove), as we do the same in other contexts already.
This patch (of 3):
Commit e4f8e513c3d3 ("mm/slub: fix a deadlock in show_slab_objects()") has
fixed a problematic locking order by removing the memory hotplug lock
get/put_online_mems() from show_slab_objects(). During the discussion, it
was argued [1] that this is OK, because existing slabs on the node would
prevent a hotremove to proceed.
That's true, but per-node kmem_cache_node structures are not necessarily
allocated on the same node and may exist even without actual slab pages on
the same node. Any path that uses get_node() directly or via
for_each_kmem_cache_node() (such as show_slab_objects()) can race with
freeing of kmem_cache_node even with the !NULL check, resulting in
use-after-free.
To that end, commit e4f8e513c3d3 argues in a comment that:
* We don't really need mem_hotplug_lock (to hold off
* slab_mem_going_offline_callback) here because slab's memory hot
* unplug code doesn't destroy the kmem_cache->node[] data.
While it's true that slab_mem_going_offline_callback() doesn't free the
kmem_cache_node, the later callback slab_mem_offline_callback() actually
does, so the race and use-after-free exists. Not just for
show_slab_objects() after commit e4f8e513c3d3, but also many other places
that are not under slab_mutex. And adding slab_mutex locking or other
synchronization to SLUB paths such as get_any_partial() would be bad for
performance and error-prone.
The easiest solution is therefore to make the abovementioned comment true
and stop freeing the kmem_cache_node structures, accepting some wasted
memory in the full memory node removal scenario. Analogically we also
don't free hotremoved pgdat as mentioned in [1], nor the similar per-node
structures in SLAB. Importantly this approach will not block the
hotremove, as generally such nodes should be movable in order to succeed
hotremove in the first place, and thus the GFP_KERNEL allocated
kmem_cache_node will come from elsewhere.
[1] https://lore.kernel.org/linux-mm/20190924151147.GB23050@dhcp22.suse.cz/
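For illustration, the kind of lockless walk this keeps safe looks roughly like the following (hypothetical function using SLUB's internal helpers, not code from this patch):
static unsigned long count_partial_slabs(struct kmem_cache *s)
{
	struct kmem_cache_node *n;
	unsigned long total = 0;
	int node;

	for_each_kmem_cache_node(s, node, n)
		total += n->nr_partial;	/* 'n' must stay valid across node hot-remove */

	return total;
}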
Link: https://lkml.kernel.org/r/20210113131634.3671-1-vbabka@suse.cz
Link: https://lkml.kernel.org/r/20210113131634.3671-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Qian Cai <cai(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Chengyang Fan <cy.fan(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
mm/slub.c | 28 +++++++++++-----------------
1 file changed, 11 insertions(+), 17 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 30683ffe18234..983392cdccef9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4059,8 +4059,6 @@ static int slab_mem_going_offline_callback(void *arg)
static void slab_mem_offline_callback(void *arg)
{
- struct kmem_cache_node *n;
- struct kmem_cache *s;
struct memory_notify *marg = arg;
int offline_node;
@@ -4074,21 +4072,11 @@ static void slab_mem_offline_callback(void *arg)
return;
mutex_lock(&slab_mutex);
- list_for_each_entry(s, &slab_caches, list) {
- n = get_node(s, offline_node);
- if (n) {
- /*
- * if n->nr_slabs > 0, slabs still exist on the node
- * that is going down. We were unable to free them,
- * and offline_pages() function shouldn't call this
- * callback. So, we must fail.
- */
- BUG_ON(slabs_node(s, offline_node));
-
- s->node[offline_node] = NULL;
- kmem_cache_free(kmem_cache_node, n);
- }
- }
+ /*
+ * We no longer free kmem_cache_node structures here, as it would be
+ * racy with all get_node() users, and infeasible to protect them with
+ * slab_mutex.
+ */
mutex_unlock(&slab_mutex);
}
@@ -4114,6 +4102,12 @@ static int slab_mem_going_online_callback(void *arg)
*/
mutex_lock(&slab_mutex);
list_for_each_entry(s, &slab_caches, list) {
+ /*
+ * The structure may already exist if the node was previously
+ * onlined and offlined.
+ */
+ if (get_node(s, nid))
+ continue;
/*
* XXX: kmem_cache_alloc_node will fallback to other nodes
* since memory is not yet available from the node that
--
2.25.1