Per the Downstream Port Containment Related Enhancements ECN,
Table 4-6 (Interpretation of _OSC Control Field Returned Value),
for bit 7 of the _OSC control return value:
"Firmware sets this bit to 1 to grant the OS control over PCI Express
Downstream Port Containment configuration."
"If control of this feature was requested and denied,
or was not requested, the firmware returns this bit set to 0."
We store bit 7 of the _OSC control return value in host->native_dpc and
check it before enabling the DPC service, as the firmware may not have
granted the OS control.
 Downstream Port Containment Related Enhancements ECN,
Jan 28, 2019, affecting PCI Firmware Specification, Rev. 3.2
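The check described above boils down to testing one bit of the returned control field. A minimal sketch (the mask value mirrors the kernel's OSC_PCI_EXPRESS_DPC_CONTROL define; the helper name here is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 7 of the _OSC control return value: firmware grants the OS
 * control over PCIe Downstream Port Containment configuration.
 * Mask mirrors the kernel's OSC_PCI_EXPRESS_DPC_CONTROL (0x80). */
#define OSC_PCI_EXPRESS_DPC_CONTROL  0x00000080u

/* Hypothetical helper: true only if firmware granted DPC control. */
static bool host_owns_dpc(uint32_t osc_control_ret)
{
	return (osc_control_ret & OSC_PCI_EXPRESS_DPC_CONTROL) != 0;
}
```

If control was denied or never requested, firmware returns the bit as 0 and the OS must leave DPC alone.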
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Changes since v1:
- use correct reference for _OSC control return value
drivers/pci/pcie/portdrv_core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index e1fed664..7445d03 100644
@@ -253,7 +253,8 @@ static int get_port_device_capability(struct pci_dev *dev)
if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
- (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
+ (pcie_ports_dpc_native ||
+ ((services & PCIE_PORT_SERVICE_AER) && host->native_dpc)))
services |= PCIE_PORT_SERVICE_DPC;
if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
On an ARM64 system with an SMMUv3 implementation that fully supports
the Broadcast TLB Maintenance (BTM) feature as part of the Distributed
Virtual Memory (DVM) protocol, the CPU TLB invalidate instructions are
received by the SMMUv3. This is very useful when the SMMUv3 shares the
page tables with the CPU (e.g. the guest SVA use case). For this to work,
the SMMU must use the same VMID that is allocated by KVM to configure
the stage 2 translations. At present, KVM VMID allocations are recycled
on rollover and may change as a result. This creates issues if we
have to share the KVM VMID with the SMMU.
Please see the discussion here,
This series proposes a way to share the VMID between KVM and the IOMMU by:
1. Splitting the KVM VMID space into two equal halves based on the
command line option "kvm-arm.pinned_vmid_enable".
2. The first half of the VMID space follows the normal
recycle-on-rollover policy.
3. The second half of the VMID space doesn't roll over and is used to
allocate pinned VMIDs.
4. Providing a helper function to retrieve the KVM instance associated
with a device (if it is part of a VFIO group).
5. Introducing generic interfaces to get/put pinned KVM VMIDs.
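The split described in points 1-3 can be sketched as follows (an illustrative toy model with an 8-bit VMID space, not the actual KVM allocator; all names here are made up for the sketch):

```c
#include <stdbool.h>
#include <stdint.h>

#define VMID_BITS      8
#define VMID_MAX       (1u << VMID_BITS)
#define VMID_PIN_BASE  (VMID_MAX / 2)   /* second half reserved for pinned IDs */

static uint16_t next_vmid = 1;          /* 0 kept as "invalid" in this sketch */
static bool pinned_used[VMID_MAX / 2];  /* usage map for the pinned half */

/* Normal allocation: IDs from the first half, recycled on rollover,
 * so a VM's VMID may change over its lifetime. */
static uint16_t alloc_vmid(void)
{
	uint16_t vmid = next_vmid++;
	if (next_vmid >= VMID_PIN_BASE)  /* rollover: recycle the first half */
		next_vmid = 1;
	return vmid;
}

/* Pinned allocation: IDs from the second half, never recycled on
 * rollover, so the SMMU can keep using them for stage 2 config. */
static int alloc_pinned_vmid(void)
{
	for (unsigned i = 0; i < VMID_PIN_BASE; i++) {
		if (!pinned_used[i]) {
			pinned_used[i] = true;
			return (int)(VMID_PIN_BASE + i);
		}
	}
	return -1;                       /* pinned space exhausted */
}
```

The real series works on the arm64 VMID width (8 or 16 bits) and must also handle freeing pinned IDs on VM exit, as patch 5 does.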
1. I couldn't figure out a way to determine whether a platform actually
fully supports DVM/BTM or not. I'm not sure we can take a call based on the
SMMUv3 BTM feature bit alone. Probably we can get it from firmware.
2. The current splitting of the VMID space is only one way to do this and
probably not the best. Maybe we can follow the pinned ASID method used
in the SVA code. Suggestions welcome here.
3. The detach_pasid_table() interface is not very clear to me as the current
QEMU prototype is not using it. This requires fixing from my side.
This is based on Jean-Philippe's SVA series and Eric's SMMUv3 dual-stage series.
The branch with the whole vSVA + BTM solution is here,
This is lightly tested on a HiSilicon D06 platform with the uacce/zip dev test tool:
./zip_sva_per -k tlb
Shameer Kolothum (5):
vfio: Add a helper to retrieve kvm instance from a dev
KVM: Add generic infrastructure to support pinned VMIDs
KVM: ARM64: Add support for pinned VMIDs
iommu/arm-smmu-v3: Use pinned VMID for NESTED stage with BTM
KVM: arm64: Make sure pinned vmid is released on VM exit
arch/arm64/include/asm/kvm_host.h | 2 +
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/arm.c | 116 +++++++++++++++++++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 49 ++++++++-
drivers/vfio/vfio.c | 12 ++
include/linux/kvm_host.h | 17 +++
include/linux/vfio.h | 1 +
virt/kvm/Kconfig | 2 +
virt/kvm/kvm_main.c | 25 +++++
9 files changed, 220 insertions(+), 5 deletions(-)
Patch 1: remove unnecessary seqcount operation.
Patch 2: implement TCQ_F_CAN_BYPASS.
Patch 3: remove qdisc->empty.
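The bypass idea in patch 2 can be sketched as follows (a simplified single-threaded illustration, not the kernel code; the kernel version additionally relies on qdisc->seqlock and the MISSED/DRAINING state bits):

```c
#include <stdbool.h>

struct pkt { int id; };

static struct pkt *queue[16];     /* toy ring; payload ignored below */
static int head, tail;
static bool running;              /* stands in for holding qdisc->seqlock */
static int sent_direct, sent_queued;

static bool qdisc_empty(void) { return head == tail; }

static void sch_direct_xmit(struct pkt *p) { (void)p; sent_direct++; }

static void enqueue_then_run(struct pkt *p)
{
	queue[tail++ % 16] = p;
	while (!qdisc_empty()) {  /* drain: dequeue and transmit */
		head++;
		sent_queued++;
	}
}

/* Bypass path: if the qdisc is empty and we win the "lock",
 * transmit directly and skip the enqueue/dequeue round trip. */
static void xmit(struct pkt *p)
{
	if (qdisc_empty() && !running) {
		running = true;
		sch_direct_xmit(p);
		running = false;
		return;
	}
	enqueue_then_run(p);
}
```

Skipping the enqueue/dequeue pair on an empty, uncontended qdisc is where the single-thread gains in the table below come from.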
Performance data for pktgen in queue_xmit mode + dummy netdev
threads unpatched patched delta
1 2.60Mpps 3.21Mpps +23%
2 3.84Mpps 5.56Mpps +44%
4 5.52Mpps 5.58Mpps +1%
8 2.77Mpps 2.76Mpps -0.3%
16 2.24Mpps 2.23Mpps -0.4%
Performance for IP forward testing: 1.05Mpps increases to
1.16Mpps, about 10% improvement.
V3: Add 'Acked-by' from Jakub and 'Tested-by' from Vladimir,
and resend based on the latest net-next.
V2: Adjust the comment and commit log according to the discussion.
V1: Drop the RFC tag, add nolock_qdisc_is_empty() and do the qdisc
empty checking without the protection of qdisc->seqlock to
avoid doing an unnecessary spin_trylock() in the contention case.
RFC v4: Use STATE_MISSED and STATE_DRAINING to indicate non-empty
qdisc, and add patch 1 and 3.
Yunsheng Lin (3):
net: sched: avoid unnecessary seqcount operation for lockless qdisc
net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc
net: sched: remove qdisc->empty for lockless qdisc
include/net/sch_generic.h | 31 ++++++++++++++++++-------------
net/core/dev.c | 27 +++++++++++++++++++++++++--
net/sched/sch_generic.c | 23 ++++++++++++++++-------
3 files changed, 59 insertions(+), 22 deletions(-)
The spin_trylock() was assumed to contain the implicit
barrier needed to ensure the correct ordering between
STATE_MISSED setting/clearing and STATE_MISSED checking
in commit a90c57f2cedd ("net: sched: fix packet stuck
problem for lockless qdisc").
But it turns out that spin_trylock() only has load-acquire
semantics. On a strongly-ordered system (like x86), the compiler
barrier implicitly contained in spin_trylock() seems enough
to ensure the correct ordering. But on a weakly-ordered system
(like arm64), store-release semantics are needed to ensure
the correct ordering, as clear_bit() and test_bit() are store
operations; see queued_spin_lock().
So add the explicit barrier to ensure the correct ordering
for the above case.
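A minimal C11 analogue of the patched pattern (a sketch, not the kernel code: atomic_flag stands in for qdisc->seqlock, a seq_cst fence for smp_mb__after_atomic()):

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool missed;                     /* stands in for STATE_MISSED */
static atomic_flag seqlock = ATOMIC_FLAG_INIT; /* stands in for qdisc->seqlock */

static bool run_begin(void)
{
	/* First "trylock": test_and_set returns false if we got the lock. */
	if (!atomic_flag_test_and_set_explicit(&seqlock, memory_order_acquire))
		return true;

	/* Another thread already flagged MISSED: let it do the work. */
	if (atomic_load_explicit(&missed, memory_order_relaxed))
		return false;

	atomic_store_explicit(&missed, true, memory_order_relaxed);

	/* Like smp_mb__after_atomic(): order the MISSED store before the
	 * second trylock's load; trylock alone is only load-acquire, so
	 * without this fence a weakly-ordered CPU could reorder them. */
	atomic_thread_fence(memory_order_seq_cst);

	/* Second trylock: retry in case the holder just released. */
	return !atomic_flag_test_and_set_explicit(&seqlock, memory_order_acquire);
}

static void run_end(void)
{
	atomic_flag_clear_explicit(&seqlock, memory_order_release);
}
```

Without the explicit fence, the relaxed MISSED store and the second lock attempt may be reordered on arm64, which is exactly the window the Fixes commit left open.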
Fixes: a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
V2: add the missing Fixes tag.
The above ordering issue can easily cause an out-of-order packet
problem when testing the lockless qdisc bypass patchset with
two iperf threads and one netdev queue on an arm64 system.
include/net/sch_generic.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 1e62551..5771030 100644
@@ -163,6 +163,12 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
+ /* Paired with smp_mb__after_atomic() to make sure
+ * STATE_MISSED checking is synchronized with clearing
+ * in pfifo_fast_dequeue().
+ */
+ smp_mb__before_atomic();
/* If the MISSED flag is set, it means other thread has
* set the MISSED flag before second spin_trylock(), so
* we can return false here to avoid multi cpus doing
@@ -180,6 +186,12 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
+ /* spin_trylock() only has load-acquire semantic, so use
+ * smp_mb__after_atomic() to ensure STATE_MISSED is set
+ * before doing the second spin_trylock().
+ */
+ smp_mb__after_atomic();
/* Retry again in case other CPU may not see the new flag
* after it releases the lock at the end of qdisc_run_end().