Kernel

[PATCH kernel-4.19 1/6] nbd: handle device refs for DESTROY_ON_DISCONNECT properly
by Yang Yingliang 14 Jul '21
From: Josef Bacik <josef(a)toxicpanda.com>
mainline inclusion
from mainline-5.12-rc1
commit c9a2f90f4d6b
category: bugfix
bugzilla: 50455
CVE: NA
-------------------------------------------------
There exists a race where we can be attempting to create a new nbd
configuration while a previous configuration is going down, both
configured with DESTROY_ON_DISCONNECT. Normally devices all have a
reference of 1, as they won't be cleaned up until the module is torn
down. However with DESTROY_ON_DISCONNECT we'll make sure that there is
only 1 reference (generally) on the device for the config itself, and
then once the config is dropped, the device is torn down.
The race that exists looks like this
TASK1                                   TASK2
nbd_genl_connect()
  idr_find()
  refcount_inc_not_zero(nbd)
  * count is 2 here ^^
                                        nbd_config_put()
                                         nbd_put(nbd) (count is 1)
  setup new config
  check DESTROY_ON_DISCONNECT
  put_dev = true
  if (put_dev) nbd_put(nbd)
  * free'd here ^^
In nbd_genl_connect() we assume that the nbd ref count will be 2,
however clearly that won't be true if the nbd device had been setup as
DESTROY_ON_DISCONNECT with its prior configuration. Fix this by getting
rid of the runtime flag to check if we need to mess with the nbd device
refcount, and use the device NBD_DESTROY_ON_DISCONNECT flag to check if
we need to adjust the ref counts. This was reported by syzkaller with
the following kasan dump
BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:71 [inline]
BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
BUG: KASAN: use-after-free in refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76
Read of size 4 at addr ffff888143bf71a0 by task systemd-udevd/8451
CPU: 0 PID: 8451 Comm: systemd-udevd Not tainted 5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:230
__kasan_report mm/kasan/report.c:396 [inline]
kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413
check_memory_region_inline mm/kasan/generic.c:179 [inline]
check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
instrument_atomic_read include/linux/instrumented.h:71 [inline]
atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76
refcount_dec_and_mutex_lock+0x19/0x140 lib/refcount.c:115
nbd_put drivers/block/nbd.c:248 [inline]
nbd_release+0x116/0x190 drivers/block/nbd.c:1508
__blkdev_put+0x548/0x800 fs/block_dev.c:1579
blkdev_put+0x92/0x570 fs/block_dev.c:1632
blkdev_close+0x8c/0xb0 fs/block_dev.c:1640
__fput+0x283/0x920 fs/file_table.c:280
task_work_run+0xdd/0x190 kernel/task_work.c:140
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fc1e92b5270
Code: 73 01 c3 48 8b 0d 38 7d 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c1 20 00 00 75 10 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24
RSP: 002b:00007ffe8beb2d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00007fc1e92b5270
RDX: 000000000aba9500 RSI: 0000000000000000 RDI: 0000000000000007
RBP: 00007fc1ea16f710 R08: 000000000000004a R09: 0000000000000008
R10: 0000562f8cb0b2a8 R11: 0000000000000246 R12: 0000000000000000
R13: 0000562f8cb0afd0 R14: 0000000000000003 R15: 000000000000000e
Allocated by task 1:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:401 [inline]
____kasan_kmalloc.constprop.0+0x82/0xa0 mm/kasan/common.c:429
kmalloc include/linux/slab.h:552 [inline]
kzalloc include/linux/slab.h:682 [inline]
nbd_dev_add+0x44/0x8e0 drivers/block/nbd.c:1673
nbd_init+0x250/0x271 drivers/block/nbd.c:2394
do_one_initcall+0x103/0x650 init/main.c:1223
do_initcall_level init/main.c:1296 [inline]
do_initcalls init/main.c:1312 [inline]
do_basic_setup init/main.c:1332 [inline]
kernel_init_freeable+0x605/0x689 init/main.c:1533
kernel_init+0xd/0x1b8 init/main.c:1421
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
Freed by task 8451:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:356
____kasan_slab_free+0xe1/0x110 mm/kasan/common.c:362
kasan_slab_free include/linux/kasan.h:192 [inline]
slab_free_hook mm/slub.c:1547 [inline]
slab_free_freelist_hook+0x5d/0x150 mm/slub.c:1580
slab_free mm/slub.c:3143 [inline]
kfree+0xdb/0x3b0 mm/slub.c:4139
nbd_dev_remove drivers/block/nbd.c:243 [inline]
nbd_put.part.0+0x180/0x1d0 drivers/block/nbd.c:251
nbd_put drivers/block/nbd.c:295 [inline]
nbd_config_put+0x6dd/0x8c0 drivers/block/nbd.c:1242
nbd_release+0x103/0x190 drivers/block/nbd.c:1507
__blkdev_put+0x548/0x800 fs/block_dev.c:1579
blkdev_put+0x92/0x570 fs/block_dev.c:1632
blkdev_close+0x8c/0xb0 fs/block_dev.c:1640
__fput+0x283/0x920 fs/file_table.c:280
task_work_run+0xdd/0x190 kernel/task_work.c:140
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff888143bf7000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 416 bytes inside of
1024-byte region [ffff888143bf7000, ffff888143bf7400)
The buggy address belongs to the page:
page:000000005238f4ce refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x143bf0
head:000000005238f4ce order:3 compound_mapcount:0 compound_pincount:0
flags: 0x57ff00000010200(slab|head)
raw: 057ff00000010200 ffffea00004b1400 0000000300000003 ffff888010c41140
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888143bf7080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888143bf7100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888143bf7180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888143bf7200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Reported-and-tested-by: syzbot+429d3f82d757c211bff3(a)syzkaller.appspotmail.com
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Luo Meng <luomeng12(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/block/nbd.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 55552f719b25b..01bede097ed25 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -76,8 +76,7 @@ struct link_dead_args {
#define NBD_RT_HAS_PID_FILE 3
#define NBD_RT_HAS_CONFIG_REF 4
#define NBD_RT_BOUND 5
-#define NBD_RT_DESTROY_ON_DISCONNECT 6
-#define NBD_RT_DISCONNECT_ON_CLOSE 7
+#define NBD_RT_DISCONNECT_ON_CLOSE 6
#define NBD_DESTROY_ON_DISCONNECT 0
#define NBD_DISCONNECT_REQUESTED 1
@@ -1894,12 +1893,21 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
if (info->attrs[NBD_ATTR_CLIENT_FLAGS]) {
u64 flags = nla_get_u64(info->attrs[NBD_ATTR_CLIENT_FLAGS]);
if (flags & NBD_CFLAG_DESTROY_ON_DISCONNECT) {
- set_bit(NBD_RT_DESTROY_ON_DISCONNECT,
- &config->runtime_flags);
- set_bit(NBD_DESTROY_ON_DISCONNECT, &nbd->flags);
- put_dev = true;
+ /*
+ * We have 1 ref to keep the device around, and then 1
+ * ref for our current operation here, which will be
+ * inherited by the config. If we already have
+ * DESTROY_ON_DISCONNECT set then we know we don't have
+ * that extra ref already held so we don't need the
+ * put_dev.
+ */
+ if (!test_and_set_bit(NBD_DESTROY_ON_DISCONNECT,
+ &nbd->flags))
+ put_dev = true;
} else {
- clear_bit(NBD_DESTROY_ON_DISCONNECT, &nbd->flags);
+ if (test_and_clear_bit(NBD_DESTROY_ON_DISCONNECT,
+ &nbd->flags))
+ refcount_inc(&nbd->refs);
}
if (flags & NBD_CFLAG_DISCONNECT_ON_CLOSE) {
set_bit(NBD_RT_DISCONNECT_ON_CLOSE,
@@ -2067,15 +2075,13 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
if (info->attrs[NBD_ATTR_CLIENT_FLAGS]) {
u64 flags = nla_get_u64(info->attrs[NBD_ATTR_CLIENT_FLAGS]);
if (flags & NBD_CFLAG_DESTROY_ON_DISCONNECT) {
- if (!test_and_set_bit(NBD_RT_DESTROY_ON_DISCONNECT,
- &config->runtime_flags))
+ if (!test_and_set_bit(NBD_DESTROY_ON_DISCONNECT,
+ &nbd->flags))
put_dev = true;
- set_bit(NBD_DESTROY_ON_DISCONNECT, &nbd->flags);
} else {
- if (test_and_clear_bit(NBD_RT_DESTROY_ON_DISCONNECT,
- &config->runtime_flags))
+ if (test_and_clear_bit(NBD_DESTROY_ON_DISCONNECT,
+ &nbd->flags))
refcount_inc(&nbd->refs);
- clear_bit(NBD_DESTROY_ON_DISCONNECT, &nbd->flags);
}
if (flags & NBD_CFLAG_DISCONNECT_ON_CLOSE) {
--
2.25.1
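
The heart of the fix above is that the extra device reference tied to
DESTROY_ON_DISCONNECT must be dropped (on set) or re-taken (on clear)
exactly once, which is what keying the refcount change to the
test_and_set_bit()/test_and_clear_bit() transition guarantees. A minimal
user-space sketch of that idempotent flag/refcount pattern, with
hypothetical names (my_dev, MY_DESTROY_ON_DISCONNECT) standing in for the
nbd structures:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define MY_DESTROY_ON_DISCONNECT (1u << 0)

    struct my_dev {
            atomic_uint flags;
            atomic_int  refs;       /* e.g. one long-lived ref plus one config ref */
    };

    /* True only on the 0 -> 1 transition of the flag, mirroring
     * test_and_set_bit(): the caller drops the extra ref at most once. */
    static bool set_destroy_on_disconnect(struct my_dev *d)
    {
            unsigned int old = atomic_fetch_or(&d->flags, MY_DESTROY_ON_DISCONNECT);
            return !(old & MY_DESTROY_ON_DISCONNECT);
    }

    /* True only if the flag was actually set, mirroring
     * test_and_clear_bit(): the ref is restored at most once. */
    static bool clear_destroy_on_disconnect(struct my_dev *d)
    {
            unsigned int old = atomic_fetch_and(&d->flags, ~MY_DESTROY_ON_DISCONNECT);
            return old & MY_DESTROY_ON_DISCONNECT;
    }

    int main(void)
    {
            struct my_dev d = { .flags = 0, .refs = 2 };

            if (set_destroy_on_disconnect(&d))
                    atomic_fetch_sub(&d.refs, 1);   /* first connect: drop once */
            if (set_destroy_on_disconnect(&d))
                    atomic_fetch_sub(&d.refs, 1);   /* reconnect: no double drop */
            if (clear_destroy_on_disconnect(&d))
                    atomic_fetch_add(&d.refs, 1);   /* clearing restores it once */

            printf("refs = %d\n", atomic_load(&d.refs));    /* prints refs = 2 */
            return 0;
    }

Because the put/inc is tied to the flag transition on nbd->flags rather
than to a per-config runtime flag, a connect racing with a dying
configuration can no longer drop the device reference twice.
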
Alex Shi (1):
mm: add VM_WARN_ON_ONCE_PAGE() macro
Alper Gun (1):
KVM: SVM: Call SEV Guest Decommission if ASID binding fails
Anson Huang (1):
ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment
Christian König (1):
drm/nouveau: fix dma_address check for CPU/GPU sync
Greg Kroah-Hartman (1):
Linux 4.19.197
Hugh Dickins (16):
mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
mm/thp: make is_huge_zero_pmd() safe and quicker
mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
mm/thp: fix vma_address() if virtual address below file offset
mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
mm: page_vma_mapped_walk(): use page for pvmw->page
mm: page_vma_mapped_walk(): settle PageHuge on entry
mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
mm: page_vma_mapped_walk(): crossing page table boundary
mm: page_vma_mapped_walk(): add a level of indentation
mm: page_vma_mapped_walk(): use goto instead of while (1)
mm: page_vma_mapped_walk(): get vma_address_end() earlier
mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
mm, futex: fix shared futex pgoff on shmem huge page
Jue Wang (1):
mm/thp: fix page_address_in_vma() on file THP tails
Juergen Gross (1):
xen/events: reset active flag for lateeoi events later
ManYi Li (1):
scsi: sr: Return appropriate error code when disk is ejected
Miaohe Lin (2):
mm/rmap: remove unneeded semicolon in page_not_mapped()
mm/rmap: use page_not_mapped in try_to_unmap()
Petr Mladek (2):
kthread_worker: split code for canceling the delayed work timer
kthread: prevent deadlock when kthread_mod_delayed_work() races with
kthread_cancel_delayed_work_sync()
Tony Lindgren (3):
clocksource/drivers/timer-ti-dm: Add clockevent and clocksource
support
clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap
issue
clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940
Yang Shi (1):
mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
afzal mohammed (1):
ARM: OMAP: replace setup_irq() by request_irq()
Makefile | 2 +-
arch/arm/boot/dts/dra7.dtsi | 11 ++
arch/arm/boot/dts/imx6qdl-sabresd.dtsi | 4 -
arch/arm/mach-omap1/pm.c | 13 +-
arch/arm/mach-omap1/time.c | 10 +-
arch/arm/mach-omap1/timer32k.c | 10 +-
arch/arm/mach-omap2/board-generic.c | 4 +-
arch/arm/mach-omap2/timer.c | 181 +++++++++++++++++--------
arch/x86/kvm/svm.c | 32 +++--
drivers/clk/ti/clk-7xx.c | 1 +
drivers/gpu/drm/nouveau/nouveau_bo.c | 4 +-
drivers/scsi/sr.c | 2 +
drivers/xen/events/events_base.c | 23 +++-
include/linux/cpuhotplug.h | 1 +
include/linux/huge_mm.h | 8 +-
include/linux/hugetlb.h | 16 ---
include/linux/mm.h | 3 +
include/linux/mmdebug.h | 13 ++
include/linux/pagemap.h | 13 +-
include/linux/rmap.h | 3 +-
kernel/futex.c | 2 +-
kernel/kthread.c | 77 +++++++----
mm/huge_memory.c | 56 ++++----
mm/hugetlb.c | 5 +-
mm/internal.h | 53 ++++++--
mm/memory.c | 41 ++++++
mm/page_vma_mapped.c | 160 +++++++++++++---------
mm/pgtable-generic.c | 4 +-
mm/rmap.c | 48 ++++---
mm/truncate.c | 43 +++---
30 files changed, 541 insertions(+), 302 deletions(-)
--
2.25.1

[PATCH openEuler-1.0-LTS] mm/memcontrol.c: fix kasan slab-out-of-bounds in mem_cgroup_css_alloc
by Yang Yingliang 12 Jul '21
From: Lu Jialin <lujialin4(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 51815, https://gitee.com/openeuler/kernel/issues/I3IJ9I
CVE: NA
--------
static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
{
        ...
        pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
        if (!pn)
                return 1;

        pnext = to_mgpn_ext(pn);
        pnext->lruvec_stat_local = alloc_percpu(struct lruvec_stat);
}
The extension struct that pnext points to is larger than the allocated *pn,
so writing pnext->lruvec_stat_local lands out of bounds of the allocation.
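
A minimal sketch of the allocation pattern the fix moves to, with
hypothetical stand-in types (pn_base/pn_extension play the roles of
mem_cgroup_per_node and mem_cgroup_per_node_extension): size the
allocation for the outer extension and hand back a pointer to the
embedded base, so later upcasts to the extension stay in bounds.

    #include <stdlib.h>

    struct pn_base      { long stat; };
    struct pn_extension { struct pn_base pn; long *lruvec_stat_local; };

    static struct pn_base *alloc_pn(void)
    {
            /* Buggy shape: allocate sizeof(struct pn_base) and then write
             * through an extension pointer; that write lands past the
             * object.  Fixed shape: allocate the extension itself. */
            struct pn_extension *pn_ext = calloc(1, sizeof(*pn_ext));

            if (!pn_ext)
                    return NULL;

            pn_ext->lruvec_stat_local = calloc(1, sizeof(long)); /* in bounds now */
            if (!pn_ext->lruvec_stat_local) {
                    free(pn_ext);
                    return NULL;
            }
            return &pn_ext->pn;
    }

    int main(void)
    {
            return alloc_pn() ? 0 : 1;
    }

The matching free path then has to free the extension object (kfree of
pn_ext in the patch) rather than the embedded base.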
Signed-off-by: Lu Jialin <lujialin4(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
include/linux/memcontrol.h | 12 +++++-----
mm/memcontrol.c | 48 +++++++++++++++++++-------------------
2 files changed, 30 insertions(+), 30 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c7c7c0a418771..b4a25979ee8c7 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -609,11 +609,11 @@ static inline unsigned long memcg_page_state_local(struct mem_cgroup *memcg,
{
long x = 0;
int cpu;
- struct mem_cgroup_extension *mgext;
+ struct mem_cgroup_extension *memcg_ext;
- mgext = to_memcg_ext(memcg);
+ memcg_ext = to_memcg_ext(memcg);
for_each_possible_cpu(cpu)
- x += per_cpu(mgext->vmstats_local->count[idx], cpu);
+ x += per_cpu(memcg_ext->vmstats_local->count[idx], cpu);
#ifdef CONFIG_SMP
if (x < 0)
x = 0;
@@ -687,7 +687,7 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
enum node_stat_item idx)
{
struct mem_cgroup_per_node *pn;
- struct mem_cgroup_per_node_extension *pnext;
+ struct mem_cgroup_per_node_extension *pn_ext;
long x = 0;
int cpu;
@@ -695,9 +695,9 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
return node_page_state(lruvec_pgdat(lruvec), idx);
pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
- pnext = to_mgpn_ext(pn);
+ pn_ext = to_mgpn_ext(pn);
for_each_possible_cpu(cpu)
- x += per_cpu(pnext->lruvec_stat_local->count[idx], cpu);
+ x += per_cpu(pn_ext->lruvec_stat_local->count[idx], cpu);
#ifdef CONFIG_SMP
if (x < 0)
x = 0;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index da10300a6e7d7..713a839013f72 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -698,14 +698,14 @@ void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
x = val + __this_cpu_read(memcg->stat_cpu->count[idx]);
if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
struct mem_cgroup *mi;
- struct mem_cgroup_extension *mgext;
+ struct mem_cgroup_extension *memcg_ext;
/*
* Batch local counters to keep them in sync with
* the hierarchical ones.
*/
- mgext = to_memcg_ext(memcg);
- __this_cpu_add(mgext->vmstats_local->count[idx], x);
+ memcg_ext = to_memcg_ext(memcg);
+ __this_cpu_add(memcg_ext->vmstats_local->count[idx], x);
for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
atomic_long_add(x, &mi->stat[idx]);
x = 0;
@@ -739,7 +739,7 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
{
pg_data_t *pgdat = lruvec_pgdat(lruvec);
struct mem_cgroup_per_node *pn;
- struct mem_cgroup_per_node_extension *pnext;
+ struct mem_cgroup_per_node_extension *pn_ext;
struct mem_cgroup *memcg;
long x;
@@ -756,8 +756,8 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
__mod_memcg_state(memcg, idx, val);
/* Update lruvec */
- pnext = to_mgpn_ext(pn);
- __this_cpu_add(pnext->lruvec_stat_local->count[idx], val);
+ pn_ext = to_mgpn_ext(pn);
+ __this_cpu_add(pn_ext->lruvec_stat_local->count[idx], val);
x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
@@ -787,14 +787,14 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
x = count + __this_cpu_read(memcg->stat_cpu->events[idx]);
if (unlikely(x > MEMCG_CHARGE_BATCH)) {
struct mem_cgroup *mi;
- struct mem_cgroup_extension *mgext;
+ struct mem_cgroup_extension *memcg_ext;
/*
* Batch local counters to keep them in sync with
* the hierarchical ones.
*/
- mgext = to_memcg_ext(memcg);
- __this_cpu_add(mgext->vmstats_local->events[idx], x);
+ memcg_ext = to_memcg_ext(memcg);
+ __this_cpu_add(memcg_ext->vmstats_local->events[idx], x);
for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
atomic_long_add(x, &mi->events[idx]);
x = 0;
@@ -811,11 +811,11 @@ static unsigned long memcg_events_local(struct mem_cgroup *memcg, int event)
{
long x = 0;
int cpu;
- struct mem_cgroup_extension *mgext;
+ struct mem_cgroup_extension *memcg_ext;
- mgext = to_memcg_ext(memcg);
+ memcg_ext = to_memcg_ext(memcg);
for_each_possible_cpu(cpu)
- x += per_cpu(mgext->vmstats_local->events[event], cpu);
+ x += per_cpu(memcg_ext->vmstats_local->events[event], cpu);
return x;
}
@@ -4837,7 +4837,7 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
{
struct mem_cgroup_per_node *pn;
- struct mem_cgroup_per_node_extension *pnext;
+ struct mem_cgroup_per_node_extension *pn_ext;
int tmp = node;
/*
* This routine is called against possible nodes.
@@ -4849,21 +4849,21 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
*/
if (!node_state(node, N_NORMAL_MEMORY))
tmp = -1;
- pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
+ pn_ext = kzalloc_node(sizeof(*pn_ext), GFP_KERNEL, tmp);
+ pn = &pn_ext->pn;
if (!pn)
return 1;
- pnext = to_mgpn_ext(pn);
- pnext->lruvec_stat_local = alloc_percpu(struct lruvec_stat);
- if (!pnext->lruvec_stat_local) {
- kfree(pnext);
+ pn_ext->lruvec_stat_local = alloc_percpu(struct lruvec_stat);
+ if (!pn_ext->lruvec_stat_local) {
+ kfree(pn_ext);
return 1;
}
pn->lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
if (!pn->lruvec_stat_cpu) {
- free_percpu(pnext->lruvec_stat_local);
- kfree(pn);
+ free_percpu(pn_ext->lruvec_stat_local);
+ kfree(pn_ext);
return 1;
}
@@ -4879,15 +4879,15 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
static void free_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
{
struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
- struct mem_cgroup_per_node_extension *pnext;
+ struct mem_cgroup_per_node_extension *pn_ext;
if (!pn)
return;
+ pn_ext = to_mgpn_ext(pn);
free_percpu(pn->lruvec_stat_cpu);
- pnext = to_mgpn_ext(pn);
- free_percpu(pnext->lruvec_stat_local);
- kfree(pn);
+ free_percpu(pn_ext->lruvec_stat_local);
+ kfree(pn_ext);
}
static void __mem_cgroup_free(struct mem_cgroup *memcg)
--
2.25.1

[PATCH kernel-4.19] btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation
by Yang Yingliang 12 Jul '21
From: Qu Wenruo <wqu(a)suse.com>
mainline inclusion
from mainline-v5.13-rc5
commit 6d4572a9d71d5fc2affee0258d8582d39859188c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I39MZM
CVE: NA
------------------------------------------------------
[BUG]
When the data space is exhausted, even if the inode has NOCOW attribute,
we will still refuse to truncate unaligned range due to ENOSPC.
The following script can reproduce it pretty easily:
#!/bin/bash
dev=/dev/test/test
mnt=/mnt/btrfs
umount $dev &> /dev/null
umount $mnt &> /dev/null
mkfs.btrfs -f $dev -b 1G
mount -o nospace_cache $dev $mnt
touch $mnt/foobar
chattr +C $mnt/foobar
xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
sync
xfs_io -c "fpunch 0 2k" $mnt/foobar
umount $mnt
Currently this will fail at the fpunch part.
[CAUSE]
Because btrfs_truncate_block() always reserves space without checking
the NOCOW attribute.
Since the writeback path follows NOCOW bit, we only need to bother the
space reservation code in btrfs_truncate_block().
[FIX]
Make btrfs_truncate_block() follow btrfs_buffered_write() to try to
reserve data space first, and fall back to NOCOW check only when we
don't have enough space.
Such always-try-reserve is an optimization introduced in
btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call.
This patch will export check_can_nocow() as btrfs_check_can_nocow(), and
use it in btrfs_truncate_block() to fix the problem.
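A small self-contained model of the fallback order described above, with
hypothetical helpers standing in for the btrfs ones (reserve_data() for
btrfs_check_data_free_space(), can_nocow() for btrfs_check_can_nocow() > 0):
try the cheap data-space reservation first, fall back to the NOCOW check
only when it fails, and remember the choice so the error path releases the
right reservation.

    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct fake_fs {
            long free_data;         /* bytes of unreserved data space */
            bool inode_is_nocow;    /* inode has NODATACOW/PREALLOC set */
    };

    static int reserve_data(struct fake_fs *fs, long bytes)
    {
            if (fs->free_data < bytes)
                    return -ENOSPC;
            fs->free_data -= bytes;
            return 0;
    }

    static bool can_nocow(const struct fake_fs *fs)
    {
            return fs->inode_is_nocow;
    }

    static int truncate_block_reserve(struct fake_fs *fs, long blocksize,
                                      bool *only_release_metadata)
    {
            int ret = reserve_data(fs, blocksize);

            *only_release_metadata = false;
            if (ret < 0) {
                    if (can_nocow(fs))
                            *only_release_metadata = true;  /* no data space needed */
                    else
                            return ret;                     /* genuine ENOSPC */
            }
            /* metadata reservation would follow here, as in the patch */
            return 0;
    }

    int main(void)
    {
            struct fake_fs fs = { .free_data = 0, .inode_is_nocow = true };
            bool meta_only;
            int ret = truncate_block_reserve(&fs, 4096, &meta_only);

            printf("ret=%d only_release_metadata=%d\n", ret, meta_only);
            return 0;
    }
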
Reference: https://patchwork.kernel.org/project/linux-btrfs/patch/20200130052822.11765…
Reported-by: Martin Doucha <martin.doucha(a)suse.com>
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: Anand Jain <anand.jain(a)oracle.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Conflicts:
fs/btrfs/file.c
fs/btrfs/inode.c
Signed-off-by: Gou Hao <gouhao(a)uniontech.com>
Signed-off-by: Cheng Jian <cj.chengjian(a)huawei.com>
Reviewed-by: Jiao Fenfang <jiaofenfang(a)uniontech.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/btrfs/ctree.h | 3 ++-
fs/btrfs/file.c | 8 ++++----
fs/btrfs/inode.c | 39 +++++++++++++++++++++++++++++++++------
3 files changed, 39 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4d1c12faada89..4f5c58d40a79f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3271,7 +3271,8 @@ int btrfs_dirty_pages(struct inode *inode, struct page **pages,
int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
int btrfs_clone_file_range(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out, u64 len);
-
+int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+ size_t *write_bytes);
/* tree-defrag.c */
int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
struct btrfs_root *root);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 41ad37f8062a9..3cd05edca30ce 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1536,8 +1536,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
return ret;
}
-static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
- size_t *write_bytes)
+int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+ size_t *write_bytes)
{
struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct btrfs_root *root = inode->root;
@@ -1647,7 +1647,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
if (ret < 0) {
if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
BTRFS_INODE_PREALLOC)) &&
- check_can_nocow(BTRFS_I(inode), pos,
+ btrfs_check_can_nocow(BTRFS_I(inode), pos,
&write_bytes) > 0) {
/*
* For nodata cow case, no need to reserve
@@ -1925,7 +1925,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
*/
if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
BTRFS_INODE_PREALLOC)) ||
- check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
+ btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
inode_unlock(inode);
return -EAGAIN;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index bf0e0e3e09c5d..9e3e003e4488e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4958,11 +4958,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
struct extent_state *cached_state = NULL;
struct extent_changeset *data_reserved = NULL;
char *kaddr;
+ bool only_release_metadata = false;
u32 blocksize = fs_info->sectorsize;
pgoff_t index = from >> PAGE_SHIFT;
unsigned offset = from & (blocksize - 1);
struct page *page;
gfp_t mask = btrfs_alloc_write_mask(mapping);
+ size_t write_bytes = blocksize;
int ret = 0;
u64 block_start;
u64 block_end;
@@ -4974,10 +4976,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
block_start = round_down(from, blocksize);
block_end = block_start + blocksize - 1;
- ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
- block_start, blocksize);
- if (ret)
+ ret = btrfs_check_data_free_space(inode, &data_reserved,
+ block_start, blocksize);
+ if (ret < 0) {
+ if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+ BTRFS_INODE_PREALLOC)) &&
+ btrfs_check_can_nocow(BTRFS_I(inode), block_start,
+ &write_bytes) > 0) {
+ /* For nocow case, no need to reserve data space */
+ only_release_metadata = true;
+ } else {
+ goto out;
+ }
+ }
+ ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
+ if (ret < 0) {
+ if (!only_release_metadata)
+ btrfs_free_reserved_data_space(inode, data_reserved,
+ block_start, blocksize);
goto out;
+ }
again:
page = find_or_create_page(mapping, index, mask);
@@ -5048,10 +5066,19 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
set_page_dirty(page);
unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
+ if (only_release_metadata)
+ set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
+ block_end, EXTENT_NORESERVE, NULL, NULL,
+ GFP_NOFS);
out_unlock:
- if (ret)
- btrfs_delalloc_release_space(inode, data_reserved, block_start,
- blocksize, true);
+ if (ret) {
+ if (only_release_metadata)
+ btrfs_delalloc_release_metadata(BTRFS_I(inode),
+ blocksize, true);
+ else
+ btrfs_delalloc_release_space(inode, data_reserved,
+ block_start, blocksize, true);
+ }
btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
unlock_page(page);
put_page(page);
--
2.25.1

12 Jul '21
From: Mimi Zohar <zohar(a)linux.ibm.com>
stable inclusion
from linux-4.19.196
commit ff660863628fb144badcb3395cde7821c82c13a6
CVE: CVE-2021-35039
--------------------------------
[ Upstream commit 0c18f29aae7ce3dadd26d8ee3505d07cc982df75 ]
Irrespective as to whether CONFIG_MODULE_SIG is configured, specifying
"module.sig_enforce=1" on the boot command line sets "sig_enforce".
Only allow "sig_enforce" to be set when CONFIG_MODULE_SIG is configured.
This patch makes the presence of /sys/module/module/parameters/sig_enforce
dependent on CONFIG_MODULE_SIG=y.
Fixes: fda784e50aac ("module: export module signature enforcement status")
Reported-by: Nayna Jain <nayna(a)linux.ibm.com>
Tested-by: Mimi Zohar <zohar(a)linux.ibm.com>
Tested-by: Jessica Yu <jeyu(a)kernel.org>
Signed-off-by: Mimi Zohar <zohar(a)linux.ibm.com>
Signed-off-by: Jessica Yu <jeyu(a)kernel.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
kernel/module.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/module.c b/kernel/module.c
index ad4c1d7b7a956..82c83cbf6ce6c 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -268,9 +268,18 @@ static void module_assert_mutex_or_preempt(void)
#endif
}
+#ifdef CONFIG_MODULE_SIG
static bool sig_enforce = IS_ENABLED(CONFIG_MODULE_SIG_FORCE);
module_param(sig_enforce, bool_enable_only, 0644);
+void set_module_sig_enforced(void)
+{
+ sig_enforce = true;
+}
+#else
+#define sig_enforce false
+#endif
+
/*
* Export sig_enforce kernel cmdline parameter to allow other subsystems rely
* on that instead of directly to CONFIG_MODULE_SIG_FORCE config.
--
2.25.1

[PATCH kernel-4.19 1/4] xfs: let writable tracepoint enable to clear flag of f_mode
by Yang Yingliang 12 Jul '21
From: Yufen Yu <yuyufen(a)huawei.com>
hulk inclusion
category: feature
bugzilla: 173267
CVE: NA
---------------------------
Add a new member, clear_f_mode, to struct xfs_writable_file so that
tracepoints can also clear flags in file->f_mode.
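
For illustration, a stand-alone sketch of the masked set/clear handshake
used in the read path (FMODE_MASK whitelists which bits a tracepoint may
touch; the F_* values below only echo the ones in the patch and are
otherwise hypothetical):

    #include <stdio.h>

    #define F_RANDOM        0x1000u
    #define F_WILLNEED      0x400000u
    #define F_SPC_READAHEAD 0x800000u
    #define F_MASK          (F_RANDOM | F_WILLNEED | F_SPC_READAHEAD)

    int main(void)
    {
            unsigned int f_mode = F_RANDOM;       /* current file->f_mode bits */
            unsigned int set_bits = F_WILLNEED;   /* tracepoint wants this set */
            unsigned int clear_bits = F_RANDOM;   /* tracepoint wants this cleared */

            /* Only whitelisted bits may be changed, mirroring
             *   filp->f_mode |= file.f_mode & FMODE_MASK;
             *   filp->f_mode &= ~(file.clear_f_mode & FMODE_MASK); */
            f_mode |= set_bits & F_MASK;
            f_mode &= ~(clear_bits & F_MASK);

            printf("f_mode = 0x%x\n", f_mode);    /* 0x400000: WILLNEED set, RANDOM cleared */
            return 0;
    }
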
Signed-off-by: Yufen Yu <yuyufen(a)huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/xfs/xfs_file.c | 10 +++++++---
include/linux/fs.h | 9 ++++++---
include/uapi/linux/xfs.h | 4 +++-
tools/include/uapi/linux/xfs.h | 2 +-
4 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index ffc388c8b4523..bd8ae4df20042 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -35,6 +35,8 @@
#include <linux/mman.h>
#include <linux/fadvise.h>
+#define FMODE_MASK (FMODE_RANDOM | FMODE_WILLNEED | FMODE_SPC_READAHEAD)
+
static const struct vm_operations_struct xfs_file_vm_ops;
int
@@ -238,15 +240,17 @@ xfs_file_buffered_aio_read(
struct xfs_writable_file file;
file.name = file_dentry(filp)->d_name.name;
+ file.clear_f_mode = 0;
file.f_mode = 0;
file.i_size = file_inode(filp)->i_size;
- file.prev_pos = filp->f_ra.prev_pos;
+ file.prev_pos = filp->f_ra.prev_pos >> PAGE_SHIFT;
+ file.pos = iocb->ki_pos >> PAGE_SHIFT;
trace_xfs_file_buffered_read(ip, iov_iter_count(to), iocb->ki_pos);
trace_xfs_file_read(&file, ip, iov_iter_count(to), iocb->ki_pos);
- if (file.f_mode)
- filp->f_mode |= file.f_mode;
+ filp->f_mode |= file.f_mode & FMODE_MASK;
+ filp->f_mode &= ~(file.clear_f_mode & FMODE_MASK);
if (iocb->ki_flags & IOCB_NOWAIT) {
if (!xfs_ilock_nowait(ip, XFS_IOLOCK_SHARED))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f5bc43ac95035..394da46d143c2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -160,6 +160,12 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
/* File is stream-like */
#define FMODE_STREAM ((__force fmode_t)0x200000)
+/* File will try to read head of the file into pagecache */
+#define FMODE_WILLNEED ((__force fmode_t)0x400000)
+
+/* File will do special readahead */
+#define FMODE_SPC_READAHEAD ((__force fmode_t)0x800000)
+
/* File was opened by fanotify and shouldn't generate fanotify events */
#define FMODE_NONOTIFY ((__force fmode_t)0x4000000)
@@ -169,9 +175,6 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
/* File does not contribute to nr_files count */
#define FMODE_NOACCOUNT ((__force fmode_t)0x20000000)
-/* File will try to read head of the file into pagecache */
-#define FMODE_WILLNEED ((__force fmode_t)0x40000000)
-
/*
* Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
* that indicates that they should check the contents of the iovec are
diff --git a/include/uapi/linux/xfs.h b/include/uapi/linux/xfs.h
index 635a83914273b..0a11c2344e5a3 100644
--- a/include/uapi/linux/xfs.h
+++ b/include/uapi/linux/xfs.h
@@ -7,9 +7,11 @@
struct xfs_writable_file {
const unsigned char *name;
+ unsigned int clear_f_mode; /* can be cleared from file->f_mode */
unsigned int f_mode; /* can be set into file->f_mode */
long long i_size; /* file size */
- long long prev_pos; /* ra->prev_pos */
+ long long prev_pos; /* ra->prev_pos page index */
+ long long pos; /* iocb->ki_pos page index */
};
#endif /* _UAPI_LINUX_XFS_H */
diff --git a/tools/include/uapi/linux/xfs.h b/tools/include/uapi/linux/xfs.h
index f333a2eb74074..2c4c61d5ba539 100644
--- a/tools/include/uapi/linux/xfs.h
+++ b/tools/include/uapi/linux/xfs.h
@@ -5,7 +5,7 @@
#include <linux/types.h>
#define FMODE_RANDOM (0x1000)
-#define FMODE_WILLNEED (0x40000000)
+#define FMODE_WILLNEED (0x400000)
struct xfs_writable_file {
const unsigned char *name;
--
2.25.1

09 Jul '21
From: yangerkun <yangerkun(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 172974
CVE: NA
---------------------------
Commit 72c9e4df6a99 ('jbd2: ensure abort the journal if detect IO error when
writing original buffer back') adds 'j_atomic_flags' to struct journal_s,
which breaks KABI for symbols such as jbd2_journal_destroy/jbd2_journal_abort
and so on.
Fix it by adding a wrapper.
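
This is the usual KABI trick of wrapping the struct instead of growing it:
struct journal_s keeps its exact layout, the new field lives in an outer
wrapper, and code that only holds a journal_t pointer recovers the wrapper
with container_of(). A minimal stand-alone sketch (the journal_s here is a
toy type, not the real one):

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define container_of(ptr, type, member) \
            ((type *)((char *)(ptr) - offsetof(type, member)))

    struct journal_s { unsigned long j_flags; };    /* layout unchanged: KABI safe */

    struct journal_wrapper_s {
            struct journal_s jw_journal;            /* embedded at the start, as in the patch */
            unsigned long j_atomic_flags;           /* new state lives out here */
    };

    static struct journal_s *journal_alloc(void)
    {
            struct journal_wrapper_s *jw = calloc(1, sizeof(*jw));

            return jw ? &jw->jw_journal : NULL;
    }

    int main(void)
    {
            struct journal_s *journal = journal_alloc();
            struct journal_wrapper_s *jw;

            if (!journal)
                    return 1;
            /* Anyone holding the old journal_t pointer can reach the new field. */
            jw = container_of(journal, struct journal_wrapper_s, jw_journal);
            jw->j_atomic_flags |= 1UL;
            printf("atomic flags: %lx\n", jw->j_atomic_flags);
            free(jw);                               /* free the wrapper, not the inner struct */
            return 0;
    }

Allocation and free happen on the wrapper, exactly as journal_init_common()
and jbd2_journal_destroy() do in the diff below.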
Signed-off-by: yangerkun <yangerkun(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
fs/jbd2/checkpoint.c | 5 ++++-
fs/jbd2/journal.c | 19 +++++++++++++------
include/linux/jbd2.h | 23 ++++++++++++++++++-----
3 files changed, 35 insertions(+), 12 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index b1af15ad36dcb..f2c36c9c58be3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -562,6 +562,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
struct transaction_chp_stats_s *stats;
transaction_t *transaction;
journal_t *journal;
+ journal_wrapper_t *journal_wrapper;
struct buffer_head *bh = jh2bh(jh);
JBUFFER_TRACE(jh, "entry");
@@ -572,6 +573,8 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 0;
}
journal = transaction->t_journal;
+ journal_wrapper = container_of(journal, journal_wrapper_t,
+ jw_journal);
JBUFFER_TRACE(jh, "removing from transaction");
@@ -583,7 +586,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
* journal here and we abort the journal later from a better context.
*/
if (buffer_write_io_error(bh))
- set_bit(JBD2_CHECKPOINT_IO_ERROR, &journal->j_atomic_flags);
+ set_bit(JBD2_CHECKPOINT_IO_ERROR, &journal_wrapper->j_atomic_flags);
__buffer_unlink(jh);
jh->b_cp_transaction = NULL;
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 89fad4c3e13cb..ef9a942fc9a1a 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1129,14 +1129,17 @@ static journal_t *journal_init_common(struct block_device *bdev,
{
static struct lock_class_key jbd2_trans_commit_key;
journal_t *journal;
+ journal_wrapper_t *journal_wrapper;
int err;
struct buffer_head *bh;
int n;
- journal = kzalloc(sizeof(*journal), GFP_KERNEL);
- if (!journal)
+ journal_wrapper = kzalloc(sizeof(*journal_wrapper), GFP_KERNEL);
+ if (!journal_wrapper)
return NULL;
+ journal = &(journal_wrapper->jw_journal);
+
init_waitqueue_head(&journal->j_wait_transaction_locked);
init_waitqueue_head(&journal->j_wait_done_commit);
init_waitqueue_head(&journal->j_wait_commit);
@@ -1195,7 +1198,7 @@ static journal_t *journal_init_common(struct block_device *bdev,
err_cleanup:
kfree(journal->j_wbuf);
jbd2_journal_destroy_revoke(journal);
- kfree(journal);
+ kfree(journal_wrapper);
return NULL;
}
@@ -1425,11 +1428,13 @@ int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
unsigned long tail_block, int write_op)
{
journal_superblock_t *sb = journal->j_superblock;
+ journal_wrapper_t *journal_wrapper = container_of(journal,
+ journal_wrapper_t, jw_journal);
int ret;
if (is_journal_aborted(journal))
return -EIO;
- if (test_bit(JBD2_CHECKPOINT_IO_ERROR, &journal->j_atomic_flags)) {
+ if (test_bit(JBD2_CHECKPOINT_IO_ERROR, &journal_wrapper->j_atomic_flags)) {
jbd2_journal_abort(journal, -EIO);
return -EIO;
}
@@ -1754,6 +1759,8 @@ int jbd2_journal_load(journal_t *journal)
int jbd2_journal_destroy(journal_t *journal)
{
int err = 0;
+ journal_wrapper_t *journal_wrapper = container_of(journal,
+ journal_wrapper_t, jw_journal);
/* Wait for the commit thread to wake up and die. */
journal_kill_thread(journal);
@@ -1795,7 +1802,7 @@ int jbd2_journal_destroy(journal_t *journal)
* may become inconsistent.
*/
if (!is_journal_aborted(journal) &&
- test_bit(JBD2_CHECKPOINT_IO_ERROR, &journal->j_atomic_flags))
+ test_bit(JBD2_CHECKPOINT_IO_ERROR, &journal_wrapper->j_atomic_flags))
jbd2_journal_abort(journal, -EIO);
if (journal->j_sb_buffer) {
@@ -1823,7 +1830,7 @@ int jbd2_journal_destroy(journal_t *journal)
if (journal->j_chksum_driver)
crypto_free_shash(journal->j_chksum_driver);
kfree(journal->j_wbuf);
- kfree(journal);
+ kfree(journal_wrapper);
return err;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 5c0446f22bee1..ef213666c3a3b 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -105,6 +105,8 @@ typedef struct jbd2_journal_handle handle_t; /* Atomic operation type */
* This is an opaque datatype.
**/
typedef struct journal_s journal_t; /* Journal control structure */
+
+typedef struct journal_wrapper_s journal_wrapper_t;
#endif
/*
@@ -780,11 +782,6 @@ struct journal_s
*/
unsigned long j_flags;
- /**
- * @j_atomic_flags: Atomic journaling state flags.
- */
- unsigned long j_atomic_flags;
-
/**
* @j_errno:
*
@@ -1199,6 +1196,22 @@ struct journal_s
#endif
};
+/**
+ * struct journal_wrapper_s - The wrapper of journal_s to fix KABI.
+ */
+struct journal_wrapper_s
+{
+ /**
+ * @jw_journal: real journal.
+ */
+ journal_t jw_journal;
+
+ /**
+ * @j_atomic_flags: Atomic journaling state flags.
+ */
+ unsigned long j_atomic_flags;
+};
+
#define jbd2_might_wait_for_commit(j) \
do { \
rwsem_acquire(&j->j_trans_commit_map, 0, 0, _THIS_IP_); \
--
2.25.1

[PATCH OLK-5.10] x86/perf: Add uncore performance monitor support for Zhaoxin CPUs
by LeoLiuoc 09 Jul '21
Zhaoxin CPUs already provide a hardware performance monitoring unit for the
uncore, but it has not been used so far. Add support for it so that uncore
performance can be monitored on Zhaoxin CPUs.
Signed-off-by: LeoLiu-oc <LeoLiu-oc(a)zhaoxin.com>
---
arch/x86/events/zhaoxin/Makefile | 1 +
arch/x86/events/zhaoxin/uncore.c | 1123 ++++++++++++++++++++++++++++++
arch/x86/events/zhaoxin/uncore.h | 311 +++++++++
3 files changed, 1435 insertions(+)
create mode 100644 arch/x86/events/zhaoxin/uncore.c
create mode 100644 arch/x86/events/zhaoxin/uncore.h
diff --git a/arch/x86/events/zhaoxin/Makefile b/arch/x86/events/zhaoxin/Makefile
index 642c1174d662..767d6212bac1 100644
--- a/arch/x86/events/zhaoxin/Makefile
+++ b/arch/x86/events/zhaoxin/Makefile
@@ -1,2 +1,3 @@
# SPDX-License-Identifier: GPL-2.0
obj-y += core.o
+obj-y += uncore.o
diff --git a/arch/x86/events/zhaoxin/uncore.c b/arch/x86/events/zhaoxin/uncore.c
new file mode 100644
index 000000000000..96771063a61e
--- /dev/null
+++ b/arch/x86/events/zhaoxin/uncore.c
@@ -0,0 +1,1123 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <asm/cpu_device_id.h>
+#include "uncore.h"
+
+static struct zhaoxin_uncore_type *empty_uncore[] = { NULL, };
+static struct zhaoxin_uncore_type **uncore_msr_uncores = empty_uncore;
+
+/* mask of cpus that collect uncore events */
+static cpumask_t uncore_cpu_mask;
+
+/* constraint for the fixed counter */
+static struct event_constraint uncore_constraint_fixed =
+ EVENT_CONSTRAINT(~0ULL, 1 << UNCORE_PMC_IDX_FIXED, ~0ULL);
+
+static int max_packages;
+
+/* CHX event control */
+#define CHX_UNC_CTL_EV_SEL_MASK 0x000000ff
+#define CHX_UNC_CTL_UMASK_MASK 0x0000ff00
+#define CHX_UNC_CTL_EDGE_DET (1 << 18)
+#define CHX_UNC_CTL_EN (1 << 22)
+#define CHX_UNC_CTL_INVERT (1 << 23)
+#define CHX_UNC_CTL_CMASK_MASK 0xff000000
+#define CHX_UNC_FIXED_CTR_CTL_EN (1 << 0)
+
+#define CHX_UNC_RAW_EVENT_MASK (CHX_UNC_CTL_EV_SEL_MASK | \
+ CHX_UNC_CTL_UMASK_MASK | \
+ CHX_UNC_CTL_EDGE_DET | \
+ CHX_UNC_CTL_INVERT | \
+ CHX_UNC_CTL_CMASK_MASK)
+
+/* CHX global control register */
+#define CHX_UNC_PERF_GLOBAL_CTL 0x391
+#define CHX_UNC_FIXED_CTR 0x394
+#define CHX_UNC_FIXED_CTR_CTRL 0x395
+
+/* CHX uncore global control */
+#define CHX_UNC_GLOBAL_CTL_EN_PC_ALL ((1ULL << 4) - 1)
+#define CHX_UNC_GLOBAL_CTL_EN_FC (1ULL << 32)
+
+/* CHX uncore register */
+#define CHX_UNC_PERFEVTSEL0 0x3c0
+#define CHX_UNC_UNCORE_PMC0 0x3b0
+
+DEFINE_UNCORE_FORMAT_ATTR(event, event, "config:0-7");
+DEFINE_UNCORE_FORMAT_ATTR(umask, umask, "config:8-15");
+DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18");
+DEFINE_UNCORE_FORMAT_ATTR(inv, inv, "config:23");
+DEFINE_UNCORE_FORMAT_ATTR(cmask8, cmask, "config:24-31");
+
+ssize_t zx_uncore_event_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct uncore_event_desc *event =
+ container_of(attr, struct uncore_event_desc, attr);
+ return sprintf(buf, "%s", event->config);
+}
+
+/*chx uncore support */
+static void chx_uncore_msr_disable_event(struct zhaoxin_uncore_box *box, struct perf_event *event)
+{
+ wrmsrl(event->hw.config_base, 0);
+}
+
+static u64 uncore_msr_read_counter(struct zhaoxin_uncore_box *box, struct perf_event *event)
+{
+ u64 count;
+
+ rdmsrl(event->hw.event_base, count);
+
+ return count;
+}
+
+static void chx_uncore_msr_disable_box(struct zhaoxin_uncore_box *box)
+{
+ wrmsrl(CHX_UNC_PERF_GLOBAL_CTL, 0);
+}
+
+static void chx_uncore_msr_enable_box(struct zhaoxin_uncore_box *box)
+{
+ wrmsrl(CHX_UNC_PERF_GLOBAL_CTL, CHX_UNC_GLOBAL_CTL_EN_PC_ALL | CHX_UNC_GLOBAL_CTL_EN_FC);
+}
+
+static void chx_uncore_msr_enable_event(struct zhaoxin_uncore_box *box, struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ if (hwc->idx < UNCORE_PMC_IDX_FIXED)
+ wrmsrl(hwc->config_base, hwc->config | CHX_UNC_CTL_EN);
+ else
+ wrmsrl(hwc->config_base, CHX_UNC_FIXED_CTR_CTL_EN);
+}
+
+static struct attribute *chx_uncore_formats_attr[] = {
+ &format_attr_event.attr,
+ &format_attr_umask.attr,
+ &format_attr_edge.attr,
+ &format_attr_inv.attr,
+ &format_attr_cmask8.attr,
+ NULL,
+};
+
+static struct attribute_group chx_uncore_format_group = {
+ .name = "format",
+ .attrs = chx_uncore_formats_attr,
+};
+
+static struct uncore_event_desc chx_uncore_events[] = {
+ { /* end: all zeroes */ },
+};
+
+static struct zhaoxin_uncore_ops chx_uncore_msr_ops = {
+ .disable_box = chx_uncore_msr_disable_box,
+ .enable_box = chx_uncore_msr_enable_box,
+ .disable_event = chx_uncore_msr_disable_event,
+ .enable_event = chx_uncore_msr_enable_event,
+ .read_counter = uncore_msr_read_counter,
+};
+
+static struct zhaoxin_uncore_type chx_uncore_box = {
+ .name = "",
+ .num_counters = 4,
+ .num_boxes = 1,
+ .perf_ctr_bits = 48,
+ .fixed_ctr_bits = 48,
+ .event_ctl = CHX_UNC_PERFEVTSEL0,
+ .perf_ctr = CHX_UNC_UNCORE_PMC0,
+ .fixed_ctr = CHX_UNC_FIXED_CTR,
+ .fixed_ctl = CHX_UNC_FIXED_CTR_CTRL,
+ .event_mask = CHX_UNC_RAW_EVENT_MASK,
+ .event_descs = chx_uncore_events,
+ .ops = &chx_uncore_msr_ops,
+ .format_group = &chx_uncore_format_group,
+};
+
+static struct zhaoxin_uncore_type *chx_msr_uncores[] = {
+ &chx_uncore_box,
+ NULL,
+};
+
+static struct zhaoxin_uncore_box *uncore_pmu_to_box(struct zhaoxin_uncore_pmu *pmu, int cpu)
+{
+ unsigned int package_id = topology_logical_package_id(cpu);
+
+ /*
+ * The unsigned check also catches the '-1' return value for non
+ * existent mappings in the topology map.
+ */
+ return package_id < max_packages ? pmu->boxes[package_id] : NULL;
+}
+
+static void uncore_assign_hw_event(struct zhaoxin_uncore_box *box,
+ struct perf_event *event, int idx)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ hwc->idx = idx;
+ hwc->last_tag = ++box->tags[idx];
+
+ if (uncore_pmc_fixed(hwc->idx)) {
+ hwc->event_base = uncore_fixed_ctr(box);
+ hwc->config_base = uncore_fixed_ctl(box);
+ return;
+ }
+
+ hwc->config_base = uncore_event_ctl(box, hwc->idx);
+ hwc->event_base = uncore_perf_ctr(box, hwc->idx);
+}
+
+void uncore_perf_event_update(struct zhaoxin_uncore_box *box, struct perf_event *event)
+{
+ u64 prev_count, new_count, delta;
+ int shift;
+
+ if (uncore_pmc_fixed(event->hw.idx))
+ shift = 64 - uncore_fixed_ctr_bits(box);
+ else
+ shift = 64 - uncore_perf_ctr_bits(box);
+
+ /* the hrtimer might modify the previous event value */
+again:
+ prev_count = local64_read(&event->hw.prev_count);
+ new_count = uncore_read_counter(box, event);
+ if (local64_xchg(&event->hw.prev_count, new_count) != prev_count)
+ goto again;
+
+ delta = (new_count << shift) - (prev_count << shift);
+ delta >>= shift;
+
+ local64_add(delta, &event->count);
+}
+
+static enum hrtimer_restart uncore_pmu_hrtimer(struct hrtimer *hrtimer)
+{
+ struct zhaoxin_uncore_box *box;
+ struct perf_event *event;
+ unsigned long flags;
+ int bit;
+
+ box = container_of(hrtimer, struct zhaoxin_uncore_box, hrtimer);
+ if (!box->n_active || box->cpu != smp_processor_id())
+ return HRTIMER_NORESTART;
+ /*
+ * disable local interrupt to prevent uncore_pmu_event_start/stop
+ * to interrupt the update process
+ */
+ local_irq_save(flags);
+
+ /*
+ * handle boxes with an active event list as opposed to active
+ * counters
+ */
+ list_for_each_entry(event, &box->active_list, active_entry) {
+ uncore_perf_event_update(box, event);
+ }
+
+ for_each_set_bit(bit, box->active_mask, UNCORE_PMC_IDX_MAX)
+ uncore_perf_event_update(box, box->events[bit]);
+
+ local_irq_restore(flags);
+
+ hrtimer_forward_now(hrtimer, ns_to_ktime(box->hrtimer_duration));
+ return HRTIMER_RESTART;
+}
+
+static void uncore_pmu_start_hrtimer(struct zhaoxin_uncore_box *box)
+{
+ hrtimer_start(&box->hrtimer, ns_to_ktime(box->hrtimer_duration),
+ HRTIMER_MODE_REL_PINNED);
+}
+
+static void uncore_pmu_cancel_hrtimer(struct zhaoxin_uncore_box *box)
+{
+ hrtimer_cancel(&box->hrtimer);
+}
+
+static void uncore_pmu_init_hrtimer(struct zhaoxin_uncore_box *box)
+{
+ hrtimer_init(&box->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ box->hrtimer.function = uncore_pmu_hrtimer;
+}
+
+static struct zhaoxin_uncore_box *uncore_alloc_box(struct zhaoxin_uncore_type *type,
+ int node)
+{
+ int i, size, numshared = type->num_shared_regs;
+ struct zhaoxin_uncore_box *box;
+
+ size = sizeof(*box) + numshared * sizeof(struct zhaoxin_uncore_extra_reg);
+
+ box = kzalloc_node(size, GFP_KERNEL, node);
+ if (!box)
+ return NULL;
+
+ for (i = 0; i < numshared; i++)
+ raw_spin_lock_init(&box->shared_regs[i].lock);
+
+ uncore_pmu_init_hrtimer(box);
+ box->cpu = -1;
+ box->package_id = -1;
+
+ /* set default hrtimer timeout */
+ box->hrtimer_duration = UNCORE_PMU_HRTIMER_INTERVAL;
+
+ INIT_LIST_HEAD(&box->active_list);
+
+ return box;
+}
+
+static bool is_box_event(struct zhaoxin_uncore_box *box, struct perf_event *event)
+{
+ return &box->pmu->pmu == event->pmu;
+}
+
+static struct event_constraint *
+uncore_get_event_constraint(struct zhaoxin_uncore_box *box, struct perf_event *event)
+{
+ struct zhaoxin_uncore_type *type = box->pmu->type;
+ struct event_constraint *c;
+
+ if (type->ops->get_constraint) {
+ c = type->ops->get_constraint(box, event);
+ if (c)
+ return c;
+ }
+
+ if (event->attr.config == UNCORE_FIXED_EVENT)
+ return &uncore_constraint_fixed;
+
+ if (type->constraints) {
+ for_each_event_constraint(c, type->constraints) {
+ if ((event->hw.config & c->cmask) == c->code)
+ return c;
+ }
+ }
+
+ return &type->unconstrainted;
+}
+
+static void uncore_put_event_constraint(struct zhaoxin_uncore_box *box,
+ struct perf_event *event)
+{
+ if (box->pmu->type->ops->put_constraint)
+ box->pmu->type->ops->put_constraint(box, event);
+}
+
+static int uncore_assign_events(struct zhaoxin_uncore_box *box, int assign[], int n)
+{
+ unsigned long used_mask[BITS_TO_LONGS(UNCORE_PMC_IDX_MAX)];
+ struct event_constraint *c;
+ int i, wmin, wmax, ret = 0;
+ struct hw_perf_event *hwc;
+
+ bitmap_zero(used_mask, UNCORE_PMC_IDX_MAX);
+
+ for (i = 0, wmin = UNCORE_PMC_IDX_MAX, wmax = 0; i < n; i++) {
+ c = uncore_get_event_constraint(box, box->event_list[i]);
+ box->event_constraint[i] = c;
+ wmin = min(wmin, c->weight);
+ wmax = max(wmax, c->weight);
+ }
+
+ /* fastpath, try to reuse previous register */
+ for (i = 0; i < n; i++) {
+ hwc = &box->event_list[i]->hw;
+ c = box->event_constraint[i];
+
+ /* never assigned */
+ if (hwc->idx == -1)
+ break;
+
+ /* constraint still honored */
+ if (!test_bit(hwc->idx, c->idxmsk))
+ break;
+
+ /* not already used */
+ if (test_bit(hwc->idx, used_mask))
+ break;
+
+ __set_bit(hwc->idx, used_mask);
+ if (assign)
+ assign[i] = hwc->idx;
+ }
+ /* slow path */
+ if (i != n)
+ ret = perf_assign_events(box->event_constraint, n,
+ wmin, wmax, n, assign);
+
+ if (!assign || ret) {
+ for (i = 0; i < n; i++)
+ uncore_put_event_constraint(box, box->event_list[i]);
+ }
+ return ret ? -EINVAL : 0;
+}
+
+static void uncore_pmu_event_start(struct perf_event *event, int flags)
+{
+ struct zhaoxin_uncore_box *box = uncore_event_to_box(event);
+ int idx = event->hw.idx;
+
+
+ if (WARN_ON_ONCE(idx == -1 || idx >= UNCORE_PMC_IDX_MAX))
+ return;
+
+ if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
+ return;
+
+ event->hw.state = 0;
+ box->events[idx] = event;
+ box->n_active++;
+ __set_bit(idx, box->active_mask);
+
+ local64_set(&event->hw.prev_count, uncore_read_counter(box, event));
+ uncore_enable_event(box, event);
+
+ if (box->n_active == 1)
+ uncore_pmu_start_hrtimer(box);
+}
+
+static void uncore_pmu_event_stop(struct perf_event *event, int flags)
+{
+ struct zhaoxin_uncore_box *box = uncore_event_to_box(event);
+ struct hw_perf_event *hwc = &event->hw;
+
+ if (__test_and_clear_bit(hwc->idx, box->active_mask)) {
+ uncore_disable_event(box, event);
+ box->n_active--;
+ box->events[hwc->idx] = NULL;
+ WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
+ hwc->state |= PERF_HES_STOPPED;
+
+ if (box->n_active == 0)
+ uncore_pmu_cancel_hrtimer(box);
+ }
+
+ if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+ /*
+ * Drain the remaining delta count out of a event
+ * that we are disabling:
+ */
+ uncore_perf_event_update(box, event);
+ hwc->state |= PERF_HES_UPTODATE;
+ }
+}
+
+static int
+uncore_collect_events(struct zhaoxin_uncore_box *box, struct perf_event *leader,
+ bool dogrp)
+{
+ struct perf_event *event;
+ int n, max_count;
+
+ max_count = box->pmu->type->num_counters;
+ if (box->pmu->type->fixed_ctl)
+ max_count++;
+
+ if (box->n_events >= max_count)
+ return -EINVAL;
+
+ n = box->n_events;
+
+ if (is_box_event(box, leader)) {
+ box->event_list[n] = leader;
+ n++;
+ }
+
+ if (!dogrp)
+ return n;
+
+ for_each_sibling_event(event, leader) {
+ if (!is_box_event(box, event) ||
+ event->state <= PERF_EVENT_STATE_OFF)
+ continue;
+
+ if (n >= max_count)
+ return -EINVAL;
+
+ box->event_list[n] = event;
+ n++;
+ }
+ return n;
+}
+
+static int uncore_pmu_event_add(struct perf_event *event, int flags)
+{
+ struct zhaoxin_uncore_box *box = uncore_event_to_box(event);
+ struct hw_perf_event *hwc = &event->hw;
+ int assign[UNCORE_PMC_IDX_MAX];
+ int i, n, ret;
+
+ if (!box)
+ return -ENODEV;
+
+ ret = n = uncore_collect_events(box, event, false);
+ if (ret < 0)
+ return ret;
+
+ hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+ if (!(flags & PERF_EF_START))
+ hwc->state |= PERF_HES_ARCH;
+
+ ret = uncore_assign_events(box, assign, n);
+ if (ret)
+ return ret;
+
+ /* save events moving to new counters */
+ for (i = 0; i < box->n_events; i++) {
+ event = box->event_list[i];
+ hwc = &event->hw;
+
+ if (hwc->idx == assign[i] &&
+ hwc->last_tag == box->tags[assign[i]])
+ continue;
+ /*
+ * Ensure we don't accidentally enable a stopped
+ * counter simply because we rescheduled.
+ */
+ if (hwc->state & PERF_HES_STOPPED)
+ hwc->state |= PERF_HES_ARCH;
+
+ uncore_pmu_event_stop(event, PERF_EF_UPDATE);
+ }
+
+ /* reprogram moved events into new counters */
+ for (i = 0; i < n; i++) {
+ event = box->event_list[i];
+ hwc = &event->hw;
+
+ if (hwc->idx != assign[i] ||
+ hwc->last_tag != box->tags[assign[i]])
+ uncore_assign_hw_event(box, event, assign[i]);
+ else if (i < box->n_events)
+ continue;
+
+ if (hwc->state & PERF_HES_ARCH)
+ continue;
+
+ uncore_pmu_event_start(event, 0);
+ }
+ box->n_events = n;
+
+ return 0;
+}
+
+static int uncore_validate_group(struct zhaoxin_uncore_pmu *pmu,
+ struct perf_event *event)
+{
+ struct perf_event *leader = event->group_leader;
+ struct zhaoxin_uncore_box *fake_box;
+ int ret = -EINVAL, n;
+
+ fake_box = uncore_alloc_box(pmu->type, NUMA_NO_NODE);
+ if (!fake_box)
+ return -ENOMEM;
+
+ fake_box->pmu = pmu;
+ /*
+ * the event is not yet connected with its
+ * siblings therefore we must first collect
+ * existing siblings, then add the new event
+ * before we can simulate the scheduling
+ */
+ n = uncore_collect_events(fake_box, leader, true);
+ if (n < 0)
+ goto out;
+
+ fake_box->n_events = n;
+ n = uncore_collect_events(fake_box, event, false);
+ if (n < 0)
+ goto out;
+
+ fake_box->n_events = n;
+
+ ret = uncore_assign_events(fake_box, NULL, n);
+out:
+ kfree(fake_box);
+ return ret;
+}
+
+static void uncore_pmu_event_del(struct perf_event *event, int flags)
+{
+ struct zhaoxin_uncore_box *box = uncore_event_to_box(event);
+ int i;
+
+ uncore_pmu_event_stop(event, PERF_EF_UPDATE);
+
+ for (i = 0; i < box->n_events; i++) {
+ if (event == box->event_list[i]) {
+ uncore_put_event_constraint(box, event);
+
+ for (++i; i < box->n_events; i++)
+ box->event_list[i - 1] = box->event_list[i];
+
+ --box->n_events;
+ break;
+ }
+ }
+
+ event->hw.idx = -1;
+ event->hw.last_tag = ~0ULL;
+}
+
+static void uncore_pmu_event_read(struct perf_event *event)
+{
+ struct zhaoxin_uncore_box *box = uncore_event_to_box(event);
+
+ uncore_perf_event_update(box, event);
+}
+
+static int uncore_pmu_event_init(struct perf_event *event)
+{
+ struct zhaoxin_uncore_pmu *pmu;
+ struct zhaoxin_uncore_box *box;
+ struct hw_perf_event *hwc = &event->hw;
+ int ret;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ pmu = uncore_event_to_pmu(event);
+ /* no device found for this pmu */
+ if (pmu->func_id < 0)
+ return -ENOENT;
+
+ /* Sampling not supported yet */
+ if (hwc->sample_period)
+ return -EINVAL;
+
+ /*
+ * Place all uncore events for a particular physical package
+ * onto a single cpu
+ */
+ if (event->cpu < 0)
+ return -EINVAL;
+ box = uncore_pmu_to_box(pmu, event->cpu);
+ if (!box || box->cpu < 0)
+ return -EINVAL;
+ event->cpu = box->cpu;
+ event->pmu_private = box;
+
+ event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
+
+ event->hw.idx = -1;
+ event->hw.last_tag = ~0ULL;
+ event->hw.extra_reg.idx = EXTRA_REG_NONE;
+ event->hw.branch_reg.idx = EXTRA_REG_NONE;
+
+ if (event->attr.config == UNCORE_FIXED_EVENT) {
+ /* no fixed counter */
+ if (!pmu->type->fixed_ctl)
+ return -EINVAL;
+ /*
+ * if there is only one fixed counter, only the first pmu
+ * can access the fixed counter
+ */
+ if (pmu->type->single_fixed && pmu->pmu_idx > 0)
+ return -EINVAL;
+
+ /* fixed counters have event field hardcoded to zero */
+ hwc->config = 0ULL;
+ } else {
+ hwc->config = event->attr.config &
+ (pmu->type->event_mask | ((u64)pmu->type->event_mask_ext << 32));
+ if (pmu->type->ops->hw_config) {
+ ret = pmu->type->ops->hw_config(box, event);
+ if (ret)
+ return ret;
+ }
+ }
+
+ if (event->group_leader != event)
+ ret = uncore_validate_group(pmu, event);
+ else
+ ret = 0;
+
+ return ret;
+}
+
+static void uncore_pmu_enable(struct pmu *pmu)
+{
+ struct zhaoxin_uncore_pmu *uncore_pmu;
+ struct zhaoxin_uncore_box *box;
+
+ uncore_pmu = container_of(pmu, struct zhaoxin_uncore_pmu, pmu);
+ if (!uncore_pmu)
+ return;
+
+ box = uncore_pmu_to_box(uncore_pmu, smp_processor_id());
+ if (!box)
+ return;
+
+ if (uncore_pmu->type->ops->enable_box)
+ uncore_pmu->type->ops->enable_box(box);
+}
+
+static void uncore_pmu_disable(struct pmu *pmu)
+{
+ struct zhaoxin_uncore_pmu *uncore_pmu;
+ struct zhaoxin_uncore_box *box;
+
+ uncore_pmu = container_of(pmu, struct zhaoxin_uncore_pmu, pmu);
+ if (!uncore_pmu)
+ return;
+
+ box = uncore_pmu_to_box(uncore_pmu, smp_processor_id());
+ if (!box)
+ return;
+
+ if (uncore_pmu->type->ops->disable_box)
+ uncore_pmu->type->ops->disable_box(box);
+}
+
+static ssize_t uncore_get_attr_cpumask(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ return cpumap_print_to_pagebuf(true, buf, &uncore_cpu_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, uncore_get_attr_cpumask, NULL);
+
+static struct attribute *uncore_pmu_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL,
+};
+
+static const struct attribute_group uncore_pmu_attr_group = {
+ .attrs = uncore_pmu_attrs,
+};
+
+static void uncore_pmu_unregister(struct zhaoxin_uncore_pmu *pmu)
+{
+ if (!pmu->registered)
+ return;
+ perf_pmu_unregister(&pmu->pmu);
+ pmu->registered = false;
+}
+
+static void uncore_free_boxes(struct zhaoxin_uncore_pmu *pmu)
+{
+ int package;
+
+ for (package = 0; package < max_packages; package++)
+ kfree(pmu->boxes[package]);
+ kfree(pmu->boxes);
+}
+
+static void uncore_type_exit(struct zhaoxin_uncore_type *type)
+{
+ struct zhaoxin_uncore_pmu *pmu = type->pmus;
+ int i;
+
+ if (pmu) {
+ for (i = 0; i < type->num_boxes; i++, pmu++) {
+ uncore_pmu_unregister(pmu);
+ uncore_free_boxes(pmu);
+ }
+ kfree(type->pmus);
+ type->pmus = NULL;
+ }
+ kfree(type->events_group);
+ type->events_group = NULL;
+}
+
+static void uncore_types_exit(struct zhaoxin_uncore_type **types)
+{
+ for (; *types; types++)
+ uncore_type_exit(*types);
+}
+
+static int __init uncore_type_init(struct zhaoxin_uncore_type *type, bool setid)
+{
+ struct zhaoxin_uncore_pmu *pmus;
+ size_t size;
+ int i, j;
+
+ pmus = kcalloc(type->num_boxes, sizeof(*pmus), GFP_KERNEL);
+ if (!pmus)
+ return -ENOMEM;
+
+ size = max_packages*sizeof(struct zhaoxin_uncore_box *);
+
+ for (i = 0; i < type->num_boxes; i++) {
+ pmus[i].func_id = setid ? i : -1;
+ pmus[i].pmu_idx = i;
+ pmus[i].type = type;
+ pmus[i].boxes = kzalloc(size, GFP_KERNEL);
+ if (!pmus[i].boxes)
+ goto err;
+ }
+
+ type->pmus = pmus;
+ type->unconstrainted = (struct event_constraint)
+ __EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1,
+ 0, type->num_counters, 0, 0);
+
+ if (type->event_descs) {
+ struct {
+ struct attribute_group group;
+ struct attribute *attrs[];
+ } *attr_group;
+ for (i = 0; type->event_descs[i].attr.attr.name; i++)
+ ;
+
+ attr_group = kzalloc(struct_size(attr_group, attrs, i + 1), GFP_KERNEL);
+ if (!attr_group)
+ goto err;
+
+ attr_group->group.name = "events";
+ attr_group->group.attrs = attr_group->attrs;
+
+ for (j = 0; j < i; j++)
+ attr_group->attrs[j] = &type->event_descs[j].attr.attr;
+
+ type->events_group = &attr_group->group;
+ }
+
+ type->pmu_group = &uncore_pmu_attr_group;
+
+ return 0;
+
+err:
+ for (i = 0; i < type->num_boxes; i++)
+ kfree(pmus[i].boxes);
+ kfree(pmus);
+
+ return -ENOMEM;
+}
+
+static int __init
+uncore_types_init(struct zhaoxin_uncore_type **types, bool setid)
+{
+ int ret;
+
+ for (; *types; types++) {
+ ret = uncore_type_init(*types, setid);
+ if (ret)
+ return ret;
+ }
+ return 0;
+}
+
+static void uncore_change_type_ctx(struct zhaoxin_uncore_type *type, int old_cpu,
+ int new_cpu)
+{
+ struct zhaoxin_uncore_pmu *pmu = type->pmus;
+ struct zhaoxin_uncore_box *box;
+ int i, package;
+
+ package = topology_logical_package_id(old_cpu < 0 ? new_cpu : old_cpu);
+ for (i = 0; i < type->num_boxes; i++, pmu++) {
+ box = pmu->boxes[package];
+ if (!box)
+ continue;
+
+ if (old_cpu < 0) {
+ WARN_ON_ONCE(box->cpu != -1);
+ box->cpu = new_cpu;
+ continue;
+ }
+
+ WARN_ON_ONCE(box->cpu != old_cpu);
+ box->cpu = -1;
+ if (new_cpu < 0)
+ continue;
+
+ uncore_pmu_cancel_hrtimer(box);
+ perf_pmu_migrate_context(&pmu->pmu, old_cpu, new_cpu);
+ box->cpu = new_cpu;
+ }
+}
+
+static void uncore_change_context(struct zhaoxin_uncore_type **uncores,
+ int old_cpu, int new_cpu)
+{
+ for (; *uncores; uncores++)
+ uncore_change_type_ctx(*uncores, old_cpu, new_cpu);
+}
+
+static void uncore_box_unref(struct zhaoxin_uncore_type **types, int id)
+{
+ struct zhaoxin_uncore_type *type;
+ struct zhaoxin_uncore_pmu *pmu;
+ struct zhaoxin_uncore_box *box;
+ int i;
+
+ for (; *types; types++) {
+ type = *types;
+ pmu = type->pmus;
+ for (i = 0; i < type->num_boxes; i++, pmu++) {
+ box = pmu->boxes[id];
+ if (box && atomic_dec_return(&box->refcnt) == 0)
+ uncore_box_exit(box);
+ }
+ }
+}
+
+static int uncore_event_cpu_offline(unsigned int cpu)
+{
+ int package, target;
+
+ /* Check if exiting cpu is used for collecting uncore events */
+ if (!cpumask_test_and_clear_cpu(cpu, &uncore_cpu_mask))
+ goto unref;
+ /* Find a new cpu to collect uncore events */
+ target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+
+ /* Migrate uncore events to the new target */
+ if (target < nr_cpu_ids)
+ cpumask_set_cpu(target, &uncore_cpu_mask);
+ else
+ target = -1;
+
+ uncore_change_context(uncore_msr_uncores, cpu, target);
+
+unref:
+ /* Clear the references */
+ package = topology_logical_package_id(cpu);
+ uncore_box_unref(uncore_msr_uncores, package);
+ return 0;
+}
+
+static int allocate_boxes(struct zhaoxin_uncore_type **types,
+ unsigned int package, unsigned int cpu)
+{
+ struct zhaoxin_uncore_box *box, *tmp;
+ struct zhaoxin_uncore_type *type;
+ struct zhaoxin_uncore_pmu *pmu;
+ LIST_HEAD(allocated);
+ int i;
+
+ /* Try to allocate all required boxes */
+ for (; *types; types++) {
+ type = *types;
+ pmu = type->pmus;
+ for (i = 0; i < type->num_boxes; i++, pmu++) {
+ if (pmu->boxes[package])
+ continue;
+ box = uncore_alloc_box(type, cpu_to_node(cpu));
+ if (!box)
+ goto cleanup;
+ box->pmu = pmu;
+ box->package_id = package;
+ list_add(&box->active_list, &allocated);
+ }
+ }
+ /* Install them in the pmus */
+ list_for_each_entry_safe(box, tmp, &allocated, active_list) {
+ list_del_init(&box->active_list);
+ box->pmu->boxes[package] = box;
+ }
+ return 0;
+
+cleanup:
+ list_for_each_entry_safe(box, tmp, &allocated, active_list) {
+ list_del_init(&box->active_list);
+ kfree(box);
+ }
+ return -ENOMEM;
+}
+
+static int uncore_box_ref(struct zhaoxin_uncore_type **types,
+ int id, unsigned int cpu)
+{
+ struct zhaoxin_uncore_type *type;
+ struct zhaoxin_uncore_pmu *pmu;
+ struct zhaoxin_uncore_box *box;
+ int i, ret;
+
+ ret = allocate_boxes(types, id, cpu);
+ if (ret)
+ return ret;
+
+ for (; *types; types++) {
+ type = *types;
+ pmu = type->pmus;
+ for (i = 0; i < type->num_boxes; i++, pmu++) {
+ box = pmu->boxes[id];
+ if (box && atomic_inc_return(&box->refcnt) == 1)
+ uncore_box_init(box);
+ }
+ }
+ return 0;
+}
+
+static int uncore_event_cpu_online(unsigned int cpu)
+{
+ int package, target, msr_ret;
+
+ package = topology_logical_package_id(cpu);
+ msr_ret = uncore_box_ref(uncore_msr_uncores, package, cpu);
+
+ if (msr_ret)
+ return -ENOMEM;
+
+ /*
+ * Check if there is an online cpu in the package
+ * which collects uncore events already.
+ */
+ target = cpumask_any_and(&uncore_cpu_mask, topology_core_cpumask(cpu));
+ if (target < nr_cpu_ids)
+ return 0;
+
+ cpumask_set_cpu(cpu, &uncore_cpu_mask);
+
+ if (!msr_ret)
+ uncore_change_context(uncore_msr_uncores, -1, cpu);
+
+ return 0;
+}
+
+static int uncore_pmu_register(struct zhaoxin_uncore_pmu *pmu)
+{
+ int ret;
+
+ if (!pmu->type->pmu) {
+ pmu->pmu = (struct pmu) {
+ .attr_groups = pmu->type->attr_groups,
+ .task_ctx_nr = perf_invalid_context,
+ .pmu_enable = uncore_pmu_enable,
+ .pmu_disable = uncore_pmu_disable,
+ .event_init = uncore_pmu_event_init,
+ .add = uncore_pmu_event_add,
+ .del = uncore_pmu_event_del,
+ .start = uncore_pmu_event_start,
+ .stop = uncore_pmu_event_stop,
+ .read = uncore_pmu_event_read,
+ .module = THIS_MODULE,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
+ };
+ } else {
+ pmu->pmu = *pmu->type->pmu;
+ pmu->pmu.attr_groups = pmu->type->attr_groups;
+ }
+
+ if (pmu->type->num_boxes == 1) {
+ if (strlen(pmu->type->name) > 0)
+ sprintf(pmu->name, "uncore_%s", pmu->type->name);
+ else
+ sprintf(pmu->name, "uncore");
+ } else {
+ sprintf(pmu->name, "uncore_%s_%d", pmu->type->name,
+ pmu->pmu_idx);
+ }
+
+ ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
+ if (!ret)
+ pmu->registered = true;
+ return ret;
+}
+
+static int __init type_pmu_register(struct zhaoxin_uncore_type *type)
+{
+ int i, ret;
+
+ for (i = 0; i < type->num_boxes; i++) {
+ ret = uncore_pmu_register(&type->pmus[i]);
+ if (ret)
+ return ret;
+ }
+ return 0;
+}
+
+static int __init uncore_msr_pmus_register(void)
+{
+ struct zhaoxin_uncore_type **types = uncore_msr_uncores;
+ int ret;
+
+ for (; *types; types++) {
+ ret = type_pmu_register(*types);
+ if (ret)
+ return ret;
+ }
+ return 0;
+}
+
+static int __init uncore_cpu_init(void)
+{
+ int ret;
+
+ ret = uncore_types_init(uncore_msr_uncores, true);
+ if (ret)
+ goto err;
+
+ ret = uncore_msr_pmus_register();
+ if (ret)
+ goto err;
+ return 0;
+err:
+ uncore_types_exit(uncore_msr_uncores);
+ uncore_msr_uncores = empty_uncore;
+ return ret;
+}
+
+struct zhaoxin_uncore_init_fun {
+ void (*cpu_init)(void);
+};
+
+void chx_uncore_cpu_init(void)
+{
+ uncore_msr_uncores = chx_msr_uncores;
+}
+
+static const struct zhaoxin_uncore_init_fun chx_uncore_init __initconst = {
+ .cpu_init = chx_uncore_cpu_init,
+};
+
+static const struct x86_cpu_id zhaoxin_uncore_match[] __initconst = {
+ X86_MATCH_VENDOR_FAM_MODEL(ZHAOXIN, 7, ZHAOXIN_FAM7_ZXD, &chx_uncore_init),
+ X86_MATCH_VENDOR_FAM_MODEL(ZHAOXIN, 7, ZHAOXIN_FAM7_ZXE, &chx_uncore_init),
+ X86_MATCH_VENDOR_FAM_MODEL(CENTAUR, 7, ZHAOXIN_FAM7_ZXD, &chx_uncore_init),
+ X86_MATCH_VENDOR_FAM_MODEL(CENTAUR, 7, ZHAOXIN_FAM7_ZXE, &chx_uncore_init),
+ {},
+};
+
+MODULE_DEVICE_TABLE(x86cpu, zhaoxin_uncore_match);
+
+static int __init zhaoxin_uncore_init(void)
+{
+ const struct x86_cpu_id *id;
+ struct zhaoxin_uncore_init_fun *uncore_init;
+ int cret = 0, ret;
+
+ id = x86_match_cpu(zhaoxin_uncore_match);
+
+ if (!id)
+ return -ENODEV;
+
+ if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+ return -ENODEV;
+
+ max_packages = topology_max_packages();
+
+ pr_info("welcome to uncore!\n");
+
+ uncore_init = (struct zhaoxin_uncore_init_fun *)id->driver_data;
+
+ if (uncore_init->cpu_init) {
+ uncore_init->cpu_init();
+ cret = uncore_cpu_init();
+ }
+
+ if (cret)
+ return -ENODEV;
+
+ ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_UNCORE_ONLINE,
+ "perf/x86/zhaoxin/uncore:online",
+ uncore_event_cpu_online,
+ uncore_event_cpu_offline);
+ pr_info("zhaoxin uncore init success!\n");
+ if (ret)
+ goto err;
+ return 0;
+
+err:
+ uncore_types_exit(uncore_msr_uncores);
+ return ret;
+}
+module_init(zhaoxin_uncore_init);
+
+static void __exit zhaoxin_uncore_exit(void)
+{
+ cpuhp_remove_state(CPUHP_AP_PERF_X86_UNCORE_ONLINE);
+ uncore_types_exit(uncore_msr_uncores);
+}
+module_exit(zhaoxin_uncore_exit);
diff --git a/arch/x86/events/zhaoxin/uncore.h b/arch/x86/events/zhaoxin/uncore.h
new file mode 100644
index 000000000000..e0f4ec340725
--- /dev/null
+++ b/arch/x86/events/zhaoxin/uncore.h
@@ -0,0 +1,311 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Zhaoxin PMU; like Intel Architectural PerfMon-v2
+ */
+#include <linux/slab.h>
+#include <linux/pci.h>
+#include <asm/apicdef.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+
+#include <linux/perf_event.h>
+#include "../perf_event.h"
+
+#define ZHAOXIN_FAM7_ZXD 0x1b
+#define ZHAOXIN_FAM7_ZXE 0x3b
+
+#define UNCORE_PMU_NAME_LEN 32
+#define UNCORE_PMU_HRTIMER_INTERVAL (60LL * NSEC_PER_SEC)
+#define UNCORE_CHX_IMC_HRTIMER_INTERVAL (5ULL * NSEC_PER_SEC)
+
+
+#define UNCORE_FIXED_EVENT 0xff
+#define UNCORE_PMC_IDX_MAX_GENERIC 4
+#define UNCORE_PMC_IDX_MAX_FIXED 1
+#define UNCORE_PMC_IDX_FIXED UNCORE_PMC_IDX_MAX_GENERIC
+
+#define UNCORE_PMC_IDX_MAX (UNCORE_PMC_IDX_FIXED + 1)
+
+struct zhaoxin_uncore_ops;
+struct zhaoxin_uncore_pmu;
+struct zhaoxin_uncore_box;
+struct uncore_event_desc;
+
+struct zhaoxin_uncore_type {
+ const char *name;
+ int num_counters;
+ int num_boxes;
+ int perf_ctr_bits;
+ int fixed_ctr_bits;
+ unsigned int perf_ctr;
+ unsigned int event_ctl;
+ unsigned int event_mask;
+ unsigned int event_mask_ext;
+ unsigned int fixed_ctr;
+ unsigned int fixed_ctl;
+ unsigned int box_ctl;
+ unsigned int msr_offset;
+ unsigned int num_shared_regs:8;
+ unsigned int single_fixed:1;
+ unsigned int pair_ctr_ctl:1;
+ unsigned int *msr_offsets;
+ struct event_constraint unconstrainted;
+ struct event_constraint *constraints;
+ struct zhaoxin_uncore_pmu *pmus;
+ struct zhaoxin_uncore_ops *ops;
+ struct uncore_event_desc *event_descs;
+ const struct attribute_group *attr_groups[4];
+ struct pmu *pmu; /* for custom pmu ops */
+};
+
+#define pmu_group attr_groups[0]
+#define format_group attr_groups[1]
+#define events_group attr_groups[2]
+
+struct zhaoxin_uncore_ops {
+ void (*init_box)(struct zhaoxin_uncore_box *);
+ void (*exit_box)(struct zhaoxin_uncore_box *);
+ void (*disable_box)(struct zhaoxin_uncore_box *);
+ void (*enable_box)(struct zhaoxin_uncore_box *);
+ void (*disable_event)(struct zhaoxin_uncore_box *, struct perf_event *);
+ void (*enable_event)(struct zhaoxin_uncore_box *, struct perf_event *);
+ u64 (*read_counter)(struct zhaoxin_uncore_box *, struct perf_event *);
+ int (*hw_config)(struct zhaoxin_uncore_box *, struct perf_event *);
+ struct event_constraint *(*get_constraint)(struct zhaoxin_uncore_box *,
+ struct perf_event *);
+ void (*put_constraint)(struct zhaoxin_uncore_box *, struct perf_event *);
+};
+
+struct zhaoxin_uncore_pmu {
+ struct pmu pmu;
+ char name[UNCORE_PMU_NAME_LEN];
+ int pmu_idx;
+ int func_id;
+ bool registered;
+ atomic_t activeboxes;
+ struct zhaoxin_uncore_type *type;
+ struct zhaoxin_uncore_box **boxes;
+};
+
+struct zhaoxin_uncore_extra_reg {
+ raw_spinlock_t lock;
+ u64 config, config1, config2;
+ atomic_t ref;
+};
+
+struct zhaoxin_uncore_box {
+ int pci_phys_id;
+ int package_id; /* Package ID */
+ int n_active; /* number of active events */
+ int n_events;
+ int cpu; /* cpu to collect events */
+ unsigned long flags;
+ atomic_t refcnt;
+ struct perf_event *events[UNCORE_PMC_IDX_MAX];
+ struct perf_event *event_list[UNCORE_PMC_IDX_MAX];
+ struct event_constraint *event_constraint[UNCORE_PMC_IDX_MAX];
+ unsigned long active_mask[BITS_TO_LONGS(UNCORE_PMC_IDX_MAX)];
+ u64 tags[UNCORE_PMC_IDX_MAX];
+ struct pci_dev *pci_dev;
+ struct zhaoxin_uncore_pmu *pmu;
+ u64 hrtimer_duration; /* hrtimer timeout for this box */
+ struct hrtimer hrtimer;
+ struct list_head list;
+ struct list_head active_list;
+ void __iomem *io_addr;
+ struct zhaoxin_uncore_extra_reg shared_regs[0];
+};
+
+#define UNCORE_BOX_FLAG_INITIATED 0
+
+struct uncore_event_desc {
+ struct device_attribute attr;
+ const char *config;
+};
+
+ssize_t zx_uncore_event_show(struct device *dev,
+ struct device_attribute *attr, char *buf);
+
+#define ZHAOXIN_UNCORE_EVENT_DESC(_name, _config) \
+{ \
+ .attr = __ATTR(_name, 0444, zx_uncore_event_show, NULL), \
+ .config = _config, \
+}
+
+#define DEFINE_UNCORE_FORMAT_ATTR(_var, _name, _format) \
+static ssize_t __uncore_##_var##_show(struct device *dev, \
+ struct device_attribute *attr, \
+ char *page) \
+{ \
+ BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \
+ return sprintf(page, _format "\n"); \
+} \
+static struct device_attribute format_attr_##_var = \
+ __ATTR(_name, 0444, __uncore_##_var##_show, NULL)
+
+static inline bool uncore_pmc_fixed(int idx)
+{
+ return idx == UNCORE_PMC_IDX_FIXED;
+}
+
+static inline unsigned int uncore_msr_box_offset(struct zhaoxin_uncore_box *box)
+{
+ struct zhaoxin_uncore_pmu *pmu = box->pmu;
+
+ return pmu->type->msr_offsets ?
+ pmu->type->msr_offsets[pmu->pmu_idx] :
+ pmu->type->msr_offset * pmu->pmu_idx;
+}
+
+static inline unsigned int uncore_msr_box_ctl(struct zhaoxin_uncore_box *box)
+{
+ if (!box->pmu->type->box_ctl)
+ return 0;
+ return box->pmu->type->box_ctl + uncore_msr_box_offset(box);
+}
+
+static inline unsigned int uncore_msr_fixed_ctl(struct zhaoxin_uncore_box *box)
+{
+ if (!box->pmu->type->fixed_ctl)
+ return 0;
+ return box->pmu->type->fixed_ctl + uncore_msr_box_offset(box);
+}
+
+static inline unsigned int uncore_msr_fixed_ctr(struct zhaoxin_uncore_box *box)
+{
+ return box->pmu->type->fixed_ctr + uncore_msr_box_offset(box);
+}
+
+static inline
+unsigned int uncore_msr_event_ctl(struct zhaoxin_uncore_box *box, int idx)
+{
+ return box->pmu->type->event_ctl +
+ (box->pmu->type->pair_ctr_ctl ? 2 * idx : idx) +
+ uncore_msr_box_offset(box);
+}
+
+static inline
+unsigned int uncore_msr_perf_ctr(struct zhaoxin_uncore_box *box, int idx)
+{
+ return box->pmu->type->perf_ctr +
+ (box->pmu->type->pair_ctr_ctl ? 2 * idx : idx) +
+ uncore_msr_box_offset(box);
+}
+
+static inline
+unsigned int uncore_fixed_ctl(struct zhaoxin_uncore_box *box)
+{
+ return uncore_msr_fixed_ctl(box);
+}
+
+static inline
+unsigned int uncore_fixed_ctr(struct zhaoxin_uncore_box *box)
+{
+ return uncore_msr_fixed_ctr(box);
+}
+
+static inline
+unsigned int uncore_event_ctl(struct zhaoxin_uncore_box *box, int idx)
+{
+ return uncore_msr_event_ctl(box, idx);
+}
+
+static inline
+unsigned int uncore_perf_ctr(struct zhaoxin_uncore_box *box, int idx)
+{
+ return uncore_msr_perf_ctr(box, idx);
+}
+
+static inline int uncore_perf_ctr_bits(struct zhaoxin_uncore_box *box)
+{
+ return box->pmu->type->perf_ctr_bits;
+}
+
+static inline int uncore_fixed_ctr_bits(struct zhaoxin_uncore_box *box)
+{
+ return box->pmu->type->fixed_ctr_bits;
+}
+
+static inline int uncore_num_counters(struct zhaoxin_uncore_box *box)
+{
+ return box->pmu->type->num_counters;
+}
+
+static inline void uncore_disable_box(struct zhaoxin_uncore_box *box)
+{
+ if (box->pmu->type->ops->disable_box)
+ box->pmu->type->ops->disable_box(box);
+}
+
+static inline void uncore_enable_box(struct zhaoxin_uncore_box *box)
+{
+ if (box->pmu->type->ops->enable_box)
+ box->pmu->type->ops->enable_box(box);
+}
+
+static inline void uncore_disable_event(struct zhaoxin_uncore_box *box,
+ struct perf_event *event)
+{
+ box->pmu->type->ops->disable_event(box, event);
+}
+
+static inline void uncore_enable_event(struct zhaoxin_uncore_box *box,
+ struct perf_event *event)
+{
+ box->pmu->type->ops->enable_event(box, event);
+}
+
+static inline u64 uncore_read_counter(struct zhaoxin_uncore_box *box,
+ struct perf_event *event)
+{
+ return box->pmu->type->ops->read_counter(box, event);
+}
+
+static inline void uncore_box_init(struct zhaoxin_uncore_box *box)
+{
+ if (!test_and_set_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags)) {
+ if (box->pmu->type->ops->init_box)
+ box->pmu->type->ops->init_box(box);
+ }
+}
+
+static inline void uncore_box_exit(struct zhaoxin_uncore_box *box)
+{
+ if (test_and_clear_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags)) {
+ if (box->pmu->type->ops->exit_box)
+ box->pmu->type->ops->exit_box(box);
+ }
+}
+
+static inline bool uncore_box_is_fake(struct zhaoxin_uncore_box *box)
+{
+ return (box->package_id < 0);
+}
+
+static inline struct zhaoxin_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
+{
+ return container_of(event->pmu, struct zhaoxin_uncore_pmu, pmu);
+}
+
+static inline struct zhaoxin_uncore_box *uncore_event_to_box(struct perf_event *event)
+{
+ return event->pmu_private;
+}
+
+
+static struct zhaoxin_uncore_box *uncore_pmu_to_box(struct zhaoxin_uncore_pmu *pmu, int cpu);
+static u64 uncore_msr_read_counter(struct zhaoxin_uncore_box *box, struct perf_event *event);
+
+static void uncore_pmu_start_hrtimer(struct zhaoxin_uncore_box *box);
+static void uncore_pmu_cancel_hrtimer(struct zhaoxin_uncore_box *box);
+static void uncore_pmu_event_start(struct perf_event *event, int flags);
+static void uncore_pmu_event_stop(struct perf_event *event, int flags);
+static int uncore_pmu_event_add(struct perf_event *event, int flags);
+static void uncore_pmu_event_del(struct perf_event *event, int flags);
+static void uncore_pmu_event_read(struct perf_event *event);
+static void uncore_perf_event_update(struct zhaoxin_uncore_box *box, struct perf_event *event);
+struct event_constraint *
+uncore_get_constraint(struct zhaoxin_uncore_box *box, struct perf_event *event);
+void uncore_put_constraint(struct zhaoxin_uncore_box *box, struct perf_event *event);
+u64 uncore_shared_reg_config(struct zhaoxin_uncore_box *box, int idx);
+
+void chx_uncore_cpu_init(void);
--
2.20.1

09 Jul '21
Some ACPI devices need to issue DMA requests to access the reserved
memory area. The BIOS uses the device scope type ACPI_NAMESPACE_DEVICE
in the RMRR to report these ACPI devices. This patch adds support for
detecting ACPI devices in the RMRR and, in order to distinguish them
from PCI devices, modifies some interface functions.
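For illustration only, here is a minimal sketch (not part of the patch; the
helper name is hypothetical and it assumes nothing beyond the standard ACPICA
scope layout that the diff below also relies on) of how the device scope list
attached to an RMRR can be walked to spot ACPI namespace devices:

	#include <linux/acpi.h>
	#include <linux/printk.h>

	/*
	 * Sketch only: 'start' and 'end' bound the scope list that follows an
	 * RMRR header, exactly as in dmar_acpi_insert_dev_scope() below.
	 */
	static void sketch_walk_rmrr_scopes(void *start, void *end)
	{
		struct acpi_dmar_device_scope *scope;

		for (; start < end; start += scope->length) {
			scope = start;
			if (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_NAMESPACE)
				pr_info("RMRR scope covers ANDD enumeration ID %u\n",
					scope->enumeration_id);
		}
	}

The patch performs the same walk inside dmar_acpi_insert_dev_scope() and, on a
match, records the ACPI device in the RMRR's device list so later lookups treat
it like any other device covered by the reserved region.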
Signed-off-by: LeoLiu-oc <LeoLiu-oc(a)zhaoxin.com>
---
drivers/iommu/intel/dmar.c | 77 +++++++++++++++++++--------------
drivers/iommu/intel/iommu.c | 86 ++++++++++++++++++++++++++++++++++---
drivers/iommu/iommu.c | 6 +++
include/linux/dmar.h | 11 ++++-
include/linux/iommu.h | 3 ++
5 files changed, 142 insertions(+), 41 deletions(-)
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index b8d0b56a7575..1d705589fe21 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -215,7 +215,7 @@ static bool dmar_match_pci_path(struct dmar_pci_notify_info *info, int bus,
}
/* Return: > 0 if match found, 0 if no match found, < 0 if error happens */
-int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
+int dmar_pci_insert_dev_scope(struct dmar_pci_notify_info *info,
void *start, void*end, u16 segment,
struct dmar_dev_scope *devices,
int devices_cnt)
@@ -304,7 +304,7 @@ static int dmar_pci_bus_add_dev(struct dmar_pci_notify_info *info)
drhd = container_of(dmaru->hdr,
struct acpi_dmar_hardware_unit, header);
- ret = dmar_insert_dev_scope(info, (void *)(drhd + 1),
+ ret = dmar_pci_insert_dev_scope(info, (void *)(drhd + 1),
((void *)drhd) + drhd->header.length,
dmaru->segment,
dmaru->devices, dmaru->devices_cnt);
@@ -719,47 +719,58 @@ dmar_find_matched_drhd_unit(struct pci_dev *dev)
return dmaru;
}
-static void __init dmar_acpi_insert_dev_scope(u8 device_number,
- struct acpi_device *adev)
+/* Return: > 0 if match found, 0 if no match found */
+bool dmar_acpi_insert_dev_scope(u8 device_number,
+ struct acpi_device *adev,
+ void *start, void *end,
+ struct dmar_dev_scope *devices,
+ int devices_cnt)
{
- struct dmar_drhd_unit *dmaru;
- struct acpi_dmar_hardware_unit *drhd;
struct acpi_dmar_device_scope *scope;
struct device *tmp;
int i;
struct acpi_dmar_pci_path *path;
+ for (; start < end; start += scope->length) {
+ scope = start;
+ if (scope->entry_type != ACPI_DMAR_SCOPE_TYPE_NAMESPACE)
+ continue;
+ if (scope->enumeration_id != device_number)
+ continue;
+ path = (void *)(scope + 1);
+ for_each_dev_scope(devices, devices_cnt, i, tmp)
+ if (tmp == NULL) {
+ devices[i].bus = scope->bus;
+ devices[i].devfn = PCI_DEVFN(path->device, path->function);
+ rcu_assign_pointer(devices[i].dev,
+ get_device(&adev->dev));
+ return true;
+ }
+ WARN_ON(i >= devices_cnt);
+ }
+ return false;
+}
+
+static int dmar_acpi_bus_add_dev(u8 device_number, struct acpi_device *adev)
+{
+ struct dmar_drhd_unit *dmaru;
+ struct acpi_dmar_hardware_unit *drhd;
+ int ret;
+
for_each_drhd_unit(dmaru) {
drhd = container_of(dmaru->hdr,
struct acpi_dmar_hardware_unit,
header);
-
- for (scope = (void *)(drhd + 1);
- (unsigned long)scope < ((unsigned long)drhd) + drhd->header.length;
- scope = ((void *)scope) + scope->length) {
- if (scope->entry_type != ACPI_DMAR_SCOPE_TYPE_NAMESPACE)
- continue;
- if (scope->enumeration_id != device_number)
- continue;
-
- path = (void *)(scope + 1);
- pr_info("ACPI device \"%s\" under DMAR at %llx as
%02x:%02x.%d\n",
- dev_name(&adev->dev), dmaru->reg_base_addr,
- scope->bus, path->device, path->function);
- for_each_dev_scope(dmaru->devices, dmaru->devices_cnt, i, tmp)
- if (tmp == NULL) {
- dmaru->devices[i].bus = scope->bus;
- dmaru->devices[i].devfn = PCI_DEVFN(path->device,
- path->function);
- rcu_assign_pointer(dmaru->devices[i].dev,
- get_device(&adev->dev));
- return;
- }
- BUG_ON(i >= dmaru->devices_cnt);
- }
+ ret = dmar_acpi_insert_dev_scope(device_number, adev, (void *)(drhd+1),
+ ((void *)drhd)+drhd->header.length,
+ dmaru->devices, dmaru->devices_cnt);
+ if (ret)
+ break;
}
- pr_warn("No IOMMU scope found for ANDD enumeration ID %d (%s)\n",
- device_number, dev_name(&adev->dev));
+ if (ret > 0)
+ ret = dmar_rmrr_add_acpi_dev(device_number, adev);
+
+ return ret;
}
static int __init dmar_acpi_dev_scope_init(void)
@@ -788,7 +799,7 @@ static int __init dmar_acpi_dev_scope_init(void)
andd->device_name);
continue;
}
- dmar_acpi_insert_dev_scope(andd->device_number, adev);
+ dmar_acpi_bus_add_dev(andd->device_number, adev);
}
}
return 0;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6cc6ef585aa4..5f2b7a64d2c7 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4600,6 +4600,25 @@ int dmar_find_matched_atsr_unit(struct pci_dev *dev)
return ret;
}
+int dmar_rmrr_add_acpi_dev(u8 device_number, struct acpi_device *adev)
+{
+ int ret;
+ struct dmar_rmrr_unit *rmrru;
+ struct acpi_dmar_reserved_memory *rmrr;
+
+ list_for_each_entry(rmrru, &dmar_rmrr_units, list) {
+ rmrr = container_of(rmrru->hdr,
+ struct acpi_dmar_reserved_memory,
+ header);
+ ret = dmar_acpi_insert_dev_scope(device_number, adev, (void *)(rmrr + 1),
+ ((void *)rmrr) + rmrr->header.length,
+ rmrru->devices, rmrru->devices_cnt);
+ if (ret)
+ break;
+ }
+ return 0;
+}
+
int dmar_iommu_notify_scope_dev(struct dmar_pci_notify_info *info)
{
int ret;
@@ -4615,7 +4634,7 @@ int dmar_iommu_notify_scope_dev(struct dmar_pci_notify_info *info)
rmrr = container_of(rmrru->hdr,
struct acpi_dmar_reserved_memory, header);
if (info->event == BUS_NOTIFY_ADD_DEVICE) {
- ret = dmar_insert_dev_scope(info, (void *)(rmrr + 1),
+ ret = dmar_pci_insert_dev_scope(info, (void *)(rmrr + 1),
((void *)rmrr) + rmrr->header.length,
rmrr->segment, rmrru->devices,
rmrru->devices_cnt);
@@ -4633,7 +4652,7 @@ int dmar_iommu_notify_scope_dev(struct dmar_pci_notify_info *info)
atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
if (info->event == BUS_NOTIFY_ADD_DEVICE) {
- ret = dmar_insert_dev_scope(info, (void *)(atsr + 1),
+ ret = dmar_pci_insert_dev_scope(info, (void *)(atsr + 1),
(void *)atsr + atsr->header.length,
atsr->segment, atsru->devices,
atsru->devices_cnt);
@@ -4872,6 +4891,22 @@ static int __init platform_optin_force_iommu(void)
return 1;
}
+static int acpi_device_create_direct_mappings(struct device *pn_dev, struct device *acpi_device)
+{
+ struct iommu_group *group;
+
+ acpi_device->bus->iommu_ops = &intel_iommu_ops;
+ group = iommu_group_get(pn_dev);
+ if (!group) {
+ pr_warn("ACPI name space devices create direct mappings wrong!\n");
+ return -EINVAL;
+ }
+ printk(KERN_INFO "pn_dev:%s enter to %s\n", dev_name(pn_dev),
__func__);
+ __acpi_device_create_direct_mappings(group, acpi_device);
+
+ return 0;
+}
+
static int __init probe_acpi_namespace_devices(void)
{
struct dmar_drhd_unit *drhd;
@@ -4879,6 +4914,7 @@ static int __init probe_acpi_namespace_devices(void)
struct intel_iommu *iommu __maybe_unused;
struct device *dev;
int i, ret = 0;
+ u8 bus, devfn;
for_each_active_iommu(iommu, drhd) {
for_each_active_dev_scope(drhd->devices,
@@ -4887,6 +4923,8 @@ static int __init probe_acpi_namespace_devices(void)
struct iommu_group *group;
struct acpi_device *adev;
+ struct device *pn_dev = NULL;
+ struct device_domain_info *info = NULL;
if (dev->bus != &acpi_bus_type)
continue;
@@ -4896,19 +4934,53 @@ static int __init probe_acpi_namespace_devices(void)
&adev->physical_node_list, node) {
group = iommu_group_get(pn->dev);
if (group) {
+ pn_dev = pn->dev;
iommu_group_put(group);
continue;
}
- pn->dev->bus->iommu_ops = &intel_iommu_ops;
- ret = iommu_probe_device(pn->dev);
- if (ret)
- break;
+ iommu = device_to_iommu(dev, &bus, &devfn);
+ if (!iommu)
+ return -ENODEV;
+ info = dmar_search_domain_by_dev_info(iommu->segment, bus, devfn);
+ if (!info) {
+ pn->dev->bus->iommu_ops = &intel_iommu_ops;
+ ret = iommu_probe_device(pn->dev);
+ if (ret) {
+ pr_err("pn->dev:%s probe fail! ret:%d\n",
+ dev_name(pn->dev), ret);
+ goto unlock;
+ }
+ }
+ pn_dev = pn->dev;
+ }
+ if (!pn_dev) {
+ iommu = device_to_iommu(dev, &bus, &devfn);
+ if (!iommu)
+ return -ENODEV;
+ info = dmar_search_domain_by_dev_info(iommu->segment, bus, devfn);
+ if (!info) {
+ dev->bus->iommu_ops = &intel_iommu_ops;
+ ret = iommu_probe_device(dev);
+ if (ret) {
+ pr_err("dev:%s probe fail! ret:%d\n",
+ dev_name(dev), ret);
+ goto unlock;
+ }
+ goto unlock;
+ }
}
+ if (!info)
+ ret = acpi_device_create_direct_mappings(pn_dev, dev);
+ else
+ ret = acpi_device_create_direct_mappings(info->dev, dev);
+unlock:
mutex_unlock(&adev->physical_node_lock);
- if (ret)
+ if (ret) {
+ pr_err("%s fail! ret:%d\n", __func__, ret);
return ret;
+ }
}
}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 86e3dbdfb7bd..6212eb1856f5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -814,6 +814,12 @@ static bool iommu_is_attach_deferred(struct iommu_domain *domain,
return false;
}
+void __acpi_device_create_direct_mappings(struct iommu_group *group, struct device *acpi_device)
+{
+ iommu_create_device_direct_mappings(group, acpi_device);
+}
+EXPORT_SYMBOL_GPL(__acpi_device_create_direct_mappings);
+
/**
* iommu_group_add_device - add a device to an iommu group
* @group: the group into which to add the device (reference should be held)
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 65565820328a..248e3c2feeae 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -113,10 +113,13 @@ extern int dmar_parse_dev_scope(void *start, void *end, int *cnt,
struct dmar_dev_scope **devices, u16 segment);
extern void *dmar_alloc_dev_scope(void *start, void *end, int *cnt);
extern void dmar_free_dev_scope(struct dmar_dev_scope **devices, int *cnt);
-extern int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
+extern int dmar_pci_insert_dev_scope(struct dmar_pci_notify_info *info,
void *start, void*end, u16 segment,
struct dmar_dev_scope *devices,
int devices_cnt);
+extern bool dmar_acpi_insert_dev_scope(u8 device_number,
+ struct acpi_device *adev, void *start, void *end,
+ struct dmar_dev_scope *devices, int devices_cnt);
extern int dmar_remove_dev_scope(struct dmar_pci_notify_info *info,
u16 segment, struct dmar_dev_scope *devices,
int count);
@@ -140,6 +143,7 @@ extern int dmar_parse_one_atsr(struct acpi_dmar_header *header, void *arg);
extern int dmar_check_one_atsr(struct acpi_dmar_header *hdr, void *arg);
extern int dmar_release_one_atsr(struct acpi_dmar_header *hdr, void *arg);
extern int dmar_iommu_hotplug(struct dmar_drhd_unit *dmaru, bool insert);
+extern int dmar_rmrr_add_acpi_dev(u8 device_number, struct acpi_device *adev);
extern int dmar_iommu_notify_scope_dev(struct dmar_pci_notify_info *info);
#else /* !CONFIG_INTEL_IOMMU: */
static inline int intel_iommu_init(void) { return -ENODEV; }
@@ -150,6 +154,11 @@ static inline void intel_iommu_shutdown(void) { }
#define dmar_check_one_atsr dmar_res_noop
#define dmar_release_one_atsr dmar_res_noop
+static inline int dmar_rmrr_add_acpi_dev(u8 device_number, struct acpi_device *adev)
+{
+ return 0;
+}
+
static inline int dmar_iommu_notify_scope_dev(struct dmar_pci_notify_info *info)
{
return 0;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 3ff424d4f481..66ae2b7d65de 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -546,6 +546,9 @@ extern void iommu_domain_window_disable(struct iommu_domain *domain, u32 wnd_nr)
extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
unsigned long iova, int flags);
+extern void __acpi_device_create_direct_mappings(struct iommu_group *group,
+ struct device *acpi_device);
+
static inline void iommu_flush_iotlb_all(struct iommu_domain *domain)
{
if (domain->ops->flush_iotlb_all)
--
2.20.1

09 Jul '21
This bug was found on a Zhaoxin platform, but it is a bug in common code.
Fail sequence:
step1: Unbind the UHCI controller from its native driver.
step2: Bind the UHCI controller to vfio-pci. This puts the UHCI controller on a
       vfio group's device list and sets UHCI's dev->driver_data to the vfio-pci
       private structure (for UHCI).
step3: Unbind the EHCI controller from its native driver. This tries to tell the
       UHCI native driver "I am being removed" by setting
       companion_hcd->self.hs_companion to NULL. However, companion_hcd is derived
       from UHCI's dev->driver_data, which vfio-pci has already overwritten, so the
       vfio-pci structure gets corrupted.
step4: Bind the EHCI controller to vfio-pci, which puts the EHCI controller in the
       same vfio group as the UHCI controller.
... ...
step5: Unbind the UHCI controller from vfio-pci, which deletes UHCI from the vfio
       group's device list that was corrupted in step 3. The delete operation can
       therefore randomly end in a NULL pointer dereference with the stack dump
       below.
step6: Bind the UHCI controller to its native driver.
step7: Unbind the EHCI controller from vfio-pci, which tries to remove the EHCI
       controller from the vfio group.
step8: Bind the EHCI controller to its native driver.
[ 929.114641] uhci_hcd 0000:00:10.0: remove, state 1
[ 929.114652] usb usb1: USB disconnect, device number 1
[ 929.114655] usb 1-1: USB disconnect, device number 2
[ 929.270313] usb 1-2: USB disconnect, device number 3
[ 929.318404] uhci_hcd 0000:00:10.0: USB bus 1 deregistered
[ 929.343029] uhci_hcd 0000:00:10.1: remove, state 4
[ 929.343045] usb usb3: USB disconnect, device number 1
[ 929.343685] uhci_hcd 0000:00:10.1: USB bus 3 deregistered
[ 929.369087] ehci-pci 0000:00:10.7: remove, state 4
[ 929.369102] usb usb4: USB disconnect, device number 1
[ 929.370325] ehci-pci 0000:00:10.7: USB bus 4 deregistered
[ 932.398494] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 932.398496] PGD 42a67d067 P4D 42a67d067 PUD 42a65f067 PMD 0
[ 932.398502] Oops: 0002 [#2] SMP NOPTI
[ 932.398505] CPU: 2 PID: 7824 Comm: vfio_unbind.sh Tainted: P D 4.19.65-2020051917-rainos #1
[ 932.398506] Hardware name: Shanghai Zhaoxin Semiconductor Co., Ltd. HX002EH/HX002EH, BIOS HX002EH0_01_R480_R_200408 04/08/2020
[ 932.398513] RIP: 0010:vfio_device_put+0x31/0xa0 [vfio]
[ 932.398515] Code: 89 e5 41 54 53 4c 8b 67 18 48 89 fb 49 8d 74 24 30 e8 e3 0e f3 de 84 c0 74 67 48 8b 53 20 48 8b 43 28 48 8b 7b 18 48 89 42 08 <48> 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 20 48 b8 00 02 00
[ 932.398516] RSP: 0018:ffffbbfd04cffc18 EFLAGS: 00010202
[ 932.398518] RAX: 0000000000000000 RBX: ffff92c7ea717880 RCX: 0000000000000000
[ 932.398519] RDX: ffff92c7ea713620 RSI: ffff92c7ea713630 RDI: ffff92c7ea713600
[ 932.398521] RBP: ffffbbfd04cffc28 R08: ffff92c7f02a8080 R09: ffff92c7efc03980
[ 932.398522] R10: ffffbbfd04cff9a8 R11: 0000000000000000 R12: ffff92c7ea713600
[ 932.398523] R13: ffff92c7ed8bb0a8 R14: ffff92c7ea717880 R15: 0000000000000000
[ 932.398525] FS: 00007f3031500740(0000) GS:ffff92c7f0280000(0000) knlGS:0000000000000000
[ 932.398526] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 932.398527] CR2: 0000000000000000 CR3: 0000000428626004 CR4: 0000000000160ee0
[ 932.398528] Call Trace:
[ 932.398534]  vfio_del_group_dev+0xe8/0x2a0 [vfio]
[ 932.398539]  ? __blocking_notifier_call_chain+0x52/0x60
[ 932.398542]  ? do_wait_intr_irq+0x90/0x90
[ 932.398546]  ? iommu_bus_notifier+0x75/0x100
[ 932.398551]  vfio_pci_remove+0x20/0xa0 [vfio_pci]
[ 932.398554]  pci_device_remove+0x3e/0xc0
[ 932.398557]  device_release_driver_internal+0x17a/0x240
[ 932.398560]  device_release_driver+0x12/0x20
[ 932.398561]  unbind_store+0xee/0x180
[ 932.398564]  drv_attr_store+0x27/0x40
[ 932.398567]  sysfs_kf_write+0x3c/0x50
[ 932.398568]  kernfs_fop_write+0x125/0x1a0
[ 932.398572]  __vfs_write+0x3a/0x190
[ 932.398575]  ? apparmor_file_permission+0x1a/0x20
[ 932.398577]  ? security_file_permission+0x3b/0xc0
[ 932.398581]  ? _cond_resched+0x1a/0x50
[ 932.398582]  vfs_write+0xb8/0x1b0
[ 932.398584]  ksys_write+0x5c/0xe0
[ 932.398586]  __x64_sys_write+0x1a/0x20
[ 932.398589]  do_syscall_64+0x5a/0x110
[ 932.398592]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Booting a guest OS with virt-manager/qemu reproduces the same fail sequence.
Fix this by checking whether the PCI driver bound to the companion USB
controller is a kernel native HCD driver. If it is not, do not touch that
controller's dev->driver_data.
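As a minimal sketch of that check (illustrative only; the helper name is
hypothetical and the driver-name strings simply mirror the hunk below), the
companion lookup can refuse to trust dev->driver_data unless a native HCD
driver is bound:

	#include <linux/pci.h>
	#include <linux/string.h>
	#include <linux/usb/hcd.h>

	/*
	 * Sketch: return the companion's usb_hcd only when a native HCD driver
	 * owns the device; otherwise (e.g. vfio-pci) return NULL so the caller
	 * leaves the companion's driver_data alone.
	 */
	static struct usb_hcd *companion_hcd_if_native(struct pci_dev *companion)
	{
		struct pci_driver *drv = companion->driver;

		if (!drv)
			return NULL;
		if (strcmp(drv->name, "uhci_hcd") &&
		    strcmp(drv->name, "ohci_hcd") &&
		    strcmp(drv->name, "ehci_hcd"))
			return NULL;

		return pci_get_drvdata(companion);
	}

When vfio-pci owns the companion function, pci_get_drvdata() returns vfio-pci
private data rather than a usb_hcd, which is exactly the corruption path
described in step 3 above.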
Signed-off-by: LeoLiu-oc <LeoLiu-oc(a)zhaoxin.com>
---
drivers/usb/core/hcd-pci.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
index ec0d6c50610c..000ee7a6731f 100644
--- a/drivers/usb/core/hcd-pci.c
+++ b/drivers/usb/core/hcd-pci.c
@@ -49,6 +49,7 @@ static void for_each_companion(struct pci_dev *pdev, struct usb_hcd *hcd,
struct pci_dev *companion;
struct usb_hcd *companion_hcd;
unsigned int slot = PCI_SLOT(pdev->devfn);
+ struct pci_driver *drv;
/*
* Iterate through other PCI functions in the same slot.
@@ -61,6 +62,15 @@ static void for_each_companion(struct pci_dev *pdev, struct usb_hcd *hcd,
PCI_SLOT(companion->devfn) != slot)
continue;
+ drv = companion->driver;
+ if (!drv)
+ continue;
+
+ if (strncmp(drv->name, "uhci_hcd", sizeof("uhci_hcd") - 1) &&
+ strncmp(drv->name, "ooci_hcd", sizeof("uhci_hcd") - 1) &&
+ strncmp(drv->name, "ehci_hcd", sizeof("uhci_hcd") - 1))
+ continue;
+
/*
* Companion device should be either UHCI,OHCI or EHCI host
* controller, otherwise skip.
--
2.20.1