Kernel
June 2022
- 18 participants
- 65 discussions
Backport 5.10.109 LTS patches from upstream
nds32: fix access_ok() checks in get/put_user
wcn36xx: Differentiate wcn3660 from wcn3620
tpm: use try_get_ops() in tpm-space.c
mac80211: fix potential double free on mesh join
rcu: Don't deboost before reporting expedited quiescent state
Revert "ath: add support for special 0x0 regulatory domain"
crypto: qat - disable registration of algorithms
ACPI: video: Force backlight native for Clevo NL5xRU and NL5xNU
ACPI: battery: Add device HID and quirk for Microsoft Surface Go 3
ACPI / x86: Work around broken XSDT on Advantech DAC-BJ01 board
drivers: net: xgene: Fix regression in CRC stripping
ALSA: pci: fix reading of swapped values from pcmreg in AC97 codec
ALSA: cmipci: Restore aux vol on suspend/resume
ALSA: usb-audio: Add mute TLV for playback volumes on RODE NT-USB
ALSA: pcm: Add stream lock during PCM reset ioctl operations
ALSA: pcm: Fix races among concurrent prealloc proc writes
ALSA: pcm: Fix races among concurrent prepare and hw_params/hw_free calls
ALSA: pcm: Fix races among concurrent read/write and buffer changes
ALSA: pcm: Fix races among concurrent hw_params and hw_free calls
ALSA: hda/realtek: Add quirk for ASUS GA402
ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc671
ALSA: hda/realtek: Add quirk for Clevo NP50PNJ
ALSA: hda/realtek: Add quirk for Clevo NP70PNJ
ALSA: usb-audio: add mapping for new Corsair Virtuoso SE
ALSA: oss: Fix PCM OSS buffer allocation overflow
ASoC: sti: Fix deadlock via snd_pcm_stop_xrun() call
staging: fbtft: fb_st7789v: reset display before initialization
tpm: Fix error handling in async work
cgroup-v1: Correct privileges check in release_agent writes
exfat: avoid incorrectly releasing for root inode
net: ipv6: fix skb_over_panic in __ip6_append_data
Already merged:
llc: only change llc->dev when bind() succeeds
netfilter: nf_tables: initialize registers in nft_do_chain()
llc: fix netdevice reference leaks in llc_ui_bind()
cgroup: Use open-time cgroup namespace for process migration perm checks
cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv
nfc: st21nfca: Fix potential buffer overflows in EVT_TRANSACTION
Total patches: 37 - 6 = 31
Arnd Bergmann (1):
nds32: fix access_ok() checks in get/put_user
Brian Norris (1):
Revert "ath: add support for special 0x0 regulatory domain"
Bryan O'Donoghue (1):
wcn36xx: Differentiate wcn3660 from wcn3620
Chen Li (1):
exfat: avoid incorrectly releasing for root inode
Giacomo Guiduzzi (1):
ALSA: pci: fix reading of swapped values from pcmreg in AC97 codec
Giovanni Cabiddu (1):
crypto: qat - disable registration of algorithms
James Bottomley (1):
tpm: use try_get_ops() in tpm-space.c
Jason Zheng (1):
ALSA: hda/realtek: Add quirk for ASUS GA402
Jonathan Teh (1):
ALSA: cmipci: Restore aux vol on suspend/resume
Lars-Peter Clausen (1):
ALSA: usb-audio: Add mute TLV for playback volumes on RODE NT-USB
Linus Lüssing (1):
mac80211: fix potential double free on mesh join
Mark Cilissen (1):
ACPI / x86: Work around broken XSDT on Advantech DAC-BJ01 board
Maximilian Luz (1):
ACPI: battery: Add device HID and quirk for Microsoft Surface Go 3
Michal Koutný (1):
cgroup-v1: Correct privileges check in release_agent writes
Oliver Graute (1):
staging: fbtft: fb_st7789v: reset display before initialization
Paul E. McKenney (1):
rcu: Don't deboost before reporting expedited quiescent state
Reza Jahanbakhshi (1):
ALSA: usb-audio: add mapping for new Corsair Virtuoso SE
Stephane Graber (1):
drivers: net: xgene: Fix regression in CRC stripping
Tadeusz Struk (2):
net: ipv6: fix skb_over_panic in __ip6_append_data
tpm: Fix error handling in async work
Takashi Iwai (7):
ASoC: sti: Fix deadlock via snd_pcm_stop_xrun() call
ALSA: oss: Fix PCM OSS buffer allocation overflow
ALSA: pcm: Fix races among concurrent hw_params and hw_free calls
ALSA: pcm: Fix races among concurrent read/write and buffer changes
ALSA: pcm: Fix races among concurrent prepare and hw_params/hw_free calls
ALSA: pcm: Fix races among concurrent prealloc proc writes
ALSA: pcm: Add stream lock during PCM reset ioctl operations
Tim Crawford (2):
ALSA: hda/realtek: Add quirk for Clevo NP70PNJ
ALSA: hda/realtek: Add quirk for Clevo NP50PNJ
Werner Sembach (1):
ACPI: video: Force backlight native for Clevo NL5xRU and NL5xNU
huangwenhui (1):
ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc671
arch/nds32/include/asm/uaccess.h | 22 ++++-
arch/x86/kernel/acpi/boot.c | 24 +++++
drivers/acpi/battery.c | 12 +++
drivers/acpi/video_detect.c | 75 ++++++++++++++
drivers/char/tpm/tpm-dev-common.c | 8 +-
drivers/char/tpm/tpm2-space.c | 8 +-
drivers/crypto/qat/qat_common/qat_crypto.c | 8 ++
.../net/ethernet/apm/xgene/xgene_enet_main.c | 12 ++-
drivers/net/wireless/ath/regd.c | 10 +-
drivers/net/wireless/ath/wcn36xx/main.c | 3 +
drivers/net/wireless/ath/wcn36xx/wcn36xx.h | 1 +
drivers/staging/fbtft/fb_st7789v.c | 2 +
fs/exfat/super.c | 2 +-
include/sound/pcm.h | 1 +
kernel/cgroup/cgroup-v1.c | 6 +-
kernel/rcu/tree_plugin.h | 9 +-
net/ipv6/ip6_output.c | 4 +-
net/mac80211/cfg.c | 3 -
sound/core/oss/pcm_oss.c | 12 ++-
sound/core/oss/pcm_plugin.c | 5 +-
sound/core/pcm.c | 2 +
sound/core/pcm_lib.c | 4 +
sound/core/pcm_memory.c | 11 ++-
sound/core/pcm_native.c | 97 ++++++++++++-------
sound/pci/ac97/ac97_codec.c | 4 +-
sound/pci/cmipci.c | 3 +-
sound/pci/hda/patch_realtek.c | 4 +
sound/soc/sti/uniperif_player.c | 6 +-
sound/soc/sti/uniperif_reader.c | 2 +-
sound/usb/mixer_maps.c | 10 ++
sound/usb/mixer_quirks.c | 7 +-
31 files changed, 289 insertions(+), 88 deletions(-)
--
2.20.1

[PATCH openEuler-1.0-LTS] mm/memcontrol: fix wrong vmstats for dying memcg
by Yongqiang Liu 27 Jun '22
From: Lu Jialin <lujialin4(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5E8LA
CVE: NA
--------------------------------
Currently, a per-CPU delta in stat_cpu->count is folded into stat only
once its absolute value exceeds MEMCG_CHARGE_BATCH, so stat always
lags the correct value by some amount.
In addition, a partially deleted memcg that is still referenced is not
freed immediately after it goes offline. Even though such a memcg has
released its pages, its own stat and its parent's stat can remain
nonzero or too large because of the update lag, which makes the
total_<count> values in the memory.stat file abnormal.
This patch keeps memcg's stat in sync with the correct value during
destruction in two ways:
1) Flush the per-CPU counters when the memcg goes offline
2) For a memcg that is being destroyed, bypass the threshold check
when updating vmstats
Signed-off-by: Lu Jialin <lujialin4(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13(a)huawei.com>
---
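The two-part fix described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `struct counter`, `counter_mod()`, `counter_flush()`, `counter_offline()`, `NR_CPUS` and `CHARGE_BATCH` are hypothetical names, and the model sets the dying flag and flushes in one step, whereas in the kernel CSS_DYING is set by the cgroup core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS      4   /* hypothetical CPU count for the model */
#define CHARGE_BATCH 32  /* stands in for MEMCG_CHARGE_BATCH */

struct counter {
	long global;           /* value a reader of memory.stat would see */
	long percpu[NR_CPUS];  /* deltas not yet folded into global */
	bool dying;            /* stands in for css.flags & CSS_DYING */
};

/* Add val on one CPU; fold into global past the batch, or always when dying. */
static void counter_mod(struct counter *c, int cpu, long val)
{
	long x;

	assert(cpu >= 0 && cpu < NR_CPUS);
	x = val + c->percpu[cpu];
	if (labs(x) > CHARGE_BATCH || c->dying) {
		c->global += x;
		x = 0;
	}
	c->percpu[cpu] = x;
}

/* Fold every per-CPU remainder into global and clear it. */
static void counter_flush(struct counter *c)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		c->global += c->percpu[cpu];
		c->percpu[cpu] = 0;
	}
}

/* Model of going offline: mark as dying, then flush what is pending. */
static void counter_offline(struct counter *c)
{
	c->dying = true;
	counter_flush(c);
}
```

In this model, small updates stay invisible in `global` until the batch is exceeded. After `counter_offline()`, every pending delta has been folded in and later updates take effect immediately. Clearing the per-CPU slot during the flush matters: without it, a later flush would count the same delta twice.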
mm/memcontrol.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2983baf910f4..345a9d159ad8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -697,7 +697,8 @@ void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
return;
x = val + __this_cpu_read(memcg->stat_cpu->count[idx]);
- if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
+ if (unlikely(abs(x) > MEMCG_CHARGE_BATCH ||
+ memcg->css.flags & CSS_DYING)) {
struct mem_cgroup *mi;
struct mem_cgroup_extension *memcg_ext;
@@ -3244,8 +3245,10 @@ static void memcg_flush_percpu_vmstats(struct mem_cgroup *memcg)
stat[i] = 0;
for_each_online_cpu(cpu)
- for (i = 0; i < MEMCG_NR_STAT; i++)
+ for (i = 0; i < MEMCG_NR_STAT; i++) {
stat[i] += per_cpu(memcg->stat_cpu->count[i], cpu);
+ per_cpu(memcg->stat_cpu->count[i], cpu) = 0;
+ }
for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
for (i = 0; i < MEMCG_NR_STAT; i++)
@@ -3259,9 +3262,11 @@ static void memcg_flush_percpu_vmstats(struct mem_cgroup *memcg)
stat[i] = 0;
for_each_online_cpu(cpu)
- for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
+ for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
stat[i] += per_cpu(
pn->lruvec_stat_cpu->count[i], cpu);
+ per_cpu(pn->lruvec_stat_cpu->count[i], cpu) = 0;
+ }
for (pi = pn; pi; pi = parent_nodeinfo(pi, node))
for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
@@ -3279,9 +3284,11 @@ static void memcg_flush_percpu_vmevents(struct mem_cgroup *memcg)
events[i] = 0;
for_each_online_cpu(cpu)
- for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
+ for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
events[i] += per_cpu(memcg->stat_cpu->events[i],
cpu);
+ per_cpu(memcg->stat_cpu->events[i], cpu) = 0;
+ }
for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
@@ -5106,6 +5113,9 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
memcg_offline_kmem(memcg);
wb_memcg_offline(memcg);
+ memcg_flush_percpu_vmstats(memcg);
+ memcg_flush_percpu_vmevents(memcg);
+
mem_cgroup_id_put(memcg);
}
--
2.25.1
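Hi all,
The Intel Arch regular meeting is scheduled for this Tuesday, 6/28, 10:00-11:00 AM. Everyone is welcome to propose additional requirements or topics and to join the discussion.
Preliminary agenda for this meeting: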
Agenda:
*Status update
*SPR feature PRs merge into intel-kernel & OLK-5.10 kernel
*Compiler support for new instructions
*Support 22.09 release for SPR fundamental features
-----Original Appointment-----
From: openEuler conference <public(a)openeuler.org>
Sent: Monday, June 20, 2022 3:10 PM
To: openEuler conference; jun.j.tian@intel.com,kai.liu@suse.com
Subject: sig-Intel-Arch
When: Tuesday, June 28, 2022 10:00 AM-11:00 AM (UTC+08:00) Beijing, Chongqing, Hong Kong, Urumqi.
Where:
Hello!
The openEuler sig-Intel-Arch SIG invites you to a Zoom meeting to be held at 2022-06-28 10:00.
Meeting subject: sig-Intel-Arch
Meeting link: https://us06web.zoom.us/j/81976528831?pwd=cVIxUkRhUXFGcldFV0ZtNkpvUFpxZz09
Meeting minutes and topics: https://etherpad.openeuler.org/p/sig-Intel-Arch-meetings
Note: You are advised to change your participant name after joining the meeting, or to use your gitee.com ID.
More information: https://openeuler.org/en/

[PATCH openEuler-1.0-LTS] ext4: recover csum seed of tmp_inode after migrating to extents
by Yongqiang Liu 25 Jun '22
From: Li Lingfeng <lilingfeng3(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 186944, https://gitee.com/openeuler/kernel/issues/I5DAJY
CVE: NA
--------------------------------
When migrating to extents, the checksum seed of the temporary inode
needs to be replaced by the inode's; otherwise the inode checksums
will be incorrect when swapping the inodes' data.
However, the temporary inode then cannot match its checksum to
itself, since it has lost its own checksum seed.
mkfs.ext4 -F /dev/sdc
mount /dev/sdc /mnt/sdc
xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/sdc/testfile
chattr -e /mnt/sdc/testfile
chattr +e /mnt/sdc/testfile
fsck -fn /dev/sdc
========
...
Pass 1: Checking inodes, blocks, and sizes
Inode 13 passes checks, but checksum does not match inode. Fix? no
...
========
The fix is simple: save the checksum seed of the temporary inode and
restore it after migrating to extents.
Fixes: e81c9302a6c3 ("ext4: set csum seed in tmp inode while migrating to extents")
Signed-off-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13(a)huawei.com>
---
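The save, borrow, and restore sequence from the commit message can be sketched with a toy model. Everything here is an assumption made for illustration: `struct toy_inode`, `toy_seed()`, `toy_csum()`, `toy_write()`, `toy_fsck_ok()` and `toy_migrate()` are hypothetical, and the checksum is a simple mix rather than ext4's real algorithm. The model only captures that a checker derives the expected seed from the inode itself.

```c
#include <assert.h>
#include <stdint.h>

struct toy_inode {
	uint32_t ino;          /* inode number */
	uint32_t csum_seed;    /* cached seed, normally derived from ino */
	uint32_t blocks;       /* stands in for the inode's data */
	uint32_t stored_csum;  /* checksum as written to "disk" */
};

/* Seed a checker would derive for this inode (toy formula). */
static uint32_t toy_seed(uint32_t ino)
{
	return ino * 2654435761u;
}

/* Toy checksum over the inode contents; not ext4's real checksum. */
static uint32_t toy_csum(uint32_t seed, uint32_t blocks)
{
	return seed ^ (blocks + 0x9e3779b9u);
}

/* Write the inode using whatever seed is currently cached in memory. */
static void toy_write(struct toy_inode *i)
{
	i->stored_csum = toy_csum(i->csum_seed, i->blocks);
}

/* A checker recomputes the checksum with the seed derived from ino. */
static int toy_fsck_ok(const struct toy_inode *i)
{
	return i->stored_csum == toy_csum(toy_seed(i->ino), i->blocks);
}

/* tmp borrows inode's seed for the migration and optionally gets its own back. */
static void toy_migrate(struct toy_inode *inode, struct toy_inode *tmp,
			int restore_seed)
{
	uint32_t saved = tmp->csum_seed;    /* save tmp's own seed */

	assert(inode != tmp);
	tmp->csum_seed = inode->csum_seed;  /* borrow the target's seed */
	inode->blocks = tmp->blocks;        /* hand the data over */
	tmp->blocks = 0;
	if (restore_seed)
		tmp->csum_seed = saved;     /* the step this patch adds */
	toy_write(inode);
	toy_write(tmp);
}
```

With `restore_seed` set, both toy inodes pass the check. With it cleared, the temporary inode is written with the borrowed seed and fails, which is the same kind of mismatch as the fsck message quoted in the commit message.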
fs/ext4/migrate.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 75a769634b2b..ed9e7816efbb 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -415,7 +415,7 @@ int ext4_ext_migrate(struct inode *inode)
struct inode *tmp_inode = NULL;
struct migrate_struct lb;
unsigned long max_entries;
- __u32 goal;
+ __u32 goal, tmp_csum_seed;
uid_t owner[2];
/*
@@ -463,6 +463,7 @@ int ext4_ext_migrate(struct inode *inode)
* the migration.
*/
ei = EXT4_I(inode);
+ tmp_csum_seed = EXT4_I(tmp_inode)->i_csum_seed;
EXT4_I(tmp_inode)->i_csum_seed = ei->i_csum_seed;
i_size_write(tmp_inode, i_size_read(inode));
/*
@@ -573,6 +574,7 @@ int ext4_ext_migrate(struct inode *inode)
* the inode is not visible to user space.
*/
tmp_inode->i_blocks = 0;
+ EXT4_I(tmp_inode)->i_csum_seed = tmp_csum_seed;
/* Reset the extent details */
ext4_ext_tree_init(handle, tmp_inode);
--
2.25.1

[PATCH openEuler-1.0-LTS] vfio: framework supporting vfio device hot migration
by RongWang 24 Jun '22
From: Rong Wang <w_angrong(a)163.com>
kunpeng inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5CO9A
CVE: NA
---------------------------------
For pass-through devices, the hypervisor can't control the device's
state and can't track memory dirtied by device DMA during
migration.
The goal of this framework is to work together with the hardware to
accomplish both tasks.
    qemu
      | status control and dirty memory report
    vfio
      | ops to hardware
    hardware
Signed-off-by: Rong Wang <w_angrong(a)163.com>
Signed-off-by: HuHua Li <18245010845(a)163.com>
Signed-off-by: Ripeng Qiu <965412048(a)qq.com>
---
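The layering in the commit message (qemu requests a state, vfio turns it into vendor ops, the hardware acts) can be sketched as a small user-space model. This is a simplified illustration under stated assumptions, not the patch's implementation: all `toy_*`, `hw_*` and `ST_*` names are hypothetical, the state bits are defined locally for the model, and the fake hardware only records what was asked of it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy state bits for the model. */
enum { ST_STOP = 0, ST_RUNNING = 1 << 0, ST_SAVING = 1 << 1, ST_RESUMING = 1 << 2 };

/* Callbacks a vendor driver would register with the framework. */
struct toy_mig_ops {
	int (*enable)(void *hw);
	int (*disable)(void *hw);
	int (*save)(void *hw);
	int (*pre_enable)(void *hw);
};

struct toy_mig_dev {
	uint32_t state;
	const struct toy_mig_ops *ops;
	void *hw;
};

/* A missing or failing callback is reported as -EINVAL. */
static int toy_call(int (*op)(void *), void *hw)
{
	if (!op)
		return -EINVAL;
	return op(hw) ? -EINVAL : 0;
}

/* Map a requested state to vendor ops; commit it only on success. */
static int toy_set_state(struct toy_mig_dev *d, uint32_t next)
{
	int ret = 0;

	assert(d != NULL);
	if (next == d->state)
		return 0;
	if (!d->ops)
		return -EINVAL;

	switch (next) {
	case ST_RUNNING:
		if (!(d->state & ST_RUNNING))
			ret = toy_call(d->ops->enable, d->hw);
		break;
	case ST_SAVING | ST_RUNNING:  /* pre-copy: nothing to do in this model */
		break;
	case ST_SAVING:               /* stop the device, then save its state */
		if (d->state & ST_RUNNING)
			ret = toy_call(d->ops->disable, d->hw);
		if (!ret)
			ret = toy_call(d->ops->save, d->hw);
		break;
	case ST_STOP:
		if (d->state & ST_RUNNING)
			ret = toy_call(d->ops->disable, d->hw);
		break;
	case ST_RESUMING:
		ret = toy_call(d->ops->pre_enable, d->hw);
		break;
	default:
		ret = -EINVAL;
		break;
	}
	if (ret)
		return ret;
	d->state = next;
	return 0;
}

/* Fake hardware that only records what it was asked to do. */
struct toy_hw { int running; int saved; };
static int hw_enable(void *p)  { ((struct toy_hw *)p)->running = 1; return 0; }
static int hw_disable(void *p) { ((struct toy_hw *)p)->running = 0; return 0; }
static int hw_save(void *p)    { ((struct toy_hw *)p)->saved++; return 0; }

/* pre_enable is deliberately left unset to show the missing-callback path. */
static const struct toy_mig_ops toy_hw_ops = {
	.enable = hw_enable, .disable = hw_disable, .save = hw_save,
};
```

Starting from `ST_RUNNING`, a caller that requests `ST_SAVING | ST_RUNNING` and then `ST_SAVING` causes the fake hardware to be stopped and saved exactly once. Requesting `ST_RESUMING` fails in this model because no `pre_enable` callback is registered, and the recorded state is left unchanged.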
drivers/vfio/pci/Makefile | 2 +-
drivers/vfio/pci/vfio_pci.c | 54 +++
drivers/vfio/pci/vfio_pci_migration.c | 755 ++++++++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_private.h | 14 +-
drivers/vfio/vfio.c | 411 +++++++++++++++++-
include/linux/vfio_pci_migration.h | 136 ++++++
6 files changed, 1367 insertions(+), 5 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci_migration.c
create mode 100644 include/linux/vfio_pci_migration.h
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 76d8ec0..80a777d 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,5 +1,5 @@
-vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
+vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o vfio_pci_migration.o
vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 51b791c..59d8280 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -30,6 +30,7 @@
#include <linux/vgaarb.h>
#include <linux/nospec.h>
#include <linux/sched/mm.h>
+#include <linux/vfio_pci_migration.h>
#include "vfio_pci_private.h"
@@ -296,6 +297,14 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
vfio_pci_probe_mmaps(vdev);
+ if (vfio_dev_migration_is_supported(pdev)) {
+ ret = vfio_pci_migration_init(vdev);
+ if (ret) {
+ dev_warn(&vdev->pdev->dev, "Failed to init vfio_pci_migration\n");
+ vfio_pci_disable(vdev);
+ return ret;
+ }
+ }
return 0;
}
@@ -392,6 +401,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
out:
pci_disable_device(pdev);
+ vfio_pci_migration_exit(vdev);
vfio_pci_try_bus_reset(vdev);
if (!disable_idle_d3)
@@ -642,6 +652,41 @@ struct vfio_devices {
int max_index;
};
+static long vfio_pci_handle_log_buf_ctl(struct vfio_pci_device *vdev,
+ const unsigned long arg)
+{
+ struct vfio_log_buf_ctl *log_buf_ctl = NULL;
+ struct vfio_log_buf_info *log_buf_info = NULL;
+ struct vf_migration_log_info migration_log_info;
+ long ret = 0;
+
+ log_buf_ctl = (struct vfio_log_buf_ctl *)arg;
+ log_buf_info = (struct vfio_log_buf_info *)log_buf_ctl->data;
+
+ switch (log_buf_ctl->flags) {
+ case VFIO_DEVICE_LOG_BUF_FLAG_START:
+ migration_log_info.dom_uuid = log_buf_info->uuid;
+ migration_log_info.buffer_size =
+ log_buf_info->buffer_size;
+ migration_log_info.sge_num = log_buf_info->addrs_size;
+ migration_log_info.sge_len = log_buf_info->frag_size;
+ migration_log_info.sgevec = log_buf_info->sgevec;
+ ret = vfio_pci_device_log_start(vdev,
+ &migration_log_info);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_STOP:
+ ret = vfio_pci_device_log_stop(vdev,
+ log_buf_info->uuid);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_STATUS_QUERY:
+ ret = vfio_pci_device_log_status_query(vdev);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+ return ret;
+}
static long vfio_pci_ioctl(void *device_data,
unsigned int cmd, unsigned long arg)
{
@@ -1142,6 +1187,8 @@ static long vfio_pci_ioctl(void *device_data,
return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
ioeventfd.data, count, ioeventfd.fd);
+ } else if (cmd == VFIO_DEVICE_LOG_BUF_CTL) {
+ return vfio_pci_handle_log_buf_ctl(vdev, arg);
}
return -ENOTTY;
@@ -1566,6 +1613,9 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
pci_set_power_state(pdev, PCI_D3hot);
}
+ if (vfio_dev_migration_is_supported(pdev))
+ ret = vfio_pci_device_init(pdev);
+
return ret;
}
@@ -1591,6 +1641,10 @@ static void vfio_pci_remove(struct pci_dev *pdev)
if (!disable_idle_d3)
pci_set_power_state(pdev, PCI_D0);
+
+ if (vfio_dev_migration_is_supported(pdev)) {
+ vfio_pci_device_uninit(pdev);
+ }
}
static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
diff --git a/drivers/vfio/pci/vfio_pci_migration.c b/drivers/vfio/pci/vfio_pci_migration.c
new file mode 100644
index 0000000..f69cd13
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci_migration.c
@@ -0,0 +1,755 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Huawei Technologies Co., Ltd. All rights reserved.
+ */
+
+#include <linux/module.h>
+#include <linux/io.h>
+#include <linux/pci.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/vfio_pci_migration.h>
+
+#include "vfio_pci_private.h"
+
+static LIST_HEAD(vfio_pci_mig_drivers_list);
+static DEFINE_MUTEX(vfio_pci_mig_drivers_mutex);
+
+static void vfio_pci_add_mig_drv(struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ atomic_set(&mig_drv->count, 1);
+ list_add_tail(&mig_drv->list, &vfio_pci_mig_drivers_list);
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+}
+
+static void vfio_pci_remove_mig_drv(struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ list_del(&mig_drv->list);
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+}
+
+static struct vfio_pci_vendor_mig_driver *
+ vfio_pci_find_mig_drv(struct pci_dev *pdev, struct module *module)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv = NULL;
+
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ list_for_each_entry(mig_drv, &vfio_pci_mig_drivers_list, list) {
+ if (mig_drv->owner == module) {
+ if (mig_drv->bus_num == pdev->bus->number)
+ goto out;
+ }
+ }
+ mig_drv = NULL;
+out:
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+ return mig_drv;
+}
+
+static struct vfio_pci_vendor_mig_driver *
+ vfio_pci_get_mig_driver(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv = NULL;
+ struct pci_dev *pf_dev = pci_physfn(pdev);
+
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ list_for_each_entry(mig_drv, &vfio_pci_mig_drivers_list, list) {
+ if (mig_drv->bus_num == pf_dev->bus->number)
+ goto out;
+ }
+ mig_drv = NULL;
+out:
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+ return mig_drv;
+}
+
+bool vfio_dev_migration_is_supported(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ mig_driver = vfio_pci_get_mig_driver(pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_warn(&pdev->dev, "unable to find a mig_drv module\n");
+ return false;
+ }
+
+ return true;
+}
+
+int vfio_pci_device_log_start(struct vfio_pci_device *vdev,
+ struct vf_migration_log_info *log_info)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&vdev->pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (!mig_driver->dev_mig_ops->log_start ||
+ (mig_driver->dev_mig_ops->log_start(vdev->pdev,
+ log_info) != 0)) {
+ dev_err(&vdev->pdev->dev, "failed to set log start\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int vfio_pci_device_log_stop(struct vfio_pci_device *vdev, uint32_t uuid)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&vdev->pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (!mig_driver->dev_mig_ops->log_stop ||
+ (mig_driver->dev_mig_ops->log_stop(vdev->pdev, uuid) != 0)) {
+ dev_err(&vdev->pdev->dev, "failed to set log stop\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int vfio_pci_device_log_status_query(struct vfio_pci_device *vdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&vdev->pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (!mig_driver->dev_mig_ops->get_log_status ||
+ (mig_driver->dev_mig_ops->get_log_status(vdev->pdev) != 0)) {
+ dev_err(&vdev->pdev->dev, "failed to get log status\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int vfio_pci_device_init(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv;
+
+ mig_drv = vfio_pci_get_mig_driver(pdev);
+ if (!mig_drv || !mig_drv->dev_mig_ops) {
+ dev_err(&pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (mig_drv->dev_mig_ops->init)
+ return mig_drv->dev_mig_ops->init(pdev);
+
+ return -EFAULT;
+}
+
+void vfio_pci_device_uninit(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv;
+
+ mig_drv = vfio_pci_get_mig_driver(pdev);
+ if (!mig_drv || !mig_drv->dev_mig_ops) {
+ dev_err(&pdev->dev, "unable to find a mig_drv module\n");
+ return;
+ }
+
+ if (mig_drv->dev_mig_ops->uninit)
+ mig_drv->dev_mig_ops->uninit(pdev);
+}
+
+static void vfio_pci_device_release(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (mig_drv->dev_mig_ops->release)
+ mig_drv->dev_mig_ops->release(pdev);
+}
+
+static int vfio_pci_device_get_info(struct pci_dev *pdev,
+ struct vfio_device_migration_info *mig_info,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (mig_drv->dev_mig_ops->get_info)
+ return mig_drv->dev_mig_ops->get_info(pdev, mig_info);
+ return -EFAULT;
+}
+
+static int vfio_pci_device_enable(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (!mig_drv->dev_mig_ops->enable ||
+ (mig_drv->dev_mig_ops->enable(pdev) != 0)) {
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int vfio_pci_device_disable(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (!mig_drv->dev_mig_ops->disable ||
+ (mig_drv->dev_mig_ops->disable(pdev) != 0))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int vfio_pci_device_pre_enable(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (!mig_drv->dev_mig_ops->pre_enable ||
+ (mig_drv->dev_mig_ops->pre_enable(pdev) != 0))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int vfio_pci_device_state_save(struct pci_dev *pdev,
+ struct vfio_pci_migration_data *data)
+{
+ struct vfio_device_migration_info *mig_info = data->mig_ctl;
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ void *base = (void *)mig_info;
+ int ret = 0;
+
+ if ((mig_info->device_state & VFIO_DEVICE_STATE_RUNNING) != 0) {
+ ret = vfio_pci_device_disable(pdev, mig_drv);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to stop VF function!\n");
+ return ret;
+ }
+ mig_info->device_state &= ~VFIO_DEVICE_STATE_RUNNING;
+ }
+
+ if (mig_drv->dev_mig_ops && mig_drv->dev_mig_ops->save) {
+ ret = mig_drv->dev_mig_ops->save(pdev, base,
+ mig_info->data_offset, data->state_size);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to save device state!\n");
+ return -EINVAL;
+ }
+ } else {
+ return -EFAULT;
+ }
+
+ mig_info->data_size = data->state_size;
+ mig_info->pending_bytes = mig_info->data_size;
+ return ret;
+}
+
+static int vfio_pci_device_state_restore(struct vfio_pci_migration_data *data)
+{
+ struct vfio_device_migration_info *mig_info = data->mig_ctl;
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ struct pci_dev *pdev = data->vf_dev;
+ void *base = (void *)mig_info;
+ int ret;
+
+ if (mig_drv->dev_mig_ops && mig_drv->dev_mig_ops->restore) {
+ ret = mig_drv->dev_mig_ops->restore(pdev, base,
+ mig_info->data_offset, mig_info->data_size);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to restore device state!\n");
+ return -EINVAL;
+ }
+ return 0;
+ }
+
+ return -EFAULT;
+}
+
+static int vfio_pci_set_device_state(struct vfio_pci_migration_data *data,
+ u32 state)
+{
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ struct pci_dev *pdev = data->vf_dev;
+ int ret = 0;
+
+ if (state == mig_ctl->device_state)
+ return 0;
+
+ if (!mig_drv->dev_mig_ops)
+ return -EINVAL;
+
+ switch (state) {
+ case VFIO_DEVICE_STATE_RUNNING:
+ if (!(mig_ctl->device_state &
+ VFIO_DEVICE_STATE_RUNNING))
+ ret = vfio_pci_device_enable(pdev, mig_drv);
+ break;
+ case VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RUNNING:
+ /*
+ * (pre-copy) - device should start logging data.
+ */
+ ret = 0;
+ break;
+ case VFIO_DEVICE_STATE_SAVING:
+ /* stop the vf function, save state */
+ ret = vfio_pci_device_state_save(pdev, data);
+ break;
+ case VFIO_DEVICE_STATE_STOP:
+ if (mig_ctl->device_state & VFIO_DEVICE_STATE_RUNNING)
+ ret = vfio_pci_device_disable(pdev, mig_drv);
+ break;
+ case VFIO_DEVICE_STATE_RESUMING:
+ ret = vfio_pci_device_pre_enable(pdev, mig_drv);
+ break;
+ default:
+ ret = -EFAULT;
+ break;
+ }
+
+ if (ret)
+ return ret;
+
+ mig_ctl->device_state = state;
+ return 0;
+}
+
+static ssize_t vfio_pci_handle_mig_dev_state(
+ struct vfio_pci_migration_data *data,
+ char __user *buf, size_t count, bool iswrite)
+{
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ u32 device_state;
+ int ret;
+
+ if (count != sizeof(device_state))
+ return -EINVAL;
+
+ if (iswrite) {
+ if (copy_from_user(&device_state, buf, count))
+ return -EFAULT;
+
+ ret = vfio_pci_set_device_state(data, device_state);
+ if (ret)
+ return ret;
+ } else {
+ if (copy_to_user(buf, &mig_ctl->device_state, count))
+ return -EFAULT;
+ }
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_pending_bytes(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u64 pending_bytes;
+
+ if (count != sizeof(pending_bytes) || iswrite)
+ return -EINVAL;
+
+ if (mig_info->device_state ==
+ (VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RUNNING)) {
+ /* In pre-copy state we have no data to return for now,
+ * return 0 pending bytes
+ */
+ pending_bytes = 0;
+ } else {
+ pending_bytes = mig_info->pending_bytes;
+ }
+
+ if (copy_to_user(buf, &pending_bytes, count))
+ return -EFAULT;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_data_offset(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u64 data_offset = mig_info->data_offset;
+
+ if (count != sizeof(data_offset) || iswrite)
+ return -EINVAL;
+
+ if (copy_to_user(buf, &data_offset, count))
+ return -EFAULT;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_data_size(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u64 data_size;
+
+ if (count != sizeof(data_size))
+ return -EINVAL;
+
+ if (iswrite) {
+ /* data_size is writable only during resuming state */
+ if (mig_info->device_state != VFIO_DEVICE_STATE_RESUMING)
+ return -EINVAL;
+
+ if (copy_from_user(&data_size, buf, sizeof(data_size)))
+ return -EFAULT;
+
+ mig_info->data_size = data_size;
+ } else {
+ if (mig_info->device_state != VFIO_DEVICE_STATE_SAVING)
+ return -EINVAL;
+
+ if (copy_to_user(buf, &mig_info->data_size,
+ sizeof(data_size)))
+ return -EFAULT;
+ }
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_dev_cmd(struct vfio_pci_migration_data *data,
+ char __user *buf, size_t count, bool iswrite)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ struct pci_dev *pdev = data->vf_dev;
+ u32 device_cmd;
+ int ret = -EFAULT;
+
+ if (count != sizeof(device_cmd) || !iswrite || !mig_drv->dev_mig_ops)
+ return -EINVAL;
+
+ if (copy_from_user(&device_cmd, buf, count))
+ return -EFAULT;
+
+ switch (device_cmd) {
+ case VFIO_DEVICE_MIGRATION_CANCEL:
+ if (mig_drv->dev_mig_ops->cancel)
+ ret = mig_drv->dev_mig_ops->cancel(pdev);
+ break;
+ default:
+ dev_err(&pdev->dev, "cmd is invalid\n");
+ return -EINVAL;
+ }
+
+ if (ret != 0)
+ return ret;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_drv_version(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u32 version_id = mig_info->version_id;
+
+ if (count != sizeof(version_id) || iswrite)
+ return -EINVAL;
+
+ if (copy_to_user(buf, &version_id, count))
+ return -EFAULT;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_data_rw(
+ struct vfio_pci_migration_data *data,
+ char __user *buf, size_t count, u64 pos, bool iswrite)
+{
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ void *data_addr = data->vf_data;
+
+ if (count == 0) {
+ dev_err(&data->vf_dev->dev, "qemu operation data size error!\n");
+ return -EINVAL;
+ }
+
+ data_addr += pos - mig_ctl->data_offset;
+ if (iswrite) {
+ if (copy_from_user(data_addr, buf, count))
+ return -EFAULT;
+
+ mig_ctl->pending_bytes += count;
+ if (mig_ctl->pending_bytes > data->state_size)
+ return -EINVAL;
+ } else {
+ if (copy_to_user(buf, data_addr, count))
+ return -EFAULT;
+
+ if (mig_ctl->pending_bytes < count)
+ return -EINVAL;
+
+ mig_ctl->pending_bytes -= count;
+ }
+
+ return count;
+}
+
+static ssize_t vfio_pci_dev_migrn_rw(struct vfio_pci_device *vdev,
+ char __user *buf, size_t count, loff_t *ppos, bool iswrite)
+{
+ unsigned int index =
+ VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
+ struct vfio_pci_migration_data *data =
+ (struct vfio_pci_migration_data *)vdev->region[index].data;
+ loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ int ret;
+
+ if (pos >= vdev->region[index].size)
+ return -EINVAL;
+
+ count = min(count, (size_t)(vdev->region[index].size - pos));
+ if (pos >= VFIO_MIGRATION_REGION_DATA_OFFSET)
+ return vfio_pci_handle_mig_data_rw(data,
+ buf, count, pos, iswrite);
+
+ switch (pos) {
+ case VFIO_DEVICE_MIGRATION_OFFSET(device_state):
+ ret = vfio_pci_handle_mig_dev_state(data,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(pending_bytes):
+ ret = vfio_pci_handle_mig_pending_bytes(mig_ctl,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(data_offset):
+ ret = vfio_pci_handle_mig_data_offset(mig_ctl,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(data_size):
+ ret = vfio_pci_handle_mig_data_size(mig_ctl,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(device_cmd):
+ ret = vfio_pci_handle_mig_dev_cmd(data,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(version_id):
+ ret = vfio_pci_handle_mig_drv_version(mig_ctl,
+ buf, count, iswrite);
+ break;
+ default:
+ dev_err(&vdev->pdev->dev, "invalid pos offset\n");
+ ret = -EFAULT;
+ break;
+ }
+
+ if (mig_ctl->device_state == VFIO_DEVICE_STATE_RESUMING &&
+ mig_ctl->pending_bytes == data->state_size &&
+ mig_ctl->data_size == data->state_size) {
+ if (vfio_pci_device_state_restore(data) != 0) {
+ dev_err(&vdev->pdev->dev, "Failed to restore device state!\n");
+ return -EFAULT;
+ }
+ mig_ctl->pending_bytes = 0;
+ mig_ctl->data_size = 0;
+ }
+
+ return ret;
+}
+
+static void vfio_pci_dev_migrn_release(struct vfio_pci_device *vdev,
+ struct vfio_pci_region *region)
+{
+ struct vfio_pci_migration_data *data = region->data;
+
+ if (data) {
+ kfree(data->mig_ctl);
+ kfree(data);
+ }
+}
+
+static const struct vfio_pci_regops vfio_pci_migration_regops = {
+ .rw = vfio_pci_dev_migrn_rw,
+ .release = vfio_pci_dev_migrn_release,
+};
+
+static int vfio_pci_migration_info_init(struct pci_dev *pdev,
+ struct vfio_device_migration_info *mig_info,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ int ret;
+
+ ret = vfio_pci_device_get_info(pdev, mig_info, mig_drv);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to get device info\n");
+ return ret;
+ }
+
+ if (mig_info->data_size > VFIO_MIGRATION_BUFFER_MAX_SIZE) {
+ dev_err(&pdev->dev, "mig_info->data_size %llu is invalid\n",
+ mig_info->data_size);
+ return -EINVAL;
+ }
+
+ mig_info->data_offset = VFIO_MIGRATION_REGION_DATA_OFFSET;
+ return ret;
+}
+
+static int vfio_device_mig_data_init(struct vfio_pci_device *vdev,
+ struct vfio_pci_migration_data *data)
+{
+ struct vfio_device_migration_info *mig_ctl;
+ u64 mig_offset;
+ int ret;
+
+ mig_ctl = kzalloc(sizeof(*mig_ctl), GFP_KERNEL);
+ if (!mig_ctl)
+ return -ENOMEM;
+
+ ret = vfio_pci_migration_info_init(vdev->pdev, mig_ctl,
+ data->mig_driver);
+ if (ret) {
+ dev_err(&vdev->pdev->dev, "get device info error!\n");
+ goto err;
+ }
+
+ mig_offset = sizeof(struct vfio_device_migration_info);
+ data->state_size = mig_ctl->data_size;
+ data->mig_ctl = krealloc(mig_ctl, mig_offset + data->state_size,
+ GFP_KERNEL);
+ if (!data->mig_ctl) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ data->vf_data = (void *)((char *)data->mig_ctl + mig_offset);
+ memset(data->vf_data, 0, data->state_size);
+ data->mig_ctl->data_size = 0;
+
+ ret = vfio_pci_register_dev_region(vdev, VFIO_REGION_TYPE_MIGRATION,
+ VFIO_REGION_SUBTYPE_MIGRATION,
+ &vfio_pci_migration_regops, mig_offset + data->state_size,
+ VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE, data);
+ if (ret) {
+ kfree(data->mig_ctl);
+ return ret;
+ }
+
+ return 0;
+err:
+ kfree(mig_ctl);
+ return ret;
+}
+
+int vfio_pci_migration_init(struct vfio_pci_device *vdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+ struct vfio_pci_migration_data *data = NULL;
+ struct pci_dev *pdev = vdev->pdev;
+ int ret;
+
+ mig_driver = vfio_pci_get_mig_driver(pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&pdev->dev, "unable to find a mig_driver module\n");
+ return -EINVAL;
+ }
+
+ if (!try_module_get(mig_driver->owner)) {
+ pr_err("module %s is not live\n", mig_driver->owner->name);
+ return -ENODEV;
+ }
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data) {
+ module_put(mig_driver->owner);
+ return -ENOMEM;
+ }
+
+ data->mig_driver = mig_driver;
+ data->vf_dev = pdev;
+
+ ret = vfio_device_mig_data_init(vdev, data);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to init vfio device migration data!\n");
+ goto err;
+ }
+
+ return ret;
+err:
+ kfree(data);
+ module_put(mig_driver->owner);
+ return ret;
+}
+
+void vfio_pci_migration_exit(struct vfio_pci_device *vdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_warn(&vdev->pdev->dev, "mig_driver is not found\n");
+ return;
+ }
+
+ if (module_refcount(mig_driver->owner) > 0) {
+ vfio_pci_device_release(vdev->pdev, mig_driver);
+ module_put(mig_driver->owner);
+ }
+}
+
+int vfio_pci_register_migration_ops(struct vfio_device_migration_ops *ops,
+ struct module *mod, struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ if (!ops || !mod || !pdev)
+ return -EINVAL;
+
+ mig_driver = vfio_pci_find_mig_drv(pdev, mod);
+ if (mig_driver) {
+ pr_info("%s migration ops has already been registered\n",
+ mod->name);
+ atomic_add(1, &mig_driver->count);
+ return 0;
+ }
+
+ if (!try_module_get(THIS_MODULE))
+ return -ENODEV;
+
+ mig_driver = kzalloc(sizeof(*mig_driver), GFP_KERNEL);
+ if (!mig_driver) {
+ module_put(THIS_MODULE);
+ return -ENOMEM;
+ }
+
+ mig_driver->pdev = pdev;
+ mig_driver->bus_num = pdev->bus->number;
+ mig_driver->owner = mod;
+ mig_driver->dev_mig_ops = ops;
+
+ vfio_pci_add_mig_drv(mig_driver);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_register_migration_ops);
+
+void vfio_pci_unregister_migration_ops(struct module *mod, struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ if (!mod || !pdev)
+ return;
+
+ mig_driver = vfio_pci_find_mig_drv(pdev, mod);
+ if (!mig_driver) {
+ pr_err("mig_driver is not found\n");
+ return;
+ }
+
+ if (atomic_sub_and_test(1, &mig_driver->count)) {
+ vfio_pci_remove_mig_drv(mig_driver);
+ kfree(mig_driver);
+ module_put(THIS_MODULE);
+ pr_info("%s: migration ops unregistered successfully\n",
+ THIS_MODULE->name);
+ }
+}
+EXPORT_SYMBOL_GPL(vfio_pci_unregister_migration_ops);
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 17d2bae..03af269 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -15,6 +15,7 @@
#include <linux/pci.h>
#include <linux/irqbypass.h>
#include <linux/types.h>
+#include <linux/vfio_pci_migration.h>
#ifndef VFIO_PCI_PRIVATE_H
#define VFIO_PCI_PRIVATE_H
@@ -55,7 +56,7 @@ struct vfio_pci_irq_ctx {
struct vfio_pci_region;
struct vfio_pci_regops {
- size_t (*rw)(struct vfio_pci_device *vdev, char __user *buf,
+ ssize_t (*rw)(struct vfio_pci_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
void (*release)(struct vfio_pci_device *vdev,
struct vfio_pci_region *region);
@@ -173,4 +174,15 @@ static inline int vfio_pci_igd_init(struct vfio_pci_device *vdev)
return -ENODEV;
}
#endif
+
+extern bool vfio_dev_migration_is_supported(struct pci_dev *pdev);
+extern int vfio_pci_migration_init(struct vfio_pci_device *vdev);
+extern void vfio_pci_migration_exit(struct vfio_pci_device *vdev);
+extern int vfio_pci_device_log_start(struct vfio_pci_device *vdev,
+ struct vf_migration_log_info *log_info);
+extern int vfio_pci_device_log_stop(struct vfio_pci_device *vdev,
+ uint32_t uuid);
+extern int vfio_pci_device_log_status_query(struct vfio_pci_device *vdev);
+extern int vfio_pci_device_init(struct pci_dev *pdev);
+extern void vfio_pci_device_uninit(struct pci_dev *pdev);
#endif /* VFIO_PCI_PRIVATE_H */
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 7a386fb..35f2a29 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -33,6 +33,7 @@
#include <linux/string.h>
#include <linux/uaccess.h>
#include <linux/vfio.h>
+#include <linux/vfio_pci_migration.h>
#include <linux/wait.h>
#include <linux/sched/signal.h>
@@ -40,6 +41,9 @@
#define DRIVER_AUTHOR "Alex Williamson <alex.williamson(a)redhat.com>"
#define DRIVER_DESC "VFIO - User Level meta-driver"
+#define LOG_BUF_FRAG_SIZE (2 * 1024 * 1024) // fix to 2M
+#define LOG_BUF_MAX_ADDRS_SIZE 128 // max vm ram size is 1T
+
static struct vfio {
struct class *class;
struct list_head iommu_drivers_list;
@@ -57,6 +61,14 @@ struct vfio_iommu_driver {
struct list_head vfio_next;
};
+struct vfio_log_buf {
+ struct vfio_log_buf_info info;
+ int fd;
+ int buffer_state;
+ int device_state;
+ unsigned long *cpu_addrs;
+};
+
struct vfio_container {
struct kref kref;
struct list_head group_list;
@@ -64,6 +76,7 @@ struct vfio_container {
struct vfio_iommu_driver *iommu_driver;
void *iommu_data;
bool noiommu;
+ struct vfio_log_buf log_buf;
};
struct vfio_unbound_dev {
@@ -1158,8 +1171,398 @@ static long vfio_ioctl_set_iommu(struct vfio_container *container,
return ret;
}
+static long vfio_dispatch_cmd_to_devices(const struct vfio_container *container,
+ unsigned int cmd, unsigned long arg)
+{
+ struct vfio_group *group = NULL;
+ struct vfio_device *device = NULL;
+ long ret = -ENXIO;
+
+ list_for_each_entry(group, &container->group_list, container_next) {
+ list_for_each_entry(device, &group->device_list, group_next) {
+ ret = device->ops->ioctl(device->device_data, cmd, arg);
+ if (ret) {
+ pr_err("dispatch cmd to devices failed\n");
+ return ret;
+ }
+ }
+ }
+ return ret;
+}
+
+static long vfio_log_buf_start(struct vfio_container *container)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+ long ret;
+
+ log_buf_ctl.argsz = sizeof(struct vfio_log_buf_info);
+ log_buf_ctl.flags = VFIO_DEVICE_LOG_BUF_FLAG_START;
+ log_buf_ctl.data = (void *)&container->log_buf.info;
+ ret = vfio_dispatch_cmd_to_devices(container, VFIO_DEVICE_LOG_BUF_CTL,
+ (unsigned long)&log_buf_ctl);
+ if (ret)
+ return ret;
+
+ container->log_buf.device_state = 1;
+ return 0;
+}
+
+static long vfio_log_buf_stop(struct vfio_container *container)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+ long ret;
+
+ if (container->log_buf.device_state == 0) {
+ pr_warn("device already stopped\n");
+ return 0;
+ }
+
+ log_buf_ctl.argsz = sizeof(struct vfio_log_buf_info);
+ log_buf_ctl.flags = VFIO_DEVICE_LOG_BUF_FLAG_STOP;
+ log_buf_ctl.data = (void *)&container->log_buf.info;
+ ret = vfio_dispatch_cmd_to_devices(container, VFIO_DEVICE_LOG_BUF_CTL,
+ (unsigned long)&log_buf_ctl);
+ if (ret)
+ return ret;
+
+ container->log_buf.device_state = 0;
+ return 0;
+}
+
+static long vfio_log_buf_query(struct vfio_container *container)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+
+ log_buf_ctl.argsz = sizeof(struct vfio_log_buf_info);
+ log_buf_ctl.flags = VFIO_DEVICE_LOG_BUF_FLAG_STATUS_QUERY;
+ log_buf_ctl.data = (void *)&container->log_buf.info;
+
+ return vfio_dispatch_cmd_to_devices(container,
+ VFIO_DEVICE_LOG_BUF_CTL, (unsigned long)&log_buf_ctl);
+}
+
+static int vfio_log_buf_fops_mmap(struct file *filep,
+ struct vm_area_struct *vma)
+{
+ struct vfio_container *container = filep->private_data;
+ struct vfio_log_buf *log_buf = &container->log_buf;
+ unsigned long frag_pg_size;
+ unsigned long frag_offset;
+ phys_addr_t pa;
+ int ret = -EINVAL;
+
+ if (!log_buf->cpu_addrs) {
+ pr_err("mmap before setup, please setup log buf first\n");
+ return ret;
+ }
+
+ if (log_buf->info.frag_size < PAGE_SIZE) {
+ pr_err("mmap frag size should not be less than page size!\n");
+ return ret;
+ }
+
+ frag_pg_size = log_buf->info.frag_size / PAGE_SIZE;
+ frag_offset = vma->vm_pgoff / frag_pg_size;
+
+ if (frag_offset >= log_buf->info.addrs_size) {
+ pr_err("mmap offset out of range!\n");
+ return ret;
+ }
+
+ if (vma->vm_end - vma->vm_start != log_buf->info.frag_size) {
+ pr_err("mmap size error, must be equal to frag size!\n");
+ return ret;
+ }
+
+ pa = virt_to_phys((void *)log_buf->cpu_addrs[frag_offset]);
+ ret = remap_pfn_range(vma, vma->vm_start,
+ pa >> PAGE_SHIFT,
+ vma->vm_end - vma->vm_start,
+ vma->vm_page_prot);
+ if (ret)
+ pr_err("remap_pfn_range error!\n");
+ return ret;
+}
+
+static struct device *vfio_get_dev(struct vfio_container *container)
+{
+ struct vfio_group *group = NULL;
+ struct vfio_device *device = NULL;
+
+ list_for_each_entry(group, &container->group_list, container_next) {
+ list_for_each_entry(device, &group->device_list, group_next) {
+ return device->dev;
+ }
+ }
+ return NULL;
+}
+
+static void vfio_log_buf_release_dma(struct device *dev,
+ struct vfio_log_buf *log_buf)
+{
+ int i;
+
+ for (i = 0; i < log_buf->info.addrs_size; i++) {
+ if ((log_buf->cpu_addrs && log_buf->cpu_addrs[i] != 0) &&
+ (log_buf->info.sgevec &&
+ log_buf->info.sgevec[i].addr != 0)) {
+ dma_free_coherent(dev, log_buf->info.frag_size,
+ (void *)log_buf->cpu_addrs[i],
+ log_buf->info.sgevec[i].addr);
+ log_buf->cpu_addrs[i] = 0;
+ log_buf->info.sgevec[i].addr = 0;
+ }
+ }
+}
+
+static long vfio_log_buf_alloc_dma(struct vfio_log_buf_info *info,
+ struct vfio_log_buf *log_buf, struct device *dev)
+{
+ int i;
+
+ for (i = 0; i < info->addrs_size; i++) {
+ log_buf->cpu_addrs[i] = (unsigned long)dma_alloc_coherent(dev,
+ info->frag_size, &log_buf->info.sgevec[i].addr,
+ GFP_KERNEL);
+ log_buf->info.sgevec[i].len = info->frag_size;
+ if (log_buf->cpu_addrs[i] == 0 ||
+ log_buf->info.sgevec[i].addr == 0) {
+ return -ENOMEM;
+ }
+ }
+ return 0;
+}
+
+static long vfio_log_buf_alloc_addrs(struct vfio_log_buf_info *info,
+ struct vfio_log_buf *log_buf)
+{
+ log_buf->info.sgevec = kcalloc(info->addrs_size,
+ sizeof(struct vfio_log_buf_sge), GFP_KERNEL);
+ if (!log_buf->info.sgevec)
+ return -ENOMEM;
+
+ log_buf->cpu_addrs = kcalloc(info->addrs_size,
+ sizeof(unsigned long), GFP_KERNEL);
+ if (!log_buf->cpu_addrs) {
+ kfree(log_buf->info.sgevec);
+ log_buf->info.sgevec = NULL;
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static long vfio_log_buf_info_valid(struct vfio_log_buf_info *info)
+{
+ if (info->addrs_size > LOG_BUF_MAX_ADDRS_SIZE ||
+ info->addrs_size == 0) {
+ pr_err("can't support vm ram size larger than 1T or equal to 0\n");
+ return -EINVAL;
+ }
+ if (info->frag_size != LOG_BUF_FRAG_SIZE) {
+ pr_err("only support %d frag size\n", LOG_BUF_FRAG_SIZE);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static long vfio_log_buf_setup(struct vfio_container *container,
+ unsigned long data)
+{
+ struct vfio_log_buf_info info;
+ struct vfio_log_buf *log_buf = &container->log_buf;
+ struct device *dev = NULL;
+ long ret;
+
+ if (log_buf->info.sgevec) {
+ pr_warn("log buf already setup\n");
+ return 0;
+ }
+
+ if (copy_from_user(&info, (void __user *)data,
+ sizeof(struct vfio_log_buf_info)))
+ return -EFAULT;
+
+ ret = vfio_log_buf_info_valid(&info);
+ if (ret)
+ return ret;
+
+ ret = vfio_log_buf_alloc_addrs(&info, log_buf);
+ if (ret)
+ goto err_out;
+
+ dev = vfio_get_dev(container);
+ if (!dev) {
+ pr_err("can't get dev\n");
+ goto err_free_addrs;
+ }
+
+ ret = vfio_log_buf_alloc_dma(&info, log_buf, dev);
+ if (ret)
+ goto err_free_dma_array;
+
+ log_buf->info.uuid = info.uuid;
+ log_buf->info.buffer_size = info.buffer_size;
+ log_buf->info.frag_size = info.frag_size;
+ log_buf->info.addrs_size = info.addrs_size;
+ log_buf->buffer_state = 1;
+ return 0;
+
+err_free_dma_array:
+ vfio_log_buf_release_dma(dev, log_buf);
+err_free_addrs:
+ kfree(log_buf->cpu_addrs);
+ log_buf->cpu_addrs = NULL;
+ kfree(log_buf->info.sgevec);
+ log_buf->info.sgevec = NULL;
+err_out:
+ return -ENOMEM;
+}
+
+static long vfio_log_buf_release_buffer(struct vfio_container *container)
+{
+ struct vfio_log_buf *log_buf = &container->log_buf;
+ struct device *dev = NULL;
+
+ if (log_buf->buffer_state == 0) {
+ pr_warn("buffer already released\n");
+ return 0;
+ }
+
+ dev = vfio_get_dev(container);
+ if (!dev) {
+ pr_err("can't get dev\n");
+ return -EFAULT;
+ }
+
+ vfio_log_buf_release_dma(dev, log_buf);
+
+ kfree(log_buf->cpu_addrs);
+ log_buf->cpu_addrs = NULL;
+
+ kfree(log_buf->info.sgevec);
+ log_buf->info.sgevec = NULL;
+
+ log_buf->buffer_state = 0;
+ return 0;
+}
+
+static int vfio_log_buf_release(struct inode *inode, struct file *filep)
+{
+ struct vfio_container *container = filep->private_data;
+
+ vfio_log_buf_stop(container);
+ vfio_log_buf_release_buffer(container);
+ memset(&container->log_buf, 0, sizeof(struct vfio_log_buf));
+ return 0;
+}
+
+static long vfio_ioctl_handle_log_buf_ctl(struct vfio_container *container,
+ unsigned long arg)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+ long ret = 0;
+
+ if (copy_from_user(&log_buf_ctl, (void __user *)arg,
+ sizeof(struct vfio_log_buf_ctl)))
+ return -EFAULT;
+
+ switch (log_buf_ctl.flags) {
+ case VFIO_DEVICE_LOG_BUF_FLAG_SETUP:
+ ret = vfio_log_buf_setup(container,
+ (unsigned long)log_buf_ctl.data);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_RELEASE:
+ ret = vfio_log_buf_release_buffer(container);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_START:
+ ret = vfio_log_buf_start(container);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_STOP:
+ ret = vfio_log_buf_stop(container);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_STATUS_QUERY:
+ ret = vfio_log_buf_query(container);
+ break;
+ default:
+ pr_err("log buf control flag incorrect\n");
+ ret = -EINVAL;
+ break;
+ }
+ return ret;
+}
+
+static long vfio_log_buf_fops_unl_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ struct vfio_container *container = filep->private_data;
+ long ret = -EINVAL;
+
+ switch (cmd) {
+ case VFIO_LOG_BUF_CTL:
+ ret = vfio_ioctl_handle_log_buf_ctl(container, arg);
+ break;
+ default:
+ pr_err("log buf control cmd incorrect\n");
+ break;
+ }
+
+ return ret;
+}
+
+#ifdef CONFIG_COMPAT
+static long vfio_log_buf_fops_compat_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ arg = (unsigned long)compat_ptr(arg);
+ return vfio_log_buf_fops_unl_ioctl(filep, cmd, arg);
+}
+#endif /* CONFIG_COMPAT */
+
+static const struct file_operations vfio_log_buf_fops = {
+ .owner = THIS_MODULE,
+ .mmap = vfio_log_buf_fops_mmap,
+ .unlocked_ioctl = vfio_log_buf_fops_unl_ioctl,
+ .release = vfio_log_buf_release,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = vfio_log_buf_fops_compat_ioctl,
+#endif
+};
+
+static int vfio_get_log_buf_fd(struct vfio_container *container,
+ unsigned long arg)
+{
+ struct file *filep = NULL;
+ int ret;
+
+ if (container->log_buf.fd > 0)
+ return container->log_buf.fd;
+
+ ret = get_unused_fd_flags(O_CLOEXEC);
+ if (ret < 0) {
+ pr_err("get_unused_fd_flags get fd failed\n");
+ return ret;
+ }
+
+ filep = anon_inode_getfile("[vfio-log-buf]", &vfio_log_buf_fops,
+ container, O_RDWR);
+ if (IS_ERR(filep)) {
+ pr_err("anon_inode_getfile failed\n");
+ put_unused_fd(ret);
+ ret = PTR_ERR(filep);
+ return ret;
+ }
+
+ filep->f_mode |= (FMODE_READ | FMODE_WRITE | FMODE_LSEEK);
+
+ fd_install(ret, filep);
+
+ container->log_buf.fd = ret;
+ return ret;
+}
+
static long vfio_fops_unl_ioctl(struct file *filep,
- unsigned int cmd, unsigned long arg)
+ unsigned int cmd, unsigned long arg)
{
struct vfio_container *container = filep->private_data;
struct vfio_iommu_driver *driver;
@@ -1179,6 +1582,9 @@ static long vfio_fops_unl_ioctl(struct file *filep,
case VFIO_SET_IOMMU:
ret = vfio_ioctl_set_iommu(container, arg);
break;
+ case VFIO_GET_LOG_BUF_FD:
+ ret = vfio_get_log_buf_fd(container, arg);
+ break;
default:
driver = container->iommu_driver;
data = container->iommu_data;
@@ -1210,6 +1616,7 @@ static int vfio_fops_open(struct inode *inode, struct file *filep)
INIT_LIST_HEAD(&container->group_list);
init_rwsem(&container->group_lock);
kref_init(&container->kref);
+ memset(&container->log_buf, 0, sizeof(struct vfio_log_buf));
filep->private_data = container;
@@ -1219,9 +1626,7 @@ static int vfio_fops_open(struct inode *inode, struct file *filep)
static int vfio_fops_release(struct inode *inode, struct file *filep)
{
struct vfio_container *container = filep->private_data;
-
filep->private_data = NULL;
-
vfio_container_put(container);
return 0;
diff --git a/include/linux/vfio_pci_migration.h b/include/linux/vfio_pci_migration.h
new file mode 100644
index 0000000..464ffb4
--- /dev/null
+++ b/include/linux/vfio_pci_migration.h
@@ -0,0 +1,136 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2022 Huawei Technologies Co., Ltd. All rights reserved.
+ */
+
+#ifndef VFIO_PCI_MIGRATION_H
+#define VFIO_PCI_MIGRATION_H
+
+#include <linux/types.h>
+#include <linux/pci.h>
+
+#define VFIO_REGION_TYPE_MIGRATION (3)
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION (1)
+
+#define VFIO_MIGRATION_BUFFER_MAX_SIZE SZ_256K
+#define VFIO_MIGRATION_REGION_DATA_OFFSET \
+ (sizeof(struct vfio_device_migration_info))
+#define VFIO_DEVICE_MIGRATION_OFFSET(x) \
+ offsetof(struct vfio_device_migration_info, x)
+
+struct vfio_device_migration_info {
+ __u32 device_state; /* VFIO device state */
+#define VFIO_DEVICE_STATE_STOP (0)
+#define VFIO_DEVICE_STATE_RUNNING (1 << 0)
+#define VFIO_DEVICE_STATE_SAVING (1 << 1)
+#define VFIO_DEVICE_STATE_RESUMING (1 << 2)
+#define VFIO_DEVICE_STATE_MASK (VFIO_DEVICE_STATE_RUNNING | \
+ VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING)
+ __u32 reserved;
+
+ __u32 device_cmd;
+ __u32 version_id;
+
+ __u64 pending_bytes;
+ __u64 data_offset;
+ __u64 data_size;
+};
+
+enum {
+ VFIO_DEVICE_STOP = 0xffff0001,
+ VFIO_DEVICE_CONTINUE,
+ VFIO_DEVICE_MIGRATION_CANCEL,
+};
+
+struct vfio_log_buf_sge {
+ __u64 len;
+ __u64 addr;
+};
+
+struct vfio_log_buf_info {
+ __u32 uuid;
+ __u64 buffer_size;
+ __u64 addrs_size;
+ __u64 frag_size;
+ struct vfio_log_buf_sge *sgevec;
+};
+
+struct vfio_log_buf_ctl {
+ __u32 argsz;
+ __u32 flags;
+ #define VFIO_DEVICE_LOG_BUF_FLAG_SETUP (1 << 0)
+ #define VFIO_DEVICE_LOG_BUF_FLAG_RELEASE (1 << 1)
+ #define VFIO_DEVICE_LOG_BUF_FLAG_START (1 << 2)
+ #define VFIO_DEVICE_LOG_BUF_FLAG_STOP (1 << 3)
+ #define VFIO_DEVICE_LOG_BUF_FLAG_STATUS_QUERY (1 << 4)
+ void *data;
+};
+#define VFIO_LOG_BUF_CTL _IO(VFIO_TYPE, VFIO_BASE + 21)
+#define VFIO_GET_LOG_BUF_FD _IO(VFIO_TYPE, VFIO_BASE + 22)
+#define VFIO_DEVICE_LOG_BUF_CTL _IO(VFIO_TYPE, VFIO_BASE + 23)
+
+struct vf_migration_log_info {
+ __u32 dom_uuid;
+ __u64 buffer_size;
+ __u64 sge_len;
+ __u64 sge_num;
+ struct vfio_log_buf_sge *sgevec;
+};
+
+struct vfio_device_migration_ops {
+ /* Get device information */
+ int (*get_info)(struct pci_dev *pdev,
+ struct vfio_device_migration_info *info);
+ /* Enable a vf device */
+ int (*enable)(struct pci_dev *pdev);
+ /* Disable a vf device */
+ int (*disable)(struct pci_dev *pdev);
+ /* Save a vf device */
+ int (*save)(struct pci_dev *pdev, void *base,
+ uint64_t off, uint64_t count);
+ /* Resuming a vf device */
+ int (*restore)(struct pci_dev *pdev, void *base,
+ uint64_t off, uint64_t count);
+ /* Log start a vf device */
+ int (*log_start)(struct pci_dev *pdev,
+ struct vf_migration_log_info *log_info);
+ /* Log stop a vf device */
+ int (*log_stop)(struct pci_dev *pdev, uint32_t uuid);
+ /* Get vf device log status */
+ int (*get_log_status)(struct pci_dev *pdev);
+ /* Pre enable a vf device(load_setup, before restore a vf) */
+ int (*pre_enable)(struct pci_dev *pdev);
+ /* Cancel a vf device when live migration failed (rollback) */
+ int (*cancel)(struct pci_dev *pdev);
+ /* Init a vf device */
+ int (*init)(struct pci_dev *pdev);
+ /* Uninit a vf device */
+ void (*uninit)(struct pci_dev *pdev);
+ /* Release a vf device */
+ void (*release)(struct pci_dev *pdev);
+};
+
+struct vfio_pci_vendor_mig_driver {
+ struct pci_dev *pdev;
+ unsigned char bus_num;
+ struct vfio_device_migration_ops *dev_mig_ops;
+ struct module *owner;
+ atomic_t count;
+ struct list_head list;
+};
+
+struct vfio_pci_migration_data {
+ u64 state_size;
+ struct pci_dev *vf_dev;
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+ struct vfio_device_migration_info *mig_ctl;
+ void *vf_data;
+};
+
+int vfio_pci_register_migration_ops(struct vfio_device_migration_ops *ops,
+ struct module *mod, struct pci_dev *pdev);
+void vfio_pci_unregister_migration_ops(struct module *mod,
+ struct pci_dev *pdev);
+
+#endif /* VFIO_PCI_MIGRATION_H */
--
1.8.3.1
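
The migration region above places the control fields of struct vfio_device_migration_info at the start and the device state data after them. A minimal userspace sketch of that layout follows, assuming fixed-width stdint types stand in for __u32/__u64. The names mig_info, MIG_OFFSET, MIG_DATA_OFFSET and mig_region_field() are hypothetical and not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct vfio_device_migration_info from the patch.
 * Assumption: uint32_t/uint64_t stand in for the kernel's __u32/__u64. */
struct mig_info {
    uint32_t device_state;
    uint32_t reserved;
    uint32_t device_cmd;
    uint32_t version_id;
    uint64_t pending_bytes;
    uint64_t data_offset;
    uint64_t data_size;
};

/* Same idea as VFIO_DEVICE_MIGRATION_OFFSET() and
 * VFIO_MIGRATION_REGION_DATA_OFFSET in the patch. */
#define MIG_OFFSET(field) offsetof(struct mig_info, field)
#define MIG_DATA_OFFSET   sizeof(struct mig_info)

/* Hypothetical helper: name the part of the region that `pos` addresses.
 * The order of checks follows vfio_pci_dev_migrn_rw(): data area first,
 * then an exact match on a control field offset. */
const char *mig_region_field(uint64_t pos)
{
    if (pos >= MIG_DATA_OFFSET)
        return "data";

    switch (pos) {
    case MIG_OFFSET(device_state):
        return "device_state";
    case MIG_OFFSET(pending_bytes):
        return "pending_bytes";
    case MIG_OFFSET(data_offset):
        return "data_offset";
    case MIG_OFFSET(data_size):
        return "data_size";
    case MIG_OFFSET(device_cmd):
        return "device_cmd";
    case MIG_OFFSET(version_id):
        return "version_id";
    default:
        /* Includes `reserved` and any offset inside a field. */
        return "invalid";
    }
}
```

With this layout the control block is 40 bytes, so the data area starts at offset 40. An access that starts at the reserved field or in the middle of a field matches no case, which corresponds to the "invalid pos offset" branch in the patch.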
From: Rong Wang <w_angrong(a)163.com>
kunpeng inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5CO9A
CVE: NA
---------------------------------
For pass-through devices, the hypervisor can't control the device
state and can't track memory dirtied by device DMA during
migration.
The goal of this framework is to work with the hardware to
accomplish both tasks.
qemu
|status control and dirty memory report
vfio
|ops to hardware
hardware
Signed-off-by: Rong Wang <w_angrong(a)163.com>
Signed-off-by: HuHua Li <18245010845(a)163.com>
Signed-off-by: Ripeng Qiu <965412048(a)qq.com>
---
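
For reviewers, the byte accounting behind the "status control" path can be sketched in userspace. This is a simplified model of what vfio_pci_handle_mig_data_rw() and the tail of vfio_pci_dev_migrn_rw() appear to do, not the kernel code itself. The struct mig_acct and the three helpers are hypothetical names, and the sketch assumes data_size is set by a separate control-field write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the counters the patch keeps per device. */
struct mig_acct {
    uint64_t state_size;    /* size of the device state buffer */
    uint64_t pending_bytes; /* bytes written and not yet consumed */
    uint64_t data_size;     /* assumed to be set via the data_size field */
    bool resuming;          /* device_state == VFIO_DEVICE_STATE_RESUMING */
    unsigned int restores;  /* how many times a restore was triggered */
};

/* Data-area write: the counter grows first and is then bounds-checked,
 * as in the patch. Returns count, or -1 when the total exceeds state_size. */
int64_t mig_data_write(struct mig_acct *a, uint64_t count)
{
    a->pending_bytes += count;
    if (a->pending_bytes > a->state_size)
        return -1;
    return (int64_t)count;
}

/* Data-area read: fails when more is requested than is pending. */
int64_t mig_data_read(struct mig_acct *a, uint64_t count)
{
    if (a->pending_bytes < count)
        return -1;
    a->pending_bytes -= count;
    return (int64_t)count;
}

/* Check made after a control-field access: restore only once the whole
 * state has been written and data_size matches, then reset the counters. */
bool mig_try_restore(struct mig_acct *a)
{
    if (!a->resuming ||
        a->pending_bytes != a->state_size ||
        a->data_size != a->state_size)
        return false;

    a->restores++;
    a->pending_bytes = 0;
    a->data_size = 0;
    return true;
}
```

In the patch the restore check sits after the control-field switch, so in this model a restore is only triggered by a control-field access that follows the last data write, never by the data write itself.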
drivers/vfio/pci/Makefile | 2 +-
drivers/vfio/pci/vfio_pci.c | 54 +++
drivers/vfio/pci/vfio_pci_migration.c | 755 ++++++++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_private.h | 14 +-
drivers/vfio/vfio.c | 411 +++++++++++++++++-
include/linux/vfio_pci_migration.h | 136 ++++++
6 files changed, 1367 insertions(+), 5 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci_migration.c
create mode 100644 include/linux/vfio_pci_migration.h
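
The dirty-memory log buffer added to drivers/vfio/vfio.c is mapped to userspace one fragment at a time. The sketch below models the checks in vfio_log_buf_fops_mmap() and vfio_log_buf_info_valid() as a plain C function. The helper name log_buf_frag_index() and its parameters are hypothetical, and the page size is passed in rather than taken from PAGE_SIZE.

```c
#include <stdint.h>

#define LOG_BUF_FRAG_SIZE      (2ULL * 1024 * 1024) /* fixed at 2 MiB in the patch */
#define LOG_BUF_MAX_ADDRS_SIZE 128ULL               /* max fragment count in the patch */

/* Hypothetical helper: return the fragment index selected by an mmap()
 * request, or -1 if the request would be rejected.
 *   vm_pgoff   - mapping offset in pages (vma->vm_pgoff in the kernel)
 *   map_len    - mapping length in bytes (vm_end - vm_start)
 *   page_size  - assumed page size, e.g. 4096
 *   addrs_size - number of fragments set up for the buffer */
int64_t log_buf_frag_index(uint64_t vm_pgoff, uint64_t map_len,
                           uint64_t page_size, uint64_t addrs_size)
{
    uint64_t pages_per_frag;
    uint64_t index;

    if (page_size == 0 || LOG_BUF_FRAG_SIZE < page_size)
        return -1;
    if (addrs_size == 0 || addrs_size > LOG_BUF_MAX_ADDRS_SIZE)
        return -1;
    if (map_len != LOG_BUF_FRAG_SIZE)
        return -1; /* each mapping must cover exactly one fragment */

    pages_per_frag = LOG_BUF_FRAG_SIZE / page_size;
    index = vm_pgoff / pages_per_frag;
    if (index >= addrs_size)
        return -1;

    return (int64_t)index;
}
```

Under these assumptions, userspace would map fragment i with a length of 2 MiB at file offset i * 2 MiB. A page offset that is not fragment-aligned still selects the fragment that contains it, because the index comes from an integer division.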
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 76d8ec0..80a777d 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,5 +1,5 @@
-vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
+vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o vfio_pci_migration.o
vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 51b791c..59d8280 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -30,6 +30,7 @@
#include <linux/vgaarb.h>
#include <linux/nospec.h>
#include <linux/sched/mm.h>
+#include <linux/vfio_pci_migration.h>
#include "vfio_pci_private.h"
@@ -296,6 +297,14 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
vfio_pci_probe_mmaps(vdev);
+ if (vfio_dev_migration_is_supported(pdev)) {
+ ret = vfio_pci_migration_init(vdev);
+ if (ret) {
+ dev_warn(&vdev->pdev->dev, "Failed to init vfio_pci_migration\n");
+ vfio_pci_disable(vdev);
+ return ret;
+ }
+ }
return 0;
}
@@ -392,6 +401,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
out:
pci_disable_device(pdev);
+ vfio_pci_migration_exit(vdev);
vfio_pci_try_bus_reset(vdev);
if (!disable_idle_d3)
@@ -642,6 +652,41 @@ struct vfio_devices {
int max_index;
};
+static long vfio_pci_handle_log_buf_ctl(struct vfio_pci_device *vdev,
+ const unsigned long arg)
+{
+ struct vfio_log_buf_ctl *log_buf_ctl = NULL;
+ struct vfio_log_buf_info *log_buf_info = NULL;
+ struct vf_migration_log_info migration_log_info;
+ long ret = 0;
+
+ log_buf_ctl = (struct vfio_log_buf_ctl *)arg;
+ log_buf_info = (struct vfio_log_buf_info *)log_buf_ctl->data;
+
+ switch (log_buf_ctl->flags) {
+ case VFIO_DEVICE_LOG_BUF_FLAG_START:
+ migration_log_info.dom_uuid = log_buf_info->uuid;
+ migration_log_info.buffer_size =
+ log_buf_info->buffer_size;
+ migration_log_info.sge_num = log_buf_info->addrs_size;
+ migration_log_info.sge_len = log_buf_info->frag_size;
+ migration_log_info.sgevec = log_buf_info->sgevec;
+ ret = vfio_pci_device_log_start(vdev,
+ &migration_log_info);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_STOP:
+ ret = vfio_pci_device_log_stop(vdev,
+ log_buf_info->uuid);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_STATUS_QUERY:
+ ret = vfio_pci_device_log_status_query(vdev);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+ return ret;
+}
static long vfio_pci_ioctl(void *device_data,
unsigned int cmd, unsigned long arg)
{
@@ -1142,6 +1187,8 @@ static long vfio_pci_ioctl(void *device_data,
return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
ioeventfd.data, count, ioeventfd.fd);
+ } else if (cmd == VFIO_DEVICE_LOG_BUF_CTL) {
+ return vfio_pci_handle_log_buf_ctl(vdev, arg);
}
return -ENOTTY;
@@ -1566,6 +1613,9 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
pci_set_power_state(pdev, PCI_D3hot);
}
+ if (vfio_dev_migration_is_supported(pdev))
+ ret = vfio_pci_device_init(pdev);
+
return ret;
}
@@ -1591,6 +1641,10 @@ static void vfio_pci_remove(struct pci_dev *pdev)
if (!disable_idle_d3)
pci_set_power_state(pdev, PCI_D0);
+
+ if (vfio_dev_migration_is_supported(pdev)) {
+ vfio_pci_device_uninit(pdev);
+ }
}
static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
diff --git a/drivers/vfio/pci/vfio_pci_migration.c b/drivers/vfio/pci/vfio_pci_migration.c
new file mode 100644
index 0000000..f69cd13
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci_migration.c
@@ -0,0 +1,755 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Huawei Technologies Co., Ltd. All rights reserved.
+ */
+
+#include <linux/module.h>
+#include <linux/io.h>
+#include <linux/pci.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/vfio_pci_migration.h>
+
+#include "vfio_pci_private.h"
+
+static LIST_HEAD(vfio_pci_mig_drivers_list);
+static DEFINE_MUTEX(vfio_pci_mig_drivers_mutex);
+
+static void vfio_pci_add_mig_drv(struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ atomic_set(&mig_drv->count, 1);
+ list_add_tail(&mig_drv->list, &vfio_pci_mig_drivers_list);
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+}
+
+static void vfio_pci_remove_mig_drv(struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ list_del(&mig_drv->list);
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+}
+
+static struct vfio_pci_vendor_mig_driver *
+ vfio_pci_find_mig_drv(struct pci_dev *pdev, struct module *module)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv = NULL;
+
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ list_for_each_entry(mig_drv, &vfio_pci_mig_drivers_list, list) {
+ if (mig_drv->owner == module) {
+ if (mig_drv->bus_num == pdev->bus->number)
+ goto out;
+ }
+ }
+ mig_drv = NULL;
+out:
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+ return mig_drv;
+}
+
+static struct vfio_pci_vendor_mig_driver *
+ vfio_pci_get_mig_driver(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv = NULL;
+ struct pci_dev *pf_dev = pci_physfn(pdev);
+
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ list_for_each_entry(mig_drv, &vfio_pci_mig_drivers_list, list) {
+ if (mig_drv->bus_num == pf_dev->bus->number)
+ goto out;
+ }
+ mig_drv = NULL;
+out:
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+ return mig_drv;
+}
+
+bool vfio_dev_migration_is_supported(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ mig_driver = vfio_pci_get_mig_driver(pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_warn(&pdev->dev, "unable to find a mig_drv module\n");
+ return false;
+ }
+
+ return true;
+}
+
+int vfio_pci_device_log_start(struct vfio_pci_device *vdev,
+ struct vf_migration_log_info *log_info)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&vdev->pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (!mig_driver->dev_mig_ops->log_start ||
+ (mig_driver->dev_mig_ops->log_start(vdev->pdev,
+ log_info) != 0)) {
+ dev_err(&vdev->pdev->dev, "failed to set log start\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int vfio_pci_device_log_stop(struct vfio_pci_device *vdev, uint32_t uuid)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&vdev->pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (!mig_driver->dev_mig_ops->log_stop ||
+ (mig_driver->dev_mig_ops->log_stop(vdev->pdev, uuid) != 0)) {
+ dev_err(&vdev->pdev->dev, "failed to set log stop\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int vfio_pci_device_log_status_query(struct vfio_pci_device *vdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&vdev->pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (!mig_driver->dev_mig_ops->get_log_status ||
+ (mig_driver->dev_mig_ops->get_log_status(vdev->pdev) != 0)) {
+ dev_err(&vdev->pdev->dev, "failed to get log status\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int vfio_pci_device_init(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv;
+
+ mig_drv = vfio_pci_get_mig_driver(pdev);
+ if (!mig_drv || !mig_drv->dev_mig_ops) {
+ dev_err(&pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (mig_drv->dev_mig_ops->init)
+ return mig_drv->dev_mig_ops->init(pdev);
+
+ return -EFAULT;
+}
+
+void vfio_pci_device_uninit(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv;
+
+ mig_drv = vfio_pci_get_mig_driver(pdev);
+ if (!mig_drv || !mig_drv->dev_mig_ops) {
+ dev_err(&pdev->dev, "unable to find a mig_drv module\n");
+ return;
+ }
+
+ if (mig_drv->dev_mig_ops->uninit)
+ mig_drv->dev_mig_ops->uninit(pdev);
+}
+
+static void vfio_pci_device_release(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (mig_drv->dev_mig_ops->release)
+ mig_drv->dev_mig_ops->release(pdev);
+}
+
+static int vfio_pci_device_get_info(struct pci_dev *pdev,
+ struct vfio_device_migration_info *mig_info,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (mig_drv->dev_mig_ops->get_info)
+ return mig_drv->dev_mig_ops->get_info(pdev, mig_info);
+ return -EFAULT;
+}
+
+static int vfio_pci_device_enable(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (!mig_drv->dev_mig_ops->enable ||
+ (mig_drv->dev_mig_ops->enable(pdev) != 0)) {
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int vfio_pci_device_disable(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (!mig_drv->dev_mig_ops->disable ||
+ (mig_drv->dev_mig_ops->disable(pdev) != 0))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int vfio_pci_device_pre_enable(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (!mig_drv->dev_mig_ops->pre_enable ||
+ (mig_drv->dev_mig_ops->pre_enable(pdev) != 0))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int vfio_pci_device_state_save(struct pci_dev *pdev,
+ struct vfio_pci_migration_data *data)
+{
+ struct vfio_device_migration_info *mig_info = data->mig_ctl;
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ void *base = (void *)mig_info;
+ int ret = 0;
+
+ if ((mig_info->device_state & VFIO_DEVICE_STATE_RUNNING) != 0) {
+ ret = vfio_pci_device_disable(pdev, mig_drv);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to stop VF function!\n");
+ return ret;
+ }
+ mig_info->device_state &= ~VFIO_DEVICE_STATE_RUNNING;
+ }
+
+ if (mig_drv->dev_mig_ops && mig_drv->dev_mig_ops->save) {
+ ret = mig_drv->dev_mig_ops->save(pdev, base,
+ mig_info->data_offset, data->state_size);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to save device state!\n");
+ return -EINVAL;
+ }
+ } else {
+ return -EFAULT;
+ }
+
+ mig_info->data_size = data->state_size;
+ mig_info->pending_bytes = mig_info->data_size;
+ return ret;
+}
+
+static int vfio_pci_device_state_restore(struct vfio_pci_migration_data *data)
+{
+ struct vfio_device_migration_info *mig_info = data->mig_ctl;
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ struct pci_dev *pdev = data->vf_dev;
+ void *base = (void *)mig_info;
+ int ret;
+
+ if (mig_drv->dev_mig_ops && mig_drv->dev_mig_ops->restore) {
+ ret = mig_drv->dev_mig_ops->restore(pdev, base,
+ mig_info->data_offset, mig_info->data_size);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to restore device state!\n");
+ return -EINVAL;
+ }
+ return 0;
+ }
+
+ return -EFAULT;
+}
+
+static int vfio_pci_set_device_state(struct vfio_pci_migration_data *data,
+ u32 state)
+{
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ struct pci_dev *pdev = data->vf_dev;
+ int ret = 0;
+
+ if (state == mig_ctl->device_state)
+ return 0;
+
+ if (!mig_drv->dev_mig_ops)
+ return -EINVAL;
+
+ switch (state) {
+ case VFIO_DEVICE_STATE_RUNNING:
+ if (!(mig_ctl->device_state &
+ VFIO_DEVICE_STATE_RUNNING))
+ ret = vfio_pci_device_enable(pdev, mig_drv);
+ break;
+ case VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RUNNING:
+ /*
+ * (pre-copy) - device should start logging data.
+ */
+ ret = 0;
+ break;
+ case VFIO_DEVICE_STATE_SAVING:
+ /* stop the vf function, save state */
+ ret = vfio_pci_device_state_save(pdev, data);
+ break;
+ case VFIO_DEVICE_STATE_STOP:
+ if (mig_ctl->device_state & VFIO_DEVICE_STATE_RUNNING)
+ ret = vfio_pci_device_disable(pdev, mig_drv);
+ break;
+ case VFIO_DEVICE_STATE_RESUMING:
+ ret = vfio_pci_device_pre_enable(pdev, mig_drv);
+ break;
+ default:
+ ret = -EFAULT;
+ break;
+ }
+
+ if (ret)
+ return ret;
+
+ mig_ctl->device_state = state;
+ return 0;
+}
+
+static ssize_t vfio_pci_handle_mig_dev_state(
+ struct vfio_pci_migration_data *data,
+ char __user *buf, size_t count, bool iswrite)
+{
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ u32 device_state;
+ int ret;
+
+ if (count != sizeof(device_state))
+ return -EINVAL;
+
+ if (iswrite) {
+ if (copy_from_user(&device_state, buf, count))
+ return -EFAULT;
+
+ ret = vfio_pci_set_device_state(data, device_state);
+ if (ret)
+ return ret;
+ } else {
+ if (copy_to_user(buf, &mig_ctl->device_state, count))
+ return -EFAULT;
+ }
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_pending_bytes(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u64 pending_bytes;
+
+ if (count != sizeof(pending_bytes) || iswrite)
+ return -EINVAL;
+
+ if (mig_info->device_state ==
+ (VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RUNNING)) {
+ /* In pre-copy state we have no data to return for now,
+ * return 0 pending bytes
+ */
+ pending_bytes = 0;
+ } else {
+ pending_bytes = mig_info->pending_bytes;
+ }
+
+ if (copy_to_user(buf, &pending_bytes, count))
+ return -EFAULT;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_data_offset(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u64 data_offset = mig_info->data_offset;
+
+ if (count != sizeof(data_offset) || iswrite)
+ return -EINVAL;
+
+ if (copy_to_user(buf, &data_offset, count))
+ return -EFAULT;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_data_size(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u64 data_size;
+
+ if (count != sizeof(data_size))
+ return -EINVAL;
+
+ if (iswrite) {
+ /* data_size is writable only during resuming state */
+ if (mig_info->device_state != VFIO_DEVICE_STATE_RESUMING)
+ return -EINVAL;
+
+ if (copy_from_user(&data_size, buf, sizeof(data_size)))
+ return -EFAULT;
+
+ mig_info->data_size = data_size;
+ } else {
+ if (mig_info->device_state != VFIO_DEVICE_STATE_SAVING)
+ return -EINVAL;
+
+ if (copy_to_user(buf, &mig_info->data_size,
+ sizeof(data_size)))
+ return -EFAULT;
+ }
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_dev_cmd(struct vfio_pci_migration_data *data,
+ char __user *buf, size_t count, bool iswrite)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ struct pci_dev *pdev = data->vf_dev;
+ u32 device_cmd;
+ int ret = -EFAULT;
+
+ if (count != sizeof(device_cmd) || !iswrite || !mig_drv->dev_mig_ops)
+ return -EINVAL;
+
+ if (copy_from_user(&device_cmd, buf, count))
+ return -EFAULT;
+
+ switch (device_cmd) {
+ case VFIO_DEVICE_MIGRATION_CANCEL:
+ if (mig_drv->dev_mig_ops->cancel)
+ ret = mig_drv->dev_mig_ops->cancel(pdev);
+ break;
+ default:
+ dev_err(&pdev->dev, "cmd is invalid\n");
+ return -EINVAL;
+ }
+
+ if (ret != 0)
+ return ret;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_drv_version(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u32 version_id = mig_info->version_id;
+
+ if (count != sizeof(version_id) || iswrite)
+ return -EINVAL;
+
+ if (copy_to_user(buf, &version_id, count))
+ return -EFAULT;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_data_rw(
+ struct vfio_pci_migration_data *data,
+ char __user *buf, size_t count, u64 pos, bool iswrite)
+{
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ void *data_addr = data->vf_data;
+
+ if (count == 0) {
+ dev_err(&data->vf_dev->dev, "qemu operation data size error!\n");
+ return -EINVAL;
+ }
+
+ data_addr += pos - mig_ctl->data_offset;
+ if (iswrite) {
+ if (copy_from_user(data_addr, buf, count))
+ return -EFAULT;
+
+ mig_ctl->pending_bytes += count;
+ if (mig_ctl->pending_bytes > data->state_size)
+ return -EINVAL;
+ } else {
+ if (copy_to_user(buf, data_addr, count))
+ return -EFAULT;
+
+ if (mig_ctl->pending_bytes < count)
+ return -EINVAL;
+
+ mig_ctl->pending_bytes -= count;
+ }
+
+ return count;
+}
+
+static ssize_t vfio_pci_dev_migrn_rw(struct vfio_pci_device *vdev,
+ char __user *buf, size_t count, loff_t *ppos, bool iswrite)
+{
+ unsigned int index =
+ VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
+ struct vfio_pci_migration_data *data =
+ (struct vfio_pci_migration_data *)vdev->region[index].data;
+ loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ int ret;
+
+ if (pos >= vdev->region[index].size)
+ return -EINVAL;
+
+ count = min(count, (size_t)(vdev->region[index].size - pos));
+ if (pos >= VFIO_MIGRATION_REGION_DATA_OFFSET)
+ return vfio_pci_handle_mig_data_rw(data,
+ buf, count, pos, iswrite);
+
+ switch (pos) {
+ case VFIO_DEVICE_MIGRATION_OFFSET(device_state):
+ ret = vfio_pci_handle_mig_dev_state(data,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(pending_bytes):
+ ret = vfio_pci_handle_mig_pending_bytes(mig_ctl,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(data_offset):
+ ret = vfio_pci_handle_mig_data_offset(mig_ctl,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(data_size):
+ ret = vfio_pci_handle_mig_data_size(mig_ctl,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(device_cmd):
+ ret = vfio_pci_handle_mig_dev_cmd(data,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(version_id):
+ ret = vfio_pci_handle_mig_drv_version(mig_ctl,
+ buf, count, iswrite);
+ break;
+ default:
+ dev_err(&vdev->pdev->dev, "invalid pos offset\n");
+ ret = -EFAULT;
+ break;
+ }
+
+ if (mig_ctl->device_state == VFIO_DEVICE_STATE_RESUMING &&
+ mig_ctl->pending_bytes == data->state_size &&
+ mig_ctl->data_size == data->state_size) {
+ if (vfio_pci_device_state_restore(data) != 0) {
+ dev_err(&vdev->pdev->dev, "Failed to restore device state!\n");
+ return -EFAULT;
+ }
+ mig_ctl->pending_bytes = 0;
+ mig_ctl->data_size = 0;
+ }
+
+ return ret;
+}
+
+static void vfio_pci_dev_migrn_release(struct vfio_pci_device *vdev,
+ struct vfio_pci_region *region)
+{
+ struct vfio_pci_migration_data *data = region->data;
+
+ if (data) {
+ kfree(data->mig_ctl);
+ kfree(data);
+ }
+}
+
+static const struct vfio_pci_regops vfio_pci_migration_regops = {
+ .rw = vfio_pci_dev_migrn_rw,
+ .release = vfio_pci_dev_migrn_release,
+};
+
+static int vfio_pci_migration_info_init(struct pci_dev *pdev,
+ struct vfio_device_migration_info *mig_info,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ int ret;
+
+ ret = vfio_pci_device_get_info(pdev, mig_info, mig_drv);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to get device info\n");
+ return ret;
+ }
+
+ if (mig_info->data_size > VFIO_MIGRATION_BUFFER_MAX_SIZE) {
+ dev_err(&pdev->dev, "mig_info->data_size %llu is invalid\n",
+ mig_info->data_size);
+ return -EINVAL;
+ }
+
+ mig_info->data_offset = VFIO_MIGRATION_REGION_DATA_OFFSET;
+ return ret;
+}
+
+static int vfio_device_mig_data_init(struct vfio_pci_device *vdev,
+ struct vfio_pci_migration_data *data)
+{
+ struct vfio_device_migration_info *mig_ctl;
+ u64 mig_offset;
+ int ret;
+
+ mig_ctl = kzalloc(sizeof(*mig_ctl), GFP_KERNEL);
+ if (!mig_ctl)
+ return -ENOMEM;
+
+ ret = vfio_pci_migration_info_init(vdev->pdev, mig_ctl,
+ data->mig_driver);
+ if (ret) {
+ dev_err(&vdev->pdev->dev, "get device info error!\n");
+ goto err;
+ }
+
+ mig_offset = sizeof(struct vfio_device_migration_info);
+ data->state_size = mig_ctl->data_size;
+ data->mig_ctl = krealloc(mig_ctl, mig_offset + data->state_size,
+ GFP_KERNEL);
+ if (!data->mig_ctl) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ data->vf_data = (void *)((char *)data->mig_ctl + mig_offset);
+ memset(data->vf_data, 0, data->state_size);
+ data->mig_ctl->data_size = 0;
+
+ ret = vfio_pci_register_dev_region(vdev, VFIO_REGION_TYPE_MIGRATION,
+ VFIO_REGION_SUBTYPE_MIGRATION,
+ &vfio_pci_migration_regops, mig_offset + data->state_size,
+ VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE, data);
+ if (ret) {
+ kfree(data->mig_ctl);
+ return ret;
+ }
+
+ return 0;
+err:
+ kfree(mig_ctl);
+ return ret;
+}
+
+int vfio_pci_migration_init(struct vfio_pci_device *vdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+ struct vfio_pci_migration_data *data = NULL;
+ struct pci_dev *pdev = vdev->pdev;
+ int ret;
+
+ mig_driver = vfio_pci_get_mig_driver(pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&pdev->dev, "unable to find a mig_driver module\n");
+ return -EINVAL;
+ }
+
+ if (!try_module_get(mig_driver->owner)) {
+ pr_err("module %s is not live\n", mig_driver->owner->name);
+ return -ENODEV;
+ }
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data) {
+ module_put(mig_driver->owner);
+ return -ENOMEM;
+ }
+
+ data->mig_driver = mig_driver;
+ data->vf_dev = pdev;
+
+ ret = vfio_device_mig_data_init(vdev, data);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to init vfio device migration data!\n");
+ goto err;
+ }
+
+ return ret;
+err:
+ kfree(data);
+ module_put(mig_driver->owner);
+ return ret;
+}
+
+void vfio_pci_migration_exit(struct vfio_pci_device *vdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_warn(&vdev->pdev->dev, "mig_driver is not found\n");
+ return;
+ }
+
+ if (module_refcount(mig_driver->owner) > 0) {
+ vfio_pci_device_release(vdev->pdev, mig_driver);
+ module_put(mig_driver->owner);
+ }
+}
+
+int vfio_pci_register_migration_ops(struct vfio_device_migration_ops *ops,
+ struct module *mod, struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ if (!ops || !mod || !pdev)
+ return -EINVAL;
+
+ mig_driver = vfio_pci_find_mig_drv(pdev, mod);
+ if (mig_driver) {
+ pr_info("%s migration ops has already been registered\n",
+ mod->name);
+ atomic_add(1, &mig_driver->count);
+ return 0;
+ }
+
+ if (!try_module_get(THIS_MODULE))
+ return -ENODEV;
+
+ mig_driver = kzalloc(sizeof(*mig_driver), GFP_KERNEL);
+ if (!mig_driver) {
+ module_put(THIS_MODULE);
+ return -ENOMEM;
+ }
+
+ mig_driver->pdev = pdev;
+ mig_driver->bus_num = pdev->bus->number;
+ mig_driver->owner = mod;
+ mig_driver->dev_mig_ops = ops;
+
+ vfio_pci_add_mig_drv(mig_driver);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_register_migration_ops);
+
+void vfio_pci_unregister_migration_ops(struct module *mod, struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ if (!mod || !pdev)
+ return;
+
+ mig_driver = vfio_pci_find_mig_drv(pdev, mod);
+ if (!mig_driver) {
+ pr_err("mig_driver is not found\n");
+ return;
+ }
+
+ if (atomic_sub_and_test(1, &mig_driver->count)) {
+ vfio_pci_remove_mig_drv(mig_driver);
+ kfree(mig_driver);
+ module_put(THIS_MODULE);
+ pr_info("%s succeeded in unregistering migration ops\n",
+ THIS_MODULE->name);
+ }
+}
+EXPORT_SYMBOL_GPL(vfio_pci_unregister_migration_ops);
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 17d2bae..03af269 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -15,6 +15,7 @@
#include <linux/pci.h>
#include <linux/irqbypass.h>
#include <linux/types.h>
+#include <linux/vfio_pci_migration.h>
#ifndef VFIO_PCI_PRIVATE_H
#define VFIO_PCI_PRIVATE_H
@@ -55,7 +56,7 @@ struct vfio_pci_irq_ctx {
struct vfio_pci_region;
struct vfio_pci_regops {
- size_t (*rw)(struct vfio_pci_device *vdev, char __user *buf,
+ ssize_t (*rw)(struct vfio_pci_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
void (*release)(struct vfio_pci_device *vdev,
struct vfio_pci_region *region);
@@ -173,4 +174,15 @@ static inline int vfio_pci_igd_init(struct vfio_pci_device *vdev)
return -ENODEV;
}
#endif
+
+extern bool vfio_dev_migration_is_supported(struct pci_dev *pdev);
+extern int vfio_pci_migration_init(struct vfio_pci_device *vdev);
+extern void vfio_pci_migration_exit(struct vfio_pci_device *vdev);
+extern int vfio_pci_device_log_start(struct vfio_pci_device *vdev,
+ struct vf_migration_log_info *log_info);
+extern int vfio_pci_device_log_stop(struct vfio_pci_device *vdev,
+ uint32_t uuid);
+extern int vfio_pci_device_log_status_query(struct vfio_pci_device *vdev);
+extern int vfio_pci_device_init(struct pci_dev *pdev);
+extern void vfio_pci_device_uninit(struct pci_dev *pdev);
#endif /* VFIO_PCI_PRIVATE_H */
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 7a386fb..35f2a29 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -33,6 +33,7 @@
#include <linux/string.h>
#include <linux/uaccess.h>
#include <linux/vfio.h>
+#include <linux/vfio_pci_migration.h>
#include <linux/wait.h>
#include <linux/sched/signal.h>
@@ -40,6 +41,9 @@
#define DRIVER_AUTHOR "Alex Williamson <alex.williamson(a)redhat.com>"
#define DRIVER_DESC "VFIO - User Level meta-driver"
+#define LOG_BUF_FRAG_SIZE (2 * 1024 * 1024) // fix to 2M
+#define LOG_BUF_MAX_ADDRS_SIZE 128 // max vm ram size is 1T
+
static struct vfio {
struct class *class;
struct list_head iommu_drivers_list;
@@ -57,6 +61,14 @@ struct vfio_iommu_driver {
struct list_head vfio_next;
};
+struct vfio_log_buf {
+ struct vfio_log_buf_info info;
+ int fd;
+ int buffer_state;
+ int device_state;
+ unsigned long *cpu_addrs;
+};
+
struct vfio_container {
struct kref kref;
struct list_head group_list;
@@ -64,6 +76,7 @@ struct vfio_container {
struct vfio_iommu_driver *iommu_driver;
void *iommu_data;
bool noiommu;
+ struct vfio_log_buf log_buf;
};
struct vfio_unbound_dev {
@@ -1158,8 +1171,398 @@ static long vfio_ioctl_set_iommu(struct vfio_container *container,
return ret;
}
+static long vfio_dispatch_cmd_to_devices(const struct vfio_container *container,
+ unsigned int cmd, unsigned long arg)
+{
+ struct vfio_group *group = NULL;
+ struct vfio_device *device = NULL;
+ long ret = -ENXIO;
+
+ list_for_each_entry(group, &container->group_list, container_next) {
+ list_for_each_entry(device, &group->device_list, group_next) {
+ ret = device->ops->ioctl(device->device_data, cmd, arg);
+ if (ret) {
+ pr_err("dispatch cmd to devices failed\n");
+ return ret;
+ }
+ }
+ }
+ return ret;
+}
+
+static long vfio_log_buf_start(struct vfio_container *container)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+ long ret;
+
+ log_buf_ctl.argsz = sizeof(struct vfio_log_buf_info);
+ log_buf_ctl.flags = VFIO_DEVICE_LOG_BUF_FLAG_START;
+ log_buf_ctl.data = (void *)&container->log_buf.info;
+ ret = vfio_dispatch_cmd_to_devices(container, VFIO_DEVICE_LOG_BUF_CTL,
+ (unsigned long)&log_buf_ctl);
+ if (ret)
+ return ret;
+
+ container->log_buf.device_state = 1;
+ return 0;
+}
+
+static long vfio_log_buf_stop(struct vfio_container *container)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+ long ret;
+
+ if (container->log_buf.device_state == 0) {
+ pr_warn("device already stopped\n");
+ return 0;
+ }
+
+ log_buf_ctl.argsz = sizeof(struct vfio_log_buf_info);
+ log_buf_ctl.flags = VFIO_DEVICE_LOG_BUF_FLAG_STOP;
+ log_buf_ctl.data = (void *)&container->log_buf.info;
+ ret = vfio_dispatch_cmd_to_devices(container, VFIO_DEVICE_LOG_BUF_CTL,
+ (unsigned long)&log_buf_ctl);
+ if (ret)
+ return ret;
+
+ container->log_buf.device_state = 0;
+ return 0;
+}
+
+static long vfio_log_buf_query(struct vfio_container *container)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+
+ log_buf_ctl.argsz = sizeof(struct vfio_log_buf_info);
+ log_buf_ctl.flags = VFIO_DEVICE_LOG_BUF_FLAG_STATUS_QUERY;
+ log_buf_ctl.data = (void *)&container->log_buf.info;
+
+ return vfio_dispatch_cmd_to_devices(container,
+ VFIO_DEVICE_LOG_BUF_CTL, (unsigned long)&log_buf_ctl);
+}
+
+static int vfio_log_buf_fops_mmap(struct file *filep,
+ struct vm_area_struct *vma)
+{
+ struct vfio_container *container = filep->private_data;
+ struct vfio_log_buf *log_buf = &container->log_buf;
+ unsigned long frag_pg_size;
+ unsigned long frag_offset;
+ phys_addr_t pa;
+ int ret = -EINVAL;
+
+ if (!log_buf->cpu_addrs) {
+ pr_err("mmap before setup, please setup log buf first\n");
+ return ret;
+ }
+
+ if (log_buf->info.frag_size < PAGE_SIZE) {
+ pr_err("mmap frag size should not be less than page size!\n");
+ return ret;
+ }
+
+ frag_pg_size = log_buf->info.frag_size / PAGE_SIZE;
+ frag_offset = vma->vm_pgoff / frag_pg_size;
+
+ if (frag_offset >= log_buf->info.addrs_size) {
+ pr_err("mmap offset out of range!\n");
+ return ret;
+ }
+
+ if (vma->vm_end - vma->vm_start != log_buf->info.frag_size) {
+ pr_err("mmap size error, should be aligned with frag size!\n");
+ return ret;
+ }
+
+ pa = virt_to_phys((void *)log_buf->cpu_addrs[frag_offset]);
+ ret = remap_pfn_range(vma, vma->vm_start,
+ pa >> PAGE_SHIFT,
+ vma->vm_end - vma->vm_start,
+ vma->vm_page_prot);
+ if (ret)
+ pr_err("remap_pfn_range error!\n");
+ return ret;
+}
+
+static struct device *vfio_get_dev(struct vfio_container *container)
+{
+ struct vfio_group *group = NULL;
+ struct vfio_device *device = NULL;
+
+ list_for_each_entry(group, &container->group_list, container_next) {
+ list_for_each_entry(device, &group->device_list, group_next) {
+ return device->dev;
+ }
+ }
+ return NULL;
+}
+
+static void vfio_log_buf_release_dma(struct device *dev,
+ struct vfio_log_buf *log_buf)
+{
+ int i;
+
+ for (i = 0; i < log_buf->info.addrs_size; i++) {
+ if ((log_buf->cpu_addrs && log_buf->cpu_addrs[i] != 0) &&
+ (log_buf->info.sgevec &&
+ log_buf->info.sgevec[i].addr != 0)) {
+ dma_free_coherent(dev, log_buf->info.frag_size,
+ (void *)log_buf->cpu_addrs[i],
+ log_buf->info.sgevec[i].addr);
+ log_buf->cpu_addrs[i] = 0;
+ log_buf->info.sgevec[i].addr = 0;
+ }
+ }
+}
+
+static long vfio_log_buf_alloc_dma(struct vfio_log_buf_info *info,
+ struct vfio_log_buf *log_buf, struct device *dev)
+{
+ int i;
+
+ for (i = 0; i < info->addrs_size; i++) {
+ log_buf->cpu_addrs[i] = (unsigned long)dma_alloc_coherent(dev,
+ info->frag_size, &log_buf->info.sgevec[i].addr,
+ GFP_KERNEL);
+ log_buf->info.sgevec[i].len = info->frag_size;
+ if (log_buf->cpu_addrs[i] == 0 ||
+ log_buf->info.sgevec[i].addr == 0) {
+ return -ENOMEM;
+ }
+ }
+ return 0;
+}
+
+static long vfio_log_buf_alloc_addrs(struct vfio_log_buf_info *info,
+ struct vfio_log_buf *log_buf)
+{
+ log_buf->info.sgevec = kcalloc(info->addrs_size,
+ sizeof(struct vfio_log_buf_sge), GFP_KERNEL);
+ if (!log_buf->info.sgevec)
+ return -ENOMEM;
+
+ log_buf->cpu_addrs = kcalloc(info->addrs_size,
+ sizeof(unsigned long), GFP_KERNEL);
+ if (!log_buf->cpu_addrs) {
+ kfree(log_buf->info.sgevec);
+ log_buf->info.sgevec = NULL;
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static long vfio_log_buf_info_valid(struct vfio_log_buf_info *info)
+{
+ if (info->addrs_size > LOG_BUF_MAX_ADDRS_SIZE ||
+ info->addrs_size == 0) {
+ pr_err("can't support vm ram size larger than 1T or equal to 0\n");
+ return -EINVAL;
+ }
+ if (info->frag_size != LOG_BUF_FRAG_SIZE) {
+ pr_err("only support %d frag size\n", LOG_BUF_FRAG_SIZE);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static long vfio_log_buf_setup(struct vfio_container *container,
+ unsigned long data)
+{
+ struct vfio_log_buf_info info;
+ struct vfio_log_buf *log_buf = &container->log_buf;
+ struct device *dev = NULL;
+ long ret;
+
+ if (log_buf->info.sgevec) {
+ pr_warn("log buf already setup\n");
+ return 0;
+ }
+
+ if (copy_from_user(&info, (void __user *)data,
+ sizeof(struct vfio_log_buf_info)))
+ return -EFAULT;
+
+ ret = vfio_log_buf_info_valid(&info);
+ if (ret)
+ return ret;
+
+ ret = vfio_log_buf_alloc_addrs(&info, log_buf);
+ if (ret)
+ goto err_out;
+
+ dev = vfio_get_dev(container);
+ if (!dev) {
+ pr_err("can't get dev\n");
+ goto err_free_addrs;
+ }
+
+ ret = vfio_log_buf_alloc_dma(&info, log_buf, dev);
+ if (ret)
+ goto err_free_dma_array;
+
+ log_buf->info.uuid = info.uuid;
+ log_buf->info.buffer_size = info.buffer_size;
+ log_buf->info.frag_size = info.frag_size;
+ log_buf->info.addrs_size = info.addrs_size;
+ log_buf->buffer_state = 1;
+ return 0;
+
+err_free_dma_array:
+ vfio_log_buf_release_dma(dev, log_buf);
+err_free_addrs:
+ kfree(log_buf->cpu_addrs);
+ log_buf->cpu_addrs = NULL;
+ kfree(log_buf->info.sgevec);
+ log_buf->info.sgevec = NULL;
+err_out:
+ return -ENOMEM;
+}
+
+static long vfio_log_buf_release_buffer(struct vfio_container *container)
+{
+ struct vfio_log_buf *log_buf = &container->log_buf;
+ struct device *dev = NULL;
+
+ if (log_buf->buffer_state == 0) {
+ pr_warn("buffer already released\n");
+ return 0;
+ }
+
+ dev = vfio_get_dev(container);
+ if (!dev) {
+ pr_err("can't get dev\n");
+ return -EFAULT;
+ }
+
+ vfio_log_buf_release_dma(dev, log_buf);
+
+ kfree(log_buf->cpu_addrs);
+ log_buf->cpu_addrs = NULL;
+
+ kfree(log_buf->info.sgevec);
+ log_buf->info.sgevec = NULL;
+
+ log_buf->buffer_state = 0;
+ return 0;
+}
+
+static int vfio_log_buf_release(struct inode *inode, struct file *filep)
+{
+ struct vfio_container *container = filep->private_data;
+
+ vfio_log_buf_stop(container);
+ vfio_log_buf_release_buffer(container);
+ memset(&container->log_buf, 0, sizeof(struct vfio_log_buf));
+ return 0;
+}
+
+static long vfio_ioctl_handle_log_buf_ctl(struct vfio_container *container,
+ unsigned long arg)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+ long ret = 0;
+
+ if (copy_from_user(&log_buf_ctl, (void __user *)arg,
+ sizeof(struct vfio_log_buf_ctl)))
+ return -EFAULT;
+
+ switch (log_buf_ctl.flags) {
+ case VFIO_DEVICE_LOG_BUF_FLAG_SETUP:
+ ret = vfio_log_buf_setup(container,
+ (unsigned long)log_buf_ctl.data);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_RELEASE:
+ ret = vfio_log_buf_release_buffer(container);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_START:
+ ret = vfio_log_buf_start(container);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_STOP:
+ ret = vfio_log_buf_stop(container);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_STATUS_QUERY:
+ ret = vfio_log_buf_query(container);
+ break;
+ default:
+ pr_err("log buf control flag incorrect\n");
+ ret = -EINVAL;
+ break;
+ }
+ return ret;
+}
+
+static long vfio_log_buf_fops_unl_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ struct vfio_container *container = filep->private_data;
+ long ret = -EINVAL;
+
+ switch (cmd) {
+ case VFIO_LOG_BUF_CTL:
+ ret = vfio_ioctl_handle_log_buf_ctl(container, arg);
+ break;
+ default:
+ pr_err("log buf control cmd incorrect\n");
+ break;
+ }
+
+ return ret;
+}
+
+#ifdef CONFIG_COMPAT
+static long vfio_log_buf_fops_compat_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ arg = (unsigned long)compat_ptr(arg);
+ return vfio_log_buf_fops_unl_ioctl(filep, cmd, arg);
+}
+#endif /* CONFIG_COMPAT */
+
+static const struct file_operations vfio_log_buf_fops = {
+ .owner = THIS_MODULE,
+ .mmap = vfio_log_buf_fops_mmap,
+ .unlocked_ioctl = vfio_log_buf_fops_unl_ioctl,
+ .release = vfio_log_buf_release,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = vfio_log_buf_fops_compat_ioctl,
+#endif
+};
+
+static int vfio_get_log_buf_fd(struct vfio_container *container,
+ unsigned long arg)
+{
+ struct file *filep = NULL;
+ int ret;
+
+ if (container->log_buf.fd > 0)
+ return container->log_buf.fd;
+
+ ret = get_unused_fd_flags(O_CLOEXEC);
+ if (ret < 0) {
+ pr_err("get_unused_fd_flags get fd failed\n");
+ return ret;
+ }
+
+ filep = anon_inode_getfile("[vfio-log-buf]", &vfio_log_buf_fops,
+ container, O_RDWR);
+ if (IS_ERR(filep)) {
+ pr_err("anon_inode_getfile failed\n");
+ put_unused_fd(ret);
+ ret = PTR_ERR(filep);
+ return ret;
+ }
+
+ filep->f_mode |= (FMODE_READ | FMODE_WRITE | FMODE_LSEEK);
+
+ fd_install(ret, filep);
+
+ container->log_buf.fd = ret;
+ return ret;
+}
+
static long vfio_fops_unl_ioctl(struct file *filep,
- unsigned int cmd, unsigned long arg)
+ unsigned int cmd, unsigned long arg)
{
struct vfio_container *container = filep->private_data;
struct vfio_iommu_driver *driver;
@@ -1179,6 +1582,9 @@ static long vfio_fops_unl_ioctl(struct file *filep,
case VFIO_SET_IOMMU:
ret = vfio_ioctl_set_iommu(container, arg);
break;
+ case VFIO_GET_LOG_BUF_FD:
+ ret = vfio_get_log_buf_fd(container, arg);
+ break;
default:
driver = container->iommu_driver;
data = container->iommu_data;
@@ -1210,6 +1616,7 @@ static int vfio_fops_open(struct inode *inode, struct file *filep)
INIT_LIST_HEAD(&container->group_list);
init_rwsem(&container->group_lock);
kref_init(&container->kref);
+ memset(&container->log_buf, 0, sizeof(struct vfio_log_buf));
filep->private_data = container;
@@ -1219,9 +1626,7 @@ static int vfio_fops_open(struct inode *inode, struct file *filep)
static int vfio_fops_release(struct inode *inode, struct file *filep)
{
struct vfio_container *container = filep->private_data;
-
filep->private_data = NULL;
-
vfio_container_put(container);
return 0;
diff --git a/include/linux/vfio_pci_migration.h b/include/linux/vfio_pci_migration.h
new file mode 100644
index 0000000..464ffb4
--- /dev/null
+++ b/include/linux/vfio_pci_migration.h
@@ -0,0 +1,136 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2022 Huawei Technologies Co., Ltd. All rights reserved.
+ */
+
+#ifndef VFIO_PCI_MIGRATION_H
+#define VFIO_PCI_MIGRATION_H
+
+#include <linux/types.h>
+#include <linux/pci.h>
+
+#define VFIO_REGION_TYPE_MIGRATION (3)
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION (1)
+
+#define VFIO_MIGRATION_BUFFER_MAX_SIZE SZ_256K
+#define VFIO_MIGRATION_REGION_DATA_OFFSET \
+ (sizeof(struct vfio_device_migration_info))
+#define VFIO_DEVICE_MIGRATION_OFFSET(x) \
+ offsetof(struct vfio_device_migration_info, x)
+
+struct vfio_device_migration_info {
+ __u32 device_state; /* VFIO device state */
+#define VFIO_DEVICE_STATE_STOP (0)
+#define VFIO_DEVICE_STATE_RUNNING (1 << 0)
+#define VFIO_DEVICE_STATE_SAVING (1 << 1)
+#define VFIO_DEVICE_STATE_RESUMING (1 << 2)
+#define VFIO_DEVICE_STATE_MASK (VFIO_DEVICE_STATE_RUNNING | \
+ VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING)
+ __u32 reserved;
+
+ __u32 device_cmd;
+ __u32 version_id;
+
+ __u64 pending_bytes;
+ __u64 data_offset;
+ __u64 data_size;
+};
+
+enum {
+ VFIO_DEVICE_STOP = 0xffff0001,
+ VFIO_DEVICE_CONTINUE,
+ VFIO_DEVICE_MIGRATION_CANCEL,
+};
+
+struct vfio_log_buf_sge {
+ __u64 len;
+ __u64 addr;
+};
+
+struct vfio_log_buf_info {
+ __u32 uuid;
+ __u64 buffer_size;
+ __u64 addrs_size;
+ __u64 frag_size;
+ struct vfio_log_buf_sge *sgevec;
+};
+
+struct vfio_log_buf_ctl {
+ __u32 argsz;
+ __u32 flags;
+ #define VFIO_DEVICE_LOG_BUF_FLAG_SETUP (1 << 0)
+ #define VFIO_DEVICE_LOG_BUF_FLAG_RELEASE (1 << 1)
+ #define VFIO_DEVICE_LOG_BUF_FLAG_START (1 << 2)
+ #define VFIO_DEVICE_LOG_BUF_FLAG_STOP (1 << 3)
+ #define VFIO_DEVICE_LOG_BUF_FLAG_STATUS_QUERY (1 << 4)
+ void *data;
+};
+#define VFIO_LOG_BUF_CTL _IO(VFIO_TYPE, VFIO_BASE + 21)
+#define VFIO_GET_LOG_BUF_FD _IO(VFIO_TYPE, VFIO_BASE + 22)
+#define VFIO_DEVICE_LOG_BUF_CTL _IO(VFIO_TYPE, VFIO_BASE + 23)
+
+struct vf_migration_log_info {
+ __u32 dom_uuid;
+ __u64 buffer_size;
+ __u64 sge_len;
+ __u64 sge_num;
+ struct vfio_log_buf_sge *sgevec;
+};
+
+struct vfio_device_migration_ops {
+ /* Get device information */
+ int (*get_info)(struct pci_dev *pdev,
+ struct vfio_device_migration_info *info);
+ /* Enable a vf device */
+ int (*enable)(struct pci_dev *pdev);
+ /* Disable a vf device */
+ int (*disable)(struct pci_dev *pdev);
+ /* Save a vf device */
+ int (*save)(struct pci_dev *pdev, void *base,
+ uint64_t off, uint64_t count);
+ /* Resuming a vf device */
+ int (*restore)(struct pci_dev *pdev, void *base,
+ uint64_t off, uint64_t count);
+ /* Log start a vf device */
+ int (*log_start)(struct pci_dev *pdev,
+ struct vf_migration_log_info *log_info);
+ /* Log stop a vf device */
+ int (*log_stop)(struct pci_dev *pdev, uint32_t uuid);
+ /* Get vf device log status */
+ int (*get_log_status)(struct pci_dev *pdev);
+ /* Pre enable a vf device(load_setup, before restore a vf) */
+ int (*pre_enable)(struct pci_dev *pdev);
+ /* Cancel a vf device when live migration failed (rollback) */
+ int (*cancel)(struct pci_dev *pdev);
+ /* Init a vf device */
+ int (*init)(struct pci_dev *pdev);
+ /* Uninit a vf device */
+ void (*uninit)(struct pci_dev *pdev);
+ /* Release a vf device */
+ void (*release)(struct pci_dev *pdev);
+};
+
+struct vfio_pci_vendor_mig_driver {
+ struct pci_dev *pdev;
+ unsigned char bus_num;
+ struct vfio_device_migration_ops *dev_mig_ops;
+ struct module *owner;
+ atomic_t count;
+ struct list_head list;
+};
+
+struct vfio_pci_migration_data {
+ u64 state_size;
+ struct pci_dev *vf_dev;
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+ struct vfio_device_migration_info *mig_ctl;
+ void *vf_data;
+};
+
+int vfio_pci_register_migration_ops(struct vfio_device_migration_ops *ops,
+ struct module *mod, struct pci_dev *pdev);
+void vfio_pci_unregister_migration_ops(struct module *mod,
+ struct pci_dev *pdev);
+
+#endif /* VFIO_PCI_MIGRATION_H */
--
1.8.3.1

[PATCH OLK-5.10 v2 1/2] ipmi/watchdog: replace atomic_add() and atomic_sub()
by Miaohe Lin 24 Jun '22
From: Yejune Deng <yejune.deng(a)gmail.com>
mainline inclusion
from v5.11-rc1
commit a01a89b1db1066a6af23ae08b9a0c345b7966f0b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5DVR9
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
atomic_inc() and atomic_dec() look better than atomic_add(1, ...) and atomic_sub(1, ...).
Signed-off-by: Yejune Deng <yejune.deng(a)gmail.com>
Message-Id: <1605511807-7135-1-git-send-email-yejune.deng(a)gmail.com>
Signed-off-by: Corey Minyard <cminyard(a)mvista.com>
---
drivers/char/ipmi/ipmi_watchdog.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index f78156d93c3f..32c334e34d55 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -495,7 +495,7 @@ static void panic_halt_ipmi_heartbeat(void)
msg.cmd = IPMI_WDOG_RESET_TIMER;
msg.data = NULL;
msg.data_len = 0;
- atomic_add(1, &panic_done_count);
+ atomic_inc(&panic_done_count);
rv = ipmi_request_supply_msgs(watchdog_user,
(struct ipmi_addr *) &addr,
0,
@@ -505,7 +505,7 @@ static void panic_halt_ipmi_heartbeat(void)
&panic_halt_heartbeat_recv_msg,
1);
if (rv)
- atomic_sub(1, &panic_done_count);
+ atomic_dec(&panic_done_count);
}
static struct ipmi_smi_msg panic_halt_smi_msg = {
@@ -529,12 +529,12 @@ static void panic_halt_ipmi_set_timeout(void)
/* Wait for the messages to be free. */
while (atomic_read(&panic_done_count) != 0)
ipmi_poll_interface(watchdog_user);
- atomic_add(1, &panic_done_count);
+ atomic_inc(&panic_done_count);
rv = __ipmi_set_timeout(&panic_halt_smi_msg,
&panic_halt_recv_msg,
&send_heartbeat_now);
if (rv) {
- atomic_sub(1, &panic_done_count);
+ atomic_dec(&panic_done_count);
pr_warn("Unable to extend the watchdog timeout\n");
} else {
if (send_heartbeat_now)
--
2.23.0

23 Jun '22
From: Yejune Deng <yejune.deng(a)gmail.com>
mainline inclusion
from v5.11-rc1
commit a01a89b1db1066a6af23ae08b9a0c345b7966f0b
category: bugfix
bugzilla: NA
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
atomic_inc() and atomic_dec() look better than atomic_add(1, ...) and atomic_sub(1, ...).
Signed-off-by: Yejune Deng <yejune.deng(a)gmail.com>
Message-Id: <1605511807-7135-1-git-send-email-yejune.deng(a)gmail.com>
Signed-off-by: Corey Minyard <cminyard(a)mvista.com>
---
drivers/char/ipmi/ipmi_watchdog.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index f78156d93c3f..32c334e34d55 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -495,7 +495,7 @@ static void panic_halt_ipmi_heartbeat(void)
msg.cmd = IPMI_WDOG_RESET_TIMER;
msg.data = NULL;
msg.data_len = 0;
- atomic_add(1, &panic_done_count);
+ atomic_inc(&panic_done_count);
rv = ipmi_request_supply_msgs(watchdog_user,
(struct ipmi_addr *) &addr,
0,
@@ -505,7 +505,7 @@ static void panic_halt_ipmi_heartbeat(void)
&panic_halt_heartbeat_recv_msg,
1);
if (rv)
- atomic_sub(1, &panic_done_count);
+ atomic_dec(&panic_done_count);
}
static struct ipmi_smi_msg panic_halt_smi_msg = {
@@ -529,12 +529,12 @@ static void panic_halt_ipmi_set_timeout(void)
/* Wait for the messages to be free. */
while (atomic_read(&panic_done_count) != 0)
ipmi_poll_interface(watchdog_user);
- atomic_add(1, &panic_done_count);
+ atomic_inc(&panic_done_count);
rv = __ipmi_set_timeout(&panic_halt_smi_msg,
&panic_halt_recv_msg,
&send_heartbeat_now);
if (rv) {
- atomic_sub(1, &panic_done_count);
+ atomic_dec(&panic_done_count);
pr_warn("Unable to extend the watchdog timeout\n");
} else {
if (send_heartbeat_now)
--
2.23.0

[PATCH openEuler-5.10 01/59] bcache: fix race between setting bdev state to none and new write request direct to backing
by Zheng Zengkai 22 Jun '22
From: Dongsheng Yang <dongsheng.yang(a)easystack.cn>
mainline inclusion
from v5.11-rc1
commit df4ad53242158f9f1f97daf4feddbb4f8b77f080
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I59A5L?from=project-issue
CVE: N/A
-----------------------------------------------
There is a race condition in detaching as below:

A. detaching                      B. Write request
(1) writing back
(2) write back done, set bdev
    state to clean.
(3) cached_dev_put() and
    schedule_work(&dc->detach);
                                  (4) write data [0 - 4K] directly
                                      into backing and ack to user.
(5) power-failure...

When this bcache device is restarted, the bdev is clean but not detached,
so a read of [0 - 4K] returns unexpected old data from the cache device.

To fix this problem, set the bdev state to none as soon as writeback is
done during detaching. If a power failure then happens as above, the data
in the cache is not used the next time the bcache device starts: the bdev
is already detached, and the correct data is read directly from backing.
Signed-off-by: Dongsheng Yang <dongsheng.yang(a)easystack.cn>
Signed-off-by: Coly Li <colyli(a)suse.de>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
drivers/md/bcache/super.c | 9 ---------
drivers/md/bcache/writeback.c | 9 +++++++++
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 81f1cc5b3499..b7d9d1b79ac2 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1151,9 +1151,6 @@ static void cancel_writeback_rate_update_dwork(struct cached_dev *dc)
static void cached_dev_detach_finish(struct work_struct *w)
{
struct cached_dev *dc = container_of(w, struct cached_dev, detach);
- struct closure cl;
-
- closure_init_stack(&cl);
BUG_ON(!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags));
BUG_ON(refcount_read(&dc->count));
@@ -1167,12 +1164,6 @@ static void cached_dev_detach_finish(struct work_struct *w)
dc->writeback_thread = NULL;
}
- memset(&dc->sb.set_uuid, 0, 16);
- SET_BDEV_STATE(&dc->sb, BDEV_STATE_NONE);
-
- bch_write_bdev_super(dc, &cl);
- closure_sync(&cl);
-
mutex_lock(&bch_register_lock);
calc_cached_dev_sectors(dc->disk.c);
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 3c74996978da..a129e4d2707c 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -705,6 +705,15 @@ static int bch_writeback_thread(void *arg)
* bch_cached_dev_detach().
*/
if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)) {
+ struct closure cl;
+
+ closure_init_stack(&cl);
+ memset(&dc->sb.set_uuid, 0, 16);
+ SET_BDEV_STATE(&dc->sb, BDEV_STATE_NONE);
+
+ bch_write_bdev_super(dc, &cl);
+ closure_sync(&cl);
+
up_write(&dc->writeback_lock);
break;
}
--
2.20.1
Backport 5.10.108 LTS patches from upstream
Revert "selftests/bpf: Add test for bpf_timer overwriting crash"
smsc95xx: Ignore -ENODEV errors when device is unplugged
net: usb: Correct reset handling of smsc95xx
net: usb: Correct PHY handling of smsc95xx
perf symbols: Fix symbol size calculation condition
Input: aiptek - properly check endpoint type
scsi: mpt3sas: Page fault in reply q processing
usb: usbtmc: Fix bug in pipe direction for control transfers
usb: gadget: Fix use-after-free bug by not setting udc->dev.driver
net: mscc: ocelot: fix backwards compatibility with single-chain tc-flower offload
net: bcmgenet: skip invalid partial checksums
bnx2x: fix built-in kernel driver load failure
net: phy: mscc: Add MODULE_FIRMWARE macros
net: dsa: Add missing of_node_put() in dsa_port_parse_of
net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit()
drm/panel: simple: Fix Innolux G070Y2-L01 BPP settings
drm/imx: parallel-display: Remove bus flags check in imx_pd_bridge_atomic_check()
hv_netvsc: Add check for kvmalloc_array
atm: eni: Add check for dma_map_single
net/packet: fix slab-out-of-bounds access in packet_recvmsg()
net: phy: marvell: Fix invalid comparison in the resume and suspend functions
esp6: fix check on ipv6_skip_exthdr's return value
vsock: each transport cycles only on its own sockets
efi: fix return value of __setup handlers
mm: swap: get rid of livelock in swapin readahead
ocfs2: fix crash when initialize filecheck kobj fails
crypto: qcom-rng - ensure buffer for generate is completely filled
Already merged (2):
esp: Fix possible buffer overflow in ESP transformation
arm64: fix clang warning about TRAMP_VALIAS
Total patches = 30 - 2 = 28
Alan Stern (2):
usb: gadget: Fix use-after-free bug by not setting udc->dev.driver
usb: usbtmc: Fix bug in pipe direction for control transfers
Brian Masney (1):
crypto: qcom-rng - ensure buffer for generate is completely filled
Christoph Niedermaier (1):
drm/imx: parallel-display: Remove bus flags check in
imx_pd_bridge_atomic_check()
Dan Carpenter (1):
usb: gadget: rndis: prevent integer overflow in rndis_set_response()
Doug Berger (1):
net: bcmgenet: skip invalid partial checksums
Eric Dumazet (1):
net/packet: fix slab-out-of-bounds access in packet_recvmsg()
Fabio Estevam (1):
smsc95xx: Ignore -ENODEV errors when device is unplugged
Greg Kroah-Hartman (1):
Revert "selftests/bpf: Add test for bpf_timer overwriting crash"
Guo Ziliang (1):
mm: swap: get rid of livelock in swapin readahead
Jiasheng Jiang (2):
atm: eni: Add check for dma_map_single
hv_netvsc: Add check for kvmalloc_array
Jiyong Park (1):
vsock: each transport cycles only on its own sockets
Joseph Qi (1):
ocfs2: fix crash when initialize filecheck kobj fails
Juerg Haefliger (1):
net: phy: mscc: Add MODULE_FIRMWARE macros
Kurt Cancemi (1):
net: phy: marvell: Fix invalid comparison in the resume and suspend
functions
Manish Chopra (1):
bnx2x: fix built-in kernel driver load failure
Marek Vasut (1):
drm/panel: simple: Fix Innolux G070Y2-L01 BPP settings
Markus Reichl (1):
net: usb: Correct reset handling of smsc95xx
Martyn Welch (1):
net: usb: Correct PHY handling of smsc95xx
Matt Lupfer (1):
scsi: mpt3sas: Page fault in reply q processing
Miaoqian Lin (1):
net: dsa: Add missing of_node_put() in dsa_port_parse_of
Michael Petlan (1):
perf symbols: Fix symbol size calculation condition
Nicolas Dichtel (1):
net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit()
Pavel Skripkin (1):
Input: aiptek - properly check endpoint type
Randy Dunlap (1):
efi: fix return value of __setup handlers
Sabrina Dubroca (1):
esp6: fix check on ipv6_skip_exthdr's return value
Vladimir Oltean (1):
net: mscc: ocelot: fix backwards compatibility with single-chain
tc-flower offload
drivers/atm/eni.c | 2 +
drivers/crypto/qcom-rng.c | 17 ++--
drivers/firmware/efi/apple-properties.c | 2 +-
drivers/firmware/efi/efi.c | 2 +-
drivers/gpu/drm/imx/parallel-display.c | 8 --
drivers/gpu/drm/panel/panel-simple.c | 2 +-
drivers/input/tablet/aiptek.c | 10 +--
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 2 -
.../net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 28 +++---
.../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 15 +---
.../net/ethernet/broadcom/genet/bcmgenet.c | 6 +-
drivers/net/ethernet/mscc/ocelot_flower.c | 16 +++-
drivers/net/hyperv/netvsc_drv.c | 3 +
drivers/net/phy/marvell.c | 8 +-
drivers/net/phy/mscc/mscc_main.c | 3 +
drivers/net/usb/smsc95xx.c | 86 +++++++++++--------
drivers/scsi/mpt3sas/mpt3sas_base.c | 5 +-
drivers/usb/class/usbtmc.c | 13 ++-
drivers/usb/gadget/function/rndis.c | 1 +
drivers/usb/gadget/udc/core.c | 3 -
drivers/vhost/vsock.c | 3 +-
fs/ocfs2/super.c | 22 ++---
include/linux/if_arp.h | 1 +
include/net/af_vsock.h | 3 +-
mm/swap_state.c | 2 +-
net/dsa/dsa2.c | 1 +
net/ipv6/esp6.c | 3 +-
net/packet/af_packet.c | 11 ++-
net/vmw_vsock/af_vsock.c | 9 +-
net/vmw_vsock/virtio_transport.c | 7 +-
net/vmw_vsock/vmci_transport.c | 5 +-
tools/perf/util/symbol.c | 2 +-
.../selftests/bpf/prog_tests/timer_crash.c | 32 -------
.../testing/selftests/bpf/progs/timer_crash.c | 54 ------------
34 files changed, 175 insertions(+), 212 deletions(-)
delete mode 100644 tools/testing/selftests/bpf/prog_tests/timer_crash.c
delete mode 100644 tools/testing/selftests/bpf/progs/timer_crash.c
--
2.20.1