[PATCH openEuler-1.0-LTS] io_uring: io_close: Set owner as current->files if req->work.files uninitialized
by Yongqiang Liu 28 Jun '22
From: Zhihao Cheng <chengzhihao1(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 186543, https://gitee.com/openeuler/kernel/issues/I5BGFA
CVE: NA
--------------------------------
The following process will trigger a use-after-free problem:
1. open /proc/sysvipc/msg and lock it by file lock
fcntl_setlk
do_lock_file_wait
vfs_lock_file
posix_lock_file
locks_insert_lock_ctx
locks_insert_global_locks // Added to lock list
2. Close /proc/sysvipc/msg by io_uring
filp_close(close->put_file, req->work.files) // req->work.files is NULL;
io_grab_files() would initialize it, but non-async
operations never invoke that function.
locks_remove_posix(filp, NULL)
lock.fl_owner = NULL
vfs_lock_file
posix_lock_file
posix_same_owner // Returns false because fl_owner differs.
locks_delete_lock_ctx(fl, &dispose) and locks_dispose_list
are not executed, so the file lock is not removed from the lock list
fput(filp) // release filp
3. Read /proc/locks
seq_read
locks_start // Get flock from lock list
locks_show
lock_get_status
file_inode(f->file) // Access released file, UAF occurs!
Fix it by passing current->files when req->work.files is uninitialized.
Because the io-sq thread shares the same files with the uring_fd task,
this also works in SQPOLL mode.
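For reference, a minimal userspace sketch of steps 1 and 2 above, assuming
a kernel with IORING_OP_CLOSE support and liburing installed (step 3 is
simply reading /proc/locks afterwards); error handling is trimmed:
/* Hedged reproducer sketch for the steps above; assumes liburing. */
#include <fcntl.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	struct flock fl = { .l_type = F_RDLCK, .l_whence = SEEK_SET };
	int fd = open("/proc/sysvipc/msg", O_RDONLY);

	if (fd < 0 || fcntl(fd, F_SETLK, &fl) < 0)	/* step 1: insert lock */
		return 1;
	io_uring_queue_init(4, &ring, 0);
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_close(sqe, fd);			/* step 2: close via io_uring */
	io_uring_submit(&ring);
	io_uring_wait_cqe(&ring, &cqe);
	io_uring_cqe_seen(&ring, cqe);
	io_uring_queue_exit(&ring);
	return 0;	/* now: cat /proc/locks to walk the stale entry */
}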
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13(a)huawei.com>
---
fs/io_uring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index c104425b2557..7ae8ba98e73b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3903,7 +3903,7 @@ static int io_close(struct io_kiocb *req, bool force_nonblock,
}
/* No ->flush() or already async, safely close from here */
- ret = filp_close(close->put_file, req->work.files);
+ ret = filp_close(close->put_file, req->work.files ? : current->files);
if (ret < 0)
req_set_fail_links(req);
fput(close->put_file);
--
2.25.1

[PATCH openEuler-5.10-LTS 1/5] lockdown: also lock down previous kgdb use
by Zheng Zengkai 27 Jun '22
From: Daniel Thompson <daniel.thompson(a)linaro.org>
from stable-v5.10.119
commit a8f4d63142f947cd22fa615b8b3b8921cdaf4991
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5A5YP
CVE: CVE-2022-21499
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
commit eadb2f47a3ced5c64b23b90fd2a3463f63726066 upstream.
KGDB and KDB allow read and write access to kernel memory, and thus
should be restricted during lockdown. An attacker with access to a
serial port (for example, via a hypervisor console, which some cloud
vendors provide over the network) could trigger the debugger so it is
important that the debugger respect the lockdown mode when/if it is
triggered.
Fix this by integrating lockdown into kdb's existing permissions
mechanism. Unfortunately kgdb does not have any permissions mechanism
(although it certainly could be added later) so, for now, kgdb is simply
and brutally disabled by immediately exiting the gdb stub without taking
any action.
For lockdowns established early in the boot (e.g. the normal case) this
should be fine, but on systems where kgdb has set breakpoints before
the lockdown is enacted, "bad things" will happen.
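The gating pattern the patch applies is the standard lockdown LSM check;
a minimal sketch of how such a check guards a debug entry point (the
helper name here is hypothetical):
/* Sketch of the lockdown gating pattern used by this patch; the helper
 * name dbg_try_write_kernel is hypothetical. security_locked_down()
 * returns -EPERM when the given operation is denied by lockdown. */
#include <linux/errno.h>
#include <linux/security.h>

static int dbg_try_write_kernel(void)
{
	if (security_locked_down(LOCKDOWN_DBG_WRITE_KERNEL))
		return -EPERM;	/* locked down: refuse to touch kernel memory */
	/* ...perform the privileged debug write... */
	return 0;
}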
CVE: CVE-2022-21499
Co-developed-by: Stephen Brennan <stephen.s.brennan(a)oracle.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan(a)oracle.com>
Reviewed-by: Douglas Anderson <dianders(a)chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson(a)linaro.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
include/linux/security.h | 2 ++
kernel/debug/debug_core.c | 24 ++++++++++++++
kernel/debug/kdb/kdb_main.c | 62 +++++++++++++++++++++++++++++++++++--
security/security.c | 2 ++
4 files changed, 87 insertions(+), 3 deletions(-)
diff --git a/include/linux/security.h b/include/linux/security.h
index 35355429648e..330029ef7e89 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -121,10 +121,12 @@ enum lockdown_reason {
LOCKDOWN_DEBUGFS,
LOCKDOWN_XMON_WR,
LOCKDOWN_BPF_WRITE_USER,
+ LOCKDOWN_DBG_WRITE_KERNEL,
LOCKDOWN_INTEGRITY_MAX,
LOCKDOWN_KCORE,
LOCKDOWN_KPROBES,
LOCKDOWN_BPF_READ,
+ LOCKDOWN_DBG_READ_KERNEL,
LOCKDOWN_PERF,
LOCKDOWN_TRACEFS,
LOCKDOWN_XMON_RW,
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index be5b6b97adbf..363f781b56ca 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -56,6 +56,7 @@
#include <linux/vmacache.h>
#include <linux/rcupdate.h>
#include <linux/irq.h>
+#include <linux/security.h>
#include <asm/cacheflush.h>
#include <asm/byteorder.h>
@@ -762,6 +763,29 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
continue;
kgdb_connected = 0;
} else {
+ /*
+ * This is a brutal way to interfere with the debugger
+ * and prevent gdb being used to poke at kernel memory.
+ * This could cause trouble if lockdown is applied when
+ * there is already an active gdb session. For now the
+ * answer is simply "don't do that". Typically lockdown
+ * *will* be applied before the debug core gets started
+ * so only developers using kgdb for fairly advanced
+ * early kernel debug can be biten by this. Hopefully
+ * they are sophisticated enough to take care of
+ * themselves, especially with help from the lockdown
+ * message printed on the console!
+ */
+ if (security_locked_down(LOCKDOWN_DBG_WRITE_KERNEL)) {
+ if (IS_ENABLED(CONFIG_KGDB_KDB)) {
+ /* Switch back to kdb if possible... */
+ dbg_kdb_mode = 1;
+ continue;
+ } else {
+ /* ... otherwise just bail */
+ break;
+ }
+ }
error = gdb_serial_stub(ks);
}
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 930ac1b25ec7..4e09fab52faf 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -45,6 +45,7 @@
#include <linux/proc_fs.h>
#include <linux/uaccess.h>
#include <linux/slab.h>
+#include <linux/security.h>
#include "kdb_private.h"
#undef MODULE_PARAM_PREFIX
@@ -197,10 +198,62 @@ struct task_struct *kdb_curr_task(int cpu)
}
/*
- * Check whether the flags of the current command and the permissions
- * of the kdb console has allow a command to be run.
+ * Update the permissions flags (kdb_cmd_enabled) to match the
+ * current lockdown state.
+ *
+ * Within this function the calls to security_locked_down() are "lazy". We
+ * avoid calling them if the current value of kdb_cmd_enabled already excludes
+ * flags that might be subject to lockdown. Additionally we deliberately check
+ * the lockdown flags independently (even though read lockdown implies write
+ * lockdown) since that results in both simpler code and clearer messages to
+ * the user on first-time debugger entry.
+ *
+ * The permission masks during a read+write lockdown permits the following
+ * flags: INSPECT, SIGNAL, REBOOT (and ALWAYS_SAFE).
+ *
+ * The INSPECT commands are not blocked during lockdown because they are
+ * not arbitrary memory reads. INSPECT covers the backtrace family (sometimes
+ * forcing them to have no arguments) and lsmod. These commands do expose
+ * some kernel state but do not allow the developer seated at the console to
+ * choose what state is reported. SIGNAL and REBOOT should not be controversial,
+ * given these are allowed for root during lockdown already.
+ */
+static void kdb_check_for_lockdown(void)
+{
+ const int write_flags = KDB_ENABLE_MEM_WRITE |
+ KDB_ENABLE_REG_WRITE |
+ KDB_ENABLE_FLOW_CTRL;
+ const int read_flags = KDB_ENABLE_MEM_READ |
+ KDB_ENABLE_REG_READ;
+
+ bool need_to_lockdown_write = false;
+ bool need_to_lockdown_read = false;
+
+ if (kdb_cmd_enabled & (KDB_ENABLE_ALL | write_flags))
+ need_to_lockdown_write =
+ security_locked_down(LOCKDOWN_DBG_WRITE_KERNEL);
+
+ if (kdb_cmd_enabled & (KDB_ENABLE_ALL | read_flags))
+ need_to_lockdown_read =
+ security_locked_down(LOCKDOWN_DBG_READ_KERNEL);
+
+ /* De-compose KDB_ENABLE_ALL if required */
+ if (need_to_lockdown_write || need_to_lockdown_read)
+ if (kdb_cmd_enabled & KDB_ENABLE_ALL)
+ kdb_cmd_enabled = KDB_ENABLE_MASK & ~KDB_ENABLE_ALL;
+
+ if (need_to_lockdown_write)
+ kdb_cmd_enabled &= ~write_flags;
+
+ if (need_to_lockdown_read)
+ kdb_cmd_enabled &= ~read_flags;
+}
+
+/*
+ * Check whether the flags of the current command, the permissions of the kdb
+ * console and the lockdown state allow a command to be run.
*/
-static inline bool kdb_check_flags(kdb_cmdflags_t flags, int permissions,
+static bool kdb_check_flags(kdb_cmdflags_t flags, int permissions,
bool no_args)
{
/* permissions comes from userspace so needs massaging slightly */
@@ -1194,6 +1247,9 @@ static int kdb_local(kdb_reason_t reason, int error, struct pt_regs *regs,
kdb_curr_task(raw_smp_processor_id());
KDB_DEBUG_STATE("kdb_local 1", reason);
+
+ kdb_check_for_lockdown();
+
kdb_go_count = 0;
if (reason == KDB_REASON_DEBUG) {
/* special case below */
diff --git a/security/security.c b/security/security.c
index 4fb58543eeb9..2fc40217d49d 100644
--- a/security/security.c
+++ b/security/security.c
@@ -59,10 +59,12 @@ const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
[LOCKDOWN_DEBUGFS] = "debugfs access",
[LOCKDOWN_XMON_WR] = "xmon write access",
[LOCKDOWN_BPF_WRITE_USER] = "use of bpf to write user RAM",
+ [LOCKDOWN_DBG_WRITE_KERNEL] = "use of kgdb/kdb to write kernel RAM",
[LOCKDOWN_INTEGRITY_MAX] = "integrity",
[LOCKDOWN_KCORE] = "/proc/kcore access",
[LOCKDOWN_KPROBES] = "use of kprobes",
[LOCKDOWN_BPF_READ] = "use of bpf to read kernel RAM",
+ [LOCKDOWN_DBG_READ_KERNEL] = "use of kgdb/kdb to read kernel RAM",
[LOCKDOWN_PERF] = "unsafe use of perf",
[LOCKDOWN_TRACEFS] = "use of tracefs",
[LOCKDOWN_XMON_RW] = "xmon read and write access",
--
2.20.1
Backport 5.10.109 LTS patches from upstream
nds32: fix access_ok() checks in get/put_user
wcn36xx: Differentiate wcn3660 from wcn3620
tpm: use try_get_ops() in tpm-space.c
mac80211: fix potential double free on mesh join
rcu: Don't deboost before reporting expedited quiescent state
Revert "ath: add support for special 0x0 regulatory domain"
crypto: qat - disable registration of algorithms
ACPI: video: Force backlight native for Clevo NL5xRU and NL5xNU
ACPI: battery: Add device HID and quirk for Microsoft Surface Go 3
ACPI / x86: Work around broken XSDT on Advantech DAC-BJ01 board
drivers: net: xgene: Fix regression in CRC stripping
ALSA: pci: fix reading of swapped values from pcmreg in AC97 codec
ALSA: cmipci: Restore aux vol on suspend/resume
ALSA: usb-audio: Add mute TLV for playback volumes on RODE NT-USB
ALSA: pcm: Add stream lock during PCM reset ioctl operations
ALSA: pcm: Fix races among concurrent prealloc proc writes
ALSA: pcm: Fix races among concurrent prepare and hw_params/hw_free calls
ALSA: pcm: Fix races among concurrent read/write and buffer changes
ALSA: pcm: Fix races among concurrent hw_params and hw_free calls
ALSA: hda/realtek: Add quirk for ASUS GA402
ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc671
ALSA: hda/realtek: Add quirk for Clevo NP50PNJ
ALSA: hda/realtek: Add quirk for Clevo NP70PNJ
ALSA: usb-audio: add mapping for new Corsair Virtuoso SE
ALSA: oss: Fix PCM OSS buffer allocation overflow
ASoC: sti: Fix deadlock via snd_pcm_stop_xrun() call
staging: fbtft: fb_st7789v: reset display before initialization
tpm: Fix error handling in async work
cgroup-v1: Correct privileges check in release_agent writes
exfat: avoid incorrectly releasing for root inode
net: ipv6: fix skb_over_panic in __ip6_append_data
Already merged:
llc: only change llc->dev when bind() succeeds
netfilter: nf_tables: initialize registers in nft_do_chain()
llc: fix netdevice reference leaks in llc_ui_bind()
cgroup: Use open-time cgroup namespace for process migration perm checks
cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv
nfc: st21nfca: Fix potential buffer overflows in EVT_TRANSACTION
Total patches: 37 - 6 = 31
Arnd Bergmann (1):
nds32: fix access_ok() checks in get/put_user
Brian Norris (1):
Revert "ath: add support for special 0x0 regulatory domain"
Bryan O'Donoghue (1):
wcn36xx: Differentiate wcn3660 from wcn3620
Chen Li (1):
exfat: avoid incorrectly releasing for root inode
Giacomo Guiduzzi (1):
ALSA: pci: fix reading of swapped values from pcmreg in AC97 codec
Giovanni Cabiddu (1):
crypto: qat - disable registration of algorithms
James Bottomley (1):
tpm: use try_get_ops() in tpm-space.c
Jason Zheng (1):
ALSA: hda/realtek: Add quirk for ASUS GA402
Jonathan Teh (1):
ALSA: cmipci: Restore aux vol on suspend/resume
Lars-Peter Clausen (1):
ALSA: usb-audio: Add mute TLV for playback volumes on RODE NT-USB
Linus Lüssing (1):
mac80211: fix potential double free on mesh join
Mark Cilissen (1):
ACPI / x86: Work around broken XSDT on Advantech DAC-BJ01 board
Maximilian Luz (1):
ACPI: battery: Add device HID and quirk for Microsoft Surface Go 3
Michal Koutný (1):
cgroup-v1: Correct privileges check in release_agent writes
Oliver Graute (1):
staging: fbtft: fb_st7789v: reset display before initialization
Paul E. McKenney (1):
rcu: Don't deboost before reporting expedited quiescent state
Reza Jahanbakhshi (1):
ALSA: usb-audio: add mapping for new Corsair Virtuoso SE
Stephane Graber (1):
drivers: net: xgene: Fix regression in CRC stripping
Tadeusz Struk (2):
net: ipv6: fix skb_over_panic in __ip6_append_data
tpm: Fix error handling in async work
Takashi Iwai (7):
ASoC: sti: Fix deadlock via snd_pcm_stop_xrun() call
ALSA: oss: Fix PCM OSS buffer allocation overflow
ALSA: pcm: Fix races among concurrent hw_params and hw_free calls
ALSA: pcm: Fix races among concurrent read/write and buffer changes
ALSA: pcm: Fix races among concurrent prepare and hw_params/hw_free
calls
ALSA: pcm: Fix races among concurrent prealloc proc writes
ALSA: pcm: Add stream lock during PCM reset ioctl operations
Tim Crawford (2):
ALSA: hda/realtek: Add quirk for Clevo NP70PNJ
ALSA: hda/realtek: Add quirk for Clevo NP50PNJ
Werner Sembach (1):
ACPI: video: Force backlight native for Clevo NL5xRU and NL5xNU
huangwenhui (1):
ALSA: hda/realtek - Fix headset mic problem for a HP machine with
alc671
arch/nds32/include/asm/uaccess.h | 22 ++++-
arch/x86/kernel/acpi/boot.c | 24 +++++
drivers/acpi/battery.c | 12 +++
drivers/acpi/video_detect.c | 75 ++++++++++++++
drivers/char/tpm/tpm-dev-common.c | 8 +-
drivers/char/tpm/tpm2-space.c | 8 +-
drivers/crypto/qat/qat_common/qat_crypto.c | 8 ++
.../net/ethernet/apm/xgene/xgene_enet_main.c | 12 ++-
drivers/net/wireless/ath/regd.c | 10 +-
drivers/net/wireless/ath/wcn36xx/main.c | 3 +
drivers/net/wireless/ath/wcn36xx/wcn36xx.h | 1 +
drivers/staging/fbtft/fb_st7789v.c | 2 +
fs/exfat/super.c | 2 +-
include/sound/pcm.h | 1 +
kernel/cgroup/cgroup-v1.c | 6 +-
kernel/rcu/tree_plugin.h | 9 +-
net/ipv6/ip6_output.c | 4 +-
net/mac80211/cfg.c | 3 -
sound/core/oss/pcm_oss.c | 12 ++-
sound/core/oss/pcm_plugin.c | 5 +-
sound/core/pcm.c | 2 +
sound/core/pcm_lib.c | 4 +
sound/core/pcm_memory.c | 11 ++-
sound/core/pcm_native.c | 97 ++++++++++++-------
sound/pci/ac97/ac97_codec.c | 4 +-
sound/pci/cmipci.c | 3 +-
sound/pci/hda/patch_realtek.c | 4 +
sound/soc/sti/uniperif_player.c | 6 +-
sound/soc/sti/uniperif_reader.c | 2 +-
sound/usb/mixer_maps.c | 10 ++
sound/usb/mixer_quirks.c | 7 +-
31 files changed, 289 insertions(+), 88 deletions(-)
--
2.20.1

[PATCH openEuler-1.0-LTS] mm/memcontrol: fix wrong vmstats for dying memcg
by Yongqiang Liu 27 Jun '22
From: Lu Jialin <lujialin4(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5E8LA
CVE: NA
--------------------------------
At present, stat_cpu->count is folded into stat only when its absolute
value exceeds MEMCG_CHARGE_BATCH, so stat always lags behind the
correct value by some amount.
In addition, since a partially deleted memcg is still referenced, it is
not freed immediately after it goes offline. Although such a remaining
memcg has released its pages, its stat and its parent's stat can stay
non-zero or become too large because of the update lag, which makes
the total_<count> fields in the memory.stat file abnormal.
This patch keeps a memcg's stat in sync with the correct value during
the destruction process in two ways:
1) Perform a flush synchronization operation when the memcg goes offline
2) For a memcg in the process of being destroyed, bypass the threshold
check when updating vmstats
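For context, a self-contained sketch of the per-CPU batching that causes
the lag described above (illustrative only; the kernel's real code uses
this_cpu operations on the memcg structures):
#include <stdlib.h>

#define MEMCG_CHARGE_BATCH 32
#define NR_CPUS 4

/* Each CPU accumulates a local delta and folds it into the global stat
 * only once |delta| exceeds the batch, so readers can lag behind the
 * correct value by up to one batch per CPU. */
struct batched_counter {
	long stat;		/* what readers see */
	long delta[NR_CPUS];	/* pending per-CPU updates */
};

static void mod_counter(struct batched_counter *c, int cpu, long val)
{
	long x = c->delta[cpu] + val;

	if (labs(x) > MEMCG_CHARGE_BATCH) {
		c->stat += x;	/* flush into the shared counter */
		x = 0;
	}
	c->delta[cpu] = x;
}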
Signed-off-by: Lu Jialin <lujialin4(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13(a)huawei.com>
---
mm/memcontrol.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2983baf910f4..345a9d159ad8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -697,7 +697,8 @@ void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
return;
x = val + __this_cpu_read(memcg->stat_cpu->count[idx]);
- if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
+ if (unlikely(abs(x) > MEMCG_CHARGE_BATCH ||
+ memcg->css.flags & CSS_DYING)) {
struct mem_cgroup *mi;
struct mem_cgroup_extension *memcg_ext;
@@ -3244,8 +3245,10 @@ static void memcg_flush_percpu_vmstats(struct mem_cgroup *memcg)
stat[i] = 0;
for_each_online_cpu(cpu)
- for (i = 0; i < MEMCG_NR_STAT; i++)
+ for (i = 0; i < MEMCG_NR_STAT; i++) {
stat[i] += per_cpu(memcg->stat_cpu->count[i], cpu);
+ per_cpu(memcg->stat_cpu->count[i], cpu) = 0;
+ }
for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
for (i = 0; i < MEMCG_NR_STAT; i++)
@@ -3259,9 +3262,11 @@ static void memcg_flush_percpu_vmstats(struct mem_cgroup *memcg)
stat[i] = 0;
for_each_online_cpu(cpu)
- for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
+ for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
stat[i] += per_cpu(
pn->lruvec_stat_cpu->count[i], cpu);
+ per_cpu(pn->lruvec_stat_cpu->count[i], cpu) = 0;
+ }
for (pi = pn; pi; pi = parent_nodeinfo(pi, node))
for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
@@ -3279,9 +3284,11 @@ static void memcg_flush_percpu_vmevents(struct mem_cgroup *memcg)
events[i] = 0;
for_each_online_cpu(cpu)
- for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
+ for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
events[i] += per_cpu(memcg->stat_cpu->events[i],
cpu);
+ per_cpu(memcg->stat_cpu->events[i], cpu) = 0;
+ }
for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
@@ -5106,6 +5113,9 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
memcg_offline_kmem(memcg);
wb_memcg_offline(memcg);
+ memcg_flush_percpu_vmstats(memcg);
+ memcg_flush_percpu_vmevents(memcg);
+
mem_cgroup_id_put(memcg);
}
--
2.25.1
Hello everyone,
This Intel Arch regular meeting is scheduled for this Tuesday, 6/28, 10:00-11:00 AM. Everyone is welcome to raise more requirements or topics and to join the discussion.
Preliminary agenda:
*Status update
*SPR feature PRs merge into intel-kernel & OLK-5.10 kernel
*Compiler support for new instructions
*Support 22.09 release for SPR fundamental features
-----Original Appointment-----
From: openEuler conference <public(a)openeuler.org>
Sent: Monday, June 20, 2022 3:10 PM
To: openEuler conference; jun.j.tian@intel.com,kai.liu@suse.com
Subject: sig-Intel-Arch
When: Tuesday, June 28, 2022 10:00 AM-11:00 AM (UTC+08:00) Beijing, Chongqing, Hong Kong, Urumqi.
Where:
Hello!
The sig-Intel-Arch SIG invites you to a Zoom meeting to be held at 2022-06-28 10:00.
Subject: sig-Intel-Arch
Meeting link: https://us06web.zoom.us/j/81976528831?pwd=cVIxUkRhUXFGcldFV0ZtNkpvUFpxZz09
Add topics / meeting notes: https://etherpad.openeuler.org/p/sig-Intel-Arch-meetings
Note: You are advised to change your participant name after joining the conference; you may use your gitee.com ID.
More information: https://openeuler.org/en/

[PATCH openEuler-1.0-LTS] ext4: recover csum seed of tmp_inode after migrating to extents
by Yongqiang Liu 25 Jun '22
From: Li Lingfeng <lilingfeng3(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: 186944, https://gitee.com/openeuler/kernel/issues/I5DAJY
CVE: NA
--------------------------------
When migrating to extents, the checksum seed of the temporary inode
needs to be replaced by the inode's, otherwise the inode checksums
will be incorrect when swapping the inodes' data.
However, the temporary inode then cannot match its checksum to
itself since it has lost its own checksum seed.
mkfs.ext4 -F /dev/sdc
mount /dev/sdc /mnt/sdc
xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/sdc/testfile
chattr -e /mnt/sdc/testfile
chattr +e /mnt/sdc/testfile
fsck -fn /dev/sdc
========
...
Pass 1: Checking inodes, blocks, and sizes
Inode 13 passes checks, but checksum does not match inode. Fix? no
...
========
The fix is simple: save the checksum seed of the temporary inode and
restore it after migrating to extents.
Fixes: e81c9302a6c3 ("ext4: set csum seed in tmp inode while migrating to extents")
Signed-off-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13(a)huawei.com>
---
fs/ext4/migrate.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 75a769634b2b..ed9e7816efbb 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -415,7 +415,7 @@ int ext4_ext_migrate(struct inode *inode)
struct inode *tmp_inode = NULL;
struct migrate_struct lb;
unsigned long max_entries;
- __u32 goal;
+ __u32 goal, tmp_csum_seed;
uid_t owner[2];
/*
@@ -463,6 +463,7 @@ int ext4_ext_migrate(struct inode *inode)
* the migration.
*/
ei = EXT4_I(inode);
+ tmp_csum_seed = EXT4_I(tmp_inode)->i_csum_seed;
EXT4_I(tmp_inode)->i_csum_seed = ei->i_csum_seed;
i_size_write(tmp_inode, i_size_read(inode));
/*
@@ -573,6 +574,7 @@ int ext4_ext_migrate(struct inode *inode)
* the inode is not visible to user space.
*/
tmp_inode->i_blocks = 0;
+ EXT4_I(tmp_inode)->i_csum_seed = tmp_csum_seed;
/* Reset the extent details */
ext4_ext_tree_init(handle, tmp_inode);
--
2.25.1

[PATCH openEuler-1.0-LTS] vfio: framework supporting vfio device hot migration
by RongWang 24 Jun '22
From: Rong Wang <w_angrong(a)163.com>
kunpeng inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5CO9A
CVE: NA
---------------------------------
For pass-through devices, the hypervisor cannot control the status of
the device and cannot track memory dirtied by device DMA during
migration.
The goal of this framework is to cooperate with the hardware to
accomplish the tasks above:
    qemu
      | status control and dirty memory report
    vfio
      | ops to hardware
    hardware
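For illustration, a vendor driver would plug into this framework through
the registration API added by this patch; a minimal sketch, with
hypothetical device IDs and ops bodies, and field names following their
use in the patch:
/* Hypothetical vendor module registering migration ops with the
 * framework added by this patch. Only enable/disable are sketched;
 * a real driver also fills save/restore/log_* callbacks. */
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/vfio_pci_migration.h>

static int my_vf_enable(struct pci_dev *pdev)  { return 0; } /* start VF */
static int my_vf_disable(struct pci_dev *pdev) { return 0; } /* quiesce VF */

static struct vfio_device_migration_ops my_mig_ops = {
	.enable  = my_vf_enable,
	.disable = my_vf_disable,
};

static int __init my_mig_init(void)
{
	/* MY_VENDOR_ID/MY_DEVICE_ID are placeholders for the real PF. */
	struct pci_dev *pdev = pci_get_device(MY_VENDOR_ID, MY_DEVICE_ID, NULL);

	if (!pdev)
		return -ENODEV;
	return vfio_pci_register_migration_ops(&my_mig_ops, THIS_MODULE, pdev);
}
module_init(my_mig_init);
MODULE_LICENSE("GPL");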
Signed-off-by: Rong Wang <w_angrong(a)163.com>
Signed-off-by: HuHua Li <18245010845(a)163.com>
Signed-off-by: Ripeng Qiu <965412048(a)qq.com>
---
drivers/vfio/pci/Makefile | 2 +-
drivers/vfio/pci/vfio_pci.c | 54 +++
drivers/vfio/pci/vfio_pci_migration.c | 755 ++++++++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_private.h | 14 +-
drivers/vfio/vfio.c | 411 +++++++++++++++++-
include/linux/vfio_pci_migration.h | 136 ++++++
6 files changed, 1367 insertions(+), 5 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci_migration.c
create mode 100644 include/linux/vfio_pci_migration.h
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 76d8ec0..80a777d 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,5 +1,5 @@
-vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
+vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o vfio_pci_migration.o
vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 51b791c..59d8280 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -30,6 +30,7 @@
#include <linux/vgaarb.h>
#include <linux/nospec.h>
#include <linux/sched/mm.h>
+#include <linux/vfio_pci_migration.h>
#include "vfio_pci_private.h"
@@ -296,6 +297,14 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
vfio_pci_probe_mmaps(vdev);
+ if (vfio_dev_migration_is_supported(pdev)) {
+ ret = vfio_pci_migration_init(vdev);
+ if (ret) {
+ dev_warn(&vdev->pdev->dev, "Failed to init vfio_pci_migration\n");
+ vfio_pci_disable(vdev);
+ return ret;
+ }
+ }
return 0;
}
@@ -392,6 +401,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
out:
pci_disable_device(pdev);
+ vfio_pci_migration_exit(vdev);
vfio_pci_try_bus_reset(vdev);
if (!disable_idle_d3)
@@ -642,6 +652,41 @@ struct vfio_devices {
int max_index;
};
+static long vfio_pci_handle_log_buf_ctl(struct vfio_pci_device *vdev,
+ const unsigned long arg)
+{
+ struct vfio_log_buf_ctl *log_buf_ctl = NULL;
+ struct vfio_log_buf_info *log_buf_info = NULL;
+ struct vf_migration_log_info migration_log_info;
+ long ret = 0;
+
+ log_buf_ctl = (struct vfio_log_buf_ctl *)arg;
+ log_buf_info = (struct vfio_log_buf_info *)log_buf_ctl->data;
+
+ switch (log_buf_ctl->flags) {
+ case VFIO_DEVICE_LOG_BUF_FLAG_START:
+ migration_log_info.dom_uuid = log_buf_info->uuid;
+ migration_log_info.buffer_size =
+ log_buf_info->buffer_size;
+ migration_log_info.sge_num = log_buf_info->addrs_size;
+ migration_log_info.sge_len = log_buf_info->frag_size;
+ migration_log_info.sgevec = log_buf_info->sgevec;
+ ret = vfio_pci_device_log_start(vdev,
+ &migration_log_info);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_STOP:
+ ret = vfio_pci_device_log_stop(vdev,
+ log_buf_info->uuid);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_STATUS_QUERY:
+ ret = vfio_pci_device_log_status_query(vdev);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+ return ret;
+}
static long vfio_pci_ioctl(void *device_data,
unsigned int cmd, unsigned long arg)
{
@@ -1142,6 +1187,8 @@ static long vfio_pci_ioctl(void *device_data,
return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
ioeventfd.data, count, ioeventfd.fd);
+ } else if (cmd == VFIO_DEVICE_LOG_BUF_CTL) {
+ return vfio_pci_handle_log_buf_ctl(vdev, arg);
}
return -ENOTTY;
@@ -1566,6 +1613,9 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
pci_set_power_state(pdev, PCI_D3hot);
}
+ if (vfio_dev_migration_is_supported(pdev))
+ ret = vfio_pci_device_init(pdev);
+
return ret;
}
@@ -1591,6 +1641,10 @@ static void vfio_pci_remove(struct pci_dev *pdev)
if (!disable_idle_d3)
pci_set_power_state(pdev, PCI_D0);
+
+ if (vfio_dev_migration_is_supported(pdev)) {
+ vfio_pci_device_uninit(pdev);
+ }
}
static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
diff --git a/drivers/vfio/pci/vfio_pci_migration.c b/drivers/vfio/pci/vfio_pci_migration.c
new file mode 100644
index 0000000..f69cd13
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci_migration.c
@@ -0,0 +1,755 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Huawei Technologies Co., Ltd. All rights reserved.
+ */
+
+#include <linux/module.h>
+#include <linux/io.h>
+#include <linux/pci.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/vfio_pci_migration.h>
+
+#include "vfio_pci_private.h"
+
+static LIST_HEAD(vfio_pci_mig_drivers_list);
+static DEFINE_MUTEX(vfio_pci_mig_drivers_mutex);
+
+static void vfio_pci_add_mig_drv(struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ atomic_set(&mig_drv->count, 1);
+ list_add_tail(&mig_drv->list, &vfio_pci_mig_drivers_list);
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+}
+
+static void vfio_pci_remove_mig_drv(struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ list_del(&mig_drv->list);
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+}
+
+static struct vfio_pci_vendor_mig_driver *
+ vfio_pci_find_mig_drv(struct pci_dev *pdev, struct module *module)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv = NULL;
+
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ list_for_each_entry(mig_drv, &vfio_pci_mig_drivers_list, list) {
+ if (mig_drv->owner == module) {
+ if (mig_drv->bus_num == pdev->bus->number)
+ goto out;
+ }
+ }
+ mig_drv = NULL;
+out:
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+ return mig_drv;
+}
+
+static struct vfio_pci_vendor_mig_driver *
+ vfio_pci_get_mig_driver(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv = NULL;
+ struct pci_dev *pf_dev = pci_physfn(pdev);
+
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ list_for_each_entry(mig_drv, &vfio_pci_mig_drivers_list, list) {
+ if (mig_drv->bus_num == pf_dev->bus->number)
+ goto out;
+ }
+ mig_drv = NULL;
+out:
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+ return mig_drv;
+}
+
+bool vfio_dev_migration_is_supported(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ mig_driver = vfio_pci_get_mig_driver(pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_warn(&pdev->dev, "unable to find a mig_drv module\n");
+ return false;
+ }
+
+ return true;
+}
+
+int vfio_pci_device_log_start(struct vfio_pci_device *vdev,
+ struct vf_migration_log_info *log_info)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&vdev->pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (!mig_driver->dev_mig_ops->log_start ||
+ (mig_driver->dev_mig_ops->log_start(vdev->pdev,
+ log_info) != 0)) {
+ dev_err(&vdev->pdev->dev, "failed to set log start\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int vfio_pci_device_log_stop(struct vfio_pci_device *vdev, uint32_t uuid)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&vdev->pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (!mig_driver->dev_mig_ops->log_stop ||
+ (mig_driver->dev_mig_ops->log_stop(vdev->pdev, uuid) != 0)) {
+ dev_err(&vdev->pdev->dev, "failed to set log stop\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int vfio_pci_device_log_status_query(struct vfio_pci_device *vdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&vdev->pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (!mig_driver->dev_mig_ops->get_log_status ||
+ (mig_driver->dev_mig_ops->get_log_status(vdev->pdev) != 0)) {
+ dev_err(&vdev->pdev->dev, "failed to get log status\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int vfio_pci_device_init(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv;
+
+ mig_drv = vfio_pci_get_mig_driver(pdev);
+ if (!mig_drv || !mig_drv->dev_mig_ops) {
+ dev_err(&pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (mig_drv->dev_mig_ops->init)
+ return mig_drv->dev_mig_ops->init(pdev);
+
+ return -EFAULT;
+}
+
+void vfio_pci_device_uninit(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv;
+
+ mig_drv = vfio_pci_get_mig_driver(pdev);
+ if (!mig_drv || !mig_drv->dev_mig_ops) {
+ dev_err(&pdev->dev, "unable to find a mig_drv module\n");
+ return;
+ }
+
+ if (mig_drv->dev_mig_ops->uninit)
+ mig_drv->dev_mig_ops->uninit(pdev);
+}
+
+static void vfio_pci_device_release(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (mig_drv->dev_mig_ops->release)
+ mig_drv->dev_mig_ops->release(pdev);
+}
+
+static int vfio_pci_device_get_info(struct pci_dev *pdev,
+ struct vfio_device_migration_info *mig_info,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (mig_drv->dev_mig_ops->get_info)
+ return mig_drv->dev_mig_ops->get_info(pdev, mig_info);
+ return -EFAULT;
+}
+
+static int vfio_pci_device_enable(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (!mig_drv->dev_mig_ops->enable ||
+ (mig_drv->dev_mig_ops->enable(pdev) != 0)) {
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int vfio_pci_device_disable(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (!mig_drv->dev_mig_ops->disable ||
+ (mig_drv->dev_mig_ops->disable(pdev) != 0))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int vfio_pci_device_pre_enable(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (!mig_drv->dev_mig_ops->pre_enable ||
+ (mig_drv->dev_mig_ops->pre_enable(pdev) != 0))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int vfio_pci_device_state_save(struct pci_dev *pdev,
+ struct vfio_pci_migration_data *data)
+{
+ struct vfio_device_migration_info *mig_info = data->mig_ctl;
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ void *base = (void *)mig_info;
+ int ret = 0;
+
+ if ((mig_info->device_state & VFIO_DEVICE_STATE_RUNNING) != 0) {
+ ret = vfio_pci_device_disable(pdev, mig_drv);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to stop VF function!\n");
+ return ret;
+ }
+ mig_info->device_state &= ~VFIO_DEVICE_STATE_RUNNING;
+ }
+
+ if (mig_drv->dev_mig_ops && mig_drv->dev_mig_ops->save) {
+ ret = mig_drv->dev_mig_ops->save(pdev, base,
+ mig_info->data_offset, data->state_size);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to save device state!\n");
+ return -EINVAL;
+ }
+ } else {
+ return -EFAULT;
+ }
+
+ mig_info->data_size = data->state_size;
+ mig_info->pending_bytes = mig_info->data_size;
+ return ret;
+}
+
+static int vfio_pci_device_state_restore(struct vfio_pci_migration_data *data)
+{
+ struct vfio_device_migration_info *mig_info = data->mig_ctl;
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ struct pci_dev *pdev = data->vf_dev;
+ void *base = (void *)mig_info;
+ int ret;
+
+ if (mig_drv->dev_mig_ops && mig_drv->dev_mig_ops->restore) {
+ ret = mig_drv->dev_mig_ops->restore(pdev, base,
+ mig_info->data_offset, mig_info->data_size);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to restore device state!\n");
+ return -EINVAL;
+ }
+ return 0;
+ }
+
+ return -EFAULT;
+}
+
+static int vfio_pci_set_device_state(struct vfio_pci_migration_data *data,
+ u32 state)
+{
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ struct pci_dev *pdev = data->vf_dev;
+ int ret = 0;
+
+ if (state == mig_ctl->device_state)
+ return 0;
+
+ if (!mig_drv->dev_mig_ops)
+ return -EINVAL;
+
+ switch (state) {
+ case VFIO_DEVICE_STATE_RUNNING:
+ if (!(mig_ctl->device_state &
+ VFIO_DEVICE_STATE_RUNNING))
+ ret = vfio_pci_device_enable(pdev, mig_drv);
+ break;
+ case VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RUNNING:
+ /*
+ * (pre-copy) - device should start logging data.
+ */
+ ret = 0;
+ break;
+ case VFIO_DEVICE_STATE_SAVING:
+ /* stop the vf function, save state */
+ ret = vfio_pci_device_state_save(pdev, data);
+ break;
+ case VFIO_DEVICE_STATE_STOP:
+ if (mig_ctl->device_state & VFIO_DEVICE_STATE_RUNNING)
+ ret = vfio_pci_device_disable(pdev, mig_drv);
+ break;
+ case VFIO_DEVICE_STATE_RESUMING:
+ ret = vfio_pci_device_pre_enable(pdev, mig_drv);
+ break;
+ default:
+ ret = -EFAULT;
+ break;
+ }
+
+ if (ret)
+ return ret;
+
+ mig_ctl->device_state = state;
+ return 0;
+}
+
+static ssize_t vfio_pci_handle_mig_dev_state(
+ struct vfio_pci_migration_data *data,
+ char __user *buf, size_t count, bool iswrite)
+{
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ u32 device_state;
+ int ret;
+
+ if (count != sizeof(device_state))
+ return -EINVAL;
+
+ if (iswrite) {
+ if (copy_from_user(&device_state, buf, count))
+ return -EFAULT;
+
+ ret = vfio_pci_set_device_state(data, device_state);
+ if (ret)
+ return ret;
+ } else {
+ if (copy_to_user(buf, &mig_ctl->device_state, count))
+ return -EFAULT;
+ }
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_pending_bytes(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u64 pending_bytes;
+
+ if (count != sizeof(pending_bytes) || iswrite)
+ return -EINVAL;
+
+ if (mig_info->device_state ==
+ (VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RUNNING)) {
+ /* In pre-copy state we have no data to return for now,
+ * return 0 pending bytes
+ */
+ pending_bytes = 0;
+ } else {
+ pending_bytes = mig_info->pending_bytes;
+ }
+
+ if (copy_to_user(buf, &pending_bytes, count))
+ return -EFAULT;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_data_offset(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u64 data_offset = mig_info->data_offset;
+
+ if (count != sizeof(data_offset) || iswrite)
+ return -EINVAL;
+
+ if (copy_to_user(buf, &data_offset, count))
+ return -EFAULT;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_data_size(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u64 data_size;
+
+ if (count != sizeof(data_size))
+ return -EINVAL;
+
+ if (iswrite) {
+ /* data_size is writable only during resuming state */
+ if (mig_info->device_state != VFIO_DEVICE_STATE_RESUMING)
+ return -EINVAL;
+
+ if (copy_from_user(&data_size, buf, sizeof(data_size)))
+ return -EFAULT;
+
+ mig_info->data_size = data_size;
+ } else {
+ if (mig_info->device_state != VFIO_DEVICE_STATE_SAVING)
+ return -EINVAL;
+
+ if (copy_to_user(buf, &mig_info->data_size,
+ sizeof(data_size)))
+ return -EFAULT;
+ }
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_dev_cmd(struct vfio_pci_migration_data *data,
+ char __user *buf, size_t count, bool iswrite)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ struct pci_dev *pdev = data->vf_dev;
+ u32 device_cmd;
+ int ret = -EFAULT;
+
+ if (count != sizeof(device_cmd) || !iswrite || !mig_drv->dev_mig_ops)
+ return -EINVAL;
+
+ if (copy_from_user(&device_cmd, buf, count))
+ return -EFAULT;
+
+ switch (device_cmd) {
+ case VFIO_DEVICE_MIGRATION_CANCEL:
+ if (mig_drv->dev_mig_ops->cancel)
+ ret = mig_drv->dev_mig_ops->cancel(pdev);
+ break;
+ default:
+ dev_err(&pdev->dev, "cmd is invaild\n");
+ return -EINVAL;
+ }
+
+ if (ret != 0)
+ return ret;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_drv_version(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u32 version_id = mig_info->version_id;
+
+ if (count != sizeof(version_id) || iswrite)
+ return -EINVAL;
+
+ if (copy_to_user(buf, &version_id, count))
+ return -EFAULT;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_data_rw(
+ struct vfio_pci_migration_data *data,
+ char __user *buf, size_t count, u64 pos, bool iswrite)
+{
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ void *data_addr = data->vf_data;
+
+ if (count == 0) {
+ dev_err(&data->vf_dev->dev, "qemu operation data size error!\n");
+ return -EINVAL;
+ }
+
+ data_addr += pos - mig_ctl->data_offset;
+ if (iswrite) {
+ if (copy_from_user(data_addr, buf, count))
+ return -EFAULT;
+
+ mig_ctl->pending_bytes += count;
+ if (mig_ctl->pending_bytes > data->state_size)
+ return -EINVAL;
+ } else {
+ if (copy_to_user(buf, data_addr, count))
+ return -EFAULT;
+
+ if (mig_ctl->pending_bytes < count)
+ return -EINVAL;
+
+ mig_ctl->pending_bytes -= count;
+ }
+
+ return count;
+}
+
+static ssize_t vfio_pci_dev_migrn_rw(struct vfio_pci_device *vdev,
+ char __user *buf, size_t count, loff_t *ppos, bool iswrite)
+{
+ unsigned int index =
+ VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
+ struct vfio_pci_migration_data *data =
+ (struct vfio_pci_migration_data *)vdev->region[index].data;
+ loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ int ret;
+
+ if (pos >= vdev->region[index].size)
+ return -EINVAL;
+
+ count = min(count, (size_t)(vdev->region[index].size - pos));
+ if (pos >= VFIO_MIGRATION_REGION_DATA_OFFSET)
+ return vfio_pci_handle_mig_data_rw(data,
+ buf, count, pos, iswrite);
+
+ switch (pos) {
+ case VFIO_DEVICE_MIGRATION_OFFSET(device_state):
+ ret = vfio_pci_handle_mig_dev_state(data,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(pending_bytes):
+ ret = vfio_pci_handle_mig_pending_bytes(mig_ctl,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(data_offset):
+ ret = vfio_pci_handle_mig_data_offset(mig_ctl,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(data_size):
+ ret = vfio_pci_handle_mig_data_size(mig_ctl,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(device_cmd):
+ ret = vfio_pci_handle_mig_dev_cmd(data,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(version_id):
+ ret = vfio_pci_handle_mig_drv_version(mig_ctl,
+ buf, count, iswrite);
+ break;
+ default:
+ dev_err(&vdev->pdev->dev, "invalid pos offset\n");
+ ret = -EFAULT;
+ break;
+ }
+
+ if (mig_ctl->device_state == VFIO_DEVICE_STATE_RESUMING &&
+ mig_ctl->pending_bytes == data->state_size &&
+ mig_ctl->data_size == data->state_size) {
+ if (vfio_pci_device_state_restore(data) != 0) {
+ dev_err(&vdev->pdev->dev, "Failed to restore device state!\n");
+ return -EFAULT;
+ }
+ mig_ctl->pending_bytes = 0;
+ mig_ctl->data_size = 0;
+ }
+
+ return ret;
+}
+
+static void vfio_pci_dev_migrn_release(struct vfio_pci_device *vdev,
+ struct vfio_pci_region *region)
+{
+ struct vfio_pci_migration_data *data = region->data;
+
+ if (data) {
+ kfree(data->mig_ctl);
+ kfree(data);
+ }
+}
+
+static const struct vfio_pci_regops vfio_pci_migration_regops = {
+ .rw = vfio_pci_dev_migrn_rw,
+ .release = vfio_pci_dev_migrn_release,
+};
+
+static int vfio_pci_migration_info_init(struct pci_dev *pdev,
+ struct vfio_device_migration_info *mig_info,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ int ret;
+
+ ret = vfio_pci_device_get_info(pdev, mig_info, mig_drv);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to get device info\n");
+ return ret;
+ }
+
+ if (mig_info->data_size > VFIO_MIGRATION_BUFFER_MAX_SIZE) {
+ dev_err(&pdev->dev, "mig_info->data_size %llu is invalid\n",
+ mig_info->data_size);
+ return -EINVAL;
+ }
+
+ mig_info->data_offset = VFIO_MIGRATION_REGION_DATA_OFFSET;
+ return ret;
+}
+
+static int vfio_device_mig_data_init(struct vfio_pci_device *vdev,
+ struct vfio_pci_migration_data *data)
+{
+ struct vfio_device_migration_info *mig_ctl;
+ u64 mig_offset;
+ int ret;
+
+ mig_ctl = kzalloc(sizeof(*mig_ctl), GFP_KERNEL);
+ if (!mig_ctl)
+ return -ENOMEM;
+
+ ret = vfio_pci_migration_info_init(vdev->pdev, mig_ctl,
+ data->mig_driver);
+ if (ret) {
+ dev_err(&vdev->pdev->dev, "get device info error!\n");
+ goto err;
+ }
+
+ mig_offset = sizeof(struct vfio_device_migration_info);
+ data->state_size = mig_ctl->data_size;
+ data->mig_ctl = krealloc(mig_ctl, mig_offset + data->state_size,
+ GFP_KERNEL);
+ if (!data->mig_ctl) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ data->vf_data = (void *)((char *)data->mig_ctl + mig_offset);
+ memset(data->vf_data, 0, data->state_size);
+ data->mig_ctl->data_size = 0;
+
+ ret = vfio_pci_register_dev_region(vdev, VFIO_REGION_TYPE_MIGRATION,
+ VFIO_REGION_SUBTYPE_MIGRATION,
+ &vfio_pci_migration_regops, mig_offset + data->state_size,
+ VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE, data);
+ if (ret) {
+ kfree(data->mig_ctl);
+ return ret;
+ }
+
+ return 0;
+err:
+ kfree(mig_ctl);
+ return ret;
+}
+
+int vfio_pci_migration_init(struct vfio_pci_device *vdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+ struct vfio_pci_migration_data *data = NULL;
+ struct pci_dev *pdev = vdev->pdev;
+ int ret;
+
+ mig_driver = vfio_pci_get_mig_driver(pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&pdev->dev, "unable to find a mig_driver module\n");
+ return -EINVAL;
+ }
+
+ if (!try_module_get(mig_driver->owner)) {
+ pr_err("module %s is not live\n", mig_driver->owner->name);
+ return -ENODEV;
+ }
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data) {
+ module_put(mig_driver->owner);
+ return -ENOMEM;
+ }
+
+ data->mig_driver = mig_driver;
+ data->vf_dev = pdev;
+
+ ret = vfio_device_mig_data_init(vdev, data);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to init vfio device migration data!\n");
+ goto err;
+ }
+
+ return ret;
+err:
+ kfree(data);
+ module_put(mig_driver->owner);
+ return ret;
+}
+
+void vfio_pci_migration_exit(struct vfio_pci_device *vdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_warn(&vdev->pdev->dev, "mig_driver is not found\n");
+ return;
+ }
+
+ if (module_refcount(mig_driver->owner) > 0) {
+ vfio_pci_device_release(vdev->pdev, mig_driver);
+ module_put(mig_driver->owner);
+ }
+}
+
+int vfio_pci_register_migration_ops(struct vfio_device_migration_ops *ops,
+ struct module *mod, struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ if (!ops || !mod || !pdev)
+ return -EINVAL;
+
+ mig_driver = vfio_pci_find_mig_drv(pdev, mod);
+ if (mig_driver) {
+ pr_info("%s migration ops has already been registered\n",
+ mod->name);
+ atomic_add(1, &mig_driver->count);
+ return 0;
+ }
+
+ if (!try_module_get(THIS_MODULE))
+ return -ENODEV;
+
+ mig_driver = kzalloc(sizeof(*mig_driver), GFP_KERNEL);
+ if (!mig_driver) {
+ module_put(THIS_MODULE);
+ return -ENOMEM;
+ }
+
+ mig_driver->pdev = pdev;
+ mig_driver->bus_num = pdev->bus->number;
+ mig_driver->owner = mod;
+ mig_driver->dev_mig_ops = ops;
+
+ vfio_pci_add_mig_drv(mig_driver);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_register_migration_ops);
+
+void vfio_pci_unregister_migration_ops(struct module *mod, struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ if (!mod || !pdev)
+ return;
+
+ mig_driver = vfio_pci_find_mig_drv(pdev, mod);
+ if (!mig_driver) {
+ pr_err("mig_driver is not found\n");
+ return;
+ }
+
+ if (atomic_sub_and_test(1, &mig_driver->count)) {
+ vfio_pci_remove_mig_drv(mig_driver);
+ kfree(mig_driver);
+ module_put(THIS_MODULE);
+ pr_info("%s succeed to unregister migration ops\n",
+ THIS_MODULE->name);
+ }
+}
+EXPORT_SYMBOL_GPL(vfio_pci_unregister_migration_ops);
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 17d2bae..03af269 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -15,6 +15,7 @@
#include <linux/pci.h>
#include <linux/irqbypass.h>
#include <linux/types.h>
+#include <linux/vfio_pci_migration.h>
#ifndef VFIO_PCI_PRIVATE_H
#define VFIO_PCI_PRIVATE_H
@@ -55,7 +56,7 @@ struct vfio_pci_irq_ctx {
struct vfio_pci_region;
struct vfio_pci_regops {
- size_t (*rw)(struct vfio_pci_device *vdev, char __user *buf,
+ ssize_t (*rw)(struct vfio_pci_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
void (*release)(struct vfio_pci_device *vdev,
struct vfio_pci_region *region);
@@ -173,4 +174,15 @@ static inline int vfio_pci_igd_init(struct vfio_pci_device *vdev)
return -ENODEV;
}
#endif
+
+extern bool vfio_dev_migration_is_supported(struct pci_dev *pdev);
+extern int vfio_pci_migration_init(struct vfio_pci_device *vdev);
+extern void vfio_pci_migration_exit(struct vfio_pci_device *vdev);
+extern int vfio_pci_device_log_start(struct vfio_pci_device *vdev,
+ struct vf_migration_log_info *log_info);
+extern int vfio_pci_device_log_stop(struct vfio_pci_device *vdev,
+ uint32_t uuid);
+extern int vfio_pci_device_log_status_query(struct vfio_pci_device *vdev);
+extern int vfio_pci_device_init(struct pci_dev *pdev);
+extern void vfio_pci_device_uninit(struct pci_dev *pdev);
#endif /* VFIO_PCI_PRIVATE_H */
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 7a386fb..35f2a29 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -33,6 +33,7 @@
#include <linux/string.h>
#include <linux/uaccess.h>
#include <linux/vfio.h>
+#include <linux/vfio_pci_migration.h>
#include <linux/wait.h>
#include <linux/sched/signal.h>
@@ -40,6 +41,9 @@
#define DRIVER_AUTHOR "Alex Williamson <alex.williamson(a)redhat.com>"
#define DRIVER_DESC "VFIO - User Level meta-driver"
+#define LOG_BUF_FRAG_SIZE (2 * 1024 * 1024) // fix to 2M
+#define LOG_BUF_MAX_ADDRS_SIZE 128 // max vm ram size is 1T
+
static struct vfio {
struct class *class;
struct list_head iommu_drivers_list;
@@ -57,6 +61,14 @@ struct vfio_iommu_driver {
struct list_head vfio_next;
};
+struct vfio_log_buf {
+ struct vfio_log_buf_info info;
+ int fd;
+ int buffer_state;
+ int device_state;
+ unsigned long *cpu_addrs;
+};
+
struct vfio_container {
struct kref kref;
struct list_head group_list;
@@ -64,6 +76,7 @@ struct vfio_container {
struct vfio_iommu_driver *iommu_driver;
void *iommu_data;
bool noiommu;
+ struct vfio_log_buf log_buf;
};
struct vfio_unbound_dev {
@@ -1158,8 +1171,398 @@ static long vfio_ioctl_set_iommu(struct vfio_container *container,
return ret;
}
+static long vfio_dispatch_cmd_to_devices(const struct vfio_container *container,
+ unsigned int cmd, unsigned long arg)
+{
+ struct vfio_group *group = NULL;
+ struct vfio_device *device = NULL;
+ long ret = -ENXIO;
+
+ list_for_each_entry(group, &container->group_list, container_next) {
+ list_for_each_entry(device, &group->device_list, group_next) {
+ ret = device->ops->ioctl(device->device_data, cmd, arg);
+ if (ret) {
+ pr_err("dispatch cmd to devices failed\n");
+ return ret;
+ }
+ }
+ }
+ return ret;
+}
+
+static long vfio_log_buf_start(struct vfio_container *container)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+ long ret;
+
+ log_buf_ctl.argsz = sizeof(struct vfio_log_buf_info);
+ log_buf_ctl.flags = VFIO_DEVICE_LOG_BUF_FLAG_START;
+ log_buf_ctl.data = (void *)&container->log_buf.info;
+ ret = vfio_dispatch_cmd_to_devices(container, VFIO_DEVICE_LOG_BUF_CTL,
+ (unsigned long)&log_buf_ctl);
+ if (ret)
+ return ret;
+
+ container->log_buf.device_state = 1;
+ return 0;
+}
+
+static long vfio_log_buf_stop(struct vfio_container *container)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+ long ret;
+
+ if (container->log_buf.device_state == 0) {
+ pr_warn("device already stopped\n");
+ return 0;
+ }
+
+ log_buf_ctl.argsz = sizeof(struct vfio_log_buf_info);
+ log_buf_ctl.flags = VFIO_DEVICE_LOG_BUF_FLAG_STOP;
+ log_buf_ctl.data = (void *)&container->log_buf.info;
+ ret = vfio_dispatch_cmd_to_devices(container, VFIO_DEVICE_LOG_BUF_CTL,
+ (unsigned long)&log_buf_ctl);
+ if (ret)
+ return ret;
+
+ container->log_buf.device_state = 0;
+ return 0;
+}
+
+static long vfio_log_buf_query(struct vfio_container *container)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+
+ log_buf_ctl.argsz = sizeof(struct vfio_log_buf_info);
+ log_buf_ctl.flags = VFIO_DEVICE_LOG_BUF_FLAG_STATUS_QUERY;
+ log_buf_ctl.data = (void *)&container->log_buf.info;
+
+ return vfio_dispatch_cmd_to_devices(container,
+ VFIO_DEVICE_LOG_BUF_CTL, (unsigned long)&log_buf_ctl);
+}
+
+static int vfio_log_buf_fops_mmap(struct file *filep,
+ struct vm_area_struct *vma)
+{
+ struct vfio_container *container = filep->private_data;
+ struct vfio_log_buf *log_buf = &container->log_buf;
+ unsigned long frag_pg_size;
+ unsigned long frag_offset;
+ phys_addr_t pa;
+ int ret = -EINVAL;
+
+ if (!log_buf->cpu_addrs) {
+ pr_err("mmap before setup, please setup log buf first\n");
+ return ret;
+ }
+
+ if (log_buf->info.frag_size < PAGE_SIZE) {
+ pr_err("mmap frag size should not less than page size!\n");
+ return ret;
+ }
+
+ frag_pg_size = log_buf->info.frag_size / PAGE_SIZE;
+ frag_offset = vma->vm_pgoff / frag_pg_size;
+
+ if (frag_offset >= log_buf->info.addrs_size) {
+ pr_err("mmap offset out of range!\n");
+ return ret;
+ }
+
+ if (vma->vm_end - vma->vm_start != log_buf->info.frag_size) {
+ pr_err("mmap size error, should be aligned with frag size!\n");
+ return ret;
+ }
+
+ pa = virt_to_phys((void *)log_buf->cpu_addrs[frag_offset]);
+ ret = remap_pfn_range(vma, vma->vm_start,
+ pa >> PAGE_SHIFT,
+ vma->vm_end - vma->vm_start,
+ vma->vm_page_prot);
+ if (ret)
+ pr_err("remap_pfn_range error!\n");
+ return ret;
+}
+
From: Rong Wang <w_angrong(a)163.com>
kunpeng inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5CO9A
CVE: NA
---------------------------------
For pass-through devices, the hypervisor cannot control the device state
and cannot track memory dirtied by device DMA during migration.
The goal of this framework is to cooperate with the hardware to accomplish
both tasks:
qemu
|status control and dirty memory report
vfio
|ops to hardware
hardware
Signed-off-by: Rong Wang <w_angrong(a)163.com>
Signed-off-by: HuHua Li <18245010845(a)163.com>
Signed-off-by: Ripeng Qiu <965412048(a)qq.com>
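For illustration, a minimal sketch of how a vendor PF driver might plug
into this framework; the my_vf_* names are hypothetical, and only the ops
struct and the register/unregister helpers are introduced by this patch:

#include <linux/module.h>
#include <linux/pci.h>
#include <linux/vfio_pci_migration.h>

/* Copy VF hardware state into the migration region at base + off. */
static int my_vf_save(struct pci_dev *pdev, void *base,
		      uint64_t off, uint64_t count)
{
	return 0;
}

static struct vfio_device_migration_ops my_vf_mig_ops = {
	.save = my_vf_save,
	/* .get_info, .enable, .disable, .restore, ... filled in likewise */
};

/* Called from the vendor PF driver's probe/remove paths; registration is
 * keyed on the PF's bus number, so VFs on that bus find these ops. */
static int my_vf_mig_register(struct pci_dev *pf_dev)
{
	return vfio_pci_register_migration_ops(&my_vf_mig_ops,
					       THIS_MODULE, pf_dev);
}

static void my_vf_mig_unregister(struct pci_dev *pf_dev)
{
	vfio_pci_unregister_migration_ops(THIS_MODULE, pf_dev);
}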
---
drivers/vfio/pci/Makefile | 2 +-
drivers/vfio/pci/vfio_pci.c | 54 +++
drivers/vfio/pci/vfio_pci_migration.c | 755 ++++++++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_private.h | 14 +-
drivers/vfio/vfio.c | 411 +++++++++++++++++-
include/linux/vfio_pci_migration.h | 136 ++++++
6 files changed, 1367 insertions(+), 5 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci_migration.c
create mode 100644 include/linux/vfio_pci_migration.h
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 76d8ec0..80a777d 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,5 +1,5 @@
-vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
+vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o vfio_pci_migration.o
vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 51b791c..59d8280 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -30,6 +30,7 @@
#include <linux/vgaarb.h>
#include <linux/nospec.h>
#include <linux/sched/mm.h>
+#include <linux/vfio_pci_migration.h>
#include "vfio_pci_private.h"
@@ -296,6 +297,14 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
vfio_pci_probe_mmaps(vdev);
+ if (vfio_dev_migration_is_supported(pdev)) {
+ ret = vfio_pci_migration_init(vdev);
+ if (ret) {
+ dev_warn(&vdev->pdev->dev, "Failed to init vfio_pci_migration\n");
+ vfio_pci_disable(vdev);
+ return ret;
+ }
+ }
return 0;
}
@@ -392,6 +401,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
out:
pci_disable_device(pdev);
+ vfio_pci_migration_exit(vdev);
vfio_pci_try_bus_reset(vdev);
if (!disable_idle_d3)
@@ -642,6 +652,41 @@ struct vfio_devices {
int max_index;
};
+static long vfio_pci_handle_log_buf_ctl(struct vfio_pci_device *vdev,
+ const unsigned long arg)
+{
+ struct vfio_log_buf_ctl *log_buf_ctl = NULL;
+ struct vfio_log_buf_info *log_buf_info = NULL;
+ struct vf_migration_log_info migration_log_info;
+ long ret = 0;
+
+ log_buf_ctl = (struct vfio_log_buf_ctl *)arg;
+ log_buf_info = (struct vfio_log_buf_info *)log_buf_ctl->data;
+
+ switch (log_buf_ctl->flags) {
+ case VFIO_DEVICE_LOG_BUF_FLAG_START:
+ migration_log_info.dom_uuid = log_buf_info->uuid;
+ migration_log_info.buffer_size =
+ log_buf_info->buffer_size;
+ migration_log_info.sge_num = log_buf_info->addrs_size;
+ migration_log_info.sge_len = log_buf_info->frag_size;
+ migration_log_info.sgevec = log_buf_info->sgevec;
+ ret = vfio_pci_device_log_start(vdev,
+ &migration_log_info);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_STOP:
+ ret = vfio_pci_device_log_stop(vdev,
+ log_buf_info->uuid);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_STATUS_QUERY:
+ ret = vfio_pci_device_log_status_query(vdev);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+ return ret;
+}
static long vfio_pci_ioctl(void *device_data,
unsigned int cmd, unsigned long arg)
{
@@ -1142,6 +1187,8 @@ static long vfio_pci_ioctl(void *device_data,
return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
ioeventfd.data, count, ioeventfd.fd);
+ } else if (cmd == VFIO_DEVICE_LOG_BUF_CTL) {
+ return vfio_pci_handle_log_buf_ctl(vdev, arg);
}
return -ENOTTY;
@@ -1566,6 +1613,9 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
pci_set_power_state(pdev, PCI_D3hot);
}
+ if (vfio_dev_migration_is_supported(pdev))
+ ret = vfio_pci_device_init(pdev);
+
return ret;
}
@@ -1591,6 +1641,10 @@ static void vfio_pci_remove(struct pci_dev *pdev)
if (!disable_idle_d3)
pci_set_power_state(pdev, PCI_D0);
+
+ if (vfio_dev_migration_is_supported(pdev)) {
+ vfio_pci_device_uninit(pdev);
+ }
}
static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
diff --git a/drivers/vfio/pci/vfio_pci_migration.c b/drivers/vfio/pci/vfio_pci_migration.c
new file mode 100644
index 0000000..f69cd13
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci_migration.c
@@ -0,0 +1,755 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Huawei Technologies Co., Ltd. All rights reserved.
+ */
+
+#include <linux/module.h>
+#include <linux/io.h>
+#include <linux/pci.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/vfio_pci_migration.h>
+
+#include "vfio_pci_private.h"
+
+static LIST_HEAD(vfio_pci_mig_drivers_list);
+static DEFINE_MUTEX(vfio_pci_mig_drivers_mutex);
+
+static void vfio_pci_add_mig_drv(struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ atomic_set(&mig_drv->count, 1);
+ list_add_tail(&mig_drv->list, &vfio_pci_mig_drivers_list);
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+}
+
+static void vfio_pci_remove_mig_drv(struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ list_del(&mig_drv->list);
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+}
+
+static struct vfio_pci_vendor_mig_driver *
+ vfio_pci_find_mig_drv(struct pci_dev *pdev, struct module *module)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv = NULL;
+
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ list_for_each_entry(mig_drv, &vfio_pci_mig_drivers_list, list) {
+ if (mig_drv->owner == module) {
+ if (mig_drv->bus_num == pdev->bus->number)
+ goto out;
+ }
+ }
+ mig_drv = NULL;
+out:
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+ return mig_drv;
+}
+
+static struct vfio_pci_vendor_mig_driver *
+ vfio_pci_get_mig_driver(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv = NULL;
+ struct pci_dev *pf_dev = pci_physfn(pdev);
+
+ mutex_lock(&vfio_pci_mig_drivers_mutex);
+ list_for_each_entry(mig_drv, &vfio_pci_mig_drivers_list, list) {
+ if (mig_drv->bus_num == pf_dev->bus->number)
+ goto out;
+ }
+ mig_drv = NULL;
+out:
+ mutex_unlock(&vfio_pci_mig_drivers_mutex);
+ return mig_drv;
+}
+
+bool vfio_dev_migration_is_supported(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ mig_driver = vfio_pci_get_mig_driver(pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_warn(&pdev->dev, "unable to find a mig_drv module\n");
+ return false;
+ }
+
+ return true;
+}
+
+int vfio_pci_device_log_start(struct vfio_pci_device *vdev,
+ struct vf_migration_log_info *log_info)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&vdev->pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (!mig_driver->dev_mig_ops->log_start ||
+ (mig_driver->dev_mig_ops->log_start(vdev->pdev,
+ log_info) != 0)) {
+ dev_err(&vdev->pdev->dev, "failed to set log start\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int vfio_pci_device_log_stop(struct vfio_pci_device *vdev, uint32_t uuid)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&vdev->pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (!mig_driver->dev_mig_ops->log_stop ||
+ (mig_driver->dev_mig_ops->log_stop(vdev->pdev, uuid) != 0)) {
+ dev_err(&vdev->pdev->dev, "failed to set log stop\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int vfio_pci_device_log_status_query(struct vfio_pci_device *vdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&vdev->pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (!mig_driver->dev_mig_ops->get_log_status ||
+ (mig_driver->dev_mig_ops->get_log_status(vdev->pdev) != 0)) {
+ dev_err(&vdev->pdev->dev, "failed to get log status\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int vfio_pci_device_init(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv;
+
+ mig_drv = vfio_pci_get_mig_driver(pdev);
+ if (!mig_drv || !mig_drv->dev_mig_ops) {
+ dev_err(&pdev->dev, "unable to find a mig_drv module\n");
+ return -EFAULT;
+ }
+
+ if (mig_drv->dev_mig_ops->init)
+ return mig_drv->dev_mig_ops->init(pdev);
+
+ return -EFAULT;
+}
+
+void vfio_pci_device_uninit(struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv;
+
+ mig_drv = vfio_pci_get_mig_driver(pdev);
+ if (!mig_drv || !mig_drv->dev_mig_ops) {
+ dev_err(&pdev->dev, "unable to find a mig_drv module\n");
+ return;
+ }
+
+ if (mig_drv->dev_mig_ops->uninit)
+ mig_drv->dev_mig_ops->uninit(pdev);
+}
+
+static void vfio_pci_device_release(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (mig_drv->dev_mig_ops->release)
+ mig_drv->dev_mig_ops->release(pdev);
+}
+
+static int vfio_pci_device_get_info(struct pci_dev *pdev,
+ struct vfio_device_migration_info *mig_info,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (mig_drv->dev_mig_ops->get_info)
+ return mig_drv->dev_mig_ops->get_info(pdev, mig_info);
+ return -EFAULT;
+}
+
+static int vfio_pci_device_enable(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (!mig_drv->dev_mig_ops->enable ||
+ (mig_drv->dev_mig_ops->enable(pdev) != 0)) {
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int vfio_pci_device_disable(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (!mig_drv->dev_mig_ops->disable ||
+ (mig_drv->dev_mig_ops->disable(pdev) != 0))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int vfio_pci_device_pre_enable(struct pci_dev *pdev,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ if (!mig_drv->dev_mig_ops->pre_enable ||
+ (mig_drv->dev_mig_ops->pre_enable(pdev) != 0))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int vfio_pci_device_state_save(struct pci_dev *pdev,
+ struct vfio_pci_migration_data *data)
+{
+ struct vfio_device_migration_info *mig_info = data->mig_ctl;
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ void *base = (void *)mig_info;
+ int ret = 0;
+
+ if ((mig_info->device_state & VFIO_DEVICE_STATE_RUNNING) != 0) {
+ ret = vfio_pci_device_disable(pdev, mig_drv);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to stop VF function!\n");
+ return ret;
+ }
+ mig_info->device_state &= ~VFIO_DEVICE_STATE_RUNNING;
+ }
+
+ if (mig_drv->dev_mig_ops && mig_drv->dev_mig_ops->save) {
+ ret = mig_drv->dev_mig_ops->save(pdev, base,
+ mig_info->data_offset, data->state_size);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to save device state!\n");
+ return -EINVAL;
+ }
+ } else {
+ return -EFAULT;
+ }
+
+ mig_info->data_size = data->state_size;
+ mig_info->pending_bytes = mig_info->data_size;
+ return ret;
+}
+
+static int vfio_pci_device_state_restore(struct vfio_pci_migration_data *data)
+{
+ struct vfio_device_migration_info *mig_info = data->mig_ctl;
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ struct pci_dev *pdev = data->vf_dev;
+ void *base = (void *)mig_info;
+ int ret;
+
+ if (mig_drv->dev_mig_ops && mig_drv->dev_mig_ops->restore) {
+ ret = mig_drv->dev_mig_ops->restore(pdev, base,
+ mig_info->data_offset, mig_info->data_size);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to restore device state!\n");
+ return -EINVAL;
+ }
+ return 0;
+ }
+
+ return -EFAULT;
+}
+
+static int vfio_pci_set_device_state(struct vfio_pci_migration_data *data,
+ u32 state)
+{
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ struct pci_dev *pdev = data->vf_dev;
+ int ret = 0;
+
+ if (state == mig_ctl->device_state)
+ return 0;
+
+ if (!mig_drv->dev_mig_ops)
+ return -EINVAL;
+
+ switch (state) {
+ case VFIO_DEVICE_STATE_RUNNING:
+ if (!(mig_ctl->device_state &
+ VFIO_DEVICE_STATE_RUNNING))
+ ret = vfio_pci_device_enable(pdev, mig_drv);
+ break;
+ case VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RUNNING:
+ /*
+ * (pre-copy) - device should start logging data.
+ */
+ ret = 0;
+ break;
+ case VFIO_DEVICE_STATE_SAVING:
+ /* stop the vf function, save state */
+ ret = vfio_pci_device_state_save(pdev, data);
+ break;
+ case VFIO_DEVICE_STATE_STOP:
+ if (mig_ctl->device_state & VFIO_DEVICE_STATE_RUNNING)
+ ret = vfio_pci_device_disable(pdev, mig_drv);
+ break;
+ case VFIO_DEVICE_STATE_RESUMING:
+ ret = vfio_pci_device_pre_enable(pdev, mig_drv);
+ break;
+ default:
+ ret = -EFAULT;
+ break;
+ }
+
+ if (ret)
+ return ret;
+
+ mig_ctl->device_state = state;
+ return 0;
+}
+
+static ssize_t vfio_pci_handle_mig_dev_state(
+ struct vfio_pci_migration_data *data,
+ char __user *buf, size_t count, bool iswrite)
+{
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ u32 device_state;
+ int ret;
+
+ if (count != sizeof(device_state))
+ return -EINVAL;
+
+ if (iswrite) {
+ if (copy_from_user(&device_state, buf, count))
+ return -EFAULT;
+
+ ret = vfio_pci_set_device_state(data, device_state);
+ if (ret)
+ return ret;
+ } else {
+ if (copy_to_user(buf, &mig_ctl->device_state, count))
+ return -EFAULT;
+ }
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_pending_bytes(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u64 pending_bytes;
+
+ if (count != sizeof(pending_bytes) || iswrite)
+ return -EINVAL;
+
+ if (mig_info->device_state ==
+ (VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RUNNING)) {
+ /* In pre-copy state we have no data to return for now,
+ * return 0 pending bytes
+ */
+ pending_bytes = 0;
+ } else {
+ pending_bytes = mig_info->pending_bytes;
+ }
+
+ if (copy_to_user(buf, &pending_bytes, count))
+ return -EFAULT;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_data_offset(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u64 data_offset = mig_info->data_offset;
+
+ if (count != sizeof(data_offset) || iswrite)
+ return -EINVAL;
+
+ if (copy_to_user(buf, &data_offset, count))
+ return -EFAULT;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_data_size(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u64 data_size;
+
+ if (count != sizeof(data_size))
+ return -EINVAL;
+
+ if (iswrite) {
+ /* data_size is writable only during resuming state */
+ if (mig_info->device_state != VFIO_DEVICE_STATE_RESUMING)
+ return -EINVAL;
+
+ if (copy_from_user(&data_size, buf, sizeof(data_size)))
+ return -EFAULT;
+
+ mig_info->data_size = data_size;
+ } else {
+ if (mig_info->device_state != VFIO_DEVICE_STATE_SAVING)
+ return -EINVAL;
+
+ if (copy_to_user(buf, &mig_info->data_size,
+ sizeof(data_size)))
+ return -EFAULT;
+ }
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_dev_cmd(struct vfio_pci_migration_data *data,
+ char __user *buf, size_t count, bool iswrite)
+{
+ struct vfio_pci_vendor_mig_driver *mig_drv = data->mig_driver;
+ struct pci_dev *pdev = data->vf_dev;
+ u32 device_cmd;
+ int ret = -EFAULT;
+
+ if (count != sizeof(device_cmd) || !iswrite || !mig_drv->dev_mig_ops)
+ return -EINVAL;
+
+ if (copy_from_user(&device_cmd, buf, count))
+ return -EFAULT;
+
+ switch (device_cmd) {
+ case VFIO_DEVICE_MIGRATION_CANCEL:
+ if (mig_drv->dev_mig_ops->cancel)
+ ret = mig_drv->dev_mig_ops->cancel(pdev);
+ break;
+ default:
+ dev_err(&pdev->dev, "cmd is invalid\n");
+ return -EINVAL;
+ }
+
+ if (ret != 0)
+ return ret;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_drv_version(
+ struct vfio_device_migration_info *mig_info,
+ char __user *buf, size_t count, bool iswrite)
+{
+ u32 version_id = mig_info->version_id;
+
+ if (count != sizeof(version_id) || iswrite)
+ return -EINVAL;
+
+ if (copy_to_user(buf, &version_id, count))
+ return -EFAULT;
+
+ return count;
+}
+
+static ssize_t vfio_pci_handle_mig_data_rw(
+ struct vfio_pci_migration_data *data,
+ char __user *buf, size_t count, u64 pos, bool iswrite)
+{
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ void *data_addr = data->vf_data;
+
+ if (count == 0) {
+ dev_err(&data->vf_dev->dev, "qemu operation data size error!\n");
+ return -EINVAL;
+ }
+
+ data_addr += pos - mig_ctl->data_offset;
+ if (iswrite) {
+ if (copy_from_user(data_addr, buf, count))
+ return -EFAULT;
+
+ mig_ctl->pending_bytes += count;
+ if (mig_ctl->pending_bytes > data->state_size)
+ return -EINVAL;
+ } else {
+ if (copy_to_user(buf, data_addr, count))
+ return -EFAULT;
+
+ if (mig_ctl->pending_bytes < count)
+ return -EINVAL;
+
+ mig_ctl->pending_bytes -= count;
+ }
+
+ return count;
+}
+
+static ssize_t vfio_pci_dev_migrn_rw(struct vfio_pci_device *vdev,
+ char __user *buf, size_t count, loff_t *ppos, bool iswrite)
+{
+ unsigned int index =
+ VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
+ struct vfio_pci_migration_data *data =
+ (struct vfio_pci_migration_data *)vdev->region[index].data;
+ loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+ struct vfio_device_migration_info *mig_ctl = data->mig_ctl;
+ int ret;
+
+ if (pos >= vdev->region[index].size)
+ return -EINVAL;
+
+ count = min(count, (size_t)(vdev->region[index].size - pos));
+ if (pos >= VFIO_MIGRATION_REGION_DATA_OFFSET)
+ return vfio_pci_handle_mig_data_rw(data,
+ buf, count, pos, iswrite);
+
+ switch (pos) {
+ case VFIO_DEVICE_MIGRATION_OFFSET(device_state):
+ ret = vfio_pci_handle_mig_dev_state(data,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(pending_bytes):
+ ret = vfio_pci_handle_mig_pending_bytes(mig_ctl,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(data_offset):
+ ret = vfio_pci_handle_mig_data_offset(mig_ctl,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(data_size):
+ ret = vfio_pci_handle_mig_data_size(mig_ctl,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(device_cmd):
+ ret = vfio_pci_handle_mig_dev_cmd(data,
+ buf, count, iswrite);
+ break;
+ case VFIO_DEVICE_MIGRATION_OFFSET(version_id):
+ ret = vfio_pci_handle_mig_drv_version(mig_ctl,
+ buf, count, iswrite);
+ break;
+ default:
+ dev_err(&vdev->pdev->dev, "invalid pos offset\n");
+ ret = -EFAULT;
+ break;
+ }
+
+ if (mig_ctl->device_state == VFIO_DEVICE_STATE_RESUMING &&
+ mig_ctl->pending_bytes == data->state_size &&
+ mig_ctl->data_size == data->state_size) {
+ if (vfio_pci_device_state_restore(data) != 0) {
+ dev_err(&vdev->pdev->dev, "Failed to restore device state!\n");
+ return -EFAULT;
+ }
+ mig_ctl->pending_bytes = 0;
+ mig_ctl->data_size = 0;
+ }
+
+ return ret;
+}
+
+static void vfio_pci_dev_migrn_release(struct vfio_pci_device *vdev,
+ struct vfio_pci_region *region)
+{
+ struct vfio_pci_migration_data *data = region->data;
+
+ if (data) {
+ kfree(data->mig_ctl);
+ kfree(data);
+ }
+}
+
+static const struct vfio_pci_regops vfio_pci_migration_regops = {
+ .rw = vfio_pci_dev_migrn_rw,
+ .release = vfio_pci_dev_migrn_release,
+};
+
+static int vfio_pci_migration_info_init(struct pci_dev *pdev,
+ struct vfio_device_migration_info *mig_info,
+ struct vfio_pci_vendor_mig_driver *mig_drv)
+{
+ int ret;
+
+ ret = vfio_pci_device_get_info(pdev, mig_info, mig_drv);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to get device info\n");
+ return ret;
+ }
+
+ if (mig_info->data_size > VFIO_MIGRATION_BUFFER_MAX_SIZE) {
+ dev_err(&pdev->dev, "mig_info->data_size %llu is invalid\n",
+ mig_info->data_size);
+ return -EINVAL;
+ }
+
+ mig_info->data_offset = VFIO_MIGRATION_REGION_DATA_OFFSET;
+ return ret;
+}
+
+static int vfio_device_mig_data_init(struct vfio_pci_device *vdev,
+ struct vfio_pci_migration_data *data)
+{
+ struct vfio_device_migration_info *mig_ctl;
+ u64 mig_offset;
+ int ret;
+
+ mig_ctl = kzalloc(sizeof(*mig_ctl), GFP_KERNEL);
+ if (!mig_ctl)
+ return -ENOMEM;
+
+ ret = vfio_pci_migration_info_init(vdev->pdev, mig_ctl,
+ data->mig_driver);
+ if (ret) {
+ dev_err(&vdev->pdev->dev, "get device info error!\n");
+ goto err;
+ }
+
+ mig_offset = sizeof(struct vfio_device_migration_info);
+ data->state_size = mig_ctl->data_size;
+ data->mig_ctl = krealloc(mig_ctl, mig_offset + data->state_size,
+ GFP_KERNEL);
+ if (!data->mig_ctl) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ data->vf_data = (void *)((char *)data->mig_ctl + mig_offset);
+ memset(data->vf_data, 0, data->state_size);
+ data->mig_ctl->data_size = 0;
+
+ ret = vfio_pci_register_dev_region(vdev, VFIO_REGION_TYPE_MIGRATION,
+ VFIO_REGION_SUBTYPE_MIGRATION,
+ &vfio_pci_migration_regops, mig_offset + data->state_size,
+ VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE, data);
+ if (ret) {
+ kfree(data->mig_ctl);
+ return ret;
+ }
+
+ return 0;
+err:
+ kfree(mig_ctl);
+ return ret;
+}
+
+int vfio_pci_migration_init(struct vfio_pci_device *vdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+ struct vfio_pci_migration_data *data = NULL;
+ struct pci_dev *pdev = vdev->pdev;
+ int ret;
+
+ mig_driver = vfio_pci_get_mig_driver(pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_err(&pdev->dev, "unable to find a mig_driver module\n");
+ return -EINVAL;
+ }
+
+ if (!try_module_get(mig_driver->owner)) {
+ pr_err("module %s is not live\n", mig_driver->owner->name);
+ return -ENODEV;
+ }
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data) {
+ module_put(mig_driver->owner);
+ return -ENOMEM;
+ }
+
+ data->mig_driver = mig_driver;
+ data->vf_dev = pdev;
+
+ ret = vfio_device_mig_data_init(vdev, data);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to init vfio device migration data!\n");
+ goto err;
+ }
+
+ return ret;
+err:
+ kfree(data);
+ module_put(mig_driver->owner);
+ return ret;
+}
+
+void vfio_pci_migration_exit(struct vfio_pci_device *vdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ mig_driver = vfio_pci_get_mig_driver(vdev->pdev);
+ if (!mig_driver || !mig_driver->dev_mig_ops) {
+ dev_warn(&vdev->pdev->dev, "mig_driver is not found\n");
+ return;
+ }
+
+ if (module_refcount(mig_driver->owner) > 0) {
+ vfio_pci_device_release(vdev->pdev, mig_driver);
+ module_put(mig_driver->owner);
+ }
+}
+
+int vfio_pci_register_migration_ops(struct vfio_device_migration_ops *ops,
+ struct module *mod, struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ if (!ops || !mod || !pdev)
+ return -EINVAL;
+
+ mig_driver = vfio_pci_find_mig_drv(pdev, mod);
+ if (mig_driver) {
+ pr_info("%s migration ops has already been registered\n",
+ mod->name);
+ atomic_add(1, &mig_driver->count);
+ return 0;
+ }
+
+ if (!try_module_get(THIS_MODULE))
+ return -ENODEV;
+
+ mig_driver = kzalloc(sizeof(*mig_driver), GFP_KERNEL);
+ if (!mig_driver) {
+ module_put(THIS_MODULE);
+ return -ENOMEM;
+ }
+
+ mig_driver->pdev = pdev;
+ mig_driver->bus_num = pdev->bus->number;
+ mig_driver->owner = mod;
+ mig_driver->dev_mig_ops = ops;
+
+ vfio_pci_add_mig_drv(mig_driver);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_register_migration_ops);
+
+void vfio_pci_unregister_migration_ops(struct module *mod, struct pci_dev *pdev)
+{
+ struct vfio_pci_vendor_mig_driver *mig_driver = NULL;
+
+ if (!mod || !pdev)
+ return;
+
+ mig_driver = vfio_pci_find_mig_drv(pdev, mod);
+ if (!mig_driver) {
+ pr_err("mig_driver is not found\n");
+ return;
+ }
+
+ if (atomic_sub_and_test(1, &mig_driver->count)) {
+ vfio_pci_remove_mig_drv(mig_driver);
+ kfree(mig_driver);
+ module_put(THIS_MODULE);
+ pr_info("%s succeed to unregister migration ops\n",
+ THIS_MODULE->name);
+ }
+}
+EXPORT_SYMBOL_GPL(vfio_pci_unregister_migration_ops);
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 17d2bae..03af269 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -15,6 +15,7 @@
#include <linux/pci.h>
#include <linux/irqbypass.h>
#include <linux/types.h>
+#include <linux/vfio_pci_migration.h>
#ifndef VFIO_PCI_PRIVATE_H
#define VFIO_PCI_PRIVATE_H
@@ -55,7 +56,7 @@ struct vfio_pci_irq_ctx {
struct vfio_pci_region;
struct vfio_pci_regops {
- size_t (*rw)(struct vfio_pci_device *vdev, char __user *buf,
+ ssize_t (*rw)(struct vfio_pci_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
void (*release)(struct vfio_pci_device *vdev,
struct vfio_pci_region *region);
@@ -173,4 +174,15 @@ static inline int vfio_pci_igd_init(struct vfio_pci_device *vdev)
return -ENODEV;
}
#endif
+
+extern bool vfio_dev_migration_is_supported(struct pci_dev *pdev);
+extern int vfio_pci_migration_init(struct vfio_pci_device *vdev);
+extern void vfio_pci_migration_exit(struct vfio_pci_device *vdev);
+extern int vfio_pci_device_log_start(struct vfio_pci_device *vdev,
+ struct vf_migration_log_info *log_info);
+extern int vfio_pci_device_log_stop(struct vfio_pci_device *vdev,
+ uint32_t uuid);
+extern int vfio_pci_device_log_status_query(struct vfio_pci_device *vdev);
+extern int vfio_pci_device_init(struct pci_dev *pdev);
+extern void vfio_pci_device_uninit(struct pci_dev *pdev);
#endif /* VFIO_PCI_PRIVATE_H */
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 7a386fb..35f2a29 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -33,6 +33,7 @@
#include <linux/string.h>
#include <linux/uaccess.h>
#include <linux/vfio.h>
+#include <linux/vfio_pci_migration.h>
#include <linux/wait.h>
#include <linux/sched/signal.h>
@@ -40,6 +41,9 @@
#define DRIVER_AUTHOR "Alex Williamson <alex.williamson(a)redhat.com>"
#define DRIVER_DESC "VFIO - User Level meta-driver"
+#define LOG_BUF_FRAG_SIZE (2 * 1024 * 1024) /* fixed at 2 MiB */
+#define LOG_BUF_MAX_ADDRS_SIZE 128 /* max VM RAM size is 1 TiB */
+
static struct vfio {
struct class *class;
struct list_head iommu_drivers_list;
@@ -57,6 +61,14 @@ struct vfio_iommu_driver {
struct list_head vfio_next;
};
+struct vfio_log_buf {
+ struct vfio_log_buf_info info;
+ int fd;
+ int buffer_state;
+ int device_state;
+ unsigned long *cpu_addrs;
+};
+
struct vfio_container {
struct kref kref;
struct list_head group_list;
@@ -64,6 +76,7 @@ struct vfio_container {
struct vfio_iommu_driver *iommu_driver;
void *iommu_data;
bool noiommu;
+ struct vfio_log_buf log_buf;
};
struct vfio_unbound_dev {
@@ -1158,8 +1171,398 @@ static long vfio_ioctl_set_iommu(struct vfio_container *container,
return ret;
}
+static long vfio_dispatch_cmd_to_devices(const struct vfio_container *container,
+ unsigned int cmd, unsigned long arg)
+{
+ struct vfio_group *group = NULL;
+ struct vfio_device *device = NULL;
+ long ret = -ENXIO;
+
+ list_for_each_entry(group, &container->group_list, container_next) {
+ list_for_each_entry(device, &group->device_list, group_next) {
+ ret = device->ops->ioctl(device->device_data, cmd, arg);
+ if (ret) {
+ pr_err("dispatch cmd to devices failed\n");
+ return ret;
+ }
+ }
+ }
+ return ret;
+}
+
+static long vfio_log_buf_start(struct vfio_container *container)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+ long ret;
+
+ log_buf_ctl.argsz = sizeof(struct vfio_log_buf_info);
+ log_buf_ctl.flags = VFIO_DEVICE_LOG_BUF_FLAG_START;
+ log_buf_ctl.data = (void *)&container->log_buf.info;
+ ret = vfio_dispatch_cmd_to_devices(container, VFIO_DEVICE_LOG_BUF_CTL,
+ (unsigned long)&log_buf_ctl);
+ if (ret)
+ return ret;
+
+ container->log_buf.device_state = 1;
+ return 0;
+}
+
+static long vfio_log_buf_stop(struct vfio_container *container)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+ long ret;
+
+ if (container->log_buf.device_state == 0) {
+ pr_warn("device already stopped\n");
+ return 0;
+ }
+
+ log_buf_ctl.argsz = sizeof(struct vfio_log_buf_info);
+ log_buf_ctl.flags = VFIO_DEVICE_LOG_BUF_FLAG_STOP;
+ log_buf_ctl.data = (void *)&container->log_buf.info;
+ ret = vfio_dispatch_cmd_to_devices(container, VFIO_DEVICE_LOG_BUF_CTL,
+ (unsigned long)&log_buf_ctl);
+ if (ret)
+ return ret;
+
+ container->log_buf.device_state = 0;
+ return 0;
+}
+
+static long vfio_log_buf_query(struct vfio_container *container)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+
+ log_buf_ctl.argsz = sizeof(struct vfio_log_buf_info);
+ log_buf_ctl.flags = VFIO_DEVICE_LOG_BUF_FLAG_STATUS_QUERY;
+ log_buf_ctl.data = (void *)&container->log_buf.info;
+
+ return vfio_dispatch_cmd_to_devices(container,
+ VFIO_DEVICE_LOG_BUF_CTL, (unsigned long)&log_buf_ctl);
+}
+
+static int vfio_log_buf_fops_mmap(struct file *filep,
+ struct vm_area_struct *vma)
+{
+ struct vfio_container *container = filep->private_data;
+ struct vfio_log_buf *log_buf = &container->log_buf;
+ unsigned long frag_pg_size;
+ unsigned long frag_offset;
+ phys_addr_t pa;
+ int ret = -EINVAL;
+
+ if (!log_buf->cpu_addrs) {
+ pr_err("mmap before setup, please setup log buf first\n");
+ return ret;
+ }
+
+ if (log_buf->info.frag_size < PAGE_SIZE) {
+ pr_err("mmap frag size should not less than page size!\n");
+ return ret;
+ }
+
+ frag_pg_size = log_buf->info.frag_size / PAGE_SIZE;
+ frag_offset = vma->vm_pgoff / frag_pg_size;
+
+ if (frag_offset >= log_buf->info.addrs_size) {
+ pr_err("mmap offset out of range!\n");
+ return ret;
+ }
+
+ if (vma->vm_end - vma->vm_start != log_buf->info.frag_size) {
+ pr_err("mmap size error, should be aligned with frag size!\n");
+ return ret;
+ }
+
+ pa = virt_to_phys((void *)log_buf->cpu_addrs[frag_offset]);
+ ret = remap_pfn_range(vma, vma->vm_start,
+ pa >> PAGE_SHIFT,
+ vma->vm_end - vma->vm_start,
+ vma->vm_page_prot);
+ if (ret)
+ pr_err("remap_pfn_range error!\n");
+ return ret;
+}
+
+static struct device *vfio_get_dev(struct vfio_container *container)
+{
+ struct vfio_group *group = NULL;
+ struct vfio_device *device = NULL;
+
+ list_for_each_entry(group, &container->group_list, container_next) {
+ list_for_each_entry(device, &group->device_list, group_next) {
+ return device->dev;
+ }
+ }
+ return NULL;
+}
+
+static void vfio_log_buf_release_dma(struct device *dev,
+ struct vfio_log_buf *log_buf)
+{
+ int i;
+
+ for (i = 0; i < log_buf->info.addrs_size; i++) {
+ if ((log_buf->cpu_addrs && log_buf->cpu_addrs[i] != 0) &&
+ (log_buf->info.sgevec &&
+ log_buf->info.sgevec[i].addr != 0)) {
+ dma_free_coherent(dev, log_buf->info.frag_size,
+ (void *)log_buf->cpu_addrs[i],
+ log_buf->info.sgevec[i].addr);
+ log_buf->cpu_addrs[i] = 0;
+ log_buf->info.sgevec[i].addr = 0;
+ }
+ }
+}
+
+static long vfio_log_buf_alloc_dma(struct vfio_log_buf_info *info,
+ struct vfio_log_buf *log_buf, struct device *dev)
+{
+ int i;
+
+ for (i = 0; i < info->addrs_size; i++) {
+ log_buf->cpu_addrs[i] = (unsigned long)dma_alloc_coherent(dev,
+ info->frag_size, &log_buf->info.sgevec[i].addr,
+ GFP_KERNEL);
+ log_buf->info.sgevec[i].len = info->frag_size;
+ if (log_buf->cpu_addrs[i] == 0 ||
+ log_buf->info.sgevec[i].addr == 0) {
+ return -ENOMEM;
+ }
+ }
+ return 0;
+}
+
+static long vfio_log_buf_alloc_addrs(struct vfio_log_buf_info *info,
+ struct vfio_log_buf *log_buf)
+{
+ log_buf->info.sgevec = kcalloc(info->addrs_size,
+ sizeof(struct vfio_log_buf_sge), GFP_KERNEL);
+ if (!log_buf->info.sgevec)
+ return -ENOMEM;
+
+ log_buf->cpu_addrs = kcalloc(info->addrs_size,
+ sizeof(unsigned long), GFP_KERNEL);
+ if (!log_buf->cpu_addrs) {
+ kfree(log_buf->info.sgevec);
+ log_buf->info.sgevec = NULL;
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static long vfio_log_buf_info_valid(struct vfio_log_buf_info *info)
+{
+ if (info->addrs_size > LOG_BUF_MAX_ADDRS_SIZE ||
+ info->addrs_size == 0) {
+ pr_err("can`t support vm ram size larger than 1T or equal to 0\n");
+ return -EINVAL;
+ }
+ if (info->frag_size != LOG_BUF_FRAG_SIZE) {
+ pr_err("only support %d frag size\n", LOG_BUF_FRAG_SIZE);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static long vfio_log_buf_setup(struct vfio_container *container,
+ unsigned long data)
+{
+ struct vfio_log_buf_info info;
+ struct vfio_log_buf *log_buf = &container->log_buf;
+ struct device *dev = NULL;
+ long ret;
+
+ if (log_buf->info.sgevec) {
+ pr_warn("log buf already setup\n");
+ return 0;
+ }
+
+ if (copy_from_user(&info, (void __user *)data,
+ sizeof(struct vfio_log_buf_info)))
+ return -EFAULT;
+
+ ret = vfio_log_buf_info_valid(&info);
+ if (ret)
+ return ret;
+
+ ret = vfio_log_buf_alloc_addrs(&info, log_buf);
+ if (ret)
+ goto err_out;
+
+ dev = vfio_get_dev(container);
+ if (!dev) {
+ pr_err("can`t get dev\n");
+ goto err_free_addrs;
+ }
+
+ ret = vfio_log_buf_alloc_dma(&info, log_buf, dev);
+ if (ret)
+ goto err_free_dma_array;
+
+ log_buf->info.uuid = info.uuid;
+ log_buf->info.buffer_size = info.buffer_size;
+ log_buf->info.frag_size = info.frag_size;
+ log_buf->info.addrs_size = info.addrs_size;
+ log_buf->buffer_state = 1;
+ return 0;
+
+err_free_dma_array:
+ vfio_log_buf_release_dma(dev, log_buf);
+err_free_addrs:
+ kfree(log_buf->cpu_addrs);
+ log_buf->cpu_addrs = NULL;
+ kfree(log_buf->info.sgevec);
+ log_buf->info.sgevec = NULL;
+err_out:
+ return -ENOMEM;
+}
+
+static long vfio_log_buf_release_buffer(struct vfio_container *container)
+{
+ struct vfio_log_buf *log_buf = &container->log_buf;
+ struct device *dev = NULL;
+
+ if (log_buf->buffer_state == 0) {
+ pr_warn("buffer already released\n");
+ return 0;
+ }
+
+ dev = vfio_get_dev(container);
+ if (!dev) {
+ pr_err("can`t get dev\n");
+ return -EFAULT;
+ }
+
+ vfio_log_buf_release_dma(dev, log_buf);
+
+ kfree(log_buf->cpu_addrs);
+ log_buf->cpu_addrs = NULL;
+
+ kfree(log_buf->info.sgevec);
+ log_buf->info.sgevec = NULL;
+
+ log_buf->buffer_state = 0;
+ return 0;
+}
+
+static int vfio_log_buf_release(struct inode *inode, struct file *filep)
+{
+ struct vfio_container *container = filep->private_data;
+
+ vfio_log_buf_stop(container);
+ vfio_log_buf_release_buffer(container);
+ memset(&container->log_buf, 0, sizeof(struct vfio_log_buf));
+ return 0;
+}
+
+static long vfio_ioctl_handle_log_buf_ctl(struct vfio_container *container,
+ unsigned long arg)
+{
+ struct vfio_log_buf_ctl log_buf_ctl;
+ long ret = 0;
+
+ if (copy_from_user(&log_buf_ctl, (void __user *)arg,
+ sizeof(struct vfio_log_buf_ctl)))
+ return -EFAULT;
+
+ switch (log_buf_ctl.flags) {
+ case VFIO_DEVICE_LOG_BUF_FLAG_SETUP:
+ ret = vfio_log_buf_setup(container,
+ (unsigned long)log_buf_ctl.data);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_RELEASE:
+ ret = vfio_log_buf_release_buffer(container);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_START:
+ ret = vfio_log_buf_start(container);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_STOP:
+ ret = vfio_log_buf_stop(container);
+ break;
+ case VFIO_DEVICE_LOG_BUF_FLAG_STATUS_QUERY:
+ ret = vfio_log_buf_query(container);
+ break;
+ default:
+ pr_err("log buf control flag incorrect\n");
+ ret = -EINVAL;
+ break;
+ }
+ return ret;
+}
+
+static long vfio_log_buf_fops_unl_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ struct vfio_container *container = filep->private_data;
+ long ret = -EINVAL;
+
+ switch (cmd) {
+ case VFIO_LOG_BUF_CTL:
+ ret = vfio_ioctl_handle_log_buf_ctl(container, arg);
+ break;
+ default:
+ pr_err("log buf control cmd incorrect\n");
+ break;
+ }
+
+ return ret;
+}
+
+#ifdef CONFIG_COMPAT
+static long vfio_log_buf_fops_compat_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ arg = (unsigned long)compat_ptr(arg);
+ return vfio_log_buf_fops_unl_ioctl(filep, cmd, arg);
+}
+#endif /* CONFIG_COMPAT */
+
+static const struct file_operations vfio_log_buf_fops = {
+ .owner = THIS_MODULE,
+ .mmap = vfio_log_buf_fops_mmap,
+ .unlocked_ioctl = vfio_log_buf_fops_unl_ioctl,
+ .release = vfio_log_buf_release,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = vfio_log_buf_fops_compat_ioctl,
+#endif
+};
+
+static int vfio_get_log_buf_fd(struct vfio_container *container,
+ unsigned long arg)
+{
+ struct file *filep = NULL;
+ int ret;
+
+ if (container->log_buf.fd > 0)
+ return container->log_buf.fd;
+
+ ret = get_unused_fd_flags(O_CLOEXEC);
+ if (ret < 0) {
+ pr_err("get_unused_fd_flags get fd failed\n");
+ return ret;
+ }
+
+ filep = anon_inode_getfile("[vfio-log-buf]", &vfio_log_buf_fops,
+ container, O_RDWR);
+ if (IS_ERR(filep)) {
+ pr_err("anon_inode_getfile failed\n");
+ put_unused_fd(ret);
+ ret = PTR_ERR(filep);
+ return ret;
+ }
+
+ filep->f_mode |= (FMODE_READ | FMODE_WRITE | FMODE_LSEEK);
+
+ fd_install(ret, filep);
+
+ container->log_buf.fd = ret;
+ return ret;
+}
+
static long vfio_fops_unl_ioctl(struct file *filep,
- unsigned int cmd, unsigned long arg)
+ unsigned int cmd, unsigned long arg)
{
struct vfio_container *container = filep->private_data;
struct vfio_iommu_driver *driver;
@@ -1179,6 +1582,9 @@ static long vfio_fops_unl_ioctl(struct file *filep,
case VFIO_SET_IOMMU:
ret = vfio_ioctl_set_iommu(container, arg);
break;
+ case VFIO_GET_LOG_BUF_FD:
+ ret = vfio_get_log_buf_fd(container, arg);
+ break;
default:
driver = container->iommu_driver;
data = container->iommu_data;
@@ -1210,6 +1616,7 @@ static int vfio_fops_open(struct inode *inode, struct file *filep)
INIT_LIST_HEAD(&container->group_list);
init_rwsem(&container->group_lock);
kref_init(&container->kref);
+ memset(&container->log_buf, 0, sizeof(struct vfio_log_buf));
filep->private_data = container;
@@ -1219,9 +1626,7 @@ static int vfio_fops_open(struct inode *inode, struct file *filep)
static int vfio_fops_release(struct inode *inode, struct file *filep)
{
struct vfio_container *container = filep->private_data;
-
filep->private_data = NULL;
-
vfio_container_put(container);
return 0;
diff --git a/include/linux/vfio_pci_migration.h b/include/linux/vfio_pci_migration.h
new file mode 100644
index 0000000..464ffb4
--- /dev/null
+++ b/include/linux/vfio_pci_migration.h
@@ -0,0 +1,136 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2022 Huawei Technologies Co., Ltd. All rights reserved.
+ */
+
+#ifndef VFIO_PCI_MIGRATION_H
+#define VFIO_PCI_MIGRATION_H
+
+#include <linux/types.h>
+#include <linux/pci.h>
+
+#define VFIO_REGION_TYPE_MIGRATION (3)
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION (1)
+
+#define VFIO_MIGRATION_BUFFER_MAX_SIZE SZ_256K
+#define VFIO_MIGRATION_REGION_DATA_OFFSET \
+ (sizeof(struct vfio_device_migration_info))
+#define VFIO_DEVICE_MIGRATION_OFFSET(x) \
+ offsetof(struct vfio_device_migration_info, x)
+
+struct vfio_device_migration_info {
+ __u32 device_state; /* VFIO device state */
+#define VFIO_DEVICE_STATE_STOP (0)
+#define VFIO_DEVICE_STATE_RUNNING (1 << 0)
+#define VFIO_DEVICE_STATE_SAVING (1 << 1)
+#define VFIO_DEVICE_STATE_RESUMING (1 << 2)
+#define VFIO_DEVICE_STATE_MASK (VFIO_DEVICE_STATE_RUNNING | \
+ VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING)
+ __u32 reserved;
+
+ __u32 device_cmd;
+ __u32 version_id;
+
+ __u64 pending_bytes;
+ __u64 data_offset;
+ __u64 data_size;
+};
+
+enum {
+ VFIO_DEVICE_STOP = 0xffff0001,
+ VFIO_DEVICE_CONTINUE,
+ VFIO_DEVICE_MIGRATION_CANCEL,
+};
+
+struct vfio_log_buf_sge {
+ __u64 len;
+ __u64 addr;
+};
+
+struct vfio_log_buf_info {
+ __u32 uuid;
+ __u64 buffer_size;
+ __u64 addrs_size;
+ __u64 frag_size;
+ struct vfio_log_buf_sge *sgevec;
+};
+
+struct vfio_log_buf_ctl {
+ __u32 argsz;
+ __u32 flags;
+ #define VFIO_DEVICE_LOG_BUF_FLAG_SETUP (1 << 0)
+ #define VFIO_DEVICE_LOG_BUF_FLAG_RELEASE (1 << 1)
+ #define VFIO_DEVICE_LOG_BUF_FLAG_START (1 << 2)
+ #define VFIO_DEVICE_LOG_BUF_FLAG_STOP (1 << 3)
+ #define VFIO_DEVICE_LOG_BUF_FLAG_STATUS_QUERY (1 << 4)
+ void *data;
+};
+#define VFIO_LOG_BUF_CTL _IO(VFIO_TYPE, VFIO_BASE + 21)
+#define VFIO_GET_LOG_BUF_FD _IO(VFIO_TYPE, VFIO_BASE + 22)
+#define VFIO_DEVICE_LOG_BUF_CTL _IO(VFIO_TYPE, VFIO_BASE + 23)
+
+struct vf_migration_log_info {
+ __u32 dom_uuid;
+ __u64 buffer_size;
+ __u64 sge_len;
+ __u64 sge_num;
+ struct vfio_log_buf_sge *sgevec;
+};
+
+struct vfio_device_migration_ops {
+ /* Get device information */
+ int (*get_info)(struct pci_dev *pdev,
+ struct vfio_device_migration_info *info);
+ /* Enable a vf device */
+ int (*enable)(struct pci_dev *pdev);
+ /* Disable a vf device */
+ int (*disable)(struct pci_dev *pdev);
+ /* Save a vf device */
+ int (*save)(struct pci_dev *pdev, void *base,
+ uint64_t off, uint64_t count);
+ /* Restore a vf device */
+ int (*restore)(struct pci_dev *pdev, void *base,
+ uint64_t off, uint64_t count);
+ /* Log start a vf device */
+ int (*log_start)(struct pci_dev *pdev,
+ struct vf_migration_log_info *log_info);
+ /* Log stop a vf device */
+ int (*log_stop)(struct pci_dev *pdev, uint32_t uuid);
+ /* Get vf device log status */
+ int (*get_log_status)(struct pci_dev *pdev);
+ /* Pre-enable a vf device (load_setup, before restoring a vf) */
+ int (*pre_enable)(struct pci_dev *pdev);
+ /* Cancel a vf device when live migration failed (rollback) */
+ int (*cancel)(struct pci_dev *pdev);
+ /* Init a vf device */
+ int (*init)(struct pci_dev *pdev);
+ /* Uninit a vf device */
+ void (*uninit)(struct pci_dev *pdev);
+ /* Release a vf device */
+ void (*release)(struct pci_dev *pdev);
+};
+
+struct vfio_pci_vendor_mig_driver {
+ struct pci_dev *pdev;
+ unsigned char bus_num;
+ struct vfio_device_migration_ops *dev_mig_ops;
+ struct module *owner;
+ atomic_t count;
+ struct list_head list;
+};
+
+struct vfio_pci_migration_data {
+ u64 state_size;
+ struct pci_dev *vf_dev;
+ struct vfio_pci_vendor_mig_driver *mig_driver;
+ struct vfio_device_migration_info *mig_ctl;
+ void *vf_data;
+};
+
+int vfio_pci_register_migration_ops(struct vfio_device_migration_ops *ops,
+ struct module *mod, struct pci_dev *pdev);
+void vfio_pci_unregister_migration_ops(struct module *mod,
+ struct pci_dev *pdev);
+
+#endif /* VFIO_PCI_MIGRATION_H */
--
1.8.3.1
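A hedged sketch of the userspace sequence implied by the uapi above; the
function below is illustrative (it assumes the header is visible to
userspace, and vm_uuid/nr_frags are caller-chosen) and omits all error
handling:

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>
#include <linux/vfio_pci_migration.h>	/* assumed exported to userspace */

static void *start_dirty_log(int container_fd, uint32_t vm_uuid,
			     uint64_t nr_frags)
{
	int log_fd = ioctl(container_fd, VFIO_GET_LOG_BUF_FD, 0);
	struct vfio_log_buf_info info = {
		.uuid = vm_uuid,
		.buffer_size = nr_frags * (2UL << 20),	/* illustrative */
		.addrs_size = nr_frags,	/* <= 128, i.e. up to 1 TiB of VM RAM */
		.frag_size = 2UL << 20,	/* must equal LOG_BUF_FRAG_SIZE */
		.sgevec = NULL,		/* the kernel allocates its own vector */
	};
	struct vfio_log_buf_ctl ctl = {
		.argsz = sizeof(info),
		.flags = VFIO_DEVICE_LOG_BUF_FLAG_SETUP,
		.data = &info,
	};

	ioctl(log_fd, VFIO_LOG_BUF_CTL, &ctl);	/* allocate DMA fragments */

	ctl.flags = VFIO_DEVICE_LOG_BUF_FLAG_START;
	ioctl(log_fd, VFIO_LOG_BUF_CTL, &ctl);	/* device starts dirty logging */

	/* Fragment i lives at file offset i * frag_size and the mapping
	 * length must equal frag_size; map fragment 0 here. */
	return mmap(NULL, 2UL << 20, PROT_READ, MAP_SHARED, log_fd, 0);
}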
1
0
[PATCH OLK-5.10 v2 1/2] ipmi/watchdog: replace atomic_add() and atomic_sub()
by Miaohe Lin 24 Jun '22
24 Jun '22
From: Yejune Deng <yejune.deng(a)gmail.com>
mainline inclusion
from v5.11-rc1
commit a01a89b1db1066a6af23ae08b9a0c345b7966f0b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5DVR9
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
atomic_inc() and atomic_dec() look better
Signed-off-by: Yejune Deng <yejune.deng(a)gmail.com>
Message-Id: <1605511807-7135-1-git-send-email-yejune.deng(a)gmail.com>
Signed-off-by: Corey Minyard <cminyard(a)mvista.com>
---
drivers/char/ipmi/ipmi_watchdog.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index f78156d93c3f..32c334e34d55 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -495,7 +495,7 @@ static void panic_halt_ipmi_heartbeat(void)
msg.cmd = IPMI_WDOG_RESET_TIMER;
msg.data = NULL;
msg.data_len = 0;
- atomic_add(1, &panic_done_count);
+ atomic_inc(&panic_done_count);
rv = ipmi_request_supply_msgs(watchdog_user,
(struct ipmi_addr *) &addr,
0,
@@ -505,7 +505,7 @@ static void panic_halt_ipmi_heartbeat(void)
&panic_halt_heartbeat_recv_msg,
1);
if (rv)
- atomic_sub(1, &panic_done_count);
+ atomic_dec(&panic_done_count);
}
static struct ipmi_smi_msg panic_halt_smi_msg = {
@@ -529,12 +529,12 @@ static void panic_halt_ipmi_set_timeout(void)
/* Wait for the messages to be free. */
while (atomic_read(&panic_done_count) != 0)
ipmi_poll_interface(watchdog_user);
- atomic_add(1, &panic_done_count);
+ atomic_inc(&panic_done_count);
rv = __ipmi_set_timeout(&panic_halt_smi_msg,
&panic_halt_recv_msg,
&send_heartbeat_now);
if (rv) {
- atomic_sub(1, &panic_done_count);
+ atomic_dec(&panic_done_count);
pr_warn("Unable to extend the watchdog timeout\n");
} else {
if (send_heartbeat_now)
--
2.23.0
1
1
23 Jun '22
From: Yejune Deng <yejune.deng(a)gmail.com>
mainline inclusion
from v5.11-rc1
commit a01a89b1db1066a6af23ae08b9a0c345b7966f0b
category: bugfix
bugzilla: NA
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
atomic_inc() and atomic_dec() look better
Signed-off-by: Yejune Deng <yejune.deng(a)gmail.com>
Message-Id: <1605511807-7135-1-git-send-email-yejune.deng(a)gmail.com>
Signed-off-by: Corey Minyard <cminyard(a)mvista.com>
---
drivers/char/ipmi/ipmi_watchdog.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index f78156d93c3f..32c334e34d55 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -495,7 +495,7 @@ static void panic_halt_ipmi_heartbeat(void)
msg.cmd = IPMI_WDOG_RESET_TIMER;
msg.data = NULL;
msg.data_len = 0;
- atomic_add(1, &panic_done_count);
+ atomic_inc(&panic_done_count);
rv = ipmi_request_supply_msgs(watchdog_user,
(struct ipmi_addr *) &addr,
0,
@@ -505,7 +505,7 @@ static void panic_halt_ipmi_heartbeat(void)
&panic_halt_heartbeat_recv_msg,
1);
if (rv)
- atomic_sub(1, &panic_done_count);
+ atomic_dec(&panic_done_count);
}
static struct ipmi_smi_msg panic_halt_smi_msg = {
@@ -529,12 +529,12 @@ static void panic_halt_ipmi_set_timeout(void)
/* Wait for the messages to be free. */
while (atomic_read(&panic_done_count) != 0)
ipmi_poll_interface(watchdog_user);
- atomic_add(1, &panic_done_count);
+ atomic_inc(&panic_done_count);
rv = __ipmi_set_timeout(&panic_halt_smi_msg,
&panic_halt_recv_msg,
&send_heartbeat_now);
if (rv) {
- atomic_sub(1, &panic_done_count);
+ atomic_dec(&panic_done_count);
pr_warn("Unable to extend the watchdog timeout\n");
} else {
if (send_heartbeat_now)
--
2.23.0
2
2