Alexander Duyck (1): e1000: Do not perform reset in reset_task if we are already down
Alistair Popple (1): mm/rmap: fixup copying of soft dirty and uffd ptes
Aya Levin (2): net/mlx5e: Fix VLAN cleanup flow net/mlx5e: Fix VLAN create flow
Chuck Lever (1): svcrdma: Fix leak of transport addresses
Cong Wang (1): tipc: fix the skb_unshare() in tipc_buf_append()
Dan Aloni (1): svcrdma: fix bounce buffers for unaligned offsets and multiple pages
Dumitru Ceara (1): openvswitch: handle DNAT tuple collision
Eran Ben Elisha (1): net/mlx5: Don't call timecounter cyc2time directly from 1PPS flow
Hanjun Guo (4): irqchip/gicv3: Call acpi_put_table() to fix memory leak tty/amba-pl011: Call acpi_put_table() to fix memory leak cpufreq : CPPC: Break out if HiSilicon CPPC workaround is matched cpufreq: CPPC: put ACPI table after using it
Jonathan Lemon (1): mlx4: handle non-napi callers to napi_poll
Linus Torvalds (1): tty: make FONTX ioctl use the tty pointer they were actually passed
Nikolai Merinov (1): partitions/efi: Fix partition name parsing in GUID partition entry
Rohit Maheshwari (1): net/tls: sendfile fails with ktls offload
Sabrina Dubroca (1): xfrmi: drop ignore_df check before updating pmtu
Tonghao Zhang (2): net: openvswitch: use u64 for meter bucket net: openvswitch: use div_u64() for 64-by-32 divisions
Tuong Lien (1): tipc: fix memory leak in service subscripting
Yang Shi (1): mm: madvise: fix vma user-after-free
Yunsheng Lin (1): net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc
kiyin(尹亮) (1): perf/core: Fix a memory leak in perf_event_parse_addr_filter()
block/partitions/efi.c | 35 +++++++++---- block/partitions/efi.h | 2 +- drivers/cpufreq/cppc_cpufreq.c | 8 ++- drivers/irqchip/irq-gic-v3.c | 2 + drivers/net/ethernet/intel/e1000/e1000_main.c | 18 +++++-- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 3 ++ drivers/net/ethernet/mellanox/mlx4/en_tx.c | 2 +- .../net/ethernet/mellanox/mlx5/core/en_fs.c | 14 ++++-- .../ethernet/mellanox/mlx5/core/lib/clock.c | 5 +- drivers/tty/serial/amba-pl011.c | 2 + drivers/tty/vt/vt_ioctl.c | 32 ++++++------ kernel/events/core.c | 12 ++--- mm/madvise.c | 2 +- mm/migrate.c | 9 +++- mm/rmap.c | 7 ++- net/openvswitch/conntrack.c | 22 +++++---- net/openvswitch/meter.c | 4 +- net/openvswitch/meter.h | 2 +- net/sched/sch_generic.c | 49 +++++++++++++------ net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 1 + net/sunrpc/xprtrdma/svc_rdma_sendto.c | 3 +- net/tipc/msg.c | 3 +- net/tipc/topsrv.c | 4 +- net/tls/tls_device.c | 11 +++-- net/xfrm/xfrm_interface.c | 2 +- 25 files changed, 168 insertions(+), 86 deletions(-)
From: Linus Torvalds torvalds@linux-foundation.org
mainline inclusion from mainline-v5.10-rc3 commit 90bfdeef83f1d6c696039b6a917190dcbbad3220 category: bugfix bugzilla: NA CVE: CVE-2020-25668
--------------------------------
Some of the font tty ioctl's always used the current foreground VC for their operations. Don't do that then.
This fixes a data race on fg_console.
Side note: both Michael Ellerman and Jiri Slaby point out that all these ioctls are deprecated, and should probably have been removed long ago, and everything seems to be using the KDFONTOP ioctl instead.
In fact, Michael points out that it looks like busybox's loadfont program seems to have switched over to using KDFONTOP exactly _because_ of this bug (ahem.. 12 years ago ;-).
Reported-by: Minh Yuan yuanmingbuaa@gmail.com Acked-by: Michael Ellerman mpe@ellerman.id.au Acked-by: Jiri Slaby jirislaby@kernel.org Cc: Greg KH greg@kroah.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Conflicts: drivers/tty/vt/vt_ioctl.c [yyl: There is no vt_io_fontreset(), change the vc_cons to vc in do_fontx_ioctl()] Signed-off-by: Yang Yingliang yangyingliang@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/tty/vt/vt_ioctl.c | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-)
diff --git a/drivers/tty/vt/vt_ioctl.c b/drivers/tty/vt/vt_ioctl.c index 6a82030cf1ef..2e959563af53 100644 --- a/drivers/tty/vt/vt_ioctl.c +++ b/drivers/tty/vt/vt_ioctl.c @@ -244,7 +244,7 @@ int vt_waitactive(int n)
static inline int -do_fontx_ioctl(int cmd, struct consolefontdesc __user *user_cfd, int perm, struct console_font_op *op) +do_fontx_ioctl(struct vc_data *vc, int cmd, struct consolefontdesc __user *user_cfd, int perm, struct console_font_op *op) { struct consolefontdesc cfdarg; int i; @@ -262,15 +262,16 @@ do_fontx_ioctl(int cmd, struct consolefontdesc __user *user_cfd, int perm, struc op->height = cfdarg.charheight; op->charcount = cfdarg.charcount; op->data = cfdarg.chardata; - return con_font_op(vc_cons[fg_console].d, op); - case GIO_FONTX: { + return con_font_op(vc, op); + + case GIO_FONTX: op->op = KD_FONT_OP_GET; op->flags = KD_FONT_FLAG_OLD; op->width = 8; op->height = cfdarg.charheight; op->charcount = cfdarg.charcount; op->data = cfdarg.chardata; - i = con_font_op(vc_cons[fg_console].d, op); + i = con_font_op(vc, op); if (i) return i; cfdarg.charheight = op->height; @@ -278,7 +279,6 @@ do_fontx_ioctl(int cmd, struct consolefontdesc __user *user_cfd, int perm, struc if (copy_to_user(user_cfd, &cfdarg, sizeof(struct consolefontdesc))) return -EFAULT; return 0; - } } return -EINVAL; } @@ -924,7 +924,7 @@ int vt_ioctl(struct tty_struct *tty, op.height = 0; op.charcount = 256; op.data = up; - ret = con_font_op(vc_cons[fg_console].d, &op); + ret = con_font_op(vc, &op); break; }
@@ -935,7 +935,7 @@ int vt_ioctl(struct tty_struct *tty, op.height = 32; op.charcount = 256; op.data = up; - ret = con_font_op(vc_cons[fg_console].d, &op); + ret = con_font_op(vc, &op); break; }
@@ -952,7 +952,7 @@ int vt_ioctl(struct tty_struct *tty,
case PIO_FONTX: case GIO_FONTX: - ret = do_fontx_ioctl(cmd, up, perm, &op); + ret = do_fontx_ioctl(vc, cmd, up, perm, &op); break;
case PIO_FONTRESET: @@ -969,11 +969,11 @@ int vt_ioctl(struct tty_struct *tty, { op.op = KD_FONT_OP_SET_DEFAULT; op.data = NULL; - ret = con_font_op(vc_cons[fg_console].d, &op); + ret = con_font_op(vc, &op); if (ret) break; console_lock(); - con_set_default_unimap(vc_cons[fg_console].d); + con_set_default_unimap(vc); console_unlock(); break; } @@ -1100,8 +1100,9 @@ struct compat_consolefontdesc { };
static inline int -compat_fontx_ioctl(int cmd, struct compat_consolefontdesc __user *user_cfd, - int perm, struct console_font_op *op) +compat_fontx_ioctl(struct vc_data *vc, int cmd, + struct compat_consolefontdesc __user *user_cfd, + int perm, struct console_font_op *op) { struct compat_consolefontdesc cfdarg; int i; @@ -1119,7 +1120,8 @@ compat_fontx_ioctl(int cmd, struct compat_consolefontdesc __user *user_cfd, op->height = cfdarg.charheight; op->charcount = cfdarg.charcount; op->data = compat_ptr(cfdarg.chardata); - return con_font_op(vc_cons[fg_console].d, op); + return con_font_op(vc, op); + case GIO_FONTX: op->op = KD_FONT_OP_GET; op->flags = KD_FONT_FLAG_OLD; @@ -1127,7 +1129,7 @@ compat_fontx_ioctl(int cmd, struct compat_consolefontdesc __user *user_cfd, op->height = cfdarg.charheight; op->charcount = cfdarg.charcount; op->data = compat_ptr(cfdarg.chardata); - i = con_font_op(vc_cons[fg_console].d, op); + i = con_font_op(vc, op); if (i) return i; cfdarg.charheight = op->height; @@ -1218,7 +1220,7 @@ long vt_compat_ioctl(struct tty_struct *tty, */ case PIO_FONTX: case GIO_FONTX: - ret = compat_fontx_ioctl(cmd, up, perm, &op); + ret = compat_fontx_ioctl(vc, cmd, up, perm, &op); break;
case KDFONTOP:
From: Nikolai Merinov n.merinov@inango-systems.com
mainline inclusion from mainline-5.7-rc1 commit d5528d5e91041e68e8eab9792ce627705a0ed273 category: bugfix bugzilla: 32454 CVE: NA ---------------------------
GUID partition entry defined to have a partition name as 36 UTF-16LE code units. This means that on big-endian platforms ASCII symbols would be read with 0xXX00 efi_char16_t character code. In order to correctly extract ASCII characters from a partition name field we should be converted from 16LE to CPU architecture.
The problem exists on all big endian platforms.
[ mingo: Minor edits. ]
Fixes: eec7ecfede74 ("genhd, efi: add efi partition metadata to hd_structs") Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Nikolai Merinov n.merinov@inango-systems.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20200308080859.21568-29-ardb@kernel.org Link: https://lore.kernel.org/r/797777312.1324734.1582544319435.JavaMail.zimbra@in... Signed-off-by: yangerkun yangerkun@huawei.com Reviewed-by: Yufen Yu yuyufen@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- block/partitions/efi.c | 35 ++++++++++++++++++++++++++--------- block/partitions/efi.h | 2 +- 2 files changed, 27 insertions(+), 10 deletions(-)
diff --git a/block/partitions/efi.c b/block/partitions/efi.c index 39f70d968754..b9beaa0a9b36 100644 --- a/block/partitions/efi.c +++ b/block/partitions/efi.c @@ -670,6 +670,31 @@ static int find_valid_gpt(struct parsed_partitions *state, gpt_header **gpt, return 0; }
+/** + * utf16_le_to_7bit(): Naively converts a UTF-16LE string to 7-bit ASCII characters + * @in: input UTF-16LE string + * @size: size of the input string + * @out: output string ptr, should be capable to store @size+1 characters + * + * Description: Converts @size UTF16-LE symbols from @in string to 7-bit + * ASCII characters and stores them to @out. Adds trailing zero to @out array. + */ +static void utf16_le_to_7bit(const __le16 *in, unsigned int size, u8 *out) +{ + unsigned int i = 0; + + out[size] = 0; + + while (i < size) { + u8 c = le16_to_cpu(in[i]) & 0xff; + + if (c && !isprint(c)) + c = '!'; + out[i] = c; + i++; + } +} + /** * efi_partition(struct parsed_partitions *state) * @state: disk parsed partitions @@ -706,7 +731,6 @@ int efi_partition(struct parsed_partitions *state)
for (i = 0; i < le32_to_cpu(gpt->num_partition_entries) && i < state->limit-1; i++) { struct partition_meta_info *info; - unsigned label_count = 0; unsigned label_max; u64 start = le64_to_cpu(ptes[i].starting_lba); u64 size = le64_to_cpu(ptes[i].ending_lba) - @@ -727,14 +751,7 @@ int efi_partition(struct parsed_partitions *state) /* Naively convert UTF16-LE to 7 bits. */ label_max = min(ARRAY_SIZE(info->volname) - 1, ARRAY_SIZE(ptes[i].partition_name)); - info->volname[label_max] = 0; - while (label_count < label_max) { - u8 c = ptes[i].partition_name[label_count] & 0xff; - if (c && !isprint(c)) - c = '!'; - info->volname[label_count] = c; - label_count++; - } + utf16_le_to_7bit(ptes[i].partition_name, label_max, info->volname); state->parts[i + 1].has_info = true; } kfree(ptes); diff --git a/block/partitions/efi.h b/block/partitions/efi.h index abd0b19288a6..42db2513ecfa 100644 --- a/block/partitions/efi.h +++ b/block/partitions/efi.h @@ -102,7 +102,7 @@ typedef struct _gpt_entry { __le64 starting_lba; __le64 ending_lba; gpt_entry_attributes attributes; - efi_char16_t partition_name[72 / sizeof (efi_char16_t)]; + __le16 partition_name[72/sizeof(__le16)]; } __packed gpt_entry;
typedef struct _gpt_mbr_record {
From: Hanjun Guo guohanjun@huawei.com
hulk inclusion category: bugfix bugzilla: NA CVE: NA
----------------------------------------
With check_early_ioremap_leak(), we got warning:
[ 2.599829] ------------[ cut here ]------------ [ 2.604436] Debug warning: early ioremap leak of 1 areas detected. [ 2.604436] please boot with early_ioremap_debug and report the dmesg. [ 2.617135] WARNING: CPU: 0 PID: 1 at mm/early_ioremap.c:99 check_early_ioremap_leak+0x4c/0x60 [ 2.625731] Modules linked in: [ 2.628775] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.90+ #32 [ 2.634941] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019 [ 2.643451] pstate: 60c00009 (nZCv daif +PAN +UAO) [ 2.648229] pc : check_early_ioremap_leak+0x4c/0x60 [ 2.653093] lr : check_early_ioremap_leak+0x4c/0x60 [ 2.657956] sp : ffff00008222fd90 [ 2.661257] x29: ffff00008222fd90 x28: ffff000080f41070 [ 2.666555] x27: ffff000080e507b0 x26: ffff000081480000 [ 2.671854] x25: 0000000000000008 x24: 0000000000000007 [ 2.677152] x23: ffff000080e3ae58 x22: ffff000081480000 [ 2.682450] x21: 0000000000000000 x20: ffff000080e811b0 [ 2.687748] x19: ffff0000812f3000 x18: ffffffffffffffff [ 2.693046] x17: 000000008c2a0cd7 x16: 00000000e2c72cdd [ 2.698344] x15: ffff0000812f3708 x14: 00017 [ 2.724834] x5 : 000000000000000f x4 : 0000000000000000 [ 2.730132] x3 : 0000000000000000 x2 : ffffffffffffffff [ 2.735430] x1 : 612b3b9664d69500 x0 : 0000000000000000 [ 2.740729] Call trace: [ 2.743163] check_early_ioremap_leak+0x4c/0x60 [ 2.747681] do_one_initcall+0x54/0x200 [ 2.751504] kernel_init_freeable+0x2a0/0x34c [ 2.755849] kernel_init+0x18/0x118 [ 2.759324] ret_from_fork+0x10/0x18 [ 2.762887] ---[ end trace a57e2a42868dc894 ]---
adding early_ioremap_debug=1 in boot cmdline,
[ 0.000000] ------------[ cut here ]------------ [ 0.000000] __early_ioremap(396f0000, 00010000) [1] => 00000000 + ffff7fdffe620000 [ 0.000000] WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:162 __early_ioremap+0x18c/0x1c8 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.19.90+ #31 [ 0.000000] pstate: 60000089 (nZCv daIf -PAN -UAO) [ 0.000000] pc : __early_ioremap+0x18c/0x1c8 [ 0.000000] lr : __early_ioremap+0x18c/0x1c8 [ 0.000000] sp : ffff0000812afc00 [ 0.000000] x29: ffff0000812afc00 x28: ffff000080f2c878 [ 0.000000] x27: 0000000000000000 x26: 0000000000000001 [ 0.000000] x25: 0068000000000713 x24: 00000000396e0000 [ 0.000000] x23: 0000000000000001 x22: 0000000000000000 [ 0.000000] x21: 0000000000010000 x20: 00000000396f0000 [ 0.000000] x19: ffff000080f2c000 x18: ffffffffffffffff [ 0.000000] x17: 0000000000000000 x16: 0000000000000000 [ 0.000000] x15: ffff0000812d3708 x14: 3030323665666664 [ 0.000000] x13: 663766666666202b x12: 2030303030303030 [ 0.000000] x11: 30203e3d205d315b x10: 2029303030303130 [ 0.000000] x9 : 3030202c30303030 x8 : 0000000000001181 [ 0.000000] x7 : 65726f695f796c72 x6 : ffff0000814ec9ed [ 0.000000] x5 : 0000000000000000 x4 : 0000000000000000 [ 0.000000] x3 : 0000000000000000 x2 : ffffffffffffffff [ 0.000000] x1 : ed48fa59d999c400 x0 : 0000000000000000 [ 0.000000] Call trace: [ 0.000000] __early_ioremap+0x18c/0x1c8 [ 0.000000] early_memremap+0x7c/0x88 [ 0.000000] __acpi_map_table+0x2c/0x40 [ 0.000000] acpi_os_map_iomem+0x134/0x208 [ 0.000000] acpi_os_map_memory+0x28/0x38 [ 0.000000] acpi_tb_acquire_table+0x58/0x8c [ 0.000000] acpi_tb_validate_table+0x34/0x58 [ 0.000000] acpi_tb_get_table+0x4c/0x90 [ 0.000000] acpi_get_table+0x94/0xc4 [ 0.000000] acpi_table_parse_entries_array+0x98/0x224 [ 0.000000] acpi_table_parse_entries+0x70/0x98 [ 0.000000] acpi_table_parse_madt+0x40/0x50 [ 0.000000] __acpi_probe_device_table+0x90/0xf0 [ 0.000000] irqchip_init+0x38/0x40 [ 0.000000] init_IRQ+0x10c/0x140 [ 0.000000] start_kernel+0x35c/0x4fc [ 0.000000] ---[ end trace f68728a0d3053bc0 ]---
It turns out that we missed the acpi_put_table() after get the table, fix it.
Reported-by: Chao Gao gaochao24@huawei.com Signed-off-by: Hanjun Guo guohanjun@huawei.com Reviewed-by: Xiongfeng Wang wangxiongfeng2@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/irqchip/irq-gic-v3.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 99cc646819f5..7bf14acdcd28 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -940,6 +940,8 @@ static void gic_check_hisi_workaround(void) break; } } + + acpi_put_table(tbl); }
static void gic_compute_nr_gicr(void)
From: Hanjun Guo guohanjun@huawei.com
hulk inclusion category: bugfix bugzilla: NA CVE: NA
----------------------------------------
acpi_get_table() should be coupled with acpi_put_table(), or it will leat to memory leak, fix the memory leak to call acpi_put_table().
Signed-off-by: Hanjun Guo guohanjun@huawei.com Reviewed-by: Xiongfeng Wang wangxiongfeng2@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/tty/serial/amba-pl011.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c index f55e4d9596cb..8d2cffedbe37 100644 --- a/drivers/tty/serial/amba-pl011.c +++ b/drivers/tty/serial/amba-pl011.c @@ -1523,6 +1523,8 @@ static void pl011_check_hisi_workaround(void) break; } } + + acpi_put_table(tbl); }
#else
From: Hanjun Guo guohanjun@huawei.com
mainline inclusion from mainline-v5.6-rc1 commit c740237937c039c06e9cda32b9a37dde8b0d1e63 category: bugfix bugzilla: NA CVE: NA
--------------------------------
Bail out if we match the OEM information, to save some possible extra iteration.
Also update the code to fix minor coding style issue.
Signed-off-by: Hanjun Guo guohanjun@huawei.com [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-by: Xiongfeng Wang wangxiongfeng2@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/cpufreq/cppc_cpufreq.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index 9d4da48a4004..22d98c6db945 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -43,7 +43,7 @@ static struct cppc_cpudata **all_cpu_data;
struct cppc_workaround_oem_info { - char oem_id[ACPI_OEM_ID_SIZE +1]; + char oem_id[ACPI_OEM_ID_SIZE + 1]; char oem_table_id[ACPI_OEM_TABLE_ID_SIZE + 1]; u32 oem_revision; }; @@ -97,8 +97,10 @@ static void cppc_check_hisi_workaround(void) for (i = 0; i < ARRAY_SIZE(wa_info); i++) { if (!memcmp(wa_info[i].oem_id, tbl->oem_id, ACPI_OEM_ID_SIZE) && !memcmp(wa_info[i].oem_table_id, tbl->oem_table_id, ACPI_OEM_TABLE_ID_SIZE) && - wa_info[i].oem_revision == tbl->oem_revision) + wa_info[i].oem_revision == tbl->oem_revision) { apply_hisi_workaround = true; + break; + } } }
From: Hanjun Guo guohanjun@huawei.com
mainline inclusion from mainline-v5.6-rc1 commit 80e8b1e59f0399b94a6088bcb9477bd798cc5eba category: bugfix bugzilla: NA CVE: NA
--------------------------------
Put the ACPI table to release the table mapping after using it successfully.
Signed-off-by: Hanjun Guo guohanjun@huawei.com [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-by: Xiongfeng Wang wangxiongfeng2@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/cpufreq/cppc_cpufreq.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index 22d98c6db945..0a245f1caa95 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -102,6 +102,8 @@ static void cppc_check_hisi_workaround(void) break; } } + + acpi_put_table(tbl); }
/* Callback function used to retrieve the max frequency from DMI */
From: Yunsheng Lin linyunsheng@huawei.com
stable inclusion from linux-4.19.148 commit 749cc0b0c7f3dcdfe5842f998c0274e54987384f
--------------------------------
[ Upstream commit 2fb541c862c987d02dfdf28f1545016deecfa0d5 ]
Currently there is concurrent reset and enqueue operation for the same lockless qdisc when there is no lock to synchronize the q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in qdisc_deactivate() called by dev_deactivate_queue(), which may cause out-of-bounds access for priv->ring[] in hns3 driver if user has requested a smaller queue num when __dev_xmit_skb() still enqueue a skb with a larger queue_mapping after the corresponding qdisc is reset, and call hns3_nic_net_xmit() with that skb later.
Reused the existing synchronize_net() in dev_deactivate_many() to make sure skb with larger queue_mapping enqueued to old qdisc(which is saved in dev_queue->qdisc_sleeping) will always be reset when dev_reset_queue() is called.
Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Signed-off-by: Yunsheng Lin linyunsheng@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- net/sched/sch_generic.c | 49 +++++++++++++++++++++++++++-------------- 1 file changed, 33 insertions(+), 16 deletions(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 119e20cad662..bd96fd261dba 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -1115,27 +1115,36 @@ static void dev_deactivate_queue(struct net_device *dev, struct netdev_queue *dev_queue, void *_qdisc_default) { - struct Qdisc *qdisc_default = _qdisc_default; - struct Qdisc *qdisc; + struct Qdisc *qdisc = rtnl_dereference(dev_queue->qdisc);
- qdisc = rtnl_dereference(dev_queue->qdisc); if (qdisc) { - bool nolock = qdisc->flags & TCQ_F_NOLOCK; - - if (nolock) - spin_lock_bh(&qdisc->seqlock); - spin_lock_bh(qdisc_lock(qdisc)); - if (!(qdisc->flags & TCQ_F_BUILTIN)) set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state); + } +}
- rcu_assign_pointer(dev_queue->qdisc, qdisc_default); - qdisc_reset(qdisc); +static void dev_reset_queue(struct net_device *dev, + struct netdev_queue *dev_queue, + void *_unused) +{ + struct Qdisc *qdisc; + bool nolock;
- spin_unlock_bh(qdisc_lock(qdisc)); - if (nolock) - spin_unlock_bh(&qdisc->seqlock); - } + qdisc = dev_queue->qdisc_sleeping; + if (!qdisc) + return; + + nolock = qdisc->flags & TCQ_F_NOLOCK; + + if (nolock) + spin_lock_bh(&qdisc->seqlock); + spin_lock_bh(qdisc_lock(qdisc)); + + qdisc_reset(qdisc); + + spin_unlock_bh(qdisc_lock(qdisc)); + if (nolock) + spin_unlock_bh(&qdisc->seqlock); }
static bool some_qdisc_is_busy(struct net_device *dev) @@ -1196,12 +1205,20 @@ void dev_deactivate_many(struct list_head *head) dev_watchdog_down(dev); }
- /* Wait for outstanding qdisc-less dev_queue_xmit calls. + /* Wait for outstanding qdisc-less dev_queue_xmit calls or + * outstanding qdisc enqueuing calls. * This is avoided if all devices are in dismantle phase : * Caller will call synchronize_net() for us */ synchronize_net();
+ list_for_each_entry(dev, head, close_list) { + netdev_for_each_tx_queue(dev, dev_reset_queue, NULL); + + if (dev_ingress_queue(dev)) + dev_reset_queue(dev, dev_ingress_queue(dev), NULL); + } + /* Wait for outstanding qdisc_run calls. */ list_for_each_entry(dev, head, close_list) { while (some_qdisc_is_busy(dev))
From: Chuck Lever chuck.lever@oracle.com
stable inclusion from linux-4.19.149 commit 308aeb3629c8745ef55ec38545cf2dc338108267
--------------------------------
[ Upstream commit 1a33d8a284b1e85e03b8c7b1ea8fb985fccd1d71 ]
Kernel memory leak detected:
unreferenced object 0xffff888849cdf480 (size 8): comm "kworker/u8:3", pid 2086, jiffies 4297898756 (age 4269.856s) hex dump (first 8 bytes): 30 00 cd 49 88 88 ff ff 0..I.... backtrace: [<00000000acfc370b>] __kmalloc_track_caller+0x137/0x183 [<00000000a2724354>] kstrdup+0x2b/0x43 [<0000000082964f84>] xprt_rdma_format_addresses+0x114/0x17d [rpcrdma] [<00000000dfa6ed00>] xprt_setup_rdma_bc+0xc0/0x10c [rpcrdma] [<0000000073051a83>] xprt_create_transport+0x3f/0x1a0 [sunrpc] [<0000000053531a8e>] rpc_create+0x118/0x1cd [sunrpc] [<000000003a51b5f8>] setup_callback_client+0x1a5/0x27d [nfsd] [<000000001bd410af>] nfsd4_process_cb_update.isra.7+0x16c/0x1ac [nfsd] [<000000007f4bbd56>] nfsd4_run_cb_work+0x4c/0xbd [nfsd] [<0000000055c5586b>] process_one_work+0x1b2/0x2fe [<00000000b1e3e8ef>] worker_thread+0x1a6/0x25a [<000000005205fb78>] kthread+0xf6/0xfb [<000000006d2dc057>] ret_from_fork+0x3a/0x50
Introduce a call to xprt_rdma_free_addresses() similar to the way that the TCP backchannel releases a transport's peer address strings.
Fixes: 5d252f90a800 ("svcrdma: Add class for RDMA backwards direction transport") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c index b9827665ff35..d183d4aee822 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c @@ -256,6 +256,7 @@ xprt_rdma_bc_put(struct rpc_xprt *xprt) { dprintk("svcrdma: %s: xprt %p\n", __func__, xprt);
+ xprt_rdma_free_addresses(xprt); xprt_free(xprt); module_put(THIS_MODULE); }
From: Tonghao Zhang xiangxia.m.yue@gmail.com
stable inclusion from linux-4.19.149 commit 6043d6112f7dece5285eb87edc49b5d4ac248297
--------------------------------
[ Upstream commit e57358873bb5d6caa882b9684f59140912b37dde ]
When setting the meter rate to 4+Gbps, there is an overflow, the meters don't work as expected.
Cc: Pravin B Shelar pshelar@ovn.org Cc: Andy Zhou azhou@ovn.org Signed-off-by: Tonghao Zhang xiangxia.m.yue@gmail.com Acked-by: Pravin B Shelar pshelar@ovn.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- net/openvswitch/meter.c | 2 +- net/openvswitch/meter.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c index c038e021a591..6f5131d1074b 100644 --- a/net/openvswitch/meter.c +++ b/net/openvswitch/meter.c @@ -255,7 +255,7 @@ static struct dp_meter *dp_meter_create(struct nlattr **a) * * Start with a full bucket. */ - band->bucket = (band->burst_size + band->rate) * 1000; + band->bucket = (band->burst_size + band->rate) * 1000ULL; band_max_delta_t = band->bucket / band->rate; if (band_max_delta_t > meter->max_delta_t) meter->max_delta_t = band_max_delta_t; diff --git a/net/openvswitch/meter.h b/net/openvswitch/meter.h index 964ace2650f8..970557ed5b5b 100644 --- a/net/openvswitch/meter.h +++ b/net/openvswitch/meter.h @@ -26,7 +26,7 @@ struct dp_meter_band { u32 type; u32 rate; u32 burst_size; - u32 bucket; /* 1/1000 packets, or in bits */ + u64 bucket; /* 1/1000 packets, or in bits */ struct ovs_flow_stats stats; };
From: Tuong Lien tuong.t.lien@dektech.com.au
stable inclusion from linux-4.19.149 commit 6b3ea3aa6c675b65b6b068f5726c93abc8a4b460
--------------------------------
[ Upstream commit 0771d7df819284d46cf5cfb57698621b503ec17f ]
Upon receipt of a service subscription request from user via a topology connection, one 'sub' object will be allocated in kernel, so it will be able to send an event of the service if any to the user correspondingly then. Also, in case of any failure, the connection will be shutdown and all the pertaining 'sub' objects will be freed.
However, there is a race condition as follows resulting in memory leak:
receive-work connection send-work | | | sub-1 |<------//-------| | sub-2 |<------//-------| | | |<---------------| evt for sub-x sub-3 |<------//-------| | : : : : : : | /--------| | | | * peer closed | | | | | | | |<-------X-------| evt for sub-y | | |<===============| sub-n |<------/ X shutdown | -> orphan | |
That is, the 'receive-work' may get the last subscription request while the 'send-work' is shutting down the connection due to peer close.
We had a 'lock' on the connection, so the two actions cannot be carried out simultaneously. If the last subscription is allocated e.g. 'sub-n', before the 'send-work' closes the connection, there will be no issue at all, the 'sub' objects will be freed. In contrast the last subscription will become orphan since the connection was closed, and we released all references.
This commit fixes the issue by simply adding one test if the connection remains in 'connected' state right after we obtain the connection lock, then a subscription object can be created as usual, otherwise we ignore it.
Acked-by: Ying Xue ying.xue@windriver.com Acked-by: Jon Maloy jmaloy@redhat.com Reported-by: Thang Ngo thang.h.ngo@dektech.com.au Signed-off-by: Tuong Lien tuong.t.lien@dektech.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- net/tipc/topsrv.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c index 41f4464ac6cc..ec9a7137d267 100644 --- a/net/tipc/topsrv.c +++ b/net/tipc/topsrv.c @@ -407,7 +407,9 @@ static int tipc_conn_rcv_from_sock(struct tipc_conn *con) return -EWOULDBLOCK; if (ret == sizeof(s)) { read_lock_bh(&sk->sk_callback_lock); - ret = tipc_conn_rcv_sub(srv, con, &s); + /* RACE: the connection can be closed in the meantime */ + if (likely(connected(con))) + ret = tipc_conn_rcv_sub(srv, con, &s); read_unlock_bh(&sk->sk_callback_lock); if (!ret) return 0;
From: Alexander Duyck alexander.h.duyck@linux.intel.com
stable inclusion from linux-4.19.149 commit dc8ecb8017bfcf864c051ba7c022a82f36aa7700
--------------------------------
[ Upstream commit 49ee3c2ab5234757bfb56a0b3a3cb422f427e3a3 ]
We are seeing a deadlock in e1000 down when NAPI is being disabled. Looking over the kernel function trace of the system it appears that the interface is being closed and then a reset is hitting which deadlocks the interface as the NAPI interface is already disabled.
To prevent this from happening I am disabling the reset task when __E1000_DOWN is already set. In addition code has been added so that we set the __E1000_DOWN while holding the __E1000_RESET flag in e1000_close in order to guarantee that the reset task will not run after we have started the close call.
Signed-off-by: Alexander Duyck alexander.h.duyck@linux.intel.com Tested-by: Maxim Zhukov mussitantesmortem@gmail.com Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/net/ethernet/intel/e1000/e1000_main.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c index ec2e158d18f5..18b61be9e0b9 100644 --- a/drivers/net/ethernet/intel/e1000/e1000_main.c +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c @@ -542,8 +542,13 @@ void e1000_reinit_locked(struct e1000_adapter *adapter) WARN_ON(in_interrupt()); while (test_and_set_bit(__E1000_RESETTING, &adapter->flags)) msleep(1); - e1000_down(adapter); - e1000_up(adapter); + + /* only run the task if not already down */ + if (!test_bit(__E1000_DOWN, &adapter->flags)) { + e1000_down(adapter); + e1000_up(adapter); + } + clear_bit(__E1000_RESETTING, &adapter->flags); }
@@ -1433,10 +1438,15 @@ int e1000_close(struct net_device *netdev) struct e1000_hw *hw = &adapter->hw; int count = E1000_CHECK_RESET_COUNT;
- while (test_bit(__E1000_RESETTING, &adapter->flags) && count--) + while (test_and_set_bit(__E1000_RESETTING, &adapter->flags) && count--) usleep_range(10000, 20000);
- WARN_ON(test_bit(__E1000_RESETTING, &adapter->flags)); + WARN_ON(count < 0); + + /* signal that we're down so that the reset task will no longer run */ + set_bit(__E1000_DOWN, &adapter->flags); + clear_bit(__E1000_RESETTING, &adapter->flags); + e1000_down(adapter); e1000_power_down_phy(adapter); e1000_free_irq(adapter);
From: Tonghao Zhang xiangxia.m.yue@gmail.com
stable inclusion from linux-4.19.149 commit 1e6a4232befee0c3dbd201f8a50b5c333498f259
--------------------------------
[ Upstream commit 659d4587fe7233bfdff303744b20d6f41ad04362 ]
Compile the kernel for arm 32 platform, the build warning found. To fix that, should use div_u64() for divisions. | net/openvswitch/meter.c:396: undefined reference to `__udivdi3'
[add more commit msg, change reported tag, and use div_u64 instead of do_div by Tonghao]
Fixes: e57358873bb5d6ca ("net: openvswitch: use u64 for meter bucket") Reported-by: kbuild test robot lkp@intel.com Signed-off-by: Tonghao Zhang xiangxia.m.yue@gmail.com Tested-by: Tonghao Zhang xiangxia.m.yue@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- net/openvswitch/meter.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c index 6f5131d1074b..5ea2471ffc03 100644 --- a/net/openvswitch/meter.c +++ b/net/openvswitch/meter.c @@ -256,7 +256,7 @@ static struct dp_meter *dp_meter_create(struct nlattr **a) * Start with a full bucket. */ band->bucket = (band->burst_size + band->rate) * 1000ULL; - band_max_delta_t = band->bucket / band->rate; + band_max_delta_t = div_u64(band->bucket, band->rate); if (band_max_delta_t > meter->max_delta_t) meter->max_delta_t = band_max_delta_t; band++;
From: Sabrina Dubroca sd@queasysnail.net
stable inclusion from linux-4.19.151 commit a01cb66b26a39062a3cccc6e77b3b53a737c254e
--------------------------------
commit 45a36a18d01907710bad5258d81f76c18882ad88 upstream.
xfrm interfaces currently test for !skb->ignore_df when deciding whether to update the pmtu on the skb's dst. Because of this, no pmtu exception is created when we do something like:
ping -s 1438 <dest>
By dropping this check, the pmtu exception will be created and the next ping attempt will work.
Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces") Reported-by: Xiumei Mu xmu@redhat.com Signed-off-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- net/xfrm/xfrm_interface.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c index bd24359a9136..0079e5922067 100644 --- a/net/xfrm/xfrm_interface.c +++ b/net/xfrm/xfrm_interface.c @@ -293,7 +293,7 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl) }
mtu = dst_mtu(dst); - if (!skb->ignore_df && skb->len > mtu) { + if (skb->len > mtu) { skb_dst_update_pmtu_no_confirm(skb, mtu);
if (skb->protocol == htons(ETH_P_IPV6)) {
From: Dumitru Ceara dceara@redhat.com
stable inclusion from linux-4.19.151 commit 5ab1e499f2428a4200ea098a7221efe7ef133530
--------------------------------
commit 8aa7b526dc0b5dbf40c1b834d76a667ad672a410 upstream.
With multiple DNAT rules it's possible that after destination translation the resulting tuples collide.
For example, two openvswitch flows: nw_dst=10.0.0.10,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20)) nw_dst=10.0.0.20,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
Assuming two TCP clients initiating the following connections: 10.0.0.10:5000->10.0.0.10:10 10.0.0.10:5000->10.0.0.20:10
Both tuples would translate to 10.0.0.10:5000->20.0.0.1:20 causing nf_conntrack_confirm() to fail because of tuple collision.
Netfilter handles this case by allocating a null binding for SNAT at egress by default. Perform the same operation in openvswitch for DNAT if no explicit SNAT is requested by the user and allocate a null binding for SNAT for packets in the "original" direction.
Reported-at: https://bugzilla.redhat.com/1877128 Suggested-by: Florian Westphal fw@strlen.de Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Signed-off-by: Dumitru Ceara dceara@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- net/openvswitch/conntrack.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 9bfe89ec6efd..90462f58e991 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -899,15 +899,19 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, } err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range, maniptype);
- if (err == NF_ACCEPT && - ct->status & IPS_SRC_NAT && ct->status & IPS_DST_NAT) { - if (maniptype == NF_NAT_MANIP_SRC) - maniptype = NF_NAT_MANIP_DST; - else - maniptype = NF_NAT_MANIP_SRC; - - err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range, - maniptype); + if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) { + if (ct->status & IPS_SRC_NAT) { + if (maniptype == NF_NAT_MANIP_SRC) + maniptype = NF_NAT_MANIP_DST; + else + maniptype = NF_NAT_MANIP_SRC; + + err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range, + maniptype); + } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) { + err = ovs_ct_nat_execute(skb, ct, ctinfo, NULL, + NF_NAT_MANIP_SRC); + } }
/* Mark NAT done if successful and update the flow key. */
From: Aya Levin ayal@nvidia.com
stable inclusion from linux-4.19.151 commit 7a592c6788f3dd9bf231808e8daadefed07d012d
--------------------------------
[ Upstream commit 8c7353b6f716436ad0bfda2b5c5524ab2dde5894 ]
Prior to this patch unloading an interface in promiscuous mode with RX VLAN filtering feature turned off - resulted in a warning. This is due to a wrong condition in the VLAN rules cleanup flow, which left the any-vid rules in the VLAN steering table. These rules prevented destroying the flow group and the flow table.
The any-vid rules are removed in 2 flows, but none of them remove it in case both promiscuous is set and VLAN filtering is off. Fix the issue by changing the condition of the VLAN table cleanup flow to clean also in case of promiscuous mode.
mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 20 wasn't destroyed, refcount > 1 mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 19 wasn't destroyed, refcount > 1 mlx5_core 0000:00:08.0: mlx5_destroy_flow_table:2112:(pid 28729): Flow table 262149 wasn't destroyed, refcount > 1 ... ... ------------[ cut here ]------------ FW pages counter is 11560 after reclaiming all pages WARNING: CPU: 1 PID: 28729 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:660 mlx5_reclaim_startup_pages+0x178/0x230 [mlx5_core] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: mlx5_function_teardown+0x2f/0x90 [mlx5_core] mlx5_unload_one+0x71/0x110 [mlx5_core] remove_one+0x44/0x80 [mlx5_core] pci_device_remove+0x3e/0xc0 device_release_driver_internal+0xfb/0x1c0 device_release_driver+0x12/0x20 pci_stop_bus_device+0x68/0x90 pci_stop_and_remove_bus_device+0x12/0x20 hv_eject_device_work+0x6f/0x170 [pci_hyperv] ? __schedule+0x349/0x790 process_one_work+0x206/0x400 worker_thread+0x34/0x3f0 ? process_one_work+0x400/0x400 kthread+0x126/0x140 ? kthread_park+0x90/0x90 ret_from_fork+0x22/0x30 ---[ end trace 6283bde8d26170dc ]---
Fixes: 9df30601c843 ("net/mlx5e: Restore vlan filter after seamless reset") Signed-off-by: Aya Levin ayal@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c index 76cc10e44080..b8c3ceaed585 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c @@ -415,8 +415,12 @@ static void mlx5e_del_vlan_rules(struct mlx5e_priv *priv) for_each_set_bit(i, priv->fs.vlan.active_svlans, VLAN_N_VID) mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_STAG_VID, i);
- if (priv->fs.vlan.cvlan_filter_disabled && - !(priv->netdev->flags & IFF_PROMISC)) + WARN_ON_ONCE(!(test_bit(MLX5E_STATE_DESTROYING, &priv->state))); + + /* must be called after DESTROY bit is set and + * set_rx_mode is called and flushed + */ + if (priv->fs.vlan.cvlan_filter_disabled) mlx5e_del_any_vid_rules(priv); }
From: Aya Levin ayal@nvidia.com
stable inclusion from linux-4.19.151 commit f9c195bf08c0a215cf2bd89255f754c79e923055
--------------------------------
[ Upstream commit d4a16052bccdd695982f89d815ca075825115821 ]
When interface is attached while in promiscuous mode and with VLAN filtering turned off, both configurations are not respected and VLAN filtering is performed. There are 2 flows which add the any-vid rules during interface attach: VLAN creation table and set rx mode. Each is relaying on the other to add any-vid rules, eventually non of them does.
Fix this by adding any-vid rules on VLAN creation regardless of promiscuous mode.
Fixes: 9df30601c843 ("net/mlx5e: Restore vlan filter after seamless reset") Signed-off-by: Aya Levin ayal@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c index b8c3ceaed585..7ddacc9e4fe4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c @@ -217,6 +217,9 @@ static int __mlx5e_add_vlan_rule(struct mlx5e_priv *priv, break; }
+ if (WARN_ONCE(*rule_p, "VLAN rule already exists type %d", rule_type)) + return 0; + *rule_p = mlx5_add_flow_rules(ft, spec, &flow_act, &dest, 1);
if (IS_ERR(*rule_p)) { @@ -397,8 +400,7 @@ static void mlx5e_add_vlan_rules(struct mlx5e_priv *priv) for_each_set_bit(i, priv->fs.vlan.active_svlans, VLAN_N_VID) mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_STAG_VID, i);
- if (priv->fs.vlan.cvlan_filter_disabled && - !(priv->netdev->flags & IFF_PROMISC)) + if (priv->fs.vlan.cvlan_filter_disabled) mlx5e_add_any_vid_rules(priv); }
From: Jonathan Lemon bsd@fb.com
stable inclusion from linux-4.19.153 commit c0cded2aa1db11246240b24234e3c4470d431635
--------------------------------
[ Upstream commit b2b8a92733b288128feb57ffa694758cf475106c ]
netcons calls napi_poll with a budget of 0 to transmit packets. Handle this by: - skipping RX processing - do not try to recycle TX packets to the RX cache
Signed-off-by: Jonathan Lemon jonathan.lemon@gmail.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 3 +++ drivers/net/ethernet/mellanox/mlx4/en_tx.c | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index 45d9a5f8fa1b..f509a6ce31db 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -945,6 +945,9 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget) bool clean_complete = true; int done;
+ if (!budget) + return 0; + if (priv->tx_ring_num[TX_XDP]) { xdp_tx_cq = priv->tx_cq[TX_XDP][cq->ring]; if (xdp_tx_cq->xdp_busy) { diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c index 1857ee0f0871..e58052d07e39 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c @@ -343,7 +343,7 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv, .dma = tx_info->map0_dma, };
- if (!mlx4_en_rx_recycle(ring->recycle_ring, &frame)) { + if (!napi_mode || !mlx4_en_rx_recycle(ring->recycle_ring, &frame)) { dma_unmap_page(priv->ddev, tx_info->map0_dma, PAGE_SIZE, priv->dma_dir); put_page(tx_info->page);
From: Cong Wang xiyou.wangcong@gmail.com
stable inclusion from linux-4.19.153 commit 26217e062f976fc4e2b7b8b6981a6d119435ea51
--------------------------------
[ Upstream commit ed42989eab57d619667d7e87dfbd8fe207db54fe ]
skb_unshare() drops a reference count on the old skb unconditionally, so in the failure case, we end up freeing the skb twice here. And because the skb is allocated in fclone and cloned by caller tipc_msg_reassemble(), the consequence is actually freeing the original skb too, thus triggered the UAF by syzbot.
Fix this by replacing this skb_unshare() with skb_cloned()+skb_copy().
Fixes: ff48b6222e65 ("tipc: use skb_unshare() instead in tipc_buf_append()") Reported-and-tested-by: syzbot+e96a7ba46281824cc46a@syzkaller.appspotmail.com Cc: Jon Maloy jmaloy@redhat.com Cc: Ying Xue ying.xue@windriver.com Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Reviewed-by: Xin Long lucien.xin@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- net/tipc/msg.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/tipc/msg.c b/net/tipc/msg.c index b078b77620f1..0b8446cd541c 100644 --- a/net/tipc/msg.c +++ b/net/tipc/msg.c @@ -140,7 +140,8 @@ int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf) if (fragid == FIRST_FRAGMENT) { if (unlikely(head)) goto err; - frag = skb_unshare(frag, GFP_ATOMIC); + if (skb_cloned(frag)) + frag = skb_copy(frag, GFP_ATOMIC); if (unlikely(!frag)) goto err; head = *headbuf = frag;
From: Rohit Maheshwari rohitm@chelsio.com
stable inclusion from linux-4.19.153 commit b2d31640d4045d8bb7cf25d2bd53ce8d1b14fb40
--------------------------------
[ Upstream commit ea1dd3e9d080c961b9a451130b61c72dc9a5397b ]
At first when sendpage gets called, if there is more data, 'more' in tls_push_data() gets set which later sets pending_open_record_frags, but when there is no more data in file left, and last time tls_push_data() gets called, pending_open_record_frags doesn't get reset. And later when 2 bytes of encrypted alert comes as sendmsg, it first checks for pending_open_record_frags, and since this is set, it creates a record with 0 data bytes to encrypt, meaning record length is prepend_size + tag_size only, which causes problem. We should set/reset pending_open_record_frags based on more bit.
Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Rohit Maheshwari rohitm@chelsio.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- net/tls/tls_device.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index 575d62130578..dd0fc2aa6875 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -351,13 +351,13 @@ static int tls_push_data(struct sock *sk, struct tls_context *tls_ctx = tls_get_ctx(sk); struct tls_offload_context_tx *ctx = tls_offload_ctx_tx(tls_ctx); int tls_push_record_flags = flags | MSG_SENDPAGE_NOTLAST; - int more = flags & (MSG_SENDPAGE_NOTLAST | MSG_MORE); struct tls_record_info *record = ctx->open_record; struct page_frag *pfrag; size_t orig_size = size; u32 max_open_record_len; - int copy, rc = 0; + bool more = false; bool done = false; + int copy, rc = 0; long timeo;
if (flags & @@ -422,9 +422,8 @@ static int tls_push_data(struct sock *sk, if (!size) { last_record: tls_push_record_flags = flags; - if (more) { - tls_ctx->pending_open_record_frags = - record->num_frags; + if (flags & (MSG_SENDPAGE_NOTLAST | MSG_MORE)) { + more = true; break; }
@@ -445,6 +444,8 @@ static int tls_push_data(struct sock *sk, } } while (!done);
+ tls_ctx->pending_open_record_frags = more; + if (orig_size - size > 0) rc = orig_size - size;
From: Eran Ben Elisha eranbe@mellanox.com
stable inclusion from linux-4.19.153 commit 85c4859b4f54d4469b1dcf411104cc40243c8c05
--------------------------------
[ Upstream commit 0d2ffdc8d4002a62de31ff7aa3bef28c843c3cbe ]
Before calling timecounter_cyc2time(), clock->lock must be taken. Use mlx5_timecounter_cyc2time instead which guarantees a safe access.
Fixes: afc98a0b46d8 ("net/mlx5: Update ptp_clock_event foreach PPS event") Signed-off-by: Eran Ben Elisha eranbe@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c index d359e850dbf0..0fd62510fb27 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c @@ -475,8 +475,9 @@ void mlx5_pps_event(struct mlx5_core_dev *mdev, switch (clock->ptp_info.pin_config[pin].func) { case PTP_PF_EXTTS: ptp_event.index = pin; - ptp_event.timestamp = timecounter_cyc2time(&clock->tc, - be64_to_cpu(eqe->data.pps.time_stamp)); + ptp_event.timestamp = + mlx5_timecounter_cyc2time(clock, + be64_to_cpu(eqe->data.pps.time_stamp)); if (clock->pps_info.enabled) { ptp_event.type = PTP_CLOCK_PPSUSR; ptp_event.pps_times.ts_real =
From: Dan Aloni dan@kernelim.com
stable inclusion from linux-4.19.154 commit 95e7b4ee3d35c8d2bca4ad3c3365abdea0bbb71d
--------------------------------
[ Upstream commit c327a310ec4d6ecbea13185ed56c11def441d9ab ]
This was discovered using O_DIRECT at the client side, with small unaligned file offsets or IOs that span multiple file pages.
Fixes: e248aa7be86 ("svcrdma: Remove max_sge check at connect time") Signed-off-by: Dan Aloni dan@kernelim.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Aichun Li liaichun@huawei.com Reviewed-by: wangxiaopeng wangxiaopeng7@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index aa4d19a780d7..4062cd624b26 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -639,10 +639,11 @@ static int svc_rdma_pull_up_reply_msg(struct svcxprt_rdma *rdma, while (remaining) { len = min_t(u32, PAGE_SIZE - pageoff, remaining);
- memcpy(dst, page_address(*ppages), len); + memcpy(dst, page_address(*ppages) + pageoff, len); remaining -= len; dst += len; pageoff = 0; + ppages++; } }
From: Yang Shi shy828301@gmail.com
mainline inclusion from mainline-v5.9-rc4 commit 7867fd7cc44e63c6673cd0f8fea155456d34d0de category: bugfix bugzilla: 42216 CVE: NA
-------------------------------------------------
The syzbot reported the below use-after-free:
BUG: KASAN: use-after-free in madvise_willneed mm/madvise.c:293 [inline] BUG: KASAN: use-after-free in madvise_vma mm/madvise.c:942 [inline] BUG: KASAN: use-after-free in do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145 Read of size 8 at addr ffff8880a6163eb0 by task syz-executor.0/9996
CPU: 0 PID: 9996 Comm: syz-executor.0 Not tainted 5.9.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x18f/0x20d lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 madvise_willneed mm/madvise.c:293 [inline] madvise_vma mm/madvise.c:942 [inline] do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145 do_madvise mm/madvise.c:1169 [inline] __do_sys_madvise mm/madvise.c:1171 [inline] __se_sys_madvise mm/madvise.c:1169 [inline] __x64_sys_madvise+0xd9/0x110 mm/madvise.c:1169 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Allocated by task 9992: kmem_cache_alloc+0x138/0x3a0 mm/slab.c:3482 vm_area_alloc+0x1c/0x110 kernel/fork.c:347 mmap_region+0x8e5/0x1780 mm/mmap.c:1743 do_mmap+0xcf9/0x11d0 mm/mmap.c:1545 vm_mmap_pgoff+0x195/0x200 mm/util.c:506 ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 9992: kmem_cache_free.part.0+0x67/0x1f0 mm/slab.c:3693 remove_vma+0x132/0x170 mm/mmap.c:184 remove_vma_list mm/mmap.c:2613 [inline] __do_munmap+0x743/0x1170 mm/mmap.c:2869 do_munmap mm/mmap.c:2877 [inline] mmap_region+0x257/0x1780 mm/mmap.c:1716 do_mmap+0xcf9/0x11d0 mm/mmap.c:1545 vm_mmap_pgoff+0x195/0x200 mm/util.c:506 ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9
It is because vma is accessed after releasing mmap_lock, but someone else acquired the mmap_lock and the vma is gone.
Releasing mmap_lock after accessing vma should fix the problem.
Fixes: 692fe62433d4c ("mm: Handle MADV_WILLNEED through vfs_fadvise()") Reported-by: syzbot+b90df26038d1d5d85c97@syzkaller.appspotmail.com Signed-off-by: Yang Shi shy828301@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Reviewed-by: Andrew Morton akpm@linux-foundation.org Reviewed-by: Jan Kara jack@suse.cz Cc: stable@vger.kernel.org [5.4+] Link: https://lkml.kernel.org/r/20200816141204.162624-1-shy828301@gmail.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- mm/madvise.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/madvise.c b/mm/madvise.c index 464282e24a30..1369e6d062bc 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -308,9 +308,9 @@ static long madvise_willneed(struct vm_area_struct *vma, */ *prev = NULL; /* tell sys_madvise we drop mmap_sem */ get_file(file); - up_read(¤t->mm->mmap_sem); offset = (loff_t)(start - vma->vm_start) + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); + up_read(¤t->mm->mmap_sem); vfs_fadvise(file, offset, end - start, POSIX_FADV_WILLNEED); fput(file); down_read(¤t->mm->mmap_sem);
From: Alistair Popple alistair@popple.id.au
mainline inclusion from mainline-v5.9-rc4 commit ad7df764b7e1c7dc64e016da7ada2e3e1bb90700 category: bugfix bugzilla: 42213 CVE: NA
-------------------------------------------------
During memory migration a pte is temporarily replaced with a migration swap pte. Some pte bits from the existing mapping such as the soft-dirty and uffd write-protect bits are preserved by copying these to the temporary migration swap pte.
However these bits are not stored at the same location for swap and non-swap ptes. Therefore testing these bits requires using the appropriate helper function for the given pte type.
Unfortunately several code locations were found where the wrong helper function is being used to test soft_dirty and uffd_wp bits which leads to them getting incorrectly set or cleared during page-migration.
Fix these by using the correct tests based on pte type.
Fixes: a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in migration") Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages") Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration") Signed-off-by: Alistair Popple alistair@popple.id.au Signed-off-by: Andrew Morton akpm@linux-foundation.org Reviewed-by: Peter Xu peterx@redhat.com Cc: Jérôme Glisse jglisse@redhat.com Cc: John Hubbard jhubbard@nvidia.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Alistair Popple alistair@popple.id.au Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200825064232.10023-2-alistair@popple.id.au Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Conflicts: mm/migrate.c mm/rmap.c Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- mm/migrate.c | 9 +++++++-- mm/rmap.c | 7 ++++++- 2 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c index 70f8ad4ade3f..298c56e334cd 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2310,8 +2310,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, entry = make_migration_entry(page, mpfn & MIGRATE_PFN_WRITE); swp_pte = swp_entry_to_pte(entry); - if (pte_soft_dirty(pte)) - swp_pte = pte_swp_mksoft_dirty(swp_pte); + if (pte_present(pte)) { + if (pte_soft_dirty(pte)) + swp_pte = pte_swp_mksoft_dirty(swp_pte); + } else { + if (pte_swp_soft_dirty(pte)) + swp_pte = pte_swp_mksoft_dirty(swp_pte); + } set_pte_at(mm, addr, ptep, swp_pte);
/* diff --git a/mm/rmap.c b/mm/rmap.c index 4ca7a0db9645..aabd094d310f 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1467,7 +1467,12 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, */ entry = make_migration_entry(page, 0); swp_pte = swp_entry_to_pte(entry); - if (pte_soft_dirty(pteval)) + + /* + * pteval maps a zone device page and is therefore + * a swap pte. + */ + if (pte_swp_soft_dirty(pteval)) swp_pte = pte_swp_mksoft_dirty(swp_pte); set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte); /*
From: kiyin(尹亮) kiyin@tencent.com
mainline inclusion from mainline-v5.10-rc3 commit 7bdb157cdebbf95a1cd94ed2e01b338714075d00 category: bugfix bugzilla: NA CVE: CVE-2020-25704
--------------------------------
As shown through runtime testing, the "filename" allocation is not always freed in perf_event_parse_addr_filter().
There are three possible ways that this could happen:
- It could be allocated twice on subsequent iterations through the loop, - or leaked on the success path, - or on the failure path.
Clean up the code flow to make it obvious that 'filename' is always freed in the reallocation path and in the two return paths as well.
We rely on the fact that kfree(NULL) is NOP and filename is initialized with NULL.
This fixes the leak. No other side effects expected.
[ Dan Carpenter: cleaned up the code flow & added a changelog. ] [ Ingo Molnar: updated the changelog some more. ]
Fixes: 375637bc5249 ("perf/core: Introduce address range filtering") Signed-off-by: "kiyin(尹亮)" kiyin@tencent.com Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Ingo Molnar mingo@kernel.org Cc: "Srivatsa S. Bhat" srivatsa@csail.mit.edu Cc: Anthony Liguori aliguori@amazon.com Reviewed-by: Jian Cheng cj.chengjian@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- kernel/events/core.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c index 4f17dd88cfde..f759b336a0df 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9064,6 +9064,7 @@ perf_event_parse_addr_filter(struct perf_event *event, char *fstr, if (token == IF_SRC_FILE || token == IF_SRC_FILEADDR) { int fpos = token == IF_SRC_FILE ? 2 : 1;
+ kfree(filename); filename = match_strdup(&args[fpos]); if (!filename) { ret = -ENOMEM; @@ -9110,16 +9111,13 @@ perf_event_parse_addr_filter(struct perf_event *event, char *fstr, */ ret = -EOPNOTSUPP; if (!event->ctx->task) - goto fail_free_name; + goto fail;
/* look up the path and grab its inode */ ret = kern_path(filename, LOOKUP_FOLLOW, &filter->path); if (ret) - goto fail_free_name; - - kfree(filename); - filename = NULL; + goto fail;
ret = -EINVAL; if (!filter->path.dentry || @@ -9139,13 +9137,13 @@ perf_event_parse_addr_filter(struct perf_event *event, char *fstr, if (state != IF_STATE_ACTION) goto fail;
+ kfree(filename); kfree(orig);
return 0;
-fail_free_name: - kfree(filename); fail: + kfree(filename); free_filters_list(filters); kfree(orig);