From: Ye Bin <yebin10@huawei.com>
hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA
-----------------------------------------------
Fixes: 5aa03d66d1db ("ext4: make ext4_abort() use __ext4_error()")
Fixes: 12aed7b79111 ("ext4: report error to userspace by netlink")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 fs/ext4/super.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3b2f7f7ea8cba..655ba77db225e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -89,7 +89,6 @@ static void ext4_unregister_li_request(struct super_block *sb);
 static void ext4_clear_request_list(void);
 static struct inode *ext4_get_journal_inode(struct super_block *sb,
					     unsigned int journal_inum);
-static void ext4_netlink_send_info(struct super_block *sb, int ext4_errno);
 static struct sock *ext4nl;
 
 /*
@@ -590,7 +589,7 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
	smp_wmb();
	sb->s_flags |= SB_RDONLY;
 out:
-	ext4_netlink_send_info(sb, 1);
+	ext4_netlink_send_info(sb, force_ro ? 2 : 1);
 }
 
 static void flush_stashed_error_work(struct work_struct *work)
From: Oscar Salvador <osalvador@suse.de>
mainline inclusion
from mainline-v5.11-rc1
commit 3f4b815a439adfb8f238335612c4b28bc10084d8
category: bugfix
bugzilla: NA
CVE: NA
-------------------------------------------------
Currently, we return -EIO when we fail to migrate the page.
Migration failures are rather transient, as they can happen for several reasons, e.g. a temporarily raised page refcount or mapping->migrate_page failing. All of these mean that the page could not be migrated at that time, which has nothing to do with an EIO error.

Let us return -EBUSY instead, as we do when we fail to isolate the page.

While at it, let us remove the "ret" print as its value does not change.
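As a purely illustrative sketch (not part of this patch; the helper name and retry policy are invented), a userspace MADV_SOFT_OFFLINE caller can now treat -EBUSY as a retryable condition, which -EIO did not convey:

#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical caller: retry soft-offline while the failure looks transient. */
static int soft_offline_with_retry(void *addr, size_t len, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (madvise(addr, len, MADV_SOFT_OFFLINE) == 0)
			return 0;
		if (errno != EBUSY)	/* hard failure, do not retry */
			return -1;
		usleep(1000 << i);	/* back off; migration failures are transient */
	}
	return -1;
}

(MADV_SOFT_OFFLINE requires CONFIG_MEMORY_FAILURE and appropriate privileges.)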
Link: https://lkml.kernel.org/r/20201209092818.30417-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shixin Liu <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/memory-failure.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fa091bfb5943a..d15ffccb2db40 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1730,7 +1730,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
		if (!list_empty(&pagelist))
			putback_movable_pages(&pagelist);
		if (ret > 0)
-			ret = -EIO;
+			ret = -EBUSY;
	} else {
		/*
		 * We set PG_hwpoison only when the migration source hugepage
@@ -1821,11 +1821,11 @@ static int __soft_offline_page(struct page *page, int flags)
			pr_info("soft offline: %#lx: migration failed %d, type %lx (%pGp)\n",
				pfn, ret, page->flags, &page->flags);
			if (ret > 0)
-				ret = -EIO;
+				ret = -EBUSY;
		}
	} else {
-		pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx (%pGp)\n",
-			pfn, ret, page_count(page), page->flags, &page->flags);
+		pr_info("soft offline: %#lx: isolation failed, page count %d, type %lx (%pGp)\n",
+			pfn, page_count(page), page->flags, &page->flags);
	}
	return ret;
 }
From: Vlastimil Babka <vbabka@suse.cz>
mainline inclusion
from mainline-5.4-rc3
commit 59bb47985c1db229ccff8c5deebecd54fc77d2a9
category: bugfix
bugzilla: 51349
CVE: NA
-------------------------------------------------
In most configurations, kmalloc() happens to return naturally aligned (i.e. aligned to the block size itself) blocks for power of two sizes.
That means some kmalloc() users might unknowingly rely on that alignment, until stuff breaks when the kernel is built with e.g. CONFIG_SLUB_DEBUG or CONFIG_SLOB, and blocks stop being aligned. Then developers have to devise workarounds such as their own kmem caches with specified alignment [1], which is not always practical, as recently evidenced in [2].
The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()' variant would not help with code unknowingly relying on the implicit alignment. For slab implementations it would either require creating more kmalloc caches, or allocate a larger size and only give back part of it. That would be wasteful, especially with a generic alignment parameter (in contrast with a fixed alignment to size).
Ideally we should provide to mm users what they need without difficult workarounds or their own reimplementations, so let's make the kmalloc() alignment to size explicitly guaranteed for power-of-two sizes under all configurations. What does this mean for the three available allocators?
* SLAB object layout happens to be mostly unchanged by the patch. The implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due to redzoning, however SLAB disables redzoning for caches with alignment larger than unsigned long long. Practically on at least x86 this includes kmalloc caches as they use cache line alignment, which is larger than that. Still, this patch ensures alignment on all arches and cache sizes.
* SLUB layout is also unchanged unless redzoning is enabled through CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With this patch, explicit alignment is guaranteed with redzoning as well. This will result in more memory being wasted, but that should be acceptable in a debugging scenario.
* SLOB has no implicit alignment so this patch adds it explicitly for kmalloc(). The potential downside is increased fragmentation. While pathological allocation scenarios are certainly possible, in my testing, after booting an x86_64 kernel+userspace with virtme, around 16MB of memory was consumed by slab pages both before and after the patch, with the difference in the noise.
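To make the guarantee concrete, here is a small illustrative wrapper (not part of the patch; the function name is invented): for a power-of-two size the returned pointer is now naturally aligned under SLAB, SLUB and SLOB alike, so the warning below should never fire.

#include <linux/kernel.h>
#include <linux/log2.h>
#include <linux/slab.h>

/* Illustrative check of the natural-alignment guarantee for kmalloc(). */
static void *kmalloc_checked(size_t size, gfp_t flags)
{
	void *p = kmalloc(size, flags);

	if (p && is_power_of_2(size))
		WARN_ON(!IS_ALIGNED((unsigned long)p, size));
	return p;
}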
[1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a... [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.... [3] https://lwn.net/Articles/787740/
[akpm@linux-foundation.org: documentation fixlet, per Matthew]
Link: http://lkml.kernel.org/r/20190826111627.7505-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: David Sterba <dsterba@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 59bb47985c1db229ccff8c5deebecd54fc77d2a9)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 include/linux/slab.h |  4 ++++
 mm/slab_common.c     | 11 ++++++++++-
 mm/slob.c            | 41 ++++++++++++++++++++++++++++++-----------
 3 files changed, 44 insertions(+), 12 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 788f04a7ca766..5785b836e9fd7 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -492,6 +492,10 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
  * kmalloc is the normal method of allocating memory
  * for objects smaller than page size in the kernel.
  *
+ * The allocated object address is aligned to at least ARCH_KMALLOC_MINALIGN
+ * bytes. For @size of power of two bytes, the alignment is also guaranteed
+ * to be at least to the size.
+ *
  * The @flags argument may be one of:
  *
  * %GFP_USER - Allocate memory on behalf of user. May sleep.
diff --git a/mm/slab_common.c b/mm/slab_common.c
index d208b47e01a8e..6b1cbf89a6861 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -999,10 +999,19 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
		unsigned int useroffset, unsigned int usersize)
 {
	int err;
+	unsigned int align = ARCH_KMALLOC_MINALIGN;
 
	s->name = name;
	s->size = s->object_size = size;
-	s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
+
+	/*
+	 * For power of two sizes, guarantee natural alignment for kmalloc
+	 * caches, regardless of SL*B debugging options.
+	 */
+	if (is_power_of_2(size))
+		align = max(align, size);
+	s->align = calculate_alignment(flags, align, size);
+
	s->useroffset = useroffset;
	s->usersize = usersize;
 
diff --git a/mm/slob.c b/mm/slob.c
index 307c2c9feb441..fdf284009be92 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -215,7 +215,7 @@ static void slob_free_pages(void *b, int order)
 /*
  * Allocate a slob block within a given slob_page sp.
  */
-static void *slob_page_alloc(struct page *sp, size_t size, int align)
+static void *slob_page_alloc(struct page *sp, size_t size, int align, int align_offset)
 {
	slob_t *prev, *cur, *aligned = NULL;
	int delta = 0, units = SLOB_UNITS(size);
@@ -223,8 +223,17 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align)
	for (prev = NULL, cur = sp->freelist; ; prev = cur, cur = slob_next(cur)) {
		slobidx_t avail = slob_units(cur);
 
+		/*
+		 * 'aligned' will hold the address of the slob block so that the
+		 * address 'aligned'+'align_offset' is aligned according to the
+		 * 'align' parameter. This is for kmalloc() which prepends the
+		 * allocated block with its size, so that the block itself is
+		 * aligned when needed.
+		 */
		if (align) {
-			aligned = (slob_t *)ALIGN((unsigned long)cur, align);
+			aligned = (slob_t *)
+				(ALIGN((unsigned long)cur + align_offset, align)
+				 - align_offset);
			delta = aligned - cur;
		}
		if (avail >= units + delta) { /* room enough? */
@@ -266,7 +275,8 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align)
 /*
  * slob_alloc: entry point into the slob allocator.
  */
-static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
+static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
+							int align_offset)
 {
	struct page *sp;
	struct list_head *prev;
@@ -298,7 +308,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
 
		/* Attempt to alloc */
		prev = sp->lru.prev;
-		b = slob_page_alloc(sp, size, align);
+		b = slob_page_alloc(sp, size, align, align_offset);
		if (!b)
			continue;
 
@@ -326,7 +336,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
		INIT_LIST_HEAD(&sp->lru);
		set_slob(b, SLOB_UNITS(PAGE_SIZE), b + SLOB_UNITS(PAGE_SIZE));
		set_slob_page_free(sp, slob_list);
-		b = slob_page_alloc(sp, size, align);
+		b = slob_page_alloc(sp, size, align, align_offset);
		BUG_ON(!b);
		spin_unlock_irqrestore(&slob_lock, flags);
	}
@@ -428,7 +438,7 @@ static __always_inline void *
 __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 {
	unsigned int *m;
-	int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
+	int minalign = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
	void *ret;
 
	gfp &= gfp_allowed_mask;
@@ -436,19 +446,28 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
	fs_reclaim_acquire(gfp);
	fs_reclaim_release(gfp);
 
-	if (size < PAGE_SIZE - align) {
+	if (size < PAGE_SIZE - minalign) {
+		int align = minalign;
+
+		/*
+		 * For power of two sizes, guarantee natural alignment for
+		 * kmalloc()'d objects.
+		 */
+		if (is_power_of_2(size))
+			align = max(minalign, (int) size);
+
		if (!size)
			return ZERO_SIZE_PTR;
 
-		m = slob_alloc(size + align, gfp, align, node);
+		m = slob_alloc(size + minalign, gfp, align, node, minalign);
 
		if (!m)
			return NULL;
		*m = size;
-		ret = (void *)m + align;
+		ret = (void *)m + minalign;
 
		trace_kmalloc_node(caller, ret,
-				   size, size + align, gfp, node);
+				   size, size + minalign, gfp, node);
	} else {
		unsigned int order = get_order(size);
 
@@ -544,7 +563,7 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
	fs_reclaim_release(flags);
 
	if (c->size < PAGE_SIZE) {
-		b = slob_alloc(c->size, flags, c->align, node);
+		b = slob_alloc(c->size, flags, c->align, node, 0);
		trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
					    SLOB_UNITS(c->size) * SLOB_UNIT,
					    flags, node);
From: Daniel Wagner <dwagner@suse.de>
mainline inclusion
from mainline-5.9-rc1
commit 08c875cbf481d74db82d6bba2fbcf580087dee24
category: bugfix
bugzilla: 39772
CVE: NA
---------------------------
tag_set_list is only accessed under the tag_set_lock lock. There is no need for using the _rcu list functions.
The _rcu list functions were introduced to allow read access to the tag_set_list protected under RCU, see 705cda97ee3a ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list") and 05b79413946d ("Revert "blk-mq: don't handle TAG_SHARED in restart""). Those changes got reverted later, but the cleanup commit missed a couple of places where the _rcu variants were still used.
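For context, the remaining traversals look roughly like the sketch below (paraphrasing the existing blk-mq readers, not code added by this patch): every walk of tag_list already holds tag_list_lock, so plain list_add_tail()/list_del() pair correctly with it.

#include <linux/blk-mq.h>

/* Illustrative reader: all tag_list traversals run under tag_list_lock. */
static void for_each_shared_queue(struct blk_mq_tag_set *set)
{
	struct request_queue *q;

	mutex_lock(&set->tag_list_lock);
	list_for_each_entry(q, &set->tag_list, tag_set_list) {
		/* operate on q while the mutex is held */
	}
	mutex_unlock(&set->tag_list_lock);
}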
Fixes: 97889f9ac24f ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 block/blk-mq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 634a7155ffb95..1699fad606cd3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2530,7 +2530,7 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
	struct blk_mq_tag_set *set = q->tag_set;
 
	mutex_lock(&set->tag_list_lock);
-	list_del_rcu(&q->tag_set_list);
+	list_del(&q->tag_set_list);
	if (list_is_singular(&set->tag_list)) {
		/* just transitioned to unshared */
		set->flags &= ~BLK_MQ_F_TAG_SHARED;
@@ -2559,7 +2559,7 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
	}
	if (set->flags & BLK_MQ_F_TAG_SHARED)
		queue_set_hctx_shared(q, true);
-	list_add_tail_rcu(&q->tag_set_list, &set->tag_list);
+	list_add_tail(&q->tag_set_list, &set->tag_list);
 
	mutex_unlock(&set->tag_list_lock);
 }
From: Ming Lei <ming.lei@redhat.com>
mainline inclusion
from mainline-v5.9-rc3
commit 943b40c832beb71115e38a1c4d99b640b5342738
category: bugfix
bugzilla: 41908
CVE: NA
---------------------------
When queue_max_discard_segments(q) is 1, blk_discard_mergable() will return false for a discard request, so the normal request merge path is applied. However, only queue_max_segments() is checked there, so the max discard segment limit isn't respected.
Check max discard segment limit in the request merge code for fixing the issue.
Discard request failure of virtio_blk is fixed.
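For illustration only (modelled on drivers such as virtio_blk; the setup function name is invented): a device that supports a single discard range per request advertises the limit like this, and with this fix the merge code honours it instead of queue_max_segments().

#include <linux/blkdev.h>

/* Hypothetical driver setup: one discard range (segment) per request. */
static void example_setup_discard(struct request_queue *q, unsigned int max_sectors)
{
	blk_queue_max_discard_sectors(q, max_sectors);
	blk_queue_max_discard_segments(q, 1);
}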
Fixes: 69840466086d ("block: fix the DISCARD request merge")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict: block/blk-merge.c
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 block/blk-merge.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 044bff9afa5e7..d24a6c9398ed8 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -477,13 +477,20 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
 }
 EXPORT_SYMBOL(blk_rq_map_sg);
 
+static inline unsigned int blk_rq_get_max_segments(struct request *rq)
+{
+	if (req_op(rq) == REQ_OP_DISCARD)
+		return queue_max_discard_segments(rq->q);
+	return queue_max_segments(rq->q);
+}
+
 static inline int ll_new_hw_segment(struct request_queue *q,
				    struct request *req,
				    struct bio *bio)
 {
	int nr_phys_segs = bio_phys_segments(q, bio);
 
-	if (req->nr_phys_segments + nr_phys_segs > queue_max_segments(q))
+	if (req->nr_phys_segments + nr_phys_segs > blk_rq_get_max_segments(req))
		goto no_merge;
 
	if (blk_integrity_merge_bio(q, req, bio) == false)
@@ -606,7 +613,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
		total_phys_segments--;
	}
 
-	if (total_phys_segments > queue_max_segments(q))
+	if (total_phys_segments > blk_rq_get_max_segments(req))
		return 0;
 
	if (blk_integrity_merge_rq(q, req, next) == false)
From: wanglin <wanglin137@huawei.com>
driver inclusion
category: bugfix
bugzilla: NA
CVE: NA
This patch fixes the timer, gid_type and scc configuration so that it is applied not just on HIP08_B but on HIP08_B and later revisions.
Reviewed-by: Hu Chunzhi <huchunzhi@huawei.com>
Reviewed-by: Zhao Weibo <zhaoweibo3@huawei.com>
Signed-off-by: wanglin <wanglin137@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_hw_sysfs_v2.c |  2 +-
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c       | 16 ++++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_sysfs_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_sysfs_v2.c
index a780a0e10386a..c6106379d2433 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_sysfs_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_sysfs_v2.c
@@ -340,7 +340,7 @@ int hns_roce_v2_query_pkt_stat(struct hns_roce_dev *hr_dev,
	if (status)
		return status;
 
-	if (hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08_B) {
+	if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP08_B) {
		hns_roce_cmq_setup_basic_desc(&desc_cnp_rx,
					      HNS_ROCE_OPC_QUEYR_CNP_RX_CNT, true);
		status = hns_roce_cmq_send(hr_dev, &desc_cnp_rx, 1);
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 864487b03ee7c..e8ae603872eba 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -1873,7 +1873,7 @@ static void set_default_caps(struct hns_roce_dev *hr_dev)
	caps->max_srq_wrs = HNS_ROCE_V2_MAX_SRQ_WR;
	caps->max_srq_sges = HNS_ROCE_V2_MAX_SRQ_SGE;
 
-	if (hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08_B) {
+	if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP08_B) {
		caps->flags |= HNS_ROCE_CAP_FLAG_ATOMIC | HNS_ROCE_CAP_FLAG_MW |
			       HNS_ROCE_CAP_FLAG_SRQ | HNS_ROCE_CAP_FLAG_FRMR |
			       HNS_ROCE_CAP_FLAG_QP_FLOW_CTRL;
@@ -2122,7 +2122,7 @@ static int hns_roce_query_pf_caps(struct hns_roce_dev *hr_dev)
		   caps->srqc_bt_num, &caps->srqc_buf_pg_sz,
		   &caps->srqc_ba_pg_sz, HEM_TYPE_SRQC);
 
-	if (hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08_B) {
+	if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP08_B) {
		caps->qpc_timer_hop_num = HNS_ROCE_HOP_NUM_0;
		caps->cqc_timer_hop_num = HNS_ROCE_HOP_NUM_0;
		caps->scc_ctx_hop_num = ctx_hop_num;
@@ -2186,7 +2186,7 @@ static int hns_roce_v2_profile(struct hns_roce_dev *hr_dev)
	if (ret)
		return ret;
 
-	if (hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08_B) {
+	if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP08_B) {
		ret = hns_roce_query_pf_timer_resource(hr_dev);
		if (ret) {
			dev_err(hr_dev->dev,
@@ -2202,7 +2202,7 @@ static int hns_roce_v2_profile(struct hns_roce_dev *hr_dev)
		return ret;
	}
 
-	if (hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08_B) {
+	if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP08_B) {
		ret = hns_roce_set_vf_switch_param(hr_dev, 0);
		if (ret) {
			dev_err(hr_dev->dev,
@@ -2503,7 +2503,7 @@ static int hns_roce_v2_init(struct hns_roce_dev *hr_dev)
			goto err_cqc_timer_failed;
		}
	}
-	if (hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08_B)
+	if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP08_B)
		hns_roce_clear_extdb_list_info(hr_dev);
 
	return 0;
@@ -2531,7 +2531,7 @@ static void hns_roce_v2_exit(struct hns_roce_dev *hr_dev)
 {
	struct hns_roce_v2_priv *priv = hr_dev->priv;
 
-	if (hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08_B)
+	if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP08_B)
		hns_roce_function_clear(hr_dev);
 
	hns_roce_free_link_table(hr_dev, &priv->tpq);
@@ -4873,10 +4873,10 @@ static int hns_roce_v2_set_path(struct ib_qp *ibqp,
		       V2_QPC_BYTE_24_HOP_LIMIT_S, 0);
 
 #ifdef CONFIG_KERNEL_419
-	if (hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08_B &&
+	if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP08_B &&
	    gid_attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP)
 #else
-	if (hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08_B &&
+	if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP08_B &&
	    gid_attr.gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP)
 #endif
		roce_set_field(context->byte_24_mtu_tc, V2_QPC_BYTE_24_TC_M,
From: Tang Yizhou <tangyizhou@huawei.com>
ascend inclusion
category: bugfix
bugzilla: 51980
CVE: NA
-------------------------------------------------
1. Add parameter checking in sp_alloc.
2. Add variable initialization in sp_group_exit.
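A hypothetical caller sketch (not from this patch; the header path and surrounding code are assumptions): with the new check, a zero size makes sp_alloc() return ERR_PTR(-EINVAL), so callers should test the result with IS_ERR().

#include <linux/err.h>
#include <linux/share_pool.h>	/* assumed location of the sp_alloc() prototype */

static int example_sp_user(unsigned long len, unsigned long sp_flags)
{
	void *buf = sp_alloc(len, sp_flags, SPG_ID_DEFAULT);

	if (IS_ERR(buf))
		return PTR_ERR(buf);
	/* ... use buf, then release it with the matching share-pool free call ... */
	return 0;
}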
Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/share_pool.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/share_pool.c b/mm/share_pool.c
index d99b3a8b3c005..ffcddd53ea3a3 100644
--- a/mm/share_pool.c
+++ b/mm/share_pool.c
@@ -1329,6 +1329,11 @@ void *sp_alloc(unsigned long size, unsigned long sp_flags, int spg_id)
	if (enable_mdc_default_group)
		spg_id = mdc_default_group_id;
 
+	if (unlikely(!size)) {
+		pr_err_ratelimited("share pool: allocation failed, invalid size %lu\n", size);
+		return ERR_PTR(-EINVAL);
+	}
+
	if (spg_id != SPG_ID_DEFAULT && spg_id < SPG_ID_MIN) {
		pr_err_ratelimited("share pool: allocation failed, invalid group id %d\n", spg_id);
		return ERR_PTR(-EINVAL);
@@ -2962,7 +2967,7 @@ EXPORT_SYMBOL(sharepool_no_page);
 void sp_group_exit(struct mm_struct *mm)
 {
	struct sp_group *spg = mm->sp_group;
-	bool is_alive;
+	bool is_alive = true;
 
	if (!spg || !enable_ascend_share_pool)
		return;
From: Yufen Yu <yuyufen@huawei.com>
mainline inclusion
from mainline-v5.12-rc3
commit 3edf5346e4f2ce2fa0c94651a90a8dda169565ee
category: bugfix
bugzilla: 47613
CVE: NA

---------------------------
For multiple split bios, if one of them fails, the whole request should return an error to the application. But we found a race between bio_integrity_verify_fn and bio completion which reports I/O success to the application after one of the bios has failed. The race is as follows:
split bio(READ)          kworker

nvme_complete_rq
blk_update_request //split error=0
  bio_endio
    bio_integrity_endio
      queue_work(kintegrityd_wq, &bip->bip_work);

                         bio_integrity_verify_fn
                         bio_endio //split bio
                          __bio_chain_endio
                             if (!parent->bi_status)

                             <interrupt entry>
                             nvme_irq
                               blk_update_request //parent error=7
                                 req_bio_endio
                                    bio->bi_status = 7 //parent bio
                             <interrupt exit>

                             parent->bi_status = 0
                         parent->bi_end_io() // return bi_status=0
The bio has been split in two: split and parent. When the split bio completes, it relies on a kworker to do the endio, but bio_integrity_verify_fn is interrupted by the parent bio's completion IRQ handler. The parent's bio->bi_status, which was set in the IRQ handler, is then overwritten by the kworker.

In fact, even without the above race, we also need to consider the concurrency between multiple split bio completions updating the same parent bi_status. Normally, multiple split bios will be issued to the same hctx and complete from the same irq vector. But if the queue map has been updated between multiple split bios, these bios may complete on different hw queues and different irq vectors. Then the concurrent updates of the parent bi_status may leave the final status wrong.
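For readers unfamiliar with how the split/parent pair arises, here is a simplified sketch modelled on blk_queue_split() (illustrative only, not code from this patch): the original bio becomes the parent, and the chained child later folds its status into the parent via __bio_chain_endio().

#include <linux/bio.h>
#include <linux/blkdev.h>

/* Simplified illustration of splitting and chaining a bio. */
static void example_split_and_chain(struct bio *bio, int sectors)
{
	struct bio *split = bio_split(bio, sectors, GFP_NOIO, &fs_bio_set);

	if (split) {
		bio_chain(split, bio);		/* bio is now the parent of split */
		generic_make_request(bio);	/* submit the remainder */
		generic_make_request(split);	/* child's completion updates parent->bi_status */
	}
}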
Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210331115359.1125679-1-yuyufen@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Kuai Yu <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 block/bio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/bio.c b/block/bio.c
index 94d0f4798b5bf..da05350dfba28 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -314,7 +314,7 @@ static struct bio *__bio_chain_endio(struct bio *bio)
 {
	struct bio *parent = bio->bi_private;
 
-	if (!parent->bi_status)
+	if (bio->bi_status && !parent->bi_status)
		parent->bi_status = bio->bi_status;
	bio_put(bio);
	return parent;
From: David Rientjes <rientjes@google.com>
mainline inclusion
from mainline-v5.9-rc5
commit 7be74942f184fdfba34ddd19a0d995deb34d4a03
category: bugfix
bugzilla: NA
CVE: CVE-2020-36311
--------------------------------
There may be many encrypted regions that need to be unregistered when a SEV VM is destroyed. This can lead to soft lockups. For example, on a host running 4.15:
watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348]
CPU: 206 PID: 194348 Comm: t_virtual_machi
RIP: 0010:free_unref_page_list+0x105/0x170
...
Call Trace:
 [<0>] release_pages+0x159/0x3d0
 [<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd]
 [<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd]
 [<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd]
 [<0>] kvm_arch_destroy_vm+0x47/0x200
 [<0>] kvm_put_kvm+0x1a8/0x2f0
 [<0>] kvm_vm_release+0x25/0x30
 [<0>] do_exit+0x335/0xc10
 [<0>] do_group_exit+0x3f/0xa0
 [<0>] get_signal+0x1bc/0x670
 [<0>] do_signal+0x31/0x130
Although the CLFLUSH is no longer issued on every encrypted region to be unregistered, there are no other changes that can prevent soft lockups for very large SEV VMs in the latest kernel.
Periodically schedule if necessary. This still holds kvm->lock across the resched, but since this only happens when the VM is destroyed this is assumed to be acceptable.
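The fix follows a common pattern; a generic sketch of it (invented names, not the KVM code) is shown below: a potentially very long teardown loop gets an explicit reschedule point on each iteration so the soft-lockup watchdog never sees the CPU stuck.

#include <linux/list.h>
#include <linux/sched.h>

/* Generic illustration: yield periodically while tearing down a long list. */
static void teardown_all(struct list_head *head)
{
	struct list_head *pos, *q;

	list_for_each_safe(pos, q, head) {
		/* unpin/free the entry behind pos here */
		cond_resched();
	}
}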
Signed-off-by: David Rientjes <rientjes@google.com>
Message-Id: <alpine.DEB.2.23.453.2008251255240.2987727@chino.kir.corp.google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Conflicts:
	arch/x86/kvm/svm.c
[yyl: arch/x86/kvm/svm/sev.c rename from arch/x86/kvm/svm.c]
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 arch/x86/kvm/svm.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8cb9277aa6ff2..334480e198769 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1954,6 +1954,7 @@ static void sev_vm_destroy(struct kvm *kvm)
		list_for_each_safe(pos, q, head) {
			__unregister_enc_region_locked(kvm,
				list_entry(pos, struct enc_region, list));
+			cond_resched();
		}
	}
 