From: Li Nan linan122@huawei.com
hulk inclusion
category: bugfix
bugzilla: 187584, https://gitee.com/openeuler/kernel/issues/I5QW2R
CVE: NA
--------------------------------
This reverts commit 37838982c0e97898780682980aaf10755625630e.
There are two wbt_enable_default() calls in bfq_exit_queue(). Although the duplicate does not cause any fault, revert one of them.
Signed-off-by: Li Nan linan122@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- block/bfq-iosched.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 4bfea5e5354e..1aec01c0a707 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -6418,8 +6418,6 @@ static void bfq_exit_queue(struct elevator_queue *e) spin_unlock_irq(&bfqd->lock); #endif
- wbt_enable_default(bfqd->queue); - kfree(bfqd);
/* Re-enable throttling in case elevator disabled it */
From: Ye Bin yebin10@huawei.com
mainline inclusion
from mainline-v5.19-rc3
commit 9b6641dd95a0c441b277dd72ba22fed8d61f76ad
category: bugfix
bugzilla: 186927, https://gitee.com/src-openeuler/kernel/issues/I5YIY6
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
We got the issue as follows:
[home]# mount /dev/sda test
EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
[home]# dmesg
EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
EXT4-fs (sda): Errors on filesystem, clearing orphan list.
EXT4-fs (sda): recovery complete
EXT4-fs (sda): mounted filesystem with ordered data mode. Quota mode: none.
[home]# debugfs /dev/sda
debugfs 1.46.5 (30-Dec-2021)
Checksum errors in superblock!  Retrying...
The reason is that ext4_orphan_cleanup() resets 's_last_orphan' but does not update the superblock checksum.
To solve this issue, defer updating the superblock checksum until after ext4_orphan_cleanup().
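The ordering requirement can be illustrated with a small standalone sketch (illustrative only: it uses a fake two-field superblock and a local CRC32C routine, not ext4's ext4_superblock_csum_set() path): once a field such as s_last_orphan changes, the previously computed checksum no longer verifies, which is exactly what debugfs tripped over above.

/*
 * Illustrative only: a fake superblock with a trailing checksum. Clearing
 * a field after the checksum was computed leaves the checksum stale, so
 * the checksum must be recomputed after the last field update (here,
 * after the stand-in for orphan cleanup).
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t crc32c(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = ~0u;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1));
	}
	return ~crc;
}

struct fake_sb {
	uint64_t s_last_orphan;
	uint32_t s_checksum;
};

static int sb_csum_ok(const struct fake_sb *sb)
{
	return sb->s_checksum == crc32c(sb, offsetof(struct fake_sb, s_checksum));
}

int main(void)
{
	struct fake_sb sb = { .s_last_orphan = 42 };

	sb.s_checksum = crc32c(&sb, offsetof(struct fake_sb, s_checksum));
	sb.s_last_orphan = 0;	/* "orphan cleanup" mutates the superblock... */
	printf("checksum still valid: %s\n", sb_csum_ok(&sb) ? "yes" : "no");

	sb.s_checksum = crc32c(&sb, offsetof(struct fake_sb, s_checksum));
	printf("after recomputing:    %s\n", sb_csum_ok(&sb) ? "yes" : "no");
	return 0;
}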
Signed-off-by: Ye Bin yebin10@huawei.com Cc: stable@kernel.org Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Ritesh Harjani ritesh.list@gmail.com Link: https://lore.kernel.org/r/20220525012904.1604737-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/ext4/super.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 3bb1a70c2db0..9e7d204c9730 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5079,14 +5079,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) err = percpu_counter_init(&sbi->s_freeinodes_counter, freei, GFP_KERNEL); } - /* - * Update the checksum after updating free space/inode - * counters. Otherwise the superblock can have an incorrect - * checksum in the buffer cache until it is written out and - * e2fsprogs programs trying to open a file system immediately - * after it is mounted can fail. - */ - ext4_superblock_csum_set(sb); if (!err) err = percpu_counter_init(&sbi->s_dirs_counter, ext4_count_dirs(sb), GFP_KERNEL); @@ -5141,6 +5133,14 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS; ext4_orphan_cleanup(sb, es); EXT4_SB(sb)->s_mount_state &= ~EXT4_ORPHAN_FS; + /* + * Update the checksum after updating free space/inode counters and + * ext4_orphan_cleanup. Otherwise the superblock can have an incorrect + * checksum in the buffer cache until it is written out and + * e2fsprogs programs trying to open a file system immediately + * after it is mounted can fail. + */ + ext4_superblock_csum_set(sb); if (needs_recovery) { ext4_msg(sb, KERN_INFO, "recovery complete"); err = ext4_mark_recovery_complete(sb, es);
From: Guangbin Huang huangguangbin2@huawei.com
mainline inclusion
from mainline-master
commit cfdcb075048c1e886c45a9c9e681ed222f74ecb9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXH9
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
As the argument struct hnae3_handle *h of hclge_get_dscp_prio() can belong to another client registered in the hnae3 layer, we need to convert it to the hnae3_handle of the local nic client in order to get the right DSCP settings for the other clients.
Fixes: dfea275e06c2 ("net: hns3: optimize converting dscp to priority process of hns3_nic_select_queue()") Signed-off-by: Guangbin Huang huangguangbin2@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Jiantao Xiao xiaojiantao1@h-partners.com Reviewed-by: Jian Shen shenjian15@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 823c6ea7b0e0..aa5adae87c47 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -13017,14 +13017,16 @@ static void hclge_clean_vport_config(struct hnae3_ae_dev *ae_dev, int num_vfs) static int hclge_get_dscp_prio(struct hnae3_handle *h, u8 dscp, u8 *tc_mode, u8 *priority) { + struct hclge_vport *vport = hclge_get_vport(h); + if (dscp >= HNAE3_MAX_DSCP) return -EINVAL;
if (tc_mode) - *tc_mode = h->kinfo.tc_map_mode; + *tc_mode = vport->nic.kinfo.tc_map_mode; if (priority) - *priority = h->kinfo.dscp_prio[dscp] == HNAE3_PRIO_ID_INVALID ? 0 : - h->kinfo.dscp_prio[dscp]; + *priority = vport->nic.kinfo.dscp_prio[dscp] == HNAE3_PRIO_ID_INVALID ? 0 : + vport->nic.kinfo.dscp_prio[dscp];
return 0; }
From: Yu Kuai yukuai3@huawei.com
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60IHY
CVE: NA
--------------------------------
This reverts commit eeabdc14ef8231fea94074b744d9648805a4015b in preparation for backporting the solution from mainline.
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- block/bfq-cgroup.c | 16 +++++----------- block/bfq-wf2q.c | 9 --------- 2 files changed, 5 insertions(+), 20 deletions(-)
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c index f84a88b7a09d..abd01025d043 100644 --- a/block/bfq-cgroup.c +++ b/block/bfq-cgroup.c @@ -643,7 +643,6 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq, struct bfq_group *bfqg) { struct bfq_entity *entity = &bfqq->entity; - struct bfq_group *old_parent = bfqq_group(bfqq);
if (bfqq == &bfqd->oom_bfqq) return; @@ -667,22 +666,18 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq, bfq_deactivate_bfqq(bfqd, bfqq, false, false); else if (entity->on_st_or_in_serv) bfq_put_idle_entity(bfq_entity_service_tree(entity), entity); + bfqg_and_blkg_put(bfqq_group(bfqq));
entity->parent = bfqg->my_entity; entity->sched_data = &bfqg->sched_data; /* pin down bfqg and its associated blkg */ bfqg_and_blkg_get(bfqg);
- /* - * Don't leave the bfqq->pos_root to old bfqg, since the ref to old - * bfqg will be released and the bfqg might be freed. - */ - if (unlikely(!bfqd->nonrot_with_queueing)) - bfq_pos_tree_add_move(bfqd, bfqq); - bfqg_and_blkg_put(old_parent); - - if (bfq_bfqq_busy(bfqq)) + if (bfq_bfqq_busy(bfqq)) { + if (unlikely(!bfqd->nonrot_with_queueing)) + bfq_pos_tree_add_move(bfqd, bfqq); bfq_activate_bfqq(bfqd, bfqq); + }
if (!bfqd->in_service_queue && !bfqd->rq_in_driver) bfq_schedule_dispatch(bfqd); @@ -964,7 +959,6 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
put_async_queues: bfq_put_async_queues(bfqd, bfqg); - pd->plid = BLKCG_MAX_POLS;
spin_unlock_irqrestore(&bfqd->lock, flags); /* diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c index 5a6cb0513c4f..26776bdbdf36 100644 --- a/block/bfq-wf2q.c +++ b/block/bfq-wf2q.c @@ -1695,15 +1695,6 @@ void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq, */ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq) { -#ifdef CONFIG_BFQ_GROUP_IOSCHED - /* If parent group is offlined, move the bfqq to root group */ - if (bfqq->entity.parent) { - struct bfq_group *bfqg = bfq_bfqq_to_bfqg(bfqq); - - if (bfqg->pd.plid >= BLKCG_MAX_POLS) - bfq_bfqq_move(bfqd, bfqq, bfqd->root_group); - } -#endif bfq_log_bfqq(bfqd, bfqq, "add to busy");
bfq_activate_bfqq(bfqd, bfqq);
From: Jan Kara jack@suse.cz
stable inclusion
from stable-v5.10.121
commit 70a7dea84639bcd029130e00e01792eb9207fb38
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60IHY
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 09f871868080c33992cd6a9b72a5ca49582578fa upstream.
Track whether a bfq_group is still online. We cannot rely on blkcg_gq->online because that gets cleared only after all policies are offlined. We need something that is updated under bfqd->lock while we are cleaning up our bfq_group, so that when we see an online bfq_group we can guarantee it stays online for as long as we are holding bfqd->lock.
CC: stable@vger.kernel.org Tested-by: "yukuai (C)" yukuai3@huawei.com Signed-off-by: Jan Kara jack@suse.cz Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20220401102752.8599-7-jack@suse.cz Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- block/bfq-cgroup.c | 3 ++- block/bfq-iosched.h | 2 ++ 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c index abd01025d043..6846bfe03912 100644 --- a/block/bfq-cgroup.c +++ b/block/bfq-cgroup.c @@ -555,6 +555,7 @@ static void bfq_pd_init(struct blkg_policy_data *pd) */ bfqg->bfqd = bfqd; bfqg->active_entities = 0; + bfqg->online = true; bfqg->rq_pos_tree = RB_ROOT; }
@@ -601,7 +602,6 @@ struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd, struct bfq_entity *entity;
bfqg = bfq_lookup_bfqg(bfqd, blkcg); - if (unlikely(!bfqg)) return NULL;
@@ -959,6 +959,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
put_async_queues: bfq_put_async_queues(bfqd, bfqg); + bfqg->online = false;
spin_unlock_irqrestore(&bfqd->lock, flags); /* diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h index fb51e5ce9400..b116ed48d27d 100644 --- a/block/bfq-iosched.h +++ b/block/bfq-iosched.h @@ -901,6 +901,8 @@ struct bfq_group {
/* reference counter (see comments in bfq_bic_update_cgroup) */ int ref; + /* Is bfq_group still online? */ + bool online;
struct bfq_entity entity; struct bfq_sched_data sched_data;
From: Jan Kara jack@suse.cz
stable inclusion
from stable-v5.10.121
commit 0285718e28259e41f405a038ee0e6bb984fd1b34
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60IHY
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 4e54a2493e582361adc3bfbf06c7d50d19d18837 upstream.
BFQ's usage of __bio_blkcg() is a relic from the past. Furthermore, if a bio were not associated with any blkcg, the usage of __bio_blkcg() in BFQ would be prone to races with the task being migrated between cgroups, as __bio_blkcg() calls at different places could return different blkcgs.
Convert BFQ to the new situation where bio->bi_blkg is initialized in bio_set_dev() and thus practically always valid. This allows us to save blkcg_gq lookup and noticeably simplify the code.
CC: stable@vger.kernel.org Fixes: 0fe061b9f03c ("blkcg: fix ref count issue with bio_blkcg() using task_css") Tested-by: "yukuai (C)" yukuai3@huawei.com Signed-off-by: Jan Kara jack@suse.cz Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20220401102752.8599-8-jack@suse.cz Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- block/bfq-cgroup.c | 63 +++++++++++++++++---------------------------- block/bfq-iosched.c | 10 +------ block/bfq-iosched.h | 3 +-- 3 files changed, 25 insertions(+), 51 deletions(-)
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c index 6846bfe03912..168faa2c8c3b 100644 --- a/block/bfq-cgroup.c +++ b/block/bfq-cgroup.c @@ -584,27 +584,11 @@ static void bfq_group_set_parent(struct bfq_group *bfqg, entity->sched_data = &parent->sched_data; }
-static struct bfq_group *bfq_lookup_bfqg(struct bfq_data *bfqd, - struct blkcg *blkcg) +static void bfq_link_bfqg(struct bfq_data *bfqd, struct bfq_group *bfqg) { - struct blkcg_gq *blkg; - - blkg = blkg_lookup(blkcg, bfqd->queue); - if (likely(blkg)) - return blkg_to_bfqg(blkg); - return NULL; -} - -struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd, - struct blkcg *blkcg) -{ - struct bfq_group *bfqg, *parent; + struct bfq_group *parent; struct bfq_entity *entity;
- bfqg = bfq_lookup_bfqg(bfqd, blkcg); - if (unlikely(!bfqg)) - return NULL; - /* * Update chain of bfq_groups as we might be handling a leaf group * which, along with some of its relatives, has not been hooked yet @@ -621,8 +605,15 @@ struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd, bfq_group_set_parent(curr_bfqg, parent); } } +}
- return bfqg; +struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio) +{ + struct blkcg_gq *blkg = bio->bi_blkg; + + if (!blkg) + return bfqd->root_group; + return blkg_to_bfqg(blkg); }
/** @@ -694,25 +685,15 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq, * Move bic to blkcg, assuming that bfqd->lock is held; which makes * sure that the reference to cgroup is valid across the call (see * comments in bfq_bic_update_cgroup on this issue) - * - * NOTE: an alternative approach might have been to store the current - * cgroup in bfqq and getting a reference to it, reducing the lookup - * time here, at the price of slightly more complex code. */ -static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd, - struct bfq_io_cq *bic, - struct blkcg *blkcg) +static void *__bfq_bic_change_cgroup(struct bfq_data *bfqd, + struct bfq_io_cq *bic, + struct bfq_group *bfqg) { struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0); struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1); - struct bfq_group *bfqg; struct bfq_entity *entity;
- bfqg = bfq_find_set_group(bfqd, blkcg); - - if (unlikely(!bfqg)) - bfqg = bfqd->root_group; - if (async_bfqq) { entity = &async_bfqq->entity;
@@ -764,20 +745,24 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd, void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio) { struct bfq_data *bfqd = bic_to_bfqd(bic); - struct bfq_group *bfqg = NULL; + struct bfq_group *bfqg = bfq_bio_bfqg(bfqd, bio); uint64_t serial_nr;
- rcu_read_lock(); - serial_nr = __bio_blkcg(bio)->css.serial_nr; + serial_nr = bfqg_to_blkg(bfqg)->blkcg->css.serial_nr;
/* * Check whether blkcg has changed. The condition may trigger * spuriously on a newly created cic but there's no harm. */ if (unlikely(!bfqd) || likely(bic->blkcg_serial_nr == serial_nr)) - goto out; + return;
- bfqg = __bfq_bic_change_cgroup(bfqd, bic, __bio_blkcg(bio)); + /* + * New cgroup for this process. Make sure it is linked to bfq internal + * cgroup hierarchy. + */ + bfq_link_bfqg(bfqd, bfqg); + __bfq_bic_change_cgroup(bfqd, bic, bfqg); /* * Update blkg_path for bfq_log_* functions. We cache this * path, and update it here, for the following @@ -830,8 +815,6 @@ void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio) */ blkg_path(bfqg_to_blkg(bfqg), bfqg->blkg_path, sizeof(bfqg->blkg_path)); bic->blkcg_serial_nr = serial_nr; -out: - rcu_read_unlock(); }
/** @@ -1449,7 +1432,7 @@ void bfq_end_wr_async(struct bfq_data *bfqd) bfq_end_wr_async_queues(bfqd, bfqd->root_group); }
-struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd, struct blkcg *blkcg) +struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio) { return bfqd->root_group; } diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 1aec01c0a707..6edc00da5b57 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -5175,14 +5175,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd, struct bfq_queue *bfqq; struct bfq_group *bfqg;
- rcu_read_lock(); - - bfqg = bfq_find_set_group(bfqd, __bio_blkcg(bio)); - if (!bfqg) { - bfqq = &bfqd->oom_bfqq; - goto out; - } - + bfqg = bfq_bio_bfqg(bfqd, bio); if (!is_sync) { async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class, ioprio); @@ -5226,7 +5219,6 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd, out: bfqq->ref++; /* get a process reference to this queue */ bfq_log_bfqq(bfqd, bfqq, "get_queue, at end: %p, %d", bfqq, bfqq->ref); - rcu_read_unlock(); return bfqq; }
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h index b116ed48d27d..2a4a6f44efff 100644 --- a/block/bfq-iosched.h +++ b/block/bfq-iosched.h @@ -984,8 +984,7 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq, void bfq_init_entity(struct bfq_entity *entity, struct bfq_group *bfqg); void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio); void bfq_end_wr_async(struct bfq_data *bfqd); -struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd, - struct blkcg *blkcg); +struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio); struct blkcg_gq *bfqg_to_blkg(struct bfq_group *bfqg); struct bfq_group *bfqq_group(struct bfq_queue *bfqq); struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node);
From: Jan Kara jack@suse.cz
stable inclusion
from stable-v5.10.121
commit 51f724bffa3403a5236597e6b75df7329c1ec6e9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60IHY
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 075a53b78b815301f8d3dd1ee2cd99554e34f0dd upstream.
Bios queued into the BFQ IO scheduler can be associated with a cgroup that was already offlined. This may then cause insertion of this bfq_group into a service tree. But this bfq_group will get freed as soon as the last bio associated with it completes, leading to use-after-free issues for service tree users. Fix the problem by making sure we always operate on an online bfq_group. If the bfq_group associated with the bio is not online, we pick the first online parent.
CC: stable@vger.kernel.org Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support") Tested-by: "yukuai (C)" yukuai3@huawei.com Signed-off-by: Jan Kara jack@suse.cz Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20220401102752.8599-9-jack@suse.cz Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- block/bfq-cgroup.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c index 168faa2c8c3b..f99351017182 100644 --- a/block/bfq-cgroup.c +++ b/block/bfq-cgroup.c @@ -610,10 +610,19 @@ static void bfq_link_bfqg(struct bfq_data *bfqd, struct bfq_group *bfqg) struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio) { struct blkcg_gq *blkg = bio->bi_blkg; + struct bfq_group *bfqg;
- if (!blkg) - return bfqd->root_group; - return blkg_to_bfqg(blkg); + while (blkg) { + bfqg = blkg_to_bfqg(blkg); + if (bfqg->online) { + bio_associate_blkg_from_css(bio, &blkg->blkcg->css); + return bfqg; + } + blkg = blkg->parent; + } + bio_associate_blkg_from_css(bio, + &bfqg_to_blkg(bfqd->root_group)->blkcg->css); + return bfqd->root_group; }
/**
From: Li Huafei lihuafei1@huawei.com
mainline inclusion
from mainline-v6.1-rc4
commit 0e792b89e6800cd9cb4757a76a96f7ef3e8b6294
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I600G0
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
KASAN reported a use-after-free with ftrace ops [1]. It was found from vmcore that perf had registered two ops with the same content successively, both dynamic. After unregistering the second ops, a use-after-free occurred.
In ftrace_shutdown(), when the second ops is unregistered, the FTRACE_UPDATE_CALLS command is not set because there is another enabled ops with the same content. Also, both ops are dynamic and the ftrace callback function is ftrace_ops_list_func, so the FTRACE_UPDATE_TRACE_FUNC command will not be set. Eventually the value of 'command' will be 0 and ftrace_shutdown() will skip the rcu synchronization.
However, ftrace may be activated. When the ops is released, another CPU may be accessing the ops. Add the missing synchronization to fix this problem.
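The ordering the fix restores can be modelled in userspace (a toy sketch using pthreads and an atomic in-flight counter in place of the kernel's RCU synchronization; none of the names below are from the ftrace code): once the ops is unpublished, the unregister path must wait for every caller that may still be running the callback before freeing it.

/*
 * Toy model of "unpublish, wait for in-flight callers, then free".
 * An atomic counter stands in for the RCU synchronization the fix adds;
 * freeing before the wait would be the use-after-free KASAN reported.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct ops {
	void (*func)(void);
};

static _Atomic(struct ops *) live_ops;
static atomic_int in_flight;

static void callback(void)
{
	usleep(1000);		/* pretend the traced callback does some work */
}

static void *tracer(void *arg)
{
	(void)arg;
	for (int i = 0; i < 200; i++) {
		atomic_fetch_add(&in_flight, 1);
		struct ops *ops = atomic_load(&live_ops);

		if (ops)
			ops->func();	/* may still be running after unpublish */
		atomic_fetch_sub(&in_flight, 1);
	}
	return NULL;
}

int main(void)
{
	struct ops *ops = malloc(sizeof(*ops));
	pthread_t t;

	ops->func = callback;
	atomic_store(&live_ops, ops);
	pthread_create(&t, NULL, tracer, NULL);
	usleep(5000);

	atomic_store(&live_ops, NULL);		/* unregister: unpublish the ops... */
	while (atomic_load(&in_flight))		/* ...then wait for callers to drain */
		usleep(100);
	free(ops);				/* only now is the free safe */

	pthread_join(t, NULL);
	puts("done");
	return 0;
}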
[1] BUG: KASAN: use-after-free in __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline] BUG: KASAN: use-after-free in ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049 Read of size 8 at addr ffff56551965bbc8 by task syz-executor.2/14468
CPU: 1 PID: 14468 Comm: syz-executor.2 Not tainted 5.10.0 #7 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x40c arch/arm64/kernel/stacktrace.c:132 show_stack+0x30/0x40 arch/arm64/kernel/stacktrace.c:196 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b4/0x248 lib/dump_stack.c:118 print_address_description.constprop.0+0x28/0x48c mm/kasan/report.c:387 __kasan_report mm/kasan/report.c:547 [inline] kasan_report+0x118/0x210 mm/kasan/report.c:564 check_memory_region_inline mm/kasan/generic.c:187 [inline] __asan_load8+0x98/0xc0 mm/kasan/generic.c:253 __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline] ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049 ftrace_graph_call+0x0/0x4 __might_sleep+0x8/0x100 include/linux/perf_event.h:1170 __might_fault mm/memory.c:5183 [inline] __might_fault+0x58/0x70 mm/memory.c:5171 do_strncpy_from_user lib/strncpy_from_user.c:41 [inline] strncpy_from_user+0x1f4/0x4b0 lib/strncpy_from_user.c:139 getname_flags+0xb0/0x31c fs/namei.c:149 getname+0x2c/0x40 fs/namei.c:209 [...]
Allocated by task 14445: kasan_save_stack+0x24/0x50 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:479 [inline] __kasan_kmalloc.constprop.0+0x110/0x13c mm/kasan/common.c:449 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:493 kmem_cache_alloc_trace+0x440/0x924 mm/slub.c:2950 kmalloc include/linux/slab.h:563 [inline] kzalloc include/linux/slab.h:675 [inline] perf_event_alloc.part.0+0xb4/0x1350 kernel/events/core.c:11230 perf_event_alloc kernel/events/core.c:11733 [inline] __do_sys_perf_event_open kernel/events/core.c:11831 [inline] __se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723 __arm64_sys_perf_event_open+0x6c/0x80 kernel/events/core.c:11723 [...]
Freed by task 14445: kasan_save_stack+0x24/0x50 mm/kasan/common.c:48 kasan_set_track+0x24/0x34 mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:358 __kasan_slab_free.part.0+0x11c/0x1b0 mm/kasan/common.c:437 __kasan_slab_free mm/kasan/common.c:445 [inline] kasan_slab_free+0x2c/0x40 mm/kasan/common.c:446 slab_free_hook mm/slub.c:1569 [inline] slab_free_freelist_hook mm/slub.c:1608 [inline] slab_free mm/slub.c:3179 [inline] kfree+0x12c/0xc10 mm/slub.c:4176 perf_event_alloc.part.0+0xa0c/0x1350 kernel/events/core.c:11434 perf_event_alloc kernel/events/core.c:11733 [inline] __do_sys_perf_event_open kernel/events/core.c:11831 [inline] __se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723 [...]
Link: https://lore.kernel.org/linux-trace-kernel/20221103031010.166498-1-lihuafei1...
Fixes: edb096e00724f ("ftrace: Fix memleak when unregistering dynamic ops when tracing disabled") Cc: stable@vger.kernel.org Suggested-by: Steven Rostedt rostedt@goodmis.org Signed-off-by: Li Huafei lihuafei1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Li Huafei lihuafei1@huawei.com Reviewed-by: Yang Jihong yangjihong1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/trace/ftrace.c | 16 +++------------- 1 file changed, 3 insertions(+), 13 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 4f40bc2f90a7..945e87b0084e 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -2937,18 +2937,8 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command) command |= FTRACE_UPDATE_TRACE_FUNC; }
- if (!command || !ftrace_enabled) { - /* - * If these are dynamic or per_cpu ops, they still - * need their data freed. Since, function tracing is - * not currently active, we can just free them - * without synchronizing all CPUs. - */ - if (ops->flags & FTRACE_OPS_FL_DYNAMIC) - goto free_ops; - - return 0; - } + if (!command || !ftrace_enabled) + goto out;
/* * If the ops uses a trampoline, then it needs to be @@ -2985,6 +2975,7 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command) removed_ops = NULL; ops->flags &= ~FTRACE_OPS_FL_REMOVING;
+out: /* * Dynamic ops may be freed, we must make sure that all * callers are done before leaving this function. @@ -3012,7 +3003,6 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command) if (IS_ENABLED(CONFIG_PREEMPTION)) synchronize_rcu_tasks();
- free_ops: ftrace_trampoline_free(ops); }
From: Lei Zhou zhoulei154@h-partners.com
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60FNG
--------------------------------------------------------------
The PTT device can only filter devices on the same PCIe core, within the BDF range [lower_bdf, upper_bdf]. Add the missing check when initializing the filters list.
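A standalone sketch of the added check (example values only; the driver itself uses the PCI_DEVID() macro against hisi_ptt->lower_bdf/upper_bdf): the root port's bus and devfn are packed into a 16-bit BDF and compared against the core's range.

/*
 * Illustrative only: pack bus/devfn into a 16-bit devid the way
 * PCI_DEVID() does and accept the device only if its root port lies
 * within [lower_bdf, upper_bdf]. The bus/devfn/range values are made up.
 */
#include <stdint.h>
#include <stdio.h>

static uint16_t bdf_devid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

static int ptt_can_filter(uint16_t devid, uint16_t lower_bdf, uint16_t upper_bdf)
{
	return devid >= lower_bdf && devid <= upper_bdf;
}

int main(void)
{
	/* Root port at bus 0x80, device 1, function 0 -> devfn 0x08. */
	uint16_t devid = bdf_devid(0x80, 0x08);

	printf("devid %#06x in range: %s\n", devid,
	       ptt_can_filter(devid, 0x8000, 0x80ff) ? "yes" : "no");
	return 0;
}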
Fixes: ff0de066b463 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device")
Signed-off-by: Wangming Shao shaowangming@h-partners.com Signed-off-by: Lei Zhou zhoulei154@h-partners.com Reviewed-by: Yicong Yang yangyicong@huawei.com Reviewed-by: Yang Jihong yangjihong1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/hwtracing/ptt/hisi_ptt.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c index 5d5526aa60c4..30f1525639b5 100644 --- a/drivers/hwtracing/ptt/hisi_ptt.c +++ b/drivers/hwtracing/ptt/hisi_ptt.c @@ -356,8 +356,18 @@ static int hisi_ptt_register_irq(struct hisi_ptt *hisi_ptt)
static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data) { + struct pci_dev *root_port = pcie_find_root_port(pdev); struct hisi_ptt_filter_desc *filter; struct hisi_ptt *hisi_ptt = data; + u32 port_devid; + + if (!root_port) + return 0; + + port_devid = PCI_DEVID(root_port->bus->number, root_port->devfn); + if (port_devid < hisi_ptt->lower_bdf || + port_devid > hisi_ptt->upper_bdf) + return 0;
/* * We won't fail the probe if filter allocation failed here. The filters
From: Yicong Yang yangyicong@hisilicon.com
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60FNG
--------------------------------------------------------------------------
Some event IDs of the hisi-pcie-pmu are incorrect; fix them.
Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU") Signed-off-by: Yicong Yang yangyicong@hisilicon.com Signed-off-by: Junhao He hejunhao3@huawei.com Reviewed-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Jay Fang f.fangjian@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/perf/hisilicon/hisi_pcie_pmu.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c index 2f18838754ec..e54f37a1dc92 100644 --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c @@ -696,10 +696,10 @@ static struct attribute *hisi_pcie_pmu_events_attr[] = { HISI_PCIE_PMU_EVENT_ATTR(rx_mrd_cnt, 0x10210), HISI_PCIE_PMU_EVENT_ATTR(tx_mrd_latency, 0x0011), HISI_PCIE_PMU_EVENT_ATTR(tx_mrd_cnt, 0x10011), - HISI_PCIE_PMU_EVENT_ATTR(rx_mrd_flux, 0x1005), - HISI_PCIE_PMU_EVENT_ATTR(rx_mrd_time, 0x11005), - HISI_PCIE_PMU_EVENT_ATTR(tx_mrd_flux, 0x2004), - HISI_PCIE_PMU_EVENT_ATTR(tx_mrd_time, 0x12004), + HISI_PCIE_PMU_EVENT_ATTR(rx_mrd_flux, 0x0804), + HISI_PCIE_PMU_EVENT_ATTR(rx_mrd_time, 0x10804), + HISI_PCIE_PMU_EVENT_ATTR(tx_mrd_flux, 0x0405), + HISI_PCIE_PMU_EVENT_ATTR(tx_mrd_time, 0x10405), NULL };
From: Yicong Yang yangyicong@hisilicon.com
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60FNG
--------------------------------------------------------------------------
The PMU supports selecting which part of a TLP is counted when measuring bandwidth, with the options below:
- only count the TLP headers
- only count the TLP payloads
- count both TLP headers and payloads
The current driver defaults to counting the TLP payloads only, which has the implicit side effect that traffic consisting solely of header-only TLPs yields no data.
Make this user-configurable through the "len_mode" parameter, and default to counting both TLP headers and payloads when the user does not specify a mode. Also update the documentation accordingly.
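The new field can be sketched as plain bit manipulation (illustrative only; the driver uses FIELD_PREP() with a mask covering bits [35:34] of the event control register): a len_mode of 0 from the user falls back to the header-plus-payload default.

/*
 * Illustrative only: place the 2-bit TLP length mode into bits [35:34]
 * of a 64-bit event control value, defaulting to 3 (count both TLP
 * headers and payloads) when the user did not specify a mode.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define LEN_MODE_SHIFT		34
#define LEN_MODE_MASK		(3ULL << LEN_MODE_SHIFT)
#define LEN_MODE_DEFAULT	3ULL	/* headers + payloads */

static uint64_t apply_len_mode(uint64_t reg, uint64_t len_mode)
{
	if (!len_mode)
		len_mode = LEN_MODE_DEFAULT;
	return (reg & ~LEN_MODE_MASK) | (len_mode << LEN_MODE_SHIFT);
}

int main(void)
{
	printf("unspecified  -> %#" PRIx64 "\n", apply_len_mode(0, 0));
	printf("headers only -> %#" PRIx64 "\n", apply_len_mode(0, 1));
	return 0;
}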
Signed-off-by: Yicong Yang yangyicong@hisilicon.com Signed-off-by: Junhao He hejunhao3@huawei.com Reviewed-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Jay Fang f.fangjian@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/perf/hisilicon/hisi_pcie_pmu.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c index e54f37a1dc92..cd5b719b8c2e 100644 --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c @@ -47,10 +47,14 @@ #define HISI_PCIE_EVENT_M GENMASK_ULL(15, 0) #define HISI_PCIE_THR_MODE_M GENMASK_ULL(27, 27) #define HISI_PCIE_THR_M GENMASK_ULL(31, 28) +#define HISI_PCIE_LEN_M GENMASK_ULL(35, 34) #define HISI_PCIE_TARGET_M GENMASK_ULL(52, 36) #define HISI_PCIE_TRIG_MODE_M GENMASK_ULL(53, 53) #define HISI_PCIE_TRIG_M GENMASK_ULL(59, 56)
+/* Default config of TLP length mode, will count both TLP headers and payloads */ +#define HISI_PCIE_LEN_M_DEFAULT 3ULL + #define HISI_PCIE_MAX_COUNTERS 8 #define HISI_PCIE_REG_STEP 8 #define HISI_PCIE_THR_MAX_VAL 10 @@ -91,6 +95,7 @@ HISI_PCIE_PMU_FILTER_ATTR(thr_len, config1, 3, 0); HISI_PCIE_PMU_FILTER_ATTR(thr_mode, config1, 4, 4); HISI_PCIE_PMU_FILTER_ATTR(trig_len, config1, 8, 5); HISI_PCIE_PMU_FILTER_ATTR(trig_mode, config1, 9, 9); +HISI_PCIE_PMU_FILTER_ATTR(len_mode, config1, 11, 10); HISI_PCIE_PMU_FILTER_ATTR(port, config2, 15, 0); HISI_PCIE_PMU_FILTER_ATTR(bdf, config2, 31, 16);
@@ -218,8 +223,8 @@ static void hisi_pcie_pmu_config_filter(struct perf_event *event) { struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu); struct hw_perf_event *hwc = &event->hw; + u64 port, trig_len, thr_len, len_mode; u64 reg = HISI_PCIE_INIT_SET; - u64 port, trig_len, thr_len;
/* Config HISI_PCIE_EVENT_CTRL according to event. */ reg |= FIELD_PREP(HISI_PCIE_EVENT_M, hisi_pcie_get_real_event(event)); @@ -248,6 +253,12 @@ static void hisi_pcie_pmu_config_filter(struct perf_event *event) reg |= HISI_PCIE_THR_EN; }
+ len_mode = hisi_pcie_get_len_mode(event); + if (len_mode) + reg |= FIELD_PREP(HISI_PCIE_LEN_M, len_mode); + else + reg |= FIELD_PREP(HISI_PCIE_LEN_M, HISI_PCIE_LEN_M_DEFAULT); + hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EVENT_CTRL, hwc->idx, reg); }
@@ -714,6 +725,7 @@ static struct attribute *hisi_pcie_pmu_format_attr[] = { HISI_PCIE_PMU_FORMAT_ATTR(thr_mode, "config1:4"), HISI_PCIE_PMU_FORMAT_ATTR(trig_len, "config1:5-8"), HISI_PCIE_PMU_FORMAT_ATTR(trig_mode, "config1:9"), + HISI_PCIE_PMU_FORMAT_ATTR(len_mode, "config1:10-11"), HISI_PCIE_PMU_FORMAT_ATTR(port, "config2:0-15"), HISI_PCIE_PMU_FORMAT_ATTR(bdf, "config2:16-31"), NULL
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion
from mainline-v5.17-rc1
commit 2d86293c70750e4331e9616aded33ab6b47c299d
category: bugfix
bugzilla: 186909, https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Now that the VFS will do something with the return values from ->sync_fs, make ours pass on error codes.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Christoph Hellwig hch@lst.de Acked-by: Christian Brauner brauner@kernel.org Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_super.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 2561e95fbdd1..9c4ff38e0901 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -729,6 +729,7 @@ xfs_fs_sync_fs( int wait) { struct xfs_mount *mp = XFS_M(sb); + int error;
trace_xfs_fs_sync_fs(mp, __return_address);
@@ -738,7 +739,10 @@ xfs_fs_sync_fs( if (!wait) return 0;
- xfs_log_force(mp, XFS_LOG_SYNC); + error = xfs_log_force(mp, XFS_LOG_SYNC); + if (error) + return error; + if (laptop_mode) { /* * The disk must be active because we're syncing.
From: Brian Foster bfoster@redhat.com
mainline inclusion
from mainline-v5.18-rc2
commit f650df7171b882dca737ddbbeb414100b31f16af
category: bugfix
bugzilla: 187094, https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The filestream AG selection loop uses pagf data to aid in AG selection, which depends on pagf initialization. If the in-core structure is not initialized, the caller invokes the AGF read path to do so and carries on. If another task enters the loop and finds a pagf init already in progress, the AGF read returns -EAGAIN and the task continues the loop. This does not increment the current ag index, however, which means the task spins on the current AGF buffer until unlocked.
If the AGF read I/O submitted by the initial task happens to be delayed for whatever reason, this results in soft lockup warnings via the spinning task. This is reproduced by xfs/170. To avoid this problem, fix the AGF trylock failure path to properly iterate to the next AG. If a task iterates all AGs without making progress, the trylock behavior is dropped in favor of blocking locks and thus a soft lockup is no longer possible.
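The iteration strategy can be modelled in userspace (a toy sketch with pthread mutexes standing in for AGF buffer locks; the AG count and the already-held lock are invented for the example): a trylock failure advances the cursor instead of spinning, and only after a full pass without progress does the scan fall back to blocking locks.

/*
 * Toy model of the fixed loop: on trylock failure move on to the next
 * "AG"; after one full pass with no progress, retry with blocking locks
 * so forward progress is guaranteed and nothing spins on a busy AG.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_AGS 4

static pthread_mutex_t ag_lock[NR_AGS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

static int pick_ag(int *cursor)
{
	bool trylock = true;

	for (int scanned = 0; ; scanned++) {
		int ag = *cursor;

		*cursor = (*cursor + 1) % NR_AGS;	/* always advance, even on failure */
		if (!trylock) {
			pthread_mutex_lock(&ag_lock[ag]);
			return ag;
		}
		if (pthread_mutex_trylock(&ag_lock[ag]) == 0)
			return ag;
		if (scanned == NR_AGS - 1)
			trylock = false;		/* full pass done: block from now on */
	}
}

int main(void)
{
	int cursor = 0, ag;

	pthread_mutex_lock(&ag_lock[0]);	/* pretend another task holds AG 0's AGF */
	ag = pick_ag(&cursor);
	printf("picked AG %d\n", ag);

	pthread_mutex_unlock(&ag_lock[ag]);
	pthread_mutex_unlock(&ag_lock[0]);
	return 0;
}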
Fixes: f48e2df8a877ca1c ("xfs: make xfs_*read_agf return EAGAIN to ALLOC_FLAG_TRYLOCK callers") Signed-off-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_filestream.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c index db23e455eb91..bc41ec0c483d 100644 --- a/fs/xfs/xfs_filestream.c +++ b/fs/xfs/xfs_filestream.c @@ -128,11 +128,12 @@ xfs_filestream_pick_ag( if (!pag->pagf_init) { err = xfs_alloc_pagf_init(mp, NULL, ag, trylock); if (err) { - xfs_perag_put(pag); - if (err != -EAGAIN) + if (err != -EAGAIN) { + xfs_perag_put(pag); return err; + } /* Couldn't lock the AGF, skip this AG. */ - continue; + goto next_ag; } }
From: Eric Sandeen sandeen@redhat.com
mainline inclusion
from mainline-v5.11-rc1
commit 207ddc0ef4f413ab1f4e0c1fcab2226425dec293
category: bugfix
bugzilla: 187102, https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
We don't yet support dax on reflinked files, but that is in the works.
Further, having the flag set does not automatically mean that the inode is actually "in the CPU direct access state," which depends on several other conditions in addition to the flag being set.
As such, we should not catch this as corruption in the verifier - simply not actually enabling S_DAX on reflinked files is enough for now.
Fixes: 4f435ebe7d04 ("xfs: don't mix reflink and DAX mode for now") Signed-off-by: Eric Sandeen sandeen@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de [darrick: fix the scrubber too] Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_inode_buf.c | 4 ---- fs/xfs/scrub/inode.c | 4 ---- 2 files changed, 8 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index c667c63f2cb0..4d7410e49db4 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -547,10 +547,6 @@ xfs_dinode_verify( if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags & XFS_DIFLAG_REALTIME)) return __this_address;
- /* don't let reflink and dax mix */ - if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags2 & XFS_DIFLAG2_DAX)) - return __this_address; - /* COW extent size hint validation */ fa = xfs_inode_validate_cowextsize(mp, be32_to_cpu(dip->di_cowextsize), mode, flags, flags2); diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c index 147c443a7242..357e6042d1c0 100644 --- a/fs/xfs/scrub/inode.c +++ b/fs/xfs/scrub/inode.c @@ -185,10 +185,6 @@ xchk_inode_flags2( if ((flags & XFS_DIFLAG_REALTIME) && (flags2 & XFS_DIFLAG2_REFLINK)) goto bad;
- /* dax and reflink make no sense, currently */ - if ((flags2 & XFS_DIFLAG2_DAX) && (flags2 & XFS_DIFLAG2_REFLINK)) - goto bad; - /* no bigtime iflag without the bigtime feature */ if (xfs_dinode_has_bigtime(dip) && !xfs_sb_version_hasbigtime(&mp->m_sb))
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion
from mainline-v5.16-rc2
commit 089558bc7ba785c03815a49c89e28ad9b8de51f9
category: bugfix
bugzilla: 186901, https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
As part of multiple customer escalations due to file data corruption after copy on write operations, I wrote some fstests that use fsstress to hammer on COW to shake things loose. Regrettably, I caught some filesystem shutdowns due to incorrect rmap operations with the following loop:
mount <filesystem>				# (0)
fsstress <run only readonly ops> &		# (1)
while true; do
	fsstress <run all ops>
	mount -o remount,ro			# (2)
	fsstress <run only readonly ops>
	mount -o remount,rw			# (3)
done
When (2) happens, notice that (1) is still running. xfs_remount_ro will call xfs_blockgc_stop to walk the inode cache to free all the COW extents, but the blockgc mechanism races with (1)'s reader threads to take IOLOCKs and loses, which means that it doesn't clean them all out. Call such a file (A).
When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which walks the ondisk refcount btree and frees any COW extent that it finds. This function does not check the inode cache, which means that the incore COW forks of inode (A) are now inconsistent with the ondisk metadata. If one of those former COW extents is allocated and mapped into another file (B) and someone triggers a COW to the stale reservation in (A), A's dirty data will be written into (B) and once that's done, those blocks will be transferred to (A)'s data fork without bumping the refcount.
The results are catastrophic -- file (B) and the refcount btree are now corrupt. Solve this race by forcing the xfs_blockgc_free_space to run synchronously, which causes xfs_icwalk to return to inodes that were skipped because the blockgc code couldn't take the IOLOCK. This is safe to do here because the VFS has already prohibited new writer threads.
Fixes: 10ddf64e420f ("xfs: remove leftover CoW reservations when remounting ro") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Reviewed-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_super.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 9c4ff38e0901..fd2cb3393747 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1778,7 +1778,10 @@ static int xfs_remount_ro( struct xfs_mount *mp) { - int error; + struct xfs_icwalk icw = { + .icw_flags = XFS_ICWALK_FLAG_SYNC, + }; + int error;
/* * Cancel background eofb scanning so it cannot race with the final @@ -1786,8 +1789,13 @@ xfs_remount_ro( */ xfs_blockgc_stop(mp);
- /* Get rid of any leftover CoW reservations... */ - error = xfs_blockgc_free_space(mp, NULL); + /* + * Clear out all remaining COW staging extents and speculative post-EOF + * preallocations so that we don't leave inodes requiring inactivation + * cleanups during reclaim on a read-only mount. We must process every + * cached inode, so this requires a synchronous cache scan. + */ + error = xfs_blockgc_free_space(mp, &icw); if (error) { xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); return error;
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion
from mainline-v5.16-rc5
commit 7993f1a431bc5271369d359941485a9340658ac3
category: bugfix
bugzilla: 186901, https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
As part of multiple customer escalations due to file data corruption after copy on write operations, I wrote some fstests that use fsstress to hammer on COW to shake things loose. Regrettably, I caught some filesystem shutdowns due to incorrect rmap operations with the following loop:
mount <filesystem>				# (0)
fsstress <run only readonly ops> &		# (1)
while true; do
	fsstress <run all ops>
	mount -o remount,ro			# (2)
	fsstress <run only readonly ops>
	mount -o remount,rw			# (3)
done
When (2) happens, notice that (1) is still running. xfs_remount_ro will call xfs_blockgc_stop to walk the inode cache to free all the COW extents, but the blockgc mechanism races with (1)'s reader threads to take IOLOCKs and loses, which means that it doesn't clean them all out. Call such a file (A).
When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which walks the ondisk refcount btree and frees any COW extent that it finds. This function does not check the inode cache, which means that the incore COW forks of inode (A) are now inconsistent with the ondisk metadata. If one of those former COW extents is allocated and mapped into another file (B) and someone triggers a COW to the stale reservation in (A), A's dirty data will be written into (B) and once that's done, those blocks will be transferred to (A)'s data fork without bumping the refcount.
The results are catastrophic -- file (B) and the refcount btree are now corrupt. In the first patch, we fixed the race condition in (2) so that (A) will always flush the COW fork. In this second patch, we move the _recover_cow call to the initial mount call in (0) for safety.
As mentioned previously, xfs_reflink_recover_cow walks the refcount btree looking for COW staging extents, and frees them. This was intended to be run at mount time (when we know there are no live inodes) to clean up any leftover staging events that may have been left behind during an unclean shutdown. As a time "optimization" for readonly mounts, we deferred this to the ro->rw transition, not realizing that any failure to clean all COW forks during a rw->ro transition would result in catastrophic corruption.
Therefore, remove this optimization and only run the recovery routine when we're guaranteed not to have any COW staging extents anywhere, which means we always run this at mount time. While we're at it, move the callsite to xfs_log_mount_finish because any refcount btree expansion (however unlikely given that we're removing records from the right side of the index) must be fed by a per-AG reservation, which doesn't exist in its current location.
Fixes: 174edb0e46e5 ("xfs: store in-progress CoW allocations in the refcount btree") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Chandan Babu R chandan.babu@oracle.com Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_log_recover.c | 23 +++++++++++++++++++++++ fs/xfs/xfs_mount.c | 10 ---------- fs/xfs/xfs_reflink.c | 5 ++++- fs/xfs/xfs_super.c | 9 --------- 4 files changed, 27 insertions(+), 20 deletions(-)
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index f3e7016823e8..83afe5bc0872 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -25,6 +25,7 @@ #include "xfs_icache.h" #include "xfs_error.h" #include "xfs_buf_item.h" +#include "xfs_reflink.h"
#define BLK_AVG(blk1, blk2) ((blk1+blk2) >> 1)
@@ -3465,6 +3466,28 @@ xlog_recover_finish(
xlog_recover_process_iunlinks(log); xlog_recover_check_summary(log); + + /* + * Recover any CoW staging blocks that are still referenced by the + * ondisk refcount metadata. During mount there cannot be any live + * staging extents as we have not permitted any user modifications. + * Therefore, it is safe to free them all right now, even on a + * read-only mount. + */ + error = xfs_reflink_recover_cow(log->l_mp); + if (error) { + xfs_alert(log->l_mp, + "Failed to recover leftover CoW staging extents, err %d.", + error); + /* + * If we get an error here, make sure the log is shut down + * but return zero so that any log items committed since the + * end of intents processing can be pushed through the CIL + * and AIL. + */ + xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR); + } + return 0; }
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 1f8ba6f40654..959425cfb612 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -1040,15 +1040,6 @@ xfs_mountfs( xfs_warn(mp, "Unable to allocate reserve blocks. Continuing without reserve pool.");
- /* Recover any CoW blocks that never got remapped. */ - error = xfs_reflink_recover_cow(mp); - if (error) { - xfs_err(mp, - "Error %d recovering leftover CoW allocations.", error); - xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); - goto out_quota; - } - /* Reserve AG blocks for future btree expansion. */ error = xfs_fs_reserve_ag_blocks(mp); if (error && error != -ENOSPC) @@ -1059,7 +1050,6 @@ xfs_mountfs(
out_agresv: xfs_fs_unreserve_ag_blocks(mp); - out_quota: xfs_qm_unmount_quotas(mp); out_rtunmount: xfs_rtunmount_inodes(mp); diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 5e3f00f1192a..6c8492b16c7b 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -744,7 +744,10 @@ xfs_reflink_end_cow( }
/* - * Free leftover CoW reservations that didn't get cleaned out. + * Free all CoW staging blocks that are still referenced by the ondisk refcount + * metadata. The ondisk metadata does not track which inode created the + * staging extent, so callers must ensure that there are no cached inodes with + * live CoW staging extents. */ int xfs_reflink_recover_cow( diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index fd2cb3393747..bf65e2e50ab7 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1752,15 +1752,6 @@ xfs_remount_rw( */ xfs_restore_resvblks(mp); xfs_log_work_queue(mp); - - /* Recover any CoW blocks that never got remapped. */ - error = xfs_reflink_recover_cow(mp); - if (error) { - xfs_err(mp, - "Error %d recovering leftover CoW allocations.", error); - xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); - return error; - } xfs_blockgc_start(mp);
/* Create the per-AG metadata reservation pool .*/
From: Dave Chinner dchinner@redhat.com
mainline inclusion
from mainline-v5.16-rc5
commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
category: bugfix
bugzilla: 187526, https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Oh, let me count the ways that the kvmalloc API sucks dog eggs.
The problem is when we are logging lots of large objects, we hit kvmalloc really damn hard with costly order allocations, and behaviour utterly sucks:
- 49.73% xlog_cil_commit - 31.62% kvmalloc_node - 29.96% __kmalloc_node - 29.38% kmalloc_large_node - 29.33% __alloc_pages - 24.33% __alloc_pages_slowpath.constprop.0 - 18.35% __alloc_pages_direct_compact - 17.39% try_to_compact_pages - compact_zone_order - 15.26% compact_zone 5.29% __pageblock_pfn_to_page 3.71% PageHuge - 1.44% isolate_migratepages_block 0.71% set_pfnblock_flags_mask 1.11% get_pfnblock_flags_mask - 0.81% get_page_from_freelist - 0.59% _raw_spin_lock_irqsave - do_raw_spin_lock __pv_queued_spin_lock_slowpath - 3.24% try_to_free_pages - 3.14% shrink_node - 2.94% shrink_slab.constprop.0 - 0.89% super_cache_count - 0.66% xfs_fs_nr_cached_objects - 0.65% xfs_reclaim_inodes_count 0.55% xfs_perag_get_tag 0.58% kfree_rcu_shrink_count - 2.09% get_page_from_freelist - 1.03% _raw_spin_lock_irqsave - do_raw_spin_lock __pv_queued_spin_lock_slowpath - 4.88% get_page_from_freelist - 3.66% _raw_spin_lock_irqsave - do_raw_spin_lock __pv_queued_spin_lock_slowpath - 1.63% __vmalloc_node - __vmalloc_node_range - 1.10% __alloc_pages_bulk - 0.93% __alloc_pages - 0.92% get_page_from_freelist - 0.89% rmqueue_bulk - 0.69% _raw_spin_lock - do_raw_spin_lock __pv_queued_spin_lock_slowpath 13.73% memcpy_erms - 2.22% kvfree
On this workload, that's almost a dozen CPUs all trying to compact and reclaim memory inside kvmalloc_node at the same time. Yet it is regularly falling back to vmalloc despite all that compaction, page and shrinker reclaim that direct reclaim is doing. Copying all the metadata is taking far less CPU time than allocating the storage!
Direct reclaim should be considered extremely harmful.
This is a high frequency, high throughput, CPU usage and latency sensitive allocation. We've got memory there, and we're using kvmalloc to allow memory allocation to avoid doing lots of work to try to do contiguous allocations.
Except it still does *lots of costly work* that is unnecessary.
Worse: the only way to avoid the slowpath page allocation trying to do compaction on costly allocations is to turn off direct reclaim (i.e. remove __GFP_RECLAIM_DIRECT from the gfp flags).
Unfortunately, the stupid kvmalloc API then says "oh, this isn't a GFP_KERNEL allocation context, so you only get kmalloc!". This cuts off the vmalloc fallback, and this leads to almost instant OOM problems which ends up in filesystems deadlocks, shutdowns and/or kernel crashes.
I want some basic kvmalloc behaviour:
- kmalloc for a contiguous range with fail fast semantics - no compaction direct reclaim if the allocation enters the slow path. - run normal vmalloc (i.e. GFP_KERNEL) if kmalloc fails
The really, really stupid part about this is these kvmalloc() calls are run under memalloc_nofs task context, so all the allocations are always reduced to GFP_NOFS regardless of the fact that kvmalloc requires GFP_KERNEL to be passed in. IOWs, we're already telling kvmalloc to behave differently to the gfp flags we pass in, but it still won't allow vmalloc to be run with anything other than GFP_KERNEL.
So, this patch open codes the kvmalloc() in the commit path to have the above described behaviour. The result is we more than halve the CPU time spend doing kvmalloc() in this path and transaction commits with 64kB objects in them more than doubles. i.e. we get ~5x reduction in CPU usage per costly-sized kvmalloc() invocation and the profile looks like this:
- 37.60% xlog_cil_commit 16.01% memcpy_erms - 8.45% __kmalloc - 8.04% kmalloc_order_trace - 8.03% kmalloc_order - 7.93% alloc_pages - 7.90% __alloc_pages - 4.05% __alloc_pages_slowpath.constprop.0 - 2.18% get_page_from_freelist - 1.77% wake_all_kswapds .... - __wake_up_common_lock - 0.94% _raw_spin_lock_irqsave - 3.72% get_page_from_freelist - 2.43% _raw_spin_lock_irqsave - 5.72% vmalloc - 5.72% __vmalloc_node_range - 4.81% __get_vm_area_node.constprop.0 - 3.26% alloc_vmap_area - 2.52% _raw_spin_lock - 1.46% _raw_spin_lock 0.56% __alloc_pages_bulk - 4.66% kvfree - 3.25% vfree - __vfree - 3.23% __vunmap - 1.95% remove_vm_area - 1.06% free_vmap_area_noflush - 0.82% _raw_spin_lock - 0.68% _raw_spin_lock - 0.92% _raw_spin_lock - 1.40% kfree - 1.36% __free_pages - 1.35% __free_pages_ok - 1.02% _raw_spin_lock_irqsave
It's worth noting that over 50% of the CPU time spent allocating these shadow buffers is now spent on spinlocks. So the shadow buffer allocation overhead is greatly reduced by getting rid of direct reclaim from kmalloc, and could probably be made even less costly if vmalloc() didn't use global spinlocks to protect its structures.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_log_cil.c | 46 +++++++++++++++++++++++++++++++++----------- 1 file changed, 35 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index ec9bef3670f2..c5118801218b 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -102,6 +102,39 @@ xlog_cil_iovec_space( sizeof(uint64_t)); }
+/* + * shadow buffers can be large, so we need to use kvmalloc() here to ensure + * success. Unfortunately, kvmalloc() only allows GFP_KERNEL contexts to fall + * back to vmalloc, so we can't actually do anything useful with gfp flags to + * control the kmalloc() behaviour within kvmalloc(). Hence kmalloc() will do + * direct reclaim and compaction in the slow path, both of which are + * horrendously expensive. We just want kmalloc to fail fast and fall back to + * vmalloc if it can't get somethign straight away from the free lists or buddy + * allocator. Hence we have to open code kvmalloc outselves here. + * + * Also, we are in memalloc_nofs_save task context here, so despite the use of + * GFP_KERNEL here, we are actually going to be doing GFP_NOFS allocations. This + * is actually the only way to make vmalloc() do GFP_NOFS allocations, so lets + * just all pretend this is a GFP_KERNEL context operation.... + */ +static inline void * +xlog_cil_kvmalloc( + size_t buf_size) +{ + gfp_t flags = GFP_KERNEL; + void *p; + + flags &= ~__GFP_DIRECT_RECLAIM; + flags |= __GFP_NOWARN | __GFP_NORETRY; + do { + p = kmalloc(buf_size, flags); + if (!p) + p = vmalloc(buf_size); + } while (!p); + + return p; +} + /* * Allocate or pin log vector buffers for CIL insertion. * @@ -203,25 +236,16 @@ xlog_cil_alloc_shadow_bufs( */ if (!lip->li_lv_shadow || buf_size > lip->li_lv_shadow->lv_size) { - /* * We free and allocate here as a realloc would copy - * unnecessary data. We don't use kmem_zalloc() for the + * unnecessary data. We don't use kvzalloc() for the * same reason - we don't need to zero the data area in * the buffer, only the log vector header and the iovec * storage. */ kmem_free(lip->li_lv_shadow); + lv = xlog_cil_kvmalloc(buf_size);
- /* - * We are in transaction context, which means this - * allocation will pick up GFP_NOFS from the - * memalloc_nofs_save/restore context the transaction - * holds. This means we can use GFP_KERNEL here so the - * generic kvmalloc() code will run vmalloc on - * contiguous page allocation failure as we require. - */ - lv = kvmalloc(buf_size, GFP_KERNEL); memset(lv, 0, xlog_cil_iovec_space(niovecs));
lv->lv_item = lip;
From: Yang Xu xuyang2018.jy@fujitsu.com
stable inclusion
from stable-v5.10.128
commit 1e76bd4c67224a645558314c0097d5b5a338bba9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PBNO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit a1de97fe296c52eafc6590a3506f4bbd44ecb19a upstream.
When testing xfstests xfs/126 on the latest upstream kernel, it hangs on some machines. By adding a getxattr operation after the xattr is corrupted, I can reproduce it 100%.
The deadlock as below: [983.923403] task:setfattr state:D stack: 0 pid:17639 ppid: 14687 flags:0x00000080 [ 983.923405] Call Trace: [ 983.923410] __schedule+0x2c4/0x700 [ 983.923412] schedule+0x37/0xa0 [ 983.923414] schedule_timeout+0x274/0x300 [ 983.923416] __down+0x9b/0xf0 [ 983.923451] ? xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs] [ 983.923453] down+0x3b/0x50 [ 983.923471] xfs_buf_lock+0x33/0xf0 [xfs] [ 983.923490] xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs] [ 983.923508] xfs_buf_get_map+0x4c/0x320 [xfs] [ 983.923525] xfs_buf_read_map+0x53/0x310 [xfs] [ 983.923541] ? xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923560] xfs_trans_read_buf_map+0x1cf/0x360 [xfs] [ 983.923575] ? xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923590] xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923606] xfs_da3_node_read+0x1f/0x40 [xfs] [ 983.923621] xfs_da3_node_lookup_int+0x69/0x4a0 [xfs] [ 983.923624] ? kmem_cache_alloc+0x12e/0x270 [ 983.923637] xfs_attr_node_hasname+0x6e/0xa0 [xfs] [ 983.923651] xfs_has_attr+0x6e/0xd0 [xfs] [ 983.923664] xfs_attr_set+0x273/0x320 [xfs] [ 983.923683] xfs_xattr_set+0x87/0xd0 [xfs] [ 983.923686] __vfs_removexattr+0x4d/0x60 [ 983.923688] __vfs_removexattr_locked+0xac/0x130 [ 983.923689] vfs_removexattr+0x4e/0xf0 [ 983.923690] removexattr+0x4d/0x80 [ 983.923693] ? __check_object_size+0xa8/0x16b [ 983.923695] ? strncpy_from_user+0x47/0x1a0 [ 983.923696] ? getname_flags+0x6a/0x1e0 [ 983.923697] ? _cond_resched+0x15/0x30 [ 983.923699] ? __sb_start_write+0x1e/0x70 [ 983.923700] ? mnt_want_write+0x28/0x50 [ 983.923701] path_removexattr+0x9b/0xb0 [ 983.923702] __x64_sys_removexattr+0x17/0x20 [ 983.923704] do_syscall_64+0x5b/0x1a0 [ 983.923705] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 983.923707] RIP: 0033:0x7f080f10ee1b
When getxattr calls the xfs_attr_node_get function, xfs_da3_node_lookup_int fails with EFSCORRUPTED in xfs_attr_node_hasname because we have used blocktrash to randomize (corrupt) it in xfs/126. So the state is freed internally and xfs_attr_node_get doesn't do its xfs_buf/transaction release job.
The subsequent removexattr then hangs because of it.
This bug was introduced by kernel commit 07120f1abdff ("xfs: Add xfs_has_attr and subroutines"). It adds the xfs_attr_node_hasname helper and states that the caller will be responsible for freeing the state in this case. But xfs_attr_node_hasname will free the state itself, instead of leaving it to the caller, if xfs_da3_node_lookup_int fails.
Fix this bug by moving the freeing of the state into the caller.
[amir: this text from original commit is not relevant for 5.10 backport: Also, use "goto error/out" instead of returning error directly in xfs_attr_node_addname_find_attr and xfs_attr_node_removename_setup function because we should free state ourselves. ]
Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines") Signed-off-by: Yang Xu xuyang2018.jy@fujitsu.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Chen Jiahao chenjiahao16@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_attr.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c index 60cf08e0cb5e..909980802f19 100644 --- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -864,21 +864,18 @@ xfs_attr_node_hasname(
state = xfs_da_state_alloc(args); if (statep != NULL) - *statep = NULL; + *statep = state;
/* * Search to see if name exists, and get back a pointer to it. */ error = xfs_da3_node_lookup_int(state, &retval); - if (error) { - xfs_da_state_free(state); - return error; - } + if (error) + retval = error;
- if (statep != NULL) - *statep = state; - else + if (!statep) xfs_da_state_free(state); + return retval; }
From: Dave Chinner dchinner@redhat.com
stable inclusion from stable-v5.10.128 commit 6b734f7b7071859f582b5acb95abb97e1276a030 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PBNO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 09654ed8a18cfd45027a67d6cbca45c9ea54feab upstream.
Got a report that a repeated crash test of a container host would eventually fail with a log recovery error preventing the system from mounting the root filesystem. It manifested as a directory leaf node corruption on writeback like so:
XFS (loop0): Mounting V5 Filesystem XFS (loop0): Starting recovery (logdev: internal) XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158 XFS (loop0): Unmount and run xfs_repair XFS (loop0): First 128 bytes of corrupted metadata buffer: 00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b ........=....... 00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc .......X...).... 00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23 ..x..~J}.S...G.# 00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00 .........C...... 00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a ................ 00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50 .5y....0.......P 00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4 .@.......A...... 00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c .b.......P!A.... XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514). Shutting down. XFS (loop0): Please unmount the filesystem and rectify the problem(s) XFS (loop0): log mount/recovery failed: error -117 XFS (loop0): log mount failed
Tracing indicated that we were recovering changes from a transaction at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57. That is, log recovery was overwriting a buffer with newer changes on disk than was in the transaction. Tracing indicated that we were hitting the "recovery immediately" case in xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the buffer.
The code was extracting the LSN correctly, then ignoring it because the UUID in the buffer did not match the superblock UUID. The problem arises because the UUID check uses the wrong UUID - it should be checking the sb_meta_uuid, not sb_uuid. This filesystem has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the correct matching sb_meta_uuid in it, it's just the code checked it against the wrong superblock uuid.
There is no corruption in the filesystem, and failing to recover the buffer due to a write verifier failure means the recovery bug did not propagate the corruption to disk. Hence there is no corruption before or after this bug has manifested; the impact is limited simply to an unmountable filesystem....
This was missed back in 2015 during an audit of incorrect sb_uuid usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs to validate against sb_meta_uuid") that fixed the magic32 buffers to validate against sb_meta_uuid instead of sb_uuid. It missed the magicda buffers....
Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag") Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Chen Jiahao chenjiahao16@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_buf_item_recover.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c index 4775485b4062..aa4d45701de5 100644 --- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -816,7 +816,7 @@ xlog_recover_get_buf_lsn( }
if (lsn != (xfs_lsn_t)-1) { - if (!uuid_equal(&mp->m_sb.sb_uuid, uuid)) + if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid)) goto recover_immediately; return lsn; }
From: Zhang Yi yi.zhang@huawei.com
mainline inclusion from mainline-v5.19-rc2 commit 04a98a036cf8b810dda172a9dcfcbd783bf63655 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
In the procedure of recovering AGI unlinked lists, if something bad happens on one of the unlinked inodes in the bucket list, we would call xlog_recover_clear_agi_bucket() to clear the whole unlinked bucket list, not just the unlinked inodes after the bad one. If we have already added some inodes to the gc workqueue before the bad inode in the list, we could get the error below when freeing those inodes, and finally fail to complete the log recovery procedure.
XFS (ram0): Internal error xfs_iunlink_remove at line 2456 of file fs/xfs/xfs_inode.c. Caller xfs_ifree+0xb0/0x360 [xfs]
The problem is that xlog_recover_clear_agi_bucket() clears the bucket list, so the gc worker fails the agino check in xfs_verify_agino(). Fix this by flushing the workqueue before clearing the bucket.
Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues") Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_log_recover.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 83afe5bc0872..88b48aed446a 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -2715,6 +2715,7 @@ xlog_recover_process_one_iunlink( * Call xlog_recover_clear_agi_bucket() to perform a transaction to * clear the inode pointer in the bucket. */ + xfs_inodegc_flush(mp); xlog_recover_clear_agi_bucket(mp, agno, bucket); return NULLAGINO; }
From: Chandan Babu R chandanrlinux@gmail.com
mainline inclusion from mainline-v5.12-rc1 commit b9b7e1dc56c5ca8d6fc37c410b054e9f26737d2e category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
XFS does not check for possible overflow of per-inode extent counter fields when adding extents to either data or attr fork.
For example:
1. Insert 5 million xattrs (each having a value size of 255 bytes) and then delete 50% of them in an alternating manner.
2. On a 4k block sized XFS filesystem instance, the above causes 98511 extents to be created in the attr fork of the inode.
xfsaild/loop0 2008 [003] 1475.127209: probe:xfs_inode_to_disk: (ffffffffa43fb6b0) if_nextents=98511 i_ino=131
3. The incore inode fork extent counter is a signed 32-bit quantity. However the on-disk extent counter is an unsigned 16-bit quantity and hence cannot hold 98511 extents.
4. The following incorrect value is stored in the attr extent counter:

# xfs_db -f -c 'inode 131' -c 'print core.naextents' /dev/loop0
core.naextents = -32561
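To make the truncation concrete, here is a small stand-alone C sketch (an illustration added alongside this series, not part of the patch) showing how 98511 ends up reported as -32561 when squeezed into a 16-bit on-disk field:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t incore = 98511;		/* if_nextents observed above */
	uint32_t low16 = incore & 0xFFFF;	/* 0x180CF -> 0x80CF = 32975 */
	/* interpret the low 16 bits as a signed value, as xfs_db does */
	int32_t ondisk = low16 >= 0x8000 ? (int32_t)low16 - 0x10000 : (int32_t)low16;

	printf("%d\n", ondisk);			/* prints -32561 */
	return 0;
}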
This commit adds a new helper function (i.e. xfs_iext_count_may_overflow()) to check for overflow of the per-inode data and xattr extent counters. Future patches will use this function to make sure that an FS operation won't cause the extent counter to overflow.
Suggested-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Chandan Babu R chandanrlinux@gmail.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_inode_fork.c | 23 +++++++++++++++++++++++ fs/xfs/libxfs/xfs_inode_fork.h | 2 ++ 2 files changed, 25 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 7575de5cecb1..8d48716547e5 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -23,6 +23,7 @@ #include "xfs_da_btree.h" #include "xfs_dir2_priv.h" #include "xfs_attr_leaf.h" +#include "xfs_types.h"
kmem_zone_t *xfs_ifork_zone;
@@ -728,3 +729,25 @@ xfs_ifork_verify_local_attr(
return 0; } + +int +xfs_iext_count_may_overflow( + struct xfs_inode *ip, + int whichfork, + int nr_to_add) +{ + struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + uint64_t max_exts; + uint64_t nr_exts; + + if (whichfork == XFS_COW_FORK) + return 0; + + max_exts = (whichfork == XFS_ATTR_FORK) ? MAXAEXTNUM : MAXEXTNUM; + + nr_exts = ifp->if_nextents + nr_to_add; + if (nr_exts < ifp->if_nextents || nr_exts > max_exts) + return -EFBIG; + + return 0; +} diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index a4953e95c4f3..0beb8e2a00be 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -172,5 +172,7 @@ extern void xfs_ifork_init_cow(struct xfs_inode *ip);
int xfs_ifork_verify_local_data(struct xfs_inode *ip); int xfs_ifork_verify_local_attr(struct xfs_inode *ip); +int xfs_iext_count_may_overflow(struct xfs_inode *ip, int whichfork, + int nr_to_add);
#endif /* __XFS_INODE_FORK_H__ */
From: Chandan Babu R chandanrlinux@gmail.com
mainline inclusion from mainline-v5.12-rc1 commit 727e1acd297cae15449607d6e2ee39c71216cf1a category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When adding a new data extent (without modifying an inode's existing extents) the extent count increases only by 1. This commit checks for extent count overflow in such cases.
Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Allison Henderson allison.henderson@oracle.com Signed-off-by: Chandan Babu R chandanrlinux@gmail.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com
Conflict: commit 3de4eb106fcc ("xfs: allow reservation of rtblocks with xfs_trans_alloc_inode") is backported already, which introduces some conflicts in the code context. Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_bmap.c | 6 ++++++ fs/xfs/libxfs/xfs_inode_fork.h | 6 ++++++ fs/xfs/xfs_bmap_item.c | 7 +++++++ fs/xfs/xfs_bmap_util.c | 4 ++++ fs/xfs/xfs_dquot.c | 8 +++++++- fs/xfs/xfs_iomap.c | 5 +++++ fs/xfs/xfs_rtalloc.c | 5 +++++ 7 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index e6bb7b928b38..07596edbfb38 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4516,6 +4516,12 @@ xfs_bmapi_convert_delalloc( return error;
xfs_ilock(ip, XFS_ILOCK_EXCL); + + error = xfs_iext_count_may_overflow(ip, whichfork, + XFS_IEXT_ADD_NOSPLIT_CNT); + if (error) + goto out_trans_cancel; + xfs_trans_ijoin(tp, ip, 0);
if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &bma.icur, &bma.got) || diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index 0beb8e2a00be..7fc2b129a2e7 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -34,6 +34,12 @@ struct xfs_ifork { #define XFS_IFEXTENTS 0x02 /* All extent pointers are read in */ #define XFS_IFBROOT 0x04 /* i_broot points to the bmap b-tree root */
+/* + * Worst-case increase in the fork extent count when we're adding a single + * extent to a fork and there's no possibility of splitting an existing mapping. + */ +#define XFS_IEXT_ADD_NOSPLIT_CNT (1) + /* * Fork handling. */ diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 984bb480f177..adc7507947bf 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -497,6 +497,13 @@ xfs_bui_item_recover( xfs_ilock(ip, XFS_ILOCK_EXCL); xfs_trans_ijoin(tp, ip, 0);
+ if (bui_type == XFS_BMAP_MAP) { + error = xfs_iext_count_may_overflow(ip, whichfork, + XFS_IEXT_ADD_NOSPLIT_CNT); + if (error) + goto err_cancel; + } + count = bmap->me_len; error = xfs_trans_log_finish_bmap_update(tp, budp, bui_type, ip, whichfork, bmap->me_startoff, bmap->me_startblock, diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index d0113dec5d32..c727ac76d03b 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -804,6 +804,10 @@ xfs_alloc_file_space( if (error) break;
+ error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, + XFS_IEXT_ADD_NOSPLIT_CNT); + if (error) + goto error;
error = xfs_bmapi_write(tp, ip, startoffset_fsb, allocatesize_fsb, alloc_type, 0, imapp, diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index 3c31bd97b590..23366951bf95 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -314,8 +314,14 @@ xfs_dquot_disk_alloc( return -ESRCH; }
- /* Create the block mapping. */ xfs_trans_ijoin(tp, quotip, XFS_ILOCK_EXCL); + + error = xfs_iext_count_may_overflow(quotip, XFS_DATA_FORK, + XFS_IEXT_ADD_NOSPLIT_CNT); + if (error) + return error; + + /* Create the block mapping. */ error = xfs_bmapi_write(tp, quotip, dqp->q_fileoffset, XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA, 0, &map, &nmaps); diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 3b362416ddb0..ff0b092d0c37 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -241,6 +241,11 @@ xfs_iomap_write_direct( if (error) return error;
+ error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, + XFS_IEXT_ADD_NOSPLIT_CNT); + if (error) + goto out_trans_cancel; + /* * From this point onwards we overwrite the imap pointer that the * caller gave to us. diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 35aa62625bf3..8a150acecba4 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -804,6 +804,11 @@ xfs_growfs_rt_alloc( xfs_ilock(ip, XFS_ILOCK_EXCL); xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+ error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, + XFS_IEXT_ADD_NOSPLIT_CNT); + if (error) + goto out_trans_cancel; + /* * Allocate blocks to the bitmap file. */
From: Chandan Babu R chandanrlinux@gmail.com
mainline inclusion from mainline-v5.12-rc1 commit 85ef08b5a667615bc7be5058259753dc42a7adcd category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The extent mapping the file offset at which a hole has to be inserted will be split into two extents causing extent count to increase by 1.
Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Allison Henderson allison.henderson@oracle.com Signed-off-by: Chandan Babu R chandanrlinux@gmail.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com
Conflict: commit 3a1af6c317d0 ("xfs: refactor common transaction/inode/quota allocation idiom") is backported, which introduces some conflicts in the code context. Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_inode_fork.h | 7 +++++++ fs/xfs/xfs_bmap_item.c | 15 +++++++++------ fs/xfs/xfs_bmap_util.c | 10 ++++++++++ 3 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index 7fc2b129a2e7..bcac769a7df6 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -40,6 +40,13 @@ struct xfs_ifork { */ #define XFS_IEXT_ADD_NOSPLIT_CNT (1)
+/* + * Punching out an extent from the middle of an existing extent can cause the + * extent count to increase by 1. + * i.e. | Old extent | Hole | Old extent | + */ +#define XFS_IEXT_PUNCH_HOLE_CNT (1) + /* * Fork handling. */ diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index adc7507947bf..44ec0f2d5253 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -439,6 +439,7 @@ xfs_bui_item_recover( xfs_exntst_t state; unsigned int bui_type; int whichfork; + int iext_delta; int error = 0;
/* Only one mapping operation per BUI... */ @@ -497,12 +498,14 @@ xfs_bui_item_recover( xfs_ilock(ip, XFS_ILOCK_EXCL); xfs_trans_ijoin(tp, ip, 0);
- if (bui_type == XFS_BMAP_MAP) { - error = xfs_iext_count_may_overflow(ip, whichfork, - XFS_IEXT_ADD_NOSPLIT_CNT); - if (error) - goto err_cancel; - } + if (bui_type == XFS_BMAP_MAP) + iext_delta = XFS_IEXT_ADD_NOSPLIT_CNT; + else + iext_delta = XFS_IEXT_PUNCH_HOLE_CNT; + + error = xfs_iext_count_may_overflow(ip, whichfork, iext_delta); + if (error) + goto err_cancel;
count = bmap->me_len; error = xfs_trans_log_finish_bmap_update(tp, budp, bui_type, ip, diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index c727ac76d03b..202d6af3e503 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -859,6 +859,11 @@ xfs_unmap_extent( if (error) return error;
+ error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, + XFS_IEXT_PUNCH_HOLE_CNT); + if (error) + goto out_trans_cancel; + error = xfs_bunmapi(tp, ip, startoffset_fsb, len_fsb, 0, 2, done); if (error) goto out_trans_cancel; @@ -1136,6 +1141,11 @@ xfs_insert_file_space( xfs_ilock(ip, XFS_ILOCK_EXCL); xfs_trans_ijoin(tp, ip, 0);
+ error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, + XFS_IEXT_PUNCH_HOLE_CNT); + if (error) + goto out_trans_cancel; + /* * The extent shifting code works on extent granularity. So, if stop_fsb * is not the starting block of extent, we need to split the extent at
From: Chandan Babu R chandanrlinux@gmail.com
mainline inclusion from mainline-v5.12-rc1 commit f5d92749191402c50e32ac83dd9da3b910f5680f category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Directory entry addition can cause the following:
1. Data block can be added/removed. A new extent can cause extent count to increase by 1.
2. Free disk block can be added/removed. Same behaviour as described above for Data block.
3. Dabtree blocks. XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH.
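As a rough illustration added alongside this series (not part of the patch), the worst case above can be modelled in user space. XFS_DA_NODE_MAXDEPTH is 5 in the kernel headers, and fsbcount is 1 on a filesystem whose directory block size equals the filesystem block size:

#include <stdio.h>

#define XFS_DA_NODE_MAXDEPTH	5	/* value from the kernel headers */

/* mirrors XFS_IEXT_DIR_MANIP_CNT(mp): one data block, one free block,
 * plus a full dabtree path, each spanning fsbcount filesystem blocks */
static int dir_manip_cnt(int fsbcount)
{
	return (XFS_DA_NODE_MAXDEPTH + 1 + 1) * fsbcount;
}

int main(void)
{
	printf("%d\n", dir_manip_cnt(1));	/* 7 extents in the common case */
	return 0;
}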
Signed-off-by: Chandan Babu R chandanrlinux@gmail.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com
Conflict: commit 3a1af6c317d0 ("xfs: refactor common transaction/inode/quota allocation idiom") is backported, which introduces some conflicts in the code context. Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_inode_fork.h | 13 +++++++++++++ fs/xfs/xfs_inode.c | 10 ++++++++++ fs/xfs/xfs_symlink.c | 5 +++++ 3 files changed, 28 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index bcac769a7df6..ea1a9dd8a763 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -47,6 +47,19 @@ struct xfs_ifork { */ #define XFS_IEXT_PUNCH_HOLE_CNT (1)
+/* + * Directory entry addition can cause the following, + * 1. Data block can be added/removed. + * A new extent can cause extent count to increase by 1. + * 2. Free disk block can be added/removed. + * Same behaviour as described above for Data block. + * 3. Dabtree blocks. + * XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new + * extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH. + */ +#define XFS_IEXT_DIR_MANIP_CNT(mp) \ + ((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount) + /* * Fork handling. */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index ba4bde9d5fcb..31c1f8f951a0 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1171,6 +1171,11 @@ xfs_create( xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT); unlock_dp_on_error = true;
+ error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK, + XFS_IEXT_DIR_MANIP_CNT(mp)); + if (error) + goto out_trans_cancel; + /* * A newly created regular or special file just has one directory * entry pointing to them, but a directory also the "." entry @@ -1383,6 +1388,11 @@ xfs_link( xfs_trans_ijoin(tp, sip, XFS_ILOCK_EXCL); xfs_trans_ijoin(tp, tdp, XFS_ILOCK_EXCL);
+ error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK, + XFS_IEXT_DIR_MANIP_CNT(mp)); + if (error) + goto error_return; + /* * If we are using project inheritance, we only allow hard link * creation in our tree when the project IDs are the same; else diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c index 8cd3ad4a0dfb..7b9cdb1a41ff 100644 --- a/fs/xfs/xfs_symlink.c +++ b/fs/xfs/xfs_symlink.c @@ -213,6 +213,11 @@ xfs_symlink( goto out_trans_cancel; }
+ error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK, + XFS_IEXT_DIR_MANIP_CNT(mp)); + if (error) + goto out_trans_cancel; + /* * Allocate an inode for the symlink. */
From: Chandan Babu R chandanrlinux@gmail.com
mainline inclusion from mainline-v5.12-rc1 commit 0dbc5cb1a91cc8c44b1c75429f5b9351837114fd category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Directory entry removal must always succeed; hence XFS does the following during a low disk space scenario:
1. Data/Free blocks linger until a future remove operation.
2. Dabtree blocks would be swapped with the last block in the leaf space and then the new last block will be unmapped.
This facility is reused during a low inode extent count scenario, i.e. this commit causes xfs_bmap_del_extent_real() to return an -ENOSPC error code so that the above-mentioned behaviour is exercised, causing no change to the directory's extent count.
Signed-off-by: Chandan Babu R chandanrlinux@gmail.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_bmap.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 07596edbfb38..89ccf059d1aa 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -5139,6 +5139,24 @@ xfs_bmap_del_extent_real( /* * Deleting the middle of the extent. */ + + /* + * For directories, -ENOSPC is returned since a directory entry + * remove operation must not fail due to low extent count + * availability. -ENOSPC will be handled by higher layers of XFS + * by letting the corresponding empty Data/Free blocks to linger + * until a future remove operation. Dabtree blocks would be + * swapped with the last block in the leaf space and then the + * new last block will be unmapped. + */ + error = xfs_iext_count_may_overflow(ip, whichfork, 1); + if (error) { + ASSERT(S_ISDIR(VFS_I(ip)->i_mode) && + whichfork == XFS_DATA_FORK); + error = -ENOSPC; + goto done; + } + old = got;
got.br_blockcount = del->br_startoff - got.br_startoff;
From: Chandan Babu R chandanrlinux@gmail.com
mainline inclusion from mainline-v5.12-rc1 commit 02092a2f034fdeabab524ae39c2de86ba9ffa15a category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
A rename operation is essentially a directory entry remove operation from the perspective of the parent directory (i.e. src_dp) of rename's source. Hence the only place where we check for extent count overflow for src_dp is in xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real() returns -ENOSPC when it detects a possible extent count overflow and, in response, the higher layers of directory handling code do the following:
1. Data/Free blocks: XFS lets these blocks linger until a future remove operation removes them.
2. Dabtree blocks: XFS swaps the blocks with the last block in the Leaf space and unmaps the last block.
For target_dp, there are two cases depending on whether the destination directory entry exists or not.
When the destination directory entry does not exist (i.e. target_ip == NULL), the extent count overflow check is performed only when the transaction has a non-zero sized space reservation associated with it. With a zero-sized space reservation, XFS allows a rename operation to continue only when the directory has sufficient free space in its data/leaf/free space blocks to hold the new entry.
When the destination directory entry exists (i.e. target_ip != NULL), all we need to do is change the inode number associated with the already existing entry. Hence there is no need to perform an extent count overflow check.
Signed-off-by: Chandan Babu R chandanrlinux@gmail.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_bmap.c | 3 +++ fs/xfs/xfs_inode.c | 44 +++++++++++++++++++++++++++++++++++++++- 2 files changed, 46 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 89ccf059d1aa..97dbb8af9fa0 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -5148,6 +5148,9 @@ xfs_bmap_del_extent_real( * until a future remove operation. Dabtree blocks would be * swapped with the last block in the leaf space and then the * new last block will be unmapped. + * + * The above logic also applies to the source directory entry of + * a rename operation. */ error = xfs_iext_count_may_overflow(ip, whichfork, 1); if (error) { diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 31c1f8f951a0..c9cf34a4fee8 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -3310,6 +3310,35 @@ xfs_rename( /* * Check for expected errors before we dirty the transaction * so we can return an error without a transaction abort. + * + * Extent count overflow check: + * + * From the perspective of src_dp, a rename operation is essentially a + * directory entry remove operation. Hence the only place where we check + * for extent count overflow for src_dp is in + * xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real() returns + * -ENOSPC when it detects a possible extent count overflow and in + * response, the higher layers of directory handling code do the + * following: + * 1. Data/Free blocks: XFS lets these blocks linger until a + * future remove operation removes them. + * 2. Dabtree blocks: XFS swaps the blocks with the last block in the + * Leaf space and unmaps the last block. + * + * For target_dp, there are two cases depending on whether the + * destination directory entry exists or not. + * + * When destination directory entry does not exist (i.e. target_ip == + * NULL), extent count overflow check is performed only when transaction + * has a non-zero sized space reservation associated with it. With a + * zero-sized space reservation, XFS allows a rename operation to + * continue only when the directory has sufficient free space in its + * data/leaf/free space blocks to hold the new entry. + * + * When destination directory entry exists (i.e. target_ip != NULL), all + * we need to do is change the inode number associated with the already + * existing entry. Hence there is no need to perform an extent count + * overflow check. */ if (target_ip == NULL) { /* @@ -3320,6 +3349,12 @@ xfs_rename( error = xfs_dir_canenter(tp, target_dp, target_name); if (error) goto out_trans_cancel; + } else { + error = xfs_iext_count_may_overflow(target_dp, + XFS_DATA_FORK, + XFS_IEXT_DIR_MANIP_CNT(mp)); + if (error) + goto out_trans_cancel; } } else { /* @@ -3485,9 +3520,16 @@ xfs_rename( if (wip) { error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino, spaceres); - } else + } else { + /* + * NOTE: We don't need to check for extent count overflow here + * because the dir remove name code will leave the dir block in + * place if the extent count would overflow. + */ error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino, spaceres); + } + if (error) goto out_trans_cancel;
From: Chandan Babu R chandanrlinux@gmail.com
mainline inclusion from mainline-v5.12-rc1 commit 3a19bb147c72d2e9b77137bf5130b9cfb50a5eef category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to be added. One extra extent for dabtree in case a local attr is large enough to cause a double split. It can also cause extent count to increase proportional to the size of a remote xattr's value.
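A user-space sketch of the resulting XFS_IEXT_ATTR_MANIP_CNT() bound, added here only for illustration; the rmt_blks values below are made-up examples, not computed from a real geometry:

#include <stdio.h>

#define XFS_DA_NODE_MAXDEPTH	5	/* value from the kernel headers */
#define max(a, b)		((a) > (b) ? (a) : (b))

/* mirrors XFS_IEXT_ATTR_MANIP_CNT(rmt_blks): a full dabtree path plus
 * either one extra block (local attr double split) or the blocks that
 * hold a remote value */
static int attr_manip_cnt(int rmt_blks)
{
	return XFS_DA_NODE_MAXDEPTH + max(1, rmt_blks);
}

int main(void)
{
	printf("local attr:  %d\n", attr_manip_cnt(0));	/* 6 */
	printf("remote attr: %d\n", attr_manip_cnt(17));	/* 22, if the value needs 17 blocks */
	return 0;
}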
Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Allison Henderson allison.henderson@oracle.com Signed-off-by: Chandan Babu R chandanrlinux@gmail.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com
Conflict: commit 3a1af6c317d0 ("xfs: refactor common transaction/inode/quota allocation idiom") is backported, which introduces some conflicts in the code context. Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_attr.c | 12 ++++++++++++ fs/xfs/libxfs/xfs_inode_fork.h | 10 ++++++++++ 2 files changed, 22 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c index 909980802f19..5ce192d2a426 100644 --- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -396,6 +396,7 @@ xfs_attr_set( struct xfs_trans_res tres; bool rsvd = (args->attr_filter & XFS_ATTR_ROOT); int error, local; + int rmt_blks = 0; unsigned int total;
if (XFS_FORCED_SHUTDOWN(dp->i_mount)) @@ -442,11 +443,15 @@ xfs_attr_set( tres.tr_logcount = XFS_ATTRSET_LOG_COUNT; tres.tr_logflags = XFS_TRANS_PERM_LOG_RES; total = args->total; + + if (!local) + rmt_blks = xfs_attr3_rmt_blocks(mp, args->valuelen); } else { XFS_STATS_INC(mp, xs_attr_remove);
tres = M_RES(mp)->tr_attrrm; total = XFS_ATTRRM_SPACE_RES(mp); + rmt_blks = xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX); }
/* @@ -457,6 +462,13 @@ xfs_attr_set( if (error) return error;
+ if (args->value || xfs_inode_hasattr(dp)) { + error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK, + XFS_IEXT_ATTR_MANIP_CNT(rmt_blks)); + if (error) + goto out_trans_cancel; + } + if (args->value) { error = xfs_has_attr(args); if (error == -EEXIST && (args->attr_flags & XATTR_CREATE)) diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index ea1a9dd8a763..8d89838e23f8 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -60,6 +60,16 @@ struct xfs_ifork { #define XFS_IEXT_DIR_MANIP_CNT(mp) \ ((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
+/* + * Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to + * be added. One extra extent for dabtree in case a local attr is + * large enough to cause a double split. It can also cause extent + * count to increase proportional to the size of a remote xattr's + * value. + */ +#define XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) \ + (XFS_DA_NODE_MAXDEPTH + max(1, rmt_blks)) + /* * Fork handling. */
From: Chandan Babu R chandanrlinux@gmail.com
mainline inclusion from mainline-v5.12-rc1 commit c442f3086d5a108b7ff086c8ade1923a8f389db5 category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
A write to a sub-interval of an existing unwritten extent causes the original extent to be split into 3 extents i.e. | Unwritten | Real | Unwritten | Hence extent count can increase by 2.
Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Allison Henderson allison.henderson@oracle.com Signed-off-by: Chandan Babu R chandanrlinux@gmail.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com
Conflict: commit 3a1af6c317d0 ("xfs: refactor common transaction/inode/quota allocation idiom") is backported, which introduces some conflicts in the code context. Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_inode_fork.h | 9 +++++++++ fs/xfs/xfs_iomap.c | 4 ++++ 2 files changed, 13 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index 8d89838e23f8..917e289ad962 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -70,6 +70,15 @@ struct xfs_ifork { #define XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) \ (XFS_DA_NODE_MAXDEPTH + max(1, rmt_blks))
+/* + * A write to a sub-interval of an existing unwritten extent causes the original + * extent to be split into 3 extents + * i.e. | Unwritten | Real | Unwritten | + * Hence extent count can increase by 2. + */ +#define XFS_IEXT_WRITE_UNWRITTEN_CNT (2) + + /* * Fork handling. */ diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index ff0b092d0c37..cf22dad509e1 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -545,6 +545,10 @@ xfs_iomap_write_unwritten( if (error) return error;
+ error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, + XFS_IEXT_WRITE_UNWRITTEN_CNT); + if (error) + goto error_on_bmapi_transaction;
/* * Modify the unwritten extent state of the buffer.
From: Chandan Babu R chandanrlinux@gmail.com
mainline inclusion from mainline-v5.12-rc1 commit 5f1d5bbfb2e674052a9fe542f53678978af20770 category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Moving an extent to data fork can cause a sub-interval of an existing extent to be unmapped. This will increase extent count by 1. Mapping in the new extent can increase the extent count by 1 again i.e. | Old extent | New extent | Old extent | Hence number of extents increases by 2.
Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Allison Henderson allison.henderson@oracle.com Signed-off-by: Chandan Babu R chandanrlinux@gmail.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_inode_fork.h | 9 +++++++++ fs/xfs/xfs_reflink.c | 5 +++++ 2 files changed, 14 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index 917e289ad962..c8f279edc5c1 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -79,6 +79,15 @@ struct xfs_ifork { #define XFS_IEXT_WRITE_UNWRITTEN_CNT (2)
+/* + * Moving an extent to data fork can cause a sub-interval of an existing extent + * to be unmapped. This will increase extent count by 1. Mapping in the new + * extent can increase the extent count by 1 again i.e. + * | Old extent | New extent | Old extent | + * Hence number of extents increases by 2. + */ +#define XFS_IEXT_REFLINK_END_COW_CNT (2) + /* * Fork handling. */ diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 6c8492b16c7b..b85b249df989 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -615,6 +615,11 @@ xfs_reflink_end_cow_extent( xfs_ilock(ip, XFS_ILOCK_EXCL); xfs_trans_ijoin(tp, ip, 0);
+ error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, + XFS_IEXT_REFLINK_END_COW_CNT); + if (error) + goto out_cancel; + /* * In case of racing, overlapping AIO writes no COW extents might be * left by the time I/O completes for the loser of the race. In that
From: Chandan Babu R chandanrlinux@gmail.com
mainline inclusion from mainline-v5.12-rc1 commit ee898d78c3540b44270a5fdffe208d7bbb219d93 category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Remapping an extent involves unmapping the existing extent and mapping in the new extent. When unmapping, an extent containing the entire unmap range can be split into two extents, i.e. | Old extent | hole | Old extent | Hence extent count increases by 1.
Mapping in the new extent into the destination file can increase the extent count by 1.
Reviewed-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Chandan Babu R chandanrlinux@gmail.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_reflink.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index b85b249df989..5fc128bc7939 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -997,6 +997,7 @@ xfs_reflink_remap_extent( bool quota_reserved = true; bool smap_real; bool dmap_written = xfs_bmap_is_written_extent(dmap); + int iext_delta = 0; int nimaps; int error;
@@ -1107,6 +1108,16 @@ xfs_reflink_remap_extent( goto out_cancel; }
+ if (smap_real) + ++iext_delta; + + if (dmap_written) + ++iext_delta; + + error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, iext_delta); + if (error) + goto out_cancel; + if (smap_real) { /* * If the extent we're unmapping is backed by storage (written
From: Chandan Babu R chandanrlinux@gmail.com
mainline inclusion from mainline-v5.12-rc1 commit bcc561f21f115437a010307420fc43d91be91c66 category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Removing an initial range of source/donor file's extent and adding a new extent (from donor/source file) in its place will cause extent count to increase by 1.
Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Allison Henderson allison.henderson@oracle.com Signed-off-by: Chandan Babu R chandanrlinux@gmail.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_inode_fork.h | 7 +++++++ fs/xfs/xfs_bmap_util.c | 16 ++++++++++++++++ 2 files changed, 23 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index c8f279edc5c1..9e2137cd7372 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -88,6 +88,13 @@ struct xfs_ifork { */ #define XFS_IEXT_REFLINK_END_COW_CNT (2)
+/* + * Removing an initial range of source/donor file's extent and adding a new + * extent (from donor/source file) in its place will cause extent count to + * increase by 1. + */ +#define XFS_IEXT_SWAP_RMAP_CNT (1) + /* * Fork handling. */ diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 202d6af3e503..df004890c2a3 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1367,6 +1367,22 @@ xfs_swap_extent_rmap( irec.br_blockcount); trace_xfs_swap_extent_rmap_remap_piece(tip, &uirec);
+ if (xfs_bmap_is_real_extent(&uirec)) { + error = xfs_iext_count_may_overflow(ip, + XFS_DATA_FORK, + XFS_IEXT_SWAP_RMAP_CNT); + if (error) + goto out; + } + + if (xfs_bmap_is_real_extent(&irec)) { + error = xfs_iext_count_may_overflow(tip, + XFS_DATA_FORK, + XFS_IEXT_SWAP_RMAP_CNT); + if (error) + goto out; + } + /* Remove the mapping from the donor file. */ xfs_bmap_unmap_extent(tp, tip, &uirec);
From: Chandan Babu R chandanrlinux@gmail.com
mainline inclusion from mainline-v5.12-rc1 commit 5147ef30f2cd128c9eedf7a697e8cb2ce2767989 category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
With dax enabled filesystems, a direct write operation into an existing unwritten extent results in xfs_iomap_write_direct() zero-ing and converting the extent into a normal extent before the actual data is copied from the userspace buffer.
The inode extent count can increase by 2 if the extent range being written to maps to the middle of the existing unwritten extent range. Hence this commit uses XFS_IEXT_WRITE_UNWRITTEN_CNT as the extent count delta when such a write operation is being performed.
Fixes: 727e1acd297c ("xfs: Check for extent overflow when trivally adding a new extent") Reported-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandanrlinux@gmail.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_iomap.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index cf22dad509e1..31c553a49241 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -198,6 +198,7 @@ xfs_iomap_write_direct( bool force = false; int error; int bmapi_flags = XFS_BMAPI_PREALLOC; + int nr_exts = XFS_IEXT_ADD_NOSPLIT_CNT;
ASSERT(count_fsb > 0);
@@ -232,6 +233,7 @@ xfs_iomap_write_direct( bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO; if (imap->br_state == XFS_EXT_UNWRITTEN) { force = true; + nr_exts = XFS_IEXT_WRITE_UNWRITTEN_CNT; dblocks = XFS_DIOSTRAT_SPACE_RES(mp, 0) << 1; } } @@ -241,8 +243,7 @@ xfs_iomap_write_direct( if (error) return error;
- error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, - XFS_IEXT_ADD_NOSPLIT_CNT); + error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, nr_exts); if (error) goto out_trans_cancel;
From: Guo Xuenan guoxuenan@huawei.com
Offering: HULK hulk inclusion category: bugfix bugzilla: 186943,https://gitee.com/openeuler/kernel/issues/I4KIAO
--------------------------------
For a leaf dir, in most cases there should be as many bestfree slots as there are dir data blocks that can fit under i_size (except for [1]).
The root cause is that we don't examine the number of bestfree slots. When the number of slots is less than the number of dir data blocks and we need to allocate a new dir data block and update the bestfree array, we use the dir block number as an index into the bestfree array without checking the leaf buffer boundary, which may cause a UAF or other memory access problems. This issue can also be triggered with test case xfs/473 from fstests.
Considering the special case [1], only add a check of the bestfree array boundary, to avoid a UAF or slab-out-of-bounds access.
[1] https://lore.kernel.org/all/163961697197.3129691.1911552605195534271.stgit@m...
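Since the patch body is not included in this mail, the following is only a minimal user-space sketch of the kind of boundary check described above; the function and variable names are assumptions, not the actual kernel code:

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define EFSCORRUPTED	EUCLEAN		/* the errno XFS maps this error to */

/* refuse to index past the bestfree slots actually present in the leaf,
 * instead of trusting a (possibly trashed) ltail.bestcount */
static int set_bestfree(uint16_t *bestfree, unsigned int bestcount,
			unsigned int db, uint16_t best)
{
	if (db >= bestcount)
		return -EFSCORRUPTED;
	bestfree[db] = best;
	return 0;
}

int main(void)
{
	uint16_t bestfree[1] = { 0 };

	/* bestcount forged to 0, as in the reproducer below: fail cleanly */
	printf("%d\n", set_bestfree(bestfree, 0, 0, 123));
	return 0;
}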
Simplify the testcase xfs/473 with the commands below:

DEV=/dev/sdb
MP=/mnt/sdb
WORKDIR=/mnt/sdb/341

# 1. mkfs: create a new xfs image
mkfs.xfs -f ${DEV}
mount ${DEV} ${MP}
mkdir -p ${WORKDIR}

# 2. create a leaf dir with 341 entries, file name len 8
for i in `seq 1 341`
do
    touch ${WORKDIR}/$(printf "%08d" ${i})
done
inode=$(ls -i ${MP} | cut -d' ' -f1)
umount ${MP}

# 3. xfs_db: set bestcount to 0
xfs_db -x ${DEV} -c "inode ${inode}" -c "dblock 8388608" \
    -c "write ltail.bestcount 0"

mount ${DEV} ${MP}
touch ${WORKDIR}/{1..100}.txt    # 4. touch new files to reproduce
The error log is shown as follows:
Signed-off-by: Guo Xuenan guoxuenan@huawei.com
guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com 
Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan 
guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com 
Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan 
guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com 
Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan 
guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com 
Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan 
guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com 
Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan 
guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com 
Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan 
guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com 
Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com
==================================================================
BUG: KASAN: use-after-free in xfs_dir2_leaf_addname+0x1995/0x1ac0
Write of size 2 at addr ffff88810168b000 by task touch/1552

CPU: 5 PID: 1552 Comm: touch Not tainted 6.0.0-rc3+ #101
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x4d/0x66
 print_report.cold+0xf6/0x691
 kasan_report+0xa8/0x120
 xfs_dir2_leaf_addname+0x1995/0x1ac0
 xfs_dir_createname+0x58c/0x7f0
 xfs_create+0x7af/0x1010
 xfs_generic_create+0x270/0x5e0
 path_openat+0x270b/0x3450
 do_filp_open+0x1cf/0x2b0
 do_sys_openat2+0x46b/0x7a0
 do_sys_open+0xb7/0x130
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fe4d9e9312b
Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
RSP: 002b:00007ffda4c16c20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe4d9e9312b
RDX: 0000000000000941 RSI: 00007ffda4c17f33 RDI: 00000000ffffff9c
RBP: 00007ffda4c17f33 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
R13: 00007fe4d9f631a4 R14: 00007ffda4c17f33 R15: 0000000000000000
 </TASK>

The buggy address belongs to the physical page:
page:ffffea000405a2c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10168b
flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
raw: 002fffff80000000 ffffea0004057788 ffffea000402dbc8 0000000000000000
raw: 0000000000000000 0000000000170000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88810168af00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88810168af80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88810168b000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                   ^
 ffff88810168b080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88810168b100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
Disabling lock debugging due to kernel taint
00000000: 58 44 44 33 5b 53 35 c2 00 00 00 00 00 00 00 78  XDD3[S5........x
XFS (sdb): Internal error xfs_dir2_data_use_free at line 1200 of file fs/xfs/libxfs/xfs_dir2_data.c. Caller xfs_dir2_data_use_free+0x28a/0xeb0
CPU: 5 PID: 1552 Comm: touch Tainted: G B 6.0.0-rc3+
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x4d/0x66
 xfs_corruption_error+0x132/0x150
 xfs_dir2_data_use_free+0x198/0xeb0
 xfs_dir2_leaf_addname+0xa59/0x1ac0
 xfs_dir_createname+0x58c/0x7f0
 xfs_create+0x7af/0x1010
 xfs_generic_create+0x270/0x5e0
 path_openat+0x270b/0x3450
 do_filp_open+0x1cf/0x2b0
 do_sys_openat2+0x46b/0x7a0
 do_sys_open+0xb7/0x130
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fe4d9e9312b
Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
RSP: 002b:00007ffda4c16c20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe4d9e9312b
RDX: 0000000000000941 RSI: 00007ffda4c17f46 RDI: 00000000ffffff9c
RBP: 00007ffda4c17f46 R08: 0000000000000000 R09: 0000000000000001
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
R13: 00007fe4d9f631a4 R14: 00007ffda4c17f46 R15: 0000000000000000
 </TASK>
XFS (sdb): Corruption detected. Unmount and run xfs_repair
Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_dir2_leaf.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c index 95d2a3f92d75..bd1b2559e165 100644 --- a/fs/xfs/libxfs/xfs_dir2_leaf.c +++ b/fs/xfs/libxfs/xfs_dir2_leaf.c @@ -815,6 +815,18 @@ xfs_dir2_leaf_addname( */ else xfs_dir3_leaf_log_bests(args, lbp, use_block, use_block); + /* + * An abnormal corner case, bestfree count less than data + * blocks, add a condition to avoid UAF or slab-out-of bound. + */ + if ((char *)(&bestsp[use_block]) >= (char *)ltp) { + xfs_trans_brelse(tp, lbp); + if (tp->t_flags & XFS_TRANS_DIRTY) + xfs_force_shutdown(tp->t_mountp, + SHUTDOWN_CORRUPT_INCORE); + return -EFSCORRUPTED; + } + hdr = dbp->b_addr; bf = xfs_dir2_data_bestfree_p(dp->i_mount, hdr); bestsp[use_block] = bf[0].length;
From: Xiaole He hexiaole@kylinos.cn
mainline inclusion from mainline-v6.0-rc1 commit fdbae121b4369fe49eb5f8efbd23604ab4c50116 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The 'ctime', 'mtime', and 'atime' fields of an inode are of type 'xfs_timestamp_t', which is a 64-bit type:
/* fs/xfs/libxfs/xfs_format.h begin */ typedef __be64 xfs_timestamp_t; /* fs/xfs/libxfs/xfs_format.h end */
When the 'bigtime' feature is disabled, this 64-bit type is split into two 32-bit parts: one encodes the seconds since 1970-01-01 00:00:00 UTC, the other encodes the nanoseconds above those seconds. The two parts form the type 'xfs_legacy_timestamp', and the minimum and maximum time values of this type are defined by the macros 'XFS_LEGACY_TIME_MIN' and 'XFS_LEGACY_TIME_MAX':
/* fs/xfs/libxfs/xfs_format.h begin */ struct xfs_legacy_timestamp { __be32 t_sec; /* timestamp seconds */ __be32 t_nsec; /* timestamp nanoseconds */ }; #define XFS_LEGACY_TIME_MIN ((int64_t)S32_MIN) #define XFS_LEGACY_TIME_MAX ((int64_t)S32_MAX) /* fs/xfs/libxfs/xfs_format.h end */ /* include/linux/limits.h begin */ #define U32_MAX ((u32)~0U) #define S32_MAX ((s32)(U32_MAX >> 1)) #define S32_MIN ((s32)(-S32_MAX - 1)) /* include/linux/limits.h end */
'XFS_LEGACY_TIME_MIN' is the minimum time value of 'xfs_legacy_timestamp', that is, -(2^31) seconds relative to 1970-01-01 00:00:00 UTC; it can be converted to a human-friendly time value with the 'date' command:
/* command begin */ [root@~]# date --utc -d '@0' +'%Y-%m-%d %H:%M:%S' 1970-01-01 00:00:00 [root@~]# date --utc -d "@`echo '-(2^31)'|bc`" +'%Y-%m-%d %H:%M:%S' 1901-12-13 20:45:52 [root@~]# /* command end */
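For completeness, the same calculation can be reproduced in C. The snippet below is only an illustrative sketch (it assumes a 64-bit time_t, as on modern Linux userspace), not kernel code:

#include <stdio.h>
#include <stdint.h>
#include <time.h>

int main(void)
{
	/* XFS_LEGACY_TIME_MIN == (int64_t)S32_MIN == -(2^31) seconds */
	time_t legacy_min = (time_t)INT32_MIN;
	struct tm tm;
	char buf[32];

	gmtime_r(&legacy_min, &tm);	/* interpret as seconds relative to the Unix epoch */
	strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &tm);
	printf("%s UTC\n", buf);	/* prints 1901-12-13 20:45:52 UTC */
	return 0;
}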
When the 'bigtime' feature is enabled, this 64-bit type becomes a 64-bit nanoseconds counter whose start time is the minimum time value of 'xfs_legacy_timestamp' (the start time is the instant at which the 64-bit nanoseconds counter reads 0). We have already calculated that minimum time value: 1901-12-13 20:45:52 UTC. However, the comment describing the start time of inodes with the 'bigtime' feature enabled states that the value is 1901-12-31 20:45:52 UTC:
/* fs/xfs/libxfs/xfs_format.h begin */ /* * XFS Timestamps * ============== * When the bigtime feature is enabled, ondisk inode timestamps become an * unsigned 64-bit nanoseconds counter. This means that the bigtime inode * timestamp epoch is the start of the classic timestamp range, which is * Dec 31 20:45:52 UTC 1901. ... ... */ /* fs/xfs/libxfs/xfs_format.h end */
That is a typo; this patch corrects it from 'Dec 31' to 'Dec 13'.
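As a side note, the relationship between the bigtime counter and Unix time can be sketched in a few lines. This is an illustration under the definitions quoted above, not the kernel's actual encode/decode helpers:

#include <stdio.h>
#include <stdint.h>

#define XFS_LEGACY_TIME_MIN	((int64_t)INT32_MIN)	/* seconds relative to the Unix epoch */
#define NSEC_PER_SEC		1000000000ULL

/*
 * Illustrative only: an on-disk bigtime stamp counts nanoseconds from
 * Dec 13 20:45:52 UTC 1901, so converting back to Unix seconds means
 * dividing by NSEC_PER_SEC and re-adding the legacy minimum.
 */
static int64_t bigtime_to_unix_seconds(uint64_t bigtime_ns)
{
	return XFS_LEGACY_TIME_MIN + (int64_t)(bigtime_ns / NSEC_PER_SEC);
}

int main(void)
{
	/* A counter value of 0 decodes to the classic minimum timestamp. */
	printf("%lld\n", (long long)bigtime_to_unix_seconds(0));	/* -2147483648 */
	return 0;
}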
Suggested-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Xiaole He hexiaole@kylinos.cn Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_format.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index dd764da08f6f..8faa2a6069e6 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -870,7 +870,7 @@ struct xfs_agfl { * When the bigtime feature is enabled, ondisk inode timestamps become an * unsigned 64-bit nanoseconds counter. This means that the bigtime inode * timestamp epoch is the start of the classic timestamp range, which is - * Dec 31 20:45:52 UTC 1901. Because the epochs are not the same, callers + * Dec 13 20:45:52 UTC 1901. Because the epochs are not the same, callers * /must/ use the bigtime conversion functions when encoding and decoding raw * timestamps. */
From: Xiaole He hexiaole@kylinos.cn
mainline inclusion from mainline-v6.0-rc1 commit 031d166f968efba6e4f091ff75d0bb5206bb3918 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
In 'fs/xfs/libxfs/xfs_trans_resv.c', the comment for the transaction that removes a directory entry reads:
/* fs/xfs/libxfs/xfs_trans_resv.c begin */ /* * For removing a directory entry we can modify: * the parent directory inode: inode size * the removed inode: inode size ... xfs_calc_remove_reservation( struct xfs_mount *mp) { return XFS_DQUOT_LOGRES(mp) + xfs_calc_iunlink_add_reservation(mp) + max((xfs_calc_inode_res(mp, 1) + ... /* fs/xfs/libxfs/xfs_trans_resv.c end */
According to the comment, space for two inodes should be reserved, but the actual inode reservation code accounts for only one inode: 'xfs_calc_inode_res(mp, 1)' rather than 'xfs_calc_inode_res(mp, 2)'.
Signed-off-by: hexiaole hexiaole@kylinos.cn Reviewed-by: Darrick J. Wong djwong@kernel.org [djwong: remove redundant code citations] Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_trans_resv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c index ce12c8142bd1..faf9d4839816 100644 --- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -423,7 +423,7 @@ xfs_calc_remove_reservation( { return XFS_DQUOT_LOGRES(mp) + xfs_calc_iunlink_add_reservation(mp) + - max((xfs_calc_inode_res(mp, 1) + + max((xfs_calc_inode_res(mp, 2) + xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1))), (xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
From: "Darrick J. Wong" darrick.wong@oracle.com
mainline inclusion from mainline-v5.10-rc5 commit da531cc46ef16301b1bc5bc74acbaacc628904f5 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
xfs_iget can return -ENOENT for a file that the inobt thinks is allocated but has zeroed mode. This currently causes scrub to exit with an operational error instead of flagging this as a corruption. The end result is that scrub mistakenly reports the ENOENT to the user instead of "directory parent pointer corrupt" like we do for EINVAL.
Fixes: 5927268f5a04 ("xfs: flag inode corruption if parent ptr doesn't get us a real inode") Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/scrub/parent.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c index 855aa8bcab64..66c35f6dfc24 100644 --- a/fs/xfs/scrub/parent.c +++ b/fs/xfs/scrub/parent.c @@ -164,13 +164,13 @@ xchk_parent_validate( * can't use DONTCACHE here because DONTCACHE inodes can trigger * immediate inactive cleanup of the inode. * - * If _iget returns -EINVAL then the parent inode number is garbage - * and the directory is corrupt. If the _iget returns -EFSCORRUPTED - * or -EFSBADCRC then the parent is corrupt which is a cross - * referencing error. Any other error is an operational error. + * If _iget returns -EINVAL or -ENOENT then the parent inode number is + * garbage and the directory is corrupt. If the _iget returns + * -EFSCORRUPTED or -EFSBADCRC then the parent is corrupt which is a + * cross referencing error. Any other error is an operational error. */ error = xfs_iget(mp, sc->tp, dnum, XFS_IGET_UNTRUSTED, 0, &dp); - if (error == -EINVAL) { + if (error == -EINVAL || error == -ENOENT) { error = -EFSCORRUPTED; xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error); goto out;
From: Christoph Hellwig hch@lst.de
mainline inclusion from mainline-v5.11-rc4 commit f50b8f475a2c70ae8309c16b6d4ecb305a4aa9d6 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Add a helper to factor out the nowait locking logic for the read/write helpers.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Dave Chinner dchinner@redhat.com Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Guo Xuenan guoxuenan@huawei.com Conflicts: fs/xfs/xfs_file.c Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_file.c | 55 +++++++++++++++++++++++++---------------------- 1 file changed, 29 insertions(+), 26 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 80adec66744b..ebc1de5fb2d7 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -224,6 +224,23 @@ xfs_file_fsync( return error; }
+static int +xfs_ilock_iocb( + struct kiocb *iocb, + unsigned int lock_mode) +{ + struct xfs_inode *ip = XFS_I(file_inode(iocb->ki_filp)); + + if (iocb->ki_flags & IOCB_NOWAIT) { + if (!xfs_ilock_nowait(ip, lock_mode)) + return -EAGAIN; + } else { + xfs_ilock(ip, lock_mode); + } + + return 0; +} + STATIC ssize_t xfs_file_dio_aio_read( struct kiocb *iocb, @@ -240,12 +257,9 @@ xfs_file_dio_aio_read(
file_accessed(iocb->ki_filp);
- if (iocb->ki_flags & IOCB_NOWAIT) { - if (!xfs_ilock_nowait(ip, XFS_IOLOCK_SHARED)) - return -EAGAIN; - } else { - xfs_ilock(ip, XFS_IOLOCK_SHARED); - } + ret = xfs_ilock_iocb(iocb, XFS_IOLOCK_SHARED); + if (ret) + return ret; ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL, is_sync_kiocb(iocb)); xfs_iunlock(ip, XFS_IOLOCK_SHARED); @@ -267,13 +281,9 @@ xfs_file_dax_read( if (!count) return 0; /* skip atime */
- if (iocb->ki_flags & IOCB_NOWAIT) { - if (!xfs_ilock_nowait(ip, XFS_IOLOCK_SHARED)) - return -EAGAIN; - } else { - xfs_ilock(ip, XFS_IOLOCK_SHARED); - } - + ret = xfs_ilock_iocb(iocb, XFS_IOLOCK_SHARED); + if (ret) + return ret; ret = dax_iomap_rw(iocb, to, &xfs_read_iomap_ops); xfs_iunlock(ip, XFS_IOLOCK_SHARED);
@@ -292,12 +302,9 @@ xfs_file_buffered_aio_read( trace_xfs_file_buffered_read(ip, iov_iter_count(to), iocb->ki_pos); fs_file_read_do_trace(iocb);
- if (iocb->ki_flags & IOCB_NOWAIT) { - if (!xfs_ilock_nowait(ip, XFS_IOLOCK_SHARED)) - return -EAGAIN; - } else { - xfs_ilock(ip, XFS_IOLOCK_SHARED); - } + ret = xfs_ilock_iocb(iocb, XFS_IOLOCK_SHARED); + if (ret) + return ret; ret = generic_file_read_iter(iocb, to); xfs_iunlock(ip, XFS_IOLOCK_SHARED);
@@ -650,13 +657,9 @@ xfs_file_dax_write( size_t count; loff_t pos;
- if (iocb->ki_flags & IOCB_NOWAIT) { - if (!xfs_ilock_nowait(ip, iolock)) - return -EAGAIN; - } else { - xfs_ilock(ip, iolock); - } - + ret = xfs_ilock_iocb(iocb, iolock); + if (ret) + return ret; ret = xfs_file_aio_write_checks(iocb, from, &iolock); if (ret) goto out;
From: "Darrick J. Wong" darrick.wong@oracle.com
mainline inclusion from mainline-v5.10-rc5 commit 4b80ac64450f169bae364df631d233d66070a06e category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
It's possible that xfs_iget can return EINVAL for inodes that the inobt thinks are free, or ENOENT for inodes that look free. If this is the case, mark the directory corrupt immediately when we check ftype. Note that we already check the ftype of the '.' and '..' entries, so we can skip the iget part since we already know the inode type for '.' and we have a separate parent pointer scrubber for '..'.
Fixes: a5c46e5e8912 ("xfs: scrub directory metadata") Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/scrub/dir.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c index b045e95c2ea7..178b3455a170 100644 --- a/fs/xfs/scrub/dir.c +++ b/fs/xfs/scrub/dir.c @@ -66,8 +66,18 @@ xchk_dir_check_ftype( * eofblocks cleanup (which allocates what would be a nested * transaction), we can't use DONTCACHE here because DONTCACHE * inodes can trigger immediate inactive cleanup of the inode. + * + * If _iget returns -EINVAL or -ENOENT then the child inode number is + * garbage and the directory is corrupt. If the _iget returns + * -EFSCORRUPTED or -EFSBADCRC then the child is corrupt which is a + * cross referencing error. Any other error is an operational error. */ error = xfs_iget(mp, sdc->sc->tp, inum, 0, 0, &ip); + if (error == -EINVAL || error == -ENOENT) { + error = -EFSCORRUPTED; + xchk_fblock_process_error(sdc->sc, XFS_DATA_FORK, 0, &error); + goto out; + } if (!xchk_fblock_xref_process_error(sdc->sc, XFS_DATA_FORK, offset, &error)) goto out; @@ -105,6 +115,7 @@ xchk_dir_actor( struct xfs_name xname; xfs_ino_t lookup_ino; xfs_dablk_t offset; + bool checked_ftype = false; int error = 0;
sdc = container_of(dir_iter, struct xchk_dir_ctx, dir_iter); @@ -133,6 +144,7 @@ xchk_dir_actor( if (xfs_sb_version_hasftype(&mp->m_sb) && type != DT_DIR) xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset); + checked_ftype = true; if (ino != ip->i_ino) xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset); @@ -144,6 +156,7 @@ xchk_dir_actor( if (xfs_sb_version_hasftype(&mp->m_sb) && type != DT_DIR) xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset); + checked_ftype = true; if (ip->i_ino == mp->m_sb.sb_rootino && ino != ip->i_ino) xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset); @@ -167,9 +180,11 @@ xchk_dir_actor( }
/* Verify the file type. This function absorbs error codes. */ - error = xchk_dir_check_ftype(sdc, offset, lookup_ino, type); - if (error) - goto out; + if (!checked_ftype) { + error = xchk_dir_check_ftype(sdc, offset, lookup_ino, type); + if (error) + goto out; + } out: /* * A negative error code returned here is supposed to cause the
From: Christoph Hellwig hch@lst.de
mainline inclusion from mainline-v5.11-rc4 commit 354be7e3b2baf32e63c0599cc131d393591ba299 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Ensure we don't block on the iolock, or waiting for I/O in xfs_file_aio_write_checks if the caller asked to avoid that.
Fixes: 29a5d29ec181 ("xfs: nowait aio support") Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Dave Chinner dchinner@redhat.com Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_file.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index ebc1de5fb2d7..d2451de87006 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -363,7 +363,14 @@ xfs_file_aio_write_checks( if (error <= 0) return error;
- error = xfs_break_layouts(inode, iolock, BREAK_WRITE); + if (iocb->ki_flags & IOCB_NOWAIT) { + error = break_layout(inode, false); + if (error == -EWOULDBLOCK) + error = -EAGAIN; + } else { + error = xfs_break_layouts(inode, iolock, BREAK_WRITE); + } + if (error) return error;
@@ -374,7 +381,11 @@ xfs_file_aio_write_checks( if (*iolock == XFS_IOLOCK_SHARED && !IS_NOSEC(inode)) { xfs_iunlock(ip, *iolock); *iolock = XFS_IOLOCK_EXCL; - xfs_ilock(ip, *iolock); + error = xfs_ilock_iocb(iocb, *iolock); + if (error) { + *iolock = 0; + return error; + } goto restart; }
@@ -405,6 +416,10 @@ xfs_file_aio_write_checks( isize = i_size_read(inode); if (iocb->ki_pos > isize) { spin_unlock(&ip->i_flags_lock); + + if (iocb->ki_flags & IOCB_NOWAIT) + return -EAGAIN; + if (!drained_dio) { if (*iolock == XFS_IOLOCK_SHARED) { xfs_iunlock(ip, *iolock); @@ -635,7 +650,8 @@ xfs_file_dio_aio_write( &xfs_dio_write_ops, is_sync_kiocb(iocb) || unaligned_io); out: - xfs_iunlock(ip, iolock); + if (iolock) + xfs_iunlock(ip, iolock);
/* * No fallback to buffered IO after short writes for XFS, direct I/O @@ -674,7 +690,8 @@ xfs_file_dax_write( error = xfs_setfilesize(ip, pos, ret); } out: - xfs_iunlock(ip, iolock); + if (iolock) + xfs_iunlock(ip, iolock); if (error) return error;
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion from mainline-v5.11-rc4 commit 89e0eb8c13bb842e224b27d7e071262cd84717cb category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
In commit 9669f51de5c0 I tried to get rid of the undocumented cow gc lifetime knob. The knob's function was never documented and it now doesn't really have a function since eof and cow gc have been consolidated.
Regrettably, xfs/231 relies on it and regresses on for-next. I did not succeed at getting far enough through fstests patch review for the fixup to land in time.
Restore the sysctl knob, document what it did (does?), put it on the deprecation schedule, and rip out a redundant function.
Fixes: 9669f51de5c0 ("xfs: consolidate the eofblocks and cowblocks workers") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- Documentation/admin-guide/xfs.rst | 16 ++++++++------ fs/xfs/xfs_sysctl.c | 35 +++++++++++++------------------ 2 files changed, 24 insertions(+), 27 deletions(-)
diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst index 4b9096a8ee5e..82cc349e3aac 100644 --- a/Documentation/admin-guide/xfs.rst +++ b/Documentation/admin-guide/xfs.rst @@ -284,6 +284,9 @@ The following sysctls are available for the XFS filesystem: removes unused preallocation from clean inodes and releases the unused space back to the free pool.
+ fs.xfs.speculative_cow_prealloc_lifetime + This is an alias for speculative_prealloc_lifetime. + fs.xfs.error_level (Min: 0 Default: 3 Max: 11) A volume knob for error reporting when internal errors occur. This will generate detailed messages & backtraces for filesystem @@ -356,12 +359,13 @@ The following sysctls are available for the XFS filesystem: Deprecated Sysctls ==================
-=========================== ================ - Name Removal Schedule -=========================== ================ -fs.xfs.irix_sgid_inherit September 2025 -fs.xfs.irix_symlink_mode September 2025 -=========================== ================ +=========================================== ================ + Name Removal Schedule +=========================================== ================ +fs.xfs.irix_sgid_inherit September 2025 +fs.xfs.irix_symlink_mode September 2025 +fs.xfs.speculative_cow_prealloc_lifetime September 2025 +=========================================== ================
Removed Sysctls diff --git a/fs/xfs/xfs_sysctl.c b/fs/xfs/xfs_sysctl.c index 145e06c47744..546a6cd96729 100644 --- a/fs/xfs/xfs_sysctl.c +++ b/fs/xfs/xfs_sysctl.c @@ -51,7 +51,7 @@ xfs_panic_mask_proc_handler( #endif /* CONFIG_PROC_FS */
STATIC int -xfs_deprecate_irix_sgid_inherit_proc_handler( +xfs_deprecated_dointvec_minmax( struct ctl_table *ctl, int write, void *buffer, @@ -59,24 +59,8 @@ xfs_deprecate_irix_sgid_inherit_proc_handler( loff_t *ppos) { if (write) { - printk_once(KERN_WARNING - "XFS: " "%s sysctl option is deprecated.\n", - ctl->procname); - } - return proc_dointvec_minmax(ctl, write, buffer, lenp, ppos); -} - -STATIC int -xfs_deprecate_irix_symlink_mode_proc_handler( - struct ctl_table *ctl, - int write, - void *buffer, - size_t *lenp, - loff_t *ppos) -{ - if (write) { - printk_once(KERN_WARNING - "XFS: " "%s sysctl option is deprecated.\n", + printk_ratelimited(KERN_WARNING + "XFS: %s sysctl option is deprecated.\n", ctl->procname); } return proc_dointvec_minmax(ctl, write, buffer, lenp, ppos); @@ -88,7 +72,7 @@ static struct ctl_table xfs_table[] = { .data = &xfs_params.sgid_inherit.val, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = xfs_deprecate_irix_sgid_inherit_proc_handler, + .proc_handler = xfs_deprecated_dointvec_minmax, .extra1 = &xfs_params.sgid_inherit.min, .extra2 = &xfs_params.sgid_inherit.max }, @@ -97,7 +81,7 @@ static struct ctl_table xfs_table[] = { .data = &xfs_params.symlink_mode.val, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = xfs_deprecate_irix_symlink_mode_proc_handler, + .proc_handler = xfs_deprecated_dointvec_minmax, .extra1 = &xfs_params.symlink_mode.min, .extra2 = &xfs_params.symlink_mode.max }, @@ -201,6 +185,15 @@ static struct ctl_table xfs_table[] = { .extra1 = &xfs_params.blockgc_timer.min, .extra2 = &xfs_params.blockgc_timer.max, }, + { + .procname = "speculative_cow_prealloc_lifetime", + .data = &xfs_params.blockgc_timer.val, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = xfs_deprecated_dointvec_minmax, + .extra1 = &xfs_params.blockgc_timer.min, + .extra2 = &xfs_params.blockgc_timer.max, + }, /* please keep this the last entry */ #ifdef CONFIG_PROC_FS {
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion from mainline-v5.12-rc4 commit 9de4b514494a3b49fa708186c0dc4611f1fe549c category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
If scrub observes cross-referencing errors while scanning a data structure, mark the data structure sick. There's /something/ inconsistent, even if we can't really tell what it is.
Fixes: 4860a05d2475 ("xfs: scrub/repair should update filesystem metadata health") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/scrub/health.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c index 83d27cdf579b..3de59b5c2ce6 100644 --- a/fs/xfs/scrub/health.c +++ b/fs/xfs/scrub/health.c @@ -133,7 +133,8 @@ xchk_update_health( if (!sc->sick_mask) return;
- bad = (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT); + bad = (sc->sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT | + XFS_SCRUB_OFLAG_XCORRUPT)); switch (type_to_health_flag[sc->sm->sm_type].group) { case XHG_AG: pag = xfs_perag_get(sc->mp, sc->sm->sm_agno);
From: Christoph Hellwig hch@lst.de
mainline inclusion from mainline-v5.16-rc2 commit 1090427bf18f9835b3ccbd36edf43f2509444e27 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
With the removal of xfs_dqrele_all_inodes, xfs_inew_wait and all the infrastructure used to wake the XFS_INEW bit waitqueue are unused.
Reported-by: kernel test robot lkp@intel.com Fixes: 777eb1fa857e ("xfs: remove xfs_dqrele_all_inodes") Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Guo Xuenan guoxuenan@huawei.com Conflicts: fs/xfs/xfs_inode.h Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_icache.c | 21 --------------------- fs/xfs/xfs_inode.h | 4 +--- 2 files changed, 1 insertion(+), 24 deletions(-)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 986d087df226..a722ac1fd8f6 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -296,22 +296,6 @@ xfs_perag_clear_inode_tag( trace_xfs_perag_clear_inode_tag(mp, pag->pag_agno, tag, _RET_IP_); }
-static inline void -xfs_inew_wait( - struct xfs_inode *ip) -{ - wait_queue_head_t *wq = bit_waitqueue(&ip->i_flags, __XFS_INEW_BIT); - DEFINE_WAIT_BIT(wait, &ip->i_flags, __XFS_INEW_BIT); - - do { - prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE); - if (!xfs_iflags_test(ip, XFS_INEW)) - break; - schedule(); - } while (true); - finish_wait(wq, &wait.wq_entry); -} - /* * When we recycle a reclaimable inode, we need to re-initialise the VFS inode * part of the structure. This is made more complex by the fact we store @@ -375,18 +359,13 @@ xfs_iget_recycle( ASSERT(!rwsem_is_locked(&inode->i_rwsem)); error = xfs_reinit_inode(mp, inode); if (error) { - bool wake; - /* * Re-initializing the inode failed, and we are in deep * trouble. Try to re-add it to the reclaim list. */ rcu_read_lock(); spin_lock(&ip->i_flags_lock); - wake = !!__xfs_iflags_test(ip, XFS_INEW); ip->i_flags &= ~(XFS_INEW | XFS_IRECLAIM); - if (wake) - wake_up_bit(&ip->i_flags, __XFS_INEW_BIT); ASSERT(ip->i_flags & XFS_IRECLAIMABLE); spin_unlock(&ip->i_flags_lock); rcu_read_unlock(); diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 57141daff28e..ba0f57dd5392 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -221,8 +221,7 @@ static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip) #define XFS_IRECLAIM (1 << 0) /* started reclaiming this inode */ #define XFS_ISTALE (1 << 1) /* inode has been staled */ #define XFS_IRECLAIMABLE (1 << 2) /* inode can be reclaimed */ -#define __XFS_INEW_BIT 3 /* inode has just been allocated */ -#define XFS_INEW (1 << __XFS_INEW_BIT) +#define XFS_INEW (1 << 3) /* inode has just been allocated */ #define XFS_ITRUNCATED (1 << 5) /* truncated down so flush-on-close */ #define XFS_IDIRTY_RELEASE (1 << 6) /* dirty release already seen */ #define XFS_IFLUSHING (1 << 7) /* inode is being flushed */ @@ -477,7 +476,6 @@ static inline void xfs_finish_inode_setup(struct xfs_inode *ip) xfs_iflags_clear(ip, XFS_INEW); barrier(); unlock_new_inode(VFS_I(ip)); - wake_up_bit(&ip->i_flags, __XFS_INEW_BIT); }
static inline void xfs_setup_existing_inode(struct xfs_inode *ip)
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion from mainline-v5.16-rc3 commit b97cca3ba9098522e5a1c3388764ead42640c1a5 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
In commit 02b9984d6408, we pushed a sync_filesystem() call from the VFS into xfs_fs_remount. The only time that we ever need to push dirty file data or metadata to disk for a remount is if we're remounting the filesystem read only, so this really could be moved to xfs_remount_ro.
Once we've moved the call site, actually check the return value from sync_filesystem.
Fixes: 02b9984d6408 ("fs: push sync_filesystem() down to the file system's remount_fs()") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_super.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index bf65e2e50ab7..bc4ea6f13dad 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1774,6 +1774,11 @@ xfs_remount_ro( }; int error;
+ /* Flush all the dirty data to disk. */ + error = sync_filesystem(mp->m_super); + if (error) + return error; + /* * Cancel background eofb scanning so it cannot race with the final * log force+buftarg wait and deadlock the remount. @@ -1853,8 +1858,6 @@ xfs_fc_reconfigure( if (error) return error;
- sync_filesystem(mp->m_super); - /* inode32 -> inode64 */ if ((mp->m_flags & XFS_MOUNT_SMALL_INUMS) && !(new_mp->m_flags & XFS_MOUNT_SMALL_INUMS)) {
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion from mainline-v5.16-rc3 commit eba0549bc7d100691c13384b774346b8aa9cf9a9 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
There are a few places where we test the current process' capability set to decide if we're going to be more or less generous with resource acquisition for a system call. If the process doesn't have the capability, we can continue the call, albeit in a degraded mode.
These are /not/ the actual security decisions, so it's not proper to use capable(), which (in certain selinux setups) causes audit messages to get logged. Switch them to has_capability_noaudit.
Fixes: 7317a03df703f ("xfs: refactor inode ownership change transaction/inode/quota allocation idiom") Fixes: ea9a46e1c4925 ("xfs: only return detailed fsmap info if the caller has CAP_SYS_ADMIN") Signed-off-by: Darrick J. Wong djwong@kernel.org Cc: Dave Chinner david@fromorbit.com Reviewed-by: Ondrej Mosnacek omosnace@redhat.com Acked-by: Serge Hallyn serge@hallyn.com Reviewed-by: Eric Sandeen sandeen@redhat.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com
Conflicts: fs/xfs/xfs_fsmap.c Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_fsmap.c | 4 ++-- fs/xfs/xfs_ioctl.c | 2 +- fs/xfs/xfs_iops.c | 2 +- kernel/capability.c | 1 + 4 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 2d98d8cfae44..bb678a57ec9d 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -848,8 +848,8 @@ xfs_getfsmap( !xfs_getfsmap_is_valid_device(mp, &head->fmh_keys[1])) return -EINVAL;
- use_rmap = capable(CAP_SYS_ADMIN) && - xfs_sb_version_hasrmapbt(&mp->m_sb); + use_rmap = xfs_sb_version_hasrmapbt(&mp->m_sb) && + has_capability_noaudit(current, CAP_SYS_ADMIN); head->fmh_entries = 0;
/* Set up our device handlers. */ diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index bf05525ba88c..87299bab516c 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1290,7 +1290,7 @@ xfs_ioctl_setattr_get_trans( goto out_error;
error = xfs_trans_alloc_ichange(ip, NULL, NULL, pdqp, - capable(CAP_FOWNER), &tp); + has_capability_noaudit(current, CAP_FOWNER), &tp); if (error) goto out_error;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 51f84c0e9417..2bcd5b4c7b73 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -704,7 +704,7 @@ xfs_setattr_nonsize( }
error = xfs_trans_alloc_ichange(ip, udqp, gdqp, NULL, - capable(CAP_FOWNER), &tp); + has_capability_noaudit(current, CAP_FOWNER), &tp); if (error) goto out_dqrele;
diff --git a/kernel/capability.c b/kernel/capability.c index de7eac903a2a..c5e1871a0ea7 100644 --- a/kernel/capability.c +++ b/kernel/capability.c @@ -360,6 +360,7 @@ bool has_capability_noaudit(struct task_struct *t, int cap) { return has_ns_capability_noaudit(t, &init_user_ns, cap); } +EXPORT_SYMBOL(has_capability_noaudit);
static bool ns_capable_common(struct user_namespace *ns, int cap,
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v5.16-rc3 commit 70447e0ad9781f84e60e0990888bd8c84987f44e category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When the AIL tries to flush the CIL, it relies on the CIL push ending up on stable storage without having to wait for and manipulate iclog state directly. However, if there is already a pending CIL push when the AIL tries to flush the CIL, it won't set the cil->xc_push_commit_stable flag and so the CIL push will not actively flush the commit record iclog.
generic/530 when run on a single CPU test VM can trigger this fairly reliably. This test exercises unlinked inode recovery, and can result in inodes being pinned in memory by ongoing modifications to the inode cluster buffer to record unlinked list modifications. As a result, the first inode unlinked in a buffer can pin the tail of the log whilst the inode cluster buffer is pinned by the current checkpoint that has been pushed but isn't on stable storage because the cil->xc_push_commit_stable flag was not set. This results in the log/AIL effectively deadlocking until something triggers the commit record iclog to be pushed to stable storage (i.e. the periodic log worker calling xfs_log_force()).
The fix is two-fold - first we should always set the cil->xc_push_commit_stable when xlog_cil_flush() is called, regardless of whether there is already a pending push or not.
Second, if the CIL is empty, we should trigger an iclog flush to ensure that the iclogs of the last checkpoint have actually been submitted to disk as that checkpoint may not have been run under stable completion constraints.
Reported-and-tested-by: Matthew Wilcox willy@infradead.org Fixes: 0020a190cf3e ("xfs: AIL needs asynchronous CIL forcing") Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_log_cil.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index c5118801218b..c2fa07909ea6 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -1243,18 +1243,27 @@ xlog_cil_push_now( if (!async) flush_workqueue(cil->xc_push_wq);
+ spin_lock(&cil->xc_push_lock); + + /* + * If this is an async flush request, we always need to set the + * xc_push_commit_stable flag even if something else has already queued + * a push. The flush caller is asking for the CIL to be on stable + * storage when the next push completes, so regardless of who has queued + * the push, the flush requires stable semantics from it. + */ + cil->xc_push_commit_stable = async; + /* * If the CIL is empty or we've already pushed the sequence then - * there's no work we need to do. + * there's no more work that we need to do. */ - spin_lock(&cil->xc_push_lock); if (list_empty(&cil->xc_cil) || push_seq <= cil->xc_push_seq) { spin_unlock(&cil->xc_push_lock); return; }
cil->xc_push_seq = push_seq; - cil->xc_push_commit_stable = async; queue_work(cil->xc_push_wq, &cil->xc_ctx->push_work); spin_unlock(&cil->xc_push_lock); } @@ -1352,6 +1361,13 @@ xlog_cil_flush(
trace_xfs_log_force(log->l_mp, seq, _RET_IP_); xlog_cil_push_now(log, seq, true); + + /* + * If the CIL is empty, make sure that any previous checkpoint that may + * still be in an active iclog is pushed to stable storage. + */ + if (list_empty(&log->l_cilp->xc_cil)) + xfs_log_force(log->l_mp, 0); }
/*
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v5.16-rc3 commit cd6f79d1fb324968a3bae92f82eeb7d28ca1fd22 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Brian reported a null pointer dereference failure during unmount in xfs/006. He tracked the problem down to the AIL being torn down before a log shutdown had completed and removed all the items from the AIL. The failure occurred in this path while unmount was proceeding in another task:
xfs_trans_ail_delete+0x102/0x130 [xfs] xfs_buf_item_done+0x22/0x30 [xfs] xfs_buf_ioend+0x73/0x4d0 [xfs] xfs_trans_committed_bulk+0x17e/0x2f0 [xfs] xlog_cil_committed+0x2a9/0x300 [xfs] xlog_cil_process_committed+0x69/0x80 [xfs] xlog_state_shutdown_callbacks+0xce/0xf0 [xfs] xlog_force_shutdown+0xdf/0x150 [xfs] xfs_do_force_shutdown+0x5f/0x150 [xfs] xlog_ioend_work+0x71/0x80 [xfs] process_one_work+0x1c5/0x390 worker_thread+0x30/0x350 kthread+0xd7/0x100 ret_from_fork+0x1f/0x30
This is processing an EIO error to a log write, and it's triggering a force shutdown. This causes the log to be shut down, and then it is running attached iclog callbacks from the shutdown context. That means the fs and log have already been marked as xfs_is_shutdown/xlog_is_shutdown, and so high level code will abort (e.g. xfs_trans_commit(), xfs_log_force(), etc.) with an error because of shutdown.
The umount would have been blocked waiting for a log force completion inside xfs_log_cover() -> xfs_sync_sb(). For this situation to occur, xfs_sync_sb() must exit without waiting for the iclog buffer to be committed to disk. The above trace is the completion routine for the iclog buffer, and it is shutting down the filesystem.
xlog_state_shutdown_callbacks() does this:
{ struct xlog_in_core *iclog; LIST_HEAD(cb_list);
spin_lock(&log->l_icloglock); iclog = log->l_iclog; do { if (atomic_read(&iclog->ic_refcnt)) { /* Reference holder will re-run iclog callbacks. */ continue; } list_splice_init(&iclog->ic_callbacks, &cb_list);
wake_up_all(&iclog->ic_write_wait); wake_up_all(&iclog->ic_force_wait);
} while ((iclog = iclog->ic_next) != log->l_iclog);
wake_up_all(&log->l_flush_wait); spin_unlock(&log->l_icloglock);
xlog_cil_process_committed(&cb_list);
}
This wakes any thread waiting on IO completion of the iclog (in this case the umount log force) before shutdown processes all the pending callbacks. That means the xfs_sync_sb() waiting on a sync transaction in xfs_log_force() on iclog->ic_force_wait will get woken before the callbacks attached to that iclog are run. This results in xfs_sync_sb() returning an error, and so unmount unblocks and continues to run whilst the log shutdown is still in progress.
Normally this is just fine because the force waiter has nothing to do with AIL operations. But in the case of this unmount path, the log force waiter goes on to tear down the AIL because the log is now shut down and so nothing ever blocks it again from the wait point in xfs_log_cover().
Hence it's a race to see who gets to the AIL first - the unmount code or xlog_cil_process_committed() killing the superblock buffer.
To fix this, we just have to change the order of processing in xlog_state_shutdown_callbacks() to run the callbacks before it wakes any task waiting on completion of the iclog.
Reported-by: Brian Foster bfoster@redhat.com Fixes: aad7272a9208 ("xfs: separate out log shutdown callback processing") Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_log.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 91e8640a4711..367aa7be2cad 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -483,7 +483,10 @@ xfs_log_reserve( * Run all the pending iclog callbacks and wake log force waiters and iclog * space waiters so they can process the newly set shutdown state. We really * don't care what order we process callbacks here because the log is shut down - * and so state cannot change on disk anymore. + * and so state cannot change on disk anymore. However, we cannot wake waiters + * until the callbacks have been processed because we may be in unmount and + * we must ensure that all AIL operations the callbacks perform have completed + * before we tear down the AIL. * * We avoid processing actively referenced iclogs so that we don't run callbacks * while the iclog owner might still be preparing the iclog for IO submssion. @@ -497,7 +500,6 @@ xlog_state_shutdown_callbacks( struct xlog_in_core *iclog; LIST_HEAD(cb_list);
- spin_lock(&log->l_icloglock); iclog = log->l_iclog; do { if (atomic_read(&iclog->ic_refcnt)) { @@ -505,14 +507,16 @@ xlog_state_shutdown_callbacks( continue; } list_splice_init(&iclog->ic_callbacks, &cb_list); + spin_unlock(&log->l_icloglock); + + xlog_cil_process_committed(&cb_list); + + spin_lock(&log->l_icloglock); wake_up_all(&iclog->ic_write_wait); wake_up_all(&iclog->ic_force_wait); } while ((iclog = iclog->ic_next) != log->l_iclog);
wake_up_all(&log->l_flush_wait); - spin_unlock(&log->l_icloglock); - - xlog_cil_process_committed(&cb_list); }
/* @@ -556,11 +560,8 @@ xlog_state_release_iclog( * pending iclog callbacks that were waiting on the release of * this iclog. */ - if (last_ref) { - spin_unlock(&log->l_icloglock); + if (last_ref) xlog_state_shutdown_callbacks(log); - spin_lock(&log->l_icloglock); - } return -EIO; }
@@ -3769,7 +3770,10 @@ xlog_force_shutdown( wake_up_all(&log->l_cilp->xc_start_wait); wake_up_all(&log->l_cilp->xc_commit_wait); spin_unlock(&log->l_cilp->xc_push_lock); + + spin_lock(&log->l_icloglock); xlog_state_shutdown_callbacks(log); + spin_unlock(&log->l_icloglock);
return log_error; }
From: Eric Sandeen sandeen@redhat.com
mainline inclusion from mainline-v5.16-rc3 commit bc37e4fb5cac2925b2e286b1f1d4fc2b519f7d92 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
This reverts commit 4b8628d57b725b32616965e66975fcdebe008fe7.
XFS quota has had the concept of a "quota warning limit" since the earliest Irix implementation, but a mechanism for incrementing the warning counter was never implemented, as documented in the xfs_quota(8) man page. We do know from the historical archive that it was never incremented at runtime during quota reservation operations.
With this commit, the warning counter quickly increments for every allocation attempt after the user has crossed a quota soft limit threshold, and this in turn transitions the user to hard quota failures, rendering soft quota thresholds and timers useless. This was reported as a regression by users.
Because the intended behavior of this warning counter has never been understood or documented, and the result of this change is a regression in soft quota functionality, revert this commit to make soft quota limits and timers operable again.
Fixes: 4b8628d57b72 ("xfs: actually bump warning counts when we send warnings") Signed-off-by: Eric Sandeen sandeen@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_trans_dquot.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c index 3d7386650cfe..d9b861581833 100644 --- a/fs/xfs/xfs_trans_dquot.c +++ b/fs/xfs/xfs_trans_dquot.c @@ -615,7 +615,6 @@ xfs_dqresv_check( return QUOTA_NL_ISOFTLONGWARN; }
- res->warnings++; return QUOTA_NL_ISOFTWARN; }
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion from mainline-v5.16-rc3 commit 86d40f1e49e9a909d25c35ba01bea80dbcd758cb category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
xfs/434 and xfs/436 have been reporting occasional memory leaks of xfs_dquot objects. These tests themselves were the messenger, not the culprit, since they unload the xfs module, which trips the slub debugging code while tearing down all the xfs slab caches:
============================================================================= BUG xfs_dquot (Tainted: G W ): Objects remaining in xfs_dquot on __kmem_cache_shutdown() -----------------------------------------------------------------------------
Slab 0xffffea000606de00 objects=30 used=5 fp=0xffff888181b78a78 flags=0x17ff80000010200(slab|head|node=0|zone=2|lastcpupid=0xfff) CPU: 0 PID: 3953166 Comm: modprobe Tainted: G W 5.18.0-rc6-djwx #rc6 d5824be9e46a2393677bda868f9b154d917ca6a7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014
Since we don't generally rmmod the xfs module between fstests, this means that xfs/434 is really just the canary in the coal mine -- something leaked a dquot, but we don't know who. After days of pounding on fstests with kmemleak enabled, I finally got it to spit this out:
unreferenced object 0xffff8880465654c0 (size 536): comm "u10:4", pid 88, jiffies 4294935810 (age 29.512s) hex dump (first 32 bytes): 60 4a 56 46 80 88 ff ff 58 ea e4 5c 80 88 ff ff `JVF....X...... 00 e0 52 49 80 88 ff ff 01 00 01 00 00 00 00 00 ..RI............ backtrace: [<ffffffffa0740f6c>] xfs_dquot_alloc+0x2c/0x530 [xfs] [<ffffffffa07443df>] xfs_qm_dqread+0x6f/0x330 [xfs] [<ffffffffa07462a2>] xfs_qm_dqget+0x132/0x4e0 [xfs] [<ffffffffa0756bb0>] xfs_qm_quotacheck_dqadjust+0xa0/0x3e0 [xfs] [<ffffffffa075724d>] xfs_qm_dqusage_adjust+0x35d/0x4f0 [xfs] [<ffffffffa06c9068>] xfs_iwalk_ag_recs+0x348/0x5d0 [xfs] [<ffffffffa06c95d3>] xfs_iwalk_run_callbacks+0x273/0x540 [xfs] [<ffffffffa06c9e8d>] xfs_iwalk_ag+0x5ed/0x890 [xfs] [<ffffffffa06ca22f>] xfs_iwalk_ag_work+0xff/0x170 [xfs] [<ffffffffa06d22c9>] xfs_pwork_work+0x79/0x130 [xfs] [<ffffffff81170bb2>] process_one_work+0x672/0x1040 [<ffffffff81171b1b>] worker_thread+0x59b/0xec0 [<ffffffff8118711e>] kthread+0x29e/0x340 [<ffffffff810032bf>] ret_from_fork+0x1f/0x30
Now we know that quotacheck is at fault, but even this report was canaryish -- it was triggered by xfs/494, which doesn't actually mount any filesystems. (kmemleak can be a little slow to notice leaks, even with fstests repeatedly whacking it to look for them.) Looking at the *previous* fstest, however, showed that the test run before xfs/494 was xfs/117. The tipoff to the problem is in this excerpt from dmesg:
XFS (sda4): Quotacheck needed: Please wait. XFS (sda4): Metadata corruption detected at xfs_dinode_verify.part.0+0xdb/0x7b0 [xfs], inode 0x119 dinode XFS (sda4): Unmount and run xfs_repair XFS (sda4): First 128 bytes of corrupted metadata buffer: 00000000: 49 4e 81 a4 03 02 00 00 00 00 00 00 00 00 00 00 IN.............. 00000010: 00 00 00 01 00 00 00 00 00 90 57 54 54 1a 4c 68 ..........WTT.Lh 00000020: 81 f9 7d e1 6d ee 16 00 34 bd 7d e1 6d ee 16 00 ..}.m...4.}.m... 00000030: 34 bd 7d e1 6d ee 16 00 00 00 00 00 00 00 00 00 4.}.m........... 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000050: 00 00 00 02 00 00 00 00 00 00 00 00 96 80 f3 ab ................ 00000060: ff ff ff ff da 57 7b 11 00 00 00 00 00 00 00 03 .....W{......... 00000070: 00 00 00 01 00 00 00 10 00 00 00 00 00 00 00 08 ................ XFS (sda4): Quotacheck: Unsuccessful (Error -117): Disabling quotas.
The dinode verifier decided that the inode was corrupt, which causes iget to return with EFSCORRUPTED. Since this happened during quotacheck, it is obvious that the kernel aborted the inode walk on account of the corruption error and disabled quotas. Unfortunately, we neglect to purge the dquot cache before doing that, which is how the dquots leaked.
The problems started 10 years ago in commit b84a3a, when the dquot lists were converted to a radix tree, but the error handling behavior was not correctly preserved -- in that commit, if the bulkstat failed and usrquota was enabled, the bulkstat failure code would be overwritten by the result of flushing all the dquots to disk. As long as that succeeds, we'd continue the quota mount as if everything were ok, but instead we're now operating with a corrupt inode and incorrect quota usage counts. I didn't notice this bug in 2019 when I wrote commit ebd126a, which changed quotacheck to skip the dqflush when the scan doesn't complete due to inode walk failures.
Introduced-by: b84a3a96751f ("xfs: remove the per-filesystem list of dquots") Fixes: ebd126a651f8 ("xfs: convert quotacheck to use the new iwalk functions") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_qm.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 926d262c1236..0a6c8cc8e997 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -1312,8 +1312,15 @@ xfs_qm_quotacheck(
error = xfs_iwalk_threaded(mp, 0, 0, xfs_qm_dqusage_adjust, 0, true, NULL); - if (error) + if (error) { + /* + * The inode walk may have partially populated the dquot + * caches. We must purge them before disabling quota and + * tearing down the quotainfo, or else the dquots will leak. + */ + xfs_qm_dqpurge_all(mp); goto error_return; + }
/* * We've made all the changes that we need to make incore. Flush them
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion from mainline-v5.16-rc3 commit 7561cea5dbb97fecb952548a0fb74fb105bf4664 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
KASAN reported the following use after free bug when running generic/475:
XFS (dm-0): Mounting V5 Filesystem XFS (dm-0): Starting recovery (logdev: internal) XFS (dm-0): Ending recovery (logdev: internal) Buffer I/O error on dev dm-0, logical block 20639616, async page read Buffer I/O error on dev dm-0, logical block 20639617, async page read XFS (dm-0): log I/O error -5 XFS (dm-0): Filesystem has been shut down due to log error (0x2). XFS (dm-0): Unmounting Filesystem XFS (dm-0): Please unmount the filesystem and rectify the problem(s). ================================================================== BUG: KASAN: use-after-free in do_raw_spin_lock+0x246/0x270 Read of size 4 at addr ffff888109dd84c4 by task 3:1H/136
CPU: 3 PID: 136 Comm: 3:1H Not tainted 5.19.0-rc4-xfsx #rc4 8e53ab5ad0fddeb31cee5e7063ff9c361915a9c4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 Workqueue: xfs-log/dm-0 xlog_ioend_work [xfs] Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x2b8/0x661 ? do_raw_spin_lock+0x246/0x270 kasan_report+0xab/0x120 ? do_raw_spin_lock+0x246/0x270 do_raw_spin_lock+0x246/0x270 ? rwlock_bug.part.0+0x90/0x90 xlog_force_shutdown+0xf6/0x370 [xfs 4ad76ae0d6add7e8183a553e624c31e9ed567318] xlog_ioend_work+0x100/0x190 [xfs 4ad76ae0d6add7e8183a553e624c31e9ed567318] process_one_work+0x672/0x1040 worker_thread+0x59b/0xec0 ? __kthread_parkme+0xc6/0x1f0 ? process_one_work+0x1040/0x1040 ? process_one_work+0x1040/0x1040 kthread+0x29e/0x340 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK>
Allocated by task 154099: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 kmem_alloc+0x8d/0x2e0 [xfs] xlog_cil_init+0x1f/0x540 [xfs] xlog_alloc_log+0xd1e/0x1260 [xfs] xfs_log_mount+0xba/0x640 [xfs] xfs_mountfs+0xf2b/0x1d00 [xfs] xfs_fs_fill_super+0x10af/0x1910 [xfs] get_tree_bdev+0x383/0x670 vfs_get_tree+0x7d/0x240 path_mount+0xdb7/0x1890 __x64_sys_mount+0x1fa/0x270 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0
Freed by task 154151: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x110/0x190 slab_free_freelist_hook+0xab/0x180 kfree+0xbc/0x310 xlog_dealloc_log+0x1b/0x2b0 [xfs] xfs_unmountfs+0x119/0x200 [xfs] xfs_fs_put_super+0x6e/0x2e0 [xfs] generic_shutdown_super+0x12b/0x3a0 kill_block_super+0x95/0xd0 deactivate_locked_super+0x80/0x130 cleanup_mnt+0x329/0x4d0 task_work_run+0xc5/0x160 exit_to_user_mode_prepare+0xd4/0xe0 syscall_exit_to_user_mode+0x1d/0x40 entry_SYSCALL_64_after_hwframe+0x46/0xb0
This appears to be a race between the unmount process, which frees the CIL and waits for in-flight iclog IO; and the iclog IO completion. When generic/475 runs, it starts fsstress in the background, waits a few seconds, and substitutes a dm-error device to simulate a disk falling out of a machine. If the fsstress encounters EIO on a pure data write, it will exit but the filesystem will still be online.
The next thing the test does is unmount the filesystem, which tries to clean the log, free the CIL, and wait for iclog IO completion. If an iclog was being written when the dm-error switch occurred, it can race with log unmounting as follows:
Thread 1 Thread 2
xfs_log_unmount xfs_log_clean xfs_log_quiesce xlog_ioend_work <observe error> xlog_force_shutdown test_and_set_bit(XLOG_IOERROR) xfs_log_force <log is shut down, nop> xfs_log_umount_write <log is shut down, nop> xlog_dealloc_log xlog_cil_destroy <wait for iclogs> spin_lock(&log->l_cilp->xc_push_lock) <KABOOM>
Therefore, free the CIL after waiting for the iclogs to complete. I /think/ this race has existed for quite a few years now, though I don't remember the ~2014 era logging code well enough to know if it was a real threat then or if the actual race was exposed only more recently.
Fixes: ac983517ec59 ("xfs: don't sleep in xlog_cil_force_lsn on shutdown") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_log.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 367aa7be2cad..681bdcbe2265 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -1941,8 +1941,6 @@ xlog_dealloc_log( xlog_in_core_t *iclog, *next_iclog; int i;
- xlog_cil_destroy(log); - /* * Cycle all the iclogbuf locks to make sure all log IO completion * is done before we tear down these buffers. @@ -1954,6 +1952,13 @@ xlog_dealloc_log( iclog = iclog->ic_next; }
+ /* + * Destroy the CIL after waiting for iclog IO completion because an + * iclog EIO error will try to shut down the log, which accesses the + * CIL to wake up the waiters. + */ + xlog_cil_destroy(log); + iclog = log->l_iclog; for (i = 0; i < log->l_iclog_bufs; i++) { next_iclog = iclog->ic_next;
From: Dan Carpenter dan.carpenter@oracle.com
stable inclusion from stable-v5.10.140 commit 6a564bad3a6474a5247491d2b48637ec69d429dd category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5W3GQ
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 6ed6356b07714e0198be3bc3ecccc8b40a212de4 upstream.
The "bufsize" comes from the root user. If "bufsize" is negative then, because of type promotion, neither of the validation checks at the start of the function are able to catch it:
if (bufsize < sizeof(struct xfs_attrlist) || bufsize > XFS_XATTR_LIST_MAX) return -EINVAL;
This means "bufsize" will trigger (WARN_ON_ONCE(size > INT_MAX)) in kvmalloc_node(). Fix this by changing the type from int to size_t.
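A minimal user-space sketch of the promotion at work; the struct below and the 65536 upper limit stand in for struct xfs_attrlist and XFS_XATTR_LIST_MAX and are illustrative assumptions, not the real definitions:

#include <stdio.h>
#include <stddef.h>

struct xfs_attrlist_like {      /* stand-in; only its size matters here */
        int count;
        int pad[3];
};

int main(void)
{
        int bufsize = -1;       /* negative value passed by a (root) caller */

        /*
         * bufsize is promoted to size_t for this comparison, so -1 becomes
         * SIZE_MAX and the "too small" check does not fire...
         */
        if (bufsize < sizeof(struct xfs_attrlist_like))
                printf("rejected as too small\n");
        else
                printf("passed the lower-bound check\n");

        /*
         * ...and the "too large" check compares two signed ints, so a
         * negative value cannot exceed the limit either.
         */
        if (bufsize > 65536)
                printf("rejected as too large\n");
        else
                printf("passed the upper-bound check\n");

        /* converting to size_t for the allocation yields a huge request */
        printf("allocation size would be %zu bytes\n", (size_t)bufsize);
        return 0;
}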
Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Wang Hai wanghai38@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_ioctl.c | 2 +- fs/xfs/xfs_ioctl.h | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 87299bab516c..814345b6a245 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -371,7 +371,7 @@ int xfs_ioc_attr_list( struct xfs_inode *dp, void __user *ubuf, - int bufsize, + size_t bufsize, int flags, struct xfs_attrlist_cursor __user *ucursor) { diff --git a/fs/xfs/xfs_ioctl.h b/fs/xfs/xfs_ioctl.h index bab6a5a92407..416e20de66e7 100644 --- a/fs/xfs/xfs_ioctl.h +++ b/fs/xfs/xfs_ioctl.h @@ -38,8 +38,9 @@ xfs_readlink_by_handle( int xfs_ioc_attrmulti_one(struct file *parfilp, struct inode *inode, uint32_t opcode, void __user *uname, void __user *value, uint32_t *len, uint32_t flags); -int xfs_ioc_attr_list(struct xfs_inode *dp, void __user *ubuf, int bufsize, - int flags, struct xfs_attrlist_cursor __user *ucursor); +int xfs_ioc_attr_list(struct xfs_inode *dp, void __user *ubuf, + size_t bufsize, int flags, + struct xfs_attrlist_cursor __user *ucursor);
extern struct dentry * xfs_handle_to_dentry(
From: "Darrick J. Wong" djwong@kernel.org
stable inclusion from stable-v5.10.140 commit 1b9b4139d794cf0ae51ba3dd91f009c77fab16d0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5W3GQ
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 29d650f7e3ab55283b89c9f5883d0c256ce478b5 upstream.
Syzbot tripped over the following complaint from the kernel:
WARNING: CPU: 2 PID: 15402 at mm/util.c:597 kvmalloc_node+0x11e/0x125 mm/util.c:597
While trying to run XFS_IOC_GETBMAP against the following structure:
struct getbmap fubar = { .bmv_count = 0x22dae649, };
Obviously, this is a crazy huge value since the next thing that the ioctl would do is allocate 37GB of memory. This is enough to make kvmalloc mad, but isn't large enough to trip the validation functions. In other words, I'm fussing with checks that were **already sufficient** because that's easier than dealing with 644 internal bug reports. Yes, that's right, six hundred and forty-four.
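The arithmetic behind the old and new bounds can be reproduced in user space. The 64-byte per-record cost below is an assumption chosen only to roughly match the ~37GB figure above; the real record size depends on which getbmap variant is used:

#include <stdio.h>
#include <limits.h>
#include <stddef.h>

int main(void)
{
        unsigned int bmv_count = 0x22dae649;    /* value from the report */
        size_t recsize = 64;                    /* assumed per-record cost */
        size_t bytes = (size_t)bmv_count * recsize;

        printf("requested allocation: %zu bytes (~%zu GB)\n",
               bytes, bytes / 1000000000);

        /* old bound: effectively never trips on 64-bit */
        printf("bmv_count >  ULONG_MAX / recsize ? %d\n",
               bmv_count > ULONG_MAX / recsize);

        /* new bound: rejects anything kvmalloc would warn about */
        printf("bmv_count >= INT_MAX / recsize ? %d\n",
               (size_t)bmv_count >= INT_MAX / recsize);

        return 0;
}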
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Catherine Hoang catherine.hoang@oracle.com Signed-off-by: Amir Goldstein amir73il@gmail.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Wang Hai wanghai38@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 814345b6a245..f1de12df880e 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1680,7 +1680,7 @@ xfs_ioc_getbmap(
if (bmx.bmv_count < 2) return -EINVAL; - if (bmx.bmv_count > ULONG_MAX / recsize) + if (bmx.bmv_count >= INT_MAX / recsize) return -ENOMEM;
buf = kvzalloc(bmx.bmv_count * sizeof(*buf), GFP_KERNEL);
From: Long Li leo.lilong@huawei.com
Offering: HULK hulk inclusion category: bugfix bugzilla: 186982,https://gitee.com/openeuler/kernel/issues/I4KIAO
--------------------------------
When lazysbcount is enabled, fsstress and loop mount/unmount test report the following problems:
XFS (loop0): SB summary counter sanity check failed XFS (loop0): Metadata corruption detected at xfs_sb_write_verify+0x13b/0x460, xfs_sb block 0x0 XFS (loop0): Unmount and run xfs_repair XFS (loop0): First 128 bytes of corrupted metadata buffer: 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 28 00 00 XFSB.........(.. 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000020: 69 fb 7c cd 5f dc 44 af 85 74 e0 cc d4 e3 34 5a i.|._.D..t....4Z 00000030: 00 00 00 00 00 20 00 06 00 00 00 00 00 00 00 80 ..... .......... 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82 ................ 00000050: 00 00 00 01 00 0a 00 00 00 00 00 04 00 00 00 00 ................ 00000060: 00 00 0a 00 b4 b5 02 00 02 00 00 08 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 14 00 00 19 ................ XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply +0xe1e/0x10e0 (fs/xfs/xfs_buf.c:1580). Shutting down filesystem. XFS (loop0): Please unmount the filesystem and rectify the problem(s) XFS (loop0): log mount/recovery failed: error -117 XFS (loop0): log mount failed
This corruption will shut down the file system, and the file system will no longer be mountable. The following script can reproduce the problem, but it may take a long time.
#!/bin/bash
device=/dev/sda
testdir=/mnt/test
round=0

function fail()
{
	echo "$*"
	exit 1
}

mkdir -p $testdir
while [ $round -lt 10000 ]
do
	echo "******* round $round ********"
	mkfs.xfs -f $device
	mount $device $testdir || fail "mount failed!"
	fsstress -d $testdir -l 0 -n 10000 -p 4 >/dev/null &
	sleep 4
	killall -w fsstress
	umount $testdir
	xfs_repair -e $device > /dev/null
	if [ $? -eq 2 ];then
		echo "ERR CODE 2: Dirty log exception during repair."
		exit 1
	fi
	round=$(($round+1))
done
With lazysbcount enabled, there is no additional lock protection for reading m_ifree and m_icount in xfs_log_sb(). If another CPU modifies m_ifree concurrently, m_ifree can end up greater than m_icount. For example, consider the following sequence where ifreedelta is positive:
CPU0                                    CPU1
xfs_log_sb                              xfs_trans_unreserve_and_mod_sb
----------                              ------------------------------
percpu_counter_sum(&mp->m_icount)
                                        percpu_counter_add_batch(&mp->m_icount,
                                                        idelta, XFS_ICOUNT_BATCH)
                                        percpu_counter_add(&mp->m_ifree, ifreedelta);
percpu_counter_sum(&mp->m_ifree)
After this, an incorrect inode count (sb_ifree > sb_icount) will be written to the log. When the sb is subsequently written, this incorrect inode count fails the boundary check in xfs_validate_sb_write(), which causes the file system to shut down.
When lazysbcount is enabled, we don't need to guarantee that the lazy sb counters are completely correct, but we do need to guarantee that sb_ifree <= sb_icount. On the other hand, the constraint that m_ifree <= m_icount must be satisfied any time that there /cannot/ be other threads allocating or freeing inode chunks. If the constraint is violated under these circumstances, sb_i{count,free} (the ondisk superblock inode counters) may be incorrect and need to be marked sick at unmount; the counters will be rebuilt on the next mount.
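A small user-space sketch of the clamp applied in xfs_log_sb(); the counter values are made up to represent a snapshot where CPU1's updates landed between the two reads, and min_u64 stands in for min_t:

#include <stdio.h>
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b)
{
        return a < b ? a : b;
}

int main(void)
{
        /* made-up sums: ifree was read after the concurrent update */
        uint64_t icount_sum = 1000;     /* percpu_counter_sum(&mp->m_icount) */
        uint64_t ifree_sum = 1024;      /* percpu_counter_sum(&mp->m_ifree) */

        uint64_t sb_icount = icount_sum;
        uint64_t sb_ifree = ifree_sum;

        printf("unclamped: icount=%llu ifree=%llu -> %s\n",
               (unsigned long long)sb_icount, (unsigned long long)sb_ifree,
               sb_ifree <= sb_icount ? "ok" : "fails the write verifier");

        /* the fix: clamp so that sb_ifree <= sb_icount always holds */
        sb_ifree = min_u64(ifree_sum, sb_icount);
        printf("clamped:   icount=%llu ifree=%llu -> ok\n",
               (unsigned long long)sb_icount, (unsigned long long)sb_ifree);
        return 0;
}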
Fixes: 8756a5af1819 ("libxfs: add more bounds checking to sb sanity checks") Signed-off-by: Long Li leo.lilong@huawei.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_sb.c | 4 +++- fs/xfs/xfs_mount.c | 16 +++++++++++++++- 2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 13ef340044b3..1296cff23a3b 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -966,7 +966,9 @@ xfs_log_sb( */ if (xfs_sb_version_haslazysbcount(&mp->m_sb)) { mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount); - mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree); + mp->m_sb.sb_ifree = min_t(uint64_t, + percpu_counter_sum(&mp->m_ifree), + mp->m_sb.sb_icount); mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); }
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 959425cfb612..6504503f07b3 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -637,6 +637,20 @@ xfs_check_summary_counts( return xfs_initialize_perag_data(mp, mp->m_sb.sb_agcount); }
+static void +xfs_ifree_unmount( + struct xfs_mount *mp) +{ + if (XFS_FORCED_SHUTDOWN(mp)) + return; + + if (percpu_counter_sum(&mp->m_ifree) > + percpu_counter_sum(&mp->m_icount)) { + xfs_alert(mp, "ifree/icount mismatch at unmount"); + xfs_fs_mark_sick(mp, XFS_SICK_FS_COUNTERS); + } +} + /* * Flush and reclaim dirty inodes in preparation for unmount. Inodes and * internal inode structures can be sitting in the CIL and AIL at this point, @@ -1160,7 +1174,7 @@ xfs_unmountfs( xfs_warn(mp, "Unable to update superblock counters. " "Freespace may not be correct on next mount.");
- + xfs_ifree_unmount(mp); xfs_log_unmount(mp); xfs_da_unmount(mp); xfs_uuid_unmount(mp);
From: Zeng Heng zengheng4@huawei.com
mainline inclusion from mainline-v6.1-rc1 commit cf4f4c12dea7a977a143c8fe5af1740b7f9876f8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When `xfs_sysfs_init` fails, `mp->m_errortag` needs to be freed. Otherwise kmemleak reports a memory leak after mounting an xfs image:
unreferenced object 0xffff888101364900 (size 192): comm "mount", pid 13099, jiffies 4294915218 (age 335.207s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f08ad25c>] __kmalloc+0x41/0x1b0 [<00000000dca9aeb6>] kmem_alloc+0xfd/0x430 [<0000000040361882>] xfs_errortag_init+0x20/0x110 [<00000000b384a0f6>] xfs_mountfs+0x6ea/0x1a30 [<000000003774395d>] xfs_fs_fill_super+0xe10/0x1a80 [<000000009cf07b6c>] get_tree_bdev+0x3e7/0x700 [<00000000046b5426>] vfs_get_tree+0x8e/0x2e0 [<00000000952ec082>] path_mount+0xf8c/0x1990 [<00000000beb1f838>] do_mount+0xee/0x110 [<000000000e9c41bb>] __x64_sys_mount+0x14b/0x1f0 [<00000000f7bb938e>] do_syscall_64+0x3b/0x90 [<000000003fcd67a9>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: c68401011522 ("xfs: expose errortag knobs via sysfs") Signed-off-by: Zeng Heng zengheng4@huawei.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_error.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c index f9e2f606b5b8..aed72fee71cb 100644 --- a/fs/xfs/xfs_error.c +++ b/fs/xfs/xfs_error.c @@ -215,13 +215,18 @@ int xfs_errortag_init( struct xfs_mount *mp) { + int ret; + mp->m_errortag = kmem_zalloc(sizeof(unsigned int) * XFS_ERRTAG_MAX, KM_MAYFAIL); if (!mp->m_errortag) return -ENOMEM;
- return xfs_sysfs_init(&mp->m_errortag_kobj, &xfs_errortag_ktype, - &mp->m_kobj, "errortag"); + ret = xfs_sysfs_init(&mp->m_errortag_kobj, &xfs_errortag_ktype, + &mp->m_kobj, "errortag"); + if (ret) + kmem_free(mp->m_errortag); + return ret; }
void
From: Li Zetao lizetao1@huawei.com
mainline inclusion from mainline-v6.1-rc1 commit d08af40340cad0e025d643c3982781a8f99d5032 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
kmemleak reported a sequence of memory leaks, and one of them indicated we failed to free a pointer: comm "mount", pid 19610, jiffies 4297086464 (age 60.635s) hex dump (first 8 bytes): 73 64 61 00 81 88 ff ff sda..... backtrace: [<00000000d77f3e04>] kstrdup_const+0x46/0x70 [<00000000e51fa804>] kobject_set_name_vargs+0x2f/0xb0 [<00000000247cd595>] kobject_init_and_add+0xb0/0x120 [<00000000f9139aaf>] xfs_mountfs+0x367/0xfc0 [<00000000250d3caf>] xfs_fs_fill_super+0xa16/0xdc0 [<000000008d873d38>] get_tree_bdev+0x256/0x390 [<000000004881f3fa>] vfs_get_tree+0x41/0xf0 [<000000008291ab52>] path_mount+0x9b3/0xdd0 [<0000000022ba8f2d>] __x64_sys_mount+0x190/0x1d0
As mentioned in the kobject_init_and_add() comment, if this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Apparently, xfs_sysfs_init() does not follow this requirement. When kobject_init_and_add() returns an error, the kobj->kobject.name string allocated by kstrdup_const() is never freed, which causes the leak reported above.
Fix it by adding kobject_put() when kobject_init_and_add returns an error.
Fixes: a31b1d3d89e4 ("xfs: add xfs_mount sysfs kobject") Signed-off-by: Li Zetao lizetao1@huawei.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_sysfs.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_sysfs.h b/fs/xfs/xfs_sysfs.h index 43585850f154..513095e353a5 100644 --- a/fs/xfs/xfs_sysfs.h +++ b/fs/xfs/xfs_sysfs.h @@ -33,10 +33,15 @@ xfs_sysfs_init( const char *name) { struct kobject *parent; + int err;
parent = parent_kobj ? &parent_kobj->kobject : NULL; init_completion(&kobj->complete); - return kobject_init_and_add(&kobj->kobject, ktype, parent, "%s", name); + err = kobject_init_and_add(&kobj->kobject, ktype, parent, "%s", name); + if (err) + kobject_put(&kobj->kobject); + + return err; }
static inline void
From: Anshuman Khandual anshuman.khandual@arm.com
mainline inclusion from mainline-v5.12-rc3 commit 79cc2ed5a716544621b11a3f90550e5c7d314306 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I611C3 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
Currently, without THP being enabled, MAX_ORDER via FORCE_MAX_ZONEORDER gets reduced to 11, which falls below HUGETLB_PAGE_ORDER for certain 16K and 64K page size configurations. This is problematic and throws up the following warning during boot, as pageblock_order (derived from HUGETLB_PAGE_ORDER) exceeds MAX_ORDER.
WARNING: CPU: 7 PID: 127 at mm/vmstat.c:1092 __fragmentation_index+0x58/0x70 Modules linked in: CPU: 7 PID: 127 Comm: kswapd0 Not tainted 5.12.0-rc1-00005-g0221e3101a1 #237 Hardware name: linux,dummy-virt (DT) pstate: 20400005 (nzCv daif +PAN -UAO -TCO BTYPE=--) pc : __fragmentation_index+0x58/0x70 lr : fragmentation_index+0x88/0xa8 sp : ffff800016ccfc00 x29: ffff800016ccfc00 x28: 0000000000000000 x27: ffff800011fd4000 x26: 0000000000000002 x25: ffff800016ccfda0 x24: 0000000000000002 x23: 0000000000000640 x22: ffff0005ffcb5b18 x21: 0000000000000002 x20: 000000000000000d x19: ffff0005ffcb3980 x18: 0000000000000004 x17: 0000000000000001 x16: 0000000000000019 x15: ffff800011ca7fb8 x14: 00000000000002b3 x13: 0000000000000000 x12: 00000000000005e0 x11: 0000000000000003 x10: 0000000000000080 x9 : ffff800011c93948 x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000007000 x5 : 0000000000007944 x4 : 0000000000000032 x3 : 000000000000001c x2 : 000000000000000b x1 : ffff800016ccfc10 x0 : 000000000000000d Call trace: __fragmentation_index+0x58/0x70 compaction_suitable+0x58/0x78 wakeup_kcompactd+0x8c/0xd8 balance_pgdat+0x570/0x5d0 kswapd+0x1e0/0x388 kthread+0x154/0x158 ret_from_fork+0x10/0x30
This solves the problem by keeping FORCE_MAX_ZONEORDER unchanged with or without THP on 16K and 64K page size configurations, making sure that HUGETLB_PAGE_ORDER (and therefore pageblock_order) never exceeds MAX_ORDER.
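The order arithmetic for the 64K page case can be checked directly. PAGE_SHIFT=16 and PMD_SHIFT=29 below are the usual arm64 64K-page values and are assumptions of this sketch; an allocation order must be strictly less than MAX_ORDER:

#include <stdio.h>

int main(void)
{
        int page_shift = 16;                            /* 64K pages */
        int pmd_shift = 29;                             /* assumed arm64 64K-page PMD_SHIFT */
        int hugetlb_page_order = pmd_shift - page_shift;/* 13 -> 512MB huge page */
        int orders[] = { 11, 14 };                      /* FORCE_MAX_ZONEORDER before/after */

        printf("HUGETLB_PAGE_ORDER = %d\n", hugetlb_page_order);
        for (int i = 0; i < 2; i++)
                printf("MAX_ORDER = %d: pageblock_order %s\n", orders[i],
                       hugetlb_page_order < orders[i] ? "fits" : "exceeds MAX_ORDER");
        return 0;
}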
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will@kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com Acked-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/1614597914-28565-1-git-send-email-anshuman.khandua... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Zhang Peng zhangpeng362@huawei.com Reviewed-by: Nanyong Sun sunnanyong@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 969f0889fbea..d55838a307d7 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1290,8 +1290,8 @@ config XEN
config FORCE_MAX_ZONEORDER int - default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE) - default "12" if (ARM64_16K_PAGES && TRANSPARENT_HUGEPAGE) + default "14" if ARM64_64K_PAGES + default "12" if ARM64_16K_PAGES default "11" help The kernel memory allocator divides physically contiguous memory
From: Ma Wupeng mawupeng1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S CVE: NA
--------------------------------
For a reliable memory allocation bound to a node that does not have any reliable zone, the allocation will fail and a warning message will be produced at the end of __alloc_pages_slowpath().

Though such an allocation can fall back to the movable zone in check_after_alloc() if fallback is enabled, something should be done to prevent this pointless warning.

To solve this problem, fall back to the movable zone if no suitable zone is found.
Signed-off-by: Ma Wupeng mawupeng1@huawei.com Reviewed-by: tong tiangen tongtiangen@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/page_alloc.c | 30 ++++++++++++++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b7b05a53686a..274b68a147ea 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4674,6 +4674,25 @@ check_retry_cpuset(int cpuset_mems_cookie, struct alloc_context *ac) }
#ifdef CONFIG_MEMORY_RELIABLE +/* + * if fallback is enabled, fallback to movable zone if no dma/normal zone + * found + */ +static inline struct zone *mem_reliable_fallback_zone(gfp_t gfp_mask, + struct alloc_context *ac) +{ + if (!reliable_allow_fb_enabled()) + return NULL; + + if (!(gfp_mask & GFP_RELIABLE)) + return NULL; + + ac->highest_zoneidx = gfp_zone(gfp_mask & ~GFP_RELIABLE); + ac->preferred_zoneref = first_zones_zonelist( + ac->zonelist, ac->highest_zoneidx, ac->nodemask); + return ac->preferred_zoneref->zone; +} + static inline void mem_reliable_fallback_slowpath(gfp_t gfp_mask, struct alloc_context *ac) { @@ -4691,6 +4710,11 @@ static inline void mem_reliable_fallback_slowpath(gfp_t gfp_mask, } } #else +static inline struct zone *mem_reliable_fallback_zone(gfp_t gfp_mask, + struct alloc_context *ac) +{ + return NULL; +} static inline void mem_reliable_fallback_slowpath(gfp_t gfp_mask, struct alloc_context *ac) {} #endif @@ -4740,8 +4764,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, */ ac->preferred_zoneref = first_zones_zonelist(ac->zonelist, ac->highest_zoneidx, ac->nodemask); - if (!ac->preferred_zoneref->zone) - goto nopage; + if (!ac->preferred_zoneref->zone) { + if (!mem_reliable_fallback_zone(gfp_mask, ac)) + goto nopage; + }
if (alloc_flags & ALLOC_KSWAPD) wake_all_kswapds(order, gfp_mask, ac);
From: Yicong Yang yangyicong@hisilicon.com
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I610G3
--------------------------------------------------------------
Add generic GPIO bus recovery support to the i2c-hisi driver by registering the recovery information with the core-provided i2c_generic_scl_recovery() method.

As the SCL/SDA pins are multiplexed with GPIO, we need to switch the pins' mux to GPIO before recovery and switch it back afterwards. This is implemented by an ACPI method invoked from the i2c_bus_recovery_info->{prepare,unprepare}_recovery() callbacks.
Signed-off-by: Yicong Yang yangyicong@hisilicon.com Signed-off-by: Wangming Shao shaowangming@h-partners.com Acked-by: Xie XiuQi xiexiuqi@huawei.com Reviewed-by: Yicong Yang yangyicong@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/i2c/busses/i2c-hisi.c | 78 +++++++++++++++++++++++++++++++++++ 1 file changed, 78 insertions(+)
diff --git a/drivers/i2c/busses/i2c-hisi.c b/drivers/i2c/busses/i2c-hisi.c index 5f17fe52feea..812b640d994e 100644 --- a/drivers/i2c/busses/i2c-hisi.c +++ b/drivers/i2c/busses/i2c-hisi.c @@ -5,9 +5,11 @@ * Copyright (c) 2021 HiSilicon Technologies Co., Ltd. */
+#include <linux/acpi.h> #include <linux/bits.h> #include <linux/bitfield.h> #include <linux/completion.h> +#include <linux/gpio/consumer.h> #include <linux/i2c.h> #include <linux/interrupt.h> #include <linux/io.h> @@ -106,6 +108,9 @@ struct hisi_i2c_controller { struct i2c_timings t; u32 clk_rate_khz; u32 spk_len; + + /* Bus recovery method */ + struct i2c_bus_recovery_info rinfo; };
static void hisi_i2c_enable_int(struct hisi_i2c_controller *ctlr, u32 mask) @@ -427,6 +432,77 @@ static void hisi_i2c_configure_bus(struct hisi_i2c_controller *ctlr) writel(reg, ctlr->iobase + HISI_I2C_FIFO_CTRL); }
+#ifdef CONFIG_ACPI +#define HISI_I2C_PIN_MUX_METHOD "PMUX" + +/** + * i2c_dw_acpi_pin_mux_change - Change the I2C controller's pin mux through ACPI + * @dev: device owns the SCL/SDA pin + * @to_gpio: true to switch to GPIO, false to switch to SCL/SDA + * + * The function invokes the specific ACPI method "PMUX" for changing the + * pin mux of I2C controller between SCL/SDA and GPIO in order to help on + * the generic GPIO recovery process. + */ +static void i2c_hisi_pin_mux_change(struct device *dev, bool to_gpio) +{ + acpi_handle handle = ACPI_HANDLE(dev); + struct acpi_object_list arg_list; + unsigned long long data; + union acpi_object arg; + + arg.type = ACPI_TYPE_INTEGER; + arg.integer.value = to_gpio; + arg_list.count = 1; + arg_list.pointer = &arg; + + acpi_evaluate_integer(handle, HISI_I2C_PIN_MUX_METHOD, &arg_list, &data); +} + +static void i2c_hisi_prepare_recovery(struct i2c_adapter *adap) +{ + struct hisi_i2c_controller *ctlr = i2c_get_adapdata(adap); + + i2c_hisi_pin_mux_change(ctlr->dev, true); +} + +static void i2c_hisi_unprepare_recovery(struct i2c_adapter *adap) +{ + struct hisi_i2c_controller *ctlr = i2c_get_adapdata(adap); + + i2c_hisi_pin_mux_change(ctlr->dev, false); +} + +static void hisi_i2c_init_recovery_info(struct hisi_i2c_controller *ctlr) +{ + struct i2c_bus_recovery_info *rinfo = &ctlr->rinfo; + struct acpi_device *adev = ACPI_COMPANION(ctlr->dev); + struct gpio_desc *gpio; + + if (!acpi_has_method(adev->handle, HISI_I2C_PIN_MUX_METHOD)) + return; + + gpio = devm_gpiod_get_optional(ctlr->dev, "scl", GPIOD_OUT_HIGH); + if (IS_ERR_OR_NULL(gpio)) + return; + + rinfo->scl_gpiod = gpio; + + gpio = devm_gpiod_get_optional(ctlr->dev, "sda", GPIOD_IN); + if (IS_ERR(gpio)) + return; + + rinfo->sda_gpiod = gpio; + rinfo->recover_bus = i2c_generic_scl_recovery; + rinfo->prepare_recovery = i2c_hisi_prepare_recovery; + rinfo->unprepare_recovery = i2c_hisi_unprepare_recovery; + + ctlr->adapter.bus_recovery_info = rinfo; +} +#else +static inline void hisi_i2c_init_recovery_info(struct hisi_i2c_controller *ctlr) { } +#endif /* CONFIG_ACPI */ + static int hisi_i2c_probe(struct platform_device *pdev) { struct hisi_i2c_controller *ctlr; @@ -478,6 +554,8 @@ static int hisi_i2c_probe(struct platform_device *pdev) adapter->dev.parent = dev; i2c_set_adapdata(adapter, ctlr);
+ hisi_i2c_init_recovery_info(ctlr); + ret = devm_i2c_add_adapter(dev, adapter); if (ret) return ret;
From: Li Nan linan122@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I617GN CVE: NA
--------------------------------
q->tag_set can be NULL in blk_mq_queue_tag_busy_iter() while the queue has not been fully initialized:
CPU0                                    CPU1
dm_mq_init_request_queue
  md->tag_set = kzalloc_node
  blk_mq_init_allocated_queue
    q->mq_ops = set->ops;
                                        diskstats_show
                                          part_get_stat_info
                                            if (q->mq_ops)
                                              blk_mq_in_flight_with_stat
                                                blk_mq_queue_tag_busy_iter
                                                  if (blk_mq_is_shared_tags(q->tag_set->flags))
                                                  // q->tag_set is NULL here
    q->tag_set = set
  blk_register_queue
    blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q)
The same bug can be triggered by 'cat /sys/block/[device]/inflight'. Fix it by checking the QUEUE_FLAG_REGISTERED flag. Although this may temporarily cause some I/O not to be counted, it does no harm in real use cases.
Signed-off-by: Li Nan linan122@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- block/blk-mq-tag.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index 24b48a2f7fba..87bb146c7d44 100644 --- a/block/blk-mq-tag.c +++ b/block/blk-mq-tag.c @@ -515,6 +515,13 @@ EXPORT_SYMBOL(blk_mq_tagset_wait_completed_request); void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn, void *priv) { + /* + * For dm, it can run here after register_disk, but the queue has not + * been initialized yet. Check QUEUE_FLAG_REGISTERED prevent null point + * access. + */ + if (!blk_queue_registered(q)) + return; /* * __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx * while the queue is frozen. So we can use q_usage_counter to avoid
From: Li Nan linan122@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I617GN CVE: NA
--------------------------------
Since 8b97d51a0c9c, blk_mq_queue_tag_busy_iter() returns directly if the queue has not been registered. However, scsi_scan issues I/O before the queue is registered, and this causes an I/O hang because some special SCSI drivers (e.g. ata_piix) rely on blk_mq_timeout_work() to complete I/O while the driver initializes during the scan. Fix the bug by moving the QUEUE_FLAG_REGISTERED check up into the callers.
Fixes: 8b97d51a0c9c ("[Huawei] blk-mq: fix null pointer dereference in blk_mq_queue_tag_busy_ite") Signed-off-by: Li Nan linan122@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- block/blk-mq-tag.c | 7 ------- block/blk-mq.c | 12 ++++++++---- 2 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index 87bb146c7d44..24b48a2f7fba 100644 --- a/block/blk-mq-tag.c +++ b/block/blk-mq-tag.c @@ -515,13 +515,6 @@ EXPORT_SYMBOL(blk_mq_tagset_wait_completed_request); void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn, void *priv) { - /* - * For dm, it can run here after register_disk, but the queue has not - * been initialized yet. Check QUEUE_FLAG_REGISTERED prevent null point - * access. - */ - if (!blk_queue_registered(q)) - return; /* * __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx * while the queue is frozen. So we can use q_usage_counter to avoid diff --git a/block/blk-mq.c b/block/blk-mq.c index d494297880ef..7fbbad7b08b3 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -151,7 +151,8 @@ unsigned int blk_mq_in_flight_with_stat(struct request_queue *q, { struct mq_inflight mi = { .part = part };
- blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight_with_stat, &mi); + if (blk_queue_registered(q)) + blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight_with_stat, &mi);
return mi.inflight[0] + mi.inflight[1]; } @@ -174,7 +175,8 @@ unsigned int blk_mq_in_flight(struct request_queue *q, struct hd_struct *part) { struct mq_inflight mi = { .part = part };
- blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi); + if (blk_queue_registered(q)) + blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi);
return mi.inflight[0] + mi.inflight[1]; } @@ -184,7 +186,8 @@ void blk_mq_in_flight_rw(struct request_queue *q, struct hd_struct *part, { struct mq_inflight mi = { .part = part };
- blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi); + if (blk_queue_registered(q)) + blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi); inflight[0] = mi.inflight[0]; inflight[1] = mi.inflight[1]; } @@ -974,7 +977,8 @@ bool blk_mq_queue_inflight(struct request_queue *q) { bool busy = false;
- blk_mq_queue_tag_busy_iter(q, blk_mq_rq_inflight, &busy); + if (blk_queue_registered(q)) + blk_mq_queue_tag_busy_iter(q, blk_mq_rq_inflight, &busy); return busy; } EXPORT_SYMBOL_GPL(blk_mq_queue_inflight);
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
stable inclusion from stable-v5.10.154 commit 26ca2ac091b49281d73df86111d16e5a76e43bd7 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5ZNRS CVE: CVE-2022-42895
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit b1a2cd50c0357f243b7435a732b4e62ba3157a2e upstream.
On l2cap_parse_conf_req the variable efs is only initialized if remote_efs has been set.
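A condensed user-space illustration of the pattern the fix addresses; the struct and flag names are simplified stand-ins for the l2cap code, not its real definitions:

#include <stdio.h>
#include <string.h>

struct efs_like {               /* stand-in for struct l2cap_conf_efs */
        unsigned char id;
        unsigned char stype;
        unsigned short msdu;
};

int main(void)
{
        struct efs_like efs;    /* deliberately left uninitialized */
        int remote_efs = 0;     /* the remote EFS option was never parsed */
        int efs_enabled = 1;    /* but the local flag is set */

        if (remote_efs)
                memset(&efs, 0x42, sizeof(efs));        /* the only init path */

        /* buggy pattern: gate only on the local flag */
        if (efs_enabled)
                printf("buggy: would copy uninitialized efs fields\n");

        /* fixed pattern: also require the remote option to have been parsed */
        if (remote_efs && efs_enabled)
                printf("fixed: uses efs only when it was initialized\n");
        else
                printf("fixed: skipped, efs never initialized\n");

        return 0;
}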
CVE: CVE-2022-42895 CC: stable@vger.kernel.org Reported-by: Tamás Koczka poprdi@google.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Reviewed-by: Tedd Ho-Jeong An tedd.an@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Baisong Zhong zhongbaisong@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- net/bluetooth/l2cap_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 584a9deb8c80..d35b168e7793 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -3758,7 +3758,8 @@ static int l2cap_parse_conf_req(struct l2cap_chan *chan, void *data, size_t data l2cap_add_conf_opt(&ptr, L2CAP_CONF_RFC, sizeof(rfc), (unsigned long) &rfc, endptr - ptr);
- if (test_bit(FLAG_EFS_ENABLE, &chan->flags)) { + if (remote_efs && + test_bit(FLAG_EFS_ENABLE, &chan->flags)) { chan->remote_id = efs.id; chan->remote_stype = efs.stype; chan->remote_msdu = le16_to_cpu(efs.msdu);
From: Mark Rutland mark.rutland@arm.com
mainline inclusion from mainline-v6.1-rc1 commit 8cfb08575c6d4585f1ce0deeb189e5c824776b04 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61CHA CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h...
--------------------------------
Li Huafei reports that mcount-based ftrace with module PLTs was broken by commit:
a6253579977e4c6f ("arm64: ftrace: consistently handle PLTs.")
When module PLTs are used and a module is loaded sufficiently far away from the kernel, we'll create PLTs for any branches which are out-of-range. These are separate from the special ftrace trampoline PLTs, which the module PLT code doesn't directly manipulate.
When mcount is in use this is a problem, as each mcount callsite in a module will be initialized to point to a module PLT, but since commit a6253579977e4c6f ftrace_make_nop() will assume that the callsite has been initialized to point to the special ftrace trampoline PLT, and ftrace_find_callable_addr() rejects other cases.
This means that when ftrace tries to initialize a callsite via ftrace_make_nop(), the call to ftrace_find_callable_addr() will find that the `_mcount` stub is out-of-range and is not handled by the ftrace PLT, resulting in a splat:
| ftrace_test: loading out-of-tree module taints kernel. | ftrace: no module PLT for _mcount | ------------[ ftrace bug ]------------ | ftrace failed to modify | [<ffff800029180014>] 0xffff800029180014 | actual: 44:00:00:94 | Initializing ftrace call sites | ftrace record flags: 2000000 | (0) | expected tramp: ffff80000802eb3c | ------------[ cut here ]------------ | WARNING: CPU: 3 PID: 157 at kernel/trace/ftrace.c:2120 ftrace_bug+0x94/0x270 | Modules linked in: | CPU: 3 PID: 157 Comm: insmod Tainted: G O 6.0.0-rc6-00151-gcd722513a189-dirty #22 | Hardware name: linux,dummy-virt (DT) | pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : ftrace_bug+0x94/0x270 | lr : ftrace_bug+0x21c/0x270 | sp : ffff80000b2bbaf0 | x29: ffff80000b2bbaf0 x28: 0000000000000000 x27: ffff0000c4d38000 | x26: 0000000000000001 x25: ffff800009d7e000 x24: ffff0000c4d86e00 | x23: 0000000002000000 x22: ffff80000a62b000 x21: ffff8000098ebea8 | x20: ffff0000c4d38000 x19: ffff80000aa24158 x18: ffffffffffffffff | x17: 0000000000000000 x16: 0a0d2d2d2d2d2d2d x15: ffff800009aa9118 | x14: 0000000000000000 x13: 6333626532303830 x12: 3030303866666666 | x11: 203a706d61727420 x10: 6465746365707865 x9 : 3362653230383030 | x8 : c0000000ffffefff x7 : 0000000000017fe8 x6 : 000000000000bff4 | x5 : 0000000000057fa8 x4 : 0000000000000000 x3 : 0000000000000001 | x2 : ad2cb14bb5438900 x1 : 0000000000000000 x0 : 0000000000000022 | Call trace: | ftrace_bug+0x94/0x270 | ftrace_process_locs+0x308/0x430 | ftrace_module_init+0x44/0x60 | load_module+0x15b4/0x1ce8 | __do_sys_init_module+0x1ec/0x238 | __arm64_sys_init_module+0x24/0x30 | invoke_syscall+0x54/0x118 | el0_svc_common.constprop.4+0x84/0x100 | do_el0_svc+0x3c/0xd0 | el0_svc+0x1c/0x50 | el0t_64_sync_handler+0x90/0xb8 | el0t_64_sync+0x15c/0x160 | ---[ end trace 0000000000000000 ]--- | ---------test_init-----------
Fix this by reverting to the old behaviour of ignoring the old instruction when initialising an mcount callsite in a module, which was the behaviour prior to commit a6253579977e4c6f.
Signed-off-by: Mark Rutland mark.rutland@arm.com Fixes: a6253579977e ("arm64: ftrace: consistently handle PLTs.") Reported-by: Li Huafei lihuafei1@huawei.com Link: https://lore.kernel.org/linux-arm-kernel/20220929094134.99512-1-lihuafei1@hu... Cc: Ard Biesheuvel ardb@kernel.org Cc: Will Deacon will@kernel.org Link: https://lore.kernel.org/r/20220929134525.798593-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Li Huafei lihuafei1@huawei.com Reviewed-by: Yang Jihong yangjihong1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/kernel/ftrace.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c index 3724bab278b2..402a24f845b9 100644 --- a/arch/arm64/kernel/ftrace.c +++ b/arch/arm64/kernel/ftrace.c @@ -216,11 +216,26 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long pc = rec->ip; u32 old = 0, new;
+ new = aarch64_insn_gen_nop(); + + /* + * When using mcount, callsites in modules may have been initalized to + * call an arbitrary module PLT (which redirects to the _mcount stub) + * rather than the ftrace PLT we'll use at runtime (which redirects to + * the ftrace trampoline). We can ignore the old PLT when initializing + * the callsite. + * + * Note: 'mod' is only set at module load time. + */ + if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) && + IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) && mod) { + return aarch64_insn_patch_text_nosync((void *)pc, new); + } + if (!ftrace_find_callable_addr(rec, mod, &addr)) return -EINVAL;
old = aarch64_insn_gen_branch_imm(pc, addr, AARCH64_INSN_BRANCH_LINK); - new = aarch64_insn_gen_nop();
return ftrace_modify_code(pc, old, new, true); }
From: Yu Liao liaoyu15@huawei.com
hulk inclusion category: bugfix bugzilla: 186781, https://gitee.com/openeuler/kernel/issues/I61CEW CVE: NA
--------------------------------
Split a function that does not acquire ops_lock out of rtc_update_irq_enable(), in preparation for fixing the RTC_RD_TIME and RTC_UIE_ON race problem.
Signed-off-by: Yu Liao liaoyu15@huawei.com Reviewed-by: Wei Li liwei391@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/rtc/interface.c | 34 ++++++++++++++++++++-------------- include/linux/rtc.h | 1 + 2 files changed, 21 insertions(+), 14 deletions(-)
diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c index 146056858135..d4e576a65885 100644 --- a/drivers/rtc/interface.c +++ b/drivers/rtc/interface.c @@ -543,20 +543,10 @@ int rtc_alarm_irq_enable(struct rtc_device *rtc, unsigned int enabled) } EXPORT_SYMBOL_GPL(rtc_alarm_irq_enable);
-int rtc_update_irq_enable(struct rtc_device *rtc, unsigned int enabled) +int __rtc_update_irq_enable(struct rtc_device *rtc, unsigned int enabled) { - int rc = 0, err; + int rc = 0, err = 0;
- err = mutex_lock_interruptible(&rtc->ops_lock); - if (err) - return err; - -#ifdef CONFIG_RTC_INTF_DEV_UIE_EMUL - if (enabled == 0 && rtc->uie_irq_active) { - mutex_unlock(&rtc->ops_lock); - return rtc_dev_update_irq_enable_emul(rtc, 0); - } -#endif /* make sure we're changing state */ if (rtc->uie_rtctimer.enabled == enabled) goto out; @@ -583,8 +573,6 @@ int rtc_update_irq_enable(struct rtc_device *rtc, unsigned int enabled) }
out: - mutex_unlock(&rtc->ops_lock); - /* * __rtc_read_time() failed, this probably means that the RTC time has * never been set or less probably there is a transient error on the @@ -594,6 +582,24 @@ int rtc_update_irq_enable(struct rtc_device *rtc, unsigned int enabled) if (rc) return rc;
+ return err; +} + +int rtc_update_irq_enable(struct rtc_device *rtc, unsigned int enabled) +{ + int err; + +#ifdef CONFIG_RTC_INTF_DEV_UIE_EMUL + if (enabled == 0 && rtc->uie_irq_active) + return rtc_dev_update_irq_enable_emul(rtc, 0); +#endif + + err = mutex_lock_interruptible(&rtc->ops_lock); + if (err) + return err; + err = __rtc_update_irq_enable(rtc, enabled); + mutex_unlock(&rtc->ops_lock); + #ifdef CONFIG_RTC_INTF_DEV_UIE_EMUL /* * Enable emulation if the driver returned -EINVAL to signal that it has diff --git a/include/linux/rtc.h b/include/linux/rtc.h index 22d1575e4991..d0cd0a611ad3 100644 --- a/include/linux/rtc.h +++ b/include/linux/rtc.h @@ -182,6 +182,7 @@ extern void rtc_class_close(struct rtc_device *rtc); extern int rtc_irq_set_state(struct rtc_device *rtc, int enabled); extern int rtc_irq_set_freq(struct rtc_device *rtc, int freq); extern int rtc_update_irq_enable(struct rtc_device *rtc, unsigned int enabled); +extern int __rtc_update_irq_enable(struct rtc_device *rtc, unsigned int enabled); extern int rtc_alarm_irq_enable(struct rtc_device *rtc, unsigned int enabled); extern int rtc_dev_update_irq_enable_emul(struct rtc_device *rtc, unsigned int enabled);
From: Yu Liao liaoyu15@huawei.com
hulk inclusion category: bugfix bugzilla: 186781, https://gitee.com/openeuler/kernel/issues/I61CEW CVE: NA
--------------------------------
When the RTC_SET_TIME and RTC_RD_TIME threads run in parallel, there is no guarantee that uie_rtctimer.enabled is equal to the previously read uie when executing rtc->ops->set_time.
Fix this by reading the uie state, disabling uie, setting the RTC time, and re-enabling uie all within a single critical section.
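The structure of the fix is the common locked-wrapper/unlocked-helper split. Below is a stand-alone pthread sketch of that idiom; the names are simplified and this is not the rtc code itself:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t ops_lock = PTHREAD_MUTEX_INITIALIZER;
static int uie_enabled;

/* caller must hold ops_lock, mirroring __rtc_update_irq_enable() */
static int __update_irq_enable(int enabled)
{
        if (uie_enabled == enabled)
                return 0;
        uie_enabled = enabled;
        return 0;
}

static int update_irq_enable(int enabled)
{
        pthread_mutex_lock(&ops_lock);
        int err = __update_irq_enable(enabled);
        pthread_mutex_unlock(&ops_lock);
        return err;
}

static int set_time(void)
{
        int err = 0;

        pthread_mutex_lock(&ops_lock);
        int uie = uie_enabled;          /* read and act on uie under one lock */
        if (uie)
                __update_irq_enable(0); /* no deadlock: lock already held */

        /* ... program the hardware clock here ... */

        if (uie)
                err = __update_irq_enable(1);
        pthread_mutex_unlock(&ops_lock);
        return err;
}

int main(void)
{
        update_irq_enable(1);
        set_time();
        printf("uie_enabled = %d\n", uie_enabled);
        return 0;
}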
Fixes: 7e7c005b4b1f ("rtc: disable uie before setting time and enable after") Signed-off-by: Yu Liao liaoyu15@huawei.com Reviewed-by: Wei Li liwei391@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/rtc/interface.c | 28 ++++++++++++---------------- 1 file changed, 12 insertions(+), 16 deletions(-)
diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c index d4e576a65885..16fbcf6835f3 100644 --- a/drivers/rtc/interface.c +++ b/drivers/rtc/interface.c @@ -137,21 +137,19 @@ int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm)
rtc_subtract_offset(rtc, tm);
-#ifdef CONFIG_RTC_INTF_DEV_UIE_EMUL - uie = rtc->uie_rtctimer.enabled || rtc->uie_irq_active; -#else + err = mutex_lock_interruptible(&rtc->ops_lock); + if (err) + return err; + uie = rtc->uie_rtctimer.enabled; -#endif if (uie) { - err = rtc_update_irq_enable(rtc, 0); - if (err) + err = __rtc_update_irq_enable(rtc, 0); + if (err) { + mutex_unlock(&rtc->ops_lock); return err; + } }
- err = mutex_lock_interruptible(&rtc->ops_lock); - if (err) - return err; - if (!rtc->ops) err = -ENODEV; else if (rtc->ops->set_time) @@ -160,16 +158,14 @@ int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm) err = -EINVAL;
pm_stay_awake(rtc->dev.parent); + + if (uie) + err = __rtc_update_irq_enable(rtc, 1); + mutex_unlock(&rtc->ops_lock); /* A timer might have just expired */ schedule_work(&rtc->irqwork);
- if (uie) { - err = rtc_update_irq_enable(rtc, 1); - if (err) - return err; - } - trace_rtc_set_time(rtc_tm_to_time64(tm), err); return err; }
From: Junhao He hejunhao3@huawei.com
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5KAX7
--------------------------------------------------------------------------
Fix the KABI change that occurred when the HiSilicon PMU drivers added new entries to "enum cpuhp_state {}".

Switch the hisi_pcie_pmu and hisi_cpa_pmu drivers from explicitly specified hotplug states to dynamically allocated ones (CPUHP_AP_ONLINE_DYN). The states between CPUHP_AP_ONLINE_DYN and CPUHP_AP_ONLINE_DYN_END are reserved for dynamic allocation.
Signed-off-by: Junhao He hejunhao3@huawei.com Reviewed-by: Yicong Yang yangyicong@huawei.com Reviewed-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xiongfeng Wang wangxiongfeng2@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/perf/hisilicon/hisi_pcie_pmu.c | 22 ++++++++++--------- drivers/perf/hisilicon/hisi_uncore_cpa_pmu.c | 23 ++++++++++---------- include/linux/cpuhotplug.h | 6 ----- 3 files changed, 24 insertions(+), 27 deletions(-)
diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c index cd5b719b8c2e..88d488765203 100644 --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c @@ -19,6 +19,9 @@ #include <linux/pci.h> #include <linux/perf_event.h>
+/* Dynamic CPU hotplug state used by PCIe PMU */ +static enum cpuhp_state hisi_pcie_pmu_online; + #define DRV_NAME "hisi_pcie_pmu" /* Define registers */ #define HISI_PCIE_GLOBAL_CTRL 0x00 @@ -830,7 +833,7 @@ static int hisi_pcie_init_pmu(struct pci_dev *pdev, struct hisi_pcie_pmu *pcie_p if (ret) goto err_iounmap;
- ret = cpuhp_state_add_instance(CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE, &pcie_pmu->node); + ret = cpuhp_state_add_instance(hisi_pcie_pmu_online, &pcie_pmu->node); if (ret) { pci_err(pdev, "Failed to register hotplug: %d\n", ret); goto err_irq_unregister; @@ -845,8 +848,7 @@ static int hisi_pcie_init_pmu(struct pci_dev *pdev, struct hisi_pcie_pmu *pcie_p return ret;
err_hotplug_unregister: - cpuhp_state_remove_instance_nocalls( - CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE, &pcie_pmu->node); + cpuhp_state_remove_instance_nocalls(hisi_pcie_pmu_online, &pcie_pmu->node);
err_irq_unregister: hisi_pcie_pmu_irq_unregister(pdev, pcie_pmu); @@ -862,8 +864,7 @@ static void hisi_pcie_uninit_pmu(struct pci_dev *pdev) struct hisi_pcie_pmu *pcie_pmu = pci_get_drvdata(pdev);
perf_pmu_unregister(&pcie_pmu->pmu); - cpuhp_state_remove_instance_nocalls( - CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE, &pcie_pmu->node); + cpuhp_state_remove_instance_nocalls(hisi_pcie_pmu_online, &pcie_pmu->node); hisi_pcie_pmu_irq_unregister(pdev, pcie_pmu); iounmap(pcie_pmu->base); } @@ -934,18 +935,19 @@ static int __init hisi_pcie_module_init(void) { int ret;
- ret = cpuhp_setup_state_multi(CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE, - "AP_PERF_ARM_HISI_PCIE_PMU_ONLINE", + ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, + "perf/hisi/pcie:online", hisi_pcie_pmu_online_cpu, hisi_pcie_pmu_offline_cpu); - if (ret) { + if (ret < 0) { pr_err("Failed to setup PCIe PMU hotplug: %d\n", ret); return ret; } + hisi_pcie_pmu_online = ret;
ret = pci_register_driver(&hisi_pcie_pmu_driver); if (ret) - cpuhp_remove_multi_state(CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE); + cpuhp_remove_multi_state(hisi_pcie_pmu_online);
return ret; } @@ -954,7 +956,7 @@ module_init(hisi_pcie_module_init); static void __exit hisi_pcie_module_exit(void) { pci_unregister_driver(&hisi_pcie_pmu_driver); - cpuhp_remove_multi_state(CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE); + cpuhp_remove_multi_state(hisi_pcie_pmu_online); } module_exit(hisi_pcie_module_exit);
diff --git a/drivers/perf/hisilicon/hisi_uncore_cpa_pmu.c b/drivers/perf/hisilicon/hisi_uncore_cpa_pmu.c index a9bb73f76be4..09839dae9b7c 100644 --- a/drivers/perf/hisilicon/hisi_uncore_cpa_pmu.c +++ b/drivers/perf/hisilicon/hisi_uncore_cpa_pmu.c @@ -19,6 +19,9 @@
#include "hisi_uncore_pmu.h"
+/* Dynamic CPU hotplug state used by CPA PMU */ +static enum cpuhp_state hisi_cpa_pmu_online; + /* CPA register definition */ #define CPA_PERF_CTRL 0x1c00 #define CPA_EVENT_CTRL 0x1c04 @@ -334,8 +337,7 @@ static int hisi_cpa_pmu_probe(struct platform_device *pdev)
/* Power Management should be disabled before using CPA PMU. */ hisi_cpa_pmu_disable_pm(cpa_pmu); - ret = cpuhp_state_add_instance(CPUHP_AP_PERF_ARM_HISI_CPA_ONLINE, - &cpa_pmu->node); + ret = cpuhp_state_add_instance(hisi_cpa_pmu_online, &cpa_pmu->node); if (ret) { dev_err(&pdev->dev, "Error %d registering hotplug\n", ret); hisi_cpa_pmu_enable_pm(cpa_pmu); @@ -345,8 +347,7 @@ static int hisi_cpa_pmu_probe(struct platform_device *pdev) ret = perf_pmu_register(&cpa_pmu->pmu, name, -1); if (ret) { dev_err(cpa_pmu->dev, "PMU register failed\n"); - cpuhp_state_remove_instance_nocalls( - CPUHP_AP_PERF_ARM_HISI_CPA_ONLINE, &cpa_pmu->node); + cpuhp_state_remove_instance_nocalls(hisi_cpa_pmu_online, &cpa_pmu->node); hisi_cpa_pmu_enable_pm(cpa_pmu); return ret; } @@ -360,8 +361,7 @@ static int hisi_cpa_pmu_remove(struct platform_device *pdev) struct hisi_pmu *cpa_pmu = platform_get_drvdata(pdev);
perf_pmu_unregister(&cpa_pmu->pmu); - cpuhp_state_remove_instance_nocalls(CPUHP_AP_PERF_ARM_HISI_CPA_ONLINE, - &cpa_pmu->node); + cpuhp_state_remove_instance_nocalls(hisi_cpa_pmu_online, &cpa_pmu->node); hisi_cpa_pmu_enable_pm(cpa_pmu); return 0; } @@ -380,18 +380,19 @@ static int __init hisi_cpa_pmu_module_init(void) { int ret;
- ret = cpuhp_setup_state_multi(CPUHP_AP_PERF_ARM_HISI_CPA_ONLINE, - "AP_PERF_ARM_HISI_CPA_ONLINE", + ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, + "pmu/hisi/cpa:online", hisi_uncore_pmu_online_cpu, hisi_uncore_pmu_offline_cpu); - if (ret) { + if (ret < 0) { pr_err("setup hotplug failed: %d\n", ret); return ret; } + hisi_cpa_pmu_online = ret;
ret = platform_driver_register(&hisi_cpa_pmu_driver); if (ret) - cpuhp_remove_multi_state(CPUHP_AP_PERF_ARM_HISI_CPA_ONLINE); + cpuhp_remove_multi_state(hisi_cpa_pmu_online);
return ret; } @@ -400,7 +401,7 @@ module_init(hisi_cpa_pmu_module_init); static void __exit hisi_cpa_pmu_module_exit(void) { platform_driver_unregister(&hisi_cpa_pmu_driver); - cpuhp_remove_multi_state(CPUHP_AP_PERF_ARM_HISI_CPA_ONLINE); + cpuhp_remove_multi_state(hisi_cpa_pmu_online); } module_exit(hisi_cpa_pmu_module_exit);
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 5571bfc2ec6e..b98b9eb7d5f8 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -173,17 +173,11 @@ enum cpuhp_state { CPUHP_AP_PERF_S390_SF_ONLINE, CPUHP_AP_PERF_ARM_CCI_ONLINE, CPUHP_AP_PERF_ARM_CCN_ONLINE, - #ifndef __GENKSYMS__ - CPUHP_AP_PERF_ARM_HISI_CPA_ONLINE, - #endif CPUHP_AP_PERF_ARM_HISI_DDRC_ONLINE, CPUHP_AP_PERF_ARM_HISI_HHA_ONLINE, CPUHP_AP_PERF_ARM_HISI_L3_ONLINE, CPUHP_AP_PERF_ARM_HISI_PA_ONLINE, CPUHP_AP_PERF_ARM_HISI_SLLC_ONLINE, - #ifndef __GENKSYMS__ - CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE, - #endif CPUHP_AP_PERF_ARM_L2X0_ONLINE, CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE, CPUHP_AP_PERF_ARM_QCOM_L3_ONLINE,
From: Zheng Zucheng zhengzucheng@huawei.com
hulk inclusion category: feature bugzilla: 187196, https://gitee.com/openeuler/kernel/issues/I612CS
-------------------------------
Allocate a new task_struct_resvd object for the recently cloned task
Signed-off-by: Zheng Zucheng zhengzucheng@huawei.com Reviewed-by: Zhang Qiao zhangqiao22@huawei.com Reviewed-by: Nanyong Sun sunnanyong@huawei.com Reviewed-by: chenhui 00515652 judy.chenhui@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/linux/sched.h | 2 ++ init/init_task.c | 5 +++++ kernel/fork.c | 21 ++++++++++++++++++++- 3 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h index aeeaca7dd253..9abd00ed53b4 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -677,6 +677,8 @@ struct wake_q_node { * struct task_struct_resvd - KABI extension struct */ struct task_struct_resvd { + /* pointer back to the main task_struct */ + struct task_struct *task; };
struct task_struct { diff --git a/init/init_task.c b/init/init_task.c index 5fa18ed59d33..891007de2eef 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -57,6 +57,10 @@ unsigned long init_shadow_call_stack[SCS_SIZE / sizeof(long)] }; #endif
+static struct task_struct_resvd init_task_struct_resvd = { + .task = &init_task, +}; + /* * Set up the first task table, touch at your own risk!. Base=0, * limit=0x1fffff (=2MB) @@ -213,6 +217,7 @@ struct task_struct init_task #ifdef CONFIG_SECCOMP_FILTER .seccomp = { .filter_count = ATOMIC_INIT(0) }, #endif + ._resvd = &init_task_struct_resvd, }; EXPORT_SYMBOL(init_task);
diff --git a/kernel/fork.c b/kernel/fork.c index 8dbb8d985e78..8a40bfda35e1 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -173,6 +173,7 @@ static inline struct task_struct *alloc_task_struct_node(int node)
static inline void free_task_struct(struct task_struct *tsk) { + kfree(tsk->_resvd); kmem_cache_free(task_struct_cachep, tsk); } #endif @@ -851,6 +852,18 @@ void set_task_stack_end_magic(struct task_struct *tsk) *stackend = STACK_END_MAGIC; /* for overflow detection */ }
+static bool dup_resvd_task_struct(struct task_struct *dst, + struct task_struct *orig, int node) +{ + dst->_resvd = kmalloc_node(sizeof(struct task_struct_resvd), + GFP_KERNEL, node); + if (!dst->_resvd) + return false; + + dst->_resvd->task = dst; + return true; +} + static struct task_struct *dup_task_struct(struct task_struct *orig, int node) { struct task_struct *tsk; @@ -863,6 +876,12 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) tsk = alloc_task_struct_node(node); if (!tsk) return NULL; + /* + * before proceeding, we need to make tsk->_resvd = NULL, + * otherwise the error paths below, if taken, might end up causing + * a double-free for task_struct_resvd extension object. + */ + WRITE_ONCE(tsk->_resvd, NULL);
stack = alloc_thread_stack_node(tsk, node); if (!stack) @@ -888,7 +907,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) refcount_set(&tsk->stack_refcount, 1); #endif
- if (err) + if (err || !dup_resvd_task_struct(tsk, orig, node)) goto free_stack;
err = scs_prepare(tsk, node);
From: Nico Pache npache@redhat.com
mainline inclusion from mainline-v5.18-rc4 commit e4a38402c36e42df28eb1a5394be87e6571fb48a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61FDP CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can be targeted by the oom reaper. This mapping is used to store the futex robust list head; the kernel does not keep a copy of the robust list and instead references a userspace address to maintain the robustness during a process death.
A race can occur between exit_mm and the oom reaper that allows the oom reaper to free the memory of the futex robust list before the exit path has handled the futex death:
CPU1                                    CPU2
--------------------------------------------------------------------
page_fault
do_exit "signal"
wake_oom_reaper
                                        oom_reaper
                                        oom_reap_task_mm (invalidates mm)
exit_mm
exit_mm_release
futex_exit_release
futex_cleanup
exit_robust_list
get_user (EFAULT- can't access memory)
If the get_user EFAULT's, the kernel will be unable to recover the waiters on the robust_list, leaving userspace mutexes hung indefinitely.
Delay the OOM reaper, allowing more time for the exit path to perform the futex cleanup.
Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer
Based on a patch by Michal Hocko.
Link: https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370 [1] Link: https://lkml.kernel.org/r/20220414144042.677008-1-npache@redhat.com Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: Joel Savitz jsavitz@redhat.com Signed-off-by: Nico Pache npache@redhat.com Co-developed-by: Joel Savitz jsavitz@redhat.com Suggested-by: Thomas Gleixner tglx@linutronix.de Acked-by: Thomas Gleixner tglx@linutronix.de Acked-by: Michal Hocko mhocko@suse.com Cc: Rafael Aquini aquini@redhat.com Cc: Waiman Long longman@redhat.com Cc: Herton R. Krzesinski herton@redhat.com Cc: Juri Lelli juri.lelli@redhat.com Cc: Vincent Guittot vincent.guittot@linaro.org Cc: Dietmar Eggemann dietmar.eggemann@arm.com Cc: Steven Rostedt rostedt@goodmis.org Cc: Ben Segall bsegall@google.com Cc: Mel Gorman mgorman@suse.de Cc: Daniel Bristot de Oliveira bristot@redhat.com Cc: David Rientjes rientjes@google.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: Davidlohr Bueso dave@stgolabs.net Cc: Peter Zijlstra peterz@infradead.org Cc: Ingo Molnar mingo@redhat.com Cc: Joel Savitz jsavitz@redhat.com Cc: Darren Hart dvhart@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Ma Wupeng mawupeng1@huawei.com Reviewed-by: Nanyong Sun sunnanyong@huawei.com Reviewed-by: chenhui 00515652 judy.chenhui@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/linux/sched.h | 1 + mm/oom_kill.c | 54 ++++++++++++++++++++++++++++++++----------- 2 files changed, 41 insertions(+), 14 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h index 9abd00ed53b4..a5787b3f51a8 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1358,6 +1358,7 @@ struct task_struct { int pagefault_disabled; #ifdef CONFIG_MMU struct task_struct *oom_reaper_list; + struct timer_list oom_reaper_timer; #endif #ifdef CONFIG_VMAP_STACK struct vm_struct *stack_vm_area; diff --git a/mm/oom_kill.c b/mm/oom_kill.c index dd2b4f890403..0c46b493599e 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -680,7 +680,7 @@ static void oom_reap_task(struct task_struct *tsk) */ set_bit(MMF_OOM_SKIP, &mm->flags);
- /* Drop a reference taken by wake_oom_reaper */ + /* Drop a reference taken by queue_oom_reaper */ put_task_struct(tsk); }
@@ -690,12 +690,12 @@ static int oom_reaper(void *unused) struct task_struct *tsk = NULL;
wait_event_freezable(oom_reaper_wait, oom_reaper_list != NULL); - spin_lock(&oom_reaper_lock); + spin_lock_irq(&oom_reaper_lock); if (oom_reaper_list != NULL) { tsk = oom_reaper_list; oom_reaper_list = tsk->oom_reaper_list; } - spin_unlock(&oom_reaper_lock); + spin_unlock_irq(&oom_reaper_lock);
if (tsk) oom_reap_task(tsk); @@ -704,22 +704,48 @@ static int oom_reaper(void *unused) return 0; }
-static void wake_oom_reaper(struct task_struct *tsk) +static void wake_oom_reaper(struct timer_list *timer) { - /* mm is already queued? */ - if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags)) - return; + struct task_struct *tsk = container_of(timer, struct task_struct, + oom_reaper_timer); + struct mm_struct *mm = tsk->signal->oom_mm; + unsigned long flags;
- get_task_struct(tsk); + /* The victim managed to terminate on its own - see exit_mmap */ + if (test_bit(MMF_OOM_SKIP, &mm->flags)) { + put_task_struct(tsk); + return; + }
- spin_lock(&oom_reaper_lock); + spin_lock_irqsave(&oom_reaper_lock, flags); tsk->oom_reaper_list = oom_reaper_list; oom_reaper_list = tsk; - spin_unlock(&oom_reaper_lock); + spin_unlock_irqrestore(&oom_reaper_lock, flags); trace_wake_reaper(tsk->pid); wake_up(&oom_reaper_wait); }
+/* + * Give the OOM victim time to exit naturally before invoking the oom_reaping. + * The timers timeout is arbitrary... the longer it is, the longer the worst + * case scenario for the OOM can take. If it is too small, the oom_reaper can + * get in the way and release resources needed by the process exit path. + * e.g. The futex robust list can sit in Anon|Private memory that gets reaped + * before the exit path is able to wake the futex waiters. + */ +#define OOM_REAPER_DELAY (2*HZ) +static void queue_oom_reaper(struct task_struct *tsk) +{ + /* mm is already queued? */ + if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags)) + return; + + get_task_struct(tsk); + timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0); + tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY; + add_timer(&tsk->oom_reaper_timer); +} + static int __init oom_init(void) { oom_reaper_th = kthread_run(oom_reaper, NULL, "oom_reaper"); @@ -727,7 +753,7 @@ static int __init oom_init(void) } subsys_initcall(oom_init) #else -static inline void wake_oom_reaper(struct task_struct *tsk) +static inline void queue_oom_reaper(struct task_struct *tsk) { } #endif /* CONFIG_MMU */ @@ -978,7 +1004,7 @@ static void __oom_kill_process(struct task_struct *victim, const char *message) rcu_read_unlock();
if (can_oom_reap) - wake_oom_reaper(victim); + queue_oom_reaper(victim);
mmdrop(mm); put_task_struct(victim); @@ -1014,7 +1040,7 @@ static void oom_kill_process(struct oom_control *oc, const char *message) task_lock(victim); if (task_will_free_mem(victim)) { mark_oom_victim(victim); - wake_oom_reaper(victim); + queue_oom_reaper(victim); task_unlock(victim); put_task_struct(victim); return; @@ -1157,7 +1183,7 @@ bool out_of_memory(struct oom_control *oc) */ if (task_will_free_mem(current)) { mark_oom_victim(current); - wake_oom_reaper(current); + queue_oom_reaper(current); return true; }
From: Ma Wupeng mawupeng1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I61FDP CVE: NA
-------------------------------
Move oom_reaper_timer from task_struct to task_struct_resvd to fix the KABI breakage.
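For context, this is the usual KABI-preservation pattern: the new per-task state is placed in the out-of-line extension structure that task_struct already reaches through an existing pointer, so the size and member offsets of task_struct itself stay unchanged for existing modules. A minimal, self-contained sketch of the idea (illustrative only, not the kernel's real definitions; the stand-in timer_list type is simplified):

    /* kabi_sketch.c - illustrative only, not the kernel's definitions */
    struct timer_list { unsigned long expires; };      /* stand-in type */

    struct task_struct;                                 /* forward declaration */

    /* Extension struct reached through an existing pointer member. */
    struct task_struct_resvd {
            struct task_struct *task;                   /* back-pointer, already present */
            struct timer_list oom_reaper_timer;         /* new field added here */
    };

    struct task_struct {
            /* ... existing members whose offsets must not move ... */
            struct task_struct_resvd *_resvd;           /* pointer size is unchanged */
            /* ... more existing members ... */
    };

The diff below accordingly recovers the timer with container_of() on task_struct_resvd instead of task_struct.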
Signed-off-by: Ma Wupeng mawupeng1@huawei.com Reviewed-by: Nanyong Sun sunnanyong@huawei.com Reviewed-by: chenhui 00515652 judy.chenhui@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/linux/sched.h | 5 ++++- mm/oom_kill.c | 11 ++++++----- 2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h index a5787b3f51a8..65960d47f001 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -679,6 +679,10 @@ struct wake_q_node { struct task_struct_resvd { /* pointer back to the main task_struct */ struct task_struct *task; + +#ifdef CONFIG_MMU + struct timer_list oom_reaper_timer; +#endif };
struct task_struct { @@ -1358,7 +1362,6 @@ struct task_struct { int pagefault_disabled; #ifdef CONFIG_MMU struct task_struct *oom_reaper_list; - struct timer_list oom_reaper_timer; #endif #ifdef CONFIG_VMAP_STACK struct vm_struct *stack_vm_area; diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 0c46b493599e..417ff9574d19 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -706,8 +706,9 @@ static int oom_reaper(void *unused)
static void wake_oom_reaper(struct timer_list *timer) { - struct task_struct *tsk = container_of(timer, struct task_struct, - oom_reaper_timer); + struct task_struct_resvd *tsk_resvd = container_of(timer, + struct task_struct_resvd, oom_reaper_timer); + struct task_struct *tsk = tsk_resvd->task; struct mm_struct *mm = tsk->signal->oom_mm; unsigned long flags;
@@ -741,9 +742,9 @@ static void queue_oom_reaper(struct task_struct *tsk) return;
get_task_struct(tsk); - timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0); - tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY; - add_timer(&tsk->oom_reaper_timer); + timer_setup(&tsk->_resvd->oom_reaper_timer, wake_oom_reaper, 0); + tsk->_resvd->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY; + add_timer(&tsk->_resvd->oom_reaper_timer); }
static int __init oom_init(void)
From: Luo Meng luomeng12@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61HSS CVE: NA
--------------------------------
Add DMINFO() to help track device creation/removal success.
Signed-off-by: Luo Meng luomeng12@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/md/dm-ioctl.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c index b012a2748af8..2186a3a4e48b 100644 --- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c @@ -272,6 +272,9 @@ static struct dm_table *__hash_remove(struct hash_cell *hc) table = NULL; if (hc->new_map) table = hc->new_map; + + DMINFO("%s[%i]: %s (%s) is removed successfully", + current->comm, current->pid, hc->md->disk->disk_name, hc->name); dm_put(hc->md); free_cell(hc);
@@ -773,6 +776,7 @@ static int dev_create(struct file *filp, struct dm_ioctl *param, size_t param_si { int r, m = DM_ANY_MINOR; struct mapped_device *md; + struct hash_cell *hc;
r = check_name(param->name); if (r) @@ -796,6 +800,9 @@ static int dev_create(struct file *filp, struct dm_ioctl *param, size_t param_si
__dev_status(md, param);
+ hc = dm_get_mdptr(md); + DMINFO("%s[%i]: %s (%s) is created successfully", + current->comm, current->pid, md->disk->disk_name, hc->name); dm_put(md);
return 0;
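With the format strings added above, each successful create/remove is logged with the caller's comm and pid plus the kernel disk name and dm device name. Purely for illustration (the process name, pid and device names below are made up), the resulting kernel log lines would look roughly like:

    dmsetup[1234]: dm-0 (test_dev) is created successfully
    dmsetup[1235]: dm-0 (test_dev) is removed successfully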
From: Guangbin Huang huangguangbin2@huawei.com
mainline inclusion from mainline-arm64-upstream commit 39915b6b5fc2 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5KAX7 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
The HNS3 PMU End Point device is supported on the HiSilicon HIP09 platform, so add the document hns3-pmu.rst to provide guidance on how to use it.
Signed-off-by: Guangbin Huang huangguangbin2@huawei.com Reviewed-by: John Garry john.garry@huawei.com Reviewed-by: Shaokun Zhang zhangshaokun@hisilicon.com Link: https://lore.kernel.org/r/20220628063419.38514-2-huangguangbin2@huawei.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Jiantao Xiao xiaojiantao1@h-partners.com Reviewed-by: Xiongfeng Wang wangxiongfeng2@huawei.com Reviewed-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Jian Shen shenjian15@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- Documentation/admin-guide/perf/hns3-pmu.rst | 136 ++++++++++++++++++++ 1 file changed, 136 insertions(+) create mode 100644 Documentation/admin-guide/perf/hns3-pmu.rst
diff --git a/Documentation/admin-guide/perf/hns3-pmu.rst b/Documentation/admin-guide/perf/hns3-pmu.rst new file mode 100644 index 000000000000..578407e487d6 --- /dev/null +++ b/Documentation/admin-guide/perf/hns3-pmu.rst @@ -0,0 +1,136 @@ +====================================== +HNS3 Performance Monitoring Unit (PMU) +====================================== + +HNS3(HiSilicon network system 3) Performance Monitoring Unit (PMU) is an +End Point device to collect performance statistics of HiSilicon SoC NIC. +On Hip09, each SICL(Super I/O cluster) has one PMU device. + +HNS3 PMU supports collection of performance statistics such as bandwidth, +latency, packet rate and interrupt rate. + +Each HNS3 PMU supports 8 hardware events. + +HNS3 PMU driver +=============== + +The HNS3 PMU driver registers a perf PMU with the name of its sicl id.:: + + /sys/devices/hns3_pmu_sicl_<sicl_id> + +PMU driver provides description of available events, filter modes, format, +identifier and cpumask in sysfs. + +The "events" directory describes the event code of all supported events +shown in perf list. + +The "filtermode" directory describes the supported filter modes of each +event. + +The "format" directory describes all formats of the config (events) and +config1 (filter options) fields of the perf_event_attr structure. + +The "identifier" file shows version of PMU hardware device. + +The "bdf_min" and "bdf_max" files show the supported bdf range of each +pmu device. + +The "hw_clk_freq" file shows the hardware clock frequency of each pmu +device. + +Example usage of checking event code and subevent code:: + + $# cat /sys/devices/hns3_pmu_sicl_0/events/dly_tx_normal_to_mac_time + config=0x00204 + $# cat /sys/devices/hns3_pmu_sicl_0/events/dly_tx_normal_to_mac_packet_num + config=0x10204 + +Each performance statistic has a pair of events to get two values to +calculate real performance data in userspace. + +The bits 0~15 of config (here 0x0204) are the true hardware event code. If +two events have same value of bits 0~15 of config, that means they are +event pair. And the bit 16 of config indicates getting counter 0 or +counter 1 of hardware event. + +After getting two values of event pair in usersapce, the formula of +computation to calculate real performance data is::: + + counter 0 / counter 1 + +Example usage of checking supported filter mode:: + + $# cat /sys/devices/hns3_pmu_sicl_0/filtermode/bw_ssu_rpu_byte_num + filter mode supported: global/port/port-tc/func/func-queue/ + +Example usage of perf:: + + $# perf list + hns3_pmu_sicl_0/bw_ssu_rpu_byte_num/ [kernel PMU event] + hns3_pmu_sicl_0/bw_ssu_rpu_time/ [kernel PMU event] + ------------------------------------------ + + $# perf stat -g -e hns3_pmu_sicl_0/bw_ssu_rpu_byte_num,global=1/ -e hns3_pmu_sicl_0/bw_ssu_rpu_time,global=1/ -I 1000 + or + $# perf stat -g -e hns3_pmu_sicl_0/config=0x00002,global=1/ -e hns3_pmu_sicl_0/config=0x10002,global=1/ -I 1000 + + +Filter modes +-------------- + +1. global mode +PMU collect performance statistics for all HNS3 PCIe functions of IO DIE. +Set the "global" filter option to 1 will enable this mode. +Example usage of perf:: + + $# perf stat -a -e hns3_pmu_sicl_0/config=0x1020F,global=1/ -I 1000 + +2. port mode +PMU collect performance statistic of one whole physical port. The port id +is same as mac id. The "tc" filter option must be set to 0xF in this mode, +here tc stands for traffic class. + +Example usage of perf:: + + $# perf stat -a -e hns3_pmu_sicl_0/config=0x1020F,port=0,tc=0xF/ -I 1000 + +3. 
port-tc mode +PMU collect performance statistic of one tc of physical port. The port id +is same as mac id. The "tc" filter option must be set to 0 ~ 7 in this +mode. +Example usage of perf:: + + $# perf stat -a -e hns3_pmu_sicl_0/config=0x1020F,port=0,tc=0/ -I 1000 + +4. func mode +PMU collect performance statistic of one PF/VF. The function id is BDF of +PF/VF, its conversion formula:: + + func = (bus << 8) + (device << 3) + (function) + +for example: + BDF func + 35:00.0 0x3500 + 35:00.1 0x3501 + 35:01.0 0x3508 + +In this mode, the "queue" filter option must be set to 0xFFFF. +Example usage of perf:: + + $# perf stat -a -e hns3_pmu_sicl_0/config=0x1020F,bdf=0x3500,queue=0xFFFF/ -I 1000 + +5. func-queue mode +PMU collect performance statistic of one queue of PF/VF. The function id +is BDF of PF/VF, the "queue" filter option must be set to the exact queue +id of function. +Example usage of perf:: + + $# perf stat -a -e hns3_pmu_sicl_0/config=0x1020F,bdf=0x3500,queue=0/ -I 1000 + +6. func-intr mode +PMU collect performance statistic of one interrupt of PF/VF. The function +id is BDF of PF/VF, the "intr" filter option must be set to the exact +interrupt id of function. +Example usage of perf:: + + $# perf stat -a -e hns3_pmu_sicl_0/config=0x00301,bdf=0x3500,intr=0/ -I 1000
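The two userspace-facing calculations in the document above (the bdf-to-func conversion and the counter/ext_counter pairing) can be sanity-checked with a small standalone program. This is only a sketch: the helper names and sample numbers are invented, and scaling by hw_clk_freq to convert cycles into seconds is an inference from the description of the hardware timer, not something the document states explicitly.

    #include <stdint.h>
    #include <stdio.h>

    /* func id from bus/device/function, per the formula in hns3-pmu.rst. */
    static uint16_t hns3_func_id(uint8_t bus, uint8_t dev, uint8_t fn)
    {
            return (uint16_t)((bus << 8) | (dev << 3) | fn);
    }

    int main(void)
    {
            /* 35:00.0 -> 0x3500, matching the example table in the document. */
            printf("func = 0x%04x\n", hns3_func_id(0x35, 0, 0));

            /*
             * Counter-pair arithmetic for a bandwidth event: counter holds a
             * byte count and its paired ext_counter holds elapsed timer
             * cycles, so bytes/second = bytes * hw_clk_freq / cycles. All
             * numbers below are made up for illustration.
             */
            uint64_t bytes = 1000000, cycles = 250000, hw_clk_freq = 250000000;
            printf("bandwidth = %llu bytes/s\n",
                   (unsigned long long)(bytes * hw_clk_freq / cycles));
            return 0;
    }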
From: Guangbin Huang huangguangbin2@huawei.com
mainline inclusion from mainline-arm64-upstream commit 66637ab137b4 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5KAX7 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
HNS3 (HiSilicon Network System 3) PMU is an RCiEP device in the HiSilicon SoC NIC. It supports collection of performance statistics such as bandwidth, latency, packet rate and interrupt rate.
The NIC of each SICL has one PMU device. The driver registers each PMU device with perf and exports information about the supported events, the filter mode of each event, the bdf range, the hardware clock frequency, the identifier and so on via sysfs.
Each PMU device has its own control, counter and interrupt registers, and supports 8 hardware events; each hardware event has its own registers for configuration, counters and interrupt.
Filter options contain:
  config - select event
  port   - select physical port of nic
  tc     - select tc (must be used with port)
  func   - select PF/VF
  queue  - select queue of PF/VF (must be used with func)
  intr   - select interrupt number (must be used with func)
  global - select all functions of IO DIE
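As a rough illustration of how these filter options map onto the perf_event_attr fields, the sketch below packs config and config1 following the bit layout the driver exports through its "format" group (config:0-7 subevent, 8-15 event_type, 16 ext_counter_used; config1:0-3 port, 4-7 tc, 8-23 bdf, 24-39 queue, 40-51 intr, 52 global). The helper names are invented for the sketch; in practice the perf tool parses these fields from the sysfs format strings for you.

    #include <stdint.h>
    #include <stdio.h>

    static uint64_t pack_config(uint8_t subevent, uint8_t event_type,
                                uint8_t ext_counter_used)
    {
            return (uint64_t)subevent | ((uint64_t)event_type << 8) |
                   ((uint64_t)(ext_counter_used & 1) << 16);
    }

    /* func-queue filter: bdf in config1[8:23], queue in config1[24:39]. */
    static uint64_t pack_config1_func_queue(uint16_t bdf, uint16_t queue)
    {
            return ((uint64_t)bdf << 8) | ((uint64_t)queue << 24);
    }

    int main(void)
    {
            /* dly_tx_normal_to_mac_packet_num: subevent 0x04, type 0x02,
             * ext_counter used -> prints 0x10204, matching hns3-pmu.rst. */
            printf("config  = 0x%05llx\n",
                   (unsigned long long)pack_config(0x04, 0x02, 1));
            /* queue 0 of the function at 35:00.0 (bdf 0x3500) */
            printf("config1 = 0x%llx\n",
                   (unsigned long long)pack_config1_func_queue(0x3500, 0));
            return 0;
    }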
Signed-off-by: Guangbin Huang huangguangbin2@huawei.com Reviewed-by: John Garry john.garry@huawei.com Reviewed-by: Shaokun Zhang zhangshaokun@hisilicon.com Link: https://lore.kernel.org/r/20220628063419.38514-3-huangguangbin2@huawei.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Jiantao Xiao xoiaojiantao1@h-partners.com Signed-off-by: Junhao He hejunhao3@huawei.com Signed-off-by: Jiantao Xiao xiaojiantao1@h-partners.com Reviewed-by: Xiongfeng Wang wangxiongfeng2@huawei.com Reviewed-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Jian Shen shenjian15@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- MAINTAINERS | 6 + drivers/perf/hisilicon/Kconfig | 10 + drivers/perf/hisilicon/Makefile | 1 + drivers/perf/hisilicon/hns3_pmu.c | 1672 +++++++++++++++++++++++++++++ 4 files changed, 1689 insertions(+) create mode 100644 drivers/perf/hisilicon/hns3_pmu.c
diff --git a/MAINTAINERS b/MAINTAINERS index 9d54886961c5..a8a9608823e2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8007,6 +8007,12 @@ F: Documentation/ABI/testing/sysfs-devices-hisi_ptt F: Documentation/trace/hisi-ptt.rst F: drivers/hwtracing/ptt/
+HISILICON HNS3 PMU DRIVER +M: Guangbin Huang huangguangbin2@huawei.com +S: Supported +F: Documentation/admin-guide/perf/hns3-pmu.rst +F: drivers/perf/hisilicon/hns3_pmu.c + HISILICON QM AND ZIP Controller DRIVER M: Zhou Wang wangzhou1@hisilicon.com L: linux-crypto@vger.kernel.org diff --git a/drivers/perf/hisilicon/Kconfig b/drivers/perf/hisilicon/Kconfig index 5546218b5598..171bfc1b6bc2 100644 --- a/drivers/perf/hisilicon/Kconfig +++ b/drivers/perf/hisilicon/Kconfig @@ -14,3 +14,13 @@ config HISI_PCIE_PMU RCiEP devices. Adds the PCIe PMU into perf events system for monitoring latency, bandwidth etc. + +config HNS3_PMU + tristate "HNS3 PERF PMU" + depends on ARM64 || COMPILE_TEST + depends on PCI + help + Provide support for HNS3 performance monitoring unit (PMU) RCiEP + devices. + Adds the HNS3 PMU into perf events system for monitoring latency, + bandwidth etc. diff --git a/drivers/perf/hisilicon/Makefile b/drivers/perf/hisilicon/Makefile index a3522abb3975..a35705795dfc 100644 --- a/drivers/perf/hisilicon/Makefile +++ b/drivers/perf/hisilicon/Makefile @@ -6,3 +6,4 @@ obj-$(CONFIG_HISI_PMU) += hisi_uncore_pmu.o hisi_uncore_l3c_pmu.o \ hisi_uncore_lpddrc_pmu.o
obj-$(CONFIG_HISI_PCIE_PMU) += hisi_pcie_pmu.o +obj-$(CONFIG_HNS3_PMU) += hns3_pmu.o diff --git a/drivers/perf/hisilicon/hns3_pmu.c b/drivers/perf/hisilicon/hns3_pmu.c new file mode 100644 index 000000000000..b9d201ff2e63 --- /dev/null +++ b/drivers/perf/hisilicon/hns3_pmu.c @@ -0,0 +1,1672 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * This driver adds support for HNS3 PMU iEP device. Related perf events are + * bandwidth, latency, packet rate, interrupt rate etc. + * + * Copyright (C) 2022 HiSilicon Limited + */ +#include <linux/bitfield.h> +#include <linux/bitmap.h> +#include <linux/bug.h> +#include <linux/cpuhotplug.h> +#include <linux/cpumask.h> +#include <linux/delay.h> +#include <linux/device.h> +#include <linux/err.h> +#include <linux/interrupt.h> +#include <linux/iopoll.h> +#include <linux/io-64-nonatomic-hi-lo.h> +#include <linux/irq.h> +#include <linux/kernel.h> +#include <linux/list.h> +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/pci-epf.h> +#include <linux/perf_event.h> +#include <linux/smp.h> + +/* Dynamic CPU hotplug state used by HNS3 PMU */ +static enum cpuhp_state hns3_pmu_online; + +/* registers offset address */ +#define HNS3_PMU_REG_GLOBAL_CTRL 0x0000 +#define HNS3_PMU_REG_CLOCK_FREQ 0x0020 +#define HNS3_PMU_REG_BDF 0x0fe0 +#define HNS3_PMU_REG_VERSION 0x0fe4 +#define HNS3_PMU_REG_DEVICE_ID 0x0fe8 + +#define HNS3_PMU_REG_EVENT_OFFSET 0x1000 +#define HNS3_PMU_REG_EVENT_SIZE 0x1000 +#define HNS3_PMU_REG_EVENT_CTRL_LOW 0x00 +#define HNS3_PMU_REG_EVENT_CTRL_HIGH 0x04 +#define HNS3_PMU_REG_EVENT_INTR_STATUS 0x08 +#define HNS3_PMU_REG_EVENT_INTR_MASK 0x0c +#define HNS3_PMU_REG_EVENT_COUNTER 0x10 +#define HNS3_PMU_REG_EVENT_EXT_COUNTER 0x18 +#define HNS3_PMU_REG_EVENT_QID_CTRL 0x28 +#define HNS3_PMU_REG_EVENT_QID_PARA 0x2c + +#define HNS3_PMU_FILTER_SUPPORT_GLOBAL BIT(0) +#define HNS3_PMU_FILTER_SUPPORT_PORT BIT(1) +#define HNS3_PMU_FILTER_SUPPORT_PORT_TC BIT(2) +#define HNS3_PMU_FILTER_SUPPORT_FUNC BIT(3) +#define HNS3_PMU_FILTER_SUPPORT_FUNC_QUEUE BIT(4) +#define HNS3_PMU_FILTER_SUPPORT_FUNC_INTR BIT(5) + +#define HNS3_PMU_FILTER_ALL_TC 0xf +#define HNS3_PMU_FILTER_ALL_QUEUE 0xffff + +#define HNS3_PMU_CTRL_SUBEVENT_S 4 +#define HNS3_PMU_CTRL_FILTER_MODE_S 24 + +#define HNS3_PMU_GLOBAL_START BIT(0) + +#define HNS3_PMU_EVENT_STATUS_RESET BIT(11) +#define HNS3_PMU_EVENT_EN BIT(12) +#define HNS3_PMU_EVENT_OVERFLOW_RESTART BIT(15) + +#define HNS3_PMU_QID_PARA_FUNC_S 0 +#define HNS3_PMU_QID_PARA_QUEUE_S 16 + +#define HNS3_PMU_QID_CTRL_REQ_ENABLE BIT(0) +#define HNS3_PMU_QID_CTRL_DONE BIT(1) +#define HNS3_PMU_QID_CTRL_MISS BIT(2) + +#define HNS3_PMU_INTR_MASK_OVERFLOW BIT(1) + +#define HNS3_PMU_MAX_HW_EVENTS 8 + +/* + * Each hardware event contains two registers (counter and ext_counter) for + * bandwidth, packet rate, latency and interrupt rate. These two registers will + * be triggered to run at the same when a hardware event is enabled. 
The meaning + * of counter and ext_counter of different event type are different, their + * meaning show as follow: + * + * +----------------+------------------+---------------+ + * | event type | counter | ext_counter | + * +----------------+------------------+---------------+ + * | bandwidth | byte number | cycle number | + * +----------------+------------------+---------------+ + * | packet rate | packet number | cycle number | + * +----------------+------------------+---------------+ + * | latency | cycle number | packet number | + * +----------------+------------------+---------------+ + * | interrupt rate | interrupt number | cycle number | + * +----------------+------------------+---------------+ + * + * The cycle number indicates increment of counter of hardware timer, the + * frequency of hardware timer can be read from hw_clk_freq file. + * + * Performance of each hardware event is calculated by: counter / ext_counter. + * + * Since processing of data is preferred to be done in userspace, we expose + * ext_counter as a separate event for userspace and use bit 16 to indicate it. + * For example, event 0x00001 and 0x10001 are actually one event for hardware + * because bit 0-15 are same. If the bit 16 of one event is 0 means to read + * counter register, otherwise means to read ext_counter register. + */ +/* bandwidth events */ +#define HNS3_PMU_EVT_BW_SSU_EGU_BYTE_NUM 0x00001 +#define HNS3_PMU_EVT_BW_SSU_EGU_TIME 0x10001 +#define HNS3_PMU_EVT_BW_SSU_RPU_BYTE_NUM 0x00002 +#define HNS3_PMU_EVT_BW_SSU_RPU_TIME 0x10002 +#define HNS3_PMU_EVT_BW_SSU_ROCE_BYTE_NUM 0x00003 +#define HNS3_PMU_EVT_BW_SSU_ROCE_TIME 0x10003 +#define HNS3_PMU_EVT_BW_ROCE_SSU_BYTE_NUM 0x00004 +#define HNS3_PMU_EVT_BW_ROCE_SSU_TIME 0x10004 +#define HNS3_PMU_EVT_BW_TPU_SSU_BYTE_NUM 0x00005 +#define HNS3_PMU_EVT_BW_TPU_SSU_TIME 0x10005 +#define HNS3_PMU_EVT_BW_RPU_RCBRX_BYTE_NUM 0x00006 +#define HNS3_PMU_EVT_BW_RPU_RCBRX_TIME 0x10006 +#define HNS3_PMU_EVT_BW_RCBTX_TXSCH_BYTE_NUM 0x00008 +#define HNS3_PMU_EVT_BW_RCBTX_TXSCH_TIME 0x10008 +#define HNS3_PMU_EVT_BW_WR_FBD_BYTE_NUM 0x00009 +#define HNS3_PMU_EVT_BW_WR_FBD_TIME 0x10009 +#define HNS3_PMU_EVT_BW_WR_EBD_BYTE_NUM 0x0000a +#define HNS3_PMU_EVT_BW_WR_EBD_TIME 0x1000a +#define HNS3_PMU_EVT_BW_RD_FBD_BYTE_NUM 0x0000b +#define HNS3_PMU_EVT_BW_RD_FBD_TIME 0x1000b +#define HNS3_PMU_EVT_BW_RD_EBD_BYTE_NUM 0x0000c +#define HNS3_PMU_EVT_BW_RD_EBD_TIME 0x1000c +#define HNS3_PMU_EVT_BW_RD_PAY_M0_BYTE_NUM 0x0000d +#define HNS3_PMU_EVT_BW_RD_PAY_M0_TIME 0x1000d +#define HNS3_PMU_EVT_BW_RD_PAY_M1_BYTE_NUM 0x0000e +#define HNS3_PMU_EVT_BW_RD_PAY_M1_TIME 0x1000e +#define HNS3_PMU_EVT_BW_WR_PAY_M0_BYTE_NUM 0x0000f +#define HNS3_PMU_EVT_BW_WR_PAY_M0_TIME 0x1000f +#define HNS3_PMU_EVT_BW_WR_PAY_M1_BYTE_NUM 0x00010 +#define HNS3_PMU_EVT_BW_WR_PAY_M1_TIME 0x10010 + +/* packet rate events */ +#define HNS3_PMU_EVT_PPS_IGU_SSU_PACKET_NUM 0x00100 +#define HNS3_PMU_EVT_PPS_IGU_SSU_TIME 0x10100 +#define HNS3_PMU_EVT_PPS_SSU_EGU_PACKET_NUM 0x00101 +#define HNS3_PMU_EVT_PPS_SSU_EGU_TIME 0x10101 +#define HNS3_PMU_EVT_PPS_SSU_RPU_PACKET_NUM 0x00102 +#define HNS3_PMU_EVT_PPS_SSU_RPU_TIME 0x10102 +#define HNS3_PMU_EVT_PPS_SSU_ROCE_PACKET_NUM 0x00103 +#define HNS3_PMU_EVT_PPS_SSU_ROCE_TIME 0x10103 +#define HNS3_PMU_EVT_PPS_ROCE_SSU_PACKET_NUM 0x00104 +#define HNS3_PMU_EVT_PPS_ROCE_SSU_TIME 0x10104 +#define HNS3_PMU_EVT_PPS_TPU_SSU_PACKET_NUM 0x00105 +#define HNS3_PMU_EVT_PPS_TPU_SSU_TIME 0x10105 +#define HNS3_PMU_EVT_PPS_RPU_RCBRX_PACKET_NUM 0x00106 +#define HNS3_PMU_EVT_PPS_RPU_RCBRX_TIME 
0x10106 +#define HNS3_PMU_EVT_PPS_RCBTX_TPU_PACKET_NUM 0x00107 +#define HNS3_PMU_EVT_PPS_RCBTX_TPU_TIME 0x10107 +#define HNS3_PMU_EVT_PPS_RCBTX_TXSCH_PACKET_NUM 0x00108 +#define HNS3_PMU_EVT_PPS_RCBTX_TXSCH_TIME 0x10108 +#define HNS3_PMU_EVT_PPS_WR_FBD_PACKET_NUM 0x00109 +#define HNS3_PMU_EVT_PPS_WR_FBD_TIME 0x10109 +#define HNS3_PMU_EVT_PPS_WR_EBD_PACKET_NUM 0x0010a +#define HNS3_PMU_EVT_PPS_WR_EBD_TIME 0x1010a +#define HNS3_PMU_EVT_PPS_RD_FBD_PACKET_NUM 0x0010b +#define HNS3_PMU_EVT_PPS_RD_FBD_TIME 0x1010b +#define HNS3_PMU_EVT_PPS_RD_EBD_PACKET_NUM 0x0010c +#define HNS3_PMU_EVT_PPS_RD_EBD_TIME 0x1010c +#define HNS3_PMU_EVT_PPS_RD_PAY_M0_PACKET_NUM 0x0010d +#define HNS3_PMU_EVT_PPS_RD_PAY_M0_TIME 0x1010d +#define HNS3_PMU_EVT_PPS_RD_PAY_M1_PACKET_NUM 0x0010e +#define HNS3_PMU_EVT_PPS_RD_PAY_M1_TIME 0x1010e +#define HNS3_PMU_EVT_PPS_WR_PAY_M0_PACKET_NUM 0x0010f +#define HNS3_PMU_EVT_PPS_WR_PAY_M0_TIME 0x1010f +#define HNS3_PMU_EVT_PPS_WR_PAY_M1_PACKET_NUM 0x00110 +#define HNS3_PMU_EVT_PPS_WR_PAY_M1_TIME 0x10110 +#define HNS3_PMU_EVT_PPS_NICROH_TX_PRE_PACKET_NUM 0x00111 +#define HNS3_PMU_EVT_PPS_NICROH_TX_PRE_TIME 0x10111 +#define HNS3_PMU_EVT_PPS_NICROH_RX_PRE_PACKET_NUM 0x00112 +#define HNS3_PMU_EVT_PPS_NICROH_RX_PRE_TIME 0x10112 + +/* latency events */ +#define HNS3_PMU_EVT_DLY_TX_PUSH_TIME 0x00202 +#define HNS3_PMU_EVT_DLY_TX_PUSH_PACKET_NUM 0x10202 +#define HNS3_PMU_EVT_DLY_TX_TIME 0x00204 +#define HNS3_PMU_EVT_DLY_TX_PACKET_NUM 0x10204 +#define HNS3_PMU_EVT_DLY_SSU_TX_NIC_TIME 0x00206 +#define HNS3_PMU_EVT_DLY_SSU_TX_NIC_PACKET_NUM 0x10206 +#define HNS3_PMU_EVT_DLY_SSU_TX_ROCE_TIME 0x00207 +#define HNS3_PMU_EVT_DLY_SSU_TX_ROCE_PACKET_NUM 0x10207 +#define HNS3_PMU_EVT_DLY_SSU_RX_NIC_TIME 0x00208 +#define HNS3_PMU_EVT_DLY_SSU_RX_NIC_PACKET_NUM 0x10208 +#define HNS3_PMU_EVT_DLY_SSU_RX_ROCE_TIME 0x00209 +#define HNS3_PMU_EVT_DLY_SSU_RX_ROCE_PACKET_NUM 0x10209 +#define HNS3_PMU_EVT_DLY_RPU_TIME 0x0020e +#define HNS3_PMU_EVT_DLY_RPU_PACKET_NUM 0x1020e +#define HNS3_PMU_EVT_DLY_TPU_TIME 0x0020f +#define HNS3_PMU_EVT_DLY_TPU_PACKET_NUM 0x1020f +#define HNS3_PMU_EVT_DLY_RPE_TIME 0x00210 +#define HNS3_PMU_EVT_DLY_RPE_PACKET_NUM 0x10210 +#define HNS3_PMU_EVT_DLY_TPE_TIME 0x00211 +#define HNS3_PMU_EVT_DLY_TPE_PACKET_NUM 0x10211 +#define HNS3_PMU_EVT_DLY_TPE_PUSH_TIME 0x00212 +#define HNS3_PMU_EVT_DLY_TPE_PUSH_PACKET_NUM 0x10212 +#define HNS3_PMU_EVT_DLY_WR_FBD_TIME 0x00213 +#define HNS3_PMU_EVT_DLY_WR_FBD_PACKET_NUM 0x10213 +#define HNS3_PMU_EVT_DLY_WR_EBD_TIME 0x00214 +#define HNS3_PMU_EVT_DLY_WR_EBD_PACKET_NUM 0x10214 +#define HNS3_PMU_EVT_DLY_RD_FBD_TIME 0x00215 +#define HNS3_PMU_EVT_DLY_RD_FBD_PACKET_NUM 0x10215 +#define HNS3_PMU_EVT_DLY_RD_EBD_TIME 0x00216 +#define HNS3_PMU_EVT_DLY_RD_EBD_PACKET_NUM 0x10216 +#define HNS3_PMU_EVT_DLY_RD_PAY_M0_TIME 0x00217 +#define HNS3_PMU_EVT_DLY_RD_PAY_M0_PACKET_NUM 0x10217 +#define HNS3_PMU_EVT_DLY_RD_PAY_M1_TIME 0x00218 +#define HNS3_PMU_EVT_DLY_RD_PAY_M1_PACKET_NUM 0x10218 +#define HNS3_PMU_EVT_DLY_WR_PAY_M0_TIME 0x00219 +#define HNS3_PMU_EVT_DLY_WR_PAY_M0_PACKET_NUM 0x10219 +#define HNS3_PMU_EVT_DLY_WR_PAY_M1_TIME 0x0021a +#define HNS3_PMU_EVT_DLY_WR_PAY_M1_PACKET_NUM 0x1021a +#define HNS3_PMU_EVT_DLY_MSIX_WRITE_TIME 0x0021c +#define HNS3_PMU_EVT_DLY_MSIX_WRITE_PACKET_NUM 0x1021c + +/* interrupt rate events */ +#define HNS3_PMU_EVT_PPS_MSIX_NIC_INTR_NUM 0x00300 +#define HNS3_PMU_EVT_PPS_MSIX_NIC_TIME 0x10300 + +/* filter mode supported by each bandwidth event */ +#define HNS3_PMU_FILTER_BW_SSU_EGU 0x07 +#define HNS3_PMU_FILTER_BW_SSU_RPU 0x1f +#define 
HNS3_PMU_FILTER_BW_SSU_ROCE 0x0f +#define HNS3_PMU_FILTER_BW_ROCE_SSU 0x0f +#define HNS3_PMU_FILTER_BW_TPU_SSU 0x1f +#define HNS3_PMU_FILTER_BW_RPU_RCBRX 0x11 +#define HNS3_PMU_FILTER_BW_RCBTX_TXSCH 0x11 +#define HNS3_PMU_FILTER_BW_WR_FBD 0x1b +#define HNS3_PMU_FILTER_BW_WR_EBD 0x11 +#define HNS3_PMU_FILTER_BW_RD_FBD 0x01 +#define HNS3_PMU_FILTER_BW_RD_EBD 0x1b +#define HNS3_PMU_FILTER_BW_RD_PAY_M0 0x01 +#define HNS3_PMU_FILTER_BW_RD_PAY_M1 0x01 +#define HNS3_PMU_FILTER_BW_WR_PAY_M0 0x01 +#define HNS3_PMU_FILTER_BW_WR_PAY_M1 0x01 + +/* filter mode supported by each packet rate event */ +#define HNS3_PMU_FILTER_PPS_IGU_SSU 0x07 +#define HNS3_PMU_FILTER_PPS_SSU_EGU 0x07 +#define HNS3_PMU_FILTER_PPS_SSU_RPU 0x1f +#define HNS3_PMU_FILTER_PPS_SSU_ROCE 0x0f +#define HNS3_PMU_FILTER_PPS_ROCE_SSU 0x0f +#define HNS3_PMU_FILTER_PPS_TPU_SSU 0x1f +#define HNS3_PMU_FILTER_PPS_RPU_RCBRX 0x11 +#define HNS3_PMU_FILTER_PPS_RCBTX_TPU 0x1f +#define HNS3_PMU_FILTER_PPS_RCBTX_TXSCH 0x11 +#define HNS3_PMU_FILTER_PPS_WR_FBD 0x1b +#define HNS3_PMU_FILTER_PPS_WR_EBD 0x11 +#define HNS3_PMU_FILTER_PPS_RD_FBD 0x01 +#define HNS3_PMU_FILTER_PPS_RD_EBD 0x1b +#define HNS3_PMU_FILTER_PPS_RD_PAY_M0 0x01 +#define HNS3_PMU_FILTER_PPS_RD_PAY_M1 0x01 +#define HNS3_PMU_FILTER_PPS_WR_PAY_M0 0x01 +#define HNS3_PMU_FILTER_PPS_WR_PAY_M1 0x01 +#define HNS3_PMU_FILTER_PPS_NICROH_TX_PRE 0x01 +#define HNS3_PMU_FILTER_PPS_NICROH_RX_PRE 0x01 + +/* filter mode supported by each latency event */ +#define HNS3_PMU_FILTER_DLY_TX_PUSH 0x01 +#define HNS3_PMU_FILTER_DLY_TX 0x01 +#define HNS3_PMU_FILTER_DLY_SSU_TX_NIC 0x07 +#define HNS3_PMU_FILTER_DLY_SSU_TX_ROCE 0x07 +#define HNS3_PMU_FILTER_DLY_SSU_RX_NIC 0x07 +#define HNS3_PMU_FILTER_DLY_SSU_RX_ROCE 0x07 +#define HNS3_PMU_FILTER_DLY_RPU 0x11 +#define HNS3_PMU_FILTER_DLY_TPU 0x1f +#define HNS3_PMU_FILTER_DLY_RPE 0x01 +#define HNS3_PMU_FILTER_DLY_TPE 0x0b +#define HNS3_PMU_FILTER_DLY_TPE_PUSH 0x1b +#define HNS3_PMU_FILTER_DLY_WR_FBD 0x1b +#define HNS3_PMU_FILTER_DLY_WR_EBD 0x11 +#define HNS3_PMU_FILTER_DLY_RD_FBD 0x01 +#define HNS3_PMU_FILTER_DLY_RD_EBD 0x1b +#define HNS3_PMU_FILTER_DLY_RD_PAY_M0 0x01 +#define HNS3_PMU_FILTER_DLY_RD_PAY_M1 0x01 +#define HNS3_PMU_FILTER_DLY_WR_PAY_M0 0x01 +#define HNS3_PMU_FILTER_DLY_WR_PAY_M1 0x01 +#define HNS3_PMU_FILTER_DLY_MSIX_WRITE 0x01 + +/* filter mode supported by each interrupt rate event */ +#define HNS3_PMU_FILTER_INTR_MSIX_NIC 0x01 + +enum hns3_pmu_hw_filter_mode { + HNS3_PMU_HW_FILTER_GLOBAL, + HNS3_PMU_HW_FILTER_PORT, + HNS3_PMU_HW_FILTER_PORT_TC, + HNS3_PMU_HW_FILTER_FUNC, + HNS3_PMU_HW_FILTER_FUNC_QUEUE, + HNS3_PMU_HW_FILTER_FUNC_INTR, +}; + +struct hns3_pmu_event_attr { + u32 event; + u16 filter_support; +}; + +struct hns3_pmu { + struct perf_event *hw_events[HNS3_PMU_MAX_HW_EVENTS]; + struct hlist_node node; + struct pci_dev *pdev; + struct pmu pmu; + void __iomem *base; + int irq; + int on_cpu; + u32 identifier; + u32 hw_clk_freq; /* hardware clock frequency of PMU */ + /* maximum and minimum bdf allowed by PMU */ + u16 bdf_min; + u16 bdf_max; +}; + +#define to_hns3_pmu(p) (container_of((p), struct hns3_pmu, pmu)) + +#define GET_PCI_DEVFN(bdf) ((bdf) & 0xff) + +#define FILTER_CONDITION_PORT(port) ((1 << (port)) & 0xff) +#define FILTER_CONDITION_PORT_TC(port, tc) (((port) << 3) | ((tc) & 0x07)) +#define FILTER_CONDITION_FUNC_INTR(func, intr) (((intr) << 8) | (func)) + +#define HNS3_PMU_FILTER_ATTR(_name, _config, _start, _end) \ + static inline u64 hns3_pmu_get_##_name(struct perf_event *event) \ + { \ + return FIELD_GET(GENMASK_ULL(_end, 
_start), \ + event->attr._config); \ + } + +HNS3_PMU_FILTER_ATTR(subevent, config, 0, 7); +HNS3_PMU_FILTER_ATTR(event_type, config, 8, 15); +HNS3_PMU_FILTER_ATTR(ext_counter_used, config, 16, 16); +HNS3_PMU_FILTER_ATTR(port, config1, 0, 3); +HNS3_PMU_FILTER_ATTR(tc, config1, 4, 7); +HNS3_PMU_FILTER_ATTR(bdf, config1, 8, 23); +HNS3_PMU_FILTER_ATTR(queue, config1, 24, 39); +HNS3_PMU_FILTER_ATTR(intr, config1, 40, 51); +HNS3_PMU_FILTER_ATTR(global, config1, 52, 52); + +#define HNS3_BW_EVT_BYTE_NUM(_name) (&(struct hns3_pmu_event_attr) {\ + HNS3_PMU_EVT_BW_##_name##_BYTE_NUM, \ + HNS3_PMU_FILTER_BW_##_name}) +#define HNS3_BW_EVT_TIME(_name) (&(struct hns3_pmu_event_attr) {\ + HNS3_PMU_EVT_BW_##_name##_TIME, \ + HNS3_PMU_FILTER_BW_##_name}) +#define HNS3_PPS_EVT_PACKET_NUM(_name) (&(struct hns3_pmu_event_attr) {\ + HNS3_PMU_EVT_PPS_##_name##_PACKET_NUM, \ + HNS3_PMU_FILTER_PPS_##_name}) +#define HNS3_PPS_EVT_TIME(_name) (&(struct hns3_pmu_event_attr) {\ + HNS3_PMU_EVT_PPS_##_name##_TIME, \ + HNS3_PMU_FILTER_PPS_##_name}) +#define HNS3_DLY_EVT_TIME(_name) (&(struct hns3_pmu_event_attr) {\ + HNS3_PMU_EVT_DLY_##_name##_TIME, \ + HNS3_PMU_FILTER_DLY_##_name}) +#define HNS3_DLY_EVT_PACKET_NUM(_name) (&(struct hns3_pmu_event_attr) {\ + HNS3_PMU_EVT_DLY_##_name##_PACKET_NUM, \ + HNS3_PMU_FILTER_DLY_##_name}) +#define HNS3_INTR_EVT_INTR_NUM(_name) (&(struct hns3_pmu_event_attr) {\ + HNS3_PMU_EVT_PPS_##_name##_INTR_NUM, \ + HNS3_PMU_FILTER_INTR_##_name}) +#define HNS3_INTR_EVT_TIME(_name) (&(struct hns3_pmu_event_attr) {\ + HNS3_PMU_EVT_PPS_##_name##_TIME, \ + HNS3_PMU_FILTER_INTR_##_name}) + +static ssize_t hns3_pmu_format_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct dev_ext_attribute *eattr; + + eattr = container_of(attr, struct dev_ext_attribute, attr); + + return sysfs_emit(buf, "%s\n", (char *)eattr->var); +} + +static ssize_t hns3_pmu_event_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct hns3_pmu_event_attr *event; + struct dev_ext_attribute *eattr; + + eattr = container_of(attr, struct dev_ext_attribute, attr); + event = eattr->var; + + return sysfs_emit(buf, "config=0x%x\n", event->event); +} + +static ssize_t hns3_pmu_filter_mode_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct hns3_pmu_event_attr *event; + struct dev_ext_attribute *eattr; + int len; + + eattr = container_of(attr, struct dev_ext_attribute, attr); + event = eattr->var; + + len = sysfs_emit_at(buf, 0, "filter mode supported: "); + if (event->filter_support & HNS3_PMU_FILTER_SUPPORT_GLOBAL) + len += sysfs_emit_at(buf, len, "global "); + if (event->filter_support & HNS3_PMU_FILTER_SUPPORT_PORT) + len += sysfs_emit_at(buf, len, "port "); + if (event->filter_support & HNS3_PMU_FILTER_SUPPORT_PORT_TC) + len += sysfs_emit_at(buf, len, "port-tc "); + if (event->filter_support & HNS3_PMU_FILTER_SUPPORT_FUNC) + len += sysfs_emit_at(buf, len, "func "); + if (event->filter_support & HNS3_PMU_FILTER_SUPPORT_FUNC_QUEUE) + len += sysfs_emit_at(buf, len, "func-queue "); + if (event->filter_support & HNS3_PMU_FILTER_SUPPORT_FUNC_INTR) + len += sysfs_emit_at(buf, len, "func-intr "); + + len += sysfs_emit_at(buf, len, "\n"); + + return len; +} + +#define HNS3_PMU_ATTR(_name, _func, _config) \ + (&((struct dev_ext_attribute[]) { \ + { __ATTR(_name, 0444, _func, NULL), (void *)_config } \ + })[0].attr.attr) + +#define HNS3_PMU_FORMAT_ATTR(_name, _format) \ + HNS3_PMU_ATTR(_name, hns3_pmu_format_show, (void *)_format) +#define 
HNS3_PMU_EVENT_ATTR(_name, _event) \ + HNS3_PMU_ATTR(_name, hns3_pmu_event_show, (void *)_event) +#define HNS3_PMU_FLT_MODE_ATTR(_name, _event) \ + HNS3_PMU_ATTR(_name, hns3_pmu_filter_mode_show, (void *)_event) + +#define HNS3_PMU_BW_EVT_PAIR(_name, _macro) \ + HNS3_PMU_EVENT_ATTR(_name##_byte_num, HNS3_BW_EVT_BYTE_NUM(_macro)), \ + HNS3_PMU_EVENT_ATTR(_name##_time, HNS3_BW_EVT_TIME(_macro)) +#define HNS3_PMU_PPS_EVT_PAIR(_name, _macro) \ + HNS3_PMU_EVENT_ATTR(_name##_packet_num, HNS3_PPS_EVT_PACKET_NUM(_macro)), \ + HNS3_PMU_EVENT_ATTR(_name##_time, HNS3_PPS_EVT_TIME(_macro)) +#define HNS3_PMU_DLY_EVT_PAIR(_name, _macro) \ + HNS3_PMU_EVENT_ATTR(_name##_time, HNS3_DLY_EVT_TIME(_macro)), \ + HNS3_PMU_EVENT_ATTR(_name##_packet_num, HNS3_DLY_EVT_PACKET_NUM(_macro)) +#define HNS3_PMU_INTR_EVT_PAIR(_name, _macro) \ + HNS3_PMU_EVENT_ATTR(_name##_intr_num, HNS3_INTR_EVT_INTR_NUM(_macro)), \ + HNS3_PMU_EVENT_ATTR(_name##_time, HNS3_INTR_EVT_TIME(_macro)) + +#define HNS3_PMU_BW_FLT_MODE_PAIR(_name, _macro) \ + HNS3_PMU_FLT_MODE_ATTR(_name##_byte_num, HNS3_BW_EVT_BYTE_NUM(_macro)), \ + HNS3_PMU_FLT_MODE_ATTR(_name##_time, HNS3_BW_EVT_TIME(_macro)) +#define HNS3_PMU_PPS_FLT_MODE_PAIR(_name, _macro) \ + HNS3_PMU_FLT_MODE_ATTR(_name##_packet_num, HNS3_PPS_EVT_PACKET_NUM(_macro)), \ + HNS3_PMU_FLT_MODE_ATTR(_name##_time, HNS3_PPS_EVT_TIME(_macro)) +#define HNS3_PMU_DLY_FLT_MODE_PAIR(_name, _macro) \ + HNS3_PMU_FLT_MODE_ATTR(_name##_time, HNS3_DLY_EVT_TIME(_macro)), \ + HNS3_PMU_FLT_MODE_ATTR(_name##_packet_num, HNS3_DLY_EVT_PACKET_NUM(_macro)) +#define HNS3_PMU_INTR_FLT_MODE_PAIR(_name, _macro) \ + HNS3_PMU_FLT_MODE_ATTR(_name##_intr_num, HNS3_INTR_EVT_INTR_NUM(_macro)), \ + HNS3_PMU_FLT_MODE_ATTR(_name##_time, HNS3_INTR_EVT_TIME(_macro)) + +static u8 hns3_pmu_hw_filter_modes[] = { + HNS3_PMU_HW_FILTER_GLOBAL, + HNS3_PMU_HW_FILTER_PORT, + HNS3_PMU_HW_FILTER_PORT_TC, + HNS3_PMU_HW_FILTER_FUNC, + HNS3_PMU_HW_FILTER_FUNC_QUEUE, + HNS3_PMU_HW_FILTER_FUNC_INTR, +}; + +#define HNS3_PMU_SET_HW_FILTER(_hwc, _mode) \ + ((_hwc)->addr_filters = (void *)&hns3_pmu_hw_filter_modes[(_mode)]) + +static ssize_t identifier_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(dev_get_drvdata(dev)); + + return sysfs_emit(buf, "0x%x\n", hns3_pmu->identifier); +} +static DEVICE_ATTR_RO(identifier); + +static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(dev_get_drvdata(dev)); + + return sysfs_emit(buf, "%d\n", hns3_pmu->on_cpu); +} +static DEVICE_ATTR_RO(cpumask); + +static ssize_t bdf_min_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(dev_get_drvdata(dev)); + u16 bdf = hns3_pmu->bdf_min; + + return sysfs_emit(buf, "%02x:%02x.%x\n", PCI_BUS_NUM(bdf), + PCI_SLOT(bdf), PCI_FUNC(bdf)); +} +static DEVICE_ATTR_RO(bdf_min); + +static ssize_t bdf_max_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(dev_get_drvdata(dev)); + u16 bdf = hns3_pmu->bdf_max; + + return sysfs_emit(buf, "%02x:%02x.%x\n", PCI_BUS_NUM(bdf), + PCI_SLOT(bdf), PCI_FUNC(bdf)); +} +static DEVICE_ATTR_RO(bdf_max); + +static ssize_t hw_clk_freq_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(dev_get_drvdata(dev)); + + return sysfs_emit(buf, "%u\n", hns3_pmu->hw_clk_freq); +} +static DEVICE_ATTR_RO(hw_clk_freq); + +static struct 
attribute *hns3_pmu_events_attr[] = { + /* bandwidth events */ + HNS3_PMU_BW_EVT_PAIR(bw_ssu_egu, SSU_EGU), + HNS3_PMU_BW_EVT_PAIR(bw_ssu_rpu, SSU_RPU), + HNS3_PMU_BW_EVT_PAIR(bw_ssu_roce, SSU_ROCE), + HNS3_PMU_BW_EVT_PAIR(bw_roce_ssu, ROCE_SSU), + HNS3_PMU_BW_EVT_PAIR(bw_tpu_ssu, TPU_SSU), + HNS3_PMU_BW_EVT_PAIR(bw_rpu_rcbrx, RPU_RCBRX), + HNS3_PMU_BW_EVT_PAIR(bw_rcbtx_txsch, RCBTX_TXSCH), + HNS3_PMU_BW_EVT_PAIR(bw_wr_fbd, WR_FBD), + HNS3_PMU_BW_EVT_PAIR(bw_wr_ebd, WR_EBD), + HNS3_PMU_BW_EVT_PAIR(bw_rd_fbd, RD_FBD), + HNS3_PMU_BW_EVT_PAIR(bw_rd_ebd, RD_EBD), + HNS3_PMU_BW_EVT_PAIR(bw_rd_pay_m0, RD_PAY_M0), + HNS3_PMU_BW_EVT_PAIR(bw_rd_pay_m1, RD_PAY_M1), + HNS3_PMU_BW_EVT_PAIR(bw_wr_pay_m0, WR_PAY_M0), + HNS3_PMU_BW_EVT_PAIR(bw_wr_pay_m1, WR_PAY_M1), + + /* packet rate events */ + HNS3_PMU_PPS_EVT_PAIR(pps_igu_ssu, IGU_SSU), + HNS3_PMU_PPS_EVT_PAIR(pps_ssu_egu, SSU_EGU), + HNS3_PMU_PPS_EVT_PAIR(pps_ssu_rpu, SSU_RPU), + HNS3_PMU_PPS_EVT_PAIR(pps_ssu_roce, SSU_ROCE), + HNS3_PMU_PPS_EVT_PAIR(pps_roce_ssu, ROCE_SSU), + HNS3_PMU_PPS_EVT_PAIR(pps_tpu_ssu, TPU_SSU), + HNS3_PMU_PPS_EVT_PAIR(pps_rpu_rcbrx, RPU_RCBRX), + HNS3_PMU_PPS_EVT_PAIR(pps_rcbtx_tpu, RCBTX_TPU), + HNS3_PMU_PPS_EVT_PAIR(pps_rcbtx_txsch, RCBTX_TXSCH), + HNS3_PMU_PPS_EVT_PAIR(pps_wr_fbd, WR_FBD), + HNS3_PMU_PPS_EVT_PAIR(pps_wr_ebd, WR_EBD), + HNS3_PMU_PPS_EVT_PAIR(pps_rd_fbd, RD_FBD), + HNS3_PMU_PPS_EVT_PAIR(pps_rd_ebd, RD_EBD), + HNS3_PMU_PPS_EVT_PAIR(pps_rd_pay_m0, RD_PAY_M0), + HNS3_PMU_PPS_EVT_PAIR(pps_rd_pay_m1, RD_PAY_M1), + HNS3_PMU_PPS_EVT_PAIR(pps_wr_pay_m0, WR_PAY_M0), + HNS3_PMU_PPS_EVT_PAIR(pps_wr_pay_m1, WR_PAY_M1), + HNS3_PMU_PPS_EVT_PAIR(pps_intr_nicroh_tx_pre, NICROH_TX_PRE), + HNS3_PMU_PPS_EVT_PAIR(pps_intr_nicroh_rx_pre, NICROH_RX_PRE), + + /* latency events */ + HNS3_PMU_DLY_EVT_PAIR(dly_tx_push_to_mac, TX_PUSH), + HNS3_PMU_DLY_EVT_PAIR(dly_tx_normal_to_mac, TX), + HNS3_PMU_DLY_EVT_PAIR(dly_ssu_tx_th_nic, SSU_TX_NIC), + HNS3_PMU_DLY_EVT_PAIR(dly_ssu_tx_th_roce, SSU_TX_ROCE), + HNS3_PMU_DLY_EVT_PAIR(dly_ssu_rx_th_nic, SSU_RX_NIC), + HNS3_PMU_DLY_EVT_PAIR(dly_ssu_rx_th_roce, SSU_RX_ROCE), + HNS3_PMU_DLY_EVT_PAIR(dly_rpu, RPU), + HNS3_PMU_DLY_EVT_PAIR(dly_tpu, TPU), + HNS3_PMU_DLY_EVT_PAIR(dly_rpe, RPE), + HNS3_PMU_DLY_EVT_PAIR(dly_tpe_normal, TPE), + HNS3_PMU_DLY_EVT_PAIR(dly_tpe_push, TPE_PUSH), + HNS3_PMU_DLY_EVT_PAIR(dly_wr_fbd, WR_FBD), + HNS3_PMU_DLY_EVT_PAIR(dly_wr_ebd, WR_EBD), + HNS3_PMU_DLY_EVT_PAIR(dly_rd_fbd, RD_FBD), + HNS3_PMU_DLY_EVT_PAIR(dly_rd_ebd, RD_EBD), + HNS3_PMU_DLY_EVT_PAIR(dly_rd_pay_m0, RD_PAY_M0), + HNS3_PMU_DLY_EVT_PAIR(dly_rd_pay_m1, RD_PAY_M1), + HNS3_PMU_DLY_EVT_PAIR(dly_wr_pay_m0, WR_PAY_M0), + HNS3_PMU_DLY_EVT_PAIR(dly_wr_pay_m1, WR_PAY_M1), + HNS3_PMU_DLY_EVT_PAIR(dly_msix_write, MSIX_WRITE), + + /* interrupt rate events */ + HNS3_PMU_INTR_EVT_PAIR(pps_intr_msix_nic, MSIX_NIC), + + NULL +}; + +static struct attribute *hns3_pmu_filter_mode_attr[] = { + /* bandwidth events */ + HNS3_PMU_BW_FLT_MODE_PAIR(bw_ssu_egu, SSU_EGU), + HNS3_PMU_BW_FLT_MODE_PAIR(bw_ssu_rpu, SSU_RPU), + HNS3_PMU_BW_FLT_MODE_PAIR(bw_ssu_roce, SSU_ROCE), + HNS3_PMU_BW_FLT_MODE_PAIR(bw_roce_ssu, ROCE_SSU), + HNS3_PMU_BW_FLT_MODE_PAIR(bw_tpu_ssu, TPU_SSU), + HNS3_PMU_BW_FLT_MODE_PAIR(bw_rpu_rcbrx, RPU_RCBRX), + HNS3_PMU_BW_FLT_MODE_PAIR(bw_rcbtx_txsch, RCBTX_TXSCH), + HNS3_PMU_BW_FLT_MODE_PAIR(bw_wr_fbd, WR_FBD), + HNS3_PMU_BW_FLT_MODE_PAIR(bw_wr_ebd, WR_EBD), + HNS3_PMU_BW_FLT_MODE_PAIR(bw_rd_fbd, RD_FBD), + HNS3_PMU_BW_FLT_MODE_PAIR(bw_rd_ebd, RD_EBD), + HNS3_PMU_BW_FLT_MODE_PAIR(bw_rd_pay_m0, RD_PAY_M0), + 
HNS3_PMU_BW_FLT_MODE_PAIR(bw_rd_pay_m1, RD_PAY_M1), + HNS3_PMU_BW_FLT_MODE_PAIR(bw_wr_pay_m0, WR_PAY_M0), + HNS3_PMU_BW_FLT_MODE_PAIR(bw_wr_pay_m1, WR_PAY_M1), + + /* packet rate events */ + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_igu_ssu, IGU_SSU), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_ssu_egu, SSU_EGU), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_ssu_rpu, SSU_RPU), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_ssu_roce, SSU_ROCE), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_roce_ssu, ROCE_SSU), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_tpu_ssu, TPU_SSU), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_rpu_rcbrx, RPU_RCBRX), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_rcbtx_tpu, RCBTX_TPU), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_rcbtx_txsch, RCBTX_TXSCH), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_wr_fbd, WR_FBD), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_wr_ebd, WR_EBD), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_rd_fbd, RD_FBD), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_rd_ebd, RD_EBD), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_rd_pay_m0, RD_PAY_M0), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_rd_pay_m1, RD_PAY_M1), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_wr_pay_m0, WR_PAY_M0), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_wr_pay_m1, WR_PAY_M1), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_intr_nicroh_tx_pre, NICROH_TX_PRE), + HNS3_PMU_PPS_FLT_MODE_PAIR(pps_intr_nicroh_rx_pre, NICROH_RX_PRE), + + /* latency events */ + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_tx_push_to_mac, TX_PUSH), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_tx_normal_to_mac, TX), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_ssu_tx_th_nic, SSU_TX_NIC), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_ssu_tx_th_roce, SSU_TX_ROCE), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_ssu_rx_th_nic, SSU_RX_NIC), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_ssu_rx_th_roce, SSU_RX_ROCE), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_rpu, RPU), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_tpu, TPU), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_rpe, RPE), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_tpe_normal, TPE), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_tpe_push, TPE_PUSH), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_wr_fbd, WR_FBD), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_wr_ebd, WR_EBD), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_rd_fbd, RD_FBD), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_rd_ebd, RD_EBD), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_rd_pay_m0, RD_PAY_M0), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_rd_pay_m1, RD_PAY_M1), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_wr_pay_m0, WR_PAY_M0), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_wr_pay_m1, WR_PAY_M1), + HNS3_PMU_DLY_FLT_MODE_PAIR(dly_msix_write, MSIX_WRITE), + + /* interrupt rate events */ + HNS3_PMU_INTR_FLT_MODE_PAIR(pps_intr_msix_nic, MSIX_NIC), + + NULL +}; + +static struct attribute_group hns3_pmu_events_group = { + .name = "events", + .attrs = hns3_pmu_events_attr, +}; + +static struct attribute_group hns3_pmu_filter_mode_group = { + .name = "filtermode", + .attrs = hns3_pmu_filter_mode_attr, +}; + +static struct attribute *hns3_pmu_format_attr[] = { + HNS3_PMU_FORMAT_ATTR(subevent, "config:0-7"), + HNS3_PMU_FORMAT_ATTR(event_type, "config:8-15"), + HNS3_PMU_FORMAT_ATTR(ext_counter_used, "config:16"), + HNS3_PMU_FORMAT_ATTR(port, "config1:0-3"), + HNS3_PMU_FORMAT_ATTR(tc, "config1:4-7"), + HNS3_PMU_FORMAT_ATTR(bdf, "config1:8-23"), + HNS3_PMU_FORMAT_ATTR(queue, "config1:24-39"), + HNS3_PMU_FORMAT_ATTR(intr, "config1:40-51"), + HNS3_PMU_FORMAT_ATTR(global, "config1:52"), + NULL +}; + +static struct attribute_group hns3_pmu_format_group = { + .name = "format", + .attrs = hns3_pmu_format_attr, +}; + +static struct attribute *hns3_pmu_cpumask_attrs[] = { + &dev_attr_cpumask.attr, + NULL +}; + +static struct attribute_group hns3_pmu_cpumask_attr_group = { + .attrs = hns3_pmu_cpumask_attrs, +}; + +static struct attribute 
*hns3_pmu_identifier_attrs[] = { + &dev_attr_identifier.attr, + NULL +}; + +static struct attribute_group hns3_pmu_identifier_attr_group = { + .attrs = hns3_pmu_identifier_attrs, +}; + +static struct attribute *hns3_pmu_bdf_range_attrs[] = { + &dev_attr_bdf_min.attr, + &dev_attr_bdf_max.attr, + NULL +}; + +static struct attribute_group hns3_pmu_bdf_range_attr_group = { + .attrs = hns3_pmu_bdf_range_attrs, +}; + +static struct attribute *hns3_pmu_hw_clk_freq_attrs[] = { + &dev_attr_hw_clk_freq.attr, + NULL +}; + +static struct attribute_group hns3_pmu_hw_clk_freq_attr_group = { + .attrs = hns3_pmu_hw_clk_freq_attrs, +}; + +static const struct attribute_group *hns3_pmu_attr_groups[] = { + &hns3_pmu_events_group, + &hns3_pmu_filter_mode_group, + &hns3_pmu_format_group, + &hns3_pmu_cpumask_attr_group, + &hns3_pmu_identifier_attr_group, + &hns3_pmu_bdf_range_attr_group, + &hns3_pmu_hw_clk_freq_attr_group, + NULL +}; + +static u32 hns3_pmu_get_event(struct perf_event *event) +{ + return hns3_pmu_get_ext_counter_used(event) << 16 | + hns3_pmu_get_event_type(event) << 8 | + hns3_pmu_get_subevent(event); +} + +static u32 hns3_pmu_get_real_event(struct perf_event *event) +{ + return hns3_pmu_get_event_type(event) << 8 | + hns3_pmu_get_subevent(event); +} + +static u32 hns3_pmu_get_offset(u32 offset, u32 idx) +{ + return offset + HNS3_PMU_REG_EVENT_OFFSET + + HNS3_PMU_REG_EVENT_SIZE * idx; +} + +static u32 hns3_pmu_readl(struct hns3_pmu *hns3_pmu, u32 reg_offset, u32 idx) +{ + u32 offset = hns3_pmu_get_offset(reg_offset, idx); + + return readl(hns3_pmu->base + offset); +} + +static void hns3_pmu_writel(struct hns3_pmu *hns3_pmu, u32 reg_offset, u32 idx, + u32 val) +{ + u32 offset = hns3_pmu_get_offset(reg_offset, idx); + + writel(val, hns3_pmu->base + offset); +} + +static u64 hns3_pmu_readq(struct hns3_pmu *hns3_pmu, u32 reg_offset, u32 idx) +{ + u32 offset = hns3_pmu_get_offset(reg_offset, idx); + + return readq(hns3_pmu->base + offset); +} + +static void hns3_pmu_writeq(struct hns3_pmu *hns3_pmu, u32 reg_offset, u32 idx, + u64 val) +{ + u32 offset = hns3_pmu_get_offset(reg_offset, idx); + + writeq(val, hns3_pmu->base + offset); +} + +static bool hns3_pmu_cmp_event(struct perf_event *target, + struct perf_event *event) +{ + return hns3_pmu_get_real_event(target) == hns3_pmu_get_real_event(event); +} + +static int hns3_pmu_find_related_event_idx(struct hns3_pmu *hns3_pmu, + struct perf_event *event) +{ + struct perf_event *sibling; + int hw_event_used = 0; + int idx; + + for (idx = 0; idx < HNS3_PMU_MAX_HW_EVENTS; idx++) { + sibling = hns3_pmu->hw_events[idx]; + if (!sibling) + continue; + + hw_event_used++; + + if (!hns3_pmu_cmp_event(sibling, event)) + continue; + + /* Related events is used in group */ + if (sibling->group_leader == event->group_leader) + return idx; + } + + /* No related event and all hardware events are used up */ + if (hw_event_used >= HNS3_PMU_MAX_HW_EVENTS) + return -EBUSY; + + /* No related event and there is extra hardware events can be use */ + return -ENOENT; +} + +static int hns3_pmu_get_event_idx(struct hns3_pmu *hns3_pmu) +{ + int idx; + + for (idx = 0; idx < HNS3_PMU_MAX_HW_EVENTS; idx++) { + if (!hns3_pmu->hw_events[idx]) + return idx; + } + + return -EBUSY; +} + +static bool hns3_pmu_valid_bdf(struct hns3_pmu *hns3_pmu, u16 bdf) +{ + struct pci_dev *pdev; + + if (bdf < hns3_pmu->bdf_min || bdf > hns3_pmu->bdf_max) { + pci_err(hns3_pmu->pdev, "Invalid EP device: %#x!\n", bdf); + return false; + } + + pdev = 
pci_get_domain_bus_and_slot(pci_domain_nr(hns3_pmu->pdev->bus), + PCI_BUS_NUM(bdf), + GET_PCI_DEVFN(bdf)); + if (!pdev) { + pci_err(hns3_pmu->pdev, "Nonexistent EP device: %#x!\n", bdf); + return false; + } + + pci_dev_put(pdev); + return true; +} + +static void hns3_pmu_set_qid_para(struct hns3_pmu *hns3_pmu, u32 idx, u16 bdf, + u16 queue) +{ + u32 val; + + val = GET_PCI_DEVFN(bdf); + val |= (u32)queue << HNS3_PMU_QID_PARA_QUEUE_S; + hns3_pmu_writel(hns3_pmu, HNS3_PMU_REG_EVENT_QID_PARA, idx, val); +} + +static bool hns3_pmu_qid_req_start(struct hns3_pmu *hns3_pmu, u32 idx) +{ + bool queue_id_valid = false; + u32 reg_qid_ctrl, val; + int err; + + /* enable queue id request */ + hns3_pmu_writel(hns3_pmu, HNS3_PMU_REG_EVENT_QID_CTRL, idx, + HNS3_PMU_QID_CTRL_REQ_ENABLE); + + reg_qid_ctrl = hns3_pmu_get_offset(HNS3_PMU_REG_EVENT_QID_CTRL, idx); + err = readl_poll_timeout(hns3_pmu->base + reg_qid_ctrl, val, + val & HNS3_PMU_QID_CTRL_DONE, 1, 1000); + if (err == -ETIMEDOUT) { + pci_err(hns3_pmu->pdev, "QID request timeout!\n"); + goto out; + } + + queue_id_valid = !(val & HNS3_PMU_QID_CTRL_MISS); + +out: + /* disable qid request and clear status */ + hns3_pmu_writel(hns3_pmu, HNS3_PMU_REG_EVENT_QID_CTRL, idx, 0); + + return queue_id_valid; +} + +static bool hns3_pmu_valid_queue(struct hns3_pmu *hns3_pmu, u32 idx, u16 bdf, + u16 queue) +{ + hns3_pmu_set_qid_para(hns3_pmu, idx, bdf, queue); + + return hns3_pmu_qid_req_start(hns3_pmu, idx); +} + +static struct hns3_pmu_event_attr *hns3_pmu_get_pmu_event(u32 event) +{ + struct hns3_pmu_event_attr *pmu_event; + struct dev_ext_attribute *eattr; + struct device_attribute *dattr; + struct attribute *attr; + u32 i; + + for (i = 0; i < ARRAY_SIZE(hns3_pmu_events_attr) - 1; i++) { + attr = hns3_pmu_events_attr[i]; + dattr = container_of(attr, struct device_attribute, attr); + eattr = container_of(dattr, struct dev_ext_attribute, attr); + pmu_event = eattr->var; + + if (event == pmu_event->event) + return pmu_event; + } + + return NULL; +} + +static int hns3_pmu_set_func_mode(struct perf_event *event, + struct hns3_pmu *hns3_pmu) +{ + struct hw_perf_event *hwc = &event->hw; + u16 bdf = hns3_pmu_get_bdf(event); + + if (!hns3_pmu_valid_bdf(hns3_pmu, bdf)) + return -ENOENT; + + HNS3_PMU_SET_HW_FILTER(hwc, HNS3_PMU_HW_FILTER_FUNC); + + return 0; +} + +static int hns3_pmu_set_func_queue_mode(struct perf_event *event, + struct hns3_pmu *hns3_pmu) +{ + u16 queue_id = hns3_pmu_get_queue(event); + struct hw_perf_event *hwc = &event->hw; + u16 bdf = hns3_pmu_get_bdf(event); + + if (!hns3_pmu_valid_bdf(hns3_pmu, bdf)) + return -ENOENT; + + if (!hns3_pmu_valid_queue(hns3_pmu, hwc->idx, bdf, queue_id)) { + pci_err(hns3_pmu->pdev, "Invalid queue: %u\n", queue_id); + return -ENOENT; + } + + HNS3_PMU_SET_HW_FILTER(hwc, HNS3_PMU_HW_FILTER_FUNC_QUEUE); + + return 0; +} + +static bool +hns3_pmu_is_enabled_global_mode(struct perf_event *event, + struct hns3_pmu_event_attr *pmu_event) +{ + u8 global = hns3_pmu_get_global(event); + + if (!(pmu_event->filter_support & HNS3_PMU_FILTER_SUPPORT_GLOBAL)) + return false; + + return global; +} + +static bool hns3_pmu_is_enabled_func_mode(struct perf_event *event, + struct hns3_pmu_event_attr *pmu_event) +{ + u16 queue_id = hns3_pmu_get_queue(event); + u16 bdf = hns3_pmu_get_bdf(event); + + if (!(pmu_event->filter_support & HNS3_PMU_FILTER_SUPPORT_FUNC)) + return false; + else if (queue_id != HNS3_PMU_FILTER_ALL_QUEUE) + return false; + + return bdf; +} + +static bool +hns3_pmu_is_enabled_func_queue_mode(struct perf_event *event, + 
struct hns3_pmu_event_attr *pmu_event) +{ + u16 queue_id = hns3_pmu_get_queue(event); + u16 bdf = hns3_pmu_get_bdf(event); + + if (!(pmu_event->filter_support & HNS3_PMU_FILTER_SUPPORT_FUNC_QUEUE)) + return false; + else if (queue_id == HNS3_PMU_FILTER_ALL_QUEUE) + return false; + + return bdf; +} + +static bool hns3_pmu_is_enabled_port_mode(struct perf_event *event, + struct hns3_pmu_event_attr *pmu_event) +{ + u8 tc_id = hns3_pmu_get_tc(event); + + if (!(pmu_event->filter_support & HNS3_PMU_FILTER_SUPPORT_PORT)) + return false; + + return tc_id == HNS3_PMU_FILTER_ALL_TC; +} + +static bool +hns3_pmu_is_enabled_port_tc_mode(struct perf_event *event, + struct hns3_pmu_event_attr *pmu_event) +{ + u8 tc_id = hns3_pmu_get_tc(event); + + if (!(pmu_event->filter_support & HNS3_PMU_FILTER_SUPPORT_PORT_TC)) + return false; + + return tc_id != HNS3_PMU_FILTER_ALL_TC; +} + +static bool +hns3_pmu_is_enabled_func_intr_mode(struct perf_event *event, + struct hns3_pmu *hns3_pmu, + struct hns3_pmu_event_attr *pmu_event) +{ + u16 bdf = hns3_pmu_get_bdf(event); + + if (!(pmu_event->filter_support & HNS3_PMU_FILTER_SUPPORT_FUNC_INTR)) + return false; + + return hns3_pmu_valid_bdf(hns3_pmu, bdf); +} + +static int hns3_pmu_select_filter_mode(struct perf_event *event, + struct hns3_pmu *hns3_pmu) +{ + u32 event_id = hns3_pmu_get_event(event); + struct hw_perf_event *hwc = &event->hw; + struct hns3_pmu_event_attr *pmu_event; + + pmu_event = hns3_pmu_get_pmu_event(event_id); + if (!pmu_event) { + pci_err(hns3_pmu->pdev, "Invalid pmu event\n"); + return -ENOENT; + } + + if (hns3_pmu_is_enabled_global_mode(event, pmu_event)) { + HNS3_PMU_SET_HW_FILTER(hwc, HNS3_PMU_HW_FILTER_GLOBAL); + return 0; + } + + if (hns3_pmu_is_enabled_func_mode(event, pmu_event)) + return hns3_pmu_set_func_mode(event, hns3_pmu); + + if (hns3_pmu_is_enabled_func_queue_mode(event, pmu_event)) + return hns3_pmu_set_func_queue_mode(event, hns3_pmu); + + if (hns3_pmu_is_enabled_port_mode(event, pmu_event)) { + HNS3_PMU_SET_HW_FILTER(hwc, HNS3_PMU_HW_FILTER_PORT); + return 0; + } + + if (hns3_pmu_is_enabled_port_tc_mode(event, pmu_event)) { + HNS3_PMU_SET_HW_FILTER(hwc, HNS3_PMU_HW_FILTER_PORT_TC); + return 0; + } + + if (hns3_pmu_is_enabled_func_intr_mode(event, hns3_pmu, pmu_event)) { + HNS3_PMU_SET_HW_FILTER(hwc, HNS3_PMU_HW_FILTER_FUNC_INTR); + return 0; + } + + return -ENOENT; +} + +static bool hns3_pmu_validate_event_group(struct perf_event *event) +{ + struct perf_event *sibling, *leader = event->group_leader; + struct perf_event *event_group[HNS3_PMU_MAX_HW_EVENTS]; + int counters = 1; + int num; + + event_group[0] = leader; + if (!is_software_event(leader)) { + if (leader->pmu != event->pmu) + return false; + + if (leader != event && !hns3_pmu_cmp_event(leader, event)) + event_group[counters++] = event; + } + + for_each_sibling_event(sibling, event->group_leader) { + if (is_software_event(sibling)) + continue; + + if (sibling->pmu != event->pmu) + return false; + + for (num = 0; num < counters; num++) { + if (hns3_pmu_cmp_event(event_group[num], sibling)) + break; + } + + if (num == counters) + event_group[counters++] = sibling; + } + + return counters <= HNS3_PMU_MAX_HW_EVENTS; +} + +static u32 hns3_pmu_get_filter_condition(struct perf_event *event) +{ + struct hw_perf_event *hwc = &event->hw; + u16 intr_id = hns3_pmu_get_intr(event); + u8 port_id = hns3_pmu_get_port(event); + u16 bdf = hns3_pmu_get_bdf(event); + u8 tc_id = hns3_pmu_get_tc(event); + u8 filter_mode; + + filter_mode = *(u8 *)hwc->addr_filters; + switch (filter_mode) { + 
case HNS3_PMU_HW_FILTER_PORT: + return FILTER_CONDITION_PORT(port_id); + case HNS3_PMU_HW_FILTER_PORT_TC: + return FILTER_CONDITION_PORT_TC(port_id, tc_id); + case HNS3_PMU_HW_FILTER_FUNC: + case HNS3_PMU_HW_FILTER_FUNC_QUEUE: + return GET_PCI_DEVFN(bdf); + case HNS3_PMU_HW_FILTER_FUNC_INTR: + return FILTER_CONDITION_FUNC_INTR(GET_PCI_DEVFN(bdf), intr_id); + default: + break; + } + + return 0; +} + +static void hns3_pmu_config_filter(struct perf_event *event) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(event->pmu); + u8 event_type = hns3_pmu_get_event_type(event); + u8 subevent_id = hns3_pmu_get_subevent(event); + u16 queue_id = hns3_pmu_get_queue(event); + struct hw_perf_event *hwc = &event->hw; + u8 filter_mode = *(u8 *)hwc->addr_filters; + u16 bdf = hns3_pmu_get_bdf(event); + u32 idx = hwc->idx; + u32 val; + + val = event_type; + val |= subevent_id << HNS3_PMU_CTRL_SUBEVENT_S; + val |= filter_mode << HNS3_PMU_CTRL_FILTER_MODE_S; + val |= HNS3_PMU_EVENT_OVERFLOW_RESTART; + hns3_pmu_writel(hns3_pmu, HNS3_PMU_REG_EVENT_CTRL_LOW, idx, val); + + val = hns3_pmu_get_filter_condition(event); + hns3_pmu_writel(hns3_pmu, HNS3_PMU_REG_EVENT_CTRL_HIGH, idx, val); + + if (filter_mode == HNS3_PMU_HW_FILTER_FUNC_QUEUE) + hns3_pmu_set_qid_para(hns3_pmu, idx, bdf, queue_id); +} + +static void hns3_pmu_enable_counter(struct hns3_pmu *hns3_pmu, + struct hw_perf_event *hwc) +{ + u32 idx = hwc->idx; + u32 val; + + val = hns3_pmu_readl(hns3_pmu, HNS3_PMU_REG_EVENT_CTRL_LOW, idx); + val |= HNS3_PMU_EVENT_EN; + hns3_pmu_writel(hns3_pmu, HNS3_PMU_REG_EVENT_CTRL_LOW, idx, val); +} + +static void hns3_pmu_disable_counter(struct hns3_pmu *hns3_pmu, + struct hw_perf_event *hwc) +{ + u32 idx = hwc->idx; + u32 val; + + val = hns3_pmu_readl(hns3_pmu, HNS3_PMU_REG_EVENT_CTRL_LOW, idx); + val &= ~HNS3_PMU_EVENT_EN; + hns3_pmu_writel(hns3_pmu, HNS3_PMU_REG_EVENT_CTRL_LOW, idx, val); +} + +static void hns3_pmu_enable_intr(struct hns3_pmu *hns3_pmu, + struct hw_perf_event *hwc) +{ + u32 idx = hwc->idx; + u32 val; + + val = hns3_pmu_readl(hns3_pmu, HNS3_PMU_REG_EVENT_INTR_MASK, idx); + val &= ~HNS3_PMU_INTR_MASK_OVERFLOW; + hns3_pmu_writel(hns3_pmu, HNS3_PMU_REG_EVENT_INTR_MASK, idx, val); +} + +static void hns3_pmu_disable_intr(struct hns3_pmu *hns3_pmu, + struct hw_perf_event *hwc) +{ + u32 idx = hwc->idx; + u32 val; + + val = hns3_pmu_readl(hns3_pmu, HNS3_PMU_REG_EVENT_INTR_MASK, idx); + val |= HNS3_PMU_INTR_MASK_OVERFLOW; + hns3_pmu_writel(hns3_pmu, HNS3_PMU_REG_EVENT_INTR_MASK, idx, val); +} + +static void hns3_pmu_clear_intr_status(struct hns3_pmu *hns3_pmu, u32 idx) +{ + u32 val; + + val = hns3_pmu_readl(hns3_pmu, HNS3_PMU_REG_EVENT_CTRL_LOW, idx); + val |= HNS3_PMU_EVENT_STATUS_RESET; + hns3_pmu_writel(hns3_pmu, HNS3_PMU_REG_EVENT_CTRL_LOW, idx, val); + + val = hns3_pmu_readl(hns3_pmu, HNS3_PMU_REG_EVENT_CTRL_LOW, idx); + val &= ~HNS3_PMU_EVENT_STATUS_RESET; + hns3_pmu_writel(hns3_pmu, HNS3_PMU_REG_EVENT_CTRL_LOW, idx, val); +} + +static u64 hns3_pmu_read_counter(struct perf_event *event) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(event->pmu); + + return hns3_pmu_readq(hns3_pmu, event->hw.event_base, event->hw.idx); +} + +static void hns3_pmu_write_counter(struct perf_event *event, u64 value) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(event->pmu); + u32 idx = event->hw.idx; + + hns3_pmu_writeq(hns3_pmu, HNS3_PMU_REG_EVENT_COUNTER, idx, value); + hns3_pmu_writeq(hns3_pmu, HNS3_PMU_REG_EVENT_EXT_COUNTER, idx, value); +} + +static void hns3_pmu_init_counter(struct perf_event *event) +{ + struct hw_perf_event *hwc 
= &event->hw; + + local64_set(&hwc->prev_count, 0); + hns3_pmu_write_counter(event, 0); +} + +static int hns3_pmu_event_init(struct perf_event *event) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(event->pmu); + struct hw_perf_event *hwc = &event->hw; + int idx; + int ret; + + if (event->attr.type != event->pmu->type) + return -ENOENT; + + /* Sampling is not supported */ + if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK) + return -EOPNOTSUPP; + + event->cpu = hns3_pmu->on_cpu; + + idx = hns3_pmu_get_event_idx(hns3_pmu); + if (idx < 0) { + pci_err(hns3_pmu->pdev, "Up to %u events are supported!\n", + HNS3_PMU_MAX_HW_EVENTS); + return -EBUSY; + } + + hwc->idx = idx; + + ret = hns3_pmu_select_filter_mode(event, hns3_pmu); + if (ret) { + pci_err(hns3_pmu->pdev, "Invalid filter, ret = %d.\n", ret); + return ret; + } + + if (!hns3_pmu_validate_event_group(event)) { + pci_err(hns3_pmu->pdev, "Invalid event group.\n"); + return -EINVAL; + } + + if (hns3_pmu_get_ext_counter_used(event)) + hwc->event_base = HNS3_PMU_REG_EVENT_EXT_COUNTER; + else + hwc->event_base = HNS3_PMU_REG_EVENT_COUNTER; + + return 0; +} + +static void hns3_pmu_read(struct perf_event *event) +{ + struct hw_perf_event *hwc = &event->hw; + u64 new_cnt, prev_cnt, delta; + + do { + prev_cnt = local64_read(&hwc->prev_count); + new_cnt = hns3_pmu_read_counter(event); + } while (local64_cmpxchg(&hwc->prev_count, prev_cnt, new_cnt) != + prev_cnt); + + delta = new_cnt - prev_cnt; + local64_add(delta, &event->count); +} + +static void hns3_pmu_start(struct perf_event *event, int flags) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(event->pmu); + struct hw_perf_event *hwc = &event->hw; + + if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED))) + return; + + WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE)); + hwc->state = 0; + + hns3_pmu_config_filter(event); + hns3_pmu_init_counter(event); + hns3_pmu_enable_intr(hns3_pmu, hwc); + hns3_pmu_enable_counter(hns3_pmu, hwc); + + perf_event_update_userpage(event); +} + +static void hns3_pmu_stop(struct perf_event *event, int flags) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(event->pmu); + struct hw_perf_event *hwc = &event->hw; + + hns3_pmu_disable_counter(hns3_pmu, hwc); + hns3_pmu_disable_intr(hns3_pmu, hwc); + + WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED); + hwc->state |= PERF_HES_STOPPED; + + if (hwc->state & PERF_HES_UPTODATE) + return; + + /* Read hardware counter and update the perf counter statistics */ + hns3_pmu_read(event); + hwc->state |= PERF_HES_UPTODATE; +} + +static int hns3_pmu_add(struct perf_event *event, int flags) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(event->pmu); + struct hw_perf_event *hwc = &event->hw; + int idx; + + hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE; + + /* Check all working events to find a related event. 
*/ + idx = hns3_pmu_find_related_event_idx(hns3_pmu, event); + if (idx < 0 && idx != -ENOENT) + return idx; + + /* Current event shares an enabled hardware event with related event */ + if (idx >= 0 && idx < HNS3_PMU_MAX_HW_EVENTS) { + hwc->idx = idx; + goto start_count; + } + + idx = hns3_pmu_get_event_idx(hns3_pmu); + if (idx < 0) + return idx; + + hwc->idx = idx; + hns3_pmu->hw_events[idx] = event; + +start_count: + if (flags & PERF_EF_START) + hns3_pmu_start(event, PERF_EF_RELOAD); + + return 0; +} + +static void hns3_pmu_del(struct perf_event *event, int flags) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(event->pmu); + struct hw_perf_event *hwc = &event->hw; + + hns3_pmu_stop(event, PERF_EF_UPDATE); + hns3_pmu->hw_events[hwc->idx] = NULL; + perf_event_update_userpage(event); +} + +static void hns3_pmu_enable(struct pmu *pmu) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(pmu); + u32 val; + + val = readl(hns3_pmu->base + HNS3_PMU_REG_GLOBAL_CTRL); + val |= HNS3_PMU_GLOBAL_START; + writel(val, hns3_pmu->base + HNS3_PMU_REG_GLOBAL_CTRL); +} + +static void hns3_pmu_disable(struct pmu *pmu) +{ + struct hns3_pmu *hns3_pmu = to_hns3_pmu(pmu); + u32 val; + + val = readl(hns3_pmu->base + HNS3_PMU_REG_GLOBAL_CTRL); + val &= ~HNS3_PMU_GLOBAL_START; + writel(val, hns3_pmu->base + HNS3_PMU_REG_GLOBAL_CTRL); +} + +static int hns3_pmu_alloc_pmu(struct pci_dev *pdev, struct hns3_pmu *hns3_pmu) +{ + u16 device_id; + char *name; + u32 val; + + hns3_pmu->base = pcim_iomap_table(pdev)[BAR_2]; + if (!hns3_pmu->base) { + pci_err(pdev, "ioremap failed\n"); + return -ENOMEM; + } + + hns3_pmu->hw_clk_freq = readl(hns3_pmu->base + HNS3_PMU_REG_CLOCK_FREQ); + + val = readl(hns3_pmu->base + HNS3_PMU_REG_BDF); + hns3_pmu->bdf_min = val & 0xffff; + hns3_pmu->bdf_max = val >> 16; + + val = readl(hns3_pmu->base + HNS3_PMU_REG_DEVICE_ID); + device_id = val & 0xffff; + name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "hns3_pmu_sicl_%u", device_id); + if (!name) + return -ENOMEM; + + hns3_pmu->pdev = pdev; + hns3_pmu->on_cpu = -1; + hns3_pmu->identifier = readl(hns3_pmu->base + HNS3_PMU_REG_VERSION); + hns3_pmu->pmu = (struct pmu) { + .name = name, + .module = THIS_MODULE, + .event_init = hns3_pmu_event_init, + .pmu_enable = hns3_pmu_enable, + .pmu_disable = hns3_pmu_disable, + .add = hns3_pmu_add, + .del = hns3_pmu_del, + .start = hns3_pmu_start, + .stop = hns3_pmu_stop, + .read = hns3_pmu_read, + .task_ctx_nr = perf_invalid_context, + .attr_groups = hns3_pmu_attr_groups, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, + }; + + return 0; +} + +static irqreturn_t hns3_pmu_irq(int irq, void *data) +{ + struct hns3_pmu *hns3_pmu = data; + u32 intr_status, idx; + + for (idx = 0; idx < HNS3_PMU_MAX_HW_EVENTS; idx++) { + intr_status = hns3_pmu_readl(hns3_pmu, + HNS3_PMU_REG_EVENT_INTR_STATUS, + idx); + + /* + * As each counter will restart from 0 when it is overflowed, + * extra processing is no need, just clear interrupt status. 
+ */ + if (intr_status) + hns3_pmu_clear_intr_status(hns3_pmu, idx); + } + + return IRQ_HANDLED; +} + +static int hns3_pmu_online_cpu(unsigned int cpu, struct hlist_node *node) +{ + struct hns3_pmu *hns3_pmu; + + hns3_pmu = hlist_entry_safe(node, struct hns3_pmu, node); + if (!hns3_pmu) + return -ENODEV; + + if (hns3_pmu->on_cpu == -1) { + hns3_pmu->on_cpu = cpu; + irq_set_affinity(hns3_pmu->irq, cpumask_of(cpu)); + } + + return 0; +} + +static int hns3_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node) +{ + struct hns3_pmu *hns3_pmu; + unsigned int target; + + hns3_pmu = hlist_entry_safe(node, struct hns3_pmu, node); + if (!hns3_pmu) + return -ENODEV; + + /* Nothing to do if this CPU doesn't own the PMU */ + if (hns3_pmu->on_cpu != cpu) + return 0; + + /* Choose a new CPU from all online cpus */ + target = cpumask_any_but(cpu_online_mask, cpu); + if (target >= nr_cpu_ids) + return 0; + + perf_pmu_migrate_context(&hns3_pmu->pmu, cpu, target); + hns3_pmu->on_cpu = target; + irq_set_affinity(hns3_pmu->irq, cpumask_of(target)); + + return 0; +} + +static void hns3_pmu_free_irq(void *data) +{ + struct pci_dev *pdev = data; + + pci_free_irq_vectors(pdev); +} + +static int hns3_pmu_irq_register(struct pci_dev *pdev, + struct hns3_pmu *hns3_pmu) +{ + int irq, ret; + + ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI); + if (ret < 0) { + pci_err(pdev, "failed to enable MSI vectors, ret = %d.\n", ret); + return ret; + } + + ret = devm_add_action(&pdev->dev, hns3_pmu_free_irq, pdev); + if (ret) { + pci_err(pdev, "failed to add free irq action, ret = %d.\n", ret); + return ret; + } + + irq = pci_irq_vector(pdev, 0); + ret = devm_request_irq(&pdev->dev, irq, hns3_pmu_irq, 0, + hns3_pmu->pmu.name, hns3_pmu); + if (ret) { + pci_err(pdev, "failed to register irq, ret = %d.\n", ret); + return ret; + } + + hns3_pmu->irq = irq; + + return 0; +} + +static int hns3_pmu_init_pmu(struct pci_dev *pdev, struct hns3_pmu *hns3_pmu) +{ + int ret; + + ret = hns3_pmu_alloc_pmu(pdev, hns3_pmu); + if (ret) + return ret; + + ret = hns3_pmu_irq_register(pdev, hns3_pmu); + if (ret) + return ret; + + ret = cpuhp_state_add_instance(hns3_pmu_online, &hns3_pmu->node); + if (ret) { + pci_err(pdev, "failed to register hotplug, ret = %d.\n", ret); + return ret; + } + + ret = perf_pmu_register(&hns3_pmu->pmu, hns3_pmu->pmu.name, -1); + if (ret) { + pci_err(pdev, "failed to register perf PMU, ret = %d.\n", ret); + cpuhp_state_remove_instance(hns3_pmu_online, &hns3_pmu->node); + } + + return ret; +} + +static void hns3_pmu_uninit_pmu(struct pci_dev *pdev) +{ + struct hns3_pmu *hns3_pmu = pci_get_drvdata(pdev); + + perf_pmu_unregister(&hns3_pmu->pmu); + cpuhp_state_remove_instance(hns3_pmu_online, &hns3_pmu->node); +} + +static int hns3_pmu_init_dev(struct pci_dev *pdev) +{ + int ret; + + ret = pcim_enable_device(pdev); + if (ret) { + pci_err(pdev, "failed to enable pci device, ret = %d.\n", ret); + return ret; + } + + ret = pcim_iomap_regions(pdev, BIT(BAR_2), "hns3_pmu"); + if (ret < 0) { + pci_err(pdev, "failed to request pci region, ret = %d.\n", ret); + return ret; + } + + pci_set_master(pdev); + + return 0; +} + +static int hns3_pmu_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + struct hns3_pmu *hns3_pmu; + int ret; + + hns3_pmu = devm_kzalloc(&pdev->dev, sizeof(*hns3_pmu), GFP_KERNEL); + if (!hns3_pmu) + return -ENOMEM; + + ret = hns3_pmu_init_dev(pdev); + if (ret) + return ret; + + ret = hns3_pmu_init_pmu(pdev, hns3_pmu); + if (ret) { + pci_clear_master(pdev); + return ret; + } + + 
pci_set_drvdata(pdev, hns3_pmu); + + return ret; +} + +static void hns3_pmu_remove(struct pci_dev *pdev) +{ + hns3_pmu_uninit_pmu(pdev); + pci_clear_master(pdev); + pci_set_drvdata(pdev, NULL); +} + +static const struct pci_device_id hns3_pmu_ids[] = { + { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, 0xa22b) }, + { 0, } +}; +MODULE_DEVICE_TABLE(pci, hns3_pmu_ids); + +static struct pci_driver hns3_pmu_driver = { + .name = "hns3_pmu", + .id_table = hns3_pmu_ids, + .probe = hns3_pmu_probe, + .remove = hns3_pmu_remove, +}; + +static int __init hns3_pmu_module_init(void) +{ + int ret; + + ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, + "perf/hns3_pmu/pmcg:online", + hns3_pmu_online_cpu, + hns3_pmu_offline_cpu); + if (ret < 0) { + pr_err("failed to setup HNS3 PMU hotplug, ret = %d.\n", ret); + return ret; + } + hns3_pmu_online = ret; + + ret = pci_register_driver(&hns3_pmu_driver); + if (ret) { + pr_err("failed to register pci driver, ret = %d.\n", ret); + cpuhp_remove_multi_state(hns3_pmu_online); + } + + return ret; +} +module_init(hns3_pmu_module_init); + +static void __exit hns3_pmu_module_exit(void) +{ + pci_unregister_driver(&hns3_pmu_driver); + cpuhp_remove_multi_state(hns3_pmu_online); +} +module_exit(hns3_pmu_module_exit); + +MODULE_DESCRIPTION("HNS3 PMU driver"); +MODULE_LICENSE("GPL v2");
From: Will Deacon will@kernel.org
mainline inclusion from mainline-arm64-upstream commit aaaee7b55c9e category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5KAX7 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
After commit 39915b6b5fc2 ("drivers/perf: hisi: Add description for HNS3 PMU driver"), building the 'htmldocs' target results in the following warning:
| Documentation/admin-guide/perf/hns3-pmu.rst: WARNING: document isn't included in any toctree
Add 'hns3-pmu' to the perf toctree to silence the warning.
Reported-by: Stephen Rothwell sfr@canb.auug.org.au Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Jiantao Xiao xiaojiantao1@h-partners.com Reviewed-by: Xiongfeng Wang wangxiongfeng2@huawei.com Reviewed-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Jian Shen shenjian15@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- Documentation/admin-guide/perf/index.rst | 1 + 1 file changed, 1 insertion(+)
diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst index 5a8f2529a033..5b59862f6621 100644 --- a/Documentation/admin-guide/perf/index.rst +++ b/Documentation/admin-guide/perf/index.rst @@ -8,6 +8,7 @@ Performance monitor support :maxdepth: 1
hisi-pmu + hns3-pmu imx-ddr qcom_l2_pmu qcom_l3_pmu
From: Lorenz Bauer lmb@cloudflare.com
stable inclusion from stable-v5.10.135 commit 6aad811b37eeeba902b14cc4ab698d2b37bb4fb9 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZWFM
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 607b9cc92bd7208338d714a22b8082fe83bcb177 upstream.
Share the timing / signal interruption logic between different implementations of PROG_TEST_RUN. There is a change in behaviour as well. We check the loop exit condition before checking for pending signals. This resolves an edge case where a signal arrives during the last iteration. Instead of aborting with EINTR we return the successful result to user space.
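For reference, every PROG_TEST_RUN implementation now follows the same pattern; a condensed sketch of the shared loop (based on the helpers added in the diff below, with setup and error handling trimmed):

	struct bpf_test_timer t = { NO_MIGRATE };
	int err;

	bpf_test_timer_enter(&t);
	do {
		/* run the program under test once */
		*retval = BPF_PROG_RUN(prog, ctx);
		/*
		 * bpf_test_timer_continue() first checks whether all
		 * repetitions are done (and computes the average duration),
		 * and only then checks signal_pending(), so a signal arriving
		 * during the last iteration no longer turns a successful run
		 * into -EINTR.
		 */
	} while (bpf_test_timer_continue(&t, repeat, &err, time));
	bpf_test_timer_leave(&t);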
Signed-off-by: Lorenz Bauer lmb@cloudflare.com Signed-off-by: Alexei Starovoitov ast@kernel.org Acked-by: Andrii Nakryiko andrii@kernel.org Link: https://lore.kernel.org/bpf/20210303101816.36774-2-lmb@cloudflare.com [dtcccc: fix conflicts in bpf_test_run()] Signed-off-by: Tianchen Ding dtcccc@linux.alibaba.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Conflicts: net/bpf/test_run.c Signed-off-by: Pu Lehui pulehui@huawei.com Reviewed-by: Kuohai Xu xukuohai@huawei.com Reviewed-by: Kuohai Xu xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- net/bpf/test_run.c | 142 +++++++++++++++++++++++++-------------------- 1 file changed, 78 insertions(+), 64 deletions(-)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 141c39274066..5df75f12551e 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -16,16 +16,80 @@ #define CREATE_TRACE_POINTS #include <trace/events/bpf_test_run.h>
+struct bpf_test_timer { + enum { NO_PREEMPT, NO_MIGRATE } mode; + u32 i; + u64 time_start, time_spent; +}; + +static void bpf_test_timer_enter(struct bpf_test_timer *t) + __acquires(rcu) +{ + rcu_read_lock(); + if (t->mode == NO_PREEMPT) + preempt_disable(); + else + migrate_disable(); + + t->time_start = ktime_get_ns(); +} + +static void bpf_test_timer_leave(struct bpf_test_timer *t) + __releases(rcu) +{ + t->time_start = 0; + + if (t->mode == NO_PREEMPT) + preempt_enable(); + else + migrate_enable(); + rcu_read_unlock(); +} + +static bool bpf_test_timer_continue(struct bpf_test_timer *t, u32 repeat, int *err, u32 *duration) + __must_hold(rcu) +{ + t->i++; + if (t->i >= repeat) { + /* We're done. */ + t->time_spent += ktime_get_ns() - t->time_start; + do_div(t->time_spent, t->i); + *duration = t->time_spent > U32_MAX ? U32_MAX : (u32)t->time_spent; + *err = 0; + goto reset; + } + + if (signal_pending(current)) { + /* During iteration: we've been cancelled, abort. */ + *err = -EINTR; + goto reset; + } + + if (need_resched()) { + /* During iteration: we need to reschedule between runs. */ + t->time_spent += ktime_get_ns() - t->time_start; + bpf_test_timer_leave(t); + cond_resched(); + bpf_test_timer_enter(t); + } + + /* Do another round. */ + return true; + +reset: + t->i = 0; + return false; +} + static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *retval, u32 *time, bool xdp) { struct bpf_prog_array_item item = {.prog = prog}; struct bpf_run_ctx *old_ctx; struct bpf_cg_run_ctx run_ctx; + struct bpf_test_timer t = { NO_MIGRATE }; enum bpf_cgroup_storage_type stype; - u64 time_start, time_spent = 0; - int ret = 0; - u32 i; + int ret;
for_each_cgroup_storage_type(stype) { item.cgroup_storage[stype] = bpf_cgroup_storage_alloc(prog, stype); @@ -40,42 +104,17 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, if (!repeat) repeat = 1;
- rcu_read_lock(); - migrate_disable(); - time_start = ktime_get_ns(); + bpf_test_timer_enter(&t); old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); - for (i = 0; i < repeat; i++) { + do { run_ctx.prog_item = &item; - if (xdp) *retval = bpf_prog_run_xdp(prog, ctx); else *retval = BPF_PROG_RUN(prog, ctx); - - if (signal_pending(current)) { - ret = -EINTR; - break; - } - - if (need_resched()) { - time_spent += ktime_get_ns() - time_start; - migrate_enable(); - rcu_read_unlock(); - - cond_resched(); - - rcu_read_lock(); - migrate_disable(); - time_start = ktime_get_ns(); - } - } + } while (bpf_test_timer_continue(&t, repeat, &ret, time)); bpf_reset_run_ctx(old_ctx); - time_spent += ktime_get_ns() - time_start; - migrate_enable(); - rcu_read_unlock(); - - do_div(time_spent, repeat); - *time = time_spent > U32_MAX ? U32_MAX : (u32)time_spent; + bpf_test_timer_leave(&t);
for_each_cgroup_storage_type(stype) bpf_cgroup_storage_free(item.cgroup_storage[stype]); @@ -692,18 +731,17 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { + struct bpf_test_timer t = { NO_PREEMPT }; u32 size = kattr->test.data_size_in; struct bpf_flow_dissector ctx = {}; u32 repeat = kattr->test.repeat; struct bpf_flow_keys *user_ctx; struct bpf_flow_keys flow_keys; - u64 time_start, time_spent = 0; const struct ethhdr *eth; unsigned int flags = 0; u32 retval, duration; void *data; int ret; - u32 i;
if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR) return -EINVAL; @@ -739,39 +777,15 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, ctx.data = data; ctx.data_end = (__u8 *)data + size;
- rcu_read_lock(); - preempt_disable(); - time_start = ktime_get_ns(); - for (i = 0; i < repeat; i++) { + bpf_test_timer_enter(&t); + do { retval = bpf_flow_dissect(prog, &ctx, eth->h_proto, ETH_HLEN, size, flags); + } while (bpf_test_timer_continue(&t, repeat, &ret, &duration)); + bpf_test_timer_leave(&t);
- if (signal_pending(current)) { - preempt_enable(); - rcu_read_unlock(); - - ret = -EINTR; - goto out; - } - - if (need_resched()) { - time_spent += ktime_get_ns() - time_start; - preempt_enable(); - rcu_read_unlock(); - - cond_resched(); - - rcu_read_lock(); - preempt_disable(); - time_start = ktime_get_ns(); - } - } - time_spent += ktime_get_ns() - time_start; - preempt_enable(); - rcu_read_unlock(); - - do_div(time_spent, repeat); - duration = time_spent > U32_MAX ? U32_MAX : (u32)time_spent; + if (ret < 0) + goto out;
ret = bpf_test_finish(kattr, uattr, &flow_keys, sizeof(flow_keys), retval, duration);
From: Lorenz Bauer lmb@cloudflare.com
stable inclusion from stable-v5.10.135 commit 6d3fad2b44eb9d226a896d1c93909f0fd2e1b9ea category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZWFM
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 7c32e8f8bc33a5f4b113a630857e46634e3e143b upstream.
Allow to pass sk_lookup programs to PROG_TEST_RUN. User space provides the full bpf_sk_lookup struct as context. Since the context includes a socket pointer that can't be exposed to user space we define that PROG_TEST_RUN returns the cookie of the selected socket or zero in place of the socket pointer.
We don't support testing programs that select a reuseport socket, since this would mean running another (unrelated) BPF program from the sk_lookup test handler.
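As an illustration only (not part of this patch), user space could exercise the new test run mode roughly as follows, assuming libbpf's bpf_prog_test_run_opts() helper; prog_fd and the addresses/ports below are made-up values:

	#include <bpf/bpf.h>
	#include <bpf/libbpf.h>
	#include <linux/bpf.h>
	#include <arpa/inet.h>
	#include <netinet/in.h>
	#include <stdio.h>

	/* Hypothetical sketch: run a loaded sk_lookup program once. */
	static int run_sk_lookup_once(int prog_fd)
	{
		struct bpf_sk_lookup ctx = {
			.family      = AF_INET,
			.protocol    = IPPROTO_TCP,
			.remote_ip4  = htonl(INADDR_LOOPBACK),	/* network byte order */
			.remote_port = htons(60000),		/* network byte order */
			.local_ip4   = htonl(INADDR_LOOPBACK),
			.local_port  = 80,			/* host byte order */
		};
		DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
			.ctx_in       = &ctx,
			.ctx_size_in  = sizeof(ctx),
			.ctx_out      = &ctx,
			.ctx_size_out = sizeof(ctx),
			.repeat       = 1,
		);
		int err = bpf_prog_test_run_opts(prog_fd, &opts);

		if (err)
			return err;

		/* ctx.cookie is non-zero only if the program selected a socket. */
		printf("retval=%u selected socket cookie=%llu\n",
		       opts.retval, (unsigned long long)ctx.cookie);
		return 0;
	}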
Signed-off-by: Lorenz Bauer lmb@cloudflare.com Signed-off-by: Alexei Starovoitov ast@kernel.org Link: https://lore.kernel.org/bpf/20210303101816.36774-3-lmb@cloudflare.com Signed-off-by: Tianchen Ding dtcccc@linux.alibaba.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Pu Lehui pulehui@huawei.com Reviewed-by: Kuohai Xu xukuohai@huawei.com Reviewed-by: Kuohai Xu xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/linux/bpf.h | 10 ++++ include/uapi/linux/bpf.h | 5 +- net/bpf/test_run.c | 105 +++++++++++++++++++++++++++++++++ net/core/filter.c | 1 + tools/include/uapi/linux/bpf.h | 5 +- 5 files changed, 124 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 3154be71a80c..8d95f4c66275 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1545,6 +1545,9 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, int bpf_prog_test_run_raw_tp(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr); +int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, + const union bpf_attr *kattr, + union bpf_attr __user *uattr); bool btf_ctx_access(int off, int size, enum bpf_access_type type, const struct bpf_prog *prog, struct bpf_insn_access_aux *info); @@ -1759,6 +1762,13 @@ static inline int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, return -ENOTSUPP; }
+static inline int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, + const union bpf_attr *kattr, + union bpf_attr __user *uattr) +{ + return -ENOTSUPP; +} + static inline void bpf_map_put(struct bpf_map *map) { } diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 8fae845d80e2..bd566bfc843f 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5022,7 +5022,10 @@ struct bpf_pidns_info {
/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */ struct bpf_sk_lookup { - __bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */ + union { + __bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */ + __u64 cookie; /* Non-zero if socket was selected in PROG_TEST_RUN */ + };
__u32 family; /* Protocol family (AF_INET, AF_INET6) */ __u32 protocol; /* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */ diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 5df75f12551e..df8d9d800ebc 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -10,8 +10,10 @@ #include <net/bpf_sk_storage.h> #include <net/sock.h> #include <net/tcp.h> +#include <net/net_namespace.h> #include <linux/error-injection.h> #include <linux/smp.h> +#include <linux/sock_diag.h>
#define CREATE_TRACE_POINTS #include <trace/events/bpf_test_run.h> @@ -798,3 +800,106 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, kfree(data); return ret; } + +int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kattr, + union bpf_attr __user *uattr) +{ + struct bpf_test_timer t = { NO_PREEMPT }; + struct bpf_prog_array *progs = NULL; + struct bpf_sk_lookup_kern ctx = {}; + u32 repeat = kattr->test.repeat; + struct bpf_sk_lookup *user_ctx; + u32 retval, duration; + int ret = -EINVAL; + + if (prog->type != BPF_PROG_TYPE_SK_LOOKUP) + return -EINVAL; + + if (kattr->test.flags || kattr->test.cpu) + return -EINVAL; + + if (kattr->test.data_in || kattr->test.data_size_in || kattr->test.data_out || + kattr->test.data_size_out) + return -EINVAL; + + if (!repeat) + repeat = 1; + + user_ctx = bpf_ctx_init(kattr, sizeof(*user_ctx)); + if (IS_ERR(user_ctx)) + return PTR_ERR(user_ctx); + + if (!user_ctx) + return -EINVAL; + + if (user_ctx->sk) + goto out; + + if (!range_is_zero(user_ctx, offsetofend(typeof(*user_ctx), local_port), sizeof(*user_ctx))) + goto out; + + if (user_ctx->local_port > U16_MAX || user_ctx->remote_port > U16_MAX) { + ret = -ERANGE; + goto out; + } + + ctx.family = (u16)user_ctx->family; + ctx.protocol = (u16)user_ctx->protocol; + ctx.dport = (u16)user_ctx->local_port; + ctx.sport = (__force __be16)user_ctx->remote_port; + + switch (ctx.family) { + case AF_INET: + ctx.v4.daddr = (__force __be32)user_ctx->local_ip4; + ctx.v4.saddr = (__force __be32)user_ctx->remote_ip4; + break; + +#if IS_ENABLED(CONFIG_IPV6) + case AF_INET6: + ctx.v6.daddr = (struct in6_addr *)user_ctx->local_ip6; + ctx.v6.saddr = (struct in6_addr *)user_ctx->remote_ip6; + break; +#endif + + default: + ret = -EAFNOSUPPORT; + goto out; + } + + progs = bpf_prog_array_alloc(1, GFP_KERNEL); + if (!progs) { + ret = -ENOMEM; + goto out; + } + + progs->items[0].prog = prog; + + bpf_test_timer_enter(&t); + do { + ctx.selected_sk = NULL; + retval = BPF_PROG_SK_LOOKUP_RUN_ARRAY(progs, ctx, BPF_PROG_RUN); + } while (bpf_test_timer_continue(&t, repeat, &ret, &duration)); + bpf_test_timer_leave(&t); + + if (ret < 0) + goto out; + + user_ctx->cookie = 0; + if (ctx.selected_sk) { + if (ctx.selected_sk->sk_reuseport && !ctx.no_reuseport) { + ret = -EOPNOTSUPP; + goto out; + } + + user_ctx->cookie = sock_gen_cookie(ctx.selected_sk); + } + + ret = bpf_test_finish(kattr, uattr, NULL, 0, retval, duration); + if (!ret) + ret = bpf_ctx_finish(kattr, uattr, user_ctx, sizeof(*user_ctx)); + +out: + bpf_prog_array_free(progs); + kfree(user_ctx); + return ret; +} diff --git a/net/core/filter.c b/net/core/filter.c index 0644dde98433..23b6820a2e28 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -10414,6 +10414,7 @@ static u32 sk_lookup_convert_ctx_access(enum bpf_access_type type, }
const struct bpf_prog_ops sk_lookup_prog_ops = { + .test_run = bpf_prog_test_run_sk_lookup, };
const struct bpf_verifier_ops sk_lookup_verifier_ops = { diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 9ff20fbad70e..4af3661414fa 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5022,7 +5022,10 @@ struct bpf_pidns_info {
/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */ struct bpf_sk_lookup { - __bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */ + union { + __bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */ + __u64 cookie; /* Non-zero if socket was selected in PROG_TEST_RUN */ + };
__u32 family; /* Protocol family (AF_INET, AF_INET6) */ __u32 protocol; /* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */
From: Lorenz Bauer lmb@cloudflare.com
stable inclusion from stable-v5.10.135 commit 4bfc9dc60873923ffa64ee77084bac55031a30a0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZWFM
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit b4f894633fa14d7d46ba7676f950b90a401504bb upstream.
sk_lookup doesn't allow setting data_in for bpf_prog_run. This doesn't play well with the verifier tests, since they always set a 64 byte input buffer. Allow not running verifier tests by setting bpf_test.runs to a negative value and don't run the ctx access case for sk_lookup. We have dedicated ctx access tests so skipping here doesn't reduce coverage.
Signed-off-by: Lorenz Bauer lmb@cloudflare.com Signed-off-by: Alexei Starovoitov ast@kernel.org Link: https://lore.kernel.org/bpf/20210303101816.36774-6-lmb@cloudflare.com Signed-off-by: Tianchen Ding dtcccc@linux.alibaba.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Pu Lehui pulehui@huawei.com Reviewed-by: Kuohai Xu xukuohai@huawei.com Reviewed-by: Kuohai Xu xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- tools/testing/selftests/bpf/test_verifier.c | 4 ++-- tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c | 1 + 2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 0fc813235575..961c17b4681e 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -101,7 +101,7 @@ struct bpf_test { enum bpf_prog_type prog_type; uint8_t flags; void (*fill_helper)(struct bpf_test *self); - uint8_t runs; + int runs; #define bpf_testdata_struct_t \ struct { \ uint32_t retval, retval_unpriv; \ @@ -1064,7 +1064,7 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
run_errs = 0; run_successes = 0; - if (!alignment_prevented_execution && fd_prog >= 0) { + if (!alignment_prevented_execution && fd_prog >= 0 && test->runs >= 0) { uint32_t expected_val; int i;
diff --git a/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c b/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c index 2ad5f974451c..fd3b62a084b9 100644 --- a/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c +++ b/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c @@ -239,6 +239,7 @@ .result = ACCEPT, .prog_type = BPF_PROG_TYPE_SK_LOOKUP, .expected_attach_type = BPF_SK_LOOKUP, + .runs = -1, }, /* invalid 8-byte reads from a 4-byte fields in bpf_sk_lookup */ {
From: GUO Zihua guozihua@huawei.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61O87 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?...
--------------------------------
Currently ima_lsm_copy_rule() sets the args_p field of the source rule to NULL, so that the source rule can be freed afterwards. It does not make sense for this behavior to be inside a "copy" function, so move it outside and let the caller handle this field.
ima_lsm_copy_rule() now produces a shallow copy of the original entry, including the args_p field. This means that for the original rule only the lsm.rule references and the rule itself should be freed. Thus, instead of calling ima_lsm_free_rule(), which frees lsm.rule as well as the args_p field, free lsm.rule directly.
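To make the resulting ownership explicit, this is what ima_lsm_update_rule() now does with the original entry (condensed from the diff below):

	/* nentry shares args_p with entry, so only the old LSM rule handles
	 * and the old entry itself are released; args_p is now owned by
	 * nentry. */
	for (i = 0; i < MAX_LSM_RULES; i++)
		ima_filter_rule_free(entry->lsm[i].rule);
	kfree(entry);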
Signed-off-by: GUO Zihua guozihua@huawei.com Reviewed-by: Roberto Sassu roberto.sassu@huawei.com Signed-off-by: Mimi Zohar zohar@linux.ibm.com Conflicts: security/integrity/ima/ima_policy.c Signed-off-by: GUO Zihua guozihua@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: GUO Zihua guozihua@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- security/integrity/ima/ima_policy.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c index b1ab4b3d99fb..d39118c1ad3d 100644 --- a/security/integrity/ima/ima_policy.c +++ b/security/integrity/ima/ima_policy.c @@ -399,12 +399,6 @@ static struct ima_rule_entry *ima_lsm_copy_rule(struct ima_rule_entry *entry)
nentry->lsm[i].type = entry->lsm[i].type; nentry->lsm[i].args_p = entry->lsm[i].args_p; - /* - * Remove the reference from entry so that the associated - * memory will not be freed during a later call to - * ima_lsm_free_rule(entry). - */ - entry->lsm[i].args_p = NULL;
ima_filter_rule_init(nentry->lsm[i].type, Audit_equal, nentry->lsm[i].args_p, @@ -418,6 +412,7 @@ static struct ima_rule_entry *ima_lsm_copy_rule(struct ima_rule_entry *entry)
static int ima_lsm_update_rule(struct ima_rule_entry *entry) { + int i; struct ima_rule_entry *nentry;
nentry = ima_lsm_copy_rule(entry); @@ -432,7 +427,8 @@ static int ima_lsm_update_rule(struct ima_rule_entry *entry) * references and the entry itself. All other memory refrences will now * be owned by nentry. */ - ima_lsm_free_rule(entry); + for (i = 0; i < MAX_LSM_RULES; i++) + ima_filter_rule_free(entry->lsm[i].rule); kfree(entry);
return 0;
From: GUO Zihua guozihua@huawei.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61O87 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?...
--------------------------------
IMA relies on the blocking LSM policy notifier callback to update the LSM based IMA policy rules.
When SELinux updates its policies, IMA is notified and starts updating all its LSM rules one by one. During this time, ima_filter_rule_match() returns -ESTALE if it is called with an LSM rule that has not yet been updated. In ima_match_rules(), -ESTALE is not handled and the LSM rule is considered a match, causing extra files to be measured by IMA.
Fix this by re-initializing a temporary copy of the rule if ima_filter_rule_match() returns -ESTALE. The original rule in the rule list will still be updated by the LSM policy notifier callback.
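A condensed sketch of the resulting control flow in ima_match_rules() (mirroring the diff below; declarations and the object/subject secid selection are omitted for brevity):

	for (i = 0; i < MAX_LSM_RULES; i++) {
retry:
		rc = ima_filter_rule_match(osid, lsm_rule->lsm[i].type,
					   Audit_equal, lsm_rule->lsm[i].rule);
		if (rc == -ESTALE && !rule_reinitialized) {
			/* The rule went stale under us: retry on a fresh copy. */
			lsm_rule = ima_lsm_copy_rule(rule);
			if (lsm_rule) {
				rule_reinitialized = true;
				goto retry;
			}
		}
		if (!rc) {
			result = false;
			goto out;
		}
	}
	result = true;
out:
	if (rule_reinitialized) {
		/* Free the temporary copy; the rule in the list is untouched. */
		for (i = 0; i < MAX_LSM_RULES; i++)
			ima_filter_rule_free(lsm_rule->lsm[i].rule);
		kfree(lsm_rule);
	}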
Fixes: b16942455193 ("ima: use the lsm policy update notifier") Signed-off-by: GUO Zihua guozihua@huawei.com Reviewed-by: Roberto Sassu roberto.sassu@huawei.com Signed-off-by: Mimi Zohar zohar@linux.ibm.com Conflicts: security/integrity/ima/ima_policy.c Signed-off-by: GUO Zihua guozihua@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: GUO Zihua guozihua@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- security/integrity/ima/ima_policy.c | 41 ++++++++++++++++++++++------- 1 file changed, 32 insertions(+), 9 deletions(-)
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c index d39118c1ad3d..274f4c7c99f4 100644 --- a/security/integrity/ima/ima_policy.c +++ b/security/integrity/ima/ima_policy.c @@ -528,6 +528,9 @@ static bool ima_match_rules(struct ima_rule_entry *rule, struct inode *inode, const char *keyring) { int i; + bool result = false; + struct ima_rule_entry *lsm_rule = rule; + bool rule_reinitialized = false;
if (func == KEY_CHECK) { return (rule->flags & IMA_FUNC) && (rule->func == func) && @@ -573,34 +576,54 @@ static bool ima_match_rules(struct ima_rule_entry *rule, struct inode *inode, int rc = 0; u32 osid;
- if (!rule->lsm[i].rule) { - if (!rule->lsm[i].args_p) + if (!lsm_rule->lsm[i].rule) { + if (!lsm_rule->lsm[i].args_p) continue; else return false; } + +retry: switch (i) { case LSM_OBJ_USER: case LSM_OBJ_ROLE: case LSM_OBJ_TYPE: security_inode_getsecid(inode, &osid); - rc = ima_filter_rule_match(osid, rule->lsm[i].type, + rc = ima_filter_rule_match(osid, lsm_rule->lsm[i].type, Audit_equal, - rule->lsm[i].rule); + lsm_rule->lsm[i].rule); break; case LSM_SUBJ_USER: case LSM_SUBJ_ROLE: case LSM_SUBJ_TYPE: - rc = ima_filter_rule_match(secid, rule->lsm[i].type, + rc = ima_filter_rule_match(secid, lsm_rule->lsm[i].type, Audit_equal, - rule->lsm[i].rule); + lsm_rule->lsm[i].rule); default: break; } - if (!rc) - return false; + + if (rc == -ESTALE && !rule_reinitialized) { + lsm_rule = ima_lsm_copy_rule(rule); + if (lsm_rule) { + rule_reinitialized = true; + goto retry; + } + } + if (!rc) { + result = false; + goto out; + } } - return true; + result = true; + +out: + if (rule_reinitialized) { + for (i = 0; i < MAX_LSM_RULES; i++) + ima_filter_rule_free(lsm_rule->lsm[i].rule); + kfree(lsm_rule); + } + return result; }
/*
From: Long Li leo.lilong@huawei.com
hulk inclusion category: bugfix bugzilla: 187286, https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA
--------------------------------
The following error occurred during the fsstress test:
XFS: Assertion failed: VFS_I(ip)->i_nlink >= 2, file: fs/xfs/xfs_inode.c, line: 2452
The problem is that an inode race condition causes an incorrect i_nlink value to be written to disk and then read back into memory. Consider the following call graph: for an inode that is marked both XFS_IFLUSHING and XFS_IRECLAIMABLE, i_nlink is reset to 1 and only later restored to its original value in xfs_reinit_inode(). Therefore, the i_nlink of the directory on disk may be set to 1.
xfsaild                                 xfs_iget
  xfs_inode_item_push                     xfs_iget_cache_hit
    xfs_iflush_cluster                      xfs_iget_recycle
      xfs_iflush                              xfs_reinit_inode
        xfs_inode_to_disk                       inode_init_always
xfs_reinit_inode() needs to hold the ILOCK_EXCL as it is changing internal inode state and can race with other RCU protected inode lookups. On the read side, xfs_iflush_cluster() grabs the ILOCK_SHARED while under rcu + ip->i_flags_lock, and so xfs_iflush/xfs_inode_to_disk() are protected from racing inode updates (during transactions) by that lock.
Signed-off-by: Long Li leo.lilong@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/xfs_icache.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index a722ac1fd8f6..2039423df384 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -345,6 +345,9 @@ xfs_iget_recycle(
trace_xfs_iget_recycle(ip);
+ if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) + return -EAGAIN; + /* * We need to make it look like the inode is being reclaimed to prevent * the actual reclaim workers from stomping over us while we recycle @@ -358,6 +361,7 @@ xfs_iget_recycle(
ASSERT(!rwsem_is_locked(&inode->i_rwsem)); error = xfs_reinit_inode(mp, inode); + xfs_iunlock(ip, XFS_ILOCK_EXCL); if (error) { /* * Re-initializing the inode failed, and we are in deep @@ -521,6 +525,8 @@ xfs_iget_cache_hit( if (ip->i_flags & XFS_IRECLAIMABLE) { /* Drops i_flags_lock and RCU read lock. */ error = xfs_iget_recycle(pag, ip); + if (error == -EAGAIN) + goto out_skip; if (error) return error; } else {
From: Yixing Liu liuyixing1@huawei.com
driver inclusion category: Bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61FED
-----------------------------------------------------------
With RoCE v1, if the traffic_class value exceeds 63, the following errors appear: modify qp to 2 state failed(22) Failed to create AH
This is because the driver rejects the out-of-spec value when getting the DSCP, but RoCE v1 does not need a DSCP at all, so the driver no longer performs this check for v1.
Fixes: 11ef2ec6aa7c ("RDMA/hns: Support DSCP of userspace") Signed-off-by: Yixing Liu liuyixing1@huawei.com Reviewed-by: Yangyang Li liyangyang20@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/infiniband/hw/hns/hns_roce_ah.c | 9 ++++++--- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 3 ++- 2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_ah.c b/drivers/infiniband/hw/hns/hns_roce_ah.c index cea402b28c44..975f58d9b8f0 100644 --- a/drivers/infiniband/hw/hns/hns_roce_ah.c +++ b/drivers/infiniband/hw/hns/hns_roce_ah.c @@ -61,8 +61,8 @@ int hns_roce_create_ah(struct ib_ah *ibah, struct rdma_ah_init_attr *init_attr, struct hns_roce_dev *hr_dev = to_hr_dev(ibah->device); struct hns_roce_ib_create_ah_resp resp = {}; struct hns_roce_ah *ah = to_hr_ah(ibah); - u8 priority; - u8 tc_mode; + u8 priority = 0; + u8 tc_mode = 0; int ret;
if (hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08 && udata) @@ -81,7 +81,10 @@ int hns_roce_create_ah(struct ib_ah *ibah, struct rdma_ah_init_attr *init_attr,
ret = hr_dev->hw->get_dscp(hr_dev, get_tclass(grh), &tc_mode, &priority); - if (ret && ret != -EOPNOTSUPP) + if (ret == -EOPNOTSUPP) + ret = 0; + + if (ret && grh->sgid_attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) return ret;
if (tc_mode == HNAE3_TC_MAP_MODE_DSCP && diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c index 4fb82685cf54..12f124a1de04 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -5038,7 +5038,8 @@ static int hns_roce_set_sl(struct ib_qp *ibqp,
ret = hns_roce_hw_v2_get_dscp(hr_dev, get_tclass(&attr->ah_attr.grh), &hr_qp->tc_mode, &hr_qp->priority); - if (ret && ret != -EOPNOTSUPP) { + if (ret && ret != -EOPNOTSUPP && + grh->sgid_attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) { ibdev_err(ibdev, "failed to get dscp, ret = %d.\n", ret); return ret; }
From: Yixing Liu liuyixing1@huawei.com
driver inclusion category: Bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61F1Q
--------------------------------------------------------------
When running RoCE v1 traffic on the FPGA, the following error occurs: hns3 0000:35:00.0 hns_0: local work queue 0x2 catast error, sub_event type is: 4
This is because, for RoCE v1, the SL that the driver reports back to userspace after setting the DSCP is incorrect, which makes the SL in the doorbell (DB) inconsistent with the SL in the QPC and results in an SL error on the hardware.
Fixes: 11ef2ec6aa7c ("RDMA/hns: Support DSCP of userspace") Signed-off-by: Yixing Liu liuyixing1@huawei.com Reviewed-by: Yangyang Li liyangyang20@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/infiniband/hw/hns/hns_roce_qp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c index cbe7d37430c7..df5bebc5e1c1 100644 --- a/drivers/infiniband/hw/hns/hns_roce_qp.c +++ b/drivers/infiniband/hw/hns/hns_roce_qp.c @@ -1384,7 +1384,7 @@ int hns_roce_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
if (udata && udata->outlen) { resp.tc_mode = hr_qp->tc_mode; - resp.priority = hr_qp->priority; + resp.priority = hr_qp->sl; ret = ib_copy_to_udata(udata, &resp, min(udata->outlen, sizeof(resp))); if (ret)
From: Wenpeng Liang liangwenpeng@huawei.com
driver inclusion category: Bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61RNU
----------------------------------------------------------
When using urt to run the open XRC QP case, the following error occurs: Create qp failed.
This is because the driver does not set the corresponding uverbs command flag, so the related ioctl() or syscall() fails to execute. Add the open XRC QP command flag to fix it.
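For context, the user-space verb that relies on this bit is ibv_open_qp(); a hypothetical libibverbs call (the ctx, xrcd and remote_qpn variables are illustrative assumptions, not part of this patch) looks like this:

	struct ibv_qp_open_attr attr = {
		.comp_mask  = IBV_QP_OPEN_ATTR_NUM | IBV_QP_OPEN_ATTR_XRCD |
			      IBV_QP_OPEN_ATTR_TYPE | IBV_QP_OPEN_ATTR_CONTEXT,
		.qp_num     = remote_qpn,
		.xrcd       = xrcd,
		.qp_type    = IBV_QPT_XRC_RECV,
		.qp_context = NULL,
	};
	struct ibv_qp *qp = ibv_open_qp(ctx, &attr);
	/* Without IB_USER_VERBS_CMD_OPEN_QP in uverbs_cmd_mask the kernel
	 * rejects the underlying command and this call fails (the
	 * "Create qp failed" error above). */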
Fixes: ae394640bc89 ("RDMA/hns: Add support for XRC on HIP09") Signed-off-by: Wenpeng Liang liangwenpeng@huawei.com Reviewed-by: Yangyang Li liyangyang20@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/infiniband/hw/hns/hns_roce_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c index 43ab590f9aac..e3b188b2bb4c 100644 --- a/drivers/infiniband/hw/hns/hns_roce_main.c +++ b/drivers/infiniband/hw/hns/hns_roce_main.c @@ -708,7 +708,8 @@ static int hns_roce_register_device(struct hns_roce_dev *hr_dev) ib_dev->uverbs_cmd_mask |= (1ULL << IB_USER_VERBS_CMD_OPEN_XRCD) | (1ULL << IB_USER_VERBS_CMD_CLOSE_XRCD) | - (1ULL << IB_USER_VERBS_CMD_CREATE_XSRQ); + (1ULL << IB_USER_VERBS_CMD_CREATE_XSRQ) | + (1ULL << IB_USER_VERBS_CMD_OPEN_QP); ib_set_device_ops(ib_dev, &hns_roce_dev_xrcd_ops); }
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
stable inclusion from stable-v5.10.154 commit 6b6f94fb9a74dd2891f11de4e638c6202bc89476 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5ZNPH?from=project-issue CVE: CVE-2022-42896
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------
commit 711f8c3fb3db61897080468586b970c87c61d9e4 upstream.
The Bluetooth spec states that the valid range for SPSM is 0x0001-0x00ff, so it is invalid to accept values outside of this range:
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A page 1059: Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
CVE: CVE-2022-42896 CC: stable@vger.kernel.org Reported-by: Tamás Koczka poprdi@google.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Reviewed-by: Tedd Ho-Jeong An tedd.an@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- net/bluetooth/l2cap_core.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index d35b168e7793..bbba3beffcd3 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -5800,6 +5800,19 @@ static int l2cap_le_connect_req(struct l2cap_conn *conn, BT_DBG("psm 0x%2.2x scid 0x%4.4x mtu %u mps %u", __le16_to_cpu(psm), scid, mtu, mps);
+ /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A + * page 1059: + * + * Valid range: 0x0001-0x00ff + * + * Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges + */ + if (!psm || __le16_to_cpu(psm) > L2CAP_PSM_LE_DYN_END) { + result = L2CAP_CR_LE_BAD_PSM; + chan = NULL; + goto response; + } + /* Check if we have socket listening on psm */ pchan = l2cap_global_chan_by_psm(BT_LISTEN, psm, &conn->hcon->src, &conn->hcon->dst, LE_LINK); @@ -5980,6 +5993,18 @@ static inline int l2cap_ecred_conn_req(struct l2cap_conn *conn,
psm = req->psm;
+ /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A + * page 1059: + * + * Valid range: 0x0001-0x00ff + * + * Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges + */ + if (!psm || __le16_to_cpu(psm) > L2CAP_PSM_LE_DYN_END) { + result = L2CAP_CR_LE_BAD_PSM; + goto response; + } + BT_DBG("psm 0x%2.2x mtu %u mps %u", __le16_to_cpu(psm), mtu, mps);
memset(&pdu, 0, sizeof(pdu));