Kernel

kernel@openeuler.org

July 2021

  • 15 participants
  • 107 discussions
[PATCH kernel-4.19 1/3] ext4: fix WARN_ON_ONCE(!buffer_uptodate) after an error writing the superblock
by Yang Yingliang 19 Jul '21

From: Ye Bin <yebin10(a)huawei.com>

mainline inclusion
from mainline-v5.14-rc1
commit 558d6450c7755aa005d89021204b6cdcae5e848f
category: bugfix
bugzilla: 109280
CVE: NA

-----------------------------------------------

If a writeback of the superblock fails with an I/O error, the buffer is marked not uptodate. However, this can cause a WARN_ON to trigger when we attempt to write the superblock a second time. (Which might succeed this time, for certain types of block devices such as iSCSI devices over a flaky network.)

Try to detect this case in flush_stashed_error_work(), and also change __ext4_handle_dirty_metadata() so we always set the uptodate flag, not just in the nojournal case.

Before this commit, this problem can be replicated via:

1. dmsetup create dust1 --table '0 2097152 dust /dev/sdc 0 4096'
2. mount /dev/mapper/dust1 /home/test
3. dmsetup message dust1 0 addbadblock 0 10
4. cd /home/test
5. echo "XXXXXXX" > t

After a few seconds, we got the following warning:

[ 80.654487] end_buffer_async_write: bh=0xffff88842f18bdd0
[ 80.656134] Buffer I/O error on dev dm-0, logical block 0, lost async page write
[ 85.774450] EXT4-fs error (device dm-0): ext4_check_bdev_write_error:193: comm kworker/u16:8: Error while async write back metadata
[ 91.415513] mark_buffer_dirty: bh=0xffff88842f18bdd0
[ 91.417038] ------------[ cut here ]------------
[ 91.418450] WARNING: CPU: 1 PID: 1944 at fs/buffer.c:1092 mark_buffer_dirty.cold+0x1c/0x5e
[ 91.440322] Call Trace:
[ 91.440652] __jbd2_journal_temp_unlink_buffer+0x135/0x220
[ 91.441354] __jbd2_journal_unfile_buffer+0x24/0x90
[ 91.441981] __jbd2_journal_refile_buffer+0x134/0x1d0
[ 91.442628] jbd2_journal_commit_transaction+0x249a/0x3240
[ 91.443336] ? put_prev_entity+0x2a/0x200
[ 91.443856] ? kjournald2+0x12e/0x510
[ 91.444324] kjournald2+0x12e/0x510
[ 91.444773] ? woken_wake_function+0x30/0x30
[ 91.445326] kthread+0x150/0x1b0
[ 91.445739] ? commit_timeout+0x20/0x20
[ 91.446258] ? kthread_flush_worker+0xb0/0xb0
[ 91.446818] ret_from_fork+0x1f/0x30
[ 91.447293] ---[ end trace 66f0b6bf3d1abade ]---

Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Link: https://lore.kernel.org/r/20210615090537.3423231-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
 fs/ext4/ext4_jbd2.c | 2 +-
 fs/ext4/super.c | 12 ++++++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index fd7c41da1f8f9..022ab8afd9499 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -304,6 +304,7 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line,
 set_buffer_meta(bh);
 set_buffer_prio(bh);
+ set_buffer_uptodate(bh);
 if (ext4_handle_valid(handle)) {
 err = jbd2_journal_dirty_metadata(handle, bh);
 /* Errors can only happen due to aborted journal or a nasty bug */
@@ -332,7 +333,6 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line,
 err);
 }
 } else {
- set_buffer_uptodate(bh);
 if (inode)
 mark_buffer_dirty_inode(bh, inode);
 else
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index fd67f3037fb8e..dd96627ec6ae4 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -695,15 +695,23 @@ static void flush_stashed_error_work(struct work_struct *work)
 * ext4 error handling code during handling of previous errors.
 */
 if (!sb_rdonly(sbi->s_sb) && journal) {
+ struct buffer_head *sbh = sbi->s_sbh;
 handle = jbd2_journal_start(journal, 1);
 if (IS_ERR(handle))
 goto write_directly;
- if (jbd2_journal_get_write_access(handle, sbi->s_sbh)) {
+ if (jbd2_journal_get_write_access(handle, sbh)) {
 jbd2_journal_stop(handle);
 goto write_directly;
 }
 ext4_update_super(sbi->s_sb);
- if (jbd2_journal_dirty_metadata(handle, sbi->s_sbh)) {
+ if (buffer_write_io_error(sbh) || !buffer_uptodate(sbh)) {
+ ext4_msg(sbi->s_sb, KERN_ERR, "previous I/O error to "
+ "superblock detected");
+ clear_buffer_write_io_error(sbh);
+ set_buffer_uptodate(sbh);
+ }
+
+ if (jbd2_journal_dirty_metadata(handle, sbh)) {
 jbd2_journal_stop(handle);
 goto write_directly;
 }
--
2.25.1
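A note on the recovery pattern above: once an async write of the superblock's buffer_head fails, the buffer carries the write_io_error flag and loses its uptodate bit, and a later mark_buffer_dirty() on it warns. The fix re-arms the buffer before redirtying it. A minimal sketch of that check-and-reset step in isolation (hypothetical helper name, not part of the patch):

static void reset_buffer_after_write_error(struct buffer_head *bh)
{
	/* A failed async write leaves the buffer !uptodate with
	 * write_io_error set; clear the stale error state so the
	 * buffer can be dirtied and written again without tripping
	 * the WARN_ON in mark_buffer_dirty(). */
	if (buffer_write_io_error(bh) || !buffer_uptodate(bh)) {
		clear_buffer_write_io_error(bh);
		set_buffer_uptodate(bh);
	}
}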
[PATCH kernel-4.19] arm64/config: Set CONFIG_TXGBE=m by default
by Yang Yingliang 19 Jul '21

From: zhenpengzheng <zhenpengzheng(a)net-swift.com>

hulk inclusion
category: feature
bugzilla: 50777
CVE: NA

-------------------------------------------------------------------------

Ensure the netswift 10G NIC driver ko can be distributed in ISO on X86.

Signed-off-by: zhenpengzheng <zhenpengzheng(a)net-swift.com>
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
 arch/arm64/configs/openeuler_defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 1e1f7f3201028..8cfe2c3cd7dde 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -2489,6 +2489,8 @@ CONFIG_HNS3_CAE=m
 CONFIG_NET_VENDOR_HUAWEI=y
 CONFIG_HINIC=m
 CONFIG_BMA=m
+CONFIG_NET_VENDOR_NETSWIFT=y
+CONFIG_TXGBE=m
 # CONFIG_NET_VENDOR_I825XX is not set
 CONFIG_NET_VENDOR_INTEL=y
 # CONFIG_E100 is not set
--
2.25.1
[PATCH openEuler-1.0-LTS 1/4] Add traffic policy for low cache available.
by Yang Yingliang 19 Jul '21

From: Xu Wei <xuwei56(a)huawei.com>

euleros inclusion
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=327
CVE: NA

When cache available is low, bcache turns to writethrough mode, so all write IO goes directly to the backend device, which is usually an HDD. At the same time, the cache device is flushing dirty data to that backend device as part of the bcache writeback process, so user write IO damages the sequentiality of writeback, and when there is a lot of writeback IO, user write IO may be blocked. This patch adds a traffic policy to bcache that solves the problem and improves bcache performance when cache available is low.

Signed-off-by: qinghaixiang <xuweiqhx(a)163.com>
Signed-off-by: Xu Wei <xuwei56(a)huawei.com>
Acked-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Reviewed-by: Li Ruilin <liruilin4(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
 drivers/md/bcache/bcache.h | 49 ++++++++++++
 drivers/md/bcache/btree.h | 6 +-
 drivers/md/bcache/request.c | 143 +++++++++++++++++++++++++++++++++-
 drivers/md/bcache/request.h | 2 +
 drivers/md/bcache/super.c | 35 +++++++++
 drivers/md/bcache/sysfs.c | 56 +++++++++++++
 drivers/md/bcache/writeback.c | 11 ++-
 drivers/md/bcache/writeback.h | 6 +-
 8 files changed, 300 insertions(+), 8 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 99d12fce876b2..70fbde8ca70c9 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -399,6 +399,28 @@ struct cached_dev {
 unsigned int offline_seconds;

 char backing_dev_name[BDEVNAME_SIZE];
+
+ /* Count the front and writeback io bandwidth per second */
+ atomic_t writeback_sector_size;
+ atomic_t writeback_io_num;
+ atomic_t front_io_num;
+ unsigned int writeback_sector_size_per_sec;
+ unsigned int writeback_io_num_per_sec;
+ unsigned int front_io_num_per_sec;
+ struct timer_list io_stat_timer;
+
+ unsigned int writeback_state;
+#define WRITEBACK_DEFAULT 0
+#define WRITEBACK_QUICK 1
+#define WRITEBACK_SLOW 2
+
+ /* realize for token bucket */
+ spinlock_t token_lock;
+ unsigned int max_sector_size;
+ unsigned int max_io_num;
+ unsigned int write_token_sector_size;
+ unsigned int write_token_io_num;
+ struct timer_list token_assign_timer;
 };

 enum alloc_reserve {
@@ -717,6 +739,10 @@ struct cache_set {
 #define BUCKET_HASH_BITS 12
 struct hlist_head bucket_hash[1 << BUCKET_HASH_BITS];
+ unsigned int cutoff_writeback_sync;
+ bool traffic_policy_start;
+ bool force_write_through;
+ unsigned int gc_sectors;
 };

 struct bbio {
@@ -732,6 +758,29 @@ struct bbio {
 struct bio bio;
 };

+struct get_bcache_status {
+ unsigned int writeback_sector_size_per_sec;
+ unsigned int writeback_io_num_per_sec;
+ unsigned int front_io_num_per_sec;
+ uint64_t dirty_rate;
+ unsigned int available;
+};
+
+struct set_bcache_status {
+ unsigned int write_token_sector_size;
+ unsigned int write_token_io_num;
+ bool traffic_policy_start;
+ bool force_write_through;
+ bool copy_gc_enabled;
+ bool trigger_gc;
+ unsigned int writeback_state;
+ unsigned int gc_sectors;
+ unsigned int cutoff_writeback_sync;
+};
+#define BCACHE_MAJOR 'B'
+#define BCACHE_GET_WRITE_STATUS _IOR(BCACHE_MAJOR, 0x0, struct get_bcache_status)
+#define BCACHE_SET_WRITE_STATUS _IOW(BCACHE_MAJOR, 0x1, struct set_bcache_status)
+
 #define BTREE_PRIO USHRT_MAX
 #define INITIAL_PRIO 32768U
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index 4d0cca145f699..7ddadcc485ea6 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -193,7 +193,11 @@ static inline unsigned int bset_block_offset(struct btree *b, struct bset *i)

 static inline void set_gc_sectors(struct cache_set *c)
 {
- atomic_set(&c->sectors_to_gc, c->sb.bucket_size * c->nbuckets / 16);
+ if (c->gc_sectors == 0)
+ atomic_set(&c->sectors_to_gc,
+ c->sb.bucket_size * c->nbuckets / 16);
+ else
+ atomic_set(&c->sectors_to_gc, c->gc_sectors);
 }

 void bkey_put(struct cache_set *c, struct bkey *k);
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 6d89e56a4a410..c05544e07722e 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -28,6 +28,7 @@
 struct kmem_cache *bch_search_cache;

 static void bch_data_insert_start(struct closure *cl);
+static void alloc_token(struct cached_dev *dc, unsigned int sectors);

 static unsigned int cache_mode(struct cached_dev *dc)
 {
@@ -396,7 +397,8 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
 goto skip;

 if (mode == CACHE_MODE_NONE ||
- (mode == CACHE_MODE_WRITEAROUND &&
+ ((mode == CACHE_MODE_WRITEAROUND ||
+ c->force_write_through == true) &&
 op_is_write(bio_op(bio))))
 goto skip;
@@ -858,6 +860,10 @@ static void cached_dev_read_done(struct closure *cl)
 if (s->iop.bio && (!dc->read_bypass || s->prefetch) &&
 !test_bit(CACHE_SET_STOPPING, &s->iop.c->flags)) {
 BUG_ON(!s->iop.replace);
+ if ((dc->disk.c->traffic_policy_start == true) &&
+ (dc->disk.c->force_write_through != true)) {
+ alloc_token(dc, bio_sectors(s->iop.bio));
+ }
 closure_call(&s->iop.cl, bch_data_insert, NULL, cl);
 }
@@ -1000,6 +1006,35 @@ static void cached_dev_write_complete(struct closure *cl)
 continue_at(cl, cached_dev_bio_complete, NULL);
 }

+static void alloc_token(struct cached_dev *dc, unsigned int sectors)
+{
+ int count = 0;
+
+ spin_lock_bh(&dc->token_lock);
+
+ while ((dc->write_token_sector_size < sectors) &&
+ (dc->write_token_io_num == 0)) {
+ spin_unlock_bh(&dc->token_lock);
+ schedule_timeout_interruptible(msecs_to_jiffies(10));
+ count++;
+ if ((dc->disk.c->traffic_policy_start != true) ||
+ (cache_mode(dc) != CACHE_MODE_WRITEBACK) ||
+ (count > 100))
+ return;
+ spin_lock_bh(&dc->token_lock);
+ }
+
+ if (dc->write_token_sector_size >= sectors)
+ dc->write_token_sector_size -= sectors;
+ else
+ dc->write_token_sector_size = 0;
+
+ if (dc->write_token_io_num > 0)
+ dc->write_token_io_num--;
+
+ spin_unlock_bh(&dc->token_lock);
+}
+
 static void cached_dev_write(struct cached_dev *dc, struct search *s)
 {
 struct closure *cl = &s->cl;
@@ -1247,6 +1282,7 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
 cached_dev_nodata,
 bcache_wq);
 } else {
+ atomic_inc(&dc->front_io_num);
 s->iop.bypass = check_should_bypass(dc, bio);

 if (!s->iop.bypass && bio->bi_iter.bi_size && !rw) {
@@ -1258,10 +1294,17 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
 save_circ_item(&s->smp);
 }

- if (rw)
+ if (rw) {
+ if ((s->iop.bypass == false) &&
+ (dc->disk.c->traffic_policy_start == true) &&
+ (cache_mode(dc) == CACHE_MODE_WRITEBACK) &&
+ (bio_op(bio) != REQ_OP_DISCARD)) {
+ alloc_token(dc, bio_sectors(bio));
+ }
 cached_dev_write(dc, s);
- else
+ } else {
 cached_dev_read(dc, s);
+ }
 }
 } else
 /* I/O request sent to backing device */
@@ -1270,6 +1313,65 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
 return BLK_QC_T_NONE;
 }

+static int bcache_get_write_status(struct cached_dev *dc, unsigned long arg)
+{
+ struct get_bcache_status a;
+ uint64_t cache_sectors;
+ struct cache_set *c = dc->disk.c;
+
+ if (c == NULL)
+ return -ENODEV;
+
+ a.writeback_sector_size_per_sec = dc->writeback_sector_size_per_sec;
+ a.writeback_io_num_per_sec = dc->writeback_io_num_per_sec;
+ a.front_io_num_per_sec = dc->front_io_num_per_sec;
+ cache_sectors = c->nbuckets * c->sb.bucket_size -
+ atomic_long_read(&c->flash_dev_dirty_sectors);
+ a.dirty_rate = div64_u64(bcache_dev_sectors_dirty(&dc->disk) * 100,
+ cache_sectors);
+ a.available = 100 - c->gc_stats.in_use;
+ if (copy_to_user((struct get_bcache_status *)arg, &a,
+ sizeof(struct get_bcache_status)))
+ return -EFAULT;
+ return 0;
+}
+
+static int bcache_set_write_status(struct cached_dev *dc, unsigned long arg)
+{
+ struct set_bcache_status a;
+ struct cache_set *c = dc->disk.c;
+
+ if (c == NULL)
+ return -ENODEV;
+ if (copy_from_user(&a, (struct set_bcache_status *)arg,
+ sizeof(struct set_bcache_status)))
+ return -EFAULT;
+
+ if (c->traffic_policy_start != a.traffic_policy_start)
+ pr_info("%s traffic policy %s", dc->disk.disk->disk_name,
+ (a.traffic_policy_start == true) ? "enable" : "disable");
+ if (c->force_write_through != a.force_write_through)
+ pr_info("%s force write through %s", dc->disk.disk->disk_name,
+ (a.force_write_through == true) ? "enable" : "disable");
+ if (a.trigger_gc) {
+ pr_info("trigger %s gc", dc->disk.disk->disk_name);
+ atomic_set(&c->sectors_to_gc, -1);
+ wake_up_gc(c);
+ }
+ if ((a.cutoff_writeback_sync >= MIN_CUTOFF_WRITEBACK_SYNC) &&
+ (a.cutoff_writeback_sync <= MAX_CUTOFF_WRITEBACK_SYNC)) {
+ c->cutoff_writeback_sync = a.cutoff_writeback_sync;
+ }
+
+ dc->max_sector_size = a.write_token_sector_size;
+ dc->max_io_num = a.write_token_io_num;
+ c->traffic_policy_start = a.traffic_policy_start;
+ c->force_write_through = a.force_write_through;
+ c->gc_sectors = a.gc_sectors;
+ dc->writeback_state = a.writeback_state;
+ return 0;
+}
+
 static int cached_dev_ioctl(struct bcache_device *d, fmode_t mode,
 unsigned int cmd, unsigned long arg)
 {
@@ -1278,7 +1380,14 @@ static int cached_dev_ioctl(struct bcache_device *d, fmode_t mode,
 if (dc->io_disable)
 return -EIO;

- return __blkdev_driver_ioctl(dc->bdev, mode, cmd, arg);
+ switch (cmd) {
+ case BCACHE_GET_WRITE_STATUS:
+ return bcache_get_write_status(dc, arg);
+ case BCACHE_SET_WRITE_STATUS:
+ return bcache_set_write_status(dc, arg);
+ default:
+ return __blkdev_driver_ioctl(dc->bdev, mode, cmd, arg);
+ }
 }

 static int cached_dev_congested(void *data, int bits)
@@ -1438,3 +1547,29 @@ int __init bch_request_init(void)

 return 0;
 }
+
+static void token_assign(struct timer_list *t)
+{
+ struct cached_dev *dc = from_timer(dc, t, token_assign_timer);
+
+ dc->token_assign_timer.expires = jiffies + HZ / 8;
+ add_timer(&dc->token_assign_timer);
+
+ spin_lock(&dc->token_lock);
+ dc->write_token_sector_size = dc->max_sector_size / 8;
+ dc->write_token_io_num = dc->max_io_num / 8;
+ dc->write_token_io_num =
+ (dc->write_token_io_num == 0) ? 1 : dc->write_token_io_num;
+ spin_unlock(&dc->token_lock);
+}
+
+void bch_traffic_policy_init(struct cached_dev *dc)
+{
+ spin_lock_init(&dc->token_lock);
+ dc->write_token_sector_size = 0;
+ dc->write_token_io_num = 0;
+
+ timer_setup(&dc->token_assign_timer, token_assign, 0);
+ dc->token_assign_timer.expires = jiffies + HZ / 8;
+ add_timer(&dc->token_assign_timer);
+}
diff --git a/drivers/md/bcache/request.h b/drivers/md/bcache/request.h
index 3667bc5390dfe..f677ba8704940 100644
--- a/drivers/md/bcache/request.h
+++ b/drivers/md/bcache/request.h
@@ -41,6 +41,8 @@ void bch_data_insert(struct closure *cl);
 void bch_cached_dev_request_init(struct cached_dev *dc);
 void bch_flash_dev_request_init(struct bcache_device *d);

+void bch_traffic_policy_init(struct cached_dev *dc);
+
 extern struct kmem_cache *bch_search_cache, *bch_passthrough_cache;

 struct search {
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e7f7a0f038682..3f858de9e9602 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1210,6 +1210,8 @@ static void cached_dev_free(struct closure *cl)
 {
 struct cached_dev *dc = container_of(cl, struct cached_dev, disk.cl);

+ del_timer_sync(&dc->io_stat_timer);
+ del_timer_sync(&dc->token_assign_timer);
 if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
 cancel_writeback_rate_update_dwork(dc);
@@ -1250,6 +1252,36 @@ static void cached_dev_flush(struct closure *cl)
 continue_at(cl, cached_dev_free, system_wq);
 }

+static void cached_dev_io_stat(struct timer_list *t)
+{
+ struct cached_dev *dc = from_timer(dc, t, io_stat_timer);
+
+ dc->io_stat_timer.expires = jiffies + HZ;
+ add_timer(&dc->io_stat_timer);
+
+ dc->writeback_sector_size_per_sec =
+ atomic_read(&dc->writeback_sector_size);
+ dc->writeback_io_num_per_sec = atomic_read(&dc->writeback_io_num);
+ dc->front_io_num_per_sec = atomic_read(&dc->front_io_num);
+ atomic_set(&dc->writeback_sector_size, 0);
+ atomic_set(&dc->writeback_io_num, 0);
+ atomic_set(&dc->front_io_num, 0);
+}
+
+static void cached_dev_timer_init(struct cached_dev *dc)
+{
+ dc->writeback_sector_size_per_sec = 0;
+ dc->writeback_io_num_per_sec = 0;
+ dc->front_io_num_per_sec = 0;
+ atomic_set(&dc->writeback_sector_size, 0);
+ atomic_set(&dc->writeback_io_num, 0);
+ atomic_set(&dc->front_io_num, 0);
+
+ timer_setup(&dc->io_stat_timer, cached_dev_io_stat, 0);
+ dc->io_stat_timer.expires = jiffies + HZ;
+ add_timer(&dc->io_stat_timer);
+}
+
 static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
 {
 int ret;
@@ -1266,6 +1298,8 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
 INIT_LIST_HEAD(&dc->io_lru);
 spin_lock_init(&dc->io_lock);
 bch_cache_accounting_init(&dc->accounting, &dc->disk.cl);
+ cached_dev_timer_init(dc);
+ bch_traffic_policy_init(dc);

 dc->sequential_cutoff = 4 << 20;
 dc->inflight_block_enable = 1;
@@ -1774,6 +1808,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 c->congested_read_threshold_us = 2000;
 c->congested_write_threshold_us = 20000;
 c->error_limit = DEFAULT_IO_ERROR_LIMIT;
+ c->cutoff_writeback_sync = MIN_CUTOFF_WRITEBACK_SYNC;

 WARN_ON(test_and_clear_bit(CACHE_SET_IO_DISABLE, &c->flags));

 return c;
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 706d3a245dba6..4c693ac29b0e0 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -51,6 +51,13 @@ static const char * const error_actions[] = {
 NULL
 };

+static const char * const writeback_state[] = {
+ "default",
+ "quick",
+ "slow",
+ NULL
+};
+
 write_attribute(attach);
 write_attribute(detach);
 write_attribute(unregister);
@@ -96,6 +103,9 @@ read_attribute(io_errors);
 read_attribute(congested);
 rw_attribute(congested_read_threshold_us);
 rw_attribute(congested_write_threshold_us);
+rw_attribute(gc_sectors);
+rw_attribute(traffic_policy_start);
+rw_attribute(force_write_through);

 rw_attribute(sequential_cutoff);
 rw_attribute(read_bypass);
@@ -114,7 +124,13 @@ rw_attribute(writeback_rate_update_seconds);
 rw_attribute(writeback_rate_i_term_inverse);
 rw_attribute(writeback_rate_p_term_inverse);
 rw_attribute(writeback_rate_minimum);
+rw_attribute(writeback_state);
+read_attribute(writeback_sector_size_per_sec);
+read_attribute(writeback_io_num_per_sec);
+read_attribute(front_io_num_per_sec);
 read_attribute(writeback_rate_debug);
+read_attribute(write_token_sector_size);
+read_attribute(write_token_io_num);

 read_attribute(stripe_size);
 read_attribute(partial_stripes_expensive);
@@ -169,6 +185,11 @@ SHOW(__bch_cached_dev)
 bch_cache_modes,
 BDEV_CACHE_MODE(&dc->sb));

+ if (attr == &sysfs_writeback_state)
+ return bch_snprint_string_list(buf, PAGE_SIZE,
+ writeback_state,
+ dc->writeback_state);
+
 if (attr == &sysfs_readahead_cache_policy)
 return bch_snprint_string_list(buf, PAGE_SIZE,
 bch_reada_cache_policies,
@@ -186,6 +207,9 @@ SHOW(__bch_cached_dev)
 var_printf(writeback_metadata, "%i");
 var_printf(writeback_running, "%i");
 var_print(writeback_delay);
+ var_print(writeback_sector_size_per_sec);
+ var_print(writeback_io_num_per_sec);
+ var_print(front_io_num_per_sec);
 var_print(writeback_percent);
 sysfs_hprint(writeback_rate,
 wb ? atomic_long_read(&dc->writeback_rate.rate) << 9 : 0);
@@ -248,6 +272,8 @@ SHOW(__bch_cached_dev)
 sysfs_print(running, atomic_read(&dc->running));
 sysfs_print(state, states[BDEV_STATE(&dc->sb)]);
+ var_print(write_token_sector_size);
+ var_print(write_token_io_num);

 if (attr == &sysfs_label) {
 memcpy(buf, dc->sb.label, SB_LABEL_SIZE);
@@ -346,6 +372,15 @@ STORE(__cached_dev)
 }
 }

+ if (attr == &sysfs_writeback_state) {
+ v = __sysfs_match_string(writeback_state, -1, buf);
+
+ if (v < 0)
+ return v;
+
+ dc->writeback_state = v;
+ }
+
 if (attr == &sysfs_readahead_cache_policy) {
 v = __sysfs_match_string(bch_reada_cache_policies, -1, buf);
 if (v < 0)
@@ -448,11 +483,14 @@ static struct attribute *bch_cached_dev_files[] = {
 &sysfs_data_csum,
 #endif
 &sysfs_cache_mode,
+ &sysfs_writeback_state,
 &sysfs_readahead_cache_policy,
 &sysfs_stop_when_cache_set_failed,
 &sysfs_writeback_metadata,
 &sysfs_writeback_running,
 &sysfs_writeback_delay,
+ &sysfs_writeback_sector_size_per_sec,
+ &sysfs_writeback_io_num_per_sec,
 &sysfs_writeback_percent,
 &sysfs_writeback_rate,
 &sysfs_writeback_rate_update_seconds,
@@ -460,6 +498,9 @@ static struct attribute *bch_cached_dev_files[] = {
 &sysfs_writeback_rate_p_term_inverse,
 &sysfs_writeback_rate_minimum,
 &sysfs_writeback_rate_debug,
+ &sysfs_write_token_sector_size,
+ &sysfs_write_token_io_num,
+ &sysfs_front_io_num_per_sec,
 &sysfs_io_errors,
 &sysfs_io_error_limit,
 &sysfs_io_disable,
@@ -714,6 +755,12 @@ SHOW(__bch_cache_set)
 c->congested_read_threshold_us);
 sysfs_print(congested_write_threshold_us,
 c->congested_write_threshold_us);
+ sysfs_print(gc_sectors,
+ c->gc_sectors);
+ sysfs_print(traffic_policy_start,
+ c->traffic_policy_start);
+ sysfs_print(force_write_through,
+ c->force_write_through);

 sysfs_print(active_journal_entries, fifo_used(&c->journal.pin));
 sysfs_printf(verify, "%i", c->verify);
@@ -800,6 +847,12 @@ STORE(__bch_cache_set)
 c->congested_read_threshold_us);
 sysfs_strtoul(congested_write_threshold_us,
 c->congested_write_threshold_us);
+ sysfs_strtoul(gc_sectors,
+ c->gc_sectors);
+ sysfs_strtoul(traffic_policy_start,
+ c->traffic_policy_start);
+ sysfs_strtoul(force_write_through,
+ c->force_write_through);

 if (attr == &sysfs_errors) {
 v = __sysfs_match_string(error_actions, -1, buf);
@@ -926,6 +979,9 @@ static struct attribute *bch_cache_set_internal_files[] = {
 &sysfs_btree_shrinker_disabled,
 &sysfs_copy_gc_enabled,
 &sysfs_io_disable,
+ &sysfs_gc_sectors,
+ &sysfs_traffic_policy_start,
+ &sysfs_force_write_through,
 NULL
 };
 KTYPE(bch_cache_set_internal);
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index b5fc3c6c7178e..901ad8bae7614 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -222,7 +222,13 @@ static unsigned int writeback_delay(struct cached_dev *dc,
 !dc->writeback_percent)
 return 0;

- return bch_next_delay(&dc->writeback_rate, sectors);
+ if (dc->writeback_state == WRITEBACK_DEFAULT) {
+ return bch_next_delay(&dc->writeback_rate, sectors);
+ } else if (dc->writeback_state == WRITEBACK_QUICK) {
+ return 0;
+ } else {
+ return msecs_to_jiffies(1000);
+ }
 }

 struct dirty_io {
@@ -287,6 +293,9 @@ static void write_dirty_finish(struct closure *cl)
 : &dc->disk.c->writeback_keys_done);
 }

+ atomic_add(KEY_SIZE(&w->key), &dc->writeback_sector_size);
+ atomic_inc(&dc->writeback_io_num);
+
 bch_keybuf_del(&dc->writeback_keys, w);
 up(&dc->in_flight);
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index e75dc33339f6f..a3151c0e96609 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -3,7 +3,8 @@
 #define _BCACHE_WRITEBACK_H

 #define CUTOFF_WRITEBACK 40
-#define CUTOFF_WRITEBACK_SYNC 70
+#define MIN_CUTOFF_WRITEBACK_SYNC 70
+#define MAX_CUTOFF_WRITEBACK_SYNC 90

 #define MAX_WRITEBACKS_IN_PASS 5
 #define MAX_WRITESIZE_IN_PASS 5000 /* *512b */
@@ -57,10 +58,11 @@ static inline bool should_writeback(struct cached_dev *dc, struct bio *bio,
 unsigned int cache_mode, bool would_skip)
 {
 unsigned int in_use = dc->disk.c->gc_stats.in_use;
+ unsigned int cutoff = dc->disk.c->cutoff_writeback_sync;

 if (cache_mode != CACHE_MODE_WRITEBACK ||
 test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) ||
- in_use > CUTOFF_WRITEBACK_SYNC)
+ in_use > cutoff)
 return false;

 if (bio_op(bio) == REQ_OP_DISCARD)
--
2.25.1
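For consumers of the new interface: BCACHE_GET_WRITE_STATUS and BCACHE_SET_WRITE_STATUS are issued against the cached device node through the ioctl hook added above. A minimal user-space sketch, assuming the struct layout and command numbers are copied from the bcache.h additions (the /dev/bcache0 path is an example):

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>

/* Copied from the definitions this patch adds to bcache.h. */
struct get_bcache_status {
    unsigned int writeback_sector_size_per_sec;
    unsigned int writeback_io_num_per_sec;
    unsigned int front_io_num_per_sec;
    uint64_t dirty_rate;
    unsigned int available;
};
#define BCACHE_MAJOR 'B'
#define BCACHE_GET_WRITE_STATUS _IOR(BCACHE_MAJOR, 0x0, struct get_bcache_status)

int main(void)
{
    struct get_bcache_status st;
    int fd = open("/dev/bcache0", O_RDONLY);   /* example device path */

    if (fd < 0 || ioctl(fd, BCACHE_GET_WRITE_STATUS, &st) < 0) {
        perror("BCACHE_GET_WRITE_STATUS");
        return 1;
    }
    printf("writeback: %u sectors/s in %u IOs/s, front: %u IOs/s, "
           "dirty: %llu%%, available: %u%%\n",
           st.writeback_sector_size_per_sec, st.writeback_io_num_per_sec,
           st.front_io_num_per_sec,
           (unsigned long long)st.dirty_rate, st.available);
    close(fd);
    return 0;
}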
[PATCH kernel-4.19 1/4] Add traffic policy for low cache available.
by Yang Yingliang 19 Jul '21

From: Xu Wei <xuwei56(a)huawei.com>

euleros inclusion
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=327
CVE: NA

When cache available is low, bcache turns to writethrough mode, so all write IO goes directly to the backend device, which is usually an HDD. At the same time, the cache device is flushing dirty data to that backend device as part of the bcache writeback process, so user write IO damages the sequentiality of writeback, and when there is a lot of writeback IO, user write IO may be blocked. This patch adds a traffic policy to bcache that solves the problem and improves bcache performance when cache available is low.

Signed-off-by: qinghaixiang <xuweiqhx(a)163.com>
Signed-off-by: Xu Wei <xuwei56(a)huawei.com>
Acked-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Reviewed-by: Li Ruilin <liruilin4(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
 drivers/md/bcache/bcache.h | 49 ++++++++++++
 drivers/md/bcache/btree.h | 6 +-
 drivers/md/bcache/request.c | 145 ++++++++++++++++++++++++++++++++--
 drivers/md/bcache/request.h | 2 +
 drivers/md/bcache/super.c | 36 +++++++++
 drivers/md/bcache/sysfs.c | 56 +++++++++++++
 drivers/md/bcache/writeback.c | 11 ++-
 drivers/md/bcache/writeback.h | 3 +-
 8 files changed, 300 insertions(+), 8 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 3340f59117112..aa9e85dc04872 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -400,6 +400,28 @@ struct cached_dev {
 unsigned int offline_seconds;

 char backing_dev_name[BDEVNAME_SIZE];
+
+ /* Count the front and writeback io bandwidth per second */
+ atomic_t writeback_sector_size;
+ atomic_t writeback_io_num;
+ atomic_t front_io_num;
+ unsigned int writeback_sector_size_per_sec;
+ unsigned int writeback_io_num_per_sec;
+ unsigned int front_io_num_per_sec;
+ struct timer_list io_stat_timer;
+
+ unsigned int writeback_state;
+#define WRITEBACK_DEFAULT 0
+#define WRITEBACK_QUICK 1
+#define WRITEBACK_SLOW 2
+
+ /* realize for token bucket */
+ spinlock_t token_lock;
+ unsigned int max_sector_size;
+ unsigned int max_io_num;
+ unsigned int write_token_sector_size;
+ unsigned int write_token_io_num;
+ struct timer_list token_assign_timer;
 };

 enum alloc_reserve {
@@ -739,6 +761,10 @@ struct cache_set {
 #define BUCKET_HASH_BITS 12
 struct hlist_head bucket_hash[1 << BUCKET_HASH_BITS];
+ unsigned int cutoff_writeback_sync;
+ bool traffic_policy_start;
+ bool force_write_through;
+ unsigned int gc_sectors;
 };

 struct bbio {
@@ -754,6 +780,29 @@ struct bbio {
 struct bio bio;
 };

+struct get_bcache_status {
+ unsigned int writeback_sector_size_per_sec;
+ unsigned int writeback_io_num_per_sec;
+ unsigned int front_io_num_per_sec;
+ uint64_t dirty_rate;
+ unsigned int available;
+};
+
+struct set_bcache_status {
+ unsigned int write_token_sector_size;
+ unsigned int write_token_io_num;
+ bool traffic_policy_start;
+ bool force_write_through;
+ bool copy_gc_enabled;
+ bool trigger_gc;
+ unsigned int writeback_state;
+ unsigned int gc_sectors;
+ unsigned int cutoff_writeback_sync;
+};
+#define BCACHE_MAJOR 'B'
+#define BCACHE_GET_WRITE_STATUS _IOR(BCACHE_MAJOR, 0x0, struct get_bcache_status)
+#define BCACHE_SET_WRITE_STATUS _IOW(BCACHE_MAJOR, 0x1, struct set_bcache_status)
+
 #define BTREE_PRIO USHRT_MAX
 #define INITIAL_PRIO 32768U
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index 76cfd121a4861..7c5f7c235c533 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -193,7 +193,11 @@ static inline unsigned int bset_block_offset(struct btree *b, struct bset *i)

 static inline void set_gc_sectors(struct cache_set *c)
 {
- atomic_set(&c->sectors_to_gc, c->sb.bucket_size * c->nbuckets / 16);
+ if (c->gc_sectors == 0)
+ atomic_set(&c->sectors_to_gc,
+ c->sb.bucket_size * c->nbuckets / 16);
+ else
+ atomic_set(&c->sectors_to_gc, c->gc_sectors);
 }

 void bkey_put(struct cache_set *c, struct bkey *k);
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 2f3ecd8fc5801..50303841dd45b 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -28,6 +28,7 @@
 struct kmem_cache *bch_search_cache;

 static void bch_data_insert_start(struct closure *cl);
+static void alloc_token(struct cached_dev *dc, unsigned int sectors);

 static unsigned int cache_mode(struct cached_dev *dc)
 {
@@ -384,8 +385,9 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
 goto skip;

 if (mode == CACHE_MODE_NONE ||
- (mode == CACHE_MODE_WRITEAROUND &&
- op_is_write(bio_op(bio))))
+ ((mode == CACHE_MODE_WRITEAROUND ||
+ c->force_write_through == true) &&
+ op_is_write(bio_op(bio))))
 goto skip;

 /*
@@ -863,6 +865,10 @@ static void cached_dev_read_done(struct closure *cl)
 if (s->iop.bio && (!dc->read_bypass || s->prefetch) &&
 !test_bit(CACHE_SET_STOPPING, &s->iop.c->flags)) {
 BUG_ON(!s->iop.replace);
+ if ((dc->disk.c->traffic_policy_start == true) &&
+ (dc->disk.c->force_write_through != true)) {
+ alloc_token(dc, bio_sectors(s->iop.bio));
+ }
 closure_call(&s->iop.cl, bch_data_insert, NULL, cl);
 }
@@ -1005,6 +1011,35 @@ static void cached_dev_write_complete(struct closure *cl)
 continue_at(cl, cached_dev_bio_complete, NULL);
 }

+static void alloc_token(struct cached_dev *dc, unsigned int sectors)
+{
+ int count = 0;
+
+ spin_lock_bh(&dc->token_lock);
+
+ while ((dc->write_token_sector_size < sectors) &&
+ (dc->write_token_io_num == 0)) {
+ spin_unlock_bh(&dc->token_lock);
+ schedule_timeout_interruptible(msecs_to_jiffies(10));
+ count++;
+ if ((dc->disk.c->traffic_policy_start != true) ||
+ (cache_mode(dc) != CACHE_MODE_WRITEBACK) ||
+ (count > 100))
+ return;
+ spin_lock_bh(&dc->token_lock);
+ }
+
+ if (dc->write_token_sector_size >= sectors)
+ dc->write_token_sector_size -= sectors;
+ else
+ dc->write_token_sector_size = 0;
+
+ if (dc->write_token_io_num > 0)
+ dc->write_token_io_num--;
+
+ spin_unlock_bh(&dc->token_lock);
+}
+
 static void cached_dev_write(struct cached_dev *dc, struct search *s)
 {
 struct closure *cl = &s->cl;
@@ -1252,6 +1287,7 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
 cached_dev_nodata,
 bcache_wq);
 } else {
+ atomic_inc(&dc->front_io_num);
 s->iop.bypass = check_should_bypass(dc, bio);

 if (!s->iop.bypass && bio->bi_iter.bi_size && !rw) {
@@ -1263,10 +1299,17 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
 save_circ_item(&s->smp);
 }

- if (rw)
+ if (rw) {
+ if ((s->iop.bypass == false) &&
+ (dc->disk.c->traffic_policy_start == true) &&
+ (cache_mode(dc) == CACHE_MODE_WRITEBACK) &&
+ (bio_op(bio) != REQ_OP_DISCARD)) {
+ alloc_token(dc, bio_sectors(bio));
+ }
 cached_dev_write(dc, s);
- else
+ } else {
 cached_dev_read(dc, s);
+ }
 }
 } else
 /* I/O request sent to backing device */
@@ -1275,6 +1318,65 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
 return BLK_QC_T_NONE;
 }

+static int bcache_get_write_status(struct cached_dev *dc, unsigned long arg)
+{
+ struct get_bcache_status a;
+ uint64_t cache_sectors;
+ struct cache_set *c = dc->disk.c;
+
+ if (c == NULL)
+ return -ENODEV;
+
+ a.writeback_sector_size_per_sec = dc->writeback_sector_size_per_sec;
+ a.writeback_io_num_per_sec = dc->writeback_io_num_per_sec;
+ a.front_io_num_per_sec = dc->front_io_num_per_sec;
+ cache_sectors = c->nbuckets * c->sb.bucket_size -
+ atomic_long_read(&c->flash_dev_dirty_sectors);
+ a.dirty_rate = div64_u64(bcache_dev_sectors_dirty(&dc->disk) * 100,
+ cache_sectors);
+ a.available = 100 - c->gc_stats.in_use;
+ if (copy_to_user((struct get_bcache_status *)arg, &a,
+ sizeof(struct get_bcache_status)))
+ return -EFAULT;
+ return 0;
+}
+
+static int bcache_set_write_status(struct cached_dev *dc, unsigned long arg)
+{
+ struct set_bcache_status a;
+ struct cache_set *c = dc->disk.c;
+
+ if (c == NULL)
+ return -ENODEV;
+ if (copy_from_user(&a, (struct set_bcache_status *)arg,
+ sizeof(struct set_bcache_status)))
+ return -EFAULT;
+
+ if (c->traffic_policy_start != a.traffic_policy_start)
+ pr_info("%s traffic policy %s", dc->disk.disk->disk_name,
+ (a.traffic_policy_start == true) ? "enable" : "disable");
+ if (c->force_write_through != a.force_write_through)
+ pr_info("%s force write through %s", dc->disk.disk->disk_name,
+ (a.force_write_through == true) ? "enable" : "disable");
+ if (a.trigger_gc) {
+ pr_info("trigger %s gc", dc->disk.disk->disk_name);
+ atomic_set(&c->sectors_to_gc, -1);
+ wake_up_gc(c);
+ }
+ if ((a.cutoff_writeback_sync >= CUTOFF_WRITEBACK_SYNC) &&
+ (a.cutoff_writeback_sync <= CUTOFF_WRITEBACK_SYNC_MAX)) {
+ c->cutoff_writeback_sync = a.cutoff_writeback_sync;
+ }
+
+ dc->max_sector_size = a.write_token_sector_size;
+ dc->max_io_num = a.write_token_io_num;
+ c->traffic_policy_start = a.traffic_policy_start;
+ c->force_write_through = a.force_write_through;
+ c->gc_sectors = a.gc_sectors;
+ dc->writeback_state = a.writeback_state;
+ return 0;
+}
+
 static int cached_dev_ioctl(struct bcache_device *d, fmode_t mode,
 unsigned int cmd, unsigned long arg)
 {
@@ -1283,7 +1385,14 @@ static int cached_dev_ioctl(struct bcache_device *d, fmode_t mode,
 if (dc->io_disable)
 return -EIO;

- return __blkdev_driver_ioctl(dc->bdev, mode, cmd, arg);
+ switch (cmd) {
+ case BCACHE_GET_WRITE_STATUS:
+ return bcache_get_write_status(dc, arg);
+ case BCACHE_SET_WRITE_STATUS:
+ return bcache_set_write_status(dc, arg);
+ default:
+ return __blkdev_driver_ioctl(dc->bdev, mode, cmd, arg);
+ }
 }

 static int cached_dev_congested(void *data, int bits)
@@ -1443,3 +1552,29 @@ int __init bch_request_init(void)

 return 0;
 }
+
+static void token_assign(struct timer_list *t)
+{
+ struct cached_dev *dc = from_timer(dc, t, token_assign_timer);
+
+ dc->token_assign_timer.expires = jiffies + HZ / 8;
+ add_timer(&dc->token_assign_timer);
+
+ spin_lock(&dc->token_lock);
+ dc->write_token_sector_size = dc->max_sector_size / 8;
+ dc->write_token_io_num = dc->max_io_num / 8;
+ dc->write_token_io_num =
+ (dc->write_token_io_num == 0) ? 1 : dc->write_token_io_num;
+ spin_unlock(&dc->token_lock);
+}
+
+void bch_traffic_policy_init(struct cached_dev *dc)
+{
+ spin_lock_init(&dc->token_lock);
+ dc->write_token_sector_size = 0;
+ dc->write_token_io_num = 0;
+
+ timer_setup(&dc->token_assign_timer, token_assign, 0);
+ dc->token_assign_timer.expires = jiffies + HZ / 8;
+ add_timer(&dc->token_assign_timer);
+}
diff --git a/drivers/md/bcache/request.h b/drivers/md/bcache/request.h
index 7682630db73f6..8e2ab8071b5bf 100644
--- a/drivers/md/bcache/request.h
+++ b/drivers/md/bcache/request.h
@@ -41,6 +41,8 @@ void bch_data_insert(struct closure *cl);
 void bch_cached_dev_request_init(struct cached_dev *dc);
 void bch_flash_dev_request_init(struct bcache_device *d);

+void bch_traffic_policy_init(struct cached_dev *dc);
+
 extern struct kmem_cache *bch_search_cache;

 struct search {
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 754e88895738b..e2828c762edc5 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1260,6 +1260,9 @@ static void cached_dev_free(struct closure *cl)
 {
 struct cached_dev *dc = container_of(cl, struct cached_dev, disk.cl);

+ del_timer_sync(&dc->io_stat_timer);
+ del_timer_sync(&dc->token_assign_timer);
+
 if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
 cancel_writeback_rate_update_dwork(dc);
@@ -1303,6 +1306,36 @@ static void cached_dev_flush(struct closure *cl)
 continue_at(cl, cached_dev_free, system_wq);
 }

+static void cached_dev_io_stat(struct timer_list *t)
+{
+ struct cached_dev *dc = from_timer(dc, t, io_stat_timer);
+
+ dc->io_stat_timer.expires = jiffies + HZ;
+ add_timer(&dc->io_stat_timer);
+
+ dc->writeback_sector_size_per_sec =
+ atomic_read(&dc->writeback_sector_size);
+ dc->writeback_io_num_per_sec = atomic_read(&dc->writeback_io_num);
+ dc->front_io_num_per_sec = atomic_read(&dc->front_io_num);
+ atomic_set(&dc->writeback_sector_size, 0);
+ atomic_set(&dc->writeback_io_num, 0);
+ atomic_set(&dc->front_io_num, 0);
+}
+
+static void cached_dev_timer_init(struct cached_dev *dc)
+{
+ dc->writeback_sector_size_per_sec = 0;
+ dc->writeback_io_num_per_sec = 0;
+ dc->front_io_num_per_sec = 0;
+ atomic_set(&dc->writeback_sector_size, 0);
+ atomic_set(&dc->writeback_io_num, 0);
+ atomic_set(&dc->front_io_num, 0);
+
+ timer_setup(&dc->io_stat_timer, cached_dev_io_stat, 0);
+ dc->io_stat_timer.expires = jiffies + HZ;
+ add_timer(&dc->io_stat_timer);
+}
+
 static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
 {
 int ret;
@@ -1319,6 +1352,8 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
 INIT_LIST_HEAD(&dc->io_lru);
 spin_lock_init(&dc->io_lock);
 bch_cache_accounting_init(&dc->accounting, &dc->disk.cl);
+ cached_dev_timer_init(dc);
+ bch_traffic_policy_init(dc);

 dc->sequential_cutoff = 4 << 20;
 dc->inflight_block_enable = 1;
@@ -1837,6 +1872,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 c->congested_read_threshold_us = 2000;
 c->congested_write_threshold_us = 20000;
 c->error_limit = DEFAULT_IO_ERROR_LIMIT;
+ c->cutoff_writeback_sync = bch_cutoff_writeback_sync;
 c->idle_max_writeback_rate_enabled = 1;

 WARN_ON(test_and_clear_bit(CACHE_SET_IO_DISABLE, &c->flags));
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index e23c426229399..097560389760c 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -53,6 +53,13 @@ static const char * const error_actions[] = {
 NULL
 };

+static const char * const writeback_state[] = {
+ "default",
+ "quick",
+ "slow",
+ NULL
+};
+
 write_attribute(attach);
 write_attribute(detach);
 write_attribute(unregister);
@@ -102,6 +109,9 @@ read_attribute(cutoff_writeback);
 read_attribute(cutoff_writeback_sync);
 rw_attribute(congested_read_threshold_us);
 rw_attribute(congested_write_threshold_us);
+rw_attribute(gc_sectors);
+rw_attribute(traffic_policy_start);
+rw_attribute(force_write_through);

 rw_attribute(sequential_cutoff);
 rw_attribute(read_bypass);
@@ -120,7 +130,13 @@ rw_attribute(writeback_rate_update_seconds);
 rw_attribute(writeback_rate_i_term_inverse);
 rw_attribute(writeback_rate_p_term_inverse);
 rw_attribute(writeback_rate_minimum);
+rw_attribute(writeback_state);
+read_attribute(writeback_sector_size_per_sec);
+read_attribute(writeback_io_num_per_sec);
+read_attribute(front_io_num_per_sec);
 read_attribute(writeback_rate_debug);
+read_attribute(write_token_sector_size);
+read_attribute(write_token_io_num);

 read_attribute(stripe_size);
 read_attribute(partial_stripes_expensive);
@@ -177,6 +193,11 @@ SHOW(__bch_cached_dev)
 bch_cache_modes,
 BDEV_CACHE_MODE(&dc->sb));

+ if (attr == &sysfs_writeback_state)
+ return bch_snprint_string_list(buf, PAGE_SIZE,
+ writeback_state,
+ dc->writeback_state);
+
 if (attr == &sysfs_readahead_cache_policy)
 return bch_snprint_string_list(buf, PAGE_SIZE,
 bch_reada_cache_policies,
@@ -194,6 +215,9 @@ SHOW(__bch_cached_dev)
 var_printf(writeback_metadata, "%i");
 var_printf(writeback_running, "%i");
 var_print(writeback_delay);
+ var_print(writeback_sector_size_per_sec);
+ var_print(writeback_io_num_per_sec);
+ var_print(front_io_num_per_sec);
 var_print(writeback_percent);
 sysfs_hprint(writeback_rate,
 wb ? atomic_long_read(&dc->writeback_rate.rate) << 9 : 0);
@@ -256,6 +280,8 @@ SHOW(__bch_cached_dev)
 sysfs_print(running, atomic_read(&dc->running));
 sysfs_print(state, states[BDEV_STATE(&dc->sb)]);
+ var_print(write_token_sector_size);
+ var_print(write_token_io_num);

 if (attr == &sysfs_label) {
 memcpy(buf, dc->sb.label, SB_LABEL_SIZE);
@@ -375,6 +401,15 @@ STORE(__cached_dev)
 }
 }

+ if (attr == &sysfs_writeback_state) {
+ v = __sysfs_match_string(writeback_state, -1, buf);
+
+ if (v < 0)
+ return v;
+
+ dc->writeback_state = v;
+ }
+
 if (attr == &sysfs_readahead_cache_policy) {
 v = __sysfs_match_string(bch_reada_cache_policies, -1, buf);
 if (v < 0)
@@ -498,11 +533,14 @@ static struct attribute *bch_cached_dev_files[] = {
 &sysfs_data_csum,
 #endif
 &sysfs_cache_mode,
+ &sysfs_writeback_state,
 &sysfs_readahead_cache_policy,
 &sysfs_stop_when_cache_set_failed,
 &sysfs_writeback_metadata,
 &sysfs_writeback_running,
 &sysfs_writeback_delay,
+ &sysfs_writeback_sector_size_per_sec,
+ &sysfs_writeback_io_num_per_sec,
 &sysfs_writeback_percent,
 &sysfs_writeback_rate,
 &sysfs_writeback_rate_update_seconds,
@@ -510,6 +548,9 @@ static struct attribute *bch_cached_dev_files[] = {
 &sysfs_writeback_rate_p_term_inverse,
 &sysfs_writeback_rate_minimum,
 &sysfs_writeback_rate_debug,
+ &sysfs_write_token_sector_size,
+ &sysfs_write_token_io_num,
+ &sysfs_front_io_num_per_sec,
 &sysfs_io_errors,
 &sysfs_io_error_limit,
 &sysfs_io_disable,
@@ -770,6 +811,12 @@ SHOW(__bch_cache_set)
 c->congested_read_threshold_us);
 sysfs_print(congested_write_threshold_us,
 c->congested_write_threshold_us);
+ sysfs_print(gc_sectors,
+ c->gc_sectors);
+ sysfs_print(traffic_policy_start,
+ c->traffic_policy_start);
+ sysfs_print(force_write_through,
+ c->force_write_through);

 sysfs_print(cutoff_writeback, bch_cutoff_writeback);
 sysfs_print(cutoff_writeback_sync, bch_cutoff_writeback_sync);
@@ -855,6 +902,12 @@ STORE(__bch_cache_set)
 sysfs_strtoul_clamp(congested_write_threshold_us,
 c->congested_write_threshold_us,
 0, UINT_MAX);
+ sysfs_strtoul(gc_sectors,
+ c->gc_sectors);
+ sysfs_strtoul(traffic_policy_start,
+ c->traffic_policy_start);
+ sysfs_strtoul(force_write_through,
+ c->force_write_through);

 if (attr == &sysfs_errors) {
 v = __sysfs_match_string(error_actions, -1, buf);
@@ -997,6 +1050,9 @@ static struct attribute *bch_cache_set_internal_files[] = {
 &sysfs_idle_max_writeback_rate,
 &sysfs_gc_after_writeback,
 &sysfs_io_disable,
+ &sysfs_gc_sectors,
+ &sysfs_traffic_policy_start,
+ &sysfs_force_write_through,
 &sysfs_cutoff_writeback,
 &sysfs_cutoff_writeback_sync,
 NULL
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 25a66e54f7ba5..1e3041f6363c7 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -239,7 +239,13 @@ static unsigned int writeback_delay(struct cached_dev *dc,
 !dc->writeback_percent)
 return 0;

- return bch_next_delay(&dc->writeback_rate, sectors);
+ if (dc->writeback_state == WRITEBACK_DEFAULT) {
+ return bch_next_delay(&dc->writeback_rate, sectors);
+ } else if (dc->writeback_state == WRITEBACK_QUICK) {
+ return 0;
+ } else {
+ return msecs_to_jiffies(1000);
+ }
 }

 struct dirty_io {
@@ -304,6 +310,9 @@ static void write_dirty_finish(struct closure *cl)
 : &dc->disk.c->writeback_keys_done);
 }

+ atomic_add(KEY_SIZE(&w->key), &dc->writeback_sector_size);
+ atomic_inc(&dc->writeback_io_num);
+
 bch_keybuf_del(&dc->writeback_keys, w);
 up(&dc->in_flight);
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index c4ff76037227b..912b1a146cea2 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -80,10 +80,11 @@ static inline bool should_writeback(struct cached_dev *dc, struct bio *bio,
 unsigned int cache_mode, bool would_skip)
 {
 unsigned int in_use = dc->disk.c->gc_stats.in_use;
+ unsigned int cutoff = dc->disk.c->cutoff_writeback_sync;

 if (cache_mode != CACHE_MODE_WRITEBACK ||
 test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) ||
- in_use > bch_cutoff_writeback_sync)
+ in_use > cutoff)
 return false;

 if (bio_op(bio) == REQ_OP_DISCARD)
--
2.25.1
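Since this is the second backport of the same traffic policy, a note on the token-bucket arithmetic shared by both may help review: token_assign() fires every HZ/8 jiffies (125 ms) and refills the bucket with one eighth of the configured per-second budget, while alloc_token() makes writers sleep in 10 ms steps (for at most about one second) until tokens are available, capping front-end writes at roughly max_sector_size sectors and max_io_num IOs per second. A standalone sketch of the same refill/consume logic, with hypothetical names and the locking omitted:

#include <stdbool.h>

struct token_bucket {
    unsigned int max_sector_size;  /* per-second sector budget */
    unsigned int max_io_num;       /* per-second IO budget */
    unsigned int sectors;          /* tokens currently available */
    unsigned int ios;
};

/* Mirrors token_assign(): runs 8 times per second. */
static void refill(struct token_bucket *tb)
{
    tb->sectors = tb->max_sector_size / 8;
    tb->ios = tb->max_io_num / 8;
    if (tb->ios == 0)
        tb->ios = 1;               /* always admit at least one IO */
}

/* Mirrors the accounting in alloc_token(); returns false when the
 * caller would sleep and retry after the next refill. */
static bool consume(struct token_bucket *tb, unsigned int sectors)
{
    if (tb->sectors < sectors && tb->ios == 0)
        return false;
    tb->sectors = (tb->sectors >= sectors) ? tb->sectors - sectors : 0;
    if (tb->ios > 0)
        tb->ios--;
    return true;
}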
[PATCH openEuler-20.03-LTS-tag] kernel.spec: fix rpmbuild error with patch
by Gou Hao 19 Jul '21

From: gouhao <gouhao(a)uniontech.com>

kernel.spec: fix rpmbuild error with patch

uniontech inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I3ZNR0
CVE: NA

--------------------------------------------------------------

When %{with_patch} is 1, a second compilation with rpmbuild fails with:

Reversed (or previously applied) patch detected!

rpmbuild normally deletes the source directory under ~/rpmbuild/BUILD in the %prep stage. In our spec, however, when %{with_patch} is 1 and vanilla-%{TarballVer} exists, %prep does not delete the source directory but enters it directly, which causes this bug.

Signed-off-by: gouhao <gouhao(a)uniontech.com>
---
 kernel.spec | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/kernel.spec b/kernel.spec
index e00a323..11c76aa 100644
--- a/kernel.spec
+++ b/kernel.spec
@@ -205,20 +205,14 @@ package or when debugging this package.\
 %endif

 %prep
-%if 0%{?with_patch}
-if [ ! -d kernel-%{version}/vanilla-%{TarballVer} ];then
-%setup -q -n kernel-%{version} -a 9998 -c
-mv linux-%{TarballVer} vanilla-%{TarballVer}
-else
-cd kernel-%{version}
-fi
-cp -rl vanilla-%{TarballVer} linux-%{KernelVer}
-%else
+
 %setup -q -n kernel-%{version} -c
-mv kernel linux-%{version}
-cp -rl linux-%{version} linux-%{KernelVer}
+
+%if 0%{?with_patch}
+tar -xjf %{SOURCE9998}
 %endif

+mv linux-%{version} linux-%{KernelVer}
 cd linux-%{KernelVer}

 %if 0%{?with_patch}
--
2.20.1
[PATCH kernel-4.19] igmp: Add ip_mc_list lock in ip_check_mc_rcu
by Yang Yingliang 19 Jul '21

From: Liu Jian <liujian56(a)huawei.com>

mainline inclusion
from mainline-net-next
commit 23d2b94043ca8835bd1e67749020e839f396a1c2
category: bugfix
bugzilla: NA
CVE: NA

--------------------------------

I got the below panic when doing a fuzz test:

Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 4056 Comm: syz-executor.3 Tainted: G B 5.14.0-rc1-00195-gcff5c4254439-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
 dump_stack_lvl+0x7a/0x9b
 panic+0x2cd/0x5af
 end_report.cold+0x5a/0x5a
 kasan_report+0xec/0x110
 ip_check_mc_rcu+0x556/0x5d0
 __mkroute_output+0x895/0x1740
 ip_route_output_key_hash_rcu+0x2d0/0x1050
 ip_route_output_key_hash+0x182/0x2e0
 ip_route_output_flow+0x28/0x130
 udp_sendmsg+0x165d/0x2280
 udpv6_sendmsg+0x121e/0x24f0
 inet6_sendmsg+0xf7/0x140
 sock_sendmsg+0xe9/0x180
 ____sys_sendmsg+0x2b8/0x7a0
 ___sys_sendmsg+0xf0/0x160
 __sys_sendmmsg+0x17e/0x3c0
 __x64_sys_sendmmsg+0x9e/0x100
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x462eb9
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3df5af1c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462eb9
RDX: 0000000000000312 RSI: 0000000020001700 RDI: 0000000000000007
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3df5af26bc
R13: 00000000004c372d R14: 0000000000700b10 R15: 00000000ffffffff

This is a use-after-free in ip_check_mc_rcu. In ip_mc_del_src, the ip_sf_list of pmc is freed under pmc->lock protection, but the access to ip_sf_list in ip_check_mc_rcu is not protected by that lock.

Signed-off-by: Liu Jian <liujian56(a)huawei.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Reviewed-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
 net/ipv4/igmp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index ffa847fc96194..1a0953b2b1396 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -2736,6 +2736,7 @@ int ip_check_mc_rcu(struct in_device *in_dev, __be32 mc_addr, __be32 src_addr, u
 rv = 1;
 } else if (im) {
 if (src_addr) {
+ spin_lock_bh(&im->lock);
 for (psf = im->sources; psf; psf = psf->sf_next) {
 if (psf->sf_inaddr == src_addr)
 break;
@@ -2746,6 +2747,7 @@ int ip_check_mc_rcu(struct in_device *in_dev, __be32 mc_addr, __be32 src_addr, u
 im->sfcount[MCAST_EXCLUDE];
 else
 rv = im->sfcount[MCAST_EXCLUDE] != 0;
+ spin_unlock_bh(&im->lock);
 } else
 rv = 1; /* unspecified source; tentatively allow */
 }
--
2.25.1
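The rule the fix enforces can be stated compactly: rcu_read_lock() keeps the ip_mc_list itself alive, but the sources list hanging off it is modified and freed under im->lock (see ip_mc_del_src), so the same spinlock must be held while walking it. A minimal sketch of a correctly locked walk (hypothetical helper, for illustration only):

static bool source_is_member(struct ip_mc_list *im, __be32 src_addr)
{
	struct ip_sf_list *psf;
	bool found = false;

	/* im->lock, not RCU, protects im->sources against concurrent
	 * freeing in ip_mc_del_src(). */
	spin_lock_bh(&im->lock);
	for (psf = im->sources; psf; psf = psf->sf_next) {
		if (psf->sf_inaddr == src_addr) {
			found = true;
			break;
		}
	}
	spin_unlock_bh(&im->lock);
	return found;
}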
[PATCH openEuler-1.0-LTS] memcg: fix unsuitable null check after alloc memory
by Yang Yingliang 18 Jul '21

From: Lu Jialin <lujialin4(a)huawei.com>

hulk inclusion
category: bugfix
bugzilla: 51815, https://gitee.com/openeuler/kernel/issues/I3IJ9I
CVE: NA

--------

Signed-off-by: Lu Jialin <lujialin4(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
 mm/memcontrol.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 713a839013f72..e55b46d5d0fcb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4850,8 +4850,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 if (!node_state(node, N_NORMAL_MEMORY))
 tmp = -1;
 pn_ext = kzalloc_node(sizeof(*pn_ext), GFP_KERNEL, tmp);
- pn = &pn_ext->pn;
- if (!pn)
+ if (!pn_ext)
 return 1;

 pn_ext->lruvec_stat_local = alloc_percpu(struct lruvec_stat);
@@ -4860,6 +4859,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 return 1;
 }

+ pn = &pn_ext->pn;
 pn->lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
 if (!pn->lruvec_stat_cpu) {
 free_percpu(pn_ext->lruvec_stat_local);
@@ -4927,10 +4927,10 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 size += nr_node_ids * sizeof(struct mem_cgroup_per_node *);

 memcg_ext = kzalloc(size, GFP_KERNEL);
- memcg = &memcg_ext->memcg;
- if (!memcg)
+ if (!memcg_ext)
 return NULL;

+ memcg = &memcg_ext->memcg;
 memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL, 1, MEM_CGROUP_ID_MAX,
 GFP_KERNEL);
--
2.25.1
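The bug pattern is worth spelling out, since it compiles cleanly and often appears to work: taking the address of a member of a freshly allocated struct can never yield NULL, even when the allocation failed, because NULL plus a member offset is a non-NULL (and invalid) pointer. A reduced sketch with hypothetical type names:

struct inner { int x; };
struct outer {
	long pad;	/* a non-zero offset of .pn makes the bug visible */
	struct inner pn;
};

static int alloc_example(void)
{
	struct outer *pn_ext = kzalloc(sizeof(*pn_ext), GFP_KERNEL);
	struct inner *pn;

	/* Wrong (the old code): pn = &pn_ext->pn; if (!pn) ...
	 * If kzalloc() failed, &pn_ext->pn evaluates to NULL plus
	 * offsetof(struct outer, pn), which is non-NULL, so the check
	 * never fires and pn is dereferenced later. */

	if (!pn_ext)		/* right: test the allocation itself */
		return 1;
	pn = &pn_ext->pn;	/* only now is the member address valid */
	pn->x = 0;
	return 0;
}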
[PATCH openEuler-1.0-LTS] config: enable KASAN and UBSAN by default
by Yang Yingliang 16 Jul '21

hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA

--------------------------------

Enable KASAN and UBSAN by default for test.

Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
 arch/arm64/configs/hulk_defconfig | 10 ++++++++--
 arch/x86/configs/hulk_defconfig | 10 ++++++++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/configs/hulk_defconfig b/arch/arm64/configs/hulk_defconfig
index 741b33fab4e9c..6a3937a4017dd 100644
--- a/arch/arm64/configs/hulk_defconfig
+++ b/arch/arm64/configs/hulk_defconfig
@@ -5504,7 +5504,10 @@ CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
 CONFIG_DEBUG_MEMORY_INIT=y
 # CONFIG_DEBUG_PER_CPU_MAPS is not set
 CONFIG_HAVE_ARCH_KASAN=y
-# CONFIG_KASAN is not set
+CONFIG_KASAN=y
+# CONFIG_KASAN_OUTLINE is not set
+CONFIG_KASAN_INLINE=y
+# CONFIG_TEST_KASAN is not set
 CONFIG_ARCH_HAS_KCOV=y
 CONFIG_CC_HAS_SANCOV_TRACE_PC=y
 # CONFIG_KCOV is not set
@@ -5668,7 +5671,10 @@ CONFIG_KDB_DEFAULT_ENABLE=0x0
 CONFIG_KDB_KEYBOARD=y
 CONFIG_KDB_CONTINUE_CATASTROPHIC=0
 CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
-# CONFIG_UBSAN is not set
+CONFIG_UBSAN=y
+CONFIG_UBSAN_SANITIZE_ALL=y
+# CONFIG_UBSAN_ALIGNMENT is not set
+# CONFIG_TEST_UBSAN is not set
 CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
 CONFIG_STRICT_DEVMEM=y
 CONFIG_IO_STRICT_DEVMEM=y
diff --git a/arch/x86/configs/hulk_defconfig b/arch/x86/configs/hulk_defconfig
index 6ea79ca8a8a1e..f43f5ead30fa4 100644
--- a/arch/x86/configs/hulk_defconfig
+++ b/arch/x86/configs/hulk_defconfig
@@ -7331,7 +7331,10 @@ CONFIG_DEBUG_MEMORY_INIT=y
 CONFIG_HAVE_DEBUG_STACKOVERFLOW=y
 CONFIG_DEBUG_STACKOVERFLOW=y
 CONFIG_HAVE_ARCH_KASAN=y
-# CONFIG_KASAN is not set
+CONFIG_KASAN=y
+# CONFIG_KASAN_OUTLINE is not set
+CONFIG_KASAN_INLINE=y
+# CONFIG_TEST_KASAN is not set
 CONFIG_ARCH_HAS_KCOV=y
 CONFIG_DEBUG_SHIRQ=y
@@ -7504,7 +7507,10 @@ CONFIG_KDB_DEFAULT_ENABLE=0x0
 CONFIG_KDB_KEYBOARD=y
 CONFIG_KDB_CONTINUE_CATASTROPHIC=0
 CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
-# CONFIG_UBSAN is not set
+CONFIG_UBSAN=y
+CONFIG_UBSAN_SANITIZE_ALL=y
+# CONFIG_UBSAN_ALIGNMENT is not set
+# CONFIG_TEST_UBSAN is not set
 CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
 CONFIG_STRICT_DEVMEM=y
 # CONFIG_IO_STRICT_DEVMEM is not set
--
2.25.1
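For context on what the switch buys: with CONFIG_KASAN=y (inline instrumentation is selected here, which is faster at runtime than outline but produces a larger image), out-of-bounds and use-after-free accesses are reported at runtime instead of silently corrupting memory. An illustrative snippet (not from the patch) of the bug class KASAN flags:

#include <linux/slab.h>

static void kasan_demo(void)
{
	char *p = kmalloc(8, GFP_KERNEL);

	if (!p)
		return;
	p[8] = 'x';	/* one past the end: KASAN reports slab-out-of-bounds */
	kfree(p);
}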
[PATCH kernel-4.19] config: enable KASAN and UBSAN by default
by Yang Yingliang 16 Jul '21

hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA

--------------------------------

Enable KASAN and UBSAN by default for test.

Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
 arch/arm64/configs/hulk_defconfig | 10 ++++++++--
 arch/x86/configs/hulk_defconfig | 10 ++++++++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/configs/hulk_defconfig b/arch/arm64/configs/hulk_defconfig
index 59069097da854..6e9b7ff0bf2dc 100644
--- a/arch/arm64/configs/hulk_defconfig
+++ b/arch/arm64/configs/hulk_defconfig
@@ -5542,7 +5542,10 @@ CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
 CONFIG_DEBUG_MEMORY_INIT=y
 # CONFIG_DEBUG_PER_CPU_MAPS is not set
 CONFIG_HAVE_ARCH_KASAN=y
-# CONFIG_KASAN is not set
+CONFIG_KASAN=y
+# CONFIG_KASAN_OUTLINE is not set
+CONFIG_KASAN_INLINE=y
+# CONFIG_TEST_KASAN is not set
 CONFIG_ARCH_HAS_KCOV=y
 CONFIG_CC_HAS_SANCOV_TRACE_PC=y
 # CONFIG_KCOV is not set
@@ -5708,7 +5711,10 @@ CONFIG_KDB_DEFAULT_ENABLE=0x0
 CONFIG_KDB_KEYBOARD=y
 CONFIG_KDB_CONTINUE_CATASTROPHIC=0
 CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
-# CONFIG_UBSAN is not set
+CONFIG_UBSAN=y
+CONFIG_UBSAN_SANITIZE_ALL=y
+# CONFIG_UBSAN_ALIGNMENT is not set
+# CONFIG_TEST_UBSAN is not set
 CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
 CONFIG_STRICT_DEVMEM=y
 CONFIG_IO_STRICT_DEVMEM=y
diff --git a/arch/x86/configs/hulk_defconfig b/arch/x86/configs/hulk_defconfig
index b0e92300198c2..38ac853d4e31c 100644
--- a/arch/x86/configs/hulk_defconfig
+++ b/arch/x86/configs/hulk_defconfig
@@ -7320,7 +7320,10 @@ CONFIG_DEBUG_MEMORY_INIT=y
 CONFIG_HAVE_DEBUG_STACKOVERFLOW=y
 CONFIG_DEBUG_STACKOVERFLOW=y
 CONFIG_HAVE_ARCH_KASAN=y
-# CONFIG_KASAN is not set
+CONFIG_KASAN=y
+# CONFIG_KASAN_OUTLINE is not set
+CONFIG_KASAN_INLINE=y
+# CONFIG_TEST_KASAN is not set
 CONFIG_ARCH_HAS_KCOV=y
 CONFIG_CC_HAS_SANCOV_TRACE_PC=y
 # CONFIG_KCOV is not set
@@ -7495,7 +7498,10 @@ CONFIG_KDB_DEFAULT_ENABLE=0x0
 CONFIG_KDB_KEYBOARD=y
 CONFIG_KDB_CONTINUE_CATASTROPHIC=0
 CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
-# CONFIG_UBSAN is not set
+CONFIG_UBSAN=y
+CONFIG_UBSAN_SANITIZE_ALL=y
+# CONFIG_UBSAN_ALIGNMENT is not set
+# CONFIG_TEST_UBSAN is not set
 CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
 CONFIG_STRICT_DEVMEM=y
 # CONFIG_IO_STRICT_DEVMEM is not set
--
2.25.1
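Likewise, CONFIG_UBSAN_SANITIZE_ALL=y instruments essentially the whole kernel for undefined behaviour, while the alignment checker stays off because it tends to be noisy on architectures that handle unaligned accesses efficiently. An illustrative snippet (not from the patch) of what UBSAN reports at runtime:

static int ubsan_demo(int x)
{
	/* For sufficiently large x this overflows a signed int, which
	 * is undefined behaviour; UBSAN prints a
	 * signed-integer-overflow report instead of staying silent. */
	return x * 0x40000000;
}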
[PATCH openEuler-1.0-LTS] cpuidle: fix a build error when compiling haltpoll into module
by GONG, Ruiqi 16 Jul '21

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZURN
CVE: NA

--------

The kernel build fails when CONFIG_HALTPOLL_CPUIDLE=m, because haltpoll_switch_governor() is not marked as an exported symbol. Fix this by adding the missing EXPORT_SYMBOL statement.

Fixes: 97c227885e0a ("cpuidle: fix container_of err in cpuidle_device and cpuidle_driver")
Signed-off-by: GONG, Ruiqi <gongruiqi1(a)huawei.com>
Cc: Jiajun Chen <chenjiajun8(a)huawei.com>
---
 drivers/cpuidle/driver.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 484d0e9655fc..c97d08c90655 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -260,6 +260,8 @@ void haltpoll_switch_governor(struct cpuidle_driver *drv)
 }
 }

+EXPORT_SYMBOL_GPL(haltpoll_switch_governor);
+
 /**
 * cpuidle_register_driver - registers a driver
 * @drv: a pointer to a valid struct cpuidle_driver
--
2.27.0
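The underlying rule: a symbol defined in built-in code can only be referenced from a loadable module if it is explicitly exported, otherwise modpost fails at build time with an undefined-symbol error, which is exactly the CONFIG_HALTPOLL_CPUIDLE=m failure described above. A schematic sketch, abbreviated from and only loosely based on the real files (the haltpoll_driver object name is assumed):

/* drivers/cpuidle/driver.c -- built into the kernel image: */
void haltpoll_switch_governor(struct cpuidle_driver *drv)
{
	/* ... switches the active cpuidle governor ... */
}
EXPORT_SYMBOL_GPL(haltpoll_switch_governor);	/* the line this patch adds */

/* haltpoll init code, buildable as a module (=m): */
static int __init haltpoll_init(void)
{
	/* Without the export above, modpost fails at build time with
	 * "haltpoll_switch_governor undefined!". */
	haltpoll_switch_governor(&haltpoll_driver);	/* assumed object */
	return 0;
}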
