Cengiz Can (1): blktrace: fix dereference after null check
Jan Kara (1): blktrace: Protect q->blk_trace with RCU
Qu Wenruo (1): btrfs: tree-checker: Remove comprehensive root owner check
Zhang Xiaoxu (1): vgacon: Fix a UAF in vgacon_invert_region
Zheng Bin (1): xfs: add agf freeblocks verify in xfs_agf_verify
 drivers/video/console/vgacon.c |   3 ++
 fs/btrfs/tree-checker.c        |  24 ---------
 fs/xfs/libxfs/xfs_alloc.c      |  16 ++++++
 include/linux/blkdev.h         |   2 +-
 include/linux/blktrace_api.h   |  18 +++++--
 kernel/trace/blktrace.c        | 120 ++++++++++++++++++++++++++++++-----------
 6 files changed, 122 insertions(+), 61 deletions(-)
From: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
mainline inclusion
from mainline-v5.6-rc4
commit 513dc792d6060d5ef572e43852683097a8420f56
category: bugfix
bugzilla: 13690
CVE: CVE-2020-8647, CVE-2020-8649
-------------------------------------------------
When running syzkaller tests, there is a UAF:

BUG: KASan: use after free in vgacon_invert_region+0x9d/0x110 at addr ffff880000100000
Read of size 2 by task syz-executor.1/16489
page:ffffea0000004000 count:0 mapcount:-127 mapping:     (null) index:0x0
page flags: 0xfffff00000000()
page dumped because: kasan: bad access detected
CPU: 1 PID: 16489 Comm: syz-executor.1 Not tainted
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 [<ffffffffb119f309>] dump_stack+0x1e/0x20
 [<ffffffffb04af957>] kasan_report+0x577/0x950
 [<ffffffffb04ae652>] __asan_load2+0x62/0x80
 [<ffffffffb090f26d>] vgacon_invert_region+0x9d/0x110
 [<ffffffffb0a39d95>] invert_screen+0xe5/0x470
 [<ffffffffb0a21dcb>] set_selection+0x44b/0x12f0
 [<ffffffffb0a3bfae>] tioclinux+0xee/0x490
 [<ffffffffb0a1d114>] vt_ioctl+0xff4/0x2670
 [<ffffffffb0a0089a>] tty_ioctl+0x46a/0x1a10
 [<ffffffffb052db3d>] do_vfs_ioctl+0x5bd/0xc40
 [<ffffffffb052e2f2>] SyS_ioctl+0x132/0x170
 [<ffffffffb11c9b1b>] system_call_fastpath+0x22/0x27
Memory state around the buggy address:
 ffff8800000fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff8800000fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff880000100000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
It can be reproduced in the Linux mainline by the following program:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <linux/vt.h>

struct tiocl_selection {
	unsigned short xs;	/* X start */
	unsigned short ys;	/* Y start */
	unsigned short xe;	/* X end */
	unsigned short ye;	/* Y end */
	unsigned short sel_mode;	/* selection mode */
};

#define TIOCL_SETSEL 2
struct tiocl {
	unsigned char type;
	unsigned char pad;
	struct tiocl_selection sel;
};

int main()
{
	int fd = 0;
	const char *dev = "/dev/char/4:1";

	struct vt_consize v = {0};
	struct tiocl tioc = {0};

	fd = open(dev, O_RDWR, 0);

	v.v_rows = 3346;
	ioctl(fd, VT_RESIZEX, &v);

	tioc.type = TIOCL_SETSEL;
	ioctl(fd, TIOCLINUX, &tioc);

	return 0;
}
When resizing the screen, 'vc->vc_size_row' is updated to the new row size, but 'set_origin' in 'vgacon_set_origin' makes vgacon use 'vga_vram_base' for 'vc_origin' and 'vc_visible_origin', not 'vc_screenbuf'. The VGA VRAM may be smaller than 'vc_screenbuf'. On TIOCLINUX, the new row size is used to calculate the offset, which may be larger than 'vga_vram_size' in the vgacon driver, hence the bad access. Also, if a larger screen buffer is set first and then an even larger one, a bad access may happen when copying old_origin to new_origin.

So, if the requested screen size is larger than the VGA VRAM, the resize should fail. This also fixes CVE-2020-8649 and CVE-2020-8647.
Linus pointed out that overflow checking seems absent. We're saved by the existing bounds checks in vc_do_resize() with rather strict limits:
	if (cols > VC_RESIZE_MAXCOL || lines > VC_RESIZE_MAXROW)
		return -EINVAL;
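To spell the overflow reasoning out (assuming VC_RESIZE_MAXCOL and VC_RESIZE_MAXROW are both 32767, as defined in vt.c at the time of writing):

```c
/* Worst case the new vgacon_resize() check can see, given the
 * vc_do_resize() limits quoted above (assumed to be 32767 each). */
unsigned int width  = 32767;
unsigned int height = 32767;
/* (width << 1) * height = 65534 * 32767 = 2147352578 < 2^31,
 * so the expression cannot wrap even as a signed 32-bit value. */
```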
Fixes: 0aec4867dca14 ("[PATCH] SVGATextMode fix")
Reference: CVE-2020-8647 and CVE-2020-8649
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
[danvet: augment commit message to point out overflow safety]
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200304022429.37738-1-zhangxi...
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 drivers/video/console/vgacon.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/drivers/video/console/vgacon.c b/drivers/video/console/vgacon.c
index c6b3bdb..bfaa9ec 100644
--- a/drivers/video/console/vgacon.c
+++ b/drivers/video/console/vgacon.c
@@ -1316,6 +1316,9 @@ static int vgacon_font_get(struct vc_data *c, struct console_font *font)
 static int vgacon_resize(struct vc_data *c, unsigned int width,
 			 unsigned int height, unsigned int user)
 {
+	if ((width << 1) * height > vga_vram_size)
+		return -EINVAL;
+
 	if (width % 2 || width > screen_info.orig_video_cols ||
 	    height > (screen_info.orig_video_lines * vga_default_font_height)/
 	    c->vc_font.height)
From: Jan Kara <jack@suse.cz>
mainline inclusion
from mainline-v5.6-rc4
commit c780e86dd48ef6467a1146cf7d0fe1e05a635039
category: bugfix
bugzilla: 13690
CVE: CVE-2019-19768
-------------------------------------------------
KASAN is reporting that __blk_add_trace() has a use-after-free issue when accessing q->blk_trace. Indeed, the switching of block tracing (and thus the eventual freeing of q->blk_trace) is completely unsynchronized with the currently running tracing, so it can happen that the blk_trace structure is freed just while __blk_add_trace() works on it. Protect accesses to q->blk_trace by RCU during tracing and make sure we wait for the end of the RCU grace period when shutting down tracing. Luckily that is a rare enough event that we can afford it. Note that postponing the freeing of blk_trace to an RCU callback is better avoided, as it could have unexpected user-visible side effects: the debugfs files would still exist for a short while after block tracing has been shut down.
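In sketch form, this is the classic RCU publish/retire pattern (condensed from the diff below; the updater steps are spread across __blk_trace_remove() and blk_trace_cleanup() in the real code, which uses an xchg() to unpublish):

```c
/* Reader (tracing hot path): pin bt for the duration of the event. */
rcu_read_lock();
bt = rcu_dereference(q->blk_trace);	/* may observe NULL */
if (bt)
	__blk_add_trace(bt, ...);	/* bt cannot be freed meanwhile */
rcu_read_unlock();

/* Updater (teardown, under q->blk_trace_mutex): unpublish, wait, free. */
bt = rcu_dereference_protected(q->blk_trace,
			       lockdep_is_held(&q->blk_trace_mutex));
RCU_INIT_POINTER(q->blk_trace, NULL);	/* an xchg() in the real code */
synchronize_rcu();			/* wait out every current reader */
blk_trace_free(bt);			/* no reader can still see bt */
```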
Link: https://bugzilla.kernel.org/show_bug.cgi?id=205711
CC: stable@vger.kernel.org
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reported-by: Tristan Madani <tristmd@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflicts:
	kernel/trace/blktrace.c
[yyl: adjust context]
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 include/linux/blkdev.h       |   2 +-
 include/linux/blktrace_api.h |  18 +++++--
 kernel/trace/blktrace.c      | 117 +++++++++++++++++++++++++++++++------------
 3 files changed, 100 insertions(+), 37 deletions(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 36ffbce..d3d81b5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -633,7 +633,7 @@ struct request_queue {
 	unsigned int		sg_reserved_size;
 	int			node;
 #ifdef CONFIG_BLK_DEV_IO_TRACE
-	struct blk_trace	*blk_trace;
+	struct blk_trace __rcu	*blk_trace;
 	struct mutex		blk_trace_mutex;
 #endif
 	/*
diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h
index 7bb2d8d..3b6ff59 100644
--- a/include/linux/blktrace_api.h
+++ b/include/linux/blktrace_api.h
@@ -51,9 +51,13 @@ extern __printf(3, 4)
 **/
 #define blk_add_cgroup_trace_msg(q, cg, fmt, ...)			\
 	do {								\
-		struct blk_trace *bt = (q)->blk_trace;			\
+		struct blk_trace *bt;					\
+									\
+		rcu_read_lock();					\
+		bt = rcu_dereference((q)->blk_trace);			\
 		if (unlikely(bt))					\
 			__trace_note_message(bt, cg, fmt, ##__VA_ARGS__);\
+		rcu_read_unlock();					\
 	} while (0)
 #define blk_add_trace_msg(q, fmt, ...)					\
 	blk_add_cgroup_trace_msg(q, NULL, fmt, ##__VA_ARGS__)
@@ -61,10 +65,14 @@ extern __printf(3, 4)

 static inline bool blk_trace_note_message_enabled(struct request_queue *q)
 {
-	struct blk_trace *bt = q->blk_trace;
-	if (likely(!bt))
-		return false;
-	return bt->act_mask & BLK_TC_NOTIFY;
+	struct blk_trace *bt;
+	bool ret;
+
+	rcu_read_lock();
+	bt = rcu_dereference(q->blk_trace);
+	ret = bt && (bt->act_mask & BLK_TC_NOTIFY);
+	rcu_read_unlock();
+	return ret;
 }

 extern void blk_add_driver_data(struct request_queue *q, struct request *rq,
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 2868d85..3eb0d90 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -336,6 +336,7 @@ static void put_probe_ref(void)

 static void blk_trace_cleanup(struct blk_trace *bt)
 {
+	synchronize_rcu();
 	blk_trace_free(bt);
 	put_probe_ref();
 }
@@ -636,8 +637,10 @@ static int compat_blk_trace_setup(struct request_queue *q, char *name,
 static int __blk_trace_startstop(struct request_queue *q, int start)
 {
 	int ret;
-	struct blk_trace *bt = q->blk_trace;
+	struct blk_trace *bt;

+	bt = rcu_dereference_protected(q->blk_trace,
+				       lockdep_is_held(&q->blk_trace_mutex));
 	if (bt == NULL)
 		return -EINVAL;

@@ -746,8 +749,8 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
 void blk_trace_shutdown(struct request_queue *q)
 {
 	mutex_lock(&q->blk_trace_mutex);
-
-	if (q->blk_trace) {
+	if (rcu_dereference_protected(q->blk_trace,
+				      lockdep_is_held(&q->blk_trace_mutex))) {
 		__blk_trace_startstop(q, 0);
 		__blk_trace_remove(q);
 	}
@@ -759,8 +762,10 @@ void blk_trace_shutdown(struct request_queue *q)
 static union kernfs_node_id *
 blk_trace_bio_get_cgid(struct request_queue *q, struct bio *bio)
 {
-	struct blk_trace *bt = q->blk_trace;
+	struct blk_trace *bt;

+	/* We don't use the 'bt' value here except as an optimization... */
+	bt = rcu_dereference_protected(q->blk_trace, 1);
 	if (!bt || !(blk_tracer_flags.val & TRACE_BLK_OPT_CGROUP))
 		return NULL;

@@ -805,10 +810,14 @@ static void blk_add_trace_rq(struct request *rq, int error,
 			     unsigned int nr_bytes, u32 what,
 			     union kernfs_node_id *cgid)
 {
-	struct blk_trace *bt = rq->q->blk_trace;
+	struct blk_trace *bt;

-	if (likely(!bt))
+	rcu_read_lock();
+	bt = rcu_dereference(rq->q->blk_trace);
+	if (likely(!bt)) {
+		rcu_read_unlock();
 		return;
+	}

 	if (blk_rq_is_passthrough(rq))
 		what |= BLK_TC_ACT(BLK_TC_PC);
@@ -817,6 +826,7 @@ static void blk_add_trace_rq(struct request *rq, int error,

 	__blk_add_trace(bt, blk_rq_trace_sector(rq), nr_bytes, req_op(rq),
 			rq->cmd_flags, what, error, 0, NULL, cgid);
+	rcu_read_unlock();
 }

 static void blk_add_trace_rq_insert(void *ignore,
@@ -862,14 +872,19 @@ static void blk_add_trace_rq_complete(void *ignore, struct request *rq,
 static void blk_add_trace_bio(struct request_queue *q, struct bio *bio,
 			      u32 what, int error)
 {
-	struct blk_trace *bt = q->blk_trace;
+	struct blk_trace *bt;

-	if (likely(!bt))
+	rcu_read_lock();
+	bt = rcu_dereference(q->blk_trace);
+	if (likely(!bt)) {
+		rcu_read_unlock();
 		return;
+	}

 	__blk_add_trace(bt, bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
 			bio_op(bio), bio->bi_opf, what, error, 0, NULL,
 			blk_trace_bio_get_cgid(q, bio));
+	rcu_read_unlock();
 }

 static void blk_add_trace_bio_bounce(void *ignore,
@@ -914,11 +929,15 @@ static void blk_add_trace_getrq(void *ignore,
 	if (bio)
 		blk_add_trace_bio(q, bio, BLK_TA_GETRQ, 0);
 	else {
-		struct blk_trace *bt = q->blk_trace;
+		struct blk_trace *bt;
+
+		rcu_read_lock();
+		bt = rcu_dereference(q->blk_trace);

 		if (bt)
 			__blk_add_trace(bt, 0, 0, rw, 0, BLK_TA_GETRQ, 0, 0,
 					NULL, NULL);
+		rcu_read_unlock();
 	}
 }

@@ -930,27 +949,37 @@ static void blk_add_trace_sleeprq(void *ignore,
 	if (bio)
 		blk_add_trace_bio(q, bio, BLK_TA_SLEEPRQ, 0);
 	else {
-		struct blk_trace *bt = q->blk_trace;
+		struct blk_trace *bt;
+
+		rcu_read_lock();
+		bt = rcu_dereference(q->blk_trace);

 		if (bt)
 			__blk_add_trace(bt, 0, 0, rw, 0, BLK_TA_SLEEPRQ,
 					0, 0, NULL, NULL);
+		rcu_read_unlock();
 	}
 }

 static void blk_add_trace_plug(void *ignore, struct request_queue *q)
 {
-	struct blk_trace *bt = q->blk_trace;
+	struct blk_trace *bt;
+
+	rcu_read_lock();
+	bt = rcu_dereference(q->blk_trace);

 	if (bt)
 		__blk_add_trace(bt, 0, 0, 0, 0, BLK_TA_PLUG, 0, 0, NULL, NULL);
+	rcu_read_unlock();
 }

 static void blk_add_trace_unplug(void *ignore, struct request_queue *q,
 				    unsigned int depth, bool explicit)
 {
-	struct blk_trace *bt = q->blk_trace;
+	struct blk_trace *bt;

+	rcu_read_lock();
+	bt = rcu_dereference(q->blk_trace);
 	if (bt) {
 		__be64 rpdu = cpu_to_be64(depth);
 		u32 what;
@@ -962,14 +991,17 @@ static void blk_add_trace_unplug(void *ignore, struct request_queue *q,

 		__blk_add_trace(bt, 0, 0, 0, 0, what, 0, sizeof(rpdu), &rpdu, NULL);
 	}
+	rcu_read_unlock();
 }

 static void blk_add_trace_split(void *ignore,
 				struct request_queue *q, struct bio *bio,
 				unsigned int pdu)
 {
-	struct blk_trace *bt = q->blk_trace;
+	struct blk_trace *bt;

+	rcu_read_lock();
+	bt = rcu_dereference(q->blk_trace);
 	if (bt) {
 		__be64 rpdu = cpu_to_be64(pdu);

@@ -978,6 +1010,7 @@ static void blk_add_trace_split(void *ignore,
 				BLK_TA_SPLIT, bio->bi_status, sizeof(rpdu),
 				&rpdu, blk_trace_bio_get_cgid(q, bio));
 	}
+	rcu_read_unlock();
 }

 /**
@@ -997,11 +1030,15 @@ static void blk_add_trace_bio_remap(void *ignore,
 				    struct request_queue *q, struct bio *bio,
 				    dev_t dev, sector_t from)
 {
-	struct blk_trace *bt = q->blk_trace;
+	struct blk_trace *bt;
 	struct blk_io_trace_remap r;

-	if (likely(!bt))
+	rcu_read_lock();
+	bt = rcu_dereference(q->blk_trace);
+	if (likely(!bt)) {
+		rcu_read_unlock();
 		return;
+	}

 	r.device_from = cpu_to_be32(dev);
 	r.device_to   = cpu_to_be32(bio_dev(bio));
@@ -1010,6 +1047,7 @@ static void blk_add_trace_bio_remap(void *ignore,
 	__blk_add_trace(bt, bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
 			bio_op(bio), bio->bi_opf, BLK_TA_REMAP, bio->bi_status,
 			sizeof(r), &r, blk_trace_bio_get_cgid(q, bio));
+	rcu_read_unlock();
 }

 /**
@@ -1030,11 +1068,15 @@ static void blk_add_trace_rq_remap(void *ignore,
 				   struct request *rq, dev_t dev,
 				   sector_t from)
 {
-	struct blk_trace *bt = q->blk_trace;
+	struct blk_trace *bt;
 	struct blk_io_trace_remap r;

-	if (likely(!bt))
+	rcu_read_lock();
+	bt = rcu_dereference(q->blk_trace);
+	if (likely(!bt)) {
+		rcu_read_unlock();
 		return;
+	}

 	r.device_from = cpu_to_be32(dev);
 	r.device_to   = cpu_to_be32(disk_devt(rq->rq_disk));
@@ -1043,6 +1085,7 @@ static void blk_add_trace_rq_remap(void *ignore,
 	__blk_add_trace(bt, blk_rq_pos(rq), blk_rq_bytes(rq),
 			rq_data_dir(rq), 0, BLK_TA_REMAP, 0,
 			sizeof(r), &r, blk_trace_request_get_cgid(q, rq));
+	rcu_read_unlock();
 }

 /**
@@ -1060,14 +1103,19 @@ void blk_add_driver_data(struct request_queue *q,
 			 struct request *rq,
 			 void *data, size_t len)
 {
-	struct blk_trace *bt = q->blk_trace;
+	struct blk_trace *bt;

-	if (likely(!bt))
+	rcu_read_lock();
+	bt = rcu_dereference(q->blk_trace);
+	if (likely(!bt)) {
+		rcu_read_unlock();
 		return;
+	}

 	__blk_add_trace(bt, blk_rq_trace_sector(rq), blk_rq_bytes(rq), 0, 0,
 				BLK_TA_DRV_DATA, 0, len, data,
 				blk_trace_request_get_cgid(q, rq));
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL_GPL(blk_add_driver_data);

@@ -1594,6 +1642,7 @@ static int blk_trace_remove_queue(struct request_queue *q)
 		return -EINVAL;

 	put_probe_ref();
+	synchronize_rcu();
 	blk_trace_free(bt);
 	return 0;
 }
@@ -1755,6 +1804,7 @@ static ssize_t sysfs_blk_trace_attr_show(struct device *dev,
 	struct hd_struct *p = dev_to_part(dev);
 	struct request_queue *q;
 	struct block_device *bdev;
+	struct blk_trace *bt;
 	ssize_t ret = -ENXIO;

 	bdev = bdget(part_devt(p));
@@ -1767,21 +1817,23 @@ static ssize_t sysfs_blk_trace_attr_show(struct device *dev,

 	mutex_lock(&q->blk_trace_mutex);

+	bt = rcu_dereference_protected(q->blk_trace,
+				       lockdep_is_held(&q->blk_trace_mutex));
 	if (attr == &dev_attr_enable) {
-		ret = sprintf(buf, "%u\n", !!q->blk_trace);
+		ret = sprintf(buf, "%u\n", !!bt);
 		goto out_unlock_bdev;
 	}

-	if (q->blk_trace == NULL)
+	if (bt == NULL)
 		ret = sprintf(buf, "disabled\n");
 	else if (attr == &dev_attr_act_mask)
-		ret = blk_trace_mask2str(buf, q->blk_trace->act_mask);
+		ret = blk_trace_mask2str(buf, bt->act_mask);
 	else if (attr == &dev_attr_pid)
-		ret = sprintf(buf, "%u\n", q->blk_trace->pid);
+		ret = sprintf(buf, "%u\n", bt->pid);
 	else if (attr == &dev_attr_start_lba)
-		ret = sprintf(buf, "%llu\n", q->blk_trace->start_lba);
+		ret = sprintf(buf, "%llu\n", bt->start_lba);
 	else if (attr == &dev_attr_end_lba)
-		ret = sprintf(buf, "%llu\n", q->blk_trace->end_lba);
+		ret = sprintf(buf, "%llu\n", bt->end_lba);

 out_unlock_bdev:
 	mutex_unlock(&q->blk_trace_mutex);
@@ -1798,6 +1850,7 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev,
 	struct block_device *bdev;
 	struct request_queue *q;
 	struct hd_struct *p;
+	struct blk_trace *bt;
 	u64 value;
 	ssize_t ret = -EINVAL;

@@ -1828,8 +1881,10 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev,

 	mutex_lock(&q->blk_trace_mutex);

+	bt = rcu_dereference_protected(q->blk_trace,
+				       lockdep_is_held(&q->blk_trace_mutex));
 	if (attr == &dev_attr_enable) {
-		if (!!value == !!q->blk_trace) {
+		if (!!value == !!bt) {
 			ret = 0;
 			goto out_unlock_bdev;
 		}
@@ -1841,18 +1896,18 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev,
 	}

 	ret = 0;
-	if (q->blk_trace == NULL)
+	if (bt == NULL)
 		ret = blk_trace_setup_queue(q, bdev);

 	if (ret == 0) {
 		if (attr == &dev_attr_act_mask)
-			q->blk_trace->act_mask = value;
+			bt->act_mask = value;
 		else if (attr == &dev_attr_pid)
-			q->blk_trace->pid = value;
+			bt->pid = value;
 		else if (attr == &dev_attr_start_lba)
-			q->blk_trace->start_lba = value;
+			bt->start_lba = value;
 		else if (attr == &dev_attr_end_lba)
-			q->blk_trace->end_lba = value;
+			bt->end_lba = value;
 	}

 out_unlock_bdev:
From: Cengiz Can <cengiz@kernel.wtf>
mainline inclusion
from mainline-v5.6-rc4
commit 153031a301bb07194e9c37466cfce8eacb977621
category: bugfix
bugzilla: 13690
CVE: CVE-2019-19768
-------------------------------------------------
There was a recent change in blktrace.c that added RCU protection to `q->blk_trace` in order to fix a use-after-free issue during access.

However, the change missed an edge case that can lead to dereferencing the `bt` pointer even when it is NULL:
Coverity static analyzer marked this as a FORWARD_NULL issue with CID 1460458.
```
/kernel/trace/blktrace.c: 1904 in sysfs_blk_trace_attr_store()
1898            ret = 0;
1899            if (bt == NULL)
1900                    ret = blk_trace_setup_queue(q, bdev);
1901
1902            if (ret == 0) {
1903                    if (attr == &dev_attr_act_mask)
>>>     CID 1460458:  Null pointer dereferences  (FORWARD_NULL)
>>>     Dereferencing null pointer "bt".
1904                            bt->act_mask = value;
1905                    else if (attr == &dev_attr_pid)
1906                            bt->pid = value;
1907                    else if (attr == &dev_attr_start_lba)
1908                            bt->start_lba = value;
1909                    else if (attr == &dev_attr_end_lba)
```
Added a reassignment with RCU annotation to fix the issue.
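To make the failure mode concrete (a schematic of the store path, using the names from the diff below):

```c
/* Schematic only; see the real diff below. Under q->blk_trace_mutex: */
bt = rcu_dereference_protected(q->blk_trace,
			       lockdep_is_held(&q->blk_trace_mutex));
/* ... */
if (bt == NULL)
	ret = blk_trace_setup_queue(q, bdev);	/* installs q->blk_trace */

if (ret == 0)
	bt->act_mask = value;	/* bt is still the stale NULL read above */
```

The fix re-reads q->blk_trace, still under the mutex, right after blk_trace_setup_queue() succeeds.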
Fixes: c780e86dd48 ("blktrace: Protect q->blk_trace with RCU")
Cc: stable@vger.kernel.org
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Cengiz Can <cengiz@kernel.wtf>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 kernel/trace/blktrace.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 3eb0d90..33f1cdc 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -1896,8 +1896,11 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev,
 	}

 	ret = 0;
-	if (bt == NULL)
+	if (bt == NULL) {
 		ret = blk_trace_setup_queue(q, bdev);
+		bt = rcu_dereference_protected(q->blk_trace,
+				lockdep_is_held(&q->blk_trace_mutex));
+	}

 	if (ret == 0) {
 		if (attr == &dev_attr_act_mask)
From: Zheng Bin <zhengbin13@huawei.com>
mainline inclusion
from mainline-v5.6
commit d0c7feaf87678371c2c09b3709400be416b2dc62
category: bugfix
bugzilla: 30215
CVE: NA
---------------------------
We recently used fuzzing (hydra) to test XFS and automatically generated tmp.img (XFS v5 format, but with some corrupted metadata).
xfs_repair information (just one AG):

agf_freeblks 0, counted 3224 in ag 0
agf_longest 536874136, counted 3224 in ag 0
sb_fdblocks 613, counted 3228
Test as follows:

mount tmp.img tmpdir
cp file1M tmpdir
sync
In 4.19-stable, sync gets stuck; the reason is:

xfs_mountfs
  xfs_check_summary_counts
    if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
         XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
        !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS))
            return 0;
    --> just returns; the incore sb_fdblocks is still 613
  xfs_initialize_perag_data
cp file1M tmpdir  --> ok (write file to pagecache)
sync              --> stuck (write pagecache to disk)

xfs_map_blocks
  xfs_iomap_write_allocate
    while (count_fsb != 0) {
      nimaps = 0;
      while (nimaps == 0) {            --> endless loop
        nimaps = 1;
        xfs_bmapi_write(..., &nimaps)  --> nimaps becomes 0 again
      }
    }
    xfs_bmapi_write
      xfs_bmap_alloc
        xfs_bmap_btalloc
          xfs_alloc_vextent
            xfs_alloc_fix_freelist
              xfs_alloc_space_available  --> fails (agf_freeblks is 0)
In linux-next, sync does not get stuck, because commit c2b3164320b5 ("xfs: use the latest extent at writeback delalloc conversion time") removed the while loop above; dmesg shows the following:

[ 55.250114] XFS (loop0): page discard on page ffffea0008bc7380, inode 0x1b0c, offset 0.
Users do not know why this page is discarded. The better solution is:
1. Like xfs_repair, make sure sb_fdblocks is equal to the counted value
   (xfs_initialize_perag_data does this, but it is not called on this mount)
2. Add AGF verification; if it fails, tell users to repair

This patch uses the second solution.
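With the corrupted image above, the new checks catch the inconsistency at AGF read time (an illustration using the values reported by xfs_repair; the error code is the usual verifier result, assumed here):

```c
/*
 * Illustration (values from the xfs_repair output above):
 *   agf_freeblks = 0, agf_longest = 536874136
 * The new verifier check
 *   agf_freeblks < agf_longest  ->  0 < 536874136  ->  corruption
 * makes the AGF buffer read fail (-EFSCORRUPTED), so the user is told
 * to repair instead of sync spinning in xfs_iomap_write_allocate.
 */
```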
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Ren Xudong <renxudong1@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 4be387d..4ec8235 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2596,6 +2596,13 @@ STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
 	      be32_to_cpu(agf->agf_flcount) <= xfs_agfl_size(mp)))
 		return __this_address;

+	if (be32_to_cpu(agf->agf_length) > mp->m_sb.sb_dblocks)
+		return __this_address;
+
+	if (be32_to_cpu(agf->agf_freeblks) < be32_to_cpu(agf->agf_longest) ||
+	    be32_to_cpu(agf->agf_freeblks) > be32_to_cpu(agf->agf_length))
+		return __this_address;
+
 	if (be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNO]) < 1 ||
 	    be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]) < 1 ||
 	    be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNO]) > XFS_BTREE_MAXLEVELS ||
@@ -2607,6 +2614,10 @@ STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
 	     be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]) > XFS_BTREE_MAXLEVELS))
 		return __this_address;

+	if (xfs_sb_version_hasrmapbt(&mp->m_sb) &&
+	    be32_to_cpu(agf->agf_rmap_blocks) > be32_to_cpu(agf->agf_length))
+		return __this_address;
+
 	/*
 	 * during growfs operations, the perag is not fully initialised,
 	 * so we can't use it for any useful checking. growfs ensures we can't
@@ -2621,6 +2632,11 @@ STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
 		return __this_address;

 	if (xfs_sb_version_hasreflink(&mp->m_sb) &&
+	    be32_to_cpu(agf->agf_refcount_blocks) >
+	    be32_to_cpu(agf->agf_length))
+		return __this_address;
+
+	if (xfs_sb_version_hasreflink(&mp->m_sb) &&
 	    (be32_to_cpu(agf->agf_refcount_level) < 1 ||
 	     be32_to_cpu(agf->agf_refcount_level) > XFS_BTREE_MAXLEVELS))
 		return __this_address;
From: Qu Wenruo <wqu@suse.com>
mainline inclusion
from mainline-5.2-rc1
commit ff2ac107fae2440b6877c615c0ac788d2a106ed7
category: bugfix
bugzilla: NA
CVE: CVE-2019-19036

---------------------------
Commit 1ba98d086fe3 ("Btrfs: detect corruption when non-root leaf has zero item") introduced a comprehensive root owner check.

However, it requires a pretty expensive tree search to locate the owner root, especially since the check is reused by the mandatory read-time and write-time tree-checker.

This patch removes that check and relies completely on the owner-based empty-leaf check, which is much faster and still works fine for most cases.

And since we skip the old root owner check, the write-time tree check can now be merged with btrfs_check_leaf_full().
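For contrast, a rough sketch of the cost difference (names taken from the removed hunk below; schematic only):

```c
/* Removed (expensive): a root-tree search plus an extent buffer
 * reference, on every empty-leaf check. */
check_root = btrfs_get_fs_root(fs_info, &key, false);	/* tree search */
eb = btrfs_root_node(check_root);	/* extra eb reference */
if (leaf != eb)				/* only now is the leaf known bogus */
	/* reject */;
free_extent_buffer(eb);

/* Kept (cheap): integer comparisons against the leaf header owner. */
if (owner == BTRFS_ROOT_TREE_OBJECTID /* || other must-not-be-empty trees */)
	return -EUCLEAN;
```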
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Conflict:
	fs/btrfs/tree-checker.c
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 fs/btrfs/tree-checker.c | 24 ------------------------
 1 file changed, 24 deletions(-)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 3ec712c..f1ae228 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -495,7 +495,6 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
 	 */
 	if (nritems == 0 && !btrfs_header_flag(leaf, BTRFS_HEADER_FLAG_RELOC)) {
 		u64 owner = btrfs_header_owner(leaf);
-		struct btrfs_root *check_root;

 		/* These trees must never be empty */
 		if (owner == BTRFS_ROOT_TREE_OBJECTID ||
@@ -509,29 +508,6 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
 				   owner);
 			return -EUCLEAN;
 		}
-		key.objectid = owner;
-		key.type = BTRFS_ROOT_ITEM_KEY;
-		key.offset = (u64)-1;
-
-		check_root = btrfs_get_fs_root(fs_info, &key, false);
-		/*
-		 * The only reason we also check NULL here is that during
-		 * open_ctree() some roots has not yet been set up.
-		 */
-		if (!IS_ERR_OR_NULL(check_root)) {
-			struct extent_buffer *eb;
-
-			eb = btrfs_root_node(check_root);
-			/* if leaf is the root, then it's fine */
-			if (leaf != eb) {
-				generic_err(fs_info, leaf, 0,
-			"invalid nritems, have %u should not be 0 for non-root leaf",
-					    nritems);
-				free_extent_buffer(eb);
-				return -EUCLEAN;
-			}
-			free_extent_buffer(eb);
-		}
 		return 0;
 	}