kernel@openeuler.org

August 2021

  • 19 participants
  • 146 discussions
[PATCH kernel-4.19] device core: Consolidate locking and unlocking of parent and device
by Yang Yingliang 25 Aug '21

From: Alexander Duyck <alexander.h.duyck(a)linux.intel.com> mainline inclusion from mainline-v5.1-rc1 commit ed88747 category: bugfix bugzilla: 176200 CVE: NA ------------------------------------------------- Try to consolidate all of the locking and unlocking of both the parent and device when attaching or removing a driver from a given device. To do that I first consolidated the lock pattern into two functions __device_driver_lock and __device_driver_unlock. After doing that I then created functions specific to attaching and detaching the driver while acquiring these locks. By doing this I was able to reduce the number of spots where we touch need_parent_lock from 12 down to 4. This patch should produce no functional changes, it is meant to be a code clean-up/consolidation only. Reviewed-by: Luis Chamberlain <mcgrof(a)kernel.org> Reviewed-by: Bart Van Assche <bvanassche(a)acm.org> Reviewed-by: Dan Williams <dan.j.williams(a)intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck(a)linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Signed-off-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com> Reviewed-by: Hanjun Guo <guohanjun(a)huawei.com> Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com> --- drivers/base/base.h | 2 + drivers/base/bus.c | 23 ++--------- drivers/base/dd.c | 95 +++++++++++++++++++++++++++++++++++---------- 3 files changed, 81 insertions(+), 39 deletions(-) diff --git a/drivers/base/base.h b/drivers/base/base.h index 559b047de9f75..2d270b8c731a0 100644 --- a/drivers/base/base.h +++ b/drivers/base/base.h 
@@ -128,6 +128,8 @@ extern int driver_add_groups(struct device_driver *drv, const struct attribute_group **groups); extern void driver_remove_groups(struct device_driver *drv, const struct attribute_group **groups); +int device_driver_attach(struct device_driver *drv, struct device *dev); +void device_driver_detach(struct device *dev); extern char *make_class_name(const char *name, struct kobject *kobj); diff --git a/drivers/base/bus.c b/drivers/base/bus.c index e06a57936cc96..38a09ca932a3b 100644 --- a/drivers/base/bus.c +++ b/drivers/base/bus.c @@ -187,11 +187,7 @@ static ssize_t unbind_store(struct device_driver *drv, const char *buf, dev = bus_find_device_by_name(bus, NULL, buf); if (dev && dev->driver == drv) { - if (dev->parent && dev->bus->need_parent_lock) - device_lock(dev->parent); - device_release_driver(dev); - if (dev->parent && dev->bus->need_parent_lock) - device_unlock(dev->parent); + device_driver_detach(dev); err = count; } put_device(dev); @@ -214,13 +210,7 @@ static ssize_t bind_store(struct device_driver *drv, const char *buf, dev = bus_find_device_by_name(bus, NULL, buf); if (dev && dev->driver == NULL && driver_match_device(drv, dev)) { - if (dev->parent && bus->need_parent_lock) - device_lock(dev->parent); - device_lock(dev); - err = driver_probe_device(drv, dev); - device_unlock(dev); - if (dev->parent && bus->need_parent_lock) - device_unlock(dev->parent); + err = device_driver_attach(drv, dev); if (err > 0) { /* success */ @@ -774,13 +764,8 @@ EXPORT_SYMBOL_GPL(bus_rescan_devices); */ int device_reprobe(struct device *dev) { - if (dev->driver) { - if (dev->parent && dev->bus->need_parent_lock) - device_lock(dev->parent); - device_release_driver(dev); - if (dev->parent && dev->bus->need_parent_lock) - device_unlock(dev->parent); - } + if (dev->driver) + device_driver_detach(dev); return bus_rescan_devices_helper(dev, NULL); } EXPORT_SYMBOL_GPL(device_reprobe); 
diff --git a/drivers/base/dd.c b/drivers/base/dd.c index 26ba7a99b7d5b..aca447eacdb2b 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c 
@@ -869,6 +869,64 @@ void device_initial_probe(struct device *dev) __device_attach(dev, true); } +/* + * __device_driver_lock - acquire locks needed to manipulate dev->drv + * @dev: Device we will update driver info for + * @parent: Parent device. Needed if the bus requires parent lock + * + * This function will take the required locks for manipulating dev->drv. + * Normally this will just be the @dev lock, but when called for a USB + * interface, @parent lock will be held as well. + */ +static void __device_driver_lock(struct device *dev, struct device *parent) +{ + if (parent && dev->bus->need_parent_lock) + device_lock(parent); + device_lock(dev); +} + +/* + * __device_driver_unlock - release locks needed to manipulate dev->drv + * @dev: Device we will update driver info for + * @parent: Parent device. Needed if the bus requires parent lock + * + * This function will release the required locks for manipulating dev->drv. + * Normally this will just be the the @dev lock, but when called for a + * USB interface, @parent lock will be released as well. + */ +static void __device_driver_unlock(struct device *dev, struct device *parent) +{ + device_unlock(dev); + if (parent && dev->bus->need_parent_lock) + device_unlock(parent); +} + +/** + * device_driver_attach - attach a specific driver to a specific device + * @drv: Driver to attach + * @dev: Device to attach it to + * + * Manually attach driver to a device. Will acquire both @dev lock and + * @dev->parent lock if needed. + */ +int device_driver_attach(struct device_driver *drv, struct device *dev) +{ + int ret = 0; + + __device_driver_lock(dev, dev->parent); + + /* + * If device has been removed or someone has already successfully + * bound a driver before us just skip the driver probe call. 
+ */ + if (!dev->p->dead && !dev->driver) + ret = driver_probe_device(drv, dev); + + __device_driver_unlock(dev, dev->parent); + + return ret; +} + static int __driver_attach(struct device *dev, void *data) { struct device_driver *drv = data; @@ -896,14 +954,7 @@ static int __driver_attach(struct device *dev, void *data) return ret; } /* ret > 0 means positive match */ - if (dev->parent && dev->bus->need_parent_lock) - device_lock(dev->parent); - device_lock(dev); - if (!dev->p->dead && !dev->driver) - driver_probe_device(drv, dev); - device_unlock(dev); - if (dev->parent && dev->bus->need_parent_lock) - device_unlock(dev->parent); + device_driver_attach(drv, dev); return 0; } @@ -936,15 +987,11 @@ static void __device_release_driver(struct device *dev, struct device *parent) pm_runtime_get_sync(dev); while (device_links_busy(dev)) { - device_unlock(dev); - if (parent && dev->bus->need_parent_lock) - device_unlock(parent); + __device_driver_unlock(dev, parent); device_links_unbind_consumers(dev); - if (parent && dev->bus->need_parent_lock) - device_lock(parent); - device_lock(dev); + __device_driver_lock(dev, parent); /* * A concurrent invocation of the same function might * have released the driver successfully while this one @@ -998,16 +1045,12 @@ void device_release_driver_internal(struct device *dev, struct device_driver *drv, struct device *parent) { - if (parent && dev->bus->need_parent_lock) - device_lock(parent); + __device_driver_lock(dev, parent); - device_lock(dev); if (!drv || drv == dev->driver) __device_release_driver(dev, parent); - device_unlock(dev); - if (parent && dev->bus->need_parent_lock) - device_unlock(parent); + __device_driver_unlock(dev, parent); } /** 
@@ -1032,6 +1075,18 @@ void device_release_driver(struct device *dev) } EXPORT_SYMBOL_GPL(device_release_driver); +/** + * device_driver_detach - detach driver from a specific device + * @dev: device to detach driver from + * + * Detach driver from device. Will acquire both @dev lock and @dev->parent + * lock if needed. + */ +void device_driver_detach(struct device *dev) +{ + device_release_driver_internal(dev, NULL, dev->parent); +} + /** * driver_detach - detach driver from all devices it controls. * @drv: driver. -- 2.25.1
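Review bot: typo in the new kerneldoc for __device_driver_unlock() in the dd.c hunk: "Normally this will just be the the @dev lock" doubles "the". Both helpers only look at dev->bus->need_parent_lock when @parent is non-NULL, which preserves the original condition exactly; since __device_release_driver() now drops and re-takes both locks around device_links_unbind_consumers() through these helpers, a one-line comment that the order (parent then dev on lock, dev then parent on unlock) is deliberate and must stay mirrored would help future readers.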
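Review bot: in the __driver_attach() hunk the return value of device_driver_attach() is discarded (`device_driver_attach(drv, dev); return 0;`). This matches the code it replaces, which also ignored driver_probe_device()'s result, so it is not a regression, but now that the helper returns int a short comment saying the result is intentionally dropped here would prevent a later "fix" that propagates it and stops the bus iteration early.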
[PATCH openEuler-1.0-LTS 1/2] ext4: make the updating inode data procedure atomic
by Yang Yingliang 25 Aug '21

From: Zhang Yi <yi.zhang(a)huawei.com> hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I476C7 CVE: NA --------------------------- Now that ext4_do_update_inode() return error before filling the whole inode data if we fail to set inode blocks in ext4_inode_blocks_set(). This error should never happen in theory since sb->s_maxbytes should not have allowed this, we have already init sb->s_maxbytes according to this feature in ext4_fill_super(). So even through that could only happen due to the filesystem corruption, we'd better to return after we finish updating the inode because it may left an uninitialized buffer and we could read this buffer later in "errors=continue" mode. This patch make the updating inode data procedure atomic, call EXT4_ERROR_INODE() after we dropping i_raw_lock after something bad happened, make sure that the inode is integrated, and also drop a BUG_ON and do some small cleanups. Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com> Reviewed-by: Jan Kara <jack(a)suse.cz> Reviewed-by: Yang Erkun <yangerkun(a)huawei.com> Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com> --- fs/ext4/inode.c | 44 ++++++++++++++++++++++++++++---------------- 1 file changed, 28 insertions(+), 16 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index b809d383cc5ae..a032f211b80cf 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5169,8 +5169,14 @@ static int ext4_inode_blocks_set(handle_t *handle, ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE); return 0; } + + /* + * This should never happen since sb->s_maxbytes should not have + * allowed this, sb->s_maxbytes was set according to the huge_file + * feature in ext4_fill_super(). + */ if (!ext4_has_feature_huge_file(sb)) - return -EFBIG; + return -EFSCORRUPTED; if (i_blocks <= 0xffffffffffffULL) { /* 
@@ -5277,16 +5283,14 @@ static int ext4_do_update_inode(handle_t *handle, spin_lock(&ei->i_raw_lock); - /* For fields not tracked in the in-memory inode, - * initialise them to zero for new inodes. */ + /* + * For fields not tracked in the in-memory inode, initialise them + * to zero for new inodes. + */ if (ext4_test_inode_state(inode, EXT4_STATE_NEW)) memset(raw_inode, 0, EXT4_SB(inode->i_sb)->s_inode_size); err = ext4_inode_blocks_set(handle, raw_inode, ei); - if (err) { - spin_unlock(&ei->i_raw_lock); - goto out_brelse; - } raw_inode->i_mode = cpu_to_le16(inode->i_mode); i_uid = i_uid_read(inode); @@ -5295,10 +5299,11 @@ static int ext4_do_update_inode(handle_t *handle, if (!(test_opt(inode->i_sb, NO_UID32))) { raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid)); raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid)); -/* - * Fix up interoperability with old kernels. Otherwise, old inodes get - * re-used with the upper 16 bits of the uid/gid intact - */ + /* + * Fix up interoperability with old kernels. Otherwise, + * old inodes get re-used with the upper 16 bits of the + * uid/gid intact. + */ if (ei->i_dtime && list_empty(&ei->i_orphan)) { raw_inode->i_uid_high = 0; raw_inode->i_gid_high = 0; @@ -5367,8 +5372,9 @@ static int ext4_do_update_inode(handle_t *handle, } } - BUG_ON(!ext4_has_feature_project(inode->i_sb) && - i_projid != EXT4_DEF_PROJID); + if (i_projid != EXT4_DEF_PROJID && + !ext4_has_feature_project(inode->i_sb)) + err = err ?: -EFSCORRUPTED; if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE && EXT4_FITS_IN_INODE(raw_inode, ei, i_projid)) @@ -5376,6 +5382,11 @@ static int ext4_do_update_inode(handle_t *handle, ext4_inode_csum_set(inode, raw_inode, ei); spin_unlock(&ei->i_raw_lock); + if (err) { + EXT4_ERROR_INODE(inode, "corrupted inode contents"); + goto out_brelse; + } + if (inode->i_sb->s_flags & SB_LAZYTIME) ext4_update_other_inodes_time(inode->i_sb, inode->i_ino, bh->b_data); 
@@ -5383,13 +5394,13 @@ static int ext4_do_update_inode(handle_t *handle, BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata"); err = ext4_handle_dirty_metadata(handle, NULL, bh); if (err) - goto out_brelse; + goto out_error; ext4_clear_inode_state(inode, EXT4_STATE_NEW); if (set_large_file) { BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "get write access"); err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh); if (err) - goto out_brelse; + goto out_error; lock_buffer(EXT4_SB(sb)->s_sbh); ext4_set_feature_large_file(sb); ext4_superblock_csum_set(sb); @@ -5399,9 +5410,10 @@ static int ext4_do_update_inode(handle_t *handle, EXT4_SB(sb)->s_sbh); } ext4_update_inode_fsync_trans(handle, inode, need_datasync); +out_error: + ext4_std_error(inode->i_sb, err); out_brelse: brelse(bh); - ext4_std_error(inode->i_sb, err); return err; } -- 2.25.1
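Review bot: in the last hunk the success path now falls through the new out_error: label, so ext4_std_error(inode->i_sb, err) is reached with err == 0 on every successful update. That is harmless only because ext4_std_error() does nothing for a zero errno; a comment above the label would save the next reader from checking. The corrupted-inode path deliberately jumps to out_brelse: instead, skipping ext4_std_error() because EXT4_ERROR_INODE() has already reported; stating that in the comment block above the `if (err)` would make the two exits easier to tell apart.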
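Review bot: the BUG_ON() replacement in the projid hunk uses `err = err ?: -EFSCORRUPTED`, which keeps the earlier error from ext4_inode_blocks_set() when both checks trip; good. Both failures, however, end up in the same EXT4_ERROR_INODE(inode, "corrupted inode contents") message, so an operator cannot tell a bad i_blocks from a projid set on a filesystem without the project feature. Consider logging which check failed (or at least the errno) in that message.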
[PATCH kernel-4.19 1/3] ext4: move inode eio simulation behind io completion
by Yang Yingliang 25 Aug '21

From: Zhang Yi <yi.zhang(a)huawei.com> hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I476C7 CVE: NA --------------------------- No EIO simulation is required if the buffer is uptodate, so move the simulation behind read bio completeion just like inode/block bitmap simulation does. Link: https://lore.kernel.org/linux-ext4/20210821065450.1397451-2-yi.zhang@huawei… Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com> Reviewed-by: Jan Kara <jack(a)suse.cz> Reviewed-by: Yang Erkun <yangerkun(a)huawei.com> Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com> --- fs/ext4/inode.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 800e6e3de40aa..1b2ccf61354f3 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4635,8 +4635,6 @@ static int __ext4_get_inode_loc(struct inode *inode, bh = sb_getblk(sb, block); if (unlikely(!bh)) return -ENOMEM; - if (ext4_simulate_fail(sb, EXT4_SIM_INODE_EIO)) - goto simulate_eio; if (!buffer_uptodate(bh)) { lock_buffer(bh); @@ -4721,8 +4719,8 @@ static int __ext4_get_inode_loc(struct inode *inode, trace_ext4_load_inode(inode); ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, NULL); wait_on_buffer(bh); + ext4_simulate_fail_bh(sb, bh, EXT4_SIM_INODE_EIO); if (!buffer_uptodate(bh)) { - simulate_eio: ext4_error_inode_block(inode, block, EIO, "unable to read itable block"); brelse(bh); -- 2.25.1
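Review bot: the second hunk calls ext4_simulate_fail_bh(sb, bh, EXT4_SIM_INODE_EIO), which this diff does not introduce, so the backport depends on that helper already being present in kernel-4.19 (the message points at the inode/block bitmap simulation as the model). Naming that dependency in the commit message, or in the cover letter of the 1/3 series, would stop the hunk failing to build on a tree that lacks it. Behaviourally, moving the check after wait_on_buffer() means a simulated EIO no longer skips the real read, which matches the stated intent.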
[PATCH openEuler-1.0-LTS 1/4] Bluetooth: defer cleanup of resources in hci_unregister_dev()
by Yang Yingliang 25 Aug '21

From: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp> stable inclusion from linux-4.19.203 commit 3719acc161d5c1ce09912cc1c9eddc2c5faa3c66 -------------------------------- [ Upstream commit e04480920d1eec9c061841399aa6f35b6f987d8b ] syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to calling lock_sock() with rw spinlock held [1]. It seems that history of this locking problem is a trial and error. Commit b40df5743ee8 ("[PATCH] bluetooth: fix socket locking in hci_sock_dev_event()") in 2.6.21-rc4 changed bh_lock_sock() to lock_sock() as an attempt to fix lockdep warning. Then, commit 4ce61d1c7a8e ("[BLUETOOTH]: Fix locking in hci_sock_dev_event().") in 2.6.22-rc2 changed lock_sock() to local_bh_disable() + bh_lock_sock_nested() as an attempt to fix the sleep in atomic context warning. Then, commit 4b5dd696f81b ("Bluetooth: Remove local_bh_disable() from hci_sock.c") in 3.3-rc1 removed local_bh_disable(). Then, commit e305509e678b ("Bluetooth: use correct lock to prevent UAF of hdev object") in 5.13-rc5 again changed bh_lock_sock_nested() to lock_sock() as an attempt to fix CVE-2021-3573. This difficulty comes from current implementation that hci_sock_dev_event(HCI_DEV_UNREG) is responsible for dropping all references from sockets because hci_unregister_dev() immediately reclaims resources as soon as returning from hci_sock_dev_event(HCI_DEV_UNREG). But the history suggests that hci_sock_dev_event(HCI_DEV_UNREG) was not doing what it should do. Therefore, instead of trying to detach sockets from device, let's accept not detaching sockets from device at hci_sock_dev_event(HCI_DEV_UNREG), by moving actual cleanup of resources from hci_unregister_dev() to hci_cleanup_dev() which is called by bt_host_release() when all references to this unregistered device (which is a kobject) are gone. 
Since hci_sock_dev_event(HCI_DEV_UNREG) no longer resets hci_pi(sk)->hdev, we need to check whether this device was unregistered and return an error based on HCI_UNREGISTER flag. There might be subtle behavioral difference in "monitor the hdev" functionality; please report if you found something went wrong due to this patch. Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1] Reported-by: syzbot <syzbot+a5df189917e79d5e59c9(a)syzkaller.appspotmail.com> Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org> Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp> Fixes: e305509e678b ("Bluetooth: use correct lock to prevent UAF of hdev object") Acked-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com> --- include/net/bluetooth/hci_core.h | 1 + net/bluetooth/hci_core.c | 16 +++++------ net/bluetooth/hci_sock.c | 49 +++++++++++++++++++++----------- net/bluetooth/hci_sysfs.c | 3 ++ 4 files changed, 45 insertions(+), 24 deletions(-) diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index d8ea8b5ab3572..6960997854255 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -1042,6 +1042,7 @@ struct hci_dev *hci_alloc_dev(void); void hci_free_dev(struct hci_dev *hdev); int hci_register_dev(struct hci_dev *hdev); void hci_unregister_dev(struct hci_dev *hdev); +void hci_cleanup_dev(struct hci_dev *hdev); int hci_suspend_dev(struct hci_dev *hdev); int hci_resume_dev(struct hci_dev *hdev); int hci_reset_dev(struct hci_dev *hdev); diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index fb47fe995cefd..94d64cdc8af62 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c 
@@ -3259,14 +3259,10 @@ EXPORT_SYMBOL(hci_register_dev); /* Unregister HCI device */ void hci_unregister_dev(struct hci_dev *hdev) { - int id; - BT_DBG("%p name %s bus %d", hdev, hdev->name, hdev->bus); hci_dev_set_flag(hdev, HCI_UNREGISTER); - id = hdev->id; - write_lock(&hci_dev_list_lock); list_del(&hdev->list); write_unlock(&hci_dev_list_lock); @@ -3295,7 +3291,14 @@ void hci_unregister_dev(struct hci_dev *hdev) } device_del(&hdev->dev); + /* Actual cleanup is deferred until hci_cleanup_dev(). */ + hci_dev_put(hdev); +} +EXPORT_SYMBOL(hci_unregister_dev); +/* Cleanup HCI device */ +void hci_cleanup_dev(struct hci_dev *hdev) +{ debugfs_remove_recursive(hdev->debugfs); kfree_const(hdev->hw_info); kfree_const(hdev->fw_info); @@ -3318,11 +3321,8 @@ void hci_unregister_dev(struct hci_dev *hdev) hci_discovery_filter_clear(hdev); hci_dev_unlock(hdev); - hci_dev_put(hdev); - - ida_simple_remove(&hci_index_ida, id); + ida_simple_remove(&hci_index_ida, hdev->id); } -EXPORT_SYMBOL(hci_unregister_dev); /* Suspend HCI device */ int hci_suspend_dev(struct hci_dev *hdev) diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index 78788e52a0399..45c12639bdc1c 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -59,6 +59,17 @@ struct hci_pinfo { char comm[TASK_COMM_LEN]; }; +static struct hci_dev *hci_hdev_from_sock(struct sock *sk) +{ + struct hci_dev *hdev = hci_pi(sk)->hdev; + + if (!hdev) + return ERR_PTR(-EBADFD); + if (hci_dev_test_flag(hdev, HCI_UNREGISTER)) + return ERR_PTR(-EPIPE); + return hdev; +} + void hci_sock_set_flag(struct sock *sk, int nr) { set_bit(nr, &hci_pi(sk)->flags); 
@@ -752,19 +763,13 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) if (event == HCI_DEV_UNREG) { struct sock *sk; - /* Detach sockets from device */ + /* Wake up sockets using this dead device */ read_lock(&hci_sk_list.lock); sk_for_each(sk, &hci_sk_list.head) { - lock_sock(sk); if (hci_pi(sk)->hdev == hdev) { - hci_pi(sk)->hdev = NULL; sk->sk_err = EPIPE; - sk->sk_state = BT_OPEN; sk->sk_state_change(sk); - - hci_dev_put(hdev); } - release_sock(sk); } read_unlock(&hci_sk_list.lock); } @@ -920,10 +925,10 @@ static int hci_sock_blacklist_del(struct hci_dev *hdev, void __user *arg) static int hci_sock_bound_ioctl(struct sock *sk, unsigned int cmd, unsigned long arg) { - struct hci_dev *hdev = hci_pi(sk)->hdev; + struct hci_dev *hdev = hci_hdev_from_sock(sk); - if (!hdev) - return -EBADFD; + if (IS_ERR(hdev)) + return PTR_ERR(hdev); if (hci_dev_test_flag(hdev, HCI_USER_CHANNEL)) return -EBUSY; @@ -1077,6 +1082,18 @@ static int hci_sock_bind(struct socket *sock, struct sockaddr *addr, lock_sock(sk); + /* Allow detaching from dead device and attaching to alive device, if + * the caller wants to re-bind (instead of close) this socket in + * response to hci_sock_dev_event(HCI_DEV_UNREG) notification. + */ + hdev = hci_pi(sk)->hdev; + if (hdev && hci_dev_test_flag(hdev, HCI_UNREGISTER)) { + hci_pi(sk)->hdev = NULL; + sk->sk_state = BT_OPEN; + hci_dev_put(hdev); + } + hdev = NULL; + if (sk->sk_state == BT_BOUND) { err = -EALREADY; goto done; @@ -1353,9 +1370,9 @@ static int hci_sock_getname(struct socket *sock, struct sockaddr *addr, lock_sock(sk); - hdev = hci_pi(sk)->hdev; - if (!hdev) { - err = -EBADFD; + hdev = hci_hdev_from_sock(sk); + if (IS_ERR(hdev)) { + err = PTR_ERR(hdev); goto done; } @@ -1715,9 +1732,9 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg, goto done; } - hdev = hci_pi(sk)->hdev; - if (!hdev) { - err = -EBADFD; + hdev = hci_hdev_from_sock(sk); + if (IS_ERR(hdev)) { + err = PTR_ERR(hdev); goto done; } 
diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c index 9874844a95a98..b69d88b88d2e4 100644 --- a/net/bluetooth/hci_sysfs.c +++ b/net/bluetooth/hci_sysfs.c @@ -83,6 +83,9 @@ void hci_conn_del_sysfs(struct hci_conn *conn) static void bt_host_release(struct device *dev) { struct hci_dev *hdev = to_hci_dev(dev); + + if (hci_dev_test_flag(hdev, HCI_UNREGISTER)) + hci_cleanup_dev(hdev); kfree(hdev); module_put(THIS_MODULE); } -- 2.25.1
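Review bot: with ida_simple_remove(&hci_index_ida, hdev->id) moved from hci_unregister_dev() into hci_cleanup_dev(), the hciN index stays allocated until the last reference (including any socket still bound to the dead device) is dropped in bt_host_release(). A controller re-plugged while such a socket exists will get a fresh index instead of reusing the old one. That is the right trade-off for the use-after-free, but it is a user-visible naming change and deserves a sentence in the commit message next to the "monitor the hdev" caveat.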
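Review bot: in the hci_sock.c hunks, hci_sock_dev_event(HCI_DEV_UNREG) now sets sk->sk_err and calls sk_state_change() under only the hci_sk_list read lock, with no lock_sock(); that is what removes the might_sleep() warning, and the new hci_hdev_from_sock() makes later bound-socket paths fail with -EPIPE, so stale access is covered. In the hci_sock_bind() hunk the HCI_UNREGISTER test on hci_pi(sk)->hdev runs with the socket locked, so the pointer cannot be cleared underneath it; a comment that the hci_dev_put() there is the reference hci_sock_dev_event() no longer drops would make the refcount balance obvious to the reader.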
[PATCH kernel-4.19 1/3] Bluetooth: schedule SCO timeouts with delayed_work
by Yang Yingliang 25 Aug '21

From: Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com> mainline inclusion from mainline-v5.15 commit ba316be1b6a00db7126ed9a39f9bee434a508043 category: bugfix bugzilla: NA CVE: CVE-2021-3640 --------------------------- struct sock.sk_timer should be used as a sock cleanup timer. However, SCO uses it to implement sock timeouts. This causes issues because struct sock.sk_timer's callback is run in an IRQ context, and the timer callback function sco_sock_timeout takes a spin lock on the socket. However, other functions such as sco_conn_del and sco_conn_ready take the spin lock with interrupts enabled. This inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} lock usage could lead to deadlocks as reported by Syzbot [1]: CPU0 ---- lock(slock-AF_BLUETOOTH-BTPROTO_SCO); <Interrupt> lock(slock-AF_BLUETOOTH-BTPROTO_SCO); To fix this, we use delayed work to implement SCO sock timouts instead. This allows us to avoid taking the spin lock on the socket in an IRQ context, and corrects the misuse of struct sock.sk_timer. As a note, cancel_delayed_work is used instead of cancel_delayed_work_sync in sco_sock_set_timer and sco_sock_clear_timer to avoid a deadlock. In the future, the call to bh_lock_sock inside sco_sock_timeout should be changed to lock_sock to synchronize with other functions using lock_sock. However, since sco_sock_set_timer and sco_sock_clear_timer are sometimes called under the locked socket (in sco_connect and __sco_sock_close), cancel_delayed_work_sync might cause them to sleep until an sco_sock_timeout that has started finishes running. But sco_sock_timeout would also sleep until it can grab the lock_sock. Using cancel_delayed_work is fine because sco_sock_timeout does not change from run to run, hence there is no functional difference between: 1. waiting for a timeout to finish running before scheduling another timeout 2. scheduling another timeout while a timeout is running. 
Link: https://syzkaller.appspot.com/bug?id=9089d89de0502e120f234ca0fc8a703f7368b3… [1] Reported-by: syzbot+2f6d7c28bb4bf7e82060(a)syzkaller.appspotmail.com Tested-by: syzbot+2f6d7c28bb4bf7e82060(a)syzkaller.appspotmail.com Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com> Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com> Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com> --- net/bluetooth/sco.c | 35 +++++++++++++++++++++++++++++------ 1 file changed, 29 insertions(+), 6 deletions(-) diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c index a4ca55df73908..e30151e81566f 100644 --- a/net/bluetooth/sco.c +++ b/net/bluetooth/sco.c @@ -48,6 +48,8 @@ struct sco_conn { spinlock_t lock; struct sock *sk; + struct delayed_work timeout_work; + unsigned int mtu; }; @@ -73,9 +75,20 @@ struct sco_pinfo { #define SCO_CONN_TIMEOUT (HZ * 40) #define SCO_DISCONN_TIMEOUT (HZ * 2) -static void sco_sock_timeout(struct timer_list *t) +static void sco_sock_timeout(struct work_struct *work) { - struct sock *sk = from_timer(sk, t, sk_timer); + struct sco_conn *conn = container_of(work, struct sco_conn, + timeout_work.work); + struct sock *sk; + + sco_conn_lock(conn); + sk = conn->sk; + if (sk) + sock_hold(sk); + sco_conn_unlock(conn); + + if (!sk) + return; BT_DBG("sock %p state %d", sk, sk->sk_state); 
@@ -90,14 +103,21 @@ static void sco_sock_timeout(struct timer_list *t) static void sco_sock_set_timer(struct sock *sk, long timeout) { + if (!sco_pi(sk)->conn) + return; + BT_DBG("sock %p state %d timeout %ld", sk, sk->sk_state, timeout); - sk_reset_timer(sk, &sk->sk_timer, jiffies + timeout); + cancel_delayed_work(&sco_pi(sk)->conn->timeout_work); + schedule_delayed_work(&sco_pi(sk)->conn->timeout_work, timeout); } static void sco_sock_clear_timer(struct sock *sk) { + if (!sco_pi(sk)->conn) + return; + BT_DBG("sock %p state %d", sk, sk->sk_state); - sk_stop_timer(sk, &sk->sk_timer); + cancel_delayed_work(&sco_pi(sk)->conn->timeout_work); } /* ---- SCO connections ---- */ @@ -178,6 +198,9 @@ static void sco_conn_del(struct hci_conn *hcon, int err) bh_unlock_sock(sk); sco_sock_kill(sk); sock_put(sk); + + /* Ensure no more work items will run before freeing conn. */ + cancel_delayed_work_sync(&conn->timeout_work); } hcon->sco_data = NULL; @@ -192,6 +215,8 @@ static void __sco_chan_add(struct sco_conn *conn, struct sock *sk, sco_pi(sk)->conn = conn; conn->sk = sk; + INIT_DELAYED_WORK(&conn->timeout_work, sco_sock_timeout); + if (parent) bt_accept_enqueue(parent, sk, true); } @@ -488,8 +513,6 @@ static struct sock *sco_sock_alloc(struct net *net, struct socket *sock, sco_pi(sk)->setting = BT_VOICE_CVSD_16BIT; - timer_setup(&sk->sk_timer, sco_sock_timeout, 0); - bt_sock_link(&sco_sk_list, sk); return sk; } -- 2.25.1
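Review bot: INIT_DELAYED_WORK(&conn->timeout_work, sco_sock_timeout) is placed in __sco_chan_add(), which runs every time a socket is attached to a conn, while the matching cancel_delayed_work_sync() in sco_conn_del() sits inside the `if (sk)` branch. A conn that is re-used for a second socket gets its work re-initialised, and a conn that reaches sco_conn_del() with conn->sk already NULL is freed with a timeout possibly still queued. Initialising the work once where the conn is allocated and cancelling it unconditionally where the conn is freed would close both windows.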
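Review bot: sco_sock_timeout() now takes sock_hold(sk) under the conn lock before touching the socket; the rest of the function is outside this hunk, so please confirm every exit path ends with a matching sock_put(sk), otherwise each timeout leaks a socket reference. The early `if (!sk) return;` is correct since nothing was held at that point. Also, sco_sock_set_timer()/sco_sock_clear_timer() read sco_pi(sk)->conn twice without the conn lock; if conn can be cleared concurrently the second dereference is the one that matters, so caching it in a local would make the check and the use operate on the same pointer.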
[PATCH openEuler-1.0-LTS 1/5] mm: rmap: explicitly reset vma->anon_vma in unlink_anon_vmas()
by Yang Yingliang 24 Aug '21

From: Li Xinhai <lixinhai.lxh(a)gmail.com> mainline inclusion from mainline-v5.12-rc1 commit ee8ab1903e3d912d8f10bedbf96c3b6a1c8cbede category: bugfix bugzilla: 175120 CVE: NA ------------------------------------------------- In case the vma will continue to be used after unlink its relevant anon_vma, we need to reset the vma->anon_vma pointer to NULL. So, later when fault happen within this vma again, a new anon_vma will be prepared. By this way, the vma will only be checked for reverse mapping of pages which been fault in after the unlink_anon_vmas call. Currently, the mremap with MREMAP_DONTUNMAP scenario will continue use the vma after moved its page table entries to a new vma. For other scenarios, the vma itself will be freed after call unlink_anon_vmas. Link: https://lkml.kernel.org/r/20210119075126.3513154-1-lixinhai.lxh@gmail.com Signed-off-by: Li Xinhai <lixinhai.lxh(a)gmail.com> Cc: Andrea Arcangeli <aarcange(a)redhat.com> Cc: Brian Geffon <bgeffon(a)google.com> Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Cc: Lokesh Gidra <lokeshgidra(a)google.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Vlastimil Babka <vbabka(a)suse.cz> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> Signed-off-by: Nanyong Sun <sunnanyong(a)huawei.com> Reviewed-by: tong tiangen <tongtiangen(a)huawei.com> Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com> --- mm/rmap.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/mm/rmap.c b/mm/rmap.c index 738e07ee35345..7debdf0cc6785 100644 --- a/mm/rmap.c +++ b/mm/rmap.c 
@@ -406,8 +406,15 @@ void unlink_anon_vmas(struct vm_area_struct *vma) list_del(&avc->same_vma); anon_vma_chain_free(avc); } - if (vma->anon_vma) + if (vma->anon_vma) { vma->anon_vma->degree--; + + /* + * vma would still be needed after unlink, and anon_vma will be prepared + * when handle fault. + */ + vma->anon_vma = NULL; + } unlock_anon_vma_root(root); /* -- 2.25.1
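Review bot: setting vma->anon_vma = NULL is only meaningful if the vma is used again after unlink_anon_vmas(); the commit message justifies it with mremap(MREMAP_DONTUNMAP), so the message should confirm that this branch carries that mremap mode (this is 1/5 of a series, presumably the rest brings it in). Without it the hunk is harmless but unmotivated on this tree. Minor wording in the added comment: "when handle fault" reads better as "when a fault is handled".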
[PATCH kernel-4.19 1/5] mm: rmap: explicitly reset vma->anon_vma in unlink_anon_vmas()
by Yang Yingliang 24 Aug '21

From: Li Xinhai <lixinhai.lxh(a)gmail.com> mainline inclusion from mainline-v5.12-rc1 commit ee8ab1903e3d912d8f10bedbf96c3b6a1c8cbede category: bugfix bugzilla: 175120 CVE: NA ------------------------------------------------- In case the vma will continue to be used after unlink its relevant anon_vma, we need to reset the vma->anon_vma pointer to NULL. So, later when fault happen within this vma again, a new anon_vma will be prepared. By this way, the vma will only be checked for reverse mapping of pages which been fault in after the unlink_anon_vmas call. Currently, the mremap with MREMAP_DONTUNMAP scenario will continue use the vma after moved its page table entries to a new vma. For other scenarios, the vma itself will be freed after call unlink_anon_vmas. Link: https://lkml.kernel.org/r/20210119075126.3513154-1-lixinhai.lxh@gmail.com Signed-off-by: Li Xinhai <lixinhai.lxh(a)gmail.com> Cc: Andrea Arcangeli <aarcange(a)redhat.com> Cc: Brian Geffon <bgeffon(a)google.com> Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Cc: Lokesh Gidra <lokeshgidra(a)google.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Vlastimil Babka <vbabka(a)suse.cz> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> Signed-off-by: Nanyong Sun <sunnanyong(a)huawei.com> Reviewed-by: tong tiangen <tongtiangen(a)huawei.com> Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com> --- mm/rmap.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/mm/rmap.c b/mm/rmap.c index 738e07ee35345..7debdf0cc6785 100644 --- a/mm/rmap.c +++ b/mm/rmap.c 
@@ -406,8 +406,15 @@ void unlink_anon_vmas(struct vm_area_struct *vma) list_del(&avc->same_vma); anon_vma_chain_free(avc); } - if (vma->anon_vma) + if (vma->anon_vma) { vma->anon_vma->degree--; + + /* + * vma would still be needed after unlink, and anon_vma will be prepared + * when handle fault. + */ + vma->anon_vma = NULL; + } unlock_anon_vma_root(root); /* -- 2.25.1
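Review bot: the new comment inside the `if (vma->anon_vma)` block states as a fact that the vma "would still be needed after unlink", but the commit message says only the mremap MREMAP_DONTUNMAP path keeps the vma alive and every other caller frees it right after unlink_anon_vmas(); reword it to say the reset is for callers that keep using the vma, e.g. `/* the vma may still be used after unlink (mremap with MREMAP_DONTUNMAP); a new anon_vma is prepared on the next fault. */`. Since this is a kernel-4.19 backport of a v5.12-rc1 change, the backport notes should also say whether that MREMAP_DONTUNMAP path exists in the target tree, because without it no caller observes the reset.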
[PATCH openEuler-21.09] X86/config: Enable CONFIG_USERSWAP
by Zheng Zengkai 24 Aug '21

From: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I40AXF
CVE: NA

--------------------------------------

Enable CONFIG_USERSWAP for openeuler_defconfig

Signed-off-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Reviewed-by: tong tiangen <tongtiangen(a)huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
 arch/x86/configs/openeuler_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
index e2a7fda97fa3..89dc79bd375d 100644
--- a/arch/x86/configs/openeuler_defconfig
+++ b/arch/x86/configs/openeuler_defconfig
@@ -8510,3 +8510,4 @@ CONFIG_ARCH_HAS_KCOV=y # end of Kernel hacking CONFIG_ETMEM_SCAN=m CONFIG_ETMEM_SWAP=m +CONFIG_USERSWAP=y -- 2.20.1
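Review bot: `+CONFIG_USERSWAP=y` is hand-appended as the very last line of the file, after `# end of Kernel hacking` and the two ETMEM options, which is not where a generated config places the symbol unless its Kconfig entry really is the last one in the menu tree; the next regeneration will move it and produce a noisy diff. Regenerate instead (`make ARCH=x86 openeuler_defconfig`, then copy the resulting .config back) and check that the output still contains `CONFIG_USERSWAP=y`; that also proves the symbol's dependencies are satisfied in this tree rather than silently dropped, and a one-line reason for enabling it would help, as the commit message currently only repeats the subject.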
[PATCH openEuler-1.0-LTS 1/4] bpf/verifier: per-register parent pointers
by Yang Yingliang 24 Aug '21

From: Edward Cree <ecree(a)solarflare.com>

mainline inclusion
from mainline-v4.20-rc1
commit 679c782de14bd48c19dd74cd1af20a2bc05dd936
category: feature
bugzilla: 43460
CVE: NA

---------------------------------------

By giving each register its own liveness chain, we elide the skip_callee() logic. Instead, each register's parent is the state it inherits from; both check_func_call() and prepare_func_exit() automatically connect reg states to the correct chain since when they copy the reg state across (r1-r5 into the callee as args, and r0 out as the return value) they also copy the parent pointer.

Signed-off-by: Edward Cree <ecree(a)solarflare.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>

Conflicts:
	kernel/bpf/verifier.c
[liuxin:solve the conflicts in verifier.c]
Signed-off-by: liuxin <liuxin264(a)huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
 include/linux/bpf_verifier.h |   8 +-
 kernel/bpf/verifier.c        | 183 +++++++++--------------------------
 2 files changed, 47 insertions(+), 144 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 1c8517320ea64..daab0960c0544 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -41,6 +41,7 @@ enum bpf_reg_liveness { }; struct bpf_reg_state { + /* Ordering of fields matters. See states_equal() */ enum bpf_reg_type type; union { /* valid when type == PTR_TO_PACKET */ @@ -62,7 +63,6 @@ struct bpf_reg_state { * came from, when one is tested for != NULL. */ u32 id; - /* Ordering of fields matters. See states_equal() */ /* For scalar types (SCALAR_VALUE), this represents our knowledge of * the actual value. * For pointer types, this represents the variable part of the offset
@@ -79,15 +79,15 @@ struct bpf_reg_state { s64 smax_value; /* maximum possible (s64)value */ u64 umin_value; /* minimum possible (u64)value */ u64 umax_value; /* maximum possible (u64)value */ + /* parentage chain for liveness checking */ + struct bpf_reg_state *parent; /* Inside the callee two registers can be both PTR_TO_STACK like * R1=fp-8 and R2=fp-8, but one of them points to this function stack * while another to the caller's stack. To differentiate them 'frameno' * is used which is an index in bpf_verifier_state->frame[] array * pointing to bpf_func_state. - * This field must be second to last, for states_equal() reasons. */ u32 frameno; - /* This field must be last, for states_equal() reasons. */ enum bpf_reg_liveness live; }; @@ -110,7 +110,6 @@ struct bpf_stack_state { */ struct bpf_func_state { struct bpf_reg_state regs[MAX_BPF_REG]; - struct bpf_verifier_state *parent; /* index of call instruction that called into this func */ int callsite; /* stack frame number of this function state from pov of @@ -132,7 +131,6 @@ struct bpf_func_state { struct bpf_verifier_state { /* call stack tracking */ struct bpf_func_state *frame[MAX_CALL_FRAMES]; - struct bpf_verifier_state *parent; u32 curframe; bool speculative; }; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 650cec781f5a8..0bb6664ced7e1 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c 
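Review bot: with `parent` inserted between umax_value and frameno and the "must be second to last" / "must be last" comments on frameno and live deleted, the only statement of the layout rule left in the struct is the relocated "Ordering of fields matters. See states_equal()" line at the top, which no longer says which fields are compared. State the rule where it bites, on the new field itself, e.g. `/* parentage chain for liveness checking; this field and everything after it are excluded from the regsafe() memcmp */`, so that someone adding a field later does not place it after `parent` and silently drop it from state comparison (regsafe() in this patch now compares `offsetof(struct bpf_reg_state, parent)` bytes).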
@@ -381,9 +381,9 @@ static int copy_stack_state(struct bpf_func_state *dst, /* do_check() starts with zero-sized stack in struct bpf_verifier_state to * make it consume minimal amount of memory. check_stack_write() access from * the program calls into realloc_func_state() to grow the stack size. - * Note there is a non-zero 'parent' pointer inside bpf_verifier_state - * which this function copies over. It points to previous bpf_verifier_state - * which is never reallocated + * Note there is a non-zero parent pointer inside each reg of bpf_verifier_state + * which this function copies over. It points to corresponding reg in previous + * bpf_verifier_state which is never reallocated */ static int realloc_func_state(struct bpf_func_state *state, int size, bool copy_old) @@ -468,7 +468,6 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state, } dst_state->speculative = src->speculative; dst_state->curframe = src->curframe; - dst_state->parent = src->parent; for (i = 0; i <= src->curframe; i++) { dst = dst_state->frame[i]; if (!dst) { @@ -740,6 +739,7 @@ static void init_reg_state(struct bpf_verifier_env *env, for (i = 0; i < MAX_BPF_REG; i++) { mark_reg_not_init(env, regs, i); regs[i].live = REG_LIVE_NONE; + regs[i].parent = NULL; } /* frame pointer */ 
@@ -884,74 +884,21 @@ static int check_subprogs(struct bpf_verifier_env *env) return 0; } -static -struct bpf_verifier_state *skip_callee(struct bpf_verifier_env *env, - const struct bpf_verifier_state *state, - struct bpf_verifier_state *parent, - u32 regno) -{ - struct bpf_verifier_state *tmp = NULL; - - /* 'parent' could be a state of caller and - * 'state' could be a state of callee. In such case - * parent->curframe < state->curframe - * and it's ok for r1 - r5 registers - * - * 'parent' could be a callee's state after it bpf_exit-ed. - * In such case parent->curframe > state->curframe - * and it's ok for r0 only - */ - if (parent->curframe == state->curframe || - (parent->curframe < state->curframe && - regno >= BPF_REG_1 && regno <= BPF_REG_5) || - (parent->curframe > state->curframe && - regno == BPF_REG_0)) - return parent; - - if (parent->curframe > state->curframe && - regno >= BPF_REG_6) { - /* for callee saved regs we have to skip the whole chain - * of states that belong to callee and mark as LIVE_READ - * the registers before the call - */ - tmp = parent; - while (tmp && tmp->curframe != state->curframe) { - tmp = tmp->parent; - } - if (!tmp) - goto bug; - parent = tmp; - } else { - goto bug; - } - return parent; -bug: - verbose(env, "verifier bug regno %d tmp %p\n", regno, tmp); - verbose(env, "regno %d parent frame %d current frame %d\n", - regno, parent->curframe, state->curframe); - return NULL; -} - +/* Parentage chain of this register (or stack slot) should take care of all + * issues like callee-saved registers, stack slot allocation time, etc. 
+ */ static int mark_reg_read(struct bpf_verifier_env *env, - const struct bpf_verifier_state *state, - struct bpf_verifier_state *parent, - u32 regno) + const struct bpf_reg_state *state, + struct bpf_reg_state *parent) { bool writes = parent == state->parent; /* Observe write marks */ - if (regno == BPF_REG_FP) - /* We don't need to worry about FP liveness because it's read-only */ - return 0; - while (parent) { /* if read wasn't screened by an earlier write ... */ - if (writes && state->frame[state->curframe]->regs[regno].live & REG_LIVE_WRITTEN) + if (writes && state->live & REG_LIVE_WRITTEN) break; - parent = skip_callee(env, state, parent, regno); - if (!parent) - return -EFAULT; /* ... then we depend on parent's value */ - parent->frame[parent->curframe]->regs[regno].live |= REG_LIVE_READ; + parent->live |= REG_LIVE_READ; state = parent; parent = state->parent; writes = true; @@ -977,7 +924,10 @@ static int check_reg_arg(struct bpf_verifier_env *env, u32 regno, verbose(env, "R%d !read_ok\n", regno); return -EACCES; } - return mark_reg_read(env, vstate, vstate->parent, regno); + /* We don't need to worry about FP liveness because it's read-only */ + if (regno != BPF_REG_FP) + return mark_reg_read(env, &regs[regno], + regs[regno].parent); } else { /* check whether register used as dest operand can be written to */ if (regno == BPF_REG_FP) { @@ -1088,8 +1038,8 @@ static int check_stack_write(struct bpf_verifier_env *env, } else { u8 type = STACK_MISC; - /* regular write of data into stack */ - state->stack[spi].spilled_ptr = (struct bpf_reg_state) {}; + /* regular write of data into stack destroys any spilled ptr */ + state->stack[spi].spilled_ptr.type = NOT_INIT; /* only mark the slot as written if all 8 bytes were written * otherwise read propagation may incorrectly stop too soon 
@@ -1114,61 +1064,6 @@ static int check_stack_write(struct bpf_verifier_env *env, return 0; } -/* registers of every function are unique and mark_reg_read() propagates - * the liveness in the following cases: - * - from callee into caller for R1 - R5 that were used as arguments - * - from caller into callee for R0 that used as result of the call - * - from caller to the same caller skipping states of the callee for R6 - R9, - * since R6 - R9 are callee saved by implicit function prologue and - * caller's R6 != callee's R6, so when we propagate liveness up to - * parent states we need to skip callee states for R6 - R9. - * - * stack slot marking is different, since stacks of caller and callee are - * accessible in both (since caller can pass a pointer to caller's stack to - * callee which can pass it to another function), hence mark_stack_slot_read() - * has to propagate the stack liveness to all parent states at given frame number. - * Consider code: - * f1() { - * ptr = fp - 8; - * *ptr = ctx; - * call f2 { - * .. = *ptr; - * } - * .. = *ptr; - * } - * First *ptr is reading from f1's stack and mark_stack_slot_read() has - * to mark liveness at the f1's frame and not f2's frame. - * Second *ptr is also reading from f1's stack and mark_stack_slot_read() has - * to propagate liveness to f2 states at f1's frame level and further into - * f1 states at f1's frame level until write into that stack slot - */ -static void mark_stack_slot_read(struct bpf_verifier_env *env, - const struct bpf_verifier_state *state, - struct bpf_verifier_state *parent, - int slot, int frameno) -{ - bool writes = parent == state->parent; /* Observe write marks */ - - while (parent) { - if (parent->frame[frameno]->allocated_stack <= slot * BPF_REG_SIZE) - /* since LIVE_WRITTEN mark is only done for full 8-byte - * write the read marks are conservative and parent - * state may not even have the stack allocated. 
In such case - * end the propagation, since the loop reached beginning - * of the function - */ - break; - /* if read wasn't screened by an earlier write ... */ - if (writes && state->frame[frameno]->stack[slot].spilled_ptr.live & REG_LIVE_WRITTEN) - break; - /* ... then we depend on parent's value */ - parent->frame[frameno]->stack[slot].spilled_ptr.live |= REG_LIVE_READ; - state = parent; - parent = state->parent; - writes = true; - } -} - static int check_stack_read(struct bpf_verifier_env *env, struct bpf_func_state *reg_state /* func where register points to */, int off, int size, int value_regno) @@ -1206,8 +1101,8 @@ static int check_stack_read(struct bpf_verifier_env *env, */ state->regs[value_regno].live |= REG_LIVE_WRITTEN; } - mark_stack_slot_read(env, vstate, vstate->parent, spi, - reg_state->frameno); + mark_reg_read(env, &reg_state->stack[spi].spilled_ptr, + reg_state->stack[spi].spilled_ptr.parent); return 0; } else { int zeros = 0; @@ -1223,8 +1118,8 @@ static int check_stack_read(struct bpf_verifier_env *env, off, i, size); return -EACCES; } - mark_stack_slot_read(env, vstate, vstate->parent, spi, - reg_state->frameno); + mark_reg_read(env, &reg_state->stack[spi].spilled_ptr, + reg_state->stack[spi].spilled_ptr.parent); if (value_regno >= 0) { if (zeros == size) { /* any size read into register is zero extended, @@ -1958,8 +1853,8 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno, /* reading any byte out of 8-byte 'spill_slot' will cause * the whole slot to be marked as 'read' */ - mark_stack_slot_read(env, env->cur_state, env->cur_state->parent, - spi, state->frameno); + mark_reg_read(env, &state->stack[spi].spilled_ptr, + state->stack[spi].spilled_ptr.parent); } return update_stack_depth(env, state, off); } 
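Review bot: mark_reg_read() loses its own `regno == BPF_REG_FP` early return and now relies on every caller to keep the frame pointer out; check_reg_arg() does it with `if (regno != BPF_REG_FP)` and the loop in is_state_visited() stops at `i < BPF_REG_FP`, so R10's parent stays at the NULL that init_reg_state() sets and the walk exits immediately. That is correct today but it is an invariant held only by caller loop bounds, so either keep a cheap guard in mark_reg_read() or add a one-line comment on it saying that R10 is expected to have no parent and why.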
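Review bot: in check_stack_write(), replacing `state->stack[spi].spilled_ptr = (struct bpf_reg_state) {};` with `state->stack[spi].spilled_ptr.type = NOT_INIT;` is required by this series (zeroing the whole struct would also wipe the slot's `parent` and `live` and cut the new liveness chain), but the comment `/* regular write of data into stack destroys any spilled ptr */` does not say that the old bounds, var_off and id are now deliberately left behind. Extend it to say that `parent`/`live` must survive and that the remaining fields are stale, and confirm that state comparison only consults those fields for slots whose type is still a spilled pointer.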
@@ -2415,11 +2310,13 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, state->curframe + 1 /* frameno within this callchain */, subprog /* subprog number within this prog */); - /* copy r1 - r5 args that callee can access */ + /* copy r1 - r5 args that callee can access. The copy includes parent + * pointers, which connects us up to the liveness chain + */ for (i = BPF_REG_1; i <= BPF_REG_5; i++) callee->regs[i] = caller->regs[i]; - /* after the call regsiters r0 - r5 were scratched */ + /* after the call registers r0 - r5 were scratched */ for (i = 0; i < CALLER_SAVED_REGS; i++) { mark_reg_not_init(env, caller->regs, caller_saved[i]); check_reg_arg(env, caller_saved[i], DST_OP_NO_MARK); @@ -5058,7 +4955,7 @@ static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur, /* explored state didn't use this */ return true; - equal = memcmp(rold, rcur, offsetof(struct bpf_reg_state, frameno)) == 0; + equal = memcmp(rold, rcur, offsetof(struct bpf_reg_state, parent)) == 0; if (rold->type == PTR_TO_STACK) /* two stack pointers are equal only if they're pointing to @@ -5297,7 +5194,7 @@ static bool states_equal(struct bpf_verifier_env *env, * equivalent state (jump target or such) we didn't arrive by the straight-line * code, so read marks in the state must propagate to the parent regardless * of the state's write marks. That's what 'parent == state->parent' comparison - * in mark_reg_read() and mark_stack_slot_read() is for. + * in mark_reg_read() is for. */ static int propagate_liveness(struct bpf_verifier_env *env, const struct bpf_verifier_state *vstate, 
@@ -5318,7 +5215,8 @@ static int propagate_liveness(struct bpf_verifier_env *env, if (vparent->frame[vparent->curframe]->regs[i].live & REG_LIVE_READ) continue; if (vstate->frame[vstate->curframe]->regs[i].live & REG_LIVE_READ) { - err = mark_reg_read(env, vstate, vparent, i); + err = mark_reg_read(env, &vstate->frame[vstate->curframe]->regs[i], + &vparent->frame[vstate->curframe]->regs[i]); if (err) return err; } @@ -5333,7 +5231,8 @@ static int propagate_liveness(struct bpf_verifier_env *env, if (parent->stack[i].spilled_ptr.live & REG_LIVE_READ) continue; if (state->stack[i].spilled_ptr.live & REG_LIVE_READ) - mark_stack_slot_read(env, vstate, vparent, i, frame); + mark_reg_read(env, &state->stack[i].spilled_ptr, + &parent->stack[i].spilled_ptr); } } return err; @@ -5343,7 +5242,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) { struct bpf_verifier_state_list *new_sl; struct bpf_verifier_state_list *sl; - struct bpf_verifier_state *cur = env->cur_state; + struct bpf_verifier_state *cur = env->cur_state, *new; int i, j, err, states_cnt = 0; sl = env->explored_states[insn_idx]; @@ -5389,16 +5288,18 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) return -ENOMEM; /* add new state to the head of linked list */ - err = copy_verifier_state(&new_sl->state, cur); + new = &new_sl->state; + err = copy_verifier_state(new, cur); if (err) { - free_verifier_state(&new_sl->state, false); + free_verifier_state(new, false); kfree(new_sl); return err; } new_sl->next = env->explored_states[insn_idx]; env->explored_states[insn_idx] = new_sl; /* connect new state to parentage chain */ - cur->parent = &new_sl->state; + for (i = 0; i < BPF_REG_FP; i++) + cur_regs(env)[i].parent = &new->frame[new->curframe]->regs[i]; /* clear write marks in current state: the writes we did are not writes * our child did, so they don't screen off its reads from us. * (There are no read marks in current state, because reads always mark 
@@ -5411,9 +5312,13 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) /* all stack frames are accessible from callee, clear them all */ for (j = 0; j <= cur->curframe; j++) { struct bpf_func_state *frame = cur->frame[j]; + struct bpf_func_state *newframe = new->frame[j]; - for (i = 0; i < frame->allocated_stack / BPF_REG_SIZE; i++) + for (i = 0; i < frame->allocated_stack / BPF_REG_SIZE; i++) { frame->stack[i].spilled_ptr.live = REG_LIVE_NONE; + frame->stack[i].spilled_ptr.parent = + &newframe->stack[i].spilled_ptr; + } } return 0; } -- 2.25.1
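Review bot: in the register loop of propagate_liveness(), the early `continue` tests `vparent->frame[vparent->curframe]->regs[i]` while the new mark_reg_read() call passes `&vparent->frame[vstate->curframe]->regs[i]`, so the parent state is indexed once by its own curframe and once by the child's. If the function has already rejected `vparent->curframe != vstate->curframe` the two are equal and one index should be used for both so the intent is obvious; if it has not, this marks a register in the wrong frame. The stack loop a few lines down indexes both states by the same `frame` variable, and the register loop should follow that pattern.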
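Review bot: the new loop in is_state_visited() wires `frame->stack[i].spilled_ptr.parent` only for slots that exist at snapshot time (`i < frame->allocated_stack / BPF_REG_SIZE`), while init_reg_state() initialises `parent` for registers only; a slot that realloc_func_state() grows after this point holds whatever the allocator left in `parent` until the next snapshot, and check_stack_read() and check_stack_boundary() now walk `spilled_ptr.parent` unconditionally. Confirm that realloc_func_state() zero-fills new slots so the chain ends at NULL, and record that requirement in the realloc_func_state() comment, which already talks about the parent pointers it copies.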
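The per-register parentage chain described in the commit message above, as an added example that is not part of the patch or of the kernel: a small stand-alone C model of the mark_reg_read() walk and of the snapshot step in is_state_visited(), run over a constructed scenario (snapshot at insn A, write r1, snapshot at insn B, call a subprogram with r1 as argument, read r1 in the callee, read r2 in the caller after the call). `struct reg`, `struct state`, `snapshot()` and `run_demo()` are invented for the model; only the loop in `mark_reg_read()` mirrors the patch.

```c
/*
 * Added example (not code from the patch or the kernel): a stand-alone model
 * of the per-register liveness chain. struct reg, struct state, snapshot()
 * and run_demo() are invented for the model; mark_reg_read() mirrors the
 * loop the patch introduces.
 */
#include <stdbool.h>
#include <stdio.h>

enum liveness { LIVE_NONE = 0, LIVE_READ = 1, LIVE_WRITTEN = 2 };

struct reg {
	unsigned int live;
	struct reg *parent;	/* same register in the state this one inherits from */
};

#define NREGS 3			/* r0..r2 is enough for the scenario below */

struct state {
	struct reg regs[NREGS];
};

/* Same loop as the patch's mark_reg_read(): walk the parentage chain and mark
 * each ancestor READ until a write mark on the way screens the read. */
static void mark_reg_read(const struct reg *state, struct reg *parent)
{
	bool writes = parent == state->parent;	/* observe write marks */

	while (parent) {
		/* if read wasn't screened by an earlier write ... */
		if (writes && (state->live & LIVE_WRITTEN))
			break;
		/* ... then we depend on parent's value */
		parent->live |= LIVE_READ;
		state = parent;
		parent = state->parent;
		writes = true;
	}
}

/* The snapshot step of is_state_visited(): copy cur into an explored state,
 * point every register of cur at its copy, and clear cur's own marks (reads
 * never mark the current state, only its ancestors). */
static void snapshot(struct state *snap, struct state *cur)
{
	*snap = *cur;
	for (int i = 0; i < NREGS; i++) {
		cur->regs[i].parent = &snap->regs[i];
		cur->regs[i].live = LIVE_NONE;
	}
}

struct demo_result {
	bool s0_r1_read, s1_r1_read;	/* r1 was written between the snapshots */
	bool s0_r2_read, s1_r2_read;	/* r2 was never written */
};

/* Constructed scenario: snapshot at insn A, write r1, snapshot at insn B,
 * call a subprogram with r1 as argument, read r1 in the callee, then read r2
 * in the caller after the call returns. */
struct demo_result run_demo(void)
{
	struct state s0, s1, cur = {0}, callee = {0};
	struct demo_result r = {0};

	snapshot(&s0, &cur);			/* explored state at A */
	cur.regs[1].live |= LIVE_WRITTEN;	/* r1 = ... */
	snapshot(&s1, &cur);			/* explored state at B */

	/* check_func_call(): copying the argument register copies its parent
	 * pointer, so the callee is already on the caller's chain. */
	callee.regs[1] = cur.regs[1];
	mark_reg_read(&callee.regs[1], callee.regs[1].parent);	/* callee reads r1 */

	mark_reg_read(&cur.regs[2], cur.regs[2].parent);	/* caller reads r2 */

	r.s0_r1_read = s0.regs[1].live & LIVE_READ;
	r.s1_r1_read = s1.regs[1].live & LIVE_READ;
	r.s0_r2_read = s0.regs[2].live & LIVE_READ;
	r.s1_r2_read = s1.regs[2].live & LIVE_READ;
	return r;
}

int main(void)
{
	struct demo_result r = run_demo();

	printf("r1: read at B=%d, read at A=%d (screened by the write)\n",
	       r.s1_r1_read, r.s0_r1_read);
	printf("r2: read at B=%d, read at A=%d\n", r.s1_r2_read, r.s0_r2_read);
	return 0;
}
```

The callee's read of r1 marks the snapshot at B but not the one at A, because the write between A and B carries LIVE_WRITTEN into B's copy and screens the walk; the caller's read of r2 marks both snapshots. The callee reached the caller's chain with no frame-skipping logic because the argument copy carried the parent pointer, which is the point the commit message makes about check_func_call(). When pruning later compares against the snapshot at A, r1 there is not READ and so need not match.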
[PATCH openEuler-1.0-LTS] blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED
by Yang Yingliang 24 Aug '21

From: Yu Kuai <yukuai3(a)huawei.com>

mainline inclusion
from mainline-5.14-rc6
commit 454bb6775202d94f0f489c4632efecdb62d3c904
category: bugfix
bugzilla: 175277
CVE: NA

-------------------------------------------------

We ran a test that deletes and recovers devices frequently (two devices on the same host), and we found that 'active_queues' is super big after a period of time.

If device a and device b share a tag set, and a is deleted, then blk_mq_exit_queue() will clear BLK_MQ_F_TAG_QUEUE_SHARED because there is only one queue that is using the tag set. However, if b is still active, the active_queues of b might never be cleared even if b is deleted.

Thus clear active_queues before BLK_MQ_F_TAG_QUEUE_SHARED is cleared.

Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
Link: https://lore.kernel.org/r/20210731062130.1533893-1-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>

Conflict: block/blk-mq.c
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
 block/blk-mq.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index cad7507d4f709..b37721774895b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2513,10 +2513,12 @@ static void queue_set_hctx_shared(struct request_queue *q, bool shared) int i; queue_for_each_hw_ctx(q, hctx, i) { - if (shared) + if (shared) { hctx->flags |= BLK_MQ_F_TAG_SHARED; - else + } else { + blk_mq_tag_idle(hctx); hctx->flags &= ~BLK_MQ_F_TAG_SHARED; + } } } -- 2.25.1
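Review bot: the subject and commit message name BLK_MQ_F_TAG_QUEUE_SHARED, but the flag the hunk actually touches in openEuler-1.0-LTS is BLK_MQ_F_TAG_SHARED, and the bare `Conflict: block/blk-mq.c` line does not explain that difference; say there that the flag carries the older name in this tree so that someone grepping for the name in the subject understands why it is absent from the diff. On the ordering itself, `blk_mq_tag_idle(hctx)` placed before `hctx->flags &= ~BLK_MQ_F_TAG_SHARED` is the whole fix, so the commit message should state in one sentence what in blk_mq_tag_idle() stops working once the flag is already clear; "might never be cleared" describes the symptom but not the cause.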