v2->v3: drop "xfrm: fix a data-race in xfrm_gen_index()" to avoid kabi changes
v1->v2: remove redundant patches
Chen Yu (1): genirq/matrix: Exclude managed interrupts in irq_matrix_allocated()
Christophe JAILLET (1): ACPI: sysfs: Fix create_pnp_modalias() and create_of_modalias()
Eric Dumazet (4): tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb tcp_metrics: add missing barriers on delete tcp_metrics: properly set tp->snd_ssthresh in tcp_init_metrics() tcp_metrics: do not create an entry from tcp_init_metrics()
Gou Hao (1): ext4: move 'ix' sanity check to corrent position
Jan Kara (1): quota: Fix slow quotaoff
Jeff Layton (1): overlayfs: set ctime when setting mtime and atime
Jorge Sanjuan Garcia (1): mcb: remove is_added flag from mcb_device struct
Kuniyuki Iwashima (1): dccp/tcp: Call security_inet_conn_request() after setting IPv6 addresses.
Marc Kleine-Budde (3): can: dev: move driver related infrastructure into separate subdir can: dev: can_restart(): don't crash kernel if carrier is OK can: dev: can_restart(): fix race condition between controller restart and netif_carrier_on()
Michal Koutný (1): cgroup: Remove duplicates in cgroup v1 tasks file
Neal Cardwell (1): tcp: fix excessive TLP and RACK timeouts from HZ rounding
Peter Zijlstra (1): sched,idle,rcu: Push rcu_idle deeper into the idle path
Reuben Hawkins (1): vfs: fix readahead(2) on block devices
Sunil V L (1): ACPI: irq: Fix incorrect return value in acpi_register_gsi()
Yan Zhai (1): ipv6: avoid atomic fragment on GSO packets
drivers/acpi/device_sysfs.c | 10 ++-- drivers/acpi/irq.c | 7 ++- drivers/cpuidle/cpuidle.c | 12 +++-- drivers/mcb/mcb-core.c | 10 ++-- drivers/mcb/mcb-parse.c | 2 - drivers/net/can/Makefile | 7 +-- drivers/net/can/dev/Makefile | 7 +++ drivers/net/can/{ => dev}/dev.c | 10 ++-- drivers/net/can/{ => dev}/rx-offload.c | 0 fs/ext4/extents.c | 10 ++-- fs/overlayfs/copy_up.c | 2 +- fs/quota/dquot.c | 66 +++++++++++++++----------- include/linux/mcb.h | 1 - include/linux/quota.h | 4 +- include/linux/quotaops.h | 2 +- include/net/tcp.h | 3 ++ kernel/cgroup/cgroup-v1.c | 5 +- kernel/irq/matrix.c | 6 +-- kernel/sched/idle.c | 22 ++++----- mm/readahead.c | 3 +- net/dccp/ipv6.c | 6 +-- net/ipv4/tcp_metrics.c | 15 +++--- net/ipv4/tcp_output.c | 25 +++++++--- net/ipv4/tcp_recovery.c | 2 +- net/ipv6/ip6_output.c | 8 +++- net/ipv6/syncookies.c | 7 +-- 26 files changed, 145 insertions(+), 107 deletions(-) create mode 100644 drivers/net/can/dev/Makefile rename drivers/net/can/{ => dev}/dev.c (99%) rename drivers/net/can/{ => dev}/rx-offload.c (100%)
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/3276 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/3...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/3276 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/3...
From: Michal Koutný mkoutny@suse.com
stable inclusion from stable-v4.19.297 commit 987fb4353c3bcfdc80e960419eee70e824cdde2c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 1ca0b605150501b7dc59f3016271da4eb3e96fce upstream.
One PID may appear multiple times in a preloaded pidlist. (Possibly due to PID recycling but we have reports of the same task_struct appearing with different PIDs, thus possibly involving transfer of PID via de_thread().)
Because v1 seq_file iterator uses PIDs as position, it leads to a message:
seq_file: buggy .next function kernfs_seq_next did not update position index
Conservative and quick fix consists of removing duplicates from `tasks` file (as opposed to removing pidlists altogether). It doesn't affect correctness (it's sufficient to show a PID once), performance impact would be hidden by unconditional sorting of the pidlist already in place (asymptotically).
Link: https://lore.kernel.org/r/20230823174804.23632-1-mkoutny@suse.com/ Suggested-by: Firo Yang firo.yang@suse.com Signed-off-by: Michal Koutný mkoutny@suse.com Signed-off-by: Tejun Heo tj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- kernel/cgroup/cgroup-v1.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c index 2e3748b1a333..bc40331c53c4 100644 --- a/kernel/cgroup/cgroup-v1.c +++ b/kernel/cgroup/cgroup-v1.c @@ -392,10 +392,9 @@ static int pidlist_array_load(struct cgroup *cgrp, enum cgroup_filetype type, } css_task_iter_end(&it); length = n; - /* now sort & (if procs) strip out duplicates */ + /* now sort & strip out duplicates (tgids or recycled thread PIDs) */ sort(array, length, sizeof(pid_t), cmppid, NULL); - if (type == CGROUP_FILE_PROCS) - length = pidlist_uniq(array, length); + length = pidlist_uniq(array, length);
l = cgroup_pidlist_find_create(cgrp, type); if (!l) {
From: Peter Zijlstra peterz@infradead.org
stable inclusion from stable-v4.19.297 commit 788621afda4101ca0fae48de424040cda78193fe category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 1098582a0f6c4e8fd28da0a6305f9233d02c9c1d upstream.
Lots of things take locks, due to a wee bug, rcu_lockdep didn't notice that the locking tracepoints were using RCU.
Push rcu_idle_{enter,exit}() as deep as possible into the idle paths, this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.
Specifically, sched_clock_idle_wakeup_event() will use ktime which will use seqlocks which will tickle lockdep, and stop_critical_timings() uses lock.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Steven Rostedt (VMware) rostedt@goodmis.org Reviewed-by: Thomas Gleixner tglx@linutronix.de Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Tested-by: Marco Elver elver@google.com Link: https://lkml.kernel.org/r/20200821085348.310943801@infradead.org Tested-by: Linux Kernel Functional Testing lkft@linaro.org Tested-by: Naresh Kamboju naresh.kamboju@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- drivers/cpuidle/cpuidle.c | 12 ++++++++---- kernel/sched/idle.c | 22 ++++++++-------------- 2 files changed, 16 insertions(+), 18 deletions(-)
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 98c03272131b..b9990189afbd 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -140,13 +140,14 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv, * executing it contains RCU usage regarded as invalid in the idle * context, so tell RCU about that. */ - RCU_NONIDLE(tick_freeze()); + tick_freeze(); /* * The state used here cannot be a "coupled" one, because the "coupled" * cpuidle mechanism enables interrupts and doing that with timekeeping * suspended is generally unsafe. */ stop_critical_timings(); + rcu_idle_enter(); drv->states[index].enter_s2idle(dev, drv, index); if (WARN_ON_ONCE(!irqs_disabled())) local_irq_disable(); @@ -155,7 +156,8 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv, * first CPU executing it calls functions containing RCU read-side * critical sections, so tell RCU about that. */ - RCU_NONIDLE(tick_unfreeze()); + rcu_idle_exit(); + tick_unfreeze(); start_critical_timings();
time_end = ns_to_ktime(local_clock()); @@ -224,16 +226,18 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, /* Take note of the planned idle state. */ sched_idle_set_state(target_state);
- trace_cpu_idle_rcuidle(index, dev->cpu); + trace_cpu_idle(index, dev->cpu); time_start = ns_to_ktime(local_clock());
stop_critical_timings(); + rcu_idle_enter(); entered_state = target_state->enter(dev, drv, index); + rcu_idle_exit(); start_critical_timings();
sched_clock_idle_wakeup_event(); time_end = ns_to_ktime(local_clock()); - trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu); + trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
/* The cpu is no longer idle or about to enter idle. */ sched_idle_set_state(NULL); diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 44a17366c8ec..4e3d149d64ad 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -53,17 +53,18 @@ __setup("hlt", cpu_idle_nopoll_setup);
static noinline int __cpuidle cpu_idle_poll(void) { + trace_cpu_idle(0, smp_processor_id()); + stop_critical_timings(); rcu_idle_enter(); - trace_cpu_idle_rcuidle(0, smp_processor_id()); local_irq_enable(); - stop_critical_timings();
while (!tif_need_resched() && - (cpu_idle_force_poll || tick_check_broadcast_expired())) + (cpu_idle_force_poll || tick_check_broadcast_expired())) cpu_relax(); - start_critical_timings(); - trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id()); + rcu_idle_exit(); + start_critical_timings(); + trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
return 1; } @@ -90,7 +91,9 @@ void __cpuidle default_idle_call(void) local_irq_enable(); } else { stop_critical_timings(); + rcu_idle_enter(); arch_cpu_idle(); + rcu_idle_exit(); start_critical_timings(); } } @@ -148,7 +151,6 @@ static void cpuidle_idle_call(void)
if (cpuidle_not_available(drv, dev)) { tick_nohz_idle_stop_tick(); - rcu_idle_enter();
default_idle_call(); goto exit_idle; @@ -166,19 +168,15 @@ static void cpuidle_idle_call(void)
if (idle_should_enter_s2idle() || dev->use_deepest_state) { if (idle_should_enter_s2idle()) { - rcu_idle_enter();
entered_state = cpuidle_enter_s2idle(drv, dev); if (entered_state > 0) { local_irq_enable(); goto exit_idle; } - - rcu_idle_exit(); }
tick_nohz_idle_stop_tick(); - rcu_idle_enter();
next_state = cpuidle_find_deepest_state(drv, dev); call_cpuidle(drv, dev, next_state); @@ -195,8 +193,6 @@ static void cpuidle_idle_call(void) else tick_nohz_idle_retain_tick();
- rcu_idle_enter(); - entered_state = call_cpuidle(drv, dev, next_state); /* * Give the governor an opportunity to reflect on the outcome @@ -212,8 +208,6 @@ static void cpuidle_idle_call(void) */ if (WARN_ON_ONCE(irqs_disabled())) local_irq_enable(); - - rcu_idle_exit(); }
/*
From: Jorge Sanjuan Garcia jorge.sanjuangarcia@duagon.com
stable inclusion from stable-v4.19.297 commit c691fe37c3e9d5670a2552f990b38a4582e47aee category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 0f28ada1fbf0054557cddcdb93ad17f767105208 upstream.
When calling mcb_bus_add_devices(), both mcb devices and the mcb bus will attempt to attach a device to a driver because they share the same bus_type. This causes an issue when trying to cast the container of the device to mcb_device struct using to_mcb_device(), leading to a wrong cast when the mcb_bus is added. A crash occurs when freing the ida resources as the bus numbering of mcb_bus gets confused with the is_added flag on the mcb_device struct.
The only reason for this cast was to keep an is_added flag on the mcb_device struct that does not seem necessary. The function device_attach() handles already bound devices and the mcb subsystem does nothing special with this is_added flag so remove it completely.
Fixes: 18d288198099 ("mcb: Correctly initialize the bus's device") Cc: stable stable@kernel.org Signed-off-by: Jorge Sanjuan Garcia jorge.sanjuangarcia@duagon.com Co-developed-by: Jose Javier Rodriguez Barbarin JoseJavier.Rodriguez@duagon.com Signed-off-by: Jose Javier Rodriguez Barbarin JoseJavier.Rodriguez@duagon.com Link: https://lore.kernel.org/r/20230906114901.63174-2-JoseJavier.Rodriguez@duagon... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- drivers/mcb/mcb-core.c | 10 +++------- drivers/mcb/mcb-parse.c | 2 -- include/linux/mcb.h | 1 - 3 files changed, 3 insertions(+), 10 deletions(-)
diff --git a/drivers/mcb/mcb-core.c b/drivers/mcb/mcb-core.c index bb5c5692dedc..cee4f28baf95 100644 --- a/drivers/mcb/mcb-core.c +++ b/drivers/mcb/mcb-core.c @@ -390,17 +390,13 @@ EXPORT_SYMBOL_GPL(mcb_free_dev);
static int __mcb_bus_add_devices(struct device *dev, void *data) { - struct mcb_device *mdev = to_mcb_device(dev); int retval;
- if (mdev->is_added) - return 0; - retval = device_attach(dev); - if (retval < 0) + if (retval < 0) { dev_err(dev, "Error adding device (%d)\n", retval); - - mdev->is_added = true; + return retval; + }
return 0; } diff --git a/drivers/mcb/mcb-parse.c b/drivers/mcb/mcb-parse.c index 7369bda3442f..d5bfeef602d8 100644 --- a/drivers/mcb/mcb-parse.c +++ b/drivers/mcb/mcb-parse.c @@ -98,8 +98,6 @@ static int chameleon_parse_gdd(struct mcb_bus *bus, mdev->mem.end = mdev->mem.start + size - 1; mdev->mem.flags = IORESOURCE_MEM;
- mdev->is_added = false; - ret = mcb_device_register(bus, mdev); if (ret < 0) goto err; diff --git a/include/linux/mcb.h b/include/linux/mcb.h index b1a0ad9d23b3..8052e5f20630 100644 --- a/include/linux/mcb.h +++ b/include/linux/mcb.h @@ -66,7 +66,6 @@ static inline struct mcb_bus *to_mcb_bus(struct device *dev) struct mcb_device { struct device dev; struct mcb_bus *bus; - bool is_added; struct mcb_driver *driver; u16 id; int inst;
From: Jan Kara jack@suse.cz
stable inclusion from stable-v4.19.297 commit bb7e3a019b52d829949d02b64ebab37838148fbf category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 869b6ea1609f655a43251bf41757aa44e5350a8f upstream.
Eric has reported that commit dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide") heavily increases runtime of generic/270 xfstest for ext4 in nojournal mode. The reason for this is that ext4 in nojournal mode leaves dquots dirty until the last dqput() and thus the cleanup done in quota_release_workfn() has to write them all. Due to the way quota_release_workfn() is written this results in synchronize_srcu() call for each dirty dquot which makes the dquot cleanup when turning quotas off extremely slow.
To be able to avoid synchronize_srcu() for each dirty dquot we need to rework how we track dquots to be cleaned up. Instead of keeping the last dquot reference while it is on releasing_dquots list, we drop it right away and mark the dquot with new DQ_RELEASING_B bit instead. This way we can we can remove dquot from releasing_dquots list when new reference to it is acquired and thus there's no need to call synchronize_srcu() each time we drop dq_list_lock.
References: https://lore.kernel.org/all/ZRytn6CxFK2oECUt@debian-BULLSEYE-live-builder-AM... Reported-by: Eric Whitney enwlinux@gmail.com Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide") CC: stable@vger.kernel.org Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- fs/quota/dquot.c | 66 ++++++++++++++++++++++++---------------- include/linux/quota.h | 4 ++- include/linux/quotaops.h | 2 +- 3 files changed, 43 insertions(+), 29 deletions(-)
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index 7310347a54c7..ee1054cab370 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -231,19 +231,18 @@ static void put_quota_format(struct quota_format_type *fmt) * All dquots are placed to the end of inuse_list when first created, and this * list is used for invalidate operation, which must look at every dquot. * - * When the last reference of a dquot will be dropped, the dquot will be - * added to releasing_dquots. We'd then queue work item which would call + * When the last reference of a dquot is dropped, the dquot is added to + * releasing_dquots. We'll then queue work item which will call * synchronize_srcu() and after that perform the final cleanup of all the - * dquots on the list. Both releasing_dquots and free_dquots use the - * dq_free list_head in the dquot struct. When a dquot is removed from - * releasing_dquots, a reference count is always subtracted, and if - * dq_count == 0 at that point, the dquot will be added to the free_dquots. + * dquots on the list. Each cleaned up dquot is moved to free_dquots list. + * Both releasing_dquots and free_dquots use the dq_free list_head in the dquot + * struct. * - * Unused dquots (dq_count == 0) are added to the free_dquots list when freed, - * and this list is searched whenever we need an available dquot. Dquots are - * removed from the list as soon as they are used again, and - * dqstats.free_dquots gives the number of dquots on the list. When - * dquot is invalidated it's completely released from memory. + * Unused and cleaned up dquots are in the free_dquots list and this list is + * searched whenever we need an available dquot. Dquots are removed from the + * list as soon as they are used again and dqstats.free_dquots gives the number + * of dquots on the list. When dquot is invalidated it's completely released + * from memory. * * Dirty dquots are added to the dqi_dirty_list of quota_info when mark * dirtied, and this list is searched when writing dirty dquots back to @@ -321,6 +320,7 @@ static inline void put_dquot_last(struct dquot *dquot) static inline void put_releasing_dquots(struct dquot *dquot) { list_add_tail(&dquot->dq_free, &releasing_dquots); + set_bit(DQ_RELEASING_B, &dquot->dq_flags); }
static inline void remove_free_dquot(struct dquot *dquot) @@ -328,8 +328,10 @@ static inline void remove_free_dquot(struct dquot *dquot) if (list_empty(&dquot->dq_free)) return; list_del_init(&dquot->dq_free); - if (!atomic_read(&dquot->dq_count)) + if (!test_bit(DQ_RELEASING_B, &dquot->dq_flags)) dqstats_dec(DQST_FREE_DQUOTS); + else + clear_bit(DQ_RELEASING_B, &dquot->dq_flags); }
static inline void put_inuse(struct dquot *dquot) @@ -571,12 +573,6 @@ static void invalidate_dquots(struct super_block *sb, int type) continue; /* Wait for dquot users */ if (atomic_read(&dquot->dq_count)) { - /* dquot in releasing_dquots, flush and retry */ - if (!list_empty(&dquot->dq_free)) { - spin_unlock(&dq_list_lock); - goto restart; - } - atomic_inc(&dquot->dq_count); spin_unlock(&dq_list_lock); /* @@ -595,6 +591,15 @@ static void invalidate_dquots(struct super_block *sb, int type) * restart. */ goto restart; } + /* + * The last user already dropped its reference but dquot didn't + * get fully cleaned up yet. Restart the scan which flushes the + * work cleaning up released dquots. + */ + if (test_bit(DQ_RELEASING_B, &dquot->dq_flags)) { + spin_unlock(&dq_list_lock); + goto restart; + } /* * Quota now has no users and it has been written on last * dqput() @@ -686,6 +691,13 @@ int dquot_writeback_dquots(struct super_block *sb, int type) dq_dirty);
WARN_ON(!dquot_active(dquot)); + /* If the dquot is releasing we should not touch it */ + if (test_bit(DQ_RELEASING_B, &dquot->dq_flags)) { + spin_unlock(&dq_list_lock); + flush_delayed_work("a_release_work); + spin_lock(&dq_list_lock); + continue; + }
/* Now we have active dquot from which someone is * holding reference so we can safely just increase @@ -799,18 +811,18 @@ static void quota_release_workfn(struct work_struct *work) /* Exchange the list head to avoid livelock. */ list_replace_init(&releasing_dquots, &rls_head); spin_unlock(&dq_list_lock); + synchronize_srcu(&dquot_srcu);
restart: - synchronize_srcu(&dquot_srcu); spin_lock(&dq_list_lock); while (!list_empty(&rls_head)) { dquot = list_first_entry(&rls_head, struct dquot, dq_free); - /* Dquot got used again? */ - if (atomic_read(&dquot->dq_count) > 1) { - remove_free_dquot(dquot); - atomic_dec(&dquot->dq_count); - continue; - } + WARN_ON_ONCE(atomic_read(&dquot->dq_count)); + /* + * Note that DQ_RELEASING_B protects us from racing with + * invalidate_dquots() calls so we are safe to work with the + * dquot even after we drop dq_list_lock. + */ if (dquot_dirty(dquot)) { spin_unlock(&dq_list_lock); /* Commit dquot before releasing */ @@ -824,7 +836,6 @@ static void quota_release_workfn(struct work_struct *work) } /* Dquot is inactive and clean, now move it to free list */ remove_free_dquot(dquot); - atomic_dec(&dquot->dq_count); put_dquot_last(dquot); } spin_unlock(&dq_list_lock); @@ -865,6 +876,7 @@ void dqput(struct dquot *dquot) BUG_ON(!list_empty(&dquot->dq_free)); #endif put_releasing_dquots(dquot); + atomic_dec(&dquot->dq_count); spin_unlock(&dq_list_lock); queue_delayed_work(system_unbound_wq, "a_release_work, 1); } @@ -953,7 +965,7 @@ struct dquot *dqget(struct super_block *sb, struct kqid qid) dqstats_inc(DQST_LOOKUPS); } /* Wait for dq_lock - after this we know that either dquot_release() is - * already finished or it will be canceled due to dq_count > 1 test */ + * already finished or it will be canceled due to dq_count > 0 test */ wait_on_dquot(dquot); /* Read the dquot / allocate space in quota file */ if (!dquot_active(dquot)) { diff --git a/include/linux/quota.h b/include/linux/quota.h index 75c6c3a06819..f97f1ecc15e6 100644 --- a/include/linux/quota.h +++ b/include/linux/quota.h @@ -286,7 +286,9 @@ static inline void dqstats_dec(unsigned int type) #define DQ_FAKE_B 3 /* no limits only usage */ #define DQ_READ_B 4 /* dquot was read into memory */ #define DQ_ACTIVE_B 5 /* dquot is active (dquot_release not called) */ -#define DQ_LASTSET_B 6 /* Following 6 bits (see QIF_) are reserved\ +#define DQ_RELEASING_B 6 /* dquot is in releasing_dquots list waiting + * to be cleaned up */ +#define DQ_LASTSET_B 7 /* Following 6 bits (see QIF_) are reserved\ * for the mask of entries set via SETQUOTA\ * quotactl. They are set under dq_data_lock\ * and the quota format handling dquot can\ diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h index 91e0b7624053..fbc1c6f69b7e 100644 --- a/include/linux/quotaops.h +++ b/include/linux/quotaops.h @@ -59,7 +59,7 @@ static inline bool dquot_is_busy(struct dquot *dquot) { if (test_bit(DQ_MOD_B, &dquot->dq_flags)) return true; - if (atomic_read(&dquot->dq_count) > 1) + if (atomic_read(&dquot->dq_count) > 0) return true; return false; }
From: Jeff Layton jlayton@kernel.org
stable inclusion from stable-v4.19.297 commit 55fabed6352309cb71e0c76814f1f7023250ebf0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 03dbab3bba5f009d053635c729d1244f2c8bad38 ]
Nathan reported that he was seeing the new warning in setattr_copy_mgtime pop when starting podman containers. Overlayfs is trying to set the atime and mtime via notify_change without also setting the ctime.
POSIX states that when the atime and mtime are updated via utimes() that we must also update the ctime to the current time. The situation with overlayfs copy-up is analogies, so add ATTR_CTIME to the bitmask. notify_change will fill in the value.
Reported-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Jeff Layton jlayton@kernel.org Tested-by: Nathan Chancellor nathan@kernel.org Acked-by: Christian Brauner brauner@kernel.org Acked-by: Amir Goldstein amir73il@gmail.com Message-Id: 20230913-ctime-v1-1-c6bc509cbc27@kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- fs/overlayfs/copy_up.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index 30abafcd4ecc..96749f0af9b7 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -195,7 +195,7 @@ static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat) { struct iattr attr = { .ia_valid = - ATTR_ATIME | ATTR_MTIME | ATTR_ATIME_SET | ATTR_MTIME_SET, + ATTR_ATIME | ATTR_MTIME | ATTR_ATIME_SET | ATTR_MTIME_SET | ATTR_CTIME, .ia_atime = stat->atime, .ia_mtime = stat->mtime, };
From: Reuben Hawkins reubenhwk@gmail.com
stable inclusion from stable-v4.19.299 commit 70409dcd485a244b9f508aa6cdb21ac899a85dc1 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 7116c0af4b8414b2f19fdb366eea213cbd9d91c2 ]
Readahead was factored to call generic_fadvise. That refactor added an S_ISREG restriction which broke readahead on block devices.
In addition to S_ISREG, this change checks S_ISBLK to fix block device readahead. There is no change in behavior with any file type besides block devices in this change.
Fixes: 3d8f7615319b ("vfs: implement readahead(2) using POSIX_FADV_WILLNEED") Signed-off-by: Reuben Hawkins reubenhwk@gmail.com Link: https://lore.kernel.org/r/20231003015704.2415-1-reubenhwk@gmail.com Reviewed-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- mm/readahead.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/readahead.c b/mm/readahead.c index 7a21199c6227..7d78f86fcf6f 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -659,7 +659,8 @@ ssize_t ksys_readahead(int fd, loff_t offset, size_t count) */ ret = -EINVAL; if (!f.file->f_mapping || !f.file->f_mapping->a_ops || - !S_ISREG(file_inode(f.file)->i_mode)) + (!S_ISREG(file_inode(f.file)->i_mode) && + !S_ISBLK(file_inode(f.file)->i_mode))) goto out;
ret = vfs_fadvise(f.file, offset, count, POSIX_FADV_WILLNEED);
From: Gou Hao gouhao@uniontech.com
stable inclusion from stable-v4.19.299 commit 23ab419bde8cc2b378bdbf3d518b7a5542665fbe category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit af90a8f4a09ec4a3de20142e37f37205d4687f28 ]
Check 'ix' before it is used.
Fixes: 80e675f906db ("ext4: optimize memmmove lengths in extent/index insertions") Signed-off-by: Gou Hao gouhao@uniontech.com Link: https://lore.kernel.org/r/20230906013341.7199-1-gouhao@uniontech.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- fs/ext4/extents.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 6edab0ef28fd..e29b55991e8f 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -1052,6 +1052,11 @@ static int ext4_ext_insert_index(handle_t *handle, struct inode *inode, ix = curp->p_idx; }
+ if (unlikely(ix > EXT_MAX_INDEX(curp->p_hdr))) { + EXT4_ERROR_INODE(inode, "ix > EXT_MAX_INDEX!"); + return -EFSCORRUPTED; + } + len = EXT_LAST_INDEX(curp->p_hdr) - ix + 1; BUG_ON(len < 0); if (len > 0) { @@ -1061,11 +1066,6 @@ static int ext4_ext_insert_index(handle_t *handle, struct inode *inode, memmove(ix + 1, ix, len * sizeof(struct ext4_extent_idx)); }
- if (unlikely(ix > EXT_MAX_INDEX(curp->p_hdr))) { - EXT4_ERROR_INODE(inode, "ix > EXT_MAX_INDEX!"); - return -EFSCORRUPTED; - } - ix->ei_block = cpu_to_le32(logical); ext4_idx_store_pblock(ix, ptr); le16_add_cpu(&curp->p_hdr->eh_entries, 1);
From: Sunil V L sunilvl@ventanamicro.com
stable inclusion from stable-v4.19.297 commit 3ed90437a117336d7e6858d0464d7c2953868778 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 0c21a18d5d6c6a73d098fb9b4701572370942df9 upstream.
acpi_register_gsi() should return a negative value in case of failure.
Currently, it returns the return value from irq_create_fwspec_mapping(). However, irq_create_fwspec_mapping() returns 0 for failure. Fix the issue by returning -EINVAL if irq_create_fwspec_mapping() returns zero.
Fixes: d44fa3d46079 ("ACPI: Add support for ResourceSource/IRQ domain mapping") Cc: 4.11+ stable@vger.kernel.org # 4.11+ Signed-off-by: Sunil V L sunilvl@ventanamicro.com [ rjw: Rename a new local variable ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- drivers/acpi/irq.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/irq.c b/drivers/acpi/irq.c index 7c352cba0528..8ac01375fe8f 100644 --- a/drivers/acpi/irq.c +++ b/drivers/acpi/irq.c @@ -55,6 +55,7 @@ int acpi_register_gsi(struct device *dev, u32 gsi, int trigger, int polarity) { struct irq_fwspec fwspec; + unsigned int irq;
if (WARN_ON(!acpi_gsi_domain_id)) { pr_warn("GSI: No registered irqchip, giving up\n"); @@ -66,7 +67,11 @@ int acpi_register_gsi(struct device *dev, u32 gsi, int trigger, fwspec.param[1] = acpi_dev_get_irq_type(trigger, polarity); fwspec.param_count = 2;
- return irq_create_fwspec_mapping(&fwspec); + irq = irq_create_fwspec_mapping(&fwspec); + if (!irq) + return -EINVAL; + + return irq; } EXPORT_SYMBOL_GPL(acpi_register_gsi);
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
stable inclusion from stable-v4.19.299 commit f6a14247f954bd314706cfa65711dd881926b14f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 48cf49d31994ff97b33c4044e618560ec84d35fb ]
snprintf() does not return negative values on error.
To know if the buffer was too small, the returned value needs to be compared with the length of the passed buffer. If it is greater or equal, the output has been truncated, so add checks for the truncation to create_pnp_modalias() and create_of_modalias(). Also make them return -ENOMEM in that case, as they already do that elsewhere.
Moreover, the remaining size of the buffer used by snprintf() needs to be updated after the first write to avoid out-of-bounds access as already done correctly in create_pnp_modalias(), but not in create_of_modalias(), so change the latter accordingly.
Fixes: 8765c5ba1949 ("ACPI / scan: Rework modalias creation when "compatible" is present") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr [ rjw: Merge two patches into one, combine changelogs, add subject ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- drivers/acpi/device_sysfs.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/acpi/device_sysfs.c b/drivers/acpi/device_sysfs.c index 6cc71a2e5508..0ecb9f6676ce 100644 --- a/drivers/acpi/device_sysfs.c +++ b/drivers/acpi/device_sysfs.c @@ -164,8 +164,8 @@ static int create_pnp_modalias(struct acpi_device *acpi_dev, char *modalias, return 0;
len = snprintf(modalias, size, "acpi:"); - if (len <= 0) - return len; + if (len >= size) + return -ENOMEM;
size -= len;
@@ -218,8 +218,10 @@ static int create_of_modalias(struct acpi_device *acpi_dev, char *modalias, len = snprintf(modalias, size, "of:N%sT", (char *)buf.pointer); ACPI_FREE(buf.pointer);
- if (len <= 0) - return len; + if (len >= size) + return -ENOMEM; + + size -= len;
of_compatible = acpi_dev->data.of_compatible; if (of_compatible->type == ACPI_TYPE_PACKAGE) {
From: Chen Yu yu.c.chen@intel.com
stable inclusion from stable-v4.19.299 commit e1034dcd773ead055aa65c86d71178a5f7ed7f1e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit a0b0bad10587ae2948a7c36ca4ffc206007fbcf3 ]
When a CPU is about to be offlined, x86 validates that all active interrupts which are targeted to this CPU can be migrated to the remaining online CPUs. If not, the offline operation is aborted.
The validation uses irq_matrix_allocated() to retrieve the number of vectors which are allocated on the outgoing CPU. The returned number of allocated vectors includes also vectors which are associated to managed interrupts.
That's overaccounting because managed interrupts are:
- not migrated when the affinity mask of the interrupt targets only the outgoing CPU
- migrated to another CPU, but in that case the vector is already pre-allocated on the potential target CPUs and must not be taken into account.
As a consequence the check whether the remaining online CPUs have enough capacity for migrating the allocated vectors from the outgoing CPU might fail incorrectly.
Let irq_matrix_allocated() return only the number of allocated non-managed interrupts to make this validation check correct.
[ tglx: Amend changelog and fixup kernel-doc comment ]
Fixes: 2f75d9e1c905 ("genirq: Implement bitmap matrix allocator") Reported-by: Wendy Wang wendy.wang@intel.com Signed-off-by: Chen Yu yu.c.chen@intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/20231020072522.557846-1-yu.c.chen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- kernel/irq/matrix.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c index 0a5b2e26fd54..d3ba3cbbb515 100644 --- a/kernel/irq/matrix.c +++ b/kernel/irq/matrix.c @@ -459,16 +459,16 @@ unsigned int irq_matrix_reserved(struct irq_matrix *m) }
/** - * irq_matrix_allocated - Get the number of allocated irqs on the local cpu + * irq_matrix_allocated - Get the number of allocated non-managed irqs on the local CPU * @m: Pointer to the matrix to search * - * This returns number of allocated irqs + * This returns number of allocated non-managed interrupts. */ unsigned int irq_matrix_allocated(struct irq_matrix *m) { struct cpumap *cm = this_cpu_ptr(m->maps);
- return cm->allocated; + return cm->allocated - cm->managed_allocated; }
#ifdef CONFIG_GENERIC_IRQ_DEBUGFS
From: Neal Cardwell ncardwell@google.com
stable inclusion from stable-v4.19.297 commit 6d022a7abf7278f5317bf3000e4ba5d8e55875dc category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 1c2709cfff1dedbb9591e989e2f001484208d914 upstream.
We discovered from packet traces of slow loss recovery on kernels with the default HZ=250 setting (and min_rtt < 1ms) that after reordering, when receiving a SACKed sequence range, the RACK reordering timer was firing after about 16ms rather than the desired value of roughly min_rtt/4 + 2ms. The problem is largely due to the RACK reorder timer calculation adding in TCP_TIMEOUT_MIN, which is 2 jiffies. On kernels with HZ=250, this is 2*4ms = 8ms. The TLP timer calculation has the exact same issue.
This commit fixes the TLP transmit timer and RACK reordering timer floor calculation to more closely match the intended 2ms floor even on kernels with HZ=250. It does this by adding in a new TCP_TIMEOUT_MIN_US floor of 2000 us and then converting to jiffies, instead of the current approach of converting to jiffies and then adding th TCP_TIMEOUT_MIN value of 2 jiffies.
Our testing has verified that on kernels with HZ=1000, as expected, this does not produce significant changes in behavior, but on kernels with the default HZ=250 the latency improvement can be large. For example, our tests show that for HZ=250 kernels at low RTTs this fix roughly halves the latency for the RACK reorder timer: instead of mostly firing at 16ms it mostly fires at 8ms.
Suggested-by: Eric Dumazet edumazet@google.com Signed-off-by: Neal Cardwell ncardwell@google.com Signed-off-by: Yuchung Cheng ycheng@google.com Fixes: bb4d991a28cc ("tcp: adjust tail loss probe timeout") Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20231015174700.2206872-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- include/net/tcp.h | 3 +++ net/ipv4/tcp_output.c | 9 +++++---- net/ipv4/tcp_recovery.c | 2 +- 3 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h index 863bbbccf209..44deff714e93 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -141,6 +141,9 @@ void tcp_time_wait(struct sock *sk, int state, int timeo); #define TCP_RTO_MAX ((unsigned)(120*HZ)) #define TCP_RTO_MIN ((unsigned)(HZ/5)) #define TCP_TIMEOUT_MIN (2U) /* Min timeout for TCP timers in jiffies */ + +#define TCP_TIMEOUT_MIN_US (2*USEC_PER_MSEC) /* Min TCP timeout in microsecs */ + #define TCP_TIMEOUT_INIT ((unsigned)(1*HZ)) /* RFC6298 2.1 initial RTO value */ #define TCP_TIMEOUT_FALLBACK ((unsigned)(3*HZ)) /* RFC 1122 initial RTO value, now * used as a fallback RTO for the diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 8cd673d75f68..143faf2b086b 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2502,7 +2502,7 @@ bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto) { struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); - u32 timeout, rto_delta_us; + u32 timeout, timeout_us, rto_delta_us; int early_retrans;
/* Don't do any loss probe on a Fast Open connection before 3WHS @@ -2526,11 +2526,12 @@ bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto) * sample is available then probe after TCP_TIMEOUT_INIT. */ if (tp->srtt_us) { - timeout = usecs_to_jiffies(tp->srtt_us >> 2); + timeout_us = tp->srtt_us >> 2; if (tp->packets_out == 1) - timeout += TCP_RTO_MIN; + timeout_us += tcp_rto_min_us(sk); else - timeout += TCP_TIMEOUT_MIN; + timeout_us += TCP_TIMEOUT_MIN_US; + timeout = usecs_to_jiffies(timeout_us); } else { timeout = TCP_TIMEOUT_INIT; } diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c index 61969bb9395c..844ff390f726 100644 --- a/net/ipv4/tcp_recovery.c +++ b/net/ipv4/tcp_recovery.c @@ -122,7 +122,7 @@ bool tcp_rack_mark_lost(struct sock *sk) tp->rack.advanced = 0; tcp_rack_detect_loss(sk, &timeout); if (timeout) { - timeout = usecs_to_jiffies(timeout) + TCP_TIMEOUT_MIN; + timeout = usecs_to_jiffies(timeout + TCP_TIMEOUT_MIN_US); inet_csk_reset_xmit_timer(sk, ICSK_TIME_REO_TIMEOUT, timeout, inet_csk(sk)->icsk_rto); }
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.297 commit e480bfcf5fec910730b7eae6984bd2ea33d7a4d7 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit f921a4a5bffa8a0005b190fb9421a7fc1fd716b6 upstream.
In commit 75eefc6c59fd ("tcp: tsq: add a shortcut in tcp_small_queue_check()") we allowed to send an skb regardless of TSQ limits being hit if rtx queue was empty or had a single skb, in order to better fill the pipe when/if TX completions were slow.
Then later, commit 75c119afe14f ("tcp: implement rb-tree based retransmit queue") accidentally removed the special case for one skb in rtx queue.
Stefan Wahren reported a regression in single TCP flow throughput using a 100Mbit fec link, starting from commit 65466904b015 ("tcp: adjust TSO packet sizes based on min_rtt"). This last commit only made the regression more visible, because it locked the TCP flow on a particular behavior where TSQ prevented two skbs being pushed downstream, adding silences on the wire between each TSO packet.
Many thanks to Stefan for his invaluable help !
Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue") Link: https://lore.kernel.org/netdev/7f31ddc8-9971-495e-a1f6-819df542e0af@gmx.net/ Reported-by: Stefan Wahren wahrenst@gmx.net Tested-by: Stefan Wahren wahrenst@gmx.net Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Neal Cardwell ncardwell@google.com Link: https://lore.kernel.org/r/20231017124526.4060202-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- net/ipv4/tcp_output.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 143faf2b086b..1d3626330657 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2273,6 +2273,18 @@ static int tcp_mtu_probe(struct sock *sk) return -1; }
+static bool tcp_rtx_queue_empty_or_single_skb(const struct sock *sk) +{ + const struct rb_node *node = sk->tcp_rtx_queue.rb_node; + + /* No skb in the rtx queue. */ + if (!node) + return true; + + /* Only one skb in rtx queue. */ + return !node->rb_left && !node->rb_right; +} + /* TCP Small Queues : * Control number of packets in qdisc/devices to two packets / or ~1 ms. * (These limits are doubled for retransmits) @@ -2295,12 +2307,12 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb, limit <<= factor;
if (refcount_read(&sk->sk_wmem_alloc) > limit) { - /* Always send skb if rtx queue is empty. + /* Always send skb if rtx queue is empty or has one skb. * No need to wait for TX completion to call us back, * after softirq/tasklet schedule. * This helps when TX completions are delayed too much. */ - if (tcp_rtx_queue_empty(sk)) + if (tcp_rtx_queue_empty_or_single_skb(sk)) return false;
set_bit(TSQ_THROTTLED, &sk->sk_tsq_flags);
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.299 commit 73fb232859f2849abf35515eff679eefd7afcab6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit cbc3a153222805d65f821e10f4f78b6afce06f86 ]
When removing an item from RCU protected list, we must prevent store-tearing, using rcu_assign_pointer() or WRITE_ONCE().
Fixes: 04f721c671656 ("tcp_metrics: Rewrite tcp_metrics_flush_all") Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Acked-by: Neal Cardwell ncardwell@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- net/ipv4/tcp_metrics.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 7bbd9125b500..9ad4258cfcbc 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -913,7 +913,7 @@ static void tcp_metrics_flush_all(struct net *net) match = net ? net_eq(tm_net(tm), net) : !refcount_read(&tm_net(tm)->count); if (match) { - *pp = tm->tcpm_next; + rcu_assign_pointer(*pp, tm->tcpm_next); kfree_rcu(tm, rcu_head); } else { pp = &tm->tcpm_next; @@ -954,7 +954,7 @@ static int tcp_metrics_nl_cmd_del(struct sk_buff *skb, struct genl_info *info) if (addr_same(&tm->tcpm_daddr, &daddr) && (!src || addr_same(&tm->tcpm_saddr, &saddr)) && net_eq(tm_net(tm), net)) { - *pp = tm->tcpm_next; + rcu_assign_pointer(*pp, tm->tcpm_next); kfree_rcu(tm, rcu_head); found = true; } else {
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.299 commit 68229f84c3c19ca920df1ec62fadff0c08500fac category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 081480014a64a69d901f8ef1ffdd56d6085cf87e ]
We need to set tp->snd_ssthresh to TCP_INFINITE_SSTHRESH in the case tcp_get_metrics() fails for some reason.
Fixes: 9ad7c049f0f7 ("tcp: RFC2988bis + taking RTT sample from 3WHS for the passive open side") Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Acked-by: Neal Cardwell ncardwell@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- net/ipv4/tcp_metrics.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 9ad4258cfcbc..7d486295d75f 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -466,6 +466,10 @@ void tcp_init_metrics(struct sock *sk) u32 val, crtt = 0; /* cached RTT scaled by 8 */
sk_dst_confirm(sk); + /* ssthresh may have been reduced unnecessarily during. + * 3WHS. Restore it back to its initial default. + */ + tp->snd_ssthresh = TCP_INFINITE_SSTHRESH; if (!dst) goto reset;
@@ -484,11 +488,6 @@ void tcp_init_metrics(struct sock *sk) tp->snd_ssthresh = val; if (tp->snd_ssthresh > tp->snd_cwnd_clamp) tp->snd_ssthresh = tp->snd_cwnd_clamp; - } else { - /* ssthresh may have been reduced unnecessarily during. - * 3WHS. Restore it back to its initial default. - */ - tp->snd_ssthresh = TCP_INFINITE_SSTHRESH; } val = tcp_metric_get(tm, TCP_METRIC_REORDERING); if (val && tp->reordering != val)
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.299 commit febd9f4d7cb8b9d9adff87b105f3afd55902127e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit a135798e6e200ecb2f864cecca6d257ba278370c ]
tcp_init_metrics() only wants to get metrics if they were previously stored in the cache. Creating an entry is adding useless costs, especially when tcp_no_metrics_save is set.
Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.") Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Acked-by: Neal Cardwell ncardwell@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- net/ipv4/tcp_metrics.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 7d486295d75f..60619b1f4acd 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -474,7 +474,7 @@ void tcp_init_metrics(struct sock *sk) goto reset;
rcu_read_lock(); - tm = tcp_get_metrics(sk, dst, true); + tm = tcp_get_metrics(sk, dst, false); if (!tm) { rcu_read_unlock(); goto reset;
From: Yan Zhai yan@cloudflare.com
stable inclusion from stable-v4.19.299 commit c9faed4c8c975f941bad8a47d75759e1c397158e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 03d6c848bfb406e9ef6d9846d759e97beaeea113 ]
When the ipv6 stack output a GSO packet, if its gso_size is larger than dst MTU, then all segments would be fragmented. However, it is possible for a GSO packet to have a trailing segment with smaller actual size than both gso_size as well as the MTU, which leads to an "atomic fragment". Atomic fragments are considered harmful in RFC-8021. An Existing report from APNIC also shows that atomic fragments are more likely to be dropped even it is equivalent to a no-op [1].
Add an extra check in the GSO slow output path. For each segment from the original over-sized packet, if it fits with the path MTU, then avoid generating an atomic fragment.
Link: https://www.potaroo.net/presentations/2022-03-01-ipv6-frag.pdf [1] Fixes: b210de4f8c97 ("net: ipv6: Validate GSO SKB before finish IPv6 processing") Reported-by: David Wragg dwragg@cloudflare.com Signed-off-by: Yan Zhai yan@cloudflare.com Link: https://lore.kernel.org/r/90912e3503a242dca0bc36958b11ed03a2696e5e.169815696... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- net/ipv6/ip6_output.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 4f31a781ab37..9951ccfb3147 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -153,7 +153,13 @@ ip6_finish_output_gso_slowpath_drop(struct net *net, struct sock *sk, int err;
skb_mark_not_on_list(segs); - err = ip6_fragment(net, sk, segs, ip6_finish_output2); + /* Last GSO segment can be smaller than gso_size (and MTU). + * Adding a fragment header would produce an "atomic fragment", + * which is considered harmful (RFC-8021). Avoid that. + */ + err = segs->len > mtu ? + ip6_fragment(net, sk, segs, ip6_finish_output2) : + ip6_finish_output2(net, sk, segs); if (err && ret == 0) ret = err; }
From: Marc Kleine-Budde mkl@pengutronix.de
stable inclusion from stable-v4.19.299 commit 4757b5ea2441c7018ba756de4d9b2c3c6f59978d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 3e77f70e734584e0ad1038e459ed3fd2400f873a ]
This patch moves the CAN driver related infrastructure into a separate subdir. It will be split into more files in the coming patches.
Reviewed-by: Vincent Mailhol mailhol.vincent@wanadoo.fr Link: https://lore.kernel.org/r/20210111141930.693847-3-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Stable-dep-of: fe5c9940dfd8 ("can: dev: can_restart(): don't crash kernel if carrier is OK") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- drivers/net/can/Makefile | 7 +------ drivers/net/can/dev/Makefile | 7 +++++++ drivers/net/can/{ => dev}/dev.c | 0 drivers/net/can/{ => dev}/rx-offload.c | 0 4 files changed, 8 insertions(+), 6 deletions(-) create mode 100644 drivers/net/can/dev/Makefile rename drivers/net/can/{ => dev}/dev.c (100%) rename drivers/net/can/{ => dev}/rx-offload.c (100%)
diff --git a/drivers/net/can/Makefile b/drivers/net/can/Makefile index be9c70e77df3..deea03f1dead 100644 --- a/drivers/net/can/Makefile +++ b/drivers/net/can/Makefile @@ -7,12 +7,7 @@ obj-$(CONFIG_CAN_VCAN) += vcan.o obj-$(CONFIG_CAN_VXCAN) += vxcan.o obj-$(CONFIG_CAN_SLCAN) += slcan.o
-obj-$(CONFIG_CAN_DEV) += can-dev.o -can-dev-y += dev.o -can-dev-y += rx-offload.o - -can-dev-$(CONFIG_CAN_LEDS) += led.o - +obj-y += dev/ obj-y += rcar/ obj-y += spi/ obj-y += usb/ diff --git a/drivers/net/can/dev/Makefile b/drivers/net/can/dev/Makefile new file mode 100644 index 000000000000..cba92e6bcf6f --- /dev/null +++ b/drivers/net/can/dev/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-$(CONFIG_CAN_DEV) += can-dev.o +can-dev-y += dev.o +can-dev-y += rx-offload.o + +can-dev-$(CONFIG_CAN_LEDS) += led.o diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev/dev.c similarity index 100% rename from drivers/net/can/dev.c rename to drivers/net/can/dev/dev.c diff --git a/drivers/net/can/rx-offload.c b/drivers/net/can/dev/rx-offload.c similarity index 100% rename from drivers/net/can/rx-offload.c rename to drivers/net/can/dev/rx-offload.c
From: Marc Kleine-Budde mkl@pengutronix.de
stable inclusion from stable-v4.19.299 commit 728e0b44b76200dfbbe70457983d8ce4a849ee1e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit fe5c9940dfd8ba0c73672dddb30acd1b7a11d4c7 ]
During testing, I triggered a can_restart() with the netif carrier being OK [1]. The BUG_ON, which checks if the carrier is OK, results in a fatal kernel crash. This is neither helpful for debugging nor for a production system.
[1] The root cause is a race condition in can_restart() which will be fixed in the next patch.
Do not crash the kernel, issue an error message instead, and continue restarting the CAN device anyway.
Fixes: 39549eef3587 ("can: CAN Network device driver and Netlink interface") Link: https://lore.kernel.org/all/20231005-can-dev-fix-can-restart-v2-1-91b5c1fd92... Reviewed-by: Vincent Mailhol mailhol.vincent@wanadoo.fr Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- drivers/net/can/dev/dev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/can/dev/dev.c b/drivers/net/can/dev/dev.c index f4349aaac802..f4f224ec1009 100644 --- a/drivers/net/can/dev/dev.c +++ b/drivers/net/can/dev/dev.c @@ -556,7 +556,8 @@ static void can_restart(struct net_device *dev) struct can_frame *cf; int err;
- BUG_ON(netif_carrier_ok(dev)); + if (netif_carrier_ok(dev)) + netdev_err(dev, "Attempt to restart for bus-off recovery, but carrier is OK?\n");
/* * No synchronization needed because the device is bus-off and
From: Marc Kleine-Budde mkl@pengutronix.de
stable inclusion from stable-v4.19.299 commit 6288c8839b1c89f54cc8eb3b0f57777de20806ec category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 6841cab8c4504835e4011689cbdb3351dec693fd ]
This race condition was discovered while updating the at91_can driver to use can_bus_off(). The following scenario describes how the converted at91_can driver would behave.
When a CAN device goes into BUS-OFF state, the driver usually stops/resets the CAN device and calls can_bus_off().
This function sets the netif carrier to off, and (if configured by user space) schedules a delayed work that calls can_restart() to restart the CAN device.
The can_restart() function first checks if the carrier is off and triggers an error message if the carrier is OK.
Then it calls the driver's do_set_mode() function to restart the device, then it sets the netif carrier to on. There is a race window between these two calls.
The at91 CAN controller (observed on the sama5d3, a single core 32 bit ARM CPU) has a hardware limitation. If the device goes into bus-off while sending a CAN frame, there is no way to abort the sending of this frame. After the controller is enabled again, another attempt is made to send it.
If the bus is still faulty, the device immediately goes back to the bus-off state. The driver calls can_bus_off(), the netif carrier is switched off and another can_restart is scheduled. This occurs within the race window before the original can_restart() handler marks the netif carrier as OK. This would cause the 2nd can_restart() to be called with an OK netif carrier, resulting in an error message.
The flow of the 1st can_restart() looks like this:
can_restart() // bail out if netif_carrier is OK
netif_carrier_ok(dev) priv->do_set_mode(dev, CAN_MODE_START) // enable CAN controller // sama5d3 restarts sending old message
// CAN devices goes into BUS_OFF, triggers IRQ
// IRQ handler start at91_irq() at91_irq_err_line() can_bus_off() netif_carrier_off() schedule_delayed_work() // IRQ handler end
netif_carrier_on()
The 2nd can_restart() will be called with an OK netif carrier and the error message will be printed.
To close the race window, first set the netif carrier to on, then restart the controller. In case the restart fails with an error code, roll back the netif carrier to off.
Fixes: 39549eef3587 ("can: CAN Network device driver and Netlink interface") Link: https://lore.kernel.org/all/20231005-can-dev-fix-can-restart-v2-2-91b5c1fd92... Reviewed-by: Vincent Mailhol mailhol.vincent@wanadoo.fr Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- drivers/net/can/dev/dev.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/can/dev/dev.c b/drivers/net/can/dev/dev.c index f4f224ec1009..40e2e1bbb8a6 100644 --- a/drivers/net/can/dev/dev.c +++ b/drivers/net/can/dev/dev.c @@ -583,11 +583,12 @@ static void can_restart(struct net_device *dev) priv->can_stats.restarts++;
/* Now restart the device */ - err = priv->do_set_mode(dev, CAN_MODE_START); - netif_carrier_on(dev); - if (err) + err = priv->do_set_mode(dev, CAN_MODE_START); + if (err) { netdev_err(dev, "Error %d during restart", err); + netif_carrier_off(dev); + } }
static void can_restart_work(struct work_struct *work)
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.299 commit 8b3639cb78a1a45eeb85df92aba93cc459b6175b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MCB5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 23be1e0e2a83a8543214d2599a31d9a2185a796b ]
Initially, commit 4237c75c0a35 ("[MLSXFRM]: Auto-labeling of child sockets") introduced security_inet_conn_request() in some functions where reqsk is allocated. The hook is added just after the allocation, so reqsk's IPv6 remote address was not initialised then.
However, SELinux/Smack started to read it in netlbl_req_setattr() after commit e1adea927080 ("calipso: Allow request sockets to be relabelled by the lsm.").
Commit 284904aa7946 ("lsm: Relocate the IPv4 security_inet_conn_request() hooks") fixed that kind of issue only in TCPv4 because IPv6 labeling was not supported at that time. Finally, the same issue was introduced again in IPv6.
Let's apply the same fix on DCCPv6 and TCPv6.
Fixes: e1adea927080 ("calipso: Allow request sockets to be relabelled by the lsm.") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paul Moore paul@paul-moore.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com --- net/dccp/ipv6.c | 6 +++--- net/ipv6/syncookies.c | 7 ++++--- 2 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 89dbe7330b4a..4ce961b1dc18 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -349,15 +349,15 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb) if (dccp_parse_options(sk, dreq, skb)) goto drop_and_free;
- if (security_inet_conn_request(sk, skb, req)) - goto drop_and_free; - ireq = inet_rsk(req); ireq->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr; ireq->ir_v6_loc_addr = ipv6_hdr(skb)->daddr; ireq->ireq_family = AF_INET6; ireq->ir_mark = inet_request_mark(sk, skb);
+ if (security_inet_conn_request(sk, skb, req)) + goto drop_and_free; + if (ipv6_opt_accepted(sk, skb, IP6CB(skb)) || np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo || np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim) { diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c index c1e21f55ac52..f55338bc3fd6 100644 --- a/net/ipv6/syncookies.c +++ b/net/ipv6/syncookies.c @@ -184,14 +184,15 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb) treq->af_specific = &tcp_request_sock_ipv6_ops; treq->tfo_listener = false;
- if (security_inet_conn_request(sk, skb, req)) - goto out_free; - req->mss = mss; ireq->ir_rmt_port = th->source; ireq->ir_num = ntohs(th->dest); ireq->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr; ireq->ir_v6_loc_addr = ipv6_hdr(skb)->daddr; + + if (security_inet_conn_request(sk, skb, req)) + goto out_free; + if (ipv6_opt_accepted(sk, skb, &TCP_SKB_CB(skb)->header.h6) || np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo || np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim) {