Backport 5.10.125 LTS patches from upstream. git cherry-pick v5.10.124..v5.10.125~1 -s
Already merged(-2): 9429b75bc271 tcp: increase source port perturb table to 2^16 df3f3bb5059d io_uring: add missing item types for various requests
Context conflict: 24b922a5da00 tcp: dynamically allocate the perturb table used by source ports
Total patches: 12 - 2 = 10
Christian Borntraeger (1): s390/mm: use non-quiescing sske for KVM switch to keyed guest
Damien Le Moal (1): zonefs: fix zonefs_iomap_begin() for reads
Eric Dumazet (1): tcp: add some entropy in __inet_hash_connect()
Lukas Wunner (1): serial: core: Initialize rs485 RTS polarity already on probe
Marian Postevca (1): usb: gadget: u_ether: fix regression in setting fixed MAC address
Will Deacon (1): arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer
Willy Tarreau (4): tcp: use different parts of the port_offset for index and offset tcp: add small random increments to the source port tcp: dynamically allocate the perturb table used by source ports tcp: drop the hash_32() part from the index calculation
arch/arm64/mm/cache.S | 2 - arch/s390/mm/pgtable.c | 2 +- drivers/tty/serial/serial_core.c | 34 ++++------ drivers/usb/gadget/function/u_ether.c | 11 +++- fs/zonefs/super.c | 92 ++++++++++++++++++--------- net/ipv4/inet_hashtables.c | 22 +++++-- 6 files changed, 102 insertions(+), 61 deletions(-)
From: Christian Borntraeger borntraeger@linux.ibm.com
stable inclusion from stable-v5.10.125 commit ee4677b78eca67f9c89b2452e9e9d37cc2a7436d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6EY
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 3ae11dbcfac906a8c3a480e98660a823130dc16a upstream.
The switch to a keyed guest does not require a classic sske as the other guest CPUs are not accessing the key before the switch is complete. By using the NQ SSKE things are faster especially with multiple guests.
Signed-off-by: Christian Borntraeger borntraeger@linux.ibm.com Suggested-by: Janis Schoetterl-Glausch scgl@linux.ibm.com Reviewed-by: Claudio Imbrenda imbrenda@linux.ibm.com Link: https://lore.kernel.org/r/20220530092706.11637-3-borntraeger@linux.ibm.com Signed-off-by: Christian Borntraeger borntraeger@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- arch/s390/mm/pgtable.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index fabaedddc90c..1c05caf68e7d 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -734,7 +734,7 @@ void ptep_zap_key(struct mm_struct *mm, unsigned long addr, pte_t *ptep) pgste_val(pgste) |= PGSTE_GR_BIT | PGSTE_GC_BIT; ptev = pte_val(*ptep); if (!(ptev & _PAGE_INVALID) && (ptev & _PAGE_WRITE)) - page_set_storage_key(ptev & PAGE_MASK, PAGE_DEFAULT_KEY, 1); + page_set_storage_key(ptev & PAGE_MASK, PAGE_DEFAULT_KEY, 0); pgste_set_unlock(ptep, pgste); preempt_enable(); }
From: Damien Le Moal damien.lemoal@opensource.wdc.com
stable inclusion from stable-v5.10.125 commit 355be6131164c5bacf2e810763835aecb6e01fcb category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6EY
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit c1c1204c0d0c1dccc1310b9277fb2bd8b663d8fe upstream.
If a readahead is issued to a sequential zone file with an offset exactly equal to the current file size, the iomap type is set to IOMAP_UNWRITTEN, which will prevent an IO, but the iomap length is calculated as 0. This causes a WARN_ON() in iomap_iter():
[17309.548939] WARNING: CPU: 3 PID: 2137 at fs/iomap/iter.c:34 iomap_iter+0x9cf/0xe80 [...] [17309.650907] RIP: 0010:iomap_iter+0x9cf/0xe80 [...] [17309.754560] Call Trace: [17309.757078] <TASK> [17309.759240] ? lock_is_held_type+0xd8/0x130 [17309.763531] iomap_readahead+0x1a8/0x870 [17309.767550] ? iomap_read_folio+0x4c0/0x4c0 [17309.771817] ? lockdep_hardirqs_on_prepare+0x400/0x400 [17309.778848] ? lock_release+0x370/0x750 [17309.784462] ? folio_add_lru+0x217/0x3f0 [17309.790220] ? reacquire_held_locks+0x4e0/0x4e0 [17309.796543] read_pages+0x17d/0xb60 [17309.801854] ? folio_add_lru+0x238/0x3f0 [17309.807573] ? readahead_expand+0x5f0/0x5f0 [17309.813554] ? policy_node+0xb5/0x140 [17309.819018] page_cache_ra_unbounded+0x27d/0x450 [17309.825439] filemap_get_pages+0x500/0x1450 [17309.831444] ? filemap_add_folio+0x140/0x140 [17309.837519] ? lock_is_held_type+0xd8/0x130 [17309.843509] filemap_read+0x28c/0x9f0 [17309.848953] ? zonefs_file_read_iter+0x1ea/0x4d0 [zonefs] [17309.856162] ? trace_contention_end+0xd6/0x130 [17309.862416] ? __mutex_lock+0x221/0x1480 [17309.868151] ? zonefs_file_read_iter+0x166/0x4d0 [zonefs] [17309.875364] ? filemap_get_pages+0x1450/0x1450 [17309.881647] ? __mutex_unlock_slowpath+0x15e/0x620 [17309.888248] ? wait_for_completion_io_timeout+0x20/0x20 [17309.895231] ? lock_is_held_type+0xd8/0x130 [17309.901115] ? lock_is_held_type+0xd8/0x130 [17309.906934] zonefs_file_read_iter+0x356/0x4d0 [zonefs] [17309.913750] new_sync_read+0x2d8/0x520 [17309.919035] ? __x64_sys_lseek+0x1d0/0x1d0
Furthermore, this causes iomap_readahead() to loop forever as iomap_readahead_iter() always returns 0, making no progress.
Fix this by treating reads after the file size as access to holes, setting the iomap type to IOMAP_HOLE, the iomap addr to IOMAP_NULL_ADDR and using the length argument as is for the iomap length. To simplify the code with this change, zonefs_iomap_begin() is split into the read variant, zonefs_read_iomap_begin() and zonefs_read_iomap_ops, and the write variant, zonefs_write_iomap_begin() and zonefs_write_iomap_ops.
Reported-by: Jorgen Hansen Jorgen.Hansen@wdc.com Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system") Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Reviewed-by: Jorgen Hansen Jorgen.Hansen@wdc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- fs/zonefs/super.c | 92 +++++++++++++++++++++++++++++++---------------- 1 file changed, 62 insertions(+), 30 deletions(-)
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index df9c8e7e5d49..d99dfeb0e003 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -68,15 +68,49 @@ static inline void zonefs_i_size_write(struct inode *inode, loff_t isize) zi->i_flags &= ~ZONEFS_ZONE_OPEN; }
-static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length, - unsigned int flags, struct iomap *iomap, - struct iomap *srcmap) +static int zonefs_read_iomap_begin(struct inode *inode, loff_t offset, + loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) { struct zonefs_inode_info *zi = ZONEFS_I(inode); struct super_block *sb = inode->i_sb; loff_t isize;
- /* All I/Os should always be within the file maximum size */ + /* + * All blocks are always mapped below EOF. If reading past EOF, + * act as if there is a hole up to the file maximum size. + */ + mutex_lock(&zi->i_truncate_mutex); + iomap->bdev = inode->i_sb->s_bdev; + iomap->offset = ALIGN_DOWN(offset, sb->s_blocksize); + isize = i_size_read(inode); + if (iomap->offset >= isize) { + iomap->type = IOMAP_HOLE; + iomap->addr = IOMAP_NULL_ADDR; + iomap->length = length; + } else { + iomap->type = IOMAP_MAPPED; + iomap->addr = (zi->i_zsector << SECTOR_SHIFT) + iomap->offset; + iomap->length = isize - iomap->offset; + } + mutex_unlock(&zi->i_truncate_mutex); + + return 0; +} + +static const struct iomap_ops zonefs_read_iomap_ops = { + .iomap_begin = zonefs_read_iomap_begin, +}; + +static int zonefs_write_iomap_begin(struct inode *inode, loff_t offset, + loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) +{ + struct zonefs_inode_info *zi = ZONEFS_I(inode); + struct super_block *sb = inode->i_sb; + loff_t isize; + + /* All write I/Os should always be within the file maximum size */ if (WARN_ON_ONCE(offset + length > zi->i_max_size)) return -EIO;
@@ -86,7 +120,7 @@ static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length, * operation. */ if (WARN_ON_ONCE(zi->i_ztype == ZONEFS_ZTYPE_SEQ && - (flags & IOMAP_WRITE) && !(flags & IOMAP_DIRECT))) + !(flags & IOMAP_DIRECT))) return -EIO;
/* @@ -95,45 +129,42 @@ static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length, * write pointer) and unwriten beyond. */ mutex_lock(&zi->i_truncate_mutex); + iomap->bdev = inode->i_sb->s_bdev; + iomap->offset = ALIGN_DOWN(offset, sb->s_blocksize); + iomap->addr = (zi->i_zsector << SECTOR_SHIFT) + iomap->offset; isize = i_size_read(inode); - if (offset >= isize) + if (iomap->offset >= isize) { iomap->type = IOMAP_UNWRITTEN; - else + iomap->length = zi->i_max_size - iomap->offset; + } else { iomap->type = IOMAP_MAPPED; - if (flags & IOMAP_WRITE) - length = zi->i_max_size - offset; - else - length = min(length, isize - offset); + iomap->length = isize - iomap->offset; + } mutex_unlock(&zi->i_truncate_mutex);
- iomap->offset = ALIGN_DOWN(offset, sb->s_blocksize); - iomap->length = ALIGN(offset + length, sb->s_blocksize) - iomap->offset; - iomap->bdev = inode->i_sb->s_bdev; - iomap->addr = (zi->i_zsector << SECTOR_SHIFT) + iomap->offset; - return 0; }
-static const struct iomap_ops zonefs_iomap_ops = { - .iomap_begin = zonefs_iomap_begin, +static const struct iomap_ops zonefs_write_iomap_ops = { + .iomap_begin = zonefs_write_iomap_begin, };
static int zonefs_readpage(struct file *unused, struct page *page) { - return iomap_readpage(page, &zonefs_iomap_ops); + return iomap_readpage(page, &zonefs_read_iomap_ops); }
static void zonefs_readahead(struct readahead_control *rac) { - iomap_readahead(rac, &zonefs_iomap_ops); + iomap_readahead(rac, &zonefs_read_iomap_ops); }
/* * Map blocks for page writeback. This is used only on conventional zone files, * which implies that the page range can only be within the fixed inode size. */ -static int zonefs_map_blocks(struct iomap_writepage_ctx *wpc, - struct inode *inode, loff_t offset) +static int zonefs_write_map_blocks(struct iomap_writepage_ctx *wpc, + struct inode *inode, loff_t offset) { struct zonefs_inode_info *zi = ZONEFS_I(inode);
@@ -147,12 +178,12 @@ static int zonefs_map_blocks(struct iomap_writepage_ctx *wpc, offset < wpc->iomap.offset + wpc->iomap.length) return 0;
- return zonefs_iomap_begin(inode, offset, zi->i_max_size - offset, - IOMAP_WRITE, &wpc->iomap, NULL); + return zonefs_write_iomap_begin(inode, offset, zi->i_max_size - offset, + IOMAP_WRITE, &wpc->iomap, NULL); }
static const struct iomap_writeback_ops zonefs_writeback_ops = { - .map_blocks = zonefs_map_blocks, + .map_blocks = zonefs_write_map_blocks, };
static int zonefs_writepage(struct page *page, struct writeback_control *wbc) @@ -182,7 +213,8 @@ static int zonefs_swap_activate(struct swap_info_struct *sis, return -EINVAL; }
- return iomap_swapfile_activate(sis, swap_file, span, &zonefs_iomap_ops); + return iomap_swapfile_activate(sis, swap_file, span, + &zonefs_read_iomap_ops); }
static const struct address_space_operations zonefs_file_aops = { @@ -612,7 +644,7 @@ static vm_fault_t zonefs_filemap_page_mkwrite(struct vm_fault *vmf)
/* Serialize against truncates */ down_read(&zi->i_mmap_sem); - ret = iomap_page_mkwrite(vmf, &zonefs_iomap_ops); + ret = iomap_page_mkwrite(vmf, &zonefs_write_iomap_ops); up_read(&zi->i_mmap_sem);
sb_end_pagefault(inode->i_sb); @@ -869,7 +901,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) if (append) ret = zonefs_file_dio_append(iocb, from); else - ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops, + ret = iomap_dio_rw(iocb, from, &zonefs_write_iomap_ops, &zonefs_write_dio_ops, sync); if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && (ret > 0 || ret == -EIOCBQUEUED)) { @@ -911,7 +943,7 @@ static ssize_t zonefs_file_buffered_write(struct kiocb *iocb, if (ret <= 0) goto inode_unlock;
- ret = iomap_file_buffered_write(iocb, from, &zonefs_iomap_ops); + ret = iomap_file_buffered_write(iocb, from, &zonefs_write_iomap_ops); if (ret > 0) iocb->ki_pos += ret; else if (ret == -EIO) @@ -1004,7 +1036,7 @@ static ssize_t zonefs_file_read_iter(struct kiocb *iocb, struct iov_iter *to) goto inode_unlock; } file_accessed(iocb->ki_filp); - ret = iomap_dio_rw(iocb, to, &zonefs_iomap_ops, + ret = iomap_dio_rw(iocb, to, &zonefs_read_iomap_ops, &zonefs_read_dio_ops, is_sync_kiocb(iocb)); } else { ret = generic_file_read_iter(iocb, to);
From: Marian Postevca posteuca@mutex.one
stable inclusion from stable-v5.10.125 commit 16b1994679a0ad7f12a3b40b928e70241303d6f4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6EY
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit b337af3a4d6147000b7ca6b3438bf5c820849b37 upstream.
In systemd systems setting a fixed MAC address through the "dev_addr" module argument fails systematically. When checking the MAC address after the interface is created it always has the same but different MAC address to the one supplied as argument.
This is partially caused by systemd which by default will set an internally generated permanent MAC address for interfaces that are marked as having a randomly generated address.
Commit 890d5b40908bfd1a ("usb: gadget: u_ether: fix race in setting MAC address in setup phase") didn't take into account the fact that the interface must be marked as having a set MAC address when it's set as module argument.
Fixed by marking the interface with NET_ADDR_SET when the "dev_addr" module argument is supplied.
Fixes: 890d5b40908bfd1a ("usb: gadget: u_ether: fix race in setting MAC address in setup phase") Cc: stable@vger.kernel.org Signed-off-by: Marian Postevca posteuca@mutex.one Link: https://lore.kernel.org/r/20220603153459.32722-1-posteuca@mutex.one Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/usb/gadget/function/u_ether.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c index a40be8b448c2..64ef97ab9274 100644 --- a/drivers/usb/gadget/function/u_ether.c +++ b/drivers/usb/gadget/function/u_ether.c @@ -772,9 +772,13 @@ struct eth_dev *gether_setup_name(struct usb_gadget *g, dev->qmult = qmult; snprintf(net->name, sizeof(net->name), "%s%%d", netname);
- if (get_ether_addr(dev_addr, net->dev_addr)) + if (get_ether_addr(dev_addr, net->dev_addr)) { + net->addr_assign_type = NET_ADDR_RANDOM; dev_warn(&g->dev, "using random %s ethernet address\n", "self"); + } else { + net->addr_assign_type = NET_ADDR_SET; + } if (get_ether_addr(host_addr, dev->host_mac)) dev_warn(&g->dev, "using random %s ethernet address\n", "host"); @@ -831,6 +835,9 @@ struct net_device *gether_setup_name_default(const char *netname) INIT_LIST_HEAD(&dev->tx_reqs); INIT_LIST_HEAD(&dev->rx_reqs);
+ /* by default we always have a random MAC address */ + net->addr_assign_type = NET_ADDR_RANDOM; + skb_queue_head_init(&dev->rx_frames);
/* network device setup */ @@ -868,7 +875,6 @@ int gether_register_netdev(struct net_device *net) g = dev->gadget;
memcpy(net->dev_addr, dev->dev_mac, ETH_ALEN); - net->addr_assign_type = NET_ADDR_RANDOM;
status = register_netdev(net); if (status < 0) { @@ -908,6 +914,7 @@ int gether_set_dev_addr(struct net_device *net, const char *dev_addr) if (get_ether_addr(dev_addr, new_addr)) return -EINVAL; memcpy(dev->dev_mac, new_addr, ETH_ALEN); + net->addr_assign_type = NET_ADDR_SET; return 0; } EXPORT_SYMBOL_GPL(gether_set_dev_addr);
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v5.10.125 commit 743acb520799bd8987edb7d1282b7eeddeb8f986 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6EY
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit c579bd1b4021c42ae247108f1e6f73dd3f08600c upstream.
Even when implementing RFC 6056 3.3.4 (Algorithm 4: Double-Hash Port Selection Algorithm), a patient attacker could still be able to collect enough state from an otherwise idle host.
Idea of this patch is to inject some noise, in the cases __inet_hash_connect() found a candidate in the first attempt.
This noise should not significantly reduce the collision avoidance, and should be zero if connection table is already well used.
Note that this is not implementing RFC 6056 3.3.5 because we think Algorithm 5 could hurt typical workloads.
Signed-off-by: Eric Dumazet edumazet@google.com Cc: David Dworken ddworken@google.com Cc: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Cc: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/ipv4/inet_hashtables.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 2bb9ded807ee..aadafcf7358b 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -833,6 +833,11 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, return -EADDRNOTAVAIL;
ok: + /* If our first attempt found a candidate, skip next candidate + * in 1/16 of cases to add some noise. + */ + if (!i && !(prandom_u32() % 16)) + i = 2; WRITE_ONCE(table_perturb[index], READ_ONCE(table_perturb[index]) + i + 2);
/* Head lock still held and bh's disabled */
From: Willy Tarreau w@1wt.eu
stable inclusion from stable-v5.10.125 commit dd46a868fcfdf3aac8ffb20b2321e174a0156fb2 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6EY
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 9e9b70ae923baf2b5e8a0ea4fd0c8451801ac526 upstream.
Amit Klein suggests that we use different parts of port_offset for the table's index and the port offset so that there is no direct relation between them.
Cc: Jason A. Donenfeld Jason@zx2c4.com Cc: Moshe Kol moshe.kol@mail.huji.ac.il Cc: Yossi Gilad yossi.gilad@mail.huji.ac.il Cc: Amit Klein aksecurity@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: Willy Tarreau w@1wt.eu Signed-off-by: Jakub Kicinski kuba@kernel.org Cc: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/ipv4/inet_hashtables.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index aadafcf7358b..3fc82bd37952 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -778,7 +778,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, net_get_random_once(table_perturb, sizeof(table_perturb)); index = hash_32(port_offset, INET_TABLE_PERTURB_SHIFT);
- offset = READ_ONCE(table_perturb[index]) + port_offset; + offset = READ_ONCE(table_perturb[index]) + (port_offset >> 32); offset %= remaining; /* In first pass we try ports of @low parity. * inet_csk_get_port() does the opposite choice.
From: Willy Tarreau w@1wt.eu
stable inclusion from stable-v5.10.125 commit d28e64b1c63eced06aedadcacb0be4997c10c7c1 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6EY
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit ca7af0402550f9a0b3316d5f1c30904e42ed257d upstream.
Here we're randomly adding between 0 and 7 random increments to the selected source port in order to add some noise in the source port selection that will make the next port less predictable.
With the default port range of 32768-60999 this means a worst case reuse scenario of 14116/8=1764 connections between two consecutive uses of the same port, with an average of 14116/4.5=3137. This code was stressed at more than 800000 connections per second to a fixed target with all connections closed by the client using RSTs (worst condition) and only 2 connections failed among 13 billion, despite the hash being reseeded every 10 seconds, indicating a perfectly safe situation.
Cc: Moshe Kol moshe.kol@mail.huji.ac.il Cc: Yossi Gilad yossi.gilad@mail.huji.ac.il Cc: Amit Klein aksecurity@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: Willy Tarreau w@1wt.eu Signed-off-by: Jakub Kicinski kuba@kernel.org Cc: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/ipv4/inet_hashtables.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 3fc82bd37952..734bea0f883e 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -833,11 +833,12 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, return -EADDRNOTAVAIL;
ok: - /* If our first attempt found a candidate, skip next candidate - * in 1/16 of cases to add some noise. + /* Here we want to add a little bit of randomness to the next source + * port that will be chosen. We use a max() with a random here so that + * on low contention the randomness is maximal and on high contention + * it may be inexistent. */ - if (!i && !(prandom_u32() % 16)) - i = 2; + i = max_t(int, i, (prandom_u32() & 7) * 2); WRITE_ONCE(table_perturb[index], READ_ONCE(table_perturb[index]) + i + 2);
/* Head lock still held and bh's disabled */
From: Willy Tarreau w@1wt.eu
stable inclusion from stable-v5.10.125 commit 24b922a5da0055f1bb8b391b83e494d2e5d56508 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6EY
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit e9261476184be1abd486c9434164b2acbe0ed6c2 upstream.
We'll need to further increase the size of this table and it's likely that at some point its size will not be suitable anymore for a static table. Let's allocate it on boot from inet_hashinfo2_init(), which is called from tcp_init().
Cc: Moshe Kol moshe.kol@mail.huji.ac.il Cc: Yossi Gilad yossi.gilad@mail.huji.ac.il Cc: Amit Klein aksecurity@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: Willy Tarreau w@1wt.eu Signed-off-by: Jakub Kicinski kuba@kernel.org Cc: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com
Conflicts: net/ipv4/inet_hashtables.c Reviewed-by: Wei Li liwei391@huawei.com --- net/ipv4/inet_hashtables.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 734bea0f883e..72716d1a06e4 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -732,7 +732,8 @@ EXPORT_SYMBOL_GPL(inet_unhash); * memory. */ #define INET_TABLE_PERTURB_SHIFT 16 -static u32 table_perturb[1 << INET_TABLE_PERTURB_SHIFT]; +#define INET_TABLE_PERTURB_SIZE (1 << INET_TABLE_PERTURB_SHIFT) +static u32 *table_perturb;
int __inet_hash_connect(struct inet_timewait_death_row *death_row, struct sock *sk, u64 port_offset, @@ -775,7 +776,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, if (likely(remaining > 1)) remaining &= ~1U;
- net_get_random_once(table_perturb, sizeof(table_perturb)); + net_get_random_once(table_perturb, + INET_TABLE_PERTURB_SIZE * sizeof(*table_perturb)); index = hash_32(port_offset, INET_TABLE_PERTURB_SHIFT);
offset = READ_ONCE(table_perturb[index]) + (port_offset >> 32); @@ -912,6 +914,12 @@ void __init inet_hashinfo2_init(struct inet_hashinfo *h, const char *name, low_limit, high_limit); init_hashinfo_lhash2(h); + + /* this one is used for source ports of outgoing connections */ + table_perturb = kmalloc_array(INET_TABLE_PERTURB_SIZE, + sizeof(*table_perturb), GFP_KERNEL); + if (!table_perturb) + panic("TCP: failed to alloc table_perturb"); }
int inet_hashinfo2_init_mod(struct inet_hashinfo *h)
From: Willy Tarreau w@1wt.eu
stable inclusion from stable-v5.10.125 commit 7ccb026ecb997405b59d391140c25ee347891504 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6EY
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit e8161345ddbb66e449abde10d2fdce93f867eba9 upstream.
In commit 190cc82489f4 ("tcp: change source port randomizarion at connect() time"), the table_perturb[] array was introduced and an index was taken from the port_offset via hash_32(). But it turns out that hash_32() performs a multiplication while the input here comes from the output of SipHash in secure_seq, that is well distributed enough to avoid the need for yet another hash.
Suggested-by: Amit Klein aksecurity@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: Willy Tarreau w@1wt.eu Signed-off-by: Jakub Kicinski kuba@kernel.org Cc: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- net/ipv4/inet_hashtables.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 72716d1a06e4..31b4196ea7e1 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -778,7 +778,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
net_get_random_once(table_perturb, INET_TABLE_PERTURB_SIZE * sizeof(*table_perturb)); - index = hash_32(port_offset, INET_TABLE_PERTURB_SHIFT); + index = port_offset & (INET_TABLE_PERTURB_SIZE - 1);
offset = READ_ONCE(table_perturb[index]) + (port_offset >> 32); offset %= remaining;
From: Lukas Wunner lukas@wunner.de
stable inclusion from stable-v5.10.125 commit a1508d164e58dc0857a1103a014d78f321d7481f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6EY
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 2dd8a74fddd21b95dcc60a2d3c9eaec993419d69 upstream.
RTS polarity of rs485-enabled ports is currently initialized on uart open via:
tty_port_open() tty_port_block_til_ready() tty_port_raise_dtr_rts() # if (C_BAUD(tty)) uart_dtr_rts() uart_port_dtr_rts()
There's at least three problems here:
First, if no baud rate is set, RTS polarity is not initialized. That's the right thing to do for rs232, but not for rs485, which requires that RTS is deasserted unconditionally.
Second, if the DeviceTree property "linux,rs485-enabled-at-boot-time" is present, RTS should be deasserted as early as possible, i.e. on probe. Otherwise it may remain asserted until first open.
Third, even though RTS is deasserted on open and close, it may subsequently be asserted by uart_throttle(), uart_unthrottle() or uart_set_termios() because those functions aren't rs485-aware. (Only uart_tiocmset() is.)
To address these issues, move RTS initialization from uart_port_dtr_rts() to uart_configure_port(). Prevent subsequent modification of RTS polarity by moving the existing rs485 check from uart_tiocmget() to uart_update_mctrl().
That way, RTS is initialized on probe and then remains unmodified unless the uart transmits data. If rs485 is enabled at runtime (instead of at boot) through a TIOCSRS485 ioctl(), RTS is initialized by the uart driver's ->rs485_config() callback and then likewise remains unmodified.
The PL011 driver initializes RTS on uart open and prevents subsequent modification in its ->set_mctrl() callback. That code is obsoleted by the present commit, so drop it.
Cc: Jan Kiszka jan.kiszka@siemens.com Cc: Su Bao Cheng baocheng.su@siemens.com Signed-off-by: Lukas Wunner lukas@wunner.de Link: https://lore.kernel.org/r/2d2acaf3a69e89b7bf687c912022b11fd29dfa1e.164290928... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/tty/serial/serial_core.c | 34 +++++++++++--------------------- 1 file changed, 12 insertions(+), 22 deletions(-)
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c index 19f0c5db11e3..32d09d024f6c 100644 --- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c @@ -144,6 +144,11 @@ uart_update_mctrl(struct uart_port *port, unsigned int set, unsigned int clear) unsigned long flags; unsigned int old;
+ if (port->rs485.flags & SER_RS485_ENABLED) { + set &= ~TIOCM_RTS; + clear &= ~TIOCM_RTS; + } + spin_lock_irqsave(&port->lock, flags); old = port->mctrl; port->mctrl = (old & ~clear) | set; @@ -157,23 +162,10 @@ uart_update_mctrl(struct uart_port *port, unsigned int set, unsigned int clear)
static void uart_port_dtr_rts(struct uart_port *uport, int raise) { - int rs485_on = uport->rs485_config && - (uport->rs485.flags & SER_RS485_ENABLED); - int RTS_after_send = !!(uport->rs485.flags & SER_RS485_RTS_AFTER_SEND); - - if (raise) { - if (rs485_on && RTS_after_send) { - uart_set_mctrl(uport, TIOCM_DTR); - uart_clear_mctrl(uport, TIOCM_RTS); - } else { - uart_set_mctrl(uport, TIOCM_DTR | TIOCM_RTS); - } - } else { - unsigned int clear = TIOCM_DTR; - - clear |= (!rs485_on || RTS_after_send) ? TIOCM_RTS : 0; - uart_clear_mctrl(uport, clear); - } + if (raise) + uart_set_mctrl(uport, TIOCM_DTR | TIOCM_RTS); + else + uart_clear_mctrl(uport, TIOCM_DTR | TIOCM_RTS); }
/* @@ -1116,11 +1108,6 @@ uart_tiocmset(struct tty_struct *tty, unsigned int set, unsigned int clear) goto out;
if (!tty_io_error(tty)) { - if (uport->rs485.flags & SER_RS485_ENABLED) { - set &= ~TIOCM_RTS; - clear &= ~TIOCM_RTS; - } - uart_update_mctrl(uport, set, clear); ret = 0; } @@ -2429,6 +2416,9 @@ uart_configure_port(struct uart_driver *drv, struct uart_state *state, */ spin_lock_irqsave(&port->lock, flags); port->mctrl &= TIOCM_DTR; + if (port->rs485.flags & SER_RS485_ENABLED && + !(port->rs485.flags & SER_RS485_RTS_AFTER_SEND)) + port->mctrl |= TIOCM_RTS; port->ops->set_mctrl(port, port->mctrl); spin_unlock_irqrestore(&port->lock, flags);
From: Will Deacon will@kernel.org
stable inclusion from stable-v5.10.125 commit 1a264b3a6940b2595dc6b51edf8b1d9a71963fc7 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6EY
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit c50f11c6196f45c92ca48b16a5071615d4ae0572 upstream.
Invalidating the buffer memory in arch_sync_dma_for_device() for FROM_DEVICE transfers
When using the streaming DMA API to map a buffer prior to inbound non-coherent DMA (i.e. DMA_FROM_DEVICE), we invalidate any dirty CPU cachelines so that they will not be written back during the transfer and corrupt the buffer contents written by the DMA. This, however, poses two potential problems:
(1) If the DMA transfer does not write to every byte in the buffer, then the unwritten bytes will contain stale data once the transfer has completed.
(2) If the buffer has a virtual alias in userspace, then stale data may be visible via this alias during the period between performing the cache invalidation and the DMA writes landing in memory.
Address both of these issues by cleaning (aka writing-back) the dirty lines in arch_sync_dma_for_device(DMA_FROM_DEVICE) instead of discarding them using invalidation.
Cc: Ard Biesheuvel ardb@kernel.org Cc: Christoph Hellwig hch@lst.de Cc: Robin Murphy robin.murphy@arm.com Cc: Russell King linux@armlinux.org.uk Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220606152150.GA31568@willie-the-truck Signed-off-by: Will Deacon will@kernel.org Reviewed-by: Ard Biesheuvel ardb@kernel.org Link: https://lore.kernel.org/r/20220610151228.4562-2-will@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- arch/arm64/mm/cache.S | 2 -- 1 file changed, 2 deletions(-)
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S index 2d881f34dd9d..7b8158ae36ec 100644 --- a/arch/arm64/mm/cache.S +++ b/arch/arm64/mm/cache.S @@ -228,8 +228,6 @@ SYM_FUNC_END_PI(__dma_flush_area) * - dir - DMA direction */ SYM_FUNC_START_PI(__dma_map_area) - cmp w2, #DMA_FROM_DEVICE - b.eq __dma_inv_area b __dma_clean_area SYM_FUNC_END_PI(__dma_map_area)