[PATCH OLK-6.6] netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentation
From: Florian Westphal <fw@strlen.de> mainline inclusion from mainline-v7.0-rc1 commit 207b3ebacb6113acaaec0d171d5307032c690004 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15123 CVE: CVE-2026-45859 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Ulrich reports a regression with nfqueue: If an application did not set the 'F_GSO' capability flag and a gso packet with an unconfirmed nf_conn entry is received all packets are now dropped instead of queued, because the check happens after skb_gso_segment(). In that case, we did have exclusive ownership of the skb and its associated conntrack entry. The elevated use count is due to skb_clone happening via skb_gso_segment(). Move the check so that its peformed vs. the aggregated packet. Then, annotate the individual segments except the first one so we can do a 2nd check at reinject time. For the normal case, where userspace does in-order reinjects, this avoids packet drops: first reinjected segment continues traversal and confirms entry, remaining segments observe the confirmed entry. While at it, simplify nf_ct_drop_unconfirmed(): We only care about unconfirmed entries with a refcnt > 1, there is no need to special-case dying entries. This only happens with UDP. With TCP, the only unconfirmed packet will be the TCP SYN, those aren't aggregated by GRO. Next patch adds a udpgro test case to cover this scenario. Reported-by: Ulrich Weber <ulrich.weber@gmail.com> Fixes: 7d8dc1c7be8d ("netfilter: nf_queue: drop packets with cloned unconfirmed conntracks") Signed-off-by: Florian Westphal <fw@strlen.de> Conflicts: include/net/netfilter/nf_queue.h net/netfilter/nfnetlink_queue.c [commit 3f8019688894 ("netfilter: move nf_reinject into nfnetlink_queue modules") moved nf_reinject from nf_queue.c to nfnetlink_queue.c in mainline, but this commit is not present in the OLK-6.6 target branch. Additionally, commit e19079adcd26 ("netfilter: nfnetlink_queue: optimize verdict lookup with hash table") added queue_num field to struct nf_queue_entry in mainline, which is absent in OLK-6.6. These differences caused multiple hunks to fail during patch application. The patch was manually adapted to preserve the target branch's file structure and error handling while applying the CVE fix logic.] Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> --- include/net/netfilter/nf_queue.h | 1 + net/netfilter/nfnetlink_queue.c | 124 +++++++++++++++++++------------ 2 files changed, 76 insertions(+), 49 deletions(-) diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h index c81021ab07aa..dc724e583e6e 100644 --- a/include/net/netfilter/nf_queue.h +++ b/include/net/netfilter/nf_queue.h @@ -19,6 +19,7 @@ struct nf_queue_entry { struct net_device *physout; #endif struct nf_hook_state state; + bool nf_ct_is_unconfirmed; u16 size; /* sizeof(entry) + saved route keys */ /* extra space to store route keys */ diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index 0ac0db71dbc6..67a4a19711c3 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -227,6 +227,34 @@ find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id) return entry; } +/* return true if the entry has an unconfirmed conntrack attached that isn't owned by us + * exclusively. + */ +static bool nf_ct_drop_unconfirmed(const struct nf_queue_entry *entry, bool *is_unconfirmed) +{ +#if IS_ENABLED(CONFIG_NF_CONNTRACK) + struct nf_conn *ct = (void *)skb_nfct(entry->skb); + + if (!ct || nf_ct_is_confirmed(ct)) + return false; + + if (is_unconfirmed) + *is_unconfirmed = true; + + /* in some cases skb_clone() can occur after initial conntrack + * pickup, but conntrack assumes exclusive skb->_nfct ownership for + * unconfirmed entries. + * + * This happens for br_netfilter and with ip multicast routing. + * This can't be solved with serialization here because one clone + * could have been queued for local delivery or could be transmitted + * in parallel on another CPU. + */ + return refcount_read(&ct->ct_general.use) > 1; +#endif + return false; +} + static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict) { const struct nf_ct_hook *ct_hook; @@ -244,6 +272,24 @@ static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict) } rcu_read_unlock(); } + + if (verdict != NF_DROP && entry->nf_ct_is_unconfirmed) { + /* If first queued segment was already reinjected then + * there is a good chance the ct entry is now confirmed. + * + * Handle the rare cases: + * - out-of-order verdict + * - threaded userspace reinjecting in parallel + * - first segment was dropped + * + * In all of those cases we can't handle this packet + * because we can't be sure that another CPU won't modify + * nf_conn->ext in parallel which isn't allowed. + */ + if (nf_ct_drop_unconfirmed(entry, NULL)) + verdict = NF_DROP; + } + nf_reinject(entry, verdict); } @@ -664,49 +710,6 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue, return NULL; } -static bool nf_ct_drop_unconfirmed(const struct nf_queue_entry *entry) -{ -#if IS_ENABLED(CONFIG_NF_CONNTRACK) - static const unsigned long flags = IPS_CONFIRMED | IPS_DYING; - struct nf_conn *ct = (void *)skb_nfct(entry->skb); - unsigned long status; - unsigned int use; - - if (!ct) - return false; - - status = READ_ONCE(ct->status); - if ((status & flags) == IPS_DYING) - return true; - - if (status & IPS_CONFIRMED) - return false; - - /* in some cases skb_clone() can occur after initial conntrack - * pickup, but conntrack assumes exclusive skb->_nfct ownership for - * unconfirmed entries. - * - * This happens for br_netfilter and with ip multicast routing. - * We can't be solved with serialization here because one clone could - * have been queued for local delivery. - */ - use = refcount_read(&ct->ct_general.use); - if (likely(use == 1)) - return false; - - /* Can't decrement further? Exclusive ownership. */ - if (!refcount_dec_not_one(&ct->ct_general.use)) - return false; - - skb_set_nfct(entry->skb, 0); - /* No nf_ct_put(): we already decremented .use and it cannot - * drop down to 0. - */ - return true; -#endif - return false; -} - static int __nfqnl_enqueue_packet(struct net *net, struct nfqnl_instance *queue, struct nf_queue_entry *entry) @@ -723,9 +726,6 @@ __nfqnl_enqueue_packet(struct net *net, struct nfqnl_instance *queue, } spin_lock_bh(&queue->lock); - if (nf_ct_drop_unconfirmed(entry)) - goto err_out_free_nskb; - if (queue->queue_total >= queue->queue_maxlen) { if (queue->flags & NFQA_CFG_F_FAIL_OPEN) { failopen = 1; @@ -836,9 +836,10 @@ __nfqnl_enqueue_packet_gso(struct net *net, struct nfqnl_instance *queue, static int nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum) { - unsigned int queued; - struct nfqnl_instance *queue; struct sk_buff *skb, *segs, *nskb; + bool ct_is_unconfirmed = false; + struct nfqnl_instance *queue; + unsigned int queued; int err = -ENOBUFS; struct net *net = entry->state.net; struct nfnl_queue_net *q = nfnl_queue_pernet(net); @@ -862,7 +863,16 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum) break; } - if ((queue->flags & NFQA_CFG_F_GSO) || !skb_is_gso(skb)) + /* Check if someone already holds another reference to + * unconfirmed ct. If so, we cannot queue the skb: + * concurrent modifications of nf_conn->ext are not + * allowed and we can't know if another CPU isn't + * processing the same nf_conn entry in parallel. + */ + if (nf_ct_drop_unconfirmed(entry, &ct_is_unconfirmed)) + return -EINVAL; + + if (!skb_is_gso(skb) || ((queue->flags & NFQA_CFG_F_GSO) && !skb_is_gso_sctp(skb))) return __nfqnl_enqueue_packet(net, queue, entry); nf_bridge_adjust_skb_data(skb); @@ -875,7 +885,23 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum) goto out_err; queued = 0; err = 0; + skb_list_walk_safe(segs, segs, nskb) { + if (ct_is_unconfirmed && queued > 0) { + /* skb_gso_segment() increments the ct refcount. + * This is a problem for unconfirmed (not in hash) + * entries, those can race when reinjections happen + * in parallel. + * + * Annotate this for all queued entries except the + * first one. + * + * As long as the first one is reinjected first it + * will do the confirmation for us. + */ + entry->nf_ct_is_unconfirmed = ct_is_unconfirmed; + } + if (err == 0) err = __nfqnl_enqueue_packet_gso(net, queue, segs, entry); -- 2.43.0
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,转换为PR失败! 邮件列表地址:https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ARH... 失败原因:调用atomgit api创建PR失败, 失败原因如下: Another open merge request already exists for this source branch: !23655 建议解决方法:请稍等,机器人会在下一次任务重新执行 FeedBack: The patch(es) which you have sent to kernel@openeuler.org has been converted to PR failed! Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ARH... Failed Reason: create PR failed when call atomgit's api, failed reason is as follows: Another open merge request already exists for this source branch: !23655 Suggest Solution: please wait, the bot will retry in the next interval
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,转换为PR失败! 邮件列表地址:https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ARH... 失败原因:调用atomgit api创建PR失败, 失败原因如下: Another open merge request already exists for this source branch: !23656 建议解决方法:请稍等,机器人会在下一次任务重新执行 FeedBack: The patch(es) which you have sent to kernel@openeuler.org has been converted to PR failed! Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ARH... Failed Reason: create PR failed when call atomgit's api, failed reason is as follows: Another open merge request already exists for this source branch: !23656 Suggest Solution: please wait, the bot will retry in the next interval
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://atomgit.com/openeuler/kernel/merge_requests/23658 邮件列表地址:https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ARH... FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://atomgit.com/openeuler/kernel/merge_requests/23658 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ARH...
participants (2)
-
patchwork bot -
superdcc97@163.com