From: Eugene Crosser crosser@average.org
stable inclusion from stable-5.10.78 commit b9c85a71e1b4898e978c594a44e881456313a855 bugzilla: 185700 https://gitee.com/openeuler/kernel/issues/I4IAU2
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 55161e67d44fdd23900be166a81e996abd6e3be9 upstream.
This reverts commit 09e856d54bda5f288ef8437a90ab2b9b3eab83d1.
When an interface is enslaved in a VRF, prerouting conntrack hook is called twice: once in the context of the original input interface, and once in the context of the VRF interface. If no special precausions are taken, this leads to creation of two conntrack entries instead of one, and breaks SNAT.
Commit above was intended to avoid creation of extra conntrack entries when input interface is enslaved in a VRF. It did so by resetting conntrack related data associated with the skb when it enters VRF context.
However it breaks netfilter operation. Imagine a use case when conntrack zone must be assigned based on the original input interface, rather than VRF interface (that would make original interfaces indistinguishable). One could create netfilter rules similar to these:
chain rawprerouting { type filter hook prerouting priority raw; iif realiface1 ct zone set 1 return iif realiface2 ct zone set 2 return }
This works before the mentioned commit, but not after: zone assignment is "forgotten", and any subsequent NAT or filtering that is dependent on the conntrack zone does not work.
Here is a reproducer script that demonstrates the difference in behaviour.
==========
IPIN=172.30.30.1 IPOUT=172.30.30.2 PFXL=30
ip li sh vein >/dev/null 2>&1 && ip li del vein ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf nft list table testct >/dev/null 2>&1 && nft delete table testct
ip li add vein type veth peer veout ip li add tvrf type vrf table 9876 ip li set veout master tvrf ip li set vein up ip li set veout up ip li set tvrf up /sbin/sysctl -w net.ipv4.conf.veout.accept_local=1 /sbin/sysctl -w net.ipv4.conf.veout.rp_filter=0 ip addr add $IPIN/$PFXL dev vein ip addr add $IPOUT/$PFXL dev veout
nft -f - <<__END__ table testct { chain rawpre { type filter hook prerouting priority raw; iif { veout, tvrf } meta nftrace set 1 iif veout ct zone set 1 return iif tvrf ct zone set 2 return notrack } chain rawout { type filter hook output priority raw; notrack } } __END__
uname -rv conntrack -F ping -W 1 -c 1 -I vein $IPOUT conntrack -L
Signed-off-by: Eugene Crosser crosser@average.org Acked-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Cc: Florian Westphal fw@strlen.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Chen Jun chenjun102@huawei.com Acked-by: Weilong Chen chenweilong@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- drivers/net/vrf.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index d406da82b4fb..2746f77745e4 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -1313,8 +1313,6 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev, bool need_strict = rt6_need_strict(&ipv6_hdr(skb)->daddr); bool is_ndisc = ipv6_ndisc_frame(skb);
- nf_reset_ct(skb); - /* loopback, multicast & non-ND link-local traffic; do not push through * packet taps again. Reset pkt_type for upper layers to process skb. * For strict packets with a source LLA, determine the dst using the @@ -1371,8 +1369,6 @@ static struct sk_buff *vrf_ip_rcv(struct net_device *vrf_dev, skb->skb_iif = vrf_dev->ifindex; IPCB(skb)->flags |= IPSKB_L3SLAVE;
- nf_reset_ct(skb); - if (ipv4_is_multicast(ip_hdr(skb)->daddr)) goto out;