From: Toke Høiland-Jørgensen <toke(a)redhat.com>
[ Upstream commit ad1e03b2b3d4430baaa109b77bc308dc73050de3 ]
The current generic XDP handler skips execution of XDP programs entirely if
an SKB is marked as cloned. This leads to some surprising behaviour, as
packets can end up being cloned in various ways, which will make an XDP
program not see all the traffic on an interface.
This was discovered by a simple test case where an XDP program that always
returns XDP_DROP is installed on a …
[View More]veth device. When combining this with
the Scapy packet sniffer (which uses an AF_PACKET) socket on the sending
side, SKBs reliably end up in the cloned state, causing them to be passed
through to the receiving interface instead of being dropped. A minimal
reproducer script for this is included below.
This patch fixed the issue by simply triggering the existing linearisation
code for cloned SKBs instead of skipping the XDP program execution. This
behaviour is in line with the behaviour of the native XDP implementation
for the veth driver, which will reallocate and copy the SKB data if the SKB
is marked as shared.
Reproducer Python script (requires BCC and Scapy):
from scapy.all import TCP, IP, Ether, sendp, sniff, AsyncSniffer, Raw, UDP
from bcc import BPF
import time, sys, subprocess, shlex
SKB_MODE = (1 << 1)
DRV_MODE = (1 << 2)
PYTHON=sys.executable
def client():
time.sleep(2)
# Sniffing on the sender causes skb_cloned() to be set
s = AsyncSniffer()
s.start()
for p in range(10):
sendp(Ether(dst="aa:aa:aa:aa:aa:aa", src="cc:cc:cc:cc:cc:cc")/IP()/UDP()/Raw("Test"),
verbose=False)
time.sleep(0.1)
s.stop()
return 0
def server(mode):
prog = BPF(text="int dummy_drop(struct xdp_md *ctx) {return XDP_DROP;}")
func = prog.load_func("dummy_drop", BPF.XDP)
prog.attach_xdp("a_to_b", func, mode)
time.sleep(1)
s = sniff(iface="a_to_b", count=10, timeout=15)
if len(s):
print(f"Got {len(s)} packets - should have gotten 0")
return 1
else:
print("Got no packets - as expected")
return 0
if len(sys.argv) < 2:
print(f"Usage: {sys.argv[0]} <skb|drv>")
sys.exit(1)
if sys.argv[1] == "client":
sys.exit(client())
elif sys.argv[1] == "server":
mode = SKB_MODE if sys.argv[2] == 'skb' else DRV_MODE
sys.exit(server(mode))
else:
try:
mode = sys.argv[1]
if mode not in ('skb', 'drv'):
print(f"Usage: {sys.argv[0]} <skb|drv>")
sys.exit(1)
print(f"Running in {mode} mode")
for cmd in [
'ip netns add netns_a',
'ip netns add netns_b',
'ip -n netns_a link add a_to_b type veth peer name b_to_a netns netns_b',
# Disable ipv6 to make sure there's no address autoconf traffic
'ip netns exec netns_a sysctl -qw net.ipv6.conf.a_to_b.disable_ipv6=1',
'ip netns exec netns_b sysctl -qw net.ipv6.conf.b_to_a.disable_ipv6=1',
'ip -n netns_a link set dev a_to_b address aa:aa:aa:aa:aa:aa',
'ip -n netns_b link set dev b_to_a address cc:cc:cc:cc:cc:cc',
'ip -n netns_a link set dev a_to_b up',
'ip -n netns_b link set dev b_to_a up']:
subprocess.check_call(shlex.split(cmd))
server = subprocess.Popen(shlex.split(f"ip netns exec netns_a {PYTHON} {sys.argv[0]} server {mode}"))
client = subprocess.Popen(shlex.split(f"ip netns exec netns_b {PYTHON} {sys.argv[0]} client"))
client.wait()
server.wait()
sys.exit(server.returncode)
finally:
subprocess.run(shlex.split("ip netns delete netns_a"))
subprocess.run(shlex.split("ip netns delete netns_b"))
Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
Reported-by: Stepan Horacek <shoracek(a)redhat.com>
Suggested-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
net/core/dev.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 1c0224e..c1a3baf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4306,14 +4306,14 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
/* Reinjected packets coming from act_mirred or similar should
* not get XDP generic processing.
*/
- if (skb_cloned(skb) || skb_is_tc_redirected(skb))
+ if (skb_is_tc_redirected(skb))
return XDP_PASS;
/* XDP packets must be linear and must have sufficient headroom
* of XDP_PACKET_HEADROOM bytes. This is the guarantee that also
* native XDP provides, thus we need to do it here as well.
*/
- if (skb_is_nonlinear(skb) ||
+ if (skb_cloned(skb) || skb_is_nonlinear(skb) ||
skb_headroom(skb) < XDP_PACKET_HEADROOM) {
int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb);
int troom = skb->tail + skb->data_len - skb->end;
--
1.8.3
[View Less]
From: Al Viro <viro(a)zeniv.linux.org.uk>
mainline inclusion
from mainline-5.1-rc1
commit 35ac1184244f1329783e1d897f74926d8bb1103a
category: bugfix
bugzilla: 30217
CVE: NA
---------------------------
* make the reference from superblock to cgroup_root counting -
do cgroup_put() in cgroup_kill_sb() whether we'd done
percpu_ref_kill() or not; matching grab is done when we allocate
a new root. That gives the same refcounting rules for all callers
of cgroup_do_mount() - a reference to …
[View More]cgroup_root has been grabbed
by caller and it either is transferred to new superblock or dropped.
* have cgroup_kill_sb() treat an already killed refcount as "just
don't bother killing it, then".
* after successful cgroup_do_mount() have cgroup1_mount() recheck
if we'd raced with mount/umount from somebody else and cgroup_root
got killed. In that case we drop the superblock and bugger off
with -ERESTARTSYS, same as if we'd found it in the list already
dying.
* don't bother with delayed initialization of refcount - it's
unreliable and not needed. No need to prevent attempts to bump
the refcount if we find cgroup_root of another mount in progress -
sget will reuse an existing superblock just fine and if the
other sb manages to die before we get there, we'll catch
that immediately after cgroup_do_mount().
* don't bother with kernfs_pin_sb() - no need for doing that
either.
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: yangerkun <yangerkun(a)huawei.com>
Reviewed-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
kernel/cgroup/cgroup-internal.h | 2 +-
kernel/cgroup/cgroup-v1.c | 58 +++++++++--------------------------------
kernel/cgroup/cgroup.c | 16 +++++-------
3 files changed, 21 insertions(+), 55 deletions(-)
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index 75568fc..6f02be1 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -196,7 +196,7 @@ int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen,
void cgroup_free_root(struct cgroup_root *root);
void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts);
-int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags);
+int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask);
int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask);
struct dentry *cgroup_do_mount(struct file_system_type *fs_type, int flags,
struct cgroup_root *root, unsigned long magic,
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index ff3e1aa..e66bb45 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -1112,13 +1112,11 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
void *data, unsigned long magic,
struct cgroup_namespace *ns)
{
- struct super_block *pinned_sb = NULL;
struct cgroup_sb_opts opts;
struct cgroup_root *root;
struct cgroup_subsys *ss;
struct dentry *dentry;
int i, ret;
- bool new_root = false;
cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
@@ -1180,29 +1178,6 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
if (root->flags ^ opts.flags)
pr_warn("new mount options do not match the existing superblock, will be ignored\n");
- /*
- * We want to reuse @root whose lifetime is governed by its
- * ->cgrp. Let's check whether @root is alive and keep it
- * that way. As cgroup_kill_sb() can happen anytime, we
- * want to block it by pinning the sb so that @root doesn't
- * get killed before mount is complete.
- *
- * With the sb pinned, tryget_live can reliably indicate
- * whether @root can be reused. If it's being killed,
- * drain it. We can use wait_queue for the wait but this
- * path is super cold. Let's just sleep a bit and retry.
- */
- pinned_sb = kernfs_pin_sb(root->kf_root, NULL);
- if (IS_ERR(pinned_sb) ||
- !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
- mutex_unlock(&cgroup_mutex);
- if (!IS_ERR_OR_NULL(pinned_sb))
- deactivate_super(pinned_sb);
- msleep(10);
- ret = restart_syscall();
- goto out_free;
- }
-
ret = 0;
goto out_unlock;
}
@@ -1228,15 +1203,20 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
ret = -ENOMEM;
goto out_unlock;
}
- new_root = true;
init_cgroup_root(root, &opts);
- ret = cgroup_setup_root(root, opts.subsys_mask, PERCPU_REF_INIT_DEAD);
+ ret = cgroup_setup_root(root, opts.subsys_mask);
if (ret)
cgroup_free_root(root);
out_unlock:
+ if (!ret && !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
+ mutex_unlock(&cgroup_mutex);
+ msleep(10);
+ ret = restart_syscall();
+ goto out_free;
+ }
mutex_unlock(&cgroup_mutex);
out_free:
kfree(opts.release_agent);
@@ -1248,25 +1228,13 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
dentry = cgroup_do_mount(&cgroup_fs_type, flags, root,
CGROUP_SUPER_MAGIC, ns);
- /*
- * There's a race window after we release cgroup_mutex and before
- * allocating a superblock. Make sure a concurrent process won't
- * be able to re-use the root during this window by delaying the
- * initialization of root refcnt.
- */
- if (new_root) {
- mutex_lock(&cgroup_mutex);
- percpu_ref_reinit(&root->cgrp.self.refcnt);
- mutex_unlock(&cgroup_mutex);
+ if (!IS_ERR(dentry) && percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
+ struct super_block *sb = dentry->d_sb;
+ dput(dentry);
+ deactivate_locked_super(sb);
+ msleep(10);
+ dentry = ERR_PTR(restart_syscall());
}
-
- /*
- * If @pinned_sb, we're reusing an existing root and holding an
- * extra ref on its sb. Mount is complete. Put the extra ref.
- */
- if (pinned_sb)
- deactivate_super(pinned_sb);
-
return dentry;
}
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 08bd40d..eaee21d 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1897,7 +1897,7 @@ void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts)
set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags);
}
-int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
+int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
{
LIST_HEAD(tmp_links);
struct cgroup *root_cgrp = &root->cgrp;
@@ -1914,7 +1914,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
root_cgrp->ancestor_ids[0] = ret;
ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release,
- ref_flags, GFP_KERNEL);
+ 0, GFP_KERNEL);
if (ret)
goto out;
@@ -2091,18 +2091,16 @@ static void cgroup_kill_sb(struct super_block *sb)
struct cgroup_root *root = cgroup_root_from_kf(kf_root);
/*
- * If @root doesn't have any mounts or children, start killing it.
+ * If @root doesn't have any children, start killing it.
* This prevents new mounts by disabling percpu_ref_tryget_live().
* cgroup_mount() may wait for @root's release.
*
* And don't kill the default root.
*/
- if (!list_empty(&root->cgrp.self.children) ||
- root == &cgrp_dfl_root)
- cgroup_put(&root->cgrp);
- else
+ if (list_empty(&root->cgrp.self.children) && root != &cgrp_dfl_root &&
+ !percpu_ref_is_dying(&root->cgrp.self.refcnt))
percpu_ref_kill(&root->cgrp.self.refcnt);
-
+ cgroup_put(&root->cgrp);
kernfs_kill_sb(sb);
}
@@ -5371,7 +5369,7 @@ int __init cgroup_init(void)
hash_add(css_set_table, &init_css_set.hlist,
css_set_hash(init_css_set.subsys));
- BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0, 0));
+ BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0));
mutex_unlock(&cgroup_mutex);
--
1.8.3
[View Less]
From: Yu'an Wang <wangyuan46(a)huawei.com>
driver inclusion
category: bugfix
bugzilla: NA
CVE: NA
In this patch, we try to fixup the problem of wrong judgement of
used parameter. When the accelerator driver registers to crypto,
the self-test program will send task to hardware, the used para
will decrease in interrupt thread, but exit flow of crypto will
call hisi_qm_stop_qp_nolock function to stop queue, which try to
get value of used. In the above scene, it will appear to get the
value …
[View More]first and then decrease, which causes null pointer. So we
should distinguish fault handling process from normal stop process.
Signed-off-by: Yu'an Wang <wangyuan46(a)huawei.com>
Reviewed-by: Cheng Hu <hucheng.hu(a)huawei.com>
Reviewed-by: Guangwei Zhou <zhouguangwei5(a)huawei.com>
Reviewed-by: Junxian Liu <liujunxian3(a)huawei.com>
Reviewed-by: Shukun Tan <tanshukun1(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/crypto/hisilicon/qm.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index e89a770..86f4a12 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -1591,10 +1591,12 @@ static int hisi_qm_stop_qp_nolock(struct hisi_qp *qp)
if (qp->qm->wq)
flush_workqueue(qp->qm->wq);
+ else
+ flush_work(&qp->qm->work);
/* wait for increase used count in qp send and last poll qp finish */
udelay(WAIT_PERIOD);
- if (atomic_read(&qp->qp_status.used))
+ if (unlikely(qp->is_resetting && atomic_read(&qp->qp_status.used)))
qp_stop_fail_cb(qp);
dev_dbg(dev, "stop queue %u!", qp->qp_id);
--
1.8.3
[View Less]