From: Andrei Matei <andreimatei1@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit 6b4a64bafd107e521c01eec3453ce94a3fb38529
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I932VT
CVE: CVE-2023-52452

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Privileged programs are supposed to be able to read uninitialized stack memory (ever since 6715df8d5) but, before this patch, these accesses were permitted inconsistently. In particular, accesses were permitted above state->allocated_stack, but not below it. In other words, if the stack was already "large enough", the access was permitted, but otherwise the access was rejected instead of being allowed to "grow the stack". This undesired rejection was happening in two places:
- in check_stack_slot_within_bounds()
- in check_stack_range_initialized()

This patch arranges for these accesses to be permitted. A bunch of tests that were relying on the old rejection had to change; all of them were changed to also run unprivileged, in which case the old behavior persists. One test couldn't be updated - global_func16 - because it can't run unprivileged for other reasons.
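As an illustration of the bounds rule described above, here is a minimal userspace sketch, not the kernel code itself. The names access_type, min_valid_off() and access_within_bounds() are hypothetical stand-ins; only MAX_BPF_STACK, allow_uninit_stack and allocated_stack come from the patch, and the real verifier reports errors rather than returning a boolean.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BPF_STACK 512 /* the kernel's per-frame BPF stack limit */

/* Hypothetical stand-in for the kernel's enum bpf_access_type. */
enum access_type { ACCESS_READ, ACCESS_WRITE };

/* Lowest frame-pointer-relative offset that an access may start at. */
static int min_valid_off(enum access_type t, bool allow_uninit_stack,
			 int allocated_stack)
{
	if (t == ACCESS_WRITE || allow_uninit_stack)
		return -MAX_BPF_STACK;	/* the access may grow the stack */
	return -allocated_stack;	/* reads limited to what already exists */
}

/* True when [off, off + size) lies within the permitted stack range. */
static bool access_within_bounds(int off, int size, enum access_type t,
				 bool allow_uninit_stack, int allocated_stack)
{
	assert(size > 0);
	return off >= min_valid_off(t, allow_uninit_stack, allocated_stack) &&
	       off + size <= 0;
}
```

With 8 bytes of stack allocated, a read at fp-16 is within bounds in this model only when allow_uninit_stack is set, which mirrors the privileged versus unprivileged split in the updated tests.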
This patch also fixes the tracking of the stack size for variable-offset reads. This second fix is bundled in the same commit as the first one because they're inter-related. Before this patch, writes to the stack using registers containing a variable offset (as opposed to registers with fixed, known values) were not properly contributing to the function's needed stack size. As a result, it was possible for a program to verify, but then to attempt to read out-of-bounds data at runtime because a too small stack had been allocated for it.
Each function tracks the size of the stack it needs in bpf_subprog_info.stack_depth, which is maintained by update_stack_depth(). For regular memory accesses, check_mem_access() was calling update_stack_depth() but it was passing in only the fixed part of the offset register, ignoring the variable offset. This was incorrect; the minimum possible value of that register should be used instead.
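The difference between the two computations can be sketched as follows. This is a simplified model under stated assumptions: depth_from_fixed_off() and depth_from_min_off() are hypothetical helpers, fixed_off stands for the fixed part of the offset and smin_value for the smallest value the variable part can take.

```c
#include <assert.h>

#define BPF_REG_SIZE 8

/* Round a byte count up to a whole number of 8-byte stack slots. */
static int round_up_reg(long n)
{
	return (int)((n + BPF_REG_SIZE - 1) / BPF_REG_SIZE * BPF_REG_SIZE);
}

/* Old behavior (modelled): only the fixed part of the offset is considered. */
static int depth_from_fixed_off(long fixed_off)
{
	assert(fixed_off <= 0);
	return (int)-fixed_off;
}

/* Fixed behavior (modelled): use the lowest address the access can reach. */
static int depth_from_min_off(long fixed_off, long smin_value)
{
	long min_off = smin_value + fixed_off;

	assert(min_off <= 0);
	return round_up_reg(-min_off);
}
```

For the selftest program, where the pointer is either fp-8 or fp-16 and the fixed offset is 0, the first helper yields 0 bytes while the second yields 16.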
This tracking is now fixed by centralizing the tracking of stack size in grow_stack_state(), and by lifting the calls to grow_stack_state() to check_stack_access_within_bounds() as suggested by Andrii. The code is now simpler and more convincingly tracks the correct maximum stack size. check_stack_range_initialized() can now rely on enough stack having been allocated for the access; this helps with the fix for the first issue.
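A rough model of the centralized flow, assuming a privileged program: the bounds check succeeds first, and only then is the stack grown and the per-subprogram high-water mark updated in one place. The struct layouts and check_access() below are hypothetical simplifications; in the kernel, grow_stack_state() also reallocates the slot array and can fail with -ENOMEM.

```c
#include <assert.h>

#define MAX_BPF_STACK 512
#define BPF_REG_SIZE 8

/* Hypothetical, stripped-down versions of the verifier structures. */
struct subprog_info { int stack_depth; };
struct func_state { int allocated_stack; int subprogno; };

/* Grow the frame to at least size bytes and record the high-water mark. */
static void grow_stack_state(struct subprog_info *info,
			     struct func_state *state, int size)
{
	assert(size % BPF_REG_SIZE == 0);
	if (size <= state->allocated_stack)
		return;
	state->allocated_stack = size;
	if (info[state->subprogno].stack_depth < size)
		info[state->subprogno].stack_depth = size;
}

/* Bounds check for an access spanning [min_off, max_off); grows on success. */
static int check_access(struct subprog_info *info, struct func_state *state,
			long min_off, long max_off)
{
	if (min_off < -MAX_BPF_STACK || max_off > 0)
		return -1;
	grow_stack_state(info, state,
			 (int)((-min_off + BPF_REG_SIZE - 1) /
			       BPF_REG_SIZE * BPF_REG_SIZE));
	return 0;
}
```

Because every accepted access passes through the same call, later code can assume that enough stack has been allocated, which is what the patch relies on in check_stack_range_initialized().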
A few tests were changed to also check the stack depth computation. The one that fails without this patch is verifier_var_off:stack_write_priv_vs_unpriv.
Fixes: 01f810ace9ed3 ("bpf: Allow variable-offset stack access")
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231208032519.260451-3-andreimatei1@gmail.com
Closes: https://lore.kernel.org/bpf/CABWLsev9g8UP_c3a=1qbuZUi20tGoUXoU07FPf-5FLvhOKO...

Conflicts:
	tools/testing/selftests/bpf/verifier/*

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 kernel/bpf/verifier.c                         | 62 +++++++------------
 .../selftests/bpf/verifier/basic_stack.c      | 11 ++--
 tools/testing/selftests/bpf/verifier/calls.c  |  4 +-
 .../testing/selftests/bpf/verifier/int_ptr.c  |  7 ++-
 .../selftests/bpf/verifier/raw_stack.c        |  7 ++-
 .../testing/selftests/bpf/verifier/var_off.c  | 61 +++++++++++++-----
 6 files changed, 89 insertions(+), 63 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8caee6b90aee..91e83cfcbfb8 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -799,7 +799,10 @@ static int resize_reference_state(struct bpf_func_state *state, size_t n)
 	return 0;
 }
 
-static int grow_stack_state(struct bpf_func_state *state, int size)
+/* Possibly update state->allocated_stack to be at least size bytes. Also
+ * possibly update the function's high-water mark in its bpf_subprog_info.
+ */
+static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state *state, int size)
 {
 	size_t old_n = state->allocated_stack / BPF_REG_SIZE, n = size / BPF_REG_SIZE;
 
@@ -811,6 +814,11 @@ static int grow_stack_state(struct bpf_func_state *state, int size)
 		return -ENOMEM;
 
 	state->allocated_stack = size;
+
+	/* update known max for given subprogram */
+	if (env->subprog_info[state->subprogno].stack_depth < size)
+		env->subprog_info[state->subprogno].stack_depth = size;
+
 	return 0;
 }
 
@@ -2488,9 +2496,6 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 	struct bpf_reg_state *reg = NULL;
 	u32 dst_reg = insn->dst_reg;
 
-	err = grow_stack_state(state, round_up(slot + 1, BPF_REG_SIZE));
-	if (err)
-		return err;
 	/* caller checked that off % size == 0 and -MAX_BPF_STACK <= off < 0,
 	 * so it's aligned access and [off, off + size) are within stack limits
 	 */
@@ -2639,11 +2644,6 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
 	if (value_reg && register_is_null(value_reg))
 		writing_zero = true;
 
-	err = grow_stack_state(state, round_up(-min_off, BPF_REG_SIZE));
-	if (err)
-		return err;
-
-
 	/* Variable offset writes destroy any spilled pointers in range. */
 	for (i = min_off; i < max_off; i++) {
 		u8 new_type, *stype;
@@ -3471,20 +3471,6 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,
 					   strict);
 }
 
-static int update_stack_depth(struct bpf_verifier_env *env,
-			      const struct bpf_func_state *func,
-			      int off)
-{
-	u16 stack = env->subprog_info[func->subprogno].stack_depth;
-
-	if (stack >= -off)
-		return 0;
-
-	/* update known max for given subprogram */
-	env->subprog_info[func->subprogno].stack_depth = -off;
-	return 0;
-}
-
 /* starting from main bpf function walk all instructions of the function
  * and recursively walk all callees that given function can call.
  * Ignore jump and exit insns.
@@ -3896,13 +3882,14 @@ static int check_ptr_to_map_access(struct bpf_verifier_env *env,
  * The minimum valid offset is -MAX_BPF_STACK for writes, and
  * -state->allocated_stack for reads.
  */
-static int check_stack_slot_within_bounds(int off,
+static int check_stack_slot_within_bounds(struct bpf_verifier_env *env,
+					  s64 off,
 					  struct bpf_func_state *state,
 					  enum bpf_access_type t)
 {
 	int min_valid_off;
 
-	if (t == BPF_WRITE)
+	if (t == BPF_WRITE || env->allow_uninit_stack)
 		min_valid_off = -MAX_BPF_STACK;
 	else
 		min_valid_off = -state->allocated_stack;
@@ -3951,7 +3938,7 @@ static int check_stack_access_within_bounds(
 		max_off = reg->smax_value + off + access_size;
 	}
 
-	err = check_stack_slot_within_bounds(min_off, state, type);
+	err = check_stack_slot_within_bounds(env, min_off, state, type);
 	if (!err && max_off > 0)
 		err = -EINVAL; /* out of stack access into non-negative offsets */
 
@@ -3966,8 +3953,10 @@ static int check_stack_access_within_bounds(
 			verbose(env, "invalid variable-offset%s stack R%d var_off=%s size=%d\n",
 				err_extra, regno, tn_buf, access_size);
 		}
+		return err;
 	}
-	return err;
+
+	return grow_stack_state(env, state, round_up(-min_off, BPF_REG_SIZE));
 }
 
 /* check whether memory at (regno + off) is accessible for t = (read | write)
@@ -3982,7 +3971,6 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 {
 	struct bpf_reg_state *regs = cur_regs(env);
 	struct bpf_reg_state *reg = regs + regno;
-	struct bpf_func_state *state;
 	int size, err = 0;
 
 	size = bpf_size_to_bytes(bpf_size);
@@ -4100,11 +4088,6 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		if (err)
 			return err;
 
-		state = func(env, reg);
-		err = update_stack_depth(env, state, off);
-		if (err)
-			return err;
-
 		if (t == BPF_READ)
 			err = check_stack_read(env, regno, off, size,
 					       value_regno);
@@ -4239,7 +4222,8 @@ static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_ins
 
 /* When register 'regno' is used to read the stack (either directly or through
  * a helper function) make sure that it's within stack boundary and, depending
- * on the access type, that all elements of the stack are initialized.
+ * on the access type and privileges, that all elements of the stack are
+ * initialized.
  *
  * 'off' includes 'regno->off', but not its dynamic part (if any).
  *
@@ -4322,8 +4306,11 @@ static int check_stack_range_initialized(
 
 		slot = -i - 1;
 		spi = slot / BPF_REG_SIZE;
-		if (state->allocated_stack <= slot)
-			goto err;
+		if (state->allocated_stack <= slot) {
+			verbose(env, "verifier bug: allocated_stack too small");
+			return -EFAULT;
+		}
+
 		stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
 		if (*stype == STACK_MISC)
 			goto mark;
@@ -4351,7 +4338,6 @@ static int check_stack_range_initialized(
 			goto mark;
 		}
 
-err:
 		if (tnum_is_const(reg->var_off)) {
 			verbose(env, "invalid%s read from stack R%d off %d+%d size %d\n",
 				err_extra, regno, min_off, i - min_off, access_size);
@@ -4371,7 +4357,7 @@ static int check_stack_range_initialized(
 			      state->stack[spi].spilled_ptr.parent,
 			      REG_LIVE_READ64);
 	}
-	return update_stack_depth(env, state, min_off);
+	return 0;
 }
 
 static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
diff --git a/tools/testing/selftests/bpf/verifier/basic_stack.c b/tools/testing/selftests/bpf/verifier/basic_stack.c
index f995777dddb3..1e481fca2d72 100644
--- a/tools/testing/selftests/bpf/verifier/basic_stack.c
+++ b/tools/testing/selftests/bpf/verifier/basic_stack.c
@@ -17,8 +17,9 @@
 	BPF_EXIT_INSN(),
 	},
 	.fixup_map_hash_8b = { 2 },
-	.errstr = "invalid indirect read from stack",
-	.result = REJECT,
+	.result = ACCEPT,
+	.errstr_unpriv = "invalid indirect read from stack",
+	.result_unpriv = REJECT,
 },
 {
 	"uninitialized stack2",
@@ -27,8 +28,10 @@
 	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, -8),
 	BPF_EXIT_INSN(),
 	},
-	.errstr = "invalid read from stack",
-	.result = REJECT,
+	.result = ACCEPT,
+	.retval = POINTER_VALUE,
+	.errstr_unpriv = "invalid read from stack",
+	.result_unpriv = REJECT,
 },
 {
 	"invalid fp arithmetic",
diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
index 4b0628cd2d03..0ac55b818867 100644
--- a/tools/testing/selftests/bpf/verifier/calls.c
+++ b/tools/testing/selftests/bpf/verifier/calls.c
@@ -1228,7 +1228,9 @@
 	.prog_type = BPF_PROG_TYPE_XDP,
 	.fixup_map_hash_8b = { 23 },
 	.result = REJECT,
-	.errstr = "invalid read from stack R7 off=-16 size=8",
+	.errstr = "R0 invalid mem access 'inv'",
+	.result_unpriv = REJECT,
+	.errstr_unpriv = "invalid read from stack R7 off=-16 size=8",
 },
 {
 	"calls: two calls that receive map_value via arg=ptr_stack_of_caller. test1",
diff --git a/tools/testing/selftests/bpf/verifier/int_ptr.c b/tools/testing/selftests/bpf/verifier/int_ptr.c
index 02d9e004260b..c28cd2b8f1da 100644
--- a/tools/testing/selftests/bpf/verifier/int_ptr.c
+++ b/tools/testing/selftests/bpf/verifier/int_ptr.c
@@ -25,9 +25,10 @@
 		BPF_MOV64_IMM(BPF_REG_0, 1),
 		BPF_EXIT_INSN(),
 	},
-	.result = REJECT,
-	.prog_type = BPF_PROG_TYPE_CGROUP_SYSCTL,
-	.errstr = "invalid indirect read from stack R4 off -16+0 size 8",
+	.result = ACCEPT,
+	.retval = POINTER_VALUE,
+	.errstr_unpriv = "invalid indirect read from stack R4 off -16+0 size 8",
+	.result_unpriv = REJECT,
 },
 {
 	"ARG_PTR_TO_LONG half-uninitialized",
diff --git a/tools/testing/selftests/bpf/verifier/raw_stack.c b/tools/testing/selftests/bpf/verifier/raw_stack.c
index cc8e8c3cdc03..f9364f454009 100644
--- a/tools/testing/selftests/bpf/verifier/raw_stack.c
+++ b/tools/testing/selftests/bpf/verifier/raw_stack.c
@@ -10,9 +10,10 @@
 	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 0),
 	BPF_EXIT_INSN(),
 	},
-	.result = REJECT,
-	.errstr = "invalid read from stack R6 off=-8 size=8",
-	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.result = ACCEPT,
+	.retval = POINTER_VALUE,
+	.errstr_unpriv = "invalid read from stack R6 off=-8 size=8",
+	.result_unpriv = REJECT,
 },
 {
 	"raw_stack: skb_load_bytes, negative len",
diff --git a/tools/testing/selftests/bpf/verifier/var_off.c b/tools/testing/selftests/bpf/verifier/var_off.c
index dc92a29f0d74..743b990904a9 100644
--- a/tools/testing/selftests/bpf/verifier/var_off.c
+++ b/tools/testing/selftests/bpf/verifier/var_off.c
@@ -58,9 +58,10 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.result = REJECT,
-	.errstr = "invalid variable-offset read from stack R2",
-	.prog_type = BPF_PROG_TYPE_LWT_IN,
+	.result = ACCEPT,
+	.errstr_unpriv = "R2 variable stack access prohibited for !root",
+	.result_unpriv = REJECT,
+	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
 },
 {
 	"variable-offset stack write, priv vs unpriv",
@@ -70,27 +71,58 @@
 	/* Make it small and 8-byte aligned */
 	BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 8),
 	BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 16),
+	/* Add it to fp. We now have either fp-8 or
+	 * fp-16, but we don't know which
+	 */
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
+	/* Dereference it for a stack write */
+	BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	/* Check that the maximum stack depth is correctly maintained according to the
+	 * maximum possible variable offset.
+	 */
+	.result = ACCEPT,
+	/* Variable stack access is rejected for unprivileged.
+	 */
+	.errstr_unpriv = "R2 variable stack access prohibited for !root",
+	.result_unpriv = REJECT,
+},
+{
+	/* Similar to the previous test, but this time also perform a read from the
+	 * address written to with a variable offset. The read is allowed, showing that,
+	 * after a variable-offset write, a priviledged program can read the slots that
+	 * were in the range of that write (even if the verifier doesn't actually know if
+	 * the slot being read was really written to or not.
+	 *
+	 * Despite this test being mostly a superset, the previous test is also kept for
+	 * the sake of it checking the stack depth in the case where there is no read.
+	 */
+	"variable-offset stack write followed by read",
+	.insns = {
+	/* Get an unknown value */
+	BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
+	/* Make it small and 8-byte aligned */
+	BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 8),
+	BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 16),
 	/* Add it to fp. We now have either fp-8 or fp-16, but
 	 * we don't know which
 	 */
 	BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
 	/* Dereference it for a stack write */
 	BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
-	/* Now read from the address we just wrote. This shows
-	 * that, after a variable-offset write, a priviledged
-	 * program can read the slots that were in the range of
-	 * that write (even if the verifier doesn't actually know
-	 * if the slot being read was really written to or not.
-	 */
+	/* Now read from the address we just wrote. */
 	BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_2, 0),
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	/* Variable stack access is rejected for unprivileged.
+	/* Check that the maximum stack depth is correctly maintained according to the
+	 * maximum possible variable offset.
 	 */
+	.result = ACCEPT,
 	.errstr_unpriv = "R2 variable stack access prohibited for !root",
 	.result_unpriv = REJECT,
-	.result = ACCEPT,
 },
 {
 	"variable-offset stack write clobbers spilled regs",
@@ -233,9 +265,10 @@
 	BPF_EXIT_INSN(),
 	},
 	.fixup_map_hash_8b = { 5 },
-	.errstr = "invalid indirect read from stack R2 var_off",
-	.result = REJECT,
-	.prog_type = BPF_PROG_TYPE_LWT_IN,
+	.result = ACCEPT,
+	.errstr_unpriv = "R2 variable stack access prohibited for !root",
+	.result_unpriv = REJECT,
+	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
 },
 {
 	"indirect variable-offset stack access, priv vs unpriv",