[PATCH OLK-6.6 0/3] Sync patches from 6.6

Liu Chuang (1):
  trace: Fix kabi breakage in struct trace_iterator

Steven Rostedt (Google) (2):
  tracing/ring-buffer: Fix wait_on_pipe() race
  ring-buffer: Make wake once of ring_buffer_wait() more robust

 include/linux/ring_buffer.h  |  3 ++-
 include/linux/trace_events.h |  6 ++++-
 kernel/trace/ring_buffer.c   | 39 ++++++++++++++++++++------------
 kernel/trace/trace.c         | 43 ++++++++++++++++++++++++++----------
 4 files changed, 63 insertions(+), 28 deletions(-)

--
2.34.1

From: "Steven Rostedt (Google)" <rostedt@goodmis.org> mainline inclusion from mainline-v6.9-rc1 commit 2aa043a55b9a764c9cbde5a8c654eeaaffe224cf category: bugfix bugzilla: http://openeuler.huawei.com/bugzilla/show_bug.cgi?id=189867 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When the trace_pipe_raw file is closed, there should be no new readers on the file descriptor. This is mostly handled with the waking and wait_index fields of the iterator. But there's still a slight race. CPU 0 CPU 1 ----- ----- wait_index++; index = wait_index; ring_buffer_wake_waiters(); wait_on_pipe() ring_buffer_wait(); The ring_buffer_wait() will miss the wakeup from CPU 1. The problem is that the ring_buffer_wait() needs the logic of: prepare_to_wait(); if (!condition) schedule(); Where the missing condition check is the iter->wait_index update. Have the ring_buffer_wait() take a conditional callback function and a data parameter that can be used within the wait_event_interruptible() of the ring_buffer_wait() function. In wait_on_pipe(), pass a condition function that will check if the wait_index has been updated, if it has, it will return true to break out of the wait_event_interruptible() loop. Create a new field "closed" in the trace_iterator and set it in the .flush() callback before calling ring_buffer_wake_waiters(). This will keep any new readers from waiting on a closed file descriptor. Have the wait_on_pipe() condition callback also check the closed field. Change the wait_index field of the trace_iterator to atomic_t. There's no reason it needs to be 'long' and making it atomic and using atomic_read_acquire() and atomic_fetch_inc_release() will provide the necessary memory barriers. Add a "woken" flag to tracing_buffers_splice_read() to exit the loop after one more try to fetch data. That is, if it waited for data and something woke it up, it should try to collect any new data and then exit back to user space. Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wgsNgewHFxZAJiAQznwPMqEtQmi... Link: https://lore.kernel.org/linux-trace-kernel/20240312121703.557950713@goodmis.... Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linke li <lilinke99@qq.com> Cc: Rabin Vincent <rabin@rab.in> Fixes: f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Conflicts: kernel/trace/trace.c include/linux/trace_events.h [Did not backport commit bce761d757452. And KABI adjustment.] 
Signed-off-by: Liu Chuang <liuchuang40@huawei.com>
---
 include/linux/ring_buffer.h  |  3 ++-
 include/linux/trace_events.h |  6 ++++-
 kernel/trace/ring_buffer.c   | 13 ++++++-----
 kernel/trace/trace.c         | 43 ++++++++++++++++++++++++++----------
 4 files changed, 46 insertions(+), 19 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index ded528d23f85..318d3cdfd42c 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -99,7 +99,8 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
 })

 typedef bool (*ring_buffer_cond_fn)(void *data);
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
+		     ring_buffer_cond_fn cond, void *data);
 __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
			       struct file *filp, poll_table *poll_table, int full);
 void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 3f10e29df05a..c91bc2b8bda7 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -104,7 +104,11 @@ struct trace_iterator {
 	unsigned int		temp_size;
 	char			*fmt;	/* modified format holder */
 	unsigned int		fmt_size;
-	long			wait_index;
+
+	/* Set when the file is closed to prevent new waiters */
+	bool			closed;
+
+	atomic_t		wait_index;

 	/* trace_seq for __print_flags() and __print_symbolic() etc. */
 	struct trace_seq	tmp_seq;
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 72884e8073b3..f3523788e5e6 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1056,23 +1056,26 @@ static bool rb_wait_once(void *data)
  * @buffer: buffer to wait on
  * @cpu: the cpu buffer to wait on
  * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ * @cond: condition function to break out of wait (NULL to run once)
+ * @data: the data to pass to @cond.
  *
  * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
  * as data is added to any of the @buffer's cpu buffers. Otherwise
  * it will wait for data to be added to a specific cpu buffer.
  */
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
+		     ring_buffer_cond_fn cond, void *data)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct wait_queue_head *waitq;
-	ring_buffer_cond_fn cond;
 	struct rb_irq_work *rbwork;
-	void *data;
 	long once = 0;
 	int ret = 0;

-	cond = rb_wait_once;
-	data = &once;
+	if (!cond) {
+		cond = rb_wait_once;
+		data = &once;
+	}

 	/*
 	 * Depending on what the caller is waiting for, either any
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2412c0a46e55..02a3d30e85a9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1946,15 +1946,36 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)

 #endif /* CONFIG_TRACER_MAX_TRACE */

+struct pipe_wait {
+	struct trace_iterator	*iter;
+	int			wait_index;
+};
+
+static bool wait_pipe_cond(void *data)
+{
+	struct pipe_wait *pwait = data;
+	struct trace_iterator *iter = pwait->iter;
+
+	if (atomic_read_acquire(&iter->wait_index) != pwait->wait_index)
+		return true;
+
+	return iter->closed;
+}
+
 static int wait_on_pipe(struct trace_iterator *iter, int full)
 {
+	struct pipe_wait pwait;
 	int ret;

 	/* Iterators are static, they should be filled or empty */
 	if (trace_buffer_iter(iter, iter->cpu_file))
 		return 0;

-	ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full);
+	pwait.wait_index = atomic_read_acquire(&iter->wait_index);
+	pwait.iter = iter;
+
+	ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full,
+			       wait_pipe_cond, &pwait);

 #ifdef CONFIG_TRACER_MAX_TRACE
 	/*
@@ -8262,9 +8283,9 @@ static int tracing_buffers_flush(struct file *file, fl_owner_t id)
 	struct ftrace_buffer_info *info = file->private_data;
 	struct trace_iterator *iter = &info->iter;

-	iter->wait_index++;
+	iter->closed = true;
 	/* Make sure the waiters see the new wait_index */
-	smp_wmb();
+	(void)atomic_fetch_inc_release(&iter->wait_index);

 	ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);

@@ -8364,6 +8385,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
 		.spd_release	= buffer_spd_release,
 	};
 	struct buffer_ref *ref;
+	bool woken = false;
 	int entries, i;
 	ssize_t ret = 0;

@@ -8435,17 +8457,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,

 	/* did we read anything? */
 	if (!spd.nr_pages) {
-		long wait_index;

 		if (ret)
 			goto out;

+		if (woken)
+			goto out;
+
 		ret = -EAGAIN;
 		if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
 			goto out;

-		wait_index = READ_ONCE(iter->wait_index);
-
 		ret = wait_on_pipe(iter, iter->snapshot ? 0 : iter->tr->buffer_percent);
 		if (ret)
 			goto out;
@@ -8454,10 +8476,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
 		if (!tracer_tracing_is_on(iter->tr))
 			goto out;

-		/* Make sure we see the new wait_index */
-		smp_rmb();
-		if (wait_index != iter->wait_index)
-			goto out;
+		/* Iterate one more time to collect any new data then exit */
+		woken = true;

 		goto again;
 	}
@@ -8480,9 +8500,8 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned

 	mutex_lock(&trace_types_lock);

-	iter->wait_index++;
 	/* Make sure the waiters see the new wait_index */
-	smp_wmb();
+	(void)atomic_fetch_inc_release(&iter->wait_index);

 	ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
--
2.34.1
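[For illustration only, not part of the backport: a minimal, self-contained
userspace analogue of the race and of the condition-callback fix described
in the patch above. It mimics wait_on_pipe()/wait_pipe_cond() with pthreads
and C11 atomics; every name in it is invented for the sketch, and it builds
with "cc -pthread".]

/*
 * Userspace analogue (not kernel code) of the pattern the patch adds to
 * ring_buffer_wait(): the waiter snapshots a counter before it blocks and
 * re-checks it (plus a "closed" flag) under the lock, so a wakeup that
 * races with the snapshot is never lost.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static atomic_int      wait_index;	/* bumped by the "closer" */
static atomic_bool     closed;		/* analogue of iter->closed */

/* Analogue of wait_pipe_cond(): true means "stop waiting". */
static bool should_stop(int snapshot)
{
	return atomic_load(&wait_index) != snapshot || atomic_load(&closed);
}

/* Analogue of wait_on_pipe(): snapshot first, then wait on the condition. */
static void *waiter(void *arg)
{
	int snapshot = atomic_load(&wait_index);

	pthread_mutex_lock(&lock);
	while (!should_stop(snapshot))	/* condition checked before sleeping */
		pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);
	printf("waiter: woken, wait_index moved or file closed\n");
	return NULL;
}

/* Analogue of the .flush() path: mark closed, bump the index, wake everyone. */
static void *closer(void *arg)
{
	atomic_store(&closed, true);
	atomic_fetch_add(&wait_index, 1);
	pthread_mutex_lock(&lock);
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t w, c;

	pthread_create(&w, NULL, waiter, NULL);
	pthread_create(&c, NULL, closer, NULL);
	pthread_join(c, NULL);
	pthread_join(w, NULL);
	return 0;
}

[The point mirrors the commit message: because the waiter re-checks the
snapshotted index before sleeping, a wakeup that lands between the snapshot
and the sleep cannot be missed.]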

From: "Steven Rostedt (Google)" <rostedt@goodmis.org> mainline inclusion from mainline-v6.9-rc1 commit b70f2938242a028f8e9473781ede175486a59dc8 category: bugfix bugzilla: http://openeuler.huawei.com/bugzilla/show_bug.cgi?id=189867 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The default behavior of ring_buffer_wait() when passed a NULL "cond" parameter is to exit the function the first time it is woken up. The current implementation uses a counter that starts at zero and when it is greater than one it exits the wait_event_interruptible(). But this relies on the internal working of wait_event_interruptible() as that code basically has: if (cond) return; prepare_to_wait(); if (!cond) schedule(); finish_wait(); That is, cond is called twice before it sleeps. The default cond of ring_buffer_wait() needs to account for that and wait for its counter to increment twice before exiting. Instead, use the seq/atomic_inc logic that is used by the tracing code that calls this function. Add an atomic_t seq to rb_irq_work and when cond is NULL, have the default callback take a descriptor as its data that holds the rbwork and the value of the seq when it started. The wakeups will now increment the rbwork->seq and the cond callback will simply check if that number is different, and no longer have to rely on the implementation of wait_event_interruptible(). Link: https://lore.kernel.org/linux-trace-kernel/20240315063115.6cb5d205@gandalf.l... Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 7af9ded0c2ca ("ring-buffer: Use wait_event_interruptible() in ring_buffer_wait()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Liu Chuang <liuchuang40@huawei.com> --- kernel/trace/ring_buffer.c | 34 +++++++++++++++++++++------------- 1 file changed, 21 insertions(+), 13 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index f3523788e5e6..a42cba532abe 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -413,6 +413,7 @@ struct rb_irq_work { struct irq_work work; wait_queue_head_t waiters; wait_queue_head_t full_waiters; + atomic_t seq; bool waiters_pending; bool full_waiters_pending; bool wakeup_full; @@ -907,6 +908,9 @@ static void rb_wake_up_waiters(struct irq_work *work) { struct rb_irq_work *rbwork = container_of(work, struct rb_irq_work, work); + /* For waiters waiting for the first wake up */ + (void)atomic_fetch_inc_release(&rbwork->seq); + wake_up_all(&rbwork->waiters); if (rbwork->full_waiters_pending || rbwork->wakeup_full) { /* Only cpu_buffer sets the above flags */ @@ -1035,20 +1039,21 @@ rb_wait_cond(struct rb_irq_work *rbwork, struct trace_buffer *buffer, return false; } +struct rb_wait_data { + struct rb_irq_work *irq_work; + int seq; +}; + /* * The default wait condition for ring_buffer_wait() is to just to exit the * wait loop the first time it is woken up. 
*/ static bool rb_wait_once(void *data) { - long *once = data; + struct rb_wait_data *rdata = data; + struct rb_irq_work *rbwork = rdata->irq_work; - /* wait_event() actually calls this twice before scheduling*/ - if (*once > 1) - return true; - - (*once)++; - return false; + return atomic_read_acquire(&rbwork->seq) != rdata->seq; } /** @@ -1069,14 +1074,9 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full, struct ring_buffer_per_cpu *cpu_buffer; struct wait_queue_head *waitq; struct rb_irq_work *rbwork; - long once = 0; + struct rb_wait_data rdata; int ret = 0; - if (!cond) { - cond = rb_wait_once; - data = &once; - } - /* * Depending on what the caller is waiting for, either any * data in any cpu buffer, or a specific buffer, put the @@ -1098,6 +1098,14 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full, else waitq = &rbwork->waiters; + /* Set up to exit loop as soon as it is woken */ + if (!cond) { + cond = rb_wait_once; + rdata.irq_work = rbwork; + rdata.seq = atomic_read_acquire(&rbwork->seq); + data = &rdata; + } + ret = wait_event_interruptible((*waitq), rb_wait_cond(rbwork, buffer, cpu, full, cond, data)); -- 2.34.1
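[For illustration only, not part of the backport: a small single-process
sketch of why the call-counting default condition removed by the patch
above is fragile and the sequence check is not. It models wait_event()'s
"check the condition, check it again before sleeping" behaviour as a plain
loop; all names are invented and the check counts are parameters of the
model, not of the real macro.]

/*
 * Sketch only: models wait_event()'s repeated condition checks as a loop.
 * The old default condition counted its own invocations, so it only
 * behaved correctly if the caller checked exactly twice before sleeping;
 * the new default compares a wakeup sequence number and does not care how
 * many times it is evaluated.
 */
#include <stdbool.h>
#include <stdio.h>

static int wakeup_seq;			/* analogue of rbwork->seq, bumped on wakeup */

/* Old style: "exit once I have been evaluated more than twice". */
static bool cond_count_calls(void *data)
{
	long *once = data;

	if (*once > 1)
		return true;
	(*once)++;
	return false;
}

/* New style: "exit once the wakeup sequence has moved". */
struct wait_data {
	int seq;			/* snapshot taken before waiting */
};

static bool cond_seq_changed(void *data)
{
	struct wait_data *w = data;

	return wakeup_seq != w->seq;
}

/* Model of wait_event(): evaluate the condition N times before sleeping. */
static void wait_event_model(int checks, bool (*cond)(void *), void *data,
			     const char *name)
{
	for (int i = 0; i < checks; i++) {
		if (cond(data)) {
			printf("%-11s: exited after %d checks with NO wakeup\n",
			       name, i + 1);
			return;
		}
	}
	printf("%-11s: would sleep until a real wakeup\n", name);
}

int main(void)
{
	long once = 0;
	struct wait_data w = { .seq = wakeup_seq };

	/* If the implementation ever checked three times instead of two... */
	wait_event_model(3, cond_count_calls, &once, "count-calls");
	/* ...the sequence-based condition would still wait for a real wakeup. */
	wait_event_model(3, cond_seq_changed, &w, "seq-changed");
	return 0;
}

[This only models the control flow; the real wait_event_interruptible()
semantics are as quoted in the commit message above.]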

Offering: HULK

hulk inclusion
category: bugfix
bugzilla: http://openeuler.huawei.com/bugzilla/show_bug.cgi?id=189867
Reference: https://open.codehub.huawei.com/OpenSourceCenter/openEuler-source/kernel/fil...

--------------------------------

Fix kabi change and a problem caused by kabi change.

Fixes: 2aa043a55b9a ("tracing/ring-buffer: Fix wait_on_pipe() race")
Signed-off-by: Liu Chuang <liuchuang40@huawei.com>
---
 include/linux/trace_events.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index c91bc2b8bda7..6d91ba4eb4f5 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -106,9 +106,9 @@ struct trace_iterator {
 	unsigned int		fmt_size;

 	/* Set when the file is closed to prevent new waiters */
-	bool			closed;
+	KABI_FILL_HOLE(bool closed)

-	atomic_t		wait_index;
+	KABI_REPLACE(long wait_index, atomic_t wait_index)

 	/* trace_seq for __print_flags() and __print_symbolic() etc. */
 	struct trace_seq	tmp_seq;
--
2.34.1
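[For illustration only: a simplified, compilable sketch of the idea behind
the two KABI macros used above. This is an assumption about the general
technique (an anonymous union that lets the new member reuse the old
member's size and offset, plus a new member parked in an existing padding
hole); it is not the actual openEuler <linux/kabi.h> implementation, and
the macro, struct, and member names below are invented.]

/*
 * Sketch only -- NOT the real KABI_FILL_HOLE()/KABI_REPLACE() definitions.
 * The general idea: wrap the old and new member in an anonymous union so
 * the replacement keeps the original size and offset, and place genuinely
 * new members in padding holes so the struct layout (and thus the kabi)
 * does not change.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct { int counter; } atomic_t;	/* stand-in for the kernel type */

/* Illustrative macro, hypothetical name and shape. */
#define MY_KABI_REPLACE(_orig, _new)	\
	union {				\
		_new;			\
		struct { _orig; };	\
	}

struct trace_iterator_like {
	unsigned int	fmt_size;

	/* New member parked in the hole before the 8-byte-aligned long */
	bool		closed;

	MY_KABI_REPLACE(long wait_index_orig, atomic_t wait_index);
};

int main(void)
{
	/* The replaced member keeps the offset (and room) of the original long. */
	printf("offset(closed)          = %zu\n",
	       offsetof(struct trace_iterator_like, closed));
	printf("offset(wait_index)      = %zu\n",
	       offsetof(struct trace_iterator_like, wait_index));
	printf("offset(wait_index_orig) = %zu\n",
	       offsetof(struct trace_iterator_like, wait_index_orig));
	printf("sizeof(struct)          = %zu (unchanged by 'closed' on LP64)\n",
	       sizeof(struct trace_iterator_like));
	return 0;
}

[On an LP64 build the bool lands in the padding left before the
8-byte-aligned word, and the union keeps wait_index at the offset the
original long occupied, which is the property the backport needs in order
to preserve the kabi.]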

FeedBack:
The patch(es) which you have sent to kernel@openeuler.org mailing list has
been converted to a pull request successfully!
Pull request link:
https://gitee.com/openeuler/kernel/pulls/16276
Mailing list address:
https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/K6K...
participants (2):
- Liu Chuang
- patchwork bot