From: Stefan Roesch <shr(a)devkernel.io>
mainline inclusion
from mainline-v6.2-rc1
commit 8e9d5ead865a1a7af74a444d2f00f1ef4539bfba
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAN96I
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
Patch series "mm/block: add bdi sysfs knobs", v4.
At meta network block devices (nbd) are used to implement remote block
storage. In testing and during production it has been observed that these
network block devices can consume a huge portion of the dirty writeback
cache and writeback can take a considerable time.
To be able to give stricter limits, I'm proposing the following changes:
1) introduce strictlimit knob
Currently the max_ratio knob exists to limit the dirty_memory. However
this knob only applies once (dirty_ratio + dirty_background_ratio) / 2
has been reached.
With the BDI_CAP_STRICTLIMIT flag, the max_ratio can be applied without
reaching that limit. This change exposes that knob.
This knob can also be useful for NFS, fuse filesystems and USB devices.
2) Use part of 1000000 internal calculation
The max_ratio is based on percentage. With the current machine sizes
percentage values can be very high (1% of a 256GB main memory is already
2.5GB). This change uses part of 1000000 instead of percentages for the
internal calculations.
3) Introduce two new sysfs knobs: min_bytes and max_bytes.
Currently all calculations are based on ratio, but for a user it often
more convenient to specify a limit in bytes. The new knobs will not
store bytes values, instead they will translate the byte value to a
corresponding ratio. As the internal values are now part of 1000, the
ratio is closer to the specified value. However the value should be more
seen as an approximation as it can fluctuate over time.
3) Introduce two new sysfs knobs: min_ratio_fine and max_ratio_fine.
The granularity for the existing sysfs bdi knobs min_ratio and max_ratio
is based on percentage values. The new sysfs bdi knobs min_ratio_fine
and max_ratio_fine allow to specify the ratio as part of 1 million.
This patch (of 20):
This adds the bdi_set_strict_limit function to be able to set/unset the
BDI_CAP_STRICTLIMIT flag.
Link: https://lkml.kernel.org/r/20221119005215.3052436-1-shr@devkernel.io
Link: https://lkml.kernel.org/r/20221119005215.3052436-2-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr(a)devkernel.io>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Chris Mason <clm(a)meta.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Yifan Qiao <qiaoyifan4(a)huawei.com>
---
include/linux/backing-dev.h | 1 +
mm/page-writeback.c | 15 +++++++++++++++
2 files changed, 16 insertions(+)
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index a689a21abe10..c67129defa3a 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -106,6 +106,7 @@ static inline unsigned long wb_stat_error(void)
int bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio);
int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
+int bdi_set_strict_limit(struct backing_dev_info *bdi, unsigned int strict_limit);
/*
* Flags in backing_dev_info::capability
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0d7cc65c6367..1f6104775a43 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -724,6 +724,21 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned max_ratio)
}
EXPORT_SYMBOL(bdi_set_max_ratio);
+int bdi_set_strict_limit(struct backing_dev_info *bdi, unsigned int strict_limit)
+{
+ if (strict_limit > 1)
+ return -EINVAL;
+
+ spin_lock_bh(&bdi_lock);
+ if (strict_limit)
+ bdi->capabilities |= BDI_CAP_STRICTLIMIT;
+ else
+ bdi->capabilities &= ~BDI_CAP_STRICTLIMIT;
+ spin_unlock_bh(&bdi_lock);
+
+ return 0;
+}
+
static unsigned long dirty_freerun_ceiling(unsigned long thresh,
unsigned long bg_thresh)
{
--
2.39.2
From: Jan Kara <jack(a)suse.cz>
stable inclusion
from stable-v4.19.320
commit 2b2d2b8766db028bd827af34075f221ae9e9efff
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGRLH
CVE: CVE-2024-42131
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
[ Upstream commit 385d838df280eba6c8680f9777bfa0d0bfe7e8b2 ]
The dirty throttling logic is interspersed with assumptions that dirty
limits in PAGE_SIZE units fit into 32-bit (so that various multiplications
fit into 64-bits). If limits end up being larger, we will hit overflows,
possible divisions by 0 etc. Fix these problems by never allowing so
large dirty limits as they have dubious practical value anyway. For
dirty_bytes / dirty_background_bytes interfaces we can just refuse to set
so large limits. For dirty_ratio / dirty_background_ratio it isn't so
simple as the dirty limit is computed from the amount of available memory
which can change due to memory hotplug etc. So when converting dirty
limits from ratios to numbers of pages, we just don't allow the result to
exceed UINT_MAX.
This is root-only triggerable problem which occurs when the operator
sets dirty limits to >16 TB.
Link: https://lkml.kernel.org/r/20240621144246.11148-2-jack@suse.cz
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reported-by: Zach O'Keefe <zokeefe(a)google.com>
Reviewed-By: Zach O'Keefe <zokeefe(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
---
mm/page-writeback.c | 30 ++++++++++++++++++++++++++----
1 file changed, 26 insertions(+), 4 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 43e83930ce44..06d8242a926e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -432,13 +432,20 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
else
bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE;
- if (bg_thresh >= thresh)
- bg_thresh = thresh / 2;
tsk = current;
if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
bg_thresh += bg_thresh / 4 + global_wb_domain.dirty_limit / 32;
thresh += thresh / 4 + global_wb_domain.dirty_limit / 32;
}
+ /*
+ * Dirty throttling logic assumes the limits in page units fit into
+ * 32-bits. This gives 16TB dirty limits max which is hopefully enough.
+ */
+ if (thresh > UINT_MAX)
+ thresh = UINT_MAX;
+ /* This makes sure bg_thresh is within 32-bits as well */
+ if (bg_thresh >= thresh)
+ bg_thresh = thresh / 2;
dtc->thresh = thresh;
dtc->bg_thresh = bg_thresh;
@@ -488,7 +495,11 @@ static unsigned long node_dirty_limit(struct pglist_data *pgdat)
if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk))
dirty += dirty / 4;
- return dirty;
+ /*
+ * Dirty throttling logic assumes the limits in page units fit into
+ * 32-bits. This gives 16TB dirty limits max which is hopefully enough.
+ */
+ return min_t(unsigned long, dirty, UINT_MAX);
}
/**
@@ -527,10 +538,17 @@ int dirty_background_bytes_handler(struct ctl_table *table, int write,
loff_t *ppos)
{
int ret;
+ unsigned long old_bytes = dirty_background_bytes;
ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
- if (ret == 0 && write)
+ if (ret == 0 && write) {
+ if (DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE) >
+ UINT_MAX) {
+ dirty_background_bytes = old_bytes;
+ return -ERANGE;
+ }
dirty_background_ratio = 0;
+ }
return ret;
}
@@ -558,6 +576,10 @@ int dirty_bytes_handler(struct ctl_table *table, int write,
ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
if (ret == 0 && write && vm_dirty_bytes != old_bytes) {
+ if (DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE) > UINT_MAX) {
+ vm_dirty_bytes = old_bytes;
+ return -ERANGE;
+ }
writeback_set_ratelimit();
vm_dirty_ratio = 0;
}
--
2.25.1
tree: https://gitee.com/openeuler/kernel.git OLK-5.10
head: b3492b88408a530f644afcd9e80b98cbf9883a82
commit: 013280dfab06d20e73de842e8d2fc2a200055455 [26952/30000] urma: upload kernel patch for 20240224_rain
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20240829/202408290746.znnxteuy-lkp@…)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240829/202408290746.znnxteuy-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408290746.znnxteuy-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/ub/urma/ubcore/ubcore_vtp.c:261:6: warning: no previous prototype for function 'ubcore_hash_table_rmv_vtpn' [-Wmissing-prototypes]
261 | void ubcore_hash_table_rmv_vtpn(struct ubcore_device *dev, struct ubcore_vtpn *vtpn)
| ^
drivers/ub/urma/ubcore/ubcore_vtp.c:261:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
261 | void ubcore_hash_table_rmv_vtpn(struct ubcore_device *dev, struct ubcore_vtpn *vtpn)
| ^
| static
1 warning generated.
vim +/ubcore_hash_table_rmv_vtpn +261 drivers/ub/urma/ubcore/ubcore_vtp.c
260
> 261 void ubcore_hash_table_rmv_vtpn(struct ubcore_device *dev, struct ubcore_vtpn *vtpn)
262 {
263 struct ubcore_hash_table *ht;
264
265 ht = ubcore_get_vtpn_ht(dev, vtpn->trans_mode);
266 if (ht == NULL)
267 return;
268 ubcore_hash_table_remove(ht, &vtpn->hnode);
269 }
270
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki