hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4FS3G?from=project-issue
CVE: NA
---------------------------
There are some language problems in the README file, and its Markdown
syntax does not render correctly, so it needs to be adjusted.
Signed-off-by: suqin <suqin2(a)huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin(a)huawei.com>
---
README | 226 ---------------------------------------------------
README.md | 237 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 237 insertions(+), 226 deletions(-)
delete mode 100644 README
create mode 100644 README.md
diff --git a/README b/README
deleted file mode 100644
index 46c9ea352..000000000
--- a/README
+++ /dev/null
@@ -1,226 +0,0 @@
-Contributions to openEuler kernel project
-=========================================
-
-Sign CLA
---------
-
-Before submitting any Contributions to openEuler, you have to sign CLA.
-
-See:
- https://openeuler.org/zh/cla.html
- https://openeuler.org/en/cla.html
-
-Steps of submitting patches
----------------------------
-
-1. Compile and test your patches successfully.
-2. Generate patches
- Your patches should be based on top of latest openEuler branch, and should
- use git-format-patch to generate patches, and if it's a patchset, it's
- better to use --cover-letter option to describe what the patchset does.
-
- Using scripts/checkpatch.pl to make sure there's no coding style issue.
-
- And make sure your patch follow unified openEuler patch format describe
- below.
-
-3. Send patch to openEuler mailing list
- Use this command to send patches to openEuler mailing list:
-
- git send-email *.patch -to="kernel(a)openeuler.org" --suppress-cc=all
-
- *NOTE*: that you must add --suppress-cc=all if you use git send-email,
- otherwise the email will be cced to the people in upstream community and mailing
- lists.
-
- *See*: How to send patches using git-send-email
- https://git-scm.com/docs/git-send-email
-
-4. Mark "v1, v2, v3 ..." in your patch subject if you have multiple versions
- to send out.
-
- Use --subject-prefix="PATCH v2" option to add v2 tag for patchset.
- git format-patch --subject-prefix="PATCH v2" -1
-
- Subject examples:
- Subject: [PATCH v2 01/27] fork: fix some -Wmissing-prototypes warnings
- Subject: [PATCH v3] ext2: improve scalability of bitmap searching
-
-5. Upstream your kernel patch to kernel community is strongly recommended.
- openEuler will sync up with kernel master timely.
-
-6. Sign your work - the Developer’s Certificate of Origin
- As the same of upstream kernel community, you also need to sign your patch.
-
- See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html
-
- The sign-off is a simple line at the end of the explanation for the patch,
- which certifies that you wrote it or otherwise have the right to pass it
- on as an open-source patch. The rules are pretty simple: if you can certify
- the below:
-
- Developer’s Certificate of Origin 1.1
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- By making a contribution to this project, I certify that:
-
- (a) The contribution was created in whole or in part by me and I have
- the right to submit it under the open source license indicated in
- the file; or
-
- (b) The contribution is based upon previous work that, to the best of
- my knowledge, is covered under an appropriate open source license
- and I have the right under that license to submit that work with
- modifications, whether created in whole or in part by me, under
- the same open source license (unless I am permitted to submit under
- a different license), as indicated in the file; or
-
- (c) The contribution was provided directly to me by some other person
- who certified (a), (b) or (c) and I have not modified it.
-
- (d) I understand and agree that this project and the contribution are
- public and that a record of the contribution (including all personal
- information I submit with it, including my sign-off) is maintained
- indefinitely and may be redistributed consistent with this project
- or the open source license(s) involved.
-
- then you just add a line saying:
-
- Signed-off-by: Random J Developer <random(a)developer.example.org>
-
- using your real name (sorry, no pseudonyms or anonymous contributions.)
-
-Use unified patch format
-------------------------
-
-Reasons:
-
-1. long term maintainability
- openEuler will merge massive patches. If all patches are merged by casual
- changelog format without a unified format, the git log will be messy, and
- then it's hard to figure out the original patch.
-
-2. kernel upgrade
- We definitely will upgrade our openEuler kernel in someday, using strict
- patch management will alleviate the pain to migrate patches during big upgrade.
-
-3. easy for script parsing
- Keyword highlighting is necessary for script parsing.
-
-Patch format definition
------------------------
-
-[M] stands for "mandatory"
-[O] stands for "option"
-$category can be: bug preparation, bugfix, perf, feature, doc, other...
-
-If category is feature, then we also need to add feature name like below:
- category: feature
- feature: YYY (the feature name)
-
-If the patch is related to CVE or bugzilla, then we need add the corresponding
-tag like below (In general, it should include at least one of the following):
- CVE: $cve-id
- bugzilla: $bug-id
-
-Additional changelog should include at least one of the following:
- 1) Why we should apply this patch
- 2) What real problem in product does this patch resolved
- 3) How could we reproduce this bug or how to test
- 4) Other useful information for help to understand this patch or problem
-
-The detailed information is very useful for porting the patch to another kernel branch.
-
-Example for mainline patch:
-
- mainline inclusion [M]
- from $mainline-version [M]
- commit $id [M]
- category: $category [M]
- bugzilla: $bug-id [O]
- CVE: $cve-id [O]
-
- additional changelog [O]
-
- --------------------------------
-
- original changelog
-
- Signed-off-by: $yourname <$yourname(a)huawei.com> [M]
-
- ($mainline-version could be mainline-3.5, mainline-3.6, etc...)
-
-Examples
---------
-
-mainline inclusion
-from mainline-4.10
-commit 0becc0ae5b42828785b589f686725ff5bc3b9b25
-category: bugfix
-bugzilla: 3004
-CVE: NA
-
-The patch fixes a BUG_ON in the product: injecting single bit ECC error
-to memory before system boot use hardware inject tools, which cause a
-large amount of CMCI during system booting.
-
-[ 1.146580] mce: [Hardware Error]: Machine check events logged
-[ 1.152908] ------------[ cut here ]------------
-[ 1.157751] kernel BUG at kernel/timer.c:951!
-[ 1.162321] invalid opcode: 0000 [#1] SMP
-...
-
--------------------------------------------------
-
-original changelog
-
-<original S-O-B>
-Signed-off-by: Zhang San <zhangsan(a)huawei.com>
-Tested-by: Li Si <lisi(a)huawei.com>
-
-Email Client - Thunderbird Settings
------------------------------------
-
-If you are newly developer in the kernel community, it is highly recommended
-to use thunderbird mail client.
-
-1. Thunderbird Installation
- Get English version Thunderbird from http://www.mozilla.org/ and install
- it on your system.
-
- Download url: https://www.thunderbird.net/en-US/thunderbird/all/
-
-2. Settings
- 2.1 Use plain text format instead of HTML format
- Options -> Account Settings -> Composition & Addressing, do *NOT* select
- "Compose message in HTML format".
-
- 2.2 Editor Settings
- Tools->Options->Advanced->Config editor.
-
- - To bring up the thunderbird's registry editor, and set:
- "mailnews.send_plaintext_flowed" to "false".
- - Disable HTML Format: Set "mail.identity.id1.compose_html" to "false".
- - Enable UTF8: Set "prefs.converted-to-utf8" to "true".
- - View message in UTF-8: Set "mailnews.view_default_charset" to "UTF-8".
- - Set mailnews.wraplength to 9999 for avoiding auto-wrap
-
-Linux kernel
-============
-
-There are several guides for kernel developers and users. These guides can
-be rendered in a number of formats, like HTML and PDF. Please read
-Documentation/admin-guide/README.rst first.
-
-In order to build the documentation, use ``make htmldocs`` or
-``make pdfdocs``. The formatted documentation can also be read online at:
-
- https://www.kernel.org/doc/html/latest/
-
-There are various text files in the Documentation/ subdirectory,
-several of them using the Restructured Text markup notation.
-See Documentation/00-INDEX for a list of what is contained in each file.
-
-Please read the Documentation/process/changes.rst file, as it contains the
-requirements for building and running the kernel, and information about
-the problems which may result by upgrading your kernel.
diff --git a/README.md b/README.md
new file mode 100644
index 000000000..20832fd85
--- /dev/null
+++ b/README.md
@@ -0,0 +1,237 @@
+# How to Contribute
+-------
+
+- [How to Contribute](#how-to-contribute)
+  - [Sign the CLA](#sign-the-cla)
+  - [Steps of submitting patches](#steps-of-submitting-patches)
+  - [Use the unified patch format](#use-the-unified-patch-format)
+  - [Define the patch format](#define-the-patch-format)
+  - [Examples](#examples)
+  - [Email client - Thunderbird settings](#email-client---thunderbird-settings)
+- [Linux kernel](#linux-kernel)
+
+### Sign the CLA
+
+-------
+
+Before making any contributions to openEuler, sign the CLA first.
+
+Address: [https://openeuler.org/en/cla.html](https://openeuler.org/en/cla.html)
+
+### Steps of submitting patches
+-------
+
+**Step 1** Compile and test your patches.
+
+**Step 2** Generate patches.
+
+Your patches should be generated based on the latest openEuler branch using git-format-patch. If your patches are in a patchset, it is better to use the **--cover-letter** option to describe what the patchset does.
+
+Use **scripts/checkpatch.pl** to ensure that no coding style issue exists.
+
+In addition, ensure that your patches comply with the unified openEuler patch format described below.
+
+**Step 3** Send your patches to the openEuler mailing list.
+
+To do so, run the following command:
+
+ `git send-email *.patch -to="kernel(a)openeuler.org" --suppress-cc=all`
+
+*NOTE*: Add **--suppress-cc=all** if you use git-send-email; otherwise, the email will be copied to all people in the upstream community and mailing lists.
+
+For details about how to send patches using git-send-email, see [https://git-scm.com/docs/git-send-email](https://git-scm.com/docs/git-send-email).
+
+**Step 4** Mark "v1, v2, v3 ..." in your patch subject if you have multiple versions to send out.
+
+Use the **--subject-prefix="PATCH v2"** option to add the v2 tag to the patchset.
+
+ `git format-patch --subject-prefix="PATCH v2" -1`
+
+Subject examples:
+
+ Subject: [PATCH v2 01/27] fork: fix some -Wmissing-prototypes warnings
+
+ Subject: [PATCH v3] ext2: improve scalability of bitmap searching
+
+**Step 5** Upstream your kernel patches to the kernel community (recommended). openEuler will synchronize with the kernel master in a timely manner.
+
+**Step 6** Sign your work - the Developer’s Certificate of Origin.
+
+ Similar to the upstream kernel community, you also need to sign your patch.
+
+ For details, see [https://www.kernel.org/doc/html/latest/process/submitting-patches.html](https://www.kernel.org/doc/html/latest/process/submitting-patches.html).
+
+ The sign-off is a simple line at the end of the explanation of the patch, which certifies that you wrote it or otherwise have the right to pass it on as an open source patch. The rules are pretty simple: add the line if you can certify the following:
+
+ **Developer’s Certificate of Origin 1.1**
+
+ By making a contribution to this project, I certify that:
+
+ (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file;
+
+ (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file;
+
+ (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it.
+
+ (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.
+
+Then you add a line saying:
+
+Signed-off-by: Random J Developer <random(a)developer.example.org>
+
+Use your real name (sorry, no pseudonyms or anonymous contributions).
+
+### Use the unified patch format
+-------
+
+Reasons:
+
+1. Long term maintainability
+
+ openEuler will merge massive patches. If patches are merged with arbitrary changelog formats instead of a unified one, the git logs will be messy, and it becomes hard to trace a change back to its original patch.
+
+2. Kernel upgrade
+
+ We will definitely upgrade the openEuler kernel someday, and strict patch management will ease the pain of migrating patches during big upgrades.
+
+3. Easy for script parsing
+
+ Keyword highlighting is necessary for script parsing.
+
+### Define the patch format
+-------
+
+[M] stands for "mandatory".
+
+[O] stands for "optional".
+
+$category can be: bug preparation, bugfix, perf, feature, doc, other...
+
+If category is feature, we need to add a feature name as below:
+
+```cpp
+category: feature
+feature: YYY (the feature name)
+```
+
+If the patch is related to CVE or bugzilla, we need to add the corresponding tag as below (In general, it should include at least one of the following):
+
+```cpp
+CVE: $cve-id
+bugzilla: $bug-id
+```
+
+Additional changelog should include at least one of the following:
+
+1. Why we should apply this patch
+
+2. What real problem in the product this patch resolves
+
+3. How could we reproduce this bug or how to test
+
+4. Other useful information that helps to understand this patch or the problem
+
+The detailed information is very useful for migrating a patch to another kernel branch.
+
+Example for mainline patch:
+
+```cpp
+mainline inclusion [M]
+from $mainline-version [M]
+commit $id [M]
+category: $category [M]
+bugzilla: $bug-id [O]
+CVE: $cve-id [O]
+
+additional changelog [O]
+
+--------------------------------
+
+original changelog
+Signed-off-by: $yourname <$yourname(a)huawei.com> [M]
+($mainline-version could be mainline-3.5, mainline-3.6, etc...)
+```
+
+### Examples
+-------
+
+```cpp
+mainline inclusion
+from mainline-4.10
+commit 0becc0ae5b42828785b589f686725ff5bc3b9b25
+category: bugfix
+bugzilla: 3004
+CVE: N/A
+
+The patch fixes a BUG_ON in the product: injecting a single-bit ECC error into memory before system boot using hardware inject tools causes a large amount of CMCI during system booting.
+[ 1.146580] mce: [Hardware Error]: Machine check events logged
+[ 1.152908] ------------[ cut here ]------------
+[ 1.157751] kernel BUG at kernel/timer.c:951!
+[ 1.162321] invalid opcode: 0000 [#1] SMP
+
+-------------------------------------------------
+
+original changelog
+
+<original S-O-B>
+Signed-off-by: Zhang San <zhangsan(a)huawei.com>
+Tested-by: Li Si <lisi(a)huawei.com>
+```
+
+### Email client - Thunderbird settings
+-------
+
+If you are a new developer in the kernel community, it is highly recommended that you use the Thunderbird mail client.
+
+1. Thunderbird Installation
+
+ Obtain the English version of Thunderbird from [http://www.mozilla.org/](http://www.mozilla.org/) and install it on your system.
+
+ Download URL: https://www.thunderbird.net/en-US/thunderbird/all/
+
+2. Settings
+
+ 2.1 Use the plain text format instead of the HTML format.
+
+ Choose **Options > Account Settings > Composition & Addressing**, and do **NOT** select Compose message in HTML format.
+
+ 2.2 Editor settings
+
+ **Tools > Options > Advanced > Config editor**
+
+ - To bring up Thunderbird's registry editor, set **mailnews.send_plaintext_flowed** to **false**.
+
+ - Disable HTML format: Set **mail.identity.id1.compose_html** to **false**.
+
+ - Enable UTF-8: Set **prefs.converted-to-utf8** to **true**.
+
+ - View messages in UTF-8: Set **mailnews.view_default_charset** to **UTF-8**.
+
+ - Set **mailnews.wraplength** to **9999** to avoid auto-wrap.
+
+# Linux kernel
+-------
+
+There are several guides for kernel developers and users, which can be rendered in a number of formats, like HTML and PDF. You can read **Documentation/admin-guide/README.rst** first.
+
+In order to build the documentation, use **make htmldocs** or **make pdfdocs**. The formatted documentation can also be read online at: https://www.kernel.org/doc/html/latest/
+
+There are various text files in the Documentation/ subdirectory, several of which use the Restructured Text markup notation. See Documentation/00-INDEX for a list of what is contained in each file.
+
+Read the **Documentation/process/changes.rst** file, as it contains the requirements for building and running the kernel, and information about the problems that may be caused by upgrading your kernel.
+
--
2.22.0
backport psi feature and avoid kabi change
bugzilla: https://gitee.com/openeuler/kernel/issues/I47QS2
Baruch Siach (1):
psi: fix reference to kernel commandline enable
Dan Schatzberg (1):
kernel/sched/psi.c: expose pressure metrics on root cgroup
Johannes Weiner (12):
mm: workingset: tell cache transitions from workingset thrashing
sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
sched: loadavg: make calc_load_n() public
sched: sched.h: make rq locking and clock functions available in
stats.h
sched: introduce this_rq_lock_irq()
psi: pressure stall information for CPU, memory, and IO
psi: cgroup support
psi: make disabling/enabling easier for vendor kernels
psi: fix aggregation idle shut-off
psi: avoid divide-by-zero crash inside virtual machines
fs: kernfs: add poll file operation
sched/psi: Fix sampling error and rare div0 crashes with cgroups and
high uptime
Josef Bacik (1):
blk-iolatency: use a percentile approache for ssd's
Liu Xinpeng (2):
psi:enable psi in config
psi:avoid kabi change
Miklos Szeredi (1):
fuse: ignore PG_workingset after stealing
Olof Johansson (1):
kernel/sched/psi.c: simplify cgroup_move_task()
Suren Baghdasaryan (6):
psi: introduce state_mask to represent stalled psi states
psi: make psi_enable static
psi: rename psi fields in preparation for psi trigger addition
psi: split update_stats into parts
psi: track changed states
include/: refactor headers to allow kthread.h inclusion in psi_types.h
Yafang Shao (1):
mm, memcg: add workingset_restore in memory.stat
Documentation/accounting/psi.txt | 73 +++
Documentation/admin-guide/cgroup-v2.rst | 22 +
Documentation/admin-guide/kernel-parameters.txt | 4 +
arch/arm64/configs/openeuler_defconfig | 2 +
arch/powerpc/platforms/cell/cpufreq_spudemand.c | 2 +-
arch/powerpc/platforms/cell/spufs/sched.c | 9 +-
arch/s390/appldata/appldata_os.c | 4 -
arch/x86/configs/openeuler_defconfig | 2 +
block/blk-iolatency.c | 183 +++++-
drivers/cpuidle/governors/menu.c | 4 -
drivers/spi/spi-rockchip.c | 1 +
fs/fuse/dev.c | 1 +
fs/kernfs/file.c | 31 +-
fs/proc/loadavg.c | 3 -
include/linux/cgroup-defs.h | 12 +
include/linux/cgroup.h | 17 +
include/linux/kernfs.h | 8 +
include/linux/kthread.h | 4 +
include/linux/mmzone.h | 3 +
include/linux/page-flags.h | 5 +
include/linux/psi.h | 55 ++
include/linux/psi_types.h | 95 +++
include/linux/sched.h | 13 +
include/linux/sched/loadavg.h | 24 +-
include/linux/swap.h | 1 +
include/trace/events/mmflags.h | 1 +
init/Kconfig | 28 +
kernel/cgroup/cgroup.c | 55 +-
kernel/debug/kdb/kdb_main.c | 7 +-
kernel/fork.c | 4 +
kernel/kthread.c | 3 +
kernel/sched/Makefile | 1 +
kernel/sched/core.c | 16 +-
kernel/sched/loadavg.c | 139 ++--
kernel/sched/psi.c | 823 ++++++++++++++++++++++++
kernel/sched/sched.h | 178 ++---
kernel/sched/stats.h | 86 +++
kernel/workqueue.c | 23 +
kernel/workqueue_internal.h | 6 +-
mm/compaction.c | 5 +
mm/filemap.c | 20 +-
mm/huge_memory.c | 1 +
mm/memcontrol.c | 2 +
mm/migrate.c | 2 +
mm/page_alloc.c | 9 +
mm/swap_state.c | 1 +
mm/vmscan.c | 10 +
mm/vmstat.c | 1 +
mm/workingset.c | 117 +++-
49 files changed, 1837 insertions(+), 279 deletions(-)
create mode 100644 Documentation/accounting/psi.txt
create mode 100644 include/linux/psi.h
create mode 100644 include/linux/psi_types.h
create mode 100644 kernel/sched/psi.c
--
1.8.3.1
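Once this series is applied, pressure stall information appears under /proc/pressure (cpu, memory, io). Below is a minimal consumer sketch, assuming the line format documented in the series' Documentation/accounting/psi.txt; it is illustrative only and not part of the series:

```c
#include <stdio.h>

/* Dump the PSI memory pressure file added by this series.
 * Requires CONFIG_PSI=y and psi not disabled on the kernel command line. */
int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/pressure/memory", "r");

	if (!f) {
		perror("/proc/pressure/memory");
		return 1;
	}
	/* Lines look like:
	 * some avg10=0.00 avg60=0.00 avg300=0.00 total=0 */
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
	return 0;
}
```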
hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA
---------------------------
Enable CONFIG_HISILICON_ERRATUM_1980005 by default for testing.
Set the default values of CONFIG_MMC_DW_IDMAC and CONFIG_MMC_DW_HI3XXX.
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
arch/arm64/configs/hulk_defconfig | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/configs/hulk_defconfig b/arch/arm64/configs/hulk_defconfig
index b146c2e2a93dc..01e02aa2cb092 100644
--- a/arch/arm64/configs/hulk_defconfig
+++ b/arch/arm64/configs/hulk_defconfig
@@ -398,7 +398,7 @@ CONFIG_ARM64_ERRATUM_845719=y
# CONFIG_QCOM_QDF2400_ERRATUM_0065 is not set
# CONFIG_SOCIONEXT_SYNQUACER_PREITS is not set
CONFIG_HISILICON_ERRATUM_161600802=y
-# CONFIG_HISILICON_ERRATUM_1980005 is not set
+CONFIG_HISILICON_ERRATUM_1980005=y
# CONFIG_QCOM_FALKOR_ERRATUM_E1041 is not set
CONFIG_HISILICON_ERRATUM_HIP08_RU_PREFETCH=y
# CONFIG_HISILICON_HIP08_RU_PREFETCH_DEFAULT_OFF is not set
@@ -4204,10 +4204,12 @@ CONFIG_MMC_SPI=m
CONFIG_MMC_CB710=m
CONFIG_MMC_VIA_SDMMC=m
CONFIG_MMC_DW=m
+# CONFIG_MMC_DW_IDMAC is not set
CONFIG_MMC_DW_PLTFM=m
CONFIG_MMC_DW_BLUEFIELD=m
# CONFIG_MMC_DW_EXYNOS is not set
# CONFIG_MMC_DW_HI3798CV200 is not set
+# CONFIG_MMC_DW_HI3XXX is not set
# CONFIG_MMC_DW_K3 is not set
# CONFIG_MMC_DW_PCI is not set
CONFIG_MMC_VUB300=m
--
2.25.1
[PATCH kernel-4.19 1/4] iommu/arm-smmu: Prevent forced unbinding of Arm SMMU drivers
by Yang Yingliang, 28 Oct '21
From: Will Deacon <will(a)kernel.org>
mainline inclusion
from mainline-v5.5-rc1
commit 34debdca68efd5625a2fcea7df1a215591a01f80
category: bugfix
bugzilla: 95382
CVE: NA
-------------------------------------------------
Forcefully unbinding the Arm SMMU drivers is a pretty dangerous operation,
since it will likely lead to catastrophic failure for any DMA devices
mastering through the SMMU being unbound. When the driver then attempts
to "handle" the fatal faults, it's very easy to trip over dead data
structures, leading to use-after-free.
On John's machine, he reports that the machine was "unusable" due to
loss of the storage controller following a forced unbind of the SMMUv3
driver:
| # cd ./bus/platform/drivers/arm-smmu-v3
| # echo arm-smmu-v3.0.auto > unbind
| hisi_sas_v2_hw HISI0162:01: CQE_AXI_W_ERR (0x800) found!
| platform arm-smmu-v3.0.auto: CMD_SYNC timeout at 0x00000146
| [hwprod 0x00000146, hwcons 0x00000000]
Prevent this forced unbinding of the drivers by setting "suppress_bind_attrs"
to true.
Link: https://lore.kernel.org/lkml/06dfd385-1af0-3106-4cc5-6a5b8e864759@huawei.com
Reported-by: John Garry <john.garry(a)huawei.com>
Signed-off-by: Will Deacon <will(a)kernel.org>
Tested-by: John Garry <john.garry(a)huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
Signed-off-by: Nanyong Sun <sunnanyong(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/iommu/arm-smmu-v3.c | 5 +++--
drivers/iommu/arm-smmu.c | 7 ++++---
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c9ff437cb3283..49408e73716e7 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -4532,8 +4532,9 @@ MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
static struct platform_driver arm_smmu_driver = {
.driver = {
- .name = "arm-smmu-v3",
- .of_match_table = of_match_ptr(arm_smmu_of_match),
+ .name = "arm-smmu-v3",
+ .of_match_table = of_match_ptr(arm_smmu_of_match),
+ .suppress_bind_attrs = true,
},
.probe = arm_smmu_device_probe,
.remove = arm_smmu_device_remove,
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index f948de8af412c..2c2bdeac758e0 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2287,9 +2287,10 @@ static SIMPLE_DEV_PM_OPS(arm_smmu_pm_ops, NULL, arm_smmu_pm_resume);
static struct platform_driver arm_smmu_driver = {
.driver = {
- .name = "arm-smmu",
- .of_match_table = of_match_ptr(arm_smmu_of_match),
- .pm = &arm_smmu_pm_ops,
+ .name = "arm-smmu",
+ .of_match_table = of_match_ptr(arm_smmu_of_match),
+ .pm = &arm_smmu_pm_ops,
+ .suppress_bind_attrs = true,
},
.probe = arm_smmu_device_probe,
.remove = arm_smmu_device_remove,
--
2.25.1
backport psi feature from upstream 5.4
bugzilla: https://gitee.com/openeuler/kernel/issues/I47QS2
Baruch Siach (1):
psi: fix reference to kernel commandline enable
Dan Schatzberg (1):
kernel/sched/psi.c: expose pressure metrics on root cgroup
Johannes Weiner (11):
sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
sched: loadavg: make calc_load_n() public
sched: sched.h: make rq locking and clock functions available in
stats.h
sched: introduce this_rq_lock_irq()
psi: pressure stall information for CPU, memory, and IO
psi: cgroup support
psi: make disabling/enabling easier for vendor kernels
psi: fix aggregation idle shut-off
psi: avoid divide-by-zero crash inside virtual machines
fs: kernfs: add poll file operation
sched/psi: Fix sampling error and rare div0 crashes with cgroups and
high uptime
Josef Bacik (1):
blk-iolatency: use a percentile approache for ssd's
Liu Xinpeng (2):
psi:enable psi in config
psi:avoid kabi change
Olof Johansson (1):
kernel/sched/psi.c: simplify cgroup_move_task()
Suren Baghdasaryan (6):
psi: introduce state_mask to represent stalled psi states
psi: make psi_enable static
psi: rename psi fields in preparation for psi trigger addition
psi: split update_stats into parts
psi: track changed states
include/: refactor headers to allow kthread.h inclusion in psi_types.h
Documentation/accounting/psi.txt | 73 +++
Documentation/admin-guide/cgroup-v2.rst | 18 +
Documentation/admin-guide/kernel-parameters.txt | 4 +
arch/arm64/configs/openeuler_defconfig | 2 +
arch/powerpc/platforms/cell/cpufreq_spudemand.c | 2 +-
arch/powerpc/platforms/cell/spufs/sched.c | 9 +-
arch/s390/appldata/appldata_os.c | 4 -
arch/x86/configs/openeuler_defconfig | 2 +
block/blk-iolatency.c | 183 +++++-
drivers/cpuidle/governors/menu.c | 4 -
drivers/spi/spi-rockchip.c | 1 +
fs/kernfs/file.c | 31 +-
fs/proc/loadavg.c | 3 -
include/linux/cgroup-defs.h | 12 +
include/linux/cgroup.h | 17 +
include/linux/kernfs.h | 8 +
include/linux/kthread.h | 4 +
include/linux/psi.h | 55 ++
include/linux/psi_types.h | 95 +++
include/linux/sched.h | 13 +
include/linux/sched/loadavg.h | 24 +-
init/Kconfig | 28 +
kernel/cgroup/cgroup.c | 55 +-
kernel/debug/kdb/kdb_main.c | 7 +-
kernel/fork.c | 4 +
kernel/kthread.c | 3 +
kernel/sched/Makefile | 1 +
kernel/sched/core.c | 16 +-
kernel/sched/loadavg.c | 139 ++--
kernel/sched/psi.c | 823 ++++++++++++++++++++++++
kernel/sched/sched.h | 178 ++---
kernel/sched/stats.h | 86 +++
kernel/workqueue.c | 23 +
kernel/workqueue_internal.h | 6 +-
mm/compaction.c | 5 +
mm/filemap.c | 11 +
mm/page_alloc.c | 9 +
mm/vmscan.c | 9 +
38 files changed, 1726 insertions(+), 241 deletions(-)
create mode 100644 Documentation/accounting/psi.txt
create mode 100644 include/linux/psi.h
create mode 100644 include/linux/psi_types.h
create mode 100644 kernel/sched/psi.c
--
1.8.3.1
From: zhangguijiang <zhangguijiang(a)huawei.com>
ascend inclusion
category: feature
feature: Ascend emmc adaption
bugzilla: https://gitee.com/openeuler/kernel/issues/I4F4LL
CVE: NA
--------------------
To identify the Ascend HiSilicon eMMC chip, we add a customized property
to the dts. This patch adds an interface to read that property. At the
same time, we provide a switch, CONFIG_ASCEND_HISI_MMC, that lets you
disable our modifications entirely.
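As an illustration, a hypothetical call site for the new helper follows; mmc_is_ascend_customized() and CONFIG_ASCEND_HISI_MMC come from this patch, while the surrounding function is invented:

```c
#include <linux/mmc/host.h>

/* Hypothetical sketch: apply Ascend HiSilicon eMMC quirks only on
 * boards whose dts node carries the "customized" property. */
static void apply_board_quirks(struct mmc_host *host)
{
	if (!mmc_is_ascend_customized(host->parent))
		return;	/* stock core behaviour on other boards */

	/* Ascend-specific setup would go here, e.g. calling the
	 * slowdown_clk() host op added under CONFIG_ASCEND_HISI_MMC. */
}
```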
Signed-off-by: zhangguijiang <zhangguijiang(a)huawei.com>
Reviewed-by: Ding Tianhong <dingtianhong(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/mmc/Kconfig | 10 ++++
drivers/mmc/core/host.c | 43 ++++++++++++---
include/linux/mmc/host.h | 115 +++++++++++++++++++++++++++++++++++++++
include/linux/mmc/pm.h | 1 +
4 files changed, 162 insertions(+), 7 deletions(-)
diff --git a/drivers/mmc/Kconfig b/drivers/mmc/Kconfig
index ec21388311db2..8b29ecadd1862 100644
--- a/drivers/mmc/Kconfig
+++ b/drivers/mmc/Kconfig
@@ -12,6 +12,16 @@ menuconfig MMC
If you want MMC/SD/SDIO support, you should say Y here and
also to your specific host controller driver.
+config ASCEND_HISI_MMC
+ bool "Ascend HiSilicon MMC card support"
+ depends on MMC
+ default n
+ help
+ This selects for Hisilicon SoC specific extensions to the
+ Synopsys DesignWare Memory Card Interface driver.
+ You should select this option if you want mmc support on
+ Ascend platform.
+
if MMC
source "drivers/mmc/core/Kconfig"
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index dd1c14d8f6863..b29ee31e7865e 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -348,6 +348,11 @@ int mmc_of_parse(struct mmc_host *host)
EXPORT_SYMBOL(mmc_of_parse);
+static inline int mmc_is_ascend_hi_mci_1(struct device *dev)
+{
+ return !strncmp(dev_name(dev), "hi_mci.1", strlen("hi_mci.1"));
+}
+
/**
* mmc_alloc_host - initialise the per-host structure.
* @extra: sizeof private data structure
@@ -374,7 +379,10 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
}
host->index = err;
-
+ if (mmc_is_ascend_customized(dev)) {
+ if (mmc_is_ascend_hi_mci_1(dev))
+ host->index = 1;
+ }
dev_set_name(&host->class_dev, "mmc%d", host->index);
host->parent = dev;
@@ -383,10 +391,11 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
device_initialize(&host->class_dev);
device_enable_async_suspend(&host->class_dev);
- if (mmc_gpio_alloc(host)) {
- put_device(&host->class_dev);
- return NULL;
- }
+ if (!mmc_is_ascend_customized(host->parent))
+ if (mmc_gpio_alloc(host)) {
+ put_device(&host->class_dev);
+ return NULL;
+ }
spin_lock_init(&host->lock);
init_waitqueue_head(&host->wq);
@@ -439,7 +448,9 @@ int mmc_add_host(struct mmc_host *host)
#endif
mmc_start_host(host);
- mmc_register_pm_notifier(host);
+ if (!mmc_is_ascend_customized(host->parent) ||
+ !(host->pm_flags & MMC_PM_IGNORE_PM_NOTIFY))
+ mmc_register_pm_notifier(host);
return 0;
}
@@ -456,7 +467,9 @@ EXPORT_SYMBOL(mmc_add_host);
*/
void mmc_remove_host(struct mmc_host *host)
{
- mmc_unregister_pm_notifier(host);
+ if (!mmc_is_ascend_customized(host->parent) ||
+ !(host->pm_flags & MMC_PM_IGNORE_PM_NOTIFY))
+ mmc_unregister_pm_notifier(host);
mmc_stop_host(host);
#ifdef CONFIG_DEBUG_FS
@@ -483,3 +496,19 @@ void mmc_free_host(struct mmc_host *host)
}
EXPORT_SYMBOL(mmc_free_host);
+
+
+int mmc_is_ascend_customized(struct device *dev)
+{
+#ifdef CONFIG_ASCEND_HISI_MMC
+ static int is_ascend_customized = -1;
+
+ if (is_ascend_customized == -1)
+ is_ascend_customized = ((dev == NULL) ? 0 :
+ of_find_property(dev->of_node, "customized", NULL) != NULL);
+ return is_ascend_customized;
+#else
+ return 0;
+#endif
+}
+EXPORT_SYMBOL(mmc_is_ascend_customized);
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 7e8e5b20e82b0..2cd5a73ab12a2 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -19,6 +19,9 @@
#include <linux/mmc/pm.h>
#include <linux/dma-direction.h>
+#include <linux/jiffies.h>
+#include <linux/version.h>
+
struct mmc_ios {
unsigned int clock; /* clock rate */
unsigned short vdd;
@@ -63,6 +66,7 @@ struct mmc_ios {
#define MMC_TIMING_MMC_DDR52 8
#define MMC_TIMING_MMC_HS200 9
#define MMC_TIMING_MMC_HS400 10
+#define MMC_TIMING_NEW_SD MMC_TIMING_UHS_SDR12
unsigned char signal_voltage; /* signalling voltage (1.8V or 3.3V) */
@@ -78,7 +82,25 @@ struct mmc_ios {
#define MMC_SET_DRIVER_TYPE_D 3
bool enhanced_strobe; /* hs400es selection */
+#ifdef CONFIG_ASCEND_HISI_MMC
+ unsigned int clock_store; /*store the clock before power off*/
+#endif
+};
+
+#ifdef CONFIG_ASCEND_HISI_MMC
+struct mmc_cmdq_host_ops {
+ int (*enable)(struct mmc_host *mmc);
+ int (*disable)(struct mmc_host *mmc, bool soft);
+ int (*restore_irqs)(struct mmc_host *mmc);
+ int (*request)(struct mmc_host *mmc, struct mmc_request *mrq);
+ int (*halt)(struct mmc_host *mmc, bool halt);
+ void (*post_req)(struct mmc_host *mmc, struct mmc_request *mrq,
+ int err);
+ void (*disable_immediately)(struct mmc_host *mmc);
+ int (*clear_and_halt)(struct mmc_host *mmc);
};
+#endif
+
struct mmc_host;
@@ -168,6 +190,12 @@ struct mmc_host_ops {
*/
int (*multi_io_quirk)(struct mmc_card *card,
unsigned int direction, int blk_size);
+#ifdef CONFIG_ASCEND_HISI_MMC
+ /* Slow down clk for ascend chip SD cards */
+ void (*slowdown_clk)(struct mmc_host *host, int timing);
+ int (*enable_enhanced_strobe)(struct mmc_host *host);
+ int (*send_cmd_direct)(struct mmc_host *host, struct mmc_request *mrq);
+#endif
};
struct mmc_cqe_ops {
@@ -255,6 +283,30 @@ struct mmc_context_info {
wait_queue_head_t wait;
};
+#ifdef CONFIG_ASCEND_HISI_MMC
+/**
+ * mmc_cmdq_context_info - describes the contexts of cmdq
+ * @active_reqs requests being processed
+ * @active_dcmd dcmd in progress, don't issue any
+ * more dcmd requests
+ * @rpmb_in_wait do not pull any more reqs till rpmb is handled
+ * @cmdq_state state of cmdq engine
+ * @req_starved completion should invoke the request_fn since
+ * no tags were available
+ * @cmdq_ctx_lock acquire this before accessing this structure
+ */
+struct mmc_cmdq_context_info {
+ unsigned long active_reqs; /* in-flight requests */
+ bool active_dcmd;
+ bool rpmb_in_wait;
+ unsigned long curr_state;
+
+ /* no free tag available */
+ unsigned long req_starved;
+ spinlock_t cmdq_ctx_lock;
+};
+#endif
+
struct regulator;
struct mmc_pwrseq;
@@ -328,6 +380,9 @@ struct mmc_host {
#define MMC_CAP_UHS_SDR50 (1 << 18) /* Host supports UHS SDR50 mode */
#define MMC_CAP_UHS_SDR104 (1 << 19) /* Host supports UHS SDR104 mode */
#define MMC_CAP_UHS_DDR50 (1 << 20) /* Host supports UHS DDR50 mode */
+#ifdef CONFIG_ASCEND_HISI_MMC
+#define MMC_CAP_RUNTIME_RESUME (1 << 20) /* Resume at runtime_resume. */
+#endif
#define MMC_CAP_UHS (MMC_CAP_UHS_SDR12 | MMC_CAP_UHS_SDR25 | \
MMC_CAP_UHS_SDR50 | MMC_CAP_UHS_SDR104 | \
MMC_CAP_UHS_DDR50)
@@ -368,6 +423,34 @@ struct mmc_host {
#define MMC_CAP2_CQE (1 << 23) /* Has eMMC command queue engine */
#define MMC_CAP2_CQE_DCMD (1 << 24) /* CQE can issue a direct command */
#define MMC_CAP2_AVOID_3_3V (1 << 25) /* Host must negotiate down from 3.3V */
+#ifdef CONFIG_ASCEND_HISI_MMC
+#define MMC_CAP2_CACHE_CTRL (1 << 1) /* Allow cache control */
+#define MMC_CAP2_NO_MULTI_READ (1 << 3) /* Multiblock read don't work */
+#define MMC_CAP2_NO_SLEEP_CMD (1 << 4) /* Don't allow sleep command */
+#define MMC_CAP2_BROKEN_VOLTAGE (1 << 7) /* Use the broken voltage */
+#define MMC_CAP2_DETECT_ON_ERR (1 << 8) /* I/O err check card removal */
+#define MMC_CAP2_HC_ERASE_SZ (1 << 9) /* High-capacity erase size */
+#define MMC_CAP2_PACKED_RD (1 << 12) /* Allow packed read */
+#define MMC_CAP2_PACKED_WR (1 << 13) /* Allow packed write */
+#define MMC_CAP2_PACKED_CMD (MMC_CAP2_PACKED_RD | \
+ MMC_CAP2_PACKED_WR)
+#define MMC_CAP2_CMD_QUEUE (1 << 18) /* support eMMC command queue */
+#define MMC_CAP2_ENHANCED_STROBE (1 << 19)
+#define MMC_CAP2_CACHE_FLUSH_BARRIER (1 << 20)
+/* Allow background operations auto enable control */
+#define MMC_CAP2_BKOPS_AUTO_CTRL (1 << 21)
+/* Allow background operations manual enable control */
+#define MMC_CAP2_BKOPS_MANUAL_CTRL (1 << 22)
+
+/* host is connected by via modem through sdio */
+#define MMC_CAP2_SUPPORT_VIA_MODEM (1 << 26)
+/* host is connected by wifi through sdio */
+#define MMC_CAP2_SUPPORT_WIFI (1 << 27)
+/* host is connected to 1102 wifi */
+#define MMC_CAP2_SUPPORT_WIFI_CMD11 (1 << 28)
+/* host do not support low power for wifi*/
+#define MMC_CAP2_WIFI_NO_LOWPWR (1 << 29)
+#endif
int fixed_drv_type; /* fixed driver type for non-removable media */
@@ -461,6 +544,12 @@ struct mmc_host {
bool cqe_on;
unsigned long private[0] ____cacheline_aligned;
+#ifdef CONFIG_ASCEND_HISI_MMC
+ const struct mmc_cmdq_host_ops *cmdq_ops;
+ int sdio_present;
+ unsigned int cmdq_slots;
+ struct mmc_cmdq_context_info cmdq_ctx;
+#endif
};
struct device_node;
@@ -588,4 +677,30 @@ static inline enum dma_data_direction mmc_get_dma_dir(struct mmc_data *data)
int mmc_send_tuning(struct mmc_host *host, u32 opcode, int *cmd_error);
int mmc_abort_tuning(struct mmc_host *host, u32 opcode);
+#ifdef CONFIG_ASCEND_HISI_MMC
+int mmc_cache_ctrl(struct mmc_host *host, u8 enable);
+int mmc_card_awake(struct mmc_host *host);
+int mmc_card_sleep(struct mmc_host *host);
+int mmc_card_can_sleep(struct mmc_host *host);
+#else
+static inline int mmc_cache_ctrl(struct mmc_host *host, u8 enable)
+{
+ return 0;
+}
+static inline int mmc_card_awake(struct mmc_host *host)
+{
+ return 0;
+}
+static inline int mmc_card_sleep(struct mmc_host *host)
+{
+ return 0;
+}
+static inline int mmc_card_can_sleep(struct mmc_host *host)
+{
+ return 0;
+}
+#endif
+
+int mmc_is_ascend_customized(struct device *dev);
+
#endif /* LINUX_MMC_HOST_H */
diff --git a/include/linux/mmc/pm.h b/include/linux/mmc/pm.h
index 4a139204c20c0..6e2d6a135c7e0 100644
--- a/include/linux/mmc/pm.h
+++ b/include/linux/mmc/pm.h
@@ -26,5 +26,6 @@ typedef unsigned int mmc_pm_flag_t;
#define MMC_PM_KEEP_POWER (1 << 0) /* preserve card power during suspend */
#define MMC_PM_WAKE_SDIO_IRQ (1 << 1) /* wake up host system on SDIO IRQ assertion */
+#define MMC_PM_IGNORE_PM_NOTIFY (1 << 2) /* ignore mmc pm notify */
#endif /* LINUX_MMC_PM_H */
--
2.25.1
From: zhangguijiang <zhangguijiang(a)huawei.com>
ascend inclusion
category: feature
feature: Ascend emmc adaption
bugzilla: https://gitee.com/openeuler/kernel/issues/I4F4LL
CVE: NA
--------------------
To identify the Ascend HiSilicon eMMC chip, we add a customized property
to the dts. This patch adds an interface to read that property. At the
same time, we provide a switch, CONFIG_ASCEND_HISI_MMC, that lets you
disable our modifications entirely.
Signed-off-by: zhangguijiang <zhangguijiang(a)huawei.com>
Reviewed-by: Ding Tianhong <dingtianhong(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/mmc/Kconfig | 10 ++++
drivers/mmc/core/host.c | 47 +++++++++++++---
include/linux/mmc/host.h | 115 +++++++++++++++++++++++++++++++++++++++
include/linux/mmc/pm.h | 1 +
4 files changed, 164 insertions(+), 9 deletions(-)
diff --git a/drivers/mmc/Kconfig b/drivers/mmc/Kconfig
index ec21388311db2..8b29ecadd1862 100644
--- a/drivers/mmc/Kconfig
+++ b/drivers/mmc/Kconfig
@@ -12,6 +12,16 @@ menuconfig MMC
If you want MMC/SD/SDIO support, you should say Y here and
also to your specific host controller driver.
+config ASCEND_HISI_MMC
+ bool "Ascend HiSilicon MMC card support"
+ depends on MMC
+ default n
+ help
+ This selects for Hisilicon SoC specific extensions to the
+ Synopsys DesignWare Memory Card Interface driver.
+ You should select this option if you want mmc support on
+ Ascend platform.
+
if MMC
source "drivers/mmc/core/Kconfig"
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index f57f5de542064..69cc778706855 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -348,6 +348,11 @@ int mmc_of_parse(struct mmc_host *host)
EXPORT_SYMBOL(mmc_of_parse);
+static inline int mmc_is_ascend_hi_mci_1(struct device *dev)
+{
+ return !strncmp(dev_name(dev), "hi_mci.1", strlen("hi_mci.1"));
+}
+
/**
* mmc_alloc_host - initialise the per-host structure.
* @extra: sizeof private data structure
@@ -374,7 +379,10 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
}
host->index = err;
-
+ if (mmc_is_ascend_customized(dev)) {
+ if (mmc_is_ascend_hi_mci_1(dev))
+ host->index = 1;
+ }
dev_set_name(&host->class_dev, "mmc%d", host->index);
host->parent = dev;
@@ -383,12 +391,13 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
device_initialize(&host->class_dev);
device_enable_async_suspend(&host->class_dev);
- if (mmc_gpio_alloc(host)) {
- put_device(&host->class_dev);
- ida_simple_remove(&mmc_host_ida, host->index);
- kfree(host);
- return NULL;
- }
+ if (!mmc_is_ascend_customized(host->parent))
+ if (mmc_gpio_alloc(host)) {
+ put_device(&host->class_dev);
+ ida_simple_remove(&mmc_host_ida, host->index);
+ kfree(host);
+ return NULL;
+ }
spin_lock_init(&host->lock);
init_waitqueue_head(&host->wq);
@@ -441,7 +450,9 @@ int mmc_add_host(struct mmc_host *host)
#endif
mmc_start_host(host);
- mmc_register_pm_notifier(host);
+ if (!mmc_is_ascend_customized(host->parent) ||
+ !(host->pm_flags & MMC_PM_IGNORE_PM_NOTIFY))
+ mmc_register_pm_notifier(host);
return 0;
}
@@ -458,7 +469,9 @@ EXPORT_SYMBOL(mmc_add_host);
*/
void mmc_remove_host(struct mmc_host *host)
{
- mmc_unregister_pm_notifier(host);
+ if (!mmc_is_ascend_customized(host->parent) ||
+ !(host->pm_flags & MMC_PM_IGNORE_PM_NOTIFY))
+ mmc_unregister_pm_notifier(host);
mmc_stop_host(host);
#ifdef CONFIG_DEBUG_FS
@@ -485,3 +498,19 @@ void mmc_free_host(struct mmc_host *host)
}
EXPORT_SYMBOL(mmc_free_host);
+
+
+int mmc_is_ascend_customized(struct device *dev)
+{
+#ifdef CONFIG_ASCEND_HISI_MMC
+ static int is_ascend_customized = -1;
+
+ if (is_ascend_customized == -1)
+ is_ascend_customized = ((dev == NULL) ? 0 :
+ of_find_property(dev->of_node, "customized", NULL) != NULL);
+ return is_ascend_customized;
+#else
+ return 0;
+#endif
+}
+EXPORT_SYMBOL(mmc_is_ascend_customized);
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 840462ed1ec7e..78b4d0a813b71 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -19,6 +19,9 @@
#include <linux/mmc/pm.h>
#include <linux/dma-direction.h>
+#include <linux/jiffies.h>
+#include <linux/version.h>
+
struct mmc_ios {
unsigned int clock; /* clock rate */
unsigned short vdd;
@@ -63,6 +66,7 @@ struct mmc_ios {
#define MMC_TIMING_MMC_DDR52 8
#define MMC_TIMING_MMC_HS200 9
#define MMC_TIMING_MMC_HS400 10
+#define MMC_TIMING_NEW_SD MMC_TIMING_UHS_SDR12
unsigned char signal_voltage; /* signalling voltage (1.8V or 3.3V) */
@@ -78,7 +82,25 @@ struct mmc_ios {
#define MMC_SET_DRIVER_TYPE_D 3
bool enhanced_strobe; /* hs400es selection */
+#ifdef CONFIG_ASCEND_HISI_MMC
+ unsigned int clock_store; /*store the clock before power off*/
+#endif
+};
+
+#ifdef CONFIG_ASCEND_HISI_MMC
+struct mmc_cmdq_host_ops {
+ int (*enable)(struct mmc_host *mmc);
+ int (*disable)(struct mmc_host *mmc, bool soft);
+ int (*restore_irqs)(struct mmc_host *mmc);
+ int (*request)(struct mmc_host *mmc, struct mmc_request *mrq);
+ int (*halt)(struct mmc_host *mmc, bool halt);
+ void (*post_req)(struct mmc_host *mmc, struct mmc_request *mrq,
+ int err);
+ void (*disable_immediately)(struct mmc_host *mmc);
+ int (*clear_and_halt)(struct mmc_host *mmc);
};
+#endif
+
struct mmc_host;
@@ -168,6 +190,12 @@ struct mmc_host_ops {
*/
int (*multi_io_quirk)(struct mmc_card *card,
unsigned int direction, int blk_size);
+#ifdef CONFIG_ASCEND_HISI_MMC
+ /* Slow down clk for ascend chip SD cards */
+ void (*slowdown_clk)(struct mmc_host *host, int timing);
+ int (*enable_enhanced_strobe)(struct mmc_host *host);
+ int (*send_cmd_direct)(struct mmc_host *host, struct mmc_request *mrq);
+#endif
};
struct mmc_cqe_ops {
@@ -255,6 +283,30 @@ struct mmc_context_info {
wait_queue_head_t wait;
};
+#ifdef CONFIG_ASCEND_HISI_MMC
+/**
+ * mmc_cmdq_context_info - describes the contexts of cmdq
+ * @active_reqs requests being processed
+ * @active_dcmd dcmd in progress, don't issue any
+ * more dcmd requests
+ * @rpmb_in_wait do not pull any more reqs till rpmb is handled
+ * @cmdq_state state of cmdq engine
+ * @req_starved completion should invoke the request_fn since
+ * no tags were available
+ * @cmdq_ctx_lock acquire this before accessing this structure
+ */
+struct mmc_cmdq_context_info {
+ unsigned long active_reqs; /* in-flight requests */
+ bool active_dcmd;
+ bool rpmb_in_wait;
+ unsigned long curr_state;
+
+ /* no free tag available */
+ unsigned long req_starved;
+ spinlock_t cmdq_ctx_lock;
+};
+#endif
+
struct regulator;
struct mmc_pwrseq;
@@ -328,6 +380,9 @@ struct mmc_host {
#define MMC_CAP_UHS_SDR50 (1 << 18) /* Host supports UHS SDR50 mode */
#define MMC_CAP_UHS_SDR104 (1 << 19) /* Host supports UHS SDR104 mode */
#define MMC_CAP_UHS_DDR50 (1 << 20) /* Host supports UHS DDR50 mode */
+#ifdef CONFIG_ASCEND_HISI_MMC
+#define MMC_CAP_RUNTIME_RESUME (1 << 20) /* Resume at runtime_resume. */
+#endif
#define MMC_CAP_UHS (MMC_CAP_UHS_SDR12 | MMC_CAP_UHS_SDR25 | \
MMC_CAP_UHS_SDR50 | MMC_CAP_UHS_SDR104 | \
MMC_CAP_UHS_DDR50)
@@ -367,6 +422,34 @@ struct mmc_host {
#define MMC_CAP2_CQE (1 << 23) /* Has eMMC command queue engine */
#define MMC_CAP2_CQE_DCMD (1 << 24) /* CQE can issue a direct command */
#define MMC_CAP2_AVOID_3_3V (1 << 25) /* Host must negotiate down from 3.3V */
+#ifdef CONFIG_ASCEND_HISI_MMC
+#define MMC_CAP2_CACHE_CTRL (1 << 1) /* Allow cache control */
+#define MMC_CAP2_NO_MULTI_READ (1 << 3) /* Multiblock read don't work */
+#define MMC_CAP2_NO_SLEEP_CMD (1 << 4) /* Don't allow sleep command */
+#define MMC_CAP2_BROKEN_VOLTAGE (1 << 7) /* Use the broken voltage */
+#define MMC_CAP2_DETECT_ON_ERR (1 << 8) /* I/O err check card removal */
+#define MMC_CAP2_HC_ERASE_SZ (1 << 9) /* High-capacity erase size */
+#define MMC_CAP2_PACKED_RD (1 << 12) /* Allow packed read */
+#define MMC_CAP2_PACKED_WR (1 << 13) /* Allow packed write */
+#define MMC_CAP2_PACKED_CMD (MMC_CAP2_PACKED_RD | \
+ MMC_CAP2_PACKED_WR)
+#define MMC_CAP2_CMD_QUEUE (1 << 18) /* support eMMC command queue */
+#define MMC_CAP2_ENHANCED_STROBE (1 << 19)
+#define MMC_CAP2_CACHE_FLUSH_BARRIER (1 << 20)
+/* Allow background operations auto enable control */
+#define MMC_CAP2_BKOPS_AUTO_CTRL (1 << 21)
+/* Allow background operations manual enable control */
+#define MMC_CAP2_BKOPS_MANUAL_CTRL (1 << 22)
+
+/* host is connected by via modem through sdio */
+#define MMC_CAP2_SUPPORT_VIA_MODEM (1 << 26)
+/* host is connected by wifi through sdio */
+#define MMC_CAP2_SUPPORT_WIFI (1 << 27)
+/* host is connected to 1102 wifi */
+#define MMC_CAP2_SUPPORT_WIFI_CMD11 (1 << 28)
+/* host do not support low power for wifi*/
+#define MMC_CAP2_WIFI_NO_LOWPWR (1 << 29)
+#endif
int fixed_drv_type; /* fixed driver type for non-removable media */
@@ -460,6 +543,12 @@ struct mmc_host {
bool cqe_on;
unsigned long private[0] ____cacheline_aligned;
+#ifdef CONFIG_ASCEND_HISI_MMC
+ const struct mmc_cmdq_host_ops *cmdq_ops;
+ int sdio_present;
+ unsigned int cmdq_slots;
+ struct mmc_cmdq_context_info cmdq_ctx;
+#endif
};
struct device_node;
@@ -587,4 +676,30 @@ static inline enum dma_data_direction mmc_get_dma_dir(struct mmc_data *data)
int mmc_send_tuning(struct mmc_host *host, u32 opcode, int *cmd_error);
int mmc_abort_tuning(struct mmc_host *host, u32 opcode);
+#ifdef CONFIG_ASCEND_HISI_MMC
+int mmc_cache_ctrl(struct mmc_host *host, u8 enable);
+int mmc_card_awake(struct mmc_host *host);
+int mmc_card_sleep(struct mmc_host *host);
+int mmc_card_can_sleep(struct mmc_host *host);
+#else
+static inline int mmc_cache_ctrl(struct mmc_host *host, u8 enable)
+{
+ return 0;
+}
+static inline int mmc_card_awake(struct mmc_host *host)
+{
+ return 0;
+}
+static inline int mmc_card_sleep(struct mmc_host *host)
+{
+ return 0;
+}
+static inline int mmc_card_can_sleep(struct mmc_host *host)
+{
+ return 0;
+}
+#endif
+
+int mmc_is_ascend_customized(struct device *dev);
+
#endif /* LINUX_MMC_HOST_H */
diff --git a/include/linux/mmc/pm.h b/include/linux/mmc/pm.h
index 4a139204c20c0..6e2d6a135c7e0 100644
--- a/include/linux/mmc/pm.h
+++ b/include/linux/mmc/pm.h
@@ -26,5 +26,6 @@ typedef unsigned int mmc_pm_flag_t;
#define MMC_PM_KEEP_POWER (1 << 0) /* preserve card power during suspend */
#define MMC_PM_WAKE_SDIO_IRQ (1 << 1) /* wake up host system on SDIO IRQ assertion */
+#define MMC_PM_IGNORE_PM_NOTIFY (1 << 2) /* ignore mmc pm notify */
#endif /* LINUX_MMC_PM_H */
--
2.25.1
First of all, thank you very much for participating in the openEuler community and for submitting patches to the openEuler kernel open-source project.
The openEuler-21.03 innovation branch of the openEuler kernel project welcomes ideas and suggestions from enterprises, universities, and everyone interested in the Linux kernel. We hope to explore together the prospects and potential of low-level software across cloud and computing, 5G, devices, and other scenarios, and to jointly open new horizons for system software in the IoT and intelligent-computing era. The openEuler-21.03 innovation branch also aims to provide universities with more teaching material, making a modest contribution to basic research and to the integration of industry and education.
- If you have questions about how to participate in the openEuler kernel project, you can email bobo.shaobowang(a)huawei.com,
or refer to this document: https://mp.weixin.qq.com/s/a42a5VfayFeJgWitqbI8Qw
- You can also file an issue with the openEuler kernel project: https://gitee.com/openeuler/kernel
The following patches have passed maintainer review and openEuler community verification testing, and will be merged into the openEuler-21.03 branch as version 5.10.0-4.25.0.
0ee74f5aa533 (HEAD -> openEuler-21.03, tag: 5.10.0-4.25.0) RDS tcp loopback connection can hang
fc7ec5aebb45 usb: gadget: f_fs: Ensure io_completion_wq is idle during unbind
dae6a368dafc ALSA: seq: Fix race of snd_seq_timer_open()
3a00695cb8e2 RDMA/mlx4: Do not map the core_clock page to user space unless enabled
ed4fd7c42adc Revert "ACPI: sleep: Put the FACS table after using it"
37c85837cf42 ASoC: Intel: bytcr_rt5640: Add quirk for the Glavey TM800A550L tablet
94fccf25dd49 nvme-tcp: remove incorrect Kconfig dep in BLK_DEV_NVME
47f090fbcbb9 regulator: fan53880: Fix missing n_voltages setting
0b9b74807478 net/nfc/rawsock.c: fix a permission check bug
789459f344e7 scsi: core: Only put parent device if host state differs from SHOST_CREATED
08f8e0fb4b59 usb: typec: ucsi: Clear PPM capability data in ucsi_init() error path
0e2bd1220f8a phy: cadence: Sierra: Fix error return code in cdns_sierra_phy_probe()
47f3671cfd67 usb: pd: Set PD_T_SINK_WAIT_CAP to 310ms
559b80a5925d scsi: core: Fix failure handling of scsi_add_host_with_dma()
e897c103ecde ALSA: hda/realtek: headphone and mic don't work on an Acer laptop
d39b22f602f5 isdn: mISDN: netjet: Fix crash in nj_probe:
1013d6a98975 nvmet: fix false keep-alive timeout when a controller is torn down
9ffec7fff577 cgroup: disable controllers at parse time
e2c4bbd88218 RDMA/ipoib: Fix warning caused by destroying non-initial netns
0eb3e33d9814 gpio: wcd934x: Fix shift-out-of-bounds error
af64e02cb927 NFSv4: Fix deadlock between nfs4_evict_inode() and nfs4_opendata_get_inode()
34a0d49e311d usb: dwc3: ep0: fix NULL pointer exception
ca5ed7b6d2ac spi: bcm2835: Fix out-of-bounds access with more than 4 slaves
420a6301307e NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error.
762e6acf28f1 regulator: core: resolve supply for boot-on/always-on regulators
d84eb5070d03 net: macb: ensure the device is available before accessing GEMGXL control registers
b9a3b65556e9 sched/fair: Make sure to update tg contrib for blocked load
074babe38e68 KVM: x86: Ensure liveliness of nested VM-Enter fail tracepoint message
447f10de04c8 ALSA: firewire-lib: fix the context to call snd_pcm_stop_xrun()
262986c9f618 dm verity: fix require_signatures module_param permissions
bfa96859a312 usb: chipidea: udc: assign interrupt number to USB gadget structure
177f5f81e9fc regulator: max77620: Use device_set_of_node_from_dev()
818f49a8aa09 USB: serial: omninet: add device id for Zyxel Omni 56K Plus
1790bfdad278 spi: Cleanup on failure of initial setup
642b2258a1f7 drm/msm/a6xx: avoid shadow NULL reference in failure path
be1c43cba161 USB: f_ncm: ncm_bitrate (speed) is unsigned
e41037151205 nvme-fabrics: decode host pathing error for connect
We look forward to further cooperation.
Alexandre GRIVEAUX (1):
USB: serial: omninet: add device id for Zyxel Omni 56K Plus
Axel Lin (1):
regulator: fan53880: Fix missing n_voltages setting
Dai Ngo (1):
NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on
error.
Dmitry Baryshkov (1):
regulator: core: resolve supply for boot-on/always-on regulators
Dmitry Osipenko (1):
regulator: max77620: Use device_set_of_node_from_dev()
Hannes Reinecke (1):
nvme-fabrics: decode host pathing error for connect
Hans de Goede (1):
ASoC: Intel: bytcr_rt5640: Add quirk for the Glavey TM800A550L tablet
Hui Wang (1):
ALSA: hda/realtek: headphone and mic don't work on an Acer laptop
Jeimon (1):
net/nfc/rawsock.c: fix a permission check bug
John Keeping (1):
dm verity: fix require_signatures module_param permissions
Jonathan Marek (1):
drm/msm/a6xx: avoid shadow NULL reference in failure path
Kamal Heib (1):
RDMA/ipoib: Fix warning caused by destroying non-initial netns
Kyle Tso (1):
usb: pd: Set PD_T_SINK_WAIT_CAP to 310ms
Li Jun (1):
usb: chipidea: udc: assign interrupt number to USB gadget structure
Lukas Wunner (2):
spi: Cleanup on failure of initial setup
spi: bcm2835: Fix out-of-bounds access with more than 4 slaves
Maciej Żenczykowski (1):
USB: f_ncm: ncm_bitrate (speed) is unsigned
Marian-Cristian Rotariu (1):
usb: dwc3: ep0: fix NULL pointer exception
Mayank Rana (1):
usb: typec: ucsi: Clear PPM capability data in ucsi_init() error path
Ming Lei (2):
scsi: core: Fix failure handling of scsi_add_host_with_dma()
scsi: core: Only put parent device if host state differs from
SHOST_CREATED
Rao Shoaib (1):
RDS tcp loopback connection can hang
Sagi Grimberg (2):
nvmet: fix false keep-alive timeout when a controller is torn down
nvme-tcp: remove incorrect Kconfig dep in BLK_DEV_NVME
Sean Christopherson (1):
KVM: x86: Ensure liveliness of nested VM-Enter fail tracepoint message
Shakeel Butt (1):
cgroup: disable controllers at parse time
Shay Drory (1):
RDMA/mlx4: Do not map the core_clock page to user space unless enabled
Srinivas Kandagatla (1):
gpio: wcd934x: Fix shift-out-of-bounds error
Takashi Iwai (1):
ALSA: seq: Fix race of snd_seq_timer_open()
Takashi Sakamoto (1):
ALSA: firewire-lib: fix the context to call snd_pcm_stop_xrun()
Trond Myklebust (1):
NFSv4: Fix deadlock between nfs4_evict_inode() and
nfs4_opendata_get_inode()
Vincent Guittot (1):
sched/fair: Make sure to update tg contrib for blocked load
Wang Wensheng (1):
phy: cadence: Sierra: Fix error return code in cdns_sierra_phy_probe()
Wesley Cheng (1):
usb: gadget: f_fs: Ensure io_completion_wq is idle during unbind
Zhang Rui (1):
Revert "ACPI: sleep: Put the FACS table after using it"
Zheyu Ma (1):
isdn: mISDN: netjet: Fix crash in nj_probe:
Zong Li (1):
net: macb: ensure the device is available before accessing GEMGXL
control registers
arch/x86/kvm/trace.h | 6 ++--
drivers/acpi/sleep.c | 4 +--
drivers/gpio/gpio-wcd934x.c | 2 +-
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 +-
drivers/infiniband/hw/mlx4/main.c | 5 +--
drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 1 +
drivers/isdn/hardware/mISDN/netjet.c | 1 -
drivers/md/dm-verity-verify-sig.c | 2 +-
drivers/net/ethernet/cadence/macb_main.c | 3 ++
drivers/net/ethernet/mellanox/mlx4/fw.c | 3 ++
drivers/net/ethernet/mellanox/mlx4/fw.h | 1 +
drivers/net/ethernet/mellanox/mlx4/main.c | 6 ++++
drivers/nvme/host/Kconfig | 3 +-
drivers/nvme/host/fabrics.c | 5 +++
drivers/nvme/target/core.c | 15 ++++++---
drivers/nvme/target/nvmet.h | 2 +-
drivers/phy/cadence/phy-cadence-sierra.c | 1 +
drivers/regulator/core.c | 6 ++++
drivers/regulator/fan53880.c | 3 ++
drivers/regulator/max77620-regulator.c | 7 +++++
drivers/scsi/hosts.c | 16 +++++-----
drivers/spi/spi-bcm2835.c | 10 ++++--
drivers/spi/spi-bitbang.c | 18 ++++++++---
drivers/spi/spi-fsl-spi.c | 4 +++
drivers/spi/spi-omap-uwire.c | 9 +++++-
drivers/spi/spi-omap2-mcspi.c | 33 ++++++++++++--------
drivers/spi/spi-pxa2xx.c | 9 +++++-
drivers/usb/chipidea/udc.c | 1 +
drivers/usb/dwc3/ep0.c | 3 ++
drivers/usb/gadget/function/f_fs.c | 3 ++
drivers/usb/gadget/function/f_ncm.c | 2 +-
drivers/usb/serial/omninet.c | 2 ++
drivers/usb/typec/ucsi/ucsi.c | 1 +
fs/nfs/nfs4_fs.h | 1 +
fs/nfs/nfs4proc.c | 20 +++++++++++-
include/linux/mlx4/device.h | 1 +
include/linux/usb/pd.h | 2 +-
kernel/cgroup/cgroup.c | 13 ++++----
kernel/sched/fair.c | 2 +-
net/nfc/rawsock.c | 2 +-
net/rds/connection.c | 23 ++++++++++----
net/rds/tcp.c | 4 +--
net/rds/tcp.h | 3 +-
net/rds/tcp_listen.c | 6 ++++
sound/core/seq/seq_timer.c | 10 +++++-
sound/firewire/amdtp-stream.c | 2 +-
sound/pci/hda/patch_realtek.c | 12 +++++++
sound/soc/intel/boards/bytcr_rt5640.c | 11 +++++++
48 files changed, 228 insertions(+), 73 deletions(-)
--
2.25.1
kylin inclusion
category: feature
bugzilla: https://gitee.com/openeuler-competition/summer-2021/issues/I3EIMT?from=proj…
CVE: NA
--------------------------------------------------
In some atomic contexts, such as interrupt context, sleeping is not allowed.
Memory allocations made there therefore cannot enter direct reclaim, and will
not even wake up the kswapd thread. In the softirq that handles NIC packet
reception, for example, the page cache may occupy so much memory that the
system has too little free memory left to allocate for a received packet, and
the packet is simply dropped.
This is the problem the page cache limit is meant to solve.
The page cache limit works by checking, each time a page is about to be added
to the page cache (that is, when add_to_page_cache_lru() is called; see the
mm/filemap.c hunk below), whether the page cache exceeds the upper limit set
via /proc/sys/vm/pagecache_limit_ratio.
Three /proc interfaces are provided:
/proc/sys/vm/pagecache_limit_ratio: echo x (0 < x < 100) to enable the page
cache limit; x is the maximum percentage of total system memory the page
cache may occupy.
/proc/sys/vm/pagecache_limit_ignore_dirty: whether to ignore dirty pages when
calculating the memory occupied by the page cache. The default is 1 (ignore),
because reclaiming dirty pages is time-consuming.
/proc/sys/vm/pagecache_limit_async: 1 means asynchronous reclaim, 0 means
synchronous reclaim. A usage sketch follows below.
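For illustration only (the values below are arbitrary examples;
pagecache_limit_reclaim_ratio is the companion knob the patch registers to
set the reclaim target):

    # cap the page cache at 30% of total memory
    echo 30 > /proc/sys/vm/pagecache_limit_ratio
    # reclaim 5% of RAM beyond the overage (must exceed the ratio by >= 2)
    echo 35 > /proc/sys/vm/pagecache_limit_reclaim_ratio
    # also count dirty pages when measuring the page cache
    echo 0 > /proc/sys/vm/pagecache_limit_ignore_dirty
    # reclaim asynchronously via the kpclimitd kernel thread
    echo 1 > /proc/sys/vm/pagecache_limit_async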
Signed-off-by: wen zhiwei <wenzhiwei(a)kylinos.cn>
Signed-off-by: wenzhiwei <wenzhiwei(a)kylinos.cn>
---
include/linux/memcontrol.h | 7 +-
include/linux/mmzone.h | 7 +
include/linux/swap.h | 15 +
include/trace/events/vmscan.h | 28 +-
kernel/sysctl.c | 139 ++++++++
mm/filemap.c | 2 +
mm/page_alloc.c | 52 +++
mm/vmscan.c | 650 ++++++++++++++++++++++++++++++++--
mm/workingset.c | 1 +
9 files changed, 862 insertions(+), 39 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 71a5b589bddb..731a2cd2ea86 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -50,6 +50,7 @@ enum memcg_memory_event {
struct mem_cgroup_reclaim_cookie {
pg_data_t *pgdat;
+ int priority;
unsigned int generation;
};
@@ -492,8 +493,7 @@ mem_cgroup_nodeinfo(struct mem_cgroup *memcg, int nid)
* @node combination. This can be the node lruvec, if the memory
* controller is disabled.
*/
-static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
- struct pglist_data *pgdat)
+static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg, struct pglist_data *pgdat)
{
struct mem_cgroup_per_node *mz;
struct lruvec *lruvec;
@@ -1066,8 +1066,7 @@ static inline void mem_cgroup_migrate(struct page *old, struct page *new)
{
}
-static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
- struct pglist_data *pgdat)
+static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg, struct pglist_data *pgdat)
{
return &pgdat->__lruvec;
}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 82fceef88448..d3c5258e5d0d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -445,6 +445,13 @@ struct zone {
* changes.
*/
long lowmem_reserve[MAX_NR_ZONES];
+ /*
+ * This atomic counter is set when there is pagecache limit
+ * reclaim going on in this particular zone. Other potential
+ * reclaimers should back off to prevent heavy lru_lock
+ * bouncing.
+ */
+ atomic_t pagecache_reclaim;
#ifdef CONFIG_NEED_MULTIPLE_NODES
int node;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 9b708c0288bc..b9329e575836 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -377,6 +377,21 @@ extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem,
unsigned long *nr_scanned);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
+
+#define ADDITIONAL_RECLAIM_RATIO 2
+extern unsigned long pagecache_over_limit(void);
+extern void shrink_page_cache(gfp_t mask, struct page *page);
+extern unsigned long vm_pagecache_limit_pages;
+extern unsigned long vm_pagecache_limit_reclaim_pages;
+extern unsigned int vm_pagecache_limit_ratio;
+extern int vm_pagecache_limit_reclaim_ratio;
+extern unsigned int vm_pagecache_ignore_dirty;
+extern unsigned int vm_pagecache_limit_async;
+extern int kpagecache_limitd_run(void);
+extern void kpagecache_limitd_stop(void);
+extern unsigned int vm_pagecache_ignore_slab;
+
extern int remove_mapping(struct address_space *mapping, struct page *page);
extern unsigned long reclaim_pages(struct list_head *page_list);
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 2070df64958e..3bfe47a85f6f 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -183,48 +183,48 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_softlimit_re
#endif /* CONFIG_MEMCG */
TRACE_EVENT(mm_shrink_slab_start,
- TP_PROTO(struct shrinker *shr, struct shrink_control *sc,
- long nr_objects_to_shrink, unsigned long cache_items,
- unsigned long long delta, unsigned long total_scan,
- int priority),
-
- TP_ARGS(shr, sc, nr_objects_to_shrink, cache_items, delta, total_scan,
- priority),
+ TP_PROTO(struct shrinker *shr, struct shrink_control *sc,
+ long nr_objects_to_shrink, unsigned long pgs_scanned,
+ unsigned long lru_pgs, unsigned long cache_items,
+ unsigned long long delta, unsigned long total_scan),
+ TP_ARGS(shr, sc, nr_objects_to_shrink, pgs_scanned, lru_pgs, cache_items, delta, total_scan),
TP_STRUCT__entry(
__field(struct shrinker *, shr)
__field(void *, shrink)
__field(int, nid)
__field(long, nr_objects_to_shrink)
__field(gfp_t, gfp_flags)
+ __field(unsigned long, pgs_scanned)
+ __field(unsigned long, lru_pgs)
__field(unsigned long, cache_items)
__field(unsigned long long, delta)
__field(unsigned long, total_scan)
- __field(int, priority)
),
TP_fast_assign(
- __entry->shr = shr;
+ __entry->shr = shr;
__entry->shrink = shr->scan_objects;
__entry->nid = sc->nid;
__entry->nr_objects_to_shrink = nr_objects_to_shrink;
__entry->gfp_flags = sc->gfp_mask;
+ __entry->pgs_scanned = pgs_scanned;
+ __entry->lru_pgs = lru_pgs;
__entry->cache_items = cache_items;
__entry->delta = delta;
__entry->total_scan = total_scan;
- __entry->priority = priority;
),
-
- TP_printk("%pS %p: nid: %d objects to shrink %ld gfp_flags %s cache items %ld delta %lld total_scan %ld priority %d",
+ TP_printk("%pS %p: nid: %d objects to shrink %ld gfp_flags %s pgs_scanned %ld lru_pgs %ld cache items %ld delta %lld total_scan %ld",
__entry->shrink,
__entry->shr,
__entry->nid,
__entry->nr_objects_to_shrink,
show_gfp_flags(__entry->gfp_flags),
+ __entry->pgs_scanned,
+ __entry->lru_pgs,
__entry->cache_items,
__entry->delta,
- __entry->total_scan,
- __entry->priority)
+ __entry->total_scan)
);
TRACE_EVENT(mm_shrink_slab_end,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c7ca58de3b1b..4ef436cdfdad 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -111,6 +111,7 @@
static int sixty = 60;
#endif
+static int zero;
static int __maybe_unused neg_one = -1;
static int __maybe_unused two = 2;
static int __maybe_unused four = 4;
@@ -648,6 +649,68 @@ static int do_proc_dointvec(struct ctl_table *table, int write,
return __do_proc_dointvec(table->data, table, write,
buffer, lenp, ppos, conv, data);
}
+int setup_pagecache_limit(void)
+{
+ /* reclaim $ADDITIONAL_RECLAIM_PAGES more than limit. */
+ vm_pagecache_limit_reclaim_ratio = vm_pagecache_limit_ratio + ADDITIONAL_RECLAIM_RATIO;
+
+ if (vm_pagecache_limit_reclaim_ratio > 100)
+ vm_pagecache_limit_reclaim_ratio = 100;
+ if (vm_pagecache_limit_ratio == 0)
+ vm_pagecache_limit_reclaim_ratio = 0;
+
+ vm_pagecache_limit_pages = vm_pagecache_limit_ratio * totalram_pages() / 100;
+ vm_pagecache_limit_reclaim_pages = vm_pagecache_limit_reclaim_ratio * totalram_pages() / 100;
+ return 0;
+}
+
+static int pc_limit_proc_dointvec(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+ if (write && !ret)
+ ret = setup_pagecache_limit();
+ return ret;
+}
+static int pc_reclaim_limit_proc_dointvec(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ int pre_reclaim_ratio = vm_pagecache_limit_reclaim_ratio;
+ int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+ if (write && vm_pagecache_limit_ratio == 0)
+ return -EINVAL;
+
+ if (write && !ret) {
+ if (vm_pagecache_limit_reclaim_ratio - vm_pagecache_limit_ratio < ADDITIONAL_RECLAIM_RATIO) {
+ vm_pagecache_limit_reclaim_ratio = pre_reclaim_ratio;
+ return -EINVAL;
+ }
+ vm_pagecache_limit_reclaim_pages = vm_pagecache_limit_reclaim_ratio * totalram_pages() / 100;
+ }
+ return ret;
+}
+static int pc_limit_async_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+ if (write && vm_pagecache_limit_ratio == 0)
+ return -EINVAL;
+
+ if (write && !ret) {
+ if (vm_pagecache_limit_async > 0) {
+ if (kpagecache_limitd_run()) {
+ vm_pagecache_limit_async = 0;
+ return -EINVAL;
+ }
+ }
+ else {
+ kpagecache_limitd_stop();
+ }
+ }
+ return ret;
+}
static int do_proc_douintvec_w(unsigned int *tbl_data,
struct ctl_table *table,
@@ -2711,6 +2774,14 @@ static struct ctl_table kern_table[] = {
},
{ }
};
+static int pc_limit_proc_dointvec(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos);
+
+static int pc_reclaim_limit_proc_dointvec(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos);
+
+static int pc_limit_async_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos);
static struct ctl_table vm_table[] = {
{
@@ -2833,6 +2904,74 @@ static struct ctl_table vm_table[] = {
.extra1 = SYSCTL_ZERO,
.extra2 = &two_hundred,
},
+ {
+ .procname = "pagecache_limit_ratio",
+ .data = &vm_pagecache_limit_ratio,
+ .maxlen = sizeof(vm_pagecache_limit_ratio),
+ .mode = 0644,
+ .proc_handler = &pc_limit_proc_dointvec,
+ .extra1 = &zero,
+ .extra2 = &one_hundred,
+ },
+ {
+ .procname = "pagecache_limit_reclaim_ratio",
+ .data = &vm_pagecache_limit_reclaim_ratio,
+ .maxlen = sizeof(vm_pagecache_limit_reclaim_ratio),
+ .mode = 0644,
+ .proc_handler = &pc_reclaim_limit_proc_dointvec,
+ .extra1 = &zero,
+ .extra2 = &one_hundred,
+ },
+ {
+ .procname = "pagecache_limit_ignore_dirty",
+ .data = &vm_pagecache_ignore_dirty,
+ .maxlen = sizeof(vm_pagecache_ignore_dirty),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+#ifdef CONFIG_SHRINK_PAGECACHE
+ {
+ .procname = "cache_reclaim_s",
+ .data = &vm_cache_reclaim_s,
+ .maxlen = sizeof(vm_cache_reclaim_s),
+ .mode = 0644,
+ .proc_handler = cache_reclaim_sysctl_handler,
+ .extra1 = &vm_cache_reclaim_s_min,
+ .extra2 = &vm_cache_reclaim_s_max,
+ },
+ {
+ .procname = "cache_reclaim_weight",
+ .data = &vm_cache_reclaim_weight,
+ .maxlen = sizeof(vm_cache_reclaim_weight),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &vm_cache_reclaim_weight_min,
+ .extra2 = &vm_cache_reclaim_weight_max,
+ },
+ {
+ .procname = "cache_reclaim_enable",
+ .data = &vm_cache_reclaim_enable,
+ .maxlen = sizeof(vm_cache_reclaim_enable),
+ .mode = 0644,
+ .proc_handler = cache_reclaim_enable_handler,
+ .extra1 = &zero,
+ .extra2 = &one,
+ },
+ {
+ .procname = "pagecache_limit_async",
+ .data = &vm_pagecache_limit_async,
+ .maxlen = sizeof(vm_pagecache_limit_async),
+ .mode = 0644,
+ .proc_handler = &pc_limit_async_handler,
+ },
+ {
+ .procname = "pagecache_limit_ignore_slab",
+ .data = &vm_pagecache_ignore_slab,
+ .maxlen = sizeof(vm_pagecache_ignore_slab),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+#endif
#ifdef CONFIG_HUGETLB_PAGE
{
.procname = "nr_hugepages",
diff --git a/mm/filemap.c b/mm/filemap.c
index ef611eb34aa7..808d4f02b5a5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -922,6 +922,8 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
{
void *shadow = NULL;
int ret;
+ if (unlikely(vm_pagecache_limit_pages) && pagecache_over_limit() > 0)
+ shrink_page_cache(gfp_mask, page);
__SetPageLocked(page);
ret = __add_to_page_cache_locked(page, mapping, offset,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 71afec177233..08feba42d3d7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8933,6 +8933,58 @@ void zone_pcp_reset(struct zone *zone)
local_irq_restore(flags);
}
+/* Returns a number that's positive if the pagecache is above
+ * the set limit. */
+unsigned long pagecache_over_limit(void)
+{
+ unsigned long should_reclaim_pages = 0;
+ unsigned long overlimit_pages = 0;
+ unsigned long delta_pages = 0;
+ unsigned long pgcache_lru_pages = 0;
+ /* We only want to limit unmapped and non-shmem page cache pages;
+ * normally all shmem pages are mapped as well*/
+ unsigned long pgcache_pages = global_node_page_state(NR_FILE_PAGES)
+ - max_t(unsigned long,
+ global_node_page_state(NR_FILE_MAPPED),
+ global_node_page_state(NR_SHMEM));
+ /* We certainly can't free more than what's on the LRU lists
+ * minus the dirty ones*/
+ if (vm_pagecache_ignore_slab)
+ pgcache_lru_pages = global_node_page_state(NR_ACTIVE_FILE)
+ + global_node_page_state(NR_INACTIVE_FILE);
+ else
+ pgcache_lru_pages = global_node_page_state(NR_ACTIVE_FILE)
+ + global_node_page_state(NR_INACTIVE_FILE)
+ + global_node_page_state(NR_SLAB_RECLAIMABLE_B)
+ + global_node_page_state(NR_SLAB_UNRECLAIMABLE_B);
+
+ if (vm_pagecache_ignore_dirty != 0)
+ pgcache_lru_pages -= global_node_page_state(NR_FILE_DIRTY) / vm_pagecache_ignore_dirty;
+ /* Paranoia */
+ if (unlikely(pgcache_lru_pages > LONG_MAX))
+ return 0;
+
+ /* Limit it to 94% of LRU (not all there might be unmapped) */
+ pgcache_lru_pages -= pgcache_lru_pages/16;
+ if (vm_pagecache_ignore_slab)
+ pgcache_pages = min_t(unsigned long, pgcache_pages, pgcache_lru_pages);
+ else
+ pgcache_pages = pgcache_lru_pages;
+
+ /*
+ * delta_pages: we should reclaim at least ADDITIONAL_RECLAIM_RATIO (2%)
+ * more pages than the overlimit amount; it is derived from
+ * vm_pagecache_limit_reclaim_pages.
+ * should_reclaim_pages: the number of pages we will actually reclaim;
+ * it must not exceed pgcache_pages.
+ */
+ if (pgcache_pages > vm_pagecache_limit_pages) {
+ overlimit_pages = pgcache_pages - vm_pagecache_limit_pages;
+ delta_pages = vm_pagecache_limit_reclaim_pages - vm_pagecache_limit_pages;
+ should_reclaim_pages = min_t(unsigned long, delta_pages, vm_pagecache_limit_pages) + overlimit_pages;
+ return should_reclaim_pages;
+ }
+ return 0;
+}
+
#ifdef CONFIG_MEMORY_HOTREMOVE
/*
* All pages in the range must be in a single zone, must not contain holes,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 23f8a5242de7..1fe2c74a1c10 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -175,6 +175,39 @@ struct scan_control {
*/
int vm_swappiness = 60;
+/*
+ * The total number of pages which are beyond the high watermark within all
+ * zones.
+ */
+unsigned long vm_pagecache_limit_pages __read_mostly = 0;
+unsigned long vm_pagecache_limit_reclaim_pages = 0;
+unsigned int vm_pagecache_limit_ratio __read_mostly = 0;
+int vm_pagecache_limit_reclaim_ratio __read_mostly = 0;
+unsigned int vm_pagecache_ignore_dirty __read_mostly = 1;
+
+unsigned long vm_total_pages;
+static struct task_struct *kpclimitd = NULL;
+unsigned int vm_pagecache_ignore_slab __read_mostly = 1;
+unsigned int vm_pagecache_limit_async __read_mostly = 0;
+
+#ifdef CONFIG_SHRINK_PAGECACHE
+unsigned long vm_cache_limit_ratio;
+unsigned long vm_cache_limit_ratio_min;
+unsigned long vm_cache_limit_ratio_max;
+unsigned long vm_cache_limit_mbytes __read_mostly;
+unsigned long vm_cache_limit_mbytes_min;
+unsigned long vm_cache_limit_mbytes_max;
+int vm_cache_reclaim_s __read_mostly;
+int vm_cache_reclaim_s_min;
+int vm_cache_reclaim_s_max;
+int vm_cache_reclaim_weight __read_mostly;
+int vm_cache_reclaim_weight_min;
+int vm_cache_reclaim_weight_max;
+int vm_cache_reclaim_enable;
+static DEFINE_PER_CPU(struct delayed_work, vmscan_work);
+#endif
+
static void set_task_reclaim_state(struct task_struct *task,
struct reclaim_state *rs)
{
@@ -187,10 +220,12 @@ static void set_task_reclaim_state(struct task_struct *task,
task->reclaim_state = rs;
}
+static bool kpclimitd_context = false;
static LIST_HEAD(shrinker_list);
static DECLARE_RWSEM(shrinker_rwsem);
#ifdef CONFIG_MEMCG
+static DEFINE_IDR(shrinker_idr);
static int shrinker_nr_max;
/* The shrinker_info is expanded in a batch of BITS_PER_LONG */
@@ -346,7 +381,6 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
}
}
-static DEFINE_IDR(shrinker_idr);
static int prealloc_memcg_shrinker(struct shrinker *shrinker)
{
@@ -646,7 +680,9 @@ EXPORT_SYMBOL(unregister_shrinker);
#define SHRINK_BATCH 128
static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
- struct shrinker *shrinker, int priority)
+ struct shrinker *shrinker,
+ unsigned long nr_scanned,
+ unsigned long nr_eligible)
{
unsigned long freed = 0;
unsigned long long delta;
@@ -670,8 +706,10 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
nr = xchg_nr_deferred(shrinker, shrinkctl);
if (shrinker->seeks) {
- delta = freeable >> priority;
- delta *= 4;
+ //delta = freeable >> priority;
+ //delta *= 4;
+ delta = (4 * nr_scanned) / shrinker->seeks;
+ delta *= freeable;
do_div(delta, shrinker->seeks);
} else {
/*
@@ -682,12 +720,12 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
delta = freeable / 2;
}
- total_scan = nr >> priority;
+ total_scan = nr;
total_scan += delta;
total_scan = min(total_scan, (2 * freeable));
trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
- freeable, delta, total_scan, priority);
+ nr_scanned, nr_eligible, freeable, delta, total_scan);
/*
* Normally, we should not scan less than batch_size objects in one
@@ -744,7 +782,7 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
#ifdef CONFIG_MEMCG
static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
- struct mem_cgroup *memcg, int priority)
+ struct mem_cgroup *memcg, unsigned long nr_scanned, unsigned long nr_eligible)
{
struct shrinker_info *info;
unsigned long ret, freed = 0;
@@ -780,7 +818,7 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
!(shrinker->flags & SHRINKER_NONSLAB))
continue;
- ret = do_shrink_slab(&sc, shrinker, priority);
+ ret = do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible);
if (ret == SHRINK_EMPTY) {
clear_bit(i, info->map);
/*
@@ -799,7 +837,7 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
* set_bit() do_shrink_slab()
*/
smp_mb__after_atomic();
- ret = do_shrink_slab(&sc, shrinker, priority);
+ ret = do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible);
if (ret == SHRINK_EMPTY)
ret = 0;
else
@@ -846,7 +884,8 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
*/
static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
struct mem_cgroup *memcg,
- int priority)
+ unsigned long nr_scanned,
+ unsigned long nr_eligible)
{
unsigned long ret, freed = 0;
struct shrinker *shrinker;
@@ -859,7 +898,8 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
* oom.
*/
if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
- return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
+ return 0;
+ // return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
if (!down_read_trylock(&shrinker_rwsem))
goto out;
@@ -871,7 +911,14 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
.memcg = memcg,
};
- ret = do_shrink_slab(&sc, shrinker, priority);
+ if (memcg_kmem_enabled() &&
+ !!memcg != !!(shrinker->flags & SHRINKER_MEMCG_AWARE))
+ continue;
+
+ if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
+ sc.nid = 0;
+
+ ret = do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible);
if (ret == SHRINK_EMPTY)
ret = 0;
freed += ret;
@@ -905,7 +952,7 @@ void drop_slab_node(int nid)
freed = 0;
memcg = mem_cgroup_iter(NULL, NULL, NULL);
do {
- freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
+ freed += shrink_slab(GFP_KERNEL, nid, memcg, 1000, 1000);
} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
} while (freed > 10);
}
@@ -2369,7 +2416,7 @@ unsigned long reclaim_pages(struct list_head *page_list)
EXPORT_SYMBOL_GPL(reclaim_pages);
static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
- struct lruvec *lruvec, struct scan_control *sc)
+ struct lruvec *lruvec, struct mem_cgroup *memcg, struct scan_control *sc)
{
if (is_active_lru(lru)) {
if (sc->may_deactivate & (1 << is_file_lru(lru)))
@@ -2683,7 +2730,7 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
nr[lru] -= nr_to_scan;
nr_reclaimed += shrink_list(lru, nr_to_scan,
- lruvec, sc);
+ lruvec, NULL, sc);
}
}
@@ -2836,7 +2883,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
unsigned long reclaimed;
unsigned long scanned;
-
+ unsigned long lru_pages = 0;
/*
* This loop can become CPU-bound when target memcgs
* aren't eligible for reclaim - either because they
@@ -2873,7 +2920,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
shrink_lruvec(lruvec, sc);
shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
- sc->priority);
+ sc->nr_scanned - scanned,
+ lru_pages);
/* Record the group's reclaim efficiency */
vmpressure(sc->gfp_mask, memcg, false,
@@ -3202,6 +3250,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
static void snapshot_refaults(struct mem_cgroup *target_memcg, pg_data_t *pgdat)
{
+ struct mem_cgroup *memcg;
struct lruvec *target_lruvec;
unsigned long refaults;
@@ -3273,8 +3322,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
if (cgroup_reclaim(sc)) {
struct lruvec *lruvec;
- lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup,
- zone->zone_pgdat);
+ lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup, zone->zone_pgdat);
clear_bit(LRUVEC_CONGESTED, &lruvec->flags);
}
}
@@ -3745,6 +3793,8 @@ static bool kswapd_shrink_node(pg_data_t *pgdat,
return sc->nr_scanned >= sc->nr_to_reclaim;
}
+static void __shrink_page_cache(gfp_t mask);
+
/*
* For kswapd, balance_pgdat() will reclaim pages across a node from zones
* that are eligible for use by the caller until at least one zone is
@@ -4208,6 +4258,27 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
wake_up_interruptible(&pgdat->kswapd_wait);
}
+/*
+ * The reclaimable count would be mostly accurate.
+ * The less reclaimable pages may be
+ * - mlocked pages, which will be moved to unevictable list when encountered
+ * - mapped pages, which may require several travels to be reclaimed
+ * - dirty pages, which is not "instantly" reclaimable
+ */
+
+static unsigned long global_reclaimable_pages(void)
+{
+ int nr;
+
+ nr = global_node_page_state(NR_ACTIVE_FILE) +
+ global_node_page_state(NR_INACTIVE_FILE);
+
+ if (get_nr_swap_pages() > 0)
+ nr += global_node_page_state(NR_ACTIVE_ANON) +
+ global_node_page_state(NR_INACTIVE_ANON);
+ return nr;
+}
+
#ifdef CONFIG_HIBERNATION
/*
* Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
@@ -4246,6 +4317,498 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
return nr_reclaimed;
}
#endif /* CONFIG_HIBERNATION */
+/*
+ * Returns non-zero if the lock has been acquired, zero if somebody
+ * else is holding the lock.
+ */
+static int pagecache_reclaim_lock_zone(struct zone *zone)
+{
+ return atomic_add_unless(&zone->pagecache_reclaim, 1, 1);
+}
+
+static void pagecache_reclaim_unlock_zone(struct zone *zone)
+{
+ BUG_ON(atomic_dec_return(&zone->pagecache_reclaim));
+}
+
+/*
+ * Potential page cache reclaimers who are not able to take
+ * reclaim lock on any zone are sleeping on this waitqueue.
+ * So this is basically a congestion wait queue for them.
+ */
+DECLARE_WAIT_QUEUE_HEAD(pagecache_reclaim_wq);
+DECLARE_WAIT_QUEUE_HEAD(kpagecache_limitd_wq);
+
+/*
+ * Similar to shrink_zone but it has a different consumer - pagecache limit
+ * so we cannot reuse the original function - and we do not want to clobber
+ * that code path so we have to live with this code duplication.
+ *
+ * In short this simply scans through the given lru for all cgroups for the
+ * given zone.
+ *
+ * returns true if we managed to cumulatively reclaim (via nr_reclaimed)
+ * the given nr_to_reclaim pages, false otherwise. The caller knows that
+ * it doesn't have to touch other zones if the target was hit already.
+ *
+ * DO NOT USE OUTSIDE of shrink_all_zones unless you have a really really
+ * really good reason.
+ */
+
+static bool shrink_zone_per_memcg(struct zone *zone, enum lru_list lru,
+ unsigned long nr_to_scan, unsigned long nr_to_reclaim,
+ unsigned long *nr_reclaimed, struct scan_control *sc)
+{
+ struct mem_cgroup *root = sc->target_mem_cgroup;
+ struct mem_cgroup *memcg;
+ struct mem_cgroup_reclaim_cookie reclaim = {
+ .pgdat = zone->zone_pgdat,
+ .priority = sc->priority,
+ };
+
+ memcg = mem_cgroup_iter(root, NULL, &reclaim);
+ do {
+ struct lruvec *lruvec;
+
+ lruvec = mem_cgroup_lruvec(memcg, zone->zone_pgdat);
+ *nr_reclaimed += shrink_list(lru, nr_to_scan, lruvec, memcg, sc);
+ if (*nr_reclaimed >= nr_to_reclaim) {
+ mem_cgroup_iter_break(root, memcg);
+ return true;
+ }
+ memcg = mem_cgroup_iter(root, memcg, &reclaim);
+ } while (memcg);
+
+ return false;
+}
+/*
+ * Tries to reclaim 'nr_pages' pages from LRU lists system-wide, for given
+ * pass.
+ *
+ * For pass > 3 we also try to shrink the LRU lists that contain a few pages
+ *
+ * Returns the number of scanned zones.
+ */
+static int shrink_all_zones(unsigned long nr_pages, int pass,
+ struct scan_control *sc)
+{
+ struct zone *zone;
+ unsigned long nr_reclaimed = 0;
+ unsigned int nr_locked_zones = 0;
+ DEFINE_WAIT(wait);
+
+ prepare_to_wait(&pagecache_reclaim_wq, &wait, TASK_INTERRUPTIBLE);
+
+ for_each_populated_zone(zone) {
+ enum lru_list lru;
+
+ /*
+ * Back off if somebody is already reclaiming this zone
+ * for the pagecache reclaim.
+ */
+ if (!pagecache_reclaim_lock_zone(zone))
+ continue;
+
+
+ /*
+ * This reclaimer might scan a zone so it will never
+ * sleep on pagecache_reclaim_wq
+ */
+ finish_wait(&pagecache_reclaim_wq, &wait);
+ nr_locked_zones++;
+
+ for_each_evictable_lru(lru) {
+ enum zone_stat_item ls = NR_ZONE_LRU_BASE + lru;
+ unsigned long lru_pages = zone_page_state(zone, ls);
+
+ /* For pass = 0, we don't shrink the active list */
+ if (pass == 0 && (lru == LRU_ACTIVE_ANON ||
+ lru == LRU_ACTIVE_FILE))
+ continue;
+
+ /* Original code relied on nr_saved_scan which is no
+ * longer present so we are just considering LRU pages.
+ * This means that the zone has to have quite large
+ * LRU list for default priority and minimum nr_pages
+ * size (8*SWAP_CLUSTER_MAX). In the end we will tend
+ * to reclaim more from large zones wrt. small.
+ * This should be OK because shrink_page_cache is called
+ * when we are getting into a short-memory condition, so
+ * LRUs tend to be large.
+ */
+ if (((lru_pages >> sc->priority) + 1) >= nr_pages || pass >= 3) {
+ unsigned long nr_to_scan;
+
+ nr_to_scan = min(nr_pages, lru_pages);
+
+ /*
+ * A bit of a hack but the code has always been
+ * updating sc->nr_reclaimed once per shrink_all_zones
+ * rather than accumulating it for all calls to shrink
+ * lru. This costs us an additional argument to
+ * shrink_zone_per_memcg but well...
+ *
+ * Let's stick with this for bug-to-bug compatibility
+ */
+ while (nr_to_scan > 0) {
+ /* shrink_list takes lru_lock with IRQ off so we
+ * should be careful about really huge nr_to_scan
+ */
+ unsigned long batch = min_t(unsigned long, nr_to_scan, SWAP_CLUSTER_MAX);
+
+ if (shrink_zone_per_memcg(zone, lru,
+ batch, nr_pages, &nr_reclaimed, sc)) {
+ pagecache_reclaim_unlock_zone(zone);
+ goto out_wakeup;
+ }
+ nr_to_scan -= batch;
+ }
+ }
+ }
+ pagecache_reclaim_unlock_zone(zone);
+ }
+ /*
+ * We have to go to sleep because all the zones are already reclaimed.
+ * One of the reclaimer will wake us up or __shrink_page_cache will
+ * do it if there is nothing to be done.
+ */
+ if (!nr_locked_zones) {
+ if (!kpclimitd_context)
+ schedule();
+ finish_wait(&pagecache_reclaim_wq, &wait);
+ goto out;
+ }
+
+out_wakeup:
+ wake_up_interruptible(&pagecache_reclaim_wq);
+ sc->nr_reclaimed += nr_reclaimed;
+out:
+ return nr_locked_zones;
+}
+
+/*
+ * Function to shrink the page cache
+ *
+ * This function calculates the number of pages (nr_pages) the page
+ * cache is over its limit and shrinks the page cache accordingly.
+ *
+ * The maximum number of pages, the page cache shrinks in one call of
+ * this function is limited to SWAP_CLUSTER_MAX pages. Therefore it may
+ * require a number of calls to actually reach the vm_pagecache_limit_kb.
+ *
+ * This function is similar to shrink_all_memory, except that it may never
+ * swap out mapped pages and only does four passes.
+ */
+static void __shrink_page_cache(gfp_t mask)
+{
+ unsigned long ret = 0;
+ int pass = 0;
+ struct reclaim_state reclaim_state;
+ struct scan_control sc = {
+ .gfp_mask = mask,
+ .may_swap = 0,
+ .may_unmap = 0,
+ .may_writepage = 0,
+ .target_mem_cgroup = NULL,
+ .reclaim_idx = MAX_NR_ZONES,
+ };
+ struct reclaim_state *old_rs = current->reclaim_state;
+ long nr_pages;
+
+ /* We might sleep during direct reclaim, so calling this from
+ * atomic context is certainly a bug.
+ */
+ BUG_ON(!(mask & __GFP_RECLAIM));
+
+retry:
+ /* How many pages are we over the limit? */
+ nr_pages = pagecache_over_limit();
+
+ /*
+ * Return early if there's no work to do.
+ * Wake up reclaimers that couldn't scan any zone due to congestion.
+ * There is apparently nothing to do so they do not have to sleep.
+ * This makes sure that no sleeping reclaimer will stay behind.
+ * Allow breaching the limit if the task is on the way out.
+ */
+ if (nr_pages <= 0 || fatal_signal_pending(current)) {
+ wake_up_interruptible(&pagecache_reclaim_wq);
+ goto out;
+ }
+
+ /* But do a few at least */
+ nr_pages = max_t(unsigned long, nr_pages, 8*SWAP_CLUSTER_MAX);
+
+ current->reclaim_state = &reclaim_state;
+
+ /*
+ * Shrink the LRU in 4 passes:
+ * 0 = Reclaim from inactive_list only (fast)
+ * 1 = Reclaim from active list but don't reclaim mapped or dirtied pages (not that fast)
+ * 2 = Reclaim from active list but don't reclaim mapped pages (2nd pass);
+ * it may reclaim dirtied pages if vm_pagecache_ignore_dirty = 0
+ * 3 = same as pass 2, but it also scans small LRU lists (see shrink_all_zones)
+ */
+ for (; pass <= 3; pass++) {
+ for (sc.priority = DEF_PRIORITY; sc.priority >= 0; sc.priority--) {
+ unsigned long nr_to_scan = nr_pages - ret;
+ struct mem_cgroup *memcg = NULL;
+ int nid;
+
+ sc.nr_scanned = 0;
+
+ /*
+ * No zone reclaimed because of too many reclaimers. Retry whether
+ * there is still something to do
+ */
+ if (!shrink_all_zones(nr_to_scan, pass, &sc))
+ goto retry;
+
+ ret += sc.nr_reclaimed;
+ if (ret >= nr_pages)
+ goto out;
+
+ reclaim_state.reclaimed_slab = 0;
+ for_each_online_node(nid) {
+ do {
+ shrink_slab(mask, nid, memcg, sc.nr_scanned,
+ global_reclaimable_pages());
+ } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
+ }
+ ret += reclaim_state.reclaimed_slab;
+
+ if (ret >= nr_pages)
+ goto out;
+
+ }
+ if (pass == 1) {
+ if (vm_pagecache_ignore_dirty == 1 ||
+ (mask & (__GFP_IO | __GFP_FS)) != (__GFP_IO | __GFP_FS) )
+ break;
+ else
+ sc.may_writepage = 1;
+ }
+ }
+
+out:
+ current->reclaim_state = old_rs;
+}
+
+#ifdef CONFIG_SHRINK_PAGECACHE
+static unsigned long __shrink_page_cache(gfp_t mask)
+{
+ struct scan_control sc = {
+ .gfp_mask = current_gfp_context(mask),
+ .reclaim_idx = gfp_zone(mask),
+ .may_writepage = !laptop_mode,
+ .nr_to_reclaim = SWAP_CLUSTER_MAX *
+ (unsigned long)vm_cache_reclaim_weight,
+ .may_unmap = 1,
+ .may_swap = 1,
+ .order = 0,
+ .priority = DEF_PRIORITY,
+ .target_mem_cgroup = NULL,
+ .nodemask = NULL,
+ };
+
+ struct zonelist *zonelist = node_zonelist(numa_node_id(), mask);
+
+ return do_try_to_free_pages(zonelist, &sc);
+}
+
+
+static void shrink_page_cache_work(struct work_struct *w);
+static void shrink_shepherd(struct work_struct *w);
+static DECLARE_DEFERRABLE_WORK(shepherd, shrink_shepherd);
+
+static void shrink_shepherd(struct work_struct *w)
+{
+ int cpu;
+
+ get_online_cpus();
+
+ for_each_online_cpu(cpu) {
+ struct delayed_work *work = &per_cpu(vmscan_work, cpu);
+
+ if (!delayed_work_pending(work) && vm_cache_reclaim_enable)
+ queue_delayed_work_on(cpu, system_wq, work, 0);
+ }
+
+ put_online_cpus();
+
+ /* re-arm the shepherd as long as periodic reclaim stays enabled */
+ if (vm_cache_reclaim_enable) {
+ if (vm_cache_reclaim_s == 0)
+ schedule_delayed_work(&shepherd,
+ round_jiffies_relative(120 * HZ));
+ else
+ schedule_delayed_work(&shepherd,
+ round_jiffies_relative((unsigned long)
+ vm_cache_reclaim_s * HZ));
+ }
+}
+static void shrink_shepherd_timer(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct delayed_work *work = &per_cpu(vmscan_work, cpu);
+
+ INIT_DEFERRABLE_WORK(work, shrink_page_cache_work);
+ }
+
+ schedule_delayed_work(&shepherd,
+ round_jiffies_relative((unsigned long)vm_cache_reclaim_s * HZ));
+}
+
+unsigned long shrink_page_cache(gfp_t mask)
+{
+ unsigned long nr_pages;
+
+ /* We reclaim the highmem zone too, it is useful for 32bit arch */
+ nr_pages = __shrink_page_cache(mask | __GFP_HIGHMEM);
+
+ return nr_pages;
+}
+static void shrink_page_cache_work(struct work_struct *w)
+{
+ struct delayed_work *work = to_delayed_work(w);
+ unsigned long nr_pages;
+
+ /*
+ * if vm_cache_reclaim_s is zero or vm_cache_reclaim_enable is off,
+ * we do not shrink the page cache again.
+ */
+ if (vm_cache_reclaim_s == 0 || !vm_cache_reclaim_enable)
+ return;
+
+ /* It should wait more time if we hardly reclaim the page cache */
+ nr_pages = shrink_page_cache(GFP_KERNEL);
+ if ((nr_pages < SWAP_CLUSTER_MAX) && vm_cache_reclaim_enable)
+ queue_delayed_work_on(smp_processor_id(), system_wq, work,
+ round_jiffies_relative(120 * HZ));
+}
+
+static void shrink_page_cache_init(void)
+{
+ vm_cache_limit_ratio = 0;
+ vm_cache_limit_ratio_min = 0;
+ vm_cache_limit_ratio_max = 100;
+ vm_cache_limit_mbytes = 0;
+ vm_cache_limit_mbytes_min = 0;
+ vm_cache_limit_mbytes_max = totalram_pages >> (20 - PAGE_SHIFT);
+ vm_cache_reclaim_s = 0;
+ vm_cache_reclaim_s_min = 0;
+ vm_cache_reclaim_s_max = 43200;
+ vm_cache_reclaim_weight = 1;
+ vm_cache_reclaim_weight_min = 1;
+ vm_cache_reclaim_weight_max = 100;
+ vm_cache_reclaim_enable = 1;
+
+ shrink_shepherd_timer();
+}
+
+static int kswapd_cpu_down_prep(unsigned int cpu)
+{
+ cancel_delayed_work_sync(&per_cpu(vmscan_work, cpu));
+
+ return 0;
+}
+int cache_reclaim_enable_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *length, loff_t *ppos)
+{
+ int ret;
+
+ ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
+ if (ret)
+ return ret;
+
+ if (write)
+ schedule_delayed_work(&shepherd, round_jiffies_relative((unsigned long)vm_cache_reclaim_s * HZ));
+
+ return 0;
+}
+
+int cache_reclaim_sysctl_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *length, loff_t *ppos)
+{
+ int ret;
+
+ ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
+ if (ret)
+ return ret;
+
+ if (write)
+ mod_delayed_work(system_wq, &shepherd,
+ round_jiffies_relative(
+ (unsigned long)vm_cache_reclaim_s * HZ));
+
+ return ret;
+}
+#endif
+
+static int kpagecache_limitd(void *data)
+{
+ DEFINE_WAIT(wait);
+ kpclimitd_context = true;
+
+ /*
+ * make sure all waiting threads are woken up when switching to async mode
+ */
+ if (waitqueue_active(&pagecache_reclaim_wq))
+ wake_up_interruptible(&pagecache_reclaim_wq);
+
+ for ( ; ; ) {
+ __shrink_page_cache(GFP_KERNEL);
+ prepare_to_wait(&kpagecache_limitd_wq, &wait, TASK_INTERRUPTIBLE);
+
+ if (!kthread_should_stop())
+ schedule();
+ else {
+ finish_wait(&kpagecache_limitd_wq, &wait);
+ break;
+ }
+ finish_wait(&kpagecache_limitd_wq, &wait);
+ }
+ kpclimitd_context = false;
+ return 0;
+}
+
+static void wakeup_kpclimitd(gfp_t mask)
+{
+ if (!waitqueue_active(&kpagecache_limitd_wq))
+ return;
+ wake_up_interruptible(&kpagecache_limitd_wq);
+}
+
+void shrink_page_cache(gfp_t mask, struct page *page)
+{
+ if (0 == vm_pagecache_limit_async)
+ __shrink_page_cache(mask);
+ else
+ wakeup_kpclimitd(mask);
+}
+
+/* It's optimal to keep kswapds on the same CPUs as their memory, but
+ not required for correctness. So if the last cpu in a node goes
+ away, we get changed to run anywhere: as the first one comes back,
+ restore their cpu bindings. */
+static int kswapd_cpu_online(unsigned int cpu)
+{
+ int nid;
+
+ for_each_node_state(nid, N_MEMORY) {
+ pg_data_t *pgdat = NODE_DATA(nid);
+ const struct cpumask *mask;
+
+ mask = cpumask_of_node(pgdat->node_id);
+
+ if (cpumask_any_and(cpu_online_mask, mask) < nr_cpu_ids)
+ /* One of our CPUs online: restore mask */
+ set_cpus_allowed_ptr(pgdat->kswapd, mask);
+ }
+ return 0;
+}
/*
* This kswapd start function will be called by init and node-hot-add.
@@ -4286,16 +4849,61 @@ void kswapd_stop(int nid)
static int __init kswapd_init(void)
{
- int nid;
+ /*int nid;
swap_setup();
for_each_node_state(nid, N_MEMORY)
kswapd_run(nid);
- return 0;
+ return 0;*/
+ int nid, ret;
+
+ swap_setup();
+ for_each_node_state(nid, N_MEMORY)
+ kswapd_run(nid);
+#ifdef CONFIG_SHRINK_PAGECACHE
+ ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+ "mm/vmscan:online", kswapd_cpu_online,
+ kswapd_cpu_down_prep);
+#else
+ ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+ "mm/vmscan:online", kswapd_cpu_online,
+ NULL);
+#endif
+ WARN_ON(ret < 0);
+#ifdef CONFIG_SHRINK_PAGECACHE
+ shrink_page_cache_init();
+#endif
+ return 0;
+
}
module_init(kswapd_init)
+int kpagecache_limitd_run(void)
+{
+ int ret = 0;
+
+ if (kpclimitd)
+ return 0;
+
+ kpclimitd = kthread_run(kpagecache_limitd, NULL, "kpclimitd");
+ if (IS_ERR(kpclimitd)) {
+ pr_err("Failed to start kpagecache_limitd thread\n");
+ ret = PTR_ERR(kpclimitd);
+ kpclimitd = NULL;
+ }
+ return ret;
+
+}
+
+void kpagecache_limitd_stop(void)
+{
+ if (kpclimitd) {
+ kthread_stop(kpclimitd);
+ kpclimitd = NULL;
+ }
+}
+
#ifdef CONFIG_NUMA
/*
* Node reclaim mode
diff --git a/mm/workingset.c b/mm/workingset.c
index bba4380405b4..9a5ad145b9bd 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -253,6 +253,7 @@ void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages)
void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg)
{
struct pglist_data *pgdat = page_pgdat(page);
+ struct mem_cgroup *memcg = page_memcg(page);
unsigned long eviction;
struct lruvec *lruvec;
int memcgid;
--
2.30.0
[PATCH kernel-4.19] drivers/txgbe: fix buffer not null terminated by strncpy in txgbe_ethtool.c
by shenzijun 26 Oct '21
From: 沈子俊 <shenzijun(a)kylinos.cn>
kylin inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4AG3E?from=project-issue
CVE: NA
---------------------------------------------------
Change the copy size passed to strncpy() so that drvinfo->fw_version is
always NUL-terminated.
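For background, a minimal userspace sketch of the pitfall (the buffer name
and sizes are made up for illustration): strncpy() writes no terminating NUL
when the source is at least as long as the given size, so copying at most
sizeof(dst) - 1 bytes into a zeroed buffer keeps the string terminated, which
is the pattern the driver uses here.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char fw_version[8];                      /* fixed-size field */
        const char *eeprom_id = "1.23.4-abcdef"; /* longer than the buffer */

        /* Buggy pattern: strlen(eeprom_id) >= sizeof(fw_version), so
         * strncpy() fills all 8 bytes and writes no terminating NUL. */
        strncpy(fw_version, eeprom_id, sizeof(fw_version));

        /* Fixed pattern: zero the buffer first and copy one byte less,
         * so the last byte always stays '\0'. */
        memset(fw_version, 0, sizeof(fw_version));
        strncpy(fw_version, eeprom_id, sizeof(fw_version) - 1);
        printf("%s\n", fw_version);              /* prints "1.23.4-" */

        return 0;
    }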
Signed-off-by: 沈子俊 <shenzijun(a)kylinos.cn>
---
drivers/net/ethernet/netswift/txgbe/txgbe_ethtool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/netswift/txgbe/txgbe_ethtool.c b/drivers/net/ethernet/netswift/txgbe/txgbe_ethtool.c
index 5cb8ef61e04b..9af9f19fb491 100644
--- a/drivers/net/ethernet/netswift/txgbe/txgbe_ethtool.c
+++ b/drivers/net/ethernet/netswift/txgbe/txgbe_ethtool.c
@@ -1040,7 +1040,7 @@ static void txgbe_get_drvinfo(struct net_device *netdev,
strncpy(drvinfo->version, txgbe_driver_version,
sizeof(drvinfo->version) - 1);
strncpy(drvinfo->fw_version, adapter->eeprom_id,
- sizeof(drvinfo->fw_version));
+ sizeof(drvinfo->fw_version) - 1);
strncpy(drvinfo->bus_info, pci_name(adapter->pdev),
sizeof(drvinfo->bus_info) - 1);
if (adapter->num_tx_queues <= TXGBE_NUM_RX_QUEUES) {
--
2.30.0