- Kernel - mailweb.openeuler.org

[PATCH OLK-6.6] netconsole: avoid OOB reads, msg is not nul-terminated
by Jinjiang Tu 10 Jun '26

From: Jakub Kicinski <kuba(a)kernel.org>

mainline inclusion
from mainline-v7.0-rc2
commit 82aec772fca2223bc5774bd9af486fd95766e578
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14743
CVE: CVE-2026-43197

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

-------------------------------------------

msg passed to netconsole from the console subsystem is not guaranteed
to be nul-terminated. Before recent
commit 7eab73b18630 ("netconsole: convert to NBCON console infrastructure")
the message would be placed in printk_shared_pbufs, a static global
buffer, so KASAN had harder time catching OOB accesses. Now we see:

  printk: console [netcon_ext0] enabled
  BUG: KASAN: slab-out-of-bounds in string+0x1f7/0x240
  Read of size 1 at addr ffff88813b6d4c00 by task pr/netcon_ext0/594

  CPU: 65 UID: 0 PID: 594 Comm: pr/netcon_ext0 Not tainted 6.19.0-11754-g4246fd6547c9
  Call Trace:
   kasan_report+0xe4/0x120
   string+0x1f7/0x240
   vsnprintf+0x655/0xba0
   scnprintf+0xba/0x120
   netconsole_write+0x3fe/0xa10
   nbcon_emit_next_record+0x46e/0x860
   nbcon_kthread_func+0x623/0x750

  Allocated by task 1:
   nbcon_alloc+0x1ea/0x450
   register_console+0x26b/0xe10
   init_netconsole+0xbb0/0xda0

  The buggy address belongs to the object at ffff88813b6d4000
   which belongs to the cache kmalloc-4k of size 4096
  The buggy address is located 0 bytes to the right of
   allocated 3072-byte region [ffff88813b6d4000, ffff88813b6d4c00)

Fixes: c62c0a17f9b7 ("netconsole: Append kernel version to message")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://patch.msgid.link/20260219195021.2099699-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>

Conflicts:
	drivers/net/netconsole.c
[Context conflicts due to commit e7650d8d475c ("net: netconsole: split
send_ext_msg_udp() function") isn't merged.]
Signed-off-by: Jinjiang Tu <tujinjiang(a)huawei.com>
---
 drivers/net/netconsole.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index fffffa3658d2..69d54f58aa97 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -854,7 +854,8 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
 	if (msg_len + release_len <= MAX_PRINT_CHUNK) {
 		/* No fragmentation needed */
 		if (nt->release) {
-			scnprintf(buf, MAX_PRINT_CHUNK, "%s,%s", release, msg);
+			scnprintf(buf, MAX_PRINT_CHUNK, "%s,%.*s", release,
+				  msg_len, msg);
 			msg_len += release_len;
 			msg_ready = buf;
 		}
--
2.43.0
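
The effect of switching from "%s" to "%.*s" can be reproduced in userspace. The following is only a minimal sketch, assuming plain snprintf() in place of the kernel's scnprintf(); format_with_release() is a hypothetical helper invented for illustration and does not exist in netconsole.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace sketch: prefix a message that is NOT nul-terminated.
 * format_with_release() is a hypothetical helper, not a kernel function.
 * Returns the number of bytes stored in out, excluding the trailing NUL.
 */
static int format_with_release(char *out, size_t out_size,
                               const char *release,
                               const char *msg, int msg_len)
{
    int n;

    assert(out != NULL && out_size > 0 && msg_len >= 0);

    /* "%.*s" reads at most msg_len bytes of msg, so no terminator is needed. */
    n = snprintf(out, out_size, "%s,%.*s", release, msg_len, msg);
    if (n < 0)
        return 0;

    /* snprintf() reports the untruncated length; clamp it the way scnprintf() does. */
    return (size_t)n >= out_size ? (int)(out_size - 1) : n;
}
```

With a five-byte buffer holding "hello" and no terminator, the helper copies exactly five bytes. A plain "%s" would keep reading past the end of the buffer, which is the out-of-bounds read that KASAN flagged above.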

[PATCH OLK-6.6] bpf: fix UAF by restoring RCU-delayed inode freeing in bpffs
by Luo Gengkun 10 Jun '26

From: Deepanshu Kartikey <kartikey406(a)gmail.com>

maillist inclusion
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15537

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

commit 4f375ade6aa9 ("bpf: Avoid RCU context warning when unpinning
htab with internal structs") moved inode cleanup from ->free_inode()
into ->destroy_inode() to avoid sleeping in RCU context when calling
bpf_any_put().

However this removed the RCU delay on freeing the inode itself and the
cached symlink body (i_link), both of which can be accessed by RCU
pathwalk (pick_link, may_lookup etc.). This causes a use-after-free
when a concurrent unlinkat() drops the last inode reference and
destroy_inode() frees the inode immediately, while another task is
still walking the path in RCU mode and reads inode->i_opflags
(offset +2) inside current_time() -> is_mgtime().

KASAN reports:

  BUG: KASAN: slab-use-after-free in is_mgtime include/linux/fs.h:2313
  Read of size 2 at addr ffff8880407e4282 (offset +2 = i_opflags)

The rules (per Al Viro):

  ->destroy_inode() called immediately, can sleep, use for blocking
                    cleanup e.g. bpf_any_put()
  ->free_inode()    called after RCU grace period, use for freeing
                    inode and anything RCU-accessible e.g. i_link

Fix: split the two concerns properly:
  - keep bpf_any_put() in bpf_destroy_inode() since it is blocking
    and needs to run promptly
  - introduce bpf_free_inode() to handle kfree(i_link) and
    free_inode_nonrcu() with proper RCU delay, preventing the UAF

Fixes: 4f375ade6aa9 ("bpf: Avoid RCU context warning when unpinning htab with internal structs")
Reported-by: syzbot+36e50496c8ac4bcde3f9(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=36e50496c8ac4bcde3f9
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Link: https://lore.kernel.org/all/20260423043906.GN3518998@ZenIV/
Link: https://lore.kernel.org/all/20260602002607.110866-1-kartikey406@gmail.com/T/ [v1]
Signed-off-by: Deepanshu Kartikey <kartikey406(a)gmail.com>
Acked-by: Al Viro <viro(a)zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20260602025249.113828-1-kartikey406@gmail.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Luo Gengkun <luogengkun2(a)huawei.com>
---
 kernel/bpf/inode.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 9a9630adcba4..99ec74325175 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -611,10 +611,18 @@ static void bpf_destroy_inode(struct inode *inode)
 {
 	enum bpf_type type;

-	if (S_ISLNK(inode->i_mode))
-		kfree(inode->i_link);
 	if (!bpf_inode_type(inode, &type))
 		bpf_any_put(inode->i_private, type);
+}
+
+/*
+ * Called after RCU grace period - safe to free inode and anything
+ * that might be accessed by RCU pathwalk (inode fields, i_link).
+ */
+static void bpf_free_inode(struct inode *inode)
+{
+	if (S_ISLNK(inode->i_mode))
+		kfree(inode->i_link);
 	free_inode_nonrcu(inode);
 }

@@ -623,6 +631,7 @@ static const struct super_operations bpf_super_ops = {
 	.drop_inode	= generic_delete_inode,
 	.show_options	= bpf_show_options,
 	.destroy_inode	= bpf_destroy_inode,
+	.free_inode	= bpf_free_inode,
 };

 enum {
--
2.34.1
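
The split between the two callbacks can be modelled in userspace. This is a rough sketch under stated assumptions, not the VFS implementation: every toy_* name is hypothetical, a simple linked list stands in for call_rcu(), and toy_grace_period_elapsed() stands in for the end of an RCU grace period.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for struct inode; all names here are hypothetical. */
struct toy_inode {
    char *link;                      /* plays the role of i_link */
    int *payload_refs;               /* plays the role of the object behind i_private */
    struct toy_inode *next_deferred; /* queue of inodes awaiting the grace period */
};

static struct toy_inode *deferred_list;

/* Phase 1: runs immediately and may block; only drops the payload reference. */
static void toy_destroy_inode(struct toy_inode *inode)
{
    assert(inode != NULL);
    if (inode->payload_refs)
        (*inode->payload_refs)--;
    inode->payload_refs = NULL;

    /* Memory that lockless readers may still touch is queued, not freed. */
    inode->next_deferred = deferred_list;
    deferred_list = inode;
}

/* Phase 2: runs only after the grace period; frees reader-visible memory. */
static void toy_free_inode(struct toy_inode *inode)
{
    free(inode->link);
    free(inode);
}

/* Simulates the end of a grace period; returns how many inodes were freed. */
static int toy_grace_period_elapsed(void)
{
    int freed = 0;

    while (deferred_list) {
        struct toy_inode *inode = deferred_list;

        deferred_list = inode->next_deferred;
        toy_free_inode(inode);
        freed++;
    }
    return freed;
}
```

In this model a reader that still holds a pointer to the inode between the two phases can safely read link, which is the property the patch restores for RCU pathwalk.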

[PATCH OLK-5.10] netfilter: x_tables: ensure names are nul-terminated
by superdcc97＠163.com 10 Jun '26

From: Florian Westphal <fw(a)strlen.de>

stable inclusion
from stable-v5.10.253
commit bcac50ea0a29d430eedc5ac87b215393b567baa9
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14560
CVE: CVE-2026-43028

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit a958a4f90ddd7de0800b33ca9d7b886b7d40f74e ]

Reject names that lack a \0 character before feeding them
to functions that expect c-strings.

Fixes tag is the most recent commit that needs this change.

Fixes: c38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match")
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Dong Chenchen <dongchenchen2(a)huawei.com>
---
 net/netfilter/xt_cgroup.c  | 6 ++++++
 net/netfilter/xt_rateest.c | 5 +++++
 2 files changed, 11 insertions(+)

diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
index c0f5e9a4f3c6..bfc98719684e 100644
--- a/net/netfilter/xt_cgroup.c
+++ b/net/netfilter/xt_cgroup.c
@@ -53,6 +53,9 @@ static int cgroup_mt_check_v1(const struct xt_mtchk_param *par)

 	info->priv = NULL;
 	if (info->has_path) {
+		if (strnlen(info->path, sizeof(info->path)) >= sizeof(info->path))
+			return -ENAMETOOLONG;
+
 		cgrp = cgroup_get_from_path(info->path);
 		if (IS_ERR(cgrp)) {
 			pr_info_ratelimited("invalid path, errno=%ld\n",
@@ -85,6 +88,9 @@ static int cgroup_mt_check_v2(const struct xt_mtchk_param *par)

 	info->priv = NULL;
 	if (info->has_path) {
+		if (strnlen(info->path, sizeof(info->path)) >= sizeof(info->path))
+			return -ENAMETOOLONG;
+
 		cgrp = cgroup_get_from_path(info->path);
 		if (IS_ERR(cgrp)) {
 			pr_info_ratelimited("invalid path, errno=%ld\n",
diff --git a/net/netfilter/xt_rateest.c b/net/netfilter/xt_rateest.c
index 72324bd976af..b1d736c15fcb 100644
--- a/net/netfilter/xt_rateest.c
+++ b/net/netfilter/xt_rateest.c
@@ -91,6 +91,11 @@ static int xt_rateest_mt_checkentry(const struct xt_mtchk_param *par)
 		goto err1;
 	}

+	if (strnlen(info->name1, sizeof(info->name1)) >= sizeof(info->name1))
+		return -ENAMETOOLONG;
+	if (strnlen(info->name2, sizeof(info->name2)) >= sizeof(info->name2))
+		return -ENAMETOOLONG;
+
 	ret = -ENOENT;
 	est1 = xt_rateest_lookup(par->net, info->name1);
 	if (!est1)
--
2.43.0
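
The strnlen() test used in both files is a general idiom for fixed-size fields filled in by an untrusted caller. A small userspace sketch follows; struct toy_info, TOY_NAME_LEN and toy_check_name() are hypothetical names chosen for illustration and are not part of x_tables.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for strnlen() */
#endif
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_NAME_LEN 8 /* hypothetical field size */

/* Toy stand-in for a match-info struct copied in from userspace. */
struct toy_info {
    char name[TOY_NAME_LEN];
};

/*
 * Returns 0 if name holds a NUL within its own bounds, -ENAMETOOLONG
 * otherwise. strnlen() never reads past sizeof(info->name), so the check
 * itself is safe even when the terminator is missing.
 */
static int toy_check_name(const struct toy_info *info)
{
    assert(info != NULL);

    if (strnlen(info->name, sizeof(info->name)) >= sizeof(info->name))
        return -ENAMETOOLONG;
    return 0;
}
```

A field filled entirely with non-zero bytes is rejected, while any name of at most TOY_NAME_LEN - 1 characters followed by a NUL passes.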

[PATCH OLK-5.10] mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
by Wupeng Ma 09 Jun '26

From: Donet Tom <donettom(a)linux.ibm.com>

stable inclusion
from stable-v6.6.113
commit ad25061d1d73e9067fdddf9133b8f4cb6c89dc0d
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/9368

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit 4d6fc29f36341d7795db1d1819b4c15fe9be7b23 ]

Patch series "mm/ksm: Fix incorrect accounting of KSM counters during
fork", v3.

The first patch in this series fixes the incorrect accounting of KSM
counters such as ksm_merging_pages, ksm_rmap_items, and the global
ksm_zero_pages during fork.

The following patch add a selftest to verify the ksm_merging_pages counter
was updated correctly during fork.

Test Results
============
Without the first patch
-----------------------
 # [RUN] test_fork_ksm_merging_page_count
 not ok 10 ksm_merging_page in child: 32

With the first patch
--------------------
 # [RUN] test_fork_ksm_merging_page_count
 ok 10 ksm_merging_pages is not inherited after fork

This patch (of 2):

Currently, the KSM-related counters in `mm_struct`, such as
`ksm_merging_pages`, `ksm_rmap_items`, and `ksm_zero_pages`, are inherited
by the child process during fork. This results in inconsistent
accounting.

When a process uses KSM, identical pages are merged and an rmap item is
created for each merged page. The `ksm_merging_pages` and
`ksm_rmap_items` counters are updated accordingly. However, after a fork,
these counters are copied to the child while the corresponding rmap items
are not. As a result, when the child later triggers an unmerge, there are
no rmap items present in the child, so the counters remain stale, leading
to incorrect accounting.

A similar issue exists with `ksm_zero_pages`, which maintains both a
global counter and a per-process counter. During fork, the per-process
counter is inherited by the child, but the global counter is not
incremented. Since the child also references zero pages, the global
counter should be updated as well. Otherwise, during zero-page unmerge,
both the global and per-process counters are decremented, causing the
global counter to become inconsistent.

To fix this, ksm_merging_pages and ksm_rmap_items are reset to 0 during
fork, and the global ksm_zero_pages counter is updated with the
per-process ksm_zero_pages value inherited by the child. This ensures
that KSM statistics remain accurate and reflect the activity of each
process correctly.

Link: https://lkml.kernel.org/r/cover.1758648700.git.donettom@linux.ibm.com
Link: https://lkml.kernel.org/r/7b9870eb67ccc0d79593940d9dbd4a0b39b5d396.17586487…
Fixes: 7609385337a4 ("ksm: count ksm merging pages for each process")
Fixes: cb4df4cae4f2 ("ksm: count allocated ksm rmap_items for each process")
Fixes: e2942062e01d ("ksm: count all zero pages placed by KSM")
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Reviewed-by: Chengming Zhou <chengming.zhou(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Aboorva Devarajan <aboorvad(a)linux.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Donet Tom <donettom(a)linux.ibm.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list(a)gmail.com>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Cc: xu xin <xu.xin16(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[ changed mm_flags_test() to test_bit() ]
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

Conflicts:
	include/linux/ksm.h
[Conflicts due to commit e2942062e01d ("ksm: count all zero pages placed
by KSM") no merged]
Signed-off-by: Wupeng Ma <mawupeng1(a)huawei.com>
---
 include/linux/ksm.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index debef5446114f..5d1430f7389cc 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -34,6 +34,9 @@ static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 	int ret;

 	if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags)) {
+		mm->ksm_merging_pages = 0;
+		mm->ksm_rmap_items = 0;
+
 		ret = __ksm_enter(mm);
 		if (ret)
 			return ret;
--
2.43.0

[PATCH OLK-6.6] mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
by Wupeng Ma 09 Jun '26

From: Donet Tom <donettom(a)linux.ibm.com>

stable inclusion
from stable-v6.6.113
commit ad25061d1d73e9067fdddf9133b8f4cb6c89dc0d
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/9368

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit 4d6fc29f36341d7795db1d1819b4c15fe9be7b23 ]

Patch series "mm/ksm: Fix incorrect accounting of KSM counters during
fork", v3.

The first patch in this series fixes the incorrect accounting of KSM
counters such as ksm_merging_pages, ksm_rmap_items, and the global
ksm_zero_pages during fork.

The following patch add a selftest to verify the ksm_merging_pages counter
was updated correctly during fork.

Test Results
============
Without the first patch
-----------------------
 # [RUN] test_fork_ksm_merging_page_count
 not ok 10 ksm_merging_page in child: 32

With the first patch
--------------------
 # [RUN] test_fork_ksm_merging_page_count
 ok 10 ksm_merging_pages is not inherited after fork

This patch (of 2):

Currently, the KSM-related counters in `mm_struct`, such as
`ksm_merging_pages`, `ksm_rmap_items`, and `ksm_zero_pages`, are inherited
by the child process during fork. This results in inconsistent
accounting.

When a process uses KSM, identical pages are merged and an rmap item is
created for each merged page. The `ksm_merging_pages` and
`ksm_rmap_items` counters are updated accordingly. However, after a fork,
these counters are copied to the child while the corresponding rmap items
are not. As a result, when the child later triggers an unmerge, there are
no rmap items present in the child, so the counters remain stale, leading
to incorrect accounting.

A similar issue exists with `ksm_zero_pages`, which maintains both a
global counter and a per-process counter. During fork, the per-process
counter is inherited by the child, but the global counter is not
incremented. Since the child also references zero pages, the global
counter should be updated as well. Otherwise, during zero-page unmerge,
both the global and per-process counters are decremented, causing the
global counter to become inconsistent.

To fix this, ksm_merging_pages and ksm_rmap_items are reset to 0 during
fork, and the global ksm_zero_pages counter is updated with the
per-process ksm_zero_pages value inherited by the child. This ensures
that KSM statistics remain accurate and reflect the activity of each
process correctly.

Link: https://lkml.kernel.org/r/cover.1758648700.git.donettom@linux.ibm.com
Link: https://lkml.kernel.org/r/7b9870eb67ccc0d79593940d9dbd4a0b39b5d396.17586487…
Fixes: 7609385337a4 ("ksm: count ksm merging pages for each process")
Fixes: cb4df4cae4f2 ("ksm: count allocated ksm rmap_items for each process")
Fixes: e2942062e01d ("ksm: count all zero pages placed by KSM")
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Reviewed-by: Chengming Zhou <chengming.zhou(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Aboorva Devarajan <aboorvad(a)linux.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Donet Tom <donettom(a)linux.ibm.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list(a)gmail.com>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Cc: xu xin <xu.xin16(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[ changed mm_flags_test() to test_bit() ]
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

Conflicts:
	include/linux/ksm.h
[Conflicts due to cleanup commit 283dfdd20e57 is merged.]
Signed-off-by: Jinjiang Tu <tujinjiang(a)huawei.com>
Signed-off-by: Wupeng Ma <mawupeng1(a)huawei.com>
---
 include/linux/ksm.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 691c1f54254e6..5c96c3c524c12 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -56,8 +56,15 @@ static inline long mm_ksm_zero_pages(struct mm_struct *mm)

 static inline void ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
-	if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
+	if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags)) {
+		long nr_ksm_zero_pages = atomic_long_read(&mm->ksm_zero_pages);
+
+		mm->ksm_merging_pages = 0;
+		mm->ksm_rmap_items = 0;
+		atomic_long_add(nr_ksm_zero_pages, &ksm_zero_pages);
+
 		__ksm_enter(mm);
+	}
 }

 static inline void ksm_exit(struct mm_struct *mm)
--
2.43.0

[PATCH OLK-5.10] netfilter: nf_conntrack_helper: pass helper to expect cleanup
by superdcc97＠163.com 09 Jun '26

From: Qi Tang <tpluszz77(a)gmail.com>

stable inclusion
from stable-v5.10.253
commit 5cf28d5c8dcbbe8af6d3b145babe491906d7bad1
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14559
CVE: CVE-2026-43027

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit a242a9ae58aa46ff7dae51ce64150a93957abe65 ]

nf_conntrack_helper_unregister() calls nf_ct_expect_iterate_destroy()
to remove expectations belonging to the helper being unregistered.
However, it passes NULL instead of the helper pointer as the data
argument, so expect_iter_me() never matches any expectation and all
of them survive the cleanup.

After unregister returns, nfnl_cthelper_del() frees the helper
object immediately. Subsequent expectation dumps or packet-driven
init_conntrack() calls then dereference the freed exp->helper,
causing a use-after-free.

Pass the actual helper pointer so expectations referencing it are
properly destroyed before the helper object is freed.

  BUG: KASAN: slab-use-after-free in string+0x38f/0x430
  Read of size 1 at addr ffff888003b14d20 by task poc/103

  Call Trace:
   string+0x38f/0x430
   vsnprintf+0x3cc/0x1170
   seq_printf+0x17a/0x240
   exp_seq_show+0x2e5/0x560
   seq_read_iter+0x419/0x1280
   proc_reg_read+0x1ac/0x270
   vfs_read+0x179/0x930
   ksys_read+0xef/0x1c0

  Freed by task 103:
  The buggy address is located 32 bytes inside of
   freed 192-byte region [ffff888003b14d00, ffff888003b14dc0)

Fixes: ac7b84839003 ("netfilter: expect: add and use nf_ct_expect_iterate helpers")
Signed-off-by: Qi Tang <tpluszz77(a)gmail.com>
Reviewed-by: Phil Sutter <phil(a)nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Dong Chenchen <dongchenchen2(a)huawei.com>
---
 net/netfilter/nf_conntrack_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 32cc91f5ba99..31f87eff5a36 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -468,7 +468,7 @@ void nf_conntrack_helper_unregister(struct nf_conntrack_helper *me)
 	 */
 	synchronize_rcu();

-	nf_ct_expect_iterate_destroy(expect_iter_me, NULL);
+	nf_ct_expect_iterate_destroy(expect_iter_me, me);
 	nf_ct_iterate_destroy(unhelp, me);

 	/* Maybe someone has gotten the helper already when unhelp above.
--
2.43.0
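
Why a NULL data argument makes the cleanup a no-op can be seen in a simplified userspace model. The sketch assumes the predicate reduces to a pointer comparison, which is a simplification of the real expect_iter_me(); all toy_* names are hypothetical and an array replaces the kernel's expectation hash table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the helper and expectation objects (hypothetical names). */
struct toy_helper {
    const char *name;
};

struct toy_expect {
    const struct toy_helper *helper;
    bool dead;
};

/* Simplified predicate: an expectation matches when it points at the given helper. */
static bool toy_expect_iter_me(const struct toy_expect *exp, const void *data)
{
    return exp->helper == data;
}

/* Marks every live expectation accepted by iter(exp, data) as destroyed. */
static int toy_iterate_destroy(struct toy_expect *exps, size_t n,
                               bool (*iter)(const struct toy_expect *, const void *),
                               const void *data)
{
    int destroyed = 0;

    assert(exps != NULL && iter != NULL);

    for (size_t i = 0; i < n; i++) {
        if (!exps[i].dead && iter(&exps[i], data)) {
            exps[i].dead = true;
            destroyed++;
        }
    }
    return destroyed;
}
```

Calling toy_iterate_destroy() with NULL as data destroys nothing because no expectation has a NULL helper, while passing the helper's address removes exactly the expectations that reference it.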

[PATCH OLK-5.10] mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
by Wupeng Ma 09 Jun '26

From: Donet Tom <donettom(a)linux.ibm.com>

stable inclusion
from stable-v6.6.113
commit ad25061d1d73e9067fdddf9133b8f4cb6c89dc0d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/ID7OBU

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit 4d6fc29f36341d7795db1d1819b4c15fe9be7b23 ]

Patch series "mm/ksm: Fix incorrect accounting of KSM counters during
fork", v3.

The first patch in this series fixes the incorrect accounting of KSM
counters such as ksm_merging_pages, ksm_rmap_items, and the global
ksm_zero_pages during fork.

The following patch add a selftest to verify the ksm_merging_pages counter
was updated correctly during fork.

Test Results
============
Without the first patch
-----------------------
 # [RUN] test_fork_ksm_merging_page_count
 not ok 10 ksm_merging_page in child: 32

With the first patch
--------------------
 # [RUN] test_fork_ksm_merging_page_count
 ok 10 ksm_merging_pages is not inherited after fork

This patch (of 2):

Currently, the KSM-related counters in `mm_struct`, such as
`ksm_merging_pages`, `ksm_rmap_items`, and `ksm_zero_pages`, are inherited
by the child process during fork. This results in inconsistent
accounting.

When a process uses KSM, identical pages are merged and an rmap item is
created for each merged page. The `ksm_merging_pages` and
`ksm_rmap_items` counters are updated accordingly. However, after a fork,
these counters are copied to the child while the corresponding rmap items
are not. As a result, when the child later triggers an unmerge, there are
no rmap items present in the child, so the counters remain stale, leading
to incorrect accounting.

A similar issue exists with `ksm_zero_pages`, which maintains both a
global counter and a per-process counter. During fork, the per-process
counter is inherited by the child, but the global counter is not
incremented. Since the child also references zero pages, the global
counter should be updated as well. Otherwise, during zero-page unmerge,
both the global and per-process counters are decremented, causing the
global counter to become inconsistent.

To fix this, ksm_merging_pages and ksm_rmap_items are reset to 0 during
fork, and the global ksm_zero_pages counter is updated with the
per-process ksm_zero_pages value inherited by the child. This ensures
that KSM statistics remain accurate and reflect the activity of each
process correctly.

Link: https://lkml.kernel.org/r/cover.1758648700.git.donettom@linux.ibm.com
Link: https://lkml.kernel.org/r/7b9870eb67ccc0d79593940d9dbd4a0b39b5d396.17586487…
Fixes: 7609385337a4 ("ksm: count ksm merging pages for each process")
Fixes: cb4df4cae4f2 ("ksm: count allocated ksm rmap_items for each process")
Fixes: e2942062e01d ("ksm: count all zero pages placed by KSM")
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Reviewed-by: Chengming Zhou <chengming.zhou(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Aboorva Devarajan <aboorvad(a)linux.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Donet Tom <donettom(a)linux.ibm.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list(a)gmail.com>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Cc: xu xin <xu.xin16(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[ changed mm_flags_test() to test_bit() ]
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

Conflicts:
	include/linux/ksm.h
[Conflicts due to commit e2942062e01d ("ksm: count all zero pages placed
by KSM") no merged]
Signed-off-by: Wupeng Ma <mawupeng1(a)huawei.com>
---
 include/linux/ksm.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index debef5446114f..5d1430f7389cc 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -34,6 +34,9 @@ static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 	int ret;

 	if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags)) {
+		mm->ksm_merging_pages = 0;
+		mm->ksm_rmap_items = 0;
+
 		ret = __ksm_enter(mm);
 		if (ret)
 			return ret;
--
2.43.0

[PATCH OLK-6.6] mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
by Wupeng Ma 09 Jun '26

From: Donet Tom <donettom(a)linux.ibm.com>

stable inclusion
from stable-v6.6.113
commit ad25061d1d73e9067fdddf9133b8f4cb6c89dc0d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/ID7OBU

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit 4d6fc29f36341d7795db1d1819b4c15fe9be7b23 ]

Patch series "mm/ksm: Fix incorrect accounting of KSM counters during
fork", v3.

The first patch in this series fixes the incorrect accounting of KSM
counters such as ksm_merging_pages, ksm_rmap_items, and the global
ksm_zero_pages during fork.

The following patch add a selftest to verify the ksm_merging_pages counter
was updated correctly during fork.

Test Results
============
Without the first patch
-----------------------
 # [RUN] test_fork_ksm_merging_page_count
 not ok 10 ksm_merging_page in child: 32

With the first patch
--------------------
 # [RUN] test_fork_ksm_merging_page_count
 ok 10 ksm_merging_pages is not inherited after fork

This patch (of 2):

Currently, the KSM-related counters in `mm_struct`, such as
`ksm_merging_pages`, `ksm_rmap_items`, and `ksm_zero_pages`, are inherited
by the child process during fork. This results in inconsistent
accounting.

When a process uses KSM, identical pages are merged and an rmap item is
created for each merged page. The `ksm_merging_pages` and
`ksm_rmap_items` counters are updated accordingly. However, after a fork,
these counters are copied to the child while the corresponding rmap items
are not. As a result, when the child later triggers an unmerge, there are
no rmap items present in the child, so the counters remain stale, leading
to incorrect accounting.

A similar issue exists with `ksm_zero_pages`, which maintains both a
global counter and a per-process counter. During fork, the per-process
counter is inherited by the child, but the global counter is not
incremented. Since the child also references zero pages, the global
counter should be updated as well. Otherwise, during zero-page unmerge,
both the global and per-process counters are decremented, causing the
global counter to become inconsistent.

To fix this, ksm_merging_pages and ksm_rmap_items are reset to 0 during
fork, and the global ksm_zero_pages counter is updated with the
per-process ksm_zero_pages value inherited by the child. This ensures
that KSM statistics remain accurate and reflect the activity of each
process correctly.

Link: https://lkml.kernel.org/r/cover.1758648700.git.donettom@linux.ibm.com
Link: https://lkml.kernel.org/r/7b9870eb67ccc0d79593940d9dbd4a0b39b5d396.17586487…
Fixes: 7609385337a4 ("ksm: count ksm merging pages for each process")
Fixes: cb4df4cae4f2 ("ksm: count allocated ksm rmap_items for each process")
Fixes: e2942062e01d ("ksm: count all zero pages placed by KSM")
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Reviewed-by: Chengming Zhou <chengming.zhou(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Aboorva Devarajan <aboorvad(a)linux.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Donet Tom <donettom(a)linux.ibm.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list(a)gmail.com>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Cc: xu xin <xu.xin16(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[ changed mm_flags_test() to test_bit() ]
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

Conflicts:
	include/linux/ksm.h
[Conflicts due to cleanup commit 283dfdd20e57 is merged.]
Signed-off-by: Jinjiang Tu <tujinjiang(a)huawei.com>
Signed-off-by: Wupeng Ma <mawupeng1(a)huawei.com>
---
 include/linux/ksm.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 691c1f54254e6..5c96c3c524c12 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -56,8 +56,15 @@ static inline long mm_ksm_zero_pages(struct mm_struct *mm)

 static inline void ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
-	if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
+	if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags)) {
+		long nr_ksm_zero_pages = atomic_long_read(&mm->ksm_zero_pages);
+
+		mm->ksm_merging_pages = 0;
+		mm->ksm_rmap_items = 0;
+		atomic_long_add(nr_ksm_zero_pages, &ksm_zero_pages);
+
 		__ksm_enter(mm);
+	}
 }

 static inline void ksm_exit(struct mm_struct *mm)
--
2.43.0

[PATCH openEuler-1.0-LTS] scsi/hiraid: Support new RAID features and bug fixes
by LinKun 09 Jun '26

From: 岳智超 <yuezhichao1(a)h-partners.com>

driver inclusion
category: feature
bugzilla: https://atomgit.com/openeuler/kernel/issues/9101
CVE: NA

--------------------------------

- Fix issue where throughput may drop to 0 under large I/O conditions
- Fix drive letter misalignment when creating/
  deleting drive letters too quickly
- Fix edge case in timeout handling where equal to timeout value
  is not handled
- Fix OS failure to start in single queue mode
- Adapt to configurable management command timeout
- Adapt to configurable I/O queue depth
- Simplify PRP and SGL workflow
- Simplify stream identification feature
- Add driver fast recovery after linkdown/linkup
- Add boot drive priority reporting during drive letter enumeration
- Add FLR support with firmware adaptation for fast driver recovery
  after FLR
- Add NCQ feature for SATA drives
- Remove legacy driver template class definitions

Signed-off-by: Zhichao Yue <yuezhichao1(a)h-partners.com>
---
 drivers/scsi/hisi_raid/hiraid.h      |  87 ++-
 drivers/scsi/hisi_raid/hiraid_main.c | 919 ++++++++++++++-------------
 2 files changed, 536 insertions(+), 470 deletions(-)

diff --git a/drivers/scsi/hisi_raid/hiraid.h b/drivers/scsi/hisi_raid/hiraid.h
index cb90d3d..25df43b 100644
--- a/drivers/scsi/hisi_raid/hiraid.h
+++ b/drivers/scsi/hisi_raid/hiraid.h
@@ -50,6 +50,10 @@

 #define HIRAID_DEV_INFO_FLAG_VALID(flag) ((flag) & 0x01)
 #define HIRAID_DEV_INFO_FLAG_CHANGE(flag) ((flag) & 0x02)
+#define HIRAID_DEV_INFO_FLAG_DEV_MAGIC_DEV(flag) ((flag) >> 2)
+#define HIRAID_DEV_INFO_FLAG_DELETE_OR_ADD(flag, org_flag) \
+	(HIRAID_DEV_INFO_FLAG_DEV_MAGIC_DEV(flag) != \
+	HIRAID_DEV_INFO_FLAG_DEV_MAGIC_DEV(org_flag) ? 1 : 0)

 #define HIRAID_CAP_MQES(cap) ((cap) & 0xffff)
 #define HIRAID_CAP_STRIDE(cap) (((cap) >> 32) & 0xf)
@@ -74,7 +78,19 @@
 #define PCI_VENDOR_ID_HUAWEI_LOGIC 0x19E5

 #define HIRAID_SERVER_DEVICE_HBA_DID 0x3858
+#define HIRAID_SERVER_DEVICE_HBAS_DID 0x3918
 #define HIRAID_SERVER_DEVICE_RAID_DID 0x3758
+#define HIRAID_SERVER_DEVICE_RAIDS_DID 0x38D8
+
+#define HIRAID_SCSI_VPD_LEN SCSI_VPD_PG_LEN
+#define SCSI_RSVD_INIT_SHIFT 0XFC
+#define NCQ_PRIO_SUPPORT_BYTE 213
+#define VPD_NCQ_PRIO_SUPPORT_BIT 4
+
+#define HIRAID_CAP_CAC_STRIDE 8
+
+#define HIRAID_MAX_PD_NUM (40 + 1)
+#define HIRAID_MAX_STREAM_NUM 8

 enum {
 	HIRAID_SC_SUCCESS = 0x0,
@@ -90,6 +106,7 @@ enum {

 enum {
 	HIRAID_REG_CAP = 0x0000,
+	HIRAID_REG_VS = 0x0008,
 	HIRAID_REG_CC = 0x0014,
 	HIRAID_REG_CSTS = 0x001c,
 	HIRAID_REG_AQA = 0x0024,
@@ -236,6 +253,20 @@ enum {
 	DISPATCH_BY_DISK,
 };

+enum hiraid_stream_type {
+	HIRAID_STREAM_TYPE_TOTAL,
+	HIRAID_STREAM_TYPE_WRITE,
+	HIRAID_STREAM_TYPE_READ,
+	HIRAID_STREAM_TYPE_CLEAN,
+	HIRAID_STREAM_TYPE_BOTTOM
+};
+
+enum hiraid_stream_io_operation_type {
+	HIRAID_STREAM_TYPE_DELETE_SINGLE_IO = 1,
+	HIRAID_STREAM_TYPE_DELETE_SINGLE_IO_LIST,
+	HIRAID_STREAM_TYPE_DELETE_ALL_IO_LIST
+};
+
 struct hiraid_completion {
 	__le32 result;
 	union {
@@ -271,6 +302,19 @@ struct hiraid_ctrl_info {
 	__u8 rsvd1[4020];
 };

+struct hiraid_stream {
+	/* recog-window */
+	u64 stream_lba;
+	u32 stream_len;
+	u16 did;
+	u16 type;
+	/* aging ctrl */
+	int aging_credit;
+	int aging_grade;
+	u16 stream_id;
+	u16 stream_is_using;
+};
+
 struct hiraid_dev {
 	struct pci_dev *pdev;
 	struct device *dev;
@@ -318,6 +362,18 @@ struct hiraid_dev {
 	u8 hdd_dispatch;

 	struct request_queue *bsg_queue;
+
+	u16 hiraid_stream_io_count;
+	u64 hiraid_stream_aging_time;
+	u64 hiraid_stream_sent_io_size[HIRAID_MAX_PD_NUM]
+		[HIRAID_MAX_STREAM_NUM];
+	u16 hiraid_stream_io_num_per_pd[HIRAID_MAX_PD_NUM]
+		[HIRAID_STREAM_TYPE_BOTTOM + 1];
+	spinlock_t hiraid_stream_array_lock;
+	struct hiraid_stream hiraid_stream_array[HIRAID_MAX_PD_NUM]
+		[HIRAID_MAX_STREAM_NUM];
+	struct task_struct *hiraid_stream_submit_task;
+	u64 hiraid_stream_io_last_pull_time[HIRAID_MAX_PD_NUM];
 };

 struct hiraid_sgl_desc {
@@ -696,6 +752,10 @@ struct hiraid_mapmange {
 	u32 sge_cnt;
 	u32 len;
 	bool use_sgl;
+	/* This field is used in the I/O read/write process.
+	 * It indicates the TRANSFER LENGTH field in the CDB.
+	 */
+	u32 cdb_data_len;
 	dma_addr_t first_dma;
 	void *sense_buffer_virt;
 	dma_addr_t sense_buffer_phy;
@@ -755,38 +815,19 @@ struct hiraid_sdev_hostdata {
 	u8 flag;
 	u8 rg_id;
 	u8 hwq;
+	u8 sata_ncq_prio_enable;
+	u8 rsvd[3];
 	u16 pend_count;
 };

-enum stream_type {
-	TYPE_TOTAL,
-	TYPE_WRITE,
-	TYPE_READ,
-	TYPE_CLEAN,
-	TYPE_BOTTOM
-};
-
-struct HIRAID_STREAM_S {
-	/* recog-window */
-	u64 stream_lba;
-	u32 stream_len;
-	u16 did;
-	u16 type;
-	/* aging ctrl */
-	int aging_credit;
-	int aging_grade;
-	u16 stream_id;
-	u16 using;
-};
-
-struct IO_LIST_S {
+struct hiraid_stream_io_list {
 	struct list_head list;
 	struct hiraid_scsi_io_cmd io_cmd;
 	struct hiraid_queue *submit_queue;
 	unsigned int sector_size;
 };

-struct mutex_list_head_s {
+struct mutex_list_head {
 	struct list_head list;
 	struct mutex lock;
 };
diff --git a/drivers/scsi/hisi_raid/hiraid_main.c b/drivers/scsi/hisi_raid/hiraid_main.c
index cdba466..9251fb8 100644
--- a/drivers/scsi/hisi_raid/hiraid_main.c
+++ b/drivers/scsi/hisi_raid/hiraid_main.c
@@ -26,6 +26,10 @@
 #include <linux/bsg-lib.h>
 #include <asm/unaligned.h>
 #include <linux/sort.h>
+#include <linux/kthread.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/sched/prio.h>
 #include <target/target_core_backend.h>

 #include <scsi/scsi.h>
@@ -35,14 +39,10 @@
 #include <scsi/scsi_transport.h>
 #include <scsi/scsi_dbg.h>
 #include <scsi/sg.h>
-#include <linux/kthread.h>
-#include <linux/mutex.h>
-#include <linux/sched.h>
-#include <linux/sched/prio.h>

 #include "hiraid.h"

-static u32 admin_tmout = 60;
+static u32 admin_tmout = 180;
 module_param(admin_tmout, uint, 0644);
 MODULE_PARM_DESC(admin_tmout, "admin commands timeout (seconds)");

@@ -101,7 +101,7 @@ static const struct kernel_param_ops io_queue_depth_ops = {
 	.get = param_get_uint,
 };

-static u32 io_queue_depth = 1024;
+static u32 io_queue_depth = 4096;
 module_param_cb(io_queue_depth, &io_queue_depth_ops, &io_queue_depth, 0644);
 MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");

@@ -143,8 +143,6 @@ static void hiraid_handle_async_notice(struct hiraid_dev *hdev, u32 result);
 static void hiraid_handle_async_vs(struct hiraid_dev *hdev, u32 result,
 				    u32 result1);

-static struct class *hiraid_class;
-
 #define HIRAID_CAP_TIMEOUT_UNIT_MS (HZ / 2)

 static struct workqueue_struct *work_queue;
@@ -155,7 +153,7 @@ static struct workqueue_struct *work_queue;
 		__func__, ##__VA_ARGS__); \
 } while (0)

-#define HIRAID_DRV_VERSION "1.1.0.1"
+#define HIRAID_DRV_VERSION "2.1.0.1"

 #define ADMIN_TIMEOUT (admin_tmout * HZ)
 #define USRCMD_TIMEOUT (180 * HZ)
@@ -178,8 +176,12 @@ static struct workqueue_struct *work_queue;
 #define MIN_CREDIT 0
 #define MAX_CREDIT 64
 #define CREDIT_THRES 32
+#ifndef MIN
 #define MIN(a, b) (((a) < (b)) ? (a) : (b))
+#endif
+#ifndef MAX
 #define MAX(a, b) (((a) > (b)) ?
(a) : (b)) +#endif enum SENSE_STATE_CODE { SENSE_STATE_OK = 0, @@ -362,95 +364,8 @@ static u32 hiraid_get_max_cmd_size(struct hiraid_dev *hdev) return sizeof(struct hiraid_mapmange) + alloc_size; } -static int hiraid_build_passthru_prp(struct hiraid_dev *hdev, - struct hiraid_mapmange *mapbuf) -{ - struct scatterlist *sg = mapbuf->sgl; - __le64 *phy_regpage, *prior_list; - u64 buf_addr = sg_dma_address(sg); - int buf_length = sg_dma_len(sg); - u32 page_size = hdev->page_size; - int offset = buf_addr & (page_size - 1); - void **list = hiraid_mapbuf_list(mapbuf); - int maplen = mapbuf->len; - struct dma_pool *pool; - dma_addr_t buffer_phy; - int i; - - maplen -= (page_size - offset); - if (maplen <= 0) { - mapbuf->first_dma = 0; - return 0; - } - - buf_length -= (page_size - offset); - if (buf_length) { - buf_addr += (page_size - offset); - } else { - sg = sg_next(sg); - buf_addr = sg_dma_address(sg); - buf_length = sg_dma_len(sg); - } - - if (maplen <= page_size) { - mapbuf->first_dma = buf_addr; - return 0; - } - - pool = hdev->prp_page_pool; - mapbuf->page_cnt = 1; - - phy_regpage = dma_pool_alloc(pool, GFP_ATOMIC, &buffer_phy); - if (!phy_regpage) { - dev_err_ratelimited(hdev->dev, "allocate first admin prp_list memory failed\n"); - mapbuf->first_dma = buf_addr; - mapbuf->page_cnt = -1; - return -ENOMEM; - } - list[0] = phy_regpage; - mapbuf->first_dma = buffer_phy; - i = 0; - for (;;) { - if (i == page_size / PRP_ENTRY_SIZE) { - prior_list = phy_regpage; - - phy_regpage = dma_pool_alloc(pool, GFP_ATOMIC, - &buffer_phy); - if (!phy_regpage) { - dev_err_ratelimited(hdev->dev, "allocate [%d]th admin prp list memory failed\n", - mapbuf->page_cnt + 1); - return -ENOMEM; - } - list[mapbuf->page_cnt++] = phy_regpage; - phy_regpage[0] = prior_list[i - 1]; - prior_list[i - 1] = cpu_to_le64(buffer_phy); - i = 1; - } - phy_regpage[i++] = cpu_to_le64(buf_addr); - buf_addr += page_size; - buf_length -= page_size; - maplen -= page_size; - if (maplen <= 0) - break; - if 
(buf_length > 0) - continue; - if (unlikely(buf_length < 0)) - goto bad_admin_sgl; - sg = sg_next(sg); - buf_addr = sg_dma_address(sg); - buf_length = sg_dma_len(sg); - } - - return 0; - -bad_admin_sgl: - dev_err(hdev->dev, "setup prps, invalid admin SGL for payload[%d] nents[%d]\n", - mapbuf->len, mapbuf->sge_cnt); - return -EIO; -} - static int hiraid_build_prp(struct hiraid_dev *hdev, - struct hiraid_mapmange *mapbuf) + struct hiraid_mapmange *mapbuf) { struct scatterlist *sg = mapbuf->sgl; __le64 *phy_regpage, *prior_list; @@ -656,67 +571,9 @@ static void hiraid_sgl_set_seg(struct hiraid_sgl_desc *sge, } } -static int hiraid_build_passthru_sgl(struct hiraid_dev *hdev, - struct hiraid_admin_command *admin_cmd, - struct hiraid_mapmange *mapbuf) -{ - struct hiraid_sgl_desc *sg_list, *link, *old_sg_list; - struct scatterlist *sg = mapbuf->sgl; - void **list = hiraid_mapbuf_list(mapbuf); - struct dma_pool *pool; - int nsge = mapbuf->sge_cnt; - dma_addr_t buffer_phy; - int i = 0; - - admin_cmd->common.flags |= SQE_FLAG_SGL_METABUF; - - if (nsge == 1) { - hiraid_sgl_set_data(&admin_cmd->common.dptr.sgl, sg); - return 0; - } - - pool = hdev->prp_page_pool; - mapbuf->page_cnt = 1; - - sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &buffer_phy); - if (!sg_list) { - dev_err_ratelimited(hdev->dev, "allocate first admin sgl_list failed\n"); - mapbuf->page_cnt = -1; - return -ENOMEM; - } - - list[0] = sg_list; - mapbuf->first_dma = buffer_phy; - hiraid_sgl_set_seg(&admin_cmd->common.dptr.sgl, buffer_phy, nsge); - do { - if (i == SGES_PER_PAGE) { - old_sg_list = sg_list; - link = &old_sg_list[SGES_PER_PAGE - 1]; - - sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &buffer_phy); - if (!sg_list) { - dev_err_ratelimited(hdev->dev, "allocate [%d]th admin sgl_list failed\n", - mapbuf->page_cnt + 1); - return -ENOMEM; - } - list[mapbuf->page_cnt++] = sg_list; - - i = 0; - memcpy(&sg_list[i++], link, sizeof(*link)); - hiraid_sgl_set_seg(link, buffer_phy, nsge); - } - - 
hiraid_sgl_set_data(&sg_list[i++], sg); - sg = sg_next(sg); - } while (--nsge > 0); - - return 0; -} - - static int hiraid_build_sgl(struct hiraid_dev *hdev, - struct hiraid_scsi_io_cmd *io_cmd, - struct hiraid_mapmange *mapbuf) + struct hiraid_scsi_io_cmd *io_cmd, + struct hiraid_mapmange *mapbuf) { struct hiraid_sgl_desc *sg_list, *link, *old_sg_list; struct scatterlist *sg = mapbuf->sgl; @@ -776,102 +633,52 @@ static int hiraid_build_sgl(struct hiraid_dev *hdev, return 0; } -#define MAX_PD_NUM (40 + 1) -#define MAX_STREAM_NUM 8 -#define PER_MB (1024 * 1024) -#define MAX_IO_NUM (200 * PER_MB) -#define STREAM_LEN (4 * PER_MB) +#define MAX_IO_NUM (200 * 1024 * 1024) +#define STREAM_LEN (4 * 1024 * 1024) #define MAX_IO_NUM_ONCE 100 #define IO_SUBMIT_TIME_OUT 100 -#define MAX_AGING_NUM 100 - +#define MAX_AGING_NUM 5000 +#define MAX_AGING_TIME 16 +#define AGING_DEGRADE (-2) #define MIN_IO_SEND_TIME 10 #define MAX_IO_SEND_TIME 50 -enum io_operation_type { - TYPE_DELETE_SINGLE_IO = 1, - TYPE_DELETE_SINGLE_IO_LIST, - TYPE_DELETE_ALL_IO_LIST -}; - -struct HIRAID_STREAM_S stream_array[MAX_PD_NUM][MAX_STREAM_NUM] = {0}; -struct mutex_list_head_s io_heads_per_stream[MAX_PD_NUM * MAX_STREAM_NUM]; -spinlock_t stream_array_lock; +struct mutex_list_head hiraid_io_heads_per_stream[ + HIRAID_MAX_PD_NUM * HIRAID_MAX_STREAM_NUM]; DEFINE_MUTEX(g_stream_operation_mutex); -u64 g_io_transport_num[MAX_PD_NUM][MAX_STREAM_NUM] = {0}; -u16 g_io_stream_num[MAX_PD_NUM][TYPE_BOTTOM] = {0}; -u16 g_io_count = 1; - -void hiraid_inc_io_transport_num(u16 disk_id, u16 streamd_id, u16 nlb) -{ - g_io_transport_num[disk_id][streamd_id] += nlb; -} - -void hiraid_refresh_io_transport_num(u16 disk_id, u16 streamd_id) -{ - g_io_transport_num[disk_id][streamd_id] = 0; -} - -void hiraid_inc_stream_num(u16 disk_id) -{ - spin_lock(&stream_array_lock); - g_io_stream_num[disk_id][TYPE_TOTAL]++; - spin_unlock(&stream_array_lock); -} - -void hiraid_dec_stream_num(u16 disk_id) -{ - spin_lock(&stream_array_lock); - 
if (g_io_stream_num[disk_id][TYPE_TOTAL] > 0) - g_io_stream_num[disk_id][TYPE_TOTAL]--; - spin_unlock(&stream_array_lock); -} - -static bool hiraid_io_recog_check_stream_exceed(u16 disk_id) +static inline bool hiraid_io_recog_check_stream_exceed(struct hiraid_dev *hdev, + u16 disk_id) { bool exceed_flag; - spin_lock(&stream_array_lock); - exceed_flag = (g_io_stream_num[disk_id][TYPE_TOTAL] >= MAX_STREAM_NUM); - spin_unlock(&stream_array_lock); + spin_lock(&hdev->hiraid_stream_array_lock); + exceed_flag = + (hdev->hiraid_stream_io_num_per_pd[disk_id][HIRAID_STREAM_TYPE_TOTAL] >= + HIRAID_MAX_STREAM_NUM); + spin_unlock(&hdev->hiraid_stream_array_lock); return exceed_flag; } -static u16 hiraid_get_stream_num(u16 disk_id) -{ - return g_io_stream_num[disk_id][TYPE_TOTAL]; -} - -static inline struct HIRAID_STREAM_S *hiraid_get_stream(u16 disk_id, - u16 stream_id) -{ - return &stream_array[disk_id][stream_id]; -} - -static inline struct mutex_list_head_s *hiraid_get_io_head(u16 disk_id) -{ - return &(io_heads_per_stream[disk_id]); -} - -static bool hiraid_recognition_acknowledge(const struct HIRAID_STREAM_S *stream) +static inline bool hiraid_recognition_acknowledge( + const struct hiraid_stream *stream) { return (stream->aging_credit >= CREDIT_THRES) ? 
true : false; } -void hiraid_io_recognition_init(void) +void hiraid_io_recognition_init(struct hiraid_dev *hdev) { u16 i; - spin_lock_init(&stream_array_lock); - for (i = 0; i < (MAX_PD_NUM * MAX_STREAM_NUM); i++) { - INIT_LIST_HEAD(&hiraid_get_io_head(i)->list); - mutex_init(&hiraid_get_io_head(i)->lock); + spin_lock_init(&hdev->hiraid_stream_array_lock); + for (i = 0; i < (HIRAID_MAX_PD_NUM * HIRAID_MAX_STREAM_NUM); i++) { + INIT_LIST_HEAD(&hiraid_io_heads_per_stream[i].list); + mutex_init(&hiraid_io_heads_per_stream[i].lock); } } -static void hiraid_io_recognition_iterator(struct HIRAID_STREAM_S *stream, - int direction) +static void hiraid_io_recognition_iterator(struct hiraid_stream *stream, + int direction) { stream->aging_grade = stream->aging_grade + direction * INC_GRADE; stream->aging_grade = MAX(stream->aging_grade, MAX_DECREASE_GRADE); @@ -881,46 +688,52 @@ static void hiraid_io_recognition_iterator(struct HIRAID_STREAM_S *stream, stream->aging_credit = MIN(stream->aging_credit, MAX_CREDIT); } -struct HIRAID_STREAM_S *hiraid_io_pick_stream( +struct hiraid_stream *hiraid_io_pick_stream(struct hiraid_dev *hdev, struct hiraid_scsi_rw_cmd *req, u16 type, u16 actual_id) { - struct HIRAID_STREAM_S *first_hit_stream = NULL; - struct HIRAID_STREAM_S *temp_stream = NULL; - u16 pick_flag = 0; + struct hiraid_stream *first_hit_stream = NULL; + struct hiraid_stream *temp_stream = NULL; + bool pick_flag = false; u8 i; - for (i = 0; i < MAX_STREAM_NUM; i++) { - temp_stream = &stream_array[actual_id][i]; + for (i = 0; i < HIRAID_MAX_STREAM_NUM; i++) { + temp_stream = &hdev->hiraid_stream_array[actual_id][i]; temp_stream->stream_id = i; if (req->slba < temp_stream->stream_lba || req->slba >= temp_stream->stream_lba + temp_stream->stream_len || - temp_stream->type != type) { + temp_stream->type != type || + !temp_stream->stream_is_using) { continue; } - if (!pick_flag) { + if (pick_flag == false) { temp_stream->stream_lba = req->slba; first_hit_stream = temp_stream; - 
pick_flag = 1; + pick_flag = true; continue; } - hiraid_dec_stream_num(actual_id); - memset(temp_stream, 0, - sizeof(struct HIRAID_STREAM_S)); // 去重影 + spin_lock(&hdev->hiraid_stream_array_lock); + if (hdev->hiraid_stream_io_num_per_pd[actual_id] + [HIRAID_STREAM_TYPE_TOTAL] > 0) + hdev->hiraid_stream_io_num_per_pd[actual_id] + [HIRAID_STREAM_TYPE_TOTAL]--; + + memset(temp_stream, 0, sizeof(struct hiraid_stream)); + spin_unlock(&hdev->hiraid_stream_array_lock); } return first_hit_stream; } -static struct HIRAID_STREAM_S *hiraid_init_flow_stream( +static struct hiraid_stream *hiraid_init_flow_stream(struct hiraid_dev *hdev, struct hiraid_scsi_rw_cmd *req, u16 type, u16 actual_id) { int i; - struct HIRAID_STREAM_S *stream = NULL; + struct hiraid_stream *stream = NULL; - for (i = 0; i < MAX_STREAM_NUM; i++) { - stream = hiraid_get_stream(actual_id, i); - if (!stream->using) { - stream->using = 1; + for (i = 0; i < HIRAID_MAX_STREAM_NUM; i++) { + stream = &hdev->hiraid_stream_array[actual_id][i]; + if (!stream->stream_is_using) { + stream->stream_is_using = 1; stream->stream_id = i; break; } @@ -934,82 +747,96 @@ static struct HIRAID_STREAM_S *hiraid_init_flow_stream( return stream; } -static struct HIRAID_STREAM_S *hiraid_stream_detect(struct hiraid_dev *hdev, - struct hiraid_scsi_rw_cmd *io_cmd, u16 actual_id) +static struct hiraid_stream *hiraid_stream_detect(struct hiraid_dev *hdev, + struct hiraid_scsi_rw_cmd *io_cmd, u16 actual_id) { - u16 type = io_cmd->opcode == HIRAID_CMD_WRITE ? 
TYPE_WRITE : TYPE_READ; - struct HIRAID_STREAM_S *stream = hiraid_io_pick_stream(io_cmd, type, - actual_id); + u16 type = HIRAID_STREAM_TYPE_READ; + struct hiraid_stream *stream = NULL; + + if (io_cmd->opcode == HIRAID_CMD_WRITE) + type = HIRAID_STREAM_TYPE_WRITE; + + stream = hiraid_io_pick_stream(hdev, + io_cmd, type, actual_id); - if (stream != NULL) { /* 可以命中一个stream */ + if (stream != NULL) return stream; - } - if (hiraid_io_recog_check_stream_exceed(actual_id)) + if (hiraid_io_recog_check_stream_exceed(hdev, actual_id)) return NULL; - stream = hiraid_init_flow_stream(io_cmd, type, actual_id); - hiraid_inc_stream_num(actual_id); + + stream = hiraid_init_flow_stream(hdev, io_cmd, type, actual_id); + spin_lock(&hdev->hiraid_stream_array_lock); + hdev->hiraid_stream_io_num_per_pd[actual_id] + [HIRAID_STREAM_TYPE_TOTAL]++; + spin_unlock(&hdev->hiraid_stream_array_lock); return stream; } -u64 g_io_last_pull_time[MAX_PD_NUM] = {0}; - static u16 hiraid_get_submit_io_stream(u16 did, struct hiraid_dev *hdev) { u64 temp_num, i; - static u16 stream_num[MAX_PD_NUM] = {0}; + static u16 stream_num[HIRAID_MAX_PD_NUM] = {0}; - if (g_io_last_pull_time[did] == 0) - g_io_last_pull_time[did] = jiffies_to_msecs(jiffies); + if (hdev->hiraid_stream_io_last_pull_time[did] == 0) + hdev->hiraid_stream_io_last_pull_time[did] = + jiffies_to_msecs(jiffies); - for (i = 0; i < MAX_STREAM_NUM; i++) { - temp_num = g_io_transport_num[did][i]; + for (i = 0; i < HIRAID_MAX_STREAM_NUM; i++) { + temp_num = hdev->hiraid_stream_sent_io_size[did][i]; if (temp_num != 0) { if ((temp_num < MAX_IO_NUM) && - ((jiffies_to_msecs(jiffies) - g_io_last_pull_time[did]) - < IO_SUBMIT_TIME_OUT)) { + ((jiffies_to_msecs(jiffies) - + hdev->hiraid_stream_io_last_pull_time[did]) < + IO_SUBMIT_TIME_OUT)) { stream_num[did] = i; return i; } - g_io_last_pull_time[did] = jiffies_to_msecs(jiffies); - hiraid_refresh_io_transport_num(did, i); - stream_num[did] = ((i+1) % MAX_STREAM_NUM); - return ((i+1) % MAX_STREAM_NUM); + 
hdev->hiraid_stream_io_last_pull_time[did] = + jiffies_to_msecs(jiffies); + hdev->hiraid_stream_sent_io_size[did][i] = 0; + stream_num[did] = ((i+1) % + HIRAID_MAX_STREAM_NUM); + return ((i+1) % HIRAID_MAX_STREAM_NUM); + } } - g_io_last_pull_time[did] = jiffies_to_msecs(jiffies); - return ((stream_num[did]++) % MAX_STREAM_NUM); + hdev->hiraid_stream_io_last_pull_time[did] = + jiffies_to_msecs(jiffies); + return ((stream_num[did]++) % HIRAID_MAX_STREAM_NUM); } static void hiraid_submit_io_stream(u16 hdid, struct hiraid_dev *hdev) { - struct mutex_list_head_s *io_slist = NULL; + struct mutex_list_head *io_slist = NULL; struct list_head *node = NULL; struct list_head *next_node = NULL; struct list_head temp_header; - struct hiraid_scsi_io_cmd io_cmd = {0}; + struct hiraid_scsi_io_cmd io_cmd; u16 submit_stream_id; - struct IO_LIST_S *temp_io_stream = NULL; + struct hiraid_stream_io_list *temp_io_stream = NULL; u16 count = 0; INIT_LIST_HEAD(&temp_header); submit_stream_id = hiraid_get_submit_io_stream(hdid, hdev); - io_slist = hiraid_get_io_head(hdid * MAX_STREAM_NUM + submit_stream_id); + io_slist = &hiraid_io_heads_per_stream[hdid * HIRAID_MAX_STREAM_NUM + + submit_stream_id]; mutex_lock(&io_slist->lock); list_for_each_safe(node, next_node, &io_slist->list) { list_del(node); list_add_tail(node, &temp_header); - if (++count >= MAX_IO_NUM_ONCE) + if (count++ >= MAX_IO_NUM_ONCE) break; } mutex_unlock(&io_slist->lock); list_for_each_safe(node, next_node, &temp_header) { - temp_io_stream = list_entry(node, struct IO_LIST_S, list); + temp_io_stream = list_entry(node, + struct hiraid_stream_io_list, list); io_cmd = temp_io_stream->io_cmd; hiraid_submit_cmd(temp_io_stream->submit_queue, &io_cmd); - hiraid_inc_io_transport_num(hdid, submit_stream_id, - io_cmd.rw.nlb * temp_io_stream->sector_size); + hdev->hiraid_stream_sent_io_size[hdid][submit_stream_id] += + io_cmd.rw.nlb * temp_io_stream->sector_size; list_del(node); kfree(temp_io_stream); } @@ -1017,59 +844,90 @@ static 
void hiraid_submit_io_stream(u16 hdid, struct hiraid_dev *hdev) list_del_init(&temp_header); } -static u8 hiraid_detect_if_aging(void) +static u8 hiraid_aging_detect(struct hiraid_dev *hdev) { - if (++g_io_count == MAX_AGING_NUM) { - g_io_count = 0; + if (hdev->hiraid_stream_aging_time == 0) + hdev->hiraid_stream_aging_time = jiffies_to_msecs(jiffies); + if (++hdev->hiraid_stream_io_count > MAX_AGING_NUM) { + hdev->hiraid_stream_io_count = 0; + if ((jiffies_to_msecs(jiffies) - + hdev->hiraid_stream_aging_time) < + MAX_AGING_TIME) + return 0; + hdev->hiraid_stream_aging_time = jiffies_to_msecs(jiffies); return 1; } return 0; } +static void hiraid_sync_using_stream(struct hiraid_dev *hdev, u16 hdid) +{ + u8 i; + struct hiraid_stream *temp_stream = NULL; + + hdev->hiraid_stream_io_num_per_pd[hdid][HIRAID_STREAM_TYPE_TOTAL] = 0; + for (i = 0; i < HIRAID_MAX_STREAM_NUM; i++) { + temp_stream = &hdev->hiraid_stream_array[hdid][i]; + if (temp_stream->stream_is_using) + hdev->hiraid_stream_io_num_per_pd[hdid] + [HIRAID_STREAM_TYPE_TOTAL]++; + } +} + static void hiraid_aging(struct hiraid_dev *hdev) { - struct HIRAID_STREAM_S *temp_stream = NULL; + struct hiraid_stream *temp_stream = NULL; int i = 0; int j = 0; - - for (i = 0; i < MAX_PD_NUM; i++) { - for (j = 0; j < MAX_STREAM_NUM; j++) { - temp_stream = hiraid_get_stream(i, j); - if (temp_stream->using) { - hiraid_io_recognition_iterator(temp_stream, -1); + u16 *tmp_io_num = NULL; + + for (i = 0; i < HIRAID_MAX_PD_NUM; i++) { + for (j = 0; j < HIRAID_MAX_STREAM_NUM; j++) { + temp_stream = &hdev->hiraid_stream_array[i][j]; + if (temp_stream->stream_is_using) { + hiraid_io_recognition_iterator(temp_stream, + AGING_DEGRADE); + spin_lock(&hdev->hiraid_stream_array_lock); + tmp_io_num = &hdev->hiraid_stream_io_num_per_pd + [i][HIRAID_STREAM_TYPE_TOTAL]; if (temp_stream->aging_credit <= 0) { - hiraid_dec_stream_num(i); + if (*tmp_io_num > 0) + (*tmp_io_num)--; memset(temp_stream, - 0, sizeof(struct HIRAID_STREAM_S)); // 老化 + 0, 
sizeof(struct hiraid_stream)); } + spin_unlock(&hdev->hiraid_stream_array_lock); } } + hiraid_sync_using_stream(hdev, i); } } -static u8 hiraid_io_list_operation(u32 hdid, u16 cid, u16 hwq, u8 operation) +static u8 hiraid_stream_delete_io_list(struct hiraid_dev *hdev, u32 hdid, + u16 cid, u16 hwq, u8 delete_operation) { int i, j; - - struct mutex_list_head_s *io_slist = NULL; + struct mutex_list_head *io_slist = NULL; struct list_head *node = NULL; struct list_head *next_node = NULL; struct hiraid_scsi_io_cmd *io_cmd = NULL; struct hiraid_queue *hiraidq = NULL; - struct IO_LIST_S *temp_io_stream = NULL; + struct hiraid_stream_io_list *temp_io_stream = NULL; + u8 max_hd_num = delete_operation == + HIRAID_STREAM_TYPE_DELETE_ALL_IO_LIST ? HIRAID_MAX_PD_NUM : hdid + 1; - u8 max_hd_num = operation == TYPE_DELETE_ALL_IO_LIST ? - MAX_PD_NUM : hdid + 1; for (i = hdid; i < max_hd_num; i++) { - for (j = 0; j < MAX_STREAM_NUM; j++) { - io_slist = hiraid_get_io_head(i * MAX_STREAM_NUM + j); + for (j = 0; j < HIRAID_MAX_STREAM_NUM; j++) { + io_slist = + &hiraid_io_heads_per_stream[i * HIRAID_MAX_STREAM_NUM + j]; mutex_lock(&io_slist->lock); list_for_each_safe(node, next_node, &io_slist->list) { - temp_io_stream = list_entry(node, - struct IO_LIST_S, list); + temp_io_stream = + list_entry(node, struct hiraid_stream_io_list, list); io_cmd = &(temp_io_stream->io_cmd); hiraidq = temp_io_stream->submit_queue; - if (operation >= TYPE_DELETE_SINGLE_IO_LIST) { + if (delete_operation >= + HIRAID_STREAM_TYPE_DELETE_SINGLE_IO_LIST) { list_del_init(node); kfree(temp_io_stream); temp_io_stream = NULL; @@ -1089,54 +947,27 @@ static u8 hiraid_io_list_operation(u32 hdid, u16 cid, u16 hwq, u8 operation) return 0; } -static u8 hiraid_check_io_list(u32 hdid, u16 cid, u16 hwq) -{ - u8 ret; - - mutex_lock(&g_stream_operation_mutex); - ret = hiraid_io_list_operation(hdid, cid, hwq, TYPE_DELETE_SINGLE_IO); - mutex_unlock(&g_stream_operation_mutex); - return ret; -} - -static u8 
hiraid_delete_single_pd_io_list(u32 hdid) +static u8 hiraid_add_io_to_list(struct hiraid_dev *hdev, + struct hiraid_queue *submit_queue, struct hiraid_stream *tmp_stream, + struct hiraid_scsi_io_cmd io_cmd, unsigned int sector_size, + u16 actual_id) { - u8 ret; + struct mutex_list_head *io_slist = NULL; + struct hiraid_stream_io_list *new_io_node = NULL; + u16 temp_io_id = actual_id * HIRAID_MAX_STREAM_NUM + + tmp_stream->stream_id; - mutex_lock(&g_stream_operation_mutex); - ret = hiraid_io_list_operation(hdid, 0, 0, TYPE_DELETE_SINGLE_IO_LIST); - mutex_unlock(&g_stream_operation_mutex); - return ret; -} - -static u8 hiraid_delete_all_io_list(void) -{ - u8 ret; - - mutex_lock(&g_stream_operation_mutex); - ret = hiraid_io_list_operation(0, 0, 0, TYPE_DELETE_ALL_IO_LIST); - mutex_unlock(&g_stream_operation_mutex); - return ret; -} - -static u8 hiraid_add_io_to_list(struct hiraid_queue *submit_queue, - struct HIRAID_STREAM_S *tmp_stream, struct hiraid_scsi_io_cmd io_cmd, - unsigned int sector_size, u16 actual_id) -{ - struct mutex_list_head_s *io_slist = NULL; - struct IO_LIST_S *new_io_node = NULL; - - new_io_node = kmalloc(sizeof(struct IO_LIST_S), GFP_KERNEL); + new_io_node = kmalloc(sizeof(struct hiraid_stream_io_list), GFP_KERNEL); if (!new_io_node) return 0; new_io_node->io_cmd = io_cmd; new_io_node->submit_queue = submit_queue; new_io_node->sector_size = sector_size; - io_slist = hiraid_get_io_head(actual_id * - MAX_STREAM_NUM + tmp_stream->stream_id); + io_slist = &hiraid_io_heads_per_stream[temp_io_id]; mutex_lock(&io_slist->lock); INIT_LIST_HEAD(&(new_io_node->list)); - list_add_tail(&(new_io_node->list), &io_slist->list); + list_add_tail(&(new_io_node->list), + &hiraid_io_heads_per_stream[temp_io_id].list); mutex_unlock(&io_slist->lock); return 1; } @@ -1146,10 +977,8 @@ static void hiraid_submit_io_threading(struct hiraid_dev *hdev) int i = 0; while (!kthread_should_stop()) { - mutex_lock(&g_stream_operation_mutex); - for (i = 0; i < MAX_PD_NUM; i++) + for 
(i = 0; i < HIRAID_MAX_PD_NUM; i++) hiraid_submit_io_stream(i, hdev); - mutex_unlock(&g_stream_operation_mutex); usleep_range(MIN_IO_SEND_TIME, MAX_IO_SEND_TIME); } } @@ -1158,27 +987,34 @@ static void hiraid_destroy_io_stream_resource(struct hiraid_dev *hdev) { u16 i; - for (i = 0; i < (MAX_PD_NUM * MAX_STREAM_NUM); i++) - list_del_init(&hiraid_get_io_head(i)->list); + for (i = 0; i < (HIRAID_MAX_PD_NUM * HIRAID_MAX_STREAM_NUM); i++) { + list_del_init(&hiraid_io_heads_per_stream[i].list); + mutex_destroy(&hiraid_io_heads_per_stream[i].lock); + } } -struct task_struct *g_hiraid_submit_task; static void hiraid_init_io_stream(struct hiraid_dev *hdev) { - hiraid_io_recognition_init(); - g_hiraid_submit_task = kthread_run((void *)hiraid_submit_io_threading, - hdev, "hiraid_submit_thread"); + hiraid_io_recognition_init(hdev); + hdev->hiraid_stream_submit_task = + kthread_run((void *)hiraid_submit_io_threading, + hdev, "hiraid_stream_submit_task"); } #define HIRAID_RW_FUA BIT(14) +#define RW_LENGTH_ZERO (67) static int hiraid_setup_rw_cmd(struct hiraid_dev *hdev, struct hiraid_scsi_rw_cmd *io_cmd, - struct scsi_cmnd *scmd) + struct scsi_cmnd *scmd, + struct hiraid_mapmange *mapbuf) { + u32 ret = 0; u32 start_lba_lo, start_lba_hi; u32 datalength = 0; u16 control = 0; + struct scsi_device *sdev = scmd->device; + u32 buf_len = cpu_to_le32(scsi_bufflen(scmd)); start_lba_lo = 0; start_lba_hi = 0; @@ -1187,6 +1023,8 @@ static int hiraid_setup_rw_cmd(struct hiraid_dev *hdev, io_cmd->opcode = HIRAID_CMD_WRITE; } else if (scmd->sc_data_direction == DMA_FROM_DEVICE) { io_cmd->opcode = HIRAID_CMD_READ; + } else if (scmd->sc_data_direction == DMA_NONE) { + ret = RW_LENGTH_ZERO; } else { dev_err(hdev->dev, "invalid RW_IO for unsupported data direction[%d]\n", scmd->sc_data_direction); @@ -1194,6 +1032,9 @@ static int hiraid_setup_rw_cmd(struct hiraid_dev *hdev, return -EINVAL; } + if (ret == RW_LENGTH_ZERO) + return ret; + /* 6-byte READ(0x08) or WRITE(0x0A) cdb */ if (scmd->cmd_len 
== 6) { datalength = (u32)(scmd->cmnd[4] == 0 ? @@ -1230,18 +1071,32 @@ static int hiraid_setup_rw_cmd(struct hiraid_dev *hdev, control |= HIRAID_RW_FUA; } - if (unlikely(datalength > U16_MAX || datalength == 0)) { - dev_err(hdev->dev, "invalid IO for err trans data length[%u]\n", + if (unlikely(datalength > U16_MAX)) { + dev_err(hdev->dev, "invalid IO for illegal transfer data length[%u]\n", datalength); WARN_ON(1); return -EINVAL; } + if (unlikely(datalength == 0)) + return RW_LENGTH_ZERO; + io_cmd->slba = cpu_to_le64(((u64)start_lba_hi << 32) | start_lba_lo); /* 0base for nlb */ io_cmd->nlb = cpu_to_le16((u16)(datalength - 1)); io_cmd->control = cpu_to_le16(control); + mapbuf->cdb_data_len = (u32)((io_cmd->nlb + 1) * sdev->sector_size); + if (mapbuf->cdb_data_len > buf_len) { + /* return DID_ERROR */ + dev_err(hdev->dev, "error: buf len[0x%x] is smaller than actual length[0x%x] sectorsize[0x%x]\n", + buf_len, mapbuf->cdb_data_len, sdev->sector_size); + return -EINVAL; + } else if (mapbuf->cdb_data_len < buf_len) { + dev_warn(hdev->dev, "warn: buf_len[0x%x] cdb_data_len[0x%x] nlb[0x%x] sectorsize[0x%x]\n", + buf_len, mapbuf->cdb_data_len, + io_cmd->nlb, sdev->sector_size); + } return 0; } @@ -1296,14 +1151,17 @@ static bool hiraid_disk_is_hdd_rawdrive(u8 attr) } static int hiraid_setup_io_cmd(struct hiraid_dev *hdev, - struct hiraid_scsi_io_cmd *io_cmd, - struct scsi_cmnd *scmd) + struct hiraid_scsi_io_cmd *io_cmd, struct scsi_cmnd *scmd, + struct hiraid_mapmange *mapbuf) { memcpy(io_cmd->common.cdb, scmd->cmnd, scmd->cmd_len); io_cmd->common.cdb_len = scmd->cmd_len; + /* init cdb_data_len */ + mapbuf->cdb_data_len = cpu_to_le32(scsi_bufflen(scmd)); + if (hiraid_is_rw_scmd(scmd)) - return hiraid_setup_rw_cmd(hdev, &io_cmd->rw, scmd); + return hiraid_setup_rw_cmd(hdev, &io_cmd->rw, scmd, mapbuf); else return hiraid_setup_nonrw_cmd(hdev, &io_cmd->nonrw, scmd); } @@ -1360,7 +1218,7 @@ static int hiraid_io_map_data(struct hiraid_dev *hdev, ret = scsi_dma_map(scmd); 
if (unlikely(ret < 0)) - return ret; + return SCSI_MLQUEUE_HOST_BUSY; mapbuf->sge_cnt = ret; /* No data to DMA, it may be scsi no-rw command */ @@ -1390,7 +1248,12 @@ static void hiraid_check_status(struct hiraid_mapmange *mapbuf, struct scsi_cmnd *scmd, struct hiraid_completion *cqe) { - scsi_set_resid(scmd, 0); + u32 datalength = cpu_to_le32(scsi_bufflen(scmd)); + + if (datalength > mapbuf->cdb_data_len) + scsi_set_resid(scmd, datalength - mapbuf->cdb_data_len); + else + scsi_set_resid(scmd, 0); switch ((le16_to_cpu(cqe->status) >> 1) & 0x7f) { case SENSE_STATE_OK: @@ -1443,6 +1306,35 @@ static inline void hiraid_query_scmd_tag(struct scsi_cmnd *scmd, u16 *qid, *cid = blk_mq_unique_tag_to_tag(tag); } +static void hiraid_set_ncq_prio(struct scsi_cmnd *scmd, + struct hiraid_scsi_io_cmd *io_cmd, + struct hiraid_sdev_hostdata *hostdata, + struct hiraid_dev *hdev) +{ + struct request *rq = scmd->request; + int class = 0; + + if (hostdata->sata_ncq_prio_enable) { + class = IOPRIO_PRIO_CLASS(req_get_ioprio(rq)); + switch (class) { + case IOPRIO_CLASS_NONE: + break; + case IOPRIO_CLASS_RT: + io_cmd->rw.rsvd2 &= SCSI_RSVD_INIT_SHIFT; + io_cmd->rw.rsvd2 |= (1 << IOPRIO_CLASS_RT); + break; + case IOPRIO_CLASS_BE: + break; + case IOPRIO_CLASS_IDLE: + break; + default: + break; + } + } + dev_log_dbg(hdev->dev, "blk_ncq_prio = %d, scmd_ncq_prio = 0x%02x\n", + class, io_cmd->rw.rsvd2); +} + static int hiraid_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd) { @@ -1452,7 +1344,7 @@ static int hiraid_queue_command(struct Scsi_Host *shost, struct hiraid_sdev_hostdata *hostdata; struct hiraid_scsi_io_cmd io_cmd; struct hiraid_queue *ioq; - struct HIRAID_STREAM_S *tmp_stm = NULL; + struct hiraid_stream *tmp_stream; u16 hwq, cid; int ret; @@ -1482,9 +1374,16 @@ static int hiraid_queue_command(struct Scsi_Host *shost, io_cmd.rw.hdid = cpu_to_le32(hostdata->hdid); io_cmd.rw.cmd_id = cpu_to_le16(cid); - ret = hiraid_setup_io_cmd(hdev, &io_cmd, scmd); + 
hiraid_set_ncq_prio(scmd, &io_cmd, hostdata, hdev); + + ret = hiraid_setup_io_cmd(hdev, &io_cmd, scmd, mapbuf); if (unlikely(ret)) { - set_host_byte(scmd, DID_ERROR); + if (ret == RW_LENGTH_ZERO) { + scsi_set_resid(scmd, scsi_bufflen(scmd)); + set_host_byte(scmd, DID_OK); + } else { + set_host_byte(scmd, DID_ERROR); + } scmd->scsi_done(scmd); atomic_dec(&ioq->inflight); return 0; @@ -1507,6 +1406,10 @@ static int hiraid_queue_command(struct Scsi_Host *shost, mapbuf->cid = cid; ret = hiraid_io_map_data(hdev, mapbuf, scmd, &io_cmd); if (unlikely(ret)) { + if (ret == SCSI_MLQUEUE_HOST_BUSY) { + atomic_dec(&ioq->inflight); + return ret; + } dev_err(hdev->dev, "io map data err\n"); set_host_byte(scmd, DID_ERROR); scmd->scsi_done(scmd); @@ -1515,20 +1418,20 @@ static int hiraid_queue_command(struct Scsi_Host *shost, } WRITE_ONCE(mapbuf->state, CMD_FLIGHT); - if (hiraid_is_rw_scmd(scmd) && hiraid_disk_is_hdd_rawdrive(hostdata->attr)) { - if (hiraid_detect_if_aging()) + if (hiraid_aging_detect(hdev)) hiraid_aging(hdev); - tmp_stm = hiraid_stream_detect(hdev, &(io_cmd.rw), sdev->id); - if (tmp_stm != NULL) { - hiraid_io_recognition_iterator(tmp_stm, 1); - if (hiraid_recognition_acknowledge(tmp_stm) && - (hiraid_get_stream_num(sdev->id) > 1)) { - if (hiraid_add_io_to_list(ioq, - tmp_stm, io_cmd, sdev->sector_size, sdev->id)) { + tmp_stream = hiraid_stream_detect(hdev, &(io_cmd.rw), sdev->id); + if (tmp_stream != NULL) { + hiraid_io_recognition_iterator(tmp_stream, 1); + if (hiraid_recognition_acknowledge(tmp_stream) && + (sdev->id < HIRAID_MAX_PD_NUM) && + (hdev->hiraid_stream_io_num_per_pd[sdev->id][HIRAID_STREAM_TYPE_TOTAL] > + 1)) { + if (hiraid_add_io_to_list(hdev, ioq, tmp_stream, + io_cmd, sdev->sector_size, sdev->id)) return 0; - } } } } @@ -1666,6 +1569,7 @@ static int hiraid_slave_configure(struct scsi_device *sdev) static void hiraid_shost_init(struct hiraid_dev *hdev) { struct pci_dev *pdev = hdev->pdev; + struct scsi_host_template *hostt; u8 domain, bus; u32 
dev_func; @@ -1674,8 +1578,10 @@ static void hiraid_shost_init(struct hiraid_dev *hdev) dev_func = pdev->devfn; hdev->shost->nr_hw_queues = work_mode ? 1 : hdev->online_queues - 1; - hdev->shost->can_queue = hdev->scsi_qd; - + hdev->shost->can_queue = min(hdev->scsi_qd, + le16_to_cpu(hdev->ctrl_info->max_cmds) - + HIRAID_AQ_DEPTH - + HIRAID_TOTAL_PTCMDS(hdev->online_queues - 1)); hdev->shost->sg_tablesize = le16_to_cpu(hdev->ctrl_info->max_num_sge); /* 512B per sector */ hdev->shost->max_sectors = @@ -1689,7 +1595,13 @@ static void hiraid_shost_init(struct hiraid_dev *hdev) hdev->shost->this_id = -1; hdev->shost->unique_id = (domain << 16) | (bus << 8) | dev_func; hdev->shost->max_cmd_len = MAX_CDB_LEN; - hdev->shost->hostt->cmd_size = hiraid_get_max_cmd_size(hdev); + + hostt = (struct scsi_host_template *)hdev->shost->hostt; + hostt->cmd_size = hiraid_get_max_cmd_size(hdev); + + dev_info(hdev->dev, "nr_hw_queues[%u] can_queue[%u] unique_id[%u] cmd_size[%u]\n", + hdev->shost->nr_hw_queues, hdev->shost->can_queue, + hdev->shost->unique_id, hdev->shost->hostt->cmd_size); } static int hiraid_alloc_queue(struct hiraid_dev *hdev, u16 qid, u16 depth) @@ -2505,6 +2417,14 @@ static int hiraid_get_queue_cnt(struct hiraid_dev *hdev, u32 *cnt) return 0; } +static bool hiraid_pci_is_present(struct hiraid_dev *hdev) +{ + if (!pci_device_is_present(hdev->pdev) || + readl(hdev->bar + HIRAID_REG_VS) == 0Xffffffff) + return false; + return true; +} + static int hiraid_setup_io_queues(struct hiraid_dev *hdev) { struct hiraid_queue *adminq = &hdev->queues[0]; @@ -2579,7 +2499,7 @@ static void hiraid_delete_io_queues(struct hiraid_dev *hdev) u8 opcode = HIRAID_ADMIN_DELETE_SQ; u16 i, pass; - if (!pci_device_is_present(hdev->pdev)) { + if (!hiraid_pci_is_present(hdev)) { dev_err(hdev->dev, "pci_device is not present, skip disable io queues\n"); return; } @@ -2619,7 +2539,7 @@ static void hiraid_disable_admin_queue(struct hiraid_dev *hdev, struct hiraid_queue *adminq = &hdev->queues[0]; 
u16 start, end; - if (pci_device_is_present(hdev->pdev)) { + if (hiraid_pci_is_present(hdev)) { if (shutdown) hiraid_shutdown_control(hdev); else @@ -2850,11 +2770,16 @@ static int hiraid_luntarget_sort(const void *l, const void *r) const struct hiraid_dev_info *rn = r; int l_attr = HIRAID_DEV_INFO_ATTR_BOOT(ln->attr); int r_attr = HIRAID_DEV_INFO_ATTR_BOOT(rn->attr); - /* boot first */ if (l_attr != r_attr) return (r_attr - l_attr); + l_attr = HIRAID_DEV_INFO_ATTR_VD(ln->attr); + r_attr = HIRAID_DEV_INFO_ATTR_VD(rn->attr); + /* vd second */ + if (l_attr != r_attr) + return (r_attr - l_attr); + if (ln->channel == rn->channel) return le16_to_cpu(ln->target) - le16_to_cpu(rn->target); @@ -2892,6 +2817,19 @@ static void hiraid_scan_work(struct work_struct *work) if (HIRAID_DEV_INFO_FLAG_VALID(flag)) { if (!HIRAID_DEV_INFO_FLAG_VALID(org_flag)) { + down_write(&hdev->dev_rwsem); + memcpy(&old_dev[i], &dev[i], + sizeof(struct hiraid_dev_info)); + memcpy(&new_dev[count++], &dev[i], + sizeof(struct hiraid_dev_info)); + up_write(&hdev->dev_rwsem); + } else if (HIRAID_DEV_INFO_FLAG_DELETE_OR_ADD(flag, + org_flag)) { + down_write(&hdev->dev_rwsem); + old_dev[i].flag &= 0xfe; + up_write(&hdev->dev_rwsem); + hiraid_delete_device(hdev, &old_dev[i]); + down_write(&hdev->dev_rwsem); memcpy(&old_dev[i], &dev[i], sizeof(struct hiraid_dev_info)); @@ -3044,7 +2982,7 @@ static void hiraid_bsg_buf_unmap(struct hiraid_dev *hdev, struct bsg_job *job) } static int hiraid_bsg_buf_map(struct hiraid_dev *hdev, struct bsg_job *job, - struct hiraid_admin_command *cmd) + struct hiraid_admin_command *cmd, u16 qid) { struct hiraid_bsg_request *bsg_req = job->request; struct request *rq = blk_mq_rq_from_pdu(job); @@ -3058,24 +2996,26 @@ static int hiraid_bsg_buf_map(struct hiraid_dev *hdev, struct bsg_job *job, mapbuf->sgl = job->request_payload.sg_list; mapbuf->len = job->request_payload.payload_len; mapbuf->page_cnt = -1; + mapbuf->hiraidq = &hdev->queues[qid]; if (unlikely(mapbuf->sge_cnt == 0)) 
goto out; - mapbuf->use_sgl = !hiraid_is_prp(hdev, mapbuf->sgl, mapbuf->sge_cnt); - ret = dma_map_sg_attrs(hdev->dev, mapbuf->sgl, mapbuf->sge_cnt, - dma_dir, DMA_ATTR_NO_WARN); + dma_dir, DMA_ATTR_NO_WARN); if (!ret) goto out; - if ((mapbuf->use_sgl == (bool)true) && + mapbuf->use_sgl = !hiraid_is_prp(hdev, mapbuf->sgl, mapbuf->sge_cnt); + + if ((mapbuf->use_sgl == true) && (bsg_req->msgcode == HIRAID_BSG_IOPTHRU) && - (hdev->ctrl_info->pt_use_sgl != (bool)false)) { - ret = hiraid_build_passthru_sgl(hdev, cmd, mapbuf); + (hdev->ctrl_info->pt_use_sgl != false)) { + ret = hiraid_build_sgl(hdev, (struct hiraid_scsi_io_cmd *)cmd, + mapbuf); } else { mapbuf->use_sgl = false; - ret = hiraid_build_passthru_prp(hdev, mapbuf); + ret = hiraid_build_prp(hdev, mapbuf); cmd->common.dptr.prp1 = cpu_to_le64(sg_dma_address(mapbuf->sgl)); cmd->common.dptr.prp2 = cpu_to_le64(mapbuf->first_dma); @@ -3156,9 +3096,7 @@ static int hiraid_init_control_info(struct hiraid_dev *hdev) if (hdev->ctrl_info->asynevent > HIRAID_ASYN_COMMANDS) hdev->ctrl_info->asynevent = HIRAID_ASYN_COMMANDS; - hdev->scsi_qd = work_mode ? 
- le16_to_cpu(hdev->ctrl_info->max_cmds) : - (hdev->ioq_depth - HIRAID_PTHRU_CMDS_PERQ); + hdev->scsi_qd = hdev->ioq_depth - HIRAID_PTHRU_CMDS_PERQ; return 0; } @@ -3191,7 +3129,7 @@ static int hiraid_user_send_admcmd(struct hiraid_dev *hdev, struct bsg_job *job) admin_cmd.common.cdw14 = cpu_to_le32(ptcmd->cdw14); admin_cmd.common.cdw15 = cpu_to_le32(ptcmd->cdw15); - status = hiraid_bsg_buf_map(hdev, job, &admin_cmd); + status = hiraid_bsg_buf_map(hdev, job, &admin_cmd, 0); if (status) { dev_err(hdev->dev, "err, map data failed\n"); return status; @@ -3251,19 +3189,13 @@ static void hiraid_free_io_ptcmds(struct hiraid_dev *hdev) } static int hiraid_put_io_sync_request(struct hiraid_dev *hdev, - struct hiraid_scsi_io_cmd *io_cmd, - u32 *result, u32 *reslen, u32 timeout) + struct hiraid_scsi_io_cmd *io_cmd, struct hiraid_cmd *pt_cmd, + u32 *result, u32 *reslen, u32 timeout) { int ret; dma_addr_t buffer_phy; struct hiraid_queue *ioq; void *sense_addr = NULL; - struct hiraid_cmd *pt_cmd = hiraid_get_cmd(hdev, HIRAID_CMD_PTHRU); - - if (!pt_cmd) { - dev_err(hdev->dev, "err, get ioq cmd failed\n"); - return -EFAULT; - } timeout = timeout ? 
timeout : ADMIN_TIMEOUT; @@ -3271,8 +3203,9 @@ static int hiraid_put_io_sync_request(struct hiraid_dev *hdev, ioq = &hdev->queues[pt_cmd->qid]; if (work_mode) { - ret = ((pt_cmd->qid - 1) * HIRAID_PTHRU_CMDS_PERQ + - pt_cmd->cid) * SCSI_SENSE_BUFFERSIZE; + ret = ((pt_cmd->qid - 1) * + HIRAID_PTHRU_CMDS_PERQ + pt_cmd->cid) * + SCSI_SENSE_BUFFERSIZE; sense_addr = hdev->sense_buffer_virt + ret; buffer_phy = hdev->sense_buffer_phy + ret; } else { @@ -3293,8 +3226,6 @@ static int hiraid_put_io_sync_request(struct hiraid_dev *hdev, (le32_to_cpu(io_cmd->common.cdw3[0]) & 0xffff)); hiraid_admin_timeout(hdev, pt_cmd); - - hiraid_put_cmd(hdev, pt_cmd, HIRAID_CMD_PTHRU); return -ETIME; } @@ -3305,8 +3236,6 @@ static int hiraid_put_io_sync_request(struct hiraid_dev *hdev, } } - hiraid_put_cmd(hdev, pt_cmd, HIRAID_CMD_PTHRU); - return pt_cmd->status; } @@ -3316,6 +3245,7 @@ static int hiraid_user_send_ptcmd(struct hiraid_dev *hdev, struct bsg_job *job) (struct hiraid_bsg_request *)(job->request); struct hiraid_passthru_io_cmd *cmd = &(bsg_req->pthrucmd); struct hiraid_scsi_io_cmd pthru_cmd; + struct hiraid_cmd *pt_cmd; int status = 0; u32 timeout = msecs_to_jiffies(cmd->timeout_ms); // data len is 4k before use sgl, now len is 1M @@ -3358,16 +3288,21 @@ static int hiraid_user_send_ptcmd(struct hiraid_dev *hdev, struct bsg_job *job) pthru_cmd.common.cdw26[2] = cpu_to_le32(cmd->cdw26[2]); pthru_cmd.common.cdw26[3] = cpu_to_le32(cmd->cdw26[3]); + pt_cmd = hiraid_get_cmd(hdev, HIRAID_CMD_PTHRU); + if (pt_cmd == NULL) { + dev_err(hdev->dev, "err, get ioq cmd failed\n"); + return -EFAULT; + } + status = hiraid_bsg_buf_map(hdev, job, - (struct hiraid_admin_command *)&pthru_cmd); + (struct hiraid_admin_command *)&pthru_cmd, pt_cmd->qid); if (status) { dev_err(hdev->dev, "err, map data failed\n"); - return status; + goto ret; } - status = hiraid_put_io_sync_request(hdev, &pthru_cmd, job->reply, - &job->reply_len, timeout); - + status = hiraid_put_io_sync_request(hdev, &pthru_cmd, 
pt_cmd, + job->reply, &job->reply_len, timeout); if (status) dev_info(hdev->dev, "opcode[0x%x] subopcode[0x%x] status[0x%x] replylen[%d]\n", cmd->opcode, cmd->info_1.subopcode, @@ -3375,6 +3310,9 @@ static int hiraid_user_send_ptcmd(struct hiraid_dev *hdev, struct bsg_job *job) hiraid_bsg_buf_unmap(hdev, job); +ret: + hiraid_put_cmd(hdev, pt_cmd, HIRAID_CMD_PTHRU); + return status; } @@ -3396,23 +3334,28 @@ static bool hiraid_check_scmd_finished(struct scsi_cmnd *scmd) return false; } -static enum blk_eh_timer_return hiraid_timed_out(struct scsi_cmnd *scmd) +#define EH_TIMED_RET_TP enum blk_eh_timer_return +#define EH_TIMED_RET_DONE BLK_EH_DONE +#define EH_TIMED_RET_NOT_HANDLED BLK_EH_DONE +#define EH_TIMED_RET_RESET_TIMER BLK_EH_RESET_TIMER +static EH_TIMED_RET_TP hiraid_timed_out(struct scsi_cmnd *scmd) { struct hiraid_mapmange *mapbuf = scsi_cmd_priv(scmd); unsigned int timeout = scmd->device->request_queue->rq_timeout; if (hiraid_check_scmd_finished(scmd)) - goto out; + return EH_TIMED_RET_DONE; - if (time_after(jiffies, scmd->jiffies_at_alloc + timeout)) { + if (time_after_eq(jiffies, scmd->jiffies_at_alloc + timeout)) { + scmd_printk(KERN_WARNING, scmd, "timeout is work[%d], need abort\n", + mapbuf->state); if (cmpxchg(&mapbuf->state, CMD_FLIGHT, CMD_TIMEOUT) == CMD_FLIGHT) - return BLK_EH_DONE; + return EH_TIMED_RET_NOT_HANDLED; } -out: - return BLK_EH_RESET_TIMER; -} + return EH_TIMED_RET_RESET_TIMER; +} /* send abort command by admin queue temporary */ static int hiraid_send_abort_cmd(struct hiraid_dev *hdev, u32 hdid, u16 qid, u16 cid) @@ -3535,7 +3478,7 @@ static int hiraid_dev_disable(struct hiraid_dev *hdev, bool shutdown) struct hiraid_queue *adminq = &hdev->queues[0]; u16 start, end; - if (pci_device_is_present(hdev->pdev)) { + if (hiraid_pci_is_present(hdev)) { if (shutdown) hiraid_shutdown_control(hdev); else @@ -3733,11 +3676,19 @@ static int hiraid_abort(struct scsi_cmnd *scmd) hiraid_check_scmd_finished(scmd)) return SUCCESS; + if (hdev->state 
!= DEV_LIVE) + return SUCCESS; + hostdata = scmd->device->hostdata; cid = mapbuf->cid; hwq = mapbuf->hiraidq->qid; - if (hiraid_check_io_list(hostdata->hdid, cid, hwq)) { + mutex_lock(&g_stream_operation_mutex); + ret = hiraid_stream_delete_io_list(hdev, hostdata->hdid, + cid, hwq, HIRAID_STREAM_TYPE_DELETE_SINGLE_IO); + mutex_unlock(&g_stream_operation_mutex); + + if (ret == 1) { dev_warn(hdev->dev, "find cid[%d] qid[%d] in host, abort succ\n", cid, hwq); return SUCCESS; @@ -3749,14 +3700,14 @@ static int hiraid_abort(struct scsi_cmnd *scmd) ret = hiraid_wait_io_completion(mapbuf); if (ret) { dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed, not found\n", - cid, hwq); + cid, hwq); return FAILED; } dev_warn(hdev->dev, "cid[%d] qid[%d] abort succ\n", cid, hwq); return SUCCESS; } dev_warn(hdev->dev, "cid[%d] qid[%d] abort failed, timeout\n", - cid, hwq); + cid, hwq); return FAILED; } @@ -3778,7 +3729,10 @@ static int hiraid_scsi_reset(struct scsi_cmnd *scmd, enum hiraid_rst_type rst) if ((ret == 0) || (ret == FW_EH_DEV_NONE && rst == HIRAID_RESET_TARGET)) { if (rst == HIRAID_RESET_TARGET) { - hiraid_delete_single_pd_io_list(hostdata->hdid); + mutex_lock(&g_stream_operation_mutex); + hiraid_stream_delete_io_list(hdev, hostdata->hdid, + 0, 0, HIRAID_STREAM_TYPE_DELETE_SINGLE_IO_LIST); + mutex_unlock(&g_stream_operation_mutex); ret = wait_tgt_reset_io_done(scmd); if (ret) { dev_warn(hdev->dev, "sdev[%d:%d] target has %d peding cmd, target reset failed\n", @@ -3818,9 +3772,13 @@ static int hiraid_host_reset(struct scsi_cmnd *scmd) dev_warn(hdev->dev, "sdev[%d:%d] send host reset\n", scmd->device->channel, scmd->device->id); - hiraid_delete_all_io_list(); - if (hiraid_reset_work_sync(hdev) == -EBUSY) + mutex_lock(&g_stream_operation_mutex); + hiraid_stream_delete_io_list(hdev, 0, 0, 0, + HIRAID_STREAM_TYPE_DELETE_ALL_IO_LIST); + mutex_unlock(&g_stream_operation_mutex); + if (hiraid_reset_work_sync(hdev) == -EBUSY) { flush_work(&hdev->reset_work); + } if (hdev->state != 
DEV_LIVE) { dev_warn(hdev->dev, "sdev[%d:%d] host reset failed\n", @@ -3852,7 +3810,10 @@ static pci_ers_result_t hiraid_pci_error_detected(struct pci_dev *pdev, scsi_block_requests(hdev->shost); hiraid_dev_state_trans(hdev, DEV_RESETTING); - hiraid_delete_all_io_list(); + mutex_lock(&g_stream_operation_mutex); + hiraid_stream_delete_io_list(hdev, 0, 0, 0, + HIRAID_STREAM_TYPE_DELETE_ALL_IO_LIST); + mutex_unlock(&g_stream_operation_mutex); return PCI_ERS_RESULT_NEED_RESET; case pci_channel_io_perm_failure: @@ -3888,9 +3849,24 @@ static void hiraid_reset_pci_finish(struct pci_dev *pdev) { struct hiraid_dev *hdev = pci_get_drvdata(pdev); + if (hiraid_reset_work_sync(hdev) == -EBUSY) + flush_work(&hdev->reset_work); + dev_info(hdev->dev, "enter hiraid reset finish\n"); } +static void hiraid_reset_pci_prepare(struct pci_dev *pdev) +{ + struct hiraid_dev *hdev = pci_get_drvdata(pdev); + + mutex_lock(&g_stream_operation_mutex); + hiraid_stream_delete_io_list(hdev, 0, 0, 0, + HIRAID_STREAM_TYPE_DELETE_ALL_IO_LIST); + mutex_unlock(&g_stream_operation_mutex); + msleep(100); + dev_info(hdev->dev, "exit hiraid reset prepare\n"); +} + static ssize_t csts_pp_show(struct device *cdev, struct device_attribute *attr, char *buf) { @@ -3898,7 +3874,7 @@ static ssize_t csts_pp_show(struct device *cdev, struct hiraid_dev *hdev = shost_priv(shost); int ret = -1; - if (pci_device_is_present(hdev->pdev)) { + if (hiraid_pci_is_present(hdev)) { ret = (readl(hdev->bar + HIRAID_REG_CSTS) & HIRAID_CSTS_PP_MASK); ret >>= HIRAID_CSTS_PP_SHIFT; @@ -3914,7 +3890,7 @@ static ssize_t csts_shst_show(struct device *cdev, struct hiraid_dev *hdev = shost_priv(shost); int ret = -1; - if (pci_device_is_present(hdev->pdev)) { + if (hiraid_pci_is_present(hdev)) { ret = (readl(hdev->bar + HIRAID_REG_CSTS) & HIRAID_CSTS_SHST_MASK); ret >>= HIRAID_CSTS_SHST_SHIFT; @@ -3930,7 +3906,7 @@ static ssize_t csts_cfs_show(struct device *cdev, struct hiraid_dev *hdev = shost_priv(shost); int ret = -1; - if 
(pci_device_is_present(hdev->pdev)) { + if (hiraid_pci_is_present(hdev)) { ret = (readl(hdev->bar + HIRAID_REG_CSTS) & HIRAID_CSTS_CFS_MASK); ret >>= HIRAID_CSTS_CFS_SHIFT; @@ -3946,7 +3922,7 @@ static ssize_t csts_rdy_show(struct device *cdev, struct hiraid_dev *hdev = shost_priv(shost); int ret = -1; - if (pci_device_is_present(hdev->pdev)) + if (hiraid_pci_is_present(hdev)) ret = (readl(hdev->bar + HIRAID_REG_CSTS) & HIRAID_CSTS_RDY); return snprintf(buf, PAGE_SIZE, "%d\n", ret); @@ -4239,22 +4215,78 @@ static ssize_t dispatch_hwq_store(struct device *dev, return strlen(buf); } +static ssize_t sata_ncq_prio_enable_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct hiraid_sdev_hostdata *hostdata; + + hostdata = to_scsi_device(dev)->hostdata; + return snprintf(buf, PAGE_SIZE, "%d\n", hostdata->sata_ncq_prio_enable); +} + +bool scsi_ncq_prio_support(struct scsi_device *sdev, struct hiraid_dev *hdev) +{ + unsigned char *buf; + bool ncq_prio_supp = false; + + if (!scsi_device_supports_vpd(sdev)) { + dev_err(hdev->dev, "scsi device not support vpd\n"); + return ncq_prio_supp; + } + + buf = kzalloc(HIRAID_SCSI_VPD_LEN, GFP_KERNEL); + if (!buf) + return ncq_prio_supp; + + if (!scsi_get_vpd_page(sdev, 0x89, buf, HIRAID_SCSI_VPD_LEN)) + ncq_prio_supp = (buf[NCQ_PRIO_SUPPORT_BYTE] >> + VPD_NCQ_PRIO_SUPPORT_BIT) & 1; + + kfree(buf); + return ncq_prio_supp; +} + +static ssize_t sata_ncq_prio_enable_store(struct device *dev, + struct device_attribute *attr, const char *buf, size_t count) +{ + struct scsi_device *sdev = to_scsi_device(dev); + struct hiraid_sdev_hostdata *hostdata = sdev->hostdata; + struct hiraid_dev *hdev = shost_priv(sdev->host); + bool ncq_prio_enable = 0; + + if (kstrtobool(buf, &ncq_prio_enable)) + return -EINVAL; + + if (!scsi_ncq_prio_support(sdev, hdev)) { + dev_err(hdev->dev, "sdev[%d:%d] not support ncq prio\n", + sdev->channel, sdev->id); + return -EINVAL; + } + + hostdata->sata_ncq_prio_enable = ncq_prio_enable; + + 
return strlen(buf); +} + static DEVICE_ATTR_RO(raid_level); static DEVICE_ATTR_RO(raid_state); static DEVICE_ATTR_RO(raid_resync); static DEVICE_ATTR_RW(dispatch_hwq); +static DEVICE_ATTR_RW(sata_ncq_prio_enable); static struct device_attribute *hiraid_dev_attrs[] = { &dev_attr_raid_state, &dev_attr_raid_level, &dev_attr_raid_resync, &dev_attr_dispatch_hwq, + &dev_attr_sata_ncq_prio_enable, NULL, }; static struct pci_error_handlers hiraid_err_handler = { .error_detected = hiraid_pci_error_detected, .slot_reset = hiraid_pci_slot_reset, + .reset_prepare = hiraid_reset_pci_prepare, .reset_done = hiraid_reset_pci_finish, }; @@ -4518,14 +4550,17 @@ static void hiraid_remove(struct pci_dev *pdev) dev_info(hdev->dev, "enter hiraid remove\n"); - kthread_stop(g_hiraid_submit_task); - hiraid_delete_all_io_list(); + kthread_stop(hdev->hiraid_stream_submit_task); + mutex_lock(&g_stream_operation_mutex); + hiraid_stream_delete_io_list(hdev, 0, 0, 0, + HIRAID_STREAM_TYPE_DELETE_ALL_IO_LIST); + mutex_unlock(&g_stream_operation_mutex); hiraid_destroy_io_stream_resource(hdev); hiraid_dev_state_trans(hdev, DEV_DELETING); flush_work(&hdev->reset_work); - if (!pci_device_is_present(pdev)) + if (!hiraid_pci_is_present(hdev)) hiraid_flush_running_cmds(hdev); hiraid_unregist_bsg(hdev); @@ -4548,8 +4583,12 @@ static void hiraid_remove(struct pci_dev *pdev) static const struct pci_device_id hiraid_hw_card_ids[] = { { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI_LOGIC, HIRAID_SERVER_DEVICE_HBA_DID) }, + { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI_LOGIC, + HIRAID_SERVER_DEVICE_HBAS_DID) }, { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI_LOGIC, HIRAID_SERVER_DEVICE_RAID_DID) }, + { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI_LOGIC, + HIRAID_SERVER_DEVICE_RAIDS_DID) }, { 0, } }; MODULE_DEVICE_TABLE(pci, hiraid_hw_card_ids); @@ -4572,22 +4611,9 @@ static int __init hiraid_init(void) if (!work_queue) return -ENOMEM; - hiraid_class = class_create(THIS_MODULE, "hiraid"); - if (IS_ERR(hiraid_class)) { - ret = PTR_ERR(hiraid_class); - goto 
destroy_wq; - } - ret = pci_register_driver(&hiraid_driver); if (ret < 0) - goto destroy_class; - - return 0; - -destroy_class: - class_destroy(hiraid_class); -destroy_wq: - destroy_workqueue(work_queue); + destroy_workqueue(work_queue); return ret; } @@ -4595,7 +4621,6 @@ destroy_wq: static void __exit hiraid_exit(void) { pci_unregister_driver(&hiraid_driver); - class_destroy(hiraid_class); destroy_workqueue(work_queue); } -- 2.45.1.windows.1


[PATCH OLK-6.6] mm/sharepool: Support numa_maps
by Yin Tirui 09 Jun '26


Support numa_maps

Signed-off-by: Yin Tirui <yintirui(a)huawei.com>
---
 mm/share_pool.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 126 insertions(+), 4 deletions(-)

diff --git a/mm/share_pool.c b/mm/share_pool.c
index b6e543f1cefc..4e11c60db2fc 100644
--- a/mm/share_pool.c
+++ b/mm/share_pool.c
@@ -3984,11 +3984,125 @@ static void get_process_non_sp_res(unsigned long total_rss, unsigned long shmem,
 static void print_process_prot(struct seq_file *seq, unsigned long prot)
 {
 	if (prot == PROT_READ)
-		seq_puts(seq, "R");
+		seq_printf(seq, "%-4s ", "R");
 	else if (prot == (PROT_READ | PROT_WRITE))
-		seq_puts(seq, "RW");
+		seq_printf(seq, "%-4s ", "RW");
 	else
-		seq_puts(seq, "-");
+		seq_printf(seq, "%-4s ", "-");
+}
+
+static int sp_numa_pte_entry(pte_t *pte, unsigned long addr,
+			     unsigned long next, struct mm_walk *walk)
+{
+	unsigned long *size_array = walk->private;
+	pte_t entry = ptep_get(pte);
+	struct page *page;
+	int nid;
+
+	if (pte_present(entry)) {
+		if (is_zero_pfn(pte_pfn(entry)))
+			return 0;
+
+		page = pte_page(entry);
+		if (page && !PageReserved(page)) {
+			nid = page_to_nid(page);
+			if (nid >= 0 && nid < MAX_NUMNODES)
+				size_array[nid] += PAGE_SIZE;
+		}
+	}
+	return 0;
+}
+
+static int sp_numa_pmd_entry(pmd_t *pmd, unsigned long addr,
+			     unsigned long next, struct mm_walk *walk)
+{
+	unsigned long *size_array = walk->private;
+	pmd_t entry = pmdp_get(pmd);
+	struct page *page;
+	int nid;
+
+	if (pmd_present(entry) && pmd_huge(entry)) {
+		if (is_zero_pfn(pmd_pfn(entry)))
+			goto skip;
+
+		page = pmd_page(entry);
+		if (page && !PageReserved(page)) {
+			nid = page_to_nid(page);
+			if (nid >= 0 && nid < MAX_NUMNODES)
+				size_array[nid] += PMD_SIZE;
+		}
+skip:
+		walk->action = ACTION_CONTINUE;
+	}
+	return 0;
+}
+
+static int sp_numa_hugetlb_entry(pte_t *ptep, unsigned long hmask,
+				 unsigned long addr, unsigned long next,
+				 struct mm_walk *walk)
+{
+	unsigned long *size_array = walk->private;
+	pte_t pte = huge_ptep_get(ptep);
+	struct page *page;
+	int nid;
+
+	if (pte_present(pte)) {
+		if (is_zero_pfn(pte_pfn(pte)))
+			return 0;
+		page = pte_page(pte);
+		if (page && !PageReserved(page)) {
+			nid = page_to_nid(page);
+			if (nid >= 0 && nid < MAX_NUMNODES)
+				size_array[nid] += (~hmask + 1);
+		}
+	}
+	return 0;
+}
+
+static const struct mm_walk_ops sp_numa_walk_ops = {
+	.pte_entry = sp_numa_pte_entry,
+	.pmd_entry = sp_numa_pmd_entry,
+	.hugetlb_entry = sp_numa_hugetlb_entry,
+};
+
+static void print_process_numa_maps(struct seq_file *seq, struct sp_group_node *spg_node)
+{
+	struct mm_struct *mm = spg_node->master->mm;
+	struct sp_group *spg = spg_node->spg;
+	unsigned long size[MAX_NUMNODES] = {0};
+	struct rb_node *p, *n;
+	struct sp_area *spa;
+	int nid, ret;
+
+	if (!mmget_not_zero(mm))
+		goto out;
+
+	for (p = rb_first(&spg->spa_root); p; p = n) {
+		n = rb_next(p);
+		spa = container_of(p, struct sp_area, spg_link);
+
+		if (spa->type != SPA_TYPE_ALLOC ||
+		    spa->applier != spg_node->master->tgid)
+			continue;
+
+		mmap_read_lock(mm);
+		ret = walk_page_range(mm, spa->va_start, spa->va_start + spa_size(spa),
+				      &sp_numa_walk_ops, size);
+		mmap_read_unlock(mm);
+		if (ret < 0) {
+			pr_err_ratelimited("walk_page_range failed %d for 0x%lx.\n",
+					   ret, spa->va_start);
+			continue;
+		}
+
+		cond_resched();
+	}
+
+	mmput_async(mm);
+
+out:
+	for_each_node_state(nid, N_MEMORY)
+		seq_printf(seq, "%-9ld ", byte2kb(size[nid]));
 }
 
 static void spa_stat_of_mapping_show(struct seq_file *seq, struct sp_mapping *spm)
@@ -4180,6 +4294,7 @@ static int proc_usage_by_group(int id, void *p, void *data)
 			   page2kb(mm->total_vm), page2kb(total_rss),
 			   page2kb(shmem));
 		print_process_prot(seq, spg_node->prot);
+		print_process_numa_maps(seq, spg_node);
 		seq_putc(seq, '\n');
 	}
 	up_read(&spg->rw_lock);
@@ -4190,6 +4305,8 @@ static int proc_usage_by_group(int id, void *p, void *data)
 
 static int proc_group_usage_show(struct seq_file *seq, void *offset)
 {
+	int nid;
+
 	if (!should_show_statistics())
 		return -EPERM;
 
@@ -4197,10 +4314,15 @@ static int proc_group_usage_show(struct seq_file *seq, void *offset)
 	spa_overview_show(seq);
 
 	/* print the file header */
-	seq_printf(seq, "%-8s %-8s %-9s %-9s %-9s %-8s %-7s %-7s %-4s\n",
+	seq_printf(seq, "%-8s %-8s %-9s %-9s %-9s %-8s %-7s %-7s %-4s ",
 		   "PID", "Group_ID", "SP_ALLOC", "SP_K2U", "SP_RES",
 		   "VIRT", "RES", "Shm", "PROT");
 
+	for_each_node_state(nid, N_MEMORY)
+		seq_printf(seq, "N%-8d ", nid);
+
+	seq_putc(seq, '\n');
+
 	down_read(&sp_global_sem);
 	idr_for_each(&sp_group_idr, proc_usage_by_group, seq);
 	up_read(&sp_global_sem);
-- 
2.43.0
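A note on the hugetlb callback in the patch above: the page walker hands it `hmask`, the mask that clears the offset bits inside a huge page, so `~hmask + 1` gives the huge page size in bytes. The userspace sketch below illustrates only that arithmetic and the bounds-checked per-node accumulation. `huge_size_from_mask`, `account_page`, `bytes_to_kb` and `SKETCH_MAX_NODES` are hypothetical names for illustration and are not part of the patch or of any kernel API. The `>> 10` conversion is an assumption about what `byte2kb()` does.

```c
#include <stddef.h>

/* Hypothetical stand-in for MAX_NUMNODES; illustration only. */
#define SKETCH_MAX_NODES 4

/*
 * hmask clears the offset bits inside a huge page, e.g. ~(2 MiB - 1).
 * Its two's complement, ~hmask + 1, is therefore the page size in bytes.
 */
static unsigned long huge_size_from_mask(unsigned long hmask)
{
	return ~hmask + 1;
}

/* Add `bytes` to the counter of node `nid`; out-of-range ids are ignored. */
static void account_page(unsigned long *size_array, int nid,
			 unsigned long bytes)
{
	if (nid >= 0 && nid < SKETCH_MAX_NODES)
		size_array[nid] += bytes;
}

/* Assumed equivalent of byte2kb(): bytes to KiB by shifting right 10 bits. */
static unsigned long bytes_to_kb(unsigned long bytes)
{
	return bytes >> 10;
}
```

Calling `account_page()` once per mapped page leaves per-node byte totals in the array, which corresponds to what the patch prints in KiB, one column per node that has memory.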
